| Package | Description |
|---|---|
| org.apache.tika | Apache Tika. |
| org.apache.tika.config | Tika configuration tools. |
| org.apache.tika.detect | Media type detection. |
| org.apache.tika.embedder | |
| org.apache.tika.exception | Tika exception. |
| org.apache.tika.extractor | Extraction of component documents. |
| org.apache.tika.fork | Forked parser. |
| org.apache.tika.io | IO utilities. |
| org.apache.tika.language | |
| org.apache.tika.language.translate | |
| org.apache.tika.metadata.filter | |
| org.apache.tika.mime | Media type information. |
| org.apache.tika.parser | Tika parsers. |
| org.apache.tika.parser.external | External parser process. |
| org.apache.tika.sax | SAX utilities. |
| org.apache.tika.utils | Utilities. |

| Modifier and Type | Method and Description |
|---|---|
| `String` | `Tika.parseToString(File file)`<br>Parses the given file and returns the extracted text content. |
| `String` | `Tika.parseToString(InputStream stream)`<br>Parses the given document and returns the extracted text content. |
| `String` | `Tika.parseToString(InputStream stream, Metadata metadata)`<br>Parses the given document and returns the extracted text content. |
| `String` | `Tika.parseToString(InputStream stream, Metadata metadata, int maxLength)`<br>Parses the given document and returns the extracted text content. |
| `String` | `Tika.parseToString(Path path)`<br>Parses the file at the given path and returns the extracted text content. |
| `String` | `Tika.parseToString(URL url)`<br>Parses the resource at the given URL and returns the extracted text content. |

| Modifier and Type | Method and Description |
|---|---|
| `static <T> Param<T>` | `Param.load(InputStream stream)` |
| `void` | `Param.save(OutputStream stream)` |

| Constructor and Description |
|---|
| `TikaConfig()`<br>Creates a default Tika configuration. |
| `TikaConfig(Document document)` |
| `TikaConfig(Document document, ServiceLoader loader)` |
| `TikaConfig(Element element)` |
| `TikaConfig(Element element, ClassLoader loader)` |
| `TikaConfig(File file)` |
| `TikaConfig(File file, ServiceLoader loader)` |
| `TikaConfig(InputStream stream)` |
| `TikaConfig(Path path)` |
| `TikaConfig(Path path, ServiceLoader loader)` |
| `TikaConfig(String file)` |
| `TikaConfig(URL url)` |
| `TikaConfig(URL url, ClassLoader loader)` |
| `TikaConfig(URL url, ServiceLoader loader)` |

| Constructor and Description |
|---|
| `AutoDetectReader(InputStream stream)` |
| `AutoDetectReader(InputStream stream, Metadata metadata)` |
| `AutoDetectReader(InputStream stream, Metadata metadata, EncodingDetector encodingDetector)` |
| `AutoDetectReader(InputStream stream, Metadata metadata, ServiceLoader loader)` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `ExternalEmbedder.embed(Metadata metadata, InputStream inputStream, OutputStream outputStream, ParseContext context)`<br>Executes the configured external command to embed the given metadata into the document, writing the result to the given output stream. |
| `void` | `Embedder.embed(Metadata metadata, InputStream originalStream, OutputStream outputStream, ParseContext context)`<br>Embeds related document metadata from the given metadata object into the given output stream. |

| Modifier and Type | Class and Description |
|---|---|
| `class` | `AccessPermissionException`<br>Exception to be thrown when a document does not allow content extraction. |
| `class` | `CorruptedFileException`<br>This exception should be thrown when the parse absolutely, positively has to stop. |
| `class` | `EncryptedDocumentException` |
| `class` | `TikaConfigException`<br>Thrown when the Tika config file contains an error and/or one or more of the parsers failed to initialize from that erroneous config. |
| `class` | `TikaMemoryLimitException` |
| `class` | `UnsupportedFormatException`<br>Parsers should throw this exception when they encounter a file format that they do not support. |
| `class` | `ZeroByteFileException`<br>Exception thrown by the AutoDetectParser when a file contains zero bytes. |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `ParserContainerExtractor.extract(TikaInputStream stream, ContainerExtractor recurseExtractor, EmbeddedResourceHandler handler)` |
| `void` | `ContainerExtractor.extract(TikaInputStream stream, ContainerExtractor recurseExtractor, EmbeddedResourceHandler handler)`<br>Processes a container file and extracts all the embedded resources from within it. |

| Modifier and Type | Method and Description |
|---|---|
| `ParserFactory` | `ParserFactoryFactory.build()` |
| `void` | `ForkParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)`<br>This sends the objects to the server for parsing, and the server, via the proxies, acts on the handler as if it were updating it directly. |

| Modifier and Type | Class and Description |
|---|---|
| `static class` | `EndianUtils.BufferUnderrunException` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `TemporaryResources.dispose()`<br>Calls the `TemporaryResources.close()` method and wraps the potential `IOException` into a `TikaException` for convenience when used within Tika. |

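
The wrapping behaviour described for `TemporaryResources.dispose()` can be illustrated with a minimal sketch. It does not use Tika itself: `DisposeSketch` and `SketchTikaException` are hypothetical stand-ins for `TemporaryResources` and `TikaException`, so only the pattern (delegate to `close()`, rethrow an `IOException` as the library's own exception type) is taken from the description above.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a resource holder such as TemporaryResources.
final class DisposeSketch implements Closeable {

    // Hypothetical stand-in for org.apache.tika.exception.TikaException.
    static class SketchTikaException extends Exception {
        SketchTikaException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final Closeable resource;

    DisposeSketch(Closeable resource) {
        this.resource = resource;
    }

    @Override
    public void close() throws IOException {
        resource.close();
    }

    // Same cleanup as close(), but a failure surfaces as the library's exception type.
    void dispose() throws SketchTikaException {
        try {
            close();
        } catch (IOException e) {
            throw new SketchTikaException("Failed to close temporary resources", e);
        }
    }

    // Returns the message of the wrapped cause, as seen by a caller
    // that only handles SketchTikaException.
    static String demo() {
        DisposeSketch failing = new DisposeSketch(() -> {
            throw new IOException("disk error");
        });
        try {
            failing.dispose();
            return "no failure";
        } catch (SketchTikaException e) {
            return e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The original `IOException` stays reachable through `getCause()`, so a caller that handles a single exception type does not lose the underlying I/O error.
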
| Modifier and Type | Method and Description |
|---|---|
| `static LanguageProfilerBuilder` | `LanguageProfilerBuilder.create(String name, InputStream is, String encoding)`<br>Deprecated. Creates a new language profile from a text file (preferably a large one, 5-10k lines). |
| `float` | `LanguageProfilerBuilder.getSimilarity(LanguageProfilerBuilder another)`<br>Deprecated. Calculates a score for how well the NGramProfiles match each other. |

| Modifier and Type | Method and Description |
|---|---|
| `String` | `Translator.translate(String text, String targetLanguage)`<br>Translate text to the given language. This method attempts to auto-detect the source language of the text. |
| `String` | `DefaultTranslator.translate(String text, String targetLanguage)`<br>Translate, using the first available service-loaded translator. |
| `String` | `Translator.translate(String text, String sourceLanguage, String targetLanguage)`<br>Translate text between given languages. |
| `String` | `DefaultTranslator.translate(String text, String sourceLanguage, String targetLanguage)`<br>Translate, using the first available service-loaded translator. |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `CompositeMetadataFilter.filter(Metadata metadata)` |
| `void` | `NoOpFilter.filter(Metadata metadata)` |
| `void` | `ExcludeFieldMetadataFilter.filter(Metadata metadata)` |
| `void` | `ClearByMimeMetadataFilter.filter(Metadata metadata)` |
| `void` | `MetadataFilter.filter(Metadata metadata)` |
| `void` | `IncludeFieldMetadataFilter.filter(Metadata metadata)` |

| Modifier and Type | Class and Description |
|---|---|
| `class` | `MimeTypeException`<br>A class to encapsulate MimeType related exceptions. |

| Modifier and Type | Method and Description |
|---|---|
| `static void` | `MimeTypesReader.setPoolSize(int poolSize)`<br>Set the pool size for cached XML parsers. |

| Modifier and Type | Method and Description |
|---|---|
| `abstract Parser` | `ParserFactory.build()` |
| `Parser` | `AutoDetectParserFactory.build()` |
| `DocumentBuilder` | `ParseContext.getDocumentBuilder()`<br>Returns the DOM builder specified in this parsing context. |
| `SAXParser` | `ParseContext.getSAXParser()`<br>Returns the SAX parser specified in this parsing context. |
| `Transformer` | `ParseContext.getTransformer()`<br>Returns the transformer specified in this parsing context. |
| `XMLReader` | `ParseContext.getXMLReader()`<br>Returns the XMLReader specified in this parsing context. |
| `void` | `AutoDetectParser.parse(InputStream stream, ContentHandler handler, Metadata metadata)` |
| `void` | `AbstractParser.parse(InputStream stream, ContentHandler handler, Metadata metadata)`<br>Deprecated. Use the `Parser.parse(InputStream, ContentHandler, Metadata, ParseContext)` method instead. |
| `void` | `CryptoParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)` |
| `void` | `ParserPostProcessor.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)`<br>Forwards the call to the delegated parser and post-processes the results. |
| `void` | `CompositeParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)`<br>Delegates the call to the matching component parser. |
| `void` | `DelegatingParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)`<br>Looks up the delegate parser from the parsing context and delegates the parse operation to it. |
| `void` | `DigestingParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)` |
| `void` | `AutoDetectParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)` |
| `void` | `NetworkParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)` |
| `void` | `ErrorParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)` |
| `void` | `Parser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)`<br>Parses a document stream into a sequence of XHTML SAX events. |
| `void` | `RecursiveParserWrapper.parse(InputStream stream, ContentHandler recursiveParserWrapperHandler, Metadata metadata, ParseContext context)`<br>Acts like a regular parser except it ignores the ContentHandler and it automatically sets/overwrites the embedded Parser in the ParseContext object. |
| `void` | `ParserDecorator.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)`<br>Delegates the method call to the decorated parser. |

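
Several rows above describe delegation: `CompositeParser` hands the call to the matching component parser, and `ParserDecorator` forwards it to the decorated parser. The sketch below illustrates both ideas with plain JDK types only. `SketchParser`, `composite` and `trimming` are hypothetical names, and the string-in, string-out contract is a deliberate simplification of the real `parse(InputStream, ContentHandler, Metadata, ParseContext)` method, not Tika's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DelegationSketch {

    // Hypothetical, simplified parser contract: media type and raw text in, extracted text out.
    interface SketchParser {
        String parse(String mediaType, String input);
    }

    // Composite: picks the component registered for the media type, else the fallback.
    static SketchParser composite(Map<String, SketchParser> components, SketchParser fallback) {
        return (mediaType, input) ->
                components.getOrDefault(mediaType, fallback).parse(mediaType, input);
    }

    // Decorator: forwards the call unchanged, then post-processes the result.
    static SketchParser trimming(SketchParser delegate) {
        return (mediaType, input) -> delegate.parse(mediaType, input).trim();
    }

    static String demo(String mediaType, String input) {
        Map<String, SketchParser> components = new LinkedHashMap<>();
        components.put("text/plain", (type, text) -> text);
        components.put("text/html", (type, text) -> text.replaceAll("<[^>]*>", ""));
        SketchParser fallback = (type, text) -> ""; // unknown types yield no text
        return trimming(composite(components, fallback)).parse(mediaType, input);
    }

    public static void main(String[] args) {
        System.out.println(demo("text/html", "<p> hello </p>"));       // prints "hello"
        System.out.println(demo("application/pdf", "%PDF").isEmpty()); // prints "true"
    }
}
```
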
| Modifier and Type | Method and Description |
|---|---|
| `static void` | `ExternalParsersFactory.attachExternalParsers(TikaConfig config)` |
| `static List<ExternalParser>` | `ExternalParsersFactory.create()` |
| `static List<ExternalParser>` | `ExternalParsersFactory.create(ServiceLoader loader)` |
| `static List<ExternalParser>` | `ExternalParsersFactory.create(String filename, ServiceLoader loader)` |
| `static List<ExternalParser>` | `ExternalParsersFactory.create(URL... urls)` |
| `void` | `ExternalParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)`<br>Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. |
| `static List<ExternalParser>` | `ExternalParsersConfigReader.read(Document document)` |
| `static List<ExternalParser>` | `ExternalParsersConfigReader.read(Element element)` |
| `static List<ExternalParser>` | `ExternalParsersConfigReader.read(InputStream stream)` |

| Constructor and Description |
|---|
| `CompositeExternalParser()` |
| `CompositeExternalParser(MediaTypeRegistry registry)` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `SecureContentHandler.throwIfCauseOf(SAXException e)`<br>Converts the given `SAXException` to a corresponding `TikaException` if it's caused by this instance detecting a zip bomb. |

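
One way to read the `throwIfCauseOf` description is as a walk over the exception's cause chain, looking for a marker exception raised by this particular handler instance. The sketch below shows that idea with JDK types only. `ZipBombSketch`, `LimitException`, `limitReached` and `SketchTikaException` are hypothetical names; this is an assumption about the pattern, not a copy of how `SecureContentHandler` is implemented.

```java
import org.xml.sax.SAXException;

final class ZipBombSketch {

    // Hypothetical stand-in for org.apache.tika.exception.TikaException.
    static class SketchTikaException extends Exception {
        SketchTikaException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Marker exception that remembers which handler instance raised it.
    private static final class LimitException extends SAXException {
        final transient ZipBombSketch source;

        LimitException(ZipBombSketch source) {
            super("Suspected zip bomb");
            this.source = source;
        }
    }

    // What the handler would throw from its SAX callbacks once a limit is exceeded.
    SAXException limitReached() {
        return new LimitException(this);
    }

    // Converts e only if a LimitException raised by this instance is in its cause chain.
    void throwIfCauseOf(SAXException e) throws SketchTikaException {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof LimitException && ((LimitException) t).source == this) {
                throw new SketchTikaException("Zip bomb detected", e);
            }
        }
    }

    static boolean converts(ZipBombSketch handler, SAXException e) {
        try {
            handler.throwIfCauseOf(e);
            return false;
        } catch (SketchTikaException converted) {
            return true;
        }
    }

    static String demo() {
        ZipBombSketch handler = new ZipBombSketch();
        ZipBombSketch other = new ZipBombSketch();
        SAXException wrapped = new SAXException("parse failed", handler.limitReached());
        return converts(handler, wrapped) + ","
                + converts(other, wrapped) + ","
                + converts(handler, new SAXException("malformed input"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true,false,false"
    }
}
```

Because the check compares the handler instance, an exception raised by a different handler, or an unrelated `SAXException`, passes through untouched and can be handled as an ordinary parse failure.
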
| Modifier and Type | Method and Description |
|---|---|
| `static Document` | `XMLReaderUtils.buildDOM(InputStream is)`<br>Builds a Document with a DocumentBuilder from the pool. |
| `static Document` | `XMLReaderUtils.buildDOM(InputStream is, ParseContext context)`<br>This checks context for a user-specified `DocumentBuilder`. |
| `static Document` | `XMLReaderUtils.buildDOM(Path path)`<br>Builds a Document with a DocumentBuilder from the pool. |
| `static Document` | `XMLReaderUtils.buildDOM(String uriString)`<br>Builds a Document with a DocumentBuilder from the pool. |
| `static DocumentBuilder` | `XMLReaderUtils.getDocumentBuilder()`<br>Returns the DOM builder specified in this parsing context. |
| `static SAXParser` | `XMLReaderUtils.getSAXParser()`<br>Returns the SAX parser specified in this parsing context. |
| `static Transformer` | `XMLReaderUtils.getTransformer()`<br>Returns a new transformer. |
| `static XMLReader` | `XMLReaderUtils.getXMLReader()`<br>Returns the XMLReader specified in this parsing context. |
| `static void` | `XMLReaderUtils.parseSAX(InputStream is, DefaultHandler contentHandler, ParseContext context)`<br>This checks context for a user-specified `SAXParser`. |
| `static void` | `XMLReaderUtils.setPoolSize(int poolSize)`<br>Set the pool size for cached XML parsers. |

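
The rows above mention building a `Document` "with a DocumentBuilder from the pool" and a configurable pool size. As a rough illustration of what such pooling can look like, the sketch below keeps a fixed number of `DocumentBuilder` instances in a blocking queue, borrows one per parse and always returns it. `PooledDomSketch` and `rootName` are hypothetical names, and the queue-based design is an assumption; the sketch uses only JDK classes and is not Tika's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class PooledDomSketch {

    // Replaced wholesale when the pool size changes.
    private static volatile BlockingQueue<DocumentBuilder> pool = newPool(2);

    private static BlockingQueue<DocumentBuilder> newPool(int size) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            BlockingQueue<DocumentBuilder> queue = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                queue.add(factory.newDocumentBuilder());
            }
            return queue;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Cannot create XML parsers", e);
        }
    }

    static void setPoolSize(int poolSize) {
        pool = newPool(poolSize);
    }

    // Borrows a builder, parses, and always hands the builder back.
    static Document buildDOM(InputStream is)
            throws IOException, SAXException, InterruptedException {
        BlockingQueue<DocumentBuilder> current = pool; // snapshot: return to the same queue
        DocumentBuilder builder = current.take();      // blocks while all builders are in use
        try {
            return builder.parse(is);
        } finally {
            current.add(builder);
        }
    }

    // Convenience wrapper for the demo: name of the root element of an XML string.
    static String rootName(String xml) {
        try (InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            return buildDOM(is).getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<doc><p>hi</p></doc>")); // prints "doc"
    }
}
```

Returning the builder in a `finally` block matters here: if a failed parse leaked its builder, the pool would shrink until every later call blocked in `take()`.
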
Copyright © 2007–1969 The Apache Software Foundation. All rights reserved.