| Package | Description |
|---|---|
| org.apache.tika | Apache Tika. |
| org.apache.tika.detect | Media type detection. |
| org.apache.tika.embedder | |
| org.apache.tika.extractor | Extraction of component documents. |
| org.apache.tika.fork | Forked parser. |
| org.apache.tika.io | IO utilities. |
| org.apache.tika.metadata | Multi-valued metadata container, and set of constant metadata fields. |
| org.apache.tika.metadata.filter | |
| org.apache.tika.mime | Media type information. |
| org.apache.tika.parser | Tika parsers. |
| org.apache.tika.parser.digest | |
| org.apache.tika.parser.external | External parser process. |
| org.apache.tika.sax | SAX utilities. |
| org.apache.tika.utils | Utilities. |
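
The phrase "multi-valued metadata container" in the package table can be illustrated with a small, JDK-only sketch. The class `MultiValuedMetadataSketch` and its methods are hypothetical stand-ins written for this illustration; they are not Tika's `Metadata` class and make no claim about how it is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a multi-valued metadata container (not Tika's Metadata). */
public class MultiValuedMetadataSketch {
    // Each field name maps to an ordered list of values.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Replaces any existing values of the field with a single value. */
    public void set(String name, String value) {
        List<String> list = new ArrayList<>();
        list.add(value);
        values.put(name, list);
    }

    /** Appends a value to the field, keeping the values added earlier. */
    public void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Returns the first value of the field, or null if the field is absent. */
    public String get(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    /** Returns a copy of all values of the field (empty if the field is absent). */
    public List<String> getValues(String name) {
        List<String> list = values.get(name);
        return list == null ? new ArrayList<>() : new ArrayList<>(list);
    }
}
```

In this sketch, `add` accumulates values under one name while `set` overwrites them, which is the distinction a multi-valued container has to make.
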

| Modifier and Type | Method and Description |
|---|---|
| String | Tika.detect(InputStream stream, Metadata metadata): Detects the media type of the given document. |
| Reader | Tika.parse(File file, Metadata metadata): Parses the given file and returns the extracted text content. |
| Reader | Tika.parse(InputStream stream, Metadata metadata): Parses the given document and returns the extracted text content. |
| Reader | Tika.parse(Path path, Metadata metadata): Parses the file at the given path and returns the extracted text content. |
| String | Tika.parseToString(InputStream stream, Metadata metadata): Parses the given document and returns the extracted text content. |
| String | Tika.parseToString(InputStream stream, Metadata metadata, int maxLength): Parses the given document and returns the extracted text content. |

| Modifier and Type | Method and Description |
|---|---|
| Charset | EncodingDetector.detect(InputStream input, Metadata metadata): Detects the character encoding of the given text document; returns null if the encoding of the document cannot be detected. |
| MediaType | EmptyDetector.detect(InputStream input, Metadata metadata) |
| MediaType | FileCommandDetector.detect(InputStream input, Metadata metadata) |
| MediaType | MagicDetector.detect(InputStream input, Metadata metadata) |
| MediaType | TrainedModelDetector.detect(InputStream input, Metadata metadata) |
| Charset | NonDetectingEncodingDetector.detect(InputStream input, Metadata metadata) |
| MediaType | TypeDetector.detect(InputStream input, Metadata metadata): Detects the content type of an input document based on a type hint given in the input metadata. |
| MediaType | NameDetector.detect(InputStream input, Metadata metadata): Detects the content type of an input document based on the document name given in the input metadata. |
| MediaType | Detector.detect(InputStream input, Metadata metadata): Detects the content type of the given input document. |
| MediaType | ZeroSizeFileDetector.detect(InputStream stream, Metadata metadata) |
| MediaType | TextDetector.detect(InputStream input, Metadata metadata): Looks at the beginning of the document input stream to determine whether the document is text or not. |
| MediaType | CompositeDetector.detect(InputStream input, Metadata metadata) |
| MediaType | OverrideDetector.detect(InputStream input, Metadata metadata) |
| Charset | CompositeEncodingDetector.detect(InputStream input, Metadata metadata) |
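
The `TextDetector` entry above describes looking at the beginning of the input stream to decide whether a document is text. A minimal JDK-only sketch of that general idea follows. The class `TextPrefixSketch`, the 512-byte prefix length, and the rule "no control bytes other than tab, newline, carriage return, or form feed" are all assumptions made for illustration; Tika's actual heuristic may well differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical prefix-based text check; not Tika's TextDetector. */
public class TextPrefixSketch {
    // Assumed prefix length for this sketch.
    private static final int PREFIX_LENGTH = 512;

    /** Reads up to max bytes and then resets the stream to where it started. */
    private static byte[] readPrefix(InputStream input, int max) {
        if (!input.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            input.mark(max);
            byte[] buffer = new byte[max];
            int total = 0;
            int n;
            while (total < max && (n = input.read(buffer, total, max - total)) != -1) {
                total += n;
            }
            input.reset();
            return Arrays.copyOf(buffer, total);
        } catch (IOException e) {
            // Wrapped to keep the sketch's signatures short.
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the prefix is non-empty and holds no disallowed control bytes. */
    public static boolean looksLikeText(InputStream input) {
        byte[] prefix = readPrefix(input, PREFIX_LENGTH);
        for (byte raw : prefix) {
            int b = raw & 0xFF;
            boolean allowedControl = b == '\t' || b == '\n' || b == '\r' || b == '\f';
            if (b < 0x20 && !allowedControl) {
                return false;
            }
        }
        // An empty stream is treated as "not text" in this sketch.
        return prefix.length > 0;
    }
}
```

Because the stream is marked before reading and reset afterwards, a caller can still parse the document from its first byte after the check.
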

| Constructor and Description |
|---|
| AutoDetectReader(InputStream stream, Metadata metadata) |
| AutoDetectReader(InputStream stream, Metadata metadata, EncodingDetector encodingDetector) |
| AutoDetectReader(InputStream stream, Metadata metadata, ServiceLoader loader) |

| Modifier and Type | Method and Description |
|---|---|
| void | ExternalEmbedder.embed(Metadata metadata, InputStream inputStream, OutputStream outputStream, ParseContext context): Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. |
| void | Embedder.embed(Metadata metadata, InputStream originalStream, OutputStream outputStream, ParseContext context): Embeds related document metadata from the given metadata object into the given output stream. |
| `protected List<String>` | ExternalEmbedder.getCommandMetadataSegments(Metadata metadata): Constructs a collection of command line arguments responsible for setting individual metadata fields based on the given metadata. |

| Modifier and Type | Method and Description |
|---|---|
| String | EmbeddedDocumentUtil.getExtension(TikaInputStream is, Metadata metadata) |
| void | EmbeddedDocumentUtil.parseEmbedded(InputStream inputStream, ContentHandler handler, Metadata metadata, boolean outputHtml) |
| void | EmbeddedDocumentExtractor.parseEmbedded(InputStream stream, ContentHandler handler, Metadata metadata, boolean outputHtml): Processes the supplied embedded resource, calling the delegating parser with the appropriate details. |
| void | ParsingEmbeddedDocumentExtractor.parseEmbedded(InputStream stream, ContentHandler handler, Metadata metadata, boolean outputHtml) |
| static void | EmbeddedDocumentUtil.recordEmbeddedStreamException(Throwable t, Metadata m) |
| static void | EmbeddedDocumentUtil.recordException(Throwable t, Metadata m) |
| boolean | DocumentSelector.select(Metadata metadata): Checks if a document with the given metadata matches the specified selection criteria. |
| boolean | EmbeddedDocumentUtil.shouldParseEmbedded(Metadata m) |
| boolean | EmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
| boolean | ParsingEmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |

| Modifier and Type | Method and Description |
|---|---|
| void | ForkParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): This sends the objects to the server for parsing, and the server via the proxies acts on the handler as if it were updating it directly. |

| Modifier and Type | Method and Description |
|---|---|
| static TikaInputStream | TikaInputStream.get(Blob blob, Metadata metadata): Creates a TikaInputStream from the given database BLOB. |
| static TikaInputStream | TikaInputStream.get(byte[] data, Metadata metadata): Creates a TikaInputStream from the given array of bytes. |
| static TikaInputStream | TikaInputStream.get(File file, Metadata metadata): Deprecated. Use TikaInputStream.get(Path, Metadata). In Tika 2.0, this will be removed or modified to throw an IOException. |
| static TikaInputStream | TikaInputStream.get(Path path, Metadata metadata): Creates a TikaInputStream from the file at the given path. |
| static TikaInputStream | TikaInputStream.get(URI uri, Metadata metadata): Creates a TikaInputStream from the resource at the given URI. |
| static TikaInputStream | TikaInputStream.get(URL url, Metadata metadata): Creates a TikaInputStream from the resource at the given URL. |

| Modifier and Type | Method and Description |
|---|---|
| static void | XMPDM.ChannelTypePropertyConverter.convertAndSet(Metadata metadata, Object value): Deprecated. How convert+set might work |

| Modifier and Type | Method and Description |
|---|---|
| void | CompositeMetadataFilter.filter(Metadata metadata) |
| void | NoOpFilter.filter(Metadata metadata) |
| void | ExcludeFieldMetadataFilter.filter(Metadata metadata) |
| void | ClearByMimeMetadataFilter.filter(Metadata metadata) |
| void | MetadataFilter.filter(Metadata metadata) |
| void | IncludeFieldMetadataFilter.filter(Metadata metadata) |

| Modifier and Type | Method and Description |
|---|---|
| MediaType | ProbabilisticMimeDetectionSelector.detect(InputStream input, Metadata metadata) |
| MediaType | MimeTypes.detect(InputStream input, Metadata metadata): Automatically detects the MIME type of a document based on magic markers in the stream prefix and any given metadata hints. |
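
The `MimeTypes.detect` entry mentions two inputs: magic markers in the stream prefix and metadata hints. The following JDK-only sketch shows one way such a combination could work, under stated assumptions. The class `MagicPrefixSketch`, its three-entry magic table, the file-name hint parameter, and the "magic first, then name, then `application/octet-stream`" order are all hypothetical choices for this illustration and do not describe Tika's detection logic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical magic-marker detection with a file-name fallback; not Tika's MimeTypes. */
public class MagicPrefixSketch {
    // Well-known file signatures used as the sketch's tiny magic table.
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G'};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};
    private static final int PREFIX_LENGTH = 8;

    private static boolean startsWith(byte[] data, int length, byte[] magic) {
        if (length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /** Checks the stream prefix first, then falls back to the (possibly null) name hint. */
    public static String detect(InputStream input, String nameHint) {
        if (!input.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] prefix = new byte[PREFIX_LENGTH];
        int total = 0;
        try {
            input.mark(PREFIX_LENGTH);
            int n;
            while (total < PREFIX_LENGTH
                    && (n = input.read(prefix, total, PREFIX_LENGTH - total)) != -1) {
                total += n;
            }
            input.reset(); // leave the stream untouched for the caller
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (startsWith(prefix, total, PDF)) return "application/pdf";
        if (startsWith(prefix, total, PNG)) return "image/png";
        if (startsWith(prefix, total, ZIP)) return "application/zip";
        // No magic match: use the name hint if it is recognizable.
        if (nameHint != null && nameHint.toLowerCase().endsWith(".txt")) return "text/plain";
        return "application/octet-stream";
    }
}
```

A real detector would load its signatures from a registry rather than hard-coding them; the fixed table here only keeps the example short.
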

| Modifier and Type | Method and Description |
|---|---|
| `List<Metadata>` | RecursiveParserWrapper.getMetadata(): Deprecated. Use a RecursiveParserWrapperHandler instead. |

| Modifier and Type | Method and Description |
|---|---|
| void | DigestingParser.Digester.digest(InputStream is, Metadata m, ParseContext parseContext): Digests an InputStream and sets the appropriate value(s) in the metadata. |
| protected Parser | CompositeParser.getParser(Metadata metadata): Returns the parser that best matches the given metadata. |
| protected Parser | CompositeParser.getParser(Metadata metadata, ParseContext context) |
| String | PasswordProvider.getPassword(Metadata metadata): Looks up the password for a document with the given metadata, and returns it for the Parser. |
| void | AutoDetectParser.parse(InputStream stream, ContentHandler handler, Metadata metadata) |
| void | AbstractParser.parse(InputStream stream, ContentHandler handler, Metadata metadata): Deprecated. Use the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method instead. |
| void | CryptoParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | ParserPostProcessor.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Forwards the call to the delegated parser and post-processes the results as described above. |
| void | CompositeParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Delegates the call to the matching component parser. |
| void | DelegatingParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Looks up the delegate parser from the parsing context and delegates the parse operation to it. |
| void | DigestingParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | AutoDetectParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | NetworkParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | ErrorParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | Parser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Parses a document stream into a sequence of XHTML SAX events. |
| void | EmptyParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | RecursiveParserWrapper.parse(InputStream stream, ContentHandler recursiveParserWrapperHandler, Metadata metadata, ParseContext context): Acts like a regular parser except it ignores the ContentHandler and it automatically sets/overwrites the embedded Parser in the ParseContext object. |
| void | ParserDecorator.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Delegates the method call to the decorated parser. |

| Constructor and Description |
|---|
| ParsingReader(Parser parser, InputStream stream, Metadata metadata, ParseContext context): Creates a reader for the text content of the given binary stream with the given document metadata. |
| ParsingReader(Parser parser, InputStream stream, Metadata metadata, ParseContext context, Executor executor): Creates a reader for the text content of the given binary stream with the given document metadata. |

| Modifier and Type | Method and Description |
|---|---|
| void | InputStreamDigester.digest(InputStream is, Metadata metadata, ParseContext parseContext) |
| void | CompositeDigester.digest(InputStream is, Metadata m, ParseContext parseContext) |
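
The digester methods listed above take an input stream and a metadata object; elsewhere on this page `DigestingParser.Digester.digest` is described as digesting an InputStream and setting the resulting value(s) in the metadata. A rough JDK-only sketch of that pattern is shown below. The class `DigestSketch`, the use of a plain `Map` in place of `Metadata`, the choice of SHA-256, and the key `X-Sketch-Digest:SHA256` are assumptions for illustration only and are not taken from Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

/** Hypothetical digester that records a hex SHA-256 digest; not Tika's digester classes. */
public class DigestSketch {
    // Hypothetical metadata key chosen for this sketch.
    public static final String DIGEST_KEY = "X-Sketch-Digest:SHA256";

    /** Reads the stream to its end and stores the hex digest under DIGEST_KEY. */
    public static void digest(InputStream is, Map<String, String> metadata) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        try {
            int n;
            while ((n = is.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        metadata.put(DIGEST_KEY, hex.toString());
    }
}
```

Note that this sketch consumes the stream. A digester that runs before a parser would additionally need to buffer or reset the input so the parser can still read it.
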

| Modifier and Type | Method and Description |
|---|---|
| void | ExternalParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context): Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. |

| Modifier and Type | Field and Description |
|---|---|
| `protected List<Metadata>` | RecursiveParserWrapperHandler.metadataList |

| Modifier and Type | Method and Description |
|---|---|
| `List<Metadata>` | RecursiveParserWrapperHandler.getMetadataList() |

| Modifier and Type | Method and Description |
|---|---|
| void | AbstractRecursiveParserWrapperHandler.endDocument(ContentHandler contentHandler, Metadata metadata): This is called after the full parse has completed. |
| void | RecursiveParserWrapperHandler.endDocument(ContentHandler contentHandler, Metadata metadata) |
| void | AbstractRecursiveParserWrapperHandler.endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called after parsing each embedded document. |
| void | RecursiveParserWrapperHandler.endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called after parsing an embedded document. |
| void | XMPContentHandler.metadata(Metadata metadata) |
| void | AbstractRecursiveParserWrapperHandler.startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called before parsing each embedded document. |
| void | RecursiveParserWrapperHandler.startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata): This is called before parsing an embedded document. |

| Constructor and Description |
|---|
| DIFContentHandler(ContentHandler delegate, Metadata metadata) |
| PhoneExtractingContentHandler(ContentHandler handler, Metadata metadata): Creates a decorator for the given SAX event handler and Metadata object. |
| StandardsExtractingContentHandler(ContentHandler handler, Metadata metadata): Creates a decorator for the given SAX event handler and Metadata object. |
| XHTMLContentHandler(ContentHandler handler, Metadata metadata) |

| Modifier and Type | Method and Description |
|---|---|
| static Metadata | ParserUtils.cloneMetadata(Metadata m): Does a deep clone of a Metadata object. |

| Modifier and Type | Method and Description |
|---|---|
| static Metadata | ParserUtils.cloneMetadata(Metadata m): Does a deep clone of a Metadata object. |
| static void | ParserUtils.recordParserDetails(Parser parser, Metadata metadata) |
| static void | ParserUtils.recordParserFailure(Parser parser, Throwable failure, Metadata metadata) |

Copyright © 2007 The Apache Software Foundation. All rights reserved.