public class OfficeParserConfig extends Object implements Serializable
| Constructor and Description |
|---|
| OfficeParserConfig() |
| Modifier and Type | Method and Description |
|---|---|
| boolean | getConcatenatePhoneticRuns() |
| boolean | getExtractAllAlternativesFromMSG() |
| boolean | getExtractMacros() |
| boolean | getIncludeDeletedContent() |
| boolean | getIncludeHeadersAndFooters() |
| boolean | getIncludeMoveFromContent() |
| boolean | getIncludeShapeBasedContent() |
| boolean | getUseSAXDocxExtractor() |
| boolean | getUseSAXPptxExtractor() |
| void | setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns) Microsoft Excel files can sometimes contain phonetic (furigana) strings. |
| void | setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG) Some .msg files can contain body content in html, rtf and/or text. |
| void | setExtractMacros(boolean extractMacros) Sets whether or not MSOffice parsers should extract macros. |
| void | setIncludeDeletedContent(boolean includeDeletedContent) Sets whether or not the parser should include deleted content. |
| void | setIncludeHeadersAndFooters(boolean includeHeadersAndFooters) Whether or not to include headers and footers. |
| void | setIncludeMoveFromContent(boolean includeMoveFromContent) With track changes on, when a section is moved, the content is stored in both the "moveFrom" section and in the "moveTo" section. |
| void | setIncludeShapeBasedContent(boolean includeShapeBasedContent) In Excel and Word, there can be text stored within drawing shapes. |
| void | setUseSAXDocxExtractor(boolean useSAXDocxExtractor) Use the experimental SAX-based streaming DOCX parser? If set to false, the classic parser will be used; if true, the new experimental parser will be used. |
| void | setUseSAXPptxExtractor(boolean useSAXPptxExtractor) Use the experimental SAX-based streaming PPTX parser? If set to false, the classic parser will be used; if true, the new experimental parser will be used. |
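The setters above take effect only once the config object reaches a parser. A minimal sketch of one way to do that, assuming the Tika parsers module is on the classpath and that this class lives in the `org.apache.tika.parser.microsoft` package; the helper class `OfficeConfigContexts` and its method name are hypothetical and exist only for illustration:

```java
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.OfficeParserConfig;

// Hypothetical helper (not part of Tika): builds a ParseContext that carries an OfficeParserConfig.
final class OfficeConfigContexts {

    // Returns a context in which macro extraction has been switched on.
    static ParseContext withMacros() {
        OfficeParserConfig config = new OfficeParserConfig();
        config.setExtractMacros(true);

        ParseContext context = new ParseContext();
        // The config is registered under its own class so a parser can look it up by type.
        context.set(OfficeParserConfig.class, config);
        return context;
    }
}
```

The returned context can then be passed as the last argument of `Parser.parse(InputStream, ContentHandler, Metadata, ParseContext)`, for example on an `AutoDetectParser`.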
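public void setExtractMacros(boolean extractMacros)
Sets whether or not MSOffice parsers should extract macros. Default: false.
Parameters: extractMacros - whether or not to extract macros
public boolean getExtractMacros()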
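public void setIncludeDeletedContent(boolean includeDeletedContent)
Sets whether or not the parser should include deleted content.
This has only been implemented in the streaming docx parser (SXWPFWordExtractorDecorator) so far.
Parameters: includeDeletedContent - whether or not to include deleted content
public boolean getIncludeDeletedContent()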
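public void setIncludeMoveFromContent(boolean includeMoveFromContent)
With track changes on, when a section is moved, the content is stored in both the "moveFrom" section and in the "moveTo" section. To include the content at its original location (moveFrom) as well as at its new location (moveTo), set this to true.
Default: false
This has only been implemented in the streaming docx parser (SXWPFWordExtractorDecorator) so far.
Parameters: includeMoveFromContent - whether or not to include moveFrom content
public boolean getIncludeMoveFromContent()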
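Deleted content and moveFrom content are both described as implemented only in the streaming docx parser, so a config that asks for them would presumably also need to opt in to that parser. A hedged sketch of such a combination, assuming Tika is on the classpath; the class name `TrackedChangesConfig` is hypothetical, and how the classic parser treats these flags is not stated here:

```java
import org.apache.tika.parser.microsoft.OfficeParserConfig;

// Hypothetical helper (not part of Tika): a config aimed at DOCX files with tracked changes.
final class TrackedChangesConfig {

    static OfficeParserConfig create() {
        OfficeParserConfig config = new OfficeParserConfig();
        // Opt in to the experimental SAX-based DOCX parser, the only one
        // documented as honoring the two flags below.
        config.setUseSAXDocxExtractor(true);
        // Keep text that track changes marks as deleted.
        config.setIncludeDeletedContent(true);
        // Keep moved text at its original (moveFrom) position as well.
        config.setIncludeMoveFromContent(true);
        return config;
    }
}
```

With both flags on, moved text can be expected to appear twice in the output, once per location.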
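public void setIncludeShapeBasedContent(boolean includeShapeBasedContent)
In Excel and Word, there can be text stored within drawing shapes. To skip that text, set this to false.
Default: true
Parameters: includeShapeBasedContent - whether or not to include text stored in drawing shapes
public boolean getIncludeShapeBasedContent()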
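public void setIncludeHeadersAndFooters(boolean includeHeadersAndFooters)
Whether or not to include headers and footers.
Default: true
Parameters: includeHeadersAndFooters - whether or not to include headers and footers
public boolean getIncludeHeadersAndFooters()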
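Shape-based content and headers and footers are both included by default, so leaving them out takes an explicit call. A small sketch of a config that turns both off, assuming Tika is on the classpath; the class name `BodyOnlyConfig` is hypothetical:

```java
import org.apache.tika.parser.microsoft.OfficeParserConfig;

// Hypothetical helper (not part of Tika): a config that drops peripheral text.
final class BodyOnlyConfig {

    static OfficeParserConfig create() {
        OfficeParserConfig config = new OfficeParserConfig();
        // Both options default to true, so each is switched off explicitly.
        config.setIncludeHeadersAndFooters(false);
        config.setIncludeShapeBasedContent(false);
        return config;
    }
}
```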
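public boolean getUseSAXDocxExtractor()
public void setUseSAXDocxExtractor(boolean useSAXDocxExtractor)
Use the experimental SAX-based streaming DOCX parser? If set to false, the classic parser will be used; if true, the new experimental parser will be used.
Default: false (classic DOM parser)
Parameters: useSAXDocxExtractor - whether or not to use the experimental SAX-based DOCX parser
public void setUseSAXPptxExtractor(boolean useSAXPptxExtractor)
Use the experimental SAX-based streaming PPTX parser? If set to false, the classic parser will be used; if true, the new experimental parser will be used.
Default: false (classic DOM parser)
Parameters: useSAXPptxExtractor - whether or not to use the experimental SAX-based PPTX parser
public boolean getUseSAXPptxExtractor()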
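public boolean getConcatenatePhoneticRuns()
public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
Microsoft Excel files can sometimes contain phonetic (furigana) strings. This sets whether or not the parser concatenates the phonetic runs to the original text.
This is currently only supported by the xls and xlsx parsers (not the xlsb parser), and the default is true.
Parameters: concatenatePhoneticRuns - whether or not to concatenate phonetic runs
public void setExtractAllAlternativesFromMSG(boolean extractAllAlternativesFromMSG)
Some .msg files can contain body content in html, rtf and/or text.
Parameters: extractAllAlternativesFromMSG - whether or not to extract all alternative parts
public boolean getExtractAllAlternativesFromMSG()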
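The phonetic-run and .msg settings can be combined in one config when a pipeline handles both spreadsheets and mail. A sketch under the assumption that Tika is on the classpath; the class name `MailAndSpreadsheetConfig` is hypothetical, and whether the extra .msg bodies duplicate one another depends on the message:

```java
import org.apache.tika.parser.microsoft.OfficeParserConfig;

// Hypothetical helper (not part of Tika): settings for .msg and Excel input.
final class MailAndSpreadsheetConfig {

    static OfficeParserConfig create() {
        OfficeParserConfig config = new OfficeParserConfig();
        // Ask for every alternative body part (html, rtf and/or text) of a .msg file.
        config.setExtractAllAlternativesFromMSG(true);
        // Phonetic (furigana) runs are concatenated by default; switch that off.
        config.setConcatenatePhoneticRuns(false);
        return config;
    }
}
```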
Copyright © 2007–2018 The Apache Software Foundation. All rights reserved.