Package org.apache.poi.hwpf.converter
Class WordToTextConverter
java.lang.Object
org.apache.poi.hwpf.converter.AbstractWordConverter
org.apache.poi.hwpf.converter.WordToTextConverter
-
Field Summary
Fields inherited from class org.apache.poi.hwpf.converter.AbstractWordConverter
UNICODECHAR_NO_BREAK_SPACE, UNICODECHAR_NONBREAKING_HYPHEN, UNICODECHAR_ZERO_WIDTH_SPACE
-
Constructor Summary
Constructors
WordToTextConverter()
Creates a new instance of WordToTextConverter.
WordToTextConverter(TextDocumentFacade textDocumentFacade)
WordToTextConverter(Document document)
Creates a new instance of WordToTextConverter.
-
Method Summary
Modifier and Type, Method, Description
protected void afterProcess()
Special actions that need to be called after processing is complete, like updating stylesheets or building the document notes list.
Document getDocument()
String getText()
static String getText(File docFile)
static String getText(HWPFDocumentCore wordDocument)
static String getText(DirectoryNode root)
boolean isOutputSummaryInformation()
static void main(String[] args)
Java main() interface to interact with WordToTextConverter.
protected void outputCharacters(Element block, CharacterRun characterRun, String text)
protected void processBookmarks(HWPFDocumentCore wordDocument, Element currentBlock, Range range, int currentTableLevel, List<Bookmark> rangeBookmarks)
Wrap range into bookmark(s) and process it.
protected void processDocumentInformation(SummaryInformation summaryInformation)
void processDocumentPart(HWPFDocumentCore wordDocument, Range range)
protected void processDrawnObject(HWPFDocument doc, CharacterRun characterRun, OfficeDrawing officeDrawing, String path, Element block)
protected void processEndnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, Element block, Range endnoteTextRange)
protected void processFootnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, Element block, Range footnoteTextRange)
protected void processHyperlink(HWPFDocumentCore wordDocument, Element currentBlock, Range textRange, int currentTableLevel, String hyperlink)
protected void processImage(Element currentBlock, boolean inlined, Picture picture)
protected void processImage(Element currentBlock, boolean inlined, Picture picture, String url)
protected void processImageWithoutPicturesManager(Element currentBlock, boolean inlined, Picture picture)
protected void processLineBreak(Element block, CharacterRun characterRun)
protected boolean processOle2(HWPFDocument wordDocument, Element block, Entry entry)
protected void processPageBreak(HWPFDocumentCore wordDocument, Element flow)
protected void processPageref(HWPFDocumentCore wordDocument, Element currentBlock, Range textRange, int currentTableLevel, String pageref)
protected void processParagraph(HWPFDocumentCore wordDocument, Element parentElement, int currentTableLevel, Paragraph paragraph, String bulletText)
protected void processSection(HWPFDocumentCore wordDocument, Section section, int s)
protected void processTable(HWPFDocumentCore wordDocument, Element flow, Table table)
void setOutputSummaryInformation(boolean outputDocumentInformation)
Methods inherited from class org.apache.poi.hwpf.converter.AbstractWordConverter
getCharacterRunTriplet, getFontReplacer, getNumberColumnsSpanned, getNumberRowsSpanned, getPicturesManager, processCharacters, processDeadField, processDocument, processDrawnObject, processDropDownList, processField, processNoteAnchor, processParagraphes, processSingleSection, processSymbol, setFontReplacer, setPicturesManager, tryDeadField
-
Constructor Details
-
WordToTextConverter
Creates a new instance of WordToTextConverter. Can be used to output several HWPFDocuments into a single text document.
- Throws:
ParserConfigurationException - if an internal DocumentBuilder cannot be created
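The note about writing several HWPFDocuments into a single text document can be illustrated with a minimal sketch. It assumes poi-scratchpad is on the classpath and relies on processDocument (listed among the methods inherited from AbstractWordConverter) and the instance method getText(). The class name MergeDocsToText and the helper mergeToText are hypothetical names chosen for this illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToTextConverter;

// Hypothetical helper class, not part of Apache POI.
public class MergeDocsToText {

    // Feeds every .doc file into one converter and returns the combined text.
    static String mergeToText(List<File> docFiles) throws Exception {
        // The no-argument constructor may throw ParserConfigurationException.
        WordToTextConverter converter = new WordToTextConverter();
        for (File docFile : docFiles) {
            try (InputStream in = new FileInputStream(docFile);
                 HWPFDocument wordDocument = new HWPFDocument(in)) {
                // Each call adds the document's content to the same storage.
                converter.processDocument(wordDocument);
            }
        }
        return converter.getText();
    }

    public static void main(String[] args) throws Exception {
        List<File> docFiles = new ArrayList<>();
        for (String arg : args) {
            docFiles.add(new File(arg));
        }
        System.out.println(mergeToText(docFiles));
    }
}
```

Reusing one converter instance is what keeps the output in a single text document; creating a new converter per file would produce separate results instead.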
-
WordToTextConverter
Creates a new instance of WordToTextConverter. Can be used to output several HWPFDocuments into a single text document.
- Parameters:
document - XML DOM Document used as storage for text pieces
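For the document parameter, a caller needs an org.w3c.dom.Document. The sketch below shows one way such a document might be created with the standard JAXP API. The class name TextStorageDocument and the helper newEmptyDocument are hypothetical, and the converter call is left as a comment because it needs poi-scratchpad on the classpath. The same newDocumentBuilder() call is the usual source of a ParserConfigurationException.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;

// Hypothetical helper class, not part of Apache POI.
public class TextStorageDocument {

    // Builds an empty DOM Document; newDocumentBuilder() is the call
    // that may throw ParserConfigurationException.
    static Document newEmptyDocument() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().newDocument();
    }

    public static void main(String[] args) throws Exception {
        Document document = newEmptyDocument();
        // Assumption: with poi-scratchpad available, the document could be
        // handed to the converter as its storage for text pieces:
        // WordToTextConverter converter = new WordToTextConverter(document);
        System.out.println("root element present: "
                + (document.getDocumentElement() != null));
    }
}
```

Supplying the Document yourself lets the calling code keep a reference to the DOM that the converter fills, which getDocument() would otherwise be needed for.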
-
WordToTextConverter
-
-
Method Details
-
getText
- Throws:
Exception
-
getText
- Throws:
Exception
-
getText
- Throws:
Exception
-
main
Java main() interface to interact with WordToTextConverter.
Usage: WordToTextConverter infile outfile
Where infile is an input .doc file (Word 95-2007) which will be rendered as plain text into outfile.
- Throws:
Exception
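The same infile-to-outfile conversion might be done from application code through the static getText(HWPFDocumentCore wordDocument) method. This is a minimal sketch, not the implementation of main(): the class DocToText and its run method are hypothetical, the UTF-8 output encoding is an assumption, and the input is assumed to be a file that HWPFDocument can open.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToTextConverter;

// Hypothetical wrapper class, not part of Apache POI.
public class DocToText {

    // Returns false when the arguments do not match "infile outfile".
    static boolean run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: DocToText infile outfile");
            return false;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]));
             HWPFDocument wordDocument = new HWPFDocument(in)) {
            // Static convenience method listed in the Method Summary.
            String text = WordToTextConverter.getText(wordDocument);
            // Assumption: the output file is written as UTF-8.
            Files.write(Paths.get(args[1]), text.getBytes(StandardCharsets.UTF_8));
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        if (!run(args)) {
            System.exit(1);
        }
    }
}
```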
-
afterProcess
protected void afterProcess()
Description copied from class: AbstractWordConverter
Special actions that need to be called after processing is complete, like updating stylesheets or building the document notes list. Usually they are called once, but it's okay to call them several times.
- Overrides:
afterProcess in class AbstractWordConverter
-
getDocument
- Specified by:
getDocument in class AbstractWordConverter
-
getText
- Throws:
Exception
-
isOutputSummaryInformation
public boolean isOutputSummaryInformation()
-
outputCharacters
- Specified by:
outputCharacters in class AbstractWordConverter
-
processBookmarks
protected void processBookmarks(HWPFDocumentCore wordDocument, Element currentBlock, Range range, int currentTableLevel, List<Bookmark> rangeBookmarks)
Description copied from class: AbstractWordConverter
Wrap range into bookmark(s) and process it. All bookmarks have starts equal to range start and ends equal to range end. Usually it's only one bookmark.
- Specified by:
processBookmarks in class AbstractWordConverter
-
processDocumentInformation
- Specified by:
processDocumentInformation in class AbstractWordConverter
-
processDocumentPart
- Overrides:
processDocumentPart in class AbstractWordConverter
-
processDrawnObject
protected void processDrawnObject(HWPFDocument doc, CharacterRun characterRun, OfficeDrawing officeDrawing, String path, Element block)
- Specified by:
processDrawnObject in class AbstractWordConverter
-
processEndnoteAutonumbered
protected void processEndnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, Element block, Range endnoteTextRange)
- Specified by:
processEndnoteAutonumbered in class AbstractWordConverter
-
processFootnoteAutonumbered
protected void processFootnoteAutonumbered(HWPFDocument wordDocument, int noteIndex, Element block, Range footnoteTextRange)
- Specified by:
processFootnoteAutonumbered in class AbstractWordConverter
-
processHyperlink
protected void processHyperlink(HWPFDocumentCore wordDocument, Element currentBlock, Range textRange, int currentTableLevel, String hyperlink)
- Specified by:
processHyperlink in class AbstractWordConverter
-
processImage
- Overrides:
processImage in class AbstractWordConverter
-
processImage
- Specified by:
processImage in class AbstractWordConverter
-
processImageWithoutPicturesManager
protected void processImageWithoutPicturesManager(Element currentBlock, boolean inlined, Picture picture)
- Specified by:
processImageWithoutPicturesManager in class AbstractWordConverter
-
processLineBreak
- Specified by:
processLineBreak in class AbstractWordConverter
-
processOle2
protected boolean processOle2(HWPFDocument wordDocument, Element block, Entry entry) throws Exception
- Overrides:
processOle2 in class AbstractWordConverter
- Throws:
Exception
-
processPageBreak
- Specified by:
processPageBreak in class AbstractWordConverter
-
processPageref
protected void processPageref(HWPFDocumentCore wordDocument, Element currentBlock, Range textRange, int currentTableLevel, String pageref)
- Specified by:
processPageref in class AbstractWordConverter
-
processParagraph
protected void processParagraph(HWPFDocumentCore wordDocument, Element parentElement, int currentTableLevel, Paragraph paragraph, String bulletText)
- Specified by:
processParagraph in class AbstractWordConverter
-
processSection
- Specified by:
processSection in class AbstractWordConverter
-
processTable
- Specified by:
processTable in class AbstractWordConverter
-
setOutputSummaryInformation
public void setOutputSummaryInformation(boolean outputDocumentInformation)
-