org.opencrx.kernel.text
Class WordToText

java.lang.Object
  extended by org.opencrx.kernel.text.WordToText

public class WordToText
extends Object

Refactored from the Apache POI WordExtractor: http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/extractor/WordExtractor.java?view=log


Constructor Summary
WordToText()
           
 
Method Summary
 String[] getParagraphText(org.apache.poi.hwpf.HWPFDocument doc)
          Gets the text from the Word file as an array with one String per paragraph.
 String getTextFromPieces(org.apache.poi.hwpf.HWPFDocument doc)
          Grab the text out of the text pieces.
 Reader parse(InputStream in)
          Gets the text from a Word document.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordToText

public WordToText()

Method Detail

getParagraphText

public String[] getParagraphText(org.apache.poi.hwpf.HWPFDocument doc)
Gets the text from the Word file as an array with one String per paragraph.
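
As a minimal sketch of consuming a one-String-per-paragraph array, the snippet below trims trailing line terminators and joins the paragraphs into a single text. `ParagraphJoiner` and `join` are hypothetical helpers, not part of `WordToText`. The sample array is a stand-in for the result of `getParagraphText(doc)`, and the assumption that paragraphs may end in CR/LF characters is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of org.opencrx.kernel.text.
final class ParagraphJoiner {

    // Strips trailing CR/LF from each paragraph, drops null or empty
    // paragraphs, and joins the remainder with a single '\n'.
    static String join(String[] paragraphs) {
        List<String> cleaned = new ArrayList<>();
        for (String p : paragraphs) {
            if (p == null) {
                continue;
            }
            int end = p.length();
            while (end > 0 && (p.charAt(end - 1) == '\r' || p.charAt(end - 1) == '\n')) {
                end--;
            }
            if (end > 0) {
                cleaned.add(p.substring(0, end));
            }
        }
        return String.join("\n", cleaned);
    }

    public static void main(String[] args) {
        // Stand-in for: new WordToText().getParagraphText(doc)
        String[] paragraphs = { "Title\r\n", "\r\n", "First paragraph.\r\n" };
        System.out.println(join(paragraphs));
    }
}
```

Dropping empty paragraphs is a design choice of this sketch; keep them if blank lines carry meaning in the source document.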


getTextFromPieces

public String getTextFromPieces(org.apache.poi.hwpf.HWPFDocument doc)
Grab the text out of the text pieces. The result may include stray non-text content, but this method still works when the text piece -> paragraph mapping is broken. It is also fast.
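
Because the returned text may contain unwanted content, a caller might want to post-process it. The sketch below removes non-printing control characters while keeping tabs and line breaks. It assumes that the unwanted content consists of such control characters, which the documentation above does not state. `CrudFilter` and `stripControlChars` are hypothetical names, and the sample string is a stand-in for the result of `getTextFromPieces(doc)`.

```java
// Hypothetical helper for illustration; not part of org.opencrx.kernel.text.
final class CrudFilter {

    // Removes ISO control characters except tab, CR and LF.
    // Assumption: unwanted content shows up as control characters.
    static String stripControlChars(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            boolean keep = c == '\n' || c == '\r' || c == '\t'
                    || !Character.isISOControl(c);
            if (keep) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for: new WordToText().getTextFromPieces(doc)
        String raw = "Hello\u0013 FIELD \u0014value\u0015\r\n";
        System.out.print(stripControlChars(raw));
    }
}
```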


parse

public Reader parse(InputStream in)
             throws ServiceException,
                    IOException
Gets the text from a Word document.

Parameters:
in - The InputStream representing the Word file.
Throws:
ServiceException
IOException
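
Since `parse` returns a `Reader`, the caller has to drain it to obtain the text. The following is a minimal sketch of that step. `ReaderText` and `readAll` are hypothetical helpers, and a `StringReader` stands in for the `Reader` returned by `parse(in)` so that the example runs without openCRX or Apache POI on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper for illustration; not part of org.opencrx.kernel.text.
final class ReaderText {

    // Reads all characters from the given Reader into a String
    // and closes the Reader afterwards.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for: new WordToText().parse(inputStream)
        Reader text = new StringReader("Extracted text");
        System.out.println(readAll(text));
    }
}
```

With the real class, the call would presumably look like `readAll(new WordToText().parse(in))`, with `ServiceException` and `IOException` handled by the caller.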


This software is published under the BSD license. Copyright © 2003-2010, CRIXP AG, Switzerland, All rights reserved. Use is subject to license terms.