org.opencrx.application.uses.com.auxilii.msgparser
Class MsgParser

java.lang.Object
  extended by org.opencrx.application.uses.com.auxilii.msgparser.MsgParser

public class MsgParser
extends Object

Main parser class that does the actual parsing of the Outlook .msg file. It uses the POI library for parsing the .msg container file and is based on a description posted by Peter Fiskerstrand at fileformat.info.

It parses the .msg file and stores the information in a Message object. Each attachment is put into a FileAttachment object, so keep in mind that the complete mail, including all attachment data, is held in memory. If an attachment is itself a .msg file, it is not processed as a normal attachment but is included as a MsgAttachment. The attached mail is, again, a Message object and may have further attachments, and so on.
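
Because attached mails nest, any code that inspects attachments usually has to recurse. The following is a minimal sketch of such a walk that also shows how the in-memory size adds up. The nested types are hypothetical stand-ins that only borrow the names used above; the real Message, FileAttachment and MsgAttachment classes may expose their data differently.

```java
import java.util.ArrayList;
import java.util.List;

class NestedMailSketch {
    // Hypothetical stand-ins for the library's types; the real class shapes may differ.
    interface Attachment {}

    static class FileAttachment implements Attachment {
        final byte[] data;
        FileAttachment(byte[] data) { this.data = data; }
    }

    static class MsgAttachment implements Attachment {
        final Message message;
        MsgAttachment(Message message) { this.message = message; }
    }

    static class Message {
        final List<Attachment> attachments = new ArrayList<>();
    }

    /** Sums the bytes of all file attachments, descending into attached mails. */
    static long attachmentBytes(Message msg) {
        long total = 0;
        for (Attachment a : msg.attachments) {
            if (a instanceof FileAttachment) {
                total += ((FileAttachment) a).data.length;
            } else if (a instanceof MsgAttachment) {
                // An attached mail is a Message again, so recurse into it.
                total += attachmentBytes(((MsgAttachment) a).message);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Message inner = new Message();
        inner.attachments.add(new FileAttachment(new byte[3]));
        Message outer = new Message();
        outer.attachments.add(new FileAttachment(new byte[5]));
        outer.attachments.add(new MsgAttachment(inner));
        System.out.println(attachmentBytes(outer)); // prints 8
    }
}
```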

Note: this code has not been tested on a wide range of .msg files. Use it at your own risk, in production or anywhere else.

Usage:

MsgParser msgp = new MsgParser();
Message msg = msgp.parseMsg("test.msg");

Author:
roman.kurmanowytsch

Field Summary
protected static Logger logger
           
 
Constructor Summary
MsgParser()
          Empty constructor.
 
Method Summary
protected  FieldInformation analyzeDocumentEntry(org.apache.poi.poifs.filesystem.DocumentEntry de)
          Analyzes the DocumentEntry and returns a FieldInformation object containing the class (in effect, the field name) and type of the entry.
protected  void checkDirectoryEntry(org.apache.poi.poifs.filesystem.DirectoryEntry dir, Message msg)
          Recursively parses the complete .msg file with the help of the POI library.
protected  void checkRecipientDirectoryEntry(org.apache.poi.poifs.filesystem.DirectoryEntry dir, Message msg)
          Parses a recipient directory entry which holds information about one of possibly multiple recipients.
protected  Object getData(org.apache.poi.poifs.filesystem.DocumentInputStream dstream, FieldInformation info)
          Reads the information from the InputStream and, based on the information in the FieldInformation object, creates either a String or a byte[] (e.g., for attachments) containing this data.
protected  void parseAttachment(org.apache.poi.poifs.filesystem.DirectoryEntry dir, Message msg)
          Creates an Attachment object based on the given directory entry.
 Message parseMsg(File msgFile)
          Parses the specified .msg file.
 Message parseMsg(InputStream msgFileStream)
          Parses a .msg file provided by an input stream.
 Message parseMsg(InputStream msgFileStream, boolean closeStream)
          Parses a .msg file provided by an input stream, optionally closing the stream afterwards.
 Message parseMsg(String msgFile)
          Parses the .msg file at the specified path.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected static final Logger logger
Constructor Detail

MsgParser

public MsgParser()
Empty constructor.

Method Detail

parseMsg

public Message parseMsg(File msgFile)
                 throws IOException,
                        UnsupportedOperationException
Parses the specified .msg file.

Parameters:
msgFile - The .msg file.
Returns:
A Message object representing the .msg file.
Throws:
IOException - Thrown if the file could not be loaded or parsed.
UnsupportedOperationException - Thrown if the .msg file cannot be parsed correctly.

parseMsg

public Message parseMsg(String msgFile)
                 throws IOException,
                        UnsupportedOperationException
Parses the .msg file at the specified path.

Parameters:
msgFile - The .msg file as a String path.
Returns:
A Message object representing the .msg file.
Throws:
IOException - Thrown if the file could not be loaded or parsed.
UnsupportedOperationException - Thrown if the .msg file cannot be parsed correctly.

parseMsg

public Message parseMsg(InputStream msgFileStream)
                 throws IOException,
                        UnsupportedOperationException
Parses a .msg file provided by an input stream.

Parameters:
msgFileStream - The .msg file as an InputStream.
Returns:
A Message object representing the .msg file.
Throws:
IOException - Thrown if the file could not be loaded or parsed.
UnsupportedOperationException - Thrown if the .msg file cannot be parsed correctly.

parseMsg

public Message parseMsg(InputStream msgFileStream,
                        boolean closeStream)
                 throws IOException,
                        UnsupportedOperationException
Parses a .msg file provided by an input stream, optionally closing the stream afterwards.

Parameters:
msgFileStream - The .msg file as an InputStream.
closeStream - Indicates whether the provided stream should be closed after the message has been read.
Returns:
A Message object representing the .msg file.
Throws:
IOException - Thrown if the file could not be loaded or parsed.
UnsupportedOperationException - Thrown if the .msg file cannot be parsed correctly.

checkDirectoryEntry

protected void checkDirectoryEntry(org.apache.poi.poifs.filesystem.DirectoryEntry dir,
                                   Message msg)
                            throws IOException,
                                   UnsupportedOperationException
Recursively parses the complete .msg file with the help of the POI library. The parsed information is put into the Message object.

Parameters:
dir - The current node in the .msg file.
msg - The resulting Message object.
Throws:
IOException - Thrown if the .msg file could not be parsed.
UnsupportedOperationException - Thrown if the .msg file contains unknown data.

checkRecipientDirectoryEntry

protected void checkRecipientDirectoryEntry(org.apache.poi.poifs.filesystem.DirectoryEntry dir,
                                            Message msg)
                                     throws IOException,
                                            UnsupportedOperationException
Parses a recipient directory entry which holds information about one of possibly multiple recipients. The parsed information is put into the Message object.

Parameters:
dir - The current node in the .msg file.
msg - The resulting Message object.
Throws:
IOException - Thrown if the .msg file could not be parsed.
UnsupportedOperationException - Thrown if the .msg file contains unknown data.

getData

protected Object getData(org.apache.poi.poifs.filesystem.DocumentInputStream dstream,
                         FieldInformation info)
                  throws IOException
Reads the information from the InputStream and, based on the information in the FieldInformation object, creates either a String or a byte[] (e.g., for attachments) containing this data.

Parameters:
dstream - The InputStream of the Document Entry.
info - The field information that is needed to determine the data type of the input stream.
Returns:
The String/byte[] object representing the data.
Throws:
IOException - Thrown if the .msg file could not be parsed.
UnsupportedOperationException - Thrown if the .msg file contains unknown data.
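
The type-driven conversion described for getData can be sketched as follows. This is an illustration under stated assumptions, not the method's actual implementation: the class name FieldDataSketch and its helpers are hypothetical, and the type codes are assumed to be the four-digit hex property types of the .msg format (001e for 8-bit strings, 001f for UTF-16LE strings, 0102 for binary data). Decoding 001e as ISO-8859-1 is a simplification, since the real encoding depends on the message's code page.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class FieldDataSketch {
    /** Drains the stream into a byte array. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    /** Converts raw bytes according to an assumed four-digit hex type code. */
    static Object decode(byte[] raw, String type) {
        switch (type) {
            case "001e": // 8-bit string; ISO-8859-1 is an assumption made here
                return new String(raw, StandardCharsets.ISO_8859_1);
            case "001f": // Unicode string, stored as UTF-16LE
                return new String(raw, StandardCharsets.UTF_16LE);
            case "0102": // binary data, e.g. attachment content
                return raw;
            default:
                throw new UnsupportedOperationException("Unknown field type " + type);
        }
    }

    /** Reads the stream and returns either a String or a byte[]. */
    static Object read(InputStream in, String type) throws IOException {
        return decode(readAll(in), type);
    }
}
```

A caller would then check the runtime type of the result (String or byte[]) before using it, which mirrors the Object return type of getData.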

analyzeDocumentEntry

protected FieldInformation analyzeDocumentEntry(org.apache.poi.poifs.filesystem.DocumentEntry de)
Analyzes the DocumentEntry and returns a FieldInformation object containing the class (in effect, the field name) and type of the entry.

Parameters:
de - The DocumentEntry that should be examined.
Returns:
A FieldInformation object containing class and type of the document entry or, if the entry is not an interesting field, an empty FieldInformation object containing FieldInformation.UNKNOWN class and type.
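
To make "class and type" concrete: in the .msg format, property streams are named like __substg1.0_0037001E, where the first four hex digits after the prefix identify the property (0037 is the subject) and the last four give its data type (001E is an 8-bit string). The sketch below splits such a name. It assumes that this naming scheme is what the method inspects; EntryNameSketch and its "unknown" marker are hypothetical stand-ins, not the library's FieldInformation class.

```java
import java.util.Locale;

class EntryNameSketch {
    static final String PREFIX = "__substg1.0_"; // assumed property-stream prefix
    static final String UNKNOWN = "unknown";     // hypothetical marker value

    /** Returns {class, type}, or {UNKNOWN, UNKNOWN} for names that do not match. */
    static String[] analyze(String entryName) {
        if (entryName == null
                || !entryName.startsWith(PREFIX)
                || entryName.length() < PREFIX.length() + 8) {
            return new String[] { UNKNOWN, UNKNOWN };
        }
        // The eight hex digits after the prefix: four for the class, four for the type.
        String tag = entryName
                .substring(PREFIX.length(), PREFIX.length() + 8)
                .toLowerCase(Locale.ROOT);
        return new String[] { tag.substring(0, 4), tag.substring(4, 8) };
    }

    public static void main(String[] args) {
        String[] info = analyze("__substg1.0_0037001E");
        System.out.println(info[0] + " / " + info[1]); // prints 0037 / 001e
    }
}
```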

parseAttachment

protected void parseAttachment(org.apache.poi.poifs.filesystem.DirectoryEntry dir,
                               Message msg)
                        throws IOException
Creates an Attachment object based on the given directory entry. The entry may either point to an attached file or to an attached .msg file, which will be added as a MsgAttachment object instead.

Parameters:
dir - The directory entry containing the attachment document entry and some other document entries describing the attachment (name, extension, MIME type, ...).
msg - The Message object that this attachment should be added to.
Throws:
IOException - Thrown if the attachment could not be parsed/read.
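
One way to tell the two attachment kinds apart is by the entry that carries the attachment data. In the .msg format, file content is stored in a stream named __substg1.0_37010102 (binary data), while an embedded message is stored in a sub-directory named __substg1.0_3701000D (an object). The sketch below classifies an entry on that basis. Whether parseAttachment uses exactly this test is an assumption, and AttachmentKindSketch is a hypothetical helper, not part of the library.

```java
import java.util.Locale;

class AttachmentKindSketch {
    enum Kind { FILE, EMBEDDED_MSG, OTHER }

    /**
     * Classifies one child of an attachment directory by its name and by
     * whether it is a directory (storage) or a document (stream).
     */
    static Kind classify(String entryName, boolean isDirectory) {
        String name = entryName.toLowerCase(Locale.ROOT);
        if (!name.startsWith("__substg1.0_3701")) {
            return Kind.OTHER; // a descriptive property such as name or MIME type
        }
        if (isDirectory && name.endsWith("000d")) {
            return Kind.EMBEDDED_MSG; // nested .msg, to be parsed recursively
        }
        if (!isDirectory && name.endsWith("0102")) {
            return Kind.FILE; // raw attachment bytes
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("__substg1.0_37010102", false)); // prints FILE
        System.out.println(classify("__substg1.0_3701000D", true));  // prints EMBEDDED_MSG
    }
}
```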


This software is published under the BSD license. Copyright © 2003-2013, CRIXP AG, Switzerland, All rights reserved. Use is subject to license terms.