Parsing Mail with PHP

Many PHP applications need to parse e-mail messages, for example bug trackers and ticket systems that accept input by e-mail. For sending e-mail there are already decent implementations, some of which even support multi-part and mixed text/html messages with attachments.

Parsing e-mail is a whole different story, and definitely not an easy task. Because we consider this important, we decided to add e-mail parsing functionality to the Mail component. We just released an alpha release of this component as a PEAR package, which you should be able to install with the command below (I am not sure whether you have to add the components channel first; for information on how to do that, see the "PEAR Installer" section in this article).

pear install components.ez.no/mail-beta

The parsing part of the component can currently read e-mail messages from a POP3 server or from a file. A small example that parses e-mail from a POP3 server looks like this (assuming you used the PEAR installer to install the component):

<?php
require_once "ezc/Base/base.php";
function __autoload( $className )
{
    ezcBase::autoload( $className );
}
$pop3 = new ezcMailPop3Transport( "pop3.example.com" );
$pop3->authenticate( "user", "password" );
$set = $pop3->fetchAll();
$parser = new ezcMailParser();
$mails = $parser->parseMail( $set );
foreach ( $mails as $mail )
{
    echo "From: {$mail->from->email}\n";
    echo "To: ";
    foreach ( $mail->to as $to )
    {
        echo "{$to->name} ({$to->email}) ";
    }
    echo "\n";
    echo "Subject: {$mail->subject}\n";
    switch ( get_class( $mail->body ) )
    {
        case 'ezcMailText':
            echo "Text part, ".
                "type={$mail->body->subType}\n--\n";
            echo $mail->body->text;
            echo "\n--\n";
            break;
        case 'ezcMailMultipartMixed':
            echo "Multipart mail\n";
            break;
    }
    echo "\n";
}
?>

The $mails variable now holds an array of ezcMail objects. For more information on how to access the information in the mail classes, please refer to the documentation. Not all the ezcMailPart descendants document their available properties yet, but this will of course be addressed before the first beta.

In the near future we want to expand the component with an IMAP transport and more authentication mechanisms, and add methods that allow you to "reply" to or "forward" a parsed e-mail message. Those methods will set the correct headers in the e-mail object, including the correct handling of the "References" and "In-Reply-To" headers.
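To make the "References" and "In-Reply-To" handling concrete, here is a minimal sketch in plain PHP of the rules that RFC 5322 (section 3.6.4) describes for replies. The function name buildReplyHeaders() and the array layout are assumptions made for this illustration; they are not part of the Mail component, whose reply methods do not exist yet.

```php
<?php
/**
 * Build the threading headers for a reply (RFC 5322, section 3.6.4).
 * Hypothetical helper for illustration; not part of the Mail component.
 *
 * @param array $parent Headers of the message being replied to, keyed by
 *                      'Message-ID', 'References' and 'In-Reply-To'.
 * @return array The 'In-Reply-To' and 'References' headers for the reply.
 */
function buildReplyHeaders( array $parent )
{
    $messageId = isset( $parent['Message-ID'] ) ? trim( $parent['Message-ID'] ) : '';
    if ( $messageId === '' )
    {
        // Without a Message-ID in the parent there is nothing to refer to.
        return array();
    }

    $chain = '';
    if ( !empty( $parent['References'] ) )
    {
        // Continue the parent's existing chain, with whitespace normalized.
        $chain = preg_replace( '/\s+/', ' ', trim( $parent['References'] ) );
    }
    elseif ( !empty( $parent['In-Reply-To'] )
        && preg_match( '/^\s*<[^<>\s]+>\s*$/', $parent['In-Reply-To'] ) )
    {
        // No References, but In-Reply-To holds exactly one message id.
        $chain = trim( $parent['In-Reply-To'] );
    }

    return array(
        'In-Reply-To' => $messageId,
        'References'  => ltrim( "$chain $messageId" ),
    );
}

$reply = buildReplyHeaders( array(
    'Message-ID' => '<3@example.com>',
    'References' => '<1@example.com> <2@example.com>',
) );
echo "In-Reply-To: {$reply['In-Reply-To']}\n";
echo "References: {$reply['References']}\n";
```

The reply points at its direct parent through "In-Reply-To", while "References" carries the whole chain so that mail clients can rebuild the thread even when intermediate messages are missing.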

Comments

Very cool component :-)

However, you're saying that instead of reading a mail from a POP3 account, one could also use a file... reading the documentation (haven't looked at the actual code yet) does not give me any hint of how to do so - can you perhaps give an example for that?

Currently we only have a reader for a single file, but we want to expand this to allow reading of Unix mbox files too. This is not ready yet, and thus we didn't put the single file reader in the component yet. In http://svn.ez.no/svn/ezcomponents/trunk/Mail/tests/parser/parser_test.php you can find the SingleFileSet class, which you can use (but change the __construct() for your path). This class will end up as ezcMailMboxTransport later on, with support for multiple files in it.

This looks good.

Are there any obvious differences to the MimeDecode class also in PEAR? I currently use MimeDecode, so I am wondering if there are any immediate advantages to this.

Also, will this work by supplying a variable, e.g. command line input in a shell script?

@Chris: I believe we support more formats regarding multi parts and return a more structured result. The result of the parseMail() method is the same as if you would create a mail message yourself.

All mail that comes from a transport is also parsed on the fly while reading, so that no unnecessary memory is used.

The currently available classes do not yet work on a string only because there is no StringTransport available yet. This is something we could add though.
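As a rough illustration of the "parse while reading" idea, the sketch below reads the header block of a message from a stream one line at a time, so only the current line has to be held in memory. The function readHeaders() is hypothetical and written for this example; it is not the component's parser. Because it works on any PHP stream, it also shows one way a message held in a string, or piped into a shell script, could be fed to such a reader.

```php
<?php
/**
 * Read the header block of one message from an open stream, line by line.
 * Hypothetical helper for illustration; not the component's parser.
 * Simplification: a repeated header (such as Received) overwrites the
 * earlier value instead of being collected.
 *
 * @param resource $stream Any readable stream (file, php://stdin, socket).
 * @return array Header values keyed by lower-cased header name.
 */
function readHeaders( $stream )
{
    $headers = array();
    $current = null;
    while ( ( $line = fgets( $stream ) ) !== false )
    {
        $line = rtrim( $line, "\r\n" );
        if ( $line === '' )
        {
            // An empty line separates the headers from the body.
            break;
        }
        if ( $current !== null && ( $line[0] === ' ' || $line[0] === "\t" ) )
        {
            // Folded header: this line continues the previous one.
            $headers[$current] .= ' ' . trim( $line );
            continue;
        }
        $parts = explode( ':', $line, 2 );
        if ( count( $parts ) === 2 )
        {
            $current = strtolower( trim( $parts[0] ) );
            $headers[$current] = trim( $parts[1] );
        }
    }
    return $headers;
}

// A message held in a string can be wrapped in an in-memory stream.
$stream = fopen( 'php://memory', 'r+' );
fwrite( $stream, "From: a@example.com\r\nSubject: A long\r\n subject line\r\n\r\nBody text\r\n" );
rewind( $stream );
$headers = readHeaders( $stream );
echo $headers['subject'], "\n";
fclose( $stream );
```

In a command line script the same function could be called as readHeaders( STDIN ). After it returns, the stream is positioned at the start of the body, which can then be processed line by line in the same way.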

That sounds great Derick; the memory problem is a major issue with mimeDecode. I have also been looking at the PECL package mailparse, but this isn't a portable option in most cases.

One further thing, have you done much testing with the class using other character sets, particularly when other character sets are used in the subject of an email?

@Chris: We tested some with iso-8859-1 and UTF-8. We currently use the iconv function, and I think that should handle most encodings. We tested this with all header fields, including the Subject header.
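For readers who want to see what character set handling in a Subject header involves, here is a small sketch that uses PHP's own iconv_mime_decode() function. The reply above only says that the component uses iconv; which iconv calls it makes is not stated, so this shows the general technique and not the component's code. The sample headers are made up for the example.

```php
<?php
// The same subject, once in iso-8859-1 (quoted-printable encoded word)
// and once in UTF-8 (base64 encoded word), as found in raw messages.
$raw = array(
    'Subject: =?ISO-8859-1?Q?Pr=FCfung?=',
    'Subject: =?UTF-8?B?UHLDvGZ1bmc=?=',
);

$subjects = array();
foreach ( $raw as $header )
{
    // Decode the encoded words and convert the result to UTF-8.
    $subjects[] = iconv_mime_decode( $header, 0, 'UTF-8' );
}

// Both headers decode to the same UTF-8 string.
foreach ( $subjects as $subject )
{
    echo $subject, "\n";
}
```

Converting every header to one target character set, here UTF-8, means the rest of the application does not need to know which encoding the sender used.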

I'm currently using PEAR::Mail_IMAPv2. Are there arguments for using this class instead? What can you say about this? Thanks.

@Florent: One of the things is that we don't rely on any PHP extension for our component. It also seems that the parsed message is a bit more structured, and we support multiple transports (POP3 and single mail file for now; IMAP, mailbox and string are following). Our license is also slightly friendlier, as we don't have the advertising clause which exists in the old BSD license (and also in the PHP license that this package is under).

Ok Derick. I'm convinced ;-) I've planned to use string instead of POP3 in my application, so I'm glad to see that there will be a StringTransport. Do you have an idea of the release date?

We should have a beta out just after Easter.

This looks really great! I've been desperately seeking something like this.

What's the current status? Has it made it to beta yet?

It has been released as beta, and as release candidate already. We will release the final version on Monday.

An excellent library ez :) thumbs up for the work.

I just want to know how I can change the POP server port number, as I want to fetch email from a Gmail account, which operates on pop.gmail.com port 995. How can I change that?
