New MongoDB Drivers for PHP and HHVM: History

We recently released a new version of the MongoDB driver for PHP. This release is the result of nearly a year and a half work to re-engineer and rewrite the MongoDB driver. In this blog post, I will cover the back story of the how and why we undertook this effort.

The original driver was created by Kristina and released in early 2009. The extension had a very simple architecture and was written mostly in PHP. A handful of core functions were implemented in C, such as creating a resource for the connection, standard CRUD operations, query execution, and iterating over a result set (sans an Iterator interface). All of the convenient user facing APIs were implemented in PHP code. There were also a few classes to encapsulate certain MongoDB data types that PHP otherwise could not represent, such as its ISODate, ObjectID and Regex types.

A small example to create a connection, insert a document, and retrieve a result set looked like:

<?php
include "Mongo.php";

$m = new Mongo();
$c = $m->selectCollection("phpt", "find");
$c->drop();

$c->insert( array(
        "foo" => "bar",
        "a" => "b",
        "b" => "c")
);

$cursor = $c->find(array("foo"=>"bar"), array("a"=>1,"b"=>1));

while ($cursor->hasNext()) {
        var_dump($cursor->getNext());
}
?>

Support for PHP 5.3 was soon added. In this release, much of the syntactic sugar in the form of PHP classes and code had been replaced by implementations in C. Where most (if not all) PHP extensions use phpt tests, the MongoDB driver instead implemented its test suite with PHPUnit tests. The main difference between test methods, is that phpt tests are all run in a separate PHP process in isolation, whereas PHPUnit tests re-use the same process by default. Which means that if one test case causes a crash in the driver, none of the others run either, and no result is visible.

In late 2009, the MongoDB driver for PHP reached a stable state with version 1.0.0. It added a feature to placate some PHP developers that refused to use single quotes around query operators. Query operators in MongoDB have the form of $ + operation, such as $gte for Greater Than or Equal or $set for setting fields during updates. As they are array keys, extra care needs to be taken. For example, the following does not do what you expect it to do:

$r = $c->find( array( "fieldname" => array( "$gte" => 42 ) ) );

The $gte in the double quotes of course causes a variable substitution, usually evaluating to an empty string. Instead of educating users to use single quotes (as in '$gte'), the driver acquired a way to replace the $ symbol to signal a query operator with a different symbol:

ini_set('mongo.cmd', '@');
$r = $c->find( array( "fieldname" => array( "@gte" => 42 ) ) );

The driver would replace these symbols before sending them on to the server. A little later, a warning was added in case an empty array key was found.

The 1.0.0 release also settled issues with serialising PHP variables to database types. MongoDB uses Binary JSON (BSON) as its internal format, and the PHP driver needs to serialize to and from this data type. BSON also supports compound data types, arrays (packed numerical keys), and documents (basically hash maps, or something akin to JSON objects). Because in PHP there is really only one array type, the decision was taken that both BSON documents and BSON arrays would be deserialized into PHP variables as arrays. The following example illustrates this:

<?php
// Make a connection, select the 'test' collection in the 'demo' database,
// and clean out the collection.
$m = new Mongo;
$c = $m->selectCollection( 'demo', 'test' );
$c->drop();

// Construct two documents
// 1. A stdClass object
$docObj = new stdClass;
$docObj->foo = 'bar';

// 2. An array
$docArr = [ 'foo' => 'baz' ];

// Insert documents into the collection:
$c->insert( $docObj );
$c->insert( $docArr );

// Query and show results
$r = $c->find();

foreach ( $r as $record )
{
        print_r( $record );
}
?>

The result of this script is (after formatting):

Array
(
        [_id] => MongoId Object ( [$id] => 565888e844670acd368b4567 )
        [foo] => bar
)
Array
(
        [_id] => MongoId Object ( [$id] => 565888e844670acd368b4568 )
        [foo] => baz
)

As you see, both come back as arrays, as arrays are easier to work with than objects of stdClass.

At the start of February 2012, I took over the role as maintainer of the PHP driver. Connection pools were new in the 1.2 series, and ended up causing lots of issues. Because PHP is single threaded (from a request handling point of view), a connection pool makes little sense as PHP can't use multiple connections at the same time anyway. Ultimately, connection pools were ripped out in the 1.3 series to improve reliability (and sanity while debugging). At the same time, I also added lots of logging!

With 1.3 out the door, Hannes started helping out, and because we were now two, we settled on a new Git workflow. Jeremy joined our little team around at the same time, and started contributing to the driver in earnest somewhere in 2013.

Over the next few years, Hannes, Jeremy and I worked on the driver, keeping it up to date with new server features. While doing so, we ran into quite a few earlier design issues. A few I have mentioned so far, but also included are:

  • Two inconsistent ways of passing in options to methods. The driver uses both positional arguments and arguments passed in as an array. Sometimes both are allowed for the same method due to backwards compatibility reasons.

  • There are static methods for setting query and cursor objects, which makes it difficult to find out which option value was being used for each query.

  • We had no clear strategy as to when to add command helpers. Command helpers are convenience methods for tasks such as creating users, indexes, and collections — things that could also be done through the generic command execution method.

  • Maintenance of some complex functionality such as GridFS was hard, and was both difficult and cumbersome, as it was all implemented in C code.

  • Because everything was implemented in the PHP extension, we couldn't easily share code in case we wanted to support something like HHVM.

Lastly, because each language driver was implementing its own version of the protocol to talk to MongoDB, we were doing a lot of duplicated work.

With this in mind, we set some goals for a new version of the driver:

  • A bare bones driver just implementing the essential API for accessing all features of MongoDB.

  • No syntactic sugar, or command helpers (they are better suited to a PHP library)

  • It should be fast to write and easy to maintain

  • Support for other PHP engines like HHVM

  • No reinvention of the wheel by implementing the protocol to talk to MongoDB again.

  • The addition of a supporting PHP library, to provide convenience methods, command helpers and in general, a nice user-facing API.

With all these requirement in place, we came up with a new architecture, that I will describe in an upcoming blog post.

Comments

No comments yet

Add Comment

Name:
Email:

Will not be posted. Please leave empty instead of filling in garbage though!
Comment:

Please follow the reStructured Text format. Do not use the comment form to report issues in software, use the relevant issue tracker. I will not answer them here.


All comments are moderated

Life Line