Off-by-One Error Memory Corruption

A while ago, a user of the PHP Library for MongoDB reported a bug stating:

There is a special find query that fails with the message: Client Error: bad object in message: bson length doesn't match what we found in object with unknown _id

One can change the characters but not the length of strings and obtain the same exception.

Over the past week I had a look at this, after this was filed as PHPC-1067 for the MongoDB PHP driver, once we found that it actually causes a crash. When running the script from the report, I indeed found a Segmentation Fault (read: crash), where I noticed that during the course of a few function calls an unrelated C variable changed. The change was in a bson_t* type, which the driver uses for variables that go in and out of the MongoDB C client library. This data type has two implementations: an inline version, used for most small BSON documents; and an allocated version that can be used for building BSON structures, and which is typically used for larger documents.

In the driver's function, the data types are declared as:

bson_t  bdocument = BSON_INITIALIZER, boptions = BSON_INITIALIZER;
bson_t *bson_out = NULL;

With the affected code being:

php_phongo_zval_to_bson(zdocument, bson_flags, &bdocument, &bson_out TSRMLS_CC)

zdocument is the zval structure that PHP uses for storing variables, in this case, the document sent into the method. bdocument receives the created BSON document, boptions are options to go with the insertion, and bson_out receives the _id field that is either present in the original zdocument document, or that was generated in case no _id field was present. MongoDB documents are required to have this _id field and bson_out is used to be able to return the generated _id field so that you can do follow-up queries with it.

However, after php_phongo_zval_to_bson was run, parts of the structure of boptions changed in such a way that an internal flag now showed it was an allocated bson_t* instead of the inline version it really was. This meant that when freeing it later in the code through bson_destroy(&boptions) a memory free was attempted on non-allocated memory, which then of course caused a crash in form of the Segmentation Fault. The weird thing is that the change happened in a variable that should not have been touched by the call to php_phongo_zval_to_bson.

With the reason of the crash found, it was now time to find the cause of the crash. Which proved to be a little trickier.

Normally, with any sort of memory issues the first thing I would do is to wield Valgrind as it usually tells were things go wrong. I wrote about Valgrind before. But in this case, it didn't show anything besides that the process tried to free not-allocated memory:

==9002== Invalid read of size 8
==9002==    at 0xC7CBDA6: bson_destroy (bson.c:2285)
==9002==    by 0xC862155: zim_BulkWrite_insert (BulkWrite.c:246)
==9002==    by 0x9D64E2: execute_internal (zend_execute.c:2077)
==9002==    by 0xB8690CC: xdebug_execute_internal (xdebug.c:2020)
==9002==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

And we already knew that. After looking, and stepping through the code with GDB for way too long, I had a revelation while relaxing on the weekend that the bdocument and boptions were not actually allocated variables, but instead just variables on the stack. And Valgrind's memcheck tool does not have ways to actually detect this.

Having these variables allocated on the stack is a performance improvement, as there is no overhead of memory allocation, but while trying to find a bug, we don't care much about it being fast. So I converted these few lines of code to use an allocated version, to see if I could get some more useful information out of Valgrind:

bson_t  bdocument = BSON_INITIALIZER, boptions = BSON_INITIALIZER;

php_phongo_zval_to_bson(zdocument, bson_flags, &bdocument, &bson_out TSRMLS_CC);

to:

bson_t *bdocument, *boptions;
bdocument = bson_new();
boptions = bson_new();

php_phongo_zval_to_bson(zdocument, bson_flags, bdocument, &bson_out TSRMLS_CC);

With that changed, the Valgrind run now showed:

Invalid write of size 1
   at 0xC7C72FA: _bson_append_va (bson.c:348)
   by 0xC7C74DB: _bson_append (bson.c:404)
   by 0xC7C9DA1: bson_append_regex (bson.c:1575)
   by 0xC85122D: php_phongo_bson_append_object (bson-encode.c:234)
   by 0xC851C57: php_phongo_bson_append (bson-encode.c:382)
   by 0xC85230E: php_phongo_zval_to_bson (bson-encode.c:540)
   by 0xC862047: zim_BulkWrite_insert (BulkWrite.c:229)
 Address 0xd8c9540 is 0 bytes after a block of size 128 alloc'd
   at 0x4C2CBEF: malloc (vg_replace_malloc.c:299)
   by 0x940C00: __zend_malloc (zend_alloc.c:2829)
   by 0xC84DBCF: php_phongo_malloc (php_phongo.c:2485)
   by 0xC7DD388: bson_malloc (bson-memory.c:68)
   by 0xC7CB170: bson_new (bson.c:2015)
   by 0xC861F94: zim_BulkWrite_insert (BulkWrite.c:216)

It says invalid write of size 1 and 0 bytes after a block of size 128 alloc'd. This quickly illustrated that it was trying to do a write outside of its allocated memory area. Adding elements to the BSON document, in this case a regular expression through bson_append_regex, should normally grow the allocated memory when it runs out of already allocated space. Knowing this, combined with the phrasing One can change the characters but not the length of strings and obtain the same exception. from the original bug report, hinted towards an improper calculation of the amount of new data being written to the allocated memory buffer.

I dove into the bson_append_regex code and found the spot where it builds up all the elements to add. I have annotated it a little:

r =  _bson_append (bson,
        5,          // Number of data elements to add

        // number of bytes to add
        (1 + key_length + 1 + regex_len + options_sorted->len),

        1,          // length of first element (BSON type, int8)
        &type,      // the BSON type
        key_length, // the length of the field name
        key,        // the field name (not 0-termined)
        1,          // the length of the ending 0 byte
        &gZero,     // the null 0 byte
        regex_len,  // the length of the regular expression (including 0 byte)
        regex,      // the regular expression with 0 byte
        options_sorted->len + 1, // the length of the sorted options, with 0 byte
        options_sorted->str);    // the sorted options, with 0 byte

When comparing the "bytes" to add to the sum of the 5 "length" fields, I noticed that although an extra byte was added for options_sorted->len on the second to last line, that was not done in the calculated size on the third line.

This code got recently changed as part of CDRIVER-2128. Unfortunately a bug sneaked in miscalculating the length of the now sorted options. After adding the extra byte back in, the bug disappeared:

-     (1 + key_length + 1 + regex_len + options_sorted->len),
+     (1 + key_length + 1 + regex_len + options_sorted->len + 1),

I filed this bug as CDRIVER-2455, and made a pull request.

Additional note: I now have found Valgrind's SGCheck tool which should assist in finding stack related memory errors. Unfortunately, this tool currently seems to be inoperative on my platform.

Shortlink

This article has a short URL available: https://drck.me/offbyone-dwm

Comments

No comments yet

PHP 7.2's "switch" optimisations

PHP 7.2 is around the corner soon, and comes with many optimisations. Many new optimisations are implemented in opcache, but some others are implemented in PHP itself. One optimisation that falls in the latter category is an optimisation of the switch/case construct.

Before PHP 7.2, PHP considers each case statement in order. Let's say we have the following (trivial code):

<?php
$cc = "no";
switch ( $cc )
{
        case 'de': echo "DEUTSCH"; break;
        case 'en': echo "English"; break;
        case 'nl': echo "Nederlands"; break;
        case 'no': echo "Norsk"; break;
        default:
                echo "unknown"; break;
}
?>

The PHP interpreter approaches the executing of this switch/case structure as if it was written like:

<?php
$cc = "no";
if ($cc == 'de') {
        echo "DEUTSCH";
} else if ($cc == 'en') {
        echo "English";
} else if ($cc == 'nl') {
        echo "Nederlands";
} else if ($cc == 'no') {
        echo "Norsk";
} else {
        echo "unknown"; break;
}
?>

Yup, it really compares $cc variable to each case statement one by one. Which means that if your most commonly used switch case is further down the list, you end up wasting valuable CPU code.

With vld we can represent this in a graph. For each case statement, the graph shows that the interpreter can either go left or right, just as it was executing multiple if statements in sequence:

PHP 7.2 introduces an optimisation that converts this sequence of if statements into a jump table if all the case statements are either integers or strings. If we look at the pseudo code (PHP does not support variable labels with goto), we get something like:

<?php
$cc = "no";
$_table = [ "de" => 1, "en" => 2, "nl" => 3, "no" => 4 ];
if (gettype($cc) == 'string') {
        if (array_key_exists($cc, $_table)) {
                goto "jmp_{$_table[$cc]}";
        } else {
                goto jmp_default;
        }
} else {
        /* do original if/else if/else sequence */
}
jmp_1: echo "DEUTSCH"; goto end;
jmp_2: echo "English"; goto end;
jmp_3: echo "Nederlands"; goto end;
jmp_4: echo "Norsk"; goto end;
jmp_default: echo "unknown";
end:
?>

This example shows that as long as $cc is a string, it will use the jump table $_table to jump (with goto) to the immediate code representing the case statement. If $cc is not a string, it will have to do the usual if/else if/else dance. In this case, it will eventually end up on the default case.

Looking at the graph generated by vld for this optimised construct, we see this new structure as well:

Especially with big switch/case structures this is a welcome optimisation!

However…

Xdebug implements code coverage gathering functionality. This can be used with PHP_CodeCoverage to see which parts of your code, which branches, and which paths are covered while executing your tests. In order for Xdebug to figure out the branches and paths, it has to follow PHP's internal opcode structures to see where jump instructions happen. I had to add support for this new switch/case optimisation to make this work again. This has now been done, and is part of Xdebug's master branch. This branch will soon become the first beta release of Xdebug 2.6 which will provide support for PHP 7.2.

Shortlink

This article has a short URL available: https://drck.me/php72switch-dgj

Comments

The goto jmp_5; → jmp_5; is not defined

Thank you for sharing. Quite interesting - even for those in older versions of PHP!

That`s nice to know. I will keep that in mind. Thanks for sharing with us!

Is it strict ? eg switch($num) {

case 1: echo 'its a num'; break; case '1': echo 'its a string': break;

}

Yes, it is strict. If there are mixed types in case statements, it won't create a jump table at all, and fall back to the original PHP 7.1 behaviour.

New Date/Time Support in MongoDB


Updated on Friday, October 13th 2017: Reflects changes made since MongoDB 3.5.12 — this now documents the state as of version MongoDB 3.6.0-rc0.


In the past few months I have been working on adding time zone support to MongoDB's Aggregation Framework. This support brings in the timelib library that is also used in PHP and HHVM to do time zone calculations.

Time Zone Support for Date Extraction Operators

MongoDB's Aggregation Framework already has several operators to extract date/time information from a Date value. For example, $month would retrieve the month from a Date value. Considering we have the document:

{ _id: 'derickr', dateOfBirth: ISODate("1978-12-22T08:15:00Z") }

The following PHP code would extract the month 12 from the dateOfBirth field:

<?php
include 'vendor/autoload.php';

$collection = (new MongoDB\Client)->demo->birthdays;

$cursor = $collection->aggregate([
    [ '$project' => [
        'birthMonth' => [ '$month' => '$dateOfBirth' ]
    ] ]
]);

foreach ($cursor as $person) {
    echo $person->birthMonth;
}

However, it was not possible to extract this information in the local time zone. So although my birth-time is given at 08:15:00, in reality it was 09:15 in the Netherlands.

The new functionality adds the timezone argument to the date extraction functions. Currently the operators only take a single value:

{ "$month" : ISODateValue }

Because it is not possible to add another argument to this, the syntax has been extended to:

{ "$month" : { "date" : ISODateValue } }

And then with the time zone argument added, it becomes:

{ "$month" : {
    "date" : ISODateValue,
    "timezone": timeZoneIdentifier
} }

For example:

{ $hour: {
    date: ISODate("1978-12-22T08:15:00Z"),
    timezone: "Europe/Amsterdam"
} }

Would return 09 (One hour later than UTC).

The timezone argument is optional, and must evaluate to either an Olson Time Zone Identifier such as Europe/London or America/New_York, or, an UTC offset string in the forms: +03, -0530, and +04:45. If you specify a timezone argument it means that the dateString that you provided will be interpreted as it was in that time zone.

In PHP, this example extracts the hour, and expresses the value in the Europe/Amsterdam time zone:

<?php
include 'vendor/autoload.php';

$collection = (new MongoDB\Client)->demo->birthdays;

$cursor = $collection->aggregate([
    [ '$project' => [
        'birthHour' => [
            '$hour' => [
                "date" => '$dateOfBirth',
                "timezone" => "Europe/Amsterdam",
            ]
        ]
    ] ]
]);

foreach ($cursor as $person) {
    echo $person->birthHour, "\n";
}

The $dateToParts Operator

Extracting a single value from a date can easily be done with one of the Date Extraction operators, but it becomes cumbersome to do this for all the fields one by one, especially when considering the application of time zones as well:

{ "$project" : {
    "year": { "$year" : { "date" : '$dob', "timezone" : "Europe/Amsterdam" } },
    "month": { "$month" : { "date" : '$dob', "timezone" : "Europe/Amsterdam" } },
    "day": { "$day" : { "date" : '$dob', "timezone" : "Europe/Amsterdam" } },
    "hour": { "$hour" : { "date" : '$dob', "timezone" : "Europe/Amsterdam" } },
    "minute": { "$minute" : { "date" : '$dob', "timezone" : "Europe/Amsterdam" } },
} }

The new $dateToParts operator simplifies having multiple single date value extraction operators into a single one. Its syntax is:

{ "$project" : {
    "parts" : {
        "$dateToParts" : {
            "date" : ISODateValue,
            "timezone" : timeZoneIdentifier,
            "iso8601" : boolean
        }
    }
} }

The timezone argument is optional, and is interpreted in the same as the timezone argument in the Date Extraction functions as explained above.

The result of the operator is a sub-document with the broken down parts, expressed in the (optionally) given time zone:

"parts" : {
    "year" : 1978, "month" : 12, "day" : 22,
    "hour" : 9, "minute" : 15, "second" : 0, "millisecond" : 0
}

$dateToParts also supports a third boolean argument, iso8601. If set to true, instead of year, month, and day, it returns the ISO 8601 isoWeekYear, isoWeek, and isoDayOfWeek fields representing an ISO Week Date. With the same date, the example is represented as:

"parts" : {
    "isoWeekYear" : 1978, "isoWeek" : 51, "isoDayOfWeek" : 5,
    "hour" : 9, "minute" : 15, "second" : 0, "millisecond" : 0
}

In PHP:

<?php
include 'vendor/autoload.php';

$collection = (new MongoDB\Client)->demo->birthdays;

$cursor = $collection->aggregate([
    [ '$project' => [
        'parts' => [
            '$dateToParts' => [
                "date" => '$dateOfBirth',
                "timezone" => "Europe/Amsterdam",
            ]
        ]
    ] ]
]);

foreach ($cursor as $person) {
    var_dump( $person->parts );
}

Which outputs, with formatting:

class MongoDB\Model\BSONDocument#5 (1) {
  private $storage =>
  array(7) {
    'year' => int(1978)
    'month' => int(12)
    'day' => int(22)
    'hour' => int(9)
    'minute' => int(15)
    'second' => int(0)
    'millisecond' => int(0)
  }
}

The $dateFromParts Operator

The new $dateFromParts operator does the opposite of the $dateToParts operator and constructs a new Date value from its constituent parts, with the possibility of interpreting the given values in a different time zone.

Its syntax is either:

{ "$project" : {
    "date" : {
        "$dateFromParts": {
            "year" : yearExpression,
            "month" : monthExpression,
            "day" : dayExpression,
            "hour" : hourExpression,
            "minute" : minuteExpression,
            "second" : secondExpression,
            "millisecond" : millisecondExpression,
            "timezone" : timezoneExpression
        }
    }
} }

or:

{ "$project" : {
    "date" : {
        "$dateFromParts": {
            "isoWeekYear" : isoWeekYearExpression,
            "isoWeek" : isoWeekExpression,
            "isoDayOfWeek" : isoDayOfWeekExpression,
            "hour" : hourExpression,
            "minute" : minuteExpression,
            "second" : secondExpression,
            "millisecond" : millisecondExpression,
            "timezone" : timezoneExpression
        }
    }
} }

Each argument's expression needs to evaluate to a number. This means the source can be either double, NumberInt, NumberLong, or Decimal. Decimal and double values are only supported if they convert to a NumberLong without any data loss.

Every argument is optional, except for year or isoWeekYear, depending on which variant is used. If month, day, isoWeek, or isoDayOfWeek are not given, they default to 1. The hour, minute, second and millisecond values default to 0 if not present.

The timezone argument is interpreted in the same as the timezone argument in the Date Extraction functions as explained above.

In PHP, an example looks like:

<?php
include 'vendor/autoload.php';

$collection = (new MongoDB\Client)->demo->birthdays;

$cursor = $collection->aggregate([
    [ '$project' => [
        'date' => [
            '$dateFromParts' => [
                "year" => 1978, "month" => 12, "day" => 22,
                "hour" => 9, "minute" => 15, "second" => 0,
                "millisecond" => 0,
                "timezone" => "Europe/Amsterdam",
            ]
        ]
    ] ]
]);

foreach ($cursor as $person) {
    var_dump( $person->date->toDateTime() );
}

Which outputs:

class DateTime#12 (3) {
  public $date => string(26) "1978-12-22 08:15:00.000000"
  public $timezone_type => int(1)
  public $timezone => string(6) "+00:00"
}

Changes to the $dateToString Operator

The $dateToString operator is extended with the timezone argument. Its full new syntax is now:

{ $dateToString: {
    format: formatString,
    date: dateExpression,
    timezone: timeZoneIdentifier
} }

The timezone argument is optional. If present, it formats the string according to the given time zone, otherwise it uses UTC.

The $dateToString format arguments have also been expanded. With the addition of the timezone argument came the %z and %Z format specifiers:

%z

The +hhmm or -hhmm numeric time zone as a string (that is, the hour and minute offset from UTC). Example: +0445, -0500

%Z

The minutes offset from UTC as a number. Example (following the +0445 and -0500 from %z): +285, -300

Once SERVER-29627 gets merged, the following new format specifiers will also be available:

%a

The abbreviated English name of the day of the week.

%b

The abbreviated English name of the month.

%e

The day of the month as a decimal number, but unlike %d, pre-padded with space instead of a 0.

An example of this in PHP:

<?php
include 'vendor/autoload.php';

$collection = (new MongoDB\Client)->demo->birthdays;

$cursor = $collection->aggregate([
    [ '$project' => [
        'date' => [
            '$dateToString' => [
                'date' => '$dateOfBirth',
                'format' => '%Y-%m-%d %H:%M:%S %z',
                'timezone' => 'Australia/Sydney',
            ]
        ]
    ] ]
]);

foreach ($cursor as $person) {
    echo $person->date;
}

Which outputs:

1978-12-22 19:15:00 +1100

The $dateFromString Operator

Analogous to PHP's DateTimeImmutable constructor, this operator can be used to create a Date value out of a string. It has the following syntax:

{ "$dateFromString": {
    "dateString": dateString,
    "timezone": timeZoneIdentifier
} }

The dateString could be anything like:

  • 2017-08-04T17:02:51Z

  • August 4, 2017 17:10:27.812+0100

In fact, it will accept everything that PHP's DateTimeImmutable constructor accepts as under the hood, it uses the same library. MongoDB enforces though that it is an absolute date/time string.

The timezone argument is optional, and is interpreted in the same as the timezone argument in the Date Extraction functions as explained above.

For example:

{ $dateFromString: {
    dateString: "2017-08-04T17:06:41.113",
    timezone: "Europe/London"
} }

Would mean 17:06 local time in London, or 16:06 in UTC (as London right now is at UTC+1).

It is not allowed to specify a time zone through the dateString (such as the ending Z or +0400) and also specify a time zone through the timezone argument. In that case, an exception is thrown.

In PHP, this looks like:

<?php
include 'vendor/autoload.php';

$collection = (new MongoDB\Client)->demo->birthdays;

$cursor = $collection->aggregate([
    [ '$project' => [
        'date' => [
            '$dateFromString' => [
                "dateString" => 'August 8th, 2017. 14:14:40',
                "timezone" => "Europe/Amsterdam",
            ]
        ]
    ] ]
]);

foreach ($cursor as $person) {
    var_dump( $person->date->toDateTime() );
}

Which outputs:

class DateTime#12 (3) {
  public $date =>
  string(26) "2017-08-08 12:14:40.000000"
  public $timezone_type =>
  int(1)
  public $timezone =>
  string(6) "+00:00"
}

As you can see, the time zone information is lost when the data is transferred between MongoDB and PHP as the BSON DateTime data type does not carry this information.

Using Date Expressions in $match

From MongoDB 3.5.12, it is also possible to use the new date expressions (and other expressions) in the $match pipeline operator. For example, in order to find all the documents before June 17th, 2017 in the New York time zone:

db.dates.aggregate( [
    { $match: {
        date: { $gte: { $expr: {
            $dateFromString: {
                dateString: "June 17th, 2017",
                timezone: "America/New_York"
            }
        } } }
    } }
] );

Or from PHP:

<?php
include 'vendor/autoload.php';

$collection = (new MongoDB\Client)->demo->dates;

$date = "June 17th, 2017";

$cursor = $collection->aggregate( [
    [ '$match' => [
        'date' => [ '$gte' => [ '$expr' => [
            '$dateFromString' => [
                "dateString" => [ '$literal' => $date ],
                "timezone" => "America/New_York",
            ]
        ] ] ]
    ] ]
]);

foreach ($cursor as $person) {
    var_dump( $person->date->toDateTime() );
}

Please note the use of the $literal operator here, which should be used for any user input that might be able to sneak in an expression into the value.

Notes

The time zone support is currently only available in a development release of MongoDB, and should be considered experimental. The following changes have happened since the original introduction MongoDB in 3.5.12:

  • Before MongoDB 3.5.12, the argument millisecond to dateFromParts is incorrectly spelled milliseconds.

  • Before MongoDB 3.6.0-rc0, the argument isoWeekYear was incorrectly called isoYear, and isoWeek was incorrectly called isoWeekYear. They are now in line with the existing $isoWeekYear and $isoWeek operators.

And the following issues are going to be addressed in future versions (3.7.x):

  • Until SERVER-30547 gets resolved, $dateFromParts does not accept an sub-document as argument, and instead requires each single field to be specified.

  • Until SERVER-30523 gets resolved, the field values to dateFromParts can not underflow or overflow their expected range. For example, the day field's value needs to be in the range 1..31 and the hour field's value needs to be in the range 0..23.

Shortlink

This article has a short URL available: https://drck.me/mongotimelib-ddg

Comments

No comments yet

Detecting Problems With -fsanitize

In the past few months I have been working on adding time zone support to MongoDB's Aggregation Framework. This support brings in the timelib library that is also used in PHP and HHVM to do time zone calculations. One of the stages in our workflow before we commit code to master, is to put our patches up onto our continuous integration platform Evergreen, where tests are run against multiple platforms. You expect the usual Ubuntu, RHEL and Windows platforms, but we also run on more esoteric platforms like s390. We also define a few special platforms that run tests in an environment where the code has been compiled in a special mode to test for undefined behaviour in the C language, and memory leaks.

One of the issues that this found (quite quickly) in timelib, was:

src/third_party/timelib-2017.05beta3/parse_tz.c:152:16: runtime error:
  left shift of 128 by 24 places cannot be represented in type 'int'

Which referred to the following code:

buffer[i] = timelib_conv_int(buffer[i]);

timelib_conv_int is a macro defined as:

#define timelib_conv_int(l) ((l & 0x000000ff) << 24) + \
        ((l & 0x0000ff00) << 8) + ((l & 0x00ff0000) >> 8) + \
        ((l & 0xff000000) >> 24)

The sanitiser stumbled over some of the data in the Olson database where we attempted to shift the unsigned integer 128 left by 24 positions into an signed integer, which of course can not represent this value. Thanks to the sanitizer, beta4 has this problem now fixed.

As part of the fix, I investigated how our tests managed to figure out that there was undefined behaviour. It appeared that GCC and clang have a specific flag to enable this debugging tool. It is as simple as adding -fsanitize=undefined to your build flags, which is what timelib in its standalone Makefile now includes.

One of the things that is tricky is that when writing C++, I tend to use many C-isms as that is what I have been working in for so long. And the opposite is true to. C++-isms are sometimes used when dealing with the timelib library which is written in C. One of these issues created a memory leak (fixed through this patch), as a structure was not allocated on the heap, but on the stack. This structure (timelib_time*) sometimes contains an extra string (for the time zone abbreviate) that needs to be freed if set.

This memory leak was also discovered by a sanitizer flag, but this time it was -fsanitize=address. This flag adds code to the compiled binary to test for memory leaks, overflows, etc.—not to dissimilar as to what Valgrind (I wrote about that before) provides. After adding this flag to the default build rules for timelib, it quickly found a few other memory leaks in some test and example files which I then addressed.

So there we have it, two new GCC and Clang flags that I did not know about. I now always compile timelib, as well as my local MongoDB test builds with these two flags. It certainly slows down execution, but that's a cheap price to pay to prevent bugs from making it into production.

Shortlink

This article has a short URL available: https://drck.me/san-dcf

Comments

No comments yet

HHVM and MongoDB

At the start of 2015 we began work on an HHVM driver for MongoDB, as part of our project to renew our PHP driver. Back then, HHVM was in its ascendancy and outperforming PHP 5.6 two to one. With such a huge performance difference it was reasonable to assume that many users would be switching over.

Around the start of 2016 I wrote a series of blog posts when we released the brand new drivers for PHP and HHVM. However, by then, PHP 7.0 had been released. PHP 7's performance is on par with HHVM making it less useful for users to move from PHP to HHVM. HHVM still offers a more strongly typed PHP syntax through Hack, and some other features, but its main attraction, speed, was mostly gone.

Writing an extension for HHVM is very different than doing so for PHP. PHP extensions, of which there are plenty, are written in C. HHVM extensions are written in C++, with very few third party extensions existing. At the same time, although PHP's APIs are not particularly well documented, there is a large group of people to ask for help. PHP also has a clearly defined internal API which is stable across minor versions. HHVM does not have this, and APIs kept changing often, although it wasn't always clear whether we were using an internal API.

Writing an HHVM extension is a very different experience for extension developers compared to PHP extensions. There is even less documentation, and virtually no third party extensions to look at for "inspiration". At the same time, it was much harder to get help from the developers, and much harder to debug as HHVM is many times more complex than PHP.

With PHP 7 released, we saw very little use of the HHVM driver for MongoDB. Some months ago I did a twitter poll, where very few people were indicating that they were using HHVM—and even if they were, they would likely not choose to switch to HHVM given the current climate.

Some of the feedback on the poll was not very assuring either:

With few users, frequent API breaks, and curious bugs we came to the conclusion that supporting the HHVM driver for MongoDB no longer makes good use of our engineering time. With Doctrine and Symfony 4 also no longer supporting HHVM, we have decided to discontinue the MongoDB driver for HHVM.

If anyone is interested in assuming ownership of the HHVM driver, please drop me a line and we can discuss the process in more detail.

Shortlink

This article has a short URL available: https://drck.me/hhvm-mongo-da6

Comments

No comments yet

Life Line