HHVM and MongoDB

At the start of 2015 we began work on an HHVM driver for MongoDB, as part of our project to renew our PHP driver. Back then, HHVM was in its ascendancy and outperforming PHP 5.6 two to one. With such a huge performance difference, it was reasonable to assume that many users would be switching over.

Around the start of 2016 I wrote a series of blog posts when we released the brand new drivers for PHP and HHVM. However, by then, PHP 7.0 had been released. PHP 7's performance is on par with HHVM's, which makes moving from PHP to HHVM much less attractive. HHVM still offers a more strongly typed PHP syntax through Hack, and some other features, but its main attraction, speed, was mostly gone.

Writing an extension for HHVM is very different from writing one for PHP. PHP extensions, of which there are plenty, are written in C. HHVM extensions are written in C++, and very few third-party extensions exist. Although PHP's APIs are not particularly well documented, there is a large group of people to ask for help. PHP also has a clearly defined internal API that is stable across minor versions. HHVM does not have this: its APIs changed often, and it wasn't always clear whether we were using an internal API.

In practice, this means even less documentation, and virtually no third-party extensions to look at for "inspiration". It was also much harder to get help from the developers, and much harder to debug, as HHVM is many times more complex than PHP.

With PHP 7 released, we saw very little use of the HHVM driver for MongoDB. Some months ago I ran a Twitter poll in which very few people indicated that they were using HHVM, and even those who were would likely not choose to switch to HHVM in the current climate.

Some of the feedback on the poll was not very reassuring either:

With few users, frequent API breaks, and curious bugs, we came to the conclusion that supporting the HHVM driver for MongoDB no longer makes good use of our engineering time. With Doctrine and Symfony 4 also no longer supporting HHVM, we have decided to discontinue the MongoDB driver for HHVM.

If anyone is interested in assuming ownership of the HHVM driver, please drop me a line and we can discuss the process in more detail.

Shortlink

This article has a short URL available: https://drck.me/hhvm-mongo-da6

Comments

No comments yet

15 years of Xdebug

This article was going to be about some upcoming features in the 2.6 release. Or rather, I was hoping to announce at least a beta release of Xdebug 2.6. Unfortunately, I couldn't find enough time to work on all the issues that I wanted, although I've made a little progress.

What I can write about is a little mystery. About three weeks ago, I got a mysterious invitation to meet up with James Titcumb, right outside my (and James's!) favourite whisky shop in London, and I was told to bring some carrying capacity:

By then, I had worked out that with Xdebug's 15th anniversary looming, and my Amazon wishlist full of whisky, James would be kind enough to buy me something less "standard".

However, I had not quite expected what actually happened. On the day of the meeting, I saw some tweets about a little (secret) fundraiser. Apparently I wasn't supposed to know, but I suppose it is difficult to keep things secret. In any case, because I thought it had something to do with meeting James later that day, I didn't read much of it, as it would likely spoil the surprise.

And what a surprise it was!

So I show up at 5, and there is James with his phone, trying to figure out how Periscope works. We go in, and the manager has a story to tell about 8 quite amazing whiskies, which I then get to take home. By amazing, I mean amazing, special and rare whiskies from closed-down distilleries, and a few more approachable whiskies. I quickly realized that they are not, erm, cheap either:

  • Balvenie 21 Portwood

  • Dalmore King Alexander

  • Dallas Dhu 27

  • Springbank 25 - 2017 edition

  • Caperdonich 20 - (Demolished Distillery) 2016 release

  • Glenlochy 1979 - (Demolished Distillery) Rare Old Label 2016

  • St. Magdalene 26 - (Demolished Distillery) Rare Archive bottle from 2006

  • Banff 1974 18 year old 40% - (Demolished Distillery) Rare Archive bottle from 1992 Gordon & MacPhail

I have tried the Balvenie Portwood 21 and Dalmore King Alexander before, but certainly not the others!

So yeah, after borrowing a suitcase I managed to get these great bottles home, and while doing so, James explained what had happened. PHP's "godfather", Cal Evans, had originally intended to raise money to buy the most expensive bottle from his local whisk(e)y shop, at 4699 USD. James (luckily) managed to convince him that in the whisky world, price doesn't always equal quality. There is a bit of a limit at perhaps £125 for "normal" bottles, but of course quite a bit more for "rare" whiskies. They managed to raise slightly more, through the generous donations of many people and companies that find Xdebug useful. I saw the list, and there were many lovely messages in there as well, a few of which I am including here:

"XDebug is hugely important for our team. Thank you for all you have done!"

"var_dump($scotch);"

"Thanks Derick! Such a critical part of day-to-day PHP dev life =)"

"Xdebug has helped me solving numerous bugs, thanks!"

"Because every good developer knows that great code begins with great debugging tools!"

"xdebug helps me to not drink... ashnazg"

"Thank you for your efforts in the community!"

"Xdebug has made my development life immeasurably easier over the last 15 years. Here's to the next 15, and a massive thank you! :-)"

"For one of the most used and usefully tools in a professionals day to day live beside a good editor."

"It's people like you that make this such an amazing community! Thank you for giving yourself so freely for so long!"

"I can barely remember what coding was like before discovering XDebug. Thank you Derick for making our lives easier!"

Thanks for these lovely messages, and thanks for donating to my whisky fund, Aaron Saray, Accent Interactive, Adam Culp, Adam Kammeyer, Adrián Cárdenas, Alain Schlesser, Alex Ross, Alexander Marinov, Andreas Heigl, Andrew Caya, Andrew Millington, Antonis Pavlakis, Barry Hughes, Bart Reunes, Ben Ramsey, Bernhard Breytenbach, Bill Condo, Boone Gorges, Boyan Djumakov, Boyan Yordanov, Chris Brookins, Chris Hartjes, Chris Sherry, Chris Spruck, Chuck Burgess, Code4Hire Kft, Cristiano Diniz da Silva, Damien Seguy, Daniel Abernathy, Dave Hall, David Alger, David Lundgren, Diana Espino, Doug Johnson, Dougal Campbell, Enrico Zimuel, Eric Hogue, Ferenc Kovács, Fran Novo, Frank de Jonge, Frederic Dewinne, Freek Van der Herten, Gilbert Pellegrom, Goran Mitrovic, Gordon Forsythe, Guillaume Rossolini, Iain Poulson, Ian H, Ian Littman, J.T. Grimes, Jake Smith, Jakub Gadkowski, James LaChance, James Titcumb, Jeff Carouth, Jeff Kolesnikowicz, Jeff Rupert, Jeremy Emery, Jeremy Lindblom, Jeroen Boersma, Jeroen de Jong, JetBrains sro, Joey Fowler, Josh Butts, Josh Holmes, Joshua Thijssen, Juliette Reinders Folmer, Kara Ferguson, Kathryn Reeve, Ken Sherman, Kevin Schroeder, Lance Cleveland, Laura Folco, Liam Wiltshire, Lucas van Lierop, Luke Stokes, Mark Baker, Matt Trask, Matthew Weier O'Phinney, Max Griffin, Merlijn Tishauser, Michael Babker, Michael Butler, Michael Dyrynda, Michael Moussa, Michael Pearson, Michael Stowe, Michael Williams, Mihail Irintchev, Milan Popovic, Modern Tribe, Nate Ritter, Navarr Barnier, Nikolay Ignatov, Nils Preuss, Noah Heck, Omni Adams, Paul McGrane, Paul Sohier, Paul Yasi, Peter Breuls, Pádraic Brady, Rafael Dohms, Rich Sage, Richard Bairwell, Richard Hagen, Rob Allen, Robert Basic, Robert Landers, Rodrigo Capilé, Russell Barnhart, Ryan Weaver and Leanna Pelham, Samantha Quiñones, Sammy Powers, Sandy Smith, Scott Arciszewski, Sebastian Feldmann, Shaun Hare, SitePoint PHP Channel, Stefan Koopmanschap, Stephan Hochdörfer, Steve Grunwell, Steven Wade, Svetlozar Stoyanov, Team Enrise, Tim Stamp, Tom Cruickshank, Tom De Wit, Toni Vega, Ulf Wandschneider, Willem-Jan Zijderveld, Wim Godden, Youri Thielen, Zeke Farwell, and the anonymous donors!

What's next?

I'll be publishing the tasting notes for the whiskies on https://dram.io, one of the few places where I actually use Xdebug myself. I might not open all of them (yet) though.

And on the Xdebug front, there are plenty of bugs to fix, features to add for Xdebug 2.6, and undoubtedly Dmitry will be "breaking" some things in PHP 7.2 that I need to support in Xdebug as well.

Slàinte!

Shortlink

This article has a short URL available: https://drck.me/xdebug-10-d9k

Comments

I'm sorry to have missed hearing about the donation call, 'cause I guess that means I won't be getting any tastings!

Just wanted to say thank you :)

Cheers brother, mad respect and thanks for xdebug!!!

I also just wanted to say super thanks for this amazing tool called Xdebug. Enjoy the well deserved whiskeys ;-)

Thanks for XDebug. Makes life a lot more easier and enjoyable. I use it all the time to learn new stuff under the hood.

Good Bye PHP 5

A few days ago I merged a patch into Xdebug that removes support for PHP 5 in Xdebug's master branch on GitHub. Maintaining PHP 5 and PHP 7 support in one code base is not particularly easy, and even more complicated for something like Xdebug, with its deep interactions with PHP's internals.

As active support for PHP 5.6 ended on December 31st, I also felt that Xdebug no longer needed to support PHP 5. It saves more than 5000 lines of code:

Many people were quite positive about that:

Others were less keen:

Removing PHP 5 support from Xdebug's master branch does not mean that Xdebug suddenly stops working for PHP 5 installations. Xdebug 2.5, which was recently released, supports PHP 5.5 and 5.6, and is not going to go away.

Right now, Xdebug will no longer receive new features in the branch that also supports PHP 5. New features will only go into master (to become Xdebug 2.6). However, Xdebug 2.5 continues to receive bug fixes until Xdebug 2.6 comes out.

Once Xdebug 2.6 comes out, the Xdebug 2.5 branch will no longer get bug fixes, and hence support for PHP 5 goes away. That still does not mean that you can no longer use Xdebug with PHP 5. The releases of the 2.5 branch will still be available.

On the positive side, not having to implement lots of code twice also means that new features can be added faster, as less work is required. Xdebug 2.6 already has some new features lined up.

Shortlink

This article has a short URL available: https://drck.me/byephp5-d4v

Comments

No comments yet

Natural Language Sorting with MongoDB 3.4

Arranging English words in order is simple, most of the time: you just arrange them in alphabetical order. Sorting a set of German words, French words with all of their accents, or Chinese with its different characters is a lot harder than it looks. Sorting rules are specified through locales, which determine how accents are sorted, in which order the characters appear, and how to do case-insensitive sorting. A good set of these sorting rules is available through CLDR, and there is a neat example to play with all kinds of sorting at ICU's demo site. If you want to know how the algorithms work, have a look at the Unicode Consortium's report on the Unicode Collation Algorithm.
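A quick way to see why a plain alphabetical (byte-wise) sort is not enough: PHP's sort() with SORT_STRING compares the raw UTF-8 bytes. Because ø is encoded as the bytes 0xC3 0xB8, which are larger than any ASCII letter, bøhm always ends up after brown, whatever language you have in mind. A minimal sketch using only core PHP:

```php
<?php
// Plain byte-wise sorting: no locale rules are applied at all
$sorted = ['brown', 'bøhm', 'boffey'];
sort($sorted, SORT_STRING);       // compares raw UTF-8 bytes, like strcmp()
echo implode(' ', $sorted), "\n"; // boffey brown bøhm
```

That order happens to match Norwegian, but for English, bøhm belongs between boffey and brown, which is exactly what locale-aware collation provides.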

Years ago I wrote about collation and MongoDB. There is an old issue in MongoDB's JIRA tracker, SERVER-1920, to implement collation, so that sorting and indexing follow the sorting order of each language (locale).

Support for these collations has finally landed in MongoDB 3.4, and in this article we are going to have a look at how it works.

How Unicode Collation Works

Many programming languages have their own implementation of the Unicode Collation Algorithm, often through ICU. PHP has an ICU-based implementation in its intl extension, in the form of the Collator class.

The Collator class encapsulates the Unicode Collation Algorithm to allow you to sort an array of text yourself. It also allows you to visualise the "sort key" to see how the algorithm works:

Take for example the following array of words:

$dictionary = [
    'boffey', 'bøhm', 'brown',
];

Which we can turn into sort keys, and sort using the en locale (English):

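// Requires the intl extension
$collator = new Collator( 'en' );
$dictionaryWithKey = [];
foreach ( $dictionary as $word )
{
    // Sort keys are binary; hex-encoding them keeps their order and makes them printable
    $sortKey = $collator->getSortKey( $word );
    $dictionaryWithKey[ bin2hex( $sortKey ) ] = $word;
}

// SORT_STRING compares the keys byte by byte, which is how sort keys are compared
ksort( $dictionaryWithKey, SORT_STRING );
print_r( $dictionaryWithKey );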

Which outputs:

Array
(
    [2b4533333159010a010a] => boffey
    [2b453741014496060109] => bøhm
    [2b4b45554301090109] => brown
)

If we do the same with the nb (Norwegian Bokmål) locale, brown and bøhm swap places:

Array
(
    [2b4533333159010a010a] => boffey
    [2b4b45554301090109] => brown
    [2b5c6703374101080108] => bøhm
)

The sort key for bøhm has changed, and its value now makes it sort after brown instead of before it. In Norwegian, ø is a distinct letter that sorts after z.
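The idea of a sort key can be sketched without ICU: turn every character into a fixed-width weight, and compare the resulting strings byte by byte. In the toy below, toySortKey(), toySort() and both weight tables are hypothetical, and only the primary (base letter) level is modelled; ICU's real keys also encode accent and case levels, which is why they are longer. The English-like table gives ø the same weight as o, while the Norwegian-like table places æ, ø and å after z:

```php
<?php
// Toy sort keys; NOT ICU. Each character becomes a two-digit primary weight.
function toySortKey(string $word, array $weights): string
{
    $key = '';
    foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $key .= sprintf('%02d', $weights[$char] ?? 99); // unknown characters sort last
    }
    return $key;
}

// Order words by comparing their keys byte by byte
function toySort(array $words, array $weights): array
{
    usort($words, function (string $a, string $b) use ($weights): int {
        return strcmp(toySortKey($a, $weights), toySortKey($b, $weights));
    });
    return $words;
}

$latin = range('a', 'z');
// Hypothetical English-like table: ø shares the primary weight of o
$enWeights = array_flip($latin);
$enWeights['ø'] = $enWeights['o'];
// Hypothetical Norwegian-like table: æ, ø and å follow z as separate letters
$nbWeights = array_flip(array_merge($latin, ['æ', 'ø', 'å']));

$dictionary = ['boffey', 'bøhm', 'brown'];
echo implode(' ', toySort($dictionary, $enWeights)), "\n"; // boffey bøhm brown
echo implode(' ', toySort($dictionary, $nbWeights)), "\n"; // boffey brown bøhm
echo toySortKey('bøhm', $nbWeights), "\n";                 // 01270712
```

Just as with the Collator output, only the key for bøhm changes between the two tables, and that alone moves it past brown.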

MongoDB 3.4

Before the release of MongoDB 3.4, it was not possible to do a locale-based search. As case-insensitivity is just another property of a locale, that was not supported either. Many users worked around this by storing a lower case version of the value in a separate field, just to do a case-insensitive search. But this has now changed with the implementation of SERVER-1920.

In MongoDB 3.4 you may attach a default locale to a collection:

db.createCollection( 'dictionary', { collation: { locale: 'nb' } } );

A default locale is used for any query that does not specify a different locale itself. Compare the default (nb) locale:

> db.dictionary.find().sort( { word: 1 } );
{ "_id" : ObjectId("5846d65210d52027a50725f0"), "word" : "boffey" }
{ "_id" : ObjectId("5846d65210d52027a50725f1"), "word" : "brown" }
{ "_id" : ObjectId("5846d65210d52027a50725f2"), "word" : "bøhm" }

With the English (en) locale:

> db.dictionary.find().collation( { locale: 'en'} ).sort( { word: 1 } );
{ "_id" : ObjectId("5846d65210d52027a50725f0"), "word" : "boffey" }
{ "_id" : ObjectId("5846d65210d52027a50725f2"), "word" : "bøhm" }
{ "_id" : ObjectId("5846d65210d52027a50725f1"), "word" : "brown" }

The default locale of a collection is also inherited by an index when you create one:

db.dictionary.createIndex( { word: 1 } );

db.dictionary.getIndexes();
[
    …
    {
        "v" : 2,
        "key" : { "word" : 1 },
        "name" : "word_1",
        "ns" : "demo.dictionary",
        "collation" : {
            "locale" : "nb",
            "caseLevel" : false,
            "caseFirst" : "off",
            "strength" : 3,
            "numericOrdering" : false,
            "alternate" : "non-ignorable",
            "maxVariable" : "punct",
            "normalization" : false,
            "backwards" : false,
            "version" : "57.1"
        }
    }
]


From PHP

All the examples below use version 1.2.0 of the PHP driver for MongoDB and version 1.1.0 of the accompanying library. These are the minimum versions needed to work with locales.

To use the MongoDB PHP Library, you need to use Composer to install it, and include the Composer-generated autoloader to make the library available to the script. In short, that is:

composer require mongodb/mongodb:^1.1.0

And at the start of your script:

<?php
require 'vendor/autoload.php';

In this first example, we are going to drop the collection dictionary from the demo database, and create a collection with en as its default collation. We also create an index on the word field and insert a couple of words.

First the set-up and assigning of the database handle ($demo):

$client = new \MongoDB\Client();
$demo = $client->demo;

Then we drop the dictionary collection:

$demo->dropCollection( 'dictionary' );

We create a new collection dictionary and set the default collation for this collection to the en locale:

$demo->createCollection(
    'dictionary',
    [
        'collation' => [ 'locale' => 'en' ],
    ]
);
$dictionary = $demo->dictionary;

We create the index, and give it the name dictionary_en. MongoDB supports multiple indexes with the same key pattern, as long as they have different names and different collations (e.g. locale, or locale options):

$dictionary->createIndex(
    [ 'word' => 1 ],
    [ 'name' => 'dictionary_en' ]
);

And then we insert some words:

$dictionary->insertMany( [
    [ 'word' => 'beer' ],
    [ 'word' => 'Beer' ],
    [ 'word' => 'côte' ],
    [ 'word' => 'coté' ],
    [ 'word' => 'høme' ],
    [ 'word' => 'id_12' ],
    [ 'word' => 'id_4' ],
    [ 'word' => 'Home' ],
] );

When doing a query, you can specify the locale for that operation. Only one locale can be used for a single operation, which means that MongoDB uses the same locale for the find and the sort parts of a query. We do intend to add more granular support for using collations on different parts of an operation; this is tracked in SERVER-25954.

Using the Default Locale

Let's do a query while sorting with the en locale. Because this is the default locale for this collection, we don't have to specify it. We also define a helper function to show the result of this query, and further queries:

function showResults( string $name, \MongoDB\Driver\Cursor $results )
{
    echo $name, ":\n";
    foreach( $results as $result )
    {
        echo $result->word, " ";
    }
    echo "\n\n";
}

showResults(
    "Sort with default locale",
    $dictionary->find( [], [ 'sort' => [ 'word' => 1 ] ] )
);

This outputs:

Sort with default locale:
beer Beer coté côte Home høme id_12 id_4


Only the Base Character

There are many variants of locales. The strength option defines the number of levels that are used to perform a comparison of characters. At strength=1, only base characters are compared. This means that with the en locale: beer == Beer, coté == côte, and Home == høme.
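As a rough mental model of these levels, here is a toy equality check in plain PHP. It is only a sketch: real collation uses ICU's weight tables, whereas toyEquals() and its accent-folding table (covering just ø, ô and é from this article's word list) are hypothetical. Strength 1 ignores both case and accents, strength 2 ignores only case, and strength 3 requires both to match:

```php
<?php
// Toy model of collation strength; NOT ICU. Hypothetical accent folding
// that only covers the accented letters used in this article.
const TOY_ACCENT_FOLD = ['ø' => 'o', 'ô' => 'o', 'é' => 'e'];

function toyEquals(string $a, string $b, int $strength): bool
{
    if ($strength >= 3) {
        return $a === $b;              // case and accents must both match
    }
    // Strengths 1 and 2 ignore case (ASCII strtolower is enough for these words)
    $a = strtolower($a);
    $b = strtolower($b);
    if ($strength === 1) {             // strength 1 also ignores accents
        $a = strtr($a, TOY_ACCENT_FOLD);
        $b = strtr($b, TOY_ACCENT_FOLD);
    }
    return $a === $b;
}

$words = ['beer', 'Beer', 'côte', 'coté', 'høme', 'id_12', 'id_4', 'Home'];
foreach ([1, 2, 3] as $strength) {
    $matches = array_filter($words, function (string $word) use ($strength): bool {
        return toyEquals($word, 'home', $strength);
    });
    echo "strength $strength: ", implode(' ', $matches), "\n";
}
```

Searching for home gives høme and Home at strength 1, only Home at strength 2, and nothing at strength 3, which mirrors the MongoDB queries in this section.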

You can specify the strength for each query. First we use the en locale and strength 1. This is equivalent to a case-insensitive match:

showResults(
    "Match on base character only",
    $dictionary->find(
        [ 'word' => 'beer' ],
        [ 'collation' => [ 'locale' => 'en', 'strength' => 1 ] ]
    )
);

Which outputs:

Match on base character only:
beer Beer

Strength 1 also ignores accents on characters, such as in:

showResults(
    "Match on base character only, ignoring accents",
    $dictionary->find(
        [ 'word' => 'home' ],
        [ 'collation' => [ 'locale' => 'en', 'strength' => 1 ] ]
    )
);

Which outputs:

Match on base character only, ignoring accents:
høme Home

Because strength, like any of the other options we will see later, changes the sort key for a string, MongoDB will only use an index if it was created with exactly the same locale options as the query.

The only index on word uses the default en locale, so none of the other examples use an index while matching or sorting. If you want an indexed lookup for the en/strength=1 example, you need to create an index with:

$dictionary->createIndex(
    [ 'word' => 1 ],
    [
        'name' => 'word_en_strength1',
        'collation' => [
            'locale' => 'en',
            'strength' => 1
        ],
    ]
);

Different Locales, Different Letters

Not every language considers an accented character a variant of the original base character. If we run the last example with the Norwegian Bokmål (nb) locale we get a different result:

showResults(
    "Match on base character only (nb locale)",
    $dictionary->find(
        [ 'word' => 'home' ],
        [ 'collation' => [ 'locale' => 'nb', 'strength' => 1 ] ]
    )
);

Which outputs:

Match on base character only (nb locale):
Home

In Norwegian, the ø sorts as a distinct letter after z, where the alphabet ends with: y z æ ø å.

Sorting Accents

Strength 2 takes into account accents on letters while matching and sorting. If we run the match on home in the English locale with strength 2, we get:

showResults(
    "Match on base character with accents",
    $dictionary->find(
        [ 'word' => 'home' ],
        [ 'collation' => [ 'locale' => 'en', 'strength' => 2 ] ]
    )
);

Which outputs:

Match on base character with accents:
Home

The word høme is no longer included. However, the case of characters is still not considered:

showResults(
    "Match on base character with accents (and not case sensitive)",
    $dictionary->find(
        [ 'word' => 'beer' ],
        [ 'collation' => [ 'locale' => 'en', 'strength' => 2 ] ]
    )
);

Which outputs:

Match on base character with accents (and not case sensitive):
beer Beer

Again, more fun can be had while sorting with accents, because languages do things differently. If we take the words côte and coté, we see a difference in sorting between the fr (French) and fr_CA (Canadian French) locales:

showResults(
    "Sorting accents in French (France)",
    $dictionary->find(
        [ 'word' => new \MongoDB\BSON\Regex( '^c' ) ],
        [
            'collation' => [ 'locale' => 'fr', 'strength' => 2 ],
            'sort' => [ 'word' => 1 ],
        ]
    )
);

showResults(
    "Sorting accents in Canadian French",
    $dictionary->find(
        [ 'word' => new \MongoDB\BSON\Regex( '^c' ) ],
        [
            'collation' => [ 'locale' => 'fr_CA', 'strength' => 2 ],
            'sort' => [ 'word' => 1 ],
        ]
    )
);

Which outputs:

Sorting accents in French (France):
coté côte

Sorting accents in Canadian French:
côte coté

In Canadian French, accents are compared from the back of the word to the front. This is called Backward Secondary Sorting, and is an option you can set on any locale-based query. Some locales have different default values for these options. To make Canadian French sort the "wrong" way, we can specify the additional backwards option:

showResults(
    "Sorting accents in Canadian French, the 'wrong' way",
    $dictionary->find(
        [ 'word' => new \MongoDB\BSON\Regex( '^c' ) ],
        [
            'collation' => [ 'locale' => 'fr_CA', 'strength' => 2, 'backwards' => false ],
            'sort' => [ 'word' => 1 ],
        ]
    )
);

Which outputs:

Sorting accents in Canadian French, the 'wrong' way:
coté côte
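To make the difference between forward and backward secondary sorting concrete, here is a toy two-level comparison. It is not how ICU is implemented: the accent weights (0 for none, 1 for acute, 2 for circumflex) and the helpers toyLevelKeys() and toyFrenchSort() are made up for this sketch. Base letters are compared first; only when they are equal are the accent weights compared, either from the start of the word or, with backwards enabled, from the end:

```php
<?php
// Toy two-level collation; NOT ICU. Split a word into a primary key
// (base letters) and a secondary key (hypothetical accent weights).
function toyLevelKeys(string $word): array
{
    $accents = ['é' => ['e', 1], 'ô' => ['o', 2]]; // 0 = no accent
    $primary = '';
    $secondary = '';
    foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        list($base, $accent) = $accents[$char] ?? [$char, 0];
        $primary .= $base;
        $secondary .= (string) $accent;
    }
    return [$primary, $secondary];
}

function toyFrenchSort(array $words, bool $backwards): array
{
    usort($words, function (string $a, string $b) use ($backwards): int {
        list($primaryA, $secondaryA) = toyLevelKeys($a);
        list($primaryB, $secondaryB) = toyLevelKeys($b);
        if ($primaryA !== $primaryB) {
            return strcmp($primaryA, $primaryB); // base letters decide first
        }
        if ($backwards) {
            // Backward secondary: compare accents starting from the end of the word
            $secondaryA = strrev($secondaryA);
            $secondaryB = strrev($secondaryB);
        }
        return strcmp($secondaryA, $secondaryB);
    });
    return $words;
}

echo implode(' ', toyFrenchSort(['côte', 'coté'], false)), "\n"; // coté côte
echo implode(' ', toyFrenchSort(['coté', 'côte'], true)), "\n";  // côte coté
```

Both words share the primary key cote, so the accents decide: read from the front, the plain o in coté wins; read from the back, the plain e in côte wins.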

Interesting Locales

There are a few other interesting sorting and matching methods in different locales.

  • In Germany's phone book collation, an ö (as in böse) sorts like oe.

  • In Russian, the Cyrillic letters sort before Latin letters.

  • In Sweden's "standard" collation, the v and w are considered equivalent letters.

As an example:

// Use a separate collection, so that the dictionary collection keeps the
// words (such as id_4 and id_12) that the examples further down rely on
$demo->dropCollection( 'words' );
$words = $demo->words;

$words->insertMany( [
    [ 'word' => 'swag' ],
    [ 'word' => 'Boden' ],
    [ 'word' => 'böse' ],
    [ 'word' => 'Bogen' ],
    [ 'word' => 'sverre' ],
    [ 'word' => 'Валенти́на' ],
    [ 'word' => 'Ю́рий' ],
] );

$locales = [
    'de',
    'de@collation=phonebook',
    'ru',
    'sv@collation=standard',
];

foreach( $locales as $locale )
{
    showResults(
        "Sorting with the '$locale' locale",
        $words->find(
            [],
            [
                'collation' => [ 'locale' => $locale, 'strength' => 2 ],
                'sort' => [ 'word' => 1 ]
            ]
        )
    );
}

Which outputs:

Sorting with the 'de' locale:
Boden Bogen böse sverre swag Валенти́на Ю́рий

Sorting with the 'de@collation=phonebook' locale:
Boden böse Bogen sverre swag Валенти́на Ю́рий

Sorting with the 'ru' locale:
Валенти́на Ю́рий Boden Bogen böse sverre swag

Sorting with the 'sv@collation=standard' locale:
Boden Bogen böse swag sverre Валенти́на Ю́рий

Please also note that I had to set strength to 2 here, as Germans like capitalizing their nouns as well as names!
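The Latin-script tailorings above can be imitated, very loosely, by rewriting characters before comparing. The sketch below does that with strtr(); toyTailoredSort() and the three rewrite tables are hypothetical and capture only the single rule each bullet point describes, not the full ICU tailorings (the Cyrillic ordering is left out):

```php
<?php
// Toy tailorings; NOT ICU. Each hypothetical tailoring is a set of string
// rewrites applied to a lowercased copy of each word before a byte-wise compare.
function toyTailoredSort(array $words, array $rewrites): array
{
    usort($words, function (string $a, string $b) use ($rewrites): int {
        return strcmp(strtr(strtolower($a), $rewrites), strtr(strtolower($b), $rewrites));
    });
    return $words;
}

$germanPrimary   = ['ö' => 'o'];  // ö treated as a variant of o
$germanPhonebook = ['ö' => 'oe']; // phone book: ö expands to oe
$swedishVW       = ['w' => 'v'];  // v and w are equivalent letters

$german = ['Boden', 'böse', 'Bogen'];
echo implode(' ', toyTailoredSort($german, $germanPrimary)), "\n";          // Boden Bogen böse
echo implode(' ', toyTailoredSort($german, $germanPhonebook)), "\n";        // Boden böse Bogen
echo implode(' ', toyTailoredSort(['sverre', 'swag'], $swedishVW)), "\n";   // swag sverre
```

The expansion is the interesting part: once böse is compared as boese, the e decides its position between Boden and Bogen.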

Other Options

The default strength is 3, which besides base character and accents, also takes the case into account. A search for beer will no longer find Beer (☹).

But there are a few other things you can configure with locales. If you paid attention, you saw that my word list includes id_4 and id_12. If you sort this in the normal default order, you will see the following:

showResults(
    "Sorting with numbers in strings",
    $dictionary->find(
        [ 'word' => new \MongoDB\BSON\Regex( '^id_' ) ],
        [ 'sort' => [ 'word' => 1 ] ]
    )
);

Which outputs:

Sorting with numbers in strings:
id_12 id_4

In order to fix that, you can set the numericOrdering option on the locale, as is done here:

showResults(
    "Sorting with numbers in strings, properly",
    $dictionary->find(
        [ 'word' => new \MongoDB\BSON\Regex( '^id_' ) ],
        [
            'collation' => [ 'locale' => 'en', 'numericOrdering' => true ],
            'sort' => [ 'word' => 1 ],
        ]
    )
);

Which then outputs:

Sorting with numbers in strings, properly:
id_4 id_12
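PHP has a loosely comparable feature of its own: the SORT_NATURAL flag (the same "natural order" used by natsort() and strnatcmp()) compares runs of digits as numbers. It is a different algorithm from ICU's numericOrdering and is shown here only as an analogy, on the same two identifiers:

```php
<?php
// PHP's natural ordering, used only as an analogy for numericOrdering
$ids = ['id_12', 'id_4'];

$plain = $ids;
sort($plain, SORT_STRING);    // character by character: '1' < '4'
$natural = $ids;
sort($natural, SORT_NATURAL); // digit runs compared as numbers: 4 < 12

echo implode(' ', $plain), "\n";   // id_12 id_4
echo implode(' ', $natural), "\n"; // id_4 id_12
```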

Other options are also available, and are documented in the Collation section of the MongoDB manual.

Conclusion

Languages and language sorting are complex. In the examples above I have only shown collations with Western Latin and Cyrillic characters. Asian languages make searching and sorting even more complicated: with Japanese and Chinese characters, for example, there are different ways of determining their sort order. But sorting strings and matching search phrases correctly is very important for the usability of applications. Because of that, the implementation of SERVER-1920 is a very welcome addition to MongoDB. The implementation in MongoDB supports every locale and variant that ICU supports. A list of these locales with their identifiers can be found in the documentation.

Further work on collation support is also expected. To track issues and vote for them, please refer to the list on JIRA.

Shortlink

This article has a short URL available: https://drck.me/mdbcoll34-cqh

Comments

This is very insightful, thanks for taking the time to write this!

Thanks for the insightful info presented here. I have learnt something new today because of this blog article. Thanks again. Will be delighted to read more.

Not Finding the Symbols

Yesterday we released the new version of the MongoDB Driver for PHP, to coincide with the release of MongoDB 3.4. Not long after that, we received an issue through GitHub titled "Undefined Symbol php_json_serializable_ce in Unknown on Line 0".

TL;DR: Load the JSON extension before the MongoDB extension.

The newly released version of the driver has support for PHP's json_encode() through the JsonSerializable interface, to convert some of our internal BSON types (think MongoDB\BSON\Binary and MongoDB\BSON\UTCDateTime) directly to JSON. For this it uses functionality in PHP's JSON extension, and with that the php_json_serializable_ce symbol that this extension defines.
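To illustrate the mechanism the driver hooks into, here is a minimal sketch of a userland class implementing JsonSerializable. DemoTimestamp and its {"$date": milliseconds} output shape are hypothetical; they are not the driver's actual UTCDateTime serialization. json_encode() calls jsonSerialize() and encodes whatever it returns. A C extension declares the same interface through the JSON extension's php_json_serializable_ce class entry, which is why the driver needs that symbol:

```php
<?php
// DemoTimestamp is a hypothetical class; it is not MongoDB\BSON\UTCDateTime,
// and the {"$date": ...} shape is only an illustration.
final class DemoTimestamp implements JsonSerializable
{
    private $milliseconds;

    public function __construct(int $milliseconds)
    {
        $this->milliseconds = $milliseconds;
    }

    // json_encode() calls this method and encodes whatever it returns
    public function jsonSerialize(): array
    {
        return ['$date' => $this->milliseconds];
    }
}

// On a 64-bit build this prints {"created":{"$date":1480000000000}}
echo json_encode(['created' => new DemoTimestamp(1480000000000)]), "\n";
```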

We run our test suite on many different distributions, but (nearly) always with our own compiled PHP binaries, as we need to support so many versions of PHP (5.4-5.6, 7.0, and now 7.1) in various configurations (ZTS or not; 32-bit or 64-bit). It therefore came as quite a surprise that a self-compiled extension would not load for one of our users.

When compiling PHP from source, the JSON extension becomes part of the binary by default. This means that the JSON extension, and the symbols it implements, are always available. Linux distributions, however, often split each extension out into its own package or shared object. Debian has php5-json (on which php5-cli depends), while Fedora has php-json. In order to make use of the JSON extension, you therefore need to install a separate package that provides the shared object (json.so) and a configuration file. Fedora installs the 20-json.ini file in /etc/php.d/. Debian installs the 20-json.ini file in /etc/php5/mods-available with a symlink to /etc/php5/cli/conf.d/20-json.ini. In both cases, they include the required extension=json.so line that instructs PHP to load the shared object and make its symbols (and PHP functions) available.

A normal PHP binary uses the dlopen() function to load a shared object, with the RTLD_LAZY flag. This flag means that symbols (such as php_json_serializable_ce) are only resolved lazily, when they are first used. This is important, because PHP extensions, and the shared objects they live in, can depend on each other. The MongoDB extension depends on date, spl and json. After PHP has loaded all the shared extensions, it registers the classes and functions contained in them in an order that satisfies this dependency graph. PHP makes sure that the classes and functions in the JSON extension are registered before those of the MongoDB extension, so that when the latter uses the php_json_serializable_ce symbol to declare that the MongoDB\BSON\UTCDateTime class implements the JsonSerializable interface, the symbol is already available.

Distributions often want to harden their provided packages with additional security features. For that, they compile binaries with additional features and flags.

Debian patches PHP to replace the RTLD_LAZY flag with RTLD_NOW. Instead of resolving symbols when they are first used, this tells dlopen() to resolve the symbols when the shared object is loaded. This means that if the MongoDB extension is loaded before the JSON extension, the symbols are not available yet, and loading fails with the "Undefined Symbol php_json_serializable_ce in Unknown on Line 0" error from our bug report. This problem is not unique to PHP; TCL, for example, has similar issues.

With Fedora, the same issue is present, but it shows up through slightly different means. Instead of patching PHP to replace RTLD_LAZY with RTLD_NOW, Fedora uses linker flags ("-Wl,-z,relro,-z,now") to force binaries to resolve all symbols, process-wide, as soon as they are loaded. This "Built with BIND_NOW" security feature goes hand in hand with "Built with RELRO", and Fedora's wiki explains well why these features are enabled. Previously, this also exposed an issue with an internal PHP API for creating a DateTime object.

But where does this leave us? The solution is fairly simple: you need to make sure that the JSON extension's shared object is loaded before the MongoDB extension's shared object. PECL's pecl install suggests adding the extension=mongodb.so line to the end of php.ini. On Debian, it is much better to put the extension=mongodb.so line in a separate mongodb.ini file under /etc/php5/mods-available with a priority of 99, and let php5enmod create the symlinks /etc/php5/cli/conf.d/99-mongodb.ini and /etc/php5/apache2/conf.d/99-mongodb.ini:

cat << EOF > /etc/php5/mods-available/mongodb.ini
; priority=99
extension=mongodb.so
EOF
php5enmod mongodb

On Fedora, you should add the extension=mongodb.so line to the new file /etc/php.d/50-mongodb.ini:

echo "extension=mongodb.so" > /etc/php.d/50-mongodb.ini
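The numeric prefixes matter because, as far as I understand PHP's start-up, it first reads php.ini and then the files in its scan directory (such as /etc/php.d/ or conf.d/) in alphabetical order, and it loads extensions in the order in which it encounters their extension= lines. The helpers extensionLoadOrder() and jsonLoadedFirst() below are hypothetical and only model that assumption on in-memory strings; they do not touch the real configuration. They also show why appending the line to php.ini does not help: php.ini is read before 20-json.ini.

```php
<?php
// Hypothetical model of PHP's ini reading order (an assumption, see above):
// php.ini first, then scan-directory files sorted alphabetically by name.
function extensionLoadOrder(string $phpIni, array $scanDirFiles): array
{
    ksort($scanDirFiles, SORT_STRING); // 20-json.ini before 99-mongodb.ini
    $order = [];
    foreach (array_merge([$phpIni], array_values($scanDirFiles)) as $contents) {
        foreach (explode("\n", $contents) as $line) {
            // Lines starting with ';' are comments and never match
            if (preg_match('/^\s*extension\s*=\s*"?([^";\s]+)"?/', $line, $m)) {
                $order[] = $m[1];
            }
        }
    }
    return $order;
}

// True when json.so appears before mongodb.so in the predicted order
function jsonLoadedFirst(array $order): bool
{
    $json = array_search('json.so', $order, true);
    $mongodb = array_search('mongodb.so', $order, true);
    return $json !== false && $mongodb !== false && $json < $mongodb;
}

$peclStyle = extensionLoadOrder("extension=mongodb.so\n", ['20-json.ini' => "extension=json.so\n"]);
$debianStyle = extensionLoadOrder('', [
    '99-mongodb.ini' => "; priority=99\nextension=mongodb.so\n",
    '20-json.ini'    => "; priority=20\nextension=json.so\n",
]);

var_dump(jsonLoadedFirst($peclStyle));   // bool(false)
var_dump(jsonLoadedFirst($debianStyle)); // bool(true)
```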

Alternatively, you can install the distribution's package for the MongoDB extension. Fedora currently has the updated 1.2.0 release for Rawhide (Fedora 26). Debian, however, does not provide a package for the latest release yet, although an older version (1.1.7) is available in Debian unstable. At the time of writing, Ubuntu only provides older versions for Xenial and Yakkety.

Shortlink

This article has a short URL available: https://drck.me/undefsym-cpv

Comments

Adding "extension=mongodb.so" at the end of php.ini didn't solve the issue but 99-mongodb.ini did! Thanks! BTW $ uname -a Linux titanic 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux

Made new file mongodb.ini

added 2 lines ; priority=99 extension=mongodb.so

& run command phpenmod mongodb

This worked for me. I had to remove extension=mongodb.so from php.ini file

On Ubuntu 14 & php 7 ( Working on Vagrant )

Thanks for the info, I tried just sticking extension=json.so one line above in the php.ini and that worked for me

I was using a standard AWS image. And the following worked for me: # Don't add extension to /etc/php-5.6.ini echo "extension=mongodb.so" > /etc/php.d/50-mongodb.ini echo "extension=mongo.so" > /etc/php.d/50-mongo.ini

Life Line