Using the Right Debugging Tools
A while ago, we updated the MongoDB PHP driver's embedded C library to a new version. The C library handles most of the connection management and other low level tasks that the PHP driver needs to successfully talk to MongoDB deployments, especially in replicated environments where servers might disappear for maintenance or hardware failures.
After upgrading the C library from 1.12 to 1.13 we noticed that one of the PHP driver's tests was failing:
derick@singlemalt:~/dev/php/derickr-mongo-php-driver $ make test TESTS=tests/atlas.phpt … ==================================================================== FAILED TEST SUMMARY --------------------------------------------------------------------- Atlas Connectivity Tests [tests/atlas.phpt] =====================================================================
When running the test manually, we get:
derick@singlemalt:~/dev/php/mongodb-mongo-php-driver $ php tests/atlas.phpt
--TEST--
Atlas Connectivity Tests
--SKIPIF--
skip Atlas tests not wanted
--FILE--
PASS
mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cluster.c:1852 mongoc_cluster_fetch_stream_single():
precondition failed: scanner_node && !scanner_node->retired
Aborted
That was not good news.
The atlas.phpt test tests whether the PHP driver (through the C driver) can connect to a set of different deployments of Atlas, MongoDB's Database as a Service platform. The test makes sure we can talk to an Atlas replica set, a sharded cluster, a free tier replica set, as well as TLS 1.1 and TLS 1.2 specific deployments. The test started failing when connecting to the second provided URI (the sharded cluster).
At first I thought this was caused by the upgrade from version 1.12 to 1.13 of the C driver, but that didn't end up being the case. Let's see how we got to finding and fixing this bug.
First of all, I started GDB to see where it was failing:
derick@singlemalt:~/dev/php/mongodb-mongo-php-driver $ gdb --args php tests/atlas.phpt
GNU gdb (Debian 8.1-4+b1) 8.1
…
Reading symbols from php...done.
(gdb) run
Starting program: /usr/local/php/7.3dev/bin/php tests/atlas.phpt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
--TEST--
Atlas Connectivity Tests
--FILE--
PASS
mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cluster.c:1852 mongoc_cluster_fetch_stream_single():
precondition failed: scanner_node && !scanner_node->retired
Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
This shows that the C driver bailed out due to a specific assertion on line 1852 of mongoc-cluster.c. This assertion reads:
BSON_ASSERT (scanner_node && !scanner_node->retired);
Which didn't really say a lot. The next thing to try is then to make a backtrace in GDB with the bt command. This revealed:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff41472f1 in __GI_abort () at abort.c:79
#2 0x00007ffff32cdcf1 in mongoc_cluster_fetch_stream_single (
cluster=0x555556ba8e68, server_id=2, reconnect_ok=true, error=0x555556b95858)
at mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cluster.c:1852
#3 0x00007ffff32cd9df in _mongoc_cluster_stream_for_server (
cluster=0x555556ba8e68, server_id=2, reconnect_ok=true, cs=0x0, reply=0x7fffffff9ee0, error=0x555556b95858)
at mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cluster.c:1762
#4 0x00007ffff32cdbe6 in mongoc_cluster_stream_for_server (
cluster=0x555556ba8e68, server_id=2, reconnect_ok=true, cs=0x0, reply=0x7fffffff9ee0, error=0x555556b95858)
at mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cluster.c:1826
#5 0x00007ffff32ddb0a in _mongoc_cursor_fetch_stream (cursor=0x555556b95700)
at mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cursor.c:647
#6 0x00007ffff32e12a0 in _prime (cursor=0x555556b95700)
at mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cursor-find.c:40
#7 0x00007ffff32df22c in _call_transition (cursor=0x555556b95700)
at mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cursor.c:1146
#8 0x00007ffff32df54b in mongoc_cursor_next (cursor=0x555556b95700, bson=0x7fffffffa038)
at mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cursor.c:1214
#9 0x00007ffff332511c in phongo_cursor_advance_and_check_for_error (cursor=0x555556b95700)
at mongo-php-driver/php_phongo.c:742
#10 0x00007ffff33253d9 in phongo_execute_query (
client=0x555556ba8e60, namespace=0x7fffeaeb5d58 "test.test", zquery=0x7ffff38201c0, options=0x0,
server_id=1, return_value=0x7ffff38200f0, return_value_used=1)
at mongo-php-driver/php_phongo.c:810
#11 0x00007ffff3342cc3 in zim_Manager_executeQuery (execute_data=0x7ffff3820160, return_value=0x7ffff38200f0)
at mongo-php-driver/src/MongoDB/Manager.c:492
#12 0x0000555555e41d16 in execute_internal (execute_data=0x7ffff3820160, return_value=0x7ffff38200f0)
at /home/derick/dev/php/php-src.git/Zend/zend_execute.c:2328
…
At first glance, I couldn't really see anything wrong with this backtrace, and was still puzzled as to why it would abort. I decided to go for a lunch time walk and have a look at it again. I always find that walks are good for clearing my mind.
After the walk, and a cuppa tea, I looked at the backtrace again, and noticed the following curiosity:
#4 0x00007ffff32cdbe6 in mongoc_cluster_stream_for_server (
cluster=0x555556ba8e68, server_id=2, reconnect_ok=true, cs=0x0, reply=0x7fffffff9ee0, error=0x555556b95858)
at mongo-php-driver/src/libmongoc/src/libmongoc/src/mongoc/mongoc-cluster.c:1826
vs:
#10 0x00007ffff33253d9 in phongo_execute_query (
client=0x555556ba8e60, namespace=0x7fffeaeb5d58 "test.test", zquery=0x7ffff38201c0, options=0x0,
server_id=1, return_value=0x7ffff38200f0, return_value_used=1)
at mongo-php-driver/php_phongo.c:810
In frame #10 the server_id variable is 1, whereas in frame #4 later on, the server_id variable is 2. These should have been the same.
The server ID is determined by the C driver when selecting a server to send a read or write operation to, and refers to a specific server's connection ID. The PHP driver uses this server ID when executing the query through the phongo_execute_query function, which calls the C driver's mongoc_collection_find_with_opts. The latter accepts as its 3rd argument a bson_t value with options to use while executing a query. These options include the pre-selected server ID, so that the C driver does not attempt to reselect a server:
cursor = mongoc_collection_find_with_opts(collection, query->filter, query->opts,
phongo_read_preference_from_zval(zreadPreference TSRMLS_CC));
I decided to investigate which options the PHP driver was sending to mongoc_collection_find_with_opts. A while ago I developed a GDB helper function, about which I wrote in pretty-printing BSON. I sourced this helper within my GDB instance, and switched to frame #10 to inspect the value of the query options:
(gdb) source ~/dev/php/mongodb-mongo-php-driver/src/libmongoc/.gdbinit (gdb) frame 10
The function call uses the options from the query struct query->opts, so I used the printbson helper function to display its contents:
(gdb) printbson query->opts
Which showed:
$11 = "!\000\000\000\020serverId\000\002\000\000\000\020serverId\000\001", '\000' <repeats 90 times>
INLINE (len=33)
{
'serverId' : NumberInt("2"),
'serverId' : NumberInt("1")
}
There are not supposed to be two conflicting serverId elements. Unlike PHP's arrays, bson_t values can have the same key appear multiple times. Although the C driver had selected server ID 1 for this query, server 2 was used because it was the first serverId element in the options struct. But why were there two values in the first place?
If you look at the PHP test, you see the following:
<?php
$urls = explode("\n", file_get_contents('.travis.scripts/atlas-uris.txt'));
…
$query = new \MongoDB\Driver\Query([]);
foreach ($urls as $url) {
…
try {
$m = new \MongoDB\Driver\Manager($url);
…
iterator_to_array($m->executeQuery('test.test', $query));
…
} catch(Exception $e) {
…
}
}
?>
From this follows that we create the Query object, assign it to $query, and then use the same variable for each iteration. Somehow, we were not resetting the query options back to default before we used them, resulting in a duplicate serverId field. Once we figured out the problem, creating the fix was easy enough: Make sure we use a clean query->opts struct before we pass it to the mongoc_collection_find_with_opts function.
Debugging this issue was made a lot easier by having the right debugging tools, and this case shows that spending time on writing the GDB helper function printbson earlier in the year paid off. With this bug fixed, we could release a new patch version of the MongoDB Driver for PHP.
Happy hacking!
Life Line
@beasts Is your ns1 doing OK? I am getting warning from monitoring that it is timing out and having different results than ns2.
Updated a restaurant
I walked 8.5km in 1h50m50s
I consider myself a decent photographer, but then you find out about the likes of Angel Fux (https://www.angelfux.com/portfolio), and I would like to crawl into a hole.
Better alignment from multiple GPS tracks for QE II Garden (and removed duplicated node)
After two busy days doing elections stuff, I spent yesterday looking at birds, in a nature reserve to relax and recharge.
I saw some cracking birds, with these being my favourite shots.
#BirdPhotography #photography #BirdsOfMastodon #Nature #London
I walked 8.0km in 1h26m01s
Updated a restaurant
Updated 3 bird_hides; Confirmed 2 informations and a toilet
I walked 7.0km in 3h10m47s
This map was mostly red yesterday.
I'm glad we managed to win all the wards we've targeted.
However, no overall control, and two parties with the same amount of seats in second place.
I walked 2.2km in 41m46s
I walked 1.9km in 19m17s
I've spend most of my time outside polling telling today, taking down poll card numbers so we don't need to chase people up to go vote.
I have made several observations:
Updated a restaurant
I walked 12.1km in 2h3m04s
I walked 1.1km in 9m46s
I walked 1.2km in 10m31s
Updated a waste_basket
Created 11 benches, 2 life_rings, and 3 other objects; Updated 8 benches and a waste_basket; Deleted a bench and a log; Confirmed a cafe
I walked 7.8km in 1h52m27s
Tomorrow we have elections in the UK!
Lots of local authorities, all London Councils, the Welsh Senedd, and the Scottish Parliament.
Don't forget to vote if you have the right.
I get to vote for myself again 😎.
Benches, and corrections for the QE II Gardens
Addresses on College Road
Created 2 main entrances and an entrance; Updated an entrance, a residential building, and a house building



Shortlink
This article has a short URL available: https://drck.me/dbgtools-e7g