Last week I listened to an episode of The Sceptics' Guide to the Universe where the word of the week was "analemma". An analemma is a diagram showing the position of the Sun in the sky over the course of a year, as viewed at a fixed time of day from the same location on Earth. When I was still living in Norway, I once tried to make such a diagram from a series of photos, but the weather wasn't consistent enough to make that work.

But as I am currently starting to update the Guide to Date and Time Programming for a second edition, I was wondering whether I could create an analemma from existing PHP functions. Unfortunately, PHP only provides functionality to calculate when the Sun is at its highest point, through date_sun_info():

$sunInfo = date_sun_info(
        (new DateTimeImmutable())->getTimestamp(), // Unix timestamp
        51.53,                                     // latitude
        -0.19                                      // longitude
);

$zenith = new DateTimeImmutable( "@{$sunInfo['transit']}" );
echo $zenith->format( DateTimeImmutable::ISO8601 ), "\n";

On February 26th, that was at 2018-02-26T12:13:38+0000 in London.
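The transit time is not the same every day, and that drift is what gives an analemma its width. As a rough illustration, the sketch below prints the transit for the first day of each month, using only date_sun_info(). The helper name transitTimes() and the choice of sample days are assumptions made for this example, not part of PHP.

```php
<?php
date_default_timezone_set( "UTC" );

// Hypothetical helper: the solar transit (in UTC) on the first day of
// every month of $year, for the given latitude and longitude.
function transitTimes( int $year, float $lat, float $lon ): array
{
    $times = [];
    for ( $month = 1; $month <= 12; $month++ )
    {
        $day = new DateTimeImmutable( sprintf( "%04d-%02d-01 12:00", $year, $month ) );
        $sunInfo = date_sun_info( $day->getTimestamp(), $lat, $lon );
        // "transit" is a Unix timestamp, just like in the example above.
        $times[] = new DateTimeImmutable( "@{$sunInfo['transit']}" );
    }
    return $times;
}

foreach ( transitTimes( 2018, 51.53, -0.19 ) as $transit )
{
    echo $transit->format( "Y-m-d H:i:s" ), "\n";
}
```

For London the printed times should stay within roughly a quarter of an hour of 12:00 UTC, earlier in some months and later in others.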

Then I remembered that a few years ago I wrote Where is the Sun?. There I featured a new hobby library "astro" that I was working on. This library implements a few astronomical calculations. I wrote a little PHP extension around it too: php-solarsystem. Neither the library nor the extension has really been released.

The php-solarsystem extension implements just one function: earth_sunpos(), which fortunately does exactly what I needed for drawing an analemma: it gives you the position of the Sun in the sky for a specific location on Earth at a specific time.

With this function, all I had to do is calculate the position of the Sun in the sky at the same time-of-day for a whole year. With the DatePeriod class in PHP, I can easily create an iterator that does just that:

date_default_timezone_set( "UTC" );

$dateStart = new DateTimeImmutable( "2018-01-01 09:00" );
$dateEnd   = $dateStart->modify( "+1 year 1 day" );
$dateInterval = new DateInterval( "P1D" );

foreach ( new DatePeriod( $dateStart, $dateInterval, $dateEnd ) as $date )
{
        // calculate the position of the Sun for $date here
}

We don't really want Daylight Saving Time to be in the way, so we set the time zone to just UTC, which works fine for London, the location for which we'll draw the analemma.

We start at the start of the year (2018-01-01 09:00) and iterate for a year and a day (+1 year 1 day) so we can create a closed loop. Each iteration increases the returned DateTimeImmutable by exactly one day (P1D).
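To check that the period really closes the loop, the dates can be collected and counted. This is a minimal sketch; the function name analemmaDates() is made up for the example. DatePeriod leaves out the end date itself, so the extra day in "+1 year 1 day" is what makes the last returned date fall exactly one year after the first.

```php
<?php
date_default_timezone_set( "UTC" );

// Hypothetical helper: every date the loop visits, starting at $start.
function analemmaDates( string $start ): array
{
    $dateStart = new DateTimeImmutable( $start );
    $dateEnd   = $dateStart->modify( "+1 year 1 day" );

    $dates = [];
    foreach ( new DatePeriod( $dateStart, new DateInterval( "P1D" ), $dateEnd ) as $date )
    {
        $dates[] = $date;
    }
    return $dates;
}

$dates = analemmaDates( "2018-01-01 09:00" );
echo count( $dates ), "\n";                        // 366
echo $dates[0]->format( "Y-m-d H:i" ), "\n";       // 2018-01-01 09:00
echo end( $dates )->format( "Y-m-d H:i" ), "\n";   // 2019-01-01 09:00
```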

After defining the latitude and longitude of London, all we need to do is to use the earth_sunpos() function to calculate the azimuth and altitude inside the loop. Azimuth is the direction of where the Sun is, with 180° being due South. And altitude is the height of the Sun above the horizon.

$lat = 51.53;
$lon = -0.09;

foreach ( new DatePeriod( $dateStart, $dateInterval, $dateEnd ) as $date )
{
        $ts = $date->format( 'U' );
        $position = earth_sunpos( $ts, $lat, $lon );
        echo $ts, ",";
        echo $position['azimuth'], ",";
        echo $position['altitude'], "\n";
}

The script outputs the calculation as a "CSV", which we should redirect to a file:

php tests/analemma.php > /tmp/analemma.csv

To plot we use the following gnuplot script:

set style line 1 lt 1 lw 2 pt 0 ps 0 linecolor rgb "orange"
set style line 2 lt 1 lw 1 pt 0 ps 0 linecolor rgb "grey"

set datafile separator comma
set xrange [100:150]
set yrange [0:50]

set grid linestyle 2
set terminal png size 640,640 enhanced font "Helvetica,12"
set output '/tmp/analemma.png'

plot "/tmp/analemma.csv" using 2:3 title "London @ 9 am" with linespoints linestyle 1

With this script, we can then draw the analemma:

gnuplot /tmp/analemma.plot

The result:


[Figure: Analemma (Plot), Derick Rethans]



Pretty Printing BSON

In Wireshark and MongoDB 3.6, I explained that Wireshark is amazing for debugging actual network communications. But sometimes it is necessary to debug things before they get sent out onto the wire. The majority of the driver's communication with the server is through BSON documents with minimal overhead of wire protocol messages. BSON documents are represented in the C Driver by bson_t data structures. The bson_t structure wraps all of the different data types from the BSON Specification. It is analogous to PHP's zval structure, although its implementation is a little more complicated.

A bson_t structure can be allocated on the stack or heap, just like a zval structure. A zval structure represents a single data type and single value. A bson_t structure represents a buffer of bytes constituting one or more values in the form of a BSON document. This buffer is exactly what the MongoDB server expects to be transmitted over a network connection. As many BSON documents are small, the bson_t structure can function in two modes, determined by a flag: inline, or allocated. In inline mode it only has space for 120 bytes of BSON data, but no memory has to be allocated on the heap. This mode can significantly speed up its creation, especially if it is allocated on the stack (by using bson_t value, instead of bson_t *value = bson_new()). It makes sense to have this mode, as many common interactions with the server fall under this 120-byte limit.

For PHP's zval, the PHP developers have developed a helper function, printzv, that can be loaded into the GDB debugger. This helper function unpacks all the intricacies of the zval structure (e.g. arrays, objects) and displays them on the GDB console. When working on some code for the MongoDB Driver for PHP, I was looking for something similar for the bson_t structure, only to find that no such thing existed yet. With the bson_t structure being more complicated (two modes, and data stored as a binary stream), such a helper would be just as useful as PHP's printzv GDB helper. You can guess already that, of course, I felt the need to just write one myself.

GDB supports extensions written in Python, but that functionality is sometimes disabled. It also has its own scripting language that you can use on its command line, or by loading your own files with the source command. You can define functions in the language, but the functions can't return values. There are also no classes or scoping, which means all variables are global. With the data stored in the bson_t struct as a stream of binary data, I ended up writing a GDB implementation of a streamed BSON decoder, with a lot of handicaps.

The new printbson function accepts a bson_t * value, and then determines whether its mode is inline or allocated. Depending on the allocation type, printbson then delegates to a "private" __printbson function with the right parameters describing where the binary stream is stored.

__printbson prints the length of the top-level BSON document and then calls the __printelements function. This function reads data from the stream until all key/value pairs have been consumed, advancing its internal read pointer as it goes. It can detect that all elements have been read, as each BSON document ends with a null byte character (\0).

If a value contains a nested BSON document, such as the document or array types, it recursively calls __printelements, and also does some housekeeping to make sure the following output is nicely indented.

Each element begins with a single byte indicating the field type, followed by the field name as a null-terminated string, and then a value. After the type and name are consumed, __printelements defers to a specialised print function for each type. As an example, for an ObjectID field, it has:

if $type == 0x07
    __printObjectID $data
end

The __printObjectID function is then responsible for reading and displaying the value of the ObjectID. In this case, the value is 12 bytes, which we'd like to display as a hexadecimal string:

define __printObjectID
    set $value = ((uint8_t*) $arg0)
    set $i = 0
    printf "ObjectID(\""
    while $i < 12
        printf "%02X", $value[$i]
        set $i = $i + 1
    end
    printf "\")"
    set $data = $data + 12
end

It first assigns a value of a correctly cast type (uint8_t*) to the $value variable, and initialises the loop variable $i. It then uses a while loop to iterate over the 12 bytes; GDB does not have a for construct. At the end of each display function, the $data pointer is advanced by the number of bytes that the value reader consumed.

For types that use a null-terminated C-string, an additional loop advances $data until a \0 character is found. For example, the Regex data type is represented by two C-strings:

define __printRegex
    printf "Regex(\"%s\", \"", (char*) $data

    # skip through C String
    while $data[0] != '\0'
        set $data = $data + 1
    end
    set $data = $data + 1

    printf "%s\")", (char*) $data

    # skip through C String
    while $data[0] != '\0'
        set $data = $data + 1
    end
    set $data = $data + 1
end

We start by printing the type name prefix and first string (pattern) using printf and then advance our data pointer with a while loop. Then, the second string (modifiers) is printed with printf and we advance again, leaving the $data pointer at the next key/value pair (or our document's trailing null byte if the regex type was the last element).
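The same streaming approach can be tried outside GDB. The following PHP sketch is not the code from the .gdbinit file; it is a small illustration that decodes only a handful of types (string, document, array, ObjectID, boolean, regex, and int32) and advances a read position in the same way as the $data pointer. All function names are made up for this example.

```php
<?php
// Reads a null-terminated C-string at $pos and leaves $pos just past the \0.
function readCString( string $bson, int &$pos ): string
{
    $end = strpos( $bson, "\0", $pos );
    $value = substr( $bson, $pos, $end - $pos );
    $pos = $end + 1;
    return $value;
}

// Reads a little-endian signed 32-bit integer and advances $pos by 4.
function readInt32( string $bson, int &$pos ): int
{
    $value = unpack( 'V', $bson, $pos )[1];
    $pos += 4;
    return $value >= 0x80000000 ? $value - 0x100000000 : $value;
}

// Decodes one document starting at $pos; unsupported types throw.
function decodeDocument( string $bson, int &$pos ): array
{
    readInt32( $bson, $pos );                  // document length, unused here
    $result = [];
    while ( $bson[$pos] !== "\0" )             // a \0 type byte ends the document
    {
        $type = ord( $bson[$pos++] );
        $name = readCString( $bson, $pos );
        switch ( $type )
        {
            case 0x02: // string: int32 length (including the \0), then the bytes
                $length = readInt32( $bson, $pos );
                $result[$name] = substr( $bson, $pos, $length - 1 );
                $pos += $length;
                break;
            case 0x03: // embedded document
            case 0x04: // array
                $result[$name] = decodeDocument( $bson, $pos );
                break;
            case 0x07: // ObjectID: 12 raw bytes, shown as hexadecimal
                $result[$name] = strtoupper( bin2hex( substr( $bson, $pos, 12 ) ) );
                $pos += 12;
                break;
            case 0x08: // boolean: a single byte
                $result[$name] = $bson[$pos++] !== "\0";
                break;
            case 0x0B: // regex: two C-strings (pattern, modifiers)
                $pattern = readCString( $bson, $pos );
                $result[$name] = [ $pattern, readCString( $bson, $pos ) ];
                break;
            case 0x10: // int32
                $result[$name] = readInt32( $bson, $pos );
                break;
            default:
                throw new UnexpectedValueException( sprintf( "Unsupported type 0x%02X", $type ) );
        }
    }
    $pos++;                                    // consume the trailing \0
    return $result;
}

// Hand-assembled sample: { int32: 42, regex: /@[a-z]+@/im, ok: true }
$body = "\x10" . "int32\0" . pack( 'V', 42 )
      . "\x0B" . "regex\0" . "@[a-z]+@\0" . "im\0"
      . "\x08" . "ok\0" . "\x01";
$document = pack( 'V', 4 + strlen( $body ) + 1 ) . $body . "\0";

$pos = 0;
var_export( decodeDocument( $document, $pos ) );
```

Because PHP functions can return values and have local scope, the read position can be passed by reference here instead of living in a global variable, which is the main difference from the GDB version.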

After implementing all the different data types, I made a PR against the MongoDB C driver, where the BSON library resides. It has now been merged. In order to make use of the .gdbinit file, you can include it in your GDB session with source /path/to/.gdbinit.

With the file loaded, and bson_doc being a bson_t * variable in the local scope, you can run printbson bson_doc, and receive something like the following semi-JSON formatted output:

(gdb) printbson bson_doc
ALLOC [0x555556cd7310 + 0] (len=475)
{
    'bool' : true,
    'int32' : NumberInt("42"),
    'int64' : NumberLong("3000000042"),
    'string' : "Stŕìñg",
    'objectId' : ObjectID("5A1442F3122D331C3C6757E1"),
    'utcDateTime' : UTCDateTime(1511277299031),
    'arrayOfInts' : [
        '0' : NumberInt("1"),
        '1' : NumberInt("2"),
        '2' : NumberInt("3"),
        '3' : NumberInt("5"),
        '4' : NumberInt("8"),
        '5' : NumberInt("13"),
        '6' : NumberInt("21"),
        '7' : NumberInt("34")
    ],
    'embeddedDocument' : {
        'arrayOfStrings' : [
            '0' : "one",
            '1' : "two",
            '2' : "three"
        ],
        'double' : 2.718280,
        'notherDoc' : {
            'true' : NumberInt("1"),
            'false' : false
        }
    },
    'binary' : Binary("02", "3031343532333637"),
    'regex' : Regex("@[a-z]+@", "im"),
    'null' : null,
    'js' : JavaScript("print foo"),
    'jsws' : JavaScript("print foo") with scope: {
        'f' : NumberInt("42"),
        'a' : [
            '0' : 3.141593,
            '1' : 2.718282
        ]
    },
    'timestamp' : Timestamp(4294967295, 4294967295),
    'double' : 3.141593
}

In the future, I might add information about the length of strings, or convert the predefined types of the Binary data type to their common names. Happy hacking!



Wireshark and SSL

This is a follow up post to Wireshark and MongoDB 3.6, in which I explained how I added support for MongoDB's OP_MSG and OP_COMPRESSED message formats to Wireshark.

In the conclusion of that first article, I alluded to the complications with inspecting SSL traffic in Wireshark, which I hope to cover in this post. It is common to enable SSL when talking to MongoDB, especially if the server communicates over a public network. When a connection is encrypted with SSL, it is impossible to dissect the MongoDB Wire Protocol data that is exchanged between client and server—unless a trick is employed to first decrypt that data.

Fortunately, Wireshark allows dissection and analysis of encrypted connections in two different ways. Firstly, you can configure Wireshark with the private keys used to encrypt the connection, and secondly, you can provide Wireshark with pre-master keys obtained from a client process that uses OpenSSL.

The first option, providing Wireshark with the private keys, is by far the easiest. You can go to Edit → Preferences → Protocols → SSL and add the private key to the RSA keys list:

When you start using Wireshark with SSL encryption, it is also wise to configure an SSL debug file in the same screen. I have set it here to /tmp/ssl-debug.txt.

Months ago, I had added my private key to the RSA keys list, but when I tried it now for this post, Wireshark failed to decrypt my SSL traffic to MongoDB. I was a little confused, as it had worked in the past. Since I had my SSL debug file, I at least had some chance of figuring out why this no longer worked. After a quick look I noticed the following in the debug file:

   session uses Diffie-Hellman key exchange
   (cipher suite 0xC030 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
   and cannot be decrypted using a RSA private key file.

After some searching, I found out that if the session uses Diffie-Hellman for key exchange, Wireshark can not use the RSA private key, and needs different information. On an earlier run, I must have used a different version of either the encryption library (OpenSSL) or MongoDB, which did not use Diffie-Hellman.

This brings me to the second way of providing Wireshark with the information it needs to decrypt SSL encrypted connections: the pre-master key. This key is created during the connection set-up, and therefore you need to read data structures from within the OpenSSL library. You can do that manually with GDB, but it is also possible to inject a special library that hooks into OpenSSL symbols to read the data for you, and store them in a file with a format that Wireshark understands. You can find the source code for the library here.

Once you've obtained the source code, you can compile it with:

cc sslkeylog.c -shared -o -fPIC -ldl

The compiled key logging library can be loaded in the process to override the existing OpenSSL symbols with:

SSLKEYLOGFILE=/tmp/premaster.txt LD_PRELOAD=./ \
    ./mongo --ssl \
    --sslPEMKeyFile=/tmp/ssl/ssl/client.pem --sslCAFile=/tmp/ssl/ssl/ca.pem

The OpenSSL LD_PRELOAD trick should also work with the PHP driver for MongoDB as long as it uses OpenSSL. You can verify which SSL library the PHP driver uses by looking at phpinfo() output. For Java programs, there is an agent you can use instead.
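Before pointing Wireshark at the file named by SSLKEYLOGFILE, it can help to check that it actually contains usable entries. The sketch below assumes the injected library writes the widely used NSS key log format, in which each entry is a line of the form "CLIENT_RANDOM", 64 hexadecimal characters of client random, and 96 hexadecimal characters of master secret. The function names are hypothetical, and other line types are simply ignored.

```php
<?php
// Parses one key log line. Returns null for comments, blank lines, and
// any label other than CLIENT_RANDOM, which this sketch does not handle.
function parseKeyLogLine( string $line ): ?array
{
    $pattern = '/^CLIENT_RANDOM ([0-9a-f]{64}) ([0-9a-f]{96})$/i';
    if ( preg_match( $pattern, trim( $line ), $m ) !== 1 )
    {
        return null;
    }
    return [ 'clientRandom' => strtolower( $m[1] ), 'masterSecret' => strtolower( $m[2] ) ];
}

// Counts the usable entries in a key log file; 0 if it cannot be read.
function countKeyLogEntries( string $path ): int
{
    if ( !is_readable( $path ) )
    {
        return 0;
    }
    $count = 0;
    foreach ( file( $path, FILE_IGNORE_NEW_LINES ) ?: [] as $line )
    {
        if ( parseKeyLogLine( $line ) !== null )
        {
            $count++;
        }
    }
    return $count;
}

// Synthetic sample line; the repeated bytes are placeholders, not real keys.
$sample = "CLIENT_RANDOM " . str_repeat( "ab", 32 ) . " " . str_repeat( "cd", 48 );
var_dump( parseKeyLogLine( $sample ) !== null );   // bool(true)
var_dump( parseKeyLogLine( "# a comment" ) );      // NULL

echo countKeyLogEntries( "/tmp/premaster.txt" ), " usable entries\n";
```

A count of zero after a client has connected would suggest that the library was not loaded, or that the process does not use OpenSSL.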

With the key logging library and its generated file with pre-master keys in place, and Wireshark configured to read the keys from this file through the (Pre)-Master-Secret log filename setting, we can now decrypt SSL-encrypted connections between MongoDB client and server:

There was one caveat: a small patch to Wireshark is needed for it to realise that MongoDB's connections can be SSL encrypted on the default port (27017). I created a patch with the following one-liner:

      dissector_add_uint_with_preference("tcp.port", TCP_PORT_MONGO, mongo_handle);
+     ssl_dissector_add(TCP_PORT_MONGO, mongo_handle);

This patch, and the two patches mentioned in the previous post, have been merged into Wireshark's master branch and will be included in the upcoming 2.6 release. Until that is released, you will have to compile Wireshark yourself, or use a nightly build.



Xdebug 2.6

I have just released Xdebug 2.6. Xdebug 2.6 adds support for PHP 7.2 (and drops support for PHP 5), and adds a whole bunch of new features. This article describes these new features.

Garbage Collection Statistics

PHP has a built-in garbage collector, which makes it possible for PHP to free up memory that normally would be lost due to interdependent references. I wrote about the garbage collectors in previous articles.

Xdebug 2.6 provides insight into the runs of PHP's built-in Garbage Collector. There are two new functions: xdebug_get_gc_run_count(), which returns how often the Garbage Collector has run, and xdebug_get_gc_total_collected_roots(), which returns the number of variable roots that the garbage collection has collected.

There is also a new set of settings: xdebug.gc_stats_enable, xdebug.gc_stats_output_dir, and xdebug.gc_stats_output_name. When the statistics collection is enabled by setting xdebug.gc_stats_enable to true, Xdebug will write a file to the configured output directory with a name configured through xdebug.gc_stats_output_name. Just like xdebug.trace_output_name, the latter supports different format specifiers to add additional information to the file names.

Instead of recording the Garbage Collection runs for the whole script, you can also selectively record this information by using xdebug_start_gcstats() and xdebug_stop_gcstats().
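As a small illustration of the two counter functions, the sketch below creates objects that reference themselves, so that only the garbage collector can free them, and then prints the counters. The produceGarbage() helper and the object count are arbitrary choices for this example. The Xdebug calls are guarded with function_exists() so that the script also runs without the extension.

```php
<?php
// Creates $count objects that each hold a reference to themselves.
// Reference counting alone can never free these; the collector has to.
function produceGarbage( int $count ): void
{
    for ( $i = 0; $i < $count; $i++ )
    {
        $a = new stdClass;
        $a->self = $a;
    }
}

gc_enable();
produceGarbage( 25000 );

// Force one more run for whatever is still waiting in the root buffer.
$collected = gc_collect_cycles();

if ( function_exists( 'xdebug_get_gc_run_count' ) )
{
    echo "GC runs: ", xdebug_get_gc_run_count(), "\n";
    echo "Collected roots: ", xdebug_get_gc_total_collected_roots(), "\n";
}
else
{
    echo "Xdebug is not loaded; the manual run freed {$collected} values\n";
}
```

With statistics collection enabled, the same script should also produce a report file in the configured output directory.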

When PHP's garbage collector runs, Xdebug will write information about each run into a file. The format of the file is:

Garbage Collection Report
version: 1
creator: xdebug 2.6.0 (PHP 7.2.0)

Collected | Efficiency% | Duration | Memory Before | Memory After | Reduction% | Function
    10000 |    100.00 % |  0.00 ms |       5539880 |       579880 |    79.53 % | bar
    10000 |    100.00 % |  0.00 ms |       5540040 |       580040 |    79.53 % | Garbage::produce
     4001 |     40.01 % |  0.00 ms |       2563048 |       578968 |    77.41 % | gc_collect_cycles

For each run, it will write how many roots are collected, and what percentage of them ends up being freed. The duration of the Garbage Collector's run, and the memory usage before and after are also recorded, as well as how much reduction in memory usage this Garbage Collector run created. The last column shows the active function or method name when the Garbage Collection algorithm was run, or gc_collect_cycles() if it was run manually.

Profiler Enhancements

Xdebug's profiler now also collects information about memory usage. This can assist tracking down which parts of your application allocate a lot of memory, and perhaps why some memory is not freed up.

Caveat: As described above in Garbage Collection Statistics, PHP has a Garbage Collector built in, which can trigger at seemingly random times. This will distort the memory information that is recorded in the profiler's output files. In order to get better results for memory profiling, you might want to consider disabling PHP's internal garbage collector.

Additionally, Xdebug will now add an X-Xdebug-Profile-Filename HTTP header for requests for which the profiler is active. This header holds the name of the file that contains the profiling information for that request.

Remote Debugging Improvements

A new protocol feature, extended_properties, has been introduced that IDEs can opt into. When this feature is enabled, Xdebug will send variable names as Base64 encoded data to allow for characters that can not be represented safely in XML.

Another new protocol feature, notifications, has been introduced that IDEs can opt into. When this feature is enabled, Xdebug will send any Notice, Warning, or Error as an out-of-band notification over the debugging protocol to the IDE which can then display this information.

A new setting, xdebug.remote_timeout, has been added to configure how long Xdebug should wait for an IDE to acknowledge an incoming debugging connection. The default value, 200 ms, should in most cases be enough, but can be increased if you have a particularly high latency on your network and Xdebug fails to make a connection due to the low timeout.

A new function, xdebug_is_debugger_active(), can be used to check whether an IDE is currently attached to Xdebug through the DBGp protocol.

Xdebug now supports debugging through Unix domain sockets. You can specify Unix domain socket "hosts" with unix:///path/to/sock, with thanks to Sara Golemon.

Xdebug now enables FD_CLOEXEC on its debugging sockets to prevent them from being leaked to forked processes, thanks to Chris Wright.

Smaller Improvements

A new setting, xdebug.filename_format, has been added to configure how Xdebug will render filenames in HTML-like stack traces. Just like xdebug.trace_output_name, it accepts a set of format specifiers that can be used to include certain aspects of a path. Xdebug 2.6 adds the specifiers below. With a full path of /var/www/vendor/mail/transport/mta.php, the table below lists what each specifier represents:

Specifier | Meaning                             | Example Output
%n        | File name                           | mta.php
%p        | Directory and file name             | transport/mta.php
%a        | Two directory segments and filename | mail/transport/mta.php
%f        | Full path                           | /var/www/vendor/mail/transport/mta.php
%s        | Platform specific slash             | / on Linux and OSX, \ on Windows

Xdebug now adds the values of superglobals to the error log as well. Previously, it would only add this information to on-screen stack traces. In order for Xdebug to show this information, you need to configure through xdebug.dump_globals and xdebug.dump.* which superglobal keys you want to see at all.

The %s format specifier is now available to be used with the xdebug.trace_output_name setting. Previously, it was only available for use with the xdebug.profiler_output_name setting.

Trace files generated with xdebug.collect_assignments now also contain assign-by-ref (=&) assignments.

Behavioural Changes

Instead of throwing a fatal error when an infinite recursion (xdebug.max_nesting_level) is detected, Xdebug now throws an Error exception.
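In practice this means that hitting the nesting limit can be handled with a regular try/catch block. The sketch below is deliberately bounded: it recurses only ten levels past the configured limit, so it also finishes when Xdebug is not loaded or does not enforce the limit. The descend() and runGuarded() helpers are made up for this example, and the exact exception message depends on the Xdebug version.

```php
<?php
// Recurses until $maxDepth is reached, then returns the depth.
function descend( int $depth, int $maxDepth ): int
{
    return $depth >= $maxDepth ? $depth : descend( $depth + 1, $maxDepth );
}

// Runs $callback and reports whether it finished or threw an Error.
function runGuarded( callable $callback ): string
{
    try
    {
        return "finished at depth " . $callback();
    }
    catch ( Error $e )
    {
        return "caught " . get_class( $e ) . ": " . $e->getMessage();
    }
}

// ini_get() returns false for unknown settings, and (int) false is 0,
// so without Xdebug this only recurses ten levels deep.
$limit = (int) ini_get( 'xdebug.max_nesting_level' );

echo runGuarded( function () use ( $limit ) {
    return descend( 0, $limit + 10 );
} ), "\n";
```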


As you can see, Xdebug 2.6 packs a whole bunch of new features and has been the culmination of a little over a year's work. Although the majority of the work was done by myself, there were notable contributions by Arnaud Gendre, Benjamin Eberlei, Chris Wright, Emir Beganović, Frode E. Moe, Kalle Sommer Nielsen, Nikita Popov, Sara Golemon, Remi Collet, and Zaid Al Khishman.

During the year I have also launched my Patreon page, in case you want to contribute to the further development of Xdebug. Alternatively, you might want to look at my Amazon wishlist to say thank you.



Wireshark and MongoDB 3.6

While working on the MongoDB Driver for PHP, we sometimes run into problems where we send the wrong data to the server, tripping it up, and not getting the results we want.

Drivers talk to MongoDB with a binary protocol, which we refer to as the Wire Protocol. It is a documented binary format, mostly wrapping around BSON structures.
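Every Wire Protocol message starts with a 16-byte header of four little-endian 32-bit integers: the message length, a request ID, the ID of the request being responded to, and an opcode. A minimal PHP sketch of reading that header could look as follows. The function name parseMsgHeader() and the small opcode table are assumptions for this example, and the integers are read as unsigned values for simplicity.

```php
<?php
// Opcode numbers for a few message types; not a complete list.
const OPCODE_NAMES = [
    1    => 'OP_REPLY',
    2004 => 'OP_QUERY',
    2012 => 'OP_COMPRESSED',
    2013 => 'OP_MSG',
];

// Hypothetical helper: decodes the 16-byte header at the start of a message.
function parseMsgHeader( string $packet ): array
{
    if ( strlen( $packet ) < 16 )
    {
        throw new InvalidArgumentException( "A message header is 16 bytes long" );
    }
    // "V" reads an unsigned little-endian 32-bit integer.
    $header = unpack( 'VmessageLength/VrequestID/VresponseTo/VopCode', $packet );
    $header['opName'] = OPCODE_NAMES[$header['opCode']] ?? 'unknown';
    return $header;
}

// A header-only sample: 16 bytes long, request 7, not a reply, opcode 2013.
print_r( parseMsgHeader( pack( 'V4', 16, 7, 0, 2013 ) ) );
```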

Different versions of MongoDB may respond with varying message formats and likewise prefer (or require) that drivers use newer formats to issue commands and queries. An additional problem is that sometimes mongoS, our query router for sharding, wants instructions in different formats, and sends results back in a slightly different format than mongoD.

Over the past years, I have been using a network traffic analyser called Wireshark to figure out what actually gets sent over the network connections.

Wireshark runs on various platforms, and on Debian platforms can easily be installed by running apt install wireshark. Starting Wireshark presents you with the following screen:

This screen lists all the network interfaces that are available on the system, and it allows you to select one (Loopback: lo in my case) and start packet collection by clicking on the shark fin.

With Wireshark collecting data, I then connected with the MongoDB shell to the server to see what was sent over a network connection. As no network protocol is the same as any other, Wireshark contains lots of different "dissectors" to analyse each specific protocol. There are a lot of dissectors, for nearly every protocol that you can think of—including HTTP, MySQL, NTP, TCP, and of course MongoDB.

After we connected with the MongoDB shell to the server, we can have a look at what Wireshark collected.

In this first screenshot we see that the MongoDB client's first packet is a Request : Query packet. In this first packet it sends an isMaster command to the server to figure out its version, and some other parameters. The dissector has unpacked the BSON Document with length 201 to be able to show us the various elements that make up the request. For example, it sends along which driver is being used to make the request (MongoDB Internal Client) in the clientdrivername element.

The result includes parameters such as the maximum size of a data packet (maxMessageSizeBytes), the localTime, as well as the wire version (not pictured). The latter is used to determine which wire protocol versions the server can understand.

So far we have seen two types of packets, Request and Response, but others also exist. In MongoDB 3.6, two extra types of packets are supported. These are the Extensible Message Format and Compressed packets. The former is replacing all earlier ways of sending queries and commands (and receiving their replies), with a new single packet that can be used in both directions. The latter can be used to compress the data, and this packet also wraps an Extensible Message Format packet.

Unfortunately, Wireshark's MongoDB dissector did not support these new packets yet. Instead of waiting for them to be added by the maintainers, I set out to add the missing functionality myself. It is after all closely related to my day job ☺.

As I had never contributed to Wireshark, I read up on their development guidelines and quickly found that they have a highly automated way of doing code reviews with rigorous practises in place. This is a welcome difference from most other open source projects to which I've contributed.

As I expected, my first patch adding OP_MSG support needed a few revisions before it was merged.

The result of this first patch is that packets of the OP_MSG type can now also be visualised:

An Extensible Message Format packet consists of one or more Sections, each of which is either a Body or a Document Sequence. A Body is required, and zero or more Document Sequence sections may follow it. In this example, all of the Message Flags are empty (unset).
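To make the section layout more concrete, here is a rough PHP sketch that splits the payload of an OP_MSG packet into its sections. It is not how the Wireshark dissector is written; it only follows the documented layout: 32 bits of flags, then sections that each start with a kind byte (0 for a Body, 1 for a Document Sequence), and optionally a 4-byte checksum when the lowest flag bit is set. The function name splitOpMsg() is hypothetical, and the documents are returned as raw BSON bytes.

```php
<?php
// Splits an OP_MSG payload (everything after the 16-byte message header)
// into its flag bits and raw sections.
function splitOpMsg( string $payload ): array
{
    $flagBits = unpack( 'V', $payload )[1];
    // Bit 0 set means a 4-byte checksum trails the sections.
    $end = strlen( $payload ) - ( ( $flagBits & 1 ) ? 4 : 0 );
    $pos = 4;
    $sections = [];

    while ( $pos < $end )
    {
        $kind = ord( $payload[$pos++] );
        // Both kinds begin with an int32 size that includes itself.
        $size = unpack( 'V', $payload, $pos )[1];

        if ( $kind === 0 )
        {
            $sections[] = [ 'kind' => 'Body', 'document' => substr( $payload, $pos, $size ) ];
        }
        elseif ( $kind === 1 )
        {
            // After the size: a C-string identifier, then BSON documents.
            $idEnd = strpos( $payload, "\0", $pos + 4 );
            $sections[] = [
                'kind'       => 'Document Sequence',
                'identifier' => substr( $payload, $pos + 4, $idEnd - $pos - 4 ),
                'documents'  => substr( $payload, $idEnd + 1, $pos + $size - $idEnd - 1 ),
            ];
        }
        else
        {
            throw new UnexpectedValueException( "Unknown section kind {$kind}" );
        }
        $pos += $size;
    }
    return [ 'flagBits' => $flagBits, 'sections' => $sections ];
}

// Sample payload: no flags, an empty Body document, and a Document
// Sequence named "documents" that holds one empty document.
$emptyDoc = pack( 'V', 5 ) . "\0";
$sequence = "documents\0" . $emptyDoc;
$payload  = pack( 'V', 0 )
          . "\x00" . $emptyDoc
          . "\x01" . pack( 'V', 4 + strlen( $sequence ) ) . $sequence;

$message = splitOpMsg( $payload );
echo count( $message['sections'] ), " sections\n";      // 2 sections
echo $message['sections'][1]['identifier'], "\n";       // documents
```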

In this case the reply contains a cursor description, including its initial batch (firstBatch) of three documents totalling 6392 bytes. Drivers can use these cursor descriptions to iterate over a whole result set.

Once there is a lot of data to transport, it can be beneficial to compress the data stream. MongoDB 3.4 already included some preliminary work for this, but with MongoDB 3.6 compression is now enabled by default. It can either use the snappy compressor or zlib compressor. Compressed data is sent with a different packet type (OP_COMPRESSED) which, as you can probably guess, was also not yet supported.

My second patch adds OP_COMPRESSED support to Wireshark. Adding support for zlib was easy, as Wireshark already had helper functions available. Supporting snappy required a bit more manual work.

Compressed packets replace the original packet type (OpCode) with one (2012) to denote it is a compressed packet. The header that describes the parameters of the compressed data (Uncompressed Size, and Compressor) also includes the original OpCode that is represented in the compressed data (Original Opcode).
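The layout described above can be sketched in PHP as well. After the regular 16-byte header, a compressed packet carries the original opcode and the uncompressed size as 32-bit integers, a single byte identifying the compressor, and then the compressed data. The sketch below is an illustration under a few assumptions: the function name is made up, the compressor IDs used are 0 (noop), 1 (snappy), and 2 (zlib), zlib data is assumed to be in the format that PHP's gzcompress() and gzuncompress() handle, and snappy is left undecoded because PHP has no built-in support for it.

```php
<?php
const COMPRESSOR_NAMES = [ 0 => 'noop', 1 => 'snappy', 2 => 'zlib' ];

// Hypothetical helper: unpacks an OP_COMPRESSED packet, header included.
function parseOpCompressed( string $packet ): array
{
    $header = unpack( 'VmessageLength/VrequestID/VresponseTo/VopCode', $packet );
    if ( $header['opCode'] !== 2012 )
    {
        throw new UnexpectedValueException( "Not an OP_COMPRESSED packet" );
    }

    // Two int32 fields and one byte follow the 16-byte header.
    $fields = unpack( 'VoriginalOpcode/VuncompressedSize/CcompressorId', $packet, 16 );
    $data = substr( $packet, 25 );
    $name = COMPRESSOR_NAMES[$fields['compressorId']] ?? 'unknown';

    $decompressed = null;                      // stays null if we cannot decode
    if ( $name === 'noop' )
    {
        $decompressed = $data;
    }
    elseif ( $name === 'zlib' && function_exists( 'gzuncompress' ) )
    {
        $decompressed = gzuncompress( $data );
    }

    return $fields + [ 'compressor' => $name, 'decompressed' => $decompressed ];
}

// Build a sample packet; fall back to "noop" if the zlib extension is missing.
$original = "hello wire protocol";
$useZlib  = function_exists( 'gzcompress' );
$data     = $useZlib ? gzcompress( $original ) : $original;
$packet   = pack( 'V4', 25 + strlen( $data ), 1, 0, 2012 )
          . pack( 'VVC', 2013, strlen( $original ), $useZlib ? 2 : 0 )
          . $data;

print_r( parseOpCompressed( $packet ) );
```

The decompressed bytes are themselves a message body for the original opcode, which is why the dissector can hand them on to the regular OP_MSG handling.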

When the dissector decompresses a data packet, it adds a new tab containing the Decompressed Data so that you can view the raw bytes of this uncompressed data once you dive into the data structures (e.g. Section, BodyDocument).

With support for OP_MSG and OP_COMPRESSED added, figuring out if we send something wrong to a MongoDB 3.6 server becomes a lot easier. Things might still be complicated once we throw SSL into the mix, but that will have to wait until another blog post.

