Cursors and the Aggregation Framework
Now that MongoDB 2.6 has been released, the PHP driver for MongoDB has also received many updates to support the new features. In this series of articles, I will illustrate some of those updates.
In this article, I will introduce command cursors and demonstrate how they can be applied to aggregations. I wrote about the Aggregation Framework last year, but it has received a lot of updates and improvements since then. One of those improvements concerns how the Aggregation Framework (A/F) returns results. Before MongoDB 2.6, the A/F could only return one document, with all the results stored under the result key:
<?php
$m = new MongoClient;
$c = $m->demo->cities;
$pipeline = [
[ '$group' => [
'_id' => '$country_code',
'timezones' => [ '$addToSet' => '$timezone' ]
] ],
[ '$sort' => [ '_id' => 1 ] ],
];
$r = $c->aggregate( $pipeline );
var_dump( $r['result'] );
?>
This code would output something like:
array(242) {
[0] =>
array(2) {
'_id' => string(2) "AD"
'timezones' => array(1) { [0] => string(14) "Europe/Andorra" }
}
[1] =>
array(2) {
'_id' => string(2) "AE"
'timezones' => array(1) { [0] => string(10) "Asia/Dubai" }
}
[2] =>
array(2) {
'_id' => string(2) "AF"
'timezones' => array(1) { [0] => string(10) "Asia/Kabul" }
}
…
Under the hood, MongoCollection::aggregate() is implemented as a database command. The PHP driver method merely wraps this command, but you can also run the aggregation directly through the MongoDB::command() method:
<?php
$m = new MongoClient;
$d = $m->demo;
$pipeline = [
[ '$group' => [
'_id' => '$country_code',
'timezones' => [ '$addToSet' => '$timezone' ]
] ],
[ '$sort' => [ '_id' => 1 ] ],
];
$r = $d->command( [
'aggregate' => 'cities',
'pipeline' => $pipeline,
] );
var_dump( $r['result'] );
?>
A database command returns only a single document, and a document can be at most 16MB. The complete aggregation result is therefore limited to 16MB. This is not a problem for my example, but it can certainly be a limiting factor for other A/F queries.
MongoDB 2.6 adds support for returning a cursor for an aggregation command. With the raw command interface, you simply add the extra cursor element:
$r = $d->command( [
'aggregate' => 'cities',
'pipeline' => $pipeline,
'cursor' => [ 'batchSize' => 1 ],
] );
var_dump( $r );
Instead of a document with all results inline, you get a cursor definition back:
array(2) {
'cursor' =>
array(3) {
'id' => class MongoInt64#5 (1) {
public $value => string(12) "392201189815"
}
'ns' => string(11) "demo.cities"
'firstBatch' => array(1) {
[0] =>
array(2) {
'_id' => string(2) "AD"
'timezones' => array(1) { [0] => string(14) "Europe/Andorra" }
}
}
}
'ok' => double(1)
}
The cursor definition contains the cursor ID (in id) and the namespace (ns), and ok indicates whether the command succeeded. The definition also contains the first portion of the results, in firstBatch. The number of items in firstBatch is set by the batchSize value given in the command.
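The following sketch makes the structure more concrete by reading the fields of such a cursor definition. It is only an illustration under stated assumptions and is not part of the driver. summariseCursorDefinition() is a hypothetical helper. The array literal mirrors the var_dump() output above, except that the cursor ID is a plain integer rather than a MongoInt64 object, so the sketch runs without the extension loaded. A cursor ID of 0 would mean that the server has no further results and has already closed the cursor.

```php
<?php
// Hypothetical helper, not part of the MongoDB driver: summarises the
// cursor definition returned by the aggregate command with a 'cursor' option.
function summariseCursorDefinition( array $r ): array
{
    // 'ok' sits next to 'cursor' and signals whether the command succeeded
    if ( empty( $r['ok'] ) ) {
        throw new RuntimeException( 'aggregate command failed' );
    }
    $cursor = $r['cursor'];
    return [
        'id'         => $cursor['id'],
        'ns'         => $cursor['ns'],
        'firstBatch' => count( $cursor['firstBatch'] ),
        // A cursor ID of 0 means the server has no more results to fetch;
        // the string cast works for plain integers and MongoInt64 alike
        'exhausted'  => (string) $cursor['id'] === '0',
    ];
}

// Mirrors the var_dump() output above; the ID is a plain integer here
// (an assumption) so the sketch runs without the legacy driver loaded.
$r = [
    'cursor' => [
        'id'         => 392201189815,
        'ns'         => 'demo.cities',
        'firstBatch' => [
            [ '_id' => 'AD', 'timezones' => [ 'Europe/Andorra' ] ],
        ],
    ],
    'ok' => 1.0,
];

$summary = summariseCursorDefinition( $r );
echo $summary['ns'], ': ', $summary['firstBatch'], " document(s) in the first batch\n";
```

With a real command result, the id element is a MongoInt64 object, which is why the sketch compares string representations rather than using == 0.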
To create a cursor that you can iterate over in PHP, you need to convert this cursor definition to a MongoCommandCursor object. You can do that with the MongoCommandCursor::createFromDocument() factory method. This factory method takes three arguments: the MongoClient object ($m in my example), the connection hash, and the cursor definition that was returned. The hash is required so that we can fetch new results from the same connection that executed the original command.
To obtain the connection hash, we need to pass a variable by reference as the third argument to MongoDB::command(). The driver sets it to the hash of the connection that executed the command:
<?php
$m = new MongoClient;
$d = $m->demo;
$pipeline = [
[ '$group' => [
'_id' => '$country_code',
'timezones' => [ '$addToSet' => '$timezone' ]
] ],
[ '$sort' => [ '_id' => 1 ] ],
];
$r = $d->command(
[
'aggregate' => 'cities',
'pipeline' => $pipeline,
'cursor' => [ 'batchSize' => 1 ],
],
[], // no extra command options
$hash // set by reference to the connection hash
);
var_dump( $hash );
The hash looks like localhost:27017;-;.;26415. Together with the result, you can now construct a MongoCommandCursor:
$cursor = MongoCommandCursor::createFromDocument( $m, $hash, $r );
And iterate over it:
foreach ( $cursor as $result )
{
echo $result['_id'], ': ', join( ', ', $result['timezones'] ), "\n";
}
?>
As this is all a bit cumbersome, we have also added a helper method for this: MongoCollection::aggregateCursor(). Internally, it does the whole MongoCommandCursor creation dance, and it simplifies the previous example to:
<?php
$m = new MongoClient;
$c = $m->demo->cities;
$pipeline = [
[ '$group' => [
'_id' => '$country_code',
'timezones' => [ '$addToSet' => '$timezone' ]
] ],
[ '$sort' => [ '_id' => 1 ] ],
];
$r = $c->aggregateCursor( $pipeline );
foreach ( $r as $result )
{
echo $result['_id'], ': ', join( ', ', $result['timezones'] ), "\n";
}
?>
This helper also automatically sets the initial batch size to 101. You can change the batch size for subsequent batches with the MongoCommandCursor::batchSize() method, and for the initial batch by passing a cursor option to MongoCollection::aggregateCursor():
// Initial batch: 5 documents
$options = [ 'cursor' => [ 'batchSize' => 5 ] ];
$r = $c->aggregateCursor( $pipeline, $options );
// Subsequent batches: up to 25 documents each
$r->batchSize( 25 );
In general, you probably should not change the default batch sizes.
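The following rough simulation in plain PHP shows how the two batch sizes interact. No MongoDB is involved. simulateBatches() is a hypothetical function, and it assumes every batch is filled exactly to its requested size. The real server also limits batches by their size in bytes, so treat this as an illustration only. The 242 results match the number of country codes in the example output above.

```php
<?php
// Hypothetical simulation, not driver code: how many documents each round
// trip would carry, assuming batches are always filled to the requested size.
function simulateBatches( int $total, int $firstBatchSize, int $batchSize ): array
{
    if ( $firstBatchSize < 1 || $batchSize < 1 ) {
        throw new InvalidArgumentException( 'batch sizes must be positive' );
    }
    $batches   = [];
    $remaining = $total;
    $size      = $firstBatchSize; // the first batch comes back with the command
    while ( $remaining > 0 ) {
        $n = min( $size, $remaining );
        $batches[] = $n;
        $remaining -= $n;
        $size = $batchSize; // later batches are fetched with getMore
    }
    return $batches;
}

// The options from the example above: a first batch of 5, then batches of 25
echo implode( ', ', simulateBatches( 242, 5, 25 ) ), "\n";
// 5, 25, 25, 25, 25, 25, 25, 25, 25, 25, 12
```

Fewer, larger batches mean fewer round trips to the server. More, smaller batches mean less data is held in memory at any one time.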
The Aggregation Framework has some other new features in MongoDB 2.6 as well. Please refer to the release notes for more information. I might write another post on some of those features later, too.
Shortlink
This article has a short URL available: https://drck.me/aggrcur-ax5