Parallelizing document retrieval
This is an article I wrote a while ago, but apparently hadn't posted.
MongoDB 2.6 has a new feature that allows you to read all the documents from one collection with multiple cursors in parallel. This is done through a database command called parallelCollectionScan. The idea behind it is that it is faster than reading all the documents in a collection sequentially.
Just like the Aggregation Cursor, calling this command returns cursor information. However, it returns an array of these structures. Let's run this example to see what it returns:
<?php
$m = new MongoClient;
$d = $m->demo;
$r = $d->command( [
    'parallelCollectionScan' => 'cities',
    'numCursors' => 3
] );
var_dump( $r );
?>
And this outputs (after some formatting):
array(2) {
  ["cursors"]=>
  array(3) {
    [0]=>
    array(2) {
      ["cursor"]=>
      array(3) {
        ["firstBatch"]=> array(0) { }
        ["ns"]=> string(11) "demo.cities"
        ["id"]=>
        object(MongoInt64)#5 (1) {
          ["value"]=> string(12) "339843550291"
        }
      }
      ["ok"]=> bool(true)
    }
    [1]=>
    array(2) {
      ["cursor"]=>
      array(3) {
        ["firstBatch"]=> array(0) { }
        ["ns"]=> string(11) "demo.cities"
        ["id"]=>
        object(MongoInt64)#6 (1) {
          ["value"]=> string(12) "340949759620"
        }
      }
      ["ok"]=> bool(true)
    }
    [2]=>
    array(2) {
      ...
    }
  }
  ["ok"]=> float(1)
}
With the MongoCommandCursor::createFromDocument method from an earlier article, you can create a MongoCommandCursor for each of the array elements:
<?php
$m = new MongoClient;
$d = $m->demo;
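// $hash is set by reference to the hash of the connection that ran the command
$r = $d->command( [
    'parallelCollectionScan' => 'cities',
    'numCursors' => 3
], null, $hash );
$cursors = [];
foreach( $r['cursors'] as $cursorInfo )
{
    // Each cursor has to be created on the same connection as the command
    $cursors[] = MongoCommandCursor::createFromDocument( $m, $hash, $cursorInfo );
}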
?>
Instead of making you create an array of cursors yourself, the driver implements the MongoCollection::parallelCollectionScan method, which makes the above a little bit easier:
<?php
$m = new MongoClient;
$c = $m->demo->cities;
$cursors = $c->parallelCollectionScan( 3 );
?>
The idea is that with multiple cursors you can iterate over each of the segments in parallel, for example in different threads. Of course, PHP does not have threads, so you can't really run things in parallel. However, PHP does have a MultipleIterator class that allows you to iterate over multiple cursors at the same time:
<?php
$m = new MongoClient;
$c = $m->demo->cities;
$cursors = $c->parallelCollectionScan( 3 );
$multiple_it = new MultipleIterator( MultipleIterator::MIT_NEED_ANY );
foreach ( $cursors as $cursor )
{
    $multiple_it->attachIterator( $cursor );
}
foreach ( $multiple_it as $items )
{
    foreach ( $items as $item )
    {
        if ( $item !== NULL )
        {
            echo $item['name'], "\n";
        }
    }
}
?>
There are three sections here. First we create the cursors with MongoCollection::parallelCollectionScan, then we collect the created cursors into a MultipleIterator, and lastly we iterate over the $multiple_it iterator to get our results. Each iteration gives us back an array with one element for each of the contained cursors (3 in our example), so we need a second loop (foreach) to pick out the real documents.
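To make the shape of each iteration a bit more concrete, here is a small sketch that does not need a MongoDB server. It assumes that plain ArrayIterator objects behave like the cursors for this purpose, and the values in them are made up. The second stand-in is deliberately shorter than the first, so you can also see what ends up in its slot once it runs out:

```php
<?php
// ArrayIterator objects stand in for the cursors; the values are made up
$multiple_it = new MultipleIterator( MultipleIterator::MIT_NEED_ANY );
$multiple_it->attachIterator( new ArrayIterator( [ 'a1', 'a2', 'a3' ] ) );
$multiple_it->attachIterator( new ArrayIterator( [ 'b1' ] ) );

$rows = [];
foreach ( $multiple_it as $items )
{
    // $items has one slot per attached iterator, in the order of attaching
    $rows[] = $items;
}
var_dump( $rows );
```

With MIT_NEED_ANY the loop keeps going for as long as at least one of the attached iterators still has items, so this collects three arrays of two slots each.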
Not every contained cursor will provide the same number of items; it is up to the MongoDB server to divide them. When a contained iterator is exhausted, the MultipleIterator sets its value to NULL. It is probably better to then remove that specific contained iterator from the MultipleIterator, but that is left as an exercise for the reader.
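One possible take on that exercise, as a sketch rather than the definitive answer: skip the MultipleIterator altogether and use a generator (PHP 5.5 or later) that takes one item from each iterator in turn, and forgets about an iterator as soon as it is exhausted. The roundRobin function is a hypothetical helper and not part of the driver, and ArrayIterator objects with made-up values again stand in for the cursors. I have not tried this against real MongoCommandCursor objects.

```php
<?php
// Hypothetical helper, not part of the driver: yields one item at a time
// from each iterator in turn, and drops an iterator once it is exhausted.
function roundRobin( array $iterators )
{
    foreach ( $iterators as $it )
    {
        $it->rewind();
    }
    while ( count( $iterators ) > 0 )
    {
        foreach ( $iterators as $key => $it )
        {
            if ( !$it->valid() )
            {
                // Nothing left in this one, so stop visiting it
                unset( $iterators[$key] );
                continue;
            }
            yield $it->current();
            $it->next();
        }
    }
}

// ArrayIterator objects stand in for the cursors; the values are made up
$segments = [
    new ArrayIterator( [ 'a1', 'a2', 'a3' ] ),
    new ArrayIterator( [ 'b1' ] ),
    new ArrayIterator( [ 'c1', 'c2' ] ),
];
$names = [];
foreach ( roundRobin( $segments ) as $name )
{
    $names[] = $name;
}
echo implode( ' ', $names ), "\n";
```

Because exhausted iterators are removed, the inner loop never has to check for NULL. The items are still fetched one after the other, so this changes nothing about things running sequentially.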
When running some benchmarks, I didn't actually see any performance benefit with multiple cursors over just one cursor, but that is likely because the cursors are still iterated over sequentially, and not in parallel. Perhaps using the pthreads PECL extension would allow for a better benchmark, but the PHP driver for MongoDB doesn't support threaded execution yet.
Shortlink
This article has a short URL available: https://drck.me/mongopcs-b6u