Crowd-serfing

Yesterday, Google announced that they have made Google Map Maker available in the United Kingdom. Like OpenStreetMap, it allows everybody to update and add things to the map. But there is one big difference: with Map Maker you don't get access to the data.

Fellow OpenStreetMapper, Richard Fairhurst, describes this as Crowd-serfing:

Crowd-serfing, n.: when a large corp uses crowd-sourced volunteering for its own financial gain, without giving back. See: @googlemapmaker.

I will never understand why people do work for a commercial company without getting any real benefit back: http://www.bbc.co.uk/news/business-21226623

Unfortunately, today's BBC coverage of the availability of Google Map Maker in the UK read more like a manual for Map Maker than an unbiased piece on crowd-sourced maps. Only after one of the OpenStreetMappers reached out to the journalist who wrote the piece did the BBC add some background on OpenStreetMap:

"The biggest problem with Google Map Maker is that anything people contribute may appear on Google's map, but only Google can get at the underlying data to be able to do anything else with it," said Chris Hill.

"If someone includes a Google map on their web site to show where their business is they may also be showing where their competitors are and they can't change that."

Sure, it's nice to have some roads on the Google map, but you will never have full access to your own data, unlike with OpenStreetMap, where you can download and work with all the data. Even nicer is that OpenStreetMap often still has better maps than Google Maps. For example, have a look at this in North Korea: http://tools.geofabrik.de/mc/?mt0=mapnik&mt1=googlemap&lon=125.74677&lat=39.01863&zoom=14 . And closer to home in the United Kingdom, compare some of the hiking trails in the Peak District: http://tools.geofabrik.de/mc/?mt0=mapnik&mt1=googlemap&lon=-1.99876&lat=53.17827&zoom=15 . Even in places like London, the accuracy of the locations of addresses and points-of-interest (POIs) is often a lot better, as OpenStreetMap doesn't use web site scraping and post-code-centroid locations to place POIs: http://tools.geofabrik.de/mc/?mt0=mapnik&mt1=googlemap&lon=-0.12405&lat=51.50862&zoom=18 . OpenStreetMap mostly relies on surveys done by individuals (like you!) to verify that things are actually there, aided a little by the availability of Bing Maps as background imagery.

One of the things that most people forget is the terms and conditions that commercial entities impose. An excerpt from Map Maker's reads:

"You give Google a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive licence to reproduce, adapt, modify, translate, publish, publicly perform, publicly display, distribute, and create derivative works of the user submission."

Note that it never mentions that you can do anything with the data yourself…

stamen-hydepark.jpg

If you are contributing time and knowledge, why not allow yourself to benefit from it as well? I realise that OpenStreetMap might not be as accessible, and the map-tiles on their web site aren't the prettiest, but the real benefit is in the access to the raw data that makes up the images in the map. The OpenStreetMap wiki also has an article on this.

Access to the data allows you to do so many more things. From creating your own fancy map-styles, creating "washable, wearable, all-weather maps designed for the real outdoors" such as SplashMaps, to powering web-sites that show accessibility. If you don't like the way map data is rendered, you can produce something in your own style, just like Nike did with this campaign.

Also, there is nothing better than seeing some of your handiwork show up in a "best of OSM" poster :-) I would never spend my time adding data to Google's Maps. Instead, I prefer to contribute to OpenStreetMap and do awesome things with the data. In the meantime, OpenStreetMap continues to support humanitarian relief efforts in Mali and many other places.

If you're near London, come and join us this summer! (Other places will also run mapping parties).

Shortlink

This article has a short URL available: http://drck.me/gmm-a5f

Comments

I'm not sure when it was done, but I assume this point: http://goo.gl/maps/vgcb2 showing some Final Fantasy place on Kerguelen island's old dump is also an example of why Google Map Maker isn't that good an idea. It was added a few months ago apparently.

I also see no way of indicating that it's wrong, but maybe that's because I'm not in the right country.

The difference struck me yesterday. A friend gave me a printout from Google Maps of an area and asked if I could produce something they could use on the back of a black and white printed leaflet. Using Geofabrik to download an extract, Osmosis to cut out a smaller area to work with, JOSM to tweak my small extract to remove features that I didn't want to appear, and Maperitive to create a set of black and white rules this was surprisingly easy (it helped I'd used all the tools before, of course).

With Google Map Maker you'd give Google the data, and then couldn't do anything like this with it.

Thanks for the great research and all the links. I find the accessibility map most useful.

I wouldn't want my data to be sucked into a black hole. If there were no alternatives, I might have helped Google. But with a true open alternative, the choice of where to contribute is quite obvious.

It's a pity more businesses aren't building with the API.

In Australia, there's a lack of free/open data sets to import, and we got hit heavily by the redaction.

I'd love to see real estate and property industry folks start picking up the open map data idea. One simple use case, generating location descriptions of a given point:

"123 Example St is located (distance) from transport, shopping and playground areas, with approx Nsqm of sheds/garages and a pool/tennis court"

Or for something a bit more way out there: a tourism/gps nav app that highlights scenic, country drives by the sqm of vineyards in a particular area.

The one decent adoption I've seen, our local bus network picked up open street map for routing (hurray), but doesn't make their bus stop or timetable information available for consumption into OSM (boo!).

I'm sure there's a lot of businesses with geographical data sets they could share and liberate, but they don't seem to know how at this stage; or see the value.

I completely agree with your critique. However, at least Ed Parsons says that they are working on getting your own data out of the system.

https://twitter.com/edparsons/status/322347865689190400

cf. http://www.dataliberation.org/

Iceland trip

Aurora Borealis

I have always wanted to see Aurora Borealis (Northern Lights) and I never managed to see it the five years I lived in Norway. The Aurora is caused by the collision of energetic particles coming from the Sun with the atmosphere. Their typical green colour comes from the interaction of the particles with Oxygen atoms in the atmosphere. The auroral mechanism is better explained on Wikipedia. Because the particles are charged, they are directed towards the magnetic poles of the Earth.

Aurora gets more intense, and is visible at lower latitudes, when a geomagnetic storm is in progress. One cause of such storms is a coronal mass ejection (CME) on the Sun, which sends huge quantities of matter and electromagnetic radiation out into space. When CMEs are directed towards Earth they cause a geomagnetic storm, which increases the chance of Aurora Borealis occurring.

The solar cycle is an eleven-year cycle in which the Sun's activity goes from very little to a lot and back to very little. During solar maximum there is a much higher chance of a CME occurring. Originally the most recent solar maximum was forecast to be in 2010 or 2011, but more recent predictions expect it to be this autumn. For some unexpected reason, Auroras are strongest around the vernal and autumnal equinoxes.

pmapN-march17.gif

With it being so close to both the vernal equinox and solar maximum, we set off to Iceland in the hope of seeing them. The original plan was to seek them out on the second night of our trip (March 18th), but just before we left I noticed on the SpaceWeather site that the Sun had produced a solar flare as well as an Earth-bound CME. Instead of trying our luck on the second night, we decided to book an extra excursion, as chances were very high that we would see the Aurora on our first night (March 17th). After all, the Aurora Buddy application that I had installed on my phone had been warning me about high activity all through Sunday. The image above shows the extent of auroral activity on the night of the 17th. Just when we got on the bus to travel to some darker skies, the Aurora already showed up in the sky, visible through Reykjavik's city lights...

When we got to the viewing location, the sky was fully alight as you can see in this timelapse:

Aurora Timelapse, near Vogar, Iceland on March 17th.

This timelapse shows the auroras over a three minute period with a picture taken every 5 seconds. At a frame rate of 3 frames per second this is sped up 15 times. The show lasted until about midnight, when we headed back to Reykjavik. Some more still photos are available on flickr.
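The speed-up figure follows directly from the capture interval and the playback frame rate. A minimal sketch of that arithmetic; the variable names are made up for illustration, and only the three input numbers come from the text above:

```php
<?php
// Figures from the text: three minutes of shooting, one picture every
// 5 seconds, played back at 3 frames per second.
$captureSeconds  = 3 * 60;
$intervalSeconds = 5;
$playbackFps     = 3;

// Every second of video shows $playbackFps frames, and every frame stands
// for $intervalSeconds of real time.
$speedUp = $intervalSeconds * $playbackFps;

// Number of pictures taken and the resulting length of the video.
$frames          = intdiv( $captureSeconds, $intervalSeconds );
$playbackSeconds = $frames / $playbackFps;

printf(
    "%d frames, %d seconds of video, sped up %d times\n",
    $frames, $playbackSeconds, $speedUp
);
```

So the three minutes of shooting come down to 36 pictures and roughly twelve seconds of video.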

The Golden Circle

Of course, Iceland has much more to offer than just the occasional show of Auroras and the next morning we set off on the Golden Circle—a trip past Iceland's touristic highlights. Our first stop was Þingvellir, the site of Alþingi, the first Icelandic parliament that was founded in 930. It is situated in a rift valley that marks the crest of the Mid-Atlantic Ridge and is part of Þingvellir National Park.

thingvellir.jpg

After Þingvellir we proceeded to the geyser Strokkur (Icelandic for "churn"). It's quite spectacular to see a whole lot of water being launched in the air every 4 to 8 minutes.

Our next stop was Gullfoss, a waterfall in the river Hvítá. I had visited Gullfoss in summer many, many years ago, and this trip's experience was quite different. It was so incredibly windy and cold that we could hardly make it to the viewpoint for the waterfall. However, with the ice surrounding it, it was quite beautiful:

gullfoss.jpg

Smoke and Wind

The last day of the trip consisted of exploring the Reykjanes peninsula, which is a large geothermal area. We visited the Kerið crater, the geothermal fields Seltún and Gunnuhver, the cliffs near Reykjanestá and the "Bridge between two continents". The colours of some of the landscapes were beautiful, but there was a strong cold wind almost everywhere, which made us want to get back to the car very quickly most of the time. Something to re-explore in summer, I suppose.

leif.jpg

Blue Lagoon

After a good night's dinner, drinks and rest, we spent the last morning of our trip with our bottoms in the Blue Lagoon to relax. A perfect ending to a quick but gorgeous trip in Iceland. We'll be back!

For further photos and timelapses, please see my flickr set.

Shortlink

This article has a short URL available: http://drck.me/iceland-a4r


MongoDB's aggregation framework

As part of my preparations for my MongoDB workshop at PHP Benelux, I ran into a nice use case for MongoDB's aggregation framework. As I have already promised to write about it, this seems to be a good time to actually write the article.

The dataset that I am using for the workshop consists of restaurants and other points of interest, extracted from OpenStreetMap data. As an example, one of the documents that I am storing is:

{
        "_id" : "n558797601",
        "type" : NumberLong(1),
        "loc" : [ 4.4577708, 51.1611465 ],
        "tags" : [
                "addr:city=Edegem",
                "addr:country=BE",
                "addr:housenumber=398",
                "addr:postcode=2650",
                "addr:street=Mechelsesteenweg",
                "amenity=restaurant",
                "cuisine=regional",
                "name=La Rosa"
        ]
}

I wanted to find out which cuisines are used in all of the documents of my dataset. Described in a different way: I want all the different cuisine=… tags as used in my dataset. Traditionally you would write a really complex™ Map/Reduce job, but since MongoDB 2.2, there is a new feature called the aggregation framework. The aggregation framework is meant to be an easy way to do fairly complex aggregation jobs.

The idea behind it is that each aggregation job is defined by a pipeline of operators. Each operator does a specific task. There is for example $match, which allows you to restrict which documents pass through. If the document matches the predicate contained in the $match operator, then it is allowed through, and otherwise it is dropped from the pipeline.

On the MongoDB shell, you use the aggregate() shell helper to execute a pipeline of operators, and with PHP you use the MongoCollection::aggregate() method.

To let through only the documents that have the tag amenity=restaurant, you would run on the shell:

db.poi.aggregate( { $match: { tags: "amenity=restaurant" } } );

And in PHP you would use the following script:

<?php
$m = new MongoClient;
$c = $m->demo->poi;
$result = $c->aggregate(
        array(
                array( '$match' => array( 'tags' => 'amenity=restaurant' ) )
        )
);

var_dump( $result['result'] );
?>

If everything goes well, the return value of the aggregate() helper method is an array with two elements: ok with a value of double(1) as well as a result element containing an array of all the documents that made it through the whole pipeline. Because the aggregation framework returns all of its results as one document over the network, the full result is limited to 16MB. There are also memory limits internally, so it is always wise to restrict the data coming through the pipeline with an operator as soon as you can.

Let's try to construct a pipeline to get a list of all the different cuisine=… tags including how often they appear.

$match

The first thing to do is to match all documents that have the cuisine=… tag in the first place. We use the $match operator for that. If a $match operator is the first operator in a pipeline, then it can make use of indexes. It is therefore important that you have indexes in place. The $match operator definition that we need is:

$allWithCuisine = array(
        '$match' => array( 'tags' => new MongoRegex( '/^cuisine=/' ) )
);

To fit that into our script, we change it to:

<?php
$m = new MongoClient;
$c = $m->demo->poi;

$allWithCuisine = array(
        '$match' => array( 'tags' => new MongoRegex( '/^cuisine=/' ) )
);

$result = $c->aggregate(
        array( $allWithCuisine )
);

var_dump( $result['result'] );
?>

In my case, this returns an array of 117 documents in the result element.

For each pipeline step, we create a variable such as $allWithCuisine to define the operator, and then add those to the array that is passed to the aggregate() method.
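It may not be obvious why a regular expression on tags works when tags is an array: the condition is tested against every element, and the document passes as soon as one element matches. The following is a minimal sketch of that behaviour on plain PHP arrays. The matchesAnyTag() helper and the sample documents are made up for illustration and are not part of the MongoDB driver:

```php
<?php
// Hypothetical helper, not part of the MongoDB driver: imitates how a
// regular expression in $match is tested against each element of an
// array field.
function matchesAnyTag( array $document, string $regex ): bool
{
    foreach ( $document['tags'] as $tag ) {
        if ( preg_match( $regex, $tag ) === 1 ) {
            return true;
        }
    }
    return false;
}

// Made-up sample documents in the same shape as the dataset.
$documents = array(
    array( '_id' => 'n1', 'tags' => array( 'amenity=restaurant', 'cuisine=regional' ) ),
    array( '_id' => 'n2', 'tags' => array( 'amenity=pub', 'name=The Example' ) ),
    array( '_id' => 'n3', 'tags' => array( 'amenity=fast_food', 'cuisine=burger' ) ),
);

// Keep only the documents with at least one cuisine=... tag.
$matched = array_values( array_filter(
    $documents,
    function ( array $document ) {
        return matchesAnyTag( $document, '/^cuisine=/' );
    }
) );

var_dump( array_column( $matched, '_id' ) );
```

Note that a matching document keeps all of its tags, not just the one that matched.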

$project

To reduce the amount of data going through the pipeline, our next step is to remove all the fields from the documents that we are not interested in. In fact, we are only interested in the tags field. In order to "re-shape" a document into a different structure, we use the $project operator. In its most basic form, it works the same as the $fields argument to MongoCollection::find(). It is a lot more powerful than that, as it supports changing the whole structure of a document, as well as computed fields. Have a look at the $project documentation for some more inspiration.

As we are only interested in the tags field of the documents, we just put that in the projection:

$justTheTags = array(
        '$project' => array( 'tags' => 1 )
);

and modify the aggregate() call:

$result = $c->aggregate(
        array( $allWithCuisine, $justTheTags )
);

$unwind

In order to be able to do some work on individual tags, we need to split up the tags array. The $unwind operator does just that. It is a rather tricky operator to explain, so I will try with an example. Take for example this document:

{
        _id: "n478547159",
        related_ids: [ "n516583937", "n401309937" ]
}

Using the $unwind operator on related_ids removes the document from the pipeline and introduces two new ones, one for each of the related_ids elements. At the same time, it replaces the related_ids array with one of the values. Running { $unwind: '$related_ids' } turns the above document into the following two:

{
        _id: "n478547159",
        related_ids: "n516583937"
}
{
        _id: "n478547159",
        related_ids: "n401309937"
}

In our case, we want a document for each of the elements in the tags array so that we can group on this field later. We introduce our $unwind operator:

$unwindTags = array(
        '$unwind' => '$tags'
);

and add it to our list of pipeline operators:

$result = $c->aggregate(
        array( $allWithCuisine, $justTheTags, $unwindTags )
);

When we run the script now, we get 554 documents in the following form:

…
array (
        '_id' => 'n470071537',
        'tags' => 'amenity=fast_food',
),
array (
        '_id' => 'n470071537',
        'tags' => 'cuisine=burger',
),
array (
        '_id' => 'n470071537',
        'tags' => 'name=C&Ms',
),
…

Because we are only interested in the cuisine=… tag, we use our previously defined $match operator to filter out all the documents that don't have this tag:

$result = $c->aggregate(
        array( $allWithCuisine, $justTheTags, $unwindTags, $allWithCuisine )
);

Which leaves us with 117 documents again.

$group

Now that we have extracted and massaged our data, we are ready to group the documents by their cuisine=… key. The $group operator groups all documents in the pipeline by a key, and allows for computed fields. In our case we want to group by the tags field:

$groupByTags = array(
        '$group' => array( '_id' => '$tags' )
);

Then we add it to our list of pipeline operators:

$result = $c->aggregate(
        array(
                $allWithCuisine, $justTheTags, $unwindTags, $allWithCuisine,
                $groupByTags,
        )
);

Our result includes one document for each distinct $tags value. A small excerpt:

…
array (
        '_id' => 'cuisine=kebab;turkish',
),
array (
        '_id' => 'cuisine=pizza',
),
array (
        '_id' => 'cuisine=fine_dining',
),
…

In order to also have a count for each of the distinct values in an extra count field, we need to modify the $group operator in the pipeline. I have already mentioned that you can have computed fields, and that's what we need here. A computed field attaches an expression to a field name. In this case, we want the count field to increment by 1 each time we find a document with this field—for this we use the $sum operator:

$groupByTags = array(
        '$group' => array(
                '_id' => '$tags',
                'count' => array( '$sum' => 1 )
        )
);

Each document that now comes out of the pipeline looks like:

…
array (
        '_id' => 'cuisine=turkish',
        'count' => 2,
),
array (
        '_id' => 'cuisine=japanese',
        'count' => 7,
),
array (
        '_id' => 'cuisine=italian',
        'count' => 10,
),
…
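
To see what the count field with $sum does, here is a rough in-memory equivalent in plain PHP. It is a sketch only: the groupAndCount() helper and the three sample documents are invented for illustration, and the real $group operator makes no promise about the order of its output:

```php
<?php
// Hypothetical helper imitating
// { $group: { _id: '$tags', count: { $sum: 1 } } }
// on documents that have already been unwound and filtered.
function groupAndCount( array $documents, string $field ): array
{
    $counts = array();
    foreach ( $documents as $document ) {
        $key = $document[$field];
        if ( !isset( $counts[$key] ) ) {
            $counts[$key] = 0;
        }
        $counts[$key] += 1;   // this is the "'$sum' => 1" part
    }

    // Re-shape into one document per distinct key.
    $result = array();
    foreach ( $counts as $key => $count ) {
        $result[] = array( '_id' => $key, 'count' => $count );
    }
    return $result;
}

// Made-up sample input.
$documents = array(
    array( '_id' => 'n1', 'tags' => 'cuisine=italian' ),
    array( '_id' => 'n2', 'tags' => 'cuisine=turkish' ),
    array( '_id' => 'n3', 'tags' => 'cuisine=italian' ),
);

$grouped = groupAndCount( $documents, 'tags' );
var_dump( $grouped );
```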

Other computed fields are also possible. If we want for example to also record which original _id field had a cuisine=… tag, we modify the group operator to add this field as well:

$groupByTags = array(
        '$group' => array(
                '_id' => '$tags',
                'count' => array( '$sum' => 1 ),
                'ids' => array( '$addToSet' => '$_id' )
        )
);

The $addToSet operator adds the original _id value as a new value to the ids array for each grouped cuisine=… tag. When we run the full script with the modified $group operator, we now get documents in the form:

array (
        '_id' => 'cuisine=friture',
        'count' => 3,
        'ids' => array (
                0 => 'n2040116467',
                1 => 'n1701471939',
                2 => 'n1701465430',
        ),
),

Because our requirement didn't really want this ids array, I have removed it from future examples.

$sort

The only thing left to do now is to sort our cuisine=… tags with the most used tags first. For this we use the $sort operator. The $sort operator works in the same way as the MongoCursor::sort() method and accepts the same arguments. In order to sort by the count field in descending order, we create the pipeline operator as follows:

$sort = array(
        '$sort' => array( 'count' => -1 )
);

And add it to our pipeline:

$result = $c->aggregate(
        array(
                $allWithCuisine, $justTheTags, $unwindTags, $allWithCuisine,
                $groupByTags, $sort,
        )
);

When running our script now, we get a list of all distinct cuisine=… tags ordered by their occurrence:

array (
        '_id' => 'cuisine=regional',
        'count' => 19,
),
array (
        '_id' => 'cuisine=burger',
        'count' => 15,
),
array (
        '_id' => 'cuisine=chinese',
        'count' => 14,
),
…
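
For comparison, this is roughly what sorting on count with -1 amounts to if you did it in PHP after the fact. It is just a sketch with usort(), using the three counts shown above in shuffled order as input; in practice you would let the $sort operator do this on the server:

```php
<?php
// The three groups from the output above, deliberately out of order.
$grouped = array(
    array( '_id' => 'cuisine=chinese',  'count' => 14 ),
    array( '_id' => 'cuisine=regional', 'count' => 19 ),
    array( '_id' => 'cuisine=burger',   'count' => 15 ),
);

// Descending order on 'count', like array( 'count' => -1 ) in $sort.
usort( $grouped, function ( array $a, array $b ) {
    return $b['count'] <=> $a['count'];
} );

var_dump( array_column( $grouped, '_id' ) );
```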

Conclusion

With this I conclude my introduction to the aggregation framework. You can find the final script here. The documentation is extensive, so I suggest giving it a good read.

I'm going back to preparing my PHP Benelux workshop now!

Shortlink

This article has a short URL available: http://drck.me/mdbaggr-a2a


Where is the Sun?

In a previous article I wrote that I am using my Raspberry Pi as a status screen showing the weather, among other things, but I wanted to make the widget that shows the current weather a bit more interesting. Instead of having the background black (for nights) and white (for days), I want to have a better approximation of the lightness of the sky. In order to be able to do this, I need to know: where in the sky is the Sun? (In Britain, of course, that would be: where in the sky is the Sun behind the clouds?)

With PHP's date_sun_info() function you can easily calculate when the Sun rises and sets, but it's not useful to determine how far above or under the horizon the Sun is. For that, I needed to implement a little bit more maths. I found an excellent tutorial online that explains the formulas that are used to calculate the position of the Sun. The trigonometry and maths go beyond me at the moment though!

I've implemented some of those functions in a simple library, called "astro". You can find it on GitHub at https://github.com/derickr/astro. Right now, it doesn't implement a lot more than just the position of the Sun, but I am intending to implement the rest of the algorithms too.

Of course, a C library of some maths isn't very useful if your language of choice is PHP, so I also implemented a tiny PHP extension wrapping the astro library. It's called solarsystem and is available on GitHub as well. There is only an earth_sunpos() function so far, but again, I intend to extend it.

In order to make use of it, you'll have to run:

git clone git://github.com/derickr/php-solarsystem.git
cd php-solarsystem
git submodule init
git submodule update
phpize
./configure
make
make install

Then you can either add extension=solarsystem.so to php.ini, or run scripts with php -dextension=solarsystem.so yourscript.php. In the tests/ directory of the Git checkout you can find a script called sun-position.php. If we examine that, we will see that for four cities (Johannesburg, London, Longyearbyen and Oslo) we calculate the position of the Sun for every 15 minutes during January 14th, 2013. The main function there is earth_sunpos() which takes a Unix timestamp, as well as the latitude and the longitude of the location for which we want to calculate the Sun's position.

The script produces CSV, which I redirect into a file:

php -dextension=solarsystem.so sun-position.php > sunpos.csv

I then opened this file in LibreOffice and made a pretty graph out of it:

sunpos.png

For Longyearbyen (yellow line) it shows that the Sun never rises as it always stays below the horizon. It also shows that the highest point over the horizon is different for London and Oslo—mostly because they are 10° apart horizontally. For London, you can also see that sunrise happens around 08:00 and sunset around 16:20.
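
If you don't want to build the extension, the rough shape of these curves can be reproduced with a much simpler textbook approximation. The sketch below is not the algorithm used by the astro library: it ignores the equation of time and atmospheric refraction, so it can be off by a couple of degrees. The function name approxSunElevation() and the rounded coordinates for London and Longyearbyen are my own assumptions:

```php
<?php
// Rough approximation of the Sun's elevation above the horizon, in degrees.
// NOT the astro library's algorithm: no equation of time, no refraction.
function approxSunElevation( int $timestamp, float $latitude, float $longitude ): float
{
    $dayOfYear = (int) gmdate( 'z', $timestamp ) + 1;
    $utcHours  = ( $timestamp % 86400 ) / 3600;

    // Approximate solar declination for this day of the year.
    $declination = deg2rad(
        -23.44 * cos( deg2rad( 360 / 365 * ( $dayOfYear + 10 ) ) )
    );

    // Hour angle: 0 at local solar noon, 15 degrees for every hour.
    $hourAngle = deg2rad( 15 * ( $utcHours + $longitude / 15 - 12 ) );

    $phi = deg2rad( $latitude );
    $sinElevation = sin( $phi ) * sin( $declination )
        + cos( $phi ) * cos( $declination ) * cos( $hourAngle );

    return rad2deg( asin( $sinElevation ) );
}

// Noon (UTC) on January 14th, 2013, with approximate coordinates.
$noon         = gmmktime( 12, 0, 0, 1, 14, 2013 );
$london       = approxSunElevation( $noon, 51.5, -0.13 );
$longyearbyen = approxSunElevation( $noon, 78.22, 15.65 );

printf( "London:       %.1f\n", $london );
printf( "Longyearbyen: %.1f\n", $longyearbyen );
```

Even with this crude formula, Longyearbyen comes out below the horizon at midday, while London gets a low winter Sun.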

The position over the horizon, combined with the weather forecast, allows me to calculate the likely lightness of the sky. But that will have to wait for a future blog post.

Shortlink

This article has a short URL available: http://drck.me/where-sun-a22

Comments

Pretty cool stuff in there! I love when you get a chance to work on something fun to strengthen your brain! Especially when you spend most of your day doing the type of web development that is little more than database CRUD over HTTP.

Tweaking the Logitech R400 presenter tool on Linux

Updated on April 9th, 2013, after comments by Jim Diamond

For Christmas I received a Logitech R400 presenter tool as a replacement for the php|architect one that has now fallen apart. However, to use it together with my presentation system—pres2, about which I previously wrote—I need it to provide left and right arrow keypresses. By default its left and right buttons generate Prior and Next events in X.

It took me a while to get this to work, so in short, this is how I changed it. First of all, I created the file /etc/udev/logitech-r400 as root with contents:

0x70037 f5
0x70029 f11
0x7003E f11
0x7004B left
0x7004E right

logitech-r400.png

This maps the two scan codes that the Play button (lower-left) sends both to f11, the empty screen button to f5, and the left and right buttons to the left and right arrow keys.

The first column represents the scan code, which I obtained by first looking up which input event the device was tied to:

stat -t /dev/input/by-id/usb-Logitech_USB_Receiver-event-kbd --printf "%N\n"

Which showed the following for me:

‘/dev/input/by-id/usb-Logitech_USB_Receiver-event-kbd’ -> ‘../event16’

Then with event16 I ran:

/lib/udev/keymap -i input/event16

And pressed all four buttons (and the Play button twice). This then showed up on the screen:

scan code: 0x7004B   key code: pageup
scan code: 0x7004E   key code: pagedown
scan code: 0x7003E   key code: f5
scan code: 0x70029   key code: esc
scan code: 0x70037   key code: dot

After creating the file, to test things I ran:

sudo /lib/udev/keymap input/event16 /etc/udev/logitech-r400

Which showed:

Remapped scancode 0x70037 to 0x3f (prior: 0x34)
Remapped scancode 0x70029 to 0x57 (prior: 0x01)
Remapped scancode 0x7003e to 0x57 (prior: 0x3f)
Remapped scancode 0x7004b to 0x69 (prior: 0x68)
Remapped scancode 0x7004e to 0x6a (prior: 0x6d)

After this ran, the presenter tool sends the key presses that I want. To make this permanent, I added, as root, a new file /etc/udev/rules.d/logitech.rules with the following contents:

ENV{ID_VENDOR}=="Logitech*", ATTRS{idProduct}=="c52d", RUN+="keymap $name /etc/udev/logitech-r400"

The changes will now persist after rebooting as well.

My first opportunity to test the new tool will be at PHP Benelux, where I will be giving a MongoDB tutorial on January 25th. Tickets are still available.

Shortlink

This article has a short URL available: http://drck.me/r400-a1r

Comments

Very nice and useful. Thanks!
