Importing OpenStreetMap data into MongoDB
In many recent MongoDB-related presentations I have used OpenStreetMap data as the basis for most of my examples. I wrote a script that imports OpenStreetMap (OSM) nodes, ways and, to a lesser extent, relations into MongoDB with a specific, optimised schema. I have written about this briefly before in Indexing Freeform-Tagged Data, but now that MongoDB has received numerous updates to its geospatial indexes, I find it warrants a new article.
In MongoDB 2.2 and before, MongoDB used the 2d index type, which was basically an index over a flat, two-dimensional coordinate system, with some spherical features built on top. MongoDB 2.4 adds a new index type: 2dsphere. Instead of only being able to index points described by x/y coordinates (longitude/latitude), it can now index points, line strings and polygons as defined by GeoJSON.
I have modified my import script to use these new types, and also added support for very simple multi-polygons, which OpenStreetMap records as relations. The script also creates indexes on { l: '2dsphere' } (the GeoJSON object), { ts: 1 } (the tags), and { ty: 1 } (the type).
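Creating those three indexes could look roughly like the following. It is a minimal sketch in JavaScript rather than the PHP of the actual script, and it assumes `collection` is a collection object from the MongoDB Node.js driver, whose `createIndex(keys)` method returns a promise; `createOsmIndexes` is a hypothetical helper name.

```javascript
// Index key specifications from the paragraph above: a 2dsphere index on
// the GeoJSON location, plus ascending indexes on the tags and the type.
const INDEX_SPECS = [
  { l: '2dsphere' },
  { ts: 1 },
  { ty: 1 },
];

// Hypothetical helper. `collection` is assumed to behave like a MongoDB
// Node.js driver Collection. All three createIndex() calls are issued
// straight away; the returned promise resolves with the index names.
function createOsmIndexes(collection) {
  return Promise.all(INDEX_SPECS.map((keys) => collection.createIndex(keys)));
}
```

With the driver, it would be called as `await createOsmIndexes(client.db('demo').collection('poiConcat'))`, where demo and poiConcat are the database and collection names used in this article.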
A node from OpenStreetMap is converted into a document with this structure:
{
    "_id" : "n26486695",
    "ty" : NumberLong(1),
    "l" : {
        "type" : "Point",
        "coordinates" : [
            -0.1580359,
            51.4500055
        ]
    },
    "ts" : [
        "addr:housenumber=97",
        "addr:postcode=SW12 8NX",
        "addr:street=Nightingale Lane",
        "amenity=pub",
        "name=The Nightingale",
        "operator=Youngs",
        "source:name=photograph",
        "toilets=yes",
        "toilets:access=customers",
        "website=http://www.youngs.co.uk/pub-detail.asp?PubID=430"
    ],
    "m" : {
        "v" : NumberLong(5),
        "cs" : NumberLong(11229430),
        "uid" : NumberLong(652021),
        "ts" : NumberLong(1333911628)
    }
}
There are several sections that make up the document:
- _id: A combination of n and the OSM node ID.
- ty: The type. For nodes, this is always 1.
- l: The point's location in GeoJSON format. An OSM node is translated into a GeoJSON Point whose coordinates array holds the longitude and then the latitude.
- ts: The tags that describe the node. Each tag is stored as a concatenation of its key and its value, joined by =. This creates a smaller index while still allowing exact key/value matches as well as matching against specific keys through a regular expression. For example, we could find the above document with:
      db.poiConcat.find( { ts: 'name=The Nightingale' } );
  And the index would also be used when we look for all amenities:
      db.poiConcat.find( { ts: /^amenity=/ } );
- m: Contains meta information that describes the node. The following fields are currently present:
    - v: The object's version number, as found in the imported file. Only one version per object is stored.
    - cs: The ID of the changeset in which this object was last updated.
    - uid: The user ID of the OpenStreetMap contributor who uploaded the latest version.
    - ts: The Unix timestamp of when this object was last updated.
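The schema can be illustrated with a minimal JavaScript sketch (the import script itself is PHP). It assumes a node has already been parsed from the OSM file into a hypothetical `{ id, lat, lon, tags, version, changeset, uid, timestamp }` object; `encodeTags`, `nodeToDocument`, `escapeRegExp` and `tagQuery` are illustrative names. Sorting the tags by key is also an assumption, although it reproduces the order in the example document.

```javascript
// Concatenate each key/value pair into a single "key=value" string.
// Sorting by key is an assumption made for deterministic output.
function encodeTags(tags) {
  return Object.entries(tags)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => `${key}=${value}`);
}

// Convert a hypothetical parsed OSM node into the document layout above.
// Plain JS numbers are used here; the PHP script stores these values as
// 64-bit integers (shown as NumberLong in the shell).
function nodeToDocument(node) {
  return {
    _id: `n${node.id}`,
    ty: 1, // nodes always have type 1
    l: {
      type: 'Point',
      coordinates: [node.lon, node.lat], // GeoJSON order: longitude first
    },
    ts: encodeTags(node.tags),
    m: {
      v: node.version,
      cs: node.changeset,
      uid: node.uid,
      ts: node.timestamp, // Unix timestamp
    },
  };
}

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a query filter: an exact "key=value" match when a value is given,
// otherwise a regex anchored at the start that matches any value for the key.
function tagQuery(key, value) {
  if (value !== undefined) {
    return { ts: `${key}=${value}` };
  }
  return { ts: new RegExp(`^${escapeRegExp(key)}=`) };
}
```

`tagQuery('amenity')` yields the same filter as `{ ts: /^amenity=/ }` above. Because the regular expression is anchored with ^, MongoDB can use the index on ts to scan only the keys that start with amenity=.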
OpenStreetMap ways are stored as two different GeoJSON types. Unclosed ways are stored as GeoJSON LineStrings (think roads):
{
    "_id" : "w2423886",
    "ty" : NumberLong(2),
    "l" : {
        "type" : "LineString",
        "coordinates" : [
            [ -0.1044769, 51.508462 ],
            [ -0.1044093, 51.5106306 ],
            [ -0.1044139, 51.5107814 ],
            [ -0.104427, 51.5108453 ],
            [ -0.1044459, 51.5109208 ],
            [ -0.1045131, 51.5110686 ]
        ]
    },
    …
And closed ways are stored as GeoJSON Polygons (think buildings and parks):
{
    "_id" : "w24257746",
    "ty" : NumberLong(2),
    "l" : {
        "type" : "Polygon",
        "coordinates" : [
            [
                [ -0.0745133, 51.560977 ],
                [ -0.0742252, 51.5609742 ],
                [ -0.0742308, 51.5606721 ],
                [ -0.0745217, 51.5606721 ],
                [ -0.0745133, 51.560977 ]
            ]
        ]
    },
    "ts" : [
        "amenity=park",
        "leisure=park",
        "name=Kynaston Gardens"
    ],
    "m" : {
        "v" : NumberLong(1),
        "cs" : NumberLong(357805),
        "uid" : NumberLong(5139),
        "ts" : NumberLong(1210169336)
    }
}
Both unclosed ways and areas (closed ways) have a ty value of 2, as both come from OpenStreetMap's way primitive.
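The distinction between open and closed ways could be sketched like this, again in JavaScript and under stated assumptions: the way is a hypothetical `{ id, nodes }` object listing node IDs in order, `nodeCoords` is a Map from node ID to `[longitude, latitude]` that already contains every referenced node, and tags and metadata (handled as for nodes) are left out. Following the description above, any closed way becomes a Polygon; requiring at least four positions is an added guard, because a GeoJSON polygon ring needs that many.

```javascript
// Convert a hypothetical parsed way into the document layout shown above.
// `way.nodes` lists node IDs in order; `nodeCoords` maps each node ID to its
// [longitude, latitude] pair and is assumed to contain every referenced node.
function wayToDocument(way, nodeCoords) {
  const coords = way.nodes.map((id) => nodeCoords.get(id));

  // In OSM a way is closed when its first and last node are the same node.
  // A GeoJSON polygon ring also needs at least four positions.
  const lastIndex = way.nodes.length - 1;
  const closed = way.nodes.length >= 4 && way.nodes[0] === way.nodes[lastIndex];

  const location = closed
    ? { type: 'Polygon', coordinates: [coords] } // a single outer ring
    : { type: 'LineString', coordinates: coords };

  // Tags ("ts") and metadata ("m") are omitted; they follow the node layout.
  return {
    _id: `w${way.id}`,
    ty: 2, // open and closed ways both come from the OSM way primitive
    l: location,
  };
}
```

Note that the Polygon's coordinates are wrapped in one extra array: GeoJSON polygons are a list of rings, where the first ring is the outer boundary.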
The script is available on GitHub as part of the 3angle repository. The latest version is at https://raw.github.com/derickr/3angle/master/import-data.php and it also requires https://raw.github.com/derickr/3angle/master/classes.php for some GeoJSON helper classes and https://raw.github.com/derickr/3angle/master/config.php where you can set the database name and collection name (in my case, demo and poiConcat).
Map data © OpenStreetMap contributors (terms).
Comments
As interesting as it is, what has this got to do with PHP? Perhaps Planet MongoDB would be a better place?
@Brooksie: The script is in PHP and shows you how to actually import. I just didn't add the script's code directly into the article.
Also, this is on Planet MongoDB too: http://planet.mongodb.org/search?q=%22Importing+OpenStreetMap%22
I'm curious, are you importing the whole 29 GB planet file into Mongo or just a UK extract?
@Norman: Just the UK for now. I don't have enough disk space for the whole planet.
I've recently done some work with importing data into MySQL. I'm considering switching to MongoDB though. Do you have an idea of the relative performance differences?
I am not quite sure, but there seems to be something wrong with your "import_data.php" file. It always crashes because it tries to insert duplicate index keys. See for yourself:
PHP Fatal error: Uncaught exception 'MongoCursorException' with message 'localhost:40000: E11000 duplicate key error index: cache_demo.nodecache.$_ dup key: { : 2147483647 }' in /opt/lamp/htdocs/import-data.php:72
Stack trace:
#0 /opt/lampp/htdocs/import-data.php(72):MongoCollection->batchInsert(Array, Array)
#1 {main}
  thrown in /opt/lampp/htdocs/import-data.php on line 72
@Chris: This error includes the number 2147483647, which is the maximum 32-bit integer. I suspect you are running this script on a 32-bit platform, not a 64-bit one. I'm afraid that is not supported, as OSM's IDs are now too large to fit in 32 bits.
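The effect described in that reply can be simulated in a few lines. This is a hypothetical JavaScript illustration, not PHP's implementation: it assumes that converting an integer string above 2147483647 saturates at that maximum instead of failing, which would match the error above. Under that assumption, distinct node IDs collapse into the same _id, which the unique index then rejects with E11000. `toInt32Saturating` is an illustrative name.

```javascript
// Largest signed 32-bit integer, 2^31 - 1: the value in the E11000 error above.
const INT32_MAX = 2147483647;

// Hypothetical simulation of a saturating string-to-integer conversion on a
// 32-bit platform: positive values above INT32_MAX are clamped to INT32_MAX
// instead of raising an error. Negative IDs are not handled in this sketch.
function toInt32Saturating(idString) {
  const value = Number(idString);
  return value > INT32_MAX ? INT32_MAX : value;
}
```

For example, `toInt32Saturating('2147483648')` and `toInt32Saturating('2200000000')` both return 2147483647, so the second document inserted with such an ID collides with the first.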
Thank you. It works now, sorry for the inconvenience.
Shortlink
This article has a short URL available: https://drck.me/mongosm-a7u