Importing OpenStreetMap data into MongoDB
In many recent MongoDB-related presentations I have used OpenStreetMap data as the basis for most of my examples. I wrote a script that imports OpenStreetMap (OSM) nodes, ways and, to a lesser extent, relations into MongoDB with a specific, optimised schema. I have written about this briefly before in Indexing Freeform-Tagged Data. Now that MongoDB has received numerous updates to its geospatial indexes, I think the topic warrants a new article.
In MongoDB 2.2 and before, the geospatial index type that MongoDB used was 2d. This is essentially a two-dimensional, flat-earth coordinate index with some spherical features built on top. MongoDB 2.4 adds a new index type: 2dsphere. The 2d index can only index points described by x/y coordinates (longitude/latitude). The 2dsphere index supports points, line strings and polygons as defined by GeoJSON.
I have modified my import script to use those new types. I also added support for very simple multi-polygons, which OpenStreetMap records as relations. The script also creates indexes on { l: '2dsphere' } (the GeoJSON object), { ts: 1 } (the tags), and { ty: 1 } (the type).
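The following sketch shows what a "very simple" multi-polygon could look like. It assumes a relation whose members have already been resolved into closed rings of [longitude, latitude] pairs, each with an OSM role of outer or inner. It only handles the case of exactly one outer ring. That ring becomes the exterior of a single GeoJSON Polygon, and any inner rings become holes. The function name and the member shape are hypothetical and not taken from the actual import script.

```javascript
// Hypothetical helper: builds a GeoJSON Polygon from a "very simple"
// multipolygon relation, i.e. exactly one outer ring plus zero or more
// inner rings. Each member is assumed to look like
//   { role: "outer" | "inner", ring: [[lon, lat], ...] }
function simpleMultipolygonToGeoJSON(members) {
  // A valid GeoJSON ring has at least 4 positions and ends where it starts.
  const isClosed = (ring) =>
    ring.length >= 4 &&
    ring[0][0] === ring[ring.length - 1][0] &&
    ring[0][1] === ring[ring.length - 1][1];

  const outers = members.filter((m) => m.role === "outer");
  const inners = members.filter((m) => m.role === "inner");

  // Anything more complicated than a single outer ring is skipped.
  if (outers.length !== 1) {
    return null;
  }

  const rings = [outers[0].ring, ...inners.map((m) => m.ring)];
  if (!rings.every(isClosed)) {
    return null;
  }

  // In GeoJSON the first ring is the exterior; later rings are holes.
  return { type: "Polygon", coordinates: rings };
}
```

Relations with several outer rings would need a real MultiPolygon, which is beyond "very simple". This sketch returns null for them rather than guessing.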
An OpenStreetMap node is converted into a document with this structure:
{
"_id" : "n26486695",
"ty" : NumberLong(1),
"l" : {
"type" : "Point",
"coordinates" : [
-0.1580359,
51.4500055
]
},
"ts" : [
"addr:housenumber=97",
"addr:postcode=SW12 8NX",
"addr:street=Nightingale Lane",
"amenity=pub",
"name=The Nightingale",
"operator=Youngs",
"source:name=photograph",
"toilets=yes",
"toilets:access=customers",
"website=http://www.youngs.co.uk/pub-detail.asp?PubID=430"
],
"m" : {
"v" : NumberLong(5),
"cs" : NumberLong(11229430),
"uid" : NumberLong(652021),
"ts" : NumberLong(1333911628)
}
}
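Here is a minimal JavaScript sketch of how a parsed node could be turned into a document of this shape. It is not the import script itself, which is written in PHP. The input object and its field names (id, lat, lon, version, changeset, uid, timestamp, tags) are assumptions, modelled on the attributes of a node in OSM's XML format. Plain JavaScript numbers stand in for the NumberLong values shown by the shell.

```javascript
// Hypothetical input: a node as parsed from OSM XML, e.g.
// { id, lat, lon, version, changeset, uid,
//   timestamp: "2012-04-08T19:00:28Z", tags: { amenity: "pub", ... } }
function nodeToDocument(node) {
  return {
    // "n" prefix, so node IDs cannot clash with way IDs ("w...")
    _id: "n" + node.id,
    ty: 1, // nodes always have type 1
    l: {
      type: "Point",
      // GeoJSON puts longitude first, then latitude
      coordinates: [node.lon, node.lat],
    },
    // every tag becomes a single "key=value" string
    ts: Object.entries(node.tags).map(([key, value]) => `${key}=${value}`),
    m: {
      v: node.version,
      cs: node.changeset,
      uid: node.uid,
      // OSM XML timestamps are ISO 8601 strings; store whole Unix seconds
      ts: Math.floor(Date.parse(node.timestamp) / 1000),
    },
  };
}
```

GeoJSON specifies longitude before latitude, so swapping the two is an easy mistake. The pub above has a longitude of about -0.158 and a latitude of about 51.45.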
There are several sections that make up the document:
- _id: A combination of n and the OSM node ID.
- ty: The type. For nodes, this is always 1.
- l: The point's location in GeoJSON format. An OSM node is translated to a GeoJSON Point with a coordinates array holding the longitude and latitude, in that order.
- ts: The tags that describe the node. Each tag is stored as a concatenation of its key and its value, joined with =. This creates a smaller index while still allowing exact key/value matches, as well as matches against specific keys through a regular expression. For example, we could find the above document with:
  db.poiConcat.find( { ts: 'name=The Nightingale' } );
  And the index would also be used when we look for all amenities:
  db.poiConcat.find( { ts: /^amenity=/ } );
- m: Meta information that describes the node. The following fields are currently present:
  - v: The object's version, as found in the imported file. Only one version is stored per object.
  - cs: The ID of the changeset in which this object was last updated.
  - uid: The user ID of the OpenStreetMap contributor who uploaded the latest version.
  - ts: The Unix timestamp of when this object was last updated.
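The concatenated form supports both kinds of tag query. The following plain JavaScript sketch, with no database involved, mirrors the two ts queries from the list above against a document's ts array. The helper names are hypothetical. A value can itself contain = (the website tag of the example node does), so splitting a stored tag back into key and value has to happen at the first =.

```javascript
// Split a stored "key=value" tag at the FIRST "=", because values may
// themselves contain "=" (e.g. a URL ending in "?PubID=430").
function parseTag(tag) {
  const i = tag.indexOf("=");
  if (i === -1) {
    return { key: tag, value: "" };
  }
  return { key: tag.slice(0, i), value: tag.slice(i + 1) };
}

// Mirrors { ts: 'name=The Nightingale' }: an exact key/value match.
function hasTag(doc, key, value) {
  return doc.ts.includes(`${key}=${value}`);
}

// Mirrors { ts: /^amenity=/ }: any value for the given key.
// Regex metacharacters in the key are escaped so they match literally.
function hasKey(doc, key) {
  const escaped = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("^" + escaped + "=");
  return doc.ts.some((t) => re.test(t));
}
```

Anchoring the expression with ^ and including the trailing = matters. Without the =, a search for the key amen would also match amenity. In MongoDB, a case-sensitive expression anchored with ^ is a prefix expression, which is the form that can use the index on ts efficiently.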
OpenStreetMap ways are stored as one of two different types. Unclosed ways are stored as GeoJSON LineStrings (think roads):
{
"_id" : "w2423886",
"ty" : NumberLong(2),
"l" : {
"type" : "LineString",
"coordinates" : [
[ -0.1044769, 51.508462 ],
[ -0.1044093, 51.5106306 ],
[ -0.1044139, 51.5107814 ],
[ -0.104427, 51.5108453 ],
[ -0.1044459, 51.5109208 ],
[ -0.1045131, 51.5110686 ]
]
},
…
And closed ways are stored as GeoJSON Polygons (think buildings and parks):
{
"_id" : "w24257746",
"ty" : NumberLong(2),
"l" : {
"type" : "Polygon",
"coordinates" : [
[
[ -0.0745133, 51.560977 ],
[ -0.0742252, 51.5609742 ],
[ -0.0742308, 51.5606721 ],
[ -0.0745217, 51.5606721 ],
[ -0.0745133, 51.560977 ]
]
]
},
"ts" : [
"amenity=park",
"leisure=park",
"name=Kynaston Gardens"
],
"m" : {
"v" : NumberLong(1),
"cs" : NumberLong(357805),
"uid" : NumberLong(5139),
"ts" : NumberLong(1210169336)
}
}
Both ways and areas (closed ways) will have a ty value of 2, as they both come from a way primitive as stored in OpenStreetMap.
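The way handling described above boils down to one test: is the way closed? The following sketch assumes the way's node references can be resolved to [longitude, latitude] pairs through a lookup table. How the real script stores node locations is not covered here. The sketch treats a way as closed when its first and last node references are the same. The function name and input shape are hypothetical.

```javascript
// Hypothetical helper: converts an OSM way into the stored document.
// `way` is assumed to look like { id, nodes: [nodeId, ...], tags: { ... } }
// and `locations` is a Map from node ID to [lon, lat].
function wayToDocument(way, locations) {
  const coords = way.nodes.map((id) => locations.get(id));
  if (coords.some((c) => c === undefined)) {
    throw new Error("missing location for a node in way " + way.id);
  }

  // Closed: starts and ends at the same node, with enough nodes
  // (at least 4 positions) to form a valid GeoJSON ring.
  const n = way.nodes.length;
  const closed = n >= 4 && way.nodes[0] === way.nodes[n - 1];

  return {
    _id: "w" + way.id,
    ty: 2, // LineStrings and Polygons both come from OSM ways
    l: closed
      ? { type: "Polygon", coordinates: [coords] } // a single exterior ring
      : { type: "LineString", coordinates: coords },
    ts: Object.entries(way.tags).map(([key, value]) => `${key}=${value}`),
  };
}
```

This applies the rule "closed means polygon" literally. In OpenStreetMap, some closed ways, such as roundabouts, describe lines rather than areas, and this sketch does not try to tell them apart.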
The script is available on GitHub as part of the 3angle repository. The latest version is at https://raw.github.com/derickr/3angle/master/import-data.php. It also requires https://raw.github.com/derickr/3angle/master/classes.php for some GeoJSON helper classes, and https://raw.github.com/derickr/3angle/master/config.php, where you can set the database name and collection name (in my case, demo and poiConcat).
Map data © OpenStreetMap contributors (terms).
Comments
As interesting as it is, what has this got to do with PHP? Perhaps Planet MongoDB would be a better place?
@Brooksie: The script is in PHP and shows you how to actually import. I just didn't add the script's code directly into the article.
Also, this is on Planet MongoDB too: http://planet.mongodb.org/search?q=%22Importing+OpenStreetMap%22
I'm curious, are you importing the whole 29 GB planet file into Mongo or just a UK extract?
@Norman: Just the UK for now. I don't have enough disk space for the whole planet.
I've recently done some work on importing data into MySQL. I'm considering switching to MongoDB, though. Do you have an idea of the relative performance differences?
I am not quite sure, but there seems to be something wrong with your import-data.php file. It always crashes because it tries to add duplicate indexes. Have a look yourself:
PHP Fatal error: Uncaught exception 'MongoCursorException' with message 'localhost:40000: E11000 duplicate key error index: cache_demo.nodecache.$_ dup key: { : 2147483647 }' in /opt/lamp/htdocs/import-data.php:72
Stack trace:
#0 /opt/lampp/htdocs/import-data.php(72):MongoCollection->batchInsert(Array, Array)
#1 {main}
  thrown in /opt/lampp/htdocs/import-data.php on line 72
@Chris: This error includes the number 2147483647, which is the maximum value of a signed 32-bit integer. I suspect you are running this script on a 32-bit platform rather than a 64-bit one. I'm afraid that is not supported, as OSM's IDs are now too large to fit in 32 bits.
Thank you. It works now, sorry for the inconvenience.
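The arithmetic behind the 32-bit explanation can be sketched in a few lines. The sketch assumes each ID too large for a signed 32-bit integer ends up clamped to the 32-bit maximum, which is what the 2147483647 in the error message suggests. Under that assumption, every such ID maps to the same value, and the second one violates the unique index named in the error. The function names and sample IDs are made up for illustration.

```javascript
// Largest value of a signed 32-bit integer: 2^31 - 1.
const INT32_MAX = 2147483647;

// Hypothetical model of the failure: IDs above INT32_MAX are clamped
// (saturated) to INT32_MAX instead of being stored correctly.
function toInt32Saturated(id) {
  return Math.min(id, INT32_MAX);
}

// Returns the clamped values that more than one ID collapses onto;
// each of these would trigger a duplicate key error on a unique index.
function findCollisions(ids) {
  const counts = new Map();
  for (const id of ids) {
    const key = toInt32Saturated(id);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([key]) => key);
}
```

For example, findCollisions([26486695, 2147483648, 2500000000]) reports 2147483647. The two large IDs both collapse onto that value, while the small one is unaffected. On a 64-bit platform no clamping happens, so the IDs stay distinct.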
Shortlink
This article has a short URL available: https://drck.me/mongosm-a7u