Importing OpenStreetMap data into MongoDB

In many recent MongoDB-related presentations I have used OpenStreetMap data as the basis for most of my examples. I wrote a script that imports OpenStreetMap (OSM) nodes, ways and, to a lesser extent, relations into MongoDB with a specific and optimal schema. I have written about this briefly before in Indexing Freeform-Tagged Data, but now that MongoDB has received numerous updates to its geospatial indexes, I find it warrants a new article.

In MongoDB 2.2 and before, the index type that MongoDB used was the 2d type, which was basically a two-dimensional flat-earth coordinate system index with some spherical features built on top. MongoDB 2.4 has a new index type: 2dsphere. Instead of only being able to index points described by x/y coordinates (longitude/latitude), it supports indexing points, line strings and polygons as defined by GeoJSON.

I have modified my import script to use those new types, and also added support for very simple multi-polygons that OpenStreetMap records through its relations. The script also creates indexes on { l: '2dsphere' } (the GeoJSON object), { ts: 1 } (the tags), and { ty: 1 } (the type).
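
As a rough sketch of that last step, the three indexes could be created from JavaScript along the following lines. The import script itself is written in PHP, so this is not its code: the createPoiIndexes helper and the POI_INDEXES name are illustrative assumptions, and collection is assumed to be any object with a createIndex(spec) method (in a 2.4-era mongo shell the equivalent method is ensureIndex).

```javascript
// Index specifications matching the ones described in the article.
const POI_INDEXES = [
  { l: '2dsphere' }, // the GeoJSON geometry
  { ts: 1 },         // the concatenated "key=value" tags
  { ty: 1 }          // the OSM primitive type
];

// Hypothetical helper (not part of the import script): calls
// createIndex() once per specification and returns whatever each
// call returned, in the same order.
function createPoiIndexes(collection) {
  return POI_INDEXES.map(function (spec) {
    return collection.createIndex(spec);
  });
}
```

In a mongo shell this could be invoked as createPoiIndexes(db.poiConcat), assuming the collection name used in this article.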

The script converts an OpenStreetMap node into a document that looks like this:

{
        "_id" : "n26486695",
        "ty" : NumberLong(1),
        "l" : {
                "type" : "Point",
                "coordinates" : [
                        -0.1580359,
                        51.4500055
                ]
        },
        "ts" : [
                "addr:housenumber=97",
                "addr:postcode=SW12 8NX",
                "addr:street=Nightingale Lane",
                "amenity=pub",
                "name=The Nightingale",
                "operator=Youngs",
                "source:name=photograph",
                "toilets=yes",
                "toilets:access=customers",
                "website=http://www.youngs.co.uk/pub-detail.asp?PubID=430"
        ],
        "m" : {
                "v" : NumberLong(5),
                "cs" : NumberLong(11229430),
                "uid" : NumberLong(652021),
                "ts" : NumberLong(1333911628)
        }
}

There are several sections that make up the document:

  • _id: Is a combination of n and the OSM node id.

  • ty: Is the type. For nodes, this is always 1.

  • l: The point's location in GeoJSON format. An OSM node is translated to a GeoJSON Point object with an array describing the longitude and latitude.

  • ts: The tags that describe the node. Each tag is stored as a concatenation of its key and its value, joined by an equals sign. This results in a smaller index while still allowing exact key/value matches, as well as matches against specific keys through an anchored regular expression. For example, we could find the above document with:

    db.poiConcat.find( { ts: 'name=The Nightingale' } );

    And the index would also be used when we look for all amenities:

    db.poiConcat.find( { ts: /^amenity=/ } );

  • m: Contains meta information that describes the node. The following fields are currently present:

    • v: The object's version. This is the version number of the version that was found in the imported file. There is always just one version per object.

    • cs: The changeset ID in which this object was last updated.

    • uid: The user ID of the OpenStreetMap contributor who uploaded the latest version.

    • ts: The Unix timestamp of when this object was last updated.
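
To make the mapping above more concrete, here is a minimal sketch of how a parsed node could be turned into such a document. It is not the import script's code (that is PHP): the function names and the shape of the input object (id, lon, lat, tags, version, changeset, uid, timestamp) are assumptions made for illustration, and plain JavaScript numbers stand in for the NumberLong values shown in the shell output.

```javascript
// Hypothetical input shape (an assumption, not the script's format):
// { id, lon, lat, tags: { key: value }, version, changeset, uid, timestamp }
function osmNodeToDocument(node) {
  // Each tag becomes a single "key=value" string, in input order.
  const ts = Object.keys(node.tags).map(function (key) {
    return key + '=' + node.tags[key];
  });

  return {
    _id: 'n' + node.id,   // "n" prefix marks a node
    ty: 1,                // nodes always have type 1
    // GeoJSON order is longitude first, then latitude.
    l: { type: 'Point', coordinates: [node.lon, node.lat] },
    ts: ts,
    m: {
      v: node.version,
      cs: node.changeset,
      uid: node.uid,
      ts: node.timestamp
    }
  };
}

// Builds an anchored regular expression that matches every tag with
// the given key. Regex metacharacters in the key are escaped first.
function tagKeyPattern(key) {
  const escaped = key.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped + '=');
}
```

The pattern is anchored with ^ so that the query can use the index on ts; for example, db.poiConcat.find( { ts: tagKeyPattern('amenity') } ) is equivalent to the /^amenity=/ query shown earlier.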

OpenStreetMap ways are stored as one of two different types. Unclosed ways are stored as GeoJSON LineStrings (think roads):

{
        "_id" : "w2423886",
        "ty" : NumberLong(2),
        "l" : {
                "type" : "LineString",
                "coordinates" : [
                        [ -0.1044769, 51.508462 ],
                        [ -0.1044093, 51.5106306 ],
                        [ -0.1044139, 51.5107814 ],
                        [ -0.104427, 51.5108453 ],
                        [ -0.1044459, 51.5109208 ],
                        [ -0.1045131, 51.5110686 ]
                ]
        },
…

And closed ways are stored as GeoJSON Polygons (think buildings and parks):

{
    "_id" : "w24257746",
    "ty" : NumberLong(2),
    "l" : {
        "type" : "Polygon",
        "coordinates" : [
            [
                [ -0.0745133, 51.560977 ],
                [ -0.0742252, 51.5609742 ],
                [ -0.0742308, 51.5606721 ],
                [ -0.0745217, 51.5606721 ],
                [ -0.0745133, 51.560977 ]
            ]
        ]
    },
    "ts" : [
        "amenity=park",
        "leisure=park",
        "name=Kynaston Gardens"
    ],
    "m" : {
        "v" : NumberLong(1),
        "cs" : NumberLong(357805),
        "uid" : NumberLong(5139),
        "ts" : NumberLong(1210169336)
    }
}

Both ways and areas (closed ways) will have a ty value of 2, as they both come from a way primitive as stored in OpenStreetMap.
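
The choice between the two geometry types can be sketched as follows. This is an illustration under assumptions rather than the script's actual logic: the function name is made up, the way's node references are assumed to be already resolved to [longitude, latitude] pairs, and a way is treated as closed when its first and last positions are identical and it has at least four positions (the minimum for a GeoJSON polygon ring).

```javascript
// Hypothetical helper: picks the GeoJSON geometry for a way whose
// nodes have already been resolved to [longitude, latitude] pairs.
function osmWayToGeometry(coordinates) {
  const first = coordinates[0];
  const last = coordinates[coordinates.length - 1];

  // A closed way ends where it starts; a polygon ring needs at
  // least four positions, the last one repeating the first.
  const closed = coordinates.length >= 4 &&
    first[0] === last[0] &&
    first[1] === last[1];

  if (closed) {
    // A Polygon is an array of rings; a simple closed way has only
    // the outer ring.
    return { type: 'Polygon', coordinates: [coordinates] };
  }
  return { type: 'LineString', coordinates: coordinates };
}
```

Either way, the resulting object would be stored in the l field of a document whose ty is 2.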

The script is available on GitHub as part of the 3angle repository. The latest version is at https://raw.github.com/derickr/3angle/master/import-data.php and it also requires https://raw.github.com/derickr/3angle/master/classes.php for some GeoJSON helper classes and https://raw.github.com/derickr/3angle/master/config.php where you can set the database name and collection name (in my case, demo and poiConcat).

Map data © OpenStreetMap contributors (terms).

Shortlink

This article has a short URL available: http://drck.me/mongosm-a7u

Comments

As interesting as it is, what has this got to do with PHP? Perhaps planetMongo would be a better place?

@Brooksie: The script is in PHP and shows you how to actually import. I just didn't add the script's code directly into the article.

Also, this is on Planet MongoDB too: http://planet.mongodb.org/search?q=%22Importing+OpenStreetMap%22

I'm curious, are you importing the whole 29 GB planet file into Mongo or just a UK extract?

@Norman: Just the UK for now. I don't have enough disk space for the whole planet.

I've recently done some work with importing data into MySQL. I'm considering switching to MongoDB though. Do you have an idea of the relative performance differences?

I am not quite sure, but there seems to be something wrong with your "import-data.php" file. It always crashes because it tries to add duplicate indexes. See for yourself:

PHP Fatal error: Uncaught exception 'MongoCursorException' with message 'localhost:40000: E11000 duplicate key error index: cache_demo.nodecache.$_ dup key: { : 2147483647 }' in /opt/lamp/htdocs/import-data.php:72 Stack trace: #0 /opt/lampp/htdocs/import-data.php(72):MongoCollection->batchInsert(Array, Array) #1 {main}

thrown in /opt/lampp/htdocs/import-data.php on line 72

@Chris: This error includes the number 2147483647, which is the maximum signed 32-bit integer. I suspect you are running this script on a 32-bit platform rather than a 64-bit one. I'm afraid that is not supported, as OSM's IDs are now too large to fit in 32 bits.

Thank you. It works now, sorry for the inconvenience.
