08.17.14

Reading OSM data in C++

Posted in Uncategorized at 10:50 am by danvk

I’m interested in using OpenStreetMap data to add lots more shapes to Comparea. There are far too many polygons in OSM to include everything, so you have to filter to “interesting” ones. That’s a hard concept to make precise! One idea is to say that any feature with an associated Wikipedia article is interesting.

To make a list of such features, I started with the planet.osm.pbf file, which you can download as a torrent. This file is in “PBF” format, an OSM-specific format based on Google’s Protocol Buffers. I tried to filter down to just the features with Wikipedia tags using GDAL’s ogr2ogr tool (which supports the PBF format), but had no luck.

Instead, I wrote my own filter in C++. This was much easier than you might expect and, since the planet.osm.pbf file is 25GB and growing, the speed was probably worth the effort.

I used libosmpbfreader, which depends on libosmpbf, which in turn depends on Protocol Buffers (the protobuf library and its protoc compiler). On Mac OS X, my install sequence looked like this:

brew install protobuf
git clone https://github.com/scrosby/OSM-binary.git
cd OSM-binary
make -C src
make -C src install
cd ..
git clone https://github.com/CanalTP/libosmpbfreader.git
cd libosmpbfreader
make
./example_counter planet.osm.pbf

Running the example_counter binary over planet.osm.pbf, I was able to read something like 2GB/minute of data, so 12-13 minutes for the full 25GB file. The Wikipedia filtering code ran in ~30 minutes. Here's my full code if you're interested. It found 117,211 ways and 156,064 relations with Wikipedia tags.
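For a rough idea of what such a filter involves, here is a minimal sketch. It is not the code linked above. It assumes a visitor with node, way, and relation callbacks, modeled on the shape of libosmpbfreader's example counter, and it assumes tags arrive as a string-to-string map. The names Tags, has_wikipedia_tag, and WikipediaCounter are hypothetical, and treating any key equal to "wikipedia" or starting with "wikipedia:" as a Wikipedia tag is my own assumption about what should count. The sketch does not include the library header, so it compiles on its own.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for the library's tag container (assumption: a key -> value map).
using Tags = std::map<std::string, std::string>;

// Returns true if any tag key is exactly "wikipedia" or begins with
// "wikipedia:" (for example "wikipedia:en"). This matching rule is an
// assumption for illustration, not taken from the original filter.
bool has_wikipedia_tag(const Tags& tags) {
    const std::string prefix = "wikipedia:";
    for (const auto& kv : tags) {
        const std::string& key = kv.first;
        if (key == "wikipedia" || key.compare(0, prefix.size(), prefix) == 0) {
            return true;
        }
    }
    return false;
}

// Hypothetical visitor: counts ways and relations that carry a Wikipedia tag.
// The callback names and argument order mirror the example counter's shape;
// the relation member list is left as a template parameter because this
// sketch never inspects it.
struct WikipediaCounter {
    std::uint64_t ways = 0;
    std::uint64_t relations = 0;

    // Nodes are ignored: only ways and relations can describe shapes.
    void node_callback(std::uint64_t /*osmid*/, double /*lon*/, double /*lat*/,
                       const Tags& /*tags*/) {}

    void way_callback(std::uint64_t /*osmid*/, const Tags& tags,
                      const std::vector<std::uint64_t>& /*refs*/) {
        if (has_wikipedia_tag(tags)) ++ways;
    }

    template <typename Refs>
    void relation_callback(std::uint64_t /*osmid*/, const Tags& tags,
                           const Refs& /*refs*/) {
        if (has_wikipedia_tag(tags)) ++relations;
    }
};
```

With the real library, a visitor like this would presumably be handed to its reader function in place of the example counter. A real filter would also record the matching IDs and tags rather than just counting them.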
