---
title: OpenStreetMap PBF Parser
description: Protocol Buffers from scratch, no libraries
section: craft
tags: [project, systems-programming]
genre: reference
stability: stable
lastUpdated: 2026-04-19
url: https://fardiniqbal.com/docs/craft/projects/openstreetmap-pbf-parser
---


An OSM `.pbf` parser in C with a hand-written Protocol Buffers
deserializer. 2,672 lines across 11 files, one runtime dependency
(`zlib`), and no `libprotobuf` or `protoc`-generated code. Five CLI
query modes against real OpenStreetMap extracts.

## What it is [#what-it-is]

Given a `.pbf` file, the tool decodes the blob stream, inflates
compressed data blocks, reconstructs `HeaderBlock` and `PrimitiveBlock`
messages, and exposes the underlying `OSM_Map` as nodes, ways, and a
bounding box. All Protocol Buffers parsing (tag/wire-type decoding,
varints, length-delimited fields, fixed-width `I32`/`I64`, packed
repeated fields, and embedded messages) is hand-written behind the
interface declared in `include/protobuf.h`. The CLI answers structural
and key/value queries in a single pass over the parsed map.
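
As an illustration of the tag and varint decoding mentioned above, here is a minimal sketch based on the public Protocol Buffers encoding rules. The names `varint_decode`, `tag_split`, and `enum wire_type` are hypothetical and are not the project's `PB_read_tag` / `PB_read_value`, whose signatures this page does not show.

```c
#include <stddef.h>
#include <stdint.h>

/* Wire types used by the decoder (values from the Protocol Buffers encoding rules). */
enum wire_type { WT_VARINT = 0, WT_I64 = 1, WT_LEN = 2, WT_I32 = 5 };

/*
 * Decode one base-128 varint from buf[0..len).
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or runs past the 10-byte maximum for a 64-bit value.
 */
size_t varint_decode(const uint8_t *buf, size_t len, uint64_t *out)
{
    uint64_t value = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        value |= (uint64_t)(buf[i] & 0x7F) << (7 * i); /* low 7 bits carry payload */
        if ((buf[i] & 0x80) == 0) {                    /* high bit clear: last byte */
            *out = value;
            return i + 1;
        }
    }
    return 0;
}

/* Split a decoded tag: the low 3 bits are the wire type, the rest is the field number. */
void tag_split(uint64_t tag, uint32_t *field, enum wire_type *type)
{
    *type = (enum wire_type)(tag & 0x7);
    *field = (uint32_t)(tag >> 3);
}
```

For example, the two bytes `0x96 0x01` decode to the value 150, and the tag byte `0x12` splits into field number 2 with the length-delimited wire type.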

## By the numbers [#by-the-numbers]

| Metric               | Value                                                |
| -------------------- | ---------------------------------------------------- |
| C source             | 2,672 lines across 11 files                          |
| Runtime dependencies | 1 (`zlib`)                                           |
| CLI query modes      | 5 (summary, bbox, node, way, tag filter)             |
| Sample map           | 46,415 nodes, 5,812 ways                             |
| Sample bbox          | `-73.1387 .. -73.1074` lon, `40.9040 .. 40.9290` lat |
| Compile flags        | `-std=gnu11 -Wall -Werror`                           |

## Architecture [#architecture]

```
.pbf file
   |
   v
+--------------------+
| main.c             |  argv -> validate -> OSM_read_Map -> query
+--------------------+
           |
           v
+--------------------+
| osmpbf.c           |  blob loop: BlobHeader, Blob, inflate,
|                    |  HeaderBlock vs PrimitiveBlock, build OSM_Map
+--------------------+
           |
           v
+--------------------+
| protobuf.c         |  tag/varint/length-delimited/I32/I64/packed
|                    |  decoding into PB_Field linked list
+--------------------+
           |
           v
+--------------------+
| zlib_inflate.c     |  zlib inflate() over fmemopen / open_memstream
+--------------------+
```

| Path                 | Lines | Role                                                       |
| -------------------- | ----: | ---------------------------------------------------------- |
| `src/osmpbf.c`       | 1,038 | OSM model, blob loop, string table, delta + zig-zag decode |
| `src/protobuf.c`     |   912 | wire-format decoder, packed field expansion                |
| `src/process_args.c` |   233 | CLI validation and query dispatch                          |
| `include/protobuf.h` |    95 | `PB_Field`, `PB_Message`, wire-type enum                   |
| `src/zlib_inflate.c` |    83 | zlib stream inflation over FILE streams                    |
| `src/main.c`         |    68 | entrypoint, two-pass argv handling                         |
| `include/osm.h`      |    63 | opaque `OSM_Map` / `OSM_Node` / `OSM_Way`                  |
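
The blob loop in `osmpbf.c` has to find each `BlobHeader` before it can decode anything. In the published OSM PBF format, every blob is preceded by a 4-byte big-endian integer giving the length of the `BlobHeader` that follows. The sketch below shows only that framing step; the function name `blob_header_len` is hypothetical, and how the project actually reads this prefix is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read the 4-byte big-endian BlobHeader length that precedes each blob.
 * Returns 0 on success, or -1 if fewer than 4 bytes remain in the buffer.
 */
int blob_header_len(const uint8_t *buf, size_t len, uint32_t *out)
{
    if (len < 4)
        return -1;
    /* Assemble byte by byte so the result does not depend on host endianness. */
    *out = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    return 0;
}
```

Assembling the value from individual bytes avoids a dependency on `ntohl` and works the same on little-endian and big-endian hosts.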

## Key features [#key-features]

* **Hand-written Protocol Buffers deserializer** — `PB_read_tag` splits
  wire type (3 bits) and field number; `PB_read_value` dispatches on
  `VARINT_TYPE`, `I64_TYPE`, `LEN_TYPE`, `I32_TYPE`; fields stream into
  a circular doubly-linked list headed by a sentinel field of type
  `SENTINEL_TYPE`, which allows forward and backward traversal.
* **Embedded-message and packed-field handling** —
  `PB_read_embedded_message` parses nested messages from in-memory
  buffers, inflating zlib blobs on the fly;
  `PB_expand_packed_fields` expands packed repeated scalars into
  individual `PB_Field` entries for uniform traversal.
* **Zlib decompression over memory buffers** — compressed `Blob`
  payloads pipe through `zlib_inflate` using `fmemopen` /
  `open_memstream`, so inflation works against in-memory buffers
  without temp files.
* **Delta + zig-zag decoding** — `DenseNodes` store IDs, lat, and lon
  as deltas; the parser reverses zig-zag on each delta
  (`(n >> 1) ^ -(n & 1)`) and accumulates the running sum, so negative
  coordinates round-trip. Nanodegrees print as decimal degrees at
  6-digit precision.
* **Opaque OSM object model** — `include/osm.h` exposes `OSM_Map`,
  `OSM_BBox`, `OSM_Node`, `OSM_Way` as opaque handles; nodes and ways
  carry parallel `keys` / `vals` arrays built from the PrimitiveBlock
  string table.
* **Five CLI query modes** — `-s` summary, `-b` bounding box, `-n <id>`
  node lookup, `-w <id> [key ...]` way lookup (node refs or tag
  values), `-f <file>` input path. Arguments may appear in any order: a
  first pass validates them and a second pass runs the queries.
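
The delta and zig-zag step for `DenseNodes` can be sketched as follows. This is a minimal illustration of the technique, not the project's code: `zigzag_decode` and `delta_decode` are hypothetical names, and the input is assumed to be an array of already-decoded varints.

```c
#include <stddef.h>
#include <stdint.h>

/* Undo zig-zag: 0, 1, 2, 3, 4, ... maps back to 0, -1, 1, -2, 2, ... */
int64_t zigzag_decode(uint64_t n)
{
    return (int64_t)(n >> 1) ^ -(int64_t)(n & 1);
}

/*
 * Convert zig-zag-encoded deltas into absolute values.
 * out[i] is the sum of the first i + 1 decoded deltas.
 */
void delta_decode(const uint64_t *encoded, size_t count, int64_t *out)
{
    int64_t running = 0;
    for (size_t i = 0; i < count; i++) {
        running += zigzag_decode(encoded[i]);
        out[i] = running;
    }
}
```

With the encoded input `{2, 1, 4}`, the decoded deltas are 1, -1, and 2, so the absolute values come out as 1, 0, and 2. The same routine would apply to the ID, latitude, and longitude arrays independently.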

## What makes it stand out [#what-makes-it-stand-out]

* **No `libprotobuf`, no `protoc`.** Every encoding layer (varint,
  zig-zag, packed repeated fields, embedded messages, plus the delta
  coding that OSM adds on top of the wire format) is decoded by hand.
  The Protocol Buffers pieces follow the published spec, and the only
  runtime link is `libz`.
* **Two-pass CLI against a single loaded map.** Arguments are validated
  first, and every query then runs against one in-memory `OSM_Map`, so
  `-s -b -n <id>` in the same invocation parses the file once.
* **Valgrind-clean.** Opaque types, explicit ownership, no leaks on
  the sample extract.
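
Of the encodings listed above, packed repeated fields are the one most easily misread as a single value: the payload of one length-delimited field is a run of varints placed back to back. A minimal sketch of expanding such a payload follows. It writes into a plain array rather than the project's `PB_Field` list, and `unpack_varints` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Expand the payload of a packed repeated varint field into `out`.
 * Returns the number of values written, or -1 if a varint is truncated
 * or overlong, or if the payload holds more than `cap` values.
 */
long unpack_varints(const uint8_t *payload, size_t len,
                    uint64_t *out, size_t cap)
{
    size_t pos = 0, count = 0;

    while (pos < len) {
        uint64_t value = 0;
        int shift = 0, done = 0;

        /* Decode one varint: 7 payload bits per byte, high bit means "more". */
        while (pos < len && shift < 64) {
            uint8_t byte = payload[pos++];
            value |= (uint64_t)(byte & 0x7F) << shift;
            shift += 7;
            if ((byte & 0x80) == 0) {
                done = 1;
                break;
            }
        }
        if (!done || count == cap)
            return -1;
        out[count++] = value;
    }
    return (long)count;
}
```

The six-byte payload `03 8E 02 9E A7 05` expands to the three values 3, 270, and 86942, which is the example used in the Protocol Buffers encoding guide.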

## Stack [#stack]

| Layer       | Technology                        |
| ----------- | --------------------------------- |
| Language    | C (`-std=gnu11`)                  |
| Build       | GNU Make, `gcc`, `-MMD` auto-deps |
| Compression | `zlib` (`-lz`)                    |
| Tests       | Criterion (`-lcriterion`)         |
| Platform    | Linux, macOS                      |

Built for CSE 320 (Systems Fundamentals) at Stony Brook, Jan–Feb 2025.
The course fixed the public API in `include/protobuf.h` and
`include/osm.h`; the `src/` implementation is original.

## Links [#links]

* **Source:** [https://github.com/FardinIqbal/openstreetmap-pbf-parser](https://github.com/FardinIqbal/openstreetmap-pbf-parser)
