OSM+: Billion-Level Open Street Map Dataset for City-wide Experiments
Abstract
Road network data provides rich information about cities, but processing worldwide OpenStreetMap (OSM) data at scale is computationally intensive, and the resulting graphs are often difficult to unify for benchmarking downstream tasks. Existing graph learning benchmarks fail to capture the billion-scale size and unique topological properties of real-world road networks, leaving a gap in our understanding of model scalability. To study and close this gap, we process OpenStreetMap data with distributed cloud computing on 5,000 cores and release OSM+, a structured, worldwide road network graph dataset with 1 billion vertices, designed for high accessibility and usability. OSM+ is open source and downloadable worldwide, and it provides an open-box graph structure together with an easy-to-use spatial query interface. We demonstrate the utility of OSM+ through three illustrative use cases: city boundary detection, traffic prediction, and traffic policy control. For traffic prediction, we construct a new 31-city benchmark by processing traffic data and combining it with OSM+. This benchmark offers broader spatial coverage and more comprehensive evaluation than commonly used prior datasets, and it scales from hundreds of road network intersections to thousands. For traffic policy control, we release a new six-city dataset at a much larger scale, introducing challenges for multi-agent coordination at the scale of thousands of agents. In addition, we provide comprehensive data processing tools that support integrating multimodal spatial-temporal data with OSM+ for geospatial foundation model training, thereby expediting the discovery of compelling scientific insights.