Geographica

Recent advances in query languages and implemented systems for linked geospatial data have not so far been matched by much work on the evaluation and benchmarking of geospatial RDF stores. Although there are various benchmarks for spatially enabled RDBMS (e.g., SEQUOIA 2000, VESPA, Jackpine, DynaMark), there is only one paper in the literature, by Dave Kolas, that proposes a benchmark for geospatial data expressed in RDF. However, since this work preceded the proposal of GeoSPARQL and stSPARQL, it does not cover many of the features available in these languages. For example, only point and rectangle geometries are used in the data, and only two topological functions and two non-topological functions are considered, while metric spatial functions and spatial aggregates are not discussed. Similarly, that paper evaluates only the geospatial RDF store SPAUK, a precursor to Parliament, and it uses a synthetic workload only, without considering any linked geospatial datasets such as the ones available in the LOD cloud today.

We have developed a benchmark that goes significantly beyond this work and can be used for the evaluation of the new generation of RDF stores supporting the query languages GeoSPARQL and stSPARQL. Our benchmark, nicknamed Geographica, is composed of two workloads with their associated datasets and queries: a real-world workload and a synthetic workload. The real-world workload uses publicly available linked geospatial data, covering a wide range of geometry types (e.g., points, lines, polygons). To define this workload, we follow the approach of the benchmark Jackpine and define a micro benchmark and a macro benchmark. The micro benchmark tests primitive spatial functions: we check the spatial component of a system with queries that use non-topological functions, spatial selections, spatial joins and spatial aggregate functions. In the macro benchmark we test the performance of the selected RDF stores in typical application scenarios such as reverse geocoding, map search and browsing, and a real-world use case from the Earth Observation domain. In the second workload of Geographica we use a generator that produces synthetic datasets of various sizes and generates queries of varying thematic and spatial selectivity. In this way, we can evaluate geospatial RDF stores in a controlled environment. For reasons of reproducibility, both workloads are publicly available.
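To make the idea of a spatial selection with controlled selectivity more concrete, here is a minimal sketch in Python. It is not the Geographica generator itself. The function names, the extent and the assumption that features are spread uniformly over that extent are all hypothetical and serve only for illustration. The sketch builds a query rectangle that covers a chosen fraction of a bounding box and embeds it in a GeoSPARQL query using the standard `geo:` and `geof:` vocabularies.

```python
import math

# Standard GeoSPARQL namespaces.
GEO = "http://www.opengis.net/ont/geosparql#"
GEOF = "http://www.opengis.net/def/function/geosparql/"


def selection_rectangle(extent, selectivity):
    """Return a WKT rectangle covering `selectivity` of the extent's area.

    `extent` is (min_x, min_y, max_x, max_y). Assumption: if features are
    uniformly distributed over the extent, a rectangle covering a fraction
    s of its area selects roughly a fraction s of the features.
    """
    if not 0 < selectivity <= 1:
        raise ValueError("selectivity must be in (0, 1]")
    min_x, min_y, max_x, max_y = extent
    scale = math.sqrt(selectivity)  # shrink both sides by the same factor
    x2 = min_x + (max_x - min_x) * scale
    y2 = min_y + (max_y - min_y) * scale
    return (
        f"POLYGON(({min_x} {min_y}, {x2} {min_y}, "
        f"{x2} {y2}, {min_x} {y2}, {min_x} {min_y}))"
    )


def spatial_selection_query(wkt):
    """Build a GeoSPARQL query selecting features within the given WKT."""
    return f"""PREFIX geo: <{GEO}>
PREFIX geof: <{GEOF}>
SELECT ?feature WHERE {{
  ?feature geo:hasGeometry ?g .
  ?g geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt, "{wkt}"^^geo:wktLiteral))
}}"""


# Hypothetical 100 x 100 extent, targeting about 25% of the features.
rect = selection_rectangle((0.0, 0.0, 100.0, 100.0), 0.25)
query = spatial_selection_query(rect)
```

Varying the `selectivity` argument yields a family of queries whose result sizes grow in a predictable way, which is the kind of control a synthetic workload is meant to provide. A spatial join would follow the same pattern, with the constant geometry replaced by a second geometry variable.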

We chose to test the systems Strabon, Parliament and uSeekM. To the best of our knowledge, these are the only systems that currently support a rich subset of GeoSPARQL and stSPARQL. Other RDF stores, such as OpenLink Virtuoso, OWLIM and AllegroGraph, allow only the representation of point geometries and support only a few geospatial functions. The limited functionality of these systems did not allow us to include them in the main part of the benchmark.

See the paper "Geographica: A Benchmark for Geospatial RDF Stores" for more details.