LocationSpark: A distributed in-memory data management system for big spatial data
Abstract
We present LocationSpark, a spatial data processing system built on top of Apache Spark, a widely used distributed data processing system. LocationSpark offers a rich set of spatial query operators, e.g., range search, kNN, spatio-textual operations, spatial join, and kNN join. To achieve high performance, LocationSpark employs various spatial indexes for in-memory data, and guarantees that immutable spatial indexes have low overhead with fault tolerance. In addition, we build two new layers over Spark, namely a query scheduler and a query executor. The query scheduler is responsible for mitigating skew in spatial queries, while the query executor selects the best plan based on the indexes and the nature of the spatial queries. Furthermore, to avoid unnecessary network communication overhead when processing overlapped spatial data, we embed an efficient spatial Bloom filter into LocationSpark's indexes. Finally, LocationSpark tracks frequently accessed spatial data, and dynamically flushes less frequently accessed data to disk. We evaluate our system on real workloads and demonstrate that it achieves an order of magnitude performance gain over a baseline framework.
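The abstract does not describe how the spatial Bloom filter is built, so the following is only a minimal sketch of the general idea, under stated assumptions: each partition summarizes the grid cells that contain its points in an ordinary Bloom filter, and a range query is forwarded to a partition only if the filter reports a possible overlap. The class `GridBloomFilter`, its parameters, and the uniform grid are hypothetical illustrations and are not LocationSpark's actual filter or API.

```python
import hashlib


class GridBloomFilter:
    """Bloom filter over grid-cell ids (illustrative; not LocationSpark's filter)."""

    def __init__(self, num_bits=1024, num_hashes=3, cell_size=1.0):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.cell_size = cell_size
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _cell(self, x, y):
        # Map a coordinate to the integer id of the grid cell containing it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def _positions(self, cell):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            key = f"{seed}:{cell[0]}:{cell[1]}".encode()
            digest = hashlib.sha256(key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_point(self, x, y):
        for pos in self._positions(self._cell(x, y)):
            self.bits[pos] = 1

    def _may_contain_cell(self, cell):
        return all(self.bits[pos] for pos in self._positions(cell))

    def may_intersect(self, xmin, ymin, xmax, ymax):
        # True if any grid cell touched by the rectangle may hold data.
        # False positives are possible; false negatives are not.
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        return any(
            self._may_contain_cell((cx, cy))
            for cx in range(cx0, cx1 + 1)
            for cy in range(cy0, cy1 + 1)
        )


# Usage: summarize one partition's points, then test a query rectangle.
partition_filter = GridBloomFilter()
for x, y in [(0.5, 0.5), (2.3, 4.1), (7.9, 7.2)]:
    partition_filter.add_point(x, y)

hit = partition_filter.may_intersect(2.0, 4.0, 2.9, 4.9)
empty_hit = GridBloomFilter().may_intersect(0.0, 0.0, 5.0, 5.0)
```

In this sketch a query whose rectangle gets a negative answer can skip the partition entirely, which is the kind of saving in network communication the abstract refers to. A positive answer still requires checking the partition's index, since a Bloom filter can report false positives.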
Collections
- Computer Science & Engineering [2402 items]
Related items
Showing items related by title, author, creator and subject.
- TOWARDS AN UNDERSTANDING OF SPATIALITY OF INDETERMINATE SPACES: DOHA MIGRANT LABOURERS AS SPATIAL ACTOR
Khalfani, Fatma Abdullah (2015, Master Thesis) This study investigated publicly accessible spaces where the city’s normal forces of control have not shaped their perception, usage and occupancy. The so-called indeterminate spaces were examined in traditional Doha ...
- A real-time early warning seismic event detection algorithm using smart geo-spatial bi-axial inclinometer nodes for Industry 4.0 applications
Tariq H.; Touati F.; Al-Hitmi M.A.E.; Crescini D.; Mnaouer A.B. (MDPI AG, 2019, Article) Earthquakes are one of the major natural calamities as well as a prime subject of interest for seismologists, state agencies, and ground motion instrumentation scientists. The real-time data analysis of multi-sensor ...
- Spatial Impact Network Exposure Model (SINEM) with Integration of Traffic Characteristics and Air Pollution Concentration
Ghosh, Sumanta; Rp, Rohit (Qatar University Press, 2020, Conference Paper) The State of Qatar has undergone rapid economic growth and urbanization during the last few decades. This has resulted in an increase in the number of motor vehicles in the country. Therefore, it is vital to monitor and ...