SOLD: A node-Splitting algorithm for R-tree based on Objects' Locations Distribution
Spatial data indexing methods are of great importance because spatial data accumulates rapidly as a result of the explosive growth in capturing data with spatial features. Regardless of its size, the data eventually resides on disk pages. Disk pages have to be properly indexed to preserve the spatial properties of objects, optimise disk space usage and improve object retrieval performance. One of the most popular spatial data indexes is the R-tree, a height-balanced tree data structure in which leaf nodes correspond to disk pages and contain pointers to objects’ locations. A single tree node can host up to a maximum number of objects; any further insertion makes it an overflown node, which has to be split. Better splits lead to better index performance and better utilisation of disk space. In this work, we introduce a new way of finding the most suitable split for an overflown node in the R-tree index. The proposed method scans the overflown node’s objects once, at linear cost, to identify the distribution of the objects’ locations (minimum bounding rectangles, MBRs) relative to the node’s bounding rectangle (the node’s MBR). It uses the objects’ locations to calculate, for each main axis, the split quality factors: the expected overlap between the resulting nodes, the evenness of the object distribution among the resulting nodes, and the perimeter of the resulting nodes. The axis with the better combined quality factor values is selected as the split axis. The Splitting based on Objects’ Locations Distribution (SOLD) algorithm was implemented and tested against two other splitting algorithms. In experiments using synthetic and real data files, it outperformed both algorithms in index creation tests and data retrieval tests.
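The idea of scoring each axis by overlap, evenness and perimeter can be illustrated with a small Python sketch. This is not the SOLD algorithm itself, since the abstract does not give its formulas. The sketch assumes that objects are assigned to the low or high side of the node's midpoint by their MBR centre, that each factor is normalised against the node's MBR, and that the three factors are combined with equal weights; the function names (`split_on_axis`, `axis_cost`, `choose_split_axis`) are hypothetical.

```python
from typing import List, Tuple

# A rectangle is (xmin, ymin, xmax, ymax); axis 0 is x, axis 1 is y.
Rect = Tuple[float, float, float, float]


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def perimeter(r: Rect) -> float:
    return 2.0 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap_area(a: Rect, b: Rect) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def split_on_axis(rects: List[Rect], axis: int) -> Tuple[List[Rect], List[Rect]]:
    """One linear pass: place each object by its centre relative to the
    midpoint of the node's MBR along `axis` (an assumed assignment rule)."""
    node = mbr(rects)
    mid = (node[axis] + node[axis + 2]) / 2.0
    low: List[Rect] = []
    high: List[Rect] = []
    for r in rects:
        centre = (r[axis] + r[axis + 2]) / 2.0
        (low if centre < mid else high).append(r)
    return low, high


def axis_cost(rects: List[Rect], axis: int) -> float:
    """Combined cost for one axis; lower is better. Equal weights are assumed."""
    low, high = split_on_axis(rects, axis)
    if not low or not high:
        return float("inf")  # this axis cannot separate the objects
    node, a, b = mbr(rects), mbr(low), mbr(high)
    node_area = (node[2] - node[0]) * (node[3] - node[1])
    overlap = overlap_area(a, b) / node_area if node_area > 0 else 0.0
    imbalance = abs(len(low) - len(high)) / len(rects)
    perim = (perimeter(a) + perimeter(b)) / perimeter(node)
    return overlap + imbalance + perim


def choose_split_axis(rects: List[Rect]) -> int:
    """Return 0 (x) or 1 (y), whichever has the lower combined cost."""
    return min((0, 1), key=lambda axis: axis_cost(rects, axis))
```

For objects spread along the x axis, such as `[(0, 0, 1, 1), (1, 0, 2, 1), (8, 0, 9, 1), (9, 0, 10, 1)]`, this sketch selects axis 0, because splitting on y cannot separate the objects at all. A real implementation would also have to enforce the R-tree's minimum node fill, which is left out here.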