Please use this identifier to cite or link to this item:

Efficient co-location pattern discovery

Authors Xiao, Xiangye
Issue Date 2009
Summary Co-location pattern discovery is to find classes of objects whose associated spatial locations are frequently in proximity. For example, map search queries, which contain keywords in text as well as target locations on the map, can be mined for co-located query patterns, i.e., sets of keyword queries that often search for target locations near one another. Such co-located query patterns can be used in location sensitive query suggestion, Point of Interest (POI) recommendation, and local advertising. This thesis investigates ways to improve the efficiency of co-location mining for large data sets, e.g., million-entry map search query logs. In particular, we improve the efficiency of the generate-and-test method, which is commonly used in co-location mining. This method iteratively generates instances of candidate patterns, tests and prunes the false candidates. Through experiments, we find that the major problem in generate-and-test is the excessive number of instances generated for false candidates. This problem causes both the instance generation and the pruning steps to take an unnecessarily long time. Therefore, we propose two orthogonal approaches, DenseMiner and BitmapMiner, to address the problem on instance generation and on pruning respectively. DenseMiner partitions the geographic area of objects and processes the partitions that contain a high density of objects first. This approach works well because the spatial distributions of real-world objects are usually non-uniform. As a result, processing the dense areas first often identifies false candidates early on and avoids generating the instances of these false candidates in the other areas. BitmapMiner also partitions objects by their geographical locations. It further utilizes a bitmap structure to represent the neighboring relationship between classes of objects. The cell size for the bitmap structure is set based on the overall density of the objects. The candidate pruning in each cell can be done on a per-object basis, on a per-cell basis, or in an adaptive, multi-resolution way to match the spatial distribution of the objects in the cell. Consequently, we test and prune false candidates efficiently based on the bitmaps. We have conducted experiments on both synthetic data sets and real-world data sets. The experimental results show that our proposed approaches improve the execution time by up to an order of magnitude over prior work.
Note Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2009
Language English
Format Thesis
Files in this item:
File Description Size Format
th_redirect.html 343 B HTML