Please use this identifier to cite or link to this item:
http://hdl.handle.net/1783.1/4232
| Title: | Multi-schema entity resolution |
| Authors: | Huang, Qiong |
| Issue Date: | 2008 |
| Abstract: | Entity resolution (ER) is the problem of identifying and merging records judged to represent the same real-world entity. Most previous ER approaches assume a unified (or global) schema under which all records are compared and merged on a field-by-field basis. We consider the multi-schema ER problem, in which records come from multiple sources with different schemas. A prime example of multi-schema ER is information integration over the deep web, where the goal is to integrate data from heterogeneous sources.
In this thesis, we formalize the multi-schema ER problem, investigate properties that hold in a unified-schema setting but not in a multi-schema setting, and identify the resolution conflicts that can arise when previous ER approaches are applied in a multi-schema setting. We then propose the validity-ensured and order-sensitive (VEOS) algorithm, which is free from such conflicts and can also take advantage of order scheduling to improve accuracy.
We identify schema-level and data-level criteria for distinguishing the more reliable comparisons, so that performing those comparisons first yields a more accurate result. To leverage this information, we construct a confidence graph on which our scheduling algorithm is built. Our experiments, using real online shopping data, show that (1) our scheduling algorithm is very effective in improving accuracy, and (2) VEOS with scheduling outperforms other methods in both accuracy and efficiency. |
| Description: | Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2008. ix, 56 leaves : ill. ; 30 cm. HKUST Call Number: Thesis CSED 2008 Huang |
| URI: | http://hdl.handle.net/1783.1/4232 |
| Appears in Collections: | CSE Master Theses |
All items in this Repository are protected by copyright, with all rights reserved.