Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/2780
Title: On the use of the appropriateness and cohesiveness Web data quality dimensions for finding high quality Web pages
Authors: Pun, Joshua Chun-chung
Issue Date: 2006
Abstract: While users can readily find information in the immense store of knowledge on the Web with the help of a search engine, they often complain about the quality of the results returned. The quality of a web page is not limited to its relevance, as judged by most search engines, but can also depend on other features of the page. This thesis first examines what factors constitute a high quality web page and, on that basis, proposes a framework of web data quality dimensions. From this framework, it is found that current search engines consider only a very small number of web data quality dimensions. We then propose a general methodology for evaluating web data quality metrics derived from the web data quality dimensions. The methodology has been applied to two web data quality dimensions (that is, appropriateness and cohesiveness) as well as to the combination of these two dimensions. Metrics were developed to measure each dimension from a web page, and they have been verified against users’ expectations. The web data quality dimension appropriateness measures how well the results returned from search engines satisfy the web genre needs of a user. It is based on the linguistic and visual complexity of a web page. The web data quality dimension cohesiveness is a measure of how closely the concepts within a web page are related to each other. A distance metric is defined to measure how close two concepts are in an ontology, and the cohesiveness of a web page is calculated from the total of the distances between all the concepts within it. In addition, a technique is proposed to combine different quality metrics and to incorporate a user’s preference for each web data quality dimension. With these metrics, users can more easily find high quality web pages (i.e., not just relevant web pages, but also web pages matching other desired web data quality dimensions).
A user evaluation has been conducted to show the conformance of the metrics to the corresponding web data quality dimensions (i.e., appropriateness, cohesiveness and combined dimensions). It also revealed that the judgments on these dimensions from users are fairly consistent.
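Illustrative note (not part of the thesis): the abstract does not give the actual distance metric, so the following is only a minimal sketch of the general idea, assuming that the distance between two concepts is the number of edges on the shortest path between them in the ontology, and that the total is taken over all unordered pairs of distinct concepts found in a page. The toy ontology and the names ONTOLOGY_EDGES, build_graph, concept_distance and total_concept_distance are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy ontology: each pair is an undirected edge (child, parent).
ONTOLOGY_EDGES = [
    ("animal", "entity"),
    ("plant", "entity"),
    ("dog", "animal"),
    ("cat", "animal"),
    ("tree", "plant"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def concept_distance(graph, source, target):
    """Assumed metric: edge count of the shortest path, found by BFS."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    raise ValueError(f"no path between {source!r} and {target!r}")


def total_concept_distance(graph, concepts):
    """Sum the distances over all unordered pairs of distinct concepts."""
    unique = sorted(set(concepts))
    return sum(concept_distance(graph, a, b) for a, b in combinations(unique, 2))


GRAPH = build_graph(ONTOLOGY_EDGES)
page_a = ["dog", "cat"]          # closely related concepts
page_b = ["dog", "cat", "tree"]  # adds a more distant concept
print(total_concept_distance(GRAPH, page_a))
print(total_concept_distance(GRAPH, page_b))
```

Under these assumptions a smaller total suggests a more cohesive page. How the thesis normalises the total, for example by the number of concepts, is not stated in the abstract and is left open here.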
Description: Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2006
xx, 202 leaves : ill. ; 30 cm
HKUST Call Number: Thesis COMP 2006 Pun
URI: http://hdl.handle.net/1783.1/2780
Appears in Collections: CSE Doctoral Theses


All items in this Repository are protected by copyright, with all rights reserved.