HKUST Library Institutional Repository

Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/6000
Title: Query by document
Authors: Yang, Yin; Bansal, Nilesh; Dakka, Wisam; Ipeirotis, Panagiotis; Koudas, Nick; Papadias, Dimitris
Keywords: Wikipedia; Cross referencing; Blog; Similarity matching
Issue Date: 2009
Citation: Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM), Barcelona, Spain, February 9-12, 2009, p. 34-43
Abstract: We are experiencing an unprecedented increase in content contributed by users in forums such as blogs, social networking sites, and microblogging services. Such abundance of content complements content on web sites and traditional media forums such as newspapers, news and financial streams, and so on. Given such a plethora of information, there is a pressing need to cross-reference information across textual services. For example, we commonly read a news item and wonder whether any blogs report related content, or vice versa. In this paper, we present techniques to automate the process of cross-referencing online information content. We introduce methodologies to extract phrases from a given “query document” to be used as queries to search interfaces, with the goal of retrieving content related to the query document. In particular, we consider two techniques to extract and score key phrases. We also consider techniques to complement extracted phrases with information present in external sources such as Wikipedia, and introduce an algorithm called RelevanceRank for this purpose. We discuss both of these techniques in detail and provide an experimental study utilizing a large number of human judges from Amazon’s Mechanical Turk service. Detailed experiments demonstrate the effectiveness and efficiency of the proposed techniques for the task of automating retrieval of documents related to a query document.
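The abstract describes extracting and scoring key phrases from a query document so that they can be submitted as queries to a search interface. The following is a minimal sketch of that general idea only, assuming plain TF-IDF scoring of word n-grams against a small background collection. It is not either of the paper's two scoring techniques and does not attempt RelevanceRank or Wikipedia expansion; the function names, stopword handling, and smoothed IDF are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and keep alphabetic word tokens only.
    return re.findall(r"[a-z]+", text.lower())


def candidate_phrases(tokens, max_n=2, stopwords=frozenset()):
    # Hypothetical candidate generator: every 1..max_n-gram that
    # neither starts nor ends with a stopword.
    phrases = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in stopwords or gram[-1] in stopwords:
                continue
            phrases.append(" ".join(gram))
    return phrases


def extract_query_phrases(query_doc, background_docs, k=3, max_n=2,
                          stopwords=frozenset()):
    """Return the k phrases of query_doc with the highest TF-IDF score."""
    # Term frequency of each candidate phrase inside the query document.
    tf = Counter(candidate_phrases(tokenize(query_doc), max_n, stopwords))
    # Phrase sets of the background documents, used for document frequency.
    bg_sets = [set(candidate_phrases(tokenize(d), max_n, stopwords))
               for d in background_docs]
    n_docs = len(background_docs)

    def score(phrase):
        df = sum(1 for s in bg_sets if phrase in s)
        # Smoothed IDF so phrases unseen in the background stay finite.
        return tf[phrase] * math.log((n_docs + 1) / (df + 1))

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(tf, key=lambda p: (-score(p), p))
    return ranked[:k]


if __name__ == "__main__":
    stop = frozenset({"the", "a", "of", "in", "on", "and", "to", "is"})
    news = ("The central bank raised interest rates. "
            "Interest rates affect the housing market.")
    background = ["The weather is sunny today.",
                  "The market opened higher.",
                  "A new phone was released."]
    print(extract_query_phrases(news, background, k=3, stopwords=stop))
    # → ['interest', 'interest rates', 'rates']
```

Each returned phrase would then be issued as a separate query against a blog or news search interface. The overlap between "interest" and "interest rates" in the output shows why a real system would likely need a redundancy filter or a richer scoring scheme than this stand-in.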
Rights: © ACM, 2009. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM), Barcelona, Spain, February 9-12, 2009, p. 34-43.
URI: http://hdl.handle.net/1783.1/6000
Appears in Collections: CSE Conference Papers

Files in This Item:

File            Description            Size     Format
WSDM09-QBD.pdf  pre-published version  598 Kb   Adobe PDF

All items in this Repository are protected by copyright, with all rights reserved.