Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/7562

Transfer learning with open web data

Authors Xiang, Wei
Issue Date 2012
Summary In recent years, transfer learning has been applied to a variety of real-world application domains, ranging from text classification, image classification, link prediction, and activity recognition to social network analysis. Transfer learning is particularly useful when only limited labeled data is available in a target domain, so that we must consult one or more auxiliary, or source, domains to gain insight into how to solve the target problem. The key to successful knowledge transfer is therefore that the “right” source data be supplied by the problem designer at learning time. However, it is very difficult to identify a proper set of source data. A natural question is whether the needed source data can be sought directly from the open Web. In this thesis, we study how to extend existing transfer learning techniques to cope with transfer learning from massive and noisy Web data. The main contribution of this thesis is that we use two popular applications as prototypes and investigate the applicability and difficulties of Web-based transfer learning. We focus on tackling the following four research issues: (1) Transfer over information gap; (2) Transfer from heterogeneous data; (3) Transfer with partially labeled correspondence; (4) Selective transfer from massive and noisy sources. For each of these issues, we first conduct an extensive study of the difficulty of the problem, and then propose a series of effective solutions accordingly. Moreover, to cope with the need to manipulate massive Web data as the source, we also investigate how to make our transfer learning models scalable with the assistance of distributed computing techniques. We apply these methods to two diverse applications, text classification and link prediction, and achieve promising results.
Experimental results show that our methods can successfully benefit from the truly useful information contained in the Web, while minimizing the risks caused by the massive and noisy nature of the open Web.
Note Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2012
Language English
Format Thesis