Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/5830

Kernel-based multiple-instance learning

Authors Cheung, Pak Ming
Issue Date 2006
Summary In recent years, the Multiple-Instance Learning (MIL) problem has become increasingly popular in the machine learning community. Each training object (bag) in the MIL problem is a set of patterns (instances). Label information is associated only with the bags, not with their constituent instances. Moreover, a positive bag must contain at least one positive instance, but may contain many negative instances. Since only the bag labels are accessible and a positive bag may contain many negative instances, MIL is more challenging than traditional supervised learning (or single-instance learning). On the other hand, MIL is fruitful to study, since many real-world problems, such as drug activity prediction, are inherently MI problems that do not generalize well under the traditional single-instance learning model. In addition, the generalization performance on many single-instance learning problems, e.g., Content-based Image Retrieval (CBIR), has been found to improve when they are cast into an appropriate MIL representation. In this thesis, I study MIL algorithms based on kernel methods. In particular, I focus on support vector machines (SVMs), which have been highly successful in many machine learning problems. This thesis first discusses how to reformulate the SVM to adapt to the MI problem setting by utilizing both the bag and instance information at the same time. After that, I propose a way to define an MI kernel over bags based on the marginalized kernel. The resulting bag kernel can then be used in a standard SVM. I also extend this marginalized kernel to the real-valued regression setting, which is increasingly popular in the MIL community. Empirical results show that the proposed methods outperform various traditional methods.
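As a rough illustration of the setting described in the summary, the sketch below encodes the standard MI labeling assumption (a bag is positive if at least one of its instances is positive) and a simple kernel defined over bags. Note that the bag kernel shown is a normalized set kernel, a simpler stand-in and not the marginalized kernel proposed in the thesis. All function names, the RBF instance kernel, the `gamma` parameter, and the toy bags are assumptions made for this example.

```python
import math


def bag_label(instance_labels):
    """Standard MI assumption: a bag is positive (1) iff any instance is positive."""
    return int(any(instance_labels))


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two instances of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def set_kernel(bag_a, bag_b, gamma=1.0):
    """Unnormalized set kernel: sum of instance kernels over all cross-bag pairs."""
    return sum(rbf(x, y, gamma) for x in bag_a for y in bag_b)


def normalized_set_kernel(bag_a, bag_b, gamma=1.0):
    """Set kernel scaled so that every bag has unit self-similarity."""
    k_ab = set_kernel(bag_a, bag_b, gamma)
    k_aa = set_kernel(bag_a, bag_a, gamma)
    k_bb = set_kernel(bag_b, bag_b, gamma)
    return k_ab / math.sqrt(k_aa * k_bb)


# Hypothetical toy bags: each bag is a list of 2-D instances.
bags = [
    [(0.0, 0.0), (0.1, 0.2), (2.0, 2.1)],
    [(0.0, 0.1), (0.2, 0.0)],
    [(2.1, 2.0), (1.9, 2.2), (0.1, 0.1)],
]

# Gram matrix over bags; each entry compares two whole bags, not two instances.
gram = [[normalized_set_kernel(a, b) for b in bags] for a in bags]
```

A Gram matrix of this kind could, in principle, be passed to any SVM implementation that accepts a precomputed kernel, which is the sense in which a bag kernel "can then be used in a standard SVM".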
Note Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2006
Language English
Format Thesis