Title: Word sense alignment using bilingual corpora
Authors: Carpuat, Marine Jacinthe
Issue Date: 2002
Abstract: The growing importance of multilingual information retrieval and machine translation has made multilingual ontologies an extremely valuable resource. Since the construction of an ontology from scratch is a very expensive and time-consuming undertaking, it is attractive to consider ways of automatically aligning monolingual ontologies, which already exist for many of the world’s major languages. Previous research exploited either similarity in the structure of the ontologies to be aligned or manually created bilingual resources. These approaches cannot be used to align ontologies with vastly different structures, and can only be applied to much-studied language pairs for which expensive resources are already available. In this thesis, we propose a novel approach to align the ontologies at the node level: given a concept represented by a particular word sense in one ontology, our task is to find the best corresponding word sense in the second language ontology. To this end, we present a language-independent, corpus-based method that borrows from techniques used in information retrieval and machine translation. We show its efficiency by applying it to two very different ontologies in very different languages: the Mandarin Chinese HowNet and the American English WordNet. Moreover, we propose a methodology to measure bilingual corpora comparability and show that our method is robust enough to use noisy non-parallel bilingual corpora efficiently.
Description: Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2002
vii, 44 leaves : ill. ; 30 cm
HKUST Call Number: Thesis ELEC 2002 Carpua
Appears in Collections: ECE Master Theses

All items in this Repository are protected by copyright, with all rights reserved.