Title: Learning bilingual semantic frames
Authors: Wu, Zhaojun
Issue Date: 2008
Abstract: In this thesis, we present our study of automatically learning bilingual semantic frames from a Chinese-English parallel corpus. Bilingual semantic frames, the mappings of core semantic arguments (roles) for a predicate pair in a bi-sentence, have the potential to improve the translation quality of Statistical Machine Translation (SMT) systems. As a prerequisite, we first report our research on the subtask of Chinese Semantic Role Labeling (SRL). We present our implementation of two new state-of-the-art Chinese shallow semantic parsers, based on Support Vector Machine (SVM) and Maximum Entropy classification techniques. We also present a full-scale comparison of features and of classifier performance, and propose several important new features for this subtask. We then propose to learn bilingual semantic frames from a parallel corpus of translated sentence pairs. We first present our observations on a reference set that is manually extracted from the parallel corpus. We find that a considerable 15.73% of semantic argument mappings are not direct mappings but mismatches; that is, core semantic argument i in Chinese is not aligned to argument i in English. We then present a conventional model, SYN_ALIGN, that acquires bilingual semantic frames from the results of semantic role projection based on syntactic constituent alignment. The evaluation shows that, unfortunately, SYN_ALIGN achieves only a very modest performance (44.80% F-measure) because of its brittle assumption that every semantic argument in one language maps directly to its syntactic counterpart in the other language. We therefore propose a novel model, ARG_ALIGN, which learns bilingual semantic frames using a phrasal similarity measure over the semantic roles produced automatically by two monolingual semantic parsers. ARG_ALIGN surpasses SYN_ALIGN by about 25 points in F-measure and has an F-measure upper bound of 86%.
Our experimental results suggest that, for integrating bilingual semantic frames into an SMT system, ARG_ALIGN is a much better approach for acquiring such frames.
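The abstract describes ARG_ALIGN only at a high level. As a rough illustration of the general idea of aligning semantic arguments by phrasal similarity (not the thesis's actual model or similarity measure), the Python sketch below assumes a hypothetical toy lexical translation table LEX and a simple score: for each Chinese word in an argument, take its best translation probability to any word of the English argument, then average. Each Chinese argument is mapped to the most similar English argument above an assumed threshold, and the resulting mapping is scored with F-measure against a reference mapping. The names LEX, phrase_similarity, align_arguments and f_measure, the threshold, and the PropBank-style role labels are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexical translation probabilities p(english | chinese).
# In practice such scores might come from a word-alignment model trained on
# the parallel corpus; these values are invented for illustration only.
LEX: Dict[Tuple[str, str], float] = {
    ("我", "me"): 0.9,
    ("这", "this"): 0.8,
    ("书", "book"): 0.9,
}


def phrase_similarity(zh: List[str], en: List[str],
                      lex: Dict[Tuple[str, str], float]) -> float:
    """Average, over Chinese words, of the best translation score to any English word."""
    if not zh or not en:
        return 0.0
    total = 0.0
    for c in zh:
        total += max(lex.get((c, e), 0.0) for e in en)
    return total / len(zh)


def align_arguments(zh_args: Dict[str, List[str]],
                    en_args: Dict[str, List[str]],
                    lex: Dict[Tuple[str, str], float],
                    threshold: float = 0.1) -> Dict[str, str]:
    """Map each Chinese role to the English role whose phrase is most similar.

    Roles whose best similarity does not exceed the (assumed) threshold stay unmapped.
    """
    mapping: Dict[str, str] = {}
    for zh_role, zh_phrase in zh_args.items():
        best_role: Optional[str] = None
        best_score = threshold
        for en_role, en_phrase in en_args.items():
            score = phrase_similarity(zh_phrase, en_phrase, lex)
            if score > best_score:
                best_role, best_score = en_role, score
        if best_role is not None:
            mapping[zh_role] = best_role
    return mapping


def f_measure(predicted: Dict[str, str], reference: Dict[str, str]) -> float:
    """Balanced F-measure of predicted role mappings against a reference set."""
    pred, ref = set(predicted.items()), set(reference.items())
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy pair: 我 喜欢 这 本 书 / "This book pleases me" (labels assumed, PropBank style).
    zh = {"ARG0": ["我"], "ARG1": ["这", "本", "书"]}
    en = {"ARG0": ["this", "book"], "ARG1": ["me"]}
    mapping = align_arguments(zh, en, LEX)
    print(mapping)  # {'ARG0': 'ARG1', 'ARG1': 'ARG0'}
    print(f_measure(mapping, {"ARG0": "ARG1", "ARG1": "ARG0"}))  # 1.0
```

In this toy pair the experiencer is labeled ARG0 on the Chinese side but ARG1 on the English side, so the similarity-based alignment recovers a crossed mapping of the kind the abstract calls a mismatch, which a label-identity assumption (argument i to argument i) would miss.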
Description: Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2008
x, 75 leaves : ill. ; 30 cm
HKUST Call Number: Thesis CSED 2008 WuZ
Appears in Collections: CSE Master Theses

All items in this Repository are protected by copyright, with all rights reserved.