Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/4231

Learning bilingual semantic frames

Authors Wu, Zhaojun
Issue Date 2008
Summary This thesis studies the task of automatically learning bilingual semantic frames from a Chinese-English parallel corpus. Bilingual semantic frames, the mappings of core semantic arguments (roles) for a predicate pair in a bi-sentence, have the potential to improve the translation quality of Statistical Machine Translation (SMT) systems. As a prerequisite, we first report our research on the subtask of Chinese Semantic Role Labeling (SRL). We implement two new state-of-the-art Chinese shallow semantic parsers, one based on Support Vector Machine (SVM) classification and one on Maximum Entropy classification. We also present a full-scale comparison of features and of classifier performance, and propose several new features that are important for this subtask. We then turn to learning bilingual semantic frames from a parallel corpus of translated sentence pairs. We first report our observations on a reference set manually extracted from the parallel corpus: a considerable 15.73% of semantic argument mappings are not direct mappings but mismatches, meaning that core semantic argument i in Chinese is not aligned to argument i in English. We then present a conventional model, SYN_ALIGN, that acquires bilingual semantic frames from the results of semantic role projection based on syntactic constituent alignment. The evaluation shows that SYN_ALIGN achieves only modest performance (44.80% F-measure), owing to its brittle assumption that every semantic argument in one language maps directly to its syntactic counterpart in the other language. We therefore propose a novel model, ARG_ALIGN, that learns bilingual semantic frames using a phrasal similarity measure over semantic roles automatically produced by two monolingual semantic parsers. ARG_ALIGN surpasses SYN_ALIGN by about 25 points in F-measure and has an F-measure upper bound of 86%. Our experimental results suggest that, for integrating bilingual semantic frames into an SMT system, ARG_ALIGN is a much better way to acquire such frames.
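The two quantities quoted in the summary, the share of mismatched argument mappings and the F-measure of a set of predicted mappings, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tuple representation of a mapping, the function names mismatch_rate and f_measure, and the toy data are hypothetical and are not taken from the thesis, whose exact scoring procedure is not described in this record.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (predicate_pair_id, chinese_role, english_role).
# For example, (1, "ARG0", "ARG1") says that ARG0 of the Chinese predicate in
# pair 1 corresponds to ARG1 of the English predicate.
Mapping = Tuple[int, str, str]


def mismatch_rate(mappings: Iterable[Mapping]) -> float:
    """Fraction of mappings whose Chinese and English role labels differ."""
    items = list(mappings)
    if not items:
        return 0.0
    mismatched = sum(1 for _, zh_role, en_role in items if zh_role != en_role)
    return mismatched / len(items)


def f_measure(predicted: Set[Mapping], reference: Set[Mapping]) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    correct = len(predicted & reference)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy reference set: pair 0 maps roles directly, pair 1 swaps ARG0 and ARG1.
reference = {
    (0, "ARG0", "ARG0"),
    (0, "ARG1", "ARG1"),
    (1, "ARG0", "ARG1"),
    (1, "ARG1", "ARG0"),
}

# Toy prediction from a model that assumes direct mappings only.
predicted = {
    (0, "ARG0", "ARG0"),
    (0, "ARG1", "ARG1"),
    (1, "ARG0", "ARG0"),
}

print(f"mismatch rate: {mismatch_rate(reference):.2%}")
print(f"F-measure:     {f_measure(predicted, reference):.2%}")
```

In this toy setting, a model that only ever proposes direct mappings cannot recover the swapped roles in pair 1, which is the kind of error the summary attributes to the direct-mapping assumption.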
Note Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2008
Language English
Format Thesis