Please use this identifier to cite or link to this item:
http://hdl.handle.net/1783.1/4231
| Title: | Learning bilingual semantic frames |
| Authors: | Wu, Zhaojun |
| Issue Date: | 2008 |
| Abstract: | This thesis studies the task of automatically learning bilingual semantic frames from a Chinese-English parallel corpus. Bilingual semantic frames, the mappings of core semantic arguments (roles) for a predicate pair in a bi-sentence, have the potential to improve the translation quality of Statistical Machine Translation (SMT) systems.
As a prerequisite, we first report our research on the subtask of Chinese Semantic Role Labeling (SRL). We present our implementation of two new state-of-the-art Chinese shallow semantic parsers, based on the Support Vector Machine (SVM) and Maximum Entropy classification techniques. We also present a full-scale comparison of features and of classifier performance, and propose several important new features for this subtask.
We then propose to learn bilingual semantic frames from a parallel corpus of translated sentence pairs. We first present our observations on a reference set that is manually extracted from the parallel corpus. We find that a considerable 15.73% of semantic argument mappings are not direct mappings but mismatches; that is, core semantic argument i in Chinese is not aligned to argument i in English.
We then present a conventional model, SYN_ALIGN, that acquires bilingual semantic frames from the results of semantic role projection based on syntactic constituent alignment. The evaluation shows that SYN_ALIGN achieves only modest performance (44.80% F-measure) because of its brittle assumption that all semantic arguments in one language map directly to their syntactic counterparts in the other language. We therefore propose a novel model, ARG_ALIGN, that learns bilingual semantic frames using a phrasal similarity measure over semantic roles that are automatically produced by two monolingual semantic parsers. ARG_ALIGN surpasses SYN_ALIGN by about 25 points in F-measure and has an 86% F-measure upper bound.
Our experimental results suggest that, for integrating bilingual semantic frames into an SMT system, ARG_ALIGN is a much better way to acquire such frames. |
| Description: | Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2008; x, 75 leaves : ill. ; 30 cm; HKUST Call Number: Thesis CSED 2008 WuZ |
| URI: | http://hdl.handle.net/1783.1/4231 |
| Appears in Collections: | CSE Master Theses |
All items in this Repository are protected by copyright, with all rights reserved.