Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/2846

Large-scale automatic extraction of an English-Chinese translation lexicon

Authors Wu, Dekai
Xia, Xuanyin
Issue Date 1995
Source Machine Translation, v. 9, (3-4), 1995, p. 285-313
Summary We report experimental results on the automatic extraction of an English-Chinese translation lexicon by statistical analysis of a large parallel corpus, using limited amounts of linguistic knowledge. To our knowledge, these are the first empirical results of this kind between an Indo-European and a non-Indo-European language for any significant vocabulary and corpus size. The learned vocabulary comprises about 6,500 English words, with translation precision in the 86-96% range, and alignment proceeds at the paragraph, sentence, and word levels. Specifically, we report (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus, (2) experiments supporting the usefulness of restricted lexical cues for statistical paragraph and sentence alignment, and (3) experiments that question the role of hand-derived monolingual lexicons in automatic word translation acquisition. Using a hand-derived monolingual lexicon, the learned translation lexicon averages 2.33 Chinese translations per English entry, with a manually-filtered precision of 95.1% and an automatically-filtered weighted precision of 86.0%. We then introduce a fully automatic two-stage statistical methodology that is able to learn translations for collocations. A statistically-learned monolingual Chinese lexicon is first used to segment the Chinese text, and bilingual training is then applied to produce 6,429 English entries with 2.25 Chinese translations per entry. This method improves the manually-filtered precision to 96.0% and the automatically-filtered weighted precision to 91.0%, a 35.7% error rate reduction relative to using a hand-derived monolingual lexicon. © 1995 Kluwer Academic Publishers.
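The 35.7% figure above is a relative reduction in error rate, where error rate is taken as 1 minus precision. The following minimal Python sketch checks that arithmetic using the precisions reported in the summary; the helper name relative_error_reduction is hypothetical and not taken from the paper.

```python
def relative_error_reduction(precision_before: float, precision_after: float) -> float:
    """Return the fraction by which the error rate (1 - precision) drops."""
    error_before = 1.0 - precision_before
    error_after = 1.0 - precision_after
    return (error_before - error_after) / error_before


# Automatically-filtered weighted precision, as reported in the summary.
baseline = 0.860   # with the hand-derived monolingual lexicon
two_stage = 0.910  # with the statistically-learned monolingual lexicon

print(f"{relative_error_reduction(baseline, two_stage):.1%}")  # prints 35.7%
```

Applying the same formula to the manually-filtered precisions (95.1% to 96.0%) gives roughly 18.4%, which indicates that the 35.7% figure refers to the automatically-filtered weighted precision.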
ISSN 0922-6567
Rights Machine Translation © copyright (1994) Springer. The original publication is available at http://www.springerlink.com/
Language English
Format Article