Title: Conversion of Chinese phonetic symbols to characters
Authors: Chung, Kim-hang
Issue Date: 1993
Abstract: Conversion of Chinese phonetic symbols to the corresponding Chinese characters is one of the most important topics currently being pursued in the field of Chinese information processing. In Mandarin Chinese, the character-to-syllable and syllable-to-character mappings are both many-to-many. There are around 1,300 different syllables in Mandarin, but more than thirteen thousand commonly used Chinese characters, and some syllables can be mapped to more than 100 characters (e.g. yi). In this research, the conversion of Chinese phonetic symbols to Chinese characters is based on linguistic and statistical techniques. The phonetic symbols are first segmented into a list of syllable words by the Augmented Maximal Matching method developed in this thesis. A syllable word is a sequence of syllables that can be transcribed to one or more valid Chinese words. Augmented Maximal Matching uses Maximal Matching as its backbone and integrates special techniques that identify derived words with modules that use both linguistic and statistical methods to determine the final segmentation. The ambiguities in the syllable words are then resolved by idiomatic phrase matching, adjacency constraint rules, and statistical methods. A working prototype system that demonstrates the techniques developed in the project, together with compilers for the linguistic information, has been implemented. Extensive experiments on data from several domains, ranging from science articles and linguistic text to political articles, have been done, and the sources of error have been identified and analyzed. Experimental results based on 264 sentences, with 3,001 characters, show that the transcription error rate is 1.3%. The techniques and linguistic knowledge developed in this research may be useful for many applications such as keyboard input of Chinese characters, Chinese speech recognition, and optical Chinese character recognition.
Description: Thesis (M.Phil.)--Hong Kong University of Science and Technology, 1993
xi, 110 leaves : ill. ; 30 cm
HKUST Call Number: Thesis COMP 1993 Chung
Appears in Collections: CSE Master Theses
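
Illustrative note: the abstract names Maximal Matching as the backbone of the segmentation step. The following is a minimal sketch of plain forward maximal matching over pinyin syllables, not the Augmented Maximal Matching method of the thesis. The lexicon, the names LEXICON, MAX_WORD_LEN and maximal_matching, and the single-syllable fallback are all assumptions made for illustration.

```python
# Toy lexicon (hypothetical): each entry is a syllable sequence that can be
# transcribed to at least one valid Chinese word.
LEXICON = {
    ("zhong", "guo"),
    ("zhong", "guo", "ren"),
    ("ren", "min"),
    ("wo",),
    ("shi",),
}

# Longest entry in the lexicon, measured in syllables.
MAX_WORD_LEN = max(len(entry) for entry in LEXICON)


def maximal_matching(syllables):
    """Greedy forward maximal matching.

    At each position, take the longest syllable sequence found in the
    lexicon. If nothing matches, fall back to a single syllable so the
    scan always advances (an assumption of this sketch).
    """
    words = []
    i = 0
    while i < len(syllables):
        longest = min(MAX_WORD_LEN, len(syllables) - i)
        for length in range(longest, 0, -1):
            candidate = tuple(syllables[i:i + length])
            if length == 1 or candidate in LEXICON:
                words.append(candidate)
                i += length
                break
    return words


print(maximal_matching("wo shi zhong guo ren".split()))
print(maximal_matching("zhong guo ren min".split()))
```

With this toy lexicon, the second call groups "zhong guo ren" and leaves "min" on its own, although "zhong guo" followed by "ren min" is also a possible reading. Greedy matching alone cannot choose between such alternatives, which is consistent with the abstract's description of further linguistic and statistical modules that determine the final segmentation.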
