|Title: ||Improved acoustic model training for speech recognition and verification|
|Authors: ||Au, Wing Hei|
|Issue Date: ||2004 |
|Abstract: ||Model estimation is an important step in pattern recognition tasks. Different optimization criteria, such as maximum likelihood (ML), minimum classification error (MCE) and maximum mutual information (MMI), are widely used in speech processing tasks. A good criterion should typically be close to the performance metric, be a continuous function, and be efficient to compute. Past research has shown that a well-selected criterion can significantly improve performance.
The choice of criterion depends on the task and the available resources. In this thesis, we focus on developing two criteria, one for verification tasks and one for recognition. Because operating-point classification error rates are the typical evaluation metrics used in verification, our first criterion is an operating-point-specific "minimum verification error rate" (MVER) criterion, an extension of minimum classification error (MCE), which is often applied in recognition. We tested models trained with this new criterion in utterance verification experiments and showed that they outperformed the MCE-trained models.
A more specific model, such as one trained using MCE or MVER, often outperforms a more general model. However, a slight mismatch at test time may significantly degrade its performance. Therefore, we propose a neighborhood-based training criterion to broaden the distribution. This is related to both the recently proposed neighborhood-based decoding and Bayesian learning. One advantage of this training criterion is that it can be integrated into different optimization criteria. On the task of robust speech recognition, we integrated the neighborhood training criterion with the ML and MCE criteria, and both showed improved robustness under moderate mismatch conditions.
The results in this thesis were obtained from speech-related tasks, but because model estimation is widely used in pattern recognition, the principles learned and methodologies developed will also be applicable to other pattern recognition tasks.|
|Description: ||Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2004|
xiii, 86 leaves : ill. ; 30 cm
HKUST Call Number: Thesis ELEC 2004 Au
|Appears in Collections:||ECE Master Theses|