A comparative study of two kernel eigenspace-based speaker adaptation methods on large vocabulary continuous speech recognition

Authors Hsiao, R.
Mak, B.
Issue Date 2005
Source Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005 - Eurospeech), Lisbon, Portugal, 4-8 September 2005, p. 1797-1800
Summary Eigenvoice (EV) speaker adaptation has been shown to be effective for fast speaker adaptation when adaptation data are scarce. Over the past two years, we have been investigating the application of kernel methods to improve EV speaker adaptation by exploiting possible nonlinearity in the speaker space, and two methods were proposed: embedded kernel eigenvoice (eKEV) and kernel eigenspace-based MLLR (KEMLLR). In both methods, kernel PCA is used to derive eigenvoices in the kernel-induced high-dimensional feature space; the two methods differ mainly in how the speaker models are represented. Both have been shown to outperform all other common adaptation methods when the amount of adaptation data is less than 10s. However, in the past, only small-vocabulary speech recognition tasks were tried, since we were not familiar with the behaviour of these kernelized methods. As we gained more experience, we were ready to tackle larger vocabularies. In this paper, we show that both methods continue to outperform MAP and MLLR on the WSJ0 5K-vocabulary task when only 5s or 10s of adaptation data are available. Compared with the speaker-independent model, the two methods reduce the word error rate by 13.4%-21.1%.
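The kernel PCA step described in the abstract can be illustrated with a minimal sketch: build a kernel matrix over speaker supervectors, center it in feature space, and take the leading eigenvectors as kernel eigenvoices. This is a generic kernel PCA illustration, not the authors' eKEV or KEMLLR implementation; the RBF kernel, `gamma`, and the toy data are assumptions for the example.

```python
# Illustrative kernel PCA over speaker supervectors (hypothetical data).
# Not the paper's eKEV/KEMLLR code; a generic sketch of the kernel PCA step.
import numpy as np

def kernel_pca(X, n_components, gamma=0.1):
    """Project rows of X onto the top kernel principal components.

    X: (n_speakers, d) matrix; each row is one speaker supervector.
    """
    n = X.shape[0]
    # RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Center the kernel matrix in the induced feature space.
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigendecompose and keep the leading eigenpairs.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Scale eigenvectors by 1/sqrt(eigenvalue) so projections are normalized.
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))
    # Coordinates of each training speaker in the kernel eigenvoice space.
    return Kc @ alphas

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 8))      # 20 toy "speakers", 8-dim supervectors
Z = kernel_pca(X, n_components=3)
print(Z.shape)                        # (20, 3)
```

In eigenvoice-style adaptation, a new speaker's model would then be expressed as a (constrained) combination of these eigenvoices, estimated from the few seconds of adaptation data available.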
Language English
Format Conference paper
Files in this item:
File: interspeech2005kadapt.pdf (Adobe PDF, 87,640 B)