Title: Kernel eigenspace-based MLLR adaptation
Authors: Mak, Brian Kan-Wing
Hsiao, Roger
Keywords: Eigenvoice speaker adaptation
Eigenspace-based MLLR adaptation
Kernel PCA
Composite kernels
Kernel eigenvoice adaptation
Embedded kernel eigenvoice adaptation
BFGS optimization
Issue Date: Mar-2007
Citation: IEEE Transactions on Audio, Speech, and Language Processing, v. 15, no. 3, March 2007, p. 784-795
Abstract: Recently, we have been investigating the application of kernel methods to fast speaker adaptation by exploiting possible non-linearity in the input speaker space. In this paper, we propose another solution based on kernelizing the eigenspace-based MLLR adaptation (EMLLR) method. We call our new method “kernel eigenspace-based MLLR adaptation” (KEMLLR). In KEMLLR, speaker-dependent (SD) models are estimated from a common speaker-independent (SI) model using MLLR adaptation. The SD MLLR transformation matrices are then mapped to a kernel-induced high-dimensional feature space, where kernel principal component analysis is used to derive a set of eigenmatrices. In addition, a composite kernel is used to preserve the row information in the transformation matrices. A new speaker’s MLLR transformation matrix is then represented as a linear combination of the leading kernel eigenmatrices, which, though it exists only in the feature space, still allows the speaker’s mean vectors to be found explicitly. As a result, at the end of KEMLLR adaptation, a regular HMM is obtained for the new speaker, and subsequent speech recognition is as fast as normal HMM decoding. KEMLLR adaptation was tested and compared with other adaptation methods (MAP, MLLR, EV, EMLLR, and eKEV) on the Resource Management and Wall Street Journal tasks using 5 s or 10 s of adaptation speech. It is found that in both cases, KEMLLR adaptation gives the greatest improvement over the SI model, with an 11–20% word error rate reduction.
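The kernel PCA step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state the form of the composite kernel, so the sum of per-row Gaussian kernels used here is an assumption, as are the width `beta`, the function names, and the random 3x4 matrices standing in for SD MLLR transformation matrices. MLLR estimation itself and the estimation of a new speaker's combination weights are not shown.

```python
import numpy as np

def composite_rbf_kernel(A, B, beta=1.0):
    """Assumed composite kernel: sum of Gaussian kernels over matching rows."""
    sq_dist_per_row = np.sum((A - B) ** 2, axis=1)
    return float(np.sum(np.exp(-beta * sq_dist_per_row)))

def kernel_pca(mats, kernel, n_components):
    """Kernel PCA over a list of equally shaped matrices."""
    n = len(mats)
    # Gram matrix of pairwise kernel values between the training matrices.
    K = np.array([[kernel(a, b) for b in mats] for a in mats])
    # Centre the data in the feature space.
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns ascending eigenvalues; keep the largest n_components.
    w, v = np.linalg.eigh(Kc)
    order = np.argsort(w)[::-1][:n_components]
    w, v = w[order], v[:, order]
    # Scale coefficients so each feature-space eigenvector has unit norm.
    alphas = v / np.sqrt(w)
    return K, Kc, w, alphas

# Hypothetical stand-ins for eight speakers' transformation matrices.
rng = np.random.default_rng(0)
transforms = [rng.normal(size=(3, 4)) for _ in range(8)]

K, Kc, eigvals, alphas = kernel_pca(transforms, composite_rbf_kernel, n_components=3)

# Coordinates of each training speaker along the leading kernel eigenmatrices.
projections = Kc @ alphas
print(projections.shape)
```

Each column of `alphas` holds the coefficients that express one kernel eigenmatrix as a combination of the mapped training matrices, which is why the eigenmatrices never have to be formed explicitly in the feature space.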
Rights: © 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Appears in Collections: CSE Journal/Magazine Articles

Files in This Item:

File: x.sap2007kemllr.pdf
Description: pre-published version
Size: 303Kb
Format: Adobe PDF

