HKUST Library Institutional Repository Banner

HKUST Institutional Repository >
Computer Science and Engineering >
CSE Journal/Magazine Articles >

Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/2935
Title: Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting
Authors: Mak, Brian Kan-Wing
Hsiao, Roger
Ho, Simon
Kwok, James Tin-Yau
Keywords: Eigenvoice speaker adaptation
Kernel eigenvoice speaker adaptation
Kernel PCA
Composite kernels
Pre-image problem
Reference speaker weighting
Issue Date: Jul-2006
Citation: IEEE transactions on audio, speech and language processing, v. 14, no. 4, July 2006, p. 1267-1280
Abstract: Recently, we proposed an improvement to theconventional eigenvoice (EV) speaker adaptation using kernel methods. In our novel kernel eigenvoice (KEV) speaker adaptation [1], speaker supervectors are mapped to a kernel-induced high dimensional feature space, where eigenvoices are computed using kernel principal component analysis. A new speaker model is then constructed as a linear combination of the leading eigenvoices in the kernel-induced feature space. KEV adaptation was shown to outperform EV, MAP, and MLLR adaptation in a TIDIGITS task with less than 10s of adaptation speech [2]. Nonetheless, due to many kernel evaluations, both adaptation and subsequent recognition in KEV adaptation are considerably slower than conventional EV adaptation. In this paper, we solve the efficiency problem and eliminate all kernel evaluations involving adaptation or testing observations by finding an approximate pre-image of the implicit adapted model found by KEV adaptation in the feature space; we call our new method embedded kernel eigenvoice (eKEV) adaptation. eKEV adaptation is faster than KEV adaptation, and subsequent recognition runs as fast as normal HMM decoding. eKEV adaptation makes use of multi-dimensional scaling technique so that the resulting adapted model lies in the span of a subset of carefully chosen training speakers. It is related to the reference speaker weighting (RSW) adaptation method that is based on speaker clustering. Our experimental results on Wall Street Journal show that eKEV adaptation continues to outperform EV, MAP, MLLR, and the original RSW method. However, by adopting the way we choose the subset of reference speakers for eKEV adaptation, we may also improve RSW adaptation so that it performs as well as our eKEV adaptation.
Rights: © 2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
URI: http://hdl.handle.net/1783.1/2935
Appears in Collections:CSE Journal/Magazine Articles

Files in This Item:

File Description SizeFormat
x.sap2006ekev.pdfpre-published version300KbAdobe PDFView/Open

Find published version via OpenURL Link Resolver

All items in this Repository are protected by copyright, with all rights reserved.