Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting

Authors Mak, Brian Kan Wing
Hsiao, Roger
Ho, Simon
Kwok, James Tin-Yau
Issue Date 2006
Source IEEE Transactions on Audio, Speech, and Language Processing, v. 14, (4), 2006, JUL, p. 1267-1280
Summary Recently, we proposed an improvement to the conventional eigenvoice (EV) speaker adaptation using kernel methods. In our novel kernel eigenvoice (KEV) speaker adaptation, speaker supervectors are mapped to a kernel-induced high dimensional feature space, where eigenvoices are computed using kernel principal component analysis. A new speaker model is then constructed as a linear combination of the leading eigenvoices in the kernel-induced feature space. KEV adaptation was shown to outperform EV, MAP, and MLLR adaptation in a TIDIGITS task with less than 10 s of adaptation speech. Nonetheless, due to many kernel evaluations, both adaptation and subsequent recognition in KEV adaptation are considerably slower than conventional EV adaptation. In this paper, we solve the efficiency problem and eliminate all kernel evaluations involving adaptation or testing observations by finding an approximate pre-image of the implicit adapted model found by KEV adaptation in the feature space; we call our new method embedded kernel eigenvoice (eKEV) adaptation. eKEV adaptation is faster than KEV adaptation, and subsequent recognition runs as fast as normal HMM decoding. eKEV adaptation makes use of a multidimensional scaling technique so that the resulting adapted model lies in the span of a subset of carefully chosen training speakers. It is related to the reference speaker weighting (RSW) adaptation method that is based on speaker clustering. Our experimental results on Wall Street Journal show that eKEV adaptation continues to outperform EV, MAP, MLLR, and the original RSW method. However, by adopting the way we choose the subset of reference speakers for eKEV adaptation, we may also improve RSW adaptation so that it performs as well as our eKEV adaptation.
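As a rough illustration of the kernel PCA step mentioned in the summary, the sketch below computes a centered kernel matrix over a few toy "speaker supervectors" and extracts the leading kernel eigenvoice as expansion coefficients over the training speakers. It is not the authors' implementation. The Gaussian kernel, the value of `beta`, the toy data, and the function names are all assumptions made for this example, and power iteration is swapped in for a full eigendecomposition to keep the code dependency-free. The adaptation step (estimating eigenvoice weights from speech) and the pre-image / multidimensional scaling step of eKEV are not covered.

```python
import math

def gaussian_kernel(x, y, beta):
    """Assumed kernel: k(x, y) = exp(-beta * ||x - y||^2)."""
    return math.exp(-beta * sum((a - b) ** 2 for a, b in zip(x, y)))

def centered_kernel_matrix(supervectors, beta):
    """Kernel matrix of the supervectors, centered in the feature space."""
    n = len(supervectors)
    K = [[gaussian_kernel(xi, xj, beta) for xj in supervectors]
         for xi in supervectors]
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # K is symmetric, so column means equal row means.
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def leading_eigenpair(M, iters=200):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix,
    found by power iteration (a stand-in for a full eigendecomposition)."""
    n = len(M)
    v = [1.0 + i for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Mv[i] for i in range(n))  # Rayleigh quotient
    return lam, v

# Toy supervectors for four hypothetical training speakers (two clusters).
toy_supervectors = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [3.2, 2.9]]
K_centered = centered_kernel_matrix(toy_supervectors, beta=0.1)
eigenvalue, coeffs = leading_eigenpair(K_centered)

# Scaling by 1/sqrt(eigenvalue) gives the eigenvoice unit norm in feature space.
eigenvoice_coeffs = [c / math.sqrt(eigenvalue) for c in coeffs]
print(eigenvalue, eigenvoice_coeffs)
```

The eigenvoice never appears explicitly here: it exists only as coefficients over the mapped training speakers, which is why KEV adaptation needs kernel evaluations at adaptation and recognition time, the cost that eKEV removes by finding an approximate pre-image.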
ISSN 1558-7916
Rights © 2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Language English
Format Article
Access View full-text via DOI
View full-text via Web of Science
View full-text via Scopus
Files in this item:
File Description Size Format
x.sap2006ekev.pdf 307915 B Adobe PDF