|Title: ||Variable speed playback system for speech and audio signals (and topics in video processing)|
|Authors: ||Wong, Hon Wah|
|Issue Date: ||1998 |
|Abstract: ||For many audio/speech retrieval applications, it is desirable to perform time scale modification (TSM) to change the apparent rhythm of audio and human speech. Such applications include fast and slow playback of recordings, fast browsing of audio/speech databases, audio post-production, etc. Time scaling of speech and audio signals is also one of the key features of the upcoming MPEG-4 standard. This thesis addresses problems related to TSM.
Firstly, the Synchronized Overlap-and-Add (SOLA) algorithm is a relatively simple time-domain TSM algorithm that produces high-quality results. However, the computational complexity of SOLA is large, making it impractical to incorporate into real applications. In this thesis, two fast variants of SOLA, the Sub-sampled Hierarchical Search (SHS) and Envelope Matching Time Scale Modification (EM-TSM), are proposed to reduce the computational complexity of SOLA. Secondly, most TSM algorithms suffer from low intelligibility when the TSM factor is small. In this thesis, a modification of SOLA is proposed to improve the intelligibility.
Thirdly, some speech materials have irregular rhythms that lower intelligibility. In this thesis, an Adaptive TSM (ATSM) is proposed to increase the intelligibility by regularizing the rhythm of speech materials.
Fourthly, a proposed fast SOLA variant is used to reduce the computational complexity of Dynamic Time Warping (DTW)-based isolated-word speech recognition. The proposed scheme suppresses noise to a certain degree, achieving higher recognition accuracy at reduced computation.
A topic in video processing, unrelated to TSM, is also studied in this thesis. A new fast motion estimation algorithm for block-based video compression standards such as MPEG and ITU-T H.263 is proposed. An existing motion estimation algorithm called One-Bit Transform (1BT) achieves low complexity at the expense of significant degradation in the predicted image quality compared to full search (FS). Several modifications are proposed to improve 1BT by adding conditional local searches that improve quality significantly with only a slight increase in computational complexity.|
|Description: ||Thesis (M.Phil.)--Hong Kong University of Science and Technology, 1998|
xv, 148 leaves : ill. (some col.) ; 30 cm
HKUST Call Number: Thesis ELEC 1998 WongHW
|Appears in Collections:||ECE Master Theses|
All items in this Repository are protected by copyright, with all rights reserved.