Variable speed playback system for speech and audio signals (and topics in video processing)

Authors Wong, Hon Wah
Issue Date 1998
Summary For many audio/speech retrieval applications, it is desirable to perform time scale modification (TSM) to change the apparent rhythm of audio and human speech. Such applications include fast and slow playback of recordings, fast browsing of audio/speech databases, and audio post-production. Time scaling of speech and audio signals is also one of the key features of the upcoming MPEG-4 standard. This thesis addresses problems related to TSM. Firstly, the Synchronized Overlap-and-Add (SOLA) algorithm is a relatively simple time-domain TSM algorithm that produces high-quality results. However, the computational complexity of SOLA is large, making it impractical to incorporate in real applications. In this thesis, two fast variants of SOLA, the Sub-sampled Hierarchical Search (SHS) and Envelope Matching Time Scale Modification (EM-TSM), are proposed to reduce the computational complexity of SOLA. Secondly, most TSM algorithms suffer from low intelligibility when the TSM factor is small. In this thesis, a modification of SOLA is proposed to improve the intelligibility. Thirdly, some speech materials have irregular rhythms that lower intelligibility. In this thesis, an Adaptive TSM (ATSM) is proposed to increase the intelligibility by regularizing the rhythm of speech materials. Fourthly, a proposed fast SOLA variant is used to reduce the computational complexity of Dynamic Time Warping (DTW) based isolated-word speech recognition. The proposed scheme suppresses noise to a certain degree, achieving higher recognition accuracy at reduced computation. A topic in video processing, unrelated to TSM, is also studied in this thesis. A new fast motion estimation algorithm for block-based video compression algorithms such as MPEG and ITU-T H.263 is proposed. An existing motion estimation algorithm called the One-Bit Transform (1BT) achieves low complexity at the expense of significant degradation in predicted image quality compared to full search (FS). Several modifications are proposed to improve 1BT by adding conditional local searches that improve quality significantly with only a slight increase in computational complexity.
Note Thesis (M.Phil.)--Hong Kong University of Science and Technology, 1998
Language English
Format Thesis
Files in this item:
File Description Size Format
th_redirect.html 345 B HTML