Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/7573

Accelerating genomic sequence compression with graphics processors

Authors: Tan, Yuwei
Issue Date: 2012
Summary: A modern sequencing instrument can generate hundreds of millions of short genomic reads per day. As a result, there is an urgent need for fast algorithms that can efficiently handle, store, compress, access, and decompress genomic data. This thesis focuses on specialized compression schemes that can quickly compress and decompress large-scale genomic data. We developed lightweight compression schemes for FASTQ/FASTA format data, as well as schemes designed specifically for sequence alignment output data. Furthermore, we leverage the Graphics Processing Unit’s (GPU) massively parallel architecture, high density of arithmetic logic units, and superior memory bandwidth to significantly accelerate compression and decompression. We demonstrate that our GPU-powered custom compression schemes achieve compression ratios similar to or better than those of general-purpose compression algorithms on sequence data, while also compressing 20 times faster. Finally, we integrate our compression techniques into state-of-the-art alignment tools, improving their overall speed by an order of magnitude by reducing I/O cost.
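For readers unfamiliar with lightweight sequence compression, the sketch below shows one common, generic technique: packing each nucleotide into 2 bits instead of an 8-bit ASCII character, a fixed 4:1 reduction for the bases alone. This is an assumption for illustration only; the summary does not describe the thesis's actual schemes, and the names `pack` and `unpack` are hypothetical. Because each output byte depends only on its own four input bases, the work splits naturally across many independent threads, the kind of property that GPU acceleration can exploit.

```python
# Hypothetical illustration: 2-bit packing of nucleotides (A, C, G, T).
# A generic lightweight scheme, not necessarily the method used in the thesis.
# Bases outside ACGT (e.g. 'N') are not handled; real FASTQ/FASTA data would
# need a separate mechanism for them, and quality scores are ignored here.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an ACGT string into bytes, four bases per byte (low bits first)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)  # base j occupies bits 2j..2j+1
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases; the length must be stored separately."""
    bases = []
    for i in range(length):
        byte = data[i // 4]
        bases.append(BASES[(byte >> (2 * (i % 4))) & 0b11])
    return "".join(bases)


if __name__ == "__main__":
    read = "ACGTACGTTTGCA"
    packed = pack(read)
    print(len(read), "bases ->", len(packed), "bytes")
    assert unpack(packed, len(read)) == read
```

The original read length is kept outside the packed bytes because the final byte may be only partially filled.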
Note: Thesis (M.Phil.), Hong Kong University of Science and Technology, 2012
Language: English
Format: Thesis