Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/6890

Understanding the improved sensitivity of spectral library searching over sequence database searching in proteomics

Authors Zhang, Xin
Issue Date 2010
Summary Peptide sequencing by tandem mass spectrometry is the cornerstone of proteomics technology. The computationally intensive task of assigning tandem mass spectra to peptide identifications is predominantly accomplished by sequence database searching, which is slow and error-prone. A newer method, spectral library searching, was recently proposed and has been shown in limited studies to be much faster and more sensitive than sequence database searching. In this thesis, we performed a systematic comparison between sequence database searching and spectral library searching using a wide variety of data to better demonstrate, and understand, the superior sensitivity of spectral library searching. Our conclusions are as follows. First, we showed, using several large datasets and a more robust statistical validation approach, that spectral library searching consistently outperforms sequence database searching as performed by several search engines employing different algorithms. Second, we demonstrated that the sensitivity advantage of spectral library searching is primarily due to matching against real library spectra rather than the simplistic theoretical spectra used in sequence database searching. Third, we found that spectral library searching is more sensitive than sequence database searching only when the library spectra and the query spectra are acquired on the same type of instrument. Fourth, we determined that the use of real peak intensities and the inclusion of non-canonical fragment ion peaks are both important factors in the sensitivity advantage of spectral library searching, and that state-of-the-art spectrum prediction tools are still far from reproducing those favorable effects. Fifth, we confirmed that spectral library searching is disproportionately more successful in identifying low-quality spectra.
Our results answered important outstanding questions about this promising yet unproven method using well-controlled computational experiments and sound statistical approaches.
Note Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2010
Language English
Format Thesis