Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/7051
Title: A Bayesian approach to de-noise peptide tandem mass spectra
Authors: Shao, Wenguang
Issue Date: 2010
Abstract: Tandem mass spectrometry is the dominant proteomics technology for identifying proteins in a mixture. Modern mass spectrometers can produce a large number of tandem mass spectra in a relatively short time. Unfortunately, almost every spectrum contains a significant amount of noise, introduced by contamination or other experimental artifacts. Thus, computational analysis of thousands of noise-contaminated spectra is a major challenge in proteomics research. Noise peaks not only waste time in sequence database searching and space in data storage but also, more critically, increase the false positives or false negatives when peptides and proteins are interpreted. De-noising strategies aim to retain signal peaks while removing noise peaks. On average, up to 74% of the peaks in tandem mass spectra are noise. Therefore, it is appealing to develop a noise-filtering algorithm to apply before assigning peptides to spectra, as well as when spectra are archived in spectral libraries. A common strategy is to specify a threshold based on the intensity of each peak: peaks with intensity below that threshold are considered noise and discarded. Another simple method applies a rank cut-off criterion: for example, all peaks in a given tandem mass spectrum are ranked by intensity, and the top 50 peaks are assumed to be signals. These simple filters take only intensity into consideration and neglect other useful characteristics of peptide MS/MS spectra, such as the location of peaks and the relationships between pairs of peaks. In addition, since the signal density, i.e. the fraction of peaks that are signals, varies widely among spectra, applying a constant rank cut-off to de-noise spectra is not optimal.
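The two intensity-based filters described above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the peak representation as (m/z, intensity) pairs, the function names `filter_by_threshold` and `filter_by_rank`, and the sample values are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed representation: one peak is an (m/z, intensity) pair.
Peak = Tuple[float, float]


def filter_by_threshold(peaks: List[Peak], min_intensity: float) -> List[Peak]:
    """Keep peaks whose intensity is not below the threshold."""
    return [peak for peak in peaks if peak[1] >= min_intensity]


def filter_by_rank(peaks: List[Peak], top_n: int = 50) -> List[Peak]:
    """Keep the top_n most intense peaks, returned in m/z order."""
    strongest = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:top_n]
    return sorted(strongest, key=lambda peak: peak[0])


# Made-up spectrum, for illustration only.
spectrum = [(114.1, 12.0), (175.1, 340.0), (262.2, 8.5), (333.2, 95.0), (446.3, 510.0)]

kept_by_threshold = filter_by_threshold(spectrum, min_intensity=50.0)
kept_by_rank = filter_by_rank(spectrum, top_n=2)
```

Both filters look only at intensity, and `filter_by_rank` keeps the same number of peaks whatever the signal density of the spectrum, which are the two limitations the abstract points out.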
Here we propose a Bayesian machine-learning approach that assigns each peak in a spectrum a probability of being a signal, based on characteristics that are overlooked by intensity-based filtering methods. The cut-off criterion is determined according to this estimated probability. Therefore, spectra with different signal densities can be de-noised at a controlled number of signal peaks. The conditional probabilities are learned from a training set in which signal and noise peaks are independently partitioned by reproducibility. Our model confirms and quantifies well-known qualitative behavior of peptide fragmentation.
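The abstract does not specify the model structure or the peak features, so the following is only a rough sketch of the general idea, using a naive Bayes classifier as a stand-in. The feature names (`has_complement`, `high`, and so on), the Laplace smoothing, the function names, and the toy training data are all hypothetical; the signal/noise labels, which the thesis derives from reproducibility, are simply given as booleans here.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical discrete features of one peak (not the thesis's feature set).
Features = Tuple[str, ...]
Model = Tuple[Counter, Dict[bool, Counter], Dict[int, Set[str]]]


def train(examples: List[Tuple[Features, bool]]) -> Model:
    """Count labeled peaks to estimate the prior and the conditional probabilities."""
    class_counts: Counter = Counter()
    feature_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
    values: Dict[int, Set[str]] = {}
    for features, is_signal in examples:
        class_counts[is_signal] += 1
        for i, value in enumerate(features):
            feature_counts[is_signal][(i, value)] += 1
            values.setdefault(i, set()).add(value)
    return class_counts, feature_counts, values


def signal_probability(features: Features, model: Model) -> float:
    """Posterior P(signal | features), assuming features are independent given the class."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        score = class_counts[label] / total  # prior
        for i, value in enumerate(features):
            # Laplace smoothing (an assumption) avoids zero probabilities.
            numerator = feature_counts[label][(i, value)] + 1
            denominator = class_counts[label] + len(values[i])
            score *= numerator / denominator
        scores[label] = score
    return scores[True] / (scores[True] + scores[False])


def denoise(peaks: List[Tuple[float, Features]], model: Model, cutoff: float = 0.5) -> List[float]:
    """Return the m/z values of peaks whose signal probability reaches the cut-off."""
    return [mz for mz, features in peaks if signal_probability(features, model) >= cutoff]


# Toy training set, for illustration only.
training = [
    (("has_complement", "high"), True),
    (("has_complement", "high"), True),
    (("has_complement", "low"), True),
    (("no_complement", "low"), False),
    (("no_complement", "low"), False),
    (("no_complement", "high"), False),
]
model = train(training)
kept = denoise([(175.1, ("has_complement", "high")), (262.2, ("no_complement", "low"))], model)
```

Because the cut-off is applied to a per-peak probability rather than to a fixed rank, the number of peaks kept can differ from spectrum to spectrum, which is how such an approach can adapt to varying signal density.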
Description: Thesis (M.Phil.)--Hong Kong University of Science and Technology, 2010
x, 58 p. : col. ill. ; 30 cm
HKUST Call Number: Thesis BIEN 2010 Shao
URI: http://hdl.handle.net/1783.1/7051
Appears in Collections: BIEN Master Theses

All items in this Repository are protected by copyright, with all rights reserved.