PhD Defense
Lei Xin
Probability Scoring System for De Novo And Protein Identification with Tandem Mass Spectrometry
Time: 12:30 a.m.
Place: Middlesex College, Room 320
Supervisor: Dr. Kaizhong Zhang
Thesis Examiners: Dr. Lucian Ilie, Dr. Sheng Yu
Extra-Departmental Examiner: Dr. Peter Rogan (Biochemistry)
External Examiner: Dr. Bin Ma (Univ. of Waterloo)
Abstract:
In the past decade, tandem mass spectrometry (MS/MS) has become
the most popular technology in the field of proteomics. With the
growing scalability of mass spectrometers, automating the
process of assigning peptide sequences to spectra has become an
urgent need. One popular approach to this problem is de novo
sequencing; the other is to search a protein sequence
database. The PEAKS algorithm package provides both approaches. It
is the most popular de novo sequencing package and one of
the most widely used database searching packages. However, the
score that PEAKS4.5 reports with each result could not be
used directly to judge the quality of that result. As a
consequence, there was no easy way to automatically filter out
false matches according to the score.
In this thesis, based on the PEAKS raw ion score, we propose new features to distinguish correct matches from false matches. We then build statistical models on these features to establish a probability scoring system. The new scoring function not only provides automated result validation but also improves the accuracy of the PEAKS algorithm. In addition, we propose a novel local search method for improving the de novo sequencing algorithm of PEAKS. The thesis is divided into two parts, corresponding to the two approaches. In the first part, we calculate a probability score for each amino acid in the de novo sequencing results. In the second part, probability scoring systems are established for both peptide matches and protein hits. Experimental results show that the new probability scoring system outperforms the PEAKS4.5 scoring system in both probability accuracy and the ability to distinguish correct matches from false matches.