De novo sequencing of peptides is one of the most challenging data-analysis tasks in proteome research. This paper proposes a generative hidden Markov model (HMM) of mass spectra for de novo peptide sequencing, which offers a novel view on how to solve this problem in a Bayesian framework. We further extend the model structure to a graphical model and a factorial HMM, which substantially improves the peptide identification results. Inference with the graphical model estimates posterior probabilities for amino acids rather than scores for single symbols in the sequence. Our model outperforms state-of-the-art methods for de novo peptide sequencing on a large test set of spectra.
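To illustrate what "posterior probabilities" for hidden symbols means in an HMM, the following is a minimal sketch of the standard forward-backward algorithm on a toy discrete HMM. It is not the NovoHMM model: the two hidden states, the two observation symbols, and all probability values are hypothetical and chosen only for illustration, whereas the model described above operates on mass spectra.

```python
def forward_backward(init, trans, emit, observations):
    """Per-position posterior state probabilities for a discrete HMM.

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability that state s emits observation symbol o
    """
    n_states = len(init)
    n_steps = len(observations)

    # Forward pass; each step is rescaled to sum to 1 to avoid underflow.
    alpha, scales = [], []
    for t in range(n_steps):
        if t == 0:
            row = [init[s] * emit[s][observations[0]] for s in range(n_states)]
        else:
            row = [
                emit[s][observations[t]]
                * sum(alpha[t - 1][r] * trans[r][s] for r in range(n_states))
                for s in range(n_states)
            ]
        scale = sum(row)
        alpha.append([p / scale for p in row])
        scales.append(scale)

    # Backward pass, reusing the forward scaling factors.
    beta = [[1.0] * n_states for _ in range(n_steps)]
    for t in range(n_steps - 2, -1, -1):
        for r in range(n_states):
            beta[t][r] = sum(
                trans[r][s] * emit[s][observations[t + 1]] * beta[t + 1][s]
                for s in range(n_states)
            ) / scales[t + 1]

    # Posterior of each state at each position, normalised per position.
    posteriors = []
    for t in range(n_steps):
        row = [alpha[t][s] * beta[t][s] for s in range(n_states)]
        total = sum(row)
        posteriors.append([p / total for p in row])
    return posteriors


# Hypothetical toy parameters (assumptions, not taken from the paper).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.3], [0.1, 0.9]]
obs = [0, 1, 1]

posteriors = forward_backward(init, trans, emit, obs)
for t, row in enumerate(posteriors):
    print(t, [round(p, 3) for p in row])
```

Each row of `posteriors` is a probability distribution over the hidden states at one position, which is the kind of per-symbol quantity a posterior-based decoder reports in place of an uncalibrated score.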
The software is for non-commercial use only. Download the zip file (NovoHMM.zip). It contains an executable file (NovoHMM.exe) for Windows and a model file (hmm.txt) for the ThermoFinnigan LCQ mass spectrometer. The first time you run NovoHMM, you will be asked to select the model file hmm.txt.
If you have any questions concerning the paper or the software, please contact Bernd Fischer, ETH Zurich (e-mail: bernd.fischer@inf.ethz.ch).