Published September 14, 2023 | Version v1
Presentation Open

Deep learning to uncover the immunopeptidome

  • 1. University of Antwerp
  • 2. University of Washington
  • 3. Talus Bioscience
  • 4. Technical University of Munich

Description

Introduction

Mass spectrometry-based immunopeptidomics plays a crucial role in identifying targets for immunotherapy and vaccine development. However, the analysis of immunopeptidomics data remains challenging and suffers from low spectrum annotation rates. This is in large part because immunopeptides are generated from their parent proteins in an unpredictable manner, so a massive search space has to be considered. We have developed two deep learning-based solutions to address this issue. The first rescores sequence database search results using accurate fragment ion intensity predictions to boost the sensitivity and accuracy of spectrum identification. The second performs de novo peptide sequencing with a transformer neural network that translates the peaks in a mass spectrum into the amino acids that comprise the generating peptide.

Experimental and Results

First, we developed an optimized immunopeptide fragment ion intensity prediction model based on Prosit. We measured over 300,000 synthesized non-tryptic peptides from the ProteomeTools project on a timsTOF Pro and used these data to fine-tune the existing Prosit fragment ion intensity prediction model. Compared with the previously published Orbitrap Prosit model, the timsTOF Prosit model developed here showed a substantially improved normalized spectral contrast angle (SA) between predicted and experimental spectra measured on the timsTOF, both for non-tryptic peptides (SA ≥ 0.9 for 2.4% vs 26.3% of spectra, respectively) and for tryptic peptides (SA ≥ 0.9 for 0.2% vs 42.1%). Next, we used the predicted fragment ion intensities as additional features for Percolator to rescore peptide-spectrum matches (PSMs) obtained using MaxQuant. When reprocessing public HLA class I and class II immunopeptidome data, incorporating the fragment ion intensity predictions into the database matching process increased the spectrum identification rate 3.0-fold for HLA class I peptides and 1.7-fold for HLA class II peptides compared to MaxQuant alone, and identified novel neoepitopes.
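
For readers unfamiliar with the metric, the normalized spectral contrast angle is commonly defined as SA = 1 - 2·arccos(cos θ)/π, where cos θ is the cosine similarity between the predicted and experimental fragment intensity vectors. The following is a minimal sketch of that formula and of the "fraction of spectra with SA ≥ 0.9" summary, not the Prosit implementation. The function names are hypothetical, and the sketch assumes both vectors are already aligned so that each position refers to the same fragment ion.

```python
import math


def spectral_angle(predicted, observed):
    """Normalized spectral contrast angle between two aligned intensity vectors.

    Returns 1.0 for vectors pointing in the same direction and 0.0 for
    orthogonal vectors (assuming non-negative intensities).
    """
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0.0 or norm_o == 0.0:
        # Assumption: an empty spectrum is treated as a complete mismatch.
        return 0.0
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def fraction_at_or_above(angles, threshold=0.9):
    """Fraction of spectra whose SA meets the threshold."""
    if not angles:
        return 0.0
    return sum(a >= threshold for a in angles) / len(angles)
```

Because the angle is computed on normalized vectors, the metric is insensitive to the absolute intensity scale of either spectrum, which is why it can compare predicted relative intensities with measured ones.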

Second, we have developed the Casanovo deep learning tool for de novo peptide sequencing. Casanovo was first trained on 30 million tryptic PSMs from the MassIVE-KB dataset and subsequently fine-tuned on multi-enzyme data to develop a non-enzymatic version suitable for processing immunopeptidomics data. On a multi-species benchmark dataset, Casanovo significantly outperformed the state-of-the-art de novo peptide sequencing tools DeepNovo and Novor, achieving an average peptide precision of 0.95 compared to 0.76 and 0.64, respectively. When the non-enzymatic model was used to analyze data from MDA-MB-231 breast cancer cells, Casanovo identified 66% more antigen peptides that match the human proteome, and with higher predicted MHC binding affinity, than a database searching strategy using Tide.
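
As a rough illustration of what a peptide-level precision figure such as the one above measures, the sketch below counts a prediction as correct only if the whole sequence matches the reference. This is a simplified assumption, not Casanovo's evaluation code: the function name is hypothetical, matching is done on plain sequence strings with the isobaric residues I and L treated as equivalent, and spectra for which no prediction was made are represented as `None` and excluded from the denominator. A published benchmark may apply mass-tolerance-based matching instead.

```python
def peptide_precision(predictions, references):
    """Fraction of predicted peptides that fully match their reference.

    predictions: list of sequence strings, or None where no prediction was made.
    references:  list of reference sequence strings, same order and length.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    def canonical(sequence):
        # Leucine and isoleucine have identical mass, so treat them as one residue.
        return sequence.replace("I", "L")

    scored = [(p, r) for p, r in zip(predictions, references) if p is not None]
    if not scored:
        return 0.0
    correct = sum(canonical(p) == canonical(r) for p, r in scored)
    return correct / len(scored)
```

For example, with predictions `["PEPTIDE", "PEPTLDE", None, "AAAA"]` and references `["PEPTIDE", "PEPTIDE", "KKK", "AAAK"]`, three predictions are scored and two are fully correct.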

Conclusions

Identifying immunopeptides from mass spectrometry experiments is challenging because of the massive search space that must be considered, which results in low spectrum annotation rates. We have introduced an optimized Prosit immunopeptidomics fragment ion intensity prediction model to rescore sequence database searching results, as well as the Casanovo tool for de novo peptide sequencing. These deep learning solutions enabled us to detect significantly more immunopeptides and identify several novel neoepitopes. These tools will enable an increased understanding of the immunopeptidome, ultimately leading to advances in immunotherapies.

Files

20230914_bmss.pdf
