Characterization of Proteoform Post-Translational Modifications by Top-Down and Bottom-Up Mass Spectrometry in Conjunction with Annotations

Many proteoforms can be produced from a gene due to genetic mutations, alternative splicing, post-translational modifications (PTMs), and other variations. PTMs in proteoforms play critical roles in cell signaling, protein degradation, and other biological processes. Mass spectrometry (MS) is the primary technique for investigating PTMs in proteoforms, and two alternative MS approaches, top-down and bottom-up, have complementary strengths. The combination of the two approaches has the potential to increase the sensitivity and accuracy in PTM identification and characterization. In addition, protein and PTM knowledge bases, such as UniProt, provide valuable information for PTM characterization and verification. Here, we present a software pipeline PTM-TBA (PTM characterization by Top-down and Bottom-up MS and Annotations) for identifying and localizing PTMs in proteoforms by integrating top-down and bottom-up MS as well as PTM annotations. We assessed PTM-TBA using a technical triplicate of bottom-up and top-down MS data of SW480 cells. On average, database search of the top-down MS data identified 2000 mass shifts, 814.5 (40.7%) of which were matched to 11 common PTMs and 423 of which were localized. Of the mass shifts identified by top-down MS, PTM-TBA verified 435 mass shifts using the bottom-up MS data and UniProt annotations.


S-1: An algorithm for removing duplicated mass shifts
To remove duplicated mass shifts, we first group mass shifts reported from top-down or bottom-up MS data into clusters and then remove duplicated mass shifts in each cluster. In the clustering step, two mass shifts [m1, p1, a1, b1] and [m2, p2, a2, b2] are added to the same cluster if p1 and p2 are the same and the difference between m1 and m2 is smaller than an error tolerance.

A greedy algorithm is used to remove duplicated mass shifts in the same cluster, with the objective of reporting a set of non-duplicated mass shifts that contains as many mass shifts as possible (Fig. S1). We sort all mass shifts in a cluster in increasing order of their left boundaries. Let L = S1, S2, …, Sn be the sorted mass shifts of the cluster. We compare the boundaries (a1, b1) of mass shift S1 with the boundaries (a2, b2) of S2 to remove duplicated ones. There are three cases.

Case 1: b1 ≤ a2, that is, S1 and S2 do not overlap (Step 4 in Fig. S1). In this case, S1 is removed from the mass shift list L and added to the result list R.
Case 2: a2 < b1 < b2, that is, S1 and S2 partially overlap (Step 6 in Fig. S1). In this case, S2 is removed from the list L.
Case 3: b1 ≥ b2, that is, S1 fully covers S2 (Step 8 in Fig. S1). In this case, S1 is removed from the list L.

The comparison step is repeated for the first two mass shifts in L until only one mass shift remains in the list. Finally, the last remaining mass shift is added to the result list R.
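The clustering and greedy steps described above can be sketched in Python as follows. This is a minimal illustration, not the PTM-TBA implementation: the names `MassShift`, `cluster_mass_shifts`, and `remove_duplicates` are hypothetical, the fields m, p, a, b are assumed to be the mass, the protein, and the left and right boundaries, and the three case conditions are read as b1 ≤ a2, a2 < b1 < b2, and b1 ≥ b2. The clustering rule is also an assumption: within each protein, mass shifts are sorted by mass and a new cluster is started whenever the gap to the previous mass reaches the tolerance.

```python
from typing import List, NamedTuple


class MassShift(NamedTuple):
    mass: float   # m: mass of the shift
    protein: str  # p: protein identifier (assumed meaning of p)
    left: int     # a: left boundary
    right: int    # b: right boundary


def cluster_mass_shifts(shifts: List[MassShift], tol: float) -> List[List[MassShift]]:
    """Group shifts on the same protein whose sorted masses differ by less than tol.

    Assumption: consecutive chaining on sorted masses; the source only states
    the pairwise criterion.
    """
    clusters: List[List[MassShift]] = []
    prev = None
    for s in sorted(shifts, key=lambda x: (x.protein, x.mass)):
        if prev is not None and s.protein == prev.protein and s.mass - prev.mass < tol:
            clusters[-1].append(s)
        else:
            clusters.append([s])
        prev = s
    return clusters


def remove_duplicates(cluster: List[MassShift]) -> List[MassShift]:
    """Greedy removal of duplicated (overlapping) mass shifts in one cluster."""
    pending = sorted(cluster, key=lambda x: (x.left, x.right))  # list L
    result: List[MassShift] = []                                # list R
    while len(pending) > 1:
        s1, s2 = pending[0], pending[1]
        if s1.right <= s2.left:
            # Case 1: no overlap, S1 is kept and moved to the result list.
            result.append(pending.pop(0))
        elif s1.right < s2.right:
            # Case 2: partial overlap, S2 is discarded.
            pending.pop(1)
        else:
            # Case 3: S1 fully covers S2, S1 is discarded.
            pending.pop(0)
    result.extend(pending)  # the last remaining mass shift
    return result
```

In both overlap cases the mass shift with the larger right boundary is discarded, which is the usual greedy choice for keeping as many non-overlapping intervals as possible.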

S-2: Extracting PTMs from UNIMOD
All PTMs in the UNIMOD database (version 07/18/2023) [45] were downloaded in text format. A Python script in PTM-TBA was used to extract the UNIMOD ID, PSI-MS name, monoisotopic mass, modified amino acid residues, and modification positions (N-terminal, C-terminal, or any) of each UNIMOD PTM from the downloaded text file.

Tables

Table S1. Parameter settings of MS-Fragger.
Figures
Figure S1: A greedy algorithm for removing duplicated mass shifts
Figure S2: A histogram of mass shifts reported by MS-Fragger (round 2) from the first replicate of the SW480 bottom-up MS data in the range [-500, 500] Da
Figure S3: Comparison of high-frequency PTMs identified by MS-Fragger and MetaMorpheus from the first replicate of the SW480 bottom-up MS data
Figure S4: A histogram of mass shifts reported by TopPIC from the first replicate of the SW480 top-down MS data in the range [-500, 500] Da
Figure S5: Comparison of mass shifts identified and verified by the three replicates of the SW480 top-down and bottom-up MS data
Figure S6: Comparison of NTA, phosphorylation, and methylation sites identified from the first replicate of the SW480 top-down MS data and verified by UniProt and dbPTM annotations
Figure S7: The distribution of protein sequence coverage of peptides identified by MS-Fragger (round 1) from the first replicate of the SW480 bottom-up MS data

Tables
Table S1: Parameter settings of MS-Fragger
Table S2: The complete list of modifications used for G-PTM-D (Table_S2.xlsx)

Figures

Figure S1. A greedy algorithm for removing duplicated mass shifts.

Figure S3. Comparison of high-frequency PTMs identified by MS-Fragger and MetaMorpheus from the first replicate of the SW480 bottom-up MS data. Aminoethylbenzenesulfonylation (AEBS) is not included in the comparison because it was not a variable PTM in the database search of MetaMorpheus.

Figure S4. A histogram of mass shifts reported by TopPIC from the first replicate of the SW480 top-down MS data in the range [-500, 500] Da.

Figure S5. Comparison of mass shifts identified and verified by the three replicates of the SW480 top-down and bottom-up MS data. (a) Mass shifts identified from bottom-up MS data; (b) mass shifts identified from top-down MS data; (c) mass shifts identified by top-down MS data and verified by bottom-up MS data.

Figure S6. Comparison of NTA, phosphorylation, and methylation sites identified from the first replicate of the SW480 top-down MS data and verified by UniProt and dbPTM annotations: (a) NTA, (b) phosphorylation, and (c) methylation.

Figure S7. The distribution of protein sequence coverage of peptides identified by MS-Fragger (round 1) from the first replicate of the SW480 bottom-up MS data.

Table S3: Parameter settings of MetaMorpheus
Table S4: Parameter settings of MaxQuant
Table S5: Parameter settings of TopFD
Table S6: Parameter settings of TopPIC
Table S7: Parameter settings of ProMex
Table S8: Parameter settings of MSPathFinder
Table S9: High-frequency PTMs reported by MS-Fragger (round 1) from the first replicate of the SW480 bottom-up MS data
Table S10: Comparison of MS-Fragger, MetaMorpheus, and MaxQuant for mass shift identification using the first replicate of the SW480 bottom-up MS data
Table S11: Running times of software tools and matching functions in PTM-TBA for analyzing the first replicate of the SW480 data
Table S12: The complete list of mass shifts identified from the first replicate of the SW480 top-down MS data and verified by the first replicate of the SW480 bottom-up MS data and annotations (Table_S12.xlsx)
Table S13: Comparison of several combinations of software tools for mass shift identification and verification using the first replicate of the SW480 data
Table S14: Numbers of annotated PTM sites downloaded from the dbPTM database and PTM sites identified from the first replicate of the SW480 top-down MS data and verified by dbPTM annotations
Table S15: The complete list of mass shifts identified from the first replicate of the SW480 top-down MS data and verified by the first replicate of the SW480 bottom-up MS data and UniProt annotations using all PTMs in the UNIMOD database (Table_S15.xlsx)
Table S16: The complete list of mass shifts identified from the Jurkat top-down MS data and verified by the Jurkat bottom-up MS data and UniProt annotations (Table_S16.xlsx)

Table S9. High-frequency PTMs reported by MS-Fragger (round 1) from the first replicate of the SW480 bottom-up MS data. Commonly modified residues and uncommonly modified residues are obtained from the UNIMOD database.

Table S10. Comparison of MS-Fragger, MetaMorpheus, and MaxQuant for mass shift identification using the first replicate of the SW480 bottom-up MS data.

Table S11. Running times of software tools and matching functions in PTM-TBA for analyzing the first replicate of the SW480 data.

Table S13. Comparison of several combinations of software tools for mass shift identification and verification using the first replicate of the SW480 data.

Table S14. Numbers of annotated PTM sites downloaded from the dbPTM database and PTM sites identified from the first replicate of the SW480 top-down MS data and verified by dbPTM annotations.