University of Southern Denmark

Citrullinome of Porphyromonas gingivalis Outer Membrane Vesicles: Confident Identification of Citrullinated Peptides

Porphyromonas gingivalis is a key pathogen in chronic periodontitis and has recently been mechanistically linked to the development of rheumatoid arthritis via the activity of peptidyl arginine deiminase generating citrullinated epitopes in the periodontium. In this project the outer membrane vesicles (OMV) from P. gingivalis W83 wild-type (WT), a W83 knock-out mutant of peptidyl arginine deiminase (ΔPPAD), and a mutant strain expressing PPAD with the active site cysteine mutated to alanine (C351A) were analyzed using a two-dimensional HFBA-based separation system combined with LC-MS. For confident identification and validation of citrullinated peptides and proteins, high-resolution mass spectrometers and strict MS search criteria were used. This may have reduced the total number of identified citrullinations but increased the confidence of the validation. A new two-dimensional separation system strengthened the validation, and together with an in-house built program, Citrullia, it enabled fast and easy semi-automatic validation of citrullinated peptides. For the WT OMV we identified 78 citrullinated proteins with a total of 161 citrullination sites. Notably, in keeping with the mechanism of OMV formation, the majority (51 out of 78) of citrullinated proteins were predicted to be exported via the inner membrane and either to reside in the periplasm or to be translocated to the bacterial surface. Citrullinated surface proteins may contribute to the pathogenesis of rheumatoid arthritis. For the C351A-OMV a single citrullination site was found and no citrullinations were identified for the ΔPPAD-OMV, thus validating the unbiased character of our method of citrullinated peptide identification.


Introduction
Citrullination is a deimination of arginine, which results in the loss of a single nitrogen and hydrogen along with the addition of an oxygen, resulting in a mass shift of 0.984 Da and loss of a single charge.
Citrullination is a post-translational modification that can only occur on arginine residues, either at the N- or C-terminus of a peptide or internally.
Citrullination occurs in physiological and pathological conditions and is thought to serve a range of different functions. The human peptidyl arginine deiminases (PAD) 1, 2, 3, 4, and 6 exert different roles as a result of expression in different cellular environments. PAD1 citrullinates keratin and filaggrin, which is important for terminal differentiation of keratinocytes. 1,2 PAD2 citrullinates myelin basic protein, which is involved in myelin sheath formation. 3,4 PAD3 is involved in hair growth by citrullination of trichohyalin. 5,6 PAD4 has been assigned more diverse functions, from regulation of gene expression [7][8][9][10][11], through immune modulation 12, to auto-citrullination for regulation of citrullinated protein 13. Finally, the function of PAD6 is not completely understood due to the lack of a known substrate, but it is thought to have a role in reproduction 14.
Only a few species of bacteria, belonging to the genus Porphyromonas, have been found to express peptidyl arginine citrullinating enzymes, which are closely related, if not identical, at the amino acid sequence level. 15 The best characterized is Porphyromonas gingivalis peptidyl arginine deiminase (PPAD). The sequence identity between PPAD and the human PADs is low, approximately 30% 16; however, similar activity is preserved.
There are some differences, however: while the human PADs depend on calcium and target internal Arg residues, PPAD preferentially citrullinates C-terminal Arg in a calcium-independent manner.
Apart from the preference for C-terminal Arg citrullination, no specificity with respect to preceding residues has been identified for PPAD. Conversely, a study on PAD2 and PAD4 showed very broad specificity, with PAD2 favoring Tyr in the +3 position (Assohou-Luy et al. 17 ).
Citrullinations have been found to contribute to the pathogenicity of various diseases including rheumatoid arthritis (RA) and periodontitis. P. gingivalis, while absent or present at a low level in the dental biofilm of periodontally healthy subjects, occurs in high numbers in the mouth of periodontitis patients and is thought to be one of the primary causes of periodontitis. [18][19][20] Apart from PPAD, P. gingivalis expresses the Arg-specific gingipains RgpA and RgpB 16, which are important enzymes for citrullination, as they generate peptides and protein fragments with C-terminal Arg that are immediate substrates for modification by PPAD 21. In line with the concerted action of Rgps and PPAD, the Rgp-null mutant shows very little citrullination in comparison to the parental P. gingivalis W83 strain. Nevertheless, PPAD can modify internal Arg residues 22,23, but this reaction occurs at a rate a thousand times slower than citrullination of C-terminal Arg 24. This is in accordance with the topology of the substrate binding site, which is shaped to accommodate Arg at the C-terminus with no room for an extended peptide chain. 25
Several findings implicate PPAD as an important virulence factor of P. gingivalis. Through citrullination of C-terminal residues in epidermal growth factor 23 and C5a anaphylatoxin 26 the enzyme can contribute to periodontal tissue damage and attenuation of innate immune responses, respectively. Furthermore, the P. gingivalis citrullinome modulates neutrophil activity 27, constrains P. gingivalis biofilm development 28, affects the epithelial cell transcriptome 29, contributes to formation of a dual-species biofilm with the opportunistic fungus Candida 30, and is responsible for stimulation of prostaglandin E2 (PGE2) secretion by gingival fibroblasts. 31 The latter activity can be directly linked to periodontitis and RA pathogenicity, as PGE2 promotes bone resorption. Apart from that, C-terminally citrullinated peptides generated by the concerted action of Rgp and PPAD are considered pivotal in breaking immunotolerance, leading to production of specific anti-citrullinated protein antibodies (ACPA) directly responsible for development of RA. 32 This theory fits well with the findings of heightened levels of ACPAs in periodontitis and RA patients. 19,20,[33][34][35][36] Likewise, it is supported by data from animal models of RA and P. gingivalis infection in which RA severity is dependent on PPAD expression. 32 In this context, determination of the P. gingivalis citrullinome is a very important but challenging task.

The detection of citrullinations started with a color development reagent assay (COLDER assay) according to Clancy et al. 37, which depends on chemical derivatization of the urea group of citrulline. 38 This method is mostly used for in vitro assays, due to its poor sensitivity and its need for large amounts of citrullinated material. 37 Antibody-based methods also depend on chemical derivatization; the first anti-citrulline assay was developed by Senshu et al. 39 These antibody-based assays have a major disadvantage, as they have been found to cross-react with carbamylation, a chemical modification of lysine into homocitrulline. 40 Currently the most promising technique for the identification of citrullinations with regard to sensitivity and specificity is mass spectrometry (MS).
Developments in MS-based protein research during the last decades have made the investigation of clinical samples and whole-cell lysates possible, particularly when samples are fractionated prior to injection. Furthermore, less complex samples can be run directly, leading to a significant reduction of manual work prior to analysis. The major problems with MS-based detection of the single-Dalton mass shift are the possibility of misinterpreting it as a deamidation of asparagine or glutamine or as a wrongly picked isotope, the loss of a single charge, which gives rise to poor ionization and fragmentation, and the small retention time shift. We addressed these problems in the present paper by the use of heptafluorobutyric acid (HFBA) as the ion-pairing reagent during two-dimensional fractionation, as well as optimized mass spectrometric data acquisition and the development of specific software, Citrullia, designed for identification and validation of citrullinations.

Experimental Procedures
Bacterial fraction preparation
Cultures of P. gingivalis strain W83 and its isogenic mutants, C351A (with a point mutation of the catalytic cysteine residue, C351A, in PPAD rendering the enzyme catalytically inactive) and ΔPPAD (with the ppad gene deleted), were maintained on TSB agar plates with 5% defibrinated sheep blood and supplements: yeast extract (5 mg/ml), L-cysteine (0.5 mg/ml), hemin (5 µg/ml), and menadione (1 µg/ml). Liquid cultures were inoculated from 5-6 day old plates into liquid TSB medium with supplements and cultured for 18-20 hours at 37 °C in an anaerobic chamber. Cultures were then diluted to OD600 = 0.1 in fresh medium and cultured as before for 20-22 hours. Aldrithiol-4 (1.5 mM) was added to cultures immediately after incubation. Cultures were then centrifuged (7,500 rcf, 15 minutes, 4 °C), and the supernatant was collected and filtered through a 0.45 µm membrane filter. The filtrate was ultracentrifuged at 70,000 rcf for 2 hours at 4 °C. The collected sediment encompassing OMV was washed and then suspended in PBS with 1 mM TLCK by gentle sonication. The concentration of protein was determined using the Bradford method with bovine albumin as a standard.

Sample preparation
OMV were reduced by the addition of dithiothreitol (DTT) to a final concentration of 10 mM. The samples were then incubated for 30 min at 50-57°C, followed by alkylation with iodoacetamide (IAA) added to a final concentration of 24 mM and incubation in the dark for 20 min. Excess IAA was removed by treatment with DTT and proteins in the sample were digested by overnight incubation at 37°C with 2% w/w in-house methylated trypsin 41 .
For complete analysis without fractionation, the samples were micro purified essentially as described, 42 dried down and resuspended in 0.1% formic acid (FA).
For analysis by high performance liquid chromatography (HPLC) fractionation, each sample was dried down and resuspended in 0.05 % heptafluorobutyric acid (HFBA) prior to off-line separation.

Mass spectrometry
Samples were run on an EASY-nLC1000 Liquid Chromatography system (Thermo Fisher Scientific), using a 3 µm trap column (100 µm inner diameter, 5 µm Reprosilpur 120 C18, Dr. Maisch GmbH, Germany) and an 18 cm analytical column (75 µm inner diameter, 3 µm Reprosilpur 120 C18) coupled online to a Q Exactive HF Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Fisher Scientific). The methods applied on the mass spectrometer had the following settings in common: positive mode, an MS1 resolution of 120,000, AGC target of 3e6, maximum injection time of 100 ms, and a scan range of 300-1400 m/z. The common MS2 settings were: resolution of 30,000, AGC target of 1e6, isolation window of 0.8 m/z, and a fixed first mass of 110.0 m/z. Furthermore, peptides with charges ranging from +1 to +6 were included, while +7 and above were excluded along with isotopes.

Amino acid analysis
To determine protein amounts and composition, amino acid analysis (AAA) was applied essentially as described by Højrup 2015 43 . Samples of 2-4 µg protein were dried in small polypropylene tubes, the lids were punctured, and the tubes were placed in 25 ml glass vials along with 200 µl of 6 N HCl, 0.1 % phenol, 0.1 % 2-thioglycolic acid. The vials were closed with a MinInert valve (VICI) after being covered by argon and evacuated to <1 mBar pressure. After overnight hydrolysis at 110 °C, samples were dried, re-dissolved, and analyzed on a BioChrom 30+ amino acid analyzer using recommended conditions.

Data handling
Citrullination is the exchange of an amino group with oxygen resulting in a mass increase of 0.984 Da. This difference is readily identified by modern MS search engines, but because the mass difference is identical to that of the commonly occurring deamidation of asparagine and glutamine, and can further be confused with a wrongly picked isotope in the MS1 spectrum, we developed a program for extraction and display of spectra identified by the search engine as potentially containing citrullinated residues.
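The arithmetic behind these two sources of confusion can be sketched as follows. This is an illustration only and not part of Citrullia; the function names are hypothetical, the isotope masses are standard monoisotopic values, and the 10 ppm default simply mirrors the parent ion precision used in the search.

```python
# Monoisotopic masses (Da) of the relevant isotopes.
H = 1.00782503
C12 = 12.0
C13 = 13.00335484
N = 14.00307401
O = 15.99491462

# Citrullination (Arg -> Cit) and deamidation (Asn/Gln -> Asp/Glu) share the
# same net elemental change: loss of N and H, gain of O.
CITRULLINATION_SHIFT = O - N - H

# Picking the first 13C isotope peak instead of the monoisotopic peak shifts
# the apparent parent mass by the 13C - 12C spacing.
ISOTOPE_SPACING = C13 - C12


def ppm_difference(delta_da: float, peptide_mass: float) -> float:
    """Express a mass difference (Da) in ppm of the peptide mass."""
    return delta_da / peptide_mass * 1e6


def isotope_error_resolved(peptide_mass: float, tolerance_ppm: float = 10.0) -> bool:
    """True if a wrongly picked 13C peak falls outside the parent ion
    tolerance around the citrullinated mass (illustrative criterion)."""
    gap = ISOTOPE_SPACING - CITRULLINATION_SHIFT
    return ppm_difference(gap, peptide_mass) > tolerance_ppm


print(f"citrullination/deamidation shift: {CITRULLINATION_SHIFT:.4f} Da")
print(f"13C isotope spacing:              {ISOTOPE_SPACING:.4f} Da")
for mass in (800.0, 1500.0, 3000.0):
    print(mass, isotope_error_resolved(mass))
```

Under these assumptions the two shifts differ by only about 0.019 Da, so a mass tolerance alone cannot separate them for larger peptides, and deamidation cannot be separated by parent mass at all. This is why the MS1 isotope pattern and the MS2 fragment ions are inspected as well.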
In order to improve the validation of citrullinations and enable manual validation within a reasonable timeframe, we developed a program called Citrullia in C# version 7.2 for Windows. In addition, the .NET framework v. 4.7.2 was used in Visual Studio 2019 integrated development environment and the Metro Modern UI (Dennis Magno) 44 was used for visual user interface elements. As search engine we chose the publicly available X! Tandem program (www.thegpm.org) 45 .
As we deemed it necessary also to validate that the correct parent ion isotope had been picked, we used a new mass file format based on the Mascot Generic File format (mgf) but extended with ms1 information.
The format named mgx (Mascot Generic eXtended) has been developed by MassAI Bioinformatics (www.massai.dk) and the mgx conversion program MGF Filter can be freely downloaded from www.massai.dk/download.html. The mgx format contains the MS1 level information interspersed with the MS2 level data, but each spectrum is recognized by a new keyword "MSLEVEL=". These files can easily be separated in ms1 and ms2 information for generating the standard mgf format files suitable for standard search engines.
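A minimal sketch of such a separation is given below. The exact mgx layout is not specified here, so the sketch assumes that each spectrum is delimited by BEGIN IONS / END IONS as in mgf, that the level is given on a line such as "MSLEVEL=1" or "MSLEVEL=2", and that blocks without the keyword are MS2 spectra. The function name split_mgx is hypothetical and this is not the MGF Filter implementation.

```python
from typing import Iterable, List, Tuple


def split_mgx(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split mgx-style text into MS1 and MS2 spectrum blocks.

    Assumed layout (not confirmed by the format description): mgf-like
    BEGIN IONS / END IONS blocks carrying an extra MSLEVEL=<n> line.
    """
    ms1: List[str] = []
    ms2: List[str] = []
    block: List[str] = []
    level = None
    in_block = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == "BEGIN IONS":
            block, level, in_block = [line], None, True
        elif in_block:
            if line.startswith("MSLEVEL="):
                # Remember the level but drop the keyword, so that the MS2
                # output only contains standard mgf lines.
                level = int(line.split("=", 1)[1])
                continue
            block.append(line)
            if line.strip() == "END IONS":
                (ms1 if level == 1 else ms2).extend(block)
                in_block = False
    return ms1, ms2


example = [
    "BEGIN IONS", "MSLEVEL=1", "500.25 1000", "END IONS",
    "BEGIN IONS", "MSLEVEL=2", "PEPMASS=500.25", "175.12 40", "END IONS",
]
ms1_lines, ms2_lines = split_mgx(example)
print(len(ms1_lines), len(ms2_lines))
```

Dropping the MSLEVEL line from the MS2 output is a design choice of this sketch, made so that the result can be passed to a search engine that expects plain mgf.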
X! Tandem was called with the following parameters: parent ion precision: 10 ppm; MS2 precision: 0.02 Da; enzyme specificity: cleavage after Lys and Arg, no cleavage before Pro; number of missed cleavages: ≤ 2.
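To make these search settings concrete, the following sketch applies the same cleavage rule and parent ion tolerance in plain Python. It only illustrates the rules and is not how X! Tandem implements them; the function names and the example sequence are hypothetical. Splitting on a zero-width pattern requires Python 3.7 or later.

```python
import re
from typing import List


def tryptic_peptides(protein: str, max_missed: int = 2) -> List[str]:
    """In-silico digest: cleave after K or R, but not before P,
    allowing up to max_missed missed cleavages."""
    # Zero-width split: position preceded by K/R and not followed by P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def within_ppm(observed: float, theoretical: float, tol_ppm: float = 10.0) -> bool:
    """Check a parent mass against a theoretical mass with a ppm tolerance."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


# Hypothetical sequence: the R-P bond is not cleaved.
print(tryptic_peptides("AKRPGKMR", max_missed=0))
print(tryptic_peptides("AKRPGKMR", max_missed=2))
print(within_ppm(1000.005, 1000.0), within_ppm(1000.02, 1000.0))
```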

Citrullia
The citrullination of an arginine residue results in a mass change of +0.9840 Da. This change can be readily identified by modern mass spectrometers, but as pointed out by Küster and co-workers 46 identification is fraught with danger of misinterpretation. As we thus anticipated that all potential citrullinated peptides had to be manually validated, we developed a program, Citrullia, for fast identification and easy manual validation. For fast and easy validation, the data needed are presented within a single window containing filename, retention time (RT), sequence, charge state, e-value from the X! Tandem mass search, parent mass, accession number, table of determined ions, and the MS1 and MS2 spectra. It further provides information on the isotopic distribution of the MS1 peak, neutral losses, immonium ions, and an elution position for both the first- (HFBA) and the second-dimension (FA) separation. Furthermore, within this single window, each spectrum can be marked as validated. All the validated spectra are saved into a list, which can be extracted as one single list. This makes the validation a relatively straightforward process, as a researcher can quickly scroll through all potential candidates and mark them as validated or non-validated. Citrullia thereby ensures identification and semi-automatic validation of citrullinated peptides.
Raw files were first converted into mzXML using MSConvert (part of the ProteoWizard package), with peak picking as the only filter, prior to conversion into the mgx file format. When loaded into Citrullia, the MS1 and MS2 data were separated and the MS2 data saved in the standard mgf file format. Each MS2 data file was then searched individually using the X! Tandem search engine, with all parameters set by Citrullia. The 26-30 result files from a given experiment were loaded into Citrullia, and a multiple path search was performed (Fig 1). Initially, peptides identified as citrullinated were extracted, and the entire X! Tandem run was searched for matching peptides with an MS1 mass difference of -0.984 Da. If paired peptides were found, they were forwarded for validation. A second run was then performed using arginine-containing peptides, which were searched for potential citrulline-containing peptides using a mass difference of +0.984 Da. For validation, matching citrullinated and non-citrullinated peptides were displayed with one spectrum mirrored. This enables easy comparison and visualization of mass shifts and differences in the fragmentation pattern. Validated peptides were then marked as such. Finally, non-matched citrullinated peptides were also validated individually but marked as singles.
The two main criteria for validation were:
1) Citrullinated arginine residues had to be delineated by fragment ions in the MS2 spectra, in order to distinguish them unambiguously from potentially deamidated asparagine and glutamine residues.
2) The correct monoisotopic ion had to have been picked for parent ion fragmentation.
Three additional criteria were used for the validation:
1) The fragmentation pattern had to agree with the charge localization in the peptide, e.g. a C-terminally citrullinated tryptic peptide will not have a C-terminally located positive charge, which results in a change in the fragmentation pattern (supplemental Fig. 4).
2) In reversed phase FA-based chromatography, a citrullinated peptide will show a delayed retention time relative to the non-citrullinated peptide. 47
3) In reversed phase HFBA-based chromatography, a citrullinated peptide will show the reversed (leading) retention time behavior. 48
Citrullinations were thus evaluated on MS1 specificity, fragment ions, fragmentation pattern, and retention time behavior in one or two dimensions. The user interface of Citrullia is presented in supplemental Fig. 1.
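The pairing step and the two retention criteria can be sketched as follows. Citrullia itself is written in C#, and its internal logic is not reproduced here; the class, field, and function names are hypothetical, the 10 ppm tolerance is taken from the search settings, and accepting elution in the same HFBA fraction as consistent is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple

CIT_SHIFT = 0.984016  # Da, mass increase from Arg to citrulline


@dataclass
class PeptideHit:
    sequence: str        # amino acid sequence without modifications
    mass: float          # neutral monoisotopic parent mass (Da)
    citrullinated: bool  # search engine assigned a citrullination
    hfba_fraction: int   # first-dimension (HFBA) fraction number
    fa_rt_min: float     # second-dimension (FA) retention time (min)


def find_pairs(hits: List[PeptideHit],
               tol_ppm: float = 10.0) -> List[Tuple[PeptideHit, PeptideHit]]:
    """Pair each citrullinated hit with the arginine form of the same
    sequence whose mass is lower by the citrullination shift."""
    cit_hits = [h for h in hits if h.citrullinated]
    arg_hits = [h for h in hits if not h.citrullinated]
    pairs = []
    for cit in cit_hits:
        for arg in arg_hits:
            if arg.sequence != cit.sequence:
                continue
            expected = arg.mass + CIT_SHIFT
            if abs(cit.mass - expected) / expected * 1e6 <= tol_ppm:
                pairs.append((cit, arg))
    return pairs


def retention_consistent(cit: PeptideHit, arg: PeptideHit) -> bool:
    """Delayed elution in FA and leading (or, as an assumption of this
    sketch, equal) fraction in HFBA for the citrullinated form."""
    return cit.fa_rt_min > arg.fa_rt_min and cit.hfba_fraction <= arg.hfba_fraction


# Hypothetical values for illustration only.
arg_form = PeptideHit("PEPTIDER", 1000.0, False, 20, 12.0)
cit_form = PeptideHit("PEPTIDER", 1000.0 + CIT_SHIFT, True, 18, 12.8)
for cit, arg in find_pairs([arg_form, cit_form]):
    print(cit.sequence, retention_consistent(cit, arg))
```

Hits that find no partner would correspond to the singles described above and still require validation on fragment ions and the MS1 isotope pattern.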
Identification of citrullination in the OMV of P. gingivalis W83 strains (WT, C351A, and ΔPPAD)
In order to improve the detection of citrullinated peptides, we decided to evaluate a two-dimensional chromatographic system. This was based on off-line separation of peptides using HFBA as modifier in the first dimension and an on-line FA-based system, both using C18 reversed phase column material. In a publication by Mant et al. 48 it was reported that HFBA as a modifier bound strongly to positively charged residues, in particular arginines, causing arginine-containing peptides to be retained longer in a reversed phase system compared to the same peptides with citrulline(s). This was corroborated using synthetic peptides [results not shown]. We therefore decided to evaluate whether the HFBA modifier could be used in a standard two-dimensional proteomics setup and introduce a validation step for identification of citrullinated peptides. The first-dimension separation of P. gingivalis OMV tryptic peptides resulted in 26-30 fractions, each of which was subsequently analyzed using a 30-minute gradient in FA.
The isolated OMV from three different P. gingivalis strains were analyzed for citrullinations in technical duplicates. The preparations were from wild-type P. gingivalis W83 (WT), the W83 strain with a mutated PPAD, where the active cysteine was replaced with alanine (C351A), and a knockout of PPAD (ΔPPAD) in the W83 strain. Based on previous results on the P. gingivalis OMV secretome 49 Fig.2A-E) indicating that the difference in content of citrullinated peptides was not sufficient for a measurable difference in the TIC.
The majority of the identified proteins and peptides were found in fractions collected between 12 and 30 minutes of elution from the first-dimension separation. The profile of identified peptides in each fraction was reproducible, and each fraction showed similar identification numbers across the different samples and replicates. Citrullinated peptides were found across the entire fractionation, but the majority of citrullinations were identified in fractions collected between 18 and 26 min. Only a few fractions showed large standard deviations in peptide or protein identification, which could be explained by lack of material or too low resuspension buffer volume (Fig. 3).
For the majority of fractions, the number of peptides and proteins identified in each OMV fraction followed the order C351A > ΔPPAD > WT (Table 1). The number of unique peptides identified in C351A was approximately 15% higher than in ΔPPAD and twice that of WT. The number of identified unique proteins was almost identical for C351A and ΔPPAD, but a quarter less for WT (Fig. 4A). This may indicate that the 115 proteins determined in C351A are close to the total number measurable in the current dynamic range and MS settings. However, while the overlap between technical replicates was fairly high, the overlap of peptides between OMV derived from different strains was low, resulting in only 626 common peptides for all three strains (Fig. 4B). On the other hand, the overlap between identified proteins was relatively high (i.e. only 12-18 additional unique proteins identified per strain-fraction OMV), indicating that many of the unique identified proteins are of low abundance.
When searching for citrullinations, the vast majority were identified in the WT replicates, as only a single citrullination was identified in replicate 1, fraction 15 of the P. gingivalis PPAD C351A strain and none in ΔPPAD. Of the total of 52 unique citrullinations, 13 were found in pairs with the non-citrullinated peptide and the remainder were found as singles, i.e. the corresponding Arg-containing peptide could not be found. Based on the paired citrullinations, fraction shifts in the first-dimension HFBA separation can be calculated. Of the 13 paired citrullinations, 12 were found to elute in an earlier fraction and one was found to elute both earlier and later, depending on the replicate. The RT shift was also calculated for the second-dimension FA separation, where the average shift was 50 seconds of delayed elution for a citrullinated peptide. This is in accordance with previous observations.
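How such fraction shifts might be tabulated from paired identifications is sketched below. The function names and the example fraction numbers are hypothetical and do not reproduce the data reported here.

```python
from collections import Counter
from statistics import mean
from typing import List, Tuple


def classify_shift(cit_fraction: int, arg_fraction: int) -> str:
    """Classify the HFBA fraction of the citrullinated peptide relative
    to its arginine-containing partner."""
    if cit_fraction < arg_fraction:
        return "leading"
    if cit_fraction > arg_fraction:
        return "lagging"
    return "same"


def summarize(pairs: List[Tuple[int, int]]) -> Tuple[Counter, float]:
    """Count shift classes and return the mean shift in fractions
    (positive means the citrullinated form elutes earlier)."""
    counts = Counter(classify_shift(c, a) for c, a in pairs)
    mean_shift = mean(a - c for c, a in pairs)
    return counts, mean_shift


# Hypothetical (citrulline fraction, arginine fraction) pairs.
example_pairs = [(18, 20), (19, 20), (21, 21), (23, 22)]
counts, mean_shift = summarize(example_pairs)
print(dict(counts), mean_shift)
```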

Citrullination of OMV derived from WT P. gingivalis
Based on the results presented above, we decided to analyze the wild-type P. gingivalis W83 in more detail, in order to obtain the best characterization of the citrullinome and to evaluate our two-dimensional separation strategy. A single off-line HFBA separation of a tryptic digest of the WT resulted in 26 fractions; technical triplicate injections were performed on the mass spectrometer and were compared to a triplicate injection of the entire tryptic digest. For the off-line separation short 30-minute gradients were used, while a two-hour gradient was used when analyzing the complete mixture.
For the detailed analysis of the WT OMV, mass spectrometric data acquisition was optimized by decreasing the loading time (from 200 to 100 ms) and using a top 10 instead of a top 5 method, thus doubling the number of ions being fragmented and analyzed. For the long 120-minute gradient, two micrograms of WT OMV tryptic digest were micro-purified for each injection, as this was the estimated amount injected from the two-dimensional separation method over the central 20 fractions.
As C-terminally citrullinated peptides were expected mainly to be singly charged, these ion species were included for all sample runs. This inclusion had a greater impact for the long gradients, where 15 out of 95 identified citrullinated peptides were singly charged. For the two-dimensional system the same occurred in only 6 out of 206 peptides (supplemental Table 3). The reason for the lower number of singly charged peptides in the 2D system is likely that the same peptide has a higher probability of being observed multiple times. If a peptide also occurs as a doubly charged species, the resulting improved fragmentation would give a more favorable e-value, and the doubly charged spectrum would thereby be selected for further analysis. This can also be seen from the observation that in the long gradient 44% of identified citrullinated peptides were observed without a positively charged side-chain residue (primarily histidine, as tryptic digestion takes place after lysine and arginine), which was only the case for 29% in the two-dimensional method.
For the triplicate HFBA fractionated samples, between 1312 and 1452 unique peptides were identified (Table 2). This is approximately four times the number of peptides identified in the first duplicate analysis.
This resulted in the identification of 133 unique proteins, which is approximately 35% more than in the first experiment. The number of identified citrullinated peptides also increased, by approximately 48% compared to the number obtained in the first experiment. A total of 99 citrullinated peptides were identified for the triplicate, resulting in 56 proteins being positively identified as citrullinated. Of the citrullinated peptides, approximately half were identified along with the corresponding arginine-containing peptide (i.e. paired) while the rest were found as singles. The use of HFBA as modifier in the first dimension also showed the majority of the paired peptides (85) eluting in leading fractions with an average difference of 1.7 fractions. Eleven peptides eluted in the same fraction and 2 in lagging fractions. As fractions were collected for either 5 min (first and last fractions) or 1 min, an even better separation could probably be obtained by collecting smaller fractions (shorter time windows) in the first dimension, at the expense of additional MS runs and less material per run. A single citrullinated peptide (AGNHTVQGATR) was found in both lagging and leading fractions, and twice in the same fraction, indicating that it may be sticky and elute over a large part of the chromatogram. In the second dimension, the opposite retention behavior of the citrulline/arginine peptides was observed, with an average separation time of approximately one minute using a 30-minute gradient and four minutes for the long gradient. This shows that both the first- and second-dimension RT can be used for validation, if both the citrullinated and non-citrullinated peptide are present and have been matched.
In the second dimension, the fractionation showed peptide elution across a wide time period. For the first 20 out of the 30 total fractions, the peptides eluted over a period of ≈12 min (from 10 to 22 min) on a 30 min gradient, while the elution for the last 10 fractions shifted towards a later start of peptide elution (17 min). Pooling of fractions, which is often performed when using HILIC 57 or high pH 56, is therefore not possible. This increases the number of fractions to run in our HFBA system, but likely results in better separation of the sample. Furthermore, pooling of fractions would decrease the obtained resolution and obscure information on the exact fraction in which the peptide is eluting in the first dimension.
For the experiment using a long gradient, the total number of identified peptides and proteins was half of that observed in the two-dimensional separation (Table 2). However, the numbers of identified citrullinated peptides and proteins were very similar. A major difference was that only a small number of paired citrulline/arginine peptides were found. These were found with an average delayed elution of the citrulline peptide of 4 minutes in the 2 h FA-based gradient. The difference in retention compared to the two-dimensional separation is due to the four times longer gradient.
Large differences in the number of observed proteins were observed between the WT OMV and the mutant strains when analyzed by the two-dimensional method. In total 115 proteins were identified for C351A and 112 for ΔPPAD, while only 88 were identified for the WT. This is likely caused by many WT citrullinated peptides being singly charged and thus more difficult to detect due to less fragmentation relative to multiply charged ions. As protein content and composition in the samples, except for the presence of PPAD, should be identical, the number of lysine-terminated peptides should remain constant. However, the number of arginine/citrulline-terminated peptides may vary depending on the level of citrullination, thereby shifting the ratio. Although the number of identified peptides varied between the samples, the most striking difference is that the arginine/lysine-terminated peptide ratio was very low in WT OMV (0.18) while almost identical for C351A (0.82) and ΔPPAD (0.79) (Table 3). Similar ratios were observed for the WT OMV long gradient triplicates (0.16), while the two-dimensional method increased the ratio to 0.28 (Table 4), showing that optimization of the data acquisition and the two-dimensional separation had improved the depth of analysis regarding citrullination.
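A minimal sketch of how such a ratio can be computed from a list of identified peptide sequences is shown below. It assumes that peptides are given as one-letter sequences and that a C-terminal citrulline is still written as R in the search output; whether citrullinated peptides were counted this way in Tables 3 and 4 is not stated, so this is an assumption. The function name and example sequences are hypothetical.

```python
from typing import Iterable


def arg_lys_terminal_ratio(peptides: Iterable[str]) -> float:
    """Ratio of peptides ending in R to peptides ending in K.

    Peptides with any other C-terminal residue (e.g. the protein
    C-terminus) are ignored.
    """
    peptides = list(peptides)
    n_arg = sum(1 for p in peptides if p.endswith("R"))
    n_lys = sum(1 for p in peptides if p.endswith("K"))
    if n_lys == 0:
        raise ValueError("no lysine-terminated peptides in the list")
    return n_arg / n_lys


# Hypothetical peptide list: one Arg-terminated, two Lys-terminated.
print(arg_lys_terminal_ratio(["AAK", "GGK", "TTR", "SSH"]))
```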

Discussion
P. gingivalis secretes a very active peptidyl arginine deiminase (PPAD) along with Arg-specific gingipains (Rgps) using the type IX secretion system (T9SS). During translocation across the outer membrane, conserved C-terminal domains are cleaved off by a sortase (PorU), and an anionic lipopolysaccharide is attached, anchoring the enzymes to the P. gingivalis cell surface. 21,50 In this way PPAD and gingipains are in very close proximity, since they are major components of the surface electron dense layer composed of circa 30 proteins secreted via T9SS. 51 Apparently, in this environment Rgps generate C-terminal arginine residues on peptides and protein fragments, which are efficiently converted into citrulline by PPAD. 21 By budding of the outer membrane, the OMV coated with the surface electron dense layer are released into the environment, carrying some periplasmic proteins inside. 52 Formation of the OMV is not a random process and is driven by a mechanism that selectively sorts virulence factors into OMV while excluding abundant outer membrane proteins. 53 In this way OMVs are very important for host-pathogen interactions, extending the outreach of P. gingivalis virulence factors, including gingipains, PPAD, and citrullinated proteins, into periodontal tissues. Therefore, taking into account the pathogenic potency of citrullinated proteins, it was important to develop a technique allowing unbiased identification of citrullination sites and delineation of the citrullinome of the P. gingivalis OMV.
Since PPAD mainly citrullinates C-terminal arginines 54 , a large part of the citrullinated peptides in a tryptic digest were expected to be singly charged, due to the peptide size and citrullination. Therefore, we included singly charged ions in our MS method and database searches. Such an approach increases the number of possible citrullination sites but is usually neglected due to potential false-positive issues and inclusion of a large number of non-peptide ions. In the most comprehensive study to date of the human citrullinome, performed by Küster and coworkers 46 , the authors excluded all potential C-terminal citrullinations and thereby ignored an estimated 10-48% of their data, which could potentially contain true positives.

In order to characterize the citrullinome at the P. gingivalis surface and the importance of PPAD, we analyzed proteins located in the OMV of the wild type W83 strain (WT), the mutant with the active cysteine mutated to alanine (C351A), and the knockout mutant (ΔPPAD). The proteomic analysis of the OMV derived from WT and mutants of the W83 using a two-dimensional strategy, identified a combined set of 88 to 115 proteins displaying a common core of 69 proteins (Table 1, Fig. 4). The P. gingivalis OMV proteome has previously been determined by Reynolds and co-workers 51 , where they established a set of 151 proteins.
Comparing our set of identified proteins to theirs revealed only an approximately fifty percent overlap (supplemental Table 1). The reason for this may be differences in the purification of the OMVs or in the bacterial growth phase, the use of gel separation and slicing vs. 2D chromatography, or different settings during the database search. Setting our database search parameters similar to the ones presented in Veith et al. 51 increased our identifications to >650 proteins, but only increased the overlap to approximately 60 percent.
In a more recent paper 55 the same group using the same strain and preparation, but a slightly different gel and MS strategy, identified 181 proteins. Here 27 proteins from the earlier paper (18%) were not found, showing that the exact details for preparation and analysis are essential for comparison.
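Because an overlap percentage depends on which set is used as the denominator, a small sketch may help when comparing such protein lists. The identifiers below are hypothetical placeholders, and the function is an illustration, not the comparison procedure used in this study.

```python
def overlap_percent(ours: set, theirs: set) -> tuple:
    """Return the shared fraction as a percentage of each set in turn."""
    if not ours or not theirs:
        raise ValueError("both protein sets must be non-empty")
    shared = len(ours & theirs)
    return 100.0 * shared / len(ours), 100.0 * shared / len(theirs)


if __name__ == "__main__":
    # Hypothetical protein identifiers, for illustration only.
    ours = {"P1", "P2", "P3", "P4"}
    theirs = {"P3", "P4", "P5"}
    print(overlap_percent(ours, theirs))

    # The figure quoted in the text: 27 of 151 proteins not re-identified.
    print(round(100 * 27 / 151))
```

The last line reproduces the quoted 18% as 27 out of the 151 proteins of the earlier data set.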
We have mapped the identified citrullinated proteins to their predicted subcellular location within the bacteria using the presence of a signal peptide and the PG locus database. Three main groups were identified: lipoproteins (24%), T9SS-secreted proteins (28%), and others, including cytoplasmic and inner membrane proteins (48%) (supplemental Fig. 4). Comparing our distribution of the citrullinated proteins to the complete OMV proteome of Veith et al. 55 revealed large differences, particularly with regard to the cytoplasmic and inner membrane proteins. Whereas these proteins constituted only a few percent of the proteins recognized in the Veith et al. study, they were found in much larger numbers in our experiment.
Also, in contrast to the abundance of T9SS-secreted proteins (almost 2/3 of identified proteins) and the rarity of lipoproteins identified by Veith et al., we found approximately equal numbers of lipoproteins and T9SS cargo proteins in our study. As outlined above, the discrepancies may be caused by several analytical differences and are likely further skewed by comparing all found proteins against citrullinated proteins only. The T9SS-secreted proteins are likely more resistant to proteolysis by Rgps and citrullination by PPAD, while lipoproteins, cytoplasmic proteins, and inner membrane proteins are more sensitive.
In another publication by van Dijl and coworkers 49 , the authors aimed at determining the secreted citrullinome of P. gingivalis. This study showed a similar number of identified proteins (64) as a core set, with a relatively large difference in the secretome among five different strains of P. gingivalis. Note that the procedure used in this work did not exclusively purify the OMV, and apparently both soluble proteins and those associated with OMV were analyzed, which probably contributed to the observed differences among analyzed strains.
A striking difference between the data presented here and those of van Dijl and coworkers 49 is the number of citrullinated proteins identified with high confidence, which is 34 versus only 2, respectively. Furthermore, they found 11 citrullinated proteins in the ΔPPAD W83 strain, which originated in our laboratory. Although none of these peptides was identified with high confidence, finding such peptides in the PPAD-null strain undermines the claim that their approach identifies citrullination in an unbiased manner. Apparently, the van Dijl group either could not distinguish Gln/Asn deamidation from Arg deimination, picked up a wrong isotope peak, or had other technical problems precluding confident identification of citrullinated peptides. For the C351A PPAD mutant we identified only a single citrullination (supplemental Fig. 4), showing that replacement of the cysteine in the active site may not completely abolish activity but severely diminishes it.
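The two sources of confusion named above can be made concrete with a short calculation. Deimination of Arg and deamidation of Asn/Gln involve the same elemental change (loss of N and H, gain of O), so they cannot be separated by precursor mass alone and must be localized from fragment ions. A misassigned 13C isotope peak, by contrast, differs from the modification mass by roughly 19 mDa. The sketch below uses standard monoisotopic masses; the helper function and the m/z value are illustrative assumptions.

```python
# Monoisotopic masses in Da (standard values, rounded).
H = 1.00782503
N = 14.00307401
O = 15.99491462
C12 = 12.0
C13 = 13.00335484

# Arg -> Cit: net loss of N and H, gain of O.
CITRULLINATION = O - N - H
# Asn -> Asp or Gln -> Glu: the same elemental change, hence isobaric.
DEAMIDATION = O - N - H
# Shift from picking the first 13C isotope peak instead of the monoisotopic peak.
ISOTOPE_SHIFT = C13 - C12


def ppm_separation(mz: float, charge: int) -> float:
    """ppm gap between a +0.984 Da modification and a 13C isotope error."""
    delta_mz = (ISOTOPE_SHIFT - CITRULLINATION) / charge
    return delta_mz / mz * 1e6


if __name__ == "__main__":
    print(f"citrullination: {CITRULLINATION:.6f} Da")
    print(f"deamidation:    {DEAMIDATION:.6f} Da")
    print(f"13C - 12C:      {ISOTOPE_SHIFT:.6f} Da")
    # Illustrative singly charged precursor at m/z 1000.
    print(f"separation at m/z 1000, z=1: {ppm_separation(1000.0, 1):.1f} ppm")
```

For a singly charged ion at m/z 1000 the isotope error lies about 19 ppm away from the modification mass, which is why high-resolution precursor measurement helps with the isotope problem but not with deamidation.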
As the analysis of the various mutants revealed that the WT OMV was the only sample-type producing a  (Table 2). Although the total number of identified citrullinations did not vary much between the two methods, the confidence in the identification of the citrullinations differed. A major difference was, however, that the two-dimensional system identified twice as many proteins as the simple gradient. This difference in identification efficiency between citrullinated and normal peptides may be caused by citrullinated peptides ionizing as singly charged species, which need a higher ion count in order to fragment sufficiently for identification. Although the vast majority of identified citrullinated peptides carried C-terminal citrullinations, we did identify a few peptides with internal citrullinations (spectra presented in supplemental Fig. 4.1 and 4.2). As internally citrullinated peptides generally ionize as doubly charged ions, with a resulting higher detection efficiency, their rarity supports the contention that PPAD has a strong to almost exclusive preference for C-terminal arginine residues.
While the identified citrullinated peptides varied somewhat between technical duplicates, the number of identified unique proteins varied surprisingly little (Fig. 5 panel A to F). In contrast, the numbers of both identified citrullinated peptides and citrullinated proteins varied more (Fig. 5 panel G and H). The variation could be due to the ion-pairing reagent (HFBA) used in the first dimension, which enriches for specific peptides. However, the two-dimensional method doubled the number of identified peptides and proteins. Whether this increase was related to this particular two-dimensional system and could be reproduced by a high/low pH system 56 or similar was not tested. Even though the number of identified citrullinations was similar between our two systems, our two-dimensional separation clearly shows potential advantages with regard to validation of citrullinations and the depth of analysis.
On the other hand, the number of identified peptides and proteins was much lower for WT OMV than for C351A and ΔPPAD. As the amount of sample was identical for all samples, the difference is likely caused by citrullination of the WT peptides resulting in a lower detection efficiency. This assumption was verified by measuring the ratio between lysine-terminated and arginine-terminated peptides in the various samples, as shown in Tables 3 and 4. Taken together, these results indicate that numerous P. gingivalis proteins were cleaved by RgpA/B, but almost all generated C-terminal arginines were modified by PPAD to citrulline. As expected, however, OMV-associated proteins were not degraded; rather, a relatively small fraction of molecules was nicked at some sites, as SDS-gel electrophoresis of the various samples showed clear bands for intact proteins [results not shown].
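The ratio check described above can be sketched as follows. This is an illustrative calculation on hypothetical peptide lists, not the procedure behind Tables 3 and 4; the function name and the example sequences are assumptions made for this example.

```python
from collections import Counter


def c_terminal_ratio(peptides):
    """Return (n_K, n_R, K/R ratio) for a list of identified peptide sequences."""
    counts = Counter(p[-1] for p in peptides if p)
    n_k, n_r = counts["K"], counts["R"]
    # Avoid division by zero when no arginine-terminated peptide is present.
    ratio = n_k / n_r if n_r else float("inf")
    return n_k, n_r, ratio


if __name__ == "__main__":
    # Hypothetical identifications: a sample depleted in detectable
    # Arg-terminated peptides shows a higher K/R ratio.
    wt_like = ["AAK", "GGR", "TTK", "SSK"]
    mutant_like = ["AAK", "GGR", "TTR", "SSK"]
    print(c_terminal_ratio(wt_like))
    print(c_terminal_ratio(mutant_like))
```

A higher lysine-to-arginine ratio in the WT sample than in the mutants would be consistent with arginine-terminated peptides being converted to poorly detected citrullinated species.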
The present results clearly show that the limitation in the analysis of C-terminally citrullinated peptides lies in their detection by mass spectrometry, as most of these peptides ionize as singly charged species. A number of ways to alleviate this can be suggested. The most straightforward way would be to increase the net charge of the peptides, either at the N-terminus (e.g. by TMT labeling 58 ) or at the C-terminus (e.g. using techniques from C-terminomics 59 ). A purely MS-based approach could be to increase the general charge by supercharging 60 or to use a faster MS instrument in combination with a decision tree for optimal data acquisition of singly charged species. A biochemical way of increasing the charge could be to digest with endopeptidase Lys-N, which will generate N-terminal lysines at the expense of larger peptides, which may be more difficult to identify. An additional complexity arises when the sample has a low level of citrullination, which is often the case in mammalian citrullination analysis. For these analyses the sample has to be enriched. Here a reaction with diols 61 is a possibility, particularly when the diol is coupled to a biotin group, which enables isolation with streptavidin 62 . This method has not been widely used, probably because of the difficult synthesis of the reagent or problems with the identification. Thus, there is ample room for improvement at different levels of the analysis, from MS method optimization to sample preparation.
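The Lys-N suggestion above can be illustrated with a minimal in silico digestion. Lys-N cleaves on the N-terminal side of lysine, so each resulting peptide (except possibly the first) starts with a lysine and therefore carries two amines at its N-terminus. The sketch assumes complete cleavage with no missed sites, and the protein sequence is hypothetical.

```python
def lys_n_digest(protein: str) -> list[str]:
    """Cleave before every lysine, assuming no missed cleavages."""
    if not protein:
        return []
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        # Cut immediately before a lysine, unless it is the first residue
        # of the current peptide.
        if aa == "K" and i > start:
            peptides.append(protein[start:i])
            start = i
    peptides.append(protein[start:])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence, for illustration only.
    print(lys_n_digest("MAGKTLRVKSER"))
```

In this example the internal peptides begin with lysine and are longer on average than tryptic peptides would be, because arginine no longer defines a cleavage site, which matches the trade-off noted in the text.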

Conclusion
The citrullinations identified and validated by our method are characterized by high overall confidence owing to the manual validation steps and the absence of data exclusion prior to analysis. One drawback is the large amount of data that must be handled manually. This clear downside of the method was mitigated by use of the in- Interestingly, however, only one of them was mapped to the native Arg occurring at the C-terminus of circa 200 proteins encoded in the P. gingivalis genome (Suppl. Table 3). This finding confirms our contention that Rgps and PPAD work in concert in the generation of C-terminally citrullinated peptides derived from both bacterial and host proteins. Some of the modified peptides/proteins apparently contribute to the autoimmune response leading to RA 32 , others stimulate the PGE2 synthesis pathway 31 , affect inter- and intra-microbial interactions in biofilm 27,29 , and finally modify responses of neutrophils 27 and gingival epithelial cells. 29 Recognizing which citrullinated proteins/peptides are responsible for which activity is a challenge, and the method described in this report is a first step towards understanding the pathobiological meaning of the P. gingivalis citrullinome.