Human Immunoglobulin Heavy Gamma Chain Polymorphisms: Molecular Confirmation Of Proteomic Assessment*

Immunoglobulin G (IgG) proteins are known for the huge diversity of the variable domains of their heavy and light chains, which protects each individual against foreign antigens. IgG also harbor specific polymorphisms concentrated in the CH2 and CH3-CHS constant regions located on the Fc fragment of their heavy chains. This individual particularity relies on only a few amino acids, some of which make accurate sequence determination a challenge for mass spectrometry-based techniques. The purpose of this study was to provide a molecular validation of proteomic results by sequencing the encoding DNA fragments. It was performed using ten individual samples (DNA and sera) selected on the basis of their Gm (gamma marker) allotype polymorphism in order to cover the main immunoglobulin heavy gamma (IGHG) gene diversity. Gm allotypes, which reflect part of this diversity, were determined by a serological method. The IGH locus comprises four functional IGHG genes totaling 34 alleles and encoding the four IgG subclasses. The genomic study focused on the nucleotide polymorphism of the CH2 and CH3-CHS exons and of the intron. Despite strong sequence identity, four pairs of gene-specific amplification primers could be designed. Additional primers were identified to perform the subsequent sequencing. The nucleotide sequences obtained were first assigned to a specific IGHG gene, and IGHG alleles were then deduced using an in-house decision tree for reading the nucleotide sequences. IGHG amino acid (AA) alleles were determined by mass spectrometry. The alleles identified by proteomics and those deduced from genomics were identical in 95% of cases. These results validate the proteomic approach, which could be used for diagnostic purposes, notably for the differential detection of maternal and child IGHG when congenital infection is suspected.

The human immune response mediated by the antibodies relies essentially on IgG, subdivided into four subclasses IgG1, IgG2, IgG3, and IgG4, ordered by decreasing concentrations in the circulating blood (1). The specificity of this immune response is ensured by the immense diversity of the repertoire of antigenic recognition carried by the paratopes, i.e. the variable domains of the heavy and light chains (2). The immunoglobulin (IG) heavy gamma chains exhibit polymorphisms, mainly localized on the CH2 and CH3-CHS regions of the Fc fragment. This diversity relies on the number of heavy gamma (IGHG) nucleotide substitutions and amino acid (AA) changes listed in IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org) (3). The polymorphism of the IG gamma chains is both isotypic (there are four functional IGHG genes) and allelic. To date 34 IMGT alleles (5 IGHG1, 6 IGHG2, 19 IGHG3, and 4 IGHG4) are identified, which correspond to 25 alleles with amino acid changes in the coding regions or IMGT AA alleles (3 IGHG1, 4 IGHG2, 15 IGHG3, and 3 IGHG4 IMGT AA alleles, respectively) (4,5).
Several of the gamma chain polymorphisms are genetic variants detected serologically (allotypes), and the combinations of these gamma markers (Gm) carried by the gamma1, gamma2, and gamma3 chains constitute the G1m, G2m, and G3m alleles, respectively (4). The peptide diversity is subtle: it rests on only a few amino acid changes, some of which are very close in mass and therefore difficult to distinguish by mass measurement.
Taking the IGHG polymorphism into account may be important in several cases, listed here non-exhaustively. First, the use of monoclonal antibodies and related products as therapeutic agents is growing rapidly in disease areas such as cancer, rheumatoid arthritis, and Alzheimer's disease (6-8). Second, the binding affinity of the IgG Fc to the FCGRT (neonatal Fc receptor, FcRn) has been observed to depend on some conserved, but also some polymorphic, AA of the Fc localized at the CH2-CH3 domain interface (9). Indeed, apart from its role in transferring maternal IgG from mother to fetus via the placenta, the FCGRT contributes to extending the IgG half-life in serum (10), and this aspect may be critical in the context of IgG-based therapeutics (11). Last but not least, knowledge and use of the in vivo IGHG polymorphism may substantially improve the biological diagnosis of certain congenital pathologies in neonates. Indeed, the serologic detection of IgG neosynthesized by the fetus is difficult when congenital infection is suspected, because of the systemic transfer of maternal IgG that occurs in utero across the placenta, resulting in IgG levels in full-term newborns at delivery of about 90% of the maternal serum level (12). This is notably the case for parasitic infections such as congenital toxoplasmosis and congenital Chagas disease (13), for which combinations of parasitological, molecular or immunological tests are required to confirm the infection status (14,15). As a proof of concept, a proteomic approach exploiting the individual IGHG3 AA polymorphism has been patented in our laboratory. It aims to distinguish maternal from fetal total IgG in a newborn's blood sample by detecting, with bottom-up mass spectrometry, proteotypic peptides that can be assigned to IGHG3 alleles attributable to IgG3 of either maternal or fetal origin (16,17).
However, the strong sequence homogeneity between the different IGHG AA alleles requires strict verification of the results obtained by mass spectrometry (MS). For this reason, the aim of the present study was to provide a molecular confirmation of IGHG AA alleles determined by bottom-up mass spectrometry. A panel of ten plasma samples was selected from a previous study on the immunogenetics of malaria performed in Benin, on the basis of the inter-individual diversity of their Gm alleles (4), deduced from the phenotypic characterization of their Gm allotypes using a serological technique (18). In parallel, the corresponding genomic DNA was extracted, and the gene portions comprising the polymorphic CH2 and CH3-CHS exons of the IGHG1, IGHG2, IGHG3, and IGHG4 genes were amplified and then sequenced in order to determine individual IGHG alleles. This study made it possible to assess the degree of concordance between the serological, proteomic and genomic determinations performed for each sample. Correlations have already been established between Gm alleles and IGHG AA alleles (4) as well as between G3m alleles deduced from serology and IGHG3 nucleotide alleles determined by genomic sequencing (19). The present study is the first to validate IGHG alleles identified by MS using DNA sequencing on samples selected on the basis of their Gm allele diversity, itself deduced from the serological determination of Gm allotypes. Beyond the originality of this two-way comparison of results, this work aims to reinforce the need to determine IGHG AA sequences unambiguously in view of the various therapeutic and/or diagnostic applications under development cited above.

EXPERIMENTAL PROCEDURES
Sample Collection-Plasma and corresponding DNA samples were obtained from a study on human genetic determinants of malaria that was performed in 2006-2007 in the south of Benin by the UMR 216 (18). This study was conducted among 155 children belonging mainly to the Fon ethnic group and was authorized by the institutional "Ethics Committee of the Faculté des Sciences de la Santé" (FSS) of the Université d'Abomey-Calavi (UAC) in Benin. For the purpose of the present study, samples from 10 children were selected on the basis of their Gm allotype polymorphism, determined by a serological method.
Children PA01, PA07, PA09, PA16, PA31, PA42, PA45, PA48 were asymptomatic carriers of the parasites responsible for malaria and were recruited in a primary school of Ouidah with the approval of the coordinating doctor of the sanitary zone and the inspector of education. For these schoolchildren, a collective written informed consent was obtained from the responsible person in charge of the Parents Association, after having dispensed oral information on the study to the school director, the teachers and the members of the Parents Association. The remaining two children had different clinical presentations of malaria, i.e. mild malaria attack (AS50) and severe malaria (NP49) and were recruited in the pediatric service of the National University Hospital of Cotonou. For these children, a written informed consent was obtained from their parents or guardians.
Blood was collected into 5 ml EDTA Vacutainer® tubes and, after centrifugation, plasma samples and the isolated peripheral buffy coat were frozen at -20°C. Genomic DNA was extracted from peripheral blood buffy coats using the QIAamp DNA Blood Mini Kit (Qiagen GmbH, Hilden, Germany) (20).
Serological Determination of G1m and G3m Allotypes-Gm allotypes for G1m (1,2,3,17) and G3m (5,6,10,11,13,14,15,16,21,24,28) determinants (4) were analyzed in plasma samples by a qualitative standard hemagglutination inhibition method (21). Whereas G3m28 is a marker of the gamma 3 chains in European and Asian populations, Gm28 is often present on the gamma 1 chains in African populations (22). In brief, human blood group O Rh (D) erythrocytes were coated with anti-Rh (D) antibodies of known Gm allotypes. Plasma sample and reagent monospecific anti-allotype antibodies were added. Plasma containing IgG with a particular Gm allotype inhibited hemagglutination by the corresponding reagent anti-allotype antibody, whereas plasma sample that was negative for this allotype did not.

Mass Spectrometry (MS) Analysis of IGHG Tryptic Peptides-Sample Preparation-For each sample, total IgG was purified by injecting 200 µl of plasma into a Protein G column (Protein G Sepharose HP SpinTrap, GE Healthcare, France), which presents high affinity for the Fc fragment of all IgG subclasses. Following the manufacturer's instructions, IgG binding was performed at neutral pH, elution was performed by lowering the pH, and the eluted material was collected in neutralization buffer to preserve the integrity of acid-labile IgG. Dithiothreitol (DTT, 20 mM final) was added to 47 µl of purified samples for 30 min at 56°C in order to disrupt disulfide bonds. Proteolysis was performed by incubation with trypsin (Promega, France, 10 ng/µl final) for 3 h at 37°C, and stopped with trifluoroacetic acid (TFA, Pierce, France, 0.5% final).
Bottom-up LC-MS/MS Analysis-Peptides were concentrated and separated by nano high-performance liquid chromatography (nHPLC) coupled to an Orbitrap mass spectrometer. For each sample, 4 µl were analyzed by LC-MS/MS. Analyses were performed using an RSLC Ultimate 3000 Rapid Separation liquid chromatographic system coupled to a hybrid Q Exactive mass spectrometer (Thermo Fisher Scientific). Briefly, peptides were loaded and washed on a C18 reverse phase precolumn (3 µm particle size, 100 Å pore size, 75 µm i.d., 2 cm length). The loading buffer contained 98% H2O, 2% acetonitrile (ACN) and 0.1% TFA. Peptides were then separated on a C18 reverse phase resin (2 µm particle size, 100 Å pore size, 75 µm i.d., 15 cm length) with a 35 min "effective" gradient from 99% A (0.1% formic acid and 100% H2O) to 40% B (80% ACN, 0.085% formic acid and 20% H2O).
The mass spectrometer acquired data throughout the elution process and operated in a data dependent scheme with full MS scans acquired in the orbitrap analyzer, followed by up to 15
Database Queries of Extracted Experimental Data-Three FASTA-formatted protein databases were used in combination in order to assign a majority of fragmentation spectra: (1) the Homo sapiens entries from the UniProt/SwissProt database release 2016-02 (20,273 sequences); (2) the 2014 IMGT® IGHG allele database (2,3,5) (IMGT Repertoire. Sections: Protein displays, Alignments of alleles, http://www.imgt.org) (60 sequences; 20,789 amino acids); and (3) the individual databases of IGHG CH2 and CH3-CHS sequences obtained from the molecular investigation described below for the 10 patients incorporated in this study and supplied in supplemental Table S1 (60 sequences; 13,004 amino acids). Peaklists extracted from MS/MS spectra were compared with the above databases using the Mascot search engine 2.5.1 (Matrix Science). The following settings were applied: mass tolerances were 4 ppm for precursors and 10 mmu for fragments; a significance threshold score corresponding to p < 0.05 was applied to filter identifications, and a minimum individual peptide Mascot score of 25 excluded poorly annotated spectra. Based on these criteria, a nontargeted (bottom-up) analysis of the samples was performed. In order to avoid possible ambiguities, the following restrictive conditions were applied for Mascot queries: trypsin proteolysis specificity without missed cleavage, methionine oxidation as the only modification allowed (as partial) and mass-to-charge (m/z) states with z > 2 or more. Under these conditions false positive rates were usually under 1%. Mascot searches resulted in protein groups sharing several peptides but showing specific (unique) peptides demonstrating their presence. Peptide information regarding the AA diversity of the IGHG1 to IGHG4 genes and their alleles was carefully collected.
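The two restrictive search conditions described above (fully tryptic peptides without missed cleavage, and a 4 ppm precursor tolerance) can be illustrated with a minimal sketch. It is not a reproduction of Mascot's internal logic: the function names are illustrative, the cleavage rule is the commonly used simplification (cut after K or R, except before P), and the test sequence is a toy concatenation of two peptides quoted in this article, not a real IGHG sequence.

```python
import re

PRECURSOR_TOLERANCE_PPM = 4.0  # precursor tolerance stated in the text


def tryptic_peptides(sequence: str) -> list[str]:
    """Fully tryptic peptides with no missed cleavage.

    Simplified rule (an assumption): cleave C-terminal to K or R,
    except when the next residue is P.
    """
    # Zero-width split at every position preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tolerance_ppm: float = PRECURSOR_TOLERANCE_PPM) -> bool:
    """True if the observed m/z lies within the ppm tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tolerance_ppm


# Toy fragment: two peptides from the text joined end to end, plus one residue.
toy_fragment = "GFYPSDIAVEWESSGQPENNYK" + "TTPPVLDSDGSFFLYSR" + "L"

if __name__ == "__main__":
    print(tryptic_peptides(toy_fragment))
    print(within_tolerance(1000.003, 1000.0))  # about 3 ppm
    print(within_tolerance(1000.005, 1000.0))  # about 5 ppm
```

Disallowing missed cleavages keeps the list of candidate peptides short, which is one reason the text can rely on a small set of discriminatory peptides per allele.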
Molecular Investigation of the IGHG Diversity-DNA Amplification-The nucleotide sequences of the 34 IGHG alleles, which cover the diversity of the four functional IGHG genes, were extracted from IMGT/GENE-DB (5) and Alignment of alleles, IMGT Repertoire® (3) (http://www.imgt.org) and aligned using the Multalin website (http://multalin.toulouse.inra.fr). Two polymorphic areas framing the CH2 and the CH3-CHS exons, respectively, allowed the design of 4 pairs of primers (IDT, Leuven, Belgium) for the amplification of each IGHG gene separately, as represented in Fig. 1 and supplemental Table S2.
DNA Sequencing-Sequencing of the PCR products was performed using each of the four pairs of amplification primers with the addition of one consensual pair of primers located inside the CH2 and CH3-CHS region (Fig. 1). A control sequencing of the same PCR products was performed using sequencing primers specific for each of the four IGHG genes and located either inside the CH2 exon, or in the intron between the CH2 and CH3-CHS exons. As recommended by the "Genome and Sequencing Platform" (Institut Cochin-Centre de Recherche, Paris, France), sample templates were prepared as follows: 800 ng of purified PCR product and 30 ng (4 pmoles) primers (forward or reverse) in a 15 µl final volume. Sequencing was performed at the Platform on a 3730XL DNA Analyzer (Applied Biosystems) and raw sequences were made available on the Platform website.
Sequence Analysis-Raw sequences were read and corrected with the "4peaks" free software (http://nucleobytes.com/index.php/4peaks). Contigs were then created using the "Sequencher® 5.0" software (Gene Codes Corporation, Ann Arbor, MI), allowing sequence chromatograms to be visualized and compared with the IGHG1*01, IGHG2*01, IGHG3*01, and IGHG4*01 IMGT® reference alleles (http://www.imgt.org) (5). Although the intron located between the CH2 and CH3-CHS exons was amplified and sequenced, the sequence analysis focused on the CH2 and CH3-CHS exons. A first analysis step consisted of checking, in addition to the observation of a single band on agarose gel, that the amplification products were IGHG gene specific. For this purpose, the presence of several specific codons located along the CH2 and CH3-CHS exons of each IGHG gene was verified, as summarized in supplemental Table S3. Once this was done, each IGHG allele could be assigned by moving successively from one polymorphic nucleotide to the next along the coding sequence covering the CH2 and CH3-CHS exons. This work was facilitated by a decision tree for reading the IGHG nucleotide sequences, represented in supplemental Table S4.
Proteogenomic Analysis-The IGHG nucleotide sequences resulting from the sequencing of amplified DNA fragments of all experimental samples were artificially spliced between the CH2 and CH3-CHS exons and translated using the converter software (http://didac.free.fr/seq/dna2pro.htm). This made it possible to create, for each sample, an experimental "sample database" to be imported into the Mascot software for a new peptide query aimed at detecting single amino acid variants (SAAV) unknown to date (26,27) and therefore not listed in the concatenated IMGT® IGHG allele and SwissProt database used in the first MS analysis. The peptide query was performed under the same conditions as previously used. For each sample of purified IgG, the analysis proceeded in two steps: a first query of IGHG peptides against each sample-specific database, followed by a second query in which the sample-specific databases were added to the IMGT® and SwissProt databases.
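The splice-and-translate step described above can be sketched as follows. This is a minimal illustration, not the converter software cited in the text: it assumes that the two exon sequences have already been trimmed so that their concatenation is in frame, it uses the standard genetic code, and the nucleotide strings and the entry name in the example are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence (unambiguous A/C/G/T only)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    # A KeyError here signals an ambiguous base (e.g. N or an IUPAC code).
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def spliced_protein(ch2_exon: str, ch3_chs_exon: str) -> str:
    """Join the two exons (intron removed) and translate the result."""
    return translate(ch2_exon + ch3_chs_exon)


def fasta_entry(name: str, protein: str, width: int = 60) -> str:
    """Format one protein as a FASTA record with fixed-width lines."""
    lines = [protein[i:i + width] for i in range(0, len(protein), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy exons, chosen only so that the output is easy to read.
    toy_ch2 = "GTGGTCAGCGTCCTCACC"   # translates to VVSVLT
    toy_ch3 = "GAGCCCCAGGTGTAC"      # translates to EPQVY
    print(fasta_entry("sample_toy_IGHG", spliced_protein(toy_ch2, toy_ch3)), end="")
```

A file built by concatenating such records, one per sequenced allele, is the kind of sample-specific FASTA database that a search engine can then be pointed at.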

RESULTS
Gm Alleles Deduced from Serological Determination of Gm Allotypes-Experimental samples originated from 7 boys and 3 girls aged from 4 to 10 years (mean age ± S.D. = 6.9 ± 1.6 years), all belonging to the Fon ethnic group, which prevails in South Benin, except one child (PA09) who belonged to the Yoruba ethnic group originating from Nigeria. These samples were selected out of a series of 155 samples for the presence in their plasma of IgG characterized by the four main G3m alleles (G3m5*, G3m6*, G3m15*, and G3m24*) that may be encountered among sub-Saharan African populations (28). Table I presents the results of the serological determination of G3m allotypes, from which G3m alleles were deduced. It appears that IgG from each plasma sample were representative of one particular G3m allele combination, in the homozygous or heterozygous state, Gm allotypes being co-dominantly expressed (4,19). In addition, all samples presented the G1m17,1 allele, known to be encoded by IGHG1*01, IGHG1*02, or IGHG1*05 (Gene table Homo sapiens IGHC, IMGT Repertoire) (http://www.imgt.org) (4). NP49 and PA31 samples were Gm28-positive with a plausible attribution of this allotype to the gamma 1 chain (IGHG1*05p, IGHG1*06p) (4) (Allotypes, IMGT Repertoire, http://www.imgt.org).
Table I footnotes: As Gm allotypes are encoded on chromosome 14 (14q32.3), two G3m alleles are expressed and are separated by a slash mark. The simplified G3m form contains the number of one representative allotype followed by *.
IGHG AA Alleles Deduced from Mass Spectrometry Analysis Using Usual Peptide Databases-As we routinely use the SwissProt Homo sapiens FASTA protein database, we quickly realized that it did not adequately cover the IGHG diversity. We therefore added the IMGT® database of all IGHG AA alleles. Using two FASTA sequence databases with overlapping sequences can be confusing. Indeed, although unique peptides were attributable to specific IGHG genes and alleles of each IG heavy chain, as listed in Table II, many peptides were shared between all alleles of a given IGHG gene. Because Mascot sorts resulting identifications according to the hits with the best matching set of peptides, only peptides comprised in the same sets of the Mascot analysis were considered and, by default, not those included in the subsets. This allowed exclusion of possible ambiguities corresponding to alleles with a lower probability of occurrence. Following these rules, two heterozygous IGHG allele pairings could be easily deduced. This was, for example, the case of IGHG3*03/IGHG3*13 for AS50 and PA16, where the unambiguous attribution of IGHG3*03 (based on the presence of the discriminatory R.WQEGNVFSCSVMHEALHNR.F peptide) led to the deduction of IGHG3*13 as the second IGHG3 allele (based on the presence of both K.GFYPSDIAVEWESSGQPENNYK.T and R.WQEGNIFSCSVMHEALHNR.F, issued from IGHG3*06*07*13 and IGHG3*13, respectively). In other cases, allele pairings could be partly deduced, such as IGHG2*06/IGHG2*01*03*04*05*06 for AS50, PA07, PA09, PA16, and PA42. For lack of sufficient coverage of discriminatory peptides, MS analysis did not allow the alleles to be defined for IGHG1 in any sample or for IGHG4 in a majority of them; however, for PA01, PA42, and PA45 the K.TTPPVLDSDGSFFLYSR.L peptide allowed IGHG4*03 to be excluded.
IGHG Alleles Deduced from Sequencing of the CH2 and CH3-CHS Region-Critical nucleotide positions listed in supplemental Table S4 allowed discrimination of the alleles assigned to each IGHG gene. The 10 samples were homozygous for IGHG1*02 (Alignment of alleles IGHG1, IMGT Repertoire, http://www.imgt.org), as demonstrated by the reading of the IGHG1 CH2 to CH3-CHS nucleotide sequences. Following the same strategy, the IGHG2*06 allele (Alignment of alleles IGHG2, IMGT Repertoire, http://www.imgt.org) could be deduced unambiguously for 9 out of 20 IGHG2 alleles (Table III). In 10 other cases, IGHG2*01*03*04*05 could be assigned to four types of alleles departing from the IGHG2*01 IMGT® reference allele by some silent nucleotide substitutions (all comprising at least CH2 t273>c, based on the IMGT® unique numbering (25)), which have not yet been described. In the last case, which concerned the PA01 sample, a nonsynonymous CH2 g274>c substitution (associated with two silent ones, CH2 t273>c and CH3 a243>g) generated a CH2 V92>L AA change defining, for the CH2 and CH3-CHS sequence, a putative new AA allele of IGHG2. Regarding IGHG3 (Alignment of alleles IGHG3, IMGT Repertoire, http://www.imgt.org), AA alleles could be directly deduced from the sequencing, and were, in order of decreasing frequency, IGHG3*01*04*05*10, IGHG3*03, IGHG3*13, and IGHG3*17 (Table IV). Lastly, for IGHG4 (Alignment of alleles IGHG4, IMGT Repertoire, http://www.imgt.org), the IGHG4*01 or IGHG4*04 AA alleles could be unambiguously assigned from the sequencing in 8 out of 20 cases (Table V). In nine other cases, two types of silent substitutions (including at least CH2 g3.4>a) led to the identification of putative alleles which have not yet been described, without impeding a common deduction of IGHG4*01*04. For the PA07 sample, two nonsynonymous substitutions (CH2 g271>a and a322>c) generated the AA changes CH2 V91>I and N108>H, respectively, defining for the CH2 and CH3-CHS sequence a putative new AA allele of IGHG4. An additional CH3 a32>g substitution corresponding to a CH3 Q11>R AA change occurred for one allele from both AS50 and PA16 samples, defining for the CH2 and CH3-CHS sequence an additional putative novel IGHG4 AA allele.
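The nucleotide substitutions and AA changes reported above can be cross-checked with simple codon arithmetic. The sketch below assumes that nucleotide positions count from 1 within each exon and that codon n spans nucleotides 3n-2 to 3n, an assumption that is consistent with every position pair quoted here. The reference codons themselves are not given in the text, so the code only enumerates the codons of the standard genetic code that are compatible with each reported change; all function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_of(nt_position: int) -> tuple[int, int]:
    """Map a 1-based nucleotide position to (codon number, 0-based offset in codon)."""
    return (nt_position - 1) // 3 + 1, (nt_position - 1) % 3


def compatible_codons(nt_position: int, ref_nt: str, alt_nt: str,
                      ref_aa: str, alt_aa: str) -> list[tuple[str, str]]:
    """List (reference codon, mutated codon) pairs that reproduce the AA change."""
    _, offset = codon_of(nt_position)
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != ref_aa or codon[offset] != ref_nt.upper():
            continue
        mutated = codon[:offset] + alt_nt.upper() + codon[offset + 1:]
        if CODON_TABLE[mutated] == alt_aa:
            hits.append((codon, mutated))
    return hits


# (exon, nucleotide position, ref nt, alt nt, ref AA, AA position, alt AA), from the text.
REPORTED = [
    ("IGHG2 CH2", 274, "g", "c", "V", 92, "L"),
    ("IGHG4 CH2", 271, "g", "a", "V", 91, "I"),
    ("IGHG4 CH2", 322, "a", "c", "N", 108, "H"),
    ("IGHG4 CH3", 32, "a", "g", "Q", 11, "R"),
]

if __name__ == "__main__":
    for exon, pos, ref, alt, ref_aa, aa_pos, alt_aa in REPORTED:
        codon_number, _ = codon_of(pos)
        pairs = compatible_codons(pos, ref, alt, ref_aa, alt_aa)
        print(exon, f"{ref}{pos}>{alt}", f"{ref_aa}{aa_pos}>{alt_aa}",
              "codon", codon_number, "matches" if codon_number == aa_pos else "differs",
              pairs)
```

Under these assumptions each nucleotide position falls in the codon bearing the reported AA number, and for V91>I only three of the four valine codons (GTT, GTC, GTA) are compatible, since GTG would mutate to ATG (methionine).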
Confrontation of Serological, Proteomic, and Genomic Approaches of IGHG Allele Deduction or Determination-IGHG1-Whatever the method employed, all samples were monomorphic for IGHG1 alleles. The genomic determination was the most accurate, leading to IGHG1*02, whereas serology identified G1m17,1 allotypes (which may correspond to either IGHG1*01 or IGHG1*02) and proteomics could not discriminate between IGHG1 alleles for lack of distinguishing peptide coverage. Contrary to the serological results, which indicated Gm28 carriage for NP49 and PA31, the combined CH3 g344, 115R and CH3 a347, 116Y positions corresponding to this marker were not found in any sample on either IGHG1 or IGHG3 alleles. However, it cannot be excluded that only one of the two alleles present per individual was sequenced and that the allele carrying Gm28 was missed in the amplification.
IGHG2-No serological determination was performed for IGHG2, but prevalent Gm haplotypes are characteristic of given populations and, because of the African origin of the individuals under study, an absence of the G2m23 allotype is the most plausible configuration, consistent with IGHG2*01*03*04*05*06 (4,28). Proteomics helped to restrict the possibilities with the assignment of IGHG2*06 combined with IGHG2*01*03*04*05*06 for AS50, PA07, PA09, PA16, and PA42. Interestingly, genomics further specified these results by attributing IGHG2*06 in the homozygous state to AS50, NP49, and PA42 and in the heterozygous state to PA07, PA09, and PA16. Moreover, it is worth noting that a new IGHG2 AA allele was suggested for PA01 on the basis of a nonsynonymous substitution.
IGHG3-For the much more polymorphic IGHG3 alleles, results from the different methods were challenging to reconcile. They are summarized in Table VI. Out of a total of 20 IGHG3 alleles deduced, the greatest number of discordances (n = 8) was recorded between the serological and genomic approaches, and 6 of them also appeared between serology and proteomics. Strikingly, only one discrepancy was noted between proteomics and genomics: it concerned the IGHG3*01*04*05*10 nucleotide allele, which was identified neither by serology (deduction of homozygous IGHG3*17) nor by proteomics (deduction of heterozygous IGHG3*03/IGHG3*17*18*19, as shown by the MS/MS fragmentation spectra of supplemental Fig. S1).
IGHG4-Regarding IGHG4 alleles, only proteomics and genomics could be compared: as for IGHG2 above, results were concordant, with greater accuracy afforded by genomic sequencing, which allowed identification of IGHG4*01, IGHG4*04, or IGHG4*01*04. Here again, two new IGHG4 AA alleles (one for AS50 and PA16 and another for PA07) resulting from three nonsynonymous substitutions were uncovered by both techniques.
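The IGHG3 discordance counts reported above (8, 6 and 1 out of 20 alleles) translate directly into pairwise concordance rates. The short sketch below only restates that arithmetic; the dictionary keys and the function name are illustrative.

```python
N_IGHG3_ALLELES = 20  # 10 samples, two IGHG3 alleles each

# Number of discordant alleles per pair of methods, as reported in the text.
DISCORDANCES = {
    "serology vs genomics": 8,
    "serology vs proteomics": 6,
    "proteomics vs genomics": 1,
}


def concordance_percent(n_discordant: int, n_total: int = N_IGHG3_ALLELES) -> float:
    """Percentage of alleles on which two methods agree."""
    return 100 * (n_total - n_discordant) / n_total


if __name__ == "__main__":
    for pair, n_discordant in DISCORDANCES.items():
        agree = N_IGHG3_ALLELES - n_discordant
        print(f"{pair}: {agree}/{N_IGHG3_ALLELES} = {concordance_percent(n_discordant):.0f}%")
```

The proteomics versus genomics figure, 19 out of 20, corresponds to the 95% agreement quoted in the abstract.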

Interest in Proteomic Determination of the IGHG Diversity
IGHG AA Alleles Deduced from Mass Spectrometry Analysis Using Sample-specific Databases-Molecular analysis allowed the identification of novel IGHG2 (6) (Table III and supplemental Table S5) and IGHG4 (4) (Table V and supplemental Table S6) nucleotide sequences; among them, 1 IGHG2 and 2 IGHG4 sequences were associated with a total of four AA changes. As the previous Mascot searches could only match known sequences, we compiled a sample-specific database in order to repeat the Mascot query and check for the presence of the four putative new tryptic peptides in the purified IgG samples. The same conditions as described earlier were used, except that query results considered sets, samesets but also subsets of peptides. The SwissProt Homo sapiens FASTA database from Expasy was retained to match all nonrelevant proteins that were co-purified, thus preventing an increase in false positive matches. The translated R.VVSVLTVLHQDWLNGK.E peptide (CH2 85.1-101, V92>L) was assigned to an IGHG2 subset by Mascot following analysis of purified IgG from the PA01 sample. This observation was striking, as this peptide is usually found on all IGHG1, all IGHG3 (except IGHG3*09) and all IGHG4 (except IGHG4*02) peptide alleles but never on IGHG2. Similarly, in the case of the AS50 and PA16 samples, the translated R.EPQVYTLPPSR.E peptide (CH3 1.1-11, Q11>R) was assigned to an IGHG4 set of peptides, whereas this peptide is usually exclusive to IGHG1, IGHG2, and IGHG3 peptide alleles. A potentially new IGHG4 peptide, K.VSHK.G (CH2 106-108, N108>H), never referenced among IGHG sequences, could not be detected by MS in the AS50, PA07 and PA16 samples because tryptic cleavage makes it too small. This was not the case for the translated R.VVSVLTILHQDWLNGK.E peptide (CH2 85.1-101, V91>I), never referenced among IGHG sequences, which Mascot observed and assigned to an IGHG4 set of peptides for purified IgG from both AS50 and PA16 samples, but not in PA07 where it was also expected.
In order to check the accuracy of these last results, a final analysis consisted, for each sample, of a Mascot query against the pooled IMGT®, SwissProt and sample-specific databases.
Table legend: amino acid (AA) EU numbering according to IMGT® (www.imgt.org) (23).

DISCUSSION
The present study was carried out to check, with another technique, the adequacy of bottom-up mass spectrometry for identifying the IGHG allelic diversity. To this end, nucleotide sequencing of the CH2 and CH3-CHS domains, which concentrate the allelic diversity of the Fc fragment of the IG heavy gamma chain, was undertaken on DNA samples from ten individuals, and the results were compared with those obtained in the corresponding sera by a hemagglutination inhibition method as well as in the corresponding purified IgG by bottom-up MS. For IGHG3, the highest concordance was found between proteomics under nonambiguous conditions and genomics (19 identical alleles/groups of alleles out of 20), whereas identical results were found in only 14 and 12 out of 20 alleles when comparing serology versus proteomics and serology versus genomics, respectively (Table VI). Moreover, the sequencing of the IGHG genes in these samples originating from Beninese individuals led to the identification of 6 IGHG2 and 4 IGHG4 nucleotide sequences not yet described. When translated, two of these sequences led to 2 putative IGHG peptides not yet described and assigned to IGHG4 by Mascot, namely R.VVSVLTILHQDWLNGK.E (CH2 85.1-101, V91>I) and K.VSHK.G (CH2 106-108, N108>H). By means of proteogenomics, R.VVSVLTILHQDWLNGK.E was identified in the genomically modified set peptide group of the Mascot analysis for 2 (AS50 and PA16) out of 3 samples where this peptide was expected. Regarding this last peptide, a sequence alignment performed by a BLAST search on the NCBI® database (National Center for Biotechnology Information, http://blast.ncbi.nlm.nih.gov/Blast.cgi) did not identify any referenced peptide giving 100% sequence coverage: whatever the aligned peptide, the amino acid at position 308 of CH2 85.1-101 is always V instead of I.
As the monoisotopic mass difference of 14.015650 Da between V and I is exactly that of a methylation (UniMod, http://www.unimod.org/, protein modifications for mass spectrometry), the possibility of an artifact was considered but immediately ruled out for two reasons: (1) valine is a neutral amino acid not subject to methylation and (2) in both AS50 and PA16 samples, the MS/MS fragmentation reported y and/or b fragments circumscribing exactly the residues of interest (supplemental Fig. S2). An experimental proof would consist of adding to the test sample, before injection, an identical synthetic peptide harboring the CH2 85.1-101, V91>I substitution, in order to compare the m/z and retention time values of the endogenous versus synthetic peptides. Similarly, deamidation of Asn (N) or Gln (Q) was not considered as a possible amino acid modification in the Mascot queries, to avoid the erroneous appearance of irrelevant allele sequences in the Mascot results. The sample preparation and handling methods must therefore avoid introducing such modifications (30,31). To summarize, proteomic and genomic results were highly concordant for all IGHG sequences and pointed to defined AA changes suggestive of new IGHG2 and/or IGHG4 peptide sequences. For IGHG3, genomic results always consolidated proteomic ones except in one case (Table VI). It concerned the NP49 sample, from a 6-year-old boy hospitalized at Cotonou (Benin) for cerebral malaria combined with clinically diagnosed anemia. As blood transfusion is commonly used in African developing countries to prevent worsening of the malaria pathology (32), it is plausible that the IGHG3*17/IGHG3*17*18*19 detected by serology/proteomics may originate from a donor's IgG3 circulating in the blood of the NP49 recipient. The other discordances recorded between IGHG3 alleles deduced by serology versus proteomics or genomics may be attributable to the following reasons.
First, IGHG3 alleles are encoded by codominant genes (4), but in heterozygous carriers it is plausible that one allele is expressed more abundantly than the other, as already shown for IGHG1 alleles (33), so that only this allele is detected by serology. This may be the case for the PA16 sample, where IGHG3*13 production may exceed that of IGHG3*03 (found by MS but not by hemagglutination inhibition). In other cases (PA01 and PA42), the serological attribution of IGHG3*17 seems unlikely because this allele is absent from both the MS and the molecular deductions. It must be kept in mind that IGHG3*17 results from the G3m10,11,13,15,27 combination of Gm allotypes and differs from IGHG3*01 and related alleles of G3m5* by the concomitant presence of G3m15 and absence of the G3m5 and G3m14 allotypes. It can be argued that the difficulty of obtaining well-characterized monoclonal antibody reagents may lead to unstable agglutinates (we used polyclonal reagents coming from blood donors) (34). Other possible explanations for the discrepancies between serological and proteomic/genomic results are the use of plasma (containing fibrinogen) rather than serum and the limited plasma volumes available (implying dilutions), both of which hampered optimal performance of the hemagglutination inhibition method. However, a possible depletion of IgG bearing particular IGHG3 alleles during IgG purification on the Protein G column can be excluded, as in no case among the results presented here (Table VI) was an IGHG3 allele found by both serology and genomics but not by proteomics.
The excellent correlation between proteomic and genomic results was partly inherent to the analysis settings used in Mascot, where removing dynamic parameters such as missed cleavages, methionine oxidation and especially deamidation of N or Q prevented a misclassification of alleles linked, for example, to the identification of the proteotypic peptide R.WQEGNIFSCSVMHEALHNR.F (peptide signature of IGHG3*13) instead of R.WQQGNIFSCSVMHEALHNR.F (peptide signature of IGHG3*01*02*04 to *12). Nonetheless, these precautions did not prevent mismatches of short peptide sequences (averaging 20 AA) during the probabilistic reconstitution by Mascot of the polypeptide (more than 200 AA) that covers the CH2 and CH3-CHS domains of the Fc fragment. For example, a misalignment by Mascot of K.PREEQYNSTYR.V and R.WQQGNIFSCSVMHEALHNHYTQK.S may lead to the identification of IGHG1*04, whereas the second peptide may also be relevant to the IGHG3*17 or IGHG3*18*19 alleles when associated with other short discriminatory peptides, all present in the mixture resulting from the trypsin digestion of purified IgG from the four IgG subclasses.
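The two mass coincidences discussed above can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the Mascot workflow used in the study. It assumes standard monoisotopic element masses and residue compositions; the reference peptide VVSVLTVLHQDWLNGK is obtained simply by reverting the V91>I substitution, and all function and variable names are hypothetical.

```python
# Standard monoisotopic element masses in Da, ordered as (C, H, N, O, S).
MONO = (12.0, 1.00782503207, 14.0030740048, 15.99491461956, 31.97207100)
C, H, N, O, S = MONO

# Elemental composition (C, H, N, O, S) of each amino acid residue, i.e. the
# free amino acid minus one water. Only the residues used below are listed.
RESIDUES = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0), "C": (3, 5, 1, 1, 1),
    "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0), "N": (4, 6, 2, 2, 0),
    "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0), "K": (6, 12, 2, 1, 0),
    "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1), "H": (6, 7, 3, 1, 0),
    "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0), "W": (11, 10, 2, 1, 0),
}

WATER = 2 * H + O
METHYLATION = C + 2 * H   # net addition of CH2
DEAMIDATION = O - N - H   # amide NH2 replaced by OH


def residue_mass(aa):
    """Monoisotopic mass of one residue from its elemental composition."""
    return sum(count * mass for count, mass in zip(RESIDUES[aa], MONO))


def peptide_mass(seq):
    """Monoisotopic mass of a neutral, unmodified linear peptide."""
    return sum(residue_mass(aa) for aa in seq) + WATER


# V91>I substitution versus methylation of the reference peptide.
v_to_i = peptide_mass("VVSVLTILHQDWLNGK") - peptide_mass("VVSVLTVLHQDWLNGK")

# IGHG3*13 signature (E) versus the Q-containing signature after deamidation.
q_to_e = (peptide_mass("WQEGNIFSCSVMHEALHNR")
          - peptide_mass("WQQGNIFSCSVMHEALHNR"))

print(f"V>I shift: {v_to_i:.6f} Da, methylation: {METHYLATION:.6f} Da")
print(f"Q>E shift: {q_to_e:.6f} Da, deamidation: {DEAMIDATION:.6f} Da")
```

In both cases the substitution and the modification give the same mass shift (about 14.01565 Da and about 0.98402 Da, respectively), so precursor mass alone cannot separate them. This is consistent with the reliance on circumscribing y and/or b fragments for V91>I and with the exclusion of deamidation from the Mascot variable modifications.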
A methodological simplification would consist in performing a specific enzymatic cleavage of all Fc/2 fragments by a cysteine proteinase from Streptococcus pyogenes (IdeS) (35). This enzyme, combined with PNGase F for the hydrolysis of all glycans attached to the IgG heavy chain, which are subject to inter- and intraindividual variations (36), generates polypeptide fragments of about 24 kDa (211 AA) concentrating all possible polymorphic AA combinations of the IGHG CH2 and CH3-CHS domains. This results in 21 possible Fc/2 peptides differing by at least 1 Da, which could be analyzed using a middle-down MS strategy (37,38). This new technology, which combines aspects of top-down (intact protein) and bottom-up (enzymatic proteolysis) strategies, aims to achieve both high resolution and high mass accuracy (39). It has the advantage of minimizing wrong assignments to a particular AA IGHG allele, which could result from erroneous combinations of small peptides in the bottom-up process. Moreover, as the identification of one IGHG allele will come down to the characterization of one polypeptide, it is conceivable that the discriminatory peptides under analysis will be more frequent than when dissected by trypsin into small peptides necessitating a probabilistic reconstitution into a single sequence (40).
In the context of the present study, a middle-down MS approach would have made it possible to assign VVSVLTVLHQDWLNGK (CH2 85.1-101, V92>L) to IGHG2 (sample PA01) as well as EPQVYTLPPSR to IGHG4 (samples AS50 and PA16) by identifying these infrequent sequences within a polypeptide harboring other IGHG2 or IGHG4 AA signatures, respectively. It would also have made analyzable the short new peptide VSHK (CH2 106-108, N108>H), assignable to IGHG4 (samples AS50, PA07 and PA16). Lastly, despite a very high percentage of sequence identity, there is no formal proof that the newly identified R.VVSVLTILHQDWLNGK.E (CH2 85.1-101, V91>I) peptide originates from an IGHG4 polypeptide sequence. Indeed, the preparative treatment of plasma samples using Protein G columns led to IgG enrichment rather than exclusive IgG purification, and middle-down MS could rule out the very unlikely possibility that this peptide belongs to a residual plasma protein bearing a hitherto unknown amino acid polymorphism. In fact, the advent of middle-down MS combined with proteogenomics will contribute to an increasingly detailed description of Fc fragment diversity, in support of demonstrations like the one presented here.
In conclusion, this study confirms the reliability of the MS approach for investigating IGHG AA diversity under stringent conditions of analysis, and provides new molecular tools suited to fast screening of this diversity. Many applications can result from an accurate determination of these polymorphisms, such as the full validation of the sequences of therapeutic antibodies, a rapidly expanding technology (41). Another promising application would be the diagnosis of congenital infections in neonates by differential detection of maternal and fetal IgG on the basis of individual IGHG diversity (16,17). Work is underway in our laboratory to apply the middle-down MS approach to polymorphic Fc/2 fragments obtained after complete isolation of parasite-specific IgG from neonates suspected of congenital toxoplasmosis or Chagas disease. If successful, this new approach to neonatal serological diagnosis using proteomics could also benefit the diagnosis of congenital infections of bacterial or viral origin.
Acknowledgments-We thank the participating children and their families. We are grateful to Evelyne Guitard (UMR 5288, CNRS, Université Paul Sabatier Toulouse III, France) for technical advice.

DATA AVAILABILITY
The newly described IGHG2 nucleotide sequences have been deposited in the GenBank database under GEDI (GenBank/ENA/DDBJ/IMGT/LIGM-DB) Accession Numbers KX670549 to KX670554, sequences KX670550 and KX670551 differing in the intron. Similarly, the newly described IGHG4 exon nucleotide sequences have been deposited in the GenBank database under GEDI Accession Numbers KX670555 to KX670558. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (42) partner repository with the dataset identifier PXD005021.