The Surprising Composition of the Salivary Proteome of Preterm Human Newborn

Saliva is a body fluid of unique composition devoted to protecting the mouth cavity and the digestive tract. Our high performance liquid chromatography (HPLC)-electrospray ionization-MS analysis of the acidic soluble fraction of saliva from preterm human newborns surprisingly revealed more than 40 protein masses often undetected in adult saliva. We were able to identify the following proteins: stefin A and stefin B, S100A7 (two isoforms), S100A8, S100A9 (four isoforms), S100A11, S100A12, small proline-rich protein 3 (two isoforms), lysozyme C, thymosins β4 and β10, antileukoproteinase, histone H1c, and α and γ globins. The average mass value reported in international data banks was often incongruent with our experimental results, mostly because of post-translational modifications of the proteins, e.g. acetylation of the N-terminal residue. A quantitative label-free MS analysis showed that protein levels changed in relation to the postconceptional age and suggested coordinate and hierarchical functions for these proteins during development. In summary, this study shows for the first time that analysis of these proteins in saliva of preterm newborns might represent a noninvasive way to obtain valuable information on the molecular mechanisms of development of human fetal oral structures.

and statherin (2)(3)(4)(5). In a previous study (6), we established that some salivary proteins and peptides reach the levels typically observed in the adult around 18 years of age. Encouraged by the noninvasive specimen collection, we explored the salivary protein composition of at-term and preterm newborns in order to establish the starting point of the secretion of the proteins and peptides specific to saliva. Our first study (7) showed that secretion of acidic proline-rich proteins started, although at very low levels, at 7 months of postconceptional age. At this age the level of phosphorylation of these proteins was low, and it increased to a value comparable with that of adults at about one year of age, concomitant with the beginning of deciduous dentition. Other deep differences between adult and preterm saliva were, however, evident. Highly abundant protein masses detected in preterm saliva were undetectable (at the sensitivity level of our MS apparatus) or present at very low levels in adult saliva. In a previous study (8) we identified, by different MS approaches, thymosin β4 (Tβ4) and thymosin β10 (Tβ10) in preterm newborn saliva and established by immunohistochemistry their presence in fetal salivary glands. This finding led us to suppose that in preterm newborns these peptides derived from gland secretion (8), whereas the low levels in adult saliva were mainly of crevicular fluid origin (9). In another study (10) we were able to elucidate the structure of two isoforms of the small proline-rich protein 3 (SPRR3, or cornifin β) detectable only in preterm saliva.
The present study extends our previous findings and reports for the first time the identification of other proteins in the acidic soluble fraction of whole saliva of human preterm newborns. This goal could only be reached by using a top-down proteomic platform based on high resolution ESI-MS measurements following chromatographic separation of intact proteins.
The identification of 20 components was achieved by different chemical and enzymatic treatments coupled to low- and high-resolution MS and mainly confirmed by high-resolution tandem MS (MS/MS) experiments performed on intact proteins. Moreover, the relative amounts of these proteins were determined in 61 different samples from eight preterm subjects in order to evaluate concentration trends as a function of postconceptional age. The relative amount of proteins in whole saliva of preterm newborns was also compared with that in whole saliva of at-term newborns and adults.

EXPERIMENTAL PROCEDURES
Ethics Statements-The study protocol and written consent forms were approved by both the Pediatric Department Ethics Committee and by the Medical Ethics Committee of the Faculty of Medicine of the Catholic University of Rome (according to the instructions of the Declaration of Helsinki). Full written consent forms were obtained from adult donors and from the parents of the newborns and all rules were respected.
Reagents and Apparatus-All common chemicals and reagents were of analytical grade and were purchased from Carlo Erba (Milan, Italy), Merck (Darmstadt, Germany), Sigma Aldrich (St. Louis, MO), and Pierce Biotechnology (Rockford, IL). Low-resolution HPLC-ESI-MS measurements were carried out with a Surveyor HPLC system (ThermoFisher, San Jose, CA) connected by a T splitter to a photodiode-array detector and an LCQ Deca XP Plus mass spectrometer (ThermoFisher). The chromatographic column was a Vydac (Hesperia, CA) C8 with 5-μm particle diameter (column dimensions 150 × 2.1 mm). High-resolution HPLC-ESI-MS/MS experiments were carried out with an Ultimate 3000 Nano/Micro HPLC apparatus (Dionex, Sunnyvale, CA) equipped with an FLM-3000-Flow manager module and coupled to an LTQ Orbitrap XL apparatus (ThermoFisher). A Dionex C18 column (3-μm particle diameter; column dimensions 300 μm i.d. × 15 cm) or a Zorbax 300 SB-C8 column (3.5-μm particle diameter; column dimensions 1 mm i.d. × 15 cm) was used as the chromatographic column. Matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF)-MS was performed on an Autoflex apparatus (Bruker Daltonics, Billerica, MA).
Subjects Enrolled, Sample Collection and Treatment-Eight newborns (five females, three males) with birth weight between 500 g and 1250 g and 193-217 days of postconceptional age (27-31 weeks), admitted to the Neonatal Intensive Care Unit of the Faculty of Medicine of the Catholic University were enrolled for this study. Infants with major congenital malformations or prenatal infections were excluded from the study. Sample collection was performed on the same preterm newborn during several weeks following birth at established time intervals (1 or 2 weeks). If possible, analysis was also performed following discharge from the neonatal unit during the periodical check visits up to 1 year follow-up. For ethical reasons, saliva was only collected when sample collection caused no stress to the newborn. In this way we were able to collect and analyze 61 saliva specimens from newborns with postconceptional age ranging between 194 and 545 days.
Moreover, saliva from four at-term infants (two females, two males) born after uncomplicated pregnancies and vaginal delivery and admitted to the Policlinico "A. Gemelli" nursery was studied. Gestational age and birth weight (mean ± S.D.) were 272 ± 7 days (38 ± 1 weeks) and 3280 ± 150 g, respectively. These four newborns had no clinical problems and were therefore discharged after 3 days of breast-feeding.
Resting whole saliva was also collected with a soft plastic aspirator at the base of the tongue from 20 informed adult volunteers (42 ± 18 years old, 10 males, 10 females).
After collection, salivary samples were immediately mixed with an equal volume of 0.2% aqueous trifluoroacetic acid (TFA) (v/v) in an ice bath. After stirring, the acidic solution was immediately centrifuged at 9000 × g for 3 min to remove the precipitate, and the clear acidic supernatant was either immediately analyzed by HPLC-ESI-MS (100 μl, corresponding to 50 μl of whole saliva) or stored at −80°C.
Reversed Phase (RP)-HPLC-ESI-MS Analysis-The following solutions were utilized in low-resolution ESI-MS chromatographic separations: (eluent A) 0.056% aqueous TFA and (eluent B) 0.050% TFA in acetonitrile-water 80/20 (v/v). The applied gradient was linear from 0% to 55% B in 40 min, at a flow rate of 0.30 ml/min. The T splitter directed 0.20 ml/min toward the diode array detector and 0.10 ml/min toward the ESI source. The photodiode array detector was set at wavelengths of 214 and 276 nm. During the first 5 min of separation the eluate was not directed to the mass spectrometer, to avoid source contamination and instrument damage because of the high salt concentration. Mass spectra were collected every 3 ms in the positive ion mode. MS spray voltage was 4.50 kV and capillary temperature was 220°C.
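The low-resolution gradient and the flow split described above can be expressed numerically. The following is a minimal sketch that uses only the values stated in the text; the function names are illustrative, and the acetonitrile estimate assumes simple volume additivity, which is an approximation.

```python
# Sketch of the low-resolution gradient (values taken from the text above).
ACN_FRACTION_IN_B = 0.80  # eluent B is acetonitrile-water 80/20 (v/v)


def percent_b(t_min, start=0.0, end=55.0, duration=40.0):
    """Percentage of eluent B at time t (min) for the linear 0-55% B gradient."""
    t = min(max(t_min, 0.0), duration)  # clamp to the programmed gradient
    return start + (end - start) * t / duration


def percent_acetonitrile(t_min):
    """Approximate acetonitrile content (% v/v), assuming volumes are additive."""
    return percent_b(t_min) * ACN_FRACTION_IN_B


def split_flow(total_ml_min=0.30, to_ms_ml_min=0.10):
    """Return (flow to the diode array detector, fraction reaching the ESI source)."""
    return total_ml_min - to_ms_ml_min, to_ms_ml_min / total_ml_min
```

Under these assumptions the mobile phase contains about 44% acetonitrile at the end of the gradient, and one third of the eluate reaches the ESI source.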
High-resolution nano-HPLC-ESI-MS/MS experiments were performed by using the following eluents: (A) 0.05% (v/v) aqueous TFA and (B) 0.05% (v/v) TFA in acetonitrile. The applied gradient was 0-4 min 5% B, 4-34 min from 5% to 50% B (linear), 34-64 min from 50% to 90% B (linear), at a flow rate of 4.5 μl/min. High-resolution positive MS and MS/MS spectra were collected in full scan mode using the lock mass for internal mass calibration (polydimethyl cyclosiloxane, m/z 445.1200) at resolutions of 60,000 and 30,000, respectively, over an m/z range from 350 to 2000. In data-dependent acquisition mode the three most intense multiply charged ions were selected and fragmented by collision-induced dissociation (35% normalized collision energy) and spectra were recorded. Alternatively, fragmentation was carried out under the same conditions on selected multiply charged ions corresponding to specific protein masses. Tuning parameters were: capillary temperature 220°C, source voltage 2.4 kV, capillary voltage 26 V, tube lens voltage 245 V.
MALDI-TOF-MS Analysis-For MALDI-TOF-MS analysis, samples were dissolved in 0.1% aqueous TFA and the solution was treated with a C-18 ZipTip pipette tip (Millipore, Billerica, MA) following the manufacturer's instructions, including a wash with 0.1% TFA and elution with 0.1% TFA/acetonitrile (1:1, v/v). The desalted solution was mixed 1:1 (v/v) with a saturated solution of α-cyano-4-hydroxycinnamic acid in acetonitrile/water (50:50, v/v) containing 0.1% TFA. Aliquots of 1 μl of the mixture were spotted onto the stainless steel target of the MALDI instrument. The calibration was performed using peptide calibration standards (angiotensin I and II, substance P, and bombesin, m/z range 1000-3150). Positive MALDI spectra were acquired in either linear or reflectron mode with a pulsed nitrogen laser (337 nm). In linear mode an acceleration voltage of 20 kV, a detector gain voltage of 1300 V, a pulsed ion extraction time of 350 ns, and a laser frequency of 5 Hz were applied. In reflectron mode an acceleration voltage of 19 kV, a detector gain voltage of 1400 V, a pulsed ion extraction time of 150 ns, and a laser frequency of 5 Hz were applied. Mass spectra were acquired over the mass range of 700-6000 Da with a low mass cut-off of 500 Da, and 400 scans were averaged for each spectrum.
Preparative RP-HPLC Purification of the Proteins Detectable in Whole Saliva of Preterm Newborns-Semipreparative RP-HPLC was utilized in order to purify the proteins detected in whole saliva of preterm newborns and/or to collect peptide fragments from trypsin digests (see the following sections). Acidic solution from preterm saliva (200 μl or more, when available) or tryptic digests were purified on a Surveyor Plus LC system (ThermoFisher) equipped with a PDA detector set at 214 and 280 nm. The column was a Vydac C8 with 5-μm particle diameter (column dimensions 250 × 4.6 mm). Eluent A was 0.056% aqueous TFA and eluent B was 0.050% TFA in acetonitrile-water 80/20 (v/v). The applied gradient was linear from 0% to 55% B in 40 min, at a flow rate of 0.90 ml/min. Fractions were collected as each peak eluted. The content of each purified fraction was checked by using the HPLC-ESI-MS procedures described in the previous sections. Fractions with similar content obtained from different HPLC separations were sometimes pooled. Several fractions were found to contain more than one protein. They were used for enzymatic and chemical treatments as described in the following sections without further purification.
Automated Edman Sequencing-Protein and peptide sequencing was carried out on the purified protein or peptide with a Procise 610A Protein Sequencer (Applied Biosystems, Foster City, CA). Sequencing was carried out by classical Edman degradation according to manufacturer's instructions.
Enzymatic Dephosphorylation-Freeze-dried powder of the purified protein or of protein mixtures (ca. 50 μg) was dissolved in 100 μl of 0.2 M Tris-HCl (pH 8.6) and 40 μl of calf intestinal alkaline phosphatase (1 EU/μl, Roche-Boehringer, Mannheim, Germany) were added. Incubation was carried out at 37°C and after 40 min the solution was centrifuged at 8000 × g for 5 min, and the supernatant was immediately analyzed by RP-HPLC-ESI-MS.
Removal of N-Terminal Acetylation-Freeze-dried powder of the purified protein or of protein mixtures was incubated in the presence of 25% TFA at 55°C for 1 h and the solution was analyzed by RP-HPLC-ESI-MS.
Reduction and Alkylation of Cysteine Residues-Freeze-dried powders of the purified proteins or protein mixtures were dissolved at a concentration of about 5 μmol/L in 100 μl of 100 mmol/L Tris-HCl buffer, pH 7.5, containing 5 mmol/L dithiothreitol. The solution was incubated at 100°C for 5 min and subsequently at 50°C for 15 min. After incubation, 1 μl of dimethylsulfoxide and 13 μl of 0.4 mol/L 4-vinylpyridine in 95% methanol were added to the solution at room temperature. The solution was kept in the dark at room temperature for 3 h. The reaction was stopped by adding 115 μl of 0.2% TFA in H2O. The solution was analyzed by RP-HPLC-ESI-MS.
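The mass variation measured after this treatment can be converted into a number of alkylated cysteine residues. The sketch below is illustrative only: it assumes that each cysteine alkylated by 4-vinylpyridine adds about 105.14 Da (average mass of the pyridylethyl group), and the tolerance, the function name, and the masses used in the usage note are hypothetical, not values taken from this study.

```python
# Average mass added per cysteine on S-pyridylethylation with 4-vinylpyridine (C7H7N).
PYRIDYLETHYL_SHIFT_DA = 105.14


def count_alkylated_cysteines(mass_before, mass_after, tol_da=2.0):
    """Estimate the number of alkylated cysteines from the observed mass shift.

    Returns None when the shift is not compatible, within tol_da (an assumed
    tolerance), with a whole number of pyridylethyl groups.
    """
    shift = mass_after - mass_before
    n = round(shift / PYRIDYLETHYL_SHIFT_DA)
    if n < 0 or abs(shift - n * PYRIDYLETHYL_SHIFT_DA) > tol_da:
        return None
    return n
```

For example, a hypothetical protein whose average mass increases from 10,000.0 to 10,210.3 Da after treatment would be assigned two alkylated cysteines, whereas a shift of 50 Da would be rejected.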
Trypsin Digestion-50 μg of the freeze-dried powder of the purified protein or protein mixture was dissolved in 250 μl of 0.5 M ammonium hydrogen-carbonate (pH 8.0) and digested at 37°C under stirring for 5 h by adding 10 units of immobilized trypsin (Pierce Biotechnology, Rockford, IL). After incubation the solution was centrifuged at 8000 × g for 5 min and the supernatant was lyophilized and stored at −80°C.
Data Analysis-Deconvolution of averaged ESI mass spectra was automatically performed either by the software provided with the Deca-XP instrument (Bioworks Browser) or by MagTran 1.0 software (11). Experimental mass values were compared with theoretical average mass (Mav) values available at the Swiss-Prot data bank (http://us.expasy.org/tools). Proteins and derivatives were identified as described in the Results section.
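The principle behind the deconvolution of an ESI spectrum of multiply charged ions can be illustrated with a short calculation. The sketch below is not the algorithm implemented in Bioworks or MagTran; it is a textbook-style example that assumes the supplied peaks are consecutive charge states of a single protein, and all function names are illustrative.

```python
PROTON = 1.00728  # mass of a proton in Da


def mz(mass, z):
    """m/z of the [M + zH]z+ ion of a protein of the given mass."""
    return (mass + z * PROTON) / z


def charge_from_adjacent(p_high, p_low):
    """Charge z of the ion at p_high, given the (z+1) ion at p_low (p_low < p_high)."""
    return round((p_low - PROTON) / (p_high - p_low))


def deconvolute(peaks):
    """Average the mass estimates obtained from each pair of adjacent peaks."""
    peaks = sorted(peaks, reverse=True)
    if len(peaks) < 2:
        raise ValueError("at least two consecutive charge states are needed")
    estimates = []
    for p_high, p_low in zip(peaks, peaks[1:]):
        z = charge_from_adjacent(p_high, p_low)
        estimates.append(z * (p_high - PROTON))
    return sum(estimates) / len(estimates)
```

As a consistency check, ions simulated with `mz(10834.0, z)` for z = 8, 9, and 10 are converted back to a mass of 10,834 Da by `deconvolute`.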
The relative abundances of proteins and derivatives were determined by measuring the XIC peak area and relating it to 1.0 ml of whole saliva. Under identical experimental conditions this value is linearly proportional to peptide concentration and it can be used with confidence to monitor relative abundances (12). In the determination of the XIC peak area a correct choice of the m/z values for protein detection is necessary, avoiding m/z values potentially overlapping with ESI spectra of other close-eluting proteins in crowded chromatographic elution ranges (see "Results"). The window for all the m/z values chosen was ± 0.5 m/z. The percentage error of the measurements was less than 10%. The Pearson r coefficient was used to evaluate linear correlations among the (XIC peak area/ml of whole saliva) values of different proteins. Data obtained from the analysis of trypsin digestion products by nano-HPLC-ESI-MS/MS on the LTQ Orbitrap XL apparatus were processed with the Proteome Discoverer 1.0 program, based on the SEQUEST cluster as search engine (University of Washington, USA, licensed to Thermo Electron Corp., San Jose, CA), against the Swiss-Prot human proteome (March 10th, 2010 release; uniprot-taxonomy-9606-AND-reviewed-yes.fasta; 34756 nonredundant protein sequences). For peptide matching the following limits were used: Xcorr scores greater than 1.5 for singly charged peptide ions and 2.0 and 2.5 for doubly and triply charged ions, respectively, and one missed cleavage site. Different searches were carried out allowing the recognition of various post-translational modifications (PTMs). Precursor mass search tolerance was set to 10 ppm and fragment mass tolerance was set to 1.5 Da.
Tables I and II) refer to α-globin and γ-globins, respectively. The inset in panel B shows, for example, the ESI spectrum of the protein eluting between 40.0 and 40.8 min, which after deconvolution displayed an experimental Mav of 10,834 ± 2 Da and was identified as S100A8.
WS, whole saliva; NL, normalization level; RT, retention time.
different from that of the adult (Fig. 1, A and B). In the profile of saliva from adults (Fig. 1A) the principal chromatographic peaks corresponded to peptides and proteins specific to salivary glands (acidic and basic proline-rich proteins, statherin, histatins, and S-type cystatins) and to their fragments and derivatives (13)(14)(15)(16)(17)(18). In the profile of saliva from preterm newborns (Fig. 1B) proteins characteristic of adult whole saliva were either absent or present in very low relative amounts. In contrast, various proteins almost absent in adult whole saliva were consistently detected in whole saliva of preterm newborns, as evidenced by deconvolution of their ESI-MS spectra. For example, the inset of Fig. 1 shows the ESI mass spectrum of the protein, identified as S100A8, eluting between 40.0 and 40.8 min, which following deconvolution displayed an experimental average mass (Mav) value of 10,834 ± 2 Da. The numbers assigned to specific elution positions in the B profile correspond to the protein masses listed in Tables I and II (Table III). Table I shows the 25 Mav values of the salivary proteins observed with high frequency (> 50%) in 61 specimens of whole saliva collected at different postconceptional ages from eight preterm newborns. Those detected with low frequency (10%-50%) are reported in Table II. Protein elution times are also reported in Tables I and II.

Top-Down HPLC-ESI-MS Analysis of Whole
The detection of γ-globins sometimes required modification of the elution gradient because of overlapping with heterogeneous glycoproteins, which generated crowded ESI spectra.
Identification of the Proteins Detected in Saliva of Preterm Newborns-Because of the small sample volume available, the most demanding task of this study was the characterization of the proteins reported in Tables I and II. The Mav value of the intact protein, even if determined with good accuracy (in our apparatus ± 0.02%), is obviously not enough for a definitive identification. The experimental scheme of the top-down proteomic approach followed in this study is described in the flow chart of Fig. 2. Preparative RP-HPLC on some samples of preterm newborn saliva allowed several chromatographic fractions to be collected. Fractions were analyzed by HPLC-ESI-MS in order to check protein content, and, because of similar polarity, more than one protein/peptide was often detected in each chromatographic fraction.
Fractions showing a predominant protein were lyophilized and an aliquot was subjected to automated Edman sequencing. It was possible to sequence only lysozyme and antileukoproteinase, whereas the results for the other proteins suggested a blocked N-terminal residue. Some of the chromatographic fractions were subjected to reduction and alkylation with vinylpyridine, others were treated to remove the N-terminal acetyl group, and others were submitted to the action of phosphatases, following the procedures described in the Experimental Procedures section. After these reactions, an aliquot of the modified proteins was analyzed by HPLC-ESI-MS in order to measure the mass variation. Another approach for protein characterization was based on trypsin digestion of purified proteins and analysis of digestion products by nano-HPLC-ESI-MS/MS using the high-resolution LTQ Orbitrap XL apparatus. Many MS/MS fragmentation spectra were manually analyzed. This approach sometimes resulted in plausible protein identification. However, the theoretical Mav values deduced from the sequence reported in the data bank often did not correspond to the experimental ones. For some proteins the hypothesis of loss of the N-terminal methionine residue followed by N-terminal acetylation resulted in a good correspondence of the theoretical Mav with the experimental one. Protein structures were definitively confirmed by submitting intact proteins to high-resolution nano-HPLC-ESI-MS/MS experiments (Fig. 2), performed on the most abundant ions. Sometimes this procedure provided an MS/MS fragmentation spectrum rich enough to be manually compared with the theoretical spectrum generated by the MS-Product program available at the Protein-Prospector site (http://prospector.ucsf.edu/prospector/mshome.htm).
Structures were assumed to be correct if the experimental fragments with a relative abundance higher than 10% were present in the theoretical fragmentation spectrum with 0.03 Da fragment mass tolerance (see Supplemental Data). Table III lists the experimental and theoretical mass values, the number of accessible cysteines, the proposed PTMs, and the methodologies utilized to establish protein identity.
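The acceptance criterion stated above (fragments with a relative abundance above 10% must be found in the theoretical spectrum within 0.03 Da) can be written as a simple check. This is a minimal sketch of the rule as worded in the text, not the procedure actually implemented by the authors or by MS-Product; the function name and the example values in the usage note are hypothetical.

```python
def structure_supported(experimental, theoretical_mz,
                        min_rel_abundance=10.0, tol_da=0.03):
    """Check experimental fragments against a theoretical fragmentation spectrum.

    experimental: list of (m/z, relative abundance in %) pairs.
    theoretical_mz: list of theoretical fragment m/z values.
    Returns (ok, unmatched), where unmatched lists the abundant experimental
    fragments that have no theoretical counterpart within tol_da.
    """
    unmatched = [
        (frag_mz, abundance)
        for frag_mz, abundance in experimental
        if abundance > min_rel_abundance
        and not any(abs(frag_mz - t) <= tol_da for t in theoretical_mz)
    ]
    return (not unmatched), unmatched
```

With hypothetical fragments at m/z 500.27 (100%), 612.35 (40%), and 700.10 (5%) and theoretical values of 500.28, 612.33, and 800.00, the structure would be accepted, because the only unmatched fragment lies below the 10% abundance threshold.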
Proteins with Mav values corresponding to the values available in the Swiss-Prot data bank were Tβ4 and Tβ10, S100A8 (calgranulin A; for S100 nomenclature see reference (19)), stefin B, lysozyme C, S100A12 (calgranulin C), antileukoproteinase, peroxiredoxin 6 (the last still pending a definitive characterization), α-globin and γ-globins. The protein with a Mav of 11,006 ± 2 Da was identified as stefin A with an additional N-terminal methionine residue (see Supplemental Data) with respect to the isoform reported in Swiss-Prot (Mav 10,875 Da). Just after the stefin A peak (Table I) we detected a protein with an experimental mass of 10,875 ± 2 Da, but the MS/MS fragmentation spectrum of the intact protein, although poor, did not correspond to the theoretical fragmentation values expected for stefin A missing the N-terminal methionine. S100A7 (psoriasin) was detected in two isoforms, which differ by the substitution E27→D. Both S100A7 variants were found to be N-terminally acetylated following the loss of the terminal methionine. The Mav values of the other proteins reported in Table III were in agreement with different PTMs not reported in the data banks. S100A9 (calgranulin B) was detected in four isoforms already characterized by other authors in human granulocytes (20). Two isoforms (Mav 13,153 ± 2 and 13,233 ± 2 Da), which could be defined as long-types, were found to be acetylated following loss of the N-terminal methionine residue and differed from each other in phosphorylation of the penultimate threonine residue of the sequence (Thr112). The other two isoforms (Mav 12,689 ± 2 and 12,769 ± 2 Da), defined as short-types, were found to be acetylated following the loss of the five N-terminal amino acid residues (MTCKM) and differed in the phosphorylation of the same residue as in the long-types (Thr108). S100A11 (calgizzarin) was also found to be acetylated at the N-terminal residue following methionine loss (PTM not available in the Swiss-Prot data bank).
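The mass differences quoted in this paragraph are consistent with the standard average mass shifts of the modifications involved, as the following sketch illustrates. The shift values are common textbook averages rather than figures from this study, the ± 2 Da tolerance is the experimental error reported above, and the dictionary and function names are illustrative.

```python
# Average mass shifts (Da) of the modifications discussed in the text.
AVG_SHIFT_DA = {
    "met_loss": -131.20,        # removal of the N-terminal methionine residue
    "met_addition": +131.20,    # one additional methionine residue
    "acetylation": +42.04,      # N-terminal acetyl group
    "phosphorylation": +79.98,  # one phosphate group
}


def apply_ptms(mav, ptms):
    """Expected average mass after applying a list of named modifications."""
    return mav + sum(AVG_SHIFT_DA[name] for name in ptms)


def matches(theoretical, experimental, tol_da=2.0):
    """True when the two masses agree within the stated experimental error."""
    return abs(theoretical - experimental) <= tol_da


# Checks against the values reported in the paragraph above.
stefin_a_plus_met = apply_ptms(10875.0, ["met_addition"])     # vs 11,006 Da
s100a9_long_phos = apply_ptms(13153.0, ["phosphorylation"])   # vs 13,233 Da
s100a9_short_phos = apply_ptms(12689.0, ["phosphorylation"])  # vs 12,769 Da
```

Under these assumptions, loss of the N-terminal methionine followed by acetylation corresponds to a net shift of about −89.16 Da with respect to the data bank sequence.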
In a previous study (10) we were able to establish that the same PTMs (methionine loss and N-terminal acetylation) were responsible for the observed Mav values of the two isoforms of SPRR3. The isoform with Mav 18,065 ± 3 Da differed by the insertion of a further octapeptide repeat (GCTKVPEP) in addition to the five identical repeats present in the 17,239 ± 3 Da isoform and by the presence of Leu at position 148, instead of Val at the corresponding position 140 of the shorter isoform (10). Three subjects (out of 8) were homozygous for the 17,239 ± 3 Da isoform, one was homozygous for the 18,065 ± 3 Da isoform, and four subjects were heterozygous. No polymorphism was observed for the other salivary components of preterm newborns (except for γ globin, which was detected, as expected, as Aγ and Gγ isoforms). Tentative identification of the protein with Mav 22,365 ± 4 Da as histone H1c (histone H1.3) was based on analysis of the tryptic digestion mixture obtained from the purified protein. Data analysis by Proteome Discoverer attributed three peptides to histone H1c, and this assignment was also manually checked. The three fragments corresponded to the sequences 34-46, 65-75, and 140-148 of the protein, two present in other histones and one (fr. 140-148) specific for histone H1c. However, the theoretical Mav of histone H1c is 22,219 Da, 146 ± 4 Da less than the experimental value. Different isoforms of H1 histone have been detected in the spleen of adult subjects (21). Thus, we may hypothesize either that the preterm newborn protein could be an unknown variant of histone H1c or that uncharacterized PTMs are present in the mature protein. Tentative identification of peroxiredoxin 6 was based only on the detection in saliva of preterm infants of a peptide corresponding to the fragment 1-32 of the intact protein. Therefore, this protein is still pending a definitive identification.
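The mass difference between the two SPRR3 isoforms can be checked against the structural difference described above. The sketch below sums standard average residue masses (rounded to two decimals) for the additional octapeptide repeat and for the Val to Leu substitution; it ignores any difference in the oxidation state of the extra cysteine, which is an assumption of this example.

```python
# Standard average residue masses (Da), rounded to two decimals.
AVG_RESIDUE_MASS = {
    "G": 57.05, "C": 103.14, "T": 101.10, "K": 128.17,
    "V": 99.13, "P": 97.12, "E": 129.12, "L": 113.16,
}


def residues_mass(sequence):
    """Sum of the average residue masses of a peptide segment (no terminal water)."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence)


repeat_mass = residues_mass("GCTKVPEP")                            # extra repeat
substitution = AVG_RESIDUE_MASS["L"] - AVG_RESIDUE_MASS["V"]       # Val -> Leu
expected_difference = repeat_mass + substitution
observed_difference = 18065 - 17239                                 # from the text
```

The expected difference of about 825.98 Da agrees with the observed difference of 826 Da within the ± 3 Da experimental error of each isoform.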
Detailed information on the results of the experiments performed to characterize proteins and PTMs reported in Table III is available in Supplemental Data. Some other protein masses (not reported) were sporadically detected in whole saliva of preterm newborns and some masses, probably pertaining to peptide fragments of larger proteins, were also detected. Moreover, the (small) precipitate formed during the acidic treatment of whole saliva might contain some other proteins/peptides not detectable in the adult. Their characterization will require further studies.
Determination of Protein Relative Amount as a Function of Preterm Postconceptional Age-To study the time course of the identified proteins, HPLC-ESI-MS analyses of samples collected at different postconceptional ages (from 194 to 545 days) from eight newborns, for a total of 61 samples, were carried out. The relative amounts of each protein were established, as previously reported (6), by using the area of the eXtracted Ion Current (XIC) peaks and relating it to 1 ml of whole saliva. Quantitative analysis is reported only for the 25 proteins with higher frequencies in the 61 samples, listed in Table I. The multiply charged ions selected for the XIC search of each protein are reported in Table IV. Fig. 3 shows the mean values of the ratio (XIC peak area)/(ml of whole saliva (WS)) computed in three ranges of postconceptional age, i.e. 194-235 days (n = 26), 236-290 days (n = 21), and 291-545 days (n = 14), for all the 25 proteins. The values reported for SPRR3 correspond to the sum of the two isoforms. In the heterozygous subjects, the XIC peak areas of the two isoforms were always very similar.
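The normalization and grouping steps just described can be sketched as follows. The example assumes that each injection corresponds to 50 μl of whole saliva, as stated in the Experimental Procedures, and uses the three age ranges given above; the function names and the sample values in the usage note are hypothetical.

```python
from statistics import mean

# Ranges of postconceptional age (days) used for the mean values of Fig. 3.
AGE_RANGES = [(194, 235), (236, 290), (291, 545)]


def area_per_ml(xic_area, saliva_volume_ml=0.050):
    """Relate an XIC peak area to 1 ml of whole saliva (50 ul analyzed per run)."""
    return xic_area / saliva_volume_ml


def mean_by_age_range(samples):
    """Mean (XIC peak area)/(ml of whole saliva) in each age range.

    samples: iterable of (postconceptional age in days, XIC peak area).
    Returns one value per range, or None for a range without samples.
    """
    samples = list(samples)
    means = []
    for low, high in AGE_RANGES:
        values = [area_per_ml(area) for age, area in samples if low <= age <= high]
        means.append(mean(values) if values else None)
    return means
```

For hypothetical samples `[(200, 5.0e6), (230, 3.0e6), (250, 1.0e6)]` the function returns means of about 8.0e7 and 2.0e7 for the first two ranges and `None` for the third.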
Even though the values reported in Fig. 3 do not correspond to absolute concentration values, because ESI efficiency differs between proteins, they suggest that the concentrations of the proteins were markedly different. In previous studies, we correlated the values of the XIC peak area of Tβ4 and Tβ10 to their concentration (8,9). The mean concentration of Tβ4 in saliva of preterm newborns with 194-235 days of postconceptional age was about 1 μmol/L, and that of Tβ10 was about four times lower. Because of the high structural similarity of isoforms and variants, the data reported in Fig. 3 suggest that the concentration of the D27 variant of S100A7 is at least three times higher than that of the E27 variant.
In order to compare the concentration variation over time of the various proteins reported in Table I, quantitative data obtained in the 61 samples were normalized to the highest value determined for each protein (Fig. 4). Fig. 4 shows that the concentration of the majority of the proteins decreased, although with different slopes. For instance, whereas stefin A (panel 8) concentration showed a fast decrease from 195 days of postconceptional age, the decrease of S100A8 (panel 6) was slower. The unidentified protein with Mav 24,652 ± 5 Da (panel 25) and S100A7 (D27) (panel 11) showed low concentration values immediately following birth, followed by a fast increase within 1 or 2 weeks up to a maximum of concentration. Thereafter, concentration decreased and reached the minimal levels around 260-280 days of postconceptional age (corresponding to the normal term of delivery). Panel 12, which shows data of S100A7 (E27), is compressed because of three high values detected in one subject. However, the shape is similar to that of the S100A7 (D27) isoform, but with significantly lower values. Further evidence from Fig. 4 is that the concentration of all the proteins reached the minimal levels in correspondence with the normal term of delivery, sometimes disappearing at the sensitivity level of the MS apparatus.
temporal range considered, and therefore they are different from those reported in Fig. 4. Fig. 5 clearly shows different trends of decrease, some almost linear (e.g. Tβ4), others with exponential-like shapes but with different slopes (e.g. stefin B and the unidentified protein with a Mav of 10,651 ± 2 Da), others characterized by a sharp decrease following a constant apex period (e.g. short isoform of S100A9 and SPRR3), and others, such as S100A7 (D27), by a biphasic trend. Interestingly, Fig. 5 shows that the decrease of concentration of each protein showed similar shapes in the different subjects, but it was often phase shifted.
Because of high interindividual variability, temporal changes of the different proteins varied markedly in the 8 subjects. Table IV shows the postconceptional age ranges corresponding to the maximum concentration observed for each protein.

Charge of the ions utilized for the determination of the XIC peak area of the 25 proteins of Table I and parameters expressing concentration trend over time of each protein in the population investigated. Experimental error of the average mass: ± 0.02%. Ratios between the mean of (XIC peak area)/(ml of whole saliva) values (reported in Fig. 3)
When the level measured at the lowest postconceptional age was the highest, it was assumed that the maximum concentration occurred before 195 days of postconceptional age. In order to find a parameter that could give for each protein a rough estimate of the concentration trend over time in the population investigated, we calculated the ratio between the mean concentration values shown in Fig. 3. The results reported in Table IV suggest in some cases time-coordinate functions, such as for stefin A and B, whose decrease was similar, and in others hierarchical functions, such as for S100A7, whose increase followed that of many other proteins. Fig. 6 shows the correlation coefficients (Pearson r) computed among all the proteins of Table I. Because of the size of our sample (n = 61), r values greater than 0.400 indicate a level of significance better than p < 0.001. Fig. 6 shows several clusters of highly significant correlations, including the majority of S100 proteins, stefins, SPRR3, and some unidentified proteins. Because of its particular biphasic trend, S100A7 showed less significant correlations with the other proteins. Antileukoproteinase, lysozyme, and histone H1c showed correlations of low significance. The unidentified protein with Mav 24,652 ± 5 Da showed a significant correlation only with lysozyme. Interestingly, although the correlation of Tβ10 with many proteins of the list was highly significant, the same was not observed for Tβ4, and the two peptides were not very highly correlated with each other. The highest correlation values (higher than 0.93) were observed between the two S100A9 isoforms and their phosphorylated counterparts, between S100A8 and S100A9 short type, as well as between SPRR3, stefin A, stefin B and the unidentified protein with Mav 10,651 ± 2 Da.
Table I normalized to the highest value determined for each protein in the 61 samples plotted as a function of postconceptional age. SPRR3 data refer to both isoforms. PCA, postconceptional age.
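The correlation analysis summarized in Fig. 6 relies on the Pearson r coefficient computed between the (XIC peak area)/(ml of whole saliva) values of pairs of proteins. The sketch below shows the standard formula and the associated t statistic, which is compared with Student's t distribution with n − 2 degrees of freedom to judge significance; it is a generic illustration with hypothetical function names, not the software used in this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1.0 - r * r))
```

For the sample size of this study (n = 61), r = 0.400 corresponds to a t statistic of about 3.35 with 59 degrees of freedom.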
Extensive quantitative analysis of the proteins reported in Table II was not carried out, principally because of their low frequency. They were usually detectable in the samples in the range of 194-240 days of postconceptional age, disappearing from the total ion current profile at higher postconceptional ages. This observation suggests that numerous other proteins are probably present in the fetal mouth during a period preceding the one investigated in this study, with a maximum of expression occurring before 6 to 7 months of postconceptional age.
Comparison of the Salivary Protein Patterns of At-Term Newborns with Adults-As shown in Fig. 4, the concentration of all the proteins detected in this study decreases in the range 195-280 days of postconceptional age, reaching values up to two orders of magnitude lower than the maximum measured. For all the proteins reported in Tables I and II the mean values measured in the range 270-280 days of postconceptional age (considered as the normal delivery time) corresponded perfectly to the mean values measured in whole saliva of at-term newborns (n = 4). The mean values measured after 350 days of postconceptional age were very similar to those measured in adult whole saliva (n = 20). Often the protein was not detectable at all in adult whole saliva, even by XIC search, at the sensitivity level of our HPLC-MS apparatus. Statistical analysis of all the data collected did not disclose gender-related differences.

DISCUSSION
The top-down proteomic approach used in this study gave relevant qualitative and quantitative information on the salivary proteome of human preterm newborns. In contrast to bottom-up proteomic strategies, knowledge of the mass of the intact protein obtained by MS measurements allows discrimination between different isoforms. The top-down strategy used in this study therefore appears to be the most direct approach to detect such differences, e.g. between the isoforms of the S100A7 and S100A9 proteins.
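The way an intact-mass measurement can separate isoforms may be illustrated with a minimal sketch that matches an observed mass difference against a few common shifts. The dictionary of shifts, the function names, and the 1.0 Da tolerance are assumptions chosen for illustration; the sketch also assumes that the D27/E27 labels of S100A7 denote Asp or Glu at position 27. It is not the identification workflow used in the study.

```python
# Standard average atomic masses (Da).
AVG_ATOMIC_MASS = {"C": 12.0107, "H": 1.00794, "O": 15.9994, "P": 30.973762}

# Elemental composition added by each modification (illustrative selection).
SHIFTS = {
    "Asp->Glu substitution (+CH2)": {"C": 1, "H": 2},
    "N-terminal acetylation (+C2H2O)": {"C": 2, "H": 2, "O": 1},
    "phosphorylation (+HPO3)": {"H": 1, "P": 1, "O": 3},
}


def delta_mass(composition):
    """Average mass (Da) of an elemental composition given as {element: count}."""
    return sum(AVG_ATOMIC_MASS[el] * count for el, count in composition.items())


def explain_shift(observed_delta, tolerance=1.0):
    """Return the names of the shifts compatible with an observed mass difference.

    `tolerance` (Da) is a hypothetical value, not the instrument precision.
    """
    return [
        name
        for name, composition in SHIFTS.items()
        if abs(observed_delta - delta_mass(composition)) <= tolerance
    ]


for name, composition in SHIFTS.items():
    print(f"{name}: {delta_mass(composition):+.2f} Da")
```

For example, `explain_shift(14.0)` returns only the Asp->Glu entry, whereas a difference near 80 Da points to phosphorylation. A peptide-level (bottom-up) analysis would detect such a difference only if the peptide carrying the modified residue is recovered.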
It should be mentioned that the proteins characterized in this study are likely to exert their function through a subtle interplay between monomeric and homo- and hetero-oligomeric noncovalent assemblies, which are unstable under the acidic conditions used during HPLC-ESI-MS analysis. Moreover, the acidic treatment generates a small precipitate, which in adult whole saliva is mainly constituted by mucins. In preterm whole saliva, the precipitate may enclose other proteins relevant for fetal development; these will be the subject of future studies.
Saliva is a body fluid of a very complex composition with fundamental roles in the protection of the mouth and its annexes during life. Although the precise molecular interplay between the different adult salivary components is far from being elucidated, this function is partly ensured by specific peptides and proteins secreted by salivary glands, which could play relevant roles in the protection of the enamel surface, in the intake of food and beverages, and in the modulation of the oral flora (1). This protection is not necessary during fetal life because the nutrients of the fetus are supplied directly by the maternal placental system and the fetus is therefore protected against the threats of the external environment. Until now the fetal mouth has been believed to be a passive organ reaching physiological-anatomical maturation mostly after delivery. However, our group recently reported high levels of Tβ4 and Tβ10 in preterm newborn saliva (8). We were able to show that these peptides in fetuses mainly originate from salivary gland secretion, in contrast to adult saliva, where low quantities originate from the gingival crevicular fluid (9). Thus, salivary glands may have a specific function during intrauterine life. This view is further supported by this study, where we could show that several other proteins were present in whole saliva of preterm newborns. The salivary concentration of these proteins decreased as a function of postconceptional age, reaching the values observed in at-term newborns at about 270 days of postconceptional age. The decrease of concentration as a function of preterm newborn age can be partly linked to an increased salivary flow rate, which could not be measured for ethical reasons. However, because the shape of the decrease differed among many of the proteins detected in this study, the variations are probably not linked to a dilution process but rather suggest coordinated and hierarchical actions of these proteins. The interindividual variability shown in Fig. 5 is most likely not linked to feeding habits, because a similar nutritional program was standardized for all the subjects.
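The dilution argument can be made concrete with a short sketch: if an increasing flow rate were the only cause of the decrease, every protein would be reduced by the same factor at each age, so the ratio between any two proteins would stay constant. The numbers below are synthetic and purely illustrative (they are not data from the study), and the function name is hypothetical.

```python
def ratio_spread(profile_a, profile_b):
    """Max/min of the point-by-point ratio a/b (1.0 means identical shapes)."""
    ratios = [a / b for a, b in zip(profile_a, profile_b)]
    return max(ratios) / min(ratios)


# Synthetic fold-dilution at four successive ages (illustrative assumption).
dilution = [1.0, 2.0, 4.0, 10.0]

# Two proteins affected by dilution only: same shape, different starting level.
protein_a = [100.0 / d for d in dilution]
protein_b = [80.0 / d for d in dilution]

# A protein whose decrease has a different shape (synthetic values).
protein_c = [80.0, 70.0, 20.0, 1.0]

print("A vs B (dilution only):", ratio_spread(protein_a, protein_b))
print("A vs C (different shape):", ratio_spread(protein_a, protein_c))
```

A spread close to 1 is what pure dilution would produce; a spread well above 1, as for the pair A and C, indicates that the two proteins follow different time courses.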
Among the proteins characterized, many are members of the S100 protein family, which is the largest subgroup within the superfamily of EF-hand calcium-binding proteins and is exclusively expressed in vertebrates (22,23). A specific feature of this protein family is that some members are secreted from cells upon stimulation, exerting cytokine- and chemokine-like extracellular activities via the receptor for advanced glycation end products. Functional diversification of this family of proteins is achieved by specific cell- and tissue-expression patterns, structural variations, different metal-ion binding properties, as well as by their ability to form homo- and hetero-dimeric and even oligomeric assemblies (24). In this study significant amounts of S100A7, A8, A9, A11, and A12 were detected in whole saliva of preterm newborns. Interestingly, the most abundant S100A7 isoform detected was D27, with minor quantities of the E27 isoform. Some of the unidentified mass values could pertain to other S100 proteins, such as the protein with a Mav of 10,651 ± 2 Da, which might correspond to S100B or S100A5. Minute amounts of a protein with a mass of 11,712 ± 2 Da, which might correspond to S100A16 and which eluted in the proximity of S100A7, were found in a few samples.
The presence of high amounts of two isoforms of SPRR3 in preterm whole saliva is intriguing, and it may be in some way related to the S100 family. The molecular mechanisms underlying the formation of the epithelial barrier start from the desmosomal cadherins, which are linked to the keratin cytoskeleton via several plaque proteins, such as desmoplakin and γ-catenin. However, oral keratinocytes express additional differentiation markers, including filaggrin and trichohyalin, which associate with the cytoskeleton during terminal differentiation. Moreover, other proteins such as loricrin, involucrin, and small proline-rich proteins, comprising SPRR3, are cross-linked into the epithelial envelope by transglutaminases (25). The presence of SPRR3 in preterm whole saliva could thus be linked to the inability of immature keratinocytes to cross-link the protein to the cytoskeleton for a lack of epithelial transglutaminase. However, in this study no other proteins of the cytoskeleton were detected in whole saliva of preterm newborns, and none of the protein masses awaiting characterization seems to correspond to other proteins of the cytoskeleton. Free SPRR3 is only detected in highly differentiated keratinocytes of well-differentiated squamous cell carcinomas, indicating that the protein is induced in skin tumors only when keratinocytes undergo extensive squamous differentiation (26). Furthermore, various authors have suggested that the proteins of the SPRR3 family (SPR family, pancornulins) play a relevant "active" role in the formation of cornified epithelia (27). The correlation between SPRR3 and the proteins of the S100 family is interesting, considering that several of these proteins are related (they have an EF-hand structural motif) and that their genes are localized in a cluster on human chromosome 1q21, which encodes SPRR3 as well as S100A7, A8, A9, A10, A11, and loricrin (19,28,29). Because of the presence of different transglutaminase-reactive residues, S100A10 and S100A11 are incorporated into the cornified envelope of keratinocytes (30,31). The colocalization of genes expressed during maturation of epidermal cells together with genes encoding calcium-binding proteins is particularly intriguing, because calcium levels tightly control the differentiation of epithelial cells, the expression of genes encoding epidermal structural proteins, as well as transglutaminase activity. Because of this close functional cooperation, the cluster 1q21, including these loci, has been proposed to be a specific gene complex and has been named "epidermal differentiation complex" (32).

[Figure caption, beginning truncated] ... Table I. The significance at the conventional levels for r values greater than 0.400 is better than p < 0.001 (n = 61). The range of r values corresponding to the different colors is reported on the right side of the figure. SPRR3 data refer to both isoforms.
The detection of stefin A and stefin B in whole saliva of preterm newborns can be related to some recently hypothesized properties of these proteins. Stefins A and B (or cystatins A and B) are the best-known members of the type 1 cystatin family (cysteine proteinase inhibitors), generically called stefins (33). Stefins were commonly considered potent intracellular inhibitors of papain and cathepsins L, S, and H, but they have also been detected in significant amounts in different body fluids (33,34), and recent studies outlined a very complex set of functions and interplays for them (35). Stefin A is predominantly present in blood and epithelial cells, with particularly high amounts in immune follicular dendritic cells, where it has been speculated to play a role in the prevention of apoptosis of B-lymphocytes by inhibition of cathepsin B (36,37). A recent proteomic study postulated relevant functions of stefin A in mouse neonatal skin during development and in the immune response (38), a role probably related to the significant presence of stefin A found in this study in preterm whole saliva. Some authors have also established the participation of this protein in the structural organization of the cornified cell envelope (39).
Stefin B is more widely distributed inside the cytoplasm of most human cells, and increased levels have been described in a variety of malignant tumors (40). Its gene expression is increased in human monocytes following stimulation by lipopolysaccharides (41), and it plays important and composite, although not yet defined, roles in neural stem cells and in differentiated neurons and astrocytes (42). Moreover, it is an intracellular modulator of rat bone resorption (43). However, a possible correlation of stefin A and B with other proteins such as SPRR3 and S100 proteins is presently not known.
Antileukoproteinase could also be involved in the development of mucosal epithelia. This protein is an inhibitor of granulocytic serine proteases, modulating granulocyte-endothelium interaction (44), and it seems to be involved in the formation of cartilage, because it accumulates in normal but not in arthritic cartilage (45). Tβ4 and Tβ10 are new entries in the repertoire of adult salivary proteins and peptides (9,46). Beta thymosins are ubiquitous peptides with intra- and extracellular functions (47,48), although the secretion pathway of these peptides is not fully understood (49). Tβ4 was shown to modify the rate of attachment and spreading of endothelial cells on matrix components, inducing matrix metalloproteinases (50), and to stimulate the migration of human umbilical vein endothelial cells (51). Although Tβ4 is a potent enhancer of angiogenesis, Tβ10 inhibits it, and changes in the concentration ratio of the two peptides can exert either positive or negative control (52,53). As demonstrated by Reti and coworkers (54), Tβ4 plays an important role in suppressing the production of interleukin-8 following stimulation by tumor necrosis factor α, and it acts overall as an antimicrobial, anti-inflammatory, and anti-apoptotic peptide on gingival fibroblasts. The properties of Tβ10 are less well known than those of Tβ4. The results of this study confirm that the two peptides are present in significant amounts in preterm whole saliva and that, at variance with adult saliva, they derive mainly from salivary gland secretion (8). Tβ4 and Tβ10 could play a role in angiogenesis, which has also been reported for some S100 proteins, mainly S100A4 (for a recent review see (55)). The different biological roles of these peptides may be at the basis of the low correlations observed between the levels of β-thymosins and the other proteins (Fig. 6).
The presence of large amounts of histone H1c in preterm whole saliva is puzzling. Indeed, if it were released by cell shedding, other histones should also have been present in saliva. A similar consideration can be made for lysozyme, because its defensive role against pathogens does not fit with the decrease observed as a function of postconceptional age. The detection of α- and γ-globins was restricted to samples of all the subjects in the approximate range of 194-240 days of postconceptional age. Their detection is most probably due to their release from the preterm mucosal epithelium.
Many of the proteins identified in this study are considered to be tumor markers in the adult. This observation led us to suppose that during fetal development the interplay between these proteins might contribute to the molecular events responsible for cell growth and death. Conversely, the abnormal regulation of their expression in the adult might be at the basis of anomalous cellular growth and might be connected to the development of different tumors with embryonal etiology. Moreover, the recognition of tumor stem cells in many solid cancers has reinvigorated the hypothesis of a pluripotent stem cell as the cell of origin for cancer (56). Because these tumor stem cells have direct access to embryologic programs, including the capacity to produce proteins and peptides normally secreted only during intrauterine life (57), we can speculate that at least a part of the fetal proteins described in this study, after disappearing in postnatal life, could reappear in cancer cells. Preliminary data from our group on Tβ4 expression in salivary gland tumors and in colon cancer (58) showed Tβ4 reactivity in tumor cells undergoing epithelial-mesenchymal transition, a highly conserved cellular program typical of several stages of embryonic development as well as of cancer invasion and metastasis (59). This finding seems to further support our hypothesis.
In conclusion, by analyzing saliva, a specimen that can be collected noninvasively, this study suggests that the development of preterm newborns, and of fetuses, requires the presence of distinct proteins in variable amounts at defined stages. Some of them have been characterized in this study, but others are awaiting definitive identification. Further studies will be conducted to investigate their functions and interplay and to establish whether they derive from gland secretion (as shown for Tβ4), from the fetal oral epithelium, or from other sources.