Quantitative Age-specific Variability of Plasma Proteins in Healthy Neonates, Children and Adults

Human blood plasma is a complex biological fluid containing soluble proteins, sugars, hormones, electrolytes, and dissolved gases. Because plasma interacts with a wide array of bodily systems, changes in protein expression, or the presence or absence of specific proteins, are regularly used in the clinic as molecular biomarkers. A large body of literature details proteomic changes in pathologic contexts; however, little research has been conducted on the quantitation of the plasma proteome in healthy subjects of specific ages, especially in pediatrics. In this study, we used SWATH-MS to identify and quantify proteins in the blood plasma of healthy neonates, infants under 1 year of age, children aged 1–5 years, and adults. We identified more than 100 proteins that showed significant differential expression across these age groups, and we analyzed variation in protein expression across the age spectrum. The plasma proteomic profiles of neonates were strikingly dissimilar to those of older children and adults. By extracting the SWATH data against a large human spectral library, we increased the number of identified proteins more than 6-fold (940 proteins) and confirmed the concentrations of several of these using ELISA. The results of this study map the variation in expression of proteins and pathways often implicated in disease, and so have significant clinical implications.

Human blood plasma is a complex but ideal biological system containing proteins from a variety of cellular localizations. As such, plasma is one of the best reporter systems in the clinical setting, as it communicates with most parts of the body and reflects changes in the state of the organism as a whole. Unlike serum, which is depleted of coagulation proteins that play a role not only in coagulation but also in processes such as inflammation, plasma contains all the soluble proteins found in blood. Hence, plasma is a commonly used sample for biomarker discovery.
Proteomics studies of human plasma have, to date, focused largely on specific disease settings, on detecting the highest number of proteins, or on the effect of certain drugs on plasma protein expression. In addition, most such studies have focused on adults, with only a limited number of studies, and hence limited knowledge, of the plasma proteome in infants and children. Plasma proteomic studies in children have examined changes in protein expression that occur during cardiopulmonary bypass surgery (1), as well as in several disease states, including blood disorders (2), sickle cell disease (3), leukemia (4), and cystic fibrosis (5). To date, only one study has focused on the physiological development of the plasma proteome by examining the changes in plasma protein expression that occur with age, from neonates through to adults (6). That study used two-dimensional difference gel electrophoresis (2D-DIGE) and described the concept of aging of the human plasma proteome, identifying significant age-specific changes in expression of up to 100 protein spots.
The Human Proteome Organization (HUPO) Plasma Proteome Project (PPP) recognizes the importance of analyzing and understanding age-related differences in the plasma proteome, having identified this as one of its scientific aims and research priorities (7). In addition, the main aim of the recently established Pediatric Proteomics (PediOme) initiative, part of HUPO, is to advance the use of proteomics techniques to solve major issues in pediatric medicine by characterizing the pediatric proteome across a wide variety of tissues and biological samples.
In 2012, a new proteomics approach known as SWATH-MS became available (8). This high-throughput, data-independent acquisition (DIA) method allows confident identification and quantification of query peptides across large sample sets with a dynamic range of 4 orders of magnitude, and is therefore well suited to studies of human plasma. A particular advantage of SWATH-MS is the ability to re-mine previously acquired data sets. As it is an emerging technology, only a handful of studies have so far used SWATH-MS to investigate human plasma (9,10). The first investigated the performance and suitability of SWATH-MS for quantification of N-linked glycoproteins in human plasma and suggested that this combined approach is suitable for biomarker discovery and verification (9). The second used SWATH-MS to quantify variability in the plasma proteome of monozygotic and dizygotic twins, demonstrating large variability in protein abundance between twins (10).
The use of SWATH-MS to determine variability in neonatal and pediatric plasma, and to compare it with adult plasma, remains an exciting and, to date, unexplored research opportunity. Therefore, this study aimed to use SWATH-MS to quantitate age-specific variability of plasma proteins in healthy neonates, children, and adults.

EXPERIMENTAL PROCEDURES
Participant Recruitment, Sample Collection and Storage-Blood samples used in this study were collected and processed according to established protocols (6, 11–15). Neonatal samples (within 72 h of birth) were collected from healthy term neonates in the Family Birthing Unit or postnatal wards at the Royal Women's Hospital, Melbourne. Eligibility criteria included: gestation >37 weeks, vaginal delivery, birth weight >2500 g, APGAR score at 5 min ≥7, and absence of systemic abnormalities. Pediatric samples were obtained from healthy children attending the hospital for minor elective day surgery (e.g. tongue-tie release and circumcision) who were not receiving any medications and had no significant family history of major diseases, particularly diseases known to be associated with the process of aging (e.g. diabetes and cardiovascular disease). Adolescents taking oral contraceptives and smokers were excluded from the pediatric cohort. Adult samples were obtained from healthy volunteers aged 20 to 45 years who were not taking any medications (including oral contraceptives), were not smokers, and had no past medical history. Family history was assessed via a brief interview with the parents of the children and with the adult volunteers themselves. Blood samples were collected in S-Monovette® tubes (Sarstedt, Australia) containing 1 volume of citrate per 9 volumes of blood and centrifuged at 3000 rpm for 10 min at 10°C; plasma was stored at −80°C until testing.
This study was approved by the Royal Children's Hospital Ethics in Human Research Committee, reference number 20031, and the Royal Women's Hospital Human Research Ethics Committee (#02/08). Written informed consent was obtained from the parents of children and from adult participants themselves.
Plasma Sample Preparation-Human plasma (25 µl) was diluted in 475 µl of 50 mM ammonium bicarbonate (Sigma, St. Louis, MO). Samples were reduced with 5 mM dithiothreitol (Bio-Rad, Hercules, CA) at 65°C for 30 min, followed by alkylation with 10 mM iodoacetamide (Sigma) at room temperature for 30 min in the dark. One fifth of the reduced and alkylated sample (121 µl) was digested with sequencing-grade porcine trypsin (20 µg, Promega) overnight at 37°C. The digested sample was diluted 10.5-fold in 0.1% formic acid prior to all mass spectrometry analyses.
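As a rough check on loading, the dilution steps above can be combined to estimate how much of the original plasma reaches the column in a single 5 µl injection (the injection volume used for both IDA and SWATH-MS). This is a back-of-envelope sketch under stated assumptions: the trypsin is assumed to add negligible volume to the 121 µl digest, and the 70 mg/ml plasma protein concentration is a typical reference value, not a measurement from this study.

```python
# Back-of-envelope estimate of plasma loaded per injection.
# Assumption: trypsin adds negligible volume to the 121 µl digest.
PLASMA_UL = 25.0             # plasma taken into the preparation
FRACTION_DIGESTED = 1 / 5    # one fifth of the reduced/alkylated sample
DIGEST_VOLUME_UL = 121.0     # volume of that fifth
FINAL_DILUTION = 10.5        # dilution in 0.1% formic acid before MS
INJECTION_UL = 5.0           # volume injected per run

# Assumption: typical total plasma protein of ~70 mg/ml (= 70 µg/µl).
ASSUMED_PROTEIN_UG_PER_UL = 70.0


def plasma_per_injection_ul() -> float:
    """Plasma-equivalent volume (µl) contained in one injection."""
    plasma_in_digest_ul = PLASMA_UL * FRACTION_DIGESTED   # 5 µl of plasma
    conc_digest = plasma_in_digest_ul / DIGEST_VOLUME_UL  # µl plasma per µl digest
    conc_injected = conc_digest / FINAL_DILUTION          # after the 10.5-fold dilution
    return conc_injected * INJECTION_UL


if __name__ == "__main__":
    v_ul = plasma_per_injection_ul()
    print(f"~{v_ul * 1000:.1f} nl plasma per injection")
    print(f"~{v_ul * ASSUMED_PROTEIN_UG_PER_UL:.1f} µg protein (assumed concentration)")
```

Under these assumptions, roughly 20 nl of plasma, on the order of 1 µg of protein, is loaded per injection.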
Information Dependent Acquisition Mass Spectrometry-A TripleTOF 5600 mass spectrometer (Sciex, Framingham, MA) coupled to an Eksigent Ultra-nanoLC-1D system (Eksigent Technologies, Dublin, CA) was employed for both IDA and SWATH-MS analysis. For IDA, 10 µl of each digested sample was pooled by age group, making four group pools (neonate, <1 year, 1–5 years, adult), and 5 µl of each pool was injected.
Peptides were loaded onto a reverse-phase C18 peptide CapTrap (Bruker, Billerica, MA) for preconcentration and desalted for 5 min with loading buffer (0.1% (v/v) formic acid, 2% (v/v) acetonitrile) at a flow rate of 10 µl per minute. After desalting, the peptide trap was switched in-line with an in-house packed analytical column (150 µm × 10 cm, solid core Halo C18, 160 Å, 2.7 µm (Bruker)). Peptides were eluted from the column and separated using a gradient of buffer B (99.9% (v/v) acetonitrile, 0.1% (v/v) formic acid) starting at 2% and increasing to 10% in 10 min, then to 35% over 78 min, at a flow rate of 500 nL per minute. After peptide elution, the column was flushed with 95% buffer B for 15 min and re-equilibrated with 98% buffer A (0.1% (v/v) formic acid) for 15 min before the next injection. Blanks were run between samples. In IDA mode, a TOF MS survey scan was acquired at m/z 350–1500 with 0.25 s accumulation time, and the ten most intense precursor ions (charge states 2+ to 5+; counts >150) in the survey scan were consecutively isolated for subsequent product ion scans. Dynamic exclusion was used with a window of 20 s. Product ion spectra were accumulated for 50 ms in the mass range m/z 100–1500 with rolling collision energy.
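The IDA precursor-selection rule described above (top ten precursors per survey scan, charge 2+ to 5+, more than 150 counts, 20 s dynamic exclusion) can be sketched as follows. This is a simplified illustration, not the instrument's acquisition logic: the `Precursor` class, the `select_precursors` function and the ±0.05 m/z exclusion tolerance are hypothetical, and real acquisition software also handles isotope clusters and ppm-based tolerances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precursor:
    mz: float
    charge: int
    intensity: float  # survey-scan counts


def select_precursors(survey, excluded_until, now_s, top_n=10, min_counts=150,
                      charges=range(2, 6), exclusion_s=20.0, mz_tol=0.05):
    """Choose up to top_n precursors from one survey scan for MS/MS.

    excluded_until maps a previously selected m/z to the time (s) at which its
    exclusion expires; it is updated in place for every newly selected ion.
    mz_tol is a hypothetical matching tolerance for the exclusion list.
    """
    def is_excluded(mz):
        return any(abs(mz - ex_mz) <= mz_tol and now_s < expires
                   for ex_mz, expires in excluded_until.items())

    candidates = [p for p in survey
                  if p.charge in charges
                  and p.intensity > min_counts
                  and not is_excluded(p.mz)]
    # Most intense first; keep at most top_n.
    chosen = sorted(candidates, key=lambda p: p.intensity, reverse=True)[:top_n]
    for p in chosen:
        excluded_until[p.mz] = now_s + exclusion_s
    return chosen


if __name__ == "__main__":
    scan = [Precursor(500.25, 2, 1e4),   # eligible
            Precursor(600.30, 1, 5e4),   # rejected: singly charged
            Precursor(700.40, 3, 100)]   # rejected: below 150 counts
    exclusion = {}
    for t in (0.0, 5.0, 25.0):
        print(t, [p.mz for p in select_precursors(scan, exclusion, now_s=t)])
```

In the usage example the eligible ion is selected at 0 s, suppressed at 5 s while it is still on the exclusion list, and selected again at 25 s once the 20 s window has expired.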
IDA Data Analysis-Protein identification from IDA data was performed with ProteinPilot (v4.2, Sciex) using the Paragon algorithm. The search parameters were as follows: sample type: identification; Cys alkylation: iodoacetamide; digestion: trypsin; instrument: TripleTOF 5600; special factors: none; ID focus: biological modifications; missed cleavages: two. The Homo sapiens search database was obtained from UniProt (20,265 entries, Jan 2015). A reversed-decoy database search strategy was used in ProteinPilot, giving a calculated protein FDR of 0.68%.
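The reversed-decoy strategy estimates the false discovery rate from the number of decoy hits that pass a score threshold. The sketch below shows the generic target–decoy estimate (decoys divided by targets above a threshold); it illustrates the principle under that simple assumption and is not ProteinPilot's own FDR calculation. The function names are hypothetical.

```python
def decoy_fdr(scores, is_decoy, threshold):
    """Estimated FDR among hits scoring >= threshold: #decoys / #targets."""
    passing = [d for s, d in zip(scores, is_decoy) if s >= threshold]
    decoys = sum(passing)
    targets = len(passing) - decoys
    # Guard against division by zero when only decoys (or nothing) pass.
    return decoys / max(targets, 1)


def threshold_for_fdr(scores, is_decoy, max_fdr=0.01):
    """Lowest score threshold whose estimated FDR is <= max_fdr (None if none)."""
    for t in sorted(set(scores)):
        if decoy_fdr(scores, is_decoy, t) <= max_fdr:
            return t
    return None


if __name__ == "__main__":
    scores = [10, 9, 8, 7, 6, 5]
    decoy = [False, False, False, True, False, True]
    print(decoy_fdr(scores, decoy, 6))       # 1 decoy among 4 targets
    print(threshold_for_fdr(scores, decoy))  # lowest threshold meeting 1% FDR
```

Scanning thresholds from low to high and taking the first one that meets the target is a simplification; production tools usually report monotone q-values instead.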
SWATH Library Construction-The ProteinPilot group file from the IDA search of the four pooled sample groups was imported into PeakView (v2.1, Sciex) and used as a local peptide assay library. This library contained 151 proteins identified from undepleted plasma samples. An extended peptide assay library was constructed using the SWATHXtend script (16), which merged the local peptide assay library with a 10,000-protein human spectral library (17) downloaded from SWATHAtlas.
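Merging an external library with a local one requires mapping the external retention times onto the local chromatographic scale. A minimal sketch of that idea is shown below. It assumes each library is reduced to a peptide-to-retention-time dictionary (real assay libraries also carry fragment ions) and that a linear fit over shared peptides is adequate. It is not the SWATHXtend implementation, and the minimum shared-peptide count and R² threshold are illustrative assumptions.

```python
import numpy as np


def merge_libraries(local, external, min_shared=10, min_r2=0.8):
    """Merge peptide -> retention time (RT) libraries onto the local RT scale.

    Local entries are kept unchanged; external-only peptides are added with
    RTs predicted from a linear fit over peptides found in both libraries.
    """
    shared = sorted(set(local) & set(external))
    if len(shared) < min_shared:
        raise ValueError("too few shared peptides for RT alignment")
    x = np.array([external[p] for p in shared])
    y = np.array([local[p] for p in shared])
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    if r2 < min_r2:
        raise ValueError(f"RT alignment too poor (R^2 = {r2:.2f})")
    merged = dict(local)
    for peptide, rt in external.items():
        if peptide not in merged:
            merged[peptide] = float(slope * rt + intercept)
    return merged


if __name__ == "__main__":
    external = {f"PEP{i}": float(i) for i in range(30)}     # hypothetical RT units
    local = {f"PEP{i}": 2.0 * i + 5.0 for i in range(20)}  # hypothetical local minutes
    merged = merge_libraries(local, external)
    print(len(merged), round(merged["PEP25"], 2))
```

Rejecting merges with a poor fit is the key safeguard: a badly aligned external library would place extraction windows at the wrong retention times.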
SWATH-MS-For SWATH-MS experiments, 5 µl of each digested sample was injected in sample order, with two technical replicate injections per sample; one sample was run three times, once at the beginning and twice at the end of the series, to check for potential instrument drift, as detailed in the Supplemental Information. LC conditions were identical to those described above, and m/z window sizes were determined from precursor m/z frequencies in the previous IDA data. A SWATH variable-window acquisition scheme of 60 overlapping windows (1 amu overlap) was constructed covering the mass range m/z 399.5–1249.5. In SWATH mode, a TOF MS survey scan was first acquired (m/z 350–1500, 0.05 s), and then the 60 predefined m/z ranges were sequentially subjected to MS/MS analysis. Product ion spectra were accumulated for 50 ms in the mass range m/z 350–1500 with rolling collision energy optimized for the lower m/z bound of each window plus 10%, giving a total duty cycle of 3.7 s.
SWATH Data Analysis-SWATH peaks were extracted using PeakView (v2.1). Shared and modified peptides were excluded. Peak extraction parameters were: 100 peptides per protein, 6 transition ions per peptide, peptide confidence threshold 99%, extraction FDR threshold 1%, extracted ion chromatogram (XIC) retention time window 10 min, and mass tolerance 75 ppm. The extracted transition ion peak areas, peptide peak areas and protein peak areas were exported to Excel for further statistical analysis.
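The text states that SWATH window widths were derived from precursor m/z frequencies. One common way to do this, shown below as a hedged sketch, is to place window edges at quantiles of the precursor m/z distribution so that each window contains roughly the same number of precursors. The function name, the equal-frequency rule and the convention of extending each window upward by 1 amu are assumptions for illustration; vendor window calculators may also enforce minimum widths.

```python
import numpy as np


def variable_windows(precursor_mz, n_windows=60, lo=399.5, hi=1249.5, overlap=1.0):
    """Equal-frequency SWATH isolation windows from observed precursor m/z.

    Interior edges sit at evenly spaced quantiles of the precursor m/z
    distribution, so dense m/z regions get narrow windows and sparse regions
    get wide ones. Each window except the last is extended upward by
    `overlap` so that neighbouring windows overlap by that amount.
    """
    mz = np.asarray(precursor_mz, dtype=float)
    mz = mz[(mz >= lo) & (mz <= hi)]
    inner = np.quantile(mz, np.linspace(0.0, 1.0, n_windows + 1)[1:-1])
    edges = np.concatenate(([lo], inner, [hi]))
    windows = []
    for i in range(n_windows):
        end = edges[i + 1] + (overlap if i < n_windows - 1 else 0.0)
        windows.append((float(edges[i]), float(end)))
    return windows


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical precursor list: dense at low m/z, sparse at high m/z.
    mz = np.concatenate([rng.uniform(400, 600, 4000), rng.uniform(600, 1250, 1000)])
    w = variable_windows(mz)
    print(len(w), w[0], w[-1])
```

With the skewed example distribution, the first windows come out only a few m/z units wide while the last spans tens of units, which is the intended effect of frequency-based sizing.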
Experimental Design and Statistical Rationale-Quantitative MS data were obtained from ten independent biological samples for each of the four age groups. Two technical replicates were acquired by SWATH for each biological sample. The peptide ion peak areas were averaged across the replicate technical injections, then scaled so that the summed ion intensity of each sample equaled the maximum total ion intensity across samples. The normalized ion intensities were summed for each peptide and protein. The full protein-level data set was clustered using hierarchical clustering (Euclidean distance and complete linkage) and visualized using principal component analysis to examine global trends. The data distribution was checked visually using boxplots and density plots (Supplemental Information: data quality).
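The normalization and global analyses described above can be sketched as follows, assuming ion-level peak areas are arranged as a samples-by-ions array for each technical replicate. The function names are hypothetical, and applying a log2 transform before clustering and PCA is an assumption (the text specifies log transformation only for the differential expression tests). Summing ions directly into proteins gives the same protein totals as summing first into peptides.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def normalize(rep1, rep2):
    """Average two technical replicates, then scale every sample so that its
    total ion intensity equals the largest total across samples.
    rep1, rep2: arrays of shape (n_samples, n_ions)."""
    areas = (np.asarray(rep1, float) + np.asarray(rep2, float)) / 2.0
    totals = areas.sum(axis=1, keepdims=True)
    return areas * (totals.max() / totals)


def to_protein(ion_areas, ion_to_protein):
    """Sum normalized ion areas into protein areas (columns follow sorted IDs)."""
    proteins = sorted(set(ion_to_protein))
    col = {p: i for i, p in enumerate(proteins)}
    out = np.zeros((ion_areas.shape[0], len(proteins)))
    for j, prot in enumerate(ion_to_protein):
        out[:, col[prot]] += ion_areas[:, j]
    return proteins, out


def cluster_and_pca(protein_areas, n_components=2):
    """Complete-linkage Euclidean clustering of samples plus PCA via SVD,
    both on log2-transformed protein areas (log scale is an assumption)."""
    logged = np.log2(protein_areas)
    tree = linkage(logged, method="complete", metric="euclidean")
    centered = logged - logged.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return tree, scores, explained
```

The linkage matrix can be passed to `scipy.cluster.hierarchy.dendrogram` for plotting, and the first two columns of `scores` give the PCA coordinates of each sample.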
Two approaches were used to determine differential expression: an analysis of variance (ANOVA) on the log-transformed normalized protein peak areas, and pairwise comparisons between age groups. For the ANOVA, proteins were deemed differentially expressed if the p value was less than 0.05 and the maximum protein fold change exceeded 1.5. We evaluated adjusting the ANOVA p values for multiple testing using the false discovery rate (FDR) procedure of Benjamini and Hochberg; however, the combination of the unadjusted p value with the fold change threshold was used because it represented a stricter requirement in this case.
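The ANOVA screen with a maximum fold-change filter, together with the Benjamini and Hochberg adjustment that was evaluated for comparison, can be sketched as below. The input layout (one samples-by-proteins array per age group) and the function names are assumptions, as is computing fold changes from linear-scale group means; the text uses both "exceeded" and "at least" for the 1.5 threshold, and `>=` is used here.

```python
import numpy as np
from scipy.stats import f_oneway


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values (step-up FDR procedure)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def anova_screen(groups, alpha=0.05, min_fold=1.5):
    """ANOVA on log2 protein areas plus a maximum fold-change filter.

    groups: list of arrays, one per age group, each (n_samples, n_proteins)
    of normalized (linear-scale) protein areas.
    """
    logs = [np.log2(g) for g in groups]
    pvals = np.array([f_oneway(*[lg[:, j] for lg in logs]).pvalue
                      for j in range(logs[0].shape[1])])
    means = np.vstack([g.mean(axis=0) for g in groups])
    max_fold = means.max(axis=0) / means.min(axis=0)  # highest / lowest group average
    hits = (pvals < alpha) & (max_fold >= min_fold)
    return pvals, bh_adjust(pvals), max_fold, hits
```

Returning both raw and adjusted p values makes it easy to compare the two selection rules, as the text describes.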
For the pairwise comparisons, samples from each selected pair of age groups were compared, first by a two-sample Student's t test of the log-transformed protein areas, and second by combining individual peptide-level ratios for each protein. The reporting threshold required a Student's t test p value less than 0.05 and a fold change exceeding 1.5.
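The first of the two pairwise approaches (the Student's t test with a fold-change threshold) might look like the sketch below; the peptide-ratio approach is not reproduced because its combination rule is not specified in the text. Computing fold changes from linear-scale group means is an assumption, and `pairwise_screen` is a hypothetical name.

```python
import numpy as np
from scipy.stats import ttest_ind


def pairwise_screen(group_a, group_b, alpha=0.05, min_fold=1.5):
    """Student's t test on log2 protein areas with a fold-change filter.

    group_a, group_b: (n_samples, n_proteins) normalized linear-scale areas.
    Returns p values, fold changes (B relative to A) and up/down masks.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # equal_var=True (the default) gives the classic Student's t test.
    pvals = ttest_ind(np.log2(a), np.log2(b), axis=0, equal_var=True).pvalue
    fold = b.mean(axis=0) / a.mean(axis=0)
    large = np.maximum(fold, 1.0 / fold) > min_fold
    significant = (pvals < alpha) & large
    return pvals, fold, significant & (fold > 1), significant & (fold < 1)
```

Testing on the log scale while thresholding fold change on the linear scale mirrors the two criteria stated in the text.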
For ELISA testing we used the same 40 independent biological samples utilized for the SWATH analysis, as well as 10 additional individuals for each age group. A two-sample Student's t test was applied to detect significant differences between groups.
Pathway Analysis-Differentially expressed proteins were examined in the context of biological data using the Ingenuity Pathway Analysis tool with the abundance ratio of neonates to adults as the featured observation, and using default settings for determining enrichment. The top five most enriched pathways, functions and diseases and the top three networks were tabulated and described.

RESULTS
SWATH-MS-We used nondepleted plasma samples from the study cohort and IDA to establish a local peptide assay library containing 146 proteins. SWATH acquisitions of the 40 individual samples were obtained and matched to the local assay library for quantitation of peptide areas (supplemental Table S1).
Overall Differences Across the Age-spectrum From Neonates to Adults-Hierarchical clustering of the 40 plasma samples using SWATH quantitation of 146 proteins showed well-separated clusters corresponding to the experimental age groups. Notably, plasma protein levels in neonates were clearly distinct from those of the three remaining age groups, with adults and 1–5-year-olds clustering next to each other (Fig. 1). These trends were confirmed by principal component analysis, which showed clear segregation by age, with neonates separated from all other age groups.
Given the strong separation of the samples in the unsupervised analyses (clustering and PCA), it is not surprising that a large proportion of the proteins were differentially expressed by ANOVA. A total of 107 proteins had an ANOVA p value <0.05 and a maximum fold change (ratio of the highest to the lowest group average) of at least 1.5 (Table I). This was a stricter criterion than a Benjamini and Hochberg adjusted p value <0.05, which identified 121 proteins (data not shown). Maximum fold changes ranged from 1.51 for apolipoprotein M to more than 300-fold for hemoglobin subunit gamma-2, which is significantly elevated in neonates. Specifically, 86 (80%) proteins had a maximum fold change between 1.5- and 5-fold; 11 (10%) between 5- and 10-fold; 6 (6%) between 10- and 50-fold; and 4 (4%) greater than 50-fold.
Hierarchical clustering of these 107 proteins showed a pattern similar to that observed in the unsupervised clustering of the overall data (Fig. 2A). These proteins could be grouped into six specific expression clusters (Fig. 2B); the proteins in each cluster are listed in supplemental Table S2. The most interesting were clusters 2 and 4, with 25 and 30 proteins, respectively, showing increased expression with age, and clusters 1 and 5, with 14 and 9 proteins, respectively, showing decreased expression with age. Proteins in these clusters represent the KEGG complement and coagulation cascades pathway, and the most significant gene ontology terms were acute inflammatory response, protein activation cascade, complement activation, and regulation of humoral immune response.
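Grouping the differentially expressed proteins into expression clusters can be sketched by cutting a dendrogram built on each protein's age profile. The sketch below assumes correlation distance and complete linkage (as stated for the Fig. 2 heatmap) applied to per-group mean log areas, and cuts the tree into a fixed number of clusters; the use of group means, the `maxclust` cut and the slope-based trend summary are assumptions, and `trend_clusters` is a hypothetical name.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def trend_clusters(group_log_means, n_clusters=6):
    """Cluster proteins by the shape of their age profile.

    group_log_means: (n_proteins, n_groups) mean log areas, groups ordered by
    age. Correlation distance groups proteins whose profiles rise and fall
    together, regardless of absolute abundance.
    """
    profiles = np.asarray(group_log_means, dtype=float)
    tree = linkage(profiles, method="complete", metric="correlation")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # Summarize each cluster's trend as the slope of its mean profile over age rank.
    ages = np.arange(profiles.shape[1])
    trend = {int(c): float(np.polyfit(ages, profiles[labels == c].mean(axis=0), 1)[0])
             for c in np.unique(labels)}
    return labels, trend
```

A positive slope marks a cluster whose members rise with age (like clusters 2 and 4 above) and a negative slope one whose members fall with age (like clusters 1 and 5).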
Ingenuity Pathway Analysis-The top enriched pathways for the 107 differentially expressed proteins (based on the ANOVA) are shown in Fig. 3. Although 14 pathways were represented by the differentially expressed proteins, the top five pathways in order of significance were: Acute phase response signaling (18.1% overlap); LXR/RXR Activation (19.5% overlap); FXR/RXR Activation (17.4% overlap); Complement system (34.2% overlap); and Coagulation system (34.3% overlap). The top three networks (Fig. 4) were Hematological system development and function, Organismal functions, Developmental disorder (Fig. 4A); Neurological disease, Lipid metabolism, Molecular transport (Fig. 4B); and Humoral immune response, Inflammatory disease, Immunological disease (Fig. 4C). Diseases and disorders, as well as physiological system development and functions identified to be of significance, are listed in Table II.
Extending the PediOme With a Large Repository Based Spectral Library-Because it is notoriously difficult to generate an extensive protein profile of nondepleted plasma by MS, owing to the presence of many highly abundant proteins, we evaluated whether more proteins could be detected by searching the SWATH data against a large, repository-based human spectral library containing spectra for more than 10,000 human proteins (17). We used SWATHXtend software to merge this library with our local library, ensuring reliable retention time alignment for extraction of the SWATH data. With this approach, using nondepleted, nonfractionated plasma, we extracted quantitative information for 940 proteins (extraction FDR 1%, two or more peptides required) (supplemental Table S3, Fig. 1C), a more than 6-fold increase compared with the locally generated IDA-based spectral library. We selected two proteins, periostin (POSTN) and GPI-specific phospholipase D (GPLD1), which had high-quality peptide spectral matches across multiple peptides in the extended library. These proteins are reported to be secreted but are commonly under-reported in proteomic analyses of plasma, so we used ELISA to confirm their presence and concentrations. The expression patterns of the five proteins with the largest age-specific changes in expression are shown in Fig. 5: hemoglobin subunit gamma-1 (HBG1), hemoglobin subunit gamma-2 (HBG2), collagen alpha-1(I) chain (CO1A1), Ig alpha-1 chain C region (IGHA1), and Ig alpha-2 chain C region (IGHA2).
ELISA Validation-ELISA confirmed the expression of periostin and GPLD1 in plasma samples from each age group. GPLD1 expression (Fig. 6A) was greatest in adults, with abundance increasing with age (40 µg/ml in adults, 18 µg/ml in neonates). Periostin showed the inverse trend, but at much lower levels (0.6 µg/ml in neonates, 0.01 µg/ml in adults). Periostin was significantly downregulated (Fig. 6B) in all age groups compared with neonates, and significantly downregulated in adults compared with the <1-year and 1–5-year groups. This agrees with the SWATH data and demonstrates that an extended human library is a useful strategy for deeper plasma profiling.
We also validated the expression of two other proteins, and our findings agreed with the SWATH data. Alpha-2-macroglobulin (A2M) was significantly downregulated (Fig. 6C) in all age groups (<1 year, 1–5 years and adults) compared with neonates, and significantly up-regulated in the 1–5-year group compared with adults. Histidine-rich glycoprotein (HRG) (Fig. 6D) was up-regulated in the 1–5-year group compared with neonates, but downregulated in the <1-year and adult groups.
Pairwise Comparisons of Protein Expression Using an Extended Plasma Library-Pairwise comparisons between specific age groups were also performed. For each comparison below, the numbers of differentially expressed proteins are given for the local and extended libraries, respectively: neonates compared with <1-year-olds (supplemental Table S4), 26 and 25 proteins downregulated and 17 and 23 up-regulated; neonates compared with 1–5-year-olds (supplemental Table S5), 35 and 38 downregulated and 8 and 11 up-regulated; neonates compared with adults (supplemental Table S6), 25 and 24 downregulated and 9 and 18 up-regulated; <1-year-olds compared with 1–5-year-olds (supplemental Table S7), 15 and 16 downregulated and 0 and 2 up-regulated; <1-year-olds compared with adults (supplemental Table S8), 15 and 20 downregulated and 9 and 12 up-regulated; and 1–5-year-olds compared with adults (supplemental Table S9), 8 and 7 downregulated and 11 and 15 up-regulated.

DISCUSSION
Although a previous study has assessed the aging of the plasma proteome in pediatric and adult populations (6), this is the first study to use SWATH-MS to quantify and analyze variability in protein expression in both adult and pediatric age-specific contexts. An added advantage of this methodology was the ability to use Ingenuity Pathway Analysis to elucidate enriched biochemical pathways and pathways involved in disease processes. Indeed, ANOVA of protein expression across multiple age groups shows that proteins involved in hematopoietic development and function, the immune system, and physical growth undergo significant change as the plasma proteome ages.
We identified 107 proteins that varied in abundance across the four age groups examined. The most significant, as identified by ANOVA, play roles in hematopoietic development and immune function. Hemoglobin subunit gamma-2 (HBG2), encoded by the HBG2 gene, has been shown to play an important role in the development of the hematopoietic system (18). In the developing fetus, HBG1 and HBG2 encode two gamma-globin chains which, together with alpha chains, make up fetal hemoglobin (HbF). After birth, HbF is progressively replaced by adult hemoglobin (HbA), a process that continues over the first 6 months of life (19). As shown in Table I, mean HBG2 expression decreased significantly with age as HbA replaced HbF. Similarly, collagen alpha-1(I) chain (CO1A1), a component chain of type I collagen, showed decreasing expression from neonates to adults. As type I collagen is abundant in bone, tendons and the dermis (20), a likely explanation is that CO1A1 expression is high early in life as a result of growth, and is then downregulated to a maintenance level in adulthood. In contrast, Ig alpha-1 chain C region (IGHA1), which forms part of IgA antibodies (21), showed higher expression as the proteome aged, with the largest changes observed between the neonatal and 1–5-year groups. This suggests that the immune system develops rapidly in the neonatal to early childhood period and continues to mature through adolescence and adulthood.
An advantage of using SWATH-MS was the ability to re-mine the data using a significantly larger assay library. Here we took advantage of an extensive human assay library containing spectra from over 10,000 proteins and re-extracted quality-matching peptides from SWATH data of undepleted, nonfractionated plasma. This increased the number of quantified proteins more than 6-fold, enabling more thorough quantification of the plasma proteome and representing the largest set to date of SWATH-quantified proteins in nondepleted human plasma. ELISA validation of two putatively secreted proteins, periostin and GPLD1, confirmed their SWATH-measured levels in plasma. Liu et al. (10) used SWATH analysis of human plasma to quantify 342 proteins in 232 plasma samples collected longitudinally from pairs of twins. They detected GPLD1 but not periostin, and the annotated subcellular localizations of these 342 proteins were varied. These included nuclear proteins (e.g. transcription elongation regulator 1 (TCERG1) and heterogeneous nuclear ribonucleoproteins), mitochondrial proteins (e.g. GRP75, trifunctional enzyme subunit alpha (ECHA) and adenylate kinase 2) and Golgi proteins (e.g. Golgi membrane protein GP73), among others with no obvious functional role in circulation. Given these observations, there is good reason to use the large-scale, proteome-wide library approach shown here, as it readily increases the depth of plasma proteome coverage by LC-MS without immunodepletion. Interestingly, a related approach enabled quantitation of over 1500 proteins in mouse plasma, many of which were of cellular origin (22). Nevertheless, the detection of cellular proteins in plasma should be validated by secondary approaches. Type I errors (false-positive identifications) cannot be excluded when using large-scale libraries of this heterogeneous nature; one strategy is to add stringency by requiring evidence from multiple peptides, in this case at least two per protein.
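The two-peptide requirement used to limit false-positive protein identifications can be expressed as a simple filter. This sketch assumes the extraction results have been flattened to (protein, peptide, FDR) tuples and that a peptide counts as evidence if its extraction FDR is at most 1%; the input layout and function name are hypothetical, and how per-sample FDRs are combined into a single value is left open.

```python
from collections import defaultdict


def proteins_passing(peptide_rows, max_fdr=0.01, min_peptides=2):
    """Return proteins supported by at least `min_peptides` distinct peptides
    that pass the extraction FDR threshold.

    peptide_rows: iterable of (protein_id, peptide_sequence, fdr) tuples, a
    hypothetical flattened form of a per-peptide extraction export.
    """
    supporting = defaultdict(set)
    for protein, peptide, fdr in peptide_rows:
        if fdr <= max_fdr:
            supporting[protein].add(peptide)  # a set counts each peptide once
    return {prot for prot, peps in supporting.items() if len(peps) >= min_peptides}
```

Counting distinct peptide sequences, rather than rows, prevents a single peptide reported in several samples from satisfying the two-peptide rule on its own.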
We used pathway knowledge to identify the most enriched pathways, functions and diseases. Acute phase response signaling, involved in the acute nonspecific immune response, was the most enriched canonical pathway. Together with the evidence that IGHA1 expression increased with age, this further indicates that the immune system is dynamic and continuously evolving. Moreover, the identification of IL-6 (a cytokine involved in the inflammatory response and a mediator of the acute phase response (23)) as the top upstream regulator, and of the inflammatory response and the humoral immune response as the most enriched disease context and physiological function, respectively, further supports this contention.
FIG. 2. A, Heatmap of the 107 differentially expressed proteins from the local assay library identified using ANOVA. The clustering patterns were obtained using a correlation-based distance and complete linkage. Blue, proteins with decreased expression; red, proteins with increased expression. B, Clusters and protein abundance trends obtained by hierarchical clustering of the differentially expressed proteins identified using ANOVA. The x-axis represents the four age groups (1, neonates; 2, <1-year-olds; 3, 1–5-year-olds; 4, adults). The y-axis represents the log normalized area of each protein (gray), with the overall mean shown as a colored line with standard deviation (2SD) bars. The separation into three clusters was based on cutting the dendrogram at a distance of 0.1 and coloring the resulting sample clustering.
Quantitation of age-specific variability of plasma proteins has clear and relevant clinical applications. Although changes in protein expression in disease-specific contexts have been well characterized in the literature, as noted earlier in this paper, identification of expression changes in age-specific contexts is a potentially powerful clinical tool. For example, blood disorders such as beta-thalassemias and related conditions often feature persistent production of the HBG2 gamma chain well into early childhood (24). Thus, proteomic analysis of patient plasma against age-specific protein reference ranges may assist clinicians in diagnosis or help to determine the severity of the relevant pathology.
In this study we have shown that the plasma proteome undergoes significant change with age. Apart from identifying proteins with various functional roles in systems such as hematopoietic development, the immune response and growth, we also identified the most highly enriched pathways, upstream regulators, disease contexts and functions relative to the default background. Future studies should be directed at further elucidating age-specific variability in protein expression and, importantly, its links to disease processes.
FIG. 5. Age-specific expression patterns of five proteins of interest selected for discussion. Expression patterns for the five proteins with the largest age-specific changes in expression. A, Hemoglobin subunit gamma-1 (HBG1); B, Hemoglobin subunit gamma-2 (HBG2); C, collagen alpha-1(I) chain (CO1A1); D, Ig alpha-1 chain C region (IGHA1); and E, Ig alpha-2 chain C region (IGHA2).