Multiomic signals associated with maternal epidemiological factors contributing to preterm birth in low- and middle-income countries

Preterm birth (PTB) is the leading cause of death in children under five, yet comprehensive studies are hindered by its multiple complex etiologies. Epidemiological associations between PTB and maternal characteristics have been previously described. This work used multiomic profiling and multivariate modeling to investigate the biological signatures of these characteristics. Maternal covariates were collected during pregnancy from 13,841 pregnant women across five sites. Plasma samples from 231 participants were analyzed to generate proteomic, metabolomic, and lipidomic datasets. Machine learning models showed robust performance for the prediction of PTB (AUROC = 0.70), time-to-delivery (r = 0.65), maternal age (r = 0.59), gravidity (r = 0.56), and BMI (r = 0.81). Time-to-delivery biological correlates included fetal-associated proteins (e.g., ALPP, AFP, and PGF) and immune proteins (e.g., PD-L1, CCL28, and LIFR). Maternal age negatively correlated with collagen COL9A1, gravidity with endothelial NOS and inflammatory chemokine CXCL13, and BMI with leptin and structural protein FABP4. These results provide an integrated view of epidemiological factors associated with PTB and identify biological signatures of clinical covariates affecting this disease.


1. Untargeted Metabolomics by Liquid Chromatography (LC)-MS: Metabolic extracts were analyzed four times using HILIC and RPLC separation in both positive and negative ionization modes. Data were acquired on a Thermo Q Exactive HF mass spectrometer for HILIC (Thermo Fisher Scientific, Bremen, Germany) and a Thermo Q Exactive mass spectrometer for RPLC (Thermo Fisher Scientific, Bremen, Germany). RPLC experiments were performed using a Zorbax SB-Aq column (2.1 × 50 mm, 1.7 μm, 100 Å; Agilent Technologies, Palo Alto, CA) and mobile phase solvents consisting of 0.06% acetic acid in water (A) and 0.06% acetic acid in methanol (B). Data quality was ensured by (i) injecting 6 and 12 pool samples to equilibrate the LC-MS system prior to running the sequence for RPLC and HILIC, respectively, (ii) injecting a pool sample every 10 injections to control for signal deviation with time, and (iii) checking mass accuracy, retention time, and peak shape of internal standards in each sample.
2. Targeted Lipidomics using the Lipidyzer Platform: Lipid extracts were analyzed using the Lipidyzer platform, which comprises a 5500 QTRAP system equipped with a SelexION differential mobility spectrometry (DMS) interface (Sciex) and a high-flow LC-30AD solvent delivery unit (Shimadzu, Columbia, MD). Briefly, lipid molecular species were identified and quantified using multiple reaction monitoring (MRM) and positive/negative ionization switching. Two acquisition methods were employed covering 13 lipid classes; method 1 had SelexION voltages turned on while method 2 had SelexION voltages turned off. Data quality was ensured by (i) tuning the DMS compensation voltages using a set of lipid standards (cat# 5040141, Sciex) after each cleaning, more than 24 hours of idling, or 3 days of consecutive use, (ii) performing a quick system suitability test (QSST) (cat# 5040407, Sciex) before each batch to ensure an acceptable limit of detection for each lipid class, and (iii) injecting in triplicate lipids extracted from a reference plasma sample (cat# 4386703, Sciex) at the beginning of the batch.

Proteomics
The proteomic analysis was performed by Olink Proteomics (Watertown, MA) with a highly multiplexed proteomic platform using proximity extension technology (88). For this study, thirteen panels were used, each measuring 92 different proteins simultaneously in 1 μL of plasma. Each protein was detected by a matched pair of antibodies that were coupled to unique and partially complementary oligonucleotides. When the two antibodies were in close proximity, a new and protein-specific DNA reporter sequence was formed by hybridization and extension, which was then amplified and quantified by real-time PCR.
Relative amounts of protein were quantified as normalized protein expression (NPX). NPX was derived in three steps: first, the Ct value of the extension control reaction was subtracted from the raw Ct value (threshold cycle) to adjust for technical variation (dCt value); second, differences in Ct values between plates (inter-plate control) were subtracted from the dCt value to adjust for inter-assay variability (ddCt value); third, the ddCt value was subtracted from a correction factor to adjust for background noise and invert the scale. An increase of 1 NPX corresponded to a doubling of the relative protein concentration (log2 scale).
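The NPX derivation described above can be illustrated with a minimal Python sketch. This is not the vendor's implementation: the function names, argument names, and numeric inputs are hypothetical and chosen only to make the three subtraction steps and the log2 interpretation explicit.

```python
def compute_npx(ct_raw, ct_extension_control, dct_interplate_control, correction_factor):
    """Illustrative NPX calculation following the three steps in the text.

    All argument names are assumptions made for this sketch.
    """
    # Step 1: adjust for technical variation using the extension control.
    dct = ct_raw - ct_extension_control
    # Step 2: adjust for inter-assay variability using the inter-plate control.
    ddct = dct - dct_interplate_control
    # Step 3: subtract from a correction factor, which inverts the scale so
    # that a higher NPX means more protein.
    return correction_factor - ddct


def relative_fold_change(npx_a, npx_b):
    """NPX is on a log2 scale, so a difference of 1 NPX is a 2-fold change."""
    return 2.0 ** (npx_a - npx_b)


# Hypothetical Ct values: the second sample crosses threshold one cycle earlier.
npx_low = compute_npx(20.0, 12.0, 3.0, 10.0)
npx_high = compute_npx(19.0, 12.0, 3.0, 10.0)
fold = relative_fold_change(npx_high, npx_low)
```

In this toy example, a raw Ct that is one cycle lower yields an NPX that is one unit higher, which corresponds to a doubling of the relative protein concentration.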
Quality control (QC) was performed at the assay and sample level. At the assay level (internal controls), each sample was spiked with two non-human antigens (incubation control), an antibody coupled with a unique pair of DNA tags (extension control), and a double-stranded DNA amplicon (detection control) to monitor the three major procedural steps (immunoreaction, extension, and amplification/detection). At the sample level, three controls were added to each plate. A synthetic sample containing 92 antibodies with one pair of unique DNA tags in fixed proximity was added in triplicate to monitor and compensate for inter-run and inter-plate variations (inter-plate control). A negative control was added in triplicate to monitor for background noise. Finally, a pooled plasma sample was added in duplicate to monitor for intra- and inter-assay variability and determine coefficients of variation. A plate passed QC if the standard deviation of internal controls was less than 0.2 NPX. Individual samples passed QC if values of internal controls deviated by less than 0.3 NPX from the plate median. In this study, all plates passed quality control, as did 98.2% of the plasma samples. Of all assayed proteins, 84.4% were detected in more than 75% of samples. The median intra-assay coefficient of variation was 6%. Prior studies have demonstrated strong associations between this assay and ELISA analysis (e.g., (95)(96)(97)).
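The plate-level and sample-level pass criteria can be sketched as follows. This is a minimal illustration for a single internal control, assuming its NPX values are available as one list per plate. The function names and example values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify the estimator.

```python
from statistics import median, stdev

PLATE_SD_LIMIT = 0.2    # NPX, plate-level threshold stated in the text
SAMPLE_DEV_LIMIT = 0.3  # NPX, sample-level threshold stated in the text


def plate_passes_qc(control_npx):
    """Plate passes if the SD of the internal control is below 0.2 NPX.

    Uses the sample standard deviation (an assumption of this sketch).
    """
    return stdev(control_npx) < PLATE_SD_LIMIT


def samples_passing_qc(control_npx):
    """Flag each sample whose internal control deviates by less than
    0.3 NPX from the plate median."""
    plate_median = median(control_npx)
    return [abs(value - plate_median) < SAMPLE_DEV_LIMIT for value in control_npx]


# Hypothetical internal-control NPX values for five samples on one plate.
example_plate = [10.0, 10.1, 9.9, 10.05, 10.5]
flags = samples_passing_qc(example_plate)
plate_ok = plate_passes_qc(example_plate)
```

In this toy plate, the last sample lies 0.45 NPX from the plate median and is flagged. Its outlying value also pushes the plate-level standard deviation above the 0.2 NPX limit.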

Data S1. (separate file)
Independent Test Predictions

Data S2. (separate file)
Metabolomic feature data with extended analyte description