Transcriptome analysis of early pregnancy vitamin D status and spontaneous preterm birth

Background We conducted a literature review on the studies that investigated the relationship of preterm birth, including spontaneous preterm birth (sPTB), with vitamin D status. Overall, these studies demonstrated that the incidence of sPTB was associated with maternal vitamin D insufficiency in early pregnancy. However, the potential mechanisms and biological pathways are unknown. Objectives To investigate early pregnancy gene expression signatures associated with both vitamin D insufficiency and sPTB. We further constructed a network of these gene signatures and identified the common biological pathways involved. Study design We conducted peripheral blood transcriptome profiling at 10–18 weeks of gestation in a nested case-control cohort of 24 pregnant women who participated in the Vitamin D Antenatal Asthma Reduction Trial (VDAART). In this cohort, 8 women had spontaneous preterm delivery (21–32 weeks of gestation) and 17 women had vitamin D insufficiency (25-hydroxyvitamin D < 30 ng/mL). We separately identified vitamin D-associated and sPTB gene signatures at 10 to 18 weeks and replicated the overlapping signatures in the mid-pregnancy peripheral blood of an independent cohort with sPTB cases. Result At 10–18 weeks of gestation, 146 differentially expressed genes (25 upregulated) were associated with both vitamin D insufficiency and sPTB in the discovery cohort (FDR < 0.05). Of these genes, 43 (25 upregulated) were replicated in the independent cohort of sPTB cases and controls with normal pregnancies (P < 0.05). Functional enrichment and network analyses of the replicated gene signatures suggested several highly connected nodes related to inflammatory and immune responses. Conclusions Our gene expression study and network analyses suggest that the dysregulation of immune response pathways due to early pregnancy vitamin D insufficiency may contribute to the pathobiology of sPTB.


Introduction
Preterm birth (PTB), defined as delivery occurring before 37 weeks of gestation, affects up to 10% of all pregnancies, of which, 45-50% are idiopathic or spontaneous [1,2]. Spontaneous PTB (sPTB) is defined as commencement of labor with intact or prelabor rupture of membrane and birth before 37 weeks of gestation. While the risk factors and etiology of sPTB are still being investigated, several studies have investigated the association of vitamin D status with the incidence of sPTB. Several of these investigations provided evidence on the protective role of vitamin D during pregnancy in the prevention of both spontaneous and medically indicated PTB, however, a few found no association between vitamin D insufficiency and PTB [3][4][5][6][7]. These studies differ in methodology in that some investigated the impact of vitamin D supplementation, and some looked only at the association between vitamin D level (25-hydroxyvitamin D [25OHD]) during pregnancy and PTB. These studies also used varied definitions of vitamin D deficiency and sufficiency. More importantly, much of the available research on vitamin D and PTB considered vitamin D level at mid-or late pregnancy, while recent observations highlight the importance of the early vitamin D sufficiency in pregnancy and early vitamin D supplementation to rectify the insufficiency [8,9].
As such, we conducted a literature review of studies investigated the relationship between vitamin D and PTB including sPTB [9][10][11][12][13][14][15][16][17][18]. In this work and considering the results from systematic review studies and meta-analysis of these prior investigations [11][12][13][14], we investigate the potential biological pathways related to early pregnancy vitamin D sufficiency status that might be related to sPTB specifically.
Gene expression profiling can be useful for identifying pathway genes that provide insight into understanding the molecular mechanisms responsible for sPTB at early pregnancy. Previous research has looked at early pregnancy peripheral blood gene expression in patients who had preterm deliveries, each finding a set of genes that can be explored further for their roles in PTB [19,20]. Therefore, gene expression profiling could be employed as a helpful tool for exploring the biological pathways related to early pregnancy vitamin D status that may contribute to sPTB. We performed a nested case-control study in the Vitamin D Antenatal Asthma Reduction Trial (VDAART) to identify differentially expressed gene signatures associated with both vitamin D status and sPTB in early pregnancy. Systems biology approaches have revealed that disease-related genes distribute non-randomly in the protein-protein interaction network (interactome), thereby constructing a disease module [21,22]. Accordingly, we examined the connectivity and modularity of the differentially expressed genes related to early pregnancy vitamin D status and sPTB to identify potential key drivers of sPTB module.

VDAART design, participants, interventions, and oversight
VDAART (www.vdaart.com) is a randomized, double-blind, placebo-controlled clinical trial looking at the effect of vitamin D supplementation (4,000 IU vitamin D plus a multivitamin with 400 IU vitamin D daily) in comparison with a placebo (placebo pill plus a multivitamin containing 400 IU vitamin D) for pregnant women with a history of asthma or atopy. The trial aimed to determine whether vitamin D supplementation was associated with reduced incidence of asthma and recurrent wheeze in the participants' children and to determine whether vitamin D supplementation reduced the incidence of adverse pregnancy outcomes such as preeclampsia. The details of the trial and inclusion criteria are published [23]. In brief, VDAART participants were pregnant women who were non-smoker and between 18 and 40 years old who were recruited at 10 and 18 weeks of gestation from 3 study centers in the United States: Boston University Medical Center in Boston, Massachusetts; Washington University in St. Louis, and Kaiser Permanente Southern California Region in San Diego, California [23]. VDAART was approved by the IRBs of the participating institutions (Washington University in St. Louis, Boston Medical Center, Kaiser Health Care San Diego) and Brigham and Women's Hospital, and written consent was obtained from all participating pregnant women at their first enrollment visit. VDAART has been registered on ClinicalTrials.gov NCT00920621. This study is an ancillary and a nested-case control gene expression study in the VDAART cohort. Baseline maternal characteristics of the study subjects, those with sPTB and healthy controls, summarized in Table 1. The difference in proportions of study groups was compared using a Chi-Square test and two-tail P-values were reported. Student's t-test was applied as appropriate.

Study subjects in discovery cohort (VDAART)
We selected PTB cases prior to 32 weeks of gestation who had spontaneous preterm labor (sPTB) and control subjects for gene expression analysis from the participants of VDAART, excluding pregnancy-induced hypertensive cases of preterm birth or preterm birth due to chorioamnionitis. In this study, we looked specifically at non-hypertensive and un-induced cases of PTB diagnosed with sPTB given that our previous research found an association between early-pregnancy vitamin D insufficiency and preeclampsia [24].
15 participants who had PTB with available and suitable RNA were considered for our gene expression study. Of these 15 participants, 7 samples were excluded if they had PTB due to hypertension during pregnancy, an induced delivery or a positive lab test for chorioamnionitis, preterm rupture of membrane, or delivery between 32-37 weeks of gestation or abruption. Our nested case-control group (N = 24) was comprised of 8 women who had sPTBs (21-32 weeks of gestation), and 2 matched controls per woman, for a total of 16 controls (Table 1). Control subjects were women with normal pregnancy and suitable RNA who were matched with sPTB cases on maternal age (within 5 years), race, and study site. In a post-matching comparison of controls, we found no significant difference in maternal age, race, or pregnancy gestational age among the 8 subjects with sPTB and the 16 controls. Overall, in the discovery (VDAART) cases and controls, 17 subjects (71%) had insufficient vitamin D status (<30 ng/ mL, 5 with sPTB), and 7 (29%) had sufficient vitamin D status (�30ng/mL, 3 with sPTB, Table 1).

Measurement of Vitamin D (25OHD) in VDAART
Quantitative determination of 25-hydroxyvitamin D in the subjects' plasma was assessed using the FDA approved, direct, competitive chemiluminescence immunoassay (CLIA) by the DiaSorin LIAISON 25-OH Vitamin D Total assay at the Channing Division of Network Medicine. This assay is co-specific for 25-hydroxyvitamin D3 and 25-hydroxyvitamin D2. The assay utilizes a specific antibody to 25-hydroxyvitamin D for coating magnetic particles (solid phase) and a vitamin D analogue, 22-carboxy-23,24,25,26,27-pentanorvitamin D3, linked to an isoluminol derivative. During the incubation, 25-hydroxyvitamin D is dissociated from its binding protein and competes with the isoluminol labeled analogue for binding sites on the antibody. After the incubation, the unbound material is removed with a wash cycle. Subsequently, the starter reagents are added, and a flash chemiluminescent reaction is initiated. The light signal is measured by a photomultiplier as relative light units (RLU) and is inversely proportional to the concentration of 25-hydroxyvitamin D present in calibrators, controls, or samples. The inter-and intra-assay Coefficients of Variability for this assay are 11.2% and 8.1%, respectively. For this study, Vitamin D insufficiency was defined at a 25(OH)D threshold of <30 ng/mL based on the Endocrine Society's recommendations for health benefits2 and

RNA isolation and microarray processing
Total RNA was isolated from whole blood using the QIAGEN PAXgene Blood RNA Kit according to the manufacturer's protocol. The Ambion Globin Clear kit (Ambion 1 ) is used to remove alpha and beta-globin mRNA from the sample. This procedure is done for whole blood samples to increase the sensitivity of gene expression assays, by improving the detection rate of expressed genes. The RNA was quantified using Nanodrop 8000 and checked for high integrity before the preparation of cDNA (first-strand synthesis). RNA concentration and purity of the sample were assessed using the Agilent 2100 Bioanalyzer, which estimates an RNA Integrity Number (RIN). Samples with RIN � 8 were deemed acceptable and included in this analysis. Gene expression was assessed using the Affymetrix Human Gene 1.0 ST Array. Biotinylated cRNA was prepared according to the manufacturer's protocol, and hybridization was processed according to the protocol for the GeneChip Hybridization Control Kit at the Channing Division of Network Medicine, data coordinating center of the VDAART. As such using the isolated and processed RNA samples collected at 10 to 18 weeks of gestation, we created an expression set of 33,297 gene probes from 24 samples. The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE142974 available at https://www.ncbi.nlm.nih.gov/geo/query/acc. cgi?acc=GSE142974.
Quantiles of raw expression and principal components (PCs) across arrays were examined before and after background adjustment normalization and log 2 transformation using iCheck package [28] and the results were compared by running the "QCReport" function in the R library's "affyQCReport" [29]. We background adjusted, log 2 transformed, and quantile normalized the arrays by applying the robust multiarray analysis ("rma") function in R BioConductor's "affy" library [30]. Probes were annotated using the annotation package "hugene10sttranscriptcluster.db" available on Bioconductor [31]. We limited the expression set to the annotated probes for autosomal chromosomes. Then, we applied the interquartile range (IQR) filter from R Bioconductor "genefilter" package [32] to remove the expressions with variance less than 20% within arrays and accounted for sources of expression heterogeneity and confounders (i.e., batch effect) using Surrogate Variable Analysis (SVA) using the "sva package" from the Bioconductor [33]. With the finalized expression set of 15,222 probes and 24 samples (8 with sPTB), we implemented the rank prod method for identifying differentially expressed genes by using R Bioconductor package "RankProd" [34]. All statistical analyses were performed using R version 3.6.0 [35].

Replication cohort
To replicate the differentially expressed genes identified in the discovery cohort, we used a Gene Expression Omnibus (GEO) dataset that contained peripheral blood gene expression profiles from a cohort of pregnant women with and without sPTB at early and late pregnancy (17-23 and 27-33 weeks of gestation, respectively). The cohort included 51 cases with sPTB<37 weeks (10 cases had sPTB<32 weeks) matched with 114 controls who had normal pregnancies in the All Our Babies cohort in Calgary, Canada (N = 1878) [20]. The peripheral blood transcriptomes in this cohort were profiled using Affymetrix Human Gene 2.1 ST Array. We used the gene expressions from 17-23 weeks of gestation for the replication of our gene signatures and accessed the raw intensity files (.CEL) on GEO (https://www.ncbi.nlm. nih.gov/geo/query/acc.cgi?acc=GSE59491). The pre-processing of the CEL files, included background adjustment normalization and log 2 transformation, were carried out using the RMA algorithm in the R Bioconductor "affy" library. Similar to the discovery dataset and after the quality control, we limited the replication expression set to the annotated probes for autosomal chromosomes with values of IQR including 80% of the probe expressions and accounted for sources of expression heterogeneity and confounders using SVA. The resulting expression set used for replication consisted of 53617 probes and 165 subjects (51 with sPTB).

Gene expression analysis
R Bioconductor package "RankProd 2.0" [34] was used to perform differential expression analysis following quality control and adjustment for expression heterogeneity. "RankProd" is a non-parametric method that implements the Rank Product (RP) method for identifying genes that are consistently upregulated (or downregulated) in a number of replicated experiments. The advantage of the RP method is the non-parametric statistic which allows for increased performance in the case small-sample size and heterogeneity of samples [34,36]. After running the RP differential expression method on the expression dataset from our discovery cohort, we obtained two set of differentially expression gene lists (FDR < 0.05) at 10-18 weeks of gestations: one consisting of genes associated with sPTB, and the other consisting of genes associated with vitamin D insufficiency (< 30 ng/mL). The overlapping genes of the two lists were determined and analyzed for differential expression in the replication cohort. We matched the probes in the discovery (VDDART) and replication (sPTB GEO) expression datasets by converting the platform-specific probe-ID to Entrez Gene ID [37]. The replicated gene signatures (P-value < 0.05) were considered for literature curation and database annotation in association with sPTB using MetaCore from Clarivate Analytics, GeneCards [38], and dbPTB [39] biological pathway enrichment, and network analyses. Fig 1 depicts an overview of our approach and a summary of the results at each step.

Biological pathways of differentially expressed genes and their interaction in the interactome
We conducted functional enrichment of the replicated differentially expressed genes using gProfiler [40], which identifies a list of enriched functional terms from the Gene Ontology (GO) and other biological databases, ranked in order of statistical significance (FDR < 0.05). Further, we explored the connectivity and physical interaction of the replicated gene signatures by mapping them onto the human protein-protein interaction (PPI) network (interactome) using R Bioconductor package "STRINGdb" [41], which is the R interface for the STRING PPI database. We used molecular interactions with confidence score of > 0.4 demonstrating at least 50% confidence in the proposed interactions [41].The mapped gene set made up our sPTB module associated with vitamin D if demonstrated local enrichment in the PPI. To link the connected and disconnected components of the sPTB module, we looked at neighboring genes of the two submodules in the interactome and whether VDR gene is connected to the sPTB module through the direct interacting genes in its neighborhood, specifically IL-10, IL-8 and IL-6, which have immunologic roles in PTB [42].

Early pregnancy vitamin D status modulates the expression of sPTB transcriptome signatures
In the discovery cohort, 314 genes (153 upregulated) had differential expression in the peripheral blood of women with vitamin D insufficiency (25OHD levels < 30 ng/mL) relative to those with sufficiency (25OHD levels � 30 ng/mL) (FDR<0.05) at 10-18 weeks of gestation. Pregnant women who developed sPTB had 335 genes (197 upregulated) that were differentially  Table A in S1 File). The intersection of the vitamin D gene signatures and the sPTB gene signatures returned 118 overlapping genes with the same direction in expressions under each signature set. Of these overlapping gene signatures, 43 genes were differentially expressed (25 upregulated) in the peripheral blood of women with sPTB relative to those with normal pregnancy in the replication cohort (P<0.05; Table A in S1 File; Fig 1). Among these replicated genes, 31 genes (31/43[72.1%]) were previously reported in association with sPTB; 8 were found in a database search and 13 by manual curation through literature review (Table A in S1 File). Of these genes that have previously been reported in association with sPTB, we identified MMP8, HLA-DQB1, IFI44L, and several others in our curated list of 43 genes. Of those genes that have not been previously reported in association with sPTB (31/43[72.1%]), we identified CLEC12A, CLEC12B, IFIT1B, KIAA1324 among others. The full gene list with annotations is reported in S1 File, Table A.

Biological pathways of differentially expressed genes and their interaction in the interactome
In the Gene Ontology (GO) enrichment analysis 36 out 43 replicated gene signatures (83.72%) were enriched in several GO terms of immunologic functions including immune system process (N = 28) and response(N = 22), innate immune response (N = 12), ITGA2B-ITGB3 complex (N = 2), and leukocyte mediated immunity (N = 16, all corrected P < 0.05; Table B in S1 File).
Of 43 replicated differentially expressed genes, 36 (83.72%) were mapped onto the PPI with local enrichment (P < 0.001) and constructed the sPTB module (Fig 2). 20 genes in the sPTB module had direct interactions in the PPI and constructed the largest connected component (LCC) of the sPTB module. Functional enrichment of the LCC in the sPTB module can be found in S1 File, Table C. A submodule in the sPTB module consisting of 6 genes constructed the small connected component (SCC, Fig 2). Among the 26 genes in these two submodules (LCC and SCC) CEACAM8, OLFM4, IF44L, RSAD2, GYPA, ITGB3, MMP8, and OAS3 showed high degree of connectivity in their corresponding submodule and all were previously reported to be associated with sPTB (Fig 2, Table B in S1 File). VDR gene was connected to sPTB module through neighboring genes of IL-10, IL8 (CXCL8), and IL-6. The addition of these 4 genes to the sPTB module, the size of LCC was increased from 20 to 32 genes (Fig 3). Several genes exhibited a high degree of betweenness, defined as the extent to which a gene lies on the shortest path between two other genes. Genes with a high betweenness centrality and connectivity degree can be leverage points in a network system due to their control over the passage of information between network components [42]. IL10, IL8 (CXCL8), IL6, CEACAM8, HP, MMP8 and ARG1 were among the module of sPTB-associated genes that exhibited the highest betweenness centrality and connectivity degree (Fig 3).

Discussion
This transcriptome analysis is founded on the large body of research showing an association between maternal vitamin D status and risk of PTB, including sPTB, concluded from metaanalysis and systematic review of several observational studies [11,12,14,44]. A summary of the studies investigating the relationship between vitamin D and PTB can be found in Table 2.
The combined results from these studies led us to explore further the underlying mechanism through which vitamin D deficiency is linked to the incidence of preterm birth. It is important to note that there is a lack of consensus in the literature on what constitutes vitamin D sufficiency and insufficiency, which may have contributed to the inconsistency of results. Furthermore, few randomized trials have examined the effect of vitamin D supplementation on pregnancy outcomes such as PTB [14,49]. A systematic review of vitamin D supplementation and levels concerning birth outcomes indicated that such trials have small sample sizes and low dosage amounts of vitamin D [49]. No other study, to our knowledge, has explored the underlying biological mechanisms and gene signature of sPTB in relation to early pregnancy vitamin D status.
Our gene expression study returned a set of connected gene signatures related to both maternal vitamin D and sPTB. This network illuminated the connectivity of our differentially expressed genes to known vitamin D-signaling pathways and immunoinflammatory responses, thereby indicating a potential functional role of vitamin D on genes associated with sPTB. We further analyzed this network by measuring betweenness centrality for each node, or the number of shortest paths passing through a node, as well as its connectivity degree. This measurement depicted which genes in the network were the most responsible for monitoring communication between other genes in the network [22,43]. The replicated genes we identified through our differential expression analysis were enriched in biological pathways related to maternal systemic changes in immune (innate and adaptive) and inflammatory responses (Fig 2). This result was particularly interesting due to the previously established ability of vitamin D to modulate innate and adaptive immune response [50].
Three neighboring genes that exhibited high levels of betweenness centrality and connectivity were IL-6, IL-8 (CXCL8), and IL10 which all are pro-inflammatory cytokines that previous research has identified as potential biomarkers for preterm birth (Fig 3). Previous research has shown that concentrations of maternal serum IL-6 and IL-8 levels are significantly higher in women who experienced preterm labor [51,52]. Additionally, alterations in levels of antiinflammatory cytokine Interleukin 10 (IL-10), a regulator of immune responses, have been linked pregnancy-related pathologic conditions such as preterm labor [53]. IL-10 reportedly suppresses the production of pro-inflammatory cytokines, and previous studies present evidence that IL-10 mRNA and protein are upregulated in gestational tissues in normal pregnancies [53]. VDR was linked to the sPTB module and it's LCC through the neighboring 3 interleukin (IL) genes (Fig 3). This finding is in agreement with a prior study reporting that treatment of epithelial cells with 1,25(OH) 2 D 3 , a VDR agonist modulates IL-6 and IL-8 protein secretion [54]. The modulation of anti-inflammatory and pro-inflammatory cytokines is Transcriptome analysis of early pregnancy vitamin D status and spontaneous preterm birth highly influential in the preparation of the placenta and pregnancy outcome. The down-regulation of IL-6 and IL-8 by 1,25(OH) 2 D 3 , would slow down cervical ripening and allow for the expression of anti-inflammatory cytokine IL-10 [51], one of the key nodes in our expanded sPTB module. Another notable biological process depicted in our network was the up-regulation of MMP8, metalloproteinase-8. A relevant scholarship has identified elevated levels of MMP8 in amniotic fluid is predictive of preterm delivery, finding that as many as 42% of deliveries before 32 weeks of gestation are associated with an elevated MMP-8 level [55]. The upregulation of these immune-related proteins and their proximity to the VDR in the interactome provide insight into the mechanisms of how vitamin D sufficient at early pregnancy might reduce the risk of sPTB. A limitation of our gene expression study was the small sample size. We were able to identify only 8 sPTB cases (<32 weeks of gestation) who provided samples and had suitable RNA for transcriptome profiling and had the criteria for inclusion. This limitation and potential heterogeneity among the subjects might have affected the number of genes we were able to identify in our gene expression study at both discovery and replication stages which may have affected the size of sPTB module and comprehensiveness of the gene-gene interactions. The novel nature of our study made it difficult to find a replication cohort with both sPTB and vitamin D status, so we replicated subset of sPTB gene signatures that were also associated with vitamin D status in the discovery phase in an independent dataset according to the sPTB status. Similar to our discovery cohort, the sPTB cases in the replication cohort had a transcriptome profile from earlier pregnancy cohort; however, they had mostly sPTB at 32-37 weeks of gestation (N = 51, 10 cases with sPTB < 32 weeks), while our cases had sPTB < 32 weeks of gestation. The identification of a network that represents an overlapping gene module between sPTB and vitamin D gene signatures, as well as the reduced risk of sPTB as a result of vitamin D sufficiency based on collective published evidence justify the necessity for further research on this association with a larger sample size. One consideration for such studies should be the vitamin supplementation dose. In the VDAART, only 74% of the subjects who received the 4,000 IU dose of vitamin D 3 had a sufficient level of serum vitamin D (� 30 ng/mL) at 32 to 38 weeks of gestation, as compared to 82% at 1 month prior to delivery in an NICHD trial on a similar dose [56]. The results of these 2 studies highlight the latest emerging evidence that higher supplementation doses (2,000-6,000 IU/day) than the current recommendation of 400 to 600 IU/ day, particularly in the absence of adequate sun exposure, might be necessary to support sufficient levels throughout pregnancy. Furthermore, we recommend pre-pregnancy, individualized doses for women who have a predisposition for vitamin D deficiency and/or a risk of non-adherence to supplementation. These recommendations need to be definitively tested in a pre-pregnancy clinical trial in a population of women at high risk of sPTB. Finally, we believe that follow-up studies should include a differential gene expression analysis of mothers with sPTB before 37 weeks of pregnancy.
In conclusion, the peripheral blood gene expression patterns of women who had sPTB show immune gene activation at 10 to 18 weeks, suggesting the importance of vitamin D sufficiency in early pregnancy. With further knowledge about the pathways underlying sPTB, we hope to be able to identify modifiable risk factors affecting these biological pathways related to sPTB and predict which subjects are predisposed to PTB based on their vitamin D levels and those risk factors; to achieve this, a larger study looking at vitamin D levels and supplementation in early pregnancy as it relates to sPTB is necessary.
Supporting information S1 File. Replicated gene signatures of common genes with differential expression between sPTB and Vitamin D status and their literature curation (N = 43, Table A), Gene Ontology (GO) enrichment analysis of the replicated gene signatures that were mapped to the proteinprotein interaction network, i.e., sPTB module (N = 36, Table B), and GO enrichment analysis of the Largest Connected Component (LCC) of sPTB module (N = 20, Table C). (DOCX)