Time-Variant Genetic Effects as a Cause for Preterm Birth: Insights from a Population of Maternal Cousins in Sweden

Preterm delivery (PTD) is the leading cause of neonatal mortality worldwide, yet its etiology remains largely unexplained. We propose that the genetic factors controlling this trait could act in a nonuniform manner during pregnancy, with each factor having a unique “window of sensitivity.” We test this hypothesis by modeling the distribution of gestational ages (GAs) observed in maternal cousins from the Swedish Medical Birth Register (MBR) (n = 35,541 pairs). The models were built using a time-to-event framework, with simulated genetic factors that increase the hazard of birth either uniformly across the pregnancy (constant effect) or only in particular windows (varying effect). By including various combinations of these factors, we obtained four models that were then optimized and compared. Best fit to the clinical data was observed when most of the factors had time-variant effects, independently of the number of loci simulated. Finally, power simulations were performed to assess the ability to discover varying-effect loci by usual methods for genome-wide association testing. We believe that the tools and concepts presented here should prove useful for the design of future studies of PTD and provide new insights into the genetic architecture determining human GA.


Mathematical framework for generating survival times
Given a hazard function, under certain conditions it is possible to simulate the corresponding survival times analytically. The cumulative distribution of survival times ( ) ( ) follows a uniform distribution between 0 and 1 (Bender et al. 2005). Since available only under certain conditions. Austin (Austin 2012) provides several such formulas, namely, for proportionally increasing or decreasing covariates (i.e., when ( ) ) and for covariates that flip between two states, e.g., untreated → treated → untreated. The latter formulas rely on piecewise definitions of the inverse cumulative hazard function. The main limitation in using these expressions is that covariate values may only change at the knots 1 . For the simulation models of our study, this limitation would mean that the sensitivity periods of simulated risk factors must not overlap in time and have sharp "on" and "off" steps, instead of gradual increase and decrease of effect. We believe this would be a gross oversimplification of real developmental genetics. Thus, we opted to use the closed-form expression only in simulations with constant-effect genetic risk factors alone.

Parameter estimation
For each model, the following algorithm was used to estimate the lowest cost and parameters associated with it: The best cost reported for each model is the best average of costs from 20 replications performed in step 7. ∑ was chosen as the predictor for metamodeling, since we observed that it showed a quadratic relationship with the model cost, resulting in a easily 1 Some confusion may arise as the formulas elsewhere generally refer to fixed effect sizes β x and time-variant covariates x(t), while our simulations use fixed covariates G i and time-variant effect sizes, γ i (t). However, as these variables always enter the model as a product , both approaches are mathematically equivalent. identifiable minimum. (See Figure S2 for an example of this prediction.) It is also closely related to the mean genetic effect of each individual, as the expected value of a genotype composed from two 0/1 alleles is 2np.
As our models are stochastic, each simulation uses synchronized random number streams, or common random numbers (CRNs). The streams are not synchronized between replicated simulations of the same input. In this way, the observed variance of cost is reduced, and more closely reflects the "true" variance caused by parameter differences (see Kleijnen 2015 for more details on CRNs).
Number of combinations used in step 4 was chosen to maintain comparability between 2parameter and 4-parameter modelsincreasing number of tested inputs as k m , where m is the number of parameters, ensures the same sampling density for all models (Hastie et al. 2009). The choice of k was assessed by bootstrap as follows. Model M1 was simulated 10,000 times with random parameters (limited by the ∑ boundaries determined previously) to generate a distribution of costs, and 100 samples of size k were drawn from this distribution. We vary k and observe how frequently these samples contain the "true" lowest cost among the 10,000 replicates, and how far the average minimum observed per one sample of k simulations deviates from the "true" lowest cost ( Figure S3).

Sensitivity analysis
We performed a sensitivity analysis to evaluate how the fit of model M3 is affected by the choice of parameter values for varying-effect loci. Each of the 12 tested parameters was perturbed individually in both directions by -3 to +3 days (for μ) or -3 to +3 % of the best-fit value (for p and γ). Parameters for the constant-effect locus were fixed at their best-fit values. 10 iterations were performed for each parameter value. Resulting GA quantiles were averaged across the iterations to produce Figure        Mean and SD represent the average and standard deviation of costs obtained from 20 replications. γ 2 γ 3 γ 4 γ 5 mean cost SD of cost Table S6. 10 input parameter combinations resulting in the lowest costs for model M4.
Mean and SD represent the average and standard deviation of costs obtained from 20 replications.