Bayesian Functional Data Modeling for Heterogeneous Volatility

Although there are many methods for functional data analysis, less emphasis is put on characterizing variability among volatilities of individual functions. In particular, certain individuals exhibit erratic swings in their trajectory while other individuals have more stable trajectories. There is evidence of such volatility heterogeneity in blood pressure trajectories during pregnancy, for example, and reason to suspect that volatility is a biologically important feature. Most functional data analysis models implicitly assume similar or identical smoothness of the individual functions, and hence can lead to misleading inferences on volatility and an inadequate representation of the functions. We propose a novel class of functional data analysis models characterized using hierarchical stochastic differential equations. We model the derivatives of a mean function and deviation functions using Gaussian processes, while also allowing covariate dependence including on the volatilities of the deviation functions. Following a Bayesian approach to inference, a Markov chain Monte Carlo algorithm is used for posterior computation. The methods are tested on simulated data and applied to blood pressure trajectories during pregnancy.


Introduction
Multi-subject functional data arise frequently in many fields of research, including epidemiology, clinical trials and environmental health. Sequential observations are collected over time for multiple subjects, and can be treated as being generated from a smooth trajectory contaminated with noise. There is a rich variety of methods available for characterizing variability and covariate dependence in functional data, ranging from hierarchical basis expansions to functional principal components analysis. In defining models for functional data and in interpreting variability in trajectories, either unexplained or due to covariates, the emphasis has been on differences in the level and local trends. Dynamic features, such as velocity, acceleration and especially volatility, are also important but have received less attention, with exceptions in the study of growth data (Ramsay and Silverman, 2002) and in finance (Heston, 1993; Jacquier et al., 2002; Shephard, 2005; Barndorff-Nielsen and Shephard, 2012; Horváth et al., 2014).
Analysis of functional data dynamics studies temporal changes in trajectories and effects of covariates on these changes. For example, Wang et al. (2008) used linear differential equations to model price velocity and acceleration in eBay auctions and explored the auction subpopulation effect. Müller and Yao (2010) modeled the velocity of online auction bids using empirical stochastic differential equations with time-varying coefficients and a smooth drift process. Zhu et al. (2011) inferred the rate functions of prostate-specific antigen profiles using a semiparametric stochastic velocity model, in which rate functions are regarded as realizations of Ornstein-Uhlenbeck processes dependent on covariates of interest.
This article investigates a different dynamic feature, the volatility, which measures the conditional variance of trajectory changes over an infinitesimal time interval. We propose a stochastic volatility regression model, with Gaussian process priors used for the group mean and subject-specific deviation functions through stochastic differential equations. We further accommodate inference on covariate effects on volatility by allowing the diffusion term of the stochastic differential equations for the deviation functions to depend on covariates. Although volatility has been extensively studied through stochastic volatility models in finance, the setting, model specifications and data features are distinct from ours. Stochastic volatility models typically focus on a single volatility process which is time-varying and associated with a price process for high-frequency finance data. More relevant is the literature on multivariate stochastic volatility models; for recent references, refer to Loddo et al. (2011), Van Es and Spreij (2011), Müller et al. (2011), Ishihara and Omori (2012) and Durante et al. (2013).
This setting differs from ours in that the focus is on multivariate time series modeling instead of functional data analysis, with interest in the joint volatility dynamics over time for the different assets. In contrast, we are interested in studying variation across individuals in a time-constant subject-specific volatility; that is, certain subjects may have very smooth trajectories while other subjects have erratic trajectories. It is our conjecture that such volatility heterogeneity is common in biomedical settings, but is overlooked in analyzing data with models that implicitly prescribe a single level of smoothness for all subjects. As data are sparse and irregularly spaced in most studies, it is not surprising such behavior is overlooked. However, the volatility in a biomarker may be as important or more important than the overall level and trend in the biomarker. We provide motivation through the following longitudinal blood pressure data set.
The Healthy Pregnancy, Healthy Baby study (Miranda et al., 2009) collected longitudinal blood pressure measurements for pregnant women. Blood pressures are measured at irregularly spaced times during the second and third trimesters, with the number of measurements per subject varying from 9 to 19. We are interested in estimating subject-specific volatilities of blood pressure trajectories and in identifying covariates associated with the volatility. Figure 1(a) plots mean arterial pressure trajectories for twenty randomly selected normal women and women with preeclampsia, respectively. Letting Y_ij denote blood pressure for the ith woman at time t_ij and U_ij = Y_ij - Ȳ_j the deviation from the group mean blood pressure Ȳ_j, the empirical volatility measures the fluctuation of the trajectories empirically; Figure 1(b) shows boxplots of the log-transformed empirical volatilities for women in the normal and preeclampsia groups. Clearly the mean arterial pressure trajectories among the preeclampsia group are more wiggly than the ones in the normal group, which is also implied by the boxplots in Figure 1(b). To explore volatility differences among various groups in addition to preeclampsia, we apply normal linear regression of the log-transformed empirical volatilities on the covariates race, mother's age, obesity, preeclampsia, parity and smoking. The results suggest that preeclampsia and smoking are associated with empirical volatility, with p-values of 0.0005 and 0.002, respectively. This two-stage approach is useful as an exploratory tool but ignores measurement errors and uncertainty in volatility estimation.
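As a sketch, the exploratory two-stage analysis described above can be implemented in a few lines. The increment-based empirical volatility below is an assumed stand-in for the paper's formula (which is not reproduced in this text), and the function names are ours:

```python
import numpy as np

def empirical_volatility(t, u):
    # Crude volatility summary for deviations u observed at times t:
    # average squared increment per unit time (an assumed form, not
    # necessarily the paper's exact definition).
    t, u = np.asarray(t, dtype=float), np.asarray(u, dtype=float)
    return np.mean(np.diff(u) ** 2 / np.diff(t))

def two_stage_beta(times, devs, X):
    # Stage 1: empirical volatility per subject.
    # Stage 2: OLS of the log empirical volatilities on covariates X
    # (X should contain an intercept column).
    v = np.array([empirical_volatility(t, u) for t, u in zip(times, devs)])
    beta, *_ = np.linalg.lstsq(X, np.log(v), rcond=None)
    return beta
```

This mirrors the regression of log-transformed empirical volatilities on race, age, obesity, preeclampsia, parity and smoking, but, as noted above, it ignores measurement error and uncertainty in the volatility estimates.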
Additionally, the empirical volatilities in Figure 1(b) are heterogeneous even within the normal and preeclampsia groups. This heterogeneity will be largely missed when we apply functional data analysis methods that impose identical or similar smoothness on the individual functions within a group. Consequently, the wiggly trajectories will be over-smoothed while the smooth trajectories will be under-smoothed. We could potentially estimate the individual trajectories separately, but it is well known that borrowing of information dramatically improves performance for sparse functional data. In addition, separate estimation does not allow for inferences on covariate effects and unexplained variability in volatility.
Regarding the clinical question addressed, existing functional data analysis methods mainly focus on shifts in blood pressure level and neglect the volatility of blood pressure, which measures haemodynamic stability and is crucial for cardiovascular health. For example, a recent study shows that blood pressure stability, rather than blood pressure level, is associated with increased survival among patients on hemodialysis (Raimann et al., 2012). In the Healthy Pregnancy, Healthy Baby study, we observe that preeclampsia is commonly accompanied by excessive swings in blood pressure. The joint effect of a high blood pressure level and a large volatility may lead to adverse birth outcomes, such as low birth weight and preterm birth.

The Model Specification
Suppose that Y_i(t), i = 1, 2, . . . , m, is the observation of the ith subject at time t ∈ T_i = {t_i,1, t_i,2, . . . , t_i,ni < t_U}, with T_i the set of observation times before time t_U for the ith subject. We specify an observation equation for Y_i(t) as

Y_i(t) = M_ki(t) + U_i(t) + ε_i(t),    (1)

where Y_i(t) is contaminated by a measurement error ε_i(t) following a one-dimensional normal distribution with mean 0 and variance σ²_ε. Assuming the ith subject belongs to the k_i th group, with k_i ∈ {1, 2, . . . , g}, we include the k_i th group mean function M_ki(t) = E{Y_i(t) | M_ki(t)} in the observation equation. The term U_i(t) characterizes the subject-specific deviation from the group mean, with E{U_i(t)} = 0.
The volatility of the ith subject is defined as the conditional variance of the (q-1)th order derivative of U_i(t) over an infinitesimal time interval; we denote it by σ²_Ui, and write D^q = d^q/dt^q for the qth-order differential operator. As the volatility approaches zero, U_i(t) approaches a polynomial of degree q - 1 (a straight line when q = 2). In contrast, increasing the volatility leads to a more wiggly U_i(t) with a larger magnitude of fluctuation around M_ki(t). We focus on the case in which σ²_Ui is constant over time but varies across subjects and depends on covariates; this is suitable for smooth curves without spikes, in studies where the observations per subject are sparse and insufficient to estimate a volatility varying over t. In contrast, stochastic volatility models typically focus on a single or a few volatility processes for high-frequency financial data, in which volatility is time-varying but unrelated to covariates. Our motivation is drawn from the blood pressure data; in related applications it is of substantial interest to make inferences on variability among subjects in the bumpiness or erraticness of biomarkers collected over time.
We specify Gaussian process priors for M_k(t) and U_i(t) using stochastic differential equations, which incorporate the group and individual volatilities σ²_Mk and σ²_Ui:

D^p M_k(t) = σ_Mk Ẇ_k(t),    (2)

D^q U_i(t) = σ_Ui Ẇ_i(t),    (3)

where the Ẇ_k(t) and Ẇ_i(t) are mutually independent Gaussian white noise processes, whose covariance function is a delta function. Equations (2) and (3) specify M_k(t) and U_i(t) as integrated Brownian motions, which have Bayesian connections to smoothing splines (Wahba, 1990; Gu, 2013). More details are given in Section 2.2. Drift terms can be included in equations (2) and (3) when domain-specific knowledge supports a particular curve pattern, such as convergent evolution for prostate-specific antigen profiles (Zhu et al., 2011). In this article, we are interested in investigating heterogeneous individual volatilities. In particular, the volatility σ²_Ui in stochastic differential equation (3) is allowed to vary between subjects and with covariates through a simple Gaussian log-linear model, log(σ²_Ui) ~ N_1(x_i^T β, σ²), which can be extended easily to more complex specifications. In the current setting, the group volatility σ²_Mki does not depend on x_i; we outline a possible extension in the Discussion section. The mean and covariance functions of the Gaussian process priors for M_k(t) and U_i(t) can be obtained by applying stochastic integration to stochastic differential equations (2) and (3), as detailed in the Supplementary Material (Zhu and Dunson, 2016), resulting in the following lemma.
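As a minimal sketch of this generative hierarchy, the following simulates trajectories assuming p = 2, q = 1 (the settings used later in the simulation study) with an Euler-type discretization of equations (2) and (3); the function name and discretization details are ours, and the noise term of the log-linear volatility model is omitted:

```python
import numpy as np

def simulate_svr(X, beta, sigma2_M=10.0, sigma2_eps=1.0, n=20, t_max=4.0, seed=0):
    # Sketch: M(t) is an integrated Brownian motion (D^2 M = sigma_M * white noise),
    # each U_i(t) a Brownian motion (D^1 U_i = sigma_Ui * white noise) with
    # log(sigma2_Ui) = x_i' beta, and Y_i(t) = M(t) + U_i(t) + measurement error.
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    t = np.linspace(t_max / n, t_max, n)
    dt = np.diff(t, prepend=0.0)
    # group mean: integrate white noise twice (Euler scheme)
    DM = np.cumsum(np.sqrt(sigma2_M) * rng.normal(0.0, np.sqrt(dt)))
    M = np.cumsum(DM * dt)
    # subject-specific deviations with heterogeneous volatilities
    sigma2_U = np.exp(X @ beta)
    U = np.cumsum(np.sqrt(sigma2_U)[:, None] *
                  rng.normal(0.0, np.sqrt(dt), size=(m, n)), axis=1)
    eps = rng.normal(0.0, np.sqrt(sigma2_eps), size=(m, n))
    return t, M + U + eps
```

Subjects with larger x_i^T β produce visibly wigglier trajectories, which is exactly the heterogeneity the model targets.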

Lemma 1. Given the volatilities, the M_k(t) and U_i(t) are summations of mutually independent Gaussian processes, as are their derivatives up to orders p - 1 and q - 1 respectively; the explicit mean and covariance functions are given in the Supplementary Material (Zhu and Dunson, 2016).
Hence, we can represent the prior of M_ki(t) + U_i(t) as a hierarchical Gaussian process, writing GP(M(t), K(s, t)) for a Gaussian process with mean function M(t) and covariance function K(s, t). Different from previous hierarchical Gaussian processes (Park and Choi, 2010), in which the covariance function is modeled as squared exponential and is identical across subjects within a group, here K_Ui0(s, t) + K_Ui1(s, t) is subject-specific and depends on covariates through σ²_Ui.
To carry out Bayesian inference, we further specify the following prior distributions. In particular, σ²_ε, σ²_Mk and σ²_U0 are assigned invGa(a, b) priors, where invGa(a, b) denotes the inverse gamma distribution with shape parameter a and scale parameter b. We choose weakly informative priors, for example a = b = 0.01, to allow the data to dominate the posteriors of σ²_ε, σ²_Mk and σ²_U0, as illustrated by the MCMC steps 3a), 3b) and 3d) in Section 3. In practice, we have found the posterior distributions for these hyperparameters to be substantially more concentrated than the priors in simulations and applications, suggesting substantial Bayesian learning. Additionally, β and σ² follow the independent Jeffreys prior, f(β, σ²) ∝ 1/σ².

Double-Penalized Smoothing Spline
It is well known that the smoothing spline estimate is interpretable as a Bayes estimate under an integrated Wiener process prior (Wahba, 1990). By similar arguments, we can show that, given the volatilities and letting σ²_M0 and σ²_U0 go to infinity, the posterior means of M_k(t) and U_i(t) are equivalent to the double-penalized smoothing spline M̂_k(t) + Û_i(t), the minimizer of the double-penalized sum-of-squares (4), in which the penalty terms ∫_T {D^p M_k(t)}² dt and ∫_T {D^q U_i(t)}² dt penalize the roughness of M_k(t) and U_i(t) respectively, and the smoothing parameters λ_Mk = Σ_{i: ki=k} σ²_ε/(n_i σ²_Mk) and λ_Ui = σ²_ε/(n_i σ²_Ui) balance fidelity to the data against smoothness. Expressions for M̂_k(t) and Û_i(t), depending on λ_Mk and λ_Ui, can be obtained explicitly, as shown in the following theorem.
Theorem 2.1. The smoothing splines M̂_k(t) and Û_i(t) with t ∈ T minimize the double-penalized sum-of-squares (4) and have explicit finite-dimensional forms indexed by {t_j : j = 1, . . . , n}, the set of unique observation times among all m subjects.
Given M̂_k(t) and Û_i(t), the double-penalized sum-of-squares (4) can be written in a simplified closed form. The proofs of Theorem 2.1 and the following corollary are included in the Supplementary Material.
Corollary 1. The coefficients μ_k, ν_k, α_i and γ_i can be obtained through a backfitting algorithm or the Gauss-Seidel method, iterating the following two steps until convergence:
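The exact updates of Corollary 1 are not reproduced above, but the two-block backfitting (Gauss-Seidel) idea can be sketched with generic linear smoother matrices standing in for the spline smoothers implied by λ_Mk and λ_Ui; the matrices S_M and S_U below are placeholders of that kind, and the function is ours:

```python
import numpy as np

def backfit(y, S_M, S_U, n_iter=200, tol=1e-8):
    # Alternate between smoothing the partial residuals:
    #   M <- S_M (y - U),  U <- S_U (y - M),
    # iterating until the fits stop changing (Gauss-Seidel style).
    M = np.zeros_like(y)
    U = np.zeros_like(y)
    for _ in range(n_iter):
        M_new = S_M @ (y - U)
        U_new = S_U @ (y - M_new)
        if max(np.max(np.abs(M_new - M)), np.max(np.abs(U_new - U))) < tol:
            return M_new, U_new
        M, U = M_new, U_new
    return M, U
```

With S_M and S_U taken as ridge-type smoothers (I + λ D'D)^{-1} for a second-difference matrix D, both smoothers shrink and the iteration typically settles quickly to the joint fit.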

Posterior Computation
We prefer a fully Bayesian approach that allows for uncertainty in the smoothing parameters through hyperpriors. In addition, it is unclear how to implement generalized cross-validation (Chap. 4, Wahba, 1990) when λ_Ui depends on covariates, and when n is large it is computationally infeasible to invert the n × n matrix S_Mk involved in the backfitting algorithm. Instead, we propose a Markov chain Monte Carlo algorithm for posterior computation that avoids these problems. The algorithm achieves computational efficiency by leveraging the Markovian property of stochastic differential equations and samples M_k(t) and U_i(t) through the simulation smoother (Durbin and Koopman, 2002), which requires the following proposition.

Proposition 1. Let X(t) denote an (r - 1)th-order integrated Wiener process, defined by the stochastic differential equation D^r X(t) = Ẇ(t). Consequently, X_j = {X(t_j), D^1 X(t_j), . . . , D^{r-1} X(t_j)}^T, j = 1, 2, . . . , n, follows the state equation X_j = G_j X_{j-1} + ω_j, with ω_j ~ N_r(0, W_j). The proof is in the Supplementary Material. Finally, we outline the proposed Markov chain Monte Carlo algorithm as follows.
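The matrices G_j and W_j referred to in Proposition 1 follow from the standard exact discretization of an integrated Wiener process over a step Δ_j = t_j - t_{j-1}; a sketch (function name ours) is:

```python
import numpy as np
from math import factorial

def transition_matrices(delta, r, sigma2=1.0):
    # Exact discretization of D^r X(t) = sigma * white noise for the state
    # X_j = (X, D^1 X, ..., D^(r-1) X)':  X_j = G X_{j-1} + omega,
    # omega ~ N_r(0, W).
    G = np.zeros((r, r))
    W = np.zeros((r, r))
    for a in range(r):
        for b in range(a, r):
            G[a, b] = delta ** (b - a) / factorial(b - a)
    for a in range(r):
        for b in range(r):
            k = 2 * r - 1 - a - b
            W[a, b] = sigma2 * delta ** k / (
                k * factorial(r - 1 - a) * factorial(r - 1 - b))
    return G, W
```

For r = 2 this gives the familiar G = [[1, Δ], [0, 1]] and W = σ² [[Δ³/3, Δ²/2], [Δ²/2, Δ]]; step (1) of the algorithm uses r = q and step (2) uses r = p.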
(1) Given M_ki(t_ij), σ²_ε and σ²_Ui, sample U_i(t_ij), i = 1, 2, . . . , m, j = 0, 1, . . . , n_i. Let Y_Uij = Y_i(t_ij) - M_ki(t_ij); the stochastic volatility regression model for the ith subject can then be expressed as a state space model (Jones, 1993; Durbin and Koopman, 2001), with the measurement errors ε_Uij independently and identically distributed as N_1(0, σ²_ε), from which we can draw samples of U_i(t_ij) and its derivatives using the simulation smoother. The G_Uij, ω_Uij and W_Uij follow the specifications of G_j, ω_j and W_j in Proposition 1 with r = q.
(2) Given U_i(t_j), σ²_ε and σ²_Mk, sample M_k(t_j), k = 1, 2, . . . , g, j = 0, 1, . . . , n. Similarly, we rewrite the stochastic volatility regression model for the kth group as a state space model, whose observation equation at time t_j involves the ith subject whenever that subject has an observation at t_j and k_i = k, and then sample M_k(t_j) and its derivatives by the simulation smoother. The G_Mkj, ω_Mkj and W_Mkj are given by Proposition 1 with r = p.

Simulation
We carry out two simulation studies to evaluate the performance of the proposed method and compare it with alternative methods, including natural cubic splines (Wahba, 1990), functional principal components analysis (Yao et al., 2005) and functional mixed effects models (Guo, 2002). The comparison focuses on performance in estimating the trajectory M_ki(t) + U_i(t), the volatility σ²_Ui and the coefficients β. We consider cases with heterogeneous and with homogeneous volatilities; the latter favors functional principal components analysis and functional mixed effects models.
The first simulation study is designed to investigate the consequences of ignoring heterogeneity of volatilities. One hundred replicated datasets, each consisting of 100 trajectories, are sampled from the stochastic volatility regression model, in which the log-transformed volatilities are normally distributed. More precisely, we choose β = (0, 0.6, 2)^T, with the components of the covariate vectors x_i drawn from N_1(0, 0.25). Given β and x_i, the volatilities σ²_Ui are drawn from log(σ²_Ui) ~ N_1(x_i^T β, 1). Along with σ²_M1 = σ²_M2 = 10, σ²_ε = 1, p = 2 and q = 1, the M_1(t), M_2(t), U_i(t) and ε_i(t) are sampled at t ∈ {0.2, 0.4, . . . , 4} from equations (2) and (3) and the distribution of the measurement error ε_i(t). Twenty percent of the samples are removed completely at random, resulting in an average of 16 unequally spaced observations per subject. The ith subject is randomly assigned to one of the two groups with equal probability and Y_i(t) is obtained from observation equation (1).
We first apply stochastic volatility regression using the proposed Markov chain Monte Carlo algorithm, running 15,000 iterations and keeping every 5th of the last 10,000 samples for posterior analysis. This takes about 80 minutes on a personal computer with a 2.33 GHz Intel(R) Xeon(R) central processing unit. Posterior means are chosen as the estimates of M_ki(t) + U_i(t), σ²_Ui and β. The trajectories M_ki(t) + U_i(t) are also estimated by natural cubic splines one subject at a time, and by functional principal components analysis and functional mixed effects models for subjects within each group, taking about 1 minute, 2 minutes and 50 minutes on the same computer, respectively. For natural cubic splines, functional principal components analysis and functional mixed effects models, we may also estimate covariate effects on volatility through a two-stage method: in the first stage, empirical volatilities are computed with Û_i,j the estimate of U_i(t) at time t_ij; in the second stage, the empirical volatilities are regressed on covariates to obtain an estimate of β.
For each simulated dataset, we calculate the average squared error of the trajectory, ASE(M + U), averaged over each subject's observation times and then over the m subjects; the average squared error of the log-volatilities, ASE{log(σ²_U)} = Σ_{i=1}^m {log(σ̂²_Ui) - log(σ²_Ui)}²/m; and squared errors for the coefficient estimates, SE(β_l) = (β̂_l - β_l)², l = 0, 1, 2. Table 1 reports the means of ASE(M + U), ASE{log(σ²_U)} and SE(β_l) across the 100 replicate datasets. The means of the average squared errors and squared errors for the natural cubic spline and functional principal components analysis approaches are significantly inflated, for example being doubled and tripled in MASE(M + U) respectively, while the MASE(M + U) of functional mixed effects models is only slightly larger than that of stochastic volatility regression. We randomly select a data set for closer examination. We calculate the individual average squared error of the trajectory and select the subjects with the largest individual average squared errors for natural cubic splines and functional principal components analysis. Figure 2 shows estimates of the trajectory for six subjects. The figure illustrates that, by treating one trajectory at a time, natural cubic splines lead to overfitting, e.g. Figures 2(d) and 2(e), with both over- and under-estimated volatilities. Functional principal components analysis instead fails to adapt to the different volatility levels; for example, subjects with high volatility are over-smoothed, e.g. Figures 2(b) and 2(d). Functional mixed effects models do not borrow smoothness information across subjects, and hence fit some curves well (e.g. Figure 2(c)) but over-smooth other curves (e.g. Figure 2(b)). Although this simulation is based on the proposed model, it nonetheless illustrates the importance of adapting to varying smoothness while borrowing smoothness information across subjects.
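For concreteness, the error summaries can be computed as follows; the averaging convention for ASE(M + U) (mean over each subject's observation times, then over subjects) is our assumption, and the function name is ours:

```python
import numpy as np

def error_summaries(fit, truth, log_v_hat, log_v, beta_hat, beta):
    # ASE of fitted trajectories: per-subject mean squared error over that
    # subject's observation times, averaged over subjects (ragged lists OK).
    ase_traj = np.mean([np.mean((np.asarray(f) - np.asarray(g)) ** 2)
                        for f, g in zip(fit, truth)])
    # ASE of the log-volatilities across the m subjects.
    ase_logv = np.mean((np.asarray(log_v_hat) - np.asarray(log_v)) ** 2)
    # Per-coefficient squared errors SE(beta_l).
    se_beta = (np.asarray(beta_hat) - np.asarray(beta)) ** 2
    return ase_traj, ase_logv, se_beta
```

Averaging these quantities over the replicate datasets gives the MASE and mean SE values of the kind reported in Table 1.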
Our second simulation study assumes constant volatilities across subjects, with the set-up otherwise identical to the first study. The observations are generated from Y_i(t) = 10{t + sin(t)} + 0.6α_1i cos(πt/10) + 0.2α_2i sin(πt/10) + ε_i(t) for subjects in the first group and from Y_i(t) = 10{t + cos(t)} + 0.5α_1i cos(πt/10) + 0.3α_2i sin(πt/10) + ε_i(t) for subjects in the second group, with the α_1i and α_2i drawn from N_1(0, 1). As shown in Table 1, stochastic volatility regression performs similarly to functional principal components analysis and functional mixed effects models, with lower errors than natural cubic splines. This suggests that stochastic volatility regression can also adapt to the homogeneous case.

Applications
It is standard practice to monitor the blood pressure of pregnant women. However, fluctuations during pregnancy and the associated factors are largely unstudied. We apply the proposed stochastic volatility regression approach to analyze longitudinal blood pressure measurements in the Healthy Pregnancy, Healthy Baby study, aiming to investigate the stability of blood pressure trajectories and identify the associated factors. The data consist of 106 non-Hispanic white and 176 non-Hispanic black women whose first blood pressure measurement is collected before the 16th week of gestation and the last one no earlier than the 37th week of gestation. Most subjects have 9 (35.10% of subjects), 10 (29.28%) or 11 (14.98%) measurements, spaced at irregular times. The covariates we focus on include race (non-Hispanic white versus non-Hispanic black) and indicators of advanced maternal age, obesity, preeclampsia, previous pregnancy and smoking. The analysis is implemented as in the simulation studies. Trace plots and autocorrelation plots suggest rapid convergence and mixing. The posterior means of the standardized residuals ε_i(t) are plotted in Figure 3(a); most points lie within two standard deviations of the mean zero. In addition, a QQ-plot in Figure 3(b) of the empirical quantiles of the posterior means of the standardized residuals shows close agreement with the diagonal line. These diagnostics suggest that the proposed model fits the data well. Posterior summaries of selected parameters are presented in Table 2.
Panels (a) and (b) of Figure 4 show posterior means and 95% highest posterior density credible intervals of the average blood pressure during the second and third trimesters for the non-Hispanic white and non-Hispanic black groups, respectively. The two groups share a common pattern: decreasing until the late stage of the second trimester, during 20 to 25 weeks, and then increasing toward the pre-pregnancy level. Within each ethnic group, there is significant heterogeneity among women in the stability of the blood pressure trajectory. As Figure 4(c) indicates, posterior means of the volatility vary from -0.5 to 2 on the logarithmic scale, suggesting that some women have stable trajectories parallel to the group mean, while other women have erratic trajectories. Figure 4(d) shows the corresponding posterior summaries for the covariate effects.
Most interestingly, we find that obesity and preeclampsia are associated with blood pressure volatility, with their 95% credible intervals not covering zero in Figure 4(d). This implies that pregnant women with obesity and/or preeclampsia are more likely to exhibit irregular patterns of blood pressure relative to their ethnic group. We further examine the characteristics of women with extreme volatilities. Among the eight women with the largest volatilities, most are non-Hispanic black with obesity and preeclampsia, do not smoke and are giving birth for the first time; half of them are younger than 35. The eight subjects with the smallest volatilities are surprisingly homogeneous: all but one are non-Hispanic white, without obesity or preeclampsia, younger than 35, non-smoking and have given birth before.

Discussion
We have proposed a Bayesian model to investigate functional data volatility and its association with covariates. As an important dynamic feature, volatility measures the stability of the underlying biological process, and its analysis reveals not only heterogeneity among subjects but also dependence on covariates of interest. Complementing current functional data analysis methods, which mainly focus on trends in each trajectory and (perhaps) derivatives, the proposed method initiates the exploration of the stability of functional data. As illustrated with the blood pressure data, our view is that substantial new insights can be obtained in a rich variety of biomedical applications by studying volatility.
The proposed model exploits the Markovian property and adopts a state space approach to achieve computational efficiency, requiring O(m²n) operations for the relevant matrix inversions with m subjects and n observations per subject. In contrast, the corresponding linear mixed model requires O(m³n³) operations for the same calculation. The current algorithm would, however, face challenges in handling a large number of subjects; addressing this warrants further investigation.
The proposed stochastic volatility regression model is closely related to the functional mixed effects models of Guo (2002), but with different specifications. Both models incorporate population-average and subject-specific curves, whose smoothness properties are the same in functional mixed effects models but can differ in the stochastic volatility regression model when p and q are unequal. Moreover, although both models allow the smoothness parameters of the subject-specific curves to vary, they depend on covariates in the stochastic volatility regression model but not in the functional mixed effects models.
The proposed model can be extended in multiple directions. For example, we may replace the single group mean function in equation (1) with a weighted sum of several mean functions, with weights depending on covariates. Limited by the sparse observations in the Healthy Pregnancy, Healthy Baby study, we assume the volatility to be time-constant, so that each subject has their own distinct volatility controlling the "erraticness" of their function. Given denser measurements, we may allow volatility to vary across time as well as subjects. It is also of interest to avoid normality assumptions in modeling the population distribution of volatility and to develop methods that scale to high-dimensional covariates.