Cortisol dynamics in depression: Application of a continuous-time process model

Background: The temporal dynamics of cortisol may be altered in depression. Optimally studying these dynamics in daily life requires specific analytical methods. We used a continuous-time multilevel process model to study set point (rhythm-corrected mean), variability around this set point, and regulation strength (speed with which cortisol levels regulate back to the set point after any perturbation). We examined the generalizability of the parameters across two data sets with different sampling and assay methods, and the hypothesis that regulation strength, but not set point or variability thereof, would be altered in depressed, compared to non-depressed individuals. Methods: The first data set is a general population sample of female twins (n=523), of which 21 were depressed, with saliva samples collected 10 times a day for 5 days. The second data set consists of pair-matched clinically depressed and non-depressed individuals (n=30), who collected saliva samples 3 times a day for 30 days. Set point, regulation strength and variability were examined using a Bayesian multilevel Ornstein- Uhlenbeck (OU) process model. They were first compared between samples, and thereafter assessed within samples in relation to depression. Results: Set point and variability of salivary cortisol were twice as high in the female twin sample, compared to the pair-matched sample. The ratio between set point and variability, as well as regulation strength, which are relative measures and therefore less affected by the specific assay method, were similar across samples. The average regulation strength was high; after an increase in cortisol, cortisol levels would decrease by 63 % after 10min, and by 95 % after 30min, but depressed individuals of the pair-matched sample displayed an even faster regulation strength. Conclusions: The OU process model recovered similar cortisol dynamics for the relative parameters of the two data sets. The results suggest that regulation strength is increased in depressed individuals. We recommend the presented methodology for future studies and call for replications with more diverse depressed populations.


Introduction
Depression is a prevalent stress-related disorder, with an often recurrent or chronic course (Moussavi et al., 2007). Research into the pathophysiology of depression has often focused on cortisol levels, as cortisol is the end-product of the hypothalamic-pituitary-adrenal (HPA) axis, one of the major stress systems in the body. Findings to date regarding the association between basal cortisol levels and depression in large cross-sectional as well as in-depth repeated assessment studies have been inconclusive Stetler and Miller, 2011). Part of these mixed results may be explained by the high fluctuation of cortisol levels. Thus, examining the way cortisol behaves over time (i.e. https://doi.org/10.1016/j.psyneuen.2020.104598 Received 7 July 2019; Received in revised form 26 January 2020; Accepted 26 January 2020 its temporal dynamics rather than just basal levels alone) within depressed and non-depressed individuals may help us better understand the underlying mechanisms of depression.
Cortisol is one of the important mediators of allostasis, the dynamic process that keeps the organism alive by "maintaining stability through change", and promoting adaptation and coping (McEwen, 2002). Allostasis includes circadian (daily) and ultradian (hourly) rhythms; the latter are thought to optimize HPA axis responsiveness to the environment and to be important for normal cognitive, emotional and metabolic functioning (Gjerstad et al., 2018). Laboratory studies in depressed individuals have shown that several aspects of ultradian dynamics are altered, such as changes in the duration, amplitude and frequency of pulsatile episodes (Deuschle et al., 1997;Halbreich et al., 1985;Linkowski et al., 1985;Mortola et al., 1987). Additionally, experimentally-induced changes in ultradian dynamics of cortisol in the brain influence emotional and cognitive processes that have also been shown to be affected in mood disorders (Harmer et al., 2017;Kalafatakis et al., 2018).
A study by Peeters et al. (2004) was the first to examine depressed and non-depressed individuals' moment-to-moment cortisol dynamics in the participants' natural environment, representing an ecologically valid setting. They found that subsequently assessed cortisol samples (with 90-minute intervals on average) were less strongly associated over time (i.e. lower autocorrelation) in depressed than in non-depressed individuals, especially in those with more severe or recurrent episodes. They found no differences in mean (set point) or variability of cortisol levels. While the findings regarding autocorrelation point to alterations in cortisol regulation, the magnitude and direction (increased or decreased) of the malfunctioning remains unclear. One explanation for this is that this study did not consider the dynamic characteristics together from a context that captures cortisol regulation as a dynamical process.
Within-individual psychobiological processes such as cortisol regulation are mostly analyzed using discrete-time approaches, such as (multilevel) autoregressive models (van Ockenburg et al., 2015). However, such models do not consider how phenomena change continuously over time, between observations. Only under strict circumstances, such as equally spaced sampling intervals and normally distributed data (e.g. Ryan et al., 2018), can a discrete time model give a reasonable approximation to continuous time processes. However, in practice these assumptions are hardly ever met. Discrete time models also make it impossible to compare studies that have different sampling intervals (Voelkle and Oud, 2013). In addition, these models are most often fitted to data in multiple steps (as opposed to one-step), this way generating bias in the estimation (Pagan, 1984): point estimates of model parameters are regressed on predictors, without considering the error in the parameter estimates.
In the current study, we argue that a state-of-the-art continuoustime stochastic differential modeling approach, namely the multilevel/ hierarchical Ornstein-Uhlenbeck (OU) process model (Oravecz et al., 2009(Oravecz et al., , 2016 can address the above-mentioned issues. This model was originally developed for capturing affect dynamics (Kuppens et al., 2010), but can also be applied to other dynamic processes with set points. With this approach, short-term within-individual changes in cortisol are described using stochastic differential equations based on the OU process (Uhlenbeck and Ornstein, 1930). The OU process is a continuous-time model defined by a stochastic differential equation with three parameters: set-point, regulation towards the set-point, and stochastic variation around the set-point. Because of its continuous-time nature, it also conforms with the properties of unequally spaced, unstructured and unbalanced experience sampling data, this way providing pertinent insights into continuous-time cortisol regulation, independent of the specific measurement choices. Its parameter capturing regulation strength indicates how quickly cortisol levels regulate back towards the set point after any perturbation. Since this parameter comes from a continuous time model, the sampling frequencies do not scale its value, therefore we can compare regulation strength across studies. Finally, the estimated process model parameters can be mapped directly to the theorized behavioral characteristics of cortisol: set points, regulation and stochastic perturbations, resulting in within-person variability in cortisol.
The individual level process model is cast in a Bayesian multilevel framework (Gelman et al., 2013) that accommodates simultaneous modeling of multiple individuals. With this approach we derive individual and group level parameters at once, with the estimates informing each other. Moreover, the multilevel Bayesian framework allows us to introduce predictors on the process model parameters (i.e., on set-points, within-person variabilities and regulations strengths) and estimate the corresponding regression coefficients, process model parameters, and group-level characteristics simultaneously -this is a one-step approach to estimation that is only feasible in the Bayesian framework for the OU model (Oravecz et al., 2016).
With the current study, we aimed to examine set point, regulation strength and variability of cortisol by fitting the OU process model to cortisol time-series data of depressed and non-depressed individuals from two different samples. Because the OU model was applied to salivary cortisol data for the first time, we started with comparing set points, regulation strengths and variabilities of cortisol across the two samples to examine whether these parameters largely align. Second, we examined whether individual differences in these parameters are related to (recurrent) depression. In accordance with the study of Peeters et al. (2004), we hypothesized that individuals with depression, especially those with chronic/recurrent depression, have altered (be it increased or decreased) regulation strength compared to individuals without chronic/recurrent depressive episodes, but similar set points and variability.

Sample and procedures
We tested our hypotheses using data from two ecological momentary assessment (EMA) studies. In an EMA study participants complete assessments during their daily lives, this way facilitating ecological validity of those measures; for more information about this methodology, see Stone and Shiffman (1994). The first sample comprised females (mostly twins, therefore referred to as 'female twin sample') from the general population (Flanders, Belgium), who sampled saliva ten times a day for five days (Jacobs et al., 2007). The second sample consisted of depressed and pair-matched non-depressed individuals of both genders (therefore referred to as 'pair-matched sample'), who sampled saliva three times a day for thirty days .  (Derom et al., 2013;Loos et al., 1998) and from birth registers of Flemish municipalities. The EFPTS is a population-based survey that started in 1964, registering prospectively all multiple births within East Flanders. The project was approved by the local medical ethics committee and all participants gave their written consent.
Of the 621 women, 610 participated in the ambulatory assessment procedure. We excluded 80 participants who did not have at least 25 (50 %) valid salivary cortisol samples, to ensure enough observations for fitting the continuous time model to the data. In line with a previous study, participants with more than 20 % of cortisol values above 44 nmol/L were also excluded (n = 4) (Jacobs et al., 2007). In addition, extreme values (> = 100 nmol/L) were set to missing. As a result, one more participant was excluded due to insufficient valid salivary cortisol samples. Finally, two participants were excluded because of missing age or depression diagnosis. This left 523 women for the analyses, of which 21 were diagnosed with a depressive episode according to the Structural Clinical Interview for DSM-IV (see section 2.2.2 for more details).

Ambulatory sampling procedures.
Subjects received a digital wristwatch that emitted a signal ten times a day on five consecutive days, at random times between 7:30 a.m. and 10:30 p.m. After each 'beep', subjects completed self-assessment forms concerning current context, thoughts, and emotions. For a schematic representation of the sampling scheme, please see Fig. 1. After completion, subjects were asked to take a saliva sample (Salivette, Sarstedt, EttenLeur, Netherlands). After saliva collection, the subjects were instructed to store the swab in a salivette tube and to record the exact time of collection on the label. Samples were stored in the subjects' home freezers until transport to the lab, where uncentrifuged samples were kept at −20°C until analyses. To know whether the subjects had performed their tasks (filling in the form and taking the saliva sample) within 15 min of the beep, the self-indicated time of sample collection was compared with the actual time of the beep (Jacobs et al., 2005). All data not collected within 15 min after the beep were excluded from the analysis (as these were already excluded from the current dataset, they did not further affect the current sample size). Salivary free cortisol levels were determined in duplicate, using a time-resolved immunoassay with fluorescence detection. The lower detection limit of this assay was 0.2 nmol/L; interassay and intraassay coefficients of variation were less than 10 %. For more information regarding the sampling procedure and the determination of cortisol levels from the saliva samples, please see Jacobs et al. (2007).
2.1.2. Pair-matched sample 2.1.2.1. Participants. The sample was part of the 'Mood and movement in daily life' study, in which depressed and non-depressed participants (aged 20-50 years; mean age: 34 years) were ambulatory monitored three times a day for 30 days by means of electronic diaries, actigraphy, and saliva sampling. This resulted in a maximum number of 90 salivary cortisol samples per individual. Participants with and without a depressive disorder were pair-matched on gender, smoking, age, and BMI (see Booij et al. (2015) for further details on the matching procedure).
Before inclusion, participants were screened for depressive symptomatology with the Beck Depression Inventory (BDI-II, Beck et al., 1996), which was followed by a Composite International Diagnostic Interview (CIDI) (World Health Organization, 1990) if they had a BDI score > 14 (depressed group) or < 9 (non-depressed group). Depressed individuals were included if they met criteria for a DSM-IV diagnosis of Major Depressive Disorder (MDD; current episode or in remission for no longer than 8 weeks). Non-depressed individuals were included if were free of mood disorders at the moment of inclusion. Individuals with a current or recent (within the last two years) psychotic or bipolar disorder were excluded. Other reasons for exclusion were pregnancy and significant hearing or visual impairments. The study design was approved by the responsible Medical Ethical Committee, and all participants gave written consent before inclusion.
Of the 62 participants who started the study, 4 participants dropped out early. Another 4 participants completed the study but did not have enough valid measurements (T < 60) due to non-compliance, technical problems, or protocol violations. This left 54 participants for further study. The present study was based on the subsample for which cortisol and α-amylase concentrations have been determined, consisting of 15 matched pairs (n = 30, whereof 15 depressed and 15 nondepressed individuals). They did not differ significantly from the remaining participants on BDI score, gender, age, BMI and smoking status (p > 0.05). We discarded two-thirds of the data of one participant (D12), who reported a breast infection during this period and had relatively high values in this period (mean = 19.7 nmol/l). Values returned to a more normal range after day 20, around when she had started taking antibiotics. Therefore, we included data from the last 10 days (t = 30) only for this participant. The dataset is available online (doi: 10.1371/journal.pone.0131002).

Ambulatory sampling procedures.
The participants completed questionnaires on an electronic diary, the PsyMate® (PsyMate BV, Maastricht, The Netherlands) (Myin-Germeys et al., 2011) for a total of 32 days, of which the first two days served to get familiar with the device. The electronic diary questionnaire contained 60 items on mood, cognition, and daily activities and habits. The PsyMate was programmed to generate beeps at three predetermined moments a day at equidistant time points: in the morning (mean≈10 AM), six hours later in the afternoon (mean≈4 PM) and again six hours later in the evening (mean≈10 PM). The exact time points depended on participants' sleep-wake schedule. Participants were instructed to fill S.H. Booij, et al. Psychoneuroendocrinology 115 (2020) 104598 out the diary immediately after the beep, but a delay of maximally 1 h was allowed. For a schematic representation of the sampling scheme, please see Fig. 1. Saliva was collected while completing the diary, by means of a synthetic salivette collection device. More information about the sampling and storage procedures can be found in Booij et al. (2015). Salivary cortisol samples were analyzed by means of online-solid phase extraction in combination with isotope dilution liquid chromatographytandem mass spectrometry (LC-MS/MS), which has a broad analyte compatibility and high analytical performance. All samples of one participant were assayed in the same batch. Mean intra-and inter-assay coefficients of variation were below 10 %. The quantification limit for cortisol was 0.1 nmol/L.

Cortisol dynamics
Set point, regulation and within-person variability of salivary cortisol were estimated from the data. (1) Set point (μ) represents the level to which cortisol is regulated towards and it can change as a function of time. For example, in humans, cortisol has a higher set point in the morning than in the evening, due to circadian rhythmicity. Statistically, this translates into making set point a function of time-varying predictors -in this study, time of the day was used as a time-varying predictor. Examples of two hypothetical individuals with different set points are presented in Fig. 2A. In these examples, the mean set point is corrected already for time of day, to account for different set points at different phases of the circadian rhythm. Person (P)1 has a relatively high set point (μ = 7), while P2 has a relatively low set point (μ = 2).
(2) Regulation strength (β) captures how quickly cortisol levels regulate back towards the set point after any perturbation; increased regulation strength relates directly and inversely to the continuous-time autocorrelation function. To illustrate what regulation strength is, two hypothetical individuals with different regulation strength are presented in Fig. 2B. For person (P)1, cortisol levels return quickly after a perturbation to the system, indicating strong regulation (P1; β = 6), while for person (P)2, cortisol levels remain high for several hours, which indicates weak regulation (P2; β = 0.1). (3) Within-person variability (γ) in cortisol levels is caused by perturbations, which are external and/ or internal inputs, such as ultradian oscillations, daily hassles, caffeine intake, and physical activity. In cross-sectional group studies, this is usually (incorrectly) regarded as measurement error. Examples of two hypothetical individuals who differ in variability are presented in Fig. 2C. P1 shows relatively little variability (γ = 1), while P2 shows relatively large variability (γ = 3).

Depression diagnoses
For the female twin sample, the Structured Clinical Interview for DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, 4th Edition) axis I disorders (SCID-I) was administered to obtain current and lifetime diagnoses of MDD. For the pair-matched sample, current and lifetimes diagnoses of MDD were obtained from the CIDI interview. The current diagnoses were used to create groups of depressed and nondepressed individuals.
Individuals were further classified using both the lifetime and current diagnoses of MDD. Subjects were designated to a never depressed (no current or past depression), acute depression (current depression, no past depression), chronic depression (current and past depression) or past depression (no current, but past depression) group. In the pairmatched sample, there was only one currently healthy participant with a past depression (> seven years ago). Because she was the only one in this category and the depression occurred more than 7 years ago, the analysis was conducted with her in the never depressed category.

Descriptive variables
Age, gender, and completed education (0=primary education, 1= secondary education, 2=university/college education), and BMI were based on self-report. Severity of depressive symptoms was calculated as the average item score of the 16 items of the SCL-90 depression subscale (Derogatis and Unger, 2010) in the female twin sample) and as the total sum score of the 21 items of the BDI in the pair-matched sample at the start of the study.

Statistical analysis
We fit the described multilevel OU process model to study the three intra-individual dynamic features, namely set point, regulation strength and variability of salivary cortisol. First, we explored the intra-individual temporal dynamics of cortisol across the two samples (model 1). Because different cortisol assay methods were used for the two samples (radioimmunoassay versus liquid chromatography-tandem mass spectrometry), we expected that the absolute levels and variability might be different (Bae et al., 2016;Baecher et al., 2013). Therefore, we also compared the ratio between the set point and variability for the two samples as this measure is a relative measure, and might be less influenced by the assay method and was hypothesized to be similar for both samples. Second, we related these dynamics to individual differences in depression (model 2). We also included gender (only in the pair-matched sample) and age as covariates, as they have been previously related to cortisol levels (e.g. Larsson et al., 2009;Roelfsema et al., 2017). Finally, to confirm previous findings that cortisol dynamics are particularly altered in individuals with chronic/recurrent depression, we also examined never depressed, acute, past and chronically depressed individuals separately (model 3).
Models for cortisol were fit to the data of the two samples (pairmatched: 5 % missing observations; female twin: 25 % missing observations). Data were inherently nested, in the sense that there were multiple cortisol records (level 1) per person (level 2). Predictor variables were organized into time-varying covariates (time of day, time of day squared), and time-invariant predictors (gender, age, depression). To appropriately place the data in continuous time, a time variable was constructed, operationalized as the number of hours of each observation since midnight of the first day of each individuals' measurement burst.
The dynamical model parameters of the multilevel OU process model were estimated in the Bayesian framework, using Markov chain Monte Carlo algorithms. The data analysis was carried out in R (R Core Team, 2017) and JAGS (Plummer, 2003), with rjags (Plummer, 2016) that helped running JAGS from R. Scripts can be found as supplemental material S1. Bayesian estimation results in a posterior probability distribution for each parameter that summarizes the plausible range of the parameter, conditional on the data and prior settings, see (Gelman et al., 2013). In the Bayesian framework, p-values or 95 % confidence intervals are not defined; instead we calculate the 95 % Credibility Intervals based on the posterior probability distribution of the corresponding parameter. Credibility Intervals (CI) are interpreted as follows: the parameter is within the estimated 95 % Credibility Interval with a 95 % probability. Based on the posterior probability distribution we can also calculate summary statistics such as the estimate's posterior mean (a point estimate) and posterior standard deviation (similar to the standard error of the estimate). In this framework, we can assess whether a regression effect is credibly different from 0 by checking whether the regression coefficient estimate's 95 % CI contains 0.
First, model 1 was fitted to both data sets to study cortisol dynamics (model 1) in the two samples. The ratio between the set point and variability was also calculated. This model had only measurement time as a time-varying predictor to correct for the circadian rhythm. We let the set-point vary as linear and quadratic functions of time of day, because, with these variables together, we can model a curvilinear relationship (instead of only linear) to capture the potential rise and fall in the daily cortisol rhythm. Measurement time was centered on 12 a.m. (midday), so that the intercept quantifies the cortisol level at noon. Second, we fitted model 2 that also included gender (1=male, 0=female; pair-matched sample only), age (centered), and depression (depressed = 1, non-depressed = 0) as predictors of inter-individual differences in the parameters under study. Finally, we fitted model 3 with dummy variables for current, past (female twin sample only) and chronic depression (reference category = never depressed).
The outcome parameters and the effects of within-person predictors (i.e. time effects) were allowed to be person-specific. They are assumed to be sampled from a population distribution (i.e., they are random effects), which accounts for individual differences. Measurement error was not modeled separately from intra-individual variation. Models were first fitted using 6 chains, taking 20.000 posterior samples per chain after 2000 samples of discarded burn-in per chain (120.000 samples in total). An exception was model 2 for the pair-matched sample, which was a bit slower to converge, and needed 150.000 samples, instead of 120.000. We checked whether all six chains converged to the same area of the posterior distribution by calculating Rhat statistics (Gelman et al., 2013), all of which were below 1.1, therefore we concluded that there were no problems with convergence. Note: Smoking was missing for 13 non-depressed and 1 depressed individual of the female twin sample. Highest completed education was missing for 1 depressed individual of the pair-matched sample. * Depressive symptoms were assessed with the SCL-90 questionnaire (range 1-5; the total score was divided by the number of valid items) (female twin) and the BDI questionnaire (range 1-63) (pair-matched).
** These differences have been tested with the Bayesian multilevel Ornstein Uhlenbeck model. *** Two depressed individuals used Saint John's wort.

Comparison of samples
Descriptive statistics of the depressed and non-depressed groups of the two samples are shown in Table 1. The female twin sample was on average 7.5 years younger, scored 4.95 points lower on BMI and had 4.1 nmol/l higher mean levels of salivary cortisol than the pair-matched sample. In addition, the female twin sample included relatively higher numbers of oral contraceptive users (borderline significant) and lower numbers of antidepressant users. Smoking behavior was not significantly different between the two samples, nor was education level and the number of individuals with a previous depressive episode.

Comparison of depressed and non-depressed groups per sample
As shown in Table 1, for the female twin sample, age, BMI, smoking status, education level and oral contraceptive use were not significantly different between the depressed and the non-depressed group. SCL-90 depressive symptom scores were significantly higher, there were less individuals with a previous depressive episode, and there were more individuals using antidepressants in the depressed group. For the pairmatched sample, gender, age, BMI, smoking status, education level and oral contraceptive use were not significantly different between the depressed and the non-depressed group, while the BDI depressive symptoms score was significantly higher, the number of individuals with a previous depressive episode and the number of individuals using antidepressants was significantly higher in the depressed group (Table 1). Table 2 shows results from the three models specified previously in terms of average levels of set points (rhythm-corrected mean), variability (fluctuations around the mean) and regulation strength (the speed of return to baseline after a perturbation) of salivary cortisol across both samples. For all three parameters point estimates (posterior mean), error estimates (posterior SD) and interval estimates (95 % CI) are reported.

Cortisol dynamics; basic model compared across the samples (Model 1)
The average set point for the female twin sample was 10.05 nmol/l around noon, almost twice as high as for the pair-matched sample (5.29 nmol/l), both credibly different from 0 (95 % credibility interval non-overlapping with 0). The regulation strength was about equal in both samples, around 1.8. This equals to a decrease of about 40 % after 5 min, 63 % after 10 min, and 95 % after 30 min (see Fig. 3, for the complete continuous-time autocorrelation curve). The average withinperson (intra-individual variability) was about twice as large in the female twin sample (2.77) as the pair-matched sample (1.49) as well. The ratio between set point and variability was about the same in both samples, around 3.6. All three parameters showed substantial inter-individual variation, as quantified as SDs in the bottom part of Table 2.

Inter-individual variation in cortisol dynamics; association to gender, age and depression (Model 2)
The goal of model 2 was to study associations between depression (and gender and age as covariates) and set point, variability and regulation strength, of which the results are presented in Table 2.
Set point was not credibly related to depression in either of the samples. It was also not credibly associated to gender in the pair-matched sample. Set point was related to age in the female twin sample only, with older individuals showing lower set-points.
Credibly increased regulation strength for cortisol was found in the depressed group of the pair-matched sample, but not in the female twin sample. Regulation strength was not credibly associated with gender. Regulation strength was credibly associated with age in the female twin sample only, indicating faster returns to the set point with increasing age.
Variability was not credibly related to depression in the two samples. Variability of cortisol was credibly associated with gender in the pair-matched sample, where men showed less variability than women. Variability was credibly associated with age in the female twin sample only, with older participants showing less variability.

Inter-individual variation in cortisol dynamics; association to acute, chronic and past depression (Model 3)
When splitting up the two groups into acute, chronic, past and never depressed individuals, none of these groups showed credible associations to set point or variability, which result is consistent with Model 2 ( Table 2). The chronic depressed group of the pair-matched sample, but not the female twin sample, showed credibly increased regulation strength. The acute and past depression groups did not show credible differences in terms of regulation strength.

Discussion
In the current study, we examined intra-individual cortisol dynamics in two different samples in ecologically valid settings, and compared depressed and non-depressed individuals within these samples. Specifically, we examined set point, regulation strength and variability of salivary cortisol. For this, we used a multilevel continuous-time process model appropriate for the nested, unequallyspaced cortisol data at hand. The set point and variability were twice as high in the female twin sample as in the pair-matched sample. However, the ratio between these two variables, as well as the regulation strength to the set point, which are relative measures (opposed to the absolute values of set point and variability), were very similar, indicating generalizability across different populations. This supports the utility of the adopted method to study cortisol dynamics in daily life, even in different ambulatory sampling designs. Previous findings of a study reporting lower autocorrelation in depressed compared to nondepressed individuals (Peeters et al., 2004) were replicated and extended in one out of two samples, showing increased regulation strength in depressed individuals.
The fact that the set point and variability were different across samples can be due to the fact that the specific levels of salivary cortisol are very much dependent on a variety of factors. A very important factor is the assay method. The saliva samples of the female twin sample were analyzed with radioimmunoassay, while the samples from the pair-matched sample were analyzed with liquid chromatographytandem mass spectrometry (LC-MS/MS). Previous studies comparing these two methods have shown that radioimmunoassay can overestimate cortisol levels by one-and-a-half to two-fold, compared to LC-MS/MS (e.g. Bae et al., 2016;Baecher et al., 2013). This is mainly due to cross-reactivity of immunometric cortisol assays with inactive cortisone. Putting such a correction factor on the female twin sample, results in similar parameters to the pair-matched sample. The difference in gender composition could also contribute to the differences in set point and variability, although this is less likely because gender is usually weakly/not related to basal levels (Hansen et al., 2008). Importantly, the ratios between the variability and the set point at noon as well as the regulation strength were roughly the same, indicating that, even though the absolute values of the parameters may be confounded, relative indices were not, and may be useful cross-sample indicators of cortisol dynamics.
The recovered parameters for regulation strength suggest that if one would give an isolated boost of cortisol secretion at one moment, the levels decrease to about 50 % after 10 min, and about 95 % after 30 min. With the current study, we mainly captured small fluctuations in salivary cortisol during daily life. One of the major factors contributing to within-person fluctuations is the ultradian (about hourly) rhythm of cortisol (Young et al., 2004). Although our results cannot be directly compared to laboratory studies on ultradian rhythms due to different analytical methods, the calculated autocorrelation curve fits roughly with what is known about these bursts. Laboratory studies with high-frequency sampling (i.e. at 10−20 minute intervals) have shown that cortisol fluctuations come in several bursts during the day (i.e. ultradian oscillations) (Young et al., 2004). Bursts come in hourly oscillations, and the burst half duration has been estimated between 16-27 minutes (Carroll et al., 2012;Veldhuis et al., 2011Veldhuis et al., , 1989. Finally, it has been found that subsequent bursts are not correlated to each other, meaning that there is no autocorrelation over > 1 h (Veldhuis et al., 1989), which matches our findings. Hence, with respect to what is known about cortisol regulation in the context of ultradian rhythms, our estimates for regulation strength seem plausible.
Based on our results and prior studies (Oravecz and Tuerlinckx, 2011), the recommendation for the number of observations for studying cortisol with the OU model is at least 50. Because of the continuoustime nature of the model, the total number of observations is more important than the within-day sampling frequency. In the current study, we obtained similar results across the two samples for the parameters that were based on relative estimates using either a high within-day sampling frequency over five days (t = 50) or a lower within-day sampling frequency over a month (t = 90). However, because cortisol's diurnal pattern (in our case quadratic) needs to be captured, the minimum within-day frequency is three. The OU model   S.H. Booij, et al. Psychoneuroendocrinology 115 (2020) 104598 could also be applied to experimental data (e.g. laboratory stressors), but the recommendation of at least 50 time points still stands. In these situations, it is worthwhile to sample several times within an hour to separate reactivity to the stress task from fluctuations due to other factors (e.g. ultradian rhythm). Our hypothesis that cortisol's regulation strength would be higher in depressed individuals was supported in the pair-matched sample only. It was found that depressed individuals, and in particular chronically depressed individuals, had faster returns to the set point (and reduced autocorrelation), while they did not credibly differ in terms of variability and set point. Although we replicated the finding of Peeters et al. (2004) in the pair-matched sample, we did not do so in the female twin sample. This may be in part due to different sample characteristics. The pair-matched sample was more similar to the Peeters et al. (2004) sample than the female twin sample, as it also recruited participants from outpatient clinics, and preselected on a minimum level of depressive symptoms (> 14 on the BDI). Also, in the pair-matched sample there was a preselection on the non-depressed individuals; scores above 8 on the BDI were not allowed. The female twin sample was recruited from the general population, and there was no preselection on symptom levels. Hence, the pair-matched depressed sample may have been a more severely depressed sample, in terms of symptomatology and/or functioning (given that they sought help for their problems), and the contrast between depressed and non-depressed individuals may have been greater. Although no direct comparisons can be made, the SCL-90 depression scores (usual cut-off: 1.56, Aben et al., 2002) of the depressed and non-depressed groups of the female twin sample seem to lie closer to each other than the BDI scores of the groups in the pairmatched sample. In combination with the general problem of the small number of individuals in the depressed groups, this may have reduced power to detect differences, especially in the female twin sample. However, the exact post-hoc power is difficult to quantify with complex multilevel stochastic differential equation models such as the current one. Another possibility for differential results between the pair-matched and female-twin sample is a difference in the distribution of individuals with melancholic versus other types of depression; melancholic depression has been associated with altered HPA axis functioning more consistently than other types of depression (Lamers et al., 2013;Karlović et al., 2012). Post hoc comparison of the distribution of individuals with symptoms associated with melancholic depression (hyposomnia and loss of appetite/weight) across both samples, showed that the pair-matched sample included relatively many individuals with melancholic symptoms, as compared to the female twin sample: five individuals (33.3 %) in the pair-matched sample, and three (14.3 %) in the female twin sample.
Thus, we found at least partial support for our hypothesis that depressed individuals have altered cortisol regulation strength. To understand possible underlying mechanisms for this finding, we connect our results to the existing literature. Another study has found lower autocorrelation in subsequent cortisol values as well as more erratic cortisol output. The explanation of these patterns focused on reduced negative glucocorticoid feedback after perturbations (Peeters et al., 2004); however, our results rather support the idea that cortisol regulation is increased. Although we agree that the overall pattern of the salivary cortisol time series looks more erratic and spiky, the resulting autocorrelation curve suggests that this is due to increased regulation strength over an already short interval, possibly at the level of ultradian oscillations. This is in line with studies in chronically stressed animals and severely depressed men, in which the frequency of ultradian pulsatility was increased (Deuschle et al., 1997;Young et al., 2004), and with a study in which the duration of a cortisol burst was decreased in depressed individuals with high cortisol levels compared to non-depressed individuals (Carroll et al., 2012).
Another interesting finding was the lower set point and variability and increased regulation with increasing age in the female twin sample. In other studies, age has been mainly examined in terms of basal levels, limiting our comparisons to the set point only. In some of these studies, it has been found that cortisol levels increase with age (Nicolson et al., 1997;Seeman et al., 2001), but in others this was found only for men (Larsson et al., 2009), or for specific times of day (Roelfsema et al., 2017). However, these studies mostly compared adults (30-40) to elderly people (60-80). The female twin sample included individuals in the broad age range of 20-60, with a few exceptions of 60 + . One study did look at age effects on cortisol across the day in a similarly broad age range (Roelfsema et al., 2017). Interestingly, aggregated over the day, they found increased cortisol levels in older individuals, but when looking at specific times of day, they found decreased cortisol levels in the afternoon, and increased levels in the late evening and night, suggesting a flattening of the circadian rhythm. Hence, our finding of decreased cortisol levels with increased age may be the result of the timing of our assessments.
Another possible explanation for the finding of lower salivary cortisol levels and variability in older (female) adults is the fact that this study was done in ecologically valid settings, while other studies were mostly conducted under more standardized conditions. These differences in research methods might themselves lead to differences in findings, as has been shown in other EMA studies (e.g. Bylsma et al., 2011). In developmental psychology, theories such as the strength and vulnerability integration (Charles, 2010) and socioemotional selectivity theory (Carstensen et al., 1999) postulate that healthy older adults are more efficient at regulating their affect than younger adults, in part because older adults limit exposure to potentially perturbing stimuli. Evidence for this proposition has been found with EMA studies as well (Wood et al., 2017). Wood et al. (2017) found reduced emotional arousal and arousal variability with increasing age in a cohort of individuals in the range of 18-89. Because cortisol is sensitive to similar kinds of perturbing stimuli as feelings of arousal (Koolhaas et al., 2011), a similar explanation may hold for decreasing levels and variability of cortisol in older adults.

Strengths and limitations
The results should be interpreted in the light of several strengths and limitations. A strength of this study was the large female twin sample and the examination of a second, slightly different replication sample, which allowed us to draw more firm conclusions about cortisol dynamics in daily life. A drawback was that the depressed subgroups in both samples were rather small. Hence, we may have lacked power to detect differences between the depressed and non-depressed subgroups. In addition, several potential confounding variables that could also explain individual differences in cortisol dynamics were not taken into account. These include factors like menstrual cycle phase during the EMA paradigm, and menopausal status.

Conclusion
We examined several aspects of cortisol dynamics in two samples of depressed and non-depressed individuals, in ecologically valid settings. Despite differences in sample size, gender, inclusion criteria, and sampling schemes, the relative parameters were similar across samples, indicating that cortisol dynamics can be successfully captured in daily life with the continuous-time process model that we used. Furthermore, we provided further evidence that individual differences in cortisol regulation strength relates to depression. Future studies should replicate these effects in larger samples, especially with more depressed individuals and preferably with measurement burst designs (Boker et al., 2009;Ram et al., 2014), assessing these dynamics before, during and after a depressive episode, to better understand the nature of the association between changing cortisol dynamics and depression. Finally, we suggest to further explore the nature of individual differences in cortisol regulation, by including contextual moderators.

Funding
Since its origin the East Flanders Prospective Survey has been partly supported by grants from the Fund of Scientific Research, Flanders and Twins, a non-profit Association for Scientific Research in Multiple Births (Belgium). The current project was supported by the Foundation "De Drie Lichten" in The Netherlands, and the Foundation "Stichting Koningsheide" in The Netherlands. Dr

Declaration of Competing Interest
The authors have no conflicts of interest to disclose.