Hypothalamus volume and DNA methylation of stress axis genes in major depressive disorder: A CAN-BIND study report

Dysfunction of the hypothalamic-pituitary-adrenal (HPA) axis is considered one of the mechanisms underlying the development of major depressive disorder (MDD), but the exact nature of this dysfunction is unknown. We investigated the relationship between hypothalamus volume (HV) and blood-derived DNA methylation in MDD. We obtained brain MRI, clinical and molecular data from 181 unmedicated MDD and 90 healthy control (HC) participants. MDD participants received a 16-week standardized antidepressant treatment protocol, as part of the first Canadian Biomarker Integration Network in Depression (CAN-BIND) study. We collected bilateral HV measures via manual segmentation by two independent raters. DNA methylation and RNA sequencing were performed for three key HPA axis-regulating genes coding for the corticotropin-binding protein (CRHBP), glucocorticoid receptor (NR3C1) and FK506 binding protein 5 (FKBP5). We used elastic net regression to perform variable selection and assess predictive ability of methylation variables on HV. Left HV was negatively associated with duration of current episode (ρ = -0.17, p = 0.035). We did not observe significant differences in HV between MDD and HC or any associations between HV and treatment response at weeks 8 or 16, overall depression severity, illness duration or childhood maltreatment. We also did not observe any differentially methylated CpG sites between MDD and HC groups. After assessing functionality by correlating methylation levels with RNA expression of the respective genes, we observed that the number of functionally relevant CpG sites differed between MDD and HC groups in FKBP5 (χ2 = 77.25, p < 0.0001) and NR3C1 (χ2 = 7.29, p = 0.007). Cross-referencing functionally relevant CpG sites to those that were highly ranked in predicting HV in elastic net modeling identified one site from FKBP5 (cg03591753) and one from NR3C1 (cg20728768) within the MDD group. Stronger associations between DNA methylation, gene expression and HV in MDD suggest a novel putative molecular pathway of stress-related sensitivity in depression. Future studies should consider utilizing the epigenome and ultra-high field MR data which would allow the investigation of HV sub-fields.


Introduction
The ongoing search for biomarkers and predictors of response in major depressive disorder (MDD) has seen varying levels of success from separate clinical, molecular and neuroimaging perspectives (Kang and Cho, 2020;Perna et al., 2020;Fiori et al., 2018;Mora et al., 2018). In the present study, we combined and examined relationships between a priori-defined clinical, neuroimaging and molecular variables within the theoretical framework of the hypothalamic-pituitary-adrenal (HPA) stress axis, posited to be dysregulated in MDD (Cernackova et al., 2020;Halaris, 2019).
The HPA axis is comprised of a series of hormonal actions and feedback regulation steps primarily among the hypothalamus, pituitary and adrenal glands. Corticotropin-releasing hormone (CRH) is excreted by the hypothalamus into portal blood, stimulating the anterior pituitary to release adrenocorticotropic hormone (ACTH) that results in cortisol release from the adrenals. Cortisol exerts negative feedback on this axis by binding to glucocorticoid receptors (GR) located in the hypothalamus and pituitary, as well as other central regions such as the hippocampus and prefrontal cortex. A plausible etiological theory of MDD is the desensitization of GR to cortisol due to inflammation, resulting in a dysregulated and prolonged stress response (Pace et al., 2007). FK506 binding protein 51 (FKBP5) is a chaperone to GR, and the two have been implicated in multiple studies with respect to MDD susceptibility and risk (Roy et al., 2017;Rao et al., 2016;Piechaczek et al., 2019), antidepressant response Binder et al., 2004) and childhood maltreatment (Bockmühl et al., 2015). Although some studies have examined the interaction of molecular variables pertaining to these genes with structural and functional neuroimaging (Tozzi et al., 2018, for example), none have done so in relation to hypothalamus volumetry (HV). Of particular note is the lack of studies examining the link between HV and childhood maltreatment in the imaging literature, although recent preclinical work suggests that early life stress suppresses cellular proliferation in the hypothalamus in adulthood (Bielefeld et al., 2021). There has been some success identifying volumetric indicators of MDD symptoms and antidepressant response in vivo (Kang and Cho, 2020), established most reliably in the hippocampus (Nogovitsyn et al., 2020;Santos et al., 2018). Additionally, antidepressant treatment has been associated with reversal of hippocampal volume deficits to a certain extent on a long-term basis (Malberg et al., 2021;Boldrini et al., 2013). Both the hippocampus and the hypothalamus express significant levels of GR, and although it is widely known that the hippocampus exhibits neurogenesis, the discovery of the same phenomenon in the hypothalamus is relatively more recent and less explored (Markakis et al., 2004;Cheng, 2013). Despite this and its central role in stress regulation, the hypothalamus is not studied often volumetrically in MDD due to its amorphous, heterogeneous structure leading to difficulty in segmentation. A few studies have examined group differences in volume in both in vivo and post-mortem contexts that were discrepant due to methodological limitations (Schindler et al., 2012). Post-mortem studies (using the same patient sample) found bilaterally smaller volumes of the hypothalamus in MDD (Bielau et al., 2005;Bernstein et al., 2012). Similar results were found in studies examining hypothalamus volume in anxiety (Terlevic et al., 2013) and schizophrenia (Koolschijn et al., 2008). However, a recent imaging study found greater left-sided volume of this region in MDD in a small sample of 20 MDD participants (Schindler et al., 2019). We aim to leverage a larger and well-defined MDD study sample to add to these results.
The objective of this study was to investigate the relationship between HPA-axis-related brain and blood variables in MDD in an exploratory fashion. We used bilateral HV and the blood DNA methylation and gene expression profiles of three genes involved in the HPA axis: CRH binding protein (CRHBP) as a proxy for CRH, NR3C1 (encodes GR) and FKBP5. We used the elastic net regularization method to quantify the extent to which DNA methylation at individual CpG sites on these genes predict HV. Secondary objectives included a hypothesisdriven analysis for group differences in HV measures and their clinical correlations (including childhood maltreatment, which has not yet been studied in relation to HV) in the largest analysis on this brain structure to date, for which we predicted smaller volumes in MDD. An additional discovery analysis was performed to assess whether there are any longitudinal changes in HV over 8 weeks of escitalopram treatment.

Study protocol
Participant recruitment occurred at six Canadian research sites from August 2013 to December 2016. All protocols were approved by the Research Ethics Boards at each institution. MDD participants were unmedicated at baseline, aged 18-60 years and met DSM-IV-TR criteria for a major depressive episode. MDD participants received flexible-dosage escitalopram treatment for 8 weeks, starting at 10 mg with a maximum dose of 20 mg per day. Clinical response was defined as a greater than 50% reduction in the Montgomery-Asberg Depression Rating Scale (MADRS) score after 8 weeks. At the 8-week timepoint, those who did not achieve response received an adjunctive treatment of aripiprazole (flexible dosage 2-10 mg/day); responders continued escitalopram treatment as originally prescribed. At 16 weeks, MDD participants were further categorized into the following response categories: early sustained responders (achieved response at week 8, continued to respond at week 16), late responders (did not achieve week 8 response, but achieved response at week 16 with adjunctive treatment) and non-responders (week 8 non-responders that failed to respond to adjunctive treatment by week 16). Additional inclusion/ exclusion criteria for the CAN-BIND 1 study can be found in Supplementary Materials. Further details pertaining to clinical, neuroimaging and biomarker protocols as well as patient outcomes and a CONSORT diagram outlining the flow of participants throughout the 16-week trial are available elsewhere Lam et al., 2016;Mac-Queen et al., 2019).

Image acquisition
Structural T1-weighted images were obtained at 3T at baseline and week 8. Across the six clinical sites four scanner types were used: three GE Discovery MR750, GE Signa HDxt, Siemens TrioTim and Philips Intera. All sites used a whole-brain turbo gradient echo sequence with the following ranges of parameters: repetition time (TR) = 6.4-1760 ms, echo time (TE) = 2.2-3.4 s, flip angle = 8-15 degrees, inversion time (TI) = 450-950 ms, field of view (FOV) = 220-256 mm, acquisition matrix = 256 × 256 -512 × 512, 176-192 contiguous slices at 1 mm thickness with voxel dimensions of 1 mm isotropic. Total acquisition time ranged from 3:30 min to 9:50 min. Raw data were manually checked for artifacts and if the visual inspection warranted, efforts were made to re-scan the participant as permitted by study timeline. Refer to MacQueen et al. (2019) for specific details on the neuroimaging protocols.

Manual hypothalamus segmentation
The retrospective nature of our investigation of this brain region and the variability in our 3T MRI data due to multiple scanners required manual segmentation to ensure data quality. We drew upon many recent sources (described in detail below) pertaining to hypothalamic boundaries and strengthened our protocol by blinding raters to subject identity/attributes.
Manual segmentation of the hypothalamus was performed by two independent raters (J.S. & M.A.) using the open-source software itk-SNAP (http://www.itksnap.org/). Segmentation was aided by an overlay of the subject's gray matter tissue probability map (GM-TPM) as per Wolff et al. (2018), automatically generated using the VBM-Segment tool in SPM 12 in Matlab (with very light regularization, all other default settings maintained). Raters were blinded to subject ID, scanning site, treatment group and scanning timepoint. Subjects were shuffled and images were segmented in random order. A training phase for raters comprised at least 100 'throwaway' segmentations prior to proper data collection of segmentation values to overcome any practice effects.
The segmentation boundaries from Bocchetta et al. (2015) provided a visual guide for manual work, where the authors used co-registration of T1 and T2-weighted images to discern finer details and subdivide the segmentation into subunits. To make up for our lack of T2-weighted images, for each subject we used the GM-TPM as a transparent overlay (opacity ~6-10%) for additional perspective (Fig. 1). We deviated slightly from the protocol of Bocchetta et al. (2015). While that group sub-segmented and subsequently excluded the fornix from volume calculations, we included the fornix in our segmentation once the medial edge separated from the wall of the third ventricle, thereby aligning more closely with the inclusion criteria in Schindler et al. (2013). The reason for this deviation is that in our 3T dataset, the fornix becomes difficult to distinguish from the surrounding gray matter as it starts to travel away from the ventricle wall in more caudal slices. Detailed anatomical landmarks are tabulated in table e1 of the supplement in Bocchetta et al. (2015).
We carried out voxel-wise segmentation in itk-SNAP (brush size 1, isotropic) to 'shade in' the right and left hemispheres of the hypothalamus separately, working in the coronal view in the rostral-caudal direction with sagittal and horizontal views visible for reference. We alternated starting with the left or right hemispheres and for each image and hid the first side from view as the other hemisphere was segmented to avoid visual bias. Whole volume segmentations included the preoptic region (starting at the anterior commissure), anterior hypothalamic region and mammillary bodies (stopping at the last 'complete' slice). We randomly selected 10-15% of segmentations from each site as well as a select number of more challenging segmentations (N = 50 segmentations in total) for inspection by a neuroanatomist/radiologist (M.R.), who provided expert quality assurance feedback on the boundaries.
Inter-rater agreement scores were calculated across all segmentations and intra-rater agreement was calculated independently for each rater using a randomly selected set of 50 segmentations. Spatial overlap was calculated using Dice scores and numeric agreement was calculated using two-way mixed, single score intraclass correlation coefficient (ICC; Shrout and Fleiss convention [1,3]). The average of the values from each independent rater were used in subsequent analyses. Volumes were corrected for intracranial volume (ICV) using the powerproportion method as described in Liu et al. (2014), shown to exhibit better performance in removing the confound of ICV compared to scaling by a linear factor only. The scaling factor s was defined as the beta coefficient for log(HV) regressed on log(ICV) for each hemisphere. The resulting ICV-corrected HV values were obtained by dividing the original values by ICV s . A measure of hemispheric asymmetry was calculated as (Left volume -Right volume)/Total volume (Zuo et al., 2019).

Molecular analyses of RNA transcript levels and DNA methylation
For all participants, molecular analyses were performed at McGill University and the Genome Quebec Innovation Centre; full methodological details pertaining to DNA methylation analyses can be found in Ju et al. (2019). Briefly, genome-wide DNA methylation was performed on the Infinium MethylationEPIC Beadschip (Illumina, USA) using DNA extracted from whole blood samples obtained from participants at baseline prior to treatment. Quality assurance and pre-processing were performed using the Chip Analysis Methylation Pipeline Bioconductor package in R 3.4. Beta values at each CpG site refer to the ratio of methylated signal to the sum of unmethylated and methylated signal.
For gene expression analyses, whole blood for RNA was collected in EDTA tubes and filtered using LeukoLOCK filters (Life Technologies, USA). Total RNA was extracted using a modified version of the Leuko-LOCK Total RNA Isolation System protocol and included DNase treatment to remove genomic DNA. RNA quality was assessed using the Agilent 2200 Tapestation, and only samples with RNA Integrity Number (RIN) ≥ 6.0 were used. All libraries were prepared using the Illumina TruSeq mRNA stranded protocol following the manufacturer's instructions. Samples were sequenced at McGill University and the Genome Quebec Innovation Centre (Montreal, Canada) using the Illumina HiSeq4000 with 100nt paired-end reads. FASTXToolkit and Trimmomatic were respectively used for quality and adapter trimming. Tophat2, using bowtie2 was used to align the cleaned reads to reference genome. Reads that lost their mates through the cleaning process were aligned independently from the reads that still had pairs. Quantification on each gene's expression was estimated using HTSeq-count and a reference transcript annotation from ENSEMBL. Counts for the paired and orphaned reads for each sample were added to each other. Normalization was conducted on the resulting gene matrix using DESeq2.

Childhood maltreatment measure
The Childhood Experience of Care and Abuse (CECA) interview was used to assess childhood maltreatment and caregiver relationships up to age 18 (Bifulco, Brown, and Harris, 1994;Chakrabarty et al., 2019). CECA is retrospective and semi-structured, administered via secure videoconferencing and rated using standardized criteria. Subscales of antipathy, neglect, physical abuse and sexual abuse were rated on a 4-point severity scale based on narrative details such as frequency, age of experiencing the maltreatment and for how long, relationship with perpetrator and degree of injury. Subscales were summed to yield an overall childhood maltreatment score.

Statistical analyses
All statistical analyses were performed in Python 3.8.1 using packages statsmodels, sklearn and pingouin (Vallat, 2018). We tested for significant differences between demographic variables using either Student's t-test or chi-square test for continuous and categorical variables, respectively. Testing for group differences in HV measures (left [LHV], right [RHV], asymmetry) was performed using Student's t-test (in order to report confidence intervals and effect sizes) and subsequently general linear model to further control for sex and age. Spearman's rank correlations with each HV measure, which were normally distributed, were performed with the following clinical measures, which were not normally distributed: baseline symptom severity (MADRS; individual items and total score), childhood maltreatment (CECA; sub-scores and total score) current episode duration, age of MDD onset and number of prior episodes. Baseline HV variables were compared between antidepressant response groups at week 8 (responders, non-responders) and week 16 (early sustained responders, late responders, non-responders) to determine whether baseline values are associated with treatment response. Linear mixed models were used for the longitudinal analysis of HV change over 8 weeks, during which the MDD group received escitalopram treatment, accounting for sex and age. Site effects for both HV and methylation were assessed with analysis of variance. Correlations with age and sex differences were exploratory tests performed for right, left and asymmetry HV measures.
We assessed variances for each CpG site, excluding sites whose standard deviation fell below the threshold of 0.02. All analyses described hereon use only CpG sites that passed this variance threshold. After checking homoscedasticity using Bartlett's test, Student's t-tests were used to compare methylation levels between MDD and HC groups at each CpG site, corrected for multiple comparisons using false discovery rate (FDR) at p = 0.05. To assess functional significance of CpG sites for each gene, we computed Spearman's ρ on the normalized beta values of each CpG site and the expression level of their respective genes, within each group. Statistical significance of correlations for the MDD group, which comprised twice as many samples as the HC group, were confirmed with 95% confidence intervals of ρ with a bootstrapping procedure that used a resampling size identical to that of the HC group. Correlations were further corrected for multiple comparisons using false discovery rate (FDR) at p = 0.05. Chi-square test was used to assess the significance of the group difference in the numbers of significant gene expression/methylation correlations.
To explore relationships between HV measures and methylation profiles (low variance sites excluded), we used multiple linear regression with elastic net regularization (Zou and Hastie, 2005). Regularization methods optimize model coefficients such that the most important variables are assigned higher values. Due to combining variable selection and parameter estimation in one step, these methods do not run the risk of inflated standard errors and model variances as traditional ordinary least squares methods do (Finch and Maria, 2016) and are additionally well-suited for the common scenario where the number of predictors is smaller than or roughly equivalent to the number of samples.
Models were constructed for left and right HV each as a dependent variable and in each diagnostic group separately, resulting in 4 elastic net models (MDD-LHV, MDD-RHV, HC-LHV, HC-RHV). Sex and age were included in addition to methylation profiles at each CpG site, comprising 64 explanatory variables in total. For each model, we split the data into train/test samples using a 0.75:0.25 ratio and used 10-fold cross-validation and grid search for hyperparameter optimization. We repeated this procedure 100 times for each model and recorded values for mean squared error (MSE), mean absolute error (MAE) and R 2 on the test sample at each iteration. To calculate relative variable importance (RVI) scores, we counted how many times out of 100 runs the absolute value of each variable's coefficient ranked in the top 15% of model coefficients. We cross-referenced CpG sites with RVI scores > 50 with those that were significantly correlated with gene expression (indicating functional significance) in order to infer which sites might reasonably be associated with HV. Although we are not able to directly infer p-values for coefficients, the procedure of regularization in itself is designed to shrink 'non-significant' coefficients toward zero; additionally, we included sex and age as reference variables to infer relative strength of the other methylation variables in predicting HV.

Results
274 images from baseline and 223 images from week 8 were ultimately included for final analyses after excluding for visual quality issues (e.g., failing GM-TPM generation, overly pixelated quality) for a total of 497 manual segmentations. The outlier analysis resulted in removal of 3 participants for a total of 271 participants (181 MDD, 90 HC) at baseline and 217 participants at week 8 (138 MDD, 79 HC, no outliers found). There were no significant differences between groups in mean age, female:male ratio or ethnicity. Groups significantly differed in years of education, baseline MADRS, % family history of psychiatric disorders and total CECA score (Table 1).

Site effects
A one-way ANOVA revealed no effect of site on methylation values but did indicate an effect on HV values. Further pairwise testing revealed that one site (MCU) exhibited significantly different HV values in relation to all other sites. No significant differences were observed among the other 5 sites. A statistical correction method, ComBat (Fortin et al., 2017), was used to eliminate the effect of site while retaining variance associated with diagnosis, sex and age (Supplementary Figure 1); however, using site-corrected values did not affect statistical significance of results. As the estimation of site effects is inherently noisy and there is no guarantee that we corrected for only the variance truly associated with scanning site, we report uncorrected values to prioritize simpler modeling (Kass et al., 2016). Refer to Supplemental Table 1 for demographic and clinical information grouped by site.

Inter-and intra-rater agreement
Raters exhibited good agreement with themselves and between each other overall with values equal to or greater than 0.78. Refer to Supplemental Table 2 for inter-and intra-rater agreement statistics.

Demographic and clinical associations with HV
GLM analyses (satisfying statistical assumptions of normality, linearity and homoscedasticity) accounting for sex and age revealed no significant difference between MDD and HC in either RHV or LHV ( Fig. 2A; see Supplemental Table 3 for t-test results, power analyses and effect sizes for mean differences). There was a negative association of LHV with duration of current episode (Spearman's ρ = − 0.166, p = 0.03). With respect to individual MADRS items, we observed for LHV a significant positive correlation with difficulty concentrating (ρ = 0.16, p = 0.035). For RHV, the negative correlation with inner tension was significant (ρ = − 0.19, p = 0.011). However, these clinical associations did not survive FDR correction for comparisons made across all items. There was no association of HV variables with CECA scores, baseline total MADRS, illness duration, family history of psychiatric disorders or antidepressant response status at 8 or 16 weeks, accounting for age and sex.
Within the MDD group, a significant difference between males and females was observed in RHV (t = 2.13, p = 0.044) but not LHV (t = 1.64, p = 0.10). This pattern was similar in the HC group with no sex difference observed in LHV (t = 1.81, p = 0.08) although the difference in RHV was marginally significant (t = 1.96, p = 0.05) (Fig. 2B). Significant associations with age were observed in both left and right HV in the MDD group (LHV: ρ = − 0.27, p = 2.9e-4; RHV: ρ = − 0.24, p = 1.3e-3) but not in the HC group. Slopes were similar in controls but not statistically significant (likely due to an uneven age distribution at higher values in this group). There was no effect of group on the association with age in either hemisphere (Fig. 2C).
Longitudinal analysis revealed no significant changes in HV variables over 8 weeks of escitalopram treatment in the MDD group. Visualization of baseline and week 8 values reveal a small yet consistent group difference that persisted after 8 weeks ( Supplementary Fig. 2), but these differences were not statistically significant at either timepoint.

Functional relevance analysis of CpG sites
We excluded 15 out of 23 CpG sites on CRHBP, 24 out of 44 sites on FKBP5 and 54 out of 81 on NR3C1 from analyses due to low variance (SD < 0.02). Within the MDD group, 11 out of the remaining 20 CpG sites on FKBP5 correlated significantly to gene expression levels, and 8 out of 27 were significantly correlated for NR3C1. Within the HC group, the methylation profiles of 4 CpG sites on NR3C1 were significantly correlated with gene expression (Fig. 3; Supplemental Table 4). Chisquare tests confirmed that the number of functionally relevant CpG sites within NR3C1 and FKBP5 in the MDD group were significantly different from the number of significant correlations observed in HC (FKBP5: χ 2 = 77.25, p < 0.0001; NR3C1: χ 2 = 7.29, p = 0.007). Neither group exhibited any functionally relevant CpG sites within CRHBP.

Elastic net modeling of the effect of DNA methylation on HV
On average across runs, regularized models for predicting RHV and LHV in MDD explained significantly higher amounts of variance (R 2 ) than in HC (LHV: Welch's t = 10.91, p = 2.63e-20; RHV: Welch's t = 8.36, p = 2.3e-14) (Table 2; Fig. 4). The variable of sex was shown to be consistently highly ranked across all four models, although age ranked highly only in the HC models. In the MDD group only, two highly ranked CpG sites were shown to also be significantly correlated with gene expression (cg20728768 [NR3C1], cg03591753 [FKBP5]). The coefficient signs for both variables indicated positive associations of methylation with HV. CRHBP featured most heavily among highly ranked variables in the MDD models (4 CpG sites) with the other 4 sites evenly split among NR3C1 and FKBP5, whereas all highly ranked CpG sites in the HC models were located on FKBP5.

Discussion
We investigated characteristics of hypothalamus volume and Table 1 Demographic and clinical information for the MDD and HC samples. Superscripts indicate the significance of the test statistic comparing patient and healthy control samples. 'ns' -p > 0.05, no significant differences between samples. '***' -p < 0.005. The N indicates the number of participants for which the corresponding information is available. epigenetic/gene expression profiles of three specific genes with major roles in the HPA axis in MDD. Novel findings of this study include the identification of two functionally relevant CpG sites that were predictive of HV within the MDD group, but not in healthy individuals, as well as the negative association of LHV with current episode duration. We showed that there are distinct relationships between HV and sex/age that are not affected by MDD status. We did not find any indication of a change in HV associated with 8-week escitalopram treatment. Finally, we did not observe that baseline volume was predictive of antidepressant response status at 8 or 16 weeks. We found distinct differences between groups on the correlations between methylation and gene expression, whereby MDD exhibited more CpG sites within each gene that significantly correlated with its RNA expression levels; however, reduced methylation levels at 3 CpG sites (2 located on NR3C1 and 1 on FKBP5) in MDD did not survive correction for multiple comparisons. Elastic net models further quantified the relative contributions of each CpG site in predicting HV. One of the highly ranked CpG sites that was also significantly correlated with gene expression was cg20728768, located on NR3C1. Previously, methylation at this site was found to be positively correlated with a measure of resilience in a sample with varying levels of PTSD symptom severity (Miller et al., 2020). In our study, methylation at this site was positively associated with HV, consistent with the implication that it may be a protective factor. In another study, it was negatively associated with PTSD, although this correlation did not survive correction for multiple comparisons (Mehta et al., 2019). The other significant CpG site was identified to be cg03591753, located on FKBP5. In one previous study, methylation at this site significantly predicted cortisol reactivity in interaction with the rs1360780 variant of the FKBP5 gene and resistant behavior to predict cortisol reactivity in infants (Mulder et al., 2017). We add to these findings, positing that DNA methylation levels at these CpG sites are linked to hypothalamic macrostructure in depression.
The same explanatory variables, including sex and age as reference variables, explained significantly less model variance overall within the HC group. The HC models also consistently ranked sex and age highly among explanatory variables in predicting HV, although age was not significantly correlated with HV in HC as indicated by the Spearman correlation; this gives some indication as to the relative strength of the association of HV with methylation variables that were ranked below age in HC compared to MDD (ie. relatively weaker). Conversely, although age was significantly correlated with HV in the MDD group, its RVI was <20 in both MDD elastic net models, indicating that the methylation variables that ranked higher were relatively more successful in predicting RHV and LHV than age. Additionally, only the MDD models identified highly ranked sites that were also significantly correlated with gene expression. Taken together, these results demonstrate greater overall statistical associations among variables of gene expression, DNA methylation and hypothalamic structure in MDD than in HC. This may be indicative of a concerted physiological response in MDD that could either be 1) a developmental or environment-elicited disruption leading to MDD symptoms or 2) a compensatory mechanism by the body to counteract other underlying etiological factors. With only baseline data available for DNA methylation, we were not able to test causal hypotheses. Although we did not observe group differences in DNA methylation among the CpG sites in these 3 genes, this was not entirely surprising given other genes have been identified to exhibit differential methylation in MDD (Chan et al., 2020;Xie et al., 2021).
Not observing significant group differences in either left nor right HV suggests that either there are no hypothalamic volumetric changes in MDD, or the volumetric changes might be very small and would require much larger samples to be detected. The latter is likely the case, as indicated by power analyses (Supplementary Table 3). In the only other

Table 2
Mean elastic net model error parameters (out of 100 runs) and top-ranking CpG sites (RVI>50) for each model, listed in order of RVI. Double asterisks (**) indicate CpG sites that were significantly related to respective gene expression levels.
study that explicitly examined HV in MDD in vivo, Schindler et al. (2019) found essentially the opposite result: their sample of unmedicated MDD participants exhibited greater left HV in a much smaller cohort of only 20 MDD and 20 HC participants. Other relevant major differences between our analyses aside from sample size are the correction for ICV using the power-proportion method and including age as a covariate given that we observed a significant HV-age correlation in the MDD sample. Otherwise, the lack of association between HV and other clinical variables is consistent across the two studies. We additionally did not find an association between HV and CECA, the composite score of childhood maltreatment. As there are no previous studies examining this relationship available for direct comparison, we entertain a few explanations: 1) early childhood experiences of emotional, physical or sexual abuse are not reflected by hypothalamic structure, 2) the CECA score, with its 4-point sub-scores, does not capture the specific lasting aspects of trauma that may be embedded in HV, or 3) similar to depressive status, more power is required to detect a finer-grained relationship between early trauma and HV. Previous literature regarding the link between HPA axis variables and depressive characteristics has been equivocal. It has previously been observed that most depression characteristics (severity, chronicity, symptom profile, prior childhood trauma) were not associated with HPA axis activity except for comorbid anxiety (Vreeburg et al., 2009). A more recent meta-analysis assessing effects across 361 studies did observe that patients with depression exhibited increased cortisol and adrenocorticotropic hormone levels; however, this effect did not extend to CRH, with the additional caveat that methodological limitations may have inflated some effects among primary studies (Stetler and Miller, 2011). That we observed no change in volume due to escitalopram treatment is not entirely surprising given the heterogeneous and contradictory evidence for detectable volumetric changes in humans following pharmacological treatment in MDD (Enneking et al., 2020;Suh et al., 2020).
With respect to the imaging component, we adapted innovative methods from previous papers (Wolff et al., 2018;Bocchetta et al., 2015;Schindler et al., 2013) to support a comprehensive and highly structured manual segmentation protocol designed to mitigate the limitations inherent in the study (eg. 3T resolution, scanner variability, lack of T2-weighted images for contrast, inter-rater agreement). A proportion of segmentations was also checked for boundary accuracy and overall quality by a certified neuroradiologist with extensive neuroanatomical expertize. Given our data were derived from 3T imaging from multiple scanners, we did not have sufficient resolution to do a subunit-level analysis (Bocchetta et al., 2015). In particular, since the paraventricular nucleus is solely responsible for the release of CRH, being able to isolate the volume of this region may have increased our chances of detecting a relationship with GE and/or DNA methylation. A major recommendation for future research is to use higher resolution images coupled with a robust semi-automatic segmentation procedure in order to be able to discern hypothalamic sub-fields. Another common limitation in our study, as in other human in vivo studies, is that despite having access to structural and functional data in vivo, it was not possible to obtain the direct genetic and proteomic correlates in the human brain. Future studies could also explore the role of pro-inflammatory markers (Hiles et al., 2012;Milenkovic et al., 2019;Gill et al., 2020), notwithstanding the current challenges in integrating multimodal information in biological psychiatry.

Conclusions
In a large sample of well-characterized individuals with major depression, we investigated baseline hypothalamic volume characteristics in MDD and whether they are correlated to DNA methylation and expression of candidate genes implicated in HPA axis function. We identified CpG sites whose methylation levels simultaneously exhibited high scores in predicting HV and were significantly correlated to gene expression in MDD but not in HC, which may suggest a molecular pathway of stress-related sensitivity in depression. Future studies could expand their genetic approach to either the entire epigenome or other/ additional candidate genes, as well as utilizing increasingly available ultra-high-resolution images at 7T or higher to discern hypothalamic sub-fields.

Fig. 4.
Plots illustrating the RVI scores of the 64 explanatory variables for each elastic net model, encompassing all CpG sites as well as sex and age as reference variables. Variables are color coded to indicate the gene on which they are located as illustrated by the legend. The threshold for a CpG site being considered a highly ranked variable is denoted with a dashed red line (RVI = 50). See Table 2 for the model mean error/fit parameters and the top-ranked variables for each model.

Funding
This study was supported in part by the Ontario Ministry of Research and Innovation (Early Research Award -Dr. Frey). JS is supported by the Canadian Institutes for Health Research (CIHR): Frederick Banting and Charles Best Canada Graduate Scholarships Doctoral Award. The neuroimaging platform was supported in part by a CIHR grant (Co-PIs: Drs. Kennedy and MacQueen, MOP 125880). CAN-BIND is an Integrated Discovery Program carried out in partnership with, and financial support from, the Ontario Brain Institute, an independent non-profit corporation, funded partially by the Ontario government. The opinions, results and conclusions are those of the authors and no endorsement by the Ontario Brain Institute is intended or should be inferred. Additional funding was provided by CIHR, Lundbeck, Bristol-Myers Squibb, Pfizer, and Servier. Funding and/or in-kind support is also provided by the investigators' universities and academic institutions. All study medications were independently purchased at wholesale market values.