Metabolomic profiles in breast cancer: a pilot case-control study in the breast cancer family registry

Metabolomics is emerging as an important tool for detecting differences between diseased and non-diseased individuals. However, prospective studies are limited. We examined the detectability, reliability, and distribution of metabolites measured in pre-diagnostic plasma samples in a pilot study of women enrolled in the Northern California site of the Breast Cancer Family Registry. The study included 45 cases diagnosed with breast cancer at least one year after the blood draw, and 45 controls. Controls were matched on age (within 5 years), family status, BRCA status, and menopausal status. Duplicate samples were included for reliability assessment. We used a liquid chromatography/gas chromatography mass spectrometry platform to measure metabolites. We calculated intraclass correlations (ICCs) among duplicate samples, and coefficients of variation (CVs) across metabolites. Of the 661 named metabolites detected, 338 (51%) were found in all samples, and 490 (74%) in more than 80% of samples. The median ICC between duplicates was 0.96 (25th-75th percentile: 0.82-0.99). We observed a greater than 20% case-control difference in 24 metabolites (p < 0.05), although these associations were not significant after adjusting for multiple comparisons. These data show that the assays are reproducible for many metabolites, with minimal laboratory variation for the same sample and large between-person variation. Despite the small sample size, differences between cases and controls in some metabolites suggest that a well-powered large-scale study is likely to detect biologically meaningful differences and provide a better understanding of breast cancer etiology.


Background
Metabolomics is the systematic survey of the small molecules (< 1 kDa in size) that are the products of metabolism in biological systems [1,2]. A metabolic phenotype represents the collection of metabolites within the body, which reflects influences from both genetic and lifestyle/environmental factors. Because metabolites include the intermediate and end products of cellular processes, metabolomics provides a functional readout of the physiological state of health and disease. Changes in energy metabolism within cells are one of the hallmarks of carcinogenesis. Under aerobic conditions, normal cells metabolize energy by first converting glucose into pyruvate and then to carbon dioxide, and under anaerobic conditions, cells metabolize by glycolysis. However, the converse is true for cancer cells, where under aerobic conditions, energy metabolism occurs largely by glycolysis, i.e., "aerobic glycolysis" [3]. Thus, a characterization of metabolic processes may provide new insights into carcinogenesis. In recent years, metabolomics has emerged as an important tool for the identification of biomarkers in a growing number of applications, including early disease detection, monitoring of disease progression, and investigation of metabolic pathways. The application of metabolomics has yielded novel signatures predicting the occurrence and progression of complex diseases, including cancers of the breast [4], prostate, colon, and kidney [5][6][7][8].
Most metabolomics studies of breast cancer to date have been conducted in tumor tissues or cell lines, with the goals of distinguishing cancer from normal tissue and cancers with metastasis from those without, as well as identifying therapeutic targets [9][10][11]. Data from these studies have suggested that metabolomic profiles may differ by pathological and molecular subtype of breast cancer. In large-scale epidemiologic studies, blood and urine are more readily available than tissue. Because blood and urine serve as transporters of nutrients and wastes to and from cells for excretion, and maintain homeostasis of essential molecules and fluid levels, they are sensitive indicators of health and perturbations from diseases. Several studies have measured urinary metabolic profiles and found promising candidate markers for early detection and monitoring of breast cancer progression [12][13][14]. However, studies using pre-diagnostic blood are limited. To our knowledge, there has been only one previously published study on metabolomics and breast cancer risk using pre-diagnostic blood [4], warranting additional studies to replicate the findings in other populations. We conducted a pilot study to generate preliminary data to assess whether circulating metabolomic profiles could be detected in pre-diagnostic plasma samples of women enrolled in the Breast Cancer Family Registry (BCFR) cohort, and to evaluate the reproducibility of metabolomic assays.

Study population
Pre-diagnostic plasma samples were obtained from the BCFR, an international prospective cohort of breast cancer families established in 1995 [15][16]. For this pilot study, samples were selected from the Northern California site (NC-BCFR), which enrolled women with newly diagnosed breast cancer (probands) identified through the population-based cancer registry of the San Francisco Bay area and family members [17]. At baseline, participants completed a risk factor questionnaire and provided a blood sample. During follow-up, newly diagnosed breast cancer cases were identified among family members who were unaffected at baseline. This pilot study included 45 women who were diagnosed with breast cancer at least one year after the blood draw (cases) and 45 women who did not develop breast cancer (controls). Of the 45 cases, 72% were confirmed via cancer registry linkage or pathology reports; the remainder were self-reported. Controls were matched to cases on family status (a sister was selected if available; if more than one sister was available, we selected the sister closest in age), age at blood draw (±5 years), menopausal status at diagnosis, and number of affected first-degree relatives (1, 2, or ≥3). The age range for cases was 26-80 years (average 52.4 years), and that for controls was 36-73 years (average 53 years).

Laboratory assays
Plasma samples obtained from cases and matched controls were aliquoted into 200 μl ethylenediaminetetraacetic acid (EDTA) plasma vials. Case-control sets were assayed in the same batch and adjacent to each other in sequence. Samples were identified by specimen ID only, and laboratory technicians were masked to the case-control status of samples. The samples were assayed on the Discovery HD4 platform, a mass spectrometry-based metabolomics profiling platform, at Metabolon (Durham, NC, USA). This method combines automated sample extraction and processing with ultrahigh performance liquid chromatography/electrospray ionization tandem mass spectrometry (UHPLC/MS) and an additional gas chromatography mass spectrometry (GC/MS) platform. Peaks were quantified using the area under the curve to generate metabolite levels. The metabolite data were normalized to a median of 1.00 to correct for variation resulting from instrument tuning differences. Metabolites not detected in individual samples were imputed with the minimum value for that metabolite. The data were then log-transformed to reduce non-normality. Duplicates from 10 controls were included to assess assay reproducibility. Data from the duplicates were used to assess intra-class correlation (ICC) and were averaged for the case-control analysis.
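The preprocessing steps described above (median normalization, minimum-value imputation, log transformation) can be illustrated with a minimal sketch. It is not the vendor's pipeline: the function name `preprocess_metabolite` is hypothetical, the scaling is applied per metabolite across all samples as if they formed a single batch, and the natural logarithm is assumed because the text does not state the base.

```python
import math
import statistics
from typing import List, Optional


def preprocess_metabolite(raw: List[Optional[float]]) -> List[float]:
    """Median-scale, minimum-impute, and log-transform one metabolite.

    `raw` holds one positive peak area per sample; None marks a non-detect.
    (Hypothetical helper; single-batch scaling is an assumption.)
    """
    detected = [v for v in raw if v is not None]
    if not detected:
        raise ValueError("metabolite was not detected in any sample")
    median = statistics.median(detected)
    # Scale so that the median of the detected values equals 1.00.
    scaled = [None if v is None else v / median for v in raw]
    # Replace non-detects with the minimum observed (scaled) value.
    floor = min(v for v in scaled if v is not None)
    imputed = [floor if v is None else v for v in scaled]
    # Natural log is an assumption; the text does not state the base.
    return [math.log(v) for v in imputed]


# Four samples, one of which has no detectable peak.
example = preprocess_metabolite([2.0, 4.0, None, 8.0])
```

In this toy input the non-detect receives the same value as the lowest detected sample, which is how minimum-value imputation behaves by construction.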

Statistical analysis
To assess the reliability of our assay results, we calculated coefficients of variation (CVs) and ICCs across duplicate samples. The coefficient of variation is a measure of dispersion that describes the amount of variability relative to the mean. For samples measured using the same method, low (~10%) variability within subjects and high variability across subjects is desirable. Intra-class correlations describe the degree to which duplicate samples agree: a value between 0.75 and 1 indicates excellent agreement.
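As an illustration of the two reliability measures, the sketch below computes a CV and a one-way ANOVA ICC point estimate for a balanced set of replicates. The function names are hypothetical, and the sketch does not reproduce the confidence intervals, which the study obtained in R; it only shows how the within-subject and between-subject variance components enter the ICC.

```python
import statistics
from typing import Sequence, Tuple


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def icc_oneway(replicates: Sequence[Tuple[float, ...]]) -> float:
    """ICC point estimate from one-way ANOVA variance components.

    Each element holds the k replicate measurements of one subject
    (k = 2 for duplicates); every subject must have the same k.
    """
    n = len(replicates)
    k = len(replicates[0]) if n else 0
    if n < 2 or k < 2 or any(len(r) != k for r in replicates):
        raise ValueError("need >= 2 subjects with the same k >= 2 replicates")
    subject_means = [statistics.mean(r) for r in replicates]
    grand_mean = statistics.mean(subject_means)  # valid for a balanced design
    # Between-subject mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (variation between replicates).
    ss_within = sum(
        (x - m) ** 2 for r, m in zip(replicates, subject_means) for x in r
    )
    msw = ss_within / (n * (k - 1))
    var_between = (msb - msw) / k
    return var_between / (var_between + msw)


# Three subjects measured in duplicate; replicates differ only slightly.
icc = icc_oneway([(1.0, 1.2), (2.0, 2.2), (3.0, 3.2)])
```

When replicate differences are small relative to the spread across subjects, as in this toy input, the estimate approaches 1.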
We used the variance components from a one-way analysis of variance (ANOVA) model to estimate the ICC for replicate samples, and estimated confidence intervals for the ICCs using the Smith method [18] in R (ICC package) [19]. We calculated CVs across named metabolites, and used principal component analysis [20] to identify the important components (groups of metabolites) in each sample (including the duplicates). We used paired t-tests to examine differences in the normalized metabolite levels between cases and controls. We used non-linear modeling to examine whether normalized metabolite levels were associated with age at blood draw. We evaluated quadratic and cubic models and used the Akaike Information Criterion (AIC) to select the best-fitting model. We also used ANOVA with robust variance to examine differences in metabolite levels by the number of affected first-degree relatives (1, 2, or ≥3), and among cases with available information, by estrogen receptor (ER) status (positive/negative) and progesterone receptor (PR) status (positive/negative). Due to the limited sample size of this pilot study, further analysis of subgroups by age was not statistically meaningful.
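The matched case-control comparison can be sketched as a paired t statistic on the within-pair differences. The helper names below are hypothetical, and the percent-difference definition (relative difference in mean level) is an assumption, since the text does not state how the 20% threshold was computed. A p-value would additionally require the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(cases: Sequence[float], controls: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom.

    cases[i] and controls[i] must belong to the same matched pair.
    Raises ZeroDivisionError if all within-pair differences are identical.
    """
    if len(cases) != len(controls) or len(cases) < 2:
        raise ValueError("need at least two matched pairs of equal length")
    diffs = [a - b for a, b in zip(cases, controls)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


def percent_difference(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Relative difference in mean level, cases versus controls, in percent.

    This definition is an assumption made for illustration.
    """
    mean_controls = statistics.mean(controls)
    return 100.0 * (statistics.mean(cases) - mean_controls) / mean_controls


# Toy data: four matched pairs for a single metabolite.
toy_cases = [1.0, 2.0, 3.0, 4.0]
toy_controls = [0.5, 1.0, 2.0, 3.5]
t_stat, dof = paired_t(toy_cases, toy_controls)
pct = percent_difference(toy_cases, toy_controls)
```

Pairing removes the between-pair variation shared by a case and her matched control, which is why the test operates on differences rather than on the two groups separately.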

Results
Of the 45 cases selected, 31 had two or more affected first-degree female relatives, while 14 cases had one affected first-degree female relative. Six cases were BRCA1 mutation carriers, while four were BRCA2 mutation carriers. Twenty-one cases were premenopausal at blood collection, and the remainder were postmenopausal. The average age at breast cancer diagnosis was 58.8 years.
The average age at blood draw was 52.4 years for cases, compared to 53.1 years for controls. Approximately 51% of cases were ER positive, and 22% were ER negative. About 20% of cases had localized tumors, and 5% had regional involvement limited to the nodes (Table 1).
We detected a total of 661 known named metabolites in our samples. Of these, 338 (51%) were detected in all the samples, and 490 (74%) were detected in greater than 80% of the 90 study samples. These metabolites include amino acids and lipids, and some related to microbiome influences and xenobiotics metabolism (Additional file 1: Table S1). The average CV across all named metabolites was 0.16 (25th-75th percentile: 0.06-0.20) (Table 2). The median ICC between duplicates was 0.96 (25th-75th percentile: 0.82-0.99). The average variance was 60.7% among individuals, and 6.0% for duplicate samples within individuals.
Principal component analysis identified the top 3 components of all samples. The scores of the components identified were very similar for duplicate samples (Fig. 1).
We observed a statistically significant (p < 0.05) case-control difference of greater than 20% in 24 metabolites. Metabolites including 3-(cystein-S-yl)acetaminophen (xenobiotics pathway), 4-acetylphenol sulfate (xenobiotics pathway), and cysteine s-sulfate (amino acid pathway) were significantly higher in cases, whereas indoleacetylglutamine (amino acid pathway), 2-ethylphenylsulfate (xenobiotics pathway), and sphingosine (lipid pathway) were significantly higher in controls (Fig. 2). Among the metabolites that showed a greater than 20% case-control difference, we also examined differences among hypothesized predictors of breast cancer risk, including differences by age at blood draw, the number of affected first-degree relatives (Table 3), ER status, and PR status (Table 4). Statistically significant (p < 0.05) but modest associations were observed between some metabolites and age at blood draw; for example, age explained 13% of the variation in 1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1) (adjusted r² = 0.13; p < 0.01). We examined whether metabolites differed by the number of affected first-degree relatives. Overall, there was no clear monotonically increasing or decreasing pattern by the number of affected relatives: mean levels were similar for women with one or three affected relatives, and lower or higher for those with two affected relatives. For example, for 2-ethylphenylsulfate, the mean levels were 0.68, 0.97, and 0.64 for women with one, two, or three affected relatives, respectively (p < 0.001).
For some metabolites, we found differences by ER status. For example, cases with ER+ breast cancer had a higher mean laurylcarnitine level than those with ER- breast cancer (1.16 vs. 0.83, p = 0.04). Conversely, the mean indoleacetylglutamine level was lower for ER+ breast cancer cases than for ER- cases (0.46 vs. 0.54, p = 0.04).
Finally, we examined whether metabolite levels differed by PR status. The mean asparagine level for PR+ cases was 1.12, compared to 1.18 for PR- cases (p = 0.02). For adrenate (22:4n6), the mean level for PR+ cases was 1.07 compared to 0.69 for PR- cases, and for N-(2-furoyl)glycine, the mean level for PR+ cases was 2.87 compared to 7.54 for PR- cases. However, none of the associations remained significant after adjusting for multiple comparisons.
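To illustrate why nominal p < 0.05 findings can disappear once several hundred metabolites are tested, the sketch below applies a Benjamini-Hochberg false discovery rate adjustment. This is an assumption made for illustration only: the text does not state which multiple-comparison method was used, and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy input: four nominal p-values from four hypothetical metabolites.
adjusted_p = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each raw p-value is scaled by the number of tests divided by its rank, so the smallest p-value in a panel of several hundred metabolites must be very small to survive adjustment.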

Discussion
These data, despite small numbers, suggest that a large number of metabolites have detectable levels, with good reproducibility, as suggested by high ICCs and reasonable CVs. We also showed that for most metabolites, the within-person variance is small, while the between-person variance is much larger. We found that some metabolites have a greater than 20% case-control difference. Finally, we showed that some metabolites (including N-(2-furoyl)glycine in the xenobiotics pathway) differed by key breast cancer risk factors such as the number of affected family members, although these associations were not significant after adjusting for multiple comparisons, likely due to the small sample size. Taken together, these results suggest that a large-scale study (~1000) would be well-powered to detect biologically meaningful and statistically significant differences between cases and controls to provide a better understanding of breast cancer etiology across a wide spectrum of risks, and among high-risk women in particular.
Table footnotes: (a) The age models for all metabolites were fitted with a linear and a quadratic term, with the exception of maltotriose, which also included a cubic term for age, as the model fit was improved for this metabolite over the linear and quadratic model. (b) Not adjusted for multiple comparisons. (c) Indicates compounds that have not been officially confirmed based on a standard, although identity is certain. Significant data are in bold.
Metabolomics profiles are becoming increasingly utilized in epidemiological studies to predict the risk of chronic diseases, including breast cancer; however, data from prospective studies are limited. In the first prospective study of metabolomics and breast cancer risk, Kuhn et al. [4] found that phosphatidylcholines were associated with breast cancer risk. That study included 362 sporadic breast cancer cases, and measured only 120 metabolites. To date, there are no metabolomics data on women at increased risk of breast cancer due to their family history of breast cancer. Clearly, additional data from prospective studies are needed to further examine the role of metabolomics in breast carcinogenesis.
The assay performance on our samples, measured by ICCs and CVs, is consistent with earlier studies that have examined the utility of metabolomics in epidemiological research among participants of the Shanghai Physical Activity Study [21]. In that study, the variability in a large subset of metabolites was assessed and the intraclass correlation was high (median 0.8). Similar assay performance was also observed in a nested case-control study of metabolomics and colorectal cancer risk [22] that included 254 cases and 254 matched controls from the Prostate, Lung, Colorectal and Ovarian Cancer study. In that study, which used a metabolomics platform similar to the one used in our pilot study, the median intraclass correlation was 0.86 (25th-75th percentile: 0.64-0.92).
Consistent with our observation that age at blood collection was associated with metabolite levels, Saito et al. [23] also reported that certain metabolites were associated with age at blood draw in a Japanese population. Because the populations in the two studies are quite different, a direct comparison is not possible. Similarly, Tang et al. [24] reported that metabolites in tumor tissues were associated with ER status, and also with BRCA1-associated tumors. However, studies utilizing human plasma are limited.
One notable limitation of our study is that we did not match on the duration of storage time between cases and controls. Post-hoc analysis revealed that among case-control pairs, 35 pairs (78%) had a difference of less than 3 years of storage duration, while 1 pair had a difference of more than 10 years. Further analyses showed that while there were no appreciable differences in the analyses by age, there were differences in 5 metabolites when evaluating levels by the number of affected relatives. Future studies should match on calendar year of blood draw (hence storage duration) within case-control matched sets.
Our study is among the first to examine the association between metabolomics and breast cancer risk using pre-diagnostic plasma samples. Despite the limited sample size, we found a greater than 20% case-control difference in several metabolites, although we cannot rule out the possibility that asymptomatic preclinical breast cancer affected metabolite levels in cases. Such bias is possible but should be minimal, as we excluded cases diagnosed with breast cancer within 12 months after the blood draw in order to limit the potential for preclinical disease to influence metabolite levels. Finally, our study is also among the first to examine reliability across more than 600 metabolites.

Conclusions
In conclusion, findings from this study suggest that metabolomics can be used reliably in large-scale epidemiologic studies of breast cancer to detect meaningful differences in risk.

Additional file
Additional file 1: Table S1. Number of metabolites measured in plasma of BCFR participants (DOCX 17 kb)