Race-associated biological differences among luminal A and basal-like breast cancers in the Carolina Breast Cancer Study

We examined racial differences in the expression of eight genes and their associations with risk of recurrence among 478 white and 495 black women who participated in the Carolina Breast Cancer Study Phase 3. Breast tumor samples were analyzed for PAM50 subtype and for eight genes previously found to be differentially expressed by race and associated with breast cancer survival: ACOX2, MUC1, FAM177A1, GSTT2, PSPH, PSPHL, SQLE, and TYMS. The expression of these genes according to race was assessed using linear regression and each gene was evaluated in association with recurrence using Cox regression. Compared to white women, black women had lower expression of MUC1, a suspected good prognosis gene, and higher expression of GSTT2, PSPHL, SQLE, and TYMS, suspected poor prognosis genes, after adjustment for age and PAM50 subtype. High expression (greater than median versus less than or equal to median) of FAM177A1 and PSPH was associated with a 63% increase (hazard ratio (HR) = 1.63, 95% confidence interval (CI) = 1.09–2.46) and 76% increase (HR = 1.76, 95% CI = 1.15–2.68), respectively, in risk of recurrence after adjustment for age, race, PAM50 subtype, and ROR-PT score. Log2-transformed SQLE expression was associated with a 20% increase (HR = 1.20, 95% CI = 1.03–1.41) in recurrence risk after adjustment. A continuous multi-gene score comprised of eight genes was also associated with increased risk of recurrence among all women (HR = 1.11, 95% CI = 1.04–1.19) and among white (HR = 1.14, 95% CI = 1.03–1.27) and black (HR = 1.11, 95% CI = 1.02–1.20) women. Racial differences in gene expression may contribute to the survival disparity observed between black and white women diagnosed with breast cancer.


Background
Historically, white women have had higher incidence rates of breast cancer compared to black women; however, in recent years incidence rates among white and black women have converged [1]. Mortality rates, on the other hand, remain higher among black women, and rates have continued to diverge despite notable improvements in survival in both races since 1990 [2]. Environmental and other factors including socio-economic status, access to and quality of care, and delays in treatment have been cited as potential explanations of the survival disparity, as have biological factors [3]. Previous research indicates that even among estrogen receptor (ER)-positive and HER2-negative breast cancers, which have more favorable outcomes [4], black women have higher mortality rates compared to white women [5]. Recent work highlighted racial differences in risk of recurrence (ROR) scores among ER+/HER2- breast cancers [6,7], but biological differences in tumors between black and white women are only just beginning to be understood.
Several studies have used whole genome expression data to screen for racial differences in tumors [8-11], including our own recent findings [12]. In that study, we examined biological differences by race among luminal A and basal-like breast cancers using publicly available data, and identified several genes including ACOX2, CRYBB2, MUC1, PSPH, SQLE, and TYMS that were differentially expressed by race and that were associated with differences in survival [12]. A limitation of our prior study was the small study population, with data from only 108 Caucasian and 57 African-American women. Herein, we expand this analysis to validate our previous findings in approximately 1000 cases, half of whom are black women, within a larger population-based context. Specifically, we sought to estimate differences in the expression of two suspected good prognosis genes (ACOX2 and MUC1) and six suspected poor prognosis genes (FAM177A1, GSTT2, PSPH, PSPHL, SQLE, and TYMS) by race, and to examine their associations with risk of breast cancer recurrence.

Study population
This study uses data from the Carolina Breast Cancer Study Phase 3 (CBCS3), a population-based study of 3000 women conducted in 24 counties in eastern and central North Carolina from 2008-2013. Recruitment and data collection procedures for CBCS3 and prior study phases appear elsewhere [13]. In brief, women aged 20-74 years residing in the 24 counties and diagnosed with first primary invasive breast cancer were identified using rapid case ascertainment in collaboration with the NC Central Cancer Registry. After determination of study eligibility, sampling was performed to ensure adequate representation of various subgroups (i.e., young and African-American women). After informed consent was obtained, all participants completed an interviewer-administered questionnaire, provided blood samples, and provided written consent for retrieval of medical records and paraffin-embedded tumor blocks.

Tumor gene expression profiling and molecular subtyping
Procedures for tumor gene expression profiling of the 1013 of 3000 women enrolled in the CBCS3 have been previously published [6]. In brief, RNA was isolated from cores using the Qiagen RNeasy FFPE kit and protocol, with 95% of tumors producing quantifiable RNA. The majority (98.2%) of samples were obtained before neoadjuvant chemotherapy treatment. Samples were randomized to batches for RNA extraction and analyses. In total, 1122 samples from 1042 cases from CBCS3 were analyzed for the PAM50 assay and for the expression of an additional 150 genes using the NanoString nCounter gene expression system [14]. The PAM50 predictor [15] was used to categorize breast tumors into intrinsic subtype as luminal A, luminal B, HER2-enriched, basal-like, and normal-like, and to calculate the ROR score with proliferation (ROR-P) and tumor size (ROR-PT) included. Probes for nine genes identified by D'Arcy et al. were included: ACOX2, CRYBB2, MUC1, FAM177A1, GSTT2, PSPH, PSPHL, SQLE, and TYMS [12].
Quality control was conducted using the NanoStringNorm package in R. Samples with poor quality were identified using the following criteria: (1) the ratio of the geometric mean expression levels of six positive controls of a sample to the average geometric means of the six positive controls across all samples fell outside the range of 0.3-3; (2) the expression level of 90% of endogenous genes was lower than the mean (+3 SD) of negative controls; and (3) the geometric mean of the reference genes of a sample was greater than 3 SDs from the average geometric means of the reference genes across all batches. Of the 1122 samples, 39 did not pass quality control. We further excluded 70 duplicate samples with lower quality gene expression data, resulting in an analytic gene expression sample of 1013. Of the nine genes of initial interest in the current study, the expression of one (CRYBB2) was below the geometric mean of negative controls in > 60% of samples and was not considered further. The raw RNA counts were normalized using the geometric mean of the six positive control genes and then log2-transformed for analyses. Among the 1013 women with available gene expression data, we excluded the women who did not self-identify as black or white (seven American-Indian, 13 Asian, and 20 women of 'other' races), resulting in an analytic sample of 478 white and 495 black women (see Additional file 1: Table S1 for participant characteristics).
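The first quality-control criterion and the positive-control normalization described above can be illustrated with a minimal Python sketch. This is not the NanoStringNorm implementation used in the study. The function names, the toy counts, and the choice to divide raw counts by the sample's positive-control ratio are assumptions made for illustration, and all counts are assumed to be strictly positive so that logarithms are defined.

```python
import math
from statistics import mean

def geometric_mean(values):
    """Geometric mean of strictly positive counts."""
    return math.exp(mean(math.log(v) for v in values))

def positive_control_ratios(pos_counts):
    """Ratio of each sample's positive-control geometric mean
    to the average of those geometric means across all samples."""
    gms = {sample: geometric_mean(counts) for sample, counts in pos_counts.items()}
    overall = mean(gms.values())
    return {sample: gm / overall for sample, gm in gms.items()}

def fails_criterion_1(ratio, low=0.3, high=3.0):
    """Criterion (1): ratio outside the range 0.3-3."""
    return ratio < low or ratio > high

def normalize_log2(raw_counts, ratio):
    """Assumed normalization: scale raw counts by the inverse of the
    positive-control ratio, then log2-transform."""
    return {gene: math.log2(count / ratio) for gene, count in raw_counts.items()}

# Hypothetical positive-control counts (six controls per sample).
toy_positive_controls = {
    "s1": [100.0] * 6,
    "s2": [200.0] * 6,
    "s3": [10.0] * 6,   # very low signal; expected to fail criterion (1)
}
ratios = positive_control_ratios(toy_positive_controls)
flagged = {s for s, r in ratios.items() if fails_criterion_1(r)}
```

Criteria (2) and (3) would follow the same pattern, comparing endogenous and reference-gene summaries against the negative-control and across-batch distributions.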

Breast cancer recurrence
The time from breast cancer diagnosis to the first breast cancer recurrence was obtained from the medical records. Among the 973 women with available gene expression data, we identified 114 women with at least one recurrence during a median follow-up of 5.07 years (range = 0.39-8.22 years). Approximately 9% of white women and 15% of black women had at least one recurrence during the follow-up period.

Statistical analysis
We first examined associations between gene expression and a range of participant demographics, reproductive factors, and clinical characteristics using linear regression and independent sample t tests. Based on likelihood ratio tests from age-adjusted linear regression models, with the exception of ER status, there were no significant gene expression-by-covariate interactions. Therefore, results from the independent sample t tests based on all women are reported in Additional file 1: Table S1, and age-adjusted RNA counts by race and ER status are reported separately in Additional file 1: Table S2. We then examined race-associated gene expression of the eight genes overall, and by luminal A and basal-like breast cancer subtype using linear regression. In separate models, we regressed the normalized log2-transformed expression of each of the eight genes on: race (black vs white), study design variables (age at diagnosis in years, which was used for sampling; and codeset, which varied between NanoString batches), and PAM50 subtype (luminal A, luminal B, HER2-enriched, basal-like, and normal-like), as appropriate. The covariate-adjusted β coefficients, representing the log2 relative difference in gene expression among black women relative to white women, and the corresponding 95% confidence limits from the linear regression models were back-transformed (i.e., $10^{\log_{10}(2)\,\beta} = 2^{\beta}$) to obtain the relative difference in gene expression.
We dichotomized gene expression at the median (i.e., ≤ median = low, and > median = high expression) for each gene and, among the 938 women with breast cancer stages I-III, examined unadjusted associations with risk of recurrence using the Kaplan-Meier survival function. Overall, and by race, and among women with ER+/HER2- breast cancer, we used Cox regression to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for the associations of dichotomized as well as continuous log2-transformed gene expression with recurrence, adjusted for age, race, codeset, PAM50 subtype, and ROR-PT score (low, medium, and high), as appropriate. Although breast cancer subtype could potentially mediate the associations between gene expression and breast cancer recurrence, we were interested in understanding these adjusted associations rather than assuming a causal model. We evaluated the joint effects of all eight genes on risk of recurrence by creating a multi-gene race-associated expression (MRE) score. To compute the score, we applied the method of D'Arcy et al. [12] wherein we assigned individual scores of -1 or +1 to each of the eight genes. For six of the eight genes (FAM177A1, GSTT2, PSPH, PSPHL, SQLE, and TYMS), expression at or below the median was assigned a risk score of -1 indicating lower risk of recurrence, and expression above the median was assigned a score of +1 indicating higher risk of recurrence. Given the inverse associations between survival and expression of ACOX2 and MUC1, for these two genes expression at or below the median was assigned a score of +1 and expression above the median was assigned a score of -1. We summed the individual gene risk scores resulting in an MRE score ranging from -8 to +8, with higher scores indicating higher risk of recurrence, and also categorized the MRE score as -8 to -2 (low), -1 to 3 (medium), and 4 to 8 (high recurrence risk). We conducted all analyses using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA).
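As an illustration of the back-transformation, the sketch below converts a log2-scale coefficient and its confidence limits to a relative expression. The function name is hypothetical. The input values are simply the log2 of the relative expression and confidence limits reported for PSPHL in the Results (2.38, 2.11-2.67), not coefficients taken from the fitted models.

```python
import math

def back_transform(beta, lower, upper):
    """Convert a log2-scale coefficient and its confidence limits
    to a relative (fold) difference in expression: 2 ** beta."""
    return tuple(2.0 ** x for x in (beta, lower, upper))

# Illustrative log2-scale inputs derived from the reported PSPHL estimate.
beta, lo, hi = math.log2(2.38), math.log2(2.11), math.log2(2.67)
relative, rel_lo, rel_hi = back_transform(beta, lo, hi)

# The expression written in the text, 10 ** (log10(2) * beta),
# is algebraically the same quantity as 2 ** beta.
same_quantity = 10 ** (math.log10(2) * beta)
```

A coefficient of about 1.25 on the log2 scale therefore corresponds to roughly a 2.4-fold higher expression.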
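The MRE scoring rule can be sketched in a few lines of Python. The analyses in this study were run in SAS, so this is only an illustration of the scoring logic. The function names and toy expression values are hypothetical, and values equal to the median are treated as low expression, following the dichotomization described above.

```python
from statistics import median

POOR_PROGNOSIS = ("FAM177A1", "GSTT2", "PSPH", "PSPHL", "SQLE", "TYMS")
GOOD_PROGNOSIS = ("ACOX2", "MUC1")

def mre_scores(expression):
    """expression maps gene -> list of log2 expression values,
    one per woman, in the same order for every gene."""
    medians = {gene: median(values) for gene, values in expression.items()}
    n_women = len(next(iter(expression.values())))
    scores = []
    for i in range(n_women):
        total = 0
        for gene in POOR_PROGNOSIS:
            # High expression of a poor prognosis gene adds risk.
            total += 1 if expression[gene][i] > medians[gene] else -1
        for gene in GOOD_PROGNOSIS:
            # High expression of a good prognosis gene lowers risk.
            total += -1 if expression[gene][i] > medians[gene] else 1
        scores.append(total)
    return scores

def categorize(score):
    """Categories from the text: -8 to -2 low, -1 to 3 medium, 4 to 8 high."""
    if score <= -2:
        return "low"
    if score <= 3:
        return "medium"
    return "high"

# Hypothetical data for four women.
toy = {gene: [1.0, 2.0, 3.0, 4.0] for gene in POOR_PROGNOSIS}
toy.update({gene: [4.0, 3.0, 2.0, 1.0] for gene in GOOD_PROGNOSIS})
toy_scores = mre_scores(toy)
toy_categories = [categorize(s) for s in toy_scores]
```

Because each of the eight genes contributes either -1 or +1, the summed score always takes an even value between -8 and +8.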

Results
In this subsample of women from CBCS3, there were approximately equal proportions of black (51%) and white (49%) breast cancer patients (Additional file 1: Table S1). Women were approximately 52 years of age on average, and the majority were postmenopausal (57%), and diagnosed with stage I/II (84%) and grade I/II (52%) tumors. By PAM50 classification, luminal A tumors were the most common (38%), followed by basal-like (25%), luminal B (20%), HER2-enriched (12%), and normal-like (5%). As previously reported in CBCS3 [6] and elsewhere [5,7], black women of all ages had a higher frequency of basal-like (33.9% versus 16.7%) and HER2-enriched (13.3% versus 10.0%) cancers, and lower frequency of luminal A breast cancers (29.5% versus 47.3%). Few participant demographic and reproductive factor characteristics were consistently associated with gene expression. On the other hand, the expression of most genes was associated with clinicopathological factors including tumor grade, tumor size, ER and progesterone receptor (PR) status, and PAM50 subtype (Additional file 1: Tables S1 and S2).

Racial differences in gene expression
Overall, black women had lower expression of MUC1, a good prognosis gene, and higher expression of GSTT2, PSPHL, SQLE, and TYMS, poor prognosis genes, after adjustment for age, codeset, and PAM50 subtype (Table 1). The largest difference in expression was for PSPHL, for which expression levels in black women were more than double those in white women (relative expression = 2.38, 95% CI = 2.11-2.67). Racial patterns in expression of these five genes were similar in direction and magnitude when restricted to women with luminal A breast tumors; however, among women with basal-like tumors, only GSTT2 and PSPHL were differentially expressed by race.
We next stratified these survival relationships by race. Patterns of recurrence were similar when adjusting for study design factors only. However, after further adjustment for PAM50 subtype and ROR-PT score, most associations among white women were weaker than those among black women, with the exception of PSPH which was stronger in white (HR = 2.04, 95% CI = 1.00-4.15) than black (HR = 1.69, 95% CI = 1.00-2.85) women. Among black women, high (vs low) expression of FAM177A1 was associated with a 73% increase (HR = 1.73, 95% CI = 1.04-2.87) in risk of recurrence in the fully adjusted model.

Discussion
Previously reported race and survival-associated genes including MUC1, GSTT2, PSPHL, SQLE, and TYMS were associated with race in this population-based study of women diagnosed with breast cancer. Except for FAM177A1 and GSTT2, the genes we examined in this study were associated with risk of recurrence in unadjusted models. Of the genes differentially expressed by race, SQLE expression as a continuous measure was associated with increased risk of breast cancer recurrence, even after adjustment for breast cancer subtype and ROR score. Additionally, a multi-gene score comprised of all eight genes examined in this study was strongly associated with recurrence risk among all women and among black women diagnosed with ER+/HER2- breast cancer.
Our findings are consistent with prior studies reporting lower expression of MUC1 and higher expression of GSTT2, PSPHL, SQLE, and TYMS among black women compared to white women [9-12]. MUC1 expression was positively associated with lower grade, smaller tumor size, and positive ER/PR status in our study and in a previous study [16]; however, expression was not associated with recurrence among black women after adjustment for PAM50 subtype, although there was a suggestive inverse association with recurrence among white women. MUC1, which is part of a large family of mucin glycoproteins, is involved with cell signaling and cell-cell and cell-matrix adhesion [17], and may impact breast cancer recurrence via these pathways or by directly binding to and activating ERα [18]. In contrast to previous studies, in our study PSPH and ACOX2 were not differentially expressed by race, although high expression of PSPH was associated with increased risk of recurrence. Interestingly, recent evidence suggests that racial differences in the expression of PSPHL may be a consequence of a 30-kb deletion from chromosome 7p11, including the promoter and first three of four exons of PSPHL, effectively eliminating PSPHL expression, more frequently found among individuals of African ancestry [19]. Although we did not examine PSPHL polymorphisms, our findings may reflect underlying genetic differences. Whereas the study by Rummel and colleagues [19] found no association between PSPHL loss or retention and pathological characteristics, in our study, PSPHL expression was associated with grade, tumor size, ER/PR status, and breast cancer PAM50 subtype.
SQLE expression was higher in tumors of black women compared to white women, and was associated with more aggressive tumors including tumors of high histologic grade, nodal involvement, larger size, ER-/HER2+ status, and with increased risk of breast cancer recurrence, consistent with prior studies [20]. Applying the criteria proposed by D'Arcy et al. [12] for a disparity-associated gene that: (1) the gene should be differentially expressed by race in the tumor, and (2) the differential expression of a candidate gene should be associated with a difference in breast cancer survival, we identified SQLE as a disparity-associated gene. SQLE is located on chromosome 8q24.13, and encodes squalene epoxidase, an enzyme that catalyzes the first oxygenation step in cholesterol synthesis [21]. Given that squalene epoxidase is thought to be one of the rate-limiting enzymes in the cholesterol synthesis pathway, overexpression of SQLE may also result in increased cholesterol bioavailability, which may promote ER-dependent growth and Liver X receptor-dependent metastasis [22]. Furthermore, as prior researchers have hypothesized [20], SQLE expression together with overexpression of other nearby genes including RAD21, which encodes a protein involved in DNA repair, could work to promote a more aggressive cancer phenotype. If the association with SQLE is confirmed by other studies, these findings provide further evidence for the potential use of statins in adjuvant breast cancer therapy [23,24] as well as the potential for SQLE inhibition as a novel cancer treatment option [20]. The functions of FAM177A1 (family with sequence similarity 177 member A1) [25] and PSPHL (phosphoserine phosphatase-like) [26] are not well characterized and thus their associations with recurrence are not entirely clear. PSPHL is hypothesized to influence rates of cellular proliferation [27], and therefore could potentially directly impact cancer progression.
This study had several strengths, including the large population-based design and the oversampling of young and black women; however, this study also had several limitations. First, in our analyses of breast cancer recurrence, the proportion of women with at least one recurrence was relatively small (10%); however, ours is the largest study conducted to date on the topic and provides results consistent with previous studies. Second, we cannot establish the mechanism for higher expression levels (i.e., we cannot distinguish between expression changes that are due to differentiation state or cell lineage versus those that are due to tumor-specific mutations). We also note that some of the genes had prognostic value only within one subtype. For example, genes that tend to be strongly associated with proliferation, such as MUC1 and TYMS, tended to have more prognostic value among luminal breast cancers where proliferation status is variable; very few basal-like breast cancers have low proliferation and therefore proliferation genes often do not provide prognostic value. Third, given prior reports of higher expression of CRYBB2 among black women compared with white women [7-9, 11, 12], we were a priori interested in including CRYBB2 in our analyses; unfortunately, we were unable to examine expression of this gene due to a large amount of missing data. Future studies should continue to examine CRYBB2 expression for its potential relevance as a disparity-associated gene. Finally, in this study we did not compare gene expression in tumor tissue to normal or adjacent-normal tissue; however, in our previous work [12] we observed that patterns of expression, comparing normal to tumor, were similar between black and white women. This suggests that differences in cellular composition between black and white women are not responsible for the racial differences in MUC1 expression.

Conclusions
In summary, we validated previously observed racial differences in the expression of several genes using a large population-based study. Of the genes that were differentially expressed by race, high expression of one gene, SQLE, was also associated with an increased risk of breast cancer recurrence and thus may be a potential disparity-associated gene. Among women with the more favorable ER+/HER2- breast cancer subtype, the multigene race-associated score comprised of all eight genes was associated with a 15% increase in risk of recurrence among black but not white women. We conclude that racial differences in gene expression may contribute to the survival disparity observed between black and white women diagnosed with breast cancer.

Additional file
Additional file 1: Table S1. Gene expression by participant and clinical characteristics from CBCS Phase 3, 2008-2013.