Genome-Wide Association Study of Breast Density among Women of African Ancestry

Simple Summary
In the US, Black women are disproportionately affected by higher breast cancer mortality rates and later-stage tumor diagnoses compared with White women. Breast density, the ratio of dense fibroglandular breast tissue to overall breast tissue area, has previously been identified as an important breast cancer risk factor. Most genome-wide association studies of breast density to date have been performed in participants of European ancestry and have yielded important insights into the genetic etiology of breast density. However, little is known about the influence of common genetic variants on breast density in African ancestry populations. Our study aimed to determine genetic factors associated with breast density in African ancestry populations using a Genome-Wide Association Study (GWAS) and post-GWAS analyses on a cohort with genomic data available through the Penn Medicine BioBank. Results of this study elucidate potential genetic mechanisms associated with breast density, and thus cancer risk, among women of African ancestry.

Abstract
Breast density, the amount of fibroglandular versus fatty tissue in the breast, is a strong breast cancer risk factor. Understanding genetic factors associated with breast density may help in clarifying mechanisms by which breast density increases cancer risk. To date, 50 genetic loci have been associated with breast density; however, these studies were performed among predominantly European ancestry populations. We utilized a cohort of women aged 40–85 years who underwent screening mammography and had genetic information available from the Penn Medicine BioBank to conduct a Genome-Wide Association Study (GWAS) of breast density among 1323 women of African ancestry. For each mammogram, the publicly available "LIBRA" software was used to quantify dense area and area percent density. We identified 34 significant loci associated with dense area and area percent density, with the strongest signals in GACAT3, CTNNA3, HSD17B6, UGDH, TAAR8, ARHGAP10, BOD1L2, and NR3C2. There was significant overlap between previously identified breast cancer SNPs and SNPs identified as associated with breast density. Our results highlight the importance of breast density GWAS among diverse populations, including African ancestry populations. They may provide novel insights into genetic factors associated with breast density and help in elucidating mechanisms by which density increases breast cancer risk.


Introduction
Black women in the US have 40% higher breast cancer mortality than White women [1] and are more likely to be diagnosed with later stage tumors and with triple negative breast cancers, which have limited treatment options and poorer prognosis than hormone receptor positive tumors [2,3]. Given these disease patterns, early detection is vitally important for

Study Data
We utilized a cohort of women aged 40–85 years who underwent mammography screening (Selenia or Selenia Dimensions; Hologic Inc, Marlborough, MA, USA) at the Hospital of the University of Pennsylvania from 1 September 2010 through 31 December 2014 and did not have a known BRCA1/2 mutation (N = 32,213 screening exams). We excluded screening exams with uncertain outcomes (N = 13), screenings preceded by a breast cancer diagnosis (N = 74), and true positive and false negative screening exams (N = 153). We additionally excluded screenings where a breast implant was present (N = 429), screenings for which all four processed (i.e., 'FOR PRESENTATION') DM views were unavailable (N = 406), and screenings for which breast density evaluation software failed (N = 808). Then, we removed screenings for women who did not consent to participate in the Penn Medicine BioBank (PMBB), a research biorepository where patients provide a blood sample and broadly consent to allow their biospecimens to be used for research purposes (N = 22,627). We excluded screenings for women without a genetically informed genotype (N = 3718), and women who did not self-identify as Black or have an African ancestry genotype (N = 1803). From this pool of screening exams, we selected the earliest exam per person, resulting in 1348 individuals.
Women completed a breast cancer risk factor questionnaire at the time of mammography, from which reproductive factors, age, height, and weight were pulled. Body mass index (BMI) was calculated from self-reported height and weight, supplemented with electronic health record (EHR) data by using the nearest available measurement within 1 year prior to and 6 months after the date of screening. Implausible BMI values (<12 kg/m² or >82 kg/m²) [16] were set to missing. Women were considered as postmenopausal if their menstrual periods had stopped or if they were over the age of 55. Patients still missing data on BMI, menopausal status, or age after this were excluded from the analysis (N = 15), resulting in a final study population of 1333 women.
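The covariate rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the study's code: the function names, the metric units (kilograms and metres), and the use of `None` to represent a missing value are all assumptions made for the example.

```python
from typing import Optional

# Plausibility bounds for BMI (kg/m^2) as stated in the text.
BMI_MIN, BMI_MAX = 12.0, 82.0


def clean_bmi(weight_kg: Optional[float], height_m: Optional[float]) -> Optional[float]:
    """Return BMI, or None (missing) if inputs are absent or BMI is implausible.

    Assumes weight in kilograms and height in metres (an assumption of this sketch).
    """
    if weight_kg is None or height_m is None or height_m <= 0:
        return None
    bmi = weight_kg / height_m ** 2
    return bmi if BMI_MIN <= bmi <= BMI_MAX else None


def is_postmenopausal(periods_stopped: Optional[bool], age: Optional[float]) -> Optional[bool]:
    """Postmenopausal if periods have stopped OR age is over 55.

    Returns None when neither rule fires and one of the inputs is missing;
    how the study handled that case is not stated, so this is an assumption.
    """
    if periods_stopped:
        return True
    if age is not None and age > 55:
        return True
    if periods_stopped is None or age is None:
        return None
    return False
```

Records for which either function returns `None` would then correspond to the participants excluded for missing covariate data.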

Breast Density
Breast Imaging Reporting and Data System (BI-RADS) 4th or 5th edition breast density was obtained from the mammography report. For each mammogram, the publicly available "LIBRA" software (v1.0.4) was used to automatically quantify breast density [10,17]. Briefly, LIBRA partitions the breast region into density clusters of similar gray-level intensity, which are then aggregated into the final dense tissue segmentation. Summing the area of dense pixels provides total absolute dense area (DA), and normalizing DA by the total breast area gives area percent density. We used the dense area and area percent density estimates obtained from all mammography views available for each woman. A per-woman value of each density measure was generated by averaging the corresponding density estimates from all breast views. Since each view is only a 2D cross section of the breast, none of the views independently capture the true volume of density in the breast. By averaging estimates across each view and across breasts for each woman, we produce a more robust estimate of the actual density in the breast. Distributions of breast area, dense area, and area percent density were visually inspected, and observations that fell greater than 3 standard deviations above the mean were excluded (3 for breast area, 19 for dense area, and 27 for area percent density out of 1333).
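The per-woman averaging and the outlier rule described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the dictionary layout and function names are hypothetical, and the sample standard deviation is used because the text does not say which estimator was applied.

```python
from statistics import fmean, stdev
from typing import Dict, List


def per_woman_mean(view_estimates: Dict[str, List[float]]) -> Dict[str, float]:
    """Average one density measure over all available views for each woman."""
    return {wid: fmean(vals) for wid, vals in view_estimates.items() if vals}


def drop_high_outliers(values: Dict[str, float], n_sd: float = 3.0) -> Dict[str, float]:
    """Keep observations at or below mean + n_sd * SD (upper tail only).

    Uses the sample standard deviation; this choice is an assumption.
    """
    data = list(values.values())
    cutoff = fmean(data) + n_sd * stdev(data)
    return {wid: v for wid, v in values.items() if v <= cutoff}
```

In this sketch the rule would be applied separately to breast area, dense area, and area percent density, matching the separate exclusion counts reported in the text.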

Genome-Wide Association Study in Penn Medicine BioBank
We performed a GWAS on our cohort of 1333 women using PLINK 2.0 [18], a tool that employs a generalized linear regression model approach to association testing. We applied the following filters to the imputed PMBB data: we excluded 189 individuals to remove relatedness, kept variants with a Hardy-Weinberg equilibrium test p-value greater than or equal to 10⁻⁶, and kept only variants with minor allele frequency greater than 0.01 [19,20]. The covariates used in the model were BMI, age, two principal components of genetic ancestry [21,22], and menopausal status (binary). Principal component analysis (PCA) was conducted using the fast PCA approximation in the EIGENSOFT package by projecting PCs onto the 1000 Genomes populations [23]. Supplementary Figure S1 shows the PCA plot for PC1 and PC2 and a scree plot of the proportion of variance explained by the first 10 PCs, which was used to identify the significant PCs to include as covariates in this study. For all GWAS results with a p-value of 10⁻⁵ or less, we used Biofilter [24,25] to annotate the variants with their nearest genes. To identify significant and suggestive loci, we applied clumping in PLINK: within each 100 kilobase (kb) window, SNPs in linkage disequilibrium (LD) at r² ≥ 0.2 were grouped, and the lead single nucleotide polymorphism (SNP) was taken as the SNP representing that locus.
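To make the clumping step concrete, the following is a simplified greedy sketch of the general idea. It is not PLINK's implementation: the `Snp` record, the `r2` lookup callback, and the index p-value threshold argument are hypothetical names introduced for illustration, while the 100 kb window and the r² threshold of 0.2 are taken from the text.

```python
from typing import Callable, Dict, List, NamedTuple


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    pos: int    # base-pair position
    p: float    # association p-value


def clump(snps: List[Snp],
          r2: Callable[[str, str], float],
          window_bp: int = 100_000,
          r2_min: float = 0.2,
          p_index: float = 1e-5) -> Dict[str, List[str]]:
    """Greedy clumping: map each lead SNP id to the ids it represents."""
    clumps: Dict[str, List[str]] = {}
    assigned = set()
    # Visit SNPs from most to least significant; each unassigned one leads a clump.
    for lead in sorted(snps, key=lambda s: s.p):
        if lead.snp_id in assigned or lead.p > p_index:
            continue
        assigned.add(lead.snp_id)
        members: List[str] = []
        for other in snps:
            if other.snp_id in assigned or other.chrom != lead.chrom:
                continue
            if abs(other.pos - lead.pos) > window_bp:
                continue
            # Absorb nearby SNPs that are in LD with the lead SNP.
            if r2(lead.snp_id, other.snp_id) >= r2_min:
                members.append(other.snp_id)
                assigned.add(other.snp_id)
        clumps[lead.snp_id] = members
    return clumps
```

A caller would supply `r2` from an LD reference panel or from the study genotypes. The returned keys play the role of the lead SNPs that represent each locus.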

Functional Mapping
We used PolyFun [26] for functionally informed fine-mapping. To define functional variants, we used ENSEMBL sequence annotations and epigenetic annotations from EpiMap breast tissue samples [27]. We made the annotations disjoint to optimize regression stability, prioritizing smaller categories. HapMap SNPs were selected for the regression weights for the L2-regularized h2 step. For the fine-mapping step, we set a p-value threshold of 10⁻⁴ and a maximum number of causal variants of 10. Then, we searched for any variants with a posterior probability greater than or equal to 0.5 for further annotation.

Transcriptome-Wide Association Study (TWAS)
We performed summary-based TWAS for area percent density and breast dense area [28]. The S-PrediXcan best practices workflow was used to impute expression levels based on GTEx [29] models and to test association in breast mammary tissue only.

GWAS-Catalog Lookup
We performed a lookup of all nominally significant SNPs and genome-wide significant genes identified from the GWAS, as described in Section 2.3, in the EMBL-EBI GWAS catalog [30] to identify the known associations of SNPs and genes from our study with other phenotypes. We filtered the catalog results to associations with a reported p-value of no more than 5 × 10⁻⁸.

Correlation Analyses
We estimated SNP-based heritability and genetic correlation between breast density traits, age, BMI, and menopause status using the Haseman-Elston regression approach implemented in GCTA [31,32]. Additionally, we compared effect sizes and minor allele frequencies (MAF) for significant variants from a recently published breast density GWAS in a European ancestry population [14] with estimates from the GWAS in our study of African ancestry participants only.
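As background on the heritability estimate, the cross-product form of Haseman-Elston regression regresses the product of standardized phenotypes for each pair of individuals on their genetic relatedness, and the slope estimates SNP-based heritability. The sketch below illustrates only that univariate idea. It is not GCTA's implementation, it does not cover the bivariate genetic correlation case, and the function name and plain nested-list relatedness matrix are assumptions of the example.

```python
from itertools import combinations
from statistics import fmean, pstdev
from typing import List, Sequence


def he_cross_product(phenotype: Sequence[float], grm: Sequence[Sequence[float]]) -> float:
    """Slope of z_i * z_j regressed on relatedness A_ij over all pairs i < j.

    phenotype: one value per individual.
    grm: square genetic relationship matrix (only off-diagonal entries are used).
    """
    mu, sd = fmean(phenotype), pstdev(phenotype)
    z: List[float] = [(v - mu) / sd for v in phenotype]  # standardize phenotype
    pairs = list(combinations(range(len(z)), 2))
    x = [grm[i][j] for i, j in pairs]      # pairwise relatedness
    y = [z[i] * z[j] for i, j in pairs]    # phenotype cross-products
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx  # ordinary least squares slope (with intercept)
```

With real data the relatedness matrix would come from genome-wide genotypes among unrelated individuals, and the slope is noisy at small sample sizes.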

Results
The characteristics of the study population (N = 1333) are displayed in Table 1 and compared with the total underlying cohort. The average age at screening was 56.7, most patients were postmenopausal (72.2%), and just over 12% had a family history of breast cancer. Based on radiologist-rated BI-RADS density, 18% had heterogeneously dense breasts and 0.5% had extremely dense breasts. The mean area percent density was 22.5% (SD 11.1%).

Functional Mapping
Seventeen variants for breast density and eighteen variants for dense area were identified through fine-mapping at posterior probability >0.98. The fine-mapped results include variants mapped to enhancer regions of genes such as KIFC3 and CNGB1 and to heterochromatin regions of PDE10A and KIFC3, as shown in Table 3.

Table 3. Fine-mapping results with PIP ≥ 0.98. Epigenetic annotations: Repr = repressor, TssFlnkD = downstream flanking region to transcription start site, TssFlnkU = upstream flanking region to transcription start site, Het = heterochromatin.

GWAS Catalog Lookups
In the EMBL-EBI GWAS catalog, we found many traits that had reported associations with the genes (Supplementary Table S3) that were found to be significantly associated with area percent density and dense area. Traits associated with three genes were cardiovascular diseases, lipid measurements, immune system diseases, body mass index, and response to drugs. All traits shown in Figure 4 were associated with at least two genes.

Our set of suggestive SNPs from all three GWAS (p < 1 × 10⁻⁵) was reduced to only eight SNPs when we looked for an overlap with reported associations in the GWAS catalog, as shown in Figure 5. One SNP was associated with breast size (rs10110651), rs61895110 was associated with bone density, and rs77754964 was associated with FEV/FEC ratio (measurement of forced expiratory volume). Other SNP-trait pairings are: rs78049001, cognitive decline; rs2976530, hip bone mineral density; rs75986475, physical activity; rs78730126, sex hormone binding globulin measurement.

Correlation Analyses
Genetic correlations among the breast density traits evaluated in this study are shown in Figure 6. The results suggest positive correlation between all breast density traits. Positive correlation was also observed between BMI and the three breast density traits. However, a negative correlation was observed between BMI and menopause status. Genetic correlation between menopause status and breast density traits was close to 0.

Comparison among EUR and AFR Breast Density GWAS
We compared our results with effect estimates and significance reported in a recently published breast density GWAS for 27,900 European ancestry individuals [14] (Table 4).
Two SNPs identified as genome-wide significant in the EUR ancestry study (rs16885613 and rs10087804) were significant at p-value < 0.001 in our analyses.

Discussion
To our knowledge, ours is the first GWAS of quantitative breast density measurements performed among women of African ancestry. Among 1333 women, we measured dense area and area percent density from digital mammograms using a validated software algorithm and found sixty-five variants in twenty-nine genes associated with dense area and nine variants in five genes associated with area percent density. Our results highlight the potential value of examining SNPs associated with breast density among women of African ancestry, emphasizing the need for diverse ancestry analyses to better understand the genetic underpinnings of breast density and its impact on breast cancer risk in underrepresented populations.
Thirteen of the loci identified in this study had been previously identified as associated with breast density in studies of European ancestry populations [12,14,15]. Fifty-seven loci had previously been identified as associated with breast cancer risk among European ancestry populations [42], and three had previously been associated with breast cancer risk among African ancestry populations [43].
Several of the identified SNPs and genes have potentially plausible mechanistic connections to breast density and breast cancer risk. Two SNPs were identified with genome-wide significance in CTNNA3, catenin alpha 3, which encodes a protein involved in cell-cell adhesion in muscle cells. CTNNA3 was previously identified in breast cancer GWAS [33]. Alpha and beta catenins have been implicated in cancer cell metastasis [44]. Additionally, a prior African ancestry GWAS found CTNNA3 to be associated with metabolic syndrome. This is interesting given the observed differences in breast density by BMI levels [6]. In addition, we identified a variant within a gene encoding another alpha catenin, CTNNA1, which has recently been categorized as a predisposition gene for Hereditary Diffuse Gastric Cancer [45]. Furthermore, loss-of-function mutations in CTNNA1 have been identified among breast cancer patients undergoing multigene panel testing [46]. Together, these findings suggest that further research on the role of alpha catenins in both breast density and breast cancer risk is warranted.
HSD17B6, hydroxysteroid 17-beta dehydrogenase 6, is involved in androgen catabolism and has been implicated in polycystic ovarian syndrome (PCOS) [47], as well as in metabolic perturbations correlated with PCOS, including increased BMI, fasting insulin, and insulin resistance [48]. LRP1B encodes a member of the low density lipoprotein (LDL) receptor family, which has been implicated in both metabolic phenotypes [49] and several cancers [50]. GACAT3, gastric cancer associated transcript 3, is a long non-coding RNA that has been previously implicated in gastric and other cancers, with high expression observed in breast cancer tissue [51] and correlated with prognosis among breast cancer patients [52].
Fine-mapping results include the heterochromatin region of the phosphodiesterase 10A (PDE10A) gene. PDEs have oncogenic effects, and several preclinical studies have shown that inhibition of PDEs has an anti-tumor effect [53]. A recent study demonstrated that inhibition of PDE10A decreased cell proliferation, induced cell cycle arrest, and increased apoptosis in ovarian cancer cells [53]. Kinesin family member C3 (KIFC3) encodes a member of the kinesin-14 family of microtubule motors. These motor proteins attach to microtubules and move along them to transport cellular cargo. Overexpression of KIFC3 was shown to be associated with resistance to docetaxel in breast cancer cell lines [54]. SH3GL3 has been implicated as a tumor suppressor in glioblastoma and lung cancer, as well as in cell migration and invasion in myeloma, and has been previously identified in colorectal cancer GWAS [55].
Exploratory TWAS identified two loci associated with dense area and percent density, CD63/PRIM1 and FXYD3. CD63 encodes a membrane protein of lysosomes [38] and glycosylation of this protein has been shown to affect breast carcinogenesis [35–38]. In addition, CD63 was identified when machine learning was applied to GWAS data with respect to radiation-associated contralateral breast cancer [39]. PRIM1, which encodes DNA primase polypeptide 1, has been found to be overexpressed in breast tumors [34]. The observed association with PRIM1 may also be explained by the fact that PRIM1 is associated with age at menopause, and breast density is known to decrease following menopause [56]. FXYD3, an mRNA also known as Mat-8 (Mammary tumor 8 kDa), is highly expressed in breast cancers [40] and has been shown to regulate breast cancer stem cells [41].
Despite the modest sample size, our study identified novel SNPs with plausible mechanistic connections to both breast density and breast cancer risk. Breast density was measured quantitatively using an automated algorithm with high accuracy. In addition, breast density is known to be highly heritable. In combination, the continuous quantitative trait and the high heritability may have resulted in the ability to detect moderate-to-large associations despite a relatively small sample size. However, the small sample size limits our ability to detect more modest associations. Therefore, replication in larger African ancestry populations and meta-analyses are warranted to increase statistical power to detect additional SNPs with more modest associations with breast density.
Given that the biological underpinnings of breast density are poorly understood, further investigation of our findings may help in identifying pathways relevant for breast density development as well as the mechanistic relationship between breast density and breast cancer risk. In this study, we investigated the overlap between our top-associated variants and those reported in previous studies of breast cancer and other related traits. We believe that this integrative approach can help in shedding light on the underlying biology of breast density and its relationship to breast cancer risk. Future research is needed to validate our preliminary findings and further explore the functional implications of the identified genetic variants.
Our study highlights the value of exploring the genetic factors associated with breast density among African ancestry populations, providing proof-of-concept that additional SNPs relevant to breast density may be identified through expanding the diversity of GWAS. Furthermore, a strength of our study is its use of quantitative measures of breast density, which have been shown to be strongly correlated with breast cancer risk [9,10] and to be more objective and reproducible than radiologist-rated breast density [11]. Despite these strengths, our sample size was small, and therefore results will need validation in larger studies. Furthermore, novel fully volumetric methods derived from digital breast tomosynthesis, or 3D mammograms, may provide even more precise quantification of dense breast tissue, enabling even greater power to detect associations with genetic factors.

Conclusions
We report the first GWAS of breast density among women of African ancestry, in which we identified novel SNPs associated with quantitative breast density measures, many of which had been previously identified as associated with breast cancer. Our results mark the beginning of the study of breast density among African ancestry populations and provide hypothesis-generating findings that may help in clarifying the biology of both breast density and breast cancer.

We thank the outstanding Penn Medicine Corporate IS team (Jessica Chen, Christine Vanzandbergen, Jeffrey Landgraf, Colin Wollack, Ned Haubein) for its major efforts to implement e-consenting in the EHR as well as biospecimen acquisition and tracking. We thank the Regeneron Genetics Center for partnership in generating genetic variant data and for scientific interactions. We thank the Smilow family for their generous gift that made the launch of the PMBB possible.

Conflicts of Interest:
The authors declare no conflict of interest.