Characteristics of Mammographic Breast Density and Associated Factors for Chinese Women: Results from an Automated Measurement

Background: Characteristics of mammographic density for Chinese women are understudied. This study aims to identify factors associated with mammographic density in China using a quantitative method.
Methods: Mammographic density was measured for a total of 1071 (84 with and 987 without breast cancer) women using an automatic algorithm AutoDensity. Pearson tests examined relationships between density and continuous variables and t-tests compared differences of mean density values between groupings of categorical variables. Linear models were built using multiple regression.
Results: Percentage density and dense area were positively associated with each other for cancer-free (r=0.487, p<0.001) and cancer groups (r=0.446, p<0.001), respectively. For women without breast cancer, weight and BMI (p<0.001) were found to be negatively associated (r=-0.237, r=-0.272) with percentage density whereas they were found to be positively associated (r=0.110, r=0.099) with dense area; age at mammography was found to be associated with percentage density (r=-0.202, p<0.001) and dense area (r=-0.086, p<0.001) but did not add any prediction within multivariate models; lower percentage density was found within women with secondary education background or below compared to women with tertiary education. For women with breast cancer, percentage density demonstrated similar relationships with that of cancer-free women whilst breast area was the only factor associated with dense area (r=0.739, p<0.001).
Conclusion: This is the first time that mammographic density was measured by a quantitative method for women in China and identified associations should be useful to health policy makers who are responsible for introducing effective models of breast cancer prevention and diagnosis.


Introduction
Breast cancer is the most commonly diagnosed neoplasm amongst women in China and it is one of the leading causes of cancer death in females [1]. Mammographic density, describing the amount of fibrous and glandular tissue within the breasts, is consistently demonstrated to be an important risk factor for breast cancer. Women with the highest density were shown to have a 2 to 6 times higher risk of developing breast cancer compared to those with the lowest [2]. Mammographic density is also associated with an elevated risk of masking tumours, which lowers the sensitivity of mammography [3] and therefore identifies women who may benefit from additional imaging such as breast ultrasound or magnetic resonance imaging [4]. Digital breast tomosynthesis might also be recommended [5].
Well-confirmed factors associated with higher density include younger age, lower body mass index (BMI), premenopausal status, nulliparity, late age at first delivery, a smaller number of live births, and family history of breast cancer [6]. However, current knowledge around density data is largely based on women from westernised countries and the characteristics of mammographic density for women in China are understudied [7]. From the limited data that are available, Chinese mammographic density was shown to be positively associated with earlier age at menarche, premenopausal status, smaller number of children, later age at first delivery, and personal history of benign breast disease [8,9]. Also, larger breast size was found to be negatively associated with density amongst premenopausal women in China [10].
Even though the previously mentioned studies investigated Chinese mammographic density, the associations predominantly focused on reproductive factors. In addition, all previous studies used the qualitative method of Breast Imaging Reporting and Data System (BI-RADS) classifications. Despite being the most commonly used assessment approach of mammographic density in both clinical settings and screening programs in China and many other countries [11,12], the BI-RADS classification has been shown to suffer from limited reproducibility, with wide inter-reader (kappa = 0.02-0.77) and intra-reader (kappa = 0.32-0.88) variations [13]. This subjectivity has the potential to result in inconsistent breast cancer risk prediction and unnecessary discrepancies in decision-making for density assessment [14]. As a consequence, automated methods using mathematical and physical principles have been designed to promote objective and consistent assessment of mammographic density.
The aim of the current work is to identify predictive factors of mammographic density for Chinese women both with and without breast cancer using a quantitative algorithm. Two density metrics are considered, percentage density (PD) and dense area (DA), and the impact of each metric on various associations is explored.
All data that came from the Fudan University Shanghai Cancer Center (FUSCC) database were deidentified in this retrospective study and informed consent was waived.

Data Collection.
Women's characteristics were obtained from the registration form and the discharge summary contained within the health record for each woman with breast cancer and through a BCSP questionnaire for breast cancer-free women. All the information for women was deidentified, with dedicated study IDs used to link mammograms and other data.
Details on height, weight, age at menarche, age at menopause, age at first delivery, and duration of breastfeeding were collected as continuous variables. Age at mammography was calculated by the assessment date and date of birth.
Ethnicities other than Han Chinese were classified into a single non-Han grouping and level of education was coded into a dichotomous variable in order to increase statistical power since these two variables with more than two groupings resulted in very uneven and low numbers in certain groups. Geographic location was also coded as a categorical variable with two groupings (Shanghai and other locations) since the program was conducted in Shanghai and consequently most of the participants came from Shanghai. Menopause status, parity history, number of children, breastfeeding history, personal history of breast cancer, family history of breast cancer, degree of consanguinity, smoking history, and history of alcohol consumption were also classified into two groupings which were specific to each variable detailed in the results.
All of the factors of interest mentioned above were collected for women without breast cancer; however ethnicity, smoking, alcohol history, level of education, and geographic location were unavailable for women with cancer since these details were not recorded on admission to FUSCC.

Image Acquisition.
Mammograms taken closest in time to the cancer diagnosis and to the questionnaire completion were obtained for women with and without breast cancer, respectively. For all women, craniocaudal projections of both breasts (where available) were accessed, and these mammograms were acquired by Mammomat Inspiration (Siemens; Erlangen, Germany) or Selenia (Hologic, Inc., Bedford, MA, USA) units.

Mammographic Density Measurement.
Mammographic density was measured by a fully automatic algorithm, AutoDensity version 1.7, which identifies both the area of dense tissue (dense area) and the area of breast tissue (breast area) in mammograms and then calculates percentage mammographic density. This algorithm, which has been validated elsewhere [15], automatically finds an optimal threshold for each mammogram independently from any other images in a data set, in order to segment the breast from the background within a mammogram and outline the dense tissue within the breast (Figure 1(a)). Both the dense area (Figure 1(b)) and breast area (Figure 1(c)) are highlighted, and the resultant PD is produced by dividing the dense area (number of pixels) by the breast area (number of pixels) and expressing the result as a percentage. Mammograms of both left and right breasts for each woman were assessed and the average value of both sides was used for all the statistical analyses. This algorithm was provided to the affiliation of the corresponding author in September 2016.
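The PD calculation and the left/right averaging described above can be sketched as follows. This is a minimal illustration, not the AutoDensity implementation: the function names and pixel counts are hypothetical, and it assumes that PD is computed per breast and then averaged (the text does not say whether the ratio is averaged or pooled).

```python
def percentage_density(dense_pixels: int, breast_pixels: int) -> float:
    """Return dense area as a percentage of breast area (both in pixels)."""
    if breast_pixels <= 0:
        raise ValueError("breast area must be positive")
    if not 0 <= dense_pixels <= breast_pixels:
        raise ValueError("dense area must lie between 0 and the breast area")
    return 100.0 * dense_pixels / breast_pixels


def woman_level_density(left: tuple, right: tuple) -> dict:
    """Average PD and DA over the left and right craniocaudal views.

    Each argument is a hypothetical (dense_pixels, breast_pixels) pair.
    Assumption: PD is computed per side first, then averaged.
    """
    sides = (left, right)
    pd_per_side = [percentage_density(d, b) for d, b in sides]
    da_per_side = [d for d, _ in sides]
    return {"PD": sum(pd_per_side) / 2, "DA": sum(da_per_side) / 2}


# Hypothetical pixel counts, not study data.
result = woman_level_density(left=(30_000, 120_000), right=(36_000, 120_000))
print(result)
```

With these illustrative inputs the per-side PD values are 25% and 30%, so the woman-level PD is their mean and the woman-level DA is the mean dense pixel count.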

Statistical Analysis.
The data derived from both the screening program (cancer-free women) and clinical settings (cancer women) were subjected to two types of statistical analysis: univariable and multivariable analysis. Women with and without cancer were analysed as separate groups, because the variable sets available for each group differed slightly. The relationship between PD and continuous variables was assessed using the Pearson correlation coefficient (r). Difference of mean values of PD was compared between the groupings of each dichotomous variable using t-tests.
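The two univariable tests can be sketched in plain Python. The study ran them in SPSS, so this is only an illustration of the statistics involved: the function names and data are hypothetical, and the pooled-variance (Student) form of the t-test is an assumption, since the text does not state which variant was used. P-values are omitted because they need the t distribution.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic (equal variances assumed) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    ss_a = sum((v - mean_a) ** 2 for v in group_a)
    ss_b = sum((v - mean_b) ** 2 for v in group_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical values, not study data.
bmi = [19, 21, 23, 25, 27]
pd_values = [30, 28, 22, 20, 15]
r = pearson_r(bmi, pd_values)

pd_premenopausal = [30, 28, 26]
pd_postmenopausal = [22, 20, 18]
t, df = pooled_t_statistic(pd_premenopausal, pd_postmenopausal)
print(round(r, 3), round(t, 3), df)
```

A negative r here mirrors the direction reported for BMI and PD, and a positive t indicates a higher mean PD in the first group.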
To identify key factors associated with PD, linear model building was performed using stepwise multiple regression adopting the significant variables from Pearson tests and t-tests except those restricted to women with specific conditions (for example, age at menopause was restricted to postmenopausal women only, so this variable was not used in the model building). Residuals of the PD were examined to check for assumptions of linear models by using regression scatterplots and histograms. R-squared statistics were used to assess the goodness of fit of the models.
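The idea of building a linear model step by step and judging it by R-squared can be sketched as below. This is a simplified stand-in, not the SPSS stepwise procedure: it only adds variables (no removal step) and uses a hypothetical R-squared gain threshold, `min_gain`, instead of significance-based entry criteria. All names and data are illustrative.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_r_squared(columns, y):
    """Least-squares fit of y on an intercept plus the given columns; returns (coefficients, R^2)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def forward_select(predictors, y, min_gain=0.01):
    """Greedily add the predictor that raises R^2 most; stop when the gain is below min_gain."""
    chosen, best_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {
            name: fit_r_squared([predictors[n] for n in chosen + [name]], y)[1]
            for name in remaining
        }
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break
        chosen.append(name)
        remaining.remove(name)
        best_r2 = scores[name]
    return chosen, best_r2


# Hypothetical data: PD built exactly as 60 - 1.5 * BMI - 4 * postmenopausal.
predictors = {
    "bmi": [19, 20, 21, 22, 23, 24, 25, 26],
    "postmenopausal": [0, 1, 0, 1, 0, 1, 0, 1],
    "age_at_menarche": [13, 12, 14, 13, 12, 14, 13, 13],
}
pd_values = [31.5, 26, 28.5, 23, 25.5, 20, 22.5, 17]
selected, r2 = forward_select(predictors, pd_values)
print(selected, round(r2, 3))
```

Because the illustrative outcome depends only on BMI and menopause status, the third variable adds no explanatory value and is left out, which is the same behaviour the text describes for variables that "did not add any prediction" in the multivariate models.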
All of the statistical tests performed for PD were repeated for DA. The SPSS statistical package (IBM SPSS Statistics for Windows, version 22.0) was used for all statistical analyses, and two-tailed tests of significance were employed using a significance level of 0.05.

Characteristics of Participants.
After excluding cases with unilateral images, a total of 1071 (84 with and 987 without breast cancer) women were finally selected for statistical analysis. Table 1 shows the characteristics for both groups of women. Figure 2 depicts the distribution of PD, DA, and breast area from the AutoDensity algorithm for both cancer and cancer-free women.

Association between PD and DA.
PD and DA were positively correlated for cancer-free women (r = 0.487, p < 0.001) and for women with cancer (r = 0.446, p < 0.001), respectively.

Determinants of Mammographic Density.
The outputs from the Pearson and t-tests for both PD and DA are shown in Table 2 for both cancer and cancer-free women.

Women without Breast Cancer.
Age at mammography (r = -0.202), weight (r = -0.237), BMI (r = -0.272), and age at menarche (r = -0.078) were significantly and negatively associated (p < 0.001) with PD. Lower PD (p < 0.001) was found within postmenopausal women and women with secondary education background or below compared to premenopausal women and women with tertiary education.
DA was found to be positively associated with breast area (r = 0.790, p < 0.001), body weight (r = 0.110, p < 0.001), and BMI (r = 0.099, p = 0.002). Negative associations were shown between DA and age at mammography (r = -0.086, p = 0.007) and age at menarche (r = -0.080, p = 0.012). DA was also found to be lower in women with a history of nulliparity (p = 0.014) and lack of breastfeeding (p = 0.002) compared to women without such histories.
Among women with breast cancer, breast area was positively associated with DA (r = 0.739, p < 0.001).

Linear Models.
Linear models were built for both PD and DA for each of the two groups of women (where for menopause status, 0 = premenopausal and 1 = postmenopausal, and for education level, 0 = secondary and below and 1 = tertiary). The equations of the 4 most-effective models (I-IV) are presented as follows and the residuals of these models were all normally distributed. The details of individual coefficients are in Supplementary Tables S1-S4.

Discussion
This study, for the very first time, identified a number of factors associated with mammographic density for women both with and without breast cancer in China by employing a fully automatic algorithm AutoDensity. Two measures provided by this algorithm were used to assess mammographic density in our study: PD and DA, which we found were moderately correlated with each other. Previous studies that compared the differences of prediction of breast cancer risk between these two measures suggested that the cancer risk associated with DA was stronger than or as strong as that with PD [16,17]. By combining the effects of the constituting measures [18], PD delivers limited information regarding the absolute amount of dense tissues which are potentially at risk of undergoing a malignant transformation [19]. To illustrate, when a certain amount of dense tissue is measured within a small breast, a relatively higher percentage will be provided, compared to the identical amount of target tissue measured within a large breast. However, even though it may therefore be argued that PD is not an appropriate measure of choice in etiologic research, it is very commonly used to present mammographic density since it is an easily applicable and practicable prognostic factor of breast cancer risk [20]. This might partially result from the fact that percentage density appears to be less affected by technical issues such as the degree of breast compression [16].
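The small-breast versus large-breast illustration can be written out as a short worked example. The pixel counts below are hypothetical and chosen only to make the arithmetic easy; they are not study data.

```latex
\[
\mathrm{PD} = \frac{\mathrm{DA}}{\text{breast area}} \times 100\%
\]
\[
\text{small breast: } \frac{30\,000\ \text{pixels}}{100\,000\ \text{pixels}} \times 100\% = 30\%,
\qquad
\text{large breast: } \frac{30\,000\ \text{pixels}}{200\,000\ \text{pixels}} \times 100\% = 15\%
\]
```

The dense area is identical in both cases, yet PD halves when the breast area doubles, which is why the two metrics can relate differently to factors such as weight and BMI that also influence breast area.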
Determinants for both measures were demonstrated within our study; however, predictors were not consistent for PD and DA. An example of this inconsistency is breast area, which accounted for more than half of the DA variation for both cancer and cancer-free women, whereas it appeared to have no impact on PD. This was particularly noticeable for women diagnosed with cancer, since breast area was the only factor arising from univariate analysis that was statistically significant for DA.
Associations of body weight and BMI were dependent on which of the two measures was used. The negative associations of PD with increasing BMI and increasing weight have been reported for several decades across many populations [21]. In contrast, DA was found to be positively associated with weight and BMI, which is not aligned with most of the literature based on westernised populations [17,20,21]. However, a similar finding was shown in studies involving Chinese women living in westernised and developed countries [22,23]. The alignment with our work suggests that the positive association (although not strong) between BMI/weight and DA might be unique to Chinese mammographic density. Nevertheless, this hypothesis will need further study to be proven or disproven because there is very limited literature on this topic that our results can be compared to (see Introduction). The question remains, however, of why the relationships appear in opposite directions in our work focusing on Chinese women depending on whether PD or DA is used as the dependent variable.
Another important finding was that women with tertiary education appeared to have denser breasts compared to women with a lower level of education. This finding is consistent with previous studies from Europe and North America with a focus on Caucasian women [24,25]. To our knowledge, this is the first time the relationship between mammographic density and education for women in China has been shown. This could have important future implications since Chinese people are increasingly keen to undergo tertiary education; for example, the graduation rate from tertiary education institutions increased by three times over the last two decades [26]. Other socioeconomic factors, e.g., employment, household income, home ownership, urbanisation/ruralisation, and social class, associated with higher education levels, may also impact on this relationship [25], but this was not investigated in our study.
Despite displaying a negative association with mammographic density within the univariate analysis, age at mammography did not add any prediction beyond other variables within the multivariate model for PD or DA in women without breast cancer. This is inconsistent with previous work based on either Chinese [10,22] or other populations [17,27] and may suggest a characteristic only relevant to women in our study and not applicable to the general Chinese female population. Another explanation is that the contributions of other elements within the multivariate models had a much greater impact than that of age at mammography or that age has already been modelled by proxy through menopause, which is highly correlated to age in the optimal model (r = 0.762, p < 0.001).
The available data around relationships between density and smoking history and alcohol consumption for populations other than Chinese are inconsistent. Some studies found a positive association with alcohol consumption [28] and a negative association with smoking history [29], whereas others showed no associations [30]. We also failed to identify any association with these two lifestyle factors, which is consistent with two previous studies focusing on Chinese women [8,10], but may be partially explained by the low number of women in our study with a positive smoking or alcohol intake history (less than 15%), and by this information only being available for women without breast cancer. With regard to ethnic variations, this is the first time that density was compared between women of Han origin and non-Han origin in China and no associations were shown, which was different to that seen for ethnic variations in other populations [31]. This finding, however, should be treated with some degree of caution since all the ethnic minority groups at data collection were categorized as non-Han origin in order to increase statistical power, since the total number of women in this study belonging to specific ethnic minorities was very low (<2%). This aggregation could be obscuring minority-specific observations, an issue that needs to be addressed in further work.
This study used a fully automatic algorithm to measure mammographic density for women in China. Even though, in the clinical and screening settings in China, the BI-RADS scheme is the most commonly used classification to assess density, this visual approach is relatively time-consuming and requires more workload from radiologists compared to quantitative computer-aided methods [32]. Also, the reproducibility of BI-RADS classification is questionable due to the subjectivity of readers involved with density assessment [33]. Even though it is the first time that AutoDensity has been used for density assessment for Chinese women, it has been shown to be comparable to Cumulus, a globally employed semiautomatic algorithm, in terms of association with breast cancer risk and breast cancer screening outcomes in Australia [15]. This approach allowed important associations to be identified but also revealed that one must standardise and understand better the metric being used. In addition, AutoDensity is a breast area-based algorithm instead of a volume-based algorithm. AutoDensity is therefore based on the projected area, rather than the volume of breast tissues, and consequently finds a threshold between dense and nondense areas. Therefore the thickness of the breast is not taken into account during the AutoDensity measurement. This potential source of error in measurement is likely to attenuate the observed relationship between percentage density/dense area and potential determinants and risk of breast cancer.
Nevertheless, this study has a few limitations. As menopause was shown to be an important and contributing factor for Chinese mammographic density, different menopausal status might have important influences on the density values. However, we did not separate pre-, peri-, and postmenopausal women in our study, which will be the focus of further work. Also, the small sample size of women in the cancer group is noted. A larger sample of women with cancer may have revealed further relationships, and future studies seeking to recruit larger samples of women diagnosed with cancer are recommended. In addition, we acknowledge that the lack of a follow-up period after mammograms in our study may be a challenge. Because follow-up was not a standard process of BCSP, we were unable to collect these data. This could mean that the cancer-free group may contain women with missed breast cancer and thus increased values of both PD and DA. Finally, we did not provide a comparative analysis using both quantitative (i.e., AutoDensity) and qualitative (i.e., BI-RADS) measurements because BI-RADS scales were not routinely reported in the BCSP, but this could be a focus of further work.
In conclusion, this study for the first time in China demonstrated important determinants of mammographic density in AutoDensity-generated PD and DA values. Differences between the two density metrics emphasise the importance of understanding better what each metric represents for both women with and without breast cancer and ensuring that approaches are standardised. We believe our findings should be valuable to health policy makers who are responsible for introducing effective models of breast cancer prevention and diagnosis.

Data Availability
The images and dataset used to support the findings of this study are restricted by the Human Research Ethics Committee of the University of Sydney and the Institutional Review Board of Fudan University Shanghai Cancer Center in order to protect patient privacy and confidentiality. Data may be available to researchers who meet the criteria for access to confidential data by contacting the corresponding author.

Disclosure
Tong Li and Lichen Tang share the first authorship.

Conflicts of Interest
The authors declared no conflicts of interest.

Supplementary Materials
Supplementary Table S1: linear models of predictors of percentage mammographic density for women without breast cancer. Supplementary Table S2: linear models of predictors of dense area for women without breast cancer. Supplementary Table S3: linear models of predictors of percentage mammographic density for women with breast cancer. Supplementary Table S4: linear models of predictors of dense area for women with breast cancer. (Supplementary Material)