Intronic Variant of MUTYH Gene Exhibits A Strong Association with Early Onset of Breast Cancer Susceptibility in Indonesian Women Population

Objective: Several recent studies have indicated a marked shift toward early-onset cases among breast cancer (BC) patients. However, these studies have been largely limited to Caucasian populations. This preliminary study aimed to investigate genetic risk factors for young BC patients specifically in the Indonesian population. Methods: DNA samples were extracted from 79 BC patients younger than 40 years and 90 healthy controls. The samples were sequenced on the Illumina NextSeq 500 platform and preprocessed to extract single-nucleotide polymorphism (SNP) data. First, univariate logistic regressions were performed to test the association between each SNP and BC incidence in young patients. Then, to analyze the polygenic effects of multiple SNPs, we fitted a multivariate logistic regression. Results: Only 15 SNPs passed our 95% call rate threshold and were used in the association tests. One of these variants, rs3219493, was significantly associated with early-onset BC (p-value = 0.025, OR = 3.750, 95% CI = 1.178-11.938). This result is consistent with the multivariate logistic regression model, in which the same variant was statistically significant (p-value = 0.008, OR = 8.398, 95% CI = 1.720-40.920). The variant is an intronic variant within the MUTYH gene, which has been reported in several published studies to be associated with the incidence of breast cancer in Chinese, Italian and Sephardi Jewish populations. However, there has been no evidence of this gene affecting the risk of early-onset BC in the Indonesian population.
Conclusion: Despite the limited sample size of this preliminary study, our finding of a significant association between an intronic MUTYH variant and early-onset BC in Indonesia broadens insight into population-specific aspects that should be taken into account in depth for the advancement of chemotherapy.


Introduction
Cancer is a manifestation of abnormal and uncontrolled proliferation of different cell types in the body (Visvader, 2011). There are hundreds of cancer types, which behave and respond differently to treatment (Doll, 2018). In women, breast cancer (BC) is the leading cause of cancer death globally, although its incidence and mortality rates vary between countries, reflecting economic and lifestyle factors (Curado, 2011; Bray et al., 2018). In Eastern Africa, the incidence rate of BC was relatively low at 19.3 per 100,000 women, while in Western Europe the rate was 89.9 per 100,000 women (Curado, 2011). China, the most populous country, reported 187,213 BC cases in 2012, with a mortality level of about 25.63%. In contrast, South Korea in the same year reported only 13.27% fatal cases among 17,140 patients diagnosed with BC (Youlden et al., 2014). The incidence of BC was found to increase with age in several Asian countries. In Thailand, a higher risk of developing BC was observed in women aged 45-49 years, whereas in China a similar risk was seen in an older age group, ranging from 50 to 84 years. An interesting profile is seen in India, where women aged 20-55 showed an increased risk of BC, yet the risk declined in the 60-79 age group (Mubarik et al., 2019).
In 2018 alone, BC incidence in Indonesia reached 58,256 new cases, constituting 30.9% of all observed cancer cases. Of these new cases, 22,692 (38.95%) were reported to be fatal (GLOBOCAN, 2019). According to a report by the Ministry of Health of Indonesia, these out-of-control BC figures are caused by poor access to early screening and treatment. Early detection is a crucial part of the cancer prevention program (Muljo et al., 2017; Muljo et al., 2018). However, only up to 5% of Indonesian women undergo early screening for BC, compared with about 40% of women in some developed countries (Muljo et al., 2019; Muljo et al., 2020). Most BC patients are diagnosed at an advanced stage, which generally affects their response to the regimens given (Departemen Kesehatan Republik Indonesia, 2010; Solikhah et al., 2017; Mardela et al., 2017). This information is usually included in the National Cancer Registry (Pardamean et al., 2016; Pardamean et al., 2018). Apart from these factors related to clinical access, the survival of BC patients depends strongly on demographic variables, for instance family history (Fagerholm et al., 2018), financial and marital status (Gomez et al., 2016; Martínez et al., 2017) and educational level (Bahk et al., 2017). Poor dietary habits and an unhealthy lifestyle are also considered important reasons for the increased BC incidence (Mubarik et al., 2019).
Although the identification of human disease-associated genetic variants has advanced greatly, heredity-driven risks have yet to be elucidated. Hence, genome-wide studies are needed to allow an in-depth understanding of, in particular, less common alleles across a broad range of ancestries (The International HapMap 3 Consortium, 2010). The International HapMap Project (International HapMap Consortium, 2003), the Human Genome Project (Lander et al., 2001) and the SNP Consortium (Sachidanandam et al., 2001) were established mainly to map and characterize the vast amount of information contained in DNA sequences. Approximately 10 million common DNA variants have been identified to date (The International HapMap 3 Consortium, 2010). However, they were limited to some well-represented populations such as Asian, European and African (International HapMap Consortium, 2003).
Cancers are considered a manifestation of deleterious genome alterations, and the condition is worsened when the alterations are inherited. Among cancer types, many of which are fatal, breast cancer (BC) is the most feared cancer among women, although the leading cause of death is heart disease (P&T Snapshot, 2014). In Indonesia, according to the most recent data, BC accounts for 30.5% of all cancers diagnosed and 21.5% of cancer-related deaths among females. The increasing mortality rates are primarily attributable to delays in obtaining an initial diagnosis (Bray et al., 2019).
Globally, BC incidence is associated with deleterious BRCA1/2 mutations (Henouda et al., 2016). A study conducted at UT MD Anderson Cancer Center (MDACC) reported that, among BRCA1/2 mutation carriers, the incidence of breast cancer, in addition to ovarian cancer, was increasing (Mersch et al., 2015). Separately, Henouda et al. (2016) identified 3 deleterious mutations and a large genomic rearrangement, namely an exon 2 deletion, in the BRCA1 gene and 4 mutations in the BRCA2 gene, with a novel mutation discovered in the Algerian population.
Besides the BRCA1/2 genes, MUTYH gene mutations have been detected in breast cancer patients and high-risk groups in China and have contributed to an increased risk of male breast cancer in Italy (Jian et al., 2017; Rizzolo et al., 2018). To date, we have found very little information on the association between MUTYH mutations and BC incidence. One example is a study conducted in the Sephardi Jewish population, in which the MUTYH variants observed were G396D and Y179C. Of these two, only the G396D variant was significantly associated with increased BC risk relative to healthy controls (Rennert et al., 2012).
A pilot genome-wide association study (GWAS) of 89 Indonesian women with BC and 46 healthy women as a reference population was recently performed using the Affymetrix single nucleotide polymorphism (SNP) Array 5.0, which covers 443,813 SNPs. It identified 11 chromosomal loci highly suggestive of BC risk, 4 of which contained identified genes: chromosome 2p12 with the CTNNA2 gene; chromosome 18p11.2 with the SOGA2 gene; chromosome 5q14.1 with the SSBP2 gene; and chromosome 9q31.1 with the TEX10 gene (Haryono et al., 2015). The study, however, did not specifically examine the possible influence of heredity or family history among the BC participants.
We were therefore interested in evaluating whether a particular mutation is present in women with early-onset BC. We compared the genotypic profiles of two groups: women with early-onset breast cancer (age < 40 years) and healthy women as a control group. The age of 40 years is our baseline, as suggested by a previously published meta-analysis by Nelson et al. (2012), who concluded that, within the range of 40-49 years, having a first-degree relative with BC was strongly associated with a more than 2-fold increase in BC risk. In this study, we employed a Next Generation Sequencing (NGS) approach, through which we could anticipate novel BC-associated SNP findings. Significant factors found in this study could be used as predictors to estimate early-onset breast cancer risk in women in Indonesia or, more broadly, in the South-East Asian population.

Study participants
A total of 79 breast cancer patients aged below 40 years were recruited from Cipto Mangunkusumo Hospital, Jakarta, Indonesia (Panigoro et al., 2020). These participants (cases) were subsequently matched with 90 healthy donors (controls) recruited by Genetics Indonesia (GI) as part of their Hereditary Cancer Screening (HCS) service. All participants gave written consent and agreed to participate in this study.
The SNPs passing the call-rate filter were tested together in the same polynomial logistic regression model. To avoid a singular matrix problem in the polynomial regression, redundant SNPs that share a similar distribution across samples were removed and represented by a single SNP. Similarity was measured with a correlation score: SNPs whose correlation with another SNP was above 0.7 or below -0.7, with a significance level below 0.05, were filtered out.
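The redundancy filter described above can be illustrated with a minimal standard-library sketch. It assumes genotypes are coded 0/1/2 (copies of the minor allele), uses Pearson correlation as the "correlation score", and keeps the first SNP of each correlated group; the coding, the greedy keep-first rule, and the names `pearson` and `drop_redundant` are all assumptions for illustration, and the significance test on the correlation is omitted for brevity.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def drop_redundant(genotypes, threshold=0.7):
    """Greedily keep SNPs whose |r| with every already-kept SNP is <= threshold."""
    kept = []
    for name, values in genotypes.items():
        if all(abs(pearson(values, genotypes[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical genotype codes for 8 samples; these are not study data.
toy = {
    "snp_a": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp_b": [0, 1, 2, 0, 1, 2, 0, 2],  # nearly identical to snp_a
    "snp_c": [0, 0, 0, 1, 1, 1, 2, 2],  # only weakly correlated with snp_a
}
kept = drop_redundant(toy)
print(kept)
```

In this toy input, `snp_b` is dropped because it is strongly correlated with `snp_a`, while `snp_c` is retained.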
The logistic regression was conducted in Python with the statsmodels library (Seabold, 2010). Significant SNPs, i.e. those with a p-value below 0.05 in the logistic regression, were subsequently re-evaluated using Fisher's exact test whenever a SNP had fewer than 5 samples in one cell of its cross table.
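As an illustration of the re-evaluation step, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2x2 cross table. It is not the authors' code; the function name `fisher_exact_two_sided` and the example counts are hypothetical, and the two-sided p-value is defined here as the sum of probabilities of all tables no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

# Hypothetical counts (rows: cases, controls; columns: variant carriers, non-carriers)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

An exact test is preferred over the asymptotic logistic regression p-value when a cell count is small, because the normal approximation becomes unreliable in sparse tables.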

Results
We gathered 79 breast cancer patients under 40 years old and 90 healthy respondents aged between 20 and 61 years. The mean age at diagnosis among breast cancer cases was 41.033 ± 8.703 (SD) years. The DNA 260/280 ratio and DNA concentration served as the quality control indices of this study. DNA concentrations of both case and control samples ranged between 80 ng/µl and 100 ng/µl. The next-generation sequencing platform was used to assess polymorphisms in the case and control samples. Our analysis yielded 15 autosomal SNPs, whose individual associations with BC incidence were then evaluated by logistic regression. The results of the univariate logistic regression analysis of the 15 SNPs can be found in Table 1.
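For readers interpreting the logistic regression output, the sketch below shows how an odds ratio and a Wald 95% confidence interval relate to a fitted coefficient and its standard error, and runs the relation backwards on the univariate values reported for rs3219493 (OR = 3.750, 95% CI = 1.178-11.938, p-value = 0.025). It assumes the reported interval is a Wald interval; the function names are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def wald_ci(beta, se, z=Z95):
    """Odds ratio and Wald CI from a logistic coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def implied_beta_se(ci_low, ci_high, z=Z95):
    """Recover the coefficient and standard error implied by a reported Wald CI."""
    beta = (math.log(ci_low) + math.log(ci_high)) / 2
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

def wald_p_value(beta, se):
    """Two-sided p-value of the Wald z-statistic beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2))

# Reported univariate interval for rs3219493
beta, se = implied_beta_se(1.178, 11.938)
odds_ratio, ci_low, ci_high = wald_ci(beta, se)
p = wald_p_value(beta, se)
print(round(odds_ratio, 3), round(p, 3))
```

Under the Wald assumption, the odds ratio and p-value implied by the reported interval agree with the reported univariate values to within rounding.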
Only rs3219493 emerged as a significant factor in the logistic regression analysis. Since this SNP contains fewer than 5 samples in one cell of its cross table, it was also analyzed using Fisher's exact test. The cross table of rs3219493 between sample groups can be found in

DNA Extraction
DNA samples were extracted from the whole blood of participating BC patients and from buccal swabs of healthy donors. Prior to DNA purification with the QIAamp DNA Mini Kit (Qiagen), the blood samples were centrifuged at 2500 x g at room temperature for 10 minutes to obtain the buffy coat. Separately, the buccal swab samples were ethanol-precipitated (prepIT-L2P Kit, DNA Genotek) prior to DNA isolation using the OCR-100 Kit (DNA Genotek). DNA concentrations were measured using the Qubit dsDNA HS Assay Kit (Invitrogen). We used 10 ng/µl of DNA per sample for sequencing.

Next-Generation Sequencing
Sequencing was performed on the Illumina NextSeq 500 platform with a High Output Kit v2.5 (300 Cycles). Library construction followed the Hereditary Cancer Screening protocol by Kailos, which consisted of the following steps. The first step was targeting, in which two-part oligos contained a region complementary to the DNA sequence flanking the region to be captured, plus a recognition sequence for a restriction enzyme to cut the DNA at a specific location. The second step was patch ligation, in which the oligos, barcodes, and adapters were ligated onto the ends of the targeted region. The last step was amplification, in which the ligated regions were amplified to yield the amount of amplicons required by the sequencer.

Statistical Analysis
We retained only the SNPs that passed our 95% call rate threshold. Based on these SNPs, we carried out two case-control association analyses using logistic regression. The first analysis regressed case-control status on each individual SNP with call rate > 95%. In the second analysis, we ran a polynomial logistic regression on the 15 SNPs that passed the 95% call rate. However, highly similar predictors may cause poor results. Therefore, SNPs whose correlation was above 0.6 or below -0.6, with a p-value smaller than 0.05, were represented by only one SNP. The correlation heatmap and polynomial logistic regression results can be seen in Figure 1 and Table 3.
To help visualize the result in Table 3, a forest plot of this polygenic analysis was generated and can be seen in Figure 2.
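The call-rate filter that precedes both analyses can be sketched as follows. This is a minimal illustration, assuming a missing genotype call is represented as `None` and that a SNP is retained when its call rate strictly exceeds the threshold; the data layout and the function names are hypothetical.

```python
def call_rate(calls):
    """Fraction of samples with a non-missing genotype call (None = no call)."""
    return sum(c is not None for c in calls) / len(calls)

def filter_by_call_rate(genotypes, threshold=0.95):
    """Keep only SNPs whose call rate strictly exceeds the threshold."""
    return {snp: calls for snp, calls in genotypes.items()
            if call_rate(calls) > threshold}

# Hypothetical calls for 20 samples; these are not study data.
toy_calls = {
    "snp_x": [0] * 20,               # called in every sample
    "snp_y": [1] * 18 + [None] * 2,  # 90% call rate, filtered out
    "snp_z": [2] * 15 + [None] * 5,  # 75% call rate, filtered out
}
passing = filter_by_call_rate(toy_calls)
print(list(passing))
```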

Discussion
Both the univariate and multivariate analyses identified the same significant variant associated with BC cases. This intronic variant of MUTYH DNA glycosylase, without strand mutation, was found across our BC patient cohort, where, based on a coding DNA reference sequence (LRG_220), the nucleotide guanine is substituted by cytosine. The mutation was detected only on chromosome 1 at position g.45796269 (rs3219493). Although an alternative position at g.45330597 is also listed in https://www.ncbi.nlm.nih.gov/snp, we were unable to confirm it with our datasets. This may suggest that the alternative form of the substitution is ethnicity-associated. However, further assessment is necessary to confirm this speculation.
Given that no previous data relate to the risk of BC in Indonesian women with a family history, this finding is intriguing, especially since BRCA1/2, which are known globally as genes highly associated with BC incidence, were not significant in our cases. Some previous studies have reported that monoallelic and biallelic MUTYH mutations increased the risk of developing cancer in the absence of BRCA1/2 mutations (Nielsen et al., 2005; Vogt et al., 2009; Wasielewski et al., 2010; Rennert et al., 2012). However, in some SNP mutation profiling studies of breast cancer, mutations of the MUTYH and TP53 genes were frequently found together with significant BRCA1/2 mutations (Lin et al., 2016; Rummel et al., 2017; Nones et al., 2019; Ow et al., 2019). A study by Jian et al. (2017) on the genetic characteristics of breast cancer in the Chinese population showed mutations in the MUTYH gene in breast cancer cases and high-risk groups. The frequency of the MUTYH variant c.892-2A>G was higher in high-risk women with a family history of breast cancer than in breast cancer cases, based on comparison with East Asians in 1000G. An increased frequency of MUTYH mutations was observed in several cases of breast and colorectal cancer in Dutch families (Wasielewski et al., 2010). Among 153 Dutch families, the MUTYH variants p.Tyr179Cys, p.Gly396Asp and p.Pro405Leu were observed in breast and colorectal cancer cases. However, the correlation between MUTYH gene mutations and breast cancer still needs to be explored, considering that this gene also has a complex relationship with colorectal cancer, especially in polyposis.
According to the KEGG pathway database, the MUTYH gene plays a role in DNA repair, specifically in Base Excision Repair (BER). MUTYH functions early in the pathway, recognizing and removing mismatched, alkylated, and deaminated bases in long-patch BER. The MUTYH gene is located on the short arm of chromosome 1 (1p34.1) and is 11.2 kb long. It encodes a DNA glycosylase that is pivotal in repairing post-replicative mispairs (Kashfi et al., 2013). The MUTYH DNA glycosylase detects A:8-oxoG and 2-OH-A:G mispairs in the DNA helix (AP site) (Slupska et al., 1996; Ohtsubo et al., 2000; Gu, 2001). Mutations in the MUTYH gene may interfere with the DNA repair process in breast cancer cases. DNA glycosylase mutations, especially those involving the MUTYH gene, were also found to contribute to increased colorectal cancer cases (Farrington et al., 2005). The ability of DNA glycosylases to remove bases, especially methylated ones, correlates with increasing age. With advancing age, base excision repair declines due to a lack of DNA glycosylase enzyme (Fernandes et al., 2017). This is consistent with the high-risk group for breast cancer generally being middle-aged women (over 50 years). However, there are very few studies on the effect of MUTYH mutation variants on age-related breast cancer. Even though several studies have shown MUTYH mutations in some cancer cases, the study of this gene in relation to the initiation of carcinogenesis or early onset of cancer still requires much larger datasets to validate its effect, especially in breast cancer (Kairupan and Scott, 2007).
In this preliminary study, we found a variant statistically significantly associated with early-onset breast cancer in both univariate and multivariate analyses. It is an intronic variant of the MUTYH gene on chromosome 1. MUTYH has been associated with several cancers, including breast cancer, in other populations but, to our knowledge, there have been no reports of the gene's significance for early-onset BC in Indonesia or South Asian countries thus far. Surprisingly, variants in the BRCA1 and BRCA2 genes did not emerge as significant genetic factors in our observed population. Despite the limited sample size of our study, our results offer a potential clue for harnessing the MUTYH gene as a novel target for chemotherapy, although the effect appeared to be population-restricted. However, there is still a long way to go to confirm our finding, and in our future study we will include as many clinical variables as we can extract as potential confounders in the model.

Author Contribution Statement
All authors contributed to and approved the final manuscript. SSP and EL coordinated and supervised the clinical sampling. IN drafted and structured the manuscript. BM, AAH, AB and DS contributed equally to the data analysis. DA carried out the genetic sequencing. S supervised the sequencing work. JB and BP supervised the data analysis.