SNPs within microRNA binding sites and the prognosis of breast cancer

Single nucleotide polymorphisms (SNPs) within microRNA binding sites can affect the binding of microRNAs to their target mRNAs and thereby alter gene expression, which may contribute to cancer prognosis. Here we performed a two-stage study of 2647 breast cancer patients to explore the association between SNPs within microRNA binding sites and breast cancer prognosis. In stage I, we genotyped 192 SNPs within microRNA binding sites using the Illumina GoldenGate platform. In stage II, we validated the SNPs associated with breast cancer prognosis in an independent dataset using the TaqMan platform. Eight SNPs were significantly associated with breast cancer prognosis in stage I (P<0.05), but only rs10878441 remained statistically significant in stage II (AA vs CC, HR=2.21, 95% CI: 1.11-4.42, P=0.024). In the combined stage I and stage II data, the rs10878441 CC genotype was associated with poorer survival of breast cancer than the AA genotype (HR=2.19, 95% CI: 1.30-3.70, P=0.003). Stratified analyses showed that rs10878441 was associated with breast cancer prognosis in patients with grade II tumors and in lymph node-negative patients (P<0.05). The leucine-rich repeat kinase 2 (LRRK2) rs10878441 CC genotype is associated with poor prognosis of breast cancer in a Chinese population and may serve as a potential prognostic biomarker for breast cancer.
• The LRRK2 rs10878441 CC genotype is associated with poor prognosis of breast cancer in a Chinese population.
• Stratified analyses demonstrated that rs10878441 was associated with breast cancer prognosis in grade II patients and lymph node-negative patients.

… [2]. It is estimated that the human genome contains around 3-6 million SNPs, which could provide a means of elucidating the genetic component of complex diseases [3].
For many years, age at diagnosis, axillary lymph node metastasis, tumor size, histological grade, hormone receptor status, and human epidermal growth factor receptor 2 (HER2) status have been the principal factors used to evaluate prognosis and to determine the appropriate treatment strategy [4]. Environmental and lifestyle exposures also influence breast cancer prognosis: body mass index (BMI), nutrition and physical activity are related to the prognosis of breast cancer [5,6], and reproductive factors such as breastfeeding and pregnancy have been reported to be associated with breast cancer prognosis [7,8].
MicroRNAs (miRNAs) are endogenous small non-coding RNAs of about 22 nucleotides that regulate gene expression by Watson-Crick base pairing with the 3' untranslated region (3'UTR) of their target mRNAs. MicroRNAs have been reported to regulate nearly 30% of human genes [9] and play important roles in most physiological and pathological processes, such as tumorigenesis and cell proliferation. The binding of a microRNA to its target mRNA is critical for regulating mRNA levels and protein expression, and this binding can be affected by SNPs that reside in microRNA binding sites. Such SNPs may therefore weaken or disrupt microRNA binding to the target mRNA, altering the regulation of target genes by miRNAs and thereby contributing to cancer prognosis [10][11][12].
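To make this mechanism concrete, the sketch below checks whether a 3'UTR segment contains a site perfectly complementary to a miRNA seed, here assumed to be nucleotides 2-8 of the miRNA (a 7mer-m8 match), and shows how a single A-to-C substitution inside that site abolishes the match. All sequences and function names are hypothetical and do not correspond to any real miRNA or gene; real target prediction also considers other site types (e.g. 8mer or 6mer sites), G:U wobble pairs and site context, none of which this sketch models.

```python
# Sketch of a 7mer seed-match check (positions 2-8 of the miRNA).
# All sequences below are HYPOTHETICAL; they are not miR-550-3p or the LRRK2 3'UTR.

# Watson-Crick complements for RNA bases
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the 7-nt mRNA site (5'->3') complementary to miRNA positions 2-8."""
    seed = mirna[1:8]  # positions 2-8 of the miRNA, read 5'->3'
    # The target site pairs antiparallel, so take the reverse complement.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def has_seed_match(utr: str, mirna: str) -> bool:
    """True if the 3'UTR segment contains a perfect 7-nt seed-complementary site."""
    return seed_site(mirna) in utr


mirna = "AUCCGAUGACUAGCUAAGCUUA"  # hypothetical 22-nt miRNA
utr_allele_a = "GGACAUCGGAUUC"     # hypothetical 3'UTR segment, A allele
utr_allele_c = "GGACCUCGGAUUC"     # same segment carrying C at one site position

print(seed_site(mirna))                     # CAUCGGA
print(has_seed_match(utr_allele_a, mirna))  # True
print(has_seed_match(utr_allele_c, mirna))  # False
```

In this toy case the A allele preserves the complementary site while the C allele breaks one base pair in it, which is the kind of allele-specific loss of targeting the text describes.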
In recent years, a number of studies have reported links between SNPs within microRNA binding sites and the prognosis of various types of cancer, including breast cancer [12][13][14]. Teo et al [15] reported a role for rs7180135 in RAD51 in the prognosis of breast cancer patients, with the minor G allele associated with improved breast cancer-specific survival. Brendle et al [16] found that the A allele of rs743554 in the 3'UTR of the ITGB4 gene was associated with estrogen receptor-negative tumors and worse survival in patients with breast cancer. Zhang et al [17] found that the miR-367 binding site SNP rs1044129 in the RYR3 gene was associated with poor survival of patients with breast cancer. Liu et al [18] showed that the TT genotype of rs16917496 in the SET8 3′-UTR was significantly associated with poor outcome of breast cancer in a Chinese population.
However, large-sample association studies of SNPs within microRNA binding sites and breast cancer prognosis are still lacking in China. We therefore carried out a two-stage cohort study to investigate the relationship between SNPs within microRNA binding sites and breast cancer prognosis.

Demographic and epidemiological characteristics of patients
The demographic and epidemiological characteristics of the 2647 breast cancer patients are shown in Table 1.

Clinicopathological characteristics of patients
The clinicopathological characteristics of all participants are presented in Table 2 (Table 3 and Supplementary Figure 1). The associated SNPs were rs1053739 in NMT1 at 17q21.31, rs2693 in KIF13B at 8p12, rs698761 in PREPL at 2p21, rs8602 in MKNK1 at 1p33, rs10878441 in LRRK2 at 12q12, rs10318 in GREM1 at 15q13.3, rs10075853 in ST8SIA4 at 5q21.1 and rs8410 in PREPL at 2p21. We further analyzed the association between these 8 SNPs and breast cancer DFS; rs1053739, rs698761, rs10878441, rs10318 and rs8410 showed a significant association with breast cancer DFS (P<0.05) (Table 3 and Supplementary Figure 2).

Association between 8 SNPs and breast cancer prognosis in stage II
In stage II, the median follow-up time was 67 months (range, 0-143 months). Among the 8 SNPs identified in stage I, rs10878441 in the LRRK2 gene (the duplex structure formed between miR-550-3p and LRRK2 is shown in Supplementary Figure 3) was significantly associated … Figure 1). Furthermore, we evaluated the association between rs10878441 and breast cancer OS stratified by clinical characteristics (Supplementary Table 3). The association was significant for grade II breast cancer … Table 3).
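The hazard ratios reported for rs10878441 in the abstract (stage II: HR=2.21, 95% CI: 1.11-4.42, P=0.024; combined: HR=2.19, 95% CI: 1.30-3.70, P=0.003) can be cross-checked against their P values. The sketch below assumes the confidence intervals are Wald intervals on the log scale, as is usual for Cox models, so the standard error of log(HR) can be recovered from the interval width. Because the published bounds are rounded, the recovered P values are approximate.

```python
import math


def p_from_hr_ci(hr: float, lower: float, upper: float, z_crit: float = 1.959964):
    """Approximate two-sided Wald P value from a hazard ratio and its 95% CI.

    Assumes the CI was built on the log scale: log(HR) +/- z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(HR)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Estimates for rs10878441 as reported in the abstract
stage2_z, stage2_p = p_from_hr_ci(2.21, 1.11, 4.42)      # reported P = 0.024
combined_z, combined_p = p_from_hr_ci(2.19, 1.30, 3.70)  # reported P = 0.003
print(f"stage II: z = {stage2_z:.2f}, P = {stage2_p:.3f}")
print(f"combined: z = {combined_z:.2f}, P = {combined_p:.3f}")
```

Both recovered values round to the reported P values (about 0.024 and 0.003), which is consistent with the intervals and P values coming from the same Wald-type estimates.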

DISCUSSION
In this association study, we genotyped 192 SNPs within microRNA binding sites and found that 8 SNPs were associated with the prognosis of breast cancer. We then evaluated these 8 SNPs in an independent dataset and found that the C allele of rs10878441 in the LRRK2 gene was significantly associated with poor prognosis of breast cancer. This study provides some evidence for a novel prognostic locus for breast cancer.
In the present study, two SNPs (MKNK1 rs8602 and GREM1 rs10318) had previously been reported in the context of cancer prognosis. MKNK1 regulates diverse biological processes, including translation, cell proliferation and differentiation [19,20]. Berger et al found that the MKNK1 polymorphism rs8602 might serve as a predictive marker in KRAS wild-type metastatic colorectal cancer patients treated with first-line FOLFIRI and bevacizumab [21]. Neckmann et al showed that GREM1 was associated with metastasis and predicted poor prognosis in ER-negative breast cancer patients [22]. Dai et al indicated that the GREM1 polymorphism rs10318 was associated with recurrence in stage II colorectal cancer patients [23]. In our study, these two SNPs were significantly associated with breast cancer prognosis only in stage I; no significant association was observed in stage II (the validation set).
The LRRK2 gene, located on human chromosome 12q12, is a member of the leucine-rich repeat kinase family and encodes a multidomain protein containing a leucine-rich repeat (LRR) domain, a RAS domain, a GTPase domain, a kinase domain and several protein-protein interaction domains [24]. Mutations in the LRRK2 gene have been shown to be associated with autosomal-dominant Parkinson's disease [25,26], and SNPs in LRRK2 have been related to Crohn's disease [27,28]. LRRK2 is involved in a variety of cellular processes, including cell transformation, proliferation and tumorigenesis, and has been linked to various types of cancer [29,30]. Gu et al demonstrated that high expression of LRRK2 promoted the proliferation and migration of intrahepatic cholangiocarcinoma (ICC) cells and predicted worse prognosis in ICC patients [31]. Looyenga et al indicated that MET and LRRK2 cooperated to promote efficient tumor cell growth and survival in papillary renal and thyroid carcinomas [29]. Warø et al reported that LRRK2 mutation carriers had an increased risk of non-skin cancer [32].
Our findings suggest that the C allele of rs10878441 in LRRK2 is associated with poor prognosis in breast cancer. LRRK2 expression may be regulated in a variety of ways, and the association between rs10878441 and breast cancer prognosis might be caused by differential microRNA regulation. SNP rs10878441 (A/C) is located within the miR-550-3p binding site and is likely to affect the miR-550-3p/LRRK2 interaction. As shown in Supplementary Figure 3, the C allele cannot be targeted by miR-550-3p, which may lead to increased expression of LRRK2 protein and thereby alter the prognosis of breast cancer. Analysis of TCGA expression data (Supplementary Figure 4) showed that the CC genotype was associated with increased LRRK2 expression in 1058 breast cancer patients. However, the precise mechanism underlying the association with breast cancer prognosis remains unknown. Lin et al identified an lncRNA, LINK-A, that mediated HIF1α phosphorylation at Ser797 by LRRK2, resulting in the activation of normoxic HIF1α signaling and promoting glycolysis reprogramming, tumorigenesis and progression in triple-negative breast cancer [33]. Jiang et al revealed that downregulation of LRRK2 inhibited proliferation and migration and promoted apoptosis of thyroid cancer cells by inhibiting activation of the JNK signaling pathway [34].
Although we conducted a large, systematic two-stage cohort study of microRNA target SNPs and breast cancer prognosis, our study has several limitations. First, we selected only common SNPs with MAF ≥ 0.05 and therefore inevitably missed low-frequency SNPs that may affect breast cancer prognosis. Second, the type I error from multiple testing was not corrected in this study, although our design, with a large sample size and a replication set, can help ensure the reproducibility of our findings. Third, because breast cancer patients generally have a good prognosis, the numbers of deaths and progression events were small, and further follow-up will be required to confirm the reliability of the results. In addition, the findings would be more convincing if data on the expression levels of the miRNAs and their target genes in clinical samples were available; further studies are warranted to evaluate the significance of SNPs in miRNA binding sites in breast cancer biology.
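To illustrate what correcting for multiple testing would involve, the sketch below computes the Bonferroni per-test threshold for the 192 SNPs genotyped in stage I and implements the Benjamini-Hochberg false discovery rate (FDR) procedure. This is not an analysis reported in the study; the P values in the example are made up purely for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(pvalues: list[float], q: float = 0.05) -> list[bool]:
    """Return a reject flag per test, controlling the false discovery rate at q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    # Find the largest rank k such that P_(k) <= (k / m) * q
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# 192 SNPs were tested in stage I
print(bonferroni_threshold(0.05, 192))  # about 2.6e-4

# Made-up P values, for illustration only
example_p = [0.0001, 0.004, 0.02, 0.03, 0.5]
print(benjamini_hochberg(example_p))  # [True, True, True, True, False]
```

Bonferroni is the more conservative choice; the FDR procedure is often preferred in screening stages because it tolerates a controlled fraction of false positives that a replication stage can then filter out.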
In conclusion, the LRRK2 rs10878441 CC genotype is associated with poor prognosis of breast cancer in a Chinese population, suggesting that it could be a potential prognostic biomarker for breast cancer. Further studies to elucidate the underlying mechanism of this association are warranted.

Study subjects
We … [35]. Demographic and epidemiological data were obtained through face-to-face questionnaires administered by trained personnel. Clinical data and pathology reports were taken from medical records. All patients were followed up annually by telephone. In addition, we confirmed the accuracy of self-reported information through the Hospital Information System (HIS) at TJMUCH and the death registration system. The study was approved by the Ethics Committee of Tianjin Medical University Cancer Institute and Hospital, and all patients who participated in the study provided written informed consent.

SNP selection
The "Patrocles" database (http://www.patrocles.org/) was used to select genome-wide microRNA target SNPs. Of the 5035 SNPs within microRNA binding sites provided by the database, 1742 had been confirmed. SNPs were included if they met the following criteria: (1) the SNP was located in a binding site for the microRNA seed region, with the seed region defined according to the "7-mirs" criteria [36]. (2)
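The selection step can be sketched as a simple filter. The sketch assumes, beyond what the text states, that each candidate SNP carries a flag for criterion (1) and genotype counts from which the minor allele frequency (MAF) is computed; the MAF ≥ 0.05 threshold is the one mentioned in the Discussion. Criterion (2) is omitted, and the class and function names are hypothetical. In practice the MAF used for selection might come from a reference population rather than from the study's own genotypes.

```python
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Genotype counts for a biallelic SNP (hypothetical structure)."""
    hom_major: int  # e.g. AA
    het: int        # e.g. AC
    hom_minor: int  # e.g. CC


def minor_allele_frequency(c: GenotypeCounts) -> float:
    """MAF = copies of the less common allele / total allele copies (2 per person)."""
    n_alleles = 2 * (c.hom_major + c.het + c.hom_minor)
    freq = (2 * c.hom_minor + c.het) / n_alleles
    return min(freq, 1.0 - freq)  # guard against mislabelled major/minor alleles


def passes_selection(in_seed_site: bool, maf: float, maf_min: float = 0.05) -> bool:
    """Keep SNPs in a seed-region binding site whose MAF meets the threshold."""
    return in_seed_site and maf >= maf_min


common = GenotypeCounts(hom_major=90, het=9, hom_minor=1)  # MAF = 11/200 = 0.055
rare = GenotypeCounts(hom_major=95, het=5, hom_minor=0)    # MAF = 5/200 = 0.025
print(passes_selection(True, minor_allele_frequency(common)))  # True
print(passes_selection(True, minor_allele_frequency(rare)))    # False
```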

SNP genotyping
We collected 10 ml of EDTA-anticoagulated venous blood, separated the plasma and the white blood cell layer, and stored the white blood cells in cryotubes at -80° C for DNA extraction. Genomic DNA was extracted using the QIAGEN DNA Extraction Kit (QIAGEN Inc.) [37]. The Illumina GoldenGate SNP Genotyping Arrays were used to genotype 192 SNPs in stage I. The TaqMan platform was used to genotype the 8 SNPs associated with breast cancer prognosis in stage II. We used a 5-μl reaction mixture containing 20 ng of genomic DNA, 2.5 μl of 2×TaqMan Genotyping Master Mix, 0.1 μl of 40× probe and 1.9 μl of double-distilled water. The PCR conditions were 95° C for 10 minutes, followed by 50 cycles of 92° C for 30 seconds and 60° C for 1 minute. Amplification was performed in 384-well reaction plates, and genotypes were called using SDS 2.4 software (Applied Biosystems, Foster City, CA, USA). To ensure the accuracy and reliability of the results, approximately 5% of the samples were randomly selected for repeat genotyping.
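The TaqMan reaction composition above can be checked with the dilution relation C1·V1 = C2·V2. The sketch computes the final working strength of the 2× master mix and the 40× probe in the 5-μl reaction, and the volume left over after the listed liquids. Treating that remaining volume as the DNA template is an assumption: the text specifies the DNA only as a mass (20 ng), and template DNA is sometimes dried into the wells instead.

```python
TOTAL_UL = 5.0  # total reaction volume per well (µl)

# component: (stock strength in "x", volume added in µl), as listed in the protocol
COMPONENTS = {
    "2x TaqMan Genotyping Master Mix": (2.0, 2.5),
    "40x probe": (40.0, 0.1),
}
WATER_UL = 1.9


def final_strength(stock_x: float, volume_ul: float, total_ul: float = TOTAL_UL) -> float:
    """C1 * V1 = C2 * V2, with concentrations expressed as fold-strength (x)."""
    return stock_x * volume_ul / total_ul


for name, (stock_x, vol) in COMPONENTS.items():
    print(f"{name}: {final_strength(stock_x, vol):.2f}x final")

# Volume not accounted for by the listed liquids (ASSUMED to hold the DNA template)
listed_ul = sum(vol for _, vol in COMPONENTS.values()) + WATER_UL
dna_ul = TOTAL_UL - listed_ul
print(f"remaining volume: {dna_ul:.1f} µl")
```

With the listed volumes, the master mix ends up at 1× and the probe at 0.8×, and 0.5 μl of the 5 μl is not assigned to any listed component.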

Statistical analysis
Patients' demographic, epidemiological and clinicopathological characteristics are presented as n (%). The Kaplan-Meier method was used to estimate survival, and the log-rank test was used to compare survival between genotypes of each SNP. To identify potential prognostic factors, univariate Cox regression was used to evaluate the relationship between demographic, epidemiological and clinicopathological characteristics and breast cancer prognosis, with results presented as hazard ratios (HRs) and 95% confidence intervals (CIs). Cox regression was used to assess the association between SNPs and breast cancer OS, with and without adjustment for age at diagnosis, education, occupation, age at menarche, number of live births, breastfeeding duration, abortion, menopause, TNM stage, tumor size, histopathologic classification, grade, lymph node status, estrogen receptor (ER), progesterone receptor (PR) and HER2. Similarly, Cox regression was used to assess the relationship between SNPs and breast cancer DFS, with and without adjustment for age at diagnosis, number of live births, breastfeeding duration, abortion, menopause, benign breast disease (BBD), TNM stage, tumor size, histopathologic classification, grade, lymph node status, ER, PR and HER2. We further analyzed the relationship between rs10878441 and breast cancer OS stratified by clinical characteristics. All statistical tests were two-sided, and P<0.05 was considered statistically significant. All statistical analyses were performed using SPSS 20.0 (SPSS Inc., Chicago, IL, USA) and R version 3.4.3.
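For readers less familiar with these survival methods, the sketch below implements the Kaplan-Meier product-limit estimator and the two-group log-rank test from scratch on a small hypothetical dataset. It is not the authors' code (the analyses were run in SPSS 20.0 and R 3.4.3); in R, the survival package's `survfit`, `survdiff` and `coxph` functions provide these estimators and Cox regression. Tied event times are handled with the usual hypergeometric variance; covariate-adjusted Cox models are not shown.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time.

    events: 1 = event (death or progression), 0 = censored.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # still under follow-up just before t
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, P value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n_a = sum(1 for u, a in zip(times, in_a) if u >= t and a)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d_a = sum(1 for u, e, a in zip(times, events, in_a) if u == t and e and a)
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Toy follow-up data in months (hypothetical, not study data)
aa_times, aa_events = [5, 8, 12, 20, 25], [1, 0, 1, 0, 0]
cc_times, cc_events = [2, 3, 6, 9, 15], [1, 1, 1, 1, 0]
print(kaplan_meier(cc_times, cc_events))
print(logrank(aa_times, aa_events, cc_times, cc_events))
```

With only ten toy observations the log-rank test is, unsurprisingly, not significant; the point is to show how each event time contributes observed-minus-expected counts and variance to the statistic.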

AUTHOR CONTRIBUTIONS
LWZ and LH developed the ideas and drafted the manuscript. YBH, ZWF, LYL, JXL and XW were responsible for data processing and statistical analysis. HXL, FFS, HZ and PSW supervised the study procedures and revised the manuscript. FJS and KXC were also involved in data analysis and interpretation, as well as manuscript preparation. All authors read and approved the final manuscript.