Genetic variants in lncRNA SRA and risk of breast cancer

Long non-coding RNA (lncRNA) steroid receptor RNA activator (SRA) has been shown to activate steroid receptor transcriptional activity and to participate in tumor pathogenesis. This case-control study evaluated the association between two haplotype-tagging SNPs (htSNPs) (rs10463297, rs801460) covering the whole SRA sequence and breast cancer (BC) risk. We found that the rs10463297 TC genotype significantly increased BC risk compared with the TT genotype in the codominant model (TC vs. TT: OR=1.43, 95 % CI=1.02–2.00), as did the TC+CC genotypes in the dominant model (TC+CC vs. TT: OR=1.39, 95 % CI=1.01–1.92). The TC and TC+CC genotypes of rs10463297 and the GA, AA and GA+AA genotypes of rs801460 were significantly associated with estrogen receptor (ER) positivity. The rs10463297 TC (2.09 ± 0.41), CC (2.42 ± 0.51) and TC+CC (2.20 ± 0.47) genotypes were associated with higher plasma SRA mRNA levels than the TT genotype (1.45 ± 0.34). Gene–reproductive interaction analysis identified a best model consisting of four factors (rs10463297, age, post-menopausal status, number of pregnancies), which was associated with a 1.58-fold increase in BC risk (OR=1.58, 95 % CI=1.23–2.03). These findings suggest that SRA genetic variants may contribute to BC risk and show apparent interactions with reproductive factors in BC progression.


INTRODUCTION
Breast cancer (BC) is the most frequently diagnosed malignant tumor and the leading cause of cancer death among females [1,2]. A large number of reproductive factors have been reported to be associated with BC, including early menarche, late menopause, no history of breast-feeding, nulliparity, abortion and a family history of BC [3]. Moreover, a series of susceptibility genes have been implicated in breast cancer risk, and associations between single nucleotide polymorphisms (SNPs) and the risk of BC have been reported [4,5]. It is generally considered that genetic susceptibility, reproductive factors and gene-reproductive factor interactions all contribute to the development of BC.
Up to 98% of the transcriptional output of the human genome may represent RNAs that do not code for protein [6]. These 'non-coding RNAs' (ncRNAs) were previously believed to be transcriptional noise, but accumulating evidence now suggests that they play important roles in cell proliferation, differentiation, apoptosis, metabolism and immunity [7]. ncRNAs are classified by length into small ncRNAs and long ncRNAs (lncRNAs). Small ncRNAs are processed from longer precursors [8]. Over the past few years, a wealth of studies has highlighted the importance of small ncRNAs, especially microRNAs (miRNAs), in the development of cancers, and their variants have been associated with the risk of various cancers [9][10][11]. By contrast, lncRNAs are eukaryotic RNAs longer than 200 nucleotides that lack an open reading frame, have no protein-coding capacity, and function without major prior processing [12]. Recent studies have indicated that lncRNAs may play regulatory and structural roles through diverse molecular mechanisms in important biological processes [13]. LncRNAs contribute to carcinogenesis and are involved in controlling cell cycle progression, apoptosis, invasion, and migration. Several studies have highlighted the importance of lncRNAs and their genetic variants in the development of cancers. For example, H19 is an estrogen-inducible gene that plays a key role in cell survival and may serve as a biomarker for breast cancer diagnosis and progression [14], and a significantly decreased risk of bladder cancer was found for H19 rs2839698 TC carriers [15]. Carriers of the rs11752942 AG+GG genotypes in the lincRNA-uc003opf.1 exon had a significantly reduced risk of esophageal squamous cell carcinoma (ESCC); the rs11752942 G allele could markedly attenuate the level of lincRNA-uc003opf.1 and affect cell proliferation and tumor growth [16]. HOTAIR has been widely shown to participate in tumor pathogenesis, acting as a promoter of colorectal carcinogenesis, and the rs7958904 CC genotype decreased the risk of colorectal cancer compared with the GG genotype [17]. Li et al. found that the C to T base change at rs12325489 could disrupt the binding site for miRNA-370, influencing lincRNA-ENST00000515084 transcriptional activity and affecting breast cancer cell proliferation and tumor growth [18].
Another lncRNA that may play an important role in breast cancer is the steroid receptor RNA activator (SRA). SRA, located on chromosome 5q31.3 and containing five exons and four introns, was initially characterized as a member of the growing family of functional non-coding RNAs that specifically activates steroid receptor transcriptional activity [19]. The level of SRA is increased in breast tumors, and SRA expression correlates with estrogen receptor (ER) and progesterone receptor (PR) levels, which may alter ER/PR action and promote tumorigenesis [20]. However, to date, no study has evaluated the association between SRA polymorphisms and the risk of BC. On the basis of the above, we hypothesized that functional SNPs in SRA might be associated with BC risk. Tagging SNPs of SRA were selected with Haploview version 4.2. Four SNPs (rs10463297, rs801460, rs250425 and rs250426) were representative and could capture all the other common SNPs with a tagging threshold of r² > 0.80. However, rs250425 was not located in the SRA region, and the refSNP alleles of rs250426 were A/G/T (FWD) according to the NCBI dbSNP database, and we could not find a restriction enzyme to cut the PCR amplification products for accurate genotyping. According to the HapMap data for the Han Chinese population in Beijing, the T and C allele frequencies of SRA rs10463297 were 0.467 and 0.533, respectively, and the C and T allele frequencies of SRA rs801460 were 0.412 and 0.588, respectively. We therefore selected these two SNPs (rs10463297 and rs801460) for our study, using the criterion of a minor allele frequency (MAF) ≥0.1 in the Chinese Han population. We genotyped the two SRA haplotype-tagging SNPs in a population-based case-control study comprising 489 BC patients and 495 age-frequency-matched controls from China, and investigated the association between the SRA SNPs and breast cancer risk by molecular epidemiology.

Characteristics of the study population
The baseline characteristics of the 489 BC cases and 495 cancer-free controls are shown in Table 1. The mean age was 48.45±10.13 and 49.14±10.06 years for BC cases and healthy controls, respectively. As expected, the mean ages of the two groups were well matched. There were no significant differences between the case and control groups with respect to the other baseline characteristics, including age at menarche and menopause, menstrual history, number of abortions, breast-feeding and family history.

Associations between SRA genotypes and the risk of BC
The genotype and allele distributions of two SNPs (rs10463297 and rs801460) in cases and controls are shown in Table 2. The observed genotype frequencies for the two SNPs agreed with the expected ones from the Hardy-Weinberg equilibrium in the 495 cancer-free controls, respectively (P = 0.14 for rs10463297, P = 0.06 for rs801460

Functional relevance of rs10463297 genotypes on SRA mRNA expression
We further randomly selected 82 cancer-free controls and investigated the correlation between rs10463297 genotypes and the SRA mRNA expression level in blood plasma. Among the 82 cancer-free controls, 17 had the TT genotype, 42 had the TC genotype, and 23 had the CC genotype of rs10463297. As shown in Figure 1, SRA mRNA expression levels were significantly higher for the TC (2.09 ± 0.41), CC (2.42 ± 0.51) and TC+CC genotypes (2.20 ± 0.47) than for the TT genotype (1.45 ± 0.34) (P = 0.002, 0.001 and 0.002, respectively). A significant trend towards increased SRA mRNA expression was found with the C allele (P trend = 0.001).

Haplotype analyses and combined effect of two SNPs
Haplotype analysis was performed to evaluate the combined effect of the two polymorphisms on the risk of BC. A total of four haplotypes were derived from the observed genotypes (Table 3), of which C(rs10463297)-A(rs801460) was the most common haplotype in both cases and controls. No significant association with BC risk was observed for any of the four haplotypes. We further assessed the joint effect and potential locus-locus interaction on BC risk by categorizing subjects according to the number of combined variant alleles of the two SNPs (rs10463297 and rs801460). Compared with individuals carrying 0-1 variant alleles, no statistically significant increase in BC risk was observed in any subgroup, and no dose-dependent increase was observed for the combined effect of the two SNPs (Table 4).
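The categorization by number of combined variant alleles can be illustrated with a short sketch. It assumes that C is the variant allele of rs10463297 and A the variant allele of rs801460 (consistent with TT and GG being described as wild-type in the genotyping section). The function names and the example subjects are hypothetical, and only the 0-1 reference group is taken from the text; the labels of the other groups are illustrative.

```python
# Variant alleles assumed from the genotyping description (TT and GG are wild-type).
VARIANT_ALLELE = {"rs10463297": "C", "rs801460": "A"}


def combined_variant_alleles(genotypes):
    """Count variant alleles across both SNPs for one subject.

    genotypes: dict mapping SNP id -> two-letter genotype, e.g. {"rs10463297": "TC"}.
    """
    return sum(genotypes[snp].count(allele) for snp, allele in VARIANT_ALLELE.items())


def allele_group(n_variant):
    """Reference group is 0-1 variant alleles, as in the text; other labels are illustrative."""
    return "0-1" if n_variant <= 1 else str(n_variant)


# Hypothetical subjects, for illustration only.
subjects = [
    {"rs10463297": "TT", "rs801460": "GG"},
    {"rs10463297": "TC", "rs801460": "GA"},
    {"rs10463297": "CC", "rs801460": "AA"},
]
for subject in subjects:
    n = combined_variant_alleles(subject)
    print(subject, n, allele_group(n))
```

Each group would then be compared with the 0-1 reference group in the logistic regression model.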

Stratified analysis of SNP genotypes and BC risk
A stratified analysis was conducted to assess the associations between the SRA SNP genotypes and the risk of breast cancer. As indicated in Table 5, the increased risk of breast cancer associated with the rs10463297 variant allele was significant among women older than 50 years (P=0.03, adjusted OR=1.79, 95% CI=1.05-3.05). No significant association with SRA polymorphisms was observed in the other subgroups.
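As a consistency check, the P value of a stratum can be approximately recovered from its odds ratio and 95% CI. The sketch below assumes a Wald-type interval on the log-odds scale (an assumption, since the text does not state how the interval was constructed); the function name is hypothetical.

```python
import math


def wald_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover SE, z and the two-sided Wald p-value from an OR and its 95% CI.

    Assumes the CI is symmetric on the log scale: ln(OR) +/- z_crit * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)  # SE of ln(OR)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Values reported for women older than 50 years.
se, z, p = wald_p_from_ci(1.79, 1.05, 3.05)
print(f"SE={se:.3f}, z={z:.2f}, p={p:.3f}")
```

Under this assumption the recovered P value is approximately 0.03, in line with the reported value.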

Receptor status and BC risk
We further examined the association of the rs10463297 and rs801460 genotypes with clinicopathological features, including ER status, PR status and HER-2 status (Table 6). Among the 489 cases

Gene-reproductive factors interaction analysis
MDR analysis was performed to analyze the gene-reproductive factor interactions among the two SNPs (rs10463297 and rs801460), age, age at menarche and menopause, menopausal status, number of pregnancies and abortions, breast-feeding and family history of BC in first-degree relatives (Table 7). The best model consisted of four factors (rs10463297, age, post-menopausal status, number of pregnancies), with a testing balanced accuracy (TBA) of 0.56 and a cross-validation consistency (CVC) of 3/10. In this model, the "high-risk group" had a 1.58-fold higher BC risk than the "low-risk group" (P<0.001, OR=1.58, 95 % CI=1.23-2.03).
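The core step of MDR, pooling multi-factor combinations into "high-risk" and "low-risk" groups, can be sketched as follows. This is a minimal illustration and not the MDR software used in the study: it omits the cross-validation that yields TBA and CVC, the function names are hypothetical, and the toy records are invented for demonstration.

```python
from collections import Counter


def mdr_label_cells(records, factors):
    """Label each multi-factor cell as high risk (True) or low risk (False).

    records: list of dicts holding the factor values and a 0/1 'case' field.
    A cell is high risk when its case:control ratio exceeds the overall ratio.
    """
    cases, controls = Counter(), Counter()
    for r in records:
        cell = tuple(r[f] for f in factors)
        (cases if r["case"] else controls)[cell] += 1
    threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        # Cells that contain no controls are treated as high risk.
        ratio = cases[cell] / controls[cell] if controls[cell] else float("inf")
        labels[cell] = ratio > threshold
    return labels


def balanced_accuracy(records, factors, labels):
    """Mean of sensitivity and specificity of the high/low-risk classification."""
    tp = fn = tn = fp = 0
    for r in records:
        high = labels.get(tuple(r[f] for f in factors), False)
        if r["case"]:
            tp, fn = tp + high, fn + (not high)
        else:
            fp, tn = fp + high, tn + (not high)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Invented toy data: 8 cases and 8 controls in two genotype/menopause cells.
records = (
    [{"genotype": "TC", "postmenopausal": 1, "case": 1}] * 6
    + [{"genotype": "TC", "postmenopausal": 1, "case": 0}] * 2
    + [{"genotype": "TT", "postmenopausal": 0, "case": 1}] * 2
    + [{"genotype": "TT", "postmenopausal": 0, "case": 0}] * 6
)
factors = ["genotype", "postmenopausal"]
labels = mdr_label_cells(records, factors)
print(labels, balanced_accuracy(records, factors, labels))
```

In a full MDR analysis, the labeling is fitted on training folds and the balanced accuracy is evaluated on the held-out fold, which is what the reported TBA refers to.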

FPRP values for all significant associations
Moreover, for all the significant associations observed above, we calculated the false positive report probability (FPRP) to assess whether they were likely to be false positives. As shown in Table 8, with a prior probability of 0.25, all of the significant associations were noteworthy (FPRP <0.5). With a prior probability of 0.10, the associations of rs10463297 TC with BC and ER (FPRP=0.351 and 0.196, respectively), rs10463297 TC+CC with BC and ER (FPRP=0.378 and 0.194, respectively), and rs801460 GA, AA and GA+AA with ER (FPRP=0.341, 0.456 and 0.242, respectively) remained noteworthy.
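The FPRP calculation can be sketched from the usual formula, FPRP = α(1−π) / (α(1−π) + (1−β)π), where α is the observed P value, π the prior probability and 1−β the power to detect an OR of 1.5. The sketch below makes two assumptions that are not stated in the text: the standard error is recovered from the reported 95% CI, and power is approximated with a normal (Wald) test at the observed significance level. Its output may therefore differ slightly from Table 8.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def fprp(odds_ratio, ci_low, ci_high, prior, alt_or=1.5):
    """False positive report probability for one reported association."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)  # SE of ln(OR)
    z_obs = abs(math.log(odds_ratio)) / se
    alpha = 2 * (1 - norm_cdf(z_obs))  # observed two-sided p-value
    shift = math.log(alt_or) / se      # true effect in SE units under OR = alt_or
    power = norm_cdf(shift - z_obs) + norm_cdf(-shift - z_obs)
    return alpha * (1 - prior) / (alpha * (1 - prior) + power * prior)


# rs10463297 TC vs. TT with BC risk (OR=1.43, 95% CI 1.02-2.00).
for prior in (0.25, 0.1, 0.01, 0.001):
    print(prior, round(fprp(1.43, 1.02, 2.00, prior), 3))
```

For this association the sketch gives an FPRP below the 0.5 threshold at priors of 0.25 and 0.10, consistent with the conclusion drawn from Table 8.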

DISCUSSION
Single nucleotide polymorphisms (SNPs) have been shown to have profound effects on gene expression and function, and to participate in carcinogenesis. Recently, studies on the effects of SNPs have extended to functional lncRNAs, and SNPs in several lncRNAs have been reported to be associated with cancer risk. In this population-based case-control study in a Chinese population, we selected htSNPs in the lncRNA SRA region and assessed the association between these genetic variants and breast cancer susceptibility. Our results showed rs10463297
The results of molecular epidemiology studies are often accompanied by a high probability of false positives [21][22][23]. The false positive report probability (FPRP), which assesses whether a reported association between a genetic variant and a disease is likely to be true, depends not only on the observed P value but also on the prior probability and the statistical power of the test [24]. We therefore calculated the FPRP for all significant genetic effects observed in our study. The FPRP results indicated that our findings were unlikely to be false positives, which implies that the functional SNPs in SRA are likely to be involved in breast cancer development.
The SRA RNA is a non-coding RNA that is strongly associated with breast cancer and participates in nuclear coactivation in several hormone-related systems [25], including the estrogen receptor [19,26,27], androgen receptor [28], progesterone receptor [19,29] and thyroid hormone receptor [30]. A study by Leygue et al. reported that SRA expression could correlate positively or negatively with ER and PR levels, depending on the subgroup considered [20]. In that study, SRA expression was similar in ER-/PR- and ER+/PR+ tumors, and SRA expression in these two subgroups was significantly lower than that observed in ER-/PR+ and ER+/PR- tumors. In our study, we further estimated the association between SRA polymorphisms and ER, PR and HER-2 status in BC patients, to clarify the role of SRA polymorphisms in the pathologic state of BC. No significant association was observed between PR or HER-2 status and the genetic variants. However, the rs10463297 TC and TC+CC genotypes and the rs801460 GA, AA and GA+AA genotypes were significantly associated with ER positivity, which is a novel finding and suggests that SRA polymorphisms might have potential effects on the estrogen receptor in breast cancer development.
In the current study, the SRA rs10463297 TC and TC+CC genotypes were associated with increased BC risk in the Chinese population. Furthermore, in cancer-free controls, variant genotypes of rs10463297 were associated with increased plasma mRNA expression levels of SRA, suggesting that the SRA polymorphism may have a functional impact on mRNA levels, thus supporting a role in susceptibility to BC. BC is a complex disease likely resulting from multiple interacting genetic polymorphisms and gene-reproductive factor interactions [31][32][33]. In this study, the gene-reproductive factor interaction on breast cancer susceptibility was examined using the MDR method. A nominally significant interaction was found for rs10463297, age, post-menopausal status and number of pregnancies. One of the advantages of the MDR method is that false-positive results due to multiple testing are minimized [34]. Thus, we cautiously suggest that an interaction of age, post-menopausal status and number of pregnancies with the SRA polymorphism rs10463297 may contribute to the risk of BC in a central Chinese population.
To our knowledge, this is the first study to examine the role of SRA genetic polymorphisms in BC carcinogenesis and to focus on gene-reproductive factor interactions in BC risk in a population of Chinese women. Several strengths of this study should be noted. First, our controls were selected from participants in a large community-based sampling survey rather than from hospitals, which diminished the effect of selection bias. Second, a well-defined cohort of newly pathologically diagnosed cases avoided prevalence-incidence bias. Third, the controls and the cases were matched on age, and the distributions of baseline characteristics in the control group were similar to those in the case group. Therefore, we believe that selection bias was not substantial and was unlikely to influence the analyses of our study.
Furthermore, we calculated the FPRP for all significant genetic effects observed in our study, and the FPRP results indicate that our findings are unlikely to be false positives. However, several limitations exist in the present study. The sample size was not large, and the statistical power of the study may be limited. Therefore, it will be worthwhile to validate these findings in larger studies with other ethnic populations, and to clarify the genetic mechanisms of SRA in the etiology of BC.
In summary, our results show for the first time that the SNP rs10463297, located in the SRA gene, was significantly associated with increased risk of BC. The SRA rs10463297 polymorphism might be a helpful genetic marker for predicting BC predisposition. Larger prospective studies are needed to validate our findings, and further investigations are required to understand the exact mechanisms of the SRA rs10463297 polymorphism in BC cells.

Subjects
All subjects participating in this study were genetically unrelated ethnic Chinese women. A total of 489 newly diagnosed patients with pathologically confirmed incident primary BC were recruited from the First Affiliated Hospital of Zhengzhou University and the Third Affiliated Hospital of Zhengzhou University between 2014 and 2015. During the same period, 495 healthy controls were randomly recruited from a pool of >20000 subjects who participated in a community-based chronic disease program in Henan province. All the controls were

DNA extraction
For each participant, venous blood (5 ml) was collected into a test tube containing ethylenediaminetetraacetic acid (EDTA). Genomic DNA was extracted from the peripheral blood samples of all participants using the DNA Extraction Kit from TIANGEN BIOTECH (Beijing) according to the manufacturer's instructions. The extracted DNA was stored at -80°C until use.

SNP genotyping
The genotyping of rs10463297 was determined by polymerase chain reaction-restriction fragment-length polymorphism (PCR-RFLP), while SRA rs801460 was genotyped with created restriction site PCR (CRS-RFLP) assays.
The primers used for PCR amplification were designed with Primer 6.0 software (Table 9). The PCR primers were further verified with NCBI BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi/) to assess the possibility of amplification of any non-specific DNA sequences, and were synthesized commercially. For each sample, PCR amplification was performed in a final volume of 30 μl, which contained 15 μl 2×Taq PCR MasterMix, 0.5 μl each primer (10 μM), 50 ng DNA, and 13 μl deionized water. The thermocycling conditions were as follows: initial denaturation at 95 °C for 5 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at the optimal temperature (Table 9) for 45 s and extension at 72 °C for 45 s; and a final extension step at 72 °C for 5 min.
The restriction enzymes AvaII and NsiI (Fermentas, Canada) were used for genotyping of rs10463297 and rs801460, respectively. The digestion products were separated by 3% agarose gel electrophoresis with ethidium bromide. The wild-type rs10463297 TT genotype produced one 483 bp fragment; the TC genotype (heterozygote) produced 483, 317 and 166 bp fragments; and the CC genotype (variant homozygote) produced 317 and 166 bp fragments. The wild-type rs801460 GG genotype produced one 294 bp fragment; the GA genotype produced 294, 271 and 23 bp fragments; and the AA genotype produced 271 and 23 bp fragments. For quality control, all analyses were performed without knowledge of case or control status. Ten percent of the study population was randomly selected for confirmation of the genotyping results by different persons. In addition, a 10% random sample was also examined by direct sequencing (BGI Sequencing, Beijing). The confirmation results were 100% concordant.
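The digestion patterns above amount to a lookup from observed fragment sizes to genotype, which can be sketched as follows. The fragment sizes are taken from the text; the function name and the handling of unrecognized patterns are illustrative assumptions.

```python
# Expected fragment sizes (bp) per genotype, taken from the digestion patterns in the text.
PATTERNS = {
    "rs10463297": {
        frozenset({483}): "TT",
        frozenset({483, 317, 166}): "TC",
        frozenset({317, 166}): "CC",
    },
    "rs801460": {
        frozenset({294}): "GG",
        frozenset({294, 271, 23}): "GA",
        frozenset({271, 23}): "AA",
    },
}


def call_genotype(snp, fragments):
    """Return the genotype for a set of observed fragment sizes, or None if unrecognized."""
    return PATTERNS[snp].get(frozenset(fragments))


# Sanity check: the digestion products sum to the length of the uncut amplicon.
assert 317 + 166 == 483 and 271 + 23 == 294

print(call_genotype("rs10463297", [483, 317, 166]))
print(call_genotype("rs801460", [271, 23]))
```

Returning None for an unexpected band pattern makes it easy to flag samples for repeat digestion or sequencing.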

Real-time reverse transcription PCR analysis of SRA mRNA expression levels in plasma
To explore the effects of different rs10463297 genotypes on SRA mRNA expression, the relative levels of SRA mRNA were examined using SYBR Green real-time quantitative PCR in 82 samples obtained from cancer-free controls whose genotypic data were anonymous. Total RNA was isolated from blood plasma samples using TRIzol LS Reagent (Ambion). cDNA was then synthesized with PrimeScript RT Reagent (Takara, Japan). The SRA primers used for quantitative real-time PCR were as follows: forward primer 5′-CAAGCGGAAGTGGAGATGGCGGAGC-3′ and reverse primer 5′-GCGAAGTGTGTAGGGAGCGGAGGCG-3′. For β-actin, used as an internal reference gene, the primers were 5′-AGAAAATCTGGCACCACACC-3′ and 5′-TAGCACAGCCTGGATAGCAA-3′ [35]. Amplification reactions were performed in a final volume of 20 μl containing 10.0 μl Master mix, 150 ng cDNA and 1 μl primers. The real-time PCR conditions were 95°C for 30 s, followed by 40 cycles of 95°C for 5 s and 60°C for 30 s. All reactions were performed in triplicate. SRA mRNA expression in each sample was calculated relative to the expression of β-actin using the 2^(-ΔCT) method.
Table 9 (partial). Sense: TTTTTAGTAGAGACAGGGTTTTGCC; Antisense: ACTCTACGCCAGACAATATGCTATG. a Minor allele frequency, based on the Chinese Han population data of the international HapMap project.
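The relative quantification used for the expression analysis, 2^(-ΔCT) with ΔCT = Ct(SRA) − Ct(β-actin), can be sketched as below. Averaging the triplicate Ct values before taking the difference is an assumption, and the Ct values shown are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Relative expression by 2^(-dCT), with dCT = mean Ct(target) - mean Ct(reference)."""
    delta_ct = mean(ct_target) - mean(ct_reference)
    return 2 ** (-delta_ct)


# Hypothetical triplicate Ct values for one plasma sample.
sra_ct = [24.1, 24.0, 23.9]
actin_ct = [25.0, 25.1, 24.9]
print(round(relative_expression(sra_ct, actin_ct), 2))
```

A target that crosses the threshold one cycle earlier than the reference gene corresponds to a relative expression of about 2.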

Statistical analysis
The sample size of our case-control study was estimated with PASS 11.0 software, and the sample size for gene-environment interaction was calculated with Quanto software under a dominant inheritance model (http://biostats.usc.edu/cgi-bin/DownloadQuanto.pl). Hardy-Weinberg equilibrium (HWE) was tested using a goodness-of-fit χ² test to compare the observed genotype frequencies with the expected ones among the cancer-free control subjects. Differences in the distributions of age, reproductive variables and SNP genotype frequencies between BC cases and controls were assessed using Student's t test (for continuous variables) and the chi-squared (χ²) test (for categorical variables). Unconditional logistic regression models were used to evaluate the association between case-control status and each SNP by the odds ratio (OR) and its corresponding 95% confidence interval (95% CI), with adjustment for age, age at menarche, menopausal status, number of pregnancies, number of abortions, breast-feeding history and family history of BC in first-degree relatives. Furthermore, the data were stratified by age and reproductive factors to evaluate the stratum-specific ORs for the SRA SNPs. The Multifactor Dimensionality Reduction (MDR) method was used to assess potential gene-reproductive factor interactions. Haplotype analysis was conducted using the online SHEsis platform (http://analysis.bio-x.cn/myAnalysis.php). For all significant genetic effects observed in our study, the false positive report probability (FPRP) was calculated with prior probabilities of 0.001, 0.01, 0.1, and 0.25. The OR was set at 1.5 under a dominant genetic model, and an FPRP < 0.5 was considered noteworthy. Statistical analysis was performed using the SPSS 16.0 software package (SPSS Inc., Chicago, IL, USA) and the SAS 9.2 software package. A two-sided P value less than 0.05 was considered statistically significant.
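The Hardy-Weinberg goodness-of-fit test can be illustrated with a minimal sketch. Because the genotype counts of the full control group are given in Table 2 and not in the text, the example uses the rs10463297 counts reported for the 82 randomly selected controls (17 TT, 42 TC, 23 CC); the function name is hypothetical, and the test uses one degree of freedom.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function for 1 df
    return chi2, p_value


# rs10463297 genotype counts in the 82-control subsample: TT, TC, CC.
chi2, p_value = hwe_chi2(17, 42, 23)
print(f"chi2={chi2:.3f}, p={p_value:.3f}")
```

A P value above 0.05 indicates no significant departure from equilibrium in this subsample.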