Novel Insights into Breast Cancer Genetic Variance through RNA Sequencing

Horvath, Anelia; Pakala, Suresh Babu; Mudvari, Prakriti; Reddy, Sirigiri Divijendra Natha; Ohshiro, Kazufumi; Casimiro, Sandra; Pires, Ricardo; Fuqua, Suzanne A. W.; Toi, Masakazu; Costa, Luis; Nair, Sujit S.; Sukumar, Saraswati; Kumar, Rakesh

doi:10.1038/srep02256

Download PDF

Article
Open access
Published: 25 July 2013

Novel Insights into Breast Cancer Genetic Variance through RNA Sequencing

Anelia Horvath^1,2^na1,
Suresh Babu Pakala²^na1,
Prakriti Mudvari¹^na1,
Sirigiri Divijendra Natha Reddy²^na1,
Kazufumi Ohshiro²^na1,
Sandra Casimiro³^na1,
Ricardo Pires³^na1,
Suzanne A. W. Fuqua⁴^na1,
Masakazu Toi⁵^na1,
Luis Costa³^na1,
Sujit S. Nair^1,2^na1,
Saraswati Sukumar⁶^na1 &
…
Rakesh Kumar^1,2,7^na1

Scientific Reports volume 3, Article number: 2256 (2013) Cite this article

8092 Accesses
42 Citations
9 Altmetric
Metrics details

Subjects

Abstract

Using RNA sequencing of triple-negative breast cancer (TNBC), non-TBNC and HER2-positive breast cancer sub-types, here we report novel expressed variants, allelic prevalence and abundance and coexpression with other variation and splicing signatures. To reveal the most prevalent variant alleles, we overlaid our findings with cancer- and population-based datasets and validated a subset of novel variants of cancer-related genes: ESRP2, GBP1, TPP1, MAD2L1BP, GLUD2 and SLC30A8. As a proof-of-principle, we demonstrated that a rare substitution in the splicing coordinator ESRP2 (R353Q) impairs its ability to bind to its substrate FGFR2 pre-mRNA. In addition, we describe novel SNPs and INDELs in cancer relevant genes with no prior reported association of point mutations with cancer, such as MTAP and MAGED1. For the first time, this study illustrates the power of RNA-sequencing in revealing the variation landscape of breast transcriptome and exemplifies analytical strategies to search regulatory interactions among cancer relevant molecules.

Integrating somatic CNV and gene expression in breast cancers from women with PTEN hamartoma tumor syndrome

Article Open access 05 July 2023

Takae Brewer, Lamis Yehia, … Charis Eng

Genomic basis for RNA alterations in cancer

Article Open access 05 February 2020

PCAWG Transcriptome Core Group, Claudia Calabrese, … PCAWG Consortium

A comprehensive genomic and transcriptomic dataset of triple-negative breast cancers

Article Open access 24 September 2022

Qingwang Chen, Yaqing Liu, … Ying Yu

Introduction

Breast cancer is the third most frequent cancer in the world as it affects approximately one in ten women in the western world¹. The initial knowledge that connected breast cancer to genetic susceptibility originated from the clinical observations that highlighted the clustering of breast cancer cases in families^2,3. Approximately 5–10% of breast cancers are believed to result from the inheritance of rare genetic components that confer significantly elevated risk^4,5. For example, mutations in the tumor suppressor genes BRCA1 and BRCA2 account for approximately 16% of the familial breast cancer^6,7,8. The vast majority of breast cancer cases, however, are derived from a complex interaction between multiple environmental, lifestyle and genetic factors with relatively weak individual risk contribution^9,10.

While the effects of many environmental and lifestyle factors, such as diet, reproductive behavior and radiation are well appreciated, the knowledge on genetically contributing patterns is limited. Association studies have identified ATM, BRIP1, CASP8, CDH1, CHEK2, PALB2, PTEN, STK11 and TP53 as breast cancer susceptibility genes. Such mutations collectively account for 2.3% of familial risk of breast cancer and together with BRCA1, BRCA2 and others have been implicated in high risk screening strategies^{5,8,11,12,13,14,15,16,17,18,19,20}. Nonetheless, significant proportion of the familial and non-familial breast cancer susceptibility remains unknown, suggesting plethora of genetic elements that need to be understood.

Transcriptome sequencing comprises a unique interplay between individual genetic background, reflected in the variation content and the epigenetic and environmental regulation affecting gene expression levels and splice patterns. Recent transcriptome sequencing efforts have highlighted important somatic events in metastatic triple negative breast cancer (TNBC) and described important for the clinical outcome genotype-phenotype correlations²¹. Further, transcriptome sequencing data have been successfully explored to reveal disrupted pathways in TNBC through genome-wide loss of heterozygosity and mono-allelic expression estimation²². As a result of these and other studies, the feasibility of transcriptome sequencing to uncover molecular mechanisms of breast cancer drivers is increasingly appreciated²³.

Here we used whole transcriptome RNA-sequencing to reveal the variation signatures of 17 breast cancer patient tissues and compared with human normal breast organoids (referred from here on as normal breast tissue, NBT). The 17 samples include six TNBC, lacking expression of therapeutically significant components - estrogen receptor (ER), progesterone-receptor (PR) and the Human Epidermal Growth Factor Receptor 2 (HER2); six Non-TNBC (ER, PR and HER2-positive); and five HER2-positive samples (ER and PR negative). Compared to the extensively performed searches for somatic breast cancer mutations, our RNA-sequencing based approach detects SNPs that are expressed at the mRNA level and allows estimation of their allelic expression at nucleotide resolution. A set of novel variants were validated through Sanger sequencing. As a proof-of-principle, we have explored the effect of a rare SNP- p.R353Q - in the epithelial splicing regulatory protein ESRP2, on the binding and splicing of its target pre-mRNA. Our study reports a set of novel mutations in essential regulatory molecules in breast cancer and discusses their allelic preferential expression and potential involvement in breast cancer.

Results

Analytical strategies and overall variation landscape

We set out to define the transcribed variation profile of TNBC, Non-TNBC and HER2-positive breast cancer samples. To achieve this, we applied mRNA sequencing on 17 breast cancer samples from unrelated individuals as well as on three NBT samples on the Illumina HiSeq 2000 platform. The raw reads were aligned against Ensembl GRCh37.62 B (hg19) using TopHat²⁴ and the variants were called using Samtools²⁵. Prior to filtering, a total of 1,876,617 SNPs, 331,197 of which were novel, were called across all 17 breast cancer samples and between 30,294 and 258,465 SNPs (average 110,389) were called in each individual sample (Supplementary Table 1). The overview of the workflow and the filtering strategy is presented in Figure 1. The SNP calls were separated into two groups – either reported in the databases (between 22,914 and 218,411 per sample, average 91,201), or novel. The previously reported SNPs, due to validation by at-least one independent group, were analyzed further without filtering. To increase the confidence in the calls of novel variants, we initially analyzed the SNP calling reads of 1,000 SNPs through Integrative Genomics Viewer (IGV) files and 96 of the calls were tested by Sanger sequencing. Based on the findings of this pilot validation test, we set up filtering criteria retaining minimum false-positive and false-negative calls as follows: those supported by at least of three bidirectional reads with unique start position, minimum phred quality value of 20, mapping quality value (MQV) > 20 and presence in 3 or less different samples. To ensure that we were not missing any novel high prevalence SNPs among our samples, all the positions at which a novel SNP was called in 4 or more samples were visually examined through IGV before to be assigned as false positives – no novel SNPs called in 4 or more samples were identified. This filtering left us with between 60 and 1143 novel variants per sample (average 285). The transition to transvertion (Ts/Tv) ratio among the novel coding SNPs was 2.8 and aligns with previously reported values for human exome, thus increasing the confidence of our filtering algorithm^26,27.

Prior to filtering, between 1,574 and 11,669 previously reported INDELs were called in each of the studied breast cancer samples and subjected to further analysis (See Supplementary Table 1). The novel INDELs were quality filtered to remove calls with MQV less than 20, phred quality value below 20 and presence in three or more different samples. This left between 18 and 142 novel INDELs (average 59) per sample, which were retained in our further analysis.

Expressed SNP density

To assess the overall expressed variation landscape of the breast cancer samples, we estimated the SNP density by counting the number of SNPs per megabase (MB) genome intervals. The SNP density was calculated individually for each sample and compiled per group (TNBC, non-TNBC and HER-2) and as a whole for the 17 samples (Figure 2). Overall, the SNP density distribution across the three groups was very similar, with a few regions showing group-specific high-density loci. All the TNBC samples presented with high number of SNPs in the region of chr14:10500000–106000000, which was mainly contributed by increased overall SNP number in the large gene encoding nucleoprotein AHNAK2. Specifically enriched in all non-TNBC samples was the region on chr19:53000000, mainly due to high number of SNPs in the zinc finger protein (ZNF) encoding genes.

We also overlapped the expressed overall SNP density in our samples with somatic genome SNP density calculated from the COSMIC dataset. There was a significant overlap in the overall SNP distribution. However, the regions with highest SNP frequency differed: while in the COSMIC dataset they were chr2:48000000, chr17:20000000 and chr5:72000000, the three top SNP-enriched regions when all of our samples were analyzed together were chr6:31000000–32000000, chr8:144000000 and chr19:53000000. While the high density observed on chromosome 6 was due to a the well-known variability in the histone cluster (HIST1H1A) and major histocompatibility complex (HLA), the chromosome 8 region was enriched by variants in epidermal antigen Epiplakin1 (EPPK1) and lymphocyte antigen (LY6E) (see Figure 2).

Comparative analysis with cancer genome variations (COSMIC)

We compared the SNPs identified in our samples with the COSMIC cancer genome somatic mutation database (http://www.sanger.ac.uk/cosmic)²⁸. A total of 2,169 SNPs from the COSMIC database were found among our samples, 129 of which were present in more than 10 of the 17 samples and 6 were called in all breast cancer samples. Only one SNP – the relatively common variant R1322X in the ABC transporter gene ABCA10 was nonsense, 515 were missense and 20 were located within a splice site. Of note, only two of the SNPs in our dataset, both UTR located, overlapped with COSMIC variants found in breast cancer: 1) the promoter T > C substitution in the proto-oncogene binding Yes-associated protein (YAP1) was seen in 8 of our 17 patients and 2) the 3′-UTR C > T substitution in peptidylprolyl isomerase F (PPIF) was found in 5 of our samples. Among these comparisons, highly represented in our datasets were COSMIC missense variants in the DNA-repair encoding probable helicase senataxin SETX and Ewing's tumor-associated antigen ETAA1.

GWAS associated SNPs in the breast cancer transcriptome

To outline SNPs that have been previously associated with breast cancer phenotypes, we overlaid our datasets with the publically available genome wide association studies (GWAS); the results are summarized in Supplementary Table 2. The pre-B-cell leukemia homeobox 1 (PBX1) intronic SNP rs1387389 that has been reported to strongly associate with early onset breast cancer²⁹ was present in 4/17 samples, two of which were homozygous. Similarly, two breast cancer associated SNPs in the fibroblast growth factor receptor (FGFR2)³⁰, rs2420946 and rs2981582, were present each in two of our samples, (one patient was a carrier of both), again, in a homozygous state. Of note, the mitogen-activated protein kinase kinase kinase (MAP3K1) SNP reported by the same study was not present among our samples, however, we found a higher prevalence of the closely positioned D860N and V906I missense substitutions in MAP3K1; they were called in 13 (9 homozygous) and 16 (11 homozygous) of our samples, respectively. Similar high homozygocity prevalence was seen for the rs704010 rs8170, rs2180341, rs13281615, rs3817198 and rs4973768, but was not observed for the intergenic rs4415084. Other SNPs reported to be in strong association with breast cancer from recent meta-analyses^{29,30,31,32,33} were not seen in our samples.

Variations in genes previously implicated in hereditary breast cancer

To search for known predisposing breast cancer variants among our samples, we extracted SNPs and INDELS in genes that have been previously associated with hereditary breast cancer. Among all 17 samples, 80 SNP calls (38 unique SNPs) and 66 INDEL calls (38 unique INDELs) mapped within ATM, BRCA1, BRCA2, BRIP1, CASP8, CDH1, CHEK1, PTEN, STK11 or TP53. While the majority of the SNPs called in those genes variants were common or have no known effect on the protein, several variants have been previously linked to breast cancer predisposition (Table 1). In BRCA1 and BRCA2 collectively, twelve different missense substitutions were identified in a total of nine patients. Five of the missense substitutions (p.Q356R, p.R496H and p.D693N in BRCA1 and p.N289H and p.D1420Y in BRCA2) have been previously associated with breast cancer through either family or case-control studies^34,35. Three patients from the non-TNBC group were carriers of the missense variant p.Q356R and the other cancer-associated variants were present in one patient each – from either non-TNBC or HER2 positive groups. The two BRCA2 missense substitutions were seen in HER2-positive patients. In addition, one TNBC patient carried a small BRCA1 deletion (chr17:41246251delC, rs80357794) that leads to a frame-shift and premature stop codon expected to completely abolish protein function. In the ATM gene, we identified the non-synonymous substitutions p.F858L and p.L1420F, which have been previously associated with increased risk for breast cancer³⁶. The ATM missense p.F858L is known to impact the interaction of ATM with beta-adaptin, which is necessary for clatherin mediated receptor endocytosis and is proposed to contribute to the hereditary radio sensitivity and breast cancer^37,38. In addition to breast cancer-associated variants, a missense substitution in STK11, p.H175Y, previously reported in a lung carcinoma³⁹ was been found in one patient. One non-TNBC patient carried simultaneously pathogenic variants in BRCA1 and ATM. Of note, overall a higher number of variant versus reference reads was assessed across BRCA1, BRCA2 and ATM variations.

Table 1 Variants in ATM, BRCA1, BRCA2 and STK11 identified in the 17 breast cancer samples (from HGMD, http://www.hgmd.cf.ac.uk/ac/index.php. The homo- or heterozygosity and the number of the unique variant and reference reads are also shown

Full size table

Prevalence of rare variants

To reveal variants that might be overrepresented in our samples compared to the general population, we compared the allele frequency of coding SNPs called in our samples against 11,666 alleles from the Exome Sequencing Project dataset (http://evs.gs.washington.edu/EVS/). To minimize error due to different variant calling platforms and to increase the statistical significance of the findings, we excluded from this analysis SNPs called in less than 10 alleles from the ESP dataset and in less than 3 individuals among our 17 samples. SNPs called in all 17 of our samples were also excluded. For the purposes of allele frequencies comparison, we assigned two alleles for every homozygote call in our dataset; and Yates corrected chi-square was calculated for each distribution. The top 50 most prevalent missense SNPs among the 17 samples are presented in Table 2. The highest difference in the allele distribution between our dataset and ESP was estimated for rs2305376 in the gene HOOK2, encoding a component of the FTS/Hook/FHIP (FHF) complex that has a role in vesicle trafficking and maintenance of centrosome function and is known to interact with the JUN proto-oncogene^40,41. Interestingly, the variant is predicted to be damaging change due to glycine to arginine substitution (p.G10R), which in addition to its low prevalence in the ESP datasets, is rare to absent in the European population datasets (see Table 2). This variant was called in 3 of our 17 samples and all of them were called homozygous due to the high abundance of variant over wild type reads. Another overrepresented SNP in our dataset was the missense substitution p.T573A in the protein tyrosine phosphatase PTPN12, whose activity is lost in a large proportion of breast cancer cases⁴². Of note, while PTPN12 loss is most strongly associated with the TNBC phenotype⁴³, we found this variant equally prevalent in all three breast cancer subtypes; one TNBC and one HER2-positive samples carried it in homozygote state. Other breast cancer implicated genes with prevalent variants amongst our samples were PLEC, PRCP, DSG2 and ERBB2IP, all harboring predicted to be damaging aminoacid changes^44,45,46,47. Potential contribution of such variants to the phenotype in these patients is worth investigation.

Table 2 Comparison between the observed prevalence of known SNPs in the 17 studied breast cancer samples and Exome Sequencing Project – presented are the top 50 variants showing higher prevalence among the 17 samples. The variants are sorted according descending chi-square value. High Polyphen score indicates high probability of the variant to alter the protein function

Full size table

Deleterious protein mutations

A selected subgroup of potentially deleterious SNPs consisted of mutations predicted to generate premature stop (PMS) codons through either nonsense substitution or a splice-site aberration leading to a frame-shift due to out-of-frame exon skip or intron retention. In this group we also retained SNPs removing a stop codon, because of their known severe biological consequences. A total of 1,593 variants of this type were called across our 17 samples, from which 77 were different nonsense SNPs and 16 were INDELs leading to a stop codon generation (Table 3). Among the stop-codon mutations, p.R93X in Steroid Receptor activator SRA1 was present in 10 of the 17 samples and in 8 of them it was called homozygous. Given that SRA1 is involved in regulating the activity of steroid receptors and is deregulated in breast cancer⁴⁸, the high expression rate of a nonsense variant might indicate functional implications in our breast cancer samples. In addition, two HER2 and two TNBC samples were positive for the p.Q281X in the zinc-activated ligand-gated ion channel (ZACN). Interestingly, the nucleotide change causing this mutation - chr17:74077797 C > T - also resides in the 5′UTR of the gene encoding exocyst complex component (EXOC7), whose deregulated expression was reported to be a strong predictor for metastatic outcome in early stage TNBC⁴⁹. The two TNBC samples positive for the variant did not show presence of reference reads in this position. When all the genes affected by a deleterious mutation were analyzed through Ingenuity Pathway Analysis (IPA), the top affected molecular networks were cell death and survival, cellular development and cellular growth and proliferation and the top affected canonical pathway was estrogen receptor signaling (Supplementary Figure 1).

Table 3 Deleterious mutations identified among the 17 Breast Cancer Samples

Full size table

Novel expressed variations in breast cancer and allele specific expression

The statistics on the filtered novel SNPs and INDELs are summarized in Supplementary Table 3; a complete list of the novel exonic annotated variations is available upon request. As expected, majority novel variants mapped within gene regions (70% of the SNPs and 66% INDELs). Filtering out of the intronic calls significantly reduced both SNP and INDEL numbers to between 43 and 186 SNPs per sample (average 76) and between none and 17 INDELs (average 8). Overall, 8% of the novel intergenic SNPs and 4% of the novel INDELs mapped within exons. Across the 17 samples, the total number of genes with coding and regulatory sequences affected by at least one novel SNP was 2103 and the genes with at least one novel INDEL were 566. A selected set of exonic variants were confirmed by Sanger sequencing (Figure 3).

From the novel exonic SNPs, 285 unique SNPs were predicted to alter the protein sequence. Based on position and function, three of these SNPs were annotated to generate a novel stop codon, 114 were located within 2 bp of a splice junction and 174 were missense, from which 70 were predicted to significantly affect the protein function. Six novel SNPs had dual annotation: missense substitutions located at a splice site. A total of 121 novel coding SNPs affected highly conservative nucleotide positions. Three novel coding SNPs - one missense and two synonymous substitutions - were called in two samples and one - a stop codon in the solute carrier SLC30A8 (p.Q28X) - in three different breast cancer samples (Table 4). Interestingly, p.Q28X affects only one (NM_173851.2) of the five protein coding SLC30A8 splice isoforms; this isoform possesses an alternative 5′-end and is present in all three samples from our set that expressed this isoform. The stop codon is located early in the protein chain and likely leads to complete abolishment of the protein expression. Since this SLC30A8 isoform was not expressed in the remaining 14 breast cancer samples, this early stop codon may indicate regulatory mechanism preventing the expression of this particular isoform in breast tissue.

Table 4 Novel Exonic Variants that are seen in two and three out of the 17 breast Cancer Samples

Full size table

To assess potential allele preferential expression, we analyzed the ratio of reference and variant reads at all coding positions for the novel SNPs called by 6 or more reads harboring the variant nucleotide (Figure 4). Fifty seven of these novel SNPs were called by variant reads only (i.e. no reference call was present at the corresponding position) and additional 53 showed higher than 5-fold number of variant calling reads over the wild type (Supplementary Table 4). Among the most preferentially expressed novel SNPs were missense variants in previously linked breast cancer genes, such as methyl-thioadenosine phosphorylase MTAP (p.K71R) and melanoma antigen MAGED1 (p.G87A)⁵⁰.

Studies revealing impaired interaction of splicing coordinator ESRP2 bearing a R353Q substitution

Among the novel and rare SNPs predicted to be protein-altering in our breast cancer samples, we selected to study the functional effect of the R353Q substitution in ESRP2, based on the established connection of ESRP2 to cancer through its role in epithelial-to-mesenchymal transition (EMT)^51,52,53. Arginine 353 is located in the second RNA recognition motif (RRM) domain of the ESRP2, which is known to interact with specific pre-mRNAs sequences. There are three RRM domains in ESRP2 and they are implicated in regulating the expression of specific splice variants of FGFR2, CTNND1 and ENAH that are involved in EMT. We applied site-directed mutagenesis to generate ESRP2^R353Q harboring expression vector and transfected MDA-MB-231 human breast cancer cells in parallel with expression constructs containing wild type ESPR2. After transfection, wild type and the mutant ESRP2^R353Q proteins were purified and compared for their ability to bind the FGFR2 pre-mRNA region through Electrophoretic Mobility Shift Assay (EMSA); the results are shown in Figure 5. We observed strong interactions between the wild type ESRP2 and the FGFR2 pre-mRNA as previously reported⁵². However, this interaction was significantly impaired for the mutant ESRP2^R353Q (compare lanes 6 and 7 with Lanes 2 and 3, Panels A–C), suggesting that the R353Q substitution compromises ESPR2 binding and potentially, splice regulation of the FGFR2 pre-mRNA. This effect was observed in all three tested breast cancer cell lines: MCF-7 (Figure 5A), MDA-MB-231 (Figure 5B) and BT-549 (Figure 5C). Further, in line with previous observations⁵², RT-PCR showed increased expression of the epithelial FGFR2 isoform IIIb after transfection with wild type ESRP2 in the mesenchymal FGFR2 IIIc-expressing cell lines MDA-MB-231 and BT-549 (Figure 5D). This increase in FGFR2 IIIb expression was lower (BT-549) to completely abolished (MDA-MB-231) after the transfection with the mutant ESRP2^R353Q (Figure 5D).

Discussion

Here we present the first mRNA sequencing based study that reports expressed variations in TNBC, Non-TNBC and HER2-positive breast cancer transcriptome. Several molecular mechanisms, such as RNA editing and allele preferential expression, could cause a discrepancy between the variations found at mRNA and DNA levels. Compared to exome and genome sequencing, RNA-seq provides essential insights into the functionality of the variants through estimation of the absolute and relative abundance of variant reads and the co-existence or mutual exclusion of variations, expression and splicing patterns. In addition to outlining the general landscape of the breast cancer variation transcriptome, our study reports novel variants in an allele-specific expression context, aligns our findings with the existing knowledge on breast cancer genetics and exemplifies efficient extraction of information from the transcriptome through extensive analyses.

Mutations in previously associated breast cancer genes, BRCA1, BRCA2 and ATM, were called in 9/17 (53%) samples which is a higher than the previously reported mutation prevalence among breast cancer patients^{5,12,13,14,15,16,18,19,20}. While only one patient was a carrier of known pathological variants in both BRCA1 and ATM, five other individuals carried missense substitutions in at least two different breast cancer associated genes (see Table 1). Whether the disease in these patients could be contributed to cumulative impaired functioning of these genes is a subject of further investigation; nevertheless, the relatively frequent co-occurrence of protein altering variations in known breast cancer-associated genes in different cancers raises the necessity to examine larger series of patients and controls for combinatorial genetic risk. An interesting observation is the high prevalence of homozygote vs. heterozygote calls in BRCA1, BRCA2 and ATM for both breast cancer-associated genes and those not known to be pathogenic variants, suggesting potential allelic loss in those genes.

Among the essential findings of our study is a subset of novel SNPs and INDELs, some affecting genes previously implicated in breast cancer, in which, however, no predisposing or causative point variants have been reported so far. An example is p.K71R in MTAP, frequently seen co-deleted with the CDKN2A and CDKN2B tumor suppressor genes in a large cohort of 2000 breast tumors⁵⁰. While the biological significance of p.K71R in MTAP and other novel variations in cancer-associated genes is currently unclear, overexpression of novel variant over reference alleles points to a possible contribution to tumor initiation or progression. Since these variants have not been previously reported, they are not likely to be present in a homozygote state at the genomic level and their allelic dominance may indicate expression or growth advantage, as well as potential loss of heterozygosity. Because such events may drive or contribute to cancer, a systematic investigation of allelic dominance of novel variants across larger expression sets is needed.

In addition, we also identified a higher frequency (compared to non-breast cancer populations) of previously reported variants in many genes, including breast cancer-associated genes, such as PTPN12, PRCP, PLEC, DSG2 and ERBB2IP. Estimation of the prevalence of such variants in larger breast cancer cohorts is needed as it may indicate contribution to genetic risk or co-existence with causative mutations. Although this analysis holds promising potential to identify overrepresented alleles, it is important to take into account that transcriptome sequencing variant calls differs from the exome sequencing in allelic representation of homo- and heterozygote state (i.e. number of alleles). While homo vs. heterozygosity on transcriptome level provides an additional layer of information on the potential functionality of these variants, the results should be used only after confirmation by independent studies. Nevertheless, statistical confidence may be increased for SNPs in which the difference is achieved through the analysis of high number of samples rather than homozygote appearance, such as PTPN12 and DSG2 (see Table 2). Such prevalent variants in genes implicated in breast cancer are worthy of investigation in independent breast cancer datasets.

Similarly to the above discussed prevalence of mutant reads in breast cancer-associated variants, GWAS associated SNPs in our set also showed high prevalence of homozygous vs. heterozygous calls. This overall prevalence of variant over reference reads for variant positions in cancer implicated genes, needs to be further investigated as potential indicator of mechanistic implications, such as loss of heterozygosity or preferential allelic expression. As the information content of the transcriptome as a common denominator combining frequency and expression data is emerging, large scale studies are expected to enlighten the feasibility and the information value of these types of analyses³².

Finally, we selected a rare, predicted to be protein damaging missense substitution from our dataset – p.R353Q in the splicing coordinator ESRP2 - to demonstrate in vitro the effect of the p.R353Q substitution on the ESRP2 protein function. We were able to show that the replacement of the arginine 353 with the variant glutamine leads to a significant reduction of the binding ability of ESRP2 to FGFR2 pre-mRNA. Thus, this could potentially affect epithelial-to-mesenchymal transition programs.

Overall, our analysis identified enrichment of variants known to be implicated in breast cancer as well as novel and rare variants in genes associated with breast cancer in our set of 17 breast cancer samples. Further, the within-individual exploration of the variance showed multiple disease associating variants in most of the individuals and points to the need for estimation of cumulative action of genetic alterations. This study reports an initial collection of variants that are expressed across the breast cancer transcriptome, including novel and reported mutations in their allelic abundance and co-presence with other variants. In addition to providing an overall variation landscape of the breast cancer transcriptome, such as expressed SNP density and deleterious variants scaffold, we exemplify different analytical strategies to search for molecular interactions and regulatory networks potentially implicated in breast tumorigenesis. Compared to exome and genome studies, transcriptome exploration provides higher information content through the estimation of the expression abundance, in the immediate context of allelic prevalence and co-existence with expression and splicing features^54,55. It is essential to keep in mind however that the transcriptome only captures a snap shot and further functional characterization of the observed molecular features is needed to prove disease-causative relationships. Nevertheless, our study provides an important breast cancer transcriptome dataset for further explorations on either high-throughput or individual gene/protein scale.

Methods

Human patient samples

The human breast cancer tissue RNA samples were provided by Dr. Suzanne Fuqua (Baylor College of Medicine). All of the human samples were used in accordance with the IRB procedures of Baylor College of Medicine and Dana-Farber Cancer Institute and Harvard Medical School, respectively. The breast tumor types, TNBC, Non-TNBC and HER2-positive, were classified on the basis of RNA sequencing FPKM abundance and immunohistochemical and qRT-PCR classification (data not shown) as previously described^54,55. All breast cancer patients were from European descent.

Illumina genome sequencing RNA sequencing library preparation

Large and small ribosomal RNA (rRNA) was removed from total RNA using RiboMinus Eukaryote Kit (Invitrogen, Carlsbad, CA). Five micrograms of total RNA were hybridized to rRNA-specific biotin labeled probes at 70°C for 5 minutes. The rRNA-probe complexes were then removed by streptavidin-coated magnetic beads. The rRNA-free transcriptome RNA was concentrated by ethanol precipitation. The cDNA synthesis and DNA library construction for all the seventeen samples were performed as described^54,55.

Read alignment and transcript assembly

The paired end raw reads were aligned using the TopHat version 1.2.0 that allows two mismatches in the alignment. The aligned reads were assembled into transcripts using cufflinks version 2.0.0. The alignment quality and distribution of the reads were estimated using SAM tools. From the aligned reads, de novo transcript assembly was performed to capture the major splice rearrangements and novel variations that occur in the transcriptomes of TNBC, Non-TNBC and HER2- positive breast cancers in comparison to NBT using cufflinks version 1.3.036. The cuffcompare program was used to identify transcripts that are identical to the reference human genome (the Ensembl GRCh37.62 B (hg19) reference genome). Further analysis and novel isoform call was performed through the reconstructed transfrags that comprise novel splice junctions and share at least one splice junction with a reference transcript. The very low abundant transcripts were identified by binning the transcripts according to their FPKM and the transcripts with FPKM below 0.3 were eliminated from further analysis. All the analyses presented in this manuscript are performed using two categories of transcripts: transcripts that are identical to reference and transcripts that comprise novel junctions. The global statistics, which includes the distributions of FPKM scores across samples and the dendogram that shows the relationship between the samples based on the reconstructed transcripts, were analyzed using cummeRbund package of cufflinks suite of programs. The average exon number was in the reassembled transcripts is comparable to the human genome reference average. To annotate novel splice events, we used Multivariate Analysis of Transcript Splicing (MATS). Additionally, for consistency checking and independent validation we used an inhouse built program (http://ccb.jhu.edu/software/ASprofile/) to compare the exon models between isoforms assembled with the program cufflinks for the normal and cancer samples. As mentioned earlier, only the isoforms that are similar to reference and isoforms that comprise novel splice junctions were considered. We determined the splicing differences indicative of exon inclusion, exclusion, alternative 5′, 3′ and intron retention events.

Variants call and annotation

Variants calls were obtained using Mpileup utility of SAMTools (http://samtools.sourceforge.net/mpileup.shtml). Base Alignment Quality was used to score the variant call. Consensus calling is done using bcftools. Maximum depth call was set at 100000. The variants were annotated using SeattleSeq Annotation Tools version 8.01, dbSNP build 137 (http://snp.gs.washington.edu/SeattleSeqAnnotation137/).

Sanger sequencing

First-strand cDNA was synthesized with SuperScript III reverse transcriptase (Invitrogen, Inc) using 1 μg of total RNA and mixture of oligo dT primer and random hexamers. For selected variants, cDNA primers flanking the variant position were designed using Primer3⁵⁶ and in RT-PCR to amplify the region of interest. The products were separated on 1% agarose gel, excited and purified using QIAquick Gel Extraction Kit (Qiagen, Inc.) according to the manufacturer instructions. The purified fragments were subjected to bi-directional Sanger sequencing with the forward and the reverse primer used for the amplification.

Statistics

To test if the distribution of variant alleles differed between our group and non-breast cancer populations, we applied chi-square test (2 × 2 tables). All the values were subjected to Yates correction for contingency to prevent overestimation of significance; p values below 0.05 were considered significant.

EMSA

To determine if the R353Q substitution affects the ability of ESRP2 to bind its substrate we used wild type and mutant FLAG-tagged ESRP2 ORF introduced in PBIX as previously described⁵². Three cell lines MCF-7, MDA-231 and BT-549 were transfected cell lines using FuGENE® Transfection Reagents (Promega, Inc.) according to the manufacturer recommendations. Nuclear extracts were prepared using a Nonidet P-40 lysis method. RNA oligos of ISE/ISS-3 were end labeled with using the annealed [γ-³²P] ATP in a 20 μl reaction mixture for 15 min at room temperature. RNA probes were incubated with respective nuclear extracts. Samples were run on a non-denaturing 6% polyacrylamide gel and imaged by autoradiography. Specific competitions were performed by adding a 100-molar excess of unlabeled probe to the incubation mixture and supershift Electrophoretic mobility shift assay (EMSA) were performed using FLAG antibody (Sigma-Aldrich).

References

DeSantis, C., Siegel, R., Bandi, P. & Jemal, A. Breast cancer statistics, 2011. CA Cancer. J. Clin. 61, 409–418 (2011).
Article PubMed Google Scholar
Claus, E. B., Risch, N. J. & Thompson, W. D. Age at onset as an indicator of familial risk of breast cancer. Am. J. Epidemiol. 131, 961–972 (1990).
Article CAS PubMed Google Scholar
Peto, J. & Mack, T. M. High constant incidence in twins and other relatives of women with breast cancer. Nat. Genet. 26, 411–414 (2000).
Article CAS PubMed Google Scholar
Lynch, H. T. et al. Genetic predisposition to breast cancer. Cancer 53, 612–622 (1984).
Article CAS PubMed Google Scholar
Turnbull, C. & Rahman, N. Genetic predisposition to breast cancer: past, present and future. Annu. Rev. Genomics Hum. Genet. 9, 321–345 (2008).
Article CAS PubMed Google Scholar
Wooster, R. et al. Identification of the breast cancer susceptibility gene BRCA2. Nature 378, 789–792 (1995).
Article ADS CAS PubMed Google Scholar
Stratton, M. R. & Rahman, N. The emerging landscape of breast cancer susceptibility. Nat. Genet. 40, 17–22 (2008).
Article CAS PubMed Google Scholar
Thompson, D., Easton, D. F. & Breast Cancer Linkage Consortium. . Cancer Incidence in BRCA1 mutation carriers. J. Natl. Cancer Inst. 94, 1358–1365 (2002).
Article CAS PubMed Google Scholar
Hopper, J. L. & Carlin, J. B. Familial aggregation of a disease consequent upon correlation between relatives in a risk factor measured on a continuous scale. Am. J. Epidemiol. 136, 1138–1147 (1992).
Article CAS PubMed Google Scholar
Parkin, D. M., Pisani, P. & Ferlay, J. Global cancer statistics. CA Cancer. J. Clin. 49, 33–64, 1 (1999).
Article CAS PubMed Google Scholar
Guilford, P. et al. E-cadherin germline mutations in familial gastric cancer. Nature 392, 402–405 (1998).
Article ADS CAS PubMed Google Scholar
Hemminki, A. et al. A serine/threonine kinase gene defective in Peutz-Jeghers syndrome. Nature 391, 184–187 (1998).
Article ADS CAS PubMed Google Scholar
Jenne, D. E. et al. Peutz-Jeghers syndrome is caused by mutations in a novel serine threonine kinase. Nat. Genet. 18, 38–43 (1998).
Article CAS PubMed Google Scholar
Malkin, D. et al. Germ line p53 mutations in a familial syndrome of breast cancer, sarcomas and other neoplasms. Science 250, 1233–1238 (1990).
Article ADS CAS PubMed Google Scholar
Meijers-Heijboer, H. et al. Low-penetrance susceptibility to breast cancer due to CHEK2(*)1100delC in noncarriers of BRCA1 or BRCA2 mutations. Nat. Genet. 31, 55–59 (2002).
Article CAS PubMed Google Scholar
Nelen, M. R. et al. Localization of the gene for Cowden disease to chromosome 10q22–23. Nat. Genet. 13, 114–116 (1996).
Article CAS PubMed Google Scholar
Nelen, M. R. et al. Germline mutations in the PTEN/MMAC1 gene in patients with Cowden disease. Hum. Mol. Genet. 6, 1383–1387 (1997).
Article CAS PubMed Google Scholar
Rahman, N. et al. PALB2, which encodes a BRCA2-interacting protein, is a breast cancer susceptibility gene. Nat. Genet. 39, 165–167 (2007).
Article CAS PubMed Google Scholar
Renwick, A. et al. ATM mutations that cause ataxia-telangiectasia are breast cancer susceptibility alleles. Nat. Genet. 38, 873–875 (2006).
Article CAS PubMed Google Scholar
Seal, S. et al. Truncating mutations in the Fanconi anemia J gene BRIP1 are low-penetrance breast cancer susceptibility alleles. Nat. Genet. 38, 1239–1241 (2006).
Article CAS PubMed Google Scholar
Craig, D. W. et al. Genome and transcriptome sequencing in prospective metastatic triple-negative breast cancer uncovers therapeutic vulnerabilities. Mol. Cancer. Ther. 12, 104–116 (2013).
Article CAS PubMed Google Scholar
Ha, G. et al. Integrative analysis of genome-wide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer. Genome Res. 22, 1995–2007 (2012).
Article CAS PubMed PubMed Central Google Scholar
Hartmaier, R. J., Priedigkeit, N. & Lee, A. V. Who's driving anyway? Herculean efforts to identify the drivers of breast cancer. Breast Cancer Res. 14, 323 (2012).
Article PubMed PubMed Central Google Scholar
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
Article CAS PubMed PubMed Central Google Scholar
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Article PubMed PubMed Central CAS Google Scholar
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Article CAS PubMed PubMed Central Google Scholar
Le, S. Q. & Durbin, R. SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples. Genome Res. 21, 952–960 (2011).
Article CAS PubMed PubMed Central Google Scholar
Shepherd, R. et al. Data mining using the Catalogue of Somatic Mutations in Cancer BioMart. Database (Oxford) 2011, bar018 (2011).
Article CAS Google Scholar
Rafiq, S. et al. Identification of inherited genetic variations influencing prognosis in early-onset breast cancer. Cancer Res. 73, 1883–1891 (2013).
Article CAS PubMed PubMed Central Google Scholar
Jara, L. et al. Genetic variants in FGFR2 and MAP3K1 are associated with the risk of familial and early-onset breast cancer in a South-American population. Breast Cancer Res. Treat. 137, 559–569 (2013).
Article CAS PubMed Google Scholar
Guo, H. et al. A common polymorphism near the ESR1 gene is associated with risk of breast cancer: evidence from a case-control study and a meta-analysis. PLoS One 7, e52445 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Li, Q. et al. Integrative eQTL-based analyses reveal the biology of breast cancer risk loci. Cell 152, 633–641 (2013).
Article CAS PubMed PubMed Central Google Scholar
Palmer, J. R. et al. Genetic susceptibility loci for subtypes of breast cancer in an African American population. Cancer Epidemiol. Biomarkers Prev. 22, 127–134 (2013).
Article CAS PubMed Google Scholar
Durocher, F. et al. Comparison of BRCA1 polymorphisms, rare sequence variants and/or missense mutations in unaffected and breast/ovarian cancer populations. Hum. Mol. Genet. 5, 835–842 (1996).
Article CAS PubMed Google Scholar
Schoumacher, F. et al. BRCA1/2 mutations in Swiss patients with familial or early-onset breast and ovarian cancer. Swiss Med. Wkly. 131, 223–226 (2001).
CAS PubMed Google Scholar
Fletcher, O. et al. Missense variants in ATM in 26,101 breast cancer cases and 29,842 controls. Cancer Epidemiol. Biomarkers Prev. 19, 2143–2151 (2010).
Article CAS PubMed PubMed Central Google Scholar
Gutierrez-Enriquez, S. et al. Functional consequences of ATM sequence variants for chromosomal radiosensitivity. Genes Chromosomes Cancer 40, 109–119 (2004).
Article CAS PubMed Google Scholar
Stredrick, D. L. et al. The ATM missense mutation p.Ser49Cys (c.146C > G) and the risk of breast cancer. Hum. Mutat. 27, 538–544 (2006).
Article CAS PubMed PubMed Central Google Scholar
Blons, H. et al. Genome wide SNP comparative analysis between EGFR and KRAS mutated NSCLC and characterization of two models of oncogenic cooperation in non-small cell lung carcinoma. BMC Med. Genomics 1, 25-8794-1-25 (2008).
Baron Gaillard, C. L. et al. Hook2 is involved in the morphogenesis of the primary cilium. Mol. Biol. Cell 22, 4549–4562 (2011).
Article PubMed PubMed Central Google Scholar
Miyamoto-Sato, E. et al. A comprehensive resource of interacting protein regions for refining human transcription factor networks. PLoS One 5, e9289 (2010).
Article ADS PubMed PubMed Central CAS Google Scholar
Sun, T. et al. Activation of multiple proto-oncogenic tyrosine kinases in breast cancer via loss of the PTPN12 phosphatase. Cell 144, 703–718 (2011).
Article CAS PubMed PubMed Central Google Scholar
Albeck, J. G. & Brugge, J. S. Uncovering a tumor suppressor for triple-negative breast cancers. Cell 144, 638–640 (2011).
Article CAS PubMed Google Scholar
Duan, L. et al. Prolylcarboxypeptidase regulates proliferation, autophagy and resistance to 4-hydroxytamoxifen-induced cytotoxicity in estrogen receptor-positive breast cancer cells. J. Biol. Chem. 286, 2864–2876 (2011).
Article CAS PubMed Google Scholar
Niwa, T. et al. BRCA2 interacts with the cytoskeletal linker protein plectin to form a complex controlling centrosome localization. Cancer. Sci. 100, 2115–2125 (2009).
Article CAS PubMed Google Scholar
Song, C., Wang, W., Li, M., Liu, Y. & Zheng, D. Tax1 enhances cancer cell proliferation via Ras-Raf-MEK-ERK signaling pathway. IUBMB Life 61, 685–692 (2009).
Article CAS PubMed Google Scholar
Davies, E. et al. The role of desmoglein 2 and E-cadherin in the invasion and motility of human breast cancer cells. Int. J. Oncol. 11, 415–419 (1997).
MathSciNet CAS PubMed Google Scholar
Cooper, C. et al. Increasing the relative expression of endogenous non-coding Steroid Receptor RNA Activator (SRA) in human breast cancer cells using modified oligonucleotides. Nucleic Acids Res. 37, 4518–4531 (2009).
Article CAS PubMed PubMed Central Google Scholar
Yau, C. et al. A multigene predictor of metastatic outcome in early stage hormone receptor-negative and triple-negative breast cancer. Breast Cancer Res. 12, R85 (2010).
Article PubMed PubMed Central Google Scholar
Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
CAS PubMed PubMed Central Google Scholar
Tavanez, J. P. & Valcarcel, J. A splicing mastermind for EMT. EMBO J. 29, 3217–3218 (2010).
Article CAS PubMed PubMed Central Google Scholar
Warzecha, C. C., Sato, T. K., Nabet, B., Hogenesch, J. B. & Carstens, R. P. ESRP1 and ESRP2 are epithelial cell-type-specific regulators of FGFR2 splicing. Mol. Cell 33, 591–601 (2009).
Article CAS PubMed PubMed Central Google Scholar
Warzecha, Z. et al. Therapeutic effect of ghrelin in the course of cerulein-induced acute pancreatitis in rats. J. Physiol. Pharmacol. 61, 419–427 (2010).
CAS PubMed Google Scholar
Eswaran, J. et al. RNA sequencing of cancer reveals novel splicing alterations. Sci. Rep. 3, 1689 (2013).
Article PubMed PubMed Central CAS Google Scholar
Eswaran, J. et al. RNA sequencing of cancer reveals novel splicing alterations. Sci. Rep. 3, 1689 (2013).
Article PubMed PubMed Central CAS Google Scholar
Rozen, S. & Skaletsky, H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. 132, 365–386 (2000).
CAS PubMed Google Scholar

Download references

Acknowledgements

This work is supported by the McCormick Genomic and Proteomics Center. The authors wish to thank Sucheta Godbole, Jeyanthy Eswaran and Dinesh Cyanam for their technical assistance in this project.

Author information

Horvath Anelia, Pakala Suresh Babu and Mudvari Prakriti contributed equally to this work.

Authors and Affiliations

McCormick Genomic and Proteomics Center, Washington, District of Columbia, 20037, USA
Anelia Horvath, Prakriti Mudvari, Sujit S. Nair & Rakesh Kumar
Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, District of Columbia, 20037, USA
Anelia Horvath, Suresh Babu Pakala, Sirigiri Divijendra Natha Reddy, Kazufumi Ohshiro, Sujit S. Nair & Rakesh Kumar
Institute of Molecular Medicine and, Hospital de Santa Maria – CHLN, Lisbon, Portugal
Sandra Casimiro, Ricardo Pires & Luis Costa
Breast Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas, 77030, USA
Suzanne A. W. Fuqua
Department of of Surgery, Kyoto University Graduate School of Medicine, Kyoto, Japan
Masakazu Toi
Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University, Baltimore, MD, 21231, USA
Saraswati Sukumar
Cancer Research Program, Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, India
Rakesh Kumar

Authors

Anelia Horvath
View author publications
You can also search for this author in PubMed Google Scholar
Suresh Babu Pakala
View author publications
You can also search for this author in PubMed Google Scholar
Prakriti Mudvari
View author publications
You can also search for this author in PubMed Google Scholar
Sirigiri Divijendra Natha Reddy
View author publications
You can also search for this author in PubMed Google Scholar
Kazufumi Ohshiro
View author publications
You can also search for this author in PubMed Google Scholar
Sandra Casimiro
View author publications
You can also search for this author in PubMed Google Scholar
Ricardo Pires
View author publications
You can also search for this author in PubMed Google Scholar
Suzanne A. W. Fuqua
View author publications
You can also search for this author in PubMed Google Scholar
Masakazu Toi
View author publications
You can also search for this author in PubMed Google Scholar
Luis Costa
View author publications
You can also search for this author in PubMed Google Scholar
Sujit S. Nair
View author publications
You can also search for this author in PubMed Google Scholar
Saraswati Sukumar
View author publications
You can also search for this author in PubMed Google Scholar
Rakesh Kumar
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

R.K. directed the project and designed the experiments. A.H. and R.K. designed the analyses, analyzed the data and wrote the manuscript. P.M. and S.B.P. carried out dry-lab and validation studies, respectively. S.D.R. and K.O. performed EMSA and RT-PCR experiments. S.C., R.P., L.C., S.A.W.F., M.T. and S. S. provided reagents and biological insights. S.S.N. carried out the initial RNA-sequencing experiments and editing.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Supplementary Info

Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/

Reprints and permissions

About this article

Cite this article

Horvath, A., Pakala, S., Mudvari, P. et al. Novel Insights into Breast Cancer Genetic Variance through RNA Sequencing. Sci Rep 3, 2256 (2013). https://doi.org/10.1038/srep02256

Download citation

Received: 23 May 2013
Accepted: 17 June 2013
Published: 25 July 2013
DOI: https://doi.org/10.1038/srep02256

This article is cited by

Investigating the overlap of machine learning algorithms in the final results of RNA-seq analysis on gene expression estimation
- Kalliopi-Maria Stathopoulou
- Spiros Georgakopoulos
- Vassilis P. Plagianakos
Health Information Science and Systems (2024)
Underlying mechanisms of epithelial splicing regulatory proteins in cancer progression
- Ying Liu
- Yiwen Li
- Xiang Ao
Journal of Molecular Medicine (2022)
scReQTL: an approach to correlate SNVs to gene expression from individual scRNA-seq datasets
- Hongyu Liu
- N. M. Prashant
- Anelia Horvath
BMC Genomics (2021)
Comprehensive analysis of the expression of SLC30A family genes and prognosis in human gastric cancer
- Yongdong Guo
- Yutong He
Scientific Reports (2020)
T lymphocytes facilitate brain metastasis of breast cancer by inducing Guanylate-Binding Protein 1 expression
- Dana A. M. Mustafa
- Rute M. S. M. Pedrosa
- Johan M. Kros
Acta Neuropathologica (2018)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Introduction

Results

Analytical strategies and overall variation landscape

Expressed SNP density

Comparative analysis with cancer genome variations (COSMIC)

GWAS associated SNPs in the breast cancer transcriptome

Variations in genes previously implicated in hereditary breast cancer

Prevalence of rare variants

Deleterious protein mutations

Novel expressed variations in breast cancer and allele specific expression

Studies revealing impaired interaction of splicing coordinator ESRP2 bearing a R353Q substitution

Discussion

Methods

Human patient samples

Illumina genome sequencing RNA sequencing library preparation

Read alignment and transcript assembly

Variants call and annotation

Sanger sequencing

Statistics

EMSA

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Ethics declarations

Competing interests

Electronic supplementary material

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links