Transcriptional signature of lymphoblastoid cell lines of BRCA1, BRCA2 and non-BRCA1/2 high risk breast cancer families

Approximately 25% of hereditary breast cancer cases are associated with a strong familial history which can be explained by mutations in BRCA1 or BRCA2 and other lower penetrance genes. The remaining high-risk families could be classified as BRCAX (non-BRCA1/2) families. Gene expression involving alternative splicing represents a well-known mechanism regulating the expression of multiple transcripts, which could be involved in cancer development. Thus using RNA-seq methodology, the analysis of transcriptome was undertaken to potentially reveal transcripts implicated in breast cancer susceptibility and development. RNA was extracted from immortalized lymphoblastoid cell lines of 117 women (affected and unaffected) coming from BRCA1, BRCA2 and BRCAX families. Anova analysis revealed a total of 95 transcripts corresponding to 85 different genes differentially expressed (Bonferroni corrected p-value <0.01) between those groups. Hierarchical clustering allowed distinctive subgrouping of BRCA1/2 subgroups from BRCAX individuals. We found 67 transcripts, which could discriminate BRCAX from BRCA1/BRCA2 individuals while 28 transcripts discriminate affected from unaffected BRCAX individuals. To our knowledge, this represents the first study identifying transcripts differentially expressed in lymphoblastoid cell lines from major classes of mutation-related breast cancer subgroups, namely BRCA1, BRCA2 and BRCAX. Moreover, some transcripts could discriminate affected from unaffected BRCAX individuals, which could represent potential therapeutic targets for breast cancer treatment.


INTRODUCTION
In 2015, breast cancer represented 26% of all cancer cases among Canadian women and was the second leading cause of cancer death constituting 14% of overall death due to cancer [1]. Like every common cancer, breast cancer shows some degree of familial clustering [2]. Highrisk families having multiple cases of breast or ovarian cancer are associated with a higher risk of developing breast cancer during their lifetime than other families [3]. It is thought that approximately 10-15% of breast cancer cases are hereditary and associated with mutations in BRCA1 or BRCA2 genes and some other genes having high to moderate penetrance such as TP53, PTEN, ATM, CHEK2, PALB2 and BRIP1 and ATR, which account for approximately 5% of the risk [4][5][6][7][8][9][10]. Common variants have also been identified in additional susceptibility loci and would explain a further ~16% of the 2-fold familial risk of breast cancer [11]. Among our French Canadian cohort, 24% of high-risk breast cancer families were found to be carriers of a deleterious BRCA1 or BRCA2 mutation [12].

Research Paper
Therefore, susceptibility alleles for more than half of the high-risk families remain unknown. A portion of these remnant breast cancer families could be explained by modulation of gene expression, which is mainly regulated through methylation or alternative splicing (AS) mechanisms. Indeed, more than 90% of human genes undergo alternative splicing and it is now becoming clear that AS plays an important role in human cancer development [13][14]. RNA sequencing allows a genomewide expression study of the transcriptome and can likely detect and quantify all coding and non-coding transcripts [15]. The use of RNA sequencing greatly enhanced our understanding of gene expression in cells [16].
Human immortalized lymphoblastoid cell lines (LCLs) provide information on gene expression without having to consider tissue specific expression [17]. LCLs used for the establishment of gene expression or splicing signatures are recognized as a reliable biological material to study a given disease [18][19][20][21][22][23][24][25][26][27][28], and some studies recently showed the heritability of splicing, as some exons were spliced in an allele-specific manner [18,29,30]. Moreover, it has been demonstrated that LCLs can be used to study life-course environmental epigenetics [31].
In this study, we performed RNA sequencing on LCLs isolated from BRCA1/2 and BRCAX affected and unaffected individuals coming from high-risk breast cancer families in an attempt to distinguish breast cancer subgroups based on their transcriptome profile. This study revealed several transcripts involved in regulation of translation, apoptosis, cell cycle as well as cell growth and proliferation, which could discriminate BRCAX individuals from BRCA1/2 subgroups.
The BRCAX subgroup included 16 affected and 16 unaffected individuals, which represented 16 pairs of sisters (1 affected and 1 unaffected per family). It should be noted that the oldest unaffected sister available was purposely selected in BRCAX families. This subgroup of unaffected sisters was used as controls for comparison purpose in the analyses described below. The mutational profile and relationship status of BRCA1, BRCA2 and BRCAX individuals are highlighted in Supplementary  Table 1. RNA-Seq analyses generated an average of 68 million reads per sample, and more than 85% of the reads were aligned to the hg19 human reference genome using TopHat (data not shown). As displayed in Supplementary  Table 2, out of a total of 173 259 transcripts detected, 95 transcripts (0.05 % of all transcripts) were found to be significantly (p<0.01) and differentially expressed based on the Bonferroni-corrected ANOVA analysis, when considering all four breast cancer subgroups (BRCA1, BRCA2, unaffected and affected BRCAX individuals). All these significant transcripts were encoded by 85 different genes. In addition to the main isoforms (one per gene), 10 mRNA isoforms were considered as alternatively spliced isoforms (11.8%). These significant transcripts included 54 gene isoforms showing a highly significant Bonferronicorrected p-value (p < 0.001).
Principal component analysis (PCA) of these 95 transcripts identified as differentially expressed among BRCA1-carriers (n=36), BRCA2-carriers (n=49), unaffected BRCAX (n=16) and affected BRCAX (n=16) individuals is presented in Figure 1. The first three principal components of transcriptional variation accounted for 59.6 % of the total variance. PCA on the full dataset showed that the PC1 component accounted for 46 % of the variance, which is highly informative, while PC2 was also informative compared to the variance explained in the randomized data set.
Unsupervised hierarchical clustering of all BRCA1, BRCA2 and BRCAX individuals was then performed using the 95 significant transcripts. As illustrated in Figure  2, gene expression levels of these significant transcripts allowed to discriminate distinctly BRCA1/2 from BRCAX (unaffected and affected) individuals. However, it was not possible to segregate BRCA1 from BRCA2 individuals as well as affected from unaffected BRCAX individuals. In addition, when considering BRCA1 and BRCA2 individuals, no specific clustering could be observed based on gene mutation or the status of the disease. Intragroup variance analysis using gene expression data was performed by Principle component analysis (PCA) for patients with the BRCA1 R1443X mutation (22 patients) and BRCA2 8765delAG mutation (44 patients). We did not find significance of BRCA1 R1443X and BRCA2 8765delAG mutation from their respective BRCA1 and BRCA2 subgroup. Thus, grouping of different mutations in BRCA1 or BRCA2 subgroup is justified (data not shown).
The ANOVA analysis followed by conservative post hoc Scheffé test, which is appropriate for comparing groups with unequal sample sizes, allowed to potentially identify transcripts discriminating BRCA1, BRCA2 and affected BRCAX individuals from unaffected BRCAX individuals, which were used as controls in this analysis. This analysis revealed 69, 71 and 28 gene isoforms differentially expressed from BRCAX unaffected for BRCA1, BRCA2 and affected BRCAX individuals, respectively (See Supplementary Table 2). It should be noted that the large majority of transcripts identified in BRCA1-carriers were also found in BRCA2-carriers.
As presented in Figure 3, these transcripts were then illustrated in Venn diagrams, which showed that 3 common transcripts (3.2%) were differentially expressed in all three subgroups, when compared to unaffected BRCAX individuals. In addition, a large portion of transcripts (65: 68.4%) was commonly identified in BRCA1 and BRCA2 subgroups, while only 1 transcript was specifically and exclusively associated with BRCA1, and another one different transcript with BRCA2. This illustrated the similarity between BRCA1 and BRCA2-carrier individuals regarding their gene expression profile. On the other hand, 23 gene isoforms were exclusively associated with affected BRCAX individuals and are not different in BRCA1 and BRCA2 subgroups. The name of the transcripts is presented in Supplementary Table 3.
In an attempt to further discriminate unaffected and affected BRCAX individuals, hierarchical clustering was then performed using the 28 gene isoforms discriminating both BRCAX subgroups (Figure 4). Although a much better clustering could be observed between both subgroups, these genes could not differentiate distinctly unaffected from affected BRCAX individuals, with 4 affected individuals being located among the unaffected individuals.
Further, we performed Scheffé analysis on all the 4 subgroups (BRCA1, BRCA2, unaffected and affected BRCAX) combined, this allowed us to identify specific transcripts, which are exclusively associated with each subgroup. As listed in Table 1, although no specific transcripts were specifically associated with BRCA1 or BRCA2 individuals, we could identify 67 transcripts specifically associated with BRCA1/BRCA2 following combination of both subgroups and compared to BRCAX individuals. In addition, 3 and 28 transcripts showed  Of interest, out of 67 transcripts specifically associated with BRCA1/BRCA2 subgroups combined, 19 transcripts (28%) were involved in DNA repair, cell proliferation, apoptosis and cell cycle, 32 (48%) different transcripts exert an action in transcription, translation as well as mRNA and protein metabolic processes, while 10 transcripts (15%) were involved in immune processes. Regarding the 28 transcripts exclusively associated with BRCAX affected individuals, more than 28% were involved in DNA repair/proliferation/apoptosis/cell cycle mechanisms, and approximately 43% (12 transcripts) were implicated in transcription and translation-related processes. It should be noted that XRCC6, SGSM3 and HDGF transcripts were associated with BRCA1/BRCA2, BRCAX unaffected and BRCAX affected individuals given that their expression was differentially expressed between all three subgroups. The expression of SGSM3 and HDGF was validated by qPCR in BRCAX subgroup to differentiate affected from non-affected patients (Supplementary Figure 1). The expression level of these genes was also checked in different mammalian breast cancer cell lines. Further, ANOVA analysis highlighted that H3F3B was differentially expressed in BRCA1 and BRCA2 subgroups, which was also evaluated by qPCR (Supplementary Figure 1).
The 85 genes associated with the 95 significant transcripts identified as differentially expressed between BRCA1, BRCA2 and BRCAX (unaffected or affected) individuals were then submitted for pathway and molecular function analyses. Using Ingenuity Pathway analysis, enrichment of several canonical pathways and functions were identified. It should be noted that mapped genes can be classified in more than one biological process or metabolic process.
Moreover, as described in Table 2 Of great interest, these genes were also associated with Cancer and Organismal Injury and Abnormalities ( Table 2).
The IPA analysis was also performed using the 28 genes discriminating unaffected from affected BRCAX individuals. As presented in Table 3, BRCAX-related genes were particularly associated with Telomere Extension by Telomerase, DNA Double-Strand Break Repair by Non-Homologous End Joining as well as specific molecule degradation and biosynthesis (p-value ranging from 1.66 × 10 −4 to 0.05). The top system development and functions represented by these genes were tissue development (top p-value: 1.28 × 10 −3 with 5 molecules), embryonic development (top p-value: 1.28 × 10 −3 ) and Immune Cell Trafficking (top p-value: 1.28 × 10 −3 ) (data not shown).
In addition, cell-to-cell signaling and interaction, cellular development, cellular growth and proliferation, cellular movement and lipid metabolism (top p-value at 1.28 × 10 −3 ) represented the enriched functions ( Table  4). As also described in this table, IPA analysis revealed that these genes were involved in networks such as "Cell death and survival" as well as "Connective tissue disorders and metabolic diseases" with 38 and 26 molecules, respectively.

DISCUSSION
In this study, we analyzed the genome-wide transcription profile observed in LCLs immortalized from high-risk breast cancer families. To our knowledge, this is the first study describing clustering of BRCA1, BRCA2 and unaffected/affected BRCAX individuals based on their whole gene expression profile observed in corresponding LCLs.
The reliability of using LCLs from affected individuals for a given disease with regard to expression studies or splicing signatures has already been established [18][19][20][21][22][23][24][25][26][27][28]. A particular study conducted by Hussain and colleague concluded that LCLs were a good reflection of isolated lymphocytes given their close resemblance at the genetic and phenotypic levels to parent lymphocytes and were a valuable resource for studies regarding genotype-phenotype interactions [39] and inter-individual variations associated with various diseases and disorders such as cancer or infectious disease [40][41][42]. In addition, peripheral blood mononuclear cells (PBMCs) have also been used to investigate the links between DNA damage response, immunity and cancer [43] and to study the early stages of breast cancer development on gene expression patterns [44]. As described in other studies [45][46][47], we used ANOVA analysis of variance with the Scheffé multiple post-hoc test to identify transcripts specifically regulated in BRCA1, BRCA2 as well as in unaffected and affected BRCAX individuals. Using gene expression data of the 95 significant gene isoforms, our clustering results allowed discrimination of BRCA subgroups, particularly BRCA1/2 from BRCAX individuals. This is in agreement with other studies in which gene expression in LCLs was successfully used for clustering analysis in several diseases including autism spectrum disorders and spinocerebellar ataxia (SCA28) [48,49]. Moreover, LCLs served as a model system to assess genotype-phenotype relationships in human cells, including studies for quantitative trait loci influencing levels of individual mRNAs and responses to drugs and radiation [50][51][52][53], as well as regarding the haploinsufficiency effects of various BRCA1 mutation [54].
Over the last decade, several investigations clustered breast tumors based on single gene expression levels as well as their gene/splicing expression profile or molecular and clinical characteristics. The first correlation between the tumor phenotypic diversity (histopathological and clinical characteristics) and gene expression patterns was demonstrated in 2000 by Perou and colleague [55][56][57][58]. Van't Veer et al. conducted clustering analyses of breast tumors based on their gene expression profile and determined a predictive signature of metastases development (poor prognosis) in patients without tumoral cells in local lymph node at diagnosis and established a specific signature of BRCA1 tumors [56]. Clustering analyses of breast tumors based on whole gene expression revealed familial aggregation of BRCA-related tumors and of specific molecular subtypes including Basal, HER2enriched, Luminal A and B as well as Normal-like and sporadic tumors [38,[55][56][57][59][60][61][62][63][64][65][66][67][68][69][70]. In addition, alternative splicing expression profile was also successfully used in several studies aiming to discriminate subtypes of breast tumors [71][72][73], and specific gene expression profiles have also been identified for BRCA1, BRCA2 and CHEK2associated breast tumors [62,74].
To our knowledge, the only clustering analysis involving LCLs in breast cancer classification distinguished BRCA1 carrier from non-carrier individuals [75], in which 133 genes were found to be differentially expressed between BRCA1-mutated and non-carriers. However, hierarchical clustering of these genes did not result in an accurate discrimination between both subgroups. Of these 133 genes identified by Vuillaume et al. [75], the RPL29 and PSMF1 genes have also been identified in our comparison between BRCA1/2 and BRCAX individuals. RPL29 is a ribosomal protein, involved in RNA interaction and protein synthesis, while PSMF1 gene encodes a proteasome inhibitor protein involved in protein folding and degradation [76,77].
We then compared our results with GTEx Portal database, which contains normalized expression data from RNA sequencing for each gene and transcripts for different types of tissues. The normalization of expression was done by similar method for all databases. The expression values for genes from EBV transformed lymphocytes are available, and the normalized expression values in this database are similar with the ones we obtained [78]. In addition, we performed correlation between TPM value and FPKM value by doing a regression analysis using the values for each sample in the analysis and we found significant correlation between both of them.
We also compared our gene lists with the lists for up and down regulated genes associated with breast cancer using BioXpress, a curated gene expression and disease association database using RNA sequencing from TCGA database [79]. A certain number of our significant genes were common. Indeed, there were 5 genes that were present in our list and in the up regulated list for   Regarding the list for down regulated genes associated with breast cancer, 3 genes were also present in our list (HLA-DPA1, PCDH9 and NCALD). Among the highest significant genes associated with BRCA1/2 subgroups, some of them (p-values ranging from 1.1 × 10 −7 to 4.5 × 10 −6 ) namely NOSIP, EIF2AK1, TBCB, XRCC6, SGSM3 and HDGF are involved in key mechanisms implicated in carcinogenesis susceptibility.
The expression of the NOSIP gene was upregulated and EIF2AK1 and TBCB were downregulated in BRCA1/2 individuals when compared to BRCAX subgroups. The eNOS Interacting Protein NOSIP was identified as an interacting protein of the endothelial isoform of nitric oxide synthase (eNOS) to enhance its translocation to intracellular membrane [80].
The EIF2AK1 gene is involved in the modulation of the basal hepatic endoplasmic reticulum stress tone [81]. Although no information links this gene to breast cancer, it has been demonstrated in a mouse xenograft model of human breast cancer that an activator of EIF2AK1 protein was associated with tumor growth inhibition compared with vehicle [82]. Thus, its downregulation in BRCA1/2 individuals could promote tumor development.
Regarding TBCB, this protein is involved in regulation of axonal growth and microtubule functional diversity and dynamics [83]. This protein was shown to be overexpressed and phosphorylated in breast tumors [84]. Therefore the effect of its decreased expression in BRCA1/2 individuals remains to be investigated.
Of great interest, specific expression levels of XRCC6, SGSM3 and HDGF genes were also associated with BRCAX individuals (unaffected and affected). These genes are involved in DNA repair as well as in the regulation of cell cycle, cell proliferation and transcription, and were differentially expressed between BRCA1/2, unaffected BRCAX and affected BRCAX individual. These genes showed the highest expression in affected BRCAX and the lowest expression in BRCA1 and BRCA2 subgroups. Indeed, a similar and progressive pattern of expression values for all three genes was observed between subgroups (BRCAX affected > BRCAX unaffected > BRCA1/2 individuals) as presented in Supplementary Table  2. Futhermore, the study highlighted few genes which could discriminate BRCA1 from BRCA2 subgroup, amongst them the highest difference was depicted by H3F3B.
XRCC6 gene encodes the Ku70 protein, which is a component of the non-homologous end joining (NHEJ) DNA repair pathway. This pathway is an alternative mechanism to homologous recombination (HR) repair pathway involved in double-strand break (DSB) repair in mammalian cells [85]. Defect or variation of expression of NHEJ genes such as XRCC6, might escape cell cycle checkpoint surveillance and could lead to suboptimal DNA repair and subsequently to accumulation of DNA damage and carcinogenesis initiation [86][87][88]. Given the key roles of BRCA1/2 in HR repair pathway [89], defective activity of BRCA1/2 proteins found in some individuals combined with low expression of NHEJ-associated genes could likely increase the accumulation of DNA damage in BRCA1/2 individuals. Indeed, this low expression of Ku70 was previously observed in BRCA1-deficient cell lines [90]. On the other hand, the high expression of XRCC6 in BRCAX individuals affected with breast cancer remains to be elucidated. Moreover, polymorphisms in XRCC6 gene have been shown to increase breast cancer susceptibility as well as other types of cancer [91][92][93][94][95][96].
SGSM3 is a member of the small G protein signaling modulators, which is associated with small G protein coupled receptor signal transduction Analyses were performed using genes associated with significantly and differentially expressed transcripts between BRCAX affected and BRCAX unaffected using bonferroni correct p-value <0.01. www.impactjournals.com/oncotarget pathway [97]. The human SGSM3 proteins were demonstrated to coprecipitate with RAP and RAB subfamily members of the small G protein superfamily. Therefore it has been suggested that the SGSM family members exert a role as modulators of the small G protein RAP and RAB-mediated neuronal signal transduction and vesicular transportation pathways [97]. The only information in the literature associating this protein with breast cancer described a decrease of SGSM3 mRNA in breast cancer tissue compared to normal tissue [98], which is in contrast with the significant increase of expression in LCLs of BRCAX affected individuals observed in our study. However, in the study performed by Nourashrafeddin et al. using basic RT-PCR method and visualization on agarose gel [98], SGSM3 was not detected in normal and cancerous tissues, illustrating the very low expression of this gene in breast tissues. It should be noted that a polymorphism (rs17001868) found in the SGSM3 gene has been associated with mammographic dense areas of the breast [99], which represents a factor of breast cancer risk [100][101][102]. SGSM3 has also been associated with hepatocellular carcinoma [103]. The Hepatoma-derived growth factor (HDGF) is now recognized as a breast cancer-associated gene and promotes the epithelial-mesenchymal transition (EMT) [104]. EMT is a hallmark of many cancers characterized by an increased cell invasion, which enhances the initial phase of metastatic progression [105,106]. HDGF is overexpressed in several types of cancers including breast cancer cell lines and tissues and correlates with poor prognosis [104,[107][108][109][110][111][112]. Blockade of HDGF using a specific antibody results in the inhibition of malignant features and EMT of breast cancer cells [104]. Thus, its overexpression in breast cancer tissues is in concordance with our results demonstrating the overexpression of HDGF in LCLs of BRCAX individuals affected with breast cancer. Hence, this protein could be considered as a prognostic factor for tumor metastasis and recurrence.
H3 Histone Family Member 3B (H3F3B) is part of core histone molecule and has a role in gene regulation, DNA repair, DNA replication and chromosomal stability. Mutations in H3F3B gene have been associated with several cancers including brain cancer, giant cell tumor of bone and colorectal cancer [113][114][115]. Overexpression of this gene is also associated with colorectal cancer. In breast cancer, the copy number of the chromosome carrying this gene is significantly high [116], which is further confirmed by the data from the human protein atlas data.
In addition to XRCC6, SGSM3 and HDGF genes as described above, ZNF687 and FGF23 genes were also associated with and upregulated in BRCAX affected individuals when compared to unaffected individuals.
ZNF687 encodes a zinc finger protein and represents an important regulator of skeletal development and maintenance [117]. Overexpression of ZNF687 has been observed in tumor tissue of individual giant cell tumor of bone associated with Paget disease of bone, and this high expression is also observed in the peripheral blood of patients affected with Paget disease [118]. The role of ZNF687 in breast cancer is unknown.
As to fibroblast growth factor 23 (FGF23), it is a binding partner of Klotho proteins for endocrine signaling through the action of FGFRs. These FGFR receptors are involved in several mechanisms such as regulation of cell survival, proliferation, differentiation and motility during embryogenesis as well as tissue homeostasis and carcinogenesis [119][120][121][122]. Indeed, FGF23 signaling promotes proliferation in myeloma cells [123], while increase of FGF23 levels in serum were observed in The number of genes involved in the process is given in parentheses.
Analyses were performed using genes associated with significantly and differentially expressed transcripts between BRCAX affected and BRCAX unaffected using bonferroni correct p-value <0.01.
cancer patients, and were also elevated in patients with non-cancerous diseases, such as hypophosphatemic rickets and chronic kidney diseases [124]. Considering all significant genes identified following ANOVA analysis of gene expression data observed in four BRCA subgroups, several interesting pathways seem to be affected by the regulation of specific genes. Among these pathways, EIF2 signaling, 14-3-3-mediated pathway and mTOR signaling are particularly significant and relevant to breast cancer. Moreover, both the EIF2 (through the activation of the PI3K pathway) and 14-3-3-mediated signaling cascades regulate the mTOR pathway [125,126], which is involved in the response to hormones and growth factor stimulation and is well known to exert a significant role in tumor cell growth and proliferation as well as in breast cancer development [127] and references therein.
On the other hand, cell death and survival, cellular function and maintenance as well as cell cycle represent the highest enriched functions, while cancer remains among the diseases/disorders having the highest p-value, which is associated with a high number of genes involved in cancer-related pathways.
Taken together, in the present study, we compared gene expression profiles in lymphoblastoid cell lines in BRCA1-and BRCA2-carriers as well as BRCAX affected and unaffected individuals from high-risk breast cancer families in order to determine specific markers which could be of great relevance for further studies. Indeed, several transcripts have been identified as potential valuable markers of interest for breast cancer, and deserve further analysis.

Ascertainment of high-risk families
Recruitment of high-risk French Canadian breast and ovarian cancer families started in 1996 as a large interdisciplinary research program designated INHERIT BRCAs [12]. The High risk group is defined as families with a history of breast cancer with at least 3 cases in 1 st degree relative or 4 cases in 2 nd degree relative, the full selection criteria have been published previously [12]. Patients were screened for deleterious mutations in BRCA1 and BRCA2 genes. The BRCA testing was done by complete sequencing of the BRCA1/2 gene by using primers in both directions (forward and reverse). Confirmation was done by Myriad Laboratories. A subset of 96 high-risk families with no deleterious mutation in BRCA1 or BRCA2 were recruited (BRCAX families) as described elsewhere [128,129]. For the purpose of this study, carriers of a BRCA1 or BRCA2 mutation were selected. As for the BRCAX families, the youngest available breast cancer case in the family was selected, along with the oldest non-affected sister. All unaffected women were post-menopausal. All individuals provided their written informed consent in order for their genetic material to be part of a biobank (Dr J. Simard, director

Cell line immortalization and RNA extraction
Lymphocytes (LCLs) were isolated and immortalized from 7 to 9 mL of blood samples from breast cancer individuals using Epstein-Barr virus in 15% RPMI medium as previously described [128,130,131]. Total RNA was extracted from LCLs using TRI Reagent (Molecular Reasearch Center Inc, Cincinnati, OH, USA) according to the manufacturer's instructions as described previously [128]. The viral strain, number of passage and conditions for cell lines were kept identical to avoid bias in gene expression [133].

RNA-seq experiments
The quality of RNA samples was evaluated with an Agilent Bioanalyzer 2100 to determine the RIN (RNA Integrity) score using the Agilent RNA 6000 Nano chip and reagents. Samples with a RIN score >7 were retained and converted to cDNA with the Illumina RNA seq kit for sequence library preparation based on the Illumina TruSeq RNA Sample Preparation protocol. The final libraries were pooled in triplicate and then sequenced on an Illumina HiSeq 2000 at the McGill University and Génome Québec Innovation Centre.

Statistical analysis
Statistical analyses were carried out using the R Package v3.3. In regard to mRNA levels, One-factor analysis of variance (ANOVA) was performed to compare the breast cancer subgroups. First a model was fitted using the lm function from the stats package then the ANOVA analysis was performed with the Anova command from the car package [139][140]. Bonferroni correction was performed with the p.adjust function from the stats package using "BH", "BY" and "bonferroni" methods and statistically significant differences were considered at p < 0.01 for the "bonferroni" method [140]. The Scheffé test was carried out with the scheffé test function from the agricolae R package for posthoc analysis for comparisons between two of the multiple groups [141]. We performed intra-group variance analysis using gene expression data of patients with the BRCA1 R1443X mutation and BRCA2 8765delAG mutation by Principle component analysis (PCA).

Pathways, network and clustering analyses
Partek Genomics Suites® software package (copyright © 2009 Partek Incorporated. St. Louis, MO) was used for hierarchical clustering using the default setting (Euclidean dissimilarity and average linkage method) as well as for Principal component analysis (PCA).
Identification of overrepresented pathways, functions and gene-associated diseases were performed using QIAGEN's Ingenuity® Pathway Analysis (IPA®, QIAGEN Redwood City, www.qiagen.com/ingenuity) software. Default settings in IPA for expression dataset analyses were used for gene list functional analysis. Gene lists were uploaded using NCBI Entrez gene IDs or gene symbols and submitted for IPA Core Analysis. IPA calculates p-values that reflect the statistical significance of association between the genes and the networks by the Fisher's exact test. Pvalue ≤ 0.05 were considered significant.