Patient survival and tumor characteristics associated with CHEK2:p.I157T – findings from the Breast Cancer Association Consortium

Background P.I157T is a CHEK2 missense mutation associated with a modest increase in breast cancer risk. Previously, another CHEK2 mutation, the protein truncating c.1100delC has been associated with poor prognosis of breast cancer patients. Here, we have investigated patient survival and characteristics of breast tumors of germ line p.I157T carriers. Methods We included in the analyses 26,801 European female breast cancer patients from 15 studies participating in the Breast Cancer Association Consortium. We analyzed the association between p.I157T and the clinico-pathological breast cancer characteristics by comparing the p.I157T carrier tumors to non-carrier and c.1100delC carrier tumors. Similarly, we investigated the p.I157T associated risk of early death, breast cancer-associated death, distant metastasis, locoregional relapse and second breast cancer using Cox proportional hazards models. Additionally, we explored the p.I157T-associated genomic gene expression profile using data from breast tumors of 183 Finnish female breast cancer patients (ten p.I157T carriers) (GEO: GSE24450). Differential gene expression analysis was performed using a moderated t test. Functional enrichment was investigated using the DAVID functional annotation tool and gene set enrichment analysis (GSEA). The tumors were classified into molecular subtypes according to the St Gallen 2013 criteria and the PAM50 gene expression signature. Results P.I157T was not associated with increased risk of early death, breast cancer-associated death or distant metastasis relapse, and there was a significant difference in prognosis associated with the two CHEK2 mutations, p.I157T and c.1100delC. Furthermore, p.I157T was associated with lobular histological type and clinico-pathological markers of good prognosis, such as ER and PR expression, low TP53 expression and low grade. 
Gene expression analysis suggested luminal A to be the most common subtype for p.I157T carriers and CDH1 (cadherin 1) target genes to be significantly enriched among genes, whose expression differed between p.I157T and non-carrier tumors. Conclusions Our analyses suggest that there are fundamental differences in breast tumors of CHEK2:p.I157T and c.1100delC carriers. The poor prognosis associated with c.1100delC cannot be generalized to other CHEK2 mutations. Electronic supplementary material The online version of this article (doi:10.1186/s13058-016-0758-5) contains supplementary material, which is available to authorized users.


Background
Checkpoint kinase 2 (CHEK2) is a moderate penetrance breast cancer risk gene. The two most frequent CHEK2 mutations in European populations are p.I157T and c.1100delC. Truncating CHEK2 founder mutations (c.1100delC, IVS2+1G>A, del5395) confer a higher than twofold increase in the risk of breast cancer [1][2][3], whereas p.I157T (c.470T>C, rs17879961), a CHEK2 missense mutation, is associated with a milder, 1.4-fold elevation in the risk [4]. The c.1100delC carrier frequency is highest in the Netherlands and in Finland (over 1 %), the other two truncating founder mutations are found mainly in Poland [3], and p.I157T is most frequent in Finland and in Poland (around 5 %) [5]. Additionally, dozens of rare CHEK2 missense mutations have been found in breast cancer patients, but their contribution to disease risk is minor on a population level and their causative role in disease development probably varies greatly [6][7][8].
The consequences of c.1100delC and p.I157T differ on a molecular level, but both have been shown to severely interfere with the CHEK2 protein activity. C.1100delC is a loss-of-function mutation that induces a premature termination codon in the kinase domain in exon 10 (ter381), leading to nonsense-mediated mRNA decay, which reduces both mutated and overall CHEK2 mRNA level [9,10]. C.1100delC truncates the C-terminal kinase domain of the CHEK2 protein. The truncated protein is unstable and practically undetectable in mutation carrier cells [9]. Isoleucine 157 is required for several van der Waals interactions at the interface of the forkhead-associated (FHA) and kinase domains of dimerizing CHEK2 peptide chains. Its replacement by threonine (p.I157T) has been shown to interfere with these interactions and to severely impede the CHEK2 homodimerization required for its activation [11]. Furthermore, ectopic expression of human CHEK2:p.I157T failed a rad53/sml complementation assay in yeast, suggesting an impaired protein function [6]. Thus, p.I157T possibly disturbs CHEK2 function by competing with the wild-type protein in dimer formation in heterozygous cells in a dominant negative manner [4].
Since both p.I157T and c.1100delC cause increased risk of breast cancer and compromise the activity of the CHEK2 protein, the question remains whether their effects on patient prognosis would be proportional to their risk effects and how similar the breast cancer phenotypes associated with the mutations would be. C.1100delC is associated with bilateral disease and estrogen receptor (ER)-positive tumors [12][13][14]. However, although tumors from p.I157T carriers are also predominantly ER-positive [15], tumors from p.I157T and c.1100delC carriers are associated with phenotypically different types of breast cancer. The lobular histological type is overrepresented among p.I157T mutation carrier tumors [16], whereas the c.1100delC carrier tumors are typically ductal [13,14].
We have previously reported CHEK2:c.1100delC heterozygosity to be associated with reduced overall and disease-free survival as well as with increased risk of breast cancer-specific death in a Breast Cancer Association Consortium (BCAC) data set combining mutation carriers from multiple European populations [17]. Here, we report a study investigating thoroughly the prognostic associations of CHEK2:p.I157T as well as pathologic characteristics and genomic gene expression profiles of breast tumors from carriers of germ line p.I157T.

Study subjects for survival and pathology analyses
We included in the analyses female invasive breast cancer patients of European ancestry with a first invasive primary breast cancer enrolled in 15 studies participating in the Breast Cancer Association Consortium (BCAC) (Additional file 1: Table S1). In order to be able to stratify the analyses by study, only BCAC studies providing genotype and survival data of about ten CHEK2:p.I157T carriers were included in the analyses (Additional file 1: Table S2). Altogether, the data set consisted of 26,801 study subjects, of which 590 carried germ line p.I157T and 271 carried c.1100delC mutations (Table 1).  [18]. Discordant genotyping results were clarified with Sanger sequencing. CHEK2:c.1100delC was genotyped by independent studies using mainly TaqMan (Additional file 1: Table S1), as described earlier [17].

Pathology analysis
Pathology data was collected from hospital records or from scientific projects within the individual studies, as described previously [19]. Additionally, the TP53 protein expression was measured by individual studies using immunohistochemical staining as described in Additional file 1: Table S3. The pathology data availability and mutation carrier frequencies varied between independent BCAC studies and therefore all analyses were stratified by study. Pathology analyses were performed using R environment for statistical computing version 3.0.2 [20] including packages vcdExtra [21] and meta [22]. Comparisons were made between CHEK2 mutations carriers (heterozygous or homozygous) and non-carriers, for both p.I157T and c.1100delC, as well as between carriers of p.I157T and c.1100delC (Table 1). Associations between the mutations and clinico-pathological characteristics were tested with study-stratified Cochran-Mantel-Haenszel test (mantelhaen.test for categorical characteristics and CMHtest for ordinal characteristics). The category of missing data was not included in these comparisons. Differences in age at diagnosis were tested by meta-analysis of age distribution in independent studies using a random effects model (metacont).
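The study-stratified comparison described above can be illustrated with a minimal sketch of the Cochran-Mantel-Haenszel test for a binary characteristic. This is not the R code used in the study: the function name `cmh_test` and the table layout are assumptions made for illustration, and no continuity correction is applied (R's `mantelhaen.test` applies one by default, so its results may differ slightly).

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test over a list of 2x2 tables, one per study.

    Each table is a tuple (a, b, c, d):
        a = carriers with the feature,     b = carriers without it,
        c = non-carriers with the feature, d = non-carriers without it.
    Returns (chi-square statistic, p value, Mantel-Haenszel pooled odds ratio).
    No continuity correction is applied (an assumption of this sketch).
    """
    observed = expected = variance = 0.0
    or_numerator = or_denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        carriers, with_feature = a + b, a + c
        observed += a
        expected += carriers * with_feature / n
        variance += carriers * (c + d) * with_feature * (b + d) / (n * n * (n - 1))
        or_numerator += a * d / n
        or_denominator += b * c / n
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, or_numerator / or_denominator


# Hypothetical counts for two studies, for illustration only.
example_tables = [(12, 8, 30, 50), (9, 11, 20, 40)]
chi2, p_value, pooled_or = cmh_test(example_tables)
print(f"chi2={chi2:.2f}, p={p_value:.3f}, pooled OR={pooled_or:.2f}")
```

Because each study contributes its own table, differences in carrier frequency or data availability between studies do not distort the pooled estimate, which is the reason for stratifying.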

Survival analysis
Survival analyses were performed using the Cox regression [23] as implemented in Stata (Stata/SE 10.1 for Windows, StataCorp LP, College Station, TX, USA) comparing CHEK2 mutation carriers and non-carriers, as described above. Study subjects were considered to become at risk at the time of their first invasive breast cancer diagnosis. The data did not consist entirely of incident cases. Therefore, in order to avoid bias caused by late enrollment, we implemented a method called left censoring, which has been proven to provide robust survival estimates for data, which includes also prevalent cases [24]. Survival analysis endpoints included death of any cause, breast cancer-associated death, distant metastasis relapse, locoregional relapse and second breast cancer. Patients were censored at the end of their follow-up period or at the latest 15 years after the initial breast cancer diagnosis in analyses of overall survival and second breast cancer, but at the latest 10 years in analyses of locoregional or distant relapse-free survival as well as in analyses of breast cancer-specific survival. Patients presenting with distant metastases at diagnosis were excluded from the analyses of locoregional relapse-free survival. All analyses were stratified by study.
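The late-enrollment adjustment and the follow-up limits described above can be sketched as follows. The sketch assumes that the adjustment is implemented as delayed entry into the risk set, so that a patient contributes to the risk set only after enrollment, and it uses a Kaplan-Meier estimate rather than the Cox model fitted in Stata. The function names and the example patients are hypothetical.

```python
def follow_up_window(entry_years, exit_years, had_event, horizon_years):
    """Clip one patient's follow-up to the analysis horizon.

    All times are years since the first invasive breast cancer diagnosis.
    entry_years is the delay between diagnosis and enrollment (0 for
    incident cases). Returns (entry, exit, event), or None if the patient
    contributes no time at risk within the horizon.
    """
    if entry_years >= horizon_years or exit_years <= entry_years:
        return None
    if exit_years > horizon_years:
        # Censored at the horizon; a later event is not counted.
        return entry_years, horizon_years, False
    return entry_years, exit_years, had_event


def kaplan_meier_delayed_entry(windows):
    """Product-limit estimate in which a patient is at risk at time t
    only if entry < t <= exit, so time before enrollment is not counted."""
    event_times = sorted({exit_t for _, exit_t, event in windows if event})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for entry, exit_t, _ in windows if entry < t <= exit_t)
        events = sum(1 for _, exit_t, event in windows if event and exit_t == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical patients: (years to enrollment, years to end of follow-up, event).
patients = [(0.0, 2.0, True), (3.0, 8.0, True), (0.0, 12.0, False), (1.0, 5.0, False)]
windows = [w for w in (follow_up_window(*p, horizon_years=10.0) for p in patients) if w]
print(kaplan_meier_delayed_entry(windows))
```

In this example the second patient enrolled three years after diagnosis and is therefore absent from the risk set at the event at two years. Counting that patient as at risk from diagnosis would credit survival time that could not have been observed to end in an event.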
In addition to univariate analyses, we performed multivariate analyses, which were stratified by study and age category (≤50 years; >50 and ≤70; >70), and adjusted for tumor grade (1, 2 or 3, ordinal), tumor size (1: maximum diameter less than or equal to 20 mm; 2: more than 20 mm and less than or equal to 50 mm; 3: over 50 mm, ordinal), tumor spread in axillary lymph nodes (0 = negative, 1 = positive) and progesterone receptor (PR) status (0 = negative, 1 = positive). ER was not included in the model, because of the non-linear relationship between tumor ER status and patient survival during the 10 years following the diagnosis; patients with ER-negative tumors have a higher risk of dying from breast cancer during the first 5 years after the diagnosis, but the difference in risk between ER-positive and ER-negative tumors levels out after that period [17,25]. However, since several studies have reported an association between the two CHEK2 mutations and ER-positive disease [12][13][14] (Table 1), we performed the survival analyses in a subgroup of patients with ER-positive tumors. Only cases with complete data on the pathological markers were included in the multivariate analyses. Univariate survival analyses were performed also in a subgroup of breast cancer patients with lobular tumors, because of the association between p.I157T and lobular breast cancer [15] (Table 1).

Table 1 footnotes: The "missing" category included also rare forms of breast cancer, which did not belong to the named categories: 1179 non-carriers, 25 p.I157T carriers and 9 c.1100delC carriers. ‡ Tumor subtypes are defined according to ER, PR and Her2 expression following the St Gallen 2013 guidelines [34]. Italics is used to indicate the proportion of study subjects in each category, e.g. 'ER-positive/all with known ER-status' or 'missing/all study subjects'.

Study subjects for gene expression analysis
Gene expression analyses were performed using a data set of 183 breast tumors from the Helsinki University Hospital (GEO: GSE24450). As described previously, the data set consisted of total RNA samples from 151 tumors from unselected cohorts of breast cancer patients and 32 tumors from additional familial cases hybridized on Illumina HumanHT-12 v3 Expression BeadChips (Illumina Inc., San Diego, CA, USA) [10,26]. The p.I157T carrier status was defined from peripheral blood samples as described earlier for the BCAC study 'HEBCS' (Additional file 1: Table S1). Ten patients were germ line p.I157T carriers and 162 were non-carriers, of which six carried germ line c.1100delC. The c.1100delC carrier tumors were included in the analyses as non-I157T carriers. The p.I157T genotype information was not available for 11 study subjects. These were included in the molecular subtype analysis, but not in differential gene expression or gene set enrichment analysis. The clinico-pathologic characteristics of the 183 tumors are provided in Additional file 1: Table S4.

Gene expression analysis
Gene expression data quality control and quantile normalization were performed in Bioconductor [27] as described earlier [26]. Data analyses were performed in R version 3.0.2 using the Bioconductor packages genefu [28], limma [29,30] and geneplotter [31]. Probes not mapping to any current Entrez Gene entities (GRCh38.p2) were excluded, resulting in a filtered data set of 20,145 genes.
For determining the intrinsic molecular subtypes, expression data of the fifty PAM50 signature genes was extracted from the filtered data set, median centered and standardized per gene by dividing with the standard deviation of the gene's expression values. Intrinsic subtypes were defined by Pearson correlation between tumors and the luminal A, luminal B, human epidermal growth factor receptor 2 (Her2)-enriched, basal-like and normal-like centroids as implemented in the genefu package [28,32]. Hierarchical clustering was performed using Ward's method [33]. As a comparison to the subtype classification by gene expression, we used the surrogate clinico-pathologic markers to define the subtypes following the St Gallen 2013 criteria (luminal A: ER+, PR+, Her2-, Ki-67-; luminal B (three marker combinations): ER+, PR-, Her2- or ER+, Her2-, Ki-67+ or ER+, Her2+; basal: ER-, PR-, Her2-; Her2 overexpressing: ER-, PR-, Her2+) [34].
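The centroid-based subtype assignment described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions and not the genefu code: the function names are hypothetical, only three genes and three centroids are shown instead of the fifty PAM50 genes and five centroids, and all expression and centroid values are invented.

```python
import math
import statistics


def standardize_genes(expression):
    """expression maps gene -> list of values (one per tumor). Each gene is
    median centered and divided by its own standard deviation."""
    scaled = {}
    for gene, values in expression.items():
        median = statistics.median(values)
        sd = statistics.stdev(values)
        scaled[gene] = [(v - median) / sd for v in values]
    return scaled


def pearson(x, y):
    """Pearson correlation; both vectors must have non-zero variance."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def assign_subtypes(expression, centroids):
    """centroids maps subtype -> {gene: value}. Each tumor receives the
    subtype whose centroid correlates best with its standardized profile."""
    scaled = standardize_genes(expression)
    genes = sorted(scaled)
    n_tumors = len(scaled[genes[0]])
    calls = []
    for i in range(n_tumors):
        profile = [scaled[g][i] for g in genes]
        calls.append(max(
            centroids,
            key=lambda s: pearson(profile, [centroids[s][g] for g in genes]),
        ))
    return calls


# Invented values for three tumors; real centroids come from the PAM50 model.
toy_expression = {"ESR1": [9.0, 3.0, 3.5], "ERBB2": [5.0, 5.5, 9.5], "KRT5": [2.0, 8.0, 2.5]}
toy_centroids = {
    "luminal A": {"ESR1": 1.0, "ERBB2": -0.5, "KRT5": -0.5},
    "basal-like": {"ESR1": -0.5, "ERBB2": -0.5, "KRT5": 1.0},
    "Her2-enriched": {"ESR1": -0.5, "ERBB2": 1.0, "KRT5": -0.5},
}
print(assign_subtypes(toy_expression, toy_centroids))
```

Because the per-gene centering and scaling depend on the whole data set, the call for a given tumor can change with cohort composition, which is one reason why correlation-based calls and marker-based calls may disagree.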
For analysis of differential gene expression the data was filtered by including only genes with highest variation in expression levels over the entire data set (st. dev. ≥ 0.75, 1852 genes). The samples from p.I157T carriers were compared to samples from non-carriers with a moderated t test adjusting for ER, tumor protein 53 (TP53) and Ki-67 protein expression (positive/negative), tumor grade (1, 2, 3, ordinal) as well as histological type (lobular/other). The adjusting covariates were selected from features tabulated in Additional file 1: Table S4 as the most significant factors (p < 0.001) explaining variation in the expression of the 1852 genes as summarized by the first four principal components. Additionally, lobular histologic type was included to avoid bias caused by the association between the p.I157T and lobular type. Data on at least one of the adjusting variables was missing for 12 tumor samples and thus the differential gene expression and gene set enrichment analyses were performed with a set of 160 (ten p.I157T and 150 non-carrier) tumor samples and 1852 genes. Genes with p values below 0.01 were considered to be associated with p.I157T. These were included in a functional enrichment analysis performed using the DAVID functional annotation tool [35]. Functional annotations with Benjamini-Hochberg [36] corrected p values below 0.01 were considered to be significantly enriched.
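Two generic steps from the paragraph above, the variance filter and the Benjamini-Hochberg correction, can be sketched as follows. The function names and example values are hypothetical. The sketch reproduces neither the limma moderated t test nor the internals of DAVID and only shows how the standard deviation threshold and the standard step-up adjustment operate.

```python
import statistics


def filter_by_sd(expression, min_sd=0.75):
    """Keep genes whose expression standard deviation across all tumors
    is at least min_sd. expression maps gene -> list of values."""
    return {
        gene: values
        for gene, values in expression.items()
        if statistics.stdev(values) >= min_sd
    }


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that the adjusted values
    # stay monotone in the rank of the raw p values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented values, for illustration only.
toy = {"stable_gene": [1.0, 1.1, 0.9], "variable_gene": [0.0, 2.0, 4.0]}
print(sorted(filter_by_sd(toy)))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Filtering on overall variability does not use carrier status, so it reduces the number of tests without selecting genes on the basis of the comparison itself.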
Gene set enrichment analysis (GSEA) was performed using a Java application available at http://software.broadinstitute.org/gsea following the instructions of the user guide [37]. For the GSEA analysis, the 1852 genes were ranked according to a score calculated as the product of log2(fold change) and log10(p value) from comparisons of p.I157T carrier and non-carrier tumors as described above. All gene sets available at the Molecular Signatures Database (MSigDB) v5.0 [38] were included in the analyses. The p values were corrected for false discovery rate for all gene sets except those originating from single publications ('CGP: chemical and genetic perturbations' database), which were corrected for the family-wise error rate. Gene sets with corrected p value below 0.05 were considered to be significantly enriched in the p.I157T carrier tumors.
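The ranking score described above can be sketched as follows. Taken literally, the product of log2(fold change) and log10(p value) is negative for genes with higher expression in carriers, because log10 of a p value below 1 is negative. Many pipelines use -log10(p value) instead, and which sign convention was applied here is not stated, so the `negate_log_p` switch below is an assumption of this sketch. The function names are hypothetical, and only the CDH1 values are taken from the text; the other values are invented.

```python
import math


def gsea_rank_score(log2_fold_change, p_value, negate_log_p=False):
    """Score = log2(fold change) * log10(p value), as stated in the text.
    With negate_log_p=True the score uses -log10(p value) instead, so that
    genes with higher expression in carriers receive positive scores."""
    log_p = math.log10(p_value)
    if negate_log_p:
        log_p = -log_p
    return log2_fold_change * log_p


def rank_genes(stats, negate_log_p=False):
    """stats maps gene -> (log2 fold change, p value). Returns gene names
    ordered from the highest to the lowest score."""
    scores = {
        gene: gsea_rank_score(fold_change, p, negate_log_p)
        for gene, (fold_change, p) in stats.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


toy_stats = {
    "COL1A1": (1.5, 0.001),   # invented values
    "CDH1": (-1.12, 0.03),    # values reported in the text
    "GENE_X": (0.1, 0.5),     # invented placeholder gene
}
print(rank_genes(toy_stats, negate_log_p=True))
```

Multiplying effect size by significance places genes with large and consistent differences at the two ends of the list, which is where GSEA looks for enrichment.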

Results and discussion
Our findings from extensive analyses of breast tumor phenotypes and patient survival underline a fundamental difference in breast cancers of the carriers of two CHEK2 mutations, p.I157T and c.1100delC. Significant differences were found in tumor grade and histopathological type as well as in patient survival of p.I157T and c.1100delC carriers, whereas no difference was seen in tumor subtypes: ER+, PR+, Her2- disease was the most common type for carriers of both mutations.

Association of p.I157T with clinico-pathological markers
In our analyses p.I157T was associated with low grade as well as several other markers, which have previously been associated with good prognosis (Table 1). Our analyses confirmed the previously reported associations between p.I157T and ER-positive or lobular breast cancer [15]. Also mixed (ductal and lobular) and tubular histological types were more frequent in p.I157T carrier tumors. Both ER-positive and lobular tumor types are associated with good short-term prognosis, but increased risk of disease progression in the long run [25,39]. Furthermore, p.I157T was associated with PR-positive and TP53-negative breast cancer. PR expression is a marker for good prognosis for ER-positive breast cancer and it has been suggested as a surrogate marker separating luminal A and luminal B subtypes according to immunohistochemical data [34,40,41]. TP53 immunohistochemical staining is considered to be indicative of somatic TP53 mutations. Strong TP53 staining suggests the presence of stabilizing mutations (primarily missense), whereas absence of staining typically indicates a protein-truncating mutation (nonsense or frameshift), and weak staining a wild-type functional TP53. Both strong and completely negative TP53 staining have been associated with poor prognosis in comparison to weak staining [42][43][44]. The sensitivity of the assays used in this study did not enable differentiation between normal, low or absent TP53 expression. Therefore, we used binary classification of TP53 immunohistochemical data, the positive category corresponding to high expression (strong staining) and negative category to low expression (Additional file 1: Table S3). Notably, the loss-of-function mutations associated with absent TP53 staining are relatively rare in breast cancer: these are seen in less than 5 % of all tumors [45,46]. Therefore, it is likely that most of the tumors in the category 'negative' (Table 1) represented tumors with wild-type TP53.
However, compromised CHEK2 function as a result of the p.I157T mutation could be another way for TP53 silencing as CHEK2 is among the key upstream activators of TP53 [5].
Like p.I157T, also c.1100delC was associated with ER-positive and PR-positive disease in our data set (Table 1). Furthermore, TP53-positive tumors were slightly less often observed in c.1100delC carriers than in non-carriers, even though the difference was not statistically significant. Significant differences in clinico-pathological features associated with the two CHEK2 mutations were seen in grade and histological type, as the c.1100delC carrier tumors more closely resembled non-carrier tumors (Table 1).

Breast cancer subtypes
We investigated the p.I157T-associated molecular breast cancer subtypes by applying the St Gallen 2013 criteria for immunohistochemical markers [34] on the BCAC data set as well as St Gallen 2013 and the PAM50 classifier [32] on the gene expression data set of 183 breast tumors. The subtype classification of the BCAC study subjects relied on the available immunohistochemical markers, ER, PR and Her2. We found both p.I157T and c.1100delC carrier tumors to be predominantly ER+, PR+, Her2-, suggestive of good prognosis ER+ tumors or the luminal A subtype (Table 1) [34]. Also the ER+ subtypes linked to poor prognosis (ER+, PR-, Her2-; ER+, Her2+), referred to as luminal B [34], were more common for CHEK2 mutation carriers than for non-carriers. This confirmed previous reports with regard to p.I157T [15], but was not consistent with previous reports on c.1100delC-associated tumor subtypes [47,48]. However, the difference between our findings and these reports may have arisen from different overall cohort compositions or from differing classification methods, as the guidelines for subtype classification have changed over the years.
Subtype classifications of the 183 tumors according to gene expression data and immunohistochemical markers were partly contradictory (Table 2). Similar inconsistencies between gene expression-based classification and the surrogate immunohistochemical markers have been reported previously for other data sets [40,49]. Overall, the division between basal and luminal appeared rather consistent: only 17 (9 %) of the 183 tumors were classified differently across the luminal-basal axis. PAM50 [32] classified three of the p.I157T carrier tumors as luminal A, two as luminal B and two as basal. Three lobular tumors were classified as normal-like (Table 2). This kind of misclassification has been reported to be typical for lobular tumors due to their infiltrating growth pattern, which causes the tumor sample to consist of an unusually high proportion of non-cancerous stromal cells [39]. St Gallen 2013 criteria classified these normal-like tumors as luminal (ER+). Furthermore, in unsupervised hierarchical clustering of the 183 tumor samples based on expression of the PAM50 signature genes (Fig. 1), two of the normal-like p.I157T tumors (HEL_045 and HEL_174) clustered within the luminal A branch, suggesting that luminal A could be their true molecular subtype. In summary, luminal A appeared to be the most common subtype for p.I157T carrier tumors in the gene expression data, in concordance with the findings in the BCAC data.

Patient survival
P.I157T carriers had better prognosis than the c.1100delC carriers with regard to overall or breast cancer-specific survival (Table 3a and b). This difference was possibly due to the poor survival associated with c.1100delC as reported previously by several studies [13,14,17,50]. No statistically significant difference in overall or breast cancer-specific survival was seen between p.I157T carriers and non-carriers. Hazard ratios in the analyses of subgroups of patients with ER-positive or lobular tumors were comparable to those of the main analyses (Table 3b and c).
Noteworthy, the c.1100delC carriers included here were only a subset of the study subjects included in the previous report by Weischer and colleagues on survival of c.1100delC carriers of the BCAC studies [17]. This was because the individual BCAC studies, which did not provide sufficient number of p.I157T carriers, were excluded from these analyses. Thus, the lack of statistical significance in some comparisons of survival difference between c.1100delC carriers and non-carriers (Table 3a and b) probably only reflected limited power due to low number of c.1100delC carriers, since the hazard ratios were always consistent with the previous report.
The different prognoses associated with p.I157T and c.1100delC possibly reflect their difference in molecular level severity of functional consequences. Therefore, it would be tempting to assume that the prognosis of all carriers of the truncating mutations would be similar to the prognosis of c.1100delC carriers. However, a recent Polish study combining three truncating CHEK2 founder mutations found no difference between mutation carrier and non-carrier survival [51]. Some part of the conflicting findings could be explained by different patient selection: in the Polish study all patients had been diagnosed before   [17] also postmenopausal patients were included in the analyses. Another potential explanation could be mutation-specific survival effects. As the Polish study combined in the analyses three different truncating mutations, the c.1100delC-specific effects could have been masked, since it is the least common of the three truncating CHEK2 mutations in the Polish population [52]. Similarly as here, the Polish study reported no significant difference in survival of the p.I157T carriers and non-carriers [51]. The hazard ratios for locoregional relapse and second breast cancer (91 % contralateral, 9 % ipsilateral) associated with p.I157T and c.1100delC were close to the mutations' relative risk estimates of primary breast cancer (Table 3) [5,17,53]. The marginally significant increased risk of locoregional relapse associated with p.I157T in the adjusted analyses (hazard ratio 1.62 [0.99-2.66], p value 0.056) warrants further studies, but could merely reflect the baseline risk associated with p.I157T: some of the local recurrences could represent new cancers arising during the 10-year follow-up. The risk of locoregional relapse for c.1100delC carriers was elevated in the univariate analysis but leveled out in the adjusted analysis.

Table 3 legend: Hazard ratios with 95 % confidence intervals (in parentheses) and p values (italics) are reported from comparisons of p.I157T carriers and non-carriers (nc) as well as comparisons of p.I157T carriers and c.1100delC carriers. All analyses were stratified by study. Multivariate analyses were stratified by study and age category, and adjusted for tumor grade, size, progesterone receptor and nodal status. Analyses were performed also in subgroups of (b) patients with estrogen receptor-positive tumors and (c) patients with lobular tumors. ER, estrogen receptor.

P.I157T associated differentially expressed genes
In order to investigate the molecular biology of p.I157T carrier tumors and to identify potential tumor-driving events and pathways, we performed an analysis of differential gene expression and subsequent functional enrichment analysis comparing ten p.I157T carrier tumors to 150 non-carrier tumors. We found 21 genes to be differentially expressed between p.I157T and non-carrier tumors. All of these had higher expression in the p.I157T carrier tumors (Table 4). When the 160 tumor samples were clustered according to expression of these 21 genes, the p.I157T tumors did not form a distinct cluster (Fig. 2), suggesting that high expression of these genes is not exclusive to the p.I157T mutation carrier tumors, but typical for a subgroup of breast tumors including the mutation carrier tumors. Tumors with different intrinsic subtypes appeared to be dispersed across all branches, as were the c.1100delC carrier tumors, suggesting that in this data set the c.1100delC carrier tumors would not be similar to the p.I157T tumors.

Enrichment of features associated with lobular breast cancer
The list of 21 differentially expressed genes contained seven collagen genes (Table 4), which were a major driver in the functional enrichment analysis. The enriched annotations from the DAVID [54] analysis included characteristics of the collagen family and their related functions such as 'focal adhesion', 'extracellular matrix (ECM) organization' and 'ECM-receptor interaction' (Additional file 1: Table S5). Similar results were obtained from the GSEA analysis (Additional file 1: Table S6, Additional file 2: Figure S1). Since collagens are usually expressed by stromal fibroblasts, the findings may suggest that the infiltrating growth pattern typical for lobular tumors [39] could be more common also for non-lobular p.I157T carrier tumors than for the non-carrier tumors. Further support for this hypothesis came from the GSEA, which showed cadherin 1 (CDH1) target genes to be significantly enriched among genes whose expression was lower in p.I157T than in non-carrier tumors (Additional file 1: Table S6). CDH1 silencing is generally considered as a defining characteristic of lobular tumors and it is often caused by somatic mutations targeting the CDH1 gene itself [39]. However, since the differential gene expression analysis, which was also the basis for the ranked gene list used as an input to GSEA, was adjusted for the lobular tumor type, the impact of the diagnosed lobular cancers on these findings should have been minimal. CDH1 gene expression was lower in p.I157T carrier tumors in the adjusted analysis (log2 fold change -1.12, p value 0.03, Fig. 3), but it did not meet the preset threshold for significance. Previously, we have reported CDH1 mRNA expression to be higher in c.1100delC carrier than in non-carrier tumors [10].
Therefore, CDH1 expression appears to be yet another factor, which is not shared by breast tumors from carriers of the two CHEK2 mutations, p.I157T and c.1100delC, and possibly reflects somatic changes, which have taken place during the clonal evolution of the p.I157T carrier tumors [39]. Taken together, these results suggest that besides the fact that lobular tumors are more common among p.I157T carriers than among non-carriers, the association between p.I157T and lobular features could be even stronger than what is suggested by the diagnosed histological types.

Enrichment of cancer associated gene signatures
In the GSEA analysis, several independent MSigDB [38] gene signatures related to epithelial-to-mesenchymal transition [55][56][57], stromal stem cells [58] or invasive behavior [59,60] were enriched at the top of the gene list with higher expression in p.I157T carrier tumors than in non-carrier tumors (Additional file 1: Table S6). These observations may reflect higher stromal content of the p.I157T carrier tumor samples, as the samples were not prepared at a single cell level. However, to prevent such confounding effects the tumor sample sections were selected by an experienced breast cancer pathologist. Furthermore, the above-mentioned MSigDB signatures originated from carefully designed experiments tailored to detect the true signal from cancerous epithelial cells and to escape the effects of non-cancerous stromal cells. The enrichment of these signatures may suggest that the p.I157T carrier tumors have an intrinsically invasive nature. However, this should have been reflected in a poor prognosis for the p.I157T carriers, which we did not see in the survival analyses. On the other hand, it is possible that the higher state of differentiation of the tumor cells suggested by low grade, accompanied by the invasive nature, can be seen in the prognosis only in the long run, and within the 10-year follow-up period is only reflected in the slightly elevated risk of local recurrence. All in all, these observations deserve further studies before any definitive conclusions can be made. In addition to CDH1, tumor suppressor retinoblastoma 1 (RB1) appeared as a potential gene expression regulator, whose activity was reduced in p.I157T carrier tumors in comparison to non-carrier tumors (Additional file 1: Table S6, Additional file 1: Figure S1). RB1 and its direct downstream target E2F-1 are both targets of the CHEK2 protein [61,62]. Thus, the differential expression of the RB1 target genes possibly reflects compromised CHEK2 function in the p.I157T carrier tumors.
Notably, the two differential gene expression studies on c.1100delC carrier tumors have reported enrichment of genes of WNT and FGF pathways [10,47], which regulate the growth and differentiation of normal breast epithelium [63][64][65][66]. Among the p.I157T-associated differentially expressed genes we did not see enrichment of any growth factor pathway. These observations on differences in gene expression signatures are more descriptive than definitive by nature, but they further emphasize intrinsic biological differences between p.I157T and c.1100delC carrier tumors.

Conclusions
Based on our analyses, breast cancers of p.I157T and c.1100delC CHEK2 mutation carriers differ in disease severity, as seen especially in differences in tumor grade and patient survival, as well as in intrinsic biological features, as seen in differences in histological type and gene expression profiles. Thus, it appears that even though both mutations have been shown to compromise the protein function [6,9,11], they have different consequences on the disease phenotype, and prognostic findings based on one mutation cannot be generalized to the other. Furthermore, our results raise a hypothesis that the increased risk of locoregional relapse for p.I157T carriers could be caused by an intrinsically invasive nature of the tumor cells. Future studies with longer follow-up are needed to test this hypothesis.

Additional files
Additional file 1: Table S1. Description, genotyping methods and references for individual studies as reported by the studies. Table S2. Genotype and follow-up data availability in individual studies. Table S3. Sources and scoring of TP53 immunohistochemistry data used in this study. Table S4. Pathological characteristics of 183 breast tumors used in the gene expression analysis. Table S5. Functional annotations enriched in the 21 differentially expressed genes. Table S6.

Availability of data and materials
The dataset supporting the conclusions of this article is available in the Gene Expression Omnibus (GEO) repository, GEO: GSE24450, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE24450.