Impact of suppression of tumorigenicity 14 (ST14)/serine protease 14 (Prss14) expression analysis on the prognosis and management of estrogen receptor negative breast cancer

To elucidate the role of a type II transmembrane serine protease, ST14/Prss14, during breast cancer progression, we utilized publically accessible databases including TCGA, GEO, NCI-60, and CCLE. Survival of breast cancer patients with high ST14/Prss14 expression is significantly poor in estrogen receptor (ER) negative populations regardless of the ratios of ST14/Prss14 to its inhibitors, SPINT1 or SPINT2. In a clustering of 1085 selected EMT signature genes, ST14/Prss14 is located in the same cluster with CDH3, and closer to post-EMT markers, CDH2, VIM, and FN1 than to the pre-EMT marker, CDH1. Coexpression analyses of known ST14/Prss14 substrates and transcription factors revealed context dependent action. In cell lines, paradoxically, ST14/Prss14 expression is higher in the ER positive group and located closer to CDH1 in clustering. This apparent contradiction is not likely due to ST14/Prss14 expression in a cancer microenvironment, nor due to negative regulation by ER. Genes consistently coexpressed with ST14/Prss14 include transcription factors, ELF5, GRHL1, VGLL1, suggesting currently unknown mechanisms for regulation. Here, we report that ST14/Prss14 is an emerging therapeutic target for breast cancer where HER2 is not applicable. In addition we suggest that careful conclusions should be drawn not exclusively from the cell line studies for target development.


INTRODUCTION
Type II transmembrane serine proteases are a family of proteases that play physiological roles including embryo development and maintenance of homeostasis [1,2]. A representative member is protease serine 14 (Prss14), originally called Epithin when it was cloned from thymic epithelium [3]. It has been also called as matriptase [4], membrane type serine protease 1 (MT-SP1) [5], and suppression of tumorigenicity 14 (ST14) [6]. We will use ST14/Prss14 as the name hereafter.
ST14/Prss14 is a protease normally on the cell surface with processing at the Gly 149 residue during biogenesis [7]. The N terminal fragment containing cytoplasmic, transmembrane domains, and a short ectodomain is linked to the rest of the C-terminal fragment containing CUB, LDLR, and a protease domain. Protease activity of ST14/Prss14 requires cleavage at Arg 614, presumably by autocatalysis (reviewed in [8][9][10]). Upon TGFβ induction, the ectodomain is further processed by the TNFα converting enzyme (ADAM17) to be shed from the membrane [11]. Phorbol ester can also induce shedding by regulating ST14/Prss14 binding to filamin, a cytoplasmic actin binding protein [12]. An isoform of ST14/Prss14 missing the LDLRA4 domain is defective in shedding [7]. Shed protein recovered from a thymoma cell line and transfected cells behaves as a proangiogenic factor [13]. In addition, we showed that ST14/Prss14 is necessary and sufficient for epithelial mesenchymal transition (EMT) [14]. It also plays an important role in transendothelial migration of epithelial cancer cells [15] and activated macrophage [16].
Physiological functions of ST14/Prss14 are revealed by genetically modified mouse models [17,[26][27][28][29] and genetic mutations observed in human populations with a rare skin phenotype [30]. ST14/Prss14 primarily functions in epithelial tissues including skin and thymus. When regulated by the inhibitors or by overexpression, most phenotypes appear as forms of malignant tumors in epithelial tissues [31]. For example, skin-specific overexpression of ST14/Prss14 under Keratin 5 promoter in mouse induced spontaneous skin hyperplasia advanced further into squamous cell carcinoma [26]. Recently, its role in accumulation of inflammation was reported [28]. By using multi-transgenic crosses to manipulate expression of ST14/Prss14 and its inhibitor SPINT2, Thomas Bugge's group showed that ST14/Prss14 can initiate tumorigenesis by causing tumor-associated inflammation.
Genetic profiling, such as a gene expression array analysis, revealed that overexpression of ST14/Prss14 was frequently found in advanced epithelial cancers [32][33][34]. However, as the name suggests, 'suppression' of tumors by ST14/Prss14 was claimed based on the observation that the expression is reduced in colon cancer [6]. Indeed, a study using systemic expression profiling and immunohistochemistry (IHC) showed that its expression is lower in advanced ovarian cancers [35]. Thus, it is confusing to determine whether ST14/Prss14 has proor anti-carcinogenic activity. It appears that sample sizes, tissue types, and context with microenvironments influence the outcome of the results. Furthermore, variations in expression of its cognate inhibitors, SPINT1 and SPINT2, seem to add another level of complexity [36,37]. Therefore, careful detailed analyses on expression profiling are necessary to clarify the role of the ST14/Prss14 in specific cases.
Another important parameter derived from the expression profiling is the signature genes of epithelialto-mesenchymal transition (EMT). EMT is an event that involves massive gene expression changes, resulting in switches from one type of more epithelial into the other type of more mesenchymal shape and characteristics (reviews in [40,41]). After EMT, cancer cells acquire abilities of penetrating basement membranes and blood vessels, and of moving to distant sites. As a result, another focus of cancer cells can settle and grow as secondary tumors. Data on EMT signature genes are being accumulated from studies using model cancer cell lines in defined conditions [42][43][44]. For example, during EMT progression, E-cadherin (CDH1) expression decreases, vimentin (VIM) expression increases transiently, while N-cadherin (CDH2) and fibronectin (FN1) expression increase [45]. However, detail mechanisms remain as questions at this time point as to how EMT proceeds and which transcription factors govern the particular signal pathways, making it difficult to connect the whole picture.
In this study, we attempted to clarify the role of ST14/Prss14 in breast cancer progression and patients' survival using public databases of gene expression profiles. Through the analyses, we suggest that ST14/ Prss14 is an excellent prognosis marker and therapeutic target for ER -/TN breast cancers. We also propose yetto be-identified modes of action for ST14/Prss14 during EMT.

ST14/Prss14 is a superior prognosis marker for ER negative breast cancer
Previously, we reported that ST14/Prss14 can enhance metastasis in a mouse breast cancer model [15]. To investigate whether the conclusion drawn from the mouse study is valid in human breast cancer patient population, we searched ST14/Prsss14 expression by using a well-documented TCGA BRCA database. We first examined survival rates of breast carcinoma patients grouped according to ST14/Prss14 expression levels with assignment of "high" and "low" (relatively higher or lower than the average of the whole data points, respectively) and calculated hazard ratio (HR) for each group by using the Mantel-Haenszel method. The ST14/Prss14 high group show poorer survival than the ST14/Prss14 low group (high vs. low, HR: 1.605, Figure 1A). In comparison, survival curves of high and low groups of HER2, a target of the blockbuster antibody drug Herceptin, were between ST14/ Prss14 high and the ST14/Prss14 low . When survival curves were examined by separating groups sequentially, first by HER2 levels, and then by ST14/Prss14 levels, differences in the survival profile became clearer. Among the HER2 low groups, ST14/Prss14 high group showed the poorest survival, while the ST14/Prss14 low group showed the greatest survival (HER2 low /ST14/Prss14 high vs. HER2 low / ST14/Prss14 low , HR: 4.064, P < 0.01, Figure 1B). Among the HER2 high groups, the ST14/Prss14 low group showed poorer survival than the ST14/Prss14 high group (HER2 high / ST14/Prss14 high vs. HER2 high /ST14/Prss14 low , HR: 0.473).
Next, we wondered whether ST14/Prss14 expression shows any degree of correlation with the stages of TCGA breast cancer used in this analysis ( Table 1). The patients in advanced cancer stages III-IV showed poorer survival relative than the patients in the earlier stage I-II ( Figure 1C). Although the average of ST14/Prss14 expression did not appear to be significantly different in two stage groups ( Figure 1D), the ST14 /Prss14 high  subgroup at stage III-IV showed the poorest survival  (III-IV/ST14/Prss14 high vs. III-IV/ ST14/Prss14 low , HR: 1.350, P > 0.05, Figure 1E).
We attempted to analyze ST14/Prss14 expression status with known molecular markers that could separate breast cancers into subgroups. Among the three molecular markers, ER, PR and HER2, as shown in Figure 1F, ER status affected the survival (low vs. high, HR: 4.365, P < 0.01). Interestingly, the level of ST14/ Prss14 expression was significantly higher in the ERgroup than in the ER + group (ERvs. ER + , P < 0.0001, In box plots, the median was plotted as a line in the middle of the gray box. The whiskers were drawn down to the 2.5 percentiles and up to the 97.5 percentiles. Points below and above the whiskers were outlier dots. A one-way ANOVA was calculated between groups and P values were determined by Sidak's multiple comparisons test. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. Figure 1G). Particularly, ST14/Prss14 high in ERshowed the worst outcome, while ST14/Prss14 low in ERshowed the best (ER -/ST14/Prss14 high vs. ER -/ ST14/Prss14 low , HR: 4.213, P < 0.05 Figure 1H). Among the groups of patients divided into four subtypes, triple negative (TN) breast cancer which was negative for these three receptors resulted in the lowest survival rate ( Figure 1I) and the expression level of ST14/Prss14 was the highest (TN vs. luminal A, P < 0.001, TN vs. luminal B, P < 0.0001, TN vs. HER2, P > 0.05, Figure 1J).
In order to examine whether the poor survival is due to high ST14/Prss14 expression, and not due to the absence of HER2 in TN, we tried to compare the survival of ST14/Prss14 high and ST14/Prss14 low populations in the HER2 and TN breast cancer groups. Although the numbers of data in the TCGA breast cancer database are extremely low and all TN patients show high ST14/ Prss14 expression, the pattern showed poor survival in ST14/Prss14 high patients regardless of HER2 expression ( Figure S1A and S1B). From another data set (GSE20685) derived from Taiwanese studies [46], a similar pattern was observed ( Figure S1C and S1D). However, in these analyses, statistical significance was weak or not significant, most likely due to the limiting numbers of data points.
Therefore, these results suggested that ST14/Prss14 is an excellent prognosis marker, especially in ERand TN breast cancers. Moreover, ST14/Prss14 is a highly potential therapeutic target in TN breast cancers when Herceptin is not an option.

Context dependent ST14/Prss14 expression and function in ERbreast cancer patients
Because the increased ratio of ST14/Prss14 to its inhibitor SPINT1 was observed in some carcinomas [36,37], we examined the ratio of ST14/Prss14 to SPINT1 and SPINT2. While the averages of SPINT1 and SPINT2 expression levels are not different in the cancer stages I-II and III-IV (Figure 2A), SPINT2 expression levels were somewhat lower in ERand TN breast cancer groups (Mean differences of SPINT2, ERvs. ER + , P < 0.01, TN vs. luminal A, P < 0.01, TN vs. luminal B, P < 0.01, TN vs. HER2. P > 0.5) ( Figure 2B and 2C). The ratio of ST14/ Prss14 to SPINT1, ratio(I), and that of ST14/Prss14 to SPINT2, ratio(II), showed no big difference among the cancer stages ( Figure 2D). However, both ratios were significantly higher in ER -( Figure 2E) and TN groups than in others ( Figure 2F).
The survival curves examined by these ratios of ST14/Prss14 to inhibitors did not reveal significant differences ( Figure 2G). In addition, high and low ratios as well as those in cancer stage III-IV group also did not reveal any significant differences in the survival profiles ( Figure 2H). The ratio(II) high is somewhat higher in the ERgroup and showed poorer 5 year survival, however, without statistical significance ( Figure 2I). These results suggest that inhibitor expression does not critically contribute to patient's survival, particularly for early years. However, larger number of cases can help to conclude whether the protease/inhibitors ratio contribute to the patient's survival. Table 2 shows a coexpression pattern of known ST14/Prss14 substrates and their receptors/interacting partners. Genes of coexpression profiles with better significance are CUB domain-containing protein 1 (CDCP1), uPA receptor (PLAUR), EGFR, HGF receptor (MET), and desmoglein-2 (DSG2), a component of laminin 5 (LAMB3). These genes are all well-known not only for EMT but also other cancer processes. Among the negatively associated genes, hepsin (HPN), macrophage stimulating protein (MST1), platelet derived growth factor D (PDGFD), and Tie 2 (TEK) showed a stronger correlation. These results suggest that a subset of ST14/Prss14 substrates is involved in breast cancer progression.   Genes were analyzed for correlation with ST14/Prss14 and ER expression status, using TCGA BRCA data. Genes were arranged by the degree of correlation values with ST14/Prss14. *P ≤ 0.05, **P ≤ 0.01. ***P ≤ 0.001, ****P ≤ 0.0001, ns (not significant).

Positioning ST14/Prss14 in the cluster of EMT signature genes with breast cancer types
We next positioned ST14/Prss14 in clusters with particular EMT signature genes in breast cancer patients to find clues for the specific role of ST14/Prss14 in the EMT process that we described earlier [14]. Our selection of 1085 EMT signature genes (Table S1) were first clustered according to the cancer stages, using a selforganizing map (SOM) analysis ( Figure S2). However, it was difficult to clearly differentiate expression clusters between the two groups of cancer stages (stage I-II or stage III-IV). A clustering pattern did not reveal any satisfying grouping of the known EMT signature genes with the early I-II and the late III-IV stages. Therefore, we used an alternative clustering strategy in which ER expression status of the cancer tissues was taken into consideration ( Figure 3). The resulting clusters revealed several differential expressions between the ERand ER + groups. Cluster 1, of which expression was low in ERpatients, contains the two inhibitors, SPINT1 and SPINT2. On the contrary, cluster 6 containing genes show the opposite pattern to cluster 1 (high in ER -, low in ER+ patients). Cluster 6 includes ST14/prss14 and P-cadherin (CDH3). While cluster 2 contains CDH1, cluster 3 does not appear to show a distinctive pattern. In contrast, cluster 4 and cluster 5 contain FN1, VIM, and CDH2. Cluster 4 and 5 showed similar patterns to cluster 6 with minor differences. From these results, the role of ST14/ Prss14 during the EMT process cannot be easily resolved since it is not co-clustered with any of the stage specific EMT markers, such as CDH1 or CDH2 (see discussion).
Next, to verify the pattern of TCGA BRCA gene clustering, we chose to analyze another breast cancer dataset from the Gene Expression Omnibus (GEO) database GSE20685. For comparison with the TCGA BRCA dataset, we named the cluster containing genes that are expressed in the higher level in ERthan that in ER + breast cancer cells as Higher Cluster in Patients (HCP). Similarly, for the other cluster containing genes expressed in lower level in ERbreast cancer, we named it as Lower Cluster in Patients (LCP). In the case of the TCGA BRCA Figure 3: A cluster analysis of 1085 selected EMT signature genes by ER expression status of TCGA BRCA data by a SOM analysis. Six clusters (3X2) were applied using an Euclidean distance. ER status: ER -, gray bar, ER + , black bar. Breast cancer subtypes: luminal A, pink, luminal B, orange, TN, green, HER2 , sky blue, unclassified, light gray. TFs , blue arrow, TFs of cluster 1 and 6 common in two different datasets, with stars. Signature EMT markers along with ST14/Prss14 and inhibitors were indicated. Color key: -2.0 (green) to 2.0 (red). dataset, cluster 6 and cluster 1 (Figure 3) are HCP1 and LCP1, respectively. When HCP1 and LCP1 were compared to those of the GEO dataset, HCP2 and LCP2 ( Figure 4A-4B), a large proportion of the EMT genes were coclustered ( Figure 4C): 252 HCP genes, named as HCP3, are positioned in both HCP1 and HCP2 (Table S2) and 120 LCP genes, named as LCP3, are positioned in both LCP1 and LCP2 (Table S3, Figure 4C).

Coexpression analyses of EMT signature genes and ST14/Prss14 in ER -/low and ER +/high breast cancer patients
To study the regulatory mechanism of ST14/Prss14 expression during the EMT process in detail, we next examined the degree of correlation between well-known EMT signature genes and ST14/Prss14 in ER +/high and ER -/low populations (TCGA, n = 459, where ERand ER + was judged by IHC and GSE20685, n = 108, where ER high and ER low were determined by above and below the standard deviation based on the average value). Although ST14/ Prss14 expression is distinctively high, none of the EMT stage specific genes among CDH1, VIM, FN1, CDH2 showed a clear correlation with ST14/Prss14 ( Figure 5A). However, the majority of CDH2 high expressors, low in numbers, reside in the ST14/Prss14 high population. Interestingly relatively less studied P-cadherin (CDH3) showed the significant correlation values both in ER -/low and ER +/high (P < 0.001). These analyses suggest that ST14/ Prss14 expression in the ER -/low population could not be assigned to any conventional specific EMT stages, pre-EMT or post-EMT states.
Next, coexpression values of well-known EMT TFs in HCP3 and LCP3 clusters, whose protein class was based on PANTHER, are shown in Table 3 Figure 5B). Six TFs (ER, TBX3, SMAD3, TSHZ1, PTGER3, and TRPS1) yielded well distinguishable separate high and low populations, suggesting they may function as negative regulators of ST14/Prss14 in ER -/low populations ( Figure 5C).
In addition, we examined the previously well-characterized EMT TFs outside of HCP3 and LCP3 clusters ( Figure 5D). In cases of TWIST1 and ZEB1, they show negative correlations (r = -0.1639, P < 0.001 and r = -0.1577, P < 0.01, respectively). ZEB1 is previously shown to be a negative regulator of ST14/ Prss14 expression [47]. ZEB2 and SNAI2 did not show strong correlation values with ST14/Prss14 in either ER -/low or ER +/high breast cancers.
Next, we navigated putative cellular function or signal pathways of associated TFs through a text mining tool ( Figure S3). The majority of TFs in HCP3 were associated with cell migration, and stem cell development, cancer progression, and metastasis, in addition to EMT. Therefore, TFs which are expressed highly in ER -/low breast cancer populations have a strong connection to cancer progression and metastasis through EMT as well as to regulation of the expressions of ST14/Prss14. In contrast, fewer known functions such as regulation of granulocyte differentiation and steroid hormone receptor activity, were associated with TFs in the LCP3 group.

Analyses of gene sets with cultured breast cancer cells reveal a paradox in ST14/Prss14 expression
In order to further verify the regulatory mechanism of ST14/Prss14 expression, we decided to examine expression of EMT signature genes in cultured breast cancer cell lines. Two sets of mRNA array data of breast cancer cell lines (5 in NCI-60 and 57 in CCLE databases) were sorted according to ER expression status. Next, EMT signature genes were clustered according to the ER status. For comparison with the patients' data (HCPs LCPs), we named the cluster with high expression level in ER low as Higher Cluster in Cell (HCC) and the one with low expression in ER low as Lower Cluster in Cell (LCC). We extracted from two different cell line databases, NCI-60 and CCLE, and named them HCC1/LCC1 ( Figure 6A) and HCC2/LCC2 ( Figure 6B), respectively. 294 genes (HCC3 , Table S4), were identified common in HCC1 and HCC2, while 228 genes (LCC3 , Table S5) were identified common in LCC1 and LCC2 ( Figure 6C). When HCP3/LCP3 and HCC3/LCC3 were compared, there were only 74 genes in the high group and 69 genes in the low group showing the significant discrepancy between patient data and cell line data. Moreover, some genes were allocated to the opposite side in patient and cell line data. Specifically, 17 genes are members of the HCP3 group in the patient data, but, those of the LCC3 group in the cell line data ( Figure 6C). ST14/Prss14 is one of such genes: its expression was low in ER low breast cancer cell lines, clustered together with CDH1 and its two inhibitors, SPINT1 and SPINT2 ( Figure 6A and 6B). We also identified 10 genes differentially positioned from LCP3 to HCC3 ( Figure 6C). The list of genes that switched clustering positions from the high to the low (HCP3LCC3) and from the low to the high (LCP3HCC3) are listed in Table 4.
A gene networking of the 17 genes in HCP3LCC3 were searched for putative functional pathways through text mining ( Figure 6D). Most genes have genetic interaction among them or the proteins they encode are co-localized. In their subcellular localization according to Genecard (http://www.genecards.org), the majority of gene products are present in the extracellular space, the plasma membrane, and/or the nucleus ( Table 5). Some of the genes were apparently associated with cell growth, granulocyte migration, cell chemotaxis, and cellcell adhesion. Many genes are localized in the extracellular space and/or plasma membrane, suggesting participation in the progression of ERbreast cancer responding to external stimulus. Due to these apparent discrepancies of gene expression patterns between patient and cell line data, we decided to look at these outliers, HCP3LCC3 and LCP3HCC3, with more attention. To exclude the possibility that the increased expression level of ST14/ Prss14 in ERpatients may have resulted from other types of cells in the cancer microenvironment, we analyzed stromal or immune signature genes ( Figure 7A). The expressions of stromal and immune signature genes did not show any bias, but were generally similar in ERand ER + cancers. However, the epithelial cell specific adhesion molecule (EpCAM) is distinctively higher in the ERpopulation than in the ER + population ( Figure 7A), showing strong positive correlation with ST14/Prss14 (r = 0.2213, P < 0.001) ( Figure 7B). These results suggest that high expression of ST14/Prss14 in the ERpopulation is more likely coming from the cancer tissue rather than from the stromal or immune cells.
In an attempt to elucidate the underlining mechanism of regulating ST14/Prss14 expression, we searched the GEO database to get a hint of regulation by exogenous signals, for example, estrogen that is provided from noncancer cells. A basal type ER -MDA-MB-231 cell shows undetectably low ST14/Prss14 expression in western blotting (data not shown). When ER is introduced into cell by transfection, ST14/Prss14 expression does not change either by estrogen induction and/or the presence of ER ( Figure 7C). In the case of MCF7, a luminal A type ER + ST14/Prss14 expressing breast cancer cell Common TFs in TCGA BRCA and GSE20685 analyzed correlation with ST14/Prss14. TFs in HCP3 and LCP3 were arranged by the degree of correlation values with ST14/Prss14. *P ≤ 0.05, **P ≤ 0.01. ***P ≤ 0.001, ****P ≤ 0.0001, ns (not significant).
line, ST14/Prss14 expression was diminished when ER expression was depleted by siRNA ( Figure 7D). The other 12 out of 16 genes showing the coexpression profile also behaved similarly in the cluster analysis with the conditions by removing ER expression. Furthermore the genes negatively correlated in their expression behaved in an opposite fashion. These results suggest that ST14/ Prss14 expression depends on both ER and the cellular environments in ER + cell lines. Therefore, regulation of ST14/Prss14 is independently regulated by different mechanisms in ER + and ERbreast cancer cell lines.

DISCUSSION
In this report, we evaluated ST14/Prss14 for a prognosis marker and a therapeutic target to ERbreast cancer by utilizing various public data.

ST14/Prss14 alone is sufficient to be an independent prognosis marker for ERbreast cancer
Survival of breast cancer patients with higher levels of ST14/Prss14 expression is poor for all breast cancer populations regardless of cancer stages ( Figure 1A and 1E). The value of ST14/Prss14 as a prognosis marker is even higher than that of the well known therapeutic target HER2 ( Figure 1A). For breast cancer patients with little HER2 expression, ST14/Prss14 expression is especially useful as a prognosis marker ( Figure 1B), and therefore for a therapeutic target. ER expression is particularly important to draw a correct prognosis. Expressions of ER and ST14/Prss14 in TCGA and GEO breast cancer patients are clearly in separate populations ( Figure 5C). In ERpatients, all the low ST14/Prss14 expressors survived while half of the high expressors survived less than 5 years ( Figure 1H). Therefore, an expression analysis of ST14/Prss14, for example by classical immunohistochemical analysis as performed in earlier reports [48,49], together with those of ER and HER2 would provide an added benefit to predict the prognosis of breast cancer patients.
The ratios of ST14/Prss14 to its inhibitors showed minor correlative difference with survival rates of breast cancer patients (Figure 2). The same was true for the survival rates of the cancer stage III-IV group or the ERgroup. These results indicate the ST14/Prss14 expression itself, but not the ratios to the inhibitors, is critical for the prognosis as proposed earlier in some other types of cancer. The contradiction to earlier reports [36,37] showing the importance of the ratio in malignant tumor progression can be due to smaller sample sizes and, more likely, lack of genetic classification of breast cancer subtypes in those studies.
Network analysis of expression profiles of ST14/ Prss14 substrates and its interactors revealed a context dependent behavior in breast cancer ( Table 2). A probable pathway in breast cancer included CDCP1 that has known function in cell migration and cell matrix attachment [50]. Other genes positively associated with ST14/Prss14 include PLAUR, EGFR, and MET. These genes are also well documented for participation in epithelial cancer progression. HPN, PDGFD, MST1 are negatively associated genes, although, in weaker strength. The positively associated genes can play intrinsic roles in epithelial cancer progression together with ST14/ Prss14, while the negatively associated genes such as PDGFD and MST1, since they are secreted proteins, may also cooperate with ST14/Prss14 for affecting cancer progression when released from stromal and immune  in other clusters. Correlation coefficient r values between two genes were computed using Pearson correlation calculations. Two-tailed P values were calculated by an unpaired t test. *P < 0.05, ***P < 0.001,****P < 0.0001. Percentages of population by two genes expression in each ER -/low (red) and ER +/high (black) groups were marked in each quadrants. Linear interpolation line in whole patients , blue line. cells near cancer. Because ST14/Prss14 can be activated by an autoactivation mechanism, it may work as a master key activating these substrates and thus initiating various signaling pathways on cancer cells (and/or stromal and immune cells nearby) during ERbreast cancer progression and metastasis.

Does ST14/Prss14 play an alternative role in EMT pathways?
In a previous experiment with various epithelial cell lines, we showed ST14/Prss14 is necessary and sufficient for the EMT process [14]. In the other hand, SPINT1 was also previously shown to be involved in the EMT process [51]. However, the detailed mechanisms of how they participate in EMT is still unknown. In fact, our understanding for the EMT process is very limited although the stages can be divided into 4 steps [43]. The EMT process may also be comprised of multiple pathways as well as multiple steps, and not likely just turning on and off genetic switches.
Our initial search for the clues of ST14/Prss14′s roles in the EMT process was done by clustering a total of 1085 selected EMT signature genes (Table S1) from TCGA and GEO breast cancer patient data ( Figure S2 and Figures 3, 4). Those EMT signature genes in the breast cancer patients, when grouped according to ER expression level first, were clustered into 6 groups with distinct patterns (Figure 3). Typical EMT stage specific markers are positioned in different clusters showing differential expression patterns based on breast cancer subtypes. The position of ST14/Prss14 in the cluster is near post EMT markers, suggesting it may function in post EMT stages. However, a lack of a strong correlation of ST14/ Prss14 with any of stage specific markers suggests ST14/ Prss14 behaves in an unorthodox way.
Next, the same set of selected EMT signature genes were applied for the same analyses with the datasets from breast cancer cell lines (NCI-60 and CCLE databases, Figure 6). Interestingly, ST14/Prss14 expression was positioned in closer association with CDH1 expression in clustering. These results are consistent with previous reports on various types of cancer cell lines [52,53]. This is in contrast to the fact that the position of ST14/Prss14 is closer to CDH2 in our patient's data revealing a possible discrepancy in the data between patient and cell line (see discussion later).
To understand the regulation of gene expression, coexpressing TFs were searched. Among the well known key TFs in EMT, SNAI1 in HCP3, TWIST 1 and ZEB1 appear on the list for coexpressors ( Figure 5B and 5D). When degrees of correlations between expression levels of ST14 and TFs were calculated in ERpopulations as shown in Figure 5B, HMGA1, ELF5, CEBPB, VGLL1, showed higher correlation (r > 0.4) in the ERpopulation. A strongly associated gene, HMGA1 is still a mystery regarding its role in EMT. When those genes are networked by functions in Figure S3A, a majority of the genes are linked but with still uncharacterized functions. The most prominent features of the output, so far, include cell migration and Notch signaling in addition to EMT function. From these results altogether, we concluded that ST14/Prss14 is regulated in yet to be determined steps and pathways during the EMT process. Experimental approaches from these results will provide further details. Location of expressed proteins in the HCP3LCC3 gene list was marked in gray. Through the COMPARTMENTS database (http://compartments.jensenlab.org/Search), all cases which were reported more than one were shown.

Paradox of ST14/Prss14 expression in breast cancer patient and in cell lines regarding ER expression and cell subtypes
In search of explanations of this apparent paradox of ER expression and ST14/prss14 in breast cancer patients and cell lines, we approached the issues as follows. Since ST14/Prss14 is able to shed into the media [7,[11][12][13][14]54], it may therefore have function in trans. Therefore, ST14/ Prss14 expression can be also contributed from a cancer microenvironment. Our earlier finding on the expression in activated macrophage [16] also supports this idea. However, correlation analysis of ST14/Prss14 and the specific epithelial maker EpCAM showed statistically significant values for positivity while stromal/immune signature genes show no correlation ( Figure 7B). We interpret these as supporting results that the majority of the ST14/Prss14 message is coming from the ERcancer tissue. In addition, a study involving IHC shows ST14/ Prss14 expression in cancer tissue, not in stromal tissue [55]. However, we cannot formally exclude that part of ST14/Prss14 expression is coming from immune cells such as from activated macrophages. To resolve this issue, more careful experimental approaches are necessary.
The resolution of this paradox appears not so simple at this point. As seen in the coexpression analysis of known substrates (Table 2), only part of known ST14/ Prss14 functions are likely to be active in a breast cancer context. Therefore a context dependent role of this protein may also take part in eliciting variables, including often contradictory roles as is documented in the case of BMP7 [56,57]. It may also be dependent on the three dimensional tissue structure and/or to particular extracellular matrix proteins as suggested in the networking of integrins [58]. Any of the above proposals cannot completely explain the complicated paradox at this point and answers to questions will need to wait for many more experiments.

An important issue in biomarker and therapeutic target discovery using cell line models
In any case, detail mode of action studies on the mechanism of ST14/Prss14 expression in different cell populations, cancer tissues, and the microenvironment will be necessary to target ST14/Prss14 with therapeutic approaches. This paradoxical gene expression profiles serious perils if prognosis and therapeutic evaluations are derived solely from the cell cultures. As seen in an earlier analysis with cell lines and patients' data [59], smaller scales without subgrouping breast cancer types may lead to the contradicting results [59]. Although expression profiling of the breast cell lines was claimed to provide proper models for breast cancers [60], it is critical to include whole gene profiling with patients cancer samples, not only for marker discovery and correct prognosis, but also for individual based therapeutics in the future.

Datasets
Two public datasets for breast cancer patients were obtained from the Cancer Genome Atlas Project (TCGA) (http://cancergenome.nih.gov/) and NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/). The gene expression data include breast invasive carcinoma (BRCA) patients (level 3), enrolled from 2010-2013. The original 603 patients' data were reduced to 464 patients after removing duplicated samples. The only data annotated with the clinical data were selected for use in the analysis. Two array expression datasets of breast cancer cell lines were derived from the CellMiner database (http://discover. nci.nih.gov/cellminer/home.do) and the Cancer Cell Line Encyclopedia (CCLE, http://www.broadinstitute.org/ccle/ home). Agilent mRNA array data of NCI-60 cell lines in the CellMiner database provided 5 breast cancer cell lines. Normalized mRNA expression data of CCLE had 57 breast cancer cell lines chosen from about 1,000 different cancer cell lines. Two experiment sets of data, regulating the expression of ER in MCF7 and MDA-MB-231, were obtained from GEO (GSE27473 and GSE2251).

Group classification
TCGA patients were classified by cancer stages, I-II and III-IV based on the report with clinical data. Four breast cancer subtypes, luminal A, luminal B, triple negative, and HER2-enriched, were classified by the IHC status of ER, PR, and HER2. After dividing into luminal and non-luminal types by expression of ER and/or PR, four subtypes were determined according to HER2 expression. ER high/low groups of GSE20685 were differentiated by mean ± standard deviation of ER. Breast cancer cell lines of NCI-60 and CCLE were grouped by only the mean of ER because of limited sample sizes. Gene high/low groups were determined by mean of each gene in all normalized samples of each dataset. In case of grouping by ER expression status, only TCGA data were divided into negative (-) or positive (+).

Statistical analysis
For the 5 year survival curves, data of TCGA BRCA patients were processed in a Kaplan-Meier survival analysis after excluding those whose contacts were lost. The P value was calculated using a Log-rank (Mantel-Cox) test and the hazard ratio (HR) was determined by the Mantel-Haenszel method (http://www.graphpad.com/ guides/prism/6/statistics/index.htm?stat_the_hazard_ratio. htm). For the mRNA expression analysis, normalized values were drawn in box plots. The differences between multiple groups were assessed by a one-way ANOVA. A ratio of ST14/Prss14 to SPINT1 and SPINT2 was calculated by substituting values with the logarithm to the base of 2. For the correlation analysis, expression values of all patients were drawn in scatter plots with linear interpolation curves between the two genes. The correlation coefficient r values between the two genes were computed using Pearson correlation calculations. Two-tailed P values were calculated by an unpaired T test. The populations of TFs and ST14 by ER status were computed into percentages against each ER low and ER high . To analyze stromal and immune scores in the TCGA BRCA dataset, those signature genes, referred to in the publication [79], were selected. After normalization, each stromal and immune score was calculated as an average. All statistical analyses were performed in GraphPad Prism (version 6).

Cluster analysis
Cluster analyses were examined with self-organizing maps (SOM) using Euclidean distances and calculated 2 to 6 clusters [80]. It was iterated 100,000 times.

Functional analysis
Gene networking was analyzed by GeneMANIA (http://www.genemania.org/). Networks between genes such as co-expression, co-localization, genetic interaction, pathway, physical interactions, predicted, and shared protein domains were indicated with different colored lines. The results of co-expression were referenced by the top 10% publication size (29/287). Network weighting was based on molecular function of Gene Ontology. The functions of genes, which had a low value, were additionally marked with colors. Subcellular localizations of the gene set were obtained from the COMPARTMENTS database (http:// compartments.jensenlab.org/Search). Protein classes of those genes were classified based on PANTHER (http:// pantherdb.org/).