MUC1 as a Putative Prognostic Marker for Prostate Cancer.

MUC1 is expressed on the apical surface of glandular epithelium. With functions including protection, adhesion and signaling, MUC1 has been implicated in prostate cancer. There are many splice variants, the best characterized of which are MUC1/1 and MUC1/2, which are determined by a SNP (rs4072037, 3506G>A). Blood DNA samples from general population, BPH, sporadic prostate cancer and hereditary prostate cancer subjects were genotyped for the rs4072037 SNP. G allele frequencies were significantly reduced in hereditary prostate cancer (15%) compared to population, BPH or sporadic prostate cancer samples (27%, 39% and 26% respectively). In addition, the G allele was lost from 3 of 8 heterozygous sporadic prostate tumor samples compared to matched blood DNA. Bioinformatics analysis of MUC1 protein sequences provides insight into differences between the variants which may be functionally relevant. The literature indicates discrepancies between immunohistochemical studies, possibly due to the variety of antibodies targeting epitopes in diverse regions of the molecule. The contradictory findings in cell lines highlight the problem associated with inadequate experimental systems. This is the first report of genetic differences in MUC1 between blood and prostatic cancer tissue. This finding is important as proof of principle, given that many association studies focus on blood DNA rather than on tumor DNA. As yet, potential functional differences between splice variants have received little attention. Antibodies which discriminate between the variants and standardization of methods would help to clarify whether there is a role for MUC1 as a prognostic marker.


Introduction
Mucin 1 (MUC1, also designated CD227, EMA, H23AG, MAM6, PEM, PEMT or PUM) is a large type I glycoprotein. Classically defined by the presence of an extensive variable number tandem repeat (VNTR), MUC1 moieties vary in size. Modifications, such as phosphorylation or glycosylation, are frequent on both the core protein and the VNTR (Obermair et al. 2002). Indeed, it is proposed that each repeat may have 5 glycosyl side chains added (Obermair et al. 2002). MUC1 is both a transmembrane and a secreted protein, and its expression is restricted to the apical surface of glandular epithelium (Arai et al. 2005). MUC1 has roles in adhesion, protection from mechanical stress and bacterial infection, hydration and mucus production, immunosuppression and cellular signaling (Macao et al. 2006). Altered expression levels and localization, as well as the delayed tumor formation observed in knockout mice (Baruch et al. 1997), implicate MUC1 in cancers such as prostate cancer.
Processing of MUC1 proteins can result in both secreted and membrane-tethered variants, as illustrated in Figure 1. The manner by which MUC1 undergoes cleavage has recently been described (Levitin et al. 2005; Palmai-Pallag et al. 2005; Macao et al. 2006): the sea urchin sperm protein, enterokinase and agrin (SEA) domain generates the two polypeptides, MUC1-N and MUC1-C (Macao et al. 2006). The extracellular fragment (MUC1-N) remains at the cell membrane by forming heterodimers with the transmembrane fragment (MUC1-C) (Palmai-Pallag et al. 2005). The functions of MUC1 are likely to depend upon the length and modifications of MUC1-N, as well as the localization and binding partners of MUC1-C.
There are as many as 9 different MUC1 variants (according to the SwissProt database; see Figure 2 for a schematic presentation of the variants), with varying degrees of post-translational modifications (PTMs). Functional differences or tissue-specific distribution of the variants have not been conclusively proven. The general format for identifying these proteins is MUC1/isoform name or number; for example, MUC1/1 denotes MUC1 isoform 1. Variants may result from alternative splicing or genetic variations. The two best characterized variants, MUC1/1 or B (3506A) and MUC1/2 or A (3506G), are determined by a single nucleotide polymorphism (SNP) (rs4072037 (2005)) (Obermair et al. 2002) of MUC1 (chromosome 1q21). The variant allele (i.e. the least common) causes formation of a novel splice acceptor site (Ligtenberg et al. 1991), introducing an extra stretch of nucleotides between exons 1 and 2 of the mRNA, and thus gives rise to MUC1/1, which encodes 9 amino acids not present in MUC1/2 (Ligtenberg et al. 1991). This SNP (G3506A, GenBank accession number NT_079484.1) is also associated, by linkage disequilibrium, with the length of the VNTR (Baruch et al. 1997). Differential expression of MUC1 variants, including MUC1/1 and MUC1/2, has been noted previously in ovarian (Obermair et al. 2002) and breast cancers.
Figure 1. A schematic representation of MUC1 processing, adapted from (Julian et al. 2002; Engelmann et al. 2005; Levitin et al. 2005). Figure labels: basement membrane, lumen, epithelium. 1) Transcription of MUC1 gene. 2) Translation of immature MUC1 protein. 3) Maturation of the initial MUC1 protein. 4) Trafficking of mature MUC1 into the ER. 5) Primary cleavage and dimerization. White fragments correspond to MUC1-N while black fragments correspond to MUC1-C. 6) Transport to Golgi for post-translational modifications, i.e. glycosylation. 7) Trafficking to cell surface. 8) Recycling to Golgi via clathrin-mediated endocytosis. 9) Post-translational modifications, i.e. sialylation. 10) Secondary cleavage releasing extracellular component into intercellular space. 11) Signaling. 12) Endocytosis and recycling or degradation.
The potential role of MUC1 in prostate cancer has been studied extensively. However, development of MUC1 as a biomarker for presence or progression of prostate cancer has been hindered by conflicting reports. This report provides experimental evidence of a reduced G allele frequency in hereditary prostate cancer as well as loss of heterozygosity (LOH) of MUC1 in prostate tumor DNA compared to matched blood DNA. In addition, an in silico comparison of protein sequences and motifs, and thus an analysis of possible isoform differences, is summarized.

DNA samples
Samples were collected for a previous study and genomic DNA was extracted from blood as previously described (Li et al. 1999). A total of 199 blood DNA samples were analyzed, including 46 from sporadic and 51 from hereditary prostate cancer patients, 35 from benign prostatic hyperplasia (BPH) patients and 67 from healthy young men (population sample). Of the 46 sporadic prostate cancer patients, 22 also had DNA samples extracted from prostate cancer tissues, forming 22 pairs of matched normal and tumor DNA samples. The baseline characteristics of all subjects are described in Table 1. The population samples were collected anonymously from healthy young men (age about 20 years) entering military service in the north of Sweden. Hereditary prostate cancer samples were collected from the same region, while BPH and sporadic prostate cancer samples were collected from, and thus are representative of, the greater Stockholm region. All samples were from Swedish subjects. The Swedish population is rather homogeneous, with approximately 85% being Caucasian. BPH patients were selected at random, but sporadic prostate cancer patients were selected due to large tumor size. All samples had matching slides, which were reviewed and the diagnosis confirmed by a single pathologist at each centre. BPH samples represent a specific subset of the population which is very unlikely to subsequently develop prostate cancer, given the average age of 79 years. All BPH patients had histopathological examination of transurethral resection specimens to exclude possible incidental prostate cancer, in addition to measurement of serum prostate-specific antigen (all within the normal range) and other clinical examinations (including ultrasound and digital rectal examination) without signs of prostate cancer. Hereditary prostate cancer is here defined as a patient having at least 2 first-degree relatives with clinically and pathologically confirmed prostate cancer (Smith et al. 1996).
Ethical permission from Karolinska Institutet and Umeå University has been granted.

MUC1 SNP genotyping
Based on the human MUC1 genomic sequence (GenBank accession number: NT_079484.1), we designed 2 primer pairs to carry out a nested PCR according to a standard PCR protocol using Platinum Taq DNA polymerase (Invitrogen). Thermocycling parameters were set as follows: initial denaturation at 94 °C for 2 min; 30 cycles, each of 94 °C for 30 sec, 53 °C for 45 sec and 74 °C for 45 sec; and a final extension at 74 °C for 5 min. The size of the PCR product was confirmed by agarose gel electrophoresis, before it was purified using a Qiaquick PCR purification kit (Qiagen) and quantified by spectrophotometry. A sequencing reaction was carried out using a sequencing primer on the reverse strand with a Beckman Coulter DTCS quick start kit in accordance with the manufacturer's instructions. Primer sequences are available upon request. Beckman Coulter's CEQ™ 8000 Genetic Analysis System was used for sequence analysis. Quality control of sample processing was achieved by a single researcher performing all reactions, with a single protocol and kit. A random selection of samples was repeated to confirm results and accuracy. No clinical data were available to the researcher prior to genotype calling of these samples, thus preventing any bias. Sequence chromatograms were very clear (Fig. 2) and genotype calling was carried out by two independent researchers to confirm the analysis.
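As a small worked example, the thermocycling profile described above can be written down as data and its programmed duration summed. This is only a sketch: the variable and function names are hypothetical, the profile is assumed to apply to one round of the nested PCR, and ramp times between temperatures (which depend on the instrument) are ignored.

```python
# Each step is (label, temperature in deg C, duration in seconds), as listed in the text.
INITIAL_DENATURATION = ("initial denaturation", 94, 120)
CYCLE_STEPS = [
    ("denaturation", 94, 30),
    ("annealing", 53, 45),
    ("extension", 74, 45),
]
N_CYCLES = 30
FINAL_EXTENSION = ("final extension", 74, 300)


def programmed_seconds(initial, cycle_steps, n_cycles, final):
    """Sum the hold times of a PCR profile, ignoring instrument ramp times."""
    per_cycle = sum(duration for _, _, duration in cycle_steps)
    return initial[2] + n_cycles * per_cycle + final[2]


total = programmed_seconds(INITIAL_DENATURATION, CYCLE_STEPS, N_CYCLES, FINAL_EXTENSION)
print(f"{total} s = {total / 60:.0f} min")  # 4020 s = 67 min
```

The hold times alone therefore come to a little over an hour per run of this profile; the actual run time will be longer once ramping is included.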

Statistical analysis
To determine whether specific alleles were associated with BPH or prostate cancer, allele frequencies were compared between sample sets using the chi-squared (χ²) test, with the null hypothesis that allele frequencies do not differ between the sample sets.
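The comparison described above can be sketched as a Pearson chi-squared test on a 2x2 table of allele counts. The code below is a minimal illustration, not the authors' analysis: the function names are hypothetical, no continuity correction is applied (the text does not say whether one was used), and the genotype counts are invented purely so that the resulting G allele frequencies are close to the 15% and 27% quoted in the abstract. For one degree of freedom the p-value can be obtained from the complementary error function, so only the standard library is needed.

```python
import math


def allele_counts(n_gg, n_ga, n_aa):
    """Return (G, A) allele counts from genotype counts."""
    g = 2 * n_gg + n_ga
    a = 2 * n_aa + n_ga
    return g, a


def chi2_2x2(row1, row2):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    table = (row1, row2)
    n = sum(row1) + sum(row2)
    row_totals = (sum(row1), sum(row2))
    col_totals = (row1[0] + row2[0], row1[1] + row2[1])
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of chi-squared with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# HYPOTHETICAL genotype counts (GG, GA, AA), for illustration only;
# these are not the study's data.
hereditary = allele_counts(2, 11, 38)   # 51 subjects, G frequency about 15%
population = allele_counts(5, 26, 36)   # 67 subjects, G frequency about 27%

stat, p_value = chi2_2x2(hereditary, population)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```

Counting alleles rather than subjects doubles the sample size per group, which is the usual convention for an allelic test but assumes Hardy-Weinberg equilibrium; a genotype-based 2x3 test would be the alternative.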

LOH analysis
Using the Beckman Coulter CEQ™ 8000 Genetic Analysis System, LOH analysis was carried out on sporadic prostate cancer samples by comparing the genotype of rs4072037 in blood DNA with that in tumor DNA from the same patient. Parameters for this analysis were set as follows: percentage over peak spacing 70%; height ratio 30%; sensitivity 25%.
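The logic of the matched comparison can be sketched as follows, assuming genotypes have already been called by the sequencing software. The function name and the example pairs are hypothetical; the instrument's peak-height thresholds listed above are not modelled here.

```python
def call_loh(blood_genotype, tumor_genotype):
    """Return the allele lost in the tumor, or None if no LOH is called.

    LOH is called only when the blood sample is heterozygous and the
    matched tumor sample retains exactly one of the two blood alleles.
    """
    blood = set(blood_genotype)
    tumor = set(tumor_genotype)
    if len(blood) == 2 and len(tumor) == 1 and tumor <= blood:
        (lost,) = blood - tumor
        return lost
    return None


# HYPOTHETICAL matched (blood, tumor) genotype pairs, for illustration only.
pairs = [("GA", "AA"), ("GA", "GA"), ("AA", "AA")]
lost_alleles = [call_loh(blood, tumor) for blood, tumor in pairs]
print(lost_alleles)  # ['G', None, None]
```

Only heterozygous blood samples are informative in this scheme, which is why the abstract reports LOH as a fraction of heterozygous cases rather than of all matched pairs.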

Experimental data
Disease association
MUC1 exon 2 (3506G/A) genotype frequencies in blood DNA samples only demonstrated significant differences ( TC 7, TT 7.9). There appeared to be no correlation between LOH and grade of tumour or age.

Bioinformatics analysis of protein sequences
Variants
Table 3 provides a summary of differences between the MUC1 variants, including both described and bioinformatically predicted variants. There is a large degree of overlap between the different MUC1 variants and motifs, as shown schematically for the SwissProt variants (set a) in Figure 2. Another database, the Human Protein Atlas (HPA), lists 12 variants (set b). Four of the SwissProt variants (6, 7, 8 and 9) share 99% homology with 4 of those listed by the HPA (19, 12, 3 and 17 respectively). Interestingly, in all 4 of these pairs, the one amino acid difference is the 4th from last amino acid, which in set a is alanine, but in set b is threonine. In addition, of the 12 variants in set b, only 2 contain a VNTR (2 and 16), and Blast2 analysis demonstrates 100% homology between these two variants.
Most variants share a similar primary structure, consisting of a signal peptide, an extracellular domain, a transmembrane domain and a cytosolic domain. The N-terminal domains vary in length; however, sequence alignment indicates that a stretch of 51 amino acids of the N terminus is common to all variants (Fig. 4a). MUC1/4 (set a) lacks a further 10 amino acids which are present in all other variants. Only variants 9 (set a), 11 and 17 (set b) lack a transmembrane domain. MUC1-C is also well conserved (Fig. 4b), with a stretch of 74 amino acids present in all but variants 5 (both sets). An adjacent stretch of 30 amino acids is common to all except variants 9 (set a), 11 and 17 (set b). The isoform sequences then become more divergent. Of note, both variants 5 (both sets) appear to be distinct from the other variants; they contain the N-terminal consensus sequence, but at the C terminus share a stretch of only 23 amino acids, which contains no predicted domains. MUC1/5, 9 (set a) and 17 (set b) have two regions of 100% homology: an N-terminal region of 54 residues, including the signal peptide, and a stretch of 42 amino acids in the C-terminal region.

Signal peptides
Protein sequences from both SwissProt (set a) and the HPA (set b) were analyzed. Signal peptides were predicted for the primary sequences of all variants in both sets (Bendtsen et al. 2004) and are present in the N-terminal sequence (Fig. 4b). The signal peptide sequence reported previously (Baruch et al. 1997) is altered in some variants, with 2 (set a), 11, 14 and 15 (set b) having a stretch of 9 amino acids inserted into the signal peptide (as a result of the SNP in exon 2 analyzed here), whereas variants 3 and 4 (set a) lack 1 threonine and 2 valine residues. All other variants have the complete signal peptide.

Secreted vs membrane tethered variants
A SEA domain was predicted for most variants (Falquet et al. 2002), with the exception of 5, 7, 11, 15 and 19 (set b). However, searching the primary amino acid sequence of MUC1-C (Fig. 4b) of each isoform for the SEA autocatalytic cleavage motif (Palmai-Pallag et al. 2005) indicates that while most predictions are correct, MUC1/5, 9 (set a) and 17 (set b) do not have the consensus sequence. MUC1/19 (set b) has part of the consensus sequence, but as the first 2 residues of the motif are altered, it may not be functional (Palmai-Pallag et al. 2005).

Interactions
The motif conferring interaction with β catenin (Li et al. 1998) lies in the cytoplasmic domain (Fig. 4a) and is present in all but MUC1/5 (both sets). Variants 5 (both sets) are also the only variants lacking the domain required for interaction with ERα (Wei et al. 2006).

Phosphorylation
Part of the cytoplasmic domain is common to all "membrane-tethered" variants (i.e. those containing a transmembrane domain), and there is further homology between some variants. The potential for cytoplasmic domain phosphorylation (as determined by manually checking the primary sequence for known motifs) by GSK3β, Abl and Src does not appear to vary between the variants, as demonstrated by Figure 4b, with only variants 5 (both sets) lacking these sites. Predicted phosphorylation by PKC and CK2 however demonstrates differences between the variants (Table 3). Of note, there are phosphorylation sites unique to variants 4 (set a) and 7 (set b).
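Checking a primary sequence for known motifs, as described above, can be sketched with regular expressions. The two patterns below are the generic PROSITE consensus patterns for protein kinase C and casein kinase II phosphorylation sites; whether these exact patterns were the ones used for the predictions in Table 3 is an assumption, and the toy sequence is hypothetical rather than a MUC1 sequence. A lookahead is used so that overlapping sites are all reported.

```python
import re

# PROSITE-style consensus patterns translated to regular expressions.
MOTIFS = {
    "PKC_phospho": r"[ST].[RK]",    # PROSITE PS00005: [ST]-x-[RK]
    "CK2_phospho": r"[ST]..[DE]",   # PROSITE PS00006: [ST]-x(2)-[DE]
}


def scan_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: [(1-based start position, matched residues), ...]}.

    The lookahead (?=(...)) lets overlapping matches be found.
    """
    hits = {}
    for name, pattern in motifs.items():
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]
    return hits


# HYPOTHETICAL toy peptide, for illustration only (not a MUC1 sequence).
toy_sequence = "MASTKDSGAEESPRT"
hits = scan_motifs(toy_sequence)
print(hits)
```

Running the same scan over each variant's cytoplasmic tail and comparing the hit positions would show which sites are shared and which are unique to particular variants. Short consensus patterns like these match frequently by chance, so any hit is a candidate site, not evidence of phosphorylation.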

Post-translational modifications
Interest in post-translational modifications has mainly been focused on the addition of glycosyl side chains; however, many other modifications are likely. MUC1 variants 1-4 (set a), 2 and 6 (set b) are predicted to contain a proline-rich domain of 852 residues and, 6 amino acids downstream, a serine-rich domain of 97 residues (Falquet et al. 2002). This region encompasses the VNTR. The lengths of the VNTRs of MUC1/3 and 4 (set a) have not been reported. Given the number of modifications possible in the VNTR, the length of this region is likely to be very important for function. MUC1/5 (set a) also contains these two domains, but with the order inverted, and while the proline-rich domain is the same length, the serine-rich domain is only 47 amino acids long.
A potential motif for palmitylation (Kinlough et al. 2006) is found in the cytoplasmic region (Fig. 4b and Table 3) and is thus predicted for all variants except MUC1/5 (both sets).
Glycosylation of MUC1 variants is dependent on the glycosyl transferases in the cell; however, primary sequence analysis does indicate potential sites for attachment. Most side chains are added to the VNTR. Each repeat of the VNTR has the potential for 5 glycosyl modifications. Variants 6-9 (set a) and most of the set b variants do not contain the repeats characteristic of mucins, and are thus unlikely to be modified to the same extent. A variety of modification sites are predicted for all variants of MUC1 (Table 3). Potential myristylation sites indicate another modification, which all membrane-tethered variants share (Table 3). Of note, variants 15 and 7 (set b) have unique modification sites, for glycosylation and myristylation respectively. Patterns of possible sialylation modifications may have similarities with those seen in glycosylation, in that the VNTR is the primary site for such additions.

Discussion
The results presented here demonstrate, for the first time, a significantly reduced frequency in blood DNA of the MUC1 3506G allele in hereditary prostate cancer compared to population, BPH and sporadic prostate cancer samples. The same 3506G allele is subject to LOH in prostate cancer samples compared to matched blood samples. Differential expression of MUC1 variants 1 and 2 is thus implicated in prostate cancer. Loss of the G allele leads to a switch in expression from both MUC1/1 and MUC1/2 to exclusively MUC1/2, thus a protein with an intact signal sequence and shorter VNTR. A shorter VNTR may lead to a decreased protective function of this mucin on the normal prostatic epithelial cell. Moreover, as motifs for PTMs are mainly present in the VNTR, the length of the VNTR may theoretically have a multitude of effects on protein function. Due to the limited size of the sample set, independent validation is required. However, this finding is important as proof of principle, given that many association studies focus on blood DNA rather than on tumor DNA.
Diagnostic and prognostic use of differentially expressed MUC1 variants has been reported for ovarian and breast cancers (Obermair et al. 2002; Schmid et al. 2002). As yet, this has not been addressed in prostate cancer. Our genotyping of matched tumor and blood samples indicates that isoform differences are likely to be equally important in prostate cancer. For example, the SNP per se may not be critical for function, as it does not alter the encoded amino acid. The functional difference is likely to stem from the associated VNTR lengths and potentially from the altered signal peptide sequence caused by the effect of the SNP on splicing. The significant reduction in frequency of the G allele of MUC1 in hereditary prostate cancer compared to the general population, BPH and sporadic prostate cancer, and the trend towards reduction of the G allele in sporadic prostate cancer compared to the general population and BPH, are intriguing and worth further investigation. The influence of different environmental exposures acting upon the hereditary prostate cancer and general populations compared to the sporadic prostate cancer and BPH subjects cannot be ruled out.
MUC1 is a complex gene from which a number of variants can potentially be spliced, with further complex PTMs. This makes it difficult to pinpoint the functional significance of the allelic variation of the SNP presented here (rs4072037). We have therefore analysed in silico predictions of MUC1 variants. The potential for differences between the variants has not previously been assessed. Most of the variants are in silico translations of predicted mRNA splicing/open reading frames. Only variant 7 (set a) is reported to be specific to cancer (Obermair et al. 2002), while some are reportedly differentially regulated between normal and malignant tissue. The domains and sites for PTMs predicted are theoretical, thus biological evidence of their importance for the function of these variants is still required. However, this analysis provides insight into differences between the variants which may be functionally relevant.
A signaling role for MUC1 has been proposed. MUC1-C has been shown to colocalize with both ERα and P53 (Wei et al. 2005; Wei et al. 2006), and to influence the stability or transcriptional activity of its binding partner. Sub-cellular localization of the protein is likely to determine interactions of this kind.
The implications of the differences in MUC1 signal peptides are unknown. Inserting or losing amino acids may render the sequence inactive, or may enhance its function. It would be interesting to elucidate whether there are differences in processing efficiency or sub-cellular localization as a result of the variations in this region. Variants lacking the transmembrane domain will not, unlike the classical mucin, be inserted into the membrane; however, whether they are secreted, degraded or remain in the cytosol, where they might be more readily available for signaling, is not yet known. Variants which lack the TM domain yet contain a complete signal peptide (variants 9 (set a), 11 and 17 (set b)) may be secreted rather than incorporated into the cell membrane. This possibility is intriguing as the only isoform confirmed to be secreted (variant 5 (set a)) contains a VNTR. The absence of the VNTR from variants is likely to influence their steric barrier function the most, and may influence ligand-receptor-like interactions between MUC1-N and MUC1-C fragments and potentially signaling mechanisms. Due to the SEA domain, most variants are potentially able to give rise to a soluble extracellular domain. As MUC1/5 (set a) is a secreted isoform, the lack of the SEA domain is unsurprising, and may suggest that isoform 9 (set a) is also secreted. Variants which give rise to extracellular fragments or are secreted may function as ligands for the membrane-tethered potential receptors.
Differences in phosphorylation status of the cytoplasmic domain may determine the signaling pathways utilized by the different variants. In particular, the presence of PKC phosphorylation sites unique to variants 6 (set a), 7 and 19 (set b) is of interest, in light of an association between PKC phosphorylation and anchorage-independent growth (Thompson et al. 2006). Likewise, a CK2 phosphorylation site unique to variants 4 (set a) and 7 (set b) could alter signaling. Thus, the variants may differ in their oncogenic potential. Further research into the mechanisms and specificity of modifications and their association with progression is awaited with interest.
Most studies of MUC1 focus on the roles which occur by virtue of its VNTR domain, such as adhesion, mucus production and barrier function. Given the importance of the VNTR domain, it is interesting that the variants lacking this domain also contain the SEA site, implying that they too may give rise to soluble extracellular fragments. These variants would give rise to very short extracellular fragments as well as transmembrane fragments. How the two fragments transduce signals between the exterior and interior cellular environment is not clear. For example, ligand-receptor-like interactions between secreted and transmembrane MUC1 variants which result in phosphorylation of the MUC1/6 tail have been reported (Baruch et al. 1999; Obermair et al. 2002). This poses the questions: is it only the variants which lack a VNTR which are able to act as receptors, or do the transmembrane fragments of all variants have this function? Similarly, do all extracellular fragments or variants have potential as ligands? The differences in the variants' ability to form either ligands or receptors and the specificity of potential interactions add another level of complexity to the functions of MUC1. It is feasible that the relative abundance at the cell surface of the different variants and competition between variants may determine the resulting signals.
That both under- and over-expression of MUC1 are associated with prostate cancer death (Andren et al. 2007) suggests that the level of VNTR-containing MUC1 variants is tightly regulated in normal cells. However, the possibility that variations in expression of non-VNTR-containing variants may determine tumorigenic potential cannot be excluded. Regulation of glycosylation and other modifications is a dynamic process (Julian et al. 2002) and may determine the adhesive properties of the molecule and thus its ability to interact with other cell types. In addition, a heavily glycosylated VNTR would provide a mesh which may store growth factors and minimize immune or pathogen interactions with the cell membrane. The study reporting an association between glycosylation and angiogenesis but not PSA level or Gleason grade (Papadopoulos et al. 2001) suggests that MUC1 can indirectly (in this case via neovascularization) influence tumor growth. Further, it was previously reported that sialylated MUC1 interrupts cell-cell adhesion, but removal of sialylation restored adhesion (Wesseling et al. 1996); thus the modifications are functionally relevant.
Table 4 summarizes the variants likely to have been detected by the different antibodies in the immunohistochemistry studies of MUC1 expression in prostate cancer reported to date. The conclusions of these studies are thus limited to the variants detected by each antibody. Results of these studies have been inconclusive or contradictory and only one addressed the problem of multiple variants, if only briefly. Furthermore, contradictory results in prostate cancer cell lines have also been observed when PC3, DU145 and LNCaP (or a sub-culture, LNCaP LN3) were analyzed, with either differential or no expression observed (O'Connor et al. 2005), compared to high levels of over-expression in all cell lines (Cozzi et al. 2005).
The specific epitopes recognized by some of the antibodies used are unclear and may be part of the reason for the disparity in results between studies. Antibodies which discriminate between the variants are required for reliable assay of protein expression. Another reason for contradictions may be the presence of functionally different variants, so that measurement of total MUC1 expression may not be conclusive.
Most studies have assessed expression of MUC1 by targeting antibodies to the VNTR, and thus fail to detect expression of 4 of 9 MUC1 set a (and all of set b) variants. Indeed, the modifications added to the VNTR are highly variable and dynamic, thus a negative result when using an antibody which recognizes this region may be misleading, as variations in modification patterns (such as glycosylation) are likely to alter binding specificity. In addition, the VNTR can be extensive, so the question remains as to whether dense staining of a tissue sample reflects antibody molecules binding to multiple distinct MUC1 molecules, or many antibody molecules binding to the VNTR of a single MUC1 peptide.
We believe that the value of assessing variations in isoform expression has, so far, been under-appreciated, and that methods to date lack the standardization which would allow for meta-analyses of results. Predictions of differences between the MUC1 variants suggest distinct functions, of as yet unknown importance. The actual expression of in silico predicted variants needs to be confirmed and their biological functions determined experimentally. Further research into the mechanisms and specificity of PTMs and their association with progression is awaited with interest. Improvement in methods for determining isoform expression patterns and functions could thus yield valuable information. The finding described in this report of a prostate cancer-associated, functionally relevant variation supports the notion that the MUC1 gene may be a useful marker for prostate and other cancers.