Identification of a non-canonical nuclear localization signal (NLS) in BRCA1 that could mediate nuclear localization of splice variants lacking the classical NLS

The breast cancer type 1 susceptibility gene (BRCA1) is a tumor suppressor gene, mutations or loss of which lead to genomic instability and breast cancer. BRCA1 protein is part of a large multi-protein complex involved in a variety of DNA repair and transcription regulatory functions. At least four splice variants have been described and these differ in their function and tissue and spatio-temporal expression patterns. Structural analysis has revealed the presence of two nuclear localization signals (NLS) located in exon 11 of BRCA1. Interestingly, a splice variant of the protein that lacks both of the known NLS still manages to gain entry to the nucleus. While there is experimental proof for the translocation of these proteins by binding to other established nuclear proteins, we examined the possibility of a hitherto unidentified NLS in this particular variant. In this paper, we present evidence for the existence of a previously unreported non-canonical NLS contained within the first 39 amino acids of exon 11. A fusion protein with this 39mer and a reporter green fluorescent protein translocated into the nucleus when it was expressed in breast epithelial cells. We demonstrate the presence of a hitherto unreported noncanonical NLS in exon 11a of BRCA1. This NLS might aid proteins that were encoded by splice variants and lack the canonical NLS to localize to the nucleus.


INTRODUCTION
BRCA1 is a tumor suppressor gene, mutations of which are associated with hereditary breast cancer [1,2]. Reduced expression and aberrant sub-cellular localization of BRCA1 have also been reported in sporadic breast cancer [3]. The gene is composed of 24 exons, of which 22 are coding, including an unusually large exon 11, which is 3.4 kb long. Altogether it encodes a large protein of 220 kDa or 1,863 amino acids (aa) [1]. From a structural point of view, three domains of the protein are particularly important [4]. The RING domain of BRCA1 (aa 20 to 64) is involved in many protein-protein interactions of BRCA1 and also exhibits E3 ubiquitin ligase activity [4][5][6][7]. Two classical NLS have been described: NLS1-503KRKRRP508 and NLS2-606PKKNRLRRKS615, which are both capable of the nuclear translocation of BRCA1 [7,8].
The C-terminal domain of BRCA1 (aa 1560 to 1863) has been shown to function as a transcriptional activation domain. Four mRNA variants have been described: the full-length BRCA1 (1863 aa; p220 kDa); BRCA1(∆11) (721 aa; p79 kDa); BRCA1(∆11b) or BRCA1a (760 aa; p110 kDa); and BRCA1(∆9,10,11b) or BRCA1b (719 aa; p100 kDa) [9][10][12]. Both BRCA1a and BRCA1b have an in-frame deletion of the majority of exon 11, from aa 263 to 1365. BRCA1b has an additional deletion of exons 9 and 10. However, both BRCA1a and BRCA1b still retain a 39-aa long part of exon 11 (exon 11a), from aa 224 to 262. There are conflicting reports regarding the localization, importance and specific roles of these splice variants in normal and malignant breast epithelial cells. However, over the past 15 years, Rao et al. have performed detailed studies to delineate the specific health- and disease-related roles of the two major non-full-length splice variants, BRCA1a and BRCA1b [10][13][14][15][16]. Among other things, Wang et al. showed that the BRCA1, BRCA1a, and BRCA1b proteins are localized in the cytoplasm, nucleus and mitochondria, and that their nuclear-cytoplasmic shuttling is a regulated process [10]. They demonstrated the growth inhibitory function of these variants and in particular that BRCA1a has antitumor activity in triple negative breast cancer. They have also shown that both BRCA1a and BRCA1b associate with the E2F family of transcription factors, cyclins and cyclin-dependent kinases. In 2002, Fabbro et al. [17] showed that BRCA1 can bind to the BRCA1-associated RING domain protein (BARD1) through its NH2-terminal RING domain.
Their findings provided a plausible mechanism for the nuclear transport of BRCA1 exon 11 splice variants that lack the two known NLS but retain the RING domain. Qin et al. [15] proposed yet another piggy-back mechanism of BRCA1 translocation into the nucleus, through its binding to ubiquitin-conjugating enzyme 9 (Ubc9). Ubc9 is a SUMO-E2-conjugating enzyme that is known to be transported into the nucleus by importin 13 [18]. They showed that down-regulation of Ubc9 resulted in enhanced cytoplasmic localization of BRCA1 and exclusive cytoplasmic retention of BRCA1a and BRCA1b. In this study, we looked for the possible presence of an unidentified NLS to explain the nuclear localization of BRCA1a and BRCA1b, which lack the two classical NLS. Because BRCA1a and BRCA1b are present in the nucleus while BRCA1(∆11) is not, we hypothesized that the 39 aa of exon 11 that are present in these two variants could be responsible for their transport into the nucleus. Examination of the primary sequence of the 39mer did not reveal a classical monopartite or bipartite NLS. However, it does have a stretch of sequence that contains three basic amino acids. We modelled the 3D structure of the BRCA1 variants that lack the full-length exon 11 but retain the 39mer using in silico methods and attempted the docking of this protein with the minor NLS-binding pocket of importin alpha. We then assessed the strength of this interaction using a novel scoring function, the NLS score. The NLS score is the sum of the normalized number of inter-atomic bonds, net hydrophobic/hydrophilic residues, and atomic contact energies that stabilize the docked interfaces of NLS-containing proteins and importins (Hari et al., manuscript in preparation). The results indicated that the 39mer has the potential to bind specifically to the minor NLS-binding pocket of importin alpha, on par with the non-classical signal sequences of experimentally proven nuclear proteins.
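The primary-sequence check described above can be illustrated with a minimal sketch. It assumes a simplified consensus for a classical monopartite NLS, K-(K/R)-X-(K/R); this pattern, the helper names and the choice to count only lysine and arginine as basic residues are illustrative assumptions and not the procedure used in this study. The peptides are the two classical BRCA1 NLS quoted in the Introduction and the KRAAER stretch (aa 252 to 257) of exon 11a reported in the Results.

```python
import re

# Simplified consensus for a classical monopartite NLS: K-(K/R)-X-(K/R).
# Assumption for illustration only; not the authors' screening method.
MONOPARTITE = re.compile(r"K[KR].[KR]")

# Residues counted as basic in this sketch (lysine and arginine).
BASIC = set("KR")


def has_monopartite_motif(seq: str) -> bool:
    """Return True if the simplified K(K/R)X(K/R) consensus occurs in seq."""
    return MONOPARTITE.search(seq.upper()) is not None


def count_basic(seq: str) -> int:
    """Count lysine and arginine residues in seq."""
    return sum(1 for aa in seq.upper() if aa in BASIC)


# Peptide sequences quoted in this paper.
peptides = {
    "NLS1 (aa 503-508)": "KRKRRP",
    "NLS2 (aa 606-615)": "PKKNRLRRKS",
    "exon 11a stretch (aa 252-257)": "KRAAER",
}

for name, seq in peptides.items():
    motif = has_monopartite_motif(seq)
    print(f"{name}: classical motif = {motif}, basic residues = {count_basic(seq)}")
```

Under this simplified pattern the two classical NLS match and the KRAAER stretch does not, although it carries three basic residues, which is in line with treating it as a candidate non-canonical signal rather than a classical one.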
To further investigate the potential of this 39mer to act as a nuclear localization signal, we constructed two fusion proteins in an enhanced green fluorescent protein (EGFP) expression vector, in which GFP was fused to regions of BRCA1 with or without the 39mer. When the human breast epithelial cell line HBL-100 was transfected with the clones expressing the fusion proteins, only the construct containing the 39mer translocated into the nucleus.

Cell line
HBL-100 cells were purchased from NCCS Pune, India, and maintained as per the instructions provided. They were cultured in McCoy's medium (HI Media Labs, India) supplemented with 1 mM L-Glutamine, 10% fetal bovine serum (FBS; Sigma Aldrich, India) and 1% penicillin-streptomycin.

Extraction of RNA and preparation of cDNA
RNA was extracted from approximately 10^6 HBL-100 cells using Tri Reagent (Sigma Aldrich, India) according to the manufacturer's instructions, and quantitated using RiboGreen fluorimetry (Turner Biosystems). One microgram of total RNA was converted to cDNA using the ABI High Capacity cDNA Kit (Applied Biosystems, India).

Primer design and synthesis
Primers for BRCA1 (ENSG00000012048), including restriction enzyme (RE) sites for the subsequent cloning steps, were designed to amplify the region containing the exon-11a-39mer. The predicted amplicon of 590 bp encompassed a total of 189 aa: from 147 to 263 inclusive, and from 1366 to 1439 inclusive. This amplicon represented a fragment of the BRCA1a splice variant containing exon 11a. The RE sites for XhoI and BamHI were added at the 3' and 5' ends of the two primers, respectively. A second primer pair (including RE sites) was designed for the negative control, which lacks the region coding for the exon-11a-39mer. The predicted amplicon of 480 bp for the negative control sequence encompassed a total of 153 aa from 1438 to 1590, covering parts of exons 13 and 16, and all of exons 14 and 15. This amplicon could be produced from wild-type BRCA1, BRCA1a or BRCA1b. No known NLS have been identified in this fragment. As for the test primers, RE sites for XhoI and BamHI were added at the 3' and 5' ends of the two primers, respectively. Primers were synthesized by Eurofins MWG (Bangalore, India) at a synthesis scale of 0.01 µmol to a high-purity, salt-free (HPSF) grade.

PCR, gel electrophoresis and amplicon purification
Two separate PCRs were set up to amplify the test and negative control fragments of BRCA1 from HBL-100 total cDNA [10]. A total of 50 ng of cDNA was used as a template in a 25-µl reaction volume. The PCR mix contained a low-fidelity Taq polymerase (ReadyMix Taq PCR reaction mix with MgCl2, Sigma Scientific, India) with final concentrations of 200 nM each of the forward and reverse primers. The PCR parameters were as follows: denaturation at 95°C for 10 min; followed by 35 cycles of denaturation at 95°C for 1 min, annealing at 60°C for 45 s and extension at 72°C for 45 s; with a final 10-min extension at 72°C. PCR products were gel-purified using the Promega Wizard SV Gel and PCR Clean-Up System according to the manufacturer's instructions. Briefly, the band corresponding to the predicted size of the amplicon was excised from the agarose gel (1%), melted in the buffer provided in the kit and then purified on a column. The purified DNA was quantified by NanoDrop spectrophotometry (Thermo Fisher) and re-run on an agarose gel to confirm its size. The purified test amplicon was sequenced by a service provider (Eurofins India Pvt. Ltd) on a Model 377 DNA sequencer (Applied Biosystems, Foster City, CA).
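As a worked illustration of the cycling parameters and primer concentration stated above, the following minimal sketch encodes the program as data and derives two quantities from it: the total programmed hold time and the amount of each primer per reaction. The class and function names are hypothetical, and ramp times between temperatures are ignored, so the real run time on a thermocycler would be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# Cycling parameters as stated in the text.
INITIAL = Step("initial denaturation", 95, 10 * 60)
CYCLE = (
    Step("denaturation", 95, 60),
    Step("annealing", 60, 45),
    Step("extension", 72, 45),
)
N_CYCLES = 35
FINAL = Step("final extension", 72, 10 * 60)


def hold_time_seconds() -> int:
    """Sum of programmed hold times; ramping between steps is not included."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


def primer_pmol(conc_nm: float, volume_ul: float) -> float:
    """Amount of primer in pmol: nM x µl gives fmol, so divide by 1000."""
    return conc_nm * volume_ul / 1000.0


print(f"Programmed hold time: {hold_time_seconds() / 60:.1f} min")
print(f"Each primer per 25-µl reaction: {primer_pmol(200, 25):.1f} pmol")
```

With the stated values this gives 107.5 min of programmed hold time and 5 pmol of each primer per reaction.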

Construction of test and control clones
The expression vector used for the generation of the fusion proteins was pEGFP-C3 (Clontech Laboratories, USA). This vector codes for an enhanced variant of GFP with a predicted molecular mass of ~30 kDa [11]. The test clone that included the coding sequence for exon-11a-39mer was constructed by inserting the restriction-digested 590-bp insert in frame downstream of the coding sequence of EGFP. The test fusion protein, EGFP-BRCA1-exon-11a-39mer (hereafter referred to as the test protein), had a predicted molecular weight of 50 kDa. The negative control clone that lacked exon-11a-39mer was constructed by inserting the 480-bp fragment in frame downstream of the coding sequence of EGFP. The negative control fusion protein, EGFP-BRCA1-exon-13-16 (hereafter referred to as the negative control), had a predicted molecular weight of 46 kDa.
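The predicted sizes quoted above can be approximated with a back-of-the-envelope calculation. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not a value taken from this study, and it ignores any linker residues contributed by the vector's cloning site. The function names are hypothetical, and the 45 kDa figure is the minimum construct size chosen in this study to limit unaided nuclear entry.

```python
# Rule-of-thumb average mass per amino acid residue (assumption, ~110 Da).
AVG_RESIDUE_KDA = 0.110

# EGFP mass as stated in the text (~30 kDa).
EGFP_KDA = 30.0

# Minimum fusion size targeted in this study to limit passive nuclear entry.
MIN_FUSION_KDA = 45.0


def fusion_mass_kda(insert_aa: int, tag_kda: float = EGFP_KDA) -> float:
    """Approximate fusion protein mass: tag mass plus insert residues x 110 Da."""
    return tag_kda + insert_aa * AVG_RESIDUE_KDA


def exceeds_minimum(mass_kda: float) -> bool:
    """True if the fusion is larger than the 45 kDa design minimum."""
    return mass_kda > MIN_FUSION_KDA


# Insert lengths in residues, as stated in the primer design section.
for label, n_aa in (("test (with 39mer)", 189), ("negative control", 153)):
    mass = fusion_mass_kda(n_aa)
    print(f"{label}: ~{mass:.1f} kDa, above 45 kDa: {exceeds_minimum(mass)}")
```

The estimates (about 50.8 kDa and 46.8 kDa) are close to the predicted values of 50 kDa and 46 kDa, and both lie above the 45 kDa minimum.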

Transformation, ligation and plasmid DNA extraction
Electrocompetent cells of an Escherichia coli strain with an efficiency of 10^4 to 10^6 CFU/µg were used for transformations. Ligations of the vector with the test and negative control fragments (restriction enzyme digested, gel purified) were set up separately at 16°C overnight. One µl of ligation mix was added to 100 µl of electrocompetent cells, electroporated at 2000 V for 5 ms using an e-porator (Eppendorf Technologies) and plated onto LB + kanamycin plates. Colonies were picked, grown and analyzed using standard miniprep methods of small-scale plasmid DNA extraction. The resulting DNA was digested to verify positive clones, which were cultured for large-scale DNA preparation before transfection.

Transfection
HBL-100, a human breast epithelial cell line, was transfected with the test and control constructs. This cell line was chosen so that the behaviour of the fusion proteins in a normal background could be examined. HBL-100 cells were plated at a density of 4 × 10^4 cells per well onto cover-slips in 4-well plates. After 24 h, cells were transfected with 1 µg each of the test construct (EGFP-BRCA1-exon-11a-39mer), the EGFP vector or the negative control construct (EGFP-BRCA1-exon-13-16). Transfection was carried out using Effectene Transfection Reagent (Qiagen) according to the manufacturer's instructions. Briefly, 5 µl of the Effectene Transfection Reagent and 60 µl of Buffer EC (supplied in the kit) were mixed with 1 µg of plasmid DNA and added dropwise to the HBL-100 cells. The cells were incubated with the Effectene-DNA complex for 5 h at 37°C. After incubation, the Effectene-DNA mix was aspirated and 300 µl of fresh complete medium was added to the cells dropwise.

Fixing and staining
At 24 h post-transfection, the cells on the cover-slips were fixed with 4% paraformaldehyde in phosphate buffered saline (PBS) for 20 min at room temperature (RT). Cells were washed twice with PBS + 0.1% Triton X-100 (PBT) and then blocked with 10% fetal bovine serum (FBS) in PBT for 10 min at RT. The cells were immunolabelled using a rabbit anti-GFP IgG (immunoglobulin G) fraction (Invitrogen) at a 1:200 dilution in blocking solution and incubated for a minimum of 2 h at RT. After the primary antibody treatment, cells were washed twice with PBT. The secondary antibody, Alexa Fluor 488 goat anti-rabbit IgG (Invitrogen), was added at a 1:1000 dilution and incubated for 1 h at RT. The cells were then washed several times with PBT. Nuclei were stained with the 4',6-diamidino-2-phenylindole (DAPI) present in ProLong Gold anti-fade reagent (Invitrogen). A drop of this mountant was placed on a glass slide and the cover-slip was gently lifted from the well and placed cell surface down on the mountant. Excess mountant was drained and the cover-slip was sealed.

Microscopy
Imaging was done on an Olympus BX51 upright microscope with a fluorescence attachment. The cells were visualized and images recorded at 40x and 100x magnification.

RESULTS
The primary sequences of the splice variants that were unequivocally cytoplasmic were compared with those of the variants that were at least partly nuclear. The complete loss of exon 11 leads to total retention of the protein in the cytoplasm, while both BRCA1a and BRCA1b are also found in the nucleus [7]. Alignment of these three variants revealed that both BRCA1a and BRCA1b had the N-terminal 39 amino acids (aa) of exon 11, also termed exon 11a. This is shown schematically in Fig. 1. Fig. 1 also shows the sequence of the 39 aa of exon 11a, which is retained during the splicing of exon 11 in the variant BRCA1a(∆11b) and was included in the test clone. Examination of the sequence revealed that while it did not code for a classical NLS, there was a stretch from aa 252 to 257 (KRAAER) that contained three basic amino acids. In our 3D modelling and docking studies, this stretch had the potential to bind the minor binding pocket of importin alpha, comprised of helix 3 of ARM repeats 7 and 8 (Hari et al., manuscript in preparation).
In order to verify the potential NLS function of the 39mer, a GFP-BRCA1 fusion protein that contained the 39mer was made. Two criteria were considered for the fusion protein construct. Firstly, the GFP encoded by the vector had a predicted molecular mass of ~30 kDa. It is known that a few proteins of less than 40 kDa might occasionally be able to move unaided into the nucleus [19]. We therefore decided to construct a fusion protein of at least 45 kDa, so that any nuclear localization observed could be attributed to the added 39mer sequence. Secondly, our 3D modelling studies had indicated that engagement with the concave binding surface of the importin NLS-binding pocket is influenced by flanking sequences. The construct designed was approximately 50 kDa, and the 39mer was part of a larger stretch that encompassed the sequences immediately upstream and downstream, which have no experimentally proven nuclear localization signals.
The constructs are represented schematically in Fig. 2. The presence of the cloned DNA fragments of 590 bp in the test clone and 480 bp in the negative control clone was confirmed by release of the fragment by restriction digestion followed by electrophoresis. Clones that contained the positive test construct were sequenced to confirm the fusion between the GFP and BRCA1. Of the nine test clones that were sequenced, two had the expected sequence and seven had a point mutation at position 108 (aa 226) that led to the replacement of an A nucleotide by a T. This caused a change in aa 226 from cysteine to serine. The BRCA1 39mer construct (test), the test construct with the point mutation (test+m) and the parent vector, EGFP, were used to transiently transfect the human HBL-100 cells. The localization of the GFP was recorded by immunofluorescence. Fig. 3 represents the results of these transfections. Non-transfected cells were identified by their staining only with DAPI, a nuclear marker. The transfection efficiency was 30%. The transfection experiments were performed four times. For each experiment, a total of 500 cells were counted per set and classified according to the localization of the GFP labelling as follows: cells with GFP exclusively in the cytoplasm, cells with GFP entirely localised within the nucleus, and cells with GFP in both the nucleus and the cytoplasm. A representative picture for each of the three sets is given in Fig. 3. The panel shows merged images of cells labelled with GFP and DAPI and represents nuclear and cytoplasmic staining for the test construct, the test+m construct and the negative control. Cells transfected with the negative control construct showed almost exclusively cytoplasmic localization (99.7 ± 0.4%, p < 0.05). We observed three types of localization in cells transfected with the test and/or test+m constructs.
The test construct was predominantly cytoplasmic (63 ± 3%, p < 0.05), with 13 ± 1% (p < 0.05) of cells showing GFP in the nucleus and 24 ± 8% (n = 4) in both the nucleus and the cytoplasm. Compared to the test construct, the test+m construct showed a higher proportion of cytoplasmic localization (78 ± 3%, p < 0.05), a lower proportion of nuclear localization (8 ± 3%, p < 0.05) and a lower proportion of combined nuclear and cytoplasmic localization (14 ± 7%, n = 4). These results suggest that the 39mer present in the test construct is able to direct a protein into the nucleus and that a mutation in the 39mer sequence reduces this effect.
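The way per-experiment cell counts translate into the percentages reported above can be sketched as follows. This is an illustration only: the counts in the example are hypothetical placeholders and are not data from this study, the function names are invented, and the sketch assumes that the ± values denote the sample standard deviation across replicate experiments, which the text does not state.

```python
from statistics import mean, stdev

CATEGORIES = ("cytoplasmic", "nuclear", "both")


def percentages(counts: dict) -> dict:
    """Convert raw cell counts for one experiment into percentages."""
    total = sum(counts[c] for c in CATEGORIES)
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}


def summarize(replicates: list) -> dict:
    """Mean and sample SD of the per-experiment percentages, per category."""
    per_rep = [percentages(r) for r in replicates]
    summary = {}
    for c in CATEGORIES:
        values = [p[c] for p in per_rep]
        summary[c] = (mean(values), stdev(values))
    return summary


# HYPOTHETICAL counts for two replicate experiments (not data from this study).
example = [
    {"cytoplasmic": 50, "nuclear": 25, "both": 25},
    {"cytoplasmic": 70, "nuclear": 15, "both": 15},
]

for category, (avg, sd) in summarize(example).items():
    print(f"{category}: {avg:.1f} ± {sd:.1f}%")
```

In the experiments described here, each replicate would contribute 500 counted cells per construct and there would be four replicates rather than the two toy entries shown.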

DISCUSSION
Of the multiple mechanisms that have evolved to regulate protein function, sequestration of a protein in a compartment that lacks its partners and/or targets is a peculiarity of eukaryotic systems. Nuclear entry is an assisted, energy-dependent mechanism. Since the initial report that the fundamental property of NLS was contained in a stretch of at least four basic amino acids [20], there has been much refinement in defining consensus sequences and in the development of methods to predict the presence of an NLS in a protein. However, there is considerable evidence that the classical NLS are neither necessary nor sufficient in and of themselves in all cases to effectively transport a protein into the nucleus in vivo [21]. The 3D modelling and the calculation of a score for the docking to importin alpha demonstrated that the 39mer has a sequence, 252KRAAER257, that has the potential to act as a non-classical NLS and to bind to the minor NLS-binding pocket of importin alpha (Hari et al., manuscript in preparation). There are at least two definitive experimental strategies that are necessary to confirm the capability of a sequence to act as an NLS. The first is to tag a confirmed cytoplasmic protein with the putative NLS and demonstrate the nuclear localization of this tagged protein. The second is mutation of the putative NLS residues to neutral amino acids to abolish the nuclear localization of the protein. We used the former approach in our studies and showed that the 39mer is indeed capable of acting as an NLS. This conclusion is based on the shift from cytoplasm to nucleus that was observed when the EGFP construct contained the 39mer sequence, the lessening of this effect when the sequence was mutated, and the absence of nuclear localization when the construct did not contain the 39mer (as in the case of our negative control construct). The results have been confirmed with two different clones for each construct.
The proportion of positively stained nuclei indicates that the sequence is efficient for transport, even if the level is slightly lower than that obtained with the classical NLS sequence.
The fact that the test clone had a nuclear localization in only approximately 35% of cells might be related to the efficiency of this non-canonical nuclear localization signal. It is known from both experimental and modelling studies that canonical nuclear localization signals bind to the major binding pocket of importin alpha, while the nuclear localization signal contained in the 39mer is likely to mediate its effect by binding to the minor binding pocket. The use of GFP fusions has been the method of choice for monitoring the subcellular localization of proteins, and this has been the approach used in most tests of putative NLS-containing sequences. The localization of GFP in and of itself has been reported to be both cytoplasmic [22] and nuclear [23,24]. We have also tested our vector with unfused EGFP protein for its localization in HBL-100 and human embryonic kidney 293 (HEK-293) cells and noted its expression to be entirely cytoplasmic (data not shown). However, in most studies, GFP has been used as a fused component of the protein whose localization was being investigated. In all such studies, only the construct that possessed the NLS was found to have the nuclear localization, and the GFP fusion lacking the NLS was recorded to be cytoplasmic [25][26][27][28][29]. The increased size of the construct and the changed conformation are additional factors which could prevent the free diffusion of the protein into the nucleus, since it is known that proteins larger than 45 kDa are less likely to move freely into the nucleus [19]. In this study, the fusion protein expressed by the negative control construct was also found only in the cytoplasm. This corroborates the reported behavior of fusion proteins lacking an NLS. The mere ability of a particular sequence to perform a function cannot be construed to mean that it performs the function in vivo under normal physiological circumstances.
A straightforward explanation for the retention of the 39mer in the two nuclear phospho-proteins p110 and p100, encoded by the splice variants BRCA1a and BRCA1b, might be the ability of this 39mer to function as an NLS. These proteins are present in the nucleus, but lack the canonical NLS (NLS1-503KRKRRP508 and NLS2-606PKKNRLRRKS615). This does not exclude the possible indirect transport of BRCA1 when bound to BARD1 or Ubc9 [17,30]. The finding that the 39mer could function as an NLS provides a plausible explanation for the evolutionary retention of this part of exon 11 in the two main splice variants of BRCA1 that lack the rest of exon 11. "BRCAness" is an emerging concept, wherein a sub-group of sporadic breast cancer types (triple negative or basal sub-group) that do not have BRCA1 mutations still resemble tumors with BRCA1 mutations both at the molecular level and clinically [31]. A variety of mechanisms might lead to such BRCA1 dysfunction or sub-optimal function. A systematic examination of human breast cancer specimens for the presence, levels and localization of these splice variants might provide further clues as to the contribution of these variants to the different sub-groups of breast cancer.