Internal Ribosome Entry Segment Activity of ATXN8 Opposite Strand RNA

Spinocerebellar ataxia type 8 (SCA8) involves the expansion of CTG/CAG repeats from the overlapping ataxin 8 opposite strand (ATXN8OS) and ataxin 8 (ATXN8) genes located on chromosome 13q21. Although it is transcribed, spliced and polyadenylated in the CTG orientation, ATXN8OS does not itself appear to be protein coding, as only small open reading frames (ORFs) were noted. In the present study we investigated the translation of a novel 102-amino-acid ORF in the ATXN8OS RNA. Expression of a chimeric construct carrying the ORF fused in frame to EGFP demonstrated that ATXN8OS RNA is translatable. Using an antiserum raised against the ORF, ATXN8OS ORF expression was detected in various human cells, including lymphoblastoid cells, embryonic kidney 293 cells and neuroblastoma IMR-32, SK-N-SH and SH-SY5Y cells, and in human muscle tissue. The biological role of the ATXN8OS ORF and its connection to SCA8 remain to be determined.


Introduction
The spinocerebellar ataxias (SCAs) comprise a heterogeneous group of disorders involving progressive degeneration of the cerebellum, brainstem, and spinal tracts [1]. Among the SCAs, SCA type 8 (SCA8) has a molecular trait that distinguishes it from other dominant ataxias: it involves a CTG repeat expansion in the ATXN8OS (ataxin 8 opposite strand) gene and a CAG repeat expansion in the overlapping ATXN8 (ataxin 8) gene [2]. In the CTG direction, ATXN8OS expresses spliced and polyadenylated untranslated transcripts in various brain tissues [3]. In the CAG direction, the expanded ATXN8 encodes a polyglutamine expansion protein [4], a type of protein known to be pathogenic in other polyglutamine disorders.
The pathogenesis of SCA8 appears to be complex. In addition to the polyglutamine expansion protein encoded in the CAG direction, other plausible mechanisms involving the CTG-direction transcripts have been proposed. First, in a Drosophila model, ectopically expressed ATXN8OS RNA interacted with RNA-binding proteins and caused late-onset, progressive degeneration of photoreceptor and pigment cells in flies [5], supporting an RNA gain-of-function mechanism [6]. Second, partial loss of Klhl1 function caused by targeted deletion of a single Sca8 ataxia locus allele (which includes the overlapping KLHL1 gene) in mice leads to impaired Purkinje cell function [7], suggesting an antisense RNA interference mechanism. Our recent study using a cellular model of ATXN8OS also revealed that larger SCA8 triplet expansions alter histone modification and induce RNA foci [8]. RNA foci were also observed in SCA8 patient and mouse brains, with MBNL1 protein colocalizing with these foci [9].
Although ATXN8OS RNA is apparently non-coding [3], it contains a 102-amino-acid open reading frame (ORF). The ORF begins 446 nucleotides (according to NR_002717) or 1246 nucleotides (according to [10]) downstream of the 5′ end of ATXN8OS RNA (Fig. 1A). In eukaryotes, translation initiation involves recruitment of ribosomal subunits either at the 5′ m7G cap structure or at an internal ribosome entry site (IRES). In the cap-dependent mechanism, the initiation codon of most mRNAs lies some distance downstream of the cap, so the ribosome must move to this site, either by linear scanning or by bypassing segments of the 5′ leader [11]. The cap-independent mechanism requires the formation of a complex RNA structural element, the IRES, and the presence of trans-acting factors [12]. As a result, the ribosome entry window attains an unstructured conformation, which facilitates ribosome recruitment. In addition, non-AUG triplets may be used as translation initiation codons [13,14]. In this study we first examined the cap-independent IRES activity of ATXN8OS RNA using a dual luciferase reporter assay. We then fused the ATXN8OS ORF in frame with an EGFP tag and used cell culture studies to investigate whether the ATXN8OS ORF can be translated. ORF expression was validated in human lymphoblastoid, neuroblastoma and embryonic kidney cells and in muscle tissue using an ORF antiserum. Translation of the ATXN8OS ORF was further examined by tandem mass spectrometry (MS/MS).

IRES Activity of ATXN8OS RNA
Despite the apparently non-coding nature of ATXN8OS [3], a 102-amino-acid ORF (AUG +1247) was noted in the ATXN8OS transcripts (Fig. 1B). To investigate whether this ORF can be translated via cap-independent IRES activity, we constructed a dicistronic vector, pRF, in which the firefly luciferase gene was placed downstream of the Renilla luciferase gene (Fig. 1C). The construct was under the control of the HSV-TK promoter. The sequence upstream of the ATXN8OS ORF (+801 to +1195; [10]) was inserted into the intercistronic region of pRF, and the IRES from encephalomyocarditis virus (EMCV) [15] was inserted as a positive control. With the luciferase level of the EMCV IRES set as 100%, the +801 to +1195 fragment directed firefly luciferase synthesis to 33.7% in HEK-293 cells and 19.6% in IMR-32 cells (Fig. 1D, left). When the +801 to +1195 fragment was further subdivided into +801 to +953 and +953 to +1195, HEK-293 cells showed 29.9% and 180.8%, respectively, of the activity of the +801 to +1195 fragment (Fig. 1D, right). These results suggest that IRES activity may exist in the region upstream of the ATXN8OS ORF.
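Because the sub-fragment values are expressed relative to the +801 to +1195 fragment rather than to the encephalomyocarditis virus IRES control, a reader may want both on one scale. The Python sketch below simply chains the two normalizations; it assumes the HEK-293 values from the two panels can be multiplied directly, which is a simplification, and the helper name `rescale_to_emcv` is hypothetical. The rescaled numbers are derived by arithmetic, not measured.

```python
def rescale_to_emcv(parent_pct_of_emcv: float, sub_pct_of_parent: float) -> float:
    """Express a sub-fragment's activity as a percentage of the viral IRES control.

    Assumes both percentages come from the same cell line and that the two
    normalizations can be chained by multiplication (a simplification).
    """
    return parent_pct_of_emcv * sub_pct_of_parent / 100.0


# HEK-293 values stated in the text
parent = 33.7  # +801 to +1195 fragment, % of the viral IRES control
for name, sub in [("+801 to +953", 29.9), ("+953 to +1195", 180.8)]:
    print(f"{name}: {rescale_to_emcv(parent, sub):.1f}% of control")
```

Under that assumption, the +801 to +953 and +953 to +1195 fragments correspond to roughly 10.1% and 60.9% of the viral IRES control activity in HEK-293 cells.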

ATXN8OS ORF Expression
To investigate whether the ATXN8OS ORF can indeed be translated, we cloned the ATXN8OS cDNA (NR_002717) and fused an EGFP tag in frame to the C terminus of the ATXN8OS ORF (Fig. 2A, pCMV/+801). Transcripts made from this construct are initiated from exon D (+801). Because the promoter region upstream of exon D5 was identified by comparing human and mouse genomic DNA sequences flanking the 5′ end of the transcripts [10], ATXN8OS gene sequence +1 to +800 was included in construct pCMV/+1 so that its transcripts are initiated from exon D5 (+1). In constructs pATXN8OS/−481 and pATXN8OS/−114, the proximal ATXN8OS promoter fragments −481 to −1 and −114 to −1 were used to drive ATXN8OS expression, to mimic the in vivo situation.
To visualize the expression of the ORF-EGFP protein, GFP fluorescence was examined by confocal microscopy after transfection of the pIRES2-EGFP, pCMV/+801, pCMV/+1 and pATXN8OS/−114 constructs into HEK-293 cells. As shown in Fig. 3A, strong GFP fluorescence was distributed diffusely in pIRES2-EGFP-transfected cells. In cells transfected with pCMV/+801, small, dispersed granules appeared mainly in the cytoplasm, in addition to diffuse cytoplasmic fluorescence. Cells transfected with pCMV/+1 or pATXN8OS/−114 showed sparse granules and weak, diffuse GFP fluorescence.
To examine the expressed ORF-EGFP protein, a GST-ORF fusion protein (S. japonicum GST from a pGEX plasmid) was prepared as an antigen to raise antiserum in rabbit. Western blot immunostaining was performed with a GFP antibody or the ORF antiserum. As shown in Fig. 3B, both reagents detected proteins of 40 and 30 kDa in cells transfected with pCMV/+801. Whereas the weakly expressed 40 kDa protein may represent the predicted ORF-EGFP protein (AUG +1247 start, 348 amino acids, MW 39472; ExPASy: http://web.expasy.org/compute_pi/), the 30 kDa protein differs from the predicted size. The 30 kDa protein may be initiated from a downstream in-frame AUG codon (AUG +1490 start, 267-amino-acid fusion protein, MW 30061). A larger protein of around 50 kDa was also detected on Western blots probed with either the GFP antibody or the ORF antiserum. The existence of this 50 kDa protein suggests that the ORF-EGFP protein may also be translated from sequence upstream of AUG +1247.
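The predicted molecular weights above come from ExPASy Compute pI/Mw. A comparable estimate can be sketched in Python as the sum of average residue masses plus one water molecule. The mass table below holds the average residue masses commonly tabulated for this purpose; small differences from ExPASy output are possible, and the helper name `average_mw` is hypothetical. Because the fusion protein sequences are not listed here, the example uses only the peptide VPCPGAPCCSLVATGSR from the ORF Identification results, with unmodified cysteines.

```python
# Average residue masses in daltons (residue = amino acid minus one water)
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mw(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified linear peptide or protein."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


print(f"{average_mw('VPCPGAPCCSLVATGSR'):.1f} Da")
```

Post-translational changes such as removal of the initiator methionine are not modeled, so the sketch gives an upper-level sequence estimate rather than the mass of the mature protein.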

ORF Immunodetection
To validate whether the ATXN8OS ORF is indeed expressed in human cells, the ORF antiserum was used to detect possible endogenous ORF protein. We could hardly detect ORF protein in the RIPA-soluble fraction, and the predicted 102-amino-acid ATXN8OS ORF protein has a 62.9% chance of insolubility when overexpressed in E. coli (http://www.biotech.ou.edu/). We therefore used urea lysis buffer for lymphoblastoid protein extraction, since the average molecular weight of proteins that dissolve exclusively in urea buffer is up to 60% higher than that of proteins extracted with RIPA buffer [16]. On Western blots stained with the ORF antiserum, an unexpected 23 kDa protein was detected in the insoluble pellet fraction, whereas no specific polypeptide was detected with pre-immune serum (Fig. 4A). The same 23 kDa protein was also observed in the urea buffer-insoluble pellet fraction prepared from embryonic kidney 293 cells, neuroblastoma IMR-32, SK-N-SH and SH-SY5Y cells, and human muscle tissue (Fig. 4B).

ORF Identification
To identify the endogenous ATXN8OS ORF protein, lymphoblastoid proteins from the urea buffer-insoluble pellet fraction were subjected to 2D PAGE and 2D immunoblotting (Fig. 5A). The identity of the three ORF-specific spots was determined by LC-MS/MS and a Mascot search against a database set up for the predicted ORF. As shown in Fig. 5B, six matched peptides were obtained with 47% sequence coverage, including the N-terminal peptide VPCPGAPCCSLVATGSR, which can only be generated by translation starting from GUG +953, because an in-frame UGA stop codon lies upstream of GUG +953.
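The argument for GUG +953 rests on the reading frame: a peptide beginning at this codon cannot derive from a longer in-frame ORF if an in-frame stop codon lies upstream. The Python sketch below checks both properties on a sequence (U is converted to T). It decodes every codon, including the first, with the standard genetic code, so initiator-tRNA decoding is not modeled. The toy sequence is hypothetical; only its GUGCCCUGCCCA segment is taken from the +953 sense primer in the Methods, and the helper names are illustrative.

```python
# Standard genetic code in the conventional TCAG ordering
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def _clean(seq: str) -> str:
    """Upper-case and convert RNA letters (U) to DNA letters (T)."""
    return seq.upper().replace("U", "T")


def translate_from(seq: str, start: int) -> str:
    """Translate in frame from 0-based index `start` up to, not including, the first stop."""
    seq = _clean(seq)
    peptide = []
    for pos in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def upstream_inframe_stop(seq: str, start: int):
    """Return the 0-based index of the nearest in-frame stop upstream of `start`, or None."""
    seq = _clean(seq)
    for pos in range(start - 3, -1, -3):
        if CODON_TABLE[seq[pos:pos + 3]] == "*":
            return pos
    return None


# Hypothetical toy: in-frame UGA, a spacer codon, GUG CCC UGC CCA, then a UAG stop
toy = "UGAAAAGUGCCCUGCCCAUAG"
print(translate_from(toy, 6))         # VPCP
print(upstream_inframe_stop(toy, 6))  # 0
```

If `upstream_inframe_stop` returns a position, no in-frame initiation further upstream can produce a protein that runs through the start codon being tested.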

Discussion
The ATXN8OS gene was isolated directly from a single sample using the RAPID cloning method [3,17]. Sequence analysis revealed that the expansion consisted of a stretch of 11 CTA repeats followed by 80 CTG repeats. Analysis of this sequence did not reveal any possible spliced isoform with an ORF extending through the expansion in either direction. Therefore, SCA8 was first proposed to be caused by an RNA gain-of-function mechanism [6]. In this study, we used a dual luciferase assay to demonstrate that ATXN8OS RNA +801 to +1195 has IRES activity (Fig. 1). Because the ATXN8OS ORF detected in human cells was predicted to be translated from GUG +953 (Fig. 5), the IRES activity of the +801 to +953 fragment was compared with that of the +953 to +1195 fragment. To our surprise, the +953 to +1195 fragment showed higher IRES activity, whereas the +801 to +953 fragment showed less (Fig. 1). The presence of a 12-amino-acid ORF (AUG +890 to UAG +926) within the +801 to +953 fragment may explain the reduced translation from the downstream firefly luciferase cistron. A similar situation has been described for the human angiotensin II type 1 receptor (AGTR1) mRNA (IRES name: AT1R_var3; http://www.iresite.org/IRESite_web.php?page=browse_cellular_transcripts) [18,19]. Accordingly, the enhanced IRES activity of the +953 to +1195 fragment may reflect relief of the inhibition caused by translation of this small ORF. Because the cap-independent mechanism requires the formation of a complex RNA structural element and the presence of trans-acting factors, it is also possible that inhibitory elements within the +801 to +953 fragment regulate ATXN8OS RNA IRES activity. Identifying the trans-acting factors involved will help to clarify the translation mechanism of ATXN8OS RNA.
In our study, the predicted translation start GUG +953 lies within the ATXN8OS IRES region +801 to +1195, which differs from the general view that putative IRES sequences are located in close proximity to the 5′ end of the coding region. Nevertheless, the putative IRES region (+33 to +362, according to NM_004835) of hAT1R-C v3 mRNA also overlaps the sequence encoding the N-terminal 35 amino acids of AGTR1 isoform II (AUG +258 to AAA +360) (http://www.iresite.org/IRESite_web.php?page=view&entry_id=84), which may support our finding.
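The residue counts quoted for the ATXN8OS upstream ORF and the AGTR1 N terminus follow directly from nucleotide positions: the number of codons between two in-frame positions is their distance divided by three, and the last codon adds a residue only if it is a sense codon. A minimal Python check, with `residues_encoded` as a hypothetical helper name:

```python
def residues_encoded(start_nt: int, end_nt: int, end_is_stop: bool) -> int:
    """Count amino acids encoded from the codon at start_nt to the codon at end_nt.

    Both positions give the first nucleotide of a codon in the same numbering.
    A stop codon at end_nt contributes no residue.
    """
    span = end_nt - start_nt
    if span < 0 or span % 3 != 0:
        raise ValueError("positions are not in the same reading frame")
    codons = span // 3
    return codons if end_is_stop else codons + 1


print(residues_encoded(890, 926, end_is_stop=True))   # 12: ATXN8OS upstream ORF
print(residues_encoded(258, 360, end_is_stop=False))  # 35: AGTR1 isoform II N terminus
```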
Although transcripts initiated from +801 and +1 had similar ORF RNA levels, their EGFP fluorescence levels differed markedly (Fig. 2). Unknown proteins or other factors that bind to ATXN8OS RNA +1 to +800 and down-regulate ATXN8OS ORF translation warrant further investigation.
When expression of the ORF-EGFP protein was visualized by confocal microscopy, varying numbers of small, dispersed cytosolic granules were observed (Fig. 3A), correlating with ORF RNA (Fig. 2B) and EGFP fluorescence (Fig. 2C) levels. The cytosolic localization of the GFP-tagged ORF was also supported by Western blotting of stepwise-isolated cytoplasmic and nuclear fractions and by confocal microscopy of images from continuous focal planes (data not shown).
Using the antiserum raised against the ORF, expression of the ATXN8OS ORF was validated in various human cells and in muscle tissue (Fig. 4). The observed 23 kDa ORF protein is likely initiated from GUG +953 (Fig. 5). A 50 kDa protein was also detected with the EGFP antibody or the ORF antiserum when the ORF-EGFP fusion protein was transiently expressed in 293T cells (Fig. 3). Because cellular and viral mRNAs can initiate translation from non-AUG codons that differ from AUG by just one nucleotide [14], both the 23 kDa ORF protein and the 50 kDa fusion protein are predicted to be initiated from the same upstream in-frame GUG codon (GUG +953 start; a 200-amino-acid ORF with predicted MW 22669, or a 446-amino-acid ORF-EGFP with predicted MW 50324).
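The predicted lengths can be cross-checked from positions alone: moving the start codon upstream by a multiple of three nucleotides adds one residue per codon, and moving it downstream removes them. The minimal Python check below (hypothetical helper `length_from_new_start`) reproduces the 200-, 446- and 267-residue figures from the 102-residue ORF and the 348-residue ORF-EGFP product initiated at AUG +1247.

```python
def length_from_new_start(length_aa: int, ref_start: int, new_start: int) -> int:
    """Protein length if translation starts at new_start instead of ref_start.

    Positions are the first nucleotide of each start codon in the same numbering;
    both starts must share the reading frame and the same stop codon.
    """
    offset_nt = ref_start - new_start  # positive when the new start is upstream
    if offset_nt % 3 != 0:
        raise ValueError("start codons are not in the same reading frame")
    return length_aa + offset_nt // 3


print(length_from_new_start(102, 1247, 953))   # 200: ORF from GUG +953
print(length_from_new_start(348, 1247, 953))   # 446: ORF-EGFP from GUG +953
print(length_from_new_start(348, 1247, 1490))  # 267: ORF-EGFP from AUG +1490
```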
Use of alternative non-AUG translation initiation codons, in addition to initiation at downstream in-frame AUG codons, has been demonstrated with increasing frequency in mammalian species [20]. Translation initiation on such mRNAs results in the synthesis of proteins with different amino-terminal domains, potentially conferring distinct functions on these isoforms. Because alternative initiation sites are used to synthesize proteins that regulate biological processes in health and disease [21][22][23], the biological significance of the ATXN8OS ORF protein and its role in the pathogenesis of SCA8 remain to be determined.
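Codons that "differ from AUG by just one nucleotide" form a small, fixed set of near-cognate codons. The Python sketch below enumerates them by single-base substitution; it says nothing about how efficiently each codon is used in cells, and `near_cognate_codons` is a hypothetical helper.

```python
def near_cognate_codons(reference: str = "AUG") -> list:
    """Return all codons differing from `reference` at exactly one position (RNA alphabet)."""
    bases = "ACGU"
    variants = []
    for i, ref_base in enumerate(reference):
        for base in bases:
            if base != ref_base:
                variants.append(reference[:i] + base + reference[i + 1:])
    return variants


print(near_cognate_codons())
# ['CUG', 'GUG', 'UUG', 'AAG', 'ACG', 'AGG', 'AUA', 'AUC', 'AUU']
```

GUG, the codon proposed for ATXN8OS ORF initiation, is one of these nine.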
Previous bioinformatic analyses demonstrated that distinct consensus sequences (at positions −7 and −6), upstream AUGs, 5′-UTR length, G/C ratio and IRES secondary structure are important for distinguishing mRNAs with alternative translation initiation sites from those without [24]. Among these properties, the 5′-UTRs of alternative translation initiation sites showed conservation of G/C at position −6 and C at position −7. In contrast, AUG initiation sites showed consensus A/G at position −3 and G/A at position +4 [24,25]. The ATXN8OS ORF GUG initiation codon has the conserved C at position −7 but a less common U at position −6, whereas the downstream in-frame AUG codon has the conserved A at position −3 but a less common U at position +4. Although position −6 is not well conserved, the conserved C at position −7 and other, unanalyzed properties may support use of GUG, the second most common alternative translation initiation codon [23], for translation of the ATXN8OS ORF protein.
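The positional notation used here numbers the first base of the initiation codon as +1 and the base immediately upstream as −1, with no position 0. A minimal Python helper for reading context nucleotides under this convention is sketched below; the function name is hypothetical, and the example is a commonly cited Kozak-like context (GCCACCAUGG), not the ATXN8OS sequence.

```python
def context_base(seq: str, start: int, position: int):
    """Return the nucleotide at `position` relative to a start codon at 0-based index `start`.

    Convention: +1 is the first base of the start codon and -1 the base just
    upstream; position 0 does not exist. Returns None outside the sequence.
    """
    if position == 0:
        raise ValueError("position 0 is undefined in this numbering")
    idx = start + position if position < 0 else start + position - 1
    return seq[idx] if 0 <= idx < len(seq) else None


example = "GCCACCAUGG"  # illustrative context; the AUG starts at index 6
for pos in (-7, -6, -3, 4):
    print(pos, context_base(example, 6, pos))
```

Returning None rather than raising for out-of-range positions keeps the helper usable on short 5′-UTRs, where position −7 may not exist.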
In summary, our study indicates that the putative ATXN8OS ORF protein is translatable and may be expressed via a naturally occurring non-AUG start codon. The biological role of the ATXN8OS ORF and its connection to SCA8 deserve further investigation.

Ethics Statement
This study was performed according to a protocol approved by the institutional review boards of Chang Gung Memorial Hospital, and all examinations were performed after obtaining written informed consent.

Dual Luciferase Reporter Constructs
The 1.3-kb ATXN8OS cDNA containing exons D, C2, C1, B, and A [26] (Fig. 1B) was cloned as described [8]. The ATXN8OS cDNA was then cloned into the EcoRI site of pEGFP-N1 (Clontech). To construct a dual luciferase reporter, a 76-bp XbaI-BamHI polylinker region of pcDNA3 was first added between the XbaI and BamHI sites of the phRL-TK vector (Promega) to introduce an XhoI site and to remove the SV40 late poly(A) region. A 1972-bp XhoI-BamHI fragment containing the firefly luciferase gene and the SV40 late poly(A) signal from the pGL3-Basic vector (Promega) was then placed between the XhoI and BamHI sites of the modified phRL-TK vector. The resulting dual luciferase reporter plasmid carried the Renilla luciferase and firefly luciferase genes between the TK promoter and the polyadenylation signal (Fig. 1C). The ATXN8OS cDNA in pEGFP-N1 was digested with XhoI and HaeIII, and the blunted cDNA fragment (+801 to +1195) was placed in the blunted XhoI site between the two luciferase genes. The sense and antisense primers used to amplify ATXN8OS +801 to +953 and +953 to +1195 cDNA were 5′-GCGCCGAATTCATCCTTCACCTGTT and 5′-CAAAAGCTTCTCAGCAGCCAGCCA, and 5′-GGTTAGAATTCGTGCCCTGCCCAGG and 5′-AAATAAGCTTCCCGGCGGGGGGA, respectively (EcoRI and HindIII sites underlined). The resulting PCR products were cloned, sequenced and digested with EcoRI and HindIII to replace the +801 to +1195 fragment in the dual luciferase reporter plasmid. The 632-bp blunted XhoI-MscI IRES fragment from pIRES2-EGFP (Clontech) was inserted between the two luciferase genes as a positive control.
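The primers above carry restriction sites in their 5′ tails. The Python sketch below locates recognition sequences, including degenerate IUPAC codes such as the AccI site GTMKAC used for the GST-ORF construct, using regular-expression look-aheads so that overlapping matches are also reported. The small enzyme table is hand-entered for illustration; for real work a curated resource such as REBASE is preferable. All listed sites are palindromic, so scanning one strand is sufficient.

```python
import re

# IUPAC nucleotide codes needed for the sites below
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}

# Recognition sequences, 5' to 3'; all are palindromic
SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "SmaI": "CCCGGG",
    "BstBI": "TTCGAA",
    "AccI": "GTMKAC",
}


def find_sites(seq: str):
    """Return sorted (0-based position, enzyme) pairs for every site match in seq."""
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES.items():
        pattern = "".join(IUPAC[base] for base in site)
        # A zero-width look-ahead lets overlapping occurrences be reported
        for match in re.finditer(f"(?={pattern})", seq):
            hits.append((match.start(), enzyme))
    return sorted(hits)


primers = {
    "+801 to +953 sense": "GCGCCGAATTCATCCTTCACCTGTT",
    "+801 to +953 antisense": "CAAAAGCTTCTCAGCAGCCAGCCA",
    "+953 to +1195 sense": "GGTTAGAATTCGTGCCCTGCCCAGG",
    "+953 to +1195 antisense": "AAATAAGCTTCCCGGCGGGGGGA",
}
for name, primer in primers.items():
    print(name, find_sites(primer))
```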

Luciferase Reporter Assay
Human embryonic kidney HEK-293 and neuroblastoma IMR-32 cells were cultivated in Dulbecco's modified Eagle's medium (DMEM) containing 10% FBS. Cells were plated into 12-well dishes (2 × 10^5 cells/well), grown for 20 hr and transfected with the test dual luciferase reporter plasmid (1.5 μg/well) by lipofection (GibcoBRL). The cells were grown for a further 48 hr, after which cell lysates were prepared and luciferase activity was measured with a luminometer using a dual luciferase assay system (Promega). IRES activity was expressed as the ratio of the firefly luciferase level to the Renilla luciferase level. For each construct, three independent transfection experiments were performed.
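As a minimal Python sketch of how such ratios can be summarized, each transfection's firefly/Renilla ratio is expressed as a percentage of the mean ratio obtained with the viral IRES positive control, and the replicates are then averaged. Normalizing to the control's mean (rather than pairing wells), the made-up readings, and the helper name `relative_ires_activity` are assumptions for illustration, not the data behind Fig. 1D.

```python
from statistics import mean, stdev


def relative_ires_activity(fluc, rluc, ctrl_fluc, ctrl_rluc):
    """Return (mean, SD) of test Fluc/Rluc ratios as % of the control's mean Fluc/Rluc ratio.

    Each argument is a sequence of readings from independent transfections.
    """
    if len(fluc) != len(rluc) or len(ctrl_fluc) != len(ctrl_rluc):
        raise ValueError("firefly and Renilla readings must be paired")
    ctrl_ratio = mean(f / r for f, r in zip(ctrl_fluc, ctrl_rluc))
    percents = [100.0 * (f / r) / ctrl_ratio for f, r in zip(fluc, rluc)]
    return mean(percents), stdev(percents)


# Made-up luminometer readings for three transfections (illustration only)
m, sd = relative_ires_activity(
    fluc=[30.0, 33.0, 27.0], rluc=[10.0, 10.0, 10.0],
    ctrl_fluc=[100.0, 100.0, 100.0], ctrl_rluc=[10.0, 10.0, 10.0],
)
print(f"{m:.1f} ± {sd:.1f}% of control")
```

Dividing by the Renilla signal first corrects each well for transfection efficiency before any comparison between constructs is made.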

ATXN8OS ORF-EGFP Constructs
The ORF translation termination codon in exon C1 was removed and a SmaI restriction site (underlined) was added by PCR using primer 5′-GCGCCCGGGACACTTCAACTTCCTATACATACA, and the product was cloned into pGEM-T Easy (Promega). The EcoRI (in the MCS of the pGEM-T Easy vector)-SmaI fragment containing the ATXN8OS ORF was fused in frame to the EGFP gene in the pEGFP-N1 vector (between the EcoRI and BstUI sites). Part of the Kozak consensus translation initiation sequence (ACCATG) in the EGFP gene was then removed by site-directed mutagenesis (QuikChange XL Site-Directed Mutagenesis Kit, Stratagene) using primer 5′-CGGGCCCGGGATCCACCGGTCGCCDGTGAGCAAGGGCGAGGAGCTG (D = deleted ACCATG). The resulting pCMV/+801 construct (where +801 represents the transcription start site of exon D) (Fig. 2A) was verified by DNA sequencing. The construct was predicted to encode an ORF-EGFP fusion protein of 348 amino acids.

Real-time PCR Quantification of ORF-EGFP Transcripts
HEK-293 cells were plated into 6-well dishes (6 × 10^5 cells/well), grown for 20 hr and transfected with the pCMV/+801, pCMV/+1, pATXN8OS/−114 and pATXN8OS/−481 constructs (4 μg/well). Forty-eight hours later, total RNA was extracted using Trizol (Invitrogen). The RNA was treated with DNase (Stratagene), quantified, and reverse-transcribed to cDNA with random primers using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems). Real-time quantitative PCR was performed on an ABI PRISM® 7000 Sequence Detection System (Applied Biosystems), using cDNA equivalent to 250 ng total RNA and the TaqMan fluorogenic probes Hs01382089_m1 (exon C2/C1 boundary) for ATXN8OS and 4326321E for HPRT1 (endogenous control) (Applied Biosystems). An additional customized Assays-by-Design probe (forward primer: ACTGCATTTCAGGAGCAAAAAGAGA; reverse primer: GTCCCTGTGGTTTGAATCTATTCCA; TaqMan® probe: CAGTGGCCTCATTTTG) (ATXN8OS exon D5/D4 region, Applied Biosystems) was used for ATXN8OS mRNA quantification. Fold change was calculated as $2^{\Delta C_T}$, where $\Delta C_T = C_T(\text{control}) - C_T(\text{target})$ and $C_T$ is the cycle threshold. Statistical differences between groups were assessed by one-way analysis of variance (ANOVA).
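The fold-change formula and the one-way ANOVA can be sketched in plain Python as follows. The sketch follows the definition given here, ΔCT = CT(control) − CT(target), and reads "control" as the HPRT1 endogenous control measured in the same sample, which is our interpretation. It computes only the F statistic; a p value additionally requires the F distribution, for example through SciPy's `scipy.stats.f_oneway`. Helper names and example values are hypothetical.

```python
def relative_expression(ct_control: float, ct_target: float) -> float:
    """Return 2**dCt with dCt = Ct(control) - Ct(target), as defined in the text."""
    delta_ct = ct_control - ct_target
    return 2.0 ** delta_ct


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(relative_expression(ct_control=20.0, ct_target=23.0))  # 0.125
print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))   # (13.5, 1, 4)
```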

FACS Analysis of ORF-EGFP Expression
HEK-293 cells were plated into 12-well dishes (2 × 10^5 cells/well), grown for 20 hr and transfected with the above ATXN8OS ORF-EGFP constructs, pIRES2-EGFP or pEGFP-N1 (2 μg/well). Cells were then harvested for fluorescence-activated cell sorting (FACS) analysis. GFP expression was analyzed on a FACStar flow cytometer (Becton-Dickinson) equipped with an argon laser (488 nm excitation), with GFP emission detected at 530 nm. A forward scatter gate was set to exclude dead cells and cell debris from the analysis. In each sample, 10^4 cells were analyzed.
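The gating step can be sketched as a simple filter over per-event values: events below a forward-scatter threshold are discarded as debris or dead cells, and the remaining events are summarized by the percentage above a GFP threshold and by the mean GFP signal. The thresholds, the (FSC, GFP) event format and the helper name `summarize_gfp` are assumptions for illustration; in practice gates are usually drawn interactively in the cytometer software.

```python
def summarize_gfp(events, fsc_min: float, gfp_threshold: float):
    """Gate (fsc, gfp) events on forward scatter and summarize GFP.

    Returns (number of gated events, % GFP-positive, mean GFP of gated events).
    """
    gated = [gfp for fsc, gfp in events if fsc >= fsc_min]
    if not gated:
        raise ValueError("no events pass the forward-scatter gate")
    positive = [g for g in gated if g > gfp_threshold]
    pct_positive = 100.0 * len(positive) / len(gated)
    mean_gfp = sum(gated) / len(gated)
    return len(gated), pct_positive, mean_gfp


# Made-up events: the first one falls below the scatter gate and is excluded
events = [(10, 5), (200, 50), (300, 500), (250, 800)]
print(summarize_gfp(events, fsc_min=100, gfp_threshold=100))
```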

GST-ORF Construct and Antiserum
To construct a GST-tagged ORF for antiserum production, BstBI and EcoRI sites (italic) were added to the 5′ and 3′ ends of the ATXN8OS ORF by PCR using primers 5′-GCGCTTCGAATGTGCTTCACATCGAAGTC and 5′-CCGGAATTCTCAACACTTCAACTTCCTATAC (initiation and termination codons in boldface). The 317-bp BstBI-EcoRI fragment containing the ATXN8OS ORF sequence was then inserted between the AccI (location 928) and EcoRI (location 944) sites of pGEX-5X-3 (GE Healthcare). The AccI site at location 928 (italic) was introduced by site-directed mutagenesis using primer 5′-GATCTGATCGAAGGTCGACGGATCCCCAGGAATTCC (mismatched nucleotides in boldface). The resulting pGST-ORF construct was verified by DNA sequencing and introduced into BL21(DE3)pLysS (Novagen). After IPTG induction, the 36-kDa antigen was purified using GST•Bind™ resin (Novagen) and used to raise antiserum in rabbit (LTK BioLaboratories).

Lymphoblastoid and Neuroblastoma Cell Lines
Lymphoblastoid cells from a normal control were established (Food Industry Research and Development Institute, Taiwan) after obtaining informed consent. Cells were maintained in RPMI 1640 medium (GIBCO) containing 10% FBS. Human neuroblastoma SK-N-SH, SH-SY5Y and IMR-32 cells were cultivated in DMEM (IMR-32 and SK-N-SH) or 1:1 mixture of DMEM and F12 medium (SH-SY5Y) containing 10% FBS.

ORF Identification
For 2D PAGE and 2D immunoblotting, five volumes of 9.8 M urea lysis buffer were added to the pellet, and aliquots of the pellet suspension were first separated on Immobiline DryStrips (7 cm, pH 3-10) (GE Healthcare) and then by 12.5% SDS-PAGE. The blotting membranes were stained with ORF antiserum (1:200 dilution) or actin antibody (1:10000 dilution, Chemicon), and immune complexes were detected as described. The 2D gel was stained with SYPRO Ruby (Molecular Probes) and scanned on a Typhoon 9400 imager (GE Healthcare), and the resulting map was compared with the 2D immunoblot. The ORF-specific spots were excised, reduced and alkylated with DTT/iodoacetamide, and digested in gel overnight at 37°C with freshly prepared Trypsin Gold (2.5 ng/μl, Promega). The resulting peptides were extracted with 50% acetonitrile containing 1% trifluoroacetic acid, and tandem mass spectra were generated by liquid chromatography-tandem mass spectrometry (LC-MS/MS) at the Proteomics and Protein Function Core Laboratory, Center of Genomic Medicine, National Taiwan University. MS/MS data were searched using the Mascot search engine (www.matrixscience.com) against a database containing the theoretical tryptic fragments of the 23-kDa ORF protein initiated at the GUG +953 codon.
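A database of theoretical tryptic fragments can be generated with the classic trypsin rule: cleave after K or R unless the next residue is P. The Python sketch below applies only that rule, without missed cleavages or modifications such as the carbamidomethylated cysteines produced by iodoacetamide. `tryptic_digest` is a hypothetical helper, and the example input is a made-up sequence whose first peptide is the reported N-terminal peptide VPCPGAPCCSLVATGSR.

```python
def tryptic_digest(protein: str, min_length: int = 1):
    """Cleave C-terminal to K or R, except when the next residue is P (no missed cleavages)."""
    protein = protein.upper()
    peptides, current = [], []
    for i, aa in enumerate(protein):
        current.append(aa)
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append("".join(current))
            current = []
    if current:  # C-terminal peptide without a K/R end
        peptides.append("".join(current))
    return [p for p in peptides if len(p) >= min_length]


print(tryptic_digest("VPCPGAPCCSLVATGSRAKPRG"))
```

A search engine additionally matches each observed MS/MS spectrum against these theoretical peptides and their fragment ions; the sketch covers only the digestion step.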