Transcriptional Complexity and Distinct Expression Patterns of auts2 Paralogs in Danio rerio

Several genes that have been implicated in autism spectrum disorders (ASDs) have multiple transcripts. Therefore, comprehensive transcript annotation is critical for determining the respective gene function. The autism susceptibility candidate 2 (AUTS2) gene is associated with various neurological disorders, including autism and brain malformation. AUTS2 is important for activation of transcription of neural specific genes, neuronal migration, and neurite outgrowth. Here, we present evidence for significant transcriptional complexity in the auts2 gene locus in the zebrafish genome, as well as in genomic loci of auts2 paralogous genes fbrsl1 and fbrs. Several genes that have been implicated in ASDs are large and have multiple transcripts. Neurons are especially enriched with longer transcripts compared to nonneural cell types. The human autism susceptibility candidate 2 (AUTS2) gene is ∼1.2 Mb long and is implicated in a number of neurological disorders including autism, intellectual disability, addiction, and developmental delay. Recent studies show AUTS2 to be important for activation of transcription of neural specific genes, neuronal migration, and neurite outgrowth. However, much remains to be understood regarding the transcriptional complexity and the functional roles of AUTS2 in neurodevelopment. Zebrafish provide an excellent model system for studying both these questions. We undertook genomic identification and characterization of auts2 and its paralogous genes in zebrafish. There are four auts2 family genes in zebrafish: auts2a, auts2b, fbrsl1, and fbrs. The absence of complete annotation of their structures hampers functional studies. We present evidence for transcriptional complexity of these four genes mediated by alternative splicing and alternative promoter usage. Furthermore, the expression of the various paralogs is tightly regulated both spatially and developmentally. 
Our findings suggest that auts2 paralogs serve distinct functions in the development and functioning of target tissues.

Although several features, such as putative nuclear localization sequences (NLS), two proline-rich regions (PR1 and PR2), the PY motif (PPPY), and histidine repeats, can be predicted in the human AUTS2 protein (Sultana et al. 2002; Oksenberg and Ahituv 2013), it does not contain any regions of homology to other proteins, except a region called the Auts2 family domain, with homology to the fibrosin (FBRS) and fibrosin-like 1 (FBRSL1) proteins. AUTS2, FBRSL1, and FBRS genes thus form a paralog group. Paralogs are created by a duplication event within a genome and may evolve new functions. The functional roles of Auts2 family proteins and whether they are functionally diversified are not yet entirely clear.
Recently, proteomic analysis revealed that AUTS2, FBRS, and FBRSL1 are associated with the same subset of Polycomb Repressive Complex 1 (PRC1) (Gao et al. 2012) and, in particular, AUTS2 renders PRC1 capable of transcription activation (Gao et al. 2014). Though initially thought to be a nuclear protein (Bedogni et al. 2010), AUTS2 was shown to be also present in the cytoplasm, where it regulates Rho family GTPases to control neurite outgrowth and neuronal migration (Hori et al. 2014). Thus, it seems that AUTS2 is a multifunctional protein regulating distinct pathways in neural development. The roles of FBRS and FBRSL1 in neural development have not yet been explored.
We wanted to study the functions of auts2 family genes in neural development using the zebrafish model system. Zebrafish provide multiple advantages for addressing these questions because of the ease of genetics, the transparency in early life stages, and the ability to image and record neural activity during nervous system development. As a first step toward this goal, in this manuscript, we describe the transcriptional complexity and the distinct expression patterns of auts2 family genes in zebrafish.

MATERIALS AND METHODS
Zebrafish use and care
Zebrafish (Danio rerio) of Indian wild-type strain were purchased from local suppliers and housed in aquarium tanks at 28° with a 14:10 hr light:dark cycle. Fish were maintained according to established protocols (Westerfield 2000) in agreement with the Institutional Animal Ethics Committee and the Institutional Biosafety Committee, National Centre for Biological Sciences.
Sequence collection and gene structure annotation
auts2 paralogs were identified using the Ensembl genome browser (GRCz10, Ensembl release 86). We manually curated sequences to extract intron-exon structural information. Gene models were retrieved from the Reference Sequence (RefSeq) database (NCBI D. rerio Annotation Release 105) and were validated experimentally by sequence analysis of cDNA clones from our 5′-RACE (rapid amplification of cDNA ends) and RT-PCR experiments (see below). The putative alternative transcription start sites (TSSs) were identified using our 5′-RACE data and RNASeq gene models generated by the Wellcome Trust Sanger Institute Zebrafish Transcriptome Sequencing Project, Ref: ERP000016 (Collins et al. 2012). The annotated RNASeq gene models are incorporated into the Ensembl genome browser (www.ensembl.org/Danio_rerio/). In our analysis, we assumed that if the first nucleotide of an RNASeq transcript is annotated in an intron, it can be considered a putative TSS.
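The working assumption above (a transcript whose first nucleotide falls in an intron marks a putative TSS) can be expressed as a simple interval test. The following is a minimal sketch, not the procedure used in this study: the function names and the toy exon coordinates are hypothetical, and coordinates are assumed to be 1-based, inclusive, and already oriented along the transcribed strand.

```python
from typing import List, Optional, Tuple

# (start, end), 1-based inclusive, oriented along the transcribed strand (assumption)
Exon = Tuple[int, int]


def intron_containing(pos: int, exons: List[Exon]) -> Optional[int]:
    """Return the 1-based number of the intron that contains pos, or None."""
    ordered = sorted(exons)
    for i in range(len(ordered) - 1):
        intron_start = ordered[i][1] + 1
        intron_end = ordered[i + 1][0] - 1
        if intron_start <= pos <= intron_end:
            return i + 1
    return None


def is_putative_tss(first_nt: int, exons: List[Exon]) -> bool:
    """Apply the rule: a transcript start annotated in an intron is a putative TSS."""
    return intron_containing(first_nt, exons) is not None


# Hypothetical three-exon gene model; the coordinates are invented for illustration
toy_exons = [(100, 200), (500, 650), (900, 1000)]
```

With this toy model, a transcript starting at position 300 lies in intron 1 and would be flagged, whereas one starting at position 550 lies inside exon 2 and would not.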
Rapid amplification of 5′ cDNA ends (5′-RACE), RT-PCR, and cloning of the full-length cDNAs of auts2 paralogs
For 5′-RACE experiments, one microgram of total RNA isolated from 24 hr embryos was used as a template for synthesis of first-strand cDNA using a SMARTer RACE cDNA amplification kit (Clontech) according to the manufacturer's instructions. The 5′-RACE reactions were performed using the Advantage 2 Polymerase Mix (Clontech). Final PCR products were cloned into pCRII-TOPO vector (Invitrogen) and sequenced.
For RT-PCR analysis, total RNA was isolated from zebrafish embryos at different developmental stages using an RNeasy Mini kit (QIAGEN) and first-strand cDNAs were synthesized from 1 μg of total RNA by oligo(dT) priming using SMARTScribe Reverse Transcriptase (Clontech) according to the manufacturer's protocol. Amplification of cDNA was performed using Herculase II Fusion DNA polymerase (Agilent). Identity of amplified PCR products was verified by direct sequencing. The same batch of cDNAs was used to profile expression of auts2 paralogs during development.
Full-length cDNAs of auts2 paralogs were amplified using Q5 Hot Start High-Fidelity DNA Polymerase (New England Biolabs) and the resulting PCR products were cloned into pCR-Blunt vector (Invitrogen). Positive clones were verified by sequencing. Sequences of all primers used in this study will be provided upon request.

RNA probe synthesis
The cDNA-containing vectors were linearized with appropriate restriction enzymes (detailed maps of vectors will be provided upon request) and used as a template for RNA probe synthesis. Sense and antisense RNA probes were synthesized using MEGAscript SP6 or T7 kits (Ambion) and either digoxigenin-labeled or fluorescein-labeled rNTPs (Roche).

Double in situ hybridization
To detect two riboprobes simultaneously, we used a two-color in situ hybridization protocol combining colorimetric and tyramide signal amplification-based fluorescent detection systems. The procedure for double in situ hybridization on embryos was essentially the same as that for colorimetric WISH, with the following modifications. (1) Prior to proteinase K treatment, embryos were incubated in 3% H2O2 in PBST for 30 min to quench endogenous peroxidase activity. (2) Digoxigenin-labeled and fluorescein-labeled riboprobes were mixed and used simultaneously during hybridization. (3) We first detected the fluorescein-labeled riboprobe using anti-fluorescein antibody conjugated with horseradish peroxidase (1:500; Roche). After washes, embryos were incubated for 30 min in the dark with the tyramide-Alexa Fluor 488 (Cat. No. T20948; Molecular Probes) working solution, prepared according to the manufacturer's instructions. (4) The detection of the digoxigenin-labeled riboprobe was performed as for the colorimetric WISH using anti-digoxigenin antibody conjugated with alkaline phosphatase. For fluorescent detection of the digoxigenin-labeled riboprobe, we first quenched the peroxidase conjugated with anti-fluorescein antibody and then incubated embryos with anti-digoxigenin antibody conjugated with peroxidase (1:1000; Roche). After washes, embryos were incubated with the tyramide-Alexa Fluor 594 (Cat. No. T20950; Molecular Probes) working solution.

Microscopy and imaging
Embryos were mounted in 70% glycerol for microscopy and imaging. The brains were embedded in 1.5% agar, blocks were saturated with 30% sucrose in PBS at 4° overnight, and then cut at 12 μm using a cryostat (Leica CM1850 UV). Light and fluorescent images were acquired using a stereomicroscope (OLYMPUS SZX16) equipped with a mercury lamp and digital camera (Jenoptik ProgRes C3). The images were processed using ImageJ (NIH) and Adobe Photoshop CS3.

Data availability
All constructs, primers, and raw data are available upon request. Nucleotide sequences are deposited to GenBank under accession numbers KY492367-KY492385.

Structure of the zebrafish auts2a gene locus
We initially performed bioinformatic analysis to evaluate the structure of the zebrafish auts2a gene locus. The current RefSeq auts2a gene model (NCBI Gene ID: 368890) defines 19 exons annotated in this genomic locus (exons 1A-19 in Figure 1A) and incorporates six computationally annotated transcripts. The RefSeq transcript XM_009305336 begins with untranslated exon 1A (Figure 1B) and codes for the long protein isoform with 1278 amino acid residues (see Figure S2A). Transcript XM_009305341 begins with the mutually exclusive exon 1B located in intron 3 and spliced to exon 4 (Figure 1B). This transcript codes for a protein of 1128 amino acid residues with 16 unique residues at the N-terminal end and lacking 166 N-terminal amino acid residues present in the long isoform (see Figure S2B). The other four RefSeq transcripts represent alternatively spliced variants of the long isoform XM_009305336. Alternative splicing occurs at tandem splice acceptors with a NAGNAG motif found at 3′ acceptor splice sites of exon 3 (transcript XM_017358057), exon 8 (transcript XM_017358056), and exon 15 (transcript XM_017358058). Alternative tandem splicing of mRNA leads to the exclusion of 3 nt (nucleotides) and, as a result, deletion of a single amino acid in the protein (Figure 1D). Splicing at an alternative 5′ donor splice site of exon 9 (transcript XM_017358059) leads to the exclusion of 21 nt from the spliced mRNA and, as a result, in-frame deletion of seven amino acids in the auts2a protein (Figure 1D).
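To illustrate the tandem acceptor motif described above, the sketch below scans a sequence for NAGNAG and converts an in-frame nucleotide exclusion into the number of residues lost. It is an illustration only: the function names are hypothetical and the junction sequence is invented, not taken from the auts2a locus.

```python
import re
from typing import List

# A lookahead is used so that overlapping NAGNAG motifs are all reported
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")


def find_nagnag(seq: str) -> List[int]:
    """Return the 0-based start position of every NAGNAG motif in seq."""
    return [m.start() for m in NAGNAG.finditer(seq.upper())]


def residues_removed(nt_excluded: int) -> int:
    """Amino acids lost when nt_excluded nucleotides are skipped in frame."""
    if nt_excluded % 3 != 0:
        raise ValueError("exclusion is not a multiple of 3, so it is not in-frame")
    return nt_excluded // 3


# Hypothetical intron/exon junction: lower case = intron, upper case = exon
toy_junction = "ttttcttcagCAGGTTCCA"
```

In the toy junction the motif starts at position 7 (cagCAG), so the acceptor can be used at either AG, which differ by 3 nt. `residues_removed(3)` and `residues_removed(21)` reproduce the single-residue and seven-residue deletions quoted in the text.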
In addition to the RefSeq transcripts, numerous RNASeq transcripts generated from the Wellcome Trust Sanger Institute Zebrafish Transcriptome Sequencing Project (Collins et al. 2012) are currently incorporated into the auts2a gene locus (for details visit www.ensembl.org/Danio_rerio/). The majority of RNASeq transcripts are apparently transcribed from unique TSSs and detected at specific developmental time points (summarized in Table S2). Analysis of RNASeq data revealed two additional mutually exclusive first exons, 1D and 1E, located in introns 5 and 6 and spliced to exons 6 and 7, respectively (Figure 1A). Transcript RNASEQT00000015232, identified from an olfactory epithelium RNA library (see Table S2), begins with exon 1E and encodes a protein of 1083 amino acid residues with 32 unique residues at the N-terminus. It lacks 227 residues in the N-terminus that are present in the long protein isoform (see Figure S2B). In the zebrafish genome assembly (GRCz10, Ensembl release 86), transcript ENSDART00000078920 corresponds to RNASEQT00000015232.
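The isoform lengths quoted for the alternative first exons follow from simple bookkeeping: the length of the long isoform, minus the N-terminal residues that are lost, plus the residues unique to the alternative N-terminus. A minimal sketch of this check is given below; the helper name is hypothetical, and all numbers are those stated in the text.

```python
def isoform_length(long_len: int, n_term_missing: int, unique_n_term: int) -> int:
    """Length of an N-terminally truncated isoform carrying a unique N-terminus."""
    return long_len - n_term_missing + unique_n_term


LONG_ISOFORM = 1278  # residues encoded by XM_009305336, as stated in the text

# XM_009305341 (exon 1B): lacks 166 residues, gains 16 unique residues
exon_1b_isoform = isoform_length(LONG_ISOFORM, 166, 16)

# RNASEQT00000015232 (exon 1E): lacks 227 residues, gains 32 unique residues
exon_1e_isoform = isoform_length(LONG_ISOFORM, 227, 32)
```

Both values agree with the annotated lengths of 1128 and 1083 residues, respectively.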
From in silico data analysis, we could define the presence of multiple putative TSSs that are apparently used to generate alternative auts2a mRNAs. Assuming that different TSSs are associated with corresponding putative alternative promoters (PAPs), we could predict, in addition to the major 5′ promoter associated with TSS1, the existence of multiple PAPs in the auts2a gene locus (Figure 1A).

Identification of novel transcripts in the zebrafish auts2a gene locus
To clarify in silico data and to identify novel mRNAs transcribed from the auts2a gene locus, we performed 5′-RACE and RT-PCR analyses followed by isolation of full-length cDNAs encoding zebrafish auts2a mRNAs.
For 5′-RACE, we designed gene-specific primers in the 5′-part of the gene (in exons 4 and 6). Total RNA isolated from 24 hr embryos was used as a template in 5′-RACE experiments. We amplified 11 5′-RACE products. The majority of the 5′-ends were mapped in close proximity to the annotated positions of the first nucleotides of RNASeq or RefSeq transcripts (see Figure S3 for details). Sequence analysis of 5′-RACE products confirmed previously annotated alternative first exons 1A, 1B, and 1D, and identified a novel mutually exclusive first exon 1C located in intron 4, placed 51,016 nt downstream from exon 1B (Figure 1A). Exon 1C is spliced to exon 4, similar to exon 1B, and no overlapping RefSeq or RNASeq annotations were found for this exon. When we used a reverse primer designed in the 3′-UTR of exon 19, we could not amplify cDNA beginning with exon 1C. However, the transcript corresponding to the 5′-RACE product could be detected by RT-PCR during analysis of auts2a expression through zebrafish development (TSS5 in Figure 1C), suggesting that RNA transcribed from TSS5 has an alternative 3′-end.
We noticed a few interesting features when we analyzed the 5′-RACE results. First, multiple 5′-ends of different lengths were mapped into exons 1A (five 5′-ends) and 1D (four 5′-ends) (see Figure S3, A and E). The mapping was in close proximity to the annotated positions of the first nucleotides of RNASeq and/or RefSeq transcripts. Remarkably, all TSSs in exon 1A (annotated and experimentally determined) were clustered within 58 nt, while TSSs in exon 1D were spread over 300 nt. From genome-wide analysis of mammalian promoters, it was found that sharp starting sites are generally associated with promoters having TATA-boxes, while promoters associated with CpG islands do not show an accurate TSS, but instead a broad distribution of TSSs generally spread over 100 nt (Carninci et al. 2006; Gustincich et al. 2006). We found a TATA-box motif in the promoter associated with TSS1 (see Figure S3A), but not in the promoter associated with TSS6. Except for the brain, where TSSs are surprisingly enriched in CpG islands, TATA-box-promoted transcripts tend to be tissue-specific (Gustincich et al. 2006). Despite this fact, the presence of multiple TSSs mapped to a single exon could also be accounted for by random transcription initiation events from the same promoter; even a small extension in the 5′-end sequence may also play an important role, for example, by harboring upstream ATGs (and upstream ORFs) that throttle the translation from the downstream authentic ATG.
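The contrast between the narrow TSS cluster in exon 1A and the dispersed one in exon 1D can be made explicit by measuring the spread of mapped 5′-ends. The sketch below is only an illustration: the function names are hypothetical, the 100-nt cutoff is an assumption loosely based on the "spread over 100 nt" description of broad promoters, and the coordinates are invented solely to reproduce the 58-nt and 300-nt spreads mentioned in the text.

```python
from typing import Sequence


def tss_spread(positions: Sequence[int]) -> int:
    """Distance in nt between the most upstream and most downstream TSS."""
    if not positions:
        raise ValueError("need at least one TSS position")
    return max(positions) - min(positions)


def promoter_shape(positions: Sequence[int], broad_cutoff: int = 100) -> str:
    """Label a TSS cluster 'sharp' or 'broad'; the cutoff is an assumption."""
    return "broad" if tss_spread(positions) > broad_cutoff else "sharp"


# Hypothetical coordinates, chosen only to reproduce the spreads quoted in the text
exon_1a_like = [1000, 1012, 1030, 1047, 1058]  # spread of 58 nt
exon_1d_like = [5000, 5090, 5210, 5300]        # spread of 300 nt
```

Under these assumptions the exon 1A-like cluster is labeled sharp and the exon 1D-like cluster broad, in line with the presence of a TATA-box only in the promoter associated with TSS1.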
Second, in one case we mapped a 5′-end into the coding region of exon 2 (TSS3 in Figure 1A), 603 bp downstream from the first nucleotide of transcript RNASEQT00000008207 annotated in intron 1 (this transcript begins with alternative exon 2L, see Figure S3B). It is generally assumed that the range over which the TSSs are scattered is on average 62 bp (Suzuki et al. 2001) and that two independent TSS clusters, associated with distinct PAPs, are separated by intervals of >500 bp (Kimura et al. 2006). Since TSS3 was mapped at a distance of >500 bp from the first nucleotide of the RNASeq transcript, it is likely that TSS3 is associated with a different PAP, perhaps overlapping with that of TSS2. Although the location of a TSS within an internal exon could be accounted for by truncated cDNAs, it could also be a genuine "internal" putative promoter with unique features different from canonical promoters, since the corresponding genomic regions serve as both exon and promoter. Interestingly, genes having TATA-box promoters are also preferentially associated with the presence of unusual transcripts originating from exons (Carninci et al. 2006).
Figure 1 (legend, partial). [...] Table S2. Exons 2L, 6L, 7L, 8L, 16L, and 17L are 5′ extensions of the corresponding constitutive exons. (C) RT-PCR analysis of auts2a isoform expression during zebrafish development. Primers amplifying the full-length transcripts were used for analysis. Letters a, b, and c stand for the respective isoforms. (D) Partial DNA sequence of coding exons undergoing alternative splicing. Positions of alternative 5′ donor (exon 9) or 3′ acceptor (exons 3, 8, and 15) splice sites are highlighted in orange. Constitutive splice sites are underlined. Splicing at these sites leads to in-frame deletion of either single or seven amino acids (red) in the Auts2a protein. Exonic sequence is shown in upper case. Alternatively spliced isoforms identified in this study correspond to RefSeq transcripts (GenBank accession numbers are shown). ID, identifier; PAP, putative alternative promoter; RACE, rapid amplification of cDNA ends; RNASeq, RNA sequencing; RT-PCR, reverse transcription-polymerase chain reaction; TSS, transcription start site.
To amplify full-length cDNAs that begin with alternative first exons, we designed a set of exon-specific forward primers that bind in close proximity to the annotated TSSs (see Figure S3 for positions of individual primers). Since reverse primers were bound to the 3′-UTR of exon 19, we could only amplify the population of cDNAs with complete ORFs that differ in the 5′-end but are similar in the 3′-end. With such an approach, the presence of alternative 3′-ends was not explored. The same set of primers was used for RT-PCR analysis of auts2a expression during zebrafish development.
We cloned four variants of the long isoform auts2a-i1, all beginning with exon 1A and corresponding to the RefSeq transcripts: (1) auts2a-i1a (XM_009305336), (2) auts2a-i1b (XM_017358057), and (3) auts2a-i1c (XM_017358056). In the fourth transcript variant auts2a-i1d, noncoding exon 1A was spliced to exon 2 by utilizing a cryptic 5′ donor splice site located in the intron, 132 nt downstream from the constitutive splice site (see Figure S3A). This leads to the formation of a transcript with different 5′-UTR length but does not affect the coding region of auts2a. Transcript auts2a-i1d also has an alternatively spliced exon 8, similar to isoform auts2a-i1c. We did not clone transcript variants corresponding to XM_017358058 (alternative exon 15) and XM_017358059 (alternative exon 9). However, direct sequencing of PCR products revealed coamplification of transcripts with constitutively and alternatively spliced exons 3, 8, and 15. RT-PCR analysis showed that the auts2a-i1 isoform is maternally supplied and is also present from 9 to 72 hr (the last examined time point; TSS1 in Figure 1C).
Transcript isoform auts2a-i3 begins with untranslated exon 1D and contains alternatively spliced exons 8 and 15 (Figure 1B). Direct sequencing of PCR products revealed coamplification of transcripts with constitutively and alternatively spliced exons 8 and 15, similar to the long isoform auts2a-i1. Exon 8 in isoform auts2a-i3 consists of 181 nt of the 5′-UTR and 94 nt of CDS, and splicing at an alternative 3′ acceptor splice site in this isoform does not affect protein sequence as it does in isoform auts2a-i2 (deletion of serine). The ORF of this transcript is identical to that of the long isoform except that it lacks 467 amino acid residues in the N-terminus (Figure S2A). Similar to isoform auts2a-i1, auts2a-i3 is maternally supplied and then is detected from 6 to 72 hr during zebrafish development (TSS6 in Figure 1C).
Two novel transcript isoforms, auts2a-i4a and auts2a-i4b beginning with exon 1E, were isolated during RT-PCR analysis. In contrast to the annotated transcript RNASEQT00000015232, exon 1E in isoform auts2a-i4a has 85 nt of 3′ extension due to the usage of an alternative 5′ donor splice site (see Figure S3G). Isoform auts2a-i4a encodes a protein of 1082 amino acid residues with a unique stretch of nine residues at the N-terminal end vs. 32 unique residues in the protein encoded by RNASEQT00000015232 (see Figure S2B). In isoform auts2a-i4b, the first exon retains 1786 nt of intronic sequence and extends through exon 7. Despite this intron retention, the ORF of auts2a-i4b is identical to that of the auts2a-i3 isoform. However, the sequence of the 5′-UTR (2880 nt long) in this isoform differs from that in isoform auts2a-i3 (5′-UTR of 1684 nt long). During zebrafish development, isoform auts2a-i4a was detected exclusively at 9 hr, while auts2a-i4b could be detected at 12, 16, and 48 hr (TSS8 in Figure 1C). We could not isolate cDNA with the structure of exon 1E corresponding to RNASEQT00000015232. This RNASeq transcript was found in a sample prepared from the olfactory epithelium of adult fish (see Table S2), which may explain why we could not amplify it during our RT-PCR analysis.
RNASeq data show two transcripts beginning with exon 8L (5′ extension of exon 8) having alternative 3′-ends and encoding small proteins of 219 and 196 amino acid residues with unique N- and C-terminal ends (see Figure 1B and Table S2). During RT-PCR analysis, we could isolate three novel isoforms beginning with exon 8L that share their 3′-ends with the long isoform auts2a-i1. Isoform auts2a-i5a encodes a protein of 888 amino acid residues with 15 unique residues at the N-terminal end (Figure 1B and Figure S2B). Two other isoforms represent alternatively spliced transcripts with retention of intronic sequences that lead to premature stop codons. Isoform auts2a-i5b retains 38 nt from intron 15, while isoform auts2a-i5c retains 745 nt from introns 12 and 13, including exon 13. During zebrafish development, isoform auts2a-i5a was detected at 12, 16, 48, and 72 hr; isoform auts2a-i5b at 9 and 12 hr; and isoform auts2a-i5c at 24 hr only (TSS10 in Figure 1C).

Expression of auts2a mRNA during zebrafish development
We conducted WISH to examine how the spatial expression patterns of auts2a transcripts vary during development. Two isoforms, auts2a-i1 and auts2a-i3, were expressed through all developmental stages according to RT-PCR analysis (Figure 1C). Since an antisense RNA probe detecting the full-length auts2a-i1 isoform showed ubiquitous staining (data not shown), we used an antisense probe detecting the full-length auts2a-i3 isoform for analysis, which showed more restricted staining. For probe details see Materials and Methods.
Weak expression of auts2a-i3 is first detected at the 1-somite stage (10.3 hr) in a medial part of the presumptive forebrain and more intensely in the presumptive hindbrain. In the hindbrain region, auts2a-i3 expression extends into the mesoderm laterally to the neural plate. It is also expressed in the somite (Figure 2, A and A'). At the 3-somite stage (11 hr), auts2a-i3 continues to be weakly expressed in the presumptive forebrain and more intensely in the presumptive midbrain and hindbrain. It is also expressed outside of the neural plate, in two longitudinal stripes lateral to the hindbrain, presumably the primordial cranial ganglia, and in somites (Figure 2, B and B'). During neural keel formation (12-14 hr), weak expression of auts2a-i3 continues in the forebrain and more intensely in the midbrain and hindbrain, with stronger expression in presumed rhombomere 4 (Figure 2, C-C" and D-D"). To demarcate auts2a expression in the brain at this developmental stage, we performed double in situ hybridization using an egr2a (krox20) probe to mark rhombomeres 3 and 5 (Figure 2E'), and a wnt1 probe to mark the midbrain-hindbrain boundary (Figure 2, F' and G'). Indeed, auts2a-i3 was expressed in the telencephalon with more intense expression in the dorsal part, and ventrally in the rostral diencephalon (Figure 2G), posterior midbrain, rhombomeres 1 and 4, and the spinal cord (Figure 2, E, F, and G). At 18 hr, auts2a-i3 is expressed in the telencephalon, diencephalon, midbrain, hindbrain, and spinal cord. It is also expressed in somites (Figure 2, H and H'). At 24 hr, auts2a-i3 is expressed in the brain with stronger expression in the tectum. It is also expressed in the spinal cord, somites, and pectoral fin buds (Figure 2, I and I').
We also performed in situ hybridization analysis of auts2a-i3 expression in the juvenile zebrafish brain (Figure 2, J-Q). In the forebrain, auts2a-i3 expression is detected in the ventral telencephalic area with strong expression in the ventral nucleus (Vv) and anterior part of the parvocellular preoptic nucleus (PPa), as well as in the dorsal telencephalic area with strong expression in posterior (Dp) and medial parts (Dm) (Figure 2, K and L). Expression of auts2a-i3 is also detected in the mammillary body (CM) (Figure 2P), a part of the hypothalamus that is important for recollective memory in rodents (Vann 2010), and in the periventricular zone of the hypothalamus (Hv) (Figure 2M). In the midbrain, auts2a-i3 expression is primarily detected in layer three of the periventricular gray zone of the optic tectum (PGZ) (Figure 2, M-O). In the cerebellum, auts2a-i3 is expressed in the Purkinje cell layer (PCL) (Figure 2P); in the hindbrain, expression is detected dorsally in facial (LVII) and glossopharyngeal (LIX) lobes, the caudal octavolateralis nucleus (CON), around the rhombencephalic ventricle (RV); and ventrally, in the inferior reticular formation (IRF) and inferior olive (IO) (Figure 2Q).
Since multiple transcript isoforms were identified in the auts2a gene locus, we wanted to determine if these isoforms are differentially expressed during development. To evaluate isoform-specific expression of auts2a transcripts, we designed riboprobes that recognize alternative first exons: mutually exclusive exons (see Figure S4A). For in situ hybridization, we chose 24 hr embryos as this was the earliest stage at which three of the auts2a isoforms were expressed (Figure 1C). We could detect reliable expression signals for auts2a transcripts generated from TSS1 and TSS6, but not for that from TSS4 (Figure S4B), probably because of the short length of the probe (204 bp). The expression pattern of auts2a isoforms generated from TSS1 and TSS6 was quite similar in the brain, with stronger expression of the TSS6 isoform in the telencephalon and midbrain (tectum) in comparison to the TSS1 isoform (Figure S4B).

Duplication of the auts2 gene in teleost genomes
During evolution, teleost genomes experienced an additional round of whole-genome duplication (Christoffels et al. 2004; Hoegg et al. 2004). As a result, the zebrafish genome contains two auts2 genes, auts2a on chromosome 10 and auts2b on chromosome 15 (Figure 3C). The current RefSeq auts2b gene model (NCBI Gene ID: 100150849) defines 14 exons present in transcript XM_001921276 (Figure 3A). In addition, transcript XM_017351168 represents an alternatively spliced variant with a mutually exclusive first exon 1B, which is located in intron 2 and spliced to exon 3 (Figure 3A). Analysis of RNASeq data allowed the prediction of an additional three alternative first exons representing 5′ extensions of constitutive exons: exon 2L (TSS2), exon 6L (TSS4), and exon 11L (TSS5) (summarized in Figure 3A and Table S3).
We performed 5′-RACE with gene-specific primers designed in exons 3 and 4. We isolated a single 5′-RACE product and mapped it in exon 1, 588 nt downstream from the first nucleotide of RNASeq transcript RNASEQT00000017723 (see Figure S5), which is not annotated in the current Ensembl release, but was included in the previous genome build (Zv9, Ensembl release 75). With a forward primer located 217 nt downstream from the first nucleotide of RNASEQT00000017723, we isolated full-length cDNA structurally corresponding to XM_001921276 and ENSDART00000161379. During the sequence analysis of clones, we also identified a novel transcript with intron retention (106 bp of intron 5) leading to a premature stop codon.
Based on structure comparison between auts2a and auts2b genes, we assumed that, after whole-genome duplication, the 5′ genomic region of the ancestral gene corresponding to exons 1A-6 of auts2a was subsequently lost from the auts2b gene locus. Surprisingly, in silico analysis revealed the absence of the full-length auts2b gene in other teleosts whose genomes were sequenced (amazon molly, cod, fugu, medaka, platyfish, stickleback, tetraodon, and tilapia) with one exception. The genome of cave fish (Astyanax mexicanus) contains the auts2b gene but lacks the auts2a gene. It is not clear whether the auts2a gene is absent from the cave fish genome or is simply not annotated in the current version of the cave fish whole-genome assembly due to incomplete sequencing. Protein sequence comparison and analysis of synteny strongly support the identification of the cave fish auts2 gene as an auts2b ortholog. We analyzed genomic regions in other teleost genomes that are syntenic to the auts2b locus in the zebrafish genome. With the exception of the fugu genome, the other loci possess a predicted protein coding gene in the position of the auts2b gene. Protein sequence and gene structure analysis revealed a single conserved exon. This exon corresponds to exon 9 in the auts2b gene (Figure 3B). Exon 9 encodes 24 highly conserved amino acids within the Auts2 family domain (a single amino acid substitution in the Auts2a protein among teleosts and 100% identity in the Auts2b protein between zebrafish and cave fish) (Figure 3D). The rest of the amino acid sequence is highly diverse with low sequence similarity between analyzed proteins. At least in some teleosts this "gene" is transcribed, as supported by RNASeq data from amazon molly, platyfish, and tilapia.
Expression of the auts2b gene during zebrafish development
RT-PCR analysis shows that auts2b is a zygotic gene in contrast to maternally supplied auts2a (Figure 4A). At 90% of epiboly (9 hr), auts2b is expressed in the rostral neural plate, the prospective forebrain territory (Figure 4, B and B'). At the 1-somite stage (10.3 hr), auts2b is expressed in the presumptive forebrain, presumptive hindbrain, and spinal cord (Figure 4, C and C'). At the 3-somite stage (11 hr), auts2b is strongly expressed in the forebrain, hindbrain, and faintly in the midbrain (Figure 4, D and D'). During neural keel formation (12-14 hr), auts2b is expressed in the telencephalon, ventral diencephalon, hindbrain (more strongly in rhombomere 2 and less in rhombomeres 1 and 4), spinal cord, and somites (Figure 4, E-F"). The identity of rhombomeres was confirmed by using second probes, egr2a (krox20) to mark rhombomeres 3 and 5 (Figure 4 ...).
We also performed in situ hybridization analysis of auts2b expression in the juvenile zebrafish brain (Figure 5). In the forebrain, expression of auts2b was detected in Dp and Dm zones of the dorsal telencephalic area, the anterior part of the PPa, the ventral entopeduncular nucleus (vENT), the lateral nucleus of ventral telencephalic area (Vl) (Figure 5, A and B), and the ventral zone of the periventricular hypothalamus (Hv) (Figure 5C). In the midbrain, auts2b expression was not detected (Figure 5, C and D). In the cerebellum, auts2b expression was observed in the granule cell layer of the corpus cerebelli (CCe) and eminentia granularis (EG) (Figure 5E). More caudally, auts2b is expressed in the caudal lobe of the cerebellum (LCa), a granular layer of EG (Figure 5F). In the hindbrain, expression of auts2b is detected in LVII and vagal (LX) lobes (Figure 5G).
Identification of the 5′-ends of the zebrafish fbrsl1 gene and isolation of novel fbrsl1 transcripts
In contrast to the auts2a gene, the RefSeq fbrsl1 gene model (NCBI Gene ID: 557358) lacks a completely annotated 5′-end. Nevertheless, similar to the auts2a gene locus, several RNASeq transcripts coding for short polypeptides or representing noncoding RNAs are annotated in the fbrsl1 gene locus (summarized in Figure 6A, for more details see Table S4 and the Ensembl browser). To define the 5′-end of the fbrsl1 gene, we performed 5′-RACE analysis with gene-specific primers designed to exons 4, 5, and 7. We isolated 11 different 5′-RACE products and annotated two novel exons associated with them (see Figure S6, A-C).
Figure 4 (legend, partial). [...], and sim1a (J and J') as second probes. de, diencephalon; dth, dorsal thalamus; fb, forebrain; hb, hindbrain; mb, midbrain; mhb, midbrain-hindbrain boundary; r, rhombomere; rp, roof plate; RT-PCR, reverse transcription-polymerase chain reaction; s, somites; te, telencephalon; vde, ventral diencephalon; vth, ventral thalamus.
From 5′-RACE experiments, six 5′-ends were mapped to the untranslated region of exon 1A and one 5′-end to the translated region of exon 1A, 214 nt downstream from the ATG (see Figure S6A). The mapped TSSs clustered in two groups (five 5′-ends in one group and two 5′-ends in the other) separated by 593 bp. Because independent TSS clusters associated with distinct PAPs are separated by >500-bp intervals (Kimura et al. 2006), we predict that these two groups of TSSs are linked to distinct, likely overlapping, PAPs and can be recognized as separate TSSs: TSS1 and TSS2 (Figure 6B).
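The grouping criterion used above can be sketched in a few lines of Python. This is only an illustration of gap-based clustering, not the procedure used in this study: the coordinates are hypothetical (chosen so that five ends and two ends are separated by 593 bp), and the function name `cluster_tss` and the `min_gap` parameter are assumptions introduced for the example.

```python
def cluster_tss(positions, min_gap=500):
    """Group mapped 5'-end coordinates into TSS clusters.

    A new cluster is opened whenever the distance to the previous
    5'-end exceeds min_gap (bp); otherwise the end joins the
    current cluster.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= min_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Hypothetical coordinates (not the real genomic positions):
# five ends close together, then two ends 593 bp further downstream.
five_prime_ends = [100, 112, 130, 141, 155, 748, 790]
clusters = cluster_tss(five_prime_ends)
cluster_sizes = [len(c) for c in clusters]
```

With these illustrative inputs the seven ends fall into two clusters, which would correspond to two separately named TSSs.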
Using a forward primer designed in close proximity to the most distant 5′-end (see Figure S6A), we could isolate two transcripts, both beginning with exon 1A and including 19 exons. The difference between them is that the two isoforms, fbrsl1-i1a and fbrsl1-i1b, contain mutually exclusive exons 9a and 9b, respectively (Figure 6B). Transcripts fbrsl1-i1a and fbrsl1-i1b encode protein isoforms of 1215 and 1249 amino acid residues, respectively (see Figure S6). The RefSeq transcript XM_009301264, although lacking exon 1A, apparently represents an alternative splice variant of fbrsl1-i1b with exon 9b being spliced at an alternative 5′ donor splice site, leading to the exclusion of 21 nt from the spliced mRNA and, hence, an in-frame deletion of seven amino acids in the protein (Figure 6D). Transcript XM_009301264 also has an alternative exon 16, which is spliced at an alternative 5′ donor splice site, leading to the exclusion of 6 nt from the mRNA and therefore an in-frame deletion of two amino acids (glycine and lysine) and substitution of tyrosine with aspartic acid (Figure 6D and Figure S6). Notably, splicing at the alternative 5′ donor splice site of exon 9b is exactly the same as that in exon 9 of the XM_017358059 transcript from the auts2a gene locus (with deletion of identical amino acids).
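A short Python sketch may help show why a 6-nt exclusion can both delete two residues and change a third. If the excluded nucleotides do not start on a codon boundary, a new junction codon is assembled from the flanking nucleotides. The coding sequence below is a toy example constructed for illustration, not the fbrsl1 sequence, and the codon table covers only the codons it uses.

```python
# Minimal codon table covering only the codons in this toy example.
CODONS = {
    "ATG": "M", "GCT": "A", "GGT": "G", "AAG": "K",
    "TAT": "Y", "GAT": "D", "CTG": "L", "TAA": "*",
}


def translate(cds):
    """Translate a coding sequence codon by codon (incomplete tail ignored)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODONS[cds[i:i + 3]] for i in range(0, usable, 3))


def exclude(cds, start, length):
    """Return cds with `length` nucleotides removed from 0-based index `start`."""
    return cds[:start] + cds[start + length:]


# Toy long form: ATG GCT GGT AAG TAT CTG TAA -> M A G K Y L *
long_cds = "ATGGCTGGTAAGTATCTGTAA"
# Exclude 6 nt beginning at the second position of the glycine codon.
short_cds = exclude(long_cds, 7, 6)

long_protein = translate(long_cds)
short_protein = translate(short_cds)
frame_preserved = (len(long_cds) - len(short_cds)) % 3 == 0
```

In this toy case the reading frame is preserved because 6 is a multiple of 3, glycine and lysine are lost, and the junction codon GAT (aspartic acid) replaces the original TAT (tyrosine). The same divisibility check applies to the 21-nt exclusion, which removes seven residues.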
The second novel exon, noncoding exon 1B identified in intron 5, is spliced to exon 6 (Figure 6A). We mapped three 5′-ends of different lengths to this exon (see Figure S6C). In a previous zebrafish genome build (Zv9, Ensembl release 75), RNASeq transcript RNASEQT00000006457 was annotated with exon 1B and its first nucleotide was 145 nt upstream from the TSS mapped in this study (see Figure S6C). In the current zebrafish genome assembly (GRCz10) this transcript is not included. However, we could isolate cDNA fbrsl1-i2b corresponding to RNASEQT00000006457 (Figure 6B). Transcript fbrsl1-i2b includes mutually exclusive exon 9b and codes for a short protein isoform. The ORF of this short transcript is identical to that of the long isoform except that it lacks 405 N-terminal amino acids present in the long isoform, and is predicted to encode a polypeptide of 842 amino acid residues (see Figure S7). Ensembl transcript ENSDART00000081064 and RNASeq transcript RNASEQT00000054612, which lack the annotated exon 1B, apparently represent alternative splice isoforms of the fbrsl1-i2a transcript with mutually exclusive exon 9a and encode a protein of 810 amino acid residues (see Figure S7).
With an approach similar to that used for the auts2a gene locus, we used RNASeq data to define 10 additional alternative first exons representing 5′ extensions of annotated exons: exon 2L (5′ extension of exon 2), exon 6L, exon 7L, exon 8L, exon 10L, exon 11L, exon 15L, exon 16L, exon 17L, and exon 18L (Figure 6, A and B and Figure S6). Several RNASeq transcripts have their first nucleotides annotated inside mutually exclusive exons 9a and 9b, so that the transcripts begin with ATG (see Table S4 for details). We did not consider such candidates as putative TSSs because they could be accounted for by truncated cDNAs. However, we cannot rule out that they are genuine TSSs. In our 5′-RACE experiments, a single 5′-end was mapped inside exon 2, 66 nt downstream from the first nucleotide of transcript RNASEQT00000061379 (TSS3 in Figure 6A; see also Figure S6B for details). In this case, it is likely that the 5′-RACE product represents a truncated form of the transcript transcribed from TSS3.
Abbreviations used to label areas in brain sections can be found in Table S1.
RT-PCR analysis of fbrsl1 expression during zebrafish development revealed that both isoforms are maternally supplied (Figure 6C). Isoforms fbrsl1-i1 (TSS1) and fbrsl1-i2 (TSS4) are detected again from 9 and 6 hr, respectively (weak signal at 9 hr for TSS1, Figure 6C). Direct sequencing of PCR products revealed coamplification of transcripts with and without exons 4 or 17 (skipped exons), and coamplification of transcripts with mutually exclusive exons 9a and 9b.
Expression of the fbrsl1 gene during zebrafish development
We conducted WISH to examine the spatial expression of the fbrsl1 gene during zebrafish development. Since we identified two populations of fbrsl1 isoforms, fbrsl1-i1 and fbrsl1-i2, that are generated from TSS1 and TSS4, respectively, we asked if these isoforms are expressed differentially during development. We used an approach similar to that used for the auts2a isoforms. We designed two isoform-specific riboprobes that recognize alternative mutually exclusive exons 1A and 1B in fbrsl1 transcripts (Figure S8A). Analysis of in situ hybridization data showed that, although expression was quite similar between the two probes, the isoform transcribed from TSS4 was expressed in the hypothalamus and ventral diencephalon, which was not detected for the TSS1 probe, and more strongly in the telencephalon (Figure S8B). For further analysis of fbrsl1 expression during development, we used a probe designed against full-length isoform fbrsl1-i2b containing exon 9b (for probe details see Materials and Methods).
We also performed in situ hybridization analysis of fbrsl1-i2b expression in the juvenile zebrafish brain (Figure 8). In the forebrain, expression of fbrsl1-i2b is detected in the dorsal telencephalic area (D), the ventral nucleus of the ventral telencephalic area (Vv), the ventral (Hv), caudal (Hc), and dorsal (Hd) zones of the periventricular hypothalamus, the mammillary body (CM), and the posterior part of the parvocellular preoptic nucleus (PPp) (Figure 8, A-D). In the midbrain, fbrsl1-i2b is strongly expressed in the PGZ and also detected in the medial preglomerular nucleus (PGm) and dorsomedial optic tract (DOT) (Figure 8, B-D). In the cerebellum, fbrsl1-i2b is expressed in the PCL, similar to auts2a, and in the LCa (Figure 8E). In the hindbrain, fbrsl1-i2b is expressed in the LVII and CON (Figure 8F).
In silico identification of transcripts in the fbrs gene locus and expression of fbrs during development
The current RefSeq fbrs gene model (NCBI Gene ID: 100535921) defines 19 exons present in transcript XM_003199613 (Figure 9A). Three RefSeq predicted transcripts, XM_005156304, XM_005156306, and XM_017358534, represent alternatively spliced mRNAs that are transcribed from the same TSS1 (Figure 9B). Alternative splicing occurs at the 3′ acceptor splice site of exon 12 (XM_005156304), which leads to the exclusion of 15 nt from the spliced mRNA and results in an in-frame deletion of five amino acids in the protein (Figure 9D). Skipped exons are found in transcripts XM_005156306 (exons 16 and 17 are spliced out together), XM_017358534 (skipped exon 16), and in Ensembl transcript ENSDART00000153054 (skipped exon 5). Transcript ENSDART00000153054 also has an alternatively spliced exon 12 similar to that in XM_005156304.
We performed 5′-RACE analysis with gene-specific primers designed against exon 2. We could identify only a single TSS, which was mapped 55 nt upstream from the first nucleotide of RefSeq transcripts (see Figure S9). Using a forward primer designed in close proximity to TSS1, we isolated cDNA corresponding to XM_017358534. From RNASeq data, two additional alternative first exons, representing 5′ extensions of exon 2 (2L, TSS2) and exon 18 (18L, TSS3), were predicted (Figure 9B and Figure S9). Analysis of RNASeq data also revealed transcription from TSS1 that generates isoforms with alternative 3′-ends (see Table S5).
RT-PCR and in situ analysis revealed that fbrs mRNA is maternally supplied and ubiquitously expressed in neural tissues during development (Figure 9, C and E-J). It is also expressed in Kupffer's vesicle (Figure 9, F, G, and G") and later in the eyes and somites (Figure 9, H-J). In the juvenile brain, expression of fbrs is detected in the Hv, Hc, and Hd zones of the periventricular hypothalamus, the PGZ, and the granular layer of the CCe (Figure 9, K-O).

DISCUSSION
Eukaryotes employ a range of mechanisms to generate multiple mRNA isoforms. These mechanisms include usage of alternative TSSs, alternative splice sites, and alternative polyadenylation sites. Here, we show that, in zebrafish, the auts2 gene family has four paralogs: auts2a, auts2b, fbrsl1, and fbrs. All four paralogs exhibit multiple TSSs and alternative splicing, with auts2a and fbrsl1 giving rise to more mRNA diversity than auts2b or fbrs. However, whether all mRNA isoforms are translated into proteins cannot be easily determined, and an interesting question is whether there is a biological function underlying such complexity. Our study was limited to the usage of 5′-RACE analysis followed by RT-PCR and cloning. We have not examined the presence of alternative polyadenylation sites, although available RNASeq data support the presence of RNAs with alternative 3′-ends, with some of them encoding short polypeptides or representing ncRNAs.
Transcriptional complexity in the auts2a gene locus
Among auts2 paralogs, the highest level of transcriptional complexity was found in auts2a. Here, the complexity is achieved mainly through the usage of alternative promoters rather than through alternative splicing. In this case, transcription from alternative promoters generates auts2a mRNAs that differ in their 5′-UTRs and encode N-terminally truncated protein isoforms, while the C-terminal portion of the Auts2a protein remains the same. Differences in 5′-UTRs may have an impact on translation efficiency, since cap-dependent ribosomal scanning is severely hampered in 5′-UTRs containing upstream AUGs (uAUGs), uORFs, and secondary structures (Araujo et al. 2012).
The N-terminally truncated protein isoforms may differ in intracellular localization and/or trafficking. In the human AUTS2 protein, several regions were predicted that may be functionally important: the putative NLS in the N-terminal part, two proline-rich regions (PR1 and PR2), and the PY motif in the middle of the protein (Sultana et al. 2002; see Figure S2 for zebrafish Auts2a protein annotation). The relevance of the predicted NLS is not clear, since an AUTS2 protein isoform lacking the NLS was localized exclusively in the nucleus (Hori et al. 2014). In our experiments, we also observed nuclear localization after transfection of the HEK293T cell line with either the N-terminal end of Auts2a or an Auts2a-i3 isoform that lacks 467 N-terminal amino acids that are present in the long protein isoform (data not shown). It is therefore possible that the Auts2 protein contains another NLS, different from the one predicted in the N-terminal part. Alternatively, the nuclear localization of the Auts2 protein might result from its interaction with another nuclear protein.
The PR1 region was shown to be important for the regulation of actin remodelling, and for neuronal migration and neuritogenesis in particular (Hori et al. 2014). The C-terminal portion of AUTS2, comprising the Auts2 family domain and His repeats, is important for mediating transcriptional activation (Gao et al. 2014), and deletion of this part in humans causes severe phenotypes (Beunders et al. 2015). His repeats have been shown to be associated with protein localization at nuclear speckles (Salichs et al. 2009). The zebrafish Auts2a protein does not contain His repeats at the C-terminal end.
Splicing of auts2a pre-mRNA at tandem splice acceptors with a NAGNAG motif leads to very minor changes in protein sequence (deletion of a single amino acid). It has been proposed that the choice between the intron-proximal and intron-distal AG is made by a competition mechanism (Hiller et al. 2004). In many cases, the selection of such AGs can be highly regulated under specific spatiotemporal conditions or external stimuli. Tandem splicing of mRNAs can lead to the production of functionally different proteins, as shown for the SCN5A (Makielski et al. 2003) and pou5f3 (formerly pou2) genes (Takeda et al. 1994). In the first example, cells expressing the variant of the voltage-dependent sodium channel α-subunit protein SCN5A that contains Q1077 showed a reduced inward sodium current in comparison to cells expressing SCN5A variants lacking Q1077 (Makielski et al. 2003). In the second example, the usage of either the distal or proximal 3′ splice acceptor site leads to the generation of two proteins that possess distinct functions; one is a transcription factor, while the other is a non-DNA-binding protein (Takeda et al. 1994). In the case of auts2a, alternative splicing at these particular exons is evolutionarily conserved; the positions and identity of deleted amino acids are conserved among either all vertebrates (exons 3 and 8) or only among ray-finned fish, sharks, and coelacanth (exon 15), supporting the functional significance of such minor changes in protein sequence. Splicing at an alternative 5′ donor site of exon 9, leading to an in-frame deletion of seven amino acids, is highly conserved among jawed vertebrates. Moreover, splicing at this exon is also conserved between the auts2a and fbrsl1 genes: identical amino acids that are present in the full-length Fbrsl1 protein are also removed during alternative splicing of the fbrsl1 pre-mRNA (Figure 1D and Figure 6D).
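As a rough illustration of the tandem-acceptor case, the Python sketch below scans a sequence for NAGNAG motifs. At such a motif the two candidate acceptor AGs lie 3 nt apart, so choosing one or the other shifts the exon boundary by 3 nt and changes the protein by one residue without disturbing the reading frame. The function name `find_nagnag` and the example sequence are hypothetical; real annotation would also need to confirm that the motif sits at an intron-exon boundary.

```python
import re

# N-A-G-N-A-G, where N is any of A/C/G/T. The lookahead lets
# overlapping motifs (e.g. CAGCAGCAG) be reported individually.
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")


def find_nagnag(seq):
    """Return 0-based start positions of NAGNAG motifs in seq."""
    return [m.start() for m in NAGNAG.finditer(seq.upper())]


# Hypothetical intron end followed by exon start.
junction = "TTTCAGCAGGT"
hits = find_nagnag(junction)

# The two candidate acceptor AGs of a motif end 3 nt apart.
acceptor_shift = 3
```

Because the shift equals exactly one codon length, either acceptor choice leaves the downstream reading frame unchanged.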
Transcriptional complexity in the fbrsl1 gene locus
Transcriptional complexity in the fbrsl1 gene is achieved through the usage of both alternative promoters and alternative splicing. Three modes of alternative splicing were found in the fbrsl1 gene: exon skipping (exon 17), splicing at alternative 5′ donor splice sites of exons 9b and 16, and mutually exclusive exons 9a and 9b. Interestingly, from RNASeq data, two potential TSSs are annotated inside exons 9a and 9b. Although the importance of exonic promoters is highly speculative, exonic TSSs might have some relationship to so-called exonic splicing enhancers influencing the recruitment of SR proteins (Carninci et al. 2006). Some have speculated that RNAs generated from these TSSs may regulate the decision of which exon will be used in the protein-coding transcript. Differential usage of these mutually exclusive exons leads to protein isoforms that differ substantially in sequence. The amino acid sequence encoded by exon 9b is highly conserved even between paralogs. Exclusion of exon 9b may have a drastic effect on protein function. Currently, there is no experimental evidence for a functional role of FBRSL1, except that in RNA-bound proteome analysis FBRSL1 was identified as a candidate RNA-binding protein (Baltz et al. 2012). However, the identity of the particular FBRSL1 protein isoform involved was not reported.

Implications of transcriptional complexity for disease
Although transcriptional complexity allows greater flexibility and control in complex systems, it is more likely to be misregulated, particularly in systems that depend heavily on alternative splicing. Aberrant promoter usage has been associated with several human cancers including colon cancer, ovarian cancer, and neuroblastomas (Landry et al. 2003; Davuluri et al. 2008), suggesting that genes with alternative promoters are more likely to be associated with disease (Davuluri et al. 2008; Liu 2010). BDNF utilizes multiple promoters in a tissue-specific manner, and promoter usage is altered after kainate-induced seizures (Timmusk et al. 1993). A SNP in the promoter region of the 5-HT2A receptor affects promoter activity and is associated with psychiatric disorders (Parsons et al. 2004). Several SNPs identified within the promoter region of the Kalirin gene are associated with coronary artery disease (Wang et al. 2007; Horne et al. 2009; Boroumand et al. 2014). In this context, the transcriptional complexity in auts2 family genes that we describe here provides an exciting opportunity to understand how misregulation of transcription at these loci leads to the disease conditions with which they are associated.
Expression patterns of auts2 paralogs
Differential patterns of gene expression among paralogs are widely believed to play a prominent role in morphological diversification. Duplicated genes are considered to diverge through neofunctionalization (Ohno 1970) and/or subfunctionalization (Lynch and Force 2000), but both processes can occur through evolution of the CDSs and/or the regulatory sequences, giving distinct and/or novel sites of expression.
Previously, it was shown that auts2a is ubiquitously expressed in the central nervous system beginning from 24 hr. Our data show that, already at very early stages, the expression of auts2a becomes restricted to the neural plate. Except for the fbrs gene, which is expressed ubiquitously throughout development, the other auts2 paralogs (auts2a, auts2b, and fbrsl1) show distinct expression patterns, particularly in the hindbrain, suggesting a role in patterning the hindbrain. For example, during neural keel formation, expression of fbrsl1 in the hindbrain is mainly detected in rhombomere 4, while auts2a is expressed in rhombomeres 1, 2, and 4, and auts2b is expressed more broadly, with the strongest expression in rhombomere 2 (see Figure S10). Expression of the fbrsl1 and fbrs genes in Kupffer's vesicle suggests their potential involvement in the establishment or maintenance of left-right asymmetry.
Analysis of expression of auts2 paralogs in juvenile brains revealed the presence of these transcripts in proliferation zones, suggesting a role in adult neurogenesis. In contrast to mammalian brains, teleostean brains have a tremendous number of proliferation zones. Many of these zones are found at or near the surfaces of ventricles (Zupanc et al. 2005). Three paralogs (auts2a, fbrsl1, and fbrs) are localized in the PGZ of the optic tectum, a site known for mitotic activity in the midbrain. In the cerebellum, proliferation zones are located in regions distant from any ventricle. Quantitative analysis has shown that the majority of the new brain cells are generated in the cerebellum; the proliferation zones are located in specific areas within the molecular layers of the cerebellar corpus and the valvula cerebelli (Zupanc and Horschke 1995; Hinsch and Zupanc 2007). Auts2 paralogs are localized either in the granular layer (auts2b and fbrs) or the Purkinje cell layer (auts2a and fbrsl1), sites where mitotic activity is minimal. Interestingly, two auts2 paralogs, auts2b and fbrsl1, are also expressed in the caudal lobe of the cerebellum, a granular layer of the eminentia granularis, the other site of mitotic activity in the cerebellum (Zupanc and Horschke 1995; Hinsch and Zupanc 2007).
Evolution of the auts2 gene locus
To our surprise, only cave fish and zebrafish possess a full-length copy of the auts2b gene. The other teleosts (where sequenced genomes are available) retain a highly reduced copy of the auts2b gene. Moreover, in the cave fish genome we could not find an auts2a gene, and it is currently unclear whether the absence of the auts2a gene in the cave fish genome is simply due to incomplete sequencing and assembly, or whether the gene has indeed evolved beyond recognition. Phylogenetically, cave fish and zebrafish belong to the Otomorpha group. Otomorpha and Euteleosteomorpha (all other teleosts with sequenced genomes) split 245 MYA (Broughton et al. 2013). The most common fate of duplicated genes is that, while one of the duplicated genes continues to be under selective pressure and retains the ancestral function, the other gene diverges and becomes nonfunctional through the accumulation of deleterious mutations (Langham et al. 2004). Less frequently, both genes are retained, which is the case for the auts2a and auts2b genes in the zebrafish genome. Although in other teleost genomes auts2b has evolved almost beyond recognition, transcription from this genomic locus could still be detected, as suggested by RNASeq data derived from Amazon molly, tilapia, and platyfish transcriptomes. It will be interesting to examine how these transcripts are spatially expressed.

Conclusions
Taken together, our results show the existence of multiple auts2 paralogs in zebrafish and the usage of alternative promoters and alternative splice sites for generating huge diversity in mRNA transcripts. The expression of these gene products is also tightly regulated developmentally and across multiple brain regions. Such complexity in regulation of these loci is bound to have significant functional roles and our future studies will be aimed at deciphering them.