Satellite DNA-containing gigantic introns in a unique gene expression program during Drosophila spermatogenesis

Intron gigantism, where genes contain megabase-sized introns, is observed across species, yet little is known about its purpose or regulation. Here we identify a unique gene expression program utilized for the proper expression of genes with intron gigantism. We find that two Drosophila genes with intron gigantism, kl-3 and kl-5, are transcribed in a spatiotemporal manner over the course of spermatocyte differentiation, which spans ~90 hours. The introns of these genes contain megabases of simple satellite DNA repeats that comprise over 99% of the gene loci, and these satellite-DNA containing introns are transcribed. We identify two RNA-binding proteins that specifically localize to kl-3 and kl-5 transcripts and are needed for the successful transcription or processing of these genes. We propose that genes with intron gigantism require a unique gene expression program, which may serve as a platform to regulate gene expression during cellular differentiation.


Author summary
Introns are non-coding elements of eukaryotic genes that often contain important regulatory sequences. Curiously, some genes contain introns so large that more than 99% of the gene locus is non-coding. One of the best-studied large genes, Dystrophin, the causative gene for Duchenne Muscular Dystrophy, spans 2.2Mb, only 11kb of which is coding. This phenomenon, 'intron gigantism', is observed across species, yet little is known about its purpose or regulation. Here we identify a unique gene expression program utilized for the proper expression of genes with intron gigantism, using Drosophila spermatogenic genes as a model system. We show that the gigantic introns of these genes are transcribed in line with the exons, likely as a single transcript. We identify two RNA-binding proteins that specifically localize to the site of transcription and are needed for the successful transcription or processing of these genes. We propose that genes with intron gigantism require a unique gene expression program, which may serve as a platform to regulate gene expression during cellular differentiation.

Introduction
Introns, non-coding elements of eukaryotic genes, often contain important regulatory sequences and allow for the production of diverse proteins from a single gene, adding critical regulatory layers to gene expression [1]. Curiously, some genes contain introns so large that more than 99% of the gene locus is non-coding. In humans, neuronal and muscle genes are enriched amongst those with the largest introns [2]. One of the best-studied large genes, Dystrophin, the causative gene for Duchenne Muscular Dystrophy, spans 2.2Mb, only 11kb of which is coding. A large portion of the remaining non-coding sequence consists of introns rich in repetitive DNA [3]. While intron size ('gigantism') is conserved between mouse and human, there is little sequence conservation within the introns, implying that intron gigantism itself, rather than the intronic sequence, may be functional [4].
In the Drosophila testis, germ cells undergoing differentiation are arranged in a spatiotemporal manner: the germline stem cells (GSCs) reside at the very apical tip, and differentiating cells are gradually displaced distally (Fig 1B) [21]. GSC division gives rise to spermatogonia (SG), which undergo four mitotic divisions with incomplete cytokinesis to become a cyst of 16 SGs. These 16-cell SG cysts enter meiotic S phase, at which point they become known as spermatocytes (SCs). SCs have an extended G2 phase, spanning 80-90 hours, prior to initiation of the meiotic divisions [22]. During this G2 phase, the cells increase approximately 25 times in volume, and the homologous chromosomes pair and partition into individual chromosome territories (Fig 1C) [23,24]. During this period, SCs also transcribe the majority of genes whose protein products will be needed for meiotic division and spermiogenesis [25-27]. Gene expression in SCs is thus tightly regulated to allow for timely expression of meiotic and spermiogenesis genes [28].
It has long been known that three of the Y chromosome-associated genes that contain gigantic introns (kl-5, kl-3 and ks-1, Fig 1A) form lampbrush-like nucleoplasmic structures in SCs, named Y-loops [denoted loops A (kl-5), B (kl-3) and C (ks-1); Fig 1C and 1D] [17]. Y-loop structures reflect the robust transcription of the underlying genes and have been observed across Drosophilids, including D. simulans, D. yakuba, D. pseudoobscura, D. hydei and D. littoralis [29,30]. Much of the fundamental knowledge about Y-loops comes from D. hydei, which forms large, cytologically distinct Y-loops [31], leading to the discovery that these structures are formed by the transcription of large loci composed of repetitive DNA [32-37]. Interestingly, in D. pseudoobscura, which contains a 'neo-Y' (not homologous to the ancestral Y chromosome), Y-loops are thought to be formed by Y-linked genes rather than by the kl-3, kl-5 and ks-1 homologs, which are autosomal in this species [38]. This suggests that Y-loop formation is a unique characteristic of Y-linked genes rather than a gene-specific phenomenon.
The transcription and processing of such gigantic genes and their RNA transcripts, in which exons are separated by megabase-sized introns, must pose a significant challenge for cells. However, how genes with intron gigantism are expressed and whether intron gigantism plays any regulatory role in gene expression remain largely unknown. In this study, we began addressing these questions by using the Y-loop genes as a model, and we describe the unusual nature of the gene expression program associated with intron gigantism. We find that transcription of Y-loop genes progresses in a strictly spatiotemporal manner, encompassing the entire ~90 hours of SC development: transcription initiates in early SCs, followed by robust transcription of the satellite DNA from the introns, with cytoplasmic mRNA becoming detectable only in late SCs. We identify two RNA-binding proteins, Blanks and Hephaestus (Heph), which specifically localize to the Y-loops, and show that they are required for robust transcription and/or proper processing of the Y-loop gene transcripts. Mutation of the blanks or heph genes leads to sterility due to the loss of Y-loop gene products. Our study demonstrates that genes with intron gigantism require specialized RNA-binding proteins for proper expression. We propose that such unique processing may be utilized as an additional regulatory mechanism to control gene expression during differentiation.

Transcription of a Y-loop gene, kl-3, is spatiotemporally organized
To begin investigating how the expression of Y-loop genes may be regulated, we sought to monitor their expression during SC development. In previous studies using D. hydei, when two differentially-labeled probes against two intronic repeats of the Y-loop gene DhDhc7(y) (homologous to D. melanogaster kl-5) were used for RNA fluorescent in situ hybridization (FISH), expression of the earlier repeat preceded that of the later repeat [39,40], leading to the idea that Y-loop genes might be transcribed as single multi-megabase transcripts. Consistent with this, Miller spreads of SC Y chromosomes, in which transcripts can be seen still bound to DNA, revealed long Y-loop transcripts [41,42]. However, transcription of the exons was not visualized and extensive secondary structures were present in the Miller spreads, leaving it unclear whether the entire gene region is transcribed as a single transcript.
By using differentially-labeled probe sets designed for RNA FISH to visualize 1) the first exon, 2) the satellite DNA (AATAT)n repeats found in multiple introns including the first [5,43], and 3) exon 14 (of 16) of kl-3 (Fig 1A, S1 File), we found that kl-3 transcription is organized in a spatiotemporal manner: transcript from the first exon becomes detectable in early SCs, followed by expression of the (AATAT)n satellite from the introns, and finally by the transcript from exon 14 in more mature SCs (Fig 1D). These results suggest that transcription of kl-3 takes the entirety of SC development, spanning ~90 hours. The pattern of transcription is consistent with the model proposed for Y-loop gene expression in D. hydei: the gene is likely transcribed as a single transcript that contains the exons and gigantic introns, although we cannot exclude other mechanisms, such as trans-splicing of multiple individually transcribed exons [44].
[Fig 1, caption fragment: … over an 80-90 hour G2 phase before initiating the meiotic divisions. (C) Top: SC nucleus model showing the Y-loops in the nucleoplasm. DNA (white), Y chromosome (green), Y-loops A and C (red) and Y-loop B (blue). Bottom: RNA FISH for the Y-loop gene intronic transcripts in an SC nucleus. Y-loops are visualized using probes for Y-loops A and C (Cy3-(AAGAC)6, red) and Y-loop B (Cy5-(AATAT)6 …]
Based on the expression patterns of the early exon, the (AATAT)n satellite-containing introns, and the late exon, SC development can be subdivided into four distinct stages (Fig 1E-1H). In stage 1, only exon 1 transcript is apparent (Fig 1E). In stage 2, intron transcript becomes detectable, and the signal from exon 1 remains strong (Fig 1F). Stage 3 is defined by the appearance of late exon signal alongside continued exon 1 and intron transcripts, indicating that transcription is nearly complete (Fig 1G).
Stage 4 is characterized by the presence of exon probe signals in granule-like structures in the cytoplasm (Fig 1H). These structures likely reflect kl-3 mRNA localizing to ribonucleoprotein (RNP) granules, as they never contain intron probe signal. The granules are absent following RNAi-mediated knockdown of kl-3 (bam-gal4>UAS-kl-3 TRiP.HMC03546, Fig 1I), confirming that they contain kl-3 mRNA. The same pattern of expression was seen for the Y-loop gene kl-5 (see below), suggesting that transcription of the other Y-loop genes proceeds in a similar manner.
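The four-stage scheme above amounts to a simple decision rule on which probe signals are present in a given SC. The Python sketch below encodes that rule for illustration only: the `ProbeSignals` input format and the `classify_sc_stage` name are hypothetical, and the sketch reduces each signal to a present/absent call, ignoring signal intensity and any other cues that may inform staging in practice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeSignals:
    """Presence/absence calls for one spermatocyte (hypothetical input format)."""
    exon1: bool                 # early-exon (exon 1) signal in the nucleus
    intron: bool                # (AATAT)n intronic satellite signal in the nucleus
    late_exon: bool             # late-exon (exon 14) signal in the nucleus
    cytoplasmic_granules: bool  # exon signal in cytoplasmic RNP granules


def classify_sc_stage(s: ProbeSignals) -> Optional[int]:
    """Return SC stage 1-4 following the rules described in the text, or None."""
    if s.cytoplasmic_granules:
        return 4  # mRNA granules appear only near the end of SC development
    if s.exon1 and s.intron and s.late_exon:
        return 3  # late exon added: transcription nearly complete
    if s.exon1 and s.intron:
        return 2  # intronic satellite transcription detectable
    if s.exon1:
        return 1  # only the first exon is transcribed
    return None  # pattern not covered by the staging scheme


print(classify_sc_stage(ProbeSignals(True, True, False, False)))  # -> 2
```

Checking the granule signal first mirrors the text: stage 4 is defined by cytoplasmic granules regardless of which nuclear signals persist.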
Together, these results show that the gigantic Y-loop genes, including megabases of intronic satellite DNA repeats, are transcribed continuously in a process that spans the entirety of SC development, culminating in the formation of mRNA granules in the cytoplasm near the end of the 80-90 hour meiotic G2 phase. While transcription elongation is believed to be quite stable [45], the presence of tandem arrays [46] or repeat expansions (as seen in trinucleotide expansion diseases) [47-49] can greatly slow an elongating polymerase and/or lead to premature dissociation [50]. Therefore, Y-loop gene transcription may require precise regulation.

Identification of genes that may regulate the transcription of the Y-loop genes
Considering the size of the Y-loop gene loci and their satellite DNA-rich introns, transcription of the Y-loop genes likely utilizes unique regulatory mechanisms. To begin to understand such a genetic program, we performed a screen (see Methods and S2 File). Briefly, a list of candidates was curated using a combination of gene ontology (GO) terms, expression analysis, predicted functionality and reagent availability, resulting in a final list of 67 candidate genes (S2 File). Candidates were screened for several criteria, including protein localization, fertility, and Y-loop gene expression. Among these, two genes, blanks and hephaestus (heph), exhibited localization patterns and phenotypes that reveal critical aspects of Y-loop gene regulation and were studied further. Several proteins, including Boule [51], Hrb98DE [52], Pasilla [52,53] and Rb97D [54], were previously shown to localize to the Y-loops, but RNAi-mediated knockdown and/or available mutants revealed no detectable phenotypes in Y-loop gene expression in SCs (S2 File); these were not pursued further in this study.

Blanks and Heph are RNA binding proteins that specifically localize to the Y-loops and are required for fertility
Blanks, an RNA-binding protein with multiple dsRNA-binding domains, is primarily expressed in SCs. Blanks has been shown to be important for post-meiotic sperm development and male fertility [55,56], and its ability to bind RNA was found to be necessary for fertility [55]. To assess Blanks localization within the SC nucleus, testes expressing GFP-Blanks were processed for RNA FISH with probes against the intronic satellite DNA transcripts: (AATAT)n for Y-loop B/kl-3 and (AAGAC)n for Y-loops A/kl-5 and C/ks-1 [57]. (AATAT)n is the only satellite DNA found in Y-loop B [5,58], and while (AAGAC)n is not the only satellite DNA found in Y-loops A and C, its expression from these loci was previously characterized [57]. We found that GFP-Blanks strongly localizes to Y-loop B (Fig 2A).
Heph, a heterogeneous nuclear ribonucleoprotein (hnRNP) homologous to mammalian polypyrimidine tract binding protein (PTB), is an RNA-binding protein with multiple RNA recognition motifs (RRMs) that is expressed in the testis [59]. Heph has also been implicated in post-meiotic sperm development and male fertility [60,61]. By using a Heph-GFP protein trap (p(PTT-GC)hephCC00664) combined with RNA FISH to visualize the Y-loop gene intronic transcripts, we found that Heph-GFP localizes to Y-loops A and C (Fig 2B). Note that the heph locus encodes 25 isoforms, and the Heph-GFP protein trap likely represents only a subset of heph gene products. A summary of Y-loop designation, gene, intronic satellite DNA repeat, and binding protein is provided in Fig 2C. We confirmed previous reports that blanks is required for male fertility [55,56]. When we examined seminal vesicles for the presence of motile sperm, those from control siblings contained abundant motile sperm (Fig 2D; 13% empty, 87% normal, n = 46), whereas those from blanks mutants (blanksKG00084/Df(3L)BSC371) lacked motile sperm (Fig 2E; 96% empty, 4% greatly reduced, n = 58). We also confirmed previous reports that heph is required for fertility [60,62]. Seminal vesicles from heph mutants (heph2/Df(3R)BSC687) also lacked motile sperm (Fig 2G; 100% empty, n = 21), while those from control siblings contained motile sperm (Fig 2F; 5% empty, 95% normal, n = 57).
Previous studies [55,56,60] reported that blanks and heph mutants are defective in sperm individualization, one of the final steps in sperm maturation. During individualization, 64 interconnected spermatids are separated by individualization complexes (ICs) that form around the sperm nuclei and migrate in unison along the sperm tails, removing excess cytoplasm and encompassing each cell with its own plasma membrane (Fig 2J) [63]. When the F-actin cones of ICs were visualized by phalloidin staining, it became clear that ICs form properly in all genotypes (Fig 2K-2N) but become disorganized in late ICs in blanks and heph mutants, a hallmark of axoneme formation defects [64], preventing completion of individualization (Fig 2Q-2T).

blanks is required for transcription of the Y-loop B gene kl-3
As Blanks was found to localize to Y-loop B, we first determined whether there were any overt defects in Y-loop B formation or kl-3 expression in blanks mutants. To this end, we performed RNA FISH to visualize the Y-loop gene intronic transcripts in blanks mutants. In control testes, intronic satellite DNA transcripts from all Y-loops become detectable fairly early in SC development and quickly reach full intensity (Fig 3A), whereas in blanks mutants the signal from the Y-loop B intronic transcripts remains faint (Fig 3B). The expression of Y-loops A and C is comparable between control and blanks mutant testes (Fig 3A and 3B).
In addition to the reduced expression of the intronic satellite DNA repeats of Y-loop B/kl-3, expression of kl-3 exons is also reduced in blanks mutants. By performing RNA FISH using exonic and intronic (AATAT)n probes for Y-loop B/kl-3, we found that blanks mutants display an overall reduction in signal intensity for both intronic satellite repeats and exons compared to controls (Fig 3C and 3E). Moreover, cytoplasmic kl-3 mRNA granules are rarely detected in blanks mutants (Fig 3F). The same results were obtained following RNAi-mediated knockdown of blanks (bam-gal4>UAS-blanks TRiP.HMS00078, S1 Fig). These results suggest that blanks is required for robust and proper expression of Y-loop B/kl-3 and for the production of kl-3 mRNA granules, likely at the transcriptional level. Consistently, the amount of Kl-3 protein is greatly diminished in blanks mutants (Fig 3G), confirming that blanks is required for proper expression of Y-loop B/kl-3.
To obtain a more quantitative measure of kl-3 expression levels in control and blanks mutant testes, we performed RT-qPCR. Primers were designed to amplify early (close to the 5' end), middle, and late (close to the 3' end) regions of kl-3. For each region, two primer sets were designed: one spanned a satellite DNA-containing large intron and the other spanned a normal-sized intron (Fig 3H, bars denote spanned intron, and S3 File). All primer sets showed a drop in kl-3 mRNA levels in blanks mutants when normalized to GAPDH and sibling controls (Fig 3H). The reduction was smaller for the early primer sets (~75% compared to controls) than for the middle/late primer sets (~95% compared to controls), raising the possibility that blanks mutants have difficulty either transcribing past the first satellite DNA-containing gigantic intron or stabilizing kl-3 transcripts. A recent study that examined global expression changes in blanks mutant testes reported a similar change in kl-3 gene expression [67]. In summary, the RNA-binding protein Blanks localizes to Y-loop B and allows for the robust transcription of the Y-loop B gene kl-3.
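For readers who think in qPCR cycles, the reductions reported above can be converted back into ΔΔCt values under the standard 2^-ΔΔCt model. That model assumes ~100% amplification efficiency, which is an assumption here because primer efficiencies are not reported in this section. A short Python sketch:

```python
import math


def ddct_from_reduction(fraction_reduced: float) -> float:
    """Convert a fractional reduction (e.g. 0.75 for 75%) into the implied
    ddCt, assuming perfect doubling per cycle (relative level = 2**-ddCt)."""
    remaining = 1.0 - fraction_reduced
    return -math.log2(remaining)


early = ddct_from_reduction(0.75)  # ~75% reduction, early primer sets
late = ddct_from_reduction(0.95)   # ~95% reduction, middle/late primer sets
print(f"early ddCt = {early:.2f}, middle/late ddCt = {late:.2f}")
print(f"remaining transcript, early vs late: {(1 - 0.75) / (1 - 0.95):.0f}-fold")
```

Under this model, the mutant signal lags the control by about 2 cycles at the early primer sets and about 4.3 cycles at the middle/late sets. That corresponds to roughly five-fold more amplifiable kl-3 product remaining near the 5' end than further downstream.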
In contrast to Y-loop B/kl-3, Y-loop A/kl-5 expression appeared normal in blanks mutants. We designed RNA FISH probes against kl-5 in the same manner as for kl-3 (i.e. early exon, intron and late exon) (Fig 4A, S1 File). Transcription of Y-loop A/kl-5 follows a spatiotemporal pattern similar to that of Y-loop B/kl-3 (Fig 4B and 4C): early exon transcripts become detectable in early SCs, while kl-5 mRNA granules are not detected in the cytoplasm until near the end of SC development. No overt differences are observed in kl-5 expression in blanks mutants, and kl-5 mRNA granules are observed in the cytoplasm (Fig 4D and 4E). By RT-qPCR with primers for kl-5 designed as described above for kl-3 (Fig 3H), we found a mild reduction in kl-5 expression in blanks mutants when normalized to GAPDH and sibling controls (Fig 4F). However, given that kl-5 mRNA granules form correctly in blanks mutant testes (Fig 4E), this reduction may not be biologically significant. The mild reduction in kl-5 transcript in blanks mutants could be an indirect effect of defective Y-loop B expression. Alternatively, a small amount of (AATAT)n satellite, which is predicted to be present in the last intron of kl-5 [43,68], might cause this mild reduction in kl-5 expression in blanks mutant testes.

Blanks is unlikely to be a part of the general meiotic transcription program
It is well known that SCs utilize a specialized transcription program to transcribe the vast majority of genes required for meiosis and spermiogenesis [28,69,70]. This program is executed by two groups of transcription factors: tMAC and the tTAFs. tMAC (testis-specific meiotic arrest complex) has both activating and repressing activities and has been shown to physically interact with the core transcription initiation machinery [71-77]. The tTAFs (testis-specific TATA binding protein associated factors) are homologs of core transcription initiation factors [78-81]. tMAC and the tTAFs function cooperatively to regulate meiotic gene expression. To examine whether blanks is part of this established meiotic transcription program, we examined the expression of fzo and Dic61B, known targets of the SC-specific transcriptional program [70,79], which are located on autosomes and do not have gigantic introns (S1 File). In contrast to mutants for the tMAC component aly (aly2/aly5P), which have drastically reduced levels of fzo and Dic61B transcripts, blanks mutants show no visible change in the expression of these genes (S2 Fig), suggesting that blanks is not a part of the SC-specific transcriptional program involving tTAFs and tMAC. Instead, blanks is likely uniquely involved in the expression of the Y-loop genes.

Heph is required for processing transcripts of the Y-loop A gene kl-5
As Heph-GFP localized to Y-loops A and C, we first examined whether Y-loops A and C displayed any overt expression defects in heph mutants (Fig 3B). When we performed RNA FISH to visualize the Y-loop gene intronic transcripts in heph mutants, the overall expression levels of both (AAGAC)n and (AATAT)n satellites appeared unchanged between control and heph mutant testes (Fig 5A and 5B). However, the morphology of Y-loops A and C is altered in heph mutants, adopting a less organized, diffuse appearance (Fig 5B), whereas all Y-loops in control SCs show characteristic thread-like or globular morphologies (Fig 5A). Y-loop B appears unchanged between controls and heph mutants (Fig 5A and 5B). These results indicate that heph may be important for structurally organizing Y-loop A and C transcripts, without affecting overall transcript levels.
We next examined the expression pattern of kl-5 exons together with the Y-loop A/C intronic satellite [(AAGAC)n], as described in Fig 4. Overall expression levels of kl-5 appear to be unaltered in heph mutant testes (Fig 5C and 5E). However, in contrast to control testes (Fig 5D), heph mutant testes rarely have cytoplasmic kl-5 mRNA granules in late SCs (Fig 5F), suggesting that loss of heph affects kl-5 mRNA production without affecting transcription in the nucleus. heph mutants may be defective in processing the long repetitive regions of transcripts to generate mRNA (e.g. splicing, mRNA export or protection from degradation). We also examined the expression of ks-1 (ORY) in heph mutants, as Heph-GFP also localized to Y-loop C. Although the ORY ORF is too short to allow design of exon-specific probes for examining temporal expression patterns, RNA FISH with probes targeting all exons of ORY revealed that ORY mRNA granules are not formed in heph mutants (S3 Fig). As in blanks mutants, heph mutants show no defects in the expression of fzo or Dic61B (S2 Fig), indicating that heph is not a member of the more general meiotic transcription program. Instead, heph, like blanks, appears to specifically affect the expression of the Y-loops to which it localizes.
RT-qPCR showed that heph mutants exhibit only a moderate reduction in kl-5 expression when normalized to GAPDH and sibling controls (Fig 5G), consistent with the RNA FISH results described above. A similar moderate reduction in kl-5 mRNA is observed in blanks mutants (Fig 4F), in which kl-5 mRNA granules still form. Thus, the reduction in kl-5 expression levels alone is unlikely to cause the lack of kl-5 mRNA granules in heph mutant SCs. Instead, we postulate that mRNA granule formation depends on proper processing or stability of primary transcripts, which may be defective in heph mutants.
Surprisingly, we found that kl-3 mRNA granules are also absent in heph mutants, although Y-loop B/kl-3 expression levels in the nucleus appear to be unaffected (Fig 6A-6D). RT-qPCR showed a moderate reduction in kl-3 mRNA in heph mutants when normalized to GAPDH and sibling controls (Fig 6E), similar to that observed for kl-5 mRNA (Fig 5G). Consistent with the absence of cytoplasmic kl-3 mRNA granules, Kl-3 protein levels are dramatically reduced in heph mutant testes (Fig 6F). This is unexpected, as Heph protein does not localize to Y-loop B (Fig 2B) or affect Y-loop B morphology (Fig 5A and 5B). It is possible that some of the predicted 25 isoforms of Heph are not visualized by Heph-GFP, and these unvisualized isoforms might localize to and regulate Y-loop B/kl-3 expression. Alternatively, this may be an indirect effect of defective Y-loop A and C expression and/or structure.
Taken together, our results show that Blanks and Heph, two RNA-binding proteins, are essential for the expression of Y-loop genes but are not members of the more general meiotic transcription program. As Y-loop genes are essential for sperm motility and fertility, the sterility observed in blanks and heph mutants likely stems from defects in Y-loop gene expression. Blanks and Heph highlight two distinct steps in a unique Y-loop gene expression program: transcriptional processivity and RNA processing (e.g. splicing, export and/or stability of transcripts).

Discussion
The existence of the Y chromosome lampbrush-like loops of Drosophila has been known for the last five decades [82,83]; however, little is known about how Y-loop formation and expression are regulated and whether these SC-specific structures are important for spermatogenesis. Here we identified a Y-loop gene-specific expression program that functions in parallel to the general meiotic transcriptional program to aid in the expression and processing of the gigantic Y-loop genes. Our results suggest that genes with intron gigantism, such as the Y-loop genes and potentially other large genes such as Dystrophin, require specialized mechanisms for proper expression.
The mutant phenotypes of blanks and heph, the two genes identified to be involved in this novel expression program, highlight two distinct steps of the Y-loop gene-specific expression program (Fig 6G). Blanks was originally identified as an siRNA-binding protein, but no defects in small RNA-mediated silencing were observed in the testes of blanks mutants [55,56]. We found that blanks is required for transcription of Y-loop B/kl-3, as nuclear transcript levels were visibly reduced in blanks mutants, leading to the lack of both kl-3 mRNA granules in the cytoplasm and Kl-3 protein. As Blanks' ability to bind RNA was previously found to be required for male fertility [55], we speculate that Blanks may bind to nascent kl-3 RNA, which contains megabases of satellite DNA transcripts, so that the transcripts' secondary/tertiary structures do not interfere with transcription [84]. It is possible that elongating RNA polymerases, which slow and potentially lose stability on repetitive DNA [46,49], might require Blanks to increase processivity, allowing them to transcribe through repetitive DNA sequences, as has been observed for repetitive sequences in other systems [85-87].
Heph has been implicated in a number of steps in RNA processing and translational regulation [88-91], but its exact role in the testis remained unclear despite its requirement for male fertility [60,61]. We found that heph mutants fail to generate kl-5 cytoplasmic mRNA granules even though nuclear transcript levels appeared minimally affected. This suggests that heph may be required for processing the long repetitive transcripts. For example, heph might be required to ensure proper splicing of the Y-loop gene pre-mRNAs, which is predicted to be challenging, as the splicing of adjacent exons becomes exponentially more difficult as intron length increases [92]. Y-loop genes may utilize proteins like Heph to combat this challenge; alternatively, Heph could aid in stabilizing these long RNAs and preventing premature degradation.
These results highlight the presence of a unique program tailored toward expressing genes with intron gigantism. Although the functional relevance of intron gigantism remains obscure, our results may provide hints as to its possible functions. Even if intron gigantism did not arise to serve a specific function, once it emerges, a gene expression program capable of handling it must evolve to tolerate the burden of gigantic introns, as indicated by our study of blanks and heph mutants. Ultimately, the presence of such a program would provide a unique opportunity to regulate gene expression: once such systems evolve, other or new genes may start utilizing them to add an additional layer of complexity to the regulation of gene expression. For example, in the case of Y-loop genes, the extended time period required for transcription of the gigantic Y-loop genes (~80-90 hours) might function as a 'developmental timer' for SC differentiation. Similarly, the expression of two homologous genes, knirps (kni) and knirps-like (knrl), is regulated by intron size during Drosophila embryogenesis. Although knrl can perform the same function as kni in embryos, knrl mRNA is not produced because a relatively large (14.9kb) intron (as opposed to the small (<1kb) introns of kni) prevents completion of knrl transcription during the short cell cycles of early development [93]. A similar idea was proposed for Ultrabithorax (Ubx) in the early Drosophila embryo, where large gene size leads to abortion of Ubx transcription during the syncytial divisions, preventing production of Ubx protein [94]. Thus, intron size can play a critical role in the regulation of gene expression.
Alternatively, satellite DNA-containing gigantic introns could act in a manner similar to enhancers, recruiting transcriptional machinery to the Y-loop genes to facilitate expression [1].
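The 'developmental timer' and knrl examples discussed above rest on simple arithmetic: the minimum time to transcribe a locus is its length divided by the polymerase elongation rate. The Python sketch below makes that explicit. The elongation rates and the 4 Mb locus length are illustrative assumptions, not values measured in this study. A constant rate is also a simplification, since repetitive DNA may slow or destabilize an elongating polymerase, as noted earlier in this paper.

```python
def transcription_minutes(length_bp: float, rate_kb_per_min: float) -> float:
    """Minimum time (min) for one polymerase to traverse a locus at a constant rate."""
    return length_bp / (rate_kb_per_min * 1_000)


def transcription_hours(length_bp: float, rate_kb_per_min: float) -> float:
    """Same estimate expressed in hours."""
    return transcription_minutes(length_bp, rate_kb_per_min) / 60


# knrl: the 14.9 kb intron alone, at an assumed 1.5 kb/min
knrl_intron_min = transcription_minutes(14_900, 1.5)

# A hypothetical 4 Mb Y-loop gene at assumed rates of 1 and 2 kb/min
for rate in (1.0, 2.0):
    hours = transcription_hours(4_000_000, rate)
    print(f"4 Mb at {rate} kb/min: {hours:.1f} h")

print(f"knrl 14.9 kb intron at 1.5 kb/min: {knrl_intron_min:.1f} min")
```

Under these assumptions, a multi-megabase locus needs well over a day to transcribe, the same order of magnitude as the 80-90 hour SC G2 phase. By contrast, about 10 minutes for a 14.9 kb intron is already substantial relative to the short cell cycles of early development.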
In summary, our study provides the first glimpse at how the expression of genes with intron gigantism requires a unique gene expression program, which acts on both transcription and post-transcriptional processing.

Fly Husbandry
All fly stocks were raised on standard Bloomington medium at 25˚C, and young flies (1- to 3-day-old adults) were used for all experiments. Flies used for wild-type experiments were the standard lab wild-type strain yw (y1 w1). The following fly stocks were used: GFP-blanks (GFP-tagged Blanks expressed from its endogenous promoter) was a gift of Dean Smith [55]. bam-gal4 was a gift of Dennis McKearin [95]. The aly2 and aly5P stocks were a gift of Minx Fuller [69].
It is important to note that the heph2 allele is known to be male sterile, whereas other heph alleles are lethal; thus the heph2 allele is unlikely to be null and affects only a subset of isoforms, including one or more with a testis-specific function. The Y chromosome in the heph deficiency strain Df(3R)BSC687 appeared to have accumulated mutations that resulted in abnormal Y-loop morphology. This Y chromosome was therefore replaced with the yw Y chromosome for all experiments described in this study.
The kl-3-FLAG strain was constructed by Fungene (fgbiotech.com) using CRISPR-mediated knock-in of a 3X-FLAG tag, in frame at the endogenous C-terminus immediately preceding the termination codon of kl-3, via homology-directed repair. Two guide RNAs were used (CCACTGGACTTTAAGGGGTGTTGC and GCATCCTGACCACTGGACTTTAAG), and point mutations were introduced into the PAM sequences during homology-directed repair to prevent continued cutting.
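The key design constraint for this knock-in is that the tag must sit in frame, immediately before the stop codon, so that translation continues from the last kl-3 codon into the tag and terminates at the native stop. The Python sketch below checks that constraint by translating the edited sequence. The 3X-FLAG coding sequence is one common codon choice encoding DYKDHDGDYKDHDIDYKDDDDK, and the short CDS is a toy stand-in. Neither is the actual construct or kl-3 sequence, which are not given here.

```python
# Standard genetic code, indexed by bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# One possible DNA encoding of the 3X-FLAG peptide (assumed, for illustration).
FLAG3X = "GACTACAAAGACCATGACGGTGATTATAAAGATCATGACATCGATTACAAGGATGACGATGACAAG"


def translate(seq: str) -> str:
    """Translate a DNA sequence codon by codon ('*' marks a stop)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def tag_before_stop(cds: str, tag: str) -> str:
    """Insert `tag` between the last sense codon and the stop codon of `cds`."""
    if len(cds) % 3 or len(tag) % 3:
        raise ValueError("CDS and tag lengths must be multiples of 3")
    if translate(cds[-3:]) != "*":
        raise ValueError("CDS must end with a stop codon")
    if "*" in translate(tag):
        raise ValueError("tag contains an in-frame stop codon")
    return cds[:-3] + tag + cds[-3:]


toy_cds = "ATGGCTAAATAA"  # hypothetical: Met-Ala-Lys-stop
print(translate(tag_before_stop(toy_cds, FLAG3X)))
```

Because the function refuses tags whose length is not a multiple of three, a frameshifting insertion is caught before translation rather than producing a silently scrambled C-terminus.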

RNA Fluorescent in situ hybridization
All solutions used for RNA FISH were RNase-free. Testes from 2- to 3-day-old flies were dissected in 1X PBS and fixed in 4% formaldehyde in 1X PBS for 30 minutes. Testes were then washed briefly in PBS and permeabilized in 70% ethanol overnight at 4˚C. Testes were briefly rinsed with wash buffer (2X saline-sodium citrate (SSC), 10% formamide) and then hybridized overnight at 37˚C in hybridization buffer (2X SSC, 10% dextran sulfate (Sigma, D8906), 1mg/mL E. coli tRNA (Sigma, R8759), 2mM Vanadyl Ribonucleoside complex (NEB S142), 0.5% BSA (Ambion, AM2618), 10% formamide). Following hybridization, samples were washed three times in wash buffer for 20 minutes each at 37˚C and mounted in VECTASHIELD with DAPI (Vector Labs). Images were acquired using an upright Leica TCS SP8 confocal microscope with a 63X oil immersion objective lens (NA = 1.4) and processed using Adobe Photoshop and ImageJ software.
Fluorescently labeled probes were added to the hybridization buffer to a final concentration of 50 nM (for probes targeting satellite DNA transcripts) or 100 nM (for exon-targeted probes). Probes against the satellite DNA transcripts were obtained from Integrated DNA Technologies. Probes against kl-3, kl-5, fzo, and Dic61B exons were designed using the Stellaris® RNA FISH Probe Designer (Biosearch Technologies, Inc.), available online at www.biosearchtech.com/stellarisdesigner. Each set of custom Stellaris® RNA FISH probes was labeled with Quasar 670, Quasar 570, or Fluorescein-C3 (S1 File).
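Preparing the hybridization buffer and adding the probes both come down to C1V1 = C2V2 dilutions. The Python sketch below computes pipetting volumes for one batch. The final concentrations are taken from the protocol above. Every stock concentration is an assumption, to be replaced with the values on the reagents actually in use: 20X SSC, 50% dextran sulfate, 10 mg/mL tRNA, 200 mM vanadyl ribonucleoside complex, 5% BSA, 100% formamide, and a 25 µM probe stock.

```python
# Sketch only: final concentrations follow the protocol text; all STOCK
# concentrations (and the default probe stock) are assumptions.


def stock_volume(final: float, stock: float, total_volume: float) -> float:
    """Volume of stock giving `final` concentration in `total_volume` (C1V1 = C2V2)."""
    if stock <= 0 or final > stock:
        raise ValueError("stock must be more concentrated than the final solution")
    return final * total_volume / stock


# component: (final concentration, ASSUMED stock concentration), same unit per pair
HYB_BUFFER = {
    "SSC (X)": (2.0, 20.0),
    "dextran sulfate (% w/v)": (10.0, 50.0),
    "E. coli tRNA (mg/mL)": (1.0, 10.0),
    "vanadyl ribonucleoside complex (mM)": (2.0, 200.0),
    "BSA (% w/v)": (0.5, 5.0),
    "formamide (% v/v)": (10.0, 100.0),
}


def hybridization_mix(total_ul: float, probe_final_nm: float,
                      probe_stock_nm: float = 25_000.0) -> dict[str, float]:
    """Return microlitres of each stock, the probe, and RNase-free water."""
    volumes = {name: stock_volume(final, stock, total_ul)
               for name, (final, stock) in HYB_BUFFER.items()}
    volumes["probe"] = stock_volume(probe_final_nm, probe_stock_nm, total_ul)
    remainder = total_ul - sum(volumes.values())
    if remainder < 0:
        raise ValueError("stocks too dilute to reach the final concentrations")
    volumes["water"] = remainder
    return volumes


# 100 µL of buffer with an exon-targeted probe set at 100 nM
for component, ul in hybridization_mix(100.0, probe_final_nm=100.0).items():
    print(f"{component:40s} {ul:6.2f} µL")
```

For satellite DNA transcript probes, call `hybridization_mix(100.0, probe_final_nm=50.0)` instead. The water volume shrinks correspondingly less.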

RT-qPCR
Total RNA from testes (50 pairs/sample) was extracted using TRIzol (Invitrogen) according to the manufacturer's instructions. 1 μg of total RNA was reverse transcribed using SuperScript® III Reverse Transcriptase (Invitrogen), followed by qPCR using Power SYBR Green reagent (Applied Biosystems). Primers for qPCR were designed to amplify only mRNA. For genes with introns of typical size, one primer of each pair was designed to span an exon-exon junction. For genes with large introns, primer pairs were placed in the flanking exons, so that a PCR product could only be produced if the intron had been spliced out. Relative expression levels were normalized to GAPDH and to control siblings. All reactions were performed in technical triplicate with at least two biological replicates. Graphs include all replicates, and p-values were calculated using a t-test on untransformed average ΔΔCt values. Primers used are listed in S3 File.
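The normalization described above can be sketched in Python as the standard 2^-ΔΔCt calculation. GAPDH serves as the reference gene and control siblings as the calibrator. Several points are assumptions rather than stated in the text. Technical triplicates are assumed to be averaged into one ΔCt per biological replicate. Each replicate's ΔΔCt is assumed to be taken against the mean control ΔCt. Amplification efficiency is assumed to be about 100%. The t-test is assumed to be Student's two-sample test with pooled variance. All Ct values in the example are hypothetical.

```python
# Sketch of the 2^-ΔΔCt calculation; Ct values below are hypothetical and the
# pooled-variance (Student's) t-test is an assumption about the test used.
import math
from statistics import mean, variance


def delta_ct(target_ct: list[float], gapdh_ct: list[float]) -> float:
    """ΔCt of one biological replicate: mean target Ct minus mean GAPDH Ct
    over its technical triplicates."""
    return mean(target_ct) - mean(gapdh_ct)


def delta_delta_ct(dcts: list[float], control_dcts: list[float]) -> list[float]:
    """ΔΔCt of each replicate relative to the mean ΔCt of control siblings."""
    calibrator = mean(control_dcts)
    return [d - calibrator for d in dcts]


def fold_change(ddct: float) -> float:
    """Relative expression 2^(-ΔΔCt); assumes ~100% amplification efficiency."""
    return 2.0 ** (-ddct)


def student_t(a: list[float], b: list[float]) -> float:
    """Two-sample Student's t statistic (pooled variance) on untransformed ΔΔCt."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


# Two biological replicates per genotype, technical triplicates each (hypothetical)
control = [delta_ct([24.0] * 3, [18.0] * 3), delta_ct([24.2] * 3, [18.0] * 3)]
mutant = [delta_ct([26.0] * 3, [18.0] * 3), delta_ct([26.2] * 3, [18.0] * 3)]

ctrl_ddct = delta_delta_ct(control, control)
mut_ddct = delta_delta_ct(mutant, control)
print(f"fold change: {fold_change(mean(mut_ddct)):.2f}")  # 0.25
print(f"t = {student_t(mut_ddct, ctrl_ddct):.2f}")        # 14.14
```

The statistic is computed on ΔΔCt rather than on fold change because fold change is an exponential transform of the measured Ct values. A p-value follows from the t distribution with na + nb - 2 degrees of freedom. For example, `scipy.stats.ttest_ind(mut_ddct, ctrl_ddct)` returns both the statistic and the p-value.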

Screen for the identification of proteins involved in Y-loop gene expression
Initially, ~2,200 candidate genes were selected based on gene ontology (GO) terms (e.g., "mRNA binding", "regulation of translation", "spermatid development"). These genes were cross-referenced against publicly available RNA-seq data sets (i.e., FlyAtlas, modENCODE), and only those predicted to be expressed in the testis were retained. Candidate genes were also eliminated if they were known to be involved in ubiquitous processes (e.g., general transcription factors, ribosomal subunits) or in processes seemingly unrelated to those associated with the Y-loop genes (e.g., mitochondrial proteins, GSC/SG differentiation, mitotic spindle assembly). Finally, candidates were limited to those with available reagents for localization and/or phenotypic analysis, leaving a final list of 67 candidate genes (S2 File). Where possible, we first analyzed the localization of each candidate protein. Candidates whose proteins did not localize to SCs or the Y-loops were not examined further. If a candidate was expressed in SCs, or if no localization reagents were available, RNAi-mediated knockdown or mutants were used to examine Y-loop gene expression for any deviation from the expression pattern described in Fig 1D-1H and to assess fertility. Because Y-loop genes are all essential for sperm maturation [14], any gene essential for Y-loop gene expression should also be needed for fertility. All selection criteria and a summary of observed phenotypes can be found in S2 File.
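The selection funnel and triage rule can be summarized as a short Python sketch. The record fields, gene names, and boolean flags are hypothetical simplifications of the criteria described above. Only the three GO terms quoted in the text are included, and the full criteria are in S2 File.

```python
# Hypothetical sketch of the screen's filtering and triage logic. Field names
# and gene names are invented for illustration; real criteria are in S2 File.
from dataclasses import dataclass
from typing import Optional

# Example GO terms quoted in the text (not the full list)
GO_TERMS = {"mRNA binding", "regulation of translation", "spermatid development"}


@dataclass
class Candidate:
    name: str
    go_terms: set[str]
    testis_expressed: bool           # e.g. from FlyAtlas / modENCODE RNA-seq
    excluded_process: bool           # ubiquitous or seemingly unrelated process
    has_reagents: bool               # localization and/or phenotypic reagents
    localizes_to_sc: Optional[bool] = None  # None: no localization reagent


def select(genes: list[Candidate]) -> list[Candidate]:
    """Apply the GO, expression, exclusion, and reagent filters in turn."""
    return [g for g in genes
            if g.go_terms & GO_TERMS
            and g.testis_expressed
            and not g.excluded_process
            and g.has_reagents]


def triage(c: Candidate) -> str:
    """Decide the next step for a selected candidate."""
    if c.localizes_to_sc is False:
        return "drop: not localized to SCs or Y-loops"
    # Expressed in SCs, or localization could not be tested
    return "RNAi/mutant: check Y-loop gene expression and fertility"


genes = [
    Candidate("geneA", {"mRNA binding"}, True, False, True, True),
    Candidate("geneB", {"spermatid development"}, True, False, True, None),
    Candidate("geneC", {"mRNA binding"}, False, False, True, True),
    Candidate("geneD", {"regulation of translation"}, True, False, True, False),
]
for c in select(genes):
    print(c.name, "->", triage(c))
```

Fertility is assayed alongside gene expression because a gene required for Y-loop gene expression is expected to be required for fertility as well. This makes fertility a cheap first readout for the candidates that pass triage.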

Phalloidin staining
Testes were dissected in 1X PBS, transferred to 4% formaldehyde in 1X PBS and fixed for 30 minutes. Testes were then washed in 1X PBST (PBS containing 0.1% Triton-X) for at least 60 minutes, followed by incubation with Alexa Fluor 546 phalloidin (ThermoFisher, A22283, 1:200) in 3% bovine serum albumin (BSA) in 1X PBST at 4˚C overnight. Samples were washed for 60 minutes in 1X PBST and mounted in VECTASHIELD with DAPI (Vector Labs). Images were acquired using an upright Leica TCS SP8 confocal microscope with a 63X oil immersion objective lens (NA = 1.4) and processed using Adobe Photoshop and ImageJ software.

Seminal vesicle imaging and analysis
To determine the presence of motile sperm, testes with attached seminal vesicles were dissected in 1X PBS, transferred to 4% formaldehyde in 1X PBS and fixed for 30 minutes. Testes were then washed in 1X PBST (PBS containing 0.1% Triton-X) for at least 60 minutes and mounted in VECTASHIELD with DAPI (Vector Labs). Seminal vesicles were then examined by confocal microscopy, and the number of sperm nuclei was assessed by DAPI staining. A seminal vesicle was scored as having a normal number of motile sperm if the number of nuclei was comparable to wild type, as greatly reduced if it contained only a few sperm, and as empty if no sperm nuclei were detected.
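The three-category score can be written down explicitly, for example to tabulate the percentage of seminal vesicles in each category per genotype. In the protocol above, vesicles are scored by eye. The numeric cutoff in this Python sketch (`comparable_fraction`) and the use of a wild-type mean count are hypothetical. Counts that are neither "a few" nor comparable to wild type are not defined in the text, and this sketch lumps them into "greatly reduced".

```python
# Hypothetical formalization of the seminal vesicle score. The protocol scores
# by eye; the numeric threshold used here is an assumption.
from collections import Counter


def score_seminal_vesicle(n_nuclei: int, wildtype_mean: float,
                          comparable_fraction: float) -> str:
    """Classify one seminal vesicle from its DAPI-stained sperm nucleus count.

    comparable_fraction: fraction of the wild-type count treated as
    'comparable to wild type' (hypothetical; the protocol gives no number).
    """
    if n_nuclei < 0:
        raise ValueError("count cannot be negative")
    if n_nuclei == 0:
        return "empty"
    if n_nuclei >= comparable_fraction * wildtype_mean:
        return "normal"
    return "greatly reduced"


def tally(counts: list[int], wildtype_mean: float,
          comparable_fraction: float) -> dict[str, float]:
    """Percentage of seminal vesicles in each category."""
    scores = Counter(score_seminal_vesicle(n, wildtype_mean, comparable_fraction)
                     for n in counts)
    return {cat: 100 * scores[cat] / len(counts)
            for cat in ("normal", "greatly reduced", "empty")}


print(tally([0, 0, 3, 250], wildtype_mean=300, comparable_fraction=0.5))
# {'normal': 25.0, 'greatly reduced': 25.0, 'empty': 50.0}
```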
To obtain representative images, seminal vesicles were dissected in 1X PBS and transferred to slides for live observation by phase contrast on a Leica DM5000B microscope with a 40X objective (NA = 0.75) and imaged with a QImaging Retiga 2000R Fast 1394 Mono Cooled camera. Images were adjusted in Adobe Photoshop.