Decoding Transcription Regulatory Mechanisms Associated with Coccidioides immitis Phase Transition Using Total RNA

ABSTRACT New or emerging infectious diseases are commonly caused by pathogens that cannot be readily manipulated or studied under common laboratory conditions. These limitations hinder standard experimental approaches and our abilities to define the fundamental molecular mechanisms underlying pathogenesis. The advance of capped small RNA sequencing (csRNA-seq) now enables genome-wide mapping of actively initiated transcripts from genes and other regulatory transcribed start regions (TSRs) such as enhancers at a precise moment from total RNA. As RNA is nonpathogenic and can be readily isolated from inactivated infectious samples, csRNA-seq can detect acute changes in gene regulation within or in response to a pathogen with remarkable sensitivity under common laboratory conditions. Studying valley fever (coccidioidomycosis), an emerging endemic fungal infection that increasingly impacts livestock, pet, and human health, we show how csRNA-seq can unravel transcriptional programs driving pathogenesis. Performing csRNA-seq on RNA isolated from different stages of the valley fever pathogen Coccidioides immitis revealed alternative promoter usage, connected cis-regulatory domains, and a WOPR family transcription factor, which are known regulators of virulence in other fungi, as being critical for pathogenic growth. We further demonstrate that a C. immitis WOPR homologue, CIMG_02671, activates transcription in a WOPR motif-dependent manner. Collectively, these findings provide novel insights into valley fever pathogenesis and provide a proof of principle for csRNA-seq as a powerful means to determine the genes, regulatory mechanisms, and transcription factors that control the pathogenesis of highly infectious agents. IMPORTANCE Infectious pathogens like airborne viruses or fungal spores are difficult to study; they require high-containment facilities, special equipment, and expertise. As such, establishing approaches such as genome editing or other means to identify the factors and mechanisms underlying caused diseases, and, thus, promising drug targets, is costly and time-intensive. These obstacles particularly hinder the analysis of new, emerging, or rare infectious diseases. We recently developed a method termed capped small RNA sequencing (csRNA-seq) that enables capturing acute changes in active gene expression from total RNA. Prior to csRNA-seq, such an analysis was possible only by using living cells or nuclei, in which pathogens are highly infectious. The process of RNA purification, however, inactivates pathogens and thus enables the analysis of gene expression during disease progression under standard laboratory conditions. As a proof of principle, here, we use csRNA-seq to unravel the gene regulatory programs and factors likely critical for the pathogenesis of valley fever, an emerging endemic fungal infection that increasingly impacts livestock, pet, and human health.

pathogenesis as well as the challenges of analyzing highly infectious fungal spores that are rich in secondary metabolites and protected by thick cell walls (23), we focused this study on C. immitis. Performing csRNA-seq on RNA isolated from different stages of the valley fever pathogen revealed alternative promoter usage, connected cis-regulatory domains, and WOPR family TFs as likely critical for infection. WOPR TFs are known regulators of virulence in other fungi (24)(25)(26)(27), and we further demonstrate that a C. immitis WOPR homologue, CIMG_02671, activates transcription in a WOPR motif-dependent manner. These findings provide a foundation to explore avenues for the diagnosis and treatment of valley fever and a proof of principle for csRNA-seq as a powerful means to identify genes, regulatory mechanisms, and transcription factors that control pathogenesis.

RESULTS
Functional transcriptome annotation of the BSL3 pathogen C. immitis. With the aim to further our understanding of the genes and transcription factors driving C. immitis phase transition and, thus, pathogenesis, we grew C. immitis RS mycelia and young, 48-h-old, and mature, 8-day-old, spherules under biosafety level 3 (BSL3) conditions. Young spherules (48 h old) differ from mature ones (8 days old) in that the former do not contain endospores ( Fig. 1A) (28,29). Samples were inactivated using QIAzol (Qiagen) and ZR BashingBead lysis at 50 Hz for 25 min, and total RNA was extracted under common laboratory conditions. This RNA was then used to capture the steady-state transcriptome by ribosome-depleted paired-end RNA-seq as well as ongoing transcription by csRNA-seq (15) (Fig. 1B). All assays were performed on duplicate samples that were prepared on separate days for all three stages, and the generated data were highly reproducible and correlated across methods (see Fig. S1 in the supplemental material). Combined, they reveal the functional transcriptome of C. immitis at these three stages at unprecedented resolution: RNA-seq accurately quantifies RNAs present in the sample, while csRNA-seq captures actively initiating RNAs and, thus, the transcription start sites (TSSs) of both stable and unstable transcripts (Fig. 1C).
As accurate transcriptome annotations have become an integral part of research (30,31), we first exploited our data to annotate genes and regulatory elements in C. immitis (Fig. 1C). Using StringTie to assemble transcripts directly from our RNA-seq data (32), we identified 7,163 genes (10,069 isoforms in total) in mycelia and 9,669 genes (14,639 isoforms) in spherules (Fig. S2A). Unlike in mycelia, where only 393 novel putative genes were annotated, 2,240 (23.2%) putative genes that are not represented in the official reference annotation GTF file (Ensembl) were identified in spherules, underscoring the need for studying the pathogenic, in addition to the vegetative mycelial, stage. Using csRNAseq, we identified in mycelia 11,728 promoter, putative enhancer, and other transcribed regulatory regions that we jointly refer to as transcription start regions (TSRs) here. Most TSRs found in mycelia were in promoters (6500 bp and either sense or antisense relative to the annotated 59 ends of genes [TSSs]) (Fig. S2B). In 48-h-and 8-day-old spherules, 21,284 and 19,771 TSRs were identified, respectively. Most TSRs started transcription bidirectionally (91.04%) from the same regulatory regions and initiated from several dispersed individual TSSs rather than a single dominant TSS ( Fig. 1C; Fig. S2C and D). Interestingly, TSRs in spherules, but not mycelia, were frequently promoter distal and initiated unstable RNAs (;49% of TSRs are .500 bp from reference annotated TSSs) ( Fig. 1D; Fig. S3A). On average, TSRs at annotated promoters had higher levels of associated RNA reads than promoter-distal loci (Fig. 1E). Similarly, intragenic TSRs had higher levels of associated RNA than intergenic TSRs (Fig. 1E).
To place these findings into a broader concept, we next profiled other fungi, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and champignon mushroom (Agaricus bisporus), and integrated published data from the mold Neurospora crassa and humans (15). Integrated, these data reveal that the patterns of transcription initiation in Coccidioides mycelia share many properties with those of other fungi, including bidirectionally transcribed regulatory elements, multiple initiation sites per promoter, and limited promoter-distal initiation ( Fig. 1E and F; Fig. S3). The transition to Coccidioides spherules, on the other hand, is associated with a massive increase in the numbers of TSRs and unstable RNAs and a higher percentage of TSRs located at promoter-distal locations.
Phase transition in C. immitis is accompanied by large changes in transcription programs. To gain insights into the genes differentially expressed during phase transition, we next compared RNA-seq data from mycelia to those from spherules, which identified 1,930 downregulated and 1,750 upregulated genes in mature spherules (  , ,5%) ( Fig. 2A). The majority of differentially regulated genes in spherules were identified at both 48 h and 8 days ( Fig. 2A). Similarly, the transition from mycelia to spherules was associated  with massive changes in transcription initiation, as determined by csRNA-seq. In 48-hor 8-day-old spherules, 11,768 or 12,752 TSRs were differentially upregulated and 5,536 or 5,573 TSRs were downregulated compared with mycelia, respectively (Log 2 FC, .1; FDR, ,0.05) (Fig. 2B), and the majority (.71%) of differentially regulated TSRs were shared among both spherule stages. Together, these findings suggest a sharp transition in transcriptional regulation from mycelia to spherules, while changes are less profound as spherules develop endospores.
Spherule-specific promoter-distal TSR activity is concentrated in specific genomic regions. To obtain more insights into the phase-specific gene regulation that drives pathogenesis, we first inspected the ;28% of mycelial TSRs that are distal of currently annotated promoters (annotation ASM14933v2). Many of these TSRs are associated with transcripts detected by RNA-seq, suggesting that they may mark novel gene loci. In contrast, 49% of spherule TSRs are promoter distal. While some of these distal TSRs also associate with RNA-seq and are thus likely novel gene loci (Fig. 2C, "*"), 34%, or 3,316, of the distal intergenic TSRs have low levels of, or no, associated stable RNA (Fig. 2C, "#"). These promoter-distal elements have some features that resemble those of unstable enhancer RNAs (eRNAs) that are produced at metazoan enhancers (33). Furthermore, these intergenic TSRs are three times more likely to be located in the vicinity of genes upregulated in spherules (based on the distance to the closest promoter) than in the vicinity of those upregulated in mycelia (640 versus 220, .2-fold change; FDR , 0.05). To determine the locations where TSR activity changed between the mycelial and spherule stages, we compared the genomic locations of all annotated TSSs with those of differentially regulated TSRs. In contrast to mycelium-specific TSRs that were most commonly found at or near annotated TSSs, spherule-specific TSRs were frequently found in regions that lack gene annotations (Fig. 2D). In addition to being "gene-poor" regions, these locations were also enriched for repetitive elements, primarily long terminal repeat (LTR) retrotransposons (Copia and Gypsy types) ( Fig. 2D) (34). The morphological transition to spherules is thus associated with a significant increase in total TSRs, the majority of which are promoter distal and give rise to rapidly degraded RNAs. Intriguingly, these "enhancer-like" TSRs appear to be coregulated and form cis-regulatory domains with spherule-specific genes, similar to a phenomenon thus far found largely in mammals (35,36).
Phase-specific TSR switching as a potential means to alter gene expression. csRNA-seq defines the start sites of transcripts at single-nucleotide resolution and can thus reveal genes that are expressed in both mycelia and spherules but initiate from different TSRs, leading to distinct 59 mRNA isoforms and occasionally alternative exon usage (Fig. S4A). As such, we identified 99 transcribed annotated genes where the primary TSR in mycelia and 8-day-old spherules was separated by more than 100 bp (;1% of annotated transcripts) (Fig. S4B). Alternative TSRs were usually specific to phase transition: only seven 59 TSRs were different among the 48-h-and 8-day-old spherules ( Fig. S4C). Shifting the TSR upstream from mycelia to spherules but not vice versa led to a significant loss of transcription, suggesting a role for TSR switching in gene regulation ( Fig. S4D and E), as previously reported for Histoplasma capsulatum (37). These findings propose alternative promoter usage as a notable regulatory mechanism during spherule formation.
cis-regulatory sequences and transcription initiation in C. immitis. While our refined analysis of the C. immitis transcriptome is fundamental to research, disease diagnostics and, eventually, new treatments require molecular targets and the identification of the pathways critical for pathogenesis. We therefore next investigated enriched sequence patterns proximal to csRNA-seq-defined TSSs. As a quality control, we first assessed the nucleotide frequencies in proximity to transcribed TSSs that, consistent with most other eukaryotic species, revealed a strong preference for the initiator motif "YCA (11)" (Fig. 3A) (38,39). The lack of a clear TATA box signature suggested that C. immitis, in agreement with its phylogeny (40), utilizes the mode of "scanning initiation" that is specific to some yeasts and has been elegantly characterized (41, 42) (Fig. S5A).
To further identify putative regulatory sequences in an unbiased manner, we next applied de novo motif discovery (HOMER [43]) to TSRs from positions 2150 to 150 relative to the primary TSS. Using sequence-content-matched regions of random genomic sequence as a control, our analysis revealed six motifs that were most prevalent in TSRs active in both mycelia and spherules, including the E-box motif (basic helix-loop-helix [bHLH]), CRE/TRE (bZIP), and the yeast family-specific RIM101 and STP3 TFs (Fig. S5B).
Together, these motifs are likely critical for maintaining gene expression programs in both morphological states of C. immitis. cis-regulatory sequences mediating phase-specific transcriptional regulation in C. immitis. To probe gene regulatory differences in mycelia and the pathogenetic spherule stages, we next probed the stage-specific TSRs for enriched motifs. Consistent with previous findings that report the TATA box as being linked to regulated rather than constitutively active genes (44,45), we found that the TATA box was enriched in stage-specific TSRs and depleted at ubiquitous TSRs in both mycelia and spherules (Fig. S5C). Furthermore, we identified two DNA motifs that were highly enriched at either mycelium-or spherule-specific TSRs. The single mycelium-specific motif was a heat shock response element-like (HSE) motif ( Fig. 3B) (46). The single motif that was highly enriched in spherule-specific but not mycelial TSRs was the WOPR motif ( Fig. 3C). Consistently, the HSE and WOPR motifs were largely mutually exclusive (Fig. S5D). Intriguingly, the WOPR motif was enriched from positions 2100 to 250 relative to the TSS, resembling the distribution of an activator, and WOPR transcription factors are known fungus-specific master regulators of morphological changes and virulence (24,25,47).
A C. immitis Ryp1 orthologue can drive gene expression using the WOPR motif. To further explore the mechanisms of spherule stage-specific gene expression, we searched the C. immitis genome for putative WOPR TFs. Protein BLAST analysis of white-opaque regulator (Wor1) from Candida albicans (CaWor1) identified C. immitis CIMG_02671 as the likely homologue (NCBI BLASTp E value of 7e233). Given the limitations associated with C. immitis as a BSL3 pathogen that hinder experimental approaches such as genome editing, we next modeled the CIMG_02671 structure and superimposed it onto available crystal structures from 2 WOPR family members, Wor1 from C. albicans and YHR177w from S. cerevisiae (ScYHR177w) (24,48). This approach demonstrated similar structures and positionings of key amino acids, including those critical for WOPR DNA motif binding ( Fig. 4A and B; Fig. S6A and B). Furthermore, the amino acids reported to be required for Wor1-dependent white-to-opaque morphology switching in C. albicans or mutations in YHR177w that disrupt its binding to DNA containing the WOPR motif were highly conserved across WOPR TFs and CIMG_02671 (Fig. 4C). We therefore next cloned CIMG_02671 and tested its ability to activate transcription in a WOPR motif-dependent manner using a reporter system in S. cerevisiae. An intact or mutated WOPR motif was cloned upstream of the Cyc1 promoter (P cyc1 ) that lacked an upstream activating site (UAS-less), driving a b-galactosidase (LacZ) reporter (Fig. 4D). We then cotransformed this P cyc1 LacZ reporter vector with plasmids   HcRyp1 ScMiT1 CIMG02671 YHR177W mutants -disrupt DNA binding strong (kd >100nM) + Wor1 mutants disrupt white-to-opaque switching frequency YHR177W mutants -disrupt DNA binding moderate (kd 10-100 nM) Wor1 mutants disrupt white-to-opaque switching frequency YHR177W mutants -disrupt DNA binding moderate (kd 10-100 nM) + Wor1 mutants disrupt white-to-opaque switching frequency *** *** *** *** *** *** *  expressing the positive control H. capsulatum Ryp1 (HcRyp1) or CIMG_02671 into S. cerevisiae and measured reporter expression. As reported previously and as shown in Fig. 4E, Ryp1 potently induced b-galactosidase expression when the WOPR motif was intact but not when it was mutated (25). Similarly, CIMG_02671 induced b-galactosidase expression in a WOPR motif-dependent manner (Fig. 4E). Collectively, these results demonstrate that CIMG_02671 resembles other WOPR TFs at conserved and functionally important amino acid residues and can activate transcription in a WOPR motif-dependent manner.

DISCUSSION
Here, we exploit the advance of csRNA-seq to capture active (nascent) TSSs from .500 ng of total RNA instead of intact nuclei or cells as required by other nascent methods (16). This key advantage makes csRNA-seq readily applicable to pathogenic or biohazardous samples, including fungi and others where morphological constraints like cell walls can hinder nucleus isolation. The use of total RNA as the input also greatly facilitates the analysis of pathogens that replicate or reside in the cytoplasm (49). As such, the approach used here to deepen our understanding of the fundamental transcription regulatory mechanisms of C. immitis, the causative agent of valley fever, should also be directly translatable to other infectious diseases.
Using inactivated BSL3 samples from the mycelial and spherule phases of C. immitis and performing csRNA-seq on the isolated RNA identified the WOPR motif as being highly enriched in the promoters and putative enhancer elements of spherule-specific genes. Using homology, we identified CIMG_02671 and showed that it can activate transcription in a WOPR motif-dependent manner. These findings comprise the first comprehensive description and a model for the gene regulatory programs underlying phase transition in the valley fever pathogen (Fig. 4F).
csRNA-seq identified .11,000 TSRs in C. immitis mycelia that were primarily promoter proximal. In contrast, spherule transition nearly doubled the number of TSRs and substantially increased the fraction of promoter-distal TSRs, many of which were associated with rapidly degraded transcripts. These findings were consistent among independent biological replicates. In some cases, spherule-specific transcription activation was found spanning a genomic region that included both putative novel genes and enhancer-like TSRs rather than individual genes. This finding implies a role of alternative regulatory mechanisms beyond cis-regulatory promoters in spherule maturation. Eukaryotic chromosomes exhibit domains of correlated gene expression where gene positioning influences the activation or silencing of transcription (35,36). Ascomycota fungi, of which Coccidioides spp. are members, are known to cluster into functionally related gene families (50). In our data, the spherule-specific activation of promoter-distal TSRs was enriched near spherule-specific genes. Thus, the transcriptional activation of spherule-specific genomic domains, rather than individual gene promoters, may be a key mechanism by which genes involved in spherule maturation are coordinately regulated.
Spherule-specific TSRs were further highly enriched for the WOPR motif. WOPR TFs regulate morphological changes and pathogenesis in diverse fungal species, including the human pathogens C. albicans (24) and H. capsulatum (26). We further found CIMG_02671, which contains a conserved WOPR binding domain, to activate transcription in a WOPR motif-dependent manner. Intriguingly, unlike both RYP1 and WOR1 WOPR TFs that are preferentially expressed with their activity in yeast or opaque cells, respectively (26, 27), CIMG_02671 is not transcriptionally upregulated in 48-h-or 8day-old spherules compared to mycelia. CIMG_02671 activity is thus likely differently regulated compared to WOR1 and RYP1. One potential mechanism could include TSR switching. We found that significantly more TSRs in spherules shifted 59, which resulted in transcription and, potentially, translation activation (37,51). These more active, alternative TSRs commonly contained a WOPR motif and invite speculation that alternative TSR selection and, thus, mRNAs could be a means to regulate gene expression between fungal differentiation states. In support of the importance of CIMG_02671 in valley fever pathogenesis, a recent preprint found that knocking out Coccidioides posadasii CPSG_00528, which has 100% identity to CIMG_02671, blocked spherule maturation under spherule-promoting conditions and was avirulent in the mouse model of coccidioidomycosis (52). Collectively, these findings propose CIMG_02671 as a WOPR TF that is important for spherule morphogenesis and C. immitis pathogenesis.
In conclusion, our findings provide insights into the fundamental transcriptional programs underlying the phase transition of C. immitis and highlight regulatory domains, TSR switching, and WOPR transcription factors as being central to valley fever pathogenesis. We hope that these findings and the data generated will provide a resource and stepping-stone to combat the increasing disease incidence and expanding geographic range of valley fever. On a broader scope, this study also provides a proof of principle for the utility of csRNA-seq as a novel approach to reveal key candidate transcription factors and regulatory programs underlying the pathogenesis of infectious or biohazardous agents under standard laboratory conditions.

MATERIALS AND METHODS
Culture conditions. The C. immitis RS strain was grown as mycelia or spherules as previously described (28). To grow mycelia, 2 Â 10 6 arthroconidia/mL were incubated in 250-mL flat-bottom Erlenmeyer flasks (Corning) in 50 mL glucose-yeast extract (GYE) medium. Flasks were cultured in a 30°C incubator without shaking for 5 days. To grow spherules, arthroconidia were washed 2 times in modified Converse medium (53). The spores were inoculated at 4 Â 10 6 arthroconidia/mL into a 250-mL baffled Erlenmeyer flask containing 50 mL of modified Converse medium. Flasks were set up and grown on a shaker at 160 rpm in 14% CO 2 at 42°C. Four flasks were harvested 2 days after inoculation, and the remaining four flasks were harvested after 8 days. Fresh Converse medium was not added. The spherules did not rupture and release endospores within that time in this culture system. Saccharomyces cerevisiae (strain RYH2863) was grown as described previously (54). Schizosaccharomyces pombe (strain TH972) was generously provided by Tony Hunter (Salk Institute for Biological Sciences) and grown in yeast extract with supplements (YES) (55). White and brown ecotypes of Agaricus bisporus, better known as "champignon mushroom," common mushroom, or "crimini mushroom," were kindly provided by Monterey Mushroom farms.
RNA extraction and purification. C. immitis mycelium and spherule samples were stored in QIAzol (Qiagen) at 270°C and processed as previously described (9). Samples were added to a 2 mL ZR BashingBead lysis tube with 0.5 mm beads (Zymo Research), and tubes were arranged in a precooled TissueLyser II adapter (Qiagen) and disrupted by shaking at 50 Hz for 25 min. QIAzol samples were spun at 21,000 Â g for 5 min at 4°C, and the supernatant was transferred to a fresh tube. Total RNA was purified from mycelium and spherule samples (2 replicates/condition) using chloroform extraction and isopropanol precipitation and quantified using a Qubit 3.0 fluorometer (Invitrogen). RNA from white and brown Agaricus bisporus mushrooms was isolated as described previously (22), using TRIzol LS extraction following tissue homogenization. Libraries were generated for each ecotype separately and then pooled for the analysis. RNA from S. cerevisiae and S. pombe was isolated by resuspending a pelleted culture in TRIzol LS. Next, silica beads were added, and cells were lysed with a multivortexor, with 1 min on and 1 min off, on ice for a total lysis time of 6 min. Samples were spun at 21,000 Â g for 5 min at 4°C, and the supernatant was transferred to a fresh tube, followed by TRIzol LS RNA purification as described by the manufacturer.
RNA and csRNA sequencing. For RNA-seq, strand-specific, paired-end libraries were prepared from total RNA by ribosomal depletion using the yeast Ribo-Zero rRNA removal kit (Illumina) and then using the TruSeq stranded total RNA-seq kit (Illumina) according to the manufacturer's instructions. Next, 100 bases were sequenced from both ends using a NovaSeq 6000 system according to the manufacturer's instructions (Illumina).
csRNA-seq was performed as described previously (15). Small RNAs (sRNAs) of ;20 to 60 nucleotides (nt) were size selected from 0.4 to 2 mg of total RNA by denaturing gel electrophoresis. A 10% input sample was taken aside, and the remainder was enriched for 59-capped RNAs. Monophosphorylated RNAs were selectively degraded by Terminator 59-phosphate-dependent exonuclease (Lucigen), and RNAs were 59 dephosphorylated by quickCIP (New England BioLabs [NEB]). Input (sRNA) and csRNA-seq libraries were prepared as described previously (22) using RppH (NEB) and the NEBNext small RNA library prep kit, amplified for 14 cycles, and sequenced for single end 75 base pair reads (SE75) on the Illumina NextSeq 500 system.
WOPR transcription factor structure-based sequence alignment and modeling. Structure-based sequence alignment was performed with CIMG_02671 and known WOPR family transcriptional regulators such as Candida albicans white-opaque regulator 1 (Wor1) (CaWor1), ScYHR177w, ScMiT1, and HcRyp1 using ESPript (56). To understand the DNA binding properties of CIMG_02671, we performed automated protein structure homology modeling using SWISS-MODEL (57). To perform protein homology modeling, we provided a truncated input sequence (positions 1 to 270) of CIMG_02671 (UniProt accession number J3KLV5_COCIM J3KLV5 Camp-independent regulatory protein; https://www.uniprot .org/uniprot/J3KLV5) and used the crystal structure of the WOPR family member YHR177w of S. cerevisiae with the 19-bp double-stranded DNA (dsDNA) (PDB accession number 4M8B) as the template. This was one of the top-ranked template searches for the resulting model of CIMG_02671, with a top global model quality estimation (GMQE) value of 0.4. Model building was carried out using the ScYHR177w crystal structure (PDB accession number 4M8B) as the template, and a three-dimensional (3D) model was automatically generated by the target-template alignment. The quality of the generated model was evaluated by global and local quality estimates, by Ramachandran plots, and by the qualitative model energy analysis (QMEAN) value for the different geometric properties for a single model. The final CIMG_02671 model contains amino acids 11 to 223. To analyze its potential DNA binding properties, the CIMG homology model was aligned with the crystal structure of the complex of CaWor1-13-bp DNA using PyMOL (http://www.pymol.org).
(ii) csRNA-seq. Sequencing reads were trimmed for 39 adapter sequences using HOMER ("homerTools trim -3 AGATCGGAAGAGCACACGTCT -mis 2 -minMatchLength 4 -min 20") and aligned to the appropriate genome using STAR with default parameters (58). Only reads with a single, unique alignment (MAPQ value of $10) were considered in the downstream analysis. Furthermore, reads with spliced or soft-clipped alignments were discarded. Transcription start regions (TSRs), representing loci with significant transcription initiation activity (i.e., "peaks" in csRNA-seq), were defined using HOMER's findcsRNATSS.pl tool, which uses short input RNA-seq, traditional total RNA-seq, and annotated gene locations to eliminate loci with csRNAseq signals arising from noninitiating, high-abundance RNAs that nonetheless are captured and sequenced by the method (see reference 15 for more details). Replicate experiments were first pooled to form metaexperiments for each condition prior to identifying TSRs. Annotation information, including gene assignments and promoter-distal, stable transcript, and bidirectional annotations, are provided by findcsRNATSS.pl. Stable TSSs were defined as TSS clusters containing at least 2 per 10 7 RNA-seq reads within positions 2100 to 1500 relative to the TSS (15). To identify differentially regulated TSRs, TSRs identified under each condition were first pooled (union) to identify a combined set of TSRs represented in the data set using HOMER's mergePeaks tool, using the option "-strand." The resulting combined TSRs were then quantified across all individual replicate samples by counting the 59 ends of reads aligned at each TSR on the correct strand. The raw read count table was then analyzed using DESeq2 to identify differentially regulated TSRs (60). Normalized genome browser visualization tracks were generated using HOMER and visualized using IGV (61).
(iii) Sequence/motif analysis. Known motif enrichment and de novo motif discovery were performed using HOMER's (43) findMotifsGenome.pl tool with default parameters using TSR sequences from positions 2150 to 150 relative to the primary TSS. When performing de novo motif discovery, TSR sequences were compared to a background set of 50,000 random genomic regions matched for overall GC content. Nucleotide frequency and motif density plots were created using HOMER's annotatePeaks.pl tool (43).
(v) 59 switch analysis. To identify genes with major changes in isoform usage due to changes in TSR promoter usage, we first merged the de novo-assembled transcripts found using StringTie2 with cuffmerge into a single transcript set. We then identified the TSR from spherules (8 days old) or mycelia with the most reads that overlapped on the correct strand in each gene locus (allowing the TSR to be up to 200 bp upstream of the 59 end). The difference in the positions of the top spherule and mycelial TSRs was recorded for each gene, and the mature transcript level expressed in each stage was estimated using the RNA-seq data by counting the fragments per kilobase per million (FPKM) levels from the upstream TSR to the downstream TSR (uFPKM) and then from the downstream TSR to the end of the gene (dFPKM). These values were then filtered by the distance between TSRs (100 bp to 1,000 bp), by the level of expression (FPKM of .5 in mycelia or spherules), and to ensure basal expression in both groups (FPKM of .2 in both mycelia and spherules). To identify alternative TSRs leading to differential starting positions in RNA-seq, the log 2 ratios of uFPKM and dFPKM were calculated for both mycelia and spherules with the addition of a small pseudocount {log 2 [(uFPKM 1 3)/(dFPKM 1 3)]}. Genes with log 2 fold changes of greater than 1 or less than 21 between spherule and mycelial upstream-to-downstream ratios were identified (n = 159 comparing mycelia with 8-day-old spherules). Each of these loci was visually evaluated using the IGV browser to confirm 2 distinct TSRs and different RNA-seq starting positions associated with the alternative TSRs (n = 99 visually confirmed).
Data availability. All raw and processed csRNA-seq data generated for this study can be accessed at the NCBI Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE179468. Previously published Neurospora crassa RNA-seq and csRNA-seq data can be accessed under GEO accession number GSE135498. Previously published Coccidioides immitis RNA-seq data can be accessed under GEO accession number GSE171286.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.