Comparative mitogenome analysis reveals mitochondrial genome characteristics in eight strains of Beauveria

Despite the significant progress that has been made in the genome sequencing of Beauveria species, mitochondrial genome (mitogenome) used to examine genetic diversity within fungal populations. Complete mitogenomes of Beauveria species can be easily sequenced and assembled using various sequencing techniques. However, since mitogenome annotations are mainly derived from similar species comparison and software prediction, and are not supported by RNA-seq transcripts data, it leads to problems with the accuracy of mitochondrial annotations and the inability to understand RNA processing. In this study, we assembled and annotated the mitogenome of eight Beauveria strains using Illumina DNA and RNA sequencing data. The circular mitogenome of eight Beauveria strains ranged from 26,850 bp (B. caledonica strain ATCC 64970) to 35,999 bp (B. brongniartii strain GYU-BMZ03), with the intronic insertions accounting for most of the size variation, thus contributing to a total mitochondrial genome (mitogenome) size of 7.01% and 28.95%, respectively. Intron number variations were not directly related to the evolutionary relationship distance. Besides ribosomal protein S3 (rps3), most introns are lost too quickly and lack the stability of protein-coding genes. The short RNA-seq reads from next-generation sequencing can improve the mitochondrial annotation accuracy and help study polycistronic transcripts and RNA processing. The transcription initiation sites may be located in the control region. Most introns do not serve as taxonomic markers and also lack open reading frames (ORFs). We assumed that the poly A tail was added to the polycistronic transcript before splicing and one polycistronic transcript (trnM(1)-trnL(1)-trnA-trnF-trnK-trnL(2)-trnQ-trnH-trnM(2)-nad2-nad3-atp9-cox2-trnR(1)-nad4L-nad5-cob-trnC-cox1-trnR(2)-nad1-nad4-atp8-atp6-rns-trnY-trnD-trnS-trnN-cox3-trnG-nad6-trnV-trnI-trnS-trnW-trnP-rnl(rps3)-trnT-trnE-trnM(3)) was first processed from the mitogenome and was subsequently processed into smaller mono-, di-, or tricistronic RNAs.

The mitogenome of B. brongniartii isolate IMBST95031 has an additional intron in rnl, differentiated from B. bassiana (Ghikas, Kouvelis & Typas, 2010). The mitogenomes of B. pseudobassiana strain C1010 (Oh, Kong & Sung, 2015), B. malawiensis isolate k89 (Glare et al., 2020), B. caledonica isolate fhr1 (Glare et al., 2020), and B. lii strain RCEF5500 (Zhang et al., 2021), especially B. bassiana (most abundant species of the genus Beauveria) provided insights into the mitogenome of the Beauveria genus (Ghikas, Kouvelis & Typas, 2010;Glare et al., 2020;Lee et al., 2017;Wang et al., 2020). Complete mitogenomes have high variations, and their lengths vary greatly (Uribe & Khachatourians, 2004). The minimum complete mitogenome length was 28,816 bp (B. bassiana isolate k4), whereas the maximum length was nearly double the minimum length, i.e., 59,014 bp (B. lii strain RCEF5500). Complete mitogenomes of Beauveria species are haploid, maternally inherited, and have fewer introns, which can be easily sequenced and assembled using various sequencing techniques (Sanger sequencing, next-generation sequencing (NGS), and PacBio sequencing) (Wang et al., 2020;Xu et al., 2009;Zhang et al., 2021). However, since mitogenome annotations are mainly derived from similar species comparison and software prediction, and are not supported by RNA-seq transcripts data, it leads to problems with the accuracy of mitochondrial annotations and the inability to understand RNA processing. A complete and detailed transcription map of the Dictyostelium discoideum mitochondrial DNA revealed eight major polycistronic transcripts encoding polypeptides, ribosomal RNAs, interspersed transfer RNAs, and polycistronic transcripts processed into smaller transcripts in northern hybridization studies (Barth et al., 2001). PacBio sequencing has identified polycistronic transcripts, which are conserved and prevalent in several mushroom-forming fungal species, including Plicaturopsis crispa and Phanerochaete chrysosporium (Gordon et al., 2015). The PacBio full-length transcriptome profiling of insect mitochondrial gene expression investigates mitochondrial gene transcription, RNA processing, mRNA maturation, and several other related topics (Gao et al., 2016). However, a large number of transcriptomes sequenced by next-generation sequencing (NGS) are in the NCBI Sequence Read Archive (SRA) database, and their numbers are still increasing. The Beauveria strains transcriptome data shared by the NCBI SRA database has not adopted the PacBio or Oxford Nanopore techniques.
We sequenced both the genome and transcriptome of seven Beauveria strains using the Illumina sequencing platform in this study, and downloaded the SRA data (genome and transcriptome) of one Beauveria strain from the NCBI SRA database, which was sequenced using the Illumina sequencing platform in previous study (Valero-Jiménez et al., 2016). Although the NGS short reads resulted in incompletely assembled transcripts lacking some vital information (such as 5 or 3 end information) (Bai et al., 2021;Gao et al., 2016;Gordon et al., 2015), we built an analysis flow using NGS short reads to analyze mitogenome characteristics, polycistronic transcripts and RNA processing.
Plant Maxi Kit (QIAGEN Inc., Valencia, CA, USA) following the kit handbook. The purity and concentration of the obtained gDNA were tested using a NanoPhotometer R spectrophotometer (Implen, CA, USA) and a Qubit R 2.0 fluorometer (Life Technologies, CA, USA), respectively. Sequencing libraries for the quality-checked gDNA were generated using a TrueLib DNA Library Rapid Prep Kit for Illumina sequencing (Illumina, Inc., CA, USA). The libraries were subjected to size distribution analysis using an Agilent 2100 bioanalyzer (Agilent Technologies, Inc., CA, USA), followed by real-time PCR quantitative test. The successfully generated libraries were sequenced using an Illumina NovaSeq 6000 platform (Illumina, Inc., San Diego, CA, USA), and 150-bp paired-end reads with an insert of approximately 350 bp were generated.

RNA extraction, library preparation, and Illumina sequencing
Total RNAs from seven Beauveria were extracted using Trizol reagent (Life Technologies, Carlsbad, CA, USA) following the reagent handbook. The obtained RNA samples were subjected to RNase-free DNase I treatment to remove residual DNA. The purity, concentration, and integrity of the RNA were tested using a NanoDrop 2000 spectrophotometer (Implen, CA, USA), Aglient 2100 bioanalyzer (Agilent Technologies, Inc., Santa Clara, CA, USA), and Qubit 2.0 fluorometer (Life Technologies, Carlsbad, CA, USA), respectively. The mRNAs were purified from the total RNAs using poly-T oligoattached magnetic beads and fragmented. Subsequently, cDNA libraries were generated for the quality-checked samples using a Tru Seq RNA Sample Prep Kit following the kit handbook. The successfully generated cDNA libraries were sequenced using using an Illumina NovaSeq 6000 platform (Illumina, Inc., CA, USA).
The QC standards for RNA reads were as follows: (1) Removing reads with >50% bases with Phred quality < Q10, (2) Removing reads with an average quality score of < 20, (3) trimming reads longer than 150 bases at their tail region to modify them to ≤ 150 bases, (4) trimming the poly-G tails of the reads, (5) filtering low-complexity reads (complexity is defined as the percentage of the base that differs from its next base); the threshold for the low-complexity filter is 10, which means that 10% complexity is required, and (6) Removing reads with < 80 bases.
The main software parameters were set as follows: (1) Mitogenome sequence of B. bassiana (GenBank accession number: EU371503.2/NC_010652.2) (Xu et al., 2009) was selected as the seed sequence and was indirectly extended using NOVOPlasty.
(2) Because the length of the mitogenome sequence of Beauveria varies greatly, the expected mitogenome size was set to range from 20,000 to 40,000 bp.
(3) The value of K-mer (33) was the default parameter.

Phylogenetic analysis
Phylogenetic relationships of the genus Beauveria were deduced using MEGA 11 software (Tamura, Stecher & Kumar, 2021) with 15 amino acid sequences of 16 species using the maximum likelihood (ML) method. Fourteen mitogenomes were obtained from Beauveria strains. The mitogenomes of Metarhizium anisopliae strain ME1 and Trichoderma reesei were selected as the outgroups. The amino acid sequences of 15 PCGs of each mitogenome were connected end to end in the order of rps3-nad2-nad3-atp9-cox2-nad4L-nad5-cob-cox1-nad1-nad4-atp8-atp6 -cox3-nad6 to form a sequence. For ML, models with the lowest Bayesian information criterion (BIC) scores were considered to best describe the substitution pattern. Based on BIC = 50395.840, the general reversible chloroplast (cpREV) (Adachi et al., 2000) + gamma distribution (G) + amino acid frequencies (F) model was chosen as the optimal evolutionary model with 1000 bootstrap replications for the phylogenetic analysis. Initial trees for the heuristic search were obtained automatically by applying the neighbor-join and BioNJ algorithms to a matrix of estimated pairwise distances and then selecting the topology with a superior log-likelihood value. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test of 1000 replicates is shown above the branches (Felsenstein, 1985). There were a total of 5572 positions in the final dataset.

RNA reverse transcription, PCR amplification and sanger sequencing
RNA was reversed to cDNA using the Kit PrimeScript TM RT Reagent Kit with gDNA Eraser (Perfect Real Time) (No. RR047A; TaKaRa). According to the target region, primers were designed using Primer3 version 2.6.1 (Koressaar & Remm, 2007), and PCR amplification was performed using GeneExplorer TC-XP-D (Hangzhou Bioer Technology Co., Inc., Hangzhou, CN). 2 µl of PCR products were used for electrophoresis on 1% agarose gel. PCR products were sequenced using an ABI 3730 automatic sequencer (Applied Biosystems, Foster City, CA, USA).
Although the proportion of short reads of the mitogenomes in the total reads of DNA ranged from 1.07% (B. caledonica strain ATCC 64970) to 5.47% (B. bassiana strain ARSEF 8028), those from the RNA accounted for <0.18% of the total short reads, especially the mitochondrial RNA read proportion of B. caledonica strain ATCC 64970 being almost zero (Fig. 3). This may have something to do with being in a closed petri dish.

Mitogenome annotation
The conserved genes of the Beauveria species mitogenome were located on the positive strand and in the same order and orientation as B. bassiana (NC_010652.2) (Xu et al., 2009), which encodes fifteen protein-coding genes (PCGs) that include the ribosomal protein S3 (rps3), seven NADH dehydrogenase subunits (nad1-6 and nad4L genes), three cytochrome oxidase subunits (cox1-3), apocytochrome b (cob), three subunits of ATP synthase (atp6, 8, 9), small and large subunit ribosomal RNA genes (rnl and rns), 25 tRNAs that include three tRNAs for methionine, two for arginine, two for leucine, two for serine, and only one tRNA for other amino acids, and one control region located between trnM (3) and trnM (1) . Transcriptome annotation revealed that rnl was longer than the reference mitogenome (NC_010652.2), extending 30 bp forward and 10 bp backward. These regions have multiple AT repeats, possibly circularizing the rnl gene. We also observed some reads of the B. amorpha strain GYU-BMZ01 and B. pseudobassiana strain ATCC 90518 from RNA-seq with the half bases aligned at the head of rnl and half bases aligned at its tail, thus causing errors in the de novo assembly of the transcriptome. We found a read in the B. amorpha strain GYU-BMZ02 showing that the rnl gene has a polyA tail at the 3 end.
To investigate the extent of genome divergence, we performed multiple alignments of the eight Beauveria mitogenome sequences (Fig. 4). The results revealed high sequence similarity across all the eight Beauveria mitogenomes for coding DNA sequence (CDS), with some structural variations existing in the mitogenomes. We observed mitogenome length differences mostly in the introns (Fig. 4 and Table 1). The number of introns ranged from one (B. pseudobassiana strain ATCC 90518) to eight (B. brongniartii strain GYU-BMZ03), and were located in rnl, cob, cox1, cox2, and nad1 ( Fig. 4 and Table 1). The total length of the introns varied from 1,882 bp (B. pseudobassiana strain ATCC 90518) to 10,422 bp (B. brongniartii strain GYU-BMZ03), with their contribution to the respective total mitogenome size ranging from 7.01% (B. pseudobassiana strain ATCC 90518) to 28.95% (B. brongniartii strain GYU-BMZ03). The intron of rnl (the second intron of rnl for B. brongniartii strain GYU-BMZ03) was conserved and contained an open reading frame (ORF) for rps3. Introns in other genes are thought to contain ORFs (Ghikas, Kouvelis & Typas, 2010;Oh, Kong & Sung, 2015;Zhang et al., 2021), including the LAGLI-DADG and GIY-YIG endonuclease located in the cox1 and nad1 introns (Ghikas, Kouvelis & Typas, 2010), respectively. After comparing the eight Beauveria strains, we concluded that there were no ORFs owing to the loss of introns among the different species. Even if these intron sequences were not lost, there were many SNP mutations, and even the start codon could not be identified (B. bassiana strain YMM and B. bassiana strain GYU-BMZ04). The cox2 gene of B. brongniartii has one intron, including a putative GIY-YIG homing endonuclease (Oh, Kong & Sung, 2015). However, the cox2 gene of B. pseudobassiana strain ATCC 90518 did not contain any introns from the mitogenome and transcript sequences. The intron types of Beauveria species mitogenomes were group I introns (Wang et al., 2020;Zhang & Zhang, 2019), of which the boundaries were marked by a U residue at the 3 end of upstreaming exon and a G residue at the 3 end of the intron (Burke, 1988;Cech, 1988;Saldanha et al., 1993). The splicing model was mainly GT-AG in introns mainly of Beauveria species mitogenomes (Wang et al., 2020). For verification, the introns of the cox2 can be shown using cDNA and PCR in mitogenome of B. bassiana strain GYU-BMZ04 (Fig. 5, Files S1 and S2). The cox2 gene contained one intron in in mitogenome of B. bassiana strain GYU-BMZ04, but the cox2 transcript did not contain intron. The cox2 gene of B. amorpha strain GYU-BMZ01, B. bassiana strain GYU-BMZ04, and B. brongniartii strain GYU-BMZ03 contained intron, which is consistent with the results of assembly and annotation.
The mitogenomes contained 39 intergenic and two overlapping regions. The total length of the intergenic regions ranged from 5198 bp (B. pseudobassiana strain ATCC 90518) to 6092 bp (B. bassiana strain GYU-BMZ04), thus contributing to a total mitogenome size ranging from 15.76% (B. brongniartii strain GYU-BMZ03) to 20.33% (B. bassiana strain GYU-BMZ04). The A + T content was similar for these regions in the mitogenomes (∼74.5%), with the largest intergenic region being between cox1 and trnR-UCU. Overlapping regions of 1 bp were located between both nad2 and nad3, and between nad4L and nad5.

Phylogenetic analysis
We obtained fourteen mitogenomes from the Beauveria species and selected the mitogenome of Metarhizium anisopliae strain ME1 and Trichoderma reesei as outgroups. Based on the lowest BIC value (50395.840), we constructed a phylogenetic tree using the maximum likelihood (ML) method and a general reversible chloroplast (Adachi et al., 2000) + amino acid frequencies (F) + gamma distribution (G) model with 1000 bootstrap replications (Fig. 6). The structure of the phylogenetic tree is similar to reported  in previous studies (Glare et al., 2020;Zhang et al., 2021). The phylogenetic position of B. amorpha within the Beauveria was first evaluated using complete mitogenomes, and the results showed that B. amorpha was closed to B. bassiana. B. pseudobassiana and B. caledonica were on the same branch. The presence or absence of mitochondrial introns cannot serve as a taxonomic marker (Glare et al., 2020), which also verifies that most introns lack the stability of coding regions from another aspect. All genes in the Beauveria species are on the same chain, which might result in changes to the intron number.

DISCUSSION
We adopted two annotation processes ( Fig. 1): (1) we aligned the reads to the mitogenome, extracted the aligned reads, conducted de novo assembly, and then mapped transcripts to the mitogenome, and (2) we performed the de novo assemble and mapping of the transcripts to the mitochondrial genome. There are differences between mitochondrial transcripts and nuclear transcripts. While nuclear transcripts consist of a very high proportion of the entire transcriptome, we reduced the computational resources used during de novo assembly by filtering reads of nuclear transcripts. Although the first process was much more efficient than the second, it filtered out reads during their alignment, i.e., reads of B. amorpha strain GYU-BMZ01 and B. pseudobassiana strain ATCC 90518 from RNA-seq with the half bases aligned the head of rnl, and half bases aligned the tail. Although NGS short reads resulted in incompletely assembled transcripts that lack some vital information (e.g., 5 or 3 end information) (Bai et al., 2021;Gao et al., 2016;Gordon et al., 2015) and do not accurately determine the continuity of the transcript, we can make predictions with different types of reads, which can improve the utilization of NGS data and aid in the mitogenome annotation.

Polycistronic transcripts and RNA processing
Genome-wide polycistronic transcripts are prevalent and conserved among mushroomforming Agaricomycetes (Gordon et al., 2015). Most polycistronic transcripts are subsequently processed into smaller mono-, di-, or tricistronic RNAs (Barth et al., 2001). In this study, we found two types of reads for regions between introns and exons. The first type of reads were splicing reads, which could be mapped to two adjacent exons, while the second type could be mapped to introns, exons, and across the boundary. Therefore, we can classify de novo assembled transcripts into (1) intron-free and (2) intron-containing transcripts. mRNA was purified from the total RNA using poly T oligo-attached magnetic beads. Mature mRNAs with poly A tails were intron-free. Although some introns may contain ORFs, the origin and termination sites of genes located within introns are hundreds of bases away from the exon region. There should not be any second type of reads. Owing to the ''efficiency of splicing'', (Darnell, 2013), we propose that the maturation processing of pre-mRNA involves polyadenylation, splicing, and cleavage. The poly A tail was added to the polycistronic transcript before splicing. For the B. amorpha strain GYU-BMZ01, the RNA-seq read coverage was high (up to 94.72%), and with the addition of some tRNA regions that are not reads, the coverage will be even higher. Based on the transcript alignment results (Fig. 7 and Table 2), some polycistronic transcripts were likely present.

CONCLUSIONS
In the present study, we utilized Illumina DNA and RNA sequencing data to assemble and annotate mitogenome of Beauveria species. The short RNA-seq reads from nextgeneration sequencing can improve the mitochondrial annotation accuracy and help study polycistronic transcripts and RNA processing. The pipeline of this study offers a reference for mitochondrial transcriptome. Beauveria species tend to process the growing polycistronic transcript first, and the poly A tail was added to the polycistronic transcript before splicing. The small proportion of mitochondrial transcripts among the total transcripts for Beauveria species, B. bassiana transcriptome sequenced by PacBio sequencing only have 11 full-length transcripts from mitogenome in previous study (Wang et al., 2020), the Oxford Nanopore Technology (ONT) may be more advantageous than PacBio technology. The presence of a largen of intron regions is interesting and further studies are needed to investigate the underlying mechanisms creating and preserving such RNA processing. Finally, the results of this study could be applied to other areas for fungal mitogenome.