Characterization and Phylogenetic Analysis of the Complete Mitochondrial Genome of Aythya marila

Aythya marila is a large diving duck belonging to the family Anatidae. However, the phylogenetic relationship among these Aythya species remains unclear due to the presence of extensive interspecific hybridization events within the Aythya genus. Here, we sequenced and annotated the complete mitochondrial genome of A. marila, which contained 22 tRNAs, 13 protein-coding genes (PCGs), 2 ribosomal RNAs, and 1 D-loop, with a length of 16,617 bp. The sizes of the PCGs ranged from 297 to 1824 bp and were all, except for ND6, located on the heavy chain (H). ATG and TAA were the most common start and termination codons of the 13 PCGs, respectively. The fastest- and slowest-evolving genes were ATP8 and COI, respectively. Codon usage analysis indicated that CUA, AUC, GCC, UUC, CUC, and ACC were the six most frequent codons. The nucleotide diversity values indicated a high level of genetic diversity in A. marila. FST analysis suggested a widespread gene exchange between A. baeri and A. nyroca. Moreover, phylogenetic reconstructions using the mitochondrial genomes of all available Anatidae species showed that, in addition to A. marila, four major clades among the Anatidae (Dendrocygninae, Oxyurinae, Anserinae, and Anatinae) were closely related to A. fuligula. Overall, this study provides valuable information on the evolution of A. marila and new insights into the phylogeny of Anatidae.


Introduction
The greater scaup (Aythya marila) is a relatively large diving duck belonging to the order Anseriformes, subfamily Anatinae, and family Anatidae. This species is mainly distributed throughout North Eurasia from Iceland eastward to the Lena River in Siberia, and winters mainly in Northwest Europe and around the Adriatic, Black, and Caspian Seas. In summer, the greater scaup primarily feeds on insects, crustaceans, and mollusks, in particular small bivalves, snails, and amphipods. The greater scaup is the only circumpolar Aythya and one of the few circumpolar duck species. It is also the only species in the genus Aythya that has subspecies: A. marila marila and A. marila nearctica [1]. However, to date, little is known about its biology compared to other Aythya species, partly due to its relatively isolated breeding grounds and the difficulty in distinguishing it from its close relative, the lesser scaup (A. affinis) [2,3].
Aythya originated in the Miocene, and there are 12 extant species worldwide [4]. Of these, five species (A. ferina, A. nyroca, A. baeri, A. fuligula, and A. marila) are widely distributed across China. However, due to the late differentiation between Aythya species and serious interspecific hybridization [5][6][7], the phylogenetic relationship among these five Aythya species remains unclear. Recent research has shown that there is a substantial spatial overlap in the distribution ranges of A. nyroca and A. baeri. This overlap may have a detrimental effect on the conservation status of A. baeri, which has a relatively small population. As such, it is crucial to assess the potential risk of hybridization between these two species. Mitochondria are important functional organelles that provide energy for various cellular biological functions [8]. The animal mitochondrial genome is a circular, closed, double-stranded molecule, usually ranging from 15 to 21 kb in size, with 37 genes, including 13 protein-coding genes (PCGs: ND1, ND2, ND3, ND4, ND4L, ND5, ND6, COI, COII, COIII, ATP6, ATP8, and Cytb), 2 ribosomal RNAs (12S rRNA and 16S rRNA), 22 transfer RNAs (tRNA), and an A + T-rich noncoding control region [9][10][11]. The mitochondrial genome is a highly conserved molecule characterized by its rapid evolutionary rate, maternal inheritance, haploid nature, and limited recombination in comparison to nuclear genes [12][13][14]. Owing to these attributes, mitochondrial genomes have become increasingly utilized in phylogenetic analyses, population genetic diversity studies, and evolutionary research [15][16][17]. In particular, complete mitochondrial genomes offer substantially more information for phylogenetic analysis than a single mitochondrial gene. However, it is important to note that the mitochondrial genome may not accurately depict the population structure of species due to factors such as gender bias, dispersal behavior, and frequent hybridization events.
In this study, we assembled and annotated the first complete sequence of the mitochondrial genome of the greater scaup and assembled twenty-eight complete mitochondrial genomes of five species: six A. ferina, four A. nyroca, six A. baeri, six A. fuligula, and six A. marila. We analyzed the gene content, codon usage bias, repeat sequences, and synonymous and nonsynonymous substitution rates of the greater scaup. Additionally, we revealed the phylogenetic relationships among Anatidae species and the population genetic diversity of A. ferina, A. nyroca, A. baeri, A. fuligula, and A. marila based on a combined set of mitochondrial genes.

Ethics
All experimental protocols were approved by the Qufu Normal University Biomedical Ethics Committee, and all experiments followed the recommendations in the ARRIVE guidelines (Ethical proof 2021097). All animal experiments were performed in accordance with relevant guidelines and regulations.

Specimen Collection, Genome Sequencing, and Assembly
The tissue samples of A. marila used in this study originated from the Wildlife Inspection Center of the Northeast Forestry University; no living animal was involved in this study. Samples were stored at −80 • C ultralow-temperature freezer. Total genomic DNA was extracted from the tissue using the DNeasy Blood & Tissue Kit (QIAGEN, Beverly, MA, USA). The whole genome of A. marila was sequenced by Sangon Biotech Co., Ltd. (Shanghai, China) using an Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA). Approximately 4.7 Gb of raw data from 150 bp paired-end reads were generated. Raw data were processed using Fastp v0.36 [18] and assembled using SPAdes v3.15 [19]. GapFillerr v1.11 [20] was used to fill the gaps, whereas PrInSeS-G was used for sequence correction. In addition, we sequenced 28 samples from 5 species of Aythya using an Illumina NovaSeq 6000 platform and obtained high-quality raw genome sequencing data (Table 1). We then used the NOVOPlasty software (V 4.1) to assemble 28 complete mitochondrial genomes of the 5 species: A. ferina, A. nyroca, A. baeri, A. fuligula, and A. marila.

Phylogenetic Analysis
In this study, we generated two distinct phylogenetic trees to elucidate the evolutionary relationships within the Anatidae family. Initially, to determine the taxonomic position of A. marila, we retrieved 51 Anatidae mitochondrial genomes from GenBank and included Buteo hemilasius and Gallus gallus as outgroup taxa. We constructed Bayesian inference (BI) and maximum likelihood (ML) phylogenetic trees of the 54 mitogenomic sequences based on 13 mtDNA PCGs (ND1, ND2, ND3, ND4, ND4L, ND5, ND6, COI, COII, COIII, ATP6, ATP8, and Cytb). MAFFT v. 7.475 [29] was used for the alignment of the nucleotide sequences of the 13 PCGs, and ambiguously aligned regions were identified and removed by Gblocks [30]. The alignments of individual genes were then concatenated using Phylo-Suite [31]. According to the Bayesian information criterion (BIC), ModelFinder2.2.1 [32] was used to determine the best partitioning scheme and substitution model, and the GTR + F + I + G4 model was chosen as the best fit for the ML method. ML analysis was performed using IQ-TREE [33] and conducted with 5000 ultrafast bootstrap replicates. The BI method was performed using MrBayes [34] with 4 simultaneous Markov chain Monte Carlo (MCMC) chains running for 2,000,000 generations and sampling every 1000 generations with a burn-in of 25%. Subsequently, to further investigate the population structure among 5 Aythya species, we aligned 28 complete mitochondrial genomes using MEGA and constructed a phylogenetic tree, selecting Anas platyrhynchos as the outgroup. Lastly, the resulting phylogenetic trees were annotated and visualized using the Interactive Tree of Life (iTOL) online service [35] (https://itol.embl.de/, accessed on 4 October 2022).

Genome Structure and Composition
The complete mitochondrial genome sequence of A. marila is a typical closed-circular molecule with a size of 16,617 bp and has been deposited in GenBank under the accession number OP326610. The mitochondrial genome contents of A. marila are similar to those of most other published Aythya, which includes 37 genes [36], 13 PCGs, 22 tRNAs, 2 rRNAs (rrnL and rrnS), and a control region (D-loop) ( Figure 1). Among the 37 fragment genes, a PCG (ND6) and 8 tRNA genes (tRNA-Ala, tRNA-Cys, tRNA-Glu, tRNA-Gln, tRNA-Asn, tRNA-Pro, tRNA-Ser2, and tRNA-Tyr) were located on the light strand (L-strand), whereas the CR and the remaining 28 genes were located on the heavy strand (H-strand). The gene order was the same as that in most bird mitochondrial genomes [37,38] (Table 2). The total length of protein-coding, tRNA, and rRNA genes comprised 68.65%, 9.29%, and 15.57% of the entire mitochondrial genome, respectively. The base composition of the A. marila genome was 29.42% A, 22.25% T, 15.49% G, and 32.84% C, and the A + T content (51.67%) was higher than the G + C content (48.33%), suggesting a strong A + T bias. A relatively high AT content has also been reported in other Aythya species [36]. In addition, AT-skews and GC-skews are often used to assess nucleotide-compositional differences in mitochondrial genomes [39]. We found that the A. marila mitochondrial genome AT-skew (0.139) was positive, whereas the GC-skew (−0.359) was negative, indicating higher A and C abundances than T and G abundances, respectively (Table 3).
tRNA-Pro, tRNA-Ser2, and tRNA-Tyr) were located on the light strand (L-strand), whereas the CR and the remaining 28 genes were located on the heavy strand (H-strand). The gene order was the same as that in most bird mitochondrial genomes [37,38] (Table  2). The total length of protein-coding, tRNA, and rRNA genes comprised 68.65%, 9.29%, and 15.57% of the entire mitochondrial genome, respectively. The base composition of the A. marila genome was 29.42% A, 22.25% T, 15.49% G, and 32.84% C, and the A + T content (51.67%) was higher than the G + C content (48.33%), suggesting a strong A + T bias. A relatively high AT content has also been reported in other Aythya species [36]. In addition, AT-skews and GC-skews are often used to assess nucleotide-compositional differences in mitochondrial genomes [39]. We found that the A. marila mitochondrial genome AT-skew (0.139) was positive, whereas the GC-skew (−0.359) was negative, indicating higher A and C abundances than T and G abundances, respectively (Table 3).

Protein-Coding Genes and Codon Usage
The mitochondrial genome of A. marila has 13 coding genes with a total length of 11,407 bp and encodes cytochrome b, 2 ATPases, 3 cytochrome c oxidase subunits, and 6 NADH dehydrogenase subunits. The lengths of the 13 PCGs range from 168 to 1824 bp; among these, 10 were initiated by the start codon ATG, whereas 3 (COX I, COX II, and ND5) were initiated by the start codon GTG. Seven protein genes (ATP6, ATP8, COII, ND3, ND4L, ND5, and Cytb) used TAA as the stop codon. For COI and ND1, the stop codon was AGG, whereas that for ND6 was TAG. ND2, ND4, and COIII ended with the incomplete stop codon T ( Table 2).
The average A + T content of PCGs in the mitochondrial genome of A. marila was 50.58%, ranging from 49.34% (Cytb) to 52.03% (ND4). Except for ND6, the base compositions and skewness of the remaining 12 PCGs were relatively similar. Only the percentages of ND6 T and G were considerably higher than those of the other genes, with a positive GC-skew ( Table 3). The usage of amino acids and RSCU values in the PCGs of A. marila are summarized in Figure 2 and Table 4. The most frequently used codons (amino acids) were CUA (317), AUC (208), GCC (173), UUC (167), CUC (166), and ACC (156). Except for Trp and Met, all amino acids have their preferred codons, which is a unique feature of A. marila compared to other Aythya species identified to date [36].

Ribosomal and Transfer RNA Genes
Similar to other Aythya species, the A. marila mitochondrial genome contained two rRNAs with a total length of 2587 bp, located on the H-strand. The 12S rRNA gene and 16S rRNA are located between tRNA-Phe and tRNA-Leu, which are typically separated

Ribosomal and Transfer RNA Genes
Similar to other Aythya species, the A. marila mitochondrial genome contained two rRNAs with a total length of 2587 bp, located on the H-strand. The 12S rRNA gene and 16S rRNA are located between tRNA-Phe and tRNA-Leu, which are typically separated by tRNA-Val. The base composition of the 12S rRNA gene was A = 32.62%, T = 19.21%, C = 28.2%, and G = 20.02%, whereas that of the 16S rRNA gene was A = 34.56%, T = 19.90%, C = 25.76%, and G = 19.78%; both rRNAs had a slight AT bias ( Table 3).
The A. marila mitochondrial genome contained 22 tRNAs ranging in size from 66 bp to 76 bp, with an obvious AT bias (56.65%). Of these, 14 were located on the H-strand, and 8 were located on the L-strand ( Table 2). The secondary structure of tRNA is shown in Figure 3. In many vertebrate mitochondrial genomes, the tRNA-Ser (GCT) gene has an unusual secondary structure owing to the lack of a dihydrouridine arm. However, some previous studies have shown that this structure does not affect tRNA function [40][41][42][43]; other than tRNA-Ser (GCT), tRNA genes form the classical cloverleaf secondary structure, and the anticodon arms contain a relatively conserved region [40].

Control Region and Repetitive Sequences
A control region was identified in the mitochondrial genome of A. marila. This region is located between tRNA-Glu and tRNA-Phe and has a length of 1066 bp. The AT content (51.97%) and GC content (48.03%) of this region were similar to those of the whole mitochondrial genome (Table 3). We also found that the control region AT-skew (0.051) was positive, whereas the GC-skew (−0.340) was negative, which indicated that the A content was higher than the T content and that the G content was lower than the C content. A total of 24 repeat sequences were detected in the A. marila mitochondrial genome, which contained 22 forward repeats and 2 palindromic repeats (Figure 4). The repeat sequences ranged from 20 to 23 bp with a total size of 497 bp, constituting 2.99% of the mitochondrial genome.  chondrial genome (Table 3). We also found that the control region AT-skew (0.051) was positive, whereas the GC-skew (−0.340) was negative, which indicated that the A content was higher than the T content and that the G content was lower than the C content. A total of 24 repeat sequences were detected in the A. marila mitochondrial genome, which contained 22 forward repeats and 2 palindromic repeats ( Figure 4). The repeat sequences ranged from 20 to 23 bp with a total size of 497 bp, constituting 2.99% of the mitochondrial genome.

Synonymous (Ka) and Nonsynonymous (Ks) Substitution Rate
The Ka/Ks ratio can be used to determine whether the PCGs were under selection pressure [44,45]. To detect the pressure on A. marila mitochondrial PCGs, we calculated the Ka/Ks ratio of the 13 PCGs. As shown in Figure 5, the Ka/Ks ratio did not exceed 0.105 for any genes, indicating that these genes underwent intense purification selection. Moreover, the stabilization of the normal function of mitochondria may be due to the important role of these genes in purification selection [46]. ATP8 had the highest Ka/Ks value (Ka/Ks = 0.105), indicating that ATP8 experienced the least selective pressure and evolved the fastest. However, the Ka/Ks value of COI was the lowest (Ka/Ks = 0.003), indicating that COI had the highest selective pressure and had evolved the slowest.

Synonymous (Ka) and Nonsynonymous (Ks) Substitution Rate
The Ka/Ks ratio can be used to determine whether the PCGs were under selection pressure [44,45]. To detect the pressure on A. marila mitochondrial PCGs, we calculated the Ka/Ks ratio of the 13 PCGs. As shown in Figure 5, the Ka/Ks ratio did not exceed 0.105 for any genes, indicating that these genes underwent intense purification selection. Moreover, the stabilization of the normal function of mitochondria may be due to the important role of these genes in purification selection [46]. ATP8 had the highest Ka/Ks value (Ka/Ks = 0.105), indicating that ATP8 experienced the least selective pressure and evolved the fastest. However, the Ka/Ks value of COI was the lowest (Ka/Ks = 0.003), indicating that COI had the highest selective pressure and had evolved the slowest.

Population Genetic Diversity and Differentiation
The complete mitochondrial genome sequence was used to estimate the genetic diversity and differentiation of the Aythya population. A total of 28 haplotypes were identified from 28 individuals, of which 4 were from A. nyroca, with the other 4 Aythya species each having 6 haplotypes. The haplotype diversity of the 5 populations was 1, whereas

Population Genetic Diversity and Differentiation
The complete mitochondrial genome sequence was used to estimate the genetic diversity and differentiation of the Aythya population. A total of 28 haplotypes were identified from 28 individuals, of which 4 were from A. nyroca, with the other 4 Aythya species each having 6 haplotypes. The haplotype diversity of the 5 populations was 1, whereas the nucleotide diversity (π) ranged from 0.00080 to 0.01109 per population (Table 5). These results show that our target species had the highest genetic diversity, whereas the critically endangered A. baeri had the lowest genetic diversity. The low genetic diversity of A. baeri is consistent with its population size, suggesting that it may be at risk of extinction. In addition, there was obvious differentiation between the five populations; however, no significant genetic differentiation was found between A. baeri and A. nyroca (F ST = −0.09740), suggesting that genetic exchange between the two populations is common ( Table 6).

Phylogenetic Analysis
We performed a comprehensive phylogenetic analysis on the mitochondrial genomes of all Anatidae species available in GenBank, employing both Bayesian inference (BI) and maximum likelihood (ML) methodologies. Both approaches yielded high bootstrap support values and Bayesian posterior probabilities. Our findings revealed that the BI and ML phylogenetic trees shared identical topologies; however, the tree generated using BI exhibited superior support values. Consequently, we exclusively present the BI tree in this study ( Figure 6). In the phylogenetic tree, there were four major clades among the Anatidae: Dendrocygninae, Oxyurinae, Anserinae, and Anatinae. The first branch, Dendrocygninae, contained only Dendrocygna javanica. D. javanica constitutes an early diverging lineage, representing one of the most distinctive genera within the Anatidae family. Their unique characteristics, such as an erect posture, elongated necks and legs, and tree-perching habits, distinguish them from the majority of other waterfowl species [47,48].
The second is Oxyurinae, which consists of only Oxyura jamaicensis. The phylogenetic relationship of Oxyurinae is also controversial [49]. According to early morphological studies, Oxyurinae shares a common ancestor with geese and swans [49,50], leading us to consider Oxyurinae as the group most closely related to Anserinae whilst not being within Anatinae. In previous studies, Oxyurinae was shown to have diverged earlier than Dendrocygninae in Anatidae [51], and our results also support Oxyurinae as an independent subfamily. The third is Anserinae, which includes Cygnini and Anserini. Anserini contains the genera Anser and Branta, which are closely related species and sisters to the Cygnini assemblage, which is consistent with morphological data [52]. Meigini, Anatini, Aythyini, and Cairinini form the fourth branch of the Anatinae. Within Anatinae, Aythyini is monophyletic, Aythyini and Anatini have the closest phylogenetic relationships, and Aythyini diverged earlier than Anatini based on morphological studies and phylogenetic analysis [49,53]. Within the genera, Aythya, A. marila, and A. fuligula form a branch, whereas A. americana and A. ferina, and A. baeri and A. nyroca were clustered into one branch, with all branches being supported by good statistical values. In general, our findings support those of the previous molecular phylogenetic studies. In an effort to further elucidate the population structure among the five Aythya species (A. ferina, A. nyroca, A. baeri, A. fuligula, and A. marila), we conducted a comprehensive phylogenetic analysis based on complete mitochondrial genomes (Figure 7). Our findings revealed that the Aythya genus can be classified into three distinct clades: one comprising A. baeri and A. nyroca, which exhibit a relatively close phylogenetic relationship; another encompassing A. marila and A. fuligula, which are also closely related; a third, separate clade consisting solely of A. ferina. In addition, our results demonstrated that A. baeri and A. nyroca display evident mixed clustering patterns, while A. marila and A. fuligula exhibit an interwoven arrangement, suggesting the potential for frequent hybridization events within the Aythya genus. However, it is important to note that the maternal inheritance characteristics of the mitochondrial genome, coupled with the pervasive hybridization behavior observed in the Aythya genus, may introduce inaccuracies when attempting to depict the population structure of these species. In an effort to further elucidate the population structure among the five Aythya species (A. ferina, A. nyroca, A. baeri, A. fuligula, and A. marila), we conducted a comprehensive phylogenetic analysis based on complete mitochondrial genomes (Figure 7). Our findings revealed that the Aythya genus can be classified into three distinct clades: one comprising A. baeri and A. nyroca, which exhibit a relatively close phylogenetic relationship; another encompassing A. marila and A. fuligula, which are also closely related; a third, separate clade consisting solely of A. ferina. In addition, our results demonstrated that A. baeri and A. nyroca display evident mixed clustering patterns, while A. marila and A. fuligula exhibit an interwoven arrangement, suggesting the potential for frequent hybridization events within the Aythya genus. However, it is important to note that the maternal inheritance characteristics of the mitochondrial genome, coupled with the pervasive hybridization behavior observed in the Aythya genus, may introduce inaccuracies when attempting to depict the population structure of these species.

Conclusions
In the present study, the complete mitochondrial genome of A. marila was sequenced, annotated, and reported for the first time. The mitochondrial genome sequence of A. marila was 16,617 bp in length and consisted of 13 PCGs, 2 rRNA genes, 22 tRNA genes, and a D-loop. Among the PCGs, ATP8 evolved the fastest, whereas COI evolved the slowest. Phylogenomic analysis based on 13 protein-coding genes among 52 Anatidae species showed that A. marila was closely related to A. fuligula. We also assembled 28 complete mitochondrial genomes of 5 Aythya species and found no significant genetic differentiation between A. baeri and A. nyroca. Moreover, A. baeri had the lowest genetic diversity, which not only complemented the available Aythya species resources but also further supported their taxonomic status. Our results also provide a beneficial reference for the taxonomy, population genetics, and systematic studies of Aythya species.

Conclusions
In the present study, the complete mitochondrial genome of A. marila was sequenced, annotated, and reported for the first time. The mitochondrial genome sequence of A. marila was 16,617 bp in length and consisted of 13 PCGs, 2 rRNA genes, 22 tRNA genes, and a D-loop. Among the PCGs, ATP8 evolved the fastest, whereas COI evolved the slowest. Phylogenomic analysis based on 13 protein-coding genes among 52 Anatidae species showed that A. marila was closely related to A. fuligula. We also assembled 28 complete mitochondrial genomes of 5 Aythya species and found no significant genetic differentiation between A. baeri and A. nyroca. Moreover, A. baeri had the lowest genetic diversity, which not only complemented the available Aythya species resources but also further supported their taxonomic status. Our results also provide a beneficial reference for the taxonomy, population genetics, and systematic studies of Aythya species.