Structure and comparative analysis of the mitochondrial genomes of Liolaemus lizards with different modes of reproduction and ploidy levels

Liolaemus is the most specious genus of the Squamata lizards in South America, presenting exceptional evolutionary radiation and speciation patterns. This recent diversification complicates the formal taxonomic treatment and the phylogenetic analyses of this group, causing relationships among species to remain controversial. Here we used Next-Generation Sequencing to do a comparative analysis of the structure and organization of the complete mitochondrial genomes of three differently related species of Liolaemus and with different reproductive strategies and ploidy levels. The annotated mitochondrial genomes of ca. 17 kb are the first for the Liolaemidae family. Despite the high levels of sequence similarity among the three mitochondrial genomes over most of their lengths, the comparative analyses revealed variations at the stop codons of the protein coding genes and the structure of the tRNAs among species. The presence of a non-canonical dihydrouridine loop is a novelty for the pleurodonts iguanians. But the highest level of variability was observed in two repetitive sequences of the control region, which were responsible for most of the length heterogeneity of the mitochondrial genomes. These tandem repeats may be useful markers to analyze relationships of closely related species of Liolaemus and related genera and to conduct population and phylogenetic studies.

In our work, the first mitcochondrial genomes of Liolaemus were assembled using Next-Generation Sequencing Technology to do a comparative analysis of the structure and organization of the complete mitogenomes of three differently related species of Liolaemus and with different reproductive strategies and ploidy levels. The species were selected from different phylogenetic clades, that is, L. darwinii and L. parthenos belong to the L. darwinii clade while L. millcayac belongs to the more basal clade L. anomalus. Moreover, L. darwinii and L. millcayac are sexually reproductive species, while L. parthenos is the only parthenogenetic triploid species known for pleurodont iguanian lizards, and the studies propose that it originated from hybridization between L. darwinii and other unknown species of the genus (Abdala et al., 2016). These are the first annotated mitogenomes for the Liolemidae family, and the comparative analyses revealed a particular structural characteristic of the Liolaemus mitogenomes among pleurodont iguanians, and hypervariable regions that can be adopted to improve further population or phylogenetic studies in Liolaemus and related genera.

Sample, DNA isolation and whole-genome sequencing
This study was conducted in accordance with international standards on animal welfare, as well as being compliant with national regulations and the "Comité Nacional de Ética en la Ciencia y la Tecnología" of Argentina. Specimens were euthanized with a 1% Alital solution, fixed with 10% formaldehyde and stored in 70% alcohol. Voucher of all the specimens are housed in the herpetological collections of the Fundación Miguel Lillo (FML, Tucumán, Argentina). All the collections were done under the permits 090/02, 034/ 06, 1397/07, 821/10, 296/10, 1259/10, 1296/10 issued by Departamento de Fauna, Mendoza, Argentina. For molecular studies, DNA was extracted from liver or muscle tissues stored in a freezer with 96% ethanol.
For genome sequencing, the genomic DNA of three Liolaemus species (L. darwinii, L. parthenos and L. millcayac) were isolated from ethanol-preserved liver samples using salt extraction protocol as outlined in Aljanabi (1997). The integrity and quality of each genomic DNA were checked by electrophoresis and spectrophotometry, respectively. The samples were remitted to Macrogen Inc., Korea, for library construction and sequencing. Briefly, DNA samples were randomly fragmented with Covaris and the libraries were then prepared using the Illumina TruSeq Nano DNA Kit (550 bp insert size). These libraries were sequenced on Illumina NovaSeq platform to obtain 2 × 150 bp paired-end reads.

Assembly and annotation
All the bioinformatic analysis was carried out using the computational infrastructure of the Instituto de Botánica del Nordeste (Universidad Nacional del Nordeste-CONICET). After removing low quality sequences (Phred scores <30) and trimming Illumina adapters with Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), the de novo assemblies of mitochondrial genomes were performed using NOVOPlasty v2.6.3 (Dierckxsens, Mardulyn & Smits, 2017). This software assembles organelle genomes using a seed-and-extend algorithm from WGS data, starting from a related or distant single "seed" sequence and an optional "bait" reference mitogenome. For mitogenome assembly of Liolaemus species, we used the nucleotide sequence of the cytochrome oxidase subunit 1 and the mitogenome of Anolis punctatus as seed and bait reference mitogenome (NCBI RefSeq NC_044125.1), respectively. These assemblies were performed following the developer's suggestions and using a kmer size of 23. The assembled mitogenomes of the three Liolaemus species were annotated using the software GeSeq (Tillich et al., 2017), selecting the options to perform tRNAscan-SE and BLAT using the sequence for Anolis punctatus. Then, annotations were manually checked to correct misannotated regions. The mitogenome maps were drawn using GenomeVx online tool (Conant & Wolfe, 2008). Complete annotated mitogenomes were deposited in the GenBank with the following accession numbers: MT810467 (L. darwinii), MT810468 (L. millcayac) and MT810469 (L. parthenos).

Comparative analysis
The assembled Liolaemus mitogenomes were compared to calculate the nucleotide composition, Relative Synonymous Codon Usage (RSCU), non-synonymous (Ka) and synonymous (Ks) substitutions, gene re-arrangements, and variation sites.
Codon usage statistics were calculated using DnaSP version 5.0 (Librado & Rozas, 2009). The nucleotide sequences of each PCG were aligned based on the amino acid sequences in TranslatorX (Abascal, Zardoya & Telford, 2010) with MAFFT algorithm. The ratios of non-synonymous substitutions (Ka) and synonymous (Ks) substitutions were estimated in DnaSP version 5.0 (Librado & Rozas, 2009). To assess interspecific variation, pairwise comparisons among the three Liolaemus mitochondrial genomes were made with mVISTA program (Frazer et al., 2004) in Shuffle-LAGAN mode. The mitogenomes were aligned, and the overall sequence identity was plotted, using the annotation of L. parthenos as reference.

Genome sequencing and assembly of mitogenomes
A summary of the sequencing and mitogenome assembly outputs is shown in the Table S1. Briefly, after quality filtering of the Illumina NovaSeq raw data, a total of 42,135,996 reads were obtained for Liolaemus parthenos, 39,057,114 reads for L. darwinii and 44,253,406 reads for L. millcayac. These sequence data were used as input to assemble the mitogenome in each species. The assembled mitogenomes of Liolaemus parthenos, L. darwinii, and L. millcayac consisted in circular molecules of 16,838 bp, 16,974 bp and 16,945 bp, respectively. These assemblies were highly covered by sequence reads (average coverage of 1,171 in L. parthenos, 715 in L. darwinii and 572 in L. millcayac) giving the chance to develop robust mitogenomes.

Genome organization and composition
The genetic maps of Liolaemus mitogenomes are shown in Fig. 1. They were composed of 37 genes (13 PCGs, 22 tRNAs and two rRNAs) and two non-coding regions (the origin for light-strand replication and a control region). The mitochondrial genomes showed identical gene order and organization (Fig. 2). In addition, the alignment revealed high sequence similarity across most of the extension of the three mitogenomes. The sequence identities in pairwise comparisions were 99,44% (L. parthenos-L. darwinii), 86,46% (L. millcayac-L. parthenos) and 86,40% (L. millcayac-L. darwinii). The overall base composition of the three mitogenomes was very similar, showing a bias towards A and T nucleotides (ca. 61% Table 1).
Protein-Coding Genes were organized in a large cluster on the heavy strand, except for the PCG "ND6" and eight tRNAs (tRNAGln, tRNAAla, tRNAAsn, tRNACys, tRNATyr, tRNASer (UCN), tRNAGlu and tRNAPro) which were organized on the light strand. Intergenic spacers were observed both on the H and the L strands. Four intergenic spacers (ND2-tRNATrp, COX2-tRNALys, tRNALeu-ND5 and CYTB-tRNAThr) were observed on the H-strand and four (tRNAAla-tRNAAsn, tRNAAsn-tRNACys, tRNACys-tRNATyr and ND6-tRNAGlu) on the L-strand. Overlapped segments were observed on ATP8-ATP6, ATP6-COX3, ND4L-ND4, tRNASer-tRNALeu, all in the H-strand. However, in L. millcayac, overlapped segments were observed on COX2-tRNALys and  tRNALeu-ND5 and one intergenic spacer was found between the genes ATP6 and COX3. The sizes of spacers and overlaps were variable in the three species analyzed were inferred from annotations of Table 2, and their evolutionary significance should be tested in a broader context.

Protein coding genes
The A+T content of PCGs was about 61% in the three species analyzed (Table 1). The total length of the 13 PCGs was conserved in Liolaemus species (11,355 bp). This value accounted for approximately 67% of the total mitochondrial genome length. The sizes of the different protein-coding genes were also conserved in the three species analyzed (Table 2), being the NAD5 the longest (1,791 bp) and ATP8 the shortest one (168 bp). The analysis of the nucleotide diversity revealed high conservation of nucleotide sequences of the 13 PCGs. Among them, the most conserved genes were COX2 (0.0907) and COX3 (0.0909), whereas ND3 (0.1532) was the less conserved PCG (Fig. 3).
Most of the PCGs showed the same start codon in the three species (Table 2). Twelve genes started with ATG, while only the COX1 started with GTG. The four stop codons of the vertebrate mitochondrial code were observed in the species here analyzed. The sequence "TAA" was the most frequent and it was present in COX2, ATP8, ATP6, ND4L and ND5 in the three species, and also in the genes ND2 and CYTB in the species L. darwinii and L. parthenos. The stop codon "AGA" was observed in the ND6 (only in L. darwinii and L. parthenos) and COX1 genes (all species). The sequence "TAG" was only found in the gene ND1 for L. darwinii and L. parthenos, and in the genes ND1, ND2 and CYTB for L. millcayac. The stop codon "AGG" was only present in the gene ND6 of L. millcayac. Three genes (COX3, ND3 and ND4) showed incomplete stop codons "T", which is completed by the addition of 3′ "A" residues to the mRNA. These results showed that the different Liolaemus species present variability in the stop codons of some genes whose usefulness as diagnostic characters should be rigorously sampled in the genus.
The abundance of codon families and Relative Synonymous Codon Usage (RSCU) in the 13 PCGs in Liolaemus mtDNAs are shown in Fig. 4A. A total of 3,774 non-stop codons with a very similar behavior were found in each species. The most frequently used amino acids were Leucine (Leu), followed by Isoleucine (Ile), Threonine (Thr) and Alanine (Ala) (Fig. 4A). The analysis of relative synonymous codon usage (RSCU) revealed the presence of 60 codons representing 22 amino acids. Each amino acid was represented by two to four codons (Figs. 4B-4D), being CTA (Leu), ATT (Ile), TAC (Thr) and AAC (Ala), the most frequently used for the three species. The ratios of non-synonymous (Ka) versus synonymous (Ks) substitutions (Ka/Ks ratios) of the 13 PCGs were less than 1; among which ATP8 and COX1 had the highest and lowest rates, respectively (Fig. 5). This relationship suggests that a strong purifying and negative selection may be operating on these genes.

Ribosomal and transfer RNAs
The two rRNA genes (rrnS and rrnL) are typically between tRNAPhe and tRNALeu (UUA) and separated by tRNAVal (Fig. 1). Both rRNAs were AT-rich (ca. 59.5%), and the length of each rRNAs was identical in the three species (Table 1). The level of sequence identity of rRNA genes followed the general findings for the whole mitogenome, being highest between L. darwinii and L. parthenos ( Table 2). The 22 tRNA genes were interspersed between the rRNAs and the protein-coding region (Fig. 1). The nucleotide composition of these tRNAs showed approximately 59% of A and T (Table 1) and their sequence lengths were very similar in the three species except for tRNATyr, which showed a size variation of 6 bp ( Table 2). The sequences of all tRNA genes displayed the typical cloverleaf secondary structure composed of four domains and a short variable loop: the acceptor stem, the dihydrouridine stem and loop (DHU), the anticodon stem and loop, the thymidine stem and loop (TψC), and the variable (V) loop (Figs. S1-S3) 1. However, some variants in the secondary structure were observed among species in the acceptor, DHU, TψC and anticodon domains of the different tRNAs. A deletion of stem, loop and stem and loop of the DHU domain were observed in the trnI, trnC and trnS genes, respectively. Mismatched base pairs in the acceptor arm (trnF, trnQ, trnM, trnD, trnK, trnR, trnH and trnT), DHU arm (trnW) and TψC arm (trnM,   The origin of light-strand replication (OL) was located in the WANCY region (between the tRNAAsn and tRNACys genes) as expected for vertebrates. The OL stem and loop structures of the three Liolaemus species (Fig. 6) were similar to the consensus sequences in squamate lizards . The stem region presented a potential 3′-GCC-5′ light-strand elongation start point as was described in mouse (Brennicke & Clayton, 1981) and later observed in lizards of the Leiolepis genus (Macey et al., 2006). In addition, the 3′-GBCCB-5′ sequence was found downstream of the stem region, which is a variant of the sequence 3′-GGCCG-5′ required for in vitro replication of human mitochondrial DNA (Hixson, Wong & Clayton, 1986). The CRs of L. darwinii, L. parthenos and L. millcayac were 1,645 bp, 1,511 bp and 1,626 bp in length, respectively (Table 2), as expected for the Iguanidae (Okajima & Kumazawa, 2010). The CRs showed a high content of A and T nucleotides (65% on average, Table 1) and localized in the same position reported for most vertebrates, between the tRNAPro and tRNAPhe genes (Boore, 1999). They are composed by three domains: TAS, CD and CSB (Fig. 7). The TAS domain (domain 1) was the most variable both in length (from 644 bp in L. parthenos to 755 bp in L. millcayac) and nucleotide diversity "π" (0.08712), while CD domain (domain 2) was the most conserved, both in sequence length (341 bp in the three species) and nucleotide diversity (0.02933). Intermediate values of sequence length (from 526 bp in L. parthenos to 530 bp in L. millcayac) and nucleotide diversity (0.06460) were observed in the CSB domain (domain 3). Domain 1 presented the ETAS (extended termination associated sequences) like sequences, the domain 2 had five conserved blocks (F, E, D, C and B) and the domain 3 contained three conserved blocks (CSB-1, 2 and 3). The arrangement observed for the control region is the most frequently reported for vertebrates (Satoh et al., 2016).
Aside from the conserved motifs, repeated regions were found in two domains of the mitochondrial CRs of Liolaemus species (Table 3). Two arrays of tandem repeats were localized in the domain 1. The first one was localized upstream of the ETAS-like sequences and consisted of repeat units of 33 bp (L. darwinii and L. parthenos) and 34 bp (L. millcayac). Importantly, the copy number of these repeat units was variable among the species, ranging from 5 (L. millcayac) to 9 (L. darwinii). This variation in copy number is the responsible of most of the length difference observed in the control region among the Liolaemus species. The second array detected in domain 1 was localized between the first tandem repeat region and ETAS-like sequences and, it was formed by two tandemly repeated units of 77 bp (L. darwinii and L. parthenos) and 76 bp (L. millcayac). In contrast with the first array, the copy number of the latter was conserved in the three species analyzed (two copies). Two types of AT-rich sequences, (AxTy)n and (TA)n, were found in the domain 3 of the CRs of the three Liolaemus species. The first one occurred between CSB-1 and CSB-2 motifs, while the second was localized 5′ to the tRNAPhe gene. The (AxTy)n sequence was composed of an 11 bp (AAATTAAATTA) unit repeated 5 times in L. darwinii and L. parthenos, while in L. millcayac, it was composed of a slightly different motif (AAATCAAATTA) repeated seven times. By contrast, the (TA)n

DISCUSSION
The appearance of Next Generation Sequencing technologies together with the development of bioinformatic pipelines has facilitated the high-quality assembly of organelle genomes in eukaryote species. Here, through the analysis of Illumina paired-end reads, we provided the complete structure and organization of the mitochondrial genomes of three differently related species of Liolaemus with different reproductive strategy and ploidy levels, looking for variable regions useful for phylogenetic and population analysis. The assembled sequences of the three Liolaemus mitogenomes are typical circular DNA molecules of similar size (approximately 16.9 kb). The content and order of genic and non-genic regions showed a high structural conservation among the species analyzed and, in general, when compared to other lizard genomes (Amer & Kumazawa, 2005;Okajima & Kumazawa, 2010;Nogueira Dumans et al., 2019). The low variability observed among the species analyzed is in accordance with the relatively recent radiation of the Liolaemus boulengeri group (Morando et al., 2020). Our results showed that the mitogenome of L. parthenos has not undergone any gene order re-arrangement during its evolutionary history as was previously recorded for several unisexual lizards (Moritz & Brown, 1987;Moritz, 1991;Zevering et al., 1991;Fujita, Boore & Moritz, 2007) suggesting structural conservation of the Liolaemus mitogenomes despite the mode of species reproduction and ploidy level. Despite the high gene conservation at interspecific level, some structural variations were observed in the secondary structure of tRNA genes, particularly the genes trnS2 and trnC. The structural variation in the trnS genes has been reported in diverse lineages of vertebrates, such as Artiodactyla (Zhou et al., 2019), Galliformes (Li, Huang & Lei, 2015), Anura (Yang et al., 2018). By contrast, the phenomenon of the trnC gene lacking a canonical DHU arm is not common in the mitogenomes of vertebrates and was only found in acrodont lizards so far . Thus, the finding of a non-canonical structure on DHU arm in Liolaemus constitutes the first report in the pleurodont iguanians. Moreover, the derived secondary structures of mitochondrial tRNAs found here have useful characters for interspecific phylogenetic studies in Liolaemus, since estimates of homoplastic changes for tRNA secondary structural characters were reported to be several (more than five) times less than for base position changes in lizards .
The control regions were identified as those with the lowest values of sequence identity and are further analyzed below. In fact, the differences in the genome length were mainly due to the variable number of repeats in these regions. This degree of similarity in the pairwise comparison is in agreement with the phylogenetic relationships; whereas L. millcayac is placed in the early-diverged lineage L. anomalus clade; L. darwinii and L. parthenos are in the L. darwinii clade Morando et al., 2004;Abdala et al., 2016). Moreover, in their phylogenetic hypotheses with a few mitochondrial genes, Abdala et al. (2016) recovered L. parthenos nested within L. darwinii and proposed it as their maternal ancestor. The comparisons of the whole mitochondrial genomes are congruent with this hypothesis although a more comprehensive genomic analysis of the L. boulengeri series, including nuclear genomes, is needed to shed light on which species are the ancestors of this curious lizard.
Although the CR is usually thought to be the fastest-evolving region of the mitogenome (Brown, George & Wilson, 1979), few reports explored the usefulness of the CR in phylogenetic analyses of lizards (Lalitha & Chandavar, 2018). Our study evidenced that this region may be highly informative for a variety of studies of the genetic variability of mitogenomes in lizards. For this purpose, a set of primer pairs were designed based on the conserved structures that flanked the repetitive sequences of the CR to reveal their variability (Table 4). We expect that the regions amplified by these primers pairs can provide very informative characters to analyze relationships of closely related species or at the infraspecific level and, to conduct population structure studies in Liolaemus. Moreover,

CONCLUSIONS
We used low coverage whole genome sequencing of genomic DNA to assemble and annotate the first complete mitochondrial genomes from lizards of the Liolaemidae family, more precisely from three species of the genus Liolaemus (L. parthenos, L. darwinii and L. millcayac). The annotation of Liolaemus mitogenomes obtained in this study provides a comprehensive analysis of the nucleotide diversity of different regions for further development of useful genetic markers. Despite the high levels of sequence similarity among the three mitogenomes in most of their length, significant differences in copy numbers and motifs of the tandem repeats in the CRs were identified. These VNTRs arose as highly variable regions among Liolaemus species useful for future population and phylogenetic studies. Although a wide comparative genomic analysis is needed, the high similarity of the mitochondrial genomes evidences that the hypothesis that considers L. darwinii as a matrilineal ancestor of the hybrid triploid L. parthenos is plausible.