Mitochondrial genomes and comparative analyses of Culex camposi, Culex coronator, Culex usquatus and Culex usquatissimus (Diptera:Culicidae), members of the coronator group

The Coronator Group currently encompasses six morphologically similar species (Culex camposi Dyar, Culex coronator Dyar and Knab, Culex covagarciai Forattini, Culex usquatus Dyar, Culex usquatissimus Dyar, and Culex ousqua Dyar). Culex coronator has been incriminated as a potential vector of West Nile Virus (WNV), Saint Louis Encephalitis Virus (SLEV), and Venezuelan Equine Encephalitis Virus (VEEV). The complete mitochondrial genome of Cx. coronator, Cx. usquatus, Cx.usquatissimus, and Cx. camposi was sequenced, annotated, and analyzed to provide genetic information about these species. The mitochondrial genomes of Cx. coronator, Cx. usquatus, Cx.usquatissimus, and Cx. camposi varied from 15,573 base pairs in Cx. usquatus to 15,576 in Cx. coronator. They contained 37 genes (13 protein-encoding genes, 2 rRNA genes, and 22 tRNA genes) and the AT-rich control region. Comparative analyses of the 37 genes demonstrated the mitochondrial genomes to be composed of variable and conserved genes. Despite the small size, the ATP8, ATP6 plus NADH5 protein-encoding genes were polymorphic, whereas tRNAs and rRNAs were conserved. The control region contained some poly-T stretch. The Bayesian phylogenetic tree corroborated that both the Coronator Group and the Culex pipens complex are monophyletic taxa. The mitochondrial genomes of Cx. coronator, Cx. usquatus, Cx. usquatissimus and Cx. camposi share the same gene composition and arrangement features that match to those reported for most Culicidae species. They are composed of the same 37 genes and the AT-rich control region, which contains poly-T stretches that may be involved in the functional role of the mitochondrial genome. Taken together, results of the dN/dS ratios, the sliding window analyses and the Bayesian phylogenetic analyses suggest that ATP6, ATP8 and NADH5 are promising genes to be employed in phylogenetic studies involving species of the Coronator Group, and probably other species groups of the subgenus Culex. Bayesian topology corroborated the morphological hypothesis of the Coronator Group as monophyletic lineage within the subgenus Culex.


Background
Culex camposi Dyar, Culex coronator Dyar and Knab, Culex covagarciai Forattini, Culex ousqua Dyar, Culex usquatus Dyar and Culex usquatissimus Dyar are members of the Coronator Group of the subgenus Culex [1]. Morphological features of the fourth-instar larva, pupa and female cannot distinguish among species of the Coronator Group. However, traits of the male genitalia allow species identification. Species of the Coronator Group are largely distributed and some are sympatric in the Neotropics [2]. In southeastern Brazil, for instance, the geographical distribution of Cx. usquatus, Cx. coronator and Cx. camposi overlaps [1]. Phylogenetic relationships among the species remain unresolved, because results of studies employing DNA sequences of the cytochrome c oxidase subunit 1 (COX1) mitochondrial gene [3,4] and of the internal transcribed spacer 2 (ITS2) of the ribosomal DNA [5] showed poorly supported clades. Likely because of the lack of studies, there is no evidence of hybridization among species of the Coronator Group.
Culex coronator is a potential vector of arboviruses. As such, this species may participate in the dynamics of the transmission of the Saint Louis Encephalitis Virus (VESL), the Venezuelan Equine Encephalitis Virus (VEEV), the Mucambo Virus (MV) [6,7], and the West Nile Virus (WNV) [8]. Experiments of susceptibility of Cx. coronator to the WNV showed that the population from Florida, USA, is competent in disseminating the WN-FL03p2-3 strain under some conditions; however, the transmission rate decreases at lower temperatures (0-17 % at 25°C and 28-67 % at 28°C) [8]. The impossibility of accurately identifying a species can lead to inconclusive studies regarding the vector status of it. The problematic identification of species of the Coronator Group may be causing difficulties in defining the medical importance of this group.
The mitochondrial genome of the majority of the metazoan organisms is a small genome (ranging from 15 to 20 kb). It contains 37 genes, which encode 13 Protein-Coding Genes (PCGs) involved in oxidative phosphorylation, 2rRNA (rrnL and rrnS) and 22 tRNAs genes, which are necessary for the translation of the proteins encoded by the genes [9,10]. The mitochondrial genome also encompasses a region that is rich in adenine (A) and thymine (T). This AT-rich region starts and controls the replication of the mitochondrial genome [11]. Because of the lack of recombination and fast evolutionary processes in comparison to those that involve nuclear genes, the mitochondrial genome seems to be a source of information that can be employed to address evolutionary processes of both vertebrates and invertebrates [12][13][14].
Although being largely employed in molecular taxonomy and phylogeny, only Culex quinquefasciatus Say and Culex pipiens L. possess their mitochondrial genome available in the Genbank. Considering the potential taxonomic and phylogenetic content of genes of the mitochondrial genome for Culex species and the role of Cx. coronator as vector of arboviruses, four species of the Coronator Group were the focus of this study. Next generation sequencing technology was applied to sequence the mitochondrial genomes of Culex coronator, Culex usquatus, Culex camposi, and Culex usquatissimus. The objectives of the study were to: 1) describe the mitochondrial genome of four species of the Coronator Group; 2) to analyze the mitochondrial genome of four species of the Coronator Group; 3) to compare the genomes of species of the Coronator Group with other species of the Culicidae family available in the Genbank; 4) to assess the monophyly of the Coronator Group.

Mosquito sampling and DNA extraction
Mosquitoes and collection data from individuals employed in the study are in Table 1 and Fig. 1. Immature stages were field collected as either larvae or pupae and kept in the laboratory to obtain adults associated with larval and pupal exuviae for species identification. Only males were employed to sequence the mitochondrial genome because species identification is based on traits of the male genitalia (Table 1). Fourth-instar larval, pupal exuviae and dissected male genitalia were mounted on microscope slides in Canada balsam and are deposited in Coleção Entomológica de Referência, Faculdade de Saúde Pública, Universidade de São Paulo, Brazil, as vouchers. Males were kept in 95 % ethanol and stored at −80°C until DNA extraction.

PCR amplification and sequencing
Genomic DNA was extracted from each mosquito individually using QIAgen Dneasy® Blood and Tissue kit (QIAgen Ltd., Crawley, UK) following the same

Genome assembly and annotation
Mira v4 software was used for sequence assembly and Tablet to view the assembly [18,19]. The MITOS website [20] was used to delimit the genes. Annotation of protein coding genes was conducted using the Wise2 v2.2 package and the HMMER2 models [21] (HMMER v3.1b2; http://hmmer.janelia.org/).

Sequence analysis
Nucleotide composition values were calculated using the MEGA 6.06 software [22]. The GC-and AT-skews were used to determine strand asymmetry [23]. To measure these values, the following formulas were used: GC-skew (G-C) / (G + C) and AT-skew (A-T) / (A + T). The secondary structure of tRNAs was inferred in tRNAscan-SE 1.21 [24]. A sliding window of 200 bp and steps of 25 bp was implemented to estimate nucleotide divergence. The analysis was performed using the five complete mitochondrial genome sequences in DnaSP v. 5.10.01 [25]. Ratios between non-synonymous (dN) and synonymous (dS) substitution among protein-coding sequences of all pairs of species were calculated in KaKs calculator [26] using a modified Yang-Nielsen algorithm. This ratio is currently employed to estimate if genes are under negative (purifying) selection (dN/dS < 1), positive (adaptative) selection (dN/dS > 1), or neutral evolution.

Phylogenetic analysis
Phylogenetic analysis was carried out using Bayesian inference from the concatenated sequences of all protein coding genes of the five newly sequenced Culex specimens, plus the mitochondrial genomes of four species available in the GenBank, as follows: . The latter two species were employed as outgroups. Multiple alignments were generated in Clustal X [27] and then edited visually in BioEdit [28] to ensure that genes were in the frame. The best-fit model of nucleotide substitution for each gene was estimated in JModeltest 0.1.1 package [29], under Akaike Information Criterion (AIC). Bayesian analysis was performed in MrBayes v. 3.1.2 [30] and the dataset was partitioned by genes. Two independent runs with one cold and three heated chains each, were implemented for 1,000,000 generations with trees sampled every 100 generations; the consensus topology was generated after a burnin of 25 % of the retained trees.

Mitochondrial genome organization and nucleotide composition
The mitochondrial genomes of Cx. coronator, Cx. usquatus, Cx. usquatissimus and Cx. camposi contain 37 genes, which have been found in other metazoan and Culicidae species, including 13 protein-encoding genes, 22 tRNAs and 2 rRNA genes ,( rrnL and rrnS) and the AT-rich control region [31][32][33][34] (Table 2). No difference was observed in the length of the 13 protein coding genes. Furthermore, differences in gene length were primarily caused by insertion/deletions in the intergenic spacers and in the AT-control region. In all species, nucleotide composition was biased toward overall AT content. This AT-bias has been reported in the mitochondrial genomes of other metazoan species [34][35][36][37]. In Cx. coronator, Cx. usquatissimus AC from Acre state and Cx. usquatissimus RO from Rondônia state, the AT content value was 78.6 %, whereas in Cx. usquatus and Cx. camposi it was 78.9 % (Table 3). These values were similar to those found in other Culicidae species with the mitochondrial genome reported [31][32][33][34]. Considering species of the Anopheles albitaris complex, the AT content ranged from 77.1 % to 77.4 %; whereas in Cx. quinquefasciatus it was 77.7 %, and in Ae. aegypti, 79 %. Similar to Anopheles species [34] and other orders of Insecta [36], the value of the AT content increased in the third codon position of the PCGs varying from 81.4 % to 86.6 %, being slightly lower than in the control region (89.2 % to 90.1 %). One explanation for this phenomenon is the "transcription hypothesis of the codon usage" [38]. According to Sun et al. [38], the high availability of ATPs, along with the lack of other NTPs in cells, leads to the maximization of the use of adenines in the third codon position in order to increase the efficacy of the transcription. The overall AT-skew was positive in all species (0.0025 for Cx. usquatissimus RO and Cx. usquatissimus AC; 0.0012, 0.003 and 0.0038, in Cx coronator, Cx. camposi and Cx. usquatus, respectively), whereas the GC-skew was −0.15 for all species, showing that adenine and cytosine are in the highest proportion compared to thymine and guanine, respectively, in the majority strand. Those findings are in agreement with other studies of species of the genera Anopheles and Drosophila [34].

The protein-coding genes (PCGs)
Except for the COX1 gene in which no canonical ATN was found, the majority of the PCGs of Culex species possessed the start codon ATN (Additional file 1). The absence of the ATN start codon in the COX1 gene was found in Cx. quinquefasciatus, Cx. pipiens, Ae. aegytpti and in other insects [31][32][33][34][35] and therefore it has been widely discussed [39]. In 1983, de Brujin [40] proposed a tetranucleotide (ATAA) positioned upstream in the proximity of the opening codon of COX1 (TCG) as a start codon for Drosophila melanogaster. The author hypothesized that if an ATA, a recognized start codon, overlaps a stop codon TAA in the frame −1, the ATA of the tetranucleotide ATAA could function as a highly conserved 0 frame and as an opening frame. Notwithstanding, in the mitochondrial genome of Culex species, the closest ATAA tetranucleotide is located within the trnL2 gene, overlapping 36 base pairs of the gene. Consequently, it is plausible to suppose that the hexanucleotide ATTTAA, which usually flanks the beginning of the COX1 in mosquitoes, is involved in the initiation of the translation signal of this gene, as it was proposed for Tetrodontophora bielanensis [39].
Eleven genes (ATP6, ATP8, COX3, CytB, NAD2, NAD3, NAD4, NAD4L, NAD5, NAD6) have complete TAA stop codons, except for COX1 and COX2 genes, which terminate with an incomplete codon T. This phenomenon has been extensively reported in other mitochondrial genomes, especially in COX genes, and so  far it has been related to posttranscriptional polyadenylation, during which residual adenines are added in order to originate TAA terminators [34,36,41]. A total of 322 nucleotide sites were variable across PCGs of all Culex species analyzed, in which 270 were at the 3rd, 51 at the 1st and 1 at the 2nd codon position, encompassing most of those transition mutations. As expected, genes with a larger length (NAD2, COX1, NAD4, NAD5, CytB and NAD1) concentrated the majority of the polymorphisms; however COX1,only presented synonymous substitutions, whereas NAD2, NAD5, NAD4, CytB, and NAD1 showed 3, 2, 1, 3, and 2 non-synonymous substitutions, respectively. Additionally, despite the small length, ATP6 and ATP8 were polymorphic genes in Culex species, with 23 (2 non-synonymous) and 4 variable (2 non-synonymous) sites. Additionally, the dN/dS ratios (Fig. 2) were very low, ranging from 0.0 to 0.09, suggesting that all PCGs are under strong negative selection. Despite that, the strength of selection varies among genes; COX1, COX2, COX3, NADH1, NADH3, NADH4, NADH4L and NADH6 are the mitochondrial genes that are under the strongest purifying selection.
Results of the analysis of the relative synonymous codon usage (RSCU) showed that the codons with adenine and thymine in the third position were overused when compared to other synonymous codons (Table 4), which commonly occurs in other metazoan mitochondrial genomes [36,37,42]. This can be observed in the leucine amino acid, i.e., while the TTA codon presented a RSCU value of 3.75, the codon CTG, which also translates a leucine, showed a RSCU value of 0.14.

RNA genes
The set of 22 tRNA genes in all species varied in length from 64 bp in trnR gene to 72 bp in trnV gene (Additional file 1) . The AT content ranged from 78.6 % to 79.1 % ( Table 3). The anticodons of all tRNAs in Culex species were the same as that of other insects [31][32][33][34][35] (Additional file 1). The conventional cloverleaf-like structure of tRNAs was found in 20 (Additional file 2) of the 22 tRNAs described for Culex species. In contrast, the trnR and trnS1 genes could not be folded into cloverleaf-like structure. In the trnS1, several similar cases were reported in previously sequenced metazoan mitochondrial genomes because of the replacement of the DHU (dihydrouridine) arm by a DHU-loop [43,44]. The absence of a secondary structure in the trnS1 has been reported for Collembola species [39].
Similar to the mitochondrial genome of other metazoan species [32][33][34][35][36][37], the rrnL is located between trnL1 and trnV ribosomal genes, and the rrnS is situated between the trnV gene and the AT-rich region. In all species, the rrnL and rrnS are 2,124 base pairs long (Additional file 1). The AT content was 82 %, a value that is slightly higher than the overall content, however, similar to that observed in the 3rd codon position (Table 3). Both tRNAs and rRNAs genes are more conserved than the PCGs, with only 10 variable sites in tRNAs (throughout 1,462pb -1,463 bp) and 11 substitutions among 2,124 bp of the two rRNAs genes. Evolutionary studies relative to the tRNA and rRNA in Drosophila and Anopheles showed signatures of negative selection, because non-pairing substitutions such as G-C to G-U can affect the secondary structure of these genes [41,45].

Intergenic spacer sequences
A total of 52 bp to 56 bp non-coding nucleotides were found in the mitochondrial sequences of the species included in this study ( Table 2). The length of each spacer mostly varies from 1 to 5 bp, except for two spacers: one located between the trnk and the tranD genes (12 bp in Cx. coronator, and 11 bp in the remaining species); and the other between trnS2 and NAD1 genes with 19 bp in all samples. Although those spacers account for only 0.3 % of the whole mitochondrial genomes, they are composed almost exclusively of adenine and thymine, with an AT Fig. 2 Ratio of non-synonymous / synonymous (dN/dS) nucleotide substitutions. Ratios were calculated among the 13 protein-coding genes of the mitochondrial genome of four species of the subgenus Culex: Cx. coronator, Cx. usquatus, Cx. camposi, and Cx. usquatissimus content that is higher than that observed in the ATcontrol region, reaching 94.5 % in Cx. usquatissimus from Acre (Table 3).

AT-rich control region
The size variation, ranging from 719 bp in Cx. usquatus to 724 bp in Cx. coronator (Table 2) is within the range variation observed in the mitochondrial genomes reported for other Culicidae (from 531 bp in Anopheles gambiae to 731 bp in Culex quinquefasciatus). Those differences in length were caused by several indels of one to three nucleotides distributed throughout the AT regions of the mitochondrial genomes. Considering both indels and nucleotide substitutions, 34 variable sites were observed in Culex species. Similar to nuclear noncoding DNA, the AT-control region is subjected to a high level of polymorphisms that lead to variation in size and base composition. However, in species of the Anopheles gambiae complex, there is evidence that this region change more slowly than the 3rd codon position of PCGs [46]. Additionally, among different subgenera of Anopheles, there are conservative poly-T regions of 15 to 18 bp [34]. Comparing the control region of the four Culex species studied, we observed 4 conservative T-stretch regions of 6 to 15 bp. This was likely caused because this region contains several regulatory elements, including the origin of replication and transcription of the major strand of the mitochondrial genome [37] and, consequently, has a functional role, which is conserved among genera.

Sliding window analysis
Mitochondrial DNA is largely employed in phylogeny and evolutionary studies because it has a high mutation rate, maternal inheritance, high number of copies, and intraspecific polymorphisms [47]. Herein, we conducted   were employed as outgroups. Complete mitochondrial genomes of Cx. quinquefasciatus, two specimens of Cx. pipens, Ae. albopictus and An. darlingi were downloaded from the GenBank a sliding window analysis using the alignment of the entire mitochondrial genome of Culex species to verify regions with high nucleotide divergence and thus, identified potential markers that can be employed for future studies focusing on Culex species. Results of the sliding window analysis showed low nucleotide diversity (Pi), ranging from 0.0 to 0.035 (Fig. 3) among species of the Coronator Group. However, nucleotide substitution rates varied across both mitochondrial genomes and within individual genes. Remarkably, the 5′ end of the NADH5 fragment showed the highest nucleotide diversity (0.035) among all mitochondrial genes. The COX1 presented low variation, mainly in its 5′ proximal half. This region is largely employed in studies focusing in Culicidae species, and it is considered as the universal barcode for species identification [48]. Notwithstanding, other descriptive studies focusing on the mitochondrial genome showed that the COX1 is a conserved gene [49][50][51][52][53]. Consequently its utility as a universal barcode needs to be reviewed. The rrnL and rrnS demonstrated to be the most conserved genes within the mitochondrial genomes of species of the Coronator Group, showing low nucleotide diversity. Although rrnL usually shows high variability among some groups, i.e., Taeniidae [49], Nematoda [50], and Octopodus [54], herein, the highest nucleotide diversity found in the rrnL was 0.02. Per se, the rrnL and rrnS genes seem not to be good markers for taxonomy and evolutionary studies focusing on species of the Coronator Group.
Considering both the epidemiological importance of Cx. coronator and the problematic identification of females based on morphological characters, the development of molecular markers can facilitate species identification as well as establish phylogenetic relationships within the group and other closely related species. Taken together, results of the sliding window analysis and the dN/dS ratios indicate that the NADH5, ATP8 and ATP6 genes represent potential markers for future investigations regarding species of the Coronator Group, primarily because they retain high variations among PCGs with an elevated proportion of non-synonymous substitutions. Notwithstanding, the low levels of diversity shown herein, demonstrates that the utility of sequences of the mitochondrial gene needs further evaluation in future studies employing a larger sample size and other species of the subgenus Culex as well as other Culex (Culex) species.

Phylogenetic analysis
Best-fit models selected by the Akaike Information Criterion are showed in Additional file 3. The Bayesian topology generated employing the concatenated PCGs (Fig. 4) yielded two strongly supported (100 % posterior probability) monophyletic clades with the Coronator Group as the sister clade of the lineage composed of species of the Cx. pipiens complex. Within the clade formed by species of the Coronator Group, the placement of Cx. coronator as sister of the lineage leading to Cx. camposi plus Cx. usquatus plus Cx. usquatissimus was strongly supported (100 % Bayesian posterior probability). Additionally, divergence among species from this group under Maximum Composite Likelihood (MCL) was low (Table 5), ranging from 0.2 % between Cx. usquatus and Cx. camposi to 1.4 % between Cx. coronator and Cx. usquatissimus (from Acre and Rondônia). The low nucleotide divergence observed among the mitochondrial genes of species of the Coronator Group supports the close phylogenetic relationships observed in previous studies [3][4][5]. As hypothesized by Lloyd et al. 2012 [55], the high statistical support for the clades suggests that many other species can be included in the PCGs analyses of species of the subgenus Culex. However, if more species were included in the study, likely the phylogenetic signal of the mitochondrial genes would be changed. For instance, studies based on morphological phylogeny [56] and COX1 barcode sequence data of several species of the genus Culex [4] were not capable of supporting phylogenetic relationships among species of the Coronator Group and other closely related species such as Cx. surinamensis and Cx. maxi. Those results suggest that Cx. surinamensis and Cx. maxi may share their ancestor with current species included in Coronator Group and as such, they should be included in the group. However, further studies need to be developed to address this hypothesis. In order to address the effectiveness of the ATP6, ATP8 and NADH5 for reconstructing phylogenetic relationships within the Coronator Group, a Bayesian analysis was performed only with these genes (Fig. 5). Within the Coronator Group, the topology yielded was the same of that performed with all genes, demonstrating the potential of those genes for further evolutionary studies with this group. Interestingly, for Cx. pipiens complex species, results of Bayesian analysis recovered a polytomy between Cx. quinquefasciatus and Cx. pipiens. Those results are important to show how evolutionary behavior of the genes are different among groups of species, even within the same subgenus, and how carefully researchers need to be when selecting a molecular marker.

Conclusions
The complete mitochondrial genomes of Cx. usquatus, Cx. usquatissimus, Cx. coronator, and Cx. camposi share the same gene order and gene content of other Culicidae species with the mitochondrial genome reported to date. The overall length and the AT content of the four Culex species are within the range observed in species of the Anopheles albitarsis complex, Cx. quinquefasciatus, and Ae. aegypti. Except for trnR and trnS1, the tRNAs were folded into the typical cloverleaf-like structure. Similar to other metazoan and species of the family Culicidae, the control region presented some conservative poly T-stretches, which seems to be involved in the functional roles of mitochondrial genome. Results of the Bayesian phylogeny corroborated the monophyly of the Coronator Group. Nevertheless, new insights on the evolution of the group can be addressed using a larger species sampling, including morphologically similar species and species of other Culex subgenera, for example, Lutzia, Phenacomyia and Phytothelmatomyia species. Probably because of the functional roles played in the respiratory chain by the mitochondrial genome, PCGs showed high purifying selection and low nucleotide divergence among the species investigated here.