Short Communication : Organization and Variation of the Mitochondrial DNA Control Region in Ardeidae ( Aves : Ciconiiformes ) and their Phylogenetic Relationship

Nycticorax Nycticorax nycticorax Nny 2 tRNAGlu Cytb, tRNAGlu tRNAPhe 1192, 2200 KJ190954 Zhou et al., 2014 Ardea Ardea cinerea Aci 2 1178, 1500 NC025900 Zhou et al., 2014 Ardea purpurea Apu 2 1162, 1543 NC025919 Zhou et al., 2014 Casmerodius Casmerodius alba Eal 2 1346, 2033 NC025916 Zhou et al., 2014 Egretta Egretta garzetta Ega 2 1548, 1815 KJ190950 Zhou et al., 2014 Egretta eulophotes Eeu 1 tRNAGlu tRNAPhe 1997 NC009736 Zhou et al., 2007 Egretta novaehollandiae Eno 1 tRNAGlu tRNAPhe 1922 NC008551 Gibb et al., 2007 Egretta sacra Esa 2 tRNAGlu Cytb, tRNAGlu tRNAPhe 1383, 1809 NC025920 Zhou et al., 2014


Materials and methods
All the sequences were retrieved from the GenBank (species and accession numbers in Supplementary Table S1).We only analyzed the control region sequence from the whole mitochondrial genome in order to ensure the accuracy.A total of 20 species from 12 genera belonging to family of Ardeidae was analyzed (Supplementary Table S1).
Sequences were aligned by the CLUSTAL X procedure (Thompson et al., 1997).DnaSP v5.0 (Librado and Rozas, 2009) was used to define the variable sites.Only the sequences the genes for tRNA Glu -tRNA Phe was used to analyze, once the species had duplicate control regions.Nucleotide composition was calculated using MEGA7.0(Kumar et al., 2016).Genetic distance between species was calculated using TN93 (Tamura and Nei, 1993) model in MEGA7.0 (Kumar et al., 2016).The conserved sequence boxes found were compared with the previously published avian boxes (e.g., Randi and Lucchini, 1998;Donne-Goussé et al., 2002;Ruokonen and Kvist, 2002).

O n l i n e F i r s t A r t i c l e
Akaike information criterion (AIC), (Posada and Buckley, 2004) were used to identify the appropriate nucleotide substitution models.Maximum likelihood tree (ML) (Strimmer and Haeseler, 1996) was obtained using heuristic searches, based on the substitution model proposed by Modeltest 3.0 (Posada and Crandall, 1998).We calculated ML tree using MEGA7.0(Kumar et al., 2016).Statistical support for the internodes in phylogenetic tree was tested by bootstrap percentages (BP) with 1,000 replicates (Felsenstein, 1985).

Results and discussion Alignments
The alignment of control region of Ardeidae was straightforward.Most the Ardeidae species had duplicate control regions, except four species (Egretta eulophotes, Egretta novaehollandiae, Ixobrychus cinnamomeus, Ixobrychus flavicollis) having only single control regions.The single control region spans the region between the genes for tRNA Glu -tRNA Phe of the four Ardeidae species (Supplementary Table S1), which was inconsistent with most of the avian species between tRNA Glu and tRNA Phe (e.g., Huang and Ke, 2016a).
The length of the control region sequences are high variable, ranging from 1427 bp (Ixobrychus flavicollis) to 3871 bp (Botaurus stellaris), with an average size of 2737 bp.The control region size of Ardeidae and its variation is larger than that of avian Phasianidae family (ranged from 1144bp to 1555 bp; Huang and Ke, 2016a).Control region are usually considered to be the most variable parts of mtDNA (Randi and Lucchini, 1998).Extensive size variation of the mtDNA control region, attributable to variation in number of tandem repeats, has been reported for many animals (e.g., Boyce et al., 1989).

Base composition and genetic distance
The average nucleotide composition of Ardeidae control region sequences was: 32.70%A, 30.16%T, 14.54%G, and 22.60% C, with a bias against G.The amount of A+T was more than that of G+C among whole control region, which is same as avian control region (e.g., Marshall and Baker, 1997;Ruokonen and Kvist, 2002;Huang and Ke, 2016a, b).The lack of guanines is evident in each domain, especially in domain III.Thymines and adenines are most prevalent in the first domain, thymines and cytosines in the second, and adenines and cytosines in the third (Supplementary Table S2).
Nucleotide frequencies were not significantly different among species, and thus the TN93 model (Tamura and Nei, 1993) is an appropriate estimator of genetic distance (Randi and Lucchini, 1998).Ardeidae control region sequences were alignable with certainty within genus.Genetic distances between species ranged from 5.39% (between Egretta sacra and Egretta garzetta) to 41.15% (between Egretta sacra and Botaurus stellaris), showing a wide range of divergences.The average genetic

O n l i n e F i r s t A r t i c l e
distances among the species within the genera varied from 12.64% (Egretta) to 25.26% (Ixobrychus) (Supplementary Table S3).The average genetic distances showed insignificantly negative correlation with ts/tv (r=-0.853,p>0.05).

Distribution of variable sites in the control region
Control region are usually considered to be the most variable parts of mtDNA (Randi and Lucchini, 1998).However, position mutability is not distributed randomly across the whole region, but affects particular hypervariable positions and domains (Yang, 1996).The distribution of the variation in the control region seems to be the same in all the species within genera.Nucleotide substitutions occur more frequently in peripheral domains.Average substitution rate for the three domains were 0.49: 0.12: 0.39, corresponding to relative proportions of 5:1:4, respectively.Among all the genera of Ardeidae, domain I is the most variable of the three domains (Fig. 1).Marshall and Baker (1997) believed mutational hotspots might occur in domain I, whereas more variation was expected in domain III for deeper divergences.Ruokonen and Kvist (2002) found that variation of three genera was greatest in domain III.Our results partly support the opinion of Marshall and Baker (1997).

Conserved sequence
Previous comparisons of control region sequences have identified conserved sequence elements based on greater similarity of the sequence elements compared to that of the flanking areas (e.g., Ruokonen and Kvist, 2002;Huang and Ke, 2016a).We aligned the sequences of every species.Conserved sequence blocks (CSB-1) were located in the Ardeidae (Supplementary Table S4).However, we did not find the CBS-2, 3 in Ardeidae, which were detected in fishes (e.g., Zhang et al., 2011Zhang et al., , 2013)), avian (e.g., Marshall and Baker, 1997), and mammalian species (e.g., Walberg and Clayton, 1981).Five conserved sequence boxes in the domain II of Ardeidae sequences were localized (Supplementary Table S4) and identified as boxes F, E, D, C and B. Recently, six central conserved sequence boxes (F to A) were detected in fishes (Zhang et al., 2011).We did not find A-box in Ardeidae control region.
In F-box, 19 of 28 nucleotide positions were fully conserved among the Anseriformes sequences.While, there were five nucleotide positions were variable in E-box, four in D-box, seven in C-box and eight in B-box (Supplementary Table S4).S1.

O n l i n e F i r s t A r t i c l e
Z. Huang et al. (hLRTs) as implemented in Modeltest 3.0, the model Hasegawa-Kishino-Yano (HKY) model + Gamma distribution was used (HKY+ G, -lnL = 11272.92, p < 0.001, AIC = 22638.04, BIC= 23006.38).We set the shape of the Gamma distribution as 0.55 (estimated by Modeltest).Maximum likelihood method was used to reconstruct the phylogenetic trees based on HKY+ G model.Many clades were supported by bootstrap values of more than 90%.Member of Botaurinae was formed an alone clade, as a sister to Ardeinae (Fig. 1).Species of Botaurus was the first to split from the Ardeidae lineage.
All the genera could be discriminated by their distinct clades in the phylogenetic tree except Gorsachius (Fig. 1).Bock (1956) condisered thr four species gosiagi, melanolophus, magnificus, and leuconotus to comprise a genus "Gorsachius".Comparing to Egretta and Ardea, the systematics of Gorsachius has a little been contended.Control region gene supported that gosiagi and melanolophus formed one clade.However, magnificus was sister to Ardeola bacchus (Fig. 2).Leuconotus was generally solitary like goisagi and melanolophus; observations of magnificus were lacking (Payne and Risley, 1976).McCracken and Sheldon (1998) found that melanolophus and leuconotus did not formed a clade based on osteological characterstics.The present study did not include leuconotus.Our results indicated that the genus Gorsachius might not be monophyletic.To better resolve phylogenetic relationships of the genus Gorsachius, more taxon sampling as well as multiple genetic markers are needed for future studies.

Fig. 1 .
Fig. 1.Distribution of the variable sites in the control region.The number of variable sites within genera has been plotted in 50-bp intervals.
The species codes are shown in Supplementary TableS1.