COMPARATIVE CHLOROPLAST GENOMIC ANALYSES REVEALED EXTENSIVE GENOMIC ARRANGEMENT IN SOME CORE AND NON-CORE CARYOPHYLLALES

The order Caryophyllales exhibits considerable diversity, from morphology to molecules, which complicates its taxonomic circumscription, especially at the family level. A comparative analysis of the available chloroplast (cp) genomes to detect patterns of genomic arrangement and variation has been lacking; hence, the alignment pattern and genomic rearrangements across the Caryophyllales were examined, and the phylogenetic relationships among the families of the Caryophyllales were inferred from the maximum number of cp genes. Comparison of the Caryophyllales cp genomes, based on representatives of 10 families with Taxillus chinensis as the reference genome, revealed that the coding regions were more conserved than the non-coding regions; however, clpP, rpl16 and ycf15 were the most divergent coding regions among all taxa. Further, genomic rearrangements occurred in the gene organization of taxa from different families of Caryophyllales, with extensive rearrangements observed in Amaranthaceae, Caryophyllaceae, Chenopodiaceae, Droseraceae and Cactaceae.

E-mail: majmalali@rediffmail.com, ajmalpdrc@gmail.com, alimohammad@ksu.edu.sa

Introduction
The order Caryophyllales (core eudicots) is a diverse clade of angiosperms that includes c. 12,500 species in c. 749 genera and c. 40 families [viz. Achatocarpaceae, Agdestidaceae, Aizoaceae, Amaranthaceae, Anacampserotaceae, Ancistrocladaceae, Asteropeiaceae, Barbeuiaceae, Basellaceae, Cactaceae, Caryophyllaceae, Chenopodiaceae, Corbichoniaceae, Didiereaceae, Dioncophyllaceae, Droseraceae, Drosophyllaceae, Frankeniaceae, Gisekiaceae, Halophytaceae, Limeaceae, Lophiocarpaceae, Macarthuriaceae, Microteaceae, Molluginaceae, Montiaceae, Nepenthaceae, Nyctaginaceae, Petiveriaceae, Physenaceae, Phytolaccaceae, Plumbaginaceae, Polygonaceae, Portulacaceae, Rhabdodendraceae, Sarcobataceae, Simmondsiaceae, Stegnospermataceae, Talinaceae, Tamaricaceae] (APG, 2016; Walker et al., 2018; Yao et al., 2019). The members of the order exhibit considerable diversity, from morphology to molecules (Hernández-Ledesma et al., 2015; Smith et al., 2018), which leads to taxonomic complexities in circumscription, especially at the family level but also at the generic and specific levels. Hence, investigating relationships at different taxonomic levels has remained of great interest from the era of pre-phylogenetic classification (Behnke, 1976) to that of phylogeny-based classification (Giannasi, 1992; APG 1998, 2003, 2009, 2016; Cuénoud et al., 2002; Brockington et al., 2009; Schäferhoff et al., 2009; Arakaki et al., 2011; Crawley and Hilu, 2012a,b; Ruhfel et al., 2014; Yang et al., 2015, 2018). As a result, new taxa have been identified and described at all taxonomic levels, and the circumscription of the order Caryophyllales has changed radically (Hernández-Ledesma et al., 2015; Liu et al., 2015; Walker et al., 2018; Yao et al., 2019). Despite this, many of the relationships among the families of Caryophyllales remain uncertain, and a comparative analysis of the available chloroplast (cp) genomes to detect patterns of genomic arrangement and variation is lacking. Hence, the present study was undertaken to infer the alignment and genomic rearrangements across selected families of the order Caryophyllales, and the phylogenetic relationships among these families based on cp genes.

Comparative analysis of cp genome
The retrieved cp genomes of the representative families of Caryophyllales were compared with that of one of the outgroup taxa, T. chinensis (GenBank NC_036306.1) from the order Santalales, as the reference genome, using the mVISTA program in Shuffle-LAGAN mode (Brudno et al., 2003; Frazer et al., 2004). Genomic rearrangements were detected using MAUVE (Darling et al., 2004; Fig. 2).
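The kind of conservation profile that such a comparison produces can be illustrated with a sliding-window percent identity over an existing pairwise alignment. The following is a minimal sketch, not the mVISTA implementation: the function name `window_identity`, the window and step sizes, and the toy sequences are assumptions made for illustration, and gaps in only one sequence are simply counted as mismatches.

```python
def window_identity(ref, qry, window=100, step=100):
    """Percent identity of qry against ref in fixed windows.

    Both inputs are rows of one pairwise alignment (equal length, '-' for gaps).
    Columns where both rows are gaps are skipped; a gap in only one row
    counts as a mismatch. Returns a list of (window_start, percent_identity).
    """
    if len(ref) != len(qry):
        raise ValueError("aligned sequences must have equal length")
    ref, qry = ref.upper(), qry.upper()
    results = []
    for start in range(0, len(ref) - window + 1, step):
        matches = 0
        compared = 0
        for a, b in zip(ref[start:start + window], qry[start:start + window]):
            if a == "-" and b == "-":
                continue
            compared += 1
            if a == b:
                matches += 1
        pct = 100.0 * matches / compared if compared else 0.0
        results.append((start, pct))
    return results


# Toy alignment (hypothetical sequences, not data from this study).
profile = window_identity("ACGTACGTAC", "ACGTTCGAA-", window=5, step=5)
print(profile)  # [(0, 80.0), (5, 60.0)]
```

Windows with low identity correspond to the divergent regions in a conservation plot; in a real analysis the alignment itself would come from the aligner rather than being typed in by hand.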

Molecular phylogenetic analyses
The coding regions of 39 plastid-coding genes (Table 2) were extracted from the retrieved assembled cp genomes and aligned using CLUSTAL X (Thompson et al., 1997). Molecular phylogenetic analyses were conducted in MEGA X (Kumar et al., 2018) using Maximum Parsimony (MP) analysis (Eck and Dayhoff, 1996; Nei and Kumar, 2000) with the bootstrap method (Felsenstein, 1985), and Maximum Likelihood (ML) analysis using the maximum composite likelihood method (Tamura et al., 2004). Taxillus chinensis (Loranthaceae), T. sutchuenensis (Loranthaceae) and Erythropalum scandens (Erythropalaceae), all from the order Santalales, were used as outgroups in the phylogenetic analyses.
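Combining per-gene alignments into a single data matrix can be sketched as below. This is a hedged illustration only: the text does not state how the combined matrix was assembled, and the function name `concatenate_alignments`, the use of `?` for a gene missing from a taxon, and the toy gene and taxon names are assumptions.

```python
def concatenate_alignments(gene_alignments, missing="?"):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Taxa lacking a gene are padded with the `missing` character.
    Returns (supermatrix, partitions), where partitions lists
    (gene, first_column, last_column) using 1-based inclusive columns.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    parts = {t: [] for t in taxa}
    partitions = []
    position = 0
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {gene} are not equally long")
        length = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * length))
        partitions.append((gene, position + 1, position + length))
        position += length
    supermatrix = {t: "".join(chunks) for t, chunks in parts.items()}
    return supermatrix, partitions


# Toy input (hypothetical taxa and sequences, not data from this study).
genes = {
    "rbcL": {"taxonA": "ATGC", "taxonB": "ATGA"},
    "matK": {"taxonA": "GG-T", "taxonB": "GGAT", "taxonC": "GGAT"},
}
matrix, parts = concatenate_alignments(genes)
print(matrix["taxonC"])  # GGAT????
print(parts)             # [('matK', 1, 4), ('rbcL', 5, 8)]
```

Recording the partition boundaries keeps the option of analysing genes separately later, even when the tree is inferred from the combined matrix.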

Comparison of Caryophyllales chloroplast genomes
The genomic features [viz. total cp genome size (bp), gene size (bp), spacer size (bp), total number of genes, number of tRNA genes, number of protein-encoding genes, number of rRNA genes and total GC content (%)] of the selected sequences included in the present analysis were compared (Table 1). The total cp genome size ranged from 113064 bp in Carnegiea gigantea (Cactaceae) to 161541 bp in Rheum palmatum (Polygonaceae). The coding gene size varied from 68877 bp in Carnegiea gigantea (Cactaceae) to 114159 bp in R. palmatum (Polygonaceae). The spacer size ranged from 41173 bp in Dionaea muscipula (Droseraceae) to 76337 bp in Amaranthus hypochondriacus (Amaranthaceae). Further, except for the number of rRNA genes, which was four in all the analyzed taxa, the total number of genes, the number of tRNA genes, the number of protein-encoding genes and the total GC content ranged over 98-113, 20-30, 67-80 and 36-37%, respectively. Despite the general constancy of gene content, structure and organization in the chloroplast genomes of flowering plants, enormous variation has also been noted, especially in total cp genome coding size, spacer size, total number of genes, number of tRNA genes and number of protein-encoding genes (Simpson and Stern, 2002; Raubeson and Jansen, 2005; Daniell et al., 2016), which could be due to genomic duplications or fractionation (Wendel et al., 2016).
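Two of the features compared above, genome size and total GC content, follow directly from the sequence. A minimal sketch, assuming the sequence is already available as a plain string; the function names and the toy sequence are illustrative and not taken from the study's workflow:

```python
def genome_size(seq):
    """Genome size in bp, counting every character of the sequence."""
    return len(seq)


def gc_content(seq):
    """GC content in percent, computed over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence (hypothetical); the two N characters are ignored for GC.
toy = "ATGCGCNNAT"
print(genome_size(toy))  # 10
print(gc_content(toy))   # 50.0
```

Whether ambiguous bases are excluded from the denominator is a design choice; for complete cp genomes with few or no ambiguous positions it makes little practical difference.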
The comparative genomic analysis revealed that the coding regions were more conserved than the non-coding regions; however, clpP, rpl16 and ycf15 were the most divergent coding regions among all taxa (Fig. 1). Further, genomic rearrangements occurred in the gene organization of taxa from different families of Caryophyllales, and extensive rearrangements were observed in the representatives of the families Amaranthaceae, Caryophyllaceae, Chenopodiaceae, Droseraceae and Cactaceae (Fig. 2). Loss of introns within protein-coding genes has previously been observed in specific plant groups or species, such as Hordeum vulgare (Saski et al., 2007), Manihot esculenta, Cicer arietinum and Bambusa sp. (Wu et al., 2009). Moreover, intron loss (such as that in clpP) occurs in diverse angiosperms, including Poaceae, Onagraceae and Oleaceae. The extensive rearrangements could be due to loss of introns and to IR expansion and contraction (Daniell et al., 2016).
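One simple way to quantify how far a gene order departs from a reference is to count breakpoints, that is, adjacent gene pairs in the query that are not adjacent (in either reading direction) in the reference. The sketch below is an illustrative measure only and is not the algorithm used by MAUVE; the function names and the toy gene orders are assumptions, and the genome is treated as linear rather than circular.

```python
def adjacencies(order):
    """Set of gene adjacencies in a signed gene order.

    order is a list of (gene, strand) tuples with strand +1 or -1.
    Each adjacency is stored in both reading directions, so an
    inverted block keeps its internal adjacencies.
    """
    adj = set()
    for (g1, s1), (g2, s2) in zip(order, order[1:]):
        adj.add(((g1, s1), (g2, s2)))
        adj.add(((g2, -s2), (g1, -s1)))
    return adj


def breakpoints(reference, query):
    """Number of adjacent pairs in query that are absent from reference."""
    ref_adj = adjacencies(reference)
    return sum(1 for pair in zip(query, query[1:]) if pair not in ref_adj)


# Toy gene orders (hypothetical): the query carries an inversion of matK-rps16.
reference = [("psbA", 1), ("matK", 1), ("rps16", 1), ("psbK", 1)]
query = [("psbA", 1), ("rps16", -1), ("matK", -1), ("psbK", 1)]
print(breakpoints(reference, reference))  # 0
print(breakpoints(reference, query))      # 2
```

A single inversion creates two breakpoints, one at each end of the inverted segment, so higher counts indicate more extensively rearranged genomes.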

Phylogenetic analysis
The aligned combined sequence data matrix had 32374 positions. The MP analysis resulted in a most parsimonious tree (MPT) of length 20970 (CI: 0.592, RI: 0.700), and the ML analysis yielded a tree (highest log likelihood -182780.57) whose topology was congruent with the MPT (Fig. 3). The molecular phylogenetic relationships among the major clades/families of the order Caryophyllales were well resolved and appear to be strongly supported in the present ML analysis, and were congruent with recent phylogenomic (Yao et al., 2019) and phylotranscriptomic analyses of Caryophyllales. The analysis also inferred strong support for the carnivorous clade Droseraceae (100% BS) as sister to Polygonaceae, and for Caryophyllaceae as sister to Amaranthaceae and Chenopodiaceae (100% BS). Molecular phylogenetic studies based on chloroplast markers and extensive sampling (Kadereit et al., 2003, 2012), as well as morphological similarities [petaloid tepals, filament tubes, 2-locular anthers; compare with Table 5 of Kadereit et al. (2003)], place the family Caryophyllaceae closer to the Amaranthaceae s.s., while in terms of habitat preferences they are more like many members of the Chenopodiaceae. The families Montiaceae and Talinaceae were resolved as a grade, and a clade in which Cactaceae was sister to Portulacaceae was recovered as sister to Talinaceae. The placements of all families of the order appear to be strongly supported. Moreover, the monophyly of all major clades within the order (e.g., Centrospermae, the carnivorous clade, the FTPP clade, the globular inclusion clade, and the Portulacineae clade) appears to be supported (Fig. 3). Additionally, extensive genomic rearrangements were observed in the Amaranthaceae, Caryophyllaceae and Chenopodiaceae clades (Fig. 3). Moreover, the rearrangements and gene/intron losses were correlated with the ML tree.
Protein-coding gene loss, intron loss, intron inversion, pseudogene formation, and IR contraction, expansion and loss have also been reported previously in Caryophyllales (Yao et al., 2019).
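For readers unfamiliar with the tree statistics reported with the MP tree, the consistency index (CI) and retention index (RI) are defined from three step counts summed over characters: the minimum possible number of changes m, the maximum possible number g, and the number observed on the tree s (the tree length). The sketch below uses these standard definitions with toy numbers; the function names are illustrative, and how the software handles uninformative sites when reporting CI and RI is not covered here.

```python
def consistency_index(min_steps, observed_steps):
    """CI = m / s: 1.0 means no homoplasy; lower values mean more homoplasy."""
    return min_steps / observed_steps


def retention_index(min_steps, max_steps, observed_steps):
    """RI = (g - s) / (g - m): fraction of potential synapomorphy retained."""
    return (max_steps - observed_steps) / (max_steps - min_steps)


# Toy step counts (hypothetical, not values from this study).
m, g, s = 10, 30, 16
print(consistency_index(m, s))   # 0.625
print(retention_index(m, g, s))  # 0.7
```

Read this way, the reported CI of 0.592 for a tree of length 20970 would imply a minimum of roughly 0.592 × 20970 ≈ 12414 steps, under the assumption that CI was computed over all sites.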