Data of de novo genome assembly of the Chlamydia psittaci strain isolated from the livestock in Volga Region, Russian Federation

Chlamydiae are obligate intracellular bacteria that are widespread globally in humans, wildlife, and domesticated animals. Chlamydia psittaci is a primarily zoonotic pathogen with multiple hosts that can be transmitted to humans, resulting in psittacosis or ornithosis. Because this pathogen is a well-recognized threat to human and animal health, it is critical to characterize its genetic make-up in detail. Although many C. psittaci genomes have been studied to date, little is known about the variants of chlamydial organisms causing infection in Russian livestock. This work presents the first de novo genome assembly of C. psittaci strain Rostinovo-70, a strain of zoonotic origin isolated in the Russian Federation. Sequencing was performed using standard protocols on the Illumina HiSeq 2500 and Oxford Nanopore MinION platforms, which generated 3.88 GB and 3.08 GB of raw data, respectively. The data are available in the NCBI database (GenBank accession numbers CP041038.1 and CP041039.1). Multi-Locus Sequence Typing (MLST) showed that strain Rostinovo-70, together with C. psittaci GR9 and C. psittaci WS/RT/E30, belongs to sequence type (ST)28, which could be further separated into two different clades. Although C. psittaci Rostinovo-70 and C. psittaci GR9 formed a single clade, the latter strain did not contain the cryptic plasmid characteristic of Rostinovo-70. Moreover, the genomes of the two strains differed significantly in a cluster of 30 genes that in Rostinovo-70 were closer to Chlamydia abortus than to C. psittaci. Alignment of the C. psittaci and C. abortus genomes in this region revealed the exact borders of the homologous recombination that occurred between the two Chlamydia species. These findings provide the first evidence of genetic exchange between closely related Chlamydia species.


Data
In this study, we report for the first time a complete genome assembly for the C. psittaci wild-type strain Rostinovo-70, sequenced on both the Illumina HiSeq 2500 and Oxford Nanopore MinION platforms. Fig. 1 shows notable polymorphism, with numerous single and multiple single nucleotide polymorphisms (SNPs) in both coding sequences (CDS) and intergenic regions, between C. psittaci Rostinovo-70 and the reference genome of C. psittaci strain GR9, isolated from wild ducks in Germany [1]. Fig. 2 shows the phylogenetic structure of 12 homologous reference C. psittaci strains and the C. psittaci Rostinovo-70 strain, constructed with NDtree 1.2 and visualized with a phylogenetic tree Newick viewer. Fig. 3 shows the phylogenetic separation of C. psittaci Rostinovo-70 and the reference C. psittaci WS/RT/E30 into two different clades, whereas C. psittaci Rostinovo-70 and C. psittaci GR9 formed a single clade. Table 1 summarizes the genome statistics for the hybrid assembly of C. psittaci Rostinovo-70 as reported by QUAST. Table 2 lists the bioinformatic tools used to analyze the genome of the C. psittaci Rostinovo-70 strain. Table 3 lists the whole-genome C. psittaci strains and plasmids used for comparative analysis. Table 4 shows a marked difference in 50 genes between C. psittaci Rostinovo-70 and C. psittaci GR9, and the presence in C. psittaci Rostinovo-70 of a cluster of 30 genes that were homologous to Chlamydia abortus rather than C. psittaci.

DNA extraction, Illumina and nanopore sequencing, and assembly
Total DNA was extracted from lyophilized chicken embryo tissue infected with C. psittaci strain Rostinovo-70, followed by density gradient centrifugation. The DNeasy Blood & Tissue Kit (250) (Qiagen, Hilden, Germany) was used for this purpose. The final DNA concentration was measured using a Bio-Rad spectrophotometer (Bio-Rad Laboratories, Redmond, WA, USA).

Value of the Data
This is the first report on the de novo genome assembly of the C. psittaci Rostinovo-70 bacterial strain of zoonotic origin, which was isolated in the Russian Federation and is now available as a reference strain for molecular epidemiology studies. The genome may be useful for researchers in molecular biology and epidemiology who study the molecular evolution of Chlamydia and other intracellular microorganisms with limited genetic polymorphism. The data may help to complement the large volume of genome-level assemblies and should contribute to the exploration of microbial taxonomy and evolution. These data also contribute to our knowledge of bacterial diversity and distribution worldwide.
The DNA library for sequencing was prepared using the 1D Genomic DNA by Ligation kit SQK-LSK108 (Oxford Nanopore Technologies, Oxford, UK). The DNA end repair and dA-tailing steps were performed using NEB repair modules (New England Biolabs, Ipswich, MA, USA). All clean-up steps of DNA preparation were performed using Agencourt AMPure XP beads (Beckman Coulter Life Sciences, [...]).
The sequencing runs generated a total of 3.88 GB (7,493,423 total sequences) of single-end reads on the Illumina platform in FASTQ format and 3.08 GB (1.24 M reads) on the Oxford Nanopore platform in FAST5 format. After chicken embryo tissue reads were filtered out, the C. psittaci data used for de novo hybrid assembly consisted of the clean reads from both Illumina (945 Mb, 1,831,776 total sequences) and Oxford Nanopore (2.5 GB, 271,098 total sequences). The assembly recovered the entire chromosome in a single contig (1,171,768 bp; GenBank accession number CP041038.1). In addition, a C. psittaci cryptic plasmid (7678 bp) was identified as an extrachromosomal replicon (GenBank accession number CP041039.1).
In contrast to the plasmidless C. psittaci GR9, a cryptic plasmid (7659 bp) was detected in C. psittaci Rostinovo-70. Four SNPs and quadruple-SNP combinations (AGAA/TTCT) were found in the C. psittaci Rostinovo-70 cryptic plasmid in comparison with the reference C. psittaci CP3 plasmid pcp CP3 (GenBank accession number CP003813.1). Several target genes of the C. psittaci Rostinovo-70 strain had previously been Sanger sequenced by another group [2], namely the omp1, omp2, 16S rRNA, 23S rRNA and plasmid pCp putative genes (GenBank accession numbers DQ177459.1, DQ177460.1, DQ663788.1, DQ663789.1 and DQ663790.1, respectively). A subsequent comparison of these sequences with the corresponding genes in the whole genome sequence of the Rostinovo-70 strain deposited by us demonstrated complete identity (100%). The only exception was omp2 (GenBank accession number DQ177460.1), which showed an identity of 99.83% owing to a single T/A substitution at position 534.
Table 2. The bioinformatic tools used to analyze the genome of C. psittaci Rostinovo-70 strain.
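The gene-by-gene comparison with the earlier Sanger sequences comes down to counting mismatches between two aligned sequences. The following is a minimal sketch of that calculation, assuming the two sequences are already aligned and of equal length; the function names and the toy sequences are illustrative and are not taken from the study.

```python
from typing import List


def mismatch_positions(ref: str, query: str) -> List[int]:
    """Return 1-based positions at which two aligned sequences differ."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i, (r, q) in enumerate(zip(ref.upper(), query.upper())) if r != q]


def percent_identity(ref: str, query: str) -> float:
    """Percentage of aligned positions that are identical."""
    if not ref:
        raise ValueError("sequences must not be empty")
    mismatches = len(mismatch_positions(ref, query))
    return 100.0 * (len(ref) - mismatches) / len(ref)


# Toy 10-nt sequences (hypothetical, not omp2) with one T/A substitution.
toy_ref = "ATGACCTTGA"
toy_query = "ATGACCATGA"

positions = mismatch_positions(toy_ref, toy_query)
identity = percent_identity(toy_ref, toy_query)
print(positions, round(identity, 2))
```

With real data, the two inputs would be the aligned GenBank record and the matching region of the assembled chromosome; a single substitution then lowers the identity by 100/N percent for an alignment of N positions.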

Program and scripts for bioinformatics
Briefly, taxonomic analysis of the raw reads was performed with the Metagenomics Analysis Server MG-RAST [3]. Read quality was assessed using FastQC v0.11.8 [4]. Low-quality reads, reads with ambiguous bases (N), and adapter sequences were removed from the Illumina data with AfterQC [5]. Porechop [6] was used to find and remove adapters from the Oxford Nanopore reads. Filtlong [7] was used to filter out Nanopore reads shorter than 2000 bp. Single-end Illumina reads were filtered using Bowtie2 v. 2.3.5.1 [8]. Reads were mapped with Bowtie2 v. 2.3.5.1 against 20 reference C. psittaci genomes (Table 3) and five C. psittaci plasmids deposited in GenBank, which had more than 95% homology to Rostinovo-70. Hybrid de novo assembly was carried out using the Unicycler assembly pipeline for bacterial genomes [10]. Assembly statistics for the hybrid assembly of C. psittaci Rostinovo-70 were generated with the Quality Assessment Tool for Genome Assemblies (QUAST) [9]. Local changes, such as nucleotide substitutions in individual genes, were identified, and alignment and comparison with the reference genomes were performed, using Mauve v. 2.4.0 [11], which allows more accurate determination of the positions of mutations in coding and non-coding regions.
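The length-filtering step can be illustrated with a short stand-alone sketch. This is not Filtlong itself and does not reproduce its quality weighting; it only shows the effect of a 2000 bp minimum-length threshold on four-line FASTQ records. The function names and toy reads are hypothetical.

```python
from typing import Iterable, List, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> List[FastqRecord]:
    """Parse simple four-line FASTQ records (no multi-line sequences)."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    records = []
    for i in range(0, len(rows) // 4 * 4, 4):
        header, seq, _plus, qual = rows[i:i + 4]
        records.append((header, seq, qual))
    return records


def filter_by_length(records: Iterable[FastqRecord], min_length: int = 2000) -> List[FastqRecord]:
    """Keep only reads whose sequence is at least min_length bases long."""
    return [rec for rec in records if len(rec[1]) >= min_length]


# Two toy reads: one of 40 bases and one of 2400 bases.
toy_fastq = [
    "@read_short", "ACGT" * 10, "+", "I" * 40,
    "@read_long", "ACGT" * 600, "+", "I" * 2400,
]

kept = filter_by_length(parse_fastq(toy_fastq))
print([header for header, _, _ in kept])
```

In practice the records would be streamed from the adapter-trimmed Nanopore FASTQ file rather than held in a list; the list form is used here only to keep the example self-contained.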

Phylogenetic analysis
MLST based on the concatenated sequences of seven housekeeping genes, using the database hosted at http://pubmlst.org/chlamydiales/, assigned C. psittaci Rostinovo-70 to sequence type (ST)28. C. psittaci Rostinovo-70, C. psittaci GR9, and C. psittaci WS/RT/E30 all belong to ST28, suggesting that they originated from a single progenitor. Nevertheless, C. psittaci Rostinovo-70 and C. psittaci WS/RT/E30 (GenBank accession number NC_018622.1) were separated phylogenetically into two different clades (Fig. 3). In contrast, C. psittaci Rostinovo-70 and C. psittaci GR9 formed a single clade, even though they differed markedly in 50 genes (Table 4). Further analysis revealed a cluster of 30 genes that were closer to C. abortus than to C. psittaci (Table 4). Alignment of the genomes of C. psittaci Rostinovo-70, C. psittaci GR9, and C. abortus LLG in this region determined the exact borders of the homologous recombination that occurred between the two Chlamydia species, C. psittaci and C. abortus. One region of recombination was located within the gene encoding a putative 3-methyladenine DNA glycosylase, resulting in a frameshift within FI836_03950 in Rostinovo-70. The consequence of the conversion of this gene to a pseudogene for the virulence of this strain will be part of a future investigation. Another region of recombination was localized within FI836_04045, which encodes a putative sodium symporter family protein, resulting in the formation of a hybrid protein between the two Chlamydia species. Overall, the comparative genomics appears to reveal the first evidence of homologous recombination between these two organisms.
Table 3. The list of the whole genome C. psittaci strains and plasmids used in this study.
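The sequence-type assignment described at the start of this section can be pictured as a lookup of a seven-locus allele profile in a table of known profiles. The sketch below is purely illustrative: the locus names, allele numbers, and ST labels are placeholders and do not reproduce the PubMLST Chlamydiales scheme or the actual ST28 profile.

```python
from typing import Dict, Optional, Tuple

# Placeholder locus names standing in for the seven housekeeping genes.
LOCI = ("locus_1", "locus_2", "locus_3", "locus_4", "locus_5", "locus_6", "locus_7")

# Hypothetical profile table: allele numbers per locus -> sequence type label.
PROFILES: Dict[Tuple[int, ...], str] = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 2, 1, 1, 3, 1, 1): "ST-B",
}


def assign_st(alleles: Dict[str, int]) -> Optional[str]:
    """Return the ST label for a complete allele profile, or None if unknown."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise ValueError(f"missing loci: {missing}")
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


# Three toy isolates; the first two share every allele and hence the same ST.
isolate_1 = dict(zip(LOCI, (1, 2, 1, 1, 3, 1, 1)))
isolate_2 = dict(zip(LOCI, (1, 2, 1, 1, 3, 1, 1)))
isolate_3 = dict(zip(LOCI, (9, 9, 9, 9, 9, 9, 9)))

print(assign_st(isolate_1), assign_st(isolate_2), assign_st(isolate_3))
```

Because the lookup considers only seven loci, two isolates can share a sequence type while differing elsewhere in the genome, which is consistent with the observation that strains of the same ST fall into different clades and differ in other genes.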