Complete genome sequence and characterization of avian pathogenic Escherichia coli field isolate ACN001

Avian pathogenic Escherichia coli (APEC) is an important etiological agent of avian colibacillosis, which manifests as respiratory, hematogenous, meningitic, and enteric infections in poultry. APEC is also a potential zoonotic threat to human health. The genomic diversity of APEC strains greatly hinders disease prevention and control. In the current study, pyrosequencing was used to sequence and characterize APEC strain ACN001 (= CCTCC 2015182T = DSMZ 29979T), which was isolated from the liver of a diseased chicken in China in 2010. Strain ACN001, an extraintestinal pathogenic E. coli, belongs to phylogenetic group B1 and was highly virulent in chicken and mouse models. Whole genome analysis showed that the genome consists of a circular chromosome of 4,936,576 bp and six plasmids, and comprises 4,794 protein-coding genes, 108 RNA genes, and 51 pseudogenes, with an average G + C content of 50.56 %. On the chromosome, we identified 39 insertion sequences, 12 predicted genomic islands, 8 prophage-related sequences, and 2 clustered regularly interspaced short palindromic repeat (CRISPR) regions, together involving 237 coding sequences, suggesting that horizontal gene transfer may have occurred in this strain. In addition, most of the virulence and antibiotic resistance genes were located on the plasmids, which could facilitate the spread of pathogenicity and multidrug resistance elements among E. coli populations. Together, the information provided here on APEC isolate ACN001 will assist future studies of APEC strains and aid the development of control measures.


Introduction
The group known as extraintestinal pathogenic Escherichia coli (ExPEC), which includes uropathogenic E. coli, neonatal meningitis E. coli, and avian pathogenic E. coli (APEC), encompasses strains that cause severe extraintestinal systemic infections, such as septicemia, meningitis, and pyelonephritis, in both humans and animals. In veterinary medicine, APEC mainly causes avian colisepticemia, a widespread infectious disease that leads to significant economic losses in the poultry industry [1,2]. APEC has also been widely reported to pose a zoonotic risk, with the potential for transmission between animals and humans [3]. However, the genetic features of APEC remain incompletely understood, and its genome diversity and frequent horizontal gene transfer have made pathogenesis studies aimed at preventing APEC infections very difficult [4]. It is therefore important to characterize the genomic features of APEC strains.
Here, we report the full genome sequence and preliminary functional annotation of the virulent APEC strain ACN001 (= CCTCC 2015182T = DSMZ 29979T), which was isolated from a chicken suffering from avian colibacillosis. The study aimed to characterize the genomic features of strain ACN001 to provide information that will drive further study of APEC and help to better control its spread.

Organism information
Classification and features
APEC is a Gram-negative, aerobic and facultatively anaerobic, non-spore-forming, short to medium rod-shaped bacterium belonging to the genus Escherichia in the family Enterobacteriaceae (Table 1). It is an etiological agent of avian colibacillosis, which mainly manifests as systemic extraintestinal disease in poultry, including respiratory, hematogenous, meningitic, and enteric infections [5]. Previous infection studies in chicken and mouse models showed that APEC strain ACN001 is a highly virulent field isolate. Cells of ACN001 are 1-2 μm long and 0.5-0.8 μm in diameter. The strain is a mesophile that grows at 10-45 °C, with optimum growth at 37-42 °C (Table 1), and it is motile by means of peritrichous flagella (Fig. 1).
The 16S rRNA gene sequence of ACN001 was compared with those of other E. coli strains in the GenBank database using BLAST with default settings [6]. ExPEC strains can be divided into the E. coli Reference Collection (ECOR) phylogroups (A, B1, B2, D, and E) according to the sequences of housekeeping genes such as adk, fumC, gyrB, icd, mdh, purA, and recA [7,8]. We constructed a phylogenetic tree from the aligned gene sequences using the maximum likelihood method in MEGA version 5, with 1,000 bootstrap replicates [7] (Fig. 2). ACN001 fell within phylogenetic group B1, on the same branch as APEC O78, another highly virulent group B1 strain [7]. The strains whose genes were used to construct the tree are shown in Fig. 2.

Table 1 Classification and general features of APEC strain ACN001
Property | Term | Evidence code
Phylum | Proteobacteria | TAS [25]
Class | Gammaproteobacteria | TAS [26,27]
Order | 'Enterobacteriales' | TAS [26,27]
Family | Enterobacteriaceae | TAS [28]
Genus | Escherichia | TAS [29,30]
Species | Escherichia coli | TAS [29,30]
Gram stain | Negative | TAS [31]
Evidence code: TAS, traceable author statement.
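Bootstrap support for the tree described above comes from resampling alignment columns. The sketch below is a minimal illustration, assuming the housekeeping-gene alignments are already available as equal-length aligned strings per strain; the locus concatenation order, the function names, and the toy strain labels and sequences are hypothetical. It concatenates the loci into one supermatrix and draws nonparametric bootstrap pseudo-replicates. It does not infer trees: in a tool such as MEGA, each replicate is analyzed by maximum likelihood, and the support for a branch is the percentage of replicate trees that recover it.

```python
from __future__ import annotations

import random

# Housekeeping loci named in the text; concatenating them in this order is an assumption.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")


def concatenate(alignments: dict[str, dict[str, str]], loci=LOCI) -> dict[str, str]:
    """Join per-locus aligned sequences into one supermatrix row per strain.

    `alignments` maps locus -> {strain: aligned sequence}; every locus must
    contain the same set of strains.
    """
    strains = sorted(alignments[loci[0]])
    for locus in loci:
        if sorted(alignments[locus]) != strains:
            raise ValueError(f"strain set differs at locus {locus}")
    return {s: "".join(alignments[locus][s] for locus in loci) for s in strains}


def bootstrap_replicate(matrix: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(matrix.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every strain, so sites stay aligned.
    return {s: "".join(seq[i] for i in cols) for s, seq in matrix.items()}


def bootstrap_replicates(matrix: dict[str, str], n: int = 1000, seed: int = 1) -> list[dict[str, str]]:
    """Generate n pseudo-replicate alignments (1,000 replicates, as in the text)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(matrix, rng) for _ in range(n)]


# Toy usage with made-up 4-bp sequences per locus (hypothetical data).
toy = {locus: {"ACN001": "ACGT", "O78": "ACGA"} for locus in LOCI}
matrix = concatenate(toy)
replicates = bootstrap_replicates(matrix)
```

Because every replicate reuses whole columns, a replicate never contains a site pattern that is absent from the original alignment; only the relative weights of the patterns change.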

Genome sequencing and annotation
Genome project history
APEC strain ACN001 was selected for whole genome sequencing at the Chinese National Human Genome Center in Shanghai, China, because of its high virulence and potential zoonotic risk. Sequence assembly and annotation were completed in December 2012, and the complete genome sequence was deposited in GenBank under accession number CP007442. A summary of the project information and its compliance with the Minimum Information about a Genome Sequence (MIGS) standard [9] is provided in Table 2.

Growth conditions and genomic DNA preparation
APEC strain ACN001 was cultivated on LB medium as previously described [1]. High-quality genomic DNA for sequencing was extracted using a cetyltrimethylammonium bromide (CTAB) method, and its concentration and purity were determined by agarose gel electrophoresis.

Genome sequencing and assembly
The complete genome of APEC strain ACN001 was sequenced using the Roche 454 GS-FLX platform (Roche, Basel, Switzerland) … the finishing process. The relative positions of all large contigs were determined using Cytoscape version 2.8.2. Gaps between large contigs were closed by sequencing potential neighboring contigs on an ABI 3730xl sequencer (Applied Biosystems). The Phred/Phrap/Consed software package was used for sequence assembly and quality assessment during the subsequent finishing process. Possible misassemblies were corrected by subcloning and sequencing bridging PCR fragments. In total, 217 additional amplification reactions were performed to close all gaps and improve sequence quality. The error rate of the final ACN001 genome sequence was less than 10^-5.
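To put the reported error rate in context, it can be converted to the Phred quality scale used by the Phred/Phrap/Consed package and to an expected number of erroneous bases. A minimal sketch, assuming (as a simplification) that errors are spread uniformly at the reported maximum rate; the function and constant names are illustrative:

```python
import math

# Replicon sizes (bp) reported in this paper.
CHROMOSOME_BP = 4_936_576
PLASMID_BP = [60_043, 168_543, 5_784, 6_747, 6_822, 92_447]
MAX_ERROR_RATE = 1e-5  # reported upper bound on the per-base error rate


def phred_quality(error_prob: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(P_error)."""
    return -10 * math.log10(error_prob)


def expected_errors(length_bp: int, error_rate: float) -> float:
    """Expected number of wrong bases, assuming a uniform per-base error rate."""
    return length_bp * error_rate


genome_bp = CHROMOSOME_BP + sum(PLASMID_BP)
print(f"Q = {phred_quality(MAX_ERROR_RATE):.0f}")
print(f"chromosome: < {expected_errors(CHROMOSOME_BP, MAX_ERROR_RATE):.1f} errors")
print(f"whole genome ({genome_bp:,} bp): < {expected_errors(genome_bp, MAX_ERROR_RATE):.1f} errors")
```

An error rate below 10^-5 corresponds to Phred Q50, or fewer than about 49 erroneous bases expected across the 4,936,576-bp chromosome (about 53 across the chromosome plus all six plasmids).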

Genome annotation
The complete ACN001 genome sequence was analyzed using Glimmer 3.0 [10,11] and GeneMark [12,13] for gene prediction, tRNAscan-SE for tRNA identification [14], and RNAmmer [15] for rRNA identification. The predicted protein-coding genes were translated into amino acid sequences and annotated against the NCBI and UniProt non-redundant sequence databases [16], the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [17], and subsequently the Clusters of Orthologous Groups (COG) database [18] to identify their protein products and functional categories. Transmembrane helices and signal peptides were predicted using TMHMM [19] and SignalP [20], respectively, and additional gene features were annotated using the Rapid Annotation using Subsystem Technology (RAST) server [21]. Clustered regularly interspaced short palindromic repeat (CRISPR) elements were detected using CRT [22] and PILER-CR [23].
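Glimmer and GeneMark score candidate genes with Markov models of coding sequence trained on the genome being annotated. The sketch below shows only the simpler first step that such predictions build on, enumerating open reading frames (ORFs) on both strands; it is not the Glimmer or GeneMark algorithm, and the start-codon set, the minimum ORF length, and the function names are illustrative assumptions.

```python
from __future__ import annotations

STARTS = {"ATG", "GTG", "TTG"}  # common bacterial start codons (assumed for this sketch)
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Return ORFs as (strand, start, end) in 0-based, half-open forward-strand coordinates.

    Each ORF runs from the first start codon after the previous in-frame stop
    to the next in-frame stop codon (inclusive); min_codons counts the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        # Map reverse-strand coordinates back onto the forward strand.
                        orfs.append((strand, start, end) if strand == "+" else (strand, n - end, n - start))
                    start = None
    return sorted(orfs, key=lambda orf: (orf[1], orf[2], orf[0]))


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=5))  # [('+', 2, 17)]
```

A real gene caller would then score each candidate with a trained coding model and resolve overlaps; the fixed length threshold here merely stands in for that step.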

Genome properties
The complete genome of ACN001 comprises one circular chromosome of 4.9 Mb (4,936,576 bp; Fig. 3) and six plasmids: pACN001-A (60,043 bp), pACN001-B (168,543 bp), pACN001-C (5,784 bp), pACN001-D (6,747 bp), pACN001-E (6,822 bp), and pACN001-F (92,447 bp) (Fig. 4 and Table 3). The average G + C content is 50.56 % (Table 4). A total of 5,253 genes were predicted in the genome, of which 4,794 are protein-coding genes, 108 are RNA genes, and 51 are pseudogenes. Specific functions were assigned to 3,630 protein-coding genes (69.11 % of the total gene count), and the remaining protein-coding genes were annotated as hypothetical. The genome properties are summarized in Tables 3 and 4 and Figs. 3 and 4, and the distribution of genes among COG functional categories is listed in Table 5.
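The replicon sizes and gene counts above can be cross-checked with a few lines of arithmetic. The sketch below is a minimal illustration, not the authors' pipeline: it sums the replicons, recomputes the protein-coding share, and groups the plasmids by size. The 50-kb large/small cut-off and the gc_content helper (shown only on toy strings) are assumptions for illustration.

```python
# Replicon sizes (bp) and gene counts as reported in this section.
CHROMOSOME_BP = 4_936_576
PLASMID_BP = {
    "pACN001-A": 60_043,
    "pACN001-B": 168_543,
    "pACN001-C": 5_784,
    "pACN001-D": 6_747,
    "pACN001-E": 6_822,
    "pACN001-F": 92_447,
}
TOTAL_GENES = 5_253
PROTEIN_CODING_GENES = 4_794
LARGE_PLASMID_MIN_BP = 50_000  # illustrative cut-off, not taken from the paper


def percent(part: float, whole: float) -> float:
    return round(100 * part / whole, 2)


def gc_content(seq: str) -> float:
    """G + C percentage over A, C, G and T only; other symbols (e.g. N) are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return percent(seq.count("G") + seq.count("C"), acgt)


genome_bp = CHROMOSOME_BP + sum(PLASMID_BP.values())
large = sorted(name for name, bp in PLASMID_BP.items() if bp >= LARGE_PLASMID_MIN_BP)
small = sorted(name for name, bp in PLASMID_BP.items() if bp < LARGE_PLASMID_MIN_BP)

print(f"total genome size: {genome_bp:,} bp")
print(f"plasmid share: {percent(sum(PLASMID_BP.values()), genome_bp)} %")
print(f"protein-coding genes: {percent(PROTEIN_CODING_GENES, TOTAL_GENES)} % of all genes")
print("large plasmids:", large, "small plasmids:", small)
```

Run as is, it reports a total genome size of 5,276,962 bp, a plasmid share of about 6.45 %, and 91.26 % protein-coding genes, and it separates pACN001-A, B, and F from pACN001-C, D, and E, matching the large/small grouping used in this study.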
Insights from the genome sequence
APEC infection causes significant economic losses to the poultry industry. An incomplete understanding of the complexity of APEC genomes impedes the study of pathogenesis and the subsequent development of control measures.
Here, complete genome sequencing and annotation of the virulent APEC isolate ACN001 identified 4,794 protein-coding genes, accounting for 91.26 % of all genes (5,253). Notably, preliminary sequence analysis revealed 39 insertion sequences (ISs), 12 predicted genomic islands (GIs), 8 prophage-related sequences, and 2 CRISPR elements. Together, these elements involve 237 coding sequences on the circular chromosome, indicating possible genetic crosstalk among E. coli populations. They may account for genetic differences between ACN001 and other APEC strains and reflect interactions of this strain with its environment. Further comparative analyses will be applied to better elucidate the relationship between these traits and phenotypes such as adaptability and pathogenicity. In addition, strain ACN001 carries six plasmids: three large plasmids (pACN001-A, B, and F) and three small plasmids (pACN001-C, D, and E). The antibiotic resistance genes and most of the essential virulence genes are located on the three large plasmids, whereas plasmids pACN001-C, D, and E carry only 8, 9, and 10 protein-coding genes, respectively, all with unknown functional annotations. The plasmid location of these antibiotic resistance and virulence genes may allow multidrug resistance and virulence factors to spread among E. coli populations in poultry.

Conclusion
This study presents the whole genome sequence of APEC strain ACN001, a chicken-derived isolate causing typical avian colibacillosis. The genome of ACN001 consists of a circular chromosome containing 4,794 protein-coding genes and 108 RNA genes, along with six plasmids with different features. We observed 39 insertion sequences (ISs), 12 predicted genomic islands (GIs), 8 prophage-related sequences, and 2 CRISPR elements on the chromosome, suggesting frequent genetic crosstalk, such as horizontal gene transfer, between ACN001 and other E. coli populations. Of the six plasmids identified in this strain, the three large plasmids contained multiple antibiotic resistance and virulence genes, whereas the three small plasmids contained only genes with unknown functional annotations. These plasmid-borne pathogenicity-associated features should be closely monitored to prevent their further spread among diverse E. coli populations, especially APEC. The genome sequence and annotation of the virulent APEC isolate ACN001 provide valuable genetic information for future studies of APEC pathogenesis and for the development of prevention and control measures.