Genomic sequence data and single nucleotide polymorphism genotyping of Bacillus anthracis strains isolated from animal anthrax outbreaks in Northern Cape Province, South Africa

This report presents genomic data on sequence reads and draft genomes of Bacillus anthracis isolates from anthrax outbreaks in animals in an endemic region of South Africa, as well as genotyping of the strains using canonical single nucleotide polymorphisms (canSNPs). It is derived from an article entitled “Phylogenomic structure of B. anthracis strains in the Northern Cape Province, South Africa revealed novel single nucleotide polymorphisms”. Whole genome sequencing (WGS) of twenty-three B. anthracis strains isolated during the 1998 and 2009 anthrax outbreaks in the Northern Cape Province (NCP), as well as a strain from Botswana (6102_6B) and one from the Namibia-South Africa transfrontier conservation area (Sendlingsdrift, 6461_SP2), was performed on both the HiSeq 2500 and MiSeq Illumina platforms. Mismatch amplification mutation assay (melt-MAMA) qPCR was used to identify the canSNP genotypes within the global population of B. anthracis. DNA sequencing data are available at the NCBI Sequence Read Archive and GenBank database under accession Nos. PRJNA580142 and PRJNA510736, respectively. A phylogenetic tree and canSNP typing profiles of the isolates are presented within this article.


© 2019 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Data description
We present genomic data and analyses of whole genome sequences of B. anthracis strains isolated from animal anthrax outbreaks in the Northern Cape Province. Sequence reads (in fastq format) and assembled genomes (in fasta format) were deposited at the NCBI SRA and GenBank database under project accession Nos. PRJNA580142 and PRJNA510736, respectively. Information on the sample collection with accession numbers, SNP genotyping and genome assemblies is presented in Tables 1-3, respectively. Isolates were also grouped using the canonical SNP typing scheme [2] (Table 4), which was used to assign the phylogenetic branches (Fig. 1).
Specifications Table
Subject: Microbial genomics
Specific subject area: Comparative microbial genomics of B. anthracis strains for evolution and genetic diversity using single nucleotide polymorphisms (SNPs)
Type of data: Sequence files,

Value of the Data
The data shed light on the draft genomes and genetic diversity of B. anthracis strains from two anthrax outbreaks, in 1998 and 2009, in the Northern Cape Province of South Africa.
The data serve as a benchmark for other researchers to determine the evolution and genetic diversity of B. anthracis globally.
The data could be used to determine the relationship between B. anthracis strains from South Africa and other areas and to expand the canSNP typing scheme using melt-MAMA.
The data might enable trace-back within and between anthrax cases/outbreaks, especially within the context of southern Africa.

Diagnostic real-time PCR for chromosomal and plasmids markers of B. anthracis
The identification of B. anthracis isolates was performed as described by WHO [3]. The 20 µl PCR reaction consisted of 10 µl of FastStart Essential master mix (Roche Applied Science), 0.5 µM of each primer, 0.2 µM of each probe for the chromosomal and plasmid target pairs, with fluorescein on the one and LCRed640 on the other (Tib MolBiol GmbH, Germany), and 2.5 µl of template DNA. The PCR conditions on a LightCycler™ Nano (Roche Applied Science) were as described by WHO [3] and consisted of an initial cycle at 95 °C for 10 minutes (slope 20 °C/second), followed by 40 cycles of 95 °C for 10 seconds, 57 °C for 20 seconds and 72 °C for 30 seconds (slope 20 °C/second), with a single signal acquisition at the end of each annealing step. This was followed by denaturation at 95 °C for 3 seconds (slope 20 °C/second), 40 °C for 30 seconds (slope 20 °C/second) and 80 °C for 3 seconds at a slope of 0.1 °C/second with continuous acquisition of the signal. The run ended with cooling to 40 °C for 30 seconds (slope 20 °C/second).
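The reaction composition above can be checked with the usual C1·V1 = C2·V2 dilution arithmetic. The following is a minimal sketch only: the 10 µM stock concentrations, the count of two primers and two probes per target, and all function and variable names are assumptions made for illustration, since the source states only the final concentrations and the master mix and template volumes.

```python
# Sketch of the dilution arithmetic for one 20 µl diagnostic reaction.
# ASSUMPTIONS (not from the source): 10 µM primer and probe stocks,
# two primers and two hybridisation probes per target, water as the filler.

REACTION_UL = 20.0    # total reaction volume (from the text)
MASTER_MIX_UL = 10.0  # FastStart Essential master mix (from the text)
TEMPLATE_UL = 2.5     # template DNA (from the text)


def stock_volume_ul(final_um, stock_um, reaction_ul=REACTION_UL):
    """Volume of stock giving final_um in the reaction (C1*V1 = C2*V2)."""
    return final_um * reaction_ul / stock_um


def reaction_mix(n_primers=2, n_probes=2,
                 primer_stock_um=10.0, probe_stock_um=10.0):
    """Return per-component volumes in µl for a single reaction."""
    primer_ul = stock_volume_ul(0.5, primer_stock_um)  # 0.5 µM final
    probe_ul = stock_volume_ul(0.2, probe_stock_um)    # 0.2 µM final
    used = (MASTER_MIX_UL + TEMPLATE_UL
            + n_primers * primer_ul + n_probes * probe_ul)
    water_ul = REACTION_UL - used
    if water_ul < 0:
        raise ValueError("components exceed the reaction volume")
    return {
        "master_mix_ul": MASTER_MIX_UL,
        "template_ul": TEMPLATE_UL,
        "primer_each_ul": primer_ul,
        "probe_each_ul": probe_ul,
        "water_ul": round(water_ul, 3),
    }


mix = reaction_mix()
```

Under these assumed stock concentrations each primer contributes 1.0 µl and each probe 0.4 µl, leaving the remainder of the 20 µl as water; different stocks change only the `primer_stock_um` and `probe_stock_um` arguments.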

Genotyping of B. anthracis strains using Melt-MAMA assays
Melt-MAMA assays of the canSNP markers were used to amplify the DNA of the NCP B. anthracis strains. The panel included 12 canSNPs that were used for the grouping of the B. anthracis strains (n = 26) using existing Melt-MAMA primers (Table 4); derived and ancestral controls were created as described by Birdsell et al. [2]. The reaction included 2.5 µl DNA diluted in 1× FastStart DNA Green Master (Roche Applied Science) with an ancestral forward and a derived forward SNP target primer (GC-clamp: no-GC-clamp) and a common reverse primer (Inqaba Biotec™) (Table 2), with a starting concentration of 0.2 µM adjusted according to the ratio indicated, which allowed for separation of melt peaks by at least 5 °C. Thermocycling parameters on the LightCycler™ 96 (Roche Applied Science) were 95 °C for 10 minutes, followed by 35 cycles of 95 °C for 15 seconds and 55 °C-60 °C (oligonucleotide dependent) for 1 minute. End-point PCR amplicons were subjected to melt analysis using a dissociation protocol comprising 95 °C for 15 seconds, followed by incremental temperature ramping (0.1 °C) from 60 °C to 95 °C. SYBR Green fluorescence intensity was measured at 530 nm at each ramp interval and plotted against temperature, and was observed as separate melt peaks for each SNP. Controls included in every run were DNA from the B. anthracis Ames, Vollum and Sterne 34F2 strains. Phylogenetic relationships between the 26 B. anthracis strains were determined in MEGA version 7 [4] using the maximum likelihood method based on the Tamura three-parameter model. The tree was generated with 500 bootstrap replicates.
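Because the ancestral and derived amplicons melt at least 5 °C apart, a sample can in principle be assigned to an allele by comparing its melt peak with the peaks of the two controls. The sketch below illustrates that idea only; it is not the authors' analysis procedure. The function name, the ±1.5 °C matching tolerance and the example temperatures are hypothetical, and only the 5 °C minimum separation comes from the text.

```python
# Sketch: call a canSNP allele from a melt peak temperature (Tm).
# ASSUMPTIONS (not from the source): the function name, the 1.5 °C
# tolerance and all example Tm values. Only the 5 °C minimum separation
# between control peaks is taken from the text.

def call_allele(sample_tm, ancestral_tm, derived_tm,
                min_separation=5.0, tolerance=1.5):
    """Return 'ancestral', 'derived' or 'no call' for one SNP assay."""
    if abs(ancestral_tm - derived_tm) < min_separation:
        raise ValueError("control melt peaks are separated by less than "
                         f"{min_separation} degrees C")
    if abs(sample_tm - ancestral_tm) <= tolerance:
        return "ancestral"
    if abs(sample_tm - derived_tm) <= tolerance:
        return "derived"
    return "no call"


# Hypothetical control peaks for one assay: ancestral 76.0, derived 81.5.
calls = [call_allele(tm, 76.0, 81.5) for tm in (76.2, 81.0, 78.7)]
```

Keeping the tolerance below half of the minimum separation means a sample peak can never match both controls, and peaks that fall between the two are reported as "no call" rather than forced into a genotype.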

High-throughput sequencing and bioinformatics analysis
DNA extracted from B. anthracis was subjected to library preparation using the Nextera XT DNA Sample Prep kit (Illumina-compatible, Epicentre Biotechnology). Sequence reads of the B. anthracis genomes were generated on HiSeq 2500 and MiSeq instruments. Clusters were generated on the flow cell using the HiSeq Paired-End Cluster Generation kit (Illumina, USA) for the HiSeq 2500 platform. Sequencing of paired-end libraries was performed on the Illumina MiSeq and HiSeq 2500 sequencers using the 200-cycle SBS (sequencing by synthesis) sequencing v3 kit (Illumina, USA) and the HiSeq Sequencing Kit (200 cycles) (Illumina, USA), respectively. The quality of the sequenced reads was assessed using FastQC version 0.10.1 [5]. Trimmomatic version 0.33 [6] was used to remove sequencing adapters and ambiguous nucleotide reads. De novo assemblies of the paired-end reads were performed using CLC Genomics Workbench version 11.1 (CLC, Denmark). The assembled contigs were ordered with Mauve version 2.3.1 [7] against the B. anthracis Ames ancestor (GenBank accession numbers NC_007530.2, NC_007322.2 and NC_007323.3) in order to assess the accuracy and efficiency of the contigs. All trimmed sequence reads were also mapped to the reference using Burrows-Wheeler Aligner (BWA) version 0.7.12 [8] to determine the B. anthracis replicons, i.e. the chromosome and the two plasmids. Assembled genomes were annotated using the NCBI Prokaryotic Genome Annotation pipeline. Sequenced reads were deposited at NCBI in the Sequence Read Archive (SRA), and assembled genomes in GenBank.
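The read-level steps above (quality check, trimming, mapping) can be sketched as a list of command lines. This is an illustrative outline, not the authors' pipeline: the source names the tools and versions but not their settings, so the trimming parameters (the example values from the Trimmomatic manual), the adapter file, the thread count and every file name below are assumptions. The de novo assembly, contig ordering and annotation steps are not sketched.

```python
# Sketch: build (but do not run) command lines for FastQC, Trimmomatic
# and BWA. ASSUMPTIONS (not from the source): all file names, the jar
# path, the adapter file, the trimming settings and the thread count.

def fastqc_cmd(fastqs, outdir):
    """FastQC report for each FASTQ file; outdir must already exist."""
    return ["fastqc", "-o", outdir, *fastqs]


def trimmomatic_cmd(r1, r2, prefix,
                    jar="trimmomatic-0.33.jar", adapters="NexteraPE-PE.fa"):
    """Paired-end adapter and quality trimming (illustrative settings)."""
    return [
        "java", "-jar", jar, "PE", "-phred33", r1, r2,
        f"{prefix}_1P.fastq.gz", f"{prefix}_1U.fastq.gz",  # paired/unpaired R1
        f"{prefix}_2P.fastq.gz", f"{prefix}_2U.fastq.gz",  # paired/unpaired R2
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36",
    ]


def bwa_cmds(reference, r1, r2, threads=4):
    """Index the reference, then map read pairs (SAM goes to stdout)."""
    return [
        ["bwa", "index", reference],
        ["bwa", "mem", "-t", str(threads), reference, r1, r2],
    ]


def plan(sample, r1, r2, reference):
    """Ordered commands for one sample: QC, trim, then map trimmed pairs."""
    return [
        fastqc_cmd([r1, r2], "qc"),
        trimmomatic_cmd(r1, r2, sample),
        *bwa_cmds(reference, f"{sample}_1P.fastq.gz", f"{sample}_2P.fastq.gz"),
    ]


for cmd in plan("6102_6B", "6102_6B_R1.fastq.gz", "6102_6B_R2.fastq.gz",
                "ames_ancestor.fasta"):
    print(" ".join(cmd))
```

Each command is kept as a list of arguments so that it could be passed to `subprocess.run` without shell quoting; here the commands are only printed for inspection. The reference file is assumed to hold the chromosome and both plasmid sequences of the Ames ancestor, so that mapped reads can be tallied per replicon.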