figshare
25 files

ML tree of 456 mapped 7PET genomes

Version 2 2023-08-03, 22:19
Version 1 2022-07-18, 07:29
dataset
posted on 2023-08-03, 22:19 authored by Florent Lassalle
Maximum-likelihood phylogeny of the 456 mapped 7PET genomes.

For variant calling, Illumina short reads were mapped against the genome of the novel reference strain CNRVC190243. We mapped all 242 short-read sets from 2018-2019 Yemeni V. cholerae isolates (retaining only those that mapped at sufficient depth; see below), as well as read sets from 218 contextual V. cholerae isolates linked to the 7PET-T13 sublineage. Reads were trimmed with Trimmomatic and mapped to both CNRVC190243 reference chromosomes with BWA-MEM. Mapped genomes with an average read depth below 5x over the two chromosomes (n = 4, all from the novel Yemeni read sets) were deemed of insufficient depth and excluded, leaving a final set of 456 mapped V. cholerae 7PET genomes. We used the samtools/bcftools v1.9 software suite to call variants with a minimum coverage of 10x read depth, excluding indels. The resulting consensus sequences were combined and processed with snp-sites (Page et al., 2016) to produce a single nucleotide polymorphism (SNP) alignment of 2,092 positions.
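The depth filter and the SNP-alignment step can be illustrated with a short Python sketch. It is not the pipeline used here (which relied on samtools/bcftools and snp-sites); it assumes per-chromosome lengths and mean depths are already known, weights the genome-wide mean by chromosome length, and keeps alignment columns carrying at least two distinct A/C/G/T bases. All function names and example values are hypothetical, and snp-sites applies its own rules to gaps and ambiguous bases.

```python
# Hypothetical sketch: genome-level depth filter and SNP-column extraction.

def mean_depth(chrom_stats):
    """Length-weighted mean read depth over all chromosomes.

    chrom_stats: list of (chromosome_length, mean_depth) tuples.
    """
    total_len = sum(length for length, _ in chrom_stats)
    return sum(length * depth for length, depth in chrom_stats) / total_len


def passes_depth(chrom_stats, min_depth=5.0):
    """True if the genome-wide mean depth is at least min_depth (5x in the text)."""
    return mean_depth(chrom_stats) >= min_depth


def snp_columns(seqs):
    """Indices of alignment columns with at least two distinct A/C/G/T bases.

    N and gap characters are ignored when counting alleles (an assumption).
    """
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same aligned length")
    keep = []
    for i in range(length):
        alleles = {s[i] for s in seqs} & {"A", "C", "G", "T"}
        if len(alleles) >= 2:
            keep.append(i)
    return keep


def snp_alignment(seqs):
    """Reduce full consensus sequences to their variable (SNP) columns."""
    cols = snp_columns(seqs)
    upper = [s.upper() for s in seqs]
    return ["".join(s[i] for i in cols) for s in upper]


# Example with made-up values: two chromosomes of illustrative length.
genome = [(3_000_000, 4.0), (1_000_000, 8.0)]
print(mean_depth(genome), passes_depth(genome))  # 5.0 True
print(snp_alignment(["ACGTA", "ACGTT", "ACCTN"]))  # ['GA', 'GT', 'CN']
```

The read-set counts in the text are consistent with such a filter: 242 + 218 = 460 read sets, minus the 4 low-depth genomes, gives 456.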


Alternative topology hypotheses were formulated based on the distribution of branch supports. The topology of the ML tree output by RAxML-NG (file with suffix tag "full.rooted") was compared with the topologies in the files tagged "full.rooted.H9hsisterH9g" and "full.rooted.H9hsisterH9guniteH9c". Shimodaira-Hasegawa tests showed that the "full.rooted.H9hsisterH9guniteH9c" topology had a better likelihood, and this topology was retained for further analyses.
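The source does not state which program ran the Shimodaira-Hasegawa tests. As a rough illustration of what such a test computes, the sketch below implements the resampling-estimated log-likelihood (RELL) bootstrap form of the SH test in plain Python. It assumes per-site log-likelihoods for each candidate topology were already computed under the same model; the function name and inputs are hypothetical, and this is not necessarily the procedure used for this dataset.

```python
import random


def sh_test(site_lnl, n_boot=1000, seed=1):
    """Shimodaira-Hasegawa test using RELL bootstrap resampling.

    site_lnl: one list of per-site log-likelihoods per candidate topology,
    all of equal length. Returns one p-value per topology.
    """
    k = len(site_lnl)
    n = len(site_lnl[0])
    rng = random.Random(seed)
    totals = [sum(t) for t in site_lnl]
    best = max(totals)
    # Observed statistic: how far each topology falls below the best one.
    delta_obs = [best - t for t in totals]

    # RELL: resample site indices and re-sum the stored per-site values.
    boot = [[0.0] * n_boot for _ in range(k)]
    for b in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        for i in range(k):
            row = site_lnl[i]
            boot[i][b] = sum(row[j] for j in idx)

    # Centre each topology's replicates on its own bootstrap mean.
    for i in range(k):
        mean = sum(boot[i]) / n_boot
        boot[i] = [x - mean for x in boot[i]]

    # Null distribution of delta_i: max over topologies minus topology i.
    pvals = []
    for i in range(k):
        hits = 0
        for b in range(n_boot):
            m = max(boot[j][b] for j in range(k))
            if m - boot[i][b] >= delta_obs[i]:
                hits += 1
        pvals.append(hits / n_boot)
    return pvals


# Toy example: topology 1 is worse than topology 0 at every site.
print(sh_test([[-1.0] * 50, [-1.5] * 50], n_boot=100))  # [1.0, 0.0]
```

A small p-value marks a topology as fitting significantly worse than the best-scoring candidate; the best-scoring topology always receives p = 1.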

Files whose names include the keyword *full* contain the tree of all 456 genomes; files whose names include the keyword *subcladeH9* contain a subtree of the former that includes only 352 of the 456 genomes, corresponding to the 7PET-T13 sublineage and close relatives.
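One way to obtain a subtree like the *subcladeH9* tree is to take the smallest clade of the rooted tree that contains a chosen set of tips, which can also bring in close relatives. The source does not describe how the subtree was extracted, so the sketch below is only an illustration. It assumes a rooted tree already parsed into nested Python tuples (a hypothetical representation without branch lengths or supports); in practice a phylogenetics library would parse the Newick files.

```python
# Hypothetical sketch: extract the smallest clade containing a set of tips.
# A tree is a tip name (str) or a tuple of child subtrees; branch lengths
# and support values are omitted for brevity.

def tips(node):
    """Set of tip names at or below node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= tips(child)
    return result


def smallest_clade(node, targets):
    """Smallest subtree whose tips include every name in targets."""
    targets = set(targets)
    if not targets <= tips(node):
        raise ValueError("some target tips are not in the tree")
    # Descend while a single child still contains all target tips.
    while not isinstance(node, str):
        inner = [c for c in node if targets <= tips(c)]
        if not inner:
            break
        node = inner[0]
    return node


def to_newick(node):
    """Serialise a nested-tuple tree as a Newick topology string."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(c) for c in node) + ")"


tree = ((("a", "b"), "c"), ("d", "e"))
sub = smallest_clade(tree, {"a", "c"})
print(to_newick(sub) + ";")  # ((a,b),c);
```

Note that the clade spanning tips `a` and `c` also contains `b`, much as a clade spanning a sublineage can include close relatives.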

BactDating v1.1 was used to estimate a timed phylogeny of the Yemen 2016-2019 genomes and relatives (100,000 Markov chain Monte Carlo iterations, otherwise default parameters), using as input the ML mapped-genome tree (restricted to the 7PET-T13 genome tips) and day-resolved isolation dates; for isolates lacking a day-resolved date, the median day of the year of isolation was used.
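The source does not say how dates were encoded for BactDating, which is commonly given dates as decimal years. The sketch below assumes that encoding and reads "median day of the year" as the midpoint of the isolation year when only the year is known; both are assumptions, and the function name and example isolates are hypothetical.

```python
import calendar
import datetime


def decimal_year(year, month=None, day=None):
    """Convert an isolation date to a decimal year.

    When month or day is missing, the median day of that calendar year is
    used (assumption: one reading of the imputation rule in the text).
    """
    n_days = 366 if calendar.isleap(year) else 365
    if month is None or day is None:
        day_of_year = (n_days + 1) / 2  # median of days 1..n_days
    else:
        day_of_year = datetime.date(year, month, day).timetuple().tm_yday
    # Day 1 (1 January) maps to exactly `year`.
    return year + (day_of_year - 1) / n_days


# Hypothetical isolates: two with full dates, one with only a year.
isolates = {"iso1": (2018, 10, 15), "iso2": (2019, 3, 2), "iso3": (2019,)}
dates = {name: decimal_year(*d) for name, d in isolates.items()}
print(round(dates["iso3"], 4))  # 2019.4986
```

Ordered to match the tree's tips, such a numeric vector is the kind of date input BactDating's `bactdate()` function takes alongside the tree.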

Funding

Wellcome Trust Grant [206194]
