Linkage of Whole Genome Sequencing, Epidemiological, and Clinical Data to Understand the Genetic Diversity and Clinical Outcomes of Shigella flexneri among Men Who Have Sex with Men in England

ABSTRACT The public health value of whole genome sequencing (WGS) for Shigella spp. in England has been limited by a lack of information on sexual identity and behavior. We combined WGS data with other data sources to better understand Shigella flexneri transmission in men who have sex with men (MSM). WGS data for all S. flexneri isolates referred to the national reference laboratory were linked to (i) clinical and behavioral data collected in seven of 21 health regions in England using a standardized exposure questionnaire and (ii) national HIV surveillance data. We included 926 S. flexneri isolates, of which 43.0% (n = 398) fell phylogenetically within two domestically circulating clades associated with genotypic markers of azithromycin resistance. Approximately one third of isolates in these clades were from people living with HIV, primarily acquired through sex between men. Linked questionnaire data were available for 182 (19.7%) isolates; 88% (84/95) of MSM isolates fell phylogenetically within the domestically circulating clades, while 92% (72/78) of isolates from other cases fell within lineages linked with travel to high-risk regions. There was no evidence of sustained transmission between networks of MSM and the wider community. MSM were more likely to be admitted to hospital and receive antimicrobials. Our study emphasizes the importance of sex between men as a major route of transmission for S. flexneri. Combined WGS, epidemiological and clinical data provide unique insights that can inform contact tracing, clinical management and the delivery of targeted prevention activities. Future studies should investigate why MSM experience more severe clinical outcomes.

IMPORTANCE Within the last 2 decades there has been an increasing number of Shigella spp. outbreaks among men who have sex with men (MSM) worldwide. In 2015, Public Health England (PHE) introduced routine whole genome sequencing (WGS) for the national surveillance of Shigella spp. However, the lack of information on sexual identity and behavior has hindered interpretation. Our study illustrates the power of linking WGS data with epidemiological, behavioral, and clinical data. We provide unique population-level insights into different transmission networks that can inform the delivery of appropriate public health interventions and patient management. Furthermore, we describe and compare clinical characteristics and outcomes of S. flexneri infection in MSM and other exposure groups. We found that MSM were more likely to be admitted to hospital and receive antimicrobials, indicating that their infections were potentially more severe. The exact reasons for this are unclear and require further exploration.


3: PERSONAL AND IDENTIFYING DETAILS
Are there any children living in the household? (other than the case) Y N How many?
Public Health England is committed to ensuring all individuals are treated equally and fairly, and therefore we ask all people the following questions about their ethnicity and sexual orientation. This also helps us to identify and investigate outbreaks. Does the patient fit into any of the following categories (tick all that apply)?:

Group A
Any person of doubtful personal hygiene or with unsatisfactory toilet, hand washing or hand drying facilities at home, work or school. Particular consideration should be given as to whether individual infant-school-aged children (aged 6 or 7 years) are able to satisfactorily observe good personal hygiene. Health protection personnel (LA and HPU) should agree locally on how to make this assessment in engagement with parents or teachers/carers.

Group B
All children aged five years old or under including those who attend school, pre-school, nursery or other childcare or minding groups.

Group C
People whose work involves preparing or serving unwrapped food to be served raw or not subjected to further heating.

Group D
Clinical, social care or nursery staff who work with young children, the elderly, or other particularly vulnerable people, and whose activities increase the risk of transferring infection via the faecal-oral route. Such activities include helping with feeding or handling objects that could be transferred to the mouth.

Significant contact is defined as household/workplace/school contact and/or those who have been exposed to similar circumstances as the case, i.e. travelled with the case.

Supplementary File 2. Whole genome sequencing and sequencing analysis methods
Microbiological typing, including confirmation of the species, serotyping and single nucleotide polymorphism (SNP) analysis was performed at Public Health England (PHE) using whole genome sequencing (WGS) [1][2][3].
Genomic DNA from bacterial isolates was extracted using the QiaSymphony DNA extraction platform (Qiagen). DNA was fragmented and tagged for multiplexing with Nextera XT DNA Sample Preparation Kits (Illumina) and sequenced using the Illumina HiSeq 2500 platform at PHE. Bases with a Phred score below 30 (error probability of 1 in 1000) were removed from the trailing ends using Trimmomatic [4].
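The quality threshold above can be illustrated with a minimal Python sketch. It is not Trimmomatic's implementation: the function names are hypothetical, and Phred+33 quality encoding is assumed. It shows only the relationship between a Phred score and its error probability, and a simple removal of trailing bases below the threshold.

```python
from typing import Tuple


def phred_to_error_prob(q: int) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_trailing(seq: str, qual: str, min_q: int = 30, offset: int = 33) -> Tuple[str, str]:
    """Remove bases from the 3' end while their quality is below min_q.

    Assumes Phred+33 encoding (hypothetical helper, not Trimmomatic itself).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

For example, `phred_to_error_prob(30)` gives the 1 in 1000 error probability quoted above.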
Species confirmation was performed by comparing kmers (short strings of DNA of length k; in this method, k=18) within the reads to a set of kmers found in a collection of reference genomes [5]. The closest percentage match was identified and provided initial confirmation of the species. The kmer ID software is available here: https://github.com/phe-bioinformatics/kmerid.
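The kmer comparison described above can be sketched as follows. This is an illustrative sketch, not the kmerid implementation: the similarity measure (percentage of read kmers also present in a reference) and all function names are assumptions made for the example.

```python
from typing import Dict, Iterable, Set, Tuple


def kmers(seq: str, k: int = 18) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def percent_shared(read_kmers: Set[str], ref_kmers: Set[str]) -> float:
    """Percentage of read kmers that also occur in the reference (assumed measure)."""
    if not read_kmers:
        return 0.0
    return 100.0 * len(read_kmers & ref_kmers) / len(read_kmers)


def closest_reference(reads: Iterable[str], references: Dict[str, str], k: int = 18) -> Tuple[str, float]:
    """Return the name and score of the reference with the highest percentage match."""
    read_kmers: Set[str] = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    scores = {name: percent_shared(read_kmers, kmers(seq, k))
              for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```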
Serotyping was based on the structure of the O-antigen using 'GeneFinder' (https://github.com/phe-bioinformatics/gene_finder) [6]. This customised algorithm utilises Bowtie2 alignment software [7] and SAMtools [8] to map newly sequenced reads to a database of sequences encoding the O-antigen synthesis and modification genes [2]. Only predictions of serotype that matched to a reference gene sequence at >80% nucleotide identity over >80% of length were accepted [2].
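The acceptance rule above amounts to two strict thresholds applied together. A minimal sketch follows; how GeneFinder derives identity and coverage from the alignments is not described here, so the inputs (identical bases, aligned length, reference length) and the function name are assumptions.

```python
def accept_match(identical_bases: int, aligned_length: int, reference_length: int,
                 min_identity: float = 80.0, min_coverage: float = 80.0) -> bool:
    """Accept a match only if identity and coverage both strictly exceed their thresholds.

    identity = identical bases / aligned length (assumed definition)
    coverage = aligned length / reference length (assumed definition)
    """
    if aligned_length <= 0 or reference_length <= 0:
        return False
    identity = 100.0 * identical_bases / aligned_length
    coverage = 100.0 * aligned_length / reference_length
    return identity > min_identity and coverage > min_coverage
```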
Sequence type (ST) assignment was performed by aligning the newly sequenced reads to a Multilocus Sequence Typing (MLST) database of reference alleles using Bowtie2 alignment software [7], as described by Tewolde et al. [9]. The software is available here: https://github.com/phe-bioinformatics/MOST.
For SNP typing analysis, reads were mapped to the reference strain S. flexneri serotype 2a strain 2457T (GenBank accession: AE014073.1) using the Burrows-Wheeler Aligner with the Maximal Exact Match algorithm (BWA-MEM) [10]. The resulting Sequence Alignment Maps (SAM) were sorted and indexed to produce Binary Alignment Maps (BAM) using SAMtools [8]. High quality variant positions (Mapping Quality > 30, Depth > 10, Variant Ratio > 0.9) identified using GATK2 in unified genotyper mode [11] were extracted and stored in SnapperDB [12]. Hierarchical single linkage clustering was performed on the pairwise SNP distance matrix at descending SNP thresholds (250, 100, 50, 25, 10, 5 and 0) as previously described [12]. The clustering is summarised as a 'SNP address' (a seven-digit code) which describes the cluster membership at each of the thresholds. For phylogenetic analyses, recombinant regions of the genome were identified using Gubbins and masked [13]. Pseudosequences of polymorphic positions were used to create a maximum likelihood tree using RAxML under the General Time Reversible model using up to 1000 bootstrap replicates [14]. Tree annotation was performed using Interactive Tree of Life (iTOL) v4.3 [15,16].
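The construction of a SNP address from a pairwise distance matrix can be sketched as below. This is a simplified illustration, not the SnapperDB implementation: cluster numbers are assigned in order of first appearance, isolates are linked when their distance is less than or equal to the threshold, and the dot-separated output format is assumed. The distances in the usage example are invented for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence

THRESHOLDS = (250, 100, 50, 25, 10, 5, 0)


def single_linkage(names: Sequence[str], dist: Dict[FrozenSet[str], int],
                   threshold: int) -> Dict[str, int]:
    """Single linkage clustering: link any pair with distance <= threshold."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    # Number clusters in order of first appearance (an assumption of this sketch).
    labels: Dict[str, int] = {}
    return {n: labels.setdefault(find(n), len(labels) + 1) for n in names}


def snp_addresses(names: Sequence[str], dist: Dict[FrozenSet[str], int],
                  thresholds: Sequence[int] = THRESHOLDS) -> Dict[str, str]:
    """Join the cluster number at each threshold into one address per isolate."""
    levels = [single_linkage(names, dist, t) for t in thresholds]
    return {n: ".".join(str(level[n]) for level in levels) for n in names}


# Hypothetical isolates and distances, for illustration only.
example_dist = {
    frozenset(("a", "b")): 3,
    frozenset(("a", "c")): 120,
    frozenset(("b", "c")): 118,
}
example = snp_addresses(["a", "b", "c"], example_dist)
```

In this example, isolates `a` and `b` share a cluster at every threshold down to 5 SNPs and differ only at the 0-SNP level, whereas `c` separates from them at the 100-SNP threshold.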
Antimicrobial-resistance determinants were detected using 'GeneFinder' (https://github.com/phe-bioinformatics/gene_finder) [6]. Bowtie2 [7] was used to map newly sequenced reads to a database of reference sequences, followed by SAMtools to create BAM files [8]. Genes were defined as present if they represented 100% of the reference sequence, with greater than 90% nucleotide identity.

*Recent foreign travel based on data reported on laboratory request forms only to enable comparison between two groups.

Supplementary
P-values calculated using Chi-squared test.