Assessing hepatitis C virus distribution among vulnerable populations in London using whole genome sequencing: results from the TB-REACH study [version 1; peer review: awaiting peer review]

Background: Injecting drugs substantially increases the risk of hepatitis C virus (HCV) infection and is common in vulnerable population groups, such as people who are homeless and prisoners. Capturing accurate data on the relative genotype distribution within these groups is essential to inform strategies to reduce HCV transmission. The aim of this study was to use a next-generation whole-genome sequencing method, recently validated by Public Health England, to produce near-complete HCV genomes. Methods: In total, 98 HCV-positive patients were recruited from homeless hostels and drug treatment services through the National Health Service (NHS) Find and Treat (F&T) Service between May 2011 and June 2013 in London, UK. Samples were sequenced by next-generation sequencing, with 88 complete HCV genomes constructed by a de novo assembly pipeline. The genomes were analysed phylogenetically to estimate their genetic distances. Results: Of the 88 complete HCV genomes, 50/88 (56.8%) were genotype 1; 32/88 (36.4%) genotype 3; 4/88 (4.5%) genotype 2; and 1/88 (1.1%) each for genotypes 4 and 6. Subtype 1a was the most common (51.1%), followed by subtype 3a (35.2%), 1b (5.7%), and 2b (3.4%). Samples collected from drug treatment services had the highest proportion of genotype 1 (69%); genotypes 4 and 6 were found only in samples collected in homeless shelters. Small clusters of highly related genomic sequences were observed both across and within the vulnerable groups sampled.

Wellcome Open Research 2021, 6:229. Last updated: 13 SEP 2021


Introduction
Hepatitis C virus (HCV) infection is recognised as the leading cause of chronic liver disease worldwide. The World Health Organization (WHO) estimated that the worldwide prevalence of HCV infection was approximately 1%, representing about 71 million people who are chronically infected and causing 1.34 million deaths in 2015 1. In the United Kingdom (UK), around 200,000 people are chronically infected with HCV, the majority of whom are from marginalised and under-served groups in society, such as people who inject drugs (PWID) 2, 3. London has the highest number of laboratory reports of HCV infection of any city in England 2, with around 60,000 HCV cases reported in 2015 4, although this number has shown a steady, albeit slow, decrease in recent years following the expansion of access to direct-acting antiviral (DAA) treatments 5.
Injecting drug use has been identified as the primary mode of HCV transmission in developed nations. A systematic review including data from 25 countries estimated that 60-80% of PWID were anti-HCV positive, equating to approximately 10 million newly infected cases in 2010 (range 6.0-15.2) 6. In London, the seroprevalence of HCV amongst PWID was estimated as 63% in 2017, the highest in the UK 7, with a gradual, limited decrease in subsequent years that probably reflects the DAA expansion 8,9. People who are homeless are also known to have high levels of exposure to injecting drug use and hence are at higher risk of blood-borne infections, including HCV 10; they also tend to have higher rates of morbidity and mortality, whether related or unrelated to disease 11. Data on HCV prevalence among people who are homeless are quite limited; for the UK, one study in Oxford reported that 26.5% of people who are homeless were infected with HCV 12. In addition, a third related at-risk population group is that of prisoners, who also have a high risk of HCV infection. A study from 2000 investigating HBV and HCV prevalence in 8 prisons across England and Wales found that 7% of participants were HCV-antibody positive 13. Work conducted in five Scottish prisons in 1999 reported a prevalence of 20.3% (95% confidence interval (CI): 18.3%-22.3%) 14.
To date, at least seven genotypes and 67 subtypes of HCV have been found worldwide 15. Although the distribution of genotypes varies between regions 16, the most common genotype worldwide is genotype 1, with subtype 1a predominating in the USA and northern Europe 17 and subtype 1b commonly found in Japan and Southern and Eastern Europe. Genotype 3 is the next most common genotype worldwide, accounting for approximately 30.1% of global cases 18; it is particularly found in southern Asia and Australasia and is also common among PWID in Europe. In the past, the HCV genotype was an important parameter in selecting PEGylated interferon antiviral therapy, as it could influence treatment effectiveness. The treatment was less effective for genotype 1 patients, with treatment efficacy ranging between 40% and 60% [19][20][21], compared to patients with genotype 2 and 3 infection, with 80%-90% treatment efficacy [22][23][24]. In the current era of DAA treatment, knowledge of the genotype has become less relevant with the introduction of pan-genotypic therapies 25,26. However, genomic information remains pivotal in the continuous surveillance of pathogens, the characterisation of point outbreaks and the identification of sustained transmission within defined settings such as hospitals, especially when combined with epidemiological information 27,28.
The aim of the study was to combine whole genome data and epidemiological data in order to investigate the distribution and potential transmission of HCV genotypes amongst people of similar 'socio-economic clustering'. The current hypothesis is that HCV probably evolves and is transmitted in micro-epidemics within geographically or socially defined communities 27. Thus, it is likely that the genomic information from HCV-positive participants of the TB-REACH study, when combined with the existing epidemiological characteristics, might provide correlates that are relevant to lifestyle parameters of the defined populations in question. In this study we used previously known and well-characterised population groups, where the application of whole genome sequencing is most likely to be impactful.

Methods
Ethical approval
NHS Research Ethics Committee approval (13/LO/1303) for the ICONIC study (Infection response through virus genomics) was received on 20th August 2013, Integrated Research Application System (IRAS) project ID 131373. Approval applies to all NHS sites taking part in the study, and additional permissions were obtained from the NHS/HSC R&D offices of all partner sites prior to the start of the study. In addition, ethical approval for the TB-REACH study was obtained from the East of England - Essex National Research Ethics Service Committee (reference number 10/H0302/5).

Sample collection
We conducted a cross-sectional study between May 2011 and June 2013 in London, UK. This was part of a study whose main purpose was to assess the prevalence and risk factors of HCV among the three vulnerable groups described below 29. Patients were recruited from 39 homeless hostels and 20 drug treatment services through the National Health Service (NHS) Find and Treat (F&T) Service 30. The F&T service is a specialist outreach team whose main target is to tackle TB among people who are homeless, vulnerable migrants, and drug or alcohol users, working alongside NHS and third sector front-line services. The service screens almost 10,000 high-risk people every year, covering every London borough. The sample size was determined by the available data. All samples for which both viral sequencing and the questionnaire data were available were included in the final analysis unless otherwise specified. As the resulting population sample was a composite of three smaller sets of samples from three different vulnerable groups, there is the potential for selection bias. While we have tried to address this aspect by including the entire genome in the phylogenetic analyses (thus maximising the number of SNP sites), we have also refrained from extrapolating any observations as representative of the population structure of the HCV patients in London.
Patients were eligible for inclusion in the study if they were aged > 18 years, had the capacity to consent and were identified as people who are homeless (lived in a homeless hostel); had a history of drug use (using services from drug treatment centres); or were inmates at the prison at the time of the study with a history of drug use. The research staff visited each study setting and provided information sheets for those eligible. Participants who agreed to join the study were required to complete and sign a consent form. Following their consent, a questionnaire was administered and completed by researchers employed by the study to collect demographic information, information on previous HCV test results, smoking status and risk behaviours. The questionnaire was developed by the F&T service and the TB-REACH team and was piloted in all three target populations, with no further changes implemented as a result of the preliminary testing. The questionnaire can be found as Extended data 31.

Sample sequencing
The samples were obtained as described in detail previously 32. Briefly, service users screened for TB on a mobile chest x-ray unit, and in prison using the static digital x-ray machine, were approached and, with consent, blood was drawn for IGRA (Quantiferon In-Tube) and for HIV, HCV and HBV testing. Results were provided to participants with onward referral to healthcare services in line with current guidance. Treatment outcomes for the positive cases were collected via telephone follow-up one year post referral. RNA extraction was performed on the residual diagnostic blood specimens using the QIAamp Virus BioRobot MDx Kits (Qiagen) according to the manufacturer's instructions, requiring a minimum of 400 μl per viral sample for a fully automated procedure. The extractions were performed on the BioRobot MDx 8000 instrument. Extracted RNA samples were amplified exactly as described previously 33 and were processed locally within the UCL Hospital Virology laboratories for PCR library preparation and next-generation sequencing using a Nextera XT Library Prep Kit (Illumina) before sequencing using an Illumina MiSeq Benchtop Sequencer generating 2 × 300 bp length paired-end reads (v3 kit).
Sequence de novo assembly
Genome assembly and construction of consensus sequences were performed using the ICONIC bioinformatics pipeline for de novo viral sequence assembly, as validated by Public Health England 34. In short, Trimmomatic (v. 0.33) was used to remove primer sequences from the raw reads and to trim them. To remove contaminants, raw reads were mapped with SMALT (v. 0.7.6) to a decoy genome containing both viral and human sequences. Non-viral sequences were removed. Quality-controlled and filtered read sets were de novo assembled using IVA (v. 1.0.0; https://sanger-pathogens.github.io/iva/). SAMtools (v. 1.2) and custom scripts were used to create a consensus genome from the assembled fragments ("contigs"). In particular, these scripts use BLAST to find the closest matching reference sequences to the draft segments and use them as templates to construct the consensus genome. The sequences have been deposited in the GenBank public database.

Phylogenetic analysis
Multiple data analyses were performed using phylogenetic approaches. Under the maximum composite likelihood model, a neighbour-joining tree was constructed using nearly full-length HCV coding sequences assembled as described previously 35,36. Genetic distance matrices were created using the pairwise distance matrix calculator in MEGA (v 5.2.2). Distances (expressed as the average number of nucleotide substitutions per site) were calculated from the entire genome under the maximum composite likelihood substitution model, with heterogeneity among sites modelled through a 4-category discrete approximation of a gamma distribution.
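The idea of a pairwise genetic distance matrix can be illustrated with a minimal Python sketch. It does not reproduce the MEGA calculation used in the study: instead of the maximum composite likelihood model with gamma-distributed rate heterogeneity, it uses the much simpler proportion of differing sites (p-distance) with a Jukes-Cantor correction, purely as a stand-in. The sequence names and the toy alignment are hypothetical and are not study data.

```python
import math
from itertools import combinations

# Hypothetical toy alignment (not study data); sequences must be pre-aligned.
TOY_ALIGNMENT = {
    "sample_A": "ACGTACGTAC",
    "sample_B": "ACGTACGTAT",
    "sample_C": "ACGAACG-AT",
}


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def jc69_distance(p):
    """Jukes-Cantor estimate of substitutions per site from a p-distance.

    This is a simplification; the study used a different substitution model.
    """
    if p >= 0.75:
        return float("inf")  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aligned):
    """Return (names, nested dict) of symmetric pairwise distances."""
    names = list(aligned)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = jc69_distance(p_distance(aligned[a], aligned[b]))
        matrix[a][b] = d
        matrix[b][a] = d
    return names, matrix


if __name__ == "__main__":
    names, matrix = distance_matrix(TOY_ALIGNMENT)
    for row in names:
        print(row, [round(matrix[row][col], 4) for col in names])
```

A matrix of this kind is the input to neighbour-joining tree construction; in practice a dedicated phylogenetics package would be used for both steps.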

Results
About three-quarters of participants (74/98) reported they had been in a UK prison at some point in the past, and 61% (68/98) had been homeless at least once in their lives. Risk behaviours were common among participants, including smoking (93/98, 95%), problem alcohol use (45/98, 46%) and injecting drug use (83/98, 85%). More than 90% of participants reported ever having smoked heroin/crack and/or injected drugs in their lifetime (91/98, 93%). The participants' characteristics are summarised in Table 1.
Of the 98 PCR-positive HCV samples collected, 88 were of sufficient concentration (viral load >100,000 IU/ml) to be further processed through a next-generation sequencing (NGS) platform and to generate more extensive viral genomes with high read-depth coverage. As a result of the NGS sequencing and the subsequent de novo assembly, 88 complete HCV genomes were assembled. There was complete correspondence of the hepatitis C typing between the PCR- and NGS-based methods across all of the samples. Across the samples in which it was possible to build segments, the average read depth was ~1000 and the average genome coverage was 78% (range 63.5%-89.5%).
As there are differences in the relative distribution of HCV genotypes amongst the different population groups, phylogenetic analysis of the whole genomes was undertaken to investigate potential clustering effects of the entire HCV genomes within those population groups (Figure 2).
As such, participants who were recruited in prison with HCV genotype 2 fell within one cluster, providing some evidence of relatedness of the HCV genomes amongst this group. However, this observation is tentative and likely to be biased due to the low sample number (n=6) of this group. The phylogenetic analysis of the HCV 3a genotype contains some pairs of related genomic sequences at the leaves of the tree, derived from individuals who fell within the same group (8/11 paired leaves are same-group pairs, of which 5 are from individuals in homeless shelters). The low number of complete HCV genomes within this genotype group (n=31) means that this observation is tentative. Lastly, the 1a genotype cluster (n=45) contains a number of paired leaves from the same groups (7/17). However, the phylogeny of this genotype does not allow any further observations to be drawn.

Discussion
This study provides the most detailed whole genome information to date on the relative distribution of HCV genotypes among people who are homeless, people who inject drugs, and prisoners, based on samples collected from 39 homeless hostels, 20 drug treatment services, and a prison over a period of two years in central London. The collected samples were not restricted to any specific groups or other criteria. A major challenge when undertaking studies recruiting hard-to-reach populations is selection bias. We were only able to recruit individuals who were in contact with drug treatment services, homeless shelters or prisons. This may have affected our estimates of relative HCV genotype distribution, as individuals who are not in contact with services may have a higher burden of undiagnosed infection(s). Specifically, within the prison setting, the testing took place alongside an initiative to screen for active TB using radiography. Since prisoners undergoing drug detoxification were located in another part of the prison and could not easily access the testing facility, our analyses exclude these higher-risk prisoners.
As such, the results should not be viewed as representative of the whole community of those population groups. However, these results offer the most detailed description to date of the whole genomes of circulating genotypes in those vulnerable and hard-to-reach groups within central London.
According to the study estimates, 56.8% of HCV genomes were genotype 1 and 36.4% were genotype 3. The results were similar to those reported by Public Health England for the same time window, where 90% of HCV infections in the UK were caused by genotypes 1 and 3, with proportions of 45% and 45% respectively in 2014 37, and 47% and 44% in 2015 38. These relative proportions of the genotypes have remained relatively stable since the time of sample collection for this study, with the relative distribution of genotypes 1 and 3 being reported as 49% and 43% respectively in 2020 39. The phylogenetic analysis suggests that social parameters play an important role in HCV transmission among high-risk populations, as HCV sequences from individuals from the same population group tended to cluster together. However, the limited number of samples does not allow for secure conclusions at this point.
This observation of clustering within related social groups is in line with findings from other studies involving long-term blood-borne infections in drug-using populations 40,41. However, the observed clustering in the current study is difficult to attribute to a single social parameter, as there is an overlap of behaviours between individuals in the three groups studied. As such, it is not possible to make any claims on the exact social parameters driving HCV transmission nor on the directionality of transmission.
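The genotype proportions discussed above follow directly from the reported counts out of 88 genomes. The short Python sketch below recomputes them and, as an illustration only, attaches a Wilson score interval to each proportion to show how wide the uncertainty is at this sample size. The interval calculation is an assumption added here for illustration; it is not part of the study's analysis, and the function names are hypothetical.

```python
import math

# Genotype counts as reported for the 88 complete genomes.
GENOTYPE_COUNTS = {"1": 50, "3": 32, "2": 4, "4": 1, "6": 1}


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total))
        / denom
    )
    return centre - half_width, centre + half_width


def summarise(counts):
    """Map genotype -> (percentage, interval low %, interval high %)."""
    total = sum(counts.values())
    summary = {}
    for genotype, n in counts.items():
        low, high = wilson_interval(n, total)
        summary[genotype] = (
            round(100.0 * n / total, 1),
            round(100.0 * low, 1),
            round(100.0 * high, 1),
        )
    return summary


if __name__ == "__main__":
    for genotype, (pct, low, high) in summarise(GENOTYPE_COUNTS).items():
        print(f"genotype {genotype}: {pct}% (approx. 95% interval {low}-{high}%)")
```

With only one genome each for genotypes 4 and 6, the corresponding intervals are wide relative to the point estimates, which is consistent with the caution expressed about drawing firm conclusions from this sample.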

Conclusions
We describe the HCV distribution among high-risk populations in London, UK, using whole genome sequencing. This work is the first of its kind to target these three specific high-risk groups in London for whole genome sequencing analysis. The study demonstrates the feasibility of this approach and the need for further work in this direction, so that it can serve as the basis for onward recommendations and future intervention plans.
The current results support the view that HCV probably evolves and is transmitted in micro-epidemics within socially and/or geographically defined communities. The further implementation of whole genome sequencing is expected to provide detailed information on transmission, such that a higher proportion of those with epidemiological evidence of transmission would be found to be genetically linked than of those with no such evidence. The application of genomics can help validate collated epidemiological results and offer added value in designing appropriate public health interventions.