Comparative genomics of two Vietnamese Helicobacter pylori strains, CHC155 from a non-cardia gastric cancer patient and VN1291 from a duodenal ulcer patient

Helicobacter pylori is involved in the etiology and severity of several gastroduodenal diseases; however, plasticity of the H. pylori genome makes complete genome assembly difficult. We report here the full genomes of H. pylori strains CHC155 and VN1291, isolated from a non-cardia gastric cancer patient and a duodenal ulcer patient, respectively, and their virulence as demonstrated by in vitro infection. Whole-genome sequences were obtained by combining long and short reads with a hybrid-assembly approach. Both the CHC155 and VN1291 genomes possessed four genomic islands: a cag pathogenicity island (cagPAI), two type 4 secretion system islands within an integrative and conjugative element (tfs ICE), and a prophage. CHC155 and VN1291 carried East Asian-type cagA and vacA s1m1, and outer membrane protein genes, including two copies of oipA. Corresponding to the genetic determinants of antibiotic resistance, chromosomal mutations were identified in CHC155 (rdxA, gyrA, and 23S rRNA) and VN1291 (rdxA, 23S rRNA, and pbp1A). In vitro infection of AGS cells by both strains induced the cell scattering phenotype and tyrosine phosphorylation of CagA and promoted high levels of IL8 secretion, indicating fully intact cagPAI phenotypes. Virulence genes in the CHC155 and VN1291 genomes are crucial for H. pylori pathogenesis and are risk factors in the development of gastric cancer and duodenal ulcer. Our in vitro studies indicate that strains CHC155 and VN1291 carry pathogenic potential.


Results
Clinical characteristics of the patients and features of H. pylori strains CHC155 and VN1291. We have previously obtained over 100 H. pylori strains from Vietnamese gastric cancer and duodenal ulcer patients 24. We chose strains CHC155 and VN1291 for this study because they contain fully intact tfs3 ICEs in addition to cagPAI. H. pylori CHC155 was isolated from a gastric biopsy specimen collected by endoscopy from a 61-year-old Vietnamese male patient with non-cardia gastric cancer. The clinical isolate showed no in vitro resistance to tetracycline or amoxicillin, with minimum inhibitory concentrations (MICs) of 0.12 and 0.06 mg/L, respectively. However, resistance was noted to clarithromycin, levofloxacin, and metronidazole, with MICs of 4 mg/L, 2 mg/L, and ≥ 256 mg/L, respectively. Strain VN1291 was isolated from a 43-year-old female patient with duodenal ulcer at Cho Ray Hospital, Ho Chi Minh. This strain was resistant to amoxicillin, clarithromycin, and metronidazole, with MICs of 0.5 mg/L, 4 mg/L, and ≥ 256 mg/L, respectively, but was susceptible to levofloxacin and tetracycline, with MICs of 0.5 mg/L and 0.125 mg/L, respectively.
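The susceptibility calls above can be reproduced from the reported MICs with a simple breakpoint comparison. The following is a minimal sketch, not the authors' procedure: the breakpoint values in the table are assumptions chosen for illustration and should be checked against the current EUCAST breakpoint table, and an MIC reported as ≥ 256 mg/L is entered as 256.

```python
# Illustrative resistance breakpoints in mg/L (resistant if MIC > value).
# ASSUMPTION: these values are for illustration only; consult the current
# EUCAST clinical breakpoint table for H. pylori before any real use.
BREAKPOINTS_MG_PER_L = {
    "amoxicillin": 0.125,
    "clarithromycin": 0.5,
    "levofloxacin": 1.0,
    "tetracycline": 1.0,
    "metronidazole": 8.0,
}


def classify(mics):
    """Return {antibiotic: 'R' or 'S'} for a dict of MICs in mg/L."""
    return {
        drug: ("R" if mic > BREAKPOINTS_MG_PER_L[drug] else "S")
        for drug, mic in mics.items()
    }


# MICs as reported in the text (">= 256" entered as 256).
CHC155 = {"amoxicillin": 0.06, "clarithromycin": 4, "levofloxacin": 2,
          "tetracycline": 0.12, "metronidazole": 256}
VN1291 = {"amoxicillin": 0.5, "clarithromycin": 4, "levofloxacin": 0.5,
          "tetracycline": 0.125, "metronidazole": 256}

for name, mics in (("CHC155", CHC155), ("VN1291", VN1291)):
    print(name, classify(mics))
```

Under these assumed breakpoints, the sketch gives the same resistant and susceptible calls as those stated for both strains.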
The de novo assembly of the CHC155 genome resulted in a single circular contig of 1,696,601 bp. Using the CheckM v1.1.6 algorithm 28, the genome assembly reached > 99% completeness with no contamination or strain heterogeneity. DFAST quality control with FastANI v.1.33 29 showed the highest average nucleotide identity of the CHC155 genome to be 94.9% against H. pylori strain ATCC43504 (GCA_004295525.1), which was assigned to the same species. Strain CHC155 contained four rRNA genes, including two copies of 23S rRNA and two copies of 16S rRNA (Table 1 and Fig. 1A). We also determined the full genome of strain VN1291 (Table 1 and Fig. 1B) for comparative assessment of CHC155 virulence.
We detected one 21-bp CRISPR-like sequence (CTTCAATCAAGGCACTTATAA) 30 in both strains. This sequence was located in the vlpC gene, which encodes a putative vacuolating cytotoxin (vacA)-like protein C, an outer membrane protein toxin.
Genomic island prediction and sequence comparison. GI prediction was based on common features of genomic islands, including mobility genes, phage-related genes, direct repeats, and nucleotide composition bias. Genomic islands identified in strain CHC155 included two tfs3 ICEs, a KHP30-like prophage, and a cag-

T4SS within the tfs ICE
Strains CHC155 and VN1291 both possessed two tfs ICEs (Figs. 1, 2). Based on the genetic arrangement of rlx (virD2 relaxase), xerD (integrase), virB6 (T4SS gene), and sequence identity, the two tfs ICEs in strain CHC155 In addition, they harbored the DNA processing genes, virD2 relaxase and xerT integrase/recombinase (Fig. 2). In both strains, one tfs3 ICE contained the cell-translocating kinase A (ctkA) gene, a translocation protein of tfs3 ICE. In addition, the longest tfs3 ICE gene was an 8600 bp DNA methyltransferase.

Phylogenomic analysis of strains CHC155 and VN1291
Virulome and antimicrobial resistance genes of strains CHC155 and VN1291.

Virulome
Virulence genes were identified in strains CHC155 and VN1291 by screening the virulence factor database using ABRICATE (Supplementary Table S2). These virulence factors were diverse and classified into eight categories: acid resistance, adherence, endotoxin, molecular mimicry, chemotaxis and motility, proinflammatory effects, toxins, and secretion systems (Supplementary Table S2). Strains CHC155 and VN1291 possessed 19 and 20 types of outer membrane protein genes (hopA-Z), respectively, among which were hopH (oipA), hopS (babA)/hopT (babB), and hopU (babC). Interestingly, for some outer membrane proteins, two copies (hopD, hopH, hopJ, hopZ) or multiple copies (hopN, hopO) were found in one or both genomes (Supplementary Table S3). There was a slight difference in the number of outer membrane proteins (hopD, hopMN, and hopO) between the two strains. Strain VN1291 carried two genes encoding the HopD protein while strain CHC155 carried only one. Meanwhile, there were three genes encoding HopO proteins in strain VN1291 but only two in strain CHC155. The genes cagA, which encodes an effector protein, and vacA, which encodes a vacuolating and pore-forming protein, were also identified in the two strains. The C-terminus of CagA contained three Glu-Pro-Ile-Tyr-Ala (EPIYA) motifs with neighboring sequences corresponding to ABD-type, East Asian-specific CagA (Fig. 5A), which may undergo tyrosine phosphorylation to hijack internal signaling pathways. In addition, the vacA gene motif of CHC155 and VN1291 was assigned to s1m1 (Fig. 5B), which induces a high level of vacuolation and cytotoxicity and is associated with gastric inflammation, peptic ulcer, and gastric cancer 33,34.
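Counting EPIYA motifs in a CagA protein sequence is a simple string search, sketched below. This is an illustration only: the function name and the toy peptide are hypothetical, the toy string is not the real CagA sequence, and assigning each motif to an A, B, C, or D segment additionally requires inspecting the flanking residues, which this sketch does not attempt.

```python
import re


def find_epiya(protein):
    """Return the 0-based start position of every EPIYA motif in a protein string."""
    return [match.start() for match in re.finditer("EPIYA", protein.upper())]


# Hypothetical toy peptide with three motifs (NOT the real CagA sequence).
toy_peptide = "AAEPIYAKKKEPIYAGGGEPIYATT"

positions = find_epiya(toy_peptide)
print(len(positions), positions)
```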

Antimicrobial resistance gene profiling
Strain CHC155 was phenotypically resistant to clarithromycin, levofloxacin, and metronidazole but susceptible to amoxicillin and tetracycline (Table 3). Consistent with this phenotype, resistance mutations in gyrA (HP0701), 23S rRNA, and rdxA (HP0954) were identified in its genome, whereas no relevant mutations were noted in pbp1A (HP0597) or 16S rRNA (Table 3). In strain VN1291, we identified resistance mutations in pbp1A (HP0597), rdxA (HP0954), and 23S rRNA. The high sequence coverage of the VN1291 and CHC155 genomes makes variant calling of antimicrobial resistance mutations reliable. To predict antibiotic resistance, we screened the CHC155 and VN1291 genomes using several antimicrobial resistance databases. No acquired antibiotic resistance genes were predicted using CARD, PlasmidFinder, ARG-ANNOT, or ResFinder (Supplementary Table S4). However, MEGARes detected a multidrug efflux transporter gene (HP0313) (Supplementary Table S4-6). This gene belongs to the YbfB/YjiJ family of major facilitator superfamily transporters, which is widely distributed in microbial genomes and exhibits a large spectrum of substrate specificities 35.
In vitro assessment of CHC155 and VN1291 virulence. AGS gastric epithelial cells were infected with strains CHC155 and VN1291 to assess their virulence. A cell elongation phenotype, known as the hummingbird phenotype, is observed after CagA injection from H. pylori into AGS cells 36-38. Moreover, induction of IL8 during H. pylori infection depends on the presence of cagPAI, both in vitro and in vivo 39,40. To confirm whether such virulent phenotypes occur with these two Vietnamese strains, AGS cells were infected with CHC155 or VN1291, or with 26695 as a control. Twenty-four hours after infection, we observed hummingbird phenotypes in AGS cells (red arrows in Fig. 6A).
After infection of AGS cells with each strain, CagA phosphorylation (Fig. 6B) was assessed in comparison with total CagA by immunoblotting. CagA phosphorylation was induced by strains 26695 and VN1291 at 6 h after infection and the degree of phosphorylation was increased at 24 h. This phenomenon was also observed for strain CHC155, but over a longer time scale; phosphorylation was faint at 6 h, but strong at 24 h. Bacterial counts were monitored using urease subunit UreB, an abundant H. pylori protein (Fig. 6B, and Supplementary Fig. S1). Twenty-four hours after infection, strain CHC155 induced approximately 1.5 times and strain VN1291 approximately 2 times higher levels of IL8 than strain 26695 (Fig. 6C). The level of IL8 secretion indicates the ability to induce inflammatory activity; therefore, strains CHC155 and VN1291 are inflammatory strains.

Discussion
We report the complete genomes of H. pylori strains CHC155 and VN1291, which were isolated from Vietnamese patients with non-cardia gastric cancer and duodenal ulcer, respectively. Both strains can induce the hummingbird phenotype, IL8 secretion, and CagA phosphorylation. This indicates that the strains have the potential to initiate pathogenic changes in the gastric mucosa. Phylogenetic analysis assigned the two strains to the hspEAsia population. In addition, the outer membrane protein compositions of strains CHC155 and VN1291 contain two hopH (oipA), two hopS (babA)/hopT (babB), two hopJ, and multiple hopMN proteins, which is similar to other hspEAsia strains described by Kawai et al. 41. Divergence in the number of outer membrane protein loci between hspEAsia, hspAmerind, hspWAfrica, and hpEurope populations has been shown, and the hpEurope and hspWAfrica populations possess one hopH locus and three babA/babB/babC loci 41. Interestingly, hopU (babC) from both Vietnamese strains showed low coverage relative to that from strain 26695. The differences in the number of outer membrane proteins and in genetic variation between two strains of the same population but isolated from two distinct diseases, cancer and duodenal ulcer, and between H. pylori populations, may reflect the flexibility and adaptation capability of the H. pylori genome in host interaction.
Some H. pylori virulence factors are crucial for prolonging infection in the gastric mucosa via molecular mimicry. Strain CHC155 harbors futB and futC, which encode Lewis antigens expressed in the gastric mucosa (Supplementary Table S2). This antigenic mimicry suppresses an immune response against the bacteria and allows it to adhere to the gastric mucosa 42 . NapA promotes adhesion of human neutrophils to endothelial cells and the production of reactive oxygen radicals. In addition, the outer membrane family proteins, babA/babB and hopQ are involved in H. pylori adhesion 43 . The inflammatory effect of H. pylori is associated with the effect of oipA protein (hopH) on IL8 production 44 .
Our interest focused on important known virulence factors of H. pylori that are associated with gastroduodenal diseases. Strains CHC155 and VN1291 harbored the East Asian-type cagA (ABD motif) and vacA s1m1. Both strains also possessed cagPAI, a genomic island that encodes the T4SS machinery for translocating CagA into host cells. In addition, strain CHC155 possessed two tfs3 ICEs that contain a complete T4SS cluster. The genetics of tfs3/4 ICE varies among H. pylori strains; they can harbor a complete or partial tfs fragment, or no tfs 22,23. Moreover, tfs ICE is frequently exchanged and integrated into the genome in a hybrid tfs3-4 ICE through conjugation 22,23. Compared with previous genome studies of H. pylori 22,23,45,46, we found that strain VN1291 possessed one tfs3 and a hybrid tfs3-4. Furthermore, we identified a KHP30-like prophage in the genome of strain CHC155 and a KHP40-like prophage in strain VN1291. The prophage was integrated between comGF at the 5′ end and a putative outer membrane protein at the 3′ end, similar to the other hspEAsia strains 47. comGF plays a role in transformation and DNA binding, which contributes to the genetic variability of H. pylori 48, while outer membrane proteins mediate adherence to the gastric epithelium and are associated with the clinical outcome of the infection 49. These findings indicate that the prophage genetic element is adaptable in the H. pylori population.
cagA gene-positive strains affect the severity of gastroduodenal disease. The phosphorylated or non-phosphorylated form of CagA activates downstream host cell-signaling pathways by binding to adaptor proteins, such as Crk, Grb2, SHP-2, PAR1, and c-Met 50,51. Crk-CagA interaction induces cell-cell dissociation and development of the hummingbird phenotype 50,51.
Therefore, our in vitro results of CagA phosphorylation and the hummingbird phenotype in AGS cells infected with strain CHC155 or VN1291 indicate the potential virulence of both strains. Interestingly, we observed higher levels of CagA and phosphorylated CagA in strain VN1291 at 6 h compared with levels in other strains, and increased levels of CagA at 24 h post infection in strain CHC155 compared with that in strain 26695, even if we take into account the strain specificity of the commercial CagA antibody used in our immunoblot analysis, which was originally generated using Western-CagA epitopes of a Western H. pylori strain. Regulation of cagA transcription by NaCl was found in strain 26695 and Colombian clinical isolates and is mediated via two copies of a TAATGA motif in the cagA promoter region 52. Moreover, a +59 motif in the cagA 5′-untranslated region influences the levels of CagA 53. Further investigations are necessary to understand this mechanism of CagA regulation in Vietnamese strains. The observation of high IL8 levels following infection with strains CHC155 and VN1291 compared with strain 26695 was particularly interesting, although a difference in IL8 levels between cagPAI-positive strains was previously observed 54. Two main hypotheses can account for this difference. First, IL8 secretion may be induced when tfs3 ICE is present, as is the case for cagPAI 55. The genome analysis showed that strains CHC155 and VN1291 possessed a complete T4SS cluster (11 T4SS core genes) in the tfs3 ICE, whereas virB8, virB9, virB10, virB11, and virD4 genes were absent in strain 26695. Second, the effector protein CtkA, encoded in tfs3 ICE, might promote proinflammatory activity. Strains CHC155 and VN1291 both possess ctkA but strain 26695 does not. Recent evidence indicates that T4SS genes of tfs3 support the injection of CtkA into host cells and the induction of high levels of IL8 secretion 56.
Our virulence profiling showed that strain VN1291 is very similar to strain CHC155. This may be because both strains belong to the hspEAsia population, although they were isolated from two patients with different diseases, duodenal ulcer (DU: VN1291) and gastric cancer (GC: CHC155). Our previous phylogenetic analysis at the whole genome scale showed that the indicated DU and GC strains were distributed together 24,57. In addition, our previous genome-wide association study on hspEAsia strains indicated that single nucleotide polymorphisms between strains causing different diseases could be discovered and underlying mechanisms suggested, such as electric charge alteration at the ligand-binding pocket, change in subunit interaction, and mode-switching DNA methylation. The virulence gene components of DU and GC strains were similar, and single nucleotide polymorphisms may affect host-pathogen interaction and are novel candidates for disease discrimination 57.

Conclusions
Here, we report the complete genomes of strains CHC155 and VN1291, which were isolated from patients with non-cardia gastric cancer and duodenal ulcer, respectively, as two representative virulent strains from Vietnam. Both strains carry East Asian-type cagA and vacA s1m1. Furthermore, each strain possesses cagPAI, two tfs3/tfs4 ICEs, and a prophage. Strains CHC155 and VN1291 can induce proinflammatory responses and morphological changes in gastric epithelial cells, indicating their potential virulence.

Materials and methods
H. pylori and genome sequencing. Helicobacter pylori CHC155 was isolated from a 61-year-old male patient with non-cardia gastric cancer at Cho Ray Hospital, Ho Chi Minh. H. pylori VN1291 was isolated from a 43-year-old female patient with duodenal ulcer at Cho Ray Hospital, Ho Chi Minh. After the international transfer of gastric antral biopsy samples to Oita University, both strains were isolated using standard culture methods as previously described 9. DNA was extracted using a DNeasy Blood & Tissue kit (Qiagen Inc., Valencia, CA, USA). DNA concentration was measured using a Quantus Fluorometer (Promega, Madison, Wisconsin, USA). The extracted genomic DNA was sheared for library construction using a Covaris g-TUBE device according to the manufacturer's instructions. High-throughput genome sequencing was performed on a HiSeq 2500 system (2 × 150 paired-end reads) for strain CHC155 and a MiSeq system (2 × 300 paired-end reads) for strain VN1291, following the manufacturer's instructions (Illumina, San Diego, CA, USA). Trimmomatic v. 0.35 was used to remove adapter sequences and low-quality bases from raw short-read data 58.
A SMRTbell library was prepared using a SMRTbell Template Prep Kit 1.0 (Pacific Biosciences, CA, USA). DNA fragments larger than 17 kb were selected using the BluePippin system (Sage Science, MA, USA). For each H. pylori strain, one SMRT cell was run on the PacBio RS II System with P6/C4 or P6/C4v2 chemistry and 360-min movies (Pacific Biosciences, Menlo Park, CA, USA). SMRT sequencing data were analyzed using SMRT Analysis version 2.3.0 via the SMRT Portal.
De novo assembly and genome annotation. To generate a complete genome assembly, de novo assembly was performed with the hybrid-assembly method using Unicycler v.0.4.8, which combined both long and short reads with default parameters; command line: unicycler -1 SHORT1 -2 SHORT2 -s UNPAIRED -l LONG --min_fasta_length 200 59. Briefly, short reads (HiSeq/MiSeq) were assembled into several contigs, and gaps between contigs were bridged by long reads (PacBio in strain CHC155 and Oxford Nanopore in strain VN1291) to generate complete circular contigs, which have one link connecting the end to the start. The assembly was subsequently polished for a maximum of 10 rounds using Pilon 60 and then Unicycler software was applied. Finally, genome features were annotated using PROKKA v1.14.6 61, a rapid bacterial genome annotation pipeline, using default parameters.
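For scripted runs, the hybrid-assembly command can be assembled as an argument list before being passed to a process launcher. The sketch below only builds and prints the command; it does not run Unicycler. The file names and output directory are hypothetical placeholders, the `-o` output option is added here as an assumption because it is not shown in the command quoted in the text, and the long option is written with a double dash.

```python
import shlex


def unicycler_command(short1, short2, unpaired, long_reads, out_dir,
                      min_fasta_length=200):
    """Build the argument list for a Unicycler hybrid assembly."""
    return [
        "unicycler",
        "-1", short1,          # first-in-pair short reads
        "-2", short2,          # second-in-pair short reads
        "-s", unpaired,        # unpaired short reads left after trimming
        "-l", long_reads,      # long reads (PacBio or Oxford Nanopore)
        "-o", out_dir,         # ASSUMPTION: output directory, not in the text
        "--min_fasta_length", str(min_fasta_length),
    ]


# Hypothetical file names, for illustration only.
cmd = unicycler_command("CHC155_R1.fastq.gz", "CHC155_R2.fastq.gz",
                        "CHC155_unpaired.fastq.gz", "CHC155_long.fastq.gz",
                        "CHC155_assembly")
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`.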

Determination of antibiotic resistance phenotypes and genotypes. Antimicrobial susceptibility was assessed using Etest® (bioMerieux) for five antibiotics (amoxicillin, clarithromycin, levofloxacin, tetracycline, and metronidazole) following European Committee on Antimicrobial Susceptibility Testing (EUCAST) v.21.01.01 protocols on Müller-Hinton agar plates supplemented with 5% horse blood. The minimum inhibitory concentrations of the antibiotics were determined by checking the plates daily over 3-6 days of incubation. H. pylori strain 26695 was used as a control strain. Clinical breakpoints between resistant and susceptible strains were determined following the EUCAST guidelines available at http://www.eucast.org/. The genetic determinants of antibiotic resistance in strain 26695, gyrA (HP0701), 23S rRNA, rdxA (HP0954), pbp1A (HP0597), and 16S rRNA, were used to retrieve those in strains CHC155 and VN1291 using the blastn algorithm with a minimum coverage of 80% and a minimum identity of 90%. Nonsynonymous mutations in either Vietnamese strain versus 26695 were compared with resistance mutations described in H. pylori strains 69.
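The coverage and identity thresholds used for the blastn search can be applied to tabular BLAST output with a small filter, sketched below. This is an illustration under stated assumptions rather than the authors' script: it assumes blastn was run with a custom tabular format of six columns (`qseqid sseqid pident qstart qend qlen`), it computes query coverage from the aligned query span, and the example rows are invented values, not results from this study.

```python
def passes_thresholds(line, min_coverage=80.0, min_identity=90.0):
    """Check one tab-separated hit: qseqid, sseqid, pident, qstart, qend, qlen."""
    qseqid, sseqid, pident, qstart, qend, qlen = line.rstrip("\n").split("\t")
    # Query coverage as the percentage of the query spanned by the alignment.
    coverage = 100.0 * (abs(int(qend) - int(qstart)) + 1) / int(qlen)
    return float(pident) >= min_identity and coverage >= min_coverage


# Invented example rows (hypothetical values, not data from the study).
hits = [
    "gyrA\tcontig_1\t96.5\t1\t2400\t2484",   # long span, high identity
    "rdxA\tcontig_1\t95.0\t1\t300\t633",     # spans under half of the query
    "pbp1A\tcontig_1\t88.0\t1\t1980\t1980",  # full span but identity below 90%
]

kept = [hit.split("\t")[0] for hit in hits if passes_thresholds(hit)]
print(kept)
```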
Evaluation of H. pylori virulence to and IL8 secretion from AGS cells. The virulence of H. pylori strains CHC155, VN1291, and reference strain 26695 was assessed experimentally by infecting human gastric epithelial AGS cells as described previously 70. AGS cells were originally isolated in 1979 from the stomach tissue of a 54-year-old, white, female patient with gastric adenocarcinoma. These cells exhibit epithelial morphology 71. The experiments were performed independently twice. Briefly, AGS cells were seeded onto six-well plates, grown overnight in RPMI 1640 medium supplemented with 10% FBS, and incubated at 37 °C in 5% CO2. H. pylori strains were suspended in RPMI 1640/10% FBS from a 2-day Brucella agar plate culture supplemented with 7% horse blood and added to the 70-80% confluent AGS culture at a multiplicity of infection of 100. After coculture for 6 and 24 h, cells were fixed with 10% paraformaldehyde for 15 min and washed with PBS, and formation of the hummingbird phenotype was examined under a phase-contrast microscope in randomly chosen fields. IL8 in the supernatant of infected AGS cells in 12-well plates (n = 4) was measured using a Human IL8 Uncoated ELISA Kit (Invitrogen, USA). Western blot analysis was performed using infected cell lysates from a 12-well plate culture. Antibodies against the following were used: p-Tyr (PY-99, Santa Cruz Biotechnology, Dallas, TX, USA), CagA (Austral Biologicals, San Ramon, CA, USA), urease B (Institute of Immunology, Tokyo, Japan), and β-actin (Sigma-Aldrich, St. Louis, MO, USA).
Ethics approval and consent to participate. The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of Oita University. Informed consent was obtained from all subjects involved in the study.
Consent for publication. Written informed consent was obtained from the patients to publish their data in this paper.