Characterization and genetic diversity of Helicobacter pylori type IV secretion system components CagI and CagN and their association with clinical outcomes among Iranian patients

Background: Several cagPAI genes in the H. pylori genome have been proposed to have evolved under diversifying selection and evolutionary pressure. Among them, CagI and CagN are encoded by two different operons of the cagPAI and are involved in the T4SS, but a definite association of these factors with clinical manifestations remains unclear. Methods: A total of 70 H. pylori isolates were obtained from patients with different gastroduodenal diseases. All isolates were examined for the presence of primary H. pylori virulence genes by PCR analysis. Direct DNA sequence analysis was performed for the cagI and cagN genes, and the results were compared with the reference strain. Results: The cagI, cagN, cagA, cagL, vacA s1m1, vacA s1m2, vacA s2m2, babA2, sabA and dupA genotypes were detected in 80%, 91.4%, 84%, 91.4%, 32.8%, 42.8%, 24.4%, 97.1%, 84.3%, and 84.3% of the total isolates, respectively. The most variable codon usage in cagI was observed at residues 20 to 25, 55 to 60, 94, 181 to 199, 213 to 221, 241 to 268, and 319 to 320, while the most variable codon usage in the CagN hypervariable motif (CagNHM) was observed at residues 53 to 63. Sequencing data analysis of cagN revealed a hypothetical hexapeptide motif (EAKDEN/K) at residues 278-283 in six H. pylori isolates, which needs further study to evaluate its putative function. Conclusion: The present study demonstrated a high prevalence of cagI and cagN genes among Iranian H. pylori isolates from patients with gastroduodenal diseases. No significant correlation between cagI and cagN variants and clinical outcomes was observed in the present study. However, the high prevalence of cagPAI genes, including cagI, cagN, cagA and cagL, among all patients suggests a potential role for these genes in disease outcome.


Introduction
Helicobacter pylori (H. pylori) is a Gram-negative, microaerophilic bacterium that can chronically colonize the human stomach. This organism infects more than 50% of the world's population, and is the main cause of chronic active gastritis, gastric and duodenal ulcers, mucosa-associated lymphoid tissue (MALT) lymphoma and gastric adenocarcinoma [1,2]. H. pylori infection is recognized as the major risk factor for the development of gastric cancer, which is the fifth most common malignancy and the third leading cause of cancer-associated morbidity worldwide [3]. The severity of H. pylori-induced gastric diseases seems to be associated with several parameters, including host genetic polymorphism, inflammatory responses, environmental factors, and bacterial virulence genotype [4,5].
H. pylori shows high genetic variability, including in its virulence genes, owing to genetic plasticity, DNA rearrangement, and high transformation and recombination frequencies. Consequently, disease progression and clinical outcomes in H. pylori-infected patients vary greatly by geographic region. To date, several virulence factors have been identified in the genome of H. pylori, such as CagA, VacA, BabA, SabA, and DupA [5,6]. CagA, an oncoprotein, is the best-studied virulence-associated factor of H. pylori and is translocated into host gastric epithelial cells via the type IV secretion system (T4SS). The H. pylori T4SS machinery is encoded by a chromosomal region of approximately 40 kb named the cag pathogenicity island (cagPAI) [7,8]. The cagPAI contains about 27-31 genes, a subset of which encodes the main components of the T4SS apparatus spanning the bacterial membranes. Moreover, possibly 15 to 16 different T4SS proteins are required for translocation of CagA and peptidoglycan fragments into host cells, and for secretion of IL-8 from gastric epithelial cells [9]. Once translocated, CagA modulates host cell signaling, which results in loss of membrane polarity, cell elongation, induction of inflammatory cytokines and development of gastric adenocarcinoma [10]. The cagPAI encodes several unique Cag components that have no sequence similarity to any other bacterial proteins involved in T4SS. However, a number of cagPAI genes, such as cagI and cagN, have been proposed to have evolved under diversifying selection and evolutionary pressure [11].
CagI, a small protein (41.5 kDa) encoded by the cagI (cag19/hp0540) gene, shares no sequence or topological homology with any other known protein [12,13], whereas CagN, a 32-35 kDa protein also termed Cag17/HP0538 and encoded by the cagN gene (hp0538), is a poorly characterized component of the T4SS that appears to be localized to the bacterial inner membrane rather than the periplasm [9,12,14,15].
There are some conflicting reports about the role of CagI and CagN in CagA translocation, IL-8 induction from gastric epithelial cells, and the H. pylori T4SS machinery [14,16-20]. More recent literature has revealed that CagI is capable of binding to β1 integrins of the host cell, is essential for CagA translocation, and is also involved in pilus biogenesis of the T4SS [21,22]. On the other hand, deletion of cagN can reduce the degree of CagA phosphorylation in host cells, and CagN is not considered a substrate of the T4SS [14]. However, the putative role of CagI and CagN in translocation of CagA and in H. pylori pathogenesis has not been precisely clarified. The oncogenic potential of H. pylori strains is associated with their virulence capacity, genetic diversity and specific sequence polymorphisms within the key genes involved in translocation and phosphorylation of T4SS effectors [23-26]. Therefore, the present study aimed to determine the prevalence of cagI and cagN genes and their amino acid sequence polymorphisms in Iranian H. pylori-infected patients with various gastroduodenal diseases. The probable association of the genetic variants of cagI and cagN, and of other H. pylori virulence genotypes, with clinical consequences was also investigated.

H. pylori clinical isolates and biopsy specimens
Gastric biopsy specimens were obtained from 70 patients who underwent upper gastroduodenal endoscopy at the Research Institute for Gastroenterology and Liver Diseases in Tehran between January 2017 and May 2019. Three antral biopsies were taken from each patient and examined by culture and histopathology. The biopsy specimens were immediately placed in transport medium containing thioglycolate supplemented with 3% yeast extract (Oxoid Ltd., Basingstoke, UK) and 1.3 g/L agar (Merck, Germany). All patients provided written informed consent. The study was approved by the Institutional

H. pylori culture and identification
Biopsy specimens were carefully homogenized and inoculated onto the surface of Brucella agar plates (Merck, Germany) supplemented with 7% (v/v) horse blood, 10% fetal calf serum (FCS), Campylobacter-selective supplement (vancomycin 2.0 mg, polymyxin 0.05 mg, trimethoprim 1.0 mg), and amphotericin B (2.5 mg/l). Incubation was performed at 37°C for 3-7 days under a microaerophilic atmosphere (5% O2, 10% CO2 and 85% N2) in a CO2 incubator (Innova® CO-170; New Brunswick Scientific, USA). Suspected colonies were identified as H. pylori based on colony morphology, Gram staining, positive reactions in oxidase, catalase and urease tests, and H. pylori gene-specific PCR following previously described protocols [27,28]. Pure cultures of confirmed isolates were kept in 0.5 ml of brain heart infusion (BHI) medium (Merck, Germany) containing 15% glycerol plus 20% FCS, and stored at -80°C until further analysis.

Genomic DNA extraction
Genomic DNA was extracted from freshly harvested colonies on agar plates using the QIAamp DNA Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. The quality of the DNA was checked using a NanoDrop® ND-1000 spectrophotometer (Thermo Fisher Scientific, USA). The extracted DNA samples were stored at -20°C until PCR assay.

Genotyping of H. pylori virulence-associated genes
PCR analysis was performed to detect virulence target genes, including cagL, cagA, vacA alleles (s1/s2 and m1/m2), babA2, sabA and dupA, using specific primers (Table S1). Briefly, PCR mixtures in a volume of 25 µl consisted of 2 µl of template DNA (approximately 200 ng), 0.1 mM of each primer, 2.5 µl of a 10-fold concentrated PCR buffer, 100 mM of deoxynucleotide triphosphates, 1 mM MgCl2, and 1.5 U of Super-Taq™ DNA polymerase (HT Biotechnology Ltd., Cambridge, UK). PCR amplifications were performed in a thermocycler (Eppendorf, Hamburg, Germany) under the following conditions: initial denaturation at 94°C for 4 min, followed by 30 cycles of denaturation at 94°C for 1 min, annealing at the temperature indicated for each reaction in Table S1 for 45 s, and extension at 72°C for 1 min. A final extension step was performed at 72°C for 10 min to ensure full extension of the PCR products. PCR amplicons were electrophoresed on a 1.2% TBE agarose gel, stained with ethidium bromide, and examined under a UV transilluminator. H. pylori J99 (CCUG 47164) and a no-template mixture served as positive and negative controls in each PCR experiment, respectively.

Primer design for cagI and cagN genotyping
The NCBI GenBank database (http://www.ncbi.nlm.nih.gov/genbank/) and the DNA Data Bank of Japan (http://www.ddbj.nig.ac.jp/) were searched for all available complete and partial cagI and cagN sequences of H. pylori strains. Based on pairwise and multiple nucleotide sequence alignments of cagI and cagN genes from different H. pylori strains, and using the complete relevant sequence of H. pylori P12 (CP001217.1) as the reference strain, two pairs of specific primers were designed from the conserved regions for detection of the complete related sequences using CLC Sequence Viewer 8 software (https://www.qiagenbioinformatics.com/). The selected primer target sites were compared to all available complete and partial cagI and cagN sequences of H. pylori strains with the Basic Local Alignment Search Tool (http://blast.ncbi.nlm.nih.gov/Blast.cgi).

Analysis of cagI and cagN diversity by PCR sequencing
For DNA sequencing of cagI and cagN, PCR amplification was carried out in a final volume of 25 µl using the designed specific primers 5′-CATTTGACTTACCTTGATTAC-3′ (cagI-F) and 5′-TTTGAGCACTTGTTGGTTGG-3′ (cagI-R), and 5′-GAGCGACAAAACAACTATGC-3′ (cagN-F) and 5′-GATCCCTAGAACAAAGTAAGC-3′ (cagN-R), yielding DNA fragments of about 1377 and 1192 bp in length, respectively. The PCR products were purified using the Silica Bead DNA Gel Extraction Kit (Thermo Scientific, Fermentas, USA), followed by sequencing on both strands using an automated sequencer (Macrogen, Seoul, Korea). DNA sequences were edited with Chromas Lite version 2.5.1 (Technelysium Pty Ltd, Australia) and BioEdit version 7.2.5 [29]. The cagI and cagN nucleotide and amino acid sequences were aligned to H. pylori strain P12 as the reference strain (GenBank: CP001217.1). Single nucleotide variations and codon usage of the sequences were examined using BioEdit version 7.2.5.
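The comparison against the reference strain can be illustrated with a minimal Python sketch. It is not the BioEdit workflow used in the study: it assumes the amino acid sequences are already aligned to the same length, and the function name `variable_positions` and the toy sequences are hypothetical, chosen only for illustration.

```python
from collections import Counter

def variable_positions(reference, aligned):
    """Map each 1-based position to a Counter of residues differing from the reference.

    Assumes every sequence in `aligned` is pre-aligned to `reference`
    (same length); this sketch does not perform the alignment itself.
    """
    variants = {}
    for seq in aligned:
        if len(seq) != len(reference):
            raise ValueError("sequences must be pre-aligned to the same length")
        for pos, (ref_res, res) in enumerate(zip(reference, seq), start=1):
            if res != ref_res:
                variants.setdefault(pos, Counter())[res] += 1
    return variants

# Hypothetical toy alignment (not study data).
reference = "MKGDEEITK"
isolates = ["MKGDEEITK", "MKGNEEITK", "MKGNEEVTK"]
result = variable_positions(reference, isolates)
for pos in sorted(result):
    print(pos, reference[pos - 1], dict(result[pos]))
```

Positions that collect many distinct substitutions across isolates correspond to the kind of variable regions reported in the Results.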

Phylogenetic analysis
Phylogenetic trees were generated for CagI and CagN nucleotide and amino acid sequences using Molecular Evolutionary Genetics Analysis version 7.0 (MEGA7) [30]. Evolutionary history was inferred by the Maximum Likelihood method, using the Tamura 3-parameter model for nucleotide sequences and the Poisson correction method for amino acid sequences.
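As a rough illustration of the Poisson correction, the distance between two aligned amino acid sequences is commonly computed as d = -ln(1 - p), where p is the proportion of differing sites. The sketch below is not MEGA7's implementation; the function names and the toy sequences are hypothetical, and gaps are treated as ordinary characters for simplicity.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return diffs / len(seq_a)

def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p) for aligned protein sequences."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)

# Hypothetical toy sequences differing at 1 of 10 sites (not study data).
print(poisson_distance("ACDEFGHIKL", "ACDEFGHIKV"))
```

The correction matters little for closely related sequences, where d is close to p, and grows as sequences diverge and multiple substitutions at the same site become more likely.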

Nucleotide sequence accession numbers
The complete and partial nucleotide sequences of cagI and cagN genes from H. pylori strains determined in this study were deposited in the NCBI GenBank database under the accession numbers MG573078-MG573107 (cagI) and MG559675-MG559720 (cagN).

Statistical analysis
The statistical associations between H. pylori virulence genotypes and different clinical status were determined by the Chi-square and Fisher's exact tests. A two-sided P value of less than 0.05 was regarded as statistically significant. IBM SPSS Statistics for Windows version 21.0 (Armonk, NY: IBM Corp.) was used for all statistical analyses.
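For readers unfamiliar with Fisher's exact test, the following Python sketch computes a two-sided P value for a 2x2 table using only the standard library. It is an illustration, not the SPSS procedure used in the study; the function name and the example counts are hypothetical. The two-sided P value is taken here as the sum of the probabilities of all tables (with the same margins) that are no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)

# Hypothetical counts (not study data): variant present/absent in two groups.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

With the 0.05 threshold stated above, the hypothetical table in this example would not be considered statistically significant.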

Results
Demographic and clinical characteristics of patients
The median age of the patients was 45.6 years (ranging from 14 to 75 years). Of the study cohort, 32.8% (n = 23) were male and 67.1% (n = 47) were female. According to the endoscopic and histopathology findings, 39 (55.7%) patients were diagnosed with non-ulcer dyspepsia (NUD), 23 (32.8%) patients had peptic ulcer disease (PUD), 7 (10%) patients had intestinal metaplasia (IM), and one (1.4%) had gastric cancer. Three patients (4.3%) suffered from gastritis and duodenitis simultaneously. Table S2 shows the demographic characteristics and clinical status of the included subjects. In each of the 70 cases, H. pylori was isolated by culture and the isolates were confirmed by detection of the glmM and 16S rRNA genes.

cagI variants in patients with different clinical status
Out of 56 cagI-positive H. pylori strains, 30 were randomly selected for cagI sequencing. The full-length cagI gene was successfully sequenced in 27 H. pylori strains, whereas it was only partially sequenced in three strains owing to poor-quality sequence data or sequencing errors. According to our sequencing data, there was no insertion or deletion in the full-length cagI fragment from the 27 H. pylori strains studied, and sequence alignments were therefore straightforward. In addition, we performed in-frame translation of the cagI gene into amino acid sequences, and investigated the rates and locations of CagI variants. The distribution of amino acid polymorphisms in CagI of the H. pylori strains is presented in Figure S1 and Table 2. The most variable codon usage was observed at residues G20 to I25, Q55 to E60, G94, M181 to A199, K213 to T221, and Q241 to A268. As expected, the SKVIVK hexapeptide motif (376-381) located at the C-terminus of CagI was completely conserved among the sequenced H. pylori strains.

cagN variants in patients with different clinical status
For cagN sequence analysis, 46 of the 64 cagN-positive H. pylori strains were randomly selected for direct DNA sequencing. The complete cagN gene was successfully sequenced in 43 H. pylori strains, whereas the cagN gene fragments of three strains were only partially sequenced, for the same reasons as for the cagI gene. The cagN sequencing findings showed a high level of variability in CagN nucleotide and protein sequences. The most variable codon usage was observed at residues 53 to 63, here termed the CagN hypervariable motif (CagNHM). Moreover, a hypothetical hexapeptide (EAKDEN/K) was inserted at residues 278-283 in six H. pylori strains. Interestingly, this motif occurred twice in tandem in one of these clinical strains (EAKDENEAKDEN). Other insertions, of the amino acids KV and KN, were detected between residues 224-225 and 234-235, respectively, in one of the strains. The sequencing data analysis revealed that these insertions in the cagN gene caused no frameshift mutations compared to the P12 reference strain. Figure S2 and Table 3 show the distribution of amino acid polymorphisms of CagN among the 43 H. pylori strains in this study.
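Screening translated sequences for the EAKDEN/K hexapeptide, including tandem copies such as EAKDENEAKDEN, can be sketched with a regular expression. This is only an illustration under stated assumptions: the function name `find_motif_runs` and the toy sequences are hypothetical, and the study itself identified the motif from alignments against the P12 reference.

```python
import re

# One or more consecutive copies of EAKDEN or EAKDEK.
MOTIF = re.compile(r"(?:EAKDE[NK])+")

def find_motif_runs(protein):
    """Return (1-based start position, number of consecutive copies) for each run."""
    return [(m.start() + 1, len(m.group()) // 6) for m in MOTIF.finditer(protein)]

# Hypothetical toy sequences (not study data).
print(find_motif_runs("MSTEAKDENLL"))        # single copy
print(find_motif_runs("MSTEAKDENEAKDENLL"))  # two copies in tandem
print(find_motif_runs("MSTLL"))              # motif absent
```

Reporting the copy number alongside the start position makes it easy to distinguish a single insertion from the duplicated form observed in one clinical strain.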

Phylogenetic analysis of H. pylori CagI and CagN
The phylogenetic trees of cagI nucleotide and amino acid sequences from H. pylori isolates are illustrated in Figure 1 and Figure 2, respectively. In general, the DNA and amino acid sequences of CagI did not form characteristic clusters according to clinical status. Phylogenetic trees based on the CagN nucleotide and amino acid sequences were likewise reconstructed using the Maximum Likelihood method, and are illustrated in Figure 3 and Figure 4, respectively. Similar to the CagI sequences, the CagN phylogenetic analysis indicated no characteristic clusters with regard to clinical status.

Discussion
Virulent H. pylori strains harbor the cagPAI (cag+), which encodes a type IV secretion apparatus that has been shown to inject CagA, and possibly other virulence effectors, into infected gastric epithelial cells [31]. It has been well documented that cag+ H. pylori strains increase the risk of severe gastritis, peptic ulceration, atrophic gastritis, dysplasia, and gastric adenocarcinoma compared to strains that lack the cagPAI (cag-) [32-34]. Previously, CagI has been described to form a functional protein complex at the bacterial cell surface by interacting with CagL, another important component of the Cag secretion apparatus. Accordingly, some evidence suggested that CagI can interact with the CagL protein, allowing binding to integrin receptors on the target cell surface [8,17]. Both CagI and CagL contain an N-terminal signal peptide and are therefore presumed to be transported to the periplasm; however, the two proteins are not distributed equally on the bacterial cell surface [35]. In a differing view of CagI, Kumar et al. [36] found that CagI does not participate in CagA translocation from the cytoplasm to the bacterial cell surface. On the other hand, mutation of cagN has been reported not to interrupt CagA delivery or IL-8 secretion, and CagN-deficient H. pylori strains could cause an infection similar to that of wild-type H. pylori strains. Some experiments have also indicated that CagN is not conclusively required for H. pylori T4SS function [16]. In another study, conducted by Kutter et al., CagN was shown to interact with two other cagPAI proteins, CagV and CagY [35]. Thus, the biological function of CagN remains to be elucidated.
In the current study, we attempted to detect possible variants of CagI and CagN, as uncharacterized cagPAI-encoded factors, at both the nucleotide and amino acid sequence levels among H. pylori isolates in Iran. We also investigated the distribution of and variations in H. pylori virulence factors. Our findings revealed that 80% of H. pylori isolates harbored the cagI gene, whilst 91.4% of strains had the cagN gene. To the best of our knowledge, data on cagI and cagN variants in H. pylori isolates from patients with different gastroduodenal diseases are not available in the literature. Based on our molecular findings, the CagI E22, E221, and V268 amino acid polymorphisms occurred at a higher rate in H. pylori isolates from NUD individuals than in those from PUD patients. On the other hand, the CagI amino acid changes A23, S57, and S94 were detected at higher rates in H. pylori isolates from PUD patients than in those from NUD subjects.
Although Olbermann et al. reported that cagN and cagM are conserved in the cagPAI of all cag+ H. pylori strains sequenced so far [11], a high level of variability in CagN nucleotide and protein sequences was observed in the present study. Furthermore, the most variable region in the CagN amino acid sequence, here termed CagNHM, was found at residues 53 to 63 and contained many missense mutations. This region is postulated to contain the GDEEITEEEKK sequence in the P12 reference strain, but it varied among the strains sequenced in the current study.
Our findings revealed that there was no significant correlation between clinical outcomes and cagI and cagN variants at either the nucleotide or the amino acid level (P >0.05), which is consistent with a previous study reported by Ogawa et al. [25]. Pham et al. stated that the C-terminal motif (SKVIVK) in CagI is essential for T4SS function, and is thus completely conserved among H. pylori strains. Remarkably, the C-terminal motif of CagI is reported to be similar to the C-terminal motifs of CagL, SK(I/V)IVK, and CagH, TKIIVK, raising the possibility that these amino acid sequences act as binding motifs for a common interaction partner of all three proteins [17]. In agreement with that study, our findings also confirmed that the CagI C-terminal motif was completely conserved among all H. pylori isolates. Ogawa et al. observed complete RGD motifs in the CagL sequences of all isolates, which possibly implies the importance of the RGD motif for CagL function [25]. A recent investigation on this topic was performed by Yadegar et al. in Iran, in which almost 97% of H. pylori clinical strains contained the cagL gene [28]. Furthermore, their findings highlighted the importance of a common CagL hypervariable motif (CagLHM), such as NEIGQ, along with multiple C-type EPIYA repeats, which was linked to PUD, GE, and GC with more severity compared to NUD. In fact, it is believed that the aforementioned CagLHM motif plays a key role in the pathogenesis of H. pylori strains. In addition, sequencing analysis in the present study showed that a hypothetical hexapeptide motif (EAKDEN/K) was detected at residues 278-283 of CagN in 13.9% of H. pylori isolates. Although Bats et al. [37] implied that mutations and truncations in the CagN sequence were irrelevant to the folding properties or the overall shape of CagN, further studies are required to assess the impact of this hexapeptide motif on CagN protein structure and its role in H. pylori T4SS activity.
Despite the alterations in various cag sequences, it is notable that cagPAI genes, including cagI, cagN, cagA and cagL, were highly prevalent among all patients, which suggests a potential role for these genes in disease outcome.
In the present study, we also investigated the presence of various H. pylori virulence genotypes. In accordance with our previous studies in Iranian populations, we detected a high prevalence of the vacA s1 (77.1%) and vacA m2 (65.7%) allelic genotypes [38,39]. The vacA s1 allele has been reported to be associated with more severe atrophic gastritis in peptic ulcer patients [40,41]. In our study, the vacA s1 genotype was found to be more prevalent among PUD patients; however, there was no significant association between the presence of other virulence genes and clinical disease outcomes. The mosaic combination of s- and m-region allelic genotypes has also been established to be associated with the pathogenicity of H. pylori [42,43]. Accordingly, type s1m1 H. pylori strains express large amounts of VacA toxin and are strongly associated with a higher level of inflammation and mucosal ulceration, while vacA s1m2-harboring strains produce moderate amounts of toxin and vacA s2m2 strains are virtually non-toxic and rarely associated with clinical outcome [44]. A majority of H. pylori strains in the current study carried the vacA s1m2 genotype, which was mainly observed in NUD patients. In contrast, the s1m1 or s2m2 allelic combinations were detected among the majority of clinical isolates of H. pylori in other parts of the world, and the hypervirulent vacA s1m1 genotype was commonly associated with PUD patients [45]. Hence, it can be inferred that the correlation between H. pylori genotypes and the clinical outcome of patients varies between geographical regions.

Conclusion
In summary, a large body of evidence indicates that certain cagPAI components are correlated with the risk of gastric carcinogenesis. Here, we investigated the diversity of CagI and CagN sequences in clinical H. pylori isolates from Iranian patients with different clinical status. We detected several putative variants of CagI and CagN sequences in H. pylori isolates; however, there was no significant relationship between these variants and clinical phenotypes. Our findings also demonstrated that the C-terminal SKVIVK motif within the CagI protein is conserved among all tested H. pylori strains. Meanwhile, the EAKDEN motif was a typical attribute identified in the C-terminal sequence of the CagN protein in some of the H. pylori strains; however, its potential impact on T4SS activity and translocation of effectors requires further investigation. Although the present study has demonstrated the genetic diversity of the cagI and cagN genes, it is limited by its insufficient sample size. Accordingly, the possible effects of CagI and CagN variants on T4SS activity, as well as their possible interactions with other cagPAI components, need to be explored in a larger number of H. pylori isolates. The probable relevance of the aforementioned variants to different clinical outcomes should also not be ignored.

Figure 1. Phylogenetic tree of H. pylori clinical strains (n=27) based on cagI nucleotide sequences. Maximum likelihood tree of concatenated sequences was constructed using MEGA7 software with bootstrap method at 1000 replications. The evolutionary distances were computed using the Tamura 3-parameter model.

Figure 2. Phylogenetic tree of H. pylori clinical strains (n=27) based on translated CagI amino acid sequences. Maximum likelihood tree of concatenated sequences was constructed using MEGA7 software with bootstrap method at 1000 replications. The evolutionary distances were computed using the Poisson correction method.

Figure 4. Phylogenetic tree of H. pylori clinical strains (n=43) based on translated CagN amino acid sequences. Maximum likelihood tree of concatenated sequences was constructed using MEGA7 software with bootstrap method at 1000 replications. The evolutionary distances were computed using the Poisson correction method.