Molecular Phylogenetic Analysis of 16S rRNA Sequences Identified Two lineages of H. pylori Strains Detected from Different Regions in Sudan Suggestive of Differential Evolution

Background H. pylori is ubiquitous among humans, and one of the best studied examples of an intimate association between bacteria and humans. Under several diverse socio-demographic factors in Sudan, a continuous increase in the prevalence rate of H. pylori infection has been noticed which represents a major public health challenge. In this study, we analyzed the molecular evolution of H. pylori Strains detected from different ethnic and regions of Sudan using 16S rRNA gene and phylogenetic approach. Materials and methods A total of 75 gastric biopsies taken from patients who had been referred for endoscopy from different regions of Sudan. The DNA extraction was done by using the guanidine chloride method. Two sets of primers (universal and specific for H. pylori) were used to amplify the 16S ribosomal gene. Sanger sequencing was performed; then Blast these sequences with those available in the NCBI nucleotide database. The evolutionary aspects were analyzed using a MEGA7 software. Result Molecular detection of H. pylori has shown that 28 (37.33%) of patients were positive for H. pylori. Bivariate analysis has found no significant differences exhibited across sociodemographic, endoscopy series and H. pylori infection. Nucleotide variations were found at five nucleotide positions (positions 219, 305, 578, 741 and 763-764) and one insertion mutation (750_InsC_751) was present in sixty-seven percent (7/12) of our strains. The phylogenetic tree diverged into two lineages. Conclusion The phylogenetic analysis of 16S rRNA sequences identified two lineages of H. pylori strains detected from different regions in Sudan. Sex mutations were detected in regions of the 16S rRNA not closely associated with either tetracycline or tRNA binding sites. 66.67% of them were located in the central domain of 16S rRNA. Studying the effect of these mutations on the functions of 16S rRNA molecules in protein synthesis and antibiotic resistance is of great importance.


Introduction
H. pylori is ubiquitous among humans, (1) and one of the best studied examples of an intimate association between bacteria and humans. (2) It has infected human around 100,000 years ago (range 88,000-116,000). (1,3) and has co-evolved with human ancestors for approximately 58000 ± 3500 years, during their first migrations from east Africa. (1,4,5) And it largely escaped notice until it was cultured by Marshall and Warren. (6) In fact, H. pylori possesses several properties that helping this bacterium to persist for several decades, transmit from generation to generation and make an intimate association with its human host. (2,7) H. pylori infection is predominantly transmitted vertically from parent to child and between individuals in close contact such as in a family. (2,8) This close transmission pattern has resulted in a clear phylogeographic differentiation within these bacteria because of the local dispersion of single nucleotide polymorphisms by high rates of homologous recombination. (9,10) However, under poverty and inappropriate conditions, especially in the developing countries, H. pylori can be transmitted horizontally which makes infection with multiple strains probably more common than in the developed world. (11,12) In Sudan, the population is culturally, linguistically and ethnically diverse with more than 597 tribes who speak more than 400 dialects and languages. (13,14) The majority of the Sudanese population is rural with an urban population of just 33.2%, most of them are in Khartoum. (14,15) Sudan has severely suffered war, famine and flood in recent decades and has a large population of IDPs. (16,17) Although Sudan is rich in terms of natural and human resources, the effects of the civil conflict on health, nutrition, population, the economic and social development have undoubtedly been significant. These several diverse socio-demographic factors lead to continually increasing in the prevalence rate of H. pylori infection ranging from 48% to 65.8% which represents a major public health challenge. (18,19) However, studying the genetics of the H. pylori population has been of great interest due to its clinical and phylogeographic significance. (4) A number of studies have discovered the importance of evolutionary history to the clinical outcome of H. pylori infection; and indicated that the scramble of the relationship between bacterial and human ancestries at the individual level due to a consequence of migrations, invasions or racial admixture may moderate adverse outcomes for the host and disrupting the selection for a reduced virulence; and this may give some explanation to the continental enigmas of H. pylori. (12,(20)(21)(22)(23) Also, Phylogeny and Phylogeography of H. pylori strains are known to mirror human migration patterns (24)(25)(26)(27) and reflect significant demographic events in human prehistory. (25,28) Therefore, the phylogeographic pattern of H. pylori is a discriminatory tool to investigate human evolution and migration in addition to the traditional human genetic tools, e.g. mitochondrial DNA (mtDNA) and languages. (1) 16S rRNA gene amplification and sequencing have been extensively used for bacterial phylogeny and taxonomy; and eventually, the establishment of large public-domain databases. (29)(30)(31)(32) Several properties of the 16S rRNA gene make it the "Ultimate molecular chronometer", (33) and the most common housekeeping genetic marker; and hence a useful target for clinical identification and phylogeny. (34,35) These properties include: First, its present in all bacteria, often existing as a multigene family or operons, thus it is a universal target for bacterial identification. (32,35,36) Second, the function of 16S rRNA has not changed over a long period, so random sequence changes are more likely to reflect the microbial evolutionary change (phylogeny) than selected changes which may alter the molecule's function. (33) Finally, the 16S rRNA gene is large enough, approximately 1,500bp, for informatics purposes. (34,35,37) Most importantly that the 16S rRNA gene consists of approximately 50 functional domains and any introduction of selected changes in one domain does not greatly affect sequences in other domains i.e. less impact selected changes have on phylogenetic relationships. (35) Here in this study, two sets of primers (universal and specific for H. pylori) were used to amplify the 16S ribosomal gene directly from gastric endoscopic biopsy samples collected from dyspeptic patients who had been referred for endoscopy. Sanger sequencing was performed and by matching these sequences with those available in the National Center for Biotechnology Information (NCBI) nucleotide database, we analyzed the evolutionary aspects of Sudanese H. pylori strains using a phylogenetic approach. The novelty of our study resides in being the first study to characterize H. pylori Sudanese strains detected from different regions and ethnic of Sudan using the 16S rRNA gene.

Study design and Study sittings
A prospective cross-sectional hospital-based study was conducted in public and private hospitals in Khartoum state from June 2018 to September 2019. These hospitals receive patients from all over the regions of Sudan. Currently, there are 16 states in Sudan which divided between four geographic regions: Eastern, Western, Northern and Central Sudan which includes the largest metropolitan area, Khartoum (that includes Khartoum, Khartoum North and Omdurman), (38) see a map in Figure 3(a) for more illustration. The capital Khartoum is quickly growing and ranges between 6 and 7 million of populations, which includes approximately 2 million internally displaced persons (IDPs) from the southern war zone and the drought-affected areas in the west and east. (14,17) The hospitals include Ibin Sina specialized hospital, Soba teaching hospital, Modern Medical Centre and Al Faisal Specialized Hospital. All laboratory processes were performed in the Molecular biological lab at the Faculty of Medical Laboratory Sciences at the University of Khartoum.

Study population
The study population composed of 75 patients who had been referred for endoscopy and most of them because of dyspepsia. Structured questionnaires were provided for participants to obtain information about their socio-demographic characteristics. Patients were selected from those who were not taking antibiotics or NSAIDS. The patients gave written informed consent before they enrolled in the study. The diagnosis of gastroduodenal diseases was based on the investigation of an experienced gastroenterologist during the upper GI endoscopy procedure. While gastric cancer was diagnosed based on, in addition to gastroscopy, histology.

Sample collection
For DNA extraction purposes, gastric biopsies were collected in 1.5µl Eppendorf tubes with 400µl phosphate buffer saline (PBS), while for histological examination, the biopsies were transported in formalin. Then the biopsies were labeled and transported immediately to the laboratory for further processes.

DNA extraction
The DNA extraction was done by using the manual guanidine chloride method, as described by Alsadig et al. (39) Briefly, biopsies were washed with 400µl PBS and centrifuged for 5min at 100 rpm after each wash. Then, the samples were subjected to digestion by adding 400µl of WBCs lysis buffer, 200µl of 6M Guanidine chloride, 50µl of 7.5M Ammonium Acetate and 5µl of 20mg/µl proteinase K; and incubated at 37°C overnight. After that, On the following day samples were cooled down to room temperature then added 400µl of cooled pre-chilled Chloroform and centrifuged at 1000rpm for 5min. Then three layers were separated and the supernatant was collected to a new labeled Eppendorf. After that, 1ml of cooled pre-chilled absolute Ethanol was added and mixed gently back and forth quickly. Then samples were put in -20°C freezer overnight. After that, samples were subjected to quick vortex for one minute then centrifuged for 5min at 1000rpm; and the supernatants were discarded. Then washing with 70% ethanol was performed and the supernatant was drained with much care to avoid losing the DNA pellet at the bottom of the Eppendorf. Then the Eppendorf was inverted upside down of a tissue paper leaving the pellet to dry from alcohol for at least 2hours. Finally, the DNA pellet was re-suspended in 35µl of deionized water and was put into -20°C until use.

16S rRNA gene amplification
Extracted DNA was amplified for the universal 16S rRNA gene using the following primers: (primers: F:5'-AGAGTTTGATCCTGGCTCAG-3') (R:5'-CTACGGCTACCTTGTTACGA-3'). (40) PCR amplification was carried out with Maxime PCR PreMix Kit (i-Taq) (iNtRON BIOTECHNOLOGY, Seongnam, Korea) and a PCR thermocycler (SensoQuest, Germany). The PCR reaction mixtures contained 2.5U of i-Taq TM DNA polymerase (5U/µl), 2.5mM of each deoxynucleoside tri-phosphates (dNTPs), 1X of PCR reaction buffer (10X),1X of gel loading buffer and 1µl of DNA template. The temperature cycle for the PCR was carried out using a method described previously. (41,42) To detect the DNA, 3µl of each PCR products was loaded onto 2% agarose gels stained with 3µl ethidium bromide (10mg/ml) and subjected to electrophoresis in 1x Tris EDTA Buffer (TEB buffer) (89mM of Tris base, 89mM Boric acid and 2mM EDTA dissolved in 1Litter H 2 O) for 30 min at 120V and 50mA. The gel was visualized under UV light illumination. A 100MW DNA ladder (iNtRON BIOTECHNOLOGY, Seongnam, Korea) was used in each gel as a molecular size standard. The amplified product for the rrs gene is 1500bp.

Sequencing of H. pylori 16S rRNA gene
The amplified 16S rRNA gene (for the universal and specific) was purified and sequenced, using Sanger dideoxy sequencing method, commercially by Macrogen Inc, Korea.

Sequence analysis
The nucleotide sequence was visualized and analyzed by using the Finch TV program version 1.4.0. (44) The nucleotide Basic Local Alignment Search Tool (BLASTn; https://blast.ncbi.nlm.nih.gov/) was used for searching about the similarity with other sequences deposited in GenBank. (45) The 16S rRNA sequence was submitted in the GenBank nucleotide database under the following accession numbers: from MN845181 to MN845190 and from MN845952 to MN845954.

Molecular Phylogenetic Analysis
Highly similar sequences were retrieved from NCBI GenBank and subjected to multiple sequence alignment (MSA) using Clustal W2 (46) -BioEdit software. (47) Gblocks was used to eliminate poorly aligned positions and divergent regions of aligned sequences so the alignment becomes more suitable for phylogenetic analysis. (48,49) The Neighbor-Joining phylogenetic tree (50) of our 16S rRNA sequences with those obtained from the database was constructed using Jukes-Cantor (JC) model (51) from Substitution (ML) model. (52) The tree was replicated 1000 replicates in which the association with taxa clustered together in the bootstrap test. (53) Molecular Evolutionary Genetics Analysis Version 7.0 (MEGA7) was used to conduct evolutionary analyses. (54)

Statistical nalysis
Data were analyzed using GraphPad Prism 5. Regarding the prevalence of H. pylori infection, differences in frequency distribution by age were examined by the Mann-Whitney test. While bivariate analysis with a categorical variable was assessed by the χ 2 test or Fisher's test. The statistical significance level was determined at P<0.05.

Analysis of 16S rRNA sequences
12 sequences of 16S rRNA of H. pylori from Sudanese patients were analyzed for mutations and their conservative nature, and findings revealed diversity with few differences. The description of Sudanese H. pylori strains is shown in Table 3. The amplicon of universal 16S rRNA for patients 22 resulted in Acinetobacter radioresistens (Figure 1(c)). The patient 22 was male with 33 years and he is Mahasi from northern Sudan, and he was diagnosed with antral gastritis.
Regarding mutations, nucleotide variations were found at five nucleotide positions (positions 219, 305, 578, 741 and 763-764) and one insertion mutation (750_InsC_751) was present in sixty-seven percent (7/12) of our strains, the numbering was according to the E.coli sequence). (55) Three strains (17, 21 and 39) revealed a novel transition C→T at position 219. A triple base pair substitution (GCC763-764CAA) was detected in strain 17, 23 and 39. However, a cancerous strain 101 had two G → A transitions at positions 741 and 765. Strain 44, which was detected in a patient with normal gastric finding and suspected celiac disease and lives in Khartoum, had also a novel C→T transversion at nucleoside position 578. Also strain 39, which was found in a patient from west of Sudan (Eastern Darfur) and was diagnosed with antral gastritis, revealed a novel G→A transition at positions 305. See Figure 2 for more illustration.

Phylogenetic analysis
The phylogenetic tree diverged into two lineages. In the lineage one, all the Sudanese H. pylori strain clustered with strains from different countries. The 16S rRNA sequence of strain 44, although clustered with other global strains in one clade, had a novel C→T transversion at nucleotide position 578. However, cancerous strain 101 shared a common ancestor with strains from the USA (AF535198) and Venezuela (DQ829805); and represented with them a separated clade with a bootstrap value of 57%. A novel single nucleotide variation (G→A) at position 305 of the 16S rRNA of strain 23 made it outgroup as a kind of strain evolution with a bootstrap value of 61%.
In the second lineage, strains 17, 23 and 39 were closely related to strains from Hungary (KF297893 and KF297892). Strain 17 and Strain 39, from Khartoum and Eastern Darfur, respectively, were sisters with a bootstrap value of 75%, as presented in Figure 3.

Discussion
In this study, the phylogenetic analysis of 16S rRNA sequences identified two lineages of H. pylori strains detected from different regions in Sudan which suggested differential evolution. This finding is in agreement with a number of studies that found an unusually high degree of genetic diversity in genomic sequence analyses of H. pylori strains in connection with their house-keeping and virulence associated genes. (25,56,57) Strain17, Strain21 and Strain39 were derived from one lineage along with two strains from Hungary, with a bootstrap value of 61%. They shared a double base pair substitution (GC763-764CA) and one insertion mutation (751_InsC_752). However, both mutations were located in the central domain of the 16S rRNA gene. Despite Sudanese strains were from different ethnicity and regions of Sudan but they all caused antral gastritis, while the Hungarian strains (KF297892 and KF297893) were isolated from Demodex mites and caused rosacea. (58) Laboratory diagnosis of H. pylori by conventional cultural methodology and phenotypic identification tests is still less specific, time consuming, increased capital costs, needs of highly skilled personnel and also it difficult to diagnose re-infections. (59) However, molecular tests and PCR technology have been raised as a useful alternative means of bacterial identification, which may circumvent some of these difficulties (59,60) but has limited mainly by contamination and inadequate sensitivity issues. (61) (18,19,(62)(63)(64)(65) Tetracycline is one of the 30S-targeting antibiotics that inhibit translation elongation by sterically interfering with the binding of aminoacyl-tRNA to the A-site of the ribosome. (66) Therefore, mutations located in two domains (III and IV) in 16S rRNA: helix 34 and the loop next to helix 31 (67) can affect the conformation of the tetracycline-binding site, leading to high-level resistance. (66) In this study, six mutations were detected in two domains (I and II) which are in regions of the 16S rRNA not closely associated with either tetracycline or tRNA binding sites. This finding is partially in agreement with a study conducted by Catharine et. al., that found six nucleotide changes in two tetracyclineresistance strains. Two of these changes were located in domain I and domain II, G360A and deletion of G771, respectively. (68) However, we observed that 66.67% of nucleotide variations were located in the central domain of 16S rRNA (nucleotides 567-915) with five associated ribosomal proteins (S15, S6 and S18) folds into the platform of the small subunit. (69)(70)(71)(72) Moreover, deleterious mutations (C18G, A55G, A161G, A373G, G521A, C614A, A622G, and deletion of one A in a triple-A cluster 607-609) in domains I and II which were suggested by Aymen et. al., in purposes of revealing covered putative functional regions of the ribosome and aiding in the development of new antibiotics, (73) were found conserved in our strains, as illustrated in Figure 2(b).
Interestingly, the insertion mutation (InsC), that was detected in sixty-seven percent (7/12) of our strains, between two nucleotides (750 and 751) located at the lower portion of helix 22 (H22), which is one of the lower three-helix junctions (3HJ) of 16S rRNA that bind with S15. (74,75) Also, the mutation in cancerous strain 101 (G → A transition at positions 741) was located in the upper portion of helix 22 (H22). The molecular dynamic (MD) simulations of Wen et. al., showed that 16S rRNA and S15 bind across the major groove of H22 via electrostatic interactions, i.e. the negatively charged phosphate groups of G658, U740, G741 and G742 bind to the positively charged S15 residues Lys7, Arg34 and Arg37. (74) However, studying the effect of these mutations on the functions of 16S rRNA molecules in protein synthesis and antibiotic resistance is of great importance especially for essential regions like the central domain. (76) For example, Prescott et. al., in 1990 found that the presence of substitution from a C to G at position 726 induces the synthesis of heat shock protein and affects the expression levels of various proteins. (77) Also, mutations in the sequence 911-915 were conferred Streptomycin-resistance by impaired the binding of this antibiotic. (78)(79)(80) We acknowledge that limitations of the study are the small sample size and the phylogenetic tree was built based on the 16S rRNA gene only which is unable to cover more complex evolutionary events and distinguish between closely related strains or species. (14) Therefore, further studies with a large sample size and building a phylogeny by increasing the number of genes analyzed like MLST.

In conclusion
The phylogenetic analysis of 16S rRNA sequences identified two lineages of H. pylori strains detected from different regions in Sudan. Sex mutations were detected in regions of the 16S rRNA not closely associated with either tetracycline or tRNA binding sites. 66.67% of them were located in the central domain of 16S rRNA. Studying the effect of these mutations on the functions of 16S rRNA molecules in protein synthesis and antibiotic resistance is of great importance.

• Ethical approval and consent to participate
Approval to conduct this study was obtained from the Khartoum ministry of the health research department, University of Khartoum and Research Ethics Committees of hospitals. The patients gave written informed consent before they enrolled in the study.

• Consent for publication
All authors consent for publication

• Availability of data and materials
All data generated or analyzed during this study are included in this published article • Conflict of interest: