Evolutionary genomics and population structure of Entamoeba histolytica

Amoebiasis caused by the gastrointestinal parasite Entamoeba histolytica has diverse disease outcomes. Studying the genome and evolution of this parasite will help us understand the basis of its virulence and explain why, when and how it causes disease. In this review, we summarize current knowledge of the evolutionary genomics of E. histolytica and discuss its association with parasite phenotypes and differential pathogenic behavior. We also discuss how genetic diversity reveals parasite population structure and highlight open questions concerning the evolution and population structure of the parasite. The large and growing body of genomic data will improve our knowledge of this pathogenic species of Entamoeba.


Introduction
Amoebiasis caused by the gastrointestinal parasite Entamoeba histolytica is one of the major parasitic diseases after malaria and is responsible for approximately 100,000 human deaths per annum [1]. The parasite has a two-stage life cycle that alternates between an infective cyst form and a motile, pathogenic trophozoite form. Infection is endemic in many developing countries where poor sanitation and malnutrition are common. In some developed countries, infection can be restricted to a particular population (for example, homosexual men in Japan) [2,3]. The global prevalence of infection, estimated in 1986, suggested that 10% of the world population was infected by this parasite [4]. E. histolytica infection has variable disease outcomes: 90% of infected individuals remain asymptomatic, while only 10% develop symptoms of invasive amoebiasis [5,6]. However, the global prevalence was estimated before E. histolytica was differentiated from its non-pathogenic sibling Entamoeba dispar in 1993 [7]. Regardless of this epidemiological revision, invasive amoebiasis is still a relatively rare outcome of E. histolytica infection. The specific determinants of the diverse outcomes of this infection remain obscure; host genetics and parasite genotype are two possible factors [8,9]. The search for parasite genetic traits that are directly linked to virulence or associated with disease outcome motivates a substantial area of Entamoeba research. Intra- and inter-specific genomic comparisons have been conducted to identify parasite genetic factors linked to virulence or associated with differential disease-causing abilities [10][11][12][13]. These studies also provide valuable information concerning the evolution and population structure of this parasite. In this review, we discuss recent information concerning the evolutionary genomics of E. histolytica and its association with parasite phenotype and virulence. We also discuss how parasite population structure is revealed by genetic diversity and emphasize open questions related to the evolution and population structure of the parasite.

Whole-genome sequences of Entamoeba species
Several species of Entamoeba infect a wide range of hosts [14]. Simple morphological characteristics, such as the number of nuclei per cyst, have been used to distinguish between species [15]. However, morphological variations do not always reflect species-level differences, and significant genetic diversity exists among morphologically indistinguishable organisms [15]. Some species, like the oral parasite Entamoeba gingivalis, do not produce cysts [14]. Phylogenetic analysis of SSU rRNA gene sequences of Entamoeba species suggested that E. dispar, Entamoeba nuttalli and Entamoeba moshkovskii are closely related to E. histolytica, while Entamoeba invadens and Entamoeba coli are distantly related [15]. E. dispar, which is morphologically identical to E. histolytica, is usually considered an avirulent commensal of the human gut [14]. However, a recent study suggested that a certain strain of E. dispar (ICB-ADO), isolated from a Brazilian patient, can cause amoebic liver abscess (ALA) in hamsters [16]. E. moshkovskii is microscopically indistinguishable from E. histolytica and E. dispar in its cyst and trophozoite forms. It was initially thought to be a free-living protozoan species [17], but a recent study suggested that E. moshkovskii infects humans and causes diarrhea and colitis in infants [17]. E. dispar infection is, in general, much more common than E. histolytica infection worldwide [18]. Since the worldwide prevalence of E. histolytica infection [4] was estimated before the genetic discrimination of E. histolytica from E. dispar, the prevalence value may be erroneous, and E. dispar could be a potential contributor to the prevalence figures in endemic areas [19]. E. moshkovskii is found more frequently in regions where amoebiasis shows high prevalence [19,20]. Entamoeba bangladeshi, recently discovered in Bangladesh, clearly grouped with the clade of Entamoeba infecting humans, including E. histolytica [21]. E. invadens is a reptilian parasite and an important model for the encystation process. E. invadens can be induced to encyst in axenic laboratory culture, while encystation has not yet been achieved in axenically grown E. histolytica trophozoites [14].
The genome sequence of E. histolytica strain HM1:IMSS was published and analyzed in 2005 [22][23][24]. The genome assembly contains 20,800,560 bp of DNA in 1496 scaffolds. The genome has a high AT content (approximately 75%). Approximately half of the assembled sequence is predicted to be coding, with 8333 annotated genes [14]. The genome assembly of E. dispar strain SAW760 is of a similar size to that of E. histolytica strain HM1:IMSS. It consists of 22,955,291 bp of DNA in 3312 scaffolds. Its AT content (approximately 76.5%) is also quite similar to that of E. histolytica strain HM1:IMSS. About 50% of the assembled sequence is predicted to be coding, with 8749 annotated genes [14]. Genome assembly of E. invadens strain IP1 appears to be larger than that of E. histolytica strain HM1:IMSS

Structure and organization of genome
The structure of the E. histolytica genome has been extensively reviewed by Clark et al. [24], who highlighted many interesting evolutionary features. E. histolytica has gained a significant number of metabolic genes (at least 68) through horizontal gene transfer from bacteria [14,22,24]. Orthologues of these genes are found in both E. histolytica and the evolutionarily distant species E. invadens [15], which indicates that the gene transfer is ancient [14].
The haploid genome of E. histolytica strain HK9 is 3 × 10^7 bp in size, based on renaturation kinetics experiments [26]. Hybridization of gene markers to pulsed-field gels identified 14 linkage groups with 1-4 chromosomes per linkage group per nucleus [27]. Each nucleus of the tetra-nucleated E. histolytica cyst must contain at least one to two genome copies (1n-2n) [28]. However, karyotype analysis of E. histolytica trophozoites revealed the presence of at least 4 functional copies of many structural genes, and therefore probably a ploidy that is a multiple of four [28]. Ploidy can vary even within a cell lineage under different growth conditions [28]. However, this phenomenon has only been studied in vitro, and whether it occurs in nature is not known. The rRNA genes occur on circular DNA molecules that exist in multiple copies per nucleus [29]. These circular structures could be important for determining parasite phenotypes. The rDNA episome varies in size from 15 kb to 25 kb depending on the E. histolytica strain. The rDNA episome of the virulent E. histolytica strain HM1:IMSS has two rDNA units per circle, while that of the avirulent strain Rahman has only a single rDNA unit [30]. Moreover, Jansson et al. reported that structural genes for hemolysins are present within the ribosomal RNA repeat on an extra-chromosomal DNA element of E. histolytica [31].
Initial characterization of the E. histolytica genome revealed some unusual features of its organization. The genome is highly repetitive (about 40% of the sequence is assigned to repetitive elements). Among these elements, tRNA genes are exceptionally abundant, with an estimated 4500 copies (about 10 times the number in the human genome). Moreover, most of these tRNA genes are clustered and organized into 25 distinct arrays. The tRNA arrays are composed of tandemly repeated units encoding between 1 and 5 tRNA acceptor types [32]. The intergenic regions of these tRNA genes comprise short tandemly repeated sequences (STRs), which resemble the micro/minisatellites of eukaryotic genomes. The only difference is that, unlike randomly dispersed micro/minisatellites, the STRs form part of a larger unit which is itself tandemly arrayed [32]. tRNA genes are thought to be "hotspots" for recombination and mutation due to their unique structural organization [32]. The arrangement of tRNA genes shows inter-specific variation. E. histolytica has 2 versions of the tRNA array containing the Asn^GTT and Lys^CTT genes [i.e. (N-K1) and (N-K2)], while the E. dispar genome contains only 1 type of [N-K] array. E. moshkovskii array units are significantly smaller than their homologs in E. histolytica and E. dispar, and their intergenic regions do not contain any STRs [32]. The STR regions between these tRNA array units show a high degree of intra-specific variation in repeat number, type and arrangement pattern [13]. These features make them very useful as population genetic markers for quantifying the evolutionary divergence of this parasite. The only proposed function of the tRNA array units is nuclear matrix binding [33]. Moreover, circumstantial evidence suggests that they may be located in subtelomeric regions or at chromosome ends and could be functional replacements for traditional telomere repeats [32].
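The idea that STR genotypes differ in repeat number can be illustrated with a minimal Python sketch. It scans a sequence for perfect tandem repeats and reports the repeat unit and copy number. The function name, the thresholds and the repeat unit "TATTAG" are hypothetical and chosen only for illustration; they are not taken from any Entamoeba locus, and real STR typing also has to handle imperfect repeats and mixed repeat types.

```python
def find_tandem_repeats(seq, min_unit=2, max_unit=12, min_copies=3):
    """Return a list of (start, unit, copies) for perfect tandem repeats.

    Hypothetical helper for illustration: only exact, back-to-back copies
    of a unit are counted; imperfect repeats are ignored.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (unit, copies) covering the longest span from position i
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit of this length
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                if best is None or copies * k > best[1] * len(best[0]):
                    best = (unit, copies)
        if best is not None:
            hits.append((i, best[0], best[1]))
            i += len(best[0]) * best[1]  # jump past the reported repeat
        else:
            i += 1
    return hits


# Two invented "isolates" that differ only in copy number of the same unit.
isolate_a = "AC" + "TATTAG" * 4 + "CC"
isolate_b = "AC" + "TATTAG" * 6 + "CC"
print(find_tandem_repeats(isolate_a))
print(find_tandem_repeats(isolate_b))
```

In this toy case the two isolates would be assigned different genotypes at the locus because the same unit is repeated four times in one and six times in the other.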

Genomic rearrangements and transposable elements
Unlike Plasmodium, which has a stable genomic organization even among distantly related species, Entamoeba exhibits a high degree of genomic plasticity and instability [14]. Genome rearrangement associated with tissue invasion and organ tropism has been reported as one possible explanation for the different tRNA STR genotypes identified in liver abscess-derived and stool-derived parasites from the same infected person [34]. Transposons and repetitive DNA, which are abundant in the Entamoeba genome, may be responsible for genome reorganization [14]. Transposable elements are organized in clusters and are frequently found at syntenic break points, providing insights into their contribution to chromosome instability and, therefore, to genomic variation and speciation in these parasites [35]. Investigation of repetitive elements within the genomes of three Entamoeba species identified hundreds of copies of LINEs (long interspersed elements) and SINEs (short interspersed elements) and a large proportion of Entamoeba-specific repeats (ERE1 and ERE2). ERE1 is spread across the three genomes and is associated with different repeats in a species-specific manner [35]. The ERE2 sequence is present exclusively in E. histolytica [14]. LINEs and SINEs are class I transposons, propagated by reverse transcription [36]. Each EhLINE (LINE of E. histolytica) has a single open reading frame with a putative nucleic acid binding motif (CCHC) and a restriction enzyme-like endonuclease domain located downstream of the reverse transcriptase (RT) domain. Phylogenetic analysis of the RT domain placed the EhLINEs in the R4 clade of non-LTR elements, a mixed clade that includes members from nematodes, insects, and vertebrates [36]. EhLINEs share a common 3′ end sequence with EhSINEs (SINEs of E. histolytica), which indicates that they are involved in the retrotransposition of EhSINEs. EhSINEs also have a conserved 5′ end, which is involved in the regulation of their transcription [36]. 
A genome-wide comparison based on the location of LINEs and SINEs in the E. histolytica and E. dispar genomes suggested that SINE expansion took place after the divergence of the two species. However, the basic retrotransposition machinery is conserved in these two species [37]. Since LINEs and SINEs can profoundly influence the expression of neighboring genes, their genomic location can affect parasite phenotype [37]. Moreover, a recent study by Yadav et al. [38] suggested that E. histolytica can form recombinant SINEs at high frequency during induced retrotransposition in vivo. DNA transposons (class II transposons) are rare in E. histolytica and E. dispar, but are much more prevalent in E. invadens and E. moshkovskii [14]. Representatives of three DNA transposase superfamilies (hobo/Activator/Tam3, Mutator, and piggyBac) were identified in Entamoeba, in addition to a variety of members of a fourth superfamily (Tc1/mariner), previously reported only from ciliates and Trichomonas vaginalis among protozoans [39]. Genomic rearrangement might be responsible for the variation in the number of transposable elements in different lineages [14].

Large gene families and their diversities
The genome of E. histolytica contains a number of large multi-gene families [14]. One such gene family encodes a group of AIG1-like proteins [23]. The AIG1 protein family comprises 29 members distributed in 3 clusters [23]. Eighteen of them are located near transposons, but whether their duplication and subsequent expansion are encouraged by the proximity of transposons remains to be explored [23]. AIG1 proteins are associated with resistance to bacteria [40]. Another gene family encodes a group of leucine-rich repeat (LRR) containing proteins, homologous to a bacterial fibronectin-binding protein (BspA of Bacteroides forsythus) [41,23]. Lorenzi et al. identified 114 genes encoding BspA-like proteins in the genome of E. histolytica strain HM1:IMSS, 41 of which are associated with transposable elements [23]. Proteins of this family contain a conserved N-terminal domain. However, no classic membrane-targeting signal is present in the proteins [23]. Hence, it is tempting to speculate that the conserved N-terminal domain functions as an export signal or a membrane-anchor domain, or that export involves a non-classical transport mechanism, independent of the ER-Golgi pathway, similar to those detected in yeast and mammalian cells [42]. At least one member of this family is expressed at the external surface of the parasite [41]. A genome survey of E. invadens identified multiple copies of these LRR-containing genes, and differential gene expression within gene families has also been reported [43]. However, it is unknown whether gene expression is controlled in such a way that a single member of a gene family is expressed at any one time, as observed in other parasites like Trypanosoma and Plasmodium [14].
Entamoeba also encodes a large number of Rab GTPases (like another protozoan parasite, T. vaginalis), which are involved in vesicular trafficking in the cell [44,45]. A total of 102 Rab GTPases distributed in 16 subfamilies have been annotated in the genome of E. histolytica [44,45]. The majority of them show moderate similarity to Rab proteins from other organisms, while only 22 amoebic Rab proteins, including EhRab1, EhRab2, EhRab5, EhRab7, EhRab8, EhRab11, and EhRab21, show significant similarity to Rab proteins from other organisms [44]. Like E. histolytica, E. invadens has over 100 Rab genes [45]. A comparison of Rab GTPases from E. histolytica and E. invadens revealed that most Rab subfamilies are conserved between these two Entamoeba species [45]. This indicates that the Rab GTPase-controlled vesicular trafficking machinery is well conserved between them and that expansion of the gene family largely occurred before the divergence of the two species [45]. Rab GTPases are involved in the regulation of cysteine protease secretion and transport [46,47]. E. histolytica differentially expresses its RabB protein (EhRabB) during phagocytosis of target cells, suggesting a potential role for EhRabB in the phagocytosis process [48]. EhRabB has been mutated experimentally at amino acid position 118, and the resulting protein (RabBN118I) was unable to bind guanine nucleotide and became constitutively inactive [49]. Over-expression of this mutated RabB protein in E. histolytica trophozoites resulted in a significant reduction of parasite phagocytosis, cytopathic activity and ability to produce liver abscess in hamsters [49]. Hence, Rab-regulated vesicular trafficking is important for parasite biology and pathogenesis. Gene families encoding the heavy (hgl) and light (lgl) chain subunits of the virulence determinant Gal/GalNAc lectin are present in multiple Entamoeba species, but genes for the intermediate chain subunit (igl) have only been detected in E. histolytica and E. dispar [24]. 
Bioinformatic comparison of members of this gene family from E. histolytica and E. dispar identified evidence of gene conversion within the lineages, which may play an important role in the molecular evolution of these parasites [50]. Cysteine protease-5, the key virulence factor of E. histolytica, is present as a pseudogene in E. dispar [14]. Over-expression of specific cysteine protease genes (ehcp-b8, ehcp-b9 and ehcp-c13) within parasite cells confers pathogenicity on the non-pathogenic E. histolytica clone A1 [51]. Southern blot analysis indicates that the ariel surface proteins of E. histolytica are either absent or highly divergent in E. dispar [14].

Genetic diversity and population structure
Since the E. histolytica genome does not appear to contain any microsatellite-like elements, measurement of genetic diversity and estimation of population structure rely greatly on other genetic markers, such as the serine-rich E. histolytica protein (SREHP) gene and the chitinase gene [14]. SREHP is an immunodominant surface antigen involved in phagocytosis of apoptotic host cells to prevent inflammatory responses by the host [52], whereas chitinase is only expressed during encystation of the amoeba [53]. Both genes contain tandem repeats which show a high degree of inter-isolate diversity in their repeat types and arrangement patterns [2,3,54]. However, the SREHP gene shows a higher degree of polymorphism than chitinase [3]. Since SREHP is highly immunogenic, such high genetic diversity within the SREHP gene may suggest that it has a biological role such as immune evasion [55]. However, PCR amplification of the SREHP gene often produces multiple and mixed PCR bands from a single strain due to allelic variation [18]. Direct sequencing of such mixed PCR products (without cloning the PCR product into a vector prior to sequencing) gives rise to a chromatogram showing multiple peaks at a single nucleotide position. Multiple variants of a single sequence can be read from such a chromatogram, and this can be misinterpreted as genetic diversity. The tRNA-linked STR loci of E. histolytica have proved to be useful population genetic markers and have been used to identify parasite genotypes associated with different disease outcomes [56]. Studies of genetic diversity based on 6 tRNA-linked STR loci (i.e. D-A, S^TGA-D, N-K2, R-R, A-L and S-Q) have identified a few parasite genotypes associated with disease outcomes [8,13,57,12,58]. For example, the 5RR genotype of the R-R locus was associated with an asymptomatic outcome, while 10RR was associated with a symptomatic outcome [59]. J1DA and VEN2DA of the D-A locus were associated with asymptomatic and symptomatic outcomes, respectively [60]. Even though tRNA-linked STR loci show a few associations with disease outcomes, they are only surrogate markers, and their variations are not directly linked to parasite virulence [59]. Moreover, these loci mutate frequently to form new genotypes, and hence any significant association of parasite genotype with disease outcome would be lost over time [18].
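A genotype-outcome association of the kind described above is typically judged on a 2 × 2 table (genotype present or absent against asymptomatic or symptomatic outcome). The following minimal sketch, assuming only the Python standard library, computes a two-sided Fisher's exact test for such a table. The counts are invented for illustration and are not data from the cited studies, which may have used other statistical methods.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    The p-value is the total hypergeometric probability of all tables with
    the same margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows = genotype present / absent,
# columns = asymptomatic / symptomatic isolates.
p_value = fisher_exact_two_sided(9, 2, 3, 10)
print(f"two-sided p = {p_value:.4f}")
```

A small p-value in such a table would indicate a statistical association only; as noted above, it would not show that the STR variant itself is linked to virulence.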
However, patterns of polymorphism within these repetitive DNA sequences sometimes reflect the population structure of the parasite [14]. For example, in Japan, diversity among the parasite population infecting homosexual men was high, while diversity was much more limited among parasites infecting residents of institutions [2]. Similarly, low diversity among parasite populations infecting residents of institutions was seen in the Philippines, where clear population structure was observed within and between locations [54]. In South Africa, genotypes clustered within households but showed extensive diversity among different households [61]. Recently, Zermeno et al. proposed a worldwide genealogy and population structure of E. histolytica based on two tRNA-linked STR loci (i.e. D-A and N-K2) [60]. The majority of these genotypes were found to be exclusive to a particular country, and only a few were shared by isolates from different countries. For example, 18NK, 17NK, 10NK, and 11NK of the N-K2 locus and 5DA and 6DA of the D-A locus were the only genotypes distributed in many regions. Among them, 18NK and 6DA, corresponding to the genotype of E. histolytica strain HM1:IMSS, were the most abundant and were widely distributed in many countries, including Mexico, Bangladesh, Japan, China and the USA. However, genealogies based on these two individual loci (i.e. D-A and N-K2) suggested that no parasite lineages were related to a particular geographic region. Moreover, concatenated analysis of the two tRNA-linked STR loci (i.e. D-A and N-K2) revealed the possibility of genetic recombination among the population studied [60]. The genetic organization of E. histolytica populations from stool and liver abscess samples of the same patients has also been studied [34]. The study revealed that E. histolytica populations from stool and liver abscess samples were genetically distinct [34]. However, a few contrasting but interesting scenarios have also been reported. An E. histolytica population isolated from amoebic liver abscess (ALA) patients was genetically identical to that isolated from asymptomatic patients [57]. This finding was further supported by a recent STR loci-based genotyping study of E. histolytica from India, in which isolates that remained asymptomatic were genetically closer to those causing liver abscess than to the diarrheal isolates (Fig. 1) [12]. Repetitive DNA markers appear to be stable enough to link closely related parasites recently transmitted among members of a household, an institution or recent sexual partners [14]. However, extensive population diversity in limited geographic regions and the frequent occurrence of novel genotypes limit the usefulness of repetitive loci for probing the large-scale, long-term population structure of E. histolytica [14]. SNP (single nucleotide polymorphism) markers may be preferable in these situations.
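Statements such as "high" or "limited" diversity can be made quantitative with a simple summary statistic. As a minimal sketch, the function below computes Nei's unbiased gene diversity, H = n/(n-1) × (1 - Σ p_i²), from a list of genotype calls at one locus. The genotype names are borrowed from the N-K2 locus described above, but the sample compositions are hypothetical, and the cited studies may have used different measures.

```python
from collections import Counter


def gene_diversity(genotypes):
    """Nei's unbiased gene diversity for a list of genotype labels.

    Returns 0.0 when every isolate carries the same genotype and 1.0 when
    every isolate carries a different one.
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("at least two isolates are needed")
    freqs = [count / n for count in Counter(genotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


# Hypothetical samples: one closed group sharing a single genotype and
# one broader sample with several genotypes.
closed_group = ["18NK"] * 6
broad_sample = ["18NK", "17NK", "10NK", "11NK", "18NK", "17NK"]

print(gene_diversity(closed_group))
print(round(gene_diversity(broad_sample), 3))
```

Comparing the two values gives a compact way to express the contrast between a recently transmitted, near-clonal group and a more diverse population.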
SNPs within non-repetitive loci arising under neutral, positive and negative selection pressure are genetically stable and are inherited by descendants [11]. SNP analysis could be a successful strategy for identifying potential virulence markers of the parasite linked to infection outcome [11,62]. Comparison of the genome sequences of various E. histolytica strains deposited in the AmoebaDB database version 4.1 [25, www.amoebadb.org] has identified a total of 2613 genes that contain intra-species SNPs. Most of the proteins encoded by these genes are hypothetical, while the functions of some are known. A few such genes with known and hypothetical functions are listed in Table 1. A large number of SNPs have been identified in the genes for serine threonine isoleucine rich protein (EHI_073630), Gal/GalNAc lectin lgl2 (EHI_065330), heat shock protein 70 (EHI_159140), tyrosine kinase (EHI_124500), an AIG1 family protein (EHI_144270), a Rab family GTPase (EHI_059670), etc. Gal/GalNAc lectin is a surface antigen of Entamoeba and is involved in parasite adhesion to the intestinal epithelium [50]. AIG1 proteins are associated with resistance to bacteria [40]. Rab GTPases are involved in the vesicular trafficking machinery of the parasite [44,45]. However, further investigation is required to determine the precise function of the hypothetical proteins listed in Table 1. Homologs of some of these genes are also found in the AmoebaDB database. A few genes of E. histolytica and their homologs are listed in Table 2. KERP1 is a protein of E. histolytica that has been shown to be involved in parasite adherence to human enterocytes. It is also an important virulence factor in liver abscess pathogenesis [63,64]. The kerp1 gene has been found in both E. histolytica (AmoebaDB i.d. EHI_098210) and E. nuttalli (AmoebaDB i.d. ENU1_189420) but not in E. dispar [64,65]. 
Analysis of the AmoebaDB database version 4.1 [25, www.amoebadb.org] revealed inter-species genetic variability in the kerp1 gene between E. histolytica and E. nuttalli. A total of 10 SNPs were identified within the E. nuttalli kerp1 sequence (ENU1_189420) in comparison to that of E. histolytica (EHI_098210). However, no intra-species genetic variability has been observed within the gene (EHI_098210). Genome sequencing of E. histolytica clinical isolates has also identified SNPs within the cyclicin-2 gene that are significantly associated with asymptomatic and liver abscess outcomes [62]. This indicates that cyclicin-2 could be an important  [66]. Sequence analysis of defined regions also suggests similar observations [11,67]. Such a low level of genetic diversity suggests a relatively recent common ancestor for E. histolytica [14]. However, this observation is at odds with a recent report by Gilchrist et al. [62]. Multilocus sequence typing of E. histolytica clinical isolates identified extensive population diversity, suggesting that the genotypes of individual parasites do not contain consistent phylogenetic signals. The authors attributed this result to genetic recombination events, since recombination can break down the linkage between target loci and produce loci with different genealogies [62]. Hence, an important question regarding the population structure of Entamoeba is whether the parasite populations are predominantly clonal or sexual. Sexual reproduction can help parasites improve the fitness of their progeny [68]. Parasitic protists are continuously exposed to exogenous environmental factors and host immune pressure, which can alter the chemical structure and stability of their genome [68]. Parasites must repair structural alterations in their genome, since these can lead to mutations, deletions, insertions, translocations and loss of essential genetic information [68]. 
Parasites remove DNA damage through recombinational DNA repair, which allows greater survival of offspring with undamaged DNA [68]. Recombination is also an important mechanism for generating the genetic diversity that parasites use to evade the host immune response [68]. This feature is important, since sexual reproduction can exchange genes responsible for drug resistance and parasite virulence. This could generate selectively advantageous genotypes that spread very rapidly through the host population [14]. Sexual reproduction can also help in the removal of deleterious genes. Deleterious mutations brought together by sexual reproduction create unfit individuals that are eliminated from the population [68]. The genome of E. histolytica contains meiotic genes like SPO11, DMC1, and MND1 and many homologous recombination (HR) specific genes like MLH1, MSH2, RAD21 and RAD51 [22,69,68]. Moreover, ploidy changes and unscheduled gene amplification, which indicate the possibility of recombination, have also been reported in Entamoeba [68]. E. histolytica contains a large number of retrotransposons in its genome, which also indicates an ability to reproduce by sexual means [68]. Organisms which reproduce solely by asexual means would eventually lose these retrotransposons from their genome [68]. Moreover, Singh et al. recently provided the first direct demonstration of HR in Entamoeba using a construct with inverted repeats, which upon recombination results in sequence inversion. An increased rate of genetic recombination has been reported in Entamoeba under stress conditions and during the encystation process [68]. Stage inter-conversion between cyst and trophozoite is crucial for disease transmission and pathogenesis in E. histolytica [68]. In addition, some indirect evidence of genetic recombination has been identified in Entamoeba through population genetic studies. Complete genome sequencing of 10 axenic E. histolytica cell lines identified patterns of polymorphism indicating that recombination has occurred in the history of the population studied [66]. A concatenated genealogy based on repetitive loci (i.e. D-A and N-K2) also revealed the possibility of genetic recombination among E. histolytica populations [60]. Bioinformatic comparison of the Gal/GalNAc lectin genes of E. histolytica and its non-pathogenic sibling E. dispar also identified evidence of gene conversion within the lineages [50].
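One classical way in which patterns of polymorphism can point to past recombination is the four-gamete test: if two biallelic sites show all four allele combinations in a sample, then, assuming no recurrent mutation, at least one recombination event must have occurred between them. The sketch below is a minimal illustration of that test on invented haplotypes. It is not presented as the method used in the studies cited above, and it assumes phased haplotypes, which are difficult to obtain for a polyploid organism such as E. histolytica.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Return pairs of site indices (i, j) that show all four gametes.

    `haplotypes` is a list of equal-length strings, one character per
    SNP site. Only pairs in which both sites are biallelic are tested.
    """
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    incompatible = []
    for i, j in combinations(range(n_sites), 2):
        alleles_i = {h[i] for h in haplotypes}
        alleles_j = {h[j] for h in haplotypes}
        if len(alleles_i) != 2 or len(alleles_j) != 2:
            continue  # skip monomorphic or multi-allelic sites
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            incompatible.append((i, j))
    return incompatible


# Invented haplotypes over three SNP sites.
clonal_like = ["ACA", "ACA", "GTA", "GTG"]
recombinant_like = ["ACA", "ATA", "GCG", "GTG"]

print(four_gamete_pairs(clonal_like))
print(four_gamete_pairs(recombinant_like))
```

An empty result is compatible with strictly clonal descent, whereas any reported pair is evidence of recombination or of repeated mutation at one of the two sites.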
Transposable elements constitute a significant portion of the E. histolytica genome, and they can affect the expression of adjacent genes [37]. The phenotypic characteristics of the parasite are therefore also influenced by the genomic location of these elements [37]. Variability in the genomic distribution of SINE1 and SINE2 among E. histolytica clinical isolates has recently been studied by Kumari et al. [70]. Several loci with extensive polymorphism of SINE occupancy among E. histolytica strains have been identified [70].

Conclusion
Questions related to the evolution and population structure of E. histolytica still remain to be investigated. One of the main issues is whether the E. histolytica population is sexual or clonal. Circumstantial evidence suggests that Entamoeba might engage in genetic recombination at some stage of its life cycle. However, further detailed investigations of Entamoeba and other early-branching protists are required to understand the origin of their sexual reproduction and to determine the variety of mechanisms by which these organisms exchange DNA. Another major question is whether the E. histolytica population from ALA patients is genetically closer to that of asymptomatic individuals. If they are close, as a few studies have suggested, then individuals with persistent asymptomatic E. histolytica infection may be at high risk of developing ALA in the future, and prompt preventive measures should be undertaken for such individuals. Whole-genome sequencing of E. histolytica clinical isolates can help to address this question.