Genomic Organization and Generation of Genetic Variability in the RHS (Retrotransposon Hot Spot) Protein Multigene Family in Trypanosoma cruzi

Retrotransposon Hot Spot (RHS) is the most abundant gene family in Trypanosoma cruzi, with unknown function in this parasite. The aim of this work was to shed light on the organization and expression of RHS in T. cruzi. The diversity of the RHS protein family in T. cruzi was demonstrated by phylogenetic and recombination analyses. Transcribed sequences carrying the RHS domain were classified into ten distinct groups of monophyletic origin. We identified numerous recombination events among the RHS and traced the origins of the donors and target sequences. The transcribed RHS genes have a mosaic structure that may contain fragments of different RHS inserted in the target sequence. About 30% of RHS sequences are located in the subtelomere, a region very susceptible to recombination. The evolution of the RHS family has been marked by many events, including gene duplication by unequal mitotic crossing-over, homologous, as well as ectopic recombination, and gene conversion. The expression of RHS was analyzed by immunofluorescence and immunoblotting using anti-RHS antibodies. RHS proteins are evenly distributed in the nuclear region of T. cruzi replicative forms (amastigote and epimastigote), suggesting that they could be involved in the control of the chromatin structure and gene expression, as has been proposed for T. brucei.


Introduction
The flagellate protozoan Trypanosoma cruzi is the etiologic agent of Chagas disease or American trypanosomiasis, which affects 6-7 million people mainly in Latin America, with an increasing number of cases in non-endemic countries such as Canada, the United States of America, and some European countries [1]. Compared with other members of the genus Trypanosoma, the T. cruzi genome is expanded, being 2.3-fold larger than that of T. brucei and T. rangeli. Repetitive DNA sequences comprise about 52% of the T. cruzi genome [2][3][4]. The dramatic expansion and diversification of repetitive sequences, particularly of multigene families encoding proteins, such as surface proteins (TS (Trans-Sialidase), MASP (Mucin-Associated Surface Protein), mucins, gp63, Retrotransposon Hot Spot (RHS), and DGF-1 (Dispersed Gene Family-1)), may have contributed to the speciation of the T. cruzi taxon [2,5]. RHS proteins are encoded by a multigene family found in the genus Trypanosoma.

Identification of RHS Sequences in T. cruzi and T. cruzi marinkellei Genome Databases
The search for homologous RHS genes in the TriTrypDB and GenBank databases was performed using the BLASTp, tBLASTn, and BLASTx algorithms, and the presence of the RHS domain architecture was confirmed using rpsBLAST [17]. RHS transcripts of CLB were used as queries to identify homologous sequences in other Trypanosoma species using the tBLASTn search program (e-value of 1 × 10⁻³). The retrieved sequences were evaluated for the presence of RHS domains with the rpsBLAST algorithm (e-value of 1 × 10⁻⁵) against the database of conserved domains [18]. An extra round of tBLASTn was performed using the RHS sequences found as queries to improve the sensitivity of the genome survey. Figure S1 shows the flowchart of this analysis. Sequence alignments were carried out with the RHS of clone CLB, excluding truncated sequences. The nucleotide and amino acid sequences were aligned using the MUSCLE program [19], and the poorly conserved regions were removed using the Gblocks program [20].
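The e-value thresholds described above can be illustrated with a minimal Python sketch. It assumes the hits were exported in the standard 12-column BLAST tabular format, where the e-value is the 11th column; the helper names, the example lines, and the inclusive comparison (less than or equal to the cutoff) are assumptions made for illustration and are not taken from the original analysis.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float


def parse_tabular_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column BLAST tabular lines; the e-value is column 11."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[10])))
    return hits


def filter_hits(hits: List[Hit], max_evalue: float) -> List[Hit]:
    """Keep hits whose e-value does not exceed the cutoff (assumed inclusive)."""
    return [h for h in hits if h.evalue <= max_evalue]


# Hypothetical hits, for illustration only.
example_lines = [
    "q1\tsubjA\t80.0\t300\t60\t0\t1\t300\t1\t900\t1e-50\t200",
    "q1\tsubjB\t30.0\t50\t35\t0\t1\t50\t1\t150\t0.5\t20",
]

# Cutoff of 1e-3 as stated for the tBLASTn search; 1e-5 would be used for rpsBLAST.
tblastn_hits = filter_hits(parse_tabular_hits(example_lines), max_evalue=1e-3)
```

With these example lines, only the hit against `subjA` passes the tBLASTn cutoff.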

Classification and Phylogenetic Analyses of RHS
For these analyses, we selected RHS transcripts of the T. cruzi clone CLB [21]. Transcribed genes were analyzed for the presence of RHS domains with the rpsBLAST algorithm using an e-value of 1 × 10⁻⁵ against the NCBI Conserved Domain Database (CDD) [18] (Figure S1). Sequences that showed false-positive RHS domains and pseudogenes were excluded. In the phylogenetic analysis, the global multiple alignment was carried out with the MUSCLE algorithm [19]. Phylogenetic trees were generated by the maximum likelihood method using the RAxML v 8.2.9 program [22], with an automatic search for substitution models (PROTGAMMAAUTO) selected by the Akaike information criterion (auto-prot = AIC) and 1000 bootstrap replicates. The phylogenetic tree was visualized with the program FigTree v 1.4.2 [23].

Detection of Potential Recombination Events in RHS Sequences
The RHS sequences selected for the phylogenetic study were also used to identify recombination events in the clone CLB using the RDP4 program (Recombination Detection Program) [24], which allows the identification and statistical analysis of recombination events from a set of aligned sequences. It uses non-parametric recombination detection methods (algorithms RDP, GENECONV, MaxChi, Chimera, Bootscan, 3Seq, and SiSscan) to identify breakpoints in the genomic sequences where recombination begins and ends, in addition to the donor parental sequences of the recombinant fragment. For recombination events, sequences detected by at least 6 of the 7 algorithms in the RDP4 package were considered recombinant.
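The acceptance rule described above (an event must be detected by at least 6 of the 7 methods) can be sketched as a simple filter. The sketch below is an illustration only: it assumes each event has already been reduced to the set of method names that detected it, and the event identifiers and function name are hypothetical. It does not reproduce RDP4's own output format.

```python
# The seven detection methods named in the text.
METHODS = ("RDP", "GENECONV", "MaxChi", "Chimera", "Bootscan", "3Seq", "SiSscan")
MIN_METHODS = 6  # threshold stated in the text


def supported_events(events, min_methods=MIN_METHODS):
    """Return the IDs of events detected by at least `min_methods` methods.

    `events` maps an event ID to the collection of method names that
    flagged it; names outside METHODS are ignored.
    """
    kept = []
    for event_id, detected_by in events.items():
        n_supporting = len(set(detected_by) & set(METHODS))
        if n_supporting >= min_methods:
            kept.append(event_id)
    return kept


# Hypothetical events, for illustration only.
example_events = {
    "event_1": set(METHODS),                       # 7 of 7 methods
    "event_2": set(METHODS) - {"SiSscan"},         # 6 of 7 methods
    "event_3": {"RDP", "MaxChi", "3Seq"},          # 3 of 7 methods
}

accepted = supported_events(example_events)
```

Here `event_1` and `event_2` would be retained and `event_3` discarded.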

Expression and Purification of Recombinant RHS
An 877-bp fragment encoding a 292-aa region of the carboxy-terminal domain of the RHS (TcCLB.511055.20) was amplified by PCR from CLB genomic DNA, cloned into pGEM-T, and sequenced to confirm gene identity. Then, it was subcloned into pGEX-1λT to produce the RHS-GST fusion protein as described by Martins et al., 2015 [24]. E. coli BL21 bacteria were transformed with the RHS-GST construct and grown in LB medium, and protein expression was induced with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). The RHS recombinant protein was extracted from the insoluble fraction of bacterial lysates with Laemmli's sample buffer and separated on 10% SDS-PAGE. The band corresponding to the recombinant protein was excised from the gel and extracted by dialysis against ammonium bicarbonate and distilled water [24]. The purity of recombinant RHS was checked by SDS-PAGE stained with colloidal Coomassie Blue and by immunoblotting (Figure S2). Purified protein was quantified with Coomassie Plus (Pierce, Thermo Fisher Scientific, Waltham, MA, USA) in 96-well plates at 620 nm.

Antibody Production, Western Blot, and Immunofluorescence Analyses
About two mg of the purified RHS recombinant protein were sent to Rheabiotech Research and Development Laboratory, SP, Brazil, for the production of polyclonal anti-RHS antibodies in mice. The specificity and reactivity of the anti-RHS antibodies were determined by ELISA and Western blot assays using the recombinant protein RHS.
Epimastigotes (10⁸ cells) of T. cruzi (clone CLB and G strain), T. cruzi marinkellei, and T. rangeli, and procyclic forms (10⁷ cells) of T. brucei were washed in PBS and lysed with 4× Laemmli's sample buffer, and the extracts were subjected to SDS-PAGE (10% separating gel and 3% stacking gel) at 120 V for 45 min. Proteins were transferred to Hybond ECL membranes (Amersham, GE Healthcare Life Sciences, Foster, CA, USA). For the Western blot reaction, the membrane was blocked in 1× PBS solution containing 7.5% skimmed milk powder (PBS/milk solution) for 1 h at room temperature. The membrane was then incubated with anti-RHS1 antibodies diluted 1:500 in PBS/milk solution for 1 h at room temperature. Subsequently, the membrane was washed three times (3 × 5 min) in PBS containing 0.05% Tween 20 (PBS/Tween solution). Secondary antibodies (Sigma Aldrich, St. Louis, MO, USA) were incubated for 1 h at room temperature at a dilution of 1:10,000. Bound antibody signals were amplified with ECL (Enhanced Chemiluminescence) substrate (GE Healthcare, Buckinghamshire, UK), and luminescent bands were visualized in an Alliance 2.7 photo documenter (UVItec, Cambridge, UK).
For the indirect immunofluorescence assay, T. cruzi epimastigotes (10⁷ cells) were harvested from the culture medium, washed with PBS, and fixed with 2% paraformaldehyde in PBS for 15 min at room temperature. Then, the parasites were washed with PBS and incubated with anti-RHS antibodies (1:1000 dilution) in the presence of 0.1% saponin and 1% PBS/BSA for 1 h at room temperature. The parasites were washed once more with PBS and incubated for 1 h with an Alexa Fluor 568 anti-mouse IgG antibody raised in goat, diluted 1:100 in 1% PBS/BSA, and 1 mM DAPI (4′,6-diamidino-2-phenylindole, Molecular Probes). Subsequently, epimastigotes were washed with PBS and the slides were mounted using Glycerol-PPD (p-Phenylenediamine). Images were acquired with a TCS SP5 II Tandem Scanner confocal microscope (Leica Microsystems, Wetzlar, Germany) using a 63× NA 1.40 PlanApo oil immersion objective and processed with Imaris software 7.0 (Bitplane).
The clone CL Brener (CLB) is a hybrid strain grouped in lineage TcVI, and sequence analysis of its genome revealed the presence of two haplotypes [2], one of which has contigs similar to the Esmeraldo strain of lineage TcII. The sequence divergence between the two haplotypes is 5.4% [2]. The genomic sequences generated in the Genome Project of T. cruzi clone CLB have been organized in 41 pairs of homologous chromosomes (TcChr), with the smallest having 77,958 bp (TcChr1) and the largest 2,371,736 bp (TcChr41) [2,40,41]. Due to the hybrid nature of CLB, each pair of homologous chromosomes consists of one homolog, which is an Esmeraldo-like-haplotype (S), and another homolog, which is a non-Esmeraldo-like haplotype (P), totaling 82 in silico chromosomes (TcChr) [2,40]. A search for RHS sequences in the CLB genome deposited in the TriTrypDB database resulted in 525 RHS sequences (111 genes, 384 pseudogenes, 30 truncated sequences), which are distributed in the haplotypes as follows: 48 complete genes, 177 pseudogenes, and 8 truncated sequences in the Esmeraldo haplotype (S), and 63 complete genes, 207 pseudogenes and 22 truncated sequences in the non-Esmeraldo haplotype (P) ( Table S1). Besides these sequences, we found 42 complete RHS genes, 175 pseudogenes, and 11 truncated sequences among the unallocated contigs, totaling 753 RHS sequences in the CLB genome. RHS gene sizes range from 351 to 3014 bp. The estimated RHS content of the CLB genome was 3,271,841 bp, comprising about 5.4% of the T. cruzi genome sequence.
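The counts reported above can be cross-checked with a short tally. The sketch below only re-adds the numbers given in the text (per haplotype and for the unallocated contigs); the variable and function names are illustrative.

```python
# RHS sequence counts as reported in the text.
rhs_counts = {
    "S": {"genes": 48, "pseudogenes": 177, "truncated": 8},    # Esmeraldo-like
    "P": {"genes": 63, "pseudogenes": 207, "truncated": 22},   # non-Esmeraldo-like
    "unallocated": {"genes": 42, "pseudogenes": 175, "truncated": 11},
}
CATEGORIES = ("genes", "pseudogenes", "truncated")


def total_for(groups):
    """Sum all categories over the given haplotypes or contig groups."""
    return sum(rhs_counts[g][c] for g in groups for c in CATEGORIES)


# Sequences assigned to chromosomes (haplotypes S and P).
assigned_by_category = {
    c: rhs_counts["S"][c] + rhs_counts["P"][c] for c in CATEGORIES
}
assigned_total = total_for(["S", "P"])

# All RHS sequences, including those on unallocated contigs.
genome_total = total_for(rhs_counts)

print(assigned_by_category, assigned_total, genome_total)
```

The per-category sums for the two haplotypes give 111 genes, 384 pseudogenes, and 30 truncated sequences (525 in total), and adding the 228 sequences on unallocated contigs gives 753, in agreement with the figures stated in the text.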
The distribution of RHS sequences along the CLB chromosomes is shown in Figure S3. Among 82 chromosomes, three chromosomes, TcChr1-S, TcChr4-S, and TcChr34-S, did not show RHS sequences. Larger chromosomes, such as TcChr40 and TcChr41, have predominantly RHS pseudogenes (Table S1), suggesting that RHS and other repetitive sequences could be involved in the expansion of the chromosome size. It is important to highlight that the total number of RHS sequences present in the genome of the CLB may be even greater than that obtained in this analysis. When non-transcribed sequences were included in our analysis, the total number of RHS sequences was larger than one thousand, showing the presence of fragments dispersed in the genome, which are reminiscent of RHS genes. These results reflect the complexity of the T. cruzi genome and RHS family [2,6,42]. The haploid genome of T. cruzi is about 2- and 5-fold larger than that of T. brucei and Leishmania spp., respectively. In addition, multigenic families (trans-sialidases, mucins, DGF-1, MASP, RHS, and GP63 proteases) underwent a very pronounced expansion process in T. cruzi [2,3,6,42-44].
The frequency of RHS sequences in each chromosome of CLB was plotted as a heatmap in Figure 1, and the proportion of total RHS length in each chromosome is shown in Figure S4. RHS sequences comprise 0.34% to 6.14% of the entire length of each CLB chromosome. Overall, the frequency of RHS was similar in most pairs of homologous chromosomes. However, in some homologous pairs, this proportion was quite different, e.g., between the haplotypes S and P of the chromosome TcChr20 or TcChr21.

Figure 1.
Circos diagram depicting the genomic organization and recombination events of the RHS family in the whole genome of Trypanosoma cruzi clone CLB. Inner track 1 represents the recombination between RHS genes. The recombinant sequences are linked to putative major and minor parental, using purple and green lines, respectively. Track 2 shows the genomic organization of RHS genes in chromosomes. Genes on forward and reverse strands are colored in blue and red, respectively. Track 3 shows the genomic organization of RHS pseudogenes in chromosomes. Pseudogenes on forward and reverse strands are colored in green and orange, respectively. Track 4 depicts a heat map of RHS genes' and pseudogenes' density for each chromosome. Values were obtained by summing the length (bp) of RHS genes and pseudogenes and were divided by the chromosome size. Outer track 5 shows the representation of T. cruzi CLB chromosomes for Esmeraldo (haplotype S) and non-Esmeraldo (haplotype P) allelic loci.
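The density value described for track 4 (summed length of RHS genes and pseudogenes divided by chromosome size) can be written as a small function. This is a minimal sketch under stated assumptions: the feature lengths and chromosome size below are hypothetical, and features are assumed not to overlap, which the caption does not specify.

```python
def rhs_density(feature_lengths_bp, chromosome_size_bp):
    """Fraction of a chromosome covered by RHS genes and pseudogenes.

    Computed as the summed feature length (bp) divided by the chromosome
    size (bp). Overlapping features would be double-counted.
    """
    if chromosome_size_bp <= 0:
        raise ValueError("chromosome size must be positive")
    return sum(feature_lengths_bp) / chromosome_size_bp


# Hypothetical chromosome of 100,000 bp carrying three RHS features.
density = rhs_density([2000, 1500, 500], 100_000)
percent = density * 100  # express as a percentage of chromosome length
```

For this hypothetical chromosome, 4000 bp of RHS sequence in 100,000 bp corresponds to a density of 4%.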

Phylogeny and Classification of the RHS Multigene Family of Clone CLB
In the phylogenetic analysis, the transcribed RHS genes were examined for the presence of RHS domains by rpsBLAST using an e-value of 1 × 10⁻⁵ against the database of conserved domains [18]. To reveal the real extent of recombination events within RHS genes, we excluded from this analysis non-LTR retrotransposons and other protein families with which RHS are commonly associated. The presence of conserved RHS domains (pfam07999, PTZ00209, and TIGR01631) was also confirmed in other databases (CDD, Pfam, SMART, KOG, COG, PRK, and TIGR). The analysis of 139 RHS amino acid sequences was carried out using the maximum likelihood method in the RAxML v 8.2.9 program with automatic selection of substitution models (PROTGAMMAAUTO). One thousand bootstrap replicates were processed to assess the reliability of the groups, with a cutoff of bootstrap values >75. Seventy-four RHS sequences could be categorized into groups 1 to 10 with values above the cutoff (indicated in colors), while three groups comprising 65 sequences with bootstrap values below the cutoff (indicated in black) were designated as unclassified groups. The number of sequences per group ranged from two RHS sequences in group 10 (light blue) to 15 sequences in group 3 (red) (Figure 2 and Table 1). Phylogenetic analysis showed that each RHS group is monophyletic. The results are also shown as a midpoint-rooted tree (Figure S5), in which all the sequences are listed with their respective TriTrypDB accession numbers [41].

Detailed information on the RHS groups of the CLB genome, such as chromosome mapping, genomic location including the subtelomeric region, the sizes of the coding sequence, and the predicted translated protein, is shown in Table 1. Most of the transcribed RHS genes (70%) encode proteins of approximately 60 to 180 kDa, and the remainder encode peptides of 10 to 38 kDa. The RHS sequences selected for phylogenetic analysis were those assigned to CLB chromosomes (TcChr).
Out of 74 RHS sequences, 58 genes have only one copy located in haplotype S or P, resulting in a hemizygous condition. Twenty-two of the hemizygotes are located in the subtelomere, a polymorphic region susceptible to homologous recombination, including ectopic recombination [5,45,46].
Our results showed that RHS hemizygotes can also be found in the interstitial chromosome regions in which the synteny is interrupted by a set of RHS sequences [47,48]. It has been proposed that the T. cruzi genome is organized in two compartments: a core compartment comprising conserved and hypothetical conserved genes, and a non-syntenic region (disruptive compartment) enriched by repetitive sequences such as members of multigene families TS, MASP, and mucins [3]. Other multigene families (GP63, DGF-1, and RHS) are dispersed throughout both compartments [3].
The members of the RHS groups are organized in multiple clusters at various genomic locations on different chromosomes, including the core and disruptive compartments and subtelomeres (Table 1 and Figure 1). The distance between two contiguous RHS genes ranged from 2 to 50,000 bp and the identity from 55 to 98%, suggesting the occurrence of gene duplication by homologous mitotic recombination, as has been described in fungi [52,53]. Some rearrangements could be explained by unequal crossing-over between homologous chromatids (interhomolog crossover) leading to the loss of the tandem counterparts in one of the haplotypes. For example, the RHS genes of groups 1 and 7 located on chromosomes TcChr4-P and TcChr7-S, respectively, were mapped in only one haplotype, indicating the loss of these genes in the corresponding haplotype (Figure 3A,B). The RHS genes of group 6 were mapped to the chromosomes TcChr15-P and TcChr15-S; only the first gene (TcCLB.511871.130) of the cluster was present on the TcChr15-S haplotype, and the remainder were lost by unequal crossing-over between homologous chromatids (Figure 3C). The homologous RHS genes of the TcChr15-P encode proteins with >93% identity with each other, and they share 84% identity with the paralogous RHS (TcCLB.511871.130) of the TcChr15-S haplotype. These results showed that duplications gave rise to RHS sequences in tandem that maintained the structure of the functional gene.
The RHS genes of group 7 located on the chromosomes TcChr16-P and TcChr16-S share 84-97% identity (Figure 3D), and this arrangement could be explained by genetic duplication followed by

Generation of Genetic Variability by Recombination between T. cruzi RHS Sequences
In the phylogenetic analysis, we found sixty-five RHS sequences distributed in branches with low bootstrap values, which were included in the unclassified groups. Due to the high number of unclassified sequences, we investigated whether recombination events had also occurred in these sequences. We used the Circos plot to map the recombination events between RHS with a single link connecting each pair of paralogs (Figure 1). We identified 53 recombination events in 139 RHS sequences that were confirmed by at least six of the seven algorithms of the RDP4 package ( Figure 4). We found that about 60% of the recombination events occurred in the unclassified sequences. Thirty-two unclassified RHS sequences were involved in the recombination events. The size of the fragment inserted into the target sequence by recombination is quite variable, and it may represent approximately 4% of the entire RHS gene. The recombination between the RHS genes results in mosaic structures that can contain up to three fragments of different RHSs inserted in the target sequence.
The recombination events occurred in different regions of RHS including the coding regions of the amino-and carboxy-terminal portions, as well as in the central region of the protein. Most recombination events were detected in the RHS sequences of group 3 that served as donors into unclassified sequences and eventually into sequences from other RHS groups. The recombination events occurred in specific regions, e.g., the amino-terminal coding region of RHS genes. As an example, the insertion of the same RHS sequence TcCLB.507841.14 of group 7 into the amino-terminal coding region of unclassified RHS sequences is shown (Figure 4, see recombination events 46 to 53).

Expression and Subcellular Localization of RHS in T. cruzi
The expression of RHS in T. cruzi and other trypanosomes was analyzed by Western blot using anti-RHS antibodies raised against a recombinant protein carrying a 292-amino acid region from the carboxy-terminal domain of RHS (TcCLB.511055.20) of CLB. This region is conserved among RHS of some T. cruzi strains (Dm28c, Sylvio X10/1, Y, Bug2148, Tulahuen, TCC) and T. cruzi marinkellei. The location of RHS (TcCLB.511055.20) in the nucleus has been experimentally demonstrated in the nuclear subproteome of clone CLB [54].
The anti-RHS polyclonal antibodies identified different protein profiles among T. cruzi strains and trypanosome species. They reacted strongly with two bands of 118 kDa and 112 kDa in the T. cruzi clone CLB and G strain, and weakly with two additional bands of 65 kDa and 29 kDa in CLB. A single band of 65 kDa was detected in T. cruzi marinkellei and T. rangeli, and a band of 82 kDa in T. brucei ( Figure 5A). The sizes of RHS proteins identified by Western blot are consistent with those predicted RHS ORFs in the T. cruzi strains and T. cruzi marinkellei. These results suggest that the RHS genes encoding the 118 kDa and 112 kDa proteins are expressed in the CLB and G strain, whereas the lower molecular weight (65 kDa and 29 kDa) RHS proteins are expressed only in lower amounts in CLB. T. cruzi marinkellei and T. rangeli showed a similar expression profile consisting of a single 65 kDa band. The presence of an 82 kDa RHS in T. brucei is in agreement with the RHS protein profile (85 to 110 kDa) described in this trypanosome [6].

Permeabilized parasites were analyzed by indirect immunofluorescence using anti-RHS antibodies (Figure 5B). Nuclear and kinetoplast DNA was labeled with DAPI, and the RHS proteins were detected with fluorescent anti-RHS antibodies (shown in blue and green, respectively). The fluorescence in the permeabilized parasites is concentrated in the nuclear region, as confirmed by its colocalization with DAPI (Figure 5B, merge). Within the nucleus, RHS was concentrated in spots. Anti-RHS also reacted within the nucleus of intracellular amastigotes (Figure 6), but no reaction was found in trypomastigotes. Taken together, these results suggest that RHS proteins of clone CLB have a predominantly nuclear location.
with anti-RHS polyclonal antibodies (diluted 1:500). The RHS recombinant protein was included as a positive control. The molecular masses of the reference proteins are indicated on the left in kDa. (B) Confocal microscopy images from indirect immunofluorescence reaction with anti-RHS antibodies (diluted 1:1000) in permeabilized epimastigotes of clone CLB. The labeling of the nucleus and kinetoplast DNA (DAPI) and RHS proteins is shown in blue and green, respectively. At the top, the reaction with two epimastigotes is shown at 3 μm scale. In the lower panel, the image shows epimastigotes (scale bar 10 μm). N, nucleus; K, kinetoplast.

Genomic Organization and Generation of Genetic Variability in the RHS Multigene Family in T. cruzi
RHS is a genus-specific multigene family identified in the genome of all trypanosomes sequenced so far. RHS genes have a retrotransposon insertion site in their 5′ coding region, which is predicted to disrupt more than 50% of the members of this family. Therefore, our phylogenetic analysis was restricted to transcribed RHS sequences with an uninterrupted ORF encoding the RHS domain. RHS proteins of clone CLB were categorized into 10 groups with significant bootstrap support (Figure 2), suggesting that each RHS subfamily is a monophyletic group, as previously reported in T. brucei [6]. The unclassified RHS sequences were separated from the rest of the groups, suggesting some structural differentiation among these sequences, and they evolved together with other RHS groups. Our search showed that T. cruzi RHS paralogous genes shared 75-100% identity at the amino acid level, whereas they shared 30-47% identity with orthologous genes from other trypanosome species, such as T. rangeli, T. grayi, T. evansi, T. vivax, T. brucei, T. theileri, and T. conorhini. From these results, we may infer that RHS genes evolved from a common ancestor and started diverging by speciation.
Once we defined the RHS sequence groups of T. cruzi CLB, the next question was whether recombination events occurred among the members of the various RHS groups, including the unclassified ones. The comparison of transcribed RHS sequences showed the occurrence of one to three recombination events resulting in a mosaic structure, which contains up to three fragments derived from different RHSs. The RHS sequences of unclassified groups comprised ~47% of total transcribed RHS and were involved in ~60% of the recombination events, in which they were used as a template to generate new RHS sequences. Our results suggest that the RHS family has been subjected to rapid gene turnover, resulting in different paralogous groups that are conserved for functional reasons. We believe that the unclassified RHSs may act as sequence reservoirs that can recombine with functional paralogs to generate diversity, and at the same time preserve intact copies in the RHS gene family. The lack of ancestral sequences could be explained by a continuous process of gene turnover mediated by gene conversion (allelic or ectopic) and unequal crossing-over.
The complexity of the RHS family may also be related to the large number of pseudogenes that comprise more than 50% of the family [2,6,7,42]. In T. cruzi and T. brucei, the repertoire of pseudogenes is of great importance in the generation of variants of multigenic families involved in parasitic virulence [6,55-59]. Taken together, these results suggest that trypanosomes developed alternative mechanisms for achieving genetic diversity in the multigene families, one of which uses incomplete genes (pseudogenes) in the generation of functional genes, while others promote recombination between functional genes. These mechanisms acting together may lead to the generation of multiple RHS sequences, resulting in the diversity within this family but preserving intact RHS copies in the genome.
Sequence diversity in the RHS multigene family of T. cruzi may be generated by unequal crossing-over (sister chromatid exchange and interhomolog crossover), segmental gene conversion, and interlocus nonallelic gene conversion. Tandem duplication generated by unequal crossing-over between non-sister homologous chromatids (interhomolog crossover) may occur with the loss of tandem allelic counterparts in one of the haplotypes, leading to a condition called hemizygosity. Out of 139 transcribed RHS genes of CLB, 58 genes (~42%) have only one allele with no counterpart in the other haplotype (S or P), resulting in a hemizygous condition. We identified 22 RHS hemizygotes mapped in the subtelomere, which is a polymorphic region that is susceptible to homologous and ectopic recombination [5,45,46,49,51]. Callejas et al., 2006 [60] identified a large hemizygous subtelomere region in chromosome I of T. brucei. This region accounted for three-quarters of the length of chromosome I and resulted in the amplification and divergence of gene families such as VSG (Variant Surface Glycoprotein) [60].
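The notion of hemizygosity used above (a gene present in only one of the two haplotypes) can be illustrated with a minimal sketch. The gene identifiers and the input mapping are hypothetical; only the 58 of 139 proportion is taken from the text.

```python
def hemizygous_genes(haplotypes_by_gene):
    """Return genes that have a copy in exactly one haplotype (S or P).

    `haplotypes_by_gene` maps a gene ID to the haplotypes in which a
    copy of that gene was mapped.
    """
    return sorted(
        gene
        for gene, haplotypes in haplotypes_by_gene.items()
        if len(set(haplotypes)) == 1
    )


# Hypothetical mapping, for illustration only.
example_mapping = {
    "rhs_a": {"S", "P"},  # allelic copies in both haplotypes
    "rhs_b": {"P"},       # hemizygous, present only in haplotype P
    "rhs_c": {"S"},       # hemizygous, present only in haplotype S
}
hemizygous = hemizygous_genes(example_mapping)

# Proportion reported in the text: 58 of 139 transcribed RHS genes.
reported_fraction = 58 / 139
print(hemizygous, f"{reported_fraction:.1%}")
```

The reported proportion, 58/139, is about 41.7%, consistent with the ~42% stated in the text.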
There is some evidence in the genome of T. cruzi that segmental gene conversion is involved in the generation of sequence diversity for multigene families organized in tandem array repeats [61][62][63][64]. In addition to segmental gene conversion, we also found evidence of interlocus nonallelic gene conversion (IGC) among gene duplicates between loci. Gene conversion has been proposed as an active force in the evolution of trypanosomes [65]. Araujo et al., 2020 [66] showed that DNA replication origins in T. cruzi are preferentially located at the subtelomeric region, which is a site of conflict between transcription and replication that may lead to DNA double-strand breaks and generation of diversity. Weir et al., 2016 [67] suggested that gene conversion is the mechanism used by T. brucei gambiense to counteract the Meselson effect, that is, the accumulation of mutations on the chromosomes owing to the lack of sexual recombination in this subspecies. The proposed mechanism is based on the repair of a defective gene copy on a chromosome by copying and pasting the functional gene from the homologous chromosome.
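The "copy and paste" character of gene conversion described above can be illustrated with a toy model in which a tract of the donor sequence overwrites the corresponding tract of the acceptor, while the donor itself remains unchanged. This is a deliberately simplified sketch: it assumes two pre-aligned sequences of equal length, and the function name, coordinates, and sequences are hypothetical rather than taken from the RHS data set.

```python
def gene_conversion(acceptor, donor, start, end):
    """Return the acceptor with positions [start, end) copied from the donor.

    Assumes (for this sketch only) that both sequences are already aligned
    and of equal length. The transfer is non-reciprocal: the donor is not
    modified.
    """
    if len(acceptor) != len(donor):
        raise ValueError("sketch assumes aligned sequences of equal length")
    if not 0 <= start <= end <= len(acceptor):
        raise ValueError("conversion tract out of range")
    return acceptor[:start] + donor[start:end] + acceptor[end:]


# Made-up sequences: the defective copy carries a premature stop codon (TAG)
# at positions 6-8, where the functional copy has AAG.
functional = "ATGGCCAAGTTGACC"
defective = "ATGGCCTAGTTGACC"

# Converting only the affected tract restores the functional sequence.
repaired = gene_conversion(defective, functional, 6, 9)
print(repaired == functional)
```

The same function also covers the segmental case: when donor and acceptor differ elsewhere, converting a short tract yields a mosaic sequence that carries segments of both parents.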

The Role of RHS Proteins in T. cruzi
We found that RHS proteins are located in the nucleus of epimastigotes and amastigotes of T. cruzi. This is in agreement with previous work that identified 74 RHS proteins with apparent molecular masses of 12 to 111 kDa in the nuclear proteome of T. cruzi epimastigotes [54]. These data were corroborated by Western blot analysis, in which we identified RHS proteins from 29 to 118 kDa in CLB. Despite the large number of RHSs expressed in T. cruzi, the profile of proteins recognized by anti-RHS antibodies is relatively simple, composed of 2-3 strongly reactive proteins. A similar profile was described in T. brucei, and it may be due to the absence of cross-reactivity between RHSs of different families [6].
Proteomic studies revealed that RHS proteins are expressed in epimastigotes of T. cruzi [68,69]. More recently, approximately 39 RHS isoforms expressed in T. cruzi trypomastigotes have been identified [70]. However, the diversity of RHS proteins detected by immunoblotting was more restricted, since only eight RHS isoforms were observed in this study [71]. The absence of reactivity of anti-RHS antibodies generated against the carboxy-terminal domain of RHS (TcCLB.511055.20) of CLB with T. cruzi trypomastigotes suggests that RHS proteins carrying the epitopes used in the immunization of mice were not expressed in this developmental form. RHS proteins seem to be constitutively expressed in T. brucei, but they are more abundant in the procyclic forms of this parasite [6]. More recently, it has been reported that several RHSs are regulated in a stage-specific manner [10].
Since RHS is a target for the insertion of retrotransposons, the participation of RHS in controlling the expansion of these mobile elements has been proposed. Other functions for RHS have been proposed in T. brucei. TbRRM, a modulator of the chromatin structure in T. brucei, interacts with RHS transcripts, proteins, and histones, suggesting that the RHS family could be involved in chromatin modeling [10]. Recently, it has been reported that several RHS proteins (RHS2, RHS4, and RHS6) may act as factors involved in transcription elongation and mRNA export in T. brucei [11].
Little is known about the role of RHS in the T. cruzi life cycle. T. cruzi RHS proteins have been identified in the secretome of epimastigotes, trypomastigotes, and amastigotes, indicating that they are exported to the extracellular medium [71][72][73][74]. Bautista-Lopez et al., 2017 [71] showed that RHS proteins were present in the extracellular vesicles (EVs) released by T. cruzi trypomastigotes and amastigotes in infected Vero cells. The secreted RHS proteins reacted with sera from chronic chagasic patients ranging from asymptomatic to advanced cardiomyopathy. EVs are important modulators of the mammalian host-T. cruzi relationships, such as heart parasitism, susceptibility to infection of mammalian cells, and inflammatory response [72,75]. The immunoreactivity of RHSs from EVs suggests that they could participate, possibly as adjuvants, in the interaction of T. cruzi with the mammalian host. In this context, it is noteworthy that RHS is more abundant in the T. cruzi strains infective for humans (Bug2148, Y, and Sylvio X10) than in B7, which is not infective in humans [44].
In conclusion, our data suggest that unequal mitotic crossing-over and gene conversion play a significant role in shaping the patterns of homology between the RHS paralogous repeats that accelerate the generation of diversity within this multigene family. Recombination among transcribed RHS genes leads to the generation of multiple chimeric functional RHS genes. Finally, we showed the nuclear location of RHS in the replicative forms of T. cruzi. Although evidence for the functions of RHS in T. cruzi has been elusive, we suggest that these proteins could play a role in modulating the chromatin structure at the transcriptional and posttranscriptional levels, as has been suggested in T. brucei [10,11].
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/9/1085/s1, Figure S1: Flowchart of RHS sequence identification and quality validation; Figure S2: Integrity and purity of RHS recombinant protein; Figure S3: Distribution of RHS sequences across the chromosomes of clone CLB of T. cruzi; Figure S4: Proportion of total RHS length in each chromosome of clone CLB; Figure S5: Phylogeny and classification of transcribed RHS sequences of clone CLB; Table S1: Mapping of RHS sequences on the chromosomes of clone CLB of Trypanosoma cruzi.