Phylogenetic diversity of two common Trypanosoma cruzi lineages in the Southwestern United States

Trypanosoma cruzi is the causative agent of Chagas disease, a devastating parasitic disease endemic to Central and South America, Mexico, and the USA. We characterized the genetic diversity of T. cruzi circulating in five triatomine species (Triatoma gerstaeckeri, T. lecticularia, T. indictiva, T. sanguisuga and T. recurva) collected in Texas and Southern Arizona using nucleotide sequences from four single-copy loci (COII-ND1, MSH2, DHFR-TS, TcCLB.506529.310). All T. cruzi variants fall into two main genetic lineages: 75% of the samples corresponded to T. cruzi Discrete Typing Unit (DTU) I (TcI), and 25% to a North America-specific lineage previously labelled TcIV-USA. Phylogenetic and sequence divergence analyses of our new data plus all previously published sequence data from those four genes collected in the USA show that TcIV-USA is significantly different from any other previously defined T. cruzi DTU. The significant level of genetic divergence between TcIV-USA and other T. cruzi lineages should lead to an increased focus on understanding the epidemiological importance of this lineage, as well as its geographical range and pathogenicity in humans and domestic animals. Our findings further corroborate the high genetic diversity of the parasite in North America and emphasize the need for appropriate surveillance and vector control programs for Chagas disease in the southern USA and Mexico.


Chagas disease is caused by the protozoan parasite Trypanosoma cruzi, which is

Furthermore, TcVI was recently detected in captive primates (Herrera et al., 2019), and TcII was also detected in rodents from Louisiana (Herrera et al., 2015). More recently, all DTUs except TcIII and TcBat were detected in a small sample of rodents from New Orleans using next-generation sequencing of the mini-exon gene (Pronovost et al., 2020).

All the above discoveries call for a better understanding of the genetic diversity of T. [...]

[...] Table S1) (Mitchell, 2013; Reisenman et al., 2010). Fifty-five individual triatomine bugs from Texas and Arizona that were infected with T. cruzi are the focus of this study. Collection records for those individual insects were previously reported (Mitchell, 2013; Reisenman et al., 2012), and detailed location information is presented in S1 Table. In Texas, the majority of infected specimens were T. gerstaeckeri (n=44), with a lower frequency of T. lecticularia (n=4), T. sanguisuga (n=2) and T. indictiva (n=1) (Mitchell, 2013 [...]

[...] Flores-López and Machado, 2011; Ayala, 2001, 2002) and from recently sampled US isolates were included in the analyses (S2 Table). T. cruzi marinkellei was used as the outgroup in the analyses. Sequences were aligned using MUSCLE (Edgar, 2004) and then manually checked. MrBayes 3.1.2 (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003) was used to conduct Bayesian analyses using the substitution models chosen by jModeltest for each locus (Posada, 2008). Two independent simultaneous Markov chain Monte Carlo runs were conducted with four chains each for at least 30,000,000 generations, sampling trees every 10,000 generations. If the standard deviation of split frequencies was not below 0.01 after the first run, the analyses were run for an additional 10,000,000 generations and were stopped after convergence (i.e. standard deviation of split frequencies below 0.01). Parameters and corresponding trees were summarized after discarding the initial 25% of each chain as burn-in.
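The sampling and burn-in settings described above imply a simple bookkeeping calculation, sketched below in Python. This is only an illustration of the arithmetic, not the MrBayes implementation: the function names are hypothetical, the count ignores the initial state of the chain, and the stopping rule assumes that "convergence" means a standard deviation of split frequencies strictly below the 0.01 threshold.

```python
def retained_samples(generations, sample_every, burnin_fraction):
    """Return (total, discarded, kept) sampled trees for one MCMC run.

    Hypothetical helper: counts one sample per `sample_every` generations
    and ignores the starting state of the chain.
    """
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


def needs_more_generations(split_freq_sd, threshold=0.01):
    """Assumed stopping rule: keep running while the standard deviation
    of split frequencies is not below the threshold."""
    return split_freq_sd >= threshold


# Settings taken from the text: 30,000,000 generations, sampled every
# 10,000 generations, first 25% of each chain discarded as burn-in.
total, discarded, kept = retained_samples(30_000_000, 10_000, 0.25)
print(total, discarded, kept)

# Same calculation after one additional 10,000,000-generation extension.
print(retained_samples(40_000_000, 10_000, 0.25))
```

Under these assumptions each run contributes 3,000 sampled trees, of which 750 are discarded and 2,250 are summarized; one extension raises this to 3,000 retained trees per run.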

Maximum likelihood (ML) trees were estimated in PAUP, using the tree bisection-reconnection (TBR) algorithm for branch swapping (Swofford, 2002b). A reduced data set was selected for the ML analyses due to computational constraints. jModeltest2 was used to estimate the [...]

[...] well-supported monophyletic group that is more closely related to TcIII-IV-V-VI than to TcI or TcII, but that is significantly divergent from any of the previously defined DTUs.

Given that the vast majority of DTU reference sequences are from South American

Estimates of the divergence time of TcIV-USA from its most recent common ancestor with South American TcIV are similar for both nuclear (110,000 yrs ago) and mitochondrial data (120,000-160,000 yrs ago) (Table S3), indicating a mid to late Pleistocene divergence and entry into North America, well before the arrival of humans in North America.

To complement our phylogenetic analyses, we plotted the distribution of nucleotide pairwise differences within and between DTUs to determine if TcIV-USA is significantly divergent from other DTUs and could thus correspond to an independent evolutionary lineage.
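A minimal Python sketch of how within- and between-group pairwise differences can be tabulated from an alignment is given below. It is an illustration only and not necessarily the procedure used here: the function names are hypothetical, the sequences are short toy strings rather than real T. cruzi data, and sites with gaps or ambiguous bases are simply skipped, which is an assumption.

```python
from itertools import combinations, product


def pairwise_differences(a, b):
    """Count differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def within_and_between(groups):
    """Return pairwise differences within each group and between each
    pair of groups, given a dict mapping group name -> aligned sequences."""
    within = {
        name: [pairwise_differences(a, b) for a, b in combinations(seqs, 2)]
        for name, seqs in groups.items()
    }
    between = {
        (g1, g2): [
            pairwise_differences(a, b)
            for a, b in product(groups[g1], groups[g2])
        ]
        for g1, g2 in combinations(groups, 2)
    }
    return within, between


# Toy alignment (hypothetical sequences, for illustration only).
toy = {
    "TcI": ["ACGTACGT", "ACGTACGA"],
    "TcIV-USA": ["ACCTTCGT", "ACCTTCGA"],
}
within, between = within_and_between(toy)
print(within)
print(between)
```

Plotting the two sets of values as histograms gives the kind of comparison described above: a lineage that is distinct from another should show between-group differences that are consistently larger than the within-group differences.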