The Global Spread Pattern of Rat Lungworm Based on Mitochondrial Genetics

Eosinophilic meningitis due to rat lungworm, Angiostrongylus cantonensis, is a global public health concern. Human cases and outbreaks have occurred in new endemic areas, including South America and Spain. The growing body of genetic data on A. cantonensis provides a unique opportunity to explore the global spread pattern of the parasite. Eight more mitochondrial (mt) genomes were sequenced in the present study. The phylogeny of A. cantonensis by Bayesian inference showed six clades (I–VI) determined by network analysis. A total of 554 mt genomes or fragments, which represented 1472 specimens of rat lungworms globally, were used in the present study. We characterized the gene types by mapping a variety of mt gene fragments to the known complete mt genomes. Six more clades (I2, II2, III2, V2, VII and VIII) were determined by network analysis in the phylogenies of cox1 and cytb genes. The global distribution of gene types was visualized. The haplotype diversity of A. cantonensis in Southeast and East Asia was significantly higher than that in other regions. The majority (78/81) of samples beyond Southeast and East Asia belong to Clade II. The new world showed a higher diversity of Clade II in contrast with the Pacific. We speculate that rat lungworm was introduced there from Southeast Asia rather than the Pacific. Therefore, systematic research should be conducted on rat lungworm at a global level in order to reveal the scenarios of spread.


Introduction
Angiostrongylus cantonensis is the major cause of human neural angiostrongyliasis [1]. Infection with this nematode often results in eosinophilic meningitis (EM) and other disorders of the central nervous system [2]. Humans are vulnerable to infection because of the varied contact opportunities with mollusk intermediate hosts and paratenic hosts. EM occurs following ingestion of undercooked or raw snails, slugs, freshwater shrimp or crabs, lizards, and even contaminated vegetable salad with mollusk slime [3].
A. cantonensis was first discovered in the pulmonary arteries and heart of rats in southern China in 1933 [4]. Human EM due to A. cantonensis was first reported in 1945 [5], followed by EM outbreaks in Pohnpei, in the Caroline Islands of the Pacific, between 1947 and 1948 [6]. However, the health concern was not noted until the early 1960s, when an outbreak occurred in Tahiti [7]. The subsequent surveys indicated that EM due to

Mitochondrial Genomes
The present study provides eight full mt genomes of A. cantonensis. The adult worms of A. cantonensis were collected from the pulmonary arteries of wild rats during the first national survey in China [31], or from Sprague-Dawley rats that were infected with third-stage larvae from snails or slugs. Nine female worms from seven collecting sites were used for sequencing mt genomes (Table 1). One of the full sequences had been published (GQ398121). The other eight were characterized by the same methods and used for the present study [32]. In addition, the mt genomes of A. cantonensis, A. malaysiensis and A. mackerrasae that are deposited in GenBank were also included in the present analysis.

Mitochondrial Gene Sequences
Since our analysis strategy is to map the mt genes to the known complete mt genomes and then determine the type of each sequence, all mt gene sequences of A. cantonensis, A. malaysiensis and A. mackerrasae were collected from GenBank and used for the present study. The collection information of the specimens, e.g., location, year, species, gene name and sample size, was also recorded. For sequences that had been published in papers but not yet submitted to GenBank, we collected the sequences directly from the papers, including their Supplementary Materials.

Phylogenetic Analysis
The complete mt genomes of A. cantonensis, A. malaysiensis and A. mackerrasae were used to construct the phylogeny, with A. vasorum (JX268542) and A. costaricensis (GQ398122) as outgroups. A phylogenetic tree was constructed under Bayesian inference, performed in MrBayes version 3.2.7 [31]. Prior to Bayesian inference, the best-fit nucleotide substitution model (GTR+G) for the dataset was determined using a hierarchical likelihood ratio test in jModeltest version 2 [32]. The posterior probabilities were calculated using Markov chain Monte Carlo (MCMC) simulations. At the end of this run, the average standard deviation of split frequencies was below 0.01, and the potential scale reduction factor was reasonably close to 1.0 for all parameters. A consensus tree was visualized and edited in Mesquite version 3.70 [36]. The group of complete mt genomes was also submitted to network analysis in TCS version 1.21 [36]. Subgroups were determined with a connection limit of 1%, or 132 steps, and were treated as the clades in the phylogenetic tree mentioned above.
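The idea of splitting sequences into subgroups under a connection limit can be illustrated with a short sketch. It is not the statistical parsimony procedure implemented in TCS; it is a simplified single-linkage grouping, assumed here only for illustration, in which two aligned sequences are linked when they differ by no more than a given percentage of the alignment length. The function names `steps` and `group_by_limit` are hypothetical.

```python
from itertools import combinations

def steps(a: str, b: str) -> int:
    """Count differing sites between two aligned sequences, skipping gaps and N."""
    skip = {"-", "N"}
    return sum(1 for x, y in zip(a, b) if x != y and x not in skip and y not in skip)

def group_by_limit(seqs: dict[str, str], percent: int = 1) -> list[set[str]]:
    """Single-linkage grouping: link two sequences when they differ by at most
    percent% of the alignment length (a simplification, not the TCS algorithm)."""
    length = len(next(iter(seqs.values())))
    limit = length * percent // 100  # integer arithmetic avoids rounding surprises
    parent = {name: name for name in seqs}

    def find(n: str) -> str:
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(seqs, 2):
        if steps(seqs[a], seqs[b]) <= limit:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

Under this simplification, a 1% limit on a 100-column toy alignment allows one step, so two sequences that differ at a single site fall into the same subgroup while a strongly divergent sequence forms its own.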
The mt gene sequences downloaded from GenBank and papers were first categorized by gene name. The sequences of the same gene were mapped to the complete mt genomes by alignment in Clustal X. The sequences were divided into subgroups when the overlapping length was less than 50%. After being trimmed as necessary, the grouped sequences were submitted to DnaSP version 5 [37] and the haplotypes were determined. The phylogenetic trees based on the haplotypes were inferred following the procedure described above.
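The overlap check and the collapse of sequences into haplotypes can be sketched as follows. This is an illustration under assumptions, not the DnaSP workflow: the text does not state how the 50% overlap is measured, so the sketch assumes it is the number of shared alignment columns relative to the shorter fragment, and it treats two trimmed sequences as the same haplotype only when they are identical. The function names are hypothetical.

```python
from collections import defaultdict

def covered(seq: str) -> set[int]:
    """Alignment columns where the sequence has a base rather than a gap."""
    return {i for i, c in enumerate(seq) if c != "-"}

def overlap_fraction(a: str, b: str) -> float:
    """Shared columns divided by the columns covered by the shorter fragment
    (the denominator is an assumption; the text does not define it)."""
    ca, cb = covered(a), covered(b)
    shorter = min(len(ca), len(cb))
    return len(ca & cb) / shorter if shorter else 0.0

def collapse_haplotypes(seqs: dict[str, str]) -> dict[str, list[str]]:
    """Group sequence names by identical trimmed, aligned sequence."""
    haps: dict[str, list[str]] = defaultdict(list)
    for name, seq in seqs.items():
        haps[seq.upper()].append(name)
    return dict(haps)
```

Two fragments with an overlap fraction below 0.5 would be placed in different subgroups before haplotypes are collapsed within each subgroup.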
Some studies characterized the sequence of different mt genes of the same specimen, e.g., cox1 and cytb. Only one gene for a single specimen was taken into analysis. The gene cox1 was of priority because it was used broadly, followed by cytb and small ribosomal RNA gene. Large ribosomal RNA gene and nad1 were excluded because cox1 and small ribosomal RNA gene from the same specimens were included.
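The one-gene-per-specimen rule can be expressed as a small selection step. The sketch below assumes each record is a (specimen, gene, accession) tuple and uses `rrnS` as a hypothetical label for the small ribosomal RNA gene; genes outside the priority list, such as nad1 and the large ribosomal RNA gene, are simply skipped, which is a simplification of the exclusion described above.

```python
# Priority order taken from the text: cox1 first, then cytb, then the small
# ribosomal RNA gene ("rrnS" is a label chosen here for illustration).
PRIORITY = ["cox1", "cytb", "rrnS"]

def pick_one_gene(records):
    """Keep one gene per specimen according to PRIORITY.

    records: iterable of (specimen_id, gene, accession) tuples.
    Returns {specimen_id: (gene, accession)}.
    """
    rank = {gene: i for i, gene in enumerate(PRIORITY)}
    chosen = {}
    for specimen, gene, accession in records:
        if gene not in rank:
            continue  # genes outside the priority list are not used
        if specimen not in chosen or rank[gene] < rank[chosen[specimen][0]]:
            chosen[specimen] = (gene, accession)
    return chosen
```

A specimen with both cox1 and cytb records would contribute only its cox1 sequence, so no specimen is counted twice in the distribution maps.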
The gene fragments deduced from the 18 whole mt sequences of A. cantonensis, A. malaysiensis and A. mackerrasae were included as index markers in the individual gene phylogenetic trees. Each haplotype was assigned to a specific clade as delineated by whole mt genomes according to the index markers. New clades could be defined if the haplotypes in the clades were separated by network analysis with a connection limit of 1%, the same threshold as in the analysis of full mt genomes. If a new clade clustered with a known clade inferred from the full mt genomes, the new clade was given the same name with a suffix of 2. Otherwise, a new name was given to the new clade.

Mapping the Distribution Pattern
The coordinates of specimen collecting sites were either directly collected from the published papers or determined by Google Earth by inputting the location names which were extracted from the papers and GenBank. Each gene sequence from known locations was assigned to a specific clade determined in phylogenetic analysis.
The geographic distributions of gene types were mapped and visualized in a geographical information system (ArcGIS version 10.1). We merged some collecting sites for better display when the distance among them was less than 0.8 degrees at the global level and less than 0.5 degrees at the subregion level.
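The merging of nearby collecting sites can be sketched as a simple greedy clustering on coordinates. The text gives only the thresholds (0.8 and 0.5 degrees), so the procedure below is an assumption: distance is taken as the straight-line difference in latitude and longitude degrees, each site joins the first cluster whose running centroid lies within the threshold, and wrap-around at the 180th meridian is ignored. The function name `merge_sites` and the site names used with it are hypothetical.

```python
import math

def merge_sites(sites, threshold_deg):
    """Greedy merge of collecting sites for display.

    sites: iterable of (name, latitude, longitude) in decimal degrees.
    Each site joins the first cluster whose centroid is closer than
    threshold_deg (plain distance in degrees); otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"lat": float, "lon": float, "members": [names]}
    for name, lat, lon in sites:
        for cluster in clusters:
            if math.hypot(lat - cluster["lat"], lon - cluster["lon"]) < threshold_deg:
                cluster["members"].append(name)
                n = len(cluster["members"])
                # Update the centroid as a running mean of member coordinates.
                cluster["lat"] += (lat - cluster["lat"]) / n
                cluster["lon"] += (lon - cluster["lon"]) / n
                break
        else:
            clusters.append({"lat": lat, "lon": lon, "members": [name]})
    return clusters
```

Calling `merge_sites(sites, 0.8)` would correspond to the global map and `merge_sites(sites, 0.5)` to the subregion maps. The result of a greedy pass depends on the input order, so sorting the sites beforehand keeps the output reproducible.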

Results
A total of 18 mt genomes of rat lungworms were used in the analysis, including 10 from GenBank and 8 from the present study (Table 1, File S1). The genetic distance among the three species of rat lungworm was around 0.12, while the genetic distance among various strains of A. cantonensis ranged from 0.001 to 0.056 (Table 2). The phylogenetic and network analyses showed that 15 mt genomes of A. cantonensis collapsed into six clades, encoded as Clade I to Clade VI (Figure 1). One genome (KT186242), previously identified as A. cantonensis, was genetically closer to A. malaysiensis (KT947979) according to the present study. Another genome (MN793157) belonged to A. mackerrasae.
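How a pairwise genetic distance of this kind is obtained can be shown with the simplest estimator, the uncorrected p-distance. The text does not state which distance measure was used for Table 2, so the choice of p-distance here is an assumption made for illustration, and the function name is hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Only columns where both sequences carry A, C, G or T are compared;
    gaps and ambiguous bases are ignored.
    """
    valid = set("ACGT")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in valid and y in valid]
    if not pairs:
        raise ValueError("no comparable sites between the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)
```

On this scale, a value near 0.12 means roughly 12 differing sites per 100 compared, and the within-species range of 0.001 to 0.056 corresponds to between 1 in 1000 and about 6 in 100.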
The individual gene phylogenetic trees of cox1, nad1 and small ribosomal RNA genes from 18 complete mt genomes were re-constructed by Bayesian inference (File S2). When incorporated into the individual gene phylogenetic trees, the gene fragments deduced from the whole mt sequences all remained in their original clades, as expected.
A total of 554 mt genomes or fragments, which represented 1472 specimens of rat lungworms globally (Table 3), were used in the present study. The specimens were collected from 224 sites. According to our analysis one mt genome (KT186242), 13 cox1 sequences and 64 cytb sequences were A. malaysiensis and had previously been identified as A. cantonensis.
There are 327 cox1 sequences, including 160 fragments from GenBank, 18 mt genomes and 149 sequences from our previous study. One sequence (LC515249) was excluded due to the lack of overlap with any other sequence. The rest of the 326 sequences were divided into two subgroups according to the criteria of an overlap of less than 50%. The first subgroup included 308 sequences, which belonged to 112 haplotypes. All six of the clades (I~VI) based on the complete mt genomes were observed in the phylogenetic tree based on the first subgroup of cox1 genes. Meanwhile, another four clades were identified and named as Clade I-2, Clade II-2, Clade III-2 and Clade V-2 based on their relations to the six known clades (Figure 2). The second subgroup included 18 sequences, belonging to 11 haplotypes. All the sequences fell into Clade II (Figure 3).
Ten cytb sequences from the same specimens of cox1 were excluded in the analysis. The rest of the 195 cytb sequences, including 177 from GenBank and 18 from complete mt genomes, collapsed into 67 haplotypes. Eight clades were found in the phylogenetic analysis (Figure 4), including six that were determined according to complete mt genomes and two distinctive clades named Clade VII and VIII. The average genetic distance of sequences in Clade VIII to the other clades of A. cantonensis was over 5% (Table S1). The difference could be as high as 7%.
Twelve sequences of small and large ribosomal RNA genes were from the same specimens. Taking into consideration the smaller ribosomal subunit gene (ON747257), 13 small ribosomal RNA genes were used in the analysis. The sequences formed five haplotypes. According to network analysis, three haplotypes belonged to Clade II (Figure 5), while another two haplotypes fell into Clade I and III, respectively.
The gene types (clades) based on phylogenetic and network analyses were mapped in the world (Figure 6, Table S2). A much higher diversity of clades was observed in Southeast and East Asia. In contrast, almost all 81 specimens of A. cantonensis beyond the region belonged to Clade II, except for two (Clade IV) from Hawaii and one (Clade V) from Rio de Janeiro. In Asia, the Annamite range might be considered the genetic barrier for A. cantonensis (Figure 7). The gene types of Clade III, IV, V, V-2, VI and VIII were mainly distributed east of the Annamite range. On the contrary, Clade I, III-2 and VII were commonly found to the west of the Annamite range. A. malaysiensis was commonly found in the west of the Annamite range. However, it was also discovered in Taiwan according to the present analysis. A. mackerrasae was only observed in Australia.
The cox1 sequence types in Clade II were further categorized into ten subgroups (Figure 8). According to the geographic distribution of the subgroups, the haplotype diversity beyond Southeast and East Asia was 0.944, which was higher than that of 0.4641 in Asia (Figure 9). Clade II was not commonly observed in Asia according to available data. For example, Clade II was not discovered in the mainland of China and the northern part of the Greater Mekong subregion, though intensive sampling was undertaken and the high genetic diversity of A. cantonensis was observed in the region. Another feature of clade distribution in Asia was that none of the seven clades, except for Clade II-G, was simultaneously observed in two or more sites.
Clade II-G is the most common type, accounting for 48.57% of Clade II and found in seven sites in the world, five of which were located in the Pacific islands. Meanwhile the clade was also extended to the southeast shore area of Brazil. In contrast with the Pacific, the new world shows a higher diversity of clades, including the predominant clades II-D and II-E. Although the clade II-D1 and clade II-D3 were observed on Taiwan island and eastern Australia, 90% of the samples of Clade II-D were from the new world and Spain, while Australia shared two clades with Southeast Asia and South America.
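The haplotype diversity values reported for Clade II (0.944 beyond Southeast and East Asia and 0.4641 in Asia) are of the kind produced by Nei's estimator, Hd = n/(n - 1) × (1 - Σ p_i²), where n is the number of sequences and p_i the frequency of haplotype i. The text does not name the estimator, so its use here is an assumption; the sketch only shows how such a value follows from haplotype counts, and the function name is hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes) -> float:
    """Nei's haplotype diversity for a sample of haplotype labels.

    haplotypes: one label per sequenced specimen, e.g. ["H1", "H1", "H2"].
    """
    labels = list(haplotypes)
    n = len(labels)
    if n < 2:
        raise ValueError("at least two sequences are needed")
    counts = Counter(labels)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)
```

The value is 0 when every specimen carries the same haplotype and 1 when every specimen carries a different one, which is why a region dominated by one subgroup scores low even if many specimens were sampled.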

Discussion
A total of 18 complete mt genomes of rat lungworms are currently available. Only one mt genome each is labeled as A. mackerrasae and A. malaysiensis. The genetic distance among the species is over 0.11, while that among the different geographical strains within species is less than 0.06. Therefore, the species of rat lungworm could be definitively distinguished based on genetic distance. The genetic distance of the mt genome (KT186242), previously identified as A. cantonensis, was over 0.11 to any known strain of A. cantonensis. In contrast, the genome is almost identical to the mt genome (KT947979) diagnosed as A. malaysiensis. Therefore, KT186242 should be A. malaysiensis.
There were two hypotheses about the origin of A. cantonensis. Earlier opinion indicated that the parasite originated from Africa, which was supported by the fact that the discovery of A. cantonensis coincided with the spread of the African land snail A. fulica to Southeast Asia [38]. However, the theory of Asian origin later became popular. It was thought that the parasite was originally endemic to Southeast Asia and spread by the shipping rats, R. rattus and R. norvegicus, due to extensive traveling [13]. Our results show a much higher genetic diversity of A. cantonensis in Southeast and East Asia, supporting the latter theory.
Although intensive sampling occurred in China, Japan and the Greater Mekong subregion, the gene types common in America (e.g., Clade II-D and Clade II-E shown in Figure 9) were rarely found in these regions. We therefore believe that these gene types came from elsewhere in Southeast Asia. Of note, Clade II was genetically close to Clade I. The latter was found only in Thailand and Myanmar, where A. malaysiensis co-occurred. A significant correlation between genetic and geographical distances is commonly observed in population genetics [39]. Therefore, Clade II was possibly from either Myanmar or the lower reach of the Mekong River. Similarly, Clade V showed a longer genetic distance from any other clade within A. cantonensis, though the clade co-occurred with Clade IV in China and Japan. We did not obtain any genetic evidence from the Philippines, but the country holds the potential to harbor Clade V as a local gene type. A total of 124 cox1 samples from 24 collecting sites were available in Japan, and they consisted of 4 clades. However, the haplotype diversity of the samples was very low compared with Southeast Asia and the south of China [28,40]. Only seven haplotypes were identified, including three unique haplotypes within a single sample. Therefore, A. cantonensis in Japan might have been introduced from Southeast Asia and/or the south of China.
Our results show that there might be a genetic barrier between the Greater Mekong subregion and the south of China. The geographical isolation due to the Annamite range located in Vietnam may play a key role in the genetic divergence of A. cantonensis. Previous studies have indicated different evolutionary trajectories on western and eastern sides of the Annamite Mountain range [41]. The case of liver fluke also supports this opinion. Two species of liver fluke, i.e., Clonorchis sinensis and Opisthorchis viverrini, are widely distributed in the region. The former is distributed northeast of the Annamite range, while the latter is endemic in the southwest [42,43].
Clade II is the overwhelming gene type beyond Southeast and East Asia, except for a small number of samples of Clade IV and Clade V in Hawaii and Rio de Janeiro, respectively. A conclusive global spread route of rat lungworm could not be established from the genetic data currently available. However, our findings show that the gene types on the Pacific islands, where EM outbreaks occurred between the 1940s and 1960s, are identical and of low diversity, which implies that re-introduction after the Pacific War is less plausible [13]. Australia is the only country where A. cantonensis was reported earlier [21]. Our findings show no relation between Australia and the Pacific based on the available evidence, while the majority of Australian samples are genetically close to those from Thailand. The new world, recently identified as an endemic area, appears to show a haplotype structure distinct from that of the Pacific; hence rat lungworm might have been introduced there directly from Southeast Asia.
Compared with the global distribution of A. cantonensis, A. mackerrasae and A. malaysiensis are endemic locally. According to our results A. mackerrasae was only found on the eastern shore of Australia. Although A. mackerrasae was also discovered from R. norvegicus, it shows a higher susceptibility to local Australian rodents, e.g., R. fuscipes, R. lutreolus and Melomys cervinipes [21]. Therefore, A. mackerrasae was not endemic beyond the original region. A. malaysiensis showed a wider range of definitive hosts than A. mackerrasae. Although A. malaysiensis seems more susceptible to R. tiomanicus and R. argentiventer, the parasite shared 80% of definitive hosts with A. cantonensis, including the most invasive R. norvegicus and R. rattus (unpublished data). Our present study indicates that A. malaysiensis had gone beyond the original region and established local transmission in eastern Taiwan, since the worms had been isolated from the intermediate host A. fulica [44,45].
In order to reveal the complex spread pattern of A. cantonensis, we suggest the following research priorities. First, more mt genomes of rat lungworm should be characterized. Our present study identified new clades for which complete mt genome sequences are not yet available. Second, we need to fill the gap in genetic information from the Philippines, Indonesia and even Myanmar. Since these countries neighbor the Greater Mekong subregion and East Asia, the mt genetic information of their rat lungworms will be invaluable in constructing the phylogeny. Unfortunately, no large-scale survey has been conducted and mt genetic information is still lacking, though rat lungworms have been reported in the Philippines and Indonesia since the 1950s. Third, systematic sampling around the world is proposed, because only rare specimens or a small number of collecting sites are available beyond the Greater Mekong subregion and East Asia. In addition, the types of host animals, i.e., mollusks and rodents, differed among previous studies. It should be noted that the host preferences of species or strains of rat lungworm may cause a bias, and hence lead to false conclusions.

Conclusions
We showed an integrated global distribution of gene types by mapping various fragments to the known complete mt genomes. The new endemic areas including the new world and Spain showed different compositions of gene types to Southeast and East Asia and even the Pacific. Therefore, we need to conduct systematic research on rat lungworm at a global level in order to reveal the scenarios of spread.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pathogens12060788/s1, File S1: Eighteen mt genomes of rat lungworm used in the data analysis; File S2: Comparison between trees based on mt genomes and individual genes; Table S1: Difference matrix of cytb sequences; Table S2: Localities and gene types of samples.
Author Contributions: Conceptualization, S.L.; formal analysis, X.T. and S.L.; data collection and analysis, X.T., S.C. and L.D.; writing-original draft preparation, X.T.; writing-review and editing, Y.Q., H.L. and S.L. All authors have read and agreed to the published version of the manuscript.