Clustering of Mycobacterium tuberculosis Cases in Acapulco: Spoligotyping and Risk Factors

Recurrence and reinfection of tuberculosis have quite different implications for prevention. We spoligotyped 267 Mycobacterium tuberculosis isolates from consecutive tuberculosis patients in Acapulco, Mexico, to assess the level of clustering and the risk factors for clustered strains. Point cluster analysis examined spatial clustering. Risk analysis relied on the Mantel-Haenszel procedure to examine bivariate associations and then to develop risk profiles combining risk factors. Supplementary analysis of the spoligotyping data used SpolTools. Spoligotyping identified 85 types, 50 of them previously unreported. The five most common spoligotypes accounted for 55% of tuberculosis cases. One cluster of 70 patients (26% of the series) shared a single spoligotype from the Manila family (clade EAI2). The high proportion (78%) of patients infected with clustered strains is compatible with recent transmission of TB in Acapulco. Geomatic analysis showed no spatial clustering; clustering was associated with a risk profile of uneducated cases living in single-room dwellings. The emerging Manila strain accounted for one in every four cases, confirming that a single strain can predominate in a hyperendemic area.


Introduction
Tuberculosis (TB) remains a global health problem, mainly related to poverty and concomitant diseases [1]. The reported annual incidence of pulmonary TB in Mexico is 14.27 cases per 100,000 people; in Guerrero state, the rate is 34.56 per 100,000 [2]. Acapulco is of particular concern as a centre for internal migration in Guerrero state, in addition to its role in international tourism.
TB recurrence and reinfection have quite different implications for prevention. A key issue among migrants to a city like Acapulco is whether TB infection contracted earlier in life is reactivated, to be detected in the city as clinical cases, or whether these are new infections contracted in the city. One research device to help differentiate between recurrence and reinfection is to establish whether contemporary cases are clustered; unless they all came from the same distant place of original infection, clustering suggests a shared source of infection in the locality of the study.
Molecular typing is a well-recognized tool for identifying clustering. The insertion sequence IS6110 is often the basis for molecular fingerprinting, as the number of copies of this element and their locations in the chromosome vary among strains [6]. Another target for molecular typing is the Direct Repeat (DR) locus, made up of 36-base-pair direct repeats separated by unique spacer sequences. Spoligotyping is a PCR-based typing method that generates distinct patterns based on the hybridization of the amplified spacer sequences to 43 different oligonucleotides [7]. Identifying clusters of isolates that share a spoligotype allows estimation of the frequency of recent M. tuberculosis infection.
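To make the typing readout concrete, the sketch below represents a spoligotype as a 43-character string (1 = hybridization signal for that spacer, 0 = no signal) and converts it to the 15-digit octal code that spoligotype databases such as SpolDB4 commonly use: spacers 1 to 42 are read in 14 triplets, each giving one octal digit, and spacer 43 is appended as 0 or 1. The function names and the string representation are illustrative assumptions, not part of the study's methods.

```python
def pattern_from_absent(absent_spacers, n_spacers=43):
    """Build a 0/1 spoligotype string with the given spacer numbers (1-based) absent."""
    return "".join("0" if i in absent_spacers else "1" for i in range(1, n_spacers + 1))


def spoligo_to_octal(pattern):
    """Convert a 43-spacer 0/1 pattern into the conventional 15-digit octal code."""
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected a 43-character string of 0s and 1s")
    # Spacers 1-42: 14 consecutive triplets, each read as a 3-bit number (0-7).
    digits = [str(int(pattern[i:i + 3], 2)) for i in range(0, 42, 3)]
    # Spacer 43 is appended on its own as a single 0 or 1.
    digits.append(pattern[42])
    return "".join(digits)


# Example: a pattern lacking spacers 33-36 (all others present).
print(spoligo_to_octal(pattern_from_absent({33, 34, 35, 36})))  # 777777777760771
```

The octal form is compact and makes identical patterns trivially comparable as strings, which is all that cluster detection requires.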
This paper reports the spoligotypes of M. tuberculosis isolates from a series of patients in Acapulco, to determine the level of clustering and to identify risk factors for clustering.

Study Location and Population.
A total of 330 patients with pulmonary TB received a positive diagnosis between February 2001 and September 2002 in the municipality of Acapulco. The series began when funding was received and closed with 330 consecutive participants, identified through a review of recent medical records. The only exclusions were 30 patients who had left the city and could not be located. Using addresses from clinical records, researchers approached each participant, explained the study, and obtained written consent before administering a pretested questionnaire. The study provided an opportunity to follow patients with a preliminary sputum-positive result; the Ministry of Health offered the directly observed treatment, short-course (DOTS) strategy [8] to all cases.

Bacterial Strains.
The 300 participants contacted yielded 273 isolates of M. tuberculosis from sputum samples collected by the Acapulco General Hospital, the Donato G. Alarcón Hospital, the State Public Health Laboratory and the Clínica Avanzada de Atención Primaria a la Salud. Processing began with the Petroff method for preparation and decontamination of sputum samples, followed by inoculation of 0.5 mL of each sample onto Löwenstein-Jensen media, incubated at 37 °C for eight weeks.

Genotyping.
Extraction of DNA required resuspension of M. tuberculosis colonies in 1 mL of sterile distilled water; samples were lysed at 90 °C for 10 minutes and frozen at −70 °C for 15 minutes. Samples were then thawed and incubated at 90 °C for 10 minutes. Centrifugation for three minutes concentrated the bacterial contents, and the supernatant was transferred to a new tube. Spoligotyping relied on a commercially available kit (Isogen Bioscience BV, Maarssen, The Netherlands) for amplification of DNA from the DR locus, the region with the highest level of polymorphism in the M. tuberculosis chromosome [7]. PCR amplification used 5 µL of extracted DNA from each sample combined with 1.5 µL MgCl2, 2.5 mM of each dNTP, 5 µL 10X buffer, 0.5 U Taq polymerase, and 20 pmol of each primer (DRa: biotin-5′-CCG AGA GGG GAC GGA AAC-3′ and DRb: 5′-GGT TTT GGG TCT GAC GAC-3′) in a final volume of 50 µL. The amplification protocol comprised 5 minutes at 94 °C, followed by 30 cycles of 1 minute at 94 °C, 1 minute at 55 °C, and 30 seconds at 72 °C, with a final extension of 10 minutes at 72 °C. The assay included two positive controls (chromosomal DNA from M. tuberculosis H37Rv and from M. bovis BCG P3) and a negative control (sterile H2O).
We hybridized the amplified DNA with 43 oligonucleotides covalently linked to a nylon membrane (Isogen Bioscience BV, Maarssen, The Netherlands) at 60 °C for 1 hour in a blotter with 45 lines (Miniblotter 45; Immunetics, Cambridge, Mass). The hybridized DNA fragments were then incubated with streptavidin-peroxidase conjugate (Boehringer Mannheim) and detected by chemiluminescence after incubation for 1 minute in 20 mL of ECL detection reagent (Amersham, Buckinghamshire, England) and exposure to X-ray film for 20 minutes.

Data Capture and Analysis.
Data capture from the questionnaires relied on Epi-Info (CDC, version 6.04). We compared spoligotyping results with the SpolDB4 spoligotyping database of the Institut Pasteur de Guadeloupe, which at the time of the analysis contained 1,939 types from 39,295 strains contributed by 122 countries [3,9]. The analysis defined a cluster as two or more isolates with identical genetic patterns. For patterns already present in the database, we used the existing code; when a spoligotype was not found in SpolDB4, we labeled it "Mx" followed by a number (for example, Mx1).
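The cluster definition and the Mx labeling rule can be sketched as follows. The dictionary `known_types` is a hypothetical stand-in for a SpolDB4 lookup, the octal codes and the label "ST_A" are toy values rather than study data, and the order in which Mx numbers are assigned here (sorted by code) is an arbitrary choice.

```python
from collections import Counter


def cluster_isolates(patterns, known_types):
    """Group isolates by identical spoligotype and label each distinct pattern.

    patterns    -- mapping of isolate ID -> octal spoligotype code
    known_types -- mapping of octal code -> database label (hypothetical lookup)
    Returns (labels, clusters, singletons).
    """
    counts = Counter(patterns.values())
    labels = {}
    next_mx = 1
    for code in sorted(counts):  # sorted so the Mx numbering is reproducible
        if code in known_types:
            labels[code] = known_types[code]
        else:
            labels[code] = f"Mx{next_mx}"
            next_mx += 1
    # A cluster is two or more isolates sharing an identical pattern.
    clusters = {labels[code]: n for code, n in counts.items() if n >= 2}
    singletons = sorted(labels[code] for code, n in counts.items() if n == 1)
    return labels, clusters, singletons


patterns = {
    "iso1": "777777777760771",
    "iso2": "777777777760771",
    "iso3": "700076777760771",
    "iso4": "000000000000001",
}
known_types = {"777777777760771": "ST_A"}  # hypothetical label
labels, clusters, singletons = cluster_isolates(patterns, known_types)
print(clusters)    # {'ST_A': 2}
print(singletons)  # ['Mx1', 'Mx2']
```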
We estimated the Recent Transmission Index (RTI) with two formulas: RTIn − 1 [10], and RTIn, which takes into account the number of patterns with a unique genotype (singletons), as described by Luciani et al. [11]. Supplementary analysis of the spoligotyping data used SpolTools [12]: DESTUS (Detecting Emerging Strains of Tuberculosis Using Spoligotypes) detected rapidly propagating strains, and spoligoforests allowed visualization of probable relations between the spoligotypes based on a plausible history of mutation events. In the spoligoforest diagram, each node represents a cluster of strains sharing the same spoligotype and is labeled with its shared type (ST) number in SpolDB4 [3].
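Both indices can be written in terms of n (isolates typed), n_c (isolates in clusters) and c (number of clusters). The formulas below are our reading of the cited methods, chosen because they reproduce the values reported in the Results: RTIn = n_c/n, and RTIn − 1 = (n_c − c)/n, which subtracts one presumed index case per cluster. The function name is illustrative.

```python
def recent_transmission_indices(n_total, n_clustered, n_clusters):
    """Return (RTIn, RTIn-1) from counts of typed, clustered isolates and clusters."""
    if not (0 <= 2 * n_clusters <= n_clustered <= n_total) or n_total == 0:
        raise ValueError("inconsistent counts: each cluster needs at least 2 isolates")
    rti_n = n_clustered / n_total                        # all clustered cases counted
    rti_n_minus_1 = (n_clustered - n_clusters) / n_total  # one index case per cluster excluded
    return rti_n, rti_n_minus_1


# Counts reported in the Results: 267 isolates, 208 clustered, 26 clusters.
rti_n, rti_n1 = recent_transmission_indices(267, 208, 26)
print(f"RTIn = {rti_n:.2f}, RTIn-1 = {rti_n1:.2f}")  # RTIn = 0.78, RTIn-1 = 0.68
```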

Risk and Point Cluster Analysis.
Potential risk factors for clustering included age (younger or older than 30 years), sex, marital status, education, employment, ethnicity, area of residence (urban or rural), duration of residence, number of people in the dwelling, alcohol use, concomitant illness, and having a spouse with TB. When simple bivariate risk analysis revealed no statistically significant contrasts, we derived risk profiles of factor combinations associated with clustering. We examined these risk profiles in a multivariate analysis, beginning with a saturated model and eliminating the weakest association stepwise until only significant associations remained in the final model. We report the results as adjusted odds ratios with 95% confidence intervals.
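The abstract states that bivariate associations were examined with the Mantel-Haenszel procedure. A minimal sketch of the Mantel-Haenszel pooled odds ratio follows; the 2x2 layout and the stratum counts are hypothetical, and a confidence interval (for example, from the Robins-Breslow-Greenland variance) is omitted for brevity.

```python
# Each stratum is a 2x2 table (a, b, c, d):
#   a = clustered & exposed     b = non-clustered & exposed
#   c = clustered & unexposed   d = non-clustered & unexposed


def mantel_haenszel_or(strata):
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ZeroDivisionError("pooled OR undefined: every b*c term is zero")
    return num / den


# Hypothetical strata (e.g. two age groups); counts are illustrative only.
strata = [(10, 5, 20, 30), (8, 4, 15, 25)]
print(round(mantel_haenszel_or(strata), 3))  # 3.143
```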
For the spatial analysis, we geoindexed cases on a map of Acapulco at a topographic chart scale of 1:50,000 in DXF vector data format (Instituto Nacional de Estadística, Geografía e Informática; 2001). Point cluster analysis of quadrats relied on QUADRAT (IDRISI 32, Clark Labs, Worcester, MA), which counts the points falling in each grid cell and reports the total number of points, the mean number per cell (density), the variance, and the variance/mean ratio. The variance/mean ratio describes the pattern of a point set: values close to 1.0 suggest a random pattern, values significantly smaller than 1.0 a regularly distributed pattern, and values greater than 1.0 a clustered pattern. The location of each case was compared with the locations of the reported TB cases. Final display relied on ArcView GIS software (ArcView GIS 3.2, Environmental Systems Research Institute Inc., Redlands, CA).
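The variance/mean ratio (VMR) reported by QUADRAT can be sketched in a few lines. Whether IDRISI uses the sample (k − 1) or population variance is an assumption here; the sketch uses the sample variance and also returns the index of dispersion, (k − 1)·VMR for k cells, which under a random (Poisson) pattern approximately follows a chi-square distribution with k − 1 degrees of freedom. The cell counts are toy values.

```python
from statistics import mean, variance


def dispersion_stats(cell_counts):
    """Return (variance/mean ratio, index of dispersion) for quadrat point counts.

    cell_counts -- number of cases falling in each grid cell (at least two cells)
    """
    if len(cell_counts) < 2:
        raise ValueError("need at least two cells")
    m = mean(cell_counts)
    if m == 0:
        raise ValueError("no points in any cell")
    vmr = variance(cell_counts) / m  # sample variance (k - 1 denominator): an assumption
    # Under complete spatial randomness, (k - 1) * VMR ~ chi-square with k - 1 df.
    dispersion_index = (len(cell_counts) - 1) * vmr
    return vmr, dispersion_index


print(dispersion_stats([2, 2, 2, 2]))  # perfectly regular: VMR = 0
print(dispersion_stats([0, 0, 8, 0]))  # all points in one cell: VMR well above 1
```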
This study was approved by the Committee of Research Ethics at the Centro de Investigación de Enfermedades Tropicales of the Universidad Autónoma de Guerrero.

Spoligotyping Patterns of M. tuberculosis Isolates.
Spoligotyping of the 267 isolates produced 85 distinct genotypes, 59 of them found in a single isolate (singletons). The remaining 208 (77.9%) isolates were grouped into 26 clusters, each shared by at least two patients. Cluster size ranged from 2 to 70 patients; most clusters (21/26) included 2 to 5 patients. The Recent Transmission Index RTIn, which takes into account the number of cases with unique genotypes (singletons), was 0.78, and RTIn − 1 was 0.68. Table 1 compares the cluster genotypes from Acapulco with the global spoligotype database: four were new and 22 had been previously identified in this database.

Detection of an Emerging Strain and Visualization of the Relations between Spoligotypes.
The S19 strain showed an elevated rate of transmission, independent of the sampling fraction f (Table 2), behaving as an emerging strain according to the method proposed by Tanaka and Francis [16]. According to the SpolDB4 database, S19 corresponds to the EAI2-Manila strain. Figure 1 shows the spoligoforest hierarchical layout, with the lines between the nodes denoting the weights calculated under the spoligoforest mutation model.
Some 59% (172/293) of participants had lived less than 16 years in their community of origin (Table 3). The average duration of residence in Acapulco was 2.6 years (SD: 1.66, range: 1-10 years). Households had an average of 5.1 occupants (SD: 2.83, range: 1-20 people).
Data on the date of symptom onset permitted identification of the "first case" (index case) in the largest clusters. The "first case" of the Manila cluster, for example, was a 35-year-old adult addicted to drugs and alcohol who had had symptoms for 10 years and, on examination, had advanced pulmonary TB (bacilloscopy positive, grade 3). Thirty of the 70 cases in this cluster reported a concomitant disease, diabetes being the most common (17/70).
The five largest clusters included 70, 31, 22, 15, and 8 cases, together accounting for 70% (146/208) of the clustered cases (Table 4). Figure 2 shows a random spatial distribution pattern in the quadrat analysis. There was no evidence of spatial grouping, consistent with recent infection associated with factors other than place of residence. Table 5 lists the conventional TB risk factors, none of which on its own showed a statistically significant association with clustering on univariate analysis. Seven risk profiles, each combining two risk factors, did show significant differences between clustered and nonclustered TB cases (Table 6), including males who consumed alcohol, unmarried men under the age of 30 years, unemployed single cases, indigenous cases without remunerated employment, young people in urban areas, and uneducated people living in single-room dwellings. The size of the study did not permit combining all of these profiles in a single multivariate model, so each profile was tested in turn to see whether it was explained by any of the others. Only one risk profile could not be explained by any other: uneducated cases living in single-room dwellings.
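A two-factor risk profile is the conjunction of two binary exposures. The sketch below cross-tabulates such a profile against cluster membership and computes a crude odds ratio with a Woolf (logit) 95% confidence interval. The field names ("no_schooling", "single_room", "clustered") and the records are hypothetical, and the study's final estimates were adjusted odds ratios from a multivariate model, which this sketch does not reproduce.

```python
import math


def profile_table(records, factor_a, factor_b, outcome="clustered"):
    """Cross-tabulate a two-factor risk profile against cluster membership.

    Returns (a, b, c, d):
      a = profile present & clustered    b = profile present & not clustered
      c = profile absent & clustered     d = profile absent & not clustered
    """
    a = b = c = d = 0
    for r in records:
        exposed = bool(r[factor_a]) and bool(r[factor_b])  # both factors present
        case = bool(r[outcome])
        if exposed and case:
            a += 1
        elif exposed:
            b += 1
        elif case:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (logit) confidence interval; no zero cells allowed."""
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: Woolf interval undefined without a continuity correction")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


records = [  # hypothetical participants
    {"no_schooling": 1, "single_room": 1, "clustered": 1},
    {"no_schooling": 1, "single_room": 1, "clustered": 0},
    {"no_schooling": 1, "single_room": 0, "clustered": 1},
    {"no_schooling": 0, "single_room": 0, "clustered": 0},
]
print(profile_table(records, "no_schooling", "single_room"))  # (1, 1, 1, 1)
print(odds_ratio_ci(10, 5, 20, 30))  # crude OR 3.0 with its Woolf interval
```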

Discussion
Reflecting the still very partial documentation of spoligotypes worldwide, 58% of our spoligotypes were unique to Acapulco; 21.7% of cases had a type not previously registered in the global database. Our five most frequent spoligotypes included more than half (54.7%) of the cases; these types are commonly recognized worldwide [3,13,14]. The T1 spoligotype (shared type 53), EAI (shared types 19, 8, and 342), and LAM9 (shared type 42) currently represent 28.75% of M. tuberculosis isolates in the global spoligotype database and are prevalent in Europe, Africa, India, and other countries [3].
The largest cluster, comprising 33.6% (70/208) of the clustered isolates, belonged to the Manila family, underscoring the public health impact of this strain in Acapulco. This may relate to Mexico's historic ties with Asia, as the Manila strain is found throughout Southeast Asia, particularly in the Philippines (73%), Myanmar and Malaysia (53%), and Vietnam and Thailand (32%) [3,15,17]. In a study in the Philippines, 90% (43/48) of isolates were of the Manila genotype [15]. Phylogenetic studies of related spoligotypes have shown that EAI (East African-Indian) genotypes share common ancestry [17]; this group includes the Manila, or EAI2-Manila, strain [3].
Internationally, certain M. tuberculosis strains are linked with a large proportion of recent infections, suggesting that these strains might have greater transmissibility or a higher probability of causing disease soon after transmission. These strains belong to families or groups of related isolates, such as the Beijing family (strain W), Haarlem, and Africa genotypes [3,18,19]. Other reported "cluster strains" come from the Latin American-Mediterranean (LAM) and Haarlem (H) families and from M. bovis [20]. The Manila strain, responsible for a third of our clustered cases, might share this greater transmissibility or propensity to progress to active disease, at least in Acapulco.
The method used to detect rapidly propagating strains [16] identified an emerging strain (S19), previously classified as EAI2-Manila [3]. This result agrees with the spoligoforest, in which node S19 (n = 70) shows the most rapid transmission. A spoligoforest displays the plausible ancestral relations between spoligotypes under a model of spoligotype mutation [12]. In our data set, the largest root of the tree was ST 100, which corresponds to the MANU family. Among the principal descendants, we identified ST 54, ST 236, and ST 1193. Most of the 26 unconnected nodes, however, had not been previously registered in the SpolDB4 database.
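Spoligotype evolution is commonly modeled as the loss of spacers, never their gain, with a single event able to remove a contiguous block of spacers. Under that simplifying assumption, the sketch below checks whether one pattern could descend from another and counts the minimum number of deletion events separating them. It illustrates the constraint behind a spoligoforest; it is not the weighting scheme SpolTools actually uses, and the function names are hypothetical.

```python
def could_descend(parent, child):
    """True if child can be derived from parent by spacer loss only (no gains)."""
    if len(parent) != len(child):
        raise ValueError("patterns must have equal length")
    gained = any(p == "0" and c == "1" for p, c in zip(parent, child))
    return parent != child and not gained


def min_deletion_events(parent, child):
    """Minimum number of contiguous-block deletions turning parent into child."""
    if not could_descend(parent, child):
        raise ValueError("child is not derivable from parent by spacer loss")
    events = 0
    in_run = False
    for p, c in zip(parent, child):
        if p == "0":
            continue  # spacer already absent in parent: does not interrupt a deletion
        if c == "0":
            if not in_run:  # a new block of lost spacers starts here
                events += 1
                in_run = True
        else:
            in_run = False  # a retained spacer separates deletion blocks
    return events


print(could_descend("1111111", "1100111"))        # True
print(min_deletion_events("1111111", "0110110"))  # 3
```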
Our proportion of clustered strains (77.9%) is higher than in other studies. A six-year study in two urban communities in South Africa identified 72% of cases as belonging to clusters [21]. Similar results came from the Grand Canary Islands of Spain [4]. Other studies in South Africa and Equatorial Guinea found clustering levels of 67% and 61.6%, respectively [5,22]. Clustering rates of 31% to 67.7% have been reported in cities of industrialized countries such as Spain, Italy, the Netherlands, Denmark, Slovenia, Canada, and the United States [23][24][25][26][27][28][29], while studies in France, London, and Switzerland reported lower proportions of clustered strains, from 18% to 27.6% [30][31][32]. Assuming one index case in each cluster, we estimated that 68% ((208 − 26)/267) of our cases could have been due to recent infection [10]. Taking into account the number of singletons (RTIn), the estimate was higher (78%). Migration, study duration [33], predominance of a local strain, simultaneous reactivation of remotely acquired infections, and laboratory error can all influence the reliability of this inference [34]. In our study, other factors support a high proportion of recent transmission, including the high prevalence rate, population mobility [29], the bacillary load of the cases, and the duration of their symptoms.
Although one cluster included 70 cases, most clusters had five or fewer cases. Similar results come from the Grand Canary Islands [4]. A study in San Francisco in 1996-97 found 73 of 221 (33%) cases in multiple clusters, implying multiple points of infection in the community.
Analyses of why certain M. tuberculosis strains propagate effectively point mainly to delayed diagnosis [35]. The index cases in five of our clusters were young adult males with at least one risk factor, such as drug use or alcohol addiction [4,10,25,36]. Cases with a concomitant disease are more likely to acquire a new M. tuberculosis infection [37] or to develop tuberculosis [38].
Our analysis of potential risk factors for clustering did not produce clearly actionable results. The notable absence of spatial clustering implies that place of residence is not a useful risk indicator. Type of residence (single room) combined with lack of formal education, on the other hand, was the single enduring risk profile.
SpolTools permitted analysis of the M. tuberculosis spoligotyping data to identify emerging strains and to visualize the probable evolutionary relationships between the spoligotypes in our series. Given that a single emerging strain, the EAI2-Manila genotype, can account for so many cases, monitoring changes in the distribution of spoligotypes over time could conceivably be used to evaluate the impact of TB prevention and early diagnosis efforts. Further studies of the virulence and drug sensitivity of the Manila genotype may be warranted.

Conflicts of Interest
The authors have no funding, commercial, or other associations that might pose a conflict of interest.