Using Genotyping and Geospatial Scanning to Estimate Recent Mycobacterium tuberculosis Transmission, United States

These tools may enable direction of resources to populations with high transmission rates.

Molecular characterization of Mycobacterium tuberculosis complex has been available for >2 decades in the United States. As a tool to enhance programmatic activities, tuberculosis (TB) genotyping is a useful adjunct to epidemiologic field investigations by defining outbreaks (1,2), discerning episodes of reactivation and relapse (3,4), confirming suspected laboratory contamination (5,6), and evaluating and monitoring TB control program performance (7). TB genotyping results, when combined with epidemiologic data, help identify persons with TB disease who are involved in the same chain of recent transmission (8). Previous analytic studies have used TB genotyping data in conjunction with epidemiologic data to assess correlates of recent TB transmission within localized populations (9-15). A basic assumption of this approach is that recent TB transmission is localized in place and time, that is, TB disease results from progression of an infection acquired within the past few years and in the same jurisdiction.
Population-based molecular epidemiologic studies are often subject to several biases and methodologic limitations that impede the ability of investigators to make valid statements about recent TB transmission events in the absence of direct data regarding interpersonal contacts (16). Estimating recent TB transmission is often limited by abbreviated study periods, convenience isolate sampling, and ambiguous geographic boundaries defined for jurisdictional or geopolitical reasons (17,18). TB transmission is not likely to be bound by these artifacts, however. Spatial scanning to detect disease clusters has been successfully applied in multiple settings and for various diseases (19). Using this method in a multiyear, nationally representative database of both genotype and routinely collected TB surveillance data may offer a better solution for accurately defining recent TB transmission.
In 2004, the US Centers for Disease Control and Prevention (CDC) offered universal access to TB genotyping through the National Tuberculosis Genotyping Service (NTGS) to routinely characterize at least 1 M. tuberculosis complex isolate from every TB case-patient in the United States (20). Although the intent of this system is to support local TB programs for public health action, data collected from this system offer a unique opportunity to explore and describe the molecular epidemiology of TB and establish comprehensive molecular TB surveillance in the United States. In this analysis, our goals were to estimate the proportion of TB in the United States attributable to recent transmission and to assess clinical, demographic, and epidemiologic factors associated with recent TB transmission.

Study Population
This study includes verified cases of TB reported to the US National Tuberculosis Surveillance System (NTSS) by the 50 states and the District of Columbia. Clinical, demographic, and epidemiologic variables for each case-patient are collected for surveillance purposes and are described elsewhere (21). M. tuberculosis complex isolates were characterized by using a standardized protocol for spacer oligonucleotide typing (spoligotyping) and 12-locus mycobacterial interspersed repetitive unit-variable-number tandem repeats (MIRU-VNTRs) (22). NTGS results for each submitted isolate were linked to NTSS case records by state and local TB control programs; a standardized case identification number and a unique laboratory accession number were used to form discrete individual isolate-case records (20). When multiple isolates were genotyped for the same person in the same surveillance year, case-patients with discordant genotyping results were excluded from analysis for clustering assignment and risk factor analysis. The final study population included all persons with verified culture-positive TB cases reported during January 2005-December 2009 with a complete spoligotype and 12-locus MIRU-VNTR result.
Four major phylogenetic lineages for M. tuberculosis, along with speciation of M. africanum and M. bovis, were identified by using spoligotyping motifs that referred to an international standard (23). Substance abuse was defined by using previously published methods (24). Persons with TB who received a positive HIV test result at the time of TB diagnosis were classified as TB/HIV case-patients. Persons with TB and negative HIV results or unknown HIV status were classified as having non-HIV TB.

Genotype and Geospatial Clustering
Genotype clusters were defined as cases with matching spoligotype and 12-locus MIRU-VNTR results (i.e., exact match on all loci) reported within statistically significant geospatial zones determined by a spatial scan statistic (25). SaTScan version 9.1.0 (26) was employed to identify geographic areas with a larger-than-expected rate of discrete genotype clustering, and all other culture-positive TB cases counted during the study were considered as the background rate. In brief, all cases were aggregated by genotype according to residential ZIP code where they were reported. Each genotype was then scanned separately, applying a purely spatial analysis, in which the number of events in an area was assumed to be Poisson-distributed to generate circular zones of various sizes up to a maximum radius of 50 km. An evaluation of outbreak investigations conducted by CDC demonstrated no difference in cluster membership when 50-km and 100-km SaTScan search radii were used to identify known epidemiologically linked genotype cases (CDC, unpub. data).
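The aggregation and zone-building steps described above can be illustrated with a minimal sketch. It is not SaTScan's implementation: the ZIP labels, centroid coordinates, and function names are hypothetical, and the use of great-circle (haversine) distance between ZIP code centroids with a mean Earth radius of 6,371 km is an assumption made for illustration.

```python
import math
from collections import Counter, defaultdict

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def aggregate(cases):
    """Count cases per genotype and ZIP code from (genotype, zip_code) pairs."""
    counts = defaultdict(Counter)
    for genotype, zip_code in cases:
        counts[genotype][zip_code] += 1
    return counts


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def candidate_zones(centroids, max_radius_km=50.0):
    """Enumerate circular zones centered on each ZIP centroid.

    For every center, ZIP codes are added in order of distance, and each
    prefix of that ordering within max_radius_km is one candidate zone.
    Returns a list of (center, radius_km, member_zip_codes) tuples.
    """
    zones = []
    for center, (clat, clon) in centroids.items():
        by_distance = sorted(
            (haversine_km(clat, clon, lat, lon), z)
            for z, (lat, lon) in centroids.items()
        )
        members = []
        for dist, z in by_distance:
            if dist > max_radius_km:
                break
            members.append(z)
            zones.append((center, dist, tuple(members)))
    return zones


# Hypothetical centroids (latitude, longitude) for three made-up ZIP codes
centroids = {"Z1": (0.0, 0.0), "Z2": (0.0, 0.1), "Z3": (0.0, 1.0)}
for center, radius, members in candidate_zones(centroids):
    print(center, round(radius, 1), members)
```

Each genotype would then be scanned separately over these zones, with the counts from `aggregate` as the observed cases.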
A log-likelihood ratio was calculated for each zone in comparison with all possible zones, with the maximum likelihood ratio representing the zone most likely to identify spatial clustering for each genotype. A Monte Carlo simulation with 999 repetitions was used to determine the distribution of the scan statistic under the null hypothesis of spatial randomness; spatial clusters were considered significant at p<0.05. Three scans covering overlapping 3-year intervals (scan A, 2005-2007; scan B, 2006-2008; scan C, 2007-2009) were performed to identify spatial clusters occurring within a 3-year period. If cases were identified as a member of a statistically significant spatial cluster in any of the 3 periods, they were considered clustered. No duplicative case counting occurred. The purpose of this spatial scan was to characterize each case for a dichotomous outcome: clustered or not clustered. Cases that were both genotypically and spatially clustered were considered recent TB transmission for the purposes of this study. All cases that were not genotypically and spatially clustered were considered reactivation of remotely acquired TB infection, or reactivation TB. For comparative purposes, national-, state-, and county-level clustering definitions were created. National-level clustering was defined as ≥2 culture-positive cases with identical genotypes reported anywhere in the United States during 2005-2009. State-level clustering was defined as ≥2 culture-positive cases with identical genotypes reported from the same state during 2005-2009. County-level clustering was defined as ≥2 culture-positive cases with identical genotypes reported from the same county during 2005-2009.
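The likelihood ratio and Monte Carlo steps described above can be sketched as follows. The formula is the standard Kulldorff Poisson scan log-likelihood ratio; the text does not confirm the exact internal computation in SaTScan 9.1.0, so this is an illustration under stated assumptions. The area labels, counts, and function names are hypothetical, and the background counts are assumed to be positive in every area.

```python
import math
import random


def poisson_llr(c, e, total):
    """Log-likelihood ratio for a zone with c observed and e expected cases.

    Only zones with more cases than expected (c > e) are scored.
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside


def max_llr(cases, background, zones):
    """Largest log-likelihood ratio over all candidate zones for one genotype."""
    total_cases = sum(cases.values())
    total_bg = sum(background.values())
    best = 0.0
    for zone in zones:
        c = sum(cases.get(area, 0) for area in zone)
        e = total_cases * sum(background[area] for area in zone) / total_bg
        best = max(best, poisson_llr(c, e, total_cases))
    return best


def monte_carlo_p(cases, background, zones, reps=999, seed=0):
    """Rank-based p value: cases are redistributed in proportion to background."""
    rng = random.Random(seed)
    areas = list(background)
    weights = [background[a] for a in areas]
    n = sum(cases.values())
    observed = max_llr(cases, background, zones)
    exceed = 0
    for _ in range(reps):
        sim = {}
        for a in rng.choices(areas, weights=weights, k=n):
            sim[a] = sim.get(a, 0) + 1
        if max_llr(sim, background, zones) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (reps + 1)


# Hypothetical data: 8 cases of one genotype, all in area "A"
background = {"A": 100, "B": 100, "C": 100, "D": 100}
cases = {"A": 8}
zones = [("A",), ("B",), ("C",), ("D",), ("A", "B"), ("C", "D")]
llr, p = monte_carlo_p(cases, background, zones)
print(f"max LLR = {llr:.2f}, p = {p:.3f}")
```

With 999 replications the smallest attainable p value is 0.001, which is why the number of repetitions bounds the precision of the significance test.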

Statistical Analyses
A predictive logistic regression model was used to determine potential associations between clinical (e.g., sputum-smear status, known HIV positivity, site of disease and cavitation on chest radiograph, and previous TB diagnosis) and demographic and risk characteristic variables (e.g., race/ethnicity, age, country of birth, homelessness, substance abuse, incarceration at time of diagnosis, and residence at long-term care facility at diagnosis) and the outcome of interest: geospatial and genotype clustering as a proxy for recent TB transmission. Univariate analysis of the categorical independent variables was done by using the Pearson χ² test. Any variable with a p value of <0.20 was included in a best-subset, multivariate logistic regression model. We built our final model by using backward elimination of nonsignificant independent variables (p>0.01). The log-likelihood ratio was used to assess the overall significance of the final models, and the Hosmer-Lemeshow statistic was used to evaluate the fit of each of the final models. To test the hypothesis that factors associated with recent TB transmission events varied by geographic region of the United States, an additional 4 independent models were created following the same process but subset to western, midwestern, northeastern, and southern states, respectively (27).
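The univariate screening step can be illustrated with a short sketch of the Pearson χ² test for a 2 × 2 table (1 degree of freedom, no continuity correction) followed by the p<0.20 inclusion rule. The variable names and counts below are hypothetical and are not study data; the statistical software actually used is not stated in the text.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p value for the 2x2 table [[a, b], [c, d]].

    Rows are exposure (yes/no) and columns are outcome (clustered/not clustered).
    For 1 degree of freedom, the upper tail probability is erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def screen(tables, threshold=0.20):
    """Return names of variables whose univariate p value is below threshold."""
    return [name for name, t in tables.items() if pearson_chi2_2x2(*t)[1] < threshold]


# Hypothetical 2x2 counts, for illustration only
tables = {
    "homeless": (30, 10, 10, 30),
    "long_term_care": (10, 10, 10, 10),
}
print(screen(tables))
```

Variables passing the screen would then enter the multivariate model, from which nonsignificant terms are removed by backward elimination.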

TB Case Population
During 2005-2009, a total of 65,529 verified cases of TB were reported to CDC. Of these, 51,015 (77.9%) were culture-positive (Figure 1). During this period, the overall incidence of TB in the United States declined from 4.8 to 3.8 per 100,000 persons, representing a decline of 20.1% in the overall case count (21).

TB Isolates and Genotype Clusters
During 2005-2009, a total of 45,188 isolates were submitted to NTGS for molecular characterization; 39,474 (87.4%) were successfully matched to a case-patient with reported TB. Two hundred seventy isolates (0.7%) had incomplete results on spoligotype, MIRU-VNTR, or both; 344 case-patients (0.9%) had multiple isolates with discordant genotyping results and were excluded from the analysis. The total number of genotyped TB cases available for analysis was 36,860, representing 72.3% of all reported culture-positive cases. The proportion of reported case-patients for whom complete genotype results were available
Of the 36,860 cases for which genotyping had been performed, 8,499 (23.1%) were considered clustered by both genotype and spatial concentration and therefore were thought to be members of a putative recent TB transmission event. The average number of spatially concentrated genotype clusters identified per 3-year scanning period was 1,039 (range 970-1,128). Nationally, the overall mean cluster size was 5.7 members (range 2-173 members) (Figure 2). The median cluster size was 3 members, and almost half (46.1%) of the clusters had only 2 members. Other clustering definitions that use geopolitical boundaries had higher average clustering percentages when the same 3-year window periods were used (national-level, 77.3%; state-level, 57.1%; county-level, 38.7%) (Figure 1).
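The percentages stated in this and the preceding subsection can be rechecked directly from the reported counts. The short script below uses only numbers given in the text; the labels and the helper name `pct` are illustrative.

```python
def pct(numerator, denominator):
    """Percentage rounded to 1 decimal place."""
    return round(100 * numerator / denominator, 1)


# (label, numerator, denominator, percentage stated in the text)
reported = [
    ("culture-positive among verified cases", 51_015, 65_529, 77.9),
    ("isolates matched to a reported case-patient", 39_474, 45_188, 87.4),
    ("matched isolates with incomplete results", 270, 39_474, 0.7),
    ("case-patients with discordant results", 344, 39_474, 0.9),
    ("genotyped among culture-positive cases", 36_860, 51_015, 72.3),
    ("genotype and spatially clustered", 8_499, 36_860, 23.1),
]

for label, num, den, stated in reported:
    print(f"{label}: {pct(num, den)}% (stated {stated}%)")
```

The last row is the headline estimate: 8,499 of 36,860 genotyped cases, or roughly 1 in 4, met the combined genotype and spatial clustering definition.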
Cluster members with recent TB transmission events were also more likely to have reported HIV-positive results (8.7% versus 5.5%), pulmonary disease exclusively (78.4% versus 72.2%), and positive sputum smear results (61.5% versus 55.3%) and to have had a cavitary chest radiograph at time of diagnosis (36.8% versus 32.2%) than those thought to have reactivation TB. Of the 8,499 persons with cases believed to be caused by recent TB transmission, only 2.1% and 4.4% resided in a long-term care or correctional facility at the time of diagnosis, respectively.

Genotype Lineage and Recent TB Transmission Events
The proportions of isolates in each phylogenetic lineage were as follows: Euro-American, 64.2%; Indo-

Factors Associated with Putative Recent TB Transmission Events
In our final adjusted model, the following odds ratios were noted for variables significantly associated with higher odds of having a case attributed to putative recent TB transmission (

Geographic Variation Associated with Recent TB Transmission
Best-fit models to predict recent TB transmission were constructed for each of the 4 US geographic regions. Many of the main effects associated with recent TB transmission remained constant (US-born, substance abuse, homeless), although the associated risk factors and their magnitudes varied across the United States (Table 2).

Discussion
According to these findings, ≈1 in 4 TB cases reported in the United States may be attributed to recent TB transmission; this increases to 1 in 3 among US-born persons (Table 1). Our approach to identifying the proportion of reported TB attributable to recent transmission is based on the concept that epidemiologically related organisms share indistinguishable genotypes, whereas unrelated organisms differ at some genetic loci (8). TB cases that occur in spatial clusters and share indistinguishable genotypes are thought to be caused by recently transmitted TB infection; those with nonclustered genotypes are thought to result from progression from an infection acquired >3 years in the past. In the absence of detailed data about interpersonal contact between persons, relying on genotype and on place and time data routinely collected during surveillance activities becomes imperative to assessing recent transmission at a national level. This goal was achieved by using the established infrastructure of NTSS and TB genotyping, universally accessible to TB programs through NTGS, to capture 72% of all cases with culture-positive results over a 5-year period.
Spatial scanning provides a new insight into TB transmission that is independent of jurisdictional or geopolitical boundaries. This nationally representative study incorporated spatial concentration as a core element for defining recent TB transmission. Previous studies were limited to clustering definitions confined to a single jurisdiction (9-11,14,15), state, or province (28,29), or incomplete sampling of an entire nation (13,30). The proportion of cases representing recent TB transmission varied considerably by cluster definitions based on geopolitical borders. If a national clustering definition was used, up to 80% of culture-positive cases would be attributed to recent TB transmission. If a state-based definition or county-based definition was used, up to 57% and 39% of culture-positive cases, respectively, would be attributed to recent TB transmission. Although which definition most accurately represents recent TB transmission is unclear, a clustering definition based on geospatial concentration appears to be the most conservative and is not subject to the potential misclassification of political boundaries. The limitation of using these boundaries can be best exemplified by known inter-jurisdictional TB outbreaks that crossed geopolitical borders (31). Because the proportion of recent TB transmission may be a reflection of the success of control measures, accurately assessing this quantity is of considerable public health importance.
Emerging Infectious Diseases • www.cdc.gov/eid • Vol. 18
Estimating recent TB transmission also depends on the duration of the study period (16). Other studies have shown increasing clustering proportions as the duration of the study increases, with a plateau effect after 3 years (12,13,17,32,33). The annual proportion of isolates with a new strain identified in the United States during this study period did plateau (data not shown), suggesting a similar phenomenon and potential influencing factor in the long-term estimation of TB genotype clustering nationwide. Using consecutive, overlapping scanning windows that incorporate 3-year intervals maximizes the probability that spatial and temporal clustering represent localized, recent TB transmission within this large and comprehensive dataset. As NTGS continues to mature and grow over time, adjusting for temporal clustering will become essential when estimating recent TB transmission.
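The overlapping-window rule used in this study (scan A, 2005-2007; scan B, 2006-2008; scan C, 2007-2009) can be sketched as below. A case counts as clustered if it belongs to a significant spatial cluster in any window, and each case is counted once. The case identifiers and function names are hypothetical.

```python
# Scan windows as (first year, last year), taken from the study design
WINDOWS = {"A": (2005, 2007), "B": (2006, 2008), "C": (2007, 2009)}


def windows_for_year(year):
    """Names of the scan windows that include a given report year."""
    return [name for name, (start, end) in WINDOWS.items() if start <= year <= end]


def classify(case_ids, significant_by_window):
    """Map each case to True (clustered in any window) or False (not clustered).

    significant_by_window maps a window name to the set of case IDs that fell
    inside a statistically significant spatial cluster in that window.
    """
    clustered = set().union(*significant_by_window.values())
    return {case_id: case_id in clustered for case_id in case_ids}


# Hypothetical example: case "c2" is significant in two windows but counted once
significant = {"A": {"c1", "c2"}, "B": {"c2"}, "C": set()}
outcome = classify(["c1", "c2", "c3"], significant)
print(windows_for_year(2007))
print(outcome, sum(outcome.values()))
```

A case reported in 2007 is eligible in all 3 windows, whereas cases reported in 2005 or 2009 are each covered by only 1, which is one reason temporal adjustment may matter as the database grows.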
Consistent with other published reports from countries with a low incidence of TB, the characteristics of local birth, male sex, minority race, substance abuse, and homelessness were associated with recent TB transmission (17,18,33). These findings highlight the fact that TB may be harder to eliminate among populations characterized by these factors (34). The large proportion of cases attributable to recent TB transmission among minorities, persons who abuse substances, and those who are homeless suggests that limited access to routine health screenings, resulting in delayed diagnoses, may extend infectious periods and rates of TB transmission. Indeed, TB patients who use illicit substances and abuse alcohol have been found to be more contagious (24).
In low-incidence, high-resource countries, efforts to control recent TB transmission are based largely on contact investigation, yet for many reasons, contact investigations may not be sufficiently intensive or comprehensive, even in successful TB control programs (35). Every case of TB began when a person came into contact with a person with contagious TB. Therefore, it follows that clusters of case-patients representing recent TB transmission could be averted through improved contact investigation efforts. Contact investigations are multistep processes in which exposed contacts are systematically evaluated on the basis of the amount of time spent with an infectious person, the environmental conditions of exposure venue, and the contact's intrinsic predisposition for infection or disease (36). Numerous studies have demonstrated that eliciting names of contacts is neither optimally effective nor sufficient to interrupt TB transmission among high-risk groups, such as the homeless and persons who abuse substances (1,24,37,38). The potential for uninterrupted TB transmission is further exacerbated by the poor yield of name-based contact investigations among these populations. Locations are as important as named contacts when investigating recent transmission. A recent study found that 81% of case-patients involved in a multiyear TB outbreak lived in close geographic proximity (38). Spatial scanning methods may assist with identification of specific clusters representing ongoing transmission that could benefit from targeted location-based interventions. Using spatial scanning methods to determine locations with high concentrations of both spatial and genotype clustering may be an effective way to prioritize resources to intervene in populations with high rates of TB transmission.
This study does have limitations. First, isolate submission for TB genotyping is not universal; thus, the database, although large, did not contain all reported case-patients with culture-positive TB during the study period. Clinical, demographic, and epidemiologic characteristics of patients without TB genotyping data did not differ statistically from those with TB genotyping data (data not shown). Second, spatial and genotype clustering serves only as a proxy for recent TB transmission in the absence of detailed data on interpersonal connections between case-patients. Because of dynamic migration patterns within the United States, these methods may fail to ascertain cases that are due to recent transmission when a putative source case-patient moves or if exposure occurred outside the range of spatial scanning. Increased global migration has influenced the epidemiology of TB in the United States as well. Recent immigrants who became infected with a particular genotype elsewhere may resettle in the same neighborhood and, when TB develops after resettlement, it may falsely be considered recent TB transmission. Third, although spoligotyping and 12-locus MIRU-VNTR have good discriminatory power, these methods may not provide the resolution necessary to differentiate evolutionarily close strains (39,40). The introduction of an expanded panel of 24 MIRU-VNTR loci in 2009 to NTGS may reduce this misclassification in the future (40). It is also critical to note that TB transmission dynamics are multifactorial. TB genotype clustering may overestimate transmission. Consideration of patient characteristics, transmission venues, and temporality may better clarify recent transmission.
The integration of NTGS into routine public health practice and surveillance has led to the establishment of molecular surveillance of M. tuberculosis in the United States (20). With improved access to and rapid dissemination of genotyping information, it may be possible to more effectively identify some cases of TB transmission. Yet TB genotyping, and likely future molecular advancements, do not alter real-time public health action. Rather, recent transmission can be prevented only by implementing thorough contact investigation and ensuring that subsequent preventive treatment is completed among those identified at highest risk of undergoing a progression from infection to TB disease. If such practices had been successfully followed, as many as one third of all reported TB cases in US-born patients might have been prevented, especially among high-risk populations, such as persons with substance abuse disorders, those experiencing homelessness, or both. Greater attention and resources are needed to develop, implement, and evaluate interventions to control and prevent transmission among these populations. As the United States continues toward TB elimination, understanding transmission dynamics among high-risk populations and establishing new strategies for rapidly detecting and effectively responding to these transmission events will enhance the progress toward achieving this target.