Determinants of Cluster Size in Large, Population-Based Molecular Epidemiology Study of Tuberculosis, Northern Malawi

Both epidemiologic and strain-related factors may contribute to large clusters of tuberculosis patients.

Molecular techniques, in particular restriction fragment length polymorphism (RFLP) based on the IS6110 insertion element, are used to define clusters of isolates of Mycobacterium tuberculosis with identical DNA fingerprints. Many studies have investigated risk factors for clustering, but relatively little is known about the determinants of cluster size (1,2). The size of clusters could depend on factors favoring transmission or on differences in the strains themselves. M. tuberculosis strains found in persons with smear-positive disease, many contacts, or delays in diagnosis and effective treatment are particularly likely to be transmitted. Some strains may be inherently more transmissible than others, perhaps because they are particularly likely to give rise to sputum smear-positive disease, they are associated with a more insidious onset of clinical symptoms (so patients are infectious for longer), or they are more virulent and are therefore more likely to give rise to secondary cases within the period studied (3). Large clusters may also be observed if the strain has a particularly stable RFLP pattern; this may be more likely for strains with few bands.
Epidemiologic differences can be explored by examining risk factors for cluster size. Giordano et al. (1) hypothesized that cluster size would be related to duration of symptoms. Those researchers found no evidence of this but did find inverse associations with age and HIV status in a population-based study in Texas in the United States. Strain-related differences are likely if the same strains give rise to large clusters in unrelated populations. The ubiquity of the Beijing family of strains has led to speculation that they may be particularly virulent or transmissible (4).
In a population-based study of the molecular epidemiology of tuberculosis in northern Malawi, we found that clustering was associated with young age, female sex, area of residence, and, in older adults, HIV positivity (5). We explored the determinants of cluster size and the characteristics of the larger clusters.

Methods
As part of the Karonga Prevention Study, northern Malawi, all persons with suspected tuberculosis at peripheral clinics and the district hospital are seen by project staff. Sputum is collected for smear and culture; lymph node and pleural and peritoneal aspirates are also cultured, when available. Cultures are set up in the project laboratory in Malawi, and those macroscopically consistent with M. tuberculosis are sent to the Health Protection Agency Mycobacterium Reference Unit, London, United Kingdom, for species identification and drug resistance testing. HIV testing is conducted after counseling, if consent is given. Patients are treated for tuberculosis according to Malawi government guidelines (6).
DNA fingerprinting using IS6110 RFLP has been conducted on isolates from patients who have been diagnosed since late 1995, following standard procedures (7). Patients whose disease was diagnosed up to early 2003 were included in this analysis. RFLP patterns were compared by using computer-assisted (Gelcompar 4.1; Applied Maths, Kortrijk, Belgium) visual comparison. Laboratory error was thought likely if isolates with identical RFLP patterns were isolated on the same day from patients with no known epidemiologic relationship and if, in addition, at least 1 of the following was true: there was no other laboratory evidence of tuberculosis, the isolates were the only 2 examples of this RFLP pattern, or the patients had other isolates with different patterns (8). After likely laboratory errors were excluded, RFLP patterns shared by >1 patient were classified as clustered. Some patients had >1 isolate. To define whether a strain was clustered and to determine the size of the cluster, patients were included more than once if they had >1 RFLP pattern. Thereafter, patients were included only once, for their first episode of tuberculosis for which an RFLP result was available.
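The clustering rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's actual procedure: the patient identifiers, pattern labels, and function names are hypothetical, and pattern matching is reduced to comparing labels, whereas the study compared RFLP patterns by computer-assisted visual comparison.

```python
from collections import defaultdict

def cluster_sizes(records):
    """Map each RFLP pattern to the number of distinct patients carrying it.

    `records` is an iterable of (patient_id, pattern) pairs. A patient with
    more than one pattern is counted once in each pattern's cluster; repeat
    isolates with the same pattern from the same patient are counted once.
    """
    patients_by_pattern = defaultdict(set)
    for patient_id, pattern in records:
        patients_by_pattern[pattern].add(patient_id)
    return {pattern: len(patients) for pattern, patients in patients_by_pattern.items()}

def clustered_patterns(records):
    """Keep only patterns shared by more than 1 patient (clustered strains)."""
    return {pattern: n for pattern, n in cluster_sizes(records).items() if n > 1}

# Hypothetical data: patient P3 has two different patterns, P1 has a repeat isolate.
records = [
    ("P1", "A"), ("P2", "A"), ("P3", "A"),
    ("P3", "B"), ("P4", "B"),
    ("P5", "C"),
    ("P1", "A"),
]

sizes = cluster_sizes(records)
clusters = clustered_patterns(records)
print(sizes)     # pattern -> number of distinct patients
print(clusters)  # pattern "C" is unique and therefore dropped
```

Using a set per pattern makes the counting unit the patient rather than the isolate, which is the distinction the paragraph draws.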
Spoligotyping (9) was performed on at least 2 isolates of clusters containing at least 15 patients, to enable comparison of strains with international databases (10,11). Changes over time in the proportion of tuberculosis cases caused by each of these large cluster strains were examined by using the Fisher exact test to compare proportions and the χ² test for linear trend. Spoligotyping was also performed on unique (not clustered) strains from patients with smear-positive tuberculosis in 1998 or 1999, as examples of strains that had apparently not spread in the population, and on all positive cultures from 2002. Previously identified spoligotypes were defined as widespread if the international database described them as both "ubiquitous" and "recurrent," "common," or "epidemic."
Analysis of cluster size excluded unique strains and strains with <5 bands on the RFLP (because patterns with few bands are insufficiently discriminatory). Cluster size was divided into 4 groups (Table 1), and associations with cluster size were determined by using maximum-likelihood ordered logistic regression with the ologit command in STATA (12). With this method, the odds ratios calculated represent the summary relative odds of larger clusters compared to smaller clusters across the 4 groups. This method was used in preference to linear regression because cluster size is not normally distributed, and in preference to logistic regression because it avoids arbitrary dichotomization of cluster size. All available risk factors for cluster size were assessed individually (Table 1), and factors that were significant at the 5% level, after adjusting for other factors, or that confounded other variables were retained in the final model.
The molecular epidemiologic work of the Karonga Prevention Study was approved by the Malawi National Health Sciences Research Committee and the ethics committee of the London School of Hygiene and Tropical Medicine.
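The ordered logistic regression used for cluster size can be written out as follows. The notation is an assumption introduced here for illustration and does not appear in the article: Y is the cluster-size group (1 to 4), x is the vector of risk factors, β holds the coefficients, and κ_j are the 3 cut points, in the parameterization documented for Stata's ologit.

```latex
\[
\log\frac{\Pr(Y > j \mid \mathbf{x})}{\Pr(Y \le j \mid \mathbf{x})}
  = \mathbf{x}^{\top}\boldsymbol{\beta} - \kappa_j ,
  \qquad j = 1, 2, 3,
\]
\[
\mathrm{OR}_k = \exp(\beta_k) \quad \text{(the same for every cut point } j\text{)}.
\]
```

Because the coefficient β_k does not depend on j, a single odds ratio summarizes the odds of being in a larger rather than a smaller cluster-size group wherever the 4 groups are split. This is the sense in which the odds ratios are described as summary relative odds.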

Results
Over the study period, 1,248 cases of culture-positive tuberculosis were diagnosed in patients in Karonga District. RFLP results were available on 1,194 isolates from 1,044 patients. After we excluded 25 isolates because laboratory error was suspected (8), there were results for 1,029 patients. Eighty-one patients had strains with <5 bands and were excluded. Of the remaining 948 patients, 682 (72%) were clustered and form the basis of this analysis.
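The patient counts above can be checked with a short calculation. The variable names are illustrative; the numbers are those reported in the text.

```python
# Counts reported in the Results; variable names are illustrative only.
patients_after_lab_exclusions = 1029  # patients left after suspected laboratory errors were removed
patients_with_few_bands = 81          # patients whose strains had <5 RFLP bands
clustered_patients = 682

analysed_patients = patients_after_lab_exclusions - patients_with_few_bands
percent_clustered = 100 * clustered_patients / analysed_patients
unique_patients = analysed_patients - clustered_patients

print(analysed_patients, round(percent_clustered), unique_patients)
```

The subtraction gives the 948 patients stated in the text, and 682 of 948 rounds to the reported 72%.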
Cluster size varied from 2 to 37. The determinants of cluster size are shown in Table 1. Older patients were less likely than younger patients to be in large clusters. Male patients were more likely than female patients to be in large clusters, and there was variation by geographic area. Cluster size was not statistically associated with HIV status, type of tuberculosis, previous tuberculosis, or drug resistance. Patients in small clusters were as likely to die during treatment as those in large clusters. In the multivariate analysis, the results were similar (Table 2), with significant associations with age, sex, and area of residence. The results were unchanged by adjusting for year or for RFLP band number. None of the other factors shown in Table 1 was associated with clustering after we adjusted for possible confounders. Repeating the analysis with different categorizations of cluster size gave similar results (not shown).
All of the large cluster strains (≥15 people) were found in at least 4 of the 6 geographic areas of the district, and most were found throughout the district. The distributions of the 4 largest clusters are shown in the Figure. Patients with strains from most of the large clusters were present in the district throughout the study period. Trends over time for strains involving at least 15 people are shown in Table 3. Only 1 strain, kps121, showed statistically significant changes over time; it appeared to be decreasing.
The spoligotypes for RFLP-defined strains kps104, kps44, and kps97 were also identical or similar to previously described widespread spoligotypes, types 21, 53, and 1 (Beijing), respectively. The spoligotype for strain kps121, spoligotype 129, was not similar to any widespread types.
The spoligotypes from the RFLP-defined large cluster strains were compared with spoligotypes from patients with positive cultures in 2002, and from patients with smear-positive tuberculosis and unique RFLP patterns in 1998 through 1999. Overall, 9 (90%) of 10 of the large cluster strains had spoligotypes that were identical to, or only 1 spacer different from, previously described widespread spoligotypes. For the patients from 2002, this proportion was 90 (71%) of 126 (p = 0.3 when compared to the large cluster strains), and for the smear-positive unique strains, it was 37 (66%) of 56 (p = 0.3 compared to the large cluster strains).
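The two comparisons above can be reproduced approximately with a Fisher exact test on the reported counts. This is a sketch under an assumption: the text does not state which test produced these p values, and a two-sided Fisher exact test is used here only because the Methods name it for comparing proportions. The function and variable names are illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))

# Rows: large cluster strains vs. comparison group.
# Columns: widespread-type spoligotype vs. not.
p_2002 = fisher_exact_two_sided(9, 1, 90, 126 - 90)
p_unique = fisher_exact_two_sided(9, 1, 37, 56 - 37)
print(round(p_2002, 1), round(p_unique, 1))
```

Under these assumptions, both p values round to 0.3, in line with the values reported in the text.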
All the spoligotypes that were found in the RFLP-defined large cluster strains were also found among (RFLP-defined) unique strains. Seventeen of the unique strains had spoligotype 59, and 2 others had closely related patterns (i.e., 1 spacer different); 1 had spoligotype 21, and 1 had a closely related pattern; 4 had spoligotype 53, and 2 had closely related patterns; and 6 had spoligotype 129. Of the 56 patients from 1998 to 1999, none had Beijing spoligotypes, but we have previously described strains with Beijing spoligotypes and unique RFLP patterns in this population (14).
The spoligotypes found in the large cluster strains were also common among the unselected patients from

Discussion
This study suggests that both epidemiologic and strain-related factors may contribute to large cluster size. In large clusters, young adults, male patients, and those living in the town were over-represented, all factors likely to be associated with increased social mixing. Similar associations with age and sex have been found previously, in the United States and Denmark. In Denmark, the largest cluster was particularly predominant in the capital city (1,2).
There was no significant association between tuberculosis type (smear-positive, smear-negative pulmonary, or extrapulmonary) and cluster size, but most patients had sputum smear-positive disease. There was also no statistically significant association with degree of smear positivity (not shown). An overall association with infectiousness would not necessarily be expected: the infectiousness of the first cases of a cluster may be important in determining size, but the first cases for the large clusters, which were found throughout the period of study, are not identifiable. There was no significant association with isoniazid resistance, but only 39 (6%) patients had resistant strains. Isoniazid resistance has been associated with reduced clustering and reduced generation of secondary cases (15,16), so it might have been expected to be less common in the larger clusters. Only 3 clustered patients had rifampin resistance in our study (2 with 1 strain and 1 with another), so the effect of this factor on cluster size could not be investigated.
The factors associated with cluster size were not identical to those associated with clustering overall (5). Whereas younger adults were more likely to have clustered strains and to be in large clusters, female patients were more likely to have clustered strains but, among clustered case-patients, male patients were more likely to be in large clusters. Known contact with a previous tuberculosis patient is an important risk factor for tuberculosis, especially for women in this population (17). It may be that women are particularly likely to become infected at home (and therefore be in small clusters) and that men are more likely to become infected outside the home, sometimes from outside the area (seen as unique strains) and sometimes as part of large clusters.
Figure. Geographic distribution of the 4 most common strains defined by restriction fragment length polymorphism: A) strain kps12, B) strain kps121, C) strain kps41, and D) strain kps44. Each o represents a patient. Each square is 10 km × 10 km. The background shading represents the total number of tuberculosis (TB) cases in each area during the study period, which largely reflects the population density.
We found no evidence of an association of cluster size with HIV status, although we had previously found HIV to be associated with clustering among older patients (5). The effect of HIV infection on clustering is complex since it depends both on the biologic effects of HIV (increasing the risks for active disease, perhaps to different extents for primary and postprimary disease, and decreasing infectiousness) and on any tendency for HIV and tuberculosis to affect the same subpopulations with shared risk factors.
Strain virulence was assessed by examining the proportion of patients who died: there was no association with cluster size, either overall or separately in HIV-positive and HIV-negative patients (data not shown). Virulent strains could lead to large clusters if virulence were associated with increased transmission rates or increased rates of disease after infection (3). However, virulent strains could have less opportunity to transmit if the severity of symptoms leads to early treatment or death, thus reducing the duration of the infectious period.
Evidence that strain characteristics may have contributed to cluster size comes from the finding that the spoligotypes of most of the common RFLP-defined strains in this study were identical to, or only 1 spacer different from, widespread spoligotypes already described. Unique RFLP-defined strains from smear-positive patients in the early part of the study were used as a comparison group. Smear-positive case-patients were chosen to maximize the likelihood of transmission occurring; early cases were used to allow time for secondary cases to have been identified if they had occurred. These unique strains were less likely than the large cluster strains to have spoligotypes that were closely related to widespread types, but this difference was not statistically significant, and the spoligotypes that were found in the large cluster strains were also found among the unique strains. Interestingly, strain kps121, which was the only large cluster strain with a spoligotype not closely related to a widespread previously described type, was also the 1 large cluster strain that was clearly decreasing in the Karonga population.
The finding of large cluster strains with previously described widespread spoligotypes may suggest that these strains are particularly transmissible or particularly likely to cause disease. Other possibilities are that they are older in evolutionary terms, and thus have had more time to become widespread, or that we are seeing a founder effect in some populations with subsequent spread following human migration patterns. Spoligotype 59 was common in the Malawi population in all groups of patients, clustered and unique, and was associated with a wide diversity of RFLP patterns, which suggests that it may be a longstanding strain in this area. It was also the most common spoligotype found in studies in Zimbabwe and Zambia (18,19). However, spoligotype 59 was particularly common among the isolates from large clusters, with more closely related RFLP patterns, consistent with some variants having high transmissibility. Spoligotype 59 has been classified as belonging to the Latin-American-Mediterranean lineage (18), and as part of the strain family Southern Africa Family 1 (19). The large cluster strain kps97 had a Beijing spoligotype and, in total, we have previously identified 44 patients with Beijing strains in this dataset, with 12 different RFLP patterns (14). Beijing strains have been associated with increased virulence and growth rates in vitro (20–22). That there are true differences in strain characteristics between other clustered and nonclustered strains is beginning to be established in in vitro studies from other populations (23).
Emerging Infectious Diseases • www.cdc.gov/eid • Vol. 14, No. 7, July 2008
The opinions expressed by authors contributing to this journal do not necessarily reflect the opinions of the Centers for Disease Control and Prevention or the institutions with which the authors are affiliated.