Global Origin of Mycobacterium tuberculosis in the Midlands, UK

DNA fingerprinting data for 4,207 Mycobacterium tuberculosis isolates were combined with data from a name-based computer program (Origins). The largest population groups were from England (n = 1,031) and India (n = 912), and the most prevalent lineages were Euro-American (45%) and East African–Indian (34%). Combining geographic and molecular data can enhance cluster investigation.

Knowledge and understanding of the transmission dynamics of Mycobacterium tuberculosis have been improved by the development of rapid molecular techniques that are being applied more extensively (1)(2)(3). Globally, molecular techniques have identified major M. tuberculosis lineages associated with geographic origin (4)(5)(6)(7). Previous studies of M. tuberculosis transmission dynamics have usually analyzed patient-declared population groups to identify associations (1,7).
We describe novel software (Origins; Experian, Nottingham, UK) that assigns cultural, ethnic, and linguistic (CEL) groups on the basis of given and family names. Records from 12 countries containing 1,600,000 family names and 600,000 given names were analyzed to construct >200 origin types based on CEL factors associated with these names. This approach is applicable worldwide and is more accurate and has better coverage than comparable software (8). The first use of Origins in health care was to study attendance at emergency departments in the United Kingdom by a European CEL group (9).
The aim of this study was to combine mycobacterial fingerprinting data with patient origin as assigned by Origins to relate the occurrence of major global M. tuberculosis lineages to populations originating from around the world. Combining universal typing data with the cultural and social links identified by Origins could provide a deeper understanding of why prevalent strains are distributed in specific population groups.

The Study
Nonduplicate initial M. tuberculosis complex isolates (n = 4,207) were referred to our center from the Midlands region of the United Kingdom (population 9.5 million) during January 2004-December 2007. These isolates were incubated, identified, and analyzed by mycobacterial interspersed repetitive units containing variable numbers of tandem repeats (MIRU-VNTR) typing (10). MIRU-VNTR typing determines the number of copies of repetitive DNA sequences at multiple independent genetic loci. The resulting profiles were compared with those in an online database (MIRU-VNTRplus) developed by Allix-Béguec et al. (11), which was used to assign each M. tuberculosis strain to 1 of 6 lineages: East African-Indian, East Asian, Euro-American, Indo-Oceanic, West African-1, or West African-2.
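The profile comparison step can be illustrated with a minimal Python sketch. It assumes that each 15-locus profile is written as one decimal digit per locus (as in the profiles reported in this study) and assigns a query profile to the lineage of the reference profile with the fewest differing loci. This nearest-neighbor rule is a simplified stand-in, not the method MIRU-VNTRplus uses; the function names are hypothetical, and the two reference profiles are simply the ones reported in this study with the lineages assigned to them.

```python
def parse_profile(profile: str, n_loci: int = 15) -> list[int]:
    """Convert a profile string such as '424352332515333' into repeat counts.

    Assumes one decimal digit per locus (copy numbers 0-9 only).
    """
    if len(profile) != n_loci or not profile.isdigit():
        raise ValueError(f"expected {n_loci} digits, got {profile!r}")
    return [int(ch) for ch in profile]


def locus_differences(a: str, b: str) -> int:
    """Count loci at which two profiles carry different repeat numbers."""
    return sum(x != y for x, y in zip(parse_profile(a), parse_profile(b)))


def assign_lineage(query: str, references: dict[str, str]) -> tuple[str, int]:
    """Return (lineage, distance) for the reference profile closest to query.

    `references` maps profile strings to lineage names; ties go to the
    first reference encountered.
    """
    best = min(references, key=lambda ref: locus_differences(query, ref))
    return references[best], locus_differences(query, best)


# Reference profiles: the two profiles reported in this study, labeled with
# the lineages they were matched to.
REFERENCES = {
    "424352332515333": "East Asian",
    "422352542517333": "East African-Indian",
}

if __name__ == "__main__":
    print(locus_differences("424352332515333", "422352542517333"))  # 4
    # Hypothetical query that differs from the East Asian profile at 1 locus.
    print(assign_lineage("424352332515334", REFERENCES))  # ('East Asian', 1)
```

Counting differing loci weights every locus equally; a fuller approach might weight loci by variability or compare against many reference strains per lineage.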
The given and family names of all 4,207 patients were entered into Origins to obtain a CEL group for each patient; each CEL group was then assigned to a continent according to the United Nations Standard Country and Area Codes Classification Scheme (12). Origins can assign a CEL group whenever the given and family names are present in a dataset.
The predominant CEL groups from each continent in the study population were England in Europe (1,031 patients, 25%), India in Asia (912, 22%), and Somalia in Africa (130, 3%) (Table 1). Using the 15 MIRU-VNTR loci, we matched 4,117 (98%) of the 4,207 typed strains to strains in the MIRU-VNTRplus database. The 90 strains that did not match 1 of the 6 major global lineages were either M. bovis (24 strains) or could not be definitively assigned to 1 of the 6 global lineages (66 strains). The continental and regional origins of patients as assigned by Origins were then combined with global lineage to identify the distribution of global M. tuberculosis lineages within each population (Table 2).
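Combining Origins-derived continent with MIRU-VNTR lineage amounts to a cross-tabulation followed by picking the top lineage for each continent. The Python sketch below assumes a simple (CEL group, lineage) record format, a hypothetical three-entry CEL-to-continent lookup standing in for the United Nations scheme, and toy records; it does not reproduce Table 2.

```python
from collections import Counter, defaultdict

# Hypothetical lookup from Origins CEL group to continent; in the study this
# step used the UN Standard Country and Area Codes Classification Scheme.
CEL_TO_CONTINENT = {"England": "Europe", "India": "Asia", "Somalia": "Africa"}


def lineage_by_continent(records):
    """Tally lineages per continent from (cel_group, lineage) pairs.

    Isolates with lineage None (not assignable) are skipped; CEL groups
    missing from the lookup are counted under "Unknown".
    """
    table = defaultdict(Counter)
    for cel_group, lineage in records:
        if lineage is None:
            continue
        continent = CEL_TO_CONTINENT.get(cel_group, "Unknown")
        table[continent][lineage] += 1
    return table


def most_prevalent(table):
    """Map each continent to (top lineage, count, share of that continent)."""
    summary = {}
    for continent, counts in table.items():
        lineage, count = counts.most_common(1)[0]
        summary[continent] = (lineage, count, count / sum(counts.values()))
    return summary


# Toy records for illustration only (not study data).
TOY_RECORDS = [
    ("England", "Euro-American"),
    ("England", "Euro-American"),
    ("England", "East Asian"),
    ("India", "East African-Indian"),
    ("India", "East African-Indian"),
    ("India", "Euro-American"),
    ("Somalia", "Euro-American"),
    ("India", None),
]

if __name__ == "__main__":
    for continent, row in most_prevalent(lineage_by_continent(TOY_RECORDS)).items():
        print(continent, row)
```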
The Euro-American lineage was the most prevalent lineage in our study; it comprised 1,894 (45%) strains and was present in every continental population group. It was the most prevalent lineage in patients originating from Africa (125 strains), the Americas (11), and Europe (1,072) and the second most prevalent lineage in patients originating from Asia (663). The most prevalent M. tuberculosis lineage in patients originating from Asia was the East African-Indian lineage (1,150 strains).
Combining geographic data assigned by Origins with DNA fingerprinting data could support public health efforts to control tuberculosis because this approach can identify strains in CEL groups in which those global M. tuberculosis lineages are not usually found. The MIRU-VNTR profile 424352332515333 (East Asian lineage) was identified in 23 patients from the Midlands, 20 of whom resided within a 5-mile radius of one another. Within this geographically restricted cluster, 12 (60%) patients were assigned to the Europe CEL group and 8 (40%) to the Asia CEL group. The first strain was identified in 2004, and further strains were identified in each year of this study.
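One way to check whether a cluster is geographically restricted is to measure great-circle (haversine) distances from a central point. The Python sketch below assumes that patient residences are available as latitude/longitude pairs and uses the coordinate-wise median as the cluster center; both choices are illustrative assumptions, as are the toy coordinates and CEL labels. It then reports the share of one CEL group among patients inside the radius, analogous to the 12 of 20 patients described above.

```python
import math
from statistics import median

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def within_radius(points, radius_miles=5.0):
    """Indices of points within radius_miles of the coordinate-wise median.

    A median center (an assumption) keeps a few distant cases from pulling
    the center away from the main group.
    """
    center = (median(p[0] for p in points), median(p[1] for p in points))
    return [i for i, p in enumerate(points)
            if haversine_miles(p, center) <= radius_miles]


def cel_share(cel_groups, indices, group):
    """Fraction of the selected patients assigned to the given CEL group."""
    selected = [cel_groups[i] for i in indices]
    return selected.count(group) / len(selected) if selected else 0.0


# Toy coordinates and CEL labels for illustration only (not study data).
POINTS = [(52.48, -1.89), (52.49, -1.89), (52.47, -1.89), (52.48, -1.88), (53.48, -1.89)]
CEL_GROUPS = ["Europe", "Europe", "Asia", "Europe", "Asia"]

if __name__ == "__main__":
    inside = within_radius(POINTS)
    print(inside)  # [0, 1, 2, 3]
    print(cel_share(CEL_GROUPS, inside, "Europe"))  # 0.75
```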
The MIRU-VNTR profile 422352542517333, which matched the East African-Indian lineage, was identified in 102 patients during 2004-2007; 98 (96%) of these patients originated from Asia and 4 (4%) from Europe. Patients with this strain lived at various locations throughout the Midlands, all within a radius of ≈40 miles.

Conclusions
We studied >4,000 M. tuberculosis isolates typed in the United Kingdom. Our study demonstrates that combining molecular data with population group data provided by novel software can yield information about the molecular epidemiology of M. tuberculosis. The 2 example MIRU-VNTR profiles show that combined molecular and social data identified an East Asian strain in an unsuspected CEL group (Europe) and limited transmission of an East African-Indian strain between CEL groups. The geographic restriction of East Asian strain 424352332515333 in the European CEL group indicated possible recent transmission within this population group. East African-Indian strain 422352542517333 infected a large number of patients (102) and was geographically widespread, with limited transmission into the European CEL group (4/102 patients). This finding suggests that this strain is widely distributed in southern Asia and has rarely been transmitted between CEL groups; its wide distribution in the United Kingdom reflects the areas of residence of this CEL group.
Data from our study support previous findings and extend the dataset for Europe. Our results also include a large number of strains from southern Asia, which were underrepresented in other studies (7,13).
Origins identified CEL groups within a country (e.g., Kashmir in Pakistan or northern India) and divided Great Britain and Ireland into 4 CEL groups (Table 1). This finer differentiation could be useful in future population-based studies because migration patterns may be localized to specific areas within countries, and common social networks could be identified. CEL groups can be assigned to any dataset in which patients' names are known, whereas traditional epidemiologic identification of ethnic groups requires a questionnaire; however, if patient names are not in a dataset, CEL groups cannot be assigned. Origins showed some discrepancies; for example, persons in the black Caribbean CEL group usually have British names and will be assigned to a British CEL group (8). However, Origins is most useful when applied to diverse populations.
Many countries now routinely type M. tuberculosis isolates by using MIRU-VNTR typing, an analysis that identifies clusters of strain types across place and time. By using Origins software to identify CEL groups, public health officials can identify and investigate possible cultural links in the transmission of M. tuberculosis.
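The cluster investigation described here can be sketched in Python by grouping isolates that share an identical MIRU-VNTR profile and tallying the CEL groups within each cluster. Defining a cluster as 2 or more isolates with identical profiles, and flagging clusters that span more than 1 CEL group as candidates for investigating cross-group links, are assumptions for illustration; the isolate identifiers, CEL labels, and the singleton profile are hypothetical.

```python
from collections import Counter, defaultdict


def find_clusters(isolates, min_size=2):
    """Group isolates that share an identical MIRU-VNTR profile.

    `isolates` is an iterable of (isolate_id, profile, cel_group) tuples.
    Returns {profile: Counter of CEL groups} for profiles seen in at least
    `min_size` isolates; singletons are not treated as clusters.
    """
    by_profile = defaultdict(Counter)
    for _isolate_id, profile, cel_group in isolates:
        by_profile[profile][cel_group] += 1
    return {profile: counts for profile, counts in by_profile.items()
            if sum(counts.values()) >= min_size}


def mixed_clusters(clusters):
    """Profiles whose cluster spans more than one CEL group."""
    return sorted(profile for profile, counts in clusters.items() if len(counts) > 1)


# Toy isolates: hypothetical IDs and CEL labels; the first two profiles are the
# ones reported in the study, the third is a hypothetical singleton.
TOY_ISOLATES = [
    ("iso1", "424352332515333", "Europe"),
    ("iso2", "424352332515333", "Asia"),
    ("iso3", "422352542517333", "Asia"),
    ("iso4", "422352542517333", "Asia"),
    ("iso5", "424352332515334", "Europe"),
]

if __name__ == "__main__":
    clusters = find_clusters(TOY_ISOLATES)
    for profile, counts in clusters.items():
        print(profile, dict(counts))
    print(mixed_clusters(clusters))  # ['424352332515333']
```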