Variations in tuberculosis prevalence, Russian Federation: a multivariate approach

Abstract Objective To analyse the epidemiological trends of tuberculosis in the Siberian and Far Eastern federal districts, the areas with the highest disease burden in the Russian Federation. Methods We applied principal coordinate analysis to study a total of 68 relevant variables on tuberculosis epidemiology, prevention and control. Data on these variables were collected over 2003–2016 in all 21 regions of the Siberian federal district and Far Eastern federal district (total population: 25.5 million) through the federal and departmental reporting system. We identified the regions with a favourable or unfavourable tuberculosis epidemiological profile and ranked them as low or high priority for specific interventions. Findings The median number of tuberculosis notifications in the regions was 123.3 per 100 000 population (range: 54.5–265.7) in 2003, decreasing to 82.3 per 100 000 (range: 52.9–178.3) in 2016. We found large variations in the tuberculosis epidemiological profile across different regions. The principal coordinate analysis revealed that three aggregated indicators accounted for 55% of the variation. The first coordinate corresponded to tuberculosis prevalence and case notifications in the regions; the second to the severity of the disease among patients; and the third to the percentage of multidrug-resistant tuberculosis among tuberculosis patients. The regions where intervention was most urgently needed were Chukotka Autonomous Okrug, Jewish Autonomous Oblast and Tyva Republic. Conclusion The variability in tuberculosis epidemiology across regions was likely due to differences in the quality of antituberculosis services. Precision in defining necessary interventions, as determined through the principal coordinate analysis approach, can guide focused tuberculosis control efforts.


Introduction
During the 1990s the tuberculosis epidemic in the Russian Federation worsened due to an economic recession in the country at that time. National tuberculosis case notifications reached a peak of 90.7 per 100 000 in the population of 146.6 million in the year 2000. The robust measures to control tuberculosis taken by the government between 2000 and 2010 led to a reduction in case notifications to 48.3 per 100 000 in the population of 144.5 million by 2017. Despite these measures, the tuberculosis situation in the Asian part of the Russian Federation remained less favourable compared with the European part. For example, in the year 2017, tuberculosis case notifications in the Siberian federal district (population: 19.3 million) and Far Eastern federal district (population: 6.1 million) were as high as 83.3 and 86.1 per 100 000 respectively. [1][2][3][4] Identification of the key factors that determine the unfavourable epidemic situation in these districts is therefore a top priority for research. The evidence collected can then be used to plan focused interventions.
Tuberculosis surveillance and monitoring were established in their present form in these two federal districts in 2003 with data pm dozens of variables are collected annually by the government. We aimed to identify the variables that are closely correlated and to understand how they interact and contribute to the tuberculosis epidemic in different regions of Siberian and Far Eastern federal districts. We used multivariate principal coordinate analysis, 5 a method that allowed us to condense the complex, multidimensional data set into a small number of aggregated indicators for each region. Description of the regions' tuberculosis status in terms of these indicators would enable us to characterize and compare the epidemiology of tuberculosis among regions. The results are intended to inform the development of a unified prevention, care and control policy that addresses each region's specific needs.

Data source
The study was a retrospective epidemiological analysis covering all tuberculosis cases notified in the Asian part of the Russian Federation from 2003 to 2016.
We analysed data obtained from federal and department reports for each of the 21 regions in the Siberian and Far Eastern federal districts which constitute the Asian part of the Russian Federation. The health authority reports cover all registered patients with records of all forms of tuberculosis. A total of 68 variables of tuberculosis incidence, treatment and outcomes are collected. The method of data gathering is identical in all regions of both federal districts. Access to the epidemiological data is restricted to clinical specialists and medical statisticians. The raw data are obtained by regional tuberculosis centres from tuberculosis hospitals and outpatient departments at regional, municipal and township levels using official ministerial guidelines, and collated for analysis in Novosibirsk Tuberculosis Research Institute. The data are regularly checked for consistency and reliability by experienced research personnel at the Institute who frequently Objective To analyse the epidemiological trends of tuberculosis in the Siberian and Far Eastern federal districts, the areas with the highest disease burden in the Russian Federation. Methods We applied principal coordinate analysis to study a total of 68 relevant variables on tuberculosis epidemiology, prevention and control. Data on these variables were collected over 2003-2016 in all 21 regions of the Siberian federal district and Far Eastern federal district (total population: 25.5 million) through the federal and departmental reporting system. We identified the regions with a favourable or unfavourable tuberculosis epidemiological profile and ranked them as low or high priority for specific interventions. Findings The median number of tuberculosis notifications in the regions was 123.3 per 100 000 population (range: 54.5-265.7) in 2003, decreasing to 82.3 per 100 000 (range: 52.9-178.3) in 2016. We found large variations in the tuberculosis epidemiological profile across different regions. The principal coordinate analysis revealed that three aggregated indicators accounted for 55% of the variation. The first coordinate corresponded to tuberculosis prevalence and case notifications in the regions; the second to the severity of the disease among patients; and the third to the percentage of multidrug-resistant tuberculosis among tuberculosis patients. The regions where intervention was most urgently needed were Chukotka Autonomous Okrug, Jewish Autonomous Oblast and Tyva Republic. Conclusion The variability in tuberculosis epidemiology across regions was likely due to differences in the quality of antituberculosis services. Precision in defining necessary interventions, as determined through the principal coordinate analysis approach, can guide focused tuberculosis control efforts.

Research
Tuberculosis in the Asian part of the Russian Federation Ivan Meshkov et al.
conduct supervisory visits in the observed regions.
A detailed description of all variables collected by the federal authorities and their calculation rules and periods of data collection are available from the data repository. 6 These variables are recommended for epidemiological reporting as key factors in control of epidemic tuberculosis. 7,8 Variables cover a wide range of characteristics, including: (i) case notifications and the number of notified tuberculosis cases stratified by age and by urban or rural population; (ii) data on tuberculosis patients who were difficult to treat, including those with multidrug-resistant (MDR) tuberculosis or extensive fibrotic changes and cavities (verified by a medical board of experts reviewing difficult cases using clinical and tomography data); 9-11 (iii) coverage of tuberculosis diagnostics, including any type of examination; and (iv) outcomes of tuberculosis treatment.

Data analysis
We used principal coordinate analysis for the study. In this method, the differences between regions are represented by a small number of coordinates. The method has several advantages compared with more commonly used methods such as principal component analysis. First, principal coordinate analysis has low sensitivity to the presence of outliers in the data. Second, the method can reflect the relationships between the variables analysed regardless of whether they are linear or not. Third, it has good applicability when the number of variables (in our case the epidemiological data) is much greater than the number of observations (in our case the different regions).
We first compiled a table containing the values of all the variables, by year and region. We prepared the table for principal coordinate analysis by transforming the initial data to standardize the scatter of the analysed variables. 12,13 Without these transformations, the effect of variables with relatively small variances would contribute insufficiently to the analysis. For this purpose, we calculated the median values and interquartile ranges for all variables of the initial data and then converted the data using the formula: Where a ij is the initial value of variable j for the region i, Medj is a median value and the IQR j interquartile range of the variable j over all regions, and x ij is the final transformed value. All collected variables should be used to achieve a comprehensive description of the difference in the epidemic situation between the regions. It is possible to choose a suitable metric, then to calculate the differences (called distances) between regions and finally to construct the matrix of distances. For the present study we chose Manhattan distances and calculated the distances between the regions using the formula: where i and k are the regions between which the distance is calculated, and 1, 2, … j are the variables. We chose this metric because the values of variables for pairwise correlation of two regions were mostly identical or slightly different, but there were several variables with large differences of values. In this case, the Manhattan distance will smooth these differences.
To analyse the matrix of the Manhattan distances, information from the distance matrix is used to form a limited number of new variables (coordinates) that are the measures of (dis) similarity between regions and that explain most of the interregional variability. The regions are visualized as points on the coordinate axes of the scatterplot. Similarity between regions is represented as the simple geometric distance between points. The shorter the geometric distance, the higher the similarity.
We assessed the explanatory value of the principal coordinates using the eigenvalues of each coordinate. We divided the eigenvalues of each coordinate by the sum of the eigenvalues of the entire coordinate set. The data transformation is considered informative if two or three coordinates with relatively large eigenvalues (encompassing more than 50% of the total variance) are obtained, while for other axes the eigenvalues are small. The principal coordinate values we obtained are dimensionless quantities. In our case, it was convenient to present the administrative regions as points on a three-dimensional scatterplot.
The epidemiological meaning of a coordinate can be understood from the value of its correlation with 68 initial epidemiological variables, according to the epidemiological meanings of these variables. The Spearman correlations between variables and coordinates were calculated for each year of observation. Only strong relationships with correlation coefficients ≥ 0.6 in absolute value were considered.
We used R programming language, version 3.4.0 (R Foundation for Statistical Computing, Vienna, Austria) for statistical analysis. 14 In the analysis by region, we found that tuberculosis prevalence (

Principal coordinates
In the principal coordinate analysis, we found that the first three coordinates explained 55.1% of the variability of the initial epidemiological data across regions: the first coordinate explained 28.6% of the variability, the second explained 14.1% and the third explained 12.4%.
Tuberculosis in the Asian part of the Russian Federation Ivan Meshkov et al.

Epidemiological favourability
The first coordinate correlated inversely with 13 variables, including: tuberculosis case notification by chest X-ray screening examination per 1000 people; tuberculosis case notification in the total population, in rural and in urban populations (as percentages of the total population) and among adolescents aged 15-17 years; tuberculosis prevalence per 100 000 people among adolescents and adults; and tuberculosis relapse and mortality. An inverse relationship was also found between this coordinate and the prevalence of tuberculosis with extensive fibrotic changes and cavities; and the proportion of patients with that type of tuberculosis among all tuberculosis patients. There was a positive correlation with two variables: the percentage of respiratory tuberculosis patients with bacteriological conversion; and the proportion of closure of cavities among tuberculosis patients. The Spearman correlation coefficients and P-values of all variables correlated with the first coordinate are available from the data repository. 15 The first coordinate could be interpreted as an aggregated indicator of interregional variations of epidemiological tuberculosis transmission and severity, and we called it the epidemiological favourability coordinate. The high values of this coordinate reflect both a low level of tuberculosis transmission and a high efficiency of antituberculosis measures. The higher the proportion of cured patients, the lower the relapse rate and the prevalence of tuberculosis.

Severe tuberculosis
The second coordinate correlated negatively with six variables, including: tuberculosis case notifications in the total population and in urban areas; the total number of people registered for treatment; the number of followed patients co-infected with HIV and tuberculosis; tuberculosis-related case fatality and mortality; and the proportion of tuberculosis patients who died outside hospitals in 2006, 2011 and 2014. There was a positive correlation of this coordinate with one variable: the density of tuberculosis specialists in the region. The Spearman correlation coefficients of all variables correlated with the second coordinate are available from the data repository. 16 The correlations indicate that the regions characterized by high values of the second coordinate were those with a relatively low prevalence of severe tuberculosis. Moreover, these relationships probably indicate that a high concentration of tuberculosis specialists in a region increased the cure rate and lowered tuberculosis related mortality. The second coordinate therefore reflects the interregional variations in the efficacy of interven-tions in preventing severe tuberculosis and in reducing mortality and could be called the control of severe tuberculosis coordinate. This coordinate correlated inversely with tuberculosis case notifications in urban populations, but no relationship was found in rural populations. This finding can probably be explained by the higher availability of antituberculosis services and the higher number of tuberculosis specialists in the cities.

MDR tuberculosis
The third coordinate correlated negatively with seven variables, including: the proportions of MDR tuberculosis cases among tuberculosis patients; MDR tuberculosis case notifications; and the percentage of cavity closures in newly diagnosed patients. This coordinate was therefore associated with the control of MDR tuberculosis. The detailed list of variables correlated with the third coordinate is available from the data repository. 17 The data show that the more patients newly registered for treatment, the higher the proportion of diagnosed cases with MDR tuberculosis. Between the years 2005 and 2010, the tuberculosis epidemic was particularly severe in certain regions, which prompted government-supported interventions. The measures included additional financing, strengthening of tuberculosis services, more training courses for tuberculosis specialists, an increased number of radiological examinations and an increase in the number of tuberculosis surgeries. These actions may have led to an increase in the number of diagnosed bacteriological conversions with a subsequent decrease in transmission and, ultimately, tuberculosis notifications and mortality after 2010. However, we cannot exclude that during these interventions some patients were still treated inadequately, leading to low cure rates and an increased number of patients with MDR tuberculosis. 18

Geographical variations
The values of the three coordinates for all regions are shown on a map of the Russian Federation (Fig. 6) and threedimensional scatter plot ( Fig. 7   had the highest prevalence of HIV and tuberculosis co-infection between 2010 and 2015 (available from the data repository). 23 The Jewish Autonomous Oblast had high tuberculosis mortality from 2004 (available from the data repository). 24

Discussion
We found large variations in the epidemiology of tuberculosis across different regions of the Asian part of the Russian Federation. Using principal coordinate analysis, we could identify regions with a more favourable tuberculosis epidemiological and control situation. The more favourable epidemiological profile of tuberculosis in Tomsk Oblast may be attributed to the drug-resistant tuberculosis programme, which was implemented only in this region. The programme was financed by the Global Fund to Fight AIDS, Tuberculosis and Malaria according to WHO recommendations, implemented between 2000 and 2013. The programme has been recognized as one of the most successful projects to manage MDR tuberculosis in the world. 25 We could also rank regions as high or low priority for specific interventions. The heterogeneity observed can be explained mainly by differing levels of antituberculosis services in different regions. Each region in this part of Russian Federation has its own specific characteristics and needs specific interventions to improve the situation. The regions where intervention is most urgently needed were Chukotka Autonomous Okrug, Jewish Autonomous Oblast and Tyva Republic. Although tuberculosis prevalence, notification and mortality rates have decreased in Tyva Republic over the past 14 years, the general situation of tuberculosis control in this region was unfavourable compared with other regions. In Chukotka Autonomous Okrug during 2011-2016, tuberculosis case notifications in the rural population were 3.0-5.7 times higher than in the urban area, which consists only of Anadyr town. In rural areas this variable had a range of 226.7-373.1 per 100 000 population from 2011 to 2016 versus 63.6-86.2 per 100 000 population in the urban area. In rural areas, where most of the population is of indigenous ethnicity, medical support is difficult to access. Patients from indigenous populations tend to seek medical care late during the disease and often after developing advanced tuberculosis lesions. These patients often then become a source of transmission within the community. Factors promoting a steady increase of tuberculosis notifications include the crowded living conditions in some traditional houses (yarangas) and insufficient access to health care. 26 The high tuberculosis mortality and case fatality in Jewish Autonomous Oblast can be explained by the presence of the Federal Penitentiary Service interregional tuberculosis hospital. This hospital serves tuberculosis patients from prisons located not only in this region, but also from neighbouring regions. All deaths in this hospital are of prisoners, thereby increasing the overall regional tuberculosis mortality by 15-20%. 27 We found low values on the coordinate associated with control of severe tuberculosis in Irkutsk Oblast and Jewish Autonomous Oblast. These data demonstrate the need to improve tuberculosis detection and treatment, to address the issue of HIV and tuberculosis co-infection in Irkutsk Oblast and to decrease tuberculosis deaths in the prison hospital in Jewish Autonomous Oblast.
The high values of MDR tuberculosis control in Chukotka Autonomous Okrug and Jewish Autonomous Oblast do not imply a low burden of MDR tuberculosis, but rather reflect a lack of capacity for diagnosing these patients.
These were the only regions in which tuberculosis facilities were not equipped with the Xpert® Mycobacterium tuberculosis/rifampicin (MTB/RIF) rapid automated nucleic acid amplification assay (Cepheid, Sunnyvale, United States of America) until the year 2016. The Xpert MTB/RIF assay was gradually implemented in most regions during the years 2013-2016, which could potentially bias the data on variables related to MDR tuberculosis. However, the proportion of multidrug resistance among all patients with tuberculosis increased during this period in only two regions (Irkutsk Oblast and Altai Krai) and thus the risk of bias is unlikely. The low rates of reported MDR tuberculosis before the implementation of the Xpert MTB/RIF assay in Irkutsk Oblast and Altai Krai are likely explained by the previous poor rates of detecting MDR tuberculosis.
The high levels of MDR tuberculosis in Sakha Republic and Tomsk Oblast can be attributed to the high quality of tuberculosis diagnostic services in these regions. In contrast, in Tyva Republic, high numbers of multidrug-resistant infections may be explained by inadequate treatment and compliance, and inadequately qualified medical personnel. 28,29 If the quality of diagnostic services in Tyva were improved, the multidrug-resistance levels would likely be even higher. Indeed, in the years 2011-2016 the proportion of smear-positive patients  This study has some limitations. We included variables in the analysis that were measured longitudinally, but each value was considered independently; we could therefore not describe the time trends of the variables using principal coordinate analysis. Since we did not have data at the individual level, correlations between variables could also be marginally affected by some unmeasured confounders. As this work focused on a large geographical area with a high burden of tuberculosis, specific population subgroups, such as marginalized communities or inhabitants from hard-to-reach rural areas, may not be completely covered by the analysis. However, the proportion of these people in the total population is small and we believe that the potential bias is negligible.
The overall burden of tuberculosis in the Asian part of the Russian Federation is still serious. The quality of diagnosis and treatment, and the availability of qualified medical personnel operating in properly equipped facilities are the most important factors that can improve individual care and population control of tuberculosis. Precision in defining and implementing the necessary interventions, as defined through the principal coordinate analysis approach, can guide efforts to control tuberculosis in this vast territory. ■ Research Tuberculosis in the Asian part of the Russian Federation Ivan Meshkov et al.

Variaciones en la prevalencia de la tuberculosis, Federación de Rusia: un enfoque multivariado
Objetivo Analizar las tendencias epidemiológicas de la tuberculosis en los distritos federales de Siberia y Extremo Oriente, las zonas con mayor carga de morbilidad de la Federación de Rusia. Métodos Se aplicó el análisis de coordenadas principales para estudiar un total de 68 variables relevantes en epidemiología, prevención y control de la tuberculosis. Los datos sobre estas variables se recopilaron entre 2003 y 2016 en las 21 regiones del distrito federal de Siberia y del distrito federal del Extremo Oriente (población total: 25,5 millones) a través del sistema de presentación de informes federal y departamental. Se identificaron las regiones con un perfil epidemiológico favorable o desfavorable de la tuberculosis y se clasificaron como de baja o alta prioridad para intervenciones específicas.