Misclassification error and performance of individual, household, community and country risk factors for malaria infection among sub-Saharan children under five

Abstract
Very often, health workers in sub-Saharan countries face a lack of diagnostic materials, while the need is enormous because of several endemic problems that most often affect children. Malaria is one of the greatest public health problems in sub-Saharan African countries, where 114 million people were infected in 2015, with approximately 400,000 deaths. Only 76% of persons suspected of having malaria received a malaria diagnostic test in the public sector. The objective of our study is to analyze the classification error and prediction performance of individual, household, community and country factors as diagnostic indicators for malaria parasite infection, taking into account the endemicity of household clusters. A total of 61,292 children from 16 African countries, drawn from Demographic and Health Surveys (DHS) and Malaria Indicator Surveys (MIS), were included in the analysis. Household and country factors were the most sensitive tools for malaria diagnosis among sub-Saharan African children, with sensitivities of more than 90% in low endemicity areas.


PUBLIC INTEREST STATEMENT
Very often, health workers in sub-Saharan countries are faced with a lack of diagnostic materials, although the need for these materials is enormous because of several endemic problems that most often affect children. Malaria is one of the greatest public health problems in sub-Saharan African countries. The objective of our study is to analyze the classification error and prediction performance of individual, household, community and country factors as diagnostic indicators for malaria parasite infection, taking into account the endemicity of household clusters. Misclassification was evaluated to determine which class (infected or non-infected children) is misclassified more often, according to the group of factors considered. The percentage of classification error gives the false positive and false negative rates. ROC (receiver operating characteristic) curves were used to estimate the sensitivity and specificity of these factors in the prediction of malaria parasite infection.

Introduction
Very often, health workers in sub-Saharan countries are faced with a lack of diagnostic materials, although the need for these materials is enormous because of several endemic problems that most often affect children. Malaria is one of the greatest public health problems in sub-Saharan African countries: 114 million people were infected in 2015, with approximately 400,000 deaths. Only 76% of persons suspected of having malaria received a malaria diagnostic test in the public sector (World Health Organization [WHO], 2016a, 2016b). In sub-Saharan African countries, diagnostic tests such as microscopy, rapid diagnostic tests (RDT) and polymerase chain reaction (PCR) are usually used to detect whether an individual is infected with the malaria parasite. Locally, however, in some sub-Saharan areas, health workers must rely on other tools to investigate malaria parasite infection. This is due to a lack of materials, the distance to the nearest laboratory or health center, or the absence of locally trained health workers with expertise in malaria diagnostic testing (Mharakurwa, Simoloka, Thuma, Shiff, & Sullivan, 2006; Molla, 2016; United Nations Children's Fund [UNICEF], 2016; World Health Organization [WHO], 2015). Although several diseases cause fever, a history of fever in children is often treated as malaria parasite infection, which sometimes leads to misdiagnosis. Individual, household, community and country factors can be examined and used to support malaria diagnostic testing, enabling treatment to be administered as quickly as possible. Malaria risk factors could also be used to supplement diagnostic materials, reducing the likelihood of false positives or false negatives.
The sensitivity and specificity of a diagnostic test, which determine its performance, are rarely equal to 100% and vary according to both the characteristics of the surveyed population and external factors, such as age, sampling season, the presence of cross-reacting diseases, malaria species and previous treatment (Fontela et al., 2009; Hartnack, Nathues, Nathues, Grosse Beilage, & Lewis, 2014).
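For reference, the quantities reported in the Results (Se, Sp, PV+ and PV−) can be written in terms of the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), taking the RDT result as the reference standard, as this study does. These are the standard textbook definitions rather than formulas specific to this study:

```latex
\begin{aligned}
\mathrm{Se} &= \frac{TP}{TP + FN}, & \mathrm{Sp} &= \frac{TN}{TN + FP},\\[4pt]
\mathrm{PV}^{+} &= \frac{TP}{TP + FP}, & \mathrm{PV}^{-} &= \frac{TN}{TN + FN}.
\end{aligned}
```

Se and Sp are computed within the infected and non-infected groups respectively, whereas PV+ and PV− mix both groups and therefore also depend on how common infection is in the population tested.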
Because Mfueni found individual, household, community and country factors associated with malaria risk, we hypothesized that these risk factors could be used to predict malaria infection in sub-Saharan African countries, taking into account the endemicity level of the area (Mfueni, 2016).
The objective of our study is to analyze the classification error and prediction performance of individual, household, community and country factors as diagnostic indicators for malaria parasite infection, taking into account the endemicity of household clusters. Endemicity matters here because Heidi et al. demonstrated that the performance of diagnostic tools is strongly influenced by the level of area endemicity (Heidi et al., 2008).

Data
A total of 61,292 children from 16 sub-Saharan African countries, drawn from Malaria Indicator Surveys (MIS) and Demographic and Health Surveys (DHS), were included in the analysis. MIS and DHS surveys are cross-sectional, nationally representative studies carried out in developing countries using a two-stage sampling design. In the first stage, each country was divided into small geographic areas (clusters), and in each cluster three strata were created: towns, cities, and rural or urban areas. In the second stage, households were selected. Children whose guardians gave informed consent were tested for malaria and included in the survey data.
Geographic references of clusters were collected during the survey with a GPS device, or from paper maps where GPS data were unavailable. To preserve the confidentiality of surveyed households, urban clusters were randomly displaced by 0-2 km and rural clusters by 0-5 km. Results from rapid diagnostic tests (RDTs) for malaria were used for the analysis; the type of RDT varied according to the country surveyed. We did not use microscopy results because of the many problems with this type of testing in some national surveys, such as a lack of skilled microscopists and poor staining of slides due to transportation and storage (ICF International [ICF], 2013).
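To see what the displacement means for the location-based variables, the Python sketch below shifts a cluster point by a random distance (up to 2 km for urban or 5 km for rural clusters) in a random direction. It is an illustration only: drawing the distance and bearing uniformly and using a flat-Earth approximation are assumptions, not a reproduction of the DHS procedure, and the coordinates are arbitrary.

```python
import math
import random

# Kilometres per degree of latitude on a spherical Earth (radius 6371 km).
KM_PER_DEG_LAT = math.pi * 6371.0 / 180.0


def displace(lat, lon, max_km, rng):
    """Shift a point by a random distance in [0, max_km] in a random direction.

    Assumption: uniform distance and bearing, with a local flat-Earth
    approximation that is adequate for offsets of a few kilometres.
    """
    distance_km = rng.uniform(0.0, max_km)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    north_km = distance_km * math.cos(bearing)
    east_km = distance_km * math.sin(bearing)
    new_lat = lat + north_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of latitude.
    new_lon = lon + east_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return new_lat, new_lon


rng = random.Random(2016)  # seeded so the example is reproducible
urban_point = displace(-4.32, 15.31, max_km=2.0, rng=rng)  # urban: up to 2 km
rural_point = displace(-4.32, 15.31, max_km=5.0, rng=rng)  # rural: up to 5 km
print(urban_point, rural_point)
```

Because a displaced rural cluster can move up to 5 km, a household that is actually near a river may fall outside a 5 km buffer (and vice versa), which is one way the displacement can blur proximity variables.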
The analysis focuses on African countries with surveys conducted between 2010 and 2015 that included malaria testing and GPS (Global Positioning System) data. For each country, we merged child data, household data and GPS data. Children under five in 16 sub-Saharan countries were included in this study (Angola, Benin, Burkina Faso, Burundi, DR Congo, Ivory Coast, Liberia, Madagascar, Malawi, Mali, Mozambique, Nigeria, Rwanda, Senegal, Tanzania, Uganda) (Demographic and Health Surveys [DHS], 2010-2015). The longitude and latitude of each cluster were used to extract data on temperature, precipitation and population density, and to determine whether households lay close to cities where conflicts were taking place (within 100 km) and to rivers or bodies of water (within 5 km).
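The proximity variables can be computed from cluster coordinates with a great-circle distance. The study did this in QGIS; the Python sketch below only illustrates the idea, using the haversine formula on a spherical Earth. The function names are hypothetical, and rivers and conflict cities are simplified to lists of points (a river would in practice be a line, approximated here by sampled vertices).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(cluster, features, radius_km):
    """True if any feature point lies within radius_km of the cluster point."""
    lat, lon = cluster
    return any(haversine_km(lat, lon, f_lat, f_lon) <= radius_km for f_lat, f_lon in features)


def proximity_flags(cluster, conflict_cities, water_points):
    """Binary proximity variables using the thresholds given in the text."""
    return {
        "near_conflict": within(cluster, conflict_cities, 100.0),
        "near_water": within(cluster, water_points, 5.0),
    }


# Hypothetical cluster at (0, 0): a conflict city ~55.6 km away, water ~11.1 km away.
print(proximity_flags((0.0, 0.0), [(0.0, 0.5)], [(0.0, 0.1)]))
```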
Temperature and rainfall data were extracted from the WorldClim website (a set of free global climate layers) for the recent period, at a resolution of 2.5 arcminutes. We used the mean temperature and mean precipitation for the month(s) during which the survey was conducted in each country (WorldClim - Global Climate Data: Free climate data for ecological modeling and GIS, http://www.worldclim.org/).
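Extracting a climate value at a cluster amounts to finding the raster cell that contains the cluster's coordinates in each monthly layer and averaging over the survey month(s). The study used QGIS for this step; the Python sketch below shows the arithmetic for a north-up grid, assuming the grid's top-left corner and cell size are known (for real WorldClim layers these must be read from the file's header, and the default global extent here is an assumption).

```python
import math

RES_DEG = 2.5 / 60.0  # 2.5 arc-minutes in decimal degrees


def cell_index(lat, lon, west=-180.0, north=90.0, res=RES_DEG):
    """Row and column of the cell containing (lat, lon) in a north-up grid
    whose top-left corner is at (north, west)."""
    row = math.floor((north - lat) / res)
    col = math.floor((lon - west) / res)
    return row, col


def survey_period_mean(monthly_grids, survey_months, lat, lon, **grid):
    """Average the cell values over the month(s) in which the survey ran.

    monthly_grids maps a month number (1-12) to a 2-D list of cell values.
    """
    row, col = cell_index(lat, lon, **grid)
    values = [monthly_grids[m][row][col] for m in survey_months]
    return sum(values) / len(values)


# Toy 2 x 2 grid with 1-degree cells whose top-left corner is (2N, 0E).
grids = {1: [[10, 20], [30, 40]], 2: [[12, 22], [32, 42]]}
print(survey_period_mean(grids, [1, 2], lat=0.5, lon=1.5, west=0.0, north=2.0, res=1.0))  # 41.0
```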
Population density data come from the Gridded Population of the World, version 3, produced by the Center for International Earth Science Information Network (CIESIN), Columbia University, and Centro Internacional de Agricultura Tropical (CIAT), at a resolution of 2.5 arcminutes, in units of 'persons per square kilometre' (Center for International Earth Science

Statistical methods
IBM SPSS Statistics version 20.0 was used to merge data, to create the cluster-level malaria prevalence variable and to estimate the number of individuals in each subgroup. Environmental data were extracted with QGIS 2.12.0-Lyon in the EPSG:4326 (WGS 84) coordinate reference system. R 3.3.2 was used to evaluate misclassification error and to assess the performance of malaria risk factors. SAGA GIS 2.1.2 (System for Automated Geoscientific Analyses) was used to map malaria endemicity by household cluster.
Four models were investigated in this study: a model without consideration of endemicity level, and models for low endemicity clusters (malaria prevalence ≤5%), intermediate endemicity clusters (malaria prevalence 5-40%) and high endemicity clusters (malaria prevalence >40%). Each model included all variables (individual, household, community and country factors). Error plots were produced for each model to assess accuracy through the out-of-bag (OOB) estimate of error rate; a lower OOB error rate indicates a more accurate model. Misclassification was evaluated to determine which class (infected or non-infected children) is misclassified more often, according to the group of factors considered. The percentage of classification error gives the false positive and false negative rates when individual, household, community and country factors are used as a diagnostic tool for predicting malaria parasite infection. ROC (receiver operating characteristic) curves were used to estimate the sensitivity and specificity of these factors in predicting malaria parasite infection (ROC curves are provided in the appendices). Error rates and factor performance were investigated using random forests, as implemented in the randomForest package (Breiman et al., 2015).
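The split into endemicity models rests on two simple steps: computing the proportion of RDT-positive children in each cluster and assigning each cluster to a level using the cut-offs stated above. The study did the first step in SPSS; the Python sketch below assumes one record per tested child holding a cluster identifier and a boolean RDT result, and computes an unweighted prevalence.

```python
from collections import defaultdict


def cluster_prevalence(children):
    """children: iterable of (cluster_id, rdt_positive). Returns % RDT-positive per cluster."""
    tested = defaultdict(int)
    positive = defaultdict(int)
    for cluster_id, rdt_positive in children:
        tested[cluster_id] += 1
        positive[cluster_id] += int(rdt_positive)
    return {c: 100.0 * positive[c] / tested[c] for c in tested}


def endemicity_level(prevalence_pct):
    """Low: <=5%; intermediate: above 5% up to 40%; high: >40%."""
    if prevalence_pct <= 5.0:
        return "low"
    if prevalence_pct <= 40.0:
        return "intermediate"
    return "high"


# Hypothetical clusters: A (1 of 20 positive), B (2 of 5), C (3 of 5).
children = ([("A", True)] + [("A", False)] * 19
            + [("B", True)] * 2 + [("B", False)] * 3
            + [("C", True)] * 3 + [("C", False)] * 2)
levels = {c: endemicity_level(p) for c, p in cluster_prevalence(children).items()}
print(levels)  # {'A': 'low', 'B': 'intermediate', 'C': 'high'}
```

The boundary values are handled so that a prevalence of exactly 5% is low and exactly 40% is intermediate, matching "≤5" and ">40" in the model definitions.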
To map endemicity by household cluster, we converted the cluster points to a grid, using the endemicity level of each cluster as the attribute variable, the mean of the points falling in the same cell to resolve multiple values, and a 2-byte data type for the output grid.
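Gridding points with "mean" aggregation means that each output cell takes the average attribute value of all cluster points inside it. The map itself was produced in SAGA GIS; the Python sketch below only illustrates the aggregation rule, assuming a hypothetical numeric coding of endemicity (1 = low, 2 = intermediate, 3 = high) and a square cell size in degrees. It keeps cell means as floats, whereas an integer output grid would store rounded or scaled values.

```python
import math
from collections import defaultdict


def grid_mean(points, cell_size_deg):
    """Rasterize (lon, lat, value) points: each occupied cell gets the mean
    value of the points falling inside it. Keys are (lat_index, lon_index)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, value in points:
        key = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical clusters coded 1 (low), 3 (high) and 2 (intermediate).
points = [(0.2, 0.2, 1), (0.4, 0.3, 3), (1.5, 0.5, 2)]
print(grid_mean(points, cell_size_deg=1.0))  # {(0, 0): 2.0, (0, 1): 2.0}
```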
DHS data were weighted to account for the complex sample design when computing subgroup numbers for household factors.
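A weighted subgroup number is the sum of the sample weights of the records in that subgroup rather than a raw count. The Python sketch below assumes the DHS recode convention of storing weights as integers scaled by 1,000,000 (for example `hv005` in household-level files); the records and the `wealth` field are hypothetical.

```python
from collections import defaultdict

# Assumption: DHS recode files store sample weights as integers scaled by
# 1,000,000 (e.g. hv005), so they are divided back before use.
WEIGHT_SCALE = 1_000_000


def weighted_counts(records, category_key, weight_key="hv005"):
    """Sum of normalized sample weights per category (a weighted subgroup count)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[category_key]] += record[weight_key] / WEIGHT_SCALE
    return dict(totals)


# Hypothetical records: "wealth" stands in for any household factor.
records = [
    {"wealth": "poorest", "hv005": 1_500_000},
    {"wealth": "poorest", "hv005": 500_000},
    {"wealth": "richest", "hv005": 1_000_000},
]
print(weighted_counts(records, "wealth"))  # {'poorest': 2.0, 'richest': 1.0}
```

Weights only adjust point estimates such as counts and proportions; design-based standard errors would additionally require the strata and cluster identifiers, which this sketch omits.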

Results
Among 61,292 children under five from 16 sub-Saharan African countries, 17,822 (29.1%) were infected with the malaria parasite. Tables 1, 2 and 3 show the numbers of infected children within the categories of individual, household and community factors, respectively. More children were infected in the following categories: less educated mother, severe anemia, poorest household, urban areas, intermediate temperature (25-28°C), abundant precipitation (>200 mm), proximity to conflict territories, distance from rivers or bodies of water, and low population density areas. The numbers of infected and non-infected children in each selected country are presented in Figure 1. Burkina Faso has the highest malaria prevalence and is the only country where infected children outnumber non-infected children.

Malaria endemicity across sub-Saharan countries
Whilst malaria affects all sub-Saharan African countries, we observed that the endemicity of malaria varies between countries and even within the same country (Figure 2).

Accuracy of models
We examined the accuracy of models using individual, household, community and country factors to predict malaria parasite infection. As shown in Figure 3, the OOB estimates of error rate were 21%, 0.20%, 25% and 32% for the model without consideration of endemicity and for the models at low, intermediate and high endemicity levels, respectively. These estimates were read from the curve in each plot, which traces the error rate as the number of trees in the random forest increases.
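The OOB estimate follows the standard random forest construction: each tree is grown on a bootstrap sample of the n children, so each child is left out of roughly a third of the trees and can be classified by those trees alone. With B trees, the general relations (not results of this study) are:

```latex
\Pr\bigl(i \notin \text{bootstrap sample}\bigr) = \left(1 - \frac{1}{n}\right)^{n}
\;\xrightarrow[n \to \infty]{}\; e^{-1} \approx 0.368,
\qquad
\widehat{\mathrm{err}}_{\mathrm{OOB}}(B) = \frac{1}{n} \sum_{i=1}^{n}
\mathbf{1}\!\left[\hat{y}_i^{\mathrm{OOB}}(B) \neq y_i\right],
```

where $\hat{y}_i^{\mathrm{OOB}}(B)$ is the majority vote for child $i$ among those of the first $B$ trees whose bootstrap samples excluded $i$, and $y_i$ is the child's RDT result. As $B$ grows, each child accumulates more out-of-bag votes, which is why the error curve typically levels off after enough trees.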

Error rate estimation
Without consideration of area endemicity, all groups of factors (individual, household, community and country) classified non-infected children better than infected children. Among non-infected children, household factors performed best, with a classification error of 2.36%. Among infected children, individual factors performed best, with a classification error of 64.60%. OOB estimates of error rate were similar for all groups of factors (around 25%).
In low endemicity areas, individual, household, community and country factors had a classification error of 0.00% for non-infected children, whereas for infected children these factors had a classification error of 100%. OOB estimates of error rate were very low for all groups of factors. In intermediate endemicity areas, classification errors were almost the same as in low endemicity areas: individual, household, community and country factors had a classification error of 0.00% for non-infected children but misclassified all infected children (100% classification error).
In high endemicity areas, the situation was the inverse of that in low and intermediate endemicity areas. Individual, household, community and country factors had a classification error of almost 0% for infected children, whereas for non-infected children these groups of variables had a classification error of 100%.
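Per-class classification error is the share of children in each true class that are assigned to the wrong class: for infected children it is the false negative rate, and for non-infected children the false positive rate. The study obtained these from the randomForest package in R, whose confusion matrix reports the same row-wise quantity in its `class.error` column. The Python sketch below assumes the OOB predictions are already available as labels; the function name and data are hypothetical.

```python
def classification_errors(y_true, y_pred, classes=("non-infected", "infected")):
    """Return the overall error and the per-class error.

    Per-class error = share of children whose true class is c but who were
    predicted as another class (row-wise error of a confusion matrix).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    overall = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {}
    for c in classes:
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        if preds_for_c:
            per_class[c] = sum(p != c for p in preds_for_c) / len(preds_for_c)
        else:
            per_class[c] = float("nan")  # class absent from the sample
    return overall, per_class


# Hypothetical low-prevalence sample: 998 non-infected and 2 infected children,
# with a classifier that predicts "non-infected" for everyone.
y_true = ["non-infected"] * 998 + ["infected"] * 2
y_pred = ["non-infected"] * 1000
overall, per_class = classification_errors(y_true, y_pred)
print(f"overall error: {overall:.1%}")  # overall error: 0.2%
print(per_class)  # {'non-infected': 0.0, 'infected': 1.0}
```

The toy example shows how, when one class dominates the sample, a low overall error can coexist with per-class errors of 0% and 100%, which is why both quantities are reported.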

Performance of individual factors
Individual factors achieved their best sensitivity and specificity for predicting malaria parasite infection among children living in low endemicity areas (Se = 70.6%, Sp = 74.0%, PV+ = 0.1%, PV− = 99.4%), compared with the other models. Individual factors therefore remain a useful tool for predicting malaria parasite infection among children under five in sub-Saharan African countries; even when the level of area endemicity was not taken into account, they were sensitive and specific tools for predicting malaria parasite infection across sub-Saharan African countries.
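A very low PV+ alongside moderate sensitivity and specificity is expected where infection is rare: by Bayes' theorem, predictive values depend on prevalence as well as on Se and Sp. The Python sketch below applies that standard relationship; the Se and Sp values are those reported above for individual factors in low endemicity areas, while the prevalence values are hypothetical and only show the trend (the study's own PV figures were computed from its data).

```python
def predictive_values(se, sp, prevalence):
    """Return (PV+, PV-) from sensitivity, specificity and prevalence (proportions in [0, 1])."""
    true_pos = se * prevalence
    false_pos = (1.0 - sp) * (1.0 - prevalence)
    true_neg = sp * (1.0 - prevalence)
    false_neg = (1.0 - se) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Se and Sp from the text; hypothetical prevalences to show how PV+ and PV- move.
for prevalence in (0.01, 0.05, 0.20):
    ppv, npv = predictive_values(0.706, 0.740, prevalence)
    print(f"prevalence {prevalence:.0%}: PV+ = {ppv:.1%}, PV- = {npv:.1%}")
```

As prevalence falls, PV+ drops towards zero and PV− rises towards 100%, matching the direction of the values reported for low endemicity areas.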

Performance of household factors
Used as a prediction tool for malaria parasite infection, household factors displayed their best sensitivity in low endemicity areas (Se = 91.5%), whereas in intermediate and high endemicity areas they displayed low sensitivity and specificity.

Performance of community factors
In low and intermediate endemicity areas, community factors displayed higher sensitivity than specificity as a tool for predicting malaria parasite infection among sub-Saharan African children (low endemicity: Se = 78.0%, Sp = 53.5%; intermediate endemicity: Se = 62.9%, Sp = 44.1%). In high endemicity areas, the inverse was the case: community factors displayed higher specificity than sensitivity (Se = 45.4%, Sp = 63.6%).

Performance of country factor
The country factor was the most sensitive indicator in low endemicity areas (Se = 96.2%) and the most specific indicator in high endemicity areas (Sp = 80.0%) for predicting malaria parasite infection among children under five in sub-Saharan African countries. In intermediate endemicity areas, the country factor displayed poor sensitivity and poor specificity.
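The sensitivities and specificities above were read from ROC curves produced in R. As a hedged illustration of the underlying computation, the Python sketch below sweeps a cut-off over a per-child score (assumed here to be, for example, the share of trees voting "infected"), computes Se and Sp at each cut-off, integrates the area under the curve with the trapezoidal rule, and picks the cut-off that maximizes Youden's J. The text does not state how the operating point was chosen, so the Youden criterion is an assumption, and the scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    labels: 1 = infected (RDT positive), 0 = not infected.
    A child is predicted infected when its score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
        points.append((threshold, tp / n_pos, tn / n_neg))
    return points


def auc(points):
    """Trapezoidal area under the ROC curve (x = 1 - specificity, y = sensitivity)."""
    curve = sorted([(0.0, 0.0), (1.0, 1.0)] + [(1.0 - sp, se) for _, se, sp in points])
    return sum((x2 - x1) * (y1 + y2) / 2.0 for (x1, y1), (x2, y2) in zip(curve, curve[1:]))


def youden_cutoff(points):
    """Cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1.0)


# Hypothetical vote shares for "infected" and RDT results for six children.
scores = [0.92, 0.81, 0.64, 0.55, 0.30, 0.12]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print("AUC:", auc(pts))
print("best cut-off (threshold, Se, Sp):", youden_cutoff(pts))
```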

Discussion
Malaria is among the leading causes of morbidity and mortality for sub-Saharan African children, killing a child every 50 seconds (Roll Back Malaria [RBM], 2015; WHO, 2016b). Making an accurate diagnosis quickly enables the provision of appropriate treatment and a reduction in mortality. However, accurate diagnosis remains a problem in certain areas because of a lack of diagnostic materials or the distance to the nearest health center.
Examining demographic, socioeconomic or environmental factors at the individual, household, community and country level could help health workers in emergency situations to make an accurate diagnosis and rapidly provide the most appropriate available treatment. The aim of this study is to investigate whether these factors can be used to predict malaria parasite infection, with or without consideration of area endemicity. To do this, we first calculated the share of infected children in each category of factors, to identify subgroups with more infected children than others. Mapping endemicity by household cluster showed that malaria transmission differs both within and across sub-Saharan African countries.
We evaluated the performance of individual, household, community and country factors as diagnostic tools for predicting malaria parasite infection among sub-Saharan African children in four models: a model without consideration of area endemicity and models for low, intermediate and high endemicity areas. OOB estimates of error rate were computed for each model and showed that the model for low endemicity areas was the most accurate, with an overall prediction error of almost nil.
Classification errors were then estimated for each group of predictors. Among non-infected children in low and intermediate endemicity areas, the classification error for predicting malaria parasite infection by individual, household, community and country factors was low; in high endemicity areas, classification errors among infected children were almost 0%. Information from these factors indicates that children with a less educated mother, with anemia, who sleep without a bed net, who live in a poor household, who live in urban rather than rural areas, whose place of residence has abundant precipitation and a temperature between 25°C and 28.5°C, who live in an area of low population density, and who live far from rivers or near conflict areas were more frequently infected than other children. Prediction performance was assessed for all models by computing the sensitivity and specificity of individual, household, community and country factors from ROC curves.
Household and country factors were the most sensitive tools for malaria diagnosis among sub-Saharan African children, with sensitivities of more than 90% in low endemicity areas. Community and country factors displayed poor sensitivity in high endemicity areas. The most specific indicator in high endemicity areas was country of residence. Taking into account the endemicity level of the household cluster proved crucial: the performance of individual, household, community and country factors for predicting malaria parasite infection among sub-Saharan African children depends strongly on endemicity in the place of residence. We observed that false negative and false positive rates depend greatly on the class of children (infected with the malaria parasite or not) and on the level of endemicity.
Several studies have investigated the performance of fever episodes in malaria diagnosis. However, we found no studies estimating the performance of individual, household, community and country factors as diagnostic tools for predicting malaria parasite infection among children under five across sub-Saharan African countries. In low transmission areas, Mutanda et al. found a sensitivity of 88.9% and a specificity of 15.4% for fever among Kenyan children under five (Mutanda, 2014; Ajayi, 2009).
It should be noted that, due to concerns about the privacy of children included in the DHS survey, geographic references were displaced by 0-2 km for urban clusters and by 0-5 km for rural clusters. Data on the individual factor, fever, were obtained using a questionnaire completed by mothers, and could, therefore, have been influenced by varying levels of understanding and personal definitions of fever. The results of our study may, therefore, contain some inaccuracy. Because of the limitations of microscopy and the absence of gold standard tests for use during national surveys in some developing countries, results from RDT alone were preferred for our analysis.
This study has the merit of evaluating the performance of individual, household, community and country factors while taking into account the endemicity of very small geographic areas (clusters of households) rather than of regions. The number of sub-Saharan countries included may allow health workers to extrapolate the performance of demographic, socio-economic or environmental variables for predicting malaria parasite infection to other sub-Saharan countries for which no DHS or MIS data are available.
As with almost all screening tools, no ideal diagnostic test with 100% sensitivity and 100% specificity exists. It was therefore important to investigate under which conditions the available tools perform best and to evaluate the limits of their performance. This study may help health-care providers predict malaria infection in sub-Saharan areas.

Conclusion
We have demonstrated the potential predictive value of several malaria risk factors in sub-Saharan African countries by estimating their performance and classification errors. This study found that, when area endemicity is taken into account, these factors can support malaria diagnosis with low overall error rates. Given the diagnostic problems faced in sub-Saharan African countries, such studies should be performed country by country, adding region-specific factors to make malaria prediction with this tool as accurate as possible.