A machine learning model for supporting symptom-based referral and diagnosis of bronchitis and pneumonia in limited resource settings

Pneumonia is a leading cause of mortality in limited resource settings (LRS), which are common in low- and middle-income countries (LMICs). Accurate referrals can reduce the devastating impact of pneumonia, especially in LRS. Discriminating pneumonia from other respiratory conditions based only on symptoms is a major challenge. Machine learning has shown promise in overcoming the diagnostic difficulties of pneumonia (i


Introduction
Pneumonia has a devastating impact on global health and is the largest cause of death due to infection in children worldwide [1]. Although cases of pneumonia are found globally, the disease burden falls heavily on low- and middle-income countries (LMICs), with around 90% of the global child mortality due to pneumonia and diarrhoea occurring in sub-Saharan Africa and South Asia [2]. Characteristics of individuals living in LMICs, such as malnutrition and exposure to air pollution, have long been understood to increase susceptibility to severe pneumonia [3]. In addition, access to appropriate treatment can be problematic in many LMICs, further increasing pneumonia fatalities. Fundamental to reducing deaths is rapid identification of the most appropriate treatment for pneumonia, as recognised in the 2015 Global Action Plan for the Prevention and Control of Pneumonia and Diarrhoea (GAPPD) [2], which calls for an end to preventable child deaths from pneumonia by 2025.
Once the patient is referred to specialised hospitals with suspected pneumonia, a confirmed diagnosis requires instrumental diagnostic tests (e.g., X-ray, pulse oximetry, blood tests and sputum tests), often integrated with more advanced investigations (e.g., CT scan, arterial blood gas tests, pleural fluid culture and bronchoscopy) [4,5]. Those tests and investigations are costly and not widely available in LMIC community settings, especially in rural areas. Unfortunately, pneumonia signs and symptoms are common to many other respiratory diseases (e.g., bronchitis), resulting in incorrect or delayed referrals. In fact, pneumonia signs and symptoms include cough (often with mucus and blood production), dyspnoea, fever, sweating, chest pain, loss of appetite, fatigue, nausea and vomiting (sometimes with mucus and blood) and confusion, especially in senior patients.
Accurate and timely pneumonia referral is crucial, especially in limited resource settings (LRS), which are abundant in LMICs [6]. The meaning of LRS may differ depending on the context [7]; here, LRS are taken to describe a healthcare setting experiencing a lack of either physical or organizational infrastructure, such as trained personnel, facilities or equipment [6]. The COVID-19 pandemic demonstrated that appropriate referral of pneumonia is also crucial in high-income countries during a disaster such as a pandemic [8]. For instance, the number of papers retrieved in PubMed combining the keywords "pneumonia" and "referral" rose from approximately 13 papers per month in 2019 to 90 papers per month in 2020 and 136 papers per month in 2021 until March (search query "(referral) AND (pneumonia)" in PubMed; search date 6th of April 2021).
The availability of large datasets and the need for highly accurate and timely referral and detection of diseases are motivating the use of data-driven machine learning (ML) methods in the field [9][10][11]. ML and deep learning methods have gained much attention in recent years for the automatic detection of pneumonia through imaging, in particular through analysis of chest X-ray or computed tomography (CT) [12][13][14][15][16][17][18][19][20]. The onset of COVID-19 and the subsequent global pressure on healthcare systems has further driven research in this area [21][22][23][24]. Such techniques are an attractive way to reduce the pressure on healthcare services with limited medical resources and staff, by providing fast and accurate diagnosis, reducing demand on equipment and expertise. Important considerations for use in LRS are speed of classification and minimal user input and energy requirements [12]. Although these methods are crucial for diagnosing pneumonia in a hospital setting, it is important to also consider that barriers exist in LMICs, which delay the diagnosis of pneumonia. In this regard, it is crucial to also support disease referrals in community settings, where symptoms alone can be assessed. In fact, it is widely accepted that identification and management of pneumonia in community settings significantly reduces deaths [25]. Healthcare services in LMICs strongly rely on community health workers (CHWs), especially in rural areas where there is inconsistent access to specialised doctors or hospitals [26]. Evidence suggested that Rapid Diagnostic Tests (RDTs) may support CHWs in detecting pneumonia in community settings [27,28]. Such an approach has proved promising in increasing quality of care and improving diagnosis and treatment availability especially in LRS [29], where patients may gain access to diagnostic systems using widely available technology such as smart phones [30]. 
Yet, there is a clear gap in interpreting pneumonia symptoms, which may be bridged through integration of RDTs and smart phones with ML, especially in community settings. This remains challenging due to the similarity of pneumonia symptoms with those of other respiratory diseases, such as bronchitis. In fact, different lower-respiratory-tract diseases tend to present with an overlapping set of symptoms, so it may be difficult to manually identify patterns and features in the data, which makes this an appropriate challenge for ML. Unfortunately, this problem has not been well investigated: in the existing literature, the use of traditional ML for lower-respiratory-disease recognition based on symptoms is relatively scarce. This limits the potential for implementation of ML models in clinical analysis, which could improve the diagnostic capability of existing Computer-Aided Diagnosis (CAD) systems to automatically detect diseases such as pneumonia [31].
Overlapping symptoms represent a significant barrier for pneumonia referral, especially when resources and expertise are not widely available, as in LMICs and LRS. Correct referral holds the key to the most effective diagnosis, treatment and management [32]. In particular, pneumonia and bronchitis have many similar symptoms, which affects patient referral. Whereas pneumonia refers to the presence of fluid in the alveoli, bronchitis is characterized by acute inflammation of the trachea and airways. Bronchitis most commonly results from viral infection, so treatment with antibiotics is not generally effective, whereas for pneumonia the situation is reversed, providing strong motivation for correct referrals [17,33].
Several studies have achieved promising performance in classification of pneumonia [34][35][36][37][38][39][40][41][42][43] and other respiratory diseases such as Chronic Obstructive Pulmonary Disease (COPD) and asthma [44][45][46] using ML with symptomatic predictors in combination with laboratory test results. However, there has been relatively little investigation of models using purely symptoms and signs as predictors [47,48], which, given the constraints upon healthcare at the community level, highlights a shortfall of research in this area. Furthermore, several issues in the field are apparent: reports do not provide a strong evidence base for their models, there is a lack of clarity in the reporting of ML methods, and the issue of distinguishing pneumonia from other similarly presenting respiratory diseases is not addressed in the existing literature.
Therefore, this study aimed to design an evidence-based and interpretable ML model, using easily recognized symptoms and signs as predictors, which can distinguish between patients with bronchitis and pneumonia. Such a model is suitable for incorporation into a diagnostic tool for pneumonia screening in the community, with the aim of improving access to referral and treatment.

Materials and methods
The steps completed during this work, beginning with the raw data and ending with an evaluation of predictive model performance, are outlined in Fig. 1. Symptomatic features were manually considered for machine learning in accordance with their clinical relevance and the goals of this study. A full description of the dataset and further details of each stage in the analysis are provided in the following sections.

Dataset
The data used in this work was collected as part of a prospective study, following internationally accepted medical practices for diagnosis of Chronic obstructive pulmonary disease (COPD) and asthma [49,50]. The dataset was generated to be suitable for design, validation, and real-time testing of a classifier to automatically identify bronchitis and pneumonia. Before starting the study, the ethics board approval for human subject testing from the Hospital Sarajevo was obtained (No. 01-11/EO-06/18), as well as the patients' informed consent.
Healthcare institutions also approved all methods and procedures which were performed in accordance with the relevant guidelines and regulations. Samples originate from the period of October 2017 until December 2018.
Only patients with confirmed diagnosis were included as subjects in this study. Diagnoses were performed by medical professionals following clinical assessment according to international guidelines. Baseline assessments consisted of screening for patient symptoms using symptom-based questionnaires or interviews conducted by a medical professional. All spirometry lung function tests were obtained using the CareFusion ''Master Screen" device (Hoechberg, Germany), which measured, derived and calculated all the required spirometry parameters.
The dataset comprised clinical information on 4500 individuals diagnosed with either bronchitis (1500) or pneumonia (3000). Information collected included a range of symptoms typical of respiratory illness, laboratory test results and various population descriptive characteristics such as exposure to air pollution or malnutrition. This information was established by medical professionals. A full description of the variables extracted is presented in Table 1. Initially, raw data was cleaned, i.e., missing data were handled and outliers analysed. Statistical analysis was performed to evaluate possible heterogeneity of population characteristics and variable distribution between patient groups. Initial manual feature choice on the clean dataset was completed before the data were split into separate folders. Feature selection was carried out using backwards feature selection during ML model training and validation. Model testing was performed on Folder 2. Finally, performance of the models on the test data folder was evaluated using several metrics recommended by the literature.

Statistical analysis
Statistical analysis was used to determine whether there was homogeneity between patients with pneumonia and bronchitis in terms of age, sex, and population characteristics. Further statistical tests were employed to understand the behavior of variables in terms of their distribution i.e., whether continuous variables were normally distributed. Finally, statistical differences of variables between the two patient groups were evaluated. Statistical test selection was informed by best practice guidelines described in the literature [51] and the methodology of the identified similar studies [34][35][36][37][38][39][40][41][42]52]. Features were tested for normality using the Chi-square goodness of fit test. Continuous variables were expressed as mean ± standard deviation or median and standard error. A non-parametric statistical test, Kruskal-Wallis test, was used for comparison of continuous variables between the two groups of patients (pneumonia and bronchitis), as it is appropriate for variables which are not normally distributed [51]. Categorical variables were expressed as a percentage and were compared using the Chi-Square, or Fisher Exact tests. A p-value of <0.05 was considered significant when assessing the variation of the features among the two patient groups. Bonferroni's correction was used for multiple hypothesis correction if necessary. Correlation analysis was carried out by Goodman and Kruskal's tau correlation.
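The group comparison described above can be sketched in a few lines. The following is a minimal, standard-library illustration (not the authors' MATLAB code) of a tie-corrected Kruskal-Wallis test for two groups, together with a Bonferroni-adjusted significance level. The function names and the example values are hypothetical; the p-value uses the fact that, with two groups, the H statistic is referred to a chi-square distribution with one degree of freedom.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Tie-corrected H statistic and p-value (chi-square, 1 degree of freedom).

    Assumes the pooled values are not all identical.
    """
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b)) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h = max(h / (1 - ties / (n ** 3 - n)), 0.0)  # guard against tiny negative rounding
    p = math.erfc(math.sqrt(h / 2))  # survival function of chi-square with 1 df
    return h, p


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Made-up values for illustration only (not taken from the study dataset).
group_a = [61, 58, 70, 66, 73]
group_b = [45, 52, 49, 58, 40]
h_stat, p_value = kruskal_wallis_two_groups(group_a, group_b)
print(h_stat, p_value, p_value < bonferroni_alpha(0.05, n_tests=10))
```

With more than two groups the degrees of freedom change and a general chi-square survival function would be needed, which this sketch deliberately omits.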
Box and scatter plots were used to identify outliers in continuous variables. After ensuring that there were no changes to the data on importing or coding, outliers were defined as any value more than three scaled absolute deviations from the median [53]. Subjects with outlying continuous variable values (156 individuals) were removed from the dataset, leaving 2844 pneumonia and 1500 bronchitis subjects. All the analyses were run in MATLAB 2019b.
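As an illustration of the outlier rule, the sketch below flags values lying more than three scaled median absolute deviations (MAD) from the median. It is a hedged reconstruction in Python rather than the original MATLAB code, and it assumes the usual scaling constant of about 1.4826, which makes the MAD comparable to a standard deviation for normally distributed data. The function name and sample values are hypothetical.

```python
from statistics import median

# Assumed scaling constant: makes the MAD consistent with the standard
# deviation under normality.
MAD_SCALE = 1.4826


def scaled_mad_outliers(values, threshold=3.0):
    """Return one boolean per value: True if it is an outlier under the MAD rule."""
    med = median(values)
    scaled_mad = MAD_SCALE * median([abs(v - med) for v in values])
    return [abs(v - med) > threshold * scaled_mad for v in values]


# Made-up measurements for illustration only.
sample = [50, 52, 49, 51, 48, 53, 50, 120]
flags = scaled_mad_outliers(sample)
cleaned = [v for v, is_outlier in zip(sample, flags) if not is_outlier]
print(cleaned)
```

Because both the centre and the spread are medians, a single extreme value barely shifts the threshold, which is the main reason to prefer this rule over a mean and standard deviation cut-off.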

Model training, validation, and testing procedure
As shown in Fig. 2, training and validation were performed on Folder 1 (60% of the total amount of data), and testing was performed on the remaining independent 40% of data. The splitting was done in a stratified subject-wise fashion.
Feature selection. Sufficiently large numbers of subjects allowed free selection amongst available attributes, complying with the '10 events per attribute' rule of thumb to avoid overfitting [9,54]. Symptoms with zero occurrences in either class were discarded, both because of the risk that the information was not collected and to avoid a trivial separation of groups. Initial manual feature choice was performed on the clean dataset based on clinical relevance. In fact, for a diagnostic and/or screening application, the features should have some bearing on the disease [55]. Feature selection, based on the cluster of features manually selected, was then performed on Folder 1, after data splitting. Therefore, feature selection and model training were performed on the same folder [56]. Correlations between variables were evaluated using the Kendall rank correlation test, with correlated variables excluded as features. Correlated variables were considered as those which had a statistically significant Kendall rank coefficient greater than 0.5 ($|\tau| > 0.5$, $p < 0.05$). Backward feature selection was performed on the training dataset, and only the best combination of features is reported.
Machine learning methods. Models automatically classifying patients as either having bronchitis or pneumonia were developed using three different machine learning methods: logistic regression (LR), decision tree and support vector machine (SVM). LR is an extension of linear regression, which predicts the probability of a case belonging to a certain class [57]. A decision tree creates a set of 'if-else' conditions to predict the class of a given case [58]. SVM, which belongs to the general field of kernel-based machine learning methods, is used to efficiently classify both linearly and nonlinearly separable data [59]. Algorithm parameter tuning was performed during training and validation. Regarding the final model parameters, a fine tree with a maximum of 100 splits was used for the decision tree, while a linear kernel with a scale of 0.8792 was used for SVM.
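Backward feature selection can be illustrated with a short greedy sketch. This is a generic version of the technique and not necessarily the exact procedure used in the study: starting from all candidate features, it repeatedly drops the feature whose removal gives the best validation score and stops once every removal makes the score worse. The `score` callable is hypothetical; in practice it would train a model on the given subset and return a cross-validated metric such as AUC.

```python
def backward_selection(features, score):
    """Greedy backward elimination over a list of feature names.

    `score` is a hypothetical callable mapping a tuple of feature names to a
    validation metric where higher is better.
    """
    current = tuple(features)
    best = score(current)
    while len(current) > 1:
        # Try removing each remaining feature in turn.
        candidates = [tuple(f for f in current if f != drop) for drop in current]
        top_score, top_subset = max((score(c), c) for c in candidates)
        if top_score < best:
            break  # every removal hurts, so keep the current subset
        best, current = top_score, top_subset  # ties favour the smaller subset
    return current, best


# Toy scoring function for illustration only: two features are informative
# and every included feature carries a small complexity penalty.
INFORMATIVE = {"cough", "sputum"}


def toy_score(subset):
    return len(INFORMATIVE & set(subset)) - 0.01 * len(subset)


selected, final_score = backward_selection(
    ["cough", "sputum", "allergy", "age_over_65"], toy_score
)
print(selected, final_score)
```

Accepting ties in favour of the smaller subset reflects the preference for lower model complexity when additional variables do not improve performance.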
Training and validation. The training of the machine-learning models was performed on Folder 1 (1706 pneumonia patients, 900 bronchitis). Folder 1 was further divided into ten equal-sized subsamples, according to the 10-fold person-independent cross-validation approach. Of these ten subsamples, nine were used as training data and the remaining one was retained for validating the model. The process was then repeated ten times, with each of the ten subsamples used exactly once as the validation data. Finally, the cross-validated estimates were computed by averaging the performance over the ten validation subsamples. Classification measures were adopted according to the standard formulae [60].
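The 10-fold procedure can be sketched as follows. This is a simplified illustration that assumes one record per subject (so splitting by index is subject-wise) and does not stratify the folds by class. The `evaluate` callable is hypothetical and stands in for fitting a model on the training subjects and scoring it on the validation subjects.

```python
import random


def k_fold_indices(n_subjects, k=10, seed=0):
    """Shuffle subject indices and deal them into k near-equal folds."""
    indices = list(range(n_subjects))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_subjects, evaluate, k=10, seed=0):
    """Average a metric over k rounds, each fold serving once as validation data.

    `evaluate(train_idx, val_idx)` is a hypothetical callable returning a
    performance metric for one train/validation round.
    """
    folds = k_fold_indices(n_subjects, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(evaluate(train_idx, val_idx))
    return sum(scores) / k


# Folder 1 in the study holds 1706 + 900 = 2606 subjects.
folds = k_fold_indices(2606)
print(sorted({len(fold) for fold in folds}))
```

Since 2606 is not divisible by ten, the folds differ in size by at most one subject.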
Testing. Testing a classifier involves analyzing its performance on a set of subjects that is independent from the training and validation set [61]. Accordingly, folder 2 (1138 pneumonia patients, 600 bronchitis) was used to test the trained models.
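A stratified 60/40 split that reproduces the folder sizes reported above can be sketched as below. This is an illustrative Python version, not the original MATLAB code; the function name, the seed and the rounding convention are assumptions. Each class is shuffled and cut separately so that both folders keep the same pneumonia-to-bronchitis ratio.

```python
import random


def stratified_split(labels, train_fraction=0.6, seed=0):
    """Return (train_idx, test_idx), preserving class proportions in both parts."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for members in by_class.values():
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)  # assumed rounding rule
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return train_idx, test_idx


# Class sizes of the clean dataset reported in the text.
labels = ["pneumonia"] * 2844 + ["bronchitis"] * 1500
train_idx, test_idx = stratified_split(labels)

train_counts = {c: sum(labels[i] == c for i in train_idx) for c in set(labels)}
test_counts = {c: sum(labels[i] == c for i in test_idx) for c in set(labels)}
print(train_counts, test_counts)
```

Splitting by subject index keeps every individual in exactly one folder, which is what makes the test set independent of training and validation.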
The model performance was obtained for the optimal operating point on the receiver operating characteristic (ROC) curve, as calculated by the MATLAB perfcurve function, which relies on a previously described cost-function curve analysis [62].
Finally, the best performing model was selected as the one achieving the highest averaged area under curve (AUC), which is a reliable estimator of both sensitivity and specificity rates. In case of equal AUC, the model with the highest overall accuracy was selected.
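The ROC-based quantities used for model selection can be illustrated with a small sketch. It computes ROC points, the trapezoidal AUC and an operating point chosen by sliding a straight line from the top-left corner of the ROC plot until it touches the curve. The slope is taken as the ratio of negative to positive cases, which assumes equal misclassification costs. This is a hedged Python approximation of that idea, not a reimplementation of MATLAB's perfcurve, and the scores and labels are made up.

```python
def roc_points(scores, labels):
    """ROC points as (fpr, tpr, threshold); labels are 1 (positive) or 0."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0, float("inf"))]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample sharing this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos, score))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]))


def optimal_operating_point(points, n_pos, n_neg):
    """ROC point first touched by a line of slope n_neg / n_pos (equal costs assumed)."""
    slope = n_neg / n_pos
    return max(points, key=lambda p: p[1] - slope * p[0])


# Made-up classifier scores for illustration; 1 = pneumonia, 0 = bronchitis.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
points = roc_points(scores, labels)
auc = auc_trapezoid(points)
best = optimal_operating_point(points, n_pos=4, n_neg=4)
print(auc, best)
```

The third element of the selected point is the score threshold at which sensitivity and specificity would be reported.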

Results
The clean dataset consisted of a total of 4344 samples, of which 2844 patients were diagnosed with pneumonia and the remaining 1500 with bronchitis. None of the continuous features was normally distributed, with p-values <0.01. The mean, median, standard deviation and range of continuous variables are shown in Table 2. The final column reports the p-value of the Kruskal-Wallis test for attribute variations between bronchitis and pneumonia subjects. All p-values were <0.01, indicating a significant difference for all attributes between bronchitis and pneumonia. The counts and proportions of the categorical variables in the pneumonia and bronchitis groups are presented in Table 3. The final column reports the resulting p-value of the Chi-square (multi-class attributes) and Fisher Exact (binary attributes) tests. Several symptoms were either not registered during data collection or not experienced by bronchitis sufferers: fever, sweating, muscle pain, headache and loss of appetite. Such attributes were discarded, as such a clear-cut distinction does not make for an appropriate machine learning problem. Further, there may have been differences in data collection between groups for these symptoms. Age above 65 years old, auscultation, sputum and RTG were found to be statistically different between bronchitis and pneumonia cases, with a p-value less than 0.01.
Kendall rank correlation analysis between symptoms (cough, expectoration, dyspnoea, pleura pain, auscultation and sputum) and population descriptive variables (Above 65, associated chronic bronchopulmonary, immunosuppression, allergy, exposure to air pollution) found no correlations. Backwards feature selection found that including all six of the above symptoms gave the best model performance. Population descriptive variables did not improve performance, so they were discarded in order to reduce the complexity of the model. Therefore, the final selected features were: cough, expectoration, dyspnoea, pleura pain, auscultation and sputum. Results of the three different ML methods are reported in Table 4. Although the AUCs are similar across the three methods, the model considered most successful and suitable was the decision tree. This is due to its superior overall accuracy over both LR and SVM. Furthermore, the decision tree had the fastest execution time to accurately predict pneumonia.
The ROC curves for the final models are shown in Fig. 3.

Discussion
This study proposes an easily interpreted, tree-based model for the automatic classification of pneumonia from bronchitis based entirely on easily measurable symptoms and signs. Features were selected based on their clinical relevance and availability on patient assessment. The manual feature selection method employed permitted a clear focus on the clinical utility and application of the model. Some key criteria used were: i) measurable in a point of care setting [63]; ii) parameters frequently investigated [64]; iii) ease of availability [65] and iv) reliability [34]. The results showed that by using a set of symptoms such as cough, expectoration, dyspnoea, pleura pain, auscultation and sputum, the model correctly identified more than 80% of patients with a confirmed diagnosis of pneumonia. Although cough, expectoration, dyspnoea and pleura pain were not found to be statistically different between the two classes of respiratory diseases, the combination of those variables with auscultation and sputum signs achieved significant results in automatically distinguishing patients affected by pneumonia from those with bronchitis. These results suggest that it is possible to correctly distinguish patients presenting with bronchitis or pneumonia before performing clinical tests (e.g., X-ray) that require extensive expertise or advanced equipment. This can be crucial for patient referrals in community settings, especially in LRS. In fact, a set of predictors that are easily recognised by CHWs or even self-reported, combined with an automated ML model suitable for incorporation into a tool such as a mobile phone app, would have great value in assisting referrals of pneumonia in LMICs. Moreover, in high-income settings such a tool may complement traditional community healthcare services by providing widely available digital tests through apps for triage.
The global need for such technology has been brought to the forefront of healthcare concerns in particular during outbreaks of COVID-19.
Although symptoms of bronchitis and pneumonia are similar, as discussed in the introduction, the required treatment is very different. First line treatment for pneumonia generally comprises antibiotic administration with close monitoring, while acute bronchitis is self-limiting and does not benefit from antibiotic treatment [35]. Therefore, there is a high cost to patient outcomes when these conditions are misdiagnosed or poorly referred, and an easy-to-use tool capable of distinguishing the symptoms of these diseases is desirable.
The use of the symptoms identified in this study is largely supported by their identification as predictors of pneumonia by several authors in the literature, using different data and a variety of feature selection methodologies. The most commonly employed predictor in the literature is the outcome of auscultation, of which fast breathing specifically is strongly associated with pneumonia [34,36,[40][41][42]. Cough [40,48] and productive cough/expectoration [36] were also found in several previous studies, as well as pleura pain [36,40] and dyspnoea [36,40,66]. Interestingly, we were unable to identify any use of sputum evaluation in ML classifiers of pneumonia in the literature, although in this study it was found to be statistically different between the bronchitis and pneumonia classes. This may reflect differences in clinical practices in different regions. Three ML methods (SVM, decision tree and LR) were employed to facilitate comparison among commonly used models for clinical classification problems with varying interpretability. The selection was motivated by a desire to represent and compare methods which are frequently employed in the literature for similar problems, in particular interpretable models. Comparison of performance between existing studies is limited for several reasons: variation in the pneumonia reference standard, variation in subject population and lack of standardized reporting of ML methods and performance. The state-of-the-art studies to detect pneumonia and/or COVID-19 pneumonia are reported in Table 5. Of the existing studies in the literature which utilized SVMs [35,39,67], only one specified distinguishing pneumonia from other diseases (as opposed to healthy patients). In this report from Rother et al. 2015 [67], the authors reported a program consisting of eight classifiers, with a sensitivity of 90% being the only performance parameter reported. Perhaps due to its simplicity, LR has also proved a popular choice: Feng et al. 2020 [52] reported a high sensitivity of 100% with a specificity of 78% and an AUC of 93% when identifying COVID-19 pneumonia based on symptoms and blood test results. Classification and Regression Trees (CARTs) were used by De Santis et al. 2017 [42] (sensitivity of 38%, specificity of 97%) and Steurer et al. 2011 [40] (AUC of 90%) to classify pneumonia; however, the latter did not use any internal or external validation techniques. Therefore, further analysis of the chosen methods in relation to detection of pneumonia was required. The results reported here support the use of simple, interpretable models such as LR and decision tree, which were shown to perform as well as or indeed better than linear SVM.
The tree-based model, which was considered most favourable for use in a clinical tool due to being easily understood, was found to give the best performance, with an AUC of 93%. In comparison, in 2018 Pervaiz et al. found that the WHO symptomatic predictors for childhood pneumonia (cough, difficulty breathing, fever, tachypnoea and chest indrawing) achieved an AUC of only 62% [36,47]. Reports of models built using symptoms/signs alone are relatively rare [42,67,68]. Nuzhat et al. reported high sensitivity and specificity (94%, 99%) in a logistic regression model using cough and lower chest wall indrawing as predictors; however, the methodology lacked internal or external validation [68]. Rother et al. utilized an ensemble method, achieving a sensitivity of 90%, in this case with the disadvantage of an uninterpretable final model [67]. It has been more common to include additional laboratory or clinical tests as predictors [34,37,40,41,69,70]; however, such techniques are costly, time consuming and require specialist training, so would be unavailable in low resource settings.
Other studies in the literature have focused on employing image-based classification to diagnose pneumonia, in particular relating to detecting cases of COVID-19 (Table 5). When patients are referred to hospital for suspected pneumonia, instrumental investigations are certainly needed, including X-ray imaging. In this case, image classification is most often approached through deep learning methods and has achieved high performance, with the majority of methods reporting performance metrics above 90% (accuracy or AUC) [14,[19][20][21][22][23][24]. For instance, Li et al. reported an AUC of 95% for detecting CAP from pulmonary CT scans [22], and Yue et al. achieved an accuracy of over 90% for pneumonia detection with five different convolutional neural networks (CNNs) using chest X-ray images [14]. Wang et al. [16] and Stephen et al. [12] used CNN-based models on chest X-ray images to distinguish pneumonia from other medical conditions and/or healthy patients, achieving higher performance than Sirazitdinov et al. [13], who used an ensemble of two convolutional neural networks for pneumonia localization from a large-scale chest X-ray database. Another study [19], by Nahid et al., also employed a CNN model on chest X-ray images to detect patients affected by pneumonia, achieving over 97% accuracy. A recent study by Musad et al. [18] employed radiomic features extracted from chest X-ray images via a CNN method, which were then input to more traditional machine learning algorithms such as Random Forest. However, they achieved a lower classification accuracy (86.3%) in discriminating among healthy, bacterial pneumonia and viral pneumonia categories. CNNs were, in fact, often selected among the studies reported in Table 5 to automatically detect viral pneumonia via imaging techniques. Only one study, by Srivastava et al. [20], applied CNN methodologies to assist medical experts by providing a detailed and rigorous analysis of medical respiratory audio data for Chronic Obstructive Pulmonary Disease detection.
In order to reduce the time and complexity of developing novel models from scratch, transfer learning has proved a promising fast route to building high performance deep learning models. In particular regarding rapid development of COVID-19 detection models, Hira et al. [21], Elgendi et al. [23] and Brunese et al. [24] achieved an accuracy of 97.5%, 94% and 97% respectively, in detecting pneumonia from chest X-rays using pre-trained models.
However, despite performing well on existing data, the deep learning methods used in these studies are less likely to be adopted in clinical settings due to concerns over reliability and trustworthiness. Moreover, the studies employing imaging techniques as predictors aimed to develop a diagnostic model, whereas this study aimed to develop a classifier that would have great value in assisting referrals of pneumonia, especially in LMICs. In fact, we presented a fully interpretable, tree-based model taking symptoms and signs as inputs, which can distinguish pneumonia patients with a performance similar (above 90%) to that of image-based deep learning approaches.
Nevertheless, the high performance seen from these very different ML approaches at both the initial patient referral (based on symptoms) and hospital confirmation (based on instrumental investigation through image analysis) highlighted the varied and exciting promise of AI for both referral and diagnosis of pneumonia, which may contribute to alleviating pressures on clinical staff and equipment especially in LRS of LMICs. Most importantly, early symptomatic discrimination of pneumonia from bronchitis may avoid unnecessary antibiotic treatments, helping to limit antibiotic resistance and avoiding the onset of pneumonia complications that may compromise patient treatment.
Among the studies aiming to develop tools for screening and/or diagnosis of respiratory diseases such as pneumonia, bronchitis, asthma and COPD [44,45], the vast majority used additional laboratory test results, such as white blood cell counts and sedimentation, as well as symptoms. In practice, carrying out such tests requires a high level of expertise and costly facilities. Such requirements are not only challenging in LRS, but also take time, and are therefore not an ideal basis for wide screening tools for rural areas in LMICs. Moreover, Pervaiz et al. found no benefit in adding oxyhaemoglobin levels to an ML model based on signs and symptoms alone to predict radiographically confirmed pneumonia in children. Furthermore, Naydenova et al. and Groeneveld et al. found that adding patient C-Reactive Protein (CRP) levels to models based on symptoms, vital signs and age worsened performance in classification of pneumonia. This falls into a wider picture in which there is currently contention regarding the use of biomarkers such as CRP or Procalcitonin (PCT) as indicators for pneumonia [37,71]. Further research into their relevance, in particular in LMICs, is necessary to justify their use as predictors in diagnostic tools.
As well as demonstrating promising performance on existing data from a middle-income country, the model proposed has the advantage of being easily interpreted. Such explainable AI models have several benefits which increase their clinical utility, including trust in the system, guarding against bias, passing regulatory requirements and verifying outputs [72]. Indeed, the importance of explainability in AI is not limited to symptom-based classifiers but extends also into the field of image analysis. This is challenging for deep learning, where it is not easy to follow the 'decision making' process leading to the final classification, unless specific tools for visualization of data significance are employed [73,74]. In fact, there is evidence in the literature of predictive deep learning models which are able to indicate the areas of a chest X-ray which contributed most to disease detection, which allows rapid identification of areas of interest to a radiologist in a hospital setting [24]. The work detailed here is a proof of concept that a simple, evidence-based ML model has the potential to perform well using symptoms and signs alone as predictors. However, this study comes with certain limitations. Firstly, the data driving this work was collected in a European middle-income country and therefore may be of limited utility in populations of low-income countries (e.g., in Sub-Saharan Africa), which experience the greatest burden from pneumonia. In fact, pneumonia in Europe and pneumonia in Sub-Saharan Africa may have different causes (e.g., ageing and pollution vs Saharan desert dust), which may result in different symptoms. Furthermore, training of CHWs in low-income settings in auscultation, spirometry or sputum evaluation may not be easy; indeed, equipment and expertise may not be available in emergency rooms even in high-income regions.
Whilst the ML model reported in this study has the advantage of discriminating pneumonia from bronchitis, determining the underlying pathogenesis and severity of pneumonia was beyond the scope of this study. This will be an interesting future avenue of research, as this information would be highly valuable in identifying the best treatment for pneumonia patients on an individual basis. Finally, in order to produce a clinical tool, it would be necessary to incorporate all commonly presenting respiratory diseases, not only bronchitis and pneumonia.

Conclusions
Correct referral and diagnosis of pneumonia is challenging due to low specificity of symptoms, lack of widely available diagnostic tests and varied clinical presentation amongst subpopulations. In this study, we applied machine learning algorithms to a dataset of 4344 patients (1500 bronchitis, 2844 pneumonia) containing information on subject population characteristics, symptoms and laboratory test results. Feature selection found 6 clinically relevant and easily interpreted patient symptoms to be the best predictors of pneumonia in this dataset. The best performing model that was able to distinguish pneumonia from bronchitis via signs and symptoms was a decision tree, which achieved an AUC of 93%. The robust, evidence-based design and the ability to use symptoms to distinguish pneumonia from a similar respiratory disease (i.e., bronchitis) are advantages for application in LMICs, compared to previously reported models relying mainly on instrumental tests and X-ray images. To be of most practical use in resource limited settings, machine learning models aiming at supporting disease screening, early diagnosis and appropriate referral must provide thorough reporting of

CRediT authorship contribution statement

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.