An Intelligent Web-Based Decision Support System for Monitoring the Chronic Kidney Disease in Brazilian Communities

Background: Chronic Kidney Disease (CKD) is a worldwide public health problem, usually diagnosed in the late stages of the disease, increasing public health costs and mortality rates. Late diagnosis is even more critical in developing countries due to high levels of poverty, a large number of hard-to-reach locations, and sometimes absent or precarious primary care. Methods: We designed and evaluated an intelligent web-based Decision Support System (DSS) using the J48 decision tree machine learning algorithm, knowledge-based system concepts, the clinical document architecture, Cohen's kappa statistic, and interviews with an experienced nephrologist. Results: We provided a DSS methodology that guided the development of a system with remote monitoring features to assist patients, primary care physicians, and the government in identifying and monitoring CKD in Brazilian communities. A CKD dataset enabled the training and evaluation of the J48 decision tree algorithm, while Cohen's kappa statistic guided the evaluation of the knowledge-based system through interviews with an experienced nephrologist. Conclusion: The DSS facilitates the identification and monitoring of CKD in low-income populations in Brazil. In addition, the methodology and DSS can be reused in other developing countries with similar scenarios.


Introduction
The high prevalence and mortality rates of persons with chronic diseases, for example, Chronic Kidney Disease (CKD) [1], are real-world public health problems. The World Health Organization (WHO) estimated that chronic diseases would cause 60 percent of the deaths reported in 2005, 80 percent in low-income and lower-middle-income countries, increasing to 66.7 percent in 2020 [2]. According to the WHO health statistics 2019 [3], people who live in low-income and lower-middle-income countries have a higher probability of dying prematurely from known chronic diseases such as Diabetes Mellitus (DM).
For the specific case of CKD, the early identification and monitoring of this disease and its risk factors reduce CKD progression and prevent adverse events, such as the sudden development of diabetic nephropathy. The present study considers CKD identification and monitoring focusing on people living in Brazil, a continental-size developing country. Developing countries stand for low- and middle-income regions, while developed countries are high-income regions, such as the USA [4]. The population of developing countries suffers from increased mortality rates caused by chronic diseases, e.g., CKD, Arterial Hypertension (AH), and DM [5]. AH and DM are two of the most common CKD risk factors. People with type 1 or type 2 DM are at high risk of developing diabetic nephropathy [6], while severe AH cases may increase kidney damage. For example, about 10 percent of the adult Brazilian population is aware of having some kidney damage, while about 70 percent remains undiagnosed [7].
CKD is characterized by permanent damage that reduces the kidneys' excretory function, which is easily measured using the glomerular filtration [8]. However, because the disease is asymptomatic, the diagnosis usually occurs during more advanced stages, postponing the application of countermeasures, decreasing people's quality of life, and possibly leading to lethal kidney damage. For example, in 2010, about 650-500 people per million of the Brazilian population faced dialysis and kidney transplantation [9]. This number has grown, warning governments about the relevance of early CKD diagnosis. In 2016, according to the Brazilian chronic dialysis survey, the number of patients under dialysis was 122,825, an increase of 31,000 over the previous five years [10]. In 2017, the prevalence and incidence rates of patients under dialysis were 610 and 194 per million population [11]. The incidence continued to be high in 2018 (133,464) [12].
The high prevalence and incidence of dialysis and kidney transplantation increase public health costs. Therefore, CKD has a significant impact from the health economics perspective [13]. For instance, the Brazilian Ministry of Health reported that transplantation and its procedures resulted in spending about 720 million reais in 2008 and 1.3 billion in 2015 [14]. The costs and the high rate of persons waiting for transplantation suggest increased public spending on kidney diseases. Preventing CKD has a relevant role in reducing mortality rates and public health costs [15]. Early CKD diagnosis is even more challenging for people who live in remote and hard-to-reach settings because of either absent or precarious primary care.
Organizations such as the Brazilian Society of Nephrology and the National Kidney Foundation have proposed tools to support CKD diagnosis, assisting physicians in identifying kidney damage by estimating the glomerular filtration rate (GFR). The Cockcroft-Gault [16] and the Modification of Diet in Renal Disease (MDRD) [17] equations are classical ways to estimate the GFR. This type of tool assists CKD diagnosis when physicians have adequate and simple access to it during clinical evaluations; however, this is not always the reality, mainly in developing countries. For example, Brazil, a continental-size country, has many problems related to computer-assisted healthcare compared to developed countries that maintain Electronic Health Record (EHR) infrastructures.
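As an illustration of the kind of estimate these tools provide, the following minimal sketch implements the classical Cockcroft-Gault equation, which estimates creatinine clearance (in mL/min) as a proxy for the GFR. The function and parameter names are hypothetical, and body weight is an input required by the equation that is not among the dataset features discussed in this article.

```python
def cockcroft_gault(age_years, weight_kg, serum_creatinine_mg_dl, female):
    """Estimate creatinine clearance (mL/min) with the Cockcroft-Gault equation.

    Names and units are illustrative assumptions; serum creatinine is
    expected in mg/dL and weight in kg.
    """
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")
    clearance = ((140 - age_years) * weight_kg) / (72 * serum_creatinine_mg_dl)
    # The equation applies a 0.85 correction factor for female subjects.
    return clearance * 0.85 if female else clearance


if __name__ == "__main__":
    # Fictitious subject used only to exercise the function.
    print(cockcroft_gault(age_years=60, weight_kg=72, serum_creatinine_mg_dl=1.0, female=False))
```

The sketch does not replace a clinical calculator; it only shows how few inputs such an estimate needs.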
EHRs are composed of large amounts of data, enabling analysis empowered by machine learning techniques to assist physicians during clinical evaluations [18]. Unfortunately, some remote settings, e.g., the Amazon Jungle, are subject to precarious public health and the absence of EHR, sometimes even facing the lack of primary care physicians.
In this study, we present a decision support system (DSS) methodology and propose an intelligent web-based DSS to assist patients, primary care physicians, and the government in identifying and monitoring CKD in Brazilian communities. The system, named MultCare, may be integrated with existing government systems, e.g., the Brazilian SUS, to address increased mortality rates and public health costs. By providing continuous and remote monitoring, the DSS methodology addresses precarious public health, the absence of EHRs, and the lack of primary care physicians. The study extends the results of previous research articles [19,4] with two main new contributions: (i) an intelligent DSS available for patients, physicians, and government to assist CKD monitoring using machine learning and knowledge-based system concepts; and (ii) a methodology to design DSS for identifying and monitoring chronic diseases in Brazilian communities. These contributions represent state-of-the-art advances, considering that the results presented in [19] do not consider machine learning, knowledge-based system concepts (knowledge modeling of a Brazilian expert), and functionalities for physicians and government. Besides, the results presented in [4] only relate to a comparative analysis of classifiers.
Nowadays, considering the increasing need for social distancing required by epidemics such as COVID-19, the remote identification and monitoring of chronic diseases are relevant to prevent increasing morbidity and mortality rates due to possible infections and lack of treatment for existing conditions. Healthcare systems designed for these purposes are powered by artificial intelligence techniques such as machine learning [20].
Decision support frameworks and systems have received the attention of researchers in the last years. For instance, Li et al. [21] present a utilization-based knowledge discovery framework to assist the assessment of healthcare access, meaning the presence of potential resources or actual use of healthcare services. The authors evaluate the framework by instantiating a DSS to analyze physician shortage areas. Hsu [22] describes a framework based on a ranking and feature selection algorithm to assist physicians' decision-making on the most relevant risk factors for cardiovascular diseases. The author also applies machine learning techniques to enable identifying the risk factors.
Walczak and Velanovich [23] developed an artificial neural network (ANN) system to assist physicians' and patients' decision-making in selecting the optimal treatment of pancreatic cancer. The system determines the 7-month survival or mortality of patients based on a specific treatment decision. Topuz et al. [24] propose a decision support methodology guided by a Bayesian belief network algorithm to predict kidney transplantation's graft survival. The authors use a database with more than 31,000 U.S. patients and argue that the methodology can be reused with other datasets.
Wang et al. [25] evaluate a murine model, induced by intravenous Adriamycin injection, using optical coherence tomography (OCT) to assess CKD progression from images of rat kidneys. The authors highlight that OCT images contain relevant data about kidney histopathology. Jahantigh, Malmir, and Avilaq [26] propose a fuzzy expert system to assist medical diagnosis, focusing initially on kidney diseases. The system is guided by the experience of physicians to indicate disease profiles. Neves et al. [27] present a DSS to assist in identifying acute kidney injury and CKD using knowledge representation and reasoning procedures based on logic programming and ANN. Polat et al. [28] used the support vector machine technique and two feature selection methods, wrapper and filter, to identify CKD in the early stages. The authors justify the computer-aided diagnosis based on the high mortality rates of CKD. Finally, Arulanthu and Perumal [29] presented a DSS for CKD prediction (CKD or non-CKD) using a logistic regression model.
However, these CKD studies have some limitations. For example, none of them considers the monitoring of CKD risk factors. Additionally, the solutions do not apply well-accepted standards to simplify the representation and sharing of evaluation results, such as the Health Level 7 (HL7) clinical document architecture (CDA) [30]. Other relevant topics to point out are the machine learning technique used to identify the disease and the costs of required examinations (predictors).
Most of the studies use a large number of predictors and apply complex analyses, increasing costs and making it difficult for physicians to double-check DSS results. Indeed, this type of functionality is relevant because other clinical conditions influence CKD, and the diagnosis usually improves when physicians collaborate to reach a conclusion. The previous studies also do not consider integrating functionalities that focus on both patients and physicians.

Data analysis
We primarily selected the dataset features based on medical guidelines, specifically, the KDIGO guideline [31], the National Institute for Health and Care Excellence guideline [32], and the KDOQI guideline [33]. Besides, we interviewed a set of Brazilian nephrologists to confirm the relevance of the features in the context of Brazil. The final set of CKD features focusing on Brazilian communities included AH, DM, creatinine, urea, albuminuria, age, gender, and GFR. Table 1 presents descriptions and types of features of the dataset.

Feature | Type | Description
Crea | Real | Result of blood test used to assess kidney function.
Urea | Real | Result of blood test of a substance produced (mainly) by the liver.
Albu | Real | Result of a test to measure the amount of albumin in the urine.
Age | Integer | The age of the subject.
Gender | Integer | The number 0 represents male, while 1 represents female.
GFR | Real | Result of the glomerular filtration rate of the subject.

In a previous study [19], we collected medical data (60 real-world medical records) from physical medical records of adult subjects (age ≥ 18) under treatment at the University Hospital Prof. Alberto Antunes of the UFAL, Brazil. The data collection from medical records maintained in a non-electronic format at the hospital was approved by the Brazilian ethics committee of UFAL and conducted between 2015 and 2016. The dataset comprises 16 subjects with no kidney damage, 14 subjects diagnosed only with CKD, and 30 subjects diagnosed with CKD, AH, and/or DM. In general, the sample included subjects with ages between 31 and 79 years; approximately 94.5% of the subjects were diagnosed with AH, and 58.82% were diagnosed with DM. Table 2 presents a sample of the 60 real-world medical records, related to the four risk classes: low risk (30 records), moderate risk (11 records), high risk (16 records), and very high risk (3 records). An experienced nephrologist, with more than 30 years of CKD treatment and diagnosis in Brazil, labeled the risk classification based on the KDIGO guideline. The dataset did not contain duplicated or missing values. We only translated the dataset to English and converted the gender of subjects from a string to a binary representation to enable the usage of the J48 decision tree algorithm. In addition, only for the training set, we augmented the dataset (54 additional records) to decrease the impact of imbalanced data and improve the data analysis by duplicating real-world medical records and carefully modifying the features, i.e., increasing each CKD biomarker by 0.5. We selected the constant 0.5 with no other purpose than to differentiate the instances while keeping each new instance with the same label as the original. The perturbation of the data did not result in unacceptable ranges of values or incorrect labeling. The total number of records consisted of 108 for training and 6 for testing.
The experienced nephrologist verified the validity of the augmented data by analyzing each record regarding the correct risk classification (i.e., low, moderate, high, or very high risk). As stated above, the experienced nephrologist also evaluated the 60 real-world medical records. The preprocessed original and augmented datasets are available in a public repository [34].
To increase confidence in the system, after development, we conducted tests using the test set and performance metrics (i.e., correctly classified instances, incorrectly classified instances, precision, Precision-Recall Curve (PRC) area, and Receiver Operating Characteristic (ROC) area). We used the augmented training set to define a decision tree model based on the J48 algorithm. To evaluate the model, we applied 10-fold cross-validation using the Weka© software. We used the Knowledge Flow interface of Weka© to handle the 10 folds during the data augmentation to ensure that the test set only contained unseen real data. Afterward, we used the risk assessment model and the Weka© application programming interface for Java to develop the DSS, extending the results presented in [19].
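The augmentation step described above can be sketched as follows. This is a minimal illustration, not the procedure used to build the published dataset: the field names and the choice of which columns count as CKD biomarkers are assumptions, and the sketch does not reproduce the manual validation performed by the nephrologist.

```python
# Columns treated as CKD biomarkers; this selection is an assumption.
BIOMARKERS = ("creatinine", "urea", "albuminuria", "gfr")
OFFSET = 0.5  # constant reported in the text


def augment(records, biomarkers=BIOMARKERS, offset=OFFSET):
    """Return perturbed copies of the records, keeping all other fields (including the label)."""
    augmented = []
    for record in records:
        clone = dict(record)  # copy so the real record is left untouched
        for name in biomarkers:
            clone[name] = record[name] + offset
        augmented.append(clone)
    return augmented


if __name__ == "__main__":
    # Fictitious record used only to exercise the function.
    real = [{"creatinine": 1.0, "urea": 40.0, "albuminuria": 30.0,
             "gfr": 85.0, "age": 50, "gender": 0, "risk": "low"}]
    print(augment(real))
```

Because the label is copied unchanged, each synthetic record still needs the range and labeling checks mentioned in the text.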
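The requirement that test folds contain only unseen real data can be illustrated with the sketch below, which derives synthetic copies from the training part of each fold only. It is a plain-Python approximation under stated assumptions, not the Weka© Knowledge Flow configuration used in the study: the helper names are hypothetical, and the folds are neither shuffled nor stratified.

```python
def leak_free_folds(real_records, k, augment):
    """Yield (train, test) pairs where synthetic copies come from training records only."""
    if not 2 <= k <= len(real_records):
        raise ValueError("k must be between 2 and the number of records")
    for fold in range(k):
        test = [r for i, r in enumerate(real_records) if i % k == fold]
        train = [r for i, r in enumerate(real_records) if i % k != fold]
        # Augment after the split, so no copy of a test record reaches training.
        yield train + augment(train), test


def shift_creatinine(records, offset=0.5):
    """Hypothetical augmentation used only for this illustration."""
    return [{**r, "creatinine": r["creatinine"] + offset, "synthetic": True}
            for r in records]


if __name__ == "__main__":
    records = [{"id": i, "creatinine": 1.0 + i, "synthetic": False} for i in range(20)]
    for train, test in leak_free_folds(records, k=10, augment=shift_creatinine):
        print(len(train), len(test))
```

Augmenting before splitting would let a perturbed copy of a test record appear in the training fold, which tends to inflate the measured accuracy.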

Simulated scenarios and interviews
Also extending the results presented in [19], we used concepts of knowledge-based systems to design a functionality that checks whether a patient faces an emergency of hyperglycemia, hypoglycemia, hyperkalemia, or hypokalemia. We defined a knowledge base by analyzing medical guidelines and interviewing a nephrologist with more than 30 years of teaching and treating patients with CKD and DM. We interviewed the experienced nephrologist to evaluate the knowledge-based system after developing the DSS. We simulated a total of 112 scenarios (i.e., fictitious subjects) considering the knowledge base designed and the risk of hyperglycemia, hypoglycemia, hyperkalemia, and hypokalemia.
To conduct the evaluation, we measured Cohen's kappa statistic by calculating the gross agreement and the kappa concordance index with 95% confidence intervals, without adjusting for bias and prevalence. The analysis considers the following k indices: k < 0, no agreement; k between 0 and 0.19, poor agreement; k between 0.20 and 0.39, low agreement; k between 0.40 and 0.59, moderate agreement; k between 0.60 and 0.79, substantial agreement; and k between 0.80 and 1, nearly perfect agreement. We used Cohen's kappa statistic to compare the knowledge-based system's results with the nephrologist's evaluation when considering all the simulated scenarios.
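For reference, the unadjusted Cohen's kappa and the interpretation bands listed above can be sketched as follows. The function names are hypothetical, the band boundaries are implemented as half-open intervals (an assumption, since the text leaves small gaps such as 0.19 to 0.20), and the 95% confidence interval is not computed here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unadjusted Cohen's kappa for two equally long sequences of labels."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must label the same non-empty set of cases")
    # Gross (observed) agreement: share of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from the marginal label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map a kappa value to the agreement bands used in this article."""
    if kappa < 0:
        return "no agreement"
    for upper, label in ((0.20, "poor agreement"), (0.40, "low agreement"),
                         (0.60, "moderate agreement"), (0.80, "substantial agreement")):
        if kappa < upper:
            return label
    return "nearly perfect agreement"


if __name__ == "__main__":
    system = ["refer", "refer", "stay", "stay"]        # fictitious DSS outputs
    nephrologist = ["refer", "stay", "stay", "stay"]   # fictitious expert opinions
    k = cohen_kappa(system, nephrologist)
    print(k, agreement_label(k))
```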

DSS methodology
Fig. 1 shows a schema of the proposed methodology to design DSS for identifying and monitoring CKD in Brazilian communities. Three actors interact with the DSS generated following the methodology: Physician (internal), Patient (internal), and Government Health System (external). This type of methodology is relevant because developing countries such as Brazil usually suffer from precarious primary health care in specific settings, e.g., hard-to-reach and rural settings.

System for patients
We divided this methodological step into two main tasks, which may be conducted simultaneously, for designing the front-end and back-end of web-based systems used by patients. The system should contain personal health record (PHR) and risk assessment functionalities. The risk assessment of the monitored chronic disease is based on the machine learning technique of decision tree analysis. A decision tree is suitable for this type of DSS because it is a white-box analysis approach, enabling physicians to quickly double-check the risk assessments produced by the patient's system. In a previous study [4], we compared existing risk assessment models, showing the suitability of decision tree models for the context of developing countries. Once the patient's system evaluates the user's clinical situation, it sends a clinical document, structured using the HL7 CDA, to the physician responsible for monitoring the patient. The HL7 CDA document is an XML file that contains the risk analysis data, a risk analysis decision tree, and the PHR. Because the system is web-based, patients can use it in remote and hard-to-reach settings on different devices, such as desktop computers, smartphones, and tablets. The computer-based risk assessment is used to address precarious public health problems, lack of EHRs, and lack of primary care physicians related to remote and hard-to-reach settings in developing countries, e.g., reaching people who live in the Brazilian state of Amazonas. The PHR is continuously sent to a central data server to update the patient's medical records.

System for physicians
We defined two main tasks for this methodological step, which may be conducted simultaneously, to design the front-end and back-end of web-based systems used by physicians. The system should enable physicians to receive CDA documents from the patient's system, double-check the chronic disease risk assessment, and conduct the final diagnosis using the risk analysis data, risk analysis decision tree, and PHR. The decision tree is relevant for physicians to perform a step-by-step verification of the initial risk assessment. From these initial data, in case of uncertainty about the diagnosis, the physician may use the system to include more specific tests in the CDA document and send it to other physicians to get second opinions until a more precise diagnosis is reached. When the physician reaches a conclusion, the system updates the patient's medical records maintained in the clinical data server. This system is relevant to enable the remote evaluation (i.e., a simple analysis of risk assessment results provided by the model) of people who live in remote and hard-to-reach settings.

Government Health System
Considering the maintenance of PHR, the continuous usage of the system by patients and physicians results in a large and centralized dataset of users under monitoring in remote and hard-to-reach settings. The subsystems handled by patients and physicians use a server subsystem's web services to update the local data into the centralized dataset. External government health systems can benefit from the centralized dataset by applying machine learning techniques, generating relevant information for planning public policies, e.g., conducting disease awareness campaigns for preventing chronic disease, focusing on settings that present high incidence.

WebMultCare
The proposed system is composed of three main subsystems: Patient, Medical, and Server. The Patient subsystem is composed of functionalities to handle, among other elements, glucose and blood pressure sensors, acquiring data related to DM and AH. These data are recorded locally and sent to a database by the Server subsystem. When the CKD risk is identified, alerts and the patient's data are sent to the Medical subsystem, which is used by a physician in a healthcare environment. Thus, the Medical subsystem enables physicians to analyze the risk analysis data and the patient's PHR, updating or confirming the patient's clinical condition under monitoring using the Server subsystem. As an example of a health system, the Brazilian SUS can reuse the patient's central data for planning public policies.
The architecture of WebMultCare was defined following the attribute-driven design method [35], guided by the architectural drivers modifiability, portability, scalability, availability, and interoperability. The system is based on the model-view-controller (MVC) pattern and the architectural tactics called semantic coherence and information hiding to achieve modifiability and portability. In turn, we use the client-server architectural pattern and web services to improve scalability, availability, and interoperability.

Patient subsystem
A previous Android version of the Patient subsystem was presented in [19], including formal specifications, effectiveness evaluation, and usability tests. The usability tests showed some limitations that motivated the re-engineering of the subsystem based on web technologies. Additionally, the version presented in this article improves the CKD risk analysis using machine learning and knowledge-based system concepts. For instance, the system provides a new feature that uses a knowledge base to refer patients with specific emergencies (i.e., hyperglycemia, hypoglycemia, hypokalemia, and hyperkalemia) to an adequate healthcare facility when visiting an unknown location.
The back-end of the Patient subsystem was implemented using Java and web services. The subsystem comprises the following main features: access control, management of ingested drugs, management of allergies, management of examinations, monitoring of AH and DM, execution of risk analysis, generation and sharing of CDA documents, and analysis of the emergency. The front-end of the Patient subsystem is implemented using HTML 5, Bootstrap, JavaScript, and Vue.js. Fig. 2a illustrates the graphical user interface (GUI) for recording a new CKD test result (the main inputs for the risk assessment model). The user can also upload an XML file containing the test results to avoid a large number of manual inputs. Once the patient provides the current test results, the main GUI of the Patient subsystem is updated, showing the test results available for the risk assessment. Fig. 2b illustrates the main GUI of the Patient subsystem, describing the creatinine, urea, albuminuria, and GFR (i.e., the main features used by the risk assessment model). This study reduces the number of required test results to conduct the CKD risk analysis from 5 to 4 compared to the previously published research [19]. This is critical for low-income populations using the Patient subsystem. The subsystem provides a new CKD risk analysis when the patient inputs all CKD features.
During the CKD risk analysis (conducted when all tests are available), and also based on the presence/absence of DM, presence/absence of AH, age, and gender, the J48 decision tree algorithm classifies the patient's situation considering four classes: low risk, moderate risk, high risk, and very high risk. In case of moderate risk, high risk, or very high risk, the subsystem packages the classification results as a CDA document, along with the decision tree graphic and general data of the patient. The Patient subsystem alerts the physician responsible for the patient and sends the complete CDA document (i.e., the main output of the DSS) for further clinical analysis. In case of low risk, the Patient subsystem only records the risk analysis results to keep track of the patient's clinical situation and does not send the physician alert. This automates the risk analysis and sharing, which previously had to be requested by the users through button events [19]. In this article, the data of 114 records, available in the same CKD dataset used in [4], guided the training of the J48 decision tree algorithm to define the final risk assessment model embedded in the proposed DSS. We experimented with modifying the parameters of the J48 decision tree algorithm to improve accuracy.
Thus, we configured the split point, preventing the scanning of the entire dataset for the closest data value (relocation). For the remaining parameters, we used the default values of the J48 Weka© package.
Results presented in a previous study [4] justify the usage of the J48 decision tree algorithm and features (i.e., presence/absence of DM, presence/absence of AH, creatinine, urea, albuminuria, age, gender, and GFR) to conduct risk analyses in developing countries. The physician responsible for the healthcare of a specific patient can remotely access the CDA document through the Medical subsystem, re-evaluate or confirm the risk analysis (i.e., preliminary diagnosis) provided by the Patient subsystem, and share the data with other physicians to get second opinions. If the physician confirms the preliminary diagnosis, the patient can continue using the Patient subsystem to prevent CKD progression, including the monitoring of risk factors (DM and AH), CKD stage, and risk level.
Besides, the Patient subsystem includes a knowledge-based system to refer the patient with CKD and risk factors to an adequate healthcare facility in an emergency, as another new contribution with respect to [19]. This feature considers the scenario in which the patient is outside his/her county and does not know the correct facility for treatment, according to the current health situation. Based on semi-structured interviews with an experienced nephrologist who has treated patients in Brazil for more than 30 years, we addressed the following topics: (i) possible emergency care locations; (ii) pathology to be identified; (iii) symptoms; and (iv) associated drugs. For hyperglycemia, hypoglycemia, hyperkalemia, and hypokalemia, the system can refer the patient to emergency care units (ECU) or hospital emergencies, based on the patient's current health condition.
The knowledge base defined for the knowledge-based system comprises data collected from medical guidelines and semi-structured interviews with the nephrologist. The data relate to the symptoms that patients may present and risk factors that can cause health conditions (e.g., specific drugs). Fig. 3 describes the first decisions used to identify the risk of hyperglycemia, hypoglycemia, hyperkalemia, or hypokalemia. Nausea is a symptom shared by all clinical conditions, and each additional symptom helps identify a specific condition.
In addition to the symptoms, the excessive consumption of alcohol and an excessive quantity of insulin may increase the risk of hypoglycemia. For all clinical conditions, the usage of specific drugs may also result in the clinical conditions considered. The possible ingestion of a drug is a relevant indication of the risk of a specific clinical condition. Fig. 4 describes the commonly ingested drugs that may lead to hyperglycemia, hypoglycemia, hyperkalemia, and hypokalemia. Fig. 5 illustrates, as a tree, a summary of the relationships between the questions presented in the DSS, considered during the identification of hyperglycemia (left side of Fig. 3). We generated the tree from the knowledge base to present an overview of a sample of the knowledge-based system that composes the DSS. Hyperglycemia is a common clinical condition in patients who have DM. The rule base for hypoglycemia, hyperkalemia, and hypokalemia is defined similarly to the example of Fig. 5, differing by specific tests, symptoms, and ingested drugs. Finally, Fig. 6 shows a view of the GUI of the knowledge-based system in a risk scenario of hyperglycemia. Whenever facing an emergency, the patient can provide information about his/her current clinical condition, enabling the DSS to identify the emergency and recommend a healthcare unit (another output of the DSS). In this case, after asking about specific symptoms, the system requires the patient to inform whether he/she ingested some drugs, to increase confidence in the evaluation, following the relationships presented in Fig. 5.
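The routing behavior described above (low risk is only recorded, while moderate or higher risk triggers the CDA document and the physician alert) can be summarized with the following minimal sketch. The action names are hypothetical placeholders and do not correspond to functions of the Patient subsystem.

```python
RISK_LEVELS = ("low", "moderate", "high", "very high")


def route_assessment(risk_level):
    """Return the ordered list of actions triggered by a risk classification."""
    if risk_level not in RISK_LEVELS:
        raise ValueError(f"unknown risk level: {risk_level!r}")
    actions = ["record_result"]  # every assessment is kept in the patient's history
    if risk_level != "low":
        # Moderate or higher risk: package the results and notify the physician.
        actions += ["build_cda_document", "alert_physician"]
    return actions


if __name__ == "__main__":
    for level in RISK_LEVELS:
        print(level, route_assessment(level))
```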

Medical subsystem
The back-end of the Medical subsystem is implemented using Java, the Spring MVC framework, and Drools (a business rules management system). The subsystem comprises the following main features: (i) access control; (ii) management of CDA documents; (iii) version control of CDA documents; (iv) sharing of CDA documents; (v) history of CDA document versions; and (vi) re-evaluation of risk analysis. The front-end of the Medical subsystem is implemented using HTML 5, CSS, JavaScript, Bootstrap, and Java server pages. After validating his/her credentials, the system directs the doctor to the main GUI, displaying a brief presentation of the available features to handle clinical documents.
Two scenarios guide the usage of the Medical subsystem: creating a new clinical document and evaluating an existing clinical document. Fig. 7a illustrates the feature of creating a new clinical document, which enables physicians to start evaluating a patient without depending on data received from the Patient subsystem. Fig. 7b illustrates that physicians are requested to provide the risk assessment for patients guided by the classifications proposed in well-accepted international medical guidelines. In contrast, the evaluation of an existing clinical document relies on data received from the Patient subsystem, which performs the risk assessments of patients. The remote monitoring feature is relevant to address precarious public health, the absence of EHRs, and the lack of primary care physicians in Brazilian communities. If the Patient subsystem identifies a moderate risk, high risk, or very high risk, physicians receive general data and the risk assessment conducted using the J48 decision tree algorithm, enabling the final evaluation or the interaction with other physicians to improve confidence in a suspicious clinical situation. When clinical documents are available, physicians can use version control to access current and past documents; version control helps keep track of the history of clinical evaluations of patients. The re-evaluation of only a subset of patients (those referred by the patient's system) can reduce the burden (or inefficiency) of public health.
A real-time database supports the central data server, assisting data analysis by patients, physicians, and the government. The Patient and Medical subsystems use web services provided by the Server subsystem to update the PHR of patients as part of the medical records available in a healthcare facility.
Therefore, the government can conduct data mining, which is relevant to enable the analysis of a large amount of data to support the planning and execution of public health policies. For example, it is possible to identify locations that require educational activities to prevent worsening mortality rates.

Machine Learning Model
A CKD dataset guided the evaluation of the patient's subsystem of the DSS when conducting the CKD risk assessment according to low risk, moderate risk, high risk, and very high risk. The data co[...]'s subsystem. When using the 10-fold cross-validation, the model presented high accuracy (i.e., 95.00%). The 10-fold cross-validation was executed 5 times, showing stability. The J48 decision tree presented a precision of 0.97, ROC area of 0.96, and PRC area of 0.94.

Knowledge-based system
The complete system was also presented to the experienced nephrologist, confirming the completeness of the requirements. For instance, the knowledge base and questionings were presented to a nephrologist with more than 30 years of teaching and treating patients with CKD and DM to evaluate the knowledge-based system as part of the DSS. The nephrologist reviewed the knowledge base and questionings, validating the final version of the DSS. The nephrologist analyzed simulated data to evaluate the risk of hyperglycemia, hypoglycemia, hyperkalemia, and hypokalemia. The same data was analyzed using the DSS to compare risk assessments. Table 4 presents a sample of the total 112 simulated scenarios used to evaluate the knowledge-based system, covering 7 of the 21 paths of Fig. 5. This sample relates to fictitious subjects with a risk of hyperglycemia. To conduct the evaluation, Cohen's kappa statistic was measured by calculating the gross agreement and the kappa concordance. This task consisted of two evaluation steps with the experienced nephrologist using Cohen's kappa. In the first step (Table 5), the knowledge-based system only achieved substantial agreement (k = 0.6821) and moderate agreement (k = 0.5962) with the nephrologist for risk classification and referral, respectively. The main cause of the disagreement was that the nephrologist considered some of the scenarios of hyperkalemia and hypokalemia risks as inconclusive (Table 7, column 2). In the second step, the knowledge base was corrected, and the evaluation resulted in 100% concordance with the opinion of the experienced nephrologist.
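For readers unfamiliar with the statistic, the sketch below shows how the gross (observed) agreement and Cohen's kappa can be computed for two raters, here the knowledge-based system and the nephrologist. The function names are hypothetical, and the verbal scale in `interpret_kappa` is the Landis and Koch scale; the text does not name the scale it uses, although its "substantial" and "moderate" labels are consistent with it.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed (gross) agreement: share of cases with identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label for every case
    return (p_observed - p_expected) / (1.0 - p_expected)

def interpret_kappa(kappa):
    """Verbal label on the Landis and Koch scale (assumed, see lead-in)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

With this scale, `interpret_kappa(0.6821)` yields "substantial" and `interpret_kappa(0.5962)` yields "moderate", matching the labels reported for the first evaluation step.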

Discussion
During monitoring of the CKD, based on the decision tree model and assuming the previous evaluation of DM, a user only needs to periodically conduct two blood tests: creatinine and urea. Albuminuria is measured using a urine test, while GFR can be calculated with the Cockcroft-Gault equation. The reduced number of examinations is relevant for developing countries such as Brazil, given the high levels of poverty of the population. However, it is also relevant to discuss the impacts of the reduced number of features during the training and testing phases. Table 6 describes the incorrectly classified instances identified when testing the decision tree model. For example, the decision tree model disagreed with the most experienced nephrologist, stating moderate risk instead of very high risk for the patient with ID 59. However, the model did not make any critical underestimations of the risk situation of subjects (e.g., low risk instead of moderate risk). Such an underestimation would be a critical problem because a patient is usually referred to a nephrologist when there is a moderate or higher risk. The remaining misclassifications are less harmful because they still result in the referral of the patient under evaluation, even when slightly overestimating or underestimating the risk. Besides the use of a reduced number of features and the absence of critical underestimations, another advantage of a decision tree model is the easy interpretation of results. The easy interpretation of CKD risk analyses by nephrologists and primary care physicians, who need to conduct further examinations to confirm the clinical situation of a patient, is a critical factor for the reuse of the model in real-world situations. For example, Fig. 8 shows the decision tree generated by the J48 decision tree algorithm, which comprises each CKD biomarker considered and the related classification defined by the Weka© software.
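As an illustration of the Cockcroft-Gault equation mentioned above, the sketch below uses the standard published form, which estimates creatinine clearance as a proxy for GFR. The function name, the parameter names, and the choice of units (serum creatinine in mg/dL, weight in kg) are assumptions; the text does not specify how the DSS implements the calculation.

```python
def cockcroft_gault(age_years, weight_kg, serum_creatinine_mg_dl, is_female):
    """Estimated creatinine clearance in mL/min (standard Cockcroft-Gault form)."""
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")
    clearance = ((140 - age_years) * weight_kg) / (72 * serum_creatinine_mg_dl)
    # The standard equation applies a 0.85 correction factor for women.
    return clearance * 0.85 if is_female else clearance
```

For a 60-year-old man weighing 72 kg with a serum creatinine of 1.0 mg/dL, the function returns 80 mL/min; for a woman with the same values, it returns 68 mL/min.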
A physician only follows the decisions to interpret the logic of the classification. From the 8 CKD features, the J48 decision tree used only 4 to classify the risk (i.e., DM, GFR, albuminuria, and age), requiring 1 blood test and 1 urine test when DM is already evaluated, at the cost of 4 incorrectly classified instances. The DM (1), GFR (2), and albuminuria (3) features had the same prediction power for the experienced nephrologist and the decision tree model, considering a scale ranging from 1 (highest priority) to 8 (lowest priority); however, the nephrologist prioritized and used the creatinine (4), urea (5), gender (6), AH (7), and age (8) features.
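The distinction drawn in this discussion between critical underestimations and less harmful misclassifications can be made explicit with a small check. The sketch below assumes the four risk levels are ordered and that referral starts at moderate risk, as stated in the text; the constant and function names are hypothetical.

```python
# Ordered from least to most severe, following the four classes in the text.
RISK_LEVELS = ["low risk", "moderate risk", "high risk", "very high risk"]
# Per the discussion, referral to a nephrologist starts at moderate risk.
REFERRAL_THRESHOLD = RISK_LEVELS.index("moderate risk")

def needs_referral(risk):
    """True when the risk level is moderate or higher."""
    return RISK_LEVELS.index(risk) >= REFERRAL_THRESHOLD

def classify_error(expected, predicted):
    """Compare a predicted risk level with the expert's expected level."""
    e, p = RISK_LEVELS.index(expected), RISK_LEVELS.index(predicted)
    if p == e:
        return "correct"
    if p > e:
        return "overestimation"
    # Underestimation is critical only when it cancels a needed referral.
    if needs_referral(expected) and not needs_referral(predicted):
        return "critical underestimation"
    return "underestimation"
```

Applied to the example of patient ID 59, `classify_error("very high risk", "moderate risk")` returns "underestimation" rather than "critical underestimation", because the patient would still be referred.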
In addition to the machine learning model, testing all the 112 simulated scenarios of the knowledge-based system (as part of the DSS) covered all possible paths from initial to end nodes of the decision tree (e.g., Fig. 5) representing the complete knowledge base, increasing confidence in the defined rules.
Evaluating the knowledge-based system was relevant to increase confidence in the DSS when conducting the risk evaluation of a user with hyperglycemia, hypoglycemia, hyperkalemia, or hypokalemia. This type of functionality can decrease the negative impacts on the health condition of patients during emergency situations. For example, carrying out some palliative control measures during the transit to a health care facility may reduce the impact of low glucose levels in a hypoglycemia emergency.
However, the evaluation of the DSS has some limitations. The size of the dataset used to evaluate the CKD risk is limited. The k-fold cross validation assisted in reducing the impact of this limitation. Besides, the current evaluation of the knowledge-based system did not consider the opinions of patients. The interviews with the experienced nephrologist were relevant to decrease the negative impacts of this limitation.
As future work, the evaluation of the DSS needs to be improved. Testing the risk assessment requires improving the number of subjects in the CKD dataset.
Besides, testing the knowledge-based system also requires interviews with patients to evaluate the GUI design, considering usability and user perceptions. We also plan to conduct a clinical study on the usage of the system to compare a group that used the system with a control group.

Conclusions
The intelligent web-based DSS presented in this article helps patients, physicians, and the government identify and monitor the CKD and its risk factors. We evaluated the DSS using a CKD dataset and interviews with an experienced nephrologist. In addition, the proposed DSS methodology facilitates the identification and monitoring of the CKD, considering low-income populations in Brazil that usually suffer from lacking or precarious primary care. Nowadays, the remote identification and monitoring of chronic diseases are even more relevant, considering epidemics that prevent face-to-face assistance.

Summary of first decisions to identify the risk of hyperglycemia, hypoglycemia, hyperkalemia, or hypokalemia.

Figure 4
Drugs that may cause hyperglycemia, hypoglycemia, hyperkalemia, and hypokalemia, collected by analyzing medical guidelines and interviews with a nephrologist.

Figure 5
Relationships between decisions used to identify the risk of hyperglycemia.

Figure 8
Decision tree for CKD risk evaluation made by J48 algorithm.