Knowledge Elicitation Using the Delphi Technique in Developing Diagnosis Systems

Abstract – Knowledge elicitation is important in designing knowledge-based diagnosis systems. Various approaches, such as interviews and questionnaires, have been used to elicit knowledge from experts, but these approaches elicit knowledge from individual experts separately. Medical practitioners have diverse knowledge and experience in the diagnosis and management of a particular disease. A major challenge lies in producing a harmonised diagnosis from different practitioners, one that reflects the level of agreement among them on the treatment of Sickle Cell Disease (SCD). Therefore, it is important to elicit and integrate knowledge from different medical practitioners in developing an effective diagnosis system. Thus, the Delphi technique was employed in this study to elicit domain knowledge for developing SCD diagnosis systems in African Traditional Medicine (ATM), since there is no gold standard for achieving diagnosis in ATM. A kappa value of 0.487 was achieved, which implies that the herb sellers moderately agree in the ranking of the SCD symptoms. Therefore, to build an effective SCD diagnosis system, further work should conduct more Delphi rounds to ensure that a high level of consensus is reached. The Delphi technique used in this study supported requirement elicitation for SCD diagnosis in ATM, which could be used in the development of an SCD diagnosis system.


I. INTRODUCTION
A clinical diagnosis system is a decision support system that helps clinicians in diagnosing patients' clinical conditions, making the correct judgment, and reducing the rate of diagnostic errors [1]. Diagnosis is a very important aspect of the health care system, and it is vital for treating patients. Diagnostic systems require eliciting adequate domain knowledge from experts. Most of the time, domain knowledge is elicited from experts individually. However, experts have diverse experiences and knowledge on the same subject matter. It is, therefore, important to elicit and integrate knowledge from various experts in designing an effective expert system. This can be achieved through the Delphi method, which is a well-known method for gaining consensus using rounds of questionnaires [2]. During the first round of a Delphi survey, open-ended questions are typically used to increase the richness of the information gathered. The purpose of this round is to identify problems that need to be addressed in later rounds. The succeeding rounds are then built on the first-round outcomes and are more precise, using rating or ranking to elucidate previous findings [3]. The number of subsequent rounds varies from one to three depending on the degree of consensus reached [2].
Traditional medicine is becoming more popular worldwide, with developing nations having the highest prevalence of its use [4]. It is estimated that 85 % of Nigerians use traditional medicine for social, psychological, and medical purposes [5]. This is particularly common in treating diseases like epilepsy, asthma, and Sickle Cell Disease (SCD) [4]. The occurrence of SCD is high among Africans and African Americans, with Nigeria having the highest number of cases in the world [6]. A significantly high number of people in Nigeria use herbal medicines obtained from Herb Sellers (HS) for their health maintenance [7]. Unlike orthodox medicine, where laboratory investigations are carried out to ascertain the cause of a particular disease, diagnosis in African Traditional Medicine (ATM) is mainly by observation, visual examination, and history taking [8]. Thus, the Delphi technique was employed in this study to elicit domain knowledge about SCD diagnosis in ATM.
This paper seeks to answer three research questions (RQs):
RQ1: To what extent do the HS agree on the ranking of SCD symptoms using the Delphi technique?
RQ2: How can we ensure the building of an effective SCD diagnosis system?
RQ3: What are the implications of Delphi techniques in developing an SCD diagnosis system?
Delphi's use in resolving issues in healthcare is well known. Several studies have described how the Delphi method has been used to manage and assign final diagnoses to several diseases. For example, the study in [9] used a Delphi survey to gather information on the management and assessment of people with atopic dermatitis. The Delphi study consisted of two rounds, and consensus was reached on 36 of the 46 items in the questionnaire. To provide recommendations for children's dyslexia diagnosis and assistance, the study in [10] carried out a three-round Delphi process. Forty-eight mothers and one stepmother made up the panel. Using open-ended, qualitative questions via an online survey, the first round of Delphi concentrated on idea generation. After themes were compiled and developed from the qualitative responses of the first round, parents were asked to rank the various issues in the second round. The panellists were then invited to review their rankings and the mean group rankings from the previous round to come to a consensus.
A Delphi survey was performed to reach a consensus on the diagnosis of scabies [11]. The survey consisted of four rounds. An open-ended questionnaire was used to generate a list of symptoms of scabies in the first round. The second round involved ranking these symptoms. In the third and fourth rounds, the outcomes of earlier rounds were shown to the participants. Consensus was reached in the fourth round, and there was feedback between each round. Similarly, some authors carried out a four-round Delphi study to reach consensus on identifying and managing Advanced Parkinson's Disease (APD) [12]. An open-ended questionnaire was used in the first round. The use of both open and closed-ended questions in the second round helped to obtain the true opinion of the experts and to save time. A Delphi study consisting of three rounds was performed to reach a consensus on the diagnosis of APD [13]. The experts ranked 33 items relating to the diagnosis of APD during the first Delphi round. Similarly, the symptoms of visual stress (VS) were ranked during the first round of Delphi [14]. Ranking during the first round might not give the experts the ability to express their honest opinions on the subject matter. A modified Delphi technique with three rounds and three medical experts was used to attain a clinical reference standard in diagnosing Alzheimer's disease [15]. In the third round, the experts had a face-to-face meeting, which increased the level of consensus compared to before the meeting.
A modified version of the Delphi technique was adopted for the development of a clinician-led guideline for diagnosing and treating Hirayama disease [16]. It was completed by 47 panellists from several areas of expertise around the world. Agreements of 75.0 %, 67.0 %, and 50.0 % were reached in the first, second, and third rounds of Delphi, respectively. The modified Delphi begins with closed-ended questions, while the original Delphi begins with open-ended questions. Beginning with closed-ended questions helps reduce the number of rounds.
In some cases, consensus is not reached, as in [17]-[19]. Two factors could be adduced to explain the lack of consensus. First, anonymity was lacking. Second, the first round of Delphi consisted of the ranking phase. Usually, the first round of Delphi should consist of open-ended questions; when closed-ended questionnaires are used instead, consensus may not be achieved [20].

II. MATERIALS AND METHODS
The Delphi study was facilitated by three paediatricians with a minimum of five years of experience in treating patients with SCD. The facilitators supervised the study, formulated the questionnaire, reviewed the responses, summarised responses as feedback for subsequent rounds, and selected the HS. The Delphi technique used in this study consisted of three phases, as shown in Table I. It departed from the conventional Delphi procedure in three respects:
1. Using both open and closed-ended questions to elicit the symptoms of SCD in the first round instead of using open-ended questions alone. This is important to assess the knowledge of HS in the diagnosis of SCD.
2. Narrowing down the list of symptoms by asking the HS to select the important symptoms of SCD, without limiting the number of symptoms that could be selected. This flexibility was necessary to obtain the herb sellers' true opinion of the important SCD symptoms.
3. Ranking the whole list of symptoms instead of ranking only the narrowed-down symptoms. The ranked list was compared to the SCD symptoms selected by the HS in step 2 above. This was used as a way of validating the accuracy of the important (principal) symptoms of SCD.

TABLE I
DESCRIPTION OF THE DELPHI TECHNIQUE USED IN THIS STUDY

Phase 1: Brainstorming
• SCD symptoms were elicited from 120 HS.
• Exact duplicates were removed.

Phase 2: Narrowing Down
• The remaining symptoms were then combined and presented to 33 of the original 120 HS.
• These HS selected the important symptoms observed before concluding that SCD was present in individuals.
• Symptoms selected by over 50 % of the HS were retained and termed the principal symptoms of SCD.
• The HS also gave the conditions that had to hold for SCD to be present in an individual.
• Conditions given by over 50 % of the HS were also retained.

Phase 3: Ranking
• Each HS ranked the list of symptoms.
• A mean rank (weight of symptom) was calculated and assigned to each symptom.
• The level of agreement was assessed using Fleiss' kappa.
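
The Phase 3 weighting step in Table I can be illustrated with a minimal Python sketch that averages the 0-5 scores given to each symptom. The function name `mean_ranks` and the sample scores are hypothetical and are not data from this study; the actual weights are those reported in Table II.

```python
def mean_ranks(ratings):
    """Return the mean score (weight) per symptom.

    ratings: dict mapping a symptom name to the list of 0-5 scores
    given by the respondents (one score per respondent).
    """
    weights = {}
    for symptom, scores in ratings.items():
        if not scores:
            raise ValueError(f"no scores supplied for {symptom!r}")
        if any(not 0 <= s <= 5 for s in scores):
            raise ValueError(f"scores for {symptom!r} must lie in 0-5")
        weights[symptom] = sum(scores) / len(scores)
    return weights


# Hypothetical scores from four respondents (illustration only).
example_ratings = {
    "pain crises": [5, 5, 4, 5],
    "fever": [2, 3, 2, 1],
}
example_weights = mean_ranks(example_ratings)
print(example_weights)
```

Validating the score range up front keeps a data-entry error from silently distorting a symptom's weight.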

A. The First Phase of Data Collection
In the first phase, SCD symptoms were elicited from the HS using interviews guided by questionnaires (open and closed-ended questions). The questions asked on the diagnosis of SCD were centred on the symptoms of SCD. Specifically, the questionnaire was administered to 120 HS in Osun and Oyo States.
In formulating the closed-ended questions, several studies on the diagnosis of SCD were reviewed. This process facilitated the identification of the symptoms associated with SCD. The identified symptoms were used to elicit information on how the diagnosis of SCD was done among HS. The closed-ended questions were administered to 60 HS.
The open-ended questionnaire was also administered to 60 HS. The open-ended questions allowed our respondents to supply responses that were not narrow and did not give an impression of pre-judgment that could lead to bias. For example, questions such as "How do you diagnose SCD?" were asked instead of "What are the symptoms of SCD?". This is especially necessary in the first round of the survey to elicit implicit knowledge from the experts in the field of herbal medicine. The diagnosis process for administering herbal medicine is not structured, and many of the practitioners are not formally educated. Different practitioners have different methods of diagnosis while expecting the same or similar results. Providing a well-structured process required that we first elicit general knowledge about the practice from different experts and thereafter make efforts to give a structure to the process. As a result, closed-ended questions were used to guide respondents toward a structured approach. The questions used were an amalgam of various opinions given by the different practitioners. However, in cases where the disparity in views was wider than expected, either the majority view was taken or another round of survey was done.
One of the problems with this approach is the introduction of bias into the study. Taking the majority view implies that a practice is the best simply because the majority assented to it. This can be resolved by adjusting the iteration and by increasing the sample size. The minority opinion can also be closely examined by drawing on the views of other experts, as well as through discussions with the individuals who gave the minority opinion for clarification. The research must be explicitly objective and allow previous knowledge of the field to guide the research process without interfering with the contents of the result and the conclusions drawn.

B. The Second Phase of Data Collection
In the second round, SCD symptoms from the first round were combined. The exact duplicates were removed, and the list of SCD symptoms to be used in the second phase was then generated, which consisted of 20 symptoms. Interviews were then conducted anonymously with 33 HS from the whole set. These HS were chosen based on two criteria: they had to have more than four years of experience in treating SCD and had to treat a minimum of three patients per month. The reason for these criteria was that, during the first round, most of the HS who fell short of them failed to mention some of the cogent symptoms of SCD stated in the literature. Forty-four HS met these criteria, but only 33 of them participated because the remaining 11 were not willing to participate.
The HS then went through the list of the 20 symptoms generated from the first phase and selected the symptoms that had to be observed before concluding that SCD was present in an individual. The symptoms selected by more than half of the HS were retained and termed the principal symptoms of SCD. The HS also gave several conditions that could hold for SCD to be present in an individual. Similar conditions were grouped, and the conditions given by more than half of the HS were retained. These conditions are:
1. The patient must have a pain crisis and any other principal symptom.
2. The patient must have three of the principal symptoms.
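
The retention rule used in this phase (keep whatever more than half of the respondents selected) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `retain_majority` and the sample selections are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def retain_majority(selections):
    """Return the items chosen by strictly more than half of the respondents.

    selections: list with one collection of chosen items per respondent.
    """
    n_respondents = len(selections)
    # set() ensures a respondent is counted at most once per item.
    counts = Counter(item for chosen in selections for item in set(chosen))
    return sorted(item for item, c in counts.items() if c > n_respondents / 2)


# Hypothetical selections from four respondents (illustration only).
example_selections = [
    {"pain crises", "jaundice", "anaemia"},
    {"pain crises", "anaemia"},
    {"pain crises", "fever"},
    {"jaundice", "pain crises", "anaemia"},
]
principal = retain_majority(example_selections)
print(principal)  # jaundice (2 of 4) is not retained: exactly half is not "more than half"
```

The same function could be applied to the conditions given by the HS, since they were retained under the same majority rule.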

C. The Third Phase of Data Collection
The twenty symptoms used in the second phase were also presented to the HS individually in the third phase. At this phase, the HS were asked to rank these symptoms on a scale of 0-5, that is, from the least important symptom to the most important symptom:
"0" represents "SCD patients do not have such symptoms".
"1" indicates "SCD patients with such symptoms are very few".
"2" represents "SCD patients with such symptoms are few".
"3" denotes "SCD patients with such symptoms are many".
"4" denotes "SCD patients with such symptoms are very many".
"5" denotes "All SCD patients do have such symptoms".

III. RESULTS AND DISCUSSION
The principal symptoms generated during the second phase of the Delphi survey are the main symptoms that the HS look out for to reach a diagnosis. There are 11 of these symptoms: bossed skull, pain crises, anaemia, jaundice, stunted growth, delayed onset of puberty, increase in abdominal girth, swelling of hands/feet, weak hair, black vein, and drum-shaped fingers.
The mean ranks given by the 33 participants during the third phase of the Delphi survey were derived and assigned as weights to these symptoms, as presented in Table II. The symptoms selected in the second phase as the important symptoms of SCD had the highest rankings. This validated the accuracy of the selection process in the second phase. Finally, the level of agreement between the HS was assessed using Fleiss' kappa, which measures the consistency of the ratings when a fixed number of raters assign ratings to several objects. The kappa, $\kappa$, is given as:
$$\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}$$
The denominator, $1 - \bar{P}_e$, measures the extent of agreement achievable above chance, while the numerator, $\bar{P} - \bar{P}_e$, is the extent of agreement that was actually achieved above chance.
The values for computing the kappa value are shown in Table III. The symptoms are listed in rows, while the ranks are listed in columns. Each cell gives the number of HS who assigned the i-th symptom to the j-th rank. First, $P_j$, the proportion of all assignments that were made to the j-th rank, was calculated using the equation:
$$P_j = \frac{1}{N n} \sum_{i=1}^{N} n_{ij}$$
where $N$ is the total number of symptoms being ranked ($N = 20$); $n$ is the number of raters ($n = 33$); $k$ is the number of ranks (the six ranks 0-5); the symptoms have indices $i = 1, \ldots, N$ and the ranks have indices $j = 1, \ldots, k$; and $n_{ij}$ is the number of HS who assigned symptom $i$ to rank $j$. Thereafter, $P_i$, the degree to which the experts agree on the i-th symptom, can be calculated using:
$$P_i = \frac{1}{n(n-1)} \left( \sum_{j=1}^{k} n_{ij}^{2} - n \right)$$
Thereafter, the average of the $P_i$'s and the chance agreement are computed:
$$\bar{P} = \frac{1}{N} \sum_{i=1}^{N} P_i, \qquad \bar{P}_e = \sum_{j=1}^{k} P_j^{2}$$
The interpretation of the $\kappa$ values is shown in Table IV. The results are discussed using the three research questions proposed for this study.
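
The computation described above can be sketched in Python using the standard Fleiss' kappa definitions. This is a minimal sketch, not the authors' implementation: the function names and the small count table are hypothetical, and the study's actual counts are those in Table III.

```python
def per_subject_agreement(counts):
    """P_i for each row: agreement among the n raters on subject i."""
    n = sum(counts[0])  # raters per subject
    return [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]


def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned subject (symptom) i to category (rank) j."""
    N = len(counts)          # number of subjects
    n = sum(counts[0])       # raters per subject
    k = len(counts[0])       # number of categories
    if n < 2 or any(sum(row) != n or len(row) != k for row in counts):
        raise ValueError("every row needs the same k and the same n >= 2 raters")
    # P_j: share of all N*n assignments that went to category j.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_bar = sum(per_subject_agreement(counts)) / N   # observed agreement
    p_e = sum(p * p for p in p_j)                    # chance agreement
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical table: 3 symptoms, 4 raters, 3 ranks (illustration only).
example_counts = [
    [4, 0, 0],
    [0, 3, 1],
    [1, 1, 2],
]
print(per_subject_agreement(example_counts))
print(fleiss_kappa(example_counts))
```

For the study's data, `counts` would have 20 rows (symptoms) and 6 columns (ranks 0-5), with each row summing to 33.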

RQ1 - To what extent do the herb sellers agree on the ranking of SCD symptoms using the Delphi technique?
The $P_i$ value (the extent to which the herb sellers agree) for stunted growth, bossed skull, pain, anaemia, and swelling of hands and feet was in the range 0.81-1.00, implying nearly perfect agreement in the ranking of these symptoms. The $P_i$ value for jaundice, delayed onset of puberty, leg ulcer, stroke, and increase in abdominal girth was between 0.61 and 0.80, which implies substantial agreement in the ranking of these symptoms. The $P_i$ value for blurred vision, priapism, weak hair, wrinkling of the skin, loss of appetite, drum-shaped fingers, and fever was in the range 0.41-0.60, which shows that the herb sellers moderately agree in the ranking of these symptoms. Thin bones, fainting, and black vein had a $P_i$ value of 0.21-0.40, which indicates that the herb sellers only slightly agree in the ranking of these symptoms. A high degree of agreement was obtained on most of the important symptoms of SCD mentioned in the literature, such as pain crises, jaundice, stunted growth, increase in abdominal girth, swelling of hands and feet, and anaemia.
Overall, however, a kappa value of 0.487 was achieved, which implies that the HS moderately agreed in the ranking of the SCD symptoms. This indicates that the ranking is reasonably reliable and that the practitioners are a largely reliable source of the knowledge required in treating patients with SCD.
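
The agreement bands used in this discussion can be expressed as a small lookup. The sketch below is an assumption-laden illustration: the function name `agreement_band` is hypothetical, the labels paraphrase the wording used in the text, and values below 0.21 are not discussed here, so they are reported as such rather than given an invented label.

```python
def agreement_band(value):
    """Map a P_i or kappa value to the agreement bands used in the text."""
    if value >= 0.81:
        return "nearly perfect"
    if value >= 0.61:
        return "substantial"
    if value >= 0.41:
        return "moderate"
    if value >= 0.21:
        return "slight"
    # Assumption: the text gives no label for values below 0.21.
    return "below the bands discussed in the text"


print(agreement_band(0.487))  # moderate
```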
The factors depicted in Fig. 1 can be adduced as reasons for the level of agreement obtained during the research. The findings from the study have shown that the HS have good knowledge of the disease.
For a more reliable consensus level, additional rounds of survey are needed. Instead of limiting the study to three rounds, it might be necessary to increase it to five or six rounds. In some cases, the iteration might be allowed to continue until a "saturation" level is reached, when a greater level of consensus among the various experts has been achieved. Open-ended questions might be used for the first two rounds. The first round will allow for elicitation of rudimentary implicit knowledge that will guide how the second set of open-ended questions is framed. The next two rounds could employ a mix of open and closed-ended questions. This allows for structured questions on knowledge generally agreed upon by the experts while allowing further clarification of dissenting views. The remaining rounds can then be closed-ended once it is clear that dissenting opinions have been resolved. This will improve the consensus level because harmonisation of views and opinions would have been strategically achieved.

RQ2 - How can we ensure the building of an effective SCD diagnosis system?
To build an effective diagnosis system, the experts must be willing to divulge their knowledge of the domain of discourse. This is important for producing a diagnostic system that is reliable and appropriate. The level of agreement among the different practitioners reflects the level of knowledge acquired by these practitioners.
Applied Computer Systems, 2024/29, p. 122
Therefore, their knowledge of the disease is not in doubt, as they have managed a good number of SCD patients in the past. Using the open-ended questions, the HS with more years of experience showed a better understanding of the disease, as they were able to provide more information on the diagnosis of SCD. In the first round of Delphi, most of the HS with less than four years of experience failed to mention some of the cogent symptoms of SCD stated in the literature.
The study therefore targeted experts who have spent many years in practice, who learned from renowned experts, and whose motive is genuine (i.e., to preserve the traditional herbal medical practice and not simply to make money); this was necessary for achieving reliable results. The number of genuine herbal medical practitioners is dwindling because the new generation is not embracing the practice, and those who remain are more concentrated in rural areas. Responses during the first round of the survey help eliminate those without expert knowledge. This generally informed the sample size that was used in the Delphi survey.
In the future, the scope of the study should be expanded to include more communities and states, since the number of practitioners is dwindling. Practitioners' associations can also be used to reach out to as many individuals as possible. Referral systems could also be employed to find genuine experts, with practitioners asked to refer others. Likewise, some published knowledge can be used as a guide. Lastly, the required number of years in practice can be relaxed to include apprentices and others who recently set up practice. Since they learned from the experts, their knowledge can be used as a control to determine the extent of knowledge transfer and perception of the process.
The conditions for having SCD given by the HS in the second stage of data collection and analysis were interpreted as a percentage of the weights of the symptoms assigned during the third phase. These conditions are represented using a set of IF-THEN rules and used to define the conditional statements that form the knowledge base of the system. The IF-THEN rule assumed the form: IF (weight of symptom = value) THEN (diagnosis result).
To diagnose SCD, each of the symptoms is presented to the user. The user selects an input of either 'yes' for the presence or 'no' for the absence of the symptom. For an input of 'yes', the weight of the symptom is assigned, while for an input of 'no', zero weight is assigned. The final weight of the patient's symptoms is then calculated, and the diagnosis is performed.
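
A minimal sketch of how such a rule base might look is given below. It encodes the two retained conditions from the second phase directly as IF-THEN rules over yes/no answers and sums the weights of the symptoms marked 'yes'. The function name `diagnose`, the uniform weights, and the returned fields are hypothetical; the study's actual weights are those in Table II, and the percentage thresholds used by the authors are not reproduced here.

```python
PRINCIPAL_SYMPTOMS = (
    "bossed skull", "pain crises", "anaemia", "jaundice", "stunted growth",
    "delayed onset of puberty", "increase in abdominal girth",
    "swelling of hands/feet", "weak hair", "black vein", "drum-shaped fingers",
)


def diagnose(answers, weights):
    """Apply the two retained conditions to yes/no answers.

    answers: dict symptom -> True ('yes') or False ('no').
    weights: dict symptom -> weight (hypothetical values in this sketch).
    """
    present = [s for s in PRINCIPAL_SYMPTOMS if answers.get(s, False)]
    # 'yes' contributes the symptom's weight; 'no' contributes zero.
    total_weight = sum(weights.get(s, 0.0) for s in present)
    # Rule 1: IF pain crisis AND at least one other principal symptom.
    rule_1 = "pain crises" in present and len(present) >= 2
    # Rule 2: IF at least three principal symptoms.
    rule_2 = len(present) >= 3
    return {
        "scd_suspected": rule_1 or rule_2,
        "total_weight": total_weight,
        "present": present,
    }


# Hypothetical uniform weights, used only to make the sketch runnable.
example_weights = {s: 1.0 for s in PRINCIPAL_SYMPTOMS}
result = diagnose({"pain crises": True, "jaundice": True}, example_weights)
print(result["scd_suspected"], result["total_weight"])
```

Keeping the rules separate from the weights makes it straightforward to swap in the elicited weights or to add further conditions from later Delphi rounds.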

RQ3 - What are the implications of Delphi techniques in developing an SCD diagnosis system?
The results obtained and the experience gathered during this research have shown that Delphi can be employed to reach a consensus on diagnosis in situations where there are no laboratory investigations to confirm or refute the diagnosis. Therefore, the system can be used as an effective tool for the diagnosis of SCD in ATM.

IV. CONCLUSION
The Delphi technique was employed in this study to attain a clinical reference standard for SCD diagnosis, since there is no gold standard for achieving diagnosis in African Traditional Medicine. Some of the herb sellers were reluctant to divulge their knowledge of SCD diagnosis because, traditionally, in a practice of this nature each practitioner wants to claim superiority over the others. They see themselves as professionals just like any other professional in the health sector. Unlike other professionals in the health sector (doctors, nurses, etc.), they do not have a professional body that regulates entry into or exit from their profession. They believe that if they give out their knowledge to a fellow herb seller to whom they feel superior, or to someone who is not even in the profession, they are, directly or indirectly, putting themselves out of a job and out of the market and thereby destroying their means of livelihood. Also, a herbal medical practitioner takes pride in the acclaim received from others, and this acclaim is one reason for hiding knowledge. Lastly, there is a lack of trust. A practitioner is not sure whether the knowledge, once released, will be used justly.
Strategies to address these barriers include assuring the practitioners that they will be credited for whatever knowledge they volunteer. The awareness that many people around the world will get to know about them can serve as an incentive for sharing knowledge. Likewise, practitioners can be assured that the knowledge shared will not be explicitly passed on to other practitioners except with the express permission of the practitioner from whom it was obtained. Besides, it is necessary to tell the practitioner the reason for asking for the knowledge, which includes preventing the knowledge from going extinct and helping to give structure to the process used in the practice. Lastly, other methods of eliciting knowledge, such as observation and pseudo-apprenticeship, can be employed.
This study has contributed to the diagnosis of SCD in the area of requirement elicitation and specification for ATM practices. It is deduced that the consensus level was moderate, as shown in Table IV. Therefore, to build an effective SCD diagnosis system, further work should extend the Delphi study to include more rounds to ensure that a high level of consensus is reached.

P can be computed using (3).
The interpretation of k values is as shown in Table

TABLE I DESCRIPTION
OF THE DELPHI TECHNIQUE USED IN THIS STUDY Phase Tasks Phase 1: Brainstorming • SCD symptoms were elicited from 120 HS.