Virtual assistant upper respiratory tract infection education based natural language

ABSTRACT


INTRODUCTION
Upper respiratory tract infection (URTI) is a highly contagious disease. Research indicates that URTI ranks first among the diseases affecting the community in Indonesia, mostly among children [1]. Toddlers in this country experience coughs and colds at least 3 to 6 times a year. WHO data reveal that the incidence of pneumonia in children under five in Indonesia is quite high, at around 10-20% per year [2]. The Ministry of Health reported that in 2016 approximately 800,000 children in Indonesia were affected by acute pneumonia, an inflammation that attacks the lung tissue and its surroundings [3]. It is estimated that roughly 3.5 percent of children under five are affected: taking the population of children under five as approximately 24 thousand, 3.55 percent of them suffered from pneumonia, an acute respiratory infection (ARI). During the 2018 Lebaran homecoming, the Ministry of Health also stated that upper respiratory tract infection (ISPA in Indonesian) was the disease most commonly suffered by people, especially children [4]. Many health workers staffing the homecoming posts provided by the Ministry of Health encountered this disease. In addition, ARI is one of the main causes of patient visits to health facilities, accounting for 40%-60% of visits to public health centers and 15%-30% of outpatient and inpatient hospital visits [5].

Comput. Sci. Inf. Technol.
Virtual assistant upper respiratory tract infection education based natural language (Wiwin Suwarningsih)

Considering the high incidence of URTI in Indonesia, it is important to recognize the symptoms, treatment, and prevention measures most appropriate for this disease. This is possible only when supported by information in the form of an appropriate education and consultation system [6]. The traditional health education system is considered time-consuming and uncomfortable for those involved [7]. It is also insufficient to meet the needs for medical services in communities, especially for those living in remote areas. Hence, there is a need for efficient health education solutions that can deal with emergency conditions, reduce maintenance costs, and improve the quality of care. Current e-health applications for education and medical case studies are often ad hoc, with a focus on implementation and technology in particular settings and scenarios. Soegijoko et al. [8], for instance, developed e-prescription, disease diagnosis, and patient and drug data records. Mardiyanto et al. [9] conducted a diagnostic study based on the frequency of breath sounds, whereas Grayman [10] focused on patient mental health consultation. Irawan et al. [11] built a web application that connects parents and midwives throughout pregnancy and the growth of children from the womb to the age of five. Similarly, Octovia et al. [12] only reviewed the use of m-Health to improve maternal and child health. The outcome of this review was a proposed design for a series of m-Health applications focused on improving maternal and child nutrition in Indonesia.
One way to address the limitations of existing e-health and m-health applications is to use an information retrieval (IR) system with natural language support, which is increasingly needed given the dramatic growth of digital information. One type of IR is the question and answer system (QAS), which aims to provide precise and fast answers to user questions from documents or databases. Conventional question and answer systems in the medical domain include AskHERMES, which performs robust semantic analysis on complex clinical data and produces question-focused extractive summaries as answers [13]. MEANS takes a semantic approach to QA based on an in-depth analysis of questions and medical documents using natural language processing techniques, and uses semantic web technology at the level of representation and interrogation [14]. A further study built an SQA system that handles disambiguation by choosing the correct meaning when similar words are mentioned, especially when the linear triple does not match any concept [15].
This system was able to handle the complexity of natural language questions separately from the complexity of the knowledge base structure. Natural language processing is becoming widely implemented in the medical domain. Applications include summarizing long blocks of narrative text, such as clinical records or academic journal articles, by identifying the key concepts or phrases contained in the source material [16], [17]. Mapping data elements in unstructured text to structured fields in electronic health records aims to improve the integrity of clinical data [18], [19]. Answering free-text queries requires the synthesis of multiple data sources [13]-[15].
Natural language processing in the form of an interactive question and answer (IQA) system is key to supporting effective decisions. IQA allows users to find answers to questions interactively [20]. During this interactive process, the system automatically initiates a dialogue with users to clarify the information they expect. Several IQA systems were built with a focus on customer service [21]-[23]. In [21], an interactive question and answer system was developed together with a user-centered evaluation model, with the aim of increasing user satisfaction. The method segmented user queries and then extracted the target requests and requirements for generating SQL statements. In the same line, Purwarianti and Hakim [22] built an IQA system prototype for customer service that understands user topics and knowledge structures and records information in the dialogue; the knowledge base used data structures and the concept of information state for recording interactions between system and users. Similarly, ZhiFei et al. [23] built an IQA system that adopts a hybrid reasoning mechanism. This system diagnosed errors online and returned a solution matching the interactive dialogue with users; the research focused on interactivity and the ontology knowledge base. Medical question and answer systems have also begun to implement IQA, including [24], [25]. In Shamli et al. [24], the user interacted with the system by entering queries in the form of disease symptoms, and the system provided disease inference and cure. The method normalized the extracted nouns and medical identifiers. In Jurafsky and Martin [25], an IQA system was built in which the user provides a description of the question and the system returns the answer after an extraction process. Answers were sought on a cQA website, where the system collected a medical ontology and analyzed user questions. The system gave feedback if the answer was not what the user expected.
Given the prevalence of acute pneumonia among children, we attempt to promote education through consultation tools in the form of virtual assistants, such as interactive question and answer systems in Bahasa Indonesia. Our research presents the virtual interactive question answering assistant (AVIQA), a system that accepts questions from users in natural language and returns a short text as the answer. In other words, the input to AVIQA is a natural language question and the expected output is a definite answer identified in a text containing an answer. Existing question-and-answer systems (such as AskHERMES [13] and MEANS [14]) find answers through document retrieval; consequently, they take quite a long time to select several appropriate documents and then reduce those documents to an answer in the form of a sentence. The AVIQA system, by contrast, performs QA-pairs retrieval instead of document retrieval to speed up the process. The main constraint of this research was the limited data, because validation by both linguists and health experts is time-consuming. Our corpus was fairly large, at 125,000 sentences, but only about 57,000 sentences had been validated.
This paper has the following objectives. First, to return a solution that matches the interactive dialogue with users; here, the system provides disease inference and cure. Second, to provide an overview of interactive question and answer systems with natural language approaches. Third, to propose AVIQA as a way to design and develop solutions that integrate various technologies in the health care education system. AVIQA contains an architecture that optimizes the time needed to find answers by transforming natural language into template patterns. Finally, to formulate an answer pattern that matches the question pattern by finding the correct answer using case-based reasoning. The rest of the paper is organized as follows. Section 2 describes issues in dialogue management for interactive question answering. Section 3 presents the details of the proposed method. Section 4 discusses the test results from applying the proposed method, and the last section provides the conclusions together with challenges and opportunities for future research.

ISSUE ON DIALOGUE MANAGEMENT FOR INTERACTIVE QUESTION ANSWERING
Dialogue management (DM) is the main component of the IQA system. It coordinates the activities of the subcomponents of the dialogue system, and its main purposes are to maintain a representation of the state of the ongoing dialogue [25], to interact with task/domain processing (e.g., databases, planners, execution modules, or other back-end systems), and to coordinate dialogue and non-dialogue behavior [26]. Four main types of DM architecture have been applied to IQA systems: finite-state DM, frame/form-based DM, information-state DM, and plan-based DM [27]. The characteristics of each are explained as follows.

Finite-state DM
Finite-state DM structures the dialogue as a sequence of moves that enforce fixed constraints and present a number of database results [28]. The user is led through the dialogue along a predetermined sequence of states. The size of the answer is optimized, trading off a long list of results against a long interaction over a limited set of states; this trade-off depends on the character of the database and on the policy learning process. Van Schooten et al. [29] described this method as answering questions in stages to arrive at appropriate answers, which becomes rather complicated when using corpora collected from this system and other sources. Follow-up queries must be compared and handled across the finite states, which in turn requires generalizing over questions in all domains and their applications. The weaknesses of finite-state models are that (i) they are unsuitable for tasks with subtasks in an unpredictable order, (ii) they are inflexible, (iii) problems occur if the user must correct items or introduces unforeseen information, and (iv) they are bound to predictable dialogue courses.
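The predetermined sequence of states described above can be illustrated with a minimal sketch. The state names, prompts, and the `run` helper are hypothetical and are not taken from any of the cited systems; the point is only that each state has exactly one successor, so the user cannot change the order of the dialogue.

```python
# Hypothetical dialogue: each state has a prompt and exactly one successor.
STATES = {
    "ask_symptom": ("What symptom does the child have?", "ask_duration"),
    "ask_duration": ("How long has it lasted?", "give_advice"),
    "give_advice": ("Here is the advice.", None),
}


def run(answers, start="ask_symptom"):
    """Walk the fixed state sequence, consuming one user answer per state."""
    transcript, state, answers = [], start, list(answers)
    while state is not None:
        prompt, nxt = STATES[state]
        # The final state only presents a result, so it consumes no answer.
        reply = answers.pop(0) if answers and nxt is not None else None
        transcript.append((state, prompt, reply))
        state = nxt
    return transcript


for state, prompt, reply in run(["cough", "two days"]):
    print(state, "|", prompt, "|", reply)
```

Because the transition table fixes the order, a user who states the duration before the symptom cannot be accommodated without adding states, which is weakness (i) in the list above.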

Frame/form-based DM
This is a popular approach to dialogue management that builds on the finite-state model. The system response is determined by the user's utterance, which triggers a transition from the current dialogue state to the next [30]. The model takes the form of filling slots in a form or frame. Each slot has an associated prompt that guides the user through the dialogue and a priority that determines the order in which the system tries to obtain information. Another text framing technique is applied in HITIQA, with the aim of exposing the gap between how the user defines a question and how the system understands it [31]. The framing done in HITIQA is an attempt to impose a separate structure on the text, allowing the system to compare different pieces of text systematically. HITIQA is designed to answer complex questions. The question is represented as a frame, and the system helps the user through a simple dialogue. This frame-based visual dialogue system can expand the answer space and explore the resulting data.
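The slot-filling mechanism described at the start of this subsection (each slot carries a prompt and a priority) can be sketched as follows. The slot names, prompts, and priorities are hypothetical illustrations and are not the frames used by HITIQA or AVIQA.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    name: str
    prompt: str      # question the system asks to fill this slot
    priority: int    # lower number = asked earlier
    value: Optional[str] = None


class Frame:
    """A frame is a set of slots that the dialogue tries to fill."""

    def __init__(self, slots):
        self.slots = {slot.name: slot for slot in slots}

    def fill(self, name, value):
        self.slots[name].value = value

    def next_prompt(self):
        """Return the prompt of the highest-priority empty slot, or None."""
        empty = [s for s in self.slots.values() if s.value is None]
        if not empty:
            return None
        return min(empty, key=lambda s: s.priority).prompt

    def is_complete(self):
        return all(s.value is not None for s in self.slots.values())


# Hypothetical frame for a consultation about a child's cough.
frame = Frame([
    Slot("symptom", "What symptom does the child have?", priority=1),
    Slot("age", "How old is the child?", priority=2),
    Slot("duration", "How long has the symptom lasted?", priority=3),
])

frame.fill("age", "3 years")   # the user volunteered the age first
print(frame.next_prompt())     # the system asks for the symptom next
```

Unlike the finite-state model, the user may supply slot values in any order; the priority only decides which missing item the system asks for next.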

Information-state DM
The information-state model focuses on the feature structure of the information state. The features consist of ln representing attributes and a representing their values. Such a feature structure can be used as a module in the system that adds information proportionally. This is useful because an interrogative sentence from the user is expected to prompt the other party toward a more specific statement [32]. The information-state model can also be adapted to the knowledge structure and to private customer needs [22]. Here, the particular form of service to users is adjusted according to the actor's representation, as evidenced by tests of whether the needs of the IQA system are met.

Plan-based DM
The plan-based dialogue system model is task-oriented and can address a number of problems, such as multi-topic dialogue, topic changes, information sharing among topics, and different levels of interest in different items of information. This method makes it easier to control mixed-initiative dialogue [33]. Wu implemented a reasoning engine based on a topic forest, with a reasoning strategy that does not depend on a domain, in order to facilitate moving to a different domain. Freedman [34] used reactive planning and hierarchical task network (HTN) planning. This dialogue planning was implemented as the Atlas Planning Engine (APE), which was used to add dialogue skills to a physics tutoring system. According to Freedman [34], plan-based DM is superior to the state-machine approach to dialogue management.

Proposed method in this study
The method we used for AVIQA combines frame-based DM with information-state DM. Information state tracking makes it easier to specify or generalize the scope of questions or clarification sentences from the user, so that the frame of the QA dialogue can focus on the target information to be conveyed to the user. The DM model we proposed is illustrated in Figure 1. The proposed DM strategy focuses the questions and handles user responses that contain negation. The aim is communicative system behavior that matches the interpretations users expect. The system can always update the dialogue context on the basis of the communication interpreted during its learning process.
The figure depicts the question-and-answer process on the AVIQA system. Broadly speaking, the user enters an interrogative sentence; the system then processes the question, extracts its pattern, and searches the database for a similar case. If a similar question exists, its answer is reused immediately and displayed on the screen. During the interactive process, the user thus receives an answer. However, if a question cannot be handled by the system, the system asks the user clarifying questions to narrow down the original question, so that it can provide an answer containing valid information.
AVIQA is an interactive application for health education. It differs from other IQA systems in the interactive concept applied to dialogue management, which combines two approaches: frame-based and information-state. This combination is used because the information to be conveyed must be properly validated, so that the outcome gives users basic knowledge for handling health problems. The aim is not to turn users into doctors but to provide proper and precise education, particularly for treating children with ARI.

RESEARCH METHOD
AVIQA is designed to help mothers deal with emergency conditions when toddlers experience respiratory problems. It is intended for remote and rural areas in Indonesia that are far from health facilities. With this health education tool, patients can communicate as if they were consulting a doctor. Figure 1 presents a simple representation of AVIQA; most of its components do not differ from those of other QA systems. The differences lie in the reasoning process, which uses case-based reasoning (CBR), and in the added dialogue manager component. The dialogue manager manages the interactive processes and records information in the dialogue between user and system. The dialogue model for interactive consultation combines the information-state and frame-based models. Each step shown in Figure 5 is described in detail as follows.
The AVIQA architecture we built (see Figure 2) consists of three major modules: question processing, answer processing, and the dialogue manager. Question processing uses case-based reasoning because the medical domain requires reasoning based on previous evidence. For answer search, we used QA-pairs retrieval instead of document retrieval to speed up the delivery of answers to users. In the dialogue manager module, we implemented a combination of the finite-state and frame-based approaches. In answer processing, we used the ranking learning method and an automatic lexicon generator. The details of each stage are explained as follows.

Question processing
Question processing in this study followed the stages of case-based reasoning. The question sentence was first analysed by the question parser, which performs the basic sentence processing steps (tokenization, stopword removal, and stemming), and by formalization, which repairs the structure of question sentences and non-standard words (e.g., "bgt", meaning "really"). The retrieve and reuse stages operated on similar cases, while the revise and retain stages were handled by the case retainer component. Question processing also involved the dialogue manager, which handled the clarification of question sentences.

Parsing and formalization
The parsing used in this study was shallow parsing with part-of-speech (POS) labelling. It was used to observe the grammatical relationships in the sentence, in addition to parsing sentences and representing their dependencies using Stanford dependencies. The parser represented all relationships between words in a given sentence as dependencies. A dependency is a binary relationship between a head (the core word) and a dependent (a word that explains the core word).

Examples of input sentences:
Apa gejala cacar air? (What are the symptoms of chickenpox?)
The next step was to determine the keywords and to formalize each word whenever non-standard words were found. Examples of non-standard words include abbreviations such as spt = seperti (such as), bgt = banget (very), ortu = orang tua (parents), and skt = sekitar (around). Formalization, which is a feature of AVIQA, also handled non-standard words absorbed from regional and foreign languages. The formalized keywords were used to classify question sentences, which facilitates retrieving and searching patterns in the QA-pairs database.
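The preprocessing and formalization steps described above can be sketched as follows. The abbreviation table contains only the four examples given in the text; the stopword list is a tiny assumed sample, stemming is omitted, and the function names are hypothetical rather than part of AVIQA.

```python
import re

# Abbreviation table taken from the examples in the text; a real lexicon
# would be far larger.
NON_STANDARD = {
    "spt": "seperti",
    "bgt": "banget",
    "ortu": "orang tua",
    "skt": "sekitar",
}

# Tiny illustrative stopword list (an assumption, not the authors' list).
STOPWORDS = {"yang", "di", "ke", "dan"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def formalize(tokens):
    """Replace non-standard tokens by their standard forms."""
    out = []
    for tok in tokens:
        # A standard form may consist of several words (e.g. "orang tua").
        out.extend(NON_STANDARD.get(tok, tok).split())
    return out


def keywords(sentence):
    """Tokenize, formalize, and drop stopwords."""
    return [t for t in formalize(tokenize(sentence)) if t not in STOPWORDS]


print(keywords("Ortu bingung, anak batuk bgt"))
```

A dictionary lookup is the simplest possible formalizer; words absorbed from regional or foreign languages would need additional entries or a learned normalizer.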

Retrieve case
The retrieval stage searched for similar cases according to the keyword queries defined from the parsing results. This search used semantic role labelling (SRL) and a combination of transformation rules built in previous studies [35]. The SRL scheme used was PPPICCOODTQ (problem, patient, population, intervention, compare, control, outcome, organs, drug, time, and quantity). Once labelled with SRL, the example question above took the following form.

Apa gejala [cacar air
The SRL labels were used to transform question sentences into question sentence patterns by means of transformation rules. For example, if the system found SRL = POPULATION, it produced sentence patterns containing the POPULATION element, such as "Dapatkah + POPULATION + disembuhkan?" (Can POPULATION be cured?), "Apa + penyebab + POPULATION?" (What causes POPULATION?), "Mengapa + POPULATION + meningkat?" (Why is POPULATION increasing?), "Bagaimana + mengatasi + POPULATION?" (How can POPULATION be dealt with?), and so on. The resulting pattern was used to calculate the similarity of the new case to the existing old cases. Calculating similarities requires indexing, i.e., grouping the existing cases by specified features. With the index, the CBR system computes similarity values only for the cases in the same group as the new case. The method used was hash indexing. According to the literature [36], [37], hash indexing can speed up data retrieval. The system built in this research is an IQA system, in which retrieval and similarity computation require an efficient method. Hash indexing can also minimize the use of mobile device resources. In this study we compared several methods as baselines; Table 1 provides the results of the comparison with other methods. The hashing concept is to map each hash key (x) using the function K(x).
The hash key (x) in this study was the SRL label defined while transforming sentences into template patterns. K(x) = i, where i is the index of the SRL label in the hash table (KT), and KT = [i ... n] is an array of size n. K(x) thus takes the hash key (x) and produces an integer index. This integer index represents each sentence pattern and is stored in the table KT. Sentence patterns containing similar cases can then be fetched directly from KT using the address of the hash key (x). Table 1 shows that the method we used, hash indexing, had higher accuracy than the association rule method (baseline 1) and propagation (baseline 2). Accuracy was obtained as the ratio of predictions that matched the correct value of similar cases, so it answers what percentage of cases were correctly identified as similar to a user query. The recall value answers what percentage of the truly similar cases were retrieved. The F-measure, which combines precision and recall, summarizes how well the cases similar to the user's query were obtained.
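The indexing idea can be sketched as follows, assuming the SRL label is the hash key and each bucket holds the sentence patterns of one group. Python's built-in `dict` is itself a hash table, so it plays the role of K(x) and KT here. The case base, the case identifiers, and the Jaccard token-overlap similarity are illustrative assumptions; the paper does not state which similarity measure was used.

```python
from collections import defaultdict

# Hypothetical case base: (SRL label, question pattern, case id).
CASES = [
    ("POPULATION", "apa penyebab POPULATION", "case-1"),
    ("POPULATION", "bagaimana mengatasi POPULATION", "case-2"),
    ("DRUG", "apa efek samping DRUG", "case-3"),
]


def build_index(cases):
    """Group cases by SRL label so a lookup touches only one bucket."""
    index = defaultdict(list)
    for label, pattern, case_id in cases:
        index[label].append((pattern, case_id))
    return index


def jaccard(a, b):
    """Token-overlap similarity, used here only as a stand-in measure."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)


def retrieve(index, label, query_pattern):
    """Return (score, case id) of the best match within the label's bucket."""
    bucket = index.get(label, [])
    if not bucket:
        return None
    return max((jaccard(query_pattern, p), cid) for p, cid in bucket)


index = build_index(CASES)
best = retrieve(index, "POPULATION", "apa penyebab POPULATION")
print(best)
```

Only the two POPULATION cases are compared with the query; the DRUG case is never touched, which is the saving that indexing is meant to provide.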

Reuse similar case
The search results for similar data were then filtered using the PPPICCOODTQ element rule-based extraction [26]-[35] to determine whether they could be reused or had to be revised. Based on the pattern shown in Table 2, the rule used for the SRL-labelled example sentence was id-rule KS-115: IF Apa + gejala + <POPULATION> + ? THEN <PROBLEM> + , + <PROBLEM> + , + <PROBLEM>. The answer pattern obtained from KS-115 was then checked to determine whether it could be reused or had to be revised. If all answers could be reused, the candidates were analysed in the answer processing module. If the pattern was not suitable, the revision stage was entered, handled by the retainer component. The retainer component performed the following steps: (i) creating a scheme <argument, relation, argument>, (ii) combining every pair of arguments with every relation, (iii) selecting each pair of arguments and relations using the named entity (NE) relation rule, (iv) generating rules from the named entity relation, (v) simplifying the sentence generated by NE, and (vi) storing the result in the QA-pairs database as the answer sentence for the corresponding question sentence. Table 2 shows the results of rule-based extraction using the PICO frame from some of the Eid, showing that the PICO frame plays an important role in supporting the dialogue manager when identifying the finite state. This rule base could be developed further using deep learning, which would require collecting a large amount of data and training a model so that the resulting rules can represent all types of questions, including those with more complex syntax.
Table 2 (only a fragment of one rule survives): THEN penyakit + yang + <INTERVENTION> + oleh + <PROBLEM>
Note: This rule base was built manually by the authors from various medical sources, both online and offline.
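The reuse step for rule KS-115 can be sketched as follows. The rule is written here as a regular expression, which is an implementation assumption; the `PROBLEMS` store and its entries are hypothetical placeholders standing in for the QA-pairs database and are not validated medical content.

```python
import re

# Rule KS-115 from the text, written as a regular expression. The capture
# group stands for the <POPULATION> element.
RULES = {
    "KS-115": re.compile(r"^apa gejala (?P<population>.+?)\s*\?$", re.IGNORECASE),
}

# Hypothetical QA-pairs store: <POPULATION> value -> list of <PROBLEM> values.
# The entries are placeholders, not validated medical content.
PROBLEMS = {
    "cacar air": ["demam", "ruam", "gatal"],
}


def reuse(question):
    """Return (rule id, answer) if a stored case can be reused, else None."""
    for rule_id, pattern in RULES.items():
        match = pattern.match(question.strip())
        if match is None:
            continue
        problems = PROBLEMS.get(match.group("population").lower())
        if problems:
            # Answer pattern: <PROBLEM>, <PROBLEM>, <PROBLEM>
            return rule_id, ", ".join(problems)
    return None  # no reusable case: hand over to the revise step


print(reuse("Apa gejala cacar air?"))
```

A `None` result corresponds to the branch in which the pattern is not suitable and the retainer component has to revise the case.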

Answer processing
Answer processing is the stage that matches candidate answers to the semantic representation of the question and generates a list of answers sorted by their probability of being correct. The first process carried out at this stage was to find valid answers in the QA-pairs group of candidate answers, i.e., those containing semantic type strings that match the expected answer type. The answers were then ranked, applying constraints to the group of answers found, such as the compatibility between query terms (keywords) and QA-pairs patterns. This ranking determined whether each answer was eligible to be considered a candidate answer, which was then displayed by the answer generator module.

Answer validation
Answer validation is the component that selects reliable answers from the extracted candidate answers. In this paper, we proposed a supervised learning approach with two main processes: making hypotheses and extracting features [29]-[38]. Supervised learning applies when the expected output is known in advance, so the model can be trained on existing data to make predictions and classifications. Here, the algorithm was first trained to predict and classify based on the PPPICCOODTQ semantic role labels. The predictive ability of supervised learning was combined with feature extraction to obtain truly valid answers. The extracted features were word features and lexical association strengths. The word features were formed by grouping keywords according to their semantic role labels, which produced groups of words with a common lexical association value. To assess the advantage of this combination of methods, we compared it with two others: baseline-1, which uses a combination of keyword association and technical association, and baseline-2, which uses lexical overlap and answer redundancy. Table 3 presents the results of the comparison. As shown in Table 3, the combination of supervised learning and feature extraction had higher accuracy than keyword association (baseline 1) and lexical overlap (baseline 2), which indicates that this combination validates answers well. The recall value indicates what percentage of all correct answers were validated as correct, and the F-measure summarizes the overall quality of the validated answers.
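For reference, the evaluation measures mentioned in this section can be computed as in the following sketch. It uses the standard set-based definitions of precision, recall, and F-measure; the paper does not state its exact formulas, so these definitions and the answer identifiers are assumptions made for illustration.

```python
def evaluate(predicted, gold):
    """Compute precision, recall and F-measure for sets of answer ids."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical run: four answers validated by the system, three truly correct.
predicted = ["a1", "a2", "a3", "a4"]
gold = ["a1", "a2", "a5"]
p, r, f = evaluate(predicted, gold)
print(round(p, 2), round(r, 2), round(f, 2))
```

In this example two of the four validated answers are correct (precision 1/2) and two of the three correct answers were found (recall 2/3), so the F-measure is 4/7.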

Answer ranking
In this paper, we proposed a combination of approaches for learning to rank answers. For each answer to be scored, features were produced that indicate how likely the answer is to be valid. The built-in features provided several measures of answer validity by matching against the data in the knowledge base. These features were then used to rank the candidate answers by the likelihood that the answer was correct. The next step was to devise a strategy for reducing thousands of answers to final scores. The strategy was to create a feature vector [40] representing the evidence for each candidate answer. Focusing on the final stage of candidate answer ranking, we produced 157 count-based vector features by calculating the alignment of answer scores with SRL-based annotations for each feature. We evaluated the approach against two baselines: baseline-1, which uses basic ranking features such as bag-of-words and entity matching, and baseline-2, which uses feature testing with registration score answers and machine learning. Table 4 presents the comparison between the proposed method and the baselines. As shown in Table 4, learning to rank answers produced markedly higher accuracy than the basic bag-of-words ranking features and the feature testing with registration score answers. The recall value indicates that correct answers were ranked correctly, while the F-measure indicates that correctly ranked answers outnumbered incorrectly ranked ones.
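The idea of scoring each candidate by a feature vector can be sketched as follows. The three features (keyword overlap, SRL label match, pattern match) and the hand-set weights are hypothetical; in a learned ranker such as the one described above, the weights would be fitted from training data and the vector would contain many more features.

```python
def score(features, weights):
    """Weighted sum of a candidate's feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def rank(candidates, weights):
    """Sort (answer, features) pairs by descending score."""
    return sorted(candidates, key=lambda c: score(c[1], weights), reverse=True)


# Hypothetical features: [keyword overlap, SRL label match, pattern match].
candidates = [
    ("answer A", [0.2, 1.0, 0.0]),
    ("answer B", [0.8, 1.0, 1.0]),
    ("answer C", [0.5, 0.0, 0.0]),
]
weights = [0.5, 0.3, 0.2]  # assumed values, not learned

ranked = rank(candidates, weights)
for answer, features in ranked:
    print(answer, round(score(features, weights), 2))
```

A linear scorer is used here only because it is the simplest model that turns a feature vector into a ranking; it is not claimed to be the model used in AVIQA.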

Answer generator
The answer generator is the most important component of answer processing, since it displays the answers to user questions. Working with the dialogue manager, the answer generator was built around an automatic lexicon generator [31]-[40], because the lexicon can identify the interactive questions and answers of the system being built. The automatic lexicon generator was integrated with the question-and-answer system using logical inference models. It helped create the knowledge representation used by the interactive question and answer system. The generated lexicons significantly reduced processing time and made the system more robust. The choice of method for AVIQA was based on a comparison with two baselines: baseline-1 was first-order logic and baseline-2 was a logical inference model. The comparison showed that the proposed method was superior; Table 5 shows the comparative results. As seen in Table 5, the automatic lexicon generator module worked well in terms of accuracy, and the recall value reflects the correct results relative to the overall generic data. The F-measure also reached a high level, indicating that the system was able to generate answers for the user.

Dialogue manager
The dialogue manager is also one of the most important components of the IQA system, since it coordinates activity between the components of AVIQA. Its purpose is to maintain an interactive representation of the current state of the ongoing dialogue. In this paper, the dialogue manager controlled the entire architecture and structure of the dialogue and served as the link between the question processing and answer processing components. Interpreting answers, formulating responses, and maintaining the system's view of the state of the IQA dialogue require a learning process. The learning process carried out by the dialogue manager consists of formal representation and update strategies. Formal representation handles question sentences that do not follow the standard sentence pattern, whereas update rules and strategies handle negation in the question sentence. Both are important because users conversing in Indonesian show the following characteristics: (i) question words such as APA (WHAT), BAGAIMANA (HOW), or KAPAN (WHEN) are sometimes placed at the end of the sentence, (ii) speech levels are influenced by regional dialects and foreign languages, and (iii) "alay" language is used as a trendy slang.

Information state tracking
Clarification sentences from users that enter the dialogue manager are handled by the information state tracking module. This module represents the information needed to distinguish specific questions from general questions. It represents the input data in the form of a feature structure. The feature structure used is the Cooper-Larsson model because this model distinguishes between private and common attributes [41]. The private attribute contains a collection of propositions from the question-and-answer sentences, together with a stack of actions that will be sent to the QA frame. The common attribute contains the questions under discussion (QUD), which hold the most recent questions discussed in the QA dialogue. Figure 3 illustrates the feature structure.
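The feature structure described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the AVIQA implementation: the class names, the field names (`propositions`, `agenda`, `qud`) and the `answer:` action label are hypothetical; only the split into private and common attributes and the use of a QUD stack come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Private:
    # Propositions collected from the question-and-answer sentences.
    propositions: List[str] = field(default_factory=list)
    # Stack of actions to be sent to the QA frame (last item is the top).
    agenda: List[str] = field(default_factory=list)


@dataclass
class Common:
    # Questions under discussion (QUD); the most recent one is on top.
    qud: List[str] = field(default_factory=list)


@dataclass
class InformationState:
    private: Private = field(default_factory=Private)
    common: Common = field(default_factory=Common)

    def push_question(self, question: str) -> None:
        """Record a user question as the current question under discussion."""
        self.common.qud.append(question)
        # Hypothetical action label; the paper does not name its actions.
        self.private.agenda.append("answer:" + question)

    def current_question(self) -> Optional[str]:
        """Return the most recently raised question, or None if there is none."""
        return self.common.qud[-1] if self.common.qud else None

    def resolve(self, proposition: str) -> None:
        """Store an answer proposition and pop the question it resolves."""
        self.private.propositions.append(proposition)
        if self.common.qud:
            self.common.qud.pop()
        if self.private.agenda:
            self.private.agenda.pop()
```

Keeping the QUD as a stack means a clarification question raised by the system can sit on top of the user's original question and be resolved first, after which the original question becomes current again.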
The feature structure in Figure 3 follows the Cooper-Larsson model, which was chosen to determine the extent to which the system must give feedback to the user [41], [42]. How specific or general the user's questions and information are determines how far the system needs to expand or narrow the answer space. Thus, specifications with targeted information can meet the user's needs. One of the features of AVIQA is an examination of the language structure, which helps the system map input to the standard sentence pattern. In the standard pattern of a question sentence, the question word comes at the beginning of the sentence. However, users may place the question word in the middle or at the end of the question sentence. For example, if the user enters "what cough medicine?", the sentence is corrected to "What is the name of a cough medicine?" or "What is the medicine for coughing?" Repairing the sentence in this way lets the system map sentence patterns into a well-formed pattern structure. The formalization function, in turn, corrects non-standard words, such as "aq", meaning "I", or "bgt", meaning "really".
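The two repair steps described above, formalizing non-standard words and moving the question word to the front, can be sketched as follows. This is a simplified illustration, not the AVIQA implementation: the function names are hypothetical, the slang dictionary contains only the two examples from the text (with `aku` and `banget` assumed as their standard forms), and the reordering is a plain token move rather than the grammatical rewriting shown in the cough-medicine example.

```python
import re

# Slang examples taken from the text; the standard forms are assumptions.
NONSTANDARD = {"aq": "aku", "bgt": "banget"}

# Question words mentioned in the paper, lower-cased.
QUESTION_WORDS = ("apa", "bagaimana", "kapan", "kenapa", "mengapa")


def tokenize(sentence):
    """Lower-case the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def formalize(tokens):
    """Replace non-standard words with their standard forms."""
    return [NONSTANDARD.get(tok, tok) for tok in tokens]


def front_question_word(tokens):
    """Move the first question word found to the start of the sentence."""
    for i, tok in enumerate(tokens):
        if tok in QUESTION_WORDS:
            return [tok] + tokens[:i] + tokens[i + 1:]
    return tokens  # no question word: leave the order unchanged


def normalize(sentence):
    """Apply formalization, then question-word fronting."""
    return " ".join(front_question_word(formalize(tokenize(sentence))))


print(normalize("Obat batuk apa?"))
print(normalize("aq pusing bgt, kenapa?"))
```

Because the tokenizer keeps only alphabetic runs, reduplicated forms such as "bentol-bentol" are split into two tokens; a fuller version would need an Indonesian-aware tokenizer.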

Update rules and strategies in handling negations
This stage changes the rules when negation is found in a sentence. In medical domain dialogue, sentences that look negative but are not negative in meaning are sometimes found. In linguistic terms, negation is the expression of denial. There are four negation words in Indonesian, namely "tidak", "bukan", "belum", and "jangan". An example of a negation sentence (sentence-5) follows, in which the user asks why the body has bumps like insect bites that do not itch.
Badan tidak gatal tetapi ada bentol-bentol, kenapa ya? (In English: The body is not itchy but there are bumps, why?)
Sentences containing negation are handled by applying the positive predictive value (PPV) method, as in the research in [43]. The author manually collected the negation phrases in the Indonesian medical domain and classified them as true-positive, true-negative, false-positive, and false-negative. This classification was done so that the learning process and the performance analysis of each negation phrase could use a series of regular expressions and a list of negation terms to distinguish actual negation. AVIQA is able to carry out a learning process and form new rules and sentence patterns based on a series of regular expressions. The regular expressions are formed from the sentence patterns found in the QA-pairs database by adding the negation phrase. Table 6 presents the set of Indonesian negation phrases formed, together with their positive predictive values; these phrases were found in the Indonesian medical domain training data. As shown in Table 6, negation phrases in Indonesian are often used to convey a problem or question to the system. Negation phrases were also used during testing, so the system must be able to handle them and understand what the user expects. The new rules for the AVIQA system were based on recognized words that imply negation and on terms that stop the assignment of negation. If the negation appeared just before the keyword, a new rule was formed to recognize the negative expression and the appropriate keyword. Combining the PPV method with simple regular expressions accurately detected most of the negations in the question sentences. In this study, we compared the proposed method with two baselines: baseline-1 used SNOMED-CT terms to index clinical documents and baseline-2 used a supervised combination method. As seen in Table 7, the accuracy obtained with the PPV method and regular expressions showed that the system was reliable, with a good recall value. These values showed that correct answers outnumbered wrong answers across the series of test data.
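The rule "negation just before the keyword" and the PPV calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the keyword list are hypothetical, and the actual regular expressions and negation phrases used by AVIQA (Table 6) are not reproduced here. Only the four negation words and the example sentence come from the text.

```python
import re

# The four Indonesian negation words listed in the text.
NEGATION_WORDS = ("tidak", "bukan", "belum", "jangan")


def negated_keywords(sentence, keywords):
    """Return the keywords that appear immediately after a negation word."""
    text = sentence.lower()
    negation = "|".join(NEGATION_WORDS)
    found = []
    for keyword in keywords:
        # Matches, for example, "tidak gatal" but not "ada bentol".
        pattern = r"\b(?:%s)\s+%s\b" % (negation, re.escape(keyword))
        if re.search(pattern, text):
            found.append(keyword)
    return found


def positive_predictive_value(true_positive, false_positive):
    """PPV = TP / (TP + FP); returns 0.0 when the phrase never fired."""
    total = true_positive + false_positive
    return true_positive / total if total else 0.0


sentence = "Badan tidak gatal tetapi ada bentol-bentol, kenapa ya?"
# Hypothetical symptom keywords, used only for this example.
print(negated_keywords(sentence, ["gatal", "bentol"]))
```

For a given negation phrase, the PPV is the share of its matches that were true negations, so a phrase with a low PPV is a candidate for an additional rule or a term that stops the negation from being assigned.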

RESULTS AND DISCUSSION
In this section, we evaluate the performance of the modules in AVIQA using the data sets we collected. Testing was carried out by simulating an interactive procedure: random feedback was given in response to the additional questions generated from the user input, and the answers were then predicted. The dialogue took place through a keyboard and a character display, and the system could only answer simple questions about ARI. The AVIQA system would not answer complex questions, such as requests for reasons or opinions. The language used was Indonesian, with no restrictions on the use of language expressions. Figure 4 shows an example of an interactive dialogue between the user and the system. When a user's question is unfocused, the system responds by asking the user to clarify the intent and purpose of the question so that it can provide an appropriate answer.



Characteristics of questions and answers
Most users used the question words "Apa" (What), "Kenapa/Mengapa" (Why), and "Gimana/Bagaimana" (How), which suggests that they mainly sought information on the initial handling of a condition that had arisen suddenly. Table 8 shows the syntactic classification of user utterances and their distribution. Table 8 underlines how important it is to handle user questions well, since questions are a powerful tool for eliciting the right emotions and thoughts. Good questions can also drive the system's learning process to keep developing the knowledge base with the latest information. Meanwhile, users asked fewer Yes/No questions. Such questions are usually general and only serve to confirm an opinion, action, or decision already in the user's mind. The AVIQA system we built follows a one question, one answer principle. The questions entered by users and the answers given are stored in a log, which is used to evaluate whether any questions must be developed into several new questions. Table 9 presents the classification of user questions and requests according to the subject being asked about or requested. This classification shows how important health information is to users. Initial treatment before the patient is taken to medical personnel is critical to prevent more severe conditions. In the test results shown in Table 9, the subject with the largest percentage was the type of drug that can be used freely, followed by actions to deal with the disease, and third, what to do before bringing patients to medical personnel. This showed that Afika was able to provide a means for questions and answers about the preventive actions and handling that users must carry out for patients with ARI.
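A distribution like the one in Table 8 can be tallied from the question log with a simple keyword lookup. The sketch below is illustrative only: the function name, the category labels and the sample log are hypothetical, and the real classification in Table 8 may use more categories and richer syntactic cues than a first-match lookup.

```python
import re
from collections import Counter

# Question words named in the text, mapped to hypothetical category labels.
QUESTION_TYPES = {
    "apa": "what",
    "kenapa": "why",
    "mengapa": "why",
    "gimana": "how",
    "bagaimana": "how",
}


def classify_question(sentence):
    """Return the category of the first known question word, else 'other'."""
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token in QUESTION_TYPES:
            return QUESTION_TYPES[token]
    # Yes/No questions and requests carry no listed question word.
    return "other"


# Hypothetical log entries, for illustration only.
log = [
    "Obat batuk apa?",
    "Badan tidak gatal tetapi ada bentol-bentol, kenapa ya?",
    "Gimana cara menurunkan demam?",
    "Boleh minum obat ini?",
]
distribution = Counter(classify_question(q) for q in log)
print(distribution)
```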

User clarification of system answers
Observations from the AVIQA experiment showed that users entered several kinds of clarification sentences. These clarifications concerned unclear question subjects (39 questions), ambiguous references (51 questions), and off-topic questions (90 questions). Clarification occurred because users were searching for in-depth information. To handle it, the system needed to identify the question type in order to estimate the form of the final answer to a question. AVIQA was able to handle these clarifications. One approach was for the dialogue manager to evaluate where the question sentence pattern had been successfully identified and then compare it with the manual pattern used in IQA. Figure 5 shows an example of how the system requests clarification of a user's question. This clarification directs the user's query so that the system can provide the right answer.

CONCLUSION
In this paper, we proposed AVIQA, an interactive question and answer system that uses a CBR architecture and integrates information state and dialogue management frameworks into the search for answers to user questions. AVIQA was built to approach the ideal model of an IQA system, in which interaction is made possible by the QA database. The combination of methods used in dialogue management, namely information state tracking and the framing of answers, handled questions containing negation with an accuracy of 77%, whereas the baselines reached only slightly more than 70%. The system's responses to user interactions indicated that AVIQA is able to ask clarifying questions and to provide answers to questions about ARDs. The AVIQA model has also resulted in a new IQA dataset that could open new avenues for researchers in the QA community to explore. In the future, we plan to use deep learning to generate sentence rules and patterns.

Figure 4. A sample interactive Q/A dialogue


Table 1. Comparison of the retrieve case method

Table 2. Element-based rule base extraction PPPICCOODTQ

Table 3. Comparison of the answer validation method

Table 5. Comparison of the answer generator method

Table 6. Positive predictive value (PPV) of Bahasa Indonesia negation phrases found in the training set

Table 7. Comparison of methods for handling negation phrases

Table 8. Syntactic classification of user utterances

Table 9. Categorization by subject