A Mobile-Based Question-Answering and Early Warning System for Assisting Diabetes Management

With increasing demand for real-time, Internet-based preventive management of chronic diseases, interest in developing convenient devices for health management and monitoring has intensified. Unlike other chronic diseases, diabetes, particularly type 2, is a lifelong chronic disease that usually requires daily health management by patients themselves. This study develops a mobile-based diabetes question-answering (Q&A) and early warning system named Dia-AID to assist diabetes patients and populations at high risk. The Dia-AID system consists of three modules: a large-scale multilanguage diabetes frequently asked question repository, a multimode fusion Q&A framework, and a health data management module. A list of services, including risk assessment and health early warning, is provided to users for health condition monitoring. Using the diabetes frequently asked question repository as data, experiments are conducted on answer ranking and answer selection. Results show that the two essential methods in the system outperform baseline methods in both tasks.


Introduction
With increasing attention to ubiquitous healthcare (U-healthcare) services and the development of information technology, there is a great need for preventive management of chronic diseases and management of individual health conditions [1]. Diabetes mellitus (diabetes), one of the most representative chronic diseases, has become a serious global public health issue and the most challenging health problem of the 21st century [2][3][4]. Figure 1 shows the number of diabetes patients aged 20-79 years over the past 18 years, according to the latest global estimates from the International Diabetes Federation (IDF) and the Research2guidance report (http://www.research2guidance.com). The number of adults living with diabetes mellitus reached 425 million in 2017 (about one in 11 adults), nearly three times the 151 million of 2000, and is expected to increase to 629 million by 2045 [5]. In addition, as reported by the World Health Organization (WHO), diabetes was the direct cause of 1.6 million deaths in 2015. Moreover, nearly 50% of diabetes patients are undiagnosed and unaware of their condition. Among patients, the majority of cases are type 2 diabetes mellitus (T2DM) [6]. Unlike type 1 diabetes mellitus, which remains unpreventable with current knowledge, 80% of type 2 diabetes mellitus cases can be prevented by maintaining moderate blood sugar levels and a healthy lifestyle [7]. People with type 2 diabetes frequently need counseling on healthy diet and regular physical activity to reduce the risk of complications [8]. Thus, diabetes management is a crucial and necessary procedure for diabetes patients and people at high risk of diabetes [9][10][11].
Recently, the focus of healthcare has been shifting from treatment to prevention and early diagnosis of disease [12]. Fox et al. [13] reported that 31% of US smartphone owners used their phones to search for medical information online, 30% of Internet users consulted online reviews or rankings of healthcare services or treatments, and 26% of Internet users read other people's experiences about health or medical issues. By 2015, nearly 500 million smartphone users used mobile health applications, especially for diet and disease management [7]. Later, Krebs et al. [14] showed that 58.23% (934/1604) of mobile phone users had downloaded a health-related mobile app and used it at least once per day. As convenient platforms for checking users' health status in real time, mobile applications have evolved from information provision to lifestyle-oriented smart health management. Moreover, existing research has shown that continuous real-time consulting and monitoring supported by smartphones can improve the efficiency of diabetes self-management [7,[15][16][17]. Therefore, developing a mobile-based system to assist diabetes patients in their health management is of great importance. Many studies have shown that current medical search engines and resources, e.g., PubMed, Medical Subject Headings (MeSH), and the Unified Medical Language System (UMLS), are often unable to serve users with clinically relevant answers in a timely manner and thus fail to satisfy patients' counseling needs [18,19]. Hersh et al. [20] found that healthcare professionals took more than 30 minutes on average to find an answer using information retrieval systems. Even experienced doctors need about 2 minutes on average to obtain an answer [21]. Question answering (Q&A), in contrast, builds on natural language processing techniques to provide users with direct, precise answers to their questions, and is therefore preferable.
Hence, there is an increasing demand for convenient and effective question-answering systems in the medical domain [21][22][23][24]. In particular, there is a growing demand for Q&A systems that effectively and efficiently help diabetes patients make better use of ever-accumulating expert knowledge [1,7,15].
To that end, this study develops a mobile-based question-answering and early warning system called Dia-AID. The system consists of three modules: a large-scale multilanguage diabetes FAQ repository, a multimode fusion Q&A framework, and a health data management module with an early warning function. The repository captures diabetes questions with expert-defined answers and stores the question-answer knowledge in an interpretable and extendible form. The framework contains three different Q&A resolution strategies: knowledge-based Q&A, FAQ-based Q&A, and web-based Q&A. The health data management module, with its early warning function, provides a convenient counseling service on a smart health platform to assist diabetes patients in monitoring their health conditions. The contributions of this work are as follows: (1) a large-scale multilanguage diabetes FAQ repository is built with a consistent representation format; (2) a novel multimode fusion Q&A framework that integrates three modes of Q&A technologies is proposed to fulfill diabetes information-seeking needs; (3) a health data management module with an early warning function is developed to monitor patient health status.
The rest of this paper is organized as follows. Section 2 introduces related work in biomedical question answering. Section 3 describes the mobile-based question-answering and early warning system Dia-AID in detail. Section 4 presents the experimental results of our methods on the FAQ data repository. Section 5 concludes the paper.

Related Work
Question answering aims to provide inquirers with precise answers, rather than relevant documents, drawn from unstructured data sources. Research on open-domain question answering (Q&A) was prompted and instantiated by the Text REtrieval Conference (TREC) evaluation campaign [25]. Recently, with increasing demand for domain-specific applications, interest has shifted from open-domain Q&A to restricted-domain Q&A [26,27]. Molla et al. [28] argued that restricted-domain Q&A targeting domain-specific information was expected to achieve effective and reliable performance in real-world applications. Further, as claimed by Mishra et al. [29], restricted-domain Q&A could fulfill the specialized information requirements of domain experts, thereby improving user satisfaction. Similarly, Yu et al. [30] and Rinaldi et al. [31] noted that restricted-domain Q&A, such as in the biomedical domain [24,32], could exploit domain-specific knowledge resources for deeper text analysis and take advantage of domain-specific typology and formatting conventions to improve answer extraction performance.
Wireless Communications and Mobile Computing 3
According to Athenikos et al. [27], medical domain question answering faced the challenge of highly complex domain-specific terminology and lexical and ontological resources. As emphasized by Abacha et al. [33], the key process was to translate the semantic relations expressed in questions into a machine-readable representation so that natural language questions could be analyzed deeply and efficiently. They presented a complete question analysis approach including medical entity recognition, semantic relation extraction, and automatic translation to SPARQL queries. Results showed that 60% of the questions were correctly translated to SPARQL queries by the proposed method. Later, Anca [34] proposed GFMed to address the same problem as well as the challenge of querying large amounts of Linked Data from various domains. GFMed was a Q&A system for biomedical interlinked data that aimed to bridge the gap between end users and formal query languages by introducing a grammatical framework to translate biomedical questions in natural language into the corresponding SPARQL queries. The experimental results demonstrated that the proposed methodology for building a Controlled Natural Language for querying Linked Data was valid. Abacha et al. [35] proposed an "Answer Search" approach based on semantic search and query relaxation to address automatic Q&A in the medical domain. They defined question focuses as the medical entities most closely linked to answers in order to improve the overall performance of question answering. Terol et al. [36] proposed a general Q&A system capable of working over any restricted domain. Taking the medical domain as an example application, their system answered medical questions according to a generic question taxonomy and achieved 94.4% overall precision on the task.
In the question-answering process, question representation is an essential step in question analysis and answer retrieval. Zhang et al. [37] proposed a system based on a multilayer self-organizing map, providing an efficient solution to the problem of organizing structured electronic book data. A tree-structured representation was proposed to capture the rich features of an e-book author. Their experimental results corroborated that models based on the tree-structured representation outperformed content-based models. In further research, they proposed Tree2Vector, an efficient learning framework for transforming tree-structured data into vectorial representations [38]. By using Tree2Vector to map tree-structured book data into a vector space, their subsequent experiments showed that the mapped vector space could capture the spatial distribution of terms over a book, which traditional document modeling methods do not [39].
A recent trend among medical Q&A systems is to incorporate organized medical information throughout the Q&A process in order to use it for efficient health management in areas such as U-healthcare [40,41]. Jung et al. [42] developed a decision support method, based on frequent pattern tree mining, mainly for pain management in chronic disease patients. The proposed method aimed to reduce the time and expense of pain-related decision-making for patients frequently exposed to pain. Chung et al. [12] presented a knowledge-based health service leveraging a hybrid wireless fidelity (Wi-Fi) peer-to-peer architecture. The service was designed to provide patients with efficient and economical healthcare through accurate measurement of various biosignals, so that users could easily predict and manage both health and disease. Han et al. [43] introduced THE MUSS, a U-health service system focusing on reusability and resolvability, to provide stress and weight management services.
In subsequent medical Q&A developments, diabetes mellitus, one of the top three causes of death from noncommunicable diseases worldwide, has prompted numerous studies of its prevention, prevalence, and mortality [15,[44][45][46][47][48][49][50][51][52][53]. There is great demand for a Q&A system that can effectively and efficiently provide health consulting services and help people monitor and manage their individual health conditions. Jung et al. [7] explored a mobile healthcare application that provides diabetes self-management for patients. By interoperating with Electronic Medical Records (EMR), the application provided services such as weight management, cardio-cerebrovascular risk evaluation, and exercise management. Waki et al. [54] developed DialBetics, a real-time interactive system for diabetes self-management, particularly HbA1c management. In an evaluation, patients who monitored their health data with the system improved their HbA1c significantly compared with patients who continued their usual self-care regimen. More recently, Yoo et al. [1] proposed a Personal Health Record (PHR)-based diabetes index service model on mobile devices, offering users a management information service for preventing diabetes. Users could check their health conditions in real time and receive information about desirable health behaviors and dietary habits related to diabetes.
Yet existing diabetes management applications provided general information search and management while ignoring counseling services, which are crucial for managing the health condition of diabetes patients. Besides, as claimed by Mishra et al. [29], one drawback of restricted-domain Q&A is the limited repository of domain-specific questions. To overcome these difficulties, we built the LMD-FAQ repository to provide users with concise and accurate answers given by physicians or experts on diabetes-related professional websites. We aim to leverage the LMD-FAQ repository to provide counseling services on diet, medication, and symptoms for diabetes patients. In addition, based on our previous work [55], which analyzed global clinical trials from 190 countries provided by the National Institutes of Health (NIH), we identified six representative health characteristics closely related to diabetes mellitus to better manage users' health conditions: Body Mass Index (BMI), Glucose, Systolic Hypertension, Diastolic Hypertension, HbA1c, and creatinine. Further, we defined several early warning intervals for these health characteristics by referring to existing international medical standards for health management and risk warning.

Methods and Materials
The architecture of our mobile-based diabetes question-answering and early warning system Dia-AID is shown in Figure 2. It consists of three modules: a large-scale multilanguage diabetes FAQ repository (LMD-FAQ repository), a novel multimode fusion question-answering framework (MMF-QA), and a diabetes data management module with early warning (DM-EW). The LMD-FAQ repository contains a large number of diabetes question-answer pairs acquired from mainstream diabetes-related professional websites. The MMF-QA framework integrates three strategies: knowledge-based Q&A, FAQ-based Q&A, and web-based Q&A. The DM-EW module records patients' health data and monitors their health conditions in real time using six representative health characteristics closely related to diabetes mellitus: BMI, glucose, systolic hypertension, diastolic hypertension, HbA1c, and creatinine. In case of a rapid change in a characteristic or a predicted deterioration, the module automatically warns patients and provides them with dietary guidelines.

The Large-Scale Multilanguage Diabetes FAQ Repository.
Frequently Asked Questions (FAQs) provide specific answers to the questions that are frequently asked when users browse specific websites. For example, the website Health China (http://health.china.com) allows users to ask questions in free text and those who are experts in the field answer the questions freely. These questions with professional answers are collected and organized as FAQ data. The FAQ data can dramatically benefit question answering by reusing the accumulated professional knowledge. In this paper, we develop a method to automatically construct a large-scale multilanguage diabetes FAQ (LMD-FAQ) repository through identifying FAQ data from professional diabetes websites.
As illustrated in Figure 2, our method includes four steps. (1) The first step is automatic question acquisition. We first analyze the page structures of specific websites to identify diabetes questions. The websites, carefully selected by domain experts, include Diabetes Clinical Guidelines (Chinese Medical Association Diabetes Branch), professional diabetes websites (International Diabetes Union, the American Diabetes Association, etc.), diabetes professional information websites (CDC Health Channel), and diabetes interactive question-answering websites (Yahoo! Knowledge). The questions and associated answers are then extracted by regular expression matching on the page source code. (2) The second step is question target identification and classification. Based on our previous work [56,57], an automated answer type identification and classification method is applied to extract the target and intent of questions using both syntactic and semantic analysis. Since syntactic structures vary with the way questions are asked, four typical situations are identified and analyzed, each with a specific processing strategy. During this process, question target features are extracted with a principle-based syntactic parser and then expanded with their hypernymy features and semantic labels. Finally, the expanded features are sent to a trained classifier to predict the corresponding answer types. pattern, which consists of five components: the question target, question type, concept, event, and constraint. An entropy-based method proposed in previous work [58] is applied for automated semantic pattern generation. Figure 3 shows a visualization of example FAQ data in the LMD-FAQ repository.
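Step (1) relies on regular expressions written against each site's page source. The markup of the selected websites is not given here, so the following Python sketch assumes a hypothetical layout in which each entry is a `div` with class `question` followed by a `div` with class `answer`; the pattern and function names are illustrative, not the system's actual extractors.

```python
import re
from typing import List, Tuple

# Hypothetical markup: each FAQ entry is assumed to appear as
# <div class="question">...</div> followed by <div class="answer">...</div>.
# Real sites need their own patterns, derived from page-structure analysis.
QA_PATTERN = re.compile(
    r'<div class="question">(?P<q>.*?)</div>\s*<div class="answer">(?P<a>.*?)</div>',
    re.DOTALL,
)
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean(fragment: str) -> str:
    """Strip inline HTML tags and collapse runs of whitespace."""
    return " ".join(TAG_PATTERN.sub(" ", fragment).split())


def extract_qa_pairs(page_source: str) -> List[Tuple[str, str]]:
    """Return (question, answer) pairs found in one page's source code."""
    pairs = []
    for match in QA_PATTERN.finditer(page_source):
        question, answer = clean(match.group("q")), clean(match.group("a"))
        if question and answer:  # skip entries that are empty after cleaning
            pairs.append((question, answer))
    return pairs


if __name__ == "__main__":
    page = """
    <div class="question">What are the <b>symptoms</b> of diabetes?</div>
    <div class="answer">Frequent urination and excessive thirst.</div>
    """
    print(extract_qa_pairs(page))
    # [('What are the symptoms of diabetes?', 'Frequent urination and excessive thirst.')]
```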
Based on the above procedure, the method extracts FAQ data from professional websites, formats them in a consistent representation, and indexes them with semantic patterns for fast retrieval. Through this automatic process and human review of the indexed data, the FAQ repository can be maintained incrementally. Currently, the LMD-FAQ repository comprises 19,317 English QA pairs and 6,041 Chinese QA pairs. The repository provides our Q&A system with fundamental data support for answering commonly posted questions.
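The consistent representation can be pictured as one record per QA pair that carries the five-component semantic pattern. In the sketch below, the field names, the language codes, and the choice to index on the (question target, question type) pair are assumptions made for illustration; the repository's actual schema and index structure are not specified in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SemanticPattern:
    """The five components named in the text; value formats are assumptions."""
    question_target: str
    question_type: str
    concept: str
    event: str = ""
    constraint: str = ""


@dataclass
class FAQRecord:
    faq_id: int
    language: str  # hypothetical codes, e.g. "en" or "zh"
    question: str
    answer: str
    pattern: SemanticPattern


class FAQIndex:
    """Hypothetical index keyed on (question target, question type)."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], List[FAQRecord]] = defaultdict(list)

    def add(self, record: FAQRecord) -> None:
        """Incremental maintenance: new or reviewed records can be added at any time."""
        key = (record.pattern.question_target, record.pattern.question_type)
        self._by_key[key].append(record)

    def lookup(self, question_target: str, question_type: str) -> List[FAQRecord]:
        """Return the records sharing the given pattern key (empty list if none)."""
        return list(self._by_key.get((question_target, question_type), []))
```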

The Multimode Fusion Question-Answering Framework.
The multimode fusion question-answering framework (MMF-QA) integrates three Q&A models: knowledge-based Q&A, FAQ-based Q&A, and web-based Q&A. The overall framework is shown in Figure 4. The procedure of the models is described as follows.
The knowledge-based Q&A model relies on a diabetes knowledge base to generate concise answers to posted questions. For a newly posted question, the model analyzes the structure and keywords of the question and generates a corresponding semantic pattern. The question is thus transformed from natural language into a structured semantic representation that captures semantic information such as question target, question type, concept, event, and constraint. The question is then further represented as a tuple ([Concept_1], {Relation}, [Concept_2]), in which Concept_1 and Concept_2 label meaningful entities. The represented question is used to extract answers from the knowledge base. For instance, "What are the symptoms of diabetes?" is represented as ([symptoms], {Rel: of}, [diabetes]). The knowledge-based Q&A process therefore maps entities and their relations to formally represented tuples, which are matched against the knowledge base to retrieve exactly matching knowledge elements as answers.
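The tuple mapping and exact-match lookup can be sketched as follows. The regular expression below handles only "What is/are the X of Y?" questions and is a hypothetical stand-in for the much richer semantic-pattern analysis; the knowledge base entries are illustrative placeholders, not the system's content.

```python
import re
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (Concept_1, Relation, Concept_2)

# Toy knowledge base; the entries are illustrative placeholders, not Dia-AID content.
KNOWLEDGE_BASE: Dict[Triple, List[str]] = {
    ("symptoms", "of", "diabetes"): ["frequent urination", "excessive thirst", "fatigue"],
}

# Hypothetical surface pattern for "What is/are the X of Y?" questions only;
# the paper's semantic-pattern analysis is far more general.
QUESTION_PATTERN = re.compile(
    r"^what(?:'s|\s+is|\s+are)\s+(?:the\s+)?(?P<c1>[\w\s]+?)\s+(?P<rel>of|for)\s+(?P<c2>[\w\s]+?)\s*\??$",
    re.IGNORECASE,
)


def to_triple(question: str) -> Optional[Triple]:
    """Map a question to ([Concept_1], {Rel}, [Concept_2]), or None if unparsed."""
    match = QUESTION_PATTERN.match(question.strip())
    if match is None:
        return None
    return (match.group("c1").lower(), match.group("rel").lower(), match.group("c2").lower())


def answer(question: str) -> Optional[List[str]]:
    """Return knowledge elements whose tuple exactly matches the question's tuple."""
    triple = to_triple(question)
    return KNOWLEDGE_BASE.get(triple) if triple else None
```

An unparsed question, or a tuple missing from the knowledge base, returns None; in the full framework such questions would fall through to the FAQ-based or web-based modes.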
The FAQ-based Q&A model computes matching scores between a given question and the questions in the FAQ repository. Questions with matching scores above a specified threshold are kept as candidates, which are then ranked, and the top-scoring questions are returned. The model consists of three main steps: QSem-based question matching, LSI-based answer ranking, and answer selection. As noted in [59], a major challenge of FAQ-based Q&A is matching questions to the corresponding question-answer pairs. Here, we apply the QSem-based question matching framework proposed in our previous work [60] to support answering FAQs by reusing accumulated QA data. The framework considers both question word types and semantic patterns according to their functions in question matching. The question word types include question target words, user-oriented words, and irrelevant words. These word types are semantically labeled with a predefined ontology to enrich the semantic representation of questions. For each word type, a different similarity strategy is applied, as described in [60]. The similarity calculations for the question target and user-oriented word types between a question q and an FAQ candidate faq are shown in (1), (2), and (3), respectively.
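Equations (1) to (4) are not reproduced in this text, so the following is only a simplified stand-in for QSem-style matching. It assumes a small hypothetical synonym table in place of the ontology, a symmetric synonym-aware overlap as the per-word-type similarity, irrelevant words discarded, question-type equality as the semantic-pattern term, and hand-picked weights. It does illustrate the threshold filtering and top-n candidate selection described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical synonym table standing in for the ontology-based synonymy expansion.
SYNONYMS: Dict[str, Set[str]] = {
    "symptom": {"sign", "manifestation"},
    "eat": {"consume"},
}

# Hypothetical weights for question-target words, user-oriented words, and pattern.
WEIGHTS = (0.5, 0.3, 0.2)


@dataclass
class ParsedQuestion:
    target_words: List[str]  # question target (QT) words
    user_words: List[str]    # user-oriented words; irrelevant words are dropped
    question_type: str       # stand-in for the semantic pattern


def smatch(w1: str, w2: str) -> bool:
    """Synonymy-based word match: identical words or listed synonyms."""
    return w1 == w2 or w2 in SYNONYMS.get(w1, set()) or w1 in SYNONYMS.get(w2, set())


def overlap(a: Sequence[str], b: Sequence[str]) -> float:
    """Symmetric share of words on each side that match some word on the other side."""
    if not a or not b:
        return 0.0
    hits_a = sum(any(smatch(x, y) for y in b) for x in a)
    hits_b = sum(any(smatch(x, y) for x in a) for y in b)
    return (hits_a / len(a) + hits_b / len(b)) / 2


def match_score(q: ParsedQuestion, faq: ParsedQuestion) -> float:
    """Weighted balance of per-word-type similarities (illustrative, not Eq. (4))."""
    w_target, w_user, w_pattern = WEIGHTS
    return (
        w_target * overlap(q.target_words, faq.target_words)
        + w_user * overlap(q.user_words, faq.user_words)
        + w_pattern * (1.0 if q.question_type == faq.question_type else 0.0)
    )


def faq_candidates(q: ParsedQuestion, repository: Sequence[ParsedQuestion],
                   threshold: float = 0.5, top_n: int = 5) -> List[Tuple[float, int]]:
    """Keep FAQs scoring above the threshold, best first, as (score, index) pairs."""
    scored = [(match_score(q, faq), i) for i, faq in enumerate(repository)]
    kept = [pair for pair in scored if pair[0] > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_n]
```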
In (1) to (3), Sim_QT denotes the similarity score of the question target (QT) word type between a given new question q and an existing FAQ question faq. The remaining quantities are the set of semantic labels corresponding to the target words of the question; the semantic labels of the QT words in q and in faq, obtained through semantic labeling; the synonymy expansion of a word w, which extends w with its synonym word collection; and SMatch(w_i, w_j), the synonymy-based matching of two words w_i and w_j. By integrating these three parts of matching, the overall matching score of q and faq is calculated by balancing the similarity of each part, as shown in (4). After question matching, the top FAQs with the highest matching scores are selected as the FAQ candidate set. Meanwhile, the web-based Q&A model uses a similar strategy to compute matching scores against web question collections and extracts answers from websites via standard question-answering techniques, returning a web candidate question-answer set. The two candidate sets are merged into the final answer candidate set for answer ranking and answer selection.
We propose an LSI-based answer ranking method to rerank the candidates in the merged set. The ranking method consists of three steps: feature extraction, Latent Semantic Indexing (LSI) similarity calculation, and ranking. For Chinese questions, the extracted features are bag-of-words (BOW) and character features, while English questions use bag-of-words features only. The LSI approach exploits implicit higher-order semantic structure to match words in queries with words in documents [61]. Here, we treat each candidate answer as a short document and identify the most relevant answers via the LSI-based method. The candidate answers are then reranked by their similarity values, and the top answers are returned as the candidate list.
Figure 5: The screen snapshots of the mobile-based system for providing diabetes information services using the three Q&A models.
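A minimal NumPy sketch of LSI ranking over one candidate list follows: a term-by-answer count matrix is factorized with a truncated SVD, the query and the candidate answers are projected into the same rank-k space, and candidates are sorted by cosine similarity. Whitespace tokenization, the rank k, and all function names are assumptions; Chinese text would first need word segmentation, with the optional character features added on top.

```python
from typing import List, Tuple

import numpy as np


def tokenize(text: str, use_chars: bool = False) -> List[str]:
    """Bag-of-words tokens, plus single characters when use_chars is set.
    Whitespace splitting is a placeholder; Chinese text needs a word segmenter."""
    tokens = text.lower().split()
    if use_chars:
        tokens += [ch for ch in text if not ch.isspace()]
    return tokens


def lsi_rank(query: str, candidates: List[str], k: int = 2,
             use_chars: bool = False) -> List[Tuple[int, float]]:
    """Rank candidate answers by cosine similarity to the query in a rank-k LSI space."""
    docs = [tokenize(c, use_chars) for c in candidates]
    vocab = {t: i for i, t in enumerate(sorted({t for d in docs for t in d}))}
    # Term-document count matrix: rows are terms, columns are candidate answers.
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            A[vocab[t], j] += 1.0
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    U_k = U[:, :k]
    # Documents and query share the map U_k^T; for documents this equals S_k V_k^T.
    doc_vecs = (s[:k, None] * Vt[:k, :]).T
    q = np.zeros(len(vocab))
    for t in tokenize(query, use_chars):
        if t in vocab:  # out-of-vocabulary query terms are ignored
            q[vocab[t]] += 1.0
    q_vec = U_k.T @ q

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0

    scores = [cosine(q_vec, d) for d in doc_vecs]
    order = sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)
    return [(j, scores[j]) for j in order]
```

Because each candidate list is small, this sketch computes the SVD per query; an alternative design is to fit the latent space once over the whole repository and only fold in queries at run time.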
Finally, answer selection is performed. Deciding whether a candidate answer is correct or incorrect can be treated as a binary classification task. The question and each of its top N candidate answers in the list are transformed into QA pairs. We propose an answer selection approach using a Logistic Regression (LR) classifier, which includes four steps: feature extraction, parameter tuning, model training, and answer selection. Using features similar to those of the LSI-based approach, the classifier is trained on QA pairs randomly selected from the LMD-FAQ repository, where QA pairs with correct answers are labeled "1" and all others "0". We then tune the parameter "C" (inverse of regularization strength) to avoid overfitting and underfitting. The best parameter value is applied in the LR classifier, which then selects the best candidate answers: the top 1 is the best answer, and the remaining N-1 answers in the list are relevant answers. Figure 5 shows screen snapshots of the knowledge-based Q&A, FAQ-based Q&A, and web-based Q&A modes.
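The answer selection step can be sketched with a small NumPy logistic regression. The objective, 0.5 * ||w||^2 plus C times the summed log-loss, follows the convention in which C is the inverse of regularization strength (as in scikit-learn's `LogisticRegression`); the gradient-descent training, the C grid, and the QA-pair feature vectors are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_lr(X: np.ndarray, y: np.ndarray, C: float = 1.0,
             lr: float = 0.1, epochs: int = 2000):
    """Batch gradient descent on 0.5*||w||^2 + C * sum(log-loss); bias unpenalized.
    The gradient is divided by n, which only rescales the step size."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        residual = sigmoid(X @ w + b) - y
        w -= lr * (w + C * (X.T @ residual)) / n
        b -= lr * C * residual.sum() / n
    return w, b


def cv_accuracy(X: np.ndarray, y: np.ndarray, C: float,
                folds: int = 10, seed: int = 0) -> float:
    """Mean held-out accuracy over k folds for one value of C."""
    idx = np.random.default_rng(seed).permutation(len(y))
    chunks = np.array_split(idx, folds)
    scores = []
    for i in range(folds):
        test = chunks[i]
        train = np.concatenate([chunks[j] for j in range(folds) if j != i])
        w, b = train_lr(X[train], y[train], C)
        pred = (sigmoid(X[test] @ w + b) >= 0.5).astype(int)
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))


def tune_C(X: np.ndarray, y: np.ndarray, grid=(0.01, 0.1, 1.0, 10.0)) -> float:
    """Pick the C with the best cross-validated accuracy (hypothetical grid)."""
    return max(grid, key=lambda C: cv_accuracy(X, y, C))


def select_answers(w: np.ndarray, b: float, pair_features: np.ndarray):
    """Score one question's candidate QA pairs: the top-1 is the best answer,
    the remaining indices (best first) are the related answers."""
    probs = sigmoid(pair_features @ w + b)
    order = np.argsort(-probs)
    return int(order[0]), [int(i) for i in order[1:]], probs
```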

Diabetes Data Management with Early Warning.
Since diabetes patients and people at high risk usually need long-term health management, we develop a real-time data management module incorporating early warning to achieve patient health self-management.
In the data management module, users first register their basic information. They can then log in to report recent health data on six main characteristics: HbA1c, BMI, glucose, systolic hypertension (hypertension S), diastolic hypertension (hypertension D), and creatinine. The health data are stored securely on the server side.
Using the historical health data, the module calculates and monitors health status in real time. For each characteristic, we set an alarm value based on a literature review of IDF documents and reports. Once the health data change dramatically or a characteristic approaches its alarm value range, the system automatically sends a warning message to the user. To evaluate the usability of the system, a 2-month randomized study is designed, in which thirty people volunteered as internal test users to monitor their health condition with the Dia-AID system. During the test, users measure and report the data of the six characteristics themselves. For each new report, the system combines the existing and newly submitted data to summarize the user's health condition in real time. Table 1 shows the health data records reported by a user, Cecil.
The system records all the reported health data and generates data change curves automatically. For example, Figure 6 shows the trend curve of Cecil's diastolic hypertension over the last 7 days. When newly submitted health data are within the safe range and show no dramatic change since the last report, the system shows the user a health status message, e.g., "Your health status is good" in green. When the current change trend indicates that the user's data will exceed the alarm range (either too high or too low), the system estimates how long it will take to reach the alarm value. If the period is too short, the system automatically warns the user. For example, the system warns Cecil that diastolic hypertension is too high and will reach a dangerous range within 2 days if it is not brought under control. Through health data management with early warning, users can review their health status and act on the warning messages to reduce their diabetes risk.
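The trend-based warning can be illustrated with a least-squares slope over recent readings: if the latest value is inside the alarm range, the slope gives an estimate of the days left before a bound is crossed, and a warning is issued when that estimate falls within a horizon. The alarm bounds, the 3-day horizon, and the optional jump threshold below are hypothetical placeholders, not the IDF-derived values used by the system.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AlarmRange:
    low: float   # hypothetical lower alarm bound
    high: float  # hypothetical upper alarm bound


def linear_trend(days: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of the readings, in units per day."""
    n = len(days)
    mean_d, mean_v = sum(days) / n, sum(values) / n
    var = sum((d - mean_d) ** 2 for d in days)
    if var == 0:
        return 0.0
    cov = sum((d - mean_d) * (v - mean_v) for d, v in zip(days, values))
    return cov / var


def days_to_alarm(days: Sequence[float], values: Sequence[float],
                  alarm: AlarmRange) -> Optional[float]:
    """Days until the trend would cross an alarm bound; 0.0 if already outside,
    None if the trend is flat."""
    latest = values[-1]
    if latest >= alarm.high or latest <= alarm.low:
        return 0.0
    slope = linear_trend(days, values)
    if slope > 0:
        return (alarm.high - latest) / slope
    if slope < 0:
        return (alarm.low - latest) / slope
    return None


def check(days: Sequence[float], values: Sequence[float], alarm: AlarmRange,
          horizon_days: float = 3.0, max_jump: Optional[float] = None) -> str:
    """Return a status message in the spirit of the DM-EW module."""
    eta = days_to_alarm(days, values, alarm)
    if eta == 0.0:
        return "Warning: latest value is outside the alarm range"
    if max_jump is not None and len(values) >= 2 and abs(values[-1] - values[-2]) > max_jump:
        return "Warning: sudden change since the last report"
    if eta is not None and eta <= horizon_days:
        return f"Warning: may reach the alarm range in about {eta:.1f} day(s)"
    return "Your health status is good"
```

For example, readings of 80, 82, 84, and 86 on four consecutive days with a hypothetical upper bound of 90 give a slope of 2 per day and an estimate of 2 days to the alarm value.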

Datasets.
Since no diabetes FAQ dataset is available for evaluation, the proposed LSI-based answer ranking approach and answer selection method were evaluated on the constructed LMD-FAQ repository. To test the LSI-based answer ranking approach, we randomly selected 500, 750, 1000, 1250, 1500, and 1750 Chinese question-answer tuples (question, <answer-set>) from the repository as six subdatasets of Evaluation dataset-A. Each question-answer tuple contains one question and an answer set consisting of one correct answer and nine incorrect answers randomly drawn from the rest of the repository, so each question has 10 candidate answers for ranking. For answer selection evaluation, we assume that each question has k candidate answers; i.e., for each question, k-1 incorrect answers are randomly drawn as negative samples. In this paper, k is set to 5 and 10. For k=5, 6000 QA pairs are randomly generated as Training dataset-B1 and 2500 QA pairs as Testing dataset-C1. For k=10, 8000 QA pairs are randomly generated as Training dataset-B2 and 5000 QA pairs as Testing dataset-C2.
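The candidate construction described above can be sketched as follows: each sampled question keeps its own answer as the positive and draws k-1 answers of other questions as negatives. The function names and seed handling are illustrative assumptions; with k=10 this yields 10-candidate tuples like those of Evaluation dataset-A, and flattening the tuples gives labeled QA pairs for the classifier.

```python
import random
from typing import List, Sequence, Tuple

LabeledCandidates = List[Tuple[str, int]]  # (answer text, 1 if correct else 0)


def build_candidate_tuples(qa_pairs: Sequence[Tuple[str, str]], n_questions: int,
                           k: int = 10, seed: int = 0) -> List[Tuple[str, LabeledCandidates]]:
    """Sample questions; pair each with its own answer plus k-1 answers of other questions."""
    rng = random.Random(seed)
    tuples = []
    for i in rng.sample(range(len(qa_pairs)), n_questions):
        question, correct = qa_pairs[i]
        others = [j for j in range(len(qa_pairs)) if j != i]
        labeled = [(correct, 1)] + [(qa_pairs[j][1], 0) for j in rng.sample(others, k - 1)]
        rng.shuffle(labeled)  # hide the position of the correct answer
        tuples.append((question, labeled))
    return tuples


def to_qa_pairs(tuples: List[Tuple[str, LabeledCandidates]]) -> List[Tuple[str, str, int]]:
    """Flatten tuples into (question, answer, label) rows for the classifier."""
    return [(q, a, label) for q, labeled in tuples for a, label in labeled]
```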

Evaluation Metrics.
The evaluation metrics include Mean Reciprocal Rank (MRR), Accuracy@N of the returned answers, precision, recall, and F1-measure, all of which are commonly used to evaluate the performance of Q&A systems.
(i) MRR: the Mean Reciprocal Rank of the first correct answer, as shown in (5), where the reciprocal rank is 1 if a correct answer is retrieved at rank 1, 0.5 if it is retrieved at rank 2, and so on. Q is the test set, |Q| denotes the number of questions in Q, and rank_i is the position of the first correct answer among the ranked candidates for test question q_i.
$$\mathrm{MRR} = \frac{1}{|Q|}\sum_{i=1}^{|Q|}\frac{1}{\mathrm{rank}_i} \qquad (5)$$
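(ii) Accuracy@N: the proportion of test questions for which at least one correct answer appears among the top N answers returned by the system, as shown in (6), where f_N(q_i) = 1 if there is at least one correct answer among the top N candidates for q_i and 0 otherwise.
$$\mathrm{Accuracy@}N = \frac{1}{|Q|}\sum_{i=1}^{|Q|} f_N(q_i) \qquad (6)$$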
(iii) Precision for any category is the number of true positives (TP), i.e., questions correctly labeled as belonging to the positive category, divided by the total number of questions labeled as belonging to the positive category, as shown in (7). False positives (FP) are the questions that the system incorrectly labeled as belonging to the positive category.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (7)$$
(iv) Recall is the number of true positives divided by the total number of questions that actually belong to the positive category (i.e., the sum of true positives and false negatives (FN)), as shown in (8).
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (8)$$
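(v) F1-measure combines precision and recall into a balanced score, their harmonic mean, as shown in (9).
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (9)$$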
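Equations (5) to (9) translate directly into code. In this sketch, `ranks` holds the 1-based position of the first correct answer for each test question (None if no correct answer was returned, which contributes 0 to MRR), and the classification metrics treat label 1 as the positive class; the function names are illustrative.

```python
from typing import Optional, Sequence, Tuple


def mrr(ranks: Sequence[Optional[int]]) -> float:
    """Mean Reciprocal Rank, Eq. (5); a missing correct answer contributes 0."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)


def accuracy_at_n(ranks: Sequence[Optional[int]], n: int) -> float:
    """Accuracy@N, Eq. (6): share of questions with a correct answer in the top n."""
    return sum(1 for r in ranks if r is not None and r <= n) / len(ranks)


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1, Eqs. (7) to (9), with label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For ranks [1, 2, 4], MRR is (1 + 0.5 + 0.25)/3, about 0.583, and Accuracy@2 is 2/3.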

Results.
To validate the proposed LSI-based answer ranking method, we conduct two experiments. The first experiment verifies the effectiveness of the LSI-based answer ranking method by comparing it with five baselines: Doc2Vec, Latent Dirichlet Allocation (LDA), Locality Sensitive Hashing (LSH), docsim, and Synonyms [62]. We randomly select 500 questions and measure performance in MRR and Accuracy@N (Acc@N, N = 1, 2, 3, 4, 5). As shown in Table 2, our method achieves the best performance on all evaluation metrics. For MRR, our method improves by 17.80% over LSH, the best-performing baseline. For Acc@1, LSH also obtains the best baseline performance (0.6733), and our method outperforms it with an improvement of 23.52%. In addition, our method ranks 94.99% of the correct answers in the top five candidate answers. The improvements in MRR and Acc@1 suggest that the proposed method moves correct answers to higher positions in the ranking.
To assess the stability of the proposed method, the second experiment compares it with the same five baselines in MRR and Acc@1 on Evaluation dataset-A. Figure 7 shows the results measured in MRR, and Figure 8 shows the results measured in Acc@1. Our method achieves stable performance across all sizes of the question sets, which is promising since it ranks most of the correct answers at the top of the candidate answer list. Moreover, our method achieves the best Acc@1 among all compared methods on every question set. Even as the number of questions increases, nearly 85% of correct answers are ranked at the top of the candidate answer list.
Since our answer selection approach uses a binary classifier, we assess it by evaluating the effectiveness of answer classification. Three experiments are designed: the first tunes the classifier parameter, the second assesses the stability of classification, and the third evaluates effectiveness against baseline methods. The evaluation datasets come from the constructed LMD-FAQ repository, and the evaluation metrics are precision, recall, F1, and accuracy.
To avoid overfitting and underfitting, we tune the parameter "C" (the inverse of regularization strength) of the LR classifier described above. We randomly select 12,651 QA pairs from the LMD-FAQ repository, shuffle them, and split them into training (70%) and testing (30%) subsets. Model performance is assessed with k-fold cross-validation. Figure 9 shows the validation curve, where training accuracy represents the results on the testing dataset and validation accuracy denotes the 10-fold cross-validation results. The method performs best when "C" equals 1, so this value is used in the following two experiments.

The stability of the proposed method is then tested with different training dataset sizes and different values of k. With k=5, Training dataset-B1 is randomly divided into five training subsets containing 2000, 3000, 4000, 5000, and 6000 question-answer pairs, respectively. Similarly, with k=10, Training dataset-B2 is randomly divided into five training subsets containing 4000, 5000, 6000, 7000, and 8000 question-answer pairs. Testing dataset-C1 and Testing dataset-C2 are used as the independent testing datasets for the two settings. The results are measured in accuracy (Acc), precision, recall, and F1-measure (F1). As shown in Figure 10, our method achieves stable performance on all evaluation metrics with k=5, and when the training dataset contains more than 3000 pairs, the performance on all metrics increases. These results indicate that our method is not strongly affected by training dataset size. As illustrated in Figure 11, accuracy remains stable across all training dataset sizes, while F1 increases as the training dataset grows.
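The C-tuning step described above (a 70/30 split, then k-fold cross-validation over candidate values of C) can be sketched as follows. The paper does not publish its features or code, so this is a hypothetical illustration: synthetic 2-D points stand in for QA-pair feature vectors, the grid of C values is an assumption, and the logistic regression is a small from-scratch implementation that, like scikit-learn's LogisticRegression, minimizes 0.5·||w||² + C·Σ log-loss, so a larger C means weaker regularization.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, C, epochs=100):
    """L2-regularized logistic regression fitted by full-batch gradient descent.

    Minimizes 0.5*||w||^2 + C * sum(log-loss); the objective is divided by C*n
    (same minimizer) and the bias is not regularized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    # Step 1/L, where L upper-bounds the Lipschitz constant of the gradient.
    lip = 0.25 * (max(sum(v * v for v in x) for x in X) + 1.0) + 1.0 / (C * n)
    step = 1.0 / lip
    for _ in range(epochs):
        gw = [wj / (C * n) for wj in w]
        gb = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb
    return w, b


def accuracy(w, b, X, y):
    preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def kfold_indices(n, k, seed=0):
    # Shuffle once, then deal indices round-robin into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(X, y, C, k=10):
    scores = []
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        w, b = train_logreg([X[i] for i in train], [y[i] for i in train], C)
        scores.append(accuracy(w, b, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / k


def select_C(X, y, grid, k=10):
    # Mean k-fold validation accuracy per C; ties go to the earlier grid value.
    scores = {C: cv_accuracy(X, y, C, k) for C in grid}
    return max(grid, key=lambda C: scores[C]), scores


def make_toy_data(n_per_class=60, seed=1):
    # Hypothetical 2-D stand-ins for QA-pair features: label 1 = correct answer.
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in ((0, -1.5), (1, 1.5)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    order = list(range(len(X)))
    random.Random(2).shuffle(order)
    cut = int(0.7 * len(order))  # 70% training / 30% testing
    X_tr, y_tr = [X[i] for i in order[:cut]], [y[i] for i in order[:cut]]
    X_te, y_te = [X[i] for i in order[cut:]], [y[i] for i in order[cut:]]
    best_C, scores = select_C(X_tr, y_tr, [0.01, 0.1, 1, 10, 100])
    w, b = train_logreg(X_tr, y_tr, best_C)
    print("CV accuracy per C:", scores)
    print("best C:", best_C, "held-out accuracy:", accuracy(w, b, X_te, y_te))
```

In practice a library implementation (for example scikit-learn's LogisticRegression together with validation_curve) would replace the hand-written optimizer; it is written out here only so that the example runs with the standard library alone.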
Comparing the two dataset settings, our method performs better when k equals 10, which indicates that the proposed method remains stable even when the candidate answer lists contain more incorrect answers.
We further compare our method with five commonly used classification methods: Support Vector Machine (SVM), Perceptron (PPN), Random Forest (RF), Gaussian Naive Bayes (GaussianNB), and k-Nearest Neighbor (KNN). The datasets used are Training dataset-B1 and Training dataset-B2 and the corresponding Testing dataset-C1 and Testing dataset-C2. The evaluation metrics are accuracy, precision, recall, and F1. Table 3 shows the comparison results under the two dataset settings. With k=5, our method achieves an accuracy of 0.9222, a precision of 0.8859, a recall of 0.8657, and an F1 of 0.8753, the best results among all compared methods. With k=10, our method also obtains the highest performance on all evaluation metrics. Higher precision and F1 are particularly desirable, since our goal is to return more correct answers to users and thereby improve user satisfaction.

Conclusions
To support long-term health management for diabetes patients and populations at high risk of diabetes, this paper designed and developed a mobile-based question-answering and early warning system, Dia-AID. The system provides users with diabetes information and helps them monitor their health status through diabetes question answering, risk assessment, and health record management. We evaluated two essential models in our system and compared them with five baseline methods on various metrics. The results showed that our methods outperformed the baseline methods.

Data Availability
The diabetes data are not publicly available.

Conflicts of Interest
The authors declare no conflicts of interest.