Auxiliary Diagnosis Based on the Knowledge Graph of TCM Syndrome

Abstract: As one of the most valuable assets in China, traditional medicine has a long history and contains a wealth of knowledge. The diagnosis and treatment of Traditional Chinese Medicine (TCM) have benefited from natural language processing technology. This paper proposes a knowledge-based syndrome reasoning method for computer-assisted diagnosis. The method builds on an established knowledge graph of TCM and introduces a reinforcement learning algorithm to mine the hidden relationships among entities and obtain reasoning paths. Along these reasoning paths, we can infer the path from symptoms to syndromes and obtain all candidate possibilities via the relationship between symptoms and causes. Moreover, this study applies the idea of Term Frequency-Inverse Document Frequency (TF-IDF) to computer-assisted TCM diagnosis in order to calculate syndrome scores. Finally, by combining symptoms, syndromes, and causes, the disease is confirmed comprehensively by voting, and the experiments show that the system can effectively help doctors and families with disease diagnosis.


Introduction
Much of the research on disease diagnosis is based on western medicine and is carried out from the perspective of computer vision. Fewer studies combine disease diagnosis with knowledge graphs. Hua et al. [Hua, Bao, Chen et al. (2019)] only conducted related research on the knowledge graph and proposed optimization of the dynamic measure in the spillover effect. However, no work has addressed TCM diagnosis and treatment based on natural language and knowledge graphs. This paper mainly draws on reinforcement learning over knowledge graphs and on an adaptation of the TF-IDF idea. Li et al. [Li, Wang and Li (2017)] reviewed the construction, representation, and application of knowledge graphs, and explained their key technologies. Wang et al. [Wang, Li, Wang et al. (2013)] combined the knowledge graph with enterprise data and proposed building a large-scale application knowledge graph. On TF-IDF, Dimitrovski et al. [Dimitrovski, Kocev, Kitanovski et al. (2014)] used TF-IDF in the medical field, but only to improve medical image modality classification, not to score disease reasoning. Wang et al. [Wang, Zhang, Yuan et al. (2018)] applied TF-IDF to the knowledge graph, but they only assigned reasonable weights to keywords and did not use TF-IDF for scoring, especially not in the specific field of Chinese medicine. The rise of deep learning has led scholars to consider combining deep learning with reinforcement learning. Mnih et al. [Mnih, Kavukcuoglu, Silver et al (2015); Caicedo and Lazebnik (2015)] combined the Convolutional Neural Network (CNN) with the Q-learning algorithm of traditional reinforcement learning and proposed the deep Q-network model, a pioneering work in deep reinforcement learning. However, reinforcement learning techniques are rarely used in the medical field, especially for disease reasoning paths.
This auxiliary diagnostic system aims to assist doctors in diagnosis. Based on the knowledge graph [Feng and Qin (2015); E, Lin and Xiang (2016)] of TCM syndrome, this paper analyzes the intrinsic relationship between symptom entities and syndrome entities, as well as the relationship between symptom entities and etiology entities, and finally obtains multi-inferential meta-paths from a symptom entity to a syndrome entity and from a symptom entity to an etiology entity. A meta-path is a path containing only entity types and relationship types, indicating that a certain type of entity can reach a new type of entity through a certain type of relationship in the graph. Through these meta-paths, the corresponding candidate syndromes are inferred, and the candidate syndromes are scored according to the scoring mechanism defined in this paper. The higher the score, the more strongly the system recommends the syndrome. The reinforcement learning algorithm [François-Lavet, Henderson and Islam (2018)] is adopted, with the knowledge graph of TCM syndrome as the reinforcement learning environment. A reward mechanism specific to this experiment is designed. A fully connected neural network is used to parameterize the policy function, and a strategy for finding paths between nodes is designed. The network discovers the meta-paths between entity pairs. The resulting meta-paths are analyzed, and an experiment is performed to verify their validity. Inferring the disease from the obtained meta-paths requires a good scoring rule to find the most likely disease for the patient. This paper uses the idea of TF-IDF to calculate the discrimination of each symptom; combined with the weight of the meta-path, this yields the score of each candidate syndrome.
Given a set of symptoms, when the corresponding etiological node is found according to a meta-path, the score of that node is calculated by superimposing the weights of the meta-paths that reach it. Finally, the idea of multi-strategy syndrome differentiation is put forward. Following the concept of determining the etiology first in medical consultation, reasoning from symptoms to syndromes and from symptoms to causes is combined to improve the effectiveness of the underlying disease reasoning.
Related work

Dataset
This auxiliary diagnostic system aims to assist doctors in the diagnosis of diseases, and the work in this paper starts from the knowledge graph of TCM syndrome. First, TCM syndrome books are scanned and the knowledge is extracted in the form of triples. Then the Neo4j database is used to build the corresponding knowledge graph. The database contains 54016 entities and 113413 relationships between entities, and the graph has 24 node types and 23 relationship types.

Framework
First, the symptom-to-syndrome paths and the symptom-to-cause paths are inferred through reinforcement learning. Then, once the symptoms are input, multiple related syndromes are scored using the obtained symptom paths and TF-IDF on the one hand, and the most likely cause is calculated using the obtained cause paths and the preset cause weights on the other. Finally, the cause is integrated into the syndrome scores to further screen out the most likely syndromes. The overall framework is shown in Fig. 1.

Reinforcement learning
The basic model of reinforcement learning is shown in Fig. 2. An agent senses the environment by interacting with it and selects actions so as to obtain the maximum cumulative reward. The interface between the agent and the environment consists of actions, rewards, and states. (2018)]: if an action leads to a reward from the environment, the tendency to produce this action is strengthened, and vice versa. That is, relying only on the current state and rewards, experience is accumulated through reinforcement signals until an optimal policy is reached. Here $s \to s'$ is a state transition, $r$ is the reward or punishment feedback, and $\gamma$ is the discount factor.
In general, reinforcement learning is modeled as a Markov decision process and solved iteratively with the well-known Bellman equation. We first give a formal definition. Markov decision process: consider a discrete state set $S$, a discrete action set $A$, a reward function $R$, and a state transition function $S \times A \to PD(S)$, where $PD(S)$ is the set of probability distributions over $S$. The goal of the agent is to find, for each discrete state $s$, an optimal strategy $\pi$ that maximizes the expected discounted sum of rewards $v(s, \pi)$, in which $s_0$ is the initial state and $r_t$ is the reward obtained at time $t$. For any state $s \in S$, it can be proved that there is an optimal strategy $\pi^*$ that satisfies the Bellman equation, in which $v(s, \pi^*)$ is the optimal value of the agent in state $s$. If the agent knows the reward function and the state transition function, $\pi^*$ can be obtained by an iterative search method. Based on the Markov decision process, researchers have proposed many different reinforcement learning algorithms to suit various learning environments.
This paper uses reinforcement learning to obtain multi-inferential meta-paths from a symptom entity to a syndrome entity and from a symptom entity to an etiology entity. The first component is the agent, which mainly concerns how an action is selected in a specific state. In this paper, a neural network is used to calculate the probability of selecting an action in a given state. The environment is the already constructed knowledge graph of TCM syndrome. The nodes in the graph are the states in which the agent may be located, and the edges in the graph are the actions that the agent may take.
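The two formulas that the definition above refers to can be written out in the standard form for a discounted Markov decision process. This is a sketch under the assumption that the paper follows the usual textbook formulation; the symbols $r(s,a)$ for the immediate reward and $p(s' \mid s,a)$ for the transition probability are notation introduced here for illustration only.

```latex
% Expected discounted sum of rewards when starting in s and following \pi
v(s,\pi) = \sum_{t=0}^{\infty} \gamma^{t}\, E\left(r_t \mid \pi,\ s_0 = s\right)

% Bellman optimality equation satisfied by the optimal strategy \pi^{*}
v(s,\pi^{*}) = \max_{a \in A} \left\{ r(s,a) + \gamma \sum_{s' \in S} p(s' \mid s,a)\, v(s',\pi^{*}) \right\}
```

With $0 \le \gamma < 1$ and bounded rewards the infinite sum converges, which is what makes the iterative solution mentioned above well defined.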
In terms of actions, selecting an action in this project means selecting a type of edge in the graph: the agent chooses a certain relationship and jumps to a new node. For example, an agent at the "headache" node selects an edge of type "main syndrome-inv" and jumps to the "disorder" node. Abstractly, the current state of the agent is "headache", the action "main syndrome-inv" is selected, and the state of the agent changes from "headache" to "disorder". By taking a series of such actions, the agent keeps changing its state until it finds the target node. In terms of states, each node in the TCM syndrome knowledge graph is a state.
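The state and action abstraction described above can be sketched as a small graph environment. This is a minimal illustration, not the paper's implementation: the class `GraphEnv`, the rule of appending "-inv" to build inverse edges, and the choice to follow the first matching edge are all assumptions made for the example.

```python
from collections import defaultdict


class GraphEnv:
    """Toy knowledge-graph environment: nodes are states, edge types are actions."""

    def __init__(self, triples):
        # adjacency[node][relation] -> list of nodes reachable via that relation
        self.adjacency = defaultdict(lambda: defaultdict(list))
        for head, relation, tail in triples:
            self.adjacency[head][relation].append(tail)
            # Assumed convention: an inverse edge lets the agent walk backwards.
            self.adjacency[tail][relation + "-inv"].append(head)

    def actions(self, state):
        """Edge types that can be selected at the current node."""
        return sorted(self.adjacency[state])

    def step(self, state, action):
        """Follow the first edge of type `action`; stay put if there is none."""
        targets = self.adjacency[state].get(action)
        return targets[0] if targets else state


# The example from the text: at "headache", choose "main syndrome-inv".
env = GraphEnv([("disorder", "main syndrome", "headache")])
next_state = env.step("headache", "main syndrome-inv")
```

In a trained agent, the action passed to `step` would come from the policy network's probability distribution over `actions(state)` rather than being fixed by hand.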
For the policy network, which learns from a given set of entity pairs between which paths need to be discovered, Breadth-First Search (BFS) is used in the given environment to find a path between the entities. In the knowledge graph of TCM syndrome, BFS stops searching as soon as it finds one path between two nodes. This makes it impossible to calculate the path-diversity reward in subsequent work, so the problem is solved by introducing intermediate nodes; in the actual implementation, five intermediate nodes are set. When BFS searches for a path between node A and node B, it searches for the path from node A to an intermediate node C and the path from the intermediate node C to node B. With this design, the number of paths that can be found is roughly controlled by the number of intermediate nodes.
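The intermediate-node idea can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the graph is a plain adjacency dictionary, and the intermediate nodes are passed in by the caller because the text does not say how they are chosen.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def paths_via_intermediates(graph, start, goal, intermediates):
    """Collect up to one path per intermediate node: start -> mid -> goal."""
    found = []
    for mid in intermediates:
        first = bfs_path(graph, start, mid)
        second = bfs_path(graph, mid, goal)
        if first and second:
            found.append(first + second[1:])  # drop the duplicated mid node
    return found


graph = {"A": ["C1", "C2"], "C1": ["B"], "C2": ["D"], "D": ["B"], "B": []}
candidate_paths = paths_via_intermediates(graph, "A", "B", ["C1", "C2"])
```

Plain BFS from A to B would return only the path through C1; forcing the search through each intermediate node yields one candidate per node, which is what allows path diversity to be measured afterwards.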
In this paper, the reward value is calculated for the path between two nodes and is used to judge the quality of a path. For the symptom and syndrome entity paths, the reward is calculated as follows. Given two nodes A and B, the case in which no path exists between node A and node B is rewarded separately. If the newly discovered path between node A and node B has length $p$, then $r_{length} = 1/p$. The existing paths between node A and node B are numbered 1, 2, 3, and so on, and the new path is compared against them for diversity. If the newly discovered path between node A and node B contains an edge of the master syndrome type, $r_{mid} = 3$. If it contains an edge of the descriptive form type, $r_{mid} = 2$. If it contains an edge of the guest form type, $r_{mid} = 1$. The overall reward mechanism of the model is defined from these components.
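The two reward components whose values are stated above can be sketched in a few lines. This covers only $r_{length}$ and $r_{mid}$; the no-path case, the diversity term, and the way the components are combined are left out because their formulas are not given here. Taking the largest bonus when a path contains several of the listed edge types is an assumption of this sketch.

```python
# Edge-type bonuses (r_mid) as listed in the text.
MID_REWARD = {"master syndrome": 3, "descriptive form": 2, "guest form": 1}


def length_reward(path_length):
    """r_length = 1 / p for a newly discovered path of length p."""
    return 1.0 / path_length


def mid_reward(edge_types):
    """r_mid for a path given its edge types.

    Assumption: if several listed edge types occur, the largest bonus is used.
    """
    return max((MID_REWARD.get(edge, 0) for edge in edge_types), default=0)
```

Because $r_{length}$ decreases with $p$, shorter paths are preferred, while $r_{mid}$ favors paths that pass through the more informative edge types.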

Discovery of inter-entity paths between one-to-many structures
We apply the reinforcement learning framework described above to a concrete task: discovering the paths between symptoms and syndromes. The discovered paths support an auxiliary diagnostic service. The common process of computer-assisted diagnosis is to reason from the patient's symptoms to the disease. Put plainly, symptoms in this paper are symptoms in the everyday sense, and a syndrome corresponds to a disease. Based on the paths found, it is thus possible to obtain the set of syndromes that a patient may have from a given set of symptoms. After reinforcement learning is used to find the paths between two types of entities, a set of meta-paths is eventually obtained. A meta-path [Ji, Sun and Danilevsky (2012)] is a path consisting only of actions, that is, the most simplified form of a path. Such paths generalize better than specific paths, and all specific paths conforming to a meta-path can be regarded as instances of that meta-path. The meta-paths found are shown in Fig. 3.

Figure 3: The final meta-paths found between the symptoms and syndromes
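Reducing specific paths to meta-paths, as described above, amounts to keeping only the sequence of actions. The sketch below assumes a specific path is stored as a list of (head, relation, tail) hops; the function names and the sample entities are hypothetical.

```python
from collections import Counter


def to_meta_path(path):
    """Keep only the relation types of a specific path of (head, relation, tail) hops."""
    return tuple(relation for _, relation, _ in path)


def count_meta_paths(paths):
    """Count how many specific paths are instances of each meta-path."""
    return Counter(to_meta_path(path) for path in paths)


# Two specific paths with different entities but the same sequence of actions.
specific_paths = [
    [("headache", "main syndrome-inv", "syndrome_1")],
    [("cough", "main syndrome-inv", "syndrome_2")],
]
meta_path_counts = count_meta_paths(specific_paths)
```

Both specific paths collapse to the single meta-path ("main syndrome-inv",), which is why a meta-path generalizes to entity pairs that were never seen during path search.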

Discovery of many-to-many structure inter-entity paths
TCM pays attention to determining the etiology and location of diseases, so the second task is to find the paths between symptoms and causes. The corpus preprocessing for this task is the same as for the previous one. First, symptom and cause entity pairs are retrieved from the TCM syndrome database, and these entity pairs form the corpus. The meta-paths found are shown in Fig. 4.

Figure 4: The meta-paths and the number of paths found between the symptoms and causes

These differ from the meta-path used to look up symptom and cause entities in the graph database. The meta-path used when searching for paths between symptom and cause entities in the graph database is symptoms-inv-main syndrome-inv-syndromecause-single cause, which is also in line with the path-length principle of the reward mechanism. In the path discovery experiment for symptom and cause entity pairs, the paths are significantly longer than those between symptom and syndrome entity pairs, and the connections between entity pairs are more complicated. When the connections in the knowledge graph are this complicated, the reinforcement learning framework in this project can find path connections that are difficult to find manually. The paths in Fig. 4 are reduced to a number of usable meta-paths, which are taken as the final output, as shown in Fig. 5.

Figure 5: The final meta-paths found between the symptoms and causes
After obtaining the inference paths between symptoms and causes, the validity of these discovered paths is verified with validation and test sets in the experimental part.

Meta-path based on reasoning strategy
Through reinforcement learning, the inference paths between the target entity pairs are found in the knowledge graph of TCM syndrome. The goal of this paper is to assist in the diagnosis of disease. In this section, we explain how to use the obtained meta-paths between entities to perform auxiliary diagnosis.

TF-IDF conversion application
Given a set of symptoms of the patient and the reasoning paths from symptoms to syndromes, the idea of TF-IDF [Al-Talib and Hassan (2013); Zheng and Xu (2014)] can be used to calculate the degree of discrimination of each symptom; combined with the weight of the meta-path, a score can be calculated for each candidate syndrome [Roul, Devanand and Sahay (2014)]. To calculate node weights in the TCM syndrome knowledge graph with TF-IDF, the traditional TF-IDF formula needs to be modified. The two concepts of documents and words in TF-IDF correspond, in the knowledge graph of TCM syndrome, to the target entity type and the input entity type. In the modified TF and IDF formulas, $w$ is the number of times that the input entity connects to the target entity, $t$ is the number of input entities that the target entity connects to, $a$ is the total number of target entities, and $b$ is the number of target entities associated with the input entity.
These formulas are applied to path discovery between symptom entities (the input entity type) and syndrome entities (the target entity type). In the TF-IDF formula for this task, $w'$ is the number of times the symptom connects to the syndrome, $t'$ is the number of symptom entities connected to the syndrome, $a'$ is the total number of syndrome entities, and $b'$ is the number of syndrome entities connected to the symptom entity.
Among them, TF is the term frequency of a symptom ($w$) for a syndrome ($t$). If a symptom appears in a TCM syndrome many times, the TF value for this symptom and syndrome pair is high. After some symptoms are input, multiple syndromes can be obtained according to the meta-paths, and the score of each syndrome is calculated with the TF-IDF algorithm above. The syndrome with the highest score is taken as the patient's disease. As an example, some symptoms of the syndrome node "liver-lung wind-phlegm syndrome", whose id is L34-Z04-H033, are entered and the syndromes are derived using the meta-paths. The result is shown in Fig. 6.

Figure 6: Syndrome score ranking corresponding to some symptoms

As shown in Fig. 6, syndrome L34-Z04-H033 has the highest score in the ranking. These results demonstrate that the scoring mechanism based on the modified TF-IDF is persuasive and can predict the disease from symptoms.
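The scoring idea can be sketched under explicit assumptions. The modified formulas themselves are not reproduced in the text, so this example assumes the standard shapes TF = $w/t$ and IDF = $\log(a/b)$ with the variable meanings given above, and it leaves out the meta-path weight. The function names, the `links` structure, and the sample data are hypothetical.

```python
import math


def tf(w, t):
    """Assumed form: connections of this symptom to the syndrome over t."""
    return w / t


def idf(a, b):
    """Assumed form: log(total syndromes / syndromes linked to the symptom)."""
    return math.log(a / b)


def syndrome_scores(symptoms, links, total_syndromes):
    """Sum TF * IDF over the input symptoms for every syndrome they reach.

    links[syndrome][symptom] = number of connections (w) between the pair.
    """
    # b: number of syndromes each symptom is linked to
    linked_syndromes = {}
    for counts in links.values():
        for symptom in counts:
            linked_syndromes[symptom] = linked_syndromes.get(symptom, 0) + 1

    scores = {}
    for syndrome, counts in links.items():
        t = len(counts)  # number of symptom entities connected to the syndrome
        matched = [s for s in symptoms if s in counts]
        if matched:
            scores[syndrome] = sum(
                tf(counts[s], t) * idf(total_syndromes, linked_syndromes[s])
                for s in matched
            )
    return scores


# Hypothetical toy data: two syndromes out of a total of four.
links = {"S1": {"cough": 2, "fever": 1}, "S2": {"fever": 1}}
scores = syndrome_scores(["cough", "fever"], links, total_syndromes=4)
```

In this toy data "cough" is linked to a single syndrome and therefore discriminates more strongly than "fever", so S1 ends up ranked above S2.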

Cause weight superposition
Using the meta-paths, given a set of symptoms that the patient is suffering from, the corresponding cause nodes are found. The scores of the cause nodes are superimposed according to the weights of the meta-paths. For example, given the symptoms "stroke fainting" and "tooth closed", both symptoms can connect to the single cause "wind" through the meta-path symptoms-inv → syndrome-inv → cause → single cause. The weight of this meta-path is 3, so when the cause is determined, wind gets a score of 6. In this way, a cause can be determined for every symptom entity in the symptom set: for each symptom, the cause with the highest score is chosen. Some results are shown in Tab. 1.

Symptom | Cause | Score
Thirsty and want to drink water | fiery | 8
Thirsty and want to drink water | irritating | 3
Thirsty and want to drink water | wet | 2
Thirsty and want to drink water | weak | 1
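The weight superposition in the example above is simple enough to check by hand: two symptoms each reach wind through a meta-path of weight 3, giving 3 + 3 = 6. A minimal sketch follows; the function name and the `reachable` structure are assumptions made for illustration.

```python
from collections import defaultdict


def cause_scores(symptoms, reachable_causes):
    """Add the meta-path weight to a cause each time a symptom reaches it.

    reachable_causes[symptom] = list of (cause, meta_path_weight) pairs.
    """
    scores = defaultdict(int)
    for symptom in symptoms:
        for cause, weight in reachable_causes.get(symptom, []):
            scores[cause] += weight
    return dict(scores)


# The example from the text: both symptoms reach "wind" via a weight-3 meta-path.
reachable = {
    "stroke fainting": [("wind", 3)],
    "tooth closed": [("wind", 3)],
}
wind_example = cause_scores(["stroke fainting", "tooth closed"], reachable)
```

Here `wind_example` is `{"wind": 6}`, matching the score stated in the text.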

Syndrome inference with multiple strategies
According to the TCM concept of determining the cause of the disease first, the fusion of the two reasoning methods, from symptoms to syndromes and from symptoms to etiological factors, improves the reasoning accuracy. By determining the cause of each symptom, the patient's symptoms are mapped to their individual causes, and the etiology of the whole group of symptoms is decided by voting [Zhang, Luo and Tang (2013)]. For example, the syndrome "Clear Gas Loss" has the id L34-Z04-H033 in the graph, and some of its symptoms are shown in Tab. 2:
(1) Quiet but not bothered
(2) Blind and confused
(3) Very much sputum
(4) Sleep very sweet
(5) Twitch
(6) Mouth-eyes (oblique)
(7) The eyes were black, and I fainted
This group of symptoms is entered as the symptoms exhibited by the patient. Symptom-to-cause reasoning gives the following single causes: (1) wind; (2) wind; (3) sputum; (4) wind; (5) hot; (6) wind; (7) wind. Therefore, by voting, the patient's disease is probably caused by wind. This group of symptoms also leads to five candidate syndromes, namely L34-Z04-H033, L33-Z10-H190, L02-Z10-H190, L33-Z16-H071, and L34-Z18-H042. Among them, the causes of the L34-Z04-H033, L02-Z10-H190, and L34-Z18-H042 syndromes include wind, so the scores of these three syndromes are multiplied by 2 to obtain the final score. In the final score, syndrome L34-Z04-H033 scores highest; it best matches the patient's symptoms and is the most likely disease of the patient.
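The voting and rescoring steps in this example can be sketched as follows. The vote uses the seven single causes listed above, five of which are wind. The base syndrome scores are hypothetical placeholders, since the actual scores are not given, and the function names are assumptions.

```python
from collections import Counter


def vote_cause(symptom_causes):
    """Majority vote over the single cause assigned to each symptom."""
    return Counter(symptom_causes).most_common(1)[0][0]


def rescore(syndrome_scores, syndrome_causes, voted_cause, factor=2):
    """Multiply the score of each syndrome whose causes include the voted cause."""
    return {
        syndrome: score * factor
        if voted_cause in syndrome_causes.get(syndrome, ())
        else score
        for syndrome, score in syndrome_scores.items()
    }


# Single causes of symptoms (1) to (7) as listed in the text.
causes = ["wind", "wind", "sputum", "wind", "hot", "wind", "wind"]
voted = vote_cause(causes)

# Hypothetical base scores for two of the five candidate syndromes.
base_scores = {"L34-Z04-H033": 1.0, "L33-Z10-H190": 0.9}
causes_of = {"L34-Z04-H033": {"wind"}}
final_scores = rescore(base_scores, causes_of, voted)
```

The factor of 2 is the one stated in the text; syndromes whose causes do not include the voted cause keep their TF-IDF score unchanged.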

Experiment
After reinforcement learning yields the inference paths between symptoms and syndromes, validation and test sets are needed to verify the validity of the discovered meta-paths. In this experiment, the corpus of symptom and syndrome entity pairs contains 18,000 entity pairs. In the previous step, the first 2000 entity pairs were selected as the training corpus for path search. In this step, entity pairs are randomly selected from all entity pairs as positive samples, and the symptom entity in each positive sample is then replaced by a symptom entity drawn at random from all symptom entities. Since there are 20,306 symptom entities in total, the probability that a pair generated in this way is still a positive sample is negligible, so such pairs are treated as negative samples. The ratio of positive to negative samples is 1 to 6, that is, each positive sample generates six negative samples. For the validation set, 400 positive samples are randomly selected, resulting in 2400 negative samples. The test set is generated by the same replacement method, with 200 positive samples and 1200 negative samples.
The experimental model uses the Sequential model of Keras with a single layer. The input dimension is the path length of the meta-paths obtained in the previous training. The activation function is sigmoid, the optimizer is Root Mean Square Propagation (RMSprop), and the loss is the common logarithmic loss, which judges whether the label of an entity pair in the validation set agrees with whether the pair can be linked through the generated meta-paths. The model that performs best on the validation set is kept.
With this model, the average accuracy on the test set was calculated, and the final accuracy was 98.4%. The final results are shown in Fig. 7.
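The construction of negative samples described above can be sketched as follows. The function name, the fixed random seed, and the sample data are assumptions. The explicit check that a replaced pair is not itself a positive sample is an addition of this sketch; the text relies on that probability being negligible instead.

```python
import random


def make_negatives(positives, all_symptoms, per_positive=6, seed=0):
    """Build negative pairs by swapping the symptom of each positive pair.

    positives: list of (symptom, syndrome) pairs known to be linked.
    """
    rng = random.Random(seed)
    positive_set = set(positives)
    negatives = []
    for _, syndrome in positives:
        made = 0
        while made < per_positive:
            candidate = (rng.choice(all_symptoms), syndrome)
            # Added safeguard: skip a replacement that is itself a positive pair.
            if candidate not in positive_set:
                negatives.append(candidate)
                made += 1
    return negatives


# Hypothetical toy data; the paper draws from 20,306 symptom entities.
positives = [("s1", "A"), ("s2", "B")]
all_symptoms = ["s1", "s2", "s3", "s4"]
negatives = make_negatives(positives, all_symptoms)
```

With the 1 to 6 ratio, 400 positive samples yield 2400 negative samples and 200 positive samples yield 1200, as in the validation and test sets above.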

Figure 7: The accuracy of the model on the test set

To show that this way of calculating accuracy is convincing, several negative entity pairs are randomly taken from the test set and their labels are changed from the negative mark "-" to the positive mark "+". The program then reads these negative samples as positive samples. When they are evaluated, the model still judges that these supposedly positive samples do not conform to the discovered meta-paths, so the final average accuracy should decrease, which indicates that the previously designed evaluation criterion for meta-paths is appropriate. After this experiment, the accuracy of the model did drop to about 96.9%, as shown in Fig. 8.
After obtaining the inference paths between symptoms and causes, validation and test sets are likewise needed to verify the validity of the discovered paths. In this experiment, the corpus of symptom and cause entity pairs contains 7700 entity pairs, and the setup is similar to the experiment on paths between one-to-many entities. The ratio of positive to negative samples is still 1:6; the validation set contains 400 positive samples and 2400 generated negative samples. Samples from the validation set are shown in Tab. 3, where the label "+" or "-" after each entity pair marks it as positive or negative. The test set is generated by the same method of replacing the symptom entity in the positive samples, with 200 positive samples and 1200 generated negative samples. The result of testing the model on the test set is shown in Fig. 9. The same network is used for both experiments. The final accuracy is 89.1%.
The results show that the meta-paths obtained between symptoms and causes in this task are valid, and that reasoning over the paths between symptoms and causes in the knowledge graph of TCM syndromes is broadly applicable. Once the symptoms are entered, the syndrome scores are obtained with the TF-IDF calculation method. For example, the syndrome node "liver and kidney deficiency wind turns card" has the id L07-Z21-H296 in the graph; some symptoms of this node are input, and the result of reasoning with the meta-paths is shown in Fig. 10.

Figure 10: Symptom prediction syndrome score for node id L07-Z21-H296

Conclusion
Based on the constructed medical knowledge graph, this paper uses reinforcement learning to reason over the paths among entities. Specifically, the paths between symptom entities and syndrome entities and between symptom entities and cause entities were discovered. As the results show, the paths discovered by the policy network and the reward mechanism of reinforcement learning are usable. Syndrome differentiation reasoning from symptoms to syndromes and from symptoms to causes was applied to actual data to check its validity. In line with the principles of Chinese medicine diagnosis, the combined reasoning method improves diagnostic performance.
The results show that the diagnosis method that combines the accompanying cause, and in particular the TF-IDF scoring, yields better Chinese medicine diagnosis. The method may be further applied to concurrent (parallel) diseases, and the cause-based diagnosis process is expected to improve intelligent reasoning performance in the future.