Prediction of Diabetic Macular Edema Using Knowledge Graph

Diabetic macular edema (DME) is a significant complication of diabetes that impacts the eye and is a primary contributor to vision loss in individuals with diabetes. Early control of the related risk factors is crucial to reduce the incidence of DME. Artificial intelligence (AI) clinical decision-making tools can construct disease prediction models to aid in the clinical screening of the high-risk population for early disease intervention. However, conventional machine learning and data mining techniques have limitations in predicting diseases when dealing with missing feature values. To solve this problem, a knowledge graph displays the connection relationships of multi-source and multi-domain data in the form of a semantic network to enable cross-domain modeling and queries. This approach can facilitate the personalized prediction of diseases using any number of known feature data. In this study, we proposed an improved correlation enhancement algorithm based on knowledge graph reasoning to comprehensively evaluate the factors that influence DME to achieve disease prediction. We constructed a knowledge graph based on Neo4j by preprocessing the collected clinical data and analyzing the statistical rules. Based on reasoning using the statistical rules of the knowledge graph, we used the correlation enhancement coefficient and generalized closeness degree method to enhance the model. Meanwhile, we analyzed and verified these models’ results using link prediction evaluation indicators. The disease prediction model proposed in this study achieved a precision rate of 86.21%, which is more accurate and efficient in predicting DME. Furthermore, the clinical decision support system developed using this model can facilitate personalized disease risk prediction, making it convenient for the clinical screening of a high-risk population and early disease intervention.


Introduction
According to statistics from the World Health Organization (WHO), there are approximately 285 million people worldwide who suffer from visual impairment, and 4.8% of these cases are caused by diabetic retinopathy (DR) [1,2]. Diabetic macular edema (DME) is a serious complication of diabetes that affects the eye and is an advanced symptom of DR. It is one of the main causes of irreversible vision loss in diabetic patients [3], affecting approximately 1/15 diabetic patients globally [4]. As the number of diabetic patients Multi-source heterogeneous data: The data sources for a knowledge graph can include structured data (such as databases or canonical form data), semi-structured data (such as tables), and unstructured data (such as text and words).
Knowledge extraction: This process involves extracting entities, relationships, and attributes from multi-source heterogeneous data. Traditional rules, machine learning, and deep learning methods are often used to extract the required knowledge. Currently, the most popular method is CNN [10], which extracts lexical and sentence features for knowledge extraction. Socher et al. [11] used RNN to obtain vector features of sentences, which improved the performance of relation extraction. Zhang et al. [12] proposed a Bi-LSTM method to obtain information between words for relation classification.
Knowledge fusion: This process involves disambiguating, processing, and integrating heterogeneous and diverse knowledge from different data sources in the same framework, thus achieving the fusion of data and information from multiple perspectives. Currently, the technology for knowledge fusion is mainly divided into ontology fusion and data fusion. Ontology fusion involves integrating multiple heterogeneous ontologies from data sources into a unified ontology and establishing mapping rules between multiple ontologies, so information can be transferred between different ontologies. Data fusion involves using methods such as entity merging, entity alignment, and entity attribute fusion to achieve data unification and standardization.
Knowledge representation: This is a way of describing knowledge that involves transforming massive real-world information into structured data using information technology. Currently, the common methods of knowledge representation include XML [13], RDF [14], RDFS [15], and OWL [16], among others. XML represents knowledge in the form of documents and allows users to mark data and define data types. RDF uses a unified standard "subject-predicate-object" triple to describe entities and relationships, which can also be expressed as a directed graph structure. RDFS can be used as an extension of RDF with schema definitions and simple constraint rules for RDF entities, attributes, and relationships. OWL is based on RDFS, adding a predefined vocabulary to describe the characteristics of resources.
Knowledge storage: This component is used to manage and store knowledge. The main methods for storing knowledge include an RDF database, a traditional relational database, and a graph database. Among them, a graph database has become the mainstream method for knowledge storage, as it represents data as nodes and edges and clearly shows the dependencies between data nodes. Graph query languages support various graph mining algorithms, and popular graph databases include Neo4j [17], JanusGraph [18], HugeGraph [19], etc.
Knowledge reasoning: This process infers unknown relationship information based on existing entity relationship information, which further enhances the completeness and usefulness of a knowledge graph. There are several methods of knowledge reasoning, including reasoning based on logical rules, reasoning based on distributed feature representation, and reasoning based on a neural network. Multi-source heterogeneous data: The data sources for a knowledge graph can include structured data (such as databases or canonical form data), semi-structured data (such as tables), and unstructured data (such as text and words).
Knowledge extraction: This process involves extracting entities, relationships, and attributes from multi-source heterogeneous data. Traditional rules, machine learning, and deep learning methods are often used to extract the required knowledge. Currently, the most popular method is CNN [10], which extracts lexical and sentence features for knowledge extraction. Socher et al. [11] used RNN to obtain vector features of sentences, which improved the performance of relation extraction. Zhang et al. [12] proposed a Bi-LSTM method to obtain information between words for relation classification.
Knowledge fusion: This process involves disambiguating, processing, and integrating heterogeneous and diverse knowledge from different data sources in the same framework, thus achieving the fusion of data and information from multiple perspectives. Currently, the technology for knowledge fusion is mainly divided into ontology fusion and data fusion. Ontology fusion involves integrating multiple heterogeneous ontologies from data sources into a unified ontology and establishing mapping rules between multiple ontologies, so information can be transferred between different ontologies. Data fusion involves using methods such as entity merging, entity alignment, and entity attribute fusion to achieve data unification and standardization.
Knowledge representation: This is a way of describing knowledge that involves transforming massive real-world information into structured data using information technology. Currently, the common methods of knowledge representation include XML [13], RDF [14], RDFS [15], and OWL [16], among others. XML represents knowledge in the form of documents and allows users to mark data and define data types. RDF uses a unified standard "subject-predicate-object" triple to describe entities and relationships, which can also be expressed as a directed graph structure. RDFS can be used as an extension of RDF with schema definitions and simple constraint rules for RDF entities, attributes, and relationships. OWL is based on RDFS, adding a predefined vocabulary to describe the characteristics of resources.
Knowledge storage: This component is used to manage and store knowledge. The main methods for storing knowledge include an RDF database, a traditional relational database, and a graph database. Among them, a graph database has become the mainstream method for knowledge storage, as it represents data as nodes and edges and clearly shows the dependencies between data nodes. Graph query languages support various graph mining algorithms, and popular graph databases include Neo4j [17], JanusGraph [18], HugeGraph [19], etc.
Knowledge reasoning: This process infers unknown relationship information based on existing entity relationship information, which further enhances the completeness and usefulness of a knowledge graph. There are several methods of knowledge reasoning, including reasoning based on logical rules, reasoning based on distributed feature representation, and reasoning based on a neural network.

Knowledge application:
A knowledge graph is widely applied in semantic retrieval, knowledge question answering, recommendation systems, and decision making. A semantic search is used to understand user retrieval needs at the semantic level and search for matching resources. Answering a knowledge question involves converting user input questions into objective entity concepts within the knowledge base, which is achieved through structured query mapping via natural language methods in order to obtain the answer. Recommendation systems and decision making involve providing multi-level decision support and knowledge services for decision-making systems by enhancing the mining ability of recommendation algorithms in the knowledge graph, taking into consideration the personalized preferences of users.
At present, there is a lack of research on disease prediction models in the field of DME, and the development of a DME disease prediction model is of great significance for the early screening of a high-risk population. In this context, this study used knowledge graph technology to implement clinical auxiliary disease prediction functions to meet clinical needs. The clinical data collected by the Tianjin Medical University Eye Hospital were used as the research object, and a DME disease prediction model based on the knowledge graph [20] was proposed. By comprehensively analyzing the different influencing factors, the model was able to obtain the incidence probability of a DME disease.
In addition, a clinical auxiliary decision support system was developed to enable personalized disease prediction analysis, which can effectively assist in the clinical screening of a high-risk population for DME, provide technical support for disease prediction, and serve as a reference for innovative applications of disease prediction models. The innovations are as follows.

1.
A statistical analysis was conducted on the clinical data to identify the high-risk factors of DME for clinical reference and early intervention.

2.
Personalized disease prediction model was developed, which comprehensively considered the disease influencing factors and analyzed the disease probability of the target population.

3.
Knowledge graph technology was utilized to construct the disease prediction model, facilitating data updating, maintenance, and iterative improvements.

4.
A clinical decision support system was developed to promote the implementation and usage of this model in clinical practice, providing a technical foundation for an innovative application of the knowledge graph.

Literature Review
Studies showed that the risk factors for DME [21] include hyperglycemia, duration of diabetes, hypertension, hyperlipidemia, renal dysfunction, pregnancy, the level of cytokine VEGF, heredity, etc. Additionally, other factors such as a history of cataract surgery, insulin use, sleep apnea syndrome, anemia, and ocular inflammation were also identified.
In addition, studies showed that the level of β-collagen degradation products (β-CTx) is significantly correlated with DME in female patients with type 2 diabetes, and high serum calcium can also aggravate macular edema in patients with type 2 diabetes. Moreover, it is widely accepted that DME is caused by oxidative stress induced by continuous hyperglycemia and the accumulation of inflammatory cytokines and vascular endothelial growth factor (VEGF), leading to the disruption of the blood-retinal barrier (BRB) [22,23]. Due to the complex pathogenesis of DME, controlling the systemic factors that contribute to DME progression is key to reducing its morbidity.
At present, disease prediction models in the medical big data environment mainly use data mining and machine learning methods to predict unknown data. In medical diagnosis analysis, the characteristic values of disease types are extracted from case examination items, and the probability of different disease types with different characteristic values is analyzed to construct disease prediction models.
Puchao, H. et al. [24] used SVM, decision tree C5.0, and ANN methods in a data mining model to construct a lung cancer risk prediction model. Xing, W. et al. [25] used the decision tree classification method model to test and analyze historical cases and predict the prevalence probability of diseases such as cold and cough.
Alourani, A. et al. [26] used a model based on a deep neural network to predict the mortality of patients in health cloud data.
Raju, M. et al. [27] used machine learning methods to capture the clinical symptoms that may have an EHR history in potential pre-glaucoma patients, to achieve early intervention and preventive treatment of the disease.
However, these methods require sufficient feature data selection to complete disease prediction, which is challenging when faced with missing feature values. To address this issue, knowledge graph reasoning can infer unknown relationships or facts from the existing relationships or facts using any number of entity feature values and apply simple rules or statistical features for reasoning. By combining manually defined logical rules with various probabilistic graphical models, knowledge reasoning can be performed on the constructed logic network, which plays a crucial role in the fields of medical diagnosis, disease prediction, medical treatment, and medical standardization. A knowledge graph has a wide range of applications in TCM clinical, TCM basic, TCM health care, and other fields, including semantic retrieval, intelligent question and answer, decision support, etc. It lays a solid foundation for the intelligent application of TCM and can also be extended to other disease prediction fields.
Tong, Y. et al. [28] constructed a knowledge graph for the field of TCM to achieve an effective integration of TCM knowledge resources.
Li, Y. [29] took ILP as the basic algorithm framework, combined it with the characteristics of TCM diagnosis and treatment to form a knowledge base of TCM clinical syndrome diagnosis rules, and used the relevant weight training algorithm to learn the weights of the learned rules in MLN.
Yingying, Z. [30] constructed a TCM knowledge graph, applied it to the field of TCM diagnosis and treatment, and designed a tongue image diagnosis and treatment system based on the knowledge graph. According to the TCM diagnosis process, the final syndrome diagnosis results and treatment plans were given through the symptoms and tongue body photos entered by users.
Ziqiang, Z. [31] constructed a knowledge graph of the TCM medical cases of chronic kidney disease to perform representation learning of the CKD TCM medical case knowledge graph and reasoning of the CKD TCM medical case knowledge graph, realizing the learning and reasoning of the knowledge graph.
Dan, Y. et al. [32] used the construction method of a knowledge graph and graph search pattern to construct the pattern layer and data layer design of the knowledge graph with TCM classics prescriptions as the research object and designed a knowledge retrieval framework of classics' prescriptions by using Cypher language in Neo4j.
Kai, Z. et al. [33] used the Neo4j graph database to build a small knowledge graph based on the Guizhu decoction prescriptions in "Shang Han Lun" and realized the visual analysis and retrieval functions of the syndromes, prescriptions, and drugs of the Guizhu decoction prescriptions.
Wenlong, G. [34] realized the construction (knowledge fusion, knowledge acquisition, knowledge storage, and knowledge reasoning) and visualization framework of a TCM prescription knowledge graph.
Fan, L. [35] proposed a standardized system process for the construction of a knowledge graph in the field of experience inheritance of famous and old Chinese medicine.
Dan, Z. [36] constructed a knowledge graph of the rules of fatty liver disease syndrome treatment using famous old Chinese medicine and realized a summary of the rules of syndrome treatment by medical case collection, data aggregation, data preprocessing, and data analysis and mining methods (online analysis technology, the complex network method, and the FP-Growth algorithm).
By presenting the interconnection between multi-source and multi-domain data through a semantic network, a knowledge graph enables cross-domain modeling and querying [37] and allows for the expansion of data on demand without disrupting the existing data structure. It has found extensive applications in semantic retrieval [38], intelligent question answering [39], decision support [40], and other fields.
Knowledge reasoning refers to the process of using existing knowledge, rules, and constraints to acquire new knowledge. This process involves reasoning about unknown information based on known information [41]. For example, tasks such as recommendation computing [42], causal analysis, and query question answering can be reduced to the problem of reasoning about the relationships between entities. Knowledge graph reasoning is a technique that can be used to perform tasks such as attribute completion, relation prediction, error checking, question expansion, and semantic understanding [43]. There are several methods for performing knowledge reasoning, including rule-based reasoning, distributed reasoning, and neural-network-based reasoning [44]. Rule-based knowledge reasoning is based on the connection rules or statistical features of entities. The rules are obtained by the statistical learning of the whole knowledge graph, and the rules are improved and optimized on this basis, which can be applied in the field of DME disease prediction.

Materials and Methods
Initially, the dataset was split into a training set and a test set with an 8:2 ratio. Subsequently, the entities and relationships of knowledge graph were constructed through statistical analysis and data preprocessing.
On this basis, the correlation enhancement algorithm was applied to refine the weight attributes of the connecting edges in the knowledge graph. In addition, the generalized closeness method was utilized to enhance the disease prediction model, enabling the classification of influencing factors among different target populations. Ultimately, expert suggestions informed by clinical practice were provided based on the obtained results.
As a result, 3 DME disease prediction models were analyzed and verified based on link prediction evaluation indicators, including AUC, accuracy, and ranking score, to select a more accurate and efficient disease prediction model. The disease prediction model proposed in this study is shown in Figure 2, its activity sequence is shown in Figure 3, and the disease prediction process is as follows.
By presenting the interconnection between multi-source and multi-domain data through a semantic network, a knowledge graph enables cross-domain modeling and querying [37] and allows for the expansion of data on demand without disrupting the existing data structure. It has found extensive applications in semantic retrieval [38], intelligent question answering [39], decision support [40], and other fields.
Knowledge reasoning refers to the process of using existing knowledge, rules, and constraints to acquire new knowledge. This process involves reasoning about unknown information based on known information [41]. For example, tasks such as recommendation computing [42], causal analysis, and query question answering can be reduced to the problem of reasoning about the relationships between entities. Knowledge graph reasoning is a technique that can be used to perform tasks such as attribute completion, relation prediction, error checking, question expansion, and semantic understanding [43]. There are several methods for performing knowledge reasoning, including rule-based reasoning, distributed reasoning, and neural-network-based reasoning [44]. Rule-based knowledge reasoning is based on the connection rules or statistical features of entities. The rules are obtained by the statistical learning of the whole knowledge graph, and the rules are improved and optimized on this basis, which can be applied in the field of DME disease prediction.

Materials and Methods
Initially, the dataset was split into a training set and a test set with an 8:2 ratio. Subsequently, the entities and relationships of knowledge graph were constructed through statistical analysis and data preprocessing.
On this basis, the correlation enhancement algorithm was applied to refine the weight attributes of the connecting edges in the knowledge graph. In addition, the generalized closeness method was utilized to enhance the disease prediction model, enabling the classification of influencing factors among different target populations. Ultimately, expert suggestions informed by clinical practice were provided based on the obtained results.
As a result, 3 DME disease prediction models were analyzed and verified based on link prediction evaluation indicators, including AUC, accuracy, and ranking score, to select a more accurate and efficient disease prediction model. The disease prediction model proposed in this study is shown in Figure 2, its activity sequence is shown in Figure 3, and the disease prediction process is as follows.

Data preprocessing:
The original clinical data were collected, preprocessed, and subjected to statistical analysis. Initially, clinical diagnostic data and basic information of cases were classified based on clinical standards, and then data related to biochemical and non-biochemical indicators were processed separately to identify a limited number of influencing factor entities. The preprocessed data were subjected to statistical analysis, which revealed the high-risk factors associated with DME in the clinical data.
Model training: After the preprocessed dataset was split into training set and test set, the model was trained on the data of the training set, and the knowledge extraction task was completed by means of entity extraction, attribute extraction, and relationship extraction. A medical knowledge graph was constructed, the disease prediction model was constructed by using the correlation enhancement algorithm and the generalized paste progress method, and the model training was completed.
Model testing: The knowledge reasoning algorithm based on the knowledge graph was used to predict the disease on the test set and the ontology rules constructed by the knowledge graph. It includes disease prediction models based on statistical rule reasoning, based on correlation enhancement, and based on improved correlation enhancement.

Data preprocessing:
The original clinical data were collected, preprocessed, and subjected to statistical analysis. Initially, clinical diagnostic data and basic information of cases were classified based on clinical standards, and then data related to biochemical and non-biochemical indicators were processed separately to identify a limited number of influencing factor entities. The preprocessed data were subjected to statistical analysis, which revealed the high-risk factors associated with DME in the clinical data.
Model training: After the preprocessed dataset was split into training set and test set, the model was trained on the data of the training set, and the knowledge extraction task was completed by means of entity extraction, attribute extraction, and relationship extraction. A medical knowledge graph was constructed, the disease prediction model was constructed by using the correlation enhancement algorithm and the generalized paste progress method, and the model training was completed.
Model testing: The knowledge reasoning algorithm based on the knowledge graph was used to predict the disease on the test set and the ontology rules constructed by the knowledge graph. It includes disease prediction models based on statistical rule reasoning, based on correlation enhancement, and based on improved correlation enhancement.

Analysis and evaluation:
Three models were verified and analyzed according to the parameters of the link prediction evaluation indicators of the knowledge graph, including AUC, precision, and ranking score.

Dataset
This study utilized clinical data from 507 cases, which were collected by the hospital information system of Tianjin Medical University Eye Hospital. Specifically, the data comprised 261 cases of DME and 246 cases of non-DME.

Statistical Analysis and Preprocessing of Data
The influencing factors of the disease were based on 39 categories, as shown in Table 1. These factors were preprocessed, classified, and subjected to statistical analysis, and the quantitative numerical data were transformed into qualitative data that could be interpreted. The processing of biochemical data was divided according to the normal value range or interval stage, such as fasting plasma glucose, as shown in Table 2. The processing of Diagnostics 2023, 13, 1858 9 of 18 non-biochemical data was divided according to 0/1, such as family history of diabetes, as shown in Table 3. The statistical classification results of the data are shown in Table 4. Table 2. Data processing of fasting plasma glucose data.

0
No family history of diabetes 1 Family history of diabetes According to the results of data analysis after data processing, 39 categories of influencing factors were divided into 116 qualitative interpretable value ranges. Using the statistical rule reasoning method based on knowledge graph, the classification statistical results of influencing factors were used as the main basis for the weight setting of the connection edges in the process of constructing DME knowledge graph, which provides data support for the comprehensive evaluation of influencing factors in the process of disease prediction.

Construction of Knowledge Graph
After statistical classification and preprocessing of data, a medical knowledge graph was constructed, consisting of 2 types of nodes: "whether or not the disease" and "116 DME disease influencing factors", connected by 232 relationships. The weight attributes of the relationships are added later by computing the weight calculation results, which completes the task of knowledge completion [45]. Figure 4 shows the schematic diagram of DME disease prediction model based on improved correlation enhancement. Through data preprocessing and statistical analysis of the basic information of the case, biochemical indicators, and influence factors of nonbiochemical indicators, the DME data in the training set were input into the knowledge graph as historical case data. In the process of model validation, the similarity between the input personalized diagnosis data results and the historical cases was compared. The weight was used to represent the impact of each influencing factor on the disease, and the correlation enhancement and generalized progress method were used to further obtain the calculation results of DME disease probability after feature fusion. According to the results of data analysis after data processing, 39 categories of influencing factors were divided into 116 qualitative interpretable value ranges. Using the statistical rule reasoning method based on knowledge graph, the classification statistical results of influencing factors were used as the main basis for the weight setting of the connection edges in the process of constructing DME knowledge graph, which provides data support for the comprehensive evaluation of influencing factors in the process of disease prediction.

Construction of Knowledge Graph
After statistical classification and preprocessing of data, a medical knowledge graph was constructed, consisting of 2 types of nodes: "whether or not the disease" and "116 DME disease influencing factors", connected by 232 relationships. The weight attributes of the relationships are added later by computing the weight calculation results, which completes the task of knowledge completion [45]. Figure 4 shows the schematic diagram of DME disease prediction model based on improved correlation enhancement. Through data preprocessing and statistical analysis of the basic information of the case, biochemical indicators, and influence factors of nonbiochemical indicators, the DME data in the training set were input into the knowledge graph as historical case data. In the process of model validation, the similarity between the input personalized diagnosis data results and the historical cases was compared. The weight was used to represent the impact of each influencing factor on the disease, and the correlation enhancement and generalized progress method were used to further obtain the calculation results of DME disease probability after feature fusion.

Disease Prediction Models
To achieve a comprehensive assessment of the impact of multiple influencing factors on disease risk, the DME knowledge graph was constructed using weight as the connection edge attribute between the disease node and the influencing factor node. The statistical analysis results of the training set data were utilized as the standard for weight setting during the construction of the knowledge graph [46]. The clinical dataset was randomly divided into training set and test set at a ratio of 8:2, including 406 cases of training set (203 cases of DME and 203 cases of non-DME), and 101 cases of test set data contained 58 cases of DME clinical data. The statistical characteristics of the training set data were used as attributes for the knowledge graph connection edge. Three weight setting methods are proposed as follows.
Weight 1 is the frequency of cases in which the i disease influencing factor is connected to the j disease node, which is the frequency in the sum of the case frequencies of all node pair connection relationships, where i = 1 . . . 39 corresponding to 39 disease influencing factors, j = 0 . . . 1, 0 denotes non-DME, 1 denotes DME, and n ij denotes frequency under corresponding conditions.
Weight 2 is the case frequency connected by the i disease influencing factor and the j disease node, the frequency ×100 in the sum of case frequencies connected by the node with or without disease, and the influencing factor under this influencing factor, where i = 1 . . . 39 corresponding to 39 disease influencing factors, j = 0/1. 0 represents non-DME, 1 represents DME, and n ij represents the frequency in the corresponding condition.
Weight 3 is combined with the correlation enhancement algorithm based on the results of weight 2 setting, that is, weight 2× correlation enhancement coefficient, where i = 1 . . . 39 corresponding to 39 disease influencing factors, j = 0/1. 0 represents non-DME, 1 represents DME, n ij represents the frequency under the corresponding condition, z = 1 . . . 5, and α z denotes the correlation enhancement coefficient (Table 5). The correlation enhancement coefficient indicates that for weight 2 calculation results, if the frequency of an influencing factor node in DME and non-DME datasets is not significantly different (i.e., the frequency range is closer to 50%), it implies that the factor has less effect on DME disease, resulting in a smaller correlation enhancement coefficient. Conversely, a larger correlation enhancement coefficient indicates a stronger influence of the factor. These coefficients are ranked based on the frequency range priority, with 1 being the highest and 5 being the lowest.
The statistical analysis features of the training set were applied to the knowledge graph to complete the disease prediction task, and the calculation result of weight 1 was used as the attribute value of the knowledge graph connection edge to construct a DME disease prediction model based on statistical rules. On the basis of setting the attribute values of the connected edges of the knowledge graph with weight 2, the correlation enhancement algorithm was used to set weight 3 to further improve the correlation between the influencing factors and the disease, and a DME disease prediction model based on correlation enhancement was constructed. In addition, the method of generalized closeness [47] was used to divide the disease prediction results into interpretable intervals, the DME disease prediction formula was obtained, as shown in Equation 4, and the DME disease prediction model based on improved correlation enhancement was constructed.
T M +i , M 0 values are used for similarity discrimination to calculate DME disease prediction results, where M +i is the dataset of the ith disease influencing factors in the knowledge graph (i = 1, 2 . . . ). M 0 is the current input dataset. µ values represents the data connection relationship in the knowledge graph. If there are data connected with influencing factors at the same time, that is, µ s µ + i = µ s µ 0 i , the probability of DME disease in the target population is 1. The weight W i of the edge is the result of setting weight 3, and the maximum value of the weight W max = 500. If the edge weights W i = W max , the two nodes are fully weighted connected, so Take the predicted DME probability P ∈ [Predict min , Predict max ] calculated by Equation (4), and scale the range [Predict min , Predict max ] to the clinically interpretable range [P min , P max ] = [0, 100]. Thus, the probability of DME predicted by the proposed disease prediction model for the target population was obtained.
The range of [P min , P max ] was used as the range of DME disease prediction results. After discussion with clinical experts, the probability of DME was divided into four categories, and relevant suggestions were given, as shown in Table 6. Table 6. Probability of DME and recommendations.

<50
Probability of DME is low  Recommend follow-up 70-85 Recommend further examination >85 Recommend specialist examination

Results
The partial results of the weight setting based on correlation enhancement are used as attributes between the disease nodes and influencing factor nodes in the knowledge graph (Table A1), and the results are filled into the knowledge graph.
After data processing and data analysis, the top six influencing factors with the highest weights in the DME data were Scr, DR, urine protein, 24 h PRO, 24 h MALB, and FIB. The statistical results are shown in Figure 5. A DME disease prediction analysis was performed on the test set of the DME dataset using the DME disease prediction model based on improved correlation enhancement, and the results are shown in Table 7. A DME disease prediction analysis was performed on the test set of the DME dataset using the DME disease prediction model based on improved correlation enhancement, and the results are shown in Table 7. According to the validation results of the training set, taking a disease probability of 70% as the standard for predicting DME disease, the DME case data in the test set were analyzed and verified, and the DME disease prediction results shown in Table 8 were obtained. According to the DME disease prediction model based on improved correlation enhancement, the disease probability of the 58 cases of DME data in the test set was predicted. The results showed that the disease probability was mostly 70-85%, which was consistent with the expected results.  Table 9 presents the evaluation results of the three disease prediction models, namely, AUC [48], precision [49], and ranking score [50], based on the link prediction evaluation parameters of the knowledge graph reasoning. These results were obtained through analysis and verification. The AUC measures the accuracy of an algorithm as a whole. Given a prediction algorithm, for each unknown node will be given a value of the likelihood of existence. After training, the algorithm obtains the similarity value of each pair of nodes in the knowledge graph network, and a degree greater than 0.5 indicates that the degree of the algorithm is better than the random selection algorithm.
The precision is the proportion of accurate predictions among the top L predicted nodes. If the probability value of the connection is arranged from large to small, and m nodes in the top L are in the test set, then the accuracy is defined as Equation (7).
The ranking score considers the position of the nodes in the test set in the final ranking, and the smaller the ranking score is, the better the prediction effect of the algorithm is. H is the collections of unknown nodes, r e ∈ E p denotes test node e in sorting ranking, and the ranking score of the test nodes is RS e = r e |H| . By traversing all the nodes in the test set, the system ranking score is obtained.
The link prediction evaluation index parameters of the knowledge graph reasoning were used to analyze and verify the three disease prediction models. Based on the evaluation results of the three models, the algorithm using correlation enhancement demonstrated superior performance than the statistical rule reasoning, with a 38.65% higher AUC and a 38.15% higher precision. The algorithm based on improved correlation enhancement showed even better performance, with a 27.95% higher AUC and a 28.45% higher precision compared to the previous method. The proposed DME disease prediction model based on improved correlation enhancement can more accurately and effectively predict the disease by comprehensively evaluating the diagnostic data of the cases. The model can facilitate the clinical screening of the high prevalence of a DME disease, provide references for early disease intervention, and achieve the expected effect.
A clinical decision support system was developed using the improved DME disease prediction model, which enables personalized disease risk prediction by allowing users to input any number of diagnostic data results. Figure 6 illustrates the flowchart of this system. The model can facilitate the clinical screening of the high prevalence of a DME disease, provide references for early disease intervention, and achieve the expected effect. A clinical decision support system was developed using the improved DME disease prediction model, which enables personalized disease risk prediction by allowing users to input any number of diagnostic data results. Figure 6 illustrates the flowchart of this system. Figure 6. Flowchart of DME disease prediction system. For example: The influencing factors were "female, 60-69 years old, BMI is 24-28 kg/m , Waist-to-hip ratio of central obesity, duration of diabetes is 10-19 years, family history of diabetes, hypertension, hyperlipidemia, diabetic nephropathy, anemia, no coronary heart disease, no cerebral infarction, diabetic peripheral neuropathy, no pre-cervical Figure 6. Flowchart of DME disease prediction system. For example: The influencing factors were "female, 60-69 years old, BMI is 24-28 kg/m 2 , Waist-to-hip ratio of central obesity, duration of diabetes is 10-19 years, family history of diabetes, hypertension, hyperlipidemia, diabetic nephropathy, anemia, no coronary heart disease, no cerebral infarction, diabetic peripheral neuropathy, no pre-cervical dark spots, diabetes. The stage of retinopathy is PDR, no history of cataract surgery, no history of retinal laser photocoagulation, smoking history, no history of alcohol consumption, use of insulin, glycosylated hemoglobin > 9%, mean platelet volume 7-11 fL, erythrocyte sedimentation rate >20 mm/h, triglyceride 0.45-1.7 mmol/L, total cholesterol 2.9-5.18 mmol/L, high density lipoprotein cholesterol 1.04-1.55 mmol/L, low density lipoprotein cholesterol 3.37-4.12 mmol/L, D-dimer 0-0.5 mg/L, fibrinogen 2-4 g/L, urea 2.5-7.5 mmol/L, creatinine <88.4 µmol/L, uric acid 89-357µmol/L, estimated glomerular filtration rate 80-120 mL/min, total protein < 60 The probability of DME in the users with g/L, albumin < 40 g/L, urinary protein (+), 24-h urinary protein < 0.15 g, and 24-h urinary microalbumin < 15 mg". For these people, the probability of DME is 86.65%, and a special examination is recommended.

Discussion
A knowledge graph allows for disease prediction even in the case of missing clinical data values. This method is more universally applicable and significant compared to traditional data mining algorithms, as it can complete disease prediction tasks with small amounts of data using simple statistical rule reasoning. The DME disease prediction model based on improved correlation enhancement proposed in this study defines a simple and effective rule-based reasoning algorithm, which realizes a more accurate and effective reasoning application. It not only provides a reference for the clinical screening of a DME high-risk population but also provides research ideas and methods for the early prediction and intervention of other clinical major special diseases. This model can comprehensively evaluate disease influencing factors using any number of objective diagnostic data and tolerate incomplete data. Compared to traditional data mining classification algorithms, it facilitates model optimization, update and iteration, and can assist in clinical screening for high-risk populations for diabetic macular edema, providing expert advice based on different disease probabilities and, thus, contributing to clinical disease prediction and early intervention.
The proposed DME disease prediction model uses a knowledge graph to statistically analyze clinical data, classify the 39 types of DME disease influencing factors, and construct a medical knowledge graph using three weight-setting methods for the knowledge graph connection edge. Based on this, the disease prediction formula was derived using the generalized closeness degree method and improved using correlation enhancement. Three DME disease prediction models were tested, and the precision of the disease prediction model based on improved correlation enhancement was increased by 28.45% compared to the algorithm before improvement, indicating the model's better performance in completing the disease prediction task and providing technical methods for the intelligent prediction of clinical diseases.
In this study, the proposed DME disease prediction model comprehensively evaluates disease probability based on basic clinical data, disease history, medical test results, and other factors, resulting in a clinical decision support system for disease prediction visualization [51]. This system is useful in promoting and applying the disease prediction model in clinical practice. The study plans to collect more clinical objective data in the future to update the model and further improve its accuracy of disease prediction. Additionally, this study's findings can serve as a reference for the prediction of other major clinical diseases such as hypertension, diabetes, coronary heart disease, and others, providing support for early disease screening and intervention. Furthermore, the model can offer decision-making technical support for TCM syndrome differentiation, disease diagnosis, disease treatment, and physical identification.