Abstract
Visualized knowledge representation can help the public learn about lung cancer prevention, diagnosis, treatment, and life after treatment more effectively. This study therefore collected lung cancer articles published between 2016 and 2021 from the well-known Web of Science database and analyzed this literature. First, we preprocessed the collected text with natural language processing. We then applied latent Dirichlet allocation for topic modeling and selected the optimal number of topics using two coherence metrics, which determined the class assigned to every article. Next, we proposed a PMI_2 weighting to build an initial weighted knowledge graph, and we trained four graph neural network algorithms on this graph. In addition, we proposed a PMI_2 + link method to improve classification performance, in which the additional links are obtained from graph auto-encoder and graph convolutional network training. The edge weights obtained at the best classification performance are taken as representative. For visualized knowledge representation, we used the Neo4j tool to display the nodes and edge weights of the final literature knowledge. The results show that building the weighted graph with the proposed PMI_2 + link method yields better classification performance. Further, the proposed PMI_2 + link method effectively reduces the number of edges in the knowledge graph and avoids running out of GPU memory.
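To make the idea of a PMI-weighted graph concrete, the following minimal sketch computes ordinary pointwise mutual information between words that co-occur in sliding windows and keeps only pairs with positive PMI as weighted edges. It is an illustration under stated assumptions, not the PMI_2 weighting of this study, whose definition is not given in the abstract. The function name `pmi_edges`, the window size, and the toy documents are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(docs, window=3):
    """Return {(word_a, word_b): pmi} for word pairs with positive PMI.

    Assumption: standard PMI over sliding windows, not the paper's PMI_2.
    """
    window_count = 0
    word_windows = Counter()   # number of windows containing each word
    pair_windows = Counter()   # number of windows containing each word pair

    for tokens in docs:
        # A document shorter than the window contributes a single window.
        n_windows = max(1, len(tokens) - window + 1)
        for start in range(n_windows):
            seen = sorted(set(tokens[start:start + window]))
            window_count += 1
            word_windows.update(seen)
            pair_windows.update(combinations(seen, 2))

    edges = {}
    for (a, b), n_ab in pair_windows.items():
        # PMI = log( p(a, b) / (p(a) * p(b)) ), with window-based estimates.
        pmi = math.log(n_ab * window_count / (word_windows[a] * word_windows[b]))
        if pmi > 0:  # drop pairs that co-occur no more often than chance
            edges[(a, b)] = pmi
    return edges


# Hypothetical toy corpus of tokenized documents.
toy_docs = [["lung", "cancer"], ["lung", "cancer"], ["smoking", "risk"]]
toy_edges = pmi_edges(toy_docs, window=2)
for pair, weight in sorted(toy_edges.items()):
    print(pair, round(weight, 3))
```

Discarding non-positive PMI values is one simple way to keep such a graph sparse, which is relevant to the edge-count and GPU-memory concerns raised above.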
Data availability
The data are collected from Web of Science (https://www.webofscience.com/wos/woscc/basic-search).
Ethics declarations
Competing interest
The authors declare that they have no relevant financial or non-financial interests or competing interests to disclose.
About this article
Cite this article
Cheng, CH., Ji, ZT. A weighted-link graph neural network for lung cancer knowledge classification. Appl Intell 53, 17610–17628 (2023). https://doi.org/10.1007/s10489-022-04437-9
DOI: https://doi.org/10.1007/s10489-022-04437-9