DOI: 10.1145/3611643.3613897
research-article
Open Access

EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System

Published: 30 November 2023

ABSTRACT

The Cancer Registry of Norway (CRN) collects information on cancer patients by receiving cancer messages from medical entities in Norway (e.g., medical labs and hospitals). These messages are validated by an automated cancer registry system, GURI. Its correct operation is crucial, since it lays the foundation for cancer research and provides critical cancer-related statistics to its stakeholders. Constructing a cyber-cyber digital twin (CCDT) for GURI can facilitate various experiments and advanced analyses of GURI's operational state without requiring intensive interactions with the real system. However, GURI constantly evolves due to novel medical diagnostics and treatments, technological advances, and other factors. Accordingly, the CCDT should evolve as well to stay synchronized with GURI. A key challenge in achieving such synchronization is that evolving the CCDT requires abundant data labelled by the new GURI. To tackle this challenge, we propose EvoCLINICAL, which treats the CCDT developed for the previous version of GURI as a pretrained model and fine-tunes it with a dataset labelled by querying the new GURI version. EvoCLINICAL employs a genetic algorithm to select an optimal subset of cancer messages from a candidate dataset and queries GURI with it. We evaluate EvoCLINICAL on three evolution processes. The precision, recall, and F1 score are all greater than 91%, demonstrating the effectiveness of EvoCLINICAL. Furthermore, we replace the active learning part of EvoCLINICAL with random selection to study the contribution of active learning to the overall performance of EvoCLINICAL. Results show that employing active learning in EvoCLINICAL consistently increases its performance.
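
As a rough illustration of the selection step described in the abstract, the following sketch uses a small genetic algorithm to pick a fixed-size subset of candidate messages to send for labelling. It is not the authors' implementation. The fitness function (summed binary entropy of the previous CCDT's predicted probabilities), the operators, and all names such as `select_subset` and `probs` are assumptions made for this example; the abstract does not specify them.

```python
import math
import random


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli prediction; highest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def fitness(subset, probs):
    """Assumed fitness: total predictive uncertainty of the chosen messages."""
    return sum(binary_entropy(probs[i]) for i in subset)


def mutate(subset, n_candidates, rng):
    """Swap one selected index for one that is currently not selected."""
    chosen = set(subset)
    outside = [i for i in range(n_candidates) if i not in chosen]
    if not outside:
        return tuple(sorted(chosen))
    chosen.remove(rng.choice(sorted(chosen)))
    chosen.add(rng.choice(outside))
    return tuple(sorted(chosen))


def crossover(a, b, k, rng):
    """Draw k distinct indices from the union of two parent subsets."""
    pool = sorted(set(a) | set(b))
    return tuple(sorted(rng.sample(pool, k)))


def select_subset(probs, k, pop_size=20, generations=50, seed=0):
    """Return k candidate indices chosen by a simple elitist GA.

    probs[i] is a hypothetical validity probability that the previous
    twin assigns to candidate message i.
    """
    rng = random.Random(seed)
    n = len(probs)
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, probs), reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, k, rng), n, rng))
        population = parents + children
    return max(population, key=lambda s: fitness(s, probs))
```

In a pipeline of this shape, the indices returned by `select_subset(probs, k)` would identify the messages sent to the new GURI version for labelling, and the labelled subset would then be used to fine-tune the previous twin. Replacing the call with `random.sample(range(len(probs)), k)` gives the random-selection baseline mentioned in the abstract.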

Published in

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023
2215 pages
ISBN: 9798400703270
DOI: 10.1145/3611643

Copyright © 2023 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions, 21%