EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System

Authors:
Chengjie Lu
Simula Research Laboratory, Oslo, Norway / University of Oslo, Oslo, Norway

Qinghua Xu
Simula Research Laboratory, Oslo, Norway / University of Oslo, Oslo, Norway

Tao Yue
Simula Research Laboratory, Oslo, Norway

Shaukat Ali
Simula Research Laboratory, Oslo, Norway / Oslo Metropolitan University, Oslo, Norway

Thomas Schwitalla
Cancer Registry of Norway, Oslo, Norway

Jan Nygård
Cancer Registry of Norway, Oslo, Norway / UiT The Arctic University of Norway, Tromsø, Norway

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, November 2023, Pages 1973–1984
https://doi.org/10.1145/3611643.3613897

Published: 30 November 2023

ABSTRACT

The Cancer Registry of Norway (CRN) collects information on cancer patients by receiving cancer messages from different medical entities (e.g., medical labs, hospitals) in Norway. Such messages are validated by an automated cancer registry system: GURI. Its correct operation is crucial since it lays the foundation for cancer research and provides critical cancer-related statistics to its stakeholders. Constructing a cyber-cyber digital twin (CCDT) for GURI can facilitate various experiments and advanced analyses of the operational state of GURI without requiring intensive interactions with the real system. However, GURI constantly evolves due to novel medical diagnostics and treatment, technological advances, etc. Accordingly, the CCDT must evolve as well to stay synchronized with GURI. A key challenge in achieving such synchronization is that evolving the CCDT requires abundant data labelled by the new GURI. To tackle this challenge, we propose EvoCLINICAL, which treats the CCDT developed for the previous version of GURI as the pretrained model and fine-tunes it with a dataset labelled by querying the new GURI version. EvoCLINICAL employs a genetic algorithm to select an optimal subset of cancer messages from a candidate dataset and queries GURI with it. We evaluate EvoCLINICAL on three evolution processes. The precision, recall, and F1 score are all greater than 91%, demonstrating the effectiveness of EvoCLINICAL. Furthermore, we replace the active learning part of EvoCLINICAL with random selection to study the contribution of active learning to the overall performance of EvoCLINICAL. Results show that employing active learning in EvoCLINICAL consistently increases its performance.
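The selection step described in the abstract can be illustrated with a minimal sketch. It assumes each candidate cancer message already has a scalar informativeness score (for example, how uncertain the previous CCDT is about it) and that a good subset is one with a high summed score. The abstract does not state the paper's actual fitness function, so the name `select_subset`, its parameters, and the fitness used here are hypothetical, and the genetic operators are generic textbook choices rather than the authors' design.

```python
import random
from typing import List, Sequence, Tuple


def select_subset(
    scores: Sequence[float],
    budget: int,
    generations: int = 50,
    pop_size: int = 30,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> List[int]:
    """Pick `budget` candidate indices with a high summed score using a simple GA.

    `scores` is an assumed per-message informativeness value; it is not the
    fitness function from the paper.
    """
    rng = random.Random(seed)
    n = len(scores)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of candidates")

    def fitness(subset: Tuple[int, ...]) -> float:
        return sum(scores[i] for i in subset)

    def random_subset() -> Tuple[int, ...]:
        return tuple(sorted(rng.sample(range(n), budget)))

    def crossover(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
        # The child draws its indices from the union of both parents.
        pool = list(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, budget)))

    def mutate(subset: Tuple[int, ...]) -> Tuple[int, ...]:
        # Swap a selected index with an unselected one, keeping the size fixed.
        chosen = set(subset)
        outside = [i for i in range(n) if i not in chosen]
        for i in subset:
            if outside and rng.random() < mutation_rate:
                j = rng.randrange(len(outside))
                chosen.remove(i)
                chosen.add(outside[j])
                outside[j] = i
        return tuple(sorted(chosen))

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged and refill the rest with offspring.
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return list(max(population, key=fitness))


if __name__ == "__main__":
    # Hypothetical uncertainty scores for six candidate messages.
    uncertainty = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7]
    print(select_subset(uncertainty, budget=3))
```

In a workflow like the one the abstract outlines, the returned indices would identify the messages sent to the new GURI version for labelling, and the labelled subset would then be used to fine-tune the previous CCDT. Replacing the call with `random.sample(range(n), budget)` gives the random-selection baseline the abstract compares against.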

Published in
ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023
2215 pages
ISBN: 9798400703270
DOI: 10.1145/3611643
General Chair: Satish Chandra (Google, USA)
Program Chairs: Kelly Blincoe (University of Auckland, New Zealand), Paolo Tonella (USI Lugano, Switzerland)
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
active learning
cyber-cyber digital twin
digital twin
neural network
transfer learning
validation system
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 112 of 543 submissions, 21%