Review
Predictive data mining in clinical medicine: Current issues and guidelines
Introduction
Over the last few years, the term ‘data mining’ has been used increasingly in the medical literature. In general, the term has not been anchored to a precise definition but rather to a common understanding of its meaning: the use of (novel) methods and tools to analyze large amounts of data. Data mining has been applied successfully in many fields of human endeavor, including marketing, banking, customer relationship management, engineering and various areas of science. However, despite high hopes, its application to the analysis of medical data has until recently been relatively limited. This is particularly true of practical applications in clinical medicine, which may benefit from data mining approaches that can perform predictive modeling, exploit the knowledge available in the clinical domain and explain proposed decisions when the models are used to support clinical decisions. The goal of predictive data mining in clinical medicine is to derive models that use patient-specific information to predict an outcome of interest and thereby support clinical decision-making. Predictive data mining methods may be applied to the construction of decision models for tasks such as prognosis, diagnosis and treatment planning; once evaluated and verified, such models may be embedded in clinical information systems.
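As a minimal sketch of the kind of model described above (patient-specific attributes in, probability of an outcome out), the following fits a logistic regression by batch gradient descent in plain Python. Logistic regression is only one of many possible predictive methods and is chosen here for brevity, not because the review prescribes it. The function names, learning rate and number of epochs are illustrative assumptions, and the attributes are assumed to be numeric and already pre-processed.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit weights [bias, w1, ..., wp] by batch gradient descent on the mean log-loss.

    X holds one row of numeric attributes per patient; y holds 0/1 outcomes.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            # Prediction error for this patient: predicted probability minus outcome.
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Step against the averaged gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of the outcome for a single patient."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
```

For example, fitting on a made-up four-case set with one numeric attribute, `fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])`, yields weights whose predicted probability is below 0.5 for the first case and above 0.5 for the last. In practice a tested library implementation with regularization would normally be preferred.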
In this paper, we give a methodological review of data mining, focusing on its data analysis process and highlighting some of the most relevant issues related to its application in clinical medicine. We limit the scope of the paper to predictive data mining, whose methods are mature and often readily available and may be particularly suitable for the class of problems arising from clinical data analysis and decision support.
Background
Data mining is the process of selecting, exploring and modeling large amounts of data in order to discover unknown patterns or relationships that provide a clear and useful result to the data analyst [1]. Coined in the mid-1990s, the term data mining has today become a synonym for ‘Knowledge Discovery in Databases’, which, as proposed by Fayyad et al. [2], emphasizes the data analysis process rather than the use of specific analysis methods. Data mining problems are often solved by using a
Contribution of data mining to predictive modeling in clinical medicine
Predictive models in clinical medicine are ‘… tools for helping decision making that combine two or more items of patient data to predict clinical outcomes’ [68]. Such models may be used by clinicians in several clinical contexts and may allow a prompt reaction to unfavorable situations [69]. Data mining may effectively contribute to the development of clinically useful predictive models thanks to at least three interrelated aspects: (a) a comprehensive and purposive approach to data analysis
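To illustrate how a model can combine several items of patient data into a single outcome prediction, here is a sketch of a naive Bayesian classifier for categorical attributes with Laplace smoothing. Naive Bayes is one of the methods appearing in the reference list, not necessarily the approach the authors recommend for any given problem; the class name, the smoothing scheme and the restriction to categorical attribute values (for instance hypothetical items such as smoking status) are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


class NaiveBayes:
    """Naive Bayesian classifier for categorical attributes with Laplace smoothing."""

    def fit(self, X: Sequence[Sequence[Hashable]], y: Sequence[Hashable]) -> "NaiveBayes":
        self.classes = sorted(set(y), key=str)
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[(attribute index, class)][value] = number of training cases
        self.counts = defaultdict(Counter)
        # values[j] = set of values seen for attribute j (used for smoothing)
        self.values = [set() for _ in range(len(X[0]))]
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[(j, yi)][v] += 1
                self.values[j].add(v)
        return self

    def predict_proba(self, x: Sequence[Hashable]) -> Dict[Hashable, float]:
        """Posterior probability of each class given one patient's attribute values."""
        log_post = {}
        for c in self.classes:
            # Smoothed prior P(c), then add log P(x_j | c) for each attribute.
            lp = math.log((self.class_counts[c] + 1) / (self.n + len(self.classes)))
            for j, v in enumerate(x):
                k = len(self.values[j])
                lp += math.log((self.counts[(j, c)][v] + 1) / (self.class_counts[c] + k))
            log_post[c] = lp
        # Normalize in a numerically safe way (subtract the max log-score first).
        m = max(log_post.values())
        unnorm = {c: math.exp(lp - m) for c, lp in log_post.items()}
        z = sum(unnorm.values())
        return {c: u / z for c, u in unnorm.items()}
```

On a made-up four-case set in which two cases with values `["yes", "high"]` have outcome 1 and two with `["no", "low"]` have outcome 0, `predict_proba(["yes", "high"])` gives class 1 a probability of 0.9; the smoothing keeps it away from a certain prediction based on so few cases.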
Predictive data mining process: tasks and guidelines
Data mining is most often the application of a number of different techniques from various disciplines with the goal of discovering interesting patterns in data. Given the large variety of available techniques and the many disciplines involved, it is no surprise that data mining is often viewed as a craft that is hard to learn and even harder to master.
As we mentioned, several process models and standards have been proposed to introduce engineering principles, systematize the process and define typical
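Process models such as CRISP-DM (listed in the references) include an evaluation phase before a model is deployed. One common way to carry out that step for a predictive model is to estimate its discrimination with stratified k-fold cross-validation and the area under the ROC curve (AUC). The sketch below shows this under stated assumptions: the function names and the `fit`/`score` callback interface are hypothetical, outcomes are coded 0/1, and the review itself does not prescribe this particular procedure.

```python
import random
from typing import Callable, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split case indices into k folds, keeping the outcome ratio similar in each fold."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        # Deal the shuffled cases of this class round-robin over the folds.
        for rank, i in enumerate(idx):
            folds[rank % k].append(i)
    return folds


def cross_validated_auc(X: Sequence, y: Sequence[int], fit: Callable, score: Callable,
                        k: int = 5, seed: int = 0) -> float:
    """Mean test-fold AUC; the model is refit on each training split.

    Each class should have at least k cases so every test fold contains both outcomes.
    """
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        test_set = set(test_idx)
        train = [i for i in range(len(y)) if i not in test_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([score(model, X[i]) for i in test_idx], [y[i] for i in test_idx]))
    return sum(aucs) / len(aucs)
```

Keeping the test folds strictly separate from the data used to fit each model is the point of the exercise: performance measured on the training cases themselves would be optimistically biased.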
Discussion
Compared to data mining in business, marketing and the economy, medical data mining applications have several distinguishing features [104]. The most important is that medicine is a safety-critical context [105] in which decision-making activities should always be supported by explanations. This means that the value of each datum may be higher than in other contexts: experiments can be costly due to the involvement of personnel and the use of expensive instrumentation and due to the
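Because clinical decisions should be accompanied by explanations, models whose predictions decompose into per-attribute effects are attractive. As a sketch, for a logistic regression model the log-odds of the outcome is the intercept plus a sum of terms w_j x_j, so each term can be reported as that attribute's contribution to the prediction. The function below assumes such a model has already been fitted; its name is hypothetical, and the attribute names, intercept and weights in the usage block are made-up values for illustration only, not taken from any study.

```python
import math
from typing import Dict, List, Tuple


def explain_logistic(bias: float, weights: Dict[str, float],
                     patient: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the predicted probability and each attribute's additive contribution
    (weight times value) to the log-odds, largest absolute contribution first."""
    contributions = [(name, w * patient[name]) for name, w in weights.items()]
    log_odds = bias + sum(c for _, c in contributions)
    # Numerically stable logistic transform of the log-odds.
    if log_odds >= 0:
        prob = 1.0 / (1.0 + math.exp(-log_odds))
    else:
        e = math.exp(log_odds)
        prob = e / (1.0 + e)
    contributions.sort(key=lambda item: abs(item[1]), reverse=True)
    return prob, contributions


if __name__ == "__main__":
    # Hypothetical attributes with made-up intercept and weights; not from any study.
    p, parts = explain_logistic(
        bias=-1.0,
        weights={"smoker": 0.7, "age_decades": 0.2},
        patient={"smoker": 1.0, "age_decades": 6.5},
    )
    print(f"predicted risk: {p:.2f}")
    for name, c in parts:
        print(f"{name:>12}: {c:+.2f} log-odds")
```

This additive view is essentially what nomograms (see the references) visualize; it explains the model's arithmetic, not causal effects of the attributes.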
Conclusion
At present, many mature predictive data mining methods have been successfully applied to a variety of practical problems in clinical medicine. As suggested by Hand [40], data mining is particularly successful where data are abundant. For clinical medicine, this includes the analysis of clinical data warehouses, epidemiological studies and emerging studies in genomics and proteomics. Crucial to such data are those data mining approaches that allow the use of background knowledge, discover
Acknowledgements
The authors would like to acknowledge the help given by the International Medical Informatics Association and its Working Group on Intelligent Data Analysis and Data Mining, which they are chairing. The work was supported by a Slovenian-Italian Bilateral Collaboration Project. RB is also supported by the Italian Ministry of University and Scientific Research through the PRIN Project ‘Dynamic modeling of gene and protein expression profiles: clustering techniques and regulatory networks’, and BZ
References (108)
Machine learning for medical diagnosis: history, state of the art and perspective. Artif. Intell. Med. (2001).
Medical expert systems based on causal probabilistic networks. Int. J. Biomed. Comput. (1991).
NasoNet, modeling the spread of nasopharyngeal cancer with networks of probabilistic events in discrete time. Artif. Intell. Med. (2002).
The role of Bayesian Networks in the diagnosis of pulmonary embolism. J. Thromb. Haemost. (2003).
Application of a data-mining method based on Bayesian networks to lesion-deficit analysis. Neuroimage (2003).
Learning Gaussian networks.
Data integration and genomic medicine. J. Biomed. Inform. (2007).
Use of proteomic patterns in serum to identify ovarian cancer. Lancet (2002).
Defining aggressive prostate cancer using a 12-gene model. Neoplasia (2006).
A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. J. Theor. Biol. (2006).
Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system. Comput. Biomed. Res.
Predicting survival in malignant skin melanoma using Bayesian networks automatically induced by genetic algorithms. An empirical comparison between different approaches. Artif. Intell. Med.
Knowledge-based data analysis and interpretation. Artif. Intell. Med.
Two-stage machine learning model for guideline development. Artif. Intell. Med.
Wrappers for feature subset selection. Artif. Intell.
Machine learning for survival analysis: a case study on recurrence of prostate cancer. Artif. Intell. Med.
Applied Data Mining: Statistical Methods for Business and Industry.
Data mining and knowledge discovery in databases. Commun. ACM.
Predicting patient's long-term clinical status after hip arthroplasty using hierarchical decision modelling and data mining. Meth. Inf. Med.
Orange: from experimental machine learning to interactive data mining.
Inductive and Bayesian learning in medical diagnosis. Appl. Artif. Intell.
A practical device for the application of a diagnostic or prognostic function. Meth. Inf. Med.
Nomograms for visualization of naive Bayesian classifier.
Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis.
A preoperative nomogram for disease recurrence following radical prostatectomy for prostate cancer. J. Natl. Cancer Inst.
International validation of a preoperative nomogram for prostate cancer recurrence after radical prostatectomy. J. Clin. Oncol.
C4.5: Programs for Machine Learning.
Classification and Regression Trees.
The CN2 Induction Algorithm. Mach. Learn.
Learning patterns in noisy data: the AQ approach.
Intelligent data analysis for medical diagnosis: using machine learning and temporal abstraction. AI Commun.
Applied Logistic Regression.
The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology. Stat. Med.
An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods.
Statistical Learning Theory.
Support-vector networks. Mach. Learn.
Clinical applications of Bayesian belief networks in pathology. Pathologica.
Sequential updating of conditional probabilities on directed graphical structures. Networks.
A guide to the literature on learning probabilistic networks from data. IEEE Trans. Knowl. Data Eng.
A Bayesian method for the induction of probabilistic networks from data. Mach. Learn.
Robust learning with missing data. Mach. Learn.
Genetic dissection and prognostic modeling of overt stroke in sickle cell anemia. Nat. Genet.
Learning Bayesian networks by genetic algorithms: a case study in the prediction of survival in malignant skin melanoma.
Using prior knowledge to improve genetic network reconstruction from microarray data. In Silico Biol.
CRISP-DM 1.0: Step-by-Step Data Mining Guide. The CRISP-DM Consortium.
Anatomic pathology data mining.
Supporting discovery in medicine by association rule mining in Medline and UMLS. Medinfo.
Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc. AMIA Symp.
Data mining: statistics and more? Am. Statist.