Abstract
Distinguishing migraine from stroke is challenging because the two conditions share many signs and symptoms. The cost of hospitalization and the time neurologists and stroke nurses spend visiting, diagnosing, and assigning appropriate care to patients are important considerations; new ways to distinguish stroke, migraine, and other types of mimics can therefore save time and cost and improve decision-making. In this study, we used text and data mining methods to extract the most important predictors from clinical reports in order to build a migraine detection model that distinguishes migraine patients from stroke or other mimic (non-stroke) cases. The available data were a heterogeneous mix of free-text fields, such as triage main complaints and specialist final impressions, and numeric patient data, such as age and blood pressure. After carefully combining these sources, we obtained a highly imbalanced dataset in which migraine cases made up only about 6% of the records. Our main challenge was tackling this imbalance: classifiers built on the dataset in its original form were biased towards the majority class and against the minority (migraine) class. We used a sampling method to address the problem. First, the different data sources were preprocessed and balanced datasets were generated; second, attribute selection algorithms were used to reduce the dimensionality of the data; third, a novel combination of data mining algorithms was employed to distinguish migraine from other cases. We achieved a sensitivity of about 80% and a specificity of about 75%, compared with a sensitivity of 15.7% and a specificity of 97% for classifiers built on the original imbalanced data.
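The imbalance problem described in the abstract can be illustrated with a minimal sketch. The abstract does not say which sampling method was used, so the random undersampling shown here is an assumption, and the function names, the `label` field, and the toy records are all hypothetical. The sketch balances a dataset in which about 6% of records are migraine, and shows why a classifier that always predicts the majority class reaches perfect specificity while detecting no migraine cases.

```python
import random


def undersample(records, label_key="label", minority="migraine", seed=0):
    """Randomly drop majority-class records until both classes have equal size.

    Assumption: random undersampling stands in for the unspecified
    sampling method mentioned in the abstract.
    """
    rng = random.Random(seed)
    minority_rows = [r for r in records if r[label_key] == minority]
    majority_rows = [r for r in records if r[label_key] != minority]
    keep = min(len(minority_rows), len(majority_rows))
    balanced = minority_rows + rng.sample(majority_rows, k=keep)
    rng.shuffle(balanced)
    return balanced


def sensitivity_specificity(y_true, y_pred, positive="migraine"):
    """Return (sensitivity, specificity) with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Synthetic toy data: 6 migraine records out of 100 (about 6%).
records = [{"id": i, "label": "migraine" if i < 6 else "other"} for i in range(100)]
balanced = undersample(records)

# A degenerate classifier that always predicts the majority class.
y_true = [r["label"] for r in records]
always_other = ["other"] * len(y_true)
sens, spec = sensitivity_specificity(y_true, always_other)
print(len(balanced), sens, spec)
```

Undersampling discards majority-class records, which is simple but loses information; oversampling the minority class or cost-sensitive learning are common alternatives with different trade-offs.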
Notes
The cost of an MRI is approximately $500, but the more comprehensive the brain scan, the more costly the MRI exam becomes. In addition, consultation fees for interpretation are added to the cost of the MRI scan. The neurological consultation fee is listed in Government of British Columbia (2016).
SGS was produced by Synapse Publishing in 2001 and version 2.0.6 is currently used by VGH.
References
Arauzo-Azofra A, Benitez JM, Castro JL (2008) Consistency measures for feature selection. J Intell Inf Syst 30(3):273–292
Cao ZH, Ko LW, Lai KL, Huang SB, Wang SJ, Lin CT (2015) Classification of migraine stages based on resting-state EEG power. In: 2015 international joint conference on neural networks (IJCNN), IEEE, pp 1–5
Duval B, Hao JK, Hernandez Hernandez JC (2009) A memetic algorithm for gene selection and molecular classification of cancer. In: Proceedings of the 11th annual conference on genetic and evolutionary computation. ACM, New York, pp 201–208
Etminan M, Takkouche B, Isorna FC, Samii A et al (2005) Risk of ischaemic stroke in people with migraine: systematic review and meta-analysis of observational studies. BMJ 330(7482):63
Ghandehari K, Ashrafzadeh F, Mood ZI, Ebrahimzadeh S, Arabikhan K (2012) Development and validation of the Asian migraine criteria (AMC). J Clin Neurosci 19(2):224–228
Government of British Columbia: MSC payment schedule index, neurology (2016). http://www2.gov.bc.ca/assets/gov/health/practitioner-pro/medical-services-plan/msc-payment-schedule-2016-01-31.pdf
Hornik K, Buchta C, Hothorn T, Karatzoglou A, Meyer D, Zeileis A (2016) RWeka: R/Weka interface. https://cran.r-project.org/web/packages/RWeka
Brownlee J (2016) Feature selection to improve accuracy and decrease training time. http://machinelearningmastery.com/feature-selection-to-improve-accuracy-and-decrease-training-time/
Ko LW, Lai KL, Huang PH, Lin CT, Wang SJ (2013) Steady-state visual evoked potential based classification system for detecting migraine seizures. In: 2013 6th international IEEE/EMBS conference on neural engineering (NER), IEEE, pp 1299–1302
Len Trigg: Class CostSensitiveClassifier (2016). http://weka.sourceforge.net/doc.dev/weka/classifiers/meta/CostSensitiveClassifier.html
MediResource: C.health, migraine (migraine headache) (2015). http://chealth.canoe.com/channel_condition_info_details.asp?disease_id=88&
Microsoft: Microsoft azure machine learning studio (2016). https://azure.microsoft.com/en-us/free/?WT.srch=1&WT.mc_ID=SEM_eYMJ89zv
Navot A (2006) On the role of feature selection in machine learning. PhD thesis, Hebrew University
Sedghi E, Weber JH, Thomo A, Bibok M, Penn A (2015) Mining clinical text for stroke prediction. Netw Model Anal Health Inf Bioinf 4(1):1–9
Sun Y, Kamel MS, Wang Y (2006) Boosting for learning multiple classes with imbalanced class distribution. In: Sixth international conference on data mining ICDM’0. IEEE, New York, pp 592–602
TheMigraineTrust: stroke and migraine (2015). http://www.migrainetrust.org/factsheet-stroke-and-migraine-10891
The R Foundation: What is R? https://www.r-project.org/about.html
Tzourio C, Tehindrazanarivelo A, Iglesias S, Alperovitch A, Chedru F, d’Anglejan Chatillon J, Bousser MG (1995) Case–control study of migraine and risk of ischaemic stroke in young women. BMJ 310(6983):830–833
University of Waikato, New Zealand: Weka (machine learning) (2014). http://en.wikipedia.org/wiki/Weka_(machine_learning)
Viticchi G, Falsetti L, Silvestrini M, Luzzi S, Provinciali L, Bartolini M (2012) The real usefulness and indication for migraine diagnosis of neurophysiologic evaluation. Neurol Sci 33(1):161–163
Wasikowski M, Chen XW (2010) Combating the small sample class imbalance problem using feature selection. IEEE Trans Knowl Data Eng 22(10):1388–1400
WebMD: tests for diagnosing migraines (2015). http://www.webmd.com/migraines-headaches/migraine-diagnosing-tests
Weiss GM (2004) Mining with rarity: a unifying framework. ACM SIGKDD Explor Newslett 6(1):7–19
Wikipedia: feature selection (2016). https://en.wikipedia.org/wiki/Feature_selection
Acknowledgments
The authors would like to acknowledge Kristine Votova, PhD, the project manager for the SpecTRA Research Project, and the Island Health clinical research team at the Stroke Rapid Assessment Unit for their support. Funding for the natural experiment in stroke care and the large-scale personalized medicine for mass spectrometry in rapid TIA triage comes from the Canadian Institutes of Health Research (2009–2012) and Genome Canada/BC (2013–2017).
Cite this article
Sedghi, E., Weber, J.H., Thomo, A. et al. A new approach to distinguish migraine from stroke by mining structured and unstructured clinical data-sources. Netw Model Anal Health Inform Bioinforma 5, 30 (2016). https://doi.org/10.1007/s13721-016-0137-2
DOI: https://doi.org/10.1007/s13721-016-0137-2