Machine-learning classifiers for imbalanced tornado data

Trafalis, Theodore B.; Adrianto, Indra; Richman, Michael B.; Lakshmivarahan, S.

doi:10.1007/s10287-013-0174-6

Original Paper
Published: 13 June 2013

Computational Management Science, Volume 11, pages 403–418 (2014)

Abstract

Learning from imbalanced data, where one class has significantly more observations than the other, has gained considerable attention in the machine learning community. When both classes are assumed to be equally difficult to predict, most standard classifiers tend to predict the majority class well at the expense of the minority class. This study uses tornado data, which are highly imbalanced because tornadoes are rare events: in the severe weather data used herein, thunderstorm circulations (mesocyclones) that produce tornadoes make up approximately 6.7% of all observations. However, since tornadoes are high-impact weather events, it is important to predict the minority class with high accuracy. For tornado prediction, we apply support vector machines (SVMs) and logistic regression, each with and without a midpoint threshold adjustment on the probabilistic outputs, as well as random forest and rotation forest. Feature selection with SVM recursive feature elimination (SVM-RFE) was also performed to identify the features (variables) most important for predicting tornadoes. The results showed that SVMs with threshold adjustment performed better than the other classifiers.
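The effect of a threshold adjustment on probabilistic outputs can be illustrated with a minimal sketch. The abstract does not define the "midpoint", so this sketch assumes it is the value halfway between the mean predicted probability of the tornado observations and the mean predicted probability of the non-tornado observations; the paper's exact rule may differ. The function names (midpoint_threshold, classify, hit_rate) and the toy probabilities are hypothetical. For SVMs, such probabilities are commonly obtained by Platt scaling of the decision values; here they are simply given.

```python
from statistics import mean


def midpoint_threshold(probs, labels):
    """ASSUMED midpoint rule (hypothetical): halfway between the mean predicted
    probability of the positive (tornado, label 1) and negative (label 0) cases."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute a midpoint")
    return (mean(pos) + mean(neg)) / 2.0


def classify(probs, threshold):
    """Predict 1 (tornado) when the probability is at or above the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def hit_rate(pred, labels):
    """Probability of detection: share of observed positives predicted positive."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("no positive observations")
    hits = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    return hits / n_pos


if __name__ == "__main__":
    # Hypothetical, imbalanced toy data: 2 tornadic cases out of 10.
    probs = [0.05, 0.10, 0.08, 0.12, 0.20, 0.07, 0.15, 0.30, 0.35, 0.40]
    labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

    t = midpoint_threshold(probs, labels)
    print("midpoint threshold:", round(t, 4))
    print("hit rate at 0.5:", hit_rate(classify(probs, 0.5), labels))  # 0.0
    print("hit rate at midpoint:", hit_rate(classify(probs, t), labels))  # 1.0
```

In this toy example the conventional 0.5 cutoff detects no tornadic cases, because all predicted probabilities stay below it. The assumed midpoint threshold (about 0.254) detects both, at the cost of one false alarm (the 0.30 case). This is the kind of trade-off a threshold adjustment makes in favor of the minority class.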

Acknowledgments

Funding for this research was provided under the National Science Foundation Grants AGS0831359 and EIA-0205628.

Author information

Authors and Affiliations

School of Industrial and Systems Engineering, The University of Oklahoma, Norman, OK, 73019, USA
Theodore B. Trafalis & Indra Adrianto
School of Meteorology, The University of Oklahoma, Norman, OK, 73019, USA
Michael B. Richman
School of Computer Science, The University of Oklahoma, Norman, OK, 73019, USA
S. Lakshmivarahan

Corresponding author

Correspondence to Theodore B. Trafalis.

About this article

Cite this article

Trafalis, T.B., Adrianto, I., Richman, M.B. et al. Machine-learning classifiers for imbalanced tornado data. Comput Manag Sci 11, 403–418 (2014). https://doi.org/10.1007/s10287-013-0174-6

Received: 05 December 2012
Accepted: 22 May 2013
Published: 13 June 2013
Issue Date: October 2014
DOI: https://doi.org/10.1007/s10287-013-0174-6
