Abstract

In this paper, we select an appropriate model for the risk assessment of electric utility pole data, using model selection cheat sheets and k-fold cross validation. Analyzing, predicting and forecasting from the data all depend on choosing a suitable model. The major issue is a loss of accuracy in model fitting, which may result in poor model selection. Many types of machine learning algorithms exist, which makes it difficult to settle on a single model. To ensure proper selection of the model, we follow a two-step process. First, candidate models are chosen with two existing model selection cheat sheets, those of scikit-learn and Microsoft Azure, based on the available input and the required output of the data. Working through the questions posed by the cheat sheets yields four candidate models: the Generalized Additive Model, the Generalized Linear Model, Linear Regression and the Support Vector Machine. Second, to identify the most appropriate of these, we perform k-fold cross validation to estimate the risk of each algorithm, comparing 2-fold, 8-fold and 10-fold cross validation. Among the three settings, the generalized additive model under 10-fold cross validation gives the lowest risk error and is selected. In this way, k-fold cross validation is used to estimate the accuracy of the model best suited to the electric power data set.
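The comparison of 2-fold, 8-fold and 10-fold cross validation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (the electric utility pole data set is not available here), a one-feature least-squares line stands in for the four candidate models, and the helper names `k_fold_indices`, `fit_line` and `cv_mse` are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def cv_mse(xs, ys, k, seed=0):
    """Estimate risk as the mean squared error averaged over k held-out folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        mse = sum((ys[i] - (slope * xs[i] + intercept)) ** 2
                  for i in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / k


# Synthetic stand-in data (assumption): a noisy linear relationship.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

# Compare the three fold counts named in the abstract.
results = {k: cv_mse(xs, ys, k) for k in (2, 8, 10)}
best_k = min(results, key=results.get)
for k, err in results.items():
    print(f"{k}-fold CV estimated risk (MSE): {err:.4f}")
print("lowest estimated risk at k =", best_k)
```

To mirror the two-step process, each candidate model would be run through the same loop in place of `fit_line`, and the model and fold setting with the lowest estimated risk would be kept. With scikit-learn, `KFold` and `cross_val_score` from `sklearn.model_selection` provide the same splitting and scoring.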

Keywords

Model selection, K-fold cross validation, Machine learning, Model fit, Electric power
