Abstract
Machine learning methods can offer a practical alternative to deterministic and statistical methods for predicting air pollution concentrations. However, for a given data set it is often not clear beforehand which machine learning method will yield the best prediction performance. This study compares the variable selection and prediction performance of four machine learning methods of different complexity: logistic regression, decision tree, multivariate adaptive regression splines and neural network. The methods are applied to the task of predicting exceedances of the European PM10 daily average objective of 50 μg m⁻³ at a station in Helsinki, Finland. Our study shows that some predictors were selected by all models, but that the models also picked different variables. The performance of three of the four methods was very similar; however, the decision tree method performed significantly worse. Performance was sensitive to the learning sample size and to the time period used.
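Framed as classification, the task is to label each day as an exceedance (daily mean PM10 above 50 μg m⁻³) or not, then learn that label from predictors. The sketch below is a minimal illustration of one of the four methods, logistic regression, fitted by plain batch gradient descent. It is not the authors' implementation. The predictors (previous-day PM10, same-day wind speed), the synthetic data, the learning rate, and the chronological 200/100 split are all hypothetical. None of them are the study's variables, data, or results.

```python
import math
import random

# EU daily average objective for PM10, in μg m^-3 (from the abstract).
PM10_DAILY_LIMIT = 50.0


def exceedance_labels(daily_means):
    """Label each day 1 if its mean PM10 exceeds the objective, else 0."""
    return [1 if c > PM10_DAILY_LIMIT else 0 for c in daily_means]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_scaler(X):
    """Column means and standard deviations (zero spread mapped to 1.0)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return means, sds


def apply_scaler(X, means, sds):
    """Standardize rows using statistics taken from the learning sample."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Predict 1 (exceedance) when the modelled probability reaches the threshold."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
            for xi in X]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical synthetic series, NOT the Helsinki data: today's PM10
    # depends on yesterday's PM10 and on same-day wind speed, plus noise.
    prev, X_raw, daily_means = 30.0, [], []
    for _ in range(300):
        wind = rng.uniform(0.5, 8.0)  # m/s, hypothetical predictor
        today = max(0.0, 0.6 * prev + 35.0 - 5.0 * wind + rng.gauss(0.0, 8.0))
        X_raw.append([prev, wind])
        daily_means.append(today)
        prev = today
    y = exceedance_labels(daily_means)

    # Chronological split: learn on the earlier days, test on the later ones.
    split = 200
    means, sds = fit_scaler(X_raw[:split])
    w, b = fit_logistic(apply_scaler(X_raw[:split], means, sds), y[:split])
    y_hat = predict(apply_scaler(X_raw[split:], means, sds), w, b)
    accuracy = sum(p == t for p, t in zip(y_hat, y[split:])) / len(y_hat)
    print(f"exceedance days: {sum(y)} of {len(y)}; test accuracy: {accuracy:.2f}")
```

The split is chronological rather than random so that later days stay out of the learning sample. Changing `split`, or the stretch of days used for learning, is one simple way to probe the sensitivity to learning sample size and time period that the study reports.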
Cite this article
Zickus, M., Greig, A.J. & Niranjan, M. Comparison of Four Machine Learning Methods for Predicting PM10 Concentrations in Helsinki, Finland. Water, Air, & Soil Pollution: Focus 2, 717–729 (2002). https://doi.org/10.1023/A:1021321820639