Abstract
Ensemble filtering techniques filter noisy instances by combining the predictions of multiple base models, each of which is learned using a traditional algorithm. However, the amount of online streaming data has increased massively in the last decade, and ensemble filtering methods, which largely operate in batch mode and require multiple passes over the data, incur high time and storage costs. In this paper, we present an ensemble bootstrap model filtering technique that applies multiple inductive learning algorithms to several small Poisson bootstrapped samples of online data to filter noisy instances. We analyze three prior filtering techniques using Bayesian computational analysis to understand the underlying distribution of the model space. We implement our approach and other prior filtering approaches and show that our approach is more accurate than the prior filtering methods.
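The abstract gives no algorithmic detail, so the following is only a minimal sketch of how single-pass filtering with Poisson bootstrapped samples might look, not the authors' implementation. It assumes Poisson(1) weights per model and instance (as in online bagging), a majority-vote rule for flagging noise, a short warm-up prefix that is treated as clean, and one simple incremental learner standing in for the multiple inductive learning algorithms. The names `CentroidLearner`, `filter_stream`, `n_models`, and `warmup` are hypothetical.

```python
import math
import random


def poisson(rng, lam=1.0):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class CentroidLearner:
    """Incremental nearest-centroid classifier (hypothetical base model)."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sums
        self.counts = {}  # label -> number of updates seen

    def update(self, x, y):
        s = self.sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        if not self.counts:
            return None  # an untrained model abstains

        def dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.counts, key=dist)


def filter_stream(stream, n_models=5, warmup=10, seed=0):
    """Yield (x, y, is_noisy) for each instance in one pass over the stream.

    Assumptions of this sketch: the first `warmup` instances are treated as
    clean, an instance is flagged when more than half of the voting models
    disagree with its label, and flagged instances are not used for training.
    """
    rng = random.Random(seed)
    models = [CentroidLearner() for _ in range(n_models)]
    for t, (x, y) in enumerate(stream):
        is_noisy = False
        if t >= warmup:
            cast = [p for p in (m.predict(x) for m in models) if p is not None]
            wrong = sum(1 for p in cast if p != y)
            is_noisy = bool(cast) and wrong > len(cast) / 2
        if not is_noisy:
            # Poisson bootstrap: each model sees the instance k ~ Poisson(1) times,
            # which approximates resampling with replacement without storing data.
            for m in models:
                for _ in range(poisson(rng)):
                    m.update(x, y)
        yield x, y, is_noisy
```

Because each model draws its own Poisson weight per instance, the ensemble members are trained on different bootstrap-like samples while the stream is read only once and no instance is stored. The warm-up prefix is a design choice of this sketch: without it, the first instance of a previously unseen class would always be flagged.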
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Satyanarayana, A., Chinchilla, R. (2016). Ensemble Noise Filtering for Streaming Data Using Poisson Bootstrap Model Filtering. In: Latifi, S. (eds) Information Technology: New Generations. Advances in Intelligent Systems and Computing, vol 448. Springer, Cham. https://doi.org/10.1007/978-3-319-32467-8_75
DOI: https://doi.org/10.1007/978-3-319-32467-8_75
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32466-1
Online ISBN: 978-3-319-32467-8
eBook Packages: Engineering, Engineering (R0)