Ensemble Noise Filtering for Streaming Data Using Poisson Bootstrap Model Filtering

  • Conference paper
Information Technology: New Generations

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 448)

Abstract

Ensemble filtering techniques filter noisy instances by combining the predictions of multiple base models, each of which is learned using a traditional algorithm. However, with the massive increase in online streaming data over the last decade, ensemble filtering methods, which largely operate in batch mode and require multiple passes over the data, incur high time and storage costs. In this paper, we present an ensemble bootstrap model filtering technique that applies multiple inductive learning algorithms to several small Poisson bootstrapped samples of online data to filter noisy instances. We analyze three prior filtering techniques using Bayesian computational analysis to understand the underlying distribution of the model space. We implement our approach and prior filtering approaches and show that ours is more accurate than the prior methods.
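
The abstract gives no algorithmic detail, so the following Python sketch only illustrates the general idea and is not the authors' method. It assumes the Poisson(1) instance weighting commonly used in online bagging, a toy per-class centroid learner standing in for the "multiple inductive learning algorithms", and a strict-majority vote as the filtering rule. All names (`poisson`, `CentroidModel`, `poisson_bootstrap_filter`, `make_stream`) and the parameters `n_models`, `warmup`, and `seed` are hypothetical.

```python
import math
import random
from collections import defaultdict


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class CentroidModel:
    """Toy incremental learner (an assumption): one running mean per class."""

    def __init__(self):
        self.sums = {}
        self.counts = defaultdict(int)

    def update(self, x, y, weight):
        # A weight of k is equivalent to seeing the instance k times.
        if weight == 0:
            return
        s = self.sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += weight * v
        self.counts[y] += weight

    def predict(self, x):
        if not self.counts:
            return None  # abstain until at least one instance was seen

        def dist2(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.counts, key=dist2)


def poisson_bootstrap_filter(stream, n_models=5, warmup=20, seed=0):
    """Single pass over the stream; returns indices flagged as noisy."""
    rng = random.Random(seed)
    models = [CentroidModel() for _ in range(n_models)]
    flagged = []
    for idx, (x, y) in enumerate(stream):
        if idx >= warmup:
            cast = [v for v in (m.predict(x) for m in models) if v is not None]
            wrong = sum(v != y for v in cast)
            if cast and 2 * wrong > len(cast):
                flagged.append(idx)
                continue  # do not train on a suspected noisy instance
        # Each model sees the instance Poisson(1) times, which approximates
        # drawing an independent bootstrap sample without storing the data.
        for m in models:
            m.update(x, y, poisson(1.0, rng))
    return flagged


def make_stream():
    """Synthetic two-class stream with three deliberately flipped labels."""
    stream = []
    for i in range(60):
        label = i % 2
        jitter = (i % 5) * 0.1
        stream.append(((10.0 * label + jitter, 10.0 * label - jitter), label))
    for i in (25, 40, 51):
        x, y = stream[i]
        stream[i] = (x, 1 - y)
    return stream
```

On this well-separated synthetic stream, `poisson_bootstrap_filter(make_stream())` should flag the three flipped instances. Skipping the model update for flagged instances is a design choice of this sketch: it keeps suspected noise out of the base models at the cost of never recovering from a false alarm.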

Author information

Corresponding author

Correspondence to Ashwin Satyanarayana.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Satyanarayana, A., Chinchilla, R. (2016). Ensemble Noise Filtering for Streaming Data Using Poisson Bootstrap Model Filtering. In: Latifi, S. (eds) Information Technology: New Generations. Advances in Intelligent Systems and Computing, vol 448. Springer, Cham. https://doi.org/10.1007/978-3-319-32467-8_75

  • DOI: https://doi.org/10.1007/978-3-319-32467-8_75

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32466-1

  • Online ISBN: 978-3-319-32467-8

  • eBook Packages: Engineering, Engineering (R0)
