
Confidence in Random Forest for Performance Optimization

  • Conference paper
In: Artificial Intelligence XXXV (SGAI 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11311)

Abstract

In this paper, we present a non-deterministic strategy for searching for the optimal number of trees (NoTs) hyperparameter in Random Forest (RF). Hyperparameter tuning in Machine Learning (ML) algorithms optimizes the predictability of an ML algorithm and/or improves the utilization of computer resources. However, hyperparameter tuning is a complex, time-consuming optimization task. We set up experiments with the goals of maximizing predictability, minimizing NoTs and minimizing time of execution (ToE). Compared to the deterministic algorithm, e-greedy and the default-configured RF, this research's non-deterministic algorithm recorded an average percentage accuracy (acc) of approximately 98%, an average NoTs improvement of 29.39%, an average ToE improvement ratio of 415.92 and an average improvement of 95% in iterations. Moreover, evaluations using Jackknife Estimation showed stable and reliable results across several experiment runs of the non-deterministic strategy. The non-deterministic approach to selecting this hyperparameter showed significant acc and better utilization of computer resources (i.e. CPU time and memory). This approach can be adopted widely in hyperparameter tuning and in conserving computer resources, i.e. green computing.
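
The paper's exact procedure is not reproduced on this page; purely as an illustration of the kind of search the abstract describes, the sketch below randomly samples candidate NoTs values for a scikit-learn RandomForestClassifier, scores each candidate by cross-validated accuracy and time of execution, and summarises run-to-run stability with a leave-one-run-out (jackknife) estimate. The dataset (load_digits), the candidate grid, the selection key and the number of runs are assumptions made here for illustration, not details taken from the paper.

```python
# Minimal sketch of a non-deterministic search over the number of trees (NoTs)
# in a Random Forest. This is NOT the authors' algorithm; it only illustrates
# randomly sampling NoTs candidates, scoring accuracy and time of execution
# (ToE), and gauging stability across runs with a jackknife-style estimate.
import time
import random
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def random_search_nots(X, y, candidates, n_samples=10, seed=0):
    """Randomly sample NoTs values; keep the best (accuracy, fewer trees, lower ToE)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        n_trees = rng.choice(candidates)          # non-deterministic pick of NoTs
        start = time.perf_counter()
        acc = cross_val_score(
            RandomForestClassifier(n_estimators=n_trees, random_state=seed),
            X, y, cv=3).mean()
        toe = time.perf_counter() - start
        key = (acc, -n_trees, -toe)               # higher acc, then fewer trees, then less ToE
        if best is None or key > best[0]:
            best = (key, n_trees, acc, toe)
    return best[1], best[2], best[3]

X, y = load_digits(return_X_y=True)               # assumed example dataset
candidates = list(range(10, 210, 10))              # assumed candidate NoTs grid
runs = [random_search_nots(X, y, candidates, seed=s) for s in range(5)]
accs = np.array([acc for _, acc, _ in runs])

# Jackknife (leave-one-run-out) estimate of the variability of the mean accuracy,
# used only to check that repeated non-deterministic runs give stable results.
loo_means = np.array([np.delete(accs, i).mean() for i in range(len(accs))])
jk_se = np.sqrt((len(accs) - 1) / len(accs) * ((loo_means - loo_means.mean()) ** 2).sum())
print("mean accuracy:", accs.mean(), "jackknife SE:", jk_se)
```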



Author information

Correspondence to Kennedy Senagi.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Senagi, K., Jouandeau, N. (2018). Confidence in Random Forest for Performance Optimization. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXV. SGAI 2018. Lecture Notes in Computer Science, vol. 11311. Springer, Cham. https://doi.org/10.1007/978-3-030-04191-5_31


  • DOI: https://doi.org/10.1007/978-3-030-04191-5_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04190-8

  • Online ISBN: 978-3-030-04191-5

  • eBook Packages: Computer Science, Computer Science (R0)
