A streaming flow-based technique for traffic classification applied to 12 + 1 years of Internet traffic

Carela-Español, Valentín; Barlet-Ros, Pere; Bifet, Albert; Fukuda, Kensuke

doi:10.1007/s11235-015-0114-6

Published: 19 November 2015

Telecommunication Systems, Volume 63, pages 191–204 (2016)

Abstract

The continuous evolution of Internet traffic and its applications means that network traffic classification is still far from a solved problem. An essential limitation in this field is that most techniques proposed in the literature are based on a static view of the network traffic (i.e., they build a model or a set of patterns from a static, invariable dataset). Very little work has addressed the practical limitations that arise in a more realistic scenario with an infinite, continuously evolving stream of network traffic flows. In this paper, we propose a streaming flow-based classification solution based on the Hoeffding Adaptive Tree, a machine learning technique specifically designed for evolving data streams. The main novelty of our proposal is that it automatically adapts to the continuous evolution of the network traffic without storing any traffic data. We apply our solution to a 12 + 1 year-long dataset from a transit link in Japan and show that it sustains very high accuracy over the years, with significantly less cost and complexity than existing alternatives based on static learning algorithms such as C4.5.
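
As background for the Hoeffding Adaptive Tree named in the abstract, the following is a minimal Python sketch of the Hoeffding bound that Hoeffding-tree learners generally use to decide when a node has seen enough examples to split. The abstract does not describe the authors' implementation, so the function names, the confidence value and the gain figures below are illustrative assumptions, and the drift-adaptation part of the algorithm is not shown.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean observed
    over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 n_classes: int, delta: float, n: int) -> bool:
    """Hypothetical split test: split once the gap between the two best
    attributes exceeds the bound. Information gain over n_classes classes
    has range log2(n_classes)."""
    value_range = math.log2(n_classes)
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Illustrative numbers only: two traffic classes, delta = 1e-7.
    for n in (100, 1000, 10000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(n, round(eps, 4), should_split(0.30, 0.15, 2, 1e-7, n))
```

Because the bound shrinks as more flows arrive, a node can commit to a split after a bounded number of examples and then discard them, which is consistent with the abstract's claim of adapting without storing traffic data.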

Acknowledgments

This research was funded by the NII International Internship Program, by the Spanish Ministry of Economy and Competitiveness under contract TEC2011-27474 (NOMADS project) and by AGAUR (ref. 2014-SGR-1427).

Author information

Authors and Affiliations

UPC BarcelonaTech, Barcelona, Spain
Valentín Carela-Español & Pere Barlet-Ros
HUAWEI Noah’s Ark Lab, Shatin, Hong Kong
Albert Bifet
National Institute of Informatics (NII), Tokyo, Japan
Kensuke Fukuda

Corresponding author

Correspondence to Valentín Carela-Español.

About this article

Cite this article

Carela-Español, V., Barlet-Ros, P., Bifet, A. et al. A streaming flow-based technique for traffic classification applied to 12 + 1 years of Internet traffic. Telecommun Syst 63, 191–204 (2016). https://doi.org/10.1007/s11235-015-0114-6

Issue Date: October 2016
