research-article

Workload Characterization and Performance Implications of Large-Scale Blog Servers

Published: 01 November 2012

Abstract

With the ever-increasing popularity of Social Network Services (SNSs), an understanding of the characteristics of these services and their effects on the behavior of their host servers is critical. However, there has been a lack of research on the workload characterization of servers running SNS applications such as blog services. To fill this void, we empirically characterized real-world Web server logs collected from one of the largest South Korean blog hosting sites for 12 consecutive days. The logs consist of more than 96 million HTTP requests and 4.7TB of network traffic. Our analysis reveals the following: (i) The transfer size of nonmultimedia files and blog articles can be modeled using a truncated Pareto distribution and a log-normal distribution, respectively; (ii) user access for blog articles does not show temporal locality, but is strongly biased towards those posted with image or audio files. We additionally discuss the potential performance improvement through clustering of small files on a blog page into contiguous disk blocks, which benefits from the observed file access patterns. Trace-driven simulations show that, on average, the suggested approach achieves 60.6% better system throughput and reduces the processing time for file access by 30.8% compared to the best performance of the Ext4 filesystem.
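
The two distribution families named in finding (i) can be illustrated with a minimal sketch. It assumes the standard truncated Pareto form with shape alpha and bounds [low, high], sampled by inverting its CDF, and uses Python's built-in log-normal generator. The function names and every parameter value below are hypothetical placeholders chosen for illustration; they are not the values fitted in the article.

```python
import random


def truncated_pareto_cdf(x, alpha, low, high):
    """CDF of a Pareto distribution truncated to the interval [low, high]."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (1.0 - (low / x) ** alpha) / (1.0 - (low / high) ** alpha)


def truncated_pareto_inverse_cdf(u, alpha, low, high):
    """Inverse of the CDF above: maps u in [0, 1] to a value in [low, high]."""
    tail = 1.0 - (low / high) ** alpha
    return low * (1.0 - u * tail) ** (-1.0 / alpha)


def sample_file_size(rng, alpha, low, high):
    """Draw one transfer size (bytes) by inverse-transform sampling."""
    return truncated_pareto_inverse_cdf(rng.random(), alpha, low, high)


def sample_article_size(rng, mu, sigma):
    """Draw one article transfer size (bytes) from a log-normal distribution."""
    return rng.lognormvariate(mu, sigma)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical parameters, NOT taken from the article.
    alpha, low, high = 1.2, 100.0, 1_000_000.0
    mu, sigma = 9.0, 1.0
    files = [sample_file_size(rng, alpha, low, high) for _ in range(5)]
    articles = [sample_article_size(rng, mu, sigma) for _ in range(5)]
    print("file sizes:", [round(v) for v in files])
    print("article sizes:", [round(v) for v in articles])
```

Truncation bounds the heavy tail at `high`, which is why a truncated Pareto can suit file sizes that have a hard upper limit, whereas the log-normal has no upper bound.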


      • Published in

        ACM Transactions on the Web, Volume 6, Issue 4
        November 2012
        138 pages
        ISSN: 1559-1131
        EISSN: 1559-114X
        DOI: 10.1145/2382616

        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 November 2012
        • Accepted: 1 August 2012
        • Revised: 1 May 2012
        • Received: 1 September 2011


        Qualifiers

        • research-article
        • Research
        • Refereed
