Abstract
With the ever-increasing popularity of Social Network Services (SNSs), an understanding of the characteristics of these services and their effects on the behavior of their host servers is critical. However, there has been a lack of research on the workload characterization of servers running SNS applications such as blog services. To fill this void, we empirically characterized real-world Web server logs collected from one of the largest South Korean blog hosting sites for 12 consecutive days. The logs consist of more than 96 million HTTP requests and 4.7 TB of network traffic. Our analysis reveals the following: (i) the transfer sizes of non-multimedia files and blog articles can be modeled using a truncated Pareto distribution and a log-normal distribution, respectively; (ii) user access to blog articles does not show temporal locality, but is strongly biased towards articles posted with image or audio files. We additionally discuss the potential performance improvement from clustering the small files on a blog page into contiguous disk blocks, an approach that benefits from the observed file access patterns. Trace-driven simulations show that, on average, the suggested approach achieves 60.6% higher system throughput and reduces the processing time for file access by 30.8% compared to the best performance of the Ext4 filesystem.
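As an illustration of the two size models named in finding (i), the following is a minimal sketch of how synthetic transfer sizes could be drawn from a truncated Pareto distribution (by inverting its CDF) and from a log-normal distribution. The function names and every parameter value (alpha, low, high, mu, sigma) are hypothetical placeholders chosen for the example; they are not the values fitted in the paper.

```python
import math
import random


def truncated_pareto_cdf(x, alpha, low, high):
    """CDF of a Pareto distribution with shape alpha truncated to [low, high]."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (1.0 - (low / x) ** alpha) / (1.0 - (low / high) ** alpha)


def truncated_pareto_quantile(u, alpha, low, high):
    """Inverse of the CDF above: maps u in [0, 1] to a size in [low, high]."""
    tail = (low / high) ** alpha
    return low * (1.0 - u * (1.0 - tail)) ** (-1.0 / alpha)


def sample_file_sizes(n, alpha, low, high, seed=0):
    """Draw n synthetic file transfer sizes (bytes) by inverse-transform sampling."""
    rng = random.Random(seed)
    return [truncated_pareto_quantile(rng.random(), alpha, low, high)
            for _ in range(n)]


def sample_article_sizes(n, mu, sigma, seed=0):
    """Draw n synthetic article transfer sizes (bytes) from a log-normal model."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


if __name__ == "__main__":
    # Placeholder parameters, assumed for illustration only.
    files = sample_file_sizes(5, alpha=1.2, low=100.0, high=1e6)
    articles = sample_article_sizes(5, mu=9.0, sigma=1.0)
    print([round(s) for s in files])
    print([round(s) for s in articles])
```

The truncation bound `high` caps the heavy tail, so, unlike an untruncated Pareto, no sampled size can exceed the largest file the model allows.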