DOI: 10.1145/3642963.3652202
research-article

TrackIops: Real-Time NFS Performance Metrics Extractor

Published: 14 May 2024

ABSTRACT

The Network File System (NFS) is commonly used in cloud environments as a cost-effective file storage solution that is easy to set up. However, the multi-tenant nature of cloud infrastructures makes distributed file systems prone to unstable and unpredictable performance. These performance issues can be very harmful to both Cloud Service Providers (CSPs) and tenants. CSPs and their customers therefore increasingly require real-time, granular metrics (per-file, high-frequency) to dynamically optimize data placement and resource usage, to ensure file access performance, to provision resources cost-effectively, to bill for them, and to troubleshoot them rapidly. In this paper, we propose TrackIops, a novel NFS tracer that provides these metrics without effort and at low cost. TrackIops is an eBPF-based, client-side, request-oriented tracing solution. The main contribution of this paper is a kernel-level solution that reconstructs NFS request and response threads and analyses them online without requiring server instrumentation. TrackIops is a real-time, per-tenant, per-file, per-second NFS metrics extractor that is easy to integrate into any optimization or troubleshooting solution, with an overhead lower than 3.5% on the client in a worst-case scenario.
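
To make the idea of request-oriented, per-tenant, per-file, per-second metrics concrete, the following is a minimal user-space sketch in Python. It is not the TrackIops implementation and does not use eBPF: it only illustrates pairing each response with its request and bucketing the result by (tenant, file, second). The `Event` type, its fields, the `xid` request identifier, and the `aggregate` function are all hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical trace event; field names are assumptions, not the paper's."""
    kind: str          # "request" or "response"
    xid: int           # assumed identifier shared by a request and its response
    t_ns: int          # timestamp in nanoseconds
    tenant: str = ""   # only meaningful on requests in this sketch
    path: str = ""     # file the request targets
    nbytes: int = 0    # payload size of the request


def aggregate(events):
    """Pair responses with requests and sum metrics per (tenant, path, second)."""
    pending = {}  # xid -> request event still waiting for its response
    metrics = defaultdict(lambda: {"ops": 0, "bytes": 0, "latency_ns": 0})
    for ev in events:
        if ev.kind == "request":
            pending[ev.xid] = ev
        elif ev.kind == "response":
            req = pending.pop(ev.xid, None)
            if req is None:
                continue  # response without a recorded request: ignore it
            second = ev.t_ns // 1_000_000_000  # bucket by completion time
            bucket = metrics[(req.tenant, req.path, second)]
            bucket["ops"] += 1
            bucket["bytes"] += req.nbytes
            bucket["latency_ns"] += ev.t_ns - req.t_ns
    return dict(metrics)


if __name__ == "__main__":
    trace = [
        Event("request", 1, 1_000_000_000, "tenant-a", "/data/f", 4096),
        Event("response", 1, 1_002_000_000),
    ]
    for key, value in aggregate(trace).items():
        print(key, value)
```

In this sketch a request is counted in the second in which its response arrives; counting it at issue time instead would be an equally defensible choice.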


  • Published in

    CHEOPS '24: Proceedings of the 4th Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems
    April 2024
    38 pages
    ISBN: 9798400705380
    DOI: 10.1145/3642963

    Copyright © 2024 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 6 of 8 submissions, 75%