DOI: 10.1145/3546096.3546103
research-article
Open Access

The DARPA SEARCHLIGHT Dataset of Application Network Traffic

Published: 08 August 2022

ABSTRACT

Researchers are in constant need of reliable data to develop and evaluate AI/ML methods for networks and cybersecurity. While Internet measurements can provide realistic data, such datasets lack ground truth about application flows. We present a ~750 GB dataset comprising ~2000 systematically conducted experiments and the resulting packet captures for video streaming, video teleconferencing, and cloud-based document editing applications. This curated and labeled dataset contains bidirectional, encrypted traffic with complete ground truth, and it can be widely used to assess and evaluate AI/ML algorithms.

          • Published in

            CSET '22: Proceedings of the 15th Workshop on Cyber Security Experimentation and Test
            August 2022
            150 pages
            ISBN: 9781450396844
            DOI: 10.1145/3546096

            Copyright © 2022 ACM

            Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • research-article
            • Research
            • Refereed limited