DOI: 10.1145/3471621.3471858
research-article
Public Access

Living-Off-The-Land Command Detection Using Active Learning

Published: 07 October 2021

ABSTRACT

In recent years, enterprises have been targeted by advanced adversaries who leverage creative ways to infiltrate their systems and move laterally to gain access to critical data. One increasingly common evasive method is to hide the malicious activity behind a benign program by using tools that are already installed on user computers. These programs are usually part of the operating system distribution or another user-installed binary, so this type of attack is called “Living-Off-The-Land”. Detecting these attacks is challenging because adversaries may not create malicious files on the victim computers, and anti-virus scans therefore fail to detect them.

We propose the design of an Active Learning framework called LOLAL for detecting Living-Off-The-Land attacks that iteratively selects a set of uncertain and anomalous samples for labeling by a human analyst. LOLAL is specifically designed to work well when only a limited number of labeled samples is available for training machine learning models to detect attacks. We investigate methods to represent command-line text using word-embedding techniques, and design ensemble boosting classifiers to distinguish malicious and benign samples based on the embedding representation. We leverage a large, anonymized dataset collected by an endpoint security product and demonstrate that our ensemble classifiers achieve an average F1 score of 96% at classifying different attack classes. We show that our active learning method consistently improves the classifier performance as more training data is labeled, and converges in fewer than 30 iterations when starting with a small number of labeled instances.
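To make the selection step concrete, the following is a minimal sketch of one query round that picks both uncertain and anomalous samples for an analyst. It is an illustration under stated assumptions, not the LOLAL selection rule: the function name `select_for_labeling`, the 0.5 decision boundary, the per-round budgets, and the example scores are all hypothetical, and the probabilities and anomaly scores are assumed to come from some already-trained classifier and anomaly detector.

```python
from typing import List, Sequence, Set


def select_for_labeling(
    malicious_prob: Sequence[float],
    anomaly_score: Sequence[float],
    labeled: Set[int],
    n_uncertain: int = 2,
    n_anomalous: int = 2,
) -> List[int]:
    """Return indices of unlabeled samples to send to the analyst.

    Assumption: uncertain means predicted probability closest to 0.5,
    and anomalous means highest anomaly score. Both are illustrative choices.
    """
    # Only samples without a label are candidates for a query.
    pool = [i for i in range(len(malicious_prob)) if i not in labeled]
    # Uncertainty sampling: smallest distance from the 0.5 boundary first.
    uncertain = sorted(pool, key=lambda i: abs(malicious_prob[i] - 0.5))[:n_uncertain]
    # Anomaly sampling: largest anomaly score among the samples not yet chosen.
    remaining = [i for i in pool if i not in uncertain]
    anomalous = sorted(remaining, key=lambda i: -anomaly_score[i])[:n_anomalous]
    return uncertain + anomalous


# Hypothetical classifier probabilities and anomaly scores for six command lines.
probs = [0.02, 0.48, 0.97, 0.55, 0.10, 0.90]
scores = [0.1, 0.2, 0.9, 0.3, 0.8, 0.4]
queries = select_for_labeling(probs, scores, labeled=set())
print(queries)  # prints [1, 3, 2, 4]
```

In a full loop, the analyst's labels for the returned indices would be added to the training set, the classifier retrained, and the round repeated until performance stops improving.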

      Published in

        RAID '21: Proceedings of the 24th International Symposium on Research in Attacks, Intrusions and Defenses
        October 2021
        468 pages
        ISBN: 9781450390583
        DOI: 10.1145/3471621

        Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • research-article
        • Research
        • Refereed limited
