Living-Off-The-Land Command Detection Using Active Learning

Authors:
Talha Ongun (Northeastern University, US)
Jack W. Stokes (Microsoft Research, US)
Jonathan Bar Or (Microsoft Corporation, US)
Ke Tian (Microsoft Corporation and Palo Alto Networks, US)
Farid Tajaddodianfar (Microsoft Corporation and Amazon, US)
Joshua Neil (Microsoft Corporation, US)
Christian Seifert (Microsoft Corporation, US)
Alina Oprea (Northeastern University, US)
John C. Platt (Microsoft Research and Google, US)

RAID '21: Proceedings of the 24th International Symposium on Research in Attacks, Intrusions and Defenses, October 2021, Pages 442–455. https://doi.org/10.1145/3471621.3471858

Published: 07 October 2021

ABSTRACT

In recent years, enterprises have been targeted by advanced adversaries who find creative ways to infiltrate their systems and move laterally to gain access to critical data. One increasingly common evasive method is to hide malicious activity behind a benign program by using tools that are already installed on user computers. Because these programs are usually part of the operating system distribution or another user-installed binary, this type of attack is called “Living-Off-The-Land”. Detecting these attacks is challenging because adversaries may not create malicious files on the victim computers, so anti-virus scans fail to detect them.

We propose the design of an Active Learning framework called LOLAL for detecting Living-Off-The-Land attacks. LOLAL iteratively selects a set of uncertain and anomalous samples for labeling by a human analyst, and is specifically designed to work well when only a limited number of labeled samples are available for training machine learning models to detect attacks. We investigate methods to represent command-line text using word-embedding techniques, and design ensemble boosting classifiers to distinguish malicious and benign samples based on the embedding representation. We leverage a large, anonymized dataset collected by an endpoint security product and demonstrate that our ensemble classifiers achieve an average F1 score of 96% at classifying different attack classes. We show that our active learning method consistently improves the classifier performance as more training data is labeled, and converges in fewer than 30 iterations when starting with a small number of labeled instances.
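
The query-selection idea described above (each round, send the analyst both the samples the classifier is least sure about and the samples that look unlike anything labeled so far) can be illustrated with a minimal Python sketch. This is not the LOLAL implementation: a toy nearest-centroid scorer stands in for the ensemble boosting classifiers, plain numeric vectors stand in for command-line embeddings, the anomaly score (distance to the nearest class centroid) is an assumption, and every function name here is hypothetical.

```python
import math
from typing import Callable, Container, Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def malicious_probability(x: Vector, benign_c: Vector, malicious_c: Vector) -> float:
    """Toy stand-in classifier: closer to the malicious centroid gives a higher score."""
    d_benign = distance(x, benign_c)
    d_malicious = distance(x, malicious_c)
    total = d_benign + d_malicious
    return 0.5 if total == 0 else d_benign / total


def select_queries(pool: Sequence[Vector], labeled: Container[int],
                   benign_c: Vector, malicious_c: Vector,
                   k_uncertain: int, k_anomalous: int) -> List[int]:
    """Return indices of unlabeled samples to show to the analyst."""
    unlabeled = [i for i in range(len(pool)) if i not in labeled]
    # Uncertain samples: score closest to the 0.5 decision boundary.
    by_uncertainty = sorted(
        unlabeled,
        key=lambda i: abs(malicious_probability(pool[i], benign_c, malicious_c) - 0.5),
    )
    uncertain = by_uncertainty[:k_uncertain]
    # Anomalous samples (assumed score): far from both class centroids.
    remaining = [i for i in unlabeled if i not in uncertain]
    by_anomaly = sorted(
        remaining,
        key=lambda i: min(distance(pool[i], benign_c), distance(pool[i], malicious_c)),
        reverse=True,
    )
    return uncertain + by_anomaly[:k_anomalous]


def active_learning(pool: Sequence[Vector], oracle: Callable[[int], int],
                    seed_labels: Dict[int, int], rounds: int,
                    k_uncertain: int = 1, k_anomalous: int = 1) -> Dict[int, int]:
    """Run the label-query loop; seed_labels needs one benign (0) and one malicious (1) sample."""
    labels = dict(seed_labels)
    for _ in range(rounds):
        # "Retrain" the toy model on everything labeled so far.
        benign_c = centroid([pool[i] for i, y in labels.items() if y == 0])
        malicious_c = centroid([pool[i] for i, y in labels.items() if y == 1])
        queries = select_queries(pool, labels, benign_c, malicious_c,
                                 k_uncertain, k_anomalous)
        if not queries:
            break  # nothing left to label
        for i in queries:
            labels[i] = oracle(i)  # the human analyst in a real deployment
    return labels
```

In a real pipeline the `oracle` callback would be the analyst's labeling decision, and the centroid scorer would be replaced by a classifier trained on the embedded command lines; the split between `k_uncertain` and `k_anomalous` is a tuning choice that this sketch does not try to settle.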

Index Terms
1. Computing methodologies
2. Security and privacy

Published in
RAID '21: Proceedings of the 24th International Symposium on Research in Attacks, Intrusions and Defenses
October 2021
468 pages
ISBN: 9781450390583
DOI: 10.1145/3471621
Program Chairs: Leyla Bilge, Tudor Dumitras
Copyright © 2021 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Active learning for security
Advanced Persistent Threats
Contextual text embeddings
Threat detection
Qualifiers
- research-article
- Research
- Refereed limited