research-article

Open Access

Fingerprinting the Shadows: Unmasking Malicious Servers with Machine Learning-Powered TLS Analysis

Authors:
Andreas Theofanous, Foundation for Research and Technology - Hellas, Heraklion, Greece (ORCID: 0009-0008-6672-2133)
Eva Papadogiannaki, Technical University of Crete, Chania, Greece (ORCID: 0000-0003-0205-964X)
Alexander Shevtsov, Technical University of Crete, Chania, Greece (ORCID: 0000-0001-5072-5569)
Sotiris Ioannidis, Technical University of Crete, Chania, Greece (ORCID: 0000-0001-9340-2241)

WWW '24: Proceedings of the ACM on Web Conference 2024, May 2024, Pages 1933–1944, https://doi.org/10.1145/3589334.3645719

Published: 13 May 2024

ABSTRACT

Over the last few years, the adoption of encryption in network traffic has been increasing steadily, and the share of encrypted communications worldwide is estimated to exceed 90%. Although network encryption protocols are designed primarily to secure and protect users' online activities and communications, malicious actors also exploit them to hide their presence in the network. It was estimated that in 2022, more than 85% of malware used encrypted communication channels.

In this work, we examine state-of-the-art fingerprinting techniques and extend a machine learning pipeline for effective and practical server classification. Specifically, we actively contact servers to initiate communication over the TLS protocol and, through exhaustive requests, extract communication metadata. Following state-of-the-art approaches, we investigate which features contribute to effective classification. Our extended pipeline can indicate whether a server is malicious with 91% precision and 95% recall, and can identify the botnet family with 99% precision and 99% recall.
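A minimal sketch of the active probing idea, assuming only Python's standard `ssl` and `socket` modules: the client performs several handshakes with the same server, each with a different maximum TLS version, records the version and cipher suite the server negotiates (or a failure), and hashes the ordered observations into a compact identifier. This is loosely in the spirit of active fingerprinting tools such as JARM, but the probe set, the `ProbeResult` record, and the `probe`/`fingerprint` helpers are hypothetical illustrations, not the authors' pipeline or feature set; a real probe set typically varies many more ClientHello parameters than this sketch does.

```python
import hashlib
import socket
import ssl
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical probe configurations: one handshake per maximum TLS version.
PROBE_VERSIONS = (ssl.TLSVersion.TLSv1_2, ssl.TLSVersion.TLSv1_3)


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical record of what the server chose for one probe."""
    version: Optional[str]  # e.g. "TLSv1.3"; None if the handshake failed
    cipher: Optional[str]   # negotiated cipher suite name; None on failure


def probe(host: str, port: int = 443,
          max_version: ssl.TLSVersion = ssl.TLSVersion.TLSv1_3,
          timeout: float = 5.0) -> ProbeResult:
    """Complete one TLS handshake and record the server's negotiated parameters."""
    ctx = ssl.create_default_context()
    # We only collect metadata; suspicious servers often use self-signed certificates,
    # so hostname checking and certificate verification are disabled (in this order).
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.maximum_version = max_version
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                name, _protocol, _bits = tls.cipher()
                return ProbeResult(tls.version(), name)
    except OSError:  # covers refused connections, timeouts, and ssl.SSLError
        return ProbeResult(None, None)


def fingerprint(results: Sequence[ProbeResult]) -> str:
    """Hash the ordered per-probe observations into a 32-hex-character identifier."""
    # A failed probe still contributes a placeholder, so probe positions stay aligned.
    parts = [f"{r.version or ''}|{r.cipher or ''}" for r in results]
    return hashlib.sha256(",".join(parts).encode()).hexdigest()[:32]
```

For example, `fingerprint([probe(host, max_version=v) for v in PROBE_VERSIONS])` yields one identifier per server. Servers running the same TLS stack and configuration can share an identifier, which is why such values, or the per-probe observations behind them, are candidate features for a classifier.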

Supplemental Material

rfp2531.mp4: supplemental video (MP4, 4.6 MB)

Published in
WWW '24: Proceedings of the ACM on Web Conference 2024, May 2024, 4826 pages
ISBN: 9798400701719
DOI: 10.1145/3589334
General Chairs: Tat-Seng Chua (National University of Singapore), Chong-Wah Ngo (Singapore Management University)
Proceedings Chair: Roy Ka-Wei Lee (Singapore University of Technology and Design)
Program Chairs: Ravi Kumar (Google), Hady W. Lauw (Singapore Management University)
Copyright © 2024 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher: Association for Computing Machinery, New York, NY, United States
Badges: Artifacts Available / v1.1
Author Tags: TLS, TLS fingerprinting, active probing, botnet, command and control, machine learning, server characterization
Acceptance Rates
Overall acceptance rate: 1,899 of 8,196 submissions (23%)