ABSTRACT
Over the last few years, the adoption of encryption in network traffic has increased steadily, and the share of encrypted communications worldwide is estimated to exceed 90%. Although network encryption protocols mainly aim to secure and protect users' online activities and communications, malicious entities have exploited them to hide their presence in the network. In 2022, more than 85% of malware was estimated to use encrypted communication channels.
In this work, we examine state-of-the-art fingerprinting techniques and extend a machine learning pipeline for effective and practical server classification. Specifically, we actively contact servers to initiate communication over the TLS protocol and, through exhaustive requests, extract communication metadata. Following state-of-the-art approaches, we investigate which features favor effective classification. Our extended pipeline indicates whether a server is malicious with 91% precision and 95% recall, and identifies the botnet family with 99% precision and 99% recall.
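To make the two technical ideas in the abstract concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes a single probe per server using Python's standard `ssl` module, whereas the paper describes exhaustive requests. The names `probe_tls` and `precision_recall` are hypothetical helpers introduced only for illustration. The first function collects the handshake metadata a server negotiates; the second computes the precision and recall metrics in which the results are reported.

```python
import socket
import ssl


def probe_tls(host, port=443, timeout=5.0, alpn=("h2", "http/1.1")):
    """Open one TLS connection and return the negotiated handshake metadata.

    Hypothetical helper: one probe with default client parameters. An active
    fingerprinting tool would repeat this with varied client settings.
    """
    context = ssl.create_default_context()
    # Validation is disabled on purpose: the goal is to observe how the
    # server responds, not to trust it. check_hostname must be turned off
    # before verify_mode can be set to CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    context.set_alpn_protocols(list(alpn))
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cipher_name, _protocol, secret_bits = tls.cipher()
            return {
                "tls_version": tls.version(),
                "cipher": cipher_name,
                "secret_bits": secret_bits,
                "alpn": tls.selected_alpn_protocol(),
            }


def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Report 0.0 when a denominator is empty instead of dividing by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Calling `probe_tls("example.org")` requires network access and returns a dictionary whose values depend on the server contacted. In this sketch, a precision of 91% would mean that 91 of every 100 servers flagged as malicious are truly malicious, and a recall of 95% that 95 of every 100 malicious servers are flagged.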
Supplemental Material
- 2024. Dataset of paper "Fingerprinting the Shadows: Unmasking Malicious Servers with Machine Learning-Powered TLS Analysis". https://doi.org/10.5281/zenodo.10655329. Accessed: 2024-02-18.
Index Terms
- Fingerprinting the Shadows: Unmasking Malicious Servers with Machine Learning-Powered TLS Analysis