DOI: 10.1145/3589334.3645719
research-article
Open Access
Artifacts Available / v1.1

Fingerprinting the Shadows: Unmasking Malicious Servers with Machine Learning-Powered TLS Analysis

Published: 13 May 2024

ABSTRACT

Over the last few years, the adoption of encryption in network traffic has been constantly increasing. The percentage of encrypted communications worldwide is estimated to exceed 90%. Although network encryption protocols mainly aim to secure and protect users' online activities and communications, they have been exploited by malicious entities that hide their presence in the network. It was estimated that in 2022, more than 85% of the malware used encrypted communication channels.

In this work, we examine state-of-the-art fingerprinting techniques and extend a machine learning pipeline for effective and practical server classification. Specifically, we actively contact servers to initiate communication over the TLS protocol and, through exhaustive requests, extract communication metadata. Following state-of-the-art approaches, we investigate which features favor effective classification. Our extended pipeline can indicate whether a server is malicious with 91% precision and 95% recall, and it can identify the botnet family with 99% precision and 99% recall.
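To make the reported metrics concrete, the following minimal sketch shows how precision and recall are computed from a binary classifier's confusion counts. The function name `precision_recall` and the counts used are hypothetical, chosen only for illustration; they are not taken from the paper's dataset or pipeline.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts.

    tp: malicious servers correctly flagged as malicious
    fp: benign servers wrongly flagged as malicious
    fn: malicious servers missed by the classifier
    """
    # Precision: of all servers flagged malicious, the fraction that truly are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all truly malicious servers, the fraction that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not the paper's data).
p, r = precision_recall(tp=95, fp=9, fn=5)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this reading, high recall means few malicious servers go undetected, while high precision means few benign servers are wrongly flagged.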

Supplemental Material

rfp2531.mp4: supplemental video (mp4, 4.6 MB)
