DOI: 10.1145/3627106.3627190
research-article
Artifacts Evaluated & Functional / v1.1

Domain and Website Attribution beyond WHOIS

Published: 04 December 2023

ABSTRACT

Currently, WHOIS is the main method for identifying which company or individual owns a domain or website. However, the usefulness of WHOIS is limited by privacy protection services and data redaction. We present a novel automated approach for domain and website attribution. When WHOIS data does not reveal the owner, our approach leverages information from multiple other sources, such as passive DNS, TLS certificates, and the analysis of website content. We propose a novel ranking technique to select the domain owner among multiple identified entities. Our approach identifies the domain owner with an F1 score of 0.94, compared to 0.54 for WHOIS. When applied to 3,001 tracker domains from the popular Disconnect list, it identifies needed updates to the list. It also attributes 84% of previously unattributed tracker domains.


    • Published in

      ACSAC '23: Proceedings of the 39th Annual Computer Security Applications Conference
      December 2023
      836 pages
      ISBN: 9798400708862
      DOI: 10.1145/3627106

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 104 of 497 submissions, 21%