ABSTRACT
WHOIS is currently the main method for identifying which company or individual owns a domain or website. However, its usefulness is limited by privacy protection services and data redaction. We present a novel automated approach for domain and website attribution. When WHOIS data does not reveal the owner, our approach leverages information from multiple other sources, such as passive DNS, TLS certificates, and analysis of website content. We propose a novel ranking technique to select the domain owner among the multiple entities identified. Our approach identifies the domain owner with an F1 score of 0.94, compared to 0.54 for WHOIS. When applied to 3,001 tracker domains from the popular Disconnect list, it identifies needed updates to the list and attributes 84% of previously unattributed tracker domains.
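To make the idea of selecting one owner among several candidate entities concrete, the following is a minimal illustrative sketch, not the ranking technique proposed in the paper, whose details are not given in this abstract. It assumes each data source contributes candidate entity names and that a simple weighted vote picks the top candidate. The source names, the weights in `SOURCE_WEIGHTS`, and the function `rank_owner_candidates` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-source weights chosen only for illustration;
# the abstract does not describe how sources are actually weighted.
SOURCE_WEIGHTS = {
    "whois": 3.0,
    "tls_certificate": 2.0,
    "passive_dns": 1.0,
    "website_content": 1.0,
}
DEFAULT_WEIGHT = 0.5  # assumed weight for any source not listed above


def rank_owner_candidates(evidence):
    """Rank candidate owner entities by a weighted vote.

    evidence: iterable of (source, entity_name) pairs.
    Returns a list of (normalized_entity, score), highest score first.
    """
    scores = defaultdict(float)
    for source, entity in evidence:
        key = entity.strip().lower()  # naive normalization of entity names
        if not key:
            continue  # skip empty names, e.g. redacted fields
        scores[key] += SOURCE_WEIGHTS.get(source, DEFAULT_WEIGHT)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    evidence = [
        ("whois", ""),  # redacted registrant
        ("tls_certificate", "Example Corp"),
        ("website_content", "example corp"),
        ("website_content", "CDN Hosting Ltd"),
        ("passive_dns", "CDN Hosting Ltd"),
    ]
    for entity, score in rank_owner_candidates(evidence):
        print(f"{entity}: {score}")
```

In this toy input the WHOIS registrant is redacted, so the decision falls to the other sources. A real system would also need entity resolution beyond lowercasing, and a way to discount hosting providers and certificate authorities that appear as candidates.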
Index Terms
- Domain and Website Attribution beyond WHOIS