ABSTRACT
Much of the content and structure of the Web cannot be evaluated at scale because it is gated by user authentication. This limitation restricts researchers to examining only a superficial layer of a website: the landing page or public, search-indexable pages. Since it is infeasible to create individual accounts across thousands of websites, we examine the prevalence of Single Sign-On (SSO) on the Web to explore the feasibility of using a few accounts to authenticate to many sites. We find that 58% of the top 10K websites with logins are accessible with popular third-party SSO providers, such as Google, Facebook, and Apple, indicating that leveraging SSO offers a scalable way to access a large volume of user-gated content.
Index Terms
- The Prevalence of Single Sign-On on the Web: Towards the Next Generation of Web Content Measurement