
Expert Systems with Applications

Volume 39, Issue 15, 1 November 2012, Pages 11861-11869

Using automated individual white-list to protect web digital identities

https://doi.org/10.1016/j.eswa.2012.02.020

Abstract

The theft of web digital identities, e.g., through phishing and pharming, can result in severe losses to users and vendors, and may even hold users back from using online services, especially e-business services. In this paper, we propose an approach, referred to as automated individual white-list (AIWL), to protect users’ web digital identities. AIWL leverages a Naïve Bayesian classifier to automatically maintain an individual white-list for each user. If the user tries to submit his or her account information to a web site that does not match the white-list, AIWL alerts the user to the possible attack. Furthermore, AIWL keeps track of the features of login pages (e.g., IP addresses, document object model (DOM) paths of input widgets) in the individual white-list. By checking the legitimacy of these features, AIWL can efficiently defend users against hard attacks, especially pharming, and even dynamic pharming. Our experimental results and user studies show that AIWL is an efficient tool for protecting web digital identities.

Highlights

  • Protect web identities by using Automated Individual White-List.
  • White-list based anti-phishing.
  • White-list based anti-pharming.

Introduction

Web digital identities (Chen, Wu, Shen, & Ji, 2011), in the form of pairs of usernames and passwords, are a commonly used mechanism to authenticate individuals wishing to carry out transactions across the World Wide Web (Web for short). Applications that rely on such mechanisms include webmail, on-line banking, and social networking services (SNSs). It is thus not surprising that a variety of attacks aiming at stealing users’ web digital identities are perpetrated. Among these attacks, phishing is the most widespread. Phishing employs social engineering to trick a user into revealing his or her web digital identity to a fraudulent web site. The open source model of web pages makes it easy for attackers to create an exact replica of a legitimate site. Because such a replica can be created easily at little cost and looks very convincing to users, many such fraudulent web sites continuously appear (Fette et al., 2007; Zhang, Egelman, et al., 2007). As a result, phishing not only poses a severe threat to users’ web digital identities, but also erodes the fundamental premise of activities and business on the Web.

Users are not usually skillful enough to defend themselves against the theft of web digital identities, especially phishing attacks (Dodge et al., 2007; He et al., 2011; Sheng et al., 2007), because fraudulent web sites generally have appearances similar to the genuine ones. Moreover, the URLs of fraudulent web sites are forged so as to look very similar, and sometimes even identical, to those of the legitimate sites. It is thus difficult even for a careful user to detect fraudulent web sites.

Because of the potentially severe damage resulting from phishing attacks, anti-phishing techniques and tools represent a very active research area in web security. Many approaches and tools have been developed to address the problem of phishing (Aburrous et al., 2010; eBay Toolbar’s Account Guard, 2011; CallingID, 2011; Chen et al., 2009; Dodge et al., 2007; EarthLink Tool, 2011; GeoTrust, 2011; Google, 2011; He et al., 2011; NetCraft, 2011; SpoofGuard, 2011). There are four main topics in anti-phishing research (Zhang, Hong, & Cranor, 2007): understanding why people fall for phishing attacks; methods for educating people so that they do not fall for phishing attacks; user interfaces that help individuals make better judgments about trustworthy email and legitimate web sites; and automated tools for detecting phishing.

Among the four topics, designing automated tools for detecting phishing is today the focus of intense research. Approaches to the design of these tools can be categorized into four types: blacklist, white-list, heuristic, and hybrid.

  • Blacklist approach: In the blacklist approach, all web sites recognized as fraudulent are collected in a list, referred to as a blacklist. Since web sites are added to the blacklist after verification, users can be sure of the illegitimacy of the web sites that trigger warnings. However, it takes a great deal of resources and time to maintain the blacklist. Furthermore, since fraudulent sites continuously emerge, it is hard to keep the blacklist up to date.

  • White-list approach: Unlike the blacklist approach, the white-list approach maintains a list containing all legitimate web sites. Any web site that does not appear in the list is recognized as a potentially malicious web site. Thus the white-list approach requires listing all legitimate web sites in the world and keeping the white-list up to date.

    Current white-list tools usually use a global white-list in which all legitimate web sites are required to be included. But it is obviously impossible for the administrator of the white-list to cover the information of all legitimate web sites on the Internet. Thus, when such a tool raises an alert, users cannot be sure whether the current web site is illegitimate or is a legitimate one whose information has not yet been added to the white-list.

  • Heuristic approach: The heuristic approach, adopted by the majority of anti-phishing tools, leverages the characteristics of a web site to decide its legitimacy. In a heuristic approach, web sites that have high similarity to, or a tight relationship with, legitimate web sites but are not the original ones are recognized as fraudulent. The similarity or relationship of a web site with the legitimate ones is computed based on information collected about the legitimate web sites, referred to as a feature library (Chen et al., 2009).

  • Hybrid approach: A hybrid approach combines the above approaches, such as a global white-list with heuristic approaches (Xiang & Hong, 2009), or a heuristic approach with a blacklist (eBay Toolbar’s Account Guard, 2011), to recognize phishing pages.
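The two list-based approaches above reduce to a membership test, but with opposite semantics: a blacklist hit proves illegitimacy while a miss proves nothing, and a white-list hit proves legitimacy while a miss is only a suspicion, since the list may be incomplete. The following sketch illustrates this asymmetry; the host names and verdict strings are invented for illustration, not taken from any tool described here.

```python
# Illustrative lists; real tools populate these from verified reports.
BLACKLIST = {"evil-bank.example.com"}                  # verified fraudulent sites
WHITELIST = {"bank.example.com", "mail.example.com"}   # verified legitimate sites

def blacklist_verdict(host: str) -> str:
    # A blacklist can confirm illegitimacy, but an absent host is merely unknown.
    return "fraudulent" if host in BLACKLIST else "unknown"

def whitelist_verdict(host: str) -> str:
    # A white-list can confirm legitimacy; anything else is only *potentially*
    # malicious, because a global white-list can never be complete.
    return "legitimate" if host in WHITELIST else "potentially malicious"
```

This asymmetry is exactly why a global white-list produces alerts that users cannot interpret with confidence, and why the paper argues for a per-user list instead.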

Several experiments carried out by Zhang, Egelman, et al. (2007) have shown that the current automated tools are not effective in protecting users’ digital identities.

This paper, therefore, proposes an approach, referred to as Automated Individual White-List (AIWL), to protect users’ web digital identities. Although a global white-list approach is impractical, we argue that an individual white-list approach is practical, because it records only the legitimate web sites familiar to a user rather than all the legitimate web sites in the world. The study of Florencio and Herley (2007) and our experiments in Section 4.3 show that a user logs into only a limited and stable number of web sites. AIWL therefore takes advantage of this observation to build an individual white-list that efficiently defends users against the theft of web digital identities.

The main contributions of AIWL are as follows:

  • AIWL is a tool that employs an individual white-list, automatically maintained by a Naïve Bayesian classifier, to protect users’ web digital identities. In AIWL, any web site that does not match the individual white-list is classified as fraudulent, and AIWL alerts the user who is trying to submit his or her account information to such a web site. Compared with the traditional blacklist approach and the global white-list approach, this individual white-list approach is more practical.

  • AIWL offers an effective solution for defending users against pharming attacks, including dynamic pharming (Karlof, Tygar, Wagner, & Shankar, 2007). AIWL keeps track of the features of login pages (e.g., IP addresses, Document Object Model (DOM) paths of input widgets) in the individual white-list to detect these attacks. AIWL recognizes pharming by checking the IP addresses of web sites. In addition, AIWL effectively defends users against dynamic pharming by checking the DOM paths of the input widgets in the web page. Because a dynamic pharming attack embeds a legitimate login web page into the phishing site, the DOM paths are modified, and AIWL can thus detect the attack based on this modification.
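The DOM-path check above can be made concrete with a short sketch: a root-to-node path such as html/body/form/input identifies where each input widget sits in the page tree, and wrapping a legitimate login form inside an attacker's page changes these paths. The parser below, built on Python's standard html.parser, is our own minimal illustration of the idea, not the paper's implementation; the path format is an assumption.

```python
from html.parser import HTMLParser

class InputPathCollector(HTMLParser):
    """Collect root-to-node DOM paths of <input> widgets, e.g. 'html/body/form/input'."""
    VOID = {"input", "img", "br", "hr", "meta", "link"}  # tags with no closing tag

    def __init__(self):
        super().__init__()
        self.stack = []   # open ancestor tags
        self.paths = []   # collected paths of input widgets

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.paths.append("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

def input_dom_paths(html: str) -> list:
    collector = InputPathCollector()
    collector.feed(html)
    return collector.paths

def looks_like_dynamic_pharming(stored_paths, current_paths) -> bool:
    # Embedding a legitimate login page inside an attacker-controlled page
    # changes the root-to-input paths, so a mismatch flags a possible attack.
    return stored_paths != current_paths
```

For example, a form reached at html/body/form/input on the genuine page would appear at html/body/div/form/input once wrapped in an extra container, and the mismatch triggers the alert.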

The rest of the paper is organized as follows: Section 2 introduces some background knowledge needed for the discussion in the paper; Section 3 describes AIWL in detail; Section 4 reports experimental results and user studies concerning the efficiency of AIWL; Section 5 analyzes some important issues in AIWL and discusses its limitations; Section 6 introduces related work; and Section 7 outlines the conclusions and our future work.

Section snippets

Phishing and pharming

A phishing attack (APWG, 2011, Fette et al., 2007) usually involves sending a user a fake e-mail claiming to be from a legitimate web site, leading the user to a fraudulent web site which looks very similar to the legitimate one, and tricking the user into exposing his or her web digital identity. Once the user submits his or her account information to such a fraudulent web site, the attackers are able to impersonate the victim and steal victim’s personal information, such as financial

Construct an individual white-list

To construct an individual white-list for a user, the legitimate web sites familiar to the user should be identified. In AIWL, we assume that the web sites where an individual user has successfully accessed the anticipated services after submitting his or her account information are familiar legitimate web sites for the user. The reason is that the aim of malicious web sites is to steal users’ web digital identities. The malicious web sites would not provide the same services as the legitimate
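The construction and checking logic described above can be sketched as a small data model: an entry is added only when a login is judged successful, and later visits are checked against the stored URL, IP address, and DOM paths. The class name, field names, and warning strings below are our own illustrative choices, assuming the features the paper tracks (URL, IP, DOM paths of input widgets); this is a sketch, not AIWL's actual code.

```python
class IndividualWhiteList:
    """Per-user white-list mapping a login URL to its recorded features."""

    def __init__(self):
        self.entries = {}  # url -> {"ip": str, "dom_paths": [str, ...]}

    def record_login(self, url, ip, dom_paths, login_successful):
        # Only logins the classifier judged successful make a site "familiar".
        if login_successful:
            self.entries[url] = {"ip": ip, "dom_paths": list(dom_paths)}

    def check(self, url, ip, dom_paths):
        entry = self.entries.get(url)
        if entry is None:
            return "warn: site not in individual white-list"
        if entry["ip"] != ip:
            return "warn: IP mismatch (possible pharming)"
        if entry["dom_paths"] != list(dom_paths):
            return "warn: DOM path mismatch (possible dynamic pharming)"
        return "ok"
```

Note that an unknown site yields a warning rather than a definite verdict, matching the white-list semantics discussed in the introduction.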

Constructing the Naïve Bayesian classifier

The Naïve Bayesian classifier was constructed to enable AIWL to recognize a successful login process. We simulated login processes for 34 web sites. 18 of the 34 web sites are phishing web sites from PhishTank.com (PhishTank, 2011); the other 16 are legitimate web sites. For every legitimate web site, both the successful login process and the failed one were simulated. We simulated failed login processes by purposely using wrong passwords. Thus, there are altogether 50 login processes
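A Naïve Bayesian classifier over login features of this kind can be sketched as follows: count, per class, how often each binary feature takes each value, then classify by the highest smoothed log-posterior. The feature names and training samples below are invented for illustration (loosely echoing the paper's features); the paper's actual training set consisted of the 50 simulated login processes described above.

```python
import math
from collections import defaultdict

def train_nb(samples):
    """Train a Bernoulli Naive Bayes model.

    samples: list of (features, label), where features is a dict
    mapping a feature name to 0 or 1.
    """
    # counts[label][feature] = [count of value 0, count of value 1]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    priors = defaultdict(int)
    for feats, label in samples:
        priors[label] += 1
        for name, value in feats.items():
            counts[label][name][value] += 1
    return counts, priors, len(samples)

def classify(model, feats):
    counts, priors, n = model
    best_label, best_logp = None, -math.inf
    for label in priors:
        logp = math.log(priors[label] / n)
        for name, value in feats.items():
            n0, n1 = counts[label][name]
            # Laplace smoothing so unseen feature values never zero out a class.
            logp += math.log(((n1 if value else n0) + 1) / (n0 + n1 + 2))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label
```

With a few training samples where successful logins have the page in the browser history and no remaining password field, the classifier separates the two classes as expected.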

Efficiency in identifying login processes

AIWL uses Inbrowserhistory, HasNopasswordField, Numberoflink, HasNoUsername, and Opertime as the features to identify successful login processes. With these features, AIWL classifies login processes with a 100% true positive rate and a 0% false positive rate. That is, all login processes that AIWL recognizes as successful are actually successful, and all login processes that AIWL recognizes as failed are actually failed. This perfect result is

Related work

The problem of protecting from the theft attacks of web digital identities, especially phishing attacks, has been widely investigated from several different perspectives and several approaches exist.

First, the user’s own security awareness is a very important factor in ensuring a safe and secure e-business environment. Therefore, the Anti-Phishing Working Group and other financial organizations have gathered a large amount of materials giving suggestions and guidelines to users in order to

Conclusion and future work

This paper proposes an approach, called Automated Individual White-List (AIWL), to protect users’ web digital identities. AIWL is effective in detecting the theft of web digital identities by maintaining an automated individual white-list of all web sites familiar to the user, together with the LUI information of these web sites. AIWL uses a Naïve Bayesian classifier to automatically build the individual white-list for the user. As shown by our experiments, AIWL recognizes a successful

Acknowledgements

This paper is partly supported by “211-Project Sponsorship Projects for Young Professors at Fudan”, the 863 project (Grant No: 2011AA100701), and the Key Lab of Information Network Security, Ministry of Public Security (Grant No: C11601).

References (40)

  • Dhamija, R., & Tygar, J. D. (2005). The battle against phishing: Dynamic security skins. In Proceedings of the 2005...
  • Dodge, R. C., et al. (2007). Phishing for user security awareness. Computers & Security.
  • Domingos, P., & Pazzani, M. (1996). Beyond independence: Conditions for the optimality of the simple Bayesian...
  • Duda, R. O., et al. (1973). Bayes decision theory.
  • EarthLink Tool (2011)....
  • eBay (2007). Spoof email tutorial....
  • eBay Toolbar’s Account Guard (2011)....
  • Fette, I., Sadeh, N., & Tomasic, A. (2007). Learning to detect phishing emails. In Proceeding of international world...
  • Florencio, D., & Herley, C. (2005). Stopping a phishing attack, even when the victims ignore warnings, Microsoft...
  • Florencio, D., & Herley, C. (2007). A large-scale study of web password habits. In Proceeding of international world...