Website reorganization using an ant colony system

https://doi.org/10.1016/j.eswa.2010.04.083

Abstract

The growth of the Internet has led to many studies on adaptive websites based on web usage mining. Most studies focus on providing assistance to users rather than optimizing the website structure itself. A recent work pioneered the use of 0–1 programming models to optimally reorganize websites based on the cohesion among web pages obtained by web usage mining. The proposed models reduce the information overload and search depth for users surfing the web. A heuristic approach has also been proposed to reduce the required computation time. However, the heuristic approach, which involves two successive 0–1 programming models, still requires a very long computation time to find the optimal solution, especially when the website contains many hyperlinks. To resolve this efficiency problem, this study proposes an ant colony system to reorganize website structures. The proposed algorithm is tested extensively with numerical examples. Additionally, an empirical study with a real-world website is conducted to verify the algorithm's applicability.

Introduction

Designing and redesigning a complex website are increasingly challenging tasks for webmasters due to the pervasiveness of the Internet. Perkowitz and Etzioni (1997) noted that different visitors, or the same visitor visiting at different times, may have different goals. Furthermore, a site may be designed for a particular purpose, but be used in unanticipated ways in practice. Moreover, many sites outgrow their original design, accumulating links and pages in unanticipated places. Web pages may become obsolete as time passes and the need for information changes. Web visitors then have to navigate redundant pages to reach the information they require, prolonging the navigation time. Facilitating the browsing of a complex website is increasingly important in modern web management.

Perkowitz and Etzioni (1997) defined adaptive websites as sites that automatically improve their organization and presentation by learning from visitor access patterns. The browsing efficiency of a website can be enhanced in two basic directions, namely customization and transformation. Customization means aiding a user by providing additional meta-information or guidance. Although customization has many advocates, it has obvious limitations. First, additional index pages containing suggested URLs might result in information overload for new visitors, or become useless for returning visitors whose goals differ from those of previous visits. Second, web usage mining is computationally intensive for a large website, particularly one that is disorganized or has many obsolete pages. Additionally, obsolete pages increase the disk space required to store the pages, server logs, and user browsing histories. Recent investigations have considered transformation, which substantially reorganizes the links among web pages. Transformation involves partially or completely changing the website structure to ease navigation for a large set of users. The benefits of reorganizing the website structure include enhanced web usage mining efficiency and lower memory requirements for storing the pages, server logs and user browsing histories.

Both customization and transformation depend on mining the web server logs to discover access patterns. Web mining involves adopting data mining techniques to discover useful information hidden in web content and concealed user behaviors. Web mining is helpful in fields such as adaptive websites (Perkowitz & Etzioni, 1997), customized search engines (Madria, Bhowmick, Ng, & Lim, 1999) and customized e-commerce (Hollfelder, Oria, & Ozsu, 2000). Cooley, Mobasher, and Srivastava (1997) categorized web mining into web content mining and web usage mining. Web content mining extracts knowledge from web content, for instance to enhance search engines, while web usage mining extracts user browsing patterns from web server logs to discover user needs, and thereby provide assistance to users. Web usage mining involves three main tasks: preprocessing, pattern discovery and pattern analysis (Srivastava, Cooley, Deshpande, & Tan, 2000). Preprocessing abstracts the usage, content and structural information from various data sources for pattern discovery. In usage preprocessing, the click-stream of each identified user is split into sessions. A 30-min timeout is often adopted as the default method of splitting a user's click-stream into sessions. Pattern discovery adopts methods and algorithms developed in several fields, such as statistical analysis, association rules, clustering, classification and sequential patterns; the methods adopted depend on the purpose of the mining. Pattern analysis filters out uninteresting rules or patterns from the set discovered in the pattern discovery phase.
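For concreteness, the timeout-based session splitting described above can be sketched as follows; this is a minimal illustration, and the (timestamp, URL) click-stream representation is an assumption rather than the paper's data format:

```python
from datetime import timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the default timeout discussed above

def split_sessions(clicks):
    """Split one user's time-ordered click-stream into sessions.

    `clicks` is a list of (timestamp, url) pairs sorted by timestamp
    (an assumed representation). A new session starts whenever the gap
    between consecutive requests exceeds the 30-minute timeout.
    """
    sessions, current, last_time = [], [], None
    for ts, url in clicks:
        if last_time is not None and ts - last_time > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(url)
        last_time = ts
    if current:
        sessions.append(current)
    return sessions
```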

Adaptive websites have become an important application of web usage mining owing to the pioneering work of Perkowitz and Etzioni (1997). Most works on adaptive websites have focused on gathering related pages or users to provide assistance to users. Users are clustered according to similar browsing patterns, whereas pages are clustered according to related content. In both applications, permanent or dynamic HTML pages can be created to recommend related hyperlinks to the user based on the user's query or browsing history. However, clustering users appears to be more difficult than clustering web pages, possibly because user behaviors change frequently. Consequently, a user cannot be expected to have a consistent browsing behavior, and therefore cannot be categorized into a single group. Conversely, if a user can belong to many groups, and a user group is defined as the users visiting the same page group, then clustering users becomes identical to clustering web pages.

Usage-based web page clustering can be performed in two ways: gathering similar sessions or transactions, and gathering web pages with high co-occurrence frequency. In the first approach, a session is a group of web pages, similar to the basket items in association rule mining, and is therefore an object in the hyperspace to be clustered. Traditional distance-based clustering approaches can be used for session-based page clustering. Other clustering techniques, such as the adaptive resonance theory adopted by Kang and Cho (2001), the self-organizing map of Smith and Ng (2003) and the ant colony clustering of Abraham and Ramos (2003), have also been applied to this clustering problem. However, as indicated by Mobasher, Cooley, and Srivastava (1999), session clusters are not effective for capturing an aggregated view of frequent user access patterns. Each session cluster may contain thousands of user sessions involving hundreds of URLs. Furthermore, determining the similarity between session vectors is not a trivial task. Hence, association rule mining techniques, such as the Apriori algorithm (Agrawal & Srikant, 1994) and the Association Rule Hypergraph Partitioning technique (Hollfelder et al., 2000, Kang and Cho, 2001) used by Mobasher et al. (1999), have been adopted to identify page pairs with high co-occurrence frequencies. Perkowitz and Etzioni (2000) processed access logs into visits, and obtained a similarity matrix containing co-occurrence frequencies between pages to represent a graph. Clusters in the graph were identified, ranked and presented to the webmaster. Finally, an index page consisting of links to the pages in each selected cluster was created.
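As a minimal sketch of the second approach, the following counts session-level co-occurrence for every page pair, yielding the kind of similarity matrix Perkowitz and Etzioni (2000) build; the list-of-sessions input format is an assumption:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_frequencies(sessions):
    """Count, for every unordered page pair, the fraction of sessions in
    which both pages appear (order and repetition within a session are
    ignored, as in the session-as-basket view described above)."""
    counts = Counter()
    for session in sessions:
        pages = sorted(set(session))          # each page counted once per session
        for pair in combinations(pages, 2):   # all unordered page pairs
            counts[pair] += 1
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items()}
```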

Notably, none of the previous works proposed reorganizing websites. Only dynamic or static index pages were provided to facilitate visitor searches, while the original hyperlink structure remained untouched. However, websites require reorganization, especially when they are old, or when their content outgrows the original structure. Fu, Shih, Creado, and Ju (2002) provided approaches to modify hyperlinks locally to evolve a website along with its usage. Hyperlinks are rearranged, and pages are merged or removed when necessary. Nevertheless, their method reorganizes websites locally rather than globally, and accumulated local refinements do not necessarily lead to global optimality.

Website structure reorganization can be considered as a special graph optimization problem. Lin (2006) recently pioneered the approach of adopting 0–1 programming models to formulate and solve such problems based on the cohesion between web pages obtained from web usage mining. The proposed models simultaneously reduce the information overload and search depth for users surfing websites. Additionally, a heuristic approach involving two successive 0–1 programming models has been proposed to reduce the required computation time. However, due to the combinatorial nature of the models, the heuristic approach still requires a very long computation time to obtain the optimal solution, especially when the website contains many hyperlinks. To resolve this computational efficiency problem, this study presents an ant colony system to reorganize website structures. The next section introduces the website reorganization problem and Lin's method (Lin, 2006). Section 3 presents an overview of ant colony optimization. Section 4 presents the proposed ant colony system, and Section 5 uses simulated data to compare the ant colony system approach with the 0–1 programming approach. Section 6 addresses the proposed method's applicability to real-world problems by reorganizing a real-world website. Section 7 draws conclusions.

Section snippets

Models for website reorganization

Since the co-occurrence frequencies between web pages are available through web usage mining, they can be utilized as the basis for website reorganization. Clustering and association rule mining consider only which pages participate in a session, rather than the access sequence of these pages, and ignore the direction of access between pages. However, web pages might have logical navigation sequences that are essential to website reorganization, in which case the frequency of visiting page A
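The distinction can be made concrete with a short sketch that counts directed transitions, which the undirected co-occurrence count above would conflate; the session representation is again an assumption:

```python
from collections import Counter

def directed_transition_counts(sessions):
    """Count how often page A is accessed immediately before page B, so
    that freq(A -> B) and freq(B -> A) remain distinct, preserving the
    navigation sequences that undirected co-occurrence counts discard."""
    counts = Counter()
    for session in sessions:
        for a, b in zip(session, session[1:]):  # consecutive page pairs
            if a != b:                          # skip page reloads
                counts[(a, b)] += 1
    return counts
```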

A review of ant colony system

Colorni, Dorigo, and Maniezzo (1992) published the first ant colony algorithm before Dorigo, Maniezzo, and Colorni (1996) proposed the ant system to solve the well-known traveling salesman problem. Their method later evolved into the ant colony system (ACS) (Dorigo & Gambardella, 1997); a more detailed theoretical analysis can be found in Dorigo and Blum (2005). Ant colony optimization mimics the behavior of ants, which deposit pheromones along the paths they travel when foraging. The
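For reference, the following is a compact sketch of the ACS rules as commonly stated for the traveling salesman problem (Dorigo & Gambardella, 1997): the pseudo-random proportional transition rule, the local pheromone update, and the global update on the best-so-far tour. Parameter names and data structures are conventional choices, not those of this paper:

```python
import random

def acs_next_city(current, unvisited, tau, eta, beta=2.0, q0=0.9):
    """Pseudo-random proportional rule: with probability q0 take the best
    edge by tau * eta**beta (exploitation); otherwise sample an edge with
    probability proportional to tau * eta**beta (biased exploration)."""
    scores = {j: tau[current][j] * eta[current][j] ** beta for j in unvisited}
    if random.random() < q0:
        return max(scores, key=scores.get)
    r = random.random() * sum(scores.values())
    acc = 0.0
    for j, s in scores.items():
        acc += s
        if acc >= r:
            return j
    return j  # guard against floating-point round-off

def local_update(tau, i, j, rho=0.1, tau0=1e-4):
    """Local update applied as each ant crosses edge (i, j); it decays
    pheromone toward tau0 to encourage exploration of other edges."""
    tau[i][j] = (1 - rho) * tau[i][j] + rho * tau0

def global_update(tau, best_tour, best_length, alpha=0.1):
    """Global update: only the edges of the best-so-far tour are
    reinforced, in proportion to the tour quality 1 / best_length."""
    for i, j in zip(best_tour, best_tour[1:] + best_tour[:1]):
        tau[i][j] = (1 - alpha) * tau[i][j] + alpha / best_length
```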

The proposed ant colony system for website reorganization

An ant that is required to find a subgraph instead of a tree has a good chance of picking up edges whose endpoints are both already in the subgraph under construction, until the degree limit is met for all of its nodes and no edge is left as a candidate. The ant then stops at a subgraph containing only some of the nodes. Avoiding the formation of such a smaller subgraph would require considerable additional effort that might impede the algorithm. Conversely, since the second stage can be performed using a simple deterministic
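The stalling behavior can be illustrated with a short sketch in which a weight function stands in for the pheromone-times-heuristic score of the ACS; the data structures are assumptions made for illustration:

```python
import random

def grow_subgraph(edges, weight, degree_limit):
    """Illustrate how edge-based construction can stall: repeatedly pick
    a weighted random edge whose endpoints are both below the degree
    limit, and stop when no candidate remains, possibly before every
    node has been covered (the smaller-subgraph problem noted above)."""
    degree, chosen, candidates = {}, [], list(edges)
    while candidates:
        weights = [weight(e) for e in candidates]
        u, v = random.choices(candidates, weights=weights, k=1)[0]
        chosen.append((u, v))
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        candidates = [
            (a, b) for a, b in candidates
            if (a, b) != (u, v)
            and degree.get(a, 0) < degree_limit
            and degree.get(b, 0) < degree_limit
        ]
    return chosen  # may span only a subset of the nodes
```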

Computation experiments with artificial graphs

To compare the proposed ACS with the 0–1 programming approach, three complete graphs of 80, 100 and 200 nodes were generated, with edge frequencies drawn at random between 0 and 1. For the graphs of 80 and 100 nodes, the edges with the largest 400, 800 and 1600 frequencies were selected in turn to form three test graphs, before normalizing the frequencies of each graph. However, for the graph of 200 nodes, the edges with the largest 1600 and 3200 frequencies were selected to retain
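The test-graph construction can be reproduced with a short sketch; the uniform distribution and the sum-to-one normalization are assumptions where the text leaves the details open:

```python
import random
from itertools import combinations

def make_test_graph(n_nodes, n_edges, seed=None):
    """Build a complete graph with random edge frequencies in (0, 1),
    keep the n_edges largest, and normalize the kept frequencies
    (here: to sum to one, one plausible reading of the setup above)."""
    rng = random.Random(seed)
    all_edges = {e: rng.random() for e in combinations(range(n_nodes), 2)}
    top = sorted(all_edges.items(), key=lambda kv: kv[1], reverse=True)[:n_edges]
    total = sum(f for _, f in top)
    return {edge: f / total for edge, f in top}

# e.g. the 100-node, 800-edge test graph from the experiments above
graph = make_test_graph(100, 800, seed=42)
```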

Experiments with a real-world website

To further investigate the applicability of the proposed ACS to real-world websites, this study attempted to reorganize the website of a medium-sized university in Taiwan. Only files with extensions .shtml and .aspx were regarded as graph nodes. According to this criterion, the website comprised 146 pages and 5361 hyperlinks, and had a maximum level of 3. The average number of hyperlinks per page was 36.8. Users were identified by IP address, according to the method of Cooley, Mobasher, and
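As a rough illustration of this preprocessing, the sketch below keeps only pages ending in .shtml or .aspx and groups requests by client IP; the Common Log Format and its field positions are assumptions, since the text specifies only the extension criterion and IP-based user identification:

```python
from collections import defaultdict

PAGE_EXTENSIONS = ('.shtml', '.aspx')  # the node criterion described above

def group_requests_by_ip(log_lines):
    """Group page requests by client IP, the user-identification
    heuristic mentioned above. Assumes Common Log Format, where the
    first field is the client IP and the seventh is the request URL."""
    by_ip = defaultdict(list)
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed lines
        ip, url = fields[0], fields[6]
        if url.lower().endswith(PAGE_EXTENSIONS):
            by_ip[ip].append(url)
    return by_ip
```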

Conclusions

Lin (2006) has shown that the website reorganization problem can be formulated as a graph optimization problem, and solved with a 0–1 programming approach. However, the 0–1 programming approach is applicable only for medium-size problems. Many real-world website reorganization problems have hundreds of nodes and thousands of links. The performance of the 0–1 programming approach quickly deteriorates as the problem size increases due to the combinatorial nature of the problem. This study


Cited by (24)

  • Improving website structure through reducing information overload

    2018, Decision Support Systems
    Citation Excerpt:

    Thus, it is important to consider hyperlinks as a design element during website maintenance. There is an abundant literature on improving website navigability, and the methods can be classified into two categories in general: (1) restitution of a completely new web structure [13–17], (2) introducing extra links to the current structure [18]. Nevertheless, such methods either cause substantial disorientation to existing users because of the radical changes to the site structure or complicate the existing structure because of the insertion of many new links, causing information overload to users and difficulty in locating appropriate links during navigation.

  • Context preserving navigation redesign under Markovian assumption for responsive websites

    2017, Electronic Commerce Research and Applications
    Citation Excerpt:

    This is called the average click ratio (Gupta et al., 2007). As a solution to this problem, the original website structure is viewed as a graph and replaced with another graph with a different link structure (Lin and Tseng, 2010). In this graph, the nodes are the webpages connected by the links which appear on each webpage.

  • Optimization of multi-criteria website structure based on enhanced tabu search and web usage mining

    2013, Applied Mathematics and Computation
    Citation Excerpt:

    Owing to deliberate control between diversification and intensification searches, metaheuristic techniques usually are unlikely to get trapped by local optima and are able to provide good quality solutions within reasonable time. With our literature survey, we found that many previous works formulate WSO by a quadratic assignment problem (QAP) model [12,13,16,18,20]. The QAP searches for the minimum distance-flow product cost by assigning facilities to individual locations.

  • Predicting web user behavior using learning-based ant colony optimization

    2012, Engineering Applications of Artificial Intelligence
    Citation Excerpt:

    Each one has been used depending on the approaches that have been adopted. Soft computing methodologies have gained a considerable amount of relevance in relation with WUM, due to their flexible implementation and results in the field of recommendation-based systems and adaptive web sites (Lin and Tseng, 2010). Within these fields, special attention has been concentrated on bio-inspired metaheuristics, which are commonly ruled by the concept of swarm intelligence, the ability of a group of agents to perform complex tasks through a collaborative process.

  • A Novel Transformation-Based E-Commerce Website Structure Optimization Model

    2024, IETE Technical Review (Institution of Electronics and Telecommunication Engineers, India)
  • ACO with heuristic desirability for web page positioning problem

    2021, Studies in Computational Intelligence