Website reorganization using an ant colony system

https://doi.org/10.1016/j.eswa.2010.04.083

Abstract

The growth of the Internet has led to many studies on adaptive websites based on web usage mining. Most studies focus on providing assistance to users rather than optimizing the website structure itself. A recent work pioneered the use of 0–1 programming models to optimally reorganize websites based on the cohesion among web pages obtained by web usage mining. The proposed models reduce the information overload and search depth for users surfing the web. A heuristic approach has also been proposed to reduce the required computation time. However, the heuristic approach, which involves two successive 0–1 programming models, still requires a very long computation time to find the optimal solution, especially when the website contains many hyperlinks. To resolve this efficiency problem, this study proposes an ant colony system to reorganize website structures. The proposed algorithm is tested extensively with numerical examples. Additionally, an empirical study with a real-world website is conducted to verify the algorithm's applicability.

Introduction

Designing and redesigning a complex website are increasingly challenging tasks for webmasters due to the pervasiveness of the Internet. Perkowitz and Etzioni (1997) noted that different visitors, or the same visitor visiting at different times, may have different goals. Furthermore, a site may be designed for a particular purpose, but be used in unanticipated ways in practice. Moreover, many sites outgrow their original design, accumulating links and pages in unanticipated places. Web pages may become obsolete as time passes and the need for information changes. Web visitors then have to navigate redundant pages to reach the information they require, prolonging the navigation time. Facilitating the browsing of a complex website is increasingly important in modern web management.

Perkowitz and Etzioni (1997) defined adaptive websites as sites that automatically improve their organization and presentation by learning from visitor access patterns. The browsing efficiency of a website can be enhanced in two basic directions, namely customization and transformation. Customization means aiding a user by providing additional meta-information or guidance. Although customization has many advocates, it has obvious limitations. First, additional index pages containing suggested URLs might result in information overload for new visitors, or become useless for returning visitors whose goals differ from those of previous visits. Second, web usage mining is computationally intensive for a large website, particularly one that is disorganized or has many obsolete pages. Additionally, obsolete pages increase the disk space required to store the pages, server logs, and user browsing histories. Recent investigations have considered transformation, which substantially reorganizes the links among web pages. Transformation involves partially or completely changing the website structure to ease navigation for a large set of users. The benefits of reorganizing the website structure include enhanced web usage mining efficiency and lower memory requirements for storing the pages, server logs and user browsing histories.

Both customization and transformation depend on mining the web server logs to discover access patterns. Web mining involves adopting data mining techniques to discover useful information hidden in web content and concealed user behaviors. Web mining is helpful in fields such as adaptive websites (Perkowitz & Etzioni, 1997), customized search engines (Madria, Bhowmick, Ng, & Lim, 1999) and customized e-commerce (Hollfelder, Oria, & Ozsu, 2000). Cooley, Mobasher, and Srivastava (1997) categorized web mining into web content mining and web usage mining. Web content mining extracts knowledge from web content, for instance to enhance search engines, while web usage mining extracts user browsing patterns from web server logs to discover user needs, and thereby provide assistance to users. Web usage mining involves three main tasks: preprocessing, pattern discovery and pattern analysis (Srivastava, Cooley, Deshpande, & Tan, 2000). Preprocessing abstracts the usage, content and structural information from various data sources for pattern discovery. In usage preprocessing, the click-stream of each identified user is split into sessions. A 30-min timeout is often adopted as the default method of splitting a user's click-stream into sessions. Pattern discovery adopts methods and algorithms developed in several fields, such as statistical analysis, association rules, clustering, classification and sequential patterns; the methods adopted depend on the purpose of the mining. Pattern analysis filters out uninteresting rules or patterns from the set discovered in the pattern discovery phase.
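For concreteness, the timeout-based session splitting described above can be sketched as follows; this is a minimal illustration, and the (timestamp, URL) click-stream representation is an assumption rather than the paper's data format:

```python
from datetime import timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the default timeout discussed above

def split_sessions(clicks):
    """Split one user's time-ordered click-stream into sessions.

    `clicks` is a list of (timestamp, url) pairs sorted by timestamp
    (an assumed representation). A new session starts whenever the gap
    between consecutive requests exceeds the 30-minute timeout.
    """
    sessions, current, last_time = [], [], None
    for ts, url in clicks:
        if last_time is not None and ts - last_time > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(url)
        last_time = ts
    if current:
        sessions.append(current)
    return sessions
```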

Adaptive websites have become an important application of web usage mining owing to the pioneering work of Perkowitz and Etzioni (1997). Most works on adaptive websites have focused on gathering related pages or users to provide assistance to users. Users are clustered according to similar browsing patterns, whereas pages are clustered according to related content. In both applications, permanent or dynamic HTML pages can be created to recommend related hyperlinks to the user based on the user's query or browsing history. However, clustering users appears to be more difficult than clustering web pages, possibly because user behaviors change frequently. Consequently, a user cannot be expected to have a consistent browsing behavior, and therefore cannot be categorized into a single group. Conversely, if a user can belong to many groups, and a user group is defined as the users visiting the same page group, then clustering users becomes identical to clustering web pages.

Usage-based web page clustering can be performed in two ways: gathering similar sessions or transactions, and gathering web pages with high co-occurrence frequency. In the first approach, a session is a group of web pages, similar to the basket items in association rule mining, and is therefore an object in the hyperspace to be clustered. Traditional distance-based clustering approaches can be used for session-based page clustering. Other clustering techniques, such as the adaptive resonance theory adopted by Kang and Cho (2001), the self-organizing map of Smith and Ng (2003) and the ant colony clustering of Abraham and Ramos (2003), have also been applied to this clustering problem. However, as indicated by Mobasher, Cooley, and Srivastava (1999), session clusters are not effective for capturing an aggregated view of frequent user access patterns. Each session cluster may contain thousands of user sessions involving hundreds of URLs. Furthermore, determining the similarity between session vectors is not a trivial task. Hence, association rule mining techniques, such as the Apriori algorithm (Agrawal & Srikant, 1994) and the Association Rule Hypergraph Partitioning technique (Hollfelder et al., 2000, Kang and Cho, 2001) used by Mobasher et al. (1999), have been adopted to identify page pairs with high co-occurrence frequencies. Perkowitz and Etzioni (2000) processed access logs into visits, and obtained a similarity matrix containing co-occurrence frequencies between pages to represent a graph. Clusters in the graph were identified, ranked and presented to the webmaster. Finally, an index page consisting of links to the pages in each selected cluster was created.
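As a minimal sketch of the second approach, the following counts session-level co-occurrence for every page pair, yielding the kind of similarity matrix Perkowitz and Etzioni (2000) build; the list-of-sessions input format is an assumption:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_frequencies(sessions):
    """Count, for every unordered page pair, the fraction of sessions in
    which both pages appear (order and repetition within a session are
    ignored, as in the session-as-basket view described above)."""
    counts = Counter()
    for session in sessions:
        pages = sorted(set(session))          # each page counted once per session
        for pair in combinations(pages, 2):   # all unordered page pairs
            counts[pair] += 1
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items()}
```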

Notably, none of the previous works proposed reorganizing websites. Only dynamic or static index pages were provided to facilitate visitor searches, while the original hyperlink structure remained untouched. However, websites require reorganization, especially when they are old, or when their content outgrows the original structure. Fu, Shih, Creado, and Ju (2002) provided approaches to modify hyperlinks locally to evolve a website along with its usage. Hyperlinks are rearranged, and pages are merged or removed when necessary. Nevertheless, their method reorganizes websites locally rather than globally, and accumulated local refinements do not necessarily lead to global optimality.

Website structure reorganization can be considered as a special graph optimization problem. Lin (2006) recently pioneered the approach of adopting 0–1 programming models to formulate and solve such problems based on the cohesion between web pages obtained from web usage mining. The proposed models simultaneously reduce the information overload and search depth for users surfing websites. Additionally, a heuristic approach involving two successive 0–1 programming models has been proposed to reduce the required computation time. However, due to the combinatorial nature of the models, the heuristic approach still requires a very long computation time to obtain the optimal solution, especially when the website contains many hyperlinks. To resolve this computational efficiency problem, this study presents an ant colony system to reorganize website structures. The next section introduces the website reorganization problem and Lin's method (Lin, 2006). Section 3 presents an overview of ant colony optimization. Section 4 presents the proposed ant colony system, and Section 5 uses simulated data to compare the ant colony system approach with the 0–1 programming approach. Section 6 addresses the proposed method's applicability to real-world problems by reorganizing a real-world website. Section 7 draws conclusions.

Section snippets

Models for website reorganization

Since the co-occurrence frequencies between web pages are available through web usage mining, they can be utilized as the basis for website reorganization. Clustering and association rule mining consider only which pages participate in a session, rather than the access sequence of these pages, and ignore the direction of access between pages. However, web pages might have logical navigation sequences that are essential to website reorganization, in which case the frequency of visiting page A
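The distinction can be made concrete with a short sketch that counts directed transitions, which the undirected co-occurrence count above would conflate; the session representation is again an assumption:

```python
from collections import Counter

def directed_transition_counts(sessions):
    """Count how often page A is accessed immediately before page B, so
    that freq(A -> B) and freq(B -> A) remain distinct, preserving the
    navigation sequences that undirected co-occurrence counts discard."""
    counts = Counter()
    for session in sessions:
        for a, b in zip(session, session[1:]):  # consecutive page pairs
            if a != b:                          # skip page reloads
                counts[(a, b)] += 1
    return counts
```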

A review of ant colony system

Colorni, Dorigo, and Maniezzo (1992) published the first ant colony algorithm before Dorigo, Maniezzo, and Colorni (1996) proposed the ant system to solve the well-known traveling salesman problem. Their method later evolved into the ant colony system (ACS) (Dorigo & Gambardella, 1997); a more detailed theoretical analysis can be found in Dorigo and Blum (2005). Ant colony optimization mimics the behavior of ants, which deposit pheromones along the paths they travel when foraging. The
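For reference, the following is a compact sketch of the ACS rules as commonly stated for the traveling salesman problem (Dorigo & Gambardella, 1997): the pseudo-random proportional transition rule, the local pheromone update, and the global update on the best-so-far tour. Parameter names and data structures are conventional choices, not those of this paper:

```python
import random

def acs_next_city(current, unvisited, tau, eta, beta=2.0, q0=0.9):
    """Pseudo-random proportional rule: with probability q0 take the best
    edge by tau * eta**beta (exploitation); otherwise sample an edge with
    probability proportional to tau * eta**beta (biased exploration)."""
    scores = {j: tau[current][j] * eta[current][j] ** beta for j in unvisited}
    if random.random() < q0:
        return max(scores, key=scores.get)
    r = random.random() * sum(scores.values())
    acc = 0.0
    for j, s in scores.items():
        acc += s
        if acc >= r:
            return j
    return j  # guard against floating-point round-off

def local_update(tau, i, j, rho=0.1, tau0=1e-4):
    """Local update applied as each ant crosses edge (i, j); it decays
    pheromone toward tau0 to encourage exploration of other edges."""
    tau[i][j] = (1 - rho) * tau[i][j] + rho * tau0

def global_update(tau, best_tour, best_length, alpha=0.1):
    """Global update: only the edges of the best-so-far tour are
    reinforced, in proportion to the tour quality 1 / best_length."""
    for i, j in zip(best_tour, best_tour[1:] + best_tour[:1]):
        tau[i][j] = (1 - alpha) * tau[i][j] + alpha / best_length
```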

The proposed ant colony system for website reorganization

An ant that is required to find a subgraph instead of a tree has a good chance of picking up edges whose endpoints are both already in the subgraph under construction, until the degree limit is met for all of its nodes and no edge is left as a candidate. The ant then stops at a subgraph containing only some of the nodes. Avoiding the formation of such a smaller subgraph would require considerable additional effort that might impede the algorithm. Conversely, since the second stage can be performed using a simple deterministic
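The stalling behavior can be illustrated with a short sketch in which a weight function stands in for the pheromone-times-heuristic score of the ACS; the data structures are assumptions made for illustration:

```python
import random

def grow_subgraph(edges, weight, degree_limit):
    """Illustrate how edge-based construction can stall: repeatedly pick
    a weighted random edge whose endpoints are both below the degree
    limit, and stop when no candidate remains, possibly before every
    node has been covered (the smaller-subgraph problem noted above)."""
    degree, chosen, candidates = {}, [], list(edges)
    while candidates:
        weights = [weight(e) for e in candidates]
        u, v = random.choices(candidates, weights=weights, k=1)[0]
        chosen.append((u, v))
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        candidates = [
            (a, b) for a, b in candidates
            if (a, b) != (u, v)
            and degree.get(a, 0) < degree_limit
            and degree.get(b, 0) < degree_limit
        ]
    return chosen  # may span only a subset of the nodes
```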

Computation experiments with artificial graphs

To compare the proposed ACS with the 0–1 programming approach, three complete graphs of 80, 100 and 200 nodes were generated, with edge frequencies drawn at random between 0 and 1. For the graphs of 80 and 100 nodes, the edges with the largest 400, 800 and 1600 frequencies were selected in turn to form three test graphs, before normalizing the frequencies of each graph. However, for the graph of 200 nodes, the edges with the largest 1600 and 3200 frequencies were selected to retain
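The test-graph construction can be reproduced with a short sketch; the uniform distribution and the sum-to-one normalization are assumptions where the text leaves the details open:

```python
import random
from itertools import combinations

def make_test_graph(n_nodes, n_edges, seed=None):
    """Build a complete graph with random edge frequencies in (0, 1),
    keep the n_edges largest, and normalize the kept frequencies
    (here: to sum to one, one plausible reading of the setup above)."""
    rng = random.Random(seed)
    all_edges = {e: rng.random() for e in combinations(range(n_nodes), 2)}
    top = sorted(all_edges.items(), key=lambda kv: kv[1], reverse=True)[:n_edges]
    total = sum(f for _, f in top)
    return {edge: f / total for edge, f in top}

# e.g. the 100-node, 800-edge test graph from the experiments above
graph = make_test_graph(100, 800, seed=42)
```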

Experiments with a real-world website

To further investigate the applicability of the proposed ACS to real-world websites, this study attempted to reorganize the website of a medium-sized university in Taiwan. Only files with extensions .shtml and .aspx were regarded as graph nodes. According to this criterion, the website comprised 146 pages and 5361 hyperlinks, and had a maximum level of 3. The average number of hyperlinks per page was 36.8. Users were identified by IP address, according to the method of Cooley, Mobasher, and
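As a rough illustration of this preprocessing, the sketch below keeps only pages ending in .shtml or .aspx and groups requests by client IP; the Common Log Format and its field positions are assumptions, since the text specifies only the extension criterion and IP-based user identification:

```python
from collections import defaultdict

PAGE_EXTENSIONS = ('.shtml', '.aspx')  # the node criterion described above

def group_requests_by_ip(log_lines):
    """Group page requests by client IP, the user-identification
    heuristic mentioned above. Assumes Common Log Format, where the
    first field is the client IP and the seventh is the request URL."""
    by_ip = defaultdict(list)
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed lines
        ip, url = fields[0], fields[6]
        if url.lower().endswith(PAGE_EXTENSIONS):
            by_ip[ip].append(url)
    return by_ip
```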

Conclusions

Lin (2006) has shown that the website reorganization problem can be formulated as a graph optimization problem, and solved with a 0–1 programming approach. However, the 0–1 programming approach is applicable only for medium-size problems. Many real-world website reorganization problems have hundreds of nodes and thousands of links. The performance of the 0–1 programming approach quickly deteriorates as the problem size increases due to the combinatorial nature of the problem. This study


Cited by (24)

  • Improving website structure through reducing information overload

    2018, Decision Support Systems
    Citation Excerpt:

    Thus, it is important to consider hyperlinks as a design element during website maintenance. There is an abundant literature on improving website navigability, and the methods can be classified into two categories in general: (1) restitution of a completely new web structure [13–17], (2) introducing extra links to the current structure [18]. Nevertheless, such methods either cause substantial disorientation to existing users because of the radical changes to the site structure or complicate the existing structure because of the insertion of many new links, causing information overload to users and difficulty in locating appropriate links during navigation.

  • Context preserving navigation redesign under Markovian assumption for responsive websites

    2017, Electronic Commerce Research and Applications
    Citation Excerpt:

    This is called the average click ratio (Gupta et al., 2007). As a solution to this problem, the original website structure is viewed as a graph and replaced with another graph with a different link structure (Lin and Tseng, 2010). In this graph, the nodes are the webpages connected by the links which appear on each webpage.

  • Optimization of multi-criteria website structure based on enhanced tabu search and web usage mining

    2013, Applied Mathematics and Computation
    Citation Excerpt:

    Owing to deliberate control between diversification and intensification searches, metaheuristic techniques usually are unlikely to get trapped by local optima and are able to provide good quality solutions within reasonable time. With our literature survey, we found that many previous works formulate WSO by a quadratic assignment problem (QAP) model [12,13,16,18,20]. The QAP searches for the minimum distance-flow product cost by assigning facilities to individual locations.

  • Predicting web user behavior using learning-based ant colony optimization

    2012, Engineering Applications of Artificial Intelligence
    Citation Excerpt:

    Each one has been used depending on the approaches that have been adopted. Soft computing methodologies have gained a considerable amount of relevance in relation with WUM, due to their flexible implementation and results in the field of recommendation-based systems and adaptive web sites (Lin and Tseng, 2010). Within these fields, special attention has been concentrated on bio-inspired metaheuristics, which are commonly ruled by the concept of swarm intelligence, the ability of a group of agents to perform complex tasks through a collaborative process.

  • A Novel Transformation-Based E-Commerce Website Structure Optimization Model

    2024, IETE Technical Review (Institution of Electronics and Telecommunication Engineers, India)
  • ACO with heuristic desirability for web page positioning problem

    2021, Studies in Computational Intelligence