A new intelligent algorithm to create a profile for user based on web interactions

Article history: Received December 2, 2012; received in revised format March 2, 2013; accepted March 5, 2013; available online March 9, 2013.

This paper presents a method to classify web users' navigation patterns automatically. The proposed model classifies a user's navigation patterns and predicts his/her upcoming requirements. To create a user's profile, a new method is introduced that records the user's active settings and measures the user's similarity to neighboring users. The proposed model is capable of creating the profile implicitly, and it updates the profile as changes occur. In effect, we try to improve the performance of the recommender engine using the user's navigation patterns and clustering. The method is able to present the results of the recommender engine according to the user's requirements and interests. In addition, it can help customize websites more efficiently.

© 2013 Growing Science Ltd. All rights reserved.


Introduction
Web-usage mining has become an area of extensive investigation (Kosala & Blockeel, 2000; Berendt et al., 2001). However, the quality of Web-usage mining outcomes depends on the appropriate preparation of the input datasets. More specifically, errors in building the sessions and incomplete tracking of users' activities in a site can result in invalid patterns and baseless conclusions. There are different studies in this regard; Spiliopoulou et al. (2003), for instance, assessed the performance of heuristics employed to rebuild sessions from server log data. They presented a set of performance figures that are sensitive to two kinds of reconstruction errors and appropriate for various knowledge discovery (KDD) applications.
According to Phatak and Mulvaney (2002), web access from mobile devices presents its own unique challenges because of resource constraints on mobile devices such as power, form factor, and bandwidth. They recommend trying to predict a user's actions instead of reacting to a user's requests. Adomavicius and Tuzhilin (2005) performed a comprehensive review of the field of recommender systems and described the current generation of recommendation techniques, which are normally grouped into three main categories: content-based, collaborative, and hybrid. They also described the limitations of current recommendation techniques and discussed possible extensions that could improve recommendation capabilities and make recommender systems useful for an even wider range of applications. These extensions include a better understanding of users' requirements, the incorporation of contextual data into the recommendation process, support for multicriteria ratings, and the provision of more flexible and less intrusive kinds of recommendations.
Nicholas et al. (2006), in another survey, reviewed related work on deep log analysis (DLA) that reports on the information-seeking methods of users. Nicholas et al. (2005) demonstrated a powerful new DLA technique for mapping and evaluating information-seeking behavior. Nicholas et al. (2006) used DLA techniques to show what usage data can reveal about the information-seeking behavior of virtual scholars, that is, academics and researchers. Mobasher et al. (1999; 2001) presented scalable techniques for Web personalization based on association rule discovery from usage data. Through detailed experimental evaluation on real usage data, they showed that the method could reach better recommendation effectiveness. Lin et al. (2000) investigated association rule mining as an underlying method for collaborative recommender systems and reported that such a method was inefficient for collaborative recommendation, since it includes various rules that are not relevant to users. Cooley et al. (1999) presented data preparation techniques to identify unique users and user sessions. Breeding (2005) described different methods to streamline and optimize how a Web site works in order to improve both its visibility and its usability. The study also explained how to analyze logs and other system data to measure the effectiveness of the Web site design and search engine. Finally, Forsati and Meybodi (2009) presented an algorithm based on page structure and users' usage information for recommending web pages.

The preliminary requirements and definitions
In this section, we present the necessary assumptions for web mining and for customization based on web usage mining. We also present the clustering method, the neural network, and the related concepts.

Web mining
Web mining applies data mining techniques to find relevant data among web documents and services. Web structure mining is one branch, which analyzes the nodes and structural relationships in a website based on graph models. Web content mining is another, which deals with discovering useful information from text, images, audio, and visual data on the web. A third branch is web usage mining, a process that concentrates on techniques capable of predicting a user's behavior while interacting with the web. The main functions in web usage mining are to retrieve comprehensive data from profile storage and from web servers based on the user's browsing. This process is itself divided into three parts: pre-processing, pattern discovery, and pattern analysis.

Clustering
The input collection $X = \{x_1, x_2, \ldots, x_n\}$ contains n objects, each of which is a feature vector of equal length s. These objects must be clustered into K groups, $C = \{C_1, C_2, \ldots, C_K\}$, which do not overlap with each other. In this paper, the k-means algorithm, a popular technique for many clustering applications, is used to cluster similar users.
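The clustering step can be illustrated with a minimal k-means sketch. The paper does not give implementation details, so the Euclidean distance, the seeding of centroids with the first K objects, the fixed iteration count, and all names below are assumptions made for illustration only.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on equal-length vectors.

    Assumptions of this sketch: Euclidean distance, and the first k
    points are used as the initial centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each object joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical data: n = 4 objects with s = 2 properties each.
X = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9)]
labels, centroids = kmeans(X, k=2)
print(labels)  # [0, 1, 0, 1]
```

Each object receives exactly one label, so the resulting groups do not overlap, as the definition above requires.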

Neural network
An artificial neural network is a data processing model inspired by biological neural networks, and it acts like a brain. The system includes a large number of processing elements called neurons, which act in concert to solve a problem. The distinctive advantage of these networks is their high capability combined with simplicity.

The proposed method
Data retrieval is often accompanied by errors, since the profiles available on a server, which are saved sequentially, do not belong to a single user but are available to various components. In addition, various pieces of search information are kept for each user, and these data must be pre-processed and prepared before use. Processing web logs involves data cleaning, user identification, and user session identification. After preparing the data and identifying users and their sessions, we build the session vector as follows. A user's session can be described as a vector of page view weights over a specific period, since a session includes all activities performed by the user from arrival at the site until departure. A threshold is applied to the session duration: if the duration exceeds a predefined level, this is taken as a sign of another access session by the user. Based on experiments, a thirty-minute threshold is suggested for session duration (Berendt et al., 2001; Spiliopoulou et al., 2003). User sessions are expressed in the following way. Let P be the collection of all pages accessible by site users, with $P = \{p_1, p_2, \ldots, p_M\}$, where each $p_i$ is identified by a particular URL. The collection S is a set of user access sessions, where each $S_i$ is a subset of P.

$S = \{S_1, S_2, \ldots, S_n\}$
Each session is an M-dimensional vector as follows,

$S_i = \{W(p_1, S_i), W(p_2, S_i), \ldots, W(p_M, S_i)\}$
where $W(p_j, S_i)$ is the weight of page $p_j$ in session $S_i$, and each page weight reflects the degree of the user's interest in that page. To determine this weight, two factors must be considered: the frequency and the duration of the page (Eq. 2). The relative importance of a page is calculated by combining these two criteria. In this system, we use the harmonic mean of frequency and duration to express the user's interest in a web page in a session:

$W(p_j, S_i) = \frac{2 \cdot \mathrm{Frequency}(p_j) \cdot \mathrm{Duration}(p_j)}{\mathrm{Frequency}(p_j) + \mathrm{Duration}(p_j)}$ (3)

We now have a vector for every session, where $W_i$ is the weight of page i in a particular session. Since the dimension M should not exceed a pre-specified number, pages whose support is too high or too low should be discarded.
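The session-building and weighting steps above can be sketched as follows. The source does not state how frequency and duration are normalized, so this sketch assumes frequency is a page's share of the visits in the session and duration is a page's dwell time relative to the longest dwell time in the session. The hit format `(timestamp, page, seconds_on_page)` and all function names are hypothetical.

```python
from collections import defaultdict

SESSION_LIMIT = 30 * 60  # thirty-minute session duration threshold, in seconds

def sessionize(hits, max_duration=SESSION_LIMIT):
    """Split one user's time-ordered hits into sessions.

    A new session starts when a hit arrives more than max_duration
    seconds after the first hit of the current session.
    """
    sessions, current = [], []
    for ts, page, seconds in hits:
        if current and ts - current[0][0] > max_duration:
            sessions.append(current)
            current = []
        current.append((ts, page, seconds))
    if current:
        sessions.append(current)
    return sessions

def page_weights(session):
    """Harmonic mean of (assumed) normalized frequency and duration per page."""
    visits, dwell = defaultdict(int), defaultdict(float)
    for _, page, seconds in session:
        visits[page] += 1
        dwell[page] += seconds
    total_visits = sum(visits.values())
    max_dwell = max(dwell.values())
    weights = {}
    for page in visits:
        f = visits[page] / total_visits                      # assumed normalization
        d = dwell[page] / max_dwell if max_dwell else 0.0    # assumed normalization
        weights[page] = 2 * f * d / (f + d)                  # f > 0, so f + d > 0
    return weights

# Hypothetical log for one user: (timestamp, page, seconds on page).
hits = [(0, "/home", 30), (40, "/news", 120), (200, "/home", 30), (4000, "/faq", 60)]
sessions = sessionize(hits)
weights = page_weights(sessions[0])
print(len(sessions), weights)
```

In this example the hit at 4000 seconds falls outside the thirty-minute window and opens a second session. In the first session, "/home" is visited more often but "/news" is viewed longer, and the harmonic mean balances the two factors.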

Creating user's profile
Each user $u_i$ has k sessions, $S_1, S_2, \ldots, S_k$, which form the collection of sessions of user $u_i$. The average vector of these sessions, $S_{u_i}$, is taken as the measure of the interest of user $u_i$. The weight of each page in the average vector is the average weight of that page over all of the user's sessions. To achieve more efficient results, the user's current session can be used in addition to the history of his/her behavior.
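As a small illustration of the averaging described above, the sketch below builds a profile vector from a user's session vectors. The function name and the sample data are hypothetical, and the sketch assumes every session vector has the same length M.

```python
def build_profile(session_vectors):
    """Average a user's session vectors page by page."""
    k = len(session_vectors)
    return [sum(column) / k for column in zip(*session_vectors)]

# Hypothetical user with k = 2 sessions over M = 3 pages.
sessions_u = [[0.5, 0.0, 1.0], [0.25, 0.5, 0.0]]
profile_u = build_profile(sessions_u)
print(profile_u)  # [0.375, 0.25, 0.5]
```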

Clustering profiles
Next, the average session vectors need to be compared with each other and clustered based on their similarities. In this algorithm, the number of clusters must be supplied as an input parameter, and cosine distance is used to calculate the distance between two objects. The result is a collection of clusters. As a representative for each cluster, we take the mean vector $m_c$ of the cluster, which shows the user navigation pattern of that cluster over a particular collection of accessible web pages. Finally, as the result of profile clustering, there is a collection of navigation patterns, where each pattern $p_i$ is a subset of the web page collection P.
After the neural network is trained, when a new user enters the site, the current user session must be prepared in a form that can be fed to the neural network. It must then be determined which navigation pattern the profile of the current session belongs to. The profile of the current session is given as input to the neural network, and the network determines the appropriate cluster for the session. Once the cluster is determined, the pages of that cluster that have not been visited in the current session have high potential to be visited next, and they are placed in the suggestion list.
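The recommendation step can be sketched as follows. The source uses a trained neural network to assign the current session to a cluster. This sketch does not reproduce that network and instead substitutes a simple nearest-pattern rule based on cosine similarity. The `min_weight` cutoff, the function names, and the sample data are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(current, patterns, pages, min_weight=0.1):
    """Pick the most similar navigation pattern, then list its unvisited pages."""
    best = max(range(len(patterns)),
               key=lambda c: cosine_similarity(current, patterns[c]))
    pattern = patterns[best]
    candidates = [
        (pattern[j], pages[j])
        for j in range(len(pages))
        if current[j] == 0 and pattern[j] >= min_weight
    ]
    # Pages with the highest weight in the pattern come first.
    return best, [page for _, page in sorted(candidates, reverse=True)]

# Hypothetical pages, cluster mean vectors, and current session vector.
pages = ["/home", "/news", "/sports", "/faq"]
patterns = [[0.8, 0.6, 0.5, 0.2], [0.7, 0.0, 0.0, 0.9]]
current = [1.0, 0.4, 0.0, 0.0]
cluster, suggestions = recommend(current, patterns, pages)
print(cluster, suggestions)  # 0 ['/sports', '/faq']
```

Replacing the `max(...)` selection with the output of a trained classifier would bring the sketch closer to the method described above, while the filtering of unvisited pages would stay the same.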

Results and Conclusion
In this paper, to provide useful and required data for users, a new method was introduced on the basis of user navigation patterns, which is capable of producing recommender engine results according to the user's requirements. The preliminary results indicate that the proposed model performs better than alternative methods. The proposed method separates the pages that are relevant to the user's interests from the irrelevant ones. To examine the impact of the new method, we studied the structure of users' profiles based on the history of their behavior. If, in customized search for each user, more emphasis is placed on the user's current session than on his/her search history, more efficient results are obtained. The system uses a neural network to classify users by similarity and common interests. Fig. 1 and Fig. 2 summarize the results obtained for precision and coverage.