Customer Behaviour Analysis using Web Usage Mining

Abstract - This paper attempts to uncover hidden information and to identify user behavior on the web using web data sources. With this information, user behavior can be predicted more easily. User patterns are analyzed by applying the free tools Similar Web and Page Score to the website of a particular organization. These tools attempt to conduct web mining in a domain-independent manner. The experimental results point to ways of making a website easier to navigate and of improving its design architecture. This work discusses the detailed results for websites in one specific application domain, education. The main objective of studying the websites of different universities is to discern patterns of behavioural expression and user characteristics, so that each organization can improve the structure of its website and present more information in a proper way.


I. INTRODUCTION
The popularity of the Internet is ever growing, and it provides a number of services in every field. One of the most profound services of the Internet is the World Wide Web (WWW), which has become a mainstream tool for almost every activity requiring a knowledge base. It not only fulfils the requirements of different communities, whether a student wants to know more about a particular topic, a university for admission or projects, or someone wants to purchase something, but also helps Internet-based business websites, which face competition as e-commerce grows rapidly. These web portals generate large amounts of data. This data includes various logs and transaction details, and with this information market analyses and predictions can be made fairly accurately. Many organizations and corporations use web portals as a key tool for sharing and accessing information, and for creating networks and collaborations. In most cases, transactions are encouraged through online transaction processes. These are now becoming the main strategy for these organizations and corporate offices. Even the top education organizations have started offering online courses, and teaching methods have also been modified with webinars and similar formats. In fact, these online opportunities have revolutionized the education system. This is also true of the distance education system, and given the multitasking required in the current situation, online opportunities meet these requirements efficiently. Because information on the web is relatively easy to store and retrieve, the opportunity for data mining is also enormous, and it is greater still when the logs can be accessed, which is useful for research purposes. There have been various research efforts to unearth this information and the ways it could be used to increase e-commerce opportunities.

II. DATA MINING PROCESSES
There is always a need to develop data mining techniques to extract information from huge web log files. These are often automated, systematic processes for analysing emerging information and users' response patterns. Given the volume of data, it is a highly challenging task to go into all the details of website users.
Data Collection: Log files are usually obtained from various sources such as portal-specific servers, open-source portals, clients' personal sites, proxy servers, etc. The web server log provides the most readily available source of information about websites and user interaction. The source files are stored in plain text format and can be accessed by the users.
Preprocessing: This step essentially filters the click-stream data and categorizes transactions by visit details. Preprocessing is critical because not all servers produce logs in the right or required format.
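One way to picture the categorization of transactions by visit is to group each visitor's requests into sessions. The following is only a minimal sketch under stated assumptions: the 30-minute inactivity timeout, the function name sessionize and the sample requests are illustrative and are not taken from this study.

```python
from datetime import datetime, timedelta

# Assumed heuristic: a gap of more than 30 minutes starts a new visit.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (ip, time, path) requests into per-visitor sessions.

    A new session is opened when a visitor is seen for the first time or
    when the gap since that visitor's previous request exceeds `timeout`.
    """
    sessions = []
    open_session = {}  # ip -> index of that visitor's latest session
    last_seen = {}     # ip -> time of that visitor's previous request
    for ip, time, path in sorted(requests, key=lambda r: r[1]):
        if ip not in last_seen or time - last_seen[ip] > timeout:
            sessions.append({"ip": ip, "start": time, "pages": []})
            open_session[ip] = len(sessions) - 1
        sessions[open_session[ip]]["pages"].append(path)
        last_seen[ip] = time
    return sessions


# Made-up requests, for illustration only.
requests = [
    ("10.0.0.1", datetime(2016, 12, 7, 10, 0), "/index.html"),
    ("10.0.0.1", datetime(2016, 12, 7, 10, 5), "/admissions.html"),
    ("10.0.0.2", datetime(2016, 12, 7, 10, 6), "/index.html"),
    ("10.0.0.1", datetime(2016, 12, 7, 11, 0), "/courses.html"),
]
sessions = sessionize(requests)
```

With this sample, the first visitor's request at 11:00 falls outside the assumed timeout and is therefore counted as a separate visit.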
Data Cleaning: Data cleaning is a complex phase in the whole process of mining data from web sources. Because the information is so voluminous and is generated without restriction, it is important to remove noise, misinformation and redundant information. This usually involves removing entries for files such as JPEG, GIF, Word, sound and animation files. Thereafter, the IP address is used to identify users and their related log entries in order to understand user behavior, including visit frequency and the duration of website access.
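The cleaning step described above can be sketched as follows. This is an illustration only: it assumes the logs are in the Common Log Format (the log layout is not specified here), and the extension list, the helper names and the sample lines are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed Common Log Format; the actual log layout is not specified.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Extensions treated as noise (images, documents, sound, animation).
NOISE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".doc", ".docx", ".mp3", ".wav", ".swf")


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def clean(records):
    """Drop requests for embedded resources, keeping page requests only."""
    return [r for r in records if not r["path"].lower().endswith(NOISE_EXTENSIONS)]


def summarize_by_ip(records):
    """Per IP address: request count and seconds between first and last request."""
    times = defaultdict(list)
    for r in records:
        times[r["ip"]].append(r["time"])
    return {
        ip: {"requests": len(ts), "duration_s": (max(ts) - min(ts)).total_seconds()}
        for ip, ts in times.items()
    }


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [07/Dec/2016:10:00:00 +0530] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.1 - - [07/Dec/2016:10:00:01 +0530] "GET /images/logo.gif HTTP/1.1" 200 2048',
    '10.0.0.1 - - [07/Dec/2016:10:02:30 +0530] "GET /admissions.html HTTP/1.1" 200 7300',
    '10.0.0.2 - - [07/Dec/2016:10:05:00 +0530] "GET /index.html HTTP/1.1" 200 5120',
]

parsed = [rec for rec in (parse_line(line) for line in SAMPLE_LOG) if rec is not None]
cleaned = clean(parsed)
summary = summarize_by_ip(cleaned)
```

Identifying users by IP address alone is a simplification, since several visitors behind one proxy share an address; the sketch follows the text in using the IP address as the user key.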

Pattern Discovery: Pattern discovery is the study of specific patterns of information storage and retrieval, user behavior and transaction details. It involves summary statistics, pattern detection and machine learning techniques such as artificial neural networks and clustering algorithms. Pattern discovery methods include data mining techniques such as path identification, frequent-pattern-based association rules, clustering and classification on preprocessed log data [7]. The clustering approach is used to discover patterns of similarity.
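As an illustration of frequent-pattern-based association rules on preprocessed log data, the sketch below counts how often pairs of pages occur in the same session and derives rules of the form "visitors of page A also visit page B". It is a simplified example restricted to page pairs, not the method used in this study; the function name pair_rules, the thresholds and the sample sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) rules for page pairs.

    support    = share of sessions containing both pages
    confidence = sessions containing both pages / sessions containing the antecedent
    """
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for pages in sessions:
        unique = sorted(set(pages))  # count each page once per session
        page_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / page_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


# Made-up sessions (lists of visited pages), for illustration only.
sessions = [
    ["/index", "/admissions", "/fees"],
    ["/index", "/admissions"],
    ["/index", "/courses"],
    ["/admissions", "/fees"],
]
rules = pair_rules(sessions, min_support=0.5, min_confidence=0.9)
```

In this sample, every session that visits "/fees" also visits "/admissions", so that is the only rule meeting both thresholds.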
Pattern Analysis: Knowledge discovery techniques are used to isolate the predictive emerging patterns from the web log files. Pattern analysis is the last stage of the web usage mining process. It involves mapping between the mined patterns and the web log data.

III. TOOLS AND DATA COLLECTION
A number of tools are available for analyzing web log files. Here, data is collected using free web analyzer tools. A comparative study is made of the top universities of Haryana, based on customer interaction over the period Dec 2016 to Feb 2017 on the websites of the universities listed in Table 1. This table shows how much loading time a website takes when a customer browses the website of a particular university.

V. CONCLUSION
With the tremendous increase in the use of websites for online purchasing, and for everything else a customer wants at the click of a mouse, web usage mining has become one of the important research areas, through which browsing and navigation behavior can be predicted. Several techniques and tools have been proposed by different researchers for web usage mining. This paper discussed pattern discovery and pattern analysis for web users in the education domain using Similar Web and Page Score. The results obtained should help website analysts, website maintainers and website designers.

Table 1
by using the Similar Web tool and the Page Score tool. A number of experiments are available that analyze these web server access logs as input. The analysis generates reports on frequent access and identifies user access patterns. Table 1 shows each university's name and its code, which is temporarily assigned for the purpose of the experiment.

Table 1.1 gives the comparative global ranking and country rank of the universities, taken from the Similar Web and Page Score analysis report.

Table 1
This work used data collected from the websites of the top universities of Haryana from 7 Dec 2016 to 31 Jan 2017. The collected data was analyzed using different free tools such as Page Score and Similar Web. The complete experimental analysis was based on web log data from the universities' websites, and a comparative analysis report was prepared. The design and execution of such work is restricted and time consuming. Comparative chart of educational websites of universities is