A CLASSIFICATION APPROACH FOR NAÏVE BAYES OF ONLINE RETAILERS

Many small online retailers and new entrants to the online retail sector are keen to practice data mining and
consumer-centric marketing in their businesses, yet they lack the technical knowledge and expertise
to do so. In this article, a case study of using data mining techniques in customer-centric business intelligence
for an online retailer is presented. The main purpose of this analysis is to help the business better understand
its customers and therefore conduct customer-centric marketing more effectively. On the basis of the Recency,
Frequency, and Monetary (RFM) model, customers of the business have been segmented into various meaningful groups
using the naïve Bayes classification algorithm, and the main characteristics of the consumers in each segment
have been clearly identified. Accordingly, a set of recommendations is further provided to the business on
consumer-centric marketing.


Introduction
Data mining is the process of discovering interesting knowledge, such as associations, patterns, changes, significant structures and anomalies, from large amounts of data stored in databases, data warehouses or other information repositories. It has been widely used in recent years due to the availability of huge amounts of data in electronic form, and there is a need for turning such data into useful information and knowledge for large applications. These applications are found in fields such as Artificial Intelligence, Machine Learning, Market Analysis, Statistics and Database Systems, Business Management and Decision Support.
The online retailer under consideration in this dataset is a UK-based and registered non-store business with some 80 members of staff. The company was established in 1981, mainly selling unique all-occasion gifts. For years in the past, the merchant relied heavily on direct mailing catalogues, and orders were taken over phone calls. It was only 2 years ago that the company launched its own web site and shifted completely to the Web. Since then the company has maintained a steady and healthy number of customers from all parts of the United Kingdom and Europe, and has accumulated a huge amount of data about many customers. The company also uses Amazon.co.uk to market and sell its products [1].
The main purpose of this analysis is to help the business better understand its customers and therefore conduct customer-centric marketing more effectively. To that end, customers of the business have been segmented into various meaningful groups using the naïve Bayes classification algorithm, and the main characteristics of the consumers in each segment have been clearly identified. Accordingly, a set of recommendations is provided to the business on customer-centric marketing and further data analysis tasks. The dataset was collected by using mark reports.
On the basis of the RFM model, customers of the business have been segmented into various meaningful groups using the clustering algorithm and decision tree induction, and the main characteristics of the consumers in each segment have been clearly identified.
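As an illustration of the RFM model referred to above, the following minimal sketch derives recency, frequency and monetary values per customer from a list of transactions. The sample rows, the tuple layout, the function name `compute_rfm` and the reference date are all hypothetical and are not taken from the study's data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions (not from the study's dataset):
# (customer_id, invoice_no, invoice_date, quantity, unit_price)
transactions = [
    ("C1", "INV001", date(2011, 1, 10), 6, 2.50),
    ("C1", "INV002", date(2011, 3, 5), 2, 4.00),
    ("C2", "INV003", date(2011, 2, 20), 10, 1.25),
]

def compute_rfm(rows, reference_date):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}             # most recent purchase date per customer
    invoices = defaultdict(set)    # distinct invoices per customer
    spend = defaultdict(float)     # total amount spent per customer
    for customer, invoice, day, quantity, unit_price in rows:
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        invoices[customer].add(invoice)
        spend[customer] += quantity * unit_price
    return {
        c: ((reference_date - last_purchase[c]).days,
            len(invoices[c]),
            round(spend[c], 2))
        for c in last_purchase
    }

rfm = compute_rfm(transactions, reference_date=date(2011, 4, 1))
for customer, values in sorted(rfm.items()):
    print(customer, values)
```

Here recency is measured in days before an assumed reference date, frequency as the number of distinct invoices, and monetary as total spend; other definitions are possible.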

Related Work
Researchers have proposed many different approaches for sentiment analysis. A previous technical article on the online retail industry used k-means clustering to create nested segments inside each cluster. In other words, these nested segments form sub-clusters inside a cluster and make it possible to categorize the consumers concerned into sensible subcategories.
With the prepared target dataset, the authors intended to identify whether consumers can be segmented meaningfully in terms of recency, frequency and monetary values. The k-means clustering algorithm was employed for this purpose, and it can be easily performed by using the Cluster node in SAS Enterprise Miner. As is well known, the k-means clustering algorithm is very sensitive to datasets that contain outliers (anomalies) or variables of incomparable scales or magnitudes.
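The sensitivity to incomparable scales can be illustrated with z-score standardization, one common remedy. The source does not state which preprocessing the cited work applied, so the sketch below is an assumption for illustration only; the function name `standardize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant variable carries no information for distance-based clustering.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical values: recency in days and monetary in pounds differ by orders of magnitude,
# so an unscaled Euclidean distance would be dominated by the monetary variable.
recency = [10, 20, 30]
monetary = [100.0, 2000.0, 30000.0]

recency_z = standardize(recency)
monetary_z = standardize(monetary)
print(recency_z)
print(monetary_z)
```

After standardization both variables contribute on a comparable scale, although extreme values such as the largest monetary amount still remain outliers and may need separate treatment.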

Methodology
For this online retailer dataset, we chose a classification methodology. Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks.
A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time. In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Advantages
• Users are thus better able to find precisely what they want, whether they wish to search in one discipline or across all. Works can, but need not, be coded by discipline.
• While other classification systems provide specific instructions in multiple places for coding by time or place or people, this system has a universal coding for such elements. This renders both classification and searching easier.
• The classification is able to use natural language, yet there are very precise definitions associated with each concept in the BCC.
Following these steps, a target dataset for the analysis has been generated. The original dataset was in MS Excel format. Part of the target dataset is shown in Figure 1. Naive Bayes classifiers, a family of classifiers based on the popular Bayes' probability theorem, are known for creating simple yet well-performing models, especially in the field of document classification.
The chosen datasets were downloaded from the UCI repository (University of California, Irvine) [4]. Table I shows their characterization: the name by which they are known in the literature, the data type of the variables (with nominal grouped into categorical), the number of attributes, and the number with numeric type. In the following sections, we will take a closer look at the probability model of the naive Bayes classifier and apply the concept to online retailer data.
In this paper, we try to give the basic concept of the classification technique using naïve Bayes and support vector methods that can be used for classifying unknown records [2-4].
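For reference, the probability model of the naive Bayes classifier can be written in its standard textbook form as follows. The symbols are our own notation rather than the source's: c denotes a class (for example, a country) and x_1, ..., x_n denote the attribute values of a record.

```latex
% Bayes' theorem applied to a record with attribute values x_1, ..., x_n
P(c \mid x_1, \dots, x_n) = \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)}

% Naive assumption: attributes are conditionally independent given the class
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)

% Decision rule: the denominator does not depend on c and can be dropped
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The prior P(c) and the conditional probabilities P(x_i | c) are estimated from the relative frequencies in the training data.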

Experiment
a. Dataset
The experiments in this research were carried out using RapidMiner. RapidMiner is a Java-based open source data mining and machine learning software. It has a graphical user interface (GUI) in which the user can design a machine learning process without having to write code. Here, the classification task for a given classifier is designed as a process. The algorithm we apply to the online retailer data is the naïve Bayes classification algorithm. We were provided with a training dataset consisting of information about online retailers. Figure 3 shows the dataset, which was in the form of a Microsoft Excel 2003 spreadsheet and had details of country, invoice number, stock code, quantity, description, invoice date, unit price and customer ID. For ease of performing data mining operations, the data was converted into CSV format. Figure 11 shows an example chart of country based on the quantities of each product (item) per transaction. We can see that Portugal has the highest density, at 0.068.
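Once the data is in CSV format, a per-country summary of quantities such as the one charted above could be computed along the following lines. This is only a sketch: the header names (`Country`, `Quantity`, and so on) and the sample rows are assumptions made for illustration, and the actual analysis was carried out in RapidMiner rather than in code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the column names are assumed from the fields listed above.
SAMPLE_CSV = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
10001,A100,GIFT BOX,6,2011-01-10,2.50,501,United Kingdom
10002,B200,PHOTO FRAME,24,2011-01-11,3.75,502,France
10003,C300,CANDLE SET,80,2011-01-12,1.25,503,United Kingdom
"""

def quantity_by_country(csv_file):
    """Sum the Quantity column for each value of the Country column."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["Country"]] += int(row["Quantity"])
    return dict(totals)

totals = quantity_by_country(io.StringIO(SAMPLE_CSV))
for country, quantity in sorted(totals.items()):
    print(country, quantity)
```

For a real file, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`.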

Conclusion
A case study has been presented in this article to demonstrate how customer-centric business intelligence for online retailers can be created using the naïve Bayes data mining technique. The distinct customer groups characterized in the case study can help the business better understand its customers in terms of their profitability and, accordingly, adopt appropriate marketing strategies for different consumers. It has been shown in this analysis that two steps in the whole data mining process are crucial and the most time-consuming: data preparation, and model interpretation and evaluation.

Result
Results of the empirical study show the prediction of the country in the online retailer dataset. Classification was conducted on this dataset. From the histograms in Figure 8 of the variable country in the target dataset, it is evident that there are a few instances having quite different monetary and frequency values compared to the majority of the countries in the dataset. These instances are valid from the business point of view as they are genuine transaction records; however, they are outliers from the data analysis point of view. The United Kingdom is the highest-ranking country for item deliveries.
The next step was to feed the pruned online retailer database as input to RapidMiner. This helped us in evaluating interesting results by applying classification algorithms on the online retailer training dataset.
b. Algorithm of naive Bayesian Classifier
Naïve Bayes: The design view of the Naïve Bayes classifier is given below in Figure 4, which includes the UCI dataset with 1000 instances and 8 attributes, a Cross Validation operator with 10-fold cross-validation, and Apply Model and Performance evaluation operators with the necessary parameter values for execution.
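The process described above (train a naive Bayes model, apply it, and measure performance under 10-fold cross-validation) can be sketched in plain code as follows. This is not RapidMiner's implementation. It is a minimal categorical naive Bayes with add-one (Laplace) smoothing, and the function names, the fold assignment scheme and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    distinct_values = defaultdict(set)   # attribute index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct_values[i].add(value)
    return class_counts, value_counts, distinct_values

def predict(model, row):
    """Return the class with the highest log posterior score for one record."""
    class_counts, value_counts, distinct_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen combinations.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(distinct_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(rows, labels, k=10):
    """Return accuracy pooled over k folds; record i is tested in fold i % k."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(rows)) if i % k != fold]
        test_idx = [i for i in range(len(rows)) if i % k == fold]
        model = train_naive_bayes([rows[i] for i in train_idx],
                                  [labels[i] for i in train_idx])
        correct += sum(predict(model, rows[i]) == labels[i] for i in test_idx)
    return correct / len(rows)

# Hypothetical toy data: two categorical attributes, with country as the label.
rows = [("gift", "bulk"), ("gift", "single"),
        ("decor", "bulk"), ("decor", "single")] * 5
labels = ["United Kingdom" if category == "gift" else "France"
          for category, _ in rows]

accuracy = cross_validate(rows, labels, k=10)
print(accuracy)
```

In this toy data the label is fully determined by the first attribute, so the sketch classifies every held-out record correctly; real data would not be expected to behave this way.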

Figure 5
Figure 5 shows the example online retailer dataset when naive Bayes is applied. It includes 1 special attribute, which is country, and 7 regular attributes.