A New Clustering Algorithm of Data Mining

The aim of this study is to present a new and useful segmentation process for large data sets. Organizing data into sensible groupings is becoming one of the most fundamental modes of understanding and learning, and enterprises have gathered large amounts of information over the last decades. Managing such information, and extracting advantage from it in an efficient and low-cost way, is becoming key to business success and ranks high in strategy. Different methods and techniques have been developed to reduce data volumes to a manageable structure and to help enterprises isolate the business value of their data sets. Clustering is one of the most important data mining techniques in use. The algorithm that we present can be helpful in the CRM area. It is potentially useful for studying customer profiles based on parameters called descriptors, and it may have a positive impact on customer retention and churn prevention, because the main aim of an ideal business is to optimise customer interactions by remaining well connected with the customer.


INTRODUCTION
One of the most important typologies of data mining algorithms is based on objectives; the high-level view is shown in Fig. 1.
Classification: This method aims to assign the elements of a data set to the appropriate classes. A first example is the outcome of a credit request: a bank loan representative wants to analyse customer data in order to know which customers (loan applicants) are potentially eligible and which are not. The underlying model can be built as follows: let C = {C1, C2}, where C1 means that the customer is eligible and C2 the opposite; let Ω = {X1, …, Xn} be a data set, where Xi is the data of customer i extracted from the analytical financial CRM and Xij, with j in the interval [1, P], is the value of variable j for customer i (for example socio-demographic data, age or the financial antecedents of the customer). A second example comes from the medical field: identifying the risk factors for prostate cancer based on clinical, dietary and demographic parameters, with Y(+) for clinically favourable parameters and Y(-) for the other cases.
Clustering: This is an unsupervised learning task which aims to arrange the instances described in a database into a set of homogeneous clusters (Han and Kamber, 2001), such that the principles of intra-class similarity and inter-class dissimilarity are satisfied.
Regression: It has been a mainstay of statistics for the past 50 years and remains one of our most important tools. The principle is to predict or explain the values of a quantitative variable Y based on the values of a given variable X; X is known (Tibshirani, 1996) and is called the predictor. The underlying mathematical model may be described as follows, Eq. (1):
$Y = a_0 + \sum_{j=1}^{p} a_j X_j$ (1)
The term $a_0$ is the intercept, also known as the bias in machine learning. Often it is convenient to include the constant variable 1 in X, include $a_0$ in the vector of coefficients $a$, and then write the linear model in vector form as an inner product, Eq. (2):
$Y = X^{T} a$ (2)
Association rules: The aim of this task is to identify correlated attributes and to highlight possible relationships between the items of a basket; the main objective is to keep a continuous focus on the customer's behaviour. Introduced by Agrawal et al. (1993), it is an important data mining model that has been studied extensively by the database and data mining communities. One important condition for using this method is that all data are categorical. The algorithms used rely on a few elementary notions: the support of an item set I is the total number of tuples containing I; the least support δ is the threshold on the support; a frequent item set is any item set whose support is greater than δ. As an example of frequent item sets, Table 1 represents frequently purchased telco products: if δ = 60%, product P1 is frequent, whereas P5 and P2 are not.
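The notions of support and frequent item set can be illustrated with a minimal Python sketch. The baskets below are hypothetical and are not the data of Table 1; the function names `support` and `frequent_items` are also illustrative. Since the threshold δ is expressed as a percentage, the sketch assumes a relative support (fraction of baskets containing the item set) rather than a raw count.

```python
def support(itemset, baskets):
    """Relative support: fraction of baskets containing every item of `itemset`."""
    if not baskets:
        return 0.0
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def frequent_items(baskets, delta):
    """Single items whose support is strictly greater than the threshold delta."""
    items = sorted(set().union(*baskets))
    return [item for item in items if support({item}, baskets) > delta]


# Hypothetical baskets of telco products (illustrative only, NOT Table 1).
baskets = [
    {"P1", "P2"},
    {"P1", "P3"},
    {"P1", "P5"},
    {"P1"},
    {"P2", "P3"},
]

# With delta = 60%, only products bought in more than 60% of baskets are kept.
print(frequent_items(baskets, 0.60))
```

In this toy data set P1 appears in four of the five baskets, so it passes the 60% threshold, while P2, P3 and P5 do not.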
For an item set I = {A, B}, two cases can be distinguished (Fig. 2).

Fig. 2: Main scenarios of association rules
The usual reading of a rule over two items A and B is that if A is true, then B is also true.
To assess an association rules algorithm, two important parameters have to be highlighted: least support and least confidence.
A rule is valid in a data set if its support reaches the least support and its confidence reaches the least confidence. Each cluster may contain a large set of rules, and rebuilding and generalizing all of them may be difficult to manage; a set of highly ranked representative rules may be used instead, thanks to the pruning method (Tibshirani, 1996) (Fig. 3).
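As an illustration of how a rule A → B could be checked against the two thresholds, here is a minimal Python sketch. It assumes the standard definitions, which the text does not spell out: the support of the rule is the fraction of baskets containing both A and B, and its confidence is the number of baskets containing A and B divided by the number containing A. The data, the thresholds and the function names are hypothetical.

```python
def count(itemset, baskets):
    """Number of baskets containing every item of `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Relative support of `itemset` (assumed definition)."""
    return count(itemset, baskets) / len(baskets) if baskets else 0.0


def confidence(antecedent, consequent, baskets):
    """Share of the baskets containing the antecedent that also contain the consequent."""
    n_antecedent = count(antecedent, baskets)
    if n_antecedent == 0:
        return 0.0
    return count(antecedent | consequent, baskets) / n_antecedent


def is_valid_rule(antecedent, consequent, baskets, least_support, least_confidence):
    """A rule is kept only if it reaches both thresholds."""
    rule_support = support(antecedent | consequent, baskets)
    rule_confidence = confidence(antecedent, consequent, baskets)
    return rule_support >= least_support and rule_confidence >= least_confidence


# Hypothetical baskets over two items A and B (illustrative only).
baskets = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}, {"A", "B"}]

print(confidence({"A"}, {"B"}, baskets))
print(is_valid_rule({"A"}, {"B"}, baskets, least_support=0.5, least_confidence=0.7))
```

Working with counts rather than dividing two relative supports keeps the confidence free of avoidable floating-point rounding.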
In the rest of this study we present a new segmentation algorithm, belonging to the discovery methods, which we call M-clustering, and we summarize a methodology for its implementation in the Micro Finance world.

Fig. 3: Pruning method for clustering methods

BODY OF THE NEW ALGORITHM OF SEGMENTATION
Our algorithm relies on some assumptions that are mandatory for its deployment; we use the Euclidean distance to calculate similarities. Let ε be a data set.
• The similarity index is a mapping denoted S: ε × ε → R+ that satisfies the usual properties of a similarity index.
• The dissimilarity index is a mapping denoted D that satisfies the usual properties of a dissimilarity index.
Calculation step: We calculate the similarities of all elements of the data set ε (of dimension n × p). Let X0 be the reference point: X0 = (a0, …, ap) = 0. For the rest of this article we take q = 2, in which case the Euclidean distance is used:
$d_{x_i} = d(X_i, X_0) = \sqrt{\sum_{j=1}^{p} x_{ij}^{2}}$
• We assign the rest of the elements to the nearest of the K elements found.
• We now have K built segments.
• For each segment we sort the $d_{x_i}$ in ascending order, so that the last element is the farthest one (Fig. 4).
Move step: For each segment, we take the last (farthest) element and assign it to the segment, among the K-1 others, whose first element is nearest. We stop when no further assignment is possible, which means that the segments are definitive.
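The calculation and move steps can be sketched in Python as follows. This is only one possible reading of the description, not a reference implementation, and it rests on several assumptions that the text does not state: the K initial elements are taken evenly spaced in the ascending order of the distances to X0 = 0; the last element of a segment is moved only when the first element of another segment is strictly closer to it than the first element of its own segment; and a cap `max_rounds` guarantees termination. The function name `m_clustering_sketch` and the sample points are hypothetical.

```python
import math


def norm_to_origin(x):
    """d_xi: Euclidean distance (q = 2) from x to the reference point X0 = 0."""
    return math.sqrt(sum(v * v for v in x))


def dist(x, y):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def m_clustering_sketch(points, k, max_rounds=100):
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")

    # Calculation step: order the elements by their distance to X0 = 0.
    ordered = sorted(points, key=norm_to_origin)

    # ASSUMPTION: the K initial elements are evenly spaced in that order.
    if k == 1:
        seed_idx = [0]
    else:
        seed_idx = [i * (n - 1) // (k - 1) for i in range(k)]
    seeds = [ordered[i] for i in seed_idx]
    segments = [[seed] for seed in seeds]

    # Assign the rest of the elements to the nearest of the K elements.
    seed_set = set(seed_idx)
    for idx, point in enumerate(ordered):
        if idx in seed_set:
            continue
        nearest = min(range(k), key=lambda c: dist(point, seeds[c]))
        segments[nearest].append(point)

    # Ascending sort of d_xi inside each segment (last element is the farthest).
    for seg in segments:
        seg.sort(key=norm_to_origin)

    # Move step. ASSUMPTION: move only if another segment's first element
    # is strictly closer than the first element of the current segment.
    for _ in range(max_rounds):
        moved = False
        for j, seg in enumerate(segments):
            if k == 1 or len(seg) < 2:
                continue
            last = seg[-1]
            best = min((c for c in range(k) if c != j),
                       key=lambda c: dist(last, segments[c][0]))
            if dist(last, segments[best][0]) < dist(last, seg[0]):
                seg.pop()
                segments[best].append(last)
                segments[best].sort(key=norm_to_origin)
                moved = True
        if not moved:
            break  # no assignment possible: the segments are definitive
    return segments


# Hypothetical two-dimensional data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
segments = m_clustering_sketch(points, k=2)
print(segments)
```

On this toy data the two groups are already separated after the assignment, so the move step leaves the segments unchanged.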

MFI CUSTOMER SEGMENTATION CASE STUDY
In the case below, we aim to apply this new segmentation algorithm to Micro Finance.

CONCLUSION
This study proposes a new method of customer segmentation analysis using the innovative M-clustering algorithm. We have validated the algorithm on a Micro Finance Institution using K = 3, and we have shown that the algorithm can be well applied to the Wealth Rank Index.

Fig. 1: Taxonomy of DM algorithms

Clustering algorithms may be grouped into two distinct types: hierarchical and partitioning methods (Fraley and Raftery, 1998). An additional classification was introduced later (Tan et al., 2002) with three new layers: density-based methods, model-based clustering and grid-based methods.

Fig. 4: High-level view of our algorithm steps

Table 1: Data set of telco products (basket)