Efficient constraint-based Sequential Pattern Mining (SPM) algorithm to understand customers’ buying behaviour from time stamp-based sequence dataset

Business strategies are formulated on the basis of an understanding of customer needs. This requires a strategy for understanding customer behaviour and buying patterns, both current and future: first, how an organization currently understands customer needs and, second, how it predicts future trends to drive growth. This article focuses on customers' purchase trends, where the timing of a purchase is more important than the association between the items purchased; such trends can be found with Sequential Pattern Mining (SPM) methods. Conventional SPM algorithms work purely on frequency, identifying the more frequent patterns, but suffer from challenges such as the generation of a huge number of uninteresting patterns, the absence of patterns of interest to the user and the rare item problem. This article attempts a solution by developing an SPM algorithm based on various constraints, namely Gap, Compactness, Item, Recency, Profitability and Length, along with the Frequency constraint. The six additional constraints ensure that all patterns are recently active (Recency), active during a certain time span (Compactness), profitable (Profitability) and indicative of the next purchase timeline (Length, Item, Gap). The article also attempts to throw light on how the proposed


PUBLIC INTEREST STATEMENT
The main objective of a business's economic activity is to satisfy the needs and wants of customers. Hence the management of any business aims to identify and predict customers' purchase tendencies in order to plan its business strategy, including product development and marketing sub-strategies. This requires data mining tools, including sequential pattern mining (SPM) techniques, that help achieve this objective. Most existing SPM approaches work purely on frequency and fail to extract the sequential patterns that interest users. Incorporating constraints into SPM can address such shortcomings. The proposed framework may help decision makers understand the business and identify customers' past, current and future buying patterns. Emerging Patterns (EPs) that can help predict future buying behaviour can also be identified with the proposed framework. The paper also highlights obsolete, new and forming-stage patterns that are relevant to business management.

Literature survey
Extracting important information from data and moulding it into useful knowledge, known as data mining, has been an important task for the past few decades (Chen, Han, & Philip, 1996). Data mining is used to extract knowledge from large volumes of data (Mary & Iyengar, 2010; Raza, 2010). Some activities necessarily occur in sequence, and mining such sequences is an important data mining task known as Sequential Pattern Mining (SPM), introduced in (Agrawal & Srikant, 1995; Srikant & Agrawal, 1996). SPM is useful in many applications for discovering customer and market information, such as product recommendation (Lawrence et al., 2001), e-retailing and customer profiling (Hu & Chen, 2008; Mahdavi, Cho, Shirazi, & Sahebjamnia, 2008). Current SPM techniques fall mainly into Apriori-based (Agrawal & Srikant, 1994) and FP-Growth-based (Han, Pei, & Yin, 2000) approaches. Both the Apriori-based algorithms GSP (Srikant & Agrawal, 1996), SPADE (Zaki, 2001), SPAM (Ayres, Flannick, Gehrke, & Yiu, 2002) and SPIRIT (Garofalakis, Rastogi, & Shim, 1999) and the FP-Growth-based algorithms FreeSpan (Han, Dong et al., 2000) and Prefix Span (Pei, Han, Mortazavi, & Pino, 2001) rely solely on frequency, expressed as a support threshold. Conventional SPM techniques often require substantial computation time and memory to mine the complete set of sequential patterns in a large sequence database. Introducing constraints opens an opportunity for performance improvement: "Can we improve the efficiency of sequential pattern mining by focusing only on interesting patterns?" Constraint-based mining can address both difficulties, efficiency and effectiveness. Conventional SPM techniques also fail to extract patterns that have higher potential in terms of monetary value and recency. These parameters were addressed by introducing Recency, Frequency and Monetary (RFM) parameters into SPM (Chen, Kuo, Wu, & Tang, 2009).
The RFM-Apriori algorithm helps users identify patterns that are frequent, recently active and of high monetary value (Chen et al., 2009). RFM is used in many data mining tasks, such as association rule mining, clustering and classification (Chen, Wu, & Chen, 2005; Mohammad, Hosseini, Maleki, & Mohammad, 2009; Qiasi, Dehnavi, Minaei-Bidgoli, & Amooee, 2012). The RFM model has been applied to find valuable customers (Goodman, 1992). RFM is used to identify loyal and profitable customers on the basis of clustering, and classification algorithms are useful for obtaining rules that support effective customer relationship management. Combinations of behavioural and demographic characteristics are used to estimate loyalty (Qiasi et al., 2012). An extended RFM model called Weighted RFM (WRFM), together with the K-means algorithm, has been applied to classify customer product loyalty in a B2B setting, and customer loyalty is then used in marketing strategy (Mohammad et al., 2009). Most studies use one of two common methods to identify loyal customers: one based on demographic variables (such as age and gender) and the other based on customers' interactive behaviour, expressed through the so-called RFM model. The RFM model was proposed by Hughes (1994) and has been used in direct marketing for several decades (Chen et al., 2005). Chen et al. (2009) focused on the effective discovery of customers' current spending patterns and trends of behavioural change, using classification-based clustering on the basis of the RFM model to identify high-profit "gold" customers. This helps identify customer preferences and provide the desired service according to customer needs, which prevents customer attrition (Chen et al., 2005). Shaw, Subramaniam, Tan, and Welge (2001) describe how to incorporate data mining into the framework of marketing knowledge management.
Song, Kim, and Kim (2001) present a method to detect changes in customer behaviour across different time snapshots from customer profiles and sales data. Shen and Chuang (2009) used RFM-based cluster analysis to evaluate high-value customers in terms of high loyalty, high interest and a high purchase amount. On the basis of such results, a company can apply appropriate target marketing to enhance customer lifetime value (CLV) (Shen & Chuang, 2009). The problem of sequential pattern mining in Business-to-Customer (B2C) and Business-to-Business (B2B) environments has been investigated by Chen (see Note 1), who described the importance of Compact, Frequent and Recent sequential patterns (CFR-patterns) in the B2C environment and of Compact, Frequent, Recent and Repeated sequential patterns (CFR²-patterns) in the B2B environment. The CFM-PREFIX SPAN algorithm incorporates compactness and monetary constraints into the pattern-growth-based Prefix Span, yielding precise high-value patterns that are active during a precise time span (Mallick, Garg, & Grover, 2012). The C-Prefix Span algorithm incorporates Gap, Compactness and Recency constraints alongside the conventional Frequency constraint (Bhensdadia & Kosta, 2012). Constraint-based sequential pattern mining offers the following advantages:
(i) It enhances algorithm performance by avoiding the computation cost of uninteresting patterns.
(ii) Other important parameters are considered alongside frequency.
(iii) Users can focus only on the patterns that are of genuine interest to them.
(iv) Users obtain goal- or application-oriented results.

Major research findings and basic outline of proposed research
(I) Most existing SPM methods work purely on the frequency parameter, formally known as the support threshold. Support is important for distinguishing whether a pattern appears repeatedly. The proposed Constraint-based Prefix Span algorithm, in contrast, also takes the decision maker's perspective into account. Through the Recency constraint it determines current buying patterns, and through the Profit constraint it determines the more profitable buying patterns. The Compactness constraint can be used to identify customers' buying behaviour during a specific time span, including seasonal patterns. The Item and Length constraints give a more detailed view of customers' buying behaviour; in particular, the Length constraint can be used to understand customers' buying preferences and to identify influential items whose purchase indicates a higher probability of a subsequent purchase.
(II) Almost all existing SPM methods focus on the current scenario. Some patterns have the potential to become strong in future but currently fall slightly short of the support threshold. A minor reduction in the support threshold can bring such patterns into focus as potential future buying patterns. Our research highlights such patterns using boundary value reduction and a Recency-based Emerging Patterns (EPs) algorithm.
(III) Extraction of sequential patterns relies on objective measures such as support and confidence. SPM with subjective measures is an uncharted area that needs to be explored. Our research includes subjective analysis of purchases, such as profitability, loyalty and influential purchases, by combining various constraints. Customers who are frequent, profitable and recent are identified as loyal customers. The proposed method can be used to segment customers on the basis of such subjective criteria.
(IV) Our research provides an algorithm to study customers' current buying behaviour and to identify patterns that are likely to become obsolete in future as well as those at their formative stage.

System framework
A saying often attributed to the theoretical physicist Albert Einstein runs: "Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted." Since marketers cannot measure everything, the challenge is to focus on the parameters that truly matter for business performance (see Note 2). This article adopts that philosophy and focuses on constraints that are useful for tracking customers' present as well as future buying behaviour. Future purchase trends can be discovered through the following:
(i) Emerging Patterns (to highlight customers' buying habits that will be useful in future).
(ii) Highly influential items (on the basis of customers' buying preferences).
(iii) Customers' buying habits at the forming stage.
(iv) Customers' seasonal buying habits.
The study of customer behaviour provides a current as well as a future buying snapshot, which is useful for business expansion.

Emerging patterns (EPs)
Almost all existing SPM techniques concentrate on customers' present buying behaviour, but some patterns have sufficient potential to become strong in future while currently falling slightly short of the support threshold. A slight reduction of the support boundary can bring such patterns into consideration as the emerging purchase patterns of tomorrow. Such Emerging Patterns (EPs) are considered in the proposed approach.
In Equation 1, SD is a sequential database with timestamps and FP denotes the set of frequent patterns generated for a given support value. x is the support value that generates the frequent patterns (FP)_{Support = x%}, and the support value y is obtained by reducing x by 2 percentage points. (FP)_{Support = y%} is the set of frequent patterns generated with this slightly reduced support value, which highlights patterns whose frequency is only slightly too low. SEQ_EP1 is the set of emerging patterns generated by the reduced boundary value; the recent patterns among them might be the potential patterns of tomorrow.

$$SEQ_{EP1} = (FP)_{Support = y\%} \setminus (FP)_{Support = x\%} \qquad (1)$$

where |Y − X| = boundary value > 0, SD = sequential database, FP = frequent patterns, SEQ_EP1 = sequential patterns for the reduced boundary value, and X, Y > 0. The recency constraint is then applied (refer Equation 2):

$$SEQ_{recency\_EP} = \{\, \alpha \in SEQ_{EP1} \mid \alpha \text{ satisfies } C_{recency} \,\} \qquad (2)$$

where C_recency is the recency timestamp constraint and SEQ_recency_EP is the set of emerging sequential patterns generated by the recency constraint, i.e., the emerging patterns that are recent. These patterns have low support but are not old, because old patterns are eliminated by the recency constraint. For example, <(Computer)(Floppy)> was once a popular buying pattern. Suppose its support is 18%; it is rejected at a support threshold of 20%. Reducing the boundary value to 18% highlights this pattern, since it belongs to SEQ_EP1, even though it is old. Another buying pattern, <(Laptop)(Blu-ray disc)>, is recent but not very frequent relative to the complete transactional database, so it is rejected because of its slightly low support. The pattern <(Laptop)(Blu-ray disc)> is selected after a slight reduction of the boundary and, being recent, is retained by the recency constraint, whereas <(Computer)(Floppy)> is discarded. In this way, patterns at their forming stage can be detected easily.
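The two-step filter described above (Equations 1 and 2) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a miner has already produced each pattern's support (as an integer percentage) and the timestamp of its most recent occurrence, and the function names and sample data are hypothetical.

```python
def frequent(patterns, min_support):
    """Patterns whose support (integer percent) reaches min_support."""
    return {p for p, (support, _) in patterns.items() if support >= min_support}


def emerging_patterns(patterns, x, boundary, r_min):
    """Hypothetical sketch: SEQ_EP1 followed by the recency filter.

    patterns maps a pattern to (support %, timestamp of its latest occurrence).
    """
    y = x - boundary  # reduced support threshold, y < x
    # SEQ_EP1: patterns admitted only by the reduced threshold
    seq_ep1 = frequent(patterns, y) - frequent(patterns, x)
    # Recency filter: drop old patterns such as <(Computer)(Floppy)>
    return {p for p in seq_ep1 if patterns[p][1] >= r_min}


# Hypothetical data mirroring the example in the text
patterns = {
    ("Computer", "Floppy"): (18, 2),      # old pattern
    ("Laptop", "Blu-ray disc"): (19, 9),  # recent pattern
    ("TV", "DVD"): (25, 8),               # already frequent at x = 20%
}
print(emerging_patterns(patterns, x=20, boundary=2, r_min=5))
# {('Laptop', 'Blu-ray disc')}
```

Integer percentages are used on purpose: with floating-point fractions, 0.20 − 0.02 is not exactly 0.18, which would silently drop a pattern sitting exactly on the reduced boundary.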

Profitable pattern
It is important for any business to understand profitable purchases. Profit is derived indirectly from the monetary constraint and depends on two parameters: purchase price and selling price. Profitability is one of the important parameters for customer segmentation and retention. The Profitable constraint requires the profit of the items in a sequence to be at least a defined threshold value. It is formally represented as follows:

$$C_{profit} \equiv profit(\alpha)\ \omega\ \Delta T$$

where ω ∈ {≥}, ΔT is an integer profitability threshold and α is a sequence. A sequence

$$S_s = \langle (q_1(qty_1), t_1, \mathit{M\_Sold}_1, \mathit{M\_Pur}_1), (q_2(qty_2), t_2, \mathit{M\_Sold}_2, \mathit{M\_Pur}_2), \ldots, (q_m(qty_m), t_m, \mathit{M\_Sold}_m, \mathit{M\_Pur}_m) \rangle$$

is said to be a subsequence of S only if (1) the itemset S_s is a subsequence of S, S_s ∈ S, and (2) the number of items in S_s satisfies the conditions in Equations 3 and 4.
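A check of the Profitable constraint can be sketched as follows. Equations 3 and 4 are not reproduced here, so this sketch assumes that the profit of a sequence is the sum over its elements of qty × (M_Sold − M_Pur); the `Purchase` record and function names are hypothetical.

```python
from typing import NamedTuple


class Purchase(NamedTuple):
    """One element of S_s: item q_i with qty_i, t_i, M_Sold_i and M_Pur_i."""
    item: str
    qty: int
    t: int       # timestamp
    m_sold: int  # selling price per unit
    m_pur: int   # purchase price per unit


def profit(seq):
    """Assumed profit of a sequence: sum of qty * (selling - purchase price)."""
    return sum(p.qty * (p.m_sold - p.m_pur) for p in seq)


def satisfies_profit(seq, delta_T):
    """C_profit with omega fixed to '>='."""
    return profit(seq) >= delta_T


seq = [Purchase("TV", 1, 1, 500, 420), Purchase("DVD", 2, 3, 60, 45)]
print(profit(seq), satisfies_profit(seq, 100))  # 110 True
```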

Influential item
It is important for decision makers to understand which purchases lead to further purchases after some time. For example, in TV → DVD → (CD, CD Box), the TV is a highly influential item that leads to a second and third purchase. Strong association in such chain purchases indicates loyal customers, and such chains can be identified with the help of the Length constraint. Combining the Length and Profitability constraints can highlight patterns that build a strong relationship with customers and lead to sequential purchases.
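One way to operationalise this idea is sketched below: keep only mined patterns that reach a minimum length (and, optionally, a minimum profit) and count how many such chains each item starts. This ranking heuristic is an assumption made for illustration, not a procedure stated in the paper; the function name and data are hypothetical.

```python
from collections import Counter


def influential_items(patterns, l_min, profits=None, min_profit=0):
    """Rank items by how many long (chain) patterns they start.

    patterns: sequential patterns, each a tuple of itemsets (tuples of items).
    profits: optional mapping pattern -> profit, to keep only profitable chains.
    """
    counts = Counter()
    for p in patterns:
        if len(p) < l_min:  # length constraint: keep chain purchases only
            continue
        if profits is not None and profits.get(p, 0) < min_profit:
            continue
        counts.update(p[0])  # items in the first element lead the chain
    return counts.most_common()


patterns = [
    (("TV",), ("DVD",), ("CD", "CD Box")),
    (("TV",), ("Speaker",), ("Cable",)),
    (("TV",), ("DVD",)),
    (("Phone",), ("Case",), ("Charger",)),
]
print(influential_items(patterns, l_min=3))  # [('TV', 2), ('Phone', 1)]
```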

Seasonal buying patterns
Many customers shop during festival seasons such as Diwali and Christmas, and purchases of garments and household goods increase drastically during these periods. In India, purchases of gold and diamonds increase every year during Akshaya Tritiya, Lakshmi Pujan and Pushya Nakshatra (see Note 3). The Compactness constraint, which represents duration, is a helpful parameter for understanding active purchasing during a particular time span. Patterns that are heavily purchased in such a span can be extracted with the Compactness constraint, and businesses can adjust the selling price of such items to increase profitability (Pei, Han, & Wang, 2007).

Constraints used in the proposed algorithm
Traditional sequential pattern mining only distinguishes whether a pattern appears or not (Ayres et al., 2002; Garofalakis et al., 1999; Zaki, 2001). The original Prefix Span algorithm works only with the frequency constraint to discover sequential patterns from a sequence database (Pei et al., 2001). The RFM pattern mining approach not only determines the existence of a pattern but also checks whether it satisfies the recency and monetary constraints (Chen et al., 2009). The proposed approach works with the following constraints: Frequency, Item, Length, Gap, Compactness, Recency, Quantity and Profitable. These constraints are incorporated into the original Prefix Span algorithm. The following key concepts are required to understand the proposed algorithm:

Frequency constraint
The frequency constraint requires each discovered pattern to satisfy a minimum support, where the support of a pattern is the percentage of sequences (customer transaction sequences) in which that pattern occurs (refer Equation 6).
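Support, and hence the frequency constraint, can be computed by testing each sequence for containment of the pattern. The sketch below assumes sequences and patterns are lists of item sets; the function names and data are illustrative only.

```python
def contains(sequence, pattern):
    """True if pattern (list of itemsets) is a subsequence of sequence.

    Greedy matching: each pattern element is matched to the earliest later
    itemset that includes it, which suffices to decide containment.
    """
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def support(sdb, pattern):
    """Percentage of sequences in sdb that contain pattern."""
    return 100.0 * sum(contains(s, pattern) for s in sdb) / len(sdb)


sdb = [
    [{"a"}, {"b", "c"}],
    [{"a", "b"}, {"c"}],
    [{"b"}, {"a"}],
]
pattern = [{"a"}, {"c"}]
min_sup = 50.0
print(support(sdb, pattern) >= min_sup)  # True (2 of 3 sequences, about 66.7%)
```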

Recency constraint
The Recency constraint is specified by giving a recency minimum support (r_minsup); in a timestamp-based dataset, this is a minimum recency timestamp, and patterns with later timestamps are selected. For example, suppose ((a)<1>(b,c)<2>) and ((a,b)<3>(b,c)<5>) are frequent patterns, where <t> is the timestamp of each itemset. If r_minsup = 3, then only ((a,b)<3>(b,c)<5>) is selected, because it is generated at a later timestamp and is therefore considered a recent pattern.
Consider the pattern "after buying items a and b, the customer moves on to buy items b and c." The transaction in the sequence that buys items b and c must satisfy the recency constraint: the timestamp of the last itemset (b, c) must be ≥ r_minsup = 3. Formally, the Recency constraint is defined in Equation 7 (Pei et al., 2007):

$$C_{recency} \equiv recency(\alpha)\ \theta\ \Delta t \qquad (7)$$

where θ ∈ {≤, ≥} and Δt is a given integer. A sequence α satisfies the constraint if and only if …, and for all 1 < j ≤ len(α), …
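The recency check in the worked example can be sketched as below. Because the full condition of Equation 7 is not preserved, this sketch assumes, as the example does, that recency(α) is the timestamp of the pattern's last itemset; the function names are hypothetical.

```python
import operator


def recency(pattern):
    """Timestamp of the last element; pattern is a list of (itemset, timestamp)."""
    return pattern[-1][1]


def satisfies_recency(pattern, delta_t, theta=operator.ge):
    """C_recency: recency(alpha) theta delta_t, with theta in {<=, >=}."""
    return theta(recency(pattern), delta_t)


p1 = [({"a"}, 1), ({"b", "c"}, 2)]
p2 = [({"a", "b"}, 3), ({"b", "c"}, 5)]
r_minsup = 3
print([satisfies_recency(p, r_minsup) for p in (p1, p2)])  # [False, True]
```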

Compactness constraint
The Compactness constraint is specified by giving a compactness range (min_compactness, max_compactness). In sequence databases, each transaction in every sequence has a timestamp. The timestamp difference (in days) between the first and last transactions of a discovered sequential pattern must lie within this range; in particular, it must not exceed the maximum period (refer Equation 9) (Pei et al., 2007).
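The Compactness check reduces to comparing the span of one pattern occurrence with the given range. The sketch below assumes an occurrence is a time-ordered list of (itemset, timestamp) pairs with timestamps in days; the names and the festive-season data are hypothetical.

```python
def duration(instance):
    """Days between the first and last transactions of a pattern occurrence.

    instance: list of (itemset, timestamp) pairs in time order.
    """
    return instance[-1][1] - instance[0][1]


def satisfies_compactness(instance, min_compactness, max_compactness):
    return min_compactness <= duration(instance) <= max_compactness


# Hypothetical festive-season occurrence (timestamps in days)
instance = [({"sweets"}, 300), ({"garments"}, 302), ({"gold"}, 305)]
print(satisfies_compactness(instance, 0, 7))  # True (duration 5)
print(satisfies_compactness(instance, 0, 3))  # False
```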
Quantity constraint: A sequence $S_s = \langle (q_1(qty_1)), (q_2(qty_2)), \ldots, (q_m(qty_m)) \rangle$ is said to be a subsequence of S only if (1) the itemset S_s is a subsequence of S, S_s ∈ S, and (2) the number of items in S_s satisfies T_Qty, where T_Qty is the quantity constraint.
Total Monetary: The Total Monetary (TM) constraint requires the total monetary value of the items in a sequence to be at least a defined threshold value. It is formally represented in Equation 10:

$$C_{TM} \equiv TM(\alpha)\ \omega\ \Delta T \qquad (10)$$

where ω ∈ {≥}, ΔT is an integer threshold value and α is a sequence.
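The Quantity and Total Monetary checks can be sketched together. The exact formulas are not given above, so this sketch assumes that the quantity of a sequence is the sum of its item quantities and that TM(α) is the sum of qty × selling price; the tuple layout and function names are hypothetical.

```python
def total_quantity(seq):
    """seq: list of (item, qty, m_sold) tuples."""
    return sum(qty for _, qty, _ in seq)


def total_monetary(seq):
    """Assumed TM(alpha): sum of qty * selling price over the sequence."""
    return sum(qty * m_sold for _, qty, m_sold in seq)


def satisfies_quantity(seq, t_qty):
    return total_quantity(seq) >= t_qty


def satisfies_tm(seq, delta_T):
    return total_monetary(seq) >= delta_T  # omega fixed to '>='


seq = [("TV", 1, 500), ("DVD", 2, 60)]
print(total_quantity(seq), total_monetary(seq))  # 3 620
print(satisfies_quantity(seq, 2), satisfies_tm(seq, 700))  # True False
```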

Length constraint
The Length constraint is specified by giving a minimum length support (l_minsup): the length of a discovered pattern must be at least l_minsup. Suppose the pattern is <(a,b)(b,c)> and l_minsup = 2. The sequence is selected because its length (the number of itemsets) is 2 ≥ l_minsup = 2 (refer Equation 12) (Pei et al., 2007).

Item constraint
An Item constraint specifies a subset of items that should or should not be present in the patterns. Suppose the pattern is <(a), (bc)>. If the item constraint is b, then it is satisfied by this pattern. It takes the form of Equation 13 (Pei et al., 2007).
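The Length and Item constraints from the two examples above can be written as simple predicates on a pattern. This sketch assumes a pattern is a list of item sets and that length counts itemsets, as in the <(a,b)(b,c)> example; the names are illustrative.

```python
def satisfies_length(pattern, l_minsup):
    """Length counted in itemsets (elements), as in the <(a,b)(b,c)> example."""
    return len(pattern) >= l_minsup


def satisfies_item(pattern, item, must_contain=True):
    """Item constraint: item must (or must not) occur in some element."""
    present = any(item in element for element in pattern)
    return present if must_contain else not present


print(satisfies_length([{"a", "b"}, {"b", "c"}], l_minsup=2))        # True
print(satisfies_item([{"a"}, {"b", "c"}], "b"))                      # True
print(satisfies_item([{"a"}, {"b", "c"}], "d", must_contain=False))  # True
```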

Strength and working of proposed algorithm
(i) Being FP-Growth based, the proposed Constraint-based Prefix Span reduces candidate generation and works on projected prefix databases.
(ii) First, the algorithm scans the database and identifies frequent items. It then recursively grows prefixes, considering not only frequent items but also the Gap constraint, which applies to two adjacent timestamps (min_gap, max_gap), while the first and last timestamps of the prefix sequence must satisfy the Compactness constraint.
(iii) Sequences that do not satisfy these constraints are pruned at the pseudo-projection level. This reduces the database projection cost and the search space compared with using the support threshold alone, thereby increasing the efficiency of the proposed algorithm.
(iv) Incorporating the Length constraint limits sequence generation by pruning longer sequences at the projection level. Item and Recency act as post-processing parameters. Incorporating these constraints into conventional Prefix Span gives results that are more effective with respect to the user's interests.
(v) The proposed Emerging Pattern mining algorithm identifies patterns that are not in the limelight but have the potential to become strong in the near future. Such hidden patterns can be highlighted by a slight reduction of the support threshold boundary together with the recency constraint. Figure 1 shows the three stages of sequential pattern generation; dotted and solid lines denote backward and forward flow, respectively. Figure 2 shows the details of pre-processing of the input data. After the sequential input file is generated in the required format, it is used for sequence generation. Figures 3 and 4 show the flow for generating sequences that satisfy the frequency, compactness and gap constraints (FCG-sequences).

Procedure:
Step 1: Find length-1 patterns and remove irrelevant sequences.
(i) Scan the sequence database SDB once to count the f-support of each itemset.
(ii) Identify the frequent items as length-1 patterns.
(iii) Remove infrequent items and generate the pseudo-sequence database.
Step 2: Divide the set of sequential patterns into subsets.
Without considering constraint C, divide the complete set of sequential patterns into non-overlapping subsets according to the set of length-1 sequential patterns (prefixes).
Step 3: Construct projected databases and mine subsets recursively.
(i) Construct the projected database for each prefix.
(ii) Recursively generate the projected database for each new prefix and mine it to find local frequent patterns.
Step 4: Mine constraint-based sequential patterns from the projected databases and mine subsets recursively.

(i) Apply the gap and compactness constraints.
(a) In each projected database, for frequent adjacent items i and j with timestamps t_i and t_j in the sequence <(i, t_i), (j, t_j)>, the gap must satisfy min_gap ≤ t_j − t_i ≤ max_gap.
(b) In each projected database, for the first and last frequent items i and k with timestamps t_i and t_k in the sequence <(i, t_i), (j, t_j), (k, t_k)>, the duration must satisfy min_compact ≤ t_k − t_i ≤ max_compact.
Figure 5. Generation of complete constraint-based sequential pattern (C).
(ii) For each frequent item i, append it to the prefix to generate a new prefix such that (a) i can be assembled into the last element of the prefix to form a sequential pattern, or (b) <i> can be appended to the prefix to form a sequential pattern.
(iii) Recursively generate the projected database for each new prefix that satisfies the C_Compactness and C_Gap constraints to find local frequent patterns, where C_Compactness and C_Gap are the compactness and gap constraints.
Repeat steps 3 and 4 recursively.
iii. Apply the item constraint to the frequent patterns: once the global frequent patterns are generated, check whether the prescribed item i is present in each sequence; only sequences that contain it satisfy i_constraint.
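The procedure can be sketched as a small pattern-growth miner in Python. This is a simplified, hypothetical illustration, not the authors' Java implementation. It grows patterns only by sequence extension (case (b) of Step 4(ii), one item per pattern element). It keeps every occurrence of a prefix (its start time and last matched position) so that the gap and maximum-compactness checks can prune during projection. It applies minimum compactness, minimum length and the item constraint when a pattern is reported, and it omits recency, quantity and profit. Support is an absolute count of sequences, and the function and parameter names are assumptions.

```python
from collections import defaultdict


def constrained_prefixspan(sdb, min_sup, min_gap, max_gap, min_compact,
                           max_compact, min_len=1, required_item=None):
    """Hypothetical constraint-based pattern growth (sequence extensions only).

    sdb: list of sequences; each sequence is a time-ordered list of
    (timestamp, set_of_items). Returns {pattern_tuple: support_count}.
    """
    results = {}

    def record(prefix, occ):
        # Count a sequence only if at least one occurrence spans >= min_compact.
        sup = sum(1 for sid, embs in occ.items()
                  if any(sdb[sid][last][0] - start >= min_compact
                         for start, last in embs))
        if sup >= min_sup and len(prefix) >= min_len and (
                required_item is None or required_item in prefix):
            results[prefix] = sup  # post-processing: compactness, length, item

    def grow(prefix, occ):
        # occ: sequence id -> set of (start timestamp, index of last matched event)
        ext = defaultdict(lambda: defaultdict(set))
        for sid, embs in occ.items():
            seq = sdb[sid]
            for start, last in embs:
                t_last = seq[last][0]
                for j in range(last + 1, len(seq)):
                    t_j, items = seq[j]
                    if t_j - t_last > max_gap or t_j - start > max_compact:
                        break  # timestamps are sorted, so later events fail too
                    if t_j - t_last < min_gap:
                        continue
                    for item in items:
                        ext[item][sid].add((start, j))
        for item, new_occ in sorted(ext.items()):
            if len(new_occ) >= min_sup:  # frequency pruning in the projection
                new_prefix = prefix + (item,)
                record(new_prefix, new_occ)
                grow(new_prefix, new_occ)

    # Step 1: length-1 patterns; every occurrence is kept as a starting point.
    first = defaultdict(lambda: defaultdict(set))
    for sid, seq in enumerate(sdb):
        for idx, (t, items) in enumerate(seq):
            for item in items:
                first[item][sid].add((t, idx))
    for item, occ in sorted(first.items()):  # Steps 2-4: one subset per prefix
        if len(occ) >= min_sup:
            record((item,), occ)
            grow((item,), occ)
    return results


sdb = [
    [(1, {"a"}), (2, {"b"}), (4, {"c"})],
    [(1, {"a"}), (3, {"b"}), (4, {"c"})],
    [(1, {"a"}), (6, {"b"})],
]
# ('a', 'c') is absent: its gap of 3 days exceeds max_gap = 2.
print(constrained_prefixspan(sdb, min_sup=2, min_gap=1, max_gap=2,
                             min_compact=0, max_compact=5))
```

Tracking all occurrences, rather than only the first one as plain Prefix Span does, matters here. Under a gap constraint, a later occurrence of a prefix may be the only one that can be extended.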

Challenges of the proposed algorithm
I. There is no scientific study of how much the boundary value should be reduced in the proposed Emerging Patterns mining algorithm; only domain experts can decide how much reduction is required for a particular application.
II. Reducing the support threshold boundary generates an enormous number of patterns.

Justification for choosing Prefix Span for modification
The following figures show the execution time and memory usage of various SPM algorithms, namely GSP (Srikant & Agrawal, 1996), SPADE (Zaki, 2001), SPAM (Ayres et al., 2002), SPIRIT (Garofalakis et al., 1999) and the FP-Growth-based FreeSpan (Han, Dong et al., 2000) and Prefix Span (Pei et al., 2001). Experiments were conducted on four real-world datasets, namely Breast Cancer, Mushroom, Leviathan and MSNBC, and on six synthetic datasets generated by the IBM generator (see Note 5). The IBM-generated datasets are described in Tables 1 and 2, and the statistical description of the Breast Cancer dataset is given in Table 3. Figures 6, 7 and 8 show the performance of the SPM algorithms on the real-world Breast Cancer dataset.

Working environment
The algorithms were implemented in Java and tested on an Intel Core Duo processor with 2 GB of main memory under the Windows operating system.
The following observations can be made from the above experiments:
• SPADE takes 58% more time than Prefix Span, and this difference gradually narrows as the support value increases. Prefix Span takes approximately 18%-19% more execution time than SPAM, and GSP takes almost 68.6% more execution time than Prefix Span.
• Prefix Span and SPAM generate the same number of frequent sequences; likewise, SPADE and GSP generate the same number of sequences as each other. Prefix Span and SPAM generate 90% fewer sequences than SPADE and GSP.
• Prefix Span consumes less memory than the other algorithms because it generates a pseudo-projected database.

Comparative study of proposed algorithm with the traditional SPM algorithms
This section empirically compares the proposed Constraint-based Prefix Span algorithm with the traditional SPM algorithm Prefix Span (Pei et al., 2001) and the constraint-based SPM method RFM (Chen et al., 2009). The experimental study was performed on synthetic datasets with the constraint settings Gap = (1,3), Recency = 2 and Length = 3. Seven tests were performed to evaluate the proposed algorithm; the datasets and tests are described in Table 4.
The first test analyses efficiency and effectiveness in terms of execution time and number of patterns generated, respectively. Test II focuses on scalability analysis. Test III evaluates the importance of individual constraints in terms of pattern generation. Test IV examines customers' buying behaviour using the Length constraint. Tests V, VI and VII focus on the Recency and Gap constraints: how the number of generated patterns changes from a wider to a narrower Gap range and as the Recency timestamp value increases.
Test I was executed on six synthetic datasets, comparing the run times and pattern generation of three algorithms: the proposed constraint-based Prefix Span (RFCGL), RFM and Prefix Span. Results for C10S4T2.5N10 and C50S4T2.5N10 are shown here; the results for the remaining four datasets are similar (Refer Figure 9).
In Test II, six synthetic datasets were used to perform scalability analysis, varying |C| from 10K to 50K over a support range of 0.010%-0.025% with a confidence value of 30% (Refer Figure 10). Test III explored how the recency, frequency, compactness, gap and length constraints influence the generation of sequential patterns, comparing CFRGL patterns, CFRG* patterns, CFR** patterns, *FR** patterns and *F*** patterns, where *F*** patterns are the traditional sequential patterns. Table 5 lists the number of each of these five pattern types and their percentage relative to the traditional patterns (*F***) (Refer Figure 11). Test IV shows that only one-fourth of the patterns are recent and compact. Three-fourths of the patterns are traditional in the C10S4T2.5N10 and C20S4T2.5N10 datasets, and this share decreases by a further 5%-7% in the remaining datasets. Including all the constraints reduces the number of patterns drastically. Only 3%-5% of patterns have a length greater than 3, meaning that about 95% of the time purchases happen only once or twice in a sequence. On average, 36% of purchases happen only once, i.e., those customers do not go on to a further purchase. On average, 45.25% of the purchases that happen twice go on to a third or later purchase (Refer Figure 12). Test V shows that the number of sequential patterns decreases gradually from older to later timestamps, while the execution time remains the same across confidence and support values. Execution time decreases as the support count is reduced, and the number of generated sequential patterns decreases likewise (Refer Figures 13, 14 and 15). Test VI shows the number of patterns generated and the execution time for various support and confidence values. More patterns are generated for a larger gap and fewer for a smaller gap, while the execution time remains the same. Increasing the confidence value from 20% to 40% and from 40% to 60% generates 11% and 10% more patterns, respectively. Decreasing the support also reduces execution time by 75%.
(Refer Figure 16) In Test VII, support and confidence were fixed at 0.001 (0.1%) and 0.5 (50%), respectively, with gap = <1,3>. More patterns are generated for a wider range: about 1% more patterns are generated than for a narrower timestamp range, e.g., when <min,max> widens from <2,3> to <2,4> and from <3,3> to <3,4> (Refer Figure 17).

Experiment on emerging patterns (EPs)
The second experiment discovers patterns that lie on the boundary but are not discovered because of slightly low support. Such patterns are not in the limelight now but might be in future; because they have the potential to become strong, they are known as Emerging Patterns (EPs). EPs based on customers' buying behaviour were captured at support values of 0.1%, 0.08% and 0.05%. Almost 10%-12% of the patterns generated after reduction of the boundary value for the IBM-generated synthetic datasets are EPs. Of the additional patterns generated after reduction of the boundary value, 36% (C10S4T2.5N10), 18% (C30S4T2.5N10) and 23% (C50S4T2.5N10) are new patterns, known as forming-stage patterns. The rest were already generated at the normal support threshold and are rediscovered at the reduced boundary support with changed support values: 64% (C10S4T2.5N10), 82% (C30S4T2.5N10) and 77% (C50S4T2.5N10) are older patterns (Refer Figure 18).

Conclusion and future direction
The proposed Constraint-based Prefix Span algorithm is not restricted to frequency, the conventional Sequential Pattern Mining (SPM) parameter, but incorporates six more important parameters: Gap, Recency, Compactness/Duration, Profitability, Item and Length. Incorporating these constraints into the FP-growth-based Prefix Span yields more efficient and effective results by reducing the number of patterns. Concise patterns present relevant and precise results in terms of users' interests. Seven different experiments were performed on six IBM-generated synthetic datasets, comparing the run times and pattern generation of three algorithms: the proposed constraint-based Prefix Span, RFM and Prefix Span. The proposed algorithm is more efficient and effective, reducing the number of patterns generated while retaining the patterns of interest to the user. The experimental studies also reveal that the Duration/Compactness constraint generates fewer patterns, emphasising patterns that are active during a certain time span. The simulation study of Gap indicates that "fewer people move on to another purchase within a short time period, but an increase in the time gap yields more purchases." Recency and Profitability are important parameters for formulating marketing strategies, and the proposed algorithm addresses both. The proposed Constraint-based Prefix Span scales in terms of pattern generation and execution time as the number of customers varies from 10K to 50K. The simulation also captures Emerging Patterns (EPs) through reduction of the support boundary and the recency parameter, and the algorithm can identify patterns at their formative stage. Sequential-pattern-based clustering could help in customer segmentation, enabling marketers to target loyal customer groups.