Sequential pattern mining is a data mining technique for finding frequent sequential patterns in a sequence database. Existing sequential pattern mining methods fall into two broad categories [2]: Apriori-like methods [7][8][9][23] and pattern-growth methods [10][12][16][17][20]. The time-interval probability between two events in a sequential pattern gives decision makers additional information for analyzing and predicting how correlated patterns behave. However, previous studies have not developed a technique that discovers this probability during the pattern mining process itself. To provide this information, we extend the PrefixSpan algorithm into a new sequential pattern mining algorithm, PCTP (PrefixSpan Considering Time Probability). By applying a minimum probability constraint, PCTP can also reduce the number of patterns generated during mining. We compare PCTP experimentally with existing sequential pattern mining methods, and the results show that PCTP compensates for shortcomings of those methods. Our performance study also shows that PCTP is an effective approach for condensing correlated patterns and for supplying additional time-interval probability information for sequential patterns.
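To make the notion of time-interval probability concrete, the following is a minimal Python sketch, not the PCTP algorithm itself. It assumes each sequence is a time-ordered list of (timestamp, event) pairs, measures the gap from the first occurrence of one event to the next occurrence of another, and estimates the probability of each interval as its share of the supporting sequences. The function names, the interval boundaries, and the toy database are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def first_gap(seq, first, second):
    """Gap between the first `first` event and the next `second` event.

    `seq` is assumed to be sorted by timestamp. Returns None when the
    sequence does not contain `first` followed later by `second`.
    """
    start = None
    for t, item in seq:
        if start is None:
            if item == first:
                start = t
        elif item == second:
            return t - start
    return None


def interval_probabilities(db, first, second, bins, min_prob=0.0):
    """Estimate P(gap falls in interval) for the two-event pattern <first, second>.

    `bins` is a list of half-open intervals (lo, hi). Intervals whose
    estimated probability is below `min_prob` are dropped, mirroring the
    idea of a minimum probability constraint (an illustrative assumption).
    """
    counts = Counter()
    support = 0  # number of sequences that contain the pattern
    for seq in db:
        gap = first_gap(seq, first, second)
        if gap is None:
            continue
        support += 1
        for lo, hi in bins:
            if lo <= gap < hi:
                counts[(lo, hi)] += 1
                break
    if support == 0:
        return {}
    return {b: c / support for b, c in counts.items() if c / support >= min_prob}


# Hypothetical toy database: four sequences of (timestamp, event) pairs.
db = [
    [(1, "a"), (3, "b")],            # gap 2
    [(0, "a"), (2, "c"), (4, "b")],  # gap 4
    [(5, "a"), (6, "b")],            # gap 1
    [(1, "b"), (2, "c")],            # pattern <a, b> absent
]
bins = [(0, 3), (3, 7)]

all_intervals = interval_probabilities(db, "a", "b", bins)
kept = interval_probabilities(db, "a", "b", bins, min_prob=0.5)
```

In this toy data, three sequences support the pattern and two of them have a gap in the interval [0, 3), so only that interval survives a minimum probability of 0.5. An actual extension of PrefixSpan would have to compute such statistics while growing patterns from projected databases, which this sketch does not attempt.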