ABSTRACT
Conditional functional dependencies (CFDs) have been shown to be more effective than traditional functional dependencies (FDs) for checking data consistency, and quite a few algorithms exist for mining CFDs from a static database. In practice, however, records in a database are frequently added, deleted, or modified, so incremental algorithms are preferable for dynamically updated databases. To our knowledge, studies of incremental algorithms for mining CFDs are rare. In this paper, we propose an incremental algorithm, iCFDMiner, based on the batch algorithm CFDMiner, which is widely used for discovering constant CFDs in static databases. We prove that iCFDMiner scales well with the size of the database and with all three update operations (adding, deleting, and modifying). Experiments show that iCFDMiner outperforms CFDMiner in running time and space consumption in most cases.
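To make the idea of incremental maintenance concrete, the following is a minimal Python sketch. It is not iCFDMiner or CFDMiner. It illustrates only the general principle that support counts can be updated per record rather than recomputed by a full rescan. The class name `IncrementalCounts`, its methods, the `max_size` bound, and the sample attributes (`CC`, `AC`, `city`) are all assumptions introduced for illustration. A constant CFD is treated here as holding when every record matching the left-hand pattern also matches the right-hand constant.

```python
from collections import Counter
from itertools import combinations


class IncrementalCounts:
    """Hypothetical helper: keeps support counts of attribute-value itemsets
    up to date under inserts, deletes, and modifications (illustration only)."""

    def __init__(self, max_size=3):
        # Only itemsets with at most max_size attribute-value pairs are tracked.
        self.max_size = max_size
        self.counts = Counter()

    def _itemsets(self, record):
        # Enumerate every itemset of the record up to max_size pairs.
        items = sorted(record.items())
        for size in range(1, self.max_size + 1):
            for combo in combinations(items, size):
                yield frozenset(combo)

    def insert(self, record):
        for itemset in self._itemsets(record):
            self.counts[itemset] += 1

    def delete(self, record):
        # Assumes the record was inserted earlier.
        for itemset in self._itemsets(record):
            self.counts[itemset] -= 1
            if self.counts[itemset] == 0:
                del self.counts[itemset]

    def modify(self, old_record, new_record):
        # A modification is handled as a delete followed by an insert.
        self.delete(old_record)
        self.insert(new_record)

    def holds(self, lhs, rhs, min_support=1):
        """True if at least min_support records match lhs and all of them
        also match rhs. len(lhs) + len(rhs) must not exceed max_size."""
        lhs_set = frozenset(lhs.items())
        both = lhs_set | frozenset(rhs.items())
        support = self.counts[lhs_set]
        return support >= min_support and self.counts[both] == support


db = IncrementalCounts()
db.insert({"CC": "44", "AC": "131", "city": "EDI"})
db.insert({"CC": "44", "AC": "131", "city": "EDI"})
db.insert({"CC": "01", "AC": "908", "city": "MH"})

# Candidate constant CFD: (CC = 44, AC = 131) -> (city = EDI)
rule_holds = db.holds({"CC": "44", "AC": "131"}, {"city": "EDI"}, min_support=2)
```

Each update touches only the itemsets of the affected record, so the cost of an update is independent of the number of records stored. This sketch enumerates all small itemsets, which is far cruder than an approach built on closed itemsets and generators.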
Index Terms
- iCFDMiner: An Incremental Algorithm of Mining Constant CFDs from Dynamic Databases