iCFDMiner: An Incremental Algorithm of Mining Constant CFDs from Dynamic Databases

Authors:
Jinling Zhou (Jiuquan Satellite Launch Centre, Jiuquan and Army Engineering University, Nanjing)
Qian Cheng (Jiuquan Satellite Launch Centre, Jiuquan)
Shufang Li (Jiuquan Satellite Launch Centre, Jiuquan)

ICCDE '18: Proceedings of the 2018 International Conference on Computing and Data Engineering, May 2018, Pages 15–21. https://doi.org/10.1145/3219788.3219808

Published: 04 May 2018


ABSTRACT

Conditional functional dependencies (CFDs) have been shown to be more effective than traditional functional dependencies (FDs) for checking data consistency, and a number of algorithms exist for mining CFDs from static databases. In practice, however, records are frequently added, deleted, or modified, so incremental algorithms are preferable for dynamically updated databases. To our knowledge, incremental algorithms for mining CFDs have rarely been studied. In this paper, we propose iCFDMiner, an incremental algorithm built on CFDMiner, a popular batch algorithm for discovering constant CFDs in static databases. We prove that iCFDMiner scales well with database size and under all three update operations (insertion, deletion, and modification). Experiments show that iCFDMiner outperforms CFDMiner in running time and space consumption in most cases.
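To make the abstract's terms concrete, the following Python sketch mines constant CFDs by brute force. It uses the free-itemset and closure characterization on which CFDMiner is built, as we understand it from Fan et al., "Discovering Conditional Functional Dependencies" (IEEE TKDE, 2011). Under that characterization, a left-reduced constant CFD (X → A, (t_p ‖ a)) with support at least k corresponds to a k-frequent free itemset (X, t_p) whose closure contains (A, a), provided no proper subset of (X, t_p) already has (A, a) in its closure. This is not iCFDMiner. The abstract does not describe iCFDMiner's update strategy, so the sketch handles updates naively: it re-mines after an insertion, a deletion, or a modification, and it treats a modification as a deletion followed by an insertion. The toy relation (country code CC, area code AC, city CT), the threshold k = 2, and all function names (matches, support, closure, mine_constant_cfds, apply_update, fmt) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy relation (assumption, not from the paper): attribute -> constant.
RELATION = [
    {"CC": "01", "AC": "908", "CT": "MH"},
    {"CC": "01", "AC": "908", "CT": "MH"},
    {"CC": "01", "AC": "212", "CT": "NYC"},
    {"CC": "44", "AC": "131", "CT": "EDI"},
    {"CC": "44", "AC": "131", "CT": "EDI"},
]


def matches(row, itemset):
    """True if the row agrees with every (attribute, value) pair in the itemset."""
    return all(row[attr] == val for attr, val in itemset)


def support(rows, itemset):
    """Number of tuples that contain the itemset."""
    return sum(1 for row in rows if matches(row, itemset))


def closure(rows, itemset):
    """All (attribute, value) pairs shared by every tuple that contains the itemset."""
    covered = [row for row in rows if matches(row, itemset)]
    if not covered:
        return frozenset(itemset)
    common = set(covered[0].items())
    for row in covered[1:]:
        common &= set(row.items())
    return frozenset(common)


def mine_constant_cfds(rows, k):
    """Brute-force sketch (not CFDMiner's actual search): left-reduced constant
    CFDs (X -> A, (tp || a)) with support >= k, returned as ((X, tp), (A, a)).

    Checking only the immediate subsets of an itemset suffices, because
    support shrinks and closures grow as itemsets get larger.
    """
    attrs = sorted(rows[0]) if rows else []
    cfds = set()
    for size in range(len(attrs) + 1):
        for xs in combinations(attrs, size):
            # Only itemsets over xs that occur in some tuple can reach support k.
            candidates = {frozenset((a, row[a]) for a in xs) for row in rows}
            for x in candidates:
                sup = support(rows, x)
                if sup < k:
                    continue
                subsets = [x - {item} for item in x]
                # (X, tp) must be free: every immediate subset has larger support.
                if any(support(rows, y) == sup for y in subsets):
                    continue
                sub_closures = [closure(rows, y) for y in subsets]
                for item in closure(rows, x) - x:
                    # Left-reduced: no smaller LHS already determines (A, a).
                    if not any(item in c for c in sub_closures):
                        cfds.add((x, item))
    return cfds


def apply_update(rows, op, old=None, new=None):
    """Naive baseline (not iCFDMiner's method): insert, delete, or modify
    (= delete old, then insert new), returning a new relation."""
    rows = list(rows)  # copy; the caller's relation is left untouched
    if op in ("delete", "modify"):
        rows.remove(old)  # drops the first equal tuple; ValueError if absent
    if op in ("insert", "modify"):
        rows.append(new)
    return rows


def fmt(cfd):
    """Readable form, e.g. ([CT=NYC] -> CC=01)."""
    x, (attr, val) = cfd
    lhs = ", ".join(f"{a}={v}" for a, v in sorted(x))
    return f"([{lhs}] -> {attr}={val})"


if __name__ == "__main__":
    before = mine_constant_cfds(RELATION, k=2)
    updated = apply_update(RELATION, "insert", new={"CC": "01", "AC": "908", "CT": "NYC"})
    after = mine_constant_cfds(updated, k=2)
    for cfd in sorted(before - after, key=fmt):
        print("invalidated:", fmt(cfd))
    for cfd in sorted(after - before, key=fmt):
        print("new:        ", fmt(cfd))
```

On this toy relation with k = 2, inserting the tuple (CC=01, AC=908, CT=NYC) invalidates ([AC=908] -> CT=MH) and produces the new CFD ([CT=NYC] -> CC=01). A single update can therefore both break and create constant CFDs, which is why a batch miner must be rerun after every change. Avoiding that full recomputation is the motivation for an incremental algorithm. One observation an incremental approach can exploit is that inserting or deleting a tuple t changes the support of exactly those itemsets contained in t.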

Index Terms

1. Information systems
  1. Data management systems
    1. Information integration
      1. Data cleaning

Published in

ICCDE '18: Proceedings of the 2018 International Conference on Computing and Data Engineering
May 2018
116 pages
ISBN: 9781450363938
DOI: 10.1145/3219788

Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 May 2018
Author Tags
Closed itemsets
Data quality
Free itemsets
Functional Dependency
Qualifiers
- research-article
- Research
- Refereed limited