Abstract
In geographic association rule mining, many patterns are either redundant or express well-known geographic domain associations that are already explicitly represented in knowledge resources such as geographic database schemas and geo-ontologies. Existing spatial association rule mining algorithms are Apriori-like and therefore generate a large number of redundant patterns. For non-spatial data, closed frequent pattern mining was introduced to remove redundant patterns. This approach, however, does not guarantee the elimination of both redundant patterns and well-known geographic dependences when mining geographic databases. This paper presents a novel method that prunes both by pushing semantics into the pattern mining task. Experiments with real geographic databases demonstrate a significant reduction in the total number of patterns, as well as the efficiency of the method.
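To make the two kinds of pruning concrete, the following is a minimal sketch on toy data, not the authors' algorithm. It assumes transactions are sets of spatial predicates, that known dependences are given as pairs of items, and that an itemset is dropped when it contains both members of a known pair. The item names, the `known_pairs` list, the function names, and the brute-force enumeration are all hypothetical choices made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: return {itemset: support count}."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in transactions if s <= t)
            if count >= min_support:
                result[s] = count
                found = True
        if not found:
            break  # no frequent k-itemset, so no larger one can be frequent
    return result


def drop_known_dependences(freq, known_pairs):
    """Discard itemsets containing both members of a known dependence (assumed rule)."""
    return {s: c for s, c in freq.items()
            if not any(a in s and b in s for a, b in known_pairs)}


def closed_only(freq):
    """Keep itemsets that have no proper superset with the same support."""
    return {s: c for s, c in freq.items()
            if not any(s < t and c == freq[t] for t in freq)}


# Hypothetical toy data: one row per reference feature (e.g. a district).
transactions = [
    frozenset({"gas_station", "street", "school"}),
    frozenset({"gas_station", "street", "school"}),
    frozenset({"gas_station", "street"}),
    frozenset({"street", "school", "river"}),
]
# Assumed background knowledge: gas stations always lie on streets.
known_pairs = [("gas_station", "street")]

freq = frequent_itemsets(transactions, min_support=2)
pruned = drop_known_dependences(freq, known_pairs)
closed = closed_only(pruned)

for itemset, count in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

In this sketch the dependence filter runs before the closedness check, so that a pattern is not judged redundant relative to a superset that is itself discarded. Whether the published method orders the steps this way is not stated in the abstract.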
Acknowledgment
We thank CAPES and CNPQ, which partially funded this research; Procempa, for providing the real geographic databases; and the anonymous reviewers for their comments and suggestions.
Cite this article
Bogorny, V., Valiati, J.F. & Alvares, L.O. Semantic-based pruning of redundant and uninteresting frequent geographic patterns. Geoinformatica 14, 201–220 (2010). https://doi.org/10.1007/s10707-009-0082-7
DOI: https://doi.org/10.1007/s10707-009-0082-7