Abstract
Efficiency and scalability have always been important concerns in data mining, and they matter even more in the multi-relational context, which is inherently more complex. The issue has received increasing attention over the last few years, and a considerable number of theoretical results, algorithms, and implementations have been presented that explicitly aim to improve the efficiency and scalability of multi-relational data mining approaches. This article attempts to present a structured overview of that work.
Index Terms
- Scalability and efficiency in multi-relational data mining