Scalability and efficiency in multi-relational data mining

Published: 01 July 2003

Abstract

Efficiency and scalability have always been important concerns in data mining, and are even more so in the multi-relational context, which is inherently more complex. The issue has received increasing attention over the last few years, and many theoretical results, algorithms and implementations have been presented that explicitly aim to improve the efficiency and scalability of multi-relational data mining approaches. In this article we attempt to present a structured overview of this work.

Published in

ACM SIGKDD Explorations Newsletter, Volume 5, Issue 1
July 2003
101 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/959242

Copyright © 2003 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
