Abstract
Neural networks, despite their empirically proven abilities, have been little used for the refinement of existing knowledge because this task requires a three-step process. First, knowledge must be inserted into a neural network. Second, the network must be refined. Third, the refined knowledge must be extracted from the network. We have previously described a method for the first step of this process. Standard neural learning techniques can accomplish the second step. In this article, we propose and empirically evaluate a method for the final, and possibly most difficult, step. Our method efficiently extracts symbolic rules from trained neural networks. The four major results of empirical tests of this method are that the extracted rules 1) closely reproduce the accuracy of the network from which they are extracted; 2) are superior to the rules produced by methods that directly refine symbolic rules; 3) are superior to those produced by previous techniques for extracting rules from trained neural networks; and 4) are “human comprehensible.” Thus, this method demonstrates that neural networks can be used to effectively refine symbolic knowledge. Moreover, the rule-extraction technique developed herein contributes to the understanding of how symbolic and connectionist approaches to artificial intelligence can be profitably integrated.
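The extraction step described above can be illustrated with a toy sketch. The snippet below is not the paper's algorithm; it assumes a single perceptron-style unit that fires when its weighted input sum exceeds zero and whose significant weights are roughly equal (the clustered case the paper's M-of-N rules target). The function name `extract_m_of_n`, the pruning threshold `tol`, and the example weights are all illustrative.

```python
import math

def extract_m_of_n(weights, bias, antecedents, tol=0.25):
    """Toy M-of-N rule extraction for one thresholded unit.

    Assumes the unit fires when sum(w_i * x_i) + bias > 0 with boolean
    inputs, and that the surviving weights are roughly equal. Weights whose
    magnitude falls below tol * (largest magnitude) are pruned as
    insignificant -- a crude stand-in for clustering the weights.
    """
    w_max = max(abs(w) for w in weights)
    # keep only significant links, pairing each weight with its antecedent
    kept = [(w, a) for w, a in zip(weights, antecedents) if abs(w) >= tol * w_max]
    w_avg = sum(w for w, _ in kept) / len(kept)
    n = len(kept)
    # the unit fires once enough antecedents are true: m * w_avg + bias > 0,
    # so the smallest such integer m is floor(-bias / w_avg) + 1
    m = max(1, math.floor(-bias / w_avg) + 1) if w_avg > 0 else n
    return m, [a for _, a in kept]

# Example: three near-equal weights and one negligible one yield a 2-of-3 rule
m, ants = extract_m_of_n([0.9, 1.1, 1.0, 0.05], bias=-1.8,
                         antecedents=["a", "b", "c", "d"])
print(f"{m}-of-{ants}")
```

With the example unit, two true antecedents give 2.0 - 1.8 > 0 while one gives 1.0 - 1.8 < 0, so the extracted rule "at least 2 of {a, b, c}" reproduces the unit's behavior on boolean inputs.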
Cite this article
Towell, G.G., Shavlik, J.W. Extracting Refined Rules from Knowledge-Based Neural Networks. Machine Learning 13, 71–101 (1993). https://doi.org/10.1023/A:1022683529158