Abstract
In this paper, we present an experiment on knowledge discovery in chemical reaction databases. Chemical reactions are the main elements on which synthesis in organic chemistry relies, which is why chemical reaction databases are of primary importance. From a problem-solving point of view, synthesis in organic chemistry must be considered at several levels of abstraction: mainly a strategic level, where general synthesis methods are involved, and a tactical level, where actual chemical reactions are applied. The research work presented in this paper aims at discovering general synthesis methods from chemical reaction databases in order to design generic and reusable synthesis plans. The knowledge discovery process relies on levelwise frequent itemset search and association rule extraction, and also on chemical knowledge that is involved in every step of the process. Moreover, the overall process is supervised by a domain expert. The principles of this original experiment on mining chemical reaction databases and its results are detailed and discussed.
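As a rough illustration of the two data mining steps named in the abstract, the following minimal sketch runs a levelwise (Apriori-style) frequent itemset search and then extracts association rules from the result. It is not the authors' implementation: the function names, the item labels such as `destroyed:alcohol`, the toy reactions, and the thresholds are all hypothetical, chosen only to show the mechanics on a reaction-like encoding.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Levelwise (Apriori-style) search. Returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data.
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(kept)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (downward closure).
        k += 1
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        level = {
            c for c in joined
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Returns (lhs, rhs, confidence) triples derived from frequent itemsets."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its support count is already in the dictionary.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical toy data: each reaction is a set of illustrative labels.
reactions = [
    {"destroyed:alcohol", "formed:ketone"},
    {"destroyed:alcohol", "formed:ketone", "unchanged:ester"},
    {"destroyed:alcohol", "formed:aldehyde"},
    {"destroyed:alkene", "formed:epoxide"},
]

frequent = frequent_itemsets(reactions, min_support=2)
rules = association_rules(frequent, min_confidence=0.6)
for lhs, rhs, confidence in rules:
    print(sorted(lhs), "=>", sorted(rhs), round(confidence, 2))
```

In this sketch the support threshold is an absolute count and confidence is the ratio of the itemset's support to the support of the rule's left-hand side. The chemical knowledge and expert supervision that the abstract describes are not modeled here.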
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berasaluce, S., Laurenço, C., Napoli, A., Niel, G. (2004). An Experiment on Knowledge Discovery in Chemical Databases. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. Lecture Notes in Computer Science, vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_7
DOI: https://doi.org/10.1007/978-3-540-30116-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5