Characterization of bio-chemical signals by inductive logic programming

https://doi.org/10.1016/S0950-7051(01)00129-0

Abstract

This paper presents a methodology for designing a discrete-event system (DES) for the on-line supervision of a biotechnological process. The DES is synthesized by applying the wavelet transform and inductive logic programming to the measured signals, subject to validation by an expert biotechnologist.

Introduction

Despite the complexity of biotechnological processes, biotechnologists are able to identify normal and abnormal situations and to take suitable actions accordingly. Process experts draw such conclusions by analyzing a set of measured signals collected from the plant. The lack of satisfactory mathematical models prevents model-based approaches from being used in supervisory tasks, so other strategies must be applied. This suggests that, in spite of the lack of process models, measured data can be used instead to develop the supervisory system.

One of the main contributions of artificial intelligence (AI) to biological or chemical processes has been the classification of an increasing amount of data. Can we do more than that? Can an AI program help discover hidden rules in a process as complex as this one? In fact, even if we can predict, for instance, the mutagenicity of a given molecule or the secondary structure of proteins with a high degree of accuracy, this is not sufficient to give deep insight into the observed behavior.

The main process we are concerned with in this paper is a bio-reaction, namely the dynamical behavior of yeast during chemostat cultivation. Starting from the observation of a set of time-varying parameters, our final aim is to extract logical rules that infer the physiological state of the yeast. In doing so, we obtain not only a better understanding of the system's evolution but also the possibility of integrating the inferred rules into a full on-line control process. The first step is to capture and analyze the parameters given by the sensors. These signals must be processed before they are given to the logic machine. Two things have to be done: first, denoise the signals; second, compute the local maxima of the resulting curves. In fact, we are more interested in the variations of the signals than in their instantaneous values. We use a method from wavelet theory [1], which tends to replace classical Fourier analysis. Among the advantages of the wavelet transform (WT), we can cite:

  1. The possibility of decomposing a signal into multiple resolutions or scales. This property makes it possible to exhibit the self-similarity of a given signal at different magnifications.

  2. The localization of wavelets in both time and frequency, which makes it possible to analyze local features (this is not the case for the Fourier transform).

  3. The excellent data compression rate of wavelets.

At the end of this purely analytic treatment, we have a set of clean values for each critical parameter.
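The second preprocessing step, locating local maxima and reading off the direction of variation, can be illustrated with a minimal sketch. It assumes an already denoised, regularly sampled series and uses a plain neighbour comparison rather than the wavelet-based method of the paper; the function names, the sample values, and the tolerance `eps` are hypothetical.

```python
def local_maxima(values):
    """Indices whose value is strictly greater than both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


def variations(values, eps=0.05):
    """Qualitative trend between consecutive samples.

    eps is an assumed tolerance below which a change counts as 'steady'.
    """
    symbols = []
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if delta > eps:
            symbols.append("increases")
        elif delta < -eps:
            symbols.append("decreases")
        else:
            symbols.append("steady")
    return symbols


# Hypothetical denoised samples of one measured parameter.
clean = [0.10, 0.40, 0.90, 0.70, 0.30, 0.50, 0.80, 0.60]
peaks = local_maxima(clean)   # sample indices of the local maxima
trend = variations(clean)     # one symbol per time step
print(peaks, trend)
```

Symbols of this kind, indexed by the sample number, are one possible discrete input for a logic-based learner, since they describe how a signal varies rather than its instantaneous value.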

Our idea is then to apply inductive logic programming (ILP) to derive, from a finite sample set of numerical observations, a number of logical formulae that organize the knowledge using causal relationships. Inductive logic programming is a sub-field of machine learning based on a first-order logic framework. So, instead of giving a mathematical formula (for instance a differential equation) or a statistical prediction involving the different parameters, we provide a set of implicative logical formulae.

Some of these formulae can generally be inferred by a human expert, which provides a partial validation of the mechanism. But there remain some new formulae that express a previously unknown causal relation: in that sense, this is a kind of knowledge discovery. As far as we know, one of the novelties of our work is the introduction of a time dimension to simulate the dynamic process. In logic, a time variable is generally not considered, except in some specific modal logics. We therefore model time with an integer-valued variable (Fig. 1).
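To make the idea of an integer-valued time variable concrete, here is a small sketch of how a candidate implicative rule could be checked against time-stamped facts. It is not the ILP system used in the paper: the predicates, signal names (`co2`, `o2`), time steps, and expert labels are all hypothetical, and the code only tests the coverage of a rule that is given, it does not search for one.

```python
# Ground facts: (predicate, signal, integer time step). All values are made up.
facts = {
    ("increases", "co2", 1), ("decreases", "o2", 1),
    ("increases", "co2", 2), ("decreases", "o2", 2),
    ("increases", "co2", 3), ("increases", "o2", 3),
    ("decreases", "co2", 4), ("decreases", "o2", 4),
}

# Time steps the expert labelled as being (or not being) in the target state.
positives = [1, 2]
negatives = [3, 4]


def body_holds(body, facts, t):
    """True if every body literal holds at time t plus its integer offset."""
    return all((pred, sig, t + dt) in facts for pred, sig, dt in body)


def coverage(body, facts, positives, negatives):
    """Number of positive and negative examples covered by the rule body."""
    pos = sum(body_holds(body, facts, t) for t in positives)
    neg = sum(body_holds(body, facts, t) for t in negatives)
    return pos, neg


# Candidate rule: state(T) :- increases(co2, T), decreases(o2, T).
same_step = [("increases", "co2", 0), ("decreases", "o2", 0)]
# Candidate rule using the previous time step: state(T) :- increases(co2, T-1).
lagged = [("increases", "co2", -1)]

print(coverage(same_step, facts, positives, negatives))  # (2, 0)
print(coverage(lagged, facts, positives, negatives))     # (1, 2)
```

A rule that covers all positive examples and no negative ones, like the first candidate on this toy data, is the kind of hypothesis an ILP learner looks for; the integer offset is what lets a rule body refer to earlier time steps.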

Section snippets

Yeast production application: example

Yeasts are very well-studied micro-organisms [2], and today micro-organisms such as Saccharomyces cerevisiae, which is the object of this study, are widely used in various sectors of the biomedical and biotechnology industries.

Saccharomyces cerevisiae is studied under an oxidative regime (i.e. no ethanol production) to produce yeast in a bioreactor in a laboratory environment. Two different procedures are applied: a batch procedure followed by a continuous procedure. The

Statistical estimation

It is not new that noise generally affects the quality of the observed data. The standard model used to describe a noisy signal is:

    X[n] = f[n] + W[n]

where f is the deterministic signal and W[n] is Gaussian white noise of variance σ².

To estimate f by f̂ = f̂(X), two methods have been tested:

  1. linear methods, whose main defect is that they blur the shape of the signals,

  2. a non-linear method: the wavelet transform.
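The contrast between the two kinds of estimator can be sketched on a synthetic signal following the model X[n] = f[n] + W[n]. Everything specific in this example is an assumption made for illustration and is not taken from the paper: the step-shaped f, the known noise level `sigma`, a moving average as the linear method, the Haar wavelet, soft thresholding, and the universal threshold σ·sqrt(2 ln N).

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """Multi-level orthonormal Haar transform; len(x) must be a power of two."""
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details  # details ordered from finest to coarsest


def haar_inverse(approx, details):
    """Inverse of haar_forward."""
    signal = list(approx)
    for level in reversed(details):
        rebuilt = []
        for s, d in zip(signal, level):
            rebuilt += [(s + d) / SQRT2, (s - d) / SQRT2]
        signal = rebuilt
    return signal


def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(x, sigma):
    """Non-linear estimate: threshold the detail coefficients, then invert."""
    t = sigma * math.sqrt(2.0 * math.log(len(x)))  # assumed universal threshold
    approx, details = haar_forward(x)
    shrunk = [[soft_threshold(c, t) for c in level] for level in details]
    return haar_inverse(approx, shrunk)


def moving_average(x, width=5):
    """Linear estimate: centered window, truncated at the borders."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


random.seed(0)
sigma = 0.1                                   # assumed known noise level
f = [0.0] * 32 + [1.0] * 32                   # synthetic signal with a sharp edge
X = [v + random.gauss(0.0, sigma) for v in f]  # X[n] = f[n] + W[n]

f_hat_linear = moving_average(X)
f_hat_wavelet = wavelet_denoise(X, sigma)
print(mse(X, f), mse(f_hat_linear, f), mse(f_hat_wavelet, f))
```

On the noise-free step, the moving average turns the jump into a ramp several samples wide, which is the blurring mentioned in item 1, whereas the Haar estimate keeps the jump between two adjacent samples because only small detail coefficients are removed.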

In signal classification it is important to find a representation of the signal which is translation

Knowledge based methodology

In learning there is a constant interaction between the creation and the recognition of concepts. The goal of the methodology is to obtain a model of the process that can be used in a supervisory system for condition monitoring. The complexity of this model requires data mining techniques to be combined with expert knowledge. When only expert knowledge is used to identify process situations or states, any of the following situations can arise:

  1. the expert can express only a partial

Discussion and conclusion

In this paper, we apply logical tools to obtain explanatory rules concerning the behavior of a bio-reactor. The ability to incorporate background knowledge and re-use past experience makes ILP a very effective solution for our problem. Instead of simply giving classification results, we obtain logical rules establishing causal relationships between different parameters of the bio-machinery. Among these rules, some are validated by expert knowledge, but some new ones have been

References (6)

  • A. Arneodo, F. Argoul, E. Bacry, J. Elezgaray, J.F. Muzy. Ondelettes, Multifractales et Turbulence. Diderot Editeur,...
  • K. Konstantinov et al. Physiological state control of fermentation processes. Biotechnology and Bioengineering (1989)
  • J. Rocca. Technical report of Laboratoire d'automatisme et architecture des systèmes,...
There are more references available in the full text version of this article.
