Discriminating Different Classes of Toxicants by Transcript Profiling

Male rats were treated with various model compounds or the appropriate vehicle controls. Most substances were either well-known hepatotoxicants or showed hepatotoxicity during preclinical testing. The aim of the present study was to determine if biological samples from rats treated with various compounds can be classified based on gene expression profiles. In addition to gene expression analysis using microarrays, a complete serum chemistry profile and liver and kidney histopathology were performed. We analyzed hepatic gene expression profiles using a supervised learning method (support vector machines; SVMs) to generate classification rules and combined this with recursive feature elimination to improve classification performance and to identify a compact subset of probe sets with potential use as biomarkers. Two different SVM algorithms were tested, and the models obtained were validated with a compound-based external cross-validation approach. Our predictive models were able to discriminate between hepatotoxic and nonhepatotoxic compounds. Furthermore, they predicted the correct class of hepatotoxicant in most cases. We provide an example showing that a predictive model built on transcript profiles from one rat strain can successfully classify profiles from another rat strain. In addition, we demonstrate that the predictive models identify nonresponders and are able to discriminate between gene changes related to pharmacology and toxicity. This work confirms the hypothesis that compound classification based on gene expression data is feasible.
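For readers who want to reproduce this kind of workflow, the following is a minimal sketch of the analysis strategy summarized above, written with scikit-learn rather than the tools used in the original study. The expression matrix X, the class labels y, and the per-sample compound identifiers compounds are hypothetical placeholders; the compound-based external cross-validation is approximated here with leave-one-group-out splitting, so that all replicates of one compound are held out together.

    import numpy as np
    from sklearn.feature_selection import RFE
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 500))             # placeholder: 60 arrays x 500 probe sets
    y = rng.integers(0, 2, size=60)            # placeholder labels (e.g. hepatotoxic vs. not)
    compounds = rng.integers(0, 10, size=60)   # placeholder compound ID for each sample

    # Linear SVM wrapped in recursive feature elimination (SVM-RFE):
    # RFE repeatedly drops the probe sets with the smallest |w_i|.
    model = Pipeline([
        ("rfe", RFE(SVC(kernel="linear", C=1.0), n_features_to_select=50, step=0.1)),
        ("svm", SVC(kernel="linear", C=1.0)),
    ])

    # Compound-based external cross-validation: the held-out compound is
    # never seen during training or feature selection.
    scores = cross_val_score(model, X, y, groups=compounds, cv=LeaveOneGroupOut())
    print(f"mean accuracy: {scores.mean():.2f}")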

In its simplest form, the SVM algorithm works by determining a hyperplane

    f(x) = 〈w, x〉 + b    (1)

that separates positive from negative examples directly in the input space of the given data.
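As a concrete illustration of equation (1), with arbitrary example values not taken from the study, the decision rule reduces to evaluating one dot product per sample:

    import numpy as np

    w = np.array([0.8, -0.5, 0.1])   # assumed weight (normal) vector of the hyperplane
    b = 0.25                         # assumed bias term
    x = np.array([1.0, 2.0, -1.0])   # one example in input space

    f = w @ x + b                    # equation (1): signed value, proportional to the
    print("+1" if f > 0 else "-1")   # distance from the hyperplane (up to ||w|| scaling)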
For the linearly separable problem, it can be shown that it is always possible to find an optimal separating hyperplane in the sense that the margin (i.e., the distance between the plane and the closest-lying data points) is maximized. Just as intuition might suggest, this approach also yields a solution that minimizes the expected classification error when the classifier is applied to an independent test set. Furthermore, it can be proven that the normal vector of the optimal separating hyperplane can always be expressed in the basis of the given training patterns, i.e.,

    w = Σ_i α_i y_i x_i    (2)

It is a remarkable observation that in almost all realistic data sets a large fraction, if not most, of the coefficients α_i reach a value of zero in the process of optimization. This means that the solution depends only on a relatively small subset of data points, which are called support vectors. They correspond to the training examples lying closest to the separating hyperplane ("borderline cases"). This sparseness of the solution not only gives rise to computational benefits but also has important learning-theoretic consequences.

Using equations (1) and (2), the decision function for classifying data examples (regardless of whether they belong to the training or a test set) simply becomes

    f(x) = sign(Σ_i α_i y_i 〈x_i, x〉 + b)    (3)

The argument of the sign function can be regarded as a measure of the classification confidence. From now on, we will refer to it as the discriminant value. In the context of the binary classification problem, its sign indicates the predicted class membership of a data point, while its absolute value is roughly correlated with the unambiguousness of this decision. (Note, however, that abnormally large values might point to membership in a different class that was not considered when training was carried out.)

In order to handle linearly non-separable problems, the simple SVM algorithm sketched above can be extended in two ways. First, a soft-margin approach is applied to provide some tolerance for a limited number of training errors (i.e., data points lying on the wrong side of the separating hyperplane). This can be implemented in different ways, but the common principle is always to introduce additional parameter(s) that control the trade-off between training accuracy and the size of the margin, the latter being closely related to the generalization performance of a classifier. A suitable choice for the values of these controlling parameters can be made either by estimating the amount of noise in the data or by assessing the classification performance on independent test data.

Secondly, all training patterns can be mapped to a higher-dimensional space ("feature space") prior to constructing the optimal separating hyperplane. By using a nonlinear mapping, data that are not linearly separable in the original input space can acquire this attractive property in the feature space. Surprisingly, there is a way in practice to avoid the computationally expensive mapping transformation altogether. This is possible because there is a formulation of the classification problem that is based solely on dot products between mapped data points. Therefore, real-valued kernel functions k(x_i, x_j) that represent the inner product carried out in the higher-dimensional feature space can be introduced to replace the dot product between vectors of the input space. The decision function (3) can therefore be written in the form

    f(x) = sign(Σ_i α_i y_i k(x_i, x) + b)    (4)

From this representation it can be seen that kernel functions can also be interpreted as (nonlinear) pairwise similarity measures. Summarizing, the "kernel trick" allows one to combine the mathematical elegance of linear decision functions with the power of dealing with problems that are not linearly separable.

Although a binary classification technique by its very nature, the SVM approach can be applied to multiple-class problems as well, since in such a situation it is always possible to break down the principal classification task into several binary decisions. One possibility to do so is the One-versus-All (OVA) training method. Following the OVA protocol, n single binary classifiers are trained, each separating one of the n classes from all the others. Classifying an example then involves combining the output of the n SVMs and making a decision based on the resulting set of discriminant values.

A single discriminant value, however, condenses the contributions of all features into one number, so much of the structure inherent in the data is lost. In order to retrieve at least some of this information, we have tried, and hereby propose, the following method, which maps all data examples into a three-dimensional space. Our approach is to split up the dot product 〈w,x〉 in the decision function of the linear kernel:

    Φ(x) = (Σ_{i∈I} w_i x_i) e_x + (Σ_{j∈J} w_j x_j) e_y + (Σ_{k∈K} w_k x_k) e_z    (5)

where I, J and K are disjoint sets of indices whose union is {1, 2, …, m}, i.e., the complete set of indices for the m features known for each training example, and e_x, e_y, e_z are orthogonal unit vectors spanning a Cartesian coordinate system. This transformation splits one discriminant value into three components, whereby each component is given as a linear combination of different features. By inserting the feature weights w_i obtained from one binary SVM classifier and plotting Φ(x) for all training patterns in a three-dimensional scatter plot, it is possible to project a potentially very high-dimensional separation problem down into a three-dimensional cube while preserving the separability information together with some inherent structure of the data. For example, in the case of linearly separable data there also exists (at least) one two-dimensional plane in the projected data set that separates positive from negative examples. Of course, there are many possible ways to distribute the features among the sets I, J and K. As a practical approach for standardizing the mapping, we first sort the features by their absolute weights |w_i| and split the resulting list into three equal parts, each containing one third of all features (barring rounding effects). Therefore the "most important" 33% of the features are collected in set I, while, for example, the least important ones are gathered in subset K. This means that the components of Φ(x) are not equivalent.
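To make the proposed projection concrete, here is a small sketch of equation (5) in Python; it is our reading of the procedure, not code from the study. The weight vector w is assumed to come from a fitted linear SVM (e.g. svm.coef_[0] in scikit-learn), and the features are partitioned into the index sets I, J, K by descending |w_i|, as described above.

    import numpy as np

    def phi(X, w):
        """Map samples to 3-D per equation (5): split <w, x> into three
        partial sums over the index sets I, J, K (features sorted by |w_i|)."""
        order = np.argsort(-np.abs(w))        # most important features first
        I, J, K = np.array_split(order, 3)    # thirds, barring rounding effects
        contrib = X * w                       # w_i * x_i for every sample and feature
        return np.column_stack([contrib[:, S].sum(axis=1) for S in (I, J, K)])

    # For each sample, the three components sum to <w, x>, i.e. the
    # discriminant value without the bias b, so separability is preserved.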
Often there are many features that are completely redundant for the classification (as is certainly the case for microarray experiments), and the data are then completely separable by the e_x component alone. However, according to our experience with real microarray data, subclusters of microarrays (representing, for example, slight differences in the treatment of individual samples or the specific effects of compounds that have been grouped together in one class) can often be observed when all components are considered simultaneously. A visual inspection of the three-dimensional mapping (5) can therefore lead to a deeper understanding of the data set and to the detection of unknown subgroups or single outliers.
Furthermore, this transformation can also be carried out on the test examples so that all data can be shown in a single scatter plot. It is then possible to compare the distribution of the test data with those of the different training groups in order to detect similarities and dissimilarities among the groups. We use this method routinely to explore our data sets.
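One possible way to produce such a combined scatter plot (again a sketch, using matplotlib, placeholder data, and the hypothetical phi helper from the previous sketch):

    import matplotlib.pyplot as plt
    import numpy as np

    # Placeholder data; in practice X_train/X_test are expression matrices
    # and w comes from a fitted linear SVM (svm.coef_[0]).
    rng = np.random.default_rng(1)
    X_train, X_test = rng.normal(size=(40, 500)), rng.normal(size=(10, 500))
    y_train = rng.integers(0, 2, size=40)
    w = rng.normal(size=500)

    # Same weights, hence the same I/J/K partition: training and test
    # points share one coordinate system.
    P_train, P_test = phi(X_train, w), phi(X_test, w)

    ax = plt.figure().add_subplot(projection="3d")
    ax.scatter(*P_train.T, c=y_train, marker="o", label="training groups")
    ax.scatter(*P_test.T, c="black", marker="^", label="test samples")
    ax.set_xlabel("e_x (top-weighted third)")
    ax.set_ylabel("e_y")
    ax.set_zlabel("e_z")
    ax.legend()
    plt.show()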

Affymetrix ID    Description
AB011369_s_at    protein kinase c-binding protein beta15