Learning differential diagnosis of erythemato-squamous diseases using voting feature intervals

doi:10.1016/S0933-3657(98)00028-1

Artificial Intelligence in Medicine

Volume 13, Issue 3, July 1998, Pages 147-165

Abstract

A new classification algorithm, called VFI5 (for Voting Feature Intervals), is developed and applied to the problem of differential diagnosis of erythemato-squamous diseases. The domain contains records of patients with known diagnoses. Given a training set of such records, the VFI5 classifier learns how to differentiate a new case in the domain. VFI5 represents a concept in the form of feature intervals on each feature dimension separately. Classification in the VFI5 algorithm is based on real-valued voting. Each feature participates equally in the voting process, and the class that receives the maximum amount of votes is declared to be the predicted class. The performance of the VFI5 classifier is evaluated empirically in terms of classification accuracy and running time.

Introduction

Researchers working on artificial intelligence have created many algorithms that successfully learn straightforward abilities. If the context is well-defined and the bounds of the problem can be correctly encoded for the computer, then these algorithms can often pick up a pattern and learn to predict it successfully. Inductive learning is a well-known approach to automatic knowledge acquisition of such patterns and classification knowledge from examples.

Inductive learning systems have been applied in several medical domains. For example, two classification systems have been used in the localization of a primary tumor, the prognosis of recurrence of breast cancer, the diagnosis of thyroid diseases, and rheumatology [10]. CRLS is a system for learning categorical decision criteria in biomedical domains [15]. The case-based BOLERO system learns both plans and goal states, with the aim of improving the performance of a rule-based system by adapting its behavior to the most recent information available about a patient [13]. DIAGAID is a program that uses a connectionist approach to determine the diagnostic value of clinical data [7].

Classification learning algorithms are composed of two components: training and prediction (classification). The training phase uses an induction algorithm to form a model of the domain from training examples that encode previous experience. The classification phase then uses this model to predict the class to which a new instance (case) belongs.

The main requirement for such a system is prediction accuracy. Furthermore, a classification learning algorithm is expected to have short training and prediction times. Such a system should be robust to noisy training instances. Also, in some real-world domains, both training and test instances may have missing values. Features (attributes) that are used to encode instances may have different levels of relevance to the domain, so a classification learning system should be able to learn and/or incorporate information about the weights of the features. Another requirement might be the comprehensibility of the learned knowledge by human experts. The advantage of this trait is twofold. First, human experts can check and verify the learned classification knowledge before it is put to use in real-world domains. Second, some previously unknown facts and patterns may be brought to the attention of human experts, leading to interesting discoveries in the field.

Previously developed machine learning algorithms usually possess some of these characteristics but fail to satisfy the others. For example, some algorithms (e.g. nearest neighbor and instance-based learning algorithms [1, 4]) develop a model of the domain quickly, but may take quite a long time to make a prediction using this model. On the other hand, some algorithms (e.g. neural networks) can make a fast prediction, but the knowledge they learn is difficult for humans to understand and verify.

The success of a classification learning algorithm, in terms of the criteria mentioned above, is directly related to the scheme used for representing the classification knowledge learned. In this paper, we present a knowledge representation technique called voting feature intervals (VFI). Along with the learning and classification algorithms, the whole system is called VFI5. The VFI representation is based on feature projections, which have been used in CFP [8] and k-NNFP [2]. VFI5, which is a non-incremental, supervised learning algorithm, is applied to differential diagnosis of erythemato-squamous diseases. Here, we show that the VFI5 algorithm, using the VFI representation, results in highly accurate predictions, has short training and classification times, is robust to noisy training instances and missing feature values, can use feature weights, and produces a human-readable model of the classification knowledge.

The rationale behind VFI knowledge representation is that human experts maintain knowledge in this form, especially in medical domains. The input to VFI5 training algorithm is a set of training instances that are descriptions of patients with known diagnoses. Learning from these training examples, VFI5 constructs a representation of the classification knowledge inherent in the examples. This knowledge is represented as the projections of the training dataset by feature intervals on each feature dimension separately. Subsequently, for each feature dimension, projection points with similar characteristics are grouped into intervals. Therefore, an interval represents a set of feature values that yield the same classifications.
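The grouping of projection points into intervals is not spelled out in this excerpt. The following is a minimal Python sketch of one plausible construction, assuming a numeric feature whose interval boundaries are the lowest and highest values observed for each class. The function names `class_end_points` and `build_intervals` are hypothetical and are not taken from the authors' implementation.

```python
def class_end_points(values, labels):
    """Project one feature and collect each class's lowest and highest value.

    `values` holds the feature value of every training instance (None marks
    a missing value, which is skipped); `labels` holds the class of each.
    Returns the sorted distinct end points.
    """
    lo, hi = {}, {}
    for v, c in zip(values, labels):
        if v is None:
            continue  # missing values contribute nothing to this feature
        lo[c] = min(v, lo.get(c, v))
        hi[c] = max(v, hi.get(c, v))
    return sorted(set(lo.values()) | set(hi.values()))


def build_intervals(end_points):
    """Alternate point intervals [p, p] with range intervals (p, q).

    Assumption for illustration: every end point becomes a point interval,
    and the open gap between two consecutive end points becomes a range
    interval.
    """
    intervals = []
    for i, p in enumerate(end_points):
        intervals.append(("point", p, p))
        if i + 1 < len(end_points):
            intervals.append(("range", p, end_points[i + 1]))
    return intervals


# Toy feature: class "a" spans 1..3, class "b" spans 5..7.
points = class_end_points([1, 2, 3, 5, 6, 7], ["a", "a", "a", "b", "b", "b"])
print(points)                   # [1, 3, 5, 7]
print(build_intervals(points))
```

In this toy case the gap between 3 and 5 becomes a range interval that contains no training instance of either class, so under this assumed construction it would carry no evidence for any diagnosis.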

When diagnosing a new patient, each feature participates in the voting process and the diagnosis that receives the maximum amount of votes is predicted to be the diagnosis of that patient. As each feature participates in learning and classification independently, VFI enables an easy and natural way of handling missing feature values by simply ignoring them, i.e. features whose values are unknown do not participate in the voting.
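The voting scheme described above can be sketched in Python as follows. This is a simplified illustration, not the VFI5 algorithm itself: it assumes discrete feature values, treats every distinct value as its own interval, and normalizes votes by class size and then per interval. Those choices, and the names `train` and `classify`, are assumptions made for the example.

```python
from collections import Counter, defaultdict


def train(instances, labels):
    """Build one vote table per feature; None marks a missing value."""
    class_counts = Counter(labels)
    tables = []
    for f in range(len(instances[0])):
        counts = defaultdict(Counter)  # feature value -> class -> count
        for x, c in zip(instances, labels):
            if x[f] is not None:       # missing values are simply ignored
                counts[x[f]][c] += 1
        table = {}
        for value, per_class in counts.items():
            # Assumed normalization: divide by class size, then scale the
            # votes of each interval so that they sum to 1.
            raw = {c: n / class_counts[c] for c, n in per_class.items()}
            total = sum(raw.values())
            table[value] = {c: r / total for c, r in raw.items()}
        tables.append(table)
    return tables


def classify(tables, x):
    """Sum the real-valued votes of all features whose value is known."""
    votes = Counter()
    for table, value in zip(tables, x):
        if value is None:
            continue                   # unknown feature does not vote
        for c, v in table.get(value, {}).items():
            votes[c] += v
    return max(votes, key=votes.get) if votes else None


X = [(0, 1), (0, 2), (1, 1), (1, None)]
y = ["A", "A", "B", "B"]
tables = train(X, y)
print(classify(tables, (1, 1)))     # B
print(classify(tables, (0, None)))  # A
```

Because each feature keeps its own table, skipping a missing value requires no imputation: the remaining features still cast their votes, and a case with no known values yields no prediction.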

The next section will describe the VFI5 algorithm in detail. In Section 3, the problem of differential diagnosis of erythemato-squamous diseases is explained. Application of the VFI5 algorithm to this domain is discussed in Section 4. Section 6 describes the weights learned for the features of this domain using a genetic algorithm. Finally, the last section concludes with some remarks and plans for future work.

The VFI5 algorithm

The VFI5 classification algorithm is an improved version of the early VFI1 algorithm [6]. Here, the VFI5 algorithm is described in detail and explained through the use of an example.

Differential diagnosis of erythemato-squamous diseases

The differential diagnosis of erythemato-squamous diseases is a difficult problem in dermatology. These diseases all share the clinical features of erythema and scaling, with very few differences. The diseases in this group are psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis and pityriasis rubra pilaris.

These diseases are frequently seen in the outpatient dermatology departments. At first sight, all of the diseases look very much alike with the erythema and scaling.

Experiments

Currently, the dataset for the domain contains 366 instances. Firstly, we used all of these instances to obtain a description of the domain. The description consists of the feature intervals constructed for each feature. The intervals obtained for features f₆, f₁₄, f₁₅, f₂₁ and f₃₄ are shown in Fig. 6.

It is clear from Fig. 6 that the nonzero values of feature f₆ (polygonal papules) indicate class C₃ (pityriasis rubra pilaris). On the other hand, the high values for f₁₄ would suggest class C₁ or

Comprehensibility of VFI5

The explanation ability of a classification process is as important as its classification accuracy. We have shown the empirical evaluation of the VFI5 classifier in Section 4 on the Dermatology dataset. However, a high prediction accuracy is not enough for a classification system; the knowledge it constructs should also be comprehensible by humans. For this purpose, we have tried to visualize the concept description learned by the VFI5 classifier. Since each feature votes for each class

Learning feature weights using a genetic algorithm

In a real-world domain, just like the one used in this paper, all of the features used in the descriptions of instances may have different levels of relevancy. Therefore, many feature selection and feature weight learning algorithms have been developed by machine learning researchers [3, 5, 12].
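One way such weights could enter the voting, assumed here rather than taken from the text, is to scale each feature's vote by a nonnegative weight before summing. The symbols $w_f$ (weight of feature $f$), $v_{f,c}$ (vote of feature $f$ for class $c$) and $x_f$ (value of feature $f$ in the new case) are notation introduced for illustration only.

```latex
\mathrm{vote}(c) = \sum_{f \,:\, x_f \text{ known}} w_f \, v_{f,c}(x_f),
\qquad
\hat{c} = \arg\max_{c} \, \mathrm{vote}(c)
```

Setting every $w_f$ to the same value recovers voting in which each feature participates equally, and a feature with $w_f = 0$ is effectively removed from the decision.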

We had developed a genetic algorithm for learning the feature weights to be used with the Nearest Neighbor classification algorithm. We applied the same genetic algorithm to determine the weights of the

Conclusions

In this paper, a new classification algorithm called VFI5 has been developed and applied to differential diagnosis of erythemato-squamous diseases. Since each feature is processed separately, the missing feature values that may appear both in the training and test instances are simply ignored in VFI5. In other classification algorithms, such as decision tree inductive learning algorithms, the missing values require extra care [14]. This problem has been overcome by simply omitting the feature

Acknowledgements

This project is supported by TUBITAK (Scientific and Technical Research Council of Turkey) under Grant EEEAG-153. The authors thank Narin Emeksiz for preparing the user interface for the VFI5 program.

References (15)

J Forsström et al. DIAGAID: a connectionist approach to determine the diagnostic value of clinical data. Artif Intell Med (1991)
B Lopez et al. Case-based learning of plans and goal states in medical diagnosis. Artif Intell Med (1997)
DW Aha et al. Instance-Based Learning Algorithms. Mach Learn (1991)
Akkuş A, Güvenir HA. K Nearest Neighbor classification on Feature Projections. In: Proc. ICML' 96,...
H Almuallim et al. Learning boolean concepts in the presence of many irrelevant features. Artif Intell (1994)
S Cost et al. A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features. Mach Learn (1993)
Demiröz G, Güvenir HA. Genetic Algorithms to Learn Feature Weights for the Nearest Neighbor Algorithm. In: Proc...

There are more references available in the full text version of this article.

¹ Present address: Microsoft Corporation, Redmond, WA 98052, USA. Tel.: +1 425 9366181; fax: +1 425 9367329.
