Dimension Reduction of Gene Expression Data for Designing Optimized Rule Base Classifier

Paul, Amit; Sil, Jaya; Das Mukhopadhyay, Chitrangada

doi:10.1007/978-81-322-1856-2_15

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 266)

Abstract

The paper highlights the need for dimension reduction of voluminous gene expression microarray data in order to develop a robust classifier that predicts patients with cancerous genes. The proposed algorithm builds a fuzzy rule-based classifier with an optimized rule set without sacrificing much classification accuracy. The gene expression matrix is first discretized using linguistic values. The importance factor of each gene is then evaluated; it represents the degree of presence of a unique linguistic value of the gene in both the disease and non-disease classes. The initial fuzzy rule base consists of the higher-ranking genes, and other genes are gradually included in the rule base in order to achieve maximum classification accuracy. Thus an optimum rule set is built from the important genes for classification of the test data set. The proposed methodology has been successfully demonstrated on a lung cancer classification problem, using gene expression data from 97 smokers with lung cancer and 90 without lung cancer. The results are promising even though most of the genes are removed from the original data.
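
The pipeline outlined in the abstract (discretize, score genes, grow the rule base in rank order) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the three linguistic values, the equal-width binning, the importance score (largest gap in a label's relative frequency between the two classes), and the one-rule-per-label voting scheme are all assumptions made for illustration. The sketch also uses crisp bins in place of fuzzy membership functions and measures accuracy on the training samples.

```python
from collections import Counter

LABELS = ("low", "medium", "high")  # assumed linguistic values


def discretize(values):
    """Map one gene's expression values to labels using equal-width bins (assumed)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(LABELS) or 1.0  # fall back to 1.0 for constant genes
    return [LABELS[min(int((v - lo) / width), len(LABELS) - 1)] for v in values]


def importance(labels, classes):
    """Assumed score: largest gap in a label's relative frequency between classes."""
    pos = [l for l, c in zip(labels, classes) if c == 1]
    neg = [l for l, c in zip(labels, classes) if c == 0]
    return max(abs(pos.count(l) / len(pos) - neg.count(l) / len(neg)) for l in LABELS)


def build_rules(columns, classes, genes):
    """One rule per (gene, label): predict the majority class seen with that label."""
    rules = {}
    for g in genes:
        for label in LABELS:
            seen = [c for l, c in zip(columns[g], classes) if l == label]
            if seen:
                rules[(g, label)] = Counter(seen).most_common(1)[0][0]
    return rules


def predict(rules, columns, genes, i):
    """Majority vote over the rules fired by sample i (defaults to class 0)."""
    votes = Counter(
        rules[(g, columns[g][i])] for g in genes if (g, columns[g][i]) in rules
    )
    return votes.most_common(1)[0][0] if votes else 0


def accuracy(columns, classes, genes):
    """Training-set accuracy of the rule base built from the given genes."""
    rules = build_rules(columns, classes, genes)
    hits = sum(predict(rules, columns, genes, i) == c for i, c in enumerate(classes))
    return hits / len(classes)


def select_genes(expression, classes):
    """Rank genes by importance, then keep each gene only if accuracy improves."""
    columns = {g: discretize(v) for g, v in expression.items()}
    ranked = sorted(columns, key=lambda g: importance(columns[g], classes), reverse=True)
    chosen, best = [], 0.0
    for g in ranked:
        acc = accuracy(columns, classes, chosen + [g])
        if acc > best:
            chosen, best = chosen + [g], acc
    return chosen, best
```

Here `expression` is a hypothetical dictionary mapping each gene name to its list of expression values across samples, and `classes` holds 1 for disease and 0 for non-disease samples. Calling `select_genes(expression, classes)` returns the retained genes and the accuracy they reach. A faithful version would evaluate on held-out test data, as the abstract describes.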

Author information

Authors and Affiliations

Computer Science and Engineering, St. Thomas College of Engineering and Technology, Khidirpore, India
Amit Paul
Computer Science and Technology, Bengal Engineering and Science University, Shibpur, India
Jaya Sil
Health Care Science and Technology, Bengal Engineering and Science University, Shibpur, India
Chitrangada Das Mukhopadhyay

Corresponding author

Correspondence to Amit Paul.

Editor information

Editors and Affiliations

Computer Science and Engineering, Indian School of Mines, Dhanbad, Dhanbad, Jharkhand, India
G. P. Biswas
Computer Science and Engineering, Indian School of Mines, Dhanbad, Dhanbad, Jharkhand, India
Sushanta Mukhopadhyay


Cite this paper

Paul, A., Sil, J., Das Mukhopadhyay, C. (2014). Dimension Reduction of Gene Expression Data for Designing Optimized Rule Base Classifier. In: Biswas, G., Mukhopadhyay, S. (eds) Recent Advances in Information Technology. Advances in Intelligent Systems and Computing, vol 266. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1856-2_15

DOI: https://doi.org/10.1007/978-81-322-1856-2_15
Published: 12 March 2014
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1855-5
Online ISBN: 978-81-322-1856-2
eBook Packages: Engineering, Engineering (R0)
