
Assessing gene stability and gene affinity in microarray data classification using an extended ReliefF algorithm

Multimedia Tools and Applications

Abstract

Microarray data have become an integral part of clinical research and the drug discovery process. Because these data are voluminous and heterogeneous, the interpretability and stability of traditional gene selection methods are called into question. To improve the stability of gene selection and make its results easier to interpret, an improved Extended ReliefF gene selection algorithm is proposed. It encodes gene affinity information using a new mathematical formula, based on Bayes' theorem and the Manhattan distance, for finding the nearest neighbors in a pooled sample. The algorithm works in four steps: initializing the sample gene weights, improving the gene weights, maximizing the sample gene weights and, finally, applying a mutation operation. The proposed method selects the most informative genes, i.e., those most relevant to the prognosis of the disease. To evaluate accuracy and stability, soft classification is performed on the genes selected by Relieved-F, STIR, VLS-ReliefF, I-RELIEF, conventional ReliefF and the proposed Extended ReliefF algorithm, using three classifiers, namely Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Random Forest (RF), on ten microarray datasets. The findings show that MLP takes much longer to train than RF and SVM: SVM is the fastest to train, whereas MLP performs best in terms of accuracy. The approach becomes more stable as the similarity among the gene subsets selected from multiple training sets increases. Overall, the proposed gene selection algorithm substantially outperforms the other feature selection methods in terms of both accuracy and stability.
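The abstract does not give the Bayes-based affinity formula, the mutation step, or the exact stability metric of the extended algorithm, so the following is only a minimal sketch of the two ideas it builds on: conventional ReliefF gene weighting with Manhattan-distance nearest neighbors, and stability measured as the similarity of the gene subsets selected from multiple training sets. The Jaccard index, the leave-one-out training sets, the toy data and the helper names (`relieff_weights`, `top_genes`, `jaccard`, `selection_stability`) are assumptions for illustration, not the authors' method.

```python
from itertools import combinations


def _ranges(X):
    """Per-gene value range (max - min), used to scale diff() into [0, 1]."""
    out = []
    for g in range(len(X[0])):
        col = [row[g] for row in X]
        span = max(col) - min(col)
        out.append(span if span > 0 else 1.0)  # constant gene: avoid division by zero
    return out


def relieff_weights(X, y, k=1):
    """Conventional ReliefF gene weights (a sketch, not the extended algorithm).

    Every sample is used once as the reference instance (m = n), so the result
    is deterministic. Nearest neighbors use a range-normalized Manhattan
    distance; misses from each other class C are weighted by the class prior
    P(C) / (1 - P(class(R))), as in standard ReliefF.
    """
    n, n_genes = len(X), len(X[0])
    rng = _ranges(X)
    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}

    def diff(g, a, b):
        return abs(X[a][g] - X[b][g]) / rng[g]

    def manhattan(a, b):
        return sum(diff(g, a, b) for g in range(n_genes))

    w = [0.0] * n_genes
    for r in range(n):
        # All other samples, nearest first (Manhattan distance to sample r).
        others = sorted((i for i in range(n) if i != r), key=lambda i: manhattan(r, i))
        hits = [i for i in others if y[i] == y[r]][:k]
        for g in range(n_genes):
            # Penalize genes that differ between r and its nearest same-class samples.
            w[g] -= sum(diff(g, r, h) for h in hits) / (n * k)
        for c in classes:
            if c == y[r]:
                continue
            misses = [i for i in others if y[i] == c][:k]
            scale = prior[c] / (1.0 - prior[y[r]])
            for g in range(n_genes):
                # Reward genes that differ between r and its nearest samples of class c.
                w[g] += scale * sum(diff(g, r, m) for m in misses) / (n * k)
    return w


def top_genes(w, top):
    """Indices of the `top` highest-weighted genes (ties broken by index)."""
    return sorted(range(len(w)), key=lambda g: (-w[g], g))[:top]


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two gene-index collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def selection_stability(X, y, top=2, k=1):
    """Average pairwise Jaccard similarity of the gene subsets selected on
    leave-one-out training sets (1.0 = identical subsets, i.e. most stable)."""
    subsets = []
    for leave in range(len(X)):
        X_tr = [row for i, row in enumerate(X) if i != leave]
        y_tr = [lab for i, lab in enumerate(y) if i != leave]
        subsets.append(top_genes(relieff_weights(X_tr, y_tr, k), top))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: gene 0 separates the classes, gene 1 is noise,
# gene 2 is constant.
X = [
    [1.0, 0.3, 5.0],
    [1.1, 0.9, 5.0],
    [0.9, 0.1, 5.0],
    [3.0, 0.2, 5.0],
    [3.2, 0.8, 5.0],
    [2.9, 0.5, 5.0],
]
y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    w = relieff_weights(X, y, k=1)
    print("weights:", w)
    print("top gene:", top_genes(w, 1))
    print("stability:", selection_stability(X, y, top=1))
```

On the toy data the separating gene receives a positive weight, the constant gene stays at zero and the noisy gene goes negative. Replacing the Jaccard index with another subset-similarity measure, or the leave-one-out subsets with bootstrap samples, would change the stability numbers but not the idea that a stable selector returns similar gene subsets across training sets.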


Figures and algorithms: Fig. 1, Algorithm 1, Algorithm 2, Fig. 2


Data availability

The datasets used in this research were downloaded from genomics-pubs.princeton.edu/oncology/database.htm.

Abbreviations

SVM: Support Vector Machine
MLP: Multilayer Perceptron
CV: Cross-validation
RF: Random Forest
LOOCV: Leave-one-out cross-validation


Funding

This work is funded under the Data Science Research of Interdisciplinary Cyber-Physical Systems (ICPS) Programme of the Department of Science and Technology (DST) [Sanction Number T-54], New Delhi, Government of India, India.

Author information

Contributions

Both authors contributed to the planning and design of the study. Ms. Neha Srivastava prepared the materials, collected the data, carried out the analysis and wrote the initial draft of the manuscript, while Dr. Devendra K. Tayal provided feedback on earlier drafts. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Neha Srivastava.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Srivastava, N., Tayal, D.K. Assessing gene stability and gene affinity in microarray data classification using an extended ReliefF algorithm. Multimed Tools Appl 83, 45761–45776 (2024). https://doi.org/10.1007/s11042-023-17149-0


  • DOI: https://doi.org/10.1007/s11042-023-17149-0
