Improved swarm intelligence algorithms with time-varying modified Sigmoid transfer function for Amphetamine-type stimulants drug classification

https://doi.org/10.1016/j.chemolab.2022.104574

Highlights

  • A time-varying modified Sigmoid transfer function with two time-varying updating strategies is introduced.

  • The proposed transfer functions are applied in five different standard swarm intelligence (SI) algorithms.

  • The new binary SI algorithms are evaluated in the descriptor selection problem for ATS drug classification.

  • Experimental results confirmed the proposed algorithms promote fast convergence and better classification accuracy.

  • The statistical analyses show a significant improvement of the proposed algorithms over the comparative ones.

Abstract

Swarm-intelligence (SI) algorithms have received great attention in addressing various binary optimization problems such as feature selection. In this article, a new time-varying modified Sigmoid transfer function with two time-varying updating schemes is proposed as the binarization method for particle swarm optimization (PSO), grey wolf optimization algorithm (GWO), whale optimization algorithm (WOA), Harris hawk optimization (HHO), and manta-ray foraging optimization (MRFO). The new binary algorithms, BPSO, BGWO, BWOA, BHHO, and BMRFO, are used to solve the descriptor selection problem in a supervised Amphetamine-type Stimulants (ATS) drug classification task. The goal of this study is to improve the speed of convergence and the classification accuracy. To evaluate the performance of the proposed methods, experiments were carried out on a chemical dataset containing molecular descriptors of ATS and non-ATS drugs. The results show that the proposed methods are promising on this dataset in terms of near-optimal convergence, fast computation, increased classification accuracy, and a large reduction in the number of descriptors.

Introduction

Amphetamine-type stimulant (ATS) drugs are among the most popular synthetic drugs of abuse. These substances were originally developed for pharmacological research, but underground chemists keep modifying their chemical structures to evade legal regulation. The novelty of these substances makes them undetectable by traditional drug testing methods [1].

Nowadays, different devices and test kits are available on the market for ATS drug testing [2,3,4]. However, these methods have several drawbacks: long preparation and execution times, costly apparatus and equipment, complex testing processes, the need for well-trained technicians, unreliable and inconsistent outputs from different test kits, and outdated analytical methods. At present, computational methods have proven to be promising techniques in cheminformatics, for example in drug design and discovery [5], drug/non-drug classification [6], molecular similarity analysis [7], toxicity prediction [8,9], and quantitative structure-activity/property relationship (QSAR/QSPR) analysis [10]. Computational methods also offer much cheaper and faster procedures. Molecular descriptors are an important component underpinning many modern computational models. They are numerical indexes encoded from molecular structure representations of different dimensionalities (0D, 1D, 2D, 3D, or 4D). The higher the dimensionality, the more information about the molecular features is stored in the descriptors.

Due to the rapid growth of chemical data, machine learning has become a promising tool for processing big data at high volume, veracity, and velocity, and with enormous flexibility [11]. In a previous study, Pratama et al. proposed an approach for ATS drug identification that employs a newly developed 3D image pre-processing technique called 3D Exact Legendre Moment Invariants (3D ELMI) as a feature extractor, together with several classification algorithms [12,13]. 3D ELMI is a molecular descriptor algorithm that was used to calculate descriptors for 7190 drug compounds (equal numbers of ATS and non-ATS drugs). It generates 1185 descriptors for each drug compound, which are then used as input to the classification algorithms that perform the identification task. The experimental results showed that the random forest (RF) classifier achieved the highest classification accuracy. According to Ref. [14], molecular classification involves three steps: feature extraction, feature selection, and classification. This study aims to improve ATS drug classification performance by carrying out feature selection as a data-preprocessing step before executing the classification task.

New molecular descriptor algorithms that generate high-dimensional descriptors have made the feature (descriptor) selection step a requirement in computational modeling. Descriptor selection is a well-known NP-hard (non-deterministic polynomial-time hard) combinatorial search problem, as the number of possible feature subsets grows exponentially with the dimensionality. Traditional feature selection techniques handle medium or large descriptor sets inefficiently. Therefore, SI algorithms are one of the core technologies for addressing this issue. SI algorithms are population-based optimization algorithms with several advantages [15]:

  • Ease of implementation.

  • Fewer operators compared to evolutionary approaches.

  • Fewer parameters to tune.

  • Retain information about the search space throughout the iteration.

  • Regularly use memory to save the best solution obtained so far.

Some of the successful works that implemented SI algorithms in the descriptors selection problem are outlined in Table 1.

Descriptor selection has become a popular research topic in the cheminformatics domain, where researchers attempt to identify the lowest feasible number of descriptors that can provide good predictive performance [16]. Research on applying SI algorithms to the feature selection problem is still active to date. According to the No Free Lunch (NFL) theorem, there is no universal algorithm that applies to all optimization problems [17]. Therefore, there is always an opportunity to come up with new metaheuristic-based feature selection algorithms to enhance the process of solving feature selection problems. Numerous new approaches have appeared in the literature [18,19,20,21,22]. Motivated by this, this research proposes ten new SI-based feature selection algorithms to obtain significant and discriminative 3D ELMI descriptors that enhance the classification performance.

The initial version of an SI algorithm produces a continuous solution and is only applicable to continuous optimization problems. Therefore, a binary version of the SI algorithm is required to generate binary solutions for binary optimization problems. Examples of binary optimization problems are feature selection [23] and the traveling salesman problem (TSP) [24]. The common practice is to use a transfer function as the conversion method [25,26,27]. The implementation of a transfer function is straightforward and does not increase the complexity of the original algorithm. In addition, a suitable transfer function provides a good balance between the exploration and exploitation phases of the SI algorithm, resulting in better convergence and good classification accuracy. Several popular transfer functions used in the literature are listed in Table 2.
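As a minimal illustration of this conversion, the Python sketch below applies the classic S-shaped (Sigmoid) transfer function to a continuous position vector and samples a binary feature mask from the resulting probabilities. It shows the standard scheme only, not the modified function proposed in this work; the names sigmoid_transfer and binarize and the sample position values are assumptions made for illustration.

```python
import math
import random


def sigmoid_transfer(x: float) -> float:
    """Classic S-shaped transfer function: maps a real value into [0, 1].

    Written so that math.exp is only ever called on a non-positive
    argument, which avoids overflow for large |x|.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binarize(position, rng):
    """Convert a continuous position vector into a 0/1 feature mask.

    Each element becomes 1 (feature selected) with probability
    sigmoid_transfer(x), otherwise 0 (feature dropped).
    """
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
continuous_position = [-6.0, -0.5, 0.0, 0.5, 6.0]  # illustrative values
mask = binarize(continuous_position, rng)
print(mask)
```

Strongly negative elements are almost always mapped to 0 and strongly positive ones to 1, while elements near zero are selected with a probability close to 0.5.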

In the present research, we introduce a time-varying modified Sigmoid transfer function with a linear time-varying updating strategy. We also adopt the transfer function that we proposed in Ref. [28]. To evaluate the efficiency of these transfer functions, we integrate them into five continuous SI algorithms: the particle swarm optimization algorithm (PSO), whale optimization algorithm (WOA), grey wolf optimization algorithm (GWO), Harris hawk optimization algorithm (HHO), and manta-ray foraging optimization algorithm (MRFO). The characteristics of these SI algorithms are summarized in Table 3, while Tables 4 and 5 present the transfer functions used to produce their binary versions.
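The idea of a linearly time-varying transfer function can be sketched in Python as follows. This excerpt does not give the exact form of the modified Sigmoid or its parameter values, so the sketch assumes a form often seen in the time-varying transfer-function literature: a Sigmoid whose steepness is controlled by a parameter tau that moves linearly from tau_max to tau_min over the iterations. The function names and the values tau_max = 4.0 and tau_min = 0.01 are illustrative assumptions, not the definitions used in this paper.

```python
import math


def linear_schedule(t, t_max, tau_max=4.0, tau_min=0.01):
    """Linearly move tau from tau_max (t = 0) to tau_min (t = t_max).

    tau_max and tau_min are assumed, illustrative values.
    """
    frac = t / t_max
    return (1.0 - frac) * tau_max + frac * tau_min


def tv_sigmoid(x, tau):
    """Sigmoid transfer function with a time-varying steepness 1 / tau.

    Assumed form: T(x, tau) = 1 / (1 + exp(-x / tau)).
    math.exp is only called on non-positive arguments to avoid overflow.
    """
    z = x / tau
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


t_max = 100
for t in (0, 50, 100):
    tau = linear_schedule(t, t_max)
    print(t, round(tau, 3), round(tv_sigmoid(1.0, tau), 3))
```

Under these assumptions, a large tau early in the run keeps the selection probabilities close to 0.5, which favors exploration; as tau shrinks, the function approaches a step and the search becomes more exploitative.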

In short, this paper introduces a new approach for tackling the descriptor selection problem based on the PSO, WOA, GWO, HHO, and MRFO algorithms. Its main contributions can be summarized as follows:

  • 1.

    A novel time-varying modified Sigmoid transfer function with a linear (TV1) time-varying updating scheme is introduced as a binarization technique for the metaheuristic algorithm.

  • 2.

    Ten new binary variants of the PSO, WOA, GWO, HHO, and MRFO algorithms are developed by employing TV1 and our recently proposed transfer function (TV2) from Ref. [28]: BPSOTV1, BPSOTV2, BWOATV1, BWOATV2, BGWOTV1, BGWOTV2, BHHOTV1, BHHOTV2, BMRFOTV1, BMRFOTV2.

  • 3.

    These proposed SI algorithms are adapted as a feature search for wrapper feature selection in a supervised binary classification task that differentiates ATS and non-ATS drugs.

  • 4.

    The final results were assessed based on different performance metrics, including the average fitness, average classification accuracy, and average number of selected features, as well as the respective standard deviation values.

  • 5.

    The significance of the proposed algorithms was validated against competitive algorithms using the non-parametric Wilcoxon rank-sum test at a significance level of α = 0.05.
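To make the wrapper setting and the fitness metric in the contributions above more concrete, the Python sketch below scores a candidate binary feature mask with a weighted-sum fitness that is common in the SI feature selection literature: the classification error is combined with the fraction of selected features, and lower values are better. This excerpt does not state the fitness function used in this study, so the formula, the weight alpha = 0.99, and the name wrapper_fitness are assumptions made for illustration. The error rate is passed in directly; in a real wrapper it would come from training and validating a classifier on the selected descriptors.

```python
def wrapper_fitness(error_rate, mask, alpha=0.99):
    """Weighted-sum wrapper fitness (lower is better).

    Assumed form: alpha * error + (1 - alpha) * (selected / total).
    error_rate: classification error obtained with the selected features.
    mask: list of 0/1 flags, one per descriptor.
    alpha: assumed trade-off weight between error and subset size.
    """
    n_total = len(mask)
    n_selected = sum(mask)
    if n_selected == 0:
        # An empty subset cannot be used for classification.
        return float("inf")
    return alpha * error_rate + (1.0 - alpha) * n_selected / n_total


# Two hypothetical candidates with the same error but different sizes.
small_subset = [1, 0, 0, 0]
large_subset = [1, 1, 1, 0]
print(wrapper_fitness(0.10, small_subset))
print(wrapper_fitness(0.10, large_subset))
```

With equal error rates, the smaller subset receives the lower fitness, which is how this kind of objective rewards a reduction in the number of descriptors without sacrificing accuracy.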

There are several limitations of this study which include:

  • 1.

    The transfer functions are validated on five SI-based optimization algorithms only.

  • 2.

    The algorithms are used to solve the descriptors selection problem in the drug analysis domain.

  • 3.

    Only one chemical dataset is used for algorithm evaluation.

  • 4.

    This study does not apply any data preprocessing to the dataset.

The remainder of this paper is structured as follows. Section 2 explains the concepts of the proposed transfer functions and their implementation in the PSO, GWO, WOA, HHO, and MRFO algorithms. Section 3 describes the materials and methods used in the experiments. Results and discussion are presented in Section 4. Finally, Section 5 concludes the paper and outlines future work.

Section snippets

The binary version of PSO, GWO, WOA, HHO, and MRFO

[23,24,50]. In BPSO, BWOA, BGWO, BHHO, and BMRFO, the search agents (solutions) update their positions continuously to any point in the search space based on the best search agent discovered so far. The real-valued positions of the search agents are then converted to binary values using the proposed time-varying modified Sigmoid transfer function. This technique forces search agents to move in a binary space by a probability definition, which updates each element (feature) in the solution (feature subset) to

Dataset

In this study, a special chemical dataset was adopted in the experiments. The dataset contains 1187 descriptors to represent the 3D molecular structure of 3595 ATS and 3595 non-ATS drug compounds. The descriptors were generated using a novel 3D Exact Legendre Moment Invariant molecular descriptor algorithm developed by Pratama in Refs. [12,13]. The descriptors comprise a molecule id, 1185 moment invariant values, and a binary class label, 0 (non-ATS drug) or 1 (ATS drug). During

Experimental results and discussion

Table 8 presents the detailed experimental results. The average fitness value achieved by the proposed algorithms is much lower than that of their existing binary versions, except for BGWOTV1. This is confirmed by the convergence curves illustrated in Figs. 2–6. It reveals the potential of TV2 for improving the convergence of BPSO, BWOA, BGWO, and BHHO. Meanwhile, TV1 is capable of helping the BMRFO algorithm escape from local optima more efficiently. BMRFOTV1 is seen to provide the lowest

Conclusions and future work

In this paper, a time-varying modified Sigmoid transfer function with two time-varying update strategies is introduced as a binarization method for the PSO, GWO, WOA, HHO, and MRFO algorithms. The proposed binary SI algorithms are used in the descriptor selection problem to improve ATS drug classification accuracy. Among the best proposed binary algorithms, BMRFOTV1 was seen to be superior and significant in selecting relevant and discriminative descriptors and obtained high classification

Author statement

Norfadzlia Mohd Yusof: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing – Original draft, Visualization. Azah Kamilah Muda: Conceptualization, Supervision, Project administration, Funding acquisition. Satrya Fajri Pratama: Conceptualization, Data Curation, Resources, Supervision. Ramon Carbo-Dorca: Writing – Review & Editing. Ajith Abraham: Writing – Review & Editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by Universiti Teknikal Malaysia Melaka through the Fundamental Research Grant Scheme [FRGS/1/2020/FTMK-CACT/F00461] from the Ministry of Higher Education, Malaysia.

References (64)

  • M. Mafarja et al.

    Binary dragonfly optimization for feature selection using time-varying transfer functions

    Knowl. Base Syst.

    (2018)
  • S. Mirjalili et al.

    S-shaped versus V-shaped transfer functions for binary Particle Swarm Optimization

    Swarm Evol. Comput.

    (2013)
  • Z. Beheshti

    A time-varying mirrored S-shaped transfer function for binary particle swarm optimization

    Inf. Sci.

    (2020)
  • M. Mafarja et al.

    Whale optimization approaches for wrapper feature selection

    Appl. Soft Comput.

    (2018)
  • M. Mafarja et al.

    Binary grasshopper optimisation algorithm approaches for feature selection problems

    Expert Syst. Appl.

    (2019)
  • E. Emary et al.

    Binary ant lion approaches for feature selection

    Neurocomputing

    (2016)
  • B. Turkoglu et al.

    Binary artificial algae algorithm for feature selection

    Appl. Soft Comput.

    (2022)
  • A.A. Abd El-Mageed et al.

    Improved binary adaptive wind driven optimization algorithm-based dimensionality reduction for supervised classification

    Comput. Ind. Eng.

    (2022)
  • L. Harper et al.

    An overview of forensic drug testing methods and their suitability for harm reduction point-of-care services

    Harm Reduct. J.

    (2017)
  • E. Lendoiro et al.

    An LC-MS/MS methodological approach to the analysis of hair for amphetamine-type-stimulant (ATS) drugs, including selected synthetic cathinones and piperazines

    Drug Test. Anal.

    (2017)
  • H. Chung et al.

    Amphetamine-type stimulants in drug testing

    Mass Spectrom Lett

    (2019)
  • M.D. Krasowski et al.

    Using cheminformatics to predict cross reactivity of “designer drugs” to their currently available immunoassays

    J. Cheminf.

    (2014)
  • A. Karim et al.

    Efficient toxicity prediction via simple features using shallow neural networks and decision trees

    ACS Omega

    (2019)
  • G. Idakwo et al.

    A review on machine learning methods for in silico toxicity prediction

    J. Environ. Sci. Health Part C Environ. Carcinog. Ecotoxicol. Rev.

    (2018)
  • Y. Wang et al.

    Incorporating PLS model information into particle swarm optimization for descriptor selection in QSAR/QSPR

    J. Chemom.

    (2015)
  • S.F. Pratama

    Three-Dimensional Exact Legendre Moment Invariants for Amphetamine-type Stimulants Molecular Structure Representation

    (2017)
  • S.F. Pratama et al.

    Preparation of ATS drugs 3D molecular structure for 3D moment invariants-based molecular descriptors

  • A. Elsawy et al.

    A hybridised feature selection approach in molecular classification using CSO and GA

    Int. J. Comput. Appl. Technol.

    (2019)
  • Z.Y. Algamal et al.

    QSAR model for predicting neuraminidase inhibitors of influenza A viruses (H1N1) based on adaptive grasshopper optimization algorithm

    SAR QSAR Environ. Res.

    (2020)
  • D.H. Wolpert et al.

    No free lunch theorems

    IEEE Trans. Evol. Comput.

    (1997)
  • F.S. Gharehchopogh et al.

    Chaotic Vortex Search Algorithm: Metaheuristic Algorithm for Feature Selection

    (2021)
  • H. Mohammadzadeh et al.

    A novel hybrid whale optimization algorithm with flower pollination algorithm for feature selection: case study Email spam detection

    Comput. Intell.

    (2020)