Abstract
Total order ranking (TOR) strategies, which rest on elementary methods of discrete mathematics, appear to be attractive and simple tools for data analysis. Moreover, order-ranking strategies appear useful not only for data exploration but also for developing order ranking models, a possible alternative to conventional quantitative structure–activity relationship (QSAR) methods. In fact, when the data are affected by uncertainty, order methods can be used as an alternative to statistical methods such as multilinear regression (MLR), because they do not require a specific functional relationship between the independent variables and the dependent variables (responses). A ranking model is a relationship between a set of dependent attributes, which are investigated experimentally, and a set of independent attributes (the model attributes), which are calculated. As in regression and classification modelling, variable selection is one of the main steps in finding predictive models. In this work the genetic algorithm–variable subset selection (GA–VSS) approach is proposed as the variable selection method for searching for the best ranking models within a wide set of variables. The ranking produced by each model, based on a selected subset of variables, is compared with the experimental ranking and evaluated by Spearman's rank index. A case study is presented in which a TOR model is developed for polychlorinated biphenyl (PCB) compounds, analysed according to some of the physicochemical properties that play an important role in their environmental impact.
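As an illustration of the evaluation step, the following minimal Python sketch computes Spearman's rank index between an experimental ranking and a model ranking. It uses the standard tie-free formula, rho = 1 - 6 Σd² / (n(n² - 1)), where d is the difference between the two ranks of each object. The function names `ranks` and `spearman` and the numerical scores are hypothetical and chosen only for illustration; the abstract does not specify how ties are handled or how the paper implements the index, so this should be read as a sketch under those assumptions rather than as the authors' procedure.

```python
def ranks(values):
    """Return 1-based ranks (1 = smallest value). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(experimental, model):
    """Spearman's rank index for two score lists of equal length, no ties."""
    n = len(experimental)
    if n != len(model) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    rank_exp = ranks(experimental)
    rank_mod = ranks(model)
    # Sum of squared rank differences over all objects.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_exp, rank_mod))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores for five compounds (illustrative values only).
experimental_scores = [0.2, 1.5, 3.1, 4.0, 5.7]
model_scores = [0.1, 0.9, 0.8, 2.0, 3.5]
print(spearman(experimental_scores, model_scores))
```

In a variable subset search of the kind described above, an index like this could serve as the fitness value assigned to each candidate subset: a value close to 1 means that the model ranking reproduces the experimental ranking closely.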
Acknowledgements
Financial support from the Commission of the European Union (R&D project “Beam”, EVK1-CT1999-00012) is acknowledged.
Cite this article
Pavan, M., Mauri, A. & Todeschini, R. Total ranking models by the genetic algorithm variable subset selection (GA–VSS) approach for environmental priority settings. Anal Bioanal Chem 380, 430–444 (2004). https://doi.org/10.1007/s00216-004-2762-3
DOI: https://doi.org/10.1007/s00216-004-2762-3