Optimization of SVM parameters for recognition of regulatory DNA sequences

  • Original Paper

Abstract

Identification and recognition of specific functionally important DNA sequence fragments, such as regulatory sequences, are considered among the most important problems in bioinformatics. One type of such fragment is the promoter, i.e., a short regulatory DNA sequence located upstream of a gene. Detection of regulatory DNA sequences is important for successful gene prediction and gene expression studies. In this paper, a Support Vector Machine (SVM) is used to classify DNA sequences and recognize the regulatory sequences. To achieve optimal classification, various SVM learning and kernel parameters (hyperparameters) and methods for optimizing them are analyzed. In a case study, the SVM hyperparameters for linear, polynomial and power series kernels are optimized using a modification of the Nelder–Mead (downhill simplex) algorithm. The method allows the precision of identifying regulatory DNA sequences to be improved. The results of promoter recognition for the Drosophila sequence datasets are presented.
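To make the idea of simplex-based hyperparameter search concrete, the following is a minimal sketch of a textbook Nelder–Mead variant in plain Python. It is not the modified algorithm used in the paper, whose details are not given in the abstract. The objective `toy_validation_error` is a hypothetical stand-in for a cross-validated SVM error measured over two hyperparameters on a log scale (for example, the regularization constant and one kernel parameter); its shape and minimum are invented purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def nelder_mead(
    f: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    step: float = 1.0,
    max_iter: int = 500,
    tol: float = 1e-10,
) -> Tuple[List[float], float]:
    """Minimize f with a basic downhill simplex search (textbook variant)."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break

        # Centroid of every vertex except the worst one.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t: float) -> List[float]:
            # Point centroid + t * (centroid - worst).
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_r = f(reflected)
        if f_r < values[0]:
            # Reflection is the new best: try expanding further.
            expanded = along(2.0)
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            # Better than the second-worst vertex: accept the reflection.
            simplex[-1], values[-1] = reflected, f_r
        else:
            # Contract toward the centroid from the worst vertex.
            contracted = along(-0.5)
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    best_i = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_i], values[best_i]


def toy_validation_error(params: Sequence[float]) -> float:
    """Hypothetical smooth error surface; NOT data from the paper."""
    log_c, log_kernel_param = params
    return (log_c - 1.5) ** 2 + 2.0 * (log_kernel_param + 2.0) ** 2 + 0.1


best_params, best_error = nelder_mead(toy_validation_error, [0.0, 0.0])
print(best_params, best_error)
```

In a real setting, the toy objective would be replaced by a function that trains an SVM with the given hyperparameters and returns an error estimate, for example from cross-validation. Because the search needs only function values and no gradients, it suits objectives of that kind, although each evaluation then costs a full training run.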

Correspondence to Robertas Damaševičius.

Cite this article

Damaševičius, R. Optimization of SVM parameters for recognition of regulatory DNA sequences. TOP 18, 339–353 (2010). https://doi.org/10.1007/s11750-010-0152-x

  • DOI: https://doi.org/10.1007/s11750-010-0152-x
