Score statistics of global sequence alignment from the energy distribution of a modified directed polymer and directed percolation problem

Mihaela E. Sardiu, Gelio Alves, and Yi-Kuo Yu
Phys. Rev. E 72, 061917 – Published 23 December 2005

Abstract

Sequence alignment is one of the most important bioinformatics tools for modern molecular biology. The statistical characterization of gapped alignment scores has been a long-standing problem in sequence alignment research. Using a variant of the directed path in random media model, we investigate the score statistics of global sequence alignment taking into account, in particular, the compositional bias of the sequences compared. Such statistics are used to distinguish accidental similarity due to compositional similarity from biologically significant similarity. To accommodate the compositional bias, we introduce an extra parameter $p$ indicating the probability for positive matching scores to occur. When $p$ is small, a high scoring alignment obviously cannot come from compositional similarity. When $p$ is large, the highest scoring point within a global alignment tends to be close to the end of both sequences, in which case we say the system percolates. By applying finite-size scaling theory on percolating probability functions of various sizes (sequence lengths), the critical $p$ at infinite size is obtained. For alignment of length $t$, the fact that the score fluctuation grows as $\chi t^{1/3}$ is confirmed upon investigating the scaling form of the alignment score. Using the Kolmogorov-Smirnov statistics test, we show that the random variable $\chi$, if properly scaled, follows the Tracy-Widom distributions: Gaussian orthogonal ensemble for $p$ slightly larger than $p_c$ and Gaussian unitary ensemble for larger $p$. Although these results deepen our understanding of the distribution of alignment scores, the use of these results in practical applications remains somewhat heuristic and needs to be further developed. Nevertheless, the possibility of characterizing score statistics for modest system size (sequence lengths), via proper reparametrization of alignment scores, is illustrated.
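
The percolation picture described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or code: it assumes a plain global-alignment recursion with a linear gap penalty, replaces real sequences with independent random cell scores, and uses hypothetical score values (`match`, `mismatch`, `gap`) and a hypothetical `window` threshold for deciding that the highest-scoring point lies "close to the end of both sequences". Only the role of the probability of a positive matching score is taken from the abstract.

```python
import random


def global_alignment_lattice(n, p, match=1.0, mismatch=-1.0, gap=1.0, rng=None):
    """Fill an (n+1) x (n+1) global-alignment score lattice.

    Each interior cell draws a substitution score: `match` with probability
    p, otherwise `mismatch`. All score values are illustrative assumptions,
    not parameters taken from the paper.
    """
    if rng is None:
        rng = random.Random()
    H = [[0.0] * (n + 1) for _ in range(n + 1)]
    # Boundary rows/columns: each leading gap costs `gap` (linear penalty).
    for k in range(1, n + 1):
        H[k][0] = -gap * k
        H[0][k] = -gap * k
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            s = match if rng.random() < p else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,  # align position i with j
                          H[i - 1][j] - gap,    # gap in the second sequence
                          H[i][j - 1] - gap)    # gap in the first sequence
    return H


def best_point(H):
    """Return (score, i, j) for the first highest-scoring lattice point."""
    best = (H[0][0], 0, 0)
    for i, row in enumerate(H):
        for j, value in enumerate(row):
            if value > best[0]:
                best = (value, i, j)
    return best


def percolation_fraction(n, p, trials, window=0.1, seed=0):
    """Fraction of trials whose best point lies within `window * n` of the
    end of both sequences (a hypothetical criterion for 'percolates')."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        H = global_alignment_lattice(n, p, rng=rng)
        _, i, j = best_point(H)
        if i >= (1 - window) * n and j >= (1 - window) * n:
            hits += 1
    return hits / trials
```

Calling `percolation_fraction(n, p, trials)` over a grid of `p` values and several lattice sizes `n` gives the kind of size-dependent percolating probability curves to which finite-size scaling would be applied. The numerical location of any crossover in this toy version depends on the assumed score values and should not be read as the paper's critical value.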

  • Received 3 June 2005
  • Revised 21 October 2005

DOI:https://doi.org/10.1103/PhysRevE.72.061917

Authors & Affiliations

Mihaela E. Sardiu (1,2), Gelio Alves (1,2), and Yi-Kuo Yu (2)

  • (1) Department of Physics, Florida Atlantic University, Boca Raton, Florida 33431, USA
  • (2) National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA

Issue

Vol. 72, Iss. 6 — December 2005
