Elsevier

Journal of Theoretical Biology

Volume 336, 7 November 2013, Pages 52-60

Study of LZ-word distribution and its application for sequence comparison

https://doi.org/10.1016/j.jtbi.2013.07.008

Highlights

  • We revised Lempel–Ziv complexity to take the length of its components into account.

  • We investigated, for the first time, the full distribution of LZ-words.

  • We defined transition and extension operations on the revised LZ-word sets.

  • We calculated numerical characteristics of the sorted union of the LZ-word sets.

Abstract

Lempel–Ziv complexity has been widely used for sequence comparison and has achieved promising results, but the distribution of the components in the exhaustive history has not been studied until now. This paper investigates the full distribution of these components, the LZ-words, and presents a novel statistical method for sequence comparison. Taking the length of the components into account, we revise Lempel–Ziv complexity and obtain various sets of LZ-words. Instead of calculating the contents of the LZ-words, we define a series of set operations on LZ-word sets to compare biological sequences. To assess the effectiveness of the proposed method, we performed two sets of experiments and compared it with alignment-based methods.
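The exhaustive history that the abstract builds on can be illustrated with the classical Lempel–Ziv (1976) parsing. What follows is a minimal sketch, assuming the standard exhaustive-history rule in which each component is extended until it can no longer be copied from the part of the sequence already read. It does not implement the paper's length-aware revision or its transition and extension operations, and the function names `exhaustive_history` and `lz_word_set` are hypothetical.

```python
def exhaustive_history(s: str) -> list[str]:
    """Split s into the components (LZ-words) of its LZ76 exhaustive history.

    Each component is the shortest extension that cannot be copied from the
    sequence read so far; the last component may be a copyable remainder.
    """
    words: list[str] = []
    start, n = 0, len(s)
    while start < n:
        end = start + 1  # the candidate component is s[start:end]
        # Keep extending while s[start:end] occurs in s[:end - 1], i.e. it is
        # reproducible from the prefix that excludes its own last symbol.
        while end <= n and s[start:end] in s[:end - 1]:
            end += 1
        end = min(end, n)  # clamp when the sequence runs out mid-copy
        words.append(s[start:end])
        start = end
    return words


def lz_word_set(s: str) -> set[str]:
    """Set of distinct LZ-words of s (hypothetical helper, plain LZ76 parsing)."""
    return set(exhaustive_history(s))


if __name__ == "__main__":
    history = exhaustive_history("0001101001000101")
    print(" . ".join(history))  # 0 . 001 . 10 . 100 . 1000 . 101
    print(len(history))         # classical Lempel–Ziv complexity: 6
    a, b = lz_word_set("ACGTACGT"), lz_word_set("ACGGTACA")
    # Union sorted by (length, word); this sort order is an assumption.
    print(sorted(a | b, key=lambda w: (len(w), w)))  # ['A', 'C', 'G', 'T', 'GT', 'ACA', 'ACGT']
    print(sorted(a & b))                             # ['A', 'C', 'G']
```

The number of components is the classical Lempel–Ziv complexity. Plain Python set operators (`|`, `&`) stand in here for the paper's own set operations, which the abstract does not spell out.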

Keywords

Lempel–Ziv complexity
Word set
Set operation
Phylogenetic analysis
