SkewIT: The Skew Index Test for large-scale GC Skew analysis of bacterial genomes

doi:10.1371/journal.pcbi.1008439

Fig 1

The SkewIT algorithm.

(A) A genome of length L is “circularized” by taking the first half of the sequence (L/2) and concatenating it onto the end of the genome. The algorithm then splits the extended sequence into many shorter windows of length w. Each window is assigned an α value of 1, -1, or 0, depending on whether it contains more Gs, more Cs, or equal numbers of both. (B) Left: the GC skew statistic plotted across the E. coli genome, with a purple dotted line marking where the original sequence ended before the first half of the genome was appended. Right: the α values plotted for the same genome. (C) SkewIT finds the location in the genome with the greatest difference in GC skew between the first half and the second half of the genome, using a pair of sliding windows to find the greatest sum of differences between the α values for the two halves.
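The steps in panels A and C can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `alpha_values` and `best_split` are hypothetical, the windows are assumed to be non-overlapping, and the score is assumed to be the absolute difference between the summed α values of two adjacent half-genome spans. It reports the raw score, not a normalized index.

```python
def alpha_values(seq, w):
    """Return one alpha value per non-overlapping window of length w.

    The genome is first "circularized" by appending its first half,
    as described in panel A. alpha is 1 if a window has more Gs than Cs,
    -1 if it has more Cs than Gs, and 0 if the counts are equal.
    """
    seq = seq.upper()
    extended = seq + seq[: len(seq) // 2]
    alphas = []
    for start in range(0, len(extended) - w + 1, w):
        window = extended[start:start + w]
        g, c = window.count("G"), window.count("C")
        alphas.append((g > c) - (g < c))  # evaluates to 1, -1, or 0
    return alphas


def best_split(seq, w):
    """Slide a pair of adjacent half-genome spans along the alpha values.

    Returns (position_in_bp, score) for the start position where the
    summed alpha values of the two spans differ the most. Taking the
    absolute difference is an assumption made for this sketch.
    """
    alphas = alpha_values(seq, w)
    half = (len(seq) // w) // 2  # windows per half of the original genome
    best_pos, best_score = 0, -1
    for i in range(len(alphas) - 2 * half + 1):
        first = sum(alphas[i:i + half])
        second = sum(alphas[i + half:i + 2 * half])
        score = abs(first - second)
        if score > best_score:  # keep the earliest position on ties
            best_pos, best_score = i * w, score
    return best_pos, best_score


# Toy genome: a G-rich stretch flanked by C-rich sequence.
toy = "C" * 20 + "G" * 40 + "C" * 20
position, score = best_split(toy, w=10)
```

On this toy sequence the best split starts at the boundary where the G-rich half begins. Recomputing both sums at every position keeps the sketch short; for real genomes a running sum would avoid the repeated work.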

doi: https://doi.org/10.1371/journal.pcbi.1008439.g001