Motifs in SARS-CoV-2 evolution

  Christian M. Reidys (1,3)
  1. Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, Virginia 22904, USA
  2. Department of Computer Science, University of Virginia, Charlottesville, Virginia 22904, USA
  3. Department of Mathematics, University of Virginia, Charlottesville, Virginia 22904, USA
  Corresponding author: duckcr{at}gmail.com
  Handling editor: Peter Stadler

Abstract

We present a framework that improves the prediction of whether a novel lineage poses a threat of eventually dominating the viral population. The framework is based purely on genomic sequence data and requires no previously established biological analysis. Its building blocks are sets of coevolving sites in the alignment (motifs), identified via coevolutionary signals. Motifs are constructed using distances that quantify the coevolutionary coupling of pairs of sites, and they manifest as clusters of coevolving sites. The collection of such motifs forms a relational structure over the polymorphic sites. We present an approach to genomic surveillance based on this notion of relational structure: the system issues an alert for a lineage based on its contribution to drastic changes in the relational structure. We then conduct a comprehensive retrospective analysis of the COVID-19 pandemic using SARS-CoV-2 genomic sequence data in GISAID from October 2020 to September 2022, covering 21 lineages and 27 countries at weekly resolution. We evaluate the performance of this surveillance system in terms of accuracy, timeliness, and robustness. Lastly, we study how well each lineage is classified by such a system.
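The idea of motifs as clusters of coevolving sites can be illustrated with a minimal sketch. It is not the construction used in the article: the distance (one minus a normalized mutual information), the single-linkage grouping, the `threshold` value, and the function names `coupling_distance` and `find_motifs` are all assumptions chosen only to make the idea concrete.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def coupling_distance(col_a, col_b):
    """Assumed distance: 1 - I(A;B) / H(A,B); 0 means perfectly coupled columns."""
    h_a, h_b = entropy(col_a), entropy(col_b)
    h_ab = entropy(list(zip(col_a, col_b)))
    if h_ab == 0:
        return 1.0  # both columns constant: no coevolutionary signal
    return 1.0 - (h_a + h_b - h_ab) / h_ab


def find_motifs(alignment, threshold=0.5):
    """Group polymorphic sites whose pairwise distance is <= threshold.

    Hypothetical helper: single-linkage clustering via union-find.
    Returns sorted clusters (lists of column indices) with at least two sites.
    """
    columns = list(zip(*alignment))
    polymorphic = [i for i, col in enumerate(columns) if len(set(col)) > 1]
    parent = {i: i for i in polymorphic}

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(polymorphic, 2):
        if coupling_distance(columns[i], columns[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in polymorphic:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)


# Toy alignment: sites 0 and 1 vary together, site 2 is constant,
# site 3 varies independently of the others.
toy_alignment = ["ACCG", "ACCT", "TGCG", "TGCT"]
motifs = find_motifs(toy_alignment)
```

In this toy alignment only sites 0 and 1 are grouped into a motif. A surveillance system in the spirit of the abstract would recompute such a structure over time and flag lineages that contribute to large changes in it; that step is not sketched here.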

  • Received December 15, 2022.
  • Accepted September 20, 2023.

This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
