A k-mer-based method for the identification of phenotype-associated genomic biomarkers and predicting phenotypes of sequenced bacteria
Fig 4
Virulence genes in corresponding clusters and wzi included in the PhenotypeSeeker prediction model in K. pneumoniae strains (13-mers, weighted, max. 10 000 k-mers for the regression model).
Each row is one strain, and each column represents one protein-coding gene. A blue cell indicates that 13-mers included in the model are present for the corresponding gene in that strain. Genes in the colibactin, aerobactin and yersiniabactin clusters show the clearest differences between carrier and invasive/infectious strains.