Phylogenetic insights into the diversity of Chryseobacterium species

The genus Chryseobacterium was formally established in 1994 and contains 112 species with validly published names. Most of these species are yellow or orange coloured, and contain a flexirubin-type pigment. The genomes of 83 of these 112 species have been sequenced in view of their importance in clinical microbiology and potential applications in biotechnology. The National Center for Biotechnology Information taxonomy browser lists 1415 strains as members of the genus Chryseobacterium, of which the genomes of 94 strains have been sequenced. In this study, by comparing the 16S rDNA and the deduced proteome sequences, at least 20 of these strains have been proposed to represent novel species of the genus Chryseobacterium. Furthermore, a yellow-coloured bacterium isolated from dry soil in the USA (and identified as Flavobacterium sp. strain B-14859) has also been reconciled as a novel member of the genus Chryseobacterium based on the analysis of 16S rDNA sequences and the presence of flexirubin. Yet another bacterium (isolated from a water sample collected in the Western Ghats of India and identified as Chryseobacterium sp. strain WG4) was also found to represent a novel species. These proposals need to be validated using polyphasic taxonomic approaches.


INTRODUCTION
Chryseobacterium was circumscribed as a novel genus of the family Flavobacteriaceae by Vandamme et al. [1] to provide a separate taxonomic status for six members of the genus Flavobacterium that appeared to be distantly related to the type species Flavobacterium aquatile. This characterization was based on DNA:rRNA hybridization and chemotaxonomic studies [1]. The genus was named (chryseos=golden) because the bacteria produced yellow to orange-coloured colonies on solid media [1], and the pigment was reported to be flexirubin [1,2]. [Flavobacterium] gleum, which was isolated by Holmes et al. [3] from human clinical specimens, was designated as the type species of Chryseobacterium [1]. With Chryseobacterium meningosepticum being distinct from other bacteria within the group [1], it appears that Chryseobacterium was destined to be heterogeneous from its inception. The genus Elizabethkingia was later carved out of Chryseobacterium to accommodate C. meningosepticum [4]. Although there were a mere 18 species of Chryseobacterium in 2006 [5], that number had risen to 58 by 2014 [6]. At the time of writing (January 2019), the list of prokaryotic names with standing in nomenclature (www.bacterio.net) contained 112 Chryseobacterium spp. with validly published names, representing every letter of the English alphabet except Q. Many of these species were reported to be multidrug resistant [5]. In view of their importance in clinical microbiology and potential applications in biotechnology, the genomes of 83 Chryseobacterium spp. have been sequenced and are available from the National Center for Biotechnology Information (NCBI) (www.ncbi.nlm.nih.gov/genome/?term=Chryseobacterium). Despite this wealth of data, only a few systematic attempts have been made to compare these genomes [7-10]. Notable among these attempts is the quest to characterise antibiotic resistance and identify its genetic basis [7,8,10].
Furthermore, using genome-based taxonomic analysis, it has been proposed that the genus Chryseobacterium be emended to include members of the closely related genus Epilithonimonas [11]. At the time of writing (January 2019), the NCBI taxonomy browser (www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=59732) had listed 1415 strains as members of the genus Chryseobacterium. The genomes of 94 of these strains have also been sequenced and are available in NCBI (www.ncbi.nlm.nih.gov/genome/genomes/13849?). However, very little is known about the sources or characteristics of these strains, and it is likely that many of them belong to one of the 112 species already described. The objectives of this study were to extend the current knowledge about the diversity of Chryseobacterium spp., and provide insights into the taxonomic status of strains that are not yet assigned to a species within the genus.

Phylogenetic analysis using CVTree3
Phylogenetic analysis using the web server CVTree3, which is an alignment- and parameter-free method that relies on the oligopeptide content (K-tuple length) of conserved proteins to deduce evolutionary relatedness [12], was performed as described previously [13].
Briefly, the deduced proteome sequences (excluding plasmid-encoded proteins) of Chryseobacterium spp. were downloaded from UniProt (www.uniprot.org/proteomes/). The protein sequences were saved as multifasta files with the extension .faa. The multifasta files for each strain were uploaded on to the CVTree3 web server (http://tlife.fudan.edu.cn/cvtree/cvtree/) and analysed by selecting all available K-tuple length options (from 3 to 9). Since the best K-values for bacteria were shown to be 5-6 [12], the proteome tree was visualised at K=6. The output from CVTree3 was saved as a Newick file, and the tree was rendered using the Interactive Tree Of Life (iTOL) web server version 4 (https://itol.embl.de/).
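The idea behind a composition-vector comparison can be illustrated with a minimal Python sketch. It is not the CVTree3 implementation: it assumes the commonly described scheme in which the observed frequency of each K-string is compared with a frequency predicted from its (K-1)- and (K-2)-substrings, and two proteomes are then compared through the cosine correlation of the resulting vectors. The function names and toy sequences are hypothetical, and K-strings that are predicted but never observed are ignored for brevity.

```python
from collections import Counter
from math import sqrt


def kmer_counts(proteins, k):
    """Count overlapping k-strings across all protein sequences."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def composition_vector(proteins, k):
    """Map each observed k-string to (f - f0) / f0.

    f is the observed frequency; f0 is the frequency predicted from the
    (k-1)- and (k-2)-strings (assumed Markov-type background). Requires k >= 3.
    """
    ck, ck1, ck2 = (kmer_counts(proteins, n) for n in (k, k - 1, k - 2))
    nk, nk1, nk2 = (sum(c.values()) for c in (ck, ck1, ck2))
    vec = {}
    for s, n in ck.items():
        f = n / nk
        f0 = (ck1[s[:-1]] / nk1) * (ck1[s[1:]] / nk1) / (ck2[s[1:-1]] / nk2)
        vec[s] = (f - f0) / f0
    return vec


def cv_distance(u, v):
    """Distance (1 - C) / 2, where C is the cosine correlation of u and v."""
    dot = sum(u[s] * v[s] for s in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    if norm == 0:
        return 0.5  # no signal in at least one vector: treat as uncorrelated
    return (1 - dot / norm) / 2


# Toy "proteomes" (hypothetical sequences, K=3 to keep the example small)
proteome_a = ["MKVLAAGIVKVLA"]
proteome_b = ["WWWHHHCCCPPP"]
vec_a = composition_vector(proteome_a, 3)
vec_b = composition_vector(proteome_b, 3)
print(cv_distance(vec_a, vec_b))
```

In this sketch the distance ranges from 0 (identical composition vectors) to 1 (perfectly anti-correlated vectors); a matrix of such pairwise distances is the kind of input from which a tree can then be built.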

Phylogenetic analysis using MEGA 7.0
Pairwise alignments of DNA sequences were performed using ClustalW with default parameters. The pairwise distance matrix derived from these alignments was used to construct a guide tree by the neighbour-joining method. Subsequent progressive alignment was based on the guide tree. Phylogeny was reconstructed using the maximum likelihood method (with 1000 bootstrap replicates) and the Tamura-Nei substitution model in MEGA 7.0. The output from MEGA was saved as a Newick file, and the tree was rendered using iTOL.

PCR, cloning and sequencing
Bacterial genomic DNA was isolated using the snap-chill method. Briefly, a loopful of fresh bacterial culture was resuspended in 100 µl sterile ddH2O in a 1.5 ml microcentrifuge tube. The cell suspension was boiled in a water bath for 10 min and then incubated at −80 °C for 10 min. The frozen suspension was thawed and centrifuged (~18 600 g for 10 min at 4 °C) using a Hettich MIKRO 220 R centrifuge. The supernatant was transferred to a sterile 0.5 ml microcentrifuge tube and used in PCR after serial dilution. Amplification of the 16S rDNA was performed using the 27F (5′-AGAGTTTGATCMTGGCTCAG-3′) and 1492R (5′-TACGGYTACCTTGTTACGACTT-3′) primers. PCR products (~1.5 kb) were gel purified using the GeneJET Gel Extraction Kit (Thermo Scientific) and ligated into the pTZ57R/T vector (InsTAclone PCR Cloning Kit, Thermo Scientific). Competent cells of Escherichia coli DH5α were prepared using the CaCl2 method and transformed with the ligated products. Transformants were selected on Luria-Bertani (LB) agar plates containing ampicillin (100 µg ml−1) and recombinants were selected by blue-white screening. Recombinants were confirmed by plasmid purification (GeneJET Plasmid Miniprep Kit, Thermo Scientific) and restriction digestion. DNA inserts within pTZ57R/T were sequenced using the Sanger sequencing method.
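Both primers contain a degenerate IUPAC base (M in 27F, Y in 1492R), so each primer is in practice a mixture of two oligonucleotides. The following Python sketch, offered only as an illustration, expands the degenerate positions and computes the reverse complement of the reverse primer (the sequence expected on the sense strand of the 16S rDNA). The helper names are hypothetical; only the primer sequences come from the text.

```python
from itertools import product

# IUPAC codes needed here (two-base ambiguity codes plus the four bases)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
}
# Complement table: A<->T, C<->G, M<->K, R<->Y; W and S are self-complementary
COMPLEMENT = str.maketrans("ACGTMRWSYK", "TGCAKYWSRM")

PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"
PRIMER_1492R = "TACGGYTACCTTGTTACGACTT"


def expand(primer):
    """List every non-degenerate oligonucleotide encoded by a primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def reverse_complement(seq):
    """Reverse complement, keeping IUPAC ambiguity codes consistent."""
    return seq.translate(COMPLEMENT)[::-1]


print(expand(PRIMER_27F))
# ['AGAGTTTGATCATGGCTCAG', 'AGAGTTTGATCCTGGCTCAG']
print(reverse_complement(PRIMER_1492R))
# AAGTCGTAACAAGGTARCCGTA
```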

Pigment extraction and analysis
Strain B-14859 was procured from the Agricultural Research Service Culture Collection (Peoria, Illinois, USA), which is the only known official source of the bacterium. This bacterium was cultured using LB medium at 30 °C with aeration (shaking at 200 r.p.m.). Solid LB medium was prepared using 2 % agar for culturing bacteria from glycerol stocks or plating broth cultures to test purity. Biochemical tests were performed based on the descriptions in the VetBact online resource (www.vetbact.org/) of the Faculty of Veterinary Medicine and Animal Science of the Swedish University of Agricultural Sciences. Liquid bacterial cultures were pelleted in sterile Oak Ridge tubes using a fixed angle rotor (JA20, ~12 000 g for 10 min at 4 °C) in an Avanti J-25 centrifuge (Beckman Coulter, USA). Wet biomass (80 or 160 mg) was obtained from the pellets and resuspended in 1 ml acetone by gentle vortexing. The suspension was lysed using a VCX 750 Vibra-Cell sonicator (Sonics and Materials, USA) for 10 min (30 % amplitude with 5 s on/off pulse). The yellow-coloured supernatant was collected by centrifugation and scanned using a Cary 100 UV-Vis spectrophotometer (Agilent Technologies). Spectra in the 200-800 nm wavelength range were recorded.

RESULTS AND DISCUSSION
Identification of novel species using 16S rDNA sequences
A total of 33 strains that were not yet assigned to a species within the genus Chryseobacterium were chosen for further analysis based on the availability of their genome sequences in the public databases. For 29 of these strains, 16S rDNA sequences were obtained from GenBank (Table 1). The closest homologues of these sequences were searched within the '16S ribosomal RNA sequences (Bacteria and Archaea)' database of NCBI using blastn. The output was optimised by selecting the 'Highly similar sequences (megablast)' option. From this search, the top hits (those with the highest blastn score, having a query coverage of >90 %) for each sequence were recorded (Table 1). To characterise each strain further, the criteria proposed by Chun et al. [14] were used. If the identity of the top hit was ≥98.7 %, then the strain was inferred not to represent a novel species. For four strains (AG844, CBo1, ERMR1:04 and YR203), the top hits had 99-100 % identity (Table 1). It is very likely these strains belong to C. cucumeris, C. formosense, C. polytrichastri and C. vrystaatense, respectively.
A strain can be predicted to represent a novel species if the identity of the top hit is <98.7 % [14]. For five strains, the identities of the top hits were 97 % (with blast scores >2444). For nine strains, the identities of the top hits were 98 % (with blast scores >2499). Pending further confirmation, these 14 strains were deemed to represent novel Chryseobacterium spp. (Table 1). For four other strains (Hurlbut01, Leaf394, SCN 40-13 and YR221), 16S rDNA sequences were not available in GenBank (Table 1). It appears that the 16S rDNA genes of these strains were not covered during genome sequencing because even annotation using RAST (http://rast.nmpdr.org/) did not reveal them. However, the unavailability of 16S rDNA sequences was not a major handicap since these strains could be characterised using their genomes or other phylogenetic markers.
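The decision rule described above (take the highest-scoring blastn hit with a query coverage of >90 %, then compare its identity with the 98.7 % threshold) can be written as a short Python sketch. The record layout, function names and example values are hypothetical and serve only to illustrate the logic; they are not data from Table 1.

```python
from dataclasses import dataclass
from typing import List, Optional

SPECIES_THRESHOLD = 98.7  # % 16S rDNA identity, criterion cited in the text [14]
MIN_COVERAGE = 90.0       # % query coverage required for a hit to be considered


@dataclass
class Hit:
    species: str
    identity: float   # % identity
    coverage: float   # % query coverage
    score: float      # blastn score


def top_hit(hits: List[Hit]) -> Optional[Hit]:
    """Return the highest-scoring hit with query coverage above 90 %."""
    eligible = [h for h in hits if h.coverage > MIN_COVERAGE]
    return max(eligible, key=lambda h: h.score) if eligible else None


def classify(hits: List[Hit]) -> str:
    """Apply the identity threshold to the top hit."""
    best = top_hit(hits)
    if best is None:
        return "undetermined (no hit with >90 % coverage)"
    if best.identity >= SPECIES_THRESHOLD:
        return f"not novel; likely {best.species}"
    return "putative novel species (pending confirmation)"


# Hypothetical hits for two imaginary strains
strain_x = [Hit("species A", 99.5, 99.0, 2650.0), Hit("species B", 98.0, 99.0, 2500.0)]
strain_y = [Hit("species C", 97.6, 98.0, 2450.0), Hit("species D", 99.9, 40.0, 1100.0)]
print(classify(strain_x))  # not novel; likely species A
print(classify(strain_y))  # putative novel species (pending confirmation)
```

Note that the second example ignores a 99.9 % identity hit because its coverage is below the cut-off, which is why coverage is filtered before identity is considered.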

Identification of novel species using deduced proteome sequences
Previously, Chun et al. [14] had shown that analysis of 16S rDNA sequences could be combined with whole genome comparisons to correctly identify and recognise novel species. On the same principles, proteome sequence-based analyses were performed using CVTree3 to check the relationships among 92 strains (including the 33 listed in Table 1) of Chryseobacterium spp. (Table 2). The phylogenetic tree (Fig. 1) derived from this analysis indicated that the genus Chryseobacterium is diverse and polyphyletic. In total, 16 of the 21 species previously analysed using whole genome sequences by Hahnke et al. [11] were present in this tree (Table 2 and Fig. 1). Although the methods of analysis are different, the branching patterns of these 16 species were similar in Fig. 1 and in the tree reported by Hahnke et al. [11]. For example, C. antarcticum, C. jeonii, C. koreense and C. solincola were located on a major branch in both trees. Chryseobacterium bovis, which was shown to cluster with Epilithonimonas tenax by Hahnke et al. [11], was found on a separate branch containing five other Chryseobacterium spp. (Fig. 1). Furthermore, C. angstadtii, C. kwangjuense and C. luteum occurred on yet another major branch, as did C. gallinarum, C. gleum and C. indologenes (Fig. 1).
Hahnke et al. [11] showed that C. aquaticum and C. greenlandense were closely related to each other and co-locate with C. formosense. A similar outcome was conspicuous in the proteome sequence-based tree (Fig. 1).
Strains AG844, CBo1 and ERMR1:04 were predicted not to represent novel species based on 16S rDNA sequence comparisons (Table 1). These strains had shorter branches and clustered with C. cucumeris, C. formosense and C. polytrichastri, respectively, in the proteome sequence-based tree (Fig. 1). Strain YR203 had a longer and distinct branch in the tree because the top hit for this strain (C. vrystaatense, Table 1) was not included in the analysis. Among other strains predicted not to represent novel species (Table 1), BLS98, HMWF035, JM1, MOF25P and StRB126 also had relatively short branches and clustered with C. gallinarum, C. oranimense, C. gambrini, C. olae, C. balustinum and C. jejuense, respectively (Fig. 1). However, although strain ISE14 was predicted to be closely related to C. lactis (Table 1), it actually had a longer branch and clustered with C. indologenes (Fig. 1). Similarly, although strain MYb7 was predicted to be closely related to C. lactis (Table 1), it occurred in a cluster with strain HMWF028 and C. culicis (Fig. 1). Surprisingly, strain PMSZPI clustered with C. gallinarum (Fig. 1), although it was predicted to be closely related to C. culicis (Table 1). These discrepancies could be due to misidentified strains and/or their sequences. Further analyses are required to establish the phylogenetic status and novelty of strains HMWF028, ISE14, MYb7 and PMSZPI.

Table 2 footnotes: *Colour and pigment as reported in the references listed in the last column. †These species were previously analysed using whole genome sequences by Hahnke et al. [11]. ‡Type strain. §Reference proteomes (UniProt defines these as 'well-studied model organisms and other organisms of interest for biomedical research and phylogeny'). ||Strains CF314 and ISE14 were proposed to represent novel species after the analyses reported here were completed.
All five strains (CF314, IHB B 17019, JAH, Leaf180 and Leaf404) that were deemed to represent new species based on the comparison of their 16S rDNA sequences (top hits having an identity of 97 %, blast scores >2444, Table 1) had relatively longer branches (Fig. 1). More importantly, all nine strains (52, FH1, FH2, FP211-J200, Leaf201, Leaf405, RU33C, RU37D and T16E-39) whose top hits had an identity of 98 % (blast scores >2499, Table 1) also had longer branches (Fig. 1). Among these 14 strains, Leaf405 shared a branch with C. arachidis; T16E-39 shared a branch with C. piperi and was located close to CF314; IHB B 17019 shared a branch with C. wanjuense and was located close to RU37D; and FP211-J200 shared a branch with C. halperniae and was located close to FH1 (Fig. 1). Among the four strains lacking 16S rDNA sequences (Table 1), SCN 40-13 and YR221 appear to represent novel species based on the length and distinctness of the branches on which they are located in the tree (Fig. 1). Leaf394 shared a branch with Leaf404 and may also represent a novel species. However, Hurlbut01 may not represent a novel species and is closely related to C. aquaticum (Fig. 1). Using whole genome comparisons, Tetz and Tetz [47] had proposed C. mucoviscidosis to be a novel species. The deduced proteome sequence-based tree provides further credence to this proposal, and shows that C. mucoviscidosis is related to C. gambrini (Fig. 1).

Fig. 1. [...] Table 2. Black lines and text indicate Chryseobacterium spp. with validly published names (n=58). Green lines and text indicate strains that were predicted not to represent novel species (n=11). Red lines and text indicate strains that were predicted to represent novel species (n=23). The proteome of Flavobacterium columnare ATCC 49512 (UniProt Proteome ID: UP000005638) was used as the outgroup, which does not appear in the figure. The tree was scaled based on branch length values. The bar allows the estimation of branch lengths (e.g. strain SCN 40-13 has a branch length of 0.2235, which is approximately 2.235 times the length of the bar).
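Because the arguments above rest on the relative lengths of terminal branches, it can help to read those lengths directly from the Newick output rather than estimate them from the scale bar. The Python sketch below is a minimal illustration using only the standard library; the toy tree, the leaf names and the helper names are hypothetical, and the bar length of 0.1 is an assumption inferred from the Fig. 1 caption (0.2235 being approximately 2.235 times the bar). Quoted leaf names containing special characters are not handled.

```python
import re

# A leaf is a name directly after "(" or "," followed by ":<length>".
# Internal-node labels (e.g. support values) follow ")" and are skipped.
LEAF = re.compile(r"(?<=[(,])\s*([^(),:;]+):([0-9.eE+-]+)")


def terminal_branch_lengths(newick):
    """Return {leaf name: terminal branch length} from a Newick string."""
    return {name.strip(): float(length) for name, length in LEAF.findall(newick)}


def bar_multiples(length, bar=0.1):
    """Express a branch length as a multiple of the scale bar (bar length assumed)."""
    return length / bar


# Toy tree with hypothetical leaf names and an internal support value
toy_tree = "((strain_A:0.05,strain_B:0.07)0.95:0.02,strain_C:0.2235);"
lengths = terminal_branch_lengths(toy_tree)
print(lengths)
print(round(bar_multiples(lengths["strain_C"]), 3))  # 2.235
```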

Characterization of strain NRRL B-14859
It has been more than two decades since Hou [76] identified and described a Gram-negative, non-motile, rod-shaped bacterium that produced yellowish-brown colonies. This bacterium (referred to as strain NRRL B-14859, also known as strain DS5) was identified as a member of the genus Flavobacterium [76]. This strain was shown to produce oxygenated fatty acids such as 10-ketostearic acid and 10-hydroxystearic acid using oleic acid [76] and vegetable oils [77] as substrates. It was also shown to convert linoleic acid to 10-hydroxy-12(Z)-octadecenoic acid [78]. The bioconversion of oleic acid by this strain was reported to be more efficient than the bioconversion of linoleic acid. More importantly, the oleate hydratase of strain B-14859 was predicted to be a C-10 positional-specific enzyme with a preference for 18-carbon mono-unsaturated fatty acid [79]. This strain was further characterised in the context of results reported in the previous two sections.
Golden-yellow-coloured colonies of strain B-14859 were seen on LB agar plates after overnight incubation at 30 °C (the strain could also grow at 20 or 42 °C). Strain B-14859 was resistant to ampicillin (100 µg ml−1), kanamycin (50 µg ml−1), tetracycline (30 µg ml−1) and spermidine (15 µg ml−1), but lacked plasmids. It was catalase, urease and gelatinase positive, but was oxidase and indole negative. The 16S rDNA gene of strain B-14859 was cloned and sequenced. The closest homologues of this sequence were retrieved as described in the first section. The top five hits (from C. ureilyticum, C. indologenes, C. gleum, Chryseobacterium bernardetii and C. vrystaatense) had 97-98 % identity (with a blast score of 2471-2508, query coverage of 96-99 %).

Fig. 2. [...] Table 2. Black lines and text indicate Chryseobacterium spp. with validly published names (n=7). Green line and text indicate strain AG844 that was predicted to be very closely related to C. cucumeris GSE06 in Fig. 1. Red lines and text indicate five strains that were predicted to represent novel species (only three of these are shown in Fig. 1); strains B-14859 and WG4 lacked genome sequences and are shaded.
Therefore, based on the inferences drawn in Table 1, it appeared that strain B-14859 belongs to the genus Chryseobacterium, and represents a novel species. To further characterise the taxonomic position of this strain, phylogenetic analysis was performed using 16S rDNA sequences (961 bp). In the phylogenetic tree, C. cucumeris and C. gleum clustered on a main branch with strains AG844 and RU33C (Fig. 2). A similar clustering was also observed in the proteome sequence-based tree (Fig. 1).
Since strains B-14859 and RU33C were located on a sub-branch within this main branch (Fig. 2), it is likely that they are closely related. Furthermore, Chryseobacterium sp. strain WG4, which was isolated from a water sample collected in the Western Ghats of India [80], was located on a separate branch (Fig. 2) and may also represent a novel species. Notably, C. lactis clustered on a main branch with C. ureilyticum (and strain ISE14) in the 16S rDNA sequence-based tree (Fig. 2), but with C. indologenes (and strain ISE14) in the proteome sequence-based tree (Fig. 1). Interestingly, strain MYb7, which was predicted to be closely related to C. lactis (Table 1), but clustered with C. culicis in the proteome sequence-based tree (Fig. 1), was located on a separate branch in Fig. 2. As indicated previously, further analyses are required to resolve the taxonomic position of strains ISE14 and MYb7.
Overnight cultures of strain B-14859 on solid medium or in broth had a distinct fruity odour. Interestingly, fruity odour was also reported in C. indologenes [38], but not in C. gleum [3]. Based on the characterization of strain WG4 [80], it is likely that the fruity aroma of Chryseobacterium sp. is due to ethyl-2-methylbutyrate and ethyl-3-methylbutyrate. Yabuuchi et al. [38] reported that the flexirubin-type pigment of C. indologenes turned deep red after one drop of 3 % potassium hydroxide (KOH) solution was added, and that the colour change was reversed when one drop of 1.5 N hydrochloric acid (HCl) was added. A similar result was obtained with the pigment of strain B-14859 (Fig. 3). Yabuuchi et al. [38] also reported that the absorption spectra of pigments extracted using acetone from three strains of C. indologenes had a single peak at ~451 nm. The pigment of strain B-14859 extracted using acetone showed a similar peak (Fig. 4). In contrast, the UV-Vis absorption spectrum of acetone-extracted pigments of Sphingomonas paucimobilis strain B-54, which produces C40 carotenoids, showed three peaks (Fig. 4). Furthermore, the ~451 nm peak of strain B-14859 shifted to a higher wavelength after the addition of 20 % KOH (Fig. 4), as reported previously for C. indologenes strains [38]. Therefore, it is likely that the pigment of strain B-14859 is of the flexirubin type.

Fig. 3. Reversible colour change of pigment of strain B-14859. Left: colour changes to red when 3 % KOH added to culture spot (a); no change in colour when 1.5 N HCl is added (b); no change in colour when 1.5 N HCl is added first, followed by 3 % KOH (c); colour changes from red to yellow when 3 % KOH is added first, followed by 1.5 N HCl (d); control culture spot (e). Right: control culture (a); colour changes to red when 3 % KOH is added to the broth culture (b); colour changes from red to yellow when 3 % KOH is added first, followed by 1.5 N HCl (c).
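The distinction drawn above between a single absorption peak (flexirubin-type pigment) and three peaks (carotenoids) amounts to counting local maxima in a recorded spectrum. The Python sketch below illustrates this with synthetic curves only: the 451 nm position comes from the text, whereas the three positions used for the carotenoid-like curve, the peak widths and the height cut-off are arbitrary assumptions and not measured values.

```python
from math import exp


def local_maxima(wavelengths, absorbance, min_height=0.05):
    """Return wavelengths at which absorbance is a strict local maximum."""
    peaks = []
    for i in range(1, len(absorbance) - 1):
        is_peak = absorbance[i - 1] < absorbance[i] > absorbance[i + 1]
        if is_peak and absorbance[i] >= min_height:
            peaks.append(wavelengths[i])
    return peaks


def gaussian(w, centre, width):
    """Simple bell-shaped band used to build synthetic spectra."""
    return exp(-((w - centre) / width) ** 2)


wavelengths = list(range(200, 801))  # 200-800 nm in 1 nm steps

# Synthetic single-band spectrum centred at 451 nm
single_band = [gaussian(w, 451, 30) for w in wavelengths]

# Synthetic three-band spectrum (band positions are arbitrary placeholders)
three_bands = [sum(gaussian(w, c, 8) for c in (430, 455, 485)) for w in wavelengths]

print(local_maxima(wavelengths, single_band))   # [451]
print(local_maxima(wavelengths, three_bands))   # [430, 455, 485]
```

Real spectra are noisy, so some smoothing or a more robust peak-finding routine would normally be applied before counting peaks.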
Flexirubins, first identified in Flexibacter elegans [81], are polyene compounds that are insoluble in many organic solvents or water [82]. The biological functions of these pigments, which appear to be pervasive in Chryseobacterium spp. (Table 2), are yet to be characterised.