Severe fever with thrombocytopenia syndrome (SFTS) is caused by SFTS virus (SFTSV), a member of the order Bunyavirales, family Phenuiviridae, genus Bandavirus (https://talk.ictvonline.org/taxonomy). SFTS is an emerging infectious disease whose major clinical symptoms and laboratory findings include fever, thrombocytopenia, gastrointestinal symptoms, leukopenia, and elevated levels of serum hepatic enzymes. Deaths from SFTS usually result from multiple organ failure, and the average fatality rate is 12%, although it has been reported to be as high as 30% in some areas [1,2,3]. SFTS was first reported in China, with additional cases subsequently confirmed in Japan, Korea, and, most recently, Vietnam [4]. Two cases with comparable symptoms caused by a similar virus, Heartland virus, were reported in the United States, and cases of infection with novel bandaviruses, including the Hunter Island group virus, Malsoor virus, and Guertu virus, were reported in Australia, India, and China, respectively [1, 5,6,7,8,9,10].

SFTS is mainly transmitted by ticks. Specifically, ticks of the family Ixodidae have been implicated as vectors of SFTSV. However, human-to-human transmission by contact with blood or body fluid from infected patients has also been reported in China and South Korea [2, 11,12,13,14]. Significantly, a novel case of SFTS infection was reported in South Korea without evidence of a tick bite [13]. Since the first report of SFTS infection in 2010, the number of cases has continuously increased every year in China, Japan, and South Korea. Patient surveillance in South Korea demonstrated 36 confirmed cases in 2013, which increased to 55 in 2014, 79 in 2015, and 165 in 2016 [15, 16]. A total of 158 SFTSV strains were isolated from the serum of these patients as described previously [17, 18]. Given the novelty of this virus and the limited information available, we aimed to acquire more molecular-level information on SFTSV toward the goal of developing a new diagnostic method for SFTS. To this end, we randomly selected 51 cases while ensuring that all provinces with a confirmed SFTS patient were included, and the isolates from these cases were sequenced.

The 51 clinical samples used in this study were collected as part of a laboratory surveillance system led by the Korea National Institute of Health (KNIH) during 2013–2016. In brief, the 5′- and 3′-terminal regions were sequenced by rapid amplification of cDNA ends (RACE). The genome sequences, comprising 41 tripartite (segments L, M, and S) and 10 bipartite (segments M and S) sequences, were generated by de novo assembly with DNASTAR SeqMan version 7.1 (Lasergene). The genome sequences obtained in this study were deposited in the GenBank/EMBL/DDBJ databases under the accession numbers KU507543–KU507577, KP663731–KP663745, and MF094728–MF094820. For gene characterization, we collected and manually edited 207 tripartite genome sequences (163 Chinese, 43 Japanese, and one Korean) with available sampling dates from the GenBank database. Here, we focused on the protein-coding regions of SFTSV to investigate sequence variations and evolutionary dynamics.

The geographical distribution of the sequenced SFTSV samples is shown in Figure 1. In our dataset, isolates from Daegu represented the majority of the SFTSV genomes sequenced.

Fig. 1

Geographic distribution of 51 selected SFTS cases in South Korea from 2013 to 2016 analyzed in this study. Of the regions with confirmed cases, Daegu was the main endemic region in this study. The shading of each region reflects the number of SFTS cases by area

Variation analysis was performed using 207 genome sequences collected from the National Center for Biotechnology Information GenBank database and the 51 genome sequences from the KNIH. The genome sequences were aligned against a reference genome sequence (strain HB29: accession no. NC_018139, NC_018138, and NC_018137 for the L, M, and S segment, respectively) using MUSCLE v3.8 [19]. At the nucleotide level, the coding sequences of the L, M, and S segments were 6255, 3222, and 1620 nucleotides long, respectively. Across the dataset, we identified 1254, 803, and 358 variations in segments L, M, and S, respectively, of which 207, 154, and 58 were present exclusively in the Korean isolates.
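The counting step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the toy sequences, and the strain labels are hypothetical, and the sketch assumes that a "variation" is an alignment position at which at least one sequence differs from the reference, and that a variation is "exclusive" to a group when every sequence differing at that position belongs to that group.

```python
from typing import Dict, List, Set


def variable_sites(reference: str, aligned: Dict[str, str]) -> Dict[int, Set[str]]:
    """Map each 1-based variable position to the names of sequences that differ there."""
    sites: Dict[int, Set[str]] = {}
    for name, seq in aligned.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference length")
        for pos, (ref_base, base) in enumerate(zip(reference, seq), start=1):
            # Gaps and ambiguous bases are skipped so they are not counted as variations.
            if ref_base in "ACGT" and base in "ACGT" and base != ref_base:
                sites.setdefault(pos, set()).add(name)
    return sites


def exclusive_sites(sites: Dict[int, Set[str]], group: Set[str]) -> List[int]:
    """Positions at which only members of `group` differ from the reference."""
    return [pos for pos, names in sorted(sites.items()) if names <= group]


# Toy alignment (hypothetical sequences, not real SFTSV data).
reference = "ATGGCCAAA"
aligned = {
    "KR_1": "ATGGCTAAA",
    "KR_2": "ATGGCTAAG",
    "CN_1": "ATGACCAAG",
    "JP_1": "ATGGCCAAA",
}
korean = {"KR_1", "KR_2"}

sites = variable_sites(reference, aligned)
print(len(sites), "variable positions;",
      len(exclusive_sites(sites, korean)), "exclusive to the Korean group")
```

Counting (position, alternative base) pairs instead of positions would give different totals, so the definition should be fixed before comparing numbers across studies.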

At the amino acid level, the L, M, and S segments contain 2084, 1074, and 540 amino acid residues, respectively. In the Korean isolates, 82, 122, and 48 amino acids varied in the L, M, and S segments, respectively, of which 31, 37, and 16 were specific to the Korean sequences. In segment S, site 238 of the nonstructural protein coding region contained multiple variations: D (Asp) > E (Glu)/N (Asn)/G (Gly). In all of the Japanese sequences, this change was to E (Glu), whereas the Korean sequences presented three variations: one E (Glu) (strain 16KS28), two N (Asn) (strains 16KS31 and 16KS40), and one G (Gly) (strain 16KS26). A Japanese research group reported that the R > S substitution at amino acid residue 962 is crucial for the membrane fusion step of viral infection [20]. In our data, all of the KNIH strains except for strain 15KS7 (accession no. MF094809) had this substitution at residue 962. Another study found that the R > W substitution at position 624 was associated with strong cell-fusion activity under acidic conditions, although none of the KNIH strains showed this variation [21].
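A per-site tabulation of this kind (which substitutions occur at a given residue, broken down by country) can be sketched as follows. The peptides, strain labels, and the function name `substitutions_at` are hypothetical placeholders; the toy site 3 only mimics the pattern reported for site 238 and does not reproduce real sequence data.

```python
from collections import Counter, defaultdict
from typing import Dict


def substitutions_at(site: int, reference: str, proteins: Dict[str, str],
                     country_of: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Count substitutions (written as 'ref>alt') at a 1-based site, grouped by country."""
    ref_res = reference[site - 1]
    table: Dict[str, Counter] = defaultdict(Counter)
    for name, seq in proteins.items():
        res = seq[site - 1]
        if res != ref_res:
            table[country_of[name]][f"{ref_res}>{res}"] += 1
    return {country: dict(counts) for country, counts in table.items()}


# Toy aligned peptides (hypothetical); the reference carries D at site 3.
reference = "MSDLK"
proteins = {
    "JP_a": "MSELK", "JP_b": "MSELK",
    "KR_a": "MSELK", "KR_b": "MSNLK", "KR_c": "MSNLK", "KR_d": "MSGLK",
    "CN_a": "MSDLK",
}
country_of = {
    "JP_a": "Japan", "JP_b": "Japan",
    "KR_a": "Korea", "KR_b": "Korea", "KR_c": "Korea", "KR_d": "Korea",
    "CN_a": "China",
}

result = substitutions_at(3, reference, proteins, country_of)
for country, counts in sorted(result.items()):
    print(country, counts)
```

Sequences that match the reference at the site contribute nothing, so a country with no substitutions is simply absent from the result.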

To investigate the evolutionary dynamics of SFTSV, a maximum-clade-credibility tree was constructed by Bayesian phylogenetic analysis using the BEAST v1.8.4 package [22] and the FigTree v1.4 program [23]. The analysis used the general time-reversible substitution model with gamma-distributed rate heterogeneity (G) and a proportion of invariable sites (I), under both strict and uncorrelated relaxed molecular clocks. The trees for the three segments showed a similar topology (Fig. 2). A total of 248 sequences for segment L and 258 sequences for each of segments M and S were divided into two major geographical clades, designated as the Chinese clade and the Korean/Japanese clade (hereafter referred to as clade B, representing the virus commonly circulating in South Korea and Japan). The Chinese geographical clade was composed of five clades (A, C–F), and geographical clade B was the largest single clade.

Fig. 2

Maximum-clade-credibility phylogenetic trees of the three genome segments: (A) segment S, (B) segment M, and (C) segment L. Red, Korean complete genome sequences; orange, Korean incomplete genome sequences (M and S segments only); green, Japanese complete genome sequences; black, Chinese complete genome sequences; purple, Korean complete genome sequence from GenBank; blue, C1 clade in each of the three segments

Clade B contained 30 Chinese, 42 Japanese, and 34 Korean strains for segment L; 29 Chinese, 42 Japanese, and 41 Korean strains for segment M; and 30 Chinese, 42 Japanese, and 41 Korean strains for segment S. Of the 41 tripartite KNIH genomes (segments L, M, and S), 34 clustered in clade B, as did seven of the 10 bipartite KNIH sequences (segments M and S). Six tripartite KNIH isolates, KASJH (2014), 16KS15 (2016), 16KS17 (2016), 16KS33 (2016), 16KS51 (2016), and 16KS52 (2016), grouped in the Chinese clade D. Of the remaining three bipartite isolates, one belonged to the Chinese clade D and two to clade A. One KNIH isolate (16KS45) appeared to have resulted from inter-lineage reassortment, because it grouped into different Chinese clades according to the segment analyzed: the segment L tree grouped 16KS45 in clade C, whereas the segment M and S trees grouped this isolate into clade A.

Although 98% of the Japanese isolates clustered in clade B, the isolate SPL087A grouped in Chinese clades: clade C for segment L, clade E for segment M, and clade A for segment S. For the Chinese isolates, 81.6% of the genome sequences clustered in the Chinese clades, whereas 30 isolates clustered in the Korean/Japanese clade B. Altogether, these results indicate that the majority of the Korean and Japanese SFTSV genomes cluster distinctly from the Chinese SFTSV genomes. Nevertheless, clade B may need to be separated into at least three subclades owing to the recent growth of this clade with a large number of Korean SFTSV sequences.
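The dataset sizes and clade proportions quoted above follow directly from the reported counts. The short Python check below recomputes them, using only numbers stated in the text; the variable names are ours, and the Chinese clade B count is the segment L and S value (30), since segment M has 29.

```python
# Counts taken from the text.
genbank_total = 207          # GenBank tripartite genomes (163 Chinese, 43 Japanese, 1 Korean)
knih_tripartite = 41         # KNIH isolates with segments L, M, and S
knih_bipartite = 10          # KNIH isolates with segments M and S only

segment_l_total = genbank_total + knih_tripartite
segment_ms_total = genbank_total + knih_tripartite + knih_bipartite


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


japanese_in_b = percent(42, 43)               # Japanese isolates in clade B
chinese_outside_b = percent(163 - 30, 163)    # Chinese isolates in Chinese clades

# Tally of KNIH isolates by clade: clade B, clade D, reassortant 16KS45.
tripartite_tally = 34 + 6 + 1
# Bipartite isolates: clade B, clade D, clade A.
bipartite_tally = 7 + 1 + 2

print(segment_l_total, segment_ms_total)
print(japanese_in_b, chinese_outside_b)
print(tripartite_tally, bipartite_tally)
```

The tallies also confirm that the clade assignments account for all 41 tripartite and all 10 bipartite KNIH isolates.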

Genetic reassortment within the segmented RNA genome of SFTSV was observed in this study (Table 1). The Japanese isolate SPL087A emerged as a unique reassortant among the Japanese genomes and clustered in the Chinese clades C, E, and A for the L, M, and S segment, respectively. The Korean isolate 16KS45 was a unique reassortant among the Korean sequences, belonging to the Chinese clades C, A, and A for the L, M, and S segment, respectively. The Chinese strains NB32 and NB38 each had at least one segment that clustered in the Korean/Japanese clade B rather than in a Chinese clade: NB32 clustered in clades B, A, and B and NB38 clustered in clades A, A, and B for segments L, M, and S, respectively. Of the 15 strains that resulted from reassortment, eight had their L and S segments assigned to the same clade and the M segment assigned to a different clade, which is in accordance with the findings of Rezelj et al. [24]. The present analysis also identified a novel Korean reassortant of SFTSV that was not found in earlier studies [25, 26].
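Identifying reassortants from tree topology amounts to comparing the clade assigned to each segment of the same isolate. A minimal Python sketch of that comparison is given below, using the four per-segment assignments stated in the text plus one hypothetical non-reassortant control; the function names and pattern labels are ours, and the sketch assumes clade labels have already been read off the segment trees.

```python
from typing import Dict, Tuple

# Clade assignments in (L, M, S) order, as stated in the text.
assignments: Dict[str, Tuple[str, str, str]] = {
    "SPL087A": ("C", "E", "A"),
    "16KS45": ("C", "A", "A"),
    "NB32": ("B", "A", "B"),
    "NB38": ("A", "A", "B"),
    "CONTROL_B": ("B", "B", "B"),  # hypothetical non-reassortant, for illustration only
}


def is_reassortant(clades: Tuple[str, str, str]) -> bool:
    """An isolate is flagged when its segments do not all fall in the same clade."""
    return len(set(clades)) > 1


def pattern(clades: Tuple[str, str, str]) -> str:
    """Describe which segment disagrees with the other two."""
    l, m, s = clades
    if l == m == s:
        return "none"
    if l == s:
        return "M differs"
    if l == m:
        return "S differs"
    if m == s:
        return "L differs"
    return "all differ"


for name, clades in assignments.items():
    print(name, clades, is_reassortant(clades), pattern(clades))
```

In this scheme, the eight strains whose L and S segments share a clade while M falls elsewhere would all be labeled "M differs".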

Table 1 Reassortants identified based on phylogenetic tree topology differences

Bayesian phylogenetic analysis was performed to estimate the evolutionary rate and timescale for SFTSV. The evolutionary rate across all SFTSV sequences, in substitutions per site per year, was estimated to be 1.07E-4 (5.25E-5–1.62E-4) for segment L, 2.08E-4 (1.11E-4–3.04E-4) for segment M, and 2.60E-4 (1.4588E-4–3.5961E-4) for segment S. The estimated time of the most recent common ancestor, expressed as a calendar year, was 1736.66 (1566.24–1874.19) for segment L, 1758.65 (1600.62–1875.34) for segment M, and 1869.82 (1798.98–1929.03) for segment S, thereby indicating that SFTSV might have originated between 1736 and 1869. Although a different dataset was used in each study, our estimates of evolutionary rate were similar to those reported previously [27, 28]. However, Liu et al. [26] reported 3.25–4.2 times higher evolutionary rates than our estimates.
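To give a sense of scale, a per-site rate can be converted into an expected number of substitutions per year over a whole coding region by multiplying it by the region length. The Python sketch below does this with the point estimates above and the coding lengths reported earlier. It assumes that the rates are in substitutions per site per year and that they apply uniformly across the coding region; it is a back-of-the-envelope illustration, not part of the Bayesian analysis.

```python
# Point estimates from the text (assumed units: substitutions/site/year).
rate_per_site_per_year = {"L": 1.07e-4, "M": 2.08e-4, "S": 2.60e-4}
# Coding sequence lengths in nucleotides, as reported for each segment.
coding_length_nt = {"L": 6255, "M": 3222, "S": 1620}


def substitutions_per_year(segment: str) -> float:
    """Expected substitutions per year across the segment's coding region."""
    return rate_per_site_per_year[segment] * coding_length_nt[segment]


def years_per_substitution(segment: str) -> float:
    """Average waiting time, in years, for one substitution in the coding region."""
    return 1.0 / substitutions_per_year(segment)


for segment in ("L", "M", "S"):
    print(f"{segment}: {substitutions_per_year(segment):.3f} substitutions/year, "
          f"about {years_per_substitution(segment):.1f} years per substitution")
```

Under these assumptions, each coding region accumulates less than one substitution per year on average, which fits the long timescale inferred for the most recent common ancestor.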

In summary, we determined the genome sequences of 51 Korean SFTSV isolates sampled from 2013 to 2016, comprising 41 complete (segments L, M, and S) and 10 partial (segments M and S) genomes. This is the first phylogenetic and evolutionary analysis of a large number of Korean SFTSV genome sequences. Most of these KNIH sequences clustered in a major clade with Japanese sequences, whereas six complete KNIH genome sequences clustered in Chinese clades. One of the Korean isolates was identified as a novel reassortant and was assigned to a Chinese clade.