Changes in the spike and nucleocapsid protein of porcine epidemic diarrhea virus strain in Vietnam—a molecular potential for the vaccine development?

Background Porcine epidemic diarrhea virus (PEDV) is a dangerous virus causing large piglet losses. PEDV spread rapidly between pig farms and caused the death of up to 90% of infected piglets. Current vaccines are only partially effective in providing immunity to suckling due to the rapid dissemination and ongoing evolution of PEDV. Methods In this study, the complete genome of a PEDV strain in Vietnam 2018 (IBT/VN/2018 strain) has been sequenced. The nucleotide sequence of each fragment was assembled to build a continuous complete sequence using the DNASTAR program. The complete nucleotide sequences and amino acid sequences of S, N, and ORF3 genes were aligned and analyzed to detect the mutations. Results The full-length genome was determined with 28,031 nucleotides in length which consisted of the 5′UTR, ORF1ab, S protein, ORF3, E protein, M protein, N protein, and 3′UTR region. The phylogenetic analysis showed that the IBT/VN/2018 strain was highly virulent belonged to the G2b subgroup along with the Northern American and Asian S-INDEL strains. Multiple sequence alignment of deduced amino acids revealed numerous mutations in the S, N, and ORF3 regions including one substitution 766P > L766 in the epitope SS6; two in the S0subdomain (135DN136 > 135SI136 and N144> D144); two in subdomain SHR1 at aa 1009L > M1009 and 1089S > L1089; one at aa 1279P > S1279 in subdomain SHR2 of the S protein; two at aa 364N > I364 and 378N > S378 in the N protein; four at aa 25L > S25, 70I > V70, 107C > F107, and 168D > N168 in the ORF3 protein. We identified two insertions (at aa 59NQGV62 and aa 145N) and one deletion (at aa 168DI169) in S protein. Remarkable, eight amino acid substitutions (294I > M294, 318A > S318, 335V > I335, 361A > T361, 497R > T497, 501SH502 > 501IY502, 506I > T506, 682V > I682, and 777P > L777) were found in SA subdomain. Besides, N- and O-glycosylation analysis of S, N, and ORF3 protein reveals three known sites (25G+, 123N+, and 62V+) and three novel sites (144D+, 1009M+, and 1279L+) in the IBT/VN/2018 strain compared with the vaccine strains. Taken together, the results showed that mutations in the S, N, and ORF3 genes can affect receptor specificity, viral pathogenicity, and the ability to evade the host immune system of the IBT/VN/2018 strain. Our results highlight the importance of molecular characterization of field strains of PEDV for the development of an effective vaccine to control PEDV infections in Vietnam.

The S protein (a glycoprotein) contains a specific receptor binding site that is an antigenic target for neutralizing antibodies and relates to its pathogenicity and immunogenicity (Pospischil, Stuedli & Kiupel, 2002). This protein plays a critical role in viral entry through the viral-cellular fusion activity and induces an immune response in the natural host during replication (Lee, 2015;Teenavechyan et al., 2016). Thus, it is used to study the genetic match between vaccine strains and circulating PEDV strains (Lee et al., 2010;Lee & Lee, 2014;Oh et al., 2014). The M protein is a surface protein and plays an important role in the process of virus-assembly and the induction of protective antibodies with neutralizing activity (Lee, 2015;Teenavechyan et al., 2016). The N protein is highly conserved and binds to virion RNA to provide a structural basis for viral transcription, replication, and assembly (Curtis, Yount & Baric, 2002;Chen et al., 2013a;Sun et al., 2015). The N protein was commonly used to diagnose infection with PEDV (Song, Moon & Kang, 2015a) and was known to protect the viral genome during the coronavirus assembly. It also affects other anti-virus responses through host immune evasion strategies (Lee, 2015;Teenavechyan et al., 2016). The epitopes on the N protein are considered to possibly cause for induction of cell-mediated immunity (CMI) (Saif, 1993). However, protein S is more antigenic than other PEDV proteins and anti-S antibodies detected in PEDV-infected pigs last longer than anti-N antibodies (Knuchel et al., 1992). For non-structural proteins, ORF1a and ORF1b are multifunctional related to viral genome replication (Brian & Baric, 2005), and the accessory ORF3 protein is known to be associated with the virus virulence (Park et al., 2007;Park et al., 2008;Wang et al., 2012;Chen et al., 2013b;Song et al., 2015b). Therefore, the ORF3 protein has been studied extensively in the molecular epidemiology of PEDV (Shirato et al., 2011;Chen et al., 2013a;Li et al., 2013;Temeeyasen et al., 2014;Song et al., 2015b). These proteins have been the targets of many studies to understand the causes of the outbreaks and develop more effective vaccines (Lee & Lee, 2014;Temeeyasen et al., 2014;Lee, 2015;Song et al., 2015b;Lin et al., 2016;Su et al., 2016). Won et al. (2020) reviewed 299 published articles and showed that current vaccines are produced mainly based on PEDV strains: CV777 (belonged to G1a group) and SM98, DR13 (belonged to G2a group). The G2b whole-virus killed vaccines have been developed and used in the USA and the G2b live oral vaccine (based on the KNU-141113 strain) has been developed and used in Korea from 2020. However, the virus evoluted and accumulated mutations as the time passed that may lead to sub-optimally match to the actual pandemic virus of the vaccines. Therefore, complete genome sequencing of PEDV strains circulating in each country is of critical in order to develop effective vaccines (Vlasova et al., 2014;Vui et al., 2015;Jarvis et al., 2016;Fan et al., 2017;Lee & Lee, 2017;Rasmussen et al., 2018;Liang et al., 2020;Garcia-Hernandez et al., 2021;Wen et al., 2021).
In Vietnam, PEDV caused an outbreak for the first time in 2009 and then occurred again in 2013 (Diep et al., 2018). Several PEDV vaccines have been developed and being used in Vietnam based on vaccine strains CV777/CN/KT323979, SM98/Korea/GU937797, and DR13/Korea/JQ023162 (belonged to G1a and G2a groups). Although a vaccine campaign for piglets has been conducted since 2011 for PEDV control, diarrhea disease still exists in many provinces and causes serious damage to the livestock industry in recent years in Vietnam. The low effectiveness of the vaccines could be due to the genetic differences between vaccines and field epidemic strains, emphasizing the necessity of novel vaccines against new viral strains. Vui et al. (2015) has sequenced the first complete genome of PEDV from outbreaks in Vietnam in 2013 and suggested that the Vietnamese PEDV isolates were new variants. Other studies mainly focused on the spike gene sequences indicated that there have been remarkable changes in the Vietnamese PEDV strains collected from 2012 to 2016(Kim et al., 2015Diep et al., 2018;Than et al., 2020) and these may decrease the vaccine efficiencies. Although PEDV strains remain the main cause of piglet losses in large litters in Vietnam; however, very limited molecular data of their genotypic and genealogy of PEDV are available from Vietnam. Since the first complete genome sequence of Vietnamese PEDV strains announced in 2015, no more complete sequence of Vietnamese PEDV strain was reported. In this study, we report the molecular characteristics and the changes in the spike and nucleocapsid protein of a PEDV strain isolated from pigs causing severe diarrhea in 2018 in Hung Yen province, Northern Vietnam. Our study emphasizes the importance and urgency of studying the genetic diversity of current PEDV strains and their evolution in order to develop a suitable vaccine strategy for each geographic region.

Sample collection and RNA extraction
The PED-specific PCR-positive fecal samples were collected from the piglets that died of diarrhea in 2018, in Hung Yen province (in North of Vietnam), transported to the laboratory, and stored at −20 • C until used. Total RNA was extracted using Trizol reagent (Merck, Darmstadt, Germany) according to the manufacturer's instructions and suspended in DEPC-treated water (Thermo Fisher Scientific Inc, Waltham, MA, USA) and stored at −80 • C until use. cDNA synthesis and genome sequencing The first-strand cDNA synthesis was performed using the Maxima Reverse Transcriptase Kit (Thermo Fisher Scientific Inc, Waltham, MA, USA) following the manufacturer's protocols. The complete genome sequence of the PEDV strain was amplified by 20 pairs of primers (Table 1) designed based on the conserved regions of the reference strains of PEDV available on GenBank. Each DNA fragment was amplified by using PCR master mix (2X) (Thermo Fisher Scientific Inc, Waltham, MA, USA), and the thermal cycle was performed at 94 • C for 3 min, followed by 35 cycles of 94 • C for 1 min, 50 • C to 56 • C for 40 s, and 72 • C for 1 min, with a final extension at 72 • C for 8 min. The PCR products were purified from agarose gel using GeneJet Gel Extraction Kit (Thermo Fisher Scientific Inc, Waltham, MA, USA) and subjected to DNA sequencing by using an automated sequencer (ABI 3500 Genetic Analyzer). The nucleotide sequence of each fragment was assembled to build a continuous complete sequence using the DNASTAR program.

Phylogenetic analysis
Phylogenetic trees were constructed based on the full-length genome and S nucleotide sequences by the neighbor-joining method using MEGA 7 software (Kumar, Stecher & Tamura, 2016) of the Vietnamese strain with 72 reference strains (

Multiple sequence alignments of S, N and ORF3 genes
The complete nucleotide sequences and amino acid sequences of S, N, and ORF3 genes were aligned and compared by the BioEdit software (version 7.0.9.0) (Hall, 1999) to detect the mutations.

In silico glycosylation and homology modeling analysis
The glycosylation analysis was performed to detect the glycosylation sites in the S, N, and ORF3 proteins using the glycosylation prediction software (Hamby & Hirst, 2008). For homology modeling analysis, the S protein sequence of the Vietnamese strain was compared with the S protein sequence ID 6U7K in PDB by SPDV software (Guex & Peitsch, 1997).

Genome analysis
The complete genome of the IBT/VN/2018 strain was determined and deposited in GenBank under accession number MT198679. The full-length genome was 28,031 nucleotide (nt) in length, excluding the 3 poly-A. It consisted of a 5 UTR region with 292 nt in length; an ORF1ab region with 20,248 nt; a S protein with 4,158 nt (1,385 aa); an ORF3 region with 675 nt (224 aa); an E protein with 231 nt (76 aa); a M protein with 681 nt (226 aa); a N protein with 1,326 nt (441 aa); and a 3 UTR region with 229 nt. Full-length

Phylogenetic analysis
The phylogenetic tree constructed based on the full-length genome sequence of 37 PEDV references indicated that the PEDV strains could be divided into two major groups G1 and G2 (

Multiple sequence alignments of S, N and ORF3 genes
The S gene sequence of the IBT/VN/2018 strain was 4,158 nt in length and encoded a protein consisting of 1,385 aa. The genetic identity of the nucleotide sequence and amino acid sequence of the S gene of the Vietnamese strain was 92.2%-98.1% and 91.1%-98.4% compared to the strains from other countries, respectively (Table S2) The S protein can also be divided into two domains: S1 (aa 1-789), S2 (aa 790-1,385), or subdomains as described by Li et al. (2016) (Fig. 3) (Table S3). The IBT/VN/2018 strain had the highest genetic identity with CH Hubei/CN/2016/KY928065 strain and the lowest identity with the vaccine strain CV777/CN/KT323979. Notably, two substitutions at aa 364 V > I 364 and 378 N > S 378 were only detected in the IBT/VN/2018 strain (Fig. S2).
The identity of nucleotide sequence and the amino acid sequence of the ORF3 region of the IBT/VN/2018 strain was 89.1%-99.4% and 90.1%-100% compared to other strains, respectively (Table S4) members of the Coronaviridae family, with genetic identities ranging from 95.7% to 99.3% compared to other PEDV strains (Table S1). The PEDV, based on the S gene sequence, was classified into two genogroups (G1 and G2) and each can sub-divided into groups (G1a, G1b, G2a, and G2b). Classical strains are designated G1a, and the new variant strains (with insertion and deletion in the S gene) belong to G1b. The G2a and G2b include the highly virulent strains in Asia and North America. The phylogenetic tree constructed from the whole genome sequences (Fig. 1) and the S gene sequences (Fig. 2) showed that the IBT/VN/2018 strain belonged to the G2b subgroup including Northern American and Asian S-INDEL strains. According to previous studies, strains that belonged to the G2b and Northern American and Asian S-INDEL subgroup were highly virulent (Lee, 2015;Lin et al., 2016). This result is consistent with previous studies on PEDV strains isolated in Northern provinces of Vietnam which also showed that Vietnamese PEDV strains belong to the G2b subgroup (Kim et al., 2015;Diep et al., 2018). The S protein composes of 1,383 to 1,386 amino acids consisting of the S1 subunit (aa 1-789) and the S2 subunit (aa 790-1,383) (Fig. 3), depending on the strain (Sun et al., 2007). S protein is known to mediate viral entry and inducing neutralizing antibodies in the natural host. The S1 subunit is the extracellular domain and can bind to target cell receptors (Deng et al., 2016). It is important for cell membrane fusion and virus entry and it is the antigenic target of neutralizing antibodies (Sun et al., 2008;Lee et al., 2011;Deng et al., 2016). It contains a putative signal peptide (aa 1-24), a large extracellular region contains two subdomains: NTD (aa 21-324) and CTD (aa 253-638) (Li, 2015), a single transmembrane domain (aa 1,334-1,356), and a short cytoplasmic tail (Oh et al., 2014). Four neutralizing epitopes (aa 499-638, 748-755, 764-771 designated as COE, SS2, SS6 in domain S1, and 2C10 1,368-1,374 in domain S2) have been determined on the surface of S protein (Sun et al., 2008;Li et al., 2016;Okda et al., 2017). The two regions including SS2 and 2C10 are highly conserved. In contrast, there are two positions at aa 499 and 520 in the COE subdomain and one position at aa 766 in the SS6 subdomain are highly variable (Chang et al., 2002;Cruz, Kim & Shin, 2008;Sun et al., 2008;Lara-Romero et al., 2018). These findings are the basis for the development of new effective vaccines in the future (Yu et al., 2020). Further analysis of sequences in functional protein regions of the IBT/VN/2018 strain showed that at the N-terminal of S protein there were two aa insertions and one aa deletion compared to the CV777/CN/KT323979, AJ1102/CN/JX188454, SM98/Korea/KJ857455, and DR13/Korea/JQ023162 strains (Fig. 4). The substitution at aa 766 P > L 766 was also found in the IBT/VN/2018 strain in the epitope SS6. These changes may lead to the ineffectiveness of the drugs and vaccines.
The results of homologous modeling showed that the acquired mutations found in S protein of the IBT/VN/2018 strain including mutations in the S 0 domain does not remarkably affect its overall 3D structure (Fig. S4). This data is well consistent with result of a previous study which indicated that the deletion of S 0 domain does not impart any macroscopic changes in spike protein conformation (Kirchdoerfer et al., 2021).
It has been known that glycosylation plays an important role receptor binding, virus entry, protein proteolysis, and protein transport, thereby altering the virulence and immune evasion of virus (Vigerust & Shepherd, 2007). The glycosylation sites have been reported in PEDV in recent years (Kim et al., 2015;Wen et al., 2021). Six glycosylation sites (25 G+ , 123 N+ , 62 V+ , 144 D+ , 1009 M+ , and 1279 L+ ) were detected in the ORF3, N, and S proteins of the IBT/VN/2018 strain when compared with the vaccine strains. Among that three novel glycosylation sites (144 D+ , 1009 M+ , and 1279 L+ ) were detected in the IBT/VN/2018 strain. As a consequence, the amino acid substitutions in the S, N and ORF3 proteins may help the IBT/VN/2018 strain evade the host immune system and probably change the virulence of the virus.

CONCLUSION
Our study showed that the IBT/VN/2018 strain belonged to the G2b subgroup that along with the Northern American and Asian S-INDEL strains and are considered as a highly virulent group. Remarkably, eight amino acid substitutions ( 294 I > M 294 , 318 A > S 318 , 335 V > I 335 , 361 A > T 361 , 497 R > T 497 , 501 SH 502 > 501 IY 502 , 506 I > T 506 , 682 V > I 682 , and 777 P > L 777 ) were found in S A subdomain in the Vietnamese strain compared with the vaccine strains. In addition to these substitutions, three novel N-and O-glycosylation sites (144 D+ , 1009 M+ , and 1279 L+ ) were detected in the S protein of IBT/VN/2018 strain. The continual mutations in these genes may have generated a novel antigenic strain and help the virus escape from the host immune response induced by the vaccine. Our results highlight the importance of molecular characterization of PEDV strains circulating in Vietnam and provide a molecular potential for the development of an effective vaccine to control PEDV infections of pigs in Vietnam.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This study was supported by the Vietnam Academy of Science and Technology under grant No VAST02.03/20-21. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.