Detection and characterization of a novel bat-borne coronavirus in Singapore using multiple molecular approaches

Bats are important reservoirs and vectors in the transmission of emerging infectious diseases. Many highly pathogenic viruses such as SARS-CoV and rabies-related lyssaviruses have crossed species barriers to infect humans and other animals. In this study we monitored the major roost sites of bats in Singapore, and performed surveillance for zoonotic pathogens in these bats. Screening of guano samples collected during the survey uncovered a bat coronavirus (Betacoronavirus) in Cynopterus brachyotis, commonly known as the lesser dog-faced fruit bat. Using a capture-enrichment sequencing platform, the full-length genome of the bat CoV was sequenced and found to be closely related to the bat coronavirus HKU9 species found in Leschenault’s rousette discovered in the Guangdong and Yunnan provinces.


InTRoduCTIon
Infectious diseases continue to be a major threat to modern society and coronaviruses (CoVs) are one of the most notable virus families responsible for recent, highly pathogenic viral disease outbreaks. Over the last two decades, major outbreaks of deadly CoVs have been reported in humans and livestock, including severe acute respiratory syndrome (SARS) in 2003 [1,2], Middle East respiratory syndrome (MERS) in 2012 [3] and most recently, swine acute diarrhoea syndrome (SADS) in 2017 [4]. These outbreaks have had a significant impact on the economy, global travel and society, and should serve as a warning for other emergent CoVs. While Asia used to be considered to be the 'hot zone' for such outbreaks, the emergence of MERS highlighted that such events can happen anywhere in the world.
CoVs are a family of viruses within the order Nidovirales, and the subfamily Orthocoronavirinae can be divided into four genera, Alphacoronavirus, Betacoronavirus, Deltacoronavirus and Gammacoronavirus, based on antigenic reactivity and/or genome sequences (https:// talk. ictvonline. org/ taxonomy/). CoVs in general can cause disease in a variety of domestic and wild animals, as well as in humans, whereas alpha-and betacoronaviruses predominantly infect mammals. SARS-CoV and MERS-CoV belong to the genus Betacoronavirus, under the subgenera Sarbecovirus and Merbecovirus, respectively [5]. These viruses are genetically distinct [6]. SARS-CoV originated in China and spread worldwide, infecting approximately 8000 individuals with a overall mortality rate of 10 % during the 2002-2003 epidemic [7]. Since emerging in 2012 in the Middle East, MERS-CoV has spread to 27 countries and has infected humans with an estimated mortality rate of 35 % [8].
In addition to SARS-CoV and MERS-CoV, the betacoronavirus human coronavirus OC43 [9], and the alphacoronaviruses NL63 and 229E-CoV have been reported to cause infections in humans [10,11]. The novel HKU2-related bat coronavirus, SADS-CoV, was identified as the aetiological agent responsible for a large-scale outbreak of fatal disease in China that caused the death of 24 693 piglets [4]. Although SADS-CoV was shown to be incapable of infecting humans in its current form, it remains to be seen whether further adaptation or a related virus can spill over into human populations in the future.
Bats have been identified as a natural reservoir for a variety of viral diseases, with RNA viruses accounting for the overwhelming majority of emerging pathogens [12]. Notably, bats have been recognized to carry exceptionally diverse CoVs from which SARS-CoV, MERS-CoV and SADS-CoV are thought to originate [4,13].
In light of these recent events, it is probable that bat-borne CoVs will continue their zoonotic emergence and cause more human SARS-or MERS-like outbreaks. One of the ongoing challenges that scientists face with these zoonotic crossover events is ensuring early warning or rapid detection and diagnosis so that effective countermeasures can be implemented. Unfortunately, current detection methodologies and our lack of understanding of cross-species transmission make the prediction of future spillover events nearly impossible. Active surveillance remains our best defence to ameliorate the impact of these events. With limited funding available for these surveillance systems in regions where they are likely to be needed the most, it is currently impossible to achieve sufficient breadth and depth to comprehensively detect all potential spillover events. An investment focused on surveillance of CoVs in bats is considered to be most impactful based on past data and the current trend of emerging infectious diseases worldwide [14].
In 2013 the National Environment Agency (NEA) and the National Parks Board (NParks) initiated a bat surveillance programme in Singapore. Through an island-wide survey, bat species, bat day roosts, population counts and age demographics were determined at each location. Guano samples were collected from three major roost sites and other opportunistic sampling, such as harvesting of dead bat carcasses and oral and rectal swabs from live catches, was performed. While screening these samples for pathogens of public health concern in Singapore, CoVs were detected in two guano samples and a rectal swab collected from a lesser dog-faced fruit bat. The CoV genome is the largest among all known RNA viruses (~30 kb) and whole-genome sequencing is not a simple task, especially from clinical or surveillance samples with limited genetic material [15]. To increase the speed and success rate of full-genome sequencing of novel CoVs, we have developed a probe-based enrichment method to specifically capture CoV sequences using a library of 120 nt probes. The library contains probes targeting all known CoV sequences [published by the National Center for Biotechnology Information (NCBI) and unpublished]. This new platform was first validated with cultured MERS-CoV and subsequently applied to sequence and characterize the novel CoV detected from bat samples collected in Singapore.

Ecological data gathering and guano sampling
Roost surveys were performed for the identification of species, determination of the size of roosts and guano collection. Twenty-seven locations were selected for survey based on reports of bat activity collected from social media reports and public sightings from January to March 2014 as well as bat sightings from field surveys of major green areas. As bats are known to utilize alternative roost sites for different functions, only day roosts were considered for the ground survey to reduce the possibility of double counting [16]. Information such as bat species, population counts and age estimates were collected from the day roosts. Photographs of the roosting bats were taken to assist and complement survey counts. Individual bats were identified by their eyes, head, nose and wings using a taxonomic key [17]. For estimation of the age of bats, it was assumed that nursing bats were juvenile and all others were adults.
Repeated surveys were performed at five roost sites [HortPark, Singapore Zoological Garden, Singapore Botanic Garden, Rifle Range Flyover (RRFO) and Lower Peirce Reservoir] and three major roost sites were also identified (East Coast Park, RRFO and Singapore Zoological Garden) for periodic guano sampling. During sampling, groundsheets were placed under the roosting bats and guano samples were collected from the groundsheets and put into universal transport media. In addition, opportunistic rectal swabs were collected from bats that were accidentally caught in mist nets during the NParks bird-ringing programme or provided by the Wildlife Reserve Singapore (WRS). Any bat carcasses found in a nature reserve (SBWR) or donated by pest control companies were sent to the laboratory for testing. Guano samples and rectal swabs were placed in universal transport media and bat carcasses were kept refrigerated before processing in the laboratory. Bat carcasses were swabbed in the oral and rectal cavities before dissection. Tissues (heart, liver, spleen, lung, kidney, intestine and brain) were then harvested.
The geographical distribution of bat roosts in Singapore was plotted in QGIS 3.2.3 (QGIS Development Team, 2018) using the basemap of Singapore downloaded from Onemap (Singapore Land Authority, 2018). Project coordinate system WGS 84 projection (EPSG:4326) was used for plotting.

RnA extraction and PCR screening of bat samples
A total of 107 guano samples, 14 rectal swabs and 6 bat carcasses, collected over a period of 3 years, were screened for the presence of CoVs [2], Japanese encephalitis virus (JEV) [18], rabies-related lyssavirus [19], leptospirosis [20] and hantavirus [21]. Nucleic acid was extracted from tissue/ guano/swabs using the AllPrep DNA/RNA Mini kit (Qiagen) according to the manufacturer's instructions. RNaseOUT (Invitrogen), a ribonuclease inhibitor, was added to each RNA prior to storage at −80°C. RNA was reverse-transcribed using SuperScript Reverse Transcriptase (Invitrogen) according to the manufacturer's instructions. The cDNA was used as the template for conventional PCR for the detection of coronavirus, hantavirus and leptospirosis with Phusion Flash High-Fidelity PCR Master Mix (Life Technology). Any PCR products were purified using a PCR purification kit (Qiagen) and sequenced by Sanger sequencing. RNA was used as a template for real-time PCR using the QuaniTect SYBR green kit (Qiagen) according to the manufacturer's instructions for the detection of rabies-related lyssavirus and JEV.
The primer sequences for the detection of coronavirus, JEV [18], rabies-related lyssavirus [19], leptospirosis [20] and hantavirus [21] are listed in Table S2 (available in the online version of this article).

RnA extraction
Vero B4 cells were infected with MERS-CoV under BSL3 containment and harvested for RNA extraction when cytopathic effect (CPE) was evident. RNA samples were verified as being inactive according to the Duke-NUS ABSL3 protocol prior to removal from the facility. RNA was extracted from both MERS-CoV-infected Vero B4 cells and uninfected Vero B4 cells using the Total RNA Kit I (Omega Biotek) according to the manufacturer's instructions. RNA was extracted from WIV1-infected Vero cells using the Total RNA Kit I (Omega Biotek) according to the manufacturer's instructions. The total amount of RNA was quantified using a Nanodrop (Thermo Scientific).

design and assessment of CoV probe library
For the solution-based targeted-enrichment methodology, biotinylated DNA probes were designed against 90 coronavirus genomes from the NCBI database and our unpublished studies that are known to infect bats. The probes were 120 nucleotides (nt) in length and complementary to the viral genome. To capture this diversity, we took an iterative approach to the design, where we first designed baits targeting the conserved regions of the genome and then generated baits targeting the variable regions. The full list of 4303 probes designed for CoV can be found in Table S1.
To assess the CoV probe library, next-generation sequencing (NGS) libraries were prepared from 1 µg of PaKi cellular RNA spiked with serially diluted MERS-CoV RNA. A volume of 2 µl of the diluted MERS-CoV RNA was added to

Phylogenetic analysis
Multiple sequence alignment of the full-genome sequence of BtCoV92 coronavirus and 2211 unique coronavirus sequences present in the NCBI database was carried out using a fast Fourier transformation method in MAFFT v6.940b. An approximately maximum-likelihood phylogenetic tree was generated using the generalized time-reversible model of nucleotide evolution in FastTree v2.1.7 [26]. FastTree uses SH-like local supports with 1000 resamples to estimate and validate the reliability of each split in the tree. From the tree, the branch containing the sequence from BtCoV92 coronavirus sequence was selected and a more robust maximumlikelihood phylogenetic tree was created using RAXML [27] with 1000 bootstrap replications. The trees were visualized

Active surveillance of the Singapore bat population
From 2013-2016, an island-wide survey was conducted by NParks and NEA to determine the species, demographics and population size of bats at different locations in Singapore. A total of 27 locations were surveyed and bat roosts were found in 15 of these locations (Fig. 1). During the collection of bat ecological data (Table 1), three major bat roosting sites at East Coast Park, RRFO and Singapore Zoological Garden were identified for surveillance and periodic guano sampling.
Thirteen out of 15 roost sites only harbour lesser dog-faced bats (Cynopterus brachyotis). Cave nectar bats (Eonycteris spelaea) were found at RRFO and whiskered bats (Myotis sp.) at HortPark. The species and population of bats in the Bukit Timah Caves were undetermined due to insufficient light to allow for counting and visual inspection of the bats without disturbance. Based on a previous survey, the bats were identified as dusky fruit bats (Penthetor lucasi) [28,29]. We were unable to distinguish the number of adult and juveniles bats found at the Rifle Range Flyover and East Coast Park, as the bats were roosting under the flyover and on tall trees and our field equipment was unable to capture the information accurately.  An approximately maximum-likelihood phylogenetic tree was generated using the generalized time-reversible model of nucleotide evolution in FastTree v2.1.7. FastTree uses SH-like local supports with 1000 resamples to estimate and validate the reliability of each split in the tree. From the tree, the branch containing the sequence from the Bt92 coronavirus sequence and 58 sequences from NCBI was selected and a more robust maximum-likelihood phylogenetic tree was created using RAXML with 1000 bootstrap replications. The trees were visualized using FigTree v1.4.2.

Identification of a CoV-positive sample through active surveillance
A total of 107 guano samples, 14 rectal swabs and 6 bat carcasses were collected and screened for the presence of CoVs, JEV, rabies-related lyssavirus, leptospirosis and hantavirus. Three samples collected from lesser dog-faced bats (C. brachyotis) were found to be positive for CoV by PCR targeting a 440 bp region located within the RNAdependent RNA polymerase gene (RdRp) [2]. This set of primer targets motif A corresponding to MGWDYPKCD and motif C corresponding to MMILSDD within the RdRp gene, which is strictly conserved among known RNA-dependent polymerases [30][31][32][33]. CoV sequences (BtCoV22 and BtCoV29) were detected in guano samples collected from the Singapore Zoological Garden and SBWR, respectively. One CoV sequence (BtCoV92) was detected in the rectal swab of a bat carcass collected from SBWR.

design of CoV probe library for enrichment
A probe library to cover all known alpha-and betacoronaviruses was designed and constructed from all available non-redundant genome sequences (full length or partial). The algorithm BaitMaker [34] was then used to design a custom probe set for coronaviruses (Table S1). Briefly, BaitMaker utilizes a k-mer-based search pattern strategy to identify regions of conservation and variability within a viral species. Non-redundant, non-overlapping probes are then designed to target these regions with a minimum of 85 % identity between the probe and the targeted sequence. The source code for Bait-Maker software is freely available at: https:// umasangumathi. github. io/ BaitMaker/.

Assessment of CoV probe library with MERS-CoV
Generic NGS enables unbiased pathogen discovery, but suffers from issues of low sensitivity and high per-sample cost, mainly due to high background of host genes in most samples. After NGS, removal of host background followed by a de novo assembly strategy is usually used for pathogen discovery when viral RNA is abundant. However, this approach fails when the amount of viral RNA is low.
To overcome this issue and to fully realize the power of NGS in pathogen discovery and investigation, several groups (including ours) have now developed an enrichment-based NGS strategy [35][36][37]. To improve the signal-to-background noise ratio, NGS libraries are incubated with a set of biotinylated viral capture probes predesigned according to known viral sequences, followed by enrichment with streptavidin-coated beads. From the assessment of the CoV probe library, there were at least 100 times the number of reads mapped in the enriched samples as compared to the unenriched samples for each dilution of MERS-CoV RNA ( Table 2). The enrichment strategy was also evaluated on WIV1, an uncharacterized SARS-related coronavirus. An increase in the depth of coverage in the enriched sample compared with the unenriched sample was demonstrated (Fig. S1).

Genome characterization of the novel bat CoV
Sequences obtained from the three BtCoVs detected in lesser dog-faced bats were analysed and assembled with Geneious (version 9.  (Table 3). The 5′-ACGAAC-3′ TRS has also been shown to be the TRS for SARS-CoV [38].
The genome of BtCoV92 is similar to that of the other CoVs, with a characteristic gene order of 5′-replicase ORF1ab, spike (S), envelope (E), membrane (M) and nucleocapsid (N)-3′.
The BtCoV92 contains four accessory genes interspersed within the structural genes, one ORF encoding none structural protein NS3 between the S and E genes (ORF 3) and three ORFs encoding NS7a, NS7b, NS7c downstream of the N gene (Fig. 2). NS7c is not typically reported in HKU9, but was only found in the diliman-1525G2 virus found in C. brachyotis in the Philippines [39]. Based on the available partial genome sequence, NS7c was also present in BtCoV22/29, sharing 100 % sequence identity with the NS7c found in BtCoV92. BtCoV92 is closely related to the species Rousettus bat coronavirus HKU9 found in Leschenault's rousette and an unidentified species Rousettus sp. found previously in the Guangdong and Yunnan provinces [40] (Fig. 3). Bat coronavirus HKU9 and bat coronavirus GCCDC1 are two closely related yet distinct betacoronaviruses. GCCDC1 is genetically less diverse than HKU9 and has a p10 gene from reovirus in its genome [41]. HKU9 consists of five lineages and BtCoV92 does not group within any of the five lineages, which makes it a new one [41]. The S and N proteins of BtCoV92 showed 61-72 % and 74-80 % identity to those of the HKU9 species, respectively. We found no recombination between BtCoV92 and HKU9 (NC_009021.1 and HM211101.1) with the Simplot program (data not shown). The N protein of BtCoV92 showed 90 % identity with that of the diliman-1525G2 virus and they clustered in the same branch in the N protein phylogenetic analysis ( Fig. 4a and b). Similarly, when using a partial N protein sequence (461 amino acids, 8 missing amino acids from position 351-358 of the N protein), BtCoV22/29 clustered together with BtCoV92 and diliman-1525G2 virus during phylogenetic analysis (Fig. 4b). In a previous study, a betacoronavirus (KX452687) was detected in local E. spelaea [42] and when comparing the N protein partial sequences with those of BtCoV92, BtCoV22/29, these sequences clustered under a different branch from KX452687 in the phylogenetic tree (Fig. 4a).

dISCuSSIon
In 2015 [39], emerging diseases with epidemic potential were identified by the World Health Organization (WHO) as targets for outbreak preparedness, and SARS-CoV and MERS-CoV were both in the list [43]. Delayed identification and detection of SARS-CoV during the outbreak from November 2002 to April 2003 resulted in wide spread of disease and increased human fatalities [44] . Therefore, surveillance remains key for early outbreak alerts and disease interventions.
In this study, we reported the characterization of the complete genome sequence of a novel coronavirus in Singapore from lesser dog faced bats (C. brachyotis). These bats were the most frequently counted bats during the island-wide surveillance and are commonly found in parks, where their food sources (fruit trees) are widely distributed. They are well adapted to living closely with humans, which inevitably increases the risk of disease transmission and the emergence of zoonoses [45]. Previously, flaviviruses (Phnom Penh virus, Carey island virus and Jugra virus), Nipah virus and betacoronaviruses have been found in C. brachyotis [39,46,47] . Obtaining the full-genome sequences for CoVs from field samples has always been challenging for multiple reasons: (1) the CoV genome (26-32 kb) is the largest non-segmented RNA viral genome known to date [15]; (2) CoV genomes are genetically highly variable due to the high frequency of recombination among CoVs [48] [48,49]; and (3) the absolute amount and relative percentage of viral nucleic acid is generally very low in clinical/field samples. In the current study, we developed a CoV-targeted probe-based enrichment strategy to improve the efficiency of CoV genome characterization. As a proof of concept, we applied this CoV-enrichment NGS strategy to rapidly characterize the first complete CoV genome sequence in Singapore from a novel CoV detected in C. brachyotis. Genetically, this newly found CoV belongs to the genus Betacoronavirus and is closely related to the species Rousettus bat coronavirus HKU9 [50] and diliman-1525G2 [39].
As for any probe-based enrichment NGS strategy, our probe library is designed to detect both known CoVs and novel CoVs genetically related to known members of the family. We believe that the same approach is equally applicable to both outbreak responses and active surveillance programmes, such as the US PREDICT project [51] and the proposed Global Virome Project [52].