Complete genome sequence of canine astrovirus with molecular and epidemiological characterisation of UK strains.

Highlights
• Astroviruses are a common cause of gastroenteritis in many species including man.
• We sought to determine whether canine astrovirus is circulating in the UK.
• Canine astrovirus was identified in four dogs with gastroenteritis.
• Sequencing the capsid of each isolate identified significant genetic heterogeneity.
• The first full genome sequence of canine astrovirus has also been determined.

, but to date, no full-length genome sequence of CaAstV has been reported.
The purpose of this study was to determine the prevalence of CaAstV in the UK dog population and to obtain, where possible, the complete genetic sequence of circulating strains. Four CaAstV strains were identified in dogs showing clinical signs of gastroenteritis, and sequencing of ORF2 found significant genetic diversity between strains. The first full-length sequences of two CaAstV strains were subsequently determined.

Sample collection
Stool samples were collected from dogs admitted to five participating veterinary clinics and a single animal shelter in counties in the south and east of England: Cambridgeshire, Kent, Lincolnshire, Middlesex and Suffolk. Ethical approval was not required for this study as all samples collected were animal waste products. With owner consent, dogs were recruited to the study if they passed stools whilst hospitalised. Stool samples were collected from dogs from the animal shelter if they passed diarrhoea. All stool samples were stored at −20 °C until and during transportation to the laboratory, where they were stored at −80 °C prior to nucleic acid extraction. As controls, stool samples were also collected from healthy dogs owned by veterinary staff at each clinic, as well as from dogs at participating boarding kennels. Basic case data were recorded for each dog from which a stool sample was collected, including age, breed, sex, reason for admission, and any recent history of enteric disease.

Detection of CaAstV in clinical samples
Stools were diluted to a final concentration of 10% w/v in phosphate-buffered saline, pH 7.2, and solids were removed by centrifugation at 8000 × g for 5 min. Viral nucleic acid was extracted from 140 µl of each clarified stool suspension using the GenElute™ Mammalian Total RNA Miniprep Kit (Sigma-Aldrich) according to the manufacturer's instructions. An internal extraction control (equine arteritis virus RNA) was added during the nucleic acid extraction process as previously described (Caddy et al., 2013). cDNA was generated by reverse transcription using MMLV reverse transcriptase enzyme (Life Technologies) and random hexamers (Life Technologies), with the reaction performed at 42 °C for 1 h, followed by an inactivation step at 70 °C for 10 min. qPCR was performed using primers targeting a conserved region of the viral RNA-dependent RNA polymerase as previously described (Martella et al., 2011). The sequences of primers used for PCR in this study are presented in Table 1. qPCR reactions were prepared using the MESA Blue qPCR MasterMix Plus for SYBR Assay (Eurogentec). Briefly, 2 µl cDNA was mixed with 2× MasterMix and 0.5 µM primers, then incubated at 95 °C for 10 min. The thermal cycle protocol used with a ViiA7 qPCR machine (AB Applied Biosystems) was as follows: 40 cycles of 94 °C, 15 s; 56 °C, 30 s; 72 °C, 30 s, followed by generation of a melt curve. Viral genome copy number was calculated by interpolation from a standard curve generated using serial dilutions of a standard DNA amplicon cloned from a positive control. The limit of detection was defined as the lowest concentration of control standard DNA reproducibly detected in the assay. All samples were additionally screened for the presence of canine parvovirus (CPV), canine enteric coronavirus (CECoV) and canine norovirus (CNV) using a 1-step qRT-PCR protocol (Caddy et al., 2013).
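The interpolation of copy number from a standard curve described above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: the dilution series, the Ct values and the function names are all hypothetical, and the fit is an ordinary least-squares line of Ct against log10 copy number.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Interpolate the copy number of a sample from its Ct value."""
    return 10 ** ((ct - intercept) / slope)

def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0

# Hypothetical ten-fold dilution series of the cloned standard amplicon
# (copies per reaction) and hypothetical Ct values; not data from the study.
standard_copies = [1e7, 1e6, 1e5, 1e4, 1e3, 1e2]
standard_ct = [14.1, 17.5, 20.8, 24.2, 27.5, 30.9]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
efficiency = amplification_efficiency(slope)
sample_copies = copies_from_ct(22.5, slope, intercept)  # hypothetical sample Ct

print(f"slope={slope:.3f}, efficiency={efficiency:.1%}, copies={sample_copies:.0f}")
```

A slope close to −3.32 corresponds to an amplification efficiency near 100%, which is a common sanity check before interpolating sample values.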

Capsid sequencing
All samples positive for CaAstV by qPCR were subjected to conventional PCR to confirm the presence of CaAstV and to enable genome sequencing. The capsid of all positive samples was amplified from cDNA synthesised using SuperScript II Reverse Transcriptase (Invitrogen) according to the manufacturer's protocol with 0.5 µM AV12 primer (Table 1). The PCR reaction was performed using KOD hot start polymerase (EMD Millipore), with reverse primer s2m-rev, and the forward primer 625F-1 from the original qPCR assay. The amplification programme consisted of an initial 5 min step at 95 °C, followed by 35 cycles of 95 °C for 20 s, 58 °C for 30 s and 72 °C for 90 s. A final elongation step at 72 °C for 5 min was performed, followed by chilling to 4 °C.
PCR products were subsequently cloned into pCR-Blunt™ using the Zero Blunt PCR Cloning Kit (Life Technologies) according to the manufacturer's protocol. Sequencing of the 5′ and 3′ regions of the plasmid insert was performed using pCR-Blunt™ specific primers by the University of Cambridge Biochemistry DNA Sequencing Facility. Sequencing primers for the central region of the insert were then designed based on the primary sequence data to give a 200 nt overlap with each predecessor, and a second round of sequencing reactions was performed (sequencing primer details available on request). The complete capsid nucleotide sequence generated using this method was then confirmed by direct sequencing of PCR products generated from cDNA.
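The primer-walking strategy described above, in which each read overlaps its predecessor by 200 nt, can be illustrated with a short sketch. The insert length (2500 nt) and usable read length (800 nt) below are assumptions chosen for illustration only; neither value is stated in the text, and `tile_reads` is a hypothetical helper name.

```python
def tile_reads(insert_length, read_length, overlap):
    """Return 1-based (start, end) windows that cover an insert,
    with consecutive windows sharing `overlap` nucleotides."""
    if insert_length < 1 or read_length < 1:
        raise ValueError("lengths must be positive")
    if not 0 <= overlap < read_length:
        raise ValueError("overlap must be non-negative and shorter than a read")
    windows = []
    start = 1
    while True:
        end = min(start + read_length - 1, insert_length)
        windows.append((start, end))
        if end == insert_length:
            break
        # The next read starts `overlap` nt before the end of this one.
        start = end - overlap + 1
    return windows

# Assumed values for illustration: 2500 nt insert, 800 nt usable reads.
for start, end in tile_reads(2500, 800, 200):
    print(start, end)
```

Each window start marks roughly where the next sequencing primer would need to anneal; in practice the primer sits slightly upstream, because the first bases of a Sanger read are usually unreliable.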

Full genome sequencing and phylogenetic analysis
PCR amplification of ORF1a and ORF1b was achieved using cDNA synthesised using SuperScript II Reverse Transcriptase (Invitrogen) according to the manufacturer's protocol with 626R-1 reverse primer (Table 1). A forward primer was designed approximately 300 bp from the 5′ end of ORF1a, based on a small genome fragment fortuitously amplified by mis-priming in a preliminary screen. The Zero Blunt PCR Cloning Kit was then used to clone the PCR product, and sequencing was performed using the protocol described above. Primers were designed sequentially to enable sequencing of the entire cloned product in both the 5′-3′ and 3′-5′ directions, with 200 nt overlaps between segments. Direct sequencing of PCR products was finally performed to ensure that no mutations had arisen as a result of cloning. Each nucleotide was sequenced at least once in each direction, with an average coverage of 3.
Sequences at the extremities of the viral genome were determined using 5′ and 3′ RACE utilising a kit (Life Technologies) according to the manufacturer's instructions, and gene-specific primers as listed in Table 1. Direct sequencing was then performed as above.

Sample collection
Stool samples and clinical data were collected from 67 dogs with severe gastroenteritis admitted to veterinary clinics or an animal shelter distributed across the UK between August 2012 and June 2014. Control samples were collected from 181 dogs without signs of gastroenteritis, from either veterinary inpatients with non-gastrointestinal illness, or dogs at boarding kennels or belonging to veterinary staff. In total 56 breeds of dog were represented. The mean age of dogs with gastroenteritis was 4.3 years (standard deviation 4.1 years) and the mean age of control animals was 6.1 years (standard deviation 3.9 years).

Identification of CaAstV isolates
Nucleic acid extraction and qPCR were performed on 248 stool samples. Samples were systematically tested for the presence of CaAstV using a SYBR-based qPCR screen. In addition, screening for CPV, CECoV, CNV and the internal extraction control (EAV) was performed using a 1-step qRT-PCR protocol. CaAstV was detected in a total of four samples. All four positive dogs were showing signs of gastroenteritis, whereas CaAstV was not detected in any dogs without gastroenteritis. The difference between the prevalence of CaAstV in dogs with gastroenteritis (4/67, 6.0%) and the prevalence in dogs without gastroenteritis (0/181) was statistically significant (p < 0.001). CPV was detected in 10/67 dogs with gastroenteritis, and CECoV in 2/67 of the same group. Two of the four CaAstV positive dogs were also co-infected with CPV, but no co-infections with CECoV and CaAstV were identified. The age of CaAstV positive dogs ranged from 7 weeks to 7 years, with a mean age of 2.1 years (SD 3.3 years). Table 2 summarises the clinical information and viral screening results of the four CaAstV positive dogs.
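The statistical test behind the reported p value is not named in the text. As a hedged illustration only, the sketch below applies two common choices to the counts given above (4 of 67 dogs with gastroenteritis positive, 0 of 181 controls positive): an uncorrected Pearson chi-square test and the upper tail of Fisher's exact test. The function names are hypothetical, and neither calculation should be read as the authors' method.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]].
    Returns the statistic and its p value for 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def fisher_upper_tail(a, b, c, d):
    """P(X >= a) under the hypergeometric distribution with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)
    upper = min(row1, col1)
    favourable = sum(
        math.comb(row1, k) * math.comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return favourable / total

# Rows: gastroenteritis, controls. Columns: CaAstV positive, CaAstV negative.
chi2_stat, chi2_p = chi_square_2x2(4, 63, 0, 181)
fisher_p = fisher_upper_tail(4, 63, 0, 181)

print(f"chi-square = {chi2_stat:.2f}, p = {chi2_p:.5f}")
print(f"Fisher exact (upper tail) p = {fisher_p:.5f}")
```

With these counts the uncorrected chi-square p value falls below 0.001, whereas the exact test gives a value of about 0.005. Both indicate a significant difference at the 5% level, but the choice of test matters when expected cell counts are this small.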

Comparison of CaAstV capsid sequences
The complete capsid (ORF2) coding sequence of the four CaAstV strains identified was determined using conventional PCR and cloning techniques. The ORF2 nucleotide and amino acid sequences of these strains were aligned using ClustalW2. The overall nucleotide identity between strains was 77.1-81.1%, whereas the amino acid identity was 79.3-86.3%. The four sequences were deposited in the GenBank database and assigned accession numbers KP404149-KP404152 (cases 1-4, respectively).
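Pairwise identity values of the kind reported above can be computed from an existing multiple alignment as sketched below. This is a minimal illustration under stated assumptions: the three aligned sequences are toy strings rather than CaAstV data, the strain names are hypothetical, and a gap opposite a residue is counted as a mismatch, which is only one of several conventions in use and may differ from the one ClustalW2 reports.

```python
from itertools import combinations

def percent_identity(seq_a, seq_b):
    """Percent identity between two rows of an alignment.
    Columns where both rows are gaps are ignored; a gap opposite
    a residue counts as a mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

# Toy alignment for illustration only (not CaAstV sequence data).
aligned = {
    "strain_A": "ATGGCTAGCAAG",
    "strain_B": "ATGGCAAGCAAG",
    "strain_C": "ATGG---GCAAG",
}

identity = {
    (a, b): percent_identity(aligned[a], aligned[b])
    for a, b in combinations(sorted(aligned), 2)
}

for pair, value in identity.items():
    print(pair, f"{value:.1f}%")
```

The same function works for amino acid alignments, since it only compares characters column by column.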
A number of studies have reported the N-terminal and C-terminal regions of astrovirus capsids to be relatively conserved, whereas the central region is hypervariable. The human astrovirus capsid protein has previously been divided into three regions: the N terminus (amino acids 1-415), a variable central region (416-707), which includes a hypervariable section from 649 to 707, and a conserved C terminus (708-786) (Willcocks et al., 1995). An analogous approach for the CaAstV capsid was taken by Zhu et al. (2011), who divided the capsid into three regions for analysis: amino acids 1-446, 447-730, and 731-end. The four capsid sequences derived from this study have been analysed according to the latter scheme, and sequence identity compared (Fig. 1). Sequence analysis of the three regions clearly shows that the majority of sequence variation is concentrated in region II. A 24 nt deletion was identified in samples 2-4 at the 5′ end of region II, which has previously been reported in Chinese CaAstV strains (Zhu et al., 2011). It is not possible to predict the location of this deletion on the capsid structure as it is beyond the region of the astrovirus capsid spike for which the crystal structure has been solved (Dong et al., 2011). However, sequence alignment with the human astrovirus type 8 capsid (GenBank AAF85964.1) shows this deletion to be located downstream of the caspase cleavage site required for virion maturation, which truncates the full-length capsid protein (VP90) into the mature VP70 form (Banos-Lara and Méndez, 2010). Therefore it is predicted that the 24 nt deletion will not alter the mature virion. Evolutionary analysis of the four CaAstV sequences from the study, alongside the seven previously reported full-length CaAstV capsid sequences, is presented in Fig. 2. This analysis indicated that the UK strains do not cluster, contrary to a previous study, which analysed CaAstV strains from a single city (Zhu et al., 2011). The UK strains are distinct from one another, and whereas one strain clusters most closely with the Chinese strains, the remainder group with strains identified in Italy over a number of years.

Full-length CaAstV genome sequenced
The complete CaAstV genome was determined from two samples, isolated from a 7-week-old crossbreed dog (strain Gillingham/2012/UK, GenBank accession number KP404149) and a 7-year-old Border Collie (strain Lincoln/2012/UK, KP404150). The total length of CaAstV Gillingham/2012/UK is 6600 nt and that of CaAstV Lincoln/2012/UK is 6572 nt. Each genome encodes three ORFs: ORF1a, ORF1b and ORF2, flanked by a 5′ UTR and a 3′ UTR plus a poly-A tail (Fig. 3). In human astroviruses, the 5′ UTR is 85 nt in length, whereas data from both CaAstV strains indicate that the CaAstV 5′ UTR is 45 nt. The 83 nt 3′ UTR of human astrovirus is identical in length to the 3′ UTR of CaAstV Lincoln/2012/UK, whereas the 3′ UTR of CaAstV Gillingham/2012/UK is 2 nt shorter. The nucleotide composition of both CaAstV strains is 29% A, 22% G, 26% T and 23% C. The G/C composition is 45%.
The ORF1a of non-canine astroviruses encodes a serine protease. Sequence alignment of CaAstV with astroviruses of other species shows a high degree of conservation in the predicted serine protease region (Fig. 3B). This is especially pronounced in the regions around the proposed catalytic triad of the serine protease.
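Base composition and G/C content of the kind quoted above can be tallied from a genome sequence as sketched below. The function names are hypothetical, and the input is a synthetic 100 nt string constructed to mirror the reported percentages, not an actual CaAstV genome.

```python
from collections import Counter

def base_composition(genome):
    """Percentage of A, C, G and T in a sequence.
    U is treated as T; ambiguous bases (e.g. N) are excluded."""
    seq = genome.upper().replace("U", "T")
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}

def gc_content(genome):
    """Combined G + C percentage."""
    composition = base_composition(genome)
    return composition["G"] + composition["C"]

# Synthetic sequence built to match the reported composition; illustration only.
toy_genome = "A" * 29 + "G" * 22 + "T" * 26 + "C" * 23

print(base_composition(toy_genome))
print(f"G/C content: {gc_content(toy_genome):.0f}%")
```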
ORF1a also encodes the viral genome-linked protein, VPg (Fuentes et al., 2012) (Fig. 4C). CaAstV VPg is predicted to start at aa 656, at a conserved QK cleavage site, and is 90 aa in length. In other astroviruses, it has been proposed that the C-terminal VPg cleavage site is coded by Q(P/A/S/L) (Al-Mutairy et al., 2005), and the presence of a QS dipeptide at the same site in CaAstV is consistent with this prediction. The amino acid motif KGK(N/T)K is conserved at the N-terminal end of VPg sequences from both astroviruses and caliciviruses, and this is also identifiable in the CaAstV genome. Another conserved VPg motif is TEXEY, with mutagenesis studies indicating that the Y (tyrosine) residue covalently links VPg to viral RNA (Fuentes et al., 2012). Analysis of the CaAstV ORF1a sequence also identifies the conserved TEXEY motif at 684-688, thus this tyrosine is predicted to covalently link to the RNA genome. However, the CaAstV sequence diverges slightly from the other mamastroviruses studied, in that X of the motif corresponds to K, whereas this is E/Q in all other mamastroviruses.
A −1 ribosomal frameshift site between ORF1a and ORF1b, present in human astroviruses (Marczinke et al., 1994), is also conserved in CaAstV. This translational frameshift is directed by the slippery heptamer sequence AAAAAAC at position 2666 in the CaAstV genome. A stem loop structure is predicted downstream of the slippery sequence, as shown in Fig. 4. The slippery sequence and downstream stem loop are highly conserved amongst mamastroviruses. The 3′ end of ORF1a overlaps with ORF1b by 49 nucleotides. This is shorter than the 71 nt overlap reported for human astroviruses.
Fig. 2. Phylogenetic tree based on the full-length amino acid sequence of the capsid protein of CaAstVs. This includes the four UK strains identified in this study (underlined) and the seven strains previously sequenced and listed in GenBank. The tree was determined using the neighbour-joining method. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (100 replicates) is shown next to the branches. Evolutionary analyses were conducted in MEGA6.
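The conserved VPg motifs and the slippery heptamer discussed above lend themselves to a simple pattern search. The sketch below is an illustration only: the regular expressions encode KGK(N/T)K, TEXEY and AAAAAAC as written in the text, while the toy protein and nucleotide strings are invented and do not reproduce CaAstV sequence or coordinates.

```python
import re

# Patterns taken from the motifs named in the text.
VPG_N_TERMINAL = re.compile(r"KGK[NT]K")   # KGK(N/T)K
VPG_TEXEY = re.compile(r"TE.EY")           # TEXEY, X = any residue
SLIPPERY_HEPTAMER = re.compile(r"A{6}C")   # AAAAAAC

def find_motif(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence.upper())]

# Invented sequences for illustration; positions are not CaAstV coordinates.
toy_protein = "MSQKKGKNKAAGRTEKEYLLD"
toy_genome = "GCCGAAAAAACGGG"

print(find_motif(VPG_N_TERMINAL, toy_protein))
print(find_motif(VPG_TEXEY, toy_protein))
print(find_motif(SLIPPERY_HEPTAMER, toy_genome))
```

A pattern hit alone does not establish function; as in the text, the position of the match relative to predicted cleavage sites and downstream RNA structure is what supports the annotation.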
As for all other astroviruses studied to date, an overlapping reading frame exists at the ORF1b-ORF2 junction of CaAstV. The CaAstV ORF1b-ORF2 overlap sequence is 8 nt, as reported for other mamastroviruses, hence ORF2 is in the same frame as ORF1a. As previously reported, the capsid sequence was found to have an in-frame start codon 180 nt upstream of the start codon homologous to other mamastrovirus genomes (Toffan et al., 2009).
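The statement that ORF2 lies in the same frame as ORF1a follows from modular arithmetic on the two junctions, which the sketch below works through. It assumes that a reading frame is identified by the genome position of the first nucleotide of each codon, modulo 3, and that the 8 nt overlap is counted inclusively from the first nucleotide of ORF2 to the last nucleotide of the ORF1b stop codon. Both conventions, and the function names, are assumptions made for illustration.

```python
def frame_after_shift(frame, shift):
    """Frame reached after a ribosomal frameshift of `shift` nucleotides."""
    return (frame + shift) % 3

def frame_of_overlapping_orf(upstream_frame, overlap_nt):
    """Frame of an ORF whose first nucleotide lies `overlap_nt` nucleotides
    (inclusive) before the last nucleotide of the upstream ORF.

    The last nucleotide of the upstream stop codon sits at frame + 2, so the
    downstream start is at frame + 2 - (overlap_nt - 1), i.e. frame - overlap_nt
    modulo 3."""
    return (upstream_frame - overlap_nt) % 3

for orf1a_frame in range(3):
    orf1b_frame = frame_after_shift(orf1a_frame, -1)       # -1 frameshift
    orf2_frame = frame_of_overlapping_orf(orf1b_frame, 8)  # 8 nt overlap
    print(orf1a_frame, orf1b_frame, orf2_frame)
```

Whatever frame ORF1a occupies, the −1 shift followed by the 8 nt overlap amounts to a net displacement of 9 nt, a multiple of 3, so ORF2 returns to the ORF1a frame.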
The 6 aa C terminus of VP1 (SRGHAE) is highly conserved among several mammalian AstVs. This motif is within a highly conserved nucleotide stretch, s2m, overlapping the termination codon of ORF2, and has been identified in both CaAstV strains sequenced.
The overall nucleotide identity between the two CaAstV strains sequenced was 88.5%. Sequence comparison of the individual ORFs is presented in Table 3. This clearly shows that ORF1b (RdRp) is the most highly conserved, making it an ideal target for qPCR screens. Conversely, the capsid sequence is the most diverse.
A phylogenetic tree was constructed by multiple alignment of the full-length genome of the two CaAstV strains isolated in this study, and a number of astrovirus reference strains isolated from different mammalian species. This was achieved using MEGA6 software (Fig. 5).

Discussion
CaAstV has previously been detected sporadically in dogs across the world, but the association with disease, prevalence levels and genetic diversity are largely unknown. This study presents the first identification and molecular characterisation of CaAstV cases in dogs in the UK. Sequencing of the viral capsid for all four strains revealed extensive genetic diversity, and the first full-length sequences for CaAstV were determined for two strains.
The prevalence of CaAstV in gastroenteritis cases in this study was shown to be 6.0%. This prevalence was unexpectedly high given that CaAstV has not previously been reported in the UK. It may be predicted that this is an underestimation of prevalence based on the population of dogs surveyed. The majority of previous CaAstV epidemiological studies have focused on dogs less than 6 months old, whereas this study included dogs of any age. Serological studies have shown that exposure to CaAstV typically occurs in young animals, with dogs older than 3 months significantly more likely to be seropositive than younger dogs (Martella et al., 2011). This suggests that studies focusing only on young dogs will identify more positive CaAstV cases. However, the decision to survey dogs of any age in this study enabled detection of a CaAstV case in a 7-year-old dog. This is the oldest case of CaAstV reported to date and highlights the need to have an index of suspicion for infectious causes of gastroenteritis in dogs of any age. Indeed, although human astrovirus is more common in paediatric populations, infections in the elderly are reported (Fernández et al., 2011).
The pathology caused by CaAstV in dogs is uncertain as CaAstV has previously been detected in the stools of both healthy and diseased dogs (Grellet et al., 2012; Martella et al., 2011). However, our study shows a clear relationship between the presence of CaAstV RNA in stool samples and gastroenteritis. This is consistent with previous reports (Martella et al., 2011; Zhu et al., 2011), but is at odds with a French study which found no significant difference in CaAstV identification between diarrhoeic (27%) and healthy puppies (19%) (Grellet et al., 2012). Determination of the specific pathology induced by CaAstV infection in dogs is often confounded in a clinical setting by co-infections with other gastroenteric pathogens (e.g. CPV in dogs 1 and 4 in this study). Experimental studies will be required to confirm or refute the association of CaAstV with gastroenteritis, but this study does suggest CaAstV can cause disease.
Sequencing of the capsid region of each CaAstV strain identified in this study revealed significant sequence variation. This mirrors the variation previously identified within human astrovirus isolates (De Benedictis et al., 2011). At present, astroviruses are named according to the species in which they are isolated, and subsequent classification is based upon serotypes; these are defined if there is a 20-fold or greater two-way cross neutralisation titre (King et al., 2011). Sequence analysis has verified this classification, with human astroviruses 1-8 having 86-100% nucleotide identity within a serotype, based on capsid sequences (Noel et al., 1995). The nucleotide identity within the capsid region of the four CaAstV strains was shown to be 77.1-81.1%, which is below this within-serotype range and strongly suggests these strains are also different serotypes. Confirmation of this requires serological analysis, but unfortunately repeated attempts to grow the CaAstV isolates identified in this study in cell culture failed (data not shown).
Identification of four possible CaAstV serotypes circulating in the UK alone raises questions regarding the possible origins of these strains. Phylogenetic analysis of the UK capsid sequences alongside the limited number of CaAstV sequences previously listed in GenBank gave an unexpected result. There was no clustering of the UK strains, unlike the grouping of all Chinese strains. Instead, the UK strains each grouped with a different CaAstV isolate from either China or Italy. With so few sequences available it is not possible to determine whether CaAstV strains have spread globally, or whether independent evolution has occurred. However, a high rate of evolution clearly does occur within all astroviruses, as their RNA genome facilitates the introduction of point mutations and recombination events.
Given the strain diversity identified, it has been suggested that some CaAstV strains may be more pathogenic than others. This has previously been reported for astroviruses of mink, which show variation in their ability to invade the central nervous system in a strain-dependent manner (Martella et al., 2012). Assessment of this risk will require wider epidemiological and clinical studies. Another concern raised by the existence of multiple circulating CaAstV strains relates to future disease control. Management of viral causes of gastroenteritis in dogs is best achieved by widespread vaccination, exemplified by the widely used canine parvovirus vaccine. However, if CaAstV vaccine development is considered, the presence of multiple strains will make vaccine design challenging.
Fig. 5. Phylogenetic tree based on complete genome nucleotide sequences of astroviruses from a range of mammalian species. GenBank accession numbers are listed for all sequences analysed, with the CaAstV sequences underlined. The tree was determined using the neighbour-joining method. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (100 replicates) is shown next to the branches. Evolutionary analyses were conducted in MEGA6.
Full genome sequencing of two CaAstV isolates revealed them to be closely related and to possess a typical astrovirus organisation. The first full-length sequence of an astrovirus was determined for human astrovirus in 1994, and relatively few full-length sequences have since been determined. Sequence analysis of the CaAstV strains identifies the presence of a serine protease and VPg within ORF1a, as for other astroviruses, which are separated from the conserved RdRp of ORF1b by a −1 frameshift.
In summary, this study has not only identified CaAstV circulating in the UK dog population, but also found significant genetic diversity within the CaAstV strains. Furthermore, full genome sequencing of two CaAstV isolates has enabled detailed molecular characterisation of this astrovirus species, and provides the astrovirus field with further examples of genome variation.