Temporal Transcriptome and Promoter Architecture of the African Swine Fever Virus

The African Swine Fever Virus causes haemorrhagic fever in domestic pigs and presents the biggest global threat to animal farming in recorded history. Despite its importance, very little is known about the mechanisms and temporal regulation of transcription in ASFV. Here we report the first detailed viral transcriptome analysis of ASFV during early and late infection of Vero cells. In addition to total RNA sequencing, we have characterised the transcription start sites and transcription termination sites at nucleotide resolution, revealing the distinct DNA consensus motifs of early and late promoters, as well as the sequence determinants for transcription termination. ASFV can utilise alternative promoters to generate distinct proteins from the same transcription unit that differ with respect to the polypeptide N-terminus. Finally, our results reveal that the ASFV-RNAP undergoes transcript slippage at the 5’ end of transcription units, which, in a promoter sequence-specific manner, results in the addition of 5’-AT and 5’-ATAT tails to mRNAs.


Introduction [553 words]
African swine fever virus (ASFV) causes incurable and lethal haemorrhagic fever in domestic pigs. In 2019, ASF presents an acute and global animal health emergency that has the potential to devastate entire national economies since effective vaccines or antiviral drugs are not available (FAO of UN).
Urgent action is needed to advance our knowledge about the fundamental biology of ASFV, including the mechanisms and temporal regulation of transcription. A thorough understanding of RNAP and factor function, as well as accurate knowledge of which genes are expressed and their amino acid sequence, is direly needed for the development of antiviral drugs and vaccines, respectively. ASFV is the sole member of Asfarviridae 1 , a family resembling others in the Nucleocytoplasmic Large DNA Viruses (ORFs). The genomic variation between strains predominantly originates from loss or gain of genes at the genome termini that belong to multigene families (MGFs) 6 . Despite its global economic importance, little is known about ASFV transcription, even though it is believed to be similar to the vaccinia virus (VACV) system [7][8][9] , a distantly related NCLDV member of the Poxviridae family 10 . ASFV encodes an RNA polymerase (RNAP), a poly-A polymerase, and an mRNA capping enzyme, and extracts obtained from mature virus particles are fully transcription competent 9,11,12 . The basal ASFV transcription machinery resembles the eukaryotic RNAPII system by encompassing an (8-subunit) ASFV-RNAP and distant relatives of the TATA-binding protein (TBP), the transcription initiation factor II B (TFIIB) and the elongation factor TFIIS. ASFV also encodes a histone-like DNA binding protein, pA104R 13 . Of particular interest is the possibility that the ASFV-RNAP can gain promoter-specificity in terms of temporal (early or late) gene expression dependent on the association with either host (eukaryote)-like TBP/TFIIB or virus-specific factors such as D1133L/G1340L, which are homologous to the VACV D6-A7 ETF heterodimer 14,15 .
The promoter consensus motifs for early and late ASFV genes have not been characterised on a genome-wide scale, or in fact in any detail, with the exception of an AT-rich sequence motif upstream of the p72 transcription start site (TSS) and a consistently AT-rich region overlapping the TSS in late genes 16 .
Importantly, information about the temporal ASFV gene expression, TSS and transcription termination sites (TTS) is sparse 9,10 .
We have focused our analysis on the BA71V strain (170,101 bp genome, with 153 annotated ORFs 17,18 ), because this is the best studied ASFV strain with regard to viral transcription, mRNA modification and protein expression 9,19 . Here we have applied a combination of transcriptome sequencing (RNA-seq), RNA 5'-end (cap analysis gene expression sequencing, or 'CAGE-seq') and RNA 3'-end (3' RNA-seq) determination. We report (i) a map of the ASFV transcriptome at one early and one late time point during infection (5 h and 16 h post-infection; Figure 1), (ii) a genome-wide TSS map that has allowed us to define early and late ASFV promoter consensus motifs as well as shed light on 5' UTRs and the phenomenon of RNA-5' leaders in ASFV, and (iii) a genome-wide TTS map that provides novel insights into the mechanism of transcription termination in ASFV and by inference in NCLDVs.

Results [2,902 words]

ASFV transcriptome and genome structure
A transcriptome is defined by the overall expression levels of mRNAs, and their 5' and 3' termini. We carried out RNA-seq, CAGE-seq and 3' RNA-seq in order to characterise these parameters during early and late ASFV infection, which combined inform about the ASFV transcriptome and DNA sequence signatures associated with transcription initiation and termination.
Vero cells were infected with BA71V, and at 5h and 16h post-infection RNA was extracted from cells.
Bowtie 2 20 mapping of the RNA-seq, CAGE-seq and 3' RNA-seq reads (summarised in Supplementary Table 1) showed a strong correlation between replicates (Pearson correlation coefficient r ≥ 0.98), with the exception of RNA-seq at 16h (r of 0.74 and 0.84 for the two strands, Supplementary Figure 1). For both time points, reads mapped well to the ASFV genome and were consistent between replicates for all three techniques. A genome-wide map of reads mapped from all three NGS techniques is shown in Figure 2a, while examples of signals for transcription start sites (TSSs) and transcription termination sites (TTSs) revealed from downstream analysis are shown in Figure 2b-e. The RNA-seq sequencing depth was sufficient to analyse significant changes in ASFV gene transcription (i.e. reads) at early and late infection due to the small genome size (170 kb), and unsurprisingly most CAGE-seq reads were aligned upstream of ORF start codons. However, a subset of late infection sample reads did map to intergenic regions or within ORFs, perhaps indicative of misannotated ASFV ORFs. Alternatively, these RNAs may be generated by pervasive transcription, a phenomenon observed during late VACV infection 21,22 . Another origin of intra-ORF CAGE RNA-5' signals could be mRNA de-capping and degradation followed by re-capping, as observed with eukaryotic mRNAs 23,24 .
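The replicate agreement reported above is expressed as a Pearson correlation coefficient. As a minimal sketch of that statistic, the following computes r for two vectors of per-position read counts. The function name and the toy counts are illustrative assumptions and do not reproduce the authors' pipeline or data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length count vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel in r)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-position read counts for two replicates (illustrative only)
rep1 = [0, 12, 250, 8, 0, 31]
rep2 = [1, 10, 240, 9, 0, 35]
r = pearson_r(rep1, rep2)
```

In practice the two vectors would hold strand-specific coverage at every genome position, which is why one value per strand is quoted for the 16h RNA-seq replicates.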

Mapping of ASFV Primary Transcription Start Sites
Following mapping of CAGE-seq reads to the ASFV-BA71V genome, we located regions including an enrichment of reads corresponding to the 5' ends of transcripts and thereby the transcription start sites (TSS). Using the CAGEfightR 25 package we detected a total of 779 TSS organised in clusters with a predominant sharp peak within each cluster (Supplementary Figure 2a), somewhat broader than the 657 termination clusters from 3' RNA-seq (Supplementary Figure 2b). Six additional TSS clusters were annotated manually since they were clearly detectable when viewing alignments manually but missed by CAGEfightR. Not all ~780 clusters were located within 500 bp upstream of ASFV gene translation initiation codons, as would be indicative of genuine gene promoters; some were located within, or antisense relative to, ORF coding sequences (CDS) (Figure 3a). 28 clusters in total were not found associated with any annotated ASFV-BA71V ORFs. The clusters associated with annotated ORFs and in the sense direction were manually investigated for their feasibility as 'primary' TSSs (pTSSs), based on peak height, proximity to the ORF initiation codon, and coverage from our complementing RNA-seq data. We identified pTSSs fulfilling these criteria upstream of 151 of the 153 annotated ORFs in the ASFV-BA71V genome, the missing ORFs being E66L and C62L. Overall, our data showed good agreement with previously individually mapped TSSs of 44 ORFs, with the identified pTSSs matching in 86% of cases (Supplementary Table 2). Our sequencing data resulted in the re-annotation of eleven ORFs where the pTSS / RNA-seq coverage was not compatible with the ORF annotations in the published ASFV-BA71V genome. Based on the edited annotations (Table 1), we provide a novel gene feature file (Separate 'GFF').
Several genes (including B169L and I177L) have a pTSS upstream of the annotated start codon and an alternative TSS apparently residing within the ORF (Figure 3b). This was previously observed for gene I243L that is associated with distinct TSSs for early, intermediate and late stages of gene expression described by Rodríguez et al. 26 , in agreement with our CAGE-seq results (Supplementary Figure 3a).
I243L encodes a homologue of the important RNA polymerase II transcript cleavage factor TFIIS that is highly conserved in eukaryotes and among NCLDV members albeit with varying domain conservation 27 . Since the late I243L TSS is downstream of the start codon at the I243L ORF 5' end, the N-terminal domain of the protein is absent during late infection. The role for this domain in transcription is little understood but appears to be important during preinitiation complex assembly and stability, independent of the transcript cleavage activity 28,29 . We identified 7 further genes with alternative pTSSs during early and late infection; these are summarised in Table 2. In most cases, the re-annotated (single pTSS downstream of start codon) or alternative pTSSs (multiple pTSSs, some downstream of start codon) did not substantially alter the protein products of the ORFs, except for re-annotated I177L and alternative pTSSs of B169L (Figure 3b), two putative transmembrane proteins 11,18 . The longer B169L mRNA synthesised during late infection encodes the complete 169 AA protein, while the early transcript encodes an N-terminally truncated protein of 78 AA, and the predominant late pTSS for I177L would truncate it to less than half its original length.

Novel Genes Supported by Sequencing Data
28 pTSSs in our CAGE-seq data set were not associated with annotated ORFs (Supplementary Table 3).
Our manual analysis of both CAGE-seq and RNA-seq maps revealed that seven of these pTSSs were associated with transcripts that encode short ORFs, which we call putative novel genes (pNGs). These encode polypeptides of 25-56 AA length (Table 3) without clear similarity to characterised ASFV genes; they were likely missed in the initial BA71V ORF prediction because only ORFs ≥ 60 AA were annotated 18 . Figure 3c illustrates the features of pNG1 and pNG3, with distinct TSS and TTS, and robust RNA-seq read coverage across the entire gene. In support of the notion that the pNGs are bona fide ASFV genes, five of seven pNG transcription units include a 5-8 nucleotide poly-T sequence at their 3' termini; this sequence signature has been proposed to serve as a transcription termination motif 9,30 .

Highly expressed ASFV genes during Early and Late Infection
In order to gain insights into the expression of the individual TUs, we quantified the steady-state mRNA levels using CAGE-seq and compared the most highly abundant mRNAs at early and late time points 31 . Surprisingly, six genes were found in the top-20 expressed genes during both early and late infection from CAGE-seq (CP312R, A151R, K205R, Y118L, pNG1, I73R). However, a simple comparison of gene steady-state transcript levels would not be sufficient to draw conclusions about differential gene expression between the two time points; a thorough analysis of this kind requires both CAGE- and RNA-seq data sets as described below 32 .

Differential Expression of ASFV Genes
We characterised the differential expression of ASFV genes between early and late infection by comparing separate DESeq2 analyses of RNA-seq and CAGE-seq datasets. From RNA-seq data we concluded that 103 ASFV TUs showed significant differential expression (adjusted p-value < 0.05), with 47 genes down- and 56 genes upregulated during the progression from early (5 hrs) to late (16 hrs) infection (Supplementary Figure 4b). CAGE-seq appeared to be more sensitive, indicating that 149 genes were significantly differentially expressed: 65 genes were down- and 84 genes were upregulated (Figure 4b). 101 genes were congruent in both datasets, and the changes in expression were significantly correlated between CAGE-seq and RNA-seq data sets (Spearman's rank correlation coefficient ρ = 0.73, Figure 4c). The directions of changes were reversed for only 10 (out of 101) genes between the common RNA-seq and CAGE-seq datasets (DP63R, I329L, NP419L, B66L, A224L, E248R, O174L, D345L, C315R and NP1450L), giving us a total of 91 genes we confidently classified as early (36) and late (55) from our analysis. Supplementary Table 6 provides all details of differentially expressed genes from the RNA-seq and CAGE-seq analyses, their functions, and whether previously detected in viral particles. The 91 genes with correlated differential expression (from both CAGE-seq and RNA-seq) were assigned functional categories based on their annotation in the VOCS database 33 and ASFVdb 34 (Figure 4d). Around one fifth of the early and late genes were classified as 'uncharacterised' without any functional predictions. The transition between 5 h and 16 h post infection is accompanied by a significant transcript-level increase in genes important for viral morphology and structure, and the overall diversity of differentially expressed genes also changed.
A significant difference was seen in the multigene family (MGF) members; they constitute nearly half of the early genes, but only one (MGF 505-2R) belongs to the late genes. ORFs annotated with a 'transmembrane region' or 'putative signal peptide' were also overrepresented in late infection (Fisher Test: p < 0.05); they remain poorly characterised beyond a domain prediction, and nine proteins (out of 12) of these ORFs could be detected in BA71V virions by mass spectrometry 11 .
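The early/late assignment described in this section combines two independent DESeq2 results per gene: a gene is kept only if it is significant in both RNA-seq and CAGE-seq and changes in the same direction. The sketch below illustrates that decision rule. The function name, the sign convention (log2 fold change of late relative to early, so negative values mean downregulation) and the example values are assumptions made for illustration.

```python
def classify_gene(rna_lfc, rna_padj, cage_lfc, cage_padj, alpha=0.05):
    """Classify a gene from two differential-expression results.

    rna_lfc / cage_lfc: log2 fold change, late (16 h) relative to early (5 h).
    rna_padj / cage_padj: adjusted p-values; None means "not tested".
    Returns 'early', 'late', 'discordant' or 'unclassified'.
    """
    padjs = (rna_padj, cage_padj)
    # Require significance in BOTH datasets (the 'congruent' set in the text)
    if any(p is None or p >= alpha for p in padjs):
        return "unclassified"
    if rna_lfc < 0 and cage_lfc < 0:
        return "early"   # downregulated from early to late infection
    if rna_lfc > 0 and cage_lfc > 0:
        return "late"    # upregulated from early to late infection
    return "discordant"  # significant in both, but opposite directions


# Hypothetical values for three genes (not taken from the supplementary tables)
examples = {
    "geneA": classify_gene(-2.1, 0.001, -3.0, 0.0004),
    "geneB": classify_gene(1.5, 0.01, 2.2, 0.002),
    "geneC": classify_gene(0.8, 0.03, -1.1, 0.01),
}
```

Under this rule, the 10 genes with reversed directions would fall into the 'discordant' class, leaving the 91 confidently classified genes.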

Architecture of ASFV Gene Promoters and Consensus Elements
The genome-wide TSS map combined with information about their differential temporal expression allowed us to analyse the sequence context of TSSs and thereby characterise the consensus motifs and promoter architecture of early and late ASFV genes. For the sequence analyses we compared our clearly defined 36 early genes and 55 late genes (quartiles in Figure 4c). Eukaryotic RNA pol II core promoters are characterised by a plethora of motifs, including TATA- and BRE boxes, and the Initiator (Inr). The former two interact with the TBP and TFIIB initiation factors, respectively, while the latter interacts with RNA pol II. Multiple sequence alignments of the regions proximal to the TSS revealed several interesting ASFV promoter signatures that are related to RNA pol II motifs 35 . The Initiator (Inr) element overlapping the TSS is a feature that distinguishes between early and late gene promoters.

ASFV mRNAs have 5' leader regions
Early and late genes in ASFV differ with regard to the length of 5' untranslated 'leader' regions (UTR).
5' UTRs of the late genes are significantly shorter and have a higher AT-content compared to early genes (p-value < 0.05, Figure 5c). Surprisingly, a subset of late gene CAGE-seq reads extended upstream of the assigned TSSs and were not complementary to the DNA template strand sequence. In order to rule out any mapping artefacts, we trimmed the CAGE-seq reads by removing the upstream 25 nt, and aligned them to the genome at the 5' boundary of the reads. This did not significantly impair the mapping statistics but highlighted that nearly half of the annotated TSS (74/158) among both early and late genes are associated with mRNAs that have short 5' extensions, including 7 genes with multiple TSSs (Supplementary Table 7). The majority of the 5' leaders are 2 or 4 nt long (Figure 5h and Supplementary Figure 5e) and not correlated with expression phase (Supplementary Figure 5f). The most common sequence motifs are AT (33% and 71% of early and late genes, respectively) and ATAT (7% in late genes, Figure 5i). In order to investigate any potential sequence-dependency of the mRNAs associated with AT- and ATAT-5' leaders, we scrutinised the template DNA sequence downstream of the TSS and found that all TUs contained the motif ATA at positions +1 to +3 (Figure 5i-j). This suggests that AT-leaders are generated by RNA polymerase slippage on the first two nucleotides of the initial A(+1)TANNN template sequence, generating AUA(+1)UANNN or AUAUA(+1)UANNN mRNAs. A different but related slippage has been observed in the VACV transcription system, where all post-replicative mRNAs contain short polyA leaders which are associated with the consensus Inr TAAAT motif 21 .
The structural determinants underlying RNAP slippage are interactions between the template DNA sequence and the RNAP and/or transcription initiation factors; the differential use of distinct initiation factors for the transcription of early and late ASFV genes thus likely accounts for the difference in leader sequences.
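The slippage model proposed above can be written down as a simple string operation: when the transcribed sequence begins with ATA at +1 to +3, each slippage event on the first two nucleotides prepends one AU dinucleotide to the mRNA. The following is a minimal sketch of that model only. The function name and the `max_slips` parameter are hypothetical, and the input is assumed to be given in the mRNA-sense orientation starting at +1.

```python
def slipped_transcripts(sense_dna_from_tss, max_slips=2):
    """Enumerate mRNA 5' ends predicted by a dinucleotide slippage model.

    sense_dna_from_tss: DNA in mRNA-sense orientation, first base = TSS (+1).
    max_slips: maximum number of slippage events to model (0 = templated only).
    Returns a list of RNA strings, from no slippage up to max_slips events.
    """
    dna = sense_dna_from_tss.upper()
    rna = dna.replace("T", "U")
    # The text reports ATA at +1 to +3 for all TUs carrying AT/ATAT leaders;
    # without that motif the model predicts only the templated transcript.
    if not dna.startswith("ATA"):
        return [rna]
    return ["AU" * k + rna for k in range(max_slips + 1)]


# Illustrative (made-up) initial transcribed sequence A(+1)TAGCC
variants = slipped_transcripts("ATAGCC")
```

One slippage event yields the 2 nt AU leader and two events the 4 nt AUAU leader, matching the leader lengths that dominate the data.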

Transcription termination of ASFV-RNAP
Mapping of the 3' mRNA ends of several ASFV genes revealed a conserved sequence motif consisting of ≥7 thymidylate residues in the template, which is consistent with native 3' end formation generated by transcription termination 30 . In order to investigate the sequence context of transcription termination, mRNA 3' end sequencing was used to obtain the sequences immediately preceding the poly(A) tails, generating a genome-wide map of mRNA 3' end peaks (Figure 2). Using a similar approach as for pTSS mapping, CAGEfightR detected a total of 657 termination site clusters, with 212 TTSs located within 1000 bp downstream of 1-3 ORFs. Because multiple ORFs had more than one cluster within that region (Supplementary Table 8), we defined 114 primary transcription termination sites (pTTS) as the TTS with the highest CAGEfightR-score in closest proximity to a stop codon; we classified the 98 remaining peaks as non-primary TTSs (npTTS). By carrying out sequence analyses similar to the TSS mapping, we identified a highly conserved poly-T signal within 10 bp upstream of 126 TTSs (83 pTTSs, 43 npTTSs) that was characterised by ≥4 consecutive T residues (Figure 6a), with the ultimate residue located <3 bp from the last T residue in the motif (Figure 6b). The remaining 86 TTSs were not associated with any recognisable sequence motif besides a single T residue 1 bp upstream of the TTS. Our results are in good agreement with a previous S1 nuclease mapping of 6 ORFs (Supplementary Table 2), but less so with 17 proposed TTSs which were predicted based on transcript length estimates relative to upstream transcription start sites (Supplementary Table 2). This may be because only runs of ≥7 consecutive Ts in the template were considered to serve as terminators. Our results demonstrate that the total number of consecutive Ts of the polyT motif can vary, with polyTs of early genes being longer than those of late genes (Figure 6c).
Finally, we observed differences between early and late gene termination, inasmuch as poly-T terminators were overrepresented in early and underrepresented in late genes (Figure 6d), and the 3' UTRs of late genes were significantly longer compared to 3' UTRs of early genes (Figure 6e), in good agreement with previous studies on a small number of mRNAs.
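The poly-T criterion used in this section (a run of ≥4 consecutive T residues within 10 bp upstream of a TTS) can be illustrated with a short scan. This is a sketch under stated assumptions: the sequence is supplied in the mRNA-sense orientation, `tts_index` is the 0-based position of the mapped 3' end, and the 10 nt window is taken to end at and include that position. The function name is hypothetical.

```python
import re


def longest_polyT_upstream(sense_seq, tts_index, window=10, min_run=4):
    """Find the longest T-run in the window ending at the TTS.

    Returns (start_index, run_length) in sequence coordinates, or None if
    no run of at least `min_run` consecutive T residues is present.
    """
    start = max(0, tts_index - window + 1)
    region = sense_seq[start:tts_index + 1].upper()
    runs = [
        (start + m.start(), len(m.group()))
        for m in re.finditer("T{%d,}" % min_run, region)
    ]
    if not runs:
        return None
    # Longest run wins; ties resolve to the most upstream run
    return max(runs, key=lambda run: run[1])


# Made-up example: seven Ts ending exactly at the mapped 3' end (index 12)
example = longest_polyT_upstream("GCGCGATTTTTTTAGC", tts_index=12)
```

Returning the run length as well as its position makes it straightforward to compare poly-T lengths between early and late genes.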

Discussion [1,014 words]
Here we report the first comprehensive ASFV transcriptome study at single-nucleotide resolution. The unequivocal mapping of 158 TSS and 114 TTS for the 160 viral genes allowed us to reannotate the ASFV BA71V genome. Our results provide detailed information about the differential gene expression during early and late infection, the sequence motifs for early and late gene promoters (EPM and LPM, and Inr elements) and terminators (poly-T motif), and evidence for quasi-templated 'AU' RNA-5' tailing by the ASFV-RNAP.
Eukaryotic RNA pol II and the archaeal RNAP critically rely on the two initiation factors TBP and TFIIB for transcription initiation on all mRNA genes. In contrast, the bacterial RNAP obtains specificity for subsets of gene promoters by associating with distinct sigma (σ, for selectivity) factors. The ASFV RNAP is related to the eukaryotic RNA pol II; however, transcription initiation of early and late genes is directed by two distinct sets of general initiation factors and their cognate DNA recognition motifs.
Our TSS mapping demonstrates that early and late gene promoters are associated with a combination of conserved and distinct sequence signatures. The first feature of all ASFV promoters is the Initiator The distance distribution of the EPM is narrow (located 9-10 bp upstream of the TSS) while the distance between the LPM and TSS shows greater variation and is located closer (4-6 bp) to the TSS.
We cannot strictly rule out a limited overlap between early and late genes, or that some genes have hybrid promoters that would enable the expression of genes during both early and late stages, both of which might impede the concise analysis of specific promoter consensus motifs. To unequivocally attribute factors to their cognate binding motifs, a chromatin immunoprecipitation approach is required. Considering the close relationship between ASFV and VACV, we posit that the ASFV EPM is recognised by heterodimeric ASFV-initiation factor (VACV D6/A7) 10 .
The mechanisms underlying transcription termination of multisubunit RNAP are diverse 50,51 . Our analyses of genome-wide ASFV RNA-3' ends allowed the mapping of the ASFV 'terminome'. Over half of ASFV gene mRNA 3' ends are characterised by a stretch of seven U residues, with the TTS mostly coinciding with the last T residue in the template DNA motif, which is in good agreement with a few ASFV terminators that have been characterised individually 30,52 . In contrast, VACV appears to utilise a motif ~ 40 nt upstream of the mRNA 3' end 53,54 . In essence, the ASFV-RNAP is akin to archaeal RNAPs and RNA pol III, where a poly-U stretch is the sole cis-acting motif without any RNA secondary structures characteristic of bacterial intrinsic terminators 51 . The mapped mRNA 3' ends of genes without any association with poly-U motifs are still likely to represent bona fide nascent termination sites, since RNA-seq reads decreased towards these termination sites. ASFV encodes several (VACV-related) RNA helicases that have been speculated to facilitate transcription termination and/or mRNA release 9,55 . Future functional studies will address the molecular mechanisms of termination including the role of putative termination factors.
Understanding the molecular mechanisms of the ASFV transcription system is not only of academic interest. More than 6 million pigs have died in Asia since 2018 and the Chinese pig population, which comprised half of the world's pig population, has decreased by 40% (FAO OIE WAHIS). Unless effective vaccines in conjunction with antiviral treatments against ASFV are developed, the larger part of the global pig population may succumb to this terrible disease or have to be culled to prevent its propagation (OIE, https://www.oie.int). Structure-based drug design is crucially dependent on our knowledge about the fundamental biology of the ASFV transcription machinery and the temporal gene expression pattern, and our results directly contribute to these burning issues for animal husbandry.

RNA Sample Extraction from Vero Cells infected with BA71V
Vero cells (Sigma-Aldrich, cat #84113001) were grown in 6-well plates and were infected in 2 replicate wells for 5h or 16h with the ASFV BA71V strain at a multiplicity of infection of 5; after the growth medium was removed, cells were collected separately in Trizol Lysis Reagent (Thermo Fisher Scientific). Infected cells were collected at 5h post-infection (samples for RNA-seq: S3-5h and S4-5h, CAGE-seq: S1-5h and  Table 1) and 12 FASTQ files.
Library preparation and CAGE-sequencing of RNA samples S1-5h, S2-5h, S3-16h and S4-16h was carried out by CAGE-seq (Kabushiki Kaisha DNAFORM, Japan). Library preparation produced single-end indexed cDNA libraries for sequencing: in brief, this included reverse transcription with random primers, oxidation and biotinylation of the 5' mRNA cap, followed by RNase ONE treatment removing RNA not protected in a cDNA-RNA hybrid. Two rounds of cap-trapping using Streptavidin beads followed, washing away uncapped RNA-cDNA hybrids. Next, RNase ONE and RNase H treatment degraded any remaining RNA, and cDNA strands were subsequently released from the Streptavidin beads and quality-assessed via Bioanalyzer. A single-strand index linker and a 3' linker were ligated to the released cDNA strands, and a primer containing the Illumina Sequencer Priming site was used for second strand synthesis. Samples were sequenced using the Illumina HiSeq platform producing 76 bp reads (Supplementary Table 1).
3' RNA-seq was carried out with samples E-5h_1, E-5h_2, L-16h_1 and L-16h_2 using the Lexogen  Table 1). Cluster classification was not successful in all cases, therefore, manual adjustment was necessary.

Sequencing Quality Checks and Mapping to ASFV and Vero Genomes
Integrative Genomics Viewer (IGV) 65 was used to visualise BW files relative to the BA71V ORFs, and incorrectly classified clusters were corrected. Clusters with the 'tssUpstream' classification were split into subsets for each ORF. The 'primary' cluster subset contained either the highest scoring CAGEfightR cluster or the highest scoring manually-annotated peak, and the highest peak coordinate was defined as the primary TSS (pTSS) for an ORF. Further clusters associated with these ORFs were classified as 'non-primary', with the highest peak defined as a non-primary TSS (npTSS).
If the strongest CTSS location was intra-ORF and corroborated by RNA-seq coverage, then the ORF was re-defined as starting from the next ATG downstream. For the 28 intergenic CTSSs, IGV was used to visualise whether CAGE BW peaks were followed by RNA-seq coverage downstream, and whether the transcribed region encoded a putative ORF using NCBI Open Reading Frame Finder 66 .
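The re-annotation rule described above (an intra-ORF TSS moves the start of the ORF to the next ATG downstream) can be sketched as a short scan. The function name is hypothetical, and the `in_frame` option is an assumption added for illustration: the text specifies only "the next ATG downstream", whereas an N-terminally truncated version of the same protein additionally requires the ATG to be in the annotated reading frame.

```python
def next_start_codon(orf_seq, tss_offset, in_frame=True):
    """Locate the first ATG at or downstream of an intra-ORF TSS.

    orf_seq: annotated ORF in sense orientation, starting at its ATG.
    tss_offset: 0-based TSS position relative to the A of the annotated ATG.
    in_frame: if True, only accept ATGs in the annotated reading frame.
    Returns the 0-based offset of the new start codon, or None if absent.
    """
    seq = orf_seq.upper()
    for i in range(max(tss_offset, 0), len(seq) - 2):
        if seq[i:i + 3] == "ATG" and (not in_frame or i % 3 == 0):
            return i
    return None


# Made-up ORF with an out-of-frame ATG at offset 4 and an in-frame ATG at 9
toy_orf = "ATGAATGCCATGTAA"
new_start = next_start_codon(toy_orf, tss_offset=1)
```

The offset returned, divided by three, gives the number of N-terminal residues lost relative to the annotated protein.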

TTS-Mapping
TTSs were mapped in a similar manner to TSSs, and CAGEfightR was utilised as above to locate clusters of 3' RNA-seq peaks, although the procedure differed in some respects: input BigWig files contained the 3' read-end coverage extracted from BAM files using BEDtools genomecov. Clusters were detected for the 3' RNA-seq peaks in the same manner as before, except that clusters < 25 nt apart were merged, which detected a total of 657 clusters. BEDtools was used to check whether the highest point of each cluster (TTS) was within 500 bp or 1000 bp downstream of annotated ORFs and pNGs. TTSs were then filtered out if the 10 nt downstream of the 3' end contained ≥ 50% As, to exclude clusters potentially originating from mispriming.
TTS clusters for pNG3 and pNG4 were initially filtered out but were included in the final 212 TTSs due to their strong RNA-seq agreement. In cases of multiple TTS clusters per gene, we defined the highest CAGEfightR-scored one within 1000 bp downstream of ORFs as primary (pTTS) unless no clear RNA-seq coverage was shown; for O61R, the pTTS was manually annotated from the literature 52 .
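The mispriming filter described in this section (discard a TTS when the 10 nt immediately downstream of the 3' end contain ≥ 50% A residues) can be sketched as follows. The function name is hypothetical, the sequence is assumed to be in the mRNA-sense orientation with a 0-based `tts_index`, and the handling of windows truncated by the genome end (the fraction is computed over the nucleotides available) is an assumption.

```python
def is_probable_mispriming(sense_seq, tts_index, window=10, min_a_fraction=0.5):
    """Flag a 3' end whose downstream genomic sequence is A-rich.

    An A-rich stretch just downstream of a mapped 3' end suggests that the
    read may derive from oligo(dT) priming on an internal A-rich region
    rather than on a genuine poly(A) tail.
    """
    downstream = sense_seq[tts_index + 1:tts_index + 1 + window].upper()
    if not downstream:
        return False  # nothing downstream to evaluate (end of sequence)
    return downstream.count("A") / len(downstream) >= min_a_fraction


# Made-up sequences: 3' end at index 9, followed by 6/10 and 2/10 As
flagged = is_probable_mispriming("GGGTTTTTTTAAAAAACGCGTT", tts_index=9)
kept = is_probable_mispriming("GGGTTTTTTTGCGCGCATAT", tts_index=9)
```

As the exceptions made for pNG3 and pNG4 show, such a filter is a heuristic, and flagged sites with strong RNA-seq support may still be genuine.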

DESeq2 Differential Expression Analysis of ASFV Genes
A new GFF was produced for investigating differential expression of ASFV genes across the genome, with changes from the original U18466.2.gff: for all 151 ASFV ORFs which had identified pTSSs, we defined their transcription unit (TU) as extending from the pTSS coordinate to the ORF end. Since no pTSS was identified for ORFs E66L and C62L, these entries were left as ORFs within the GFF, while the 7 putative pNGs were defined as extending from their pTSS down to the genome coordinate at which the RNA-seq coverage ends. In 8 cases where genes had alternative pTSSs for the different time-points, the TUs were defined as extending from the most upstream pTSS down to the ORF end. For analysing differential expression with the CAGE-seq dataset, a GFF was created with BEDtools extending from the pTSS coordinate, 25 bp upstream and 75 bp downstream; in cases of alternative pTSSs, this TU was defined as 25 bp upstream of the most upstream pTSS and 75 bp downstream of the most downstream pTSS. HTSeq-count 67 was used to count reads mapping to the genomic regions described above for both the RNA- and CAGE-seq sample datasets. The raw read counts were then used to analyse differential expression across these regions between the time-points using DESeq2 (default normalisation described by Love et al. 68 ), and those regions showing changes with an adjusted p-value (padj) of <0.05 were considered significant. Further analysis of ASFV genes used their characterised or predicted functions as found in the VOCS tool database (https://4virology.net/) 33,69 or ASFVdb 34 entries for the ASFV-BA71V genome.
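The CAGE-seq counting windows described above depend on strand, because "upstream" and "downstream" are defined relative to the direction of transcription. The sketch below computes such a window from one or more pTSS coordinates. The function name, the 1-based inclusive coordinate convention and the clipping to the genome ends are assumptions made for illustration; the 25 bp and 75 bp values and the 170,101 bp genome length are taken from the text.

```python
def cage_window(ptss_positions, strand, genome_length=170101,
                upstream=25, downstream=75):
    """Return (start, end), 1-based inclusive, of a CAGE counting window.

    ptss_positions: one or more pTSS coordinates for the same gene.
    strand: '+' or '-'. On the minus strand, upstream means higher coordinates.
    With several pTSSs the window spans from `upstream` bp before the most
    upstream pTSS to `downstream` bp after the most downstream pTSS.
    """
    lo, hi = min(ptss_positions), max(ptss_positions)
    if strand == "+":
        start, end = lo - upstream, hi + downstream
    elif strand == "-":
        start, end = lo - downstream, hi + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip to the genome boundaries
    return max(1, start), min(genome_length, end)


# Hypothetical pTSS coordinates (not taken from the GFF)
single_plus = cage_window([1000], "+")
single_minus = cage_window([1000], "-")
alternative = cage_window([1000, 1200], "+")
```

The resulting intervals could then be written out as GFF features for read counting.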

Early and Late Promoter Analysis
DESeq2 results were used to categorise ASFV genes into two simple sub-classes: early (genes downregulated from early to late infection) and late (genes upregulated from early to late infection). For those with newly annotated pTSSs (151 including 7 pNGs but excluding 15 alternative pTSSs), sequences 30 bp upstream and 5 bp downstream were extracted from the ASFV-BA71V genome in FASTA format using BEDtools. The 36 Early, 55 Late and all 166 pTSSs (including alternative ones) at once were analysed using MEME software (http://meme-suite.org) 70 , searching for 5 motifs with a width of 10-25 nt, with other settings at default. Significant motifs (E-value < 0.05) detected via MEME were submitted to a subsequent FIMO 40 search (p-value cut-off < 0.0001) of 60 nt upstream of the total 166 pTSS sequences (including pNGs and alternative pTSSs), and a Tomtom 41 search (UP00029_1, Database: uniprobe_mouse) to find similar known motifs.

Fig. 1. Annotated genome of ASFV-BA71V indicating transcription start sites (TSS) and early and late genes.
The map includes 153 previously annotated as well as novel genes identified in this study and their differential expression pattern. Downregulated (blue) and upregulated (red) genes were differentially expressed according to both RNA-seq and CAGE-seq data, while light blue and light red indicate genes that showed the same type of differential expression with both techniques but were statistically significant (adjusted p-value < 0.05) according to only one of CAGE-seq or RNA-seq. The map was visualised with the R package gggenes.