Ancient herpes simplex 1 genomes reveal recent viral structure in Eurasia

Human herpes simplex virus 1 (HSV-1), a life-long infection spread by oral contact, infects a majority of adults globally. Phylogeographic clustering of sampled diversity into European, pan-Eurasian, and African groups has suggested the virus codiverged with human migrations out of Africa, although a much younger origin has also been proposed. We present three full ancient European HSV-1 genomes and one partial genome, dating from the 3rd to 17th century CE, sequenced to up to 9.5× with paired human genomes up to 10.16×. Considering a dataset of modern and ancient genomes, we apply phylogenetic methods to estimate the age of sampled modern Eurasian HSV-1 diversity to 4.68 (3.87 to 5.65) ka. Extrapolation of estimated rates to a global dataset points to the age of extant sampled HSV-1 as 5.29 (4.60 to 6.12) ka, suggesting HSV-1 lineage replacement coinciding with the late Neolithic period and following Bronze Age migrations.

evidence. The radiocarbon determination supports but does not categorically confirm this phasing (Data S9).
Information on the radiocarbon determination for EDI111 is given in Data S9. EDI111 has been radiocarbon dated to 545-590 calCE (1σ interval) or 437-636 calCE (12σ interval). As a lower left 2nd molar was sampled, the radiocarbon determination does not date the death of the individual but when the tooth formed by age 11 -13. As this individual was osteologically assessed as 35-45 years old when they died, this suggests that the individual died 20 -30 years after the intervals quoted. Given the date and location of the Edix Hill cemetery it is unlikely that the individuals buried there consumed marine fish, and the δ13C value of -20.4 is compatible with this. Low level consumption of freshwater fish by those buried at Edix Hill is possible, but can not be accurately assessed. Consumption of freshwater fish would have the effect of making the radiocarbon determination older than the death of the individual. The impact of the potential freshwater reservoir effect has not been estimated and is likely to be slight.

Brodovsky, Nevolino, Russia
The Brodovsky burial site is located in the village of Borody in Perm Krai of Russia. On the old riverbank of Shakva several barrows and ground burials have been found (83). Archaeological excavations have been carried out there repeatedly since 1898 (83,84). By the burial customs and grave goods, the Brodovsky burial site is related to the Nevolino culture, which existed in Prikamaye from 4 th to 9 th centuries CE (84,85). Unfortunately, the analysed tooth comes from a burial, which can no longer be more precisely localised.

Raadhuisstraat 185-193, Alphen aan den Rijn, South-Holland, NL
The individuals buried on the right bank of the river Rhine are most likely victims of a French attack on the villages Bodegraven and Zwammerdam in December 1672(86) (pages [65][66]. The inhabitants of the villages were massacred and some of the bodies ended up in the Rhine which took them downstream. Some bodies were washed ashore on the excavated area (86), which was still unoccupied at that time(86) (pages 13-19 and 94-114). A total of four complete individuals were discovered. The bodies were all men from European descent. Two bodies (S36 and S37) were buried disrespectfully together in one grave and face down, which suggests they were French soldiers (i.e. the enemy).
Individual S16 was a male of 26-35 years old(86) (pages 57-65). In his youth he probably suffered from an insufficiency of vitamin D, which resulted in a slight bent in the lower legs and a deviation in the ribs. The man was a fervent smoker of clay pipes. Traces of the habit are visible in multiple places on the teeth, where the hard clay pipe, usually put in the same place in the mouth, has worn the teeth (see fig. S1).

Variant annotation
We annotated our vcf files using snpEff(64) with a custom database for the HSV-1 reference strain based on the gff3 for NC_001806.2. For the annotation, we also included indels in our vcfs, which were inspected by eye to avoid misalignments and misidentifications. Using bedtools (63) and our GenMap data we further computed the number of SNPs per gene and the coverage in gene intervals (>MQ30). Variants are listed in Data S5.
We investigated moderate and high impact SNPs within glycoprotein intervals associated with an immune response based on the following criteria: effect grade, coverage and depth of coverage of the interval of interest, MQ>=30 at the position and DP>=3 at the position.
Selected SNPs: JDS005 UL27: gB -Two SNPs predicted to be moderate but they do not fall into Uniprot-predicted features, such as the virion surface. Unlikely to be significant.
UL22: gH -Four non-synonymous SNPs (p.Ala150Thr; p.Ser138Ala; p.Thr127Ile; p.Pro110Ser) are all in the Uniprot-predicted virion surface portion of the glycoprotein, increasing the chances that these variants may be recognised as epitopes by the immune system (87). Glycoprotein gH also has a role in cell infection (fusion of the virus for entry) (88).
US4: gG -This is a short protein, so it is hard to predict the effect of a non-synonymous SNP at aa3.
US6: gD -This SNP is not in the transmembrane domain of gD, but lies just outside the N terminus (amino acids 1-29), which are recognised; speculated to be neutral (89).

EDI111
UL27: gB -One moderate effect SNP, but AA873 is an intravirion (predicted) domain, so less likely to be an epitope.

BRO001
US3: -One high effect (stop_loss) SNP in this region, which is a significant virulence factor for HSV-1. This region encodes serine/threonine kinase.
UL27: gB -One moderate effect SNP. As in EDI111, it lies within an intravirion (predicted) domain, so less likely to be an epitope.
UL22: gH -A number of predicted moderate SNPs are in the Uniprot-predicted virion surface portion of the glycoprotein, increasing the chances that these variants may be recognised as epitopes by the immune system.
US4: gG -This is a short protein, so it is hard to predict the effect of a non-synonymous SNP at aa3 (same as the JDS genome, thus it may be a common polymorphism).
US6: gD -Just outside the predicted transmembrane domain of gD; speculated to be neutral.

Host Susceptibility analyses
The ancient human vcf files were annotated with the latest ClinVar vcf downloaded from the ftp site on 27/01/2020. The resulting annotated snps were filtered for clinical significance "pathogenic", "likely pathogenic" and for "herpes" information tags (Data S3). For JDS005 (~11X) variants with DP ≤ 3 were discarded; however, since EDI111 had a much lower coverage of ~6X, all variants with DP ≥ 2 were kept and evaluated by hand. Variants called homozygous were accepted. Due to aDNA damage present and average genomic coverage below 15X, variants called heterozygous were assessed as either likely heterozygous (ratio of ref:alt at ~1:1) or likely erroneous (ratio of ref:alt >2:1 and a C > T or G > A transition). Interestingly, though not related to HSV-1, JDS005 carries an autosomal dominant variant for nocturnal frontal lobe epilepsy (rs12721510, Data S3), which may explain why he was in the Hospital of St John the Evangelist at a young age.
In addition, specific genomic regions of interest for susceptibility to Herpes viruses were identified by searching ClinVar for "Herpes" and including all genes in which pathogenic SNPS were listed. This included 14 genes. A bed file of gene positions was used to extract the haplotypes from a merged VCF file using bcftools (90).
Herpesviridae have been found to be more abundant in cases of chronic and aggressive periodontitis (28,91) though the connection with HSV-1 remains inconclusive (30,92). Sequencing reads from EDI111 point to the presence of bacteria commonly associated with minor to moderate periodontal disease along with a number of other microorganisms that may indicate poor dental health.

Isotope Analysis
Analysis of carbon and nitrogen isotopes in archaeological human dentine and bone collagen can be used to provide information about diet in the past (93,94). This is based on the principle that carbon and nitrogen isotope values in skeletal tissues largely reflect the isotopic composition of the diet consumed during life (95,96) and these signatures are preserved in archaeological skeletal remains. Broadly, carbon isotope values can provide information about the types of plants consumed (C 3 vs C 4 ) (97,98), nitrogen isotope values can provide information about animal protein consumption and subsequent trophic level (93,99,100) and analysis of a combination of carbon and nitrogen isotope values can provide information about marine fish consumption (99,101,102). In addition, analysing multiple skeletal tissues from an individual can provide information about diet across an individual's lifespan (103). As primary dentine does not remodel(104), analysis of carbon and nitrogen isotopes in dentine is representative of diet at the time of formation in childhood, whereas bone constantly remodels throughout life, at varying rates (105) and therefore isotope values in bone collagen represent diet in the years before death (93).

Generation of isotopic data
Dentine collagen from the root (enamel-dentine junction to root apex) of an upper left 2 nd molar (representing diet from approx.8.5-12.5yr(106) from JDS005 was analysed, and combined with collagen isotope data generated by Price (2013)(107) from a rib midshaft (representing diet in the years before death (105)). EDI111 and BRO001 were not analysed as they were outside the scope of the After the Plague project. Preparation of collagen samples was carried out in the Dorothy Garrod Laboratory, McDonald Institute, University of Cambridge, following the laboratory standard operating procedures based on a modified version of the Longin method (Longin 1971, Collins and Galley 1998, Richards and Hedges 1999). An approx. 300mg sample of root dentine and a 500-1000mg sample of rib bone were cut using a hand-held Dremel drill with a diamond-tipped cutting wheel. The sample surfaces were then abraded using a sandblaster to remove surface contaminants. To extract the collagen, samples were first demineralized in approximately 8ml of cold 0.5M aq. hydrochloric acid (HCl), then rinsed with distilled water and heated in approximately 8ml of pH 3.0 H2O for 48hrs at 75°C until gelatinized. The samples were then filtered using Ezee filters (60-90μm), frozen (-20°C, then -80°C) and placed into a freeze-drier until fully lyophilized. Analysis of the collagen samples was carried out in the Godwin Laboratory, Department of Earth Sciences, Cambridge. All samples were analysed in triplicate. For each one of the triplicates, 0.8mg (±0.1mg) of lyophilised collagen was weighed into a tin capsule. For each batch of samples submitted for analysis, a suite of in-house standards were also weighed into tin capsules and analysed (caffeine, alanine, nylon, protein, EMC). Analysis was carried out using a Costech automated elemental analyser coupled with a Thermo Finnigan MAT253 isotope ratio mass spectrometer in continuous flow mode. All samples are reported on the international scale relative to VPDB for carbon and AIR for nitrogen (Hoefs 1997). Based on replicate analyses of standards, analytical error was <±0.2‰. All results were quality checked to see if they were within the acceptable atomic C:N ratio range of 2.9-3.6 (DeNiro 1985) and above the minimum acceptable %C and %N threshold ( >4-5% for C and >13% for N (Ambrose 1990)).

Results
Both tissues from JDS005 provided successful δ 13 C and δ 15 N values: dentine collagen δ 13 C = -19.5‰ and δ 15 N = 9.9‰, rib collagen δ 13 C = -19.5‰ and δ 15 N = 9.8‰. The C:N ratios were within the acceptable range of 2.9-3.6(108) and the %C and %N in the samples were above the minimum thresholds (taken as >13% for C and >4-5% for N (109). The isotope values for JDS005 are below the mean for the general adult population from the Hospital of St John (mean dentine collagen δ 13 C = -19.2‰, δ 15 N = 12.1‰, n=55; mean rib collagen δ 13 C = -19.0‰, δ 15 N = 12.5‰, n=114) (110). More specifically, they are below the mean for adult males in the Hospital (mean dentine collagen δ 13 C = -19.1‰, δ 15 N = 12.1‰, n=31; mean rib collagen δ 13 C = -18.9‰, δ 15 N = 12.6‰, n=64) and general Parish adult males from the nearby High and Late Medieval site of All Saints by the Castle (mean dentine collagen δ 13 C = -19.5‰, δ 15 N = 11.9‰, n=14; mean rib collagen δ 13 C = -19.3‰, δ 15 N = 12.5‰, n=25) (ibid.). Indeed, JDS005 has some of the lowest δ 15 N values observed out of all the High and Late Medieval adults sampled from the Cambridgeshire population (ibid.). JDS005 is also unusual in the lack of meaningful change in the isotope values between their dentine and rib collagen (difference in values is below the analytical error of ±0.2‰). This is different to the rest of the adult population from the Hospital of St John, who generally exhibited a change in values between the tissues, particularly in δ 15 N (mean Δ 13 C rib coll-dentine coll = 0.1‰, mean Δ 15 N rib coll-dentine coll = 0.5‰, n=53). JDS005 is well below the mean Δ 15 N rib coll-dentine coll values for adult males in the Hospital (0.7‰, n=30). Overall, the isotope values for JDS005 are indicative of a C 3 plant based diet, with little regular terrestrial animal protein input and little to no marine protein input. The lack of change in isotope values between the dentine and rib collagen indicates no detectable change in diet between childhood and adulthood. In terms of δ 15 N values, JDS005 appears to be quite unusual compared to the rest of the population buried in the Hospital of St John, as well as the Cambridge town population as a whole. Their much lower values and lack of change could indicate they were consistently consuming less animal proteins than much of the rest of the population of the town. When taken in context with their young age at death, pathologies present, short stature and eventual burial in the Hospital, this may be taken as an indication of someone who could not afford regular animal or marine proteins in their diet, someone who was too sickly to consume them, or even someone who chose not to consume them.

Supplementary Text A brief history of kissing Christiana Scheib
While there are Neolithic figurines that have been interpreted as a couple embracing (111), the earliest known written record of kissing is a Bronze Age manuscript from South Asia (112) (Fig. 4B). The custom may have made its way back to the Mediterranean with the return of Alexander the Great's troops around 300 BCE. Three hundred years later, the Emperor Tiberius is said to have tried to ban kissing at official functions in an effort to stop the spread of disease (unclear whether the disease was herpes) (113).
Such a transition in transmission routes would allow for transmission within larger host communities as well as at the intersection of populations when they meet. Sexual-romantic kissing is far from universal in human cultures (and is also observed in Chimpanzees and Bonobos), but is higher in prevalence in modern European, Middle Eastern and Asian cultures (114).    Following identification of a significant temporal signal at the node falling basal to the non-African HSV-1, the same plot is demonstrated only for descendants from this node (as highlighted in panel a. In both cases plot headings provide the rate, MRCA, regression coefficient and p-value following 10,000 permutations of the sampling date computed using the BactDating roottotip() function. (C), Models with blue circles are inside 95% HPD, red outside, and without circles have at most 0.40% support. Model 123421 (transversion model -TVM) had 86.79% posterior support and 86.79% cumulative support. (D), Posterior distributions estimated under a strict clock and three possible specifications of demographic models (red -coalescent constant, green -coalescent exponential, blue -coalescent skyline). Estimated tree heights are given in the left-hand panels, estimated clock rates are provided in the right-hand panels. (E), Posterior distributions estimated under a relaxed clock and three possible specifications of demographic models (red -coalescent constant, green -coalescent exponential, blue -coalescent skyline). Estimated tree heights are given in the left-hand panels, estimated clock rates are provided in the right-hand panels.      (Norberg et al. 2006) 137088