Ancient DNA from a lost Negev Highlands desert grape reveals a Late Antiquity wine lineage

Significance

The modern winemaking industry is heavily reliant on a limited number of European grape cultivars, which are best suited for cultivation in temperate climates. Global warming emphasizes the need for diversity in this high-impact agricultural crop. Grapevine lineages bred in hot and arid regions, often preserved over centuries, may present an alternative to the classic winemaking grape cultivars. Our study of a legacy grapevine variety from the Negev Highlands desert of southern Israel sheds light on its genetics, biological properties, and lasting impact. The modern-day close relatives of the archaeological grapes may now provide an exceptional platform for future studies on grapevine resilience to aridity.

evidenced in the Early Islamic period, when trash began to be deposited inside abandoned houses. This phenomenon was documented as late as the 8th or 9th c. CE.

In 2016, an excavation was carried out in front of the cave, in a room with well-preserved walls, making this one of the best-preserved external structures outside any of the caves at the site. The room is 2.60 × 3.40 m in size, and the southern and western walls were preserved to heights of over 2.8 m and 3.1 m, respectively. The interior of the room was covered with a layer of collapsed stone ceiling slabs that sealed the lower levels. The slabs, together with the location of the structure on the southern slope, where it was less exposed to winter rains and run-off, ensured unusually well-preserved layers of rich organic material below the slabs.

A round, carved stone animal trough containing an outlet was revealed in the northwest corner of the room, while a second carved stone installation, apparently used to hold a torpedo-shaped Gaza wine jar, was uncovered at a slightly higher level next to a

Based on the radiocarbon dating of wood and organic finds and the typology of the complete amphorae, the shipwreck was dated to the early Islamic period (end of the 7th to beginning of the 8th century CE; (11, 12)). The most significant finds are the large quantities of well-preserved botanical remains, including olive pits, walnuts, peach stones, carob pods, pine cones and grape pips.

We compared the spectra of the grape pips to those of lignin, cellulose and charcoal. We chose samples whose spectra resemble those of lignin and cellulose, showing that they contain uncharred organic residue. Most of the grape pips from Shivta and Nessana were charred, while the pips from Avdat were well preserved.

Table S4.

The mapped reads were filtered based on mapping quality, with a minimal Qmap of 8.
Additionally, mapped reads with more than eight mismatches were filtered out. We allowed a relatively high number of mismatches so that sequences with ancient-DNA-type damage could still be used for genotyping, while the reduced confidence in their sequencing accuracy is reflected through a change in their sequencing quality field.
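
The two read-level filters above can be sketched as a single predicate. This is a minimal illustration, not the authors' pipeline: the `MappedRead` record and function names are hypothetical, and "minimal Qmap of 8" is assumed to be inclusive (Qmap ≥ 8). In a real pipeline the two values would come from the MAPQ column and the NM tag of each SAM/BAM record.

```python
from dataclasses import dataclass


# Hypothetical minimal read record (assumption); real values would be taken
# from the MAPQ field and the NM tag of a SAM/BAM alignment.
@dataclass
class MappedRead:
    name: str
    mapq: int        # mapping quality (Qmap)
    mismatches: int  # number of mismatches against the reference


MIN_QMAP = 8        # minimal Qmap stated in the text (assumed inclusive)
MAX_MISMATCHES = 8  # reads with more than eight mismatches are dropped


def passes_mapping_filters(read: MappedRead) -> bool:
    """Keep a read only if Qmap >= 8 and it has at most eight mismatches."""
    return read.mapq >= MIN_QMAP and read.mismatches <= MAX_MISMATCHES


def filter_reads(reads):
    """Return the reads that pass both mapping filters, in input order."""
    return [r for r in reads if passes_mapping_filters(r)]


if __name__ == "__main__":
    toy_reads = [
        MappedRead("r1", mapq=30, mismatches=2),  # kept
        MappedRead("r2", mapq=5, mismatches=0),   # Qmap too low, dropped
        MappedRead("r3", mapq=20, mismatches=9),  # too many mismatches, dropped
        MappedRead("r4", mapq=8, mismatches=8),   # boundary values, kept
    ]
    print([r.name for r in filter_reads(toy_reads)])  # ['r1', 'r4']
```

The generous mismatch ceiling is what lets deaminated reads (with extra C->T changes) survive into genotyping.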

Reads that were multiply aligned were also removed. Multiply aligned reads were defined as reads that aligned more than once to the reference genome and whose second-best alignment had fewer than twice the number of mismatches of the best alignment. An average of 6.8% of the reads in the capture-sequenced samples and 4.1%

Table S4, it was decided to create two separate datasets. One was

The archaeological samples averaged in coverage between X4.6 and X59 for these datasets, and the modern native samples averaged in coverage between X72 and over X500 (see Tables S1 and S2).
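
The multiple-alignment rule described above (a read is removed when its second-best alignment has fewer than twice as many mismatches as its best one) can be written as a small test. This is a hedged sketch: the input format, a dictionary from read name to the mismatch counts of all its reported alignments, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def is_multiply_aligned(mismatch_counts: Sequence[int]) -> bool:
    """Return True if a read counts as multiply aligned.

    mismatch_counts holds the mismatch count of every alignment reported
    for one read (hypothetical input format). Following the rule in the
    text, a read is multiply aligned when it aligned more than once and its
    second-best alignment has fewer than twice the mismatches of the best.
    """
    if len(mismatch_counts) < 2:
        return False  # a single alignment is unique by definition
    best, second_best = sorted(mismatch_counts)[:2]
    return second_best < 2 * best


def remove_multiply_aligned(alignments: Dict[str, List[int]]) -> Tuple[List[str], float]:
    """Return (names of kept reads, fraction of reads removed)."""
    kept = [name for name, mm in alignments.items() if not is_multiply_aligned(mm)]
    removed_fraction = 1 - len(kept) / len(alignments)
    return kept, removed_fraction


if __name__ == "__main__":
    # Toy mismatch counts, not data from the study.
    toy = {"a": [1], "b": [2, 3], "c": [2, 5], "d": [0, 0]}
    kept, frac = remove_multiply_aligned(toy)
    print(kept, frac)  # ['a', 'c', 'd'] 0.25
```

Taken literally, the rule never flags a read whose best alignment has zero mismatches (read "d" above), because no count is below 2 × 0; how the original analysis handled such ties is not stated in the text.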

Substitution frequencies analysis
We followed the changes in the numbers and frequencies of all substitution types within the ancient samples (relative to the reference genome), from the first mapping round through to the genotypes. This is summarized in Table S6. We found that the combined frequency of C->T and G->A substitutions is reduced from 47.6% of all substitutions at the first mapping stage to 33.1% at the filtered-genotypes stage.
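
The statistic tracked here, the share of C->T plus G->A among all substitutions, is simple to compute once the substitutions at a given stage are tallied. A minimal sketch, assuming each observed substitution is given as a hypothetical (reference base, observed base) pair:

```python
from collections import Counter

# Deamination in ancient DNA appears as C->T and, on the complementary
# strand, as G->A.
DAMAGE_TYPES = {("C", "T"), ("G", "A")}


def damage_fraction(substitutions) -> float:
    """Fraction of substitutions that are C->T or G->A.

    substitutions: iterable of (reference_base, observed_base) pairs,
    one per mismatch against the reference genome (assumed format).
    """
    counts = Counter(substitutions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    damage = sum(n for sub, n in counts.items() if sub in DAMAGE_TYPES)
    return damage / total


if __name__ == "__main__":
    # Toy tallies, not values from Table S6.
    toy = [("C", "T")] * 3 + [("G", "A")] + [("A", "G")] * 4
    print(f"{100 * damage_fraction(toy):.1f}%")  # 50.0%
```

Computing the same value at each stage (mapped reads, filtered genotypes) for both ancient and modern samples gives the comparison made in the text.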

For comparison, we calculated the same statistics over the nine modern Israeli native samples that were capture sequenced. This is summarized in Table S7. For these samples, the combined frequencies of C->T and G->A substitutions totalled 28.8% at the mapped-reads stage and 32.6% at the filtered-genotypes stage. We also showed that the

clones or close relatives. They include cultivar pairs whose names are spelt slightly collected in the same areas, such as Nitzanim_1-Nitzanim_P.

We used the seven pairs with known kinship to assess the level of kinship with the archaeological samples. In addition, we used them as controls when testing for error introduced by biases (see below).

Control for error introduced through inaccurate imputation
To determine whether errors were introduced into the phasing analysis through erroneous imputation, we reran the phasing analysis over a SNP dataset in which no missing data was allowed.
We filtered out 79 samples (out of the total 1,007) with more than 3% missing data and

shared IBD segments (missing data analysis).
We present the lengths of shared IBD segments between each sample pair in Fig. S6A.
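
The sample filter used for this control (drop samples with more than 3% missing calls) can be sketched as follows. The genotype layout, a dictionary from sample name to a list of calls with `None` marking a missing call, is an assumption, and the second step (keeping only SNP columns with no remaining missing calls) is one hypothetical way to reach a dataset "in which no missing data was allowed"; the text does not specify it.

```python
def missing_fraction(calls):
    """Fraction of genotype calls that are missing (None)."""
    return sum(c is None for c in calls) / len(calls)


def drop_samples_with_missing(genotypes, max_missing=0.03):
    """Drop samples with more than max_missing (3%) missing calls.

    genotypes: dict mapping sample name -> list of calls, one per SNP,
    with None marking a missing call (hypothetical layout).
    Returns (kept genotypes, sorted list of dropped sample names).
    """
    kept = {s: g for s, g in genotypes.items() if missing_fraction(g) <= max_missing}
    dropped = sorted(set(genotypes) - set(kept))
    return kept, dropped


def complete_snp_columns(genotypes):
    """Indices of SNPs with no missing call in any remaining sample.

    Assumption: one possible way to obtain a dataset with no missing data.
    """
    n_snps = len(next(iter(genotypes.values())))
    return [i for i in range(n_snps) if all(g[i] is not None for g in genotypes.values())]


if __name__ == "__main__":
    # Toy genotypes over 100 SNPs, not data from the study.
    toy = {"A": [0] * 100, "B": [None] * 3 + [1] * 97, "C": [None] * 4 + [2] * 96}
    kept, dropped = drop_samples_with_missing(toy)
    print(sorted(kept), dropped)  # ['A', 'B'] ['C']
    print(len(complete_snp_columns(kept)))  # 97
```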

All known clones and parent-offspring pairs rank similarly with and without missing data. The A33-Asswad Karech pair also ranks similarly; thus, the ranking is robust to error caused by erroneous phasing.

not alter their coverage cutoff. We were left with 6,126 SNPs.
We filtered out samples with fewer than 60% genotype calls across the 6,126 SNPs and were left with 959 samples, of which 66 were southern Levant samples, including two archaeological samples (A32 and A33). We repeated the phasing as described in the methods for this dataset and identified a total of 70,913 shared IBD segments (10X analysis). As a control, we repeated the phasing for the original dataset with the same 959 samples and 6,126 randomly chosen SNPs, and identified a total of 71,547 shared IBD segments (5X analysis).
We present the lengths of shared IBD segments between each sample pair in Fig. S6B.
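
The construction of the 10X dataset and its matched control can be sketched in two steps: keep samples with at least 60% genotype calls, then, for the control, restrict the original dataset to the same samples and the same number of randomly chosen SNPs. This is a hedged illustration: the data layout, function names and fixed random seed are assumptions, not details from the text.

```python
import random


def call_rate(calls):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(c is not None for c in calls) / len(calls)


def filter_by_call_rate(genotypes, min_rate=0.60):
    """Drop samples with fewer than 60% genotype calls."""
    return {s: g for s, g in genotypes.items() if call_rate(g) >= min_rate}


def random_snp_control(genotypes, samples, n_snps, seed=0):
    """Restrict the original dataset to the given samples and n_snps random SNPs.

    genotypes: dict sample -> list of calls over all original SNPs (assumed layout).
    The seed is a hypothetical choice that makes the subsample reproducible.
    """
    n_total = len(next(iter(genotypes.values())))
    rng = random.Random(seed)
    cols = sorted(rng.sample(range(n_total), n_snps))
    return {s: [genotypes[s][i] for i in cols] for s in samples}


if __name__ == "__main__":
    # Toy genotypes, not data from the study.
    toy = {"A": [0, 1, None, 2, 1], "B": [None, None, None, 1, 0], "C": [1, 1, 1, 1, None]}
    kept = filter_by_call_rate(toy)
    print(sorted(kept))  # ['A', 'C']
    control = random_snp_control(toy, sorted(kept), n_snps=3, seed=1)
    print({s: len(v) for s, v in control.items()})  # {'A': 3, 'C': 3}
```

Matching both the sample set and the SNP count isolates coverage as the only difference between the two phasing runs.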

All known clones and parent-offspring pairs rank similarly when the X10 and X5 analyses are compared. The A33-Asswad Karech pair also ranks similarly; thus, the ranking is robust to error caused by low coverage.
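
One way to make "ranks similarly" concrete is to sum shared IBD segment lengths per sample pair in each analysis, rank the pairs, and compare the ranks of the pairs of interest. This is a sketch under stated assumptions: the segment format and function names are hypothetical, and the text does not say how rank similarity was assessed.

```python
from collections import defaultdict


def total_ibd_by_pair(segments):
    """Sum shared IBD segment lengths per unordered sample pair.

    segments: iterable of (sample_a, sample_b, length) tuples (assumed format).
    """
    totals = defaultdict(float)
    for a, b, length in segments:
        totals[tuple(sorted((a, b)))] += length
    return dict(totals)


def rank_pairs(totals):
    """Rank pairs by total shared IBD length; rank 1 shares the most."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}


def compare_ranks(segments_a, segments_b, pairs_of_interest):
    """Return {pair: (rank in analysis A, rank in analysis B)}; None if absent."""
    ranks_a = rank_pairs(total_ibd_by_pair(segments_a))
    ranks_b = rank_pairs(total_ibd_by_pair(segments_b))
    result = {}
    for pair in pairs_of_interest:
        key = tuple(sorted(pair))  # pairs are unordered
        result[key] = (ranks_a.get(key), ranks_b.get(key))
    return result


if __name__ == "__main__":
    # Toy segment lengths, NOT values from the study.
    analysis_10x = [("A33", "Asswad Karech", 5.0), ("Asswad Karech", "A33", 3.0),
                    ("S1", "S2", 2.0), ("A32", "S3", 1.0)]
    analysis_5x = [("A33", "Asswad Karech", 6.5), ("S1", "S2", 2.5), ("A32", "S3", 0.5)]
    print(compare_ranks(analysis_10x, analysis_5x, [("A33", "Asswad Karech")]))
    # {('A33', 'Asswad Karech'): (1, 1)}
```

Running the same comparison for the known clones and parent-offspring pairs provides the positive controls against which the archaeological pair is judged.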

Accounting for ascertainment bias in the archaeological data
Ascertainment bias in the archaeological data, such as the excess homozygosity compared with the modern data discussed in Chapter 5, may reduce the power of the analysis to correctly identify IBD segments in the archaeological samples (i.e., false negatives). To make sure there is no reason for concern about a wrong inference of kinship (false positives), we employed two additional analyses that support the conclusion that A33 and Asswad Karech are highly related and apparently parent-

Figure S1. Radiocarbon probability distribution of the calibrated dates of the grape pips from Avdat. Top: the probability distribution of the calibrated dates of the grape pips. The dates are ordered by spit, and within each spit from oldest to youngest. Bottom: modelled probability distribution of the calibrated dates. The model is a sequence of two phases (spit 7 is older than spit 2).

Figure S2. Patterns of deamination damage on sequences of the ancient samples, as detected by the mapDamage software. X-axis: position on the sequence segment. Y-axis: substitution frequency.

Figure S3. Heterozygosity percentage of cultivated (pink) and wild (light blue) samples and of three archaeological samples, A31, A32 and A33 (dark blue), calculated over the large SNP dataset of 6,928 SNPs. The archaeological samples are the least heterozygous as a group; however, they fit within the distribution of heterozygosity in the wild group.

Karech (in red) and of the right copy of Asswad Karech (in blue) to A33's two copies.

Figure S8. The consensus clustering of 100 STRUCTURE runs per K for K=2-7. Samples are divided into groups according to geographic origin and cultivation status (cultivated/wild). The native Israeli samples whose species is not known were classified here according to the PCA results in Figure 3C.
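
The heterozygosity percentage shown in Figure S3, which also underlies the excess-homozygosity concern above, can be illustrated per sample. This minimal sketch assumes genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and computes the percentage over called SNPs only; both the coding and the handling of missing data are assumptions.

```python
import math


def heterozygosity_percent(calls):
    """Percentage of called SNPs that are heterozygous for one sample.

    calls: genotypes coded as alternate-allele counts (0 = hom-ref,
    1 = het, 2 = hom-alt), with None for a missing call (assumed coding).
    Missing calls are excluded from the denominator.
    """
    called = [c for c in calls if c is not None]
    if not called:
        return math.nan  # no called SNPs, heterozygosity undefined
    het = sum(1 for c in called if c == 1)
    return 100.0 * het / len(called)


if __name__ == "__main__":
    # Toy genotype calls, not data from the study.
    print(heterozygosity_percent([0, 1, 1, 2, None]))  # 50.0
```

A sample with spuriously lost heterozygous calls (for example, from low coverage) would show a lower value than its true heterozygosity, which is the kind of bias the additional kinship analyses guard against.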