Incidence of ancient variants associated with oncological diseases in modern populations

Abstract
Publicly available genome-wide data sequenced from 2730 ancient human samples were analyzed for genetic predisposition to malignancy. The temporal and spatial incidence of risk variants for cancer in ancient genomes was recorded, allowing for estimates of their frequencies in ancient human communities. We identified 55 risk alleles associated with oncological conditions in the screened ancient samples. For further analysis, we selected three variants conclusively deemed to entail pathogenic effect: the VHL gene’s rs28940298, the TP53 gene’s rs78378222 and the BRCA2 gene’s rs11571833. The contemporary population frequency of these three variants is lower than, or similar to, their estimated frequency in ancient human communities, indicating that they might have been subject to negative selection. This is also suggested by their temporal dynamics during the last 10000 years, which show an overall decrease in population frequency. The oldest samples in which these three variants were established testify to their ancient origin, including the presence of the TP53 gene’s rs78378222 variant in 50000-year-old Neanderthal and Denisovan genomes. Our results demonstrate the presence of cancer-causing mutations in ancient human communities and suggest that they differed in frequency among ancient populations as they do among contemporary ones. Data on germline mutations in tumour suppressor genes in ancient human genomes are scarce, and their historic prevalence gives insights into the evolution of predisposition to cancer and hence helps advance paleogenomic medicine.


Introduction
There has been a rise in the prevalence of oncological diseases in modern times, but the reasons behind this rise remain unclear [1]. Detrimental factors associated with modern lifestyle, the prolongation of human lifespan in recent decades, and novel screening methods could all be contributing to this trend. Analyses of human remains from ancient communities, facilitated by the achievements of contemporary oncology research, could give insights into the evolution and epidemiology of cancer. The rarity of preserved soft tumour tissues, and difficulties in the analysis of neoplastic lesions in mummies [2], partially explain why data on cancer prevalence in ancient human communities are scarce. Short lifespan and the absence of carcinogenic factors in the environment, such as pollution, food additives and tobacco, might also explain this scarcity of data. Presumptive cancer cases based on visible pathologic bone lesions have been demonstrated in ancient Egyptian populations [3,4]. Analysis of skeletal remains from necropolises in the vicinity of Rome from the Imperial Age (first century BC to third century AD) establishes a neoplasm typical of osteosarcoma [5], and a rare case of bone metastasis of prostate carcinoma in an elderly man [6]. Merczi et al. (2014) detect three cases of probable skeletal metastatic carcinoma in Hungary from the Roman period (first to fifth century AD) [7].
Knowledge of the effect of mutations predisposing to cancer is currently based mostly on genetic studies of contemporary populations. However, exceptionally preserved neoplastic cells in the mummy of Ferrante I of Aragon, King of Naples and leading figure of the Italian Renaissance (1431-1494), indicate the presence of an infiltrating epithelial malignant tumour, probably in the colon [8]. The subsequent molecular analyses establish mutations in codon 12 of the K-RAS oncogene, a main hotspot for mutations causing colon cancer. Another study on eighteenth century mummies establishes a missense mutation in the APC gene, mutations in which are known to be common and strongly associated with the development of colorectal carcinomas [9]. These studies indicate that genetic predisposition to cancer already existed in the pre-industrialization era. Genomic methods are just beginning to be utilized for analyses of disease occurrence in ancient DNA (aDNA). Such efforts are, however, complicated by the difficulty in characterizing the relationship between genotype and tumour phenotype in ancient samples. Also, DNA extracted from ancient bone material is highly fragmented, which does not allow complete reading of the information and provides only limited opportunities for analyzing germline mutations. Insight can be gained, however, by virtue of the ever-increasing volume of ancient whole-genome data, coupled with accumulating knowledge about the clinical significance of mutations predisposing to hereditary types of cancer in contemporary human populations.
The aim of this study was to analyze publicly available ancient genome-wide data for the prevalence of risk variants in genes predisposing to cancer.

Materials and methods
This study is based on analysis of publicly available genome-wide data from 2730 ancient samples, with information for up to 1.23 million positions in the genome in hg19 coordinates [10]. The allocation of the aDNA samples to geographic regions is given in Table 1.
The age distribution from 100 BP to 15000 BP of the analyzed ancient samples is given in Figure 1.
Of all the genomic positions in the aDNA for which data were available (up to 1.23 million), we first extracted the variants listed in the publicly available DisGeNET database [11] of genes and variants associated with human diseases (of the 369554 variants listed in DisGeNET, there was information for 32643 in the ancient genome data samples). A list of 167 genes associated with oncologic conditions (Supplementary material S1) was compiled from the genes used in screening tests of five renowned genomic laboratories, i.e. Genologica Medica (Spain), Sema4 (United States), Baylor Genetics (United States), Invitae (United States) and the Genomic Diagnostic Laboratory, Children's Hospital of Philadelphia (United States). The ancient genome-wide data contain information about 479 variants in the 167 genes associated with oncological conditions (Supplementary material S1). Based on large-scale case-control studies, WGS analyses, meta-analyses and other relevant studies, the DisGeNET database identifies 55 of these variants to be risk alleles for oncological conditions (Supplementary material S2).
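The screening steps described above can be sketched as a sequence of simple filters. The following is a minimal illustration only, not the pipeline used in this study: the `Variant` record, the `risk_allele` flag, the function name and the `toy_variant_*` and `GENE_X` entries are hypothetical placeholders, and only the three rsIDs and their genes are taken from the text.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    rsid: str
    gene: str
    risk_allele: bool  # hypothetical flag: annotated as a cancer risk allele


def screen_variants(ancient_rsids, annotated_variants, panel_genes):
    """Apply the three filters in turn and return each intermediate list."""
    # Step 1: keep annotated variants that are covered by the ancient data.
    covered = [v for v in annotated_variants if v.rsid in ancient_rsids]
    # Step 2: keep those located in genes from the oncology screening panel.
    in_panel = [v for v in covered if v.gene in panel_genes]
    # Step 3: keep those annotated as risk alleles for oncological conditions.
    risk = [v for v in in_panel if v.risk_allele]
    return covered, in_panel, risk


# Toy input: three real rsIDs from the text plus made-up placeholder entries.
annotated = [
    Variant("rs28940298", "VHL", True),
    Variant("rs78378222", "TP53", True),
    Variant("rs11571833", "BRCA2", True),
    Variant("toy_variant_1", "TP53", False),
    Variant("toy_variant_2", "GENE_X", True),
    Variant("toy_variant_3", "BRCA2", True),  # not covered by the ancient data
]
ancient = {"rs28940298", "rs78378222", "rs11571833",
           "toy_variant_1", "toy_variant_2"}
panel = {"VHL", "TP53", "BRCA2"}

covered, in_panel, risk = screen_variants(ancient, annotated, panel)
print(len(covered), len(in_panel), len(risk))
```

In the study itself the corresponding counts are 32643 covered disease-associated variants, 479 variants in the 167 panel genes and 55 risk alleles; the toy input only mirrors the order of the filters.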

Results
The estimated frequency of these 55 risk variants in ancient populations is consistently lower than their global frequency in contemporary populations (Figure 2). Genetic factors of marked importance for diseases, including cancer, are, however, rare variants, as the functional effect of rare variants is usually larger than that of common variants [12]. Therefore, for further analyses we selected three variants with low frequency in contemporary populations (MAF ≤ 0.01): the VHL gene's rs28940298, the TP53 gene's rs78378222 and the BRCA2 gene's rs11571833. These three variants are represented with black triangles in Figure 2 and are highlighted with bold text in Table S2 (Supplementary material), and their contemporary frequencies are lower than or similar to their estimated frequencies in ancient populations. The temporal dynamics of these variants during the last 10000 years indicate an overall decrease in population frequency (Figure 3). Among the ancient genome-wide samples analyzed in this study, there are two from Neanderthal individuals (from the Vindija Cave in Croatia and from the Altai region in Russia) and one from a Denisovan individual from Siberia, all dated to around 50000 BP. Intriguingly, all three of these archaic human individuals harboured one of the three mutations, the TP53 gene's rs78378222.
The mutation rs28940298 C > T in the VHL gene is evaluated as likely pathogenic by VarSome. It is established in homozygous state (TT) in a total of 40 ancient genomes, 23 of which are from Europe, the rest being from different parts of Asia, with one from a 1000-year-old sample from the Caribbean (Figure 4).
In contemporary human populations, a search for the rs28940298 C > T variant in 251358 genomes finds it 53 times, the overall population frequency being much lower (MAF = 0.00015) than that estimated in ancient human communities (MAF = 0.015). Among contemporary populations, its frequency varies from 0.0007 in South Asian populations and 0.0003 in European (Non-Finnish) populations to as low as 0.00006 in African populations, and the mutation is not detected at all in Latin American populations. The five oldest samples in which this mutation was established (around 7800-8700 BP) are all from the Balkan region and Anatolia, indicating a possible region of origin of this mutation.
Its temporal dynamics suggest, despite fluctuations and three peaks (at around 9000 BP, 7000 and 9000 BP), an apparent gradual frequency decline through time (Figure 3). Examining the population frequencies of all disease-associated variants from DisGeNET that are also found in the ancient genome-wide data, this mutation is among the 0.0006% with the largest relative drop in frequency between the overall ancient and contemporary estimates. Disease-associated variants are expected to be affected by negative selection, and this result suggests that the rs28940298 mutation has been subject to strong negative selection.
The TP53 gene's rs78378222 variant is found in 19 ancient human genomes (Figure 3). In contemporary populations, its frequency is highest in European (MAF = 0.014) and Latin American (MAF = 0.013) populations, corresponding to its incidence in ancient human communities. The presence of this variant in 50000-year-old Neanderthal genomes and a Denisovan genome from Central Asia is an indication of its considerable age, as well as of its possible place of origin.
The mutation rs11571833 A > T in the BRCA2 gene (K3326*) is a stop-gain variant. It is established in only three aDNA samples, from Anatolia (8000-8500 BP), South Ural (3500-4000 BP) and the Caucasus (13000-13500 BP) (MAF = 0.001). In contemporary human populations, the frequency of the T allele is also very low (MAF = 0.007), being highest in European populations.
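The comparison of ancient and contemporary frequencies can be illustrated with a short calculation. This is a sketch under stated assumptions: the text does not define "relative drop", so the formula below (the decrease expressed as a fraction of the ancient frequency) is one plausible reading, the function names are hypothetical, and every value in `toy_drops` other than the VHL entry is invented for illustration.

```python
def relative_drop(ancient_freq, modern_freq):
    """Decrease in frequency as a fraction of the ancient frequency (assumed definition)."""
    if ancient_freq <= 0:
        raise ValueError("ancient frequency must be positive")
    return (ancient_freq - modern_freq) / ancient_freq


def top_fraction(target, all_drops):
    """Fraction of variants whose relative drop is at least as large as target."""
    return sum(d >= target for d in all_drops) / len(all_drops)


# VHL rs28940298: MAF 0.015 (ancient estimate) vs 0.00015 (contemporary), from the text.
vhl_drop = relative_drop(0.015, 0.00015)  # about 0.99, i.e. a roughly 100-fold decrease

# Invented relative drops for four other variants; negative values mean an increase.
toy_drops = [vhl_drop, 0.40, 0.10, -0.25, -1.50]
print(round(vhl_drop, 2), top_fraction(vhl_drop, toy_drops))
```

Ranking all covered disease-associated variants by such a statistic is what places a variant in the extreme tail of the distribution; with real data `toy_drops` would hold one value per covered variant.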

Discussion
For this paleogenomic study, publicly available genome-wide data were examined for the incidence of SNP variants associated with cancer. Human populations differ in frequency of SNP variants, and these differences have been the basis for studying differentiation and migration among human populations [13]. Genetic variants also play a role in adaptation to extreme environmental conditions such as cold and altitude [14], malaria resistance [15], lactose [16] and arsenic [17] tolerance and other environmental factors. Some variants, on the other hand, are the result of spontaneous or induced mutagenesis, and lead to disease and reduced lifespan.
We initially hypothesized that oncological diseases were less prevalent in ancient human communities, as potentially mutagenic factors, e.g. toxic chemicals in the environment or in food, tobacco smoking, radiation, etc., were not as widespread. We also hypothesized that mutations in tumour-suppressor genes might have been expressed to a lesser extent in ancient communities, as only a small portion of individuals reached an age advanced enough for the clinical effect of pathogenic mutations to manifest itself.
The VHL rs28940298 mutation was initially identified in the contemporary Chuvash people of Russia, where it is endemic and 1 in 20 individuals are heterozygous carriers [18]. It has been established in other populations, e.g. Indian [19], Russian and Ukrainian [20], and in patients of Asian and Western European ancestry [21]. A study by Liu et al. (2004), based on the worldwide distribution of the mutation and haplotype analysis of 8 polymorphic markers around this gene, concludes that this mutation spread from a single founder 14000 to 62000 years ago [22]. Even though we establish extensive worldwide distribution of this variant in ancient samples, the hypothesis of spread from a single founder cannot be dismissed, given the antiquity of the mutation and subsequent human migrations. Negative selection, and attenuation of endemic and other environmental factors, could explain the gradual frequency decrease of this variant through time.
The normal functioning of the VHL gene can be disrupted by a multitude of mutations leading to von Hippel-Lindau disease predisposition [23]. The molecular pathogenesis is dictated by the loss of function of the VHL protein, which binds hypoxia-inducible factor α (HIF-α) and targets it for degradation in an oxygen-dependent pathway. Protein from a mutant VHL gene cannot recognize HIF-α, so HIF-α accumulates, affecting the expression of a group of downstream genes, including erythropoietin (EPO), vascular endothelial growth factor (VEGF) and transforming growth factor α (TGF-α), to promote oncogenesis [24]. This tumour-suppressor gene is often associated with renal cell carcinoma (RCC), pancreatic cyst or tumour (PCT), central nervous system haemangioblastoma (CHB), retinal angiomas (RA) and phaeochromocytoma (PHEO). A number of studies have shown that the rs28940298 (c.598C > T) mutation in the VHL gene in homozygous state causes Chuvash polycythaemia, an autosomal recessive form of erythrocytosis [18,22,25]. This mutation results in the replacement of arginine by tryptophan at codon 200 (p.Arg200Trp). The mutant protein causes HIF-α levels to increase, promoting erythropoiesis and resulting in polycythaemia. Chuvash polycythaemia is characterized by increased red blood cell mass and haemoglobin levels and increased mass of solid organs [26], and is associated with arterial and venous thrombosis, major bleeding episodes, cerebral vascular events and premature mortality [27].
TP53 is considered a key tumour suppressor gene which acts as a transcription factor and induces the expression of genes involved in cell cycle regulation and genes involved in the intrinsic (PUMA, BAX) and extrinsic (TNFRSF10B, FAS) pathways of apoptosis [28]. Studies of the TP53 gene have mostly focussed on mutations in exons [29], while the UTR regions have been largely ignored [30]. A GWAS from 2011 identifies association of the rs78378222 mutation with prostate cancer, glioma and colorectal adenoma [31], while a meta-analysis from 2016 suggests that this mutation is a potent risk factor for overall cancer [32].
Pathogenic mutations in the BRCA2 gene predispose to hereditary breast and ovarian cancer (HBOC). However, the germline stop-gain mutation K3326* (rs11571833) is also associated with risk of lung cancer [33,34], cancers of the upper aero-digestive tract [35] and urinary tract cancers [36]. The oldest instance of the BRCA2 gene rs11571833 A > T mutation was established in a sample from the Caucasus region dated to 13000-13500 BP, which gives an approximate estimate of its minimum age. It was established only two more times in the analyzed samples, in an 8000-8500-year-old sample from Anatolia and in a 3500-4000-year-old sample from Southern Ural. The scarcity of established cases of this variant in ancient samples prevents us from making any reliable inferences on its historical prevalence. Yet, all ancient samples in which it was detected are from Western Asia, whereas its contemporary distribution is global, which might indicate worldwide spread from this region. Also, compared to the estimated ancient frequencies (0.0003), the contemporary frequencies are higher, the global estimate being 0.0066 and regional estimates ranging from as high as 0.011 in Finnish down to 0.0012 in African populations.
The BRCA2 gene encodes the breast cancer type 2 susceptibility protein (BRCA2), an essential protein in genome maintenance, homologous recombination (HR) and replication fork protection. One role of BRCA2, common to DNA break repair, DNA crosslink repair, and replication fork protection, is delivery of RAD51 to sites where it is needed [37]. The BRCA2 gene rs11571833 A > T mutation is a germline nonsense mutation entailing stop codon K3326X and loss of the terminal 93 amino acids of the protein chain. It is located in the C-terminus of BRCA2, which also contains a RAD51 binding domain [38]. This mutation eliminates Thr3387 of the BRCA2 protein, obstructing the localization of BRCA2 to the nucleus and the release of RAD51. Recent studies show that the key role of BRCA2 in the repair of and recovery from stalled replication forks involves exon 27, which harbours the K3326X mutation [39].
Geographic bias is inherent to all aDNA studies and it has inevitably also affected our results about the incidence and prevalence of cancer-associated variants in ancient human communities. Our results demonstrate the presence of such mutations and suggest that they differed in frequency among ancient communities as they do among contemporary populations. Key to understanding disease aetiology is identifying the processes that cause the prevalence of genetic disease to differ among human populations. Disease-causing variants are studied intensively so that screening programs can be established in populations where they are found with relatively high frequency [40].

Conclusions
For the present study, publicly available genome-wide data from ancient DNA samples were analyzed for the incidence of variants associated with cancer. Of the established 55 risk alleles for oncological conditions, we selected three variants conclusively deemed to entail pathogenic effect: VHL rs28940298, BRCA2 rs11571833 and TP53 rs78378222. The contemporary population frequency of these three variants is lower than, or similar to, their estimated frequency in ancient human communities, indicating that they might have been subject to negative selection. This is also suggested by the temporal dynamics of these variants during the last 10000 years, which show an overall decrease in population frequency. The oldest samples in which these three variants were established testify to their ancient origin, especially intriguing being the presence of the TP53 rs78378222 variant in 50000 BP Neanderthal and Denisovan genomes. Data on germline mutations in cancer-causing genes in ancient human genomes are scarce, and their historic prevalence might give insight into the evolution of cancer and help advance modern medicine.

Figure 2. Estimated population frequencies of the 55 variants identified to be risk factors for oncologic diseases in ancient vs. contemporary populations. Points above the identity line (x = y) are variants estimated to have higher frequencies in contemporary compared to ancient populations. The three variants (rs28940298 in VHL, rs11571833 in BRCA2 and rs78378222 in TP53) are represented with black triangles.

Figure 3. Temporal dynamics (1000-10000 BP) of the three examined variants established in ancient samples. The frequency trajectories are plotted using bins of 1000 years and sliding windows of 500 years. Uncertainty of the frequency estimation is indicated by a grey coloured area, representing the normal approximation of the 95% binomial proportion CI.
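The binning and interval estimation described in the caption of Figure 3 can be sketched as follows. This is an illustrative reading rather than the code behind the figure: it assumes windows 1000 years wide that advance in 500-year steps, uses the normal-approximation (Wald) interval for a binomial proportion clipped to [0, 1], and relies on hypothetical function names and an invented toy sample list of (age in BP, alternative allele count, total allele count).

```python
import math


def wald_interval(k, n, z=1.96):
    """Normal-approximation 95% CI for a binomial proportion k/n, clipped to [0, 1]."""
    p = k / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def windowed_frequencies(samples, start=1000, stop=10000, width=1000, step=500):
    """Allele frequency with CI in overlapping age windows [lo, lo + width)."""
    rows = []
    lo = start
    while lo + width <= stop:
        hi = lo + width
        in_window = [(k, n) for age, k, n in samples if lo <= age < hi]
        k_total = sum(k for k, _ in in_window)
        n_total = sum(n for _, n in in_window)
        if n_total > 0:  # skip windows that contain no genotyped samples
            ci_low, ci_high = wald_interval(k_total, n_total)
            rows.append((lo, hi, k_total / n_total, ci_low, ci_high))
        lo += step
    return rows


# Invented toy samples: (age in BP, alternative allele count, total allele count).
toy_samples = [(1200, 0, 2), (1800, 1, 2), (2300, 0, 2)]
rows = windowed_frequencies(toy_samples)
for lo, hi, freq, ci_low, ci_high in rows:
    print(f"{lo}-{hi} BP: freq={freq:.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

With few samples per window the Wald interval is wide and can collapse to zero width when no alternative allele is observed, which is one reason such trajectories should be interpreted cautiously for rare variants.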

Figure 4. Geographical location of the aDNA samples in which the three variants were established. The samples are allocated in panels corresponding to the time period to which they have been dated (0-3000 BP, 3000-5000 BP, and 5000-52000 BP).

Table 1. Allocation to geographic region of the ancient DNA samples analyzed.