Developing an Online Portal for Determining the Genomic Signature of Archaic DNA that are Associated to Modern Human Genetic Diseases: A Meta-Analysis Study

Objective: Mutations or introgression can give rise to adaptive alleles, some of which are beneficial. Archaic humans lived more than 200,000 years ago in Europe and Western Asia. They were adapted to the environment and pathogens that prevailed in these locations. It can therefore be thought that modern humans obtained a significant immune advantage from the archaic alleles.
Materials and Methods: First, data were collected by meta-analysis from previously identified genetic diseases caused by alleles that were introgressed from archaic humans. Second, the in silico model portal (http://www.archaics2phenotype.xxx.edu.tr) was designed to trace the history of the Neanderthal allele. The portal also shows the current distribution of the genotypes of the selected alleles within different populations and correlates it with the individual's phenotype.
Results: Our developed model provides a better understanding of the origin of genetic diseases or traits that are associated with the Neanderthal genome.
Conclusion: The developed model will help individuals and their populations to receive the best treatment. It also clarifies why there are differences in disease phenotypes in modern humans.

sapiens began to spread in the world from Africa around 30,000 years ago [4]. Therefore, Neanderthals and early humans coexisted and mated. Modern genetic data show that Neanderthals mated with modern humans in Europe while they coexisted. As a result, approximately 1-4% of the modern human genome consists of genes from Neanderthals. The genes that were inherited from Neanderthals help us fight deadly viruses such as Epstein-Barr. However, gene mutations have also resulted in diseases such as Crohn's disease, type-2 diabetes, lupus, heart diseases, and depression [1]. This study focused on the genomic segments that passed from Neanderthals to modern humans.
A significant proportion of these archaic-specific DNA segments is found within the TLR1-TLR6-TLR10 gene cluster, which belongs to the toll-like receptors (TLRs). TLRs recognize the structure of pathogens and provide natural immunity against many pathogens. Therefore, they are an important defense against pathogens. TLRs are known to respond to stimuli associated with various pathogens and to provide signal responses necessary for the activation of innate immune effector mechanisms and subsequent development of adaptive immunity [1].
A previous study indicated that modern humans carry three archaic-like haplotypes, and three TLRs that were inherited from archaic humans were identified. Two of these haplotypes resemble those of the Neanderthal genome, and the third haplotype resembles that of the Denisovan genome. The frequency of single nucleotide polymorphisms (SNPs) commonly shared in Neanderthal-like haplotypes varies among continents and populations. In Europe, allelic frequencies of Neanderthal-like core haplotypes are higher in Southern European populations [1].
First, we aimed to collect, by meta-analysis, previously identified archaic-like SNPs that have clinical significance, combining the knowledge and outcomes of previous studies to determine which diseases in modern humans were genetically inherited from Neanderthals. Second, a software program was developed to merge the previously identified archaic-like SNPs with their clinical pathogenicity. Thus, this study and the developed software provide data regarding the origin of diseases in modern humans. Finally, an in silico model was designed for clinicians and researchers to trace the history of the archaic alleles and determine their possible correlation with a person's phenotype, thus helping to interpret the underlying mechanisms of the diseases.

Materials and methods
Recent data by Dannemann et al. [1] were used to determine the archaic-like SNPs that are represented within the human genome. That research group previously identified 79 archaic-like alleles within the TLR6-TLR1-TLR10 gene cluster, which indicates repeated introgression from archaic humans. A meta-analysis was performed to find the possible clinical significance of those genetic markers in the 1000 Genomes populations.
The Neanderthal introgression maps of Sankararaman et al. [5] and Vernot et al. [6] were used to identify archaic-like haplotypes that are potentially present in modern human genomes. The introgression map presented by Sankararaman et al. [5] provides the probability that SNPs at the polymorphic positions of Neanderthals emerge in modern humans [5]. Vernot et al. [6] detected introgressed regions of the modern human reference sequence and compared these candidate regions with the reference Neanderthal genome [6]. They used introgression probabilities per SNP for all Asian and European individuals. They also calculated the difference in Neanderthal probabilities between neighboring SNP pairs in a region including the three TLR genes and an additional 50 kb (chromosome 4: 38,723,860-38,908,438) [1]. Potentially archaic-like SNPs in this region were identified in 109 Yoruba individuals in the genome dataset of Neanderthal or Denisovan genomes. Furthermore, Dannemann et al. [1] reported that this introgressed region covers 143 kb of chromosome 4 (chromosome 4: 38,760,338-38,905,731) and contains 61 archaic-like SNPs. This region overlaps with two haplotypes identified by Vernot et al. [6].
Microsoft Visual Studio C++ 2008 edition served as the integrated development environment, and the C programming language was used to build the software.
The software was generated in two parts. In the first part, software was created to allow the user to search the created database in two different ways: by SNP ID or by chromosome location. The database was created in the second part.
The in silico genome browser was designed for the first time to show the data collected so far on all identified archaic-like SNPs and their clinical significance. Therefore, a program was created to generate a database comprising all the data collected for the 79 archaic-like SNPs. The SNP variation of ancestral nucleotides, the diseases caused by the SNPs, and the allele and genotype frequencies in the 1000 Genomes populations were added to the program, with a separate entry created for each SNP ID.
The website has four main sections: Homepage, About us, User guide, and Contact.
In the Homepage section, a search can be conducted using SNP IDs or chromosome locations. If the given ID or location matches any on the database, the result will be visible on the screen. In the About us section, the user can get general information about the website. The User guide section is designed to guide the users on the use of the website. The Contact section is designed to allow the user to communicate with the website administrator regarding any queries they may have about the website.

Results
Before we conducted our study, all significant information about archaic SNPs was scattered across different sources and various genome browsers. Therefore, we aimed to merge all information as the first step. Our merged meta-analysis data provided a better understanding of the mechanism and background of diseases.
Second, the in silico genome browser was created and transferred to the online platform. This genome browser provides online access to researchers and clinicians. After the domain name and hosting service were created separately, they were merged to create the free, publicly available website http://archaics2phenotype.xxx.edu.tr/.
This website was generated for researchers and clinicians. The created database will facilitate the work of researchers because they can obtain all data with references via our browser. Our developed in silico model provides better understanding of the origin of genetic diseases or traits associated with archaic genomes. Moreover, it provides quick access to data for researchers and clinicians through genome browser.
A meta-analysis, which combines the results of multiple independent studies on a given subject, was performed to collect all the identified archaic-like SNPs. We used three international genome browsers and scientific articles for the meta-analysis. We took the 79 archaic-like SNPs from the study of Dannemann et al. [1]. Then, the 1000 Genomes data were used to check whether each SNP is registered and identified in the 1000 Genomes populations.
The clinical significance of the identified SNPs was determined using the genome browsers. In this study, three different international databases were used to collect data: Ensembl, 1000 Genomes, and dbSNP. Additionally, population genetic information was collected from the 1000 Genomes data through Ensembl. Thus, for each population, allele frequencies and genotype frequencies were obtained for each determined SNP.

Senturk and Ergoren. Neanderthals and Modern Human Diseases. Eurasian J Med 2020; 52(2): 153-60

Main Points
• Our developed in silico model provides a better understanding of the origin of genetic diseases or traits associated with archaic genomes.
• It provides quick access to data for researchers and clinicians through the genome browser.
• The developed software was designed to help individuals and the populations they belong to receive the best treatment in the future.

Allele frequency is how often a specific allele occurs in a population. For example, if A is the dominant allele and T is the recessive allele, there are three possible allele combinations: AA, AT, and TT. Genotype frequency is how often each allele combination is seen in the population, whereas allele frequency is how often each allele (A or T) is seen in the population. Thus, the allele frequency of A is the number of A alleles divided by the total number of alleles (A+T), and the allele frequency of T is the number of T alleles divided by the total number of alleles in the population. [7]. Additionally, these alleles are also associated with resistance to several microorganisms and with allergic diseases. Different alleles, together with varying gene expression, cause different disease phenotypes in modern humans. These archaic-like SNPs are responsible for some diseases and disease susceptibility in the human genome (Figure 1). Our meta-analysis report lists each archaic-like SNP and its association with pathogenic diseases (Table 2).
During the design of the database, the program code was written in the C language in Visual Studio. The software of this study was divided into two parts.
In the first part, users could enter the search input; the software was designed to allow users to search by SNP ID, by the chromosome location of the SNP of interest, or by both simultaneously. In the second part, the output for the searched input was displayed on the screen. In this part, the data acquired were used to create the in silico browser. After the creation of the necessary algorithms, all the informative data collected about the 79 archaic-like SNPs were integrated into the new software. Thus, the archaics2phenotype software was generated.
After creating the software and database, the website was generated and the database was posted on the website for online access. This website is an information sharing platform, which is available online to users.
A domain name and web hosting are required to create a website. First, the domain, archaics2phenotype.xxx.edu.tr, was created to set up the website for internet browsers. Second, the web hosting was created to activate the website. Users can access all available data on the website through the web hosting. In addition, all data on the 79 archaic-like SNPs are stored in the hosting service, to which the database created using the software was transferred. Then, the domain name and web hosting service were connected to each other and the website was activated. As a final step, the interface of the website was designed, because the appearance of the in silico genome browser is crucial for ease of use. The data are at present freely available to the public worldwide at http://archaics2phenotype.xxx.edu.tr/.

Discussion
Genetic and archeological studies showed that Neanderthals and modern humans interbred 50,000 years ago. The fossil findings revealed that the population of Neanderthals began to decline 40,000 years ago and that the Neanderthals became extinct 39,000 years ago. Many factors contributed to their extinction, and several hypotheses have been proposed. The first possibility is rivalry for resources or direct warfare between Neanderthals and modern humans. Modern humans were more advanced technologically and were better hunters than the Neanderthals. Therefore, humans had better chances of survival. The second possibility is that the Neanderthals were adapted to a cold climate, so their lives became difficult as the climate gradually became warmer. Another possibility is the new pathogens and parasites found in the new environment [8].
Considerable genetic diversity in humans arises from ancient polymorphisms. Thus, Neanderthal-like haplotypes are not much diverged from modern human sequences. In Europe, the allelic frequencies of Neanderthal-like core haplotypes are higher in Southern European populations [1], for example, the Tuscan population in Italy (TSI) and the Iberian populations in Spain (IBS), with frequencies of 39.3% and 38.3%, respectively. In Asia, the allelic frequencies of Neanderthal-like core haplotypes are higher in East Asian populations, such as Japanese in Tokyo (JPT, frequency 53.4%) and Han Chinese (CHB, frequency 53.6%). The frequencies in other Asian populations are between 21.7% and 41.9% [1].
Figure 1. Illustration of the statistical calculation of the most common diseases or traits that might have been caused by archaic-like SNPs. The horizontal axis represents the most common diseases or traits and the vertical axis illustrates the frequency of the disease. Self-reported allergy is the most frequently seen disease, followed by Helicobacter pylori infection. Interestingly, alcohol consumption and amyotrophic lateral sclerosis had an association with archaic-like SNPs (5% and 4%, respectively). Other traits that were found at <1% are endometriosis, blood pressure, coronary artery disease, abnormal lymphocyte counts, Paget's disease, height, allergic sensitization, breast cancer, and suicide attempts in bipolar and panic disorders.

Table 2. Each listed archaic-like SNP and its associated disease. These archaic-like SNPs mainly cause self-reported allergies and Helicobacter pylori serologic status.

In the Neanderthal genome project, the genome was obtained from the bones found in the Vindija cave. The extracted Neanderthal DNA was compared with that of five different modern humans (French, Chinese, Papua New Guinean, and Africans from the San and Yoruba groups) [2]. The results from the initial analyses showed that Neanderthal DNA was more similar to the DNA of non-African populations than to that of African populations. The simplest explanation of this similarity was that there was gene flow between Neanderthals and humans. There were significant differences between modern humans and Neanderthals in four genes: sperm-associated antigen 17 (SPAG17), which is responsible for sperm motility [8]; protocadherin-16 (PCD16), which is responsible for wound healing [9]; transcription termination factor TTF1, which is responsible for gene reading; and the RPTN gene, which is highly expressed in hair follicles, skin, and sweat glands [10]. Apart from these, the mannose receptor C-type 1 (MRC1) gene, also found in Neanderthals and modern humans, played a role in cell communication. However, the Neanderthals carried a special mutation in the MRC1 gene. This mutation did not appear in modern humans. It led to the formation of a pale skin color and red hair in Neanderthals [11]. Another