Candidate SNP Markers Significantly Altering the Affinity of TATA-Binding Protein for the Promoters of Human Hub Genes for Atherogenesis, Atherosclerosis and Atheroprotection

Atherosclerosis is a systemic disease in which focal lesions in arteries promote the build-up of lipoproteins and the cholesterol they transport. The development of atheroma (atherogenesis) narrows blood vessels, reduces the blood supply and leads to cardiovascular diseases. According to the World Health Organization (WHO), cardiovascular diseases are the leading cause of death, and their toll has risen further since the COVID-19 pandemic. Many factors contribute to atherosclerosis, including lifestyle and genetic predisposition. Antioxidant diets and recreational exercise act as atheroprotectors and can retard atherogenesis. The search for molecular markers of atherogenesis and atheroprotection for predictive, preventive and personalized medicine appears to be the most promising direction for the study of atherosclerosis. In this work, we have analyzed 1068 human genes associated with atherogenesis, atherosclerosis and atheroprotection. The hub genes regulating these processes have been found to be the most ancient. In silico analysis of all 5112 SNPs in their promoters has revealed 330 candidate SNP markers, which statistically significantly change the affinity of the TATA-binding protein (TBP) for these promoters. These molecular markers have made us confident that natural selection acts against underexpression of the hub genes for atherogenesis, atherosclerosis and atheroprotection. At the same time, upregulation of the hub genes for atheroprotection promotes human health.


Introduction
Atherosclerosis is characterized by deposits made of lipids, pieces of connective tissue and inflammatory and smooth muscle cells inside the arteries [1]. All its stages from atheroma formation and growth to complications (e.g., myocardial infarction [2] and stroke [3] as the leading causes of death, according to the World Health Organization (WHO) [4]) represent the inflammatory response to the lesion caused by specific cytokines. Atherosclerosis is an aging-related disease [5], with such factors as acquired dyslipidemia [6], acquired diabetes [7], tobacco smoking [8], past diseases [9], lack of physical activity and excessive physical activity that are collectively called lifestyle increasing the same time, Haldane's dilemma [40] and the neutral theory of molecular evolution [41] provide independent evidence that an absolute majority of human SNPs are neutral. Consequently, as far as neutral SNPs are concerned, their much less demanding computer-assisted identification for exclusion from the cohort search for clinical molecular markers can be helpful to personalized medicine [30].
The clinical SNP markers that have been identified to date can be found in the freely accessible databases OMIM [42], ClinVar [43] and GWAS Central [44]. By far the majority of these markers were found in the protein-coding regions of genes in the form of damage to protein molecules, which is easier to detect because protein damage does not vary across human tissues. However, protein damage attributable to SNPs cannot be repaired because of the ethical prohibition on editing individual human genomes [45]. Admittedly, research into genome editing with the use of animal models to treat human diseases is one of the hot spots in modern biomedical genetics and pharmacogenomics (see, for example, [46]).
At the same time, only an infinitesimally small share of the known SNP markers of diseases have been identified in the regulatory regions of genes [35,42,43,44], where SNPs only modulate their expression levels in humans, not affecting the proteins encoded by these genes. Because the expression levels of the genes depend on the tissue and the stimulus even without a single SNP in the regulatory regions of these genes, regulatory SNP markers of diseases are more difficult to detect [47].
Nevertheless, the pathogenic effects of such molecular markers can be eliminated by changing lifestyle [48], breaking unhealthy habits [49], using low-molecular-weight inhibitors of the proteins encoded by the corresponding genes as medicines against an excess of these proteins [50], using exogenous recombinant human proteins to compensate for the lack of these proteins caused by the corresponding SNPs [51] and using short antisense oligonucleotides as pharmacotherapy for SNP-promoted misregulation of human gene expression [52]. That is why regulatory SNP markers are increasingly in focus as personalized medicine progresses [53].
Various regulatory biomedical SNP markers were found due to a significant change in the affinity of the TATA-binding protein (TBP) to the promoters of the human genes carrying these SNPs. This change takes place when TBP initiates the assembly of preinitiation complexes from a dense transcriptionally inactive packaging of these promoters into nucleosomes [54,55], this assembly being one of the earliest stages in the initiation of transcription of these genes [56][57][58][59]. If it were not for this molecular event, no primary initiation of gene transcription would have happened: knockout model animals (TBP-/-) run out of maternal TBPs at the blastula stage and have their development arrested [60,61], while chromatin immunoprecipitation followed by sequencing (ChIP-Seq) locates TBP binding sites before the starts of most transcripts [62][63][64].
Therefore, the SNPs that alter TBP-promoter affinity are phenotypically easy to observe: the higher the affinity of TBP to the promoter carrying that particular SNP, the higher the expression level of the gene regulated by this promoter, which was confirmed by a large number of independent experiments [65][66][67][68][69]. By contrast, the in silico assessment of all other regulatory SNPs' phenotypic manifestations remains a challenge bioinformatics has to overcome [70,71].
In our previous works, we proposed a bioinformatics model of TBP binding to the promoter in three consecutive steps [72]: (i) TBP slides [73] along the slightly bending DNA helix of the promoter [74] <=>; (ii) TBP stops at the potential TBP binding site [75,76] <=>; and (iii) the DNA helix of the promoter bends by 90°, so the TBP-promoter complex [77][78][79] becomes fixed. The model was confirmed by an independent in vitro experiment [80]. Nevertheless, we additionally checked this in silico model using 68 independent experimental cases [68] in PubMed [81], in our real-time in vitro experiments [82,83] in equilibrium [84] and nonequilibrium [85,86] conditions, and with transfected human cell cultures ex vivo [87]. This led us to the development of SNP_TATA_Comparator, a freely accessible web service [88]. SNP_TATA_Comparator takes as input two 90-bp DNA sequences of the promoter before the transcription start site (one corresponding to the norm, the other to the minor variant of the given SNP) and outputs in silico estimates of TBP affinity for these promoter variants, expressed as the equilibrium dissociation constant, Kd, of the TBP-DNA complex in nanomoles per liter (nM), the standard errors of these estimates and the statistical significance, p, of the mutation-induced change in the expression level of the gene regulated by this promoter according to Fisher's Z-test [89]. With this web service, we have previously revealed candidate SNP markers of autoimmune disorders [90], behavioral disorders [91] and chronopathologies [92]. Following a 50-year-old tradition of comparing the frequencies of different types of mutations (transitions vs. transversions [93], inserts vs. deletions [94] and synonymous vs. nonsynonymous substitutions [95]) in order to estimate the parameters of molecular evolution, we compared the respective frequencies of SNPs that increase and decrease TBP affinity to the promoters of the same genes. 
This comparison was done in response to the question as to whether these genes are under natural selection or neutral drift [96]. In so doing, we have already named the likely modes of evolution of human genes associated with aggression [97], rheumatoid arthritis [98], hypertension [99] and all protein-coding human Y chromosome genes associated with the male reproductive potential [100]. Finally, based on this standpoint, we have previously used SNP_TATA_Comparator [88] and analyzed all 1189 SNPs in all 90-bp promoters before the transcription start sites of 26 and 8 human genes associated with atherogenesis and atheroprotection, respectively [101,102]. As a result, we have revealed 238 candidate SNP markers that significantly change the affinity of the TATA-binding protein (TBP) for these promoters and are significantly consistent with stabilizing selection as the sum of neutral drift accelerating atherogenesis and natural selection favoring enhanced atheroprotection.
For that reason, here we have used the same approach to analyze all human hub genes for atherogenesis, atherosclerosis and atheroprotection and verified the results obtained, assuming the hypothesis about a role of self-domestication in human evolution [103,104].

The Human Hub Genes for Atherogenesis, Atherosclerosis and Atheroprotection
We have studied all 1068 human genes that were associated with atherogenesis (n = 180), atherosclerosis (n = 999) and atheroprotection (n = 47) according to the most current build of NCBI's Gene database [105] (the workflow is explained in Section 3.1 and depicted [106,107] in Figure 1). Symbols for each of these genes are explained in Table S1 (here and elsewhere: see Supplementary Materials). As the first step, we performed a comparative analysis of the overlap between these three sets of human genes (for the results, see the Venn diagram in the upper part of Figure 1). A total of six non-overlapping subsets have been revealed, their members being presented in Table S1. For example, 16 hub genes for atherogenesis, atherosclerosis and atheroprotection have been found: APOA1, C1QTNF9, CD163, CRP, CXCR4, HMOX1, KLF2, LCAT, NFE2L2, NR1H3, PF4, PON1, PON2, SERPINF1, TLR2 and YAP1 (Figure 1, gray-green bar).
Figure 1 (legend). See Table S1 (hereinafter: see Supplementary Materials); the dotted line is an auxiliary line running above the gray-green bar for the subset (a) of 16 human hub genes for atherogenesis, atherosclerosis and atheroprotection, the tops of the other five bars being above the line. (*): statistical significance p < 0.05. U and Z are the values of the statistics in the nonparametric Mann-Whitney U test and parametric Fisher's Z-test according to STATISTICA (StatSoft, Tulsa, OK, USA). PAI, genes' phylostratigraphic age index evaluated against the BLAST-based scale [106] using the freely available web service Orthoscape [107]; BLAST-based PAI scale: 0, Cellular organisms; 1, Eukaryota; 2, Opisthokonta; 3, Metazoa; 4, Eumetazoa; 5, Bilateria; 6, Deuterostomia; 7, Chordata; 8, Craniata; 9, Vertebrata; 10, Gnathostomata; 11

Human Hub Genes for Atherogenesis, Atherosclerosis and Atheroprotection Are the Most Ancient on the Molecular Evolution Scale
As the second step, we performed an in silico phylostratigraphic analysis of all the 1068 human genes (see the central part of Figure 1). This was done using Orthoscape [107,108], our previously developed plugin for Cytoscape [109], a freely available web service. With Orthoscape, we obtained estimates of each gene's phylostratigraphic age index (PAI) on the molecular evolution scale of the Basic Local Alignment Search Tool (BLAST) [106]. The results obtained are given in Table S1. This table also displays the arithmetic mean of the PAIs and the standard error of the mean (SEM) for each of the non-overlapping subsets of genes, these values being proportional to the heights of the gene bars and error bars in the chart in the central part of Figure 1. As can be seen, the gray-green bar for the subset of 16 human hub genes for atherogenesis, atherosclerosis and atheroprotection falls below the auxiliary dotted line, which is below the tops of the other five bars. This is statistically significant according to the binomial distribution (p < 0.05, (*) in the upper part of the chart). It means that the hub genes for atherogenesis, atherosclerosis and atheroprotection in the 16-strong set are the most ancient of those being studied.
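The per-subset summary described above (arithmetic mean of the PAIs and its standard error) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow: the function name `mean_and_sem` and the toy PAI values are hypothetical and are not taken from Table S1.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (arithmetic mean, standard error of the mean) for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate the SEM")
    mean = statistics.mean(values)
    # SEM = sample standard deviation (n - 1 denominator) divided by sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical PAI values for a small gene subset (illustrative only)
toy_pais = [1, 3, 4, 4, 8]
mean, sem = mean_and_sem(toy_pais)
print(f"mean PAI = {mean:.2f} ± {sem:.2f}")
```

A lower mean PAI corresponds to an older phylostratum on the scale used here, so the subset with the lowest bar is the most ancient one.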
Finally, as can be seen from the central part of this chart, the subset (a) of the hub genes for atherogenesis, atherosclerosis and atheroprotection is significantly more ancient than the subset (b) of 106 hub genes for atherogenesis and atherosclerosis, but without relevance to atheroprotection (gray-blue bar), according to both the nonparametric Mann-Whitney U test and the parametric Fisher's Z-test (p < 0.05). This could be regarded as an additional piece of independent evidence in support of the conclusion that the subset of 16 human hub genes for atherogenesis, atherosclerosis and atheroprotection is the most ancient. With this in mind, we focused our further study on these 16 genes as a fundamental feature that arose in multicellular animals of the subkingdom Eumetazoa, which have a nervous system, differentiated tissues and specialized intercellular contacts (Table S1: mean PAI = 4.06 ± 1.02), in the course of their evolution from more ancient forms of animals in the kingdom Metazoa (PAI = 3). If the genes in question promoted wound healing (as does atherogenesis or atheroprotection) in eumetazoans of reproductive age, then at later age a pathology like atherosclerosis [19] as an age-related disease [110] could be the cost for this adaptive advantage. This is consistent with the conclusions from an independent in silico modeling of evolution in heterogeneous microorganismal communities [111]. Thus we estimated the overall effect of atherogenesis, atherosclerosis and atheroprotection on the health of quite diverse humans, without belittling the importance of other genes described in Figure 1: (b) 106 hub genes for atherogenesis and atherosclerosis, (c) 21 hub genes for atheroprotection and atherosclerosis, (d) 10 genes specific for atheroprotection, (e) 58 genes specific for atherogenesis, and (f) 856 genes specific for atherosclerosis. 
The data on these genes provide molecular evidence of how animal complexity has grown on its way from Metazoa to Homo (PAI = 28).
Figure 2. The associative network of signaling pathways (right) including the proteins (left) encoded by the 16 human hub genes for atherogenesis, atherosclerosis and atheroprotection. The network was built using the automated mode of our publicly available web service ANDSystem [112] with "Human, proteins, APOA1, C1QTNF9, CD163, CRP, CXCR4, HMOX1, KLF2, LCAT, NFE2L2, NR1H3, PF4, PON1, PON2, SERPINF1, TLR2, YAP1, pathways" as input. Legend: Nodes: red circles, proteins. Proteins: APOA1, apolipoprotein A1; C1QTNF9, C1q and TNF related protein 9; CD163, CD163 molecule; CRP, C-reactive protein; CXCR4, C-X-C motif chemokine receptor 4; HMOX1, heme oxygenase 1; KLF2, KLF transcription factor 2; LCAT, lecithin-cholesterol acyltransferase; NFE2L2, NFE2 like bZIP transcription factor 2; NR1H3, nuclear receptor subfamily 1 group H member 3; PF4, platelet factor 4; PON1 and PON2, paraoxonases 1 and 2, respectively; SERPINF1, serpin family F member 1; TLR2, toll like receptor 2; YAP1, Yes1 associated transcriptional regulator. Network edges: green barb arrow and T-shaped arrow: activation and deactivation, respectively; purple barb arrow and T-shaped arrow: positive and negative regulation, respectively; black, blue, brown and purple lines: association, involvement, upregulation and regulation, respectively.
This figure is a graphical rendering of the report following an intellectual "data mining" analysis of articles and databases about the molecular pathways in which the proteins encoded by human hub genes for atherogenesis, atherosclerosis and atheroprotection are involved. First of all, the figure displays "Reverse cholesterol transport", "Cholesterol efflux" and "Lipid metabolism", which are key for atherogenesis [113]. Furthermore, the molecular pathways "Inflammation", "Oxidative stress" and "Reactive oxygen species (ROS) generation" in this associative network have been found to be relevant to the development of atherosclerosis [114]. Finally, "Regeneration", "Wound healing" and "Anti-inflammatory response" are integral molecular pathways involved in atheroprotection [115]. With the use of this associative network (Figure 2) and the PubMed database [81] as accessed on 15 March 2023, we have annotated each of the 16 human hub genes for atherogenesis, atherosclerosis and atheroprotection (Figure 1) and answered the question as to how the deficiency or excess of their protein products accelerates or slows down the development of each of these three processes. Our answers to this question can be found in Table S2.
Next, with our freely available web service SNP_TATA_Comparator [88] run in the automated mode, we analyzed all 5112 SNPs in the 90-bp proximal promoters before the starts of all protein-coding transcripts from each of the 16 human hub genes for atherogenesis, atherosclerosis and atheroprotection (see the bottom row of the table in the lower part of Figure 1). For example, the human CRP gene is one of the 16 hub genes for atherogenesis, atherosclerosis and atheroprotection (Figure 2). It encodes C-reactive protein, a key molecular marker of inflammation in vessel walls, the development of atherosclerosis and the associated risk of cardiovascular diseases and their complications [130]. In the promoters located before the starts of all five protein-coding transcripts from this gene, we found 226 SNPs, of which only two were consistent with a significant decrease in TBP affinity for these promoters (Table S3: rs1660782424:C and rs1660782480:G). In particular, this table includes SNP rs1660782480:G, in which G replaces A at position −28 relative to the start of transcript CRP-201 (position "+1"), according to Ensembl [35] and dbSNP [36]. According to our calculations described in Section 3.5, this substitution in the canonical TATA box (the substituted nucleotide is shown in upper case), "tgctttggatAtaaatccagg => tgctttggatGtaaatccagg", reduces TBP affinity for this promoter: the equilibrium dissociation constant KD of the TBP-promoter complex rises from 2.26 ± 0.23 nanomoles per liter (nM) with the A nucleotide (norm) to 7.64 ± 0.76 nM with the G nucleotide (mutant). Similarly, SNP rs1660782424:C replaces T with C at position −28 of the same TATA box, namely "tgctttggataTaaatccagg => tgctttggataCaaatccagg", which corresponds to a decrease in TBP affinity for the same promoter as follows: 2.26 ± 0.23 nM => 6.19 ± 0.62 nM (Table S3).
According to a large body of experimental data reported by independent authors (for review, see [67][68][69]), any decrease in TBP affinity for the promoter of a gene causes a decrease in the expression of this gene. In the example being considered, this provides evidence in favor of a reduction in C-reactive protein, which can have a beneficial effect on human health by retarding atherogenesis [130], enhancing atheroprotection in severe obesity [132] and reducing the risk of cardiovascular events in atherosclerosis [133].
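The reasoning in the CRP example (a higher dissociation constant means weaker TBP binding and, in turn, lower expected expression) can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the procedure implemented in SNP_TATA_Comparator: the function name `compare_kd` is hypothetical, the comparison is assumed to be made on the ln(KD) scale with standard errors propagated by the delta method, and the resulting Z and p values need not match those reported in Table S3.

```python
import math


def compare_kd(kd_ref, se_ref, kd_alt, se_alt):
    """Compare KD estimates (nM) for the reference and minor alleles.

    Assumption (not from the source): the test is done on the ln(KD) scale,
    with standard errors propagated by the delta method (se_ln ~ se / KD).
    Returns (direction of the affinity change, Z score, two-sided p-value).
    """
    diff = math.log(kd_alt) - math.log(kd_ref)
    se_diff = math.sqrt((se_ref / kd_ref) ** 2 + (se_alt / kd_alt) ** 2)
    z = diff / se_diff
    # Two-sided p-value under the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    # A higher KD means weaker binding, i.e. lower TBP affinity
    if kd_alt > kd_ref:
        direction = "decrease"
    elif kd_alt < kd_ref:
        direction = "increase"
    else:
        direction = "no change"
    return direction, z, p


# KD values quoted in the text for rs1660782480:G in the CRP promoter
direction, z, p = compare_kd(2.26, 0.23, 7.64, 0.76)
print(f"TBP affinity: {direction}; Z = {z:.2f}; p = {p:.2g}")
```

With the values quoted in the text, the function labels the substitution as a decrease in TBP affinity, which corresponds to the predicted underexpression of CRP.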
As can be seen from the lower part of Figure 1, we have similarly revealed 330 candidate SNP markers, of which 14 decreased TBP affinity for the promoters of these hub genes and 316 increased it. According to multiple independent empirical observations [67], the former decrease the expression levels of these genes, while the latter increase them. The excess of SNPs increasing TBP affinity for the promoters over SNPs decreasing it differs significantly (Figure 1: P_ADJ < 10⁻⁴, binomial distribution with Bonferroni's correction) from the whole-genome ratio of their respective frequencies according to the 1000 Genomes Project Consortium [32,210,211], which serves as the neutral-drift baseline for the human genome as a whole [40,41]. This deviation provides evidence that natural selection favors elevated expression of the human hub genes for atherogenesis, atherosclerosis and atheroprotection.
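The kind of test named above (an exact binomial test with Bonferroni's correction) can be sketched as follows. This is a minimal illustration rather than the STATISTICA procedure used in the study: the function names, the counts and the baseline proportion `p0` are hypothetical placeholders, since the whole-genome ratio is not stated in this section.

```python
from math import comb


def binomial_upper_tail(k, n, p0):
    """Exact P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Illustrative numbers only (not the counts or the baseline from Figure 1)
n_up, n_down = 18, 2   # SNPs increasing vs. decreasing TBP affinity
p0 = 0.5               # hypothetical expected share of "increasing" SNPs under drift
p_raw = binomial_upper_tail(n_up, n_up + n_down, p0)
p_adj = bonferroni(p_raw, n_tests=3)
print(f"raw p = {p_raw:.3g}, Bonferroni-adjusted p = {p_adj:.3g}")
```

A small adjusted p-value indicates that the observed excess of affinity-increasing SNPs is unlikely under the assumed baseline proportion.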
Finally, to understand whether natural selection favors or disfavors the candidate SNP markers that significantly change TBP affinity for the promoters of the hub genes for atherogenesis, atherosclerosis and atheroprotection, we counted how many of these SNP markers improve or adversely affect human health as a result of their influence on these three processes (see Table S3, column "Effect on human health during atherosclerosis, atherogenesis and atheroprotection [Ref]"). The results obtained are given in Table 1. As can be seen from this table, the candidate SNP markers that significantly change TBP affinity for the promoters of the human hub genes for atherogenesis, atherosclerosis and atheroprotection significantly improve human health indicators due to improved atheroprotection.

Verification of the Results Obtained by Analysis of Human Hub Genes for Atherogenesis, Atherosclerosis and Atheroprotection Using the Most Current Build of the ClinVar Database
First of all, we checked ClinVar [43] (accessed on 25 April 2023) for clinical data on each of the 330 candidate SNP markers in hub genes for atherogenesis, atherosclerosis and atheroprotection, predicted for the first time in this work (Table S3). We found only one entry, for a patient diagnosed with "Osteogenesis Imperfecta, Recessive" and carrying SNP rs541151948 in the SERPINF1 gene encoding the protein serpin F1 (see Table S3). The clinical significance value for this ClinVar entry [43] was "Uncertain significance", indicating an unclear role of this SNP in this disease.
According to Table S3, this SNP has two minor alleles, rs541151948:A and rs541151948:T, in which the nucleotide C (norm) is replaced by A and T, respectively, at position −60 relative to the start of the SERPINF1-205 transcript: "gagtgcaggtCgctttaagaa => gagtgcaggtAgctttaagaa" and "gagtgcaggtCgctttaagaa => gagtgcaggtTgctttaagaa". These substitutions are consistent with an increase in TBP affinity for the promoter of this transcript, the dissociation constant falling from 10.21 ± 0.92 nM (norm) to 7.27 ± 0.51 nM and to 8.53 ± 0.77 nM, respectively, which is statistically significant at p < 10⁻⁶ in both cases according to Fisher's Z-test. This implies that the carriers of the minor alleles rs541151948:A and rs541151948:T can be expected to have excessive amounts of serpin F1, according to our estimates in Table S3.
As can be seen from this table, excessive serpin F1 in the models of human atherogenesis using senescent vascular smooth muscle cells is a biomarker of atherosclerotic plaques [185] and, consequently, narrowed blood vessels, as were clinically observed in the patient with osteogenesis imperfecta [192]. Additionally, according to a comprehensive overview [183], overexpression of SERPINF1 inhibited regeneration in mouse-based models of human wound healing [184]. A similar reduction in regeneration rates was observed in delayed fracture healing, according to a historical cohort study of patients with osteogenesis imperfecta [193]. Finally, elevated serpin F1 as a pro-inflammatory cytokine inhibitor guarantees a controllable reduced level of inflammatory processes [189][190][191], which were observed in a cohort study of children with osteogenesis imperfecta [194]. Thus, the phenotypical manifestations of excessive SERPINF1 in osteogenesis imperfecta are consistent with the symptoms of accelerated atherogenesis, reduced atheroprotection and mild atherosclerosis (Table S3).
Thus, our original proposal is to use the known clinical SNP marker rs541151948 for recessive osteogenesis imperfecta documented in ClinVar [43] as being a candidate SNP marker of accelerated atherogenesis, reduced atheroprotection and mild atherosclerosis.
Next, to elucidate the unknown role of the clinically known SNP marker of recessive osteogenesis imperfecta, rs541151948, in the development of this disease, we propose the following possible molecular mechanism. Minor alleles rs541151948:A and rs541151948:T can significantly enhance serpin F1 levels, which can narrow blood vessels [185], delay regeneration in injury healing [183,184] and inhibit pro-inflammatory cytokines [189][190][191].
Finally, as the reader can see, with our resource-friendly in silico analysis of 5112 SNPs in the promoters of 16 hub genes for atherogenesis, atherosclerosis and atheroprotection, we have identified two candidate biomedical SNP markers, rs541151948:A and rs541151948:T, as being the most promising for a less resource-friendly cohort search for clinical SNP markers for personalized medicine, including their testing for the Hardy-Weinberg equilibrium [38].

Verification of the Results Obtained by Analysis of Human Hub Genes for Atherogenesis, Atherosclerosis and Atheroprotection Using RNA-Seq Data on Domestic and Wild Animals
Our further interest was to verify whether natural selection does favor human health improvement through overexpression of the hub genes for atherogenesis, atherosclerosis and atheroprotection, assuming the hypothesis about a role of self-domestication in human evolution [103,104]. To this end, we collected all freely available transcriptomes (RNA-Seq data) of domestic and wild animals (Table 2) using the PubMed database [81]. As can be seen from the bottom row of Table 2, a total of 2905 differentially expressed genes (DEGs) have been found in nine tissues of seven domestic species and seven wild counterparts, according to 11 original works [212][213][214][215][216][217][218][219][220][221].
Table 2. RNA-Seq data on domestic animals versus their wild counterparts (PubMed data [81]).
The 26 animal DEGs that we have found to be homologous to the human hub genes for atherogenesis, atherosclerosis and atheroprotection are listed in the right-hand part of Table S4; the effects that changes in the expression levels of these hub genes have on human health are listed in the left-hand part. As can be seen from the bottom row of the table in Figure 1, the DEGs whose expression levels are higher in domestic than wild animals occur at significantly higher frequencies than they would have if they had been under neutral drift [32][33][34][35][36][37][38][39][40][41] (P_ADJ < 0.05, binomial distribution with Bonferroni's correction). This is consistent with a case of destabilizing selection as a necessary attribute in animal domestication [222].
Additionally, we compared changes in expression levels between (a) the human hub genes for atherogenesis, atherosclerosis and atheroprotection, and (b) the DEGs in the domestic animals (the upper part of Table 3). The lower part of this table contains estimates of the statistical significance of the correlations between the effects of same-direction changes in the expression levels in the homologous human genes on human health and in animals under domestication. As can be seen from the bottom line, elevated expression levels of the human hub genes for atherogenesis, atherosclerosis and atheroprotection, and of their homologs, are equally significant molecular markers for health improvement and domestication.
Table 3. Verification of the results obtained by analysis of human hub genes for atherogenesis, atherosclerosis and atheroprotection using RNA-Seq data on domestic and wild animals.
Thus, with the use of independent experimental RNA-Seq data on domestic and wild animals, and assuming the hypothesis that self-domestication has a role in human evolution [103,104], we confirmed, with statistical significance, that natural selection acts against underexpression of the human hub genes for atherogenesis, atherosclerosis and atheroprotection. At the same time, enhanced atheroprotection prevents the formation of atheromas, thus promoting health.

The Human Genes
We have studied the human genes in the lists of search results for keyword entries in the dialog box of the NCBI Gene database [105] accessed on 15 March 2023 with the following filters activated: "Genomic", "Protein-coding", "Annotated genes", "Ensembl", and "Current". With ["atherogenesis" AND "Homo sapiens"] as input data, a list of 180 protein-coding human genes associated with atherogenesis was produced. Similarly, the use of input data in the form of ["atherosclerosis" AND "Homo sapiens"] produced a list of 999 human genes associated with atherosclerosis, while ["atheroprotective" AND "Homo sapiens"] produced a list of 47 human genes associated with atheroprotection (see Figure 1). Taking into account the overlaps between these three lists in the Venn diagram (Figure 1), we have studied 1068 human genes (see Table S1).
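The overlap analysis of the three search-result lists can be sketched with plain Python sets. This is only an illustration of the set arithmetic behind a three-way Venn diagram: the function name `venn_regions` and the placeholder gene symbols are hypothetical, and the real lists come from the NCBI Gene queries described above.

```python
def venn_regions(atherogenesis, atherosclerosis, atheroprotection):
    """Partition the union of three gene sets into the seven disjoint Venn regions."""
    g, s, p = set(atherogenesis), set(atherosclerosis), set(atheroprotection)
    return {
        "all three": g & s & p,
        "atherogenesis & atherosclerosis only": (g & s) - p,
        "atheroprotection & atherosclerosis only": (p & s) - g,
        "atherogenesis & atheroprotection only": (g & p) - s,
        "atherogenesis only": g - s - p,
        "atherosclerosis only": s - g - p,
        "atheroprotection only": p - g - s,
    }


# Toy lists with placeholder symbols (not real search results)
genesis = {"G1", "G2", "G3"}
sclerosis = {"G1", "G2", "G4", "G5"}
protection = {"G1", "G4", "G6"}

regions = venn_regions(genesis, sclerosis, protection)
union_size = len(genesis | sclerosis | protection)
for name, genes in regions.items():
    print(f"{name}: {sorted(genes)}")
```

Because the regions are disjoint, their sizes sum to the size of the union, which gives a quick consistency check on the subset counts.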

In Silico Assessment of the BLAST-Based PAIs for a Human Gene
We calculated the BLAST-based [106] PAI for an arbitrary human gene using its NCBI Entrez gene number as input data for our Orthoscape plug-in [107,108] within the Cytoscape software suite [109]. The output was the most recent common ancestor of all animal species in which the DNA sequence of this gene is known. The following evolutionary rank scale was used: 0,

Data Mining of Literature Sources and Databases Publicly Available on the Internet
We performed data mining using our previously published freely available web service ANDSystem [112] run in the automated mode, with "Human, proteins, APOA1, C1QTNF9, CD163, CRP, CXCR4, HMOX1, KLF2, LCAT, NFE2L2, NR1H3, PF4, PON1, PON2, SERPINF1, TLR2, YAP1, pathways" as input data, all the other parameters set at their default values. As a result, we built the associative network shown in Figure 2, which characterizes the participation of the above-mentioned proteins encoded by the human hub genes for atherogenesis, atherosclerosis and atheroprotection (left-hand border) in the molecular pathways (right-hand border), according to ANDSystem [112] accessed on 15 March 2023.

DNA Sequences
For in silico analysis of the human hub genes for atherogenesis, atherosclerosis and atheroprotection, we retrieved the DNA sequences and SNPs of their 90-bp proximal promoters from Ensembl [35] and dbSNP [36], respectively, using SNP_TATA_Comparator [88] run in the automated mode, in which it accesses these databases through the BioPerl library [223].
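As a rough illustration of what a "90-bp proximal promoter" means in coordinate terms, the sketch below cuts the window out of a chromosome sequence held as a Python string. It is not the SNP_TATA_Comparator code (which relies on BioPerl). It assumes that the window is the 90 bases immediately upstream of the transcription start site (TSS), that the TSS is given as a 1-based coordinate, and that promoters of minus-strand genes are reported as the reverse complement. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def proximal_promoter(chrom_seq, tss, strand, length=90):
    """Return the `length` bases immediately upstream of the TSS.

    chrom_seq: chromosome sequence as a string (index 0 = coordinate 1).
    tss:       1-based coordinate of the transcription start site.
    strand:    "+" or "-"; minus-strand promoters are reverse-complemented
               so that the result reads 5'->3' on the coding strand.
    """
    if strand == "+":
        start, end = tss - 1 - length, tss - 1  # bases tss-length .. tss-1
    elif strand == "-":
        start, end = tss, tss + length          # bases tss+1 .. tss+length
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(chrom_seq):
        raise ValueError("promoter window extends beyond the sequence")
    region = chrom_seq[start:end]
    return region if strand == "+" else reverse_complement(region)


# Toy 15-bp "chromosome" with a 4-bp window for readability.
toy = "TTTTTACGTGCCCCC"
print(proximal_promoter(toy, tss=10, strand="+", length=4))
print(proximal_promoter(toy, tss=6, strand="-", length=4))
```

Any SNP whose coordinate falls inside the returned window would then be a candidate for the affinity analysis.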

In Silico Analysis of DNA Sequences
We analyzed the SNPs in the DNA sequences with SNP_TATA_Comparator [88] run in the automated mode, following the procedure described previously [88,103]. This tool implements our bioinformatic model of the three-stage binding of the TATA-binding protein (TBP) to a given 90-bp proximal promoter of a human gene [224–227] (for the most detailed description, see Section S1 "Supplementary methods for DNA sequence analysis").

Human_SNP_TATAdb and PetDEGsDB: Our All-New Knowledge Bases
We have documented (1) the candidate SNP markers that we have found to significantly change TBP affinity for the promoters of 16 human hub genes for atherogenesis, atherosclerosis and atheroprotection, and (2) the estimates of the potential effect of these molecular markers on human health (Table S3) in an Excel-compatible flat file. We have likewise generated an Excel file for associations between the human hub genes and the DEGs (Table S4) and added it to PetDEGsDB, a freely available knowledge base, with its new assembly accessible at www.sysbio.ru/domestic-wild (accessed on 15 March 2023) in the MariaDB 10.2.12 database management system (MariaDB Corporation Ab, Espoo, Finland).
Finally, we have similarly generated another knowledge base for the candidate SNP markers that significantly change TBP affinity for human gene promoters, Human_SNP_TATAdb, its first version being publicly accessible at www.sysbio.ru/Human_SNP_TATAdb in MariaDB 10.2.12 (as accessed on 15 March 2023).

Statistical Analysis
We performed the Mann-Whitney U test, Fisher's Z-test, and an exact test for the binomial distribution using appropriate options in STATISTICA (StatSoft™).
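For readers without access to STATISTICA, the exact binomial test can be sketched in a few lines of standard-library Python. This is an independent illustration, not the STATISTICA procedure itself. The two-sided p-value is computed here under one common convention (summing the probabilities of all outcomes no more likely than the observed one), the counts in the example are hypothetical, and the function name is introduced only for this sketch.

```python
from math import comb


def binomial_exact_two_sided(k, n, p=0.5):
    """Two-sided exact binomial p-value for k successes in n trials.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under the null success probability p.
    Suitable for moderate n (the binomial terms are plain floats).
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that ties survive floating-point rounding.
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: 8 of 10 markers change expression in one direction,
# tested against an expected 50:50 split.
print(binomial_exact_two_sided(8, 10, 0.5))
```

A small p-value would indicate that the observed split between the two directions is unlikely under the assumed expected proportion.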

Conclusions
We have for the first time estimated the phylostratigraphic age indices (PAIs) of all 1068 human genes associated by the most current NCBI gene build [105] with atherogenesis, atherosclerosis and atheroprotection and placed them on the BLAST-based molecular evolution scale (Table S1). We have thus found that the set of 16 hub genes regulating these three processes is the most ancient (Figure 1).
Next, we have for the first time performed an in silico assessment of the effects of all SNPs localized in 90-bp promoters before the starts of all protein-coding transcripts from these 16 hub genes on TBP affinity for these promoters according to Ensembl [35] and dbSNP [36] as accessed on 15 March 2023. We have thus found 330 candidate SNP markers that significantly change this affinity and, therefore, the expression of the hub genes for atherogenesis, atherosclerosis and atheroprotection, and, consequently, have a bearing on human health. We have found that natural selection acts against underexpression of the hub genes for atherogenesis, atherosclerosis and atheroprotection, and, due to enhanced atheroprotection, favors human health improvement.
Finally, we have verified all 330 candidate SNP markers in the hub genes for atherogenesis, atherosclerosis and atheroprotection with the use of the ClinVar [43] database (accessed on 25 April 2023). We have thus for the first time proposed to use the known clinical SNP marker rs541151948 for recessive osteogenesis imperfecta [43] as a candidate SNP marker of accelerated atherogenesis, reduced atheroprotection and mild atherosclerosis.

Acknowledgments:
We are thankful to the multi-access bioinformatics center for the use of computational resources as supported by Russian government project No. FWNR-2022-0020.

Conflicts of Interest:
The authors declare no conflict of interest.