Analysis of genetic diversity in patients with major psychiatric disorders versus healthy controls: A molecular-genetic study of 1698 subjects genotyped for 100 candidate genes (549 SNPs)

Background: This study analyzed the extent to which irregularities in genetic diversity separate psychiatric patients from healthy controls. Methods: Genetic diversity was quantified through multidimensional "gene vectors" assembled from 4 to 8 polymorphic SNPs located within each of 100 candidate genes. The number of different genotypic patterns observed per gene was called the gene's "diversity index". Results: The diversity indices were found to be only weakly correlated with their constituent number of SNPs (20.5 % explained variance), thus suggesting that genetic diversity is an intrinsic gene property that has evolved over the course of evolution. Significant deviations from "normal" diversity values were found for (1) major depression; (2) Alzheimer's disease; and (3) schizoaffective disorders. Almost one third of the genes were correlated with each other, with correlations ranging from 0.0303 to 0.7245. The central finding of this study was the discovery of "singular genes" characterized by distinctive genotypic patterns that appeared exclusively in patients but not in healthy controls. Neural Nets yielded nonlinear classifiers that correctly identified up to 90 % of patients. Overlaps between diagnostic subgroups on the genotype level suggested that (1) diagnoses-crossing vulnerabilities are likely involved in the pathogenesis of major psychiatric disorders; (2) clinically defined diagnoses may not constitute etiological entities. Conclusion: Detailed analyses of the variation of genotypic patterns in genes along with the correlation between genes lead to nonlinear classifiers that enable very robust separation between psychiatric patients and healthy controls on the genotype level.

Background
There is little proven knowledge about etiology and pathogenesis of psychiatric disorders. Even after 50 years of modern psychiatry, (1) there are no causal treatment options; (2) it is not possible to reliably predict if and when a particular patient will respond to a particular treatment; and (3) in individual cases it is hardly possible to make any reliable prognosis.
As to the genetically predisposed factors postulated to be involved in the pathogenesis of psychiatric disorders, evidence clearly speaks against single causes, as psychiatric disorders aggregate in families, but do not segregate. In particular, psychiatric disorders do not follow simple Mendelian modes of inheritance. No homotypic diagnostic patterns are observed in families with multiple affected subjects. Typically, the clinical diagnoses of first and second degree relatives appear to be independent of the index case's primary diagnosis.
Most importantly, our studies of monozygotic (mz) twins discordant for schizophrenia disorders made it clear that genetically predisposed factors are not a sufficient condition for the development of psychiatric disorders (Braun et al., 2017). Rather, genetics seems to act in the sense of an unspecific "vulnerability", so that the unaffected co-twins of mz twins with schizophrenia may be at an increased risk of developing psychiatric symptoms, but can still do very well in daily life.
The pathogenesis of psychiatric disorders is further obscured by etiological heterogeneity, which suggests that multiple pathways can lead to the same clinical picture. Eugen Bleuler, the renowned father of "schizophrenia", already spoke of the "group of schizophrenias" to emphasize that "schizophrenia" does not represent an etiological entity (Bleuler, 1969). The most likely etiological scenario is a complex interplay between multiple, genetically predisposed endogenous factors and multiple exogenous factors that may induce the development of latent disorders by triggering the manifestation of clinically relevant symptoms. Among the exogenous factors, lifestyle, diet, consumption behavior, and physical activity play a prominent role. Inflammation appears to be another major exogenous constituent explaining some 15-25 % of the observed phenotypic variance (Stassen et al., 2021; Wang et al., 2022).
This project did not follow standard genotype-to-phenotype association methods that rely on "psychiatric diagnosis" as phenotype (Dennison et al., 2020; Legge et al., 2021; Levey et al., 2021; Howard et al., 2019; Gordovez and McMahon, 2020), but investigated the extent to which irregularities in genetic diversity might separate patients with major psychiatric disorders from healthy controls, where "genetic diversity" denotes the multitude of genotypic patterns observed with each gene.
Analyses of genetic diversity (GDAs) bring up the problem of hidden population stratification due to admixture of people with different ancestries (Berger et al., 2006; Price et al., 2010; Shi et al., 2021). We addressed this problem by (1) recruiting half of the healthy controls from the patients' unaffected first-degree relatives so that a part of the patients and controls shared their ancestry; and (2) developing a "natural" model of "biological ethnicity" through cluster analyses of 73 SNPs located within the CLOCK gene exhibiting distinctive adaptations of North-South and West-East specifics. Both methods yielded estimates of the amount of genotypic variance that is explainable by hidden population stratification.
The project relied on 100 candidate genes reported in the literature as "possibly" involved in the pathogenesis of psychiatric disorders, and whose genotypic patterns were assessed through 549 SNPs. However, we did not expect any of these genes to be directly linked to a psychiatric disorder, as this would otherwise have been found long ago. As we were interested in significant deviations from "normal" diversity values, as well as in setting up multidimensional genetic vector spaces that represent genetic diversity in a metric model (cf. Stassen et al., 2003), the main selection criterion for candidate genes was the utmost variation in genetic diversity across subjects. On this basis we searched for vulnerability and resilience genes by means of multi-layer Neural Nets (NNs) in combination with methods of Artificial Intelligence (AI). Specifically, we addressed the following questions: (1) How to reproducibly quantify genetic diversity at a high resolution? (2) Are there genes for which genetic diversity is reduced in male schizophrenia patients, given the fact that 80 % of male patients have no offspring? (3) Are there vulnerability and resilience genes whose genotypic patterns can distinguish between psychiatric patients and healthy controls? (4) To what extent do vulnerability and resilience genes correlate with each other, i.e. are there genotypic patterns that show up together more often than expected by chance?
The patients had been recruited from the daily admissions at three university hospitals in Switzerland and Germany, and from the daily admissions at two private mental health treatment centers in Switzerland. The selection criterion was a suspected ICD-10 diagnosis of F20 (schizophrenia), F25 (schizoaffective disorders), F31 (bipolar illness), or F32/F33 (major depression). All patients had been informed about the goals of this research project and that they could discontinue participation at any time without giving reasons and without facing any disadvantages from this. All patients had signed a written informed consent.
Psychopathology was assessed by specifically trained interviewers: (1) previous history through the syndrome-oriented 63-item SADS Syndrome Check List SSCL-16 (Endicott and Spitzer, 1978); and (2) response to treatment over up to 5 weeks through either the 30-item Positive and Negative Syndrome Scale PANSS (Kay et al., 1987), or the 17/21-item Hamilton Depression Scale HAM-D (Hamilton, 1960). The study protocol also included the collection of blood samples for serum extraction and DNA isolation (Qiagen: QIAamp Blood Maxi Kit).
A baseline score of at least 21 on the general psychopathology PANSS-G Scale (primary "F2x.x" diagnoses), or of at least 15 on the HAM-D17 Scale (primary "F3x.x" diagnoses), was required at entry into study. The definitive diagnoses were decided by consensus of two experienced senior psychiatrists, with unclear cases being assigned to the residual group "other diagnoses". The late-onset Alzheimer's disease patients [24 males and 51 females of European ancestry; ages 78.3 ± 5.4 years; age-of-onset at 71.9 ± 4.9 years (range 65-85 years)] came from the NIMH (DNA and DSM-4 diagnoses).
The healthy control subjects were recruited either through advertising, or from the patients' unaffected first-degree relatives. Eligibility criteria for the healthy controls were: (1) European descent; (2) between 20 and 70 years of age, males and females; (3) native German speaker; and (4) no history of psychiatric disorders. All control subjects filled out the 63-item Zurich Health Questionnaire "ZHQ" (Kuny and Stassen, 1988). Using ZHQ data, we assigned subjects with a negative history of «consumption behavior», «psychosomatic disturbances», or «impaired mental health», to the residual diagnostic subgroup "other diagnoses". The information on ancestry was based on self-reports only. Details of the sample composition are given in Table 1.
Genotyping was performed using the iPLEX assay on the MassARRAY MALDI-TOF mass spectrometer "Sequenom" (Oeth et al., 2009), multiplexed with 40+ separate loci per reaction. This method is based on single base extension (SBE) of SNP specific primers using mass modified ddNTPs. In addition, SBE primer length was used to ensure unambiguous resolution of SNP and alleles. Quality criteria were a sample call rate >80 %, SNP call rate >95 %, and genotypes of CEU Trios in accordance with HapMap database >98 %.

Quantifying genetic diversity
The estimation of the genetic diversity associated with the 100 candidate genes relied on "gene vectors" which were assembled per gene from the genotypes of 4-8 polymorphic SNPs located within each gene. As a SNP can exhibit three different expressions regardless of allele definition, a base-3 system was used to construct gene vectors: with m SNPs, a total of 3^m different genotypic patterns would be theoretically possible per gene. However, no more than half of the theoretically possible patterns were actually found among the 1698 subjects of this project, due to SNP correlations. As a rule of thumb, an average of 100 different genotypic patterns is expected for a 10-dimensional gene vector made up of five SNPs. Thus, gene vectors assess genetic diversity at an adequate resolution, as plenty of "variation" means plenty of "information". The number of different genotypic patterns per gene was referred to as the gene's "diversity index".
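The base-3 construction described above can be illustrated with a minimal sketch. The genotype coding (0, 1, 2 for the three expressions of a SNP), the function names, and the toy data are assumptions made for illustration; the source does not specify its internal encoding.

```python
from typing import Sequence


def encode_gene_vector(genotypes: Sequence[int]) -> int:
    """Map the per-SNP genotype codes of one gene to a single base-3 integer.

    Assumption: each SNP is coded 0, 1 or 2 (three possible expressions).
    """
    code = 0
    for g in genotypes:
        if g not in (0, 1, 2):
            raise ValueError(f"genotype code must be 0, 1 or 2, got {g!r}")
        code = code * 3 + g
    return code


def diversity_index(subjects: Sequence[Sequence[int]]) -> int:
    """Count the distinct genotypic patterns observed for one gene."""
    return len({encode_gene_vector(s) for s in subjects})


# Toy data (illustrative only): 5 subjects, one gene with m = 4 SNPs.
toy_gene = [
    (0, 1, 2, 0),
    (0, 1, 2, 0),
    (2, 2, 0, 1),
    (1, 0, 0, 0),
    (2, 2, 0, 1),
]
print(diversity_index(toy_gene), "distinct patterns out of", 3 ** 4, "possible")
```

With m = 4 SNPs there are 3^4 = 81 theoretically possible patterns, of which the toy sample realizes only a few, mirroring the observation that far fewer patterns occur than are theoretically possible.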
As genetic diversity depends on sample size, we generated a set of calibration data by drawing 32 random samples of equal size from the total sample (n = 1,698) for each gene, and for 24 sample sizes in steps of 50 between 50 and 1200. By averaging across the 32 random samples, we obtained 100*24 normative distributions for the 100 candidate genes, covering all sample sizes of the diagnostic subgroups within this project. As an estimate of the correlation between two genes j1 and j2, we used the maximum frequency among the combinations of genotypic patterns of gene j1 with gene j2, divided by the sample size.
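The two procedures in this paragraph can be sketched as follows. This is a simplified illustration, assuming that each subject's pattern for a gene is already encoded as one hashable value and that subsamples are drawn without replacement; the function names and toy data are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def calibration_curve(patterns, sizes, n_draws=32, seed=0):
    """Mean number of distinct patterns over n_draws random subsamples per size.

    Assumption: subsamples are drawn without replacement from the total sample.
    """
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        counts = [len(set(rng.sample(patterns, n))) for _ in range(n_draws)]
        curve[n] = mean(counts)
    return curve


def gene_correlation(patterns_a, patterns_b):
    """Maximum frequency of a joint pattern combination, divided by sample size."""
    if len(patterns_a) != len(patterns_b):
        raise ValueError("both genes must be observed on the same subjects")
    joint = Counter(zip(patterns_a, patterns_b))
    return max(joint.values()) / len(patterns_a)


# Toy data (illustrative only): 200 subjects, 7 possible patterns for one gene.
toy_patterns = [i % 7 for i in range(200)]
curve = calibration_curve(toy_patterns, sizes=[10, 50, 100])
print(curve)

# Two genes observed on the same five subjects.
gene_j1 = [1, 1, 2, 2, 3]
gene_j2 = [5, 5, 5, 6, 6]
print(gene_correlation(gene_j1, gene_j2))
```

In the second example the most frequent joint combination, (1, 5), occurs in two of five subjects, which gives an estimate of 0.4.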

Neural nets and artificial intelligence
Nonlinear Neural Nets (NN) connect the "neurons" of the input layer (the subjects' gene vectors) with the "neurons" of the output layer (the subjects' psychiatric diagnoses) via "hidden" layers. Our goal was to construct NN models that correctly classified all 1698 subjects in terms of psychiatric diagnoses through their gene vectors. NN connections were realized by (1) weight matrices; and (2) model fitting algorithms minimizing an error function in the weight space ("goodness of fit"). The most popular model fitting strategy, the backpropagation algorithm (Hecht-Nielsen, 1989), looks for the minimum of the error function using the method of gradient descent. The achievable precision of the model essentially depends on the information included, the quality of underlying data, and the number of intermediate layers implemented to model nonlinear interactions (Fig. 1).
Results derived through standard NN approaches, which use 80 % of samples for training and the remaining 20 % for testing, tend to be overoptimistic and are often spurious and non-reproducible. By contrast, the k-fold cross-validation approach splits the data into k roughly equal parts, using k-1 partitions for training, while one partition is used for testing. This process is repeated until each partition has served as a testing set, so that k estimates of prediction errors are generated. The resulting prediction errors are approximately unbiased for the "true" error for sufficiently large k (k ≈ 10 is a typical value in practice). In consequence, we relied on the k-fold cross-validation strategy with k = 10 throughout the entire project and applied the well-proven "random walk" strategy in order to distinguish between local and global minima. In this way, spurious and non-reproducible results were effectively eliminated.
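The k-fold scheme described above can be sketched as follows. This is a generic illustration rather than the pipeline used in the study: the function names are hypothetical, and a trivial majority-class rule on made-up labels stands in for the Neural Net.

```python
import random
from typing import Callable, Iterator, List, Tuple


def k_fold_splits(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every index is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal parts
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validated_error(n_samples: int,
                          error_fn: Callable[[List[int], List[int]], float],
                          k: int = 10, seed: int = 0) -> Tuple[float, List[float]]:
    """Average the k per-fold prediction errors returned by error_fn."""
    errors = [error_fn(tr, te) for tr, te in k_fold_splits(n_samples, k, seed)]
    return sum(errors) / len(errors), errors


# Toy labels (illustrative only): every fourth subject is "affected".
labels = [i % 4 == 0 for i in range(1698)]


def majority_error(train: List[int], test: List[int]) -> float:
    """Stand-in classifier: predict the majority class of the training part."""
    majority = sum(labels[i] for i in train) * 2 >= len(train)
    return sum(labels[i] != majority for i in test) / len(test)


mean_err, fold_errs = cross_validated_error(len(labels), majority_error, k=10)
print(f"mean error over {len(fold_errs)} folds: {mean_err:.3f}")
```

Because each subject appears in exactly one test partition, the averaged error uses every observation for testing once, in contrast to a single 80/20 split.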
We used Artificial Intelligence (AI) methods in order to refine the initial, tentative weights of genes. Genes that showed high diversity indices but little variation across all population subsets were weighted lower than genes that displayed large variation, as little variation means little information and, therefore, a small contribution to discrimination.
The genes' "informativeness" in terms of genetic diversity was estimated by drawing random samples with sample sizes ranging from 3 % to 30 % of the total sample along with tens of thousands of iterations. The algorithm detected subsets of test persons with marked deviations from "normal" diversity indices. In parallel, the algorithm also looked for genotypic patterns that tended to show up exclusively in patients but not in controls.

Table 1
The «Zurich Molecular-Genetic Study of Psychiatric Vulnerability» encompasses 2008 patients hospitalized for major psychiatric disorders along with 464 healthy controls. For this project, 1698 subjects were genotyped for 100 specifically selected genes and 549 polymorphic SNPs located within these genes. Ages are given in years.

Quantifying biological ethnicity
While GWAS focus on global, genome-wide "genetic ancestry" to estimate the probability of a subject belonging to a certain "ancestry group", the concept of "biological ethnicity" has its focus on hidden population stratifications that arise locally within chromosome segments. The CLOCK gene was chosen because it likely shows distinctive adaptations to typical "North-South" and "West-East" diurnal and seasonal patterns that might not only give rise to population stratification, but might also lead to disruptions of the body's internal clock (hypothesized to be linked to depression). We used five gene vectors by subdividing the gene into five segments, each with 15 SNPs, along with cluster analyses for the detection of "natural" subgroups inherent in the gene vectors. A principal component analysis prior to cluster analysis eliminated the correlations between the five gene vectors.

Statistical analyses
We used the Statistical Analysis Software SAS/STAT 9.4 by SAS Institute Inc. (PROCs: TTEST, GLM, ACECLUS, CLUSTER, FASTCLUS, MODECLUS, VARCLUS, PRINCOMP, and FACTOR) along with PROC HPNEURAL from SAS Enterprise Miner 15.1 for Neural Net analyses, complemented by NN and AI programs developed by our institute.

Diversity Index
In this Central European population of 1698 subjects, the diversity indices of the chosen candidate genes ranged from 18 (CYP2C19) to 476 (GPR39), with a mean value of 109.4 ± 82.8. Of the 681 SNPs originally genotyped, we had to exclude 190 SNPs (19.4 %) from subsequent analyses due to a missing data rate that was too high. To avoid possible biases caused by the varying numbers of SNPs within each gene, our plan was to weight genes reciprocally to the constituent number of their SNPs. Contrary to expectations, however, the diversity indices were found to be only weakly correlated with their constituent number of SNPs. In fact, a generalized linear regression model GLM explained no more than 20.5 % of the observed variance, whereas the combined factors chromosome, gene size, and gene position explained 6.63 %. It had therefore to be assumed that genetic diversity, as estimated by diversity indices, is an intrinsic gene property that has evolved over the course of evolution. An illustrative example is given in Fig. 2, where large differences showed up in the comparison between CYP2J2 (diversity index: 69) and SLC6A6 (diversity index: 182), although 5 SNPs were involved in both genes. Given these facts, weighting genes reciprocally to the number of their SNPs appeared to obscure this important gene property, thus being clearly counterproductive. The distribution of the ensemble of diversity indices exhibited two peaks (diversity indices around 70 and 170), along with seven genes exhibiting a diversity index above 250 (Fig. 2).
The 100*24 normative calibration curves, covering all 100 candidate genes and population sizes of this project, displayed a very robust behavior with respect to scattering and, when regarded as a function of sample size, with respect to continuity (Table 2).
This robustness is shown in Fig. 3 where the diversity indices of the two genes CYP2J2 and SLC6A6 are plotted for sample sizes between 50 and 1700 in steps of 50. The differences between the two curves regarding shape and steepness indicate different gene types, as CYP2J2 belongs to the left gene group in Fig. 2 (distribution peak around 70), and SLC6A6 to the middle gene group (distribution peak around 170).
The validity of the normative calibration curves was verified by comparing males (n = 742) with females (n = 956) in terms of the diversity indices of 96 autosomal genes. Virtually no differences showed up for any of the genes after correction for sample size. Thus, the amount of variance of between-population differences that was explainable by population size could be reduced to less than 10 %. This enabled us to accurately adjust comparisons between samples in terms of the sample sizes involved. Differences derived through single gene comparisons were generally smaller than expected and did not survive Bonferroni corrections. As an alternative, we relied on the diversity indices of the total sample as reference, computed the differences between total sample and the diagnostic subgroup of interest, and created a "total score" by summing up the differences over the 100 genes. Subsequent t-tests yielded several significant differences: (1) a significant reduction in genetic diversity (p < 0.0001) for patients with major depression (n = 596); (2) a significant reduction in genetic diversity (p < 0.0001) for patients with Alzheimer's disease (n = 75); and (3) a significant increase in genetic diversity (p < 0.0001) for patients with schizoaffective disorders (n = 64). It is important to note that population size did NOT explain the above deviations, as the deviations pointed in opposite directions for patients with Alzheimer's disease (n = 75) compared to patients with schizoaffective disorders (n = 64). The observed deviations were related to a small number of genes, while the majority of genes showed no differences. The hypothesis of a reduction in genetic diversity among male patients with schizophrenia could not be confirmed (p = 0.0693).
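The "total score" idea can be sketched as follows. The text does not state exactly how the t-tests were set up, so this sketch assumes a one-sample t statistic on the per-gene differences between a subgroup's diversity index and the size-matched reference value; the gene labels, numbers, and function names are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List, Tuple


def total_score(observed: Dict[str, float], reference: Dict[str, float]) -> Tuple[float, List[float]]:
    """Sum of per-gene differences (subgroup minus size-matched reference)."""
    diffs = [observed[g] - reference[g] for g in observed]
    return sum(diffs), diffs


def one_sample_t(diffs: List[float]) -> float:
    """t statistic for H0: the mean per-gene difference is zero (assumed test)."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))


# Toy values (illustrative only): diversity indices for three hypothetical genes.
observed = {"G1": 60, "G2": 150, "G3": 88}    # diagnostic subgroup
reference = {"G1": 64, "G2": 158, "G3": 90}   # calibrated reference at the same n

score, diffs = total_score(observed, reference)
t_stat = one_sample_t(diffs)
print(score, round(t_stat, 4))
```

A negative score indicates reduced diversity relative to the reference and a positive score indicates increased diversity, which is how deviations in opposite directions can be told apart at similar sample sizes.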
The contributions of genes to a given phenotype need not be purely additive for a given sample. If, for example, the contribution of one gene G1 to a given phenotype is 15 %, and the contribution of a second gene G2 is 10 %, then the joint contribution of G1 and G2 does not necessarily have to be 25 %, but can be considerably smaller, for example just 20 %. This is due to the fact that some genotypic pattern p1 from G1 may be linked to a genotypic pattern p2 from G2, such that p1 alone has the same contribution to the phenotype as p1 and p2 together ("redundancy"). The "correlation" between G1 and G2 is a measure of inherent redundancy.
Almost one third of the genes under investigation showed such correlations, ranging from r = 0.0303 (GRIK3/TNF) to r = 0.7245 (CYP3A5/CYP3A7), with a mean correlation of 0.1027 ± 0.1025 for the patients with schizophrenia disorders (n = 363); of 0.1069 ± 0.1020 for the patients with major depression (n = 596); and of 0.1053 ± 0.1021 for the healthy controls (n = 267). With r ≤ 0.105 (p ≈ 0.010), more than half of the empirically found correlations originated from smaller subsets among the patients of the diagnostic subgroups. The observed correlations were of interest in the context of the envisaged NN analyses, as the NN method evaluates interactions between genes. By contrast, the observed correlations did not provide insights into the biological function or relevance of such interactions.

Singular genes
The distributions of the genotypic patterns showed no substantial differences between healthy controls and the patients of 5 diagnostic subgroups (Fig. 4a,b,c), with the only exception of several genes among the Alzheimer's patients (Fig. 4c). Although comparisons of single genotypic patterns occasionally reached statistical significance outlasting Bonferroni corrections, the phenotypic variance explained by this was very small and non-additive.
AI-controlled analyses revealed several genes that appeared to be illness-specific, as they exhibited genotypic patterns that showed up exclusively in patients but not in healthy controls. For example, 33.9 % of the schizophrenia group showed genotypic patterns of gene GPR39 which were completely absent in healthy controls. Similarly, 33.0 % of depressed patients showed genotypic patterns of gene GRIA1; 21.8 % of bipolar patients showed genotypic patterns of gene STAT1; 25.8 % of schizoaffective patients showed genotypic patterns of gene ABCB1; and 18.7 % of Alzheimer's patients showed genotypic patterns of gene SLC6A1, all of which were completely absent in healthy controls. Because of their distinctive characteristics, these genes were termed "singular genes". For each diagnostic subgroup, we found some 13-30 singular genes whose genotypic patterns appeared exclusively in at least 10 % of patients but not in healthy controls.
Figure 4a. The distributions of the genotypic patterns of the genes under study showed no substantial differences between healthy controls (n=468, upper half) and the patients of 4 diagnostic subgroups under study. Shown here is the diagnostic subgroup of "Schizophrenia" (n=363, lower half).
Most of the singular genes had higher than average diversity indices. The number of singular genes did not depend on sample size: (1) a total of 29 singular genes were found in the subgroup of schizophrenia patients (n = 363), virtually identical with the 28 singular genes observed in the subgroup of bipolar patients (n = 134); whereas (2) just 24 singular genes showed up in the subgroup of depressive patients (n = 596), compared to the 33 singular genes found in the much smaller subgroup of schizoaffective patients (n = 62) (Table 3).
Even though the diagnostic groups had singular genes in common, the singular genes differed from diagnostic subgroup to diagnostic subgroup in terms of genotypic patterns and intrinsic weights. It was even possible to identify a set of singular genes specific to the differences between schizophrenia and MDD patients. By contrast, we were not successful in finding health-specific "resilience genes", i.e. genes with genotypic patterns observed in significant numbers among healthy controls but not in patients.
Extending the control group by those 201 cases who did not meet the criteria of major psychiatric disorders ("Controls(+)"; n = 468), and rerunning the AI-controlled analyses left the results essentially unchanged. Only the number of singular genes reaching significance dropped somewhat (Table 3).
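The defining criterion of a singular gene, genotypic patterns that occur in patients but never in controls, can be sketched as a simple set comparison. This covers only the exclusivity check and the 10 % threshold mentioned in the text; the AI-controlled search and significance testing are not reproduced. The function name and toy pattern codes are hypothetical.

```python
from typing import Hashable, Sequence, Set, Tuple


def patient_exclusive_fraction(patient_patterns: Sequence[Hashable],
                               control_patterns: Sequence[Hashable]) -> Tuple[float, Set[Hashable]]:
    """Fraction of patients carrying a pattern never observed in controls."""
    seen_in_controls = set(control_patterns)
    exclusive = [p for p in patient_patterns if p not in seen_in_controls]
    return len(exclusive) / len(patient_patterns), set(exclusive)


# Toy data (illustrative only): encoded patterns of one gene per subject.
patients = [15, 15, 42, 7, 80, 42]
controls = [15, 7, 7, 3]

fraction, exclusive_patterns = patient_exclusive_fraction(patients, controls)
is_candidate = fraction >= 0.10  # threshold taken from the text above
print(fraction, sorted(exclusive_patterns), is_candidate)
```

Absence of a pattern in a finite control sample does not by itself show absence in the healthy population, which is presumably why the reported counts refer to singular genes "reaching significance".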

Neural net analyses
Augmented by the structure-generating a priori knowledge of singular genes, the NN analyses achieved good steady-state results when comparing the diagnostic subgroups with healthy controls. For the subgroups of patients with schizophrenia disorders, major depression, bipolar illness, and schizoaffective disorders, the NN algorithm yielded a rate of about 90 % correctly classified patients along with a 10 % subset of patients labeled as "unknown" (Table 4). This was in contrast to (1) the subgroup of patients suffering from Alzheimer's disease, which performed slightly worse with 80 % correctly classified subjects; and (2) the conglomerate subgroup of patients "other diagnoses" where the optimization terminated with 40 % of subjects classified as "unknown" (39.8 % false-negative error rate).
The construction of classifiers that separate patients with schizophrenia disorders from patients with (1) bipolar illness; (2) major depression; or (3) schizoaffective disorders was somewhat less successful, with false-negative error rates of 20 %. In particular, the NN constraint of a clinically desirable false-positive error rate of 0 % could not be upheld and had to be raised to 5 % to achieve useful results. All this indicated considerable genetic overlaps between the diagnostic subgroups in the range of 20 %-25 % (Alzheimer's disease: 15 %). In other words, there were patients with similar vulnerability profiles who had been assigned to different diagnostic categories. Conversely, there was an average of 10 % of patients for whom the vulnerability models derived by NN analyses did not fit at all (Alzheimer's disease: 20 %). The classifiers derived through NN analyses were composed of 6-10 genes: 4-5 core genes that were common to all classifiers, plus 2-5 accessory genes that depended on the target population (Table 5). The classifiers were non-unique. It was readily possible to exclude 1-2 genes (up to 3 genes) of an optimized classifier and re-run the NN analyses. This replaced the eliminated genes by other compatible genes, so that the modified classifiers achieved similar, only slightly reduced performances.
This redundancy inherent in the classifier genes was due to the correlations between these genes. For example, in the diagnostic subgroup of schizophrenia disorders (n = 363), gene STAT1 was correlated with genes CYP3A5, CYP3A7, CYP3A4, CYP1A1, CYP1A2, CYP2B6, and CYP2D7, with correlation coefficients between 0.1377 and 0.2287. Gene STAT4 was correlated with genes CYP3A5, CYP3A7, and CYP2B6, with correlation coefficients ranging from 0.1240 to 0.1405, while gene CYP27A1 was correlated with genes CYP3A5, CYP3A7, CYP3A4, and SLC4A3, with correlation coefficients between 0.3636 and 0.5840. The results of the other diagnostic subgroups were similar. The virtually ubiquitous interconnectedness of genes was very complex and could not be broken down in a straightforward manner.

Mental health
Reversing the methodological approach of "separating patients from healthy controls" to "separating healthy controls from patients" by means of NN classifiers did not lead to a useful operationalization of "mental health". Although NN analyses based on healthy controls (n = 267) as target population and patients (n = 1,431) as control population yielded a list of genes with genotypic patterns that occurred only in the target population, the contributions of these genotypic patterns to separation were generally small with no major contributor. Even with 18 genes, no more than 46 % of the healthy control subjects were correctly classified, while 54 % were labeled as "unknown". Inclusion of further genes led to only marginal improvements, thus suggesting that genetic factors that strengthen resilience among patients and controls might not be detectable in this way.

Table 3
«Singular genes» denote illness-specific genes for which genotypic patterns inherent in these genes show up exclusively in patients, but not in healthy controls. For each diagnostic subgroup, we found some 13-30 singular genes with frequencies between 10.0 % and 36.4 %. Weakening the clear-cut definition of "healthiness" for the control population (n = 267) by extending it with the 201 patients of our sample without severe psychiatric diagnoses (n = 468) left the results essentially unchanged. Only the number of singular genes reaching significance dropped somewhat in each diagnostic subgroup.

Table 4
For four target populations, we found in comparisons with healthy controls a rate of about 90 % correctly classified patients along with a 10 % subgroup labeled as "unknown". The only exception was the subgroup of patients with "Alzheimer's disease" where apparently one or more genes of relevance were missing in the selection of candidate genes.

Biological ethnicity
The diversity indices of the 5 CLOCK gene segments (made up of 73 SNPs) lay between 148 and 181, thus indicating a sufficient resolution of between-subject similarities and differences. The correlations between the gene segments were above average with values between 0.3609 and 0.4380, so that certain combinations of genotypic patterns across gene segments were more frequent than expected by chance, underlining the good utility of the CLOCK gene for modelling biological ethnicity.
Principal component analysis eliminated the correlation between the 5 CLOCK gene segments almost completely. The first two eigenvalues already explained 97.4 % of the observed variance, so that subsequent cluster analyses were carried out solely with the two corresponding eigenvectors. We found 3 clearly separated clusters, but these were unrelated to the status of affectedness and the patients' clinical diagnoses.

Discussion
Unlike standard genotype-to-phenotype association methods with "psychiatric diagnosis" as phenotype (Horwitz et al., 2019; Unal-Aydin et al., 2021), this project explored the extent to which irregularities in genetic diversity separate patients with major psychiatric disorders from healthy controls. Specifically, we searched for distinct traces in the patients' genotypic patterns caused by the genetic component of psychiatric disorders (Kendler, 2015; Smeland et al., 2020). Key elements were (1) the "gene vectors" assembled from 4-8 polymorphic SNPs located within genes and representing the genes' distinctive "fingerprints"; (2) the genes' diversity indices defined through the number of different genotypic patterns observed with each gene; and (3) the quantification of correlations between genes.
The gene vectors resulted from high-precision genotyping with very low missing data rates. As there was no need for statistical imputations (Marchini and Howie, 2010), the overall data quality met very high standards. The data analyses provided a body of quite convincing evidence that genetic diversity is most likely an intrinsic gene property that can be successfully quantified using "gene vectors" and "diversity indices". As a direct consequence, any set of 4-8 sufficiently polymorphic SNPs located within genes can be expected to yield comparable estimates of genetic diversity. On the other hand, genetic diversity essentially depended on the population under investigation, in other words, on selection and number of subjects drawn from the population. To address the problem of sample size dependence, we constructed normative calibration curves per gene and for sample sizes in the range of 50-1200 by means of a comprehensive random sampling algorithm that systematically evaluated all diagnostic subgroups along with the healthy controls. Once the differences in sample size were compensated for in this way, the amount of variance of between-sample differences that was explainable by sample size could be reduced to less than 10 %.
The normative calibration curves displayed a very robust behavior with respect to scattering and, when regarded as a function of sample size, with respect to continuity. The validity of the normative calibration curves was verified by comparing males (n = 742) with females (n = 956), where no differences showed up after correction for sample size. Additional support came from the fact that "sample size" did not explain the deviations in genetic diversity from "normal" values as observed for patients with Alzheimer's disease (n = 75) compared to patients with schizoaffective disorders (n = 64), as these deviations pointed in opposite directions. Most importantly, the distributions of diversity indices were found to be virtually identical for diagnostic subgroups of quite different size: for example, healthy controls (n = 267), depression (n = 596), and schizophrenia (n = 363) (Fig. 3a, b). This is in contrast to the distribution derived from the Alzheimer's subgroup (n = 75) (Fig. 3c). In consequence, the proposed approach apparently constituted a sound basis for high-resolution analyses of the variation of genotypic patterns in genes and the correlations between genes (McKinney et al., 2006; Boucher and Jenna, 2013; Moore et al., 2019).
The diversity indices of the diagnostic subgroups under investigation were not homogeneously distributed over the distribution of the total sample (n = 1698). Rather, significant deviations from "normal" diversity indices showed up for three diagnostic subgroups: (1) a significant decrease for major depression; (2) a significant decrease for Alzheimer's disease; and (3) a significant increase for schizoaffective disorders. These deviations were related to a small number of genes, while the majority of genes showed no such differences. If the observed irregularities are a constituent of genetic vulnerability to psychiatric disorders, then the three diagnostic subgroups apparently follow etiologically different vulnerability pathways (Talarico et al., 2022), and schizoaffective depression is different from major depressive disorder, despite clinically similar symptoms.
Detailed analysis of the observed irregularities revealed the existence of singular genes, that is, illness-specific genes for which certain genotypic patterns showed up exclusively in patients, but not in healthy controls. For each of the diagnostic subgroups, we found between 13 and 30 singular genes, where the number was independent of the sample size. It is highly unlikely that the singular genes with their illness-specific characteristics are entirely due to methodological artifacts. It is equally unlikely that the singular genes were mainly the result of hidden population stratifications, since half of the healthy controls were unaffected 1st-degree relatives of the study patients, who can be expected to share a major part of population stratification with their affected 1st-degree relatives. Here it is important to note that the use of unaffected 1st-degree relatives as healthy controls leads to a reduction in separation between patients and controls, rather than to an inflation. Re-running the analyses without the unaffected 1st-degree relatives as controls left the configuration of singular genes virtually unchanged. By contrast, attempts to correct for population stratification by ancestry maps derived through principal component analyses were much less powerful (Gaspar and Breen, 2019). Given the distinctive characteristics of singular genes, NN analyses achieved steady-state results of 80 %-90 % correctly classified subjects when comparing diagnostic subgroups with healthy controls. The only exception, with a 20 % false-negative error rate, was the subgroup of Alzheimer's disease patients. Evidently, genes of critical relevance to Alzheimer's disease were missing.
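The notion of a singular gene can be sketched as a set comparison: collect, per gene, the genotypic patterns that occur among patients but never among controls. The code below is an illustration under that reading only. The threshold `min_carriers`, the gene names, and the toy patterns are hypothetical; the criteria actually applied in this study are not reproduced here.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]   # one gene vector, i.e. one genotypic pattern


def singular_genes(patients_by_gene: Dict[str, List[Pattern]],
                   controls_by_gene: Dict[str, List[Pattern]],
                   min_carriers: int = 2) -> Dict[str, Dict[Pattern, int]]:
    """Map each gene to its patient-only patterns and their carrier counts.

    A gene is reported only if at least one pattern is carried by
    `min_carriers` or more patients and by no control (assumed criterion).
    """
    result: Dict[str, Dict[Pattern, int]] = {}
    for gene, patient_patterns in patients_by_gene.items():
        control_set = set(controls_by_gene.get(gene, []))
        counts = Counter(patient_patterns)
        exclusive = {p: c for p, c in counts.items()
                     if p not in control_set and c >= min_carriers}
        if exclusive:
            result[gene] = exclusive
    return result


# Toy data with hypothetical gene names and patterns, not study data.
patients = {
    "GENE_A": [("AA", "CT"), ("AG", "CC"), ("AG", "CC"), ("GG", "TT")],
    "GENE_B": [("CC", "GG"), ("CT", "GG")],
}
controls = {
    "GENE_A": [("AA", "CT"), ("GG", "TT")],
    "GENE_B": [("CC", "GG"), ("CT", "GG"), ("TT", "AG")],
}
found = singular_genes(patients, controls)
```

In the toy data only GENE_A qualifies, because the pattern ("AG", "CC") is carried by two patients and by no control.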
The NN classifiers were not unique because significant correlations between the genes caused a certain amount of redundancy. Given this redundancy, it is unlikely that there is a direct causal link between singular genes and psychiatric disorders, since several genes would then have to overlap in their causal effects. Rather, the observed illness-specific irregularities might be signs of a latent, cross-diagnosis vulnerability that makes it easier for exogenous factors to trigger the onset of psychiatric disorders, or to weaken the resilience of those affected. In fact, diagnosis-crossing vulnerabilities along with resilience factors may be involved in the pathogenesis of psychiatric disorders. All the more so, as the genetic overlap between diagnostic subgroups seems to indicate that the clinically defined diagnoses do not represent biological entities. This is in line with clinical observations: (1) no homotypic diagnostic patterns are observed in families with multiple affected subjects; (2) there appears to be a continuum between affective and psychotic disorders (Stassen et al., 2006); (3) a majority of patients with a clinical diagnosis of schizophrenia also report major depressive symptoms; and (4) the clinical diagnoses of monozygotic twins who both developed a psychiatric disorder can be quite different even though they share the same genome (Braun et al., 2017).
5 Biological ethnicity has its focus on population stratifications that arise locally within chromosome segments.

It is generally accepted that the body's immune system can be strengthened in a quite straightforward way: get enough sleep, control consumption behavior, maintain a balanced diet, and exercise regularly. Best of all, this goes hand in hand with strengthening the body's robustness ("resilience") with regard to physical and mental health problems. The term "resilience" encompasses all those endogenous mechanisms that support and maintain health, thereby showing considerable between-subject differences (Braun et al., 2017).
As part of this project, we explored the idea that there might be an equivalent to the latent "vulnerability" concept revealed by our data. Specifically, we hoped that there might be some protective shield ("resilience") that compensates for the negative effects of vulnerability through a set of singular genes. Contrary to expectations, the data analyses did not readily lead to the envisaged results. In fact, what we experience as "resilience" may refer to something more fundamental, more comprehensive, and more complex compared to the narrowly defined "vulnerability", so that genetic factors that strengthen resilience among patients and controls might not be detectable in this way. In other words: while a useful vulnerability model appears to be well within reach, a comparable resilience model may not exist.
By construction, Genetic Diversity Analyses (GDAs) have a much higher resolution than single SNP approaches because of the great variability inherent in "gene vectors". Therefore, the results of GDAs and Genome-Wide Association Studies (GWAS) are only comparable to a limited extent, as signal detection differs not only quantitatively but also qualitatively. In particular, the "nearest gene" approach of GWAS complicates the cross-comparison of results excessively.
There are more than 100 psychiatry-relevant GWAS (Chimusa and Defo, 2022). Reproducibility appears to be the central problem, as every study finds something different, even when relying on fairly robust phenotypes like "response to treatment" (Allen and Bishop, 2019). Similarly inconsistent results come from GWAS using endophenotypes (Greenwood et al., 2016). Reproducibility can be compromised because (1) it is difficult to interpret associations: signals with strong associations may be "false-positives" while signals with weak associations may be "false-negatives"; and (2) the phenotypic variance explained by single SNPs is tiny and non-additive, so that GWAS require thousands of cases and controls (Dattani et al., 2022).
Schizophrenia GWAS: The most sophisticated approach to explaining associations detected by GWAS is by the "Schizophrenia Working Group" of the Psychiatric Genomics Consortium (Trubetskoy et al., 2022). This approach relies on a combination of fine-mapping, transcriptomic analysis, and functional genomic annotations. The authors reported 120 prioritized loci distributed over the entire genome (with the exception of chromosome 22): 88 intron-, 16 intergenic-, 4 missense-, 3 regulatory region-, 2 splice donor-, two 3 prime UTR-, two 5 prime UTR-, 1 non-coding transcript exon-, and 2 synonymous variants. The overlap with the GDA results was marginal.
Major depression GWAS: In their meta-analysis of seven cohorts, the "Major Depressive Disorder Working Group" of the Psychiatric Genomics Consortium (Howard et al., 2019) reported 44 prioritized loci distributed over 18 chromosomes: 27 intron-, 12 intergenic-, 2 regulatory region-, one 3 prime UTR-, and 2 non-coding transcript exon variants. The overlap with the GDA results was marginal.
Alzheimer's disease GWAS: In a review article of 3 recent GWAS in comparison to the meta-analysis carried out by the International Genomics of Alzheimer's Project (IGAP), the authors reported 77 loci distributed over 18 chromosomes (Andrews et al., 2020): 41 intron-, 14 intergenic-, 8 regulatory region-, three 3 prime UTR-, 5 missense-, 3 TF binding site-, 1 stop gained-, and 2 non-coding transcript exon variants. The newer GWAS were not independent of each other, yet produced some inconsistent outcomes. There was no overlap with the GDA results.
Excluding the Alzheimer's disease GWAS, 4 genes received the strongest support from the cross comparisons: GRM1, GABBR2, GRIN2A, and NRG1. A fifth gene, CACNA1C, did not reach significance in the GDA study. For methodological reasons, the poor overlap of results between GWAS and GDAs could be expected. Far less understandable are the inconsistencies between GWAS. Particularly disillusioning is the fact that GWAS results explain far less than 10 % of phenotypic variance. Therefore, the question arises whether GWAS are the most promising approach to psychiatric genetics.
Given their robustness, the results of this study can undoubtedly be replicated in independent patient samples. At first glance, existing GWAS with large samples of patients and controls appear to be a good basis for replicating our results. However, GWAS typically have relatively high error rates along with high percentages of missing data. This may not be a major problem in single SNP analyses, but can become an unmanageable obstacle in multivariate approaches (Dattani et al., 2022). Another problem arises from the fact that the SNPs of GWAS are fixed and cannot be freely chosen within genes as needed.

Conclusions
Multidimensional gene vectors enable high-resolution analyses of the genetic differences between patients and controls, which emerge from the variation of genotypic patterns in genes and from the correlations between genes.
The central finding of this study was the discovery of singular genes with their ability to separate patients from healthy controls. Even though singular genes do not establish a causal link to psychiatric disorders, they constitute clinically significant signs of latent vulnerabilities that make it easier for exogenous factors to trigger the onset of psychiatric disorders. Of particular interest is the genetic overlap between diagnostic subgroups, as this indicates that clinically defined diagnoses may not represent etiological entities.
The proposed approach may have cleared the way to clinical applications that facilitate the early detection of latent psychiatric disorders among risk cases, so that early interventions can be started before clinically relevant symptoms develop.

Limitations
The majority of patients and controls came from Central Europe, so that the variation in biological ethnicity was modest. One must also assume that the classifiers constructed through this sample will not necessarily show the same good performance with ethnically different populations.

Figure 1. Principal schema of a multilayer Neural Net (NN) where clinical diagnosis (output) results from multiple gene vectors (input) connected to each other by complex interactions via one or more "hidden" layer(s). The NN algorithm iteratively constructs a model that is simultaneously fitted to the observed data of all patients. The achievable goodness of fit depends on the information included, the quality of underlying data, and the number of intermediate layers implemented to model nonlinear interactions.

Figure 2. Distribution of the diversity indices of 100 genes as observed in 1,698 Central European subjects (including a small number of U.S. Americans). The diversity index ranged from 18 (CYP2C19) to 476 (GPR39) with a mean value of 109.4 ± 82.8. The distribution revealed two peaks (diversity indices around 70 and 170), along with 7 genes exhibiting a diversity index above 250. These results may indicate different types of genes.

Figure 3. Diversity index as a function of sample size, with sample sizes ranging from 50 to 1,700. Lower half: gene CYP2J2 on chromosome 1 with diversity index = 69. CYP2J2 belongs to the left group of genes in Fig. 1. Upper half: gene SLC6A6 on chromosome 3 with diversity index = 182. SLC6A6 belongs to the middle group of genes in Fig. 1. The diversity index was determined for both genes from 5 SNPs each. This demonstrates that the diversity index is an intrinsic gene property and only weakly linked to the number of SNPs. All genetic analyses relied on a genetic-physical map derived from Ensembl Build 105 of September 25, 2021.

Figure 4b. The distributions of the genotypic patterns of the genes under study showed no substantial differences between healthy controls (n = 468, upper half) and the patients of 4 diagnostic subgroups under study. Displayed here is the diagnostic subgroup of "Depression" (n = 596, lower half).

Figure 4c. The distributions of the genotypic patterns of the genes under study showed no substantial differences between healthy controls (n = 468, upper half) and the patients of 4 diagnostic subgroups under study. By contrast, the distribution of the Alzheimer's subgroup (n = 75, lower half) exhibited significant deviations from all other ones.
Zurich Study of Genetic Diversity in Psychiatry

Table 2
Expected values regarding diversity indices for 10 genes and sample sizes ranging from 100 to 1,000. Due to the well-behaved characteristics of the underlying calibration curves, simple linear interpolation between the sampling points is sufficient to calculate indices for intermediate sample sizes.
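The linear interpolation mentioned in the caption can be sketched as follows. The sampling points and expected values in the example are hypothetical placeholders, not entries of Table 2, and the function name `interpolate_expected_index` is an illustrative choice.

```python
from bisect import bisect_right
from typing import Sequence


def interpolate_expected_index(sizes: Sequence[int], values: Sequence[float],
                               n: int) -> float:
    """Linearly interpolate the expected diversity index at sample size n.

    `sizes` must be sorted in ascending order and aligned with `values`.
    """
    if not sizes[0] <= n <= sizes[-1]:
        raise ValueError("sample size outside the tabulated range")
    i = bisect_right(sizes, n) - 1        # last sampling point <= n
    if i == len(sizes) - 1:               # n equals the largest sampling point
        return float(values[-1])
    x0, x1 = sizes[i], sizes[i + 1]
    y0, y1 = values[i], values[i + 1]
    return y0 + (y1 - y0) * (n - x0) / (x1 - x0)


# Hypothetical calibration values for one gene (not taken from Table 2).
toy_sizes = [100, 200, 500, 1000]
toy_values = [40.0, 55.0, 70.0, 80.0]
toy_estimate = interpolate_expected_index(toy_sizes, toy_values, 350)
```

With these placeholder values, a sample size of 350 lies halfway between the sampling points 200 and 500, so the estimate is the midpoint of 55.0 and 70.0.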

Table 5
Classifier genes have been identified by the NN algorithm as contributing to the separation between the diagnostic subgroups and healthy controls. All genetic analyses relied on a genetic-physical map derived from Ensembl Build 105 of September 25, 2021.