Exploring the Genetic Architecture of Parkinson's Disease in a Southern Spanish Population



Introduction
It is assumed that through genetics, we can understand the molecular basis of disease and this understanding will afford the opportunity to develop and to test therapies based on aetiology.
In the last two decades, we have witnessed remarkable progress in the field of genetics. Along the way, we have learnt that only a small percentage of human diseases are associated with a classical Mendelian pattern of inheritance.
Parkinson's disease (PD), the second most frequent neurodegenerative disease, fits within the wide range of complex polygenic disorders influenced by both genetic and environmental factors. Only around 10% of PD cases are familial, caused by monogenic forms that sometimes exhibit incomplete penetrance. The vast majority of PD cases are sporadic and show heterogeneous clinical presentations, which raises the question of whether PD should be regarded as a single clinical and pathological entity.
In recent years, the emergence of new technologies has revolutionized how we identify the genetic mechanisms implicated in PD. Genome-Wide Association Studies (GWAS) have been key to this advance. GWAS are hypothesis-free approaches that evaluate large numbers of cases and controls by studying thousands of markers throughout the genome with high-throughput gene-chip arrays. Allele and genotype frequencies for each of these genetic variants are compared between cases and controls to detect alleles or genotypes that are overrepresented in one group relative to the other. The primary aim of a GWAS is to identify genetic variants that modulate disease risk rather than cause disease. Such variants are common in the population and predispose to disease with small effect sizes (odds ratio < 1.5). The power of this genome-wide assessment is that it provides an unbiased analysis, yielding surprising findings that candidate-gene approaches would have overlooked. The ultimate aim of these studies is to better understand the biology of disease, under the assumption that a better understanding will lead to translational advances enabling more effective prevention or treatment. However, the road from GWAS to biology is not direct, since an association between a genetic variant at a locus and a trait does not reveal the target gene or the mechanism underlying the phenotypic change.
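For a single SNP, the case-control comparison of allele frequencies can be illustrated as a 2×2 allelic test. The sketch below is a minimal example assuming a Pearson chi-square test on allele counts; the function name `allelic_test` and the counts are hypothetical, and it is not the pipeline used in this study. Dedicated tools such as PLINK also offer logistic regression with covariates, which this sketch does not cover.

```python
import math


def allelic_test(case_counts, control_counts):
    """Allelic chi-square test for one biallelic SNP (illustrative sketch).

    case_counts / control_counts: (risk_allele_count, other_allele_count).
    These are allele counts, so each diploid individual contributes two alleles.
    Returns (odds_ratio, chi2_statistic, p_value). Zero cells are not handled.
    """
    a, b = case_counts      # risk / other alleles in cases
    c, d = control_counts   # risk / other alleles in controls
    n = a + b + c + d
    # Odds of carrying the risk allele in cases relative to controls
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square statistic (1 degree of freedom) for the 2x2 allele table
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical counts: 300 cases (600 alleles) and 300 controls (600 alleles)
odds_ratio, chi2, p = allelic_test((300, 300), (250, 350))
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p:.2g}")
```

In a real GWAS this test is repeated for every marker on the array, which is why genome-wide significance thresholds are far stricter than the nominal 0.05.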
An important aspect of every GWAS is the ability to create genetic predictors of disease risk. GWAS estimate effect sizes at multiple loci in the discovery phase, and those effects can be used in independent samples to generate a cumulative risk score for each individual. These polygenic predictions are not particularly informative for a given individual, but they explain enough of the variation to distinguish the groups at highest and lowest risk.
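A cumulative risk score of this kind is commonly computed as a weighted sum of risk-allele dosages, with each SNP weighted by the log odds ratio estimated in the discovery GWAS. The sketch below assumes that weighting scheme; the function name and the odds ratios in the example are hypothetical, not the loci or weights used in this study.

```python
import math


def polygenic_risk_score(dosages, odds_ratios):
    """Weighted sum of risk-allele dosages (illustrative sketch).

    dosages:     number of risk alleles (0, 1 or 2) carried at each SNP
    odds_ratios: per-SNP odds ratios from an independent discovery GWAS
    """
    if len(dosages) != len(odds_ratios):
        raise ValueError("need one dosage per SNP")
    # log(OR) makes protective alleles (OR < 1) lower the score
    return sum(g * math.log(odds) for g, odds in zip(dosages, odds_ratios))


# Hypothetical odds ratios for three loci
weights = [1.3, 1.15, 0.9]
print(polygenic_risk_score([2, 1, 0], weights))  # carries mostly risk alleles
print(polygenic_risk_score([0, 0, 2], weights))  # carries only the protective allele
```

Because each weight is small, a single locus barely moves the score; only the aggregate across many loci separates high- and low-risk groups.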
Despite the considerable progress made in PD genetics, GWAS have identified only one-tenth of the common heritable component, suggesting that much remains to be found. In the years to come, new loci with smaller effect sizes are expected to be identified, and these hits will allow the creation of more accurate prediction scores.

Genome-Wide Assessment of Parkinson's Disease in a Southern Spanish Population
To the best of our knowledge, we have conducted the first GWAS of PD in a Spanish population and the second in a Southern European population. In a preliminary phase of this project, we aimed to investigate whether single nucleotide polymorphisms (SNPs) previously identified as risk variants in other populations contributed to PD risk in the Southern Spanish population.

Abstract
In recent years, the emergence of new technologies has revolutionized how we identify the genetic mechanisms implicated in Parkinson's disease (PD). Genome-wide association studies (GWAS) have been key to this advance. To the best of our knowledge, we have conducted the first GWAS of PD in a Spanish population. We replicated the association of 5 reported PD-related loci at a nominal p-value threshold, and our cumulative risk score was consistent with studies performed in other European populations. We did not identify any novel rare variant through single-variant and gene-based tests, and we assume that structural genomic variation conferring risk for PD may exist but be poorly covered or undetectable by the array. We conclude that in complex genetic disorders such as PD, collaboration drives progress, and real advances can only be made by large consortia working in a collaborative spirit. In the years to come, interpreting this risk in the context of disease pathogenesis will be the main goal.

Discussion
Our results suggested that no high-risk variant for PD was detectable in our population, consistent with the results of other GWAS with small sample sizes [1,2]. However, we marginally replicated the association of five previously reported PD-related loci [3] at a nominal p-value threshold.
One limitation of the study design is its small sample size in comparison with other PD GWAS. The study did not detect genome-wide significant association at any of the 28 loci associated with PD in the latest meta-analysis. The p-values we report are within the expected range given a sample of fewer than 300 cases. The power to detect association for a variant with a particular effect size and minor allele frequency depends on the number of cases and controls included in the study: the larger the sample, the greater the power to detect variants with small effect sizes.
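The dependence of power on sample size, effect size and allele frequency can be made concrete with a rough calculation. The sketch below assumes a two-sided allelic test, a normal approximation for the difference in allele frequencies, and the conventional genome-wide threshold of 5 × 10⁻⁸; the function name and the allele frequency and odds ratio in the example are illustrative assumptions, not values from this study.

```python
from statistics import NormalDist


def allelic_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    # Risk-allele frequency in cases implied by the control frequency and the OR
    case_odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = case_odds / (1 + case_odds)
    # Each diploid individual contributes two alleles
    n1, n0 = 2 * n_cases, 2 * n_controls
    pooled = (n1 * case_freq + n0 * control_freq) / (n1 + n0)
    # Standard error of the frequency difference under the null and alternative
    se_null = (pooled * (1 - pooled) * (1 / n1 + 1 / n0)) ** 0.5
    se_alt = (case_freq * (1 - case_freq) / n1
              + control_freq * (1 - control_freq) / n0) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability that the observed difference clears the critical value
    # (the opposite tail is negligible and ignored)
    return norm.cdf((abs(case_freq - control_freq) - z_crit * se_null) / se_alt)


# Illustrative values only: risk-allele frequency 0.2 in controls, OR 1.2
for n in (300, 3000, 30000):
    print(n, round(allelic_power(n, n, control_freq=0.2, odds_ratio=1.2), 3))
```

Under these assumptions, a few hundred cases leave a typical GWAS-sized effect essentially undetectable at genome-wide significance, whereas tens of thousands of cases, as in consortium meta-analyses, make detection very likely.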
With this in mind, success in genetics clearly requires extensive scientific collaboration. In complex genetic diseases such as PD, collaboration drives progress, and real advances can only be made by large consortia working in a collaborative spirit.
One important issue in every GWAS is checking for population stratification, since the presence and frequency of alleles at a substantial proportion of SNPs differ between ethnic groups. A main strength of this study is that the sample comes from a relatively homogeneous ancestral background. In fact, our results showed no evidence of population stratification within the cohort.
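One common diagnostic for stratification (not necessarily the one used in this study) is the genomic inflation factor λ: the median of the observed 1-df association chi-square statistics divided by the median expected under the null hypothesis, roughly 0.455. The sketch below assumes the per-SNP statistics have already been computed; the function name is hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(chi2_stats):
    """Genomic inflation factor (lambda_GC) for 1-df association statistics."""
    # Median of a 1-df chi-square equals the squared 75th percentile of N(0, 1)
    expected_median = NormalDist().inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2_stats) / expected_median
```

A value close to 1 is consistent with little stratification, while values noticeably above 1 suggest that systematic differences between cases and controls (for example in ancestry) are inflating the test statistics genome-wide.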
In the present study, we used genetic risk profiling to aggregate risk across the previously established risk loci. As in most published GWAS, the PD loci individually conferred only modest risk for disease. However, when we examined these loci collectively, the 20% of individuals with the highest burden of genetic risk were about 3.5 times more likely to have the disease than the 20% with the lowest burden. These results may help to identify individuals at risk. Such risk profiles are not expected to predict disease status on their own, but they could be useful as part of a battery of tests aimed at predicting disease likelihood, progression and onset. Furthermore, identifying additional genetic risk for disease provides a fuller understanding of the disease process as a whole, and capturing as much of the genetic influence as possible may be the way to increase our knowledge of the mechanisms involved in disease.
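The comparison of the highest and lowest 20% of the risk distribution can be expressed as a 2×2 odds ratio. The sketch below assumes the per-individual scores are already computed and ignores covariates; the function name is hypothetical, ties are broken arbitrarily, and it is only an illustration of the idea, not the analysis performed in this study.

```python
def quintile_odds_ratio(scores, is_case):
    """Odds ratio of disease in the top vs. the bottom quintile of risk scores.

    scores:  one cumulative risk score per individual
    is_case: one boolean per individual (True = case, False = control)
    """
    if len(scores) != len(is_case):
        raise ValueError("need one case/control label per score")
    k = len(scores) // 5  # individuals per quintile; any remainder stays in the middle
    if k == 0:
        raise ValueError("need at least 5 individuals")
    # Rank individuals by score (ties broken arbitrarily by sort order)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bottom, top = order[:k], order[-k:]
    top_cases = sum(is_case[i] for i in top)
    bottom_cases = sum(is_case[i] for i in bottom)
    top_controls, bottom_controls = k - top_cases, k - bottom_cases
    denominator = top_controls * bottom_cases
    if denominator == 0:
        raise ValueError("zero cell in the 2x2 table: odds ratio undefined")
    # (odds of disease in the top quintile) / (odds of disease in the bottom quintile)
    return (top_cases * bottom_controls) / denominator
```

In practice this estimate is usually obtained from logistic regression on quintile indicators so that covariates can be included; the plain 2×2 version shows what the "times more likely" figure summarizes.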
Since we used NeuroX, an array enriched with rare neurodegenerative disease-related variants, we also explored whether disease-associated rare variants were present in our cohort. We did not identify any novel rare variant through single-variant and gene-based tests. GWAS are neither designed for nor efficient at identifying novel rare genetic variants, simply because of their low frequency; this approach is only well suited to relatively common variants, those that occur at a frequency of more than 5% in the general population. However, when we extracted our data and focused on variants in known PD genes, our results were consistent with our previous sequencing findings.
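Gene-based tests try to recover some power for rare alleles by pooling them within a gene. The sketch below shows one simple gene-based approach, a carrier-collapsing burden test evaluated with Fisher's exact test; it is illustrative only and not necessarily the test used in this study. The function names and the genotype layout (one list of minor-allele counts per individual) are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def burden_test(case_genotypes, control_genotypes):
    """Collapse a gene's rare variants into carrier status (any minor allele)
    and compare carrier counts between cases and controls."""
    case_carriers = sum(any(g > 0 for g in person) for person in case_genotypes)
    control_carriers = sum(any(g > 0 for g in person) for person in control_genotypes)
    return fisher_exact_two_sided(
        case_carriers, len(case_genotypes) - case_carriers,
        control_carriers, len(control_genotypes) - control_carriers,
    )
```

Even with pooling, a gene-level signal needs enough carriers to reach significance, so rare-variant analyses in small cohorts remain underpowered.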
Finally, in addition to providing SNP variation data, we evaluated the role of copy number variants (CNVs) as risk factors for PD in these subjects. We acknowledge that NeuroX has limited ability to detect structural variants (duplications, deletions, or inversions). We identified small deletions, but there may be additional CNVs conferring risk for PD that are poorly covered or undetectable by the array.
Great steps forward have been taken in PD research; however, the path to PD therapies is still long. Undoubtedly, further common risk loci for PD remain to be discovered, and large-scale meta-analyses of GWAS are expected to continue detecting new associations with PD over the coming years. Increasing the size of current GWAS in different populations is one way to gain useful insights for prioritizing new loci. However, we should keep in mind that larger sample sizes will mask heterogeneity, with the disadvantage of diluting out variants associated with particular subgroups. Identifying additional common risk loci through GWAS will also require both sample availability and substantial investment. Hopefully, more can be learned from the data already generated by examining sub-genome-wide-significant hits in detail and combining them with other approaches.
The future of GWAS involves new and difficult challenges. Integrating GWAS data with expression quantitative trait locus (eQTL) data will help to identify associations between transcripts and traits and to prioritize genes for functional follow-up. SNP-array-based GWAS are also expected to be replaced by whole-genome sequencing analyses.

Conclusion
The present study represents a comprehensive genome-wide assessment of a Spanish cohort, characterizing a novel population in PD genetics. We used the NeuroX array to replicate the association of 5 reported PD-related loci at a nominal p-value threshold, and our cumulative risk score was consistent with other studies. No novel rare or copy number variants were identified, since GWAS are designed mainly to study common genetic variation and are limited in identifying structural variation. We suggest that larger sample sets will be required to identify further common variation and to replicate such associations.
In conclusion, GWAS have led to a better understanding of the genetic architecture of PD. Nevertheless, interpreting this risk in the context of disease pathogenesis remains a major goal.