Genome-wide association studies (GWAS)

Several genomic data analysis contexts involve a large number of statistical hypotheses that are tested simultaneously. When the association between a categorical phenotype (e.g., diseased vs. healthy) and Single Nucleotide Polymorphisms (SNPs) is assessed with statistical tests, the two key challenges are which method is best for handling multiple testing and how to preserve statistical power after the adjustment for multiple testing. In these association studies, a solid criterion for declaring significant associations with high statistical power, and without necessarily increasing the sample size, is crucial. Numerous methods have been developed to address these limitations, and they have improved type I and type II error rates. The proposed methods are mainly based on changing the type of test used to establish the association and on extending it to continuous traits. Some of these methods are statistically complex and difficult to use, especially for the non-statisticians who usually produce such data. Moreover, very few methods have focused on developing a new statistical test for categorical data, which is the most common form of measuring phenotypical traits in humans and other organisms. Using the maximum of chi-square random variables as the basis of the test statistic, this study proposes a new statistical test, called Quotient C, that allows testing associations between thousands of SNPs and a categorical trait. In real datasets, Quotient C is observed to be a less stringent criterion that allows declaring a larger number of associations between SNPs and dichotomous outcomes than the classical methods used to correct for multiple testing, while keeping the probability of incorrectly rejecting a true null hypothesis (type I error) equal to or less than α. The proposed method has a lower type II error rate and better statistical power than the Bonferroni, Holm, Hochberg, and Benjamini & Hochberg methods.


Introduction
Multiple hypothesis testing refers to a method of testing in which more than one hypothesis is tested simultaneously on the same dataset. This is not a recent idea: the basis for such procedures was established in the forties and early fifties by Duncan, Scheffé and Tukey. However, with the technological advances of recent years, especially in genomics, their application has become difficult, and the limitations of the classical methods are more evident than ever.
In several genomic data analysis contexts, a large number of statistical hypotheses are tested at the same time on the same dataset, for example, when genetic polymorphisms are used as markers to be associated with a phenotype. In the human genome there are over one million polymorphic sites, called Single Nucleotide Polymorphisms or SNPs. A review found a 99.9% similarity in DNA sequence between two samples of people taken at random; the remaining 0.1% comprises the differences or genomic variation between individuals. These genetic variants or polymorphisms arise from mutations in different ancestors and have been used in studies to identify whether any of them are associated with a particular phenotypic trait, such as a disease. However, over 90% of these genomic variations are neutral and are not able to explain a particular trait.
Studies that identify associations between SNPs and the presence of a trait (generally a disease or, more generally, an outcome) are called Genome-wide association studies or GWAS. They compare the whole genomes of two groups of participants using an epidemiological case-control design, without making assumptions about the genomic location of the possible causal variants. The DNA of individuals with the disease is compared to that of individuals free of it, and the millions of genetic variations are read using SNP arrays. The allele or genotype frequencies of the two groups are compared and, if a particular variant is more frequent in cases than in controls according to statistical tests, the polymorphism is declared associated with the presence of the outcome.
The first GWAS was published in 2005 and investigated age-related macular degeneration. Although these studies have been used to evaluate associations between SNPs and traits mostly in humans, they also serve to answer the same question in other organisms, such as plants and animals. For example, association studies have helped to identify genetic variations associated with the susceptibility of ornamental plants and crops to viruses or pests as a strategy to improve production; in other cases, GWAS have been applied to identify genes or Single Nucleotide Polymorphisms associated with traits of economic importance in animal production, such as genetic variations related to better quality of milk or meat.
The response variable can be a quantitative trait, such as the weight of individuals or the amount of a certain hormone in blood, or it can be dichotomous, such as the presence or absence of the disease or trait under study. In the latter case, the results are summarized in a 2 × 2 contingency table with the presence of the SNP as exposure and the disease as the outcome. Traditionally, when a dichotomous variable is evaluated, the association between the k_i, i = 1, 2, ..., K SNPs and the outcome is tested primarily with a χ² test of homogeneity, followed, in some cases, by a logistic regression model in order to obtain a proper estimate of the association, both raw and adjusted for covariates. Then, a correction of the p-values is performed for each one of the H_1, H_2, ..., H_K hypotheses tested. The classical methods for this correction include Bonferroni's, Holm's and Hochberg's proposals, which control the family-wise error rate (FWER), and Benjamini & Hochberg's, which controls the false discovery rate (FDR).
Other approaches have worked on the false discovery proportion and the construction of confidence bounds for this measure, and recent advances have moved away from the classical view of the problem, such as penalized regression, methods based on functional genomics data, or Bayesian statistics.
In GWAS, the most difficult limitation to address is the limited statistical power achieved after adjusting the type I error for multiple comparisons, and its impact on the number of false positives. Testing a large number of markers potentially associated with an outcome simultaneously and independently may inflate the number of false positives, while a very stringent correction can reduce the probability of rejecting the null hypothesis when it is false, that is, in GWAS, the probability of correctly detecting a genuine association (statistical power). Testing a large number of SNPs simultaneously, each one as a separate hypothesis, involves adjusting the significance level and, consequently, increasing the sample size needed to reach adequate power. This is not a realistic situation in genomics, because the entire genome, or a large part of it, must be sequenced for each individual and this is still expensive. Obtaining a solid criterion for declaring significant associations with high statistical power, and without increasing the sample size, is crucial for Genome-wide association studies, and statistical solutions are needed.
Methods have been developed to address these limitations, and they have reduced type I and type II error rates. The proposed methods are mainly based on changing the type of test used to establish the association and on extending it to continuous traits. Some of them are statistically complex and difficult to use for the non-statisticians who usually produce the data. Moreover, very few methods have focused on developing a new statistical test for categorical data, which is the most common form of measuring phenotypical traits in humans and other organisms.
A new statistical test, called the Quotient C test, was proposed. It allows testing associations between thousands of SNPs and a categorical trait using a test statistic based on the maximum of χ² random variables. The results on real datasets show that the Quotient C represents a less stringent criterion and allows declaring a larger number of Single Nucleotide Polymorphisms associated with the outcome than the classical methods used to correct for multiple testing, while keeping the probability of incorrectly rejecting a true null hypothesis (type I error) equal to or less than α. The proposed methodology has a lower type II error rate and, consequently, better statistical power than Bonferroni's, Holm's, Hochberg's and Benjamini & Hochberg's methods. Improving the statistical power without increasing the sample size represents a clear advantage of the Quotient C over the other proposals and helps minimize the number of false negatives.

Description and formulation of the problem

Background
Multiple hypothesis testing refers to the fact that more than one hypothesis is tested simultaneously on the same dataset. This issue is not new, and the basis for such procedures was established in the forties and the first half of the fifties by Duncan [1,2], Scheffé and Tukey [3,4]; however, with the technological advances of recent years, mainly in human genetics, their application has become difficult, and the limitations of the classical methods are more and more evident nowadays.
With the advancement in knowledge regarding DNA since the early fifties, and in human genetics from the genome project since 1990, it has been established that, although this molecule is largely constant across all humans, there are over one million sites that vary genetically among individuals [5]; they are called Single Nucleotide Polymorphisms or SNPs. A review found a 99.9% similarity in DNA sequence between two samples of people taken at random from the same population; the remaining 0.1% includes any difference or genome variation between individuals [6]. These genetic variants or polymorphisms arise from mutations in different ancestors [7] and have been used in different studies to identify whether any of them are associated with a particular trait [8], such as a disease. However, over 90% of these genomic variations are neutral and are not able to explain a particular trait [9].
A substantial number of SNPs have been identified across the genome through the HapMap project and these have been recorded in databases accessible for consultation by the scientific community like the dbSNP database, GWASdb or the National Human Genome Research Institute [10,11].
Studies that identify associations between polymorphisms and the presence of a trait (generally a disease) are called Genome-wide association studies or GWAS, and they generally compare the entire genomes of two groups of participants using an epidemiological case-control design, without making assumptions about the genomic location of the possible causal variants. A case-control study is designed to determine whether an exposure is associated with an outcome (i.e., a disease, a trait or any other condition of interest) [12]. This kind of study represents a sampling strategy whose most important characteristic is that the studied population is made up of at least two groups: one of them called cases (a group known to have the outcome) and the other called the control group (a group known to be free of the outcome) [12,13]. By definition, a case-control study is always retrospective because it starts with the outcome of interest and then traces back to investigate multiple exposures; that is, it looks back in time to measure which subjects in each group were exposed, comparing the frequency of exposure in the case group to that in the control group [12,13,14,15].
In GWAS, the DNA of people with the disease or trait is compared to that of people free of it, and the millions of genetic variations are read using SNP arrays. In general, the allele or genotype frequencies of the two groups are compared and, if a particular variant is more frequent in cases than in controls according to statistical tests, the SNP is declared associated with the presence of the disease or trait [16]. As the genome of a subject does not change after birth, the SNPs are always considered a retrospective exposure, so the case-control design is appropriate.
The response variable can be a quantitative trait, such as the weight of individuals or the amount of a certain hormone in blood, or it can be dichotomous, such as the presence or absence of the disease or trait under study [17]. In the latter case, the results are summarized in a 2 × 2 contingency table with the presence of the SNP as exposure and the disease as the outcome. For example, to assess the association between the presence of the 4889A/G polymorphism of the CYP1A1 gene and the presence of lung cancer [18], the results would be summarized as shown in Table (2-1).
The association between a potential exposure and the outcome is evaluated primarily with a χ² test of homogeneity, followed in some cases by a logistic regression model in order to obtain a proper estimate of the association, both raw and adjusted for covariates, using odds ratios (OR) [19]. Thus, for each of the SNPs investigated, a table like the one shown above is constructed and the association is evaluated through an individual hypothesis test for each polymorphism; at the end, the significance level is corrected for multiple testing [8]. In this way, a GWAS results in multiple contrasts of K hypotheses between dichotomous categorical variables.
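As a minimal illustration of this per-SNP workflow, the following R sketch (with simulated, hypothetical data; none of the names come from the studies cited) builds a 2 × 2 table for one SNP, applies the χ² test of homogeneity and obtains the raw odds ratio from a logistic model:

    # Illustrative per-SNP association test (hypothetical data, not from the thesis)
    set.seed(1)
    outcome <- rbinom(200, 1, 0.5)                 # case (1) / control (0) status
    geno    <- rbinom(200, 1, 0.3)                 # presence (1) / absence (0) of the SNP
    tab <- table(geno, outcome)                    # 2 x 2 contingency table as in Table (2-1)
    chisq.test(tab, correct = FALSE)               # chi-square test of homogeneity
    fit <- glm(outcome ~ geno, family = binomial)  # logistic regression for the raw OR
    exp(coef(fit)["geno"])                         # odds ratio estimate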
The first GWAS was published in 2005 and investigated age-related macular degeneration [20]. Although these studies have been used to assess associations between SNPs and traits mostly in humans, they have also served to establish genotype-phenotype associations in other organisms, such as plants and animals. For example, association studies have helped to identify genetic variations associated with the susceptibility of ornamental plants and crops to viruses or pests as a strategy to improve production [22,23,24]; in other cases, Genome-wide association studies have been applied to identify genes or SNPs associated with traits of economic importance in animal production, such as genetic variations related to better quality of milk or meat [25,26,27,28].
The markers in the public domain available to support studies like GWAS exceed 1.4 million [5] and can be consulted in databases. The Single Nucleotide Polymorphism Database (dbSNP), a free public archive and general catalog of identified genome variations developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI), contains information on 9 million SNPs (within and across different species), including most of the estimated 11 million SNPs with a minor allele frequency of 1% or greater that are estimated to exist in the human genome [29,30]. Other databases are SeattleSNPs PGA (focused on identifying, genotyping and modeling the associations between Single Nucleotide Polymorphisms in candidate genes and pathways that underlie inflammatory responses in humans) [31] and EVOLution of TREEs as drivers of terrestrial biodiversity (information about sets of DNA samples for ecological research) [32], among others.
Testing a large number of markers simultaneously, each one as a separate hypothesis, involves adjusting the significance level and increasing the sample size needed to correctly detect a genuine statistical association. Nevertheless, increasing the sample size is very costly because the entire genome of each individual must be sequenced, which is still very expensive; therefore, statistical solutions are needed.

The statistical problem in GWAS
When a single hypothesis is tested, the approach is as follows: let X be a random variable with probability mass function or probability density function f(x|θ), where θ is an unknown parameter known to belong to a parameter space Θ, namely θ ∈ Θ with Θ ⊂ R. We want to test the null hypothesis H_0: θ ∈ Θ_0 against the alternative hypothesis H_1: θ ∈ Θ \ Θ_0 [33,34]. To test the hypothesis, a rejection region Γ is defined; Γ corresponds to a subset of the sample space that leads to the rejection of H_0, such that if the observed test statistic T(X) = t ∈ Γ, the null hypothesis is rejected; otherwise, H_0 is not rejected. Γ is chosen in such a way that the probability of making a type I error, i.e., rejecting a true null hypothesis, is as small as possible; this level, denoted α, is known as the significance level of the test [34].
When more than one hypothesis is tested on the same dataset, the problem arises when all hypotheses are evaluated together at a given level α: regardless of the multiplicity of the problem, the probability of making a type I error increases as the number of hypotheses increases. Thus, when contrasting K null hypotheses simultaneously, it is very likely that, even if all of them are true (no SNP associated with the outcome), some of them are rejected simply by chance. If K is large enough, some hypothesis will almost surely be rejected even though all of them are true [35].
In GWAS, the most difficult limitation to address is the limited statistical power achieved after adjusting the type I error for multiple comparisons, and its impact on the number of false positives and negatives [36,37,38,39]. Testing a large number of markers potentially associated with the outcome simultaneously may inflate the number of false positives, and a very stringent correction can reduce the probability of rejecting the null hypothesis when it is false, that is, in GWAS, the probability of correctly detecting a genuine genotype-phenotype association (statistical power) [40].
In this kind of study, millions of comparisons are usually performed between categorical data on the same samples of patients. Under this setting, at a given level α, with K markers being tested and assuming that all hypotheses are independent, the probability P of making at least one type I error is P = 1 − (1 − α)^K [40,46]. In this way, with a type I error of 0.05 and five SNPs tested simultaneously, there is a 22.61% probability that at least one of them is declared significantly associated with the disease by mistake, a probability that rises to 51.23% with 14 SNPs and to 90.05% with 45.
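The figures above follow directly from this expression; a one-line R sketch reproduces them:

    # Probability of at least one type I error among K independent tests at level alpha
    p_any_error <- function(K, alpha = 0.05) 1 - (1 - alpha)^K
    round(p_any_error(c(5, 14, 45)), 4)   # approx. 0.2262, 0.5123 and 0.9006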
In order to address the limitations generated in multiple comparison tests, techniques have been developed which inevitably lead to one of two undesirable outcomes: increased levels of false positives when the correction is weak, or decreased statistical power when the correction is very strong [40].
Consider the following mathematical structure of the multiple testing problem in GWAS: let H = (H_01, H_02, ..., H_0K) be the collection of null hypotheses that we would like to reject for the k_i, i = 1, 2, ..., K Single Nucleotide Polymorphisms studied. An unknown number k_0 of these hypotheses are true, whereas the other k_1 = K − k_0 are false. Now, let T ⊆ H be the collection of true hypotheses and F = H \ T the collection of false hypotheses. The objective of multiple comparison procedures is to choose a collection R ⊆ H of hypotheses to reject. If the p-values p_1, p_2, ..., p_K for each of the hypotheses H_0k are calculated, one possibility for R is the collection R = {H_k : p_k ≤ T}, where H_0k is rejected (a rejection is also called a discovery) if its p_k ≤ T; in this way, multiple comparison procedures can be reduced to the choice of T [41]. Ideally R = F; however, this does not always happen, and the discrepancy represents the errors committed on the rejected hypotheses. The numbers of errors and non-errors committed in a study with multiple hypothesis testing, in a frequentist setting, are summarized in a table of counts [41,44].
For example, rejecting a true null hypothesis (type I error), or committing a false positive (or false discovery), corresponds to the V hypotheses of the set R ∩ T; on the other hand, a false negative or type II error occurs when a false null hypothesis is not rejected, that is, the hypotheses of the set F \ R, where U denotes the true null hypotheses that are not rejected [41,44].
Using this notation, the total number K is known; the numbers k_0 and k_1 = K − k_0 of true and false null hypotheses are unknown parameters, while R = #R is an observable random variable and S, T, U and V are unobservable random variables [47].
The particularity of each multiple comparison correction technique lies in the proposed T, and the classical methods include those of Bonferroni, Holm, Hochberg and Benjamini & Hochberg. The most commonly used technique is the Bonferroni correction [8,41], which involves dividing the significance level α by the number of hypotheses tested, and the corrected p-values are used to compare the results [51]; thus T = α/K. In this method, α′ = α/K is computed and each individual hypothesis H_i with p_i ≤ α′ is rejected, so the total type I error is divided equally among all the hypotheses tested [52]. Relaxing the significance level could eventually reveal new associations (increasing statistical power), but it is risky because false positives could be among the associated SNPs. In a similar way to Bonferroni's method (using a single-step adjustment), Šidák proposed T = 1 − (1 − α)^(1/K) [53]. Other proposals are sequential variations using the ordered p-values p_(1), ..., p_(K). In those methods, each p-value p_(i) is compared with its own critical value T: Holm (step-down) and Hochberg (step-up) proposed T = α/(K − i + 1) [54,55], and Benjamini and Hochberg proposed T = iα/K [56].
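For reference, these classical corrections are available in base R through p.adjust; the following hedged sketch, on purely illustrative simulated p-values, compares the number of rejections under each method:

    # Comparing the classical corrections with p.adjust (simulated, illustrative p-values)
    set.seed(2)
    p_raw <- c(runif(95), runif(5, 0, 1e-4))        # mostly null p-values plus a few small ones
    adj <- sapply(c("bonferroni", "holm", "hochberg", "BH"),
                  function(m) p.adjust(p_raw, method = m))
    colSums(adj <= 0.05)                            # rejections under each correction at alpha = 0.05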
Bonferroni's, Holm's and Hochberg's proposals are considered familywise error rate (FWER) controlling methods; that is, they control the probability that R contains any error, P(V > 0) [41,54,55]. In turn, the procedure of Benjamini and Hochberg controls the false discovery rate (FDR), the expected proportion of errors among R, E(Q) [56].

Assumptions of multiple testing procedures
In multiple testing problems, an important assumption must be made about the dependence structure of the variables or of the p-values. High-dimensional studies like GWAS rarely involve the analysis of independent variables; rather, many related variables are analyzed simultaneously. However, some of the classical methods for multiple testing rely on independence, or on some form of weak dependence, among the data corresponding to the variables being tested. Leek and Storey [57] developed an approach for addressing arbitrarily strong multiple testing dependence in two cases: at the level of the original data collected (population level) and at the level of the model estimates, before test statistics or p-values are calculated (estimation level); both cases are types of multivariate dependence between vectors. The authors assume that K related hypothesis tests are performed simultaneously, each one based on an n-vector sampled from a common probability space R^n, and that the data corresponding to the i-th hypothesis test is the vector x_i = (x_i1, ..., x_in) for i = 1, 2, ..., K. The dataset can be arranged into a K × n matrix X whose i-th row is x_i, the outcome for each hypothesis is arranged in a vector Y = (y_1, ..., y_n), and the goal of the procedure is to perform a hypothesis test on the relationship between each x_i and Y. This relationship can be modeled using linear models, nonparametric smoothers, longitudinal models, among others, and written as x_i = b_i S(Y) + e_i, where S(Y) is a basis built from the outcome. The vector Y may contain any type of random variables; in GWAS, it is made up of dichotomous variables. Now, letting E be the K × n matrix whose i-th row is e_i and B the matrix of row-wise coefficients, the model can be written as X = BS(Y) + E. Definition 2.2.1 Population-level multiple testing dependence. Population-level multiple testing dependence exists when Pr(x_1, x_2, ..., x_K | Y) ≠ Pr(x_1 | Y) × Pr(x_2 | Y) × ... × Pr(x_K | Y).
(2-1)
Definition 2.2.2 Estimation-level multiple testing dependence. Estimation-level multiple testing dependence exists when there is probabilistic dependence among the rows of the residual matrix R = X − B̂S(Y), as formalized in (2-2).

From (2-1), dependence at the population level is therefore any probabilistic dependence among the x_i after conditioning on Y, which is equivalent to the existence of dependence across the rows of E. Meanwhile, from (2-2), estimation-level dependence is equivalent to dependence among the rows of the residual matrix R = X − B̂S(Y). When population-level multiple testing dependence exists, it will lead to estimation-level multiple testing dependence [57].
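A minimal R sketch of this setup, under assumed simulated data (the basis S(Y) is taken here to be a simple linear design with intercept and outcome), shows how the residual matrix in which estimation-level dependence is assessed can be computed:

    # Assumed layout: X is K x n (one row per SNP), Y a length-n outcome, S(Y) a design basis
    set.seed(3)
    K <- 50; n <- 40
    Y <- rbinom(n, 1, 0.5)
    X <- matrix(rnorm(K * n), K, n)
    S <- rbind(1, Y)                              # 2 x n basis S(Y): intercept and outcome
    B_hat <- X %*% t(S) %*% solve(S %*% t(S))     # row-wise least-squares estimate of B
    R_res <- X - B_hat %*% S                      # residual matrix; dependence among its rows
                                                  # is estimation-level multiple testing dependence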
Typically, multiple testing dependence has been defined in terms of the p-values or test statistics resulting from the multiple tests. Let us assume the collection S_1, S_2, ..., S_K of test statistics, one for each hypothesis tested, and their corresponding unadjusted p-values p_1, p_2, ..., p_K. The assumptions on the p-values can go two main ways: (i) no assumption at all, or (ii) some assumption on the dependence structure of the p-values known as positive dependence through stochastic ordering or PDS [41]. The latter involves only the p-values of the true hypotheses, denoted q_1, q_2, ..., q_{k_0}. The PDS condition comes in two forms: a weaker and a stronger one [58,59]. The weaker PDS condition says that E[f(p_1, ..., p_K) | q_i ≤ u] is nondecreasing in u for every i and for every coordinate-wise nondecreasing function f (2-3). The stronger PDS condition is similar to (2-3) but involves p_i instead of q_i, that is, E[f(p_1, ..., p_K) | p_i ≤ u] is nondecreasing in u (2-4). In (2-4), the condition involves the p-values of all the hypotheses, not only the true ones [58,59].

Recent advances and challenges
As a step beyond the classical methods, mainly the FDR-controlling ones, other approaches have worked on the false discovery proportion, or FDP, denoted Q, and on the construction of confidence bounds for this measure. The FDP is the proportion of false discoveries among the total rejections; formally, Q = V/R when R > 0 and Q = 0 otherwise (2-5). Storey's approach [60] considers the set R of rejected hypotheses of the form R = {H_k : p_k ≤ t}, where t is a constant. Within R, Storey proposes that the expected number of true hypotheses rejected is k_0 t, and an estimate of k_0, denoted k̂_0, is given by k̂_0 = (#{p_k > λ} + 1)/(1 − λ). The value of λ is usually set to 0.5, although λ = α is also a common choice. The estimate of Q, denoted Q̂, is then given by Q̂ = k̂_0 t / #R [60].
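A short R sketch of these estimators, as reconstructed above (λ and the rejection threshold t are chosen by the analyst), might look as follows:

    # Storey-type estimates of the number of true nulls and of the FDP
    storey_fdp <- function(p, t, lambda = 0.5) {
      k0_hat <- (sum(p > lambda) + 1) / (1 - lambda)  # estimated number of true null hypotheses
      R      <- sum(p <= t)                           # number of rejections at threshold t
      (k0_hat * t) / max(R, 1)                        # estimated false discovery proportion Q-hat
    }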
Efron proposed a view of the FDP and of the FDR = E[Q] called the local FDR [61]. Storey's method allows conclusions only over the set R, not over any subset of R or over individually rejected hypotheses. Efron's approach solves this limitation, but it comes with additional assumptions. The proposal assumes the test statistics S_1, S_2, ..., S_K were randomly drawn from a mixture distribution with density f(t) = π_0 f_0(t) + (1 − π_0) f_1(t), where π_0 is the probability that a hypothesis is true, f_0(t) is the density of the test statistics of true hypotheses and f_1(t) is the density of the test statistics of false hypotheses. The shape of f_1 and π_0 are unknown and must be estimated from the data, while f_0 is considered known. A local version of the FDR, the probability that a hypothesis is true given its observed statistic, fdr(t) = π_0 f_0(t)/f(t), can then be computed.

A confidence estimation of the FDP is sometimes more informative than a point estimate, mostly in the presence of variability. Two works are well known for addressing this situation: Goeman and Solari [62] developed an interval estimation of the FDP under the PDS assumption, and Meinshausen developed confidence intervals using a permutation-based method [63].

Modern proposals have moved away from the classical view of the multiple hypothesis testing problem. Penalized regression has been used to predict disease risk based on a genome, e.g., on a large number of single-nucleotide polymorphisms [64]. This methodology, although it has a different purpose (to predict the probability of an outcome based on the presence of SNPs), has incorporated techniques to select the SNPs to include in the model, such as the least absolute shrinkage and selection operator (LASSO) [65], which estimates a vector of regression coefficients associated with the SNP effects by maximizing a penalized log-likelihood function; the SNPs not selected in the final model have their regression coefficients set to zero. Other techniques used for a similar purpose are the smoothly clipped absolute deviation (SCAD) [66] and the truncated L_1 penalty (TLP) [67].
Kichaev et al. [68] presented a new proposal, called functionally informed novel discovery of risk loci (FINDOR), based on functional genomic data. This method uses polygenic modeling to weight SNPs according to how well they tag functional categories that are enriched for heritability, and it can be summarized in two steps. First, a function that predicts the χ² statistic is estimated for each SNP using a comprehensive assortment of functional annotations, including coding, conserved and regulatory annotations. This random variable is predicted using a stratified LD score regression that takes as input a measure of confounding biases, the effect size on per-SNP heritability of each annotation and an LD score indicating the degree to which each SNP tags the annotation. The second step consists in stratifying the SNPs based on their expected χ² values. In simulated data, the methodology has been shown to increase the number of independent associations by 9%-38% and, consequently, to improve statistical power [68].
Bayesian statistics have also been applied to the selection of SNPs associated with clinical or biological traits. Bayes' theorem is used to combine prior beliefs about marker effects, expressed in terms of prior distributions, with information from the data for inference [69]. In this framework, the authors propose a regression model for the joint SNP effect on the response and specify a prior for the regression coefficients. The best multiple-SNP model is determined by evaluating the posterior probability [70,71,72,73,74]. As with the classical proposals, the number of models evaluated is usually adjusted for multiple testing [74], and high dimensionality is still a limitation of these methods. Some of them have been developed only for K = 2 [75], or they require a heavy first-step selection to reduce the number of SNPs [76].
The most obvious technical limitation of GWAS is the high cost and effort required to genotype hundreds of thousands of SNPs per individual; because of this, most of the studies conducted have had modest sample sizes, with a consequent reduction in the statistical power of the tests used. These small studies are prone to random error and often yield confidence intervals too wide to draw meaningful conclusions [77]. Experience in other clinical settings suggests that they tend to produce more favorable results than studies with large sample sizes [78]. To increase these sizes, meta-analyses have been developed that integrate GWAS results over a given SNP and disease in a structured and systematic way. Although this practice certainly increases the size of the study, a meta-analysis combines the results from multiple studies in an effort to increase power (over individual studies), and in Genome-wide association studies this has an important implication for the results, because it increases the diversity of genetic variants in the control group compared to the cases [79], due to the multiple genetic origins of the participants. GWAS have had great success in identifying genes associated with the development of monogenic disorders or Mendelian diseases (where a single gene is responsible for the disease), but in polygenic diseases their success has been limited, because the associated genes explain only a small portion of the risk [80]. This lack of success has been partly explained by the low heritability of the polymorphisms, the imprecise definition of phenotypes and, most importantly, the low statistical power of the hypothesis tests [81].
Minimizing false positives generated through multiple hypothesis testing in Genome-wide association studies is essential for obtaining a solid criterion for declaring significant associations with greater statistical power, without increasing the sample size and without ignoring the high dimensionality [81,82]. Today, although the developed techniques have approached the solution of these problems, their limitations hinder their use in practice and compromise the validity of the results. Given this gap in scientific knowledge, a statistical solution is proposed in this work.

Objectives

General objective
To develop a statistical methodology for estimating associations between categorical data in multiple hypothesis contrasts that preserves statistical power and controls the false positive rate, with application to genome-wide association studies.

Specific objectives
To develop a statistical methodology to determine associations between categorical variables in multiple hypothesis testing.
To evaluate the usefulness of the proposed technique by estimating associations between SNPs and the development of a particular complex disease, using a real database.
To determine the statistical power achieved with the proposed methodology and to compare it with that obtained by the conventional statistical techniques used for this problem.

Overview of the methodology
The methodology was proposed in two stages: 1. Theoretical phase. A new test statistic was proposed based on the maximum of the test statistics used to test the hypotheses H_0i: the i-th polymorphism is not associated with the outcome, against H_1i: the i-th polymorphism is associated with the outcome.
For simplicity, we denote U_i(k_i) = X_i the value of the test statistic used to test the previous hypotheses for the i = 1, 2, ..., K polymorphisms. The necessary steps are described below: a) The probability density function (p.d.f.) of X_(K) = max(X_1, X_2, ..., X_K), denoted g_K(x_K), was found.
b) The parameter r* of the function g_K(x_K) was estimated.
c) The quotient between the random variable U_i(k_i) and the density function of the maximum evaluated at X_i = x_i was proposed as the test statistic to choose the SNPs that are associated with the outcome. This test statistic was called Quotient C and, formally, it is defined as C_i = U_i(k_i)/g_(K)(x_i). d) The p.d.f. of the Quotient C was found. First, the joint density function between X := U_i(k_i) ∼ χ²_1 and Y := g_(K)(x_i) was obtained, and then the joint density function between X and C, denoted u_XC(x, c). The probability density function of C, denoted f_C(c), was obtained (by transformation techniques of random variables) by integrating u_XC(x, c) with respect to x, that is, f_C(c) = ∫_{−∞}^{∞} u_XC(x, c) dx.
2. Application phase. The proposed methodology was applied to real data and compared with the classical methods of multiple testing, following these steps: a) The classical methods of correction for multiple comparisons, as well as the proposed methodology, were applied to two real databases and the results were compared.
b) The statistical power of Quotient C, as well as that of the other methods, was calculated using the approach proposed by Dudoit, Shaffer and Boldrick [44].
c) A simulation study was performed in order to establish the repeatability of the results. In this case, three methods were compared: Bonferroni, Benjamini and Hochberg, and Quotient C. Three scenarios were considered: i. SNPs detected by all the methods; ii. SNPs detected only by Benjamini and Hochberg's proposal; and iii. SNPs detected only by the Quotient C. In each scenario, a sample of 30% of the polymorphisms was taken. The simulation included 1000 repetitions in which the three methods were applied. The objective was to count the number of SNPs that each method found as associated with the outcome and to summarize how many of them were detected by the other methods, according to the scenario.

The first dataset was supplied by Ariza et al. [48] and contains the genetic variants studied as factors associated with the hardness of meat in sheep. In 130 sheep of the Camuro breed (ovino de pelo criollo colombiano, Colombian creole hair sheep), sacrificed under controlled conditions, the muscular hardness of the Longissimus dorsi muscle was studied as the response variable. The meat was considered hard if the shear force value, measured with a Warner-Bratzler machine, was equal to or higher than 17.118 u (Y = 1); otherwise, the meat was considered tender (Y = 0).

Let j be the index of a subject in the sample; for the j-th subject, Y denotes whether it is a case or a control, so Y = 1 if the j-th subject is a case and Y = 0 if it is a control. Usually, to evaluate the association between the i-th genetic variant and the outcome, a 2 × 2 table is built, and the statistic used to test the hypotheses H_0i: the i-th SNP is not associated with Y, against H_1i: the i-th SNP is associated with Y, is given by U_i(k_i) = N(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)], with a, b, c and d denoted as in Table (2-1) and N = a + b + c + d.
With K independent polymorphisms and H_0i true, U_i(k_i) ∼ χ²_1 and U_i(k_i) ∈ [0, ∞). Large values of U_i(k_i) lead to rejection of the null hypothesis.
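Assuming the statistic is the usual Pearson χ² for a 2 × 2 table, as written above, a minimal R sketch for a single SNP is:

    # Pearson chi-square statistic for one SNP from the 2 x 2 counts a, b, c, d (Table 2-1 layout)
    u_stat <- function(a, b, c, d) {
      N <- a + b + c + d
      N * (a * d - b * c)^2 / ((a + b) * (c + d) * (a + c) * (b + d))
    }
    u_stat(30, 20, 15, 35) > qchisq(0.95, df = 1)   # example counts: reject H0 at alpha = 0.05?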

Probability density function of the maximum of χ² random variables
For simplicity, let us denote X_i = U_i(k_i), i = 1, 2, ..., K; then X_1, X_2, ..., X_K are independent and identically distributed (iid) random variables, each X_i ∼ χ²(r) with r = 1 degrees of freedom. Let F(x) and f(x) be the common cumulative distribution function and probability density function, respectively, of these random variables. Now, let X_(1), X_(2), ..., X_(K) be the ordered random variables, where X_(1) ≤ X_(2) ≤ ... ≤ X_(K). Using this notation, X_(1) = min(X_1, X_2, ..., X_K) and X_(K) = max(X_1, X_2, ..., X_K). The probability density function of X_(K) is given by the following theorem:

Theorem 1 (Density function of the maximum of order statistics [84]). Let X_1, X_2, ..., X_K be independent identically distributed continuous random variables with common distribution function F(x) and common density function f(x). If X_(K) denotes the maximum of X_1, X_2, ..., X_K, then the density function of X_(K) is given by g_K(x) = K [F(x)]^(K−1) f(x). (5-2)

As X_1, X_2, ..., X_K are continuous iid random variables, each one ∼ χ²(r), using (5-2) the maximum of these variables has the probability density function

g_K(x) = K [γ(r/2, x/2)/Γ(r/2)]^(K−1) · x^(r/2−1) e^(−x/2) / (2^(r/2) Γ(r/2)), for x ≥ 0, (5-3)

where γ(r/2, x/2) = ∫_0^{x/2} t^(r/2−1) e^(−t) dt is the (lower) incomplete gamma function and Γ(r/2) = ∫_0^∞ t^(r/2−1) e^(−t) dt is the gamma function, which by definition is a constant. So, the density function given in (5-3) follows a distribution with r* degrees of freedom.
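Since F(x) and f(x) are ordinary χ² distribution functions, g_K(x) can be evaluated directly in R; the following sketch (with illustrative values of K and r) plots the density of the maximum:

    # Density of the maximum of K iid chi-square(r) variables: g_K(x) = K F(x)^(K-1) f(x)
    g_max <- function(x, K, r = 1) K * pchisq(x, df = r)^(K - 1) * dchisq(x, df = r)
    curve(g_max(x, K = 100), from = 0, to = 30)     # the mass shifts to the right as K grows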

Proposed test statistic
When k_i, i = 1, 2, ..., K polymorphisms are tested and the random variables X_i are considered as order statistics, the interest lies in choosing those X_i with the largest values, which lead to the rejection of the null hypotheses at level α (the type I error). Considering the distribution of the maximum of the random variables X_1, X_2, ..., X_K and evaluating the values X_i = x_i under this distribution, we estimate how likely each random variable is to belong to this maximum.
The quotient between the random variable U_i(k_i) and the density function of the maximum evaluated at X_i = x_i was proposed as the test statistic to choose the SNPs that are associated with an outcome. This test statistic was called Quotient C, and it is defined by C_i = U_i(k_i)/g_(K)(x_i), i = 1, 2, ..., K. Based on this quotient, we deduce the following: if C_i ≥ 1, then the value of the statistic X_i = U_i(k_i) is at least as large as the value expected for X_i = x_i under the distribution of the maximum of the random variables X_1, X_2, ..., X_K, assuming that X_i is the maximum.
If C_i < 1, then the value of the statistic X_i = U_i(k_i) is smaller than the value expected for X_i = x_i under the distribution of the maximum of the random variables, assuming that X_i is the maximum.
As a consequence of applying the Quotient C, the SNPs associated with the presence of the outcome are those with C_i ≥ 1, which are statistically significant at any given α; this is equivalent to the collection R = {H_k : p_k ∈ X_(K)}.
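A hedged R sketch of this rule, assuming that g_(K) is evaluated as the maximum density of the previous section with the estimated r* plugged in (the function and argument names are illustrative, not from the thesis code), is:

    # Sketch of the Quotient C rule: x holds the per-SNP chi-square statistics,
    # r_star the estimated degrees of freedom used in the maximum's density
    quotient_c <- function(x, r_star) {
      K <- length(x)
      g <- K * pchisq(x, df = r_star)^(K - 1) * dchisq(x, df = r_star)  # g_(K)(x_i)
      x / g                                                             # C_i = U_i(k_i) / g_(K)(x_i)
    }
    # SNPs with quotient_c(x, r_star) >= 1 are declared associated with the outcome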

Probability density function of Quotient C
The Quotient C is a function of two independent random variables: X := U_i(k_i) ∼ χ²_1 and Y := g_(K)(x_i), where the latter follows the distribution of the maximum given in (5-3). In this way, the joint probability density function between X and Y is the product of their marginal densities for x ≥ 0 and y ≥ 0, and 0 otherwise.

To find the distribution of the proposed test statistic, first the joint density between X and C was found, and then the marginal probability density function of Quotient C was obtained.
The joint probability density function between X and C was found by the method of transformation [83,84,85], which is based on the following proposition: let X be a random k-vector with Lebesgue p.d.f. f_X and let Y = g(X), where g is a Borel function from R^k to R^k. Suppose A_1, ..., A_m are disjoint sets whose union has a complement of Lebesgue measure 0, and that g on each A_j is one-to-one with a non-vanishing Jacobian, i.e., the determinant Det(∂g(x)/∂x) ≠ 0 on A_j, j = 1, ..., m. Then Y has the following Lebesgue p.d.f.: f_Y(x) = Σ_{j=1}^{m} |Det(∂h_j(x)/∂x)| f_X(h_j(x)), (5-5) where h_j is the inverse function of g on A_j, j = 1, ..., m.
Now, let h(x, y) = c = x/y and assign X a value x ≥ 0. Denoting u_XC(x, c) the joint probability density function between X and C, this function is found by applying (5-5); multiplying powers with the same base and applying properties of the exponents yields expression (5-6). By integrating the joint density function between X and C with respect to x, the p.d.f. of Quotient C given in (5-7) is obtained. Now, let us consider the following arrangement:

where g(·) is the probability density function of a random variable that follows a gamma distribution with parameters (r* + 1)/2 and 2/c, and X ∼ Gamma((r* + 1)/2, 2/c). In this way, the probability density function of Quotient C is given by expression (5-7).
Intuitively, the expression of f_C(c) has the following limitations when K → ∞: (i) E(X) → ∞, so f_C(c) can take excessively large values, and (ii) Γ(r*/2)^K → ∞, so f_C(c) → 0. Computationally speaking, the largest and smallest positive values that a conventional computer accepts are approximately 1.7976 × 10^308 and 2.225 × 10^−308, respectively; larger values are treated as Inf and smaller ones as 0. Therefore, for the estimation of the p-values of the SNPs, the following adjustments were made: (i) when the computer assigned an infinite value, it was replaced by 1.7976 × 10^308, and (ii) when the computer assigned a value of 0, it was replaced by 2.225 × 10^−308.
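A small R helper reflecting this adjustment (the name clamp is illustrative) could be:

    # Numerical guard used when evaluating f_C(c): replace overflow and underflow
    # with the extreme double-precision values quoted above
    clamp <- function(v) {
      v[is.infinite(v) & v > 0] <- 1.7976e308
      v[v == 0]                 <- 2.225e-308
      v
    }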

Estimation of the probability density function of Quotient C
Figures (A-1) to (A-4) show the plot of f_C(c) for different values of r* and K. It can be seen that when this adjustment is applied, for values of K larger than 308 (such as 1000 and 10000), the probability density function of the Quotient C is affected (Figures A-2 to A-4). Although an analytic expression of the probability density function f_C(c) was found, it is not possible to calculate it exactly, or it is very difficult to calculate it with the desired degree of precision, due to the limitations previously exposed, so a Monte Carlo integration was performed. The figure shows an illustration of the Monte Carlo integration of the probability density function of Quotient C for the two datasets used and r* = 1.
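As a sketch of the kind of Monte Carlo integration described here, f_C(c) = ∫ u_XC(x, c) dx can be approximated by importance sampling; in the code below, u_xc is a placeholder for the joint density in (5-6), which is not reproduced in this document:

    # Importance-sampling Monte Carlo approximation of f_C(c) = integral of u_XC(x, c) dx;
    # u_xc must be supplied by the user as a function of (x, c)
    mc_fc <- function(c_val, u_xc, B = 1e5) {
      x <- rexp(B, rate = 1)                     # proposal density h(x) on [0, Inf)
      mean(u_xc(x, c_val) / dexp(x, rate = 1))   # estimate of the integral over x
    }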

Maximum likelihood estimation of r*
The probability density function of the maximum is given by (5-3); let L(r*) be the likelihood function of g_K(x_K). The part of this function that involves the parameter is the kernel and, since the maximization of the likelihood function is done with respect to the parameter, the rest is irrelevant. Therefore

Table 5-1.: Arguments used in the estimation of r* through the function optimize in R.

  Argument   Description                                                          Value
  f          The function to be optimized; it is either minimized or maximized    Equation (5-8)
             over its first argument.
  lower      The lower end point of the interval to be searched.                  1
  upper      The upper end point of the interval to be searched.                  100
  tol        Tolerance, the desired accuracy.                                     0.0001220703
We then have expression (5-9). As can be seen, the analytic solution to (5-9) is not simple, so the estimation of r* was performed through computational methods.
Estimation of r* through the function optimize in R

The function optimize of the stats package in R allows finding, in an easy way, a maximum (or a minimum) of a function f with respect to its first argument over [a, b], an interval of possible values of the parameter to be estimated, where a = lower (the minimum possible value of the parameter) and b = upper (the maximum one); these are two arguments to be defined in R. The method used is a combination of golden section search and successive parabolic interpolation [42], and it was designed for use with continuous functions [43].
The descriptions of the arguments used are given in Table (5-1). L(r*) was used as the function f because it showed better behaviour when optimized. Appendix B shows the behaviour of f for different values of X_i and K.
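A hedged R sketch of this estimation step, mirroring the arguments of Table (5-1), is given below; the log-likelihood is an assumed form built from g_K(x; r*) and stands in for equation (5-8), which is not reproduced here:

    # Sketch of the r* estimation via optimize(); loglik_r is an assumed stand-in for (5-8)
    loglik_r <- function(r, x) {
      K <- length(x)
      sum(log(K) + (K - 1) * pchisq(x, df = r, log.p = TRUE) + dchisq(x, df = r, log = TRUE))
    }
    x_obs <- rchisq(500, df = 1)                       # illustrative chi-square statistics
    fit <- optimize(loglik_r, lower = 1, upper = 100, tol = 0.0001220703,
                    maximum = TRUE, x = x_obs)
    fit$maximum                                        # estimated r*

The search interval [1, 100] and the tolerance simply mirror the values reported in Table (5-1); the actual likelihood used in the thesis is the one defined by (5-8).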

Statistical power
The statistical power according to the classical adjustment methods, as well as to Quotient C, was calculated using the approach proposed by Dudoit, Shaffer and Boldrick (see equation (5-11)) [44]. They propose that the concept of statistical power can be generalized in different ways depending on whether single or multiple hypotheses are tested. They define the statistical power in the following three ways [87]:

The probability of rejecting at least one false null hypothesis, which equals Pr(S ≥ 1) = Pr(T ≤ k_1 − 1).
The average probability of rejecting the false null hypotheses, or average power, defined as E(S)/k_1, and the probability of rejecting all false null hypotheses, Pr(S = k_1) = Pr(T = 0).
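In a simulation where the truth is known, the per-run ingredients of these three definitions can be computed directly and then averaged over runs; the following R sketch (with illustrative names) shows one way to do it:

    # Per-run ingredients of the generalized power definitions, given a known truth
    # (truth_false marks the false null hypotheses) and a rejection indicator vector
    power_summary <- function(reject, truth_false) {
      S  <- sum(reject & truth_false)        # correctly rejected false nulls
      k1 <- sum(truth_false)
      c(any = as.numeric(S >= 1),            # contributes to Pr(S >= 1)
        avg = S / k1,                        # contributes to E(S)/k1
        all = as.numeric(S == k1))           # contributes to Pr(S = k1)
    }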
Also, the authors define the estimated statistical power, denoted ζ̂, using the false discovery rate (FDR). The FDR, proposed by Benjamini and Hochberg [56], is defined as FDR = E(Q), where Q is a measure based on the number V of type I errors, the false discovery proportion (FDP), formally defined in (2-5).
The false discovery rate is equivalent to E(V/R | R > 0) Pr(R > 0). Also, R ∼ Bin(K, π) [41], where the maximum likelihood estimate of π, denoted π̂, is π̂ = r/K, with r the observed number of rejected null hypotheses.

Application of the proposed methodology to the dataset of Single Nucleotide Polymorphisms associated with hardness of meat in sheep

General results
The study population was classified into 61 sheep with hard meat and 69 with tender meat. The minor allele frequency ranged from 0.76% to 11.5% in the whole sample (mean = 5.37%), from 0.76% to 10.9% in cases (mean = 5.07%) and from 0.76% to 11.5% in controls (mean = 5.37%). Table (5-2) shows the number of SNPs associated and non-associated at p ≤ 0.05 in an unadjusted analysis, as well as according to the classical multiple testing procedures and Quotient C.

Table 5-2.: Number of SNPs associated and non-associated with hardness of meat (p ≤ 0.05) according to each procedure.

                    Unadjusted  Bonferroni  Holm    Hochberg  B-H     Quotient C
  Associated        4240        2606        2606    2606      2795    3787
  Non associated    26536       28170       28170   28170     27981   26989
  Total             30776       30776       30776   30776     30776   30776

  Abbreviations: B-H, Benjamini and Hochberg's method.
Without any correction, the number of SNPs associated with hardness of meat in sheep was 4240, an amount that decreases to 2606 when the procedure of Bonferroni, Holm or Hochberg is applied; these methods detected 61.48% of the SNPs found associated in the unadjusted analysis. The Benjamini and Hochberg correction identified a higher number of Single Nucleotide Polymorphisms associated with the outcome (2795) than the FWER-based procedures, which represents 7.22% more polymorphisms. On the other hand, Quotient C identified 3787 SNPs as associated, a higher number than the classical methods (45.31% more than the FWER methods and 35.49% more than Benjamini and Hochberg's) and fewer than without correction. For illustrative purposes, Table (5-3) shows some of the SNPs detected as associated by Quotient C and compares their p-values with those of the other methods; the SNPs were randomly selected and then sorted in ascending order of their r* values.

The Manhattan plots show a higher number of polymorphisms detected without correction than with Bonferroni's, Holm's and Hochberg's proposals. Also, since the polymorphisms detected as associated by the FWER-controlling methods were the same, the Manhattan plots were identical for the three procedures. On the other hand, the profile of SNPs associated with the hardness of meat was different when the Benjamini & Hochberg method was applied (Figure 5-4, (A)), which implies that not only the number of associated polymorphisms differs when this method is used, but also the SNPs identified and their p-values. In Figure (5-4, (B)), the Manhattan plot of Quotient C is presented, showing a clear difference not only in the number but also in the profile of the polymorphisms compared with the Manhattan plots of the other proposals.


Statistical power
Table (5-4) reports the statistical power according to K and to the adjustment method. It can be seen that when K is less than 2606, the statistical power of all the methods is similar and close to 1. However, Bonferroni's, Holm's and Hochberg's procedures showed an abrupt drop in power for K ≥ 2607 (2606 being the number of SNPs detected as associated by these methods). Benjamini & Hochberg's correction had a higher statistical power than the other classical methods, even for K ≥ 2795, and its value decreased to 0 when K = 2959 (the number of SNPs associated with the outcome using this procedure), showing a less abrupt decline than Bonferroni's, Holm's and Hochberg's procedures.
When Quotient C was applied to detect SNPs associated with the outcome, the statistical power was higher than that of the other methods for multiple comparisons. Compared to Bonferroni's, Holm's and Hochberg's proposals, when the latter reached zero power, Quotient C still had values close to 1, and even for 3787 SNPs ζ̂ = 0.604. The values of ζ̂ for Benjamini & Hochberg's correction were lower than those of Quotient C for all K; thus, when ζ̂ = 0 for the Benjamini and Hochberg method, the statistical power of Quotient C was close to 0.817. For Quotient C, ζ̂ = 0 when K = 15267, and for K = 3787 the probability of making a type II error was 0.396 (which implies that for every 100 false null hypotheses, about 40 are erroneously not rejected).

Application of the proposed methodology to the dataset of SNPs from HLA-DRB1 associated with multiple sclerosis in an admixed Colombian population

General results
The corresponding table reports the raw and adjusted p-values, as well as the values of r*, C and p for Quotient C, for each one of the SNPs studied. Graph (5-6) shows a deviation of the raw p-values from the dotted line, which indicates a non-uniform profile and makes it possible to suspect that many of the tested hypotheses are false.
In an unadjusted analysis, two SNPs (DRB1*15 and DRB1*14) were associated with multiple sclerosis (MS); when a classical correction was applied, only DRB1*15 remained associated with the outcome under all the multiple testing procedures. Quotient C identified only this same SNP as associated with MS (p = 0.047).

Statistical power
The statistical power of the classical multiple testing procedures for K = 1 was similar for all of them and higher than that of Quotient C. Thus, for K = 1, the probability of rejecting the null hypothesis when there is an association between the SNP and the outcome was 0.56 when Bonferroni's, Holm's, Hochberg's and Benjamini and Hochberg's corrections were applied, and 0.21 for Quotient C. When K = 2, ζ̂ = 0 for the classical methods, while the statistical power of Quotient C remained at its K = 1 value. The value of ζ̂ was 0 for Quotient C when K ≥ 3 (see Table (5-6)).

Simulations
Table (5-7) reports the results of the simulation performed using the dataset of Single Nucleotide Polymorphisms associated with hardness of meat in sheep. Among the sample of SNPs detected by all the methods (782), all the compared procedures detected 100% of them in each one of the 1000 simulations. On the other hand, in the sample of polymorphisms detected only by Benjamini and Hochberg, Bonferroni's procedure did not identify any of them as associated, while Quotient C detected approximately 17-18 of those SNPs in each simulation. Finally, Quotient C identified during the simulations nearly 305-308 of the 308 SNPs detected only by this proposal, while the other two methods did not find any of them as associated with the outcome. The simulation study shows that Quotient C and Benjamini and Hochberg's proposal are consistent with respect to the SNPs that each of them identifies as associated: the number of associated polymorphisms was always equal to the sample size for both methods. In addition, over the 1000 repetitions Quotient C identified SNPs that Benjamini and Hochberg's method did not, and it also recovered some of the polymorphisms identified only by Benjamini and Hochberg.

Discussion
The classical methods (Bonferroni, Holm and Hochberg) handle multiple testing by controlling the familywise error rate (FWER), given by FWER = P(V > 0) = P(Q > 0) (the probability that R contains any error) [41,54,55,98]. Meanwhile, Benjamini and Hochberg's procedure controls the false discovery rate (FDR), or E(Q) (the expected proportion of errors among R) [56]. For these methods, the researcher selects an error rate to control the type I error and the procedure finds a collection R according to an a priori criterion (generally the significance level α). This procedure is straightforward regarding the control of false positives, but it affects power enormously [90]. Moreover, given the number of statistical tests performed on genomic data, this kind of multiple testing correction is not the best choice.
As shown, the test statistic proposed here, Quotient C, solves several problems of the previous procedures, since it corrects the type I error and increases power at the same time, while being simple to apply to SNP data and a categorical trait. In the dataset of Ariza et al. [48], Bonferroni's, Holm's and Hochberg's procedures detected fewer SNPs associated with the outcome than Benjamini and Hochberg's method, and this amount was in turn lower than the number of polymorphisms detected by Quotient C. The FWER-based methods are considered very strict and therefore very conservative at level α, because the probability of wrongly rejecting a true null hypothesis is ≤ α [60,91,92,62]. Controlling the familywise error rate implies preventing type I errors at all cost (thus, when FWER-based procedures are used, very few hypotheses are rejected, or none at all, but the individually rejected hypotheses are reliable) and, consequently, the number of type II errors is high [41,62,93,94]. These errors can be very expensive in a clinical, biological or animal-breeding context: if a type II error is committed and false negatives appear, a disease is not detected or, in the genomic context, a SNP is declared as not associated with a particular trait when it really is associated with it. Although Bonferroni's, Holm's and Hochberg's methods control the same type of error, and the last two are considered less conservative [93,95], in the two datasets used the number of rejected null hypotheses was the same for all the FWER-based methods. Benjamini and Hochberg's proposal is considered less conservative than the FWER-controlling methods [96,97] and its critical values are much larger, so usually more rejections can be made; in the hardness-of-meat dataset, 189 more SNPs were detected as associated with the outcome than with the FWER methods.
In the FWER approach, if the set of hypotheses R satisfies FWER ≤ α, each one of the hypotheses in R has a probability ≤ α of being a type I error; meanwhile, in FDR-controlling methods, the collection R has on average a probability α on that subset, not on each individual hypothesis [41,98,99]. That is an advantage of FWER-based procedures over FDR-based procedures. Although the latter is considered one of the best options in multiple testing problems, its usefulness is still criticized because of the properties of the control and the implications of its conclusions; an open question is how risky it can be to draw conclusions from the individually rejected hypotheses [41,100,111]. Since FDR = E[Q], the collection R can have FDP > α; that is, controlling the FDR at a given level α still allows the proportion of false rejections in R to exceed the type I error rate allowed in the study. Moreover, since the set R only satisfies FDP ≤ α on average, this measure is not the same for every sub-collection of R, which can have higher or lower values of FDP than the complete collection. Thus, R is likely to contain false positives, and an individually rejected hypothesis can indeed be a false positive [41,102,108]. A further disadvantage is the interpretation of the adjusted p-values: since they reflect a property of the entire collection R and not of each individual hypothesis, decisions on individual SNPs are risky.
For both data sets, all the SNPs declared as associated by Bonferroni's, Holm's and Hochberg's methods were also among those detected by Benjamini and Hochberg's method. The FWER and the FDR are related: since Q = V/R satisfies 0 ≤ Q ≤ 1, then FDR ≤ P(Q > 0), which implies that every method that controls the FWER also controls the FDR [41].
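The inequality follows in one line from the definition Q = V/R (with Q = 0 when R = 0), since Q never exceeds 1:

```latex
\mathrm{FDR} = \mathbb{E}(Q)
             = \mathbb{E}\!\left(Q \,\mathbf{1}_{\{Q > 0\}}\right)
             \le 1 \cdot \Pr(Q > 0)
             = \Pr(V > 0)
             = \mathrm{FWER}.
```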
Since E(Q) ≤ P(Q > 0), the FDR is at most equal to the FWER, and it is easier to keep the FDR below α than to keep the FWER at the same level. The expected result, therefore, is that FDR-controlling procedures have more power than FWER-based methods. Keeping the level at α has an important impact on the errors committed: FWER-controlling methods introduce a larger number of type II errors, so their statistical power is lower than that of methods that control the false discovery rate. Benjamini and Hochberg's proposal is considered the most powerful procedure in multiple testing problems because FDR control is less stringent than FWER control [104,100,105]. In both data sets, the statistical power was similar for the classical methods until the number of SNPs reached the number declared as associated by the FWER-controlling methods; when K > #R, Benjamini and Hochberg's statistical power was higher, even beyond the number of SNPs declared as associated by this proposal. Relaxing the criterion from FWER to FDR, and then to Quotient C, clearly makes an important difference in terms of improving statistical power.
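The effect of this relationship on power can be made concrete with a small Monte Carlo sketch. The setting below (independent z-tests, an arbitrary number of false null hypotheses and an arbitrary effect size) is an assumption chosen purely for illustration and is not the simulation reported in this thesis.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
K, K_false, alpha, effect, n_rep = 1000, 50, 0.05, 3.5, 1000
is_false_null = np.zeros(K, dtype=bool)
is_false_null[:K_false] = True                        # the first K_false hypotheses are truly associated

results = {"bonferroni": [], "benjamini_hochberg": []}
for _ in range(n_rep):
    z = rng.normal(size=K) + effect * is_false_null   # shifted test statistics under false nulls
    p = 2 * norm.sf(np.abs(z))                        # two-sided p-values

    rejections = {"bonferroni": p <= alpha / K}
    order = np.argsort(p)
    ok = p[order] <= np.arange(1, K + 1) / K * alpha  # Benjamini-Hochberg step-up condition
    bh = np.zeros(K, dtype=bool)
    if ok.any():
        bh[order[: int(np.max(np.where(ok)[0])) + 1]] = True
    rejections["benjamini_hochberg"] = bh

    for name, r in rejections.items():
        V = np.sum(r & ~is_false_null)                # false rejections
        R = max(int(np.sum(r)), 1)
        results[name].append((np.sum(r & is_false_null) / K_false,  # power
                              float(V > 0),                          # any false positive?
                              V / R))                                # false discovery proportion

for name, rows in results.items():
    power, fwer, fdr = np.mean(np.array(rows), axis=0)
    print(f"{name:20s} power={power:.3f}  FWER={fwer:.3f}  FDR={fdr:.3f}")
```

In such a setup, Benjamini and Hochberg typically rejects more of the truly associated hypotheses (higher power) at the cost of a larger, but still controlled, false discovery proportion, which is exactly the trade-off described above.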
We propose two applications of the collection R defined using Quotient C: (i) as a test statistic by itself to evaluate the association between SNPs and an outcome at a level α, allowing conclusions to be drawn from individually rejected hypotheses (a consequence of being a FWER-controlling method); and (ii) in a more conservative way, to define an exploratory set of hypotheses to be tested later with classical multiple comparison procedures (i.e. as a preliminary filter). In both cases, the less stringent condition, under which H_k, k = 1, 2, ..., K, is declared associated if it belongs to the maximum of the distribution, has been shown to improve the statistical power (and to detect a larger number of associated SNPs) compared with the classical methods, principally those that control the FWER.
In the Quotient C proposal, no assumptions were made beyond independence between the hypothesis tests (none were imposed on the p-values). In multiple testing problems, mainly in genomics, some dependence structure of the p-values must be considered (no dependence, positive dependence through stochastic ordering, or an adapted dependency structure of the p-values estimated through permutations) [41]. Generally, the data show a degree of internal correlation in the form of statistical dependencies between variables [89,106]. Bonferroni's and Holm's proposals are valid under any dependence structure of the p-values, while Hochberg's procedure, like Benjamini and Hochberg's, is valid if the PDS assumption holds [41,107,108]. The worst-case scenario for the dependency structure was considered in our proposal, and more conservative results were obtained with respect to other methods that assume a different dependence structure of the p-values. However, future studies must be performed to establish how the proposed statistical test behaves under other assumptions on the p-values. Ignoring the dependence between the hypothesis tests (and their respective p-values) can result in bias due to the a priori unknown correlation, or in highly variable significance measures [57,109]. Other authors [106] have shown that correlation may substantially increase the bias and variance of the false discovery rate estimate with respect to the independent case and that, in some settings such as an exchangeable correlation structure, the estimator fails to be consistent as the number of tests becomes large. The current consensus is that, for genomic data, it is safe to use methods based on some dependence structure of the p-values [41,110,111].
From the profile of the raw p-values in both datasets, some doubts remain about their dependence structure; although uniformity of the p-values of the true null hypotheses is expected on average, correlation between them can explain the shape of those figures [41,112]. This is true especially for the data set of HLA-DRB1 associated with multiple sclerosis in an admixed Colombian population. In GWAS, usually very few SNPs are expected to be associated with the outcome, and therefore few null hypotheses are false; for this reason, a deviation from the uniform shape of the histogram, or from the dotted line on the sorted p-values, is also taken as an indication of lack of model fit or of an inappropriate formulation of the null hypotheses [41,113,114]. Without further analysis, it is not possible to attribute the observed deviations from uniformity to some dependence structure of the p-values or to the presence of false null hypotheses.
The statistical problem addressed here is to find a new method to establish associations between exposures (alleles of SNPs) and one dichotomous outcome. This univariate view of the problem is not completely realistic: in most diseases, as in complex diseases, a single genetic variation is not enough to explain an outcome. Behavioral and environmental factors mediate the way in which a genetic variant is expressed to produce a particular trait; in other situations, the outcome results from interactions between genetic variations and external factors. A broader approach to the statistical and biological problem requires a multivariate view. Quotient C provides a univariate solution to this particular problem in genome-wide association studies, and further work should be directed at a multivariate extension of the proposed method, controlling the effect of multiple independent variables (including interactions between SNPs) on the outcome.
From expressions (5-7) it is possible to see that when K, the number of tested SNPs, is very large, a common situation in GWAS, the calculation of f_C(c) becomes problematic. Genome-wide association studies often involve hundreds, thousands or even millions of SNPs tested simultaneously, so K is expected to be large and, computationally, the implementation of Quotient C can be limited. This limitation has a clear impact on the utility of Quotient C in terms of precision and of its acceptance as an alternative by the scientific community and a more general public.
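The numerical side of this limitation can be illustrated with a small sketch. It assumes, purely for illustration, that the quantities involved behave like those of the maximum of K independent chi-square statistics, with density K f(c) F(c)^(K-1) and tail probability 1 - F(c)^K; the exact expressions (5-7) of Quotient C may differ, but the remedy, working in log space instead of forming K-fold products directly, is the same.

```python
import numpy as np
from scipy.stats import chi2

def log_max_density(c, K, df=1):
    """log of K * f(c) * F(c)**(K-1); the (K-1)*log F(c) term is the piece that
    underflows when F(c)**(K-1) is formed directly for large K."""
    return np.log(K) + chi2.logpdf(c, df) + (K - 1) * chi2.logcdf(c, df)

def max_tail_prob(c, K, df=1):
    """P(max > c) = 1 - (1 - sf(c))**K, computed with log1p/expm1 to avoid
    the cancellation of the naive formula."""
    return -np.expm1(K * np.log1p(-chi2.sf(c, df)))

K = 1_000_000
print(K * chi2.pdf(5, 1) * chi2.cdf(5, 1) ** (K - 1))   # naive density underflows to 0.0
print(log_max_density(5, K))                             # log-density remains finite and usable
print(1 - chi2.cdf(80, 1) ** K, max_tail_prob(80, K))    # naive tail collapses to 0.0; stable version stays nonzero
```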
Although Quotient C was originally designed to solve a particular statistical problem derived from a biological problem, its usefulness can be extended to other scenarios in which associations between dichotomous variables must be found in the context of a multiple testing problem, such as the social sciences, economics, medicine, epidemiology, agronomy and animal health, among others. In this work, we have shown that the classical methods are not the only solution to this type of situation. Some of the most important questions about Quotient C remain open, and future work must be performed to understand clearly the limits of the proposal.

Conclusions and perspectives
Quotient C represents a new option for solving the problems generated in multiple testing, with a direct application to GWAS and other scenarios. It provides a less stringent criterion and allows the declaration of a larger number of associations between SNPs and dichotomous outcomes compared with the classical methods used to correct for multiple testing, while keeping the probability of incorrectly rejecting a true null hypothesis (type I error) equal to or less than α.
The proposed methodology has a lower type II error rate and better statistical power than Bonferroni's, Holm's, Hochberg's and Benjamini and Hochberg's methods. Improving the statistical power without increasing the sample size, and without neglecting the high dimensionality of the problem, is a clear advantage of Quotient C over the previous proposals, since it minimizes the number of false negatives in the results. In addition, the procedure is simple to implement, and its interpretation is easy both for statisticians and for the scientists who produce the data.
Future work must be performed to establish the validity of the proposed methodology under other assumptions and to expand the proposal towards a broader approach to the statistical and biological problem.

Ethical considerations
This thesis was developed following the Ethical Guidelines for Statistical Practice prepared by the Committee on Professional Ethics of the American Statistical Association [115]. For the development and application of the proposed methodology, no participants were recruited and no biological samples of any type were obtained; in addition, no animals were sacrificed. To reach the proposed objectives, the data sets provided by Ariza et al. [48] and Toro et al. [50] were used. The research Evaluación genómica asociada a características de calidad de la carne de ovino de pelo criollo colombiano was approved by the Comité de Bioética of the Faculty of Veterinary Medicine and Zootechnics at the Universidad Nacional de Colombia. Likewise, the research HLA-DRB1*14 is a protective allele for multiple sclerosis in an admixed Colombian population was approved by the Comité Corporativo de Ética en Investigación at Fundación Santa Fe de Bogotá. Both committees ensured that the original research that collected the data in each of the data sets used in this thesis complied with ethical principles for conducting research in animals and humans, as appropriate in each case.