On the approximation of interaction effectmodels by Hadamard powers of the additive genomic relationship

Whole genome epistasis models with interactions between different loci can be approximated by genomic relationship models based on Hadamard powers of the additive genomic relationship. We illustrate that the quality of this approximation reduces when the degree of interaction d increases. Moreover, considering relationship models defined as weighted sum of interactions of different degree, we investigate the impact of this decreasing quality of approximation of the summands on the approximation of the weighted sum. Our results indicate that these approximations remain on a reliable level, but their quality reduces when the weights of interactions of higher degrees do not decrease quickly. © 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
With the broad availability of genomic data of individual animals or plant lines, genomic prediction (Meuwissen et al., 2001) has been widely implemented in modern breeding programs (Hayes et al., 2009;Jannink et al., 2010;Meuwissen et al., 2016;Crossa et al., 2017). The standard method genomic best linear unbiased prediction (GBLUP) is based on the commonly used additive effect model (Falconer and Mackay, 1996;Gianola et al., 2009). Given the n × p matrix M describing the marker states of the n individuals at p loci, the additive effect model is defined by y = 1 n µ + Mβ + ϵ. (1) Here, y is the n × 1 vector of phenotypic data, 1 n an n × 1 vector with each entry equal to 1, µ a fixed effect, β a p × 1 vector of marker effects and ϵ an n × 1 vector of errors. Moreover, usually the additional assumptions of β ∼ N (0, σ 2 β I p ) and ϵ ∼ N (0, σ 2 ϵ I n ) are made. I p and I n denote the Identity matrix of respective dimension (Note that the term GBLUP usually refers to a reformulated version of Eq. (1) using g := Mβ, but this distinction will not be used here since both models are statistically equivalent).
Having estimated/predicted the relevant parameters asμ and β, the predicted effect of a change at an arbitrary locus k is independent of the state of other markers. This characteristic seems contrary to the function of biological systems which rely on interaction and where thus the effect of a change at locus k is assumed to depend on the genetic background. The discrepancy between the intrinsic logic of the statistical additive model and the mechanistics of biological processes may provide a motivation to consider ''non-additive" relationships for the prediction of (non-additive) total genetic values or phenotypes (de los Campos et al., 2009;Ober et al., 2011). An epistasis model extending the additive setup of Eq. (1) with products of markers as additional predictors (Ober et al., 2015;Jiang and Reif, 2015;Martini et al., 2016) is called the extended genomic best linear unbiased prediction (EGBLUP) (Jiang and Reif, 2015). In more detail, this pair epistasis model is defined by k=1,...,p−1;l>k M i,k M i,l Here, µ and β are as previously defined and M i,• denotes the ith row of M, that is the genomic data of individual i. Moreover, h k,l is the interaction effect of loci k and l with h k,l i.i.d ∼ N (0, σ 2 h ) and all random effects being stochastic independent from each other. It has been demonstrated that model (2) is equivalent (Martini et al., 2016) to a model y = 1 n µ + g 1 + g 2 + ϵ The operator • denotes here the Hadamard, that is the entry-wise product. This model and some variations have been shown to be able to increase predictive ability for some incidences and compared to the additive GBLUP model (Su et al., 2012;Ober et al., 2015;Jiang and Reif, 2015;Martini et al., 2016 types of relationship matrices have also been used to control genetic background effects in association studies (Xu, 2013). In Eq.
(2), the interactions are modeled pairwise and only between different loci (l > k). Some variations of this model have been used in literature (Jiang and Reif, 2015;Martini et al., 2016) defined by allowing interaction of the loci with themselves (l ≥ k) or modeling the p 2 interactions by counting the interaction between different loci twice (k, l = 1, . . . , p) It has been shown that the interaction terms of Eqs. (5)-(6) translate to covariance matrices of g 2 of Eq. (3) the following way (Martini et al., 2016): Moreover, it has also been demonstrated that for higher degrees of interactions d, the sum of all p d d-wise interaction terms translate to Hadamard powers of G (Martini et al., 2016): l,m,o=1,...,p M i,k M i,l M i,m M i,o h k,l,m,o= and analogously for any degree d. We are not aware of a general concise formula that generalizes Eq. (4) to a general higher degree d (the cases d = 3 and d = 4 are treated in the Appendix.) Note here that Eq. (9) also includes terms of form M 3 i,k that is a three way interaction within a locus. Terms of this type of intra-locus interactions of higher degree d may be difficult to interpret from a quantitative genetics point of view.
It has been argued in the context of a specific marker coding (VanRaden, 2008) that for increasing number of markers p, the quality of an approximation of Eq. (4), which models only the interactions between different loci, by Eq. (8), which models p 2 interactions, improves (Jiang and Reif, 2015). The reason for this improving quality of approximation is roughly spoken a result of (M • M)(M • M) ′ getting relatively small compared to (G • G), and the factor 0.5 being compensated by an adapted estimate of the variance componentσ 2 h .
For the case of d = 2, Eq. (4) allows to use model (2) grows very fast. We show that the argumentation which illustrates that for d = 2 and increasing p the quality of an approximation of Eq. (4) by Eq. (8) improves, can analogously be adapted to any fixed degree d and increasing p. This means that for any fixed degree of interaction d and increasing p, the quality of the approximation of a model based on interactions between different loci by a Hadamard power of the additive genomic relationship improves. A different situation -which is important for limit considerations of models with increasing degree of interaction -is increasing d for fixed p. The incorporation of higher degree interactions can lead to new relationship models that aim at reflecting biological complexity better. We show that the quality of the approximation of a model with interactions between different loci by Hadamard powers of G reduces when p is fixed and d increases. Moreover, we investigate the limit behavior of weighted sums of interactions of increasing degree, in particular, their reliability when substituting interactions between different loci by Hadamard powers of the additive relationship. We show that the approximation may be less reliable if the weights of higher degree interactions do not decrease fast enough.
As a remark please be aware of the problem that the coding of the markers has an impact on epistasis models which use the products of marker values as additional predictor variables (He and Parida, 2016;Martini et al., 2017Martini et al., , 2019. However, this topic of how to code markers will be ignored in this manuscript. The presented results are independent of the coding of markers.

Some words on an improvement of the approximation
Let us first make some theoretical considerations on what ''the quality of an approximation of Eq. (4) by Eq. (8) improves'' shall mean. We definẽ G p denotes here the additive genomic relationship matrix based on p markers. Considering the fact that a factor c ∈ R + can be compensated by estimating a different variance component, a first idea to make the statement on improving the quality of the approximation more precise could be: This expression has earlier been used (Jiang and Reif, 2015) with c = 2 and for the specific situation of an allele-frequencycentered and scaled additive genomic relationship according to VanRaden (2008). (Note again that even though subtracting the mean from each column will not have an effect on the prediction of additive effects (Strandén and Christensen, 2011;Martini et al., 2017), it has been shown that this transformation has an impact if the centered values are multiplied to model interactions (He and Parida, 2016;Martini et al., 2017Martini et al., , 2019.) In the current setup -with G := MM ′ -expression (11) raises some questions and would require case distinctions or restrictions. Additionally to the formal question of which metric to use to define the limit, there are more conceptual questions. For instance it is not clear how it should be decided which marker pattern a new column has when another marker column is added. Without a restriction on how to add a new column, one can find examples for which the limits in Eq. (11) are not defined or for which a convergence of the entries to 0 leads to a situation in which Eq. (11) is satisfied, but the approximation of the two matrices does not improve. Some examples can be found in the Appendix.

A simple criterion for the quality of the approximation
To avoid these complications, we choose a simple way to characterize how good the approximation is for fixed p and d = 2. This criterion will not solve the problem of how to add new marker columns when p increases, but gives a simple and well-defined equation.
The model described in Eq. (8) models p 2 interactions consisting of the interactions which we want to model (each of them twice) and additionally the p interactions of markers with themselves which are not included in Eq. (4). Since model (4) is a submodel of G • G, and since each interaction has the same influence on the relationship matrix due to assuming their effects to be independent and identically distributed, we can define a measure for the ''error'' of the approximation as the proportion of interactions which we model in G • G but which are not included in Eq. (4).
In this case of d = 2, this is given by the p 2 interactions which we model minus twice the E 2 describes the error as portion of interactions which we model in our approximation but which are not included in Eq. (4). We see that which confirms relatively easily that G • G is a good approximation for Eq. (4) if p is large. Note here that the difference between Eqs. (2) and (6) which is here -for degree 2 -(not) including interactions of a marker with itself, can also be considered as the difference of the possible events when drawing from {1, . . . , p} with or without replacement. We have to subtract from a set of events of drawing with replacement, those that are not possible when drawing without replacement. This view may facilitate to comprehend the relation between the two models when considering higher degrees d afterwards.
Expression E 2 (p) is well-defined, but also only guarantees that the approximation of H (2) p is constantly the 0 2×2 matrix. Thus, the approximation will not improve. The reason is here that the only interaction which is not zero is the interaction of the first marker with itself for individual 1. An improving approximation will only be given if the values of

General degree d
Analogously, for general d ≤ p, Eq. (12) generalizes to The term is built by subtracting the portion of required interactions (and their d! permutations) form 1. Also here, we see that However, for fixed p and increasing degree, the quality of approximation reduces and the error E d (p) reaches even 1 when d is larger than p (recall that the binomial coefficient is defined as being zero when d > p): This means that is the error tends to 0 when d is fixed and p is increasing, but it approaches 1 when p is fixed and d is increasing. In parts, this is an obvious result because dealing with a model with p markers, we can calculate the (p + 1)th Hadamard power G •(p+1) , but there is no interaction between p + 1 different loci.
Also note that E d (p) is a strictly monotonously increasing function for fixed p and increasing d ≤ p since A proof of this statement can be found in the Appendix.
To illustrate this observation, let us consider a small example.
Example 1. Let us consider the case of five markers (p = 5) and the approximation of degree five (d = 5). Then Example 1 illustrates that the quality of the approximation can decrease quickly when the number of markers is very small.

Approximations for real genotypic data
Let us consider a data set of real genotypes. We use the marker data of a wheat data set published by Crossa et al. (2010) and also provided by the R package BGLR (Pérez and de Los Campos, 2014). For more information on the data set, see Crossa et al. (2010).
We use a {±1} coding of the marker data, start with p = 1 and take the first marker of the data set to calculate G 1 = M 1 M ′ 1 and the corresponding H (1) 1 . Note that for d = 1, H (1) p = G p for any p, meaning that the additive matrices are the same, or in other words, when drawing only one-element sets, it does not matter whether we draw with or without replacement.
We then subsequently increase the number of markers by adding the following columns of the marker matrix, and calculate the matrices G •d p and H  Table 1 for p ∈ {1, . . . , 25} and d ≤ p. We used the correlation of the entries as a similarity measure of the matrices because it is a simple criterion and independent of the data structure of a phenotype y. Since H (d) p does not exist for d > p, no correlation is given for these cases.
However, it is clear that the error of the approximation in the previously discussed sense is 100% for these cases. We see that for any fixed d and increasing p, the correlation of the entries of G •d p and H (d) p increases, but for any fixed p, the correlation reduces with increasing d.
An interesting aspect is that the correlation for d = 2 is directly equal to one, already for the case of p = 2. This is a result of using markers which only have two states coded as {±1}.
Also if we consider the cases d = p, we see that the correlation of the matrices tends to 0 for increasing p, which has already been stated by Eq. (14).

Limit considerations of models with higher degree interactions
In the following, we would like to investigate empirically whether the decreasing quality of the approximation for higher degree interactions has an impact on limit considerations.
Limit Problem 1. We would like to build a model that takes all interactions of p different loci into account. We assume for this limit model that the variance component σ 2 β remains the same for any degree d, that is any interaction effect of any degree comes from the same distribution. We can formulate the model which we are interested in as Since it is computationally demanding to calculate the matrices H (d) for higher degree interaction, we are interested in an approximation using Hadamard powers G •d . As discussed above, the matrix G •d counts each interaction which we aim to model in H (d) d! times. Moreover, additional interactions are included in G •d in which we are not interested. To give an equal weight to any interaction which we would like to model, we have to divide each G •d by d! to guarantee that the weights are adapted between degrees. Without this adjustment, we would model the interactions of degree two twice giving them twice the weight of the additive effects. Analogously, the interactions of degree three would be modeled six times, giving each of them six times the weight of an additive effect.
An approximation of our desired relationship model can thus be given by where we include the inverse factorial to scale the matrices relatively to each other. Since each entry in Eq. (17) follows the exponential power series, it can be approximated for large p by Please recall that the operations are here meant entry-wise. In particular, the exponential function refers to the entry-wise exponential (and not the matrix exponential): (exp(G)) i,j := exp(G i,j ).
Moreover, note here that this limit is not identical to the Gaussian kernel in a reproducing kernel Hilbert space approach, since the exponential function is not applied to the squared Euclidean distance but to the entries of G (which is slightly different from a limit consideration leading to the Gaussian kernel and presented by Jiang and Reif (2015)). A question is how good the approximation of the covariance model which we actually would like to model (Eq. (16)) by Eq. (18) is. Although the presence of the inverse factorials, which give the weights to the Hadamard powers of G in Eq. (17), suggests that the influence of higher degree interactions will quickly vanish, a general theoretical consideration is difficult since the quality of approximation also depends on how fast the entries of G p grow.
For this reason, we use the relationship matrices calculated for the wheat data set (Table 1) as well as a {−1, 0, 1} coded maize data (for more details on the data see the section Data at the end of the manuscript) and consider the correlation of the matrices defined by Eqs. (16) and (18) for increasing p. The results are presented in Fig. 1. We see for the wheat data (blue line) an initially high correlation which is a result of only using one marker. Since we have only one marker with the two possible values {±1}, each entry of G 1 has only two possibilities. Applying the non-linear exponential function still gives a matrix with only two values which is perfectly correlated with G 1 . This high correlation is then reduced when a second marker is introduced and the correlation keeps decreasing until the increase in p improves the approximation sufficiently to push the correlation again towards 1. For the maize data (black line), the correlation starts -since we are dealing with markers with three states -on a lower level but increases quickly towards 1.

Limit Problem 2.
Let us assume that we would like to use a model in which the variance of the higher degree interaction increases by d!
It should be mentioned here that epistatic effects are usually defined as deviations from the fit defined by lower degree interactions. This concept would translate to the assumption that the variance components decrease with increasing d. However, Eq. (19) is a valid covariance model and we would like to investigate the effect of these increasing weights on the overall approximation. A reason for defining an increasing variance could be to give the higher degree interactions more flexibility to capture some important interaction terms. We may approximate Eq. (19) by which converges for increasing p to . 2. Correlation of the entries of the matrices defined by Eqs. (19) and (21) for increasing p and the maize data (coded as {−1, 0, 1}/ √ p).
iff |G i,j | < 1 ∀i, j. The latter condition of the entries having absolute values smaller than 1 is essential, since otherwise the series of Eq. (20) will not converge. Recall again that all operations are meant entry-wise. This condition of not any absolute entry being larger than or equal to 1 is for instance given when we are dealing with {−1, 0, 1} coded data, the marker data is divided by the square root of p (which is equal to dividing G by p), and if none of the lines is completely homozygous. The wheat data is not appropriate for this limit, since it has only two values and the entries on the diagonal would be equal to 1 (when dividing the markers by √ p). However, we consider the behavior of the correlation for the maize data (with dividing the marker values by √ p). Fig. 2 illustrates that -since the weights of the higher degree interactions are not reduced in Eq. (20) -the accumulated error across the different degrees d matters more than in Limit Problem 1. The correlation of the entries of the matrices defined by Eqs. (19) and (21) reduces, and a reversion of this trend cannot be observed up to p = 30, which was the maximal value for which we were able to calculate H (p) with our approach. Note here that for values of p below five, some of the included lines were completely homozygous, which leads to a situation of the maximal value of the additive relationship matrix being 1 and thus Eq. (21) not being defined. Therefore, no correlation is given for these points.

Summary and outlook
We gave an explicit formula to quantify the error when approximating a model with interactions between different loci by Hadamard powers of the additive genomic relationship (Eq. (13)). The criterion used to quantify the quality of the approximation also struggles with the problem of how to add a new column of markers when p increases, but gives a simple and well-defined equation.
We illustrated that when the number of markers p is fixed and d increases, the quality of the approximation of H (d) by G •d decreases. For limit considerations such as Limit Problem 1, where the impact of higher degree interactions reduces quickly, this reduced quality does not have a big impact on the quality of approximation of the overall limit. However -as illustrated by Limit Problem 2 -for models in which the weight is not reduced fast enough with increasing degree d, the overall limit can have a lower (but still high) correlation with what is supposed to be approximated.
Due to the computational restrictions, we were not able to calculate all H (d) with d ≤ p for p larger than 30. Thus, we cannot judge the behavior of our empirical considerations of Limit Problems 1 and 2 above 30 markers. An interesting theoretical problem -which would also allow to investigate the behavior of limits for larger values of p -would be to find a concise equation generalizing Eq. (4) to any degree d.

Data
As described above, the wheat data has been published by Crossa et al. (2010) and is also provided by the R package BGLR (Pérez and de Los Campos, 2014). The maize data was provided by the same publication (Crossa et al., 2010). We used the file dataCorn_SS_asi.RData which is available in File S1 of following link https://www.genetics.org/content/186/2/713.supplemental We reduced the set to the 101 lines which had at least one heterozygous (0) marker within the first five markers. Since calculating H (d) is computationally demanding, we restricted us to this subset for which Eq. (21) is already defined for p = 5. This reduced data set was used for Limit Problems 1 and 2.

Acknowledgments
We are thankful for the financial support provided by CIMMYT, CGIAR CRP WHEAT, the Bill & Melinda Gates Foundation, as well as the USAID projects (Cornell University and Kansas State University) that generated the CIMMYT wheat data analyzed in this study. We acknowledge the financial support provided by the Foundation for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) in Norway through NFR grant 267806. Moreover, we thank two anonymous referees, especially the one who pointed out an important error in the Appendix.

Appendix A. Extensions of Eq. (4) to d ∈ {3, 4} and illustration of the general problem
As pointed out earlier, the difference between G •d and H (d) is represented by the difference in the sets of possible events when drawing d times from {1, . . . , p} either with or without replacement. For d = 2, we can simply use the matrix corresponding to the set of interactions {1, . . . , p}×{1, . . . , p}, remove the covariance matrix coming from the tuples (i, i) and divide the remaining matrix by 2 to account for not considering the order of the draws, which means (i, j) is considered to be equal to (j, i). The matrix G •2 corresponds to {1, . . . , p} × {1, . . . , p} and the matrix Before, we go to the cases of d ∈ {3, 4}, recall that we are looking for a concise formula, that is one that can be easily calculated using Hadamard products of M and G. It is obvious that there are equations which allow to calculate H (d) , for instance the straight-forward approach used in this manuscript, but we are looking for a formula allowing the use of Hadamard products and simplifying the computation.
Let us now consider the case of d = 3. Analogously to the case of d = 2, we identify G •3 with the set of interactions described by {1, . . . , p} × {1, . . . , p} × {1, . . . , p}. We have to subtract the matrix corresponding to {(i, i) i=1,...,p } × {1, . . . , p} and its permutations. This can be represented by the matrix 3 However, this matrix also includes the covariance generated by This procedure is similar to the ''principle of inclusion and exclusion" known from basic set theory and gives the following identity Analogously, a formula for degree d = 4 can be derived.
However, we have to consider here the sets with their corresponding permutations. Moreover, an important point is to be aware of the fact that the sets are in parts subsets of others. Thus, we obtain .
Each summand on the right hand side corresponds to one of the sets mentioned above.
For d = 5, the following sets would have to be considered:

)
In particular this means that the diagonal will increase with p and ''tend to ∞", but the off-diagonal elements will alternate.
Here, the limit ofH is not defined (even if ''∞" is considered as a limit), that is Eq. (11) does not make sense for this example.
Note here that one could argue that the limit G p /p would be well-defined and that we have to use this alternative definition of the genomic relationship matrix. However, we can define a more complicated example for which G p /p will neither converge.

B.2. An example for which G p /p does not converge and thus Eq. (11)
is not defined ) .
We would like to thank an anonymous reviewer here who pointed out that -based on the above definition -we can write Consequently, G p /p has a subsequence converging to .
Thus, G p /p does not converge.
The examples described above aimed at illustrating the problem of possibly alternating entries. Another important situation to consider is the convergence of the entries of G p to zero.
Both matrices have the same limit, which means Eq. (11) is fulfilled with c = 1, but inH p the variances will always have the double weight of the covariances, whereas all entries are identical in H p . Thus, the quality of approximation when approximating one by the other should remain on the same level -independent of p.