Different types of Bernstein operators in inference of Gaussian graphical model

The Gaussian graphical model (GGM) is a powerful tool to describe the relationship between the nodes via the inverse of the covariance matrix in a complex biological system. But the inference of this matrix is problematic because of its high dimension and sparsity. From previous analyses, it has been shown that the Bernstein and Szasz polynomials can improve the accuracy of the estimate if they are used in advance of the inference as a processing step of the data. Hereby in this study, we consider whether any type of the Bernstein operators such as the Bleiman Butzer Hahn, Meyer-König, and Zeller operators can be performed for the improvement of the accuracy or only the Bernstein and the Szasz polynomials can satisfy this condition. From the findings of the Monte Carlo runs, we detect that the highest accuracies in GGM can be obtained under the Bernstein and Szasz polynomials, rather than all other types of the Bernstein polynomials, from small to high-dimensional biological networks. Subjects: Bioinformatics; Biology; Bioscience; Mathematics & Statistics; Multivariate Statistics; Science; Statistical Theory & Methods; Statistics; Statistics & Probability; Statistics for the Biological Sciences

ABOUT THE AUTHOR Vilda Purutçuoğlu is the associate professor in the Department of Statistics at Middle East Technical University (METU) and also affiliated faculty in the Informatics Institute, Institute of Applied Mathematics and Department of Biomedical Engineering at METU. She has completed her BSc and MSc in statistics and minor degree in economics. She received her doctorate at the Lancaster University. Purutçuoğlu's current researches are in the field of bioinformatics, systems biology and biostatistics. She has a research group with seven doctorates and four master students who have been working on deterministic and stochastic modelings of biological networks and their inferences via Bayesian and frequentist theories. Hereby, the findings in this study are the part of the researches about the improvement of probabilistic inferences of complex networks and these findings can be helpful for eliminating batch effects in their observations before any statistical analyses.

PUBLIC INTEREST STATEMENT
The Gaussian graphical model (GGM) is one of the well-known probabilistic modeling approaches for the complex biological systems. Briefly, this model expresses the biological interactions between components of systems, which are genes or proteins, by using the property of the conditional dependency under multivariate normally distributed data. On the other hand, the Bernstein Polynomials are one of the famous Bernsteintype of operators in the approximation theory to smoothen the data via statistical distributions within the range of [0,1]. From previous studies, it has been shown that these polynomials are successful in getting high accuracy in inference of interactions between genes in the system when they are used as the processing step in advance of GGM. In this study, we consider all Bernsteintype of operators, as the alternatives of the Bernstein polynomials, for the pre-processing step of GGM to discard the bath effects in the original observations.

Introduction
The approximation theory is concerned with the study of how well given functions can be proximated by basic functions. In this theory, it is usual to apply the approximating functions in the form of linear positive operators, such as the Bernstein, Szasz-Mirakyan polynomials, the Bleiman Butzer Hahn (BBH) operator, Meyer-König and Zeller (MKZ) operator. From previous works, it is known that the Bernstein and Szasz polynomials can significantly improve the accuracy of estimates from the point of view of their inference via the Gaussian graphical model (GGM) (Purutçuoğlu, Ağraz & Wit, 2015). Therefore, here we investigate whether the strong alternatives of the Bernstein polynomials, i.e. the Bleiman and Hahn operator and the Meyer-König and Zeller operator, are as successful as the Bernstein polynomials and can be used alternately. The Gaussian graphical model is a probabilistic and undirected statistical model for the complex biological networks under the normally distributed random multivariate variables whose dependency structure is represented by a graph (Dempster, 1972). In Figure 1, we represent the structure of the filtered yeast interactome (FYI) network (Han et al., 2004) as an example of realistically complex biological systems that GGM can handle with. Hereby, in GGM, we assume that the observations follow a multivariate normal distribution with a mean vector and the variance-covariance matrix Σ, and the conditional independent structure of the observations can be described via a set of nodes and edges constructing an undirected network by means of the inverse of the covariance matrix, also called the precision Θ = Σ −1 . In Θ, the non-zero entries indicate the interactions between nodes, i.e. genes, and zero entries imply no interaction between the selected pair of nodes.
On the other hand, the Bernstein operators which cover the Bernstein and Szasz polynomials are the approximations that are based on the binomial and the Poisson distribution, respectively. But in the literature, different Bernstein-type operators are presented as well. Specifically, the Bleiman-Butzer Hahn (BBH) and the Meyer-König and Zeller (MKZ) operators which are derived from the Bernstein operators are the most well-known ones. The theoretical properties of the BBH operator (Agratini, 1996) and MKZ operator (Abel, 1995;Becker, 1977) have been studied extensively.
In biological networks, the Bernstein polynomials enable us to transform the data in a new range (Lorentz, 1953;Bernstein, 1912). Accordingly, in this study, we consider to implement the Bernstein polynomials, the BBH operator and the MKZ operator in different dimensional biological systems before estimating the model parameters of GGM. Accordingly, in the organization of this paper, we introduce GGM and different types of operators in Section 2. In Section 3, we report our outputs based on the Monte Carlo studies. Finally, in Section 4, we summarize our results and discuss our future works.

Gaussian graphical model
The biological system is an expression between components of the complex biological networks and the interactions between components in this system can be probabilistically represented by the Gaussian graphical model. In this modeling approach, the nodes are indicated as X = (X 1 , … , X p ) under the assumption that the state vector X has a multivariate Gaussian (normal) distribution via in which is a p-dimensional mean vector and Σ shows the (p × p) variance-covariance matrix for the p-dimensional normal density of the random variable X with n samples per nodes, i.e. genes. Hereby, the p-dimensional normal density of X shown in Equation (1) can be denoted by the following probability density function.
In Equation (2), Σ is symmetric and invertible matrix, and its inverse called the precision, Θ, indicates the dependency structure between two nodes. Accordingly, in GGM, if there exists an edge between two nodes, i.e. Θ ij = Θ ji ≠ 0, these two nodes are not conditionally independent given other nodes (Whittaker, 1990). In other words, the non-zero entries of the off-diagonal entries of Θ imply a physical or functional interaction between the nodes (Dempster, 1972;Wermuth, 1976;Whittaker, 1990).
In Figure 2, we draw a small network having five nodes in which X 1 is conditionally independent on X 3 for given X 2 . Similarly, X 3 is conditionally independent on X 5 for given X 2 and X 4 in the same figure.
There are lots of techniques to estimate entries of the precision matrix in GGM. The neighborhood selection method with the lasso regression (Meinshaussen & Bühlmann, 2006) is one of the computationally efficient approaches for sparse and high-dimensional graphs. This method belongs to the class of the covariance selection approaches that is based on non-parametric calculation. In this model, given that the set of nodes and the number of nodes in the graph are denoted by Φ and |Φ(n)|, respectively, the neighborhood of the node p in Φ is the smallest subset of the node Φ ⧵ p. Since each state of the node, i.e. gene, p, X p , is defined as conditionally independent on the state of all remaining nodes, X −p , an optimal prediction of the vector of the regression coefficient of X p in the lasso regression (Tibshirani, 1996), p , can be found by the following expression. (1) (3)

Figure 2. Simple representation of the relationship between five nodes in a system.
In Equation (3), Φ denotes the set of nodes with n number of observations each, E(⋅) shows the expectation of the given random variable, and Θ p k stands for Θ p k = −Σ kp ∕Σ pp . Here, the best predictor for X p is detected as a linear function of variables in the set of neighbors of the node (Meinshaussen & Bühlmann, 2006). Furthermore, the graphical lasso, also named as glasso, approach (Friedman, Hastie, & Tibshiran, 2008), is another common technique in inference of biological networks. Briefly, this method controls the type I and type II errors by detecting the probability under very low levels when the penalized likelihood expression is defined as below.
In Equation (4), || ⋅ || 1 shows the l 1 -norm, which is the sum of the absolute value of the given element, and Tr(⋅) presents the trace. S indicates the sample covariance matrix S = Θ −1 . Accordingly, in inference of S in Equation (4) as an unbiased estimator of Σ in Equation (2), the sample covariance matrix is applied, i.e. Θ = S via Here, s ij denotes the ith row and the jth entry of S and n is the total sample size per gene as stated previously. Finally, x i k stands for the kth sample of the ith gene and x (j) refers to the mean term for the ith gene having totally n observations.
On the other hand, different researchers have also investigated other parametric solutions of this problem. But the main challenge under these calculations is the high dimension of genes (nodes) with respect to the observations per gene, denoted by n, i.e. n << p. This inequality causes a singular covariance matrix, resulting in multiple of solutions if we perform the standard maximum likelihood technique in inference of model parameters. Hereby, in order to unravel this problem, different types of penalized methods are also suggested besides the neighborhood selection and glasso approaches as described above. These methods are based on distinct optimization approaches of the lasso regression in the estimation of Θ. We can also list grouped lasso (Yuan & Lin, citeyuan), elastic net method (Zou & Hastie, 2005), fused lasso (Tibshirani, Saunders, Rosset, Zhu, & Knight, 2005), adaptive lasso (Zou, 2006), coordinate-wise descent algorithm within the l 1 -penalized lasso (Friedman, Hastie, & Tibshiran, 2007), and multivariate methods with scout algorithm (Witten & Tibshirani, 2009) approaches as the other most well-known approaches used in GGM. All these methods mainly suggest similar model construction for the penalized maximum likelihood function defined for the lasso regression by either the l 1 norm or l 2 -norm as the penalizing term. On the other side, recently, certain semi-parametric approaches such as the non-paranormal SKEPTIC algorithm (Liu, Han, Yuan, Lafferty, & Wasserman, 2012) and fully parametric approaches via Bayesian techniques such as the birth-death Markov chain Monte Carlo (MCMC) (Muhammadi & Wit, 2014) approach are proposed in order to estimate Θ without using any optimization methods. In the literature of GGM, different types of Monte Carlo approaches have been also performed. Atay-Kayis and Massam (2005) apply Monte Carlo methods in the calculation of Θ when the network graph in GGM is non-decomposable. Dobra, Lenkoski, and Rodriguez (2011) and Wang and Li (2012) introduce the reversible jump MCMC and Dauwels, Yu, Xu, and Wang (2013) implement the Monte Carlo expectation maximization method within copula GGM. Whereas among all these alternatives, the neighborhood selection approach, suggested by Meinshaussen and Bühlmann (2006), is one of the most common techniques due to its simplicity in inference. Hence, in our analyses, we prefer this method and perform it in junction with Monte Carlo simulations in order to evaluate the performance of different estimates.

Bernstein polynomials
The Bernstein polynomial was introduced by Sergey Natanovich Bernstein as a new proof of the Weierstrass approximation theorem. This polynomial privileges in the approximation theory since it applies the constructive new polynomials to prove the Weierstrass approximation instead of polynomials that are already known as mathematicians. The Bernstein polynomial (Lorentz, 1953) has certain properties such as the positivity, symmetry, and the degree raising. The first characteristic means b k:n = b n−k:n while i = 1, … , n. On the other hand, the second feature presents b k:n (x) > 0 and finally, the third one implies that any lower degree Bernstein polynomial is written as a linear combination of the nth Bernstein polynomials.
The generalized version of the Bernstein operator is called the Szas-Mirakyan operator and is defined as where x ∈ [0, 1] and the function f is defined on an infinite interval R + = [0, ∞).

Butzer Hahn operator
Bleiman, Butzer and Hahn (BBH) operator (Bleiman, Butzer, & Hahn, 1980) which is defined by the Bernstein-type can be represented as below for the nth degree with the k basis polynomials for the value x.
For Equation (9), the following inequality is satisfied.
Equation (10) Additionally, the property of the uniform approximation of the BBH operator has been studied (Totik, 1984) when f belongs to C[0, ∞] for the continuous function on [0, ∞). Also Mercer (1989) has independently derived the Voronovskaya-type theorem which gives an asymptotic error term for the Bernstein polynomials for the functions which are twice differentiable as follows.
is the second derivative of the function. Abel (1996) has extended this study by giving the complete asymptotic expansion for the BBH operator as the following form.
n → ∞. Here, c k represents all the coefficients from k = 0 to k = n.
The Szasz operator is the limiting operator of BBH (Khan, 1988) (Khan, Nevai, & Pinkus, 1991). L n (f ;x) is convex itself if f is a non-increasing convex function. Jayasri and Sitaraman (1993) have determined that L n is a pointwise approximation process in which the largest subclass of C[0, ∞) for the Bernstein-type of operator.
Then, the following function class is introduced in the study of Hermann, Szbados, and Tandori (1991).
He has proved that if f belongs to , then for each x > 0, the pointwise convergence is lim n→∞ L n f = f on [0, ∞). Moreover, for some a > 0, f (x) = e ax , then lim x→∞ L n (f ;x) = ∞. Also the operator L n is arisen from the random variable with n observations, X n , which has the Bernoulli distribution as below.

Meyer-König and Zeller operator
The operator is known as the Meyer-König and Zeller (MKZ) operator when for the nth degree and the kth basis polynomial for the value of x as stated previously.
These operators are known as the Bernstein-type of operators. Cheney and Sharma (1996) have defined this operator as a power series of the Bernstein operators.
The MKZ operator can also be obtained from the negative binomial distribution via

Application
In the assessment of the polynomials' results, we consider four different scenarios. In the first scenario, we estimate the precision matrix Θ from the simulated data-sets under different kinds of the Bernstein polynomials and the Bernstein-type of operators, which are the MKZ operator and the BBH operator. For the analyses of the model, we generate 50, 100, and 500 dimensional data-sets in which each gene has 20 observations. Then, we set all the off-diagonal entries of Θ to 0.9 arbitrarily and generate scale-free networks by using the huge package in the R programme language. Then, in order to evaluate whether the entry of the off-diagonal terms has any effect in inference, we change it via moderately small and small values too. Hereby, in the second and the third scenarios, the off-diagonal elements of the precision matrix set to 0.7 and 0.5, respectively, when the networks are scale-free. Finally, as the fourth plan, we change the sparsity of the system from the scale-freeness to the hubs property since the former typically indicates a high sparsity level around 90% or above and the latter implies relatively a lower sparsity level at around 80-90%. On the other hand, in our analyses, we have not controlled the performance of methods lower than these sparsities levels as the high sparsity is one of the main features of biological systems (Barabasi & Otiva,Barabasi04).
Accordingly, in all scenarios, we initially generate a data-set for the true network and keep its true path for the best model selection in further steps. Later, we transform this actual data-set via the Bernstein polynomial, the Szasz polynomial, the MKZ operator, and the BBH operator. Finally, all these non-transformed and transformed data are used to estimate the precision matrices by using the neighborhood selection (NB) method. In the application, we choose NB among alternatives due to its computational gain. For the assessment, we calculate the F-measure and the precision for each run as shown below.
From Table 1, it is seen that for low dimensions, the Bernstein polynomials give better results than others, in particular, the Szasz polynomials have the highest F-measure. We obtain the same results for the precision values as well. Moreover, we detect that when the dimension of the matrix increases, the F-measure and the precision value decrease. Furthermore, as shown in Tables 2 and 3, we observe similar findings in the sense the Szasz polynomials typically produce better F-measure and precision even though we decrease the correlation between genes (by off-diagonal entries 0.7 and 0.5). Whereas under these scenarios, it is found that the MKZ operator is as good as the Szasz operator in terms of the accuracy of the estimates under certain conditions. Finally when we evaluate the outputs of Tables 2-4, we see that the results of both the Szasz polynomial and the MKZ operator are very close to each other and overperform with respect to the remaining operators.
On conclusion, all these outputs imply that when the sparsity of networks decreases, all Bernsteintype of operators compute similar results and the Szasz polynomial as well as the MKZ operator are slightly better than others. On the contrary, if the sparsity level raises as mostly observed in

Discussion
In this study, we have compared all well-known Bernstein-type of operators to detect which alternative produces the highest accuracy if it is applied with GGM. For this purpose, we have analyzed the Monte Carlo results of the Bernstein polynomials, BBH and MKZ operators. The results have indicated that the Bernstein polynomials have the highest accuracies if they are performed in advance of the inference of the model parameters of GGM and the MKZ operator can be another good alternative if the sparsity level of the system decreases. But for all choices of operators, we have found that these operators can improve the accuracies of estimates for different dimensional biochemical systems if they are implemented as the pre-processing step before the inference of the precision matrix. As the extension of this study, we consider other approximation methods such as the Fourier transformations in such a way that the distributional feature of the observations can be embedded to the new transformed data in order to smooth the original data-set smartly and get estimates with higher accuracy.