1 Introduction

The application of computational algebraic methods to experimental design has been extensively covered by the authors, co-workers and others following the seminal paper (Pistone and Wynn 1996), monograph (Pistone et al. 2001) and review by the authors (Maruri-Aguilar and Wynn 2015), with other developments such the study of duality between fractions of factorial design and models (Maruri-Aguilar et al. 2012). In this work a design is seen as a set of points and defines a natural polynomial model which can be used to interpolate data at design points. We concentrate on designs that are the union of product tensor structures, known as echelon designs. For such designs, the exponents of the interpolator have the same form as the design, giving a duality between designs and models, with both exhibiting hierarchical structure.

Another feature of this hierarchical design-model framework is the central role played by inclusion–exclusion (IE) formulæ. Inclusion–exclusion has been used extensively in reliability theory, where formulæ are derived from monomial ideals, see (Sáenz-de Cabezón , Wynn 2009). We use the same principles to describe the echelon design points as IE of tensor (product) designs. The benefit of this method is the reduction in complexity achieved by using the so-called minimal free resolution. In short, the coefficients in the IE are Betti numbers of the resolution of the monomial ideal.

In this paper we apply the same machinery to study sparse grids. These grids are used widely in quadrature, particularly in the Finite Element solution of differential equations, and the difference of terminology should be noted. Sparse grids are sometimes referred to as Smolyak grids following the seminal paper (Smolyak 1963). Recent work has used sparse grids to the solution of differential equations (Nobile et al. 2008), for interpolation (Barthelmann et al. 2000) and for integration (Novak and Ritter 1996).

In our development, a set of quadrature points becomes an experimental design (design for short). This approach is in line with the recent history of the design of computer experiments, also termed as simulation experiments or simply simulation. We show that a Smolyak grid satisfies the same IE despite not having the same hierarchical form as discussed above.

The main result of this paper is that the IE formulæ which hold for interpolators over echelon designs extend to more general sparse grids. Polynomial interpolators based on the “echelon” designs inherit the IE decomposition that uses the natural product model interpolators. Remarkably, the same IE forms are inherited by sparse grid interpolators. Moreover the reduced forms use the multi-graded Betti numbers from the algebraic theory.

Betti numbers are at the foundation of algebraic topology, and describe features of the geometry of manifolds via topologically equivalent, and one might say more abstract entities, namely simplicial complexes. In statistics they have been used intensively within the relatively new area topological data analysis (TDA) (Edelsbrunner , Harer 2010). However, here Betti numbers are used in a different way, namely as a way of reducing the complexity of the inclusion–exclusion lemma. It is quickly observed that in handling large inclusion–exclusion formulæ, there are redundancies; that is to say a large amount of cancellation of terms and many sets may be empty. This happens partly because the underlying dimension may be much smaller that the number of sets and partly because of the complexity of the inter-relation between the sets. Both in TDA and IE there are “minimal free resolutions” (a term from the underlying algebraic topology) which gives, in a well-defined sense the, least complex formulae in which the plus and minus signs of the formulae are replaced by (multigraded) Betti numbers (Naiman and Wynn 1997; Sáenz-de Cabezón , Wynn 2009).

The “sets” in this paper are the tensor grids of exponents \(\alpha \) in monomials \(x^\alpha \). Thus, in summary, the Betti numbers support less complex IE formulae, in the sense of number of terms, in which the complexity is, in a sense, buried in the Betti numbers. The main purpose of this paper is to point out this structure, that is to say complexity reduction using the Betti numbers, is inherited by the natural interpolators based supported on the tensor grids and their extension to sparse grids.

The rest of the paper is ordered as follows. In Sect. 2 we introduce echelon designs and the duality between designs and models. We discuss the tensor product notation and the extreme corner points that define echelon designs. Section 3 contains the main results for interpolation. We define indicators which are the building blocks of a Lagrange interpolator over a single tensor design and give the result of inclusion–exclusion which we prove in Appendix 1. This result allows interpolation over general echelon designs which is achieved by IE formulæ. In Sect. 4 we apply our results to sparse grids and show this with an example. At the core of this paper are two approaches to response surface modelling, the numerical analysis and the statistical viewpoints. We briefly discuss these two approaches and suggest future work in Sect. 5.

2 Designs and models

We start with a general definition of a design that covers designs for response surface modelling and designs for computer experiments. The objective is to study the effect of k continuous factors \(x_1, \ldots , x_k\in \mathbb R\) on a real valued output y, with information collected in an experimental design.

Definition 1

A design D is a finite set of n points in \(\mathbb R^k\). For every factor, the values of the coordinates of points in D are the levels of the design.

In this paper we are interested in a special type of designs called echelon designs. The following two definitions form the core of this paper: a hierarchical grid in the integers and the echelon design built from it. Both objects below are designs as of Definition 1.

Definition 2

A reference grid G is a finite subset of \(\mathbb Z^k_{\ge 0}\) with the order ideal property, that is for every point \(\alpha =(\alpha _1,\ldots ,\alpha _k)\in G\), all \(\alpha '\in \mathbb Z^k_{\ge 0}\) such that \(0\le \alpha '\le \alpha \) (coordinatewise) are also in G.

Definition 3

An echelon design is a design obtained from a reference grid G by changing the levels of each factor to any finite real values using a one to one mapping.

Definition 3 starts from a reference grid G to build an echelon design D. The converse procedure is important, namely that any echelon design D maps to a unique reference grid G(D), hence the use of the word reference. Using the identity map, a reference grid is trivially an echelon design.

Example 1

Full factorial experiments \(2^k\) are echelon designs. The echelon property is not limited to designs with two levels, and if the design has all combinations of levels of factors, then the full factorial design \(n_1n_2\cdots n_k\) is an echelon design, where \(n_i\) is the number of levels of the i-th factor.

In general, regular fractions of factorial designs are not echelon. A simple case of this is the fraction \(2^{2-1}\) with point coordinates \((-1,-1),(1,1)\) that cannot be converted to a reference grid by relabeling levels. However, certain fractions of factorial designs are echelon designs because they can be converted to a reference grid, such as the designs used in the method of elementary effects (EE) for screening active factors in computer experiments, see (Morris 1991).

Fig. 1
figure 1

From left to right, a Smolyak design, its reference grid G and an echelon design with the same reference grid. The points in red in G are the points \({\mathcal {A}}(G)\)

Example 2

The central panel of Fig. 1 shows a reference grid G with \(n=17\) points, which is transformed into a Smolyak design by the map shown in Table 1. For instance, all the 7 points of G with horizontal (vertical) coordinate equal to zero are relabeled to a \(x_1\) (\(x_2\)) coordinate with value 0.5. The map is not unique, as points of G with coordinate value 1 (or 2) could be relabelled to be one of 0.113 or 0.887. The left panel in Fig. 1 shows the Smolyak design, while the right panel shows another echelon design with the same size and reference grid G. In both Smolyak and echelon, the numbers at the bottom (left) are numbers of points with the same \(x_1\) (\(x_2\)) coordinate. The numbers shown in G give the actual coordinates of points.

A reference grid G is defined by a special set of corner points \({{\mathcal {A}}}(G)\). This set of corners \({{\mathcal {A}}}(G)\) is comprised with those points \(\alpha \in G\) such that there are no other points \(\alpha '\in G\) that satisfy \(\alpha \le \alpha '\) coordinatewise. Trivially, the reference grid defined by \({{\mathcal {A}}}={{\mathcal {A}}}(G)\) equals G, i.e. \(G({{\mathcal {A}}})=G\). In this last statement, we slightly abused notation using G for a reference grid and \(G({{\mathcal {A}}})\) as the reference grid built from the set of extreme corners \({\mathcal {A}}\).

Table 1 Map for generating the Smolyak design D from the grid G of Fig. 1

The simplest reference grid is created with a single corner point \(\alpha =(\alpha _1,\ldots ,\alpha _k)\in \mathbb Z^k_{\ge 0}\), that is \(G({\alpha }) = \{\alpha '\in \mathbb Z^k_{\ge 0}: 0 \le \alpha ' \le \alpha \}\). The grid \(G(\alpha )\) is a product design, written equivalently as

$$\begin{aligned} G({\alpha }) = \{ 0,\ldots ,\alpha _1\}\bigotimes \cdots \bigotimes \{ 0,\ldots ,\alpha _k\}, \end{aligned}$$

and we refer to it as a tensor design. A tensor grid \(G(\alpha )\) is already a full factorial design and any echelon created from it will also be a full factorial.

We now discuss the polynomial models for our designs. The building block for the model is a monomial whose exponents are entries of \(\alpha \), that is \(x^{\alpha } =x_1^{\alpha _1} \cdots x_k^{\alpha _k}\). We write the model over the tensor grid \(G(\alpha )\) as

$$\begin{aligned} y_{\alpha }(x) = \sum _{\alpha ' \in G({\alpha })}\theta _{\alpha '} x^{\alpha '}, \end{aligned}$$

and in general for a reference grid G with corners \({{\mathcal {A}}}(G)\) we have the model:

$$\begin{aligned} y_{{{\mathcal {A}}}(G)}(x) = \sum _{\alpha ' \in G} \theta _{\alpha '} x^{\alpha '}. \end{aligned}$$

This model has observations collected in a reference grid design G and uses points \(\alpha \) in G as model exponents. This duality between design and model was referred to in the introduction, see also (Pistone et al. 2001). When we consider the elements of G as exponents of a model, then \({{\mathcal {A}}}(G)\) are the exponents of the set of directing monomials, see (Bates et al. 2003).

Although the model \(y_{{{\mathcal {A}}}(G)}(x)\) uses all the points and exponents in G, we refer to it by its most extreme points \({{\mathcal {A}}}(G)\). This notation is necessary for later as our methods depend on inclusion exclusion. We also use the notation \(G({{\mathcal {A}}})\) when we wish to recapture the reference grid from its defining corner set \({\mathcal {A}}\).

Example 3

Consider the reference grid G shown in Fig. 1. If we start from G, we recover its extreme points as \({\mathcal {A}}={{\mathcal {A}}}(G)=\{(0,6),(2,2),(6,0)\}\). Taking a slightly different approach, we recover the full grid of \(n=17\) points from \({\mathcal {A}}\) as \(G=G({\mathcal {A}})\).

The following lemma links models built with the reference grid and an echelon design. This result is taken from Pistone et al. (2001).

Lemma 1

If D is an echelon design, then it identifies a unique model with exponent vectors given by the elements of the reference grid with extreme corner \({\mathcal {A}}(G(D))\).

In the following section we develop the models for the reference grids using indicator functions.

3 Interpolation

Consider a non-negative integer vector \(\alpha \). We are interested in interpolation over points in tensor design \(G(\alpha )\). That is, assume that at every point \(\alpha '=(\alpha _1',\ldots ,\alpha _k')\) of \(G(\alpha )\), a real data value \(y_{\alpha '}\) is available. We will construct a polynomial interpolator using the full set of data \(\{y_{\alpha '}:\alpha ' \in G(\alpha ) \}\).

To build the interpolator, we start by defining the indicator function for the point \(\alpha '\) in the grid \(G(\alpha )\). The indicator is the product of k one dimensional indicators obtained by Lagrange interpolation:

$$\begin{aligned} I_{\alpha '} (x) = \prod _{i=1}^k \prod ^{\alpha _i}_{_{m \ne \alpha '_i}^{m=0}} \frac{x_i-m}{\alpha '_i - m}, \end{aligned}$$
(1)

where \(x = (x_1,\ldots ,x_k )\in \mathbb R^k\) is the point at which the indicator is evaluated. For simplicity we write \(I_{\alpha '} (x)\) instead of the notation \(I_{\alpha '}^{(\alpha )} (x)\) that makes explicit that this is the indicator function for point \(\alpha '\) in the grid \(G(\alpha )\) for a given \(\alpha \in \mathbb Z^k_{\ge 0}\). The notation \(I_{\alpha '}^{(\alpha )} (x)\) is used in Appendix 1 to prove Lemma 2.

For a point x in \(G(\alpha )\), the function \(I_{\alpha '}(x)\) has the indicator property

$$\begin{aligned} I_{\alpha '}(x)=\left\{ \begin{array}{ll}1&{}\text{ if } x=\alpha '\\ 0&{}\text{ if } x\ne \alpha ' \end{array}\right. . \end{aligned}$$

The interpolator function over the full tensor grid \(G(\alpha )\) is built as a linear combination of the indicator functions, using the values \(y_{\alpha }'\) as coefficients:

$$\begin{aligned} y_{\alpha }(x)=\sum _{0\le \alpha '\le \alpha }y_{\alpha '} I_{\alpha '}(x). \end{aligned}$$
(2)

Example 4

Consider the grid \(\{(\alpha '_1,\alpha '_2):\alpha '_1=0,1; \alpha '_2=0,1,2,3\}\) of \(n=8\) points in \(\mathbb Z^2_{\ge 0}\). This is the reference grid \(G(\alpha )\) for \(\alpha =(1,3)\), equivalently written as a tensor grid \(G(\alpha )=\{0,1\}\bigotimes \{0,1,2,3\}\). The indicator function of point \(\alpha '=(0,3)\in G(\alpha ) \) is

$$\begin{aligned} I_{\alpha '}(x)=\frac{x_1-1}{0-1}\cdot \frac{x_2}{3}\cdot \frac{x_2-1}{3-1}\cdot \frac{x_2-2}{3-2} =-\frac{1}{6}(x_1-1)x_2(x_2-1)(x_2-2). \end{aligned}$$

3.1 Interpolation on echelon design

We consider interpolation over a reference grid \(G({\mathcal {A}})\), for a set of corner points \({\mathcal {A}}\). Suppose that this grid is the union of two tensor grids

$$\begin{aligned} G({\{\alpha , \gamma \}}) = G({\alpha }) \cup G({\gamma }). \end{aligned}$$

Here \({\mathcal {A}} = \{\alpha ,\gamma \}\), where \(\alpha ,\gamma \) are distinct vectors with non-negative integer entries. To avoid trivial cancellations, in what follows the vectors \(\alpha ,\gamma \) do not dominate each other, that is neither \(\alpha \le \gamma \) nor \(\alpha \ge \gamma \). By this requirement we avoid the inclusions \(G(\alpha )\subset G(\gamma )\) and \(G(\alpha )\supset G(\gamma )\).

The following is a simple inclusion–exclusion for the designs, using indicator functions in an obvious way:

$$\begin{aligned} I(G({\{\alpha ,\gamma \}})) = I(G({\alpha })) + I(G({\gamma })) - I (G({\alpha }) \cap G({\gamma })), \end{aligned}$$

where the indicator I(G) takes the unit value on grid points in G and zero on integer vectors (grid points) outside G.

The greatest common divisor between monomials \(x^\alpha \) and \(x^\gamma \) is the monomial \(x^{\alpha \wedge \gamma }\), with \(\alpha \wedge \gamma :=\left( \min (\alpha _1,\gamma _1),\min (\alpha _2,\gamma _2),\ldots ,\min (\alpha _k,\gamma _k)\right) \). Note that \(\alpha \wedge \gamma \) is the extreme corner point of the tensor grid \(G({\alpha \wedge \gamma })=G(\alpha )\cap G(\gamma )\).

A main result of this paper is that the set inclusion–exclusion (IE) for the design points extends to the interpolators. The proof is given in Appendix 1.

Lemma 2

Let \(y_{\alpha }(x),y_{\gamma }(x)\) and \(y_{\alpha \wedge \gamma }(x)\) be the interpolator functions over grids \(G(\alpha ),G(\gamma )\) and \(G({\alpha \wedge \gamma })\), respectively; and let \(y_{ \{\alpha ,\gamma \}}(x)\) be the interpolator function over the reference grid with corners \({\mathcal {A}}=\{\alpha ,\gamma \}\), that is over \(G(\alpha )\cup {}G(\gamma )\). Then the interpolator \(y_{ \{\alpha ,\gamma \}}(x)\) satisfies the following inclusion–exclusion formula

$$\begin{aligned} y_{ \{\alpha ,\gamma \}}(x) =y_{\alpha }(x)+y_{\gamma }(x)-y_{\alpha \wedge \gamma }(x). \end{aligned}$$
(3)

Example 5

Consider \(G(\gamma )\) with \(\gamma =(2,2)\) and \(G(\alpha )\) of Example 4 with corner point \(\alpha =(1,3)\). These grids determine a design \(D=G(\alpha )\cup G(\gamma )\subset \mathbb Z^2_{\ge 0}\) of \(n=11\) points.

We now show the interpolating property for point \(x=(0,3)\in G(\alpha )\), that is we show that \(y_{D}(x)\) equals \(y_{(0,3)}\) when evaluated at \(x=(0,3)\). There are three terms of Equation (3) to be evaluated at \(x=(0,3)\), and given the location of x in \(G(\alpha )\) but not in \(G(\gamma )\), this computation uses the second case in the proof of Lemma 2.

The first of the three terms in Equation (3) is \(y_{\alpha }(x)\), to be evaluated at x. As \(y_{\alpha }\) interpolates over \(G(\alpha )\), this value is \(y_{(0,3)}\), precisely the value that we wish to interpolate. The second evaluation is for \(y_{\gamma }(x)\), composed of 9 summand terms. Of these, only the indicators \(I^{(\gamma )}_{(0,0)}(x), I^{(\gamma )}_{(0,1)}(x)\) and \(I^{(\gamma )}_{(0,2)}(x)\) have non zero value when evaluated at x. For the first one, when evaluated at \(x=(0,3)\), the value is

$$\begin{aligned} I^{(\gamma )}_{(0,0)}(x)=\frac{1}{4}(x_1-1)(x_1-2)(x_2-1)(x_2-2)=1, \end{aligned}$$

and the other two values are \(-3\) and 3 respectively so that

$$\begin{aligned} y_{\gamma }(x)=y_{(0,0)}-3y_{(0,1)}+3y_{(0,2)} \end{aligned}$$

at \(x=(0,3)\). The third evaluation is for \(y_{\alpha \wedge \gamma }(x)\) with 6 terms. The evaluation of indicators at \(x=(0,3)\) coincide with those of \(y_{\gamma }(x)\). Specifically, when evaluated at \(x=(0,3)\), the indicators \(I^{(\alpha \wedge \gamma )}_{(0,0)}(x),I^{(\alpha \wedge \gamma )}_{(0,1)}(x)\) and \(I^{(\alpha \wedge \gamma )}_{(0,2)}(x)\) give values \(1,-3\) and 3, respectively so that at \(x=(0,3)\) we have

$$\begin{aligned} y_{\alpha \wedge \gamma }(x)=y_{(0,0)}-3y_{(0,1)}+3y_{(0,2)}=y_{\gamma }(x). \end{aligned}$$

Indicators defined over grids \(G({\alpha \wedge \gamma })=G(\alpha )\cap G(\gamma )\) and \(G(\gamma )\) are different functions, for instance \(I^{(\alpha \wedge \gamma )}_{(0,0)}(x)=-\frac{1}{2}(x_1-1)(x_2-1)(x_2-2)\) that coincides with \(I^{(\gamma )}_{(0,0)}(x)\) at \(x=(0,3)\). We have thus shown the interpolating property at \(x=(0,3)\), that is

$$\begin{aligned}y_{D}(x)=y_{(0,3)}+\left( y_{(0,0)}-3y_{(0,1)}+3y_{(0,2)}\right) -\left( y_{(0,0)}-3y_{(0,1)}+3y_{(0,2)}\right) =y_{(0,3)}. \end{aligned}$$

Example 6

In this example the grids \(G(\alpha )\) and \(G(\gamma )\) lie in different dimensions, consider \(G(\alpha )\) with \(\alpha =(2,2,2)\), a grid with 27 points in \(\mathbb Z^3_{\ge 0}\); and grid \(G(\gamma )\) with \(\gamma =(0,0,3)\). Although \(G(\gamma )\) is embedded in \(\mathbb Z^3_{\ge 0}\), it is a set of 4 points lying along the \(x_3\) coordinate only. The design is \(D=G(\alpha )\cup G(\gamma )\), and we show the interpolating property for \(y_D(x)\) at the point \(x=(1,1,2)\in G(\alpha )\).

We begin with \(y_{\alpha }(x)=y_{(1,1,2)}\) at \(x=(1,1,2)\) because of interpolating property of \(y_{\alpha }\). For the inclusion–exclusion we require the grid \(G(\alpha )\cap G(\gamma )\) that has extreme corner point \(\alpha \wedge \gamma =(0,0,2)\). There is a single point (0, 0, 2) in both \(G(\gamma )\) and \(G(\alpha )\cap G(\gamma )\) that shares the most coordinates with the point (1, 1, 2). The indicators are \(I^{(\gamma )}_{(0,0,2)}(x)=-x_3(x_3-1)(x_3-3)/2\) and \(I^{(\alpha \wedge \gamma )}_{(0,0,2)}(x)=x_3(x_3-1)/2\) which when evaluated at (1, 1, 2) take the value one. Completing Equation (3), at \(x=(1,1,2)\) we have \(y_D(x)=y_{(1,1,2)}+y_{(0,0,2)}-y_{(0,0,2)}=y_{(1,1,2)}\).

Fig. 2
figure 2

Grids involved in Taylor IE

In general, reference grids we will have more than two corners and the inclusion–exclusion will require several layers. Figure 2 shows the grids involved in the IE with three generators \(\alpha ,\gamma ,\delta \). The top left panel is the reference grid \(G=G(\{\alpha ,\gamma ,\delta \})\); each of the other figures corresponds to a specific labeled grid; black dots are points in the grid while grey dots are points in G but outside the labeled grid.

The main result considers a reference grid defined by a set of corner points \({\mathcal {A}}\). We give the theorem below, whose proof by induction we omit.

Theorem 1

Let \(G({\mathcal {A}})\) be a reference grid for a set of corner points \({\mathcal {A}}\). Then the following holds:

  1. (i)

    There is an IE for \(G({\mathcal {A}})\) in terms of indicators of tensor grids:

    $$\begin{aligned}I(G({\mathcal {A}})) = \sum _{\alpha \in {\mathcal {A}}} I(G(\alpha )) - \sum _{\alpha ,\gamma \in {\mathcal {A}}, \alpha \ne \gamma } I(G(\alpha \wedge \gamma )) \\ + \sum _{\alpha ,\gamma , \delta \in {\mathcal {A}}, \alpha \ne \gamma \ne \delta } I(G(\alpha \wedge \gamma \wedge \delta )) + \cdots \end{aligned}$$
  2. (ii)

    The unique interpolator \(y_{{\mathcal {A}}}(x)\) for points in \(G({\mathcal {A}})\) satisfies the following

    $$\begin{aligned}y_{{\mathcal {A}}}(x) = \sum _{\alpha \in {\mathcal {A}}} y_{\alpha }(x) - \sum _{\alpha ,\gamma \in {\mathcal {A}}, \alpha \ne \gamma } y_{\alpha \wedge \gamma }(x) + \sum _{\alpha ,\gamma , \delta \in {\mathcal {A}}, \alpha \ne \gamma \ne \delta } y_{\alpha \wedge \gamma \wedge \delta }(x) + \cdots \end{aligned}$$

3.2 Betti numbers

In algebraic theory, the inclusion–exclusion of Theorem 1 is known as the Taylor resolution, which is the most complex case of IE, namely using all the singleton generators, then all possible pairs, triples and so on. The Taylor resolution is often highly redundant and it is possible to reduce the complexity of the IE by using other “resolutions” Sáenz-de Cabezón , Wynn (2009). Here is a simple example to show reduction in IE complexity by cancelation.

Example 7

Consider the set of corner points \({\mathcal {A}} = \{(2,0), (1,1), (0,2)\}\) that define a reference grid of six points in two dimensions. The full Taylor inclusion–exclusion of \(I(G( {\mathcal {A}}))\) is:

The cancellation occurs between terms that are equal, despite coming from intersections involving different numbers of sets, that is \(G({(2,0)})\cap G({(0,2)})= G\left( {(0,0)}\right) \) which equals \(G({(2,0)})\cap G({(1,1)} )\cap G({(0,2)})\). The interpolator has five terms as well \(y_{{\mathcal {A}}}(x)=y_{(2,0)}(x)+y_{(1,1)}(x)+y_{(0,2)}(x)- y_{(2,0)\wedge (1,1)}(x)-y_{(1,1)\wedge (0,2)}(x)\).

Table 2 Inclusion–exclusion for two grids each with five generators

The efficacy of cancellations to simplify IE depends on the specific design. IE in Example 7 needed only five terms. We give two more complex examples.

The first column of Table 2a gives the extreme corners \({\mathcal {A}}\) of a grid with \(n=112\) points in five factors. The rest of columns contain the layers of inclusion–exclusion for the Taylor IE, for example the column labelled “3-way” has the \({5\atopwithdelims ()3}=10\) results of the type \(\alpha \wedge \gamma \wedge \delta \) for \(\alpha ,\gamma ,\delta \in {\mathcal {A}}\). For this grid, the exhaustive Taylor computation is highly redundant with many cancellations so that out of the 31 terms, we only need six for the IE. The indicator is

$$\begin{aligned}I(G({\mathcal {A}}))= & {} I(G({(2,1,1,1,1)}))+I(G({(1,2,1,1,1)}))+I(G({(1,1,2,1,1)}))\\{} & {} +I(G({(1,1,1,2,1)}))+I(G({(1,1,1,1,2)}))-4I(G({(1,1,1,1,1)})). \end{aligned}$$

For this grid, or any echelon design built from it, the IE interpolator is

$$\begin{aligned} y_{{\mathcal {A}}}(x)= & {} y_{(2,1,1,1,1)}(x)+y_{(1,2,1,1,1)}(x)+y_{(1,1,2,1,1)}(x)\nonumber \\{} & {} +y_{(1,1,1,2,1)}(x)+y_{(1,1,1,1,2)}(x)-4y_{(1,1,1,1,1)}(x). \end{aligned}$$
(4)

Table 2b has the results for a grid with \(n=242\) points and the same number of factors and generators as the grid of Table 2a. However, this case has maximal complexity and no simplifications can be made. In summary, for this grid, both IE indicator \(I(G({\mathcal {A}}))\) and interpolator \(y_{{\mathcal {A}}}(x)\) need all 31 terms.

We require an efficient method to retrieve only the essential IE terms. There is no space to cover the theory of monomial ideals behind improved resolutions and simplified inclusion–exclusions for interpolators. We however we give a short explanation.

An efficient way of construction the sought inclusion–exclusion uses results from monomial ideals (Sáenz-de Cabezón , Wynn 2009). Recall that a monomial ideal is a set of polynomials, and in our case, we build the ideal by flipping the grid G around a point. After flipping, the points become the exponents of monomials that generate an ideal. We use the shorthand that the monomial ideal obtained by this flipping the corners in \({\mathcal {A}}\) around a point c is the derived ideal. The objective of creating the derived ideal is to determine efficiently the IE decomposition of the grid G.

Figure 3 gives a graphical explanation of the process. We flip the corners in \({\mathcal {A}}\) around a point c as follows, every corner \(\alpha \in {\mathcal {A}}\) is flipped to become \(\tilde{\alpha }=c-\alpha \) and similarly for \(\gamma ,\delta \) in the same figure. When the whole grid is flipped, the point c becomes the origin as shown in the right panel. The choice of flip point c is not critical as long as it is at least equal to the maximum value of G for the coordinate in turn. In the figure, c was even beyond the maximum value per coordinate which has no influence in the result.

Fig. 3
figure 3

The left panel shows a grid G, and the right panel shows exponents of the derived monomial ideal by flipping corners of G around point c. The shaded area is included to illustrate the flip operation

Example 8

We continue with the grid G of Example 7. To be specific, if the corner points \({\mathcal {A}}\) are flipped around the maximum point per coordinate \(c=(2,2)\), we retrieve the same \({\mathcal {A}}\) which when turned to exponents yields \(x_1^2,x_1x_2, x_2^2\) which are used to generate the derived ideal. For the generator corners of G in Example 2, we flip around \(c=(6,6)\) and convert to exponents so that we have ideal generators \(x_1^6,x_1^4x_2^4, x_2^6\). In a more complex case, the grid of Example 6 uses flip point \(c=(2,2,3)\) which gives monomials \(x_3,x_1^2,x_2^2\).

The result of homological computations with ideals is a collection of Betti numbers \(\beta _{\alpha ,j}\). These numbers are non-negative integers that provide the coefficients in a maximally reduced IE and thus are what we are looking for. Each number is indexed by a vector of non-negative integers \(\alpha \) and an integer j that controls the role of the number in the inclusion–exclusion. We only use positive \(\beta _{\alpha ,j}\), as those zeroes do not contribute to the result and for simplicity, we collect the Betti numbers for the same j in a set that we label \(B_j\). The computations are carried out using, for example, the function HilbertSeriesMultiDeg of the CoCoA computer package (Abbott et al.). The results of computations over the derived ideal have to be reversed around c so that we can use them for the grid.

The following theorem is a simplification of the exhaustive series of Theorem 1 and it is given without a proof.

Theorem 2

Let \(G({\mathcal {A}})\) be an echelon for a set of corner points \({\mathcal {A}}\). Let \(\beta _{\alpha ,j}\) be the multigraded Betti numbers associated with the minimal free resolution of the derived monomial ideal, and let the non-zero Betti numbers with the same value of j be collected in \(B_j\). Then

  1. (i)

    There is an IE for \(I(G({\mathcal {A}}))\):

    $$\begin{aligned} I(G({{\mathcal {A}}})) =\sum _{j=0}^{k-1}(-1)^j\sum _{\alpha \in B_j}\beta _{\alpha ,j}I(G({\alpha })), \end{aligned}$$
  2. (ii)

    The unique interpolator over \(G({\mathcal {A}})\) can be decomposed into tensor grid interpolator of the same form:

$$\begin{aligned} y_{{\mathcal {A}}}(x) =\sum _{j=0}^{k-1}(-1)^j\sum _{\alpha \in B_j}\beta _{\alpha ,j}y_{\alpha }(x). \end{aligned}$$

Theorems 1 and 2 yield exactly the same result concerning either the composition of \(I(G({\mathcal {A}}))\), or the interpolation formulæ for \(y_{{\mathcal {A}}}(x)\) over \(G({\mathcal {A}})\). However the former is much more complex, involving \(2^{\vert {{\mathcal {A}}}\vert }-1\) terms, while the latter has many of these redundancies removed. As shown by example earlier, the reduction in complexity achieved with Theorem 2 depends on the generators of the derived ideal. The following example continues an earlier grid with maximal redundancy.

Example 9

Consider the grid of Table 2a, which will be flipped around the point \(c=22222\), written using the notation of Table 2. The derived ideal only requires the extreme corners, that once flipped are 01111, 10111, 11011, 11101 and 11110 so that the ideal is generated by monomials \(x_2x_3x_4x_5,x_1x_3x_4x_5,x_1x_2x_4x_5,x_1x_2x_3x_5\) and \(x_1x_2x_3x_4\). The non-zero Betti numbers are \(\tilde{\beta }_{01111,0}=1, \tilde{\beta }_{10111,0}=1, \tilde{\beta }_{11011,0}=1, \tilde{\beta }_{11101,0}=1,\tilde{\beta }_{11110,0}=1\) and \(\tilde{\beta }_{11111,1}=4\), where the tilde indicates that results refer to the derived ideal. To use these numbers in the grid, we need to reverse the multi-indices around c, dropping the tilde notation in the process. The Betti numbers are \(\beta _{21111,0}=1, \beta _{12111,0}=1, \beta _{11211,0}=1, \beta _{11121,0}=1,\beta _{11112,0}=1\) and \(\beta _{11111,1}=4\). With these numbers we retrieve the simplified IE interpolator of Equation (4). By contrast, the grid of Table 2b yields all Betti numbers for multi-indices in Table 2b equal to one, with no simplification possible.

4 Extension to sparse grids

Sparse grids are designs used in stochastic finite element approximation, numerical integration and interpolation Nobile et al. (2008) and Plumlee (2014). They have a reduced number of runs while preserving the accuracy of approximations.

To construct a sparse design, we use the construction of Definition 3. That is, we start from a reference grid \(G({\mathcal {A}})\). Recall that although the reference grid lies in \(\mathbb Z^k_{\ge 0}\), it is already an echelon design, and each factor \(x_i\) has \(n_i\) levels \(0,1,\ldots ,n_i-1\). Next, define new levels for the design D, in which the levels of variable \(x_i\) are replaced by new levels

$$\begin{aligned} z_{i,1}, \ldots , z_{i,n_i}, \end{aligned}$$

for \(i = 1, \ldots , k\). The replacement of levels is arbitrary and only requires a one to one mapping, hence the design D may not resemble at all its reference grid G. The operations described above can be reversed and design D in the class defined by the operation has a unique reference grid G(D).

The reversibility of operations implies that the design D still identifies a single polynomial model with exponents given by points in the reference grid G. The main result of the paper is that the inclusion–exclusion results for the tensor designs and interpolators that hold for the echelon designs extend to the sparse designs obtained by a set of transformation of level of each factor. We shall now call such a design a Smolyak grid.

It is clear that any component tensor grid of the reference grid is transformed into a tensor grid in the design D. Thus a Smolyak grid defined from a reference grid G(D) has a set of defining corner points \({\mathcal {A}}\) which then indexes an IE, for D.

We use the simple notation \(y_{\tilde{\alpha }}\) to indicate the interpolator over the block defined by the "corner" \(\alpha \), and \(\tilde{{\mathcal {A}}}\) for the full Smolyak grid. Over the sparse grid \(\tilde{{\mathcal {A}}}\), the (Betti) reduced IE polynomial interpolator is

$$\begin{aligned} y_{\tilde{{\mathcal {A}}}}(x) =\sum _{j=0}^{k-1}(-1)^j\sum _{\alpha \in B_j}\beta _{\alpha ,j}y_{\tilde{\alpha }}(x). \end{aligned}$$

Example 10

Consider a Smolyak design D with \(k=2\) factors and \(n=161\) points and with 15 levels per factor. The reference grid G has generators \(\{(14,6),(6,14)\}\) so that the inclusion–exclusion involves three tensor grids and corresponding design points as shown in Fig. 4.

Fig. 4
figure 4

The plots in the top row show in red the points of grid \(G(\alpha )\) and in black the points in \(G({\mathcal {A}})\setminus G(\alpha )\). Each plot is headed by \(\alpha \). The plots in the bottom row are the corresponding points in the Smolyak design

5 Conclusions and future research

The sparse grids described are based on a reference grid and despite the apparently predominant one-factor-at-a-time structure, the relabelling of coordinates allows an efficient exploration of the design space while keeping the number of runs low. These designs are an important subclass of experimental designs both from a theoretical point of view and because of their widespread use in applications.

Sparse grids have been mainly used in two disciplines: numerical analysis (NA) and experimental design (DE). A motivation for the current paper was to rediscover and clarify intricate formulæ used in the numerical analysis literature by using the Betti number decomposition of the last section.

For a fixed reference grid \(G({{\mathcal {A}}})\), the construction of a design for which \(G(D) = G({{\mathcal {A}}})\) still requires the specification of the numerical values of levels of each factor \(z_{i,1}, \ldots , z_{i,n_i}\). Thus if the design is to optimised, it should be optimised with respect both the reference grid G(D) and the spacings.

By analogy with factorial design and response surface design, one may outline a design first without specifying the spacing. A "star-composite design" is an example, being made up of a factorial design and a star without specifying the spacing of either. The spacing may be decided later based on the shape of the whole design region, preset candidate points, optimum design or some structural property e.g., rotatability. We could say, informally, that we shown one method of producing a generalised star design, but have left till later issues of spacing. This is partly because the numerical analysis definitions of optimality and the statistical ones are different so that more research is needed. We also note that although we worked on saturated models, actions like adding replicate points to improve statistical capabilities do not change the model identified.

In NA one may consider an approximation of a function by \(y_{\tilde{{\mathcal {A}}}}(x)\) and select the spare grid \(\tilde{{\mathcal {A}}}\) making use of the inequality

$$\begin{aligned} \vline \, \vline \, f - y_{\tilde{{\mathcal {A}}}}(x) \, \vline \, \vline \, \le \, \vline \, \vline \; f - y^* \; \vline \, \vline \times \Delta , \end{aligned}$$

where \(y^*\) is some best approximator to f, in some class, \(\Delta \) is the so-called Lebesgue constant, and \(\vline \, \vline \; \cdot \; \vline \, \vline \) a chosen metric. But sparse grids are beginning to attract interest in the statistical literature; an example is Plumlee (2014). There is an opportunity to draw the extensive literature on optimal design for application to sparse grids, in the choice of reference grid and spacing, whether for physical response surface experiments or computer experiments.

Whether in NA or DE, the use of algebra reduces complexity by tensor decomposition and we believe, this will help us to understand in future research the important trade off between sparsity and optimality.