Computing a Quantity of Interest from Observational Data

Abstract

Scientific problems often feature observational data received in the form \(w_1=l_1(f),\ldots ,w_m=l_m(f)\) of known linear functionals applied to an unknown function f from some Banach space \(\mathcal {X}\), and it is required either to approximate f (the full approximation problem) or to estimate a quantity of interest Q(f). In typical examples, the quantities of interest can be the maximum/minimum of f or some averaged quantity such as the integral of f, while the observational data consists of point evaluations. To obtain meaningful results about such problems, it is necessary to possess additional information about f, usually in the form of an assumption that f belongs to a certain model class \(\mathcal {K}\) contained in \(\mathcal {X}\). This is precisely the framework of optimal recovery, which has produced substantial results when the model class is a ball in a smoothness space, e.g., the unit ball in a Lipschitz, Sobolev, or Besov space. This paper is concerned with other model classes described by approximation processes, as studied in DeVore et al. [15]. Its main contributions are: (1) designing implementable optimal or near-optimal algorithms for the estimation of quantities of interest, (2) constructing linear optimal or near-optimal algorithms for the full approximation of an unknown function using its point evaluations. While the existence of linear optimal algorithms for the approximation of linear functionals Q(f) is a classical result established by Smolyak, a numerically friendly procedure that performs this approximation is not generally available. In this paper, we show that in classical recovery settings, such linear optimal algorithms can be produced by constrained minimization methods. We illustrate these techniques on several examples involving the computation of integrals using point evaluation data. In addition, we show that linearization of optimal algorithms can be achieved for the full approximation problem in the important situation where the \(l_j\) are point evaluations and \(\mathcal {X}\) is a space of continuous functions equipped with the uniform norm. We also show how quasi-interpolation theory enables the construction of linear near-optimal algorithms for the recovery of the underlying function.
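
As a first illustration of the constrained-minimization approach described above, the sketch below estimates the integral of a function from point values by computing the smallest and largest values of the integral over polynomials consistent with the data and returning the midpoint of this interval. It is only meant to convey the flavor of the optimal-interval (Chebyshev-center) idea, not to reproduce the paper's algorithms: the choice of the Legendre basis, the degree, the tolerance, the equispaced points, and the test function are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): estimate Q(f) = int_{-1}^{1} f from data
# w_j = f(x_j), assuming f is within eps (in the uniform norm) of the space V of
# polynomials of degree < n.  The interval {Q(v) : v in V, |v(x_j) - w_j| <= eps}
# is found by two linear programs; its midpoint is returned as the estimate.
import numpy as np
from numpy.polynomial import legendre
from scipy.optimize import linprog

def estimate_integral(x, w, n, eps):
    """Midpoint of {Q(v) : v in V, |v(x_j) - w_j| <= eps for all j}."""
    V = legendre.legvander(x, n - 1)             # V[j, k] = P_k(x_j)
    c = np.zeros(n); c[0] = 2.0                  # Q(v) = 2*a_0 in the Legendre basis
    A_ub = np.vstack([V, -V])                    # -eps <= V a - w <= eps
    b_ub = np.concatenate([w + eps, eps - w])
    bounds = [(None, None)] * n                  # coefficients are unconstrained
    lo = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun
    hi = -linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 20)               # point-evaluation functionals
    w = np.exp(x)                                # observational data for f = exp
    print(estimate_integral(x, w, n=8, eps=1e-3))   # close to e - 1/e ~ 2.3504
```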

Notes

  1. It is worth mentioning that the classical notion of a Chebyshev center of a set considered in this paper is different from the more computable notion considered in [8], which corresponds to the center of the largest ball contained in the given set.

  2. There are various conditions on Q ensuring that \(Q(\mathcal{K}_w(\epsilon ,V))\) is bounded, e.g., Q being a Lipschitz map.

  3. The correction (4.4) may be omitted, since a pointwise near-optimal algorithm is already provided by \(w\mapsto v(w)\), but it makes the algorithm A data-consistent.

References

  1. Adcock, B., Hansen, A.C.: Stable reconstructions in Hilbert spaces and the resolution of the Gibbs phenomenon. Appl. Comput. Harmon. Anal. 32, 357–388 (2012)

  2. Adcock, B., Hansen, A.C., Poon, C.: Beyond consistent reconstructions: optimality and sharp bounds for generalized sampling, and application to the uniform resampling problem. SIAM J. Math. Anal. 45, 3132–3167 (2013)

  3. Adcock, B., Platte, R.B., Shadrin, A.: Optimal sampling rates for approximating analytic functions from pointwise samples. IMA J. Numer. Anal. (2018). https://doi.org/10.1093/imanum/dry024

  4. Bakhvalov, N.S.: On the optimality of linear methods for operator approximation in convex classes of functions. USSR Comput. Math. Math. Phys. 11, 244–249 (1971)

  5. Binev, P., Cohen, A., Dahmen, W., DeVore, R., Petrova, G., Wojtaszczyk, P.: Convergence rates for greedy algorithms in reduced basis methods. SIAM J. Math. Anal. 43, 1457–1472 (2011)

  6. Binev, P., Cohen, A., Dahmen, W., DeVore, R., Petrova, G., Wojtaszczyk, P.: Data assimilation in reduced modeling. SIAM/ASA J. Uncertain. Quantif. 5, 1–29 (2017)

  7. Bojanov, B.: Optimal recovery of functions and integrals. In: First European Congress of Mathematics, Vol. I (Paris, 1992), Progress in Mathematics, vol. 119, pp. 371–390. Birkhäuser, Basel (1994)

  8. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  9. Cohen, A., Dahmen, W., DeVore, R.: Compressed sensing and best \(k\)-term approximation. J. Am. Math. Soc. 22, 211–231 (2009)

  10. Coppersmith, D., Rivlin, T.: The growth of polynomials bounded at equally spaced points. SIAM J. Math. Anal. 23, 970–983 (1992)

  11. Creutzig, J., Wojtaszczyk, P.: Linear versus nonlinear algorithms for linear problems. J. Complex. 20, 807–820 (2004)

  12. CVX Research, Inc.: CVX: MATLAB software for disciplined convex programming, version 2.1. http://cvxr.com/cvx (2014)

  13. DeVore, R.: Nonlinear approximation. Acta Numer. 7, 51–150 (1998)

  14. DeVore, R., Lorentz, G.G.: Constructive Approximation, Grundlehren der mathematischen Wissenschaften, vol. 303. Springer, Berlin (1993)

  15. DeVore, R., Petrova, G., Wojtaszczyk, P.: Data assimilation and sampling in Banach spaces. Calcolo 54, 1–45 (2017)

  16. Driscoll, T.A., Hale, N., Trefethen, L.N. (eds.): Chebfun Guide. Pafnuty Publications, Oxford (2014)

  17. Elad, M.: Sparse and Redundant Representations. Springer, Berlin (2010)

  18. Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Birkhäuser, Basel (2013)

  19. Kalman, J.A.: Continuity and convexity of projections and barycentric coordinates in convex polyhedra. Pac. J. Math. 11, 1017–1022 (1961)

  20. Lindenstrauss, J.: Extension property for compact operators. Mem. Am. Math. Soc. 48 (1964)

  21. Marcinkiewicz, J., Zygmund, A.: Mean values of trigonometrical polynomials. Fundamenta Mathematicae 28, 131–166 (1937)

  22. Micchelli, C., Rivlin, T.: Lectures on optimal recovery. In: Numerical Analysis (Lancaster, 1984), Lecture Notes in Mathematics, vol. 1129, pp. 21–93. Springer, Berlin (1985)

  23. Micchelli, C., Rivlin, T., Winograd, S.: The optimal recovery of smooth functions. Numer. Math. 26, 191–200 (1976)

  24. Milman, V., Schechtman, G.: Asymptotic Theory of Finite Dimensional Normed Spaces, Lecture Notes in Mathematics, vol. 1200. Springer, Berlin (1986)

  25. Osipenko, K.Yu.: Best approximation of analytic functions from information about their values at a finite number of points. Math. Notes Acad. Sci. USSR 19(1), 17–23 (1976)

  26. Platte, R., Trefethen, L., Kuijlaars, A.: Impossibility of fast stable approximation of analytic functions from equispaced samples. SIAM Rev. 53, 308–318 (2011)

  27. Schönhage, A.: Fehlerfortpflanzung bei Interpolation. Numer. Math. 3, 62–71 (1961)

  28. Traub, J., Wozniakowski, H.: A General Theory of Optimal Algorithms. Academic Press, New York (1980)

  29. Turetskii, A.H.: The bounding of polynomials prescribed at equally distributed points. Proc. Pedag. Inst. Vitebsk 3, 117–127 (1940) (in Russian)

  30. Wilson, M.W.: Necessary and sufficient conditions for equidistant quadrature formula. SIAM J. Numer. Anal. 7(1), 134–141 (1970)

  31. Zippin, M.: Extension of bounded linear operators. In: Handbook of the Geometry of Banach Spaces, vol. 2, pp. 1703–1741. North-Holland, Amsterdam (2003)

  32. Zygmund, A.: Trigonometric Series. Cambridge University Press, Cambridge (2002)

Author information

Correspondence to Simon Foucart.

Additional information

Communicated by Wolfgang Dahmen.

This research was supported by the ONR Contracts N00014-15-1-2181 and N00014-16-1-2706, the NSF Grant DMS 1521067, and DARPA through Oak Ridge National Laboratory; by the NSF Grant DMS 1622134; and by the National Science Centre, Poland, Grant UMO-2016/21/B/ST1/00241.

Appendix

Finally, we provide full justifications for several results that were used earlier without proof, namely (3.12), (4.1), and Lemma 4.1. We start with (3.12).

Lemma 6.1

For any polynomial \(r\in \mathcal{P}_d\), one has

$$\begin{aligned} \Vert r\Vert ^2_{ C[-1,1]}\le \frac{(d+1)^2}{2}\Vert r\Vert ^2_{L_2[-1,1]}, \end{aligned}$$
(6.1)

and the inequality is sharp.

Proof

Let us consider the expansion of r with respect to the Legendre polynomials \(P_j\) normalized so that \(P_j(1)=1\) and \(\Vert P_j\Vert _{L_2[-1,1]}^2=\dfrac{2}{2j+1}\); that is,

$$\begin{aligned} r(x)= & {} \sum _{j=0}^d\Vert P_j\Vert ^{-2}_{L_2[-1,1]}\left( \intop \limits _{-1}^1 r(y)P_j(y)\,dy\right) P_j(x) =\intop \limits _{-1}^1 \left( \sum _{j=0}^d\frac{P_j(x)P_j(y)}{\Vert P_j\Vert ^{2}_{L_2[-1,1]}}\right) r(y)\,dy \\=: & {} \intop \limits _{-1}^1 k(x,y) r(y)\,dy. \end{aligned}$$

Since

$$\begin{aligned} |r(x)| = \Big | \intop \limits _{-1}^1 k(x,y) r(y) dy \Big | \le \Vert k(x,\cdot )\Vert _{L_2[-1,1]} \Vert r\Vert _{L_2[-1,1]}, \end{aligned}$$

the statement in the lemma follows from the fact that

$$\begin{aligned} \Vert k(x,\cdot )\Vert _{L_2[-1,1]}^2= & {} \intop \limits _{-1}^1\left( \sum _{j=0}^d\frac{P_j(x)P_j(y)}{\Vert P_j\Vert _{L_2[-1,1]}^{2}}\right) ^2\,dy= \sum _{j=0}^d \frac{P^2_j(x)}{ \Vert P_j\Vert _{L_2[-1,1]}^{4}}\intop \limits _{-1}^1P^2_j(y)\,dy\\= & {} \sum _{j=0}^d \frac{P^2_j(x)}{\Vert P_j\Vert _{L_2[-1,1]}^{2}} \le \sum \limits _{j=0}^d \frac{1}{\Vert P_j\Vert _{L_2[-1,1]}^{2}} = \sum _{j=0}^d \frac{2j+1}{2}= \frac{(d+1)^2}{2}. \end{aligned}$$

Inequality (6.1) is sharp because, for \(r=k(1,\cdot )\) and \(x=1\), all of the above inequalities become equalities: the Cauchy–Schwarz inequality is applied to proportional functions, and \(P_j^2(1)=1\) for every j. \(\square \)
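
The sharpness assertion is easy to check numerically. The following sketch (ours, not part of the paper) evaluates the ratio \(\Vert r\Vert ^2_{C[-1,1]}/\Vert r\Vert ^2_{L_2[-1,1]}\) for the extremal polynomial \(r=k(1,\cdot )\), whose Legendre coefficients are \((2j+1)/2\), and verifies that random polynomials of the same degree stay below the bound \((d+1)^2/2\).

```python
# Numerical check of Lemma 6.1 (illustrative sketch).
import numpy as np
from numpy.polynomial import legendre

def ratio(coeffs):
    """||r||_{C[-1,1]}^2 / ||r||_{L_2[-1,1]}^2 for r given by Legendre coefficients."""
    xs = np.linspace(-1.0, 1.0, 20001)
    sup = np.max(np.abs(legendre.legval(xs, coeffs)))      # ~ ||r||_{C[-1,1]}
    k = np.arange(len(coeffs))
    l2sq = np.sum(coeffs**2 * 2.0 / (2.0 * k + 1.0))       # ||r||_{L_2[-1,1]}^2, exact
    return sup**2 / l2sq

d = 10
extremal = (2.0 * np.arange(d + 1) + 1.0) / 2.0            # coefficients of k(1, .)
print(ratio(extremal), (d + 1)**2 / 2.0)                   # both equal 60.5
rng = np.random.default_rng(0)
print(max(ratio(rng.standard_normal(d + 1)) for _ in range(100)) <= (d + 1)**2 / 2.0)
```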

Next, we continue by restating (4.1).

Lemma 6.2

Let V be an n-dimensional subspace of C(D) and \(x_1,\ldots ,x_m \in D\) be m distinct points in D. If \(\mathcal{N}:= \{ \eta \in C(D): \eta (x_1)= \cdots = \eta (x_m)=0 \}\), then

$$\begin{aligned} \mu (\mathcal{N},V)_{C(D)} = 1 + \mu (V,\mathcal{N})_{C(D)}. \end{aligned}$$

Proof

In view of (1.2), it is enough to establish that

$$\begin{aligned} \mu (\mathcal{N},V)_{C(D)} \ge 1 + \mu (V,\mathcal{N})_{C(D)}. \end{aligned}$$
(6.2)

Let us define

$$\begin{aligned} \mu := \mu (V,\mathcal{N})_{C(D)} = \max _{v \in V} \frac{\Vert v\Vert _{C(D)}}{\displaystyle {\max _{1 \le j \le m} |v(x_j)|}}, \end{aligned}$$

and pick \(v \in V\) with \(\max _{1 \le j \le m} |v(x_j)| = 1\) and \(\Vert v\Vert _{C(D)} = \mu \). If \( \mu > 1\), choose \(x^* \in D\) such that \(|v(x^*)| = \mu \) and therefore \(x^* \not \in \{x_1,\ldots ,x_m\}\). If \(\mu = 1\), choose \(x^* \in D \setminus \{x_1,\ldots ,x_m\}\) such that \(|v(x^*)| \ge \mu - \delta \) for an arbitrarily small \(\delta > 0\). We introduce a function \(h \in C(D)\) satisfying

$$\begin{aligned} h(x_j) = v(x_j), \; j=1,\ldots ,m, \qquad h(x^*) = -\mathrm{sgn}(v(x^*)), \qquad \Vert h\Vert _{C(D)} = 1. \end{aligned}$$

Clearly, the function \(\eta := v-h\) belongs to \(\mathcal{N}\), and we have

$$\begin{aligned} \mu (\mathcal{N},V)_{C(D)} \ge \frac{\Vert \eta \Vert _{C(D)}}{\Vert \eta - v\Vert _{C(D)}} = \frac{\Vert v-h\Vert _{C(D)}}{\Vert h\Vert _{C(D)}} \ge |v(x^*)-h(x^*)| \ge \mu - \delta + 1. \end{aligned}$$

Since \(\delta > 0\) was arbitrary (and \(\delta = 0\) when \(\mu > 1\)), this proves (6.2). \(\square \)
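
In concrete settings, the quantity \(\mu (V,\mathcal{N})_{C(D)}\) appearing above can be approximated numerically: for each fixed location \(x^*\), maximizing \(v(x^*)\) subject to \(|v(x_j)|\le 1\), \(j=1,\ldots ,m\), is a linear program in the coefficients of v. The sketch below (an illustration of ours; the Chebyshev basis, the evaluation grid, and the equispaced sample points are convenience choices) does this for V the algebraic polynomials of degree less than n on \(D=[-1,1]\); by Lemma 6.2, adding 1 to the output gives \(\mu (\mathcal{N},V)_{C(D)}\).

```python
# Sketch: approximate mu(V, N)_{C(D)} = max_{v in V} ||v||_{C(D)} / max_j |v(x_j)|
# for V = polynomials of degree < n on D = [-1,1] and equispaced sample points,
# by one small linear program per candidate location of the maximum.
import numpy as np
from numpy.polynomial import chebyshev
from scipy.optimize import linprog

def mu_V_N(n, sample_pts, grid_size=1001):
    T = chebyshev.chebvander(sample_pts, n - 1)      # rows T_k(x_j); a stable basis
    A_ub = np.vstack([T, -T])                        # |v(x_j)| <= 1 for all j
    b_ub = np.ones(2 * len(sample_pts))
    bounds = [(None, None)] * n                      # coefficients unconstrained
    best = 0.0
    # each LP is bounded since n <= m guarantees that V intersects N only at 0
    for xs in np.linspace(-1.0, 1.0, grid_size):
        c = -chebyshev.chebvander(np.array([xs]), n - 1).ravel()   # maximize v(xs)
        best = max(best, -linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun)
    return best

m, n = 20, 10
x = np.linspace(-1.0, 1.0, m)                        # equispaced sampling points
mu = mu_V_N(n, x)
print(mu, 1.0 + mu)                                  # mu(V,N) and, by Lemma 6.2, mu(N,V)
```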

Finally, we prove Lemma 4.1 stated in a slightly different version below.

Lemma 6.3

Let \(\theta _1,\ldots ,\theta _N\) be N distinct points in \(\mathbb {R}^n\) with convex hull \(\mathcal{C}:= \mathrm{conv}\{\theta _1,\ldots ,\theta _N\}\). Then, there exist functions \(\psi ^{(N)}_j:\mathcal{C}\rightarrow \mathbb {R}\), \(j=1,\ldots ,N\), such that

  1. (i)

    \(\psi ^{(N)}_1,\ldots ,\psi ^{(N)}_N\) are continuous on \(\mathcal{C}\);

  2. (ii)

    for any affine function \(\lambda : \mathbb {R}^n \rightarrow \mathbb {R}\) (in particular for \(\lambda (\theta )=1\) and for each coordinate function \(\lambda (\theta )=\theta _k\)),

    $$\begin{aligned} \sum _{i=1}^N \psi ^{(N)}_i(\theta ) \lambda (\theta _i) = \lambda (\theta ) \qquad \text{ whenever } \theta \in \mathcal{C}; \end{aligned}$$
  3. (iii)

    for all \(i=1,\ldots , N\), \(\psi ^{(N)}_i(\theta ) \ge 0\) whenever \(\theta \in \mathcal{C}\);

  4. (iv)

    for all \(i,j = 1,\ldots ,N\), \(\psi ^{(N)}_i(\theta _j) = \delta _{i,j}\).

Proof

We proceed by induction on \(N \ge 1\). The result is clear for \(N=1\) and \(N=2\). Let us assume that it holds up to \(N-1\) for some integer \(N \ge 3\) and that we are given N distinct points \(\theta _1,\ldots ,\theta _N \in \mathbb {R}^n\). We separate two cases.

Case 1: Each \(\theta _j\) is an extreme point of \(\mathcal{C}:= \mathrm{conv} \{ \theta _1,\ldots , \theta _N \}\). In this case, we invoke the result of Kalman [19] and consider the functions \(\psi ^{(N)}_1,\ldots ,\psi ^{(N)}_N\) constructed there, which satisfy (i)–(iii). Condition (iv) then follows from (ii) and (iii). Indeed, given \(j = 1,\ldots ,N\), since \(\theta _j\) is an extreme point, one can find an affine function \(\lambda : \mathbb {R}^n \rightarrow \mathbb {R}\) such that \(\lambda (\theta _j)=0\) and \(\lambda (\theta _i) >0\) for all \(i \not = j\). Therefore,

$$\begin{aligned} \sum _{i=1}^N \psi ^{(N)}_i(\theta _j) \lambda (\theta _i) = \lambda (\theta _j)=0 \end{aligned}$$

implies that \(\psi ^{(N)}_i(\theta _j) =0 \) for all \(i \not = j\), and then \(\psi ^{(N)}_j(\theta _j) =1 \) follows from \(\sum _{i=1}^N \psi ^{(N)}_i(\theta _j) =1\).

Case 2: One of the \(\theta _j\)’s belongs to the convex hull of the other \(\theta _i\)’s, say \(\theta _N \in \mathrm{conv} \{ \theta _1,\ldots , \theta _{N-1} \}\). Let \(\psi ^{(N-1)}_1,\ldots ,\psi ^{(N-1)}_{N-1}\) be the functions defined on \(\mathcal{C}= \mathrm{conv} \{ \theta _1,\ldots , \theta _{N-1} \} = \mathrm{conv} \{ \theta _1,\ldots , \theta _N \}\) that are obtained from the induction hypothesis applied to the \(N-1\) distinct points \(\theta _1,\ldots ,\theta _{N-1}\). Next, we introduce the set \(\Omega \), which has at least two elements (if it had a single element k, then (ii) and (iii) would give \(\psi ^{(N-1)}_k(\theta _N)=1\) and hence \(\theta _N=\theta _k\), contradicting the distinctness of the points), and the function \(\tau \), which is continuous on \(\mathcal{C}\), given by

$$\begin{aligned} \Omega := \{ j = 1,\ldots ,N-1: \psi ^{(N-1)}_j(\theta _N) > 0 \}, \quad \tau (\theta ) := \min _{j \in \Omega } \frac{\psi ^{(N-1)}_j(\theta )}{\psi ^{(N-1)}_j(\theta _N)}, \quad \theta \in \mathcal{C}. \end{aligned}$$

Finally, we define functions \(\psi ^{(N)}_1,\ldots ,\psi ^{(N)}_N\) by

$$\begin{aligned} \psi ^{(N)}_i(\theta ) := \psi ^{(N-1)}_i(\theta ) - \psi ^{(N-1)}_i(\theta _N) \tau (\theta ), \quad i=1,\ldots ,N-1, \quad \psi ^{(N)}_N(\theta ) := \tau (\theta ). \end{aligned}$$

These are continuous functions of \(\theta \in \mathcal{C}\), so (i) is satisfied. To verify (ii), given an affine function \(\lambda : \mathbb {R}^n \rightarrow \mathbb {R}\), we observe that

$$\begin{aligned} \sum _{i=1}^N \psi ^{(N)}_i (\theta ) \lambda (\theta _i)= & {} \sum _{i=1}^{N-1} \psi ^{(N-1)}_i (\theta ) \lambda (\theta _i) - \tau (\theta ) \sum _{i=1}^{N-1} \psi ^{(N-1)}_i (\theta _N) \lambda (\theta _i) + \tau (\theta ) \lambda (\theta _N)\\= & {} \lambda (\theta ) - \tau (\theta ) \lambda (\theta _N) + \tau (\theta ) \lambda (\theta _N) = \lambda (\theta ). \end{aligned}$$

As for (iii), given \(\theta \in \mathcal{C}\), the fact that \(\psi ^{(N)}_N(\theta ) \ge 0\) is clear from the definition of \(\tau \), and for \(i=1,\ldots ,N-1\), the fact that \(\psi ^{(N)}_i(\theta ) \ge 0\) is equivalent to \(\psi ^{(N-1)}_i(\theta _N) \tau (\theta ) \le \psi ^{(N-1)}_i(\theta )\), which is obvious if \(i \not \in \Omega \) and follows from the definition of \(\tau \) if \(i \in \Omega \). Finally, to prove (iv), it is enough to verify that \(\psi ^{(N)}_i(\theta _i) = 1\) for all \(i=1,\ldots ,N\): by (ii) applied to \(\lambda \equiv 1\) together with (iii), this forces \(\psi ^{(N)}_i(\theta _j) = 0\) whenever \(i \not = j\). The claim clearly holds for \(i=N\), since \(\tau (\theta _N) = 1\). For \(i=1,\ldots ,N-1\), it follows from the identity \(\psi ^{(N-1)}_i(\theta _N) \tau (\theta _i) = 0\), which yields \(\psi ^{(N)}_i(\theta _i) = \psi ^{(N-1)}_i(\theta _i) = 1\). This identity holds when \(i \not \in \Omega \) because \(\psi ^{(N-1)}_i(\theta _N) = 0\), and when \(i \in \Omega \) because \(\tau (\theta _i) = 0\), the set \(\Omega \) containing some \(j \not = i\) with \(\psi ^{(N-1)}_j(\theta _i) = 0\). We have now shown that the statement holds for N, and this concludes the inductive proof. \(\square \)
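
The recursion of Case 2 is straightforward to implement once base-case functions are available. The sketch below (ours, under the simplifying assumption that the first points form a nondegenerate simplex, so that Kalman's functions reduce to ordinary barycentric coordinates) adds an interior point via \(\tau \) and the formulas above and checks properties (ii)-(iv) numerically; all names and the chosen points are illustrative.

```python
# Sketch of the Case 2 recursion of Lemma 6.3 for generalized barycentric coordinates.
import numpy as np

def barycentric_simplex(vertices):
    """Ordinary barycentric coordinates on a nondegenerate simplex in R^n."""
    V = np.vstack([np.asarray(vertices, float).T, np.ones(len(vertices))])
    Vinv = np.linalg.inv(V)                          # solves V b = [theta; 1]
    return lambda theta: Vinv @ np.append(np.asarray(theta, float), 1.0)

def add_point(psi, theta_new):
    """Extend coordinates psi by one more point of the hull (Case 2 of Lemma 6.3)."""
    p_new = psi(theta_new)
    omega = p_new > 1e-12                            # the index set Omega
    def new_psi(theta):
        p = psi(theta)
        tau = np.min(p[omega] / p_new[omega])        # tau(theta)
        return np.append(p - p_new * tau, tau)       # psi^(N)_1, ..., psi^(N)_N
    return new_psi

# Example in R^2: a triangle plus one interior point.
pts = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
psi = barycentric_simplex(pts)
theta4 = np.array([0.25, 0.25])
psi = add_point(psi, theta4)
pts.append(theta4)
theta = np.array([0.3, 0.2])                         # an arbitrary point of the hull
coeffs = psi(theta)
print((coeffs >= -1e-12).all())                                   # (iii)
print(np.allclose(sum(c * p for c, p in zip(coeffs, pts)), theta),
      np.isclose(coeffs.sum(), 1.0))                              # (ii)
print(np.allclose([psi(p)[i] for i, p in enumerate(pts)], 1.0))   # (iv)
```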


Cite this article

DeVore, R., Foucart, S., Petrova, G. et al. Computing a Quantity of Interest from Observational Data. Constr Approx 49, 461–508 (2019). https://doi.org/10.1007/s00365-018-9433-7
