Deep ReLU neural networks in high-dimensional approximation
Introduction
Neural networks have been studied and used for almost 80 years, dating back to the foundational works of McCulloch and Pitts (1943), Hebb (1949) and Rosenblatt (1958). In recent years, deep neural networks have been successfully applied to a striking variety of machine learning problems, including computer vision (Krizhevsky, Sutskever, & Hinton, 2012), natural language processing (Wu, Schuster, Chen, Le, & Norouzi, 2016), and speech recognition and image classification (LeCun, Bengio, & Hinton, 2015). The main advantage of deep neural networks over shallow ones is that they can output compositions of functions cheaply. As their range of application widens, theoretical analysis revealing why deep neural networks lead to significant practical improvements has attracted substantial attention (Arora et al., 2017, Daubechies et al., 2019, Montúfar et al., 2014, Telgarsky, 2015, Telgarsky, 2016). In the last several years, a number of interesting papers have addressed the role of depth and architecture of deep neural networks in approximating sets of functions with very special regularity properties, such as analytic functions (E and Wang, 2018, Mhaskar, 1996), differentiable functions (Petersen and Voigtlaender, 2018, Yarotsky, 2017a), oscillatory functions (Grohs, Perekrestenko, Elbrachter, & Bolcskei, 2019), functions in isotropic Sobolev or Besov spaces (Ali and Nouy, 2020, Daubechies et al., 2019, Gribonval et al., 2021, Gühring et al., 2020, Yarotsky, 2017b) and functions in spaces of mixed smoothness (Montanelli and Du, 2019, Suzuki, 2019).
It has been shown that there is a close relation between approximation by sampling recovery based on B-spline interpolation and quasi-interpolation representations, and approximation by deep neural networks (Ali and Nouy, 2020, Daubechies et al., 2019, Montanelli and Du, 2019, Schwab and Zech, 2019, Suzuki, 2019, Yarotsky, 2017a, Yarotsky, 2017b). Most of these papers used deep ReLU (Rectified Linear Unit) neural networks for approximation, since the rectified linear unit is a simple and preferable activation function in many applications. The output of such a network is a continuous piecewise linear function which is easily and cheaply computed.
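For instance, the continuous piecewise linear "hat" function on [0, 1] is representable exactly by a ReLU network with a single hidden layer of two units. The following minimal sketch (our own illustration; the function names are ours) makes this concrete:

```python
def relu(x):
    """The ReLU activation: max(x, 0)."""
    return max(x, 0.0)

def hat(x):
    # Two ReLU units in one hidden layer realize the continuous
    # piecewise linear "tent" min(x, 1 - x) exactly on [0, 1]:
    #   hat(x) = ReLU(x) - 2 * ReLU(x - 1/2).
    return relu(x) - 2.0 * relu(x - 0.5)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, hat(x))  # vanishes at 0 and 1, peaks at x = 1/2 with value 1/2
```

Composing such pieces across layers is what lets deep networks produce complicated piecewise linear outputs at low cost.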
In recent decades, the high-dimensional approximation of functions or signals depending on a large number of variables has been of great interest, since it arises in a striking number of fields such as mathematical finance, chemistry, quantum mechanics, meteorology and, in particular, uncertainty quantification and deep machine learning. A numerical method for such problems may require a computational cost that increases exponentially in the dimension as the accuracy increases. This phenomenon is called the curse of dimensionality, a term coined by Bellman (1957). Hence, for efficient computation in high-dimensional approximation, one of the key prerequisites is that the curse of dimensionality can be avoided or at least eased to some extent. In some cases this can be achieved, particularly when the functions to be approximated have an appropriate mixed smoothness; see Bungartz and Griebel (2004), Novak and Woźniakowski (2008, 2010) and the references therein. Under this restriction one can apply approximation methods and sampling algorithms constructed on hyperbolic crosses and sparse grids, which are surprisingly effective: hyperbolic crosses and sparse grids have far fewer elements than standard domains and grids, yet give the same approximation error. This essentially reduces the computational cost, and therefore makes the problem tractable.
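To give a feel for this reduction in cardinality, the sketch below (our own illustration, using a simplified model of a hyperbolic cross: the multi-indices k with the product of the (k_j + 1) bounded by n + 1) counts index sets in a moderate dimension:

```python
from itertools import product

def hyperbolic_cross_size(n, d):
    # Count multi-indices k in {0, ..., n}^d whose coordinates satisfy
    # prod_j (k_j + 1) <= n + 1 -- a simplified hyperbolic cross.
    count = 0
    for k in product(range(n + 1), repeat=d):
        p = 1
        for kj in k:
            p *= kj + 1
        if p <= n + 1:
            count += 1
    return count

n, d = 7, 3
full = (n + 1) ** d               # full tensor-product index set
cross = hyperbolic_cross_size(n, d)  # far fewer indices for the same n
print(cross, full)
```

Already for d = 3 the cross retains only a small fraction of the full grid of indices, and the gap widens rapidly with the dimension.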
The approximation by deep ReLU neural networks of functions having a mixed smoothness is closely related to the high-dimensional sparse-grid approach, which was introduced by Zenger for the numerical solution of partial differential equations (PDEs). For functions of mixed smoothness of integer order, high-dimensional sparse-grid approximations with applications were investigated by Bungartz and Griebel (2004), employing a multilevel basis of hierarchical Lagrange polynomials and measuring the approximation error in the L2 norm or the energy norm. In the paper Yserentant (2010) on the electronic Schrödinger equation with a very large number of variables, Yserentant used sparse-grid methods for the approximation of the eigenfunctions of the electronic Schrödinger operator, which have a certain mixed smoothness. Triebel (2015, Chapter 6) has shown that when the initial data belong to spaces with mixed smoothness, the Navier–Stokes equations admit a unique solution having some mixed smoothness. There are too many papers on sparse grids in various problems of high-dimensional approximation, the numerical solution of PDEs, stochastic PDEs, etc., to mention all of them; the reader can consult the survey Bungartz and Griebel (2004) and the references therein.
Consider the problem of approximating functions on a d-dimensional domain from a space of a particular smoothness, by trigonometric polynomials (in the periodic case) or dyadic B-splines, with accuracy ε and with the error measured in the norm of a Lebesgue space or of an isotropic Sobolev space. If the target space is a Hölder space of isotropic smoothness, the computational complexity in the two norms is typically estimated by bounds of a similar form. For the Hölder space of mixed smoothness, the bounds on the computational complexity take quite different forms, involving a constant that depends on the dimension d as well as on the norm in which the smoothness is defined. Similar estimates hold for the computational complexity of the approximation of functions in Sobolev- or Besov-type spaces of isotropic and mixed smoothness. Notice also that only in the last case is the exponent of ε in the bound free of the dimension d. As usual, in classical settings of the approximation problem, which do not take dimension-dependence into account, this constant is not of interest and its value is not specified.
One of the central problems in high-dimensional approximation is to give an evaluation, explicit in the dimension d, of the constant in the above-mentioned estimates of the computational complexity, in order to understand the tractability of the approximation problem.
We briefly recall some known results on approximation by deep ReLU neural networks directly related to the present paper. In Yarotsky (2017a), the author constructed a deep ReLU neural network, with explicit bounds on its depth and size, capable of approximating with accuracy ε functions from the unit ball of an isotropic Sobolev space. By using known results on the VC-dimension of deep ReLU neural networks, a lower bound was also established for this approximation. In Gühring et al. (2020), this result was extended to the case when the error is measured in Sobolev norms. Considering the approximation of functions from the unit ball of a Besov space of mixed smoothness by deep ReLU networks of a given depth and size, the author of Suzuki (2019) evaluated the approximation error; the lower bound on the approximation error was estimated via known results on linear widths. In Montanelli and Du (2019), the authors constructed a deep ReLU neural network for the approximation, with accuracy ε, of functions with homogeneous boundary condition from a Sobolev space of mixed smoothness, and evaluated its depth and size. Notice that in all of these estimates for computation complexity and convergence rate, the hidden constants were not computed explicitly in the dimension d. In particular, in the proof of the convergence rate in Suzuki (2019), the author used a discrete quasi-norm equivalence for Besov spaces established in Dũng (2011a), which does not allow one to find such constants explicitly in d. Also, due to the homogeneous boundary condition of the functions from the unit ball of the spaces considered in Montanelli and Du (2019), their norm decreases very fast when the dimension d goes to infinity; see Remark 4.6 for details.
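A key ingredient in Yarotsky's construction is that m compositions of a "sawtooth" ReLU map approximate the square function on [0, 1] with error decaying like 4^(-m), so each extra layer buys accuracy geometrically. The sketch below follows the published idea; the function names are ours:

```python
def sawtooth(x):
    # One ReLU layer: g(x) = 2x on [0, 1/2] and 2(1 - x) on [1/2, 1].
    return 2.0 * min(x, 1.0 - x)

def square_approx(x, m):
    # Yarotsky's approximation of x^2 on [0, 1]:
    #   x^2 ~ x - sum_{s=1}^{m} g_s(x) / 4^s,
    # where g_s is the s-fold composition of the sawtooth map.
    # The truncation error is bounded by 4^{-m} / 3.
    total = x
    t = x
    for s in range(1, m + 1):
        t = sawtooth(t)
        total -= t / 4.0 ** s
    return total

print(square_approx(0.5, 2))   # exact at dyadic points: 0.25
```

From such a squaring network one obtains multiplication via xy = ((x + y)^2 - x^2 - y^2) / 2, the basic building block used throughout this line of work for approximating polynomials and splines.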
The purpose of the present paper is to study the computation complexity of deep ReLU neural networks for the high-dimensional approximation of functions from a Hölder–Zygmund space of mixed smoothness satisfying the homogeneous boundary condition, when the dimension d may be very large. The approximation error is measured in the norm of an isotropic Sobolev space. We focus our attention on the d-dependence of this computation complexity. For every function f from this space, we want to explicitly construct a deep ReLU neural network having an output that approximates f with a prescribed accuracy ε, and to prove dimension-dependent bounds of the computation complexity of this approximation, characterized by its size and depth, explicitly in d and ε (cf. Anthony and Bartlett, 2009, Daubechies et al., 2019, Montanelli and Du, 2019, Yarotsky, 2017a).
Let us emphasize that this problem of approximating functions of mixed smoothness with the error measured in the norm of an isotropic Sobolev space, in particular in the energy norm, naturally arises from high-dimensional approximation and numerical methods for PDEs; see Bungartz and Griebel (2004), Garcke et al. (2001) and Griebel and Knapek (2009) for Poisson's equation. For elliptic PDEs with homogeneous boundary condition, if the initial data and diffusion coefficients have a mixed smoothness, then the solution belongs to a space of mixed smoothness of a certain order. One can then consider the problem of approximating this solution by deep ReLU neural networks with the error measured in the energy norm. See a detailed example in Remark 4.5.
We briefly describe our contribution to high-dimensional approximation by deep ReLU neural networks. Denote by U the unit ball of the Hölder–Zygmund space under consideration. For every f in U, we explicitly construct a deep ReLU neural network having an output that approximates f in the Sobolev norm with a prescribed accuracy ε, and we bound its computation complexity through dimension-dependent estimates of its size and depth. Notice that the upper bounds on the size and the depth each consist of three terms: the first term is independent of the dimension d and the accuracy ε, the second term depends only on d, and the third term depends only on ε. For the depth, the second term is very mild. If a light restriction holds, the second term in the size bound also satisfies a favorable inequality for small ε.
By using a recent result on VC-dimension bounds for piecewise linear neural networks by Bartlett, Harvey, Liaw, and Mehrabian (2019), we prove a matching dimension-dependent lower bound: for a given ε, if a neural network architecture of a given depth is such that every f in U can be approximated with accuracy ε by some deep ReLU neural network with this architecture, then the size of the architecture admits a lower bound with an explicit, dimension-dependent constant.
The proofs of these results, and in particular the construction of the approximating deep ReLU neural networks, rely on interpolation sampling recovery methods on sparse grids of points, tailored to the Hölder–Zygmund mixed smoothness and the regularity of the isotropic Sobolev space. These sampling recovery methods are explicitly constructed as truncated Faber series of the functions to be approximated.
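In one dimension, the Faber (Faber–Schauder) representation expands a continuous function on [0, 1] over dyadic hat functions, with hierarchical coefficients given by second differences at dyadic midpoints. A minimal sketch (our own illustration; the names and the truncation convention are assumptions):

```python
def tent(x):
    # Reference hat function: peak 1 at 0, support [-1, 1].
    return max(0.0, 1.0 - abs(x))

def faber_coeff(f, k, s):
    # Hierarchical Faber coefficient: a second difference of f at the
    # dyadic midpoint x = (2s + 1) / 2^(k+1), with step h = 2^-(k+1).
    h = 2.0 ** (-(k + 1))
    x = (2 * s + 1) * h
    return f(x) - 0.5 * (f(x - h) + f(x + h))

def faber_partial_sum(f, n, x):
    # Truncated Faber series: the linear interpolant between f(0) and f(1)
    # plus hierarchical hat corrections up to dyadic level n.
    val = f(0.0) * (1.0 - x) + f(1.0) * x
    for k in range(n + 1):
        for s in range(2 ** k):
            val += faber_coeff(f, k, s) * tent(2.0 ** (k + 1) * x - (2 * s + 1))
    return val
```

The level-n partial sum interpolates f at all dyadic points of level n + 1; the high-dimensional constructions in the paper use tensor products of such terms, truncated to a sparse set of levels.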
Let us analyze some differences between the proofs in the present paper and those in the closely related paper Suzuki (2019), as well as in the other related papers (Gühring et al., 2020, Montanelli and Du, 2019, Yarotsky, 2017a); see also Remarks 3.2, 3.3, 4.3 and 4.4.
Firstly, to prove the results in Suzuki (2019), the author employed a discrete (quasi-)norm equivalence, in terms of the coefficient functionals of the B-spline quasi-interpolation representation, for the Besov space (Dũng, 2011a). But, as mentioned above, this does not allow one to estimate the dimension-dependent component of the approximation error. In the present paper, by using the representation of functions by Faber series, we obtain dimension-dependent bounds for the size and depth of a deep ReLU neural network required for the approximation of functions from the Hölder–Zygmund space. This is one difference between the proofs of Suzuki (2019) and of the present paper.
Secondly, in both papers the functions to be approximated have a certain anisotropic mixed smoothness, but the norm measuring the approximation error in Suzuki (2019) (and also in Montanelli & Du, 2019) is that of a Lebesgue space, while in our paper it is that of an isotropic Sobolev space. The anisotropic mixed smoothness and the difference between these norms together lead to different methods of constructing (quasi-)interpolation sparse-grid sampling approximations, and hence of deep ReLU neural network approximations (notice that these methods are similar if the functions to be approximated have an isotropic smoothness; Gühring et al., 2020, Yarotsky, 2017a). In particular, the authors of Montanelli and Du (2019) and Suzuki (2019) used classical Smolyak grids, while in this paper we use "notched" Smolyak grids. Therefore, the sparsity of the grid points for interpolation sampling in our paper is much higher than the sparsity of those in Montanelli and Du (2019) and Suzuki (2019).
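The sparsity gap can already be illustrated on the index sets alone. The following simplified sketch (our own illustration) uses the classical Smolyak index set, the multi-indices k with |k|_1 bounded by n, rather than the paper's "notched" variant:

```python
from itertools import product

def smolyak_indices(n, d):
    # Classical Smolyak index set: multi-indices k in {0, ..., n}^d
    # with |k|_1 = k_1 + ... + k_d <= n.
    return [k for k in product(range(n + 1), repeat=d) if sum(k) <= n]

n, d = 6, 4
sparse = len(smolyak_indices(n, d))  # binomial(n + d, d) indices
full = (n + 1) ** d                  # full tensor-product index set
print(sparse, full)
```

The classical Smolyak set already grows only polynomially in n for fixed d; the notched grids used in the paper thin this set out further, which is what drives the improved dimension-dependent bounds.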
The outline of this paper is as follows. In Section 2, we recall the necessary background on deep ReLU neural networks. Section 3 introduces the function spaces under consideration, presents a representation of continuous functions on the unit cube by Faber series, and proves error estimates for approximation by sparse-grid sampling recovery of functions in Hölder–Zygmund classes. In Section 4, based on the results in Section 3, we construct a deep ReLU neural network that approximates functions from the Hölder–Zygmund class in the isotropic Sobolev norm, and prove upper and lower estimates for the required size and depth. Some concluding remarks are presented in Section 5.
Notation. As usual, ℕ denotes the natural numbers, ℤ the integers and ℝ the real numbers. The letter d is always reserved for the underlying dimension, and [d] denotes the set of all natural numbers from 1 to d. Vectorial quantities are denoted by boldface letters, and x_j denotes the jth coordinate of the vector x. The symbol |A| stands for the cardinality of the finite set A. For vectors we use the standard ℓ_p-type norms and notations, with the usual modification when p = ∞; these notations are extended to matrices. For a function f, supp f denotes the support of f. The value of a function of one variable at a point of discontinuity is understood as the corresponding limit when the limit exists.
Deep ReLU neural networks
There is a wide variety of deep neural network architectures, each adapted to specific tasks. For the approximation of functions from Hölder–Zygmund spaces, in this section we introduce feed-forward deep ReLU neural networks with one-dimensional output. We are interested in standard deep neural networks in which only connections between neighboring layers are allowed. Let us introduce the necessary definitions and elementary facts on deep ReLU neural networks.
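As a purely illustrative sketch of such a network (the class name is ours, and counting the depth as the number of affine layers and the size as the number of nonzero weights and biases follows a common convention; the paper's exact definitions appear in Definition 2.1):

```python
import numpy as np

class ReLUNet:
    """Feed-forward network with connections only between neighboring layers."""

    def __init__(self, weights, biases):
        self.weights = [np.asarray(W, dtype=float) for W in weights]
        self.biases = [np.asarray(b, dtype=float) for b in biases]

    @property
    def depth(self):
        # Number of affine (weight) layers.
        return len(self.weights)

    @property
    def size(self):
        # Number of nonzero weights and biases -- a common measure of
        # the computation complexity of the network.
        return (sum(int(np.count_nonzero(W)) for W in self.weights)
                + sum(int(np.count_nonzero(b)) for b in self.biases))

    def __call__(self, x):
        # ReLU after every layer except the last (linear output).
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            x = np.maximum(W @ x + b, 0.0)
        return self.weights[-1] @ x + self.biases[-1]
```

For example, the two-unit network realizing the hat function min(x, 1 - x) on [0, 1] has depth 2 and size 5 under these conventions.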
Definition 2.1 Let and . A deep neural
Faber series and high-dimensional sparse-grid sampling recovery
In this section we introduce the space of functions having Hölder–Zygmund mixed smoothness and the isotropic Sobolev space, and recall a representation of continuous functions on the unit cube by tensor-product Faber series. This representation plays a fundamental role in the construction of sparse-grid sampling recovery methods and of deep neural networks for the approximation, in the Sobolev norm, of functions from the Hölder–Zygmund space. We explicitly construct linear sampling methods on sparse grids and …
Approximation by deep ReLU neural networks
In this section, we apply the results on sparse-grid sampling recovery from the previous section to the approximation by deep ReLU neural networks of functions from the Hölder–Zygmund space. For every such function f and every accuracy ε, we explicitly construct a deep ReLU neural network having an architecture independent of f, whose output approximates f in the norm of the isotropic Sobolev space with accuracy ε, and give dimension-dependent upper bounds for the size and the depth of this network. We …
Concluding remarks
We have explicitly constructed a deep ReLU neural network having an output that approximates, with an arbitrary prescribed accuracy and in the norm of an isotropic Sobolev space, functions having Hölder–Zygmund mixed smoothness. For this approximation, we have established dimension-dependent estimates for the computation complexity, characterized by the size and the depth of this deep ReLU neural network.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant number 102.01-2020.03. A part of this work was done when the authors were working at the Vietnam Institute for Advanced Study in Mathematics (VIASM). They would like to thank the VIASM for providing a fruitful research environment and working conditions.
References (54)
- Sampling on energy-norm based sparse grids for the optimal recovery of Sobolev type functions in H^γ. Journal of Approximation Theory (2016).
- New explicit-in-dimension estimates for the cardinality of high-dimensional hyperbolic crosses and approximation of functions having mixed smoothness. Journal of Complexity (2016).
- B-spline quasi-interpolant representations and sampling recovery of functions with mixed smoothness. Journal of Complexity (2011).
- Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Networks (2018).
- Error bounds for approximations with deep ReLU networks. Neural Networks (2017).
- Approximation of smoothness classes by deep ReLU networks (2020).
- Neural network learning: Theoretical foundations (2009).
- Understanding deep neural networks with rectified linear units. Electronic Colloquium on Computational Complexity, Report No. 98 (2017).
- Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. Journal of Machine Learning Research (2019).
- Dynamic programming (1957).
- Sparse grids. Acta Numerica.
- The finite element method for elliptic problems.
- Optimal adaptive sampling recovery. Advances in Computational Mathematics.
- Sampling and cubature on sparse grids based on a B-spline quasi-interpolation. Foundations of Computational Mathematics.
- Computation complexity of deep ReLU neural networks in high-dimensional approximation.
- Dimension-dependent error estimates for sampling recovery on Smolyak grids based on B-spline quasi-interpolation. Journal of Approximation Theory.
- N-widths and ε-dimensions for high-dimensional approximations. Foundations of Computational Mathematics.
- Nonlinear approximation and (deep) ReLU networks. Constructive Approximation.
- Exponential convergence of the deep neural network approximation for analytic functions. Science China Mathematics.
- Data mining with sparse grids. Computing.
- Approximation spaces of deep neural networks.
- Optimized general sparse grid approximation spaces for operator equations. Mathematics of Computation.
- Deep neural network approximation theory.
- Error bounds for approximations with deep ReLU neural networks in W^{s,p} norms. Analysis and Applications (Singapore).