Simultaneous approximation of a smooth function and its derivatives by deep neural networks with piecewise-polynomial activations
Introduction
Neural networks have recently gained much attention due to their impressive performance in many complicated practical tasks, including image processing (LeCun, Bengio, & Hinton, 2015), generative modeling (Goodfellow et al., 2014), reinforcement learning (Mnih et al., 2015), numerical solution of PDEs (e.g., Geist et al., 2021, Han et al., 2018), and optimal control (Chen et al., 2019, Onken et al., 2022). This makes them extremely useful in the design of self-driving vehicles (Li, Ota, & Dong, 2018) and robot control systems (e.g., Bozek et al., 2020, Cembrano et al., 1994, González-Álvarez et al., 2022). One of the reasons for this success is their expressiveness, that is, the ability to approximate functions with any desired accuracy. The question of the expressiveness of neural networks has a long history and goes back to the papers of Cybenko (1989), Funahashi (1989) and Hornik (1991). In particular, Cybenko (1989) showed that one hidden layer is enough to approximate any continuous function with any prescribed accuracy. However, further analysis revealed that deep neural networks may require far fewer parameters than shallow ones to achieve the same accuracy. Much effort has been devoted in recent years to understanding the expressive power of deep neural networks. In a pioneering work, Yarotsky (2017) showed that for any target function $f$ from the Sobolev space $\mathcal{W}^{n,\infty}([0,1]^d)$ there is a neural network with $\mathcal{O}(\varepsilon^{-d/n}\log(1/\varepsilon))$ parameters and the ReLU activation function that approximates $f$ within the accuracy $\varepsilon$ with respect to the $L^\infty$-norm on the unit cube $[0,1]^d$. Further works in this direction considered various smoothness classes of target functions (Gühring and Raslan, 2021, Li et al., 2020a, Lu et al., 2021, Shen et al., 2021), neural networks with diverse activations (De Ryck et al., 2021, Gühring and Raslan, 2021, Jiao et al., 2021, Langer, 2021), domains of more complicated shape (Shen, Yang, & Zhang, 2020), and approximation errors measured with respect to different norms (De Ryck et al., 2021, Gühring and Raslan, 2021, Schmidt-Hieber, 2020, Yarotsky, 2017). Several authors also considered the expressiveness of neural networks with different architectures, including wide neural networks of logarithmic (Gühring and Raslan, 2021, Schmidt-Hieber, 2020, Yarotsky, 2017) or even constant depth (De Ryck et al., 2021, Li et al., 2020a, Li et al., 2020b, Shen et al., 2021), and deep and narrow networks (Hanin, 2019, Kidger and Lyons, 2020, Park et al., 2021). Most of the existing results on the expressiveness of neural networks measure the quality of approximation with respect to either the $L^\infty$- or the $L^p$-norm, $1 \le p < \infty$. Far fewer papers study the approximation of derivatives of smooth functions; these rare exceptions include Gühring et al. (2020), Gühring and Raslan (2021) and De Ryck et al. (2021).
In the present paper, we focus on feed-forward neural networks with piecewise-polynomial activation functions; in particular, we work with the rectified quadratic unit (ReQU) activation $\sigma(x) = (\max\{0, x\})^2$. Neural networks with such activations are known to successfully approximate smooth functions from the Sobolev and Besov spaces with respect to the $L^p$- and $L^\infty$-norms (see, for instance, Abdeljawad and Grohs, 2022, Ali and Nouy, 2021, Chen et al., 2022, Gribonval et al., 2022, Klusowski and Barron, 2018, Li et al., 2020a, Li et al., 2020b, Siegel and Xu, 2022). We continue this line of research and study the ability of such neural networks to approximate not only smooth functions themselves but also their derivatives. We derive non-asymptotic upper bounds on the Hölder-norm distance between the target function and its approximation from a class of sparsely connected neural networks with ReQU activations. In particular, we show that, for any $f$ from a Hölder ball $\mathcal{H}^\beta([0,1]^d, H)$ (see Section 2 for the definition), there exists a neural network with ReQU activation functions that uniformly approximates the target function $f$ in the norms of the Hölder spaces $\mathcal{H}^\ell$ for all $0 \le \ell \le \lfloor\beta\rfloor$. Here and further in the paper, $\lfloor\beta\rfloor$ stands for the largest integer which is strictly smaller than $\beta$. A simplified statement of our main result is given below.
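For orientation, the Hölder norm we have in mind is the usual one; the formal definition appears in Section 2 and may differ from the display below in normalization, so the following is a standard convention rather than a quotation of the paper's definition:

```latex
% Standard Hölder norm of order beta on the unit cube (a common convention;
% Section 2 of the paper fixes the exact normalization used there).
\[
  \|f\|_{\mathcal{H}^{\beta}([0,1]^d)}
  = \max_{|\alpha| \le \lfloor \beta \rfloor}\, \|\partial^{\alpha} f\|_{L^{\infty}([0,1]^d)}
  + \max_{|\alpha| = \lfloor \beta \rfloor}\, \sup_{x \neq y}
      \frac{|\partial^{\alpha} f(x) - \partial^{\alpha} f(y)|}{\|x - y\|^{\beta - \lfloor \beta \rfloor}},
\]
```

where, as above, $\lfloor\beta\rfloor$ denotes the largest integer strictly smaller than $\beta$, so that the last term is a Lipschitz-type seminorm when $\beta$ is an integer.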
Theorem 1 (simplified version of Theorem 2). Fix and . Then, for any , for any , and any integer , there exists a neural network with ReQU activation functions such that it has layers, at most neurons in each layer, and non-zero weights taking their values in . Moreover, it holds that where is a universal constant.
We provide explicit expressions for the hidden constants in Theorem 2. The main contributions of our work can be summarized as follows.
- Given a smooth target function $f$, we construct a neural network that simultaneously approximates all the derivatives of $f$ up to order $\lfloor\beta\rfloor$ with optimal dependence of the precision on the number of non-zero weights. That is, if we denote the number of non-zero weights in the network by $W$, then the approximation error in the Hölder norm of each admissible order decays at the optimal rate in $W$, simultaneously for all such orders (see the display after this list).
- The constructed neural network has almost the same smoothness as the target function itself while approximating it with the optimal accuracy. This property turns out to be very useful in many applications, including the approximation of PDEs and density transformations, where we need to use derivatives of the approximation.
- The weights of the approximating neural network are bounded in absolute value by a constant that does not depend on the prescribed accuracy. The latter property plays a crucial role in deriving bounds on the generalization error of empirical risk minimizers in terms of the covering number of the corresponding parametric class of neural networks. Note that the upper bounds on the weights provided in De Ryck et al., 2021, Gühring et al., 2020, Gühring and Raslan, 2021 blow up as the approximation error decreases.
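The rate referred to in the first item is not reproduced verbatim here; the display below is a reconstruction in the standard form suggested by classical approximation theory for Hölder balls (constants and possible logarithmic factors omitted), with $W$ denoting the number of non-zero weights of the network:

```latex
% Hedged reconstruction of the simultaneous approximation rate: the classical
% optimal dependence on the number W of non-zero weights for a target f in a
% Hölder ball of smoothness beta on [0,1]^d, measured in the Hölder norm of
% order ell (not a verbatim quotation of Theorem 2).
\[
  \bigl\| f - \widehat{f}_W \bigr\|_{\mathcal{H}^{\ell}([0,1]^d)}
  \;\lesssim\; W^{-(\beta-\ell)/d},
  \qquad 0 \le \ell \le \lfloor \beta \rfloor .
\]
```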
The rest of the paper is organized as follows. In Section 2, we introduce the necessary definitions and notation. Section 3 contains the statement of our main result, Theorem 2, followed by a detailed comparison with the existing literature. We then present numerical experiments in Section 4. The proofs are collected in Section 5. Some auxiliary facts are deferred to the Appendix.
Section snippets
Norms.
For a matrix $A$ and a vector $v$, we denote by $\|A\|_\infty$ and $\|v\|_\infty$ the maximal absolute value of the entries of $A$ and $v$, respectively. $\|A\|_0$ and $\|v\|_0$ shall stand for the number of non-zero entries of $A$ and $v$, respectively. Finally, the Frobenius norm and the operator norm of $A$ are denoted by $\|A\|_F$ and $\|A\|$, respectively, and the Euclidean norm of $v$ is denoted by $\|v\|$. For a function , we set If the domain is clear from the context, we simply write or
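As a concrete reference for this notation, the short sketch below (Python with NumPy; the example matrix and vector are ours, not the paper's) computes each of the quantities just listed:

```python
import numpy as np

A = np.array([[1.0, -3.0, 0.0],
              [0.5,  2.0, 0.0]])
v = np.array([2.0, 0.0, -1.0])

max_abs_A = np.max(np.abs(A))          # maximal absolute value of the entries of A
max_abs_v = np.max(np.abs(v))          # maximal absolute value of the entries of v
nnz_A = np.count_nonzero(A)            # number of non-zero entries of A
nnz_v = np.count_nonzero(v)            # number of non-zero entries of v
frob_A = np.linalg.norm(A, ord="fro")  # Frobenius norm of A
op_A = np.linalg.norm(A, ord=2)        # operator (spectral) norm of A
eucl_v = np.linalg.norm(v)             # Euclidean norm of v

print(max_abs_A, max_abs_v, nnz_A, nnz_v, frob_A, op_A, eucl_v)
```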
Approximation of functions from Hölder classes
Our main result states that any function from a Hölder ball $\mathcal{H}^\beta([0,1]^d, H)$ can be approximated by a feed-forward deep neural network with ReQU activation functions in the Hölder norms $\mathcal{H}^\ell$, $0 \le \ell \le \lfloor\beta\rfloor$.
Theorem 2. Let and let . Then, for any , for any , and any integer , there exists a neural network of width with hidden layers and at most non-zero weights taking their values
Numerical experiments
In this section, we provide numerical experiments illustrating the approximation properties of neural networks with ReQU activations. We considered a scalar function of two variables and approximated it on the unit square via neural networks with two types of activations: ReLU and ReQU. All the neural networks were fully connected, and all their hidden layers had width . The first layer had width . The depth of the neural networks took values in . In
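Since the section is only excerpted above, the concrete widths, depths and training details are not reproduced here. The sketch below is our own minimal reconstruction of this type of comparison (PyTorch; the target function, network sizes, optimizer and sample sizes are illustrative assumptions, not the authors' settings): it fits a smooth function on the unit square with ReLU and ReQU networks and reports the sup-norm errors of the function and of its gradient on random test points.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Smooth target on the unit square and its gradient (illustrative choice).
def f(x):                               # x has shape (n, 2)
    return torch.sin(torch.pi * x[:, 0]) * torch.exp(x[:, 1])

def grad_f(x):
    g1 = torch.pi * torch.cos(torch.pi * x[:, 0]) * torch.exp(x[:, 1])
    g2 = torch.sin(torch.pi * x[:, 0]) * torch.exp(x[:, 1])
    return torch.stack([g1, g2], dim=1)

class ReQU(nn.Module):                  # sigma(x) = max(0, x)^2
    def forward(self, x):
        return torch.relu(x) ** 2

def mlp(activation, width=32, depth=4):
    layers, d_in = [], 2
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), activation()]
        d_in = width
    layers.append(nn.Linear(d_in, 1))
    return nn.Sequential(*layers)

def train(net, steps=3000):
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.rand(256, 2)          # uniform samples from [0, 1]^2
        loss = ((net(x).squeeze(-1) - f(x)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net

def errors(net):
    x = torch.rand(4096, 2).requires_grad_(True)
    out = net(x).squeeze(-1)
    grad = torch.autograd.grad(out.sum(), x)[0]
    e0 = (out - f(x)).abs().max().item()        # sup-norm error of the function
    e1 = (grad - grad_f(x)).abs().max().item()  # sup-norm error of the gradient
    return e0, e1

for name, act in [("ReLU", nn.ReLU), ("ReQU", ReQU)]:
    net = train(mlp(act))
    print(name, errors(net))
```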
Proof of Theorem 2
Step 1. Let . Consider a vector , such that By Theorem 3, there exist tensor-product splines of order associated with knots at such that Our goal is to show that can be represented by a neural network with ReQU
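The snippet breaks off before the construction, but the algebraic facts that make a ReQU representation of tensor-product splines possible are standard and worth recording here (these are well-known identities for the ReQU unit, stated for context rather than quoted from the proof): a single neuron realizes a truncated quadratic power, and squares and products of intermediate outputs are exact finite combinations of ReQU units.

```latex
% Standard ReQU identities (context, not a quotation of the proof):
% a single neuron gives a truncated quadratic power, and squares and
% products are exact finite combinations of ReQU units.
\[
  \sigma(x) = (\max\{0,x\})^2, \qquad \sigma(x - t) = (x - t)_+^2,
\]
\[
  x^2 = \sigma(x) + \sigma(-x), \qquad
  xy = \tfrac14\bigl[\sigma(x+y) + \sigma(-x-y) - \sigma(x-y) - \sigma(-x+y)\bigr].
\]
```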
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The publication was supported by the grant for research centers in the field of AI provided by the Analytical Center for the Government of the Russian Federation (ACRF) in accordance with the agreement on the provision of subsidies (identifier of the agreement 000000D730321P5Q0002) and the agreement with HSE University, Russia No. 70-2021-00139. Nikita Puchkin is a Young Russian Mathematics award winner and would like to thank its sponsors and jury. Denis Belomestny acknowledges the financial
References (41)
- Cembrano et al. (1994). Neural networks for robot control. Annual Review in Automatic Programming.
- Chen et al. (2022). Power series expansion neural network. Journal of Computer Science.
- De Ryck et al. (2021). On the approximation of functions by tanh neural networks. Neural Networks.
- Funahashi (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks.
- Gühring and Raslan (2021). Approximation rates for neural networks with encodable weights in smoothness spaces. Neural Networks.
- Hornik (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks.
- Shen et al. (2021). Neural network approximation: Three hidden layers are enough. Neural Networks.
- Siegel and Xu (2022). High-order approximation rates for shallow neural networks with cosine and ReLU^k activation functions. Applied and Computational Harmonic Analysis.
- Simultaneous $L^p$-approximation order for neural networks (2005). Neural Networks.
- Yarotsky (2017). Error bounds for approximations with deep ReLU networks. Neural Networks.
- Abdeljawad and Grohs (2022). Approximations with deep neural networks in Sobolev time-space. Analysis and Applications.
- Ali and Nouy (2021). Approximation of smoothness classes by deep rectifier networks. SIAM Journal on Numerical Analysis.
- Bozek et al. (2020). Neural network control of a wheeled mobile robot based on optimal trajectories. International Journal of Advanced Robotic Systems.
- Fast computation of Fourier integral operators. SIAM Journal on Scientific Computing.
- A fast butterfly algorithm for the computation of Fourier integral operators. Multiscale Modeling & Simulation. A SIAM Interdisciplinary Journal.
- Chen et al. (2019). Optimal control via neural networks: A convex approach.
- Cybenko (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems.
- Geist et al. (2021). Numerical solution of the parametric diffusion equation by deep neural networks. Journal of Scientific Computing.
- Goodfellow et al. (2014). Generative adversarial nets.
Cited by (10)
- Theoretical guarantees for neural control variates in MCMC. Mathematics and Computers in Simulation (2024).
- Neural networks with ReLU powers need less depth. Neural Networks (2024).
- A prediction and behavioural analysis of machine learning methods for modelling travel mode choice. Transportation Research Part C: Emerging Technologies (2023).
- A Wasserstein perspective of Vanilla GANs. arXiv (2024).
- Trajectory Correction of the Rocket for Aerodynamic Load Shedding Based on Deep Neural Network and the Chaotic Evolution Strategy with Covariance Matrix Adaptation. IEEE Transactions on Aerospace and Electronic Systems (2024).