
Kernel density estimation by stagewise algorithm with a simple dictionary

  • Original Paper
  • Published in: Computational Statistics

Abstract

This study proposes multivariate kernel density estimation by a stagewise minimization algorithm based on U-divergence and a simple dictionary. The dictionary consists of an appropriate scalar bandwidth matrix and a part of the original data. The resulting estimator yields data-adaptive weighting parameters and bandwidth matrices, and provides a sparse representation of the kernel density estimate. We develop a non-asymptotic error bound for the estimator obtained via the proposed stagewise minimization algorithm. Simulation studies confirm that the proposed estimator performs as well as, or sometimes better than, other well-known density estimators.
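To make the procedure concrete, the following is a minimal sketch of a stagewise density estimator of this type, assuming Gaussian dictionary elements ("words") centered at a random subset of the data, the KL divergence (U(t) = exp(t), so that the link is the logarithm), a simple 2/t step-size schedule, and grid-based numerical integration. The function names and all tuning choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def log_gauss(x, center, h):
    """Log-density of N_d(center, h^2 I) evaluated at the rows of x."""
    d = center.shape[0]
    sq = np.sum((x - center) ** 2, axis=-1)
    return -0.5 * sq / h ** 2 - 0.5 * d * np.log(2.0 * np.pi * h ** 2)

def stagewise_kde_kl(X, bandwidths, grid, cell_vol, n_centers=30, n_stages=50, seed=0):
    """Illustrative stagewise minimization of the empirical KL (U = exp) loss.

    The dictionary consists of Gaussian kernels with the scalar bandwidths in
    `bandwidths`, centered at a random subset of the data.  The estimate keeps
    the form u(sum_j lambda_j xi(phi_j)) with u = exp and xi = log, i.e. a
    weighted geometric mixture; its integral is approximated on `grid`.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, size=min(n_centers, n), replace=False)]
    words = [(c, h) for c in centers for h in bandwidths]
    J = len(words)

    # Pre-compute xi(phi_j) = log phi_j on the data and on the integration grid.
    L_data = np.column_stack([log_gauss(X, c, h) for c, h in words])
    L_grid = np.column_stack([log_gauss(grid, c, h) for c, h in words])

    def empirical_loss(lam):
        """Empirical U-loss: int exp(xi(g)) dx - (1/n) sum_i xi(g(X_i))."""
        return np.sum(np.exp(L_grid @ lam)) * cell_vol - np.mean(L_data @ lam)

    unit = np.eye(J)
    lam = unit[np.argmin([empirical_loss(unit[j]) for j in range(J)])].copy()

    for t in range(2, n_stages + 2):
        pi = 2.0 / t  # simple step-size schedule (our assumption, not the paper's)
        candidates = [(1.0 - pi) * lam + pi * unit[j] for j in range(J)]
        lam = min(candidates, key=empirical_loss)  # greedy stagewise update

    v = np.sum(np.exp(L_grid @ lam)) * cell_vol    # normalizing constant

    def density(x_new):
        L_new = np.column_stack([log_gauss(x_new, c, h) for c, h in words])
        return np.exp(L_new @ lam) / v

    return density, lam, words
```

Because the normalizing integral is approximated on a grid, this sketch is only practical in one or two dimensions; the final division by the constant v plays the role of the normalization discussed in Appendix B.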


References

  • Basu A, Harris IR, Hjort NL, Jones MC (1998) Robust and efficient estimation by minimising a density power divergence. Biometrika 85:549–559


  • Bregman LM (1967) The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput Math Math Phys 7(3):200–217


  • Dukkipati A (2010) On Kolmogorov-Nagumo averages and nonextensive entropy. In: 2010 International Symposium on Information Theory and its Applications (ISITA)

  • Duong T, Hazelton ML (2003) Plug-in bandwidth matrices for bivariate kernel density estimation. J Nonparametric Stat 15(1):17–30


  • Friedman JH, Stuetzle W, Schroeder A (1984) Projection pursuit density estimation. J Am Stat Assoc 79(387):599–608


  • Girolami M, He C (2003) Probability density estimation from optimally condensed data samples. IEEE Trans Pattern Anal Mach Intell 25:1253–1264


  • Klemelä J (2007) Density estimation with stagewise optimization of the empirical risk. Mach Learn 67:35–47


  • Murata N, Takenouchi T, Kanamori T, Eguchi S (2004) Information geometry of U-boost and Bregman divergence. Neural Comput 16:1437–1481


  • Minami M, Eguchi S (2002) Robust blind source separation by beta-divergence. Neural Comput 14:1859–1886


  • Naito K, Eguchi S (2013) Density estimation with minimization of \(U\)-divergence. Mach Learn 90(1):29–57


  • Nash JW, Sellers LT, Talbot RS, Cawthorn JA, Ford BW (1994) The population biology of abalone (Haliotis species) in Tasmania. I. Blacklip abalone (H. rubra) from the north coast and islands of Bass Strait. Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288)

  • Ridgeway G (2002) Looking for lumps: boosting and bagging for density estimation. Comput Stat Data Anal 38(1):379–392


  • Scott D (2015) Multivariate density estimation: theory, practice, and visualization, 2nd edn. Wiley


  • Schott JR (2017) Matrix analysis for statistics, 3rd edn. Wiley


  • Wand MP, Jones MC (1993) Comparison of smoothing parametrizations in bivariate kernel density estimation. J Am Stat Assoc 88:520–528


  • Zhang C, Jiang Y, Chai Y (2010) Penalized Bregman divergence for large-dimensional regression and classification. Biometrika 97:551–566


  • Zhang C, Jiang Y, Shang Z (2009) New aspects of Bregman divergence in regression and classification with parametric and nonparametric estimation. Can J Stat 37:119–139



Acknowledgements

Kiheiji NISHIDA is grateful for the financial support from Hyogo Medical University Grant for Research Promotion, 2022. Kanta NAITO gratefully acknowledges the financial support from KAKENHI 19K11851. We sincerely thank the editor and two anonymous reviewers for taking the time to review our manuscript and providing many constructive comments.

Author information


Corresponding author

Correspondence to Kiheiji Nishida.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

To prove Theorem 1, we need the following Lemmas 1–5. Let \({\tilde{f}}({\textbf{x}})\) be any density estimator and let

$$\begin{aligned} f^{*}({\textbf{x}}|{\textbf{X}}^{*}) = u\left( \sum _{j=1}^{N}p_{j}\xi (\phi _{j}({\textbf{x}}|{\textbf{X}}^{*})) \right) , \end{aligned}$$
(19)

where \(p_{j} \ge 0\), \(\sum _{j=1}^{N} p_{j} = 1\), and \(\phi _{j}({\textbf{x}}|{\textbf{X}}^{*}) \in D\). We consider two functions of the variable \(\pi \in [0, 1]\) defined as follows:

$$\begin{aligned}
\theta (\pi | {\tilde{f}}({\textbf{x}}), f^{*}({\textbf{x}}|{\textbf{X}}^{*}))
&= \sum _{j=1}^{N} p_{j} \int _{{\mathbb {R}}^{d}}[ U((1-\pi )\xi ({\tilde{f}}({\textbf{x}})) + \pi \xi (\phi _{j}({\textbf{x}}|{\textbf{X}}^{*}))) \\
&\qquad - (1-\pi )U(\xi ({\tilde{f}}({\textbf{x}}))) - \pi U(\xi (f^{*}({\textbf{x}}|{\textbf{X}}^{*}))) ]\,d{\textbf{x}}, \\
\eta (\pi | {\tilde{f}}({\textbf{x}}), f^{*}({\textbf{x}}|{\textbf{X}}^{*}))
&= \sum _{j=1}^{N} p_{j} \int _{{\mathbb {R}}^{d}}[ U((1-\pi )\xi ({\tilde{f}}({\textbf{x}})) + \pi \xi (\phi _{j}({\textbf{x}}|{\textbf{X}}^{*}))) \\
&\qquad - U((1-\pi ) \xi ({\tilde{f}}({\textbf{x}})) + \pi \xi (f^{*}({\textbf{x}}|{\textbf{X}}^{*}))) ]\,d{\textbf{x}}.
\end{aligned}$$

Then, Lemmas 1–3 are obtained as follows:

Lemma 1

Let \({\tilde{f}}({\textbf{x}})\) be a density estimator of \(f({\textbf{x}})\) and let \(f^{*}({\textbf{x}}|{\textbf{X}}^{*})\) be the estimator given in (19). For \(0 \le \pi \le 1\), it then follows that

$$\begin{aligned}
&\sum _{j=1}^{N}p_{j} {\widehat{L}}_{U}(u((1-\pi )\xi ({\tilde{f}}(\cdot ))+\pi \xi (\phi _{j}(\cdot |{\textbf{X}}^{*})))) - {\widehat{L}}_{U}(f^{*}(\cdot |{\textbf{X}}^{*}))\\
&\quad = (1-\pi )[{\widehat{L}}_{U}({\tilde{f}}(\cdot )) - {\widehat{L}}_{U}(f^{*}(\cdot |{\textbf{X}}^{*}))] + \theta (\pi |{\tilde{f}}({\textbf{x}}), f^{*}({\textbf{x}}|{\textbf{X}}^{*})).
\end{aligned}$$

Lemma 2

Let \({\tilde{f}}({\textbf{x}})\) be a density estimator of \(f({\textbf{x}})\) and let \(f^{*}({\textbf{x}}|{\textbf{X}}^{*})\) be the estimator given in (19). Then for any \(\pi \in [0, 1]\),

$$\begin{aligned} \theta (\pi |{\tilde{f}}({\textbf{x}}), f^{*}({\textbf{x}}|{\textbf{X}}^{*})) \le \eta (\pi |{\tilde{f}}({\textbf{x}}), f^{*}({\textbf{x}}|{\textbf{X}}^{*})). \end{aligned}$$

Lemma 3

Let \(f^{*}({\textbf{x}}|{\textbf{X}}^{*})\) be as given in (19) and let

$$\begin{aligned} {\tilde{f}}({\textbf{x}}|{\textbf{X}}^{*}) = u \left( \sum _{l=1}^{k} q_{l} \xi ({\tilde{\phi }}_{l}({\textbf{x}}|{\textbf{X}}^{*})) \right) \end{aligned}$$

for some \(\sum _{l=1}^{k} q_{l} \xi ({\tilde{\phi }}_{l}(\cdot |{\textbf{X}}^{*})) \in co(\xi (D))\). Under Assumption 1, it follows that

$$\begin{aligned} \eta (\pi ) = \eta (\pi |{\tilde{f}}({\textbf{x}}|{\textbf{X}}^{*}), f^{*}({\textbf{x}}|{\textbf{X}}^{*})) \le \pi ^{2} B_{U}({\textbf{X}}^{*})^{2} \end{aligned}$$

for any \(\pi \in [0,1]\).

In Lemmas 4-5, let \(W = \{ (\lambda _{\phi })_{\phi \in D} | \lambda _{\phi } \ge 0, \sum _{\phi \in D}\lambda _{\phi }=1, \#\{ \lambda _{\phi } > 0 \} < \infty \}\) and we define the following convex combination:

$$\begin{aligned} f({\textbf{x}}, \Lambda ) = f({\textbf{x}}, \Lambda , D) = u \Bigl ( \sum _{\phi \in D} \lambda _{\phi }\xi (\phi ({\textbf{x}}|{\textbf{X}}^{*})) \Bigr ), {\textbf{x}} \in {\mathbb {R}}^{d}, \end{aligned}$$

where \(\Lambda = (\lambda _{\phi })_{\phi \in D} \in W\). Then, we obtain Lemmas 4-5.

Lemma 4

For stagewise minimization density estimator \({\widehat{f}}({\textbf{x}}|{\textbf{X}}^{*})\), it holds under Assumption 1 that

$$\begin{aligned} {\widehat{L}}_{U}({\widehat{f}}(\cdot |{\textbf{X}}^{*})) \le \inf _{\Lambda \in W} {\widehat{L}}_{U}({f}(\cdot , \Lambda )) + \frac{\theta ^2 B_{U}({\textbf{X}}^{*})^{2}}{M + (\theta -1)} + \delta . \end{aligned}$$

Lemma 5

For \(\tau > 0\), let \({\widehat{f}}(\cdot |{\textbf{X}}^{*}) \in \{ f(\cdot , \Lambda ) \,|\, \Lambda \in W\}\) be such that

$$\begin{aligned} {\widehat{L}}_{U}({\widehat{f}}(\cdot |{\textbf{X}}^{*})) \le \inf _{\Lambda \in W} {\widehat{L}}_{U}({f}(\cdot , \Lambda )) + \tau . \end{aligned}$$

Then,

$$\begin{aligned} D_{U}(f, {\widehat{f}}(\cdot |{\textbf{X}}^{*})) \le \inf _{\Lambda \in W} {\widehat{L}}_{U}({f}(\cdot , \Lambda )) + \tau + 2 \sup _{\phi \in D} |\nu _{n}(\xi (\phi (\cdot |{\textbf{X}}^{*})))|, \end{aligned}$$

where

$$\begin{aligned} \nu _{n}(\xi (\phi (\cdot |{\textbf{X}}^{*}))) = \frac{1}{n} \sum _{i=1}^{n} \xi (\phi ({\textbf{X}}_{i}|{\textbf{X}}^{*})) - \int _{{\mathbb {R}}^{d}} \xi (\phi ({\textbf{x}}|{\textbf{X}}^{*}))f({\textbf{x}})\,d{\textbf{x}}. \end{aligned}$$

If we replace the dictionary in the proofs of Klemelä (2007) and Naito and Eguchi (2013) with the one in (9), we obtain Lemmas 1–5. Using Lemmas 4–5 in conjunction with Lemmas 1–3, we obtain the non-asymptotic error bound given by Theorem 1.

Appendix B

We obtain Theorem 2 using Lemma 6.

Lemma 6

For any \(h({\textbf{x}}|{\textbf{X}}^{*}) \in co(\xi (D))\), let \(g({\textbf{x}}|{\textbf{X}}^{*}) = u(h({\textbf{x}}|{\textbf{X}}^{*}))\) and let \(g_{c}({\textbf{x}}|{\textbf{X}}^{*}) = v_{g}^{-1}g({\textbf{x}}|{\textbf{X}}^{*})\), where

$$\begin{aligned} v_{g} = v_{g}({\textbf{X}}^{*}) = \int _{{\mathbb {R}}^{d}}g({\textbf{x}}|{\textbf{X}}^{*})d{\textbf{x}}. \end{aligned}$$

Under Assumption 2, we have

$$\begin{aligned}
&D_{U}(f(\cdot ), g_{c}(\cdot |{\textbf{X}}^{*}))\\
&\quad \le D_{U}(f(\cdot ), g(\cdot |{\textbf{X}}^{*})) + C_{U}^{-1} \bigl |1-v_{g}({\textbf{X}}^{*})^{-1} \bigr | \int _{{\mathbb {R}}^{d}} \bigl |g_{c}({\textbf{x}}|{\textbf{X}}^{*}) -f({\textbf{x}})\bigr | g({\textbf{x}}|{\textbf{X}}^{*})^{1-\alpha } \,d{\textbf{x}}.
\end{aligned}$$

If we replace the dictionary in the proofs of Klemelä (2007) and Naito and Eguchi (2013) with the one in (9), we obtain Lemma 6.

Appendix C

Using the convexity of \(U(t) = \exp (t)\), we can verify for the KL divergence that

$$\begin{aligned}
\Psi (\delta , \Phi |{\textbf{X}}^{*})
&= \int _{{\mathbb {R}}^{d}} \exp \left( (1-\delta ) \sum _{m=1}^{T}q_{m} \log {\tilde{\phi }}_{m}({\textbf{x}}|{\textbf{X}}^{*}) + \delta \log \phi ({\textbf{x}}|{\textbf{X}}^{*}) \right) \{ \log \phi ({\textbf{x}}|{\textbf{X}}^{*}) - \log {\bar{\phi }}({\textbf{x}}|{\textbf{X}}^{*}) \}^{2} \,d{\textbf{x}} \\
&\le \int _{{\mathbb {R}}^{d}} \left( (1-\delta ) \sum _{m=1}^{T}q_{m} {\tilde{\phi }}_{m}({\textbf{x}}|{\textbf{X}}^{*}) + \delta \phi ({\textbf{x}}|{\textbf{X}}^{*}) \right) \{ \log \phi ({\textbf{x}}|{\textbf{X}}^{*}) - \log {\bar{\phi }}({\textbf{x}}|{\textbf{X}}^{*}) \}^{2} \,d{\textbf{x}} \\
&\le (1-\delta ) \sum _{m=1}^{T}q_{m} \int _{{\mathbb {R}}^{d}} {\tilde{\phi }}_{m}({\textbf{x}}|{\textbf{X}}^{*}) \{ \log \phi ({\textbf{x}}|{\textbf{X}}^{*}) - \log {\bar{\phi }}({\textbf{x}}|{\textbf{X}}^{*}) \}^{2} \,d{\textbf{x}} \\
&\qquad + \delta \int _{{\mathbb {R}}^{d}} \phi ({\textbf{x}}|{\textbf{X}}^{*}) \{ \log \phi ({\textbf{x}}|{\textbf{X}}^{*}) - \log {\bar{\phi }}({\textbf{x}}|{\textbf{X}}^{*}) \}^{2} \,d{\textbf{x}} \\
&\le (1-\delta )\sum _{m=1}^{T}q_{m}B_{KL}({\textbf{X}}^{*})^{2} + \delta B_{KL}({\textbf{X}}^{*})^{2} \\
&= B_{KL}({\textbf{X}}^{*})^{2}.
\end{aligned}$$

We now evaluate the constant \(B_{KL}({\textbf{X}}^{*})^{2}\). Let \(h_{a}\), \(h_{b}\), and \(h_{c}\) be three different scalar bandwidths, which by assumption are not random variables, and let \({\textbf{X}}_{i}^{*}\), \({\textbf{X}}_{j}^{*}\), and \({\textbf{X}}_{k}^{*}\) be the means (centers) of the corresponding dictionary words. In what follows, we denote the density of the d-dimensional multivariate normal distribution \(N_{d}({\textbf{X}}_{i}^{*}, h_{a}^{2}{\textbf{I}})\) by \(\phi _{a}({\textbf{x}}|{\textbf{X}}_{i}^{*})\), and the density of the d-dimensional standard normal distribution by \(\phi (\cdot )\). For notational convenience, we write \(h_{ab} \equiv h_{a}/h_{b}\) and \(h_{ac} \equiv h_{a}/h_{c}\). Then, we obtain

$$\begin{aligned}
J &= \int _{{\mathbb {R}}^{d}} \phi _{a}({\textbf{x}}|{\textbf{X}}_{i}^{*}) \{ \log {\phi _{b}({\textbf{x}}|{\textbf{X}}_{j}^{*})} - \log {\phi _{c}({\textbf{x}}|{\textbf{X}}_{k}^{*})} \}^{2} \,d{\textbf{x}} \\
&= \int _{{\mathbb {R}}^{d}} \phi _{a}({\textbf{x}}|{\textbf{X}}_{i}^{*}) \Bigl \{ \frac{1}{2h_{c}^{2}} \Vert {\textbf{x}} - {\textbf{X}}_{k}^{*} \Vert ^{2} - \frac{1}{2h_{b}^{2}}\Vert {\textbf{x}} - {\textbf{X}}_{j}^{*} \Vert ^{2} + d\log {h_{cb}} \Bigr \}^{2} \,d{\textbf{x}} \\
&= \int _{{\mathbb {R}}^{d}} \phi _{a}({\textbf{x}}|{\textbf{X}}_{i}^{*}) \Biggl [ \frac{h_{ac}^{2} - h_{ab}^{2}}{2} \cdot \frac{\Vert {\textbf{x}} - {\textbf{X}}_{i}^{*} \Vert ^{2}}{h_{a}^{2}} + \frac{({\textbf{x}}-{\textbf{X}}_{i}^{*})^{T}}{h_{a}} \Biggl \{ h_{ac}^{2} \frac{({\textbf{X}}_{i}^{*} - {\textbf{X}}_{k}^{*})}{h_{a}} - h_{ab}^{2} \frac{({\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*})}{h_{a}} \Biggr \} \\
&\qquad + \frac{\Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{k}^{*} \Vert ^{2}}{2h_{c}^{2}} - \frac{\Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*} \Vert ^{2}}{2h_{b}^{2}} + d \log h_{cb} \Biggr ]^{2} \,d{\textbf{x}} \\
&= \int _{{\mathbb {R}}^{d}} \phi ({\textbf{t}}) \Bigl \{ C_{1}(a, b, c) \Vert {\textbf{t}} \Vert ^{2} + {\textbf{t}}^{T} {\textbf{C}}_{2}(a, b, c; i,j,k) + C_{3}(b, c; i,j,k) \Bigr \}^{2} \,d{\textbf{t}},
\end{aligned}$$

where the last equality follows from the change of variable \({\textbf{t}} = ({\textbf{x}} - {\textbf{X}}_{i}^{*})/h_{a}\), and we define

$$\begin{aligned}
C_{1} &\equiv C_{1}(a, b, c) = \frac{h_{ac}^{2} - h_{ab}^{2}}{2} \in {\mathbb {R}}, \\
{\textbf{C}}_{2} &\equiv {\textbf{C}}_{2}(a, b, c; i,j,k) = h_{ac}^{2} \frac{({\textbf{X}}_{i}^{*} - {\textbf{X}}_{k}^{*})}{h_{a}} - h_{ab}^{2} \frac{({\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*})}{h_{a}} \in {\mathbb {R}}^{d}, \\
C_{3} &\equiv {C}_{3}(b, c; i,j,k) = \frac{\Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{k}^{*} \Vert ^{2}}{2h_{c}^{2}} - \frac{\Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*} \Vert ^{2}}{2h_{b}^{2}} + d \log h_{cb} \in {\mathbb {R}}.
\end{aligned}$$

The symbol \(\Vert {\textbf{x}} \Vert\) denotes the Euclidean norm \(({\textbf{x}}^{T}{\textbf{x}})^{1/2}\). Using the fact that the odd-order moments of the standard normal distribution are zero, together with Theorem 11.22 in Schott (2017, p. 480), we obtain

$$\begin{aligned}
J &= \int _{{\mathbb {R}}^{d}} \phi ({\textbf{t}}) \Bigl \{ C_{1}^2 \Vert {\textbf{t}} \Vert ^{4} + ({\textbf{C}}_{2}^{T}{\textbf{t}})({\textbf{t}}^{T} {\textbf{C}}_{2}) + C_{3}^{2} + 2 C_{1} \Vert {\textbf{t}} \Vert ^{2}({\textbf{t}}^{T} {\textbf{C}}_{2}) + 2C_{3}{\textbf{t}}^{T}{\textbf{C}}_{2} + 2C_{1}C_{3} \Vert {\textbf{t}}\Vert ^{2} \Bigr \} \,d{\textbf{t}} \\
&= d(d+2) C_{1}^{2} + \Vert {\textbf{C}}_{2} \Vert ^{2} + 2d C_{1}C_{3} + C_{3}^{2} \\
&= 2d C_{1}^{2} + \Vert {\textbf{C}}_{2} \Vert ^{2} + (d C_{1} + C_{3})^{2}.
\end{aligned}$$
(20)
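
The moment computation leading to (20) can be checked numerically. The snippet below is our own illustration with arbitrary values of \(C_{1}\), \({\textbf{C}}_{2}\), and \(C_{3}\); it compares a Monte Carlo estimate of the Gaussian integral with the closed-form expression.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
C1, C3 = 0.7, -1.2                  # arbitrary illustrative scalars
C2 = rng.normal(size=d)             # arbitrary illustrative vector

t = rng.normal(size=(1_000_000, d))                            # t ~ N_d(0, I)
integrand = (C1 * np.sum(t ** 2, axis=1) + t @ C2 + C3) ** 2   # {C1 ||t||^2 + t^T C2 + C3}^2

mc = integrand.mean()                                          # Monte Carlo estimate of J
exact = d * (d + 2) * C1 ** 2 + C2 @ C2 + 2 * d * C1 * C3 + C3 ** 2
print(mc, exact)   # the two values agree up to Monte Carlo error
```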

We define

$$\begin{aligned} h_{R} = \frac{h_{max}}{h_{min}}, \end{aligned}$$

where \(h_{min}\) and \(h_{max}\) are the minimum and the maximum bandwidths in the dictionary, respectively. We also define

$$\begin{aligned} R^{2} = \max _{i \ne j} \{ \Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*} \Vert ^{2} \}. \end{aligned}$$

Then, we obtain

$$\begin{aligned} |C_{1}| = \Biggl | \frac{h_{ac}^{2}-h_{ab}^{2}}{2} \Biggr | \le \frac{h_{ac}^{2}+h_{ab}^{2}}{2} \le \frac{h_{R}^{2}+h_{R}^{2}}{2} = h_{R}^{2}, \end{aligned}$$
(21)
$$\begin{aligned}
\Vert {\textbf{C}}_{2}\Vert ^{2} &= \Biggl \Vert h_{ac}^{2} \frac{({\textbf{X}}_{i}^{*} - {\textbf{X}}_{k}^{*})}{h_{a}} - h_{ab}^{2} \frac{({\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*})}{h_{a}} \Biggr \Vert ^{2} \\
&\le \frac{h_{ac}^{4}}{h_{a}^{2}} \Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{k}^{*}\Vert ^{2} + \frac{h_{ab}^{4}}{h_{a}^{2}} \Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*}\Vert ^{2} + 2 \frac{h_{ac}^{2}h_{ab}^{2}}{h_{a}^{2}} \Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{k}^{*}\Vert \cdot \Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*} \Vert \\
&\le \frac{h_{ac}^{4}}{h_{a}^{2}}R^{2} + \frac{h_{ab}^{4}}{h_{a}^{2}}R^{2} + 2 \frac{h_{ac}^{2}h_{ab}^{2}}{h_{a}^{2}} R^{2} = \Biggl ( \frac{h_{ac}^{2} + h_{ab}^{2}}{h_{a}} \Biggr )^{2} R^{2} \le 4 \Biggl ( \frac{h_{R}^{2}}{h_{min}} \Biggr )^{2} R^{2},
\end{aligned}$$
(22)

and

$$\begin{aligned}
|C_{3}| &= \Biggl | \frac{\Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{k}^{*}\Vert ^{2}}{2h_{c}^{2}} - \frac{\Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*}\Vert ^{2}}{2h_{b}^{2}} + d \log h_{cb} \Biggr | \\
&\le \frac{1}{2h_{min}^{2}} \Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{k}^{*}\Vert ^{2} + \frac{1}{2h_{min}^{2}} \Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*}\Vert ^{2} + d \log h_{R} \\
&\le \frac{R^{2}}{2h_{min}^{2}} + \frac{R^{2}}{2h_{min}^{2}} + d \log h_{R} = \frac{R^{2}}{h_{min}^{2}} + d \log h_{R}.
\end{aligned}$$
(23)

Therefore, using (21), (22), and (23) in (20), we obtain the upper bound

$$\begin{aligned} J \le B_{KL}({\textbf{X}}^{*})^{2} \le 2d h_{R}^{4} + 4 \Biggl ( \frac{h_{R}^{2}}{h_{min}} \Biggr )^{2} R^{2} + \Biggl \{ dh_{R}^{2} + \frac{R^{2}}{h_{min}^{2}} + d \log h_{R} \Biggr \}^2. \end{aligned}$$
(24)
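
As an informal numerical check of (21)–(24), the snippet below (our own illustration, not part of the proof) draws random centers and bandwidths, evaluates \(J\) through the closed form (20), and confirms that it does not exceed the right-hand side of (24).

```python
import numpy as np

rng = np.random.default_rng(3)
d, h_min, h_max = 3, 0.4, 1.5
h_R = h_max / h_min

# Random centers and bandwidths within [h_min, h_max] (illustrative values).
Xi, Xj, Xk = rng.uniform(-2.0, 2.0, size=(3, d))
ha, hb, hc = rng.uniform(h_min, h_max, size=3)
R2 = max(np.sum((Xi - Xj) ** 2), np.sum((Xi - Xk) ** 2), np.sum((Xj - Xk) ** 2))

hab, hac, hcb = ha / hb, ha / hc, hc / hb
C1 = (hac ** 2 - hab ** 2) / 2
C2 = hac ** 2 * (Xi - Xk) / ha - hab ** 2 * (Xi - Xj) / ha
C3 = (np.sum((Xi - Xk) ** 2) / (2 * hc ** 2)
      - np.sum((Xi - Xj) ** 2) / (2 * hb ** 2) + d * np.log(hcb))

J = 2 * d * C1 ** 2 + C2 @ C2 + (d * C1 + C3) ** 2                    # exact value from (20)
bound = (2 * d * h_R ** 4 + 4 * (h_R ** 2 / h_min) ** 2 * R2
         + (d * h_R ** 2 + R2 / h_min ** 2 + d * np.log(h_R)) ** 2)   # right-hand side of (24)
print(J <= bound)   # True
```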

\(\square\)

Appendix D

We prove that the finiteness of \(E_{{\textbf{X}}^{*}}[B_{U}({\textbf{X}}^{*})^{2}]\) in (15) is ensured, in the case of the KL divergence, if the fourth moment of \({\textbf{X}}_{i}^{*}\) is finite, as described in Remark 3. Taking the expectation of the right-hand side of (24), we need to evaluate

$$\begin{aligned}
E_{{\textbf{X}}^{*}}\Bigl [ (R^{2})^2 \Bigr ]
&= E_{{\textbf{X}}^{*}} \Bigl [ \max _{i \ne j} (\Vert {\textbf{X}}_{i}^{*} - {\textbf{X}}_{j}^{*}\Vert ^{2})^2 \Bigr ] \\
&\le E_{{\textbf{X}}^{*}} \Bigl [ \max _{i \ne j} (\Vert {\textbf{X}}_{i}^{*} \Vert ^{2} + \Vert {\textbf{X}}_{j}^{*} \Vert ^{2} + 2 |{\textbf{X}}_{i}^{*T} {\textbf{X}}_{j}^{*}| )^2 \Bigr ] \\
&\le E_{{\textbf{X}}^{*}} \Bigl [ (\max _{i} \Vert {\textbf{X}}_{i}^{*} \Vert ^{2} + \max _{j} \Vert {\textbf{X}}_{j}^{*} \Vert ^{2} + 2 \max _{i} \Vert {\textbf{X}}_{i}^{*}\Vert \max _{j} \Vert {\textbf{X}}_{j}^{*}\Vert )^2 \Bigr ] \\
&= 16 E_{{\textbf{X}}^{*}} \Bigl [\max _{i} \Vert {\textbf{X}}_{i}^{*} \Vert ^{4} \Bigr ].
\end{aligned}$$

Furthermore, we obtain

$$\begin{aligned}
E_{{\textbf{X}}^{*}} \Bigl [ \max _{i} \Vert {\textbf{X}}_{i}^{*} \Vert ^{4} \Bigr ]
\le \sum _{i=1}^{m} E_{{\textbf{X}}^{*}} \Bigl [ \Vert {\textbf{X}}_{i}^{*} \Vert ^{4} \Bigr ]
\le m \cdot \max _{i} E_{{\textbf{X}}^{*}} \Bigl [ \Vert {\textbf{X}}_{i}^{*} \Vert ^{4} \Bigr ].
\end{aligned}$$

Hence, it suffices to assume \(E_{{\textbf{X}}^{*}} \Bigl [ \Vert {\textbf{X}}_{i}^{*} \Vert ^{4} \Bigr ] < \infty\). \(\square\)
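
The inequality chain above can also be illustrated numerically. The following snippet is our own sketch, using an arbitrary heavy-tailed distribution with a finite fourth moment; it compares Monte Carlo estimates of \(E_{{\textbf{X}}^{*}}[(R^{2})^{2}]\) and \(16\, E_{{\textbf{X}}^{*}}[\max _{i} \Vert {\textbf{X}}_{i}^{*} \Vert ^{4}]\).

```python
import numpy as np

rng = np.random.default_rng(2)
m, d, reps = 20, 3, 5000            # dictionary size, dimension, Monte Carlo replications
lhs = np.empty(reps)
rhs = np.empty(reps)
for r in range(reps):
    Xs = rng.standard_t(df=5, size=(m, d))              # finite fourth moment (df > 4)
    sq_dist = np.sum((Xs[:, None, :] - Xs[None, :, :]) ** 2, axis=-1)
    lhs[r] = sq_dist.max() ** 2                          # (R^2)^2
    rhs[r] = np.max(np.sum(Xs ** 2, axis=1)) ** 2        # max_i ||X_i*||^4
print(lhs.mean(), 16 * rhs.mean())   # the first value is bounded by the second
```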

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nishida, K., Naito, K. Kernel density estimation by stagewise algorithm with a simple dictionary. Comput Stat 39, 523–560 (2024). https://doi.org/10.1007/s00180-022-01303-7
