Abstract
A procedure to derive optimal discrimination rules is formulated for binary functional classification problems in which the instances available for induction are characterized by random trajectories sampled from different Gaussian processes, depending on the class label. Specifically, these optimal rules are derived as the asymptotic form of the quadratic discriminant for the discretely monitored trajectories in the limit that the set of monitoring points becomes dense in the interval on which the processes are defined. The main goal of this work is to provide a detailed analysis of such optimal rules in the dense monitoring limit, with a particular focus on elucidating the mechanisms by which near-perfect classification arises. In the general case, the quadratic discriminant includes terms that are singular in this limit. If such singularities do not cancel out, one obtains near-perfect classification, which means that the error approaches zero asymptotically, for infinite sample sizes. This singular limit is a consequence of the orthogonality of the probability measures associated with the stochastic processes from which the trajectories are sampled. As a further novel result of this analysis, we formulate rules to determine whether two Gaussian processes are equivalent or mutually singular (orthogonal).
References
Baíllo, A., Cuevas, A., Cuesta-Albertos, J.A.: Supervised classification for a family of Gaussian functional models. Scand. J. Stat. 38(3), 480–498 (2011)
Baker, C.T.H.: The Numerical Treatment of Integral Equations. Clarendon, Oxford (1977)
Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, Boston (2004)
Berrendero, J.R., Cárcamo, J.: Linear components of quadratic classifiers. Adv. Data Anal. Classif. 13(2), 347–377 (2019)
Berrendero, J.R., Bueno-Larraz, B., Cuevas, A.: On Mahalanobis distance in functional settings (2018a). arXiv:1803.06550
Berrendero, J.R., Cuevas, A., Torrecilla, J.L.: On the use of reproducing kernel Hilbert spaces in functional classification. J. Am. Stat. Assoc. 113(523), 1210–1218 (2018b)
Bollerslev, T., Chou, R., Kroner, K.F.: ARCH modeling in finance: a review of the theory and empirical evidence. J. Econom. 52(1–2), 5–59 (1992)
Cont, R.: Empirical properties of asset returns: stylized facts and statistical issues. Quant. Finance 1(2), 223–236 (2001)
Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2002)
Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Theory Viewpoint (Cambridge Monographs on Applied & Computational Mathematics). Cambridge University Press, New York (2007)
Cuesta-Albertos, J.A., Dutta, S.: On perfect classification for Gaussian processes (2016). arXiv:1602.04941
Cuevas, A.: A partial overview of the theory of statistics with functional data. J. Stat. Plan. Inference 147, 1–23 (2014)
Dai, X., Müller, H.G., Yao, F.: Optimal Bayes classifiers for functional data and density ratios. Biometrika 104(3), 545–560 (2017)
Delaigle, A., Hall, P.: Defining probability density for a distribution of random functions. Ann. Stat. 38(2), 1171–1193 (2010)
Delaigle, A., Hall, P.: Achieving near perfect classification for functional data. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 74(2), 267–286 (2012)
Delaigle, A., Hall, P.: Classification using censored functional data. J. Am. Stat. Assoc. 108(504), 1269–1283 (2013)
Epifanio, I., Ventura-Campos, N.: Hippocampal shape analysis in Alzheimer’s disease using functional data analysis. Stat. Med. 33(5), 867–880 (2014)
Fama, E.F.: The behavior of stock-market prices. J. Bus. 38(1), 34–105 (1965)
Feldman, J.: Equivalence and perpendicularity of Gaussian processes. Pac. J. Math. 8(4), 699–708 (1958)
Ferraty, F., Vieu, P.: Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics. Springer, Secaucus (2006)
Galeano, P., Joseph, E., Lillo, R.E.: The Mahalanobis distance for functional data with applications to classification. Technometrics 57(2), 281–291 (2015). https://doi.org/10.1080/00401706.2014.902774
Hájek, J.: A property of \(J\)-divergences of marginal probability distributions. Czechoslov. Math. J. 8(3), 460–463 (1958)
Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, Berlin (2009)
Hubert, M., Rousseeuw, P., Segaert, P.: Multivariate and functional classification using depth and distance. Adv. Data Anal. Classif. 11(3), 445–466 (2017)
Kailath, T.: Some results on singular detection. Inf. Control 9(2), 130–152 (1966)
Kailath, T.: RKHS approach to detection and estimation problems-I: deterministic signals in Gaussian noise. IEEE Trans. Inf. Theory 17(5), 530–549 (1971)
Kuelbs, J.: Gaussian measures on a Banach space. J. Funct. Anal. 5(3), 354–367 (1970)
Leng, X., Müller, H.G.: Classification using functional data analysis for temporal gene expression data. Bioinformatics 22(1), 68–76 (2006)
Lukić, M.N., Beder, J.H.: Stochastic processes with sample paths in reproducing kernel Hilbert spaces. Trans. Am. Math. Soc. 353(10), 3945–3969 (2001)
Manton, J.H., Amblard, P.O.: A primer on reproducing kernel Hilbert spaces. Found. Trends Signal Process. 8(1–2), 1–126 (2015)
Marks, S., Dunn, O.J.: Discriminant functions when covariance matrices are unequal. J. Am. Stat. Assoc. 69(346), 555–559 (1974)
Martin-Barragan, B., Lillo, R., Romo, J.: Interpretable support vector machines for functional data. Eur. J. Oper. Res. 232(1), 146–155 (2014)
Müller, H.G.: Peter Hall, functional data analysis and random objects. Ann. Stat. 44(5), 1867–1887 (2016)
Osborne, M.F.M.: Brownian motion in the stock market. Oper. Res. 7(2), 145–173 (1959)
Parzen, E.: Statistical inference on time series by Hilbert space methods. Technical report 23, Statistics Department, Stanford University (1959)
Parzen, E.: An approach to time series analysis. Ann. Math. Stat. 32(4), 951–989 (1961a)
Parzen, E.: Regression analysis of continuous parameter time series. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, University of California Press, Berkeley, California, pp. 469–489 (1961b)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Ramos-Carreño, C., Suárez, A., Torrecilla, J.L., Carbajo Berrocal, M., Marcos Manchón, P., Pérez Manso, P., Hernando Bernabé, A.: scikit-fda: functional data analysis in Python (2019). https://doi.org/10.5281/zenodo.3468127
Ramsay, J.O., Silverman, B.W.: Functional Data Analysis. Springer Series in Statistics, 2nd edn. Springer, Berlin (2005)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, London (2005)
Rincón, M., Ruiz-Medina, M.D.: Wavelet-RKHS-based functional statistical classification. Adv. Data Anal. Classif. 6(3), 201–217 (2012)
Rossi, F., Villa, N.: Support vector machine for functional data classification. Neurocomputing 69(7), 730–742 (2006). New Issues in Neurocomputing: 13th European Symposium on Artificial Neural Networks
Sato, H.: On the equivalence of Gaussian measures. J. Math. Soc. Jpn. 19(2), 159–172 (1967)
Shepp, L.A.: Radon–Nikodym derivatives of Gaussian measures. Ann. Math. Stat. 37(2), 321–354 (1966)
Song, J.J., Deng, W., Lee, H.J., Kwon, D.: Optimal classification for time-course gene expression data using functional data analysis. Comput. Biol. Chem. 32(6), 426–432 (2008)
Spence, A.: On the convergence of the Nyström method for the integral equation eigenvalue problem. Numer. Math. 25(1), 57–66 (1975)
Varberg, D.E.: On equivalence of Gaussian measures. Pac. J. Math. 11(2), 751–762 (1961)
Wahl, P.W., Kronmal, R.A.: Discriminant functions when covariances are unequal and sample sizes are moderate. Biometrics 33(3), 479–484 (1977)
Wang, J.L., Chiou, J.M., Müller, H.G.: Functional data analysis. Ann. Rev. Stat. Appl. 3(1), 257–295 (2016)
Zhu, H., Brown, P.J., Morris, J.S.: Robust classification of functional and quantitative image data using functional mixed models. Biometrics 68(4), 1260–1268 (2012)
Acknowledgements
The research has been supported by the Spanish Ministry of Economy, Industry, and Competitiveness—State Research Agency, Projects MTM2016-78751-P and TIN2016-76406-P(AEI/FEDER, UE), and Comunidad Autónoma de Madrid, Project S2017/BMD-3688. The authors gratefully acknowledge the use of the computational facilities at the Centro de Computación Científica (CCC) at the Universidad Autónoma de Madrid (UAM).
Appendices
A Discrete monitoring
In the derivations carried out, the processes X are monitored at a set of appropriately chosen discrete times \(\left\{ t_i \right\} _{i=1}^N \in {\mathcal {I}}^N\). The integrals that appear (e.g., in the definitions of the inner products) are then approximated by Riemann sums over these monitoring points. For functions that are continuous in \({\mathcal {I}}\), these Riemann sums converge to the corresponding definite integrals in the limit of dense monitoring.
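As a minimal numerical sketch of this convergence (the integrand and grid below are illustrative choices, not taken from the paper), a midpoint Riemann sum over an increasingly dense set of monitoring points approaches the definite integral of a continuous function:

```python
import numpy as np

def riemann_sum(f, N):
    """Midpoint Riemann sum of the integral of f over (0, 1),
    using N equally spaced monitoring points."""
    t = (np.arange(N) + 0.5) / N   # monitoring points t_i in (0, 1)
    return f(t).mean()             # (1/N) * sum_i f(t_i)

# For f(t) = sin(t), the exact integral over (0, 1) is 1 - cos(1).
err = abs(riemann_sum(np.sin, 1000) - (1.0 - np.cos(1.0)))
```

For continuous integrands, the approximation error vanishes as \(N \rightarrow \infty\); with \(N = 1000\) points the discrepancy above is already negligible.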
Let \(K_0\) and \(K_1\) be symmetric, strictly positive kernels that are continuous in \({\mathcal {I}}\), and let the corresponding RKHS's be infinite-dimensional. In the discretized representation, the kernel functions \(\left\{ K_i(s,t);\; s,t \in {\mathcal {I}} \right\} _{i=0}^1\) are approximated by \({\mathbf {K}}_i\), the corresponding \(N \times N\) Gram matrices, whose elements are \(\left( {\mathbf {K}}_i \right) _{jk} = K_i(t_j, t_k)\), for \(i = 0,1\). Let \(\left\{ \nu _{ij} \right\} _{j=1}^N\) be the (positive) eigenvalues of the matrix \({\mathbf {K}}_i\). Theorem 3.4 of Baker (1977) can be used to show that, in the limit of dense monitoring, the suitably scaled eigenvalues \(\frac{\left| {\mathcal {I}}\right| }{N}\, \nu _{ij}\) converge to \(\lambda _{ij}\), where \(\left\{ \lambda _{i1} \ge \lambda _{i2} \ge \ldots \ge \lambda _{iN} > 0 \right\} \) are the largest N eigenvalues of \({\mathcal {K}}_i\), the covariance operator associated with the kernel \(K_i\).
Therefore, the spectrum of the (suitably scaled) Gram matrix \({\mathbf {K}}_i\) converges to the spectrum of the covariance operator \({\mathcal {K}}_i\). In particular, the limit of the ratio of the determinants of the Gram matrices, \(\lim _{N \rightarrow \infty } \left| {\mathbf {K}}_1\right| / \left| {\mathbf {K}}_0\right| \), can be used to define the ratio \( \frac{\left| {\mathcal {K}}_1\right| }{\left| {\mathcal {K}}_0\right| } \) when the corresponding Gaussian processes are equivalent (\({\mathbb {P}}_0 \sim {\mathbb {P}}_1\)), in which case the limit exists (is finite) and is different from zero.
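The Nyström-type spectral convergence described above can be checked numerically for a kernel with a known operator spectrum. The sketch below uses the Brownian-motion covariance kernel \(K(s,t) = \min (s,t)\) on \([0,1]\), whose covariance operator has eigenvalues \(\lambda _k = \left( (k - \tfrac{1}{2}) \pi \right) ^{-2}\); the kernel, grid, and size \(N\) are illustrative choices, not the paper's data:

```python
import numpy as np

# Brownian-motion kernel K(s, t) = min(s, t) on [0, 1]; its covariance
# operator has eigenvalues lambda_k = ((k - 1/2) * pi)**(-2).
N = 500
t = (np.arange(N) + 0.5) / N                   # equally spaced monitoring points
K = np.minimum.outer(t, t)                     # N x N Gram matrix K(t_j, t_k)
nu = np.linalg.eigvalsh(K / N)[::-1]           # scaled Gram eigenvalues, descending
lam = 1.0 / (((np.arange(1, N + 1) - 0.5) * np.pi) ** 2)  # operator spectrum

# Relative error of the leading eigenvalue against the exact value 4 / pi^2.
rel_err = abs(nu[0] - lam[0]) / lam[0]
```

With \(N = 500\) monitoring points, the leading scaled Gram eigenvalue already agrees with the leading operator eigenvalue to well under a percent, consistent with the dense-monitoring limit.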
B Setup for the experiment with financial data
The setup of the experiment is as follows: let \(\left\{ S_i(t_0), S_i(t_1), \ldots , S_i(t_L) \right\} \) be the time series of asset market prices for stock \(i\), monitored at the equally spaced instants \(t_n = t_0 + n \varDelta T\), for \(n = 0, 1, \ldots , L\), where \( L = M (N_B + 1) - 1\). In the data analyzed, \(\varDelta T\) is 1 day. Therefore, the quantity \(S_i(t_n)\) is the closing price of the corresponding stock on the \(n\)th day of the period considered.
The time series is broken up into \(M\) segments of length \(N_B + 1\), with \(N_B = 2^B\) for some integer \(B\): the \(m\)th segment consists of the prices \(\left\{ S_i(t_n^{[m]}) \right\} _{n=0}^{N_B}\), where \(t_n^{[m]} = t_{n+ (m-1)N_B}\), with \(n = 0, 1, \ldots , N_B\), and \(m = 1,2,\ldots , M\). These \(M\) time series of \(N_B+1\) prices are then transformed into the corresponding time series of log-returns \(\left\{ r_i(t_n^{[m]}) \right\} _{n=1}^{N_B}\), where \(r_i(t_n^{[m]}) = \log S_i(t_n^{[m]}) - \log S_i(t_{n-1}^{[m]})\).
The goal is to discriminate between different stocks on the basis of the corresponding time series of log-returns. In particular, we analyze how the accuracy of the predictions depends on the monitoring frequency. For this reason, discrimination is made on the basis of \(N_b+1\) subsampled values within each segment, \(\left\{ S_i(t_{n\, n_b}^{[m]}) \right\} _{n=0}^{N_b}\), where \(N_b = 2^b\) and \(n_b = 2^{B-b}\), with \(b = 0,1,\ldots ,B\). As an illustration, for \(b = 0\), only two inputs in each time series are used for discrimination. For \(b = B\) (\(n_B = 1\)), the complete time series given by Eq. (98) is used as input to the different classifiers. The higher the monitoring frequency is, the closer the problem is to the functional paradigm.
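The segmentation and dyadic subsampling described above can be sketched as follows. The function name and the synthetic price series are illustrative (not part of the paper's code); the indexing follows \(t_n^{[m]} = t_{n+(m-1)N_B}\), under which consecutive segments share one endpoint:

```python
import numpy as np

def dyadic_log_returns(prices, B, M, b):
    """Split a price series into M segments of N_B + 1 = 2**B + 1 prices
    each (consecutive segments share an endpoint), subsample each segment
    at stride n_b = 2**(B - b), and return the N_b = 2**b log-returns of
    the subsampled prices, one row per segment."""
    N_B, N_b = 2 ** B, 2 ** b
    n_b = 2 ** (B - b)
    out = np.empty((M, N_b))
    for m in range(M):
        seg = prices[m * N_B : m * N_B + N_B + 1]  # N_B + 1 prices, segment m+1
        sub = seg[::n_b]                           # N_b + 1 subsampled prices
        out[m] = np.diff(np.log(sub))              # N_b log-returns
    return out

# Synthetic geometric-random-walk prices: M * N_B + 1 values for M=10, B=4.
rng = np.random.default_rng(0)
prices = np.exp(np.cumsum(rng.normal(0.0, 0.01, size=10 * 16 + 1)))
X = dyadic_log_returns(prices, B=4, M=10, b=2)   # shape (10, 4)
```

For \(b = 0\) each row reduces to a single log-return computed from the two segment endpoints, matching the two-input case discussed above; for \(b = B\) the full set of \(N_B\) log-returns per segment is recovered.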
Torrecilla, J.L., Ramos-Carreño, C., Sánchez-Montañés, M. et al. Optimal classification of Gaussian processes in homo- and heteroscedastic settings. Stat Comput 30, 1091–1111 (2020). https://doi.org/10.1007/s11222-020-09937-7