Abstract
Optimal design for linear regression is a fundamental task in statistics. For finite design spaces, recent progress has shown that random designs drawn using proportional volume sampling (PVS for short) lead to polynomial-time algorithms with approximation guarantees that outperform i.i.d. sampling. PVS strikes a balance between design nodes that jointly fill the design space and nodes that marginally concentrate in regions of high mass under the solution of a relaxed convex version of the original problem. In this paper, we examine some of the statistical implications of a new variant of PVS for (possibly Bayesian) optimal design. Using point process machinery, we treat the case of a generic Polish design space. We show that not only are known A-optimality approximation guarantees preserved, but we obtain similar guarantees for D-optimal design that tighten recent results. Moreover, we show that our PVS variant can be sampled in polynomial time. Unfortunately, in spite of its elegance and tractability, we demonstrate on a simple example that the practical implications of general PVS are likely limited. In the second part of the paper, we focus on applications and investigate the use of PVS as a subroutine for stochastic search heuristics. We demonstrate that PVS is a robust addition to the practitioner’s toolbox, especially when the regression functions are nonstandard and the design space, while low-dimensional, has a complicated shape (e.g., nonlinear boundaries, several connected components).
Notes
Python code available at https://github.com/APoinas/Optimal-design-in-continuous-space
References
Andersen, M., Dahl, J., Liu, Z., Vandenberghe, L.: Interior-point methods for large-scale cone programming. In: Sra, S., Nowozin, S., Wright, S. (eds.) Optimization for Machine Learning, MIT Press, chap 1, pp. 55–83 (2012)
Atkinson, A., Donev, A., Tobias, R.: Optimum Experimental Designs, with SAS. Oxford University Press, Oxford Statistical Science Series, Oxford (2007)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, USA (2004)
Collings, B.J.: Characteristic polynomials by diagonal expansion. Am. Stat. 37(3), 233–235 (1983)
Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes, vol. I, 2nd edn. Springer, New York (2003)
De Castro, Y., Gamboa, F., Henrion, D., Hess, R., Lasserre, J.: Approximate optimal designs for multivariate polynomial regression. Ann. Stat. 47(1), 127–155 (2019)
Dereziński, M., Warmuth, M., Hsu, D.: Leveraged volume sampling for linear regression. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems, pp. 2510–2519 (2018)
Dereziński, M., Warmuth, M., Hsu, D.: Unbiased estimators for random design regression. arXiv preprint (2019)
Dereziński, M., Liang, F., Mahoney, M.: Bayesian experimental design using regularized determinantal point processes. In: Chiappa, S., Calandra, R. (Eds.) Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR, Online, Proceedings of Machine Learning Research, Vol. 108, pp. 3197–3207 (2020)
Dereziński, M., Mahoney, M.: Determinantal point processes in randomized numerical linear algebra. Not. Am. Math. Soc. 68, 1 (2021)
Dette, H.: Bayesian D-optimal and model robust designs in linear regression models. Stat.: J. Theor. Appl. Stat. 25(1), 27–46 (1993)
Dette, H., Studden, W.J.: The Theory of Canonical Moments with Applications in Statistics, Probability, and Analysis. Wiley Series in Probability and Statistics, Wiley, Hoboken (1997)
Dette, H., Melas, V., Pepelyshev, A.: D-optimal designs for trigonometric regression models on a partial circle. Ann. Inst. Stat. Math. 54, 945–959 (2002)
Dick, J., Pillichshammer, F.: Digital Nets and Sequences. Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge University Press, Cambridge (2010)
Fang, K., Li, R., Sudjianto, A.: Design and modeling for computer experiments. Computer science and data analysis series, 1st edn. Chapman and Hall/CRC, Boca Raton (2006)
Farrell, R.H., Kiefer, J., Walbran, A.: Optimum multivariate designs. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1: Statistics, University of California Press, pp. 113–138 (1967)
Fedorov, V.: Theory of Optimal Experiments Designs. Academic Press, New York (1972)
Gautier, G., Bardenet, R., Valko, M.: On two ways to use determinantal point processes for Monte Carlo integration. Tech. rep., ICML workshop on Negative dependence in machine learning (2019a)
Gautier, G., Polito, G., Bardenet, R., Valko, M.: DPPy: DPP Sampling with Python. Journal of Machine Learning Research - Machine Learning Open Source Software (JMLR-MLOSS) (2019b)
Grove, D., Woods, D., Lewis, S.: Multifactor B-spline mixed models in designed experiments for the engine mapping problem. J. Qual. Technol. 36(4), 380–391 (2004)
Hough, J., Krishnapur, M., Peres, Y., Virág, B.: Zeros of Gaussian Analytic Functions and Determinantal Point Processes. American Mathematical Society, Providence (2009)
Hough, J.B., Krishnapur, M., Peres, Y., Virág, B.: Determinantal processes and independence. Probab. Surv. 3, 206–229 (2006)
Johansson, K.: Random matrices and determinantal processes. Les Houches Summer School Proc. 83(C), 1–56 (2006)
Kulesza, A., Taskar, B.: Determinantal point processes for machine learning. Foundations and Trends in Machine Learning (2012)
Lavancier, F., Møller, J., Rubak, E.: Determinantal point process models and statistical inference. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 77, 853–877 (2015)
Liski, E., Mandal, N., Shah, K., Sinha, B.K.: Topics in Optimal Design, 1st edn. Lecture Notes in Statistics 163, Springer, New York (2002)
Liu, X., Yue, R.X., Chatterjee, K.: Geometric characterization of D-optimal designs for random coefficient regression models. Statist. Probab. Lett. 159, 108696 (2020)
Macchi, O.: The coincidence approach to stochastic point processes. Adv. Appl. Probab. 7, 83–122 (1975)
Maronge, J., Zhai, Y., Wiens, D., Fang, Z.: Optimal designs for spline wavelet regression models. J. Stat. Plann. Inference 184, 94–104 (2017)
Nikolov, A., Singh, M., Tantipongpipat, U.T.: Proportional volume sampling and approximation algorithms for A-optimal design. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, SODA ’19, pp. 1369–1386 (2019)
Piepel, G., Stanfill, B., Cooley, S., Jones, B., Kroll, J., Vienna, J.: Developing a space-filling mixture experiment design when the components are subject to linear and nonlinear constraints. Qual. Eng. 31(3), 463–472 (2019). https://doi.org/10.1080/08982112.2018.1517887
Pronzato, L., Pázman, A.: Design of Experiments in Nonlinear Models: Asymptotic Normality, Optimality Criteria and Small-Sample Properties. Lecture Notes in Statistics, vol. 212. Springer-Verlag, New York (2013)
Pukelsheim, F.: Optimal Design of Experiments. Classics in applied mathematics 50, Society for Industrial and Applied Mathematics (2006)
Pukelsheim, F., Rieder, S.: Efficient rounding of approximate designs. Biometrika 79(4), 763–770 (1992)
Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (2004)
Di Summa, M., Eisenbrand, F., Faenza, Y., Moldenhauer, C.: On largest volume simplices and sub-determinants. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’15) (2015)
Virtanen, P., Gommers, R., Oliphant, T., et al.: SciPy 1.0: fundamental algorithms for scientific computing in python. Nat. Methods 17, 261–272 (2020)
Woods, D., Lewis, S., Dewynne, J.: Designing experiments for multi-variable B-spline models. Sankhya 65, 660–670 (2003)
Acknowledgements
We thank Adrien Hardy for useful discussions throughout the project. We thank Michał Dereziński for his insightful comments and suggestions on an early draft. We acknowledge support from ERC grant Blackjack (ERC-2019-STG-851866) and ANR AI chair Baccarat (ANR-20-CHIA-0002).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
A Proof of the well-definedness of Definition 3.1
The Janossy densities are clearly nonnegative. Therefore, in order to prove that proportional volume sampling is well defined (see Daley and Vere-Jones 2003, Proposition 5.3.II.(ii)), we only need to show that
We write the eigenvalues of \(\varLambda \) as \(\lambda _1\le \cdots \le \lambda _p\) and the spectral decomposition of \(\varLambda \) as \(\varLambda =P^T D_\lambda P\), where \(D_\lambda \) is the \(p\times p\) diagonal matrix with the \(\lambda _i\) as its diagonal entries. We then define the functions \(\psi _i\), \(1\le i\le p\), as the linear transforms of the \(\phi _i\) given by \((\psi _1(x),\cdots ,\psi _p(x)):=(\phi _1(x),\cdots ,\phi _p(x))P^T\). Finally, we have the decomposition
where \(\psi _S:=(\psi _{S_1},\cdots ,\psi _{S_{|S|}})\) and \(\lambda ^{S^c}:=\prod _{i\notin S}\lambda _i\), with the usual convention \(\lambda ^{\emptyset }=1\); see Collings (1983). Now, by the discrete Cauchy–Binet formula,
where \(x_T:=(x_{T_1},\cdots ,x_{T_{|T|}})\). And, by using the more general Cauchy-Binet formula (Johansson 2006), we get
Therefore
where, in the last two identities, we used the facts that (i) \(G_\nu (\psi _S)\) is equal to \(G_\nu (\psi )_S\), the submatrix of \(G_\nu (\psi )\) whose rows and columns are indexed by S, and (ii) \(G_\nu (\psi )=G_\nu (\phi P^T)=PG_\nu (\phi )P^T\). This proves (A.1).
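In its simplest form, the discrete Cauchy–Binet identity used above states that, for an \(n\times p\) matrix M with \(n\ge p\), \(\det (M^TM)\) equals the sum of the squared \(p\times p\) minors of M over all row subsets of size p. The following Python sketch (a toy check only, with a random matrix standing in for the feature matrix) verifies this by brute-force enumeration:

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
M = rng.standard_normal((n, p))  # n x p matrix, rows playing the role of phi(x_i)

# Left-hand side: determinant of the p x p Gram matrix M^T M.
lhs = np.linalg.det(M.T @ M)

# Right-hand side: sum of squared p x p minors over all size-p row subsets T.
rhs = sum(
    np.linalg.det(M[list(T), :]) ** 2
    for T in itertools.combinations(range(n), p)
)

assert np.isclose(lhs, rhs)
```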
B Proof of Proposition 3.2
First, we write
Since \((\phi (x)^T\phi (x)+\varLambda )^{-1}\det (\phi (x)^T\phi (x)+\varLambda )\) is the adjugate matrix of \((\phi (x)^T\phi (x)+\varLambda )\), its (i, j) entry is
where we define \(\varLambda _{-j,-i}\) as the matrix \(\varLambda \) with its jth row and ith column removed, and \(\phi _{-i}(x)\) as the vector \(\phi (x)\) with its ith entry removed. Therefore, the (i, j) entry of the matrix \(\mathbb {E}\left[ (\phi (X)^T\phi (X)+\varLambda )^{-1}\right] \) is
Using the same reasoning as in the proof of normalization in Sect. A, we get that
Note that the proof in Sect. A does not rely on any symmetry argument, so that identity (B.1) can be proved in the same way. As a consequence, we get that
which is the (i, j) entry of the inverse matrix of \(G_\nu (\phi )+\varLambda \). This proves identity (3.2).
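As a sanity check, the adjugate identity invoked at the start of this proof, namely that the (i, j) entry of \(\det (M)M^{-1}\) is \((-1)^{i+j}\) times the determinant of M with its jth row and ith column removed, can be verified numerically. In this Python sketch, an arbitrary positive definite matrix stands in for \(\phi (x)^T\phi (x)+\varLambda \):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
A = rng.standard_normal((p, p))
M = A @ A.T + np.eye(p)  # positive definite stand-in for phi(x)^T phi(x) + Lambda

# Adjugate of M, defined as det(M) * M^{-1} for invertible M.
adj = np.linalg.det(M) * np.linalg.inv(M)

# Entry (i, j) of the adjugate equals (-1)^{i+j} times the determinant
# of M with its j-th row and i-th column removed (the (j, i) cofactor).
for i in range(p):
    for j in range(p):
        minor = np.delete(np.delete(M, j, axis=0), i, axis=1)
        assert np.isclose(adj[i, j], (-1) ** (i + j) * np.linalg.det(minor))
```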
Finally, the proof of identity (3.3) is straightforward:
C Proof of Proposition 3.3
By definition of the Janossy densities, we have
The integral in the numerator simplifies to
As for the denominator of (C.1), following the lines of Sect. A leads to
where the \(\psi \) functions are defined the same way as in Sect. A. Recalling that
we can rewrite the sum in (C.3) as
Now, since \(\nu (\varOmega )=k\), the sequence \(i\mapsto \nu (\varOmega )^i/i!\) is increasing when \(i\le k\). Hence, for all \(S\subset [p]\) such that \(S\ne [p]\),
We thus obtain
Finally, combining (C.1), (C.2), (C.4) and the fact that \(\nu (\varOmega )=k\), we get
concluding the proof.
D Proof of Proposition 3.4
Using the convexity of \(x\mapsto 1/x\) on \(\mathbb {R}_+^*\), we obtain
Now, in Sect. C we showed that
which concludes the proof.
E Proof of Proposition 3.5
By definition of the Janossy densities, we have
Using the same notation as in Sect. A, we expand the numerator into
Now,
where in this case \(S^c\) denotes the complement of S relative to \([p]\backslash \{i\}\). Note that there are exactly \(m_0=\dim (\text {Ker}(\varLambda ))\) eigenvalues of \(\varLambda \) equal to 0, so that the terms of the sum in (E.3) vanish when \(|S|\le m_0-1\). Since \(\nu (\varOmega )=k\), the sequence \(i\mapsto \nu (\varOmega )^i/i!\) is increasing when \(i\le k\), so that
which, combined with (E.2), gives
Finally, combining (E.1), (E.4) and (C.3) gives
and since \(\nu (\varOmega )=k\), this concludes the proof.
F Proof of Proposition 3.7
The Janossy densities and correlation functions of a point process are linked by the following identity; see Daley and Vere-Jones (2003, Lemma 5.4.III):
Applying this identity to the Janossy densities of \(\mathbb {P}_{\textrm{VS}}^{\nu }(\phi ,\varLambda )\), we get
Now, for all \(x_1,\cdots ,x_n\in \varOmega \), using the same reasoning as in the proof of normalization in Sect. A but replacing the matrix \(\varLambda \) with the matrix \(\phi (x)^T\phi (x)+\varLambda \), we get
We then conclude that
G Proof of Proposition 3.8
For any \(n\in \mathbb {N}\) and \(x\in \varOmega ^n\), we write K[x] for the \(n\times n\) matrix with entries \(K(x_i,x_j)\). Since \(G_\nu (\phi )+\varLambda \) is invertible,
Now, to conclude that the superposition of X and Y is distributed as \(\mathbb {P}_{\textrm{VS}}^{\nu }(\phi ,A)\), it remains to show that it has the same correlation functions.
Let \(n\in \mathbb {N}\); we recall that the nth-order correlation function \(\rho '_n\) of \(X\cup Y\) satisfies
for all integrable functions f, where the \(\ne \) symbol means that the sum is taken over distinct elements of \(X\cup Y\). Since each element of \(X\cup Y\) is either in X or in Y, (G.2) can be rewritten as
This proves that the correlation functions of \(X\cup Y\) also satisfy
Therefore, \(X\cup Y\) is distributed as \(\mathbb {P}_{\textrm{VS}}^{\nu }(\phi ,A)\).
H Proof of Corollary 3.10
\(X\sim \mathbb {P}_{\textrm{VS}}^{\nu }(\phi ,\varLambda )\) is the superposition of a Poisson point process Y with intensity \(\nu \) and a DPP Z with intensity measure \(\rho (x)\textrm{d}x\), where \(\rho (x)=\phi (x)(G_\nu (\phi )+\varLambda )^{-1}\phi (x)^T\); see identity (G.1). Therefore,
with \(\mathbb {E}[|Y|]=\nu (\varOmega )\) and
Since we can rewrite \(\phi (x)(G_\nu (\phi )+\varLambda )^{-1}\phi (x)^T\) as \(\textrm{Tr}((G_\nu (\phi )+\varLambda )^{-1}\phi (x)^T\phi (x))\), we get
concluding the proof.
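The trace rewriting used above holds for any measure \(\nu \) with Gram matrix \(G_\nu (\phi )\): integrating the quadratic form \(\phi (x)(G_\nu (\phi )+\varLambda )^{-1}\phi (x)^T\) against \(\nu \) yields \(\textrm{Tr}((G_\nu (\phi )+\varLambda )^{-1}G_\nu (\phi ))\). This can be illustrated on a discrete toy measure; in the Python sketch below, the weights and feature values are arbitrary stand-ins, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 3, 50  # p regression functions, m support points for a discrete nu

weights = rng.uniform(0.1, 1.0, size=m)  # nu({x_i}) = weights[i] (arbitrary)
Phi = rng.standard_normal((m, p))        # row i stands in for phi(x_i)
Lam = np.eye(p)                          # a positive definite Lambda

# Gram matrix G_nu(phi) = sum_i weights[i] * phi(x_i)^T phi(x_i).
G = Phi.T @ (weights[:, None] * Phi)
Minv = np.linalg.inv(G + Lam)

# Integral of the quadratic form phi(x) (G + Lambda)^{-1} phi(x)^T against nu...
integral = sum(w * (f @ Minv @ f) for w, f in zip(weights, Phi))
# ...equals the trace form used in the proof of Corollary 3.10.
trace_form = np.trace(Minv @ G)

assert np.isclose(integral, trace_form)
```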
I A parametrized reference measure for Sect. 3.4.
To parametrize \(\nu \), we write its density f as a linear combination of positive functions with nonnegative weights, that is,
This way, minimizing \(h(G_\nu (\phi ))\) over \(\nu =f\textrm{d}x\) of the form (1.1) and such that \(\nu (\varOmega )=k\) is equivalent to finding \((\omega _1,\cdots ,\omega _n)\) minimizing
This is now a convex optimization problem that can be solved numerically. For our illustration, we take \(h\in \{h_D,h_A\}\) and the \(g_i\) to be the 231 polynomial functions of two variables with degree \(\le 10\), as well as their compositions with \((x,y)\mapsto (1-x,1-y)\), which are all nonnegative functions on \(\varOmega =[0,1]^2\). We show in Fig. 7 the density of the measures minimizing (1.2) for both optimality criteria and for \(\varLambda \in \{I_{10}, 0.01 I_{10}, 0.0001 I_{10}\}\).
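A weight optimization of this kind can be set up directly in Python. The sketch below is only illustrative and makes several assumptions not taken from the paper: a hypothetical one-dimensional stand-in for \(\varOmega \), a small monomial basis for the \(g_i\), grid quadrature for the integrals, and SciPy's SLSQP solver in place of a dedicated cone programming method:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 1-D stand-in for Omega = [0, 1], with a small monomial basis.
p, n_basis, m, k = 3, 5, 200, 4.0
x = np.linspace(0.0, 1.0, m)
dx = 1.0 / m
Phi = np.vander(x, p, increasing=True)       # phi(x) = (1, x, x^2)
g = np.vander(x, n_basis, increasing=True)   # g_i(x) = x^i, nonnegative on [0, 1]

masses = g.sum(axis=0) * dx                              # integral of each g_i
G_parts = np.einsum('xi,xj,xb->bij', Phi, Phi, g) * dx   # int g_b phi^T phi dx

def h_A(w):
    """A-optimality criterion for nu = (sum_b w_b g_b) dx."""
    G = np.einsum('b,bij->ij', w, G_parts)   # G_nu(phi)
    return np.trace(np.linalg.inv(G))

res = minimize(
    h_A,
    x0=np.full(n_basis, k / masses.sum()),   # feasible starting weights
    bounds=[(1e-8, None)] * n_basis,         # nonnegative weights
    constraints=[{"type": "eq", "fun": lambda w: w @ masses - k}],  # nu(Omega) = k
    method="SLSQP",
)
print(res.x)  # optimized basis weights
```

Since \(G_\nu (\phi )\) is linear in the weights and both \(h_A\) and \(h_D\) are convex in the Gram matrix, the objective is convex in w, so a local solver started from a feasible point suffices for this toy problem.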
J Performance of PVS when dealing with both qualitative and quantitative variables
Following a similar idea as in Sect. 3.4 and Atkinson et al. (2007, Section 14), we consider the design space \(\varOmega =[0,1]^2\times \{0,1\}\) and \(k=p=11\), with the regression functions \(\phi _i\), for \(i\le 10\), being the 10 bivariate monomials of degree \(\le 3\) and \(\phi _{11}(x,y,z):=\mathbbm {1}_{z=1}\). We consider the non-Bayesian setting where \(\varLambda =0\). Following our approach in Sect. I, we parameterize \(\nu \) as a linear combination (with nonnegative weights) of the 462 functions of the form \((x,y,z)\mapsto P(x,y)\mathbbm {1}_{z=i}\), where P is a polynomial function of degree \(\le 10\) and \(i\in \{0,1\}\). We find the optimal weights numerically by solving the associated convex optimization problem to get an optimized measure \(\nu ^*\).
We show in Fig. 9 an example of a design generated by PVS with or without an optimized measure, compared to a uniformly drawn design and an optimal one. We also show in Fig. 8 the performance of PVS and i.i.d. designs with the reference measure being either uniform or \(\nu ^*\). The results are very similar to those in Fig. 3c, where we did not have qualitative factors. This shows that the addition of qualitative factors does not degrade the performance of PVS.
Cite this article
Poinas, A., Bardenet, R. On proportional volume sampling for experimental design in general spaces. Stat Comput 33, 29 (2023). https://doi.org/10.1007/s11222-022-10115-0