
Use of EM algorithm for data reduction under sparsity assumption


Abstract

Recent scientific applications produce data that are too large to store or render for further statistical analysis. This motivates the construction of an optimal mechanism for choosing only a subset of the available information and drawing inferences about the parent population from the stored subset alone. This paper addresses the estimation of parameters from such filtered data. Instead of all the observations, we observe only a few chosen linear combinations of them and treat the remaining information as missing. From the observed linear combinations we estimate the parameter using an EM-based technique, under the assumption that the parameter is sparse. We propose two related methods, called ASREM and ESREM. The methods developed here are also used for hypothesis testing and for the construction of confidence intervals. A similar data filtering approach already exists in the signal sampling paradigm, for example Compressive Sampling, introduced by Candes et al. (Commun Pure Appl Math 59(8):1207–1223, 2006) and Donoho (IEEE Trans Inf Theory 52:1289–1306, 2006). The methods proposed in this paper are not claimed to outperform all available techniques of signal recovery; rather, they are suggested as an alternative way of compressing data using the EM algorithm. However, we compare our methods to one standard algorithm, viz., robust signal recovery from noisy data using min-\(\ell _{1}\) with quadratic constraints. Finally, we apply one of our methods to a real-life dataset.


References

  • Baraniuk R (2007) Compressive sensing. IEEE Signal Process Mag 24:118–121
  • Baraniuk R, Davenport M, DeVore R, Wakin M (2008) A simple proof of the restricted isometry property for random matrices. Constr Approx 28(3):253–263
  • Candes EJ (2006) Compressive sampling. Proc Int Congr Math 3:1433–1452
  • Candes EJ, Wakin MB (2008) An introduction to compressive sampling. IEEE Signal Process Mag 25(2):21–30
  • Candes EJ, Romberg JK, Tao T (2006) Stable signal recovery from incomplete and inaccurate measurements. Commun Pure Appl Math 59(8):1207–1223
  • Casella G, Lehmann E (2003) Theory of point estimation. Springer, New York
  • Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
  • Donoho D (2006) Compressed sensing. IEEE Trans Inf Theory 52:1289–1306
  • Duarte MF, Davenport MA, Takhar D, Laska JN, Sun T, Kelly KF, Baraniuk RG (2008) Single-pixel imaging via compressive sampling. IEEE Signal Process Mag 25(2):83–91
  • Golub G, Van Loan C (1996) Matrix computations. Johns Hopkins University Press, Baltimore
  • Kim DK, Taylor JMG (1995) The restricted EM algorithm for maximum likelihood estimation under linear restrictions on the parameters. J Am Stat Assoc 90:708–716
  • Little RJA, Rubin DB (1987) Statistical analysis with missing data. Wiley, New York
  • Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44:226–233
  • Mallat S (1999) A wavelet tour of signal processing. Academic Press, Cambridge
  • McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions. Wiley, New York
  • Meng X, Rubin DB (1991) Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. J Am Stat Assoc 86:899–909
  • Romberg J (2008) Imaging via compressive sampling. IEEE Signal Process Mag 25(2):14–20
  • Shannon C (1949) Communication in the presence of noise. Proc IRE 37:447–457
  • Shi NZ, Zheng SR, Guo J (2005) The restricted EM algorithm under inequality restrictions on the parameters. J Multivar Anal 92:53–76
  • Tan M, Tian G, Fang H (2003) Estimating restricted normal means using the EM-type algorithms and IBF sampling. World Scientific, New Jersey
  • Tian GL, Ng KW, Tan M (2008) EM-type algorithms for computing restricted MLEs in multivariate normal distributions and multivariate t-distributions. Comput Stat Data Anal 52:4768–4778
  • Willett RM, Marcia RF, Nichols JM (2011) Compressed sensing for practical optical imaging systems: a tutorial. Opt Eng 50(7):072601


Acknowledgments

We thank the anonymous reviewers for their valuable suggestions.


Corresponding author

Correspondence to Atanu Kumar Ghosh.

Appendix

1.1 Detailed calculations of the proposed approaches

The derivations and the detailed expressions of the M-steps of ASREM and ESREM are given here.

1.1.1 Calculation of the M-step in ASREM

The conditional distribution of \(\mathbf {x|y,\varvec{\mathbf {\mu }}}^{(t)}\) is given by

$$\begin{aligned} N_{n}\big (\varvec{\mathbf {\mu }}^{(t)}+\phi '(\phi \phi ^{'})^{-1} (\mathbf {y}-\phi \varvec{\mathbf {\mu }}^{(t)}),\,\sigma ^{2} (\mathbf {I}_{n}-\phi '(\phi \phi ^{'})^{-1}\phi )\big ). \end{aligned}$$
(12.1)
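For concreteness, the conditional moments in (12.1) can be computed directly. The following Python sketch is ours (the function name and the use of NumPy are illustrative choices, not part of the paper); it assumes \(\phi \) is an \(m\times n\) matrix of full row rank and returns the conditional mean and covariance of \(\mathbf {x}\) given \(\mathbf {y}\) and the current iterate \(\varvec{\mu }^{(t)}\).

```python
import numpy as np

def e_step_moments(phi, y, mu_t, sigma2):
    """Conditional mean and covariance of x given y and mu_t, as in (12.1).

    phi : (m, n) measurement matrix, assumed to have full row rank.
    y   : (m,) vector of observed linear combinations.
    """
    m, n = phi.shape
    # phi'(phi phi')^{-1}, computed via a linear solve for numerical stability
    K = phi.T @ np.linalg.solve(phi @ phi.T, np.eye(m))
    cond_mean = mu_t + K @ (y - phi @ mu_t)
    cond_cov = sigma2 * (np.eye(n) - K @ phi)
    return cond_mean, cond_cov
```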

Now \(\varvec{\mu }\in S_{i}\) implies certain linear restrictions on the parameter \(\varvec{\mu }\) of the form

$$\begin{aligned} A_{i}\varvec{\mu }=\mathbf {0} \end{aligned}$$

where \(A_{i}=((a_{\gamma \delta }^{i}))\) is an \(n\times n\) matrix such that

$$\begin{aligned} a_{\delta \delta }^{i}={\left\{ \begin{array}{ll} 1 &{} \text {if}\,\mu _{\delta }=0\\ 0 &{} \text {otherwise} \end{array}\right. }\quad \text {and} \quad a_{\gamma \delta }^{i}=0\quad \text {if}\,\gamma \ne \delta . \end{aligned}$$

Using the method of Lagrange multipliers to incorporate the restrictions, we construct

$$\begin{aligned} Q_{R}(\varvec{\mu })=Q(\varvec{\mu })- \varvec{\lambda }^{'}A_{i}\varvec{\mu } \end{aligned}$$

where \(\varvec{\lambda }^{'}=(\lambda _{1},\lambda _{2},\ldots ,\lambda _{n})\) is the vector of Lagrange multipliers.

We set \(\frac{\partial }{\partial \mu _{j}}Q_{R}(\varvec{\mu })=0\) for all j.

If \(\mu _{j}\ne 0\)

$$\begin{aligned}&\frac{\partial }{\partial \mu _{j}}Q_{R}(\varvec{\mu })=0\;\Longrightarrow \;\frac{\partial }{\partial \mu _{j}}Q(\varvec{\mu })=0\\&\quad \Longrightarrow \frac{\partial }{\partial \mu _{j}}\big [E\{\sum _{\alpha =1}^{n}(x_{\alpha }-\mu _{\alpha })^{2}|y,\varvec{\mathbf {\mu }}^{(t)}\}\big ]=0\\&\quad \Longrightarrow \hat{\mu }_{j}^{(t+1)}(S_{i})=E[x_{j}|y,\varvec{\mathbf {\mu }}^{(t)}]\\&\quad \Longrightarrow \;\hat{\mu }_{j}^{(t+1)}(S_{i})=\mu _{j}^{(t)}+\alpha _{j}-\beta _{j} \end{aligned}$$

where \(\alpha _{j}\) is the \(j\)th element of \(\phi ^{'}(\phi \phi ^{'})^{-1}\mathbf {y}\) and \(\beta _{j}\) is the \(j\)th element of \(\phi ^{'}(\phi \phi ^{'})^{-1}\phi \varvec{\mu }^{(t)}\) (from (12.1)).

If \(\mu _{j}=0\)

$$\begin{aligned}&\frac{\partial }{\partial \mu _{j}}Q_{R}(\varvec{\mu })=0\\&\quad \Longrightarrow \hat{\mu }_{j}^{(t+1)}(S_{i})=E[x_{j}|y,\varvec{\mathbf {\mu }}^{(t)}]-\lambda _{j} \end{aligned}$$

From the restriction \(\mu _{j}=0\) we get

$$\begin{aligned}&\lambda _{j}=E[x_{j}|y,\varvec{\mathbf {\mu }}^{(t)}]\\&\quad \Longrightarrow \hat{\mu }_{j}^{(t+1)}(S_{i})=0. \end{aligned}$$

Then we choose the \(\varvec{\hat{\mu }}^{(t+1)}(S_{i})\) for which \(Q(\varvec{\hat{\mu }}^{(t+1)}(S_{i}))\) is maximal as the new estimate of \(\varvec{\mu }\) at the \((t+1)\)th iteration. Thus the estimate of \(\varvec{\mu }\) is

$$\begin{aligned} \varvec{\hat{\mu }}^{(t+1)}=\varvec{\hat{\mu }}^{(t+1)}(S_{i}) \end{aligned}$$

such that

$$\begin{aligned} Q(\mathbf {\varvec{\hat{\mu }}}^{(t+1)}(S_{i}))\ge Q(\mathbf {\varvec{\hat{\mu }}}^{(t+1)}(S_{j}))\,\forall j=1,2,\ldots ,n. \end{aligned}$$
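In other words, for each candidate subspace \(S_{i}\) the M-step keeps the conditional mean on the unrestricted components and sets the restricted components to zero, and then retains the candidate with the largest \(Q\) value. A minimal Python sketch of this step, assuming the candidate subspaces are supplied as Boolean masks (the encoding and the function name are ours, not the paper's):

```python
def asrem_m_step(cond_mean, subspace_masks):
    """One M-step of ASREM (sketch).

    cond_mean      : E[x | y, mu_t] from the E-step.
    subspace_masks : list of Boolean arrays; True marks components of mu
                     that S_i allows to be nonzero.
    Q differs across candidates only through minus the sum of squared
    conditional means of the zeroed components, so that term suffices
    to pick the maximiser.
    """
    best_mu, best_q = None, -np.inf
    for mask in subspace_masks:
        mu_i = np.where(mask, cond_mean, 0.0)    # restricted maximiser on S_i
        q_i = -np.sum((cond_mean - mu_i) ** 2)   # Q(mu_i) up to a mu-free constant
        if q_i > best_q:
            best_mu, best_q = mu_i, q_i
    return best_mu
```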

1.1.2 Calculations regarding ESREM

For the unrestricted EM algorithm the estimate of \(\varvec{\mu }\) should converge to the maximizer of the observed log-likelihood. The equation for finding the MLE from the observed likelihood is

$$\begin{aligned} (\phi ^{'}V^{-1}\phi )\varvec{\mu }=\phi ^{'}V^{-1}\mathbf {y} \end{aligned}$$
(12.2)

where \(V=\phi \phi ^{'}\).

Now the above Eq. (12.2) does not have a unique solution since \(rank[(\phi ^{'}V^{-1}\phi )_{n\times n}]=m\ll n\). Hence the observed likelihood does not have a unique maximum and the unrestricted EM algorithm can produce many estimates of \(\varvec{\mu }\). Among these estimates we choose the sparsest solution. This is achieved by taking the initial estimate of \(\varvec{\mu }\) to be \(\varvec{0}\) in the iterative process, as the estimate will then hopefully converge to the nearest solution, which will be the sparsest one. We justify this with the help of simulation in Sect. 9. To find the least-norm solution of (12.2) we take the Moore–Penrose inverse of \((\phi ^{'}V^{-1}\phi )\) (Golub and Van Loan 1996) and obtain the unrestricted EM estimate as

$$\begin{aligned} \varvec{\hat{\mu }}^{un}=(\phi ^{'}V^{-1}\phi )^{+}\phi ^{'}V^{-1}y=P\mathbf {y} \end{aligned}$$

where \(P=(\phi ^{'}V^{-1}\phi )^{+}\phi ^{'}V^{-1}\).

Then the distribution of \(\varvec{\hat{\mu }}^{un}\) is \(N_{n}(P\phi \varvec{\mu }\,,\,\sigma ^{2}PVP^{'}),\) so that \(E(\varvec{\hat{\mu }}^{un})=P\phi \varvec{\mu }=(\phi ^{'}V^{-1}\phi )^{+}(\phi ^{'}V^{-1}\phi )\varvec{\mu }\). Since \(\varvec{\mu }\) is sparse we may assume \((\phi ^{'}V^{-1}\phi )^{+}(\phi ^{'}V^{-1}\phi )\varvec{\mu }\simeq \varvec{\mu }\), and hence \(E(\varvec{\hat{\mu }}^{un})=P\phi \varvec{\mu }\simeq \varvec{\mu }\).
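Since the unrestricted estimate is a single linear map of the data, it can be computed in closed form. A sketch using NumPy's pseudoinverse (the function name is ours):

```python
def esrem_unrestricted(phi, y):
    """Least-norm solution of (12.2): mu_hat_un = (phi' V^{-1} phi)^+ phi' V^{-1} y."""
    V_inv = np.linalg.inv(phi @ phi.T)        # V = phi phi'
    A = phi.T @ V_inv @ phi                   # phi' V^{-1} phi  (n x n, rank m < n)
    P = np.linalg.pinv(A) @ phi.T @ V_inv     # Moore-Penrose inverse gives the least-norm solution
    return P @ y, P
```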

Now for identifying the true subspace we want to test n hypotheses

$$\begin{aligned} H_{0i}:\mu _{i}=0\qquad \forall i=1(1)n \end{aligned}$$

Let

$$\begin{aligned} \varvec{\hat{\mu }}^{un}=(\hat{\mu }_{1}^{un},\hat{\mu }_{2}^{un},\ldots \hat{\mu }_{n}^{un})^{'} \end{aligned}$$

The above calculation shows that the test statistic for testing \(H_{0i}\) is

$$\begin{aligned} \tau _{i}=\left| \frac{\hat{\mu }_{i}^{un}}{\sigma \sqrt{s_{ii}}}\right| ,\quad \text {where}\;\frac{\hat{\mu }_{i}^{un}}{\sigma \sqrt{s_{ii}}}\sim N(0,1)\;\text {under}\,H_{0i}\qquad \forall i=1(1)n \end{aligned}$$

where \(s_{ii}\) is the \(i\)th diagonal element of \(PVP^{'}\). This justifies the choice of the estimated subspace in Sect. 6.1.
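Given \(\varvec{\hat{\mu }}^{un}\) and \(P\), the \(n\) tests can be carried out componentwise. The sketch below uses a Bonferroni-adjusted two-sided normal cut-off; the significance level and the adjustment are illustrative assumptions of ours, not the paper's prescription.

```python
from scipy.stats import norm

def estimated_support(mu_hat_un, P, phi, sigma, alpha=0.05):
    """Boolean mask of components declared nonzero by the tests of H_{0i}."""
    V = phi @ phi.T
    s = np.diag(P @ V @ P.T)                         # s_ii
    tau = np.abs(mu_hat_un) / (sigma * np.sqrt(s))   # test statistics tau_i
    crit = norm.ppf(1.0 - alpha / (2 * len(tau)))    # Bonferroni-adjusted critical value
    return tau > crit
```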

1.2 Proofs of the theoretical properties

Below we give the proofs of Theorems 2 and 3, which were stated in Sect. 7.

1.2.1 Proof of Theorem 2

Proof

First we consider the ASREM algorithm. We apply REM for each of the subspaces \(S_{i}\) in each iteration. The only modification at this stage is that in the M-step of each iteration we choose the maximum of the \(Q\) values over the different subspaces. At the \((t+1)\)th iteration in ASREM we have

$$\begin{aligned}&Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)}(S_{i}))\ge Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)}(S_{j}))\,\forall j\ne i\\&\quad \Rightarrow Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)})={\displaystyle \max _{j}}\; Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)}(S_{j})) \end{aligned}$$

Also, since we apply REM for each of the subspaces \(S_{i}\), it follows from the property of REM that for each \(S_{i}\) we have

$$\begin{aligned} Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)}(S_{i}))\ge Q(\varvec{\varvec{\hat{\mu }}}^{(t)}(S_{i}))\,\forall i \end{aligned}$$

Hence we have the following chain of inequalities

$$\begin{aligned}&Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)})={\displaystyle \max _{j}}\; Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)}(S_{j}))\ge {\displaystyle \max _{j}}\; Q(\varvec{\varvec{\hat{\mu }}}^{(t)}(S_{j}))=Q(\varvec{\varvec{\hat{\mu }}}^{(t)})\\&\quad \Rightarrow \ell _{obs}(\varvec{\varvec{\hat{\mu }}}^{(t+1)})\ge \ell _{obs}(\varvec{\varvec{\hat{\mu }}}^{(t)}). \end{aligned}$$

Thus for ASREM the \(Q\) function is nondecreasing in each iteration, which makes the observed log-likelihood nondecreasing in each iteration as well.

Next we consider the ESREM algorithm. Here we apply the REM over the estimated subspace \(\hat{S}_{\varvec{\mu }}\). Hence from the property of REM we find that the observed log-likelihood is nondecreasing with each iteration. \(\square \)

1.2.2 Proof of Theorem 3

Proof

The insignificant components of \(\varvec{\hat{\mu }}^{un}\) are set to zero and the subspace spanned by the basis corresponding to the significant components is chosen. We find the restricted estimate \(\varvec{\hat{\mu }}\) by maximizing \(Q(\mathbf {\varvec{\mu }})\) over this subspace. Thus the restricted estimate \(\varvec{\hat{\mu }}\) can be treated as a projection of \(\varvec{\hat{\mu }}^{un}\) on \(\hat{S}_{\varvec{\mu }}\). Hence if \(\varvec{\mu }\in \hat{S}_{\varvec{\mu }}\) we have

$$\begin{aligned} \parallel \varvec{\hat{\mu }}-\varvec{\mu }\parallel _{l_{2}}\le \parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}\qquad w.p.\,1 \end{aligned}$$

This implies

$$\begin{aligned} E[\parallel \varvec{\hat{\mu }}-\varvec{\mu }\parallel _{l_{2}}]\le E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}] \end{aligned}$$

Now

$$\begin{aligned}&\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}^{2}={\displaystyle \sum _{i=1}^{n}(\hat{\mu }_{i}^{un}-\mu _{i})^{2}}\\&\quad \Rightarrow E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}^{2}]=\sum _{i=1}^{n}E(\hat{\mu }_{i}^{un}-\mu _{i})^{2}=\sum _{i=1}^{n}V(\hat{\mu }_{i}^{un})\qquad [\because E(\hat{\mu }_{i}^{un})=\mu _{i}]\\&\quad \Rightarrow E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}^{2}]=\sigma ^{2}\sum _{i=1}^{n}s_{ii} \end{aligned}$$

Then, since \(E^{2}[X]\le E[X^{2}]\) for any random variable \(X\) (Jensen's inequality),

$$\begin{aligned}&E^{2}[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}]\le E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}^{2}]=\sigma ^{2}\sum _{i=1}^{n}s_{ii}\\&\quad \Rightarrow E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}]\le \sigma \sqrt{\sum _{i=1}^{n}s_{ii}}\qquad \qquad [\because \, s_{ii}>0\;\forall i] \end{aligned}$$

This implies

$$\begin{aligned}&E[\parallel \varvec{\hat{\mu }}-\varvec{\mu }\parallel _{l_{2}}]\le E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}]\le \sigma \sqrt{\sum _{i=1}^{n}s_{ii}}\\&\quad \Rightarrow E[\parallel \varvec{\hat{\mu }}-\varvec{\mu }\parallel _{l_{2}}]\le \sigma \sqrt{\sum _{i=1}^{n}s_{ii}} \end{aligned}$$

\(\square \)
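The bound in Theorem 3 is easy to probe numerically. The sketch below (the names and the simulation design are ours) reuses esrem_unrestricted from the sketch above, draws repeated samples, forms \(\varvec{\hat{\mu }}^{un}=P\mathbf {y}\), and compares the Monte Carlo mean of \(\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}\) with \(\sigma \sqrt{\sum _{i}s_{ii}}\), assuming \(\varvec{\mu }\) is sparse enough that \(P\phi \varvec{\mu }\simeq \varvec{\mu }\).

```python
def check_risk_bound(phi, mu, sigma, n_rep=2000, seed=0):
    """Monte Carlo check that E||mu_hat_un - mu|| <= sigma * sqrt(sum_i s_ii)."""
    rng = np.random.default_rng(seed)
    _, P = esrem_unrestricted(phi, np.zeros(phi.shape[0]))   # P depends only on phi
    s = np.diag(P @ (phi @ phi.T) @ P.T)
    bound = sigma * np.sqrt(s.sum())
    errors = []
    for _ in range(n_rep):
        x = mu + sigma * rng.standard_normal(mu.size)        # x ~ N_n(mu, sigma^2 I)
        errors.append(np.linalg.norm(P @ (phi @ x) - mu))    # ||mu_hat_un - mu||
    return np.mean(errors), bound
```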


About this article


Cite this article

Ghosh, A.K., Chakraborty, A. Use of EM algorithm for data reduction under sparsity assumption. Comput Stat 32, 387–407 (2017). https://doi.org/10.1007/s00180-016-0657-3

