
Use of EM algorithm for data reduction under sparsity assumption


Abstract

Recent scientific applications produce data that are too large to store or render for further statistical analysis. This motivates the construction of an optimal mechanism for choosing only a subset of the available information and drawing inferences about the parent population from the stored subset alone. This paper addresses the estimation of parameters from such filtered data. Instead of all the observations, we observe only a few chosen linear combinations of them and treat the remaining information as missing. From the observed linear combinations we estimate the parameter using an EM-based technique, under the assumption that the parameter is sparse. We propose two related methods, called ASREM and ESREM. The methods developed here are also used for hypothesis testing and for the construction of confidence intervals. A similar data filtering approach already exists in the signal sampling paradigm, for example Compressive Sampling, introduced by Candes et al. (Commun Pure Appl Math 59(8):1207–1223, 2006) and Donoho (IEEE Trans Inf Theory 52:1289–1306, 2006). The methods proposed in this paper are not claimed to outperform all available techniques of signal recovery; rather, they are suggested as an alternative way of compressing data using the EM algorithm. However, we compare our methods to one standard algorithm, viz., robust signal recovery from noisy data using min-\(\ell _{1}\) with quadratic constraints. Finally, we apply one of our methods to a real-life dataset.


References

  • Baraniuk R (2007) Compressive sensing. IEEE Signal Process Mag 24:118–121
  • Baraniuk R, Davenport M, DeVore R, Wakin M (2008) A simple proof of the restricted isometry property for random matrices. Constr Approx 28(3):253–263
  • Candes EJ (2006) Compressive sampling. Proc Int Congr Math 3:1433–1452
  • Candes EJ, Wakin MB (2008) An introduction to compressive sampling. IEEE Signal Process Mag 25(2):21–30
  • Candes EJ, Romberg JK, Tao T (2006) Stable signal recovery from incomplete and inaccurate measurements. Commun Pure Appl Math 59(8):1207–1223
  • Casella G, Lehmann E (2003) Theory of point estimation. Springer, New York
  • Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
  • Donoho D (2006) Compressed sensing. IEEE Trans Inf Theory 52:1289–1306
  • Duarte MF, Davenport MA, Takhar D, Laska JN, Sun T, Kelly KF, Baraniuk RG (2008) Single-pixel imaging via compressive sampling. IEEE Signal Process Mag 25(2):83–91
  • Golub G, Van Loan C (1996) Matrix computations. Johns Hopkins University Press, Baltimore
  • Kim DK, Taylor JMG (1995) The restricted EM algorithm for maximum likelihood estimation under linear restrictions on the parameters. J Am Stat Assoc 90:708–716
  • Little RJA, Rubin DB (1987) Statistical analysis with missing data. Wiley, New York
  • Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B 44:226–233
  • Mallat S (1999) A wavelet tour of signal processing. Academic Press, Cambridge
  • McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions. Wiley, New York
  • Meng X, Rubin DB (1991) Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. J Am Stat Assoc 86:899–909
  • Romberg J (2008) Imaging via compressive sampling. IEEE Signal Process Mag 25(2):14–20
  • Shannon C (1949) Communication in the presence of noise. Proc IRE 37:447–457
  • Shi NZ, Zheng SR, Guo J (2005) The restricted EM algorithm under inequality restrictions on the parameters. J Multivar Anal 92:53–76
  • Tan M, Tian G, Fang H (2003) Estimating restricted normal means using the EM-type algorithms and IBF sampling. World Scientific, New Jersey
  • Tian GL, Ng KW, Tan M (2008) EM-type algorithms for computing restricted MLEs in multivariate normal distributions and multivariate t-distributions. Comput Stat Data Anal 52:4768–4778
  • Willett RM, Marcia RF, Nichols JM (2011) Compressed sensing for practical optical imaging systems: a tutorial. Opt Eng 50(7):072601


Acknowledgments

We thank the anonymous reviewers for their valuable suggestions.


Corresponding author

Correspondence to Atanu Kumar Ghosh.

Appendix

1.1 Detailed calculations of the proposed approaches

The derivations and the detailed expressions of the M-steps of ASREM and ESREM are given here.

1.1.1 Calculation of the M-step in ASREM

The conditional distribution of \(\mathbf {x|y,\varvec{\mathbf {\mu }}}^{(t)}\) is given by

$$\begin{aligned} N_{n}\big (\varvec{\mathbf {\mu }}^{(t)}+\phi '(\phi \phi ^{'})^{-1} (\mathbf {y}-\phi \varvec{\mathbf {\mu }}^{(t)}),\,\sigma ^{2} (\mathbf {I}_{n}-\phi '(\phi \phi ^{'})^{-1}\phi )\big ). \end{aligned}$$
(12.1)
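For concreteness, the conditional moments in (12.1) can be computed directly. The following Python sketch is ours (the function name and the use of NumPy are illustrative choices, not part of the paper); it assumes \(\phi \) is an \(m\times n\) matrix of full row rank and returns the conditional mean and covariance of \(\mathbf {x}\) given \(\mathbf {y}\) and the current iterate \(\varvec{\mu }^{(t)}\).

```python
import numpy as np

def e_step_moments(phi, y, mu_t, sigma2):
    """Conditional mean and covariance of x given y and mu_t, as in (12.1).

    phi : (m, n) measurement matrix, assumed to have full row rank.
    y   : (m,) vector of observed linear combinations.
    """
    m, n = phi.shape
    # phi'(phi phi')^{-1}, computed via a linear solve for numerical stability
    K = phi.T @ np.linalg.solve(phi @ phi.T, np.eye(m))
    cond_mean = mu_t + K @ (y - phi @ mu_t)
    cond_cov = sigma2 * (np.eye(n) - K @ phi)
    return cond_mean, cond_cov
```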

Now \(\varvec{\mu }\in S_{i}\) implies certain linear restrictions on the parameter \(\varvec{\mu }\) of the form

$$\begin{aligned} A_{i}\varvec{\mu }=\mathbf {0} \end{aligned}$$

where \(A_{i}=((a_{\gamma \delta }^{i}))\) is an \(n\times n\) matrix such that

$$\begin{aligned} a_{\delta \delta }^{i}={\left\{ \begin{array}{ll} 1 &{} \text {if}\,\mu _{\delta }=0\\ 0 &{} \text {otherwise} \end{array}\right. }\quad \text {and} \quad a_{\gamma \delta }^{i}=0\quad \text {if}\,\gamma \ne \delta . \end{aligned}$$

Using the method of Lagrange multipliers to incorporate the restrictions, we construct

$$\begin{aligned} Q_{R}(\varvec{\mu })=Q(\varvec{\mu })- \varvec{\lambda }^{'}A_{i}\varvec{\mu } \end{aligned}$$

where \(\varvec{\lambda }^{'}=(\lambda _{1},\lambda _{2},\ldots ,\lambda _{n})\) is the vector of Lagrange multipliers.

We set \(\frac{\partial }{\partial \mu _{j}}Q_{R}(\varvec{\mu })=0\) for all j.

If \(\mu _{j}\ne 0\)

$$\begin{aligned}&\frac{\partial }{\partial \mu _{j}}Q_{R}(\varvec{\mu })=0\;\Longrightarrow \;\frac{\partial }{\partial \mu _{j}}Q(\varvec{\mu })=0\\&\quad \Longrightarrow \frac{\partial }{\partial \mu _{j}}\big [E\{\sum _{\alpha =1}^{n}(x_{\alpha }-\mu _{\alpha })^{2}|y,\varvec{\mathbf {\mu }}^{(t)}\}\big ]=0\\&\quad \Longrightarrow \hat{\mu }_{j}^{(t+1)}(S_{i})=E[x_{j}|y,\varvec{\mathbf {\mu }}^{(t)}]\\&\quad \Longrightarrow \;\hat{\mu }_{j}^{(t+1)}(S_{i})=\mu _{j}^{(t)}+\alpha _{j}-\beta _{j} \end{aligned}$$

where \(\alpha _{j}\) is the \(j\)th element of \(\phi ^{'}(\phi \phi ^{'})^{-1}\mathbf {y}\) and \(\beta _{j}\) is the \(j\)th element of \(\phi ^{'}(\phi \phi ^{'})^{-1}\phi \varvec{\mu }^{(t)}\) (from (12.1)).

If \(\mu _{j}=0\)

$$\begin{aligned}&\frac{\partial }{\partial \mu _{j}}Q_{R}(\varvec{\mu })=0\\&\quad \Longrightarrow \hat{\mu }_{j}^{(t+1)}(S_{i})=E[x_{j}|y,\varvec{\mathbf {\mu }}^{(t)}]-\lambda _{j} \end{aligned}$$

From the restriction \(\mu _{j}=0\) we get

$$\begin{aligned}&\lambda _{j}=E[x_{j}|y,\varvec{\mathbf {\mu }}^{(t)}]\\&\quad \Longrightarrow \hat{\mu }_{j}^{(t+1)}(S_{i})=0. \end{aligned}$$

Then we choose the \(\varvec{\hat{\mu }}^{(t+1)}(S_{i})\) for which \(Q(\varvec{\hat{\mu }}^{(t+1)}(S_{i}))\) is maximal as the new estimate of \(\varvec{\mu }\) at the \((t+1)\)th iteration. Thus the estimate of \(\varvec{\mu }\) is

$$\begin{aligned} \varvec{\hat{\mu }}^{(t+1)}=\varvec{\hat{\mu }}^{(t+1)}(S_{i}) \end{aligned}$$

such that

$$\begin{aligned} Q(\mathbf {\varvec{\hat{\mu }}}^{(t+1)}(S_{i}))\ge Q(\mathbf {\varvec{\hat{\mu }}}^{(t+1)}(S_{j}))\,\forall j=1,2,\ldots ,n. \end{aligned}$$
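In other words, for each candidate subspace \(S_{i}\) the M-step keeps the conditional mean on the unrestricted components and sets the restricted components to zero, and then retains the candidate with the largest \(Q\) value. A minimal Python sketch of this step, assuming the candidate subspaces are supplied as Boolean masks (the encoding and the function name are ours, not the paper's):

```python
def asrem_m_step(cond_mean, subspace_masks):
    """One M-step of ASREM (sketch).

    cond_mean      : E[x | y, mu_t] from the E-step.
    subspace_masks : list of Boolean arrays; True marks components of mu
                     that S_i allows to be nonzero.
    Q differs across candidates only through minus the sum of squared
    conditional means of the zeroed components, so that term suffices
    to pick the maximiser.
    """
    best_mu, best_q = None, -np.inf
    for mask in subspace_masks:
        mu_i = np.where(mask, cond_mean, 0.0)    # restricted maximiser on S_i
        q_i = -np.sum((cond_mean - mu_i) ** 2)   # Q(mu_i) up to a mu-free constant
        if q_i > best_q:
            best_mu, best_q = mu_i, q_i
    return best_mu
```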

1.1.2 Calculations regarding ESREM

For the unrestricted EM algorithm the estimate of \(\varvec{\mu }\) should converge to the maximizer of the observed log-likelihood. The equation for finding the MLE from the observed likelihood is

$$\begin{aligned} (\phi ^{'}V^{-1}\phi )\varvec{\mu }=\phi ^{'}V^{-1}\mathbf {y} \end{aligned}$$
(12.2)

where \(V=\phi \phi ^{'}\).

Now the above Eq. (12.2) does not have a unique solution since \(rank[(\phi ^{'}V^{-1}\phi )_{n\times n}]=m\ll n\). Hence the observed likelihood does not have a unique maximum and the unrestricted EM algorithm can produce many estimates of \(\varvec{\mu }\). Among these estimates we choose the sparsest solution. This is achieved by taking the initial estimate of \(\varvec{\mu }\) to be \(\varvec{0}\) in the iterative process, as the estimate will then hopefully converge to the nearest solution, which will be the sparsest one. We justify this with the help of simulation in Sect. 9. To find the least-norm solution of (12.2) we take the Moore–Penrose inverse of \((\phi ^{'}V^{-1}\phi )\) (Golub and Van Loan 1996) and obtain the unrestricted EM estimate as

$$\begin{aligned} \varvec{\hat{\mu }}^{un}=(\phi ^{'}V^{-1}\phi )^{+}\phi ^{'}V^{-1}y=P\mathbf {y} \end{aligned}$$

where \(P=(\phi ^{'}V^{-1}\phi )^{+}\phi ^{'}V^{-1}\).

Then the distribution of \(\varvec{\hat{\mu }}^{un}\) is \(N_{n}(P\phi \varvec{\mu }\,,\,\sigma ^{2}PVP^{'}),\) so that \(E(\varvec{\hat{\mu }}^{un})=P\phi \varvec{\mu }=(\phi ^{'}V^{-1}\phi )^{+}(\phi ^{'}V^{-1}\phi )\varvec{\mu }\). Since \(\varvec{\mu }\) is sparse we may assume \((\phi ^{'}V^{-1}\phi )^{+}(\phi ^{'}V^{-1}\phi )\varvec{\mu }\simeq \varvec{\mu }\), and hence \(E(\varvec{\hat{\mu }}^{un})=P\phi \varvec{\mu }\simeq \varvec{\mu }\).
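Since the unrestricted estimate is a single linear map of the data, it can be computed in closed form. A sketch using NumPy's pseudoinverse (the function name is ours):

```python
def esrem_unrestricted(phi, y):
    """Least-norm solution of (12.2): mu_hat_un = (phi' V^{-1} phi)^+ phi' V^{-1} y."""
    V_inv = np.linalg.inv(phi @ phi.T)        # V = phi phi'
    A = phi.T @ V_inv @ phi                   # phi' V^{-1} phi  (n x n, rank m < n)
    P = np.linalg.pinv(A) @ phi.T @ V_inv     # Moore-Penrose inverse gives the least-norm solution
    return P @ y, P
```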

Now for identifying the true subspace we want to test n hypotheses

$$\begin{aligned} H_{0i}:\mu _{i}=0\qquad \forall i=1(1)n \end{aligned}$$

Let

$$\begin{aligned} \varvec{\hat{\mu }}^{un}=(\hat{\mu }_{1}^{un},\hat{\mu }_{2}^{un},\ldots \hat{\mu }_{n}^{un})^{'} \end{aligned}$$

The above calculation shows that the test statistic for testing \(H_{0i}\) is

$$\begin{aligned} \tau _{i}=\left| \frac{\hat{\mu }_{i}^{un}}{\sigma \sqrt{s_{ii}}}\right| ,\quad \text {where}\;\frac{\hat{\mu }_{i}^{un}}{\sigma \sqrt{s_{ii}}}\sim N(0,1)\;\text {under}\,H_{0i}\qquad \forall i=1(1)n \end{aligned}$$

where \(s_{ii}\) is the \(i\)th diagonal element of \(PVP^{'}\). This justifies the choice of the estimated subspace in Sect. 6.1.
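Given \(\varvec{\hat{\mu }}^{un}\) and \(P\), the \(n\) tests can be carried out componentwise. The sketch below uses a Bonferroni-adjusted two-sided normal cut-off; the significance level and the adjustment are illustrative assumptions of ours, not the paper's prescription.

```python
from scipy.stats import norm

def estimated_support(mu_hat_un, P, phi, sigma, alpha=0.05):
    """Boolean mask of components declared nonzero by the tests of H_{0i}."""
    V = phi @ phi.T
    s = np.diag(P @ V @ P.T)                         # s_ii
    tau = np.abs(mu_hat_un) / (sigma * np.sqrt(s))   # test statistics tau_i
    crit = norm.ppf(1.0 - alpha / (2 * len(tau)))    # Bonferroni-adjusted critical value
    return tau > crit
```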

1.2 Proofs of the theoretical properties

Below we give the proofs of Theorems 2 and 3, which were stated in Sect. 7.

1.2.1 Proof of Theorem 2

Proof

First we consider the ASREM algorithm. We apply REM for each of the subspaces \(S_{i}\) in each iteration. The only modification at this stage is that in the M-step of each iteration we choose the maximum of the \(Q\) values over the different subspaces. At the \((t+1)\)th iteration in ASREM we have

$$\begin{aligned}&Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)}(S_{i}))\ge Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)}(S_{j}))\,\forall j\ne i\\&\quad \Rightarrow Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)})={\displaystyle \max _{j}}\; Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)}(S_{j})) \end{aligned}$$

Also, since we apply REM for each of the subspaces \(S_{i}\), it follows from the property of REM that for each \(S_{i}\) we have

$$\begin{aligned} Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)}(S_{i}))\ge Q(\varvec{\varvec{\hat{\mu }}}^{(t)}(S_{i}))\,\forall i \end{aligned}$$

Hence we have the following chain of inequalities

$$\begin{aligned}&Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)})={\displaystyle \max _{j}}\; Q(\varvec{\varvec{\hat{\mu }}}^{(t+1)}(S_{j}))\ge {\displaystyle \max _{j}}\; Q(\varvec{\varvec{\hat{\mu }}}^{(t)}(S_{j}))=Q(\varvec{\varvec{\hat{\mu }}}^{(t)})\\&\quad \Rightarrow \ell _{obs}(\varvec{\varvec{\hat{\mu }}}^{(t+1)})\ge \ell _{obs}(\varvec{\varvec{\hat{\mu }}}^{(t)}). \end{aligned}$$

Thus for ASREM the \(Q\) function is nondecreasing in each iteration, which makes the observed log-likelihood nondecreasing in each iteration as well.

Next we consider the ESREM algorithm. Here we apply the REM over the estimated subspace \(\hat{S}_{\varvec{\mu }}\). Hence from the property of REM we find that the observed log-likelihood is nondecreasing with each iteration. \(\square \)

1.2.2 Proof of Theorem 3

Proof

The insignificant components of \(\varvec{\hat{\mu }}^{un}\) are set to zero and the subspace spanned by the basis corresponding to the significant components is chosen. We find the restricted estimate \(\varvec{\hat{\mu }}\) by maximizing \(Q(\mathbf {\varvec{\mu }})\) over this subspace. Thus the restricted estimate \(\varvec{\hat{\mu }}\) can be treated as a projection of \(\varvec{\hat{\mu }}^{un}\) on \(\hat{S}_{\varvec{\mu }}\). Hence if \(\varvec{\mu }\in \hat{S}_{\varvec{\mu }}\) we have

$$\begin{aligned} \parallel \varvec{\hat{\mu }}-\varvec{\mu }\parallel _{l_{2}}\le \parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}\qquad w.p.\,1 \end{aligned}$$

This implies

$$\begin{aligned} E[\parallel \varvec{\hat{\mu }}-\varvec{\mu }\parallel _{l_{2}}]\le E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}] \end{aligned}$$

Now

$$\begin{aligned}&\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}^{2}={\displaystyle \sum _{i=1}^{n}(\hat{\mu }_{i}^{un}-\mu _{i})^{2}}\\&\quad \Rightarrow E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}^{2}]=\sum _{i=1}^{n}E(\hat{\mu }_{i}^{un}-\mu _{i})^{2}=\sum _{i=1}^{n}V(\hat{\mu }_{i}^{un})\qquad [\because E(\hat{\mu }_{i}^{un})=\mu _{i}]\\&\quad \Rightarrow E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}^{2}]=\sigma ^{2}\sum _{i=1}^{n}s_{ii} \end{aligned}$$

Then, since \(E^{2}[X]\le E[X^{2}]\) for any random variable \(X\) (Jensen's inequality),

$$\begin{aligned}&E^{2}[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}]\le E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}^{2}]=\sigma ^{2}\sum _{i=1}^{n}s_{ii}\\&\quad \Rightarrow E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}]\le \sigma \sqrt{\sum _{i=1}^{n}s_{ii}}\qquad \qquad [\because \, s_{ii}>0\;\forall i] \end{aligned}$$

This implies

$$\begin{aligned}&E[\parallel \varvec{\hat{\mu }}-\varvec{\mu }\parallel _{l_{2}}]\le E[\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}]\le \sigma \sqrt{\sum _{i=1}^{n}s_{ii}}\\&\quad \Rightarrow E[\parallel \varvec{\hat{\mu }}-\varvec{\mu }\parallel _{l_{2}}]\le \sigma \sqrt{\sum _{i=1}^{n}s_{ii}} \end{aligned}$$

\(\square \)
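The bound in Theorem 3 is easy to probe numerically. The sketch below (the names and the simulation design are ours) reuses esrem_unrestricted from the sketch above, draws repeated samples, forms \(\varvec{\hat{\mu }}^{un}=P\mathbf {y}\), and compares the Monte Carlo mean of \(\parallel \varvec{\hat{\mu }}^{un}-\varvec{\mu }\parallel _{l_{2}}\) with \(\sigma \sqrt{\sum _{i}s_{ii}}\), assuming \(\varvec{\mu }\) is sparse enough that \(P\phi \varvec{\mu }\simeq \varvec{\mu }\).

```python
def check_risk_bound(phi, mu, sigma, n_rep=2000, seed=0):
    """Monte Carlo check that E||mu_hat_un - mu|| <= sigma * sqrt(sum_i s_ii)."""
    rng = np.random.default_rng(seed)
    _, P = esrem_unrestricted(phi, np.zeros(phi.shape[0]))   # P depends only on phi
    s = np.diag(P @ (phi @ phi.T) @ P.T)
    bound = sigma * np.sqrt(s.sum())
    errors = []
    for _ in range(n_rep):
        x = mu + sigma * rng.standard_normal(mu.size)        # x ~ N_n(mu, sigma^2 I)
        errors.append(np.linalg.norm(P @ (phi @ x) - mu))    # ||mu_hat_un - mu||
    return np.mean(errors), bound
```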


About this article


Cite this article

Ghosh, A.K., Chakraborty, A. Use of EM algorithm for data reduction under sparsity assumption. Comput Stat 32, 387–407 (2017). https://doi.org/10.1007/s00180-016-0657-3

