
ECOPICA: empirical copula-based independent component analysis

  • Original Paper
  • Published in Statistics and Computing

Abstract

This study proposes a non-parametric ICA method, called ECOPICA, which describes the joint distribution of the data by empirical copulas and measures the dependence between the recovered signals by an independence test statistic. We employ the grasshopper algorithm to optimize the proposed objective function. Several acceleration tricks are further designed to enhance the computational efficiency of the proposed algorithm under a parallel computing framework. Our simulation and empirical analyses show that ECOPICA achieves better and more robust recovery performance than other well-known ICA approaches across various source distribution shapes, especially when the source distribution is skewed or near-Gaussian.



Notes

  1. In practice, the \(\varvec{S}\) on the left-hand side of (2) may be a permutation of the original \(\varvec{S}\) in (1). Since the original order of the components of \(\varvec{S}\) is unknown, and to simplify notation, we still write \(\varvec{S}\) in (2) without ambiguity.

  2. The IRMAS dataset is available at https://www.upf.edu/web/mtg/irmas.


Acknowledgements

We would like to express our sincere gratitude to the Editor and two anonymous reviewers for their insightful comments, which improved our work. We thank the National Science and Technology Council, Taiwan (Grants NSTC 112-2118-M-110-001-MY2, MOST 111-2118-M-006-002-MY2, and NSTC 112-2118-M-008-004-MY3) for partial support of this research. This article is dedicated to Professor Wen-Jang Huang in acknowledgment of his significant role in fostering mathematical and statistical talent in Taiwan until his passing on November 21, 2023.

Author information

Authors and Affiliations

Authors

Contributions

Pi, Guo, Chen, and Huang discussed the concept of the proposed method and wrote the main manuscript text. Pi conducted the numerical experiments.

Corresponding author

Correspondence to Shih-Feng Huang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Two tricks of acceleration

In this section, we introduce the details of the uniform trick and mini-batch trick for acceleration mentioned in Sect. 3.2.

  (i) Uniform Trick

    We need to calculate \(s_2\) in (7) multiple times; it measures the kernel distance between the empirical copula of the data \({\widehat{C}}_{\varvec{Y}, T}\) and the empirical independence copula \(\varPi _T\), and the required computing resources increase dramatically when the signal length T is large. Since the estimation of \(s_2\) is not sensitive to the accuracy of the estimation of \(\varPi _T\), we propose to estimate \(s_2\) as follows to reduce the computational complexity:

    $$\begin{aligned} s_2 = \frac{1}{T \cdot T_0^n} \sum _{i=1}^T \prod _{j=1}^n \sum _{k=1}^{T_0} k_{\sigma }\left( u_j^{(i)}, \frac{k}{T_0}\right) , \end{aligned}$$

    where \(T_0 = \min {\{1000, T\}}\). That is, we use \(\varPi _{T_0}\), a “thicker” (but still unbiased) estimator of the independence copula, in the calculation of \(s_2\). By doing this, the computational complexity of \(s_2\) is reduced from \(O(nT^2)\) to \(O(nT)\) when \(T > 1000\). We call it the uniform trick since this concise form is derived from a property of the independence copula. Based on Lemma 6 in Póczos et al. (2012), Lemma L2 in Roy et al. (2020) shows that the error of this approximation is bounded by \(4n/T_0^2\). In addition, Fig. 18 presents the contour plot of the error bounds of the uniform trick approximation under various T and n, which reveals that the error bounds decrease as T increases for any fixed n, and that the error bounds are less than \(10^{-4}\) when \(T>2000\) for \(n\le 50\).
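As an illustration, the uniform-trick estimator of \(s_2\) can be sketched in a few lines of NumPy. This is not the authors' implementation: the univariate Gaussian kernel and all function names below are our assumptions.

```python
import numpy as np

def s2_uniform_trick(U, sigma=0.5, T0_cap=1000):
    """Uniform-trick estimate of s_2 (a sketch, not the paper's code).

    U : (T, n) array of empirical-copula observations u_j^{(i)}.
    The T empirical grid points are replaced by a fixed uniform grid of
    size T0 = min(T0_cap, T), cutting the cost of s_2 from O(n T^2) to
    O(n T) when T > T0_cap. A univariate Gaussian kernel
    k_sigma(u, v) = exp(-(u - v)^2 / (2 sigma^2)) is assumed.
    """
    T, n = U.shape
    T0 = min(T0_cap, T)
    grid = np.arange(1, T0 + 1) / T0                            # k / T0
    diffs = U[:, :, None] - grid[None, None, :]                 # (T, n, T0)
    inner = np.exp(-diffs ** 2 / (2 * sigma ** 2)).sum(axis=2)  # sum over k
    # s_2 = (1 / (T * T0^n)) * sum_i prod_j inner[i, j]
    return inner.prod(axis=1).sum() / (T * T0 ** n)
```

For \(T \le 1000\) the grid size equals T, so only the grid choice differs from the original estimator; the O(nT) saving kicks in once \(T\) exceeds the cap.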

  (ii) Mini-Batch Trick

    In addition to the uniform trick, we further propose a mini-batch trick to reduce the computational complexity of \(I_{\sigma }\). By the findings in Roy et al. (2020), the power of the dependency test based on \(I_{\sigma }\) is acceptable if the sample size \(T\ge 250\). Hence, the proposed mini-batch trick splits a signal into multiple batches with a predetermined batch size, say 250. Accordingly, \(I_{\sigma }\) is estimated by aggregating the dependency measurements on each batch, and the computational complexity is reduced to O(nT). Specifically, let T denote the length of the signal and \(T_0\) be the selected batch size, so that the number of batches is \(B = \lceil T/T_0 \rceil \). First, split the recovered data matrix \({\textbf{Y}}\) into multiple batches \({\textbf{Y}}_1,\dots , {\textbf{Y}}_B\). Next, perform the empirical copula transformation on each batch to get \({\textbf{U}}_1,\dots , {\textbf{U}}_B\) and compute the dependency measurement, denoted by \({\widehat{I}}_{\sigma }({\textbf{Y}}_i)\), \(i=1,\ldots , B\), for each batch. Finally, \(I_{\sigma }\) is estimated by \(B^{-1}\sum _{i=1}^B{\widehat{I}}_{\sigma }({\textbf{Y}}_i)\). The idea of this procedure is to estimate the overall dependency from the dependencies of nearby regions while ignoring the dependencies of distant regions. From the viewpoint of estimating \(s_1\), this amounts to approximating the kernel matrix by a sparse matrix with zeros in all of its off-diagonal blocks.
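The batching-and-averaging step can be sketched as follows, with a generic dependence function standing in for the paper's \({\widehat{I}}_{\sigma }\); the function names and the rank-based copula transform details are our assumptions.

```python
import numpy as np

def minibatch_dependence(Y, dep_fn, batch_size=250):
    """Mini-batch estimate of the overall dependence (a sketch).

    Y      : (n, T) matrix of recovered signals.
    dep_fn : maps an (n, T_b) matrix of copula observations to a scalar
             dependence value (a stand-in for the I_sigma estimator).
    Splits the signals into B = ceil(T / batch_size) batches, applies the
    empirical copula transform within each batch, and averages the
    per-batch dependence values.
    """
    n, T = Y.shape
    B = int(np.ceil(T / batch_size))
    vals = []
    for b in range(B):
        batch = Y[:, b * batch_size:(b + 1) * batch_size]
        T_b = batch.shape[1]
        # empirical copula transform: ranks scaled to (0, 1]
        U = (np.argsort(np.argsort(batch, axis=1), axis=1) + 1) / T_b
        vals.append(dep_fn(U))
    return float(np.mean(vals))
```

For two signals, a crude `dep_fn` such as `lambda U: abs(np.corrcoef(U)[0, 1])` already illustrates the aggregation; the paper's kernel statistic would take its place.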

Fig. 18: Contour plot of the error bound of the uniform trick approximation

Appendix B: Parallel computation framework

Computing \({\widehat{I}}_{\sigma }(\varvec{\beta })\) in Eq. (7) in parallel amounts to computing \(s_1\) and \(s_2\) in Eq. (7) in parallel. In the calculation of \(s_2\), we assign tasks to different threads according to \(i = 1, 2, \dots ,T\); in the calculation of \(s_1\), we assign tasks to different threads according to \(k = 1, 2, \dots , \left( {\begin{array}{c}T\\ 2\end{array}}\right) \), where the relationship between k and (i, j), \(1 \le i < j \le T\), is defined as

$$\begin{aligned} k = i + \frac{(j-1)(j-2)}{2}. \end{aligned}$$
(B1)

The key to this idea is that once k is determined, closed-form expressions for i and j can be found, making it possible to calculate the \(k_\sigma ({\textbf{u}}^{(i)}, {\textbf{u}}^{(j)})\) used in Eq. (7). The following proposition formalizes this idea, and a rigorous proof is given below.

Proposition 1

If \(k = i + (j-1)(j-2)/2\), \(1 \le i < j \le T\), \(1 \le k \le \left( {\begin{array}{c}T\\ 2\end{array}}\right) \), and \(i, j, k \in {\mathbb {N}}\), then \(j = \lceil 1/2 + \sqrt{2k + 1/4} \rceil \) and \(i = k - (j-1)(j-2)/2\).

Proof

Before the proof, the matrix shown below helps build intuition for (B1), where i and j indicate the row and column numbers of the matrix, and k indicates the value in its (i, j)-th entry.

Fig. 19: Boxplots of time cost for two-stage parallel computing

This proof can be divided into two parts. The first part proves that j can be uniquely determined when k is fixed, and the second part proves that i can be uniquely determined when k and j are fixed; the second part follows from (B1) directly. Now assume k is fixed. The minimum value of i is 1 and the maximum value is \(j-1\). When \(i = 1\), we have \(2(k-i) = (j-1)(j-2)\), which implies \(j = 3/2 + \sqrt{2k-7/4}\); when \(i = j-1\), we have \(2k = j(j-1)\), which implies \(j = 1/2 + \sqrt{2k+1/4}.\) Since \(j \in {\mathbb {N}}\), it remains to show \(\lceil j_l(k) \rceil = \lfloor j_u(k) \rfloor \), where \(j_l(k) = 1/2 + \sqrt{2k+1/4}\) and \(j_u(k) = 3/2 + \sqrt{2k-7/4}\).

Suppose \(N< j_l(k) < N + 1\) for some \(k, N \in {\mathbb {N}}\). Then there exists \(k^*=N(N-1)/2 < k\) such that \(j_u(k^*+1) = N+1 \le j_u(k)\). Hence, there does not exist \(N \in {\mathbb {N}}\) such that both \(j_l(k)\) and \(j_u(k)\) lie in \((N, N+1)\), which implies \(\lceil j_l(k) \rceil \le \lfloor j_u(k) \rfloor \). On the other hand, if \(\lceil j_l(k) \rceil < \lfloor j_u(k) \rfloor \), then \(\lfloor j_u(k) \rfloor - \lceil j_l(k) \rceil \ge 1\); but \(\lfloor j_u(k) \rfloor - \lceil j_l(k) \rceil \le j_u(k) - j_l(k) = 1 + \sqrt{2k-7/4} - \sqrt{2k+1/4} < 1\), a contradiction, since \(\lceil j_l(k) \rceil \) and \(\lfloor j_u(k) \rfloor \) are both positive integers. Therefore, \(\lceil j_l(k) \rceil = \lfloor j_u(k) \rfloor \) and the proof is completed. \(\square \)
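Proposition 1 is what lets each thread map its task index k back to the pair (i, j) without any shared state. A small sketch (the function names are ours):

```python
import math

def pair_to_index(i, j):
    """Eq. (B1): k = i + (j-1)(j-2)/2 for 1 <= i < j."""
    return i + (j - 1) * (j - 2) // 2

def index_to_pair(k):
    """Invert Eq. (B1) via Proposition 1:
    j = ceil(1/2 + sqrt(2k + 1/4)), i = k - (j-1)(j-2)/2."""
    j = math.ceil(0.5 + math.sqrt(2 * k + 0.25))
    i = k - (j - 1) * (j - 2) // 2
    return i, j
```

Each thread handling index k calls `index_to_pair(k)` to know which kernel entry \(k_\sigma ({\textbf{u}}^{(i)}, {\textbf{u}}^{(j)})\) to compute.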

Appendix C: Accelerated method experiments

To investigate the effects of batch sizes on the parallel computation framework, we used 10 cores for parallel computing and 30 particles in the ECOPICA algorithm. Three mixed signals of length 5000 were generated randomly. For each possible combination of \((n_1, n_2)\), we randomly generated 30 particles 50 times and recorded the time cost of the dependency measurement in each run. The boxplots of the time costs for batch sizes 250, 1000, and 5000 are shown in Fig. 19, which reveals that calculating \(I_{\sigma }\) becomes more time-demanding with large batch sizes, in which case it is appropriate to use more threads for acceleration. On the contrary, it is more important to process different particles simultaneously when the batch size is small. From this experiment, two conclusions can be drawn:

  1. Using a smaller batch size can greatly reduce the time spent.

  2. How the cores are allocated only matters when using a smaller batch size.

In addition, Table 3 shows the median time cost for different batch sizes and combinations of \((n_1, n_2)\). When the batch size is 250, we can save up to about 30% of the time spent; when the batch size is 1000 or 5000, the time saved is less than 8%. In our experiments, we set the batch size to 250, \(n_1 = 5\), and \(n_2 = 2\), resulting in a speed-up of over 1000 times.

Table 3 Medians of the computational time (in seconds) for different combinations of batch sizes and clustering numbers

Appendix D: MICA algorithm

Following the same notation as in Sect. 2.3, and based on Theorem 3.1 in Rahmanishamsi and Alikhani-Vafa (2020), the dependency of a two-dimensional random vector \((Y_1, Y_2)\) can be measured by

$$\begin{aligned} \delta _T = h(T) \Big [ 16 - \frac{30}{T^2} \sum _{i_1=1}^T\sum _{i_2=1}^T \max \left( u_1^{(i_1)}, u_1^{(i_2)}, u_2^{(i_1)}, u_2^{(i_2)}\right) + \frac{20}{T} \sum _{i=1}^T \max \left( u_1^{(i)}, u_2^{(i)}\right) \Big ], \end{aligned}$$

where \(h(T)=T^2(T+1)/(T^3+T^2+5T)\). To minimize \(\delta _T\), we can simply omit the constant multipliers to make the computation more efficient. For the multidimensional case, we use the sum of pairwise dependencies to measure the magnitude of the multidimensional dependency. To fairly compare the performances of the two non-parametric dependency measurements, \(I_\sigma \) and \(\delta _T\), we adopt the same island-based GOA algorithm described in Algorithm 1 (substituting \(\delta _T\) for \(I_\sigma \)) to solve the ICA problem, as well as the same acceleration methods mentioned in Sect. 3.2, when conducting MICA.
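The pairwise measure \(\delta _T\) above can be evaluated directly. The following NumPy sketch is our code, not the MICA implementation; it obtains the copula observations as scaled ranks (ties assumed absent):

```python
import numpy as np

def delta_T(y1, y2):
    """Pairwise dependence measure delta_T from Appendix D (a sketch)."""
    T = len(y1)
    # empirical copula margins u_1^{(i)}, u_2^{(i)} as scaled ranks
    u1 = (np.argsort(np.argsort(y1)) + 1) / T
    u2 = (np.argsort(np.argsort(y2)) + 1) / T
    h = T**2 * (T + 1) / (T**3 + T**2 + 5 * T)
    # double sum of max(u1^{i1}, u1^{i2}, u2^{i1}, u2^{i2})
    m4 = np.maximum(np.maximum(u1[:, None], u1[None, :]),
                    np.maximum(u2[:, None], u2[None, :])).sum()
    # single sum of max(u1^{i}, u2^{i})
    m2 = np.maximum(u1, u2).sum()
    return h * (16 - 30 / T**2 * m4 + 20 / T * m2)
```

Perfectly dependent signals yield a larger value than independent ones, which is what the GOA minimization exploits.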

Appendix E: Other simulation results

E.1: Gamma distribution

Fig. 20: Different distribution shapes of \(\text {Gamma}(x; a, 1)\)

Table 4 Excess kurtosis and skewness of \(\text {Gamma}(a, 1)\)

In this scenario, we consider generating independent source signals from a gamma distribution \(\text {Gamma}(x; a, 1)\) with density

$$\begin{aligned} f(x; a, 1) = \dfrac{1}{\Gamma (a)} x^{a-1} e^{-x},\ x> 0,\ a>0, \end{aligned}$$
(E2)

where \(\Gamma (\cdot )\) is the gamma function. In particular, we consider the cases of \(a\in \{1, 2, 4, 8, 16, 32\}\). The corresponding densities of the 6 cases are shown in Fig. 20 and the associated excess kurtosis and skewness are reported in Table 4.
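The entries of Table 4 follow from the closed-form shape moments of the gamma distribution, skewness \(2/\sqrt{a}\) and excess kurtosis \(6/a\), both of which shrink toward the Gaussian values of 0 as a grows. A quick check (our helper function, not from the paper):

```python
from math import sqrt

def gamma_shape_summary(a):
    """Skewness 2/sqrt(a) and excess kurtosis 6/a of Gamma(a, 1)."""
    return 2 / sqrt(a), 6 / a

# the six shape parameters considered in this scenario
for a in (1, 2, 4, 8, 16, 32):
    skew, ex_kurt = gamma_shape_summary(a)
    print(f"a = {a:2d}: skewness = {skew:.3f}, excess kurtosis = {ex_kurt:.3f}")
```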

Fig. 21: Boxplots of the mjSNRs for various ICA methods in the \(\text {Gamma}(a, 1)\) scenario

By performing the same procedure as in the scenario of beta distributions, Fig. 21 presents the boxplots of the mjSNRs of various ICA methods. From Fig. 21, one can see that all models worked well when the excess kurtosis \(\gamma _2 \ge 1\), that is, \(a \in \{ 1,2,4 \}\), with ECOPICA performing better than the other ICA methods. As a increases further, ECOPICA and COPICA outperform the other ICA methods. This indicates that ECOPICA achieves the best performance in most of our cases as the source distribution gradually approaches the Gaussian distribution.

Fig. 22: Boxplots of the mjSNRs for the \(\text {Rayleigh}(1)\) scenario

E.2: Rayleigh distribution

In this example, we evaluate the capability of each model when the source signals follow the Rayleigh distribution. Compared with \(\text {Gamma}(16, 1)\), the Rayleigh distribution has a smaller excess kurtosis but a larger skewness. The density function of the \(\text {Rayleigh}(x;\sigma )\) distribution is

$$\begin{aligned} f(x; \sigma ) = \dfrac{x}{\sigma ^2} e^{-\frac{x^2}{2\sigma ^2}},\ x \ge 0. \end{aligned}$$
(E3)

Since both the excess kurtosis and the skewness of the Rayleigh distribution are constants free of \(\sigma \), they can be computed as \(-(6 \pi ^2 -24\pi + 16)/(4-\pi )^2 \approx 0.245\) and \(2\sqrt{\pi }(\pi -3)/(4-\pi )^{3/2} \approx 0.631\), respectively. We only consider the case \(\sigma = 1\) in this example. The simulation results are presented in Fig. 22, which reveals that only ECOPICA, COPICA, and JADE worked well, and that ECOPICA dominated the other models in this example.
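The two constants quoted above can be reproduced directly (our helper function, for verification only):

```python
from math import pi, sqrt

def rayleigh_shape_summary():
    """Excess kurtosis and skewness of Rayleigh(sigma); both are free of
    sigma, matching the constants quoted in the text."""
    ex_kurt = -(6 * pi**2 - 24 * pi + 16) / (4 - pi) ** 2
    skew = 2 * sqrt(pi) * (pi - 3) / (4 - pi) ** 1.5
    return ex_kurt, skew
```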

E.3: Laplace distribution

Fig. 23: Boxplots of mjSNR for the \(\text {Laplace}(0, 1)\) example

Next, we generated source signals from a \(\text {Laplace}(x;\mu ,b)\) distribution with the following density function

$$\begin{aligned} f(x; \mu , b) = \dfrac{1}{2b} \exp \left\{ -\displaystyle \frac{|x-\mu |}{b}\right\} ,\ x \in {\mathbb {R}}. \end{aligned}$$
(E4)

Since the Laplace distribution is symmetric and its excess kurtosis is 3, we expect all the models to recover the source signals successfully. The numerical results are presented in Fig. 23 and are consistent with our expectations.

E.4: Mixed scenario

Different from the previous scenarios, which generate the source signals independently from a single distribution, we further consider generating 3 source signals from \(\text {Rayleigh}(1)\), \(\text {Beta}(1, 3)\), and \(\text {Gamma}(8, 1)\), respectively. The approximate values of the excess kurtosis and skewness of the three distributions are reported in Table 5. The boxplots of the mjSNR values for each ICA model in this scenario are shown in Fig. 24, which reveals the superiority of ECOPICA over the other ICA methods.

Table 5 Excess kurtosis and skewness of \(\text {Rayleigh}(1)\), \(\text {Beta}(1, 3)\), and \(\text {Gamma}(8, 1)\) distributions

Fig. 24: Boxplots of mjSNR for the mixed distribution example

Appendix F: Setup of \((K, N, m)\) in Algorithm 1

Fig. 25: Boxplots of mjSNR for different \((K, N, m)\) settings

Intuitively, utilizing more resources for the optimization search is more likely to yield better performance. In our experience, increasing the total number of particles (\(K \times N\)) in the GOA algorithm mostly leads to improved performance. In practice, however, we need to respect constraints on time and resources. Fixing the total number of particles \(K \times N\) at 30, we divided the particles into 2, 3, 5, and 6 groups (\(K = 2, 3, 5, 6\)) with different communication frequencies (\(m = 3, 5, 7\)). To see the effect of the different \((K, N, m)\) combinations, we revisited the experiments of our first beta distribution scenario in Sect. 4.2; for each parameter combination, we independently repeated the experiment 50 times under the same distribution assumption, resulting in a total of \(4 \times 3 \times 15 \times 50 = 9000\) experiments. The mjSNR is again used as the performance measurement.

Figure 25 shows the boxplots of mjSNR for different \((K, N, m)\) settings. Based on this figure, we observe that the ECOPICA performance is not sensitive to the parameter setup. Nevertheless, based on our experience, we recommend a minimum of 10 particles per group, i.e., \(N = 10\), and \(K = 3\) groups to ensure the effectiveness of the optimization. Regarding the communication frequency, we advise communicating at least every 3 iterations (\(m = 3\)) for computational efficiency.

Appendix G: Gradient-based algorithm

Although GOA works well in our experiments, choosing the right number of particles may become problematic when n, the number of signals, is large, because the dimension of the solution space grows quadratically with n. Here, we propose a gradient-based algorithm, so that no parameterization of the orthogonal transformation \({\textbf{R}}\) is needed.

First, we compute the derivative \(\displaystyle \frac{\partial {\widehat{I}}_{\sigma }({\textbf{Y}})}{\partial {\textbf{R}}^{(j)}}\), where \({\textbf{R}}^{(j)}\) is the j-th row of \({\textbf{R}}\). Consider \({\widehat{I}}_{\sigma }({\textbf{Y}}) = \sqrt{\frac{a}{b}}\), where \(a = s_1 - 2s_2 + v_3\) and \(b=v_1 - 2v_2 + v_3\) as defined in Eq. (7).

$$\begin{aligned} \frac{\partial {\widehat{I}}_{\sigma }({\textbf{Y}})}{\partial {\textbf{R}}^{(j)}} = \frac{1}{2 \cdot b \cdot {\widehat{I}}_{\sigma }({\textbf{Y}})} \left( \frac{\partial s_1}{\partial {\textbf{R}}^{(j)}} - 2 \frac{\partial s_2}{\partial {\textbf{R}}^{(j)}} \right) \end{aligned}$$
(G5)

Recall that

$$\begin{aligned} s_1 = \frac{2}{T^2} \underset{1\le p < q \le T}{\sum } k_{\sigma }({\textbf{u}}^{(p)}, {\textbf{u}}^{(q)}) + \frac{1}{T} \quad \text {and} \quad s_2 = \frac{1}{T^{n+1}} \sum _{p=1}^T \prod _{j'=1}^n \sum _{t=1}^T k_{\sigma }\left( u_{j'}^{(p)}, \frac{t}{T}\right) . \end{aligned}$$

For simplicity, we use the following notations:

$$\begin{aligned} K^{(1)}_{p,q} := k_{\sigma }({\textbf{u}}^{(p)}, {\textbf{u}}^{(q)}) \text { and } K^{(2)}_{p,t,j'} := k_{\sigma }\left( u_{j'}^{(p)}, \frac{t}{T}\right) . \end{aligned}$$

Thus,

$$\begin{aligned} \frac{\partial s_1}{\partial {\textbf{R}}^{(j)}}&= \frac{2}{T^2} \underset{1\le p < q \le T}{\sum } \frac{\partial K^{(1)}_{p,q}}{\partial {\textbf{R}}^{(j)}} \\ \frac{\partial s_2}{\partial {\textbf{R}}^{(j)}}&= \frac{1}{T^{n+1}} \sum _{p=1}^T \left( \sum _{t=1}^T \frac{\partial K^{(2)}_{p,t,j}}{\partial {\textbf{R}}^{(j)}} \right) \prod _{\begin{array}{c} j'=1 \\ j' \ne j \end{array}}^n \left( \sum _{t=1}^T K^{(2)}_{p,t,j'} \right) , \end{aligned}$$

where

$$\begin{aligned} \frac{\partial K^{(1)}_{p,q}}{\partial {\textbf{R}}^{(j)}}&= -\frac{1}{\sigma ^2} \cdot K^{(1)}_{p,q} \cdot ({\textbf{u}}^{(p)}_j - {\textbf{u}}^{(q)}_j) \cdot \left[ f_j({\textbf{y}}^{(p)}_j) \cdot {\textbf{z}}^{(p)} - f_j({\textbf{y}}^{(q)}_j) \cdot {\textbf{z}}^{(q)} \right] , \\ \frac{\partial K^{(2)}_{p,t,j}}{\partial {\textbf{R}}^{(j)}}&= -\frac{1}{\sigma ^2} \cdot K^{(2)}_{p,t,j} \cdot \left( {\textbf{u}}^{(p)}_j - \frac{t}{T}\right) \cdot f_j({\textbf{y}}^{(p)}_j) \cdot {\textbf{z}}^{(p)}. \end{aligned}$$

To approximate the density function of \(y_j\), one can use a histogram of \({\textbf{y}}_j\). Alternatively, one can fit spline functions to the points used in the histogram, which may be helpful if the second-order condition is considered.

For clarity, we rewrite the optimization problem in Eq. (9) as

$$\begin{aligned} \min _{{\textbf{R}} \in {\mathbb {R}}^{n \times n}} {\widehat{I}}_{\sigma }({\textbf{R}}) \text { s.t. } {\textbf{R}}^T{\textbf{R}}={\textbf{I}}_n. \end{aligned}$$
(G6)

To solve this problem, one can use the Cayley transformation technique of Wen and Yin (2013). Given the current orthogonal transformation \({\textbf{R}}\) and the gradient \({\textbf{G}}\) with entries \(\displaystyle \frac{\partial {\widehat{I}}_{\sigma }({\textbf{R}})}{\partial {\textbf{R}}_{i,j}}\), one first computes the skew-symmetric matrix

$$\begin{aligned} {\textbf{B}}={\textbf{G}}{\textbf{R}}^T - {\textbf{R}}{\textbf{G}}^T. \end{aligned}$$
(G7)

Then, \({\textbf{R}}\) can be updated by the following rule:

$$\begin{aligned} {\textbf{R}} \leftarrow \left( {\textbf{I}}_n - \frac{\tau }{2}{\textbf{B}}\right) ^{-1} \left( {\textbf{I}}_n + \frac{\tau }{2}{\textbf{B}}\right) {\textbf{R}}, \end{aligned}$$
(G8)

where \(\tau \in {\mathbb {R}}\) is the step size. To find the appropriate step size, a line search can be adopted to find \(\tau \) satisfying the Armijo-Wolfe conditions (Nocedal and Wright 2009). To reduce the complexity of calculation and update faster, a stochastic gradient descent-like method can be adopted.
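A minimal sketch of one update step is given below (our code; the objective gradient \({\textbf{G}}\) is taken as given, and a fixed step size stands in for the Armijo-Wolfe line search):

```python
import numpy as np

def cayley_update(R, G, tau):
    """One step of Eqs. (G7)-(G8): form the skew-symmetric
    B = G R^T - R G^T and update
    R <- (I - tau/2 B)^{-1} (I + tau/2 B) R,
    which keeps R exactly orthogonal for any step size tau."""
    n = R.shape[0]
    B = G @ R.T - R @ G.T
    I = np.eye(n)
    return np.linalg.solve(I - (tau / 2) * B, (I + (tau / 2) * B) @ R)
```

Because the Cayley transform of a skew-symmetric matrix is orthogonal, the iterates never leave the constraint set, so no re-orthogonalization step is needed.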

Although we propose this theoretical method to solve the ICA problem, the algorithm may be difficult to implement practically because solving the gradient of \({\widehat{I}}_{\sigma }({\textbf{R}})\) is much more complicated than computing \({\widehat{I}}_{\sigma }({\textbf{R}})\). In addition, the second-order condition should be considered to evaluate whether the final point attains the local minimum. Therefore, determining a differentiable approximation of \({\widehat{I}}_{\sigma }({\textbf{R}})\) is important for future research.


Cite this article

Pi, HK., Guo, MH., Chen, RB. et al. ECOPICA: empirical copula-based independent component analysis. Stat Comput 34, 52 (2024). https://doi.org/10.1007/s11222-023-10368-3
