
ECOPICA: empirical copula-based independent component analysis

  • Original Paper
  • Published in Statistics and Computing

Abstract

This study proposes a non-parametric ICA method, called ECOPICA, which describes the joint distribution of the data by empirical copulas and measures the dependence between the recovered signals by an independence test statistic. We employ the grasshopper algorithm to optimize the proposed objective function. Several acceleration tricks are further designed to enhance the computational efficiency of the proposed algorithm under a parallel computing framework. Our simulation and empirical analyses show that ECOPICA achieves better and more robust recovery performance than other well-known ICA approaches across various source distribution shapes, especially when the source distribution is skewed or near-Gaussian.



Notes

  1. In practice, the \(\varvec{S}\) on the left-hand side of (2) may be a permutation of the original \(\varvec{S}\) in (1). Since the original order of the components of \(\varvec{S}\) is unknown, and to simplify notation, we still write \(\varvec{S}\) in (2) without ambiguity.

  2. The IRMAS dataset is available at https://www.upf.edu/web/mtg/irmas.


Acknowledgements

We would like to express our sincere gratitude to the Editor and two anonymous reviewers for their insightful comments, which improved our work. We thank the National Science and Technology Council, Taiwan (Grants NSTC 112-2118-M-110-001-MY2, MOST 111-2118-M-006-002-MY2, and NSTC 112-2118-M-008-004-MY3) for partial support of this research. This article is dedicated to Professor Wen-Jang Huang in acknowledgment of his significant role in fostering mathematical and statistical talent in Taiwan until his passing on November 21, 2023.

Author information

Authors and Affiliations

Authors

Contributions

Pi, Guo, Chen, and Huang discussed the concept of the proposed method and wrote the main manuscript text. Pi conducted the numerical experiments.

Corresponding author

Correspondence to Shih-Feng Huang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Two tricks of acceleration

In this section, we introduce the details of the uniform trick and mini-batch trick for acceleration mentioned in Sect. 3.2.

  (i) Uniform Trick

    We need to calculate \(s_2\) in (7) multiple times; it measures the kernel distance between the empirical copula of the data \({\widehat{C}}_{\varvec{Y}, T}\) and the empirical independence copula \(\varPi _T\), and the required computing resources increase dramatically when the signal length T is large. Since the estimation of \(s_2\) is not sensitive to the accuracy of the estimation of \(\varPi _T\), we propose to estimate \(s_2\) as follows to reduce the computational complexity:

    $$\begin{aligned} s_2 = \frac{1}{T \cdot T_0^n} \sum _{i=1}^T \prod _{j=1}^n \sum _{k=1}^{T_0} k_{\sigma }\left( u_j^{(i)}, \frac{k}{T_0}\right) , \end{aligned}$$

    where \(T_0 = \min {\{1000, T\}}\). That is, we use \(\varPi _{T_0}\), a “thicker” (but still unbiased) estimator of the independence copula, in the calculation of \(s_2\). By doing this, the computational complexity of \(s_2\) is reduced from \(O(nT^2)\) to \(O(nT)\) when \(T > 1000\). We call it the uniform trick since this concise form is derived from a property of the independence copula. Based on Lemma 6 in Póczos et al. (2012), Lemma L2 in Roy et al. (2020) shows that the error of this approximation is bounded by \(4n/T_0^2\). In addition, Fig. 18 presents the contour plot of the error bounds of the uniform trick approximation under various T and n, which reveals that the error bounds decrease as T increases for any fixed n, and that the error bounds are less than \(10^{-4}\) when \(T>2000\) for \(n\le 50\).
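As an illustration, the uniform-trick estimator of \(s_2\) can be sketched in a few lines of NumPy. This is not the authors' implementation: the univariate Gaussian kernel and all function names below are our assumptions.

```python
import numpy as np

def s2_uniform_trick(U, sigma=0.5, T0_cap=1000):
    """Uniform-trick estimate of s_2 (a sketch, not the paper's code).

    U : (T, n) array of empirical-copula observations u_j^{(i)}.
    The T empirical grid points are replaced by a fixed uniform grid of
    size T0 = min(T0_cap, T), cutting the cost of s_2 from O(n T^2) to
    O(n T) when T > T0_cap. A univariate Gaussian kernel
    k_sigma(u, v) = exp(-(u - v)^2 / (2 sigma^2)) is assumed.
    """
    T, n = U.shape
    T0 = min(T0_cap, T)
    grid = np.arange(1, T0 + 1) / T0                            # k / T0
    diffs = U[:, :, None] - grid[None, None, :]                 # (T, n, T0)
    inner = np.exp(-diffs ** 2 / (2 * sigma ** 2)).sum(axis=2)  # sum over k
    # s_2 = (1 / (T * T0^n)) * sum_i prod_j inner[i, j]
    return inner.prod(axis=1).sum() / (T * T0 ** n)
```

For \(T \le 1000\) the grid size equals T, so only the grid choice differs from the original estimator; the O(nT) saving kicks in once \(T\) exceeds the cap.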

  (ii) Mini-Batch Trick

    In addition to the uniform trick, we further propose a mini-batch trick to reduce the computational complexity of \(I_{\sigma }\). By the findings in Roy et al. (2020), the power of the dependency test based on \(I_{\sigma }\) is acceptable if the sample size \(T\ge 250\). Hence, the proposed mini-batch trick splits a signal into multiple batches with a predetermined batch size, say 250. Accordingly, \(I_{\sigma }\) is estimated by aggregating the dependency measurements on each batch, and the computational complexity is reduced to O(nT). Specifically, let T denote the length of the signal and \(T_0\) be the selected batch size, so that the number of batches is \(B = \lceil T/T_0 \rceil \). First, split the recovered data matrix \({\textbf{Y}}\) into multiple batches \({\textbf{Y}}_1,\dots , {\textbf{Y}}_B\). Next, perform the empirical copula transformation on each batch to get \({\textbf{U}}_1,\dots , {\textbf{U}}_B\) and compute the dependency measurement, denoted by \({\widehat{I}}_{\sigma }({\textbf{Y}}_i)\), \(i=1,\ldots , B\), for each batch. Finally, \(I_{\sigma }\) is estimated by \(B^{-1}\sum _{i=1}^B{\widehat{I}}_{\sigma }({\textbf{Y}}_i)\). The idea of this procedure is to estimate the overall dependency from the dependencies of nearby regions while ignoring the dependencies of distant regions. From the viewpoint of estimating \(s_1\), this amounts to approximating the kernel matrix by a sparse matrix with zeros in all of its off-diagonal blocks.
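The batching-and-averaging step can be sketched as follows, with a generic dependence function standing in for the paper's \({\widehat{I}}_{\sigma }\); the function names and the rank-based copula transform details are our assumptions.

```python
import numpy as np

def minibatch_dependence(Y, dep_fn, batch_size=250):
    """Mini-batch estimate of the overall dependence (a sketch).

    Y      : (n, T) matrix of recovered signals.
    dep_fn : maps an (n, T_b) matrix of copula observations to a scalar
             dependence value (a stand-in for the I_sigma estimator).
    Splits the signals into B = ceil(T / batch_size) batches, applies the
    empirical copula transform within each batch, and averages the
    per-batch dependence values.
    """
    n, T = Y.shape
    B = int(np.ceil(T / batch_size))
    vals = []
    for b in range(B):
        batch = Y[:, b * batch_size:(b + 1) * batch_size]
        T_b = batch.shape[1]
        # empirical copula transform: ranks scaled to (0, 1]
        U = (np.argsort(np.argsort(batch, axis=1), axis=1) + 1) / T_b
        vals.append(dep_fn(U))
    return float(np.mean(vals))
```

For two signals, a crude `dep_fn` such as `lambda U: abs(np.corrcoef(U)[0, 1])` already illustrates the aggregation; the paper's kernel statistic would take its place.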

Fig. 18: Contour plot of the error bound of the uniform trick approximation

Appendix B: Parallel computation framework

Computing \({\widehat{I}}_{\sigma }(\varvec{\beta })\) in Eq. (7) in parallel amounts to computing \(s_1\) and \(s_2\) in Eq. (7) in parallel. In the calculation of \(s_2\), we assign tasks to different threads according to \(i = 1, 2, \dots ,T\); in the calculation of \(s_1\), we assign tasks to different threads according to \(k = 1, 2, \dots , \left( {\begin{array}{c}T\\ 2\end{array}}\right) \), where the relationship between k and (i, j), \(1 \le i < j \le T\), is defined as

$$\begin{aligned} k = i + \frac{(j-1)(j-2)}{2}. \end{aligned}$$
(B1)

The key to this idea is that once k is determined, closed-form expressions for i and j can be found, making it possible to calculate the \(k_\sigma ({\textbf{u}}^{(i)}, {\textbf{u}}^{(j)})\) used in Eq. (7). The following proposition formalizes this idea, and a rigorous proof is given below.

Proposition 1

If \(k = i + (j-1)(j-2)/2\), \(1 \le i < j \le T\), \(1 \le k \le \left( {\begin{array}{c}T\\ 2\end{array}}\right) \), and \(i, j, k \in {\mathbb {N}}\), then \(j = \lceil 1/2 + \sqrt{2k + 1/4} \rceil \) and \(i = k - (j-1)(j-2)/2\).

Proof

Before the proof, the matrix shown below helps build intuition for (B1), where i and j indicate the row and column numbers of the matrix, and k indicates the value in its (i, j)-th entry.

Fig. 19: Boxplots of time cost for two-stage parallel computing

This proof can be divided into two parts. The first part proves that j can be uniquely determined when k is fixed, and the second part proves that i can be uniquely determined when k and j are fixed; the second part follows from (B1) directly. Now assume k is fixed. The minimum value of i is 1 and the maximum value is \(j-1\). When \(i = 1\), we have \(2(k-i) = (j-1)(j-2)\), which implies \(j = 3/2 + \sqrt{2k-7/4}\); when \(i = j-1\), we have \(2k = j(j-1)\), which implies \(j = 1/2 + \sqrt{2k+1/4}.\) Since \(j \in {\mathbb {N}}\), it remains to show \(\lceil j_l(k) \rceil = \lfloor j_u(k) \rfloor \), where \(j_l(k) = 1/2 + \sqrt{2k+1/4}\) and \(j_u(k) = 3/2 + \sqrt{2k-7/4}\).

Suppose \(N< j_l(k) < N + 1\) for some \(k, N \in {\mathbb {N}}\). Then there exists \(k^*=N(N-1)/2 < k\) such that \(j_u(k^*+1) = N+1 \le j_u(k)\). Hence, there does not exist \(N \in {\mathbb {N}}\) such that both \(j_l(k)\) and \(j_u(k)\) lie in \((N, N+1)\), which implies \(\lceil j_l(k) \rceil \le \lfloor j_u(k) \rfloor \). On the other hand, if \(\lceil j_l(k) \rceil < \lfloor j_u(k) \rfloor \), then \(\lfloor j_u(k) \rfloor - \lceil j_l(k) \rceil \ge 1\); but \(\lfloor j_u(k) \rfloor - \lceil j_l(k) \rceil \le j_u(k) - j_l(k) = 1 + \sqrt{2k-7/4} - \sqrt{2k+1/4} < 1\), a contradiction, since \(\lceil j_l(k) \rceil \) and \(\lfloor j_u(k) \rfloor \) are both positive integers. Therefore, \(\lceil j_l(k) \rceil = \lfloor j_u(k) \rfloor \) and the proof is completed. \(\square \)
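Proposition 1 is what lets each thread map its task index k back to the pair (i, j) without any shared state. A small sketch (the function names are ours):

```python
import math

def pair_to_index(i, j):
    """Eq. (B1): k = i + (j-1)(j-2)/2 for 1 <= i < j."""
    return i + (j - 1) * (j - 2) // 2

def index_to_pair(k):
    """Invert Eq. (B1) via Proposition 1:
    j = ceil(1/2 + sqrt(2k + 1/4)), i = k - (j-1)(j-2)/2."""
    j = math.ceil(0.5 + math.sqrt(2 * k + 0.25))
    i = k - (j - 1) * (j - 2) // 2
    return i, j
```

Each thread handling index k calls `index_to_pair(k)` to know which kernel entry \(k_\sigma ({\textbf{u}}^{(i)}, {\textbf{u}}^{(j)})\) to compute.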

Appendix C: Accelerated method experiments

To investigate the effects of batch sizes on the parallel computation framework, we used 10 cores for parallel computing and 30 particles in the ECOPICA algorithm. Three mixed signals of length 5000 were generated randomly. For each possible combination of \((n_1, n_2)\), we randomly generated 30 particles 50 times and recorded the time cost of the dependency measurement in each run. The boxplots of the time costs for batch sizes 250, 1000, and 5000 are shown in Fig. 19, which reveals that calculating \(I_{\sigma }\) becomes more time-demanding with large batch sizes, in which case it is appropriate to use more threads for acceleration. On the contrary, it is more important to process different particles simultaneously when the batch size is small. From this experiment, two conclusions can be drawn:

  1. Using a smaller batch size can greatly reduce the time spent.

  2. How the cores are allocated only matters when using a smaller batch size.

In addition, Table 3 shows the median time cost for different batch sizes and combinations of \((n_1, n_2)\). When the batch size is 250, we can save up to about 30% of the time spent; when the batch size is 1000 or 5000, the time saved is less than 8%. In our experiments, we set the batch size to 250, \(n_1 = 5\), and \(n_2 = 2\), resulting in a speed-up of over 1000 times.

Table 3 Medians of the computational time (in seconds) for different combinations of batch sizes and clustering numbers

Appendix D: MICA algorithm

Following the same notation as in Sect. 2.3, and based on Theorem 3.1 in Rahmanishamsi and Alikhani-Vafa (2020), the dependency of a two-dimensional random vector \((Y_1, Y_2)\) can be measured by

$$\begin{aligned} \delta _T = h(T) \Big [ 16 - \frac{30}{T^2} \sum _{i_1=1}^T\sum _{i_2=1}^T \max \left( u_1^{(i_1)}, u_1^{(i_2)}, u_2^{(i_1)}, u_2^{(i_2)}\right) + \frac{20}{T} \sum _{i=1}^T \max \left( u_1^{(i)}, u_2^{(i)}\right) \Big ], \end{aligned}$$

where \(h(T)=T^2(T+1)/(T^3+T^2+5T)\). To minimize \(\delta _T\), we can simply omit the constant multipliers to make the computation more efficient. For the multidimensional case, we use the sum of pairwise dependencies to measure the magnitude of the multidimensional dependency. To fairly compare the performances of the two non-parametric dependency measurements, \(I_\sigma \) and \(\delta _T\), we adopt the same island-based GOA algorithm described in Algorithm 1 (substituting \(\delta _T\) for \(I_\sigma \)) to solve the ICA problem, as well as the same acceleration methods mentioned in Sect. 3.2, when conducting MICA.
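The pairwise measure \(\delta _T\) above can be evaluated directly. The following NumPy sketch is our code, not the MICA implementation; it obtains the copula observations as scaled ranks (ties assumed absent):

```python
import numpy as np

def delta_T(y1, y2):
    """Pairwise dependence measure delta_T from Appendix D (a sketch)."""
    T = len(y1)
    # empirical copula margins u_1^{(i)}, u_2^{(i)} as scaled ranks
    u1 = (np.argsort(np.argsort(y1)) + 1) / T
    u2 = (np.argsort(np.argsort(y2)) + 1) / T
    h = T**2 * (T + 1) / (T**3 + T**2 + 5 * T)
    # double sum of max(u1^{i1}, u1^{i2}, u2^{i1}, u2^{i2})
    m4 = np.maximum(np.maximum(u1[:, None], u1[None, :]),
                    np.maximum(u2[:, None], u2[None, :])).sum()
    # single sum of max(u1^{i}, u2^{i})
    m2 = np.maximum(u1, u2).sum()
    return h * (16 - 30 / T**2 * m4 + 20 / T * m2)
```

Perfectly dependent signals yield a larger value than independent ones, which is what the GOA minimization exploits.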

Appendix E: Other simulation results

E.1: Gamma distribution

Fig. 20: Different distribution shapes of \(\text {Gamma}(x; a, 1)\)

Table 4 Excess kurtosis and skewness of \(\text {Gamma}(a, 1)\)

In this scenario, we consider generating independent source signals from a gamma distribution \(\text {Gamma}(x; a, 1)\) with density

$$\begin{aligned} f(x; a, 1) = \dfrac{1}{\Gamma (a)} x^{a-1} e^{-x},\ x> 0,\ a>0, \end{aligned}$$
(E2)

where \(\Gamma (\cdot )\) is the gamma function. In particular, we consider the cases of \(a\in \{1, 2, 4, 8, 16, 32\}\). The corresponding densities of the 6 cases are shown in Fig. 20 and the associated excess kurtosis and skewness are reported in Table 4.
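The entries of Table 4 follow from the closed-form shape moments of the gamma distribution, skewness \(2/\sqrt{a}\) and excess kurtosis \(6/a\), both of which shrink toward the Gaussian values of 0 as a grows. A quick check (our helper function, not from the paper):

```python
from math import sqrt

def gamma_shape_summary(a):
    """Skewness 2/sqrt(a) and excess kurtosis 6/a of Gamma(a, 1)."""
    return 2 / sqrt(a), 6 / a

# the six shape parameters considered in this scenario
for a in (1, 2, 4, 8, 16, 32):
    skew, ex_kurt = gamma_shape_summary(a)
    print(f"a = {a:2d}: skewness = {skew:.3f}, excess kurtosis = {ex_kurt:.3f}")
```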

Fig. 21: Boxplots of the mjSNRs for various ICA methods in the \(\text {Gamma}(a, 1)\) scenario

By performing the same procedure as in the scenario of beta distributions, Fig. 21 presents the boxplots of the mjSNRs of various ICA methods. From Fig. 21, one can see that all models worked well when the excess kurtosis \(\gamma _2 \ge 1\), that is, \(a \in \{ 1,2,4 \}\), with ECOPICA performing better than the other ICA methods. As a increases further, ECOPICA and COPICA outperform the other ICA methods. This indicates that ECOPICA achieves the best performance in most of our cases as the source distribution gradually approaches the Gaussian distribution.

Fig. 22: Boxplots of the mjSNRs for the \(\text {Rayleigh}(1)\) scenario

E.2: Rayleigh distribution

In this example, we evaluate the capability of each model when the source signals follow the Rayleigh distribution. Compared with \(\text {Gamma}(16, 1)\), the Rayleigh distribution has a smaller excess kurtosis but a larger skewness. The density function of the \(\text {Rayleigh}(x;\sigma )\) distribution is

$$\begin{aligned} f(x; \sigma ) = \dfrac{x}{\sigma ^2} e^{-\frac{x^2}{2\sigma ^2}},\ x \ge 0. \end{aligned}$$
(E3)

Since both the excess kurtosis and the skewness of the Rayleigh distribution are constants free of \(\sigma \), they can be computed as \(-(6 \pi ^2 -24\pi + 16)/(4-\pi )^2 \approx 0.245\) and \(2\sqrt{\pi }(\pi -3)/(4-\pi )^{3/2} \approx 0.631\), respectively. We only consider the case \(\sigma = 1\) in this example. The simulation results are presented in Fig. 22, which reveals that only ECOPICA, COPICA, and JADE worked well, and that ECOPICA dominated the other models in this example.
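The two constants quoted above can be reproduced directly (our helper function, for verification only):

```python
from math import pi, sqrt

def rayleigh_shape_summary():
    """Excess kurtosis and skewness of Rayleigh(sigma); both are free of
    sigma, matching the constants quoted in the text."""
    ex_kurt = -(6 * pi**2 - 24 * pi + 16) / (4 - pi) ** 2
    skew = 2 * sqrt(pi) * (pi - 3) / (4 - pi) ** 1.5
    return ex_kurt, skew
```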

E.3: Laplace distribution

Fig. 23: Boxplots of mjSNR for the \(\text {Laplace}(0, 1)\) example

Next, we generated source signals from a \(\text {Laplace}(x;\mu ,b)\) distribution with the following density function

$$\begin{aligned} f(x; \mu , b) = \dfrac{1}{2b} \exp \left\{ -\displaystyle \frac{|x-\mu |}{b}\right\} ,\ x \in {\mathbb {R}}. \end{aligned}$$
(E4)

Since the Laplace distribution is symmetric and its excess kurtosis is 3, we expect all the models to recover the source signals successfully. The numerical results are presented in Fig. 23 and are consistent with our expectations.

E.4: Mixed scenario

Different from the previous scenarios, which generate the source signals independently from a single distribution, we further consider generating 3 source signals from \(\text {Rayleigh}(1)\), \(\text {Beta}(1, 3)\), and \(\text {Gamma}(8, 1)\), respectively. The approximate values of the excess kurtosis and skewness of the three distributions are reported in Table 5. The boxplots of the mjSNR values for each ICA model in this scenario are shown in Fig. 24, which reveals the superiority of ECOPICA over the other ICA methods.

Table 5 Excess kurtosis and skewness of \(\text {Rayleigh}(1)\), \(\text {Beta}(1, 3)\), and \(\text {Gamma}(8, 1)\) distributions

Fig. 24: Boxplots of mjSNR for the mixed distribution example

Appendix F: Setup of \((K, N, m)\) in Algorithm 1

Fig. 25: Boxplots of mjSNR for different \((K, N, m)\) settings

Intuitively, utilizing more resources for the optimization search is more likely to yield better performance. In our experience, increasing the total number of particles (\(K \times N\)) in the GOA algorithm mostly leads to improved performance. In practice, however, we need to respect constraints on time and resources. Fixing the total number of particles \(K \times N\) at 30, we divided the particles into 2, 3, 5, and 6 groups (\(K = 2, 3, 5, 6\)) with different communication frequencies (\(m = 3, 5, 7\)). To see the effect of the different \((K, N, m)\) combinations, we revisited the experiments of our first beta distribution scenario in Sect. 4.2; for each parameter combination, we independently repeated the experiment 50 times under the same distribution assumption, resulting in a total of \(4 \times 3 \times 15 \times 50 = 9000\) experiments. The mjSNR is again used as the performance measurement.

Figure 25 shows the boxplots of mjSNR for different \((K, N, m)\) settings. Based on this figure, we observe that the ECOPICA performance is not sensitive to the parameter setup. Nevertheless, based on our experience, we recommend a minimum of 10 particles per group, i.e., \(N = 10\), and \(K = 3\) groups to ensure the effectiveness of the optimization. Regarding the communication frequency, we advise communicating at least every 3 iterations (\(m = 3\)) for computational efficiency.

Appendix G: Gradient-based algorithm

Although GOA works well in our experiments, choosing the right number of particles may become problematic when n, the number of signals, is large, because the dimension of the solution space grows quadratically with n. Here, we propose a gradient-based algorithm, so that no parameterization of the orthogonal transformation \({\textbf{R}}\) is needed.

First, we compute the derivative \(\displaystyle \frac{\partial {\widehat{I}}_{\sigma }({\textbf{Y}})}{\partial {\textbf{R}}^{(j)}}\), where \({\textbf{R}}^{(j)}\) is the j-th row of \({\textbf{R}}\). Consider \({\widehat{I}}_{\sigma }({\textbf{Y}}) = \sqrt{\frac{a}{b}}\), where \(a = s_1 - 2s_2 + v_3\) and \(b=v_1 - 2v_2 + v_3\) as defined in Eq. (7).

$$\begin{aligned} \frac{\partial {\widehat{I}}_{\sigma }({\textbf{Y}})}{\partial {\textbf{R}}^{(j)}} = \frac{1}{2 \cdot b \cdot {\widehat{I}}_{\sigma }({\textbf{Y}})} \left( \frac{\partial s_1}{\partial {\textbf{R}}^{(j)}} - 2 \frac{\partial s_2}{\partial {\textbf{R}}^{(j)}} \right) \end{aligned}$$
(G5)

Recall that

$$\begin{aligned} s_1 = \frac{2}{T^2} \underset{1\le p < q \le T}{\sum } k_{\sigma }({\textbf{u}}^{(p)}, {\textbf{u}}^{(q)}) + \frac{1}{T} \quad \text {and} \quad s_2 = \frac{1}{T^{n+1}} \sum _{p=1}^T \prod _{j'=1}^n \sum _{t=1}^T k_{\sigma }\left( u_{j'}^{(p)}, \frac{t}{T}\right) . \end{aligned}$$

For simplicity, we use the following notations:

$$\begin{aligned} K^{(1)}_{p,q} := k_{\sigma }({\textbf{u}}^{(p)}, {\textbf{u}}^{(q)}) \text { and } K^{(2)}_{p,t,j'} := k_{\sigma }\left( u_{j'}^{(p)}, \frac{t}{T}\right) . \end{aligned}$$

Thus,

$$\begin{aligned} \frac{\partial s_1}{\partial {\textbf{R}}^{(j)}}&= \frac{2}{T^2} \underset{1\le p < q \le T}{\sum } \frac{\partial K^{(1)}_{p,q}}{\partial {\textbf{R}}^{(j)}} \\ \frac{\partial s_2}{\partial {\textbf{R}}^{(j)}}&= \frac{1}{T^{n+1}} \sum _{p=1}^T \left( \sum _{t=1}^T \frac{\partial K^{(2)}_{p,t,j}}{\partial {\textbf{R}}^{(j)}} \right) \prod _{\begin{array}{c} j'=1 \\ j' \ne j \end{array}}^n \left( \sum _{t=1}^T K^{(2)}_{p,t,j'} \right) , \end{aligned}$$

where

$$\begin{aligned} \frac{\partial K^{(1)}_{p,q}}{\partial {\textbf{R}}^{(j)}}&= -\frac{1}{\sigma ^2} \cdot K^{(1)}_{p,q} \cdot ({\textbf{u}}^{(p)}_j - {\textbf{u}}^{(q)}_j) \cdot \left[ f_j({\textbf{y}}^{(p)}_j) \cdot {\textbf{z}}^{(p)} - f_j({\textbf{y}}^{(q)}_j) \cdot {\textbf{z}}^{(q)} \right] , \\ \frac{\partial K^{(2)}_{p,t,j}}{\partial {\textbf{R}}^{(j)}}&= -\frac{1}{\sigma ^2} \cdot K^{(2)}_{p,t,j} \cdot \left( {\textbf{u}}^{(p)}_j - \frac{t}{T}\right) \cdot f_j({\textbf{y}}^{(p)}_j) \cdot {\textbf{z}}^{(p)}. \end{aligned}$$

To approximate the density function of \(y_j\), one can use a histogram of \({\textbf{y}}_j\). Alternatively, one can fit spline functions to the points used in the histogram, which may be helpful if the second-order condition is considered.

For clarity, we rewrite the optimization problem in Eq. (9) as

$$\begin{aligned} \min _{{\textbf{R}} \in {\mathbb {R}}^{n \times n}} {\widehat{I}}_{\sigma }({\textbf{R}}) \text { s.t. } {\textbf{R}}^T{\textbf{R}}={\textbf{I}}_n. \end{aligned}$$
(G6)

To solve this problem, one can use the Cayley transformation technique of Wen and Yin (2013). Given the current orthogonal transformation \({\textbf{R}}\) and the gradient \({\textbf{G}}\) with entries \(\displaystyle \frac{\partial {\widehat{I}}_{\sigma }({\textbf{R}})}{\partial {\textbf{R}}_{i,j}}\), one first computes the skew-symmetric matrix

$$\begin{aligned} {\textbf{B}}={\textbf{G}}{\textbf{R}}^T - {\textbf{R}}{\textbf{G}}^T. \end{aligned}$$
(G7)

Then, \({\textbf{R}}\) can be updated by the following rule:

$$\begin{aligned} {\textbf{R}} \leftarrow \left( {\textbf{I}}_n - \frac{\tau }{2}{\textbf{B}}\right) ^{-1} \left( {\textbf{I}}_n + \frac{\tau }{2}{\textbf{B}}\right) {\textbf{R}}, \end{aligned}$$
(G8)

where \(\tau \in {\mathbb {R}}\) is the step size. To find the appropriate step size, a line search can be adopted to find \(\tau \) satisfying the Armijo-Wolfe conditions (Nocedal and Wright 2009). To reduce the complexity of calculation and update faster, a stochastic gradient descent-like method can be adopted.
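A minimal sketch of one update step is given below (our code; the objective gradient \({\textbf{G}}\) is taken as given, and a fixed step size stands in for the Armijo-Wolfe line search):

```python
import numpy as np

def cayley_update(R, G, tau):
    """One step of Eqs. (G7)-(G8): form the skew-symmetric
    B = G R^T - R G^T and update
    R <- (I - tau/2 B)^{-1} (I + tau/2 B) R,
    which keeps R exactly orthogonal for any step size tau."""
    n = R.shape[0]
    B = G @ R.T - R @ G.T
    I = np.eye(n)
    return np.linalg.solve(I - (tau / 2) * B, (I + (tau / 2) * B) @ R)
```

Because the Cayley transform of a skew-symmetric matrix is orthogonal, the iterates never leave the constraint set, so no re-orthogonalization step is needed.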

Although we propose this theoretical method to solve the ICA problem, the algorithm may be difficult to implement practically because solving the gradient of \({\widehat{I}}_{\sigma }({\textbf{R}})\) is much more complicated than computing \({\widehat{I}}_{\sigma }({\textbf{R}})\). In addition, the second-order condition should be considered to evaluate whether the final point attains the local minimum. Therefore, determining a differentiable approximation of \({\widehat{I}}_{\sigma }({\textbf{R}})\) is important for future research.


Cite this article

Pi, HK., Guo, MH., Chen, RB. et al. ECOPICA: empirical copula-based independent component analysis. Stat Comput 34, 52 (2024). https://doi.org/10.1007/s11222-023-10368-3
