Computational Statistics and Data Analysis

The MIXANDMIX (mixtures by Anderson mixing) tool for the computation of the empirical spectral distribution of random matrices generated by mixtures of populations is described. Within the population mixture model the mapping between the population distributions and the limiting spectral distribution can be obtained by solving a set of systems of non-linear equations, for which an efficient implementation is provided. The contributions include a method for accelerated fixed point convergence, a homotopy continuation strategy to prevent convergence to non-admissible solutions, a blind non-uniform grid construction for effective distribution support detection and approximation, and a parallel computing architecture. Comparisons are performed with available packages for the single population case and with results obtained by simulation for the more general model implemented here. Results show competitive performance and improved flexibility.


Introduction
Random matrix theory is at the core of modern high dimensional statistical inference (Yao et al., 2015) with applications in physics, biology, economics, communications, computer science or imaging (Couillet and Debbah, 2013;Paul and Aue, 2014;Bun et al., 2017). In the large dimensional scenario, classical asymptotics where the available number of samples of a given population N is much larger than the dimension M are no longer valid. A key contribution in this setting is the Marčenko-Pastur theorem (Marčenko and Pastur, 1967;Silverstein and Bai, 1995), which characterizes the limiting behaviour of the empirical spectral distribution (ESD) for matrices with random entries when N → ∞. Namely, there exists a fixed point equation relating the eigenvalues of the empirical and population distributions, which can be used for inference under the mediation of appropriate numerical techniques. Using this characterization, subsequent asymptotics-based inference can be performed, for instance, for dimensionality reduction, hypothesis testing, signal retrieval, classification or covariance estimation (Yao et al., 2015). In addition, Silverstein and Choi (1995) develops ideas outlined in Marčenko and Pastur (1967) to analyse the support of the ESD based on the monotonicity of the inverse of the function involved in the fixed point equation. Emerging from statistical models in array processing, the extension in Wagner et al. (2012) focuses on generalizing previous results to the case where the matrix rows are independent but drawn from a collection of population distributions. Here, the relation between the population covariances and the limiting behaviour of the sample eigenvalues is governed by a system of non-linear equations and, although asymptotic sample eigenvalue confinement has also been proved (Kammoun and Alouini, 2016), a simple description of the support is no longer at hand.
Despite the variety of potential applications, not many works have practically confronted the numerical issues involved in computing the ESD from a given population distribution. The most flexible package that we have identified has been described in Dobriban (2015) (SPECTRODE), where the fixed point equation in Silverstein and Bai (1995) is transformed into an ordinary differential equation (ODE) with initial condition obtained by the solution of the fixed point equation on a single point within the support. In this work the author shows that the proposed method compares favourably with straightforward fixed point solvers, both in terms of accuracy and computational efficiency. In addition, accuracy limitations of Monte Carlo simulations (Jing et al., 2010) are showcased and, while recognizing their independent interest, arguments are provided about the practical limitations (Rao and Edelman, 2008) or limited applicability (Olver and Nadakuditi, 2013) of other approaches. In our experiments, this package has revealed an exquisite accuracy and efficiency in computing the ESD and determining its support. However, the applicability or efficiency of an ODE approach may be compromised for more general models such as Wagner et al., 2012. This is mainly due to the increased computational complexity of evaluating the Jacobian of the system of fixed point equations, which no longer depends on the eigenvalues of the populations but on a combination of their covariances. Another interesting tool, conceived to solve the more general problem of recovering the population distribution from the observed eigenvalues by means of the so called QuEST function, has been described in Ledoit and Wolf (2017). Solving this problem is required, for instance, for covariance estimation. The literature review in this work points to a systematic limitation of most precedent methods, as they are only capable to obtain estimates for very particular forms of the population distribution. In addition, it introduces an interesting feature not present in Dobriban (2015), the use of a non-uniform grid with increased resolution near the support edges, which allows more efficient approximations. However, once again, this technique is designed for cases where a single nonlinear equation is to be solved while generalizations to systems of equations do not seem straightforward.
This technical note describes a series of tools that have been developed to compute the ESD for the mixture of populations case in Wagner et al. (2012). This model or certain analogous forms, has attracted interest in areas such as telecommunications (Moustakas and Simon, 2007;Couillet et al., 2011;Dupuy and Loubaton, 2011), machine learning (Benaych-Georges and Couillet and Benaych-Georges, 2016), medical imaging (Cordero-Grande et al., 2019) or genetics (Fan and Johnstone, 2019). Our method is based on directly solving the system of nonlinear equations. However, the results in Dobriban (2015) and our own analysis, have identified certain limitations in commonly reported algorithms based on fixed point iteration solvers, so a set of technical refinements are proposed in this note. These include the use of Anderson mixing to accelerate the fixed point iterations, a homotopy continuation method to prevent non-admissible solutions, a set of heuristics to detect the support of the distribution and to adapt the approximation grid to the ESD shape, and a formulation that allows for efficient computations in graphical processing units (GPU). We validate our tool by comparison with Dobriban (2015) and Ledoit and Wolf (2017), both in terms of efficiency and accuracy. As our methods are envisaged to operate on more general models than those contemplated in Dobriban (2015), Ledoit and Wolf (2017), they do not make use of any precomputed information about the distribution support. Nevertheless, we show that they are reliable enough and their efficiency and accuracy are comparable or superior to those of Dobriban (2015), Ledoit and Wolf (2017). In addition, comparisons with Monte Carlo simulations show that they are also capable of providing accurate results for more general models. A MATLAB implementation of our approach, that we will refer to as the MIXANDMIX (Mixtures by Anderson Mixing) method, including the scripts required to replicate the experiments in this note, has been made publicly available at https://github.com/mriphysics/MixAndMix/releases/tag/1.1.0. This note is organized as follows: in Section 2 we review different random matrix models, in Section 3 we describe the main features of our method, in Section 4 we validate the proposed technique, in Section 5 we discuss the implications of the obtained results and in Section 6 we end up with some conclusions.

Theory
Consider a complex random matrix X of size N × M. We are interested in the eigenvalue distribution of the sample covariance Y = N −1 X H X in the asymptotic regime where both M → ∞ and N → ∞ but they maintain a fixed ratio γ = M/N with γ > 0. For simplicity we assume the entries of the matrix are zero mean Gaussian distributed, but keeping in mind that the literature contemplates different generalizations. Three main scenarios are contemplated:

IID standard entries
Within this model, that we call standard model, we can write X H ∼ CN (0 MN , I MN ), with CN denoting a circularly symmetric complex Gaussian distribution, 0 MN the M × N matrix with zero entries, and I MN the MN × MN identity matrix. Marčenko and Pastur (1967) showed that in this case the distribution of eigenvalues of the sample covariance asymptotically converges to with γ − = (1 − √ γ ) 2 and γ + = (1 + √ γ ) 2 defining the support of the distribution. Note that if γ ≥ 1, the distribution has a 1 − γ −1 point mass (γ > 1) or is locally unbounded (γ = 1) at x = 0. Thus, in this case, (1) gives an explicit characterization of the ESD.

IID rows
In this scenario, we can write X H ∼ CN (0 MN , Λ M ⊗ I N ), with Λ M a given population covariance matrix. Marčenko and Pastur (1967) and Silverstein and Bai (1995) respectively derived expressions for probabilistic and almost sure limits of the sample covariance spectrum using its Stieltjes transform for which the inversion formula would give back the ESD by If, in a discrete setting, we denote the increasingly sorted eigenvalues of Λ M by {λ 1 , . . . , λ M }, when N → ∞ the eigenvalue distribution of the matrixỸ = N −1 XX H converges to a function whose Stieltjes transformm(z) is related to the population distribution by the fixed point equationm(z) = ) −1 and to the Stieltjes transform of the corresponding limiting function for the spectral distribution of the sample covariance matrix Y, m(z), bym(z) = γ m(z) + (γ − 1)/z. In addition, Silverstein and Choi (1995), following the guidelines in Marčenko and Pastur (1967), showed that the support of the ESD can be characterized by studying the zeros of the derivative of the inverse map, z ′ (m), in appropriate domains. This property, together with the analyticity of the empirical density within its support are the keys to the SPECTRODE approach in Dobriban (2015).
To clarify the connections between the different models, it is more convenient to express the previous relations in terms of auxiliary functions e(z) = − 1 γ zm(z) − 1 γ for an equivalent fixed point equation, and an expression for the Stieltjes transform of the ESD, We refer to this model as the single population model.

Independent rows
In this mixture of populations model, the matrix is drawn from X H indicator matrix with ones in the diagonal elements corresponding to those rows sampled according to the population covariance for which there is a unique solution in C \ R + . m(z) is expressed in terms of these functions as with α k = tr(D k )/N. Note that (6) and (7) reduce to (4) and (5) when K = 1. There are two main limitations to extend the SPECTRODE method to this setting. First, we are unaware of studies characterizing the support of the measures inducing e j (z) by means of some analogue to Silverstein and Choi (1995). Second, both potential extensions of support characterizations or usage of ODE solvers would require the Jacobian of (6), which involves additional matrix multiplications, with a penalty in computational efficiency. Thus, we have focused on developing a reliable method to solve the system (6) not requiring the Jacobian.

Methods
In this Section we describe our numerical solver for the system of equations in (6).

Support detection
When γ → 0 the ESD tends to the population distribution for the standard and single population models, while for the mixture of populations it is governed by an effective single population distribution Thus, in this limiting case the ESD support only includes the eigenvalues of an equivalent single population distribution. On the other side, for γ > 0, , with λ k m denoting the mth eigenvalue of Λ k M and t > 1, provide lower and upper bounds on the limiting support.
To consider these two extreme cases, we group and sort the set of Our method attends to the overlap of the set of intervals i.e., the intervals generated by the maximum possible spectral dispersion of each grouped eigenvalue, including a numerical safety margin t, which we have set to t = 1.001 (and constraining |γ − 1| ≥ 10 −9 to avoid the potential singularity at x = 0 when γ = 1). The support of the limiting distribution will be contained in [a, b], as this interval considers the maximum possible eigenvalue dispersion as produced by minorizing (a) and majorizing (b) point mass population distributions governed by (1). On the contrary, there is no guarantee for ⋃ p [a p , b p ] to contain the whole support of the limiting distribution.
However, the intervals in (8) serve to cover the γ → 0 regime, which originates challenging highly localized support intervals, as in this case our construction guarantees that the support will be contained in the defined intervals.
Lack of overlap among adjacent intervals is detected by checking the condition b p < a p+1 . Assuming this is observed I ≤ P − 1 times, we can get a partition into I + 1 segments, each one induced by P i eigenvalues. Each of these support segments is gridded using a total of max(M (o) P i , M (i) ) points, with M (i) the minimum number of points per segment and ≤ M (i) the minimum ratio of points per number of grouped eigenvalues. Importantly, to improve detectability, by noting the multiplicative dependence of the spectral dispersion width with the spectral location in (8), gridding is performed uniformly in logarithmic units. Then, we can call the solver of (6) (to be described in Section 3.3), and compute (7) and (3) over the defined grid locations.
The output of this first step is a set of estimates for the distribution in a non-uniform grid x = x p with 1 ≤ p ≤ P 0 and

Adaptive regridding
Additional points are added to the grid by pursuing g ′ (x) the grid mapping function and c a constant. This non-uniform grid construction criterion is based on both the second order derivative of the density f ′′ (x) and the grid value x. The first feature, f ′′ (x), favours the allocation of grid points near the support edges, in accordance to the √ x − x 0 behaviour of the distribution at the boundaries (Silverstein and Choi, 1995), as well as in those areas where linear interpolation results in larger approximation errors. This is similar in spirit to the arcsine criterion in Ledoit and Wolf (2017) but does not use any prior information about the support edges as it is not available in the mixture of populations model. The second feature, x, favours the allocation of grid points near the upper edge of the support, which could be important for applications related to signal detection (Nadakuditi, 2014;Dobriban and Owen, 2019). After P l = R l P 0 points are added to the grid, the solver for f (x) is called on the new set of points to allow for an update of the f ′′ (x) values to be used at the next iterative regridding step. This whole process is repeated L times, so R l , 1 ≤ l ≤ L control the final resolution of the distribution computations. The parameters by default in our implementation are R l = 1 ∀l and L = 1.

Homotopy continuation
The calculation of the ESD involves a pass to the limit in (3) as the Stieltjes transform does not converge in the real line. In addition, the solution of (6) is not unique on the real line. Numerically, this may provoke spurious fixed point convergence when the current solution is far away from the optimum and the computations are being performed in locations that are close to the real line. To prevent these situations, we have emulated (3) by homotopy continuation.
We start by obtaining an approximate solution for (6) in a grid given by z = x + ξ 2 i, with ξ = ξ 0 1 P l for big enough ξ 0 common for all 1 ≤ p ≤ P l . At each iteration i we compute ε i p = max where the updates on e are to be described in Section 3.4. Considering that, as discussed in Dobriban (2015), to obtain an accuracy of at least ϵ for the distribution f (x), we need to solve the system of equations in a complex grid given by x + iϵ 2 , we can perform the update ξ j+1 The iterations at the grid location indexed by p are terminated when the prescribed accuracy is reached, namely when ξ j p = ϵ and ε i p < ϵ. In our experiments we have used ξ 0 = 1 and β = 10, for which we have observed robust and efficient performance.

Anderson acceleration
The experiments in Dobriban (2015) showed that when solving the IID rows problem in Section 2.2 by a straightforward fixed point algorithm, in our context when performing the updates on e(z) directly using (4), convergence is often very slow, so they reported situations where their SPECTRODE code could be 1000× quicker while simultaneously obtaining 1000× higher accuracy. In this note we show that this apparent limitation of the fixed point iterates can be overcome by using techniques to accelerate their convergence. As discussed in Section 2.3, the accelerated convergence achievable by methods requiring the Jacobian of the fixed point identities may not compensate for the increased cost per iteration involved in computing the Jacobian. Thus, we have resorted to Anderson mixing (Anderson, 1965), a technique not requiring explicit Jacobian calculations that has demonstrated good practical performance, in occasions providing competitive results when compared to gradient-based approaches (Ramière and Helfer, 2015).
Considering a given multidimensional fixed point mapping g(e) such as (6), Anderson iterations are computed as with Q i denoting the number of iterations whose history is used for the update at iteration i and ν i = (ν i 1 , . . . , ν i Due to the potential ill-posedness of this system, we have actually solved a damped version (Scieur et al., 2019;Henderson and Varadhan, 2019) with damping parameter given by λ i = 0.1 max k,q |∆H i k,q |. We have set Q i = min(2, i − 1) on the basis of our empirical testing and in agreement with the experimental results in Ramière and Helfer (2015).

GPU acceleration
GPU-based implementations of the SPECTRODE method (Dobriban, 2015) appear involved due to the sequential nature of ODE solvers. In contrast, acceleration of the ESD computation in the QuEST method (Ledoit and Wolf, 2017) seems more plausible as it solves for a zero of a function independently for the different grid locations, but the authors have not discussed this aspect. Our code has been architectured so that most demanding routines support both CPU and GPU based parallel computations. This includes the parallel computation of the solutions of the system of equations in (6) for the different grid locations but also the parallel computation of different ESDs, required, for instance, in patch-based image denoising applications (Cordero-Grande et al., 2019).

Results
In this Section we first validate the beneficial effects of Anderson acceleration and homotopy continuation in ESD computations (Section 4.1), then compare our tool to the SPECTRODE and QuEST proposals in those regimes in which they can operate (Section 4.2) and finally provide some results on the application of our technique to the mixture of populations model (Section 4.3). Unless otherwise stated experiments are performed using CPU computations.

Validation of introduced refinements
In Dobriban (2015) an experiment was performed illustrating the limitations of fixed point iterations to obtain accurate results in ESD calculations. Their method is compared with a fixed point iteration solver for the standard model in Section 2.1, using the closed form density in (1) with γ = 0.5 to assess the accuracy. Here we repeat that experiment adding our Anderson acceleration technique to the fixed point solver. The results are presented in Fig. 1. First, we have been able to replicate the results in Dobriban (2015); the fixed point algorithm, grossly equivalent to the MIXANDMIX implementation with Q = 0, i.e., without Anderson mixing, is only able to provide very moderate accuracies, despite being run for 1/ϵ iterations. However, when introducing the Anderson acceleration scheme, the results are dramatically better, with MIXANDMIX and SPECTRODE demonstrating comparable performance. In this experiment the curves show a slightly better accuracy (Fig. 1a) and worse computational efficiency (Fig. 1b) for MIXANDMIX, but this should be taken with caution as these tests have been conducted without considering the influence of grid sizes on the approximation, which will be taken into account in the experiments in Section 4.2. In addition, we show (Fig. 1c) that despite the accuracy obtained by a straightforward fixed point algorithm is poor everywhere within the support, the accuracy curve after Anderson mixing remains below the SPECTRODE curve almost everywhere.
In Fig. 2a we compare the results of our method without and with homotopy continuation to those of SPECTRODE. The SPECTRODE method and ours with homotopy continuation are observed to overlap at the scale of the plot. However, when running MIXANDMIX without homotopy continuation, we can see that there exist some grid points for which the computations spuriously converge to the zero solution. We know this solution is infeasible because it provokes discontinuities in the distribution, which contradicts its expected analyticity properties (Silverstein and Choi, 1995). To illustrate the reasons for these numerical issues, we have taken a grid point corresponding to one of these infeasible results, x = 2.2. For this point, Fig. 2b-e show the squared magnitude of the residuals of the fixed point maps, |h(e)| 2 , at different complex plane values of the auxiliary function e(z) = e(x + δi) as we approach the real line with δ = {1, 0.1, 0.01, 0.001}. First, for z = x + 1i there is a unique minimum in C + whose basin of attraction covers the whole of C + . As we decrease δ (see for instance z = x + 0.1i) we can track this minimum in a neighbourhood of its previous location and check that it is still the only one in the upper half of the complex plane. However, a new local minimum has emerged in the lower half of the plane but so close to the real line that its basin of attraction extends to the upper half. As we keep decreasing δ, the basin of attraction of this minimum in C + gets bigger; however, by analyticity there has to be an area around the global optimum for which the method should still converge to the global optimum, which can be ensured by homotopy continuation. This explains the problem we are observing in the left hand side: the fixed point algorithm has entered the basin of attraction of the minimum in the lower half and it has not been able to escape from this area. In addition, the location of the attractor explains why the spurious distribution value obtained by the fixed point algorithm is generally pushed to 0 when failing to converge to the right optimum.

Comparison with the literature
In Fig. 3 we compare the accuracy and computational efficiency of MIXANDMIX with the SPECTRODE and QuEST approaches for two population distributions that admit a closed form expression for the ESD. In Fig. 3a,b, we show respectively the averaged accuracy and computation times for a set of aspect ratios ranging from γ = 0.05 to γ = 0.95 in steps of 0.1, γ = 1, and reciprocal 1/γ ranging from γ = 0.05 to γ = 0.95 in steps of 0.1 for the standard distribution (MP) in (1). Corresponding accuracies throughout the support together with the gold standard density (in a logarithmic scale) are shown in Fig. 3c for the γ = 0.5 case. Analogous plots are provided in Fig. 3d-f for a two-delta (δδ) distribution with equiprobable eigenvalues at λ 1 = 1 and λ 2 = 8, for which the ESD can be obtained by solving a third order polynomial equation (Dobriban, 2015;Rao and Edelman, 2008). The SPECTRODE and MIXANDMIX approaches have been run with an accuracy parameter providing similar computation times than those of the QuEST method using 100 grid points, which corresponds to ϵ = 10 −6 /ϵ = 10 −5 (MP, γ < 1/γ ≥ 1) and ϵ = 10 −5 /ϵ = 10 −4 (δδ, γ < 1/γ ≥ 1) for SPECTRODE and ϵ = 10 −5 , L = 3 (both) for MIXANDMIX. To account for the relative grid complexities of different methods, linearly interpolated densities are compared with close-form solutions in a uniform grid comprised of 10 000 evenly distributed points along the support. MIXANDMIX is roughly 2 and 1 orders of magnitude more accurate than QuEST and SPECTRODE respectively. We observe that the computation times of SPECTRODE largely depend on the aspect ratio, with increments of several orders of magnitude as γ → 1, while they are much more uniform for MIXANDMIX and QuEST. As for the accuracy distributions, they are generally satisfactory for all methods, but MIXANDMIX seems to provide improved results near the edges for the MP case and throughout the support for the δδ case. The grid sizes used by each method for γ = 0.5 have MIXANDMIX is potentially limited by risks of failures at detecting all the segments comprising the distribution support. Thereby, we have conducted some experiments to test the support detection reliability in challenging scenarios. In Fig. 4 we cover analogous experiments to those in Fig. 3 for skewed δδ problems with λ = {1, 100} with respective multiplicity ratios w = {0.99, 0.01} (Fig. 4a-c), left skewed (LS) problem, and w = {0.01, 0.99} (Fig. 4d-f), right skewed (RS) problem.
Note that due to the scale differences in the population eigenvalue locations, results on the accuracy throughout the domain are more conveniently represented in a logarithmic scale. SPECTRODE parameters for approximately matched average computation times have now been modified to ϵ = 10 −4 /ϵ = 10 −3 (LS, γ < 1/γ ≥ 1) and ϵ = 10 −3 /ϵ = 10 −2 (RS, γ < 1/γ ≥ 1) while MIXANDMIX parameters could be kept the same as in previous experiments while still matching QuEST computation times. Once again, results in Fig. 4a, d show an improvement of averaged accuracy by several orders of magnitude when using the MIXANDMIX method. Results on Fig. 4c, f show that MIXANDMIX has been capable to approximate the support for both problems and to provide more accurate results than the other two methods almost everywhere within the support. In this case the grid sizes have been 104/104 for QuEST, 6963/9130 for SPECTRODE and 1248/1248 for MIXANDMIX for the LS/RS problems.
In Fig. 5 we apply the three methods to a challenging comb-like (Dobriban, 2015) problem. In this case the population eigenvalues come from 100 equiprobable point masses uniformly distributed in the interval [0.1, 10], thus with a large number of components that differ by as much as two orders of magnitude. We show the results at selected areas of the support for γ = 0.025, γ = 0.5 and γ = 0.975 respectively in Fig. 5a-c, again using logarithmic scaling for improved visualization. When tuning the SPECTRODE parameter for similar computation times to those of QuEST and MIXANDMIX, which turned out to happen at ϵ = 10 −3 , the visual impression is of limited performance. In contrast, MIXANDMIX appears to be powerful, showing strengths not only in detecting all the support intervals, but also in picking-up the density oscillations observed for γ = 0.025 and the steeped lower edge for γ = 0.975. We have quantitatively assessed this perception by running the final experiment including also the SPECTRODE results at the accuracy level where we could not visually detect any differences with MIXANDMIX results, ϵ = 10 −6 for both γ = 0.025 and γ = 0.5, or the maximum code accuracy, ϵ = 10 −8 , for γ = 0.975.

ESD of population mixtures
In this Section we provide an illustration of the MIXANDMIX results for the mixture of populations model and the benefits of the GPU architecture. Fig. 6a shows the calculated ESD for a mixture of populations with K = 6 equiprobable populations, so α k = 1/K , drawn from population covariances Λ k M(DIAG) = Λ k mn(DIAG) = (((m + k) mod K ) + 1)δ[m, n], i.e., a diagonal matrix whose elements in the diagonal grow from 1 to K with period K with this pattern being shifted across the populations. Fig. 6b extends this DIAG problem to non-diagonal population covariances using Λ k M(CORR) = Λ k mn(CORR) = ρ |m−n| l √ Λ k mm(DIAG) Λ k nn(DIAG) , ρ < 1, l > 0, so this CORR problem reduces to the DIAG problem when l → ∞. Namely, Fig. 6b shows the results for ρ = 0.2, l = 0.25 and γ = 0.5, here with M = 120 for convenience. We can see that calculations are in agreement with simulations (with these being obtained using the biggest matrix sizes that we would fit in our GPU memory) for both the DIAG and CORR problems, with the CORR results showing a larger spectral dispersion due to the larger condition number of the non-diagonal matrix. Finally, Fig. 6c shows the GPU computation times for the CORR problem for a number of populations ranging from K = 1 to K = 6. First, we can appreciate a significant penalty when moving from K = 1 to K = 2, as that switches the problem from solving a single equation based only on the eigenvalues to a system of equations based on the whole structure of the covariance matrices. Second, we observe that for K ≥ 2 the computation times grow with K following a lower than 1 ratio. This is to be attributed to an increased degree of parallelization of the GPU implementation for bigger problems and a stable fixed point Anderson acceleration in the multidimensional case.

Discussion
We have presented a set of numerical techniques to aid in the computation of the ESD in a mixture of populations model. These include the use of Anderson mixing to accelerate the fixed point iterations, homotopy continuation for robust convergence to the right optimum, adaptive grid construction to efficiently detect the support and approximate the distribution, and a parallel architecture to tackle the increased computational demands in this setting. Results have shown that our method offers favourable practical efficiency-accuracy tradeoffs when compared with related approaches while being able to address more general models than available tools.
Our tool has focused on the model described in Wagner et al. (2012) but it could straightforwardly be adapted and optimized to address several analogous models in the literature, such as those mentioned in Section 1. Nevertheless, we have not contemplated even more general models such as the Kronecker model (Zhang et al., 2013), with a recent contribution for the separable case in Leeb (2019), or more general couplings between the matrix elements (Wen et al., 2011;Lu et al., 2016). However, these models generally involve the solution of more intricate systems of nonlinear equations with more auxiliary functions, but with strong functional resemblances to the system we have studied here, so there is potential to reuse or adapt our tools to tackle them. The population model discussed in this paper admits a free probability (Mingo and Speicher, 2017) based formulation given by where ⊠ represents the free multiplicative convolution and ⊞ the free additive convolution over the summands. This implies that the machinery described in Belinschi et al. (2017) could be used to obtain the system of equations in Section 2.3. However, the models in Belinschi et al. (2017) are more general, including self-adjoint polynomials not necessarily built from a combination of Marčenko-Pastur and atomic distributions, but simply from asymptotically freely independent matrices. Thus, investigations are required to discern under what conditions our numerical tools can be extended to cope with these models. The proposed technique does not fully exploit the existing descriptions used for support detection in the single population model because we are not aware of any such descriptions for mixtures of populations (although recent results such as those in Bao et al. (2019), Ji (2019) may pave the way for them). Anyhow, the gridding procedure has shown a robust behaviour in challenging practical scenarios, even when compared with methods that exploit the properties of the support. The main reason is the introduction of a logarithmic grid, that enables efficient support searches at different spectral scales. This has been synergistically combined with an adaptive grid refinement making use of the second order derivative of the density (with a bias towards the upper edge information) generally with more efficient approximations than provided by previous methods. Although this grid refinement criterion has proven effective for all the tested cases, generally providing finer spectral resolvability than previously proposed methods, other criteria may be more appropriate for different applications. In this regard, we should mention that our software is built on top of a generic grid refinement method that allows to test other possibilities by simply defining different criteria for grid cell subdivision, with some exemplary alternatives already included in the code. In addition, support detection correctness can generally be inferred from the numerical integration of the estimated measures, which is provided as an output.
Another difference with previous approaches is the dependence of the method on more parameters. Although this may add an extra degree of complexity for users, we have observed good behaviour for all test cases without resorting to parameter tuning, so the tool should be readily useable for many applications. From a different perspective, the combined inspection of simulations and manipulation of these parameters may allow to fine tune the method in most challenging scenarios, some of them, as shown in some of the experiments in Section 4.2, not being adequately covered by the reduced flexibility of related approaches. In summary, grid detection robustness can be improved by increasing M (i) , robustness of approximation by increasing ξ 0 and/or decreasing β and accuracy by decreasing ϵ and/or increasing L. However, a line for future research could involve the incorporation of refined techniques for parameter selection. A possibility could be to investigate more efficient and robust interplays between homotopy continuation and nonlinear acceleration, for instance making use of adaptive regularization schemes for nonlinear acceleration (Scieur et al., 2019).

Conclusions
This work has introduced a set of techniques to compute the ESD in a mixture of populations model. A generic procedure using only the functional form of the fixed point equations relating the population and limiting empirical distributions has been proposed. Efficient convergence is achieved by Anderson acceleration and homotopy continuation and novel strategies for grid construction have been provided. The method has compared well with related proposals in the literature which, to our knowledge, are only capable to address more restricted models. By providing this detailed description of our solution, we expect our distributed tool to be of practical interest for statisticians working in the field.