Hierarchical MTC User Activity Detection and Channel Estimation with Unknown Spatial Covariance

This paper addresses the joint user identification and channel estimation (JUICE) problem in machine-type communications under the practical spatially correlated channel model with unknown covariance matrices. Furthermore, we consider an MTC network with hierarchical user activity patterns following an event-triggered traffic model. Therein, the users are distributed over clusters with a structured sporadic activity behaviour that exhibits both cluster-level and intra-cluster sparsity patterns. To solve the JUICE problem, we first leverage the concept of strong priors and propose a hierarchical-sparsity-inducing spike-and-slab prior to model the structured sparse activity pattern. Subsequently, we derive a Bayesian inference scheme by coupling the expectation propagation (EP) algorithm with the expectation maximization (EM) framework. Second, we reformulate the JUICE as a maximum a posteriori (MAP) estimation problem and propose a computationally-efficient solution based on the alternating direction method of multipliers (ADMM). More precisely, we relax the strong spike-and-slab prior with a cluster-sparsity-promoting prior based on the log-sum penalty. We then derive an ADMM algorithm that solves the MAP problem through a sequence of closed-form updates. Numerical results highlight the significant performance gains obtained by the proposed algorithms, as well as their robustness against various assumptions on the users' sparse activity behaviour.


I. INTRODUCTION
Machine-type communications (MTC) constitute one of the fundamental pillars in the current 5G cellular systems [1]. In MTC networks, the base station (BS) aims to provide connectivity to a massive number of low-cost energy-constrained devices, known as user equipments (UEs). In practical MTC, the UEs exhibit sporadic activity, generate predominantly uplink traffic, and transmit mainly short-packet data [2]. Consequently, employing the conventional channel access protocols in MTC networks would result in a large signalling overhead [1]. To address this issue, grant-free access protocols have been introduced, enabling the UEs to access the network without the need for scheduling requests in advance. Therefore, utilizing grant-free access in MTC results in a low signalling overhead and an extended lifespan for the UEs.
To fully exploit the features of grant-free protocols, the BS has to accurately perform the task of joint UE identification and channel estimation (JUICE). Owing to the sporadic nature of UE traffic, the JUICE problem has been widely formulated as a sparse recovery problem. Furthermore, as the BS antennas sense the same sparse activity behaviour, the JUICE problem extends to the multiple measurement vector (MMV) setup. Solving the sparse recovery problem in the MMV setup can be achieved via, for instance, greedy algorithms such as simultaneous orthogonal matching pursuit (SOMP) [3], mixed norm optimization approaches [4], sparse Bayesian learning (SBL) [5], and message passing algorithms [6].
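As a concrete illustration of MMV sparse recovery, the following is a minimal SOMP sketch (not this paper's algorithm): it greedily selects the pilot column most correlated with the residual across all measurement vectors, exploiting exactly the row-sparse structure the JUICE problem exhibits. The function name and interface are illustrative.

```python
import numpy as np

def somp(Phi, Y, k):
    """Simultaneous OMP: recover a row-sparse X from Y = Phi @ X with k active rows.

    Phi: (m, n) measurement matrix; Y: (m, M) multiple measurement vectors.
    Returns the estimated support and the (n, M) row-sparse estimate."""
    n = Phi.shape[1]
    support = []
    R = Y.copy()
    for _ in range(k):
        # Correlate every column of Phi with the residual, jointly over all columns of Y
        scores = np.linalg.norm(Phi.conj().T @ R, axis=1)
        scores[support] = 0.0                      # never reselect an index
        support.append(int(np.argmax(scores)))
        # Least-squares fit on the current support, then update the residual
        X_s, *_ = np.linalg.lstsq(Phi[:, support], Y, rcond=None)
        R = Y - Phi[:, support] @ X_s
    X = np.zeros((n, Y.shape[1]), dtype=Y.dtype)
    X[support] = X_s
    return sorted(support), X
```

The joint scoring over all measurement vectors is what distinguishes SOMP from running OMP per column: every antenna votes for the same support.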

A. Related Work
Numerous efforts to devise algorithms that solve the JUICE problem are presented in the literature. For instance, Chen et al. [7] addressed the JUICE under two scenarios, with and without prior knowledge of the large-scale fading. For both scenarios, they analytically evaluated a JUICE solution based on the AMP algorithm. Furthermore, Liu et al. [8] provided an asymptotic performance analysis of a Bayesian AMP algorithm in terms of activity detection and channel estimation under the assumption of known large-scale fading. Ke et al. [9] considered an enhanced mobile broadband network and investigated the performance of the generalized AMP algorithm that exploits the structured sparsity in the multiple-input multiple-output (MIMO) channels. The authors in [10], [11] proposed JUICE solutions from a non-Bayesian perspective, where they formulated the JUICE as a maximum likelihood problem and solved it via coordinate block descent [10] and non-negative least-squares [11]. Nonetheless, these works focused on the JUICE only under uncorrelated fading channels, which may not accurately represent practical situations, as MIMO channels are typically spatially correlated [12]. In fact, JUICE solutions that are designed for uncorrelated channels may exhibit sensitivity to the correlation structures encountered in practical scenarios [13]. Thus, incorporating the spatial correlation in designing JUICE solutions is of utmost importance, yet it is still in its infancy in the literature.
Recently, a few works have looked into the JUICE problem under spatially correlated channels.
For instance, Cheng et al. exploited in [14] both the spatial and temporal correlation of the propagation channels and proposed the orthogonal AMP algorithm to solve the JUICE problem.
Rajoriya et al. [15] proposed a Bayesian solution that couples AMP and SBL to provide an algorithm that enjoys the low complexity of AMP and the good performance of SBL. Bai et al. [16] proposed a distributed AMP algorithm for cell-free MTC networks that aims to reduce the complexity of AMP by distributing the computation load over several access points. Moreover, we formulated the JUICE as an ℓ2,1-norm minimization problem in [17], [18] and as a maximum a posteriori (MAP) problem in [19]. For both formulations, we derived computationally-efficient algorithms based on the alternating direction method of multipliers (ADMM). Furthermore, several theoretical studies have investigated the performance of AMP for JUICE in correlated MIMO channels, in terms of activity detection accuracy [20], [21], channel estimation [21], and achievable rate [15].
While the works in [14]-[21] address the JUICE problem under the more practical spatially correlated MIMO channels, they assume that the channel distribution information (CDI) of all the UEs is fully known to the BS at any transmission instance. However, this assumption can be challenging to fulfill in realistic scenarios, as the BS cannot track the CDI of UEs with long inactive periods. Furthermore, the prior works on the JUICE consider MTC networks with an independent activity pattern amongst the UEs, modeling, for instance, a scenario where each UE monitors an independent process and thus activates randomly. Nonetheless, in practice, the UEs are deployed over several clusters, where the UEs within each cluster monitor the same phenomena, leading to the paradigm of event-triggered traffic models.
The event-triggered traffic model results in a hierarchical sparse activation pattern comprising both cluster-level sparsity and intra-cluster sparsity. The cluster-level sparsity arises because the event epicenters are concentrated around a small subset of clusters (referred to as active clusters), causing only the UEs from those active clusters to be prompted for activity. On the other hand, each event would trigger only a subset of UEs to be active in practice, resulting in an intra-cluster sparsity model. To the best of our knowledge, only two works have investigated the JUICE problem with a correlated UE activity pattern. Liu et al. [22] addressed only the activity detection problem and proposed a solution based on preamble selection under different assumptions on the prior knowledge of the activity correlation distribution. Becirovic et al. [23] proposed two sparsity-promoting priors based on the ℓ2,1-norm and total variation and proposed a non-negative least squares algorithm to solve a relaxed version of an ℓ0-norm minimization.

B. Main Contribution
This paper makes the following two distinctions from the prior works: first, we address the JUICE in spatially correlated MIMO channels with no prior knowledge of the exact CDI. Second, in contrast to the mainstream grant-free access literature, where the traffic is assumed to be independent amongst the UEs, we consider an MTC network where the UEs are distributed into clusters with an activation pattern following an event-triggered traffic model, thus introducing a correlated activity pattern amongst the UEs. For instance, this could model a network where the UEs form clusters based on their geographical locations and each cluster is associated with a common task. Here, an event could trigger a small subset of UEs belonging to a particular cluster to activate concurrently, leading to a hierarchical UE activity pattern.
The main contributions of our paper can be summarized as follows:
• This paper proposes two Bayesian inference approaches to solve the JUICE problem in spatially correlated MIMO channels with a hierarchical UE activity pattern and limited or no prior knowledge of the CDI.
• We propose a hierarchical spike-and-slab prior that incorporates both the cluster-level and intra-cluster sparsity patterns. Subsequently, we derive a solution based on introducing an expectation propagation (EP) algorithm [25] within an expectation maximization (EM) [26] framework. The proposed solution iteratively approximates an intractable joint posterior distribution and provides an estimate of the active UEs, their channels, as well as the CDI.
• We propose an alternative solution to the JUICE problem by relaxing the spike-and-slab prior with a log-sum prior [27] that provides a more flexible approach to encode the hierarchical activity pattern. Subsequently, we formulate the JUICE as a MAP problem and derive a solution based on the ADMM framework in order to iteratively solve an approximated version of the MAP problem via a sequence of closed-form updates.
• Numerical results demonstrate the substantial performance gains achieved by our proposed algorithms, as well as their robustness against different activity pattern structures.

A. System Model
We consider a single-cell uplink network consisting of a set N = {1, 2, . . . , N} of UEs served by a single BS equipped with a uniform linear array containing M antennas, as depicted in Fig. 1. The UEs are geographically distributed over Nc clusters, where each UE belongs to a unique cluster. We denote the index set of the lth cluster as Cl ⊆ N, l = 1, . . . , Nc, where each cluster contains L UEs, such that N = LNc.
In contrast to the majority of the literature on grant-free access, which considers an independent UE activation pattern, we consider in this paper an MTC network operating in an event-triggered traffic model inducing a correlated activation pattern between the UEs. In particular, the event-triggered traffic model arises in practical MTC networks where the UEs are grouped into clusters, such that each cluster is associated with a monitoring task. Subsequently, we make herein the following distinctions based on technical observations on the UE activation pattern under the event-triggered traffic model:
• The UE activity is triggered by events concentrated around a small subset of active clusters, thus giving rise to a cluster-level sparsity structure. Therefore, we define the cluster-level activity vector c = [c1, . . . , cNc]^T, where cl = 1 if the lth cluster is active and cl = 0 otherwise.
• Within each active cluster, a subset containing at most Lc ≤ L UEs will be active, thus inducing a correlation between the UE activities within the same cluster, in the form of intra-cluster sparsity. Thus, to model the intra-cluster sparsity, we introduce γi ∈ {0, 1}, i ∈ N, where γi = 1 if the ith UE is active and γi = 0 otherwise, collected in the activity vector γ = [γ1, . . . , γN]^T, where the elements belonging to a given cluster (γi, i ∈ Cl) are assumed to be correlated. The correlation in the UE activity will be discussed in the following sections.
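The hierarchical activity model described by the two bullets above can be sketched as a small simulator. The parameter names here (eps for the cluster activation probability, L_c for the per-cluster activity budget) are illustrative choices, not quantities fixed by the paper.

```python
import numpy as np

def draw_activity(n_clusters, L, eps, L_c, rng):
    """Draw a hierarchical activity pattern: each cluster is active with
    probability eps (cluster-level sparsity), and within an active cluster
    at most L_c of its L UEs activate (intra-cluster sparsity)."""
    c = (rng.random(n_clusters) < eps).astype(int)          # cluster indicators c_l
    gamma = np.zeros(n_clusters * L, dtype=int)             # UE indicators gamma_i
    for l in np.flatnonzero(c):
        k = int(rng.integers(1, L_c + 1))                   # each event triggers >= 1 UE
        active = rng.choice(L, size=k, replace=False)
        gamma[l * L + active] = 1
    return c, gamma
```

Inactive clusters contribute no active UEs at all, which is precisely the cluster-level sparsity; active clusters are themselves only partially filled, which is the intra-cluster sparsity.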
We consider that the channel response hi between the ith UE and the BS follows a local scattering model [28]. Thus, hi ∈ C^M, ∀i ∈ N, is modeled as a zero-mean complex Gaussian random variable, i.e., hi ∼ CN(0, Ri), where Ri ∈ C^{M×M} denotes the spatial covariance matrix. Additionally, we adopt the common assumption that the channels are wide-sense stationary. Thus, the changes in the covariance matrices R = {Ri}_{i=1}^{N} occur less frequently compared to the variations in the channel realizations [12].
At any coherence interval Tc, each active UE transmits a total of τc symbols to the BS over two phases. In the first phase, each active UE transmits a τp-length pilot sequence to the BS, whereas in the second phase, each active UE transmits its information data to the BS using the remaining τc − τp symbols. Subsequently, in order to decode the information data transmitted from the active UEs, the BS utilizes the received signal from the pilot transmission phase to perform the JUICE task. To this end, the pilot transmission phase is realized by assigning to each UE i, ∀i ∈ N, a unique unit-norm pilot sequence ϕi ∈ C^{τp}, and a transmit power pi inversely proportional to its average channel gain in order to reduce the disparity in the channel gains amongst the UEs [29], [30]. Consequently, the received signal during the pilot transmission phase Y ∈ C^{τp×M} is given by

Y = ΦX^T + W,    (1)

where X = [x1, . . . , xN] ∈ C^{M×N} represents the effective channel matrix with xi = √pi γi hi, Φ = [ϕ1, . . . , ϕN] ∈ C^{τp×N} is the pilot sequence matrix, and W ∈ C^{τp×M} is an additive white Gaussian noise with i.i.d. elements, where each element is drawn from CN(0, σ2).
The joint detection of the active UEs and the estimation of their channels reduce to estimating the (unknown) row-sparse effective channel matrix X^T based on the received pilot signal in (1).
Thus, the JUICE can be formulated as a sparse recovery problem in an MMV setup. Prior works in the literature showed that sparse recovery algorithms derived from a Bayesian perspective usually yield the best performance [6].
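A minimal sketch of the measurement model in (1), assuming the effective-channel definition xi = √pi γi hi given above; function and variable names are illustrative.

```python
import numpy as np

def pilot_measurements(Phi, H, p, gamma, sigma2, rng):
    """Form the received pilot signal Y = Phi X^T + W of model (1).

    Phi: (tau_p, N) pilot matrix; H: (M, N) channels; p: (N,) transmit powers;
    gamma: (N,) binary activity vector; sigma2: noise variance."""
    X = np.sqrt(p) * gamma * H                      # effective channels x_i = sqrt(p_i) gamma_i h_i
    tau_p, M = Phi.shape[0], H.shape[0]
    W = np.sqrt(sigma2 / 2) * (rng.standard_normal((tau_p, M))
                               + 1j * rng.standard_normal((tau_p, M)))
    return Phi @ X.T + W, X
```

Only the columns of X belonging to active UEs are non-zero, so X^T is row-sparse, which is the structure the sparse recovery formulation exploits.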

B. Bayesian Inference Setup
In the Bayesian framework, the unknown variable to be estimated, i.e., X, is modelled using a prior probability distribution function (PDF) which incorporates and encodes both the prior knowledge on X as well as the prior knowledge on its hidden hyper-parameters. Subsequently, utilizing prior functions that promote sparsity while incorporating the hierarchical sparse structures of X^T is the key to achieving accurate solutions for the JUICE problem. In particular, sparsity-promoting priors can be categorized as either weak or strong priors. For instance, the Laplace distribution is a weak sparsity prior: it promotes sparsity, but it does not strictly assign a non-zero probability to the event xi = 0. On the other hand, strong sparsity priors, such as the spike-and-slab distributions (with a delta peak at zero), are more stringent as they assign a strictly positive probability to the case of xi = 0 [31]. Therefore, for the JUICE problem in this paper, the priors should be designed in order to encode (i) the spatial correlation of each xi, ∀i, (ii) the cluster-level sparsity between clusters, and (iii) the intra-cluster sparsity structure. We discuss next how to design the priors to encode (i), (ii), and (iii).

C. Hierarchical Spike-and-Slab Prior
In the following, we leverage the concept of strong sparsity priors and propose a hierarchical spike-and-slab sparsity-promoting prior to model the cluster-level sparsity, the intra-cluster sparsity, and the spatially correlated channels. First, we introduce the following parameters:
a) Cluster-level activity prior: In order to impose the cluster-level activity, we model each cluster activity indicator cl as a Bernoulli random variable with activation probability ϵ.
b) Intra-cluster sparsity prior: We define the hyper-parameters γ̄i ∈ R+, ∀i, that impose a row sparsity structure on X. To this end, we design p(γ̄) such that it promotes a sparse solution; for instance, p(γ̄) can be drawn from the Laplacian distribution as p(γ̄) ∝ exp(−∑_{i=1}^{N} γ̄i).
c) Channel spatial correlation prior: We introduce the set of covariance matrices R̄ = {R̄i}_{i=1}^{N} such that each R̄i is a positive definite matrix that captures the spatial correlation between the entries of the ith row of X^T. A common and physically grounded prior for the covariance matrix R̄i of the Gaussian random variable xi is the inverse Wishart distribution [26], defined over the domain of positive-definite matrices, where Bi ∈ C^{M×M} is a symmetric positive-definite matrix that represents the prior information on the covariance matrix R̄i [26].
By utilising the definitions above, we model the effective channel xi, ∀i ∈ N, using the spike-and-slab prior as

p(xi | cl, γ̄i, R̄i) = (1 − cl) δ(xi) + cl CN(xi; 0, γ̄i R̄i),    (3)

where δ(·) denotes the Dirac delta function. The main idea in (3) can be summarized as follows:
• If cl = 0, the vector xi would have only the spike component, i.e., the delta function, from (3), thus leading to the estimation of a zero vector (xi = 0).
• If cl = 1, xi would have only the slab component from (3), in the form of a Gaussian random vector with covariance matrix γ̄i R̄i. Therefore, if γ̄i ≈ 0, the covariance matrix γ̄i R̄i of the slab component in (3) would be so small that we could safely estimate xi ≈ 0, whereas if γ̄i > 0, xi would be a non-zero Gaussian random vector.
Finally, to encode (ii)-(iii) into the Bayesian formulation, we propose the following hierarchical spike-and-slab prior on p(X|c, R̄):

p(X|c, R̄) = ∏_{l=1}^{Nc} [ (1 − cl) δ_{X_{Cl}} + cl ∏_{i∈Cl} CN(xi; 0, Ri) ],    (4)

where δ_{X_{Cl}} = ∏_{i∈Cl} δ(xi), and we denote for simplicity Ri = γ̄i R̄i.
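The spike-and-slab behaviour described above can be sketched as a sampler; this is an illustrative reading of (3), with hypothetical names, not an implementation detail of the paper.

```python
import numpy as np

def sample_spike_slab(c_l, gamma_bar, R_bar, rng):
    """Sample x_i from the spike-and-slab prior (3): a point mass at zero when
    the cluster is inactive, a CN(0, gamma_bar * R_bar) slab when it is active."""
    M = R_bar.shape[0]
    if c_l == 0 or gamma_bar == 0:
        return np.zeros(M, dtype=complex)           # spike component: x_i = 0
    L = np.linalg.cholesky(gamma_bar * R_bar)       # slab covariance factor
    z = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    return L @ z                                    # slab component: CN(0, gamma_bar * R_bar)
```

Note how γ̄ acts as a per-UE variance scale inside an active cluster: a slab with γ̄ ≈ 0 is indistinguishable in practice from the spike, which is exactly the intra-cluster sparsity mechanism described above.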

III. JUICE VIA EM-EP
A common approach in Bayesian inference when the hyper-parameter set Ξ is not known is to maximize the likelihood function p(Y|Ξ). However, in many cases, the likelihood p(Y|Ξ) is a non-convex function of Ξ and its global maximum cannot be found in closed form. Thus, Ξ can be obtained through type-II maximum likelihood estimation by finding a local maximum using the expectation-maximization (EM) framework. The classical EM iteratively alternates between two steps, namely, the E-step and the M-step. In the E-step, the current values of the hyper-parameters are used to evaluate the posterior distribution of interest. Subsequently, the hyper-parameters are re-estimated using the current statistics of the posterior distribution in the M-step [26].
In the JUICE context, for a given Ξ, the joint posterior distribution p(X, c|Y) is expressed as a product of three factors, namely, f1(X), f2(X, c), and f3(c), as given in (5). The disadvantage of the spike-and-slab prior is that it renders the computation of the posterior distribution a computationally demanding task. In particular, p(X, c|Y) in (5) cannot be computed exactly when N is large and, thus, it has to be estimated numerically. To this end, we resort to the expectation propagation (EP) algorithm to find a tractable approximation to the true posterior distribution p(X, c|Y) in (5).
Next, we derive a novel JUICE solution based on coupling the EP algorithm with the EM framework. More precisely, at any EM iteration (k): (i) given the set of hyper-parameters Ξ^(k−1), the EP framework is utilized to approximate the intractable posterior distribution p(X, c|Y) and, subsequently, to compute the posterior mean m^(k) and covariance Σ^(k); (ii) the M-step then updates the hyper-parameter set Ξ^(k) using these posterior statistics.

A. Main Idea of EP
The EP algorithm aims to iteratively approximate the true posterior distribution f(X, c) by a simpler distribution Q(X, c) that belongs to an exponential family. Each factor qk(·), k = 1, 2, 3, of the joint approximation Q(X, c) is obtained by iteratively minimizing the Kullback-Leibler (KL) divergence [26] as in (7), where Q\k(·) is termed the cavity distribution. The optimization problem (7) is convex with a unique global optimum, which is obtained by matching the expected sufficient statistics of the two distributions [26]. In the following section, we show in detail how to derive the approximation Q(X, c) through the EP framework.
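The moment-matching principle behind (7) can be illustrated in one dimension: among all Gaussians q, the minimizer of KL(p‖q) for a mixture p simply copies the mixture's overall mean and variance. A minimal sketch:

```python
import numpy as np

def match_moments(weights, means, variances):
    """Moment matching for a 1-D Gaussian mixture: the Gaussian q minimizing
    KL(mixture || q) has the mixture's overall mean and variance."""
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    m = np.sum(w * mu)                           # E[x]
    v = np.sum(w * (var + mu**2)) - m**2         # E[x^2] - E[x]^2 (law of total variance)
    return m, v
```

This is the scalar analogue of the matching that EP performs on the multivariate Gaussian-Bernoulli factors of this section.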

B. E-Step: Posterior approximation Via EP
The choice of the approximating factors in EP is not stringent; rather, it is flexible. Thus, we design q1(·), q2(·), q3(·) such that they: 1) offer tractability and closed-form updates, and 2) capture the important features of the true posterior distribution, such as the cluster-level and intra-cluster sparsity. To this end, we design the approximate factors accordingly, and we write the global approximation Q(X, γ, c) as in (11), where m = [m1, . . . , mN] and Σ = {Σi}_{i=1}^{N} are obtained by applying the product-of-two-Gaussians rule, as shown in Appendix A. We note that since q3(c) is the same as f3(c), it can be obtained directly, and we need to estimate only q1(X) and q2(X, c), as we show next.
1) Estimation of q1(X): We now describe how to compute the {m1,i, Σ1,i}_{i=1}^{N} of the first approximate term q1(X). Note that since both f1(X) and q1(X) have a Gaussian form, f1(X) can be approximated exactly by q1(X), independently of the values of the other approximate factors q2(·) and q3(·). Subsequently, we only have to set q1(·) = f1(·) at the start of the EP algorithm, and it can be kept constant afterwards. Now, let us rewrite the received signal in vector form as y = vec(Y), where ⊗ denotes the Kronecker product and the operation vec(·) stacks the columns of the matrix vertically. Subsequently, the vector form of the likelihood function f1(·) is given by (14). Similarly, we can write the vector form of q1(x) as in (15), where Σ̃1 = diag(Σ1,1, . . . , Σ1,N) is a block diagonal matrix and m̃1 = [m1,1^T, . . . , m1,N^T]^T. Note that f1(x) is a distribution of y conditioned on x, whereas q1(x) is a function of x that depends on y, m, and Σ. Thus, by writing the full Gaussian distributions in (14) and (15) and rearranging a few terms, the first and second moments of q1(x) follow in closed form.
2) Estimation of q2(·): Note that q2(X, c) in (9) factorizes into Nc independent mixed Gaussian-Bernoulli distributions q2(XCl, cl), l = 1, . . . , Nc, allowing for parallel updates of q2(X, c).
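The product and quotient (cavity) rules for Gaussian factors invoked here and in Appendix A can be sketched in precision (natural-parameter) form; a hedged stand-alone implementation, not the paper's exact derivation:

```python
import numpy as np

def gaussian_product(m1, S1, m2, S2):
    """Product of two Gaussian densities: N(m1,S1) N(m2,S2) ∝ N(m,S) with
    S = (S1^-1 + S2^-1)^-1 and m = S (S1^-1 m1 + S2^-1 m2)."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    S = np.linalg.inv(P1 + P2)                   # precisions add
    m = S @ (P1 @ m1 + P2 @ m2)
    return m, S

def gaussian_quotient(m, S, m2, S2):
    """Cavity computation: divide N(m,S) by the factor N(m2,S2);
    precisions subtract, recovering the remaining factor."""
    P, P2 = np.linalg.inv(S), np.linalg.inv(S2)
    Sc = np.linalg.inv(P - P2)
    mc = Sc @ (P @ m - P2 @ m2)
    return mc, Sc
```

The quotient rule is what produces the cavity distribution in the next step: removing one approximate factor from the global Gaussian approximation is a division of densities.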
In the following, we present in detail how to update each term q 2 (X C l , c l ).
a) Update Q\2,l(·): First, we compute the marginal cavity distribution Q\2,l(XCl, cl) by removing the contribution of q2(XCl, cl) from the global approximation Q(X, c) in (11), where Σ\2,l,i and m\2,l,i are obtained by utilizing the rule for the quotient of two Gaussian terms, as shown in Appendix A. Here, Gl,0 is the normalizing constant needed to ensure that f2(·)Q\2,l(·) integrates to unity, where bl is proportional to (1 − ϵ), as given in Appendix B.
b) Update Qnew(·): Since all the terms in Q(·) and f2(·) are drawn from a Gaussian distribution, the solution to the KL-divergence minimization is given by matching the moments of Qnew(·) and f2,l(·)Q\2,l(·).

October 17, 2023 DRAFT
To this end, we compute the sufficient statistics of f2,l(·)Q\2,l(·) with respect to both cl and xi, i ∈ Cl. Subsequently, the sufficient central moments of Qnew(·) are given in (20); the details of the derivations to obtain (20) are presented in Appendix B.
c) Update q2(XCl, cl): Finally, the updated q2(·) is computed as in (21), where the mean and the covariance matrices of the updated q2(XCl, cl) are computed via (22) and (23).

C. M-Step: Hyper-parameter Update

Once q2(X, c) is updated using the previous Ξ^(k−1), the new posterior mean m^(k) and posterior covariance Σ^(k) of Q(X, c) are updated using (12). Subsequently, the M-step at the kth EM iteration is carried out as in (24), where (a) is obtained by dropping the terms that do not depend on Ξ from the joint probability and noting that p(Ξ) = p(γ̄)p(R̄) by design.
1) γ̄-update: Note that the final expression in (24) shows that the prior p(γ̄) plays a role in the M-step. However, it has been shown that even a non-informative prior will lead to a sparse vector γ̄, as in the relevance vector machine [32]. Thus, we make the same simplification and drop p(γ̄) from (24). Subsequently, (24) decouples into N subproblems across γ̄ as in (25). Thus, the optimal solution is obtained by setting the gradient of the objective function in (25) with respect to γ̄i to zero, resulting in the update rule (26).
2) Covariance matrix {R̄i}-update: Taking inspiration from the argument presented in [33], we recognize that attempting to estimate all N covariance matrices R̄i using the data available at the BS would result in overfitting. Therefore, instead of estimating N covariance matrices R̄i, we estimate only Nc covariance matrices R̄Cl, l = 1, . . . , Nc. In particular, we make the assumption that all effective channels within a single cluster share the same spatial correlation structure, i.e., xi ∼ CN(0, γ̄i R̄Cl), ∀i ∈ Cl, ∀l = 1, . . . , Nc. Consequently, the optimization problem (24) with respect to R̄Cl decouples into L subproblems as in (27). Therefore, by applying the first-order optimality condition, R̄Cl is given by (28). Subsequently, we set R̄i^(k) = R̄Cl^(k), ∀i ∈ Cl.
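A hedged sketch of the two coupled M-step updates, using only the standard maximum-likelihood form for a Gaussian slab xi ∼ CN(0, γ̄i R̄Cl) and omitting the inverse-Wishart prior term; the exact update rules are those of (26) and (28), which this sketch only approximates.

```python
import numpy as np

def m_step(m_list, S_list, clusters, M, n_sweeps=3):
    """Hedged M-step sketch: per-UE variance gamma_bar_i and a covariance
    matrix shared by all UEs of a cluster, from the EP posterior second
    moments E[x_i x_i^H] = Sigma_i + m_i m_i^H (ML form, no prior term)."""
    second = [S + np.outer(m, m.conj()) for m, S in zip(m_list, S_list)]
    gamma = np.ones(len(m_list))
    R = {}
    for _ in range(n_sweeps):                    # alternate the coupled updates
        for l, idx in clusters.items():          # shared covariance per cluster
            R[l] = sum(second[i] / gamma[i] for i in idx) / len(idx)
        for l, idx in clusters.items():          # per-UE variance scale
            Rinv = np.linalg.inv(R[l])
            for i in idx:
                gamma[i] = np.real(np.trace(Rinv @ second[i])) / M
    return gamma, R
```

Sharing one R̄Cl per cluster is the overfitting safeguard described above: L second-moment matrices are averaged into a single estimate instead of fitting one covariance per UE.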

D. Algorithm Implementation
The details of the proposed algorithm, termed EM-EP, are summarized in Algorithm 1. EM-EP is run until ∥X^(k) − X^(k−1)∥F^2 < ϵstp or until a maximum number of iterations kmax is reached.
Next, we outline two practical implementation considerations of the EM-EP algorithm.

Algorithm 1: EM-EP
Input: received signal Y, pilot sequence matrix Φ, noise variance σ2, ϵ, ϵstp, {Bl}, l = 1, . . . , Nc, kc.
Compute the new covariance Σ2,i and the mean m2,i of q2(XCl, cl) using (22) and (23), respectively
Update the new posterior mean mi and covariance Σi using (12)
Update the set of hyper-parameters Ξ using (26) and (28)
Output: X = m
1) Non-positive Covariance Matrix: While Q(·) and Qnew(·) have to be proper distributions in the EP framework, the approximated terms q1(·) and q2(·), on the other hand, are not constrained to be proper distributions. In practice, some factors q2(·) may become improper, resulting in non-positive definite covariance matrices. In this work, for any non-positive definite matrix Σ2,i, ∀i, we add a small regularization parameter ζ > 0 to its diagonal elements to ensure that the covariance matrix is positive definite. Alternative solutions have also been proposed [35].
2) Pruning: In practice, to reduce the computational complexity of EM-EP, at each iteration we shrink the search space by ignoring non-active UEs when updating the approximate factors. To this end, we use the posterior mean of the cluster activity indicator, E_{f2,l Q\2,l}[cl], as a measure to prune non-active clusters. More precisely, for any cluster l deemed non-active, all the effective channels are set to zero, i.e., XCl = 0, and they are pruned from the model.
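The pruning step can be sketched as follows; the threshold value and the interface are illustrative, and c_post stands for the posterior activity estimate E[cl] mentioned above.

```python
import numpy as np

def prune_clusters(c_post, X, clusters, thr=1e-3):
    """Prune clusters whose posterior activity estimate E[c_l] falls below thr:
    their effective channels are set to zero and they leave the search space."""
    keep = []
    for l, idx in clusters.items():
        if c_post[l] < thr:
            X[:, idx] = 0.0                      # hard-set pruned channels: X_Cl = 0
        else:
            keep.append(l)                       # clusters that remain in the model
    return keep, X
```

Subsequent EP sweeps then update only the factors of the surviving clusters, which is where the complexity saving comes from.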

IV. ALTERNATIVE SOLUTION VIA ADMM
Owing to the proposed hierarchical spike-and-slab prior and the efficient joint posterior approximation obtained using the EP framework, the proposed EM-EP algorithm provides superior performance compared to the state-of-the-art algorithms, as we will show in the simulation analysis. However, the EM-EP algorithm, like most EP-based methods, comes with the burden of a high computational complexity. Therefore, in this section, we provide an alternative solution to the JUICE problem by relaxing the strong spike-and-slab prior with a weak prior based on the log-sum penalty [27] that still captures the essence of the hierarchical structure of the UE activity pattern. Furthermore, we reformulate the JUICE as a MAP estimation problem that is solved via ADMM in order to offer a computationally-efficient algorithm with closed-form updates that can be computed via simple analytical expressions. Next, we present the proposed solution in detail.

A. JUICE as MAP Estimation
In this section, we formulate the JUICE as a MAP estimation problem in order to: 1) identify the active clusters and their corresponding active UEs, 2) estimate the effective channel matrix X, and 3) estimate the unknown covariance matrices {Ri}_{i=1}^{N}. Subsequently, given the received signal Y, the MAP estimation problem is expressed as in (29), where (a) follows from the Markov chain {γ, R} → X → Y and since the maximization is independent of p(Y), and (b) follows from the likelihood function of the received signal model (1). Note that an alternative expression to the conditional PDF of X in (4) is given by (30), where I(a) is an indicator function that takes the value 1 if a ≠ 0, and 0 otherwise. Note that p(xi|γi, Ri) in (30) implies that if γi = 0, then xi equals 0 with probability one, whereas if γi = 1, the effective channel xi follows a complex Gaussian distribution with zero mean and covariance matrix pi Ri.

B. Cluster-Sparsity Promoting Prior via Log-Sum
Recall the definition of the effective channel xi = √pi γi hi, where the binary indicator variable γi ∈ {0, 1} controls the sparsity of the ith UE. Thus, assigning a sparsity-promoting prior to p(γ) is the key to obtaining a sparse solution to (29). A conventional choice for a tractable sparsity-promoting prior p(γ) is the log-sum penalty prior ∑_{i=1}^{N} log(γi + ϵ0), as it resembles most closely the canonical ℓ0-norm when ϵ0 → 0. Subsequently, we define the sparsity prior p(γ) in (31). Although the prior (31) is an appropriate choice as it 1) promotes sparsity and 2) is separable across the UEs, it ignores the hierarchical structure of the UEs' activity pattern. Therefore, to account for the cluster-level sparsity, we propose the cluster-sparsity-promoting prior Jc(·) in (32) that couples the UE activity indicator variables belonging to the same cluster, i.e., γi, i ∈ Cl, through the same log-sum penalty. Note that Jc(γ) promotes quite stringently solutions that have clustered sparsity, as it has the tendency to enforce all UEs within a cluster to be detected as active even if only one UE is active, being thereby susceptible to a high false alarm rate. Thus, Jc(γ) would face robustness issues in instances where the UE activity pattern does not exhibit a clustered structure.
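The two penalties can be written down directly; the sketch below (with the relaxation γi → ∥xi∥ introduced in the next subsection, and illustrative names) also demonstrates the claim that Jc favours clustered supports: two supports with identical per-UE norms receive identical Js values, but the clustered one receives a strictly smaller Jc.

```python
import numpy as np

def J_s(X, eps0=1e-2):
    """Separable log-sum penalty of (31): sum_i log(||x_i|| + eps0)."""
    return float(np.sum(np.log(np.linalg.norm(X, axis=0) + eps0)))

def J_c(X, clusters, eps0=1e-2):
    """Cluster-sparsity penalty of (32): one log-sum term per cluster,
    coupling all UEs of the cluster through the sum of their norms."""
    norms = np.linalg.norm(X, axis=0)
    return float(sum(np.log(norms[idx].sum() + eps0) for idx in clusters.values()))
```

Because log is concave, concentrating the same total norm inside one cluster drives one Jc term toward log(ϵ0), a large negative value, while spreading it keeps every term near zero; this is the mechanism behind the "quite stringent" clustering behaviour noted above.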

C. Proposed ADMM Solution
Before we derive the ADMM-based solution for (29), we make two technical choices: 1) the binary nature of γ renders the objective function in (29) intractable for large N. To overcome this challenge, we note that finding the index set {i | γi ≠ 0, i ∈ N} is equivalent to finding the index set {i | ∥xi∥ > 0, i ∈ N}. Thus, we can eliminate the variable γ from the MAP problem by approximating each γi by ∥xi∥ and by relaxing p(γ) with an equivalent prior function p(X) which depends on ∥xi∥, ∀i ∈ N. 2) Similarly to Section III-C2, we resort to estimating a shared covariance matrix for all the UEs within the same cluster.
By using the aforementioned arguments and introducing the regularization weights β1, β2, and β3 that control the emphasis on the priors with respect to the measurement fidelity term, the MAP estimation problem (29) can be equivalently rewritten as the regularized problem (33). Now, we propose an iterative solution based on a hierarchical algorithm with two loops, an outer loop and an inner loop. The central idea is to alternate p(X) between Jc(·) in (32) and Js(·) in (31). More precisely, in the outer loop, the algorithm enforces the detection of active clusters via the cluster-level sparsity-promoting function Jc. Subsequently, the algorithm runs an inner loop over the just-estimated active clusters to detect the individual active UEs belonging to them by using the sparsity-promoting prior Js. The algorithm details are presented next.

1) Outer Loop:
As we aim to detect the set of active clusters first, we enforce p(X) to promote the cluster-level sparsity by setting − log p(X) = ∑_{l=1}^{Nc} log(∑_{i∈Cl} ∥xi∥ + ϵ0). Since − log p(X) is concave, we apply a majorization-minimization (MM) approximation [36] to linearize − log p(X) with the weights gl^(kc) = (∑_{i∈Cl} ∥xi^(kc)∥ + ϵ0)^−1, where kc is the MM iteration index for the outer loop. Thus, the relaxed version of the problem (33) can be solved iteratively as in (34). We develop a computationally efficient ADMM solution for (34) through a set of sequential update rules, each computed in closed form. To this end, we introduce two splitting variables Z, V ∈ C^{M×N} and the Lagrange dual variables Λv, Λz, and define the set of variables to be estimated as Ω = {X, R, Z, V, Λz, Λv}. Subsequently, we write the augmented Lagrangian L(Ω) as in (35). The ADMM solves the optimization problem (34) by minimizing the augmented Lagrangian L(Ω) in (35) over the primal variables (Z, V, X, Σ), followed by updating the dual variables [37]. The primal variable updates are given in (36).
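The MM linearization of the outer loop can be sketched as follows: since log(s + ϵ0) is concave in s = ∑_{i∈Cl} ∥xi∥, it is upper-bounded by its tangent at the current iterate, and the tangent's slope is precisely the weight gl. Names are illustrative.

```python
import numpy as np

def mm_weights(X, clusters, eps0=1e-2):
    """Outer-loop MM weights: g_l = (sum_{i in C_l} ||x_i|| + eps0)^(-1),
    the slope of the tangent majorizer of log(s + eps0) at the current iterate."""
    norms = np.linalg.norm(X, axis=0)
    return {l: 1.0 / (norms[idx].sum() + eps0) for l, idx in clusters.items()}

def majorized_penalty(X, g, clusters):
    """Weighted group-norm surrogate sum_l g_l * sum_{i in C_l} ||x_i||,
    which replaces the concave log-sum penalty inside each MM iteration."""
    norms = np.linalg.norm(X, axis=0)
    return float(sum(g[l] * norms[idx].sum() for l, idx in clusters.items()))
```

Each MM iteration therefore minimizes a convex weighted group norm rather than the original concave penalty, which is what makes the subsequent ADMM sub-problems solvable in closed form.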
2) Inner Loop: After running the outer loop for some pre-defined K_c iterations, we detect the set of estimated active clusters Ŝ = ∪_{j∈J} C_j, where l ∈ J if there exists i ∈ C_l such that ∥x_i∥ > ϵ_thr, with ϵ_thr > 0 a small predefined parameter. In the inner loop, the proposed algorithm aims to detect the active UEs belonging to Ŝ by using the separable sparsity-promoting prior −log p(X) ∝ Σ_{i∈Ŝ} log(∥x_i∥ + ϵ_0). Furthermore, we apply the MM approximation to linearize the concave −log p(X), yielding the per-UE weights g_i^{(k_u)} = (∥x_i^{(k_u)}∥ + ϵ_0)^{−1}, where k_u is the inner-loop iteration index. Subsequently, the optimization problem for the inner loop is given by (37), where Φ_Ŝ and X_Ŝ denote the matrices Φ and X, respectively, restricted to the set Ŝ.

Algorithm 2: corr-MAP-ADMM
Input: Φ, {B_l}_{l=1}^{N_c}, β_1, β_2, β_3, ρ, ϵ_0, ϵ, k_u^max, k_c^max, K_c.
Initialization:
1 BS receives Y, computes and stores (Φ^T Φ* + ρ I_N)^{−1}
2 while k_c < k_c^max and ∥X^{(k_c)} − X^{(k_c−1)}∥ > ϵ do
3 Solve (37) using update rules similar to steps 3-6, but with the inner-loop weights g

3) ADMM Update Rules: The details of the proposed algorithm in Section IV, termed corr-MAP-ADMM, are summarized in Algorithm 2. Note that, owing to the proposed splitting technique in (35), all the sub-problems in (36) are convex and can thus be solved analytically via closed-form formulas. Further, the optimization over X, V, and Σ is separable over the UEs and the clusters, allowing for parallel updates. The exact solution to (36) is given in steps 3-6 of Algorithm 2, and we follow the same approach to solve (37) in the inner loop.
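The closed-form structure of the ADMM sub-problems can be illustrated on a simplified, real-valued weighted group-lasso variant (our own sketch, not the exact corr-MAP-ADMM updates; the covariance-related R/Σ steps are omitted): the quadratic sub-problem reduces to a cached linear solve, the sparsity prox step is a row-wise soft-threshold, and the dual update is a running sum.

```python
import numpy as np

def block_soft(V, t):
    """Row-wise soft-thresholding: shrink the l2 norm of each row of V by t."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return scale * V

def group_admm(Y, A, w, rho=1.0, iters=200):
    """ADMM for min_X 0.5*||Y - A X||_F^2 + sum_i w_i*||x_i||_2 (rows x_i of X)."""
    N = A.shape[1]
    Z = np.zeros((N, Y.shape[1])); U = np.zeros_like(Z)
    P = np.linalg.inv(A.T @ A + rho * np.eye(N))   # cached once, as in step 1
    AtY = A.T @ Y
    for _ in range(iters):
        X = P @ (AtY + rho * (Z - U))              # closed-form quadratic step
        Z = block_soft(X + U, w[:, None] / rho)    # group-sparsity prox step
        U = U + X - Z                              # scaled dual update
    return Z

rng = np.random.default_rng(2)
T, N, d = 30, 40, 8
A = rng.standard_normal((T, N)) / np.sqrt(T)
X_true = np.zeros((N, d)); X_true[[3, 17]] = rng.standard_normal((2, d))
Y = A @ X_true                                     # noise-free measurements
X_hat = group_admm(Y, A, w=0.1 * np.ones(N))
print(np.flatnonzero(np.linalg.norm(X_hat, axis=1) > 0.1))  # should recover rows 3 and 17
```

Reweighting amounts to re-running the same iteration with updated per-row weights w, which is what the outer and inner loops of corr-MAP-ADMM alternate over.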

V. SIMULATION RESULTS
This section quantifies the performance and robustness of the proposed algorithms and compares them to existing sparse recovery algorithms in terms of active-UE identification accuracy, channel estimation quality, and convergence behaviour.

A. Simulation Setup
We consider a network with a single BS serving N = 200 UEs distributed equally over N_c = 20 clusters, with a total of K = 16 active UEs. Each UE is assigned a unique unit-norm pilot sequence drawn from an i.i.d. complex Bernoulli distribution. We set each B_l, ∀l, as follows.
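A minimal sketch of this pilot construction, assuming the common ±1±j instantiation of the complex Bernoulli alphabet (the paper's exact alphabet and normalization are not reproduced here; τ_p = 20 follows the setup below):

```python
import numpy as np

# Generate unit-norm pilots: tau_p x N matrix with i.i.d. +-1 +-1j entries,
# each column scaled to unit norm.
rng = np.random.default_rng(3)
tau_p, N = 20, 200
Phi = (rng.choice([-1, 1], (tau_p, N))
       + 1j * rng.choice([-1, 1], (tau_p, N)))
Phi /= np.sqrt(2 * tau_p)   # each entry has modulus sqrt(2), so columns get unit norm
print(np.allclose(np.linalg.norm(Phi, axis=0), 1.0))  # -> True
```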

B. Metrics and Baselines
The performance is evaluated in terms of: 1) channel estimation quality, measured by the normalized mean square error (NMSE), and 2) activity detection accuracy, measured by the support recovery rate (SRR), defined as |S ∩ Ŝ| / (|S − Ŝ| + K), where Ŝ denotes the support detected by a given algorithm. We compare the performance of the proposed algorithms against three algorithms that solve the MMV sparse recovery problem, namely: 1) iterative reweighted ℓ_{2,1} (IRW-ℓ_{2,1}) via ADMM [18], 2) MAP-ADMM [19], and 3) T-SBL [33], where for MAP-ADMM and T-SBL both the second-order statistics and the noise variance are known at the BS. Furthermore, for an optimal NMSE benchmark, we consider the oracle minimum mean square error (MMSE) estimator that is provided with "oracle" knowledge of the true set of active UEs.
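The two metrics can be sketched as follows (the NMSE expression below is the standard definition in dB, since the paper's exact formula is not reproduced here; the SRR follows the definition above, reading S − Ŝ as set difference):

```python
import numpy as np

def nmse_db(X_true, X_hat):
    """Standard NMSE in dB: ||X_hat - X_true||_F^2 / ||X_true||_F^2."""
    return 10 * np.log10(np.linalg.norm(X_hat - X_true)**2
                         / np.linalg.norm(X_true)**2)

def srr(S_true, S_hat, K):
    """Support recovery rate: |S ∩ S_hat| / (|S − S_hat| + K)."""
    S_true, S_hat = set(S_true), set(S_hat)
    return len(S_true & S_hat) / (len(S_true - S_hat) + K)

S = {1, 2, 3, 4}
print(srr(S, {1, 2, 3, 4}, K=4))   # perfect detection -> 1.0
print(srr(S, {1, 2, 5, 6}, K=4))   # 2 hits, 2 misses -> 2 / (2 + 4)
```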
In practice, the regularization parameters of sparse recovery algorithms need to be fine-tuned. For a fair comparison, all parameters were tuned empirically via cross-validation in advance and then fixed so as to provide the best performance of each algorithm.

C. Correlated Activity Pattern
First, we investigate the performance of the proposed EM-EP and corr-MAP-ADMM algorithms and compare them to the baseline algorithms. To this end, the 16 active UEs are distributed over 2 active clusters, each containing 8 active UEs. Furthermore, the results emphasize the superiority of the proposed EM-EP in terms of channel estimation quality, providing an approximate gain of 4 dB over MAP-ADMM and T-SBL.
3) Effect of the Number of BS Antennas: Fig. 4 illustrates the effect of the number of BS antennas (M) on the performance of the proposed algorithms in terms of NMSE. As expected, all the algorithms experience an improvement in channel estimation quality as M increases.
Notably, the EM-EP algorithm substantially outperforms the other sparse recovery algorithms, with remarkable gains in the lower range of BS antennas, i.e., M < 12. In addition, corr-MAP-ADMM provides the second-best performance for M < 12. However, as M increases, the performance gap between the algorithms gradually decreases. This can be attributed to the fact that acquiring more measurements from additional antennas yields greater gains than incorporating side information, such as the correlated activity structure.
In summary, the results in Fig. 2, Fig. 3, and Fig. 4 clearly demonstrate the substantial gains achieved by incorporating the structured activity pattern when designing solutions for the JUICE problem. Indeed, even with no prior knowledge of the CDI, the proposed algorithms outperform state-of-the-art recovery algorithms that utilize prior knowledge of the full CDI.

D. Robustness to Model Mismatch
We now investigate the robustness of the proposed algorithms in scenarios of sparsity model mismatch, where discrepancies exist between the actual scenario and the model imposed by the algorithm. A sparsity model mismatch occurs when the UEs' activity is independent rather than exhibiting a correlated pattern.
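For illustration, the two activity models under comparison can be generated as follows (a sketch using the setup's parameters N = 200, N_c = 20, K = 16; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
N, Nc, K = 200, 20, 16
clusters = np.split(np.arange(N), Nc)   # 20 contiguous clusters of 10 UEs

# correlated pattern: K UEs drawn from 2 active clusters (8 each)
act_clusters = rng.choice(Nc, 2, replace=False)
corr_active = np.concatenate(
    [rng.choice(clusters[l], 8, replace=False) for l in act_clusters])

# mismatched (independent) pattern: K UEs drawn uniformly over all N UEs
iid_active = rng.choice(N, K, replace=False)

print(len(set(corr_active)), len(set(iid_active)))  # -> 16 16
```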

E. Convergence Behaviour
The computational complexity per iteration of EM-EP is mainly dominated by the Σ update in (12) and requires O(M^3 N^3) complex multiplications. In contrast, corr-MAP-ADMM's primary computational complexity arises from the V and the R updates.

Subsequently, by following the same steps, the nth moment with respect to x_i is computed as in (41). Note that the integral term represents the nth moment of a multivariate Gaussian random vector; thus, the mean E_{f_2 Q^{\2,l}}[x_i] and the variance Var_{f_2 Q^{\2,l}}[x_i] are deduced directly from (41) and are given in (20). Finally, the posterior mean with respect to c_l is computed as in (42).

Fig. 1: Illustration of an MTC network with an activity pattern centred around a few clusters.

c_l as a Bernoulli random variable with p(c_l = 1) = ϵ and p(c_l = 0) = 1 − ϵ. Furthermore, to account for the independence amongst the clusters' activity, we express the probability mass function as p(c) = Π_{l=1}^{N_c} B(c_l; ϵ), where B denotes the Bernoulli distribution.

October 17, 2023 DRAFT

of the effective channel x_i, ∀i ∈ N. (ii) In the M-step, by utilizing the sufficient statistics of X obtained from the previous E-step, we compute the new values of the hyperparameters Ξ^{(k)} by minimizing a lower bound on the negative log-likelihood p(Y|Ξ). Next, we present the details of the proposed algorithm.

9 if k_c mod K_c = 0 then
10 Ŝ = ∪_{j∈J} C_j, {l ∈ J : ∃ i ∈ C_l, ∥x_i∥ > ϵ}
11 while k_u < k_u^max do
12
where Ψ_l is a random positive-definite Hermitian matrix used to model the error in the prior knowledge of the covariance matrix R_{C_l}, whereas the parameter ζ controls the level of average mismatch between R_i and B_l, ∀i ∈ C_l, and it is set as ζ = 0.1.

Fig. 4: JUICE performance in terms of NMSE versus the number of BS antennas M for N = 200, τ_p = 20, and SNR = 15 dB.
The resulting computational complexity is in the order of O(N M^2 + N_c M^3). For reference, MAP-ADMM and T-SBL require O(M N^2 + N M^2) and O(N^2 M^3 τ_p) complex multiplications, respectively. Nonetheless, despite being more computationally demanding than corr-MAP-ADMM, EM-EP exhibits a faster convergence rate, requiring only about 10 iterations to converge, in contrast to corr-MAP-ADMM, which takes up to 60 iterations, as shown in Fig. 6.
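As a rough sanity check on these orders, the per-iteration counts and the iteration numbers quoted above imply the following total-cost comparison (illustrative dimensions; constants and lower-order terms omitted):

```python
# Back-of-the-envelope comparison of the per-iteration complexity orders
# quoted above. M, N, Nc, tau_p values are illustrative, not from the paper.
M, N, Nc, tau_p = 32, 200, 20, 20

per_iter = {
    "EM-EP":         M**3 * N**3,
    "corr-MAP-ADMM": N * M**2 + Nc * M**3,
    "MAP-ADMM":      M * N**2 + N * M**2,
    "T-SBL":         N**2 * M**3 * tau_p,
}
# total cost = per-iteration cost x typical iteration count reported above
total = {"EM-EP": 10 * per_iter["EM-EP"],
         "corr-MAP-ADMM": 60 * per_iter["corr-MAP-ADMM"]}
print(total["EM-EP"] > total["corr-MAP-ADMM"])  # -> True
```

Even with its faster convergence, EM-EP's higher per-iteration cost dominates for these dimensions, consistent with corr-MAP-ADMM being the computationally cheaper option overall.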