Access Point Cooperation Strategies for Coded Random Access in Cell-Free Massive MIMO

In this article, grant-free uplink communication from a large number of machine-type devices in cell-free massive MIMO networks is explored. A novel approach is presented that pairs coded random access (CRA) on the device side with signal combining at properly selected access points (APs) and cooperative successive interference cancelation (SIC) on the network side. First, an analytical framework based on stochastic geometry is developed to investigate the performance of AP cooperation through signal combining under diverse AP cluster compositions. The potential gain from AP signal combining is then assessed by evaluating a genie-aided scheme that guides the network in selecting the cluster for each active device. Subsequently, two practical AP selection algorithms that operate in grant-free conditions (i.e., that do not require prior information about the active users) are proposed. Numerical results show that AP cooperation through signal combining and distributed interference cancelation brings tangible benefits even without prior information about active users, under different signal-to-noise ratio regimes, in some cases closing the gap to the genie-aided approach. Additionally, the results show that AP cooperation can reduce the devices' energy consumption and the number of APs that service providers must deploy to achieve a given performance level.

Enrico Testi, Member, IEEE, Velio Tralli, Senior Member, IEEE, and Enrico Paolini, Senior Member, IEEE

I. INTRODUCTION
The ever-growing integration of the Internet of Things (IoT) in several application domains has prompted an explosion in the number of connected smart objects, leading to the rise of massive machine-type communications (mMTC) [1], [2]. In 6G, mMTC systems will need to meet scalability and device battery lifetime requirements more stringent than those of 5G, with use case-specific reliability and latency levels [3], [4]. In this context, enhanced medium access control (MAC) and physical (PHY) layer schemes for next-generation massive multiple access (MMA), as well as innovative network architectures, will be paramount to address the challenges posed by growing network densities and heterogeneous service requirements.

Enrico Testi and Enrico Paolini are with the CNIT/WiLab, Department of Electrical, Electronic and Information Engineering, University of Bologna, 47522 Cesena, Italy (e-mail: enrico.testi@unibo.it; e.paolini@unibo.it).

Digital Object Identifier 10.1109/JIOT.2024.3428922
Concerning innovative architectures, cell-free massive MIMO (CF-mMIMO) networks have recently emerged as a promising alternative to traditional cell-based ones [5], [6], [7], [8], [9]. This architectural concept departs from the classical cell-based structure in that a massive number of access points (APs) is distributed across the network area. The AP operations are coordinated by one or more central processing units (CPUs) with powerful computation capabilities. In this way, the system shifts from a "base station (BS)-centric" model to a "user-centric" one, where each user accesses the network through the cluster of APs nearest to it, so as to maximize its quality of service at all times [10]. The decentralized nature of CF-mMIMO networks, where each AP is equipped with a relatively small number of antennas, retains the benefits of massive MIMO, achieving high multiplexing and array gains through distributed signal processing. The relatively short links between devices and APs provide more uniform and equitable coverage, thereby enabling connected devices to operate at lower transmit power levels, a regime of great interest in mMTC. Such network architectures, augmented by emerging technologies like reconfigurable intelligent surfaces (RIS), extremely large MIMO (XL-MIMO), and holographic MIMO, are among the most promising candidates for next-generation wireless networks [11], [12], [13].
In terms of innovative grant-free access protocols supporting MMA, coded random access (CRA) protocols [14] have recently gained interest owing to their capability to achieve higher reliability than current mMTC solutions, even with a large number of active uncoordinated devices (e.g., [15], [16], [17], [18], and [19]). Based on the idea of combining packet diversity with successive interference cancelation (SIC) across multiple resources at the receiver, these protocols can provide remarkable scalability gains [14]. In the CRA protocol considered in this work, an active user transmits multiple replicas of its data payload in different time slots; whenever any such replica is decoded in a slot, the interference generated by the decoded packet and by its twins in the corresponding slots is subtracted from the received signal samples. Reprocessing the signal in these slots can then lead to the decoding of additional messages. Bridging massive access with modern codes-on-graphs, these protocols have so far been studied mainly in a centralized context, where all active users contend to deliver their messages to the same BS. To the best of the authors' knowledge, the only work considering CRA in cell-free networks is [21].
In the currently available scientific literature, although massive access is a predominant use case in the context of CF-mMIMO systems, only a few facets of it have been investigated. In [22] and [23], activity detection was tackled as a maximum likelihood (ML) estimation problem, while [24] addressed joint activity detection, channel estimation, and data estimation via belief propagation. Distributed and centralized processing for joint activity detection and channel estimation with nonorthogonal random pilot sequences was discussed in [25]. Strategies aimed at minimizing the impact of pilot contamination in CF-mMIMO networks through power control and pilot assignment have been proposed recently [26], [27], [28], [29], [30]. In both [27] and [28], a preliminary phase, during which a generic user equipment (UE) is associated with a subset of the available APs, is followed by a practical strategy to assign orthogonal pilots to UEs, aiming at reducing the mutual interference and improving the spectral efficiency. Conversely, Rahmani et al. [29] assumed that all APs receive the signals of all UEs and developed a scalable and distributed pilot assignment mechanism via multiagent reinforcement learning. In [26] and [30], power control is formulated as an optimization problem to reduce the effect of pilot contamination; the objective of [30] is to maximize the minimum signal-to-interference-plus-noise ratio (SINR), while that of [26] is to enforce a desired minimum SINR across all the UEs. In addition, [31] and [32] focused on the downlink. In [21], a CRA scheme tailored to CF-mMIMO networks was described. There, the APs process the received signals individually while the CPU orchestrates a distributed SIC procedure with very low overhead on the fronthaul links.
In this article, following the preliminary work in [20], differently from [21], we investigate the possibility of enhancing the performance of massive grant-free access in CF-mMIMO networks (with reference to a CRA transmission scheme) through a judicious selection of subsets of APs, whose signals are combined to detect and decode the received messages.Owing to the assumed fully grant-free nature of the access protocol, the AP selection must be performed at the CPU level based solely on the signal samples received at the APs; this leads to a compelling challenge since no foreknowledge can be assumed about the number of active users, their spatial positions, and the pilot each of them employs.
The main contributions of this work are the following.
1) We introduce and systematically analyze different levels of cooperation among APs in grant-free CF-mMIMO networks, exploring how they influence network performance. This includes the novel concept of cooperative detection and decoding of user messages through dynamically formed clusters of APs.
2) We derive an analytical framework based on stochastic geometry to study the performance of AP cooperation in packet detection and decoding under various AP cluster compositions.
3) We analyze the potentially attainable gain of cooperative detection and decoding using a genie-aided approach, where AP clusters are optimally selected based on full knowledge of the system. This provides a benchmark to evaluate the effectiveness of practical cooperative schemes, thereby motivating the development of efficient cooperation mechanisms.
4) We propose two novel schemes for "uninformed" cluster formation, i.e., cluster formation where the system operates without prior information about active users, apart from the set of available pilots from which each user picks individually, in a grant-free manner.
5) Finally, extensive simulations are carried out to assess the performance of the proposed AP cooperation schemes. These simulations compare the proposed methods against noncooperative and basic cooperative schemes. Additional considerations regarding the possibility of exploiting AP cooperation to reduce the devices' energy consumption and the number of APs deployed in the service area are presented.

The system model and the CRA-based processing schemes with no AP signal combining are presented in Section II. The analytical framework for the performance analysis of AP signal combining is presented in Section III. The genie-aided and practical AP cooperation strategies are described in Section IV. Numerical results are presented in Section V, and conclusions are drawn in Section VI.

II. SYSTEM MODEL AND BACKGROUND
We consider a CF-mMIMO system with $M$ APs, each with $N$ antennas, deployed in a given area. The APs are connected to a CPU via fronthaul links, as depicted in Fig. 1, and jointly serve a massive number of machine-type devices (not all simultaneously active) under CPU coordination on the same time-frequency resources. In the figure, the APs are arranged on a grid, although this is not required by the developed algorithms. The uplink is synchronous, i.e., there is a common time reference between devices, APs, and CPU. Time is segmented into frames, each divided into $N_s$ slots. Time synchronization is achieved through a beacon signal transmitted by the APs at the start of each frame. Such synchronization signals are standard in cellular networks, e.g., the primary and secondary synchronization signals in 5G [5]. Let us also remark that, while perfect time synchronization may be challenging in large networks due to propagation delays, the proposed system model remains valid if each AP is synchronized with its neighboring APs and users synchronize with the APs in their immediate vicinity [5]. After waking up, an active device waits for the beacon and then contends for the transmission of one $k$-bit message $W \in [2^k]$ within the subsequent frame.
Let us consider a generic frame and assume that $K$ devices are active over the frame, randomly distributed in the network area according to a uniform spatial distribution. All devices encode their messages with the same channel encoder and map the coded bits onto the same complex constellation, obtaining a data payload $x \in \mathbb{C}^{1\times N_D}$ of $N_D$ complex symbols. An active device selects $R$ slots in $[N_s]$ and transmits a packet $[p, x]$ in each such slot, where $p \in \mathbb{C}^{1\times N_P}$ is a pilot sequence drawn from a set $\mathcal{P}$ of $P \ll K$ mutually orthogonal pilots. Note that, while the same payload $x$ is transmitted in all the $R$ chosen slots, the pilot $p$ may change from slot to slot. Both the $R$ slot indexes and the $R$ pilots are generated as functions of the random message $W$ (e.g., through the algorithm described in [33]). Although the pilot may change, we refer to the transmission of a user in each of its $R$ chosen slots as a "replica." Each replica arriving at an AP is assumed to be perfectly aligned with one of the frame slots.
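The protocol only requires that the slot and pilot choices be deterministic functions of $W$, so that any AP (or the CPU) that decodes one replica can locate all of its twins. A minimal sketch of such a mapping follows, assuming a SHA-256 digest of the message as the seed of a pseudorandom generator; the function `replica_pattern` and its parameters are hypothetical, and this is not the algorithm of [33].

```python
import hashlib
import random


def replica_pattern(message, k, n_slots, r, n_pilots):
    """Map a k-bit message W to R (slot, pilot) pairs (hypothetical scheme).

    The generator is seeded with a SHA-256 digest of W, so any receiver that
    decodes W can regenerate exactly the same slots and pilots, which is what
    lets it locate and cancel the twins of a decoded replica.
    """
    if not 0 <= message < 2 ** k:
        raise ValueError("message must be a k-bit integer")
    if r > n_slots:
        raise ValueError("cannot pick more replicas than slots")
    digest = hashlib.sha256(message.to_bytes((k + 7) // 8, "big")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    slots = sorted(rng.sample(range(n_slots), r))  # R distinct slots in [N_s]
    # One pilot index per replica: the pilot may change from slot to slot.
    return [(s, rng.randrange(n_pilots)) for s in slots]


pattern = replica_pattern(message=0b1011, k=16, n_slots=100, r=3, n_pilots=32)
# Recomputing from the decoded message yields the same pattern.
assert pattern == replica_pattern(0b1011, 16, 100, 3, 32)
```

Any seeded pseudorandom generator would do; the hash only spreads nearby message values over unrelated patterns.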
All devices employ the same transmit power $\rho$. Consider a generic slot and let $\mathcal{U}$ be the subset of active users transmitting a replica in it. Each user in $\mathcal{U}$ first transmits the pilot corresponding to its message. The corresponding vector of signal samples received by antenna $n$ of AP $m$, $y_{mn,P} \in \mathbb{C}^{1\times N_P}$, is
$$ y_{mn,P} = \sum_{i \in \mathcal{U}} \sqrt{\rho}\, h_{mni}\, p(W_i) + w_{mn} \tag{1} $$
where $p(W_i) \in \mathbb{C}^{1\times N_P}$ is the pilot, a function of the message $W_i$ transmitted by user $i$, $h_{mni}$ is the channel gain between antenna $n$ of AP $m$ and device $i$, and $w_{mn} \sim \mathcal{CN}(0_{N_P}, \sigma_N^2 I_{N_P\times N_P})$ is a vector of noise samples. Pilot transmissions are followed by payload transmissions. The corresponding vector of received signal samples at antenna $n$ of AP $m$, $y_{mn,D} \in \mathbb{C}^{1\times N_D}$, is
$$ y_{mn,D} = \sum_{i \in \mathcal{U}} \sqrt{\rho}\, h_{mni}\, x(W_i) + \nu_{mn} \tag{2} $$
where $x(W_i)$ is the data payload of message $W_i$ transmitted by user $i$, and $\nu_{mn} \sim \mathcal{CN}(0_{N_D}, \sigma_N^2 I_{N_D\times N_D})$ is a vector of noise samples. The coefficients $h_{mni}$, for each user $i$, AP $m$, and antenna $n$, are complex channel gains that are constant over an entire slot and follow a Rayleigh block fading channel model.
The channel gains are modeled as $h_{mni} = g_{mni}\sqrt{\beta_{mi}}$, where $g_{mni} \sim \mathcal{CN}(0, 1)$ and $\beta_{mi}$ are the small-scale and large-scale fading coefficients, respectively. We have
$$ \beta_{mi} = \frac{10^{\sigma_S u_{mi}/10}}{\mathrm{PL}_{mi}} \tag{3} $$
where $\mathrm{PL}_{mi}$ is the path-loss between user $i$ and AP $m$, $\sigma_S$ is the shadowing intensity (in dB), and $u_{mi} \sim \mathcal{N}(0, 1)$. For each $(m, n, i)$, $g_{mni}$ remains constant through a slot and varies independently from slot to slot, while $\beta_{mi}$ stays constant over the whole frame. We let $g_{mni}$ and $g_{pqj}$ be independent for any $(m, n, i) \neq (p, q, j)$. Similarly, $\beta_{mi}$ and $\beta_{pj}$ are independent for any $(m, i) \neq (p, j)$. In this work, we use two distinct path-loss models: 1) a general path-loss model for analytically deriving (in Section III-A) a performance metric that captures the interplay between pilot contamination and diversity gain when forming clusters of cooperating APs; and 2) a specific path-loss model, closely aligned with the 3GPP Urban Microcell model, for evaluating (in Section V-B) the performance of the proposed processing schemes.
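To make the two time scales of the fading model concrete, the following sketch draws the channel gains of one frame: the large-scale coefficients once per frame and the small-scale coefficients once per slot. It assumes the lognormal form $\beta_{mi} = 10^{\sigma_S u_{mi}/10}/\mathrm{PL}_{mi}$ with $\mathrm{PL}_{mi}$ a linear-scale attenuation, and uses a bounded attenuation $L_0(1+r^\alpha)$ only as a placeholder; `draw_frame_channels` and `bounded_pathloss` are hypothetical names.

```python
import math
import random


def draw_frame_channels(dist, n_ant, n_slots, sigma_s_db, pathloss, rng):
    """Draw h[s][m][n][i] for one frame under Rayleigh block fading.

    dist[m][i] is the AP-device distance. The large-scale coefficient
    beta[m][i] is drawn once per frame, while g is redrawn in every slot.
    Assumed form: beta = 10**(sigma_s_db * u / 10) / pathloss(r), u ~ N(0, 1).
    """
    n_ap, n_dev = len(dist), len(dist[0])
    beta = [[10 ** (sigma_s_db * rng.gauss(0.0, 1.0) / 10) / pathloss(dist[m][i])
             for i in range(n_dev)] for m in range(n_ap)]
    std = math.sqrt(0.5)  # per real dimension, so that g ~ CN(0, 1)
    h = [[[[complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) * math.sqrt(beta[m][i])
            for i in range(n_dev)]
           for _ in range(n_ant)]
          for m in range(n_ap)]
         for _ in range(n_slots)]
    return beta, h


def bounded_pathloss(r, l0=1.0, alpha=4.0):
    """Placeholder bounded attenuation L0 * (1 + r**alpha)."""
    return l0 * (1.0 + r ** alpha)
```

Returning `beta` alongside `h` makes it easy to check that the large-scale coefficients are shared by all slots of the frame.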

A. Noncooperative CRA
We first delineate a processing scheme that does not foresee any kind of cooperation among the APs, which we refer to as noncooperative (NC). In this scheme, obtained as a simplified noncooperative version of the scheme proposed in [21], each AP independently attempts to decode the transmitted packets using only its own received signal, without exchanging information with other APs or the CPU. In the NC scheme, the SIC process only mitigates interference from packets detected locally. Additionally, compared to the scheme in [21], we adapt the processing method to accommodate multiple available pilots and multiple antennas at the APs.

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
The NC processing is logically split into two phases, namely, an initial slot-by-slot signal detection and decoding followed by a SIC process across the slots, both individually executed by each AP.The two phases are described as follows.
1) Slot-by-Slot Decoding: In the generic slot $s \in [N_s]$ (in the following, we omit the index $s$ to keep the notation light), each AP first processes the received pilot signal samples. The pilot signal in (1) received at antenna $n$ of AP $m$ is projected along pilot $j$, $j \in [P]$, and the ML channel estimate is obtained as
$$ \hat h_{mn}(j) = \frac{y_{mn,P}\, p_j^H}{\sqrt{\rho}\, \|p_j\|^2} \tag{4} $$
where $p_j$ denotes the $j$th pilot in $\mathcal{P}$. Then, maximal ratio combining (MRC) is performed on the payload samples using the estimate in (4), as
$$ z_m(j) = \frac{\hat h_m(j)^H\, Y_{m,D}}{\sqrt{\rho}\, \|\hat h_m(j)\|^2} \tag{5} $$
where $\hat h_m(j) = [\hat h_{m1}(j), \ldots, \hat h_{mN}(j)]^T$ is the vector of channel gains estimated over pilot $j$, and $Y_{m,D} = [y_{m1,D}^T, \ldots, y_{mN,D}^T]^T$ is the matrix of received payload symbols aggregated over all the AP antennas. If the message sent by a device in the processed slot using pilot $j$ is retrieved upon successful demapping and decoding of $z_m(j)$, the message is pushed to a local list $\mathcal{L}_m$ located at AP $m$. This operation is repeated by each AP in all slots and for all pilots.
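As an illustration of the per-AP slot processing, the sketch below implements the pilot projection and the MRC step for a single AP using plain Python complex arithmetic. The normalizations (division by $\sqrt{\rho}\|p_j\|^2$ and by $\sqrt{\rho}\|\hat h_m(j)\|^2$) are assumptions chosen so that, without noise and interference, the estimate equals the true channel and the combiner output equals the transmitted payload; the function names are hypothetical.

```python
import math


def ml_channel_estimate(y_pilot, pilot, rho):
    """Project one antenna's received pilot samples onto a known pilot.

    Assumed normalization: h_hat = y p^H / (sqrt(rho) * ||p||^2).
    """
    num = sum(y * p.conjugate() for y, p in zip(y_pilot, pilot))
    energy = sum(abs(p) ** 2 for p in pilot)
    return num / (math.sqrt(rho) * energy)


def mrc(h_hat, y_payload, rho):
    """Maximal ratio combining of the N antennas of one AP.

    h_hat: N channel estimates; y_payload: N rows of N_D payload samples.
    Returns the N_D soft symbols h_hat^H Y / (sqrt(rho) * ||h_hat||^2).
    """
    norm = math.sqrt(rho) * sum(abs(h) ** 2 for h in h_hat)
    n_d = len(y_payload[0])
    return [sum(h.conjugate() * row[t] for h, row in zip(h_hat, y_payload)) / norm
            for t in range(n_d)]
```

Because the pilots are orthogonal, a device using a different pilot in the same slot leaves the estimate on pilot $j$ untouched; only co-pilot devices contaminate it, whereas payload interference from all devices still reaches the MRC output.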
2) SIC: In NC processing, the $m$th AP processes the messages in its list $\mathcal{L}_m$ in order. Let us denote by $W$ the generic message in the list, and by $[N_s]_W \subseteq [N_s]$ the subset of slots where the replicas of message $W$ have been transmitted. The $m$th AP picks the first message $W$ in the list and, in all slots in which its replicas were transmitted, subtracts the interference they generated and then makes a new decoding attempt. Since the set $[N_s]_W$, the data payload, and the pilot are all functions of the message $W$ only, the AP can retrieve all of this information from it. In particular, for each slot in $[N_s]_W$, AP $m$ first re-estimates the channel using the known payload, $x(W)$, as
$$ \hat h_{mn}(W) = \frac{y_{mn,D}\, x(W)^H}{\sqrt{\rho}\, \|x(W)\|^2} \tag{6} $$
and then subtracts $\sqrt{\rho}\, \hat h_{mn}(W)\, [p(W), x(W)]$ from the received pilot and payload samples, $[y_{mn,P}, y_{mn,D}]$. After interference cancelation, the AP attempts decoding of new messages by performing (4) and (5) in each of the processed slots. When a new message is decoded, it is pushed to the bottom of $\mathcal{L}_m$. The pseudocode for the NC scheme is detailed in Algorithm 1, where $Y_{m,P} = [y_{m1,P}^T, \ldots, y_{mN,P}^T]^T$ aggregates the pilot signals received by all the antennas of AP $m$.
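The list-driven structure of the SIC phase can be sketched independently of the physical layer. The toy model below assumes a collision-channel abstraction, which is a simplification and not the SINR-based decoding of the NC scheme: a replica is taken to be decodable at the AP if and only if it is the only uncancelled replica on its (slot, pilot) resource, and canceling a message removes all of its replicas at once. `nc_sic` is a hypothetical name.

```python
from collections import deque


def nc_sic(replicas):
    """Toy list-based SIC at one AP under a collision-channel abstraction.

    replicas: dict mapping each user to the (slot, pilot) resources of the
    replicas that this AP receives. A replica is assumed decodable iff it is
    the only uncancelled replica on its resource. Returns the local list L_m
    in processing order.
    """
    occupancy = {}
    for user, pattern in replicas.items():
        for res in pattern:
            occupancy.setdefault(res, set()).add(user)

    decoded, queue, order = set(), deque(), []

    def push_if_singleton(res):
        if len(occupancy[res]) == 1:
            user = next(iter(occupancy[res]))
            if user not in decoded:
                decoded.add(user)
                queue.append(user)  # pushed to the bottom of L_m

    for res in sorted(occupancy):  # phase 1: slot-by-slot decoding
        push_if_singleton(res)

    while queue:  # phase 2: process L_m in order
        user = queue.popleft()
        order.append(user)
        for res in replicas[user]:  # cancel every replica of the message
            occupancy[res].discard(user)
            push_if_singleton(res)  # reprocess the cleaned resource
    return order
```

For example, with users A on slots {3, 0}, B on {0, 1}, C on {1, 2}, and D on {2, 4}, all on the same pilot, A and D are found in phase 1, and their cancelation releases B and then C.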

B. CRA With Cooperative SIC
In this section, we review the processing scheme in [21], extended to multiple available pilots and multiple antennas at the APs, which we refer to as CRA with cooperative SIC (CS). Unlike the NC scheme, it includes a simple form of cooperation among APs, which share all the decoded packets through the CPU, thus enabling a distributed global SIC process. While CS is fundamentally based on noncooperative signal detection at the APs, it incorporates a mild mechanism of cooperation between the APs during SIC, which allows us to demonstrate the incremental benefits of cooperation. In Sections III and IV, we introduce a higher level of cooperation, which we refer to as cooperative detection and decoding, and propose a set of practical cooperation schemes.
The CS scheme is still logically split into two phases, namely, an initial slot-by-slot processing executed individually by each AP, followed by a distributed SIC process. The main difference between the NC and CS schemes lies in the second phase: while in NC each AP performs SIC of only the messages it has decoded locally, in CS a distributed cooperative SIC takes place across all the APs. The CS processing is outlined as follows.
1) Slot-by-Slot Decoding: Each AP processes the received signals as in the first phase of the NC scheme. At the end of the procedure, each AP forwards its local list of decoded messages to the CPU via fronthaul links; in this way, all messages successfully decoded in the frame are delivered to the CPU, which stores them in a global list $\mathcal{L}$.
2) Cooperative SIC: The CPU processes the messages in the list $\mathcal{L}$ in order. At each step, the CPU picks the first message in the list, denoted by $W$, and sends it to all the APs for SIC. Since the data payload, the pilot, and the set $[N_s]_W$ are all functions of the message only, each AP can retrieve all this information from it. For each slot in $[N_s]_W$, each AP first re-estimates the channel from the payload and then subtracts the interference generated by the transmitted replicas of the decoded message, as detailed in phase 2 of the NC scheme. After interference cancelation, the AP attempts decoding of new messages by performing (4) and (5) in each of the processed slots. When a new message is decoded, it is sent to the CPU via the fronthaul link and added at the bottom of $\mathcal{L}$. The pseudocode for the CS scheme is detailed in Algorithm 2.
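Under a collision-channel abstraction (an assumption, not the SINR-based decoding of the paper), the difference between NC and CS reduces to where a decoded message is canceled: only at the AP that decoded it (NC) or, through the CPU list, at every AP (CS). The sketch below runs both variants on the same input; the coverage sets `heard[m]` and the function name `cra_decode` are hypothetical.

```python
from collections import deque


def cra_decode(heard, patterns, cooperative):
    """Toy NC (cooperative=False) vs CS (cooperative=True) decoding.

    heard[m]: set of users whose replicas reach AP m (hypothetical coverage).
    patterns[u]: list of (slot, pilot) resources used by user u.
    A replica is assumed decodable at AP m iff it is the only uncancelled
    replica that AP m hears on that resource. Returns the decoded users.
    """
    # tables[m][res] = users still present on resource res at AP m
    tables = []
    for users in heard:
        table = {}
        for u in users:
            for res in patterns[u]:
                table.setdefault(res, set()).add(u)
        tables.append(table)

    local_lists = [set() for _ in heard]  # NC: one list L_m per AP
    global_list = set()                   # CS: the CPU list L
    queue = deque()                       # entries: (decoding AP, user)

    def try_decode(m, res):
        users = tables[m].get(res, set())
        if len(users) != 1:
            return
        u = next(iter(users))
        known = global_list if cooperative else local_lists[m]
        if u not in known:
            known.add(u)
            queue.append((m, u))

    # Phase 1: slot-by-slot decoding at every AP.
    for m, table in enumerate(tables):
        for res in sorted(table):
            try_decode(m, res)

    # Phase 2: SIC. CS cancels at every AP, NC only at the decoding AP.
    while queue:
        m0, u = queue.popleft()
        targets = range(len(tables)) if cooperative else [m0]
        for m in targets:
            for res in patterns[u]:
                if u in tables[m].get(res, ()):
                    tables[m][res].discard(u)
                    try_decode(m, res)

    return set(global_list) if cooperative else set().union(*local_lists)
```

For instance, if AP 0 decodes a device whose replicas collide at AP 1 with those of a third device, CS lets AP 1 cancel them and recover the third device, whereas under NC AP 1 remains stuck.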

III. COOPERATIVE DETECTION AND DECODING
In this section, we first introduce cooperative detection and decoding of the users' messages through clusters of APs. Subsequently, we investigate the performance of AP signal combining for distinct cluster compositions and under different network load conditions. Note that cooperation is here meant as a combining operation, performed by the CPU on the signal samples of suitably defined AP subsets. Although selecting large sets of APs might appear to be the best approach at first glance, this is not always the case: APs with an unfavorable SINR or receiving contaminated pilots may in fact jeopardize successful message decoding when added to a set of cooperating APs. This effect is exacerbated under high network load conditions. Let us also remark that AP cluster construction is carried out at the CPU level, relying solely on the signal samples received at the APs. This poses a significant challenge, as it has to be performed in a fully grant-free scenario where no prior information is assumed regarding the number of active users, their spatial positions, or the specific pilot each user employs.
In order to clarify how cooperative detection and decoding is implemented and which issues arise when building a subset of cooperating APs, let us first focus on the detection and decoding of a given message $W_i$ of user $i$, assuming that we know the user is active and is transmitting in slot $s$ using pilot $p(W_i)$. Let us denote by $\mathcal{C}_i = \{t_1, \ldots, t_{C_i}\}$ the set of APs, with cardinality $C_i = |\mathcal{C}_i|$, that can potentially be used to cooperatively decode the packet replica transmitted by device $i$ in slot $s$ (for simplicity, we omit the slot index). Let us also denote by $\mathcal{I}_i = \mathcal{U} \setminus \{i\}$ the set of devices transmitting a packet replica in the considered slot, except the $i$th one, and by $\mathcal{E}_i \subset \mathcal{I}_i$ the subset of devices transmitting pilot $p(W_i)$. Fig. 2 provides a graphical illustration of the introduced sets of devices and APs.

Fig. 2. Graphical illustration of the sets of devices and APs introduced in Section III: 1) reference device $i$, which is active and is transmitting in slot $s$ using pilot $p_1$; 2) the set of APs $\mathcal{C}_i$ that can potentially be used to cooperatively decode the packet replica transmitted by device $i$ in a generic slot $s$; 3) the set $\mathcal{I}_i = \mathcal{U} \setminus \{i\}$ of devices transmitting a packet replica in the considered slot, except the $i$th device; and 4) the set $\mathcal{E}_i \subset \mathcal{I}_i$, i.e., the subset of devices transmitting pilot $p_1$.
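Cooperative detection and decoding of message $W_i$ is based on the linear combination of the received payload signals of all the $C_i$ APs forming cluster $\mathcal{C}_i$. If MRC is used to combine the signals, decoding of the user's message is attempted from the vector $v_i \in \mathbb{C}^{1\times N_D}$ obtained as
$$ v_i = \frac{\sum_{m \in \mathcal{C}_i} \hat h_{mi}^H\, Y_{m,D}}{\sqrt{\rho} \sum_{m \in \mathcal{C}_i} \|\hat h_{mi}\|^2} = \frac{\hat q_i^H\, Y_{i,D}}{\sqrt{\rho}\, \|\hat q_i\|^2} \tag{7} $$
where $\hat h_{mi} = [\hat h_{m1i}, \ldots, \hat h_{mNi}]^T$ collects the channel gains estimated by AP $m$ over pilot $p(W_i)$, $\hat q_i = [\hat h_{t_1 i}^T, \ldots, \hat h_{t_{C_i} i}^T]^T$ stacks the estimates of all the APs in the cluster, and $Y_{i,D} = [Y_{t_1,D}^T, \ldots, Y_{t_{C_i},D}^T]^T$ is the aggregate of the payload signals received by all their antennas. The combining in (7), performed by the CPU with the signals of the cooperating APs, is thus obtained by stacking all their channel coefficients and received signals, even though the APs are not co-located.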
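Combining across APs amounts to treating the stacked antennas of the cluster as one virtual array. The sketch below, assuming the same normalization as in the single-AP case and hypothetical per-AP containers `h_hat[m]` (estimates on the target pilot) and `y_payload[m]` (payload rows), computes the combined vector by stacking the estimates and payload rows of the APs listed in `cluster`.

```python
import math


def cluster_mrc(h_hat, y_payload, cluster, rho):
    """MRC over the APs in `cluster`, treated as one virtual antenna array.

    h_hat[m]: N channel estimates at AP m for the target user's pilot.
    y_payload[m]: N rows of N_D payload samples at AP m.
    Assumed normalization: q_hat^H Y / (sqrt(rho) * ||q_hat||^2).
    """
    q_hat = [h for m in cluster for h in h_hat[m]]         # stacked estimates
    rows = [row for m in cluster for row in y_payload[m]]  # stacked samples
    norm = math.sqrt(rho) * sum(abs(h) ** 2 for h in q_hat)
    n_d = len(rows[0])
    return [sum(h.conjugate() * row[t] for h, row in zip(q_hat, rows)) / norm
            for t in range(n_d)]
```

With a single-AP cluster this reduces to the per-AP combiner; choosing which APs enter `cluster` is exactly the selection problem studied in the remainder of this section.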
The effectiveness of signal combining in improving the signal-to-noise ratio (SNR) and reducing the interference power depends on the quality of the channel gains estimated by the APs, as well as on the composition of cluster $\mathcal{C}_i$, which must be constructed so as to include the APs that receive signals with minimal interference and exhibit high-quality channel estimates. In this regard, let us consider the estimated channel gain at antenna $n$ of AP $m$ related to user $i$, obtained through ML estimation over the pilot $p(W_i)$, as
$$ \hat h_{mni} = \frac{y_{mn,P}\, p(W_i)^H}{\sqrt{\rho}\, \|p(W_i)\|^2} = h_{mni} + \sum_{j \in \mathcal{E}_i} h_{mnj} + \frac{w_{mn}\, p(W_i)^H}{\sqrt{\rho}\, \|p(W_i)\|^2} \tag{8} $$
where the contributions of the users in $\mathcal{I}_i \setminus \mathcal{E}_i$ vanish owing to pilot orthogonality. We denote as $e_{mni}$ the channel estimation error due to pilot contamination and noise experienced by the $n$th antenna of the $m$th AP and related to user $i$, i.e.,
$$ e_{mni} = \hat h_{mni} - h_{mni} = \sum_{j \in \mathcal{E}_i} h_{mnj} + \frac{w_{mn}\, p(W_i)^H}{\sqrt{\rho}\, \|p(W_i)\|^2}. \tag{9} $$
The pilot contamination and the noise terms are two independent zero-mean complex Gaussian random variables that are constant within each slot. The channel estimation bias term in (9) can hinder the capability of the CPU to decode the message transmitted by device $i$ by reducing the effectiveness of signal combining. In the following section, we analyze the different components of the combined signal in (7) and define a performance metric able to capture the joint effects of pilot contamination, payload interference, and noise on the quality of the combined signal employed to decode a message.
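The decomposition of the estimation error in (9) can be checked numerically on a toy case. The sketch below assumes a single antenna, length-4 Walsh pilots (a hypothetical choice with $\|p\|^2 = 4$), and the projection normalization $y\,p^H/(\sqrt{\rho}\|p\|^2)$. Without noise, the error on a pilot equals exactly the sum of the co-pilot channels, while devices on orthogonal pilots do not leak in; with noise, the extra error power should be close to $\sigma_N^2/(\rho\|p\|^2)$. `estimate_on_pilot` is a hypothetical name.

```python
import math
import random

# Rows are mutually orthogonal real +/-1 pilots, each with ||p||^2 = 4.
WALSH4 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]


def estimate_on_pilot(users, pilot_idx, rho, sigma2, rng):
    """Single-antenna ML estimate on one Walsh pilot.

    users: list of (channel h, pilot index) pairs transmitting in the slot.
    Builds y = sum sqrt(rho) h p_k + w with w ~ CN(0, sigma2 I) and returns
    y p^H / (sqrt(rho) ||p||^2); p is real, so p^H is just p.
    """
    p = WALSH4[pilot_idx]
    std = math.sqrt(sigma2 / 2)  # per real dimension
    y = [sum(math.sqrt(rho) * h * WALSH4[k][t] for h, k in users)
         + complex(rng.gauss(0, std), rng.gauss(0, std))
         for t in range(4)]
    return sum(yt * pt for yt, pt in zip(y, p)) / (math.sqrt(rho) * len(p))
```

In the example below, the target device and one interferer share pilot 0, and a third device uses pilot 2: the estimate on pilot 0 is the sum of the first two channels only.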

A. Performance Analysis
The following analysis focuses on a generic slot of the frame. We first calculate the statistical power of the channel estimation error, which will be exploited subsequently. As the active devices transmitting a packet replica in the designated slot, forming the set $\mathcal{I}_i$, are uniformly distributed across the area, it is reasonable to model them as a Poisson point process (PPP) with intensity $\lambda$ in $\mathbb{R}^2$. Assuming each device selects a pilot uniformly at random, the users transmitting pilot $p(W_i)$, forming the set $\mathcal{E}_i$, can still be modeled as a PPP with intensity $\lambda_E = \lambda/P$, as a direct consequence of the independent thinning of Poisson processes [34].
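The thinning argument can be illustrated by simulation: if each point of a PPP independently picks one of $P$ pilots uniformly at random, the mean number of devices per pilot in a region of area $A$ is $\lambda A/P$. The sketch below assumes a disk of radius `radius` and samples the Poisson count through unit-rate exponential inter-arrival times, since the Python standard library has no Poisson sampler; the function names are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Poisson sample: count unit-rate exponential arrivals in [0, mean)."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def copilot_counts(lam, radius, n_pilots, trials, rng):
    """Average number of devices per pilot in a disk, for a PPP of intensity lam.

    Each device picks one of n_pilots pilots uniformly at random (independent
    thinning), so each pilot should see a PPP of intensity lam / n_pilots.
    """
    totals = [0] * n_pilots
    for _ in range(trials):
        n = poisson(lam * math.pi * radius ** 2, rng)
        for _ in range(n):
            totals[rng.randrange(n_pilots)] += 1
    return [c / trials for c in totals]
```

Only the counts matter here; positions can be drawn independently of the pilot choice, which is why each thinned process keeps the uniform spatial law.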
To make the analysis valid also at short transmitter-receiver distances, we adopt a bounded path-loss model of the form
$$ \mathrm{PL}_{mi} = L_0 \left(1 + r_{mi}^{\alpha}\right) \tag{10} $$
where $r_{mi}$ is the distance between the $m$th AP and the $i$th device, $\alpha$ is the path-loss exponent, and $L_0$ is the minimum path-loss, so that $2L_0$ is the path-loss at the reference distance $r_0 = 1$ m [35], [36]. We also define the scaled shadowing intensity parameter as $\tilde\sigma_S = \ln(10)\,\sigma_S/(10\sqrt{2})$. Let us now denote as $\xi = \xi(\lambda_E)$ the average pilot contamination power for each antenna $n$ of AP $m$, defined as the expectation of the squared amplitude of the sum of the channel coefficients between antenna $n$ of AP $m$ and the interfering devices in $\mathcal{E}_i$, i.e.,
$$ \xi(\lambda_E) = \mathbb{E}\Big[\Big|\sum_{j \in \mathcal{E}_i} h_{mnj}\Big|^2\Big]. \tag{11} $$
The expectation in (11) is taken over all the sources of randomness. A closed-form expression for $\xi(\lambda_E)$ can be derived as follows.
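Theorem 1: The expected squared amplitude of the sum of the channel coefficients between the interferers, distributed according to a PPP with density $\lambda_E$ in $\mathbb{R}^2$, and the $n$th antenna of the $m$th AP is independent of $m$ and $n$ and, for $\alpha > 2$, is given by
$$ \xi(\lambda_E) = \frac{2\pi^2 \lambda_E\, e^{\tilde\sigma_S^2}}{\alpha L_0 \sin(2\pi/\alpha)}. \tag{12} $$
Proof: Let us start by taking the expectation with respect to the small-scale fading. Recalling that $h_{mni} \sim \mathcal{CN}(0, \beta_{mi})$ and that the small-scale fading coefficients of distinct devices are independent, the cross terms vanish and we have
$$ \xi(\lambda_E) = \mathbb{E}\Big[\sum_{j \in \mathcal{E}_i} \beta_{mj}\Big]. \tag{13} $$
We thus recognize that $z = \sum_{j \in \mathcal{E}_i} \beta_{mj}$ is a sum of marks, $\beta_{mj}$, of a marked PPP with density $\lambda_E$ [34]. Let us now apply Campbell's theorem to find the moment generating function (m.g.f.) of $z$ as [37, p. 108]
$$ M_z(s) = \mathbb{E}\big[e^{sz}\big] = \exp\Big(\lambda_E \int_{\mathbb{R}^2} \big(\mathbb{E}_u\big[e^{s\beta(x,u)}\big] - 1\big)\, dx\Big) \tag{14} $$
where $u \sim \mathcal{N}(0, 1)$ and the mark $\beta$ is a function of $u$ and of the distance $\|x\|$. Therefore
$$ \xi(\lambda_E) = \frac{dM_z(s)}{ds}\Big|_{s=0} = \lambda_E \int_{\mathbb{R}^2} \mathbb{E}_u[\beta(x,u)]\, dx \tag{15} $$
and, using (3), (10), and $\mathbb{E}[10^{\sigma_S u/10}] = e^{\tilde\sigma_S^2}$,
$$ \int_{\mathbb{R}^2} \mathbb{E}_u[\beta(x,u)]\, dx = \frac{2\pi e^{\tilde\sigma_S^2}}{L_0} \int_0^{\infty} \frac{r}{1 + r^{\alpha}}\, dr = \frac{2\pi^2 e^{\tilde\sigma_S^2}}{\alpha L_0 \sin(2\pi/\alpha)}. \tag{16} $$
Incorporating (16) into (15) yields
$$ \xi(\lambda_E) = \frac{2\pi^2 \lambda_E\, e^{\tilde\sigma_S^2}}{\alpha L_0 \sin(2\pi/\alpha)} \tag{17} $$
which completes the proof.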
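Theorem 1 can be sanity-checked by Monte Carlo. The sketch below assumes the large-scale model $\beta = 10^{\sigma_S u/10}/(L_0(1+r^\alpha))$ with $\alpha > 2$, for which Campbell's theorem together with $\int_0^\infty r/(1+r^\alpha)\,dr = \pi/(\alpha\sin(2\pi/\alpha))$ gives $\xi(\lambda_E) = 2\pi^2\lambda_E e^{\tilde\sigma_S^2}/(\alpha L_0 \sin(2\pi/\alpha))$. The PPP is simulated in a finite disk of radius $R$, which neglects a tail of order $\lambda_E R^{2-\alpha}$; the function names are hypothetical.

```python
import math
import random


def xi_closed_form(lam_e, alpha, l0, sigma_s_db):
    """E|sum_j h_j|^2, assuming beta = 10**(s*u/10) / (l0 * (1 + r**alpha))."""
    if alpha <= 2:
        raise ValueError("the integral diverges for alpha <= 2")
    sig = math.log(10) * sigma_s_db / (10 * math.sqrt(2))  # scaled shadowing
    return (2 * math.pi ** 2 * lam_e * math.exp(sig ** 2)
            / (alpha * l0 * math.sin(2 * math.pi / alpha)))


def xi_monte_carlo(lam_e, alpha, l0, sigma_s_db, radius, trials, rng):
    """Average |sum_j h_j|^2 over co-pilot interferers of a PPP in a disk."""
    std = math.sqrt(0.5)  # per real dimension, so g ~ CN(0, 1)
    mean_count = lam_e * math.pi * radius ** 2
    acc = 0.0
    for _ in range(trials):
        # Poisson number of interferers via exponential inter-arrival times.
        n, t = 0, rng.expovariate(1.0)
        while t < mean_count:
            n += 1
            t += rng.expovariate(1.0)
        s = 0j
        for _ in range(n):
            r = radius * math.sqrt(rng.random())  # uniform position in the disk
            beta = 10 ** (sigma_s_db * rng.gauss(0, 1) / 10) / (l0 * (1 + r ** alpha))
            s += complex(rng.gauss(0, std), rng.gauss(0, std)) * math.sqrt(beta)
        acc += abs(s) ** 2
    return acc / trials
```

For $\alpha = 4$, $L_0 = 1$, and no shadowing, the closed form reduces to $\pi^2\lambda_E/2$, which makes a convenient reference point for the simulation.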
From Theorem 1, we deduce the following corollary.
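Corollary 1: The average power of the channel estimation error in (9) is given by
$$ \mathbb{E}\big[|e_{mni}|^2\big] = \xi(\lambda/P) + \frac{\sigma_N^2}{\rho\, \|p(W_i)\|^2}. \tag{18} $$
Proof: The proof follows directly from applying Theorem 1 to (9), with $\lambda_E = \lambda/P$ and with the noise term independent of the pilot contamination term.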
Remark 1: We notice that the average channel estimation error power depends linearly on the intensity of the Poisson process of the interferers, which, in turn, depends on the number of active devices per slot and on the number $P$ of available orthogonal pilots, since $\lambda_E = \lambda/P$.
Next, we exploit the result of Theorem 1 to formulate a performance metric that enables us to discern the impact on system performance of forming the cluster $\mathcal{C}_i$ with a given set of APs. This metric is designed to shed light on how the performance of joint detection and decoding via AP signal combining, for the packet transmitted by one of the devices, varies under distinct network load conditions.
Let us reformulate the received signal samples in (2) by isolating the signal transmitted by user $i$, as
$$ y_{mn,D} = \sqrt{\rho}\, h_{mni}\, x(W_i) + \sum_{j \in \mathcal{I}_i} \sqrt{\rho}\, h_{mnj}\, x(W_j) + \nu_{mn}. \tag{19} $$
Let us also consider the estimated and actual channel gains at all the APs in the cluster, stacked into two vectors, $\hat q_i = [\hat h_{t_1 i}^T, \ldots, \hat h_{t_{C_i} i}^T]^T$ and $q_i = [h_{t_1 i}^T, \ldots, h_{t_{C_i} i}^T]^T$, respectively, where $h_{mi}^T = [h_{m1i}, \ldots, h_{mNi}]$ with $m \in \mathcal{C}_i$. Thus, the estimated channel gains can be expressed in compact form as
$$ \hat q_i = q_i + \tilde e_i \tag{20} $$
where the channel estimation errors at all the APs in the cluster are stacked in the vector $\tilde e_i = [e_{t_1 i}^T, \ldots, e_{t_{C_i} i}^T]^T$, with $e_{mi}^T = [e_{m1i}, \ldots, e_{mNi}]$ and $m \in \mathcal{C}_i$. Following the cooperative detection and decoding scheme, the signals of all the $C_i$ APs forming cluster $\mathcal{C}_i$ are combined via MRC as in (7). Without the normalization term, the resulting signal vector becomes
$$ \hat q_i^H Y_{i,D} = \sqrt{\rho}\, \hat q_i^H q_i\, x(W_i) + \sum_{j \in \mathcal{I}_i} \sqrt{\rho}\, \hat q_i^H q_j\, x(W_j) + \hat q_i^H V_i \tag{21} $$
where we recall that $Y_{i,D}$ is the aggregate of the payload signals received by all the antennas of the APs in cluster $\mathcal{C}_i$, $q_j$ stacks the channel gains between device $j$ and those antennas, and $V_i = [V_{t_1}^T, \ldots, V_{t_{C_i}}^T]^T$ is the aggregate of the noise samples at the APs, with $V_m^T = [\nu_{m1}^T, \ldots, \nu_{mN}^T]$ and $m \in \mathcal{C}_i$. Our focus now lies in assessing a performance metric that relates the power of the reference signal received from device $i$ to the power of the unwanted contributions after MRC, due to noise, interference, and channel estimation error. This evaluation gives insight into how system performance is affected by the choice of the subset of APs incorporated into the cluster. Hence, we define the performance metric $\gamma_i(\mathcal{C}_i)$ in (22) as the ratio between the average received power from user $i$ and the average interference and noise power after MRC of the signals received from the APs in cluster $\mathcal{C}_i$, where the expectation is taken over all the sources of randomness while keeping fixed the positions of the APs in the cluster $\mathcal{C}_i$ with respect to the reference device $i$. In our analysis, we assume that the elements of $\tilde e_i$ and $q_j$ are uncorrelated for $j \in \mathcal{I}_i$.
In practice, the correlation between the elements of the two vectors is low and becomes negligible when the number of pilots is sufficiently high.
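For the sake of conciseness, let us define the sums of the first- and second-order noncentral moments of the large-scale fading coefficients between the APs in the cluster and the reference device $i$ as
$$ \mu_1^{(i)} = \sum_{m \in \mathcal{C}_i} \mathbb{E}[\beta_{mi}] \tag{23} $$
$$ \mu_2^{(i)} = \sum_{m \in \mathcal{C}_i} \mathbb{E}[\beta_{mi}^2] \tag{24} $$
respectively, where the expectations are taken with the AP positions fixed. Let us also define the quantity $\Upsilon(\lambda, i)$ in (25), which is directly related to the power of the channel estimation error (see the proof of Theorem 2). We proceed to obtain an expression for $\gamma_i(\mathcal{C}_i)$ by exploiting the result of Theorem 1 and by taking the expectation with respect to fast fading and noise. The terms $\mu_1^{(i)}$ and $\mu_2^{(i)}$ can be estimated using Monte Carlo methods.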
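A minimal Monte Carlo estimator of $\mu_1^{(i)}$ and $\mu_2^{(i)}$ for a fixed geometry follows. It assumes the lognormal large-scale model $\beta = 10^{\sigma_S u/10}/\mathrm{PL}(r)$ with a user-supplied linear-scale attenuation `pathloss(r)`, builds clusters from the $C$ nearest APs, and reuses the same seed for every cluster size (common random numbers), so that enlarging the cluster only adds the contributions of the new APs and the saturation of the sums is not masked by sampling noise. All names are hypothetical.

```python
import math
import random


def mu_moments(dists, sigma_s_db, pathloss, draws, rng):
    """Monte Carlo mu1 = sum_m E[beta_m] and mu2 = sum_m E[beta_m^2].

    dists: distances from device i to the APs in the cluster (positions fixed).
    Assumed model: beta = 10**(sigma_s_db * u / 10) / pathloss(r), u ~ N(0, 1).
    """
    mu1 = mu2 = 0.0
    for r in dists:
        pl = pathloss(r)
        samples = [10 ** (sigma_s_db * rng.gauss(0, 1) / 10) / pl
                   for _ in range(draws)]
        mu1 += sum(samples) / draws
        mu2 += sum(b * b for b in samples) / draws
    return mu1, mu2


def nearest_cluster_mu(dev, aps, sizes, sigma_s_db, pathloss, draws, seed):
    """(mu1, mu2) for clusters formed by the C nearest APs, for each C in sizes."""
    dists = sorted(math.dist(dev, ap) for ap in aps)
    out = []
    for c in sizes:
        rng = random.Random(seed)  # common random numbers across cluster sizes
        out.append(mu_moments(dists[:c], sigma_s_db, pathloss, draws, rng))
    return out
```

Under this lognormal assumption the moments are also available in closed form, $\mathbb{E}[\beta] = e^{\tilde\sigma_S^2}/\mathrm{PL}$ and $\mathbb{E}[\beta^2] = e^{4\tilde\sigma_S^2}/\mathrm{PL}^2$, which is a convenient check; Monte Carlo becomes useful for models without such expressions.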
Theorem 2: The ratio between the average received power from user $i$ and the average interference and noise power after MRC of the signals received by the APs in cluster $\mathcal{C}_i$ can be approximated as in (26).
Proof: Let us start from the expression in (27). Next, we derive an approximation of the first term of the denominator of (22) by applying Theorem 1 under the assumption that the elements of $\tilde e_i$ and $q_j$ are uncorrelated for $j \in \mathcal{I}_i$; specifically, we can write (28). The second term in the denominator of (22) can be manipulated as in (29), where Theorem 1 is used to calculate the expectation of the channel estimation error power. Moreover, we can express the last term in the denominator of (22) as in (30). Substituting (27)-(30) into (22) yields (26), completing the proof.
The insights provided by Theorem 2 help explain which APs should be included in cluster $\mathcal{C}_i$ for decoding the message transmitted by user $i$. The denominator of (26) shows that including APs with unfavorable large-scale fading coefficients, which leads to a pronounced pilot contamination effect, reduces the ratio between the average power of the useful signal and that of the interference. From (26), it also becomes apparent that, as the cluster size $C_i$ grows, the addition of APs located progressively farther from device $i$ leads to a saturation of $\mu_1^{(i)}$ and $\mu_2^{(i)}$, because distant APs contribute less and less to the sums. Conversely, the terms multiplied by $C_i$ increase linearly, which eventually decreases the performance metric, as further shown in the numerical results. Further insight into the cluster construction process can be gained by examining $\gamma_i(\mathcal{C}_i)$ when pilot contamination is negligible.
Remark 2: When a large number of orthogonal pilots is available, the impact of pilot contamination becomes insignificant, and (26) simplifies to the limit expression (31). From (31), we observe that, when pilot contamination is negligible but the payload signal of device $i$ is still subject to interference, it is advisable to form large clusters of APs for cooperative detection and decoding. Therefore, mitigating the impact of pilot contamination, e.g., by increasing the number of available orthogonal pilots, opens up the potential for further performance improvements through the combination of signals received by different APs.

IV. SCHEMES FOR COOPERATIVE DETECTION, DECODING, AND SIC
Based on the considerations made in Section III, we now address enhancements to the above-described CS processing that leverage AP cooperation in payload estimation and decoding. We first examine the performance of an idealized algorithm in which the groups of APs combined to process the uplink signal are selected, for each active device, by a "genie" with full knowledge of the system. This genie-aided strategy serves as a benchmark for two practical schemes, which work with no prior information and are introduced afterwards. Under specific conditions, the performance of the practical strategies can approach that of the genie-aided one.

A. Genie-Aided Cooperative Detection, Decoding, and SIC
We consider an idealized setting where the CPU is assisted by a genie with full knowledge of the set of active users, the pilots they employ, and the aggregate SINR they experience after any linear combining of the signals received by any set of cooperating APs. With such information, the CPU can perform per-user processing instead of the uninformed per-slot and per-pilot processing implemented in CS and other practical schemes; in particular, the CPU can build optimal AP clusters, each devoted to decoding the packet transmitted by a specific active device. The channel gains between the APs and user $i$, for each antenna $n$, are obtained through ML estimation over the pilot $p(W_i)$ used by user $i$, which is known by the genie, as in (8). The channel gains are evaluated for all $m \in \mathcal{C}_i$. Subsequently, MRC is used to combine the signals received by all the APs in $\mathcal{C}_i$ as in (7). Then, decoding of the user's message is attempted from the resulting vector. We can define the aggregated SINR for user $i$, resulting from the combining of all the signals received by the APs in cluster $\mathcal{C}_i$, as
where $\mathbf{q}_i$ is the stacked vector of all channel gains $h_{mni}$ between the APs $m$ in cluster $\mathcal{C}_i$ and user $i$. The genie-aided scheme aims to find the smallest cluster of APs, $\mathcal{C}_i$, that can successfully decode the packet of user $i$, by maximizing $\mathrm{SINR}(i, \mathcal{C}_i)$. It exploits the genie's knowledge of the SINR obtained after combining the signals received by any set of APs. The cooperative detection and decoding algorithms are executed by the CPU only after the system has performed the CS processing scheme. This cascaded operation yields two main benefits: 1) the interference cancelation performed by the CS algorithm reduces pilot contamination, thereby enhancing payload detection and estimation through AP combining; and 2) the CS scheme is less burdensome in terms of data exchanged over the fronthaul links. Hence, AP combining schemes are employed only when CS cannot successfully decode the messages of all users. Like CS, the genie-aided cooperative detection, decoding, and SIC (CDS-G) scheme is organized into two subsequent phases.
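To make the role of the genie concrete, the sketch below stacks the per-AP channel vectors of a candidate cluster into $\mathbf{q}_i$ and scores the cluster with a common textbook form of the instantaneous SINR after MRC, assuming unit-power symbols, equal transmit power $\rho$ for all devices, and white noise of variance $\sigma_N^2$. This SINR form is an assumption and may differ in detail from the aggregated SINR defined above; the array layout (one row of $N$ antenna gains per AP) and the function names are hypothetical.

```python
import numpy as np


def stack_cluster(h, cluster):
    """Stack the N-antenna channel rows of the APs in `cluster` into one vector.

    h: complex array of shape (M, N); row m is the channel from the user to AP m.
    """
    return np.asarray(h)[list(cluster)].reshape(-1)


def post_mrc_sinr(q_hat, q_user, q_interferers, rho, noise_var):
    """Instantaneous SINR after MRC with combining vector q_hat.

    ASSUMED textbook form (unit-power symbols, common power rho):
        rho |q_hat^H q_user|^2 / (sum_j rho |q_hat^H q_j|^2 + noise_var ||q_hat||^2)
    np.vdot(a, b) computes a^H b (it conjugates its first argument).
    """
    signal = rho * abs(np.vdot(q_hat, q_user)) ** 2
    interference = sum(rho * abs(np.vdot(q_hat, g)) ** 2 for g in q_interferers)
    noise = noise_var * np.vdot(q_hat, q_hat).real
    return signal / (interference + noise)


# Toy example: 3 single-antenna APs, one interferer heard mostly by AP 1.
h_user = np.array([[1.0 + 0j], [0.2 + 0j], [0.0 + 1j]])
h_intf = np.array([[0.0 + 0j], [1.0 + 0j], [0.0 + 0j]])
q_user = stack_cluster(h_user, [0, 2])
q_intf = stack_cluster(h_intf, [0, 2])
print(post_mrc_sinr(q_user, q_user, [q_intf], rho=1.0, noise_var=0.5))  # 4.0
```

With perfect CSI ($\hat{\mathbf{q}}_i = \mathbf{q}_i$), the cluster $\{0, 2\}$ avoids the interferer entirely, whereas $\{0, 1\}$ would include it; a genie comparing such scores is what drives the cluster selection described above.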

1) User-By-User Cooperative Decoding (UCD):
The CPU iteratively builds clusters of APs that attempt cooperative decoding of the users' messages, and stores all the decoded messages in a list $\mathcal{L}$. Initially, the cluster consists of a single AP ($C_i = 1$), namely the one with the largest SINR for the considered user. If decoding fails, the cardinality of $\mathcal{C}_i$ is increased by 1 by adding the AP that yields the maximum $\mathrm{SINR}(i, \mathcal{C}_i)$.
2) SIC: The CPU performs SIC in the slots in which the decoded messages were transmitted and reattempts cooperative decoding of the messages not yet decoded. When a new message is decoded, it is added to the list $\mathcal{L}$.
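The cluster-growth rule of the UCD phase can be sketched as a greedy loop. In this minimal sketch, `sinr_of` plays the role of the genie, and `decodable` is a hypothetical stand-in for an actual decoding attempt (here an SINR threshold); the SIC phase is not modeled, and all names and numbers are illustrative.

```python
from typing import Callable, Iterable, List, Tuple


def ucd_greedy(candidates: Iterable[int],
               sinr_of: Callable[[List[int]], float],
               decodable: Callable[[float], bool],
               c_max: int) -> Tuple[List[int], bool]:
    """Greedy cluster growth for one user (sketch of the UCD phase).

    Start from the single AP with the largest SINR; while decoding fails and
    |cluster| < c_max, add the AP that maximizes the SINR of the enlarged cluster.
    """
    remaining = list(candidates)
    cluster: List[int] = []
    while remaining and len(cluster) < c_max:
        best = max(remaining, key=lambda m: sinr_of(cluster + [m]))
        cluster.append(best)
        remaining.remove(best)
        if decodable(sinr_of(cluster)):
            return cluster, True
    return cluster, False


# Hypothetical per-AP SNRs; for noise-limited MRC with perfect CSI they add up.
snr = {0: 1.0, 1: 3.0, 2: 2.0, 3: 0.5}


def total_snr(cluster: List[int]) -> float:
    return sum(snr[m] for m in cluster)


print(ucd_greedy(snr, total_snr, lambda s: s >= 4.5, c_max=3))  # ([1, 2], True)
```

The loop stops at the first successful decoding, which mirrors the aim of finding the smallest adequate cluster; with `c_max=1` the same call returns `([1], False)`.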
The complete pseudocode of the genie-aided algorithm is given in Algorithm 3, where $\mathcal{W}$ is the set of devices whose messages remain undecoded, $[N_s]_i(r)$ is the slot in which replica $r$ of message $i$ has been transmitted, and $C_{\max}$ is the maximum cardinality allowed for the set of cooperating APs, which also corresponds to the maximum number of attempts to decode one message. Clearly, this idealized AP cooperation scheme is not feasible in practice, as it relies on a genie that has important information about the network users at its disposal. In the following, we propose two practical schemes for AP cooperation that do not require any prior information about the network users.

B. Distance-Based Cooperative Detection and Decoding
The main idea behind AP combining is to identify sets of APs that can jointly decode the message transmitted by a user. However, in grant-free access schemes, the set of active devices and the pilots used by each device are not known. Hence, in every slot of the frame and for each pilot, the CPU must blindly process the signals received by all APs to detect potential clusters of receivers that should collaborate to jointly decode the messages of active users. Note that, since channel state information is unknown to the receivers, it is also hard to estimate the SINR after MRC and to exploit it to build such clusters. In the following, we propose a distance-based cooperative detection, decoding, and SIC (CDS-A) algorithm that does not require any a priori insight into the active users in the network. The cooperation mechanism is based on the creation of $M$ clusters, which depend solely on the geometry of the APs, each having a different AP of the network as its centroid.
Algorithm 3: Pseudo-code for CDS-G Scheme (Phase 1: User-by-User Cooperative Decoding (UCD))
We say that AP $k$ is a neighbor of AP $m$ if their distance does not exceed a threshold $D_{\max}$, i.e., $r_{mk} \le D_{\max}$. The cluster having AP $m$ as its centroid is then composed of all the neighbors of $m$, with $m$ included. Note that, in this case, unlike in the genie-aided algorithm, the notation $\mathcal{C}_m$ identifies the cluster that has AP $m$ as its centroid. The signal received by each AP in $\mathcal{C}_m$ is projected along the $j$th pilot to perform channel estimation as in (4). The estimates of all channel gains can be stacked in a single vector $\hat{\mathbf{q}}_m(j) = [\hat{\mathbf{h}}_{t_1}(j)^T, \ldots, \hat{\mathbf{h}}_{t_C}(j)^T]^T$, where $\hat{\mathbf{h}}_m(j)^T = [\hat{h}_{m1}(j), \ldots, \hat{h}_{mN}(j)]$. Then, MRC is performed as

$$\mathbf{v}_m(j) = \frac{\hat{\mathbf{q}}_m(j)^H \mathbf{Y}_{m,D}}{\|\hat{\mathbf{q}}_m(j)\|^2}$$

and decoding is finally attempted from $\mathbf{v}_m(j) \in \mathbb{C}^{1 \times N_D}$. The proposed algorithm is executed after the CS scheme and retains the structure of CDS-G, with a few variations described hereafter. First, distance-based clusters are built based on the network geometry. Then, the CPU attempts decoding of messages, for each slot and each available pilot, using the clusters of APs, and stores the decoded messages in $\mathcal{L}$. Subsequently, the information about the decoded messages is used to perform interference cancelation as in Phase 2 of CDS-G, and decoding is reattempted by the CPU using the clusters of APs. The complete pseudocode of the proposed CDS-A algorithm is given in Algorithm 4.
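The distance-based cluster construction and the per-cluster processing can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: single-antenna APs on a hypothetical 4 × 4 grid with 25 m spacing, a random unit-modulus pilot, and noiseless signals so that the result can be checked exactly; the function names are hypothetical. The channel estimate follows the projection $\mathbf{Y}_{k,P}\mathbf{p}_j^H/\|\mathbf{p}_j\|^2$ listed in Algorithm 4.

```python
import numpy as np


def distance_clusters(ap_xy, d_max):
    """C_m = {k : r_mk <= d_max} for every AP m (the centroid m is included)."""
    ap_xy = np.asarray(ap_xy, dtype=float)
    dist = np.linalg.norm(ap_xy[:, None, :] - ap_xy[None, :, :], axis=-1)
    return [np.flatnonzero(dist[m] <= d_max) for m in range(len(ap_xy))]


def ls_channel_estimate(Y_p, pilot):
    """Project pilot-part samples Y_p (N x N_P) onto `pilot`: Y_p p^H / ||p||^2."""
    return Y_p @ pilot.conj() / np.vdot(pilot, pilot).real


def cluster_mrc(q_hat, Y_d):
    """v = q_hat^H Y_D / ||q_hat||^2 for stacked q_hat (CN,) and Y_D (CN x N_D)."""
    return q_hat.conj() @ Y_d / np.vdot(q_hat, q_hat).real


grid = np.array([(x, y) for x in range(4) for y in range(4)]) * 25.0
clusters = distance_clusters(grid, d_max=25.0)
print(len(clusters[5]), len(clusters[0]))   # 5 3 (interior AP vs corner AP)

rng = np.random.default_rng(1)
pilot = np.exp(2j * np.pi * rng.random(16))                               # N_P = 16
x = (rng.choice([-1, 1], 256) + 1j * rng.choice([-1, 1], 256)) / np.sqrt(2)  # QPSK, N_D = 256
n_c = len(clusters[5])
h = rng.standard_normal(n_c) + 1j * rng.standard_normal(n_c)             # N = 1 per AP
h_hat = np.concatenate([ls_channel_estimate(np.outer([hk], pilot), pilot) for hk in h])
v = cluster_mrc(h_hat, np.outer(h, x))                                    # noiseless payload
print(np.allclose(v, x))   # True
```

Because the clusters depend only on AP positions, `distance_clusters` can be evaluated once, offline, while the estimation and MRC steps are repeated for every slot and pilot.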

C. Correlation-Based Cooperative Detection and Decoding
Although the CDS-A algorithm does not require any a priori information about the active users, its cluster construction process does not fully exploit the different amounts of useful and interference power received by the APs. For this reason, aiming to associate only those APs that receive signals from the same device with favorable SINR, we propose an iterative cluster construction algorithm based on the similarity of the signals received by the APs. In each time slot and for each available pilot, the CPU performs the following operations. As in the CDS-A algorithm, the CPU creates $M$ clusters, each having a different AP of the network as its centroid. As in the CDS-G scheme, the size of each cluster is increased iteratively until a maximum size $C_{\max}$ is reached. At each iteration, a new AP is added to each cluster as follows. Each AP individually estimates its vector of channel gains using pilot $j$ as in (4), and performs MRC as in (5). The AP $a$ that is added to cluster $\mathcal{C}_m$, having AP $m$ as its centroid, is

where $\tilde{\mathcal{C}}_m$ is formed by the APs $k$ for which $r_{mk} \le D_{\max}$, and the amplitude of the correlation between the signals at the APs is evaluated after performing individual MRC. As for CDS-G and CDS-A, the signals received by the APs belonging to the same cluster are stacked, MRC is performed, and then decoding is attempted. The pseudocode of the correlation-based cooperative detection, decoding, and SIC (CDS-B) scheme is given in Algorithm 5.
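The following is a hypothetical sketch of one plausible instance of this selection step: among the APs within $D_{\max}$ of the centroid that are not yet in the cluster, it picks the one whose individually MRC-combined signal has the largest normalized correlation amplitude with the centroid's signal. Correlating with the centroid, rather than with the current combined cluster output, and normalizing by the signal norms are assumptions; the function name and the toy data are illustrative.

```python
import numpy as np


def next_ap_by_correlation(v, dist, m, cluster, d_max):
    """Pick the AP to add to the cluster centred on AP m (hypothetical rule).

    v: (M, N_D) array; row k is the output of AP k's individual MRC on pilot j.
    dist: (M, M) array of inter-AP distances r_mk.
    Returns None when no AP within d_max is left to add.
    """
    candidates = [k for k in np.flatnonzero(dist[m] <= d_max) if k not in cluster]
    if not candidates:
        return None

    def similarity(k):
        # ASSUMPTION: normalized |correlation| with the centroid's signal.
        den = np.linalg.norm(v[m]) * np.linalg.norm(v[k])
        return abs(np.vdot(v[m], v[k])) / den if den > 0 else 0.0

    return max(candidates, key=similarity)


v = np.array([[1, 1, 1, 1],          # centroid AP 0
              [1, 1, 1, 0.9],        # nearby AP hearing the same device
              [1, -1, 1, -1],        # nearby AP dominated by another signal
              [1, 1, 1, 1]],         # similar signal, but beyond D_max
             dtype=complex)
dist = np.array([[0, 20, 30, 90],
                 [20, 0, 25, 80],
                 [30, 25, 0, 70],
                 [90, 80, 70, 0]], dtype=float)
print(next_ap_by_correlation(v, dist, m=0, cluster=[0], d_max=40.0))  # 1
```

Calling the function repeatedly, appending its result to `cluster` until it returns `None` or the cluster reaches $C_{\max}$ APs, reproduces the iterative growth described above.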

V. NUMERICAL RESULTS
In this section, we first assess the validity of the developed analytical model in a controlled scenario and employ it to investigate how the performance metric of the AP signal combining schemes, $\gamma_i$, varies across different cluster compositions and under various interference conditions. We also show, through a practical example, the relationship between $\gamma_i$ and the statistics of the SINR after MRC of the signals received by the APs in the cluster. Subsequently, we assess the performance of the genie-aided and practical cooperative detection and decoding schemes (i.e., CDS-G, CDS-A, and CDS-B) proposed in Section IV over realistic scenarios of interest for CF-mMIMO systems. This evaluation is conducted through extensive simulations, and the performance is compared with that of the CS and NC schemes.

A. AP Cooperation Performance Metric
We consider a circular area of radius $L = 300$ m with a reference device $i$ located at the center, $M = 20$ single-antenna APs randomly positioned with uniform distribution in the annulus with inner and outer radii of 3 m and 12 m, respectively, centered at the reference device's position, and a set of active interferers distributed according to a PPP with density $\lambda = 0.015$ interfering devices per m$^2$ and per slot. We consider a generic slot of the CRA scheme detailed in Section II, in which we assume that all the interferers and the reference device transmit a packet replica using a pilot picked at random among the $P$ available ones. The corresponding density of the devices adopting the same pilot as the reference device is $\lambda_E = \lambda/P$. The path-loss model is the one defined in (10), with $L_0 = 40$ dB and $\alpha = 3$. The ratio between the transmit power of each device and the noise power is $\rho/\sigma_N^2 = 130$ dB, and the shadowing coefficient is $\sigma_S = 4$ dB. We compare the analytical expression in (26) with the outcomes of a Monte Carlo simulator.
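The co-pilot density $\lambda_E = \lambda/P$ follows from independent thinning of the interferer PPP by the random pilot choice. The sketch below illustrates this for one slot, assuming the reference device sits at the origin and using $\lambda = 0.015$ m$^{-2}$, $L = 300$ m, and $P = 32$ from the setup above; the function name is hypothetical.

```python
import numpy as np


def co_pilot_interferers(lam=0.015, P=32, radius=300.0, rng=None):
    """Positions of the interferers that share the reference device's pilot.

    Active interferers form a PPP of density `lam` [1/m^2] in a disk of
    `radius` m around the reference device (at the origin). Each picks one of
    P pilots uniformly at random, so the survivors form a PPP of density lam / P.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = rng.poisson(lam * np.pi * radius ** 2)   # number of active interferers
    r = radius * np.sqrt(rng.random(n))          # uniform over the disk area
    theta = 2 * np.pi * rng.random(n)
    pilots = rng.integers(0, P, n)               # random pilot per interferer
    ref_pilot = rng.integers(0, P)               # reference device's pilot
    keep = pilots == ref_pilot                   # independent thinning
    return np.column_stack((r[keep] * np.cos(theta[keep]),
                            r[keep] * np.sin(theta[keep])))


rng = np.random.default_rng(0)
counts = [len(co_pilot_interferers(P=32, rng=rng)) for _ in range(2000)]
print(np.mean(counts), 0.015 / 32 * np.pi * 300.0 ** 2)  # both close to 132.5
```

With $P = 32$, roughly 4241 active interferers per slot shrink to about 132.5 co-pilot interferers on average, which is why pilot contamination weakens as $P$ grows.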
Fig. 3(a) depicts $\gamma_i(\mathcal{C}_i)$ versus the number of APs in cluster $\mathcal{C}_i$ under different pilot contamination regimes. The figure shows that the analytical expression closely matches the outcomes of the Monte Carlo simulations when the number of orthogonal pilots is sufficiently large, i.e., in a low pilot-contamination regime. This stems from the assumption that the elements of $\tilde{\mathbf{e}}_i$ and $\mathbf{q}_j$ are uncorrelated for $j \in \mathcal{I}_i$. In practice, the correlation between these vectors' elements is minimal and becomes negligible when the number of available pilots is sufficiently large. However, when the number of orthogonal pilots is limited, this correlation introduces a small discrepancy between the analytical model and the Monte Carlo simulations. This discrepancy is not significant, as the trend of $\gamma_i$ remains consistent, which allows us to draw reliable conclusions about the choice of the optimal cluster size for cooperative detection and decoding of the message of device $i$.
Moreover, Fig. 3(a) shows that, when only a small number of orthogonal pilots is available, pilot contamination degrades performance when additional APs far from the target device are incorporated into the cluster. In fact, with $P = 32$, the maximum $\gamma_i(\mathcal{C}_i)$ is obtained by including in the cluster only the $C_i = 7$ nearest APs, suggesting that, when $P$ is small, optimal performance is achieved with small clusters or even individual APs. Conversely, Fig. 3(a) shows that when the effect of pilot contamination is mild (i.e., $P = 64$), constructing larger clusters opens up the possibility of increasing the system performance. This effect is even more pronounced when the number of available orthogonal pilots becomes large (i.e., $P \to \infty$), making the effect of pilot contamination negligible. In such conditions, adding all the 20 APs to the cluster improves the system performance.
This effect is also illustrated in Fig. 3(b), which presents the CDF of the actual SINR obtained via Monte Carlo simulations for various cluster compositions considered in Fig. 3(a), with $P = 32$ orthogonal pilots. Fig. 3(b) shows an improvement in the SINR CDF when the signal transmitted by user $i$ is processed by a cluster of $C_i = 7$ APs rather than by a single AP. However, incorporating all $M = 20$ APs in the cluster degrades the CDF. This behavior is consistent with the metric $\gamma_i$ in Fig. 3(a), where the curve for $P = 32$ peaks around $C_i = 7$.
Additionally, the dashed curve in Fig. 3(b) represents the CDF of the SINR obtained by forming a cluster of 7 APs using the genie-aided algorithm CDS-G.This demonstrates that the SINR statistics can be significantly improved if the cluster is not only limited in size, but also adaptively constructed based on the instantaneous achievable SINR.

B. Cooperative Detection and Decoding Performance
Monte Carlo simulations were run to assess the performance of the genie-aided and practical cooperation schemes in terms of packet loss rate (PLR), $P_L$, versus the number of simultaneously active devices per frame, $K$. We remark that, while a simple scenario was considered to validate the analytical derivation in Section V-A, the performance of the AP cooperation schemes is assessed on an actual CF-mMIMO system with the following specifications. We consider a square area of 400 m × 400 m with $M = 256$ single-antenna APs, connected to a single CPU and evenly distributed on a regular grid (Fig. 1). Each AP is assumed to be at a height of 10 m. To limit border effects, devices are assumed to wake up in a square area of 300 m × 300 m centered on the AP grid. The path loss between the $i$th device and the $m$th AP is given by
where $r_{mi}$ [m] is the distance in meters between active device $i$ and AP $m$. The shadowing intensity has been set to $\sigma_S = 6$ dB.
As pointed out by several authors (e.g., [10]), this propagation model matches well the 3GPP Urban Microcell model in the sub-6 GHz band, in NLOS conditions and for AP antennas at a height of 10 m. The UE transmit power ranges from −20 to 10 dBm; it is set to 10 dBm and 0 dBm when $\rho/\sigma_N^2 = 100$ dB and $\rho/\sigma_N^2 = 90$ dB, respectively. The message length of each active device is 421 bits. The message is encoded by a (511, 421, 10) BCH code, a null bit is appended to the codeword, and the encoded bits are finally mapped onto a QPSK constellation, yielding a data payload of $N_D = 256$ symbols. The payload is appended to a pilot sequence of $N_P = 16$ symbols to form the packet. The number of slots per frame is $N_s = 78$, the number of packet replicas is $R = 2$, and the number of available pilot sequences ranges from $P = 1$ to $P = 16$, depending on the test.
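The packet dimensions above follow from simple bookkeeping, reproduced here as a quick check using only the values stated in the text (the variable names are illustrative):

```python
# Packet format of Section V-B (all values taken from the text).
k_info = 421                  # information bits per message
n_bch = 511                   # codeword length of the (511, 421, 10) BCH code
coded_bits = n_bch + 1        # one null bit appended -> 512 bits
bits_per_qpsk_symbol = 2
N_D = coded_bits // bits_per_qpsk_symbol   # payload symbols
N_P = 16                                   # pilot symbols
packet_len = N_P + N_D                     # symbols per packet (one slot)

print(N_D, packet_len)                     # 256 272
print(round(k_info / packet_len, 3))       # 1.548 information bits per channel use
```

The null bit is what makes the codeword length even, so that it maps onto an integer number of QPSK symbols.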
1) $P_L$ Versus $K$ in High-Signal-to-Noise Ratio Regime: In Fig. 4(a), the proposed processing schemes are compared in terms of $P_L$ versus the number of active users per frame, $K$, for $\rho/\sigma_N^2 = 100$ dB and $N = 1$. In this condition, the limiting factor for system performance is the interference among devices. The genie-aided approach demonstrates that AP signal combining can improve performance, increasing the number of served users at $P_L = 10^{-3}$ by around 17% with respect to CS and by more than 50% with respect to NC. Furthermore, CDS-A and CDS-B also outperform CS. As expected, their improvement is lower than that of CDS-G, since the practical schemes operate blindly, i.e., without prior knowledge about the active devices. Moreover, CDS-B is the practical scheme that yields the best performance, and it proves less sensitive than CDS-A to the choice of $D_{\max}$. This is because CDS-A forms clusters of cooperating APs based solely on their mutual distances, whereas CDS-B builds clusters iteratively based on the similarity of the signals received by the APs. Consequently, CDS-B is better at identifying clusters of APs that experience the highest aggregated SINR.
For comparison, we also simulated a scenario with fewer, multi-antenna APs distributed across the area, while maintaining the same total number of antennas. Fig. 4(b) presents the performance of the proposed schemes with $M = 64$ APs, each equipped with $N = 4$ antennas, and an SNR $\rho/\sigma_N^2 = 100$ dB. The total number of AP antennas in this scenario is 256, as in the setting of Fig. 4(a). Comparing the performance of the CDS-G scheme in Fig. 4(b) with that in Fig. 4(a), it is evident that distributing the antennas throughout the network area results in a significant performance gain due to AP cooperation. However, the performance gap between the practical schemes and the genie-aided one remains approximately the same in both configurations.
Fig. 4(c) shows $P_L$ versus $K$ with $\rho/\sigma_N^2 = 100$ dB and APs equipped with $N = 4$ antennas. In this condition, where interference among devices is the primary limiting factor for system performance, the genie-aided approach reveals that AP cooperation still enhances performance, yielding an increase of approximately 8% in the number of supported users at $P_L = 10^{-3}$ compared to CS and approximately 70% compared to NC. However, in this high-SNR regime, the scalability gain with respect to CS for APs with $N = 4$ antennas is smaller than that observed for single-antenna APs, because multiple-antenna processing already provides an additional gain to all the schemes. Similarly, both CDS-A and CDS-B show improved scalability, supporting around 4% more devices at $P_L = 10^{-3}$ compared to CS. As anticipated, this enhancement is lower than that achieved by CDS-G.
2) $P_L$ Versus $K$ in Low-Signal-to-Noise Ratio Regime: Fig. 5(a) shows $P_L$ versus $K$ with $\rho/\sigma_N^2 = 90$ dB and $N = 1$. Also in this low-SNR regime, cooperative detection and decoding achieves significantly better performance than NC and CS. The improvement is due to the effective exploitation of distributed multiple antennas, enabled by AP combining. Fig. 5(a) reveals that AP combining is a powerful tool for achieving good performance in a low-device-transmit-power regime, thereby increasing the devices' battery lifetime and the network energy efficiency. This result is … Comparing the performance in Fig. 5(a) with that in Fig. 5(b), it is again evident that distributing the antennas throughout the network area enables a significant performance gain for AP cooperation. Additionally, the gap between the PLR achieved by the practical schemes and that of the genie-aided benchmark appears to be larger in this configuration.
Fig. 5(c) shows $P_L$ versus $K$ with $\rho/\sigma_N^2 = 90$ dB and APs equipped with $N = 4$ antennas. In this low-SNR regime, cooperative detection and decoding still achieves remarkably better performance than CS, even compared with the single-antenna case. The improvement is due to the effective exploitation of both co-located and distributed multiple antennas, enabled by AP combining. The results in Fig. 5(c) highlight that genie-aided AP combining supports approximately 12.5% more devices at $P_L = 10^{-3}$ than CS and more than 50% more than NC, underscoring the effectiveness of AP combining in achieving robust performance in a low-device-transmit-power regime. Notably, in this regime, CDS-B with $D_{\max} = 40$ m remains the best practical cooperation scheme, with performance remarkably close to that of CDS-G.
3) Impact of Cooperation on Devices' Energy Consumption: Fig. 6 illustrates $P_L$ versus the ratio $\rho/\sigma_N^2$ in a scenario with $K = 1000$ active users in the frame and $P = 12$ available orthogonal pilots. The figure demonstrates how AP cooperation enhances performance, enabling a reduction in device transmit power without compromising system efficiency. Specifically, achieving $P_L = 10^{-3}$ with the CS scheme requires $\rho/\sigma_N^2 = 95$ dB, while the CDS-G and CDS-B schemes achieve the same performance at $\rho/\sigma_N^2 = 90$ dB. Note that NC performance is limited by a floor above $P_L = 10^{-2}$. Therefore, the adoption of cooperative detection and decoding schemes can lead to remarkable energy savings on the device side, enabling longer battery life and aligning with the evolution of future mMTC systems designed for low-power applications.
… $N = 1$. As envisaged, increasing the number of APs improves performance, since devices are more likely to be in close proximity to at least one of the APs. Interestingly, the performance gain attainable via AP cooperation increases when there are more APs in the area. Remarkably, $P_L = 10^{-3}$ can be achieved with the CDS-G and CDS-B schemes by deploying 52 fewer APs than with CS. This presents an opportunity for network providers to simplify the network infrastructure and decrease the associated deployment costs.
5) Impact of the Number of Pilots: Fig. 8 depicts $P_L$ for $K = 1000$ active users as a function of the number of available orthogonal pilots $P$, with $\rho/\sigma_N^2 = 100$ dB. As expected, increasing $P$ improves performance, since the effect of pilot contamination is reduced. However, beyond a certain number of available pilots, $P_L$ saturates due to the remaining effect of payload interference. Fig. 8 also shows that increasing $P$ opens up the possibility of a more substantial performance improvement via AP cooperation. Specifically, the performance gains achievable through the CDS-G, CDS-A, and CDS-B schemes increase when moving from $P = 1$ to $P = 4$.
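The 5 dB gap reported for Fig. 6 can be translated into transmit power using the mapping stated earlier in this section ($\rho/\sigma_N^2 = 100$ dB corresponds to 10 dBm, hence a noise power of −90 dBm). The sketch assumes that the energy spent per packet is proportional to the transmit power for a fixed packet duration, ignoring circuit power; the helper name is hypothetical.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


# From Section V-B: rho/sigma_N^2 = 100 dB <-> 10 dBm, i.e., sigma_N^2 = -90 dBm.
noise_dBm = 10 - 100
p_cs = dbm_to_mw(95 + noise_dBm)     # CS needs rho/sigma_N^2 = 95 dB -> 5 dBm
p_coop = dbm_to_mw(90 + noise_dBm)   # CDS-G / CDS-B need 90 dB -> 0 dBm
saving = 1 - p_coop / p_cs

print(f"{p_cs:.2f} mW vs {p_coop:.2f} mW -> {100 * saving:.1f}% less transmit power")
# 3.16 mW vs 1.00 mW -> 68.4% less transmit power
```

Under this assumption, the cooperative schemes reach the same $P_L$ with roughly one third of the transmit power required by CS.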

6) Computational Complexity and Fronthaul Load:
To conclude the analysis, we evaluate the computational complexity of the proposed processing schemes using Big O notation, which expresses the efficiency of our algorithms in terms of the number of computational steps required for their execution on a generic machine. We denote by $C_{\max}$ the number of APs involved in a decoding attempt in the CDS-A algorithm. Throughout the analysis, the sequence of operations needed to perform channel estimation, payload detection, and message decoding is counted as $cN$ steps, where $c$ is the number of APs involved in the detection and decoding process. For example, $c = 1$ in the NC and CS schemes, $c = C_{\max}$ in CDS-A, and $c$ ranges from 1 to $C_{\max}$ in CDS-G and CDS-B. Also, the sequence of operations needed to perform interference cancelation at one antenna of one AP is counted as 1 step.
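Under this counting convention, the factor $1/2$ appearing in the CDS-G and CDS-B counts below can be traced, on our reading, to the growth of a cluster from one AP up to $C_{\max}$ APs, where each attempt with $c$ APs costs $cN$ steps:

$$\sum_{c=1}^{C_{\max}} cN = \frac{N\,C_{\max}(C_{\max}+1)}{2}.$$

Repeating this once for each of the $KR$ transmitted replicas gives the Phase 1 bound $(1/2)KRNC_{\max}(C_{\max}+1)$ of CDS-G, while repeating it for each of the $M$ clusters, $N_s$ slots, and $P$ pilots gives the Phase 1 bound $(1/2)MNN_sPC_{\max}(C_{\max}+1)$ of CDS-B.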
We also calculate the amount of fronthaul traffic that the cell-free network architecture has to undergo for each proposed scheme to work.In particular, let us define the fronthaul load as the number of bits that all APs have to send to the CPU for the processing of one entire frame when adopting one of the proposed schemes, and let us denote by B the number of memory bits occupied by a complex number.
The computational complexities and fronthaul loads for all the proposed processing schemes are reported in Table I and detailed in the following.
1) NC: Phase 1 requires $MNN_sP$ steps for all the APs to process all the slots of the frame. Then, each AP subtracts the interference generated by the replicas of the decoded messages and attempts decoding again in all slots in which the replicas were transmitted. In the worst case, this yields $MNKR$ steps for interference cancelation and $MN(K-1)RP$ steps for reprocessing the already processed slots. The overall complexity of NC is $O(MN(N_sP + KR + (K-1)RP))$. Considering the fronthaul load, each AP operates independently, transmitting the locally decoded messages to the CPU. In the worst-case scenario, the fronthaul load is $kKM$ bits per frame.
2) CS: The CS scheme follows the same processing steps as NC, the sole distinction being that the CPU orchestrates a distributed SIC phase. Consequently, in the worst-case scenario, the computational complexity of the CS scheme is equivalent to that of the NC scheme, i.e., $O(MN(N_sP + KR + (K-1)RP))$. However, we note that the NC scheme is less likely to decode all active users and thereby initiate the corresponding interference cancelations. Consequently, under practical conditions, the NC scheme exhibits lower computational complexity than CS, albeit at the cost of reduced system performance. In the worst-case scenario, the APs transmit a total of $kKM$ bits per frame over the fronthaul.
3) CDS-G: Phase 1 requires at most $(1/2)KRNC_{\max}(C_{\max}+1)$ steps to process all the users' replicas once. Regarding Phase 2, the DIC and UCD procedures require at most $MNKR$ and $(1/4)NRC_{\max}K(C_{\max}+1)(K-1)$ steps, respectively. Therefore, the overall complexity of the CDS-G scheme is $O(MNKR + (1/4)NRC_{\max}K(C_{\max}+1)(K-1))$. Since CDS-G is a genie-aided benchmark, it involves an exhaustive search for the subsets of APs that maximize the SINR of each user, which results in a computational complexity that scales with the square of the number of users, $K^2$, and with the square of the maximum AP cluster size, $C_{\max}^2$. Each AP sends $(N_P + N_D)NN_s$ complex numbers to the CPU, resulting in a total fronthaul load of $B(N_P + N_D)NMN_s$ bits per frame.
4) CDS-A: As this cooperation mechanism relies on the formation of $M$ clusters determined exclusively by the geometry of the APs, the number of steps required by the CDS-A scheme in the worst-case scenario is similar to that of the NC and CS schemes, the only difference being that the number of APs involved in each decoding attempt is $C_{\max}$. The overall complexity is thus $O(MN(C_{\max}N_sP + KR + (K-1)RPC_{\max}))$. As for CDS-G, the total fronthaul load is $B(N_P + N_D)NMN_s$ bits per frame.
5) CDS-B: Phase 1 of the CDS-B scheme requires at most $(1/2)MNN_sPC_{\max}(C_{\max}+1)$ processing steps. Regarding Phase 2, the DIC and CSCD procedures require at most $MNKR$ and $(1/2)MN(K-1)PRC_{\max}(C_{\max}+1)$ steps, respectively. Therefore, the overall complexity is $O((1/2)MN(N_sPC_{\max}(C_{\max}+1) + 2KR + (K-1)RPC_{\max}(C_{\max}+1)))$. As for CDS-G and CDS-A, the total fronthaul load is $B(N_P + N_D)NMN_s$ bits per frame.
As an example, for a system with $M = 256$, $N = 1$, $N_s = 78$, $K = 1000$, $k = 421$ bits, $N_P = 16$ symbols, and $N_D = 256$ symbols, and assuming that a complex number occupies $B = 128$ bits in memory, the fronthaul load of CS and NC becomes 107.8 Mb/frame, while that of CDS-G, CDS-A, and CDS-B is 695.2 Mb/frame (i.e., about 6.5 times higher).
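The worst-case expressions above can be coded directly, e.g., to reproduce the example fronthaul figures or to compare the schemes for other parameter values. This is a small sketch with hypothetical function names; for CDS-G it returns the sum of the listed Phase 1 and Phase 2 counts rather than only the terms kept in the $O(\cdot)$ expression, and the values $P = 12$, $R = 2$, $C_{\max} = 5$ in the usage lines are illustrative.

```python
def fronthaul_bits(scheme, M, N, N_s, K, k, N_P, N_D, B):
    """Worst-case fronthaul load per frame [bits] for each scheme."""
    if scheme in ("NC", "CS"):
        return k * K * M                      # decoded messages only
    if scheme in ("CDS-G", "CDS-A", "CDS-B"):
        return B * (N_P + N_D) * N * M * N_s  # raw complex samples
    raise ValueError(f"unknown scheme: {scheme}")


def complexity_steps(scheme, M, N, N_s, P, K, R, C_max):
    """Worst-case step counts behind the O(.) expressions."""
    if scheme in ("NC", "CS"):
        return M * N * (N_s * P + K * R + (K - 1) * R * P)
    if scheme == "CDS-A":
        return M * N * (C_max * N_s * P + K * R + (K - 1) * R * P * C_max)
    tri = C_max * (C_max + 1)                 # always even
    if scheme == "CDS-G":
        phase1 = K * R * N * tri // 2
        phase2 = M * N * K * R + N * R * tri * K * (K - 1) // 4
        return phase1 + phase2
    if scheme == "CDS-B":
        return M * N * (N_s * P * tri + 2 * K * R + (K - 1) * R * P * tri) // 2
    raise ValueError(f"unknown scheme: {scheme}")


params = dict(M=256, N=1, N_s=78, K=1000, k=421, N_P=16, N_D=256, B=128)
cs = fronthaul_bits("CS", **params)
cds = fronthaul_bits("CDS-B", **params)
print(f"{cs / 1e6:.1f} Mb vs {cds / 1e6:.1f} Mb ({cds / cs:.2f}x)")
# 107.8 Mb vs 695.2 Mb (6.45x)

base = dict(M=256, N=1, N_s=78, P=12, K=1000, R=2)
for scheme in ("NC", "CDS-A", "CDS-G", "CDS-B"):
    print(scheme, complexity_steps(scheme, C_max=5, **base))
```

A useful sanity check is that, with $C_{\max} = 1$, the CDS-A and CDS-B counts collapse to the NC count, since every decoding attempt then involves a single AP.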
We also observe that the centralized nature of CDS-A and CDS-B ensures a stable fronthaul load independent of the number of users, making these algorithms scalable and practical in large networks.On the other hand, in more distributed algorithms, such as NC and CS, the fronthaul load depends on K: it initially increases with the density of users, until the system load reaches a critical point; once the system becomes overloaded and the PLR rises to large values, the fronthaul load decreases due to the reduced number of successful decoding events.

VI. CONCLUSION
In this article, a grant-free MMA scheme based on CRA and specifically designed for CF-mMIMO networks has been presented. Leveraging the distributed nature of the network and the combining of signals received at properly selected APs, the proposed processing enhances the system scalability-versus-reliability tradeoff. Initially, a stochastic geometry-based analytical framework to study the performance of AP cooperation under different cluster compositions has been derived. Then, the potential gain attainable via AP cooperation has been explored using a genie-aided approach. Finally, two practical algorithms for AP selection in a fully grant-free context have been proposed. Numerical results have shown how AP cooperation through AP signal combining and cooperative SIC can bring tangible benefits even without prior information about active users, under both high- and low-SNR conditions, closing in some cases the gap to the genie-aided approach. Moreover, it has been shown that the adoption of cooperative detection and decoding schemes can lead to energy savings, enabling longer device battery life and aligning with the evolution of future mMTC systems designed for low-power applications. It should be remarked that AP signal combining comes at a price in terms of increased fronthaul traffic, as soft signal samples are forwarded to the CPU, and of increased complexity at the CPU, which is now tasked with deciding which APs should be selected for signal combining and with decoding messages from combined signals. These aspects represent further directions of investigation.

Manuscript received 15 March 2024; revised 29 May 2024 and 2 July 2024; accepted 5 July 2024. Date of publication 15 July 2024; date of current version 23 August 2024. This work was supported in part by the Consorzio Nazionale Interuniversitario per le Telecomunicazioni (CNIT) Wireless Communications Laboratory (WiLab) and the WiLab-Huawei Joint Innovation Center, and in part by the European Union through the Italian National Recovery and Resilience Plan of NextGenerationEU, partnership on "Telecommunications of the Future" under Grant PE00000001-RESTART. This article was presented in part at the 2024 IEEE International Conference on Communications (ICC 2024) Workshops. (Corresponding author: Enrico Testi.)

Fig. 1. Illustration of a CRA CF-mMIMO scenario where two clusters of APs obtained via the CDS-A and CDS-B strategies perform cooperative detection and decoding of the packets transmitted by the $j$th and the $i$th active devices, respectively.

Algorithm 2: Pseudo-code for CS Scheme
Phase 1 - Slot-by-Slot Decoding (SD)
1 for m ← 1 to M do
2   Execute …
…
4 Merge all the lists L_m for m = 1, …, M
5 forall W ∈ L do
    DIC([N_s]_W):
6   for m ← 1 to M do
7     forall s ∈ [N_s]_W do
8       Process slot s
9       IC …
…
12  Remove W from L
13  Execute SD([N_s]_W, m), m = 1, …, M
14 end

Algorithm 4: Pseudo-code for CDS-A Scheme
RD(C_1, …, C_M; D_max):
1 for m ← 1 to M do
2   C_m ← all the APs j : r_{m,j} ≤ D_max
3 end
Phase 1 - Slot-by-Slot Cooperative Decoding (SCD), SCD([N_s]):
4 forall s ∈ [N_s] do
5   Process slot s
6   for m ← 1 to M do
7     for j ← 1 to P do
        DD(C_m, j):
8       C ← |C_m|
9       ĥ_k(j) ← Y_{k,P} p_j^H / ‖p_j‖², ∀k ∈ C_m
10      v_m(j) ← q̂_m(j)^H Y_{m,D} / ‖q̂_m(j)‖²
11      Attempt decoding v_m(j)
12      if decoding successful then
13        L ← …
…
18 forall l ∈ L do
19   Execute DIC([N_s]_l)
20   Execute SCD([N_s]_l)
21 end

Fig. 3. (a) Performance metric $\gamma_i(\mathcal{C}_i)$ versus the number of APs in cluster $\mathcal{C}_i$, for different numbers of orthogonal pilots, $P$ (i.e., under different pilot contamination regimes). (b) CDF of the SINR under different cluster compositions with $P = 32$.

Fig. 6. $P_L$ of the proposed processing schemes versus $\rho/\sigma_N^2$, with $K = 1000$, $P = 12$ orthogonal pilots, and $N = 1$. The performance gap between the proposed schemes is highlighted at $P_L = 10^{-3}$.
is the aggregate of the payload signals received by all the antennas of the APs in cluster $\mathcal{C}_i$, and the estimates of all channel gains are stacked in the single vector $\hat{\mathbf{q}}_i = [\hat{\mathbf{h}}_{t_1 i}^T, \ldots, \hat{\mathbf{h}}_{t_{C_i} i}^T]^T$, where $\hat{\mathbf{h}}_{mi}^T = [\hat{h}_{m1i}, \ldots, \hat{h}_{mNi}]$ with $m \in \mathcal{C}_i$. Note that the MRC operation in …
Algorithm 3 (continued):
21 while C ≤ C_max and W_i not decoded
22 if W_i not decoded then
23   r ← r + 1
…
27 forall l ∈ L do
28   Execute DIC([N_s]_l)