Chirp-Based Majority Vote Computation for Federated Edge Learning and Distributed Localization

In this study, we propose an over-the-air computation (OAC) scheme based on chirps to detect the majority votes (MVs) in a wireless network for federated edge learning (FEEL) and distributed localization. With the proposed approach, a group of votes is mapped to the index of a linear chirp at each edge device (ED). From the superposed chirp signals, the corresponding MVs are then detected non-coherently at the edge server (ES) with a set of energy comparators by exploiting the bit representation of the indices. The proposed scheme is power-efficient and has low out-of-band emission, and it does not use channel state information (CSI) at the EDs or the ES. Hence, it paves the way for long-distance FEEL and distributed localization based on MVs in a wireless sensor network with low-complexity devices. For FEEL, we comprehensively demonstrate the efficacy of the proposed approach under heterogeneous data distribution. For localization, we propose iterative refinements and multiple repetitions to improve the localization performance. We show that the proposed strategies narrow the gap between the root-mean-square error (RMSE) and the quantization bound.


I. INTRODUCTION
Internet-of-Things (IoT) leads to a massive amount of data distributed across many edge devices (EDs), such as sensors. For many applications, the distributed data must be processed for further inference. However, transmitting local data from multiple EDs to fusion nodes over wireless links not only causes spectrum congestion but also raises privacy concerns. To address these challenges, over-the-air computation (OAC) has recently been proposed [3], [4]. By exploiting the signal-superposition property of wireless channels, OAC computes a function of the local data at the EDs over the air, which significantly reduces latency and improves privacy. However, multipath fading, power misalignment, and time-synchronization issues in practical networks complicate the design of a reliable OAC scheme. In this work, we focus on the design of a reliable OAC scheme that can be used for distributed learning and localization in a large-scale sensor network.
In the literature, many studies investigate OAC for distributed learning, in particular for federated edge learning (FEEL), one of the well-studied distributed learning paradigms [5], [6]. FEEL deploys federated learning (FL) over a wireless network, in which many EDs participate in training using locally accessible data and an edge server (ES) aggregates the local decisions without accessing the data at the EDs. In [3], the authors develop an analog aggregation framework in which the model parameters of a neural network are transmitted over orthogonal frequency division multiplexing (OFDM) subcarriers and aggregated over the air. To mitigate the impact of multipath fading, truncated channel inversion (TCI) is used, in which the symbols on the OFDM subcarriers are multiplied by the inverse of the channel coefficients. However, because the channel state information (CSI) is a function of the synchronization point, TCI requires sample-level time synchronization. Additionally, with TCI, the subcarriers that experience deep fading are excluded from the transmission, which could lead to information loss. In [4], the authors propose a digital aggregation framework called one-bit broadband digital aggregation (OBDA) based on sign stochastic gradient descent (signSGD) [7], where the signs of the gradients are mapped to quadrature phase-shift keying (QPSK) symbols and transmitted with TCI. In [8], the authors analyze a scenario where the ES is equipped with multiple antennas and the CSI is not available at the EDs; to achieve coherent combining, the ES uses the superposed CSIs. In [9], the authors propose an orthogonal OAC scheme with majority vote (MV) based on signSGD [7], where the MV is calculated by comparing the energies accumulated over orthogonal OFDM resources. In [10], it is shown that if the gradients are correlated, the resulting OFDM symbol can exhibit a very high instantaneous peak power.
Hence, OBDA and the scheme introduced in [9] yield a high peak-to-mean envelope power ratio (PMEPR). To address this issue, in [10], the signs of the gradients are transmitted as pulse-position modulation (PPM) symbols constructed with discrete Fourier transform (DFT)-spread OFDM (DFT-s-OFDM), while preserving the non-coherent orthogonal detection of [9]. Nevertheless, the PMEPR can still be high depending on the choice of parameters, e.g., the pulse duration in a PPM symbol. The methods proposed in [9], [10] do not rely on CSI and are therefore resilient to time-synchronization issues. However, the aggregation relies on a power control mechanism that compensates for path loss so that the signals from all EDs arrive at the ES with similar power levels. Therefore, power efficiency is also crucial for achieving wider cell coverage. In [1], a low-PMEPR circularly-shifted chirp (CSC)-based MV is proposed for a power-efficient FEEL implementation with wider coverage. However, that approach has low spectral efficiency, which can erode the advantage of OAC over the traditional first-communicate-then-compute approach. In [11], a fast and accurate chirp-based data aggregation technique is introduced for Long Range (LoRa) networks. In this work, we propose a power-efficient alternative that also preserves good spectral efficiency.
Distributed localization in wireless sensor networks (WSNs) is another fundamental application of IoT networks, in which localization accuracy and latency are the key aspects. To exploit multipath propagation and its dependence on environmental factors, fingerprint-based localization methods have been proposed that obtain the location by establishing a fingerprint database [12], [13]. An energy-efficient solution for fingerprint-based positioning is proposed in [14]. However, fingerprint data collection requires a large number of signal samples and regular updates to account for environmental changes, which makes the fingerprint matching process computationally intensive [15]. In the literature, many studies investigate distributed localization for low-complexity sensors, e.g., [16], [17], [18], [19].
In [16], the authors propose range-free localization as a cost-effective solution for low-complexity WSNs; however, it achieves only coarse accuracy. The authors in [17] propose a voting-based localization method that can withstand malicious beacon nodes. However, it incurs a storage overhead because the target node must store all local data. In [18], the authors propose a collaborative distributed localization scheme that allows a trade-off between the number of wireless transmissions and accuracy. In [19], the authors propose a distributed localization algorithm for low-power sensors in Fifth Generation (5G)-enabled IoT networks. The methods proposed in [18] and [19] require the sensors to access information about neighboring sensors, which is inconvenient in practice and introduces communication overhead. Another concern with the schemes in [16], [17], [18], [19] is that the latency increases linearly with the number of sensors. To the best of our knowledge, OAC has not been used for distributed localization in the literature.
In this work, we propose a power- and spectrally-efficient OAC scheme based on chirps to achieve a large coverage range, together with a set of solutions for distributed learning and localization based on the proposed scheme. Our contributions are listed as follows:
• A low-complexity OAC solution: Motivated by the low power consumption of chirp waveforms, we develop a CSC-based OAC method to calculate the aggregation of the local binary votes at an ES. We map the local votes to a linear combination of CSCs for transmission. Our scheme is based on non-coherent energy detection; it does not require coherent aggregation and operates under imperfect synchronization. Since it does not rely on the availability of CSI at the ED or the ES, it is suitable for low-complexity sensors.
• Spectrally-efficient transmission: We improve the encoder proposed in our previous work [1] by mapping a group of votes to a chirp index, which increases the spectral efficiency. We provide theoretical comparisons with the encoder in [1]. The proposed encoder also provides a trade-off between spectral efficiency and PMEPR.
• Power-efficient transmission: We show that our chirp-based transmission offers a significantly lower out-of-band emission than OBDA when power amplifier (PA) non-linearity is considered. For a given adjacent-channel-leakage ratio (ACLR) constraint, we demonstrate that the proposed scheme requires less output-power back-off (OBO) than OBDA. A lower OBO requirement allows power-efficient transmission, which results in enhanced cell coverage.
• Comprehensive analysis for distributed learning and localization: Inspired by communication-efficient distributed training with the MV and signSGD [7], we develop chirp-based distributed learning over a wireless network by leveraging the wider cell coverage of CSC-based MV (CSC-MV). We demonstrate that the proposed approach achieves high test accuracy under both homogeneous and heterogeneous data distributions. Also, based on the proposed scheme, we present a new low-complexity localization method. Extending our initial results in [2], we incorporate multiple iterations and repetitions to enhance the resolution of the localization and to increase the detection probability, respectively. We demonstrate that the proposed modifications can reduce the root-mean-square error (RMSE) to a level comparable to the quantization bound.
The rest of the paper is organized as follows. In Section II, the system model is given. In Section III, the proposed scheme is discussed. In Section IV, the proposed method is analyzed for FEEL and distributed localization. In Section V, we provide several numerical results. Finally, we conclude the paper in Section VI.
Notation: The complex and real numbers are denoted by $\mathbb{C}$ and $\mathbb{R}$, respectively. $\mathbb{Z}_S$ denotes the set of integers {0, 1, ..., S − 1}. $\mathbb{E}[\cdot]$ denotes the expectation of its argument. The sign function is denoted by sign(·) and returns 1, −1, or ±1 at random for a positive, a negative, or a zero-valued argument, respectively. The N-dimensional all-zero and all-one vectors are $\mathbf{0}_N$ and $\mathbf{1}_N$, respectively, and $\mathbf{I}_N$ is the N × N identity matrix. P[·] denotes the probability of an event.

II. SYSTEM MODEL
Consider a cell with K EDs and an ES. Each ED and the ES is equipped with a single antenna. The EDs participate in Q votes indexed by $i \in \mathbb{Z}_Q$. Let $v_{k,i} \in \{-1, +1\}$ denote the ith vote of the kth ED. The difference between the numbers of EDs voting for +1 and −1 for the ith vote can be written as
$$\Delta_i = \sum_{k=1}^{K} v_{k,i}. \tag{1}$$
The ES aims to compute $\Delta_i$ as the target function and to calculate the MV across the K EDs for a given application as
$$v_i^{\mathrm{MV}} = \mathrm{sign}(\Delta_i). \tag{2}$$
The main objective of this study is to develop an OAC scheme that estimates $\Delta_i$ for all $i \in \mathbb{Z}_Q$ with consideration of power and spectral efficiency, and to evaluate its efficacy for applications such as FEEL and distributed localization, as discussed in detail in Section IV.
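As a minimal illustration of the target function and the MV (assuming NumPy), the sketch below sums the ±1 votes of the K EDs for each of the Q votes and takes the sign, breaking zero-valued ties at random as stated in the Notation paragraph. This is the error-free reference that the OAC scheme estimates; the function name `majority_vote` is hypothetical.

```python
import numpy as np

def majority_vote(votes, rng):
    """votes: K x Q array with entries in {-1, +1}.
    Returns the vote differences (sum over EDs) and the majority votes (their signs)."""
    delta = votes.sum(axis=0)                 # difference between +1 and -1 voters per vote
    mv = np.sign(delta)
    ties = mv == 0
    # sign(0) is +1 or -1 at random, as in the Notation paragraph
    mv[ties] = rng.choice([-1, 1], size=int(ties.sum()))
    return delta, mv
```

For example, with three EDs and votes `[[1, -1, 1], [1, 1, -1], [-1, -1, 1]]`, the vote differences are `[1, -1, 1]` and the MVs coincide with them.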

A. POWER CONTROL
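We consider a power control mechanism that takes into account the maximum transmit power constraint at the EDs. Let $P_t(r_k)$ and $P_r(R)$ be the average transmit power of the kth ED located at distance $r_k$ from the ES and the average received power at the ES, respectively. The received power from the kth ED located at distance $r_k$ from the ES can be expressed as
$$P_t(r_k)\,G_{\mathrm{ref}}\left(\frac{r_{\mathrm{ref}}}{r_k}\right)^{\alpha},$$
where α is the path loss exponent of the corresponding channel and $G_{\mathrm{ref}}$ is the path gain for the reference link distance $r_{\mathrm{ref}}$. We assume that α is known at the transmitter side. With the consideration of a minimum OBO constraint, under power control, $P_t(r_k)$ can be expressed as
$$P_t(r_k) = \begin{cases} P_{\mathrm{ref}}\left(\frac{r_k}{r_{\mathrm{ref}}}\right)^{\alpha}, & r_k \le r_P,\\ P_{\mathrm{ref}}\left(\frac{r_P}{r_{\mathrm{ref}}}\right)^{\alpha}, & r_k > r_P, \end{cases}$$
where $r_P$ is the threshold range beyond which the EDs are unable to increase the transmit power, and $P_{\mathrm{ref}}$ is the transmit power at distance $r_{\mathrm{ref}}$. The OBO at the reference distance and the minimum OBO are given by $\mathrm{OBO}_{\mathrm{ref}} = 10\log_{10}(P_{\mathrm{sat}}/P_t(r_{\mathrm{ref}}))$ and $\mathrm{OBO}_{\min} = 10\log_{10}(P_{\mathrm{sat}}/P_t(r_P))$, respectively, where $P_{\mathrm{sat}}$ is the saturated output power of the PA. Since $P_t(r_P)/P_t(r_{\mathrm{ref}}) = (r_P/r_{\mathrm{ref}})^{\alpha}$, we obtain $r_P$ as
$$r_P = r_{\mathrm{ref}}\,10^{\frac{\mathrm{OBO}_{\mathrm{ref}}-\mathrm{OBO}_{\min}}{10\alpha}}.$$
Therefore, a smaller $\mathrm{OBO}_{\min}$ value offers wider coverage. Without loss of generality, we assume that $G_{\mathrm{ref}} = 1$ and $P_{\mathrm{ref}} = 1$ Watt in this study.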
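The coverage relation can be checked numerically. The sketch below is a minimal illustration, assuming the truncated path-loss-inversion rule in which the transmit power grows as (r/r_ref)^α up to r_P and then saturates, so that OBO_ref − OBO_min = 10α log10(r_P/r_ref); the helper names are hypothetical.

```python
import math

def coverage_radius(obo_ref_db, obo_min_db, alpha, r_ref):
    """Threshold range r_P = r_ref * 10**((OBO_ref - OBO_min) / (10 * alpha))."""
    return r_ref * 10 ** ((obo_ref_db - obo_min_db) / (10 * alpha))

def transmit_power(r, r_p, alpha, r_ref=10.0, p_ref=1.0):
    """Power control: P_t grows as (r / r_ref)^alpha up to r_P and is capped beyond it."""
    return p_ref * (min(r, r_p) / r_ref) ** alpha
```

With the numerical setup used later (OBO_ref = 30 dB at r_ref = 10 m, α = 4), an OBO_min of 10.5 dB gives `coverage_radius(30, 10.5, 4, 10)` of about 30.7 m, and any smaller OBO_min gives a larger r_P, which is why a lower back-off requirement translates into wider coverage.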

B. SIGNAL MODEL: CIRCULARLY-SHIFTED CHIRPS
In this work, we map the votes to chirps and compute the aggregation of votes via OAC. Let $B_0(t) = e^{j\psi_0(t)}$ be the complex baseband signal of a band-limited linear chirp with duration $T_{\mathrm{chirp}}$. We assume that the instantaneous frequency of the phase term $\psi_0(t)$ changes from $-\frac{D}{2T_{\mathrm{chirp}}}$ Hz to $\frac{D}{2T_{\mathrm{chirp}}}$ Hz from time 0 to $T_{\mathrm{chirp}}$ seconds for some frequency sweeping factor D. Let $B_l(t) = e^{j\psi_l(t)}$ be the lth circular translation of $B_0(t)$, i.e., $B_l(t) = B_0\big((t-\tau_l) \bmod T_{\mathrm{chirp}}\big)$, where $\tau_l = \frac{l}{M}T_{\mathrm{chirp}}$ is the amount of circular shift and $l = 0, 1, \ldots, M-1$. The complex baseband signal p(t) of a linear combination of the M translated chirps can be expressed as
$$p(t) = \sum_{l=0}^{M-1} d_l B_l(t),$$
where $d_l$ is the lth modulation symbol, with $d_l = 0$ or $d_l \in \mathbb{C}$, e.g., a phase-shift keying (PSK) symbol. In [20], it is shown that a linear combination of CSCs can be synthesized through a DFT-s-OFDM transmitter with a specific choice of frequency-domain spectral shaping (FDSS) coefficients, motivated by its compatibility with 3GPP 4G LTE and 5G NR. This synthesis technique can be used to generate the mth transmitted baseband signal of each ED as
$$\mathbf{t}_{k,m} = \mathbf{F}_N^{\mathrm H}\mathbf{M}_{\mathrm f}\left(\mathbf{f}\odot\mathbf{D}_M\mathbf{d}_{k,m}\right),$$
where $\mathbf{t}_{k,m} \in \mathbb{C}^N$ is the mth baseband signal in discrete time for the kth ED, $\mathbf{d}_{k,m} \in \mathbb{C}^M$ is the vector containing the symbols on the M bins, $\mathbf{D}_M \in \mathbb{C}^{M\times M}$ is the orthonormal M-point DFT matrix, $\mathbf{f} \in \mathbb{C}^M$ is the FDSS vector for CSCs, $\odot$ denotes element-wise multiplication, $\mathbf{M}_{\mathrm f} \in \mathbb{R}^{N\times M}$ is the mapping matrix that maps the output of the DFT precoder to a set of subcarriers, and $\mathbf{F}_N^{\mathrm H} \in \mathbb{C}^{N\times N}$ is the orthonormal N-point inverse DFT (IDFT) matrix. The vector f is derived in [20] in terms of coefficients $c_j$, which are expressed through $S(\cdot)$ and $C(\cdot)$, the Fresnel integrals with sine and cosine functions, respectively. The M input indices of the DFT-s-OFDM transmitter correspond to the M CSCs within the symbol duration.
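To make the CSC definitions concrete, the following sketch samples the chirps directly in the time domain. It is a simplified illustration, not the DFT-s-OFDM/FDSS synthesis of [20]: it assumes the quadratic phase ψ0(t) = πD(t/T)² − πDt/T, whose instantaneous frequency sweeps from −D/(2T) to D/(2T), and a normalized chirp duration; all function names are hypothetical.

```python
import numpy as np

def base_chirp(t, T, D):
    """B_0(t) = exp(j psi_0(t)); instantaneous frequency sweeps from -D/(2T) to D/(2T) Hz."""
    psi = np.pi * D * (t / T) ** 2 - np.pi * D * t / T
    return np.exp(1j * psi)

def csc(l, t, T, D, M):
    """l-th circularly shifted chirp B_l(t) = B_0((t - tau_l) mod T), with tau_l = l * T / M."""
    return base_chirp(np.mod(t - l * T / M, T), T, D)

def csc_combination(symbols, t, T, D, M):
    """p(t) = sum_l d_l B_l(t); `symbols` maps chirp index l to d_l (missing indices mean d_l = 0)."""
    p = np.zeros_like(t, dtype=complex)
    for l, d in symbols.items():
        p += d * csc(l, t, T, D, M)
    return p

def pmepr_db(x):
    """Peak-to-mean envelope power ratio of the sampled signal in dB."""
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())
```

A single CSC has a constant envelope, so its PMEPR is 0 dB; activating more CSCs, e.g., `csc_combination({0: 1.0, 120: -1.0}, t, 1.0, 64, 240)` on `t = np.arange(4096) / 4096`, raises the PMEPR, which is the trade-off discussed in Section III-C.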
In this study, synchronization errors are taken into account. The time synchronization of the EDs may be imperfect, which leads to different times of arrival at the ES. We consider that the times of arrival of the signals vary randomly from 0 to $N_{\mathrm{arr}}$ discrete-time samples with equal probability. Thus, the maximum difference between the arrival times is $T_{\mathrm{arr}} = N_{\mathrm{arr}}T_{\mathrm{sample}}$. The synchronization point at which the DFT is applied to the received signal for demodulation at the ES can also be detected inaccurately. To model this impairment, we consider that the synchronization point at the ES can vary by $N_{\mathrm{err}}$ discrete-time samples. Thus, $T_{\mathrm{err}} = N_{\mathrm{err}}T_{\mathrm{sample}}$ is the maximum time difference between the erroneous synchronization points, and the maximum synchronization error is $T_{\mathrm{sync}} = T_{\mathrm{arr}} + T_{\mathrm{err}}$ seconds. We also introduce random phases in the frequency domain to model the impact of imperfect synchronization in the time domain. Let $T_{\mathrm{chn}}$ denote the maximum delay of the corresponding channels in seconds. We assume that a synchronization mechanism ensures that the duration of the cyclic prefix (CP) is longer than the sum of the maximum synchronization error $T_{\mathrm{sync}}$ and the maximum delay $T_{\mathrm{chn}}$. For example, in 802.11ax, an access point (AP) can initiate a synchronization procedure for multi-user orthogonal frequency division multiple access (OFDMA) transmission using trigger frame (TF)-based signaling [21]. Therefore, we can write the mth received baseband signal in discrete time as
$$\mathbf{r}_m = \sum_{k=1}^{K}\mathbf{H}_k\mathbf{t}_{k,m} + \mathbf{n}_m,$$
where $\mathbf{H}_k \in \mathbb{C}^{N\times N}$ is a circular-convolution matrix based on the DFT of the channel impulse response (CIR) between the kth ED and the ES, and $\mathbf{n}_m \sim \mathcal{CN}(\mathbf{0}_N, \sigma_n^2\mathbf{I}_N)$ is the additive white Gaussian noise (AWGN). At the ES, the received symbols on the M bins of the mth OFDM symbol are collected in the vector $\tilde{\mathbf{d}}_m \in \mathbb{C}^M$.
We assume that frequency synchronization is perfect. Although residual frequency offsets are inevitable in practice, we assume that a control mechanism, similar to that used in 3GPP Fourth Generation (4G) Long-Term Evolution (LTE) and/or 5G New Radio (NR) with the random-access channel (RACH) and/or the physical uplink control channel (PUCCH), manages frequency synchronization errors in the network [22].
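Why a CP longer than the synchronization error plus the channel delay suffices can be seen from the DFT shift property: a delay that stays within the CP acts as a circular shift, which only rotates the phase of each subcarrier. The sketch below checks this with NumPy and evaluates the CP condition in samples. The 410 ns value is the maximum excess delay of the 3GPP EPA profile; the helper names are hypothetical.

```python
import math
import numpy as np

def cp_is_sufficient(n_cp, n_arr, n_err, n_chn):
    """CP (in samples) must exceed T_sync + T_chn, with T_sync = T_arr + T_err."""
    return n_cp > n_arr + n_err + n_chn

def delayed_spectrum(x, delay):
    """DFT of a circularly delayed OFDM symbol: each subcarrier k is rotated by exp(-j 2 pi k delay / N)."""
    n = len(x)
    k = np.arange(n)
    return np.fft.fft(x) * np.exp(-2j * np.pi * k * delay / n)
```

With f_s = 30.72 Msps, the EPA delay spread spans `math.ceil(410e-9 * 30.72e6)` = 13 samples, so `cp_is_sufficient(64, 1, 1, 13)` holds for the CP size and synchronization errors used in the numerical results.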

C. NOTATIONS FOR FUNCTIONS AND SEQUENCES
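Let f be a Boolean function that maps $\mathbb{Z}_2^{2m}$ to $\mathbb{Z}_2$. Similarly, let g be a pseudo-Boolean function that maps $\mathbb{Z}_2^{m}$ to $\mathbb{R}$. In this study, to simplify the notation, we use $\tilde{\cdot}$ on top of a function to denote a function that maps non-negative integers in $\mathbb{Z}_{2^m}$ to the co-domain of the original function. For instance, $\tilde{f}$ maps $i, j \in \mathbb{Z}_{2^m}$ to $\mathbb{Z}_2$ as $\tilde{f}(i, j) \triangleq f(\mathbf{x}, \mathbf{y})$, where $i = e(\mathbf{x})$ and $j = e(\mathbf{y})$ for $\mathbf{x} \triangleq (x_1, x_2, \ldots, x_m)$ and $\mathbf{y} \triangleq (y_1, y_2, \ldots, y_m)$. In other words, i and j are the decimal representations of the binary numbers constructed from all elements of the sequences x and y, where the most significant bits are $x_1$ and $y_1$, respectively. Similarly, $\tilde{g}$ maps $i \in \mathbb{Z}_{2^m}$ to the co-domain of g as $\tilde{g}(i) \triangleq g(e^{-1}(i))$. A bar over a binary digit denotes the NOT operation.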
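The binary-to-decimal map and the tilde operator can be written in a few lines. This sketch assumes e(·) reads a sequence with the most significant bit first, as stated above; `e`, `e_inv`, and `lift` are hypothetical names.

```python
def e(bits):
    """Decimal value of a binary sequence, most significant bit first."""
    value = 0
    for b in bits:
        value = 2 * value + int(b)
    return value

def e_inv(i, m):
    """Inverse of e: the length-m binary sequence (MSB first) whose decimal value is i."""
    return tuple((i >> (m - 1 - pos)) & 1 for pos in range(m))

def lift(g, m):
    """Tilde operator: g~(i) = g(e^{-1}(i)) for a function g of length-m binary sequences."""
    return lambda i: g(e_inv(i, m))
```

For example, `e((1, 0, 1))` is 5, and lifting the pseudo-Boolean function "number of ones" gives a function on {0, ..., 7} with value 3 at i = 7.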

III. MAJORITY VOTE COMPUTATION WITH CHIRPS
A. TRANSMITTER
1) ENCODER
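At each ED, the Q available votes are partitioned into G groups of length $N_v$, indexed by $g \in \mathbb{Z}_G$, and the last group is padded with −1 votes if Q is not an integer multiple of $N_v$. Without loss of generality, we focus on the transmission of the gth group of votes. Let the sequence $\mathbf{v}_k$ denote the gth group of votes of the kth ED, and let $\mathbf{y}_k = (y_{k,1}, \ldots, y_{k,N_v})$ be its binary representation, where the votes +1 and −1 are mapped to the bits 1 and 0, respectively. Let $s_k \triangleq e(\mathbf{y}_k)$ be the decimal value of $\mathbf{y}_k$. Our proposed encoder encodes $\mathbf{y}_k$ into a polynomial representation $p_k(z)$ in which only the $s_k$th term has a non-zero coefficient and all other coefficients are zero. The polynomial representation $p_k(z)$ can be written as
$$p_k(z) = \sum_{s=0}^{S-1}\tilde{f}_k(s)\,z^{s},$$
where $S = 2^{N_v}$ and the function $\tilde{f}_k(s)$ denotes the coefficient of the sth term, defined by
$$\tilde{f}_k(s) = \begin{cases}1, & s = s_k,\\ 0, & \text{otherwise}.\end{cases}$$
The corresponding Boolean function f(x, y) can be expressed as
$$f(\mathbf{x},\mathbf{y}) = \prod_{w=1}^{N_v}\left(x_w y_w + \bar{x}_w\bar{y}_w\right),$$
i.e., $\tilde{f}_k(s) = \tilde{f}(s, s_k)$ equals 1 if and only if $e^{-1}(s)$ coincides with $\mathbf{y}_k$.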
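The encoder steps can be sketched as follows, assuming the +1 → 1, −1 → 0 bit mapping and MSB-first indexing described above. The sketch pads and groups the votes, computes s_k, returns the one-hot coefficients of p_k(z), and implements the Boolean equality function f(x, y); all names are hypothetical.

```python
def group_votes(votes, n_v):
    """Split Q votes into G = ceil(Q / N_v) groups, padding the last group with -1 votes."""
    padded = list(votes) + [-1] * (-len(votes) % n_v)
    return [padded[g:g + n_v] for g in range(0, len(padded), n_v)]

def encode_group(votes):
    """Map a group of votes in {-1, +1} to s_k = e(y_k) and the coefficients of p_k(z)."""
    s_k = 0
    for v in votes:
        s_k = 2 * s_k + (v + 1) // 2          # +1 -> bit 1, -1 -> bit 0, MSB first
    S = 2 ** len(votes)
    coeffs = [1 if s == s_k else 0 for s in range(S)]   # only the s_k-th term is non-zero
    return s_k, coeffs

def f_bool(x, y):
    """f(x, y) = prod_w (x_w y_w + NOT x_w NOT y_w): equals 1 iff the two bit sequences match."""
    out = 1
    for xw, yw in zip(x, y):
        out *= xw * yw + (1 - xw) * (1 - yw)
    return out
```

For instance, the group (+1, −1, +1) becomes the bits 101, so s_k = 5 and only the coefficient of z⁵ among the S = 8 terms is one.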

2) MODULATION
Let $N_s$ be the number of CSC symbols used in the transmission of the Q votes, where the symbols are indexed by $m \in \mathbb{Z}_{N_s}$. Let B denote the number of active CSCs per symbol and assume that M is an integer multiple of B. We then partition the M CSC indices within one CSC symbol into B blocks of $M_b = M/B$ indices, indexed by $b \in \mathbb{Z}_B$, and each block is further divided into S subblocks. The gth group of votes is assigned to the $b_g$th block of the $m_g$th CSC symbol, i.e., $(m_g, b_g) = \mathcal{M}(g)$, using a predefined mapping function $\mathcal{M}$ with $m_g \in \mathbb{Z}_{N_s}$ and $b_g \in \mathbb{Z}_B$. The decimal value $s_k$ indicates the activated subblock index. Therefore, as (11) implies, only the $s_k$th of the S available subblocks is activated.
If the bin corresponding to a CSC index carries a symbol, the CSC is designated as active; otherwise, it is inactive. After the $s_k$th subblock is activated, only the first CSC of that subblock carries a random PSK symbol, and the remaining $M_g = M_b/S - 1$ CSCs of the subblock are deactivated. Deactivating the CSCs corresponding to the adjacent $M_g$ indices provides a guard period of $T_g = M_g T_{\mathrm{chirp}}/M$ between the activated CSCs in the time domain. The guard period protects against interference due to the delay spread of the channel and imperfect time synchronization. To ensure negligible interference, we choose $M_g$ such that the condition in (14) is maintained. Let the vector $\mathbf{d}_k^{(m,b)} \in \mathbb{C}^{M_b}$ denote the symbols on the bth block of the mth CSC symbol of the kth ED. We set the symbols as
$$d_k^{(m_g,b_g)}[s(1+M_g)+j] = \begin{cases}\tilde{f}_k(s)\,p_{k,g}, & j = 0,\\ 0, & j = 1,\ldots,M_g,\end{cases}\qquad s\in\mathbb{Z}_S,$$
where $p_{k,g}$ is a random symbol on the unit circle.
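The block layout can be illustrated with a short sketch, assuming that each subblock occupies 1 + M_g consecutive bins and that its first bin carries the symbol, so the active bin of subblock s_k sits at index s_k(1 + M_g). The helper names are hypothetical.

```python
import numpy as np

def modulate_block(s_k, S, m_g, rng):
    """One block of S * (1 + M_g) CSC bins: only the first bin of subblock s_k carries a random
    unit-modulus symbol p_{k,g}; the following M_g bins stay empty and act as a guard."""
    block = np.zeros(S * (1 + m_g), dtype=complex)
    block[s_k * (1 + m_g)] = np.exp(2j * np.pi * rng.random())
    return block

def guard_time(m_g, M, t_chirp):
    """Guard period T_g = M_g * T_chirp / M between activated CSCs."""
    return m_g * t_chirp / M
```

With S = 8 subblocks and M_g = 4, the block spans 40 bins and the group encoded as s_k = 5 activates bin 25 only. The guard period grows linearly with M_g, which is the quantity constrained by (14).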

B. RECEIVERS
1) DEMODULATION
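We assume that the ES knows the mapping function $\mathcal{M}$, so the corresponding symbol and block indices $(m_g, b_g)$ are known. Note that, after demodulation, the condition in (14) ensures that each demodulated symbol is spread across at most $M_g$ indices. Hence, for all s, the energy of the superposition of the spread symbols in the sth subblock can be calculated as
$$\tilde{g}_{\mathrm{sup}}(s) = \sum_{j=0}^{M_g}\left|\tilde{d}^{(m_g,b_g)}[s(1+M_g)+j]\right|^2,$$
where $\tilde{d}^{(m_g,b_g)}$ denotes the received symbols on the $b_g$th block of $\tilde{\mathbf{d}}_{m_g}$.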

2) DECODER
We consider block-fading channels, i.e., the channel coefficient of each ED is fixed over a symbol and changes randomly from one symbol to the next. The polynomial representation of the gth group of votes at the receiver can be expressed as
$$p_{\mathrm{sup}}(z) = \sum_{s=0}^{S-1}\left(\sum_{k=1}^{K}h_k\,p_{k,g}\,\tilde{f}_k(s) + n_s\right)z^{s},$$
where $h_k$ is the block channel coefficient of the kth ED and $n_s$ is the noise term for the sth subblock.
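Without loss of generality, we focus on detecting the ith vote and assume that it belongs to the gth group of votes. We also assume that $K_w^{+}$ EDs vote with 1 and $K_w^{-}$ EDs vote with 0 for the ith vote, where w refers to the position of the ith vote in the gth group. By the definition of $\mathbf{v}_k$ in Section III-A1, we obtain $w = i - (g-1)N_v$. Let $\mathcal{S}_w^{+}$ and $\mathcal{S}_w^{-}$ denote the sets of subblock indices s whose binary representation $\mathbf{x} = e^{-1}(s)$ satisfies $x_w = 1$ and $x_w = 0$, respectively. Our proposed decoder compares the energies of the subblocks specified for $x_w = 0$ with those specified for $x_w = 1$ to determine the decision metric of the ith vote as
$$\tilde{\Delta}_i = \sum_{s\in\mathcal{S}_w^{+}}\tilde{g}_{\mathrm{sup}}(s) - \sum_{s\in\mathcal{S}_w^{-}}\tilde{g}_{\mathrm{sup}}(s).$$
Following the estimation of $\tilde{\Delta}_i$, we use the definition of MV in (2) to calculate the ith MV as
$$\tilde{v}_i = \mathrm{sign}(\tilde{\Delta}_i). \tag{18}$$

3) PERFORMANCE ANALYSIS
a) Average performance: The sum of the subblock energies over all combinations with $x_w = 0$ can be written as in (19). We consider that the channel block coefficients follow a Rayleigh distribution with $\mathbb{E}[h_k h_k^{*}] = 1$, ∀k, and that the noise $n_s$ is AWGN with variance $\sigma_n^2$. By using (12), the expected value of the left-hand side of (19) can be expressed in closed form, and the corresponding sum for $x_w = 1$ follows similarly. Using (19) and (21), we obtain (22), which shows that $\tilde{\Delta}_i$ is an unbiased estimate of $\Delta_i$; hence, the scheme detects the correct MV on average.
b) Probability of correct MV detection: We define the probability of correct detection as the probability that $\tilde{v}_i$ equals the ith MV defined in (2). For a given set $\{\mathbf{v}_1, \ldots, \mathbf{v}_K\}$, $g_{\mathrm{sup}}(\mathbf{x}) \triangleq \tilde{g}_{\mathrm{sup}}(s)$ is a random variable following an exponential distribution with the rate $\lambda_s \triangleq \sum_{k=1}^{K} M_b\tilde{f}_k(s) + \sigma_n^2$.
Theorem 1: Let $\lambda_u \neq \lambda_v$ for $u \neq v$. The probability of correctly detecting the MV of the wth vote can then be obtained in closed form, where $l_u$ is the uth Lagrange basis polynomial associated with the value $\lambda_u$, ∀u ∈ $\mathcal{S}_w^{+}$ [23], defined by
$$l_u(x) = \prod_{u'\in\mathcal{S}_w^{+},\,u'\neq u}\frac{x-\lambda_{u'}}{\lambda_u-\lambda_{u'}}. \tag{25}$$
Similarly, $l_v$ is the Lagrange basis polynomial associated with the value $\lambda_v$, ∀v ∈ $\mathcal{S}_w^{-}$, and is defined analogously to (25). The proof is given in Appendix A.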
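The decoder and its unbiasedness can be checked with a small Monte Carlo sketch. It uses a simplified receiver model (an assumption, not the full DFT-s-OFDM chain): each ED adds h_k·p_{k,g} with h_k ~ CN(0, 1) to a single coefficient of its subblock s_k, AWGN is added to every subblock, and spreading over the guard bins is not modeled. Averaged over channel realizations, the decision metric for the wth vote matches the vote difference. The helper names are hypothetical.

```python
import numpy as np

def votes_to_index(votes):
    """s = e(y) with y_w = (v_w + 1) / 2, most significant bit first."""
    s = 0
    for v in votes:
        s = 2 * s + (int(v) + 1) // 2
    return s

def subblock_energies(all_votes, rng, noise_var=0.0):
    """Non-coherent energies of S = 2^N_v subblocks under a simplified block-fading model."""
    n_v = all_votes.shape[1]
    S = 2 ** n_v
    y = np.sqrt(noise_var / 2) * (rng.standard_normal(S) + 1j * rng.standard_normal(S))
    for votes in all_votes:
        h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)  # E|h|^2 = 1
        p = np.exp(2j * np.pi * rng.random())                                  # random PSK-like symbol
        y[votes_to_index(votes)] += h * p
    return np.abs(y) ** 2

def decision_metric(energies, w, n_v):
    """Energy over subblocks with x_w = 1 minus energy over subblocks with x_w = 0 (w is 1-based)."""
    bit = n_v - w
    s = np.arange(len(energies))
    mask = ((s >> bit) & 1).astype(bool)
    return energies[mask].sum() - energies[~mask].sum()
```

The sign of `decision_metric` gives the detected MV in (18); averaging it over many calls of `subblock_energies` illustrates the unbiasedness argument, while the spread of single realizations is what Theorem 1 quantifies.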

4) EFFECTIVE SNR
From (22), the expected received signal-to-noise ratio (SNR) when all the EDs transmit together can be obtained, and the resulting expression contains a factor that acts as a gain on the effective SNR.

C. COMPARISONS WITH STATE-OF-THE-ART SCHEMES
The schemes in [3], [4], [8] experience high PMEPR because they are based on OFDM transmission. The key benefit of our proposed approach over those in [3], [4], [24] is that CSC-MV achieves low PMEPR, leading to wider cell coverage. CSC-MV also does not rely on CSI, and its non-coherent detection helps eliminate synchronization issues. On the other hand, CSC-MV offers a trade-off between PMEPR and resource consumption. The PMEPR increases with the number of CSCs [25]; hence, for a transmission with B CSCs, the PMEPR is 10 log10 B dB. However, the number of symbols needed per training round is proportional to Q/B. As a result, reducing resource utilization requires relaxing the PMEPR limit.
Our proposed method updates the encoder in [1] and uses the available spectrum more efficiently; therefore, it requires fewer resources. For the proposed mapper, each CSC symbol can carry $B\left\lfloor\log_2\frac{M}{B(1+M_g)}\right\rfloor$ votes, whereas the encoder in [1] carries B votes. Therefore, by offering an option to select $M_g$, our proposed encoder delivers a gain factor of $\left\lfloor\log_2\frac{M}{B(1+M_g)}\right\rfloor$. Since this gain decreases as $M_g$ increases, the optimum $M_g$ is the smallest value that satisfies (14).
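The votes-per-symbol comparison can be reproduced with integer arithmetic. This sketch assumes that only a power-of-two number of subblocks is usable, hence the floor of the logarithm; the function names are hypothetical.

```python
def votes_per_symbol_proposed(M, B, m_g):
    """B * floor(log2(M / (B * (1 + M_g)))) votes per CSC symbol for the proposed encoder."""
    subblocks = (M // B) // (1 + m_g)        # equals floor(M / (B * (1 + M_g)))
    n_v = max(subblocks.bit_length() - 1, 0) # floor(log2(subblocks)), votes per group
    return B * n_v

def votes_per_symbol_old(B):
    """The encoder in [1] carries one vote per active CSC."""
    return B
```

With M = 240 and M_g = 4, B = 2 gives 8 votes per symbol against 2 for the encoder in [1] (a gain factor of 4), and from B = 13 onward both encoders carry B votes.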

A. DISTRIBUTED LEARNING
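Let $\mathcal{D}$ be the dataset containing all labeled data samples, and let the vectors x and y be a data sample and its associated label, respectively, for $(\mathbf{x}, \mathbf{y}) \in \mathcal{D}$. Let $\mathcal{D}_k$ denote the local dataset of the kth ED, k = 1, ..., K, such that $\mathcal{D} = \bigcup_{k=1}^{K}\mathcal{D}_k$. The centralized loss function can be expressed as
$$F(\mathbf{w}) = \frac{1}{|\mathcal{D}|}\sum_{(\mathbf{x},\mathbf{y})\in\mathcal{D}} f(\mathbf{w},\mathbf{x},\mathbf{y}), \tag{27}$$
where $\mathbf{w} = [w_1, \ldots, w_Q]^{\mathrm T} \in \mathbb{R}^Q$ is the parameter vector and $f(\mathbf{w}, \mathbf{x}, \mathbf{y})$ is the sample loss function that measures the labeling error for (x, y). In distributed learning, the goal is to minimize the loss function in (27) without uploading the dataset to a centralized server. Let $\tilde{\mathbf{g}}_k^{(n)}$ be the local stochastic gradient vector given by
$$\tilde{\mathbf{g}}_k^{(n)} = \frac{1}{n_b}\sum_{(\mathbf{x},\mathbf{y})\in\tilde{\mathcal{D}}_k}\nabla f(\mathbf{w}^{(n)},\mathbf{x},\mathbf{y}),$$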

VOLUME 4, 2023
where $\tilde{\mathcal{D}}_k \subset \mathcal{D}_k$ is the set of selected data samples, $n_b = |\tilde{\mathcal{D}}_k|$ is the batch size, and $\mathbf{w}^{(n)}$ is the parameter vector at the nth communication round. The ith element of $\tilde{\mathbf{g}}_k^{(n)}$, denoted by $\tilde{g}_{k,i}^{(n)}$, is the local gradient for parameter $w_i$ at the kth ED. Instead of communicating the true values of $\tilde{g}_{k,i}^{(n)}$, the signs of the gradients, $\bar{g}_{k,i}^{(n)} = \mathrm{sign}(\tilde{g}_{k,i}^{(n)})$, are used to reduce the communication cost [7]. Provided that the local votes are available to the ES, the MV for the ith gradient at the ES is
$$v_i^{\mathrm{MV}} = \mathrm{sign}\left(\sum_{k=1}^{K}\bar{g}_{k,i}^{(n)}\right). \tag{29}$$
After calculating the MVs for all i, the ES sends the MV vector $\mathbf{v}^{\mathrm{MV}} = [v_1^{\mathrm{MV}}, \ldots, v_Q^{\mathrm{MV}}]^{\mathrm T}$ to the EDs. The updated parameters at the nth communication round can be expressed as
$$\mathbf{w}^{(n+1)} = \mathbf{w}^{(n)} - \eta\,\mathbf{v}^{\mathrm{MV}},$$
where η is the learning rate. For FEEL, we follow the same procedure outlined above; however, we calculate the MV in (29) with an OAC scheme that relies on the non-coherent detection of CSCs.
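The signSGD-with-MV loop can be illustrated on a toy problem. This is a minimal sketch under stated assumptions: the loss is the quadratic F(w) = ½‖w − w*‖², each ED's stochastic gradient is the true gradient plus Gaussian noise, and the MV is computed ideally (error-free) rather than with the CSC-MV detector. The function name is hypothetical.

```python
import numpy as np

def sign_sgd_mv(w_star, K, rounds, eta, noise_std, rng):
    """Toy signSGD with majority vote on F(w) = 0.5 * ||w - w_star||^2 with ideal MV aggregation."""
    w = np.zeros_like(w_star)
    for _ in range(rounds):
        grads = (w - w_star) + noise_std * rng.standard_normal((K, w.size))  # local stochastic gradients
        signs = np.sign(grads)                                               # one-bit local votes
        delta = signs.sum(axis=0)                                            # vote differences
        mv = np.where(delta != 0, np.sign(delta), rng.choice([-1.0, 1.0], size=w.size))
        w = w - eta * mv                                                     # update with the MV vector
    return w
```

Replacing the ideal `np.sign(delta)` with a noisy, non-coherent estimate of the vote difference is where CSC-MV enters; the convergence analysis below accounts for that probabilistic detection.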
The MV computed with (18) matches the original MV in (29) only probabilistically because of the non-coherent detection. Nevertheless, for a non-convex loss function F(w), we can show that CSC-MV still maintains the convergence of the original MV in [7] under the following assumptions:
Assumption 1 (Bounded Loss Function): For all w, there exists $F^*$ such that $F(\mathbf{w}) \ge F^*$.
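Assumption 2 (L-Smooth Gradient [26]): Let g be the gradient of F(w) for some w. For all w and w′, the expression
$$\left|F(\mathbf{w}') - \left[F(\mathbf{w}) + \mathbf{g}^{\mathrm T}(\mathbf{w}'-\mathbf{w})\right]\right| \le \frac{1}{2}\sum_{i=1}^{Q}L_i(w_i'-w_i)^2$$
holds for some non-negative constants $L_1, \ldots, L_Q$.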

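Assumption 3 (Bounded Variance): The local estimates of the stochastic gradient, $\{\tilde{\mathbf{g}}_k = [\tilde{g}_{k,1}, \ldots, \tilde{g}_{k,Q}]^{\mathrm T} = \nabla F_k(\mathbf{w}^{(n)})\}$, ∀k, are independent and unbiased estimates of $\mathbf{g} = [g_1, \ldots, g_Q]^{\mathrm T} = \nabla F(\mathbf{w})$ with coordinate-wise bounded variance, i.e., $\mathbb{E}[\tilde{\mathbf{g}}_k] = \mathbf{g}$, ∀k, and $\mathbb{E}[(\tilde{g}_{k,i} - g_i)^2] \le \sigma_i^2$, ∀k, i, for some non-negative constants $\sigma_1, \ldots, \sigma_Q$.
Assumption 4 (Unimodal, Symmetric Gradient Noise): For all k and any given w, each element of the vector $\tilde{\mathbf{g}}_k$ follows a unimodal distribution that is symmetric around its mean.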
We also assume that the CSCs are orthogonal to each other. This assumption is mild because the interference among CSCs remains negligible when they are sufficiently separated in time. The orthogonality of the CSCs and the guard period $T_g$ between the activated CSCs ensure energy accumulation over orthogonal resources. Hence, based on the aforementioned assumptions and following steps similar to those in [9] or [10], the following theorem can be derived:
Theorem 2: For $n_b = N/\gamma$ and $\eta = 1/\sqrt{\|\mathbf{L}\|_1 n_b}$ with $\mathbf{L} = [L_1, \ldots, L_Q]^{\mathrm T}$, the convergence rate of our proposed FEEL in fading channels can be expressed as …, where a = (1 …. The proof of Theorem 2 is provided in Appendix B.

B. DISTRIBUTED LOCALIZATION
In this section, we use our proposed OAC scheme to implement the localization strategy specified in [17]. Consider an L × L square area with K EDs and an ES, where L is the side length in meters along the x-axis and y-axis. Each ED and the ES are equipped with a single antenna. We also assume that each ED knows its location, e.g., through the Global Positioning System (GPS) or a fixed configuration. The EDs can also measure their distance from the ES, e.g., by using methods based on the received signal strength indicator (RSSI), time of arrival (ToA), or time difference of arrival (TDoA) [27]. We assume that the ES is mobile and aims to determine its location from the feedback signals of the EDs without accessing the local information at the EDs.
In this study, we consider the voting-based localization discussed in [17]. In this approach, the area is first divided into $N_g \times N_g$ cells. Each ED takes part in a voting process at the ES and votes for the candidate cells in which the ES may be located. To describe the voting process rigorously, let $\mathcal{A}(a, b, N_g)$ denote an arithmetic sequence of $N_g$ elements with the first element a and the last element b. Since each ED knows its position and its distance from the target, it can determine whether a cell is a candidate for the ES's location. Each cell is assigned a vote of 1 if it is a candidate cell and −1 otherwise. Let $v^k_{n_x,n_y}$ be the vote of the kth ED for the cell with indices $n_x \in \{0, \ldots, N_g - 1\}$ and $n_y \in \{0, \ldots, N_g - 1\}$ on the x-axis and y-axis, respectively. Also, let i be the linear cell index that assigns a unique integer to each cell as $i = n_x + N_g n_y$, $i \in \{0, \ldots, N_g^2 - 1\}$. We can then equivalently express the vote of the kth ED for the ith cell as $v_{k,i} = v^k_{n_x,n_y}$. The accumulation of votes is shown in FIGURE 2. The number in each cell indicates the total number of EDs voting for that cell (i.e., $\sum_{k=1}^{K}(v_{k,i}+1)/2$ for the ith cell); no ED votes for the empty cells. At the receiver side, the cell(s) with the most votes are the most likely location of the ES. Since the number of EDs voting for the ith cell equals $(K + \Delta_i)/2$, the cell with the most votes is also the cell with the largest difference $\Delta_i$ between the numbers of EDs voting for +1 and −1. Hence, after estimating $\tilde{\Delta}_i$ for all i, the cell index with the highest vote difference can be obtained as
$$i_{\max} = \arg\max_{i}\ \tilde{\Delta}_i. \tag{32}$$
The target is then assumed to be at the center of the $i_{\max}$th cell.
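The voting-based localization can be sketched end to end with ideal (error-free) aggregation. The candidate rule is a hypothetical choice for illustration, since the exact rule of [17] is not restated here: an ED votes +1 for a cell if the distance from the cell center to the ED differs from its measured ED-ES distance by at most half a cell diagonal. The vote index follows i = n_x + N_g·n_y, and the estimate is the center of the cell with the largest vote difference, as in (32).

```python
import numpy as np

def cell_centers(L, n_g):
    """Cell-center coordinates along one axis: arithmetic sequence from L/(2N_g) to L - L/(2N_g)."""
    return np.linspace(L / (2 * n_g), L - L / (2 * n_g), n_g)

def local_votes(ed_pos, dist, L, n_g):
    """Votes in {-1, +1} of one ED, flattened so that i = n_x + N_g * n_y (hypothetical candidate rule)."""
    c = cell_centers(L, n_g)
    cx, cy = np.meshgrid(c, c, indexing="xy")        # cx[n_y, n_x], cy[n_y, n_x]
    tol = np.sqrt(2) * L / (2 * n_g)                 # half of the cell diagonal
    d_cell = np.hypot(cx - ed_pos[0], cy - ed_pos[1])
    return np.where(np.abs(d_cell - dist) <= tol, 1, -1).ravel()

def localize(ed_positions, es_pos, L, n_g):
    """Ideal aggregation: sum the votes over EDs, pick the cell with the largest difference."""
    es = np.asarray(es_pos, dtype=float)
    votes = np.array([local_votes(p, np.hypot(*(es - p)), L, n_g) for p in ed_positions])
    delta = votes.sum(axis=0)
    i_max = int(np.argmax(delta))
    c = cell_centers(L, n_g)
    return (c[i_max % n_g], c[i_max // n_g]), delta
```

In the OAC version, `delta` is replaced by the non-coherent estimates, so the arg max can occasionally pick a wrong cell; this motivates the enhancements below.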

1) ENHANCEMENTS
In this section, we propose several methods that address the probabilistic nature of the proposed scheme to enhance localization performance.
a) Multiple repetitions: To increase the probability of correct detection in (32), one strategy we consider is to transmit the votes repeatedly and obtain the cell location through a median operation. For a total of $N_r$ repetitions, (32) can be expressed for the $n_r$th repetition as
$$\tilde{i}^{(n_r)}_{\max} = \arg\max_i\ \tilde{\Delta}^{(n_r)}_i, \tag{33}$$
where $\tilde{\Delta}^{(n_r)}_i$ is the decision metric for the ith cell in the $n_r$th repetition. Let $(X^{n_r}_{\mathrm{ES}}, Y^{n_r}_{\mathrm{ES}})$ denote the center of the cell selected in the $n_r$th repetition. The coordinates of the ES are then estimated as $X_{\mathrm{ES}} = \mathrm{median}(X^{1}_{\mathrm{ES}}, \ldots, X^{N_r}_{\mathrm{ES}})$ and $Y_{\mathrm{ES}} = \mathrm{median}(Y^{1}_{\mathrm{ES}}, \ldots, Y^{N_r}_{\mathrm{ES}})$. The median is useful for eliminating outliers that occur in fading channels.
b) Iterative method: The resolution of the localization is determined by the selected grid size $N_g$ and the length L. A larger $N_g$ yields smaller cells, i.e., a finer resolution, but is computationally more demanding. We therefore use an iterative refinement procedure in which a small grid size $N_g$ is chosen. After the first iteration, the ES estimates its location and broadcasts the coordinates to all EDs. In the next iteration, the EDs focus on a smaller area centered at the estimated location. Let the zoomed region in each iteration be equal to the area of $N_z \times N_z$ cells. For the $n_i$th iteration, the side length of the area is then updated as $L_{n_i} = L_{n_i-1} \times N_z/N_g$, and the EDs update the axes $X_{\mathrm{ax}}$ and $Y_{\mathrm{ax}}$ so that the new grid is centered at the estimated location. The EDs then continue the voting process with the new grid configuration.
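The two enhancements can be sketched with small helpers: a coordinate-wise median over the per-repetition estimates, the side-length schedule L_{n_i} = L_{n_i−1}·N_z/N_g, and a re-centered axis. The axis update is a hypothetical choice (N_g cell centers in a window of side L_{n_i} centered at the previous estimate, with clipping to the area ignored), since the exact update equations are not restated here.

```python
import numpy as np

def median_estimate(estimates):
    """Coordinate-wise median of the N_r per-repetition cell centers (X^{n_r}, Y^{n_r})."""
    pts = np.asarray(estimates, dtype=float)
    return float(np.median(pts[:, 0])), float(np.median(pts[:, 1]))

def zoom_schedule(L, n_g, n_z, n_i):
    """Side lengths L_{n_i} = L_{n_i - 1} * N_z / N_g and the resulting cell sizes per iteration."""
    lengths = [L * (n_z / n_g) ** it for it in range(n_i)]
    return lengths, [length / n_g for length in lengths]

def zoomed_axis(center, length, n_g):
    """Hypothetical axis update: N_g cell centers spanning a window of side `length` at `center`."""
    step = length / n_g
    return np.linspace(center - length / 2 + step / 2, center + length / 2 - step / 2, n_g)
```

For example, three repetitions returning (35, 65), (35, 65), and an outlier (85, 5) yield the median estimate (35, 65), and with L = 100 m, N_g = 10, and N_z = 3, three iterations shrink the cell size from 10 m to 3 m and then to 0.9 m.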

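Algorithm 1 Algorithm for Localization
Initialize the grid length L and the grid (X_ax, Y_ax)
for n_i = 1 : N_i do
    for n_r = 1 : N_r do
        for k = 1 : K do
            if n_i = 1 then
                Use the initialized parameters
            else if n_i > 1 then
                Update L_{n_i} = L_{n_i−1} × N_z/N_g and center (X_ax, Y_ax) at (X_ES, Y_ES)
            end if
            Compute the votes v_{k,i} for all cells and transmit them with CSC-MV
        end for
        ES computes the cell index of the n_r-th repetition with (33)
    end for
    ES computes (X_ES, Y_ES) as the median over the N_r repetitions and broadcasts it to all EDs
end for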
Note that the iterative method can also be combined with multiple repetitions, in which case the transmission time increases by a factor of $N_iN_r$. However, all aggregations occur over the air. Therefore, multiple iterations and repetitions do not increase the complexity of the location estimation at the ES, as (32) and (33) have the same complexity.
Note that our scheme incurs an inaccuracy owing to grid quantization. If the ES is assumed to be randomly located in the area, a bound on the RMSE based on the grid quantization can be obtained, where e denotes the localization error in meters.
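A quantization-limited RMSE can be worked out under a simple assumption: the ES is uniformly distributed within the area, and the estimate is the center of the correct cell of side c = L/N_g. Each coordinate error is then uniform on [−c/2, c/2] with variance c²/12, so the RMSE is √(2c²/12) = c/√6. The sketch below computes this value and checks it by Monte Carlo; it illustrates the quantization effect and is not necessarily the exact form of the bound used in this work.

```python
import numpy as np

def quantization_rmse(cell_size):
    """RMSE when the true position is uniform in a square cell and the estimate is its center."""
    return cell_size / np.sqrt(6)

def monte_carlo_rmse(L, n_g, trials, rng):
    """Empirical RMSE of snapping uniformly drawn positions in [0, L)^2 to their cell centers."""
    c = L / n_g
    pos = rng.uniform(0, L, size=(trials, 2))
    est = (np.floor(pos / c) + 0.5) * c          # center of the cell that contains each point
    return np.sqrt(np.mean(np.sum((pos - est) ** 2, axis=1)))
```

For L = 100 m and N_g = 10, the quantization-limited RMSE is about 4.08 m; with iterative refinement, the relevant cell size becomes L_{N_i}/N_g, which lowers this floor accordingly.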

2) COMPUTATIONAL COMPLEXITY
In Algorithm 1, the computational complexity per ED is dominated by the construction of the $\mathbf{v}_k$ sequence from $N_g^2$ votes, which costs $\mathcal{O}(N_g^2)$. For OAC, the calculations at the EDs can be regarded as parallel computations. As a result, the computational cost of Algorithm 1 is $\mathcal{O}(N_iN_rN_g^2)$. In comparison, if all EDs transmit their location and distance information without OAC and the ES estimates the location by applying a least squares estimator (LSE)-based multilateration technique, the computational complexity is $\mathcal{O}(K^3)$ [28]. Therefore, with OAC, the computational complexity is independent of K.

A. COMMUNICATION PERFORMANCE
We consider $M = 240$ subcarriers, an IDFT size of $N = 256$, a cyclic prefix size of $N_{\mathrm{cp}} = 64$, and a sampling rate of $f_{\mathrm{s}} = 30.72$ Msps. The ITU Extended Pedestrian A (EPA) model is considered for the fading channel with no mobility within each round, and the channel variation is captured by regenerating the channel at each communication round. The path loss exponent is $\alpha = 4$. We assume perfect power control within the coverage range (i.e., $r_k < r_{\mathrm{P}}$), and $\mathrm{OBO}_{\mathrm{ref}}$ is set to 30 dB for $r_{\mathrm{ref}} = r_{\min} = 10$ m. We consider the Rapp model for the PA at the EDs with a saturation amplitude of 1 and a smoothness factor of 3. For CSC-MV, we use $N_{\mathrm{arr}} = 1$ and $N_{\mathrm{err}} = 1$ for time synchronization errors and set $M_{\mathrm{g}} = 4$ to maintain the condition in (14). For a fair comparison, we consider perfect time synchronization for OBDA, as OBDA does not function properly with imperfect synchronization [10].
FIGURE 3 depicts the relation between the number of CSCs $B$ and the number of votes per OFDM symbol $M_{\mathrm{v}}$ for CSC-MV in the current configuration, with both the proposed encoder and the encoder in [1]. The figure clearly shows that our proposed mapper allows for more votes than the encoder in [1]. For $B = 2$, the number of votes per OFDM symbol $M_{\mathrm{v}}$ is 8 and 2 for the proposed encoder and the encoder in [1], respectively. This implies that the proposed encoder is four times faster than the one in [1] for $B = 2$. The gain factor decreases as $B$ increases, and for $B \geq 13$, both encoders carry the same number of votes.
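To make the speed-up concrete, the following arithmetic sketch converts votes per OFDM symbol into airtime per round. It rests on assumptions not stated in this section: each of the $Q = 123090$ parameters of the CNN used in the learning experiments contributes exactly one vote, a round needs $\lceil Q/M_{\mathrm{v}} \rceil$ OFDM symbols, each symbol lasts $(N + N_{\mathrm{cp}})/f_{\mathrm{s}}$ with the simulation parameters above, and no other overhead is counted. For $B = 2$, we use $M_{\mathrm{v}} = 8$ for the proposed encoder and $M_{\mathrm{v}} = 2$ for the encoder in [1], consistent with the fourfold speed-up; `round_airtime` is a hypothetical helper.

```python
import math

Q = 123_090                      # learnable CNN parameters, one vote each (assumption)
N, N_CP, F_S = 256, 64, 30.72e6  # IDFT size, CP length, sampling rate (Hz)
T_SYM = (N + N_CP) / F_S         # OFDM symbol duration incl. CP, about 10.42 us

def round_airtime(m_v):
    """Hypothetical helper: (#OFDM symbols, seconds) to send Q votes at m_v votes/symbol."""
    n_sym = math.ceil(Q / m_v)
    return n_sym, n_sym * T_SYM

new_sym, new_t = round_airtime(8)   # proposed encoder, B = 2
old_sym, old_t = round_airtime(2)   # encoder in [1], B = 2
print(new_sym, old_sym, old_t / new_t)  # 15387 61545 and a ratio just under 4
```

Under these assumptions, one round takes about 0.16 s with the proposed encoder versus about 0.64 s with the encoder in [1].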
In FIGURE 4(a), the PMEPR distributions of the proposed scheme and OBDA are given. OBDA causes a substantially higher PMEPR, as the signs of the gradients may add constructively in the time domain. The CSCs, on the other hand, result in a low PMEPR, as shown in Section III-C. When $B$ is 1, 2, or 6, the PMEPR is approximately 2 dB, 3 dB, and 8 dB, respectively, whereas, theoretically, it should be 0 dB, 2 dB, and 6 dB, respectively. The synthesized signal is distorted due to the abrupt frequency change of the linear CSCs within a symbol duration [20]. As a result, the observed PMEPR is larger than the theoretical bound. However, the PMEPR can be improved by choosing a very small D or by modifying the FDSS vector f [20]. The cubic metric (CM) distributions are shown in FIGURE 4(b). The CM of a time-domain signal $x(t)$ is calculated as $\mathrm{CM}(x(t)) = (\mathrm{RCM}_{x(t)} - \mathrm{RCM}_{\mathrm{ref}})/K_s$, where the empirical slope factor $K_s$ is set to 1.52 for OFDM systems and the raw cubic metric is defined as $\mathrm{RCM}_{x(t)} = 20\log_{10}\left(\mathrm{rms}\left[x_{\mathrm{norm}}^3(t)\right]\right)$, with $x_{\mathrm{norm}}(t)$ being the signal normalized to unit rms. For the reference signal, we set $\mathrm{RCM}_{\mathrm{ref}}$ to 1.52 dB [29]. For $B = 1$, FIGURE 4(b) demonstrates that CSC-MV achieves an even lower CM than the reference signal. Similar to the PMEPR results, CSC-MV outperforms OBDA for $B \in \{1, 2, 6\}$. FIGURE 5 shows the ACLR versus OBO for CSC-MV and OBDA. For both schemes, we apply time-domain windowing with a raised-cosine window to minimize spectral leakage. We define the ACLR as the ratio of the power received outside the allocated frequency band of the channel to the power received within the assigned channel bandwidth. The plots show that, under the same ACLR constraint, the power amplifier must operate at a larger OBO for OBDA than for CSC-MV. Moreover, the lowest ACLRs that OBDA and CSC-MV can achieve are −23 dB and −28.22 dB, respectively. If we consider an ACLR constraint of −22 dB, we calculate $\mathrm{OBO}^{\mathrm{obda}}_{\min} = 10.5$ dB,
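The three waveform metrics used in this comparison can be computed from sampled baseband signals as in the following sketch. It is not the evaluation code of this work: the PMEPR is computed from the discrete samples only (no oversampling), the cubic metric uses the standard raw cubic metric $20\log_{10}(\mathrm{rms}[|x_{\mathrm{norm}}|^3])$ with $K_s = 1.52$ and $\mathrm{RCM}_{\mathrm{ref}} = 1.52$ dB as stated in the text, and the ACLR is taken over FFT bins with a caller-supplied in-band mask; the function names are hypothetical.

```python
import numpy as np

def pmepr_db(x):
    """Peak-to-mean envelope power ratio of the samples x, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def cubic_metric_db(x, rcm_ref_db=1.52, k_s=1.52):
    """CM = (RCM_x - RCM_ref) / K_s with RCM_x = 20 log10(rms(|x_norm|^3))."""
    x_norm = x / np.sqrt(np.mean(np.abs(x) ** 2))                  # unit-rms signal
    rcm_db = 20 * np.log10(np.sqrt(np.mean(np.abs(x_norm) ** 6)))  # rms of the cube
    return (rcm_db - rcm_ref_db) / k_s

def aclr_db(x, in_band):
    """Out-of-band to in-band power ratio over FFT bins (in_band: boolean mask)."""
    p = np.abs(np.fft.fft(x)) ** 2
    return 10 * np.log10(p[~in_band].sum() / p[in_band].sum())

n = np.arange(256)
chirp = np.exp(1j * np.pi * n**2 / 256)        # constant-envelope linear chirp
two_tone = 1 + np.exp(2j * np.pi * n / 256)    # two equal-power subcarriers
print(pmepr_db(chirp), cubic_metric_db(chirp)) # about 0 dB and -1
print(pmepr_db(two_tone))                      # about 3.01 dB

X = np.zeros(64, dtype=complex)
X[:10] = 1.0                                   # in-band bins 0..9 (total power 10)
X[20] = 0.1                                    # leakage in one out-of-band bin (0.01)
mask = np.zeros(64, dtype=bool)
mask[:10] = True
print(aclr_db(np.fft.ifft(X), mask))           # about -30 dB
```

A constant-envelope signal has a raw cubic metric of 0 dB, so with these constants its CM is exactly −1, i.e., below the reference signal.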

B. DISTRIBUTED LEARNING PERFORMANCE
We consider the learning task of handwritten-digit recognition over a FEEL system in a circular cell with a radius of $R_{\max} = 50$ m and $K = 50$ EDs. The FEEL performance is tested under two different uplink SNRs, i.e., 0 dB and 20 dB. We compare the performance of CSC-MV with that of OBDA in this setup for both homogeneous and heterogeneous data distributions. For the homogeneous data distribution, samples of all digits are equally assigned to each ED. For the heterogeneous data distribution, the cell is divided into two equal areas with an equal number of EDs. The first area is the circle with a radius of $R_{\max}/\sqrt{2}$. The second area is the ring-shaped area enclosed by the two concentric circles with radii $R_{\max}/\sqrt{2}$ and $R_{\max}$. The EDs located in the first and the second area only have the data samples with labels {0, 1, 2, 3, 4} and {5, 6, 7, 8, 9}, respectively (see [9, Fig. 3] for an illustration). Our model is based on the convolutional neural network (CNN) given in [9], which contains $Q = 123090$ learnable parameters. are 20, 37, and 42, respectively. For OBDA, the votes of the 20 near EDs have a stronger impact than the random votes of the 30 far EDs, which are affected by imperfect power control. Hence, the training remains unaffected, and OBDA performs well for homogeneous data distributions. However, it should be noted that CSC-MV achieves the same test accuracy without requiring perfect time synchronization or perfect CSI, while also maintaining power efficiency. FIGURE 8(d) show that CSC-MV performs much better than OBDA in terms of test accuracy for heterogeneous data distributions. Although Assumption 3 does not hold for heterogeneous data distributions, CSC-MV still achieves a high test accuracy. The test accuracy results can be further understood from the loss versus link-distance performance after 750 iterations given in FIGURE 9. For OBDA, the plot shows that the 20 near EDs only have half of the available labels.
As a result, the trained The availability of all the labels allows the model to converge with high test accuracy. For the same reason, the test accuracy is high for CSC-MV (B = 2).
Observe that the test accuracies of CSC-MV with the proposed encoder and with the encoder in [1] converge in approximately the same number of rounds. However, as FIGURE 3 suggests, our proposed encoder completes each round four times and three times faster than the encoder in [1] for $B = 2$ and $B = 4$, respectively.
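The heterogeneous split described earlier in this subsection can be reproduced with the following sketch, assuming (this is not specified in the text) that the EDs of each area are dropped uniformly at random over that area with uniformly distributed angles; `place_eds` is a hypothetical helper. Because $\pi (R_{\max}/\sqrt{2})^2 = \pi R_{\max}^2/2$, the inner disk and the outer ring have equal areas, so placing $K/2$ EDs in each area gives the same ED density in both.

```python
import numpy as np

def place_eds(k, r_max, rng):
    """Hypothetical helper: k // 2 EDs uniform in the inner disk of radius
    r_max / sqrt(2) (labels 0-4), the rest uniform in the outer ring (labels 5-9)."""
    r_in = r_max / np.sqrt(2.0)
    half = k // 2
    u = rng.random(k)
    r = np.empty(k)
    r[:half] = r_in * np.sqrt(u[:half])                            # uniform over the disk
    r[half:] = np.sqrt(r_in**2 + u[half:] * (r_max**2 - r_in**2))  # uniform over the ring
    theta = rng.uniform(0.0, 2 * np.pi, size=k)
    labels = [set(range(5)) if ri < r_in else set(range(5, 10)) for ri in r]
    return r * np.cos(theta), r * np.sin(theta), labels

rng = np.random.default_rng(1)
x, y, labels = place_eds(k=50, r_max=50.0, rng=rng)
print(sum(lab == set(range(5)) for lab in labels))  # 25 EDs hold labels 0-4
```

The square root in the radius draw is what makes the positions uniform over area rather than over radius.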

C. LOCALIZATION PERFORMANCE
We assume perfect power control when evaluating the localization performance. In FIGURE 10, we provide the RMSE versus SNR performance of the localization strategy for $N_r \in \{1, 3, 5, 7\}$ and $N_i = 3$. The figures show that the RMSE decreases as the number of repetitions $N_r$ increases, because the probability of error decreases with repetitions, as stated in Section IV-B. FIGURE 10(a) and FIGURE 10(b) further demonstrate that the RMSE performance improves as the number of devices $K$ increases. As discussed in Section III-B4, this is due to the SNR gain obtained with a larger number of devices.
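Why repetitions help can be made concrete with a simple model, which is our assumption and not the analysis of Section IV-B: each repetition independently selects a wrong cell with probability $p_{\mathrm{err}}$, and all correct repetitions select the same cell. For odd $N_r$, the component-wise median then returns the correct coordinates whenever a strict majority of repetitions is correct, so the binomial tail below upper-bounds the probability that the median fails. The value $p_{\mathrm{err}} = 0.1$ is purely illustrative, and `median_failure_bound` is a hypothetical helper.

```python
from math import comb

def median_failure_bound(p_err, n_r):
    """P(more than half of n_r independent repetitions are wrong), for odd n_r."""
    if n_r % 2 == 0:
        raise ValueError("n_r must be odd")
    return sum(comb(n_r, k) * p_err**k * (1 - p_err) ** (n_r - k)
               for k in range(n_r // 2 + 1, n_r + 1))

for n_r in (1, 3, 5, 7):
    print(n_r, median_failure_bound(0.1, n_r))
```

For $p_{\mathrm{err}} = 0.1$, the bound drops from 0.1 for $N_r = 1$ to about 0.028, 0.0086, and 0.0027 for $N_r = 3$, 5, and 7.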
In FIGURE 12, the RMSE versus $N_i$ performance is compared for $K \in \{16, 64\}$ and $N_r \in \{1, 3, 5\}$ along with the quantization bound. For the given parameter set, the RMSE performance is best for $K = 64$ and $N_r = 5$, and worst for $K = 16$ and $N_r = 1$. The RMSE performance can be improved by increasing either $K$ or $N_r$. The high RMSE values for $N_r = 1$ in FIGURE 10 and FIGURE 12 are due to large errors that occur rarely. To illustrate this, FIGURE 11 shows the complementary cumulative distribution function (CCDF) of the localization error for $N_r \in \{1, 3, 5, 7\}$ and $K \in \{16, 100\}$ at SNR = 0 dB. The error is within the range of the bound with a probability of 99.99%. The RMSE, however, is dominated by the very small percentage of large errors.
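The gap between a CCDF that lies almost entirely within the bound and a comparatively large RMSE is easy to reproduce with synthetic numbers (hypothetical, not taken from FIGURE 11): 9999 errors of 1 m and a single 100 m outlier. The helpers `rmse` and `ccdf` are illustrative.

```python
import numpy as np

def rmse(err):
    return float(np.sqrt(np.mean(np.square(err))))

def ccdf(err, threshold):
    """Empirical P(error > threshold)."""
    return float(np.mean(np.asarray(err) > threshold))

errors = np.concatenate([np.full(9999, 1.0), [100.0]])  # synthetic errors in meters
print(ccdf(errors, 1.0))   # 0.0001: only 0.01% of the errors exceed 1 m
print(np.median(errors))   # 1.0
print(rmse(errors))        # about 1.414 = sqrt(1.9999), i.e., roughly 41% above 1 m
```

Because the RMSE squares each error, a single outlier in ten thousand raises it by about 41%, while the median and the CCDF barely change.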