Downlink Training Sequence Design Based on Waterfilling Solution for Low-Latency FDD Massive MIMO Communications Systems

Abstract: Future generations of wireless communications systems are expected to evolve toward allowing massive ubiquitous connectivity and achieving ultra-reliable and low-latency communications (URLLC) with extremely high data rates. Massive multiple-input multiple-output (m-MIMO) is a crucial transmission technique to fulfill the demands of high data rates in the upcoming wireless systems. However, obtaining a downlink (DL) training sequence (TS) that is feasible for fast channel estimation, i.e., meeting the low-latency communications required by future generations of wireless systems, in m-MIMO with frequency-division-duplex (FDD) when users have different channel correlations is very challenging. Therefore, a low-complexity solution for designing the DL training sequences to maximize the achievable sum rate of FDD systems with limited channel coherence time (CCT) is proposed using a waterfilling power allocation method. The sum rate is maximized by constructing the sequences from a summation of the users' covariance matrices and then applying a waterfilling power allocation method to the resulting low-complexity training sequence. The results show that the proposed TS outperforms the existing methods in the medium and high SNR regimes while reducing computational complexity. The obtained results signify the proposed TS's feasibility for practical consideration compared with the existing DL training sequence designs.


Introduction
The number of wireless devices has rapidly increased over the past ten years, creating enormous requirements for data services [1,2]. In particular, future generations of wireless systems are anticipated to progress in a direction that will enable widespread ubiquitous connectivity and achieve ultra-reliable and low-latency communications (URLLC) with very high data rates. Achieving URLLC would require advanced physical layer technologies. To this end, a massive multiple-input multiple-output (m-MIMO) system, which makes use of several base station (BS) antenna arrays, is introduced as a crucial technique for future wireless communications systems [3][4][5][6]. The m-MIMO technique has the ability to enhance the degrees of freedom of communication systems, leading to improvement in both data transmission rate and link reliability [7]. Furthermore, several advantages of an m-MIMO system can be obtained, such as allowing for a significant increase in spectral efficiency, which refers to the amount of data that can be transmitted over a given frequency band, reducing the impact of fading and interference that can increase the reliability of wireless communication, and allowing a uniform quality of service [8,9]. Finally, an m-MIMO system improves energy efficiency, because it enables the use of low-power radio-frequency (RF) components and reduces the need for complex signal processing [8].
The aforementioned advantages introduce m-MIMO systems as a key technology for future wireless communications systems [10]. However, channel state information (CSI) must be estimated accurately in order for an m-MIMO system to achieve its full potential gain.
When using time-division-duplex (TDD) transmission, the CSI needed for precoding is obtained from orthogonal training sequences (TSs) broadcast by the users in the uplink (UL) during every channel coherence time (CCT) period [3]. With channel reciprocity, no downlink (DL) TS is needed, and hence the TS overhead is proportional to the number of users K rather than the number of antennas N [4,8,11–16]. Hence, N can be made very large without affecting the system overhead [17]. This is why previous work on m-MIMO systems has focused on the TDD transmission mode. However, despite the encouraging findings for TDD transmission with m-MIMO systems, TDD-based systems with partial CSI are typically limited by transceiver hardware impairments and calibration errors in the UL/DL RF chains [14,18–21]. In addition, because the power amplifier is only activated part of the time, TDD systems often have poorer signal-to-noise ratios (SNR) than systems based on frequency-division duplex (FDD).
If the UL and DL channels use the same frequency band, which is the case for TDD operation protocol, there can be a significant amount of interference between the UL and DL signals. Separating the frequencies used for UL and DL, which is the case in FDD operation protocol, makes it easier for the system to distinguish between the signals transmitted by different users and helps in reducing the interference and hence improving system performance. Furthermore, the FDD system is a well-established duplexing technique that is widely used in many existing wireless communication systems. For example, FDD operation mode is used by approximately 85% of the present-day mobile communication networks that use long-term evolution (LTE) technology for mobile communication [22]. This makes it easier to integrate m-MIMO systems with existing infrastructure and devices. Therefore, it is essential to explore the potential of FDD operation in m-MIMO systems allowing URLLC so they operate with short CCT.
In the FDD operation mode, users need to estimate the DL channels to obtain CSI for DL precoding. This requires the BS to optimize the DL TSs and broadcast them to the users, who must feed the quantized channel estimates back to the BS so that it can precode the signals intended for the different users in the DL [23]. However, this approach is impractical for FDD m-MIMO systems with a large N, as the overhead for the DL CSI estimate scales linearly with the number of antennas, which can be much larger than the number of users [24,25]. As a result, the channel training would consume a significant amount of the available CCT, leaving little time for broadcasting data to the users [25,26]. Increasing the number of antennas at the BS therefore makes it very difficult to optimize the DL TS and estimate CSI in FDD systems with a short CCT. This has led researchers to look for viable solutions for designing a DL TS for CSI estimation in m-MIMO systems with FDD transmission mode.
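The overhead argument above can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that an unstructured TS spends one training symbol per BS antenna and that the remainder of the coherence block carries data; the function name `data_fraction` is hypothetical and not part of the paper.

```python
def data_fraction(train_len: int, coherence_len: int) -> float:
    """Fraction of a coherence block left for data after training.

    Assumption (illustrative): the block is split into `train_len`
    training symbols followed by data symbols.
    """
    if train_len >= coherence_len:
        return 0.0  # training consumes the whole block
    return 1.0 - train_len / coherence_len


# One training symbol per antenna (assumed), coherence block of 100 symbols.
for n_antennas in (16, 64, 128):
    print(n_antennas, data_fraction(train_len=n_antennas, coherence_len=100))
```

Under these assumptions, a 128-antenna array leaves no data symbols in a 100-symbol block, which is why a TS shorter than N is needed.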

Related Work
Several studies have explored the enhancement of DL TS design using the well-known minimum-mean-square-error (MMSE) filter, which takes into account both the statistics of the channel and the noise [27–30]. Considering different system models, the authors in [27–30] investigate the m-MIMO performance when all users have the same spatial correlation at the BS. Specifically, the work in [27–30] considers correlation in the time and spatial domains, designs the TS based on the spatial domain, and then uses the time domain for CSI prediction. The authors in [31–33] consider compressed sensing (CS) techniques and design the TS assuming the users have a common correlation pattern. In these studies, the TS duration needed for the DL channel estimate in the FDD transmission mode benefits from the sparsity structure. However, the users might exhibit distinct spatial correlation patterns in practice. Therefore, the TS designs proposed in [27–33] do not apply to multiuser m-MIMO when the users have different correlations.
To overcome the issue of having different channel correlations with FDD transmission mode, numerous studies have looked into the TS design using various methods. For example, the authors in [34,35] proposed a method with two-stage precoding. In this two-stage approach, the users are partitioned into different groups with common correlation users, and then each group has a different channel correlation to simplify the TS design and reduce the TS length to estimate CSI. To make the two-stage approach effective, it is important to use a sophisticated scheduling algorithm and clustering method for both user groups and individual users within those groups. Additionally, although a two-stage method might help reduce the length of the DL TS, it is still necessary to employ these scheduling and clustering algorithms. Otherwise, assuming a finite number of users to be chosen at the scheduling stage and realistic propagation conditions with heterogeneous user channels, the system performance could be significantly reduced due to the residual interference between the users in the same group and the users in different groups. Moreover, the study in [34,35] does not deal with the single-stage approach with K independent user CCMs, making it impossible to predict the FDD m-MIMO system performance in a single-stage precoding scenario. The work in [36][37][38][39][40] considers a single-stage precoding process assuming the users have different transmit correlations in FDD transmission mode. The DL TS has been optimized using extensive iterative algorithms. The optimization algorithms used have a slow rate of convergence, leading to high latency and computational complexity. In practical systems with a URLLC, the channel could exhibit a short CCT. As a result, it is necessary to obtain CSI estimation more frequently while maintaining low latency between the CSI estimation and precoding. 
Therefore, there is an essential need for a feasible TS design in FDD m-MIMO systems that can maximize the sum rate without increasing the computational complexity. Recently, the work in [41] proposed a training design that does not require iterative algorithms. However, the effect of the TS overhead is not taken into consideration in [41], so the sum rate of FDD m-MIMO systems under the practical constraint of a short CCT has not yet been investigated. Furthermore, the rate can be maximized through an efficient, low-complexity DL TS design.

Research Contributions
To date, designing a low-complexity DL TS for fast and accurate CSI estimation, i.e., meeting the low-latency communications required by the future generations of wireless systems, in m-MIMO systems with FDD transmission when users have distinct channel correlations is very challenging. This paper addresses this technical challenge by proposing a feasible solution for TS design with low complexity and limited CCT. To achieve this, the channel covariance matrices, which contain statistical information, are utilized to construct a low-complexity TS for m-MIMO systems with an FDD transmission mode. In addition, a waterfilling power allocation algorithm is proposed to optimize the DL TSs when the users have different correlations. The proposed training design is based on a non-iterative solution. The main contributions of this paper can be summarized as follows.
• This paper addresses the challenge of designing a low-complexity DL TS for m-MIMO systems in the FDD transmission mode with limited CCT when users have different channel covariance matrices.
• This paper proposes a feasible solution for CSI estimation using a low-complexity DL TS to maximize the achievable sum rate of FDD systems with limited CCT using a waterfilling power allocation method.
• This paper investigates the performance of FDD m-MIMO systems in terms of achievable sum rate maximization with short CCT, which is an essential metric for many wireless systems applications, in contrast to earlier research studies that have considered the MSE metric only.
• This paper explores the potential sum rate performance with a zero-forcing beamforming (ZFBF) precoder, which can effectively mitigate interference at high SNR, in contrast to conventional precoding methods such as the matched filter.
• This paper compares the rate performance of the proposed low-complexity DL TS design with that of existing methods for DL sequence optimization. The comparisons are presented using a one-ring (OR) channel model [34,42,43] and a Laplacian [36,44] physical channel model with a uniform planar array (UPA) configuration. The results demonstrate that the proposed low-complexity TS design considerably improves the DL rate over the state-of-the-art TS designs. This achievement signifies the importance of using the proposed approach in practical systems with URLLC.
Paper Outline: Section 2 presents the system model and the characterization of the rate using the ZFBF precoder. Section 3 explains the TS design and channel estimation process together with the problem formulation. Section 4 explains the proposed algorithm that optimizes the TS in m-MIMO systems with FDD transmission mode. Section 5 presents the physical channel model used in the performance evaluation of the proposed TS design. Section 6 provides the performance evaluation results. Finally, Section 7 concludes the paper.

System Model Discussion
We consider a single-cell system, where the BS is equipped with N transmit antennas, with $N \gg 1$, and transmits TSs and data to K single-antenna users, with $N \gg K$. Massive MIMO technology is used to improve the performance of wireless communications networks by distributing signals through multiple antennas [45]. This allows for joint beamforming, which enhances the network capacity and improves the overall experience of the users [46]. Instead of large horizontal linear arrays with hundreds of antennas, which would create narrow beams in the azimuth, compact planar panels with limited angular resolution in the azimuth (horizontal) and elevation (vertical) domains are being used. Hence, a UPA is considered in this paper. Due to the closely spaced antenna elements at the BS, the correlation is increased in an m-MIMO system. In typical wireless networks, the coherence block available for transmission consists of time and frequency symbols. These symbols represent the intervals of time and frequency during which the channel responses remain approximately constant [47–50]. Extension to orthogonal frequency-division multiplexing (OFDM) could be considered in the future [51,52]. In this paper, the available CCT, denoted by $\tau_c \in \mathbb{Z}^+$, is divided into a training period $\tau_{tr}$ and a data transmission period $\tau_d$. The coherence length that can be used for transmission in both the UL and DL is divided between them. The transmit energy can be divided between the TS stage and the data transmission stage. Figure 1 demonstrates the system model with a coherence block length of $\tau_c \in \mathbb{Z}^+$, which consists of the DL TS and data transmission in m-MIMO systems with an FDD operation mode.
The signal received by the kth user in the data-transmission stage in the DL can be expressed as [53]

$$y_k = \mathbf{h}_k^T \mathbf{V}\mathbf{s} + z_k, \qquad (1)$$

where $\mathbf{s} = [s_1, \dots, s_K]^T \in \mathbb{C}^K$ is the data signal vector, whose entries are independent and identically distributed (iid) zero-mean circularly symmetric complex-Gaussian (CSCG) variables with power constraint $E[\mathbf{s}\mathbf{s}^H] = \mathbf{I}_K$, and $\mathbf{V} = [\mathbf{v}_1, \dots, \mathbf{v}_K] \in \mathbb{C}^{N \times K}$ is the precoding matrix. Extension to colored noise could be considered in the future [54]. The DL instantaneous channel between the BS and the kth user, $\mathbf{h}_k \in \mathbb{C}^N$, is given by $\mathbf{h}_k = \mathbf{R}_k^{1/2}\tilde{\mathbf{h}}_k$, where the elements of $\tilde{\mathbf{h}}_k \in \mathbb{C}^N$ are iid CSCG. The kth user CCM, which represents the transmit correlation matrix, is given by

$$\mathbf{R}_k = E[\mathbf{h}_k \mathbf{h}_k^H].$$

The additive noise $z_k$ is modeled as an iid CSCG random variable. The received signal-to-interference-and-noise ratio (SINR), represented by $\gamma_k$, can be written as

$$\gamma_k = \frac{|\mathbf{h}_k^T \mathbf{v}_k|^2}{\sum_{j \neq k} |\mathbf{h}_k^T \mathbf{v}_j|^2 + 1/\rho_d}, \qquad (2)$$

where $1/\rho_d$ refers to the reciprocal of the SNR. The received SINR is influenced by the channel statistics and the specific precoding technique adopted by the BS, which depends on channel estimation. In terms of information theory, the best precoding strategy for multiuser MIMO systems is dirty paper coding (DPC) [55,56]. Nevertheless, precoding with DPC is still not practically feasible due to its high design complexity [57]. Linear precoding can be utilized instead of DPC to offer a performance that is not optimal but still satisfactory while keeping the design complexity low [58,59]. To this end, we consider a ZFBF precoder that can be normalized as [53]

$$\mathbf{v}_k = \sqrt{p_k}\, \frac{\hat{\mathbf{h}}_k^{\dagger}}{\|\hat{\mathbf{h}}_k^{\dagger}\|_2}, \qquad (3)$$

where $\mathbf{v}_k$ denotes the precoding vector of user k and $p_k = \|\mathbf{v}_k\|_2^2$ is the power allocated to user k, which satisfies $\sum_{k=1}^{K} p_k = P_d$. Here, $\hat{\mathbf{h}}_k^{\dagger}$ is the kth column of $\hat{\mathbf{H}}^{\dagger}$, the pseudo-inverse of the matrix $\hat{\mathbf{H}}$ that collects the DL CSI estimates, with $\hat{\mathbf{H}}^T = [\hat{\mathbf{h}}_1, \dots, \hat{\mathbf{h}}_K]$.
As such, the DL achievable sum rate can be written as

$$\text{Rate} = \frac{\tau_d}{\tau_c} \sum_{k=1}^{K} E\left[\log_2(1 + \gamma_k)\right]. \qquad (4)$$

The per-user SINR in (2) depends on the statistical information of the channel, the CSI estimation, and the precoder used at the BS. Note that the expectation in (4) is evaluated over different channel realizations using Monte Carlo simulations.
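The following is a minimal NumPy sketch of how the ZFBF precoder, the per-user SINR, and the overhead-weighted sum rate could be evaluated for one channel realization. It assumes equal power per user, that row k of `H` is the row vector multiplying the precoder for user k, and a pre-log factor of (1 - tau_tr / tau_c); the function names are hypothetical and these choices are assumptions rather than the paper's exact implementation.

```python
import numpy as np


def zfbf_precoder(H_hat, total_power=1.0):
    """Column-normalized zero-forcing precoder (N x K).

    H_hat: K x N matrix of channel estimates, one row per user.
    Assumption: the power budget is split equally across users.
    """
    K = H_hat.shape[0]
    V = np.linalg.pinv(H_hat)                          # H_hat @ V = I_K for full row rank
    V = V / np.linalg.norm(V, axis=0, keepdims=True)   # unit-norm columns
    return np.sqrt(total_power / K) * V


def sinr(H, V, rho_d):
    """Per-user SINR for true channels H (K x N) and precoder V (N x K)."""
    G = np.abs(H @ V) ** 2                 # G[k, j]: power of stream j at user k
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return signal / (interference + 1.0 / rho_d)


def sum_rate(gammas, tau_tr, tau_c):
    """Sum rate scaled by the fraction of the block left for data (assumed pre-log)."""
    return (1.0 - tau_tr / tau_c) * np.sum(np.log2(1.0 + gammas))


rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 16)) + 1j * rng.standard_normal((4, 16))) / np.sqrt(2)
V = zfbf_precoder(H)                       # perfect CSI, for illustration only
print(sum_rate(sinr(H, V, rho_d=10.0), tau_tr=20, tau_c=100))
```

With perfect CSI the interference term vanishes; with an estimate in place of `H` in `zfbf_precoder`, the residual interference in `sinr` reflects the estimation error. Averaging `sum_rate` over many realizations gives a Monte Carlo estimate of the expectation.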

CSI Estimation Process
As previously noted, the accuracy of DL CSI estimation is crucial for the performance of the DL precoding scheme. Therefore, it is essential to design a feasible TS for CSI estimation in m-MIMO systems with FDD transmission mode. This section presents the TS design, which depends on the received training signal used for CSI estimation conditioned on the channel covariance matrix (CCM), as subsequently explained. To estimate the DL CSI, the BS transmits a TS matrix of length $T_{tr}$ during the training period. Thus, the training signal received by user k, denoted as $\mathbf{y}_k \in \mathbb{C}^{T_{tr}}$, can be expressed as

$$\mathbf{y}_k = \mathbf{S}_{tr}^H \mathbf{h}_k + \mathbf{z}_k, \qquad (5)$$

where $\mathbf{S}_{tr} \in \mathbb{C}^{N \times T_{tr}}$ denotes the training matrix that satisfies $\mathrm{tr}(\mathbf{S}_{tr}^H \mathbf{S}_{tr}) = \rho_{tr} T_{tr}$, and $\rho_{tr}$ represents the transmitted power during the training stage. The vector $\mathbf{z}_k \in \mathbb{C}^{T_{tr}}$ denotes the receiver noise, which is modeled as iid CSCG with distribution $\mathcal{CN}(\mathbf{0}, \mathbf{I}_{T_{tr}})$. This paper assumes equal training and data power allocation, i.e., $\rho_{tr} = \rho_d$. However, optimizing the power allocation between $\rho_{tr}$ and $\rho_d$ might be considered in the future.
Linear filters can be utilized to enhance channel estimation performance by taking advantage of the statistics of the channel vector $\mathbf{h}_k$, which is CSCG distributed with statistics known at the BS. To achieve the best possible channel estimation performance in the DL with $T_{tr} < N$, Bayesian estimation is employed by means of an MMSE filter that takes into account both channel and noise statistics. As such, the CSI estimate $\hat{\mathbf{h}}_k \sim \mathcal{CN}(\mathbf{0}, \boldsymbol{\Phi}_k)$ can be expressed as

$$\hat{\mathbf{h}}_k = \mathbf{R}_k \mathbf{S}_{tr} \left(\mathbf{S}_{tr}^H \mathbf{R}_k \mathbf{S}_{tr} + \mathbf{I}_{T_{tr}}\right)^{-1} \mathbf{y}_k,$$

where $\mathbf{y}_k$ denotes the signal received by user k during the training stage. The CSI estimation error can be written as

$$\mathbf{e}_k = \mathbf{h}_k - \hat{\mathbf{h}}_k.$$

The covariance matrix of the CSI estimation error, denoted by $\mathbf{C}_{e,k} \in \mathbb{C}^{N \times N}$, can be expressed as

$$\mathbf{C}_{e,k} = \mathbf{R}_k - \mathbf{R}_k \mathbf{S}_{tr} \left(\mathbf{S}_{tr}^H \mathbf{R}_k \mathbf{S}_{tr} + \mathbf{I}_{T_{tr}}\right)^{-1} \mathbf{S}_{tr}^H \mathbf{R}_k,$$

and the mean square error can be written as

$$\mathrm{MSE}_k = \mathrm{tr}(\mathbf{C}_{e,k}).$$

Let $\mathbf{R}_{tot} = \sum_{k=1}^{K} \mathbf{R}_k$ be the superposition of the different user channel covariance matrices. Using the Woodbury matrix identity, the error covariance matrix associated with $\mathbf{R}_{tot}$ can be written as

$$\mathbf{C}_e = \left(\mathbf{R}_{tot}^{-1} + \mathbf{S}_{tr} \mathbf{S}_{tr}^H\right)^{-1}.$$
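As a sanity check on the estimation step, the sketch below builds the linear MMSE filter for a training model of the form y = S^H h + z with unit-variance noise (an assumed orientation consistent with S_tr being N x T_tr) and verifies numerically that the error covariance agrees with its Woodbury form. The function names are hypothetical, and the Woodbury form requires an invertible covariance matrix.

```python
import numpy as np


def mmse_filter(R, S):
    """Linear MMSE filter W (N x T) such that h_hat = W @ y, for y = S^H h + z."""
    T = S.shape[1]
    return R @ S @ np.linalg.inv(S.conj().T @ R @ S + np.eye(T))


def error_covariance(R, S):
    """Error covariance R - R S (S^H R S + I)^{-1} S^H R."""
    return R - mmse_filter(R, S) @ S.conj().T @ R


def error_covariance_woodbury(R, S):
    """Equivalent form (R^{-1} + S S^H)^{-1}; needs R to be invertible."""
    return np.linalg.inv(np.linalg.inv(R) + S @ S.conj().T)


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
R = A @ A.conj().T + 0.1 * np.eye(6)            # random positive definite covariance
S = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
print(np.allclose(error_covariance(R, S), error_covariance_woodbury(R, S)))
```

The trace of either matrix is the MSE that the training design seeks to reduce.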

Problem Formulation
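We minimize the MSE with respect to the TS $\mathbf{S}_{tr}$ in DL m-MIMO systems with FDD transmission mode, which is equivalent to the optimization problem

$$\min_{\mathbf{S}_{tr}} \ \mathrm{tr}\left(\left(\mathbf{R}_{tot}^{-1} + \mathbf{S}_{tr} \mathbf{S}_{tr}^H\right)^{-1}\right) \quad \text{s.t.} \quad \mathrm{tr}(\mathbf{S}_{tr}^H \mathbf{S}_{tr}) = \rho_{tr} T_{tr}. \qquad (13)$$

The MSE is a function of the subspace spanned by the superposition of the user channel covariance matrices $\mathbf{R}_{tot}$ and of the pilot matrix $\mathbf{S}_{tr}$, which implicitly depends on the training time and power.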
In m-MIMO systems, where the BS antennas are closely spaced by half a wavelength and located within a limited physical area, the channel coefficients may be strongly correlated. As a result, the CCMs will have a significant spread of eigenvalues. Furthermore, a significant amount of energy would be concentrated in a small number of strong directions, known as eigenmodes, rather than being distributed uniformly in all directions.
Consequently, it is essential to allocate more power to these strong spatial directions while assigning zero power to the weaker directions. This observation motivates exploiting the superposition covariance matrix R tot in the training design, with the aim of reducing the overhead of the DL TS and optimizing the allocation of power. To this end, this paper proposes a computationally efficient TS design using a waterfilling algorithm, described in detail in Section 4, for the case where the users have distinct correlations. The proposed algorithm is intended to achieve a high rate with a simplified design process.
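The concentration of energy in a few eigenmodes can be illustrated numerically. The sketch below uses an exponential correlation model, R[m, n] = r^|m - n|, which is an assumption chosen for simplicity and is not one of the channel models used in this paper; the helper names are hypothetical.

```python
import numpy as np


def exponential_correlation(n, r):
    """Toy covariance with entries r**|m - n| (illustrative assumption)."""
    idx = np.arange(n)
    return r ** np.abs(idx[:, None] - idx[None, :])


def energy_in_top_modes(R, n_modes):
    """Share of trace(R) captured by the n_modes largest eigenvalues."""
    eig = np.linalg.eigvalsh(R)[::-1]      # eigvalsh returns ascending order
    return eig[:n_modes].sum() / eig.sum()


for r in (0.0, 0.5, 0.95):
    print(r, energy_in_top_modes(exponential_correlation(64, r), n_modes=8))
```

For uncorrelated antennas (r = 0) the 8 strongest modes hold exactly 8/64 of the energy, whereas strong correlation concentrates most of it in those few modes, which is the property the training design exploits.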

Proposed Low Complexity DL TS Design Based on a Waterfilling Algorithm
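In this section, we discuss the proposed TS framework based on a waterfilling power allocation algorithm. The structure of the TS matrix $\mathbf{S}_{tr}$ is designed by combining the K different CCMs of all users. As such, the effective eigenvectors of the superposition CCM and the corresponding eigenvalues are used in the pilot design. Using the eigenvalue decomposition (EVD), the superposition CCM $\mathbf{R}_{tot}$ can be expressed as

$$\mathbf{R}_{tot} = \mathbf{U}_{tot} \boldsymbol{\Lambda}_{tot} \mathbf{U}_{tot}^H, \qquad (14)$$

where the unitary matrix $\mathbf{U}_{tot} = [\mathbf{u}_{tot,1}, \dots, \mathbf{u}_{tot,N}] \in \mathbb{C}^{N \times N}$ contains the eigenvectors of $\mathbf{R}_{tot}$ and the diagonal matrix $\boldsymbol{\Lambda}_{tot} \in \mathbb{R}^{N \times N}$ contains its eigenvalues, arranged in descending order so that $\lambda_{tot,1} \geq \lambda_{tot,2} \geq \dots \geq \lambda_{tot,N}$. Using the eigenvalue decomposition in (14), the MSE can be expressed as

$$\mathrm{MSE} = \mathrm{tr}\left(\left(\mathbf{U}_{tot} \boldsymbol{\Lambda}_{tot}^{-1} \mathbf{U}_{tot}^H + \mathbf{S}_{tr} \mathbf{S}_{tr}^H\right)^{-1}\right). \qquad (15)$$

Because $\mathbf{U}_{tot}$ is unitary, expression (15) can be written as

$$\mathrm{MSE} = \mathrm{tr}\left(\left(\boldsymbol{\Lambda}_{tot}^{-1} + \mathbf{U}_{tot}^H \mathbf{S}_{tr} \mathbf{S}_{tr}^H \mathbf{U}_{tot}\right)^{-1}\right). \qquad (16)$$

Lemma 1. The following inequality holds for a positive definite matrix $\mathbf{A} \in \mathbb{C}^{N \times N}$:

$$\mathrm{tr}(\mathbf{A}^{-1}) \geq \sum_{i=1}^{N} \frac{1}{a_{i,i}}, \qquad (17)$$

where $a_{i,i}$ is the ith diagonal element of the matrix $\mathbf{A}$. The equality in (17) holds if $\mathbf{A}$ is a diagonal matrix.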

Proof.
A proof of Lemma 1 can be found in [60].
Based on Lemma 1 and majorization theory, the MSE in (16) is minimized if $\mathbf{U}_{tot}^H \mathbf{S}_{tr} \mathbf{S}_{tr}^H \mathbf{U}_{tot}$ is a diagonal matrix with all non-negative elements. Consider the EVD of $\mathbf{S}_{tr} \mathbf{S}_{tr}^H$, i.e., $\mathbf{S}_{tr} \mathbf{S}_{tr}^H = \mathbf{U} \boldsymbol{\Sigma} \mathbf{U}^H$. The training-dependent term in (16) can then be written as $\mathbf{U}_{tot}^H \mathbf{U} \boldsymbol{\Sigma} \mathbf{U}^H \mathbf{U}_{tot}$, which reduces to the diagonal matrix $\boldsymbol{\Sigma}$ if we choose $\mathbf{U} = \mathbf{U}_{tot}$, i.e., $\boldsymbol{\Sigma} = \boldsymbol{\Sigma}_{tot}$. The diagonal matrix $\boldsymbol{\Sigma} \succeq \mathbf{0}_{T_{tr}}$ with an optimum power assignment under the transmit power constraint can be found through the waterfilling algorithm; see, e.g., [27,61–64]. As such, the TS framework is provided next.
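The inequality in Lemma 1 can be checked numerically. The sketch below compares tr(A^{-1}) with the sum of reciprocal diagonal entries for a random positive definite matrix and for a diagonal one; the helper name `trace_inverse_bound` is hypothetical.

```python
import numpy as np


def trace_inverse_bound(A):
    """Return (tr(A^{-1}), sum_i 1 / a_ii) for a positive definite matrix A."""
    lhs = np.trace(np.linalg.inv(A)).real
    rhs = np.sum(1.0 / np.diag(A).real)
    return lhs, rhs


rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = B @ B.conj().T + np.eye(5)               # random positive definite matrix
print(trace_inverse_bound(A))                # first value is at least the second
print(trace_inverse_bound(np.diag([1.0, 2.0, 4.0])))  # equality for a diagonal matrix
```

This is why aligning the eigenvectors of the training matrix with those of the superposition CCM, which makes the matrix inside the trace diagonal, attains the lower bound.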

Proposition 1.
A new DL TS $\mathbf{S}_{tr} \in \mathbb{C}^{N \times T_{tr}}$ for m-MIMO systems with FDD transmission mode and different spatial correlations is designed from the $T_{tr}$ eigenvectors of the superposition CCM $\mathbf{R}_{tot}$ associated with its largest eigenvalues, scaled by an optimized power allocation obtained using the waterfilling solution, as expressed in (18). Here, $\mathbf{U}_{tot} \in \mathbb{C}^{N \times T_{tr}}$ is a rectangular matrix consisting of the $T_{tr}$ eigenvectors of $\mathbf{R}_{tot}$ associated with the $T_{tr}$ largest eigenvalues. The auxiliary variable $\boldsymbol{\Sigma}_{tot} \in \mathbb{R}^{N \times N}$ is a non-negative diagonal matrix, $\boldsymbol{\Sigma}_{tot} = \mathrm{diag}\{\sigma_{1,tot}, \dots, \sigma_{T_{tr},tot}, 0, \dots, 0\}$, with an optimized power allocation based on the waterfilling algorithm given by (19). The power allocation should satisfy the transmit power constraint and be arranged in decreasing order in t, in the same order as the eigenvalues $\lambda_{t,tot}$ of $\mathbf{R}_{tot}$, where the training matrix satisfies $\mathrm{tr}(\mathbf{S}_{tr}^H \mathbf{S}_{tr}) = \rho_{tr} T_{tr}$. The waterfilling power assignment in (19) is obtained using the Lagrange multiplier method, where $\mu > 0$ is the Lagrange multiplier, which is chosen to satisfy the transmit power constraint, i.e., $\sum_{t=1}^{T_{tr}} \sigma_{t,tot} = \rho_{tr}$.
The waterfilling-based approach introduced in Proposition 1 identifies which eigenmodes are essential for DL training and CSI estimation. The intuition is to allocate more power to the few effective eigendirections of R tot while assigning zero power to the weakest directions. In particular, the non-zero entries of the diagonal matrix Σ tot correspond only to the T tr largest eigenvalues of R tot , which are deemed essential by the waterfilling criterion. The minimum training length is given by the number of active training powers σ t,tot that satisfy the transmit power constraint ρ tr . The rank of Σ tot is equal to the rank of S tr , which is equal to the maximal number of active σ t,tot . To the best of our knowledge, this DL training framework design has not been proposed for FDD m-MIMO with distinct spatial correlation matrices.
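A minimal sketch of the non-iterative design in Proposition 1 is given below. It assumes the water level follows from the KKT conditions of minimizing sum_t 1 / (1/lambda_t + sigma_t) under a total power budget, which gives sigma_t = max(0, level - 1/lambda_t); the exact expression and power normalization used in the paper may differ. The function names and the `total_power` parameter are hypothetical.

```python
import numpy as np


def waterfilling(eigvals, total_power):
    """Powers sigma_t = max(0, level - 1/lambda_t) that sum to total_power.

    eigvals must be positive and sorted in descending order.
    """
    inv = 1.0 / np.asarray(eigvals, dtype=float)   # ascending "floor" heights
    for m in range(len(inv), 0, -1):               # try m active modes, largest m first
        level = (total_power + inv[:m].sum()) / m
        if level > inv[m - 1]:                     # weakest active mode still gets power
            break
    sigma = np.zeros_like(inv)
    sigma[:m] = level - inv[:m]
    return sigma


def training_matrix(R_tot, T_tr, total_power):
    """S_tr = U * sqrt(sigma) from the T_tr dominant eigenpairs of R_tot."""
    eigvals, eigvecs = np.linalg.eigh(R_tot)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:T_tr]
    lam = np.maximum(eigvals[order], 1e-12)        # guard against zero eigenvalues
    sigma = waterfilling(lam, total_power)
    return eigvecs[:, order] * np.sqrt(sigma), sigma


print(waterfilling([10.0, 1.0, 0.01], total_power=1.0))
```

In the printed example the weakest mode receives no power, so the effective training length is shorter than the number of candidate eigenvectors. Only one eigendecomposition and a short loop over at most T_tr candidates are needed, with no iterative optimization.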

Channel Correlation Models
When modeling the CCM in MIMO systems, many studies have assumed Rayleigh fading is uncorrelated. This means that the elements of the matrix are uniformly distributed and independent for all users [65]. As a result, all directions are equally important in this model since the energy is broadcast evenly over all directions. The assumption of spatially uncorrelated channel coefficients is quite strict. In real-world scenarios, m-MIMO systems with a large N may exhibit significant correlations among the channel coefficients. Moreover, empirical measurements have provided evidence that MIMO channels do exhibit correlation in practice [66][67][68]. Therefore, it is important to consider correlation models. When dealing with correlated Rayleigh fading, only a small number of effective paths are present in the angular directions, which means that the remaining directions can be disregarded. The level of spatial correlation present in the CCM is determined by both the statistical information available and the specific configurations of the antenna array.
Note that the radiation pattern of an antenna array can be calculated by taking into account the phase and amplitude of the signals at each antenna element, as well as the spacing and geometry of the elements. The pattern can be represented graphically in two or three dimensions. In general, the radiation pattern of an antenna array depends on the array configuration, which includes the number of elements, their spacing, and their relative phase and amplitude. In this paper, we use a uniform rectangular array in which the radiation pattern is distributed in both the horizontal and vertical directions.
We will now describe the geometric characteristics of the CCM $\mathbf{R}_k$ for the Laplacian and OR models with a UPA. The attributes of the UPA are established by the angles of arrival (AoA) in the horizontal direction, $\theta_{k,H}$, and the vertical direction, $\theta_{k,V}$, the angular spreads in the horizontal direction, $\omega_{k,H}$, and the vertical direction, $\omega_{k,V}$, and the normalized antenna spacings $D_H$ and $D_V$ (measured in wavelengths). We assume that the BS array has uniform spacing between its antenna elements, denoted by D, which is equal in both the horizontal and vertical directions, i.e., $D_H = D_V = D$. The number of antennas is denoted by N, which is equal to the product of the number of antennas in the horizontal (azimuth) direction, denoted by $N_H$, and the number of antennas in the vertical (elevation) direction, denoted by $N_V$. Likewise, the same distance is assumed for the azimuth and elevation directions, denoted as $d_{k,H}$ and $d_{k,V}$, respectively, so that $d_{k,H} = d_{k,V} = d_k$. Based on this assumption, the CCMs in the horizontal and vertical directions are denoted as $\mathbf{R}_{k,H} \in \mathbb{C}^{N_H \times N_H}$ and $\mathbf{R}_{k,V} \in \mathbb{C}^{N_V \times N_V}$, respectively. Both matrices have Toeplitz form, and their (m, n)th elements are given in (20) and (21), respectively [28].
The computation of the integration in equations (20) and (21) is done through numerical methods. The vertical angular spread is denoted by the parameter ω V and its value is obtained according to reference [28].
The value of $\theta_{k,V}$, which represents the angle at which the kth user's signal arrives in the vertical direction, is defined in [28] in terms of the user-BS distance $d_k$, the BS height h, and the scattering ring radius s, all measured in meters. A low angular spread in both the horizontal and vertical directions indicates that the channel covariance matrix (CCM) has a low-rank structure, i.e., a strong spatial correlation. The expression in (24), in which ⊗ denotes the Kronecker product, can be used to compute the CCM $\mathbf{R}_k$ of the kth user in the UPA configuration from its horizontal and vertical CCMs. These are Toeplitz matrices with components computed numerically, where $\theta_{k,V}$ and $\omega_V$ are the vertical AoA and angular spread, respectively.
The channel covariance matrices for the Laplacian model in the azimuth and elevation directions are provided in equations (25) and (26), respectively, where $\theta_{k,H}$ and $\theta_{k,V}$ denote the AoA in the horizontal and vertical directions, and $\omega_H$ and $\omega_V$ represent the horizontal and vertical angular spreads, respectively.
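The construction of a UPA covariance matrix from its horizontal and vertical parts can be sketched as follows. The sketch assumes a one-ring-style model with angles uniformly distributed within the angular spread around the AoA, an array response of the form exp(j 2 pi D m sin(theta)), a simple grid average in place of the integral, and the Kronecker order R_H ⊗ R_V; the sign convention, normalization, and Kronecker order in the paper's expressions may differ, and the function names are hypothetical.

```python
import numpy as np


def one_ring_toeplitz(n, aoa_deg, spread_deg, spacing=0.5, n_grid=721):
    """Toeplitz covariance of an n-element uniform linear array.

    Entry (m, l) averages exp(j 2 pi D (m - l) sin(theta)) over a grid of
    angles theta in [aoa - spread, aoa + spread] (uniform angular density,
    an assumption of this sketch).
    """
    theta = np.deg2rad(aoa_deg + np.linspace(-spread_deg, spread_deg, n_grid))
    steering = np.exp(2j * np.pi * spacing * np.outer(np.arange(n), np.sin(theta)))
    return steering @ steering.conj().T / n_grid   # Hermitian PSD, unit diagonal


def upa_covariance(R_h, R_v):
    """UPA covariance as a Kronecker product (order is an assumption)."""
    return np.kron(R_h, R_v)


R_h = one_ring_toeplitz(8, aoa_deg=23.5, spread_deg=5.0)
R_v = one_ring_toeplitz(4, aoa_deg=10.0, spread_deg=2.0)
R_k = upa_covariance(R_h, R_v)
print(R_k.shape, np.trace(R_k).real)
```

Summing such matrices over the users gives the superposition CCM whose dominant eigenvectors are used in the training design. A Laplacian model could be sketched in the same way by weighting the grid with a Laplacian angular density instead of a uniform one.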

Performance Evaluation
This section discusses multiple findings that describe the performance of FDD m-MIMO systems. Specifically, this section compares the rate of the state-of-the-art TS designs proposed in [36,41] with the rate obtained using our proposed training solution. The comparison is conducted based on the scattering one-ring (OR) channel model and the Laplacian channel model. Table 1 contains the parameters used for the performance evaluation. In this study, we focus on a densely populated urban area where users are situated within a 200-meter range from the BS. The coherence block length values used in the simulation are τ c = 100 and τ c = 50, with a total of K = 10 users. The OR model and Laplacian model are employed with angular spreads of ω H = 5° and ω H = 15°, which indicate strong and weak spatial correlation in the channel, respectively. The other parameters are determined by the AoAs θ k,H , the angular spread ω, and the antenna spacing D, as specified in [36]. The power allocation should satisfy the total power constraint with P d = 1. The training power ρ tr is given as the SNR in dB, which ranges from −15 dB to 20 dB. In our figures, every simulated point is the average of 5000 independent channel realizations obtained through Monte Carlo simulation runs. The angles of arrival are specified as θ k,H = −57.5°, −45°, −41.5°, −23°, −7.5°, 7.5°, 23.5°, 41.5°, 45°, 57.5°, according to [36]. In real-world scenarios, the CCM based on the OR and Laplacian models is unlikely to have singular values equal to zero. To minimize the overhead of the DL TS, the strong eigenvalues of the summation of the transmit CCMs are chosen in the design, while the remaining eigenvalues are disregarded. This implies that only the dominant dimensions of the effective channel, where most of the energy is concentrated, are used for precoding design and data transmission, whereas the other dimensions are not utilized.
It is important to note that using orthogonal or random DL TSs is not effective in FDD m-MIMO systems because these sequences might span the entire N-dimensional space and waste resources in the training phase. The proposed DL TS design is evaluated and compared to conventional designs based on achievable sum-rate maximization. Figure 2 examines the achievable sum rate using the Laplacian channel model, comparing different training designs under strong and relatively weak channel correlations. We select N = 128 and a very short CCT of τ c = 50. The results in Figure 2 show that the rate of the proposed design outperforms the state-of-the-art training designs. In addition, the results in Figure 2 indicate that a significant increase in the rate can be achieved with high correlation, i.e., ω H = 5°. Figure 3 investigates the rate using the Laplacian model when the CCT is increased, i.e., τ c = 100. The results in Figure 3 demonstrate that the rate increases when the CCT is increased and that the proposed TS design outperforms the existing methods. Figures 4 and 5 investigate the rate with the Laplacian model when N = 256, with a very short CCT of τ c = 50 and a short CCT of τ c = 100, respectively. The results in Figures 4 and 5 show that a considerable enhancement in the rate can be achieved using the proposed TS design in comparison to the existing methods. The results also indicate that with ω H = 15°, a rate loss of around 5 bits/s/Hz is observed for all the methods considered. Figures 6 and 7 show plots of the sum rate using the OR channel model with N = 128, with a very short CCT of τ c = 50 and a short CCT of τ c = 100, respectively. In both figures, we evaluate various TS designs under strong and comparatively weak channel correlations, specifically at ω H = 5° and ω H = 15°, respectively. The results demonstrate that the proposed TS method outperforms the existing methods, especially at high SNR values.
This performance enhancement becomes considerably larger as N increases, as clearly indicated by the results in Figures 8 and 9. The results indicate that a remarkable improvement in rate performance is achieved using the proposed pilot sequence. When the horizontal angular spread is ω_H = 5°, the proposed design reaches a rate of 30 bits/s/Hz with a 7 dB improvement over existing designs. On the other hand, when the horizontal angular spread is ω_H = 15°, the proposed pilot sequence reaches a rate of 30 bits/s/Hz with a 9 dB improvement over conventional designs. Specifically, with a large N, a sum rate improvement of around 5 bits/s/Hz can be obtained over the state-of-the-art training sequences. Figures 10 and 11 plot the sum rate under the Laplacian and OR channel models, respectively, as a function of the number of BS antennas with a CCT of τ_c = 100 and SNR = 15 dB. The results demonstrate that a significant improvement in the sum rate performance can be achieved with the proposed training sequence design.
Overall, the results clearly indicate that our proposed TS design outperforms the state-of-the-art training designs in all the scenarios considered. The physical interpretation of the proposed idea is that each eigenvector of R_tot represents a transmit direction in which users are located, and the associated eigenvalue in Λ_tot indicates the channel gain in that direction. As such, more power should be assigned to the signals transmitted along the directions with larger channel gains. In particular, at low SNR values, training only a few directions (those corresponding to the largest eigenvalues of R_tot) is enough to achieve acceptable performance, and zero power can be allocated to the directions with the weakest eigenvalues. As the SNR increases, additional directions corresponding to smaller eigenvalues can be trained to improve the channel estimation accuracy. Our proposed method exploits the waterfilling power allocation algorithm and achieves a significant improvement in the rate compared with the existing methods. The results presented in this paper also indicate that a feasible DL pilot sequence for FDD systems with very limited CCT can be achieved without increasing the system complexity and overhead. The analysis of the results supports the main contribution of this work.
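The behaviour described above, where few directions are trained at low SNR and more at high SNR, can be reproduced with the textbook waterfilling rule p_i = max(0, μ − 1/g_i), with μ chosen so that the powers sum to the budget. The sketch below is a generic illustration and not necessarily the exact allocation used in the proposed design: the effective gains g_i = ρ_tr λ_i, the function name `waterfilling`, and the example eigenvalues in `lam` are assumptions introduced here; only the unit power budget and the −15 dB to 20 dB SNR range are taken from the simulation setup.

```python
import numpy as np

def waterfilling(gains, total_power):
    """Textbook waterfilling over parallel directions with positive gains.

    Returns p with p_i = max(0, mu - 1/g_i) and sum(p) == total_power.
    """
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]  # strongest direction first
    inv = 1.0 / gains[order]         # "floor heights", in ascending order
    power = np.zeros_like(gains)
    # Drop the weakest direction until every active one gets positive power.
    for n_active in range(len(gains), 0, -1):
        mu = (total_power + inv[:n_active].sum()) / n_active  # water level
        if mu > inv[n_active - 1]:
            power[order[:n_active]] = mu - inv[:n_active]
            break
    return power

# Hypothetical eigenvalues of the summed covariance matrix (illustration only).
lam = np.array([8.0, 4.0, 2.0, 1.0, 0.5, 0.1])

active_counts = []
for snr_db in (-15, 0, 20):
    rho = 10 ** (snr_db / 10)               # training SNR on a linear scale
    p = waterfilling(rho * lam, total_power=1.0)
    active_counts.append(int(np.count_nonzero(p)))
    print(snr_db, "dB:", np.round(p, 3))
```

With these example values, the number of directions that receive nonzero power grows as the SNR increases, which matches the interpretation given in the text.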

Conclusions
Optimizing the DL TS to estimate the CSI in m-MIMO systems in a general setting, with single-stage precoding and users that exhibit different channel covariance matrices, is extremely challenging. Although sophisticated algorithms are available for optimizing the TS to accurately estimate the CSI, they tend to have slow convergence and high computational complexity. To address this issue, this paper proposed a feasible DL TS to estimate the CSI with a low computational requirement. The proposed TS method works with limited CCT and aims to maximize the DL sum rate performance. The performance of the proposed training design is assessed in terms of the achievable sum rate and compared with the most advanced methods available. The results showed that the proposed TS method outperforms the existing methods in the medium and high SNR regimes. The enhancement in system performance is achieved by maximizing the rate of the FDD m-MIMO system without increasing system complexity and overhead. Specifically, this paper demonstrated that a practical, low-complexity DL TS design can be obtained from the eigendecomposition of the sum of the users' channel covariance matrices. The obtained results signify the effectiveness of the proposed TS solution for practical consideration in comparison with the existing DL TS solutions.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: