Channel Estimation Using CNN-LSTM in RIS-NOMA Assisted 6G Network

The combination of non-orthogonal multiple access (NOMA) and reconfigurable intelligent surface (RIS) technologies has been proposed to meet the data rate, latency, and connectivity demands of sixth-generation (6G) networks. The two techniques complement each other to improve the performance of a 6G system. In a RIS-aided system, channel estimation is a challenging problem, especially with a passive RIS, which has no signal processing capability. This paper proposes a deep learning (DL)-based channel estimation method using a convolutional long short-term memory (CNN-LSTM) model for RIS-NOMA wireless communication systems, which integrate the RIS and NOMA techniques. CNN-LSTM leverages the benefits of both the convolutional neural network (CNN) and long short-term memory (LSTM): the CNN captures spatial features, while the LSTM captures temporal features of time-series data. The simulation results indicate that the proposed CNN-LSTM model is robust to variations of the RIS-NOMA system parameters, i.e., the transmit signal-to-noise ratio (SNR), the power allocation factor, and the number of RIS elements. The impacts of these parameters on the prediction accuracy of the proposed DL-based channel estimation method are evaluated via different performance metrics. The results reveal that the prediction accuracy in terms of normalized root mean square error (NRMSE), coefficient of determination (R2 score), mean absolute scaled error (MASE), and mean absolute percentage error (MAPE) improves with an increased transmit SNR, power allocation factor of the first user, and number of RIS elements. Additionally, the CNN-LSTM model outperforms four benchmark models: the CNN1D-LSTM model, which uses a one-dimensional convolution layer (conv1D); the CNN1D-BiLSTM model, which uses bidirectional long short-term memory; the CNN model; and the LSTM model.


I. INTRODUCTION
THE upcoming sixth-generation (6G) networks, with an explosion of up to trillions of intelligent devices and massive connections between them, put forward requirements for high speed, low power consumption, low latency, and energy efficiency. The reconfigurable intelligent surface (RIS) has emerged as one of the promising and innovative techniques for future 6G networks, alongside multiple-input multiple-output (MIMO), millimeter-wave (mm-wave), and relay communications. RIS can enhance the spectrum efficiency, energy efficiency, and throughput of wireless communications by leveraging its ability to reflect and redirect signals in a controlled manner [1], its low hardware footprint [2], and its capability to expand the range of network coverage [3]. A RIS is composed of elements, in which each element or group of elements independently adjusts the amplitude and phase responses of incident signals in real time to steer energy toward an intended direction [4]. With appropriately designed phase shifts, RIS is able to manipulate the communication environment, increase the link quality, and improve coverage [5]. In addition, RIS can operate in a passive mode with low-cost passive elements that consume little energy, which contributes to an increase in the energy efficiency of wireless communication systems.
Although next-generation multiple access (NGMA) techniques are still under investigation, NOMA is recommended for 6G networks [6]. Regarding new requirements, the key concept of NGMA is to enable massive numbers of devices to connect efficiently and intelligently in a given radio resource with extremely low latency, high reliability, high capacity, and high data rates. The existing orthogonal multiple access (OMA) techniques might be infeasible for these extreme demands, because current OMA techniques serve only a single user in each orthogonal radio resource block and thus limit the system capacity and spectrum efficiency [7]. To provide a higher spectrum efficiency than OMA, the NOMA technique introduces a mechanism in which a given orthogonal radio resource is allocated to multiple users simultaneously. NOMA is a promising research trend for the evolution of critical and massive machine-type communications in 6G [8], [9]. Regarding new technologies, the integration of NOMA with emerging techniques such as RIS, Terahertz communications, index modulation, random access, and visible light communications brings new research challenges in 6G. For example, NOMA is combined with Terahertz communications for vehicle-to-vehicle and vehicle-to-infrastructure applications to improve performance [10]. In [11], the integration of NOMA and RIS is discussed as one of the potential scenarios for 6G, and it has also been proposed by the research community [12].
In multi-user 6G networks with massive numbers of devices, NOMA is a promising technique to integrate with RIS to meet the demands of data rate, latency, and connectivity [13]. On the one hand, NOMA systems help tackle the explosion in the number of users (UEs) in future networks by allowing massive numbers of devices to connect simultaneously. A NOMA system serves multiple UEs with different quality of service (QoS) requirements in the same orthogonal resource block (e.g., time, frequency, and code) by exploiting the power domain. The power-domain technique exploits the difference between the channels of the UEs for multiplexing: superposition combining (SC) is used at the transmitter, and successive interference cancellation (SIC) is applied at each receiver to eliminate the co-channel interference before it decodes its own signal [14]. Additionally, a system using NOMA is superior to a conventional OMA system in terms of average sum-rate and outage probability [15]. On the other hand, with the aid of RIS, a NOMA system improves its performance through additional signal diversity without requiring additional time slots or energy. Compared to a conventional NOMA system, where channel conditions are determined by the propagation environment, its RIS-supported counterpart is able to tune the channel quality of each UE by changing the reflection coefficients or the position of the RIS. This flexibility enables a smart NOMA design in which the system is based on QoS instead of on channel gains dictated by the environment [16], which is useful in 6G networks where devices with different QoS requirements are connected. Owing to this complementary integration of RIS and NOMA, many recent studies have paid attention to RIS-assisted NOMA (RIS-NOMA) systems.

II. BACKGROUND
This section presents a review of RIS-NOMA wireless communication systems and of the use of DL methods for channel estimation in conventional RIS-aided systems without NOMA. From this background, we outline the motivations and contributions of this paper.

A. RIS-NOMA WIRELESS COMMUNICATION SYSTEMS
Recent studies on RIS-NOMA systems mainly focus on system performance analysis by solving optimization problems. It was demonstrated that deploying RIS in a NOMA-assisted system, or applying NOMA in a RIS-aided system, can enhance the system performance compared to conventional NOMA (without RIS) or OMA-based RIS systems, respectively. Specifically, by optimizing the rate performance of the system, the study in [17] shows that the integration of RIS and NOMA enables the system to achieve a higher rate than an OMA-based RIS system and a conventional NOMA system. In [18], in an effort to maximize the weighted sum rate and propose a UE ordering scheme for a NOMA system, it is shown that an optimal deployment location of the RIS can enlarge the disparities among the channels of UEs in a RIS-NOMA system and increase the channel gains of all UEs in a RIS-OMA system. With the aim of transmit power optimization, the study in [19] shows that the proposed RIS-NOMA system using a 32-element RIS achieves a higher performance than a conventional MIMO system equipped with 64 antennas. Similarly, the papers in [20] and [21] studied the maximization of the system throughput and energy efficiency of RIS-NOMA systems. In these studies, the RIS-NOMA systems show their superiority in terms of system throughput and energy efficiency compared to systems utilizing only RIS or only NOMA.
The aforementioned studies have highlighted the merits of integrating NOMA and RIS under the assumption that the channel state information (CSI) of the network is available at the transmitter, but they have given no consideration to the importance and challenge of estimating these channels. For NOMA systems, an accurate estimation of the channel gains is a prerequisite for performing SIC at each receiver [17]. In practice, the challenge of channel estimation in a passive RIS-assisted network lies in the fact that a RIS is composed of passive elements without any receive or transmit processing chains; thus, channel estimation can only be implemented at the transmitter or the receiver side. In addition, the large number of RIS elements in practice leads to an increased number of unknown channel parameters that need to be estimated [22]. So far, channel estimation for RIS-NOMA systems remains a research gap on which existing works have rarely focused.

44 VOLUME 1, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

B. DL APPROACHES FOR CHANNEL ESTIMATION IN A CONVENTIONAL RIS-ASSISTED SYSTEM WITHOUT NOMA
Recently, channel estimation for conventional RIS-aided systems has been widely studied. However, channel estimation methods for passive RIS-assisted systems, especially DL-based channel estimators, have been investigated in only a few recent works. Given the challenges of channel estimation for passive RIS-aided systems discussed above, our work considers only channel estimation methods for passive RIS-aided systems. A DL-based channel estimation algorithm, which is a data-driven approach, helps a RIS-assisted system overcome the challenges posed by conventional channel estimation algorithms, i.e., the minimum mean square error (MMSE) or least-square methods. Specifically, the pilot overhead of conventional channel estimation methods is prohibitively high [23], [24], which limits the spectral efficiency and delays the channel estimation. Although the authors in [25] and [26] proposed methods to tackle these issues, these approaches still require a reduction in beamforming capacity and assume channel sparsity in the network, respectively. DL-based channel estimators have shown their potential by reducing the high pilot overhead caused by an increased number of RIS elements in RIS-OMA systems [27], [28].
Channel estimation algorithms based on DL approaches for RIS-aided systems, particularly multi-user ones, have been investigated in recent studies. Compared to conventional linear channel estimators, i.e., the minimum mean square error or least square method, DL-based approaches exploit the inherent non-linear relationships between the input and output signals to produce more reliable channel estimates for RIS-assisted systems at a lower computational cost [29]. A DL model with an orthogonal matching pursuit algorithm followed by a residual network is proposed for a RIS-aided multi-user massive system [30]. This method is built on the assumption of a quasi-sparse structure of the cascaded angular channel and is therefore not suitable for scenarios with frequency-selective fading in a time-varying environment. In particular, the authors in [4] applied generative adversarial networks based on convolutional blind denoising, a convolutional blind denoising network, and multiple residual dense networks to estimate the cascaded uplink channels of a time division duplex (TDD) passive RIS-aided communication system. That study assumes that the optimal phase shift matrix is different for one or more UEs in each time slot. In this paper, this assumption is unnecessary: because our proposed RIS-NOMA wireless system serves multiple UEs in the same time block, the optimal phase shift is considered for the overall system at a given time instead of for each UE. A CNN-based deep residual learning (CDReL) model is proposed in [31] to estimate the cascaded channels in a RIS-assisted TDD multi-user system based on noisy pilot-based observations. That study is based on an OMA technique; therefore, an optimal scheme for the pilot sequence matrix at all UEs is necessary to improve the received signal at the base station (BS) before applying channel estimation. However, this optimal scheme is not needed in the downlink NOMA-aided system proposed in this study, because a NOMA system applies SC to the signals at the transmitter and SIC at the UEs.
Several studies have focused on DL-based channel estimation methods for RIS-assisted systems in time-varying environments. Time-varying channel estimation using a recurrent neural network combined with ordinary differential equations for a passive RIS-assisted single-UE system is investigated in [28]. Although the mobility speed at the receiver side is taken into account, that study observes the system for a single UE only. In addition, it controls the RIS in an on/off pattern during a pilot block to reduce the length of the cascaded channel as the number of RIS elements increases. No benchmark channel estimation methods were compared with the method proposed in that study. A fast time-varying channel estimation approach is studied in [32] by applying a novel sparse-connected LSTM along with a three-stage channel estimation approach for a full-duplex RIS-assisted multi-user multiple-input single-output (MU-MISO) system. The channel from the BS to the RIS is considered on a large time scale only, and the channels from the RIS to each UE are considered on a small time scale only. Hence, the cascaded channel, which is the combination of the channel from the BS to the RIS and the channel from the RIS to each UE, disregards the small time-scale variation of the former as well as the large time-scale variation of the latter. Most of the aforementioned studies on passive RIS-aided systems apply the OMA technique, and in these studies either all devices are fixed or the time-varying channels are derived from geometric channel models.
Recently, DL-based channel estimation for RIS-assisted integrated sensing and communication (ISAC) systems has been investigated. The authors in [33] proposed a three-stage channel estimation approach using a CNN architecture for a RIS-assisted ISAC system. This approach aims to overcome the inherent interference in a system containing both sensing and communication signals. The proposed DL estimation method is applied to both the direct channels, i.e., UE-BS and target-BS, and the reflected channels, i.e., UE-RIS-BS and BS-target-RIS-BS, and is shown to outperform the baseline least square method. This approach requires turning on the RIS components at the second stage, which leads to battery consumption. In addition, the proposed method is specifically designed for uplink channels and is not easily extended to the downlink. That investigation focused on a scenario where the devices within the system were stationary and, more specifically, where the link between the RIS and the BS was established over a very limited distance of 2 meters.
Although many papers address channel estimation for RIS-aided systems, most of the aforementioned studies on passive RIS-aided systems apply the OMA technique. Furthermore, the fixed or time-varying channels have mostly been obtained from geometric channel models, specifically the Saleh-Valenzuela model, which assumes that the RIS has a negligible size in comparison to the distance between the transmitter and the receiver. This assumption could be impractical when the RIS is placed in close proximity to the BS.
Accurate channel estimation is more crucial for a NOMA-based SIC system than for an OMA system because the SIC decoder at the receiving end depends heavily on the CSI. Additionally, due to the large number of RIS elements and the multi-user nature of NOMA, there are numerous RIS-related channel coefficients that require estimation in a RIS-NOMA system. Channel estimation in a RIS-NOMA system is expected to take a large number of time slots, which is usually directly related to the number of reflecting elements and users, and this duration has the potential to exceed the channel coherence time [34]. Different from a traditional RIS system, a RIS-NOMA system differentiates between UEs using power allocation factors. When a UE has a lower power allocation, it may receive a weaker signal as it moves farther away from the BS, which can lead to difficulties in channel estimation.

C. MOTIVATION AND CONTRIBUTIONS
As discussed in the previous section, DL-based channel estimation approaches for passive RIS-NOMA systems have not yet been sufficiently explored in existing studies. This paper is an early attempt to study a DL-based channel estimation method for a downlink multi-user RIS-NOMA wireless communication system. However, NOMA poses new challenges for a RIS-aided system in terms of channel estimation. Firstly, the channel coefficient matrices become larger due to the huge number of channels that results from the multi-user nature of NOMA combined with many RIS elements. This challenge becomes more complicated when the RIS is fully passive, which makes it difficult to estimate the individual channels from the BS or a UE to the RIS. In this case, the cascaded channels are estimated instead [34]. However, the cascaded channels have complicated statistics and high dimensions due to the large number of channel coefficients in a RIS-NOMA system. In this situation, applying conventional methods such as least square (LS) or linear minimum mean square error (LMMSE) estimation leads to expensive computation and possibly more channel estimation errors [31]. DL-based channel estimation may be a good solution for coping with such a large channel dataset. In addition, when it comes to the time-series set of channel coefficients of the RIS-NOMA system model, none of the available time-series models gives the best results in all situations, because it is not easy to identify the underlying data generation process of a time series. Using individual models is generally insufficient to exploit all the characteristics of time-series data [35]. This demands a proper design of a DL-based channel estimation method that can adapt to an unfamiliar wireless system model and propagation environment, such as the RIS-NOMA model proposed in this paper.
The contributions of the paper are summarized as follows:
• The paper proposes a potential RIS-NOMA wireless system model for 6G networks, for which the channel estimation is investigated. The proposed wireless system model is different from those in previous studies that also work on channel estimation for a RIS-aided wireless system. (i) Firstly, NOMA is considered as the multiple access technique of the proposed model; hence, the system exploits the power domain, which is defined by the power allocation factor assigned to each UE, instead of applying pilot schemes or time division schemes as in previous RIS-aided OMA systems. (ii) Secondly, different from [29], where only single-user mobility is observed, in this paper all users slowly move further from the RIS to avoid a reduction of the transmission performance, e.g., the outage probability and the transmission rate of each UE. (iii) Thirdly, unlike other proposed system models that take control of the RIS or use an active RIS, this article adopts a passive RIS, at which no signal processing is available. In addition, in contrast to [32], all channels in the proposed wireless system suffer from both large-scale fading and small-scale fading.
• A data generation algorithm is developed to generate a time-series dataset based on the parameters of the proposed RIS-NOMA system model. This dataset is utilized for the study of channel estimation in the RIS-NOMA system and can be further investigated to design and test new approaches to improve end-to-end system-level performance.
• A design of the CNN-LSTM algorithm is proposed for channel estimation in the RIS-NOMA system by exploiting all characteristics of the generated time-series dataset, including spatial features and temporal features. DL models are very sensitive to the time-series dataset; in other words, a small change in the dataset due to a variation of the parameters in the RIS-NOMA system could lead to a failure of the channel predictions. For this reason, the CNN-LSTM channel estimation algorithm is designed to work efficiently for a wide range of RIS-NOMA system parameters, including the power allocation factor, the signal-to-noise ratio (SNR), and the number of RIS elements, rather than for a fixed set of parameters obtained from optimization problems.
• Extensive simulations are carried out to verify the prediction accuracy of the proposed DL-based channel estimator for the proposed RIS-NOMA system. The accuracy metrics of the proposed DL model are evaluated under variations of the RIS-NOMA system parameters, i.e., the transmit SNR at the BS, the power allocation factor of each UE, and the number of RIS elements. Through this evaluation, the robustness of CNN-LSTM across a wide range of RIS-NOMA system configurations is verified.
• The paper provides a comprehensive comparison of the proposed CNN-LSTM, which uses two-dimensional convolution (conv2D) layers, against four benchmarks, namely CNN1D-LSTM using a conv1D layer, CNN1D-BiLSTM using bidirectional long short-term memory, a conventional CNN, and a conventional LSTM, to demonstrate its superiority in channel estimation for the proposed downlink RIS-NOMA system.
The rest of the paper is organized as follows. The RIS-NOMA system model and the problem formulation are presented in Section III. In Section IV, we propose a CNN-LSTM architecture as a DL-based channel estimator for the proposed wireless communication system. Section V presents the simulation setup and simulation results of the proposed DL model. Finally, we conclude the paper in Section VI.

III. SYSTEM MODEL AND PROBLEM FORMULATION
In this section, a RIS-NOMA system model is presented first, and then the channel estimation problem arising from this system model is defined. Finally, we discuss the dataset and data preparation.

A. SYSTEM MODEL
We consider RIS-aided downlink communications in a RIS-NOMA system, as shown in Fig. 1. The system is composed of a BS equipped with $N_t$ antennas, a RIS equipped with $L$ reflecting elements, and $M$ UEs, where each UE is equipped with $N_r$ antennas. The RIS elements are assumed to be far enough apart from each other to be mutually uncorrelated and to avoid mutual signal coupling among them. We assume that the direct link from the BS to the $j$th user (UE$_j$), denoted by BS → UE$_j$ where $j \in \{1, 2, \dots, M\}$, is blocked due to signal attenuation in a harsh propagation environment. Hence, the received signal at each UE$_j$ comes from the cascaded channel BS → RIS → UE$_j$. In practice, the BS and the RIS are fixed while UE$_j$ is mobile. In this paper, it is assumed that UE$_j$ is slowly moving further away from the BS. We assume that all users move at the same speed of $v$ meters per time step $t$. The speed is slow enough to avoid Doppler effects. This assumption ensures the communication performance of the system, e.g., the transmission rate and outage probability. Such a RIS-NOMA wireless system is proposed for several applications, such as connected autonomous vehicles [10], ultra-high-definition video, virtual reality, and internet of things applications such as wearable devices and smart homes [36]. Due to the blockage of the direct channel, communication from the BS to UE$_j$ is defined only by the cascaded channel BS → RIS → UE$_j$. The cascaded channel from the BS to each UE$_j$ at time $t$ is the combination of the channels BS → RIS and RIS → UE$_j$; it is denoted as $G_j$ [32] and is given in Eq. (1). All channels in this paper are considered approximately constant within each channel coherence block. We focus on only one channel coherence block with a duration of $T$, with time steps (time instants) $t = 1, 2, \dots, T$. Due to the slow-moving scenario and perfect CSI knowledge at the BS, the channel coherence time in this paper is on the order of tens of milliseconds (ms) [37], [38].
Next, we describe the signal model of the proposed RIS-NOMA system. In a NOMA-based system, superposition is applied at the transmitter (BS), which means that the BS transmits a superposed signal at time $t$ to all UEs [39]. In the superposed signal, $P_s$ is the transmit power at the BS, $x_j(t) \in \mathbb{C}^{N_t \times 1}$ is the signal for UE$_j$, and $c_j$ is the power allocation factor assigned to UE$_j$, which satisfies $\sum_{j=1}^{M} c_j^2 = 1$ and $c_1 < c_2 < \dots < c_M$. We assume that a UE$_j$ with a higher index has a weaker channel gain and thus requires a larger power allocation. Based on the cascaded channel from Eq. (1), the received signal at UE$_j$ at time $t$ is expressed in Eq. (2), where $PL_j(t)$ is the path loss parameter at UE$_j$, and $g_j(t) \in \mathbb{C}^{L \times N_r}$ and $H(t) \in \mathbb{C}^{L \times N_t}$ denote the channel matrices of the links RIS → UE$_j$ and BS → RIS, respectively. We assume that all channels are Rayleigh fading channels. The symbol $n_j(t) \in \mathbb{C}^{N_r \times 1}$ represents the noise at each UE$_j$ [19], [40].
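For reference, the signal model described above can be written out as a worked sketch. The expressions below are a reconstruction from the stated dimensions and symbol definitions, not a verbatim copy of Eq. (1) and Eq. (2). The symbol $\Phi(t)$ for the $L \times L$ diagonal RIS coefficient matrix, the compact symbol $x(t)$ for the superposed signal, and the exact placement of $\sqrt{P_s}$ and $PL_j(t)$ are assumptions.

```latex
% Hedged reconstruction under the stated assumptions.
% g_j(t) is L x N_r, \Phi(t) is L x L (assumed symbol), H(t) is L x N_t.
\begin{align}
G_j(t) &= g_j^{H}(t)\,\Phi(t)\,H(t) \in \mathbb{C}^{N_r \times N_t}, \\
x(t)   &= \sqrt{P_s}\,\sum_{i=1}^{M} c_i\,x_i(t), \qquad \sum_{i=1}^{M} c_i^2 = 1, \\
y_j(t) &= PL_j(t)\,G_j(t)\,x(t) + n_j(t) \in \mathbb{C}^{N_r \times 1}.
\end{align}
```

With these forms, the dimensions are consistent: $G_j(t)$ maps the $N_t \times 1$ superposed signal to an $N_r \times 1$ received vector, matching the size of $n_j(t)$.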
In this RIS-NOMA network, the noise $n_j$ at UE$_j$ follows the complex Gaussian distribution $\mathcal{CN}(\mu_n, N_0)$. In Eq. (1), the factor $\gamma = P_s / N_0$ is the transmit SNR with $E[|n_j(t)|^2] = N_0$. We normalize the noise $n_j(t)$ to a standard complex Gaussian distribution with $\mu_n = 0$ and $N_0 = 1$. The diagonal coefficient matrix of the RIS, $\Phi$, has the elements $\phi_m$, $m = 1, \dots, L$, on its main diagonal. Each element on the main diagonal accounts for both the phase shift and the amplitude reflection coefficient applied to the incident signal [19], as given in Eq. (3), where $\theta_m \in [0, 2\pi]$ is the phase shift and $\alpha_m$ is the amplitude reflection coefficient. The path loss of each UE, $PL_j$, is determined by the distances and the path loss exponents (PLEs), as given in Eq. (4), where $d_0$ is the reference distance; $\alpha_{sr}$ and $d_{sr}$ are the PLE of the environment and the distance of the link BS → RIS; and $\alpha_{rj}$ and $d_{rj}(t)$ are the PLE of the environment and the distance of the link RIS → UE$_j$ [41], [42]. It can be seen from Eq. (2), Eq. (3), and Eq. (4) that the received signal at each user is affected by the distances between the devices and the parameters of the RIS. Different users are allocated different power factors according to their CSI. In this paper, the CSI of all cascaded channels of the proposed RIS-NOMA system is assumed to be perfectly known at the BS. Based on the knowledge of the CSI, the NOMA system can separate the signal of each user by applying SIC at the receiver side.
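The RIS coefficient matrix and the path loss described above can be sketched in code. This is a minimal sketch under stated assumptions, not the exact Eq. (3) and Eq. (4): the function names `ris_matrix` and `path_loss` are hypothetical, each diagonal entry is taken as $\alpha_m e^{j\theta_m}$, and the path loss is assumed to be the product of two distance-power terms, one per hop.

```python
import numpy as np

def ris_matrix(theta, alpha=1.0):
    """Diagonal RIS matrix with entries alpha_m * exp(1j * theta_m) (assumed form)."""
    theta = np.asarray(theta, dtype=float)
    return np.diag(alpha * np.exp(1j * theta))

def path_loss(d_sr, d_rj, alpha_sr, alpha_rj, d0=1.0):
    """Assumed product-distance model for the cascaded link BS -> RIS -> UE_j."""
    return (d_sr / d0) ** (-alpha_sr) * (d_rj / d0) ** (-alpha_rj)

# Example: three RIS elements with unit amplitude, and one UE moving away.
Phi = ris_matrix([0.0, np.pi / 2, np.pi])
near = path_loss(d_sr=10.0, d_rj=20.0, alpha_sr=2.0, alpha_rj=2.0)
far = path_loss(d_sr=10.0, d_rj=25.0, alpha_sr=2.0, alpha_rj=2.0)
```

In this sketch, a larger RIS → UE distance yields a smaller `path_loss` value, which reflects the weaker received signal of a UE that moves away from the RIS.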
At the receiver side, the SIC technique is applied to decode the signal of a specific UE by leveraging the difference between the channel gains from the transmitter to each UE$_j$. According to the principle of NOMA, every UE receives the signals of all UEs in the RIS-NOMA system, and each UE obtains its desired signal by first decoding and cancelling the unwanted signals. Because UE$_M$ is allocated the highest power, it directly decodes its signal $x_M$ by treating the other signals as interference, which gives the SINR in Eq. (5), where $h_j = PL_j\, g_j^{H} \Phi H$ with $j = 1, 2, \dots, M$. When decoding at the UE with index $\rho$, where $\rho \neq M$ and $\rho > 1$, all the previous signals, i.e., the $M$th, $(M-1)$th, ..., $(\rho+1)$th, should be cancelled, and the remaining signals, i.e., the $(\rho-1)$th, $(\rho-2)$th, ..., 2nd, 1st, are considered inter-user interference [43]. For simplicity of the equations, we omit the term $t$. Then, with perfect SIC at UE$_\rho$, the SINR at UE$_\rho$ to decode signal $x_\rho$ is expressed in Eq. (6). Finally, UE$_1$ decodes its own signal after cancelling all signals from the previous users, i.e., the $M$th, $(M-1)$th, ..., 2nd [44], and the SINR at UE$_1$ to decode signal $x_1$ is given in Eq. (7). To determine the SINR in Eq. (5), Eq. (6), and Eq. (7), [...]

Algorithm 1
[...]
    Generate $x_1(t)$, $x_2(t)$, $H(t)$, $n(t)$
    Initialize $g_t$, $G_t$, $\Phi_t$, $y_t$ as empty matrices ($M = 2$ is the number of UEs)
    for $j = 1$ to $M$ do
        Generate $g_j(t)$, $\theta_j(t)$
        Using $\theta_j(t)$ and $\alpha_m$, calculate $\phi_j(t)$ from (3)
        Using $\phi_j(t)$, calculate $\Phi_j(t)$
        Calculate $d_{rj}(t) = d_{rj}(t)\,v$
        Calculate the received signal $y_j(t)$ at the $j$th UE from (2)
        Calculate the cascaded channel $G_j(t)$ from (1)
        Concatenate $y_j(t)$, $G_j(t)$ to $y_t$, $G_t$
    end for
    Concatenate $y_t$, $G_t$ to Y, G
end for
return Y, G
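The SIC decoding order described in the system model can be illustrated numerically. The sketch below uses the standard power-domain NOMA SINR expression under perfect SIC, which is an assumption here and may differ in detail from Eq. (5) to Eq. (7). The function name `noma_sinr` is hypothetical, and each channel $h_j$ is reduced to a single scalar gain $|h_j|^2$ for simplicity.

```python
import numpy as np

def noma_sinr(c, gain, gamma):
    """SINR of each UE under perfect SIC (assumed standard NOMA form).

    c     : power allocation factors with c[0] < c[1] < ... and sum(c**2) == 1
    gain  : effective channel gains |h_j|^2, one scalar per UE (assumption)
    gamma : transmit SNR P_s / N_0 on a linear scale
    """
    c = np.asarray(c, dtype=float)
    gain = np.asarray(gain, dtype=float)
    # After SIC, UE_j still sees the signals of UEs with a lower index
    # (smaller power factor) as inter-user interference.
    interference = np.concatenate(([0.0], np.cumsum(c[:-1] ** 2)))
    return gamma * c ** 2 * gain / (gamma * gain * interference + 1.0)

# Two UEs: UE_1 (index 0) has the smaller power factor and the stronger channel.
sinr = noma_sinr(c=[0.6, 0.8], gain=[1.0, 0.5], gamma=10.0)
```

In this example, UE$_1$ cancels the signal of UE$_2$ and is limited by noise only, whereas UE$_2$ decodes directly and treats the signal of UE$_1$ as interference.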

C. DATA PREPARATION
We assume that the channel gains follow independent and identically distributed complex Gaussian (normal) distributions $\mathcal{CN}(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$ [23], [45], [46]. The dataset is generated with $K$ samples as in Algorithm 1. In a time-series problem, each sample represents a time step $t$ in the RIS-NOMA system. We assume that at each time step, UE$_j$ slowly moves further away from the BS with the same velocity. The dimensions of the matrices in Algorithm 1 are denoted as [...] $\in \mathbb{C}^{K \times N_r N_t \times M}$, $n \in \mathbb{C}^{K \times N_r N_t \times M}$, and $G \in \mathbb{C}^{K \times N_r N_t \times M}$. It can be seen that all matrices include the number of time steps $K$ and the number of UEs $M$ as dimensions. The remaining dimensions depend on the number of antennas at the transmitter and the receivers and on the number of RIS elements. It is noted that in this study, the transmit signal for UE$_j$, denoted by $x_j$, follows the distribution $\mathcal{CN}(\mu_{x_j}, \sigma_{x_j}^2)$ with mean $\mu_{x_j}$ and variance $\sigma_{x_j}^2$, where $j = 1, 2, \dots, M$. The outputs of Algorithm 1, namely the received signal Y and the cascaded channel G, are processed and used as the input and the target of the proposed DL model.
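The data generation procedure can be sketched as follows. This is an illustrative sketch loosely following the recovered steps of Algorithm 1, not the authors' implementation. The names `crandn` and `generate_dataset`, all default parameter values, the additive distance update `d_r + v`, the unit amplitude coefficients, and the product-distance path loss with $d_0 = 1$ are assumptions.

```python
import numpy as np

def crandn(rng, *shape):
    """Standard complex Gaussian samples CN(0, 1)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def generate_dataset(K=100, M=2, L=8, Nt=2, Nr=1, c=(0.6, 0.8), snr_db=10.0,
                     d_sr=5.0, d_r0=(10.0, 20.0), v=0.01, ple=2.2, seed=0):
    """Return received signals Y (K, M, Nr) and cascaded channels G (K, M, Nr, Nt)."""
    rng = np.random.default_rng(seed)
    Ps = 10 ** (snr_db / 10)                 # transmit SNR with N_0 = 1
    c = np.asarray(c, dtype=float)           # power allocation factors
    d_r = np.array(d_r0, dtype=float)        # RIS -> UE_j distances
    Y = np.empty((K, M, Nr), dtype=complex)
    G = np.empty((K, M, Nr, Nt), dtype=complex)
    for t in range(K):
        H = crandn(rng, L, Nt)               # BS -> RIS channel
        xs = crandn(rng, M, Nt)              # one signal per UE
        x = np.sqrt(Ps) * (c[:, None] * xs).sum(axis=0)  # superposed signal
        d_r = d_r + v                        # assumed distance update per time step
        for j in range(M):
            g = crandn(rng, L, Nr)           # RIS -> UE_j channel
            theta = rng.uniform(0.0, 2.0 * np.pi, L)
            Phi = np.diag(np.exp(1j * theta))        # alpha_m = 1 assumed
            pl = (d_sr ** -ple) * (d_r[j] ** -ple)   # assumed path-loss model
            G[t, j] = g.conj().T @ Phi @ H           # cascaded channel
            Y[t, j] = pl * G[t, j] @ x + crandn(rng, Nr)
    return Y, G

Y, G = generate_dataset(K=5)
```

Fixing the seed makes the generated time series reproducible, which is convenient when the same dataset is reused to compare several estimators.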

IV. PROPOSED DEEP LEARNING MODEL
In this section, we propose a CNN-LSTM based model for addressing the channel estimation problem in the RIS-NOMA system. We also provide a complexity analysis of the proposed model and discuss the performance metrics to evaluate the effectiveness of the proposed model.

A. CNN-LSTM MODEL
In this section, an architecture of the proposed CNN-LSTM network is first briefly illustrated. Next, the flow of data in the DL model is presented. Then, the training process of the proposed DL-based channel estimation for the proposed RIS-NOMA communication system in Fig. 1 is illustrated.
This paper proposes a variant of an encoder-decoder LSTM model that is composed of two modules, a CNN module at the encoder side and an LSTM module at the decoder side [47], as shown in Fig. 2; this variant is called a CNN-LSTM model. It combines the advantages of both the CNN model and the LSTM model. Compared to a conventional LSTM, which contains only LSTM layers, the proposed DL model exploits the power of the CNN in terms of feature extraction. In this paper, the CNN is utilized to learn spatially correlated features from an input sequence [48] of the received signals at the UEs. The features from the CNN are then fed into the LSTM for decoding, which in turn extracts the temporal correlation to predict the output, i.e., the cascaded channels, at one or many future time steps.
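The decoder side of this idea can be sketched with a single LSTM layer that consumes a sequence of feature vectors and maps its final hidden state to the predicted channel gains. This is a minimal NumPy sketch of a generic LSTM forward pass, not the architecture of Fig. 2. The name `lstm_decode`, the weight shapes, the gate ordering (input, forget, output, candidate), and the single dense output layer are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_decode(features, Wx, Wh, b, Wo, bo):
    """Run one LSTM layer over a feature sequence and map the last state to the output.

    features : (T, F) sequence of feature vectors, e.g. produced by a CNN encoder
    Wx, Wh, b: LSTM weights of shapes (4*Hd, F), (4*Hd, Hd), (4*Hd,)
    Wo, bo   : output layer weights of shapes (O, Hd), (O,)
    """
    Hd = Wh.shape[1]
    h = np.zeros(Hd)                       # hidden state
    c = np.zeros(Hd)                       # cell state
    for x_t in features:
        z = Wx @ x_t + Wh @ h + b
        i = sigmoid(z[:Hd])                # input gate
        f = sigmoid(z[Hd:2 * Hd])          # forget gate
        o = sigmoid(z[2 * Hd:3 * Hd])      # output gate
        g = np.tanh(z[3 * Hd:])            # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return Wo @ h + bo

# Example with random, untrained weights: 6 time steps, 5 features, 4 hidden units.
rng = np.random.default_rng(0)
T, F, Hd, O = 6, 5, 4, 2
pred = lstm_decode(rng.standard_normal((T, F)),
                   rng.standard_normal((4 * Hd, F)), rng.standard_normal((4 * Hd, Hd)),
                   np.zeros(4 * Hd), rng.standard_normal((O, Hd)), np.zeros(O))
```

In practice the weights would be learned by training, and a deep learning framework would replace this hand-written loop; the sketch only shows how the temporal recursion carries information across time steps.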
As shown in Fig. 2, the proposed DL model consists of an input layer, a CNN module, an LSTM module, and an output layer. Each component of the model is introduced as follows.

1) DATA PROCESSING FOR INPUT LAYER
We need to process the output data from Algorithm 1 in Section III-B before feeding them into the input layer of the CNN-LSTM model. The first output of Algorithm 1 is a time-series sequence of the received signals at the UEs of the RIS-NOMA network, $Y = \{Y(1), Y(2), \dots, Y(K)\}$. In particular, the sample at time step $t$ is $Y(t) = [y_1(t), y_2(t), \dots, y_M(t)]$ with $y_M(t) \in \mathbb{C}^{N_r \times N_t}$, where $t \in \{1, 2, \dots, K\}$. The received signal at UE$_j$, denoted as $y_j(t)$, is complex-valued, but a conventional DL model is designed for real values. Therefore, we extract the amplitude and the angle of each complex number and concatenate them along one dimension, resulting in $y_M(t) \in \mathbb{R}^{2 \times N_r \times N_t}$ and $Y \in \mathbb{R}^{K \times M \times 2 N_r N_t}$. Likewise, the second output of Algorithm 1, the cascaded channels from the BS to the UEs of the RIS-NOMA system, is a sequence of $K$ time steps, denoted as $G = \{G(1), G(2), \dots, G(K)\}$. At each time step $t$ with $t \in \{1, 2, \dots, K\}$, the cascaded channels of the RIS-NOMA system consist of the individual cascaded channels from the BS to each UE$_j$ and are represented by $G(t) = \{G_1(t), G_2(t), \dots, G_M(t)\}$. The goal of the channel estimation is to predict the cascaded channel gains; thus, we extract the amplitudes of the cascaded channels as the target of the CNN-LSTM model. In this paper, the training length, or symbol count, of the channel estimation duration is considered long enough to estimate all channel coefficients of the RIS-NOMA wireless system.
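The conversion from complex samples to real-valued inputs and targets can be sketched as below. The function names `complex_to_features` and `channel_targets` are hypothetical, and the array layouts (time step first, UE second, flattened antenna entries last) are assumptions chosen to match the stated shape $K \times M \times 2 N_r N_t$.

```python
import numpy as np

def complex_to_features(Y):
    """Stack the amplitude and angle of complex samples along the last axis.

    Y : complex array of shape (K, M, D), with D flattened antenna entries
    returns a real array of shape (K, M, 2 * D)
    """
    Y = np.asarray(Y)
    return np.concatenate([np.abs(Y), np.angle(Y)], axis=-1)

def channel_targets(G):
    """Amplitudes of the cascaded channels, flattened per UE: (K, M, -1)."""
    G = np.asarray(G)
    return np.abs(G).reshape(G.shape[0], G.shape[1], -1)

# One time step, one UE, two complex entries.
features = complex_to_features(np.array([[[1 + 1j, -2j]]]))
targets = channel_targets(np.array([[[[3 + 4j, 1j]]]]))
```

Here the amplitudes occupy the first half of the last axis and the angles (in radians) the second half, so the real-valued input keeps the full information of each complex sample.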
For a time-series problem, the input sequence and the output sequence must be transformed into an input-output time-series sequence. The $K$ samples of the input $Y(t)$ and the target $G(t)$ are transformed into a time-series sequence based on the number of input time steps $t_i$ and the number of future time steps $t_o$. After this transformation, the length of the dataset changes and is denoted by Sample. The time-series sequence is then divided into a training dataset and a testing dataset before entering the input layer.
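One common way to build such input-output pairs is a sliding window. The sketch below assumes a stride of one time step, so that Sample = K − t_i − t_o + 1; the paper does not state the exact windowing rule, and the helper name `make_windows` is hypothetical.

```python
def make_windows(inputs, targets, t_i, t_o):
    """Pair each run of t_i input samples with the next t_o target samples."""
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length K")
    n_samples = len(inputs) - t_i - t_o + 1  # assumed stride of one step
    x_windows, y_windows = [], []
    for start in range(max(n_samples, 0)):
        x_windows.append(inputs[start:start + t_i])
        y_windows.append(targets[start + t_i:start + t_i + t_o])
    return x_windows, y_windows

K = 10
Y = list(range(K))               # stand-in for Y(1), ..., Y(K)
G = [10 * k for k in range(K)]   # stand-in for G(1), ..., G(K)
X, T = make_windows(Y, G, t_i=3, t_o=2)
print(len(X), X[0], T[0])        # 6 [0, 1, 2] [30, 40]
```

With K = 10, t_i = 3, and t_o = 2, the dataset shrinks from 10 raw samples to 6 windowed samples, which illustrates why the dataset length changes after the transformation.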

2) CNN MODULE
CNN is commonly used for image processing applications such as feature extraction and classification, where an image can be divided into three channels: red, green, and blue. In this paper, the matrix of the received signal can be regarded as a three-dimensional (3D) image, since it has three dimensions: the time step, the number of users, and the power values of the received signal [49]. Moreover, the matrix of the received signals becomes larger as the number of RIS elements increases. Inspired by this analogy between a 3D image and the matrix of received signals, this paper uses a CNN to extract the features of the received signals from the users. The first layer of a CNN is the input layer and the last layer is the output layer, with a few hidden layers in between. The input of the CNN model has three dimensions, height, width, and depth, which correspond to the number of UEs, the received signals at the UEs, and the time steps in the proposed RIS-NOMA system. In most commonly used CNN models, the hidden layers are composed of several convolution layers followed by pooling layers, through which the feature space at the input is mapped to the output. Each convolution layer contains several kernels (or filters), whose number defines the depth of the layer. Kernels act as feature detectors: each kernel holds a set of weights, and the convolution layer computes the dot product between these weights and the local region of the input they cover. An activation function is applied at the output of each convolution layer to generate a feature map. This feature map serves as the input of the next layer, for example, the next convolution layer or a pooling layer. Following the convolution layers, pooling (sub-sampling) layers are used to reduce the number of CNN parameters by down-sampling each feature map. Finally, the output of the last pooling layer is flattened into a vector and fed into a fully connected layer.
The output of the fully connected layer is the final output of the CNN. The main benefit of a CNN is weight sharing, where the weights and bias are the same for all neurons in each hidden layer; this reduces the number of trainable parameters, which enhances generalization and helps avoid overfitting. Another advantage is the ability to extract the inference characteristics of features; however, a CNN needs a large amount of training data to learn all parameters [50], [51]. The architecture of the proposed model is described in detail in Fig. 2. First, the sequential input is fed into the CNN module. The CNN module includes two 2-D convolutional (2D-conv) layers, one max-pooling layer, and one flatten layer. A convolutional layer produces feature maps by applying a convolution operation between the input data and its filters, which are a set of locally connected kernels. The two 2D-conv layers use the same hyperparameters, and the number of filters in each is $N_{filter}$. The dimension of the extracted features after the convolutional layers is very high, so a max-pooling layer of size $(p \times p)$ is added to reduce the size of the extracted features and retain the most important ones. In this way, the CNN module captures the features of the data. Finally, a flatten layer flattens all features into a single vector before they are fed into the LSTM.
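To make the data flow through the CNN module concrete, the following sketch traces only the tensor shapes through two 2D-conv layers, one max-pooling layer, and one flatten layer. It assumes stride-1 convolutions with "same" padding and non-overlapping pooling; the input shape and the value of $N_{filter}$ used in the example are illustrative, not taken from the paper, and the function names are hypothetical.

```python
def conv2d_same_shape(shape, n_filters):
    """Stride-1 'same' padding keeps height and width; depth becomes n_filters."""
    height, width, _ = shape
    return (height, width, n_filters)

def max_pool_shape(shape, p):
    """Non-overlapping p x p pooling (stride p, no padding) floors each spatial size."""
    height, width, depth = shape
    return (height // p, width // p, depth)

def flatten_size(shape):
    """Length of the vector produced by the flatten layer."""
    height, width, depth = shape
    return height * width * depth

def cnn_module_shapes(input_shape, n_filters, p):
    """Shapes after: input, conv, conv, max-pool; plus the flattened length."""
    shapes = [input_shape]
    shapes.append(conv2d_same_shape(shapes[-1], n_filters))
    shapes.append(conv2d_same_shape(shapes[-1], n_filters))
    shapes.append(max_pool_shape(shapes[-1], p))
    return shapes, flatten_size(shapes[-1])

# Illustrative input of shape (10, 2, 4) with N_filter = 16 and p = 2.
shapes, flat = cnn_module_shapes((10, 2, 4), n_filters=16, p=2)
print(shapes, flat)
```

The trace shows where the dimensionality grows (the depth after each convolution) and where the pooling layer shrinks it again before flattening.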

3) LSTM MODULE
LSTM is considered a useful approach for capturing long-term temporal features in sequences, as in natural language processing. Motivated by this, it is feasible to apply the LSTM module to the time-series sequences of received signals and channel coefficients. However, LSTM alone does not capture other features, such as spatial features, well [51]. This is the reason for combining the LSTM with the CNN module in the proposed DL-based channel estimation method: the combination captures not only time dependencies but also spatial features, such as the number of UEs and the received signal at each UE. To match the output of the CNN to the input layer of the LSTM, a RepeatVector layer repeats the CNN output $t_o$ times so that the output contains the expected number of output time steps. The LSTM module consists of one or several LSTM layers, and each LSTM layer is composed of LSTM cells. The LSTM cell, or LSTM internal memory cell, is the key element of the LSTM module that establishes the temporal connections of an LSTM layer.
The topology of an LSTM cell is illustrated in Fig. 3. It consists of two recurrent features, the hidden state $h$ and the cell state $cs$. An LSTM unit uses the previous state, i.e., the previous hidden state and previous cell state $(h(t-1), cs(t-1))$, and the current input $x(t)$ to update the state at the current time step $(h(t), cs(t))$, which can be expressed as
$$(h(t), cs(t)) = L\big(h(t-1), cs(t-1), x(t)\big),$$
where $L$ is a function capable of mapping an input sequence to an output sequence of any length. A basic LSTM unit contains three gates, the input gate, the forget gate, and the output gate, which control the cell state. In particular, the input sequence $x(t)$ is fed into an LSTM cell at each time step $t$. In each cell, the input $x(t)$ and the hidden state of the previous time step $h(t-1)$ are fed into the three gates (functions) and passed through a sigmoid activation function $\sigma$. The update of an LSTM cell in terms of $(h(t), cs(t))$ at time step $t$ is expressed as follows:
$$\begin{aligned} f(t) &= \sigma\big(W_f x(t) + U_f h(t-1) + b_f\big), \\ i(t) &= \sigma\big(W_i x(t) + U_i h(t-1) + b_i\big), \\ o(t) &= \sigma\big(W_o x(t) + U_o h(t-1) + b_o\big), \end{aligned} \tag{9}$$
where $W$, $U$, and $b$ denote the input weights, recurrent weights, and bias of the corresponding gate. The forget gate $f(t)$ determines how much of the previous cell state $cs(t-1)$ should be retained. The input gate $i(t)$ determines how much new information is taken into account via the cell update $cs_u(t)$. The computation of $cs_u(t)$ is similar to that of the three gates above, but uses the Tanh activation function with value range $(-1, 1)$:
$$cs_u(t) = \tanh\big(W_c x(t) + U_c h(t-1) + b_c\big). \tag{10}$$
The output gate $o(t)$ controls how much of the content of the cell state $cs(t)$ leaves the cell to reach the hidden state $h(t)$. The cell state at time $t$ is
$$cs(t) = f(t) \odot cs(t-1) + i(t) \odot cs_u(t). \tag{11}$$
Because of the Tanh activation function, the hidden state of the current step $h(t)$ of an LSTM cell always lies in the interval $(-1, 1)$ and is expressed as
$$h(t) = o(t) \odot \tanh\big(cs(t)\big). \tag{12}$$
The cell state in Eq. (11) and the hidden state in Eq. (12) are determined by the input gate, output gate, and forget gate from Eq. (9), along with the cell update from Eq. (10). An LSTM cell is flexible because it can take an input time-series sequence of any length, e.g., $x(t-1)$, and produce an output sequence of any length, e.g., $x(t+k)$, as in Fig. 4.
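The cell update described above can be written out for a single scalar unit. The sketch below assumes the standard LSTM formulation (sigmoid gates, Tanh cell update, element-wise products); the weight layout `(w_x, w_h, b)` and all numeric values are hypothetical and only serve to make the recurrence runnable.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell_step(x_t, h_prev, cs_prev, params):
    """One scalar LSTM update; returns (h(t), cs(t)).

    params maps each of "forget", "input", "output", "update" to a
    hypothetical (w_x, w_h, b) triple.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre_activation("forget"))    # how much of cs(t-1) to retain
    i_t = sigmoid(pre_activation("input"))     # how much of the update to admit
    o_t = sigmoid(pre_activation("output"))    # how much of cs(t) reaches h(t)
    cs_u = math.tanh(pre_activation("update")) # candidate cell update in (-1, 1)
    cs_t = f_t * cs_prev + i_t * cs_u
    h_t = o_t * math.tanh(cs_t)
    return h_t, cs_t

# Illustrative weights shared by all four functions, and a short input sequence.
params = {name: (0.5, -0.3, 0.1) for name in ("forget", "input", "output", "update")}
h, cs = 0.0, 0.0
for x in [1.0, 0.5, -0.2]:
    h, cs = lstm_cell_step(x, h, cs, params)
    print(round(h, 4), round(cs, 4))
```

Since the output gate lies in (0, 1) and the Tanh of the cell state lies in (−1, 1), the hidden state stays inside (−1, 1) at every step.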

4) OUTPUT LAYER
The output of the last LSTM layer is fed into a TimeDistributed layer, which wraps the data for each time step of the sequential data. The number of outputs of the CNN-LSTM is $N_{fo}$. The output is the amplitude of the cascaded channel of the RIS-NOMA system at time step $(t_i + t_o)$. Based on the predicted cascaded channel at the output layer and the target cascaded channel in the RIS-NOMA system, the parameters of the CNN-LSTM model are updated via an optimization algorithm. The objective of the optimization problem is to minimize the mean square error (MSE) loss function, which is defined as
$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \Big( G^{(n)}(t_i + t_o) - \hat{G}^{(n)}(t_i + t_o) \Big)^2, \tag{13}$$
where $N$ is the total number of samples, the superscript $(n)$ indexes the samples, and $G(t_i + t_o)$ and $\hat{G}(t_i + t_o)$ are the amplitudes of the target and predicted cascaded channels from the BS to the two UEs. The flow of data through each layer of the CNN-LSTM model is presented in Fig. 2.
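The roles of the RepeatVector and TimeDistributed layers, together with the MSE loss, can be illustrated without a DL framework. The sketch below is a plain-Python stand-in for the behaviour of those layers, not the authors' implementation: the LSTM layers that sit between the two are omitted, all weights and targets are hypothetical, and the MSE here averages over every time step and output feature, which is an assumption about how the average is taken.

```python
def repeat_vector(features, t_o):
    """Copy one feature vector t_o times, one copy per output time step."""
    return [list(features) for _ in range(t_o)]

def dense(vector, weights, bias):
    """Affine layer: one output per row of weights."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def time_distributed_dense(sequence, weights, bias):
    """Apply the same dense layer independently at every time step."""
    return [dense(step, weights, bias) for step in sequence]

def mse(targets, predictions):
    """Mean squared error over all time steps and output features."""
    errors = [(g - g_hat) ** 2
              for t_row, p_row in zip(targets, predictions)
              for g, g_hat in zip(t_row, p_row)]
    return sum(errors) / len(errors)

# Hypothetical flattened CNN features, repeated for t_o = 3 output steps.
sequence = repeat_vector([0.2, -0.1, 0.4], t_o=3)
# Hypothetical dense weights producing N_fo = 2 outputs per time step.
weights = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
bias = [0.0, 0.5]
predicted = time_distributed_dense(sequence, weights, bias)
target = [[0.5, 0.3], [0.6, 0.3], [0.7, 0.3]]
print(predicted, mse(target, predicted))
```

Because the same dense weights are reused at every time step, the number of output-layer parameters does not grow with $t_o$.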

B. COMPLEXITY ANALYSIS
Based on the proposed CNN-LSTM in Fig. 2, we determine the computational complexity by counting the number of floating-point operations (FLOPs). The reason is that the proposed CNN-LSTM uses a backpropagation algorithm during the training stage, which normally operates on matrices. To calculate the complexity of the CNN-LSTM model at each time step, the time complexities of the CNN module and the LSTM module must be calculated separately [52], [53]. In addition, the computational complexity of each module is the sum of those of its layers. The complexity of the CNN module comes mainly from the convolutional layers. The max-pooling layer contributes only marginally to the total computational complexity of the CNN module, and the flatten layer requires no computation. Therefore, the complexity of the CNN module is calculated as $O\big(\sum_{l=1}^{d} n_{l-1} s_w s_h n_l o_w o_h\big)$, where $l$ is the index of a convolutional layer, $d$ is the number of convolutional layers, $n_l$ is the number of filters (the width) in the $l$-th layer, $s_w$ and $s_h$ are the width and height of the filter, respectively, and $o_w$ and $o_h$ are the width and height of the output feature map, respectively [54]. Regarding the LSTM module, the complexity of the LSTM per time step comes from the input gate, output gate, forget gate, and cell update in Eq. (9) and Eq. (10), and is calculated as $O\big(4(D_i D_o + D_o^2 + D_o)\big)$, where $D_i$ is the input dimension and $D_o$ is the output dimension. The complexity of the LSTM accumulates over the time steps [55]. Finally, the computational complexity of the dense layer is $O(D_o n_d)$ [56], where $n_d$ is the number of neurons in the dense layer. In sum, the computational complexity of the CNN-LSTM over $S$ time steps and $I$ iterations is $O\Big(S I \big(\sum_{l=1}^{d} n_{l-1} s_w s_h n_l o_w o_h + 4(D_i D_o + D_o^2 + D_o) + D_o n_d\big)\Big)$.
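The operation counts behind this analysis can be evaluated numerically. The sketch below uses the convolutional term $\sum_l n_{l-1} s_w s_h n_l o_w o_h$ and the dense term $D_o n_d$ as given in the text; for the LSTM it assumes the commonly used per-step count $4(D_i D_o + D_o^2 + D_o)$ and a total that scales linearly with the number of time steps and iterations. All layer sizes in the example are illustrative, and the function names are hypothetical.

```python
def cnn_ops(filters, kernel, output_map):
    """Convolutional cost; filters = [n_0, n_1, ..., n_d], n_0 = input depth."""
    s_w, s_h = kernel
    o_w, o_h = output_map
    return sum(n_prev * s_w * s_h * n_cur * o_w * o_h
               for n_prev, n_cur in zip(filters, filters[1:]))

def lstm_ops_per_step(d_in, d_out):
    """Assumed standard count for three gates plus the cell update."""
    return 4 * (d_in * d_out + d_out ** 2 + d_out)

def dense_ops(d_out, n_dense):
    return d_out * n_dense

def total_ops(filters, kernel, output_map, d_in, d_out, n_dense, steps, iterations):
    """Per-step cost of all modules, scaled by time steps and iterations."""
    per_step = (cnn_ops(filters, kernel, output_map)
                + lstm_ops_per_step(d_in, d_out)
                + dense_ops(d_out, n_dense))
    return steps * iterations * per_step

# Illustrative sizes: two conv layers with 16 filters, (3, 1) kernels,
# a 10 x 2 output map, 80 LSTM inputs, 16 LSTM units, 2 dense neurons.
print(cnn_ops([1, 16, 16], (3, 1), (10, 2)))
print(lstm_ops_per_step(80, 16))
print(total_ops([1, 16, 16], (3, 1), (10, 2), 80, 16, 2, steps=3, iterations=2))
```

In this illustrative configuration the second convolutional layer dominates the count, because its cost scales with the product of the input and output depths.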

C. PERFORMANCE METRICS
The dataset of this paper is generated by Algorithm 1 and divided into training and testing datasets. The performance of the proposed CNN-LSTM model is evaluated on the test dataset using the normalized root mean square error (NRMSE), mean absolute scaled error (MASE), mean absolute percentage error (MAPE), and R-squared score (R2 score). In this paper, we evaluate the DL model on different datasets generated from different RIS-NOMA system parameters, in which each dataset has its own scale. Because the scales of the datasets are independent, scale-free error metrics are applied instead of scale-dependent error metrics [57]. The MSE and the mean absolute error (MAE) are suitable for comparing the performance of different DL models on one dataset or on datasets of the same scale. The MSE is defined as in (13) and the MAE is defined as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| y_t - \hat{y}_t \right|. \tag{14}$$
However, the MSE and MAE are no longer suitable when comparing the prediction accuracy of a model across different datasets [58]. To evaluate the performance of a DL model on datasets with different scales, scale-free metrics should be applied. Since the datasets generated in this paper have different scales as a result of adjusting the parameters of the RIS-NOMA system, we apply the scale-free metrics NRMSE and MASE instead of the MSE in Eq. (13) and the MAE in Eq. (14). The NRMSE is defined as
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{\max} - y_{\min}}, \tag{15}$$
where $\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2}$; $\hat{y}_t$ and $y_t$ are the predicted channel gain and target channel gain at a given time $t$, respectively; and $y_{\max}$ and $y_{\min}$ are the maximum and minimum values of the target channel gains, respectively. The NRMSE measures the normalized deviation between the predicted values and the actual values [59].
The MASE is defined as [57]
$$\mathrm{MASE} = \frac{\frac{1}{N} \sum_{t=1}^{N} \left| y_t - \hat{y}_t \right|}{\frac{1}{N-1} \sum_{t=2}^{N} \left| y_t - y_{t-1} \right|}. \tag{16}$$
The percentage error MAPE is defined as
$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right|. \tag{17}$$
Besides the metrics that evaluate the difference between the predicted and actual data, the coefficient of determination, the R2 score, is a goodness-of-fit measure that evaluates how well a model fits the data [60]. The R2 score is expressed as
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}, \tag{18}$$
where $SS_{res} = \sum_{t=1}^{N} (y_t - \hat{y}_t)^2$ and $SS_{tot} = \sum_{t=1}^{N} (y_t - \mu_y)^2$, in which $\mu_y$ is the mean of $y$. This study applies the NRMSE and R2 score in Eq. (15) and Eq. (18) to evaluate the prediction accuracy for the cascaded channels and the goodness of fit of the proposed CNN-LSTM, respectively. This is because the root mean square error (RMSE) is a standard error that is extremely sensitive to extraordinarily large or small errors in the dataset [59]. The two metrics MASE and MAPE in Eq. (16) and Eq. (17) are added to evaluate the prediction performance of the proposed DL model in several scenarios of the RIS-NOMA system.
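The four metrics can be computed directly from a target series and a predicted series. The sketch below follows the textbook definitions: NRMSE normalizes the RMSE by the range of the targets, as stated above, while the MASE here scales the MAE by the one-step naive forecast error of the evaluated series itself, which is an assumption about how the scaling term is obtained. The sample values are made up for illustration.

```python
import math

def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def nrmse(y, y_hat):
    """RMSE normalized by the range of the target values."""
    return rmse(y, y_hat) / (max(y) - min(y))

def mase(y, y_hat):
    """MAE scaled by the MAE of a one-step naive forecast on the same series."""
    mae = sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)
    naive = sum(abs(y[t] - y[t - 1]) for t in range(1, len(y))) / (len(y) - 1)
    return mae / naive

def mape(y, y_hat):
    """Mean absolute percentage error, in percent; targets must be non-zero."""
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, y_hat))

def r2_score(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Made-up target and predicted channel gains.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(nrmse(y_true, y_pred), mase(y_true, y_pred),
      mape(y_true, y_pred), r2_score(y_true, y_pred))
```

Note that the R2 score becomes negative whenever the residual sum of squares exceeds the total sum of squares, i.e., when the model predicts worse than the constant mean of the targets.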

V. SIMULATION SETUP AND RESULTS
This section provides simulation results to verify the efficiency of the proposed CNN-LSTM model for different RIS-NOMA systems.

A. SIMULATION SETUP
Since the definition of 6G is still evolving, there is so far no standard for system-level parameters in a specific 6G setting or use case [61]. Therefore, in this paper, we introduce a possible parameter setting for the potential RIS-NOMA system model in 6G. In this simulation, we model the RIS-NOMA system for $M = 2$ UEs, each equipped with a single antenna. The distances BS → RIS, RIS → $\mathrm{UE}_1$, and RIS → $\mathrm{UE}_2$ are set to 150 m, 30 m, and 40 m, respectively. The reference distance is $d_0 = 20$ m. Both UEs move at $v = 0.1$ m per time step $t$, which ensures that the received signal strength at each user is not too low after moving away from the BS for $K$ time steps. It is assumed that $x(t)$ follows $\mathcal{CN}(1, 0.1)$ and that the BS → RIS channel follows $\mathcal{CN}(1, 0.1)$. The RIS → $\mathrm{UE}_j$ links follow $\mathcal{CN}(4, 1)$ and $\mathcal{CN}(3, 1)$, respectively. In line with the significant research progress on lossless meta-surfaces [62], [63], this paper uses an ideal phase shift model in which the reflection magnitude is 1, i.e., without energy loss. We set $\alpha_m = 1$ in (3) and $\alpha_{sr} = \alpha_{ri} = 2.2$ in (4). Because this is a time-series problem, it is very sensitive to changes in the dataset; therefore, in this paper, we fix the phase shifts by generating them in $[0.01\pi, 0.02\pi]$. We observe the RIS-NOMA system under variations of the system parameters, including the transmit SNR, the power allocation factor, and the number of RIS elements. The details are given in Table 1. Figure 5 shows the received signal power, normalized with respect to user 1, for both users over time in the case γ = 20 dB, L = 20, and c = 0.3.
The Sample data points of the input-output time-series sequence are divided in a 0.3 : 0.7 ratio into training data and testing data, respectively. Max-min normalization is applied to the input, and the target is scaled to the range [0, 1] before being fed into the DL model.
At the encoder side, each convolution 2D (Conv2D) layer uses a kernel size of (3, 1) with same padding. A max-pooling layer with $p = 2$, i.e., of size 2×2, is applied. The CNN-LSTM model is tuned with $N_{filter} \in \{4, 8, 16, 64\}$. At the decoder side, each LSTM layer consists of $N_{cell} = 16$ units, and at least one of the two LSTM layers applies the Tanh activation function. The output is finally fed into a TimeDistributed Dense layer with $N_{fo} = 2$ output features, one for the cascaded channel gain of each of the two users, before reaching the output layer. The Adam optimizer is used to minimize the loss function in (13).
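The normalization and the dataset split can be sketched as follows. This is an illustration under stated assumptions: the split is taken chronologically without shuffling, and the scaling uses the minimum and maximum of the sequence it is applied to; the paper does not specify either detail, and the helper names are hypothetical.

```python
def min_max_scale(values):
    """Scale a sequence linearly into [0, 1]; a constant sequence maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def chronological_split(samples, train_fraction):
    """First part for training, remainder for testing (no shuffling assumed)."""
    n_train = round(len(samples) * train_fraction)
    return samples[:n_train], samples[n_train:]

# Hypothetical channel-gain targets and a 0.3 : 0.7 train/test split.
gains = [2.0, 4.0, 6.0, 10.0]
print(min_max_scale(gains))           # [0.0, 0.25, 0.5, 1.0]

samples = list(range(10))
train, test = chronological_split(samples, train_fraction=0.3)
print(len(train), len(test))          # 3 7
```

Keeping the split chronological avoids leaking future samples into the training set, which matters for time-series prediction.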

B. LOSS CURVE ON TRAINING DATA
The MSE loss function in (13), which is effective for back-propagating loss values, is used to train the proposed CNN-LSTM model. The advantage of the MSE is that it penalizes large errors heavily, which helps ensure that the model produces few outlier predictions with large errors. The MSE loss over 4000 epochs on the normalized training dataset for γ ∈ {10, 15, 20} dB, L = 20, and c = 0.25 is illustrated in Fig. 6. The MSE loss curves behave in the same way in all three scenarios. As shown in Fig. 6, the loss converges very well on the normalized training data. The loss drops to approximately 2E−4 within the first 200 epochs and reaches the order of 1E−5 at 4000 epochs. The large number of training epochs ensures that the proposed DL model fits the dataset well and thus obtains a high R2 score.
For L = 20, the NRMSE decreases as the number of epochs increases. The NRMSE on the test dataset reaches a steady state at 4000 epochs in the observed scenarios, which shows the robustness of the proposed DL model. The R2 score of each user and the average R2 score gradually increase with the number of epochs. At epoch 4000, all three R2 score values are above 0.9.
Fig. 9 and Fig. 10 study the average NRMSE performance and the average R2 score of the cascaded channel against the transmit SNR. In this simulation, we set the power allocation factor c ∈ {0.1 : 0.05 : 0.4}. Fig. 9 shows that the NRMSE decreases as the transmit SNR of the system increases, since a larger SNR contributes to better channel conditions for both UEs. Increasing the transmit SNR avoids a very low amplitude of the received signal at each user, which can degrade the channel estimation performance. A lower SNR leads to lower signal power, i.e., lower amplitude, at each user. A low-amplitude signal at a user becomes a very small value after normalization at the input of the CNN-LSTM model.
If the normalized input of the CNN-LSTM is too low, it can be treated as noise, which degrades the prediction performance of the CNN-LSTM. As shown in Fig. 9, the NRMSE drops significantly from γ = 10 dB to γ = 22.5 dB and decreases gradually in the very high SNR region from γ = 22.5 dB to γ = 30 dB. In particular, at very high transmit SNR from 25 dB to 30 dB, the NRMSE slightly increases when c = 0.35. This could be due to the significant difference between the received signals of the two users, which results in a large difference between the scales of the input features of the CNN-LSTM model. This explains the slight reduction in the performance of the DL model for c = 0.35 at a transmit SNR of 25 dB or 30 dB in Fig. 9. Fig. 10 plots the average R2 score versus the transmit SNR to evaluate the degree of fit [59] of the DL model. In other words, the R2 score curve shows how well the predicted cascaded channels follow the trend of the target cascaded channels in the RIS-NOMA system. As can be seen from Fig. 10, the average R2 score is above 0.9 overall. This means that the goodness of fit of the proposed CNN-LSTM model is not sensitive to the transmit SNR of the RIS-NOMA system.

E. POWER ALLOCATION COEFFICIENT
In Fig. 11, the average NRMSE performance is shown for power allocation factors in {0.1 : 0.05 : 0.4}. It can be seen that the NRMSE decreases as the power allocation factor increases, which is in line with the results in Fig. 9. In Fig. 12, the average R2 score of the cascaded channel is higher than 0.9 overall for the different power allocation factors. The proposed DL model is thus robust to changes in the power allocation factor, the most important parameter defining a NOMA system. The results in Fig. 12 are in line with those in Fig. 10.
In this paper, the average NRMSE and average R2 score are selected to evaluate the prediction accuracy and the goodness of fit of the proposed CNN-LSTM for the cascaded channels of the RIS-NOMA system. We also consider two other metrics, the average MASE and the average MAPE, in several cases to evaluate their behavior alongside the average NRMSE. Their behavior is similar to that of the average NRMSE in the observed scenarios. This means that the average MASE and average MAPE decrease when the power allocation factor c increases or when the transmit SNR (γ) increases.
Fig. 13 studies the impact of the number of RIS elements on the average NRMSE performance. As seen in (3), the number of RIS elements affects the angle of the received signal at each UE, which in turn affects the input features of the CNN-LSTM model. Therefore, a change in the number of RIS elements influences the prediction performance of the proposed DL model for the cascaded channels. Here, we set c ∈ {0.1, 0.15, 0.2} and γ = 15 dB. In Fig. 13, the prediction accuracy in terms of average NRMSE improves as the number of RIS elements increases. The higher the number of RIS elements, the stronger the reflected signals from the RIS that reach each user. Fig. 13 thus makes evident the important role of additional RIS elements in increasing the prediction accuracy for RIS-NOMA with the proposed CNN-LSTM model. Fig. 14 shows that the average R2 scores versus the number of RIS elements are above 0.92, which indicates that the CNN-LSTM is able to follow the trend of the cascaded channel of the RIS-NOMA system. The results in Fig. 13 and Fig. 14 are aligned with those in Section V-D and Section V-E.
Fig. 15 and Fig. 16 show the predicted cascaded channels and the target (actual) cascaded channels on the test dataset for each UE of the RIS-NOMA system obtained with the proposed CNN-LSTM. In this setting, we set γ = 20 dB, L = 20, and c = 0.3.
The figures show that the proposed CNN-LSTM has good prediction performance at both UEs, which means that the predicted cascaded channel at each UE follows its target cascaded channel very well. The adopted CNN-LSTM-based channel estimation model with $N_{filter} = 64$ is illustrated in Fig. 17 for the NOMA-RIS system parameters γ = 10 dB, L = 20, and c = 0.3.

H. COMPARISON OF THE PROPOSED DL MODEL WITH OTHER DL MODELS
To demonstrate the robustness of the CNN-LSTM-based channel estimation for the NOMA-RIS system, it is compared against benchmarks from previous studies, which are listed in Table 3. Table 3 describes the scenarios, multiple access techniques, DL-based channel estimation methods, and performance metrics. We select benchmark channel estimation methods that are based on or modified from the CNN or LSTM architecture, as this allows for easier comparison with our proposed CNN-LSTM architecture. It can be seen from Table 3 that recent studies generally apply orthogonal frequency-division multiple access (OFDMA), which differs from the NOMA approach presented in this paper. Essentially, there are many differences between an OFDMA-based system and a NOMA-based system, such as variations in system design and parameters. As a result, there are different approaches to resource allocation among users, as well as different strategies for utilizing the capabilities of the RIS to improve the signal quality.
Moreover, normalized mean square error (NMSE) is a commonly used metric in prior studies. However, our paper employs NRMSE, which is the square root of NMSE, for the sake of easy visualization and consistency with the dataset's scale. Results in [64] and [65] indicate that certain studies on channel estimation, including those using machine learning, have employed NRMSE as a metric. Nonetheless, in cases where the NRMSE values are very similar between two scenarios, i.e., two sets of parameters in RIS-NOMA system, additional metrics are necessary to determine which scenario performs better. To make the evaluation more comprehensive, our paper incorporates two other metrics, namely MASE and MAPE. By using NRMSE, MASE, and MAPE, we only examine the extent of prediction errors in each scenario. On the other hand, the R2 score is a crucial metric that evaluates how well the predicted channel aligns with the pattern of the target channel.
To evaluate the effectiveness of the proposed CNN-LSTM model, this paper compares its prediction performance with that of four other DL models: CNN1D-LSTM, CNN1D-BiLSTM, CNN, and LSTM. Note that we use the prefixes CNN1D- and CNN2D- to distinguish between models using conv1D and conv2D in the comparison. CNN1D-LSTM uses a conv1D layer, while the proposed CNN2D-LSTM applies conv2D layers in its CNN module. The difference between the conv1D layer and the conv2D layer is the dimension of the input data: the input of conv1D is two-dimensional, while that of conv2D is three-dimensional. The conv1D input combines the last two dimensions of the conv2D input into one feature dimension, so the input sequence of conv1D has dimension [Sample, $t_i$, $2 M N_r N_t$]. For performance comparison, we also use CNN1D-BiLSTM [67], which combines CNN1D and Bi-LSTM. In [67], CNN1D-BiLSTM is proposed for CSI estimation in an OFDMA-assisted wireless system without RIS in a high-speed mobile scenario. Furthermore, we apply the CNN model, which uses conv2D [33], [66], and the LSTM model [68] for the comparison. The hyperparameters of the CNN model and the LSTM model are set similarly to those of the CNN module and the LSTM module of the proposed CNN2D-LSTM, respectively. To analyze the prediction performance of these DL models, we set the parameters of the RIS-NOMA system to L = 20, γ ∈ {10, 15, 20} dB, and c = 0.3.
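The relation between the conv2D and conv1D inputs amounts to merging the last two axes of each sample. The sketch below assumes that one conv2D sample has shape [$t_i$, $M$, $2 N_r N_t$], which follows from the data processing described earlier but is not stated explicitly for the windowed data; the function name and the values are hypothetical.

```python
def merge_last_two_axes(window):
    """Reshape one sample from t_i x M x F nested lists to t_i x (M * F)."""
    return [[value for ue_row in step for value in ue_row] for step in window]

# One hypothetical sample with t_i = 2 time steps, M = 2 UEs, F = 3 features per UE.
conv2d_sample = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
conv1d_sample = merge_last_two_axes(conv2d_sample)
print(conv1d_sample)  # [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
```

After merging, the per-UE structure is no longer visible to the convolution, so a conv1D kernel slides only along the time axis.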
To evaluate the behaviour of the different models on one dataset, we use four metrics, namely the scale-dependent average RMSE and MAE together with the average MAPE and R2 score, as shown in Table 4. Table 4 shows the superior accuracy of the proposed CNN2D-LSTM model over the four remaining models in terms of the four metrics. For γ = 10 dB, the RMSE and MAE values of the CNN2D-LSTM model and the CNN model are similar. However, the CNN2D-LSTM model is superior to the CNN model in terms of MAPE and R2 score. Table 4 also shows that using the conv2D layer outperforms using the conv1D layer regardless of whether conv1D is combined with LSTM or Bi-LSTM. For example, RMSE = 0.0142, RMSE = 0.0391, and RMSE = 0.2504 for CNN2D-LSTM, CNN1D-LSTM, and CNN1D-BiLSTM, respectively. The CNN model shows relatively good performance in terms of the four performance metrics for γ = 10 dB and γ = 15 dB. Although in these two cases its prediction accuracy is better than that of CNN1D-LSTM, the CNN model fails to follow the trend of the cascaded channel of the RIS-NOMA system, with R2 = −4.2541, for γ = 20 dB. Clearly, the CNN is sensitive to changes in the transmit SNR. Although the CNN is capable of feature extraction, its performance on time-series sequences of the cascaded channel is not robust to changes in the transmit power parameter of the RIS-NOMA system. Unlike the four other models, the LSTM model fails to predict the cascaded channels of the RIS-NOMA system in all observed scenarios, with very high average values of root mean square error (RMSE), MAE, and MAPE, and notably negative values of the average R2 score. The LSTM model cannot exploit the feature extraction of a CNN module as CNN2D-LSTM does, and therefore fails to predict the cascaded channel of the RIS-NOMA system.
Through the comparative analysis, the proposed CNN2D-LSTM proves its feasibility and effectiveness over the four remaining models in the cascaded channel estimation problem in the RIS-NOMA system. This is because the proposed model takes advantage of both the feature extraction capability of the CNN model and the time-series prediction capability of the LSTM model.

VI. CONCLUSION AND FUTURE WORKS
In this paper, we proposed a CNN-LSTM model to predict the cascaded channel of the RIS-NOMA network in which both users slowly move away from the BS. The CNN-LSTM is robust to variations of the RIS-NOMA system parameters. In other words, the CNN-LSTM provides good channel prediction in terms of NRMSE, R2 score, MASE, and MAPE. To evaluate the efficiency of the CNN-LSTM, we compared its prediction accuracy with that of CNN-LSTM models using conv1D (CNN1D-LSTM and CNN1D-BiLSTM), a CNN model, and an LSTM model. The CNN-LSTM provides better prediction accuracy in terms of RMSE, MAE, MAPE, and R2 score than the other four models. In future work, the Doppler shift effect on the RIS-NOMA system can be considered. To deal with practical scenarios, a more general fading channel model, such as Rician fading, can be applied to all channels in the wireless system. The RIS-NOMA system could also include the direct channels from the BS to each user in order to study their effect on the proposed CNN-LSTM model; however, this requires further investigation. In addition, since an ideal phase shift is infeasible for practical hardware because of unavoidable energy dissipation [69], our future research will consider a practical phase shift model for the proposed RIS-NOMA system. Furthermore, to reduce the complexity of channel estimation and improve the performance of an RIS-based system, the training length, or symbol count, of the channel estimation can be optimized, with the channel data provided by the proposed model [70]. The proposed channel estimation technique can be further investigated in an integrated sensing and communication scenario as in [33], in which the RIS can potentially serve both sensing and communication signals from the UEs.