Quantum Search-Aided Multi-User Detection of IDMA-Assisted Multi-Layered Video Streaming

Moore’s law is expected to lead to the gates of the quantum world in 2017. Therefore, the emerging quantum computing research is expected to give rise to novel quantum search algorithms, which may replace the currently used classical ones in wireless communications, leading to performance improvements and complexity reduction. In this paper, we demonstrate the benefits of quantum-assisted multi-user detection (QMUD) in the uplink of a multi-user system, where the reference user conveys a multi-layered video stream to the base station, while using adaptive modulation and different rates per video layer. This is the first study, where a QMUD is employed in a video application. The QMUD does not treat the rest of the users as interference, but rather detects the signals transmitted by all the users. We have evaluated the system’s performance both in terms of its bit error ratio and peak signal-to-noise ratio versus the channel’s signal-to-noise ratio, while quantifying the complexity reduction achieved by using the QMUD instead of the optimal classical maximum a posteriori probability MUD. The effect of the number of users on the system’s performance is also quantified.

We live in an era of wireless ''tele-presence'', but naturally, the different applications impose diverse requirements. High-quality video and audio streaming require ultra-fast Internet connections for achieving the best user experience [1], while health and safety applications rely on secure low-delay communications. At the same time, the Internet of Things (IoT) [2] requires low power consumption.
In crowded areas, such as airports, train stations, stadiums, festivals or concerts, it is sometimes impossible to acquire a connection, because a high number of users are simultaneously uploading videos and photos or making phone calls. These challenges in the field of classic wireless communications are addressed with the aid of classical computing, which typically relies on suboptimal search methods and hence results in suboptimal quality of service, due to the prohibitively high computational complexity of optimal classical full-search algorithms. When quantum computing [3]-[5] becomes a commercial reality, quantum algorithms may replace their classical counterparts in wireless communication systems, heralding the era of quantum-assisted communications [6]. Quantum-based communications [6], which rely on the transmission and reception of quantum bits, or qubits, are expected to introduce genuinely new wireless paradigms, which were impossible to conceive with the aid of classical computing, such as 100% secure communications. In the context of quantum-assisted communications, which is the scope of this paper, Quantum Search Algorithms (QSA) [3], [4], such as Grover's QSA [8]-[10], the Boyer-Brassard-Høyer-Tapp (BBHT) QSA [11], or the Dürr-Høyer Algorithm (DHA) [12] may be used for solving optimization problems. In 1996, Grover [8] proposed a QSA that succeeds with ∼100% probability in finding the position of a desired value δ in an unsorted database of length N, after as few as O(√N) queries of the database. Compared to the complexity required by the classical ''brute force'' search, which is on the order of O(N), Grover's QSA offers a quadratic reduction in complexity. As a further development, the BBHT QSA [11] relies on Grover's QSA and manages to find the location of δ in a database, even if δ appears S times, when S is unknown to us.
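The quadratic reduction offered by Grover's QSA can be illustrated with a small classical state-vector simulation. The following is a minimal sketch rather than a quantum implementation: the function names, the database size N = 64 and the marked index are assumptions chosen for illustration, and each Grover iteration is counted as a single database query.

```python
import math
import numpy as np


def grover_success_probability(n_items: int, target: int, iterations: int) -> float:
    """Classically simulate Grover's QSA on a database with one marked entry."""
    # Start in the uniform superposition over all database indices.
    amplitudes = np.full(n_items, 1.0 / math.sqrt(n_items))
    for _ in range(iterations):
        # Oracle: flip the sign of the marked entry's amplitude.
        amplitudes[target] *= -1.0
        # Diffusion: reflect every amplitude about the mean amplitude.
        amplitudes = 2.0 * amplitudes.mean() - amplitudes
    # Probability of observing the marked index when measuring.
    return float(amplitudes[target] ** 2)


def optimal_iterations(n_items: int) -> int:
    """Number of Grover iterations, floor(pi/4 * sqrt(N)), for one marked entry."""
    return math.floor(math.pi / 4.0 * math.sqrt(n_items))


if __name__ == "__main__":
    N = 64  # hypothetical database size
    k = optimal_iterations(N)
    p = grover_success_probability(N, target=5, iterations=k)
    print(f"{k} Grover iterations, success probability {p:.4f}")
```

For N = 64 the sketch applies 6 Grover iterations, whereas a classical full search may need up to 64 queries.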
Finally, the DHA [12] exploits both the BBHT QSA and Grover's QSA for finding the position of the minimum entry in the database after O(√N) queries. In the uplink of a multi-carrier Non-Orthogonal Multiple Access (NOMA) system, the Base Station (BS) performs Multi-User Detection (MUD) for detecting the symbols of the users who transmitted on the same subcarriers [13], [14]. If iterations are allowed between the MUD and the channel decoders of the users, then the employment of soft-input soft-output MUDs substantially improves the system's Bit Error Ratio (BER) performance. The optimal Maximum A posteriori Probability (MAP) MUD is associated with the best achievable performance, but its complexity increases exponentially with the number of users supported, in other words, with the size of the search problem. As demonstrated in [15], classical heuristic and bio-inspired MUDs perform well at a substantially lower complexity in under-loaded or full-rank multi-user systems, where the number of users U is not higher than the number of receive Antenna Elements (AE) P at the BS. On the other hand, in rank-deficient scenarios, where U > P, their performance is gravely degraded. In [15] and [16] we proposed a Quantum-assisted MUD (QMUD), namely the DHA-aided MUlti-input Approximation with Forward Knowledge Transfer (DHA-MUA-FKT) QMUD, which offers a near-optimal performance, similar to that of the MAP MUD, at a fraction of its complexity. In rank-deficient systems, where hard-outputs at the MUD are sufficient, the DHA QMUD [17] matches the performance of the Maximum Likelihood (ML) MUD, while offering a quadratic reduction of complexity. In [18] a joint quantum-assisted channel and data detector was proposed for the uplink of NOMA, where quantum computing was amalgamated with the evolutionary repeated weighted boosting search algorithm [19].
For the downlink of a NOMA system, a quantum-aided method was proposed in [20] for finding the optimal precoding matrix, in terms of minimizing all users' average BER. Furthermore, in non-coherent detection scenarios, where no channel estimates are available at the BS, a non-coherent QMUD may be employed [21] or a Quantum-assisted Multiple Symbol Differential Detector (QMSDD) may be used, when the users opt for differential modulation [13].
In the aforementioned contributions, QMUDs were employed in general multi-user scenarios, where a fixed time-invariant modulation was adopted by each user. By contrast, our novel contribution is that a reference user transmits multi-layered video to the BS relying on a Multi-Carrier Interleave Division Multiple Access (MC-IDMA) [16] system, while near-instantaneously adapting the modulation scheme according to the prevalent channel conditions. At the same time, since this is a NOMA system, multiple users share the same frequency resources for transmitting their own data to the BS. The BS does not treat the multiple users as interference imposed on the reference user, but rather aims for jointly detecting the signals of all users by employing the DHA-MUD QMUD or the MAP MUD, without any bias towards a specific user.
It should be noted that the proposed quantum computing techniques may be used in any multiple access or multi-stream scheme, including Orthogonal Frequency Division Multiple Access (OFDMA) as well as in any NOMA schemes apart from Multi-Carrier Code Division Multiple Access (MC-CDMA) and Sparse Code Multiple Access (SCMA). In orthogonal multiple access schemes, the QMUDs would operate as single-stream quantum detectors on each subcarrier, aiming to detect the symbol transmitted by each user. As will be discussed in the following, since the quantum search algorithms, which are employed in the QMUDs and quantum detectors, excel when the number of legitimate transmitted symbols is high, quantum detection could be used when the users employ high-order modulation schemes.
The research community has mainly focused its efforts on resource allocation conceived for multi-user environments supporting video transmission, while separating the users in the frequency domain, hence eliminating the requirement of MUD. Wang et al. [22], [23] proposed power- and subcarrier-allocation algorithms based both on the quality of the channel estimates and on the achievable throughput, when multiple users upload their own video in a multi-carrier system. Each subcarrier is allocated to a single user, therefore the BS performs single user detection on a per subcarrier basis. Similarly, Xiao and van der Schaar [24] presented a resource allocation algorithm, which jointly maximizes the video quality of all the users, while separating them in the time- or frequency-domains. Zhao et al. [25] proposed a joint rate and power allocation algorithm, when multiple users transmit their videos in a Code Division Multiple Access (CDMA) system. In that contribution, the users' signals are detected by the classic linear Minimum Mean Square Error (MMSE) detector, while detecting the signal of a particular user.
This MMSE MUD treats the signals of the rest of the users as Multiple Access Interference (MAI). By contrast, in our solution the multiple users are not separated in the frequency-, time- or code-domain and their signals are jointly detected at the BS using an MUD. In practical scenarios, the overall throughput of the system will be increased if multiple users are allowed to transmit simultaneously, while exploiting the same resources, given that their signals will be separated based either on their spatial signature or on their interleaving sequence at the BS, by using an appropriate MUD.
Layered video coding is capable of generating multiple layers of unequal importance. The most important layer is referred to as the base layer (BL), while the less important layers depend on the BL and are referred to as enhancement layers (ELs). When the related BL is corrupted or lost due to channel impairments, the layered video decoder has to discard the corresponding ELs, regardless of whether they are correctly recovered or not. Hence it is intuitive to employ unequal error protection (UEP) for layered video communications, where a higher protection may be assigned to the BL than to the ELs. Numerous contributions have been disseminated on the UEP of layered video, which were reviewed in [26]. A particularly beneficial solution is to embed all the bits of the BL into the ELs by taking their modulo-2 sum, which allows us to recover the corrupted BL with the aid of the ELs by their joint iterative detection, provided that the ELs were received perfectly [23].
Against this background, our novel contributions are:
The rest of the paper is structured as depicted in Fig. 1. In Section II, we discuss layered video streaming, while in Section III, we investigate the system model, including the encoding, transmission, reception and decoding of the video bits. Section IV includes the basic prerequisites of quantum computing and the investigation of the DHA-MUA-FKT QMUD, while our simulation results are discussed in Section V. Finally, our conclusions are offered in Section VI.

II. LAYERED VIDEO TECHNOLOGIES
Layered video compression [29], [30] encodes a video sequence into multiple layers, which enables us to progressively refine the reconstructed video quality at the receiver, as and when the channel quality improves. Again, the most important layer is referred to as the base layer and the less important layers are termed enhancement layers, which rely on the BL. Furthermore, an EL may in turn be relied upon by less important ELs. Again, when the BL or an EL is lost or corrupted during its transmission, the dependent layers cannot be utilized by the decoder and must be dropped. A layered video scheme is displayed in Fig. 2, where the video sequence captured from the scene is encoded into four layers by the layered video encoder, namely $L_0$ to $L_3$, where layer $L_i$ ($0 < i \le 3$) depends on layer $L_{i-1}$ for decoding, while layer $L_i$ improves the video quality of layer $L_{i-1}$. In other words, layer $L_0$ is the BL and layers $L_1$ to $L_3$ are ELs depending on the BL. Furthermore, as shown in Fig. 2, the ELs $L_2$ and $L_3$ rely on the EL $L_1$. In other words, if layer $L_1$ is corrupted, then layers $L_2$ and $L_3$ are dropped by the decoder. Let us introduce a number of layered video coding techniques that have been investigated and/or standardized.
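The dependency rule described above, where each layer relies on the layer directly below it, can be sketched as follows. The function name and the per-layer reception flags are assumptions introduced purely for illustration.

```python
from typing import List


def decodable_layers(received_ok: List[bool]) -> List[int]:
    """Return the indices of the layers the decoder can use.

    received_ok[i] states whether layer L_i was received without errors.
    Layer L_i depends on layer L_(i-1), so decoding stops at the first
    corrupted layer and every higher layer is dropped.
    """
    usable = []
    for index, ok in enumerate(received_ok):
        if not ok:
            break  # all layers above a corrupted layer are discarded
        usable.append(index)
    return usable


if __name__ == "__main__":
    # L_1 is corrupted, hence L_2 and L_3 are dropped although they arrived intact.
    print(decodable_layers([True, False, True, True]))
```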

A. PARTITION MODE OF H.264
A number of layered video coding schemes [31] have been developed and some of them have been adopted by recent video coding standards, such as, for example, the Scalable Video Coding (SVC) [29] and the Data Partitioned mode (DP) [30], [32], [33]. In the DP mode, the data streams representing different semantic importance are categorized into a maximum of three bit streams / partitions [34] per video slice, namely the type A, type B and type C partitions. The header information, such as the macroblock (MB) types, the quantization parameters and the motion vectors are carried by the type A partition. The type B partition is also referred to as the intra-partition, which contains intra-frame-coded information, including the Coded Block Patterns (CBP) and intra-coded coefficients. The type B partition is capable of prohibiting error propagation in the scenario, when the reference frame of the current frame is corrupted. In contrast to the type B partition, the type C partition is the inter-coded partition, which carries the inter-CBPs and the inter-frame coded coefficients. The type C partition has to rely on the reference frame for reconstructing the current picture. Hence, if the reference picture is corrupted, errors may be propagated to the current frame. Amongst these three partitions, the type A partition may be deemed to be the most important one, which may be treated as the BL. Correspondingly, the type B and C partitions may be interpreted as a pair of enhancement layers, since they are dependent on the type A partition for decoding. Albeit the information in partition B and C cannot be used in the absence of partition A, partition B and partition C can be used independently of each other, given the availability of partition A. The dependency of the layers in the partitioned mode of the H.264/AVC video CODEC is exemplified in Fig. 3, where the group of pictures (GOP) parameter is GOP = 2.

B. SCALABLE VIDEO CODING
The subject of scalable video coding [29], [35] has been an active research field for over two decades. This terminology is also used in the Annex G extension of the H.264/AVC video compression standard [30]. Indeed, SVC is capable of generating several bitstreams that may be decoded at a similar quality and compression ratio to that of the existing H.264/AVC CODEC. When for example low-cost, low-quality streaming is required by the users, some of the ELs may be removed from the compressed video stream, which facilitates flexible bitrate-control based on the specific preferences of the users. An H.264/AVC scalable video stream contains a sequence of Network Abstraction Layer Units (NALUs) [30], which consist of a header and a payload. The header contains the information about the type of NALU and its function in the video reconstruction process, while the payload carries the compressed signals of a video frame. The parameters dependency ID (DID), temporal ID (TID) and quality ID (QID) contained in the NALU header describe the scalability feature of the bitstream. Specifically, DID, TID and QID represent Coarse Grain Scalability (CGS), Temporal Scalability (TS) and Medium Grain Scalability (MGS) [29], respectively. The CGS feature facilitates the coarse adaptation of video properties, such as the spatial resolution of the video, reconfiguring from Quarter Common Intermediate Format (QCIF) to CIF, where the video can be encoded into a set of enhanced sub-streams referred to as dependent-layers. The DID parameter represents the dependent-layer which the current NALU belongs to. The decoding of a NALU with DID > 0 depends on the NALUs associated with (DID − 1), but with the same TID and QID values. Based on this dependency rule, the video bit-rate and quality may be readily reduced by removing the particular NALUs associated with a DID larger than a specific DID parameter. Similar dependency rules exist for the temporal scalability and MGS features. The dependency of the layers in the SVC stream is exemplified in Fig. 4. Similarly, SVC has also been developed as a profile of the H.265 High Efficiency Video Coding (HEVC) standard [36], [37].

FIGURE 2. Architecture of a layered video scheme [27], [28], where the video quality is refined gradually. A'' indicates B is predicted from A. Here the phrase B ''depends on'' A implies that the layer B will be discarded by the video decoder, if layer A is lost. The phrase B ''is predicted from'' A means that the layer B may still be usefully utilized by the video decoder, if layer A is lost. Hence the relationship ''depends on'' is stronger than ''is predicted from''.
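The DID-based bit-rate reduction described above amounts to dropping every NALU whose DID exceeds a chosen value. The sketch below illustrates this under simplifying assumptions: the `Nalu` container and the `extract_substream` function are hypothetical and do not parse real H.264/AVC headers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Nalu:
    """Hypothetical, simplified view of an SVC NALU header plus payload."""
    did: int  # dependency ID (coarse grain scalability)
    tid: int  # temporal ID (temporal scalability)
    qid: int  # quality ID (medium grain scalability)
    payload: bytes


def extract_substream(nalus: List[Nalu], max_did: int) -> List[Nalu]:
    """Keep only the NALUs whose DID does not exceed max_did.

    Since a NALU with DID > 0 depends on the NALUs having (DID - 1),
    removing all NALUs above max_did leaves a decodable sub-stream.
    """
    return [nalu for nalu in nalus if nalu.did <= max_did]


if __name__ == "__main__":
    stream = [Nalu(0, 0, 0, b"base"), Nalu(1, 0, 0, b"enh1"), Nalu(2, 0, 0, b"enh2")]
    print([n.did for n in extract_substream(stream, max_did=1)])
```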

C. MULTIVIEW VIDEO CODING
Recently, the Joint Video Team (JVT) proposed Multiview Video Coding (MVC) as an amendment to the H.264/AVC standard [30]. Apart from the classic techniques employed in single-view coding, multi-view video coding invokes the so-called inter-view correction technique by jointly processing the different views for the sake of reducing the bitrate. Hence, the first encoded view may be termed as the BL, while the remaining views may be treated as the ELs. The dependency of the layers in the MVC stream is exemplified in Fig. 5.

VOLUME 5, 2017

D. OTHERS
Set-Partitioning In Hierarchical Trees (SPIHT) [38], [39] was originally proposed as an image compression algorithm, which encodes the most important wavelet transform coefficients first and allows an increasingly refined reproduction of the original image. A multiview profile (MVP) [40] was developed for the Moving Picture Experts Group (MPEG)'s [1] video coding standard, where the left view and right view were encoded into a BL and an EL, respectively. Again, the emerging H.265 scheme will continue to include a scalability profile.

III. SYSTEM MODEL
Let us now consider the uplink of an MC-IDMA system supporting U users. The reference user, who is assumed to be the first user in our scenarios, without any loss of generality, conveys a video to the BS. Therefore, initially the reference user encodes the video information in $L_{\mathrm{video}} = 3$ video layers using an H.265 encoder, as described in Fig. 6.

A. CHANNEL ENCODER
The bit streams of the different video layers of the reference user, namely b , for the BL, EL1 and EL2, respectively, are channel encoded in parallel, independently of each other, as presented in Fig. 6. The uth user, $u \in \{2, 3, \ldots, U\}$, is not assumed to transmit video, therefore their data may be encoded serially as a single stream. Each encoded bit stream may be spread using a repetition code as a direct spreading sequence having a spreading factor of $SF_u^{(l)}$, where $u \in \{1, 2, \ldots, U\}$, and $l \in \{\mathrm{BL}, \mathrm{EL1}, \mathrm{EL2}\}$ when $u = 1$. It should be noted that the spreading factors of the reference user's video layers may differ from each other, resulting in unequal protection for the video layers. As mentioned in Section II, the BL is the most important video layer, because without an error-free reception of the BL, any potential correct reception of EL1 or EL2 will be wasted. Therefore, one may prefer to include additional parity bits for the BL, hence reducing its video throughput, but increasing its resilience to noise.
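Spreading with a repetition code and the corresponding soft despreading can be sketched as follows. The function names are assumptions, and combining the chip LLRs by summation is a common choice for repetition codes with independent chip observations rather than a detail confirmed by the text.

```python
import numpy as np


def spread(bits, spreading_factor: int) -> np.ndarray:
    """Repeat every coded bit spreading_factor times (repetition-code spreading)."""
    return np.repeat(np.asarray(bits), spreading_factor)


def despread_llrs(chip_llrs, spreading_factor: int) -> np.ndarray:
    """Combine the LLRs of the chips belonging to the same coded bit.

    Assumption: the chips are independent observations of the same bit,
    so their LLRs are simply added.
    """
    chips = np.asarray(chip_llrs, dtype=float)
    return chips.reshape(-1, spreading_factor).sum(axis=1)


if __name__ == "__main__":
    # A higher spreading factor for the BL than for an EL yields unequal protection.
    print(spread([1, 0], spreading_factor=3))
    print(despread_llrs([1.0, -0.5, 2.0, -1.0, -1.0, 0.5], spreading_factor=3))
```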

B. ADAPTIVE MODULATION
Still referring to Fig. 6, the reference user may use Adaptive Quadrature Amplitude Modulation (AQAM), adjusting the modulation based on the channel quality. When the channel quality allows it, bits from multiple video layers will be mapped to a single transmitted symbol, by increasing the constellation's size. A single packet of our multi-carrier system includes D consecutive OFDM symbols, with each OFDM symbol being transmitted on Q subcarriers. It is assumed that all U users supported by the system transmit on all Q available subcarriers. Therefore, an MUD should be employed on a per subcarrier basis, for detecting the symbol transmitted by each user on each subcarrier. The modulation scheme that will be selected by the reference user depends on the instantaneous received SNR, averaged over all the subcarriers and OFDM symbols of a specific packet, as well as over the P receive AEs at the BS. We have assumed that the adaptation of the modulation scheme is controlled on a per packet basis, therefore the reference user employs the same modulation scheme for transmitting all the D · Q symbols of a packet.

FIGURE 6. MC-IDMA uplink communication system's block diagram supporting U users employing channel coding as well as iterative, soft-input soft-output QMUD at the BS. The first user is the reference user, who channel encodes 3 video layers in parallel and transmits them, if the channel quality allows it, using adaptive modulation. The interleaving sequence for each employed interleaver is unique.
The instantaneous received SNR per symbol per receive AE of the reference user for a specific packet, $\mathrm{SNR}_{inst}$, is given by (1), where $\mathbf{H}_p^{(u)}$ is the Frequency-Domain CHannel Transfer Function (FD-CHTF) between the uth user and the pth receive AE for all Q subcarriers of the D consecutive OFDM symbols of the transmitted packet, as described by (2), where $H^{(u)}_{p,o,q}$ is the complex-valued channel coefficient between the uth user and the pth receive AE on the qth subcarrier of the oth OFDM symbol. The predetermined received power thresholds $\mathrm{SNR}_{\mathrm{BPSK}}$, $\mathrm{SNR}_{\mathrm{QPSK}}$ and $\mathrm{SNR}_{\mathrm{16QAM}}$ determine the $\mathrm{SNR}_{inst}$ ranges, where the modulation schemes are switched. More precisely, the switching of the reference user's modulation scheme takes place based on the instantaneous SNR per symbol per receive AE of (1), as described in Table 1. When the instantaneous $\mathrm{SNR}_{inst}$ is lower than $\mathrm{SNR}_{\mathrm{BPSK}}$, the reference user is in a No Transmission (No Tx) mode, where it is preferable to pause the transmission of the specific packet for saving transmission power, since, assuming that the power threshold $\mathrm{SNR}_{\mathrm{BPSK}}$ is chosen correctly, a potential transmission would have resulted in corrupted information bits. As discussed in Section II, when a video frame is erroneously decoded, the previous video frame may replace it, resulting in a screen ''freeze''. Still referring to Table 1, when the instantaneous SNR of the reference user $\mathrm{SNR}_{inst}$ lies between $\mathrm{SNR}_{\mathrm{BPSK}}$ and $\mathrm{SNR}_{\mathrm{QPSK}}$, the Binary Phase Shift Keying (BPSK) modulation scheme is used by the reference user. Similarly, if we have $\mathrm{SNR}_{\mathrm{QPSK}} \le \mathrm{SNR}_{inst} < \mathrm{SNR}_{\mathrm{16QAM}}$, then Quadrature Phase Shift Keying (QPSK) is employed and if $\mathrm{SNR}_{\mathrm{16QAM}} \le \mathrm{SNR}_{inst}$, the 16-ary Quadrature Amplitude Modulation (QAM) scheme is used.
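The per-packet mode switching described above can be sketched as a simple threshold comparison. In the sketch below the SNR is computed as the channel gain averaged over the P receive AEs, the D OFDM symbols and the Q subcarriers, divided by $N_0$; this assumes unit-energy transmitted symbols and is only one plausible reading of the averaging described in the text, not necessarily the exact expression of (1). The function names and any threshold values passed to them are hypothetical.

```python
import numpy as np


def instantaneous_snr(h_ref: np.ndarray, n0: float) -> float:
    """Average received SNR of the reference user over one packet.

    h_ref has shape (P, D, Q): receive AEs, OFDM symbols, subcarriers.
    Assumption: unit-energy symbols, so the SNR is mean(|H|^2) / N0.
    """
    return float(np.mean(np.abs(h_ref) ** 2) / n0)


def select_mode(snr_inst: float, thr_bpsk: float, thr_qpsk: float, thr_16qam: float) -> str:
    """Pick the modulation mode of a packet; all arguments share the same unit."""
    if snr_inst < thr_bpsk:
        return "No Tx"   # pause transmission to save power
    if snr_inst < thr_qpsk:
        return "BPSK"
    if snr_inst < thr_16qam:
        return "QPSK"
    return "16-QAM"


if __name__ == "__main__":
    channel = np.ones((2, 3, 4), dtype=complex)  # toy channel with P=2, D=3, Q=4
    snr = instantaneous_snr(channel, n0=0.5)
    print(snr, select_mode(snr, thr_bpsk=2.0, thr_qpsk=8.0, thr_16qam=14.0))
```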
The SNR threshold of a specific modulation scheme is selected based on its BER performance during the fixed modulation transmission mode [41]. In this contribution, the BER threshold is equal to $10^{-5}$ for all modulation schemes and video layers. The selection of the SNR thresholds relies on the following methodology:
• $\mathrm{SNR}_{\mathrm{BPSK}}$ is equal to the channel SNR value at which the BER performance of the BL, when BPSK symbols are transmitted, falls below $10^{-5}$.
• The modulation scheme should switch to QPSK when the BER performance of the BL is below $10^{-5}$, for at least maintaining the quality of service offered by the BPSK scheme. Therefore $\mathrm{SNR}_{\mathrm{QPSK}}$ is equal to the channel SNR value at which the BER performance of the BL, when QPSK symbols are transmitted, falls below $10^{-5}$. The BER performance of the EL1 at that specific channel SNR value may or may not be sufficiently good, but from this SNR value onwards, QPSK offers at least equal quality of experience to that of the BPSK scheme.
• Similar to switching from BPSK to QPSK, the transition to 16-QAM should occur when the 16-QAM scheme offers at least the same target video quality as the QPSK scheme. In other words, by making the logical assumption that both the BL and the EL1 of the QPSK scheme at that point have reached a BER lower than $10^{-5}$, $\mathrm{SNR}_{\mathrm{16QAM}}$ should be equal to the channel SNR value at which the BER performances of both the BL and the EL1 fall below $10^{-5}$, when 16-QAM symbols are transmitted.
Let us note that based on the aforementioned discussions, we should expect the specific threshold SNR values to depend on the system parameters, such as the number of users supported in the system, the coding rate, the spreading factors and the position of the video bits in a symbol, when Set Partitioning (SP) mapping is used.
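The threshold selection methodology above can be sketched as a search over simulated BER curves. The function name, the SNR grid and the BER values below are hypothetical placeholders rather than results of the paper; the sketch assumes that the SNR grid is sorted in ascending order.

```python
from typing import List, Optional, Sequence


def snr_threshold(snr_grid: Sequence[float],
                  ber_curves: List[Sequence[float]],
                  target_ber: float = 1e-5) -> Optional[float]:
    """Smallest grid SNR from which all given BER curves stay below target_ber.

    ber_curves holds one curve per video layer that must meet the target,
    e.g. [BL] for BPSK and QPSK, or [BL, EL1] for 16-QAM.
    Returns None if the target is not met at the highest SNR of the grid.
    """
    threshold = None
    # Walk down from the highest SNR while every layer still meets the target.
    for snr, *bers in reversed(list(zip(snr_grid, *ber_curves))):
        if all(ber < target_ber for ber in bers):
            threshold = snr
        else:
            break
    return threshold


if __name__ == "__main__":
    snr_db = [0, 2, 4, 6, 8]                   # hypothetical SNR grid in dB
    ber_bl = [1e-1, 1e-3, 1e-6, 1e-7, 1e-8]    # made-up BL curve
    ber_el1 = [2e-1, 1e-2, 1e-4, 1e-6, 1e-7]   # made-up EL1 curve
    print(snr_threshold(snr_db, [ber_bl]), snr_threshold(snr_db, [ber_bl, ber_el1]))
```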

C. MAPPING VIDEO LAYERS TO SYMBOLS
As mentioned in Section II, for decoding a layered video stream, the enhancement layers are useful only when the BL has been correctly received. This hierarchy also holds among the enhancement layers. For example, in our scenarios, an error-free reception of EL2 is only useful when both the BL and EL1 of that specific video frame have also been correctly decoded. In our system, each packet includes a specific fixed number of symbols, regardless of the channel quality. Therefore, when the AQAM scheme is used, the decision concerning the choice of the specific modulation mode is followed by a choice of which layers of the video frame will be mapped to the symbols of that packet. Since the BL of all video frames is always required for the correct decoding of the video, one bit of the BL of the transmitted video frame is always included in the transmitted symbol, regardless of the specific modulation mode that has been selected. Additionally, when QPSK symbols are transmitted, a bit from the EL1 may accompany the bit from the BL. Finally, when the 16-QAM mode is available, we have opted for transmitting one bit from the BL, one bit from the EL1 and two bits from the EL2, to be fairer to EL2, since its bits will only be transmitted when the channel quality is very good. The video layer-to-symbol mapping is encapsulated in Table 2. Please note that since iterations are allowed between the MUD as well as the despreaders and channel decoders (MUD-DES/DEC) of Fig. 6, SP mapping is used instead of Gray mapping, for the sake of increasing the mutual information at the output of the channel decoders with the aid of multiple MUD-DES/DEC iterations. We show that in our application, where a symbol's specific bit may correspond to a different video layer, both the number of video layer-bits per symbol and the specific bit position each of them occupies in the SP-mapped symbol are important in terms of the system's PSNR versus channel SNR performance.
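The layer-to-symbol allocation described above (one BL bit per symbol in every mode, an additional EL1 bit for QPSK, and one BL, one EL1 and two EL2 bits for 16-QAM) can be sketched as follows. The ordering of the bits within a symbol label is an assumption made for illustration, since the exact bit positions are specified in Table 2 and are not reproduced here; the names used are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List

# Number of bits each video layer contributes to one symbol, per mode.
LAYER_BITS_PER_SYMBOL: Dict[str, Dict[str, int]] = {
    "No Tx": {},
    "BPSK": {"BL": 1},
    "QPSK": {"BL": 1, "EL1": 1},
    "16-QAM": {"BL": 1, "EL1": 1, "EL2": 2},
}


def next_symbol_bits(mode: str, queues: Dict[str, Deque[int]]) -> List[int]:
    """Take the bits of the next symbol from the per-layer bit queues.

    Assumptions: the bits are ordered BL, EL1, EL2 within the label and
    every queue holds enough bits for the requested mode.
    """
    bits: List[int] = []
    for layer, count in LAYER_BITS_PER_SYMBOL[mode].items():
        for _ in range(count):
            bits.append(queues[layer].popleft())
    return bits


if __name__ == "__main__":
    layer_queues = {"BL": deque([1, 0]), "EL1": deque([0, 1]), "EL2": deque([1, 1, 0, 0])}
    print(next_symbol_bits("16-QAM", layer_queues))
    print(next_symbol_bits("BPSK", layer_queues))
```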

D. TRANSMISSION & RECEPTION
At the same time, the remaining (U − 1) users only transmit BPSK symbols and their signals are not considered as interference at the BS. The BS detects each user's symbols, without giving preference to any particular user. Let us now assume the worst-case scenario in terms of the detection performance versus complexity, where all U users transmit on all of the Q available subcarriers. Each user's symbol stream is modulated using a Q-point Inverse Fast Fourier Transform (IFFT) as depicted in Fig. 6.
Assuming that this is a synchronous system, the U signals arrive simultaneously at the P receive AEs of the BS. Since the Q orthogonal subcarriers do not interfere with each other, the MUD may operate on a per subcarrier basis. Therefore, on the qth subcarrier, the U channel-contaminated signals are added together at each of the P receive AEs, along with the respective Additive White Gaussian Noise (AWGN), which is a random, Gaussian-distributed, complex-valued variable with zero mean and a variance of $N_0 = 2\sigma^2$. Consequently, the signal of the oth OFDM symbol's qth subcarrier, where we have $q \in \{1, 2, \ldots, Q\}$ and $o \in \{1, 2, \ldots, D\}$, at the input of the MUD is described as

$$\mathbf{y}_{o,q} = \mathbf{H}_{o,q}\,\mathbf{x}_{o,q} + \mathbf{n}_{o,q}, \tag{3}$$

where $\mathbf{y}_{o,q} = [y_{1,o,q}, y_{2,o,q}, \ldots, y_{P,o,q}]^T$ is a (P × 1)-element complex-valued vector that contains the received signal on each receive AE for the oth OFDM symbol's qth subcarrier, and $\mathbf{H}_{o,q}$ is the (P × U)-element complex-valued FD-CHTF matrix that includes the channel states between the uth user and the pth receive AE, for $u \in \{1, 2, \ldots, U\}$ and $p \in \{1, 2, \ldots, P\}$, for the oth OFDM symbol's qth subcarrier, as encapsulated in

$$\mathbf{H}_{o,q} = \begin{bmatrix} H^{(1)}_{1,o,q} & H^{(2)}_{1,o,q} & \cdots & H^{(U)}_{1,o,q} \\ H^{(1)}_{2,o,q} & H^{(2)}_{2,o,q} & \cdots & H^{(U)}_{2,o,q} \\ \vdots & \vdots & \ddots & \vdots \\ H^{(1)}_{P,o,q} & H^{(2)}_{P,o,q} & \cdots & H^{(U)}_{P,o,q} \end{bmatrix}, \tag{4}$$

where, similarly to (2), $H^{(u)}_{p,o,q}$ is the complex-valued channel coefficient between the uth user and the pth receive AE for the oth OFDM symbol's qth subcarrier. Continuing from (3), $\mathbf{x}_{o,q} = [x^{(1)}_{o,q}, x^{(2)}_{o,q}, \ldots, x^{(U)}_{o,q}]^T$ is the (U × 1)-element complex-valued vector that includes the symbols transmitted by each user. Again, it should be emphasized that in our scenarios the symbol of the first (reference) user $x^{(1)}_{o,q}$ may be from the BPSK, QPSK or 16-QAM constellation, while the rest of the symbols in $\mathbf{x}_{o,q}$ of (3) belong to the BPSK constellation. Finally, the (P × 1)-element complex-valued vector $\mathbf{n}_{o,q} = [n_{1,o,q}, n_{2,o,q}, \ldots, n_{P,o,q}]^T$ in (3) represents the AWGN on each receive AE for the oth OFDM symbol's qth subcarrier.
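The per-subcarrier signal model described above, where the U channel-contaminated symbols are superimposed at the P receive AEs and AWGN of variance $N_0 = 2\sigma^2$ is added, can be simulated with a few lines of NumPy. The function name and the toy channel values are assumptions chosen for illustration.

```python
import numpy as np


def received_signal(h: np.ndarray, x: np.ndarray, n0: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Received vector on one subcarrier: channel matrix times symbols plus AWGN.

    h:  (P x U) complex channel matrix of one subcarrier and OFDM symbol.
    x:  length-U vector holding the symbol of each user.
    n0: noise variance per receive AE, split equally between the real and
        imaginary parts (sigma^2 = n0 / 2 each).
    """
    num_rx = h.shape[0]
    sigma = np.sqrt(n0 / 2.0)
    noise = sigma * (rng.standard_normal(num_rx) + 1j * rng.standard_normal(num_rx))
    return h @ x + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy rank-deficient example: U = 3 BPSK users, P = 2 receive AEs.
    h = (rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))) / np.sqrt(2.0)
    x = np.array([1.0, -1.0, 1.0])
    print(received_signal(h, x, n0=0.1, rng=rng))
```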

E. MULTI-USER DETECTION
The MUD determines which symbol each of the U users has transmitted on each subcarrier of every OFDM symbol, by exploiting the knowledge of the received signal $\mathbf{y}_{o,q}$, the channel estimates $\hat{\mathbf{H}}_{o,q}$ and the noise variance $N_0$. In this contribution we assume that perfect channel estimates are available at the BS, therefore we have $\hat{\mathbf{H}}_{o,q} = \mathbf{H}_{o,q}$ for every $o \in \{1, 2, \ldots, D\}$ and $q \in \{1, 2, \ldots, Q\}$. Let us focus on the qth subcarrier of the oth OFDM symbol without any loss of generality, which enables us to omit the subscripts o and q. By applying Bayes' theorem, the conditional probability that the specific multi-user symbol $\mathbf{x}$ of (3) was transmitted by the users, given that the received signal is equal to $\mathbf{y}$ of (3), is described by

$$P(\mathbf{x}|\mathbf{y}) = \frac{P(\mathbf{y}|\mathbf{x})\,P(\mathbf{x})}{P(\mathbf{y})}, \tag{5}$$

where $P(\mathbf{x})$ is the a priori probability that the multi-level symbol $\mathbf{x}$ was transmitted by the users and $P(\mathbf{y})$ is the unconditional probability of the multi-antenna signal $\mathbf{y}$ having been received. In our systems, we assume that all symbols $\mathbf{x}$ initially exhibit an equiprobable a priori probability of

$$P(\mathbf{x}) = \frac{1}{M_{\mathrm{BPSK}}^{U-1} \cdot M_{\mathrm{AQAM}}}, \tag{6}$$

where $M_{\mathrm{BPSK}} = 2$ is the constellation size of the BPSK scheme of the (U − 1) interfering users, while $M_{\mathrm{AQAM}}$ is the constellation size of the modulation scheme that the reference user employs. When multiple MUD-DES/DEC iterations are allowed between the MUD and the decoders, the a priori probability of each symbol will differ. In that case, the a priori probability of a symbol $\mathbf{x}$ is equal to the product of the individual a priori probabilities of its constituent bits, as described in

$$P(\mathbf{x}) = \prod_{u=1}^{U} \prod_{m} P\big(b^{(u)}_m\big), \tag{7}$$

where $b^{(u)}_m$ is the mth bit of the uth user. Still referring to (5), $P(\mathbf{y}|\mathbf{x})$ is the conditional probability that $\mathbf{y}$ would have been received if $\mathbf{x}$ was transmitted. For the AWGN of variance $N_0$ considered, this so-called channel probability is, up to a normalization constant that cancels in the LLRs, equal to [14]

$$P(\mathbf{y}|\mathbf{x}) = \exp\left(-\frac{\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2}{N_0}\right). \tag{8}$$

The MAP MUD calculates the a posteriori Log Likelihood Ratio (LLR) of each bit that is included in a multi-level symbol, by taking into account all legitimate multi-level symbols, as seen in Fig. 6 and given by

$$L_{m,apo}\big(b_u^{(m)}\big) = \ln \frac{\sum_{\mathbf{x} \in \chi(u,m,0)} P(\mathbf{y}|\mathbf{x})\,P(\mathbf{x})}{\sum_{\mathbf{x} \in \chi(u,m,1)} P(\mathbf{y}|\mathbf{x})\,P(\mathbf{x})} = \ln \frac{\sum_{\mathbf{x} \in \chi(u,m,0)} f(\mathbf{x})}{\sum_{\mathbf{x} \in \chi(u,m,1)} f(\mathbf{x})}, \tag{9}$$

where $\chi(u, m, v)$ is the subset of legitimate symbols that have the mth bit of the uth user equal to v, with $u \in \{1, 2, \ldots, U\}$, $m \in \{1, 2, \ldots, \log_2 M_{\mathrm{AQAM}}\}$ when $u = 1$ and $m \in \{1\}$ when $u \neq 1$, and $v \in \{0, 1\}$, while $f(\mathbf{x})$ is the Cost Function (CF) of the MUD. According to (9) the CF is equal to

$$f(\mathbf{x}) = P(\mathbf{y}|\mathbf{x})\,P(\mathbf{x}), \tag{10}$$

where $P(\mathbf{x})$ is described in (7). Each calculation of $f(\mathbf{x})$ in (10) counts as a single Cost Function Evaluation (CFE), which is our complexity metric. Since the MAP MUD computes the CF values of all legitimate symbols, it requires $M_{\mathrm{BPSK}}^{U-1} \cdot M_{\mathrm{AQAM}}$ CFEs of (10) per subcarrier, which may be excessive when U is high.
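The full-search MAP MUD described above can be sketched as a brute-force evaluation of the CF over all legitimate multi-user symbols. The sketch below makes several simplifying assumptions that are not taken from the text: all users, including the reference user, transmit BPSK with bit 0 mapped to +1 and bit 1 mapped to −1, the channel probability is the unnormalized Gaussian likelihood $\exp(-\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 / N_0)$, and the computations are carried out in the logarithmic domain for numerical stability. The function name is hypothetical.

```python
import itertools
import numpy as np


def map_mud_llrs(y, h, n0, prior_llrs=None):
    """Brute-force MAP MUD for BPSK users; returns (a posteriori LLRs, number of CFEs).

    LLR convention: L = ln P(bit = 0) / P(bit = 1); bit 0 -> +1, bit 1 -> -1.
    """
    num_users = h.shape[1]
    if prior_llrs is None:
        prior_llrs = np.zeros(num_users)          # equiprobable symbols
    prior_llrs = np.asarray(prior_llrs, dtype=float)
    log_p0 = -np.logaddexp(0.0, -prior_llrs)      # ln P(bit = 0)
    log_p1 = -np.logaddexp(0.0, prior_llrs)       # ln P(bit = 1)

    log_cf = {}
    for bits in itertools.product((0, 1), repeat=num_users):
        b = np.array(bits)
        x = 1.0 - 2.0 * b                          # BPSK mapping
        log_channel = -np.linalg.norm(y - h @ x) ** 2 / n0
        log_prior = np.sum(np.where(b == 0, log_p0, log_p1))
        log_cf[bits] = log_channel + log_prior     # one CFE per legitimate symbol

    llrs = np.empty(num_users)
    for u in range(num_users):
        num = [v for k, v in log_cf.items() if k[u] == 0]
        den = [v for k, v in log_cf.items() if k[u] == 1]
        llrs[u] = np.logaddexp.reduce(num) - np.logaddexp.reduce(den)
    return llrs, len(log_cf)


if __name__ == "__main__":
    h = np.array([[1.0, 0.5], [-0.5, 1.0]])
    y = h @ np.array([1.0, -1.0])                  # noise-free toy observation
    print(map_mud_llrs(y, h, n0=0.5))
```

The number of CFEs returned equals $2^U$ in this BPSK-only setting, which illustrates the exponential growth of the full-search complexity with the number of users.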

F. MUD-DESPREADING / DECODING ITERATIONS
Once the a posteriori bit-based LLRs have been calculated by the MUD, the respective a priori bit-based LLRs are subtracted from them, for creating the extrinsic bit-based LLRs $L_{m,e}\big(b_u^{(m)}\big)$, as illustrated in Fig. 6 and encapsulated in

$$L_{m,e}\big(b_u^{(m)}\big) = L_{m,apo}\big(b_u^{(m)}\big) - L_{m,apr}\big(b_u^{(m)}\big), \tag{11}$$

where $L_{m,apo}$ and $L_{m,apr}$ denote the a posteriori and the a priori bit-based LLRs of the MUD, respectively. The extrinsic LLRs at the output of the MUD are deinterleaved using the user- as well as video layer-specific deinterleaver and are then fed to the corresponding DS despreaders and subsequently to their respective, independent channel decoders as a priori bit-based LLRs. After the decoding procedures, the a posteriori LLRs of the coded bits at the output of the channel decoders are spread again and are turned into extrinsic LLRs by a procedure similar to that described in (11) Fig. 6, which reconstructs the transmitted video.

G. VIDEO TRANSMISSION SPECIFICATIONS
During the transmission of video in our scenarios, we have adopted a number of specifications for simulating a practical system and for enabling meaningful comparisons. Each packet conveys a specific, predetermined number of symbols D · Q. Since all the symbols in each packet will be carried by the same modulation scheme, the number of coded video bits per packet will be is the spreading factor of the BL. Similarly, for packets using the QPSK and 16-QAM modes, the number of bits per video layer is equal to respectively. Since it is impossible to guarantee an integer number of packets per video frame, we have assumed the following specifications:
• If the remaining bits to be transmitted for an EL of the currently transmitted video frame are fewer than the available bits of the same EL in the next packet, we may fill the remaining bit positions with bits from the same EL of the subsequent video frames, subject to a specific video frame delay constraint.
• If the remaining bits to be transmitted for the BL of the currently transmitted video frame are fewer than the available bits of the same BL in the next packet, we may pad the remaining bit positions with random bits. Even though this methodology reduces the effective throughput of the system, it slightly delays the transmission of the BL's bits, allowing the EL1 and EL2 to ''catch up'' with it, again, subject to the video frame delay imposed. It also confines the corruption inflicted by the ''no Tx'' mode of Table 1, in case that mode is selected, as it will be discussed shortly.
• The tolerable video frame delay determines the number of video frames before and after the current video frame of the BL, during which the EL1 and EL2 may transmit their bits in the next packet. Since in our scenarios we transmit a video stream encoded at 30 frames per second, we may allow a maximum delay of 100 ms, which corresponds to a maximum video frame delay of 3 video frames at 30 frames per second. For example, if the Current Video Frame (CVF) of the EL1 is more than 3 video frames behind the $CVF_{BL}$ of the BL, then bits from the ($CVF_{BL}$ − 3)th video frame's EL1 are included in the next packet. This results in some bits from the EL1 and EL2 never being transmitted. It should be noted that if the channel quality is so good that the 16-QAM mode is employed for each packet, all the bits of the video stream will be transmitted. We should also emphasize that all the bits of the BL are always transmitted, regardless of the modulation mode employed, provided that the reference user transmits in each time slot.
• In some cases, the ''no Tx'' mode will be selected according to Table 1. When this happens, the packet that would have been transmitted is assumed to be discarded and the video decoder will replace the lost bits by randomly generated ones. The number of bits that are assumed to be lost is equal to the number of bits that would have been conveyed to the BS if the BPSK modulation scheme was used, since the BPSK mode would have been selected if ''no Tx'' was not an option. Since this will lead to the corruption of the video frame for which the packet was discarded, there is no point in continuing the transmission of the BL, the EL1, or the EL2 of that video frame, hence the next transmitted packet will recommence the transmission from the subsequent video frame.
• Transmission of the video clip is assumed to be completed, when the BL bits of the last video frame have been transmitted to the BS.

IV. QUANTUM-ASSISTED MULTI-USER DETECTION
Let us now continue by reviewing the operation of the QMUD, bearing in mind that readers who are only interested in the overall system performance attained may directly proceed to Section V. The powerful DHA-MUA-FKT QMUD is conceived for our multi-user system, which exploits the potent parallel processing capability of QSA for finding the optimal multiuser vector at a fraction of the MAP MUD's CFEs. Its further benefit is that it is capable of handling rank-deficient scenarios up to a normalized user-load of 2 − 3. It is based on multiple bit-based searches performed by the DHA, which is in turn constructed from Grover's QSA. Let us continue with an introduction to the prerequisites of quantum computing, the necessary quantum search algorithms and the functions of the DHA-MUA-FKT QMUD of [15] and [16].

A. FUNDAMENTALS OF QUANTUM COMPUTING
As mentioned in Section I, a qubit $|q\rangle$ is the information unit in quantum computing. A qubit may be found in a superposition of the states $|0\rangle$ and $|1\rangle$, as in $|q\rangle = a|0\rangle + b|1\rangle$, where $a, b \in \mathbb{C}$ are the amplitudes of the quantum states and $|a|^2 + |b|^2 = 1$. When a qubit is observed or measured on a specific basis, the resultant quantum state is one of the states of the basis. For example, if we observe the qubit $|q\rangle = a|0\rangle + b|1\rangle$ in the computational basis $\{|0\rangle, |1\rangle\}$, then there is a $|a|^2$ probability that we will obtain $|q\rangle = |0\rangle$ and a $|b|^2$ probability that we will obtain $|q\rangle = |1\rangle$ after the measurement.
A qubit's quantum state can be processed with the aid of unitary operators U [3], [4], where $U^{-1} = U^{\dagger}$. One of the most commonly used unitary operators is the Hadamard operator H, which maps $H|0\rangle \rightarrow (|0\rangle + |1\rangle)/\sqrt{2} = |+\rangle$ and $H|1\rangle \rightarrow (|0\rangle - |1\rangle)/\sqrt{2} = |-\rangle$. Multiple qubits may form a quantum register and their quantum state may be jointly represented and processed. For example, two qubits may be found in a superposition of four states as in $|q_1\rangle|q_2\rangle = a_{00}|00\rangle + a_{01}|01\rangle + a_{10}|10\rangle + a_{11}|11\rangle$, with $|a_{00}|^2 + |a_{01}|^2 + |a_{10}|^2 + |a_{11}|^2 = 1$. The evolution of a quantum register from one state to another may take place by applying single- or multiple-qubit unitary operators. The Controlled-NOT (CNOT) gate is fed with a control qubit as well as a target qubit, flipping the quantum state of the target qubit when the control qubit is in the $|1\rangle$ state.
Let us consider an example quantum register associated with two qubits in the all-zero state, as in $|q_1 q_2\rangle = |00\rangle$. Let us initially apply the Hadamard operator H to the first qubit, resulting in $|q_1 q_2\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}} \otimes |0\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |10\rangle)$. Please note that the quantum state of each of the qubits may be described independently of the other qubit, hence after a potential observation of one of the qubits, the quantum state of the unmeasured qubit will remain intact. If we apply the CNOT operator to our example, assuming that $|q_1\rangle$ is the control qubit and $|q_2\rangle$ is the target qubit, we will obtain $\text{CNOT}\,|q_1 q_2\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. The states of the two qubits in the resultant quantum register cannot be described independently of each other, therefore the two qubits are entangled [3]. Any measurement or operation applied to one or more entangled qubits affects the quantum state of the rest of the entangled qubits as well [5].
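The two-qubit example above may be reproduced with plain state vectors. The following is a minimal sketch, assuming the basis ordering |q1 q2⟩ with q1 as the most significant qubit. The helper `is_product_state` is a hypothetical name, which relies on the fact that a two-qubit pure state is separable if and only if its 2 × 2 amplitude matrix has a zero determinant.

```python
import numpy as np

# Computational-basis state |0> and the single-qubit operators.
ket0 = np.array([1.0, 0.0])
HADAMARD = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
IDENTITY = np.eye(2)

# CNOT with q1 as the control and q2 as the target, in the basis |00>, |01>, |10>, |11>.
CNOT = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0, 0.0]])

def is_product_state(state, tol=1e-12):
    """True if the two-qubit pure state can be written as |q1> (x) |q2>."""
    return abs(np.linalg.det(state.reshape(2, 2))) < tol

register = np.kron(ket0, ket0)                      # |00>
after_h = np.kron(HADAMARD, IDENTITY) @ register    # (|00> + |10>) / sqrt(2)
bell = CNOT @ after_h                               # (|00> + |11>) / sqrt(2)

print(after_h, is_product_state(after_h))
print(bell, is_product_state(bell))
```

The state after the Hadamard operator is still separable, whereas the state after the CNOT gate is not, which is the entanglement referred to in the text.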

B. GROVER's QUANTUM SEARCH ALGORITHM
Grover's QSA [8], [9] succeeds in finding the position $x_s$ of a known value δ in an unsorted database f(x) of size N, so that $f(x_s) = δ$, with a success probability of ∼100% after only $O(\sqrt{N})$ evaluations of the function, compared to the optimal classical ''brute-force'' algorithm, which would require N/2 function evaluations on average. Grover's QSA and the quantum algorithms that were constructed from it may be employed in optimization problems by exploiting their parallel search capabilities. For Grover's QSA to succeed, we have to know a priori
• the size N of the database,
• the value δ we are looking for,
• as well as the number of times S that δ appears in the database.
Even though estimating the size N of the database may be straightforward in optimization problems, frequently we have no knowledge about the value δ sought, or of the number of times S that it is included in the database.
Grover's QSA employs a methodology termed as amplitude amplification [42], which may be divided into the following steps.
1) Firstly, Grover's QSA initiates an equiprobable superposition of $n = \log_2(N)$ qubits, as in $|\psi\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle$.
2) Grover's QSA tries to ''shift'' this equiprobable superposition of states towards the solution states by using unitary operators, so that the initial equiprobable superposition of all states of (19) eventually evolves to a superposition of just the solution states, which have the highest possible probability to be observed. In order to achieve this, the Oracle operator O is applied to the quantum state $|\psi\rangle$. The Oracle operator includes the parallel evaluation of the function f and marks the specific quantum state(s) $|x_s\rangle$ that correspond to the sought value δ by flipping their signs. In other words, if $f(x_j) = δ$, then $O|x_j\rangle \rightarrow -|x_j\rangle$.
3) After the application of the Oracle, the diffusion operator [9] is applied, which essentially mirrors the amplitudes of each superimposed quantum state with respect to the overall average value of the quantum amplitudes. A single application of the Oracle operator O followed by the diffusion operator constitutes the so-called Grover operator G.
4) After applying Grover's operator G a total of $L_{opt}$ consecutive times to the initial superposition of states $|\psi\rangle$ of (19), where we have [8] $L_{opt} = \lfloor \frac{\pi}{4} \sqrt{N/S} \rfloor$, we observe the resultant state $G^{L_{opt}}|\psi\rangle$. The probability of observing a wanted quantum state $|x_s\rangle$, where $f(x_s) = δ$, is equal to [8] $P_{success} = \sin^2\left[(2L_{opt} + 1)\,\theta\right]$ with $\theta = \arcsin\sqrt{S/N}$, which is equal to 100% when N/S = 4 and greater than 99% when N/S ≥ 32 [3].
It should be noted here that for simplicity, we will often represent a multi-qubit quantum state in the form of its decimal representation instead of its binary one. For example, the five-qubit quantum state $|11001\rangle$ may also be written as $|25\rangle$.
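The amplitude amplification steps above may be simulated classically on the vector of N amplitudes. The sketch below is only illustrative: the function names are hypothetical, the Oracle is replaced by a classically precomputed mask of solution indices, and the diffusion operator is implemented as the inversion of each amplitude about the mean amplitude. The iteration count uses the standard expression ⌊(π/4)√(N/S)⌋.

```python
import numpy as np

def grover_iterations(n_items, n_solutions):
    """Standard number of Grover iterations, floor(pi/4 * sqrt(N/S))."""
    return int(np.floor(np.pi / 4 * np.sqrt(n_items / n_solutions)))

def grover_success_probability(n_items, solutions, n_iter):
    """Probability of observing a solution after n_iter Grover iterations."""
    amp = np.full(n_items, 1.0 / np.sqrt(n_items))   # equiprobable superposition
    marked = np.zeros(n_items, dtype=bool)
    marked[list(solutions)] = True                   # stands in for the Oracle's knowledge
    for _ in range(n_iter):
        amp[marked] *= -1.0                          # Oracle: flip the sign of the solutions
        amp = 2.0 * amp.mean() - amp                 # diffusion: inversion about the mean
    return float(np.sum(amp[marked] ** 2))

if __name__ == "__main__":
    for n, s in [(4, 1), (32, 1), (1024, 1)]:
        l_opt = grover_iterations(n, s)
        p = grover_success_probability(n, range(s), l_opt)
        print(f"N={n}, S={s}: L_opt={l_opt}, P_success={p:.5f}")
```

For N/S = 4 a single Grover iteration suffices and the solution is observed with certainty, in agreement with the closed-form success probability.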

C. BOYER-BRASSARD-HØYER-TAPP ALGORITHM
When the number of solutions S is not known a priori, but the sought value δ is known, the BBHT QSA [11] may be used for the search problem. Since without the knowledge of S we are unable to calculate the optimal number of Grover operator applications based on (16), a specifically structured trial-and-error methodology is adopted, which has been shown to solve the search problem after a maximum of $4.5\sqrt{N}$ Grover operator applications. This methodology includes the application of Grover's operator a pseudo-random number of times, before observing the resultant state. Upon checking the quantum state $|x_j\rangle$ obtained after the observation, by evaluating the function $f(x_j)$, we may conclude whether it is one of the solution states, i.e. whether $x_s = x_j$. If it is not, another equiprobable superposition of states is constructed and the Grover operator is applied another pseudo-random number of times, as stated in [11]. The BBHT QSA uses Grover's operator in the same way as Grover's QSA, with the only difference being the number of applications of this operator to the initial equiprobable superposition of states, before observing it.
The algorithmic steps of the BBHT QSA are as follows:
1) Firstly, the BBHT QSA initiates an equiprobable superposition of $n = \log_2(N)$ qubits, as in $|\psi\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle$.
2) Then, Grover's operator of Section IV-B is applied a pseudo-random number of times m, since $L_{opt}$ of (16) cannot be calculated, due to the fact that the number of solutions S is unknown.
3) The resultant superposition of states is observed. If the observed state is not a solution, we start from the first step again, with the only difference that the pseudo-random number of Grover operator applications m will be chosen from a different pool of numbers. The number of times m that Grover's operator will be applied is selected from a carefully created pool of numbers, which depends on the number of hitherto failed trials. The construction of that pool guarantees that, if it exists, a solution will have been found after no more than $4.5\sqrt{N}$ database queries.
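The trial-and-error procedure above may be sketched with a classical state-vector simulation. This is an illustrative sketch rather than the exact procedure of [11]: the growth factor `lam = 6/5` and the rule of drawing the number of Grover iterations uniformly from the integers below a growing bound follow the commonly cited description of the BBHT algorithm, while the function names, the classically precomputed solution mask and the 4.5√N time-out check are assumptions made for illustration.

```python
import numpy as np

def grover_measure(n_items, marked, n_iter, rng):
    """Apply n_iter Grover iterations to the uniform state and observe it."""
    amp = np.full(n_items, 1.0 / np.sqrt(n_items))
    for _ in range(n_iter):
        amp[marked] *= -1.0              # Oracle
        amp = 2.0 * amp.mean() - amp     # diffusion
    prob = amp ** 2
    return int(rng.choice(n_items, p=prob / prob.sum()))

def bbht_search(f, delta, n_items, rng, lam=6 / 5):
    """Search for an index x with f(x) == delta when S is unknown.

    Returns (index or None, number of Grover operator applications).
    """
    # Classical stand-in for the Oracle's marking of the solution states.
    marked = np.array([f(x) == delta for x in range(n_items)])
    bound, grover_calls = 1.0, 0
    limit = 4.5 * np.sqrt(n_items)       # time-out used in the text
    while grover_calls <= limit:
        n_iter = int(rng.integers(0, int(np.ceil(bound))))  # pseudo-random count
        grover_calls += n_iter
        x = grover_measure(n_items, marked, n_iter, rng)
        if f(x) == delta:                # classical check of the observed state
            return x, grover_calls
        bound = min(lam * bound, np.sqrt(n_items))  # enlarge the pool after a failure
    return None, grover_calls

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    database = rng.integers(0, 8, size=64)
    print(bbht_search(lambda x: database[x], database[17], 64, rng))
```

Since each trial draws fewer than √N iterations, the total number of Grover operator applications in this sketch cannot exceed the time-out by more than √N.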

D. DÜRR-HØYER ALGORITHM
The DHA [12] is a quantum search algorithm that finds the index of the minimum or the maximum value of a function with ∼100% probability of success after a maximum of 22.5 √ N applications of Grover's operator. The procedure is similar to the BBHT QSA, apart from a pair of differences. Firstly, the DHA does not assume any prior knowledge about the value δ min or δ max that corresponds to the indices x min and x max , so that f (x min ) = δ min or f (x max ) = δ max . This is also the case in our MUD problem, since there is no prior information about the maximum value of the CF of (10). Secondly, the Oracle that is used in the DHA operates in a different way from that of Grover's QSA or the BBHT QSA.
The DHA's steps may be described at a high level as follows:
1) Similarly to Grover's QSA and the BBHT QSA, the DHA also initializes an equiprobable superposition of $n = \log_2(N)$ qubits, as in $|\psi\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle$. Additionally, the DHA has an extra input to the quantum circuit. That extra input is a single quantum state, which acts as an initial ''guess'' of the solution of our search problem. Focusing on finding the maximum of a CF, as we discussed in [17], the closer the CF value of the initial quantum state is to the maximum CF value, the faster the DHA solves the optimization problem on average. This is the reason why in [17] we proposed the employment of the linear MMSE detector's output as the initial state of the DHA, which will serve as a better initial guess than a randomly selected multi-level symbol, even in rank-deficient scenarios. Let us denote the CF value of the initial quantum state by $δ_i$.
2) Since now we know the sought value $δ_i$, but the number of states that have a CF value higher than $δ_i$ is still unknown, we employ the BBHT QSA for finding a quantum state that corresponds to a higher CF value than $δ_i$. In its quest for finding the position of the maximum value of the CF in the database, the Oracle of the DHA-employed BBHT QSA will only mark as solutions the quantum states that are ''better'' than the already found best solution. In other words, the Oracle of the DHA will only mark as solutions the particular states x that satisfy $f(x) > δ_i$.
3) Once a quantum state $x_s$ is observed at the end of a BBHT QSA, its CF value $f(x_s)$ is classically compared to $δ_i$. If we have $f(x_s) > δ_i$, then we set $δ_i = f(x_s)$ and another BBHT QSA is employed for the updated $δ_i$, as described in Step 2. This process is repeated as long as $δ_i$ continues to be updated by the CF value of a ''better'' quantum state.
The DHA concludes that it has found the position of the maximum value in the database, when the BBHT sub-process finishes after O( √ N ) number of Grover applications without providing a better quantum state than the best already found one. Then, we conclude that the last updated δ i is the maximum entry in the database and its corresponding state x max is the index that maximizes the CF. The best case scenario corresponds to the case where the initial multi-level symbol, which is used as an input to the DHA, happens to be x max . In that case, the BBHT sub-process will not find a better symbol than that, but will continue searching nonetheless until it times out after 4.5 √ N Grover iterations, which is also the minimum number of Grover operator applications for the DHA.
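The DHA procedure described above may be sketched as an outer loop around a BBHT-style search, again using a classical state-vector simulation. This is an illustrative sketch under assumptions: the CF values are passed in as a precomputed array, the names `dha_maximize` and `measure_after_grover` are hypothetical, the growth factor 6/5 is the commonly cited BBHT choice, and each BBHT sub-process is taken to time out once its Grover operator applications exceed 4.5√N, as stated in the text.

```python
import numpy as np

def measure_after_grover(marked, n_iter, rng):
    """Apply n_iter Grover iterations to the uniform state and observe it."""
    n = marked.size
    amp = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(n_iter):
        amp[marked] *= -1.0              # Oracle marks the states with f(x) > delta_i
        amp = 2.0 * amp.mean() - amp     # diffusion
    prob = amp ** 2
    return int(rng.choice(n, p=prob / prob.sum()))

def dha_maximize(cf_values, x_init, rng, lam=6 / 5):
    """Return (index of the best CF value found, total Grover applications)."""
    cf = np.asarray(cf_values, dtype=float)
    n = cf.size
    best, delta = x_init, cf[x_init]     # initial guess, e.g. a linear detector's output
    total_calls = 0
    improved = True
    while improved:                      # one BBHT sub-process per pass
        improved = False
        bound, calls = 1.0, 0
        marked = cf > delta              # only ''better'' states are solutions
        while calls <= 4.5 * np.sqrt(n): # BBHT time-out
            n_iter = int(rng.integers(0, int(np.ceil(bound))))
            calls += n_iter
            x = measure_after_grover(marked, n_iter, rng)
            if cf[x] > delta:            # classical comparison with delta_i
                best, delta, improved = x, cf[x], True
                break
            bound = min(lam * bound, np.sqrt(n))
        total_calls += calls
    return best, total_calls

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    cf = rng.random(256)
    best, calls = dha_maximize(cf, 0, rng)
    print(best, int(np.argmax(cf)), calls)
```

When the initial guess already maximizes the CF, no state is marked and the single BBHT sub-process simply runs until its time-out, which corresponds to the best-case complexity discussed above.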
A single Grover operator application evaluates the CF of (10) in the quantum domain, using a quantum circuit and operating on qubits. The BBHT QSA and the DHA also rely on CFEs taking place in the classical domain after observing the resultant quantum states. Since the actual complexity of a single CFE in the quantum domain and that of a single CFE in the classical domain will depend on the specific technology used for building the quantum circuits, we will continue by assuming that a single CFE in the quantum domain requires the same complexity as a single CFE in the classical domain [7], [8], [11], [12], [42].

E. DHA-MUA-FKT QMUD
Let us investigate the operation of the DHA-MUA-FKT QMUD [16] in the context of a U = 3-user scenario, where each of the users transmits 8-ary Pulse Amplitude Modulation (8-PAM) symbols. The constellation of the multi-level symbols is depicted in Fig. 7. The soft-input soft-output QMUD with the multi-input approximation calculates the a posteriori bit-based LLR values by using the CF values of multiple symbols. The general flowchart of the DHA-MUA-FKT QMUD is illustrated in Fig. 8. The first DHA search is performed over the half of the multi-level symbol constellation, in which the first user's first bit is equal to 0. The aim of the DHA search is to find the specific symbol that maximizes the CF of (10). Figure 9a depicts the part of the full constellation that is searched during this first DHA search of the DHA-MUA-FKT QMUD (blue circles), the symbols that were evaluated by the DHA in the classical domain (orange dots), as well as the symbol that maximized the CF (green dot). Figure 9a describes Step 1 of Fig. 8. It should be noted that the symbols for which the CF value was calculated by the DHA in the classical domain are stored in the memory and hence are readily available for the calculation of the bit-based LLR of that specific bit. Moreover, since the Forward Knowledge Transfer (FKT) modification of [15] is applied in this QMUD, the symbols that are evaluated by the DHA for a specific bit will also be stored in the memory allocated for the symbols of the subsequent bits of a multi-level symbol. For example, in our scenario, the orange symbols of Fig. 9a will also be used for the calculation of the bit-based LLR of the 1st user's 2nd bit, the 1st user's 3rd bit, the 2nd user's 1st bit, and so on. This is encapsulated in Step 5 of the flowchart in Fig. 8 [16].
After the first quantum search, based on Step 2 of the flowchart in Fig. 8, the DHA is employed again for the other half of the multi-level symbol constellation, corresponding to the symbols that have the first user's first bit equal to 1, as illustrated in Fig. 9b. Once again, all the symbols that were evaluated in the classical domain, along with their already computed CF values, are stored for the calculation of the appropriate bit-based LLRs based on the FKT modification of Step 5 in Fig. 8 [15]. By observing Fig. 9a and Fig. 9b, we may conclude that the two DHA searches have jointly searched the full multi-level symbol constellation. Therefore, in order to find the globally best symbol, a comparison between the two best found symbols of the two DHA searches would suffice. This is stated in Step 3 of the flowchart in Fig. 8, at which point the globally best symbol $x_{max}$ is found.
In our scenario, the symbol $x_{max} = [0.22, −0.22, 0.22]$ was found as the one that was most likely to have been transmitted. The bit-based representation of that symbol is $b_{x_{max}} = [0, 1, 0, 0, 1, 1, 0, 1, 0]$. This means that if we employ a DHA search on the specific part of the full constellation that corresponds to the second user's first bit (the i = 4th bit of the multi-level symbol) being equal to 0, we should expect that the best found symbol will be $x_{max}$, since $b_{x_{max}}(4) = 0$. Therefore, for i > 1 in Fig. 8, there is no reason for employing a futile search over the particular parts of the databases that include $x_{max}$. This is the reason why only a single DHA search is performed for the rest of the bits during Step 4 in Fig. 8, more specifically for the bit value that is not equal to that of the globally best symbol. Figures 9c, 9d, 9e and 9f show the outcome for some of the subsequent DHA searches. For example, since the best guess for the first user's second bit was $b_{x_{max}}(2) = 1$, the DHA was only employed for the entries having $b_1^{(2)} = 0$ in Fig. 9c in order to approach its MAP-based LLR value. The evaluated symbols of these DHA searches will still continue to be stored in the databases of the hitherto not yet searched bits, due to the FKT modification.
After the DHA search(es) of a bit is/are completed, there is a ∼100% probability that the best symbol for both of that bit's binary values has been found, therefore we may proceed with the calculation of only that bit's LLR, as depicted in Step 6 of Fig. 8, which is expected to be close to that of the optimal MAP MUD. The MAP MUD evaluates the CF for all the legitimate multi-level symbols of Fig. 7. The complexity of the DHA-MUA-FKT QMUD becomes lower than that of the MAP MUD when there are at least U = 12 users transmitting BPSK symbols, or in other words, when the multi-level symbol consists of at least 12 bits. In the scenario of Fig. 9, the overall complexity of the DHA-MUA-FKT QMUD is higher than that of the MAP MUD, owing to the associated number of Grover operator applications, which are not visible in this figure. Steps 4-6 of Fig. 8 are repeated for each bit that participates in the multi-level symbol.
The DHA search operation employed by the DHA-MUA-FKT QMUD has been shown to converge to the maximum of each of the databases employed [12]. Therefore, the DHA-MUA-FKT QMUD manages to estimate the bit-based LLRs by using the most likely transmitted symbols. Moreover, it also uses additional symbols, which were found during the DHA search, in order to provide a better estimate of the LLRs.

V. SIMULATION RESULTS
The system that we will investigate supports U = 12 users transmitting over the Extended Vehicular A (EVA) channels [43]. The number of subcarriers per OFDM symbol is equal to Q = 1024 and each user occupies all the available subcarriers. The number of receive AEs at the BS is equal to P = 4, resulting in a normalized user load of $U_L = U/P = 3$ per subcarrier. Therefore, conventional linear detectors would exhibit an unacceptable detection performance. Again, the reference user may transmit BPSK, QPSK or 16-QAM symbols, based on the near-instantaneous channel quality, while the rest of the users only use BPSK symbols. A recursive systematic convolutional code having a code rate of R = 1/2 and 8 trellis states is employed by all the users, as well as by the video layers of the reference user. The BL of the reference user is initially assumed to have a spreading factor of $SF_1^{(BL)} = 4$ before the bit-based interleaver, for the sake of assigning additional protection to it. The number of OFDM symbols per transmission burst, also termed as a packet, is equal to D = 8. Since each OFDM symbol has Q = 1024 subcarriers, the number of symbols per packet is D · Q = 8192. Therefore, the bit-based interleavers of the reference user's BL and EL1, as well as those of the rest of the users, have an equal length of 8192 bits, since each of these bit streams maps a single bit to each transmitted symbol per packet. On the other hand, when the reference user transmits 16-QAM symbols, the bit-based interleaver length of the EL2 is equal to 16 384, since two bits of a 16-QAM symbol belong to the EL2, according to Table 2.
The mobile is assumed to have a velocity of v = 30 km/h, which in combination with a carrier frequency of $f_c = 2.5$ GHz and a sampling frequency of $f_s = 15.36$ MHz results in a normalized Doppler frequency of $f_d = v \cdot f_c / (c \cdot f_s) = 4.52 \cdot 10^{-6}$. The effective normalized Doppler frequency [13] is equal to $F_d = Q \cdot f_d = 0.0046$. When shadowing [41] is also considered, the slow-fading lognormal channel is assumed to be fading ten times slower than the correlated Rayleigh-distributed fast-fading channels, hence its associated Doppler frequency is equal to $f_d^{(slow)} = 4.52 \cdot 10^{-7}$. The system parameters are gathered in Table 3.
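The parameter values quoted above follow from simple arithmetic, which may be checked with the short sketch below. The variable and function names are chosen purely for illustration, and the speed of light is rounded to 3 · 10^8 m/s, which is an assumption about the constant used.

```python
SPEED_OF_LIGHT = 3.0e8  # m/s, rounded value assumed here

def normalized_doppler(v_kmh, f_c_hz, f_s_hz, c=SPEED_OF_LIGHT):
    """Normalized Doppler frequency f_d = v * f_c / (c * f_s)."""
    v_ms = v_kmh / 3.6  # km/h -> m/s
    return v_ms * f_c_hz / (c * f_s_hz)

U, P, Q, D = 12, 4, 1024, 8       # users, receive AEs, subcarriers, OFDM symbols per packet

user_load = U / P                 # normalized user load per subcarrier
symbols_per_packet = D * Q        # symbols conveyed by each packet

f_d = normalized_doppler(30.0, 2.5e9, 15.36e6)
F_d = Q * f_d                     # effective normalized Doppler frequency
f_d_slow = f_d / 10.0             # shadowing fades ten times slower

print(f"user load            = {user_load}")
print(f"symbols per packet   = {symbols_per_packet}")
print(f"f_d                  = {f_d:.3e}")
print(f"F_d                  = {F_d:.4f}")
print(f"f_d (slow fading)    = {f_d_slow:.3e}")
```

The same style of bookkeeping may be used for checking the interleaver lengths, for example 2 · D · Q = 16 384 bits for the EL2 in the 16-QAM mode.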
In our simulations, the video format YUV 4:2:0 is used for 176 × 144-pixel frames at 30 frames per second (FPS). Three layers per frame are generated by the 8 bits per pixel in the SHVC video CODEC employed. The resultant bit-rate became 668.8 Kbps, while in the case of a lost frame, the most recent correctly received frame is copied. The parameters used for the video CODEC in our systems are summarized in Table 4. The reason we opted for a relatively low resolution is mainly the complexity of simulating quantum algorithms with the aid of classical computers, since we do not have access to a true quantum computer. Even though a quantum processor running the QMUD would require a lower number of cost-function evaluations than classical MUDs, when we simulate the operations of a quantum computer using classical processors, the associated simulation time becomes higher than that of the classical full search. Based on the system parameters of Table 3, as well as on the video transmission specifications of Section III-G, this scenario requires the transmission of 63 packets for conveying a 1-second video instance to the BS. Finally, we have assumed that the reference user transmits a packet every 10 time slots, which has an impact on both the time-domain correlation of the shadowing and on that of the fast-fading channels. Specifically, the 10 time slot interval between transmissions, in conjunction with the normalized Doppler frequencies of the shadowing and of the fast-fading channels of Table 3, results in the fast-fading channels of two consecutive transmitted packets being uncorrelated, while the correlation remains pronounced in the slow-fading channel envelope.

A. IMPORTANCE-FIRST VERSUS FAIRNESS-FIRST BIT-MAPPING
1) IMPORTANCE-FIRST BIT-MAPPING
Let us commence by investigating a scenario where the bit-to-symbol mapping takes place based on the importance of each video layer. More specifically, in the importance-first bit mapping, the bits of the BL occupy the most protected bit position in the symbol. In a similar manner, the bits of EL1 are allocated to the second-best protected bit of the symbol, when QPSK symbols are used. Finally, when the 16-QAM scheme is adopted, the BL's bits are allocated to the most protected bit position, the EL1's bits are mapped to the second-best protected bit position and the two EL2 bits occupy the remaining two bit positions. This method of mapping is summarized in Table 5, which results in unequal video protection. Figure 10 represents the PSNR versus channel SNR curves of the three fixed modulation schemes, along with that of the AQAM scheme. The thresholds of Table 5 were found based on our BER simulations.
FIGURE 10. PSNR performance of the fixed modulation schemes, as well as of the AQAM scheme without the presence of shadowing, with respect to the channel SNR, when the importance-first bit-to-symbol mapping of Table 5 is used. The channel SNR regions, in which a specific modulation mode of the adaptive modulation will be selected, are marked with vertical dashed lines. The system's parameters are summarized in Table 3.
Still referring to Fig. 10, the additional PSNR improvement of flawlessly detecting EL1 is about 6 dB. Similarly, the 8.5 dB channel SNR gap between the BL and the EL1 in the 16-QAM scenario is expected to have a similarly beneficial effect on its PSNR versus channel SNR performance, as confirmed by the extra 5 dB PSNR benefit of 16-QAM over QPSK. By observing the PSNR curves of the QPSK and 16-QAM schemes, we may confirm that their PSNR performance is similar to that of the BPSK scheme, until their corresponding BER performance of EL1 reaches the BER point required by the video decoder for improving the PSNR. This effect is more pronounced for the 16-QAM scheme, since the channel SNR gap between the BL and the EL1 in our BER simulations is wider. The PSNR curve of the AQAM system shows the effect of the ''no Tx'' range, which degrades the system's PSNR performance for low SNR values. When the channel SNR is high enough for making the probability of the instantaneous SNR lying in the ''no Tx'' range negligible, the PSNR curve of the AQAM scenario tends to follow the PSNR curve of the specific modulation mode selected at the specific SNR value. The PSNR performance of all investigated scenarios may be considered equivalent for the DHA-MUA-FKT QMUD and the MAP MUD, with the DHA-MUA-FKT QMUD having the extra benefit of achieving the same quality at a lower number of CFEs. As it transpires from Fig. 10, when AQAM is used, the system is capable of unimpaired video communications over a wide range of channel SNRs. The importance-first bit-mapping may be used in all systems, regardless of the choice of modulation modes and of the number of video layers.

2) FAIRNESS-FIRST BIT-MAPPING
For fixing this irregularity of the PSNR performance, the fairness-first bit mapping of Table 5 focuses on mapping the video layer bits to a symbol in such an order that the resultant BER performances of the video layers are similar. In order to achieve this, the order of bit-mapping of the QPSK symbols is swapped with respect to that of the importance-first bit mapping, while that of the 16-QAM symbols is also altered.
FIGURE 11. BER performance of the reference user's video layers with respect to the channel SNR and the fixed modulation scheme used, when the fairness-first bit-to-symbol mapping of Table 5 is used. The channel SNR regions, in which a specific modulation mode of the adaptive modulation will be selected, are marked with vertical dashed lines. The system's parameters are summarized in Table 3.
The BER curves that correspond to the fairness-first bit-mapping are shown in Fig. 11. Let us initially note that once again the performance of the DHA-MUA-FKT QMUD is near-optimal with respect to that of the MAP MUD, while requiring a lower complexity. Based on Fig. 11, we may find the threshold SNR values for the three transmission regions that employ different modulation schemes, based on the methodology of Section III-B. As stated in Table 5, we have $SNR_{BPSK} = −5.37$ dB, as in the case of the importance-first bit mapping, since nothing has changed for the BPSK symbol transmission, $SNR_{QPSK} = −1.24$ dB and $SNR_{16QAM} = 5.19$ dB. Comparing the threshold SNR values when the importance-first and the fairness-first bit-mapping methods are used, it seems that for the former method, the QPSK region starts at higher SNR values and ends at lower SNR values. The SNR difference between the different video layers in the same modulation scheme is very low when the fairness-first bit-mapping method is employed, creating an expectation for the PSNR performance of that system's modulation schemes to start increasing at higher SNR values, but having a higher gradient, reaching their saturation point earlier than in the case when the importance-first bit-mapping is used.
FIGURE 12. PSNR performance of the fixed modulation schemes, as well as of the AQAM scheme without the presence of shadowing, with respect to the channel SNR, when the fairness-first bit-to-symbol mapping of Table 5 is used. The channel SNR regions, in which a specific modulation mode of the adaptive modulation will be selected, are marked with vertical dashed lines. The system's parameters are summarized in Table 3.
The PSNR performance of the three modulation schemes, as well as that of the scenario where adaptive modulation is employed in the absence of shadowing, is depicted in Fig. 12. By comparing Fig. 10 to Fig. 12, we may verify that in the case of the fixed QPSK and 16-QAM schemes, the PSNR curves in the latter figure start increasing at higher SNR values, but reach their PSNR ceiling at lower SNR values. Moreover, since all the video layers in a single modulation scheme exhibit similar BER performance, as illustrated in Fig. 11, the PSNR performance does not converge to that of BPSK; instead, it rises straight to the maximum PSNR value that corresponds to that modulation scheme. We note that for video decoding, it is important to reach the maximum possible PSNR value of a modulation scheme at the lowest possible SNR values. Hence, we believe that the fairness-first bit-mapping is more suitable for video transmission. It should be noted that the specific bit arrangement for the fairness-first bit-mapping method of Table 5 corresponds only to our scenario and it also depends on the coding rate adopted by each video layer, prior to the modulation.
As for the PSNR performance of the system, when AQAM is employed in the absence of shadowing in Fig. 12, the trend observed is similar to that of Fig. 10. The PSNR curve of the AQAM scheme follows the PSNR performance of the selected modulation scheme in each channel SNR region, with the only deviations taking place at the SNR-borders of the region, due to the instantaneous channel SNR crossing the border for some packets, making them transmit using either the lower-order or the higher-order modulation scheme. Given the video CODEC employed, which was analysed in Section II, transmitting with a lower-order modulation scheme at a given average channel SNR value degrades the resultant PSNR performance more dramatically than transmitting with an excessive-throughput higher-order modulation scheme. This is the reason why the PSNR curve of the AQAM scheme in Fig. 12 does not closely match the PSNR performance of the fixed modulation scheme at the bottom of its channel SNR range. This fact becomes most evident at an average channel SNR value of −4 dB, where a single ''no TX'' mode occurrence degrades several video frames, resulting in a lower PSNR value than that exhibited by the BPSK scheme. Hence, from this point onwards we will use the fairness-first bit-mapping in our scenarios.
Figure 13 presents a snapshot of the originally transmitted video, as well as four snapshots of the same video frame of the received and decoded video for different values of channel SNR. Again, the fairness-first bit mapping was adopted and adaptive modulation was employed, corresponding to the scenario of Fig. 12. As expected based on Fig. 12, when the channel SNR improves, the PSNR is also enhanced. Figure 13a corresponds to a different video frame than the rest. This is due to the fact that when a video frame is erroneously decoded, the most recent correctly received video frame replaces it. The visual difference between Fig. 13b, which corresponds to the transmission of the BL only, and Fig. 13c, which corresponds to the error-free transmission of the BL and the EL1, is obvious. However, careful inspection is required for spotting the differences between Fig. 13c and Fig. 13d, with the latter corresponding to an error-free reception of all the video layers. Therefore, as expected, Fig. 13d matches the transmitted video frame of Fig. 13e.
Figure 14 depicts the PSNR performance of our system, when AQAM is used in the presence of shadowing using the parameters of Table 3, when the fairness-first bit-mapping is adopted. As we may observe, the PSNR performance of the AQAM-aided system is lower than those of the individual fixed modulation schemes, because the σ = 6 dB standard deviation associated with the slow fading power gain makes the instantaneous SNR cross the switching thresholds, and hence change the modulation scheme, more often than it would in the absence of shadowing. This also explains the fact that the PSNR values of the AQAM scheme in the ''no TX'' region are higher than the minimum ones, since some of the video frames were transmitted due to the power gain of the shadowing. The fact that the PSNR performance of the AQAM scheme is lower than those of the three individual modulation schemes verifies that in our system a potential switch to a lower-order modulation scheme degrades its PSNR performance more drastically than a potential switch to a higher-order modulation scheme would improve it.
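The mode-switching behaviour described above can be illustrated with a minimal sketch, assuming that the shadowing acts as a zero-mean Gaussian offset (in dB) on the average channel SNR and that the mode is chosen per packet by comparing the instantaneous SNR with fixed thresholds. The threshold values in `THRESHOLDS_DB` and the names `select_mode` and `mode_usage` are hypothetical; the thresholds actually used in the paper are derived from its BER curves and are not reproduced here.

```python
import random

# Hypothetical switching thresholds in dB (NOT the paper's values).
THRESHOLDS_DB = [(-3.0, "BPSK"), (3.0, "QPSK"), (9.0, "16-QAM")]

def select_mode(inst_snr_db):
    """Return the highest-order mode whose threshold is met, else 'no TX'."""
    mode = "no TX"
    for threshold_db, name in THRESHOLDS_DB:
        if inst_snr_db >= threshold_db:
            mode = name
    return mode

def mode_usage(avg_snr_db, sigma_db, packets=10000, seed=1):
    """Fraction of packets sent in each mode under Gaussian (in dB) shadowing."""
    rng = random.Random(seed)
    counts = {"no TX": 0, "BPSK": 0, "QPSK": 0, "16-QAM": 0}
    for _ in range(packets):
        # Instantaneous SNR = average SNR plus a shadowing offset in dB.
        inst_snr_db = avg_snr_db + rng.gauss(0.0, sigma_db)
        counts[select_mode(inst_snr_db)] += 1
    return {mode: count / packets for mode, count in counts.items()}
```

With `sigma_db = 0` every packet at a 6 dB average SNR uses QPSK under these assumed thresholds, whereas with `sigma_db = 6` a sizeable fraction of packets falls back to BPSK or to the ''no TX'' mode, which is the kind of downward switching that the text identifies as the dominant source of PSNR loss.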

C. THE COMPLEXITY OF THE MUDs
In all discussed figures, the performance of the DHA-MUA-FKT QMUD may be considered equivalent to that of the MAP MUD for all video layers and modulation schemes, despite the lower complexity of the DHA-MUA-FKT QMUD, as illustrated in Fig. 15. We may observe that as the database size grows due to using a higher-order modulation scheme, the ratio between the QMUD's and the MAP MUD's complexity becomes smaller. Therefore, even though the absolute complexity increases for both MUDs when higher-order modulation schemes are used, the QMUD becomes more complex at a lower rate. The complexity of the DHA-MUA-FKT QMUD in a specific modulation scheme becomes lower, when the channel SNR is increased, because we use the MMSE detector's output as the initial input to the QMUD, as described in Section IV-D. Since the MMSE detector yields improved results when the channel SNR is higher, it reduces the complexity of the QMUD. Focusing on the complexity of the QMUD, when AQAM is used in the absence of shadowing, its complexity follows the trends of the fixed modulation schemes, apart from the transition regions, which are in the vicinity of the threshold SNR values. In those regions the modulation scheme selected changes more often from packet to packet, hence resulting in an average complexity between those of the fixed modulation schemes of the two regions. From the perspective of the absolute complexity, when the system switches to a higher-order modulation scheme, the shrinking transitional regions of the AQAM system's complexity in Fig. 15 indicate that the MAP MUD's complexity in those SNR regions grows more rapidly than the increase of the DHA-MUA-FKT QMUD's complexity. The absolute values of the MUD CFEs are stated in Table 6 for SNR values of −2 dB, 4 dB and 12 dB.

FIGURE 14. PSNR performance of the fixed modulation schemes, as well as of the AQAM scheme in the presence of shadowing, with respect to the channel SNR, when the fairness-first bit-to-symbol mapping of Table 5 is used. The channel SNR regions, in which a specific modulation mode of the adaptive modulation will be selected, are marked with vertical dashed lines. The system's parameters are summarized in Table 3.

FIGURE 15. Complexity of the DHA-MUA-FKT QMUD in the deployed systems, in terms of the percentage of the number of CFEs required by the QMUD over that needed by the MAP MUD, with respect to the channel SNR and the employed modulation scheme. The channel SNR regions, in which a specific modulation mode of the adaptive modulation will be selected, are marked with vertical dashed lines. The system's parameters are summarized in Table 3.

FIGURE 16. PSNR performance of the fixed modulation schemes, as well as of the AQAM scheme without the presence of shadowing, with respect to the channel SNR, when the fairness-first bit-to-symbol mapping of Table 5 is used. The channel SNR regions, in which a specific modulation mode of the adaptive modulation will be selected, are marked with vertical dashed lines. The system's parameters are summarized in Table 3.
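The observation in Section V-C that the QMUD-to-MAP complexity ratio shrinks as the database grows can be illustrated by the following order-of-growth sketch. It assumes that the full-search MAP MUD evaluates all 2^(bits per symbol × users) candidate multi-level symbols and that a Grover-type search needs on the order of √N queries; constant factors, the MMSE-based initial input and the exact CFE accounting of the DHA-MUA-FKT QMUD are deliberately omitted, so the numbers are not those of Table 6. The function names and the choice of four users are hypothetical.

```python
import math

def search_space_size(bits_per_symbol, users):
    """Number of candidate multi-level symbols a full search would evaluate."""
    return 2 ** (bits_per_symbol * users)

def quadratic_speedup_ratio(n):
    """Idealised sqrt(N)/N ratio of quantum-search queries to full-search CFEs."""
    return math.sqrt(n) / n

# Bits per symbol of the three fixed modulation schemes considered.
SCHEMES = {"BPSK": 1, "QPSK": 2, "16-QAM": 4}

def ratios_for(users):
    """Map each scheme to its (database size, idealised complexity ratio)."""
    result = {}
    for name, bits in SCHEMES.items():
        n = search_space_size(bits, users)
        result[name] = (n, quadratic_speedup_ratio(n))
    return result
```

For an assumed load of four users, the idealised ratio falls from 0.25 for BPSK (N = 16) to roughly 0.0039 for 16-QAM (N = 65536): both searches become more expensive in absolute terms, but the full search grows much faster.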

D. THE EFFECT OF THE NUMBER OF USERS
Let us now investigate the behaviour of the system, when different numbers of users are supported. The PSNR versus channel SNR curves of the multi-user systems are close to those of the single-user scenario, as may be seen in Fig. 16. As expected, when the user load is increased for each modulation scheme, a higher channel SNR is required in order to achieve the same PSNR performance. Furthermore, the MAP MUD succeeds in achieving a near-single-user performance even in highly rank-deficient scenarios, given that the necessary complexity is affordable. It should be noted that for clarity, only the fixed modulation-based PSNR curves of the U = 12 scenario are plotted, but for each AQAM-based PSNR curve recorded for the scenarios supporting different numbers of users, the selected SNR thresholds correspond to their own BER performance.

BER performance of the reference user's video layers with respect to the channel SNR and the fixed modulation scheme used, when the fairness-first bit-to-symbol mapping of Table 5 is used in a system supporting U = 12 users in the two scenarios, where the users have been allocated the same interleaving sequences and unique interleaving sequences. The channel SNR regions, in which a specific modulation mode of the adaptive modulation will be selected, are marked with vertical lines, the linestyle of which matches the linestyle of the number of users supported in the system. The system's parameters are summarized in Table 3.

E. THE EFFECT OF UNIQUE INTERLEAVING SEQUENCES
Moreover, for evaluating the effect that the unique interleaving sequences have on the systems investigated, in Fig. 17 we compare the scenario of Fig. 11, where the U = 12 users supported employ unique interleaving sequences, with a scenario, where all the users, as well as the BL and EL1 of the reference user employ the same interleaving sequence. The former system may be considered as an MC-IDMA system, since the unique interleaving sequence of a user may act as that specific user's ''signature'', scrambling the erroneously detected bits, which occur at similar bit positions at the MUD between the users, and mapping them to different bit positions, generating unique behaviours at each user's channel decoder. This is expected to improve the system's overall performance, especially when iterations are allowed between the MUD and the channel decoders. Fig. 17 verifies that when unique interleaving sequences are employed by the users, the performance is at least equivalent to that of the scenario, where the same interleaving sequences are used, while in the low BER regions the power gain may reach 0.5 dB in our setup.
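The ''signature'' effect of the user-specific interleavers can be sketched as follows, assuming seeded pseudo-random permutations as interleavers and an error burst that hits the same positions of every user's detected (still interleaved) stream at the MUD output. The function names, the block length of 64 and the seeds are hypothetical and are not taken from the paper's system parameters.

```python
import random

def make_interleaver(length, seed):
    """Pseudo-random permutation acting as one user's interleaving sequence."""
    perm = list(range(length))
    random.Random(seed).shuffle(perm)
    return perm

def interleave(seq, perm):
    """Output position i carries the input element at position perm[i]."""
    return [seq[p] for p in perm]

def deinterleave(seq, perm):
    """Inverse of interleave(): restore the original ordering."""
    out = [None] * len(seq)
    for i, p in enumerate(perm):
        out[p] = seq[i]
    return out

def errors_seen_by_decoder(mud_error_positions, perm):
    """Map error positions at the MUD output to positions at the channel decoder."""
    return sorted(perm[i] for i in mud_error_positions)

BURST = range(10, 15)  # same erroneous positions for every user at the MUD

shared = make_interleaver(64, seed=0)
same_u1 = errors_seen_by_decoder(BURST, shared)
same_u2 = errors_seen_by_decoder(BURST, shared)

unique_u1 = errors_seen_by_decoder(BURST, make_interleaver(64, seed=1))
unique_u2 = errors_seen_by_decoder(BURST, make_interleaver(64, seed=2))
```

With a shared interleaver, `same_u1` and `same_u2` are identical, so every channel decoder faces the same error pattern; with unique interleavers the patterns differ from user to user, which is what allows the iterations between the MUD and the channel decoders to exchange more diverse information.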

VI. CONCLUSIONS
In this contribution, we presented a system study, showing the suitability of a QMUD in practical communication scenarios, where multiple users are supported and a reference user is transmitting video to the BS. As exemplified in Fig. 11, the QMUD's performance is equivalent to that of the MAP MUD, independently of the bit mapping methodology adopted. At the same time, the complexity of the QMUD quantified in terms of CFEs is lower than that of the MAP MUD as stated in Table 6, while it rises more slowly, when modulation schemes associated with larger constellations are used, as illustrated in Fig. 15. A graphical tutorial for the DHA-MUA-FKT QMUD was provided in Section IV-E, based on a specific multi-user scenario and the graphical representation of its multi-level constellation.
Based on Fig. 12 and Fig. 14, we may conclude that the DHA-MUA-FKT QMUD is also capable of providing iterative detection for systems that employ adaptive modulation. Furthermore, the use of powerful detectors, such as the MAP MUD and the DHA-MUA-FKT QMUD, offers a near-single-user performance in multi-user systems relying on MC-IDMA as exemplified in Fig. 16, since they mitigate the multi-user interference by jointly detecting the users supported, while increasing the throughput of the system, compared to CDMA, TDMA, FDMA, or OFDM-SDMA systems, where each subcarrier is allocated only to a single user. Our future research may consider the transmission of stereoscopic and holographic video.

ACKNOWLEDGMENT
The use of the IRIDIS High Performance Computing Facility at the University of Southampton is also acknowledged.
ZUNAIRA BABAR received the B.Eng. degree in electrical engineering from the National University of Science and Technology, Islamabad, Pakistan, in 2008, and the M.Sc. degree (Hons.) and the Ph.D. degree in wireless communications from the University of Southampton, U.K., in 2011 and 2015, respectively. She is currently a Research Fellow with the Southampton Wireless Group, University of Southampton. Her research interests include quantum error correction codes, channel coding, coded modulation, iterative detection, and cooperative communications.
His research interests include adaptive coded modulation, coded modulation, channel coding, space-time coding, joint source and channel coding, iterative detection, OFDM, MIMO, cooperative communications, distributed coding, quantum error correction codes and joint wireless-and-optical-fiber communications. He is a Chartered Engineer and a fellow of the Higher Education Academy, U.K.
LAJOS HANZO (M'91-SM'92-F'04) received the degree in electronics in 1976, and the Ph.D. degree in 1983. During his 38-year career in telecommunications, he has held various research and academic posts in Hungary, Germany, and the U.K. Since 1986, he has been with the School of Electronics and Computer Science, University of Southampton, U.K., where he holds the Chair in telecommunications. He has successfully supervised about 100 Ph.D. students, co-authored 20 John Wiley/IEEE Press books on mobile radio communications totaling in excess of 10 000 pages, and authored or co-authored 1500+ research entries at IEEE Xplore. He is currently directing a 100-strong academic research team, involved in a range of research projects in the field of wireless multimedia communications sponsored by industry, the Engineering and Physical Sciences Research Council, U.K., the European Research Council's Advanced Fellow Grant, and the Royal Society's Wolfson Research Merit Award. In 2009, he received the honorary doctorate Doctor Honoris Causa from the Technical University of Budapest. He is an enthusiastic supporter of industrial and academic liaison and he offers a range of industrial courses. He has acted both as the TPC and the General Chair of IEEE conferences, presented keynote lectures, and received a number of distinctions.
Dr. Hanzo is a fellow of the Royal Academy of Engineering, the Institution of Engineering and Technology, and the European Association for Signal Processing. He is also the Governor of the IEEE VTS. From 2008 to 2012, he was the Editor-in-Chief of the IEEE Press and also a Chaired Professor with Tsinghua University, Beijing. He has 22,000+ citations. VOLUME 5, 2017