1 Introduction

1.1 Error correction for Ring-LWE-based public key encryption

As the world’s top tech companies and research labs compete in the race to build a quantum computer, real-world public key cryptography, such as digital signatures, public key encryption (PKE), and key exchange protocols, must be made quantum resistant. The ring learning with errors (RLWE) problem was introduced in [20] in 2010, expanding on the classical version of the learning with errors problem (LWE) introduced by Regev in [30]. Since then, cryptography based on the RLWE problem has become one of the most attractive post-quantum candidates. Its security relies on the worst-case approximate shortest independent vector problem (SIVP) on ideal lattices, and it offers better efficiency than plain LWE because of the ring structure. Many of the prominent submissions to the National Institute of Standards and Technology’s (NIST) call for proposals [26], for example NewHope [4] and LAC [19], are based on RLWE. Though neither of the two advanced to NIST’s third round, academic and industrial research on RLWE-based cryptography and its applications continues. In this work, we focus on the issue of error correction for RLWE-based public key encryption.

Among the RLWE-based public key exchange protocols, there are essentially two major approaches to the problem of sharing a session key that protects the communication payload: the reconciliation approach of [10] and the encryption approach of [3]. In the first approach, both participants agree on a shared value from some pseudorandom signals with the help of a robust extractor. This work focuses on the other approach, which resembles the compact RLWE public key encryption scheme proposed in [21]. Taking NewHope as an example, the binary secret to be shared is encoded using a repetition code, mapped to \(\{0,\lfloor q/2\rfloor\}^{n}\) and then wrapped by an encryption function. A residue noise term remains after decryption. Upon getting a decrypted codeword, the decoder sums up the symbols corresponding to the repeated digits and infers whether the digit should be 0 or 1 according to a threshold. Taking a telecommunication system as an analogy, this process is exactly a hard-decision decoding process, which cannot offer the optimal decoding performance. Decreasing the decryption failure rate (DFR) is believed to be of vital importance to RLWE-based PKE. Firstly, if we seek chosen-ciphertext attack (CCA) security of the above cryptosystem using the classical Fujisaki-Okamoto transform [14], a very low DFR is required, and the NIST standardization targets a failure rate lower than \(2^{-128}\). Secondly, more capable error correction allows larger error terms in RLWE, increasing the hardness of the underlying lattice problem and therefore the security of the cryptosystem.
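
As a concrete illustration of this hard-decision rule, the following sketch (in Python, with an illustrative repetition factor and a toy residue noise, not the exact NewHope parameters or noise distribution) repeats one bit on the constellation {0,⌊q/2⌋} and decodes it by comparing accumulated distances to the two constellation points.

```python
import numpy as np

q = 12289        # NewHope-like modulus (illustrative)
reps = 4         # illustrative repetition factor
rng = np.random.default_rng(0)

def encode_bit(b):
    # map the bit to {0, floor(q/2)} and repeat it `reps` times
    return np.full(reps, b * (q // 2), dtype=np.int64)

def hard_decision_decode(symbols):
    # accumulate, over the repeated positions, the distances to the two
    # constellation points 0 and floor(q/2), then threshold
    v = symbols % q
    d0 = np.minimum(v, q - v).sum()        # total distance to 0
    d1 = np.abs(v - q // 2).sum()          # total distance to floor(q/2)
    return int(d1 < d0)

bit = 1
noisy = encode_bit(bit) + rng.integers(-q // 8, q // 8, size=reps)  # toy residue noise
assert hard_decision_decode(noisy) == bit
```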

To improve the error correction and security of RLWE-based PKE, some researchers have exploited the properties of multidimensional lattices. For example, Leech lattice encoding and decoding are used in LWE-based PKE [28]. The Leech lattice gives the densest sphere packing of its dimension, which means a good trade-off between information transmission rate and error correction capability. An alternative way to decrease the DFR is to apply error-correcting codes (ECC). In [13], Fritzmann et al. considered how much the RLWE-based PKE protocol NewHope Simple could profit from Bose–Chaudhuri–Hocquenghem (BCH) codes, low-density parity-check (LDPC) codes, and a hybrid of the two regarding the DFR. They achieved a DFR of \(2^{-140}\) using these codes, but their decoding algorithms were not intrinsically constant-time, though this defect could likely be managed with proper measures (e.g., fixing the number of iterations). In an independent line of work, Saarinen designed a linear block code called XE5 and implemented it in an RLWE-based PKE scheme called Hila5 [31]. This method is able to share 256 bits of message and an additional 240 bits of redundancy at a DFR below \(2^{-128}\). The decoding algorithm runs in constant time, which provides resilience to timing-based side-channel attacks.

How to deal with the dependency existing in the residue noise term of RLWE-based PKE is closely related to the soundness of DFR estimation [9, 13, 32]. For example, in the integer ring of a cyclotomic field \(\mathbb {Z}_{q}[X]/(X^{n}+1)\), the multiplication, denoted by ⋅, of two ring elements results in a polynomial in the ring with correlated coefficients. In the case of RLWE-based PKE, the residue noise \(e\cdot t-s\cdot e_{1}+e_{2}\) (which will appear in Section 2.2) has correlated coefficients, such that an incorrect decryption of one bit may increase the probability of decryption error of other bits. As a result, the DFR estimation will be inaccurate if we assume the noise term has independent and identically distributed (i.i.d.) coefficients [9]. Moreover, advanced decoding algorithms (e.g., soft-decision decoding) presume an i.i.d. channel. That is why we seek an i.i.d. noise model. We have found a few “independence” assumptions in the literature. Fritzmann et al. gave upper bounds on the DFR of their error-correcting codes assuming that the residue noise can be treated as independent [13]. They improved the bit-security of NewHope to 309 bits for n = 1024, q = 12289, targeting a DFR of \(2^{-140}\). D’Anvers et al. assumed the residue noise has independent coefficients conditional on its norm. This method was used to deal with ternary error terms, but it is impractical for true discrete Gaussian errors [9]. Song et al. formulated NewHope as a digital communication system and resolved part of the dependency. They improved the bit-security to 252 bits (n = 1024, q = 12289), also targeting a DFR of \(2^{-140}\) [32].

1.2 Originality and contribution

Distinguished from existing error-correcting schemes for RLWE-based PKE in the literature, the originality of the proposed polar coding scheme is threefold.

1. Firstly, we take advantage of the fact that the secret s and e in the residue noise \(e\cdot t-s\cdot e_{1}+e_{2}\) are known by Alice at the decryption stage. This is leveraged to improve decoding performance. To this end, this work formalizes the mathematical model of RLWE PKE as a fading channel with channel state information (CSI) available. Existing works (e.g., [32]) also treat RLWE PKE as a telecommunication system, but they do not exploit the knowledge about s and e, which are viewed as CSI in this work.

2. Secondly, we resolve the correlation between the coefficients of the residue noise using the canonical embedding, under which polynomial multiplications become coordinate-wise multiplications, and we derive an i.i.d. channel model. This allows us to carry out soft-decision decoding and accurate DFR estimation.

3. In addition to providing an error-correcting approach, polar codes exhibit some salient features. Compared with BCH and LDPC codes, polar codes provide competitive and well-understood decoding performance limits measured by the Bhattacharyya parameter. Moreover, their decoding is not affected by error floors [23]. Besides, the encoding and decoding of polar codes are isochronous in the sense that the timings of encoding and decoding are independent of the secret and the plaintext.

The contributions of this paper are summarized as follows.

1. We formulate the RLWE-based PKE as an i.i.d. fading channel with CSI available to the receiver, without any “independence” assumptions. These are the prerequisites of the proposed polar coding scheme.

   (a) As explained earlier in this section, the coefficient correlation of the residue noise term \(e\cdot t-s\cdot e_{1}+e_{2}\) is resolved by the canonical embedding, leading to an i.i.d. channel model. We view e and s as CSI, which are known by Alice at the decryption stage, whilst Bob on the other side only knows their distribution.

   (b) Taking a telecommunication system as an analogy, mapping a single bit 0 or 1 of the plaintext to a symbol on the constellation \(\{0,\lfloor q/2\rfloor\}\) is called modulation. To make the modulation scheme fit in with the i.i.d. fading channel in the canonical basis, we propose a new modulation scheme at the cost of error tolerance.

2. Then we give the explicit construction of polar codes for the RLWE-based PKE channel model. Experimental results and a theoretical estimation of the DFR are also given. Specifically, we derive a new DFR of \(2^{-298}\) for q = 12289, n = 1024, r = 2 (\(r=\sqrt {k/2}\)) and code rate 0.25, while NewHope gives a DFR of \(2^{-216}\) in the same setting; we derive a new DFR of \(2^{-156}\) for r = 2.83 (k = 16) and code rate 0.25, while NewHope is proved to give a DFR of \(2^{-137}\) in almost the same setting [32]. Thanks to the new DFR margin, the proposed RLWE-based PKE achieves a better bit-security level than NewHope while achieving the same target DFR. Besides, the encoding and decoding of polar codes support quasi-linear (i.e., \(O(n\log n)\), where n is the degree of the cyclotomic field of RLWE) and isochronous implementations, which will be discussed in detail in Section 7.2.

1.3 Roadmap

This paper is organized as follows. A review of the necessary algebraic number theory, fading channels and polar codes can be found in Section 2. In Section 3 we explain how to formulate a typical RLWE-based PKE scheme as an i.i.d. fading channel, and demonstrate how to handle the dependency in the canonical basis. Section 4 gives a high-level description of RLWE-based PKE with the proposed polar coding scheme. Section 5 gives the explicit construction of polar codes for RLWE. Section 6 analyzes the DFR theoretically and experimentally when polar coding is applied. Section 7 discusses the bit-security improvement derived from the new DFR margin as well as the isochrony of polar codes. Section 8 concludes this paper.

2 Preliminaries

2.1 Algebraic number theory

We review the necessary concepts from algebraic number theory required for our discussion of ring-LWE. In particular, we will relate many of our definitions to power-of-two cyclotomic fields, which are popular in modern cryptography.

A number field \(K= \mathbb {Q}(\zeta )\) can be defined by adjoining an element \(\zeta \in \mathbb {C}\) to the field \(\mathbb {Q}\), where ζ satisfies f(ζ) = 0 for some irreducible polynomial \(f(X) \in \mathbb {Q}[X]\). Then, the degree of K over \(\mathbb {Q}\) is precisely the degree n of f(X). Because f(ζ) = 0, K can be seen as a vector space over \(\mathbb {Q}\) endowed with a basis \(\{1,\zeta ,...,\zeta ^{n-1}\}\) known as the power basis of K. Let ζm be a primitive m-th complex root of unity with minimal polynomial

$$\begin{array}{@{}rcl@{}} f(X)=\prod\limits_{i\in\mathbb{Z}^{*}_{m}}(X-{\zeta_{m}^{i}})\in\mathbb{Z}[X], \end{array}$$

where \(\mathbb {Z}^{*}_{m}\) is the group of invertible elements in \(\mathbb {Z}_{m}\). Then, the m-th cyclotomic number field is defined as \(K=\mathbb {Q}(\zeta _{m})\). When m ≥ 2 is a power of two, \(f(X) = X^{n} + 1\) and n = m/2.

A number field K of degree n permits n distinct ring embeddings \(\sigma _{i}: K \rightarrow \mathbb {C}, i =1,...,n,\) which correspond to n automorphisms of K mapping ζ to each root of its minimal polynomial f(X). The n embeddings include s1 real embeddings and s2 pairs of complex conjugate embeddings. The concatenation of the n embeddings is called canonical embedding σ(⋅) which is a map from K into the space

$$H = \lbrace (x_{1},...,x_{n}) \in \mathbb{R}^{s_{1}} \times \mathbb{C}^{2 s_{2}} \vert x_{s_{1}+j} = \overline{x_{s_{1}+s_{2} +j}},~\forall 1 \leq j \leq s_{2} \rbrace \subseteq \mathbb{C}^{n}.$$

For power-of-two cyclotomics, s1 = 0. Because the complex embeddings come in pairs of conjugates, H is isomorphic to \(\mathbb {R}^{n}\). We also remark that under the embedding σ multiplication in K maps to coordinate-wise multiplication in H.

Let \(\mathcal {O}_{K}\) be the set of all the algebraic integers in K. It forms a ring and is called ring of integers of the number field. For the above power-of-two cyclotomics, the ring of integers is \(\mathcal {O}_{K}=\mathbb {Z}[X]/{(}1+X^{n}{)}\) and the canonical embedding maps \(\mathcal {O}_{K}\) to an algebraic lattice in space H and the lattice generator matrix is defined as

$$B=\left(\begin{array}{ccccc} 1 & \zeta_{m} & {\zeta_{m}^{2}} &{\cdots} & \zeta_{m}^{n-1} \\ 1 & {\zeta_{m}^{3}} & {\zeta_{m}^{6}} &{\cdots} & \zeta_{m}^{3(n-1)}\\ 1 & {\zeta_{m}^{5}} & \zeta_{m}^{10} &{\cdots} & \zeta_{m}^{5(n-1)}\\ {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ 1 & \zeta_{m}^{(2n-1)} & \zeta_{m}^{(2n-1)2} & {\cdots} & \zeta_{m}^{(2n-1)(n-1)} \end{array} \right)_{n\times n}.$$

Moreover, because of the conjugate pairs of the embeddings, we can rewrite σ as \(\sigma ^{\prime }: K\rightarrow \mathbb {R}^{n}\)

$$\sigma^{\prime}(x) = (\Re[\sigma_{1}(x)],\Im[\sigma_{1}(x)],...,\Re[\sigma_{n/2}(x)],\Im[\sigma_{n/2}(x)]).$$

And the corresponding basis \(\tilde {B}\) of the images of the mapping is

$$\tilde{B}=\left(\begin{array}{ccccc} 1 & \Re[\zeta_{m}] & {\Re[\zeta_{m}^{2}}] &{\cdots} & \Re[\zeta_{m}^{n-1}] \\ 0 & \Im[\zeta_{m}] & {\Im[\zeta_{m}^{2}}] &{\cdots} & \Im[\zeta_{m}^{n-1}] \\ 1 & {\Re[\zeta_{m}^{3}}] & {\Re[\zeta_{m}^{6}}] &{\cdots} & \Re[\zeta_{m}^{3(n-1)}]\\ 0 & {\Im[\zeta_{m}^{3}}] & {\Im[\zeta_{m}^{6}}] &{\cdots} & \Im[\zeta_{m}^{3(n-1)}]\\ 1 & {\Re[\zeta_{m}^{5}}] & \Re[\zeta_{m}^{10}] &{\cdots} & \Re[\zeta_{m}^{5(n-1)}]\\ 0 & {\Im[\zeta_{m}^{5}}] & \Im[\zeta_{m}^{10}] &{\cdots} & \Im[\zeta_{m}^{5(n-1)}]\\ {\vdots} & {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ 1 & \Re[\zeta_{m}^{(n-1)}] & \Re[\zeta_{m}^{(n-1)2}] & {\cdots} & \Re[\zeta_{m}^{(n-1)(n-1)}]\\ 0 & \Im[\zeta_{m}^{(n-1)}] & \Im[\zeta_{m}^{(n-1)2}] & {\cdots} & \Im[\zeta_{m}^{(n-1)(n-1)}] \end{array} \right)_{n\times n}.$$

Note that the rows of both B and \(\tilde {B}\) are mutually orthogonal. The determinant of B is \(\sqrt {n}^{n}\) while that of \(\tilde {B}\) is \((\sqrt {n/2})^{n}\).
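
As a small numerical sanity check (a sketch with a toy degree n, not part of the original text), one can verify that under σ the negacyclic polynomial product maps to the coordinate-wise product of the embedded vectors, and that the rows of B are mutually orthogonal with squared norm n:

```python
import numpy as np

n = 8                                      # toy degree; m = 2n
k = np.arange(n)
j = np.arange(1, 2 * n, 2)                 # exponents 1, 3, ..., 2n-1 in Z*_m
B = np.exp(1j * np.pi * np.outer(j, k) / n)   # row i: powers of zeta_m^{j_i}

def negacyclic_mul(a, b):
    # multiply a(X) b(X) modulo X^n + 1
    c = np.zeros(2 * n)
    for i, ai in enumerate(a):
        c[i:i + n] += ai * b
    return c[:n] - c[n:]

rng = np.random.default_rng(5)
a = rng.integers(-3, 4, n).astype(float)
b = rng.integers(-3, 4, n).astype(float)

lhs = B @ negacyclic_mul(a, b)             # sigma(a * b)
rhs = (B @ a) * (B @ b)                    # coordinate-wise product of embeddings
print(np.allclose(lhs, rhs))               # True: multiplication becomes coordinate-wise
print(np.allclose(B @ B.conj().T, n * np.eye(n)))   # rows orthogonal, norm sqrt(n)
```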

2.2 Ring-LWE public key encryption scheme and the coefficient dependency

For concreteness, we give an example of a public key scheme based on ring-LWE which was first described in [21]. Many ring-LWE schemes and protocols, including NewHope, closely resemble this one. The scheme is parameterized by an integer modulus q, dimension n, and error distribution χ over Rq. We will take the example of NewHope and view Rq as \(\mathbb {Z}_{q}[X]/(X^{n}+1)\), and define sampling from χ to be sampling each coefficient of a polynomial from the discrete Gaussian over \(\mathbb {Z}\). The scheme proceeds as follows.

  • Alice samples a secret key \(s \leftarrow \chi\) and publishes as a public key a ring-LWE sample (a,b) = (a,as + e) ∈ Rq × Rq, where a is uniformly random and \(e \leftarrow \chi\).

  • Bob encrypts a message \(m \in R_{2}\) as \((c_{1}, c_{2}) = (a \cdot t + e_{1}, b \cdot t + e_{2} + \lfloor \frac {q}{2} \rfloor \cdot m)\), where e1,e2,t are sampled independently from χ.

  • Alice decrypts using s by \(d := c_{2} - c_{1} \cdot s = \lfloor \frac {q}{2} \rfloor \cdot m +e \cdot t - s\cdot e_{1} + e_{2}\).

Alice then recovers the message m by decoding: if the i-th coordinate of d is closer to 0 than to ⌊q/2⌋, Alice assumes the i-th coordinate of m was 0; otherwise she assumes it was 1.
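
The following toy Python sketch instantiates the scheme above; the rounded continuous Gaussian sampler and the specific parameters are illustrative assumptions, not the exact distributions analyzed later in the paper.

```python
import numpy as np

n, q, r = 1024, 12289, 2.0                 # NewHope-like parameters (illustrative)
rng = np.random.default_rng(1)

def ring_mul(a, b):
    # negacyclic convolution: multiplication in Z_q[X]/(X^n + 1)
    c = np.zeros(2 * n, dtype=np.int64)
    for i, ai in enumerate(a):
        c[i:i + n] += ai * b
    return (c[:n] - c[n:]) % q

def chi():
    # rounded Gaussian with parameter r, standing in for the discrete Gaussian
    return np.rint(rng.normal(0, r, n)).astype(np.int64) % q

# key generation (Alice)
a = rng.integers(0, q, n, dtype=np.int64)
s, e = chi(), chi()
b = (ring_mul(a, s) + e) % q

# encryption of a random message m in R_2 (Bob)
m = rng.integers(0, 2, n, dtype=np.int64)
t, e1, e2 = chi(), chi(), chi()
c1 = (ring_mul(a, t) + e1) % q
c2 = (ring_mul(b, t) + e2 + (q // 2) * m) % q

# decryption and per-coefficient threshold decoding (Alice)
d = (c2 - ring_mul(c1, s)) % q
m_hat = (np.abs(d - q // 2) < np.minimum(d, q - d)).astype(np.int64)
assert np.array_equal(m, m_hat)            # holds with overwhelming probability
```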

We find the dependency between the coefficients of the residue noise term \(e\cdot t-s\cdot e_{1}+e_{2}\) obvious if we rewrite it in vector form using the coefficient embedding of Rq, i.e.,

$$\begin{pmatrix}e^{(0)}&-e^{(n-1)}&\cdots&-e^{(1)}\\e^{(1)}&e^{(0)}&\cdots&-e^{(2)}\\\vdots&\vdots&\ddots&\vdots\\e^{(n-1)}&e^{(n-2)}&\cdots&e^{(0)}\end{pmatrix}\;\mathbf t-\begin{pmatrix}s^{(0)}&-s^{(n-1)}&\cdots&-s^{(1)}\\s^{(1)}&s^{(0)}&\cdots&-s^{(2)}\\\vdots&\vdots&\ddots&\vdots\\s^{(n-1)}&s^{(n-2)}&\cdots&s^{(0)}\end{pmatrix}\;{\mathbf e}_1+{\mathbf e}_2,$$

where e(i) is the i-th coefficient of the polynomial e and t is the coefficient vector of the polynomial t. The row vectors of the negacyclic matrix generated by e have identical norm and are all multiplied by the same vector t; the same holds for s and e1.

2.3 Fading channel

In wireless communications, a fading channel arises due to a time-varying attenuation of signal quality caused by either the propagation environment or by movement of the transmitter/receiver. We consider a discrete-time fading channel model W

$$y_{i} = h_{i}x_{i}+z_{i},~~~i=1,\cdots,N,$$

where hi is the channel gain, zi is additive white Gaussian noise (AWGN) and N is the signal length. We highlight two facts about CSI which are relevant to the RLWE channel model we will discuss in Section 3. Firstly, a few consecutive hi may be correlated, and this period is called the coherence interval of a fading channel W, denoted by Tc. In the context of a fading channel with memory, the channel gain hi is assumed to be constant within one coherence interval and to vary independently as the next coherence interval begins. Secondly, the realization of hi is called channel state information (CSI) and the distribution of hi is called channel distribution information (CDI). The CSI is sometimes known to the decoder.
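
A minimal simulation of such a block-fading model with CSI at the receiver might look as follows; the BPSK input and Rayleigh-distributed gains are illustrative assumptions and not the gain distribution derived for the RLWE channel in Section 3.

```python
import numpy as np

rng = np.random.default_rng(2)
N, Tc = 1024, 2                         # signal length and coherence interval

x = rng.choice([-1.0, 1.0], size=N)     # BPSK symbols (illustrative)
z = rng.normal(0.0, 0.5, size=N)        # AWGN

# one independent gain per coherence interval, held constant inside it
h = np.repeat(rng.rayleigh(scale=1.0, size=N // Tc), Tc)   # illustrative fading law

y = h * x + z                           # channel output

# receiver with CSI: per-symbol ML decision using the known gain h_i
x_hat = np.where(np.abs(y - h) <= np.abs(y + h), 1.0, -1.0)
print("symbol error rate:", np.mean(x_hat != x))
```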

When designing a telecommunication system, we prefer i.i.d. fading channels where the hi are independent. There are a few methods to deal with the correlation. Let m = Tc > 1 and N/m = n. Since a fading channel with coherence interval Tc can be seen as m parallel sub-channels, a bit-interleaved coded modulation (BICM) technique can be used to handle the correlation between sub-channels [7, 22]. Another solution is to use multilevel codes [11] to design a coded modulation scheme with signal points in an m-dimensional signal space. In [18], a properly chosen lattice partition chain \({{\varLambda }}_{1}/\cdots /{{\varLambda }}_{l-1}/{{\varLambda }}_{l}\) is employed to design multilevel polar codes to achieve the fading channel capacity. In this case, the dimension m of \({{\varLambda }}_{1}\) is properly chosen such that the channel gain hi is assumed to be constant during the whole transmission of m symbols, i.e., Tc = m. A component code \(\mathcal {C}_{i}\) at the i-th level of the partition chain is designed in order to achieve the capacity of a \({{\varLambda }}_{i}/{{\varLambda }}_{i+1}\) fading channel. The component codes are combined by Construction D, giving rise to a lattice. More information about the multilevel construction and the \({{\varLambda }}_{i}/{{\varLambda }}_{i+1}\) channel can be found in [18] and [11]. We give an example of a mod \(\mathbb {Z}\) channel and a \(\mathbb {Z}/2\mathbb {Z}\) channel below, and the fading version will be given in Section 3.

Example 1

A mod \(\mathbb {Z}\) channel is an AWGN channel with input restricted to \(a\in \mathcal {V}(\mathbb {Z})\) where \(\mathcal {V}(\mathbb {Z})\) is the fundamental region of \(\mathbb {Z}\). At the receiver’s end, there is a mod \(\mathcal {V}(\mathbb {Z})\) operation giving the equivalent channel output as

$$y=a+z~\text{mod}~\mathbb{Z}=(a+z^{\prime})~\text{mod}~\mathbb{Z},$$

where z is an AWGN noise and \(z^{\prime }=z~\text {mod}~\mathbb {Z}\).

Example 2

A \(\mathbb {Z}/2\mathbb {Z}\) channel is an AWGN channel with input restricted to \(r\in (\mathbb {Z}+a)\cap \mathcal {V}(2\mathbb {Z})\) for some offset \(a\in \mathbb {R}\). At the receiver’s end, the equivalent channel output is

$$y=r+z~\text{mod}~2\mathbb{Z}= r+z^{\prime}~\text{mod}~2\mathbb{Z},$$

where z is an AWGN noise and \(z^{\prime }=z~\text {mod}~2\mathbb {Z}\). It can be viewed as a mod \(2\mathbb {Z}\) channel with input restricted to a set of elements of \(\mathbb {Z}+a\) that fall in \(\mathcal {V}(2\mathbb {Z})\).

In the special case of Tc = 1, channel W is referred to as an i.i.d. fading channel. The design and performance of error-correcting codes for i.i.d. fading channels with/without CSI are well studied [6, 36]. In [18], Liu et al. proposed a polar coding scheme for i.i.d. fading channels to achieve the ergodic capacity. Unlike the previous work of [6] in which CSI is given to both ends of the communication, in Liu et al.’s scheme CSI is only known to the receiver, which is more feasible in practice.

2.4 Polar codes

Polar codes, introduced by Arıkan in [5], are linear block codes of length \(n = 2^{l}\) for a positive integer l that achieve the capacity of any binary-input discrete memoryless symmetric (BDMS) channel asymptotically. We first review some basics of polar codes for a BDMS channel. A binary-input channel W is symmetric if there exists a permutation π of the output alphabet \(\mathcal {Y}\) such that W(y|1) = W(π(y)|0) and \(\pi^{-1} = \pi\) for \(y\in \mathcal {Y}\). Given a BDMS channel W, there are two commonly used metrics in information theory to measure the quality of W: the mutual information and the reliability.

Definition 1 (Mutual information of BDMS channels)

The mutual information I(W) ∈ [0,1] of a BDMS channel \(W:\mathcal {X}\rightarrow \mathcal {Y}\) is the maximum rate at which information can be successfully transmitted from the transmitter to the receiver. We define I(W) as

$$I(W)\triangleq\sum\limits_{y\in\mathcal{Y}}\sum\limits_{x\in\mathcal{X}}\frac{1}{2}W(y|x)\log\frac{W(y|x)}{\frac{1}{2}W(y|0)+\frac{1}{2}W(y|1)}.$$

Here, we use the definition of symmetric mutual information assuming uniform channel input, which is also the capacity of the BDMS channel. We use the notations I(W) and I(Y;X) interchangeably to denote the mutual information of W.

Definition 2 (Bhattacharyya parameter of BDMS channels)

The Bhattacharyya parameter Z(W) ∈ [0,1] is a measure of channel reliability for a BDMS channel W defined as

$$Z(W)\triangleq\sum\limits_{y\in\mathcal{Y}}\sqrt{W(y|0)W(y|1)},$$

where a small Z(W) indicates a more reliable channel while a large Z(W) implies a channel with more interference.

The capacity-achieving nature of polar codes arises from the so-called channel polarization phenomenon as a result of recursive applications of Arıkan’s transform to two identical W channels and their synthesized derivatives. The overall recursive transform can be done in a channel combining phase and a channel splitting phase. In the channel combining phase, a linear transformation defined as \(X^{1:n} = U^{1:n}G_{n}\) is performed on a vector \(U^{1:n}\in \mathcal {X}^{1:n}\) over GF(2), where \(G_{n}= B_{n}\left [ \begin {array}{cc} 1 & 0 \\ 1 & 1 \end {array}\right ]^{\otimes l}\). Bn is a permutation matrix: if \(U^{\prime 1:n}=U^{1:n}B_{n}\) and \(l=\log _{2}n\), the \(i^{\prime }=((b_{l},\cdots ,b_{2},b_{1})_{2}+1)\)-th coordinate of \(U^{\prime 1:n}\) is the \(i = ((b_{1},b_{2},\cdots ,b_{l})_{2}+1)\)-th coordinate of \(U^{1:n}\), where \((\cdots)_{2}\) is the binary expansion of an integer. By taking \(X^{1:n}\) as the raw input of W, one derives a combined channel \(W_{n}:\mathcal {X}^{1:n}\rightarrow \mathcal {Y}^{1:n}\) with a transition probability of

$$W_{n}(y^{1:n}|u^{1:n})=\prod\limits_{i\in\{1,\cdots,n\}}W(y^{(i)}|x^{(i)}=(u^{1:n}G_{n})_{i}),$$

where \((\cdot)_{i}\) denotes the i-th coordinate. Since Gn induces a one-to-one mapping between U1:n and X1:n, the mutual information of Wn is

$$I(W_{n}) = I(Y^{1:n};U^{1:n}) = n I(W).$$

In the channel splitting phase, Wn is further split back into n synthesized channels \(W_{n}^{(i)}:\mathcal {X}\rightarrow \mathcal {Y}^{n}\times \mathcal {X}^{i-1}\) whose transition probability is defined by

$$W_n^{(i)}(y^{1:n},u^{1:i-1}\vert u^{(i)})=\sum\limits_{u^{i+1:n}\in\mathcal X^{n-i}}\frac1{2^{n-1}}W_n(y^{1:n}\vert u^{1:n}).$$

It is proved in [5] that Arıkan’s transform preserves the mutual information in the sense that

$$I(W_{n})=nI(W)=\sum\limits_{i\in\{1,\cdots,n\}}I(W_{n}^{(i)}).$$

More importantly, the quality of the synthesized channels polarizes asymptotically as the recursion proceeds.

Theorem 1 (Channel polarization of mutual information [5])

For any BDMS channel W, the synthesized channels \(W_{n}^{(i)}\) polarize in the sense that, for any fixed δ ∈ (0,1), as n goes to infinity through powers of two, the fraction of indices i ∈{1,⋯ ,n} for which \(I(W_{n}^{(i)})\in (1-\delta ,1]\) goes to I(W) and the fraction for which \(I(W_{n}^{(i)})\in [0,\delta )\) goes to 1 − I(W).

The channel polarization theorem can also be stated in the metric of Bhattacharyya parameter by replacing \(I(W_{n}^{(i)})\) by \(Z(W_{n}^{(i)})\). For any desired transmission rate R < I(W), we can partition {1,⋯ ,n} into a subset \(\mathcal {A}\) and its complement \(\mathcal {A}^{C}\) such that (i) \(|\mathcal {A}|=\lfloor nR \rfloor\) and (ii) for any \(i\in \mathcal {A}\) and \(j\in \mathcal {A}^{C}\), \(Z(W_{n}^{(i)})\leq Z(W_{n}^{(j)})\). Given the “best” ⌊nR⌋ channels indexed by \(\mathcal {A}\), one can construct polar codes following the encoding rule:

$$\begin{array}{@{}rcl@{}} X^{1:n}=U_{\mathcal{A}}G_{n}(\mathcal{A})\oplus U_{\mathcal{A}^{C}}G_{n}(\mathcal{A}^{C}), \end{array}$$
(1)

where ⊕ is the XOR operation, \(U_{\mathcal {A}}\) is called the information vector and \(U_{\mathcal {A}^{C}}\) is called the frozen vector, known by both encoder and decoder. A typical realization of the frozen vector is \(U_{\mathcal {A}^{C}}=\textbf {0}\) for BDMS channels. In this manner, the useful information is transmitted via the most reliable synthesized channels. A question may arise on how to efficiently calculate \(Z(W_{n}^{(i)})\). A brief review can be found in Sections 2.5 and 5.3, but detailed descriptions of these methods are beyond the scope of this work.
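
For concreteness, a sketch of the encoding rule (1) is given below; it builds \(G_{n}=B_{n}F^{\otimes l}\) (with F the 2×2 kernel) explicitly with numpy for clarity, whereas the recursive butterfly implementation achieves \(O(n\log n)\). The frozen set shown is an arbitrary illustration, not one obtained from the construction in Section 5.

```python
import numpy as np

def polar_generator(n):
    # G_n = B_n F^{tensor l} over GF(2), with B_n the bit-reversal permutation
    l = int(np.log2(n))
    F = np.array([[1, 0], [1, 1]], dtype=np.int64)
    G = np.array([[1]], dtype=np.int64)
    for _ in range(l):
        G = np.kron(G, F)
    bitrev = [int(format(i, '0{}b'.format(l))[::-1], 2) for i in range(n)]
    return G[bitrev, :]                 # permute rows by bit reversal

def polar_encode(u_info, frozen_set, n):
    # frozen positions carry 0, information bits go on the complement of A^C
    u = np.zeros(n, dtype=np.int64)
    info_set = sorted(set(range(n)) - set(frozen_set))
    u[info_set] = u_info
    return (u @ polar_generator(n)) % 2

n = 8
frozen = [0, 1, 2, 4]                   # illustrative frozen set only
x = polar_encode(np.array([1, 0, 1, 1]), frozen, n)
print(x)
```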

The successive cancellation (SC) decoder is the original decoding algorithm for polar codes. Let u(i) be the i-th coordinate of U1:n. Given a channel output y1:n of the polar code, the SC decoder yields the estimate \(\bar {u}^{(i)}\) of u(i) in sequential order of the index i according to the decoding rule specified as

$$\begin{array}{@{}rcl@{}} \bar{u}^{(i)}=\begin{cases} u^{(i)}, & i\in\mathcal{A}^{C}\\ \text{arg~max}_{u^{(i)}\in\{0,1\}} W_{n}^{(i)}(y^{1:n},\bar{u}^{1:i-1}|u^{(i)}), & i\in \mathcal{A} \end{cases}, \end{array}$$
(2)

where \(\bar {u}^{1:i-1}\) is the estimation of u1:i− 1 recovered before \(\bar {u}^{(i)}\). Details of the SC decoder can be found in Appendix A.

Denote by Pe the average frame error probability. As a result of polar encoding and SC decoding, it is proved in [5] that Pe is upper bounded as follows.

Theorem 2 (Decoding performance [5])

For any BDMS channel W and any choices of parameter \((n,R,\mathcal {A})\),

$$\begin{array}{@{}rcl@{}} P_{e}\leq\sum\limits_{i\in\mathcal{A}}Z(W_{n}^{(i)}). \end{array}$$
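
As a rough, hedged illustration of how \(\mathcal {A}\) can be selected from reliability estimates, the sketch below propagates Bhattacharyya values through Arıkan’s transform using \(Z(W^{+})=Z(W)^{2}\) and the upper bound \(Z(W^{-})\leq 2Z(W)-Z(W)^{2}\) (exact only for the binary erasure channel); this is merely a surrogate for the degrading-transform construction discussed in Sections 2.5 and 5.3.

```python
import numpy as np

def bhattacharyya_bounds(z0, n):
    # propagate Z through log2(n) levels of Arikan's transform, keeping the
    # natural index order: minus (upper bound) 2z - z^2, plus (exact) z^2
    z = [z0]
    while len(z) < n:
        nxt = []
        for zi in z:
            nxt.append(2 * zi - zi * zi)    # W^(2i-1), the "minus" channel
            nxt.append(zi * zi)             # W^(2i),   the "plus" channel
        z = nxt
    return np.array(z)

def choose_information_set(z0, n, rate):
    z = bhattacharyya_bounds(z0, n)
    k = int(np.floor(n * rate))
    info = np.argsort(z)[:k]                # the k most reliable synthesized channels
    return sorted(info.tolist()), float(z[info].sum())   # Theorem 2 union bound

info_set, pe_bound = choose_information_set(z0=0.3, n=1024, rate=0.25)
print(len(info_set), pe_bound)
```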

2.5 Channel degradation and upgradation

The construction of polar codes can be addressed if all the Bhattacharyya parameters \(Z(W_{n}^{(i)})\) of the synthesized channels can be efficiently calculated. To this end, a quantization method was proposed in [34] to construct a degraded or upgraded approximation of a binary-input memoryless symmetric (BMS) channel. In this way, one can approximate \(Z(W_{n}^{(i)})\) efficiently with tractable and minor distortion. We define the degradation and upgradation relations as follows and will discuss them further in the sequel.

Definition 3 (Degraded and upgraded channel [34])

A channel \(\mathcal {Q}:\mathcal {X}\rightarrow \mathcal {Z}\) is (stochastically) degraded with respect to a channel \(\mathcal {W}:\mathcal {X}\rightarrow \mathcal {Y}\) if there exists a channel \(\mathcal {P}:\mathcal {Y}\rightarrow \mathcal {Z}\) such that

$$\mathcal{Q}(z|x)=\sum\limits_{y\in\mathcal{Y}}\mathcal{W}(y|x)\mathcal{P}(z|y)$$

for all \(z\in \mathcal {Z}\) and \(x\in \mathcal {X}\). We denote by \(\mathcal {Q}\preceq \mathcal {W}\) the relation that \(\mathcal {Q}\) is degraded with respect to \(\mathcal {W}\). Conversely, a channel \(\mathcal {Q}^{\prime }:\mathcal {X}\rightarrow \mathcal {Z}^{\prime }\) is upgraded with respect to \(\mathcal {W}\), denoted by \(\mathcal {Q}^{\prime }\succeq \mathcal {W}\), if there exists a channel \(\mathcal {P}:\mathcal {Z}^{\prime }\rightarrow \mathcal {Y}\) such that for all \(y\in \mathcal {Y}\) and \(x\in \mathcal {X}\)

$$\mathcal{W}(y|x)=\sum\limits_{z^{\prime}\in\mathcal{Z}^{\prime}}\mathcal{Q}^{\prime}(z^{\prime}|x)\mathcal{P}(y|z^{\prime}).$$
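
A small numerical illustration of this definition (our own example, not from the original text): composing a binary symmetric channel \(\mathcal {W}\) with a further BSC \(\mathcal {P}\) yields a degraded channel \(\mathcal {Q}\), whose capacity decreases and whose Bhattacharyya parameter increases, as Lemma 2 below states.

```python
import numpy as np

def bsc(p):
    # transition matrix of a binary symmetric channel, rows indexed by the input
    return np.array([[1 - p, p], [p, 1 - p]])

def capacity_bsc(p):
    h2 = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    return 1.0 - h2

def bhattacharyya(T):
    # Z(W) = sum_y sqrt(W(y|0) W(y|1))
    return float(np.sum(np.sqrt(T[0] * T[1])))

W = bsc(0.05)
P = bsc(0.10)
Q = W @ P          # Q(z|x) = sum_y W(y|x) P(z|y): Q is degraded w.r.t. W
# Q is again a BSC, with crossover 0.05*0.9 + 0.95*0.1 = 0.14
print(capacity_bsc(0.05), bhattacharyya(W))   # the better channel
print(capacity_bsc(0.14), bhattacharyya(Q))   # degraded: lower capacity, larger Z
```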

Moreover, Lemma 1 indicates that the synthesized channels of \(\mathcal {Q},\mathcal {W},\mathcal {Q}^{\prime }\) under Arıkan’s transform also fulfill the channel degradation and upgradation relation. This implies a polar code constructed for \(\mathcal {Q}\) also fits in with \(\mathcal {W}\).

Lemma 1 (restatement of Lemma 4.7 in [17])

Given BMS channels \(\mathcal {W},\mathcal {Q}\), and \(\mathcal {Q}^{\prime }\), we denote by \(\mathcal {W}_{n}^{(i)}\), \(\mathcal {Q}_{n}^{(i)}\) and \({\mathcal {Q}^{\prime }}_{n}^{(i)}\) for i ∈ [1,n] the synthesized channels derived by Arıkan’s transform. If \(\mathcal {Q}^{\prime }\succeq \mathcal {W}\succeq \mathcal {Q}\), then \({\mathcal {Q}^{\prime }}_{n}^{(i)}\succeq \mathcal {W}_{n}^{(i)}\succeq \mathcal {Q}_{n}^{(i)}\) for all i.

If a channel degradation or upgradation relation holds, the channel capacities, reliabilities and error probabilities are related as follows.

Lemma 2 ([34])

Let \(\mathcal {W}\) be a BMS channel and suppose there exists another channel \(\mathcal {Q}\) such that \(\mathcal {Q}\preceq \mathcal {W}\). Then

$$C(\mathcal{Q}) \leq C(\mathcal{W}),~Z(\mathcal{Q})\geq Z(\mathcal{W}),~P_{e}(\mathcal{Q})\geq P_{e}(\mathcal{W}).$$

The inequalities are reversed if we replace “degraded” by “upgraded”.

3 RLWE channel model

3.1 RLWE channel model in canonical basis

Definition 4

The real multivariate normal distribution has density function

$$g_{\mu,{\Sigma}}(x)=\frac{e^{-\frac{1}{2}(x-\mu)^{T}{\Sigma}^{-1}(x-\mu)}}{\sqrt{|2\pi {\Sigma}|}}~~x\in\mathbb{R}^{n}$$

where |⋅| denotes the determinant, \(\mu =\mathbb {E}[X]\in \mathbb {R}^{n}\), \({\Sigma }=\mathbb {E}\left [ (X-\mu )(X-\mu )^{T} \right ]\); we write \(X\sim \mathcal {N}(\mu ,{\Sigma })\). A generalization would be the complex multivariate normal distribution \(Z\sim \mathcal {N}\mathcal {C}(\mu ,{{\varGamma }})\) with density function

$$g_{\mu,{{\varGamma}}}(z)=\frac{e^{-(z-\mu)^{*}{{\varGamma}}^{-1}(z-\mu)}}{|\pi {{\varGamma}}|}~~z\in\mathbb{C}^{n},$$

where \(z^{*}\) denotes the Hermitian transpose of the vector z and \({{\varGamma }} =\mathbb {E}[(Z-\mu )(Z-\mu )^{*}]\).

We already have an RLWE-based PKE instance in Section 2.2. Now we consider the problem of decoding the message m from the polynomial

$$y = \lfloor \frac{q}{2} \rfloor \cdot m + e \cdot t - s \cdot e_{1} + e_{2} \mod R_{q},$$
(3)

where \(e\cdot t\) and \(s\cdot e_{1}\) are products of polynomials in \(\mathbb {Z}_{q}[X]/(X^{n}+1)\). The coefficients of e, t, s, e1, e2 should be drawn from a discrete Gaussian. We use the continuous normal distribution \(\mathcal {N}(0,r^{2})\) instead to simplify the distribution analysis of the noise term.

Under canonical embedding, formula (3) can be rewritten as

$$\sigma(y)=\sigma(\lfloor\frac{q}{2}\rfloor m)+\sigma(e)\sigma(t)-\sigma(s)\sigma(e_{1})+\sigma(e_{2})~~\mod{BR_{q}},$$
(4)

where B is the orthogonal basis defined in Section 2.1 and the multiplications (i.e., σ(e)σ(t) and σ(s)σ(e1)) and additions are both coordinate-wise as explained in Section 2.1. Due to the conjugate pairs, formula (4) can be refined as

$$\underbrace{\left(\begin{array}{c} \Re[B_{1}\textbf{y}]\\ \Im[B_{1}\textbf{y}]\\ \Re[B_{2}\textbf{y}]\\ \Im[B_{2}\textbf{y}]\\ \vdots\\ \Re[B_{\frac{n}{2}}\textbf{y}]\\ \Im[B_{\frac{n}{2}}\textbf{y}] \end{array}\right)}_{\tilde{B}\textbf{y}} = \underbrace{\left(\begin{array}{c} \Re[B_{1}\lfloor\frac{q}{2}\rfloor \textbf{m}]\\ \Im[B_{1}\lfloor\frac{q}{2}\rfloor \textbf{m}]\\ \Re[B_{2}\lfloor\frac{q}{2}\rfloor \textbf{m}]\\ \Im[B_{2}\lfloor\frac{q}{2}\rfloor \textbf{m}]\\ \vdots\\ \Re[B_{\frac{n}{2}}\lfloor\frac{q}{2}\rfloor \textbf{m}]\\ \Im[B_{\frac{n}{2}}\lfloor\frac{q}{2}\rfloor \textbf{m}] \end{array}\right)}_{\tilde{B}\lfloor\frac{q}{2}\rfloor \textbf{m}} + \underbrace{\left(\begin{array}{c} \Re[\sigma_{1}({e})\sigma_{1}({t})-\sigma_{1}({s})\sigma_{1}({e}_{1})+\sigma_{1}({e}_{2})]\\ \Im[\sigma_{1}({e})\sigma_{1}({t})-\sigma_{1}({s})\sigma_{1}({e}_{1})+\sigma_{1}({e}_{2})]\\ \Re[\sigma_{2}({e})\sigma_{2}({t})-\sigma_{2}({s})\sigma_{2}({e}_{1})+\sigma_{2}({e}_{2})]\\ \Im[\sigma_{2}({e})\sigma_{2}({t})-\sigma_{2}({s})\sigma_{2}({e}_{1})+\sigma_{2}({e}_{2})]\\ \vdots\\ \Re[\sigma_{\frac{n}{2}}({e})\sigma_{\frac{n}{2}}({t})-\sigma_{\frac{n}{2}}({s})\sigma_{\frac{n}{2}}({e}_{1})+\sigma_{\frac{n}{2}}({e}_{2})]\\ \Im[\sigma_{\frac{n}{2}}({e})\sigma_{\frac{n}{2}}({t})-\sigma_{\frac{n}{2}}({s})\sigma_{\frac{n}{2}}({e}_{1})+\sigma_{\frac{n}{2}}({e}_{2})] \end{array}\right)}_{\textbf{N}}\text{mod}{\tilde{B}R_{q}},$$
(5)

where Bj represents the j-th row of B, the vectors y and m are the vector forms of the polynomials y and m, \(\tilde {B}\) and \(\sigma ^{\prime }\) are introduced in Section 2.1, \(\tilde {B}\textbf {y}=\sigma ^{\prime }({y})\), and \(\tilde {B}\lfloor \frac {q}{2}\rfloor \textbf {m}=\sigma ^{\prime }(\lfloor \frac {q}{2}\rfloor {m})\). To see how the noise term N is distributed, we rewrite formula (5) for all the odd indices i = 1,3,5,⋯ ,n − 1 as

$$\begin{array}{@{}rcl@{}}\left[ \begin{array}{c} \sigma^{\prime}_{i}(y)\\ \sigma^{\prime}_{i+1}(y) \end{array}\right] &= &\left[\begin{array}{c} \sigma^{\prime}_{i}(\lfloor \frac{q}{2} \rfloor m)\\ \sigma^{\prime}_{i+1}(\lfloor \frac{q}{2} \rfloor m) \end{array}\right] + \left[\begin{array}{cc} \sigma^{\prime}_{i}(e) & -\sigma^{\prime}_{i+1}(e)\\ \sigma^{\prime}_{i+1}(e) & \sigma^{\prime}_{i}(e) \end{array}\right] \left[\begin{array}{c} \sigma^{\prime}_{i}(t)\\ \sigma^{\prime}_{i+1}(t) \end{array}\right]\\ &&- \left[\begin{array}{cc} \sigma^{\prime}_{i}(s) & -\sigma^{\prime}_{i+1}(s)\\ \sigma^{\prime}_{i+1}(s) & \sigma^{\prime}_{i}(s) \end{array}\right] \left[\begin{array}{c} \sigma^{\prime}_{i}(e_{1})\\ \sigma^{\prime}_{i+1}(e_{1}) \end{array}\right] + \left[\begin{array}{c} \sigma^{\prime}_{i}(e_{2})\\ \sigma^{\prime}_{i+1}(e_{2}) \end{array}\right], \end{array}$$
(6)

where \(\tilde {B}_{i}(\cdot )=\sigma ^{\prime }_{i}(\cdot )\) and \(\tilde {B}_{i+1}(\cdot )=\sigma ^{\prime }_{i+1}(\cdot )\). Under the embedding \(\sigma :K\rightarrow \mathbb {C}^{n}\), the spherically normal distributed vectors e and t are mapped to complex spherical normal vectors, \(\sigma (e),\sigma (t)\sim \mathcal {N}\mathcal {C}(0,nr^{2}\mathbb {I})\). As for the embedding \(\sigma ^{\prime }:K\rightarrow \mathbb {R}^{n}\), the spherical normal distribution \(\mathcal {N}(0,r^{2}\mathbb {I})\) is transformed to a new spherical normal distribution \(\mathcal {N}(0,nr^{2}/2~\mathbb {I})\). Since e and t are coordinate-wise i.i.d., their embeddings σ(e), σ(t), \(\sigma ^{\prime }(e)\), \(\sigma ^{\prime }(t)\) are coordinate-wise independent as well. We observe from formula (6) that every odd-indexed coordinate and the next even-indexed coordinate are correlated because they share the same \(\sigma ^{\prime }_{i}(e),\sigma ^{\prime }_{i+1}(e)\), \(\sigma ^{\prime }_{i}(t),\sigma ^{\prime }_{i+1}(t)\), \(\sigma ^{\prime }_{i}(s),\sigma ^{\prime }_{i+1}(s)\) and \(\sigma ^{\prime }_{i}(e_{1}),\sigma ^{\prime }_{i+1}(e_{1})\), although \(\sigma ^{\prime }_{i}(e_{2}),\sigma ^{\prime }_{i+1}(e_{2})\) are independent.

To further refine the RLWE channel model, we can rewrite formula (5) and (6) as

$$\tilde{B}\textbf{y}=\tilde{B}\lfloor\frac{q}{2}\rfloor\textbf{m}+\textbf{N},\mod{\tilde{B}R_{q}}$$
(7)

where for i = 1,2,⋯ ,n, Ni = HiZi, \(Z_{i}\leftarrow \mathcal {N}(0,\frac {nr^{2}}{2})\), and

$$H_{i} =\sqrt{\sigma^{*}_{\lceil i/2\rceil}({e})\sigma_{\lceil i/2\rceil}({e})+\sigma^{*}_{\lceil i/2\rceil}({s})\sigma_{\lceil i/2\rceil}({s})+1}.$$

Because of the pairwise correlation structure, Hi and Hj are independent for two different indices i, j as long as ⌈i/2⌉≠⌈j/2⌉; otherwise Hi = Hj. Similarly, Zi and Zj are correlated if ⌈i/2⌉ = ⌈j/2⌉; otherwise they are independent.
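
The fading coefficients can be computed directly from s and e. The sketch below evaluates σ(s) and σ(e) at the odd powers of the primitive 2n-th root of unity and forms the Hi of formula (7); the continuous Gaussian sampler stands in for χ as in Section 3.1, and the direct matrix evaluation is only for illustration.

```python
import numpy as np

n, r = 1024, 2.0
rng = np.random.default_rng(3)

def canonical_embedding(a):
    # evaluate a(X) in R[X]/(X^n+1) at zeta_m^(2j-1), j = 1..n/2 (zeta_m = e^{i*pi/n})
    k = np.arange(n)
    j = np.arange(1, n, 2)                      # exponents 1, 3, ..., n-1
    return np.exp(1j * np.pi * np.outer(j, k) / n) @ a

s = rng.normal(0, r, n)                         # continuous stand-in for chi
e = rng.normal(0, r, n)
sig_s, sig_e = canonical_embedding(s), canonical_embedding(e)

# both coordinates of a conjugate pair share the same gain H_{ceil(i/2)}
H_pair = np.sqrt(np.abs(sig_e) ** 2 + np.abs(sig_s) ** 2 + 1)   # length n/2
H = np.repeat(H_pair, 2)                                        # length n
print(H[:6])
```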

Unlike in NewHope and other RLWE-based encryption schemes where the plaintext is encoded and decoded in the polynomial basis, we will carry out encoding and decoding in the canonical basis. Observe that the channel given by formula (7) is a fading channel with coherence interval Tc = 2 coordinates, except that the symbols to be transmitted after modulation, i.e., \(\tilde {B}\lfloor \frac {q}{2}\rfloor \mathbf {m}\), are not coordinate-wise independent. In the next subsection, we will adjust the modulation scheme such that a tailored constellation diagram can fit in with the fading channel.

3.2 A tailored constellation diagram

The RLWE channel in formula (3) can be interpreted as n parallel \(\mathbb {Z}/2\mathbb {Z}\) channels where a message \(m\in \{0,1\}^{n}\) is mapped to a symbol on the constellation diagram \(\{0,\lfloor \frac {q}{2} \rfloor \}^{n}\). The mod Rq operation defines a valid constellation space as an n-dimensional cube Λ with vertices \(\{0,q\}^{n}\). To ease the description of how we design a new constellation diagram in the canonical basis, we make a modification to the modulation scheme in formula (3): the message \(m \in \{-1,1\}^{n}\) is mapped onto the constellation diagram \(\{\pm \lfloor \frac {q}{4} \rfloor \}^{n}\) and the valid constellation space is a cube Λ with vertices \(\{\pm \lfloor \frac {q}{2}\rfloor \}^{n}\). This modification preserves the capacity of the \(\mathbb {Z}/2\mathbb {Z}\) channel because the two schemes are statistically equivalent if we ignore the geometrical approximation caused by the round-off operation ⌊⋅⌋.

According to formula (7), after applying the canonical embedding, the constellation diagram turns into \(\tilde {B}\{\pm \lfloor \frac {q}{4}\rfloor \}^{n}\). Similarly, we can obtain the new constellation space \({{\varLambda }}^{\prime }=\tilde {B}{{\varLambda }}\) by rotating Λ and scaling it up by a factor of \(\sqrt {n/2}\).

As discussed in the previous subsection, the coherence interval Tc of the residue noise equals 2 coordinates, while the constellation symbol \(\tilde {B}\lfloor \frac {q}{4}\rfloor \textbf {m}\) has memory throughout the n coordinates. In a communication system, the interleaving technique can be used to alleviate the correlation of the source by permuting symbols of different code blocks. Unfortunately, interleaving is impractical in the RLWE channel because there is only one code block of length n. At the cost of distance between the constellation symbols, we tailor the constellation space \({{\varLambda }}^{\prime }\) to fit in with the fading channel.

Essentially, we are looking for a new modulation scheme meeting two conditions: (a) we desire the symbols after modulation (or the modulated message) to be coordinate-wise i.i.d.; in other words, we expect a valid constellation diagram inside the space \({{\varLambda }}^{\prime }\) such that for a coordinate-wise i.i.d. message m, the modulated message is coordinate-wise i.i.d. as well; (b) the new modulation scheme gives us a \(\mathbb {Z}/2\mathbb {Z}\) channel. Conceptually, the maximal n-dimensional cube \({{\varLambda }}^{\prime \prime }\) enclosed in \({{\varLambda }}^{\prime }\) and parallel to Λ is our target constellation space. In this case, the symbols to be transmitted can easily be made binary and i.i.d. if we divide the cube \({{\varLambda }}^{\prime \prime }\) equally into \(2^{n}\) small cubes and select all the centers of the small cubes as the constellation diagram. However, finding such a \({{\varLambda }}^{\prime \prime }\) in practice is intractable when the dimension n is large, since it is unclear in what direction and by what angle the cube \({{\varLambda }}^{\prime }\) is rotated with respect to Λ. Instead, we compromise on the constellation size and use the cube \({{\varLambda }}^{\prime \prime }\) which is parallel to Λ and is enclosed in the maximal ball inscribed in \({{\varLambda }}^{\prime }\). In this manner, we can make sure there always exists such a constellation space \({{\varLambda }}^{\prime \prime }\), and it is straightforward to calculate its size. Figure 1 illustrates this idea in the 2-dimensional case. If the side length of Λ is q, the side of \({{\varLambda }}^{\prime }\) turns out to have length \(q\sqrt {n/2}\), and the side of \({{\varLambda }}^{\prime \prime }\) will be \(q/\sqrt {2}\). Observe that \({{\varLambda }}^{\prime }=\sqrt {2}\tilde {B}{{\varLambda }}^{\prime \prime }\).

Fig. 1: Switch of constellation diagram

3.3 Tailored RLWE channel model in canonical basis

Given the tailored constellation space \({{\varLambda }}^{\prime \prime }\) and its corresponding constellation diagram, we now have a tailored RLWE channel model in the canonical basis:

$$\textbf{y}=\lfloor\frac{q}{2}\rfloor \frac{1}{\sqrt{2}}\textbf{m}+\mathbf{N},\mod{{{\varLambda}}^{\prime\prime}},$$
(8)

where \(m \in \{0,1\}^{n}\), Ni = HiZi and \(Z_{i}\leftarrow \mathcal {N}(0,nr^{2}/2)\) for 1 ≤ i ≤ n. As discussed for formula (7), Hi and Hj are independent for two different indices i,j as long as ⌈i/2⌉≠⌈j/2⌉; otherwise Hi = Hj. Similarly, Zi and Zj are independent if ⌈i/2⌉≠⌈j/2⌉; otherwise they are correlated.

We observe that the tailored channel model in formula (8) can be seen as a fading channel where Hi is the channel gain and Zi is the additive noise. A family of fading channels (e.g., i.i.d. fading, block fading, compound fading) is well studied in [6, 18, 36], and explicit constructions of error-correcting codes are given there. In this work, since Hi and Zi have the same coherence interval of two coordinates, our strategy is to divide the n parallel channels into two groups of i.i.d. channels and construct two parallel polar codes of equal block length n/2 for the two \(\mathbb {Z}/2\mathbb {Z}\) fading channels. Note that in this work we use parameters similar to NewHope, e.g., q = 12289, n = 1024, r ∈{1,2,6,9}, where the values of r correspond to the “Short” and “Tall” parameters in [8].

Denote by L and \(L^{\prime }\) two one-dimensional lattices \(\lfloor \frac {q}{2}\rfloor \frac {1}{\sqrt {2}} \mathbb {Z}\) and \(q\frac {1}{\sqrt {2}}\mathbb {Z}\) respectively. The above channel model can also be written as a fading \(L/L^{\prime }\) channel, i.e.,

$$Y_{i}=\lfloor\frac{q}{2}\rfloor\frac{1}{\sqrt{2}} m_{i}+H_{i}*Z_{i},\mod{q\frac{1}{\sqrt{2}}\mathbb{Z}},~i=1,\cdots,n,$$
(9)

where mi ∈{0,1} and the channel input X is restricted to the discrete alphabet \(\mathcal {X}=L\cap \mathcal {R}(L^{\prime })=\{0, \lfloor \frac {q}{2}\rfloor \frac {1}{\sqrt {2}}\}\). Since Alice knows exactly what e and s are, she knows both the distribution and the realization of the channel gain Hi. At the transmitter’s end, Bob only knows the distribution of Hi. Both of them know the distribution of Zi. How to achieve the ergodic capacity of such an i.i.d. fading channel using polar codes is well studied in [18], and we adapt their strategy to our tailored RLWE channel model. A diagram of a fading \(L/L^{\prime }\) channel with CSI available to the decoder is shown in Fig. 2.

Fig. 2: A block diagram of a fading \({L/L^{\prime }}\) channel

Denote by \(W:X\rightarrow (\tilde {Y},H)\) the fading \(L/L^{\prime }\) channel with CSI available to the decoder. The transition probability of W is

$$\begin{array}{@{}rcl@{}} P_{\tilde{Y},H|X}(\tilde{y},h|x) & =& P_{Y,H|X}(y=\tilde{y}+L^{\prime},h|x)\frac{d\tilde{y}}{dy} \\ & =& P_{H}(h)P_{Y|H,X}(y=\tilde{y}+L^{\prime}|h,x) \\ & =& P_{H}(h)\sum\limits_{\lambda\in L^{\prime}}\frac{1}{\sqrt{2\pi}h\sigma}\exp\left\lbrace-\frac{(\tilde{y}+\lambda-x)^{2}}{2\sigma^{2} h^{2}}\right\rbrace, \end{array}$$
(10)

where \(\sigma =\sqrt {\frac {n}{2}}r\). The distribution of H is

$$P_{H}(h) = \frac{1}{2\sigma^{4}}h(h^{2}-1)\exp\left\lbrace -\frac{h^{2}-1}{2\sigma^{2}} \right\rbrace =\frac{2h(h^{2}-1)}{n^{2}r^{4}}e^{-\frac{(h^{2}-1)}{nr^{2}}},~h>1.$$

The pdf of H for various choices of the parameter r is depicted in Fig. 3.

Fig. 3: Probability density function of the fading coefficient H
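
The density of H can also be checked empirically (a sketch under the continuous-Gaussian assumption of Section 3.1, not from the original text): sample the real and imaginary parts of \(\sigma _{j}(e)\) and \(\sigma _{j}(s)\) as four independent \(\mathcal {N}(0,\sigma ^{2})\) variables with \(\sigma ^{2}=nr^{2}/2\) and compare the histogram of \(H=\sqrt {|\sigma _{j}(e)|^{2}+|\sigma _{j}(s)|^{2}+1}\) with the closed form above.

```python
import numpy as np

n, r = 1024, 2.0
sigma2 = n * r * r / 2.0                 # variance per real coordinate
rng = np.random.default_rng(4)

samples = 200_000
# real and imaginary parts of sigma_j(e) and sigma_j(s): four N(0, sigma2) draws
g = rng.normal(0.0, np.sqrt(sigma2), size=(samples, 4))
H = np.sqrt((g ** 2).sum(axis=1) + 1.0)

def pdf_H(h):
    return h * (h * h - 1) / (2 * sigma2 ** 2) * np.exp(-(h * h - 1) / (2 * sigma2))

hist, edges = np.histogram(H, bins=200, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - pdf_H(centres))))   # small: the histogram matches P_H
```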

As discussed in [12] and [18], the capacity of the fading \(L/L^{\prime }\) channel is given by

$$\begin{array}{@{}rcl@{}} C(L/L^{\prime},\sigma^{2}) &=& E_{H}\left[C(L^{\prime},(h\sigma)^{2})\right]-E_{H}\left[C(L,(h\sigma)^{2})\right] \\ &=& E_{H}\left[\mathfrak{h}(L,(h\sigma)^{2})\right]-E_{H}\left[\mathfrak{h}(L^{\prime},(h\sigma)^{2})\right]+\log |L/L^{\prime}|, \end{array}$$

where \(E_{H}\left [\cdot \right ]\) denotes the expectation over the fading coefficient, \(\mathfrak {h}(L,(h\sigma)^{2})\) and \(\mathfrak {h}(L^{\prime },(h\sigma)^{2})\) are the differential entropies of the mod-L and mod-\(L^{\prime }\) channels respectively, and \(|L/L^{\prime }|\) is the order of the partition \(L/L^{\prime }\). Specifically, \(\mathfrak {h}(L,(h\sigma)^{2})\) is given by

$$\begin{array}{@{}rcl@{}} \mathfrak{h}(L,(h\sigma)^{2}) &=&-{\int}_{\mathcal{R}(L)}\;f_{L,(h\sigma)^{2}}(z^{\prime})\log f_{L,(h\sigma)^{2}}(z^{\prime})dz^{\prime}, \\ f_{L,(h\sigma)^{2}}(z^{\prime}) &=& \sum\limits_{\lambda\in L}g_{(h\sigma)^{2}}(z^{\prime}+\lambda),~z^{\prime}\in \mathcal{R}(L), \end{array}$$
(11)

where \(\mathcal {R}\) is a fundamental region of the lattice L and \(g_{(h\sigma )^{2}}(\cdot )\) is the density function of \(\mathcal {N}(0,h^{2}\sigma ^{2}\mathbb {I})\). We refer to \(f_{L,(h\sigma )^{2}}\) as an L-periodic Gaussian density function, which is defined by summing up copies of a Gaussian density function centered at every lattice point of L. The value of an L-periodic Gaussian variable \(z^{\prime }\) is restricted to a fundamental region of L, so the integral of its density function over \(\mathcal {R}(L)\) is 1. See Fig. 4 for the ergodic capacity of the fading \(L/L^{\prime }\) channel \(W:X\rightarrow (\tilde {Y},H)\) with respect to different choices of r. In a communication system, the signal-to-noise ratio (SNR) is a measure of the reliability of a channel. It is defined as the ratio of the signal strength over the noise strength.

Fig. 4: Capacity of the RLWE channel vs SNR given \(X\in \{0, \lfloor \frac {q}{2}\rfloor \frac {1}{\sqrt {2}}\}\), n = 1024, q = 12289
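
For a fixed fading realization h, the capacity expression above can be evaluated numerically. The sketch below computes the L-periodic Gaussian density and the differential entropy \(\mathfrak {h}\) of formula (11) on a one-dimensional lattice by truncating the lattice sum; the truncation range, grid size and the example value of h are our own choices for illustration.

```python
import numpy as np

def periodic_gaussian(z, c, s):
    # f_{L,s^2}(z) for the one-dimensional lattice L = c*Z (truncated lattice sum)
    k = np.arange(-50, 51)
    g = np.exp(-(z[:, None] + k[None, :] * c) ** 2 / (2 * s * s))
    return g.sum(axis=1) / (np.sqrt(2 * np.pi) * s)

def diff_entropy(c, s, grid=4000):
    # h(L, s^2) = -integral over the fundamental region [0, c) of f log2 f
    z = np.linspace(0.0, c, grid, endpoint=False)
    f = periodic_gaussian(z, c, s)
    return -np.sum(f * np.log2(f)) * (c / grid)

q, n, r, h = 12289, 1024, 2.0, 30.0               # h: one fixed fading realization
sigma = np.sqrt(n / 2) * r
cL, cLp = (q // 2) / np.sqrt(2), q / np.sqrt(2)   # spacings of L and L'
# C(L/L', (h*sigma)^2) = h(L, .) - h(L', .) + log2 |L/L'|, with |L/L'| = 2
C = diff_entropy(cL, h * sigma) - diff_entropy(cLp, h * sigma) + 1.0
print(C)
```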

Recall the definition of a symmetric channel in Section 2.4. It is observed that \(P_{\tilde {Y},H|X}(\tilde {y},h|x = 0 )=P_{\tilde {Y},H|X}(\pi (\tilde {y},h)|x = \lfloor \frac {q}{2}\rfloor \frac {1}{\sqrt {2}} )\) holds for the permutation \(\pi (\tilde {y},h)=\left ((\lfloor \frac {q}{2}\rfloor \frac {1}{\sqrt {2}}-\tilde {y})~\text {mod}~q\frac {1}{\sqrt {2}} \mathbb {Z},h \right )\) over the outputs \((\tilde {y},h)\). Therefore, the fading \(L/L^{\prime }\) channel W is symmetric and we can achieve its capacity using polar codes.

4 Description of the encryption scheme

Table 1 gives a high-level description of the RLWE-based PKE scheme using polar codes which are customized for our tailored RLWE channel model in canonical basis. The functions PolarEnc(⋅) and PolarDec(⋅) are encoding and decoding algorithms of polar codes which will be explicitly introduced in the sequel.

Remark 1

Unlike most RLWE encryption schemes where the error distribution χ is defined over \(\mathbb {Z}\) (e.g., the centered binomial in NewHope), we use the definition of χ from the original proposal of the ideal learning with errors problem [33], where χ is defined on \(\mathbb {R}/[0, q)\). Moreover, according to the formal definition of ring-LWE in [21], the error distribution is also continuous over the field tensor product \(K\otimes _{\mathbb {Q}}\mathbb {R}\).

Remark 2

A plaintext m is uniquely mapped to a symbol \(\lfloor \frac {q}{2} \rfloor \frac {1}{\sqrt {2}}\textit {PolarEnc}(\mathbf {m})\) on the constellation diagram in the canonical basis. Then it is switched to the polynomial basis and turned into a vector v. Note that \(\mathbf {v}\in (\mathbb {R}/[0,q))^{n}\) but not in Rq. We consider this reasonable since χ is also real and continuous.

Table 1: An RLWE public key encryption scheme with polar encoding and decoding

One may notice in Table 1 that Alice finally derives a mod-\(\tilde {B}R_{q}\) channel (or equivalently a mod-\({{\varLambda }}^{\prime }\) channel) as in Fig. 1 rather than the mod-\({{\varLambda }}^{\prime \prime }\) channel in formula (8) (or equivalently the mod-\(L^{\prime }\) channel in (9)). Questions arise as to whether the tailored RLWE channel model in formula (8) makes sense and how it will behave if we construct a polar code for the mod-\({{\varLambda }}^{\prime \prime }\) channel when we actually have a mod-\({{\varLambda }}^{\prime }\) channel. Lemma 3 establishes the channel degradation relation between the two channels.

Lemma 3

(Channel Degradation Relation Between RLWE Channel and Its Tailored Variant) Let \({{\varLambda }}^{\prime }\) be the constellation space and let \({{\varLambda }}^{\prime \prime }\) be its tailored variant as in Fig. 1. Given the tailored RLWE channel model as in formula (8) with CSI Hi known to the decoder as in Fig. 2, the fading \(L^{n}/{{\varLambda }}^{\prime \prime }\) channel is degraded with respect to the fading \(L^{n}/{{\varLambda }}^{\prime }\) channel.

Proof

Denote by \(W^{\prime }\) the fading \(L^{n}/{{\varLambda }}^{\prime }\) channel \(y^{\prime }=x+h*z\mod {{{\varLambda }}^{\prime }}\) where \(y^{\prime }\in \mathcal {R}({{\varLambda }}^{\prime })\), \(x\in L^{n}\cap \mathcal {R}({{\varLambda }}^{\prime })\) is the channel input, h is the channel gain and z is the Gaussian noise. In the same fashion, we define the fading \(L^{n}/{{\varLambda }}^{\prime \prime }\) channel \(W^{\prime \prime }\) as \(y^{\prime \prime }=x+h*z\mod {{{\varLambda }}^{\prime \prime }}\) where \(y^{\prime \prime }\in \mathcal {R}({{\varLambda }}^{\prime \prime })\), \(x\in L^{n}\cap \mathcal {R}({{\varLambda }}^{\prime \prime })\).

As formula (10) indicates, the \(L/L^{\prime }\) fading channel with CSI known to the receiver in formula (9) can be viewed as an independent combination of the channel gain h and an \(L/L^{\prime }\) Gaussian channel. Therefore, without loss of generality, we can view the channel gain h as a constant. We can rewrite channel \(W^{\prime }\) as \(W^{\prime }: y^{\prime }=x+z^{\prime }\mod {{{\varLambda }}^{\prime }}\) and rewrite \(W^{\prime \prime }\) as \(W^{\prime \prime }: y^{\prime \prime }=x+z^{\prime }\mod {{{\varLambda }}^{\prime \prime }}\), where \(z^{\prime }\sim \mathcal {N}(0,h^{2}\sigma ^{2}\mathbb {I})\). The channel transition probability of \(W^{\prime }\) is

$$\begin{array}{@{}rcl@{}} W^{\prime}(y^{\prime}|x) &=& \sum\limits_{\lambda^{\prime}\in{{\varLambda}}^{\prime}}g_{(h\sigma)^{2}}(y^{\prime}-x+\lambda^{\prime}), y^{\prime}\in \mathcal{R}({{\varLambda}}^{\prime}) \\ &=& \sum\limits_{\lambda^{\prime}\in{{\varLambda}}^{\prime}}g_{(h\sigma)^{2}}(n^{\prime}+\lambda^{\prime}), n^{\prime}\in \mathcal{R}({{\varLambda}}^{\prime}) \end{array}$$
(12)

where \(g_{(h\sigma )^{2}}\) represents the density function of \(\mathcal {N}(0,h^{2}\sigma ^{2}\mathbb {I})\) and \(n^{\prime }=z^{\prime }\mod {{{\varLambda }}^{\prime }}\). The channel transition probability of \(W^{\prime \prime }\) is

$$\begin{array}{@{}rcl@{}} W^{''}(y^{''}|x) &&= \sum\limits_{\lambda^{''}\in{{\varLambda}}^{''}}g_{(h\sigma)^{2}}(y^{''}-x+\lambda^{''}),\;\;\;\;\;\;\;\;\;\;y^{''}\in \mathcal{R}({{\varLambda}}^{''}) \\ &&= \sum\limits_{\lambda^{''}\in{{\varLambda}}^{''}}g_{(h\sigma)^{2}}(n^{''}+\lambda^{''}),\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;n^{''}\in \mathcal{R}({{\varLambda}}^{''}) \\ &&\overset{(a)}{=}\sum\limits_{\lambda^{\prime}\in{{\varLambda}}^{\prime}}g_{(h\sigma)^{2}}(n^{'}\frac{\tilde{B}^{-1}}{\sqrt{2}}+\lambda^{'}\frac{\tilde{B}^{-1}}{\sqrt{2}}),\;\;\;\;\;\;\;n^{'}\in\mathcal{R}({{\varLambda}}^{'}) \\ &&=\sum\limits_{\lambda^{'}\in{{\varLambda}}^{'}}g_{(h\sigma)^{2}}\left(\frac{\tilde{B}^{-1}}{\sqrt{2}}(n^{'}+\lambda^{'})\right),\;\;\;\;\;\;\;n^{'}\in\mathcal{R}({{\varLambda}}^{'}) \\ &&=\sum\limits_{\lambda^{'}\in{{\varLambda}}^{'}}g_{(h\sigma\sqrt{2}\tilde{B})^{2}}(n^{'}+\lambda^{'}),\;\;\;\;\;\;\;\;\;\;\;\;\;n^{'}\in\mathcal{R}({{\varLambda}}^{'}) \end{array}$$
(13)

where \(n^{\prime \prime }=z^{\prime }\mod {{{\varLambda }}^{\prime \prime }}\) and the equality (a) is due to the relations \({{\varLambda }}^{\prime }=\sqrt {2}\tilde {B}{{\varLambda }}^{\prime \prime }\), \(\lambda ^{\prime }=\sqrt {2}\tilde {B}\lambda ^{\prime \prime }\), and \(n^{\prime }\in \mathcal {R}({{\varLambda }}^{\prime })\), \(n^{\prime \prime }\in \mathcal {R}({{\varLambda }}^{\prime \prime })\). We observe from equation (13) that channel \(W^{\prime \prime }\) is statistically equivalent to \(W^{\prime \prime }: y^{\prime \prime } = x + z^{\prime \prime }\mod {{{\varLambda }}^{\prime }}\) where \(z^{\prime \prime }\sim \mathcal {N}(0,(h\sigma \sqrt {2}\tilde {B})^{2}\mathbb {I})\). Since the transition probabilities in equations (12) and (13) are two \({{\varLambda }}^{\prime }\)-periodic Gaussian distributions with variances \((h\sigma )^{2} < (h\sigma \sqrt {2}\tilde {B})^{2}\), we can prove \(W^{\prime \prime }\) is degraded with respect to \(W^{\prime }\) by introducing an intermediate \(L^{n}/{{\varLambda }}^{\prime }\) channel \(W^{\prime \prime \prime }\) with additive Gaussian noise \(z^{\prime \prime \prime }\sim \mathcal {N}(0,(h\sigma \sqrt {2}\tilde {B})^{2}\mathbb {I}-(h\sigma )^{2}\mathbb {I})\) such that \(W^{\prime \prime }\) is a concatenation of \(W^{\prime }\) and \(W^{\prime \prime \prime }\), i.e.,

$$\begin{array}{@{}rcl@{}} (x+z^{\prime\prime})\mod{{{\varLambda}}^{\prime}} &=& (x+z^{\prime}+z^{\prime\prime\prime}) \mod{{{\varLambda}}^{\prime}} \\ &=& \left((x+z^{\prime})\mod{{{\varLambda}}^{\prime}}\right) + z^{\prime\prime\prime} \mod{{{\varLambda}}^{\prime}}. \end{array}$$

The above concatenation satisfies the definition of channel degradation (Definition 3). □

Given the channel degradation relation between the fading \(L^{n}/{{\varLambda }}^{\prime }\) channel \(W^{\prime }\) and the fading \(L^{n}/{{\varLambda }}^{\prime \prime }\) channel \(W^{\prime \prime }\), it is guaranteed by Lemma 1 that the polar codes constructed for \(W^{\prime \prime }\) also fit in with \(W^{\prime }\). How to explicitly construct polar codes will be shown in the next section.

5 Polar coding for the tailored RLWE channel

As discussed in Section 2.4, we need a BDMS channel before we can apply the polar coding method, including calculating the Bhattacharyya parameters of the synthesized channels, defining the information set \(\mathcal {A}\) and frozen set \(\mathcal {A}^{c}\), encoding and SC decoding. We have already proved in Section 3.3 that the fading \(L/L^{\prime }\) channel \(W:X\rightarrow (\tilde {Y},H)\) in formula (9) is symmetric. Since we assume the channel gain H and the Gaussian noise Z to be continuous, and so is the channel output, we need to discretize the channel outputs \(H,\tilde {Y}\) before constructing polar codes. An elegant channel quantization scheme was proposed in [18] where the two outputs H and \(\tilde {Y}\) are discretized independently with a tractable loss of channel capacity. Basically, the channel gain H is discretized into a series of discrete values with uniform occurrence probability. As for the output \(\tilde {Y}\), we will decompose the \(L/L^{\prime }\) channel into multiple BDMS channels such that the overall channel capacity is almost preserved, with only negligible loss.

5.1 Quantization of the fading coefficient

As discussed in previous sections, the fading \(L/L^{\prime }\) channel with CSI available to the decoder is statistically equivalent to an independent combination of the fading coefficient H and an \(L/L^{\prime }\) channel with additive Gaussian noise of variance (hσ)2. Therefore, we first quantize H and then the \(L/L^{\prime }\) channel. Let {αi} be an ascending sequence of the following form

$$\alpha_{1}=1,\alpha_{2},\cdots,\alpha_{m},\alpha_{m+1}=+\infty,$$

so that for 1 ≤ i ≤ m we have

$${\int}_{\alpha_{i}}^{\alpha_{i+1}}P_{H}(h)dh=\frac{1}{m}.$$

We take the centroid of each interval (αi,αi+1) to form the discretized alphabet \({\mathscr{H}}_{q}=\{h_{i}\}\), i = 1,⋯ ,m, where hi is calculated as follows.

$$h_{i} = {\int}_{\alpha_{i}}^{\alpha_{i+1}}mhP_{H}(h)dh.$$
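
A numerical sketch of this quantizer (the grid-based integration is our own implementation choice, not the paper's) first finds the equiprobable boundaries {αi} from the cumulative distribution of P_H and then computes the centroids {hi}:

```python
import numpy as np

n, r, m = 1024, 2.0, 16
sigma2 = n * r * r / 2.0

def pdf_H(h):
    return h * (h * h - 1) / (2 * sigma2 ** 2) * np.exp(-(h * h - 1) / (2 * sigma2))

# fine grid over the effective support of H; the tail beyond it is negligible
h = np.linspace(1.0, 20.0 * np.sqrt(sigma2), 500_000)
w = pdf_H(h) * (h[1] - h[0])
cdf = np.cumsum(w)
cdf /= cdf[-1]                               # normalise away the truncation error

# alpha_i: equiprobable boundaries; h_i: centroid of each interval
alphas = np.interp(np.arange(1, m) / m, cdf, h)
edges = np.concatenate([[1.0], alphas, [h[-1]]])
idx = np.digitize(h, edges) - 1
centroids = np.array([m * np.sum(h[idx == i] * w[idx == i]) for i in range(m)])
print(centroids)
```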

5.2 Degrading transform quantization

As in Fig. 2, we view the tailored RLWE channel as an i.i.d. fading channel. For such a channel, polar codes are constructed in [18] to achieve the ergodic capacity C(W) as long as the receiver knows the CSI and the transmitter knows the CDI. Given n (\(n=2^{l}, l\in \mathbb {Z}\)) i.i.d. tailored RLWE channels \(W:X\rightarrow (\tilde {Y},H)\), we define the channel input as X1:n = U1:nGn where U1:n ∈{0,1}1:n and Gn is the generator matrix. We obtain n synthesized channels \(W_{n}^{(i)}:U^{(i)}\rightarrow (U^{1:i-1},\tilde {Y}^{1:n},H^{1:n})\) for 1 ≤ i ≤ n by performing channel combining and channel splitting. The Bhattacharyya parameter for W is defined as

$$Z(W)\overset{\triangle}{=}\sum\limits_{\tilde{y},h}\sqrt{P_{\tilde{Y},H|X}(\tilde{y},h|0)P_{\tilde{Y},H|X}(\tilde{y},h|1)}.$$

To compute \(Z(W_{n}^{(i)})\) efficiently, we employ the degrading transform proposed in [34] to quantize a BMS channel W with continuous output alphabet into a degraded, approximated BDMS channel WQ with finite output alphabet size. Intuitively, the finer the discretized output alphabet is, the better WQ approximates W. Since we have already discretized H into hi for i = 1,⋯ ,m, we can consider hi as a constant and quantize the \(L/L^{\prime }\) channel \(W_{h_{i}}:X,h_{i}\rightarrow \tilde {Y}\) for each hi.

We define the likelihood ratio (LR) of a channel W as

$$\lambda(\tilde{y},h_{i}){:=}\frac{W_{\tilde{Y}|X,h_{i}}(\tilde{y}|0,h_{i})}{W_{\tilde{Y}|X,h_{i}}(\tilde{y}|\lfloor\frac{q}{2}\rfloor\frac{1}{\sqrt{2}},h_{i})},$$
(14)

where the transition probability \(W_{\tilde {Y}|X,h_{i}}\) is

$$\begin{array}{@{}rcl@{}} W_{\tilde{Y}|X,h_{i}}(\tilde{y}|0,h_{i}) &=& f_{L^{\prime},0,{h_{i}^{2}}\sigma^{2}}(\tilde{y}) = \sum\limits_{\lambda\in L^{\prime}}g_{0,(h_{i}\sigma)^{2}}(\tilde{y}+\lambda),\\ W_{\tilde{Y}|X,h_{i}}(\tilde{y}|\lfloor\frac{q}{2}\rfloor\frac{1}{\sqrt{2}},h_{i}) &=& f_{L^{\prime},\lfloor\frac{q}{2}\rfloor\frac{1}{\sqrt{2}},{h_{i}^{2}}\sigma^{2}}(\tilde{y}) = \sum\limits_{\lambda\in L^{\prime}}g_{\lfloor\frac{q}{2}\rfloor\frac{1}{\sqrt{2}},(h_{i}\sigma)^{2}}(\tilde{y}+\lambda). \end{array}$$

Figure 5 depicts \(W_{\tilde {Y}|X,h_{i}}\) and \(\lambda (\tilde {y},h_{i})\) for the example setting q = 12289, r = 2 and \(h_{i} = 10, 30\).
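As a numerical illustration (our own sketch, not the authors' code), the densities above and the likelihood ratio (14) can be evaluated by truncating the sum over \(L^{\prime}\); here we assume \(L^{\prime}\) can be treated as the one-dimensional lattice \((q/\sqrt{2})\mathbb{Z}\), consistent with the output interval \([0,q/\sqrt{2})\), so the densities become truncated wrapped Gaussians. The numeric values below are placeholders, not the paper's noise parameters.

```python
# Sketch of the transition probabilities of W_{h_i} and the LR (14), assuming
# L' = (q/sqrt(2)) Z so that the sums over L' are truncated wrapped Gaussians.
import numpy as np

Q = 12289
PERIOD = Q / np.sqrt(2.0)                 # length of the output interval [0, q/sqrt(2))
C1 = np.floor(Q / 2) / np.sqrt(2.0)       # the constellation point floor(q/2)/sqrt(2)

def wrapped_gaussian(y, mean, std, K=20):
    """f_{L',mean,std^2}(y) ~= sum_{k=-K..K} N(y + k*PERIOD; mean, std^2)."""
    pts = y + np.arange(-K, K + 1) * PERIOD
    return np.sum(np.exp(-(pts - mean) ** 2 / (2 * std ** 2))) / (np.sqrt(2 * np.pi) * std)

def likelihood_ratio(y, h, sigma):
    """lambda(y, h) = W(y | 0, h) / W(y | C1, h); use the log domain if h*sigma is tiny."""
    s = h * sigma
    return wrapped_gaussian(y, 0.0, s) / wrapped_gaussian(y, C1, s)

# toy usage (q = 12289 as in Fig. 5; h and sigma here are illustrative placeholders)
if __name__ == "__main__":
    ys = np.linspace(0.0, PERIOD, 6, endpoint=False)
    print([round(float(likelihood_ratio(y, h=10.0, sigma=300.0)), 4) for y in ys])
```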

Fig. 5 The probability density and likelihood ratio of \(W:X,h_{i}\rightarrow \tilde {Y}\)

We can see in Fig. 5 that the channel \(W_{h_{i}}:X, h_{i}\rightarrow \tilde {Y}\) is BMS with \(\tilde {Y}\) distributed continuously over the interval \([0,q/\sqrt {2})\). There exists a permutation function \(\pi (\tilde {y})=(\lfloor \frac {q}{2}\rfloor \frac {1}{\sqrt {2}}-\tilde {y})\mod q/\sqrt {2}\) such that \(W(\tilde {y}|0,h_{i})=W(\pi (\tilde {y})|\lfloor \frac {q}{2}\rfloor \frac {1}{\sqrt {2}},h_{i})\). Intuitively, the BMS channel \(W_{h_{i}}\) can be decomposed into infinitely many binary symmetric channels (BSCs) \(W_{c}:X,h_{i}\rightarrow \tilde {Y}_{c}\), where the output is \(\tilde {Y}_{c}\in \{y_{c},\pi (y_{c})\}\) for continuous \(y_{c}\in [0,q/\sqrt {2})\), the input is \(X\in \{0,\lfloor \frac {q}{2}\rfloor \frac {1}{\sqrt {2}}\}\), and the crossover probability is given by the corresponding probability density \(W(\tilde {y}_{c}|X,h_{i})\). Taking the representative \(\tilde{y}_{c}\) with likelihood ratio \(\lambda (\tilde {y}_{c},h_{i})\geq 1\), the crossover probability of the BSC Wc is \(\frac {1}{\lambda (\tilde {y}_{c},h_{i})+1}\). The capacity of this BSC is

$$C[\lambda(\tilde{y}_{c},h_{i})]= 1-\frac{\lambda(\tilde{y}_{c},h_{i})}{\lambda(\tilde{y}_{c},h_{i})+1}\log{\frac{\lambda(\tilde{y}_{c},h_{i})+1}{\lambda(\tilde{y}_{c},h_{i})}}- \frac{1}{\lambda(\tilde{y}_{c},h_{i})+1} \log{(\lambda(\tilde{y}_{c},h_{i})+1)},$$

where \(\lambda (\tilde {y}_{c},h_{i})\geq 1\). Quantitatively, the continuous decomposition of \(W_{h_{i}}\) preserves the channel capacity in the sense that

$$C(W_{h_{i}})={\int}_{\lambda(\tilde{y},h_{i})\geq1} (W_{\tilde{Y}|X,h_{i}}(\tilde{y}|0,h_{i})+W_{\tilde{Y}|X,h_{i}}(\tilde{y}|\lfloor\frac{q}{2}\rfloor\frac{1}{\sqrt{2}},h_{i})) C[\lambda(\tilde{y},h_{i})] d\tilde{y},$$

where the integral interval is restricted to \(\tilde {y}\) such that \(\lambda (\tilde {y},h_{i})\geq 1\). If we ignore the subtle geometrical error introduced by the rounding ⌊⋅⌋, we observe a symmetry in the graphs of Fig. 5 and find that the valid integration interval is

$$A:=[0,\lfloor\frac{q}{2}\rfloor\frac{1}{2\sqrt{2}}] \cup [\lfloor\frac{q}{2}\rfloor\frac{3}{2\sqrt{2}}, q\frac{1}{\sqrt{2}}].$$

We divide the interval A into ν segments Aj for j ∈ [ν] such that

$$\begin{array}{@{}rcl@{}} A_{j} &=& \left \{ \tilde{y}\in A:\frac{j-1}{\nu} \le C[\lambda(\tilde{y},h_{i})] < \frac{j}{\nu} \right \} \\ &=& \left \{\tilde{y}\in A: { \frac{1}{\mathfrak{h}_{2}^{-1}\left(\frac{\nu-j+1}{\nu}\right)}-1} \le \lambda(\tilde{y},h_{i}) < {\frac{1}{\mathfrak{h}_{2}^{-1}\left(\frac{\nu-j}{\nu}\right)}-1} \right \}, \end{array}$$

where \(\mathfrak {h}_{2}(\cdot )\) is the binary entropy function. Each \(A_{j}\) corresponds to a BSC with crossover probability

$$p_{j}=\frac{{\int}_{A_{j}}W_{\tilde{Y}|X,h_{i}}(\tilde{y}|\lfloor\frac{q}{2}\rfloor\frac{1}{\sqrt{2}},h_{i})d\tilde{y}}{{\int}_{A_{j}}W_{\tilde{Y}|X,h_{i}}(\tilde{y}|\lfloor\frac{q}{2}\rfloor\frac{1}{\sqrt{2}},h_{i})d\tilde{y}+{\int}_{A_{j}}W_{\tilde{Y}|X,h_{i}}(\tilde{y}|0,h_{i})d\tilde{y}},$$
(15)

where

$$\begin{array}{@{}rcl@{}} {\int}_{A_{j}}W_{\tilde{Y}|X,h_{i}}(\tilde{y}|0,h_{i})d\tilde{y}&=&{\int}_{A_{j}}\sum\limits_{\lambda\in L^{\prime}}g_{0,(h_{i}\sigma)^{2}}(\tilde{y}+\lambda)d\tilde{y}\\ {\int}_{A_{j}}W_{\tilde{Y}|X,h_{i}}(\tilde{y}|\lfloor\frac{q}{2}\rfloor\frac{1}{\sqrt{2}},h_{i})d\tilde{y} &=&{\int}_{A_{j}}\sum\limits_{\lambda\in L^{\prime}}g_{\lfloor\frac{q}{2}\rfloor\frac{1}{\sqrt{2}},(h_{i}\sigma)^{2}}(\tilde{y}+\lambda)d\tilde{y}. \end{array}$$

Since the lattice \(L^{\prime }\) contains infinitely many points, the sums above can only be evaluated numerically (e.g., by truncating them to the dominant terms); we thereby approximate \(f_{L^{\prime },0,{h_{i}^{2}}\sigma ^{2}}(\tilde {y})\) and \(f_{L^{\prime },\lfloor \frac {q}{2}\rfloor \frac {1}{\sqrt {2}},{h_{i}^{2}}\sigma ^{2}}(\tilde {y})\), and then \(\lambda (\tilde {y},h_{i})\), \(A_{j}\) and \(p_{j}\).
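One way to organize this numerical approximation is sketched below (our own code, not the authors'): the outputs with \(\lambda(\tilde{y},h_{i})\geq 1\) are binned into the \(\nu\) capacity segments \(A_{j}\) on a fine grid, and the crossover probabilities of formula (15) are accumulated per bin. The densities and the likelihood ratio are passed in as callables, e.g. the wrapped Gaussians from the previous sketch.

```python
# Sketch of the degrading transform for one quantized gain h_i (illustrative).
# p0(y) = W(y|0,h_i), p1(y) = W(y|floor(q/2)/sqrt(2),h_i), lam(y) = p0(y)/p1(y).
import numpy as np

def bsc_capacity(lam):
    """Capacity of the BSC with crossover probability 1/(lam+1), for lam >= 1."""
    p = np.clip(1.0 / (lam + 1.0), 1e-300, 0.5)
    return 1.0 + p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p)

def degrade(p0, p1, lam, period, nu, grid_points=20000):
    """Return the nu crossover probabilities p_j of (15) and the mass of each A_j."""
    ys = np.linspace(0.0, period, grid_points, endpoint=False)
    dy = period / grid_points
    lams = np.array([lam(y) for y in ys])
    mask = lams >= 1.0                                     # the valid region A
    caps = bsc_capacity(lams[mask])
    bins = np.minimum((caps * nu).astype(int), nu - 1)     # 0-indexed bin j <-> A_{j+1}
    w0 = np.array([p0(y) for y in ys])[mask] * dy
    w1 = np.array([p1(y) for y in ys])[mask] * dy
    pj, mass = np.zeros(nu), np.zeros(nu)
    for j in range(nu):
        sel = bins == j
        n0, n1 = w0[sel].sum(), w1[sel].sum()
        mass[j] = n0 + n1
        pj[j] = n1 / (n0 + n1) if n0 + n1 > 0 else 0.0     # formula (15)
    return pj, mass
```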

If we define \(z_{j}\) and its conjugate \(\bar {z}_{j}\) to be the two channel outputs of the BSC associated with \(A_{j}\), we obtain the discretized output alphabet of \(W_{h_{i}}\) as

$$\mathcal{Z}:=\{z_{1},\bar{z}_{1},z_{2},\bar{z}_{2},\cdots,z_{\nu},\bar{z}_{\nu}\}.$$

If we denote by WQ the discretized version of the original fading \(L/L^{\prime }\) channel \(W:X\rightarrow \tilde {Y},H\), the output alphabet of WQ is \({\mathscr{H}}_{q}\otimes \mathcal {Z}:=\{h_{i}\}\otimes \{z_{1},\bar {z}_{1},\cdots ,z_{\nu },\bar {z}_{\nu }\}\) for i ∈ [m] and j ∈ [ν] where ⊗ denotes the Cartesian product of two sets.

Lemma 4

The channel \(W_{Q}:X\rightarrow Z,H_{q}\) is degraded with respect to W.

Proof

We supply an intermediate channel \(W_{P}:(\tilde {Y},H)\rightarrow (Z,H_{q})\) such that

$$W_{P}(z,h_{q}|\tilde{y},h)=\begin{cases} 1, & \text{if}~z=z_{j},~\tilde{y}\in A_{j},~h_{q}=h_{i},~h\in[\alpha_{i},\alpha_{i+1}),\\ 1, & \text{if}~z=\bar{z}_{j},~\pi(\tilde{y})\in A_{j},~h_{q}=h_{i},~h\in[\alpha_{i},\alpha_{i+1}),\\ 0, & \text{otherwise}. \end{cases}$$

We observe a channel degradation relation such that

$$W_{Q}(z,h_{q}|x)=\int W_{\tilde{Y},H|X}(\tilde{y},h|x)W_{P}(z,h_{q}|\tilde{y},h)d\tilde{y}\;dh.$$

Corollary 1

Given that \(W_{Q}:X\rightarrow Z,H_{q}\) is degraded with respect to W, the capacity, Bhattacharyya parameter and frame error rate of the two channels are related as

$$C(W_{Q}) \leq C(W),\quad Z(W_{Q})\geq Z(W),\quad P_{e}(W_{Q})\geq P_{e}(W).$$

Proof

This follows directly from Lemmas 2 and 4. □

It is shown in [34] that the capacity loss introduced by the degrading transform is no greater than 1/ν. If we choose large alphabet sizes m and 2ν, the loss of capacity is negligible, and so are the changes in Z(⋅) and Pe(⋅). We also verified our channel quantization scheme with respect to the channel capacity. As shown in Fig. 6, for m = 20, ν = 50 and multiple choices of r, C(WQ) differs from C(W) only negligibly.

Fig. 6 A comparison between C(W) and C(WQ) for different r, when m = 20, ν = 50
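In principle, the capacity check behind Fig. 6 can be reproduced from the BSC decomposition: \(C(W_{Q})\) is the average over the m gain values of the mass-weighted BSC capacities, and refining \(\nu\) serves as a proxy for \(C(W)\) since the loss is at most \(1/\nu\). A minimal sketch (ours), reusing the \(p_{j}\) and masses produced by the degrading sketch above:

```python
# Sketch of the capacity of the quantized channel W_Q (illustrative helper names).
import numpy as np

def h2(p):
    """Binary entropy; p is clipped away from {0, 1} for numerical stability."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def capacity_WQ(pj_per_gain, mass_per_gain):
    """C(W_Q) = (1/m) * sum_i sum_j mass_{ij} * (1 - h2(p_{ij}))."""
    m = len(pj_per_gain)
    return sum(float(np.dot(mass, 1.0 - h2(np.asarray(pj))))
               for pj, mass in zip(pj_per_gain, mass_per_gain)) / m
```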

To summarize, the degrading transform converts the RLWE channel W with continuous output alphabet into a BDMS channel WQ with finite output alphabet, which can be viewed as a combination of m × ν BSCs. In this way, one can construct polar codes for WQ that also work for W.

5.3 Polar encoding and SC decoding

5.3.1 Encoding algorithm PolarEnc(⋅)

Given the BDMS channel WQ derived by channel quantization, we can adapt the polar encoding and decoding method introduced in Section 2.4 to WQ. Recall that the output alphabet of WQ has size m × 2ν. As channel combining and splitting proceed, the alphabet size of the synthesized channels \(W_{Qn}^{(i)}\) grows exponentially with the recursion depth. To handle this problem, we employ an approximation method proposed in [27] which reduces the output alphabet size of a BDMS channel by merging some of the output symbols, at a negligible and tractable loss of performance.

After computing the Bhattacharyya parameters of all the \(W_{Qn}^{(i)}\), we can define the information set \(\mathcal {A}\) and the frozen set \(\mathcal {A}^{c}\). Recall the encoding algorithm PolarEnc(m) in Table 1. We construct polar codes for the plaintext \(\textbf{m} = u^{1:n}\) as

$$\begin{array}{@{}rcl@{}} x^{1:n} &=& \textit{PolarEnc}(\textbf{m}) \\ &=& u_{\mathcal{A}}G_{N}(\mathcal{A})\oplus u_{\mathcal{A}^{c}}G_{N}(\mathcal{A}^{c}), \end{array}$$

where \(u_{\mathcal {A}}\) is the information vector and \(u_{\mathcal {A}^{c}}\) is the frozen vector. The encoding complexity is \(O(n\log n)\), where n equals the degree of the cyclotomic field of RLWE.
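A minimal sketch of PolarEnc(⋅) is given below (ours, not the authors' implementation); it uses the natural-order form of \(x^{1:n}=u^{1:n}G_{n}\), and whether a bit-reversal permutation is additionally applied is a convention choice that does not change the operation count. The loop performs exactly \(\frac{n\log n}{2}\) element-wise XORs regardless of the message, which is the count used in the isochrony argument of Section 7.2.

```python
# Sketch of PolarEnc: message on the information set A, zeros on the frozen set,
# then the O(n log n) polar butterfly transform over GF(2). Illustrative only.
import numpy as np

def polar_transform(u):
    """Butterfly implementation of x = u G_n (natural order, no bit reversal)."""
    x = np.array(u, dtype=np.uint8) & 1
    n, step = len(x), 1
    while step < n:
        for start in range(0, n, 2 * step):
            # combine pairs of sub-blocks: (a, b) -> (a xor b, b)
            x[start:start + step] ^= x[start + step:start + 2 * step]
        step *= 2
    return x

def polar_enc(msg_bits, info_set, n):
    """PolarEnc(m): u_A carries the message, u_{A^c} is the public all-zero frozen vector."""
    u = np.zeros(n, dtype=np.uint8)
    u[sorted(info_set)] = msg_bits
    return polar_transform(u)

# toy usage: n = 8, rate 1/2, an arbitrary illustrative information set
if __name__ == "__main__":
    print(polar_enc([1, 0, 1, 1], info_set=[3, 5, 6, 7], n=8))
```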

5.3.2 Decoding algorithm PolarDec(⋅)

The decoding algorithm PolarDec(⋅) is exactly the successive cancellation (SC) decoding originally proposed in [5]. Upon receiving the signal \(\tilde {y}^{1:n}\) (i.e. \(\tilde {y}^{1:n}=\tilde {B}\mathbf {y}\) in Table 1) and using their knowledge of the CSI \(h^{1:n}\), the recipient applies SC decoding to \((\tilde {y}^{1:n},h^{1:n})\) and produces an estimate \(\bar {u}^{1:n}\) of \(u^{1:n}\) as

$$\bar{u}^{(i)}=\begin{cases} \arg\max_{u^{(i)}\in\{0,1\}}W_{n}^{(i)}(\tilde{y}^{1:n},h^{1:n},\bar{u}^{1:i-1}|u^{(i)}), & \text{if}~i\in\mathcal{A},\\ 0, & \text{otherwise}, \end{cases}$$

where the transition probabilities of the synthesized channels \(W_{n}^{(i)}(\cdot |\cdot )\) are calculated recursively by the SC decoding algorithm with complexity \(O(n\log n)\). Details of SC decoding can be found in Appendix A. A frame error occurs if \(\bar {u}^{1:n}\neq u^{1:n}\); we use frame error probability and DFR interchangeably in this work. Additionally, PolarEnc(⋅) and PolarDec(⋅) require a constant number of operations for fixed choices of n and \(\mathcal {A}\), making isochronous implementations possible. Isochrony is discussed further in Section 7.2.
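The recursive structure of SC decoding can be sketched as follows (our own code, not the authors' Algorithms 1–3): working in the probability domain, each level combines pairs of transition-probability values with multiplications and additions only, and the bit decisions are plain comparisons, consistent with the division-free implementation discussed in Section 7.2. In practice the probability pairs should be rescaled at each level to avoid floating-point underflow; scaling both entries of a pair by the same constant leaves every comparison, and hence the decoded word, unchanged.

```python
# Sketch of SC decoding in the probability domain for the natural-order encoder
# sketched above. `probs[i]` is the pair (W(y_i,h_i|0), W(y_i,h_i|1)) and
# `frozen[i]` is True for positions in A^c (frozen bits are known to be 0).

def sc_decode(probs, frozen):
    """Return (u_hat, x_hat) for one polarization block."""
    n = len(probs)
    if n == 1:
        (p0, p1), = probs
        u = 0 if frozen[0] else int(p1 > p0)
        return [u], [u]
    half = n // 2
    # "minus" channel for the first half: marginalize the second half's input
    minus = [(pa0 * pb0 + pa1 * pb1, pa0 * pb1 + pa1 * pb0)
             for (pa0, pa1), (pb0, pb1) in zip(probs[:half], probs[half:])]
    u_left, a_hat = sc_decode(minus, frozen[:half])
    # "plus" channel for the second half, conditioned on the re-encoded first half
    plus = [((pa0, pa1)[a] * pb0, (pa0, pa1)[a ^ 1] * pb1)
            for (pa0, pa1), (pb0, pb1), a in zip(probs[:half], probs[half:], a_hat)]
    u_right, b_hat = sc_decode(plus, frozen[half:])
    x_hat = [ai ^ bi for ai, bi in zip(a_hat, b_hat)] + b_hat
    return u_left + u_right, x_hat
```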

6 Results: Performance analysis and improvement

According to Theorem 2, the frame error probability \(P_{e}(n,R,\mathcal {A})\) of SC decoding is upper bounded by the sum of \(Z(W_{n}^{(i)})\) over the information set. Since \(W_Q \preceq W\) and \(W_{Qn}^{(i)}\preceq W_{n}^{(i)}\) according to Lemma 1, we derive

$$P_{e}(n,R,\mathcal{A})\leq \sum\limits_{i\in\mathcal{A}}Z(W_{Qn}^{(i)}).$$
(16)
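Once the \(Z(W_{Qn}^{(i)})\) are available (e.g., from the degrading-transform construction of [34]), choosing \(\mathcal{A}\) and evaluating the bound in formula (16) is straightforward; the helper below is an illustrative sketch with hypothetical names.

```python
# Sketch: pick the R*n most reliable synthesized channels and sum their
# Bhattacharyya parameters, i.e. the right-hand side of (16).
import numpy as np

def choose_info_set_and_bound(z, rate):
    """z[i] = Z(W_Qn^{(i)}); returns (information set A, upper bound on P_e)."""
    z = np.asarray(z, dtype=float)
    k = int(round(rate * len(z)))
    info_set = np.argsort(z)[:k]                   # the k smallest parameters
    return sorted(info_set.tolist()), float(z[info_set].sum())
```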

Recall from Fig. 6 that the capacity of our tailored RLWE channel deteriorates dramatically because we use a tailored, shrunk constellation diagram. As a result, for most choices of r that are believed to be secure for RLWE-based PKE, we cannot reach the desired DFR below 2− 128 used as a benchmark in the NIST standardization. As explained in Section 3.2, we carefully and conservatively chose a cube \({{\varLambda }}^{\prime \prime }\) enclosed in the maximal sphere inscribed in \({{\varLambda }}^{\prime }\). There almost surely exist other valid choices of \({{\varLambda }}^{\prime \prime }\) larger than ours, though finding the optimal one is far from easy. A pragmatic solution is to gradually scale \({{\varLambda }}^{\prime \prime }\) up by a factor t ≥ 1 and run simulations for each value to check whether the numerical results for Pe agree with the upper bound in formula (16). We highlight that if t does not exceed some critical point, the channel degradation relation in Lemma 3 still holds. Therefore, the theoretical upper bound on Pe still applies after scaling the modulation constellation \({{\varLambda }}^{\prime \prime }\). See Remark 3 for further explanation.

Fig. 7 Frame error probability for RLWE-based PKE with q = 12289, n = 1024, r = 1 and multiple choices of scale factor t: simulation results vs. upper bounds

Figure 7 compares the upper bounds on the frame error probability Pe with our simulation results in the setting q = 12289, n = 1024, r = 1. The solid lines indicate the upper bounds on Pe for different code rates R. The solid lines with stars represent our simulation results which, for reasonably small DFR, comply with the upper bound. We aim to achieve Pe = 2− 128 at code rate R = 0.25. Clearly, this is unachievable when the scale factor is t = 1. We gradually increase t and obtain the corresponding estimate of Pe. The decoding performance improves significantly with even a slightly larger t; e.g., Pe is smaller than 10− 60(≈ 2− 200) at R = 0.25 for t = 2. When t = 2, the experimental result represented by the red star also complies with its theoretical estimate, i.e., the red solid line, which suggests that our estimate of Pe for t = 2 is reliable to some extent. Note that all these experiments target relatively large values of Pe which are feasible to verify by simulation.

Fig. 8 Frame error probability for RLWE-based PKE with q = 12289, n = 1024, r = 2 and multiple choices of scale factor t: simulation results vs. upper bounds

Fig. 9 Frame error probability for RLWE-based PKE with q = 12289, n = 1024, r = 2.83 and multiple choices of scale factor t: simulation results vs. upper bounds

Figure 8 can be interpreted in the same way as Fig. 7; the only parameter that differs is r = 2. The solid lines in different colors represent our estimates of Pe and the stars are our simulation results. With a scale factor as large as t = 6, the target R and Pe can be achieved. For the relatively large Pe shown in the graph, our simulation results comply with our estimates for t = 6, 7, 9, 11, 12. However, for t = 14 the simulation results are worse than our estimates, implying that the constellation diagram \({{\varLambda }}^{\prime \prime }\) has become too large and goes beyond the valid domain.

Fig. 10 Frame error probability for RLWE-based PKE with q = 12289, n = 1024, r = 3.46 and multiple choices of scale factor t: simulation results vs. upper bounds

In Fig. 9, r = 2.83. We observe that our estimates are effective for t = 8, 12 but fail for t > 12. In Fig. 10, none of our simulation results comply with the estimates, which implies that the scaling method does not apply for r ≥ 3.46.

Remark 3

The error sources of the scaled and tailored RLWE channel model are summarized as follows.

  (a) As t increases, the constellation space \({{\varLambda }}^{\prime \prime }\) may go beyond \({{\varLambda }}^{\prime }\), and our model then fails to describe the statistical behavior of the real channel.

  (b) The SC decoder treats \(\tilde {B}\textbf {y}\) as the output of a fading \(L^{n}/{{\varLambda }}^{\prime \prime }\) channel, while according to Table 1 it is actually the output of a fading \(L^{n}/ {{\varLambda }}^{\prime }/ {{\varLambda }}^{\prime \prime }\) channel, because Alice first performs a mod Rq operation and then computes \(\tilde {B}\textbf {y}\) upon receiving y from Bob. For small r the two channels have very close distributions, but they become less alike as r grows. This explains why our model fails for r ≥ 3.46 in Fig. 10.

  (c) One might suspect that the theoretical estimate in formula (16) no longer applies for any t > 1. This is not the case. As stated in Section 3.2, the constellation \({{\varLambda }}^{\prime }\) is shrunk by a factor of \(\sqrt {n}\) in length to obtain the tailored constellation \({{\varLambda }}^{\prime \prime }\). As a consequence of (a) and (b), slightly enlarging \({{\varLambda }}^{\prime \prime }\) does not affect the soundness of the channel degradation relation and formula (16) as long as t stays below some critical point. Finding that point is nontrivial, which is why we run simulations to explore the relation between t, r and the DFR. The disadvantage of this pragmatic method is that we cannot verify the small values of Pe that are of cryptographic interest.

7 Security analysis

7.1 Security improvement by new DFR

We define the concrete bit-security as \(\log _{2}\) of the time complexity of a given attack breaking a scheme with specific parameters of interest. We analyze the concrete bit-security of the proposed RLWE-based PKE by considering the best known generic attacks against ring-LWE and the corresponding cost models. A comprehensive survey of generic attacks and cost models can be found in [1, 2]. Since the proposed RLWE-based PKE differs from NewHope solely in how the plaintext is encoded and decoded, and the error-correcting code itself does not affect the security reduction, the security estimation of NewHope [4] extends to our case.

Following the security estimation in [4, 19], we focus on two generic attacks: (a) a primal attack, which constructs a unique shortest vector problem (uSVP) instance from LWE samples and solves it using the block Korkin–Zolotarev (BKZ) algorithm with classical/quantum sieving, and (b) a dual attack, which searches for the shortest vector in a dual lattice constructed from LWE samples using BKZ with classical/quantum sieving. We employ the cost model of [4], where the cost (in bits) of BKZ with classical/quantum sieving is 0.292β/0.265β, with β the block dimension of BKZ. In Table 2, we summarize the security estimates of the two attacks, where the cost is defined as \(\log _{2}\) of the time complexity of BKZ Footnote 10. Note that a variant of the dual attack is used by the estimator, which makes the cost different from [4].
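For reference, this cost model amounts to the following one-line helper (illustrative only, not part of the estimator of [1, 2]):

```python
# Bit-cost of BKZ with block dimension beta under the sieving cost model of [4].
def bkz_cost_bits(beta: int, quantum: bool = False) -> float:
    return (0.265 if quantum else 0.292) * beta

# e.g. bkz_cost_bits(1000) == 292.0 and bkz_cost_bits(1000, quantum=True) == 265.0
```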

There is a trade-off between the DFR and the bit-security level of RLWE-based PKE. Basically, a larger error term (or a larger binomial parameter k in NewHope) gives better security but a worse DFR. The motivation of this work is to employ polar codes to create a larger DFR margin so that the bit-security level can be improved while still achieving the target DFR. In the NIST standardization this target DFR is 2− 128; a more conservative target of 2− 140 is used in the literature [13, 32].

Table 2 shows the DFR and bit-security level of RLWE-based PKE using our polar coding scheme for different choices of the binomial parameter k (\(r=\sqrt {k/2}\)) and scale factor t. As discussed in the previous section, the scale factor of the constellation diagram cannot be larger than 12 for k = 8, otherwise the estimation of the DFR is no longer valid. We select the more conservative choice t = 11 and achieve a DFR of 2− 298 for n = 1024, q = 12289, k = 8 using our polar coding scheme, which is smaller than the DFR of 2− 216 of NewHope round 2 in the same setting. As discussed for Fig. 10, our calculation of the DFR no longer applies for k ≥ 24 (r ≥ 3.46).

In conclusion, our polar coding scheme and the selected parameters provide the RLWE-based PKE with a bit-security of at least 256 bits while achieving the target DFR of 2− 140 (and hence also 2− 128). This is a considerable improvement over NewHope round 2, which offers a bit-security of 235 bits with the same parameters. State-of-the-art studies of this kind fall into two categories. In [13], LDPC and BCH codes are used to increase the bit-security to 309 bits while achieving a DFR of 2− 140; however, their DFR estimation relies heavily on an “independence” assumption and their error-correcting algorithms are not isochronous. The other approach, proposed by Song et al. in [32], gives a tighter bound on the DFR of NewHope and increases the bit-security to 252 bits.

Table 2 Improved bit-security of RLWE PKE using polar codes for n = 1024,q = 12289

7.2 Resilience against timing-based attacks

When error-correcting codes are adopted in RLWE-based PKE, a major concern is resilience against timing-based attacks. Discussions of this kind can be found in [19, 31]. We employ a semi-formal definition of constant-time algorithms, called “isochrony” in [15]: an algorithm is isochronous if its execution time is independent of the sensitive parts of its input and output. This is a weaker notion than the conventional definition but suffices to argue security against timing attacks. We justify the isochrony of polar encoding and decoding in this section.

Encoding

As introduced in Section 2.4 and Section 5.3, the encoding of polar codes takes the plaintext \(u^{1:n}\) as input and yields a codeword as in equation (1). The block length n is equal to the degree of the cyclotomic field of RLWE. The encoding process comprises exactly \(\frac {n\log n}{2}\) XOR operations no matter what the plaintext \(u^{1:n}\) is, which can be verified on small examples as in Fig. 11. Note that it is sensible to carry out the calculation of the Bhattacharyya parameters of the synthesized channels offline, because they are determined by the distribution of the residue noise term \(et-se_{1}+e_{2}\) and can be computed once and for all. Therefore, the encoding is isochronous.

Fig. 11 Examples of polar encoding, ⊕: XOR logical operation, (a) n = 2 (b) n = 4

Decoding

As detailed in Appendix A, SC decoding comprises three types of operations: (1) recursive calculation of the transition probabilities \(W_{n}^{(i)}\) as in Algorithm 2, (2) comparisons of two transition probabilities as in line 9 of Algorithm 1, and (3) XOR operations as in Algorithm 3. As in [15], we prove SC decoding to be isochronous by showing that its timing is independent of the sensitive information of the protocol. For the decoding of RLWE PKE, the sensitive information includes the input \(\tilde {B}\mathbf {y}\) in Table 1 (we write \(y^{1:n}\) for short) and the output \(\bar {u}^{1:n}\) (i.e. the decoded plaintext \(u^{1:n}\)) of SC decoding, as well as the secret terms e, s, t, e1, e2 generated separately by the two parties. Note that the information set \(\mathcal {A}\) and its complement \(\mathcal {A}^{c}\) are determined by the distribution of the secret terms and the block length n, both of which are publicly known. The frozen vector (e.g., the all-zero vector) is also public. Table 3 summarizes which types of operations have timings relevant to which information.

Firstly, the recursive calculations of \(W_{n}^{(i)}\) are isochronous because their timings are independent of any sensitive information. As described in Appendix A, for any fixed n the SC decoder carries out exactly \(n\log n\) transition-probability assemblies as in equations (17) and (18). These assemblies are ordinarily floating-point operations. We use transition probabilities rather than the more popular likelihood-ratio recursions to avoid floating-point divisions, which are considered difficult to implement isochronously [29, 37].

Secondly, the floating-point comparisons of two transition probabilities in Algorithm 1 constitute the decision-making process that yields the output \(\bar {u}^{1:n}\). Generally speaking, comparing two close floating-point values may take longer, but the comparison is equally likely to return True or False nonetheless. Therefore, it is reasonable to consider the timing of this type of operation independent of \(\bar {u}^{1:n}\). In addition, the overall running time of the comparisons depends on \(\mathcal {A}\) and n, because comparisons only take place for positions in the information set \(\mathcal {A}\); no other sensitive information is related to the comparison operations.

Thirdly, the XOR operations in Algorithm 3 are the same as those performed in encoding. The number of XOR operations carried out by Algorithm 3 is uniquely determined by the block length n.

We conclude that the encoding and decoding are isochronous with respect to sensitive information including the plaintext, the input and output of SC decoding and the secret terms e,s,t,e1,e2.

Table 3 Relevance between timings of operations and information: relevant: \(\checkmark\), irrelevant: ×

8 Conclusions

We have presented the first example of a polar coding technique for improving the DFR of RLWE-based PKE, which takes advantage of viewing the protocol as a fading channel with CSI known to the decoder. Moreover, switching from the polynomial basis to the canonical basis removes the dependency in the residue noise term. The constellation space is tailored to derive an i.i.d. fading channel at the cost of decoding performance, and a scaling method is employed to counteract the performance loss. Both numerical and theoretical results are given to verify the DFR estimation. The advantages of our method are as follows.

  • We derive an i.i.d. channel model of the residue noise term in the H space using the canonical embedding. The decoder's partial knowledge of the noise term is exploited to improve the decoding performance.

  • The bit-security is increased to 256 bits while achieving the target DFR of 2− 140 in the setting n = 1024, q = 12289, k = 16 (r = 2.83). This improves on the benchmark of 252 bits in [32]. Though it does not reach the 309 bits reported in [13], their result relies on an “independence” assumption that may not hold.

  • Polar codes support isochronous implementations of encoding and decoding, while the LDPC and BCH codes employed in [13] do not. We show that the encoding and decoding of polar codes are isochronous with respect to the sensitive information of the protocol.

The disadvantages are as follows.

  • Switching between the two bases by multiplying by the matrices \(\tilde {B}\) and \(\tilde {B}^{-1}\) as in Table 1 increases the complexity of the protocol.

  • To derive an i.i.d. channel model, we designed a tailored modulation diagram whose code distance is smaller than that of the original modulation diagram \(\{0,\lfloor \frac {q}{2} \rfloor \}\). This hurts the decoding performance, but the power of polar codes and the proposed scaling method counteract this effect to some extent.

  • The critical points of the scale factor t and the noise parameter r beyond which the theoretical upper bound on Pe no longer applies remain unknown.