A survey on single server private information retrieval in a coding theory perspective

In this paper, we present a new perspective of single server private information retrieval (PIR) schemes by using the notion of linear error-correcting codes. Many of the known single server schemes are based on taking linear combinations between database elements and the query elements. Using the theory of linear codes, we develop a generic framework that formalizes all such PIR schemes. This generic framework provides an appropriate setup to analyze the security of such PIR schemes. In fact, we describe some known PIR schemes with respect to this code-based framework, and present the weaknesses of the broken PIR schemes in a unified point of view.


Introduction
Private information retrieval (PIR) was first introduced in [1] to address the following problem: retrieving an element from a database without revealing to the untrusted server managing the database any information about that element. Since its introduction, it has attracted many researchers and has been the focus of several works. Two solutions to this problem have been proposed, namely, the information-theoretic one and the computational one. The first one aims to the common features of several existing PIR schemes, and on the other hand, the retrieval function describes the key differences between the schemes. In terms of the framework, the privacy of a PIR scheme relies heavily on the retrieval function. We observe that several choices of retrieval functions are not safe to use, for example, finite field homomorphisms and vector space homomorphisms. Moreover, we discuss the weaknesses of many broken PIR schemes with respect to this code-based framework.
The paper is organized as follows: in Sect. 2, we introduce the notation that will be used throughout the paper and give the background on single server private information retrieval, and on linear codes over finite fields and over rings. In Sect. 3, we present the code-based framework and discuss its security from a general point of view. In Sect. 4, we provide a survey on four different PIR schemes, described in terms of the code-based framework. The first one is a basic scheme that uses a finite field homomorphism as the retrieval function, whereas the other three are based on the existing PIR schemes [19], [21] and [16], respectively. The last of these is the only scheme we present that is still unbroken. For the schemes based on [19] and [21], we also describe the existing attacks with respect to the proposed code-based framework. Finally, in Sect. 5, we draw some theoretical remarks on the generality of the framework, and on the security of single server PIR schemes.

Preliminaries
In this section we introduce the notation that we use in the paper and we recall some background on the theory of single server PIR. Moreover, we introduce the basic notions of error-correcting linear codes.

Notation
In this paper, we denote by $R$ a ring and by $R^\times$ the set of invertible elements in the ring $R$. Moreover, let $q$ be a prime power; then we denote by $\mathbb{F}_q$ the finite field of size $q$.
We use bold lower case, respectively bold upper case, letters to denote row vectors, respectively matrices. When we consider column vectors, we use the transpose symbol. The identity matrix of size $k$ is denoted by $\mathrm{Id}_k$. Given a vector $\mathbf{x}$ of length $n$ and a set $S \subset \{1, \dots, n\}$, we denote by $\mathbf{x}_S$ the projection of $\mathbf{x}$ on the coordinates indexed by $S$. In the same way, $\mathbf{A}_S$ denotes the projection of the $k \times n$ matrix $\mathbf{A}$ to the columns indexed by $S$.
For a set $S$ we denote by $S^C$ its complement. The support of a vector $\mathbf{x} \in \mathbb{F}_q^n$ is denoted by $\mathrm{Supp}(\mathbf{x}) = \{1 \le i \le n \mid x_i \ne 0\}$.
The $i$-th entry of a vector $\mathbf{x} \in \mathbb{F}_q^n$ is denoted by $\mathbf{x}[i]$, for $i \in \{1, \dots, n\}$. Given a set $S$ and a distribution $\chi$ on $S$, $x \leftarrow \chi$ represents a sample $x$ from $S$ following the distribution $\chi$.

Single server private information retrieval
A single server PIR is a scheme involving two parties, the user and the server. The server manages a database containing some public information, and the user is interested in retrieving some entries of the database, without revealing which item was queried.

Basic description
A basic description of a single server PIR scheme is as follows. Let the database be denoted by DB = {db 1 , … , db N } , containing N files, and suppose the user wishes to retrieve the i-th file db i . The user first constructs a query Q = {q 1 , … , q N } , which hides the information about the index i, and sends it to the server. The server computes a response by performing certain operations between q j and db j for each j, and returns it to the user. The scheme is said to be correct if the user can retrieve the desired file db i from the response.

Communication and computational cost
A simple solution to preserve the privacy is downloading the whole database. However, the communication cost of this operation, measured as the total number of bits exchanged by user and server, in the trivial case is too high, namely O(N) where N is the size of the database. Modern PIR protocols allow the user to retrieve data from the database, with a communication complexity much smaller than O(N) . Some common methods can be used to improve the communication cost of any PIR scheme. In Sect. 3.2, we discuss such techniques in detail.
Another important aspect of a single server PIR scheme is the computational cost. Since the database has to process each entry of the query, the schemes are computationally expensive.

Over finite fields
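Let $\mathbf{x}$ be a vector in $\mathbb{F}_q^n$. The Hamming weight of $\mathbf{x}$ is denoted by $\mathrm{wt}(\mathbf{x})$ and it is defined as the number of its nonzero entries, i.e., it is the size of its support. The Hamming distance between two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{F}_q^n$ is defined as the number of components in which the two vectors differ, i.e., $d(\mathbf{x}, \mathbf{y}) = |\{i \mid x_i \ne y_i\}|$.
An $[n, k]_q$ linear code $\mathcal{C}$ is a $k$-dimensional subspace of $\mathbb{F}_q^n$ endowed with the Hamming distance, and the elements of $\mathcal{C}$ are called codewords.
The minimum distance $d$ of $\mathcal{C}$ is the quantity $d := \min\{d(\mathbf{x}, \mathbf{y}) \mid \mathbf{x}, \mathbf{y} \in \mathcal{C}, \mathbf{x} \ne \mathbf{y}\}$. When the minimum distance $d$ of a linear code $\mathcal{C}$ is known, then $\mathcal{C}$ is denoted as an $[n, k, d]_q$ code.
A matrix $\mathbf{G} \in \mathbb{F}_q^{k \times n}$ whose rows form a basis for $\mathcal{C}$ is called a generator matrix of $\mathcal{C}$. Hence, with row vectors, we can define the code $\mathcal{C}$ as $\{\mathbf{x} \in \mathbb{F}_q^n \mid \mathbf{x} = \mathbf{u}\mathbf{G}, \mathbf{u} \in \mathbb{F}_q^k\}$. Similarly, we can define the code $\mathcal{C}$ as the kernel of a matrix $\mathbf{H} \in \mathbb{F}_q^{(n-k) \times n}$, i.e., $\mathcal{C} := \ker(\mathbf{H}) = \{\mathbf{x} \in \mathbb{F}_q^n \mid \mathbf{H}\mathbf{x}^\top = \mathbf{0}\}$. Such a matrix is called a parity-check matrix for the code $\mathcal{C}$. An information set of an $[n, k, d]_q$ code $\mathcal{C}$ is a set $I \subset \{1, \dots, n\}$ of size $k$ such that $|\mathcal{C}| = |\mathcal{C}_I|$, where $\mathcal{C}_I$ denotes the restriction of all codewords to the entries indexed by $I$.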
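The definitions above can be made concrete with a small brute-force sketch. It assumes a prime field size so that arithmetic is plain modular arithmetic, and it uses the binary [7, 4] Hamming code in systematic form as an illustrative example; the function names are hypothetical and indices are 0-based, unlike the 1-based convention of the text.

```python
from itertools import product

def hamming_weight(x):
    """Number of nonzero entries of x, i.e., the size of its support."""
    return sum(1 for xi in x if xi != 0)

def hamming_distance(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def encode(u, G, q):
    """Row-vector encoding u -> uG over F_q (q is assumed to be prime)."""
    return tuple(sum(ui * g for ui, g in zip(u, col)) % q for col in zip(*G))

def codewords(G, q):
    """All codewords of the code generated by the rows of G (brute force)."""
    return {encode(u, G, q) for u in product(range(q), repeat=len(G))}

def minimum_distance(G, q):
    """For a linear code, the minimum distance equals the minimum nonzero weight."""
    return min(hamming_weight(c) for c in codewords(G, q) if any(c))

def is_information_set(G, q, I):
    """I is an information set iff restricting codewords to I loses no codeword."""
    code = codewords(G, q)
    return len({tuple(c[i] for i in I) for c in code}) == len(code)

# Systematic generator matrix of the binary [7, 4, 3] Hamming code (illustrative).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
```

Brute-force enumeration is only viable for toy parameters; it is used here because it mirrors the definitions directly.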

Over rings
Let $R$ be a commutative ring with identity. A linear code $\mathcal{C}$ of length $n$ over $R$ is an $R$-submodule of $R^n$.
A linear code $\mathcal{C}$ of length $n$ over $R$ is called cyclic if $\mathbf{c} = (c_1, \dots, c_n) \in \mathcal{C}$ implies $(c_n, c_1, \dots, c_{n-1}) \in \mathcal{C}$. Equivalently, $\mathcal{C}$ is an ideal of the ring $R[x]/(x^n - 1)$.
A linear code $\mathcal{C}$ of length $n$ over $R$ is called negacyclic if $\mathcal{C}$ is an ideal of the ring $R[x]/(x^n + 1)$.
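To illustrate the correspondence between shifts and ideals, the following sketch implements multiplication in $(\mathbb{Z}/q\mathbb{Z})[x]/(x^n - w)$, where the hypothetical parameter `wrap` is $w = +1$ for the cyclic case and $w = -1$ for the negacyclic case. The choice $R = \mathbb{Z}/q\mathbb{Z}$ and the function names are assumptions made for the example.

```python
def quotient_mul(a, b, q, wrap):
    """Multiply a*b in (Z/qZ)[x]/(x^n - wrap); wrap=+1 is cyclic, wrap=-1 is negacyclic.

    Polynomials are coefficient lists [c_0, ..., c_{n-1}] of equal length n.
    """
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            # A term x^(i+j) with i+j >= n is reduced once using x^n = wrap.
            sign = 1 if i + j < n else wrap
            out[(i + j) % n] = (out[(i + j) % n] + sign * ai * bj) % q
    return out

def shift(c, q, wrap):
    """Multiplication by x: (c_1, ..., c_n) -> (wrap*c_n, c_1, ..., c_{n-1})."""
    return [(wrap * c[-1]) % q] + list(c[:-1])

if __name__ == "__main__":
    q, x = 17, [0, 1, 0, 0]
    c = [1, 2, 3, 4]
    print(quotient_mul(c, x, q, +1), shift(c, q, +1))  # cyclic shift
    print(quotient_mul(c, x, q, -1), shift(c, q, -1))  # negacyclic shift
```

Since a shift is multiplication by $x$, the set of multiples of any generator polynomial is closed under it, which is why ideals of the quotient ring are exactly the (nega)cyclic codes.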

Code-based framework
In this section, we present a generic framework for single server PIR schemes by using the notion of error-correcting codes. For simplicity, we first present the framework using a simple database setup. Later, we discuss different kinds of database setups that can be used to improve the communication complexity.

Code-based framework
Before we describe the framework in detail, we highlight some elements that are used in the framework:
• We describe the generic framework over a finite commutative ring $R$ using a retrieval function $f : R \to R$ and three subsets $X, Y, Z$ of $R$.
• The database files belong to the set $X$.
• In order to generate queries, we fix a randomly chosen linear code $\mathcal{C}$ over $R$. Each element of the query is the sum of a randomly chosen codeword in $\mathcal{C}$ and an error vector over $R$.
• To generate the error vectors corresponding to the non-desired files we use the set $Y$, whereas for the desired file we use the set $Z$.

Setup:
We define a retrieval function $f : R \to R$, and subsets $X, Y, Z \subseteq R$ satisfying: such that any linear combination of elements in $Y$ with scalars in $X$ belongs to $\ker(f)$, i.e., $x_1 y_1 + x_2 y_2 + \dots + x_j y_j \in \ker(f)$ whenever $x_1, \dots, x_j \in X$ and $y_1, \dots, y_j \in Y$.
3. $Z \subseteq f^{-1}(R^\times)$ such that $f(y + xz) = x f(z)$ for all $y \in \ker(f)$, $x \in X$ and $z \in Z$.
Note that $f$ does not need to be a ring homomorphism; it can be any kind of function from $R$ to $R$ satisfying the above three conditions. Let $\mathbf{m} = (m_i) \in X^N$ represent the database, i.e., there are $N$ files in the database. Suppose that the user wants to retrieve the $b$-th file from the database.
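Let $\mathcal{C}$ be a random linear code over $R$ of length $n$, i.e., $\mathcal{C}$ is an $R$-submodule of $R^n$.
Query generation: Let $\mathbf{g}_1, \dots, \mathbf{g}_m$ be generators of $\mathcal{C}$ as an $R$-module, and let $\mathrm{Enc} : R^m \to R^n$ be an encoding map of $\mathcal{C}$. Note that $\mathrm{Enc}$ is an $R$-linear map given by $(a_1, \dots, a_m) \mapsto a_1 \mathbf{g}_1 + \dots + a_m \mathbf{g}_m$. Let $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_N$ be randomly chosen elements in $R^m$, and define $\mathbf{c}_i = \mathrm{Enc}(\mathbf{a}_i)$ for all $i \in \{1, \dots, N\}$. Now, let $v$ be a randomly chosen fixed element in $\{1, \dots, n\}$, and randomly choose error vectors $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_N$ in $R^n$ such that they satisfy the following conditions that allow the reply extraction: $\mathbf{e}_i[v] \in Y$ for all $i \ne b$, and $\mathbf{e}_b[v] \in Z$. The query is then given by $\mathbf{q}_i := (\mathbf{a}_i, \mathbf{c}_i + \mathbf{e}_i)$ for all $i \in \{1, \dots, N\}$.

Reply generation: The response is generated by computing $\mathbf{r} := \sum_{i=1}^N m_i \mathbf{q}_i = \left(\sum_{i=1}^N m_i \mathbf{a}_i, \sum_{i=1}^N m_i (\mathbf{c}_i + \mathbf{e}_i)\right) =: (\mathbf{r}_1, \mathbf{r}_2)$.
Reply extraction: First we perform the decoding by applying the encoding map on $\mathbf{r}_1$, and obtain: $\mathbf{r}_2 - \mathrm{Enc}(\mathbf{r}_1) = \sum_{i=1}^N m_i \mathbf{e}_i$. After that we can use the retrieval function $f$ on the $v$-th coordinate, $f\left(\sum_{i=1}^N m_i \mathbf{e}_i[v]\right) = f\left(\sum_{i \ne b} m_i \mathbf{e}_i[v] + m_b \mathbf{e}_b[v]\right) = m_b f(\mathbf{e}_b[v])$. The above equalities follow from the conditions of the retrieval function. Now, since $f(\mathbf{e}_b[v]) \in R^\times$, the user recovers $m_b = f\left(\sum_{i=1}^N m_i \mathbf{e}_i[v]\right) f(\mathbf{e}_b[v])^{-1}$.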

Communication complexity and different database setups
With respect to the basic description of the code-based framework, the communication cost is more than the size of the whole database. Indeed, for each file, which is an element in $R$, we are sending a query element in $R^{n+m}$. Thus the total communication cost is $(N + 1)$ times the size of an element in $R^{n+m}$. We can improve the communication complexity by using a matrix database setup [1]  and repeats the query to retrieve each part of the file. Since the query is generated in order to retrieve only small portions of the desired file, the size of the ambient space reduces accordingly. Hence, relative to the size of the database, the query size reduces by a factor of $L$, and the response size increases by the same factor.
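The cost count for the basic setup can be checked with a few lines of arithmetic. The sketch below assumes that one ring element is stored in the smallest whole number of bits that can index the ring; the function names and the sample parameters are illustrative only.

```python
def element_bits(ring_size):
    """Bits needed to store one element of a ring with ring_size elements."""
    return max(1, (ring_size - 1).bit_length())

def framework_cost_bits(N, n, m, ring_size):
    """N query elements plus one response, each an element of R^(n+m)."""
    return (N + 1) * (n + m) * element_bits(ring_size)

def trivial_cost_bits(N, ring_size):
    """Downloading the whole database of N files, each a single ring element."""
    return N * element_bits(ring_size)

if __name__ == "__main__":
    N, n, m, ring_size = 1000, 8, 4, 256
    print(framework_cost_bits(N, n, m, ring_size), trivial_cost_bits(N, ring_size))
```

With these sample values the basic framework exchanges roughly $n + m$ times more bits than the trivial download, which is why the modified database setups matter.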

Security
The security of a single server computational PIR scheme is based on the difficulty of identifying the index of the desired file by looking at the query. With respect to the code-based framework, we can describe the security using the following distinguishability problem.
Problem 1 (Distinguishability Problem) Consider the notations of the setup and the query generation process of the code-based framework. Given the query vectors $\mathbf{q}_1, \dots, \mathbf{q}_N$, find the index $b$ of the desired file.
The difficulty of solving the distinguishability problem depends highly on the choice of the retrieval function $f$. In the following, we present two generic strategies that can be used to solve this problem. However, the computational cost of these strategies depends directly on the choice of the retrieval function and the error vectors $\mathbf{e}_1, \dots, \mathbf{e}_N$.

Consider the following matrix consisting of the query vectors
Observe that the vectors 1 for all j ∈ {1, … , n} belong to the column span of . We recall that the v-th coordinate of the error vectors are chosen in a special way, i.e.
Hence, one could solve Problem 1 by finding the vector

Examples of different PIR's in our framework
In this section, we discuss several examples of single server PIR schemes that are based on different kinds of retrieval function. In each case, we analyze the security with respect to the distinguishability problem. In Table 1, we summarize all the differences among the schemes.

Basic PIR scheme using finite field isomorphism
In the following we describe the simplest case of the code-based framework, i.e., by considering linear codes over an arbitrary finite field and a field homomorphism for the retrieval function.

Scheme
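Setup: We consider $X = \mathbb{F}_q$. A non-zero field endomorphism $f$ satisfying $f(xz) = x f(z)$ for all $x \in X$ must fix every element of $\mathbb{F}_q$, so the retrieval function $f : \mathbb{F}_q \to \mathbb{F}_q$ has to be the identity map. Since $\ker(f) = \{0\}$, we take $Y = \{0\}$ and $Z = \mathbb{F}_q^\times$. It is easy to see that $f$ satisfies all the conditions of a retrieval function. Let $\mathbf{m} = (m_i) \in \mathbb{F}_q^N$ represent the database, i.e., there are $N$ files in the database, each file being an element of $\mathbb{F}_q$. Let $\mathcal{C}$ be a random linear $[n, k]$ code over $\mathbb{F}_q$. The code $\mathcal{C}$ is kept secret by the user.
Query generation: Let $\mathbf{G}$ be a generator matrix of $\mathcal{C}$, and let $I \subseteq \{1, \dots, n\}$ be an information set. We use $\mathbf{G}$ to perform the encoding, i.e., the encoding map $\mathrm{Enc} : \mathbb{F}_q^k \to \mathbb{F}_q^n$ is given by $\mathbf{a} \mapsto \mathbf{a}\mathbf{G}$. Let $\mathbf{a}_1, \dots, \mathbf{a}_N$ be randomly chosen vectors in $\mathbb{F}_q^k$, and define the corresponding codewords $\mathbf{c}_i := \mathrm{Enc}(\mathbf{a}_i) = \mathbf{a}_i \mathbf{G}$ for all $i \in \{1, \dots, N\}$. Recall that in the code-based framework we send the $\mathbf{a}_i$'s in the query to facilitate the decoding in the reply extraction process. However, in this case, this can equivalently be achieved by adding no errors at the coordinates that are indexed by $I$. In particular, let $v$ be a random element in $I^C$, and randomly choose error vectors $\mathbf{e}_1, \dots, \mathbf{e}_N \in \mathbb{F}_q^n$ such that $\mathrm{Supp}(\mathbf{e}_i) \subseteq I^C$ for all $i$, $\mathbf{e}_i[v] = 0$ for all $i \ne b$, and $\mathbf{e}_b[v] \in Z$. The query is then given by $\mathbf{q}_i := \mathbf{c}_i + \mathbf{e}_i$ for all $i \in \{1, \dots, N\}$.

Reply generation:
The database computes $\mathbf{r} := \sum_{i=1}^N m_i \mathbf{q}_i = \sum_{i=1}^N m_i (\mathbf{c}_i + \mathbf{e}_i)$.
Reply extraction: Since $I$ is an information set and $\mathrm{Supp}\left(\sum_{i=1}^N m_i \mathbf{e}_i\right) \subseteq I^C$, we can perform decoding on $\mathbf{r}$ by computing $\mathbf{r} - \mathbf{r}_I \mathbf{G}_I^{-1} \mathbf{G} = \sum_{i=1}^N m_i \mathbf{e}_i$. We now only consider the $v$-th coordinate of this vector and apply the identity retrieval function, which gives $\sum_{i=1}^N m_i \mathbf{e}_i[v] = m_b \mathbf{e}_b[v]$. Since $\mathbf{e}_b[v]$ is invertible and known to the user, $m_b$ is recovered.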
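The three phases of the basic scheme can be sketched end to end as follows. This is a toy illustration under stated assumptions, not the reference construction: the field is a prime field $\mathbb{F}_p$, the secret generator matrix is taken in systematic form so that the first $k$ coordinates form the information set, errors outside the special coordinate are drawn uniformly, indices are 0-based, and all function names are hypothetical. It needs Python 3.8 or later for the modular inverse `pow(z, -1, p)`.

```python
import random

def encode(a, G, p):
    """Row-vector encoding a -> aG over F_p."""
    return [sum(ai * g for ai, g in zip(a, col)) % p for col in zip(*G)]

def generate_query(N, b, n, k, p, rng):
    """Return (query, secret) for retrieving file b (0-based)."""
    # Secret systematic generator matrix G = [Id_k | P]; information set {0,...,k-1}.
    G = [[int(r == c) for c in range(k)] + [rng.randrange(p) for _ in range(n - k)]
         for r in range(k)]
    v = rng.randrange(k, n)   # special coordinate, outside the information set
    z = rng.randrange(1, p)   # e_b[v]: a nonzero, hence invertible, field element
    query = []
    for i in range(N):
        a = [rng.randrange(p) for _ in range(k)]
        c = encode(a, G, p)
        e = [0] * k + [rng.randrange(p) for _ in range(n - k)]  # no errors on I
        e[v] = z if i == b else 0   # zero for the other files, nonzero for file b
        query.append([(ci + ei) % p for ci, ei in zip(c, e)])
    return query, (G, v, z)

def server_reply(db, query, p):
    """Linear combination of the query vectors with the files as scalars."""
    n = len(query[0])
    return [sum(m * q[j] for m, q in zip(db, query)) % p for j in range(n)]

def extract(reply, secret, p):
    """Strip the codeword part, then read off m_b from coordinate v."""
    G, v, z = secret
    k = len(G)
    codeword_part = encode(reply[:k], G, p)  # re-encode the error-free coordinates
    error_v = (reply[v] - codeword_part[v]) % p
    return error_v * pow(z, -1, p) % p

if __name__ == "__main__":
    rng = random.Random(0)
    p, n, k = 101, 8, 4
    db = [rng.randrange(p) for _ in range(20)]
    query, secret = generate_query(len(db), 7, n, k, p, rng)
    print(extract(server_reply(db, query, p), secret, p) == db[7])
```

Using a systematic generator matrix makes the decoding step a single re-encoding of the first $k$ reply coordinates; a general information set would require inverting the corresponding $k \times k$ submatrix instead.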

Security
As we discussed in Sect. 3.3, the security of the presented PIR scheme relies on the hardness of solving the distinguishability problem (see Problem 1). In this case, the distinguishability problem can be solved in polynomial time using the first strategy mentioned in Sect. 3.3.
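Let $\mathbf{A}$ be the matrix containing all the query vectors as rows, i.e., the $i$-th row of $\mathbf{A}$ is $\mathbf{q}_i = \mathbf{c}_i + \mathbf{e}_i$. Since no errors are added at the coordinates indexed by $I$ and $I$ is an information set, every column of the matrix with rows $\mathbf{c}_i$ lies in the span of the columns of $\mathbf{A}$ indexed by $I$. This implies that the columns of the matrix with rows $\mathbf{e}_i$ belong to the column span of $\mathbf{A}$, and hence so does the vector $(\mathbf{e}_1[v], \dots, \mathbf{e}_N[v])^\top = (0, \dots, 0, \mathbf{e}_b[v], 0, \dots, 0)^\top$. This means that the $b$-th unit vector, i.e., the all zero vector having the entry 1 at the $b$-th position, is in the column span of $\mathbf{A}$. An attacker can easily find such a vector by simply going through all $N$ unit vectors and checking whether they lie in the column span of $\mathbf{A}$. Moreover, the existence of another vector of Hamming weight one in the column span of $\mathbf{A}$ is very unlikely. More precisely, given an $N \times (n - 1)$ random matrix where $N > n$, the probability of having a weight one vector in its column span is $(n - 1) q^{n - N}$, which is negligible. Even if such additional vectors exist, the column span of $\mathbf{A}$ contains at most $n$ unit vectors, which still leaks information about the index $b$, since $n < N$.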
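The unit-vector test described above can be sketched as a rank comparison: a unit vector lies in the column span of the query matrix exactly when appending it as an extra column does not increase the rank. The sketch below assumes a prime field, rebuilds a toy query with a systematic secret generator matrix so that it is self-contained, and uses hypothetical function names and 0-based indices.

```python
import random

def rank_mod_p(rows, p):
    """Rank of a matrix over F_p by Gaussian elimination (p prime)."""
    M = [list(r) for r in rows]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][col] % p), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][col], -1, p)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][col] % p:
                factor = M[r][col]
                M[r] = [(x - factor * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
        if rank == len(M):
            break
    return rank

def unit_vectors_in_column_span(A, p):
    """Indices j such that the j-th unit vector lies in the column span of A."""
    base = rank_mod_p(A, p)
    hits = []
    for j in range(len(A)):
        augmented = [row + [int(i == j)] for i, row in enumerate(A)]
        if rank_mod_p(augmented, p) == base:
            hits.append(j)
    return hits

def toy_query(N, b, n, k, p, rng):
    """Query matrix of the basic scheme with a systematic secret generator matrix."""
    G = [[int(r == c) for c in range(k)] + [rng.randrange(p) for _ in range(n - k)]
         for r in range(k)]
    v = rng.randrange(k, n)
    A = []
    for i in range(N):
        a = [rng.randrange(p) for _ in range(k)]
        c = [sum(a[r] * G[r][j] for r in range(k)) % p for j in range(n)]
        e = [0] * k + [rng.randrange(p) for _ in range(n - k)]
        e[v] = rng.randrange(1, p) if i == b else 0
        A.append([(ci + ei) % p for ci, ei in zip(c, e)])
    return A

if __name__ == "__main__":
    rng = random.Random(0)
    A = toy_query(N=30, b=11, n=8, k=4, p=10007, rng=rng)
    print(unit_vectors_in_column_span(A, 10007))
```

The attacker needs only the query matrix and the field size, and performs $N + 1$ rank computations, so the attack runs in polynomial time.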

HHWZ PIR scheme
Recently, Holzbaur, Hollanti and Wachter-Zeh proposed the first single server PIR scheme based on coding theory in [19]. In this PIR scheme the authors consider the field extension $\mathbb{F}_{q^m}$ and secretly choose a partition of the basis over $\mathbb{F}_q$. Shortly after, this proposal was attacked in [20], using the observation that removing one row from the query matrix and checking the dimension of the rest reveals the position of the desired file.
In the following, we describe this PIR scheme presented in [19] with respect to our code-based framework. Later, we also present the attack [20] in terms of solving the distinguishability problem.
Note that the original PIR scheme differs from our description in the database and query setup. In [19], the authors consider the database elements to be L × matrices over the base field q , and the query elements are also × n matrices over the base field q m . Note that the authors have used the technique of iterative reply generation, i.e., by using the same query to retrieve each of the L rows of the database file. In the following description, we consider L = 1 and use an equivalent setup where the database files are single elements in q and the query elements are vectors over q m.

Scheme
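In this case, we work over an extension of the finite field $\mathbb{F}_q$ and the retrieval function is an $\mathbb{F}_q$-linear map.
Setup: Let $\{\gamma_1, \dots, \gamma_m\}$ be a basis of $\mathbb{F}_{q^m}$ as an $\mathbb{F}_q$-vector space. Further, let $V$ be the subspace $\mathrm{Span}_{\mathbb{F}_q}(\gamma_1, \dots, \gamma_s)$ and $W$ be $\mathrm{Span}_{\mathbb{F}_q}(\gamma_{s+1}, \dots, \gamma_m)$, where $s$ is some integer in $\{1, \dots, m\}$. The retrieval function is given as the projection onto $V$ along $W$, i.e., $\mathrm{Proj}_V : \mathbb{F}_{q^m} \to \mathbb{F}_{q^m}$, $\sum_{j=1}^m x_j \gamma_j \mapsto \sum_{j=1}^s x_j \gamma_j$ with $x_j \in \mathbb{F}_q$. It is easy to check that $\mathrm{Proj}_V$ satisfies all the conditions of the retrieval function.
Let $\mathbf{m} = (m_i) \in \mathbb{F}_q^N$ be the database, i.e., there are $N$ files in the database, each file being an element of $\mathbb{F}_q$. Suppose the user wants to retrieve the $b$-th file from the database. Let $\mathcal{C}$ be a random $[n, k]$ linear code over $\mathbb{F}_{q^m}$.
Query generation: For the encoding and decoding, we follow the same procedure as in Sect. 4.1.
Let $\mathbf{G}$ be a generator matrix of $\mathcal{C}$, and let $I \subseteq \{1, \dots, n\}$ be an information set. We use $\mathbf{G}$ to perform the encoding, i.e., the encoding map is $\mathrm{Enc} : \mathbb{F}_{q^m}^k \to \mathbb{F}_{q^m}^n$ given by $\mathbf{a} \mapsto \mathbf{a}\mathbf{G}$. Let $\mathbf{a}_1, \dots, \mathbf{a}_N$ be randomly chosen vectors in $\mathbb{F}_{q^m}^k$, and define the corresponding codewords $\mathbf{c}_i := \mathrm{Enc}(\mathbf{a}_i) = \mathbf{a}_i \mathbf{G}$ for all $i \in \{1, \dots, N\}$. As in Sect. 4.1, we perform the decoding by adding no errors at the coordinates that are indexed by $I$.
Let $v$ be a fixed element in $I^C$. Now, we choose error vectors $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_N$ randomly in $\mathbb{F}_{q^m}^n$ such that $\mathrm{Supp}(\mathbf{e}_i) \subseteq I^C$ for all $i$, $\mathbf{e}_i[v] \in W$ for all $i \ne b$, and $\mathbf{e}_b[v] \notin W$. The query is then given by $\mathbf{q}_i := \mathbf{c}_i + \mathbf{e}_i$ for all $i \in \{1, \dots, N\}$.
Reply generation: The response is generated by computing $\mathbf{r} := \sum_{i=1}^N m_i \mathbf{q}_i$.
Reply extraction: Since $I$ is an information set and $\mathrm{Supp}\left(\sum_{i=1}^N m_i \mathbf{e}_i\right) \subseteq I^C$, we can perform the decoding on $\mathbf{r}$ by computing $\mathbf{r} - \mathbf{r}_I \mathbf{G}_I^{-1} \mathbf{G} = \sum_{i=1}^N m_i \mathbf{e}_i$. Now we consider the $v$-th coordinate of this vector and apply the retrieval function, which gives $\mathrm{Proj}_V\left(\sum_{i=1}^N m_i \mathbf{e}_i[v]\right) = m_b \mathrm{Proj}_V(\mathbf{e}_b[v])$.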

Security
The original PIR scheme [19] has been attacked in [20], by solving the distinguishability problem. The attack follows the second strategy mentioned in Sect. 3.3.
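Let $\mathbf{A}$ be the matrix containing all the query vectors as rows, i.e., the $i$-th row of $\mathbf{A}$ is the query vector $\mathbf{q}_i$. For each $i \in \{1, \dots, N\}$, let $\mathbf{A}_i$ be the submatrix of $\mathbf{A}$ obtained by deleting the $i$-th row. Then the $\mathbb{F}_q$-ranks of these matrices satisfy the following proposition.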

Proposition 1 [20, Proposition 3.1] Let be given as above. Then
Moreover, for all i ∈ {1, … , N} In the case when N < mn , the query size becomes bigger than the size of the database, i.e., the scheme is no better than the trivial PIR protocol of downloading entire database. Hence, we assume N ≥ mn and we use the following corollary to distinguish the index b in polynomial time.
Proof From Proposition 1, we have that rank In the first case, we have that rank q ( b ) = mk and rank q ( b ) = (n − k − 1)m + (m − s) (with high probability), where the first part comes from the columns not indexed by v, which live in the full space q m = W + V and the second part comes from the column indexed by v, which lives in the subspace W . Note that the equation rank q ( b ) = m(n − k) − s holds true with high probability due to the randomness of the matrix entries.
In the case of i ≠ b , we still have that rank q ( i ) = mk , but now rank q ( i ) = (n − k − 1)m + m (with high probability), where the first part comes from the columns not indexed by v and the second part comes from the column v (observe that in this case all columns are in the full space q m = W + V ). Note that the equation rank q ( i ) = m(n − k) holds true with high probability due to the randomness of the matrix entries. ◻
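As a small sanity check of the counting argument in the proof, the two rank contributions can be added up for concrete parameters. The values $m = 4$, $n = 6$, $k = 3$, $s = 2$ below are illustrative assumptions, not taken from [19] or [20]; the equalities hold only with high probability and for $N$ large enough, as stated in the proof.

```latex
\begin{align*}
\text{row } b \text{ deleted:}\quad
  & mk + (n-k-1)m + (m-s) = mn - s = 4\cdot 6 - 2 = 22,\\
\text{row } i \neq b \text{ deleted:}\quad
  & mk + (n-k-1)m + m = mn = 4\cdot 6 = 24.
\end{align*}
```

The gap of $s$ between the two totals is what the attacker detects: deleting the row of the desired file is the only deletion after which the column indexed by $v$ is confined to the subspace $W$ of dimension $m - s$.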

AMG PIR scheme
In the following, we describe the PIR scheme presented in [21] with respect to our code-based framework. Later, we also present the lattice-based attack [22] in terms of solving the distinguishability problem. Note that the original PIR scheme differs from our description in the following way:
• Database setup: in [21], the authors consider the database elements to be vectors over the base field $\mathbb{F}_p$. Moreover, each query element is a matrix over $\mathbb{F}_p$. In the following description, we use an equivalent setup where the database files are single elements in $\mathbb{F}_p$ and query elements are vectors over $\mathbb{F}_p$.
• Noise-scrambling matrix: the authors introduce an invertible diagonal matrix in order to disguise the soft-noise error vectors from the hard-noise error vectors. In our description, we ignore this scrambling matrix, as we will see in the security discussion that it has no effect on the column space of the query matrix.
• In [21], the rate $k/n$ of the underlying linear code is fixed to $k/n = 0.5$. In our description we use an arbitrary rate.

Scheme
In this scheme, we work over a finite field $\mathbb{F}_p$, where $p$ is a prime number. We will see $\mathbb{F}_p$ as $\{-\lfloor p/2 \rfloor, \dots, \lfloor p/2 \rfloor\}$.
Setup: Assume that the database is of the form $\mathbf{m} = (m_i) \in \{0, 1, \dots, 2^{\ell} - 1\}^N$ with $\ell = \lceil \log_2(N) \rceil + 1$, i.e., there are $N$ files in the database, each of size $\ell$ bits. Note that if the file size is bigger than $\ell$ bits, then we split the files in chunks of $\ell$ bits. Suppose the user wants to retrieve the $b$-th file from the database.
Let $p$ be a prime number greater than $2^{3\ell}$ and $t = 2^{2\ell}$. The retrieval function is defined through the remainder of the Lee weight modulo $t$, where $\mathrm{wt}_L^t$ denotes the Lee weight on $\mathbb{Z}/t\mathbb{Z} = \{0, 1, \dots, t - 1\}$, which is defined as $\mathrm{wt}_L^t(z) := \min\{z, t - z\}$. Now observe that a linear combination of elements in $Y$ with scalars from $X$ having an arbitrary number of terms does not necessarily belong to $\ker(f)$. However, the condition is satisfied when we have at most $N$ terms in the linear combination, i.e., for $x_1, \dots, x_N \in X$ and $y_1, \dots, y_N \in Y$. Let $\mathcal{C}$ be a random linear $[n, k]$ code over $\mathbb{F}_p$, which is kept secret by the user.
Query generation: For the encoding and decoding, we follow the same procedure as in Sects. 4.1 and 4.2.
Let $\mathbf{G}$ be a generator matrix of $\mathcal{C}$, and let $I \subseteq \{1, \dots, n\}$ be an information set. We use $\mathbf{G}$ to perform the encoding, i.e., the encoding map is $\mathrm{Enc} : \mathbb{F}_p^k \to \mathbb{F}_p^n$ given by $\mathbf{a} \mapsto \mathbf{a}\mathbf{G}$. Let $\mathbf{a}_1, \dots, \mathbf{a}_N$ be randomly chosen vectors in $\mathbb{F}_p^k$, and define the corresponding codewords $\mathbf{c}_i := \mathrm{Enc}(\mathbf{a}_i) = \mathbf{a}_i \mathbf{G}$ for all $i \in \{1, \dots, N\}$.
As in Sects. 4.1 and 4.2, we perform the decoding by adding no errors at the coordinates that are indexed by $I$.
Let $v$ be a fixed element in $I^C$. Now, we choose error vectors $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_N$ randomly in $\mathbb{F}_p^n$ such that $\mathrm{Supp}(\mathbf{e}_i) \subseteq I^C$ for all $i$, $\mathbf{e}_i[v] \in \{-1, +1\}$ for all $i \ne b$, and $\mathbf{e}_b[v] = t$.

The query is then given by $\mathbf{q}_i := \mathbf{c}_i + \mathbf{e}_i$ for all $i \in \{1, \dots, N\}$.
Reply generation: The response is generated by computing $\mathbf{r} := \sum_{i=1}^N m_i \mathbf{q}_i$.
Reply extraction: Since $I$ is an information set and $\mathrm{Supp}\left(\sum_{i=1}^N m_i \mathbf{e}_i\right) \subseteq I^C$, we can perform the decoding on $\mathbf{r}$ by computing $\mathbf{r} - \mathbf{r}_I \mathbf{G}_I^{-1} \mathbf{G} = \sum_{i=1}^N m_i \mathbf{e}_i$. We will only focus on the $v$-th coordinate of this vector, which equals $\sum_{i \ne b} m_i \mathbf{e}_i[v] + m_b t$, and apply the retrieval function to it. Now since $\gcd(t, p) = 1$, we can retrieve $m_b$.

Security
In [22], Liu et al. presented a lattice-based attack on the AMG PIR scheme. The method used in the attack can be described as per the first strategy, mentioned in Sect. 3.3, to solve the distinguishability problem.
Let $\mathbf{A}$ be the matrix containing all the query vectors as rows, i.e., the $i$-th row of $\mathbf{A}$ is $\mathbf{q}_i$. The error column $(\mathbf{e}_1[v], \dots, \mathbf{e}_N[v])^\top$ has $N - 1$ entries from $\{-1, +1\}$ and one entry with value equal to $t$. If we delete the $b$-th row of $\mathbf{A}$, call it the matrix $\mathbf{A}_b$, then the vector $(\mathbf{e}_i[v])_{i \ne b}$ will be, with a very high probability, the shortest vector in the $p$-ary lattice generated by the columns of $\mathbf{A}_b$. More precisely, the lattice is generated by the columns of $[\mathbf{A}_b \mid p\,\mathrm{Id}_{N-1}]$. However, it is still infeasible to find this vector due to the large dimension of the lattice.
In [22], the authors construct multiple small dimensional lattices. Let $k \le s \le N$, and let $\mathbf{A}^{(1)}, \dots, \mathbf{A}^{(\lceil N/s \rceil)}$ be a row-wise partitioning of the matrix $\mathbf{A}$, i.e., $\mathbf{A}^{(i)}$ is the $s \times n$ matrix given by the $s$ rows of $\mathbf{A}$ indexed by $\{(i - 1)s + 1, \dots, is\}$. Now, let $L_i$ be the $p$-ary lattice generated by the columns of $\mathbf{A}^{(i)}$. Note that the dimension of the lattices $L_i$ is $s$, hence the attacker chooses $s$ such that running basis reduction algorithms on $L_i$ is feasible. In order to find the index $b$, the attacker goes through each of these lattices.
Note that, with the above indexing of the blocks, the index $b$ of the desired file corresponds to the lattice $L_{\lceil b/s \rceil}$, which the attacker is able to find, and then the attacker finds the index $b$ by solving the closest vector problem for $L_{\lceil b/s \rceil}$.
In more detail, in the case of $i \ne \lceil b/s \rceil$, we observe that the shortest vector in $L_i$ corresponds to the vector $(\mathbf{e}_{(i-1)s+1}[v], \dots, \mathbf{e}_{is}[v])$ having entries in $\{-1, +1\}$. This observation does not hold in the case of $i = \lceil b/s \rceil$ due to the presence of the large entry $t$. The attacker uses lattice reduction algorithms to find the shortest vector in each $L_i$, and consequently identifies the lattice $L_{\lceil b/s \rceil}$. Now, the index $b$ can be located by solving the closest vector problem. Let $j = \lceil b/s \rceil$. Then observe that $(\mathbf{e}_{(j-1)s+1}[v], \dots, \mathbf{e}_{js}[v]) \in L_j$ is the closest lattice vector to $(0, \dots, 0, t, 0, \dots, 0)$ (with $t$ at the position corresponding to $b$). To find the index $b$, we can use Kannan's embedding technique [23] to solve (at most) $s$ instances of the closest vector problem with input vectors of the form $(0, \dots, 0, t, 0, \dots, 0)$.

Ring-LWE based PIR schemes
In this section, we describe the PIR schemes constructed using the Ring-LWE (RLWE) based homomorphic encryption schemes. In particular, we consider the construction of the XPIR scheme [16] that uses the Ring-LWE based homomorphic encryption scheme presented in [24].
The original PIR scheme differs from our description in the error distribution as follows. In [16], the authors use two different distributions $\chi$ and $\chi'$ to sample errors. The distribution $\chi$ is used to generate the public key and the distribution $\chi'$, having larger variance, is used for encryption. In the following description, we consider only one distribution, mimicking $\chi'$, to sample error vectors in the query generation process.
We would like to remark that in the following description, the database elements and the query elements are polynomials of degree smaller than $n$ with coefficients in $R$, which can also be represented by vectors in $R^n$.

Scheme
In this scheme, we work over a finite ring $\mathbb{Z}/q\mathbb{Z}$, where $q$ is a positive integer. Instead of a random linear code over $\mathbb{Z}/q\mathbb{Z}$, we consider a random negacyclic code over $\mathbb{Z}/q\mathbb{Z}$.
Setup: Let $q, t$ be positive integers with $t < q$ and $\gcd(t, q) = 1$. The retrieval function is given by $f : \mathbb{Z}/q\mathbb{Z} \to \mathbb{Z}/q\mathbb{Z}$, $x \mapsto x \bmod t$, where $x$ is represented by an integer in $(-q/2, q/2]$. Let $\chi$ be a discrete Gaussian distribution with standard deviation $\sigma$. The parameters $q, n, t, \sigma$ are chosen such that they satisfy $N t^2 \sigma \sqrt{n} < q/2$, where $n$ is the length of the linear code that will be used in query generation. Now, we define the subsets $X = \{0, 1, \dots, t - 1\}$, $Y = \{ty \mid y \leftarrow \chi\}$ and $Z = \{tz + 1 \mid z \leftarrow \chi\}$. Observe that for $x_1, x_2, \dots, x_N \in X$ and $ty_1, ty_2, \dots, ty_N \in Y$ we have that $f\left(\sum_{i=1}^N x_i t y_i\right) = 0$. This works since the choice of parameters $q, n, t, \sigma$ implies that $\left|\sum_{i=1}^N x_i t y_i\right| < q/2$ with very high probability. And for $x \in X$, $ty \in Y$ and $tz + 1 \in Z$ we have that $f(ty + x(tz + 1)) = x = x f(tz + 1)$, since $|ty + x(tz + 1)| < q/2$.
Let $n$ be a power of 2, and let $R_q := (\mathbb{Z}/q\mathbb{Z})[x]/(x^n + 1)$. Let $\mathbf{m} = (m_i) \in (X[x]/(x^n + 1))^N$, i.e., there are $N$ files in the database and each file is an element in $R_q$ with coefficients in $X$. In particular, each file is of size $\log_2(t^n)$ bits. Suppose the user wants to retrieve the $b$-th file from the database.
Let $\mathcal{C}$ be a negacyclic code of length $n$ over $\mathbb{Z}/q\mathbb{Z}$ generated by some randomly chosen $s \in R_q$, i.e., $\mathcal{C}$ is an ideal in $R_q$ generated by $s$. The code is kept secret by the user.
Query generation: We use the generating polynomial $s$ to define the encoding map, i.e., $\mathrm{Enc} : R_q \to R_q$ is given by $a \mapsto as$.
Let $a_1, a_2, \dots, a_N$ be randomly chosen elements in $R_q$, and define $N$ codewords $c_i := a_i s$ for all $i \in \{1, \dots, N\}$. Now, we choose the errors $e_1, e_2, \dots, e_N$ in $R_q$ such that they satisfy the following two conditions that allow the reply extraction:
1. $e_i = t y_i$, with $y_i$ sampled from the distribution $\chi$, for all $i \ne b$,
2. $e_b = t y_b + 1$, with $y_b$ sampled from $\chi$.
Let $\mathbf{q}_i := (a_i, c_i + e_i)$ for all $i \in \{1, \dots, N\}$. The query is then given by $(\mathbf{q}_1, \dots, \mathbf{q}_N)$.
Reply generation: The response is generated by computing $\mathbf{r} := \sum_{i=1}^N m_i \mathbf{q}_i = \left(\sum_{i=1}^N m_i a_i, \sum_{i=1}^N m_i (c_i + e_i)\right) =: (r_1, r_2)$.

Reply extraction:
By applying the encoding map $\mathrm{Enc}$ on $r_1$, we first decode $r_2$ to obtain the error part, i.e., $r_2 - \mathrm{Enc}(r_1) = r_2 - r_1 s = \sum_{i=1}^N m_i e_i$. After that we can use the retrieval function $f$, $f\left(\sum_{i=1}^N m_i e_i\right) = f\left(t \sum_{i=1}^N m_i y_i + m_b\right) = m_b$.
Note that here we apply $f$ on an element of $R_q$, which is done by applying $f$ on each coefficient. The last equality follows from the conditions on the parameters $n, q, t, \sigma$, since the maximal coefficient of $\sum_{i=1}^N m_i e_i$ is, with high probability, upper bounded by $N t^2 \sigma \sqrt{n}$ (see [24, Lemma 1]), which is less than $q/2$.
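The query, reply and extraction steps of this section can be sketched with toy parameters as follows. Several points are assumptions made for the example and are not taken from [16] or [24]: the retrieval function is implemented as centered reduction modulo $q$ followed by reduction modulo $t$, the narrow Gaussian is replaced by uniform samples from $\{-1, 0, 1\}$, the parameters are far too small to be secure, indices are 0-based, and all function names are hypothetical.

```python
import random

def ring_mul(a, b, q):
    """Multiply in (Z/qZ)[x]/(x^n + 1), using x^n = -1."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] = (out[i + j] + ai * bj) % q
            else:
                out[i + j - n] = (out[i + j - n] - ai * bj) % q
    return out

def ring_add(a, b, q):
    return [(x + y) % q for x, y in zip(a, b)]

def ring_sub(a, b, q):
    return [(x - y) % q for x, y in zip(a, b)]

def generate_query(N, b, n, q, t, rng):
    """Return (query, s): pairs (a_i, a_i*s + e_i) and the secret generator s."""
    s = [rng.randrange(q) for _ in range(n)]
    query = []
    for i in range(N):
        a = [rng.randrange(q) for _ in range(n)]
        # Stand-in for a narrow discrete Gaussian: coefficients in {-1, 0, 1}.
        e = [t * rng.choice((-1, 0, 1)) for _ in range(n)]
        if i == b:
            e[0] += 1  # e_b = t*y_b + 1, where 1 is the constant polynomial
        query.append((a, ring_add(ring_mul(a, s, q), e, q)))
    return query, s

def server_reply(db, query, q):
    """Componentwise sum of m_i * q_i over all files."""
    n = len(db[0])
    r1, r2 = [0] * n, [0] * n
    for m, (a, c) in zip(db, query):
        r1 = ring_add(r1, ring_mul(m, a, q), q)
        r2 = ring_add(r2, ring_mul(m, c, q), q)
    return r1, r2

def extract(reply, s, q, t):
    """Remove the codeword part r1*s, center modulo q, then reduce modulo t."""
    r1, r2 = reply
    err = ring_sub(r2, ring_mul(r1, s, q), q)
    centered = [c - q if c > q // 2 else c for c in err]
    return [c % t for c in centered]

if __name__ == "__main__":
    rng = random.Random(0)
    q, t, n, N = 1_000_003, 16, 8, 6
    db = [[rng.randrange(t) for _ in range(n)] for _ in range(N)]
    query, s = generate_query(N, 2, n, q, t, rng)
    print(extract(server_reply(db, query, q), s, q, t) == db[2])
```

With these toy values every coefficient of $\sum_i m_i e_i$ is bounded in absolute value by $N \cdot n \cdot (t - 1)(t + 1) = 12240$, well below $q/2$, so the centered reduction recovers the integer polynomial exactly and the extraction is deterministic here rather than correct only with high probability.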

Security
As mentioned above, the XPIR scheme [16] uses the fully homomorphic encryption scheme presented in [24], whose security is based on the hardness of solving the polynomial learning with errors (PLWE) problem, which is a simplified version of the ring LWE problem. Let $R_q = (\mathbb{Z}/q\mathbb{Z})[x]/(x^n + 1)$, and let $\chi$ be a narrow discrete Gaussian distribution on $R_q$. Then the PLWE assumption states that it is computationally hard to distinguish a polynomial number of samples of the form $(a_i, a_i s + e_i)$ from the same number of samples of the form $(a_i, u_i)$, where $s$, the $a_i$'s and the $u_i$'s are sampled uniformly from $R_q$ and the $e_i$'s are sampled from $\chi$.
Moreover, [24, Proposition 1] states that if the samples are of the form $(a_i, a_i s + t e_i)$, where $a_i, s, e_i$ are as above and $t \in (\mathbb{Z}/q\mathbb{Z})^\times$, then distinguishing such samples from the uniform samples is equivalent to the PLWE assumption.
Let $\mathbf{A}$ be the query matrix and $\mathbf{A}_i$ be the submatrix of $\mathbf{A}$ obtained by deleting the $i$-th row. Translating the above mentioned approach [24, Proposition 1] to our generic framework means that distinguishing $\mathbf{A}_b$ from a uniformly sampled matrix is equivalent to the PLWE problem.
However, the second strategy mentioned in Sect. 3.3 aims in a different direction: that is, to distinguish between $\mathbf{A}_i$ for $i \ne b$ and $\mathbf{A}_b$. Thus, this might lead to new security analyses of such PIR schemes.

Generic PIR scheme vs code-based framework
A natural question would be to ask whether any single server PIR scheme can be described in terms of the code-based framework. The answer is no, as the number theoretic PIR scheme by Kushilevitz and Ostrovsky [25] does not fit the framework. However, if we restrict to the class of PIR schemes that generate replies by contracting the database elements and the query elements using linear combinations (which will be denoted from now on as additive PIR schemes), then the answer is yes. In the following, we discuss the requirements of an arbitrary additive PIR scheme and argue the necessity of the elements in the code-based framework to fulfil those requirements:
1. Ambient space: An additive PIR scheme needs two operations: multiplication ($*$) between database and query elements, and addition ($+$) of those products. Hence, the canonical choice of the ambient space is rings. For practical reasons, the rings should be finite.
2. Retrieval: Let the database be denoted by $DB = \{db_1, \dots, db_N\}$, and the corresponding query be given by $Q = \{q_1, \dots, q_N\}$. Suppose that the user wants to retrieve the $b$-th file. In an additive PIR scheme, the reply is $\sum_{i=1}^N db_i * q_i$ and the user wants to retrieve $db_b$ from the reply. The operation $\sum_{i=1}^N db_i * q_i \mapsto db_b$, denoted by $g$, is an analogue of the retrieval function used in the code-based framework. First we note that $g$ annihilates $\sum_{i \ne b} db_i * q_i$ in such a way that we are only left with $g(db_b * q_b)$. Then $db_b$ is recovered from $g(db_b * q_b)$. These two properties imply that the $db_i$'s and $q_i$'s live in special subsets of the ambient space $R$. Let $X$ denote the space of database elements, $Y$ denote the space of query elements that are not associated with the desired file and $Z$ denote the space of query elements associated with the desired file. The requirements on $g$ imply that: (1) a linear combination of elements in $Y$ with scalars in $X$ belongs to the kernel of $g$, and (2) $g(x * z) = x * g(z)$ and $g(z)$ is an invertible element, for any $x \in X$ and $z \in Z$. These two conditions are the basis of the conditions of the retrieval function used in the code-based framework.
3. Privacy: Another important aspect of a PIR scheme is privacy, i.e., given a query $Q$, it should be computationally infeasible to determine the index $b$ of the desired file. Let us look at the scenario where we directly use elements in $Y$ and $Z$ to generate query elements. Then the privacy relies on the hardness of the following decisional problem: given $q \in Y \cup Z$, decide whether $q \in Y$ or $q \in Z$. In general this may not be a hard problem, as one can apply the retrieval function to distinguish the elements between $Y$ and $Z$. Therefore, to ensure privacy we must add some randomness to the query elements. Moreover, the user should be able to remove this randomness even after receiving the reply that contains their linear combinations. This is exactly the rationale of linear error-correcting codes. We treat the elements of $Y$ and $Z$ as errors, and the added randomness belongs to a random linear code.

On security of PIR schemes
In terms of the code-based framework, the security of a PIR scheme relies on the type of the underlying retrieval function. As we have noticed from the examples in Sect. 4, the following types of retrieval functions are not safe to use.

1. Field homomorphisms: In the case where the retrieval function is a non-trivial field homomorphism, the PIR scheme is equivalent to the one described in Sect. 4.1. The kernel of the retrieval function must be $\{0\}$, as $\{0\}$ is the only proper ideal in any field. As a consequence, determining the index of the desired file becomes an easy task of finding a unit vector in the column space of the query matrix, thus it suffers from the first attack strategy discussed in Sect. 3.3.
2. Vector space homomorphisms: In this case, the resulting PIR scheme is equivalent to the HHWZ PIR scheme [19], described in Sect. 4.2. The kernel of a non-trivial linear map is a proper subspace of the parent vector space. This results in an exceptionally low rank of the matrix that is obtained from the query matrix by deleting the row that corresponds to the desired file, thus it suffers from the second attack strategy discussed in Sect. 3.3.
We can generalize these two cases to more types of retrieval functions. Clearly, the weakness of vector space homomorphisms can also be observed in the case of free module homomorphisms, because of the existence of the notion of rank and dimension for free modules. On the other hand, the weakness of field homomorphisms can be seen in the case of local ring homomorphisms: similar to the field homomorphism case, we observe the existence of a unit vector in the column space of the query matrix. The other two schemes, presented in Sects. 4.3 and 4.4 respectively, do not use additive retrieval functions. Both schemes work on the idea of using small modulus errors in a large modulus ambient space. Because of this, the security eventually relies on finding short vectors in a high dimensional lattice, which is a computationally hard problem. However, in the case of the AMG PIR scheme, the problem breaks down over multiple small dimensional lattices and hence the attack becomes feasible. In the case of LWE-based PIR schemes, on the other hand, this new perspective may have potential for introducing new approaches to their security analysis.
In order to construct an additive PIR scheme, one may investigate the cases of structured morphisms like ring homomorphisms and module homomorphisms, or the cases of unstructured morphisms like the functions used in AMG scheme and LWE-based schemes.
Furthermore, if one constructs an additive PIR scheme independently, then it would be worth translating the scheme in terms of the code-based framework to check for possible security issues.