The stochastic geometry of unconstrained one-bit data compression

A stationary stochastic geometric model is proposed for analyzing the data compression method used in one-bit compressed sensing. The data set is an unconstrained stationary set, for instance all of $\mathbb{R}^n$ or a stationary Poisson point process in $\mathbb{R}^n$. It is compressed using a stationary and isotropic Poisson hyperplane tessellation, assumed independent of the data. That is, each data point is compressed using one bit with respect to each hyperplane, which is the side of the hyperplane it lies on. This model allows one to determine how the intensity of the hyperplanes must scale with the dimension $n$ to ensure sufficient separation of different data by the hyperplanes as well as sufficient proximity of the data compressed together. The results have direct implications in compressive sensing and in source coding.


Introduction and Motivations
One-bit compressed sensing is a method of signal recovery from a sequence of measurements contained in $\{-1, 1\}$. More specifically, one aims to recover the signal $x \in \mathbb{R}^n$ from measurements of the form $$y_i = \mathrm{sign}(\langle x, u_i \rangle - t_i),$$ where the $u_i$ are independent vectors in $\mathbb{R}^n$ and the $t_i$ are random displacements in $\mathbb{R}$. One can interpret this problem geometrically: each pair $(u_i, t_i)$ defines a unique affine hyperplane in $\mathbb{R}^n$ with normal vector $u_i$ at distance $t_i$ from the origin. The measurement $y_i \in \{-1, 1\}$ then indicates which side of the hyperplane the signal $x$ lies on. This collection of hyperplanes tessellates the space of signals into convex cells. Two signals contained in the same cell have the same set of one-bit measurements $\{y_i\}$. The quality of this compression can be measured in a few different ways. For instance, one can measure how likely it is that two different signals are compressed differently, i.e., lie in different cells of the tessellation. As in one-bit compressed sensing, the quality can also be measured by the error in signal recovery, which is guaranteed to be small if the hyperplanes tessellate the signal space into cells small enough that all signals within a single cell are close in Euclidean distance.
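The geometric reading of the measurements can be illustrated with a minimal sketch. It assumes the sign convention $y_i = \mathrm{sign}(\langle x, u_i\rangle - t_i)$, and it draws Gaussian normals and uniform offsets purely for illustration (this is not the stationary Poisson hyperplane process studied in the paper). The names `sample_hyperplanes` and `one_bit_code` are hypothetical.

```python
import random

def sample_hyperplanes(m, n, t_scale, rng):
    """Draw m hyperplanes (u_i, t_i). Gaussian normals and uniform offsets
    are an illustrative assumption, not the model of the paper."""
    planes = []
    for _ in range(m):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t = rng.uniform(-t_scale, t_scale)
        planes.append((u, t))
    return planes

def one_bit_code(x, planes):
    """One bit per hyperplane: +1 if <x, u_i> - t_i >= 0, else -1."""
    code = []
    for u, t in planes:
        s = sum(ui * xi for ui, xi in zip(u, x)) - t
        code.append(1 if s >= 0 else -1)
    return tuple(code)

rng = random.Random(0)
planes = sample_hyperplanes(m=50, n=3, t_scale=2.0, rng=rng)
x = [0.3, -0.1, 0.2]
x_far = [5.0, 4.0, -3.0]
code_x = one_bit_code(x, planes)
code_far = one_bit_code(x_far, planes)
# Number of hyperplanes separating x from x_far (Hamming distance of the codes).
separating = sum(1 for a, b in zip(code_x, code_far) if a != b)
print(separating)
```

Two points receive the same code exactly when no hyperplane separates them, that is, when they lie in the same cell of the tessellation.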
Previous work ([4], [15], [19]) has examined this problem when it is known that the signal lies in some bounded set $K \subset \mathbb{R}^n$. In this paper, we consider the data set to be either all of $\mathbb{R}^n$ or an unbounded discrete subset of $\mathbb{R}^n$ modeled with a stationary Poisson point process. The assumption that the data is Poisson provides a worst-case scenario, since any dependence between the underlying points increases one's ability to compress the data in such a way that the signals can be recovered with small error. The set of random hyperplanes used to obtain the one-bit measurements is given by a stationary and isotropic Poisson hyperplane process. The reasons for this choice are discussed at the end of the paper (see Subsection 6.3), the key reason being that it leads to the least volume of data compressed with a typical data point among a wide collection of hyperplane models.
The first and second author were supported by a grant of the Simons Foundation (#197982 to UT Austin) and the second author was supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1110007.
As already explained, the aim is to find the minimum intensity of the hyperplane process at some scaling with the space dimension n such that different data will be separated by hyperplanes with high probability, and also for data compressed in the same way to be close with high probability. Under the assumption of stationarity, we can ask for, in some sense, a "typical" instance to satisfy the desired property. To address the "typicality", there are two viewpoints to take. One is from the view of a typical data point, and in the stationary regime, we can consider its location to be at the origin. The cell of the tessellation that the typical signal is contained in is then the so-called zero cell [9], also referred to as the Crofton cell. The other viewpoint is to ask that a typical cell satisfy some property, e.g., to have small diameter. The typical cell of a stationary Poisson hyperplane tessellation can be interpreted as the distribution of the cell obtained when taking a large ball centered at the origin, and picking a cell intersecting that ball uniformly at random. The zero cell is larger in mean than the typical cell, as there is bias towards larger cells when asking that it contain the origin. The viewpoint of a typical signal and its cell, the zero cell, seems a more natural viewpoint to take here, and will be the main focus of this paper, although some results are also derived on the typical cell for comparison.
To summarize the results, consider a sequence of compressions indexed by dimension, i.e., for each $n$, let $X_n$ be a stationary and isotropic Poisson hyperplane tessellation in $\mathbb{R}^n$ with intensity $\gamma_n$ that is used to compress the underlying data. We let $\gamma_n \sim \rho n^{\alpha}$ as $n \to \infty$ and discuss the values of $\alpha$ for which a good separation or low distortion of the data can be achieved with high probability by the hyperplanes when $n$ is large. Several criteria of good separation and low distortion are discussed. By good separation, we mean a property that connects differences between data and differences between their encodings. By low distortion, we mean a property that connects closeness of data and similarity of their encodings. The results are summarized below for the case where the data are the whole of $\mathbb{R}^n$.
The first separation criterion discussed is that the distance to the nearest data that is compressed differently from the typical data (i.e., the closest point of the Euclidean space which is not in the zero cell) be small. It is shown that as long as α > 0, this distance tends to zero in distribution as n tends to infinity.
The second separation criterion considered is that some transformation of the typical signal is compressed differently than the typical signal with high probability. We discuss two types of transformations: (i) a Gaussian displacement with fixed variance $\sigma^2$ per dimension (which is the least demanding of the criteria discussed here), and (ii) a displacement at a fixed distance $\sigma$ away and in a random direction. For case (i), we show that, for $\alpha = 0$, the displaced signal is compressed in the same way as the typical signal with a probability decreasing exponentially with $\rho$. We also show that the same holds in case (ii) provided $\alpha = 1/2$.
The first low distortion criterion is the requirement that the volume of other data compressed with a typical data point be small. The hyperplane intensities discussed above are not large enough for this to hold. While data in most directions will be separated from the typical data, there is a set of directions of decreasing measure in which the compression will remain identical, and in high dimension, this is where most of the volume of data compressed like the typical signal lies. Considering this low distortion criterion, we show that, for $\alpha = 1$, there is a threshold for $\rho$ above which the expected value of the volume in question goes to zero and below which it approaches infinity.
A small volume still does not ensure that all data compressed together are close in Euclidean distance. This motivates the discussion of a second low distortion criterion. In the case where the data is the whole Euclidean space, the requirement is that the point which is the farthest away from the typical data and encoded in the same way be within some distance $R$. It is shown that if we increase $\alpha$ to $3/2$, then there exists a value for $\rho$ above which the probability of this event approaches one as the dimension $n$ tends to infinity. A similar criterion for the case when the data is modeled with a Poisson point process is also discussed.
Some of these scalings can be significantly decreased if it is known that the data are 'sparse', namely lie within a lower dimensional subspace of R n . In Section 5, we show how this affects the intensity of hyperplanes needed for the above low distortion criteria.
The results have several implications in compressed sensing and in source coding. These are discussed in Subsections 6.1 and 6.2 at the end of the paper.

Preliminaries and Notation
First we define the notation for the classical objects used in the present paper. Let $B_n(r)$ denote the ball of radius $r$ centered at the origin in $\mathbb{R}^n$. The usual $\ell_2$ norm of a vector is denoted by $|\cdot|$, and the $n$-dimensional volume of a set $K \subset \mathbb{R}^n$ by $V_n(K)$. The volume of the $n$-dimensional unit ball $B_n(1)$ is denoted by $\kappa_n$ and the surface area of the unit sphere $S^{n-1}$ is denoted by $\omega_n$. They satisfy $$\kappa_n = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2}+1\right)}, \qquad \omega_n = n\kappa_n = \frac{2\pi^{n/2}}{\Gamma\left(\frac{n}{2}\right)}.$$
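Also recall the following special functions. The gamma function is defined as $$\Gamma(x) := \int_0^\infty t^{x-1} e^{-t}\, dt,$$ and the upper and lower regularized incomplete gamma functions are defined for all $R \geq 0$ by $$\Gamma_u(x, R) := \frac{1}{\Gamma(x)} \int_R^\infty t^{x-1} e^{-t}\, dt \quad \text{and} \quad \Gamma_\ell(x, R) := \frac{1}{\Gamma(x)} \int_0^R t^{x-1} e^{-t}\, dt,$$ respectively. Stirling's formula gives the following asymptotic expansion as $x \to \infty$: $$\Gamma(x+1) \sim \sqrt{2\pi x}\left(\frac{x}{e}\right)^x. \tag{1}$$ The asymptotic formulas obtained from (1) as $n \to \infty$ will be used throughout; they are referred to as (2).
Denote by $\mathcal{F}$, $\mathcal{C}$ the sets of closed and compact subsets of $\mathbb{R}^n$, respectively. For $A \subset \mathbb{R}^n$, define $$\mathcal{F}_A := \{F \in \mathcal{F} : F \cap A \neq \emptyset\}, \qquad \mathcal{F}^A := \{F \in \mathcal{F} : F \cap A = \emptyset\}.$$ The σ-algebra $\mathcal{B}(\mathcal{F})$ of Borel sets of $\mathcal{F}$ is generated by either of the systems $\{\mathcal{F}_C : C \in \mathcal{C}\}$ and $\{\mathcal{F}^C : C \in \mathcal{C}\}$ (see Lemma 2.1.1 in [21]). Denote the set of $(n-1)$-dimensional hyperplanes in $\mathbb{R}^n$ by $\mathcal{H}_n$ and the Grassmannian of $(n-1)$-dimensional linear subspaces of $\mathbb{R}^n$ by $G(n, n-1)$. The set $G(n, n-1)$ is the subset of hyperplanes in $\mathcal{H}_n$ that pass through the origin.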
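The kind of asymptotics that Stirling's formula yields for $\kappa_n$ and $\omega_{n+1}/\omega_n$ can be checked numerically. The sketch below compares the exact values with the candidate approximations $\kappa_n \approx (\pi n)^{-1/2}(2\pi e/n)^{n/2}$ and $\omega_{n+1}/\omega_n \approx \sqrt{2\pi/n}$. These two approximations follow from Stirling's formula, but whether they coincide with the displays labeled (2) in the paper is an assumption, since those displays are not recoverable here. The function names are hypothetical.

```python
import math

def log_kappa(n):
    """log of the unit-ball volume: kappa_n = pi^(n/2) / Gamma(n/2 + 1)."""
    return 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)

def log_omega(n):
    """log of the sphere area: omega_n = 2 * pi^(n/2) / Gamma(n/2)."""
    return math.log(2.0) + 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n)

def omega_ratio(n):
    """Exact value of omega_{n+1} / omega_n."""
    return math.exp(log_omega(n + 1) - log_omega(n))

def log_kappa_stirling(n):
    """Stirling-based approximation of log(kappa_n)."""
    return 0.5 * n * math.log(2.0 * math.pi * math.e / n) - 0.5 * math.log(math.pi * n)

for n in (10, 100, 1000):
    ratio_exact = omega_ratio(n)
    ratio_approx = math.sqrt(2.0 * math.pi / n)
    print(n, ratio_exact / ratio_approx, log_kappa(n) - log_kappa_stirling(n))
```

Working with logarithms through `math.lgamma` avoids overflow of the gamma function in high dimension.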

Poisson Hyperplane Tessellations.
A hyperplane process X in R n is a random counting measure on the space H n . The process X is stationary if its distribution is invariant under translations and it is isotropic if its distribution is invariant under rotations about the origin.
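The intensity measure of $X$ is defined as $\Theta(\cdot) := E[X(\cdot)]$. The following theorem (see, e.g., [21]) provides a decomposition for the intensity measure for all stationary hyperplane processes. Note that elements of the space $\mathcal{H}_n$ are of the form $$H(u, \tau) := \{x \in \mathbb{R}^n : \langle x, u \rangle = \tau\},$$ where $u \in \mathbb{R}^n$ and $\tau \in \mathbb{R}$.
Theorem 2.1. Let $X$ be a stationary hyperplane process in $\mathbb{R}^n$ with intensity measure $\Theta \not\equiv 0$. Then, there is a unique number $\gamma \in (0, \infty)$ and probability measure $Q$ on $G(n, n-1)$ such that for all nonnegative measurable functions $f$ on $\mathcal{H}_n$, $$\int_{\mathcal{H}_n} f \, d\Theta = \gamma \int_{S^{n-1}} \int_{-\infty}^{\infty} f(H(u, \tau)) \, d\tau \, \varphi(du),$$ where for $A \in \mathcal{B}(S^{n-1})$, $\varphi(A) := \frac{1}{2} Q(\{u^\perp : u \in A\})$. The measure $\varphi$ is called the spherical directional distribution. In particular, for $A \in \mathcal{B}(\mathcal{H}_n)$, $$\Theta(A) = \gamma \int_{S^{n-1}} \int_{-\infty}^{\infty} 1_A(H(u, \tau)) \, d\tau \, \varphi(du).$$ The parameter $\gamma$ is called the intensity and $Q$ the directional distribution of $X$. If $X$ is isotropic, then $Q$ is rotationally invariant and thus is the Haar measure $\nu_{n-1}$, and $\varphi = \sigma$, the normalized spherical Lebesgue measure on $S^{n-1}$.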
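The hyperplane process $X$ with intensity measure $\Theta$ is Poisson if for all disjoint Borel sets $A_1, \ldots, A_k \in \mathcal{B}(\mathcal{H}_n)$, the random variables $X(A_1), \ldots, X(A_k)$ are independent and Poisson distributed with means $\Theta(A_1), \ldots, \Theta(A_k)$.
2.2. Zero cell. A hyperplane process $X$ in $\mathbb{R}^n$ induces a random tessellation of $\mathbb{R}^n$. The zero cell, or Crofton cell, of this tessellation, denoted $Z_0$, is the cell of this tessellation containing the origin.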
The following result (see Theorem 10.4.9 in [21]) states that for stationary Poisson hyperplane processes, isotropic hyperplanes minimize the expected volume of the zero cell over all spherical directional distributions. This result helps to justify considering the class of isotropic Poisson hyperplanes to tessellate the space, since cells of smaller volume may lead to a more efficient compression.
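Theorem 2.2. Let $X$ be a nondegenerate stationary Poisson hyperplane process in $\mathbb{R}^n$ of intensity $\gamma$, and let $Z_0$ be the zero cell of the induced hyperplane tessellation. Then, $$E[V_n(Z_0)] \geq n!\, \kappa_n \left(\frac{2\kappa_{n-1}}{n\kappa_n}\, \gamma\right)^{-n},$$ with equality if and only if $X$ is isotropic.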
As mentioned in the introduction, a small volume is not sufficient to ensure that two data points that have the same compression are close together. This requires the cell containing the points to have small diameter, but this is a difficult quantity to study. A related quantity is the radius of the smallest ball centered at the origin that contains the cell $C$, i.e., the quantity $$R_M(C) := \inf\{r > 0 : C \subset B_n(r)\}.$$ The distribution of $R_M(Z_0)$ is described in [6]. It is based on the observation that if $R_M \geq r$, then the sphere of radius $r$ centered at the origin is not covered by the random arcs generated by the hyperplanes that compose the faces of $Z_0$, i.e., $rS^{n-1} \cap \mathrm{int}(Z_0) \neq \emptyset$. Since the directional distribution of $X$ is just the Haar measure on $S^{n-1}$, the probability that $R_M \geq r$ is the probability that $S^{n-1}$ is not covered by a Poisson number $N$ of independent spherical caps, with angular radii divided by $\pi$ distributed as $d\nu(\theta) = \pi \sin(\pi\theta) 1_{[0,1/2]}(\theta)\, d\theta$. Unfortunately, no explicit formula for this probability is known beyond dimension two.
2.3. Typical cell. Since larger cells are more likely to contain the origin, the zero cell is not a good measure of the average or "typical" cell. We can instead consider a large compact set, pick a cell in it uniformly at random, and translate it in some appropriate way so that it contains the origin. This more accurately represents the average distribution of the cells induced by the hyperplane process. Formally, we define the typical cell as follows. Let $c : \mathcal{C}' \to \mathbb{R}^n$ be a center function, that is, a measurable map which is compatible with translations, i.e., $c(C + x) = c(C) + x$ for all $x \in \mathbb{R}^n$. For a hyperplane process $X$, let $\widehat{X}$ denote the induced random mosaic, that is, the collection of cells of the induced tessellation.
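Definition 2.1. The typical cell $Z$ of a hyperplane process $X$ is the random polytope with distribution $$P(Z \in A) := \frac{1}{\lambda V_n(B)}\, E\left[\sum_{C \in \widehat{X}} 1_A(C - c(C))\, 1_B(c(C))\right],$$ where $B \in \mathcal{B}(\mathbb{R}^n)$ is an arbitrary bounded Borel set with positive volume, and $\lambda$ is the cell intensity of $\widehat{X}$, the induced random mosaic. Also, this distribution has the ergodic interpretation $$P(Z \in A) = \lim_{r \to \infty} \frac{\sum_{C \in \widehat{X}} 1_A(C - c(C))\, 1_{B_n(r)}(c(C))}{\sum_{C \in \widehat{X}} 1_{B_n(r)}(c(C))}, \quad \text{a.s.}$$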
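The cell intensity $\lambda$ of the induced random mosaic $\widehat{X}$ of a stationary and isotropic Poisson hyperplane process $X$ in $\mathbb{R}^n$ is related to the intensity $\gamma$ of $X$ in the following way: $$\lambda = \kappa_n \left(\frac{\kappa_{n-1}}{n\kappa_n}\, \gamma\right)^n.$$ Let $Z$ denote the typical cell of $X$. It is known that (see, e.g., [21, (10.4) and (10.46)]) $$E[V_n(Z)] = \frac{1}{\lambda} = \frac{1}{\kappa_n} \left(\frac{n\kappa_n}{\kappa_{n-1}\gamma}\right)^n.$$
Remark 2.1. Consider a sequence of hyperplane tessellations $X_n$ in increasing dimensions $\mathbb{R}^n$ with intensity $\gamma_n$ and cell intensity $\lambda_n$. The scaling $\lambda_n \sim e^{n\lambda}$ as $n \to \infty$ corresponds to $\gamma_n \sim \rho n$ as $n \to \infty$. This exponential scaling with dimension for the point process of cell centroids matches the so-called Shannon regime studied in [2], and leads to a linear scaling of the hyperplane intensity with dimension.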
The inradius $r_{in}$ of a cell is the radius of the largest ball completely contained in the cell. The following result gives the distribution of the inradius of the typical cell.
Theorem 2.3. (see [21]) Let $X$ be a nondegenerate stationary Poisson hyperplane process in $\mathbb{R}^n$ with intensity $\gamma$. Let $Z$ be the typical cell. Then, $$P(r_{in}(Z) > a) = e^{-2\gamma a}, \quad a \geq 0.$$
2.4. Palm Distribution. Throughout this paper, when the underlying data is assumed to be discrete, it is modeled by a stationary Poisson point process N with intensity λ. Since this is an unbounded collection of data, we need some way of examining a typical data point and the cell of the tessellation that contains it.
To do this, we use the Palm probability measure of $N$, denoted by $P^0_N$, which is defined as follows. Let $(\Omega, \mathcal{A}, \{\theta_t\}_{t \in \mathbb{R}^n}, P)$ be a stationary framework and $N$ a random measure compatible with the flow $\{\theta_t\}_{t \in \mathbb{R}^n}$, implying $N$ is stationary. The Palm probability associated with $N$, denoted $P^0_N$, is defined on $(\Omega, \mathcal{A})$ by $$P^0_N(A) := \frac{1}{\lambda}\, E\left[\int_B 1_A \circ \theta_t \, N(dt)\right]$$ for any bounded Borel set $B$ with volume one. The Palm probability $P^0_N$ can be thought of as the distribution conditioned on $N$ having a point at 0. Thus, to talk about the cell of a typical data point, we condition on a point being at 0, and examine the cell of the tessellation it is contained in, i.e., the zero cell. There is also the following ergodic interpretation of the Palm probability. By Birkhoff's pointwise ergodic theorem, for all convex averaging sequences $\{K_m\}_{m \geq 1}$ in $\mathbb{R}^n$, and all $f : \Omega \to \mathbb{R}_+$ measurable and in $L^1(P^0_N)$, $$\frac{1}{N(K_m)} \int_{K_m} f \circ \theta_t \, N(dt) \to E^0_N[f]$$ as $m \to \infty$, $P$-a.s.

Thus, we can think of the Palm probability as the empirical average over all the points in a very large ball. The reduced Palm probability measure of $N$, denoted $P^{0,!}_N$, is defined as the distribution of $N - \delta_0$ under $P^0_N$, that is, the Palm measure with the point at 0 removed. An important result called Slivnyak's theorem states that a Poisson point process has the same distribution as its reduced Palm distribution, i.e., $P^{0,!}_N = P_N$. The distribution of the typical cell of a stationary tessellation can also be thought of as the zero cell of the tessellation under the Palm measure of the point process of cell centers. That is, its distribution is that of the cell containing the origin, conditioned on a cell of the tessellation having its center at the origin.

Results
In this section, for each n, let X n be a stationary and isotropic Poisson hyperplane process in R n with intensity γ n representing the compression scheme (note that the Poisson assumption implies that the compression scheme is characterized by a single parameter γ n > 0, for all dimensions n). The zero cell of the tessellation is denoted Z 0,n and the typical cell is denoted Z n . In the case where the underlying data is discrete, N n is a stationary Poisson point process with intensity λ n lying in R n and independent of X n , representing the data. The Palm probability of N n is denoted by P 0 n . As explained in the introduction, the goal is to find the minimum intensity γ n needed to separate or minimize the distortion of the data R n or N n with high probability according to various criteria listed there.
3.1. Distance from typical data to nearest data compressed differently. Given a typical data point, we first ask how far away the closest data is that is compressed differently in any direction. When the data is all of $\mathbb{R}^n$, this is the distance to the nearest separating hyperplane in any direction. To find the distribution of this distance, notice that if no hyperplane hits the ball of radius $r$ centered on the typical data, then this distance is greater than $r$. Writing $D_n(r)$ for the probability of this event, this is the spherical contact distribution [21].
Proposition 3.1. Assume $\gamma_n \to \infty$ as $n \to \infty$, for example $\gamma_n \sim \rho n^{\alpha}$ as $n \to \infty$ for any $\alpha > 0$. Then, for fixed $r > 0$, $\lim_{n \to \infty} D_n(r) = 0$.
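Proof. By the fact that $X_n$ is Poisson, the number of hyperplanes hitting $B_n(r)$ is a Poisson random variable with mean $2\gamma_n r$, so the probability that no hyperplane hits the ball is $e^{-2\gamma_n r}$, which tends to zero as $\gamma_n \to \infty$.
Another viewpoint to take is the distance to the nearest data compressed differently from the center of a typical cell of the tessellation, where the center is considered to be the center of the largest ball completely contained in the cell. This is equivalent to asking for the distribution of the inradius of the typical cell. Theorem 2.3 implies the following.
Proposition 3.2. Assume $\gamma_n \to \infty$ as $n \to \infty$, for example $\gamma_n \sim \rho n^{\alpha}$ for any $\alpha > 0$. Then, for fixed $r > 0$, $\lim_{n \to \infty} P(r_{in}(Z_n) > r) = 0$.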
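The decay in Propositions 3.1 and 3.2 can be tabulated with a minimal sketch. It assumes the normalization under which the number of hyperplanes hitting a ball of radius $r$ is Poisson with mean $2\gamma r$, so that both the spherical contact probability and the inradius tail equal $e^{-2\gamma r}$. The function names are hypothetical.

```python
import math

def intensity(n, rho, alpha):
    """Hyperplane intensity gamma_n = rho * n^alpha."""
    return rho * n ** alpha

def prob_no_hyperplane_within(r, gamma):
    """Probability that no hyperplane hits the ball of radius r,
    assuming the hit count is Poisson with mean 2 * gamma * r."""
    return math.exp(-2.0 * gamma * r)

r = 0.1
for n in (1, 10, 100, 1000):
    gamma_n = intensity(n, rho=1.0, alpha=0.5)
    print(n, prob_no_hyperplane_within(r, gamma_n))
```

For any fixed $r > 0$ and any $\alpha > 0$ the printed probabilities decrease to zero, whereas with $\alpha = 0$ they stay constant in $n$.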
3.2. Separation of two different data. The next criterion for separation is the probability that two different data points, one obtained by some given transformation of the other, are compressed differently, i.e., the probability that there is at least one hyperplane separating them. First, consider the case where the transformation is a random displacement by an i.i.d. Gaussian with mean zero and variance σ 2 per dimension.
Then, since $X$ is Poisson, by (7), By (8), By the strong law of large numbers, $|Y_n|^2/n \to \sigma^2$ a.s., and by (2), as $n \to \infty$, Then, as $n \to \infty$,
Next, consider the case where the displacement is uniformly chosen on the sphere of fixed radius $\delta$. By the fact that the tessellation is isotropic, this is equivalent to looking at the linear contact distribution for any fixed direction $u \in S^{n-1}$ at distance $\delta$:
Proposition 3.4. For each $n$, let $Y_{n,\delta}$ be a uniformly chosen random point on the sphere of radius $\delta$ in $\mathbb{R}^n$. Under the same assumptions as in Proposition 3.3, Then, by the asymptotic formula (9), as $n \to \infty$, By continuity, the conclusion holds.
Note that a scaling of $\gamma_n$ greater than $n^{1/2}$ (resp. more than a constant) is needed for this last separation criterion (resp. that of the Gaussian displacement) to hold as the dimension increases. This is less than what is needed for the expected volume $E[V_n(Z_{0,n})]$ to be small, as seen in the next section. This indicates that in high dimensions, most of the volume of the cell is concentrated in a set of directions with very small measure.
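The two scalings in this note can be explored numerically. The sketch below assumes that, for a stationary and isotropic Poisson hyperplane process of intensity $\gamma$, the number of hyperplanes separating two points at distance $d$ is Poisson with mean $\gamma d\, \omega_{n+1}/(\pi \omega_n)$. This is the standard Crofton-type computation, stated here as an assumption because the displayed formulas of Propositions 3.3 and 3.4 are not available above. The function names are hypothetical.

```python
import math

def mean_abs_projection(n):
    """omega_{n+1} / (pi * omega_n) = Gamma(n/2) / (sqrt(pi) * Gamma((n+1)/2)),
    the mean of |<u, e>| for u uniform on the unit sphere of R^n."""
    log_ratio = math.lgamma(0.5 * n) - math.lgamma(0.5 * (n + 1))
    return math.exp(log_ratio) / math.sqrt(math.pi)

def prob_same_code(dist, gamma, n):
    """Probability that no hyperplane separates two points at distance dist."""
    return math.exp(-gamma * dist * mean_abs_projection(n))

rho, delta = 1.0, 1.0
for n in (10, 100, 10000):
    fixed_dist = prob_same_code(delta, rho * math.sqrt(n), n)  # alpha = 1/2
    gauss_like = prob_same_code(delta * math.sqrt(n), rho, n)  # alpha = 0, |Y_n| ~ sigma * sqrt(n)
    print(n, fixed_dist, gauss_like)

limit = math.exp(-rho * delta * math.sqrt(2.0 / math.pi))
print(limit)
```

Under these assumptions both columns approach $\exp(-\rho\delta\sqrt{2/\pi})$, so the probability of identical encoding is bounded away from zero at exactly these scalings and vanishes only for faster growth of $\gamma_n$.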
3.3. Volume of data compressed together. This section is focused on the asymptotic behavior as $n$ goes to infinity of the volume of the data that is compressed together in a cell of the tessellation. The requirement that this volume tends to zero is a first low distortion criterion. One viewpoint is to examine the volume of data in the cell containing a typical data point. When the data is all of $\mathbb{R}^n$, this is just the volume of $Z_{0,n}$. This quantity has been studied in [12] and [11]. The expected value is $$E[V_n(Z_{0,n})] = n!\, \kappa_n \left(\frac{2\kappa_{n-1}}{n\kappa_n}\, \gamma_n\right)^{-n}.$$ From [11], the following bounds on higher moments of $V_n(Z_{0,n})$ are obtained: A corollary in [11] shows there exist constants $c$ and $C$, not depending on $n$ or $\gamma$, such that The authors note that if $\gamma$ scales with $n$ in such a way that $E[V_n(Z_{0,n})] = 1$ for all $n$, the lower bound implies that the variance of $V_n(Z_{0,n})$ approaches infinity as the dimension $n$ increases, which contrasts with the behavior seen in the typical cell of the Poisson-Voronoi tessellation, where the variance converges to zero, see [1].
By the asymptotic formulas (2) and the above results, we obtain the following limiting behavior as dimension goes to infinity. Proposition 3.5. Let γ n ∼ ρn as n → ∞ for some ρ > 0. Then, In addition, Proof. By (2), as n → ∞, Thus, by (10), under the assumption γ n ∼ ρn, we have the following limiting behavior: This implies the last statement.
Another viewpoint is to consider the volume of the typical cell $Z_n$ of the tessellation. This measures the volume of a typical collection of data that is compressed together.
Proposition 3.6. If $\gamma_n \sim \rho n$ for some $\rho > 0$ as $n \to \infty$, then In addition,
Proof. By (6), the expected value of the volume is Then, by (2), as $n \to \infty$, Thus, assuming $\gamma_n \sim \rho n$ as $n \to \infty$ for $\rho > 0$, The right hand side is positive if $\rho < e^{-1/2}$ and negative if $\rho > e^{-1/2}$, which implies the last statement of the proposition.
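The threshold behavior in $\rho$ can be observed numerically with a minimal sketch. It assumes the closed forms $E[V_n(Z_{0,n})] = n!\,\kappa_n (c_n \gamma_n)^{-n}$ and $E[V_n(Z_n)] = \kappa_n^{-1} (c_n \gamma_n / 2)^{-n}$ with $c_n = 2\kappa_{n-1}/(n\kappa_n)$, which are the standard expressions for the isotropic case. Under these assumptions the per-dimension logarithms converge to $\ln \pi - 1/2 - \ln \rho$ and $-1/2 - \ln \rho$ when $\gamma_n = \rho n$. The first limit is derived here rather than quoted from the paper. The function names are hypothetical.

```python
import math

def log_kappa(n):
    """log of the unit-ball volume kappa_n."""
    return 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)

def log_c(n):
    """log of c_n = 2 * kappa_{n-1} / (n * kappa_n)."""
    return math.log(2.0) + log_kappa(n - 1) - math.log(n) - log_kappa(n)

def log_mean_volume_zero_cell(n, gamma):
    """log E[V_n(Z_0)] under the assumed isotropic formula."""
    return math.lgamma(n + 1.0) + log_kappa(n) - n * (math.log(gamma) + log_c(n))

def log_mean_volume_typical_cell(n, gamma):
    """log E[V_n(Z)] = -log(cell intensity) under the assumed isotropic formula."""
    return -log_kappa(n) - n * (math.log(gamma) + log_c(n) - math.log(2.0))

n = 2000
for rho in (0.5, 1.0, 1.5, 2.5):
    zero = log_mean_volume_zero_cell(n, rho * n) / n
    typical = log_mean_volume_typical_cell(n, rho * n) / n
    print(rho, zero, typical)
```

A positive per-dimension value means the expected volume grows exponentially with $n$, a negative one that it vanishes exponentially. The sign change for the typical cell occurs near $\rho = e^{-1/2}$, in line with the proof above.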
When the data set is (the support of) a stationary Poisson point process, the volume of the zero cell has to be replaced by the number of points of N n that lie in Z 0,n . A similar threshold exists for the expected amount of data in Z 0,n , but it depends on the intensity of N n . This then implies that for ρ big enough, the probability that there is another data point in the cell of a typical data is small, meaning that with high probability, the cell of the tessellation determines the data uniquely.

3.4. Farthest distance between two data points compressed together. Another and more demanding low distortion criterion is that all the data compressed together be close in Euclidean distance. Consider first the case when the data is all of $\mathbb{R}^n$. We want to find the scaling necessary for $\gamma_n$ to ensure that all data points in the zero cell are within some distance from the typical data point at the origin. This is equivalent to showing that the radius of the smallest ball centered at the origin that contains all of the zero cell is small. As mentioned in Section 2.2, a closed form for the distribution of this radius $R_M$ is only known in dimension two, but we can obtain bounds that give the following asymptotic behavior.
Theorem 3.1. Assume $\gamma_n \sim \rho n^{\alpha}$ as $n \to \infty$ and let $R > 0$. Then, there exists
Before proving the theorem, we need the following. Define the beta prime density with parameters $n \in \mathbb{N}$ and $\sigma > 0$ as follows: . Let $X_1, \ldots, X_m$ be i.i.d. random vectors in $\mathbb{R}^n$ with density $f_{n,\sigma}$ and let $P^{\sigma}_{m,n}$ denote the convex hull of these points. Also, define $A := A(X_1, \ldots, X_n)$ to be the $(n-1)$-dimensional affine subspace containing the points $X_1, \ldots, X_n$, and let $h(A)$ be the signed distance from the origin to the subspace $A$. The following lemma gives the probability that the points $X_1, \ldots, X_n$ form a face of $P^{\sigma}_{m,n}$.
Lemma 3.1.
Proof. Let $\pi_{A^\perp}$ be the projection from $\mathbb{R}^n$ to the 1-dimensional subspace $A^\perp$ and define the isometry $I_{A^\perp} : A^\perp \to \mathbb{R}$ such that $I_{A^\perp}(0) = 0$. By Lemma 3.1 in [14], if $X$ has density $f_{n,\sigma}$, then $I_{A^\perp}(\pi_{A^\perp}(X))$ has density This was stated with $\sigma = 1$ in the reference, but if $X$ has density $f_{n,\sigma}$, then $X/\sigma$ has density $f_{n,1}$, and the more general statement follows from a change of variables, since $I_{A^\perp}(\pi_{A^\perp}(X/\sigma)) = I_{A^\perp}(\pi_{A^\perp}(X))/\sigma$. Also, by Corollary 3.6 in [14], if $X_1, \ldots, X_n$ have the beta prime density $f_{n,1}$, then $h^2(A)/\sigma^2$ has density By a change of variables, Hence, the distribution of $|h(A)|$ has density Then, by the fact that $[X_1, \ldots, X_n]$ is a facet of $P^{\sigma}_{m,n}$ if and only if where the last inequality follows from the fact that the densities are symmetric. Hence,
We can now prove Theorem 3.1.
Let $\Pi_n(\sigma)$ be a Poisson point process on $\mathbb{R}^n \setminus \{0\}$ with intensity measure $\nu$. Then, (16) implies the following generalization of (4.6) in [13]: As $m \to \infty$, where $X_1, \ldots, X_m$ are i.i.d. random vectors in $\mathbb{R}^n$ with density $f_{n,\sigma}$. Now, let $P^{\sigma}_{m,n}$ be the convex hull of $X_1, \ldots, X_m$. The convergence (18) implies that with $C(P)$ denoting the convex hull of the points in the set $P$. Now, by the same argument as in the proof of Theorem 1.21 of [14], the convex dual of $C(\Pi_n(\sigma))$ has the same distribution as the zero cell $Z_{0,n}$ of a stationary and isotropic hyperplane tessellation with intensity $\gamma_n = \sigma\omega_n/\omega_{n+1}$. Hence, the distances to the faces of the convex hull of $\Pi_n(\sigma)$ are the reciprocals of the distances to the vertices of $Z_{0,n}$. This gives where the last equality follows from Lemma 3.1. By the same arguments as in Lemma 4.9 in [14], as $m \to \infty$, Then, since $\binom{m}{n} \sim \frac{m^n}{n!}$ as $m \to \infty$, Let $\gamma_n = \sigma\omega_n/\omega_{n+1}$, i.e., let $\sigma = \gamma_n\omega_{n+1}/\omega_n$. Then, $$E[\#\text{ of vertices farther than } r \text{ in } Z_{0,n}] = \frac{\Gamma\left(\frac{n+1}{2}\right)\pi^n}{\sqrt{\pi}\,\Gamma\left(\frac{n}{2}+1\right)}\, \Gamma_u\left(n, \frac{\gamma_n\omega_{n+1}}{\pi\omega_n}\, r\right).$$
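The expected number of far vertices can be evaluated directly, which gives a feel for the Markov bound used next. The sketch below implements the formula as reconstructed above, with $\Gamma_u$ taken to be the regularized upper incomplete gamma function and computed through its closed form for integer first argument. The function names are hypothetical, and the direct summation is only meant for moderate $n$ and moderate arguments.

```python
import math

def mean_abs_projection(n):
    """omega_{n+1} / (pi * omega_n), computed via log-gamma."""
    log_ratio = math.lgamma(0.5 * n) - math.lgamma(0.5 * (n + 1))
    return math.exp(log_ratio) / math.sqrt(math.pi)

def reg_upper_gamma_int(n, x):
    """Regularized upper incomplete gamma for integer n >= 1:
    exp(-x) * sum_{k=0}^{n-1} x^k / k!."""
    term = 1.0
    total = 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return math.exp(-x) * total

def expected_vertices_beyond(n, gamma, r):
    """Expected number of vertices of the zero cell at distance more than r."""
    log_total = (math.lgamma(0.5 * (n + 1)) + n * math.log(math.pi)
                 - 0.5 * math.log(math.pi) - math.lgamma(0.5 * n + 1.0))
    x = gamma * mean_abs_projection(n) * r
    return math.exp(log_total) * reg_upper_gamma_int(n, x)

n, R = 20, 1.0
for rho in (0.5, 2.0, 5.0):
    gamma_n = rho * n ** 1.5
    print(rho, expected_vertices_beyond(n, gamma_n, R))
```

Since the farthest point of a bounded polytope from the origin is a vertex, the event $R_M(Z_{0,n}) > r$ forces at least one vertex beyond distance $r$, so by Markov's inequality this expectation is an upper bound on $P(R_M(Z_{0,n}) > r)$ whenever it is below one.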
Now, by Markov's inequality, Also, By the assumption on $\gamma_n$ and (1), as $n \to \infty$, Then, by Laplace's method (see Lemma A.2 in [17]), for $\rho >$ and similarly, for $\rho <$ The function $\ln \pi + \ln x - x + 1$ is concave, and has two zeros, one with $0 < x_\ell < 1$ and one with $x_u > 1$. These zeros determine the values of $\rho_\ell$ and $\rho_u$, respectively.
Next consider the case where the underlying data is a Poisson point process, and more precisely the regime where the expected number of points in the zero cell goes to infinity. Theorem 3.2 below gives a sufficient condition for all points of the point process which are contained in the zero cell (the cell of the typical data) to be within distance R n from the point at the origin (the typical data). The result also shows that the same scaling that is sufficient for the criterion to be satisfied is also necessary.
If R is such that ρ u := √ π R √ 2 a(R, λ) < ρ * , which holds for R large enough, then for all ρ in the interval (ρ u , ρ * ), where the convergence is at least exponential of rate where the convergence is at least exponential of rate λ+ 1 2 log 32πeR 2 − √ 2ρR √ π < 0.
Remark 3.1. To separate data more efficiently, we would ideally like to assume a relationship between λ n and γ n such that the cells of the tessellation contain more than one point with high probability. The assumption that lim n→∞ E 0,! n [N n (Z 0,n )] = ∞ does not ensure that lim n→∞ P 0 n (N n (Z 0,n ) > 1) = 1, however. The second moment method does not help, since this lower bound goes to zero as n goes to infinity for all λ n , and thus it remains an open question what scaling of λ n and γ n is needed to ensure lim n→∞ P 0 n (N n (Z 0,n ) > 1) = 1.

Summary
Our results can be summarized in terms of phenomena that successively take place when increasing $\rho$ for a given $\alpha$ and incrementing $\alpha$, when parameterizing the intensity of hyperplanes as $\rho n^{\alpha}$. As soon as $\alpha$ is positive, one finds data arbitrarily close to the typical data and encoded differently w.h.p. In addition, a displacement of order $\sqrt{n}$ in a random direction leads to an encoding which is different w.h.p. When moving to $\alpha > 1/2$, a displacement of order one in a random direction leads to an encoding which is different w.h.p. Further phenomena start appearing when $\alpha = 1$ (Shannon regime). When increasing $\rho$, one first gets a small volume for the typical cell, and then for the zero cell w.h.p. At this scale, one can also control distortion, namely the fact that the most distant data point encoded like the typical data is at distance at most $\sqrt{n}R$ w.h.p. by a proper choice of $\rho$, with $\rho$ arbitrarily small as $R$ grows. A new phenomenon appears at $\alpha = 3/2$, where a sufficiently large $\rho$ guarantees that the most distant data point encoded like the typical data is at distance at most $R$ w.h.p. The following table illustrates how and when this collection of phenomena takes place when increasing $\alpha$ and $\rho$.
Table 2. Limit of separation and distortion metrics as $n \to \infty$ for different values of $\alpha$ and $\rho$ when $\gamma_n \sim \rho n^{\alpha}$.
In the above table, the only distortion measure which was included is $P(R_M(Z_{0,n}) > r)$, but as mentioned, we could also consider $P(R_M(Z_{0,n}) > \sqrt{n}r)$, which follows the information theoretic Shannon regime discussed later in Section 6.2. In this case the threshold above which this probability is small in high dimensions is for $\alpha = 1$ and $\rho > \rho_u$, and by Remark 2.1, this is the scaling at which the centroids of the cells have intensity growing like $e^{n\lambda}$ with dimension $n$ for some $\lambda \in \mathbb{R}$.

Dimension Reduction
If it is known beforehand that the data lie in a lower dimensional subspace of $\mathbb{R}^n$, then the number of random hyperplanes needed to encode it may be much less than was evaluated above. If the subspace is known, we can tessellate the subspace directly. But if only the dimension of the subspace is known, then we can model the subspace containing the data as a uniform random subspace in $\mathbb{R}^n$ independent of $X_n$. Let $L$ be a random subspace in $\mathbb{R}^n$ of dimension $m(n)$, independent of the hyperplane tessellation $X$. If we assume that the data all lie in $L$, then instead of considering the zero cell $Z_0$ of $X$ in $\mathbb{R}^n$, we can consider the zero cell $Z_0^{(L)}$ of the tessellation induced by the intersection of $X$ with $L$. By radial symmetry, we can just consider a fixed subspace $L$. It is known that $X \cap L$ is a Poisson hyperplane process with intensity measure where $\gamma_m = \frac{\omega_m \omega_{n+1}}{\omega_n \omega_{m+1}}\, \gamma$. In [11], the authors showed that and established the following results on higher moments: Proposition 3.5 can be extended to this case:
Proposition 5.1. Let $L_n$ be a random subspace of $\mathbb{R}^n$ with dimension $m_n < n$ such that $m_n \to \infty$ as $n \to \infty$. Let $X_n$ be a stationary and isotropic Poisson hyperplane process in $\mathbb{R}^n$ with intensity $\gamma_n$. Then, if $\gamma_n \sim \rho\sqrt{m_n n}$ for some fixed $\rho > 0$, .
Similarly, Theorem 3.1 can be extended to:
Proposition 5.2. Let $L_n$ be a random subspace of $\mathbb{R}^n$ with dimension $m_n < n$ such that $m_n \to \infty$ as $n \to \infty$. Let $X_n$ be a stationary and isotropic Poisson hyperplane process in $\mathbb{R}^n$ with intensity $\gamma_n$, and let $R > 0$. Then, if $\gamma_n \sim \rho n^{\alpha - 1} m_n$ as $n \to \infty$, there exists $\rho_u$ such that for all $\rho > \rho_u$, and there exists $\rho_\ell$ such that for all $\rho < \rho_\ell$,

Comments
6.1. One-Bit Compressed Sensing Comments. In this paper, the compression of the data can be considered as a sequence of one-bit measurements, where each bit gives the side of a random hyperplane the data lies on.
This is the paradigm of one-bit compressed sensing, and the aim of this section is to further connect this theory with the results in this paper. Traditional compressed sensing is concerned with recovering a signal $x \in \mathbb{R}^n$ from a measurement vector $y = Ax \in \mathbb{R}^m$, where $A$ is some $m \times n$ measurement matrix ($m \leq n$). The goal is to find the smallest $m$ such that the signal $x$ can be recovered from $y$. If $m$ is less than $n$, this problem is ill-posed. However, Tao and Candès [8] showed that under the assumption that $x$ is $s$-sparse, i.e., $|\mathrm{supp}(x)| \leq s$, $x$ can be recovered from $y = Ax$, where $A$ is a Gaussian matrix, with $m = O(s \log(n/s))$ measurements. In general the measurement vector in this set-up requires infinite bit precision. One-bit compressed sensing was introduced by Baraniuk and Boufounos in [5] and aims to recover $x$ from the most severely quantized measurements possible: $y = \mathrm{sign}(Ax)$. This contains just one bit per measurement. Note that taking these measurements loses all information regarding the norm of $x$, so we can only hope to recover $x/|x|$. The goal is then to find an $x^* \in S^{n-1}$ such that $|x/|x| - x^*| < \delta$ for some error $\delta$. To reconstruct the signal from $m$ measurements, Plan and Vershynin showed that one can solve the convex optimization $$\min \|x\|_1 \quad \text{subject to} \quad \mathrm{sign}(Ax) = y \ \text{ and } \ \|Ax\|_1 = m, \tag{26}$$ where $A$ is an $m \times n$ matrix with i.i.d. standard Gaussian entries, see Theorem 1.1 in [18]. The original signal is recovered with small error if it can be guaranteed that the reconstructed signal is close in Euclidean distance to the original signal with high probability. Plan and Vershynin showed this error guarantee specifically for sparse or almost sparse signals using the following two results. First, they showed that if the original signal is effectively sparse (see Remark 1 in [18]), the signal returned from the optimization (26) will also be effectively sparse.
Second, they use the fact that there is a tessellation of the signal space $S^{n-1} \cap \Sigma_s$, where $\Sigma_s := \{s\text{-sparse signals}\}$, with $m = O(s \log^2(n/s))$ hyperplanes where all cells in the tessellation have diameter at most $\delta$, i.e., all sparse signals within a cell of the tessellation are within distance $\delta$ of each other. Thus, the recovered signal will be within distance $\delta$ of the original signal with high probability. In fact, they showed a more general result in [19]: for a subset $K \subseteq S^{n-1}$, all cells of a tessellation with $m \ge C \delta^{-6} \omega(K)^2$ hyperplanes have diameter at most $\delta$ with probability at least $1 - 3e^{-c\delta^4 m}$, where $\omega(K)$ is the Gaussian mean width of the set $K$.

Some recent work has shown that the same geometric techniques can be used to recover a signal $x$, both direction and magnitude, if it is known that $|x| \le R < \infty$. Instead of linear hyperplanes tessellating $K \subset S^{n-1}$, consider a bounded set $K \subset \mathbb{R}^n$ and tessellate it with affine hyperplanes with normal vectors $a_i$ and translations from the origin $t_i$. It was shown in [3] that an $s$-sparse signal $x$ with $|x| \le R$ can be recovered with measurements of the form where $t_1, \dots, t_m \sim N(0, R^2)$ are independent of $a_1, \dots, a_m$. It is proved that the following program recovers the signal with small error: More specifically, Theorem 2 in [3] states that with probability at least $1 - 3\exp(-c\delta^4 m)$, the following holds for all $x \in B^n(R) \cap \Sigma_s$: for $n \ge 2m$ and $m \ge C\delta^{-4} s \log(n/s)$, and for $y$ obtained from the measurement model (27), the solution $x^*$ to the program (28) satisfies $|x - x^*| \le \delta R$. Also, Knudson et al. [15] showed that if $t$ is a Gaussian vector with variance depending on $R$, $x$ can be recovered if $|x| \le R$ by lifting to one dimension higher and using the program (26).
They also showed that one can estimate the magnitude (but not the direction) of a signal $x$ in an annulus $r \le |x| \le R$ up to error $\delta$ with $m$ on the order of $R^4 r^{-2} \delta^{-2}$ measurements, by evaluating the inverse Gaussian error function.
If we remove the norm constraint on the signal, one can use a stationary and isotropic hyperplane tessellation to obtain an infinite sequence of one-bit measurements encoding the signal. Instead of minimizing the number of hyperplanes, the intensity of hyperplanes is minimized, as done throughout this paper for the various separation/distortion metrics. The encoding scheme corresponding to a stationary and isotropic Poisson hyperplane tessellation is given as follows. Let $\{u_i\}_{i \in \mathbb{Z}}$ be an i.i.d. sequence of normal Gaussian random vectors in $\mathbb{R}^n$, and let $\{t_i\}_{i \in \mathbb{Z}}$ be the support of a Poisson point process of intensity $\gamma$ in $\mathbb{R}$; then the encoding is given by the one-bit measurements The collection of hyperplanes $\{H(u_i, t_i)\}_{i \in \mathbb{Z}}$ tessellates all of $\mathbb{R}^n$ and forms a stationary and isotropic Poisson hyperplane process with intensity $\gamma$, and all data within a single cell of the tessellation have the same encoding. The results in the paper provide an analysis of the quality of the compression, in terms of theoretical error bounds on the separation of a typical signal from other signals or the distortion of a typical signal. These are based on some metric of the cell that a typical signal lies in, i.e., the zero cell by stationarity.

The paradigm of one-bit compressed sensing requires the ability to recover the original data given only its one-bit encoding. Given an encoding, if one can identify a member of the cell corresponding to this sequence of bits, one can use this as an approximation of the original data. The convex optimization recovery technique used in the literature for the constrained norm case will return a signal $x^*$ that is one of the vertices of the cell, and knowing that all cells have small diameters ensures that the recovered signal is close to the original.
The analogous strategy for the Poisson hyperplane compression requires showing that the vertex of the zero cell that is furthest from the origin is close to the origin in Euclidean distance, and thus the measure of distortion needed to ensure signal recovery through this convex optimization strategy is that of Theorem 3.1. To ensure that the farthest vertex of the cell containing the original signal is within error distance $\delta$, the intensity of hyperplanes $\gamma_n$ must be on the order of $n^{3/2}$.
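The encoding scheme described above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the construction used in the proofs: the function names are hypothetical, the normal vectors are taken to be Gaussian vectors normalized to unit length, the sign convention $y_i = \mathrm{sign}(\langle x, u_i \rangle - t_i)$ is assumed, and only the hyperplanes with $|t_i| \le r$ are sampled. The truncation suffices for data in the ball of radius $r$, since a hyperplane at distance larger than $r$ from the origin does not meet that ball.

```python
import numpy as np

def sample_hyperplanes(n, gamma, r, rng):
    """Sample the hyperplanes H(u_i, t_i) = {x : <x, u_i> = t_i} with |t_i| <= r."""
    # Offsets: a Poisson point process of intensity gamma restricted to [-r, r].
    count = rng.poisson(2.0 * gamma * r)
    t = rng.uniform(-r, r, size=count)
    # Directions: Gaussian vectors normalized to unit length (assumption).
    g = rng.standard_normal((count, n))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)
    return u, t

def encode(x, u, t):
    """One bit per hyperplane: +1 or -1 according to the side of H(u_i, t_i) containing x."""
    return np.where(u @ x - t >= 0.0, 1, -1)

rng = np.random.default_rng(0)
n, gamma, r = 5, 3.0, 2.0
u, t = sample_hyperplanes(n, gamma, r, rng)

x = rng.uniform(-1.0, 1.0, size=n) / np.sqrt(n)   # a point of norm at most 1
z = x + 0.05 * rng.standard_normal(n)             # a nearby point
bits_x, bits_z = encode(x, u, t), encode(z, u, t)

# Two points of the ball lie in the same cell exactly when all their bits agree.
print("hyperplanes sampled:", len(t))
print("bits differing between x and z:", int(np.sum(bits_x != bits_z)))
```

Increasing `gamma` makes it more likely that nearby points such as `x` and `z` receive different encodings, which is the separation effect quantified in the paper.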
An alternative method for reconstruction that returns a point of the cell more likely to be close to the typical signal would provide a more efficient compression. For example, if the reconstruction returns a uniformly distributed signal in the cell determined by the measurements using, for instance, the algorithm for finding an approximate uniform random point in a convex set in [10], this could be guaranteed to be close to the original signal with high probability using results from [17].
As seen later, a deterministic grid actually performs better than the isotropic Poisson hyperplane tessellation in the full dimensional case, in the sense that a smaller constant $\rho$ is needed to ensure that the furthest vertex, or a uniform random vector in the cell, is close with high probability. However, if the data is sparse, or somehow lower-dimensional, this may make the isotropic case more desirable. In the case of a deterministic grid, only in the best case scenario will the intersection of the tessellation with a random $m$-dimensional subspace be an $m$-dimensional grid. However, in the isotropic case, the intersection will always have the distribution of an $m$-dimensional isotropic hyperplane tessellation. A more complete analysis of the case of sparse and lower dimensional data is left for future work.
6.2. Information Theoretic Comments.
The aim of this section is to connect the results of the present paper to classical information theory.

6.2.1. Channel Coding.
Consider first channel coding. The additive noise channel features the transmission of codewords in $\mathbb{R}^n$ ($n$ is referred to as the block-length of the code) through a noisy channel. The white Gaussian noise special case is of the same nature as that considered in Proposition 3.3: each coordinate of a transmitted codeword is additively blurred by an independent $N(0, \sigma^2)$ random variable.
In the viewpoint introduced by Poltyrev [20], the codebook is a stationary point process in $\mathbb{R}^n$ (e.g., a Poisson point process in the random coding case) and the decoding scheme consists in saying that the codeword $c$ was transmitted if the received message is in the Voronoi cell of $c$. This is the maximum likelihood decoder. In the regime where the point process has intensity $e^{n\rho}$ for some $\rho \in \mathbb{R}$, there is a threshold for $\rho$ below which the correct codeword is decoded with a probability tending to one as $n$ tends to infinity, and above which the probability of error tends to 1 as $n$ tends to infinity. In Shannon's channel coding theory, the codewords are constrained to satisfy some power constraint requiring that the Euclidean norm of a codeword be less than or equal to $\sqrt{nP}$, for some $P$ which is the power per symbol. As shown in [2] (Lemma 2 and Theorem 7), the Poltyrev viewpoint can be connected to Shannon's channel coding theorem in the high signal to noise ratio case, namely when $P$ tends to infinity. In particular the Shannon capacity then grows like $\frac{1}{2} \log(2\pi e P)$ when $P \to \infty$, and the Poltyrev capacity is what one gets asymptotically when subtracting $\frac{1}{2} \log(2\pi e P)$ from the Shannon capacity.
6.2.2. Loss-less One-bit Compression Source Coding.
Consider now source coding, which is more directly related to the setting considered in the present paper. Consider a source with i.i.d. $N(0, \sigma^2)$ symbols. If there are $n$ such symbols, with $n$ (also called block-length) large, they lie in a ball of radius $\sqrt{n\sigma^2}$, which has volume about $e^{\frac{n}{2} \log(2\pi e \sigma^2)}$. If one wants to represent in a loss-less way all typical sequences of this type by $2^{\beta n}$ binary compression sequences, namely all binary sequences of length $\beta n$, the volume per sequence should tend to 0. That is, $e^{\frac{n}{2} \log(2\pi e \sigma^2)} e^{-\beta n \log 2}$ should go to 0 when $n$ tends to infinity. This shows that the best (smallest) compression rate $\beta$ for such a signal is $\beta_c = \frac{1}{2} \log(2\pi e \sigma^2) / \log 2$. This is sharp and generalizes to all sources with a well defined entropy rate. This is formalized in the source coding theorem.
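The volume-per-sequence argument above can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function names are hypothetical. It evaluates the critical rate $\beta_c$ for a given $\sigma^2$ and the logarithm of the volume per binary sequence, which is negative (volume tending to 0) for $\beta > \beta_c$ and positive for $\beta < \beta_c$.

```python
import math

def critical_rate(sigma2):
    """beta_c = (1/2) log(2 pi e sigma^2) / log 2, in bits per symbol."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma2) / math.log(2.0)

def log_volume_per_sequence(n, sigma2, beta):
    """Natural log of e^{(n/2) log(2 pi e sigma^2)} * e^{-beta n log 2}."""
    return 0.5 * n * math.log(2.0 * math.pi * math.e * sigma2) - beta * n * math.log(2.0)

sigma2 = 1.0
beta_c = critical_rate(sigma2)
for beta in (beta_c - 0.1, beta_c + 0.1):
    # The log-volume grows linearly in n with a sign fixed by beta - beta_c.
    values = [log_volume_per_sequence(n, sigma2, beta) for n in (10, 100, 1000)]
    print(f"beta = {beta:.3f}:", [f"{v:.1f}" for v in values])
```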
In our case, we have no structure in the signal, which corresponds to letting $\sigma^2$ tend to $\infty$. The unconstrained setting developed in the present paper can hence be seen as an analogue of the Poltyrev regime for source coding. In addition, we focus on a specific coding scheme, namely Poisson hyperplane one-bit compression.
Before going down this path, let us discuss some questions related to coding in this one-bit compressive setting.
(1) What is the codebook? A first natural answer consists in associating one codeword sampled at random to each cell, with the uniform sampling taking place in a conditionally independent way given the hyperplane tessellation. Another possibility is the center of the smallest ball containing the zero cell (the out-ball). A third one is the center of the largest ball contained in the zero cell (the in-ball).
(2) What is the decoding algorithm? By this, we mean the way to retrieve the codeword, as defined above, from the sequence of bits characterizing the cell as described in Section 6.1.
For unconstrained one-bit data compression, the analogue of the Shannon threshold $\beta_c$ is the density $\gamma_n = \rho n^{\alpha}$ of hyperplanes that separates the situations where the mean volume of the typical cell tends to 0 and infinity, respectively. As shown above, this critical density lies in the Shannon regime, namely for $\alpha = 1$. More precisely, if $\gamma_n = \rho n$, with $\rho < \rho_c = \frac{1}{\sqrt{e}}$, then this mean volume tends to infinity, whereas if $\rho > \rho_c$, then it tends to 0. In other words, for one-bit compressive sensing based on Poisson isotropic hyperplanes, the Palm-Shannon-Poltyrev source coding rate is given by $\alpha_c = 1$ and $\rho_c = \frac{1}{\sqrt{e}}$. The proposed name comes from the fact that one looks at the typical cell, with typicality defined in the Palm sense (e.g., with respect to the point process of centers of the out-balls). The threshold that separates the situations where the mean volume of the zero cell tends to 0 and infinity, respectively, could be called the Feller-Shannon-Poltyrev threshold and is obtained for a density of hyperplanes with $\alpha_c = 1$ and $\rho_c = \frac{\pi}{\sqrt{e}}$. The proposed name comes from "Feller's paradox", which states that the interval of a stationary point process on $\mathbb{R}$ containing the origin is larger than the typical interval. The Feller-Shannon-Poltyrev rate is of the same order as the Palm-Shannon-Poltyrev one, but $\pi$ times larger.

6.2.3. Lossy One-bit Compression Source Coding.
In the classical lossy source coding case, one looks for a codebook such that the distortion between a signal and its encoding is less than or equal to $D$. The most common distortion constraint is that the signal be at Euclidean distance less than or equal to $\sqrt{nD}$ from the sequence it is encoded by. The rate-distortion function then specifies the best coding rate ensuring this constraint.
The framework discussed in the present paper can be seen as a Poltyrev version of lossy source coding with codebooks corresponding to one-bit data compression. As for the loss-less case, the first dichotomy is whether one takes the Palm viewpoint of the typical codeword or the Feller viewpoint of the typical data point. The cell of the former is $Z$, whereas that containing the latter is $Z_0$. Let us first discuss the equivalent of the classical distortion defined above in the Palm case. If the codewords are the centers of the out-balls, then a natural definition of Palm distortion is in terms of the radius of the out-ball of the typical cell. For instance, in this case, the rate-distortion function would give the smallest intensity of hyperplanes $\gamma_n = \rho n^{\alpha}$ such that this radius is less than or equal to $\sqrt{n} R$, as a function of $R$. This Palm-Shannon-Poltyrev out-ball rate-distortion function is not known to the best of our knowledge. However, the Feller version of this problem is precisely solved by Theorems 3.2 and 3.1. For instance, in the case of Theorem 3.1, the parameters in question are $\alpha = 1$ and $\rho_u(R) = \frac{x_u \sqrt{\pi}}{R \sqrt{2}}$, with $x_u$ the constant defined in the proof of the theorem. Hence the function $R \mapsto n \rho_u(R)$ can be seen as the rate-distortion function for this version of the problem. Note that for this definition of distortion, lossy coding with a radius $R$ large enough requires a smaller hyperplane intensity than that guaranteeing the Palm volume to go to zero (which can be seen as an analogue of loss-less coding): the exponent is the same, namely $\alpha = 1$, but the multiplicative constant $\rho_u(R)$ goes to 0 as $R$ tends to infinity. As expected, relaxing the distortion constraint allows one to use smaller codes.
The paper also determines various other rate-separation functions of the Feller type. A first instance is the Feller-Shannon-Poltyrev in-ball function, which gives the smallest hyperplane intensity such that the closest data point not encoded in the same way as the origin lies at a distance at least $\delta$. This last condition is equivalent to having the radius of the largest ball centered at the origin and contained in the zero cell being larger than or equal to $\delta$. By the same arguments as in Proposition 3.1, the associated threshold is $\alpha_c = 0$. If $\gamma_n = \rho$, the probability that this distance is at least $\delta$ is $\exp(-2\rho\delta)$. A second example is the Feller-Shannon-Poltyrev linear contact function, which gives the smallest hyperplane intensity such that the closest data point in some random direction and not encoded as the origin is at distance more than $\sqrt{n} D$. By the arguments of Proposition 3.2, the threshold is again $\alpha_c = 0$ and if $\gamma_n = \rho$, the probability that this distance is at least $\sqrt{n} D$ is $\exp\left(-\frac{\sqrt{2}}{\sqrt{\pi}} \rho D\right)$.
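The linear contact probability above can be checked by simulation. The following sketch rests on assumptions that are not spelled out in this section: $\omega_k$ is taken to be the surface area of the unit sphere in $\mathbb{R}^k$, the intensity seen along a line is the $m = 1$ case of the formula for $\gamma_m$ in Section 5, hyperplanes are parametrized as $H(u, t) = \{x : \langle x, u \rangle = t\}$ with $u$ uniform on the sphere and $t$ a Poisson process of intensity $\gamma$, and all function names are hypothetical. The code compares a Monte Carlo estimate of the probability that no hyperplane crosses a segment of length $\sqrt{n} D$ from the origin with the exact finite-$n$ value and with the large-$n$ expression $\exp(-\sqrt{2/\pi}\, \rho D)$.

```python
import math
import numpy as np

def log_omega(k):
    """Log of the surface area of the unit sphere in R^k: 2 pi^{k/2} / Gamma(k/2)."""
    return math.log(2.0) + 0.5 * k * math.log(math.pi) - math.lgamma(0.5 * k)

def line_intensity(n, gamma):
    """gamma_1 = omega_1 omega_{n+1} / (omega_n omega_2) * gamma (case m = 1)."""
    return gamma * math.exp(log_omega(1) + log_omega(n + 1) - log_omega(n) - log_omega(2))

def contact_survival_mc(n, gamma, length, trials, rng):
    """Fraction of trials in which no hyperplane crosses the segment [0, length * e_1]."""
    survived = 0
    for _ in range(trials):
        # Only hyperplanes with |t| <= length can meet the segment.
        count = rng.poisson(2.0 * gamma * length)
        t = rng.uniform(-length, length, size=count)
        g = rng.standard_normal((count, n))
        u1 = g[:, 0] / np.linalg.norm(g, axis=1)
        # H(u, t) meets the ray {s e_1 : s > 0} at s = t / u1; keep s in (0, length].
        crossed = np.any((t * u1 > 0) & (np.abs(t) <= length * np.abs(u1)))
        survived += not crossed
    return survived / trials

rng = np.random.default_rng(0)
n, rho, D = 10, 1.0, 0.5
length = math.sqrt(n) * D
exact = math.exp(-line_intensity(n, rho) * length)
limit = math.exp(-math.sqrt(2.0 / math.pi) * rho * D)
estimate = contact_survival_mc(n, rho, length, trials=2000, rng=rng)
print(f"Monte Carlo {estimate:.3f}, exact {exact:.3f}, large-n expression {limit:.3f}")
```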

Why Isotropic Poisson Hyperplanes.
We discuss here some mathematical reasons justifying the framework proposed for one-bit compression based on Poisson isotropic hyperplanes. Another natural option in the Poisson hyperplane framework is Poisson Manhattan hyperplanes, where every hyperplane is orthogonal to one of the vectors of the standard orthonormal basis of $\mathbb{R}^n$. An even simpler hyperplane system is the square one (referred to as the deterministic grid below). The following tables summarize the results available on basic quantities related to these tessellations, when the distance to the nearest hyperplane is the same in expectation. The results are proved at the end of the section.

Table 3. Comparison of quantities for different tessellations with intensity $\gamma$ in $\mathbb{R}^n$.
For all criteria in Table 3, the Poisson isotropic setting outperforms the two other options. For the expected volume of the zero cell (first column), the isotropic Poisson tessellation is the best, i.e., has the smallest expected volume. This fact is the main justification for using the Poisson isotropic structure in the context of one-bit compression: among the three options, it yields the code with the smallest volume of data encoded in the same way as the typical data. The Poisson isotropic setting is also better than the other two in terms of the probability of separation of the typical data from a data point $x$. We see from the last column that isotropic Poisson hyperplanes outperform the other two options orderwise: the thresholds for the latter have order $\alpha = 1$, whereas that of the former has order $\alpha = 1/2$ only.
In contrast, consider now a uniform random vector $Y$ chosen in the zero cell and take as a distortion criterion the "norm" of $Y$, defined as $E[|Y|^2]^{1/2}$. The deterministic grid has the smallest norm and the Poisson grid has the second smallest norm. By Proposition 4.1 in [17], this norm admits an upper bound for the isotropic Poisson tessellation, and this upper bound is larger than the values for the other two cases. For the quantity $R_M$, or equivalently, the distance from the origin to the furthest vertex of the zero cell, the results are the same, with the deterministic grid performing better than the Poisson grid, and the isotropic Poisson tessellation having an upper bound greater than the other two cases, since $x_u \approx 3$. For both quantities to be small, the scaling with dimension $n$ needed for $\gamma$ is $n^{3/2}$ for all three tessellations.
We now give the proofs.
To compute the norm of the uniform random vector in the zero cell of the deterministic grid, consider the fixed cube of width $\frac{2n}{\gamma}$. Let $Y_n \sim \mathrm{Uniform}([-n/\gamma, n/\gamma]^n)$. Then, each coordinate of $Y_n$ has second moment $\frac{1}{3} \left(\frac{n}{\gamma}\right)^2$, so that $E[|Y_n|^2] = \frac{n^3}{3\gamma^2}$. Thus, $|Y_n| \sim \frac{n^{3/2}}{\sqrt{3}\gamma}$, as $n \to \infty$. The other quantities are immediate.

The Poisson Manhattan tessellation is defined as follows. Let $X$ be a Poisson hyperplane tessellation in $\mathbb{R}^n$ with intensity $\gamma$ and directional distribution $\varphi$ that has mass $\frac{1}{2n}$ on each positive and negative axis, i.e. the normal vectors of the hyperplanes are the usual basis directions $\pm e_1, \dots, \pm e_n$. Since equal weight is placed on each direction, the intersection points of the hyperplanes with the axes form independent Poisson point processes of intensity $\frac{\gamma}{n}$ on each axis.
For each $i = 1, \dots, n$, let $N_i = \{T^i_k\}$ be the Poisson point process of intersection points on the $\pm e_i$ axis, with the usual convention that $T^i_0 \le 0 < T^i_1$. Then, the zero cell $Z_0$ of $X$ is defined as $Z_0 = \prod_{i=1}^n [T^i_0, T^i_1]$. Note that the length of the interval $[T^i_0, T^i_1]$ will not have an exponential distribution, since we are requiring that 0 is in the interval, which biases towards larger intervals. We obtain the distribution of the length of the interval by using the Palm distributions of $\{N_i\}_{i=1}^n$. By Slivnyak's theorem, $P_N = P^0_{N - \delta_0}$, so the distribution of the length of the interval satisfies $P(T^i_1 - T^i_0 \in A) = P^0(T^i_1 + |T^i_{-1}| \in A)$. Under $P^0$, i.e. conditioned on $T_0 = 0$, $T_1$ and $|T_{-1}|$ are independent exponential random variables with parameter $\frac{\gamma}{n}$. Then, we first see that

Also, for $Y$ such that conditioned on $X$, $Y \sim \mathrm{Uniform}(Z_0)$, the law of large numbers implies that as $n \to \infty$, Thus, $|Y_n| \sim \frac{n^{3/2}}{\gamma}$ as $n \to \infty$. For the Poisson Manhattan tessellation, the quantity $R_M$ is given by Thus, $R_M$ is concentrated near $\frac{\sqrt{7}\, n^{3/2}}{\sqrt{2}\, \gamma}$ for large $n$.
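The three asymptotic values obtained in these proofs can be illustrated by simulation. The sketch below is not part of the proofs and relies on the following reading of the text: the zero cell of the Poisson Manhattan tessellation is the box $\prod_i [-b_i, a_i]$, where the $a_i$ and $b_i$ are independent exponential random variables with parameter $\gamma/n$, and $R_M$ is the distance from the origin to the corner with coordinates of absolute value $\max(a_i, b_i)$. The function names are hypothetical.

```python
import math
import numpy as np

def manhattan_zero_cell(n, gamma, rng):
    """Sides [-b_i, a_i] of the zero cell, with a_i, b_i i.i.d. Exp(gamma / n)."""
    scale = n / gamma  # mean of an exponential variable with parameter gamma / n
    return rng.exponential(scale, size=n), rng.exponential(scale, size=n)

def uniform_point(a, b, rng):
    """Uniform point of the box prod_i [-b_i, a_i]."""
    return rng.uniform(-b, a)

def farthest_vertex_distance(a, b):
    """Distance from the origin to the farthest corner of the box."""
    return math.sqrt(float(np.sum(np.maximum(a, b) ** 2)))

def grid_uniform_point(n, gamma, rng):
    """Uniform point of the cube [-n/gamma, n/gamma]^n (deterministic grid)."""
    return rng.uniform(-n / gamma, n / gamma, size=n)

rng = np.random.default_rng(0)
n, gamma = 2000, 1.0
a, b = manhattan_zero_cell(n, gamma, rng)
y = uniform_point(a, b, rng)
g = grid_uniform_point(n, gamma, rng)

# Ratios to the asymptotic values stated in the text; each should be close to 1.
print("grid |Y|      :", np.linalg.norm(g) / (n**1.5 / (math.sqrt(3.0) * gamma)))
print("Manhattan |Y| :", np.linalg.norm(y) / (n**1.5 / gamma))
print("Manhattan R_M :", farthest_vertex_distance(a, b) / (math.sqrt(3.5) * n**1.5 / gamma))
```

The mean side length $a_i + b_i$ of the simulated zero cell is $2n/\gamma$, twice the mean spacing $n/\gamma$ between consecutive hyperplanes on an axis, which is the size bias noted in the proof.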