Fast Hadamard transforms for compressive sensing of joint systems: measurement of a 3.2 million-dimensional bi-photon probability distribution

We demonstrate how to efficiently implement extremely high-dimensional compressive imaging of a bi-photon probability distribution. Our method uses fast-Hadamard-transform Kronecker-based compressive sensing to acquire the joint space distribution. We list, in detail, the operations necessary to enable fast-transform-based matrix-vector operations in the joint space to reconstruct a 16.8 million-dimensional image in less than 10 minutes. Within a subspace of that image exists a 3.2 million-dimensional bi-photon probability distribution. In addition, we demonstrate how the marginal distributions can aid in the accuracy of joint space distribution reconstructions.


Introduction
Characterizing high-dimensional joint systems is a difficult problem due to experimental impracticalities such as long measurement times, low flux, or insufficient computing resources. One example of such a characterization is of continuous-variable entangled states, a resource gaining ground in quantum technologies [1][2][3][4][5][6]. A widely used source of continuous-variable entangled states is Spontaneous Parametric Down-Conversion (SPDC) in a nonlinear crystal [7]. Depending on the configuration, the resulting bi-photon state may be entangled in the transverse degrees of freedom [8][9][10]. To determine if the system is entangled, both the bi-photon joint position and joint momentum probability distributions composed of signal and idler photons must be measured through correlation measurements.
Much work has been done recently to characterize high-dimensional position-momentum entanglement with discrete measurements [8,[11][12][13][14][15][16]. Characterizations of position or momentum distributions resulting from SPDC are done by measuring signal and idler pixel correlations in either an image plane of the crystal (constituting a position measurement) or a Fourier-transform plane of the crystal (constituting a momentum measurement) through coincidence counting.
For a bi-photon probability distribution, measurements are typically done by measuring correlations via raster scanning through the individual signal- and idler-photon probability distributions. The time required to complete a raster scan with single-photon detectors quickly becomes impractical for high-resolution scans. Imaging these distributions with a camera has been shown in [17,18], yet cameras often introduce far more noise than a single-photon detector.
Recently, Compressive Sensing (CS) [19,20] techniques have been introduced as an alternative to raster scanning for characterizing a high-dimensional entangled system, dropping the measurement time from months to hours [21,22]. While the data-acquisition time is drastically reduced, it comes at the cost of computational complexity, requiring a computational reconstruction of the signal. Performing CS on high-dimensional signals is not a new problem, and several clever solutions exist for utilizing separable compressive sensing matrices combined by a Kronecker product [23,24]. However, these methods are ill suited for sampling the correlations in a joint space.
In this article we propose the use of fast Hadamard transforms for high-dimensional joint space reconstructions. Specifically, we show how the Kronecker-product-based recursion relations of Sylvester-type Hadamard matrices can combine single-particle sensing matrices. This, in turn, enables the use of fast Hadamard transforms in the joint space as they have been shown to drastically reduce CS reconstruction times [25]. Using the randomization techniques outlined in [26,27], sensing matrices composed of randomized Hadamard matrices offer tremendous speed enhancements in many reconstruction algorithms.
Additionally, we show that by using the individual signal and idler marginal distributions in our reconstruction of the joint-space distribution, we can more accurately acquire transverse spatial correlations as measured by the mutual information. To the best of our knowledge, this is the first time the single-particle information has been used in reconstructing the joint-space distribution.
To demonstrate the effectiveness of our method, we reconstruct a 16.8 × 10⁶-dimensional joint-space distribution in only a few minutes. Within this joint space lives a 3.2 million-dimensional bi-photon position probability distribution from which we measure the degree of transverse correlations between signal and idler photons using the mutual information.
The experimental realization here is closely related to the work performed in [21]. The experiment in this article is merely meant to demonstrate how structured randomness enables efficient reconstructions of the joint space distribution at even higher dimensions. In theory, this increase in resolution allows for an increase in the amount of measurable mutual information.

Compressive sensing
CS helps to overcome the unreasonable data-acquisition times associated with sampling signals using limited resources. CS requires that the signal of interest x̃ ∈ R^N have a sparse representation x obtainable via a basis transform Ψ, such that x = Ψ[x̃]. Limiting the discussion to real signals for simplicity, x̃ is sampled with a random sensing matrix A ∈ R^{M×N} to form a measurement vector y = A·x̃, where y ∈ R^M. Assuming the signal representation x is k-sparse, x may be recovered by solving

x̂ = arg min_x { τ g(x) + ||y − A·Ψ⁻¹[x]||₂² },    (1)

where the second term is a least-squares term bounded by a predefined error ε and ||·||_p is defined as the l_p-norm. The function g(x) is a function to be minimized depending on the assumed sparsity; g(x) is often chosen to be the l₁-norm ||x||₁, the total variation as defined by the gradient ||∇x||₁, or a combination of functions in whose domains x is known to be sparse. The scalar τ is simply a weighting term that weights the sparsity, favoring either the least-squares solution or the sparsity. Because A ∈ R^{M×N} where M ≪ N, there are an infinite number of solutions satisfying the least-squares term alone. The function g(x) picks out the sparsest of these solutions, which should correspond to our signal x. An overview of compressive sensing and its applications may be found in [28].
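The recovery problem above can be sketched with a minimal iterative soft-thresholding (ISTA) loop. This is an illustrative toy, not the paper's reconstruction code: the sensing matrix, sparsity level, and weight `tau` are arbitrary choices, and Ψ is taken as the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, k = 64, 32, 3            # signal dimension, measurements, sparsity

# Random +/-1 sensing matrix (scaled) and a k-sparse signal.
A = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.uniform(1.0, 2.0, k)
y = A @ x_true                 # noiseless measurements y = A.x

# ISTA: gradient step on the least-squares term, then soft threshold (l1 prior).
tau = 0.01                     # sparsity weight
L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
x = np.zeros(N)
for _ in range(2000):
    grad = A.T @ (A @ x - y)
    z = x - grad / L
    x = np.sign(z) * np.maximum(np.abs(z) - tau / L, 0.0)  # soft threshold
```

With M = N/2 measurements and k = 3, the sparse solution is recovered up to the small bias introduced by the l₁ penalty.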

Measuring a non-separable joint system
We apply CS to measure the joint position probability distribution of the down-converted signal and idler photons from SPDC. Quantum mechanics tells us that the bi-photon state exists in a Hilbert space composed of the tensor product of the individual signal and idler photon Hilbert spaces. After representing the bi-photon state in a discrete basis, operators on these states are represented as matrices. In order to make a measurement, we approximate the state as living in a finite-dimensional Hilbert space. We can therefore represent a bi-photon operator matrix in terms of Kronecker products of individual signal and idler photon operator matrices. For CS, we can let these operators be projection operators and manipulate them such that they form the rows of a sensing matrix A. The sets of projection operators for the signal and idler spaces are designated by sets of M patterns, where each pattern is in R^N; P_S and P_I ∈ R^{M×N}. In this manner, the sensing matrix is written as

A[i] = P_S[i] ⊗ P_I[i]    (2)

for i ∈ 1...M, where P[i] represents the i-th row of P.
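The row-wise Kronecker structure above can be checked numerically. The key identity, shown in this small illustrative sketch, is that a joint-space projection of the (row-major) vectorized joint distribution equals a bilinear form in the two subspace patterns, so a row of A never needs to be stored explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8                                  # per-photon dimension (illustrative)

X = rng.random((N, N))                 # joint distribution as an N x N array
X /= X.sum()
x = X.flatten()                        # vectorized joint space, length N^2

p_s = rng.choice([0.0, 1.0], N)        # one signal pattern (illustrative binary)
p_i = rng.choice([0.0, 1.0], N)        # one idler pattern

row = np.kron(p_s, p_i)                # one row of A, length N^2
y_joint = row @ x                      # joint-space projection
y_bilinear = p_s @ X @ p_i             # same measurement without forming the row
```

The agreement holds because numpy's row-major `flatten` matches the ordering of `np.kron`.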
The Kronecker product ⊗ in this article operates on matrices and vectors such that if a is of dimension m × n and b is of dimension p × q, then their Kronecker product is of dimension mp × nq, represented as

a ⊗ b = [ a[1,1]·b  ⋯  a[1,n]·b
              ⋮              ⋮
          a[m,1]·b  ⋯  a[m,n]·b ].    (3)

Fig. 1. The experimental diagram demonstrates how to image a joint two-particle system. In this paper, the joint system is composed of highly-correlated signal and idler photons from an SPDC source. The experiment samples the position distribution of the joint system by taking random projections of signal and idler intensities with a spatial light modulator within an image plane of the crystal. An avalanche photodiode (APD) detects photon arrivals while the photon counters measure photon coincidences.

An experimental diagram for measuring the joint-space bi-photon position probability distribution is presented in Fig. 1, where each particle's space is depicted as two-dimensional. Yet, to simplify the CS formalism, we represent the signal and idler spaces as one-dimensional, living in R^N, and the joint-space distribution as a vector x ∈ R^{N²}. As outlined in [21], compressive sensing is experimentally accomplished by taking random projections with patterns composed of √N × √N pixels within each subspace of the signal and idler systems and then measuring the resulting correlations in photon counts.
During reconstruction, the bi-photon probability distribution is already sparse in the pixel basis due to the tight pixel correlations resulting from energy and momentum conservation. This eliminates the need to define a sparse basis, and Ψ becomes the identity operator, Ψ = 1, in the formalism above.
Random binary matrices were used in [21], yet we wish to use structured random binary matrices for reconstruction purposes. Using properties of Kronecker products enables relatively efficient computations of the reconstruction operations A · x and A T · y, where A T is the transpose of A, because A never needs to be computed explicitly [29]. However, structured randomness can enable the use of fast transforms which are even more efficient. Our contribution is to demonstrate in the following section how randomized Sylvester-Hadamard matrices enable the use of fast Hadamard transforms in joint space CS reconstructions.

Randomly sampled & permuted Hadamard sensing matrices
Sylvester-Hadamard matrices have a structure that is particularly advantageous to the CS framework. These matrices are generated from a simple recursion relation defined by a Kronecker product,

H_2 = [ 1   1
        1  −1 ],   H_{2^k} = H_2 ⊗ H_{2^{k−1}}.    (4)

From this, any Sylvester-Hadamard matrix can be decomposed as

H_{2^k} = H_{2^j} ⊗ H_{2^{k−j}},   0 < j < k,    (5)

for k > 1; in particular, H_{N²} = H_N ⊗ H_N. Because of this structure, Sylvester-Hadamard matrices are restricted to powers of two but can be used to build patterns and a sensing matrix that utilize the speed and efficiency of a fast Hadamard transform H[...]. We use a normally ordered fast transform in this work. Its algorithm is similar to that of a fast Fourier transform, but it consists of only additions and subtractions. Hence, it performs the reconstruction operations A · x and A^T · y in O(N² log N) time, significantly faster than an explicit matrix-vector multiplication requiring O(N⁴) time. A thorough overview of Hadamard matrices, fast Hadamard transforms, and their applications to signal and image processing may be found in [30].

To construct P_S, P_I, and A from Hadamard matrices, the Hadamard matrices must be randomized in both their rows and columns. The sensing matrices must be incoherent with the image yet span the space in which the signal resides. In other words, the sensing matrices must adequately sample the basis components of the signal. Random sensing matrices perform this task well, yet Hadamard matrices naturally contain much structure. To begin, each sensing matrix is formed by taking specific rows from a Hadamard matrix of the correct dimensions: A is constructed from H_{N²} while P_S and P_I are constructed from H_N. Because of the relation in Eq. (2), the rows of A will be determined by the rows of P_S and P_I.
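A minimal, naturally ordered fast Walsh-Hadamard transform consistent with the Sylvester recursion can be sketched as follows; it applies H_n using only additions and subtractions in O(n log n) operations:

```python
import numpy as np

def fwht(v):
    """Return H_n @ v for the Sylvester-ordered Hadamard matrix (n a power of 2)."""
    x = np.array(v, dtype=float)
    n = x.size
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b           # butterfly: sums
            x[i + h:i + 2 * h] = a - b   # butterfly: differences
        h *= 2
    return x

# Build H_8 explicitly from the Kronecker recursion to check the fast transform.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H8 = np.kron(H2, np.kron(H2, H2))
v = np.random.default_rng(7).random(8)
```

The butterfly output agrees with the explicit matrix-vector product `H8 @ v`, confirming the natural (Sylvester) ordering.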
The randomization of the Hadamard matrix rows is accomplished by defining two vectors, r_S and r_I ∈ R^M, for the signal and idler systems, each composed of M randomly chosen integers on the interval [2, N]. The values in r state which rows should be extracted from H_N when constructing P_S and P_I. Note that the interval begins at 2 because the first row of a Hadamard matrix is composed entirely of ones. The interval may begin at 1 if the total photon flux on a detector is desired. Also, note that A ∈ R^{M×N²} where M ≪ N². This condition allows for scenarios where P_S, P_I ∈ R^{M×N} with M > N in the individual subspaces, meaning rows of H_N may be repeated when constructing P_S and P_I.
The randomization of the Hadamard columns is accomplished by defining permutation vectors p_S and p_I ∈ R^N that randomly permute the N columns of H_N. Once r and p have been defined for both the signal and idler subspaces, patterns are constructed as

P_S[i] = H_N[r_S[i], p_S],   P_I[i] = H_N[r_I[i], p_I],

where the y and x components of H[y, x] refer to the rows and columns of H, respectively. Although these operations are defined for the signal and idler subspaces, they combine in a particular way according to Eq. (2) to construct a Hadamard-based sensing matrix A that enables fast-transform operations in the joint space. The manner in which they combine to manipulate a Hadamard matrix H_{N²} that spans the joint space is detailed in the next section.
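A sketch of the subspace pattern construction: select random rows of H_N (skipping the first, all-ones row) and randomly permute its columns. The sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 16, 24                       # subspace dimension; M > N is allowed

# Sylvester-Hadamard matrix H_N via the Kronecker recursion.
H = np.array([[1, 1], [1, -1]])
while H.shape[0] < N:
    H = np.kron(np.array([[1, 1], [1, -1]]), H)

r_S = rng.integers(1, N, size=M)    # row choices on [2, N] (0-based: 1..N-1)
p_S = rng.permutation(N)            # column permutation
P_S = H[r_S][:, p_S]                # M x N pattern matrix: P_S[i] = H_N[r_S[i], p_S]
```

Because every Hadamard row after the first is orthogonal to the all-ones row, each constructed pattern sums to zero, a property exploited later by the reconstruction algorithm.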

Joint space Sylvester-Hadamard sensing matrices
Once r and p have been defined for the individual signal and idler subspaces, they may be used to construct the corresponding joint-space row-selection and permutation vectors, r_SI and p_SI. Consider the construction of r_SI first. By Eq. (2), the complete joint-space sensing matrix A is simply formed by the row-wise Kronecker product of the subspace sensing matrices P_S and P_I. As r_S and r_I determine the ordering of the Hadamard rows within these patterns, r_SI must also be a subset of a Kronecker product of r_S and r_I. Knowing that the Kronecker product of P_S and P_I will form "blocks" of size [M × N], it is straightforward to show that

r_SI[i] = N·(r_S[i] − 1) + r_I[i]    (6)

for i ∈ 1...M, where r[i] represents the i-th component of r. Note that element-wise counting in this article starts at 1 and not 0. Because r_S and r_I are chosen at random and will often be over-complete, M > N, r_SI will probably have repeating values, and a row within A will appear more than once. This is equivalent to taking the same projection more than once, offering no additional information. To prevent this, we simply compare the values within r_SI and eliminate repeats: if r_SI[i] is a repeated value, we eliminate r_SI[i] along with the components r_S[i] and r_I[i]. In this way, the number of samples M decreases while the measurements retain the same amount of information.
The formation of p_SI follows a similar form as r_SI, yet it will be of length N². Although it is not a simple Kronecker product, it does follow from the structure in Eq. (2). The structure of p_SI takes the form

p_SI[(i − 1)·N + j] = N·(p_S[i] − 1) + p_I[j]    (7)

for i ∈ 1...N and j ∈ 1...N. Generating randomized Hadamard matrices using r and p for each signal, idler, and joint space is summarized below:

P_S[i] = H_N[r_S[i], p_S],   P_I[i] = H_N[r_I[i], p_I],   A[i] = H_{N²}[r_SI[i], p_SI],    (8)

where the y and x components of H[y, x] refer to the rows and columns of H, respectively. The construction of A presented in Eq. (8) allows us to use fast transforms, as explained in the next section.
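The joint-space index constructions above can be verified directly for a small N: selecting row r_SI and columns p_SI of H_{N²} reproduces the row-wise Kronecker product of the permuted subspace patterns. The only subtlety in this sketch is translating the 1-based formulas to 0-based indexing:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 4, 6

H2 = np.array([[1, 1], [1, -1]])
H_N = np.kron(H2, H2)                      # H_4
H_N2 = np.kron(H_N, H_N)                   # H_16 = H_N (x) H_N

r_S = rng.integers(1, N, size=M)           # 0-based row picks (first row excluded)
r_I = rng.integers(1, N, size=M)
p_S = rng.permutation(N)
p_I = rng.permutation(N)

# 0-based versions of the r_SI and p_SI constructions.
r_SI = N * r_S + r_I                       # joint-space row selection
p_SI = (N * p_S[:, None] + p_I[None, :]).flatten()  # joint-space columns, length N^2

A_fast = H_N2[np.ix_(r_SI, p_SI)]          # A[i] = H_{N^2}[r_SI[i], p_SI]
A_kron = np.array([np.kron(H_N[r_S[i], p_S], H_N[r_I[i], p_I]) for i in range(M)])
```

Duplicate entries of r_SI would simply repeat rows of A and, as noted above, should be pruned in practice.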

Joint space fast Hadamard transform operations
Keeping track of the randomization operations allows the use of fast Hadamard transforms when computing A · x and A T · y. This is accomplished by reordering either x or y according to p, taking the fast Hadamard transform, and then picking specific elements from the final result according to r. The manner in which they are rearranged and picked depends upon the operation A · x or A T · y in either the data acquisition or reconstruction processes.
Starting with the data-taking procedure y = A·x, projections are taken in each signal and idler system by first constructing the individual patterns. Pattern construction is done by fast Hadamard transforming basis vectors and then permuting them; because of the symmetric nature of a Hadamard matrix, transforming the r[i]-th basis vector returns the r[i]-th row of H. Because the SLMs display only binary patterns while Hadamard-based patterns are composed of ±1 elements, each pattern is split into its positive and negative binary components,

P = P⁺ − P⁻,    (9)

so that a joint-space projection is obtained from four binary coincidence measurements,

y[i] = (P_S⁺[i] ⊗ P_I⁺[i])·x + (P_S⁻[i] ⊗ P_I⁻[i])·x − (P_S⁺[i] ⊗ P_I⁻[i])·x − (P_S⁻[i] ⊗ P_I⁺[i])·x.    (10)

This means that every element in y will require four coincidence measurements. Even though 4M coincidence measurements are required when using one detector per subspace, the drastic sampling performance gained through CS methods is such that 4M ≪ N². Alternatively, if two detectors are used in each subspace (one detector to collect projections from the positive components and one detector to collect projections from the negative components), the detection process could be streamlined to measure each of the four correlations in Eq. (10) simultaneously.
In reconstruction, fast Hadamard transforms may be utilized by CS reconstruction algorithms to perform the operations A · x and A^T · y. The operation A · x first requires that x be inverse-permuted, fast Hadamard transformed, and then have the correct M elements extracted from the final result. The inverse permutation is done by defining an inverse permutation vector q as

q[p[i]] = i    (11)

for all i elements in p. Hence, A · x is realized with the following operations:

A · x = H[ x[q_SI] ][r_SI].    (12)

The operation A^T · y requires that a vector β composed of N² zeros be filled with the elements of y according to r_SI, fast Hadamard transformed, and then permuted according to q as follows:

β[r_SI] = y,   (A^T · y)[q_SI] = H[β].    (13)

Again, these operations work because Hadamard matrices are symmetric. However, it should be noted that the true inverse operation is H⁻¹_N = H^T_N / N. When taking the fast-transform operation in Eq. (13), we are explicitly taking the forward fast transform and neglecting the normalization term. Because of this structure, the operations A · x and A^T · y can be utilized by most reconstruction algorithms to operate more efficiently.
The methods outlined in this article can also be applied to joint space signals that are not sparse in the "pixel-basis" where Ψ = 1. Sparse forward and inverse transform operations, Ψ[...] and Ψ[...] −1 , need to be applied to x in an appropriate order to bring x and y back into the pixel-basis before fast-Hadamard transforming. Hence, the operations A · x and A T · y become A · Ψ[x] −1 and Ψ[A T · y].
To reiterate, the novelty presented in this section is in how Eqs. (6) and (7) enable the use of fast Hadamard transforms for calculating A · Ψ[x]⁻¹ and Ψ[A^T · y], as summarized below.

A · Ψ[x]⁻¹:
1. If Ψ = 1, leave x unchanged. Otherwise, inverse transform x into the pixel basis such that x = Ψ[x]⁻¹.
2. Inverse permute x using q_SI such that x' = x[q_SI].
3. Fast Hadamard transform x' such that y' = H[x'].
4. Extract M elements from y' using r_SI to obtain y = y'[r_SI].

Ψ[A^T · y]:
1. Define β as a vector of N² zeros.
2. Place the components of y into β using r_SI to assign the locations for the elements of y such that β[r_SI] = y.
3. Fast Hadamard transform β such that w = H[β].
4. Permute w according to q_SI such that x'[q_SI] = w.
5. If Ψ = 1, neglect this step and let x = x'. Otherwise, transform x' into the sparse basis to obtain x = Ψ[x'].
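The two operation lists above can be sketched end-to-end (with Ψ = 1) and checked against the explicit matrix. In this illustrative sketch, `fwht` is a naturally ordered fast Walsh-Hadamard transform, and r_SI is deduplicated so that the scatter β[r_SI] = y is well defined:

```python
import numpy as np

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform, natural (Sylvester) order."""
    x = np.array(v, dtype=float)
    h = 1
    while h < x.size:
        for i in range(0, x.size, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

rng = np.random.default_rng(4)
N = 4
H2 = np.array([[1, 1], [1, -1]])
H_N2 = np.kron(np.kron(H2, H2), np.kron(H2, H2))      # H_16 spans the joint space

r_SI = np.unique(rng.integers(1, N * N, size=8))      # deduplicated row picks
p_SI = rng.permutation(N * N)
q_SI = np.argsort(p_SI)                               # inverse permutation: q[p[i]] = i
A = H_N2[np.ix_(r_SI, p_SI)]                          # explicit A, for checking only

x = rng.random(N * N)
y = rng.random(r_SI.size)

# A.x: inverse permute, fast transform, extract M elements.
Ax_fast = fwht(x[q_SI])[r_SI]

# A^T.y: scatter into beta, fast transform, permute back.
beta = np.zeros(N * N)
beta[r_SI] = y
ATy_fast = fwht(beta)[p_SI]
```

Both fast-transform results match the explicit products `A @ x` and `A.T @ y`, while never materializing A.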

Mutual information
To demonstrate the practicality of the previous results, we compressively measure and quickly reconstruct a 16.8 × 10⁶-dimensional joint-space probability distribution. Up to this point, the usefulness of these joint-space measurements has not been explained in detail beyond the characterization of correlated systems. When attempting to use down-converted photons for information transfer, an important question to ask is: how much uncertainty about the position of the signal photon is removed upon knowing the position of the idler photon? This quantity is answered by the Shannon mutual information between the position statistics of the signal and idler photons, I(X_S, X_I) [31]. The mutual information quantifies the classical channel capacity and is easily found by first measuring the joint-space probability distribution. Given the discrete random variables for signal X_S and idler X_I, and their allowed sets of random values, x_S and x_I respectively, the mutual information is defined as

I(X_S, X_I) = Σ_{x_S} Σ_{x_I} p(x_S, x_I) log₂[ p(x_S, x_I) / ( p(x_S) p(x_I) ) ],    (14)

where p(x_S, x_I) is the joint-space probability distribution while p(x_S) and p(x_I) are the marginal probability distributions. In terms of our CS formalism, p(x_S, x_I) = x. Once x has been procured from y, the marginal distributions p(x_S) and p(x_I) are found by summing over the appropriate values of p(x_S, x_I). Since we may approximate the joint distribution as a double Gaussian, we may also say 2^{I(X_S, X_I)} is equal to the Schmidt number of the state, which is a measure of the number of entangled modes [32,33]. In our case, 2^{I(X_S, X_I)} is equal to the number of distinct channel inputs.
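The mutual-information definition above is straightforward to implement once the joint distribution is in hand; a minimal sketch with two illustrative limiting cases:

```python
import numpy as np

def mutual_information(P):
    """Shannon mutual information (bits) of a 2-D joint probability array P."""
    P = np.asarray(P, dtype=float)
    P = P / P.sum()                      # ensure normalization
    p_s = P.sum(axis=1)                  # signal marginal p(x_S)
    p_i = P.sum(axis=0)                  # idler marginal p(x_I)
    nz = P > 0                           # convention: 0 * log 0 = 0
    ratio = P[nz] / np.outer(p_s, p_i)[nz]
    return float(np.sum(P[nz] * np.log2(ratio)))

n = 16
uncorrelated = np.ones((n, n))           # independent marginals: I = 0 bits
correlated = np.eye(n)                   # perfect pixel correlations: I = log2(n)
```

For the perfectly correlated case, 2^I equals the number of distinguishable modes, n, consistent with the channel-capacity interpretation in the text.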

Theoretical expectations
Before reporting our experimental results, it is useful to first estimate the theoretical maximum amount of possible mutual information as derived from first principles based on the crystal and the pump-laser specifications. That result should then be compared with the maximum possible information we could measure given our SLM resolution. A thorough calculation characterizing degenerate SPDC is done in [10] in one transverse spatial dimension. Assuming a double-Gaussian bi-photon state, the mutual information in the position domain between down-converted photons (X_S and X_I) is

I(X_S, X_I) = log₂[ (1/4) ( σ_p √( 2 / (π L_z λ_p) ) + (1/σ_p) √( π L_z λ_p / 2 ) )² ],    (15)

where L_z, λ_p, and σ_p represent the length of the nonlinear crystal, the pump-laser wavelength, and the standard deviation of the Gaussian intensity pump-laser profile, respectively. We use a 325 nm pump laser and a 1 mm long nonlinear crystal. The maximum 1/e² pump diameter is listed as 1.2 mm, resulting in a σ_p that is four times smaller, i.e., σ_p = 3 × 10⁻⁴ m. The experiment uses approximately degenerate down-converted light because of the paraxial nature of beam propagation and the use of narrow-band filters. Our pump laser is approximately Gaussian in two dimensions, resulting in a mutual information twice as large as reported in Eq. (15). We obtain a theoretical maximum mutual information between signal and idler photons of 10.9 bits. However, when moving from an infinite-dimensional Hilbert space to a finite-dimensional space dictated by the resolution of the SLM and its pixel size, the resulting measurable mutual information must be less than or equal to the continuous-variable value of 10.9 bits.
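The quoted 10.9-bit figure can be reproduced numerically from the stated crystal and pump parameters. The double-Gaussian form used below (one transverse dimension, doubled for two) is an assumed parameterization in terms of L_z, λ_p, and σ_p chosen to be consistent with the quoted result, not a quotation of the derivation in Ref. [10]:

```python
import math

L_z = 1e-3          # crystal length (m)
lam_p = 325e-9      # pump wavelength (m)
sigma_p = 3e-4      # pump standard deviation (m)

# Assumed double-Gaussian form: a is the ratio of the pump width to the
# position-correlation width set by the crystal; I_1d = log2[(a + 1/a)^2 / 4].
a = sigma_p * math.sqrt(2.0 / (math.pi * L_z * lam_p))
I_1d = math.log2((a + 1.0 / a) ** 2 / 4.0)
I_2d = 2.0 * I_1d   # roughly 11 bits for a 2-D Gaussian pump
```

Evaluating with the listed parameters gives a value consistent with the ~10.9 bits reported in the text.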

Reconstruction algorithm: maximizing the mutual information
Given that a motivation for these reconstructions is to infer the mutual information shared between signal and idler photons, it is reasonable to design a reconstruction algorithm that attempts to maximize the mutual information via the suppression of noise through soft or hard thresholding; hard thresholding sets components smaller than a threshold to zero, while soft thresholding first hard thresholds and then decreases the amplitude of every remaining signal component by the threshold's amplitude. For practical applications, only the largest measurable signal components of x should be used to infer the mutual information because of the presence of background noise. With this in mind, we use an iterative thresholding algorithm [34] that computes the mutual information at each iteration. The program exits when the mutual information no longer increases with thresholding. The algorithm we use is summarized as

x_0 = c,   x_{t+1} = η̂₂[ x_t + η̂₁[ A^T · (y − A · x_t) ] − min(x_t) ],    (16)

where c is a vector composed entirely of ones times a non-zero constant and min(x_t) is a vector composed entirely of ones times the smallest element in x_t. Note that at each iteration we take a projection of the current result with the term η̂₁[A^T · (y − A · x_t)]. During the first iteration, A · x_0 = 0 because x_0 = c; A · c = 0 only because we chose to neglect the first row of the Hadamard matrices, so every row of A sums to zero. The first iteration therefore reduces to computing A^T · y, as in standard iterative-style algorithms. η̂₁[...] is an operator that performs soft thresholding on everything within its brackets using a biorthogonal 4.4 wavelet transform with a two-level decomposition [35][36][37]. Soft thresholding in the wavelet basis, often referred to as wavelet shrinkage, is performed using the universal threshold of Donoho and Johnstone [38,39]. The filtered signal is then inverse transformed back to the pixel basis. η̂₂[...] then performs hard thresholding on everything within its brackets, operating in the pixel basis, and renormalizes the final result to be a probability distribution. The threshold of η̂₂ gradually increases with each iteration. As x_t becomes less noisy through filtering and converges to the true solution, y − A · x_t approaches zero. However, it never truly reaches zero and results in a small noise term. To prevent injecting random noise into each filtered iteration, we take the projection of the noisy term onto the current clean solution and add this projection back, preventing the discarding of current signal components. After hard thresholding the first iteration, min(x_{t>0}) = 0, where 0 is a null vector.
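The two thresholding operators are simple to state in code. This sketch applies soft thresholding directly in the pixel basis (the paper's η̂₁ shrinks in a biorthogonal 4.4 wavelet basis, omitted here for brevity) and η̂₂-style hard thresholding with renormalization:

```python
import numpy as np

def soft_threshold(v, t):
    """Zero components below t; shrink survivors' magnitudes by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def hard_threshold_normalize(v, t):
    """Zero components below t, then renormalize to unit sum."""
    w = np.where(np.abs(v) >= t, v, 0.0)
    s = w.sum()
    return w / s if s != 0 else w

v = np.array([0.05, -0.2, 0.7, 0.3])
soft = soft_threshold(v, 0.1)            # -> [0.0, -0.1, 0.6, 0.2]
hard = hard_threshold_normalize(v, 0.1)  # keeps -0.2, 0.7, 0.3, then normalizes
```

In the full algorithm, these operators alternate with the fast-transform gradient step, and the hard threshold grows each iteration until the computed mutual information stops increasing.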

Experimental results
As an experimental demonstration, we compare the mutual information obtained from compressive measurements to the theoretical mutual information of 10.9 bits. Knowing that the information in CS resides in the standard deviation of the signal across the different projections, this standard deviation must be greater than the shot noise; otherwise, the signal is obscured by the noise. With binary patterns on each SLM, we measured coincidence counts at a rate of 4 × 10³ counts per second. Since each SLM reduces the incoming flux by approximately 50%, the total number of coincidences Φ was approximately four times larger, or Φ = 1.6 × 10⁴ coincidences per second. We chose the integration time such that all four projections per y-element required a total of 8 seconds due to power constraints. The resulting ratio of the standard deviation from the projections relative to the shot noise was 2.4. In addition, the sparsity k should be of order N = 4096 due to the tight pixel correlations resulting from position and momentum correlations, meaning M = O(N log(N)) = O(10⁴) measurements. We chose the number of measurements to be 2 × 10⁴ (M/N² ≈ 0.001) as a reasonable compromise between the total integration time and the reconstruction quality. The resulting scan took just over 44 hours. To compare these values to a raster scan, the signal-to-noise ratio (SNR) goes as Φt/N, assuming perfect pixel correlations and uniform illumination, where t is the integration time per pixel. When raster scanning in an N²-dimensional joint space, the total integration time goes as N²t = N³SNR²/Φ for shot-noise-limited signals. Therefore, a raster scan operating under perfect conditions, again assuming perfect pixel correlations and uniform illumination, would require 50 days to achieve an SNR = 1 for N = 4096. Hence, 50 days represents the bare minimum integration time for a raster scan. The reconstructed joint-space probability distribution is presented in Fig. 2.
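The quoted acquisition times follow from simple arithmetic; this sketch checks the 44-hour compressive scan and the 50-day raster-scan floor under the stated assumptions (Φ = 1.6 × 10⁴ coincidences/s, SNR = 1, N = 4096):

```python
M = 2e4                  # compressive measurements
t_per_element = 8.0      # seconds for all four projections per y-element
cs_hours = M * t_per_element / 3600.0          # ~44.4 hours

N = 4096                 # per-photon pixel count
flux = 1.6e4             # total coincidences per second
snr = 1.0
raster_seconds = N**3 * snr**2 / flux          # N^2 pixels x t = N*SNR^2/flux each
raster_days = raster_seconds / 86400.0         # ~49.7 days
```

The compressive scan is thus faster by a factor of roughly 27, before accounting for the far-better-than-unity SNR actually achieved.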
From the same set of data, a comparison of the recovered mutual information versus the number of samples M was also conducted. The results show that the recovered mutual information is highly dependent on the noise. As the algorithm seeks the optimal threshold to discern the largest number of distinguishable modes, random noise is thresholded into sparse speckle patterns when the signal is not large enough to distinguish from the noise. These speckle patterns incorrectly state that there exist tight pixel correlations. To stress the severity of this flaw, we consistently recover about 8 bits of mutual information with only M = 10 projections. Even for large SNR and large M, these speckle patterns may still be present along with the true signal for many reconstruction algorithms. Hence, noise will artificially increase the mutual information.
Reconstruction errors that artificially increase the mutual information bring into question the validity of CS methods for these types of characterizations. However, there exists information in the signal and idler marginal probability distributions that can reduce this error. On the right-hand side of Fig. 2, only the signal's marginal distribution is shown, since the idler distribution is similar in appearance. The signal's measured 64 × 64 pixel marginal distribution was recovered from the 2 × 10⁴ projections using photon counts from only that detector; the beam is approximately Gaussian, as seen in Fig. 2. The recovered mutual information, with and without the marginal information, is presented as a function of M in Fig. 3. It is evident from Fig. 3 that background noise significantly alters the results, fabricating a larger mutual information. When neglecting marginal information, the recovered mutual-information values appear to asymptotically decrease as M grows larger, supposedly approaching an accurate value. However, including information from the marginals decreases these errors, even for small M, by systematically reducing background noise. Note that Fig. 3 plots the mutual information in bits. This means that for low sampling percentages when including the marginal information, just under 4 bits of mutual information, presumably from noise, is calculated. However, these reconstructions are in a regime where M ≪ k log(N/k) and the signal is dominated by noise.
Using the reconstructed joint-space distribution, the marginal probability distributions can be calculated and compared to the measured marginal distributions. When the measured marginal information is not included, the joint-space distribution becomes riddled with a low-level noise that drastically warps the resulting reconstructed marginal distributions. However, including the marginal information results in a clean joint-space distribution with reconstructed marginal distributions that are similar in appearance to the measured values. The reconstructed distributions in Fig. 2 were recovered while including the marginal-distribution information. Using CS and the additional information in the marginal distributions, we measure 7.21 bits of mutual information, corresponding to 148 correlated modes. The reason we fall short of 10.9 bits of measured mutual information is likely the low resolution imposed by the beam covering a small part of the sample space. This resulted in a discretization of the marginal distributions that was not detailed enough to measure all of the available correlations. Another issue is misalignment in the overlap of signal and idler pixels, which causes the appearance of a second correlation band, as seen in Fig. 2(a). Finally, a misalignment in the focus will also result in a single pixel in one arm sharing correlations with several pixels in the other arm. While we compressively measured the correlations existing between 4096 signal and 4096 idler SLM pixels, the illumination profiles on each SLM suggest that there exist pixels where no correlations are even possible. To determine the actual joint-space distribution from which we could measure possible correlations, it is reasonable to consider the 1/e² diameter of each Gaussian beam profile and consider the total number of correlations that could exist between those pixels alone. The product of these areas, relative to pixel size, then defines the effective joint-space dimensionality.
From these enclosed areas, we calculate a joint space probability distribution of 3.23 × 10 6 dimensions. Notice the measured and recovered marginal distributions presented in Fig. 2. The marginal distribution recovered from the reconstructed joint space distribution is smaller than the 1/e 2 thresholded distribution displayed immediately above. These regions should be identical in the limit of infinite SNR and perfect image-plane optical alignment. Again, the difference suggests that either the SNR may not have been sufficient to extract all of the existing correlations or that the alignment was imprecise.
Reconstructions in [21] were limited to a maximum resolution of 10⁶ pixels on a desktop computer with 32 GBytes of memory and required several hours to reconstruct x before the image quality was sufficient to exit the reconstruction program. However, using a laptop with 8 GBytes of memory, we perform reconstructions of a 16.8 × 10⁶-dimensional joint space in under ten minutes with satisfactory results using fast Hadamard transforms with the algorithm presented in Eq. (16). We ran the reconstruction algorithm 10 times per sample point to ensure that the results are consistent. The resulting error bars are four orders of magnitude smaller than the plotted scale and are not shown. It should be noted that the reported mutual-information values are the result of extracting the largest correlations from the data and are not the result of inferring the mutual information from a best Gaussian fit, assuming the resulting SPDC follows a commonly approximated double-Gaussian wave function [32,33]. While a Gaussian fit would probably characterize the system better and result in a higher mutual information by reporting correlations in the tails of the Gaussian, it is not indicative of the actual available channel capacity.

Conclusion
This article describes in detail the methods necessary to efficiently perform high-dimensional compressive Kronecker imaging of a joint system. By randomizing Hadamard matrices and utilizing fast Hadamard transforms, CS is performed in a 16.8 million-dimensional Kronecker space containing a 3.23 million-dimensional bi-photon probability distribution. An experiment that would require over 50 days as a raster scan under perfect conditions is instead performed in just under two days, and the data require only several minutes to reconstruct. As an example, we compressively measured 7.21 out of 10.9 bits of mutual information while also demonstrating how to improve the accuracy of these results by utilizing information contained within the marginal distributions. We believe these methods will prove to be an invaluable tool in measuring the distribution functions of correlated systems as well as in other correlation-based CS implementations.