Single-pixel imaging 12 years on: a review

Modern cameras typically use an array of millions of detector pixels to capture images. By contrast, single-pixel cameras use a sequence of mask patterns to filter the scene and record the corresponding transmitted intensities with a single-pixel detector. This review considers the development of single-pixel cameras from the seminal work of Duarte et al. up to the present state of the art. We cover the variety of hardware configurations, design of mask patterns and the associated reconstruction algorithms, many of which relate to the field of compressed sensing and, more recently, machine learning. Overall, single-pixel cameras lend themselves to imaging at non-visible wavelengths and with precise timing or depth resolution. We discuss the suitability of single-pixel cameras for different application areas, including infrared imaging and 3D situation awareness for autonomous vehicles. © 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
The concept of single-pixel imaging followed the development of compressive sensing [1][2][3][4] and was reported soon after in a seminal paper by Duarte et al. at Rice University [5]. This pioneering work is a combination of different imaging and sampling techniques which has inspired the field of single-pixel imaging, laying the foundations for recovering images from a single-pixel camera when the number of measurements is fewer than the total number of unknown pixels in the image, that is, when the properties of the image are sensed compressively, also known as under-sampling or sub-sampling.
Prior to this work, in 2005, Sen et al. had published the paper "Dual Photography" [6] which proposed the idea that an image could be captured using just a single photodetector (single-pixel detector) rather than a detector array as used by most common imaging devices such as mobile phones and digital SLR cameras. Here, the spatial structure is provided by interrogating a scene with a series of spatially resolved patterns while measuring the correlated intensities using the single-pixel detector. The development of silicon-based charge-coupled device (CCD) and complementary metal-oxide-semiconductor (CMOS) pixelated sensors has brought the benefits of cheap, high-performance, imaging technologies for many applications in the visible (VIS) wavelength spectrum. However, single-pixel detectors can bring significant performance advantages such as sensitivity at non-visible wavelengths or very precise timing resolution, both of which can be impractical or prohibitively costly to implement as a pixelated imaging device.
A popular choice for non-visible wavelength single-pixel imaging has been in the short-wave infrared (SWIR) spectral region (approximately 1-3 µm) due to the availability of detectors having a good sensitivity [7,8]. In particular, telecoms research has provided a range of InGaAs devices which has allowed both cost-effective detectors and illumination sources to be developed (operating in the 800 nm to 1800 nm range). This wavelength range has been shown to be particularly suited to imaging through scattering media, such as smoke [8], and has also been used to detect and image hydrocarbon gas leaks [9].
Single-pixel imaging has provided an ideal test platform for new state-of-the-art detector technologies, allowing the development of cost-effective imaging systems at wavelengths across the electromagnetic spectrum. Examples include x-ray imaging [10][11][12], terahertz imaging [13][14][15], compressive radar [16], a VIS-NIR telescope [17] and fluorescence microscopy [18]. These systems have also utilised various sampling schemes, including compressive sensing and machine learning. Figure 1 shows a timeline of the development of a range of single-pixel imaging systems, including a range of modulation technologies and sampling schemes.

Fig. 1. Timeline of the development of single-pixel imaging systems. Publications are shown by year and highlight the modulation technology and sampling scheme used. It is interesting to note that systems based on a structured detection approach, and employing sampling schemes such as compressive sensing (CS) or machine learning (ML), are often termed single-pixel cameras, whereas those based on structured illumination are often referred to as computational ghost imaging. The following references are shown: Sen

A particular application that has recently attracted much attention is single-pixel imaging using time-of-flight (ToF) measurements, which can be used to recover 3D profiles of a scene from a distance. When combined with the recent advances in machine learning algorithms, single-pixel imaging shows promise as a powerful technique for low-cost, scan-free, 3D sensing and classification. This paper provides a review of single-pixel imaging techniques, including those of the closely related field of computational ghost imaging (computational GI), and focuses on the main algorithm and hardware developments over the past twelve years. There are still many discussions on the distinction between single-pixel imaging and computational GI. In this paper we discuss them with respect to their common terminology in the literature: single-pixel imaging often seeks to solve an inverse problem, whereas computational GI often seeks to perform a reconstruction from an ensemble average.

Basics of single-pixel imaging
A simple method of capturing an image using a single-pixel detector is to sequentially measure each pixel in turn, as in the raster-scan approach used in the original mechanical televisor of John Logie Baird [34]. However, sequentially measuring information on only one pixel in turn is an inefficient use of the available illumination light. A more common scan strategy is to use a sequence of spatially resolved patterns and to record the intensity measurements of the correlations between the patterns and the object, or scene. This correlation measurement can be performed in one of two ways. A light modulator placed in the image plane of a camera lens can be used to mask images of the scene, the filtered intensities being measured by the single-pixel detector. This mode of operation is commonly referred to as structured detection (see Fig. 2), and is often used in the field of single-pixel imaging or single-pixel cameras. Alternatively, the light modulator can be used to project patterns onto the scene and the single-pixel detector used to measure the back scattered intensities. This mode of operation (shown in Fig. 3) is commonly referred to as structured illumination, and is often used in the field of computational GI (discussed in detail in section 3). In both these configurations the conventional light source can be replaced with a pulsed laser (also shown in Fig. 3) so as to provide time-of-flight information and hence depth of the scene (discussed in detail in section 6.2). Section 4 discusses some of the modulation technologies commonly used in single-pixel imaging.
The object or scene can be reconstructed by multiplying each pattern in the sequence by the corresponding single-pixel intensity measurement, resulting in a set of weighted patterns that can be summed to form an image. In principle, reconstructing an image comprising N pixels in total requires a sequence of M = N different patterns. However, if the set consists of non-orthogonal patterns and/or the measurements are subject to noise, then a large number of measurements, M ≫ N, is needed in order to achieve a good signal-to-noise ratio (SNR) in the final image. A common approach is to use an orthogonal pattern set, such as the Hadamard basis (see section 5), and measure the differential intensity for each pattern and its contrast inverse (i.e. photographic negative).
Given a sequence of N-element orthonormal pattern pairs $P_{(x,y),m}$ (where m is the pattern sequence number), the corresponding differential intensity signals between the positive and inverse patterns are $S_m$, which are proportional to the correlations between each pattern and the scene. Based on M patterns, the 2D image estimate of the object or scene, $O_{(x,y),M}$, can be obtained by

$$O_{(x,y),M} = \frac{1}{M} \sum_{m=1}^{M} S_m \, P_{(x,y),m}. \qquad (1)$$

It is clear that a means of significantly reducing the number of required patterns, and measurements, is necessary for single-pixel imaging systems to be widely adopted. Compressive sensing (CS) [3,35,36] has been shown to be a route for exploiting the redundancy in the structure of most natural signals or images. CS is based on the principle that most natural images are sparse when expressed in the appropriate basis, i.e. a basis having many coefficients that are close, or equal, to zero. This is the case for image compression algorithms such as JPEG [37,38] or JPEG 2000 [39]. CS enables image reconstruction with far fewer measurements than are required for conventional sampling schemes, allowing faster data acquisition or higher SNR [27]. However, despite the focus on faster imaging, or imaging with improved SNR, many sensing problems do not require the full signal to be reconstructed. This is the case in applications such as detection or classification [40]. In the case of compressive classification, the resulting dimensionally reduced matched filters are sometimes termed "smashed filters" [41]. Image-free classification is also discussed in section 8 when using machine learned sampling schemes.

Fig. 2. Structured detection. a) A DMD can be used to spatially filter light by selectively redirecting parts of an incident light beam at ±24° to the normal, corresponding to the individual DMD micromirrors being in the "on" or "off" state respectively. An object is flood-illuminated and imaged onto the DMD, where a sequence of binary patterns displayed on the DMD can be used to mask, or filter, the image. A single photodetector is used to measure the total filtered intensity for each mask pattern, allowing an image of the object to be reconstructed. b) Each pattern in the sequence is then multiplied by the corresponding single-pixel intensity measurement to give a set of weighted patterns that can be summed to reconstruct the image.

Fig. 3. Structured illumination. a) In an alternative configuration, the DMD is used to project a sequence of light patterns onto a scene and the single-pixel detector measures the total back scattered intensity. For both structured illumination and structured detection a pulsed laser can be used as the illumination source to perform temporal resolution measurements using a single-pixel detector (as shown here). Recording the temporal form of the back scattered light provides a measure of the distance travelled by the light and hence depth of the scene. b) Similar to the structured detection scheme, the sequence of projected patterns and the corresponding intensity measurements allows an image to be reconstructed. In the case where a pulsed laser is used, the additional time-of-flight information from the broadened back scattered pulse allows a depth map of the scene to be constructed.
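The weighted-sum reconstruction described in this section can be illustrated with a short numerical sketch. It is not taken from any of the cited works: the function names are hypothetical, the scene is simulated, the measurement is noise-free, and the ±1 patterns come from a Sylvester-type Hadamard construction. Each differential signal is formed from a 0/1 mask and its contrast inverse, as a DMD would display them.

```python
import numpy as np

def sylvester_hadamard(n):
    """Return an n x n matrix of +1/-1 entries (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def reconstruct_from_hadamard(scene, num_patterns=None):
    """Simulate differential single-pixel measurements and sum the weighted patterns."""
    flat = scene.ravel().astype(float)
    n = flat.size
    patterns = sylvester_hadamard(n)
    m = n if num_patterns is None else num_patterns
    estimate = np.zeros(n)
    for row in patterns[:m]:
        positive = (row + 1) // 2   # 0/1 mask shown on the modulator
        negative = 1 - positive     # its contrast inverse
        signal = positive @ flat - negative @ flat  # differential signal S_m
        estimate += signal * row    # weight the +1/-1 pattern by its signal
    return (estimate / m).reshape(scene.shape)
```

With the full set of M = N patterns and no noise, the estimate equals the simulated scene; using fewer patterns gives a coarser approximation.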

Computational ghost imaging
A field that is very closely related to single-pixel imaging is that of ghost imaging (GI), which is a technique that exploits the quantum nature of the entangled photon pairs produced in spontaneous parametric down-conversion [42]. A pump laser incident on a nonlinear crystal produces the photon pairs, often termed the signal and idler, which are entangled in their positions and hence, measuring the position of one implies the position of the other. The signal and idler beams are separated along different paths, one is measured by a spatially resolved detector such as a CCD or scanning pinhole and photodetector, the other interacts with the target object and is collected by a single-pixel detector (in GI this is often referred to as a bucket detector). Importantly, the light captured by the CCD never interacts with the target object. Only by correlating the CCD and bucket detector measurements can the "ghost" image be revealed [43]. Whilst originally demonstrated using degenerate signal and idler photons at 702 nm, GI has also been achieved at other wavelengths, including a demonstration using non-degenerate photons at 1550 nm and 460 nm [44]. GI has even been achieved using two beams formed by correlated pairs of ultracold metastable helium atoms [45].
However, it was soon realised that while GI was originally designed to exploit the quantum nature of light, it could also be performed in a classical experiment [24,46]. Similar to the quantum experiment, a structured illumination light field is split into two near-identical beams, usually termed the reference and object beams. The reference beam is recorded by the CCD while the object beam impinges upon the target object, and the scattered or transmitted light is then measured by the bucket detector. Bennink et al. [47] demonstrated coincidence imaging using a classical light source made by chopping and deflecting a laser beam, creating pairs of angularly correlated pulses. However, in most of the early examples of classical GI the object being imaged was illuminated by a time-dependent speckle pattern, generated by passing a collimated laser beam through a rotating ground-glass diffuser [25,48,49] (see section 4.1 for a discussion on pseudothermal modulation schemes). A simple beamsplitter copies this pseudothermal source into the reference and object beams.
The classical form of GI was developed further by Shapiro [26], who proposed the use of a computer-controlled spatial light modulator (SLM) for creating the speckle patterns to illuminate the object. Since the patterns are predetermined using a computational method, it is no longer necessary to record the illumination beam, so the beamsplitter and CCD sensor are not required; only the synchronised intensity measurements from the bucket detector are needed in order to reconstruct the image. This form of GI, often referred to as computational GI, was demonstrated experimentally by Bromberg et al. [28] and shortly afterwards using compressive sensing [27]. Erkmen and Shapiro [50] provide a useful review of quantum, classical and computational ghost imaging.
Similar to the discussion in section 2, if the number of resolution cells ("speckles") within the illumination pattern is N, one needs in principle at least M = N different patterns in order to fully reconstruct the image of the object. In practice, since these correlation methods are statistical in nature, there is spatial overlap between the different speckle patterns and hence they form a non-orthogonal measurement basis. A large number of measurements, M ≫ N, is therefore needed in order to achieve an SNR ≫ 1 [27]. A major downside to classical GI was the large background level in the reconstructed images compared to that achieved using a quantum source. Methods of improving the SNR of GI systems were soon proposed, with differential GI being the most widely adopted [29]. Here, a differential bucket detector signal measurement is employed which is sensitive only to the fluctuating part of the intensity signal.
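The statistical nature of the ensemble-average reconstruction can be seen in a small simulation. The sketch below is illustrative only: uniformly random grey-scale patterns stand in for speckle (an assumption), the function name is hypothetical, and the background is removed by subtracting the product of the mean bucket signal and the mean pattern, a simple stand-in for the idea of keeping only the fluctuating part of the signal rather than the specific differential GI scheme of Ref. [29].

```python
import numpy as np

def ghost_image(scene, num_patterns, rng):
    """Correlate bucket signals with known random patterns: <S P> - <S><P>."""
    flat = scene.ravel().astype(float)
    # Hypothetical stand-in for speckle: independent uniform values in [0, 1)
    patterns = rng.random((num_patterns, flat.size))
    bucket = patterns @ flat                        # one bucket signal per pattern
    mean_product = patterns.T @ bucket / num_patterns
    background = bucket.mean() * patterns.mean(axis=0)
    return (mean_product - background).reshape(scene.shape)

rng = np.random.default_rng(1)
scene = np.zeros((16, 16))
scene[4:12, 4:12] = 1.0                             # simple square test object
coarse = ghost_image(scene, 256, rng)               # M = N
fine = ghost_image(scene, 6400, rng)                # M = 25 N
```

Comparing `coarse` and `fine` against the test object shows the image quality improving as M grows well beyond N, which is the behaviour the non-orthogonal basis leads one to expect.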
There have been useful comparisons of computational GI systems to single-pixel cameras [51]. In particular, computational GI can be compared to the original work on dual photography [6], which is a novel photographic technique that exploits Helmholtz reciprocity to interchange the lights and cameras in a scene [52,53]. This can also be compared to the work of Sun et al. [30] where four spatially separated single-pixel detectors are used to obtain a 3D reconstruction of an object (see section 6 for a discussion on 3D imaging and ranging). Despite being commonly treated as separate research fields, it has become obvious that, from an optical perspective, computational GI and single-pixel imaging are the same. However, it is still convenient to maintain a distinction between the two, where single-pixel imaging (or a single-pixel camera) often uses a structured detection scheme and compressed sensing, whereas computational GI often uses a structured illumination scheme. The difference between these two schemes can be demonstrated by interchanging the locations of the light source and the detector in the setups illustrated in Fig. 2 and Fig. 3.

Modulation schemes
As previously shown in Fig. 1, there are several choices regarding the modulation technologies used to produce the patterns for either structured detection or structured illumination single-pixel imaging systems. A useful table listing the advantages and disadvantages of various elements of single-pixel imaging systems can be found in Ref. [54].

Pseudothermal
A source of pseudothermal light can be generated by passing a laser beam through a rotating ground-glass diffuser [25]. In the case of a static diffuser a speckle pattern is generated, resulting from the diffusively transmitted light that undergoes constructive and destructive interference in different spatial regions. When the diffuser is rotated, the intensity cross-section of the resulting optical beam varies with time. In order to avoid repetition of the light field every full rotation of the diffuser, transmission through a turbid solution of microspheres can be used to further spatially randomize the pattern [48]. The pseudothermal light that emerges compares in its coherence properties to the light of an actual thermal source such as an LED [55]. As discussed in section 3 an optical beamsplitter forms two near-identical copies of the light field which can be used as the reference and object beams in a classical GI system.
The spectral properties of the pseudothermal source are determined by the properties of the materials from which it is made. Yu et al. [11] demonstrated a GI system using a pseudothermal x-ray source produced by passing a monochromatic x-ray beam through a slit array and a movable porous gold film. More recently, Zhang et al. [12] demonstrated an ultra-low radiation x-ray GI system where the pseudothermal source is generated using a polychromatic x-ray source and a sheet of rotating sandpaper. Here, the spatial structure of the illumination is similar to the speckle pattern produced using a laser and rotating ground-glass diffuser; however, it is now due to absorption rather than laser interference. The characteristics of these speckle-like features are determined by the size and transmission properties of the silicon carbide grains in the sandpaper.

Liquid crystal spatial light modulators
Pseudothermal light beams can also be generated by applying controllable random phase masks, $\phi_r(x, y)$, using a liquid crystal spatial light modulator (LC-SLM), a computer-controlled diffractive optical element which has enhanced a number of research fields in recent years. LC-SLMs impose a prescribed amount of phase shift at each pixel in an array by varying the local optical path length. Typically, this is accomplished by controlling the local orientation of the molecules in a nematic liquid crystal layer covering an array of electrodes. These are generally reflective devices and they have an associated diffraction efficiency, fill factor and overall reflectivity, which together determine their overall optical efficiency. Examples of computational GI schemes using an LC-SLM with a single-pixel detector can be found in Shapiro [26] and Katz et al. [27].

Digital micromirror devices
Digital micromirror devices (DMDs), consisting of an array of hundreds of thousands of individually addressable micromirrors, were originally developed for the display industry [56]. They offer a method of modulating light which is fast and works over a broad range of wavelengths. Micromirrors can be individually oriented at ±12°, with respect to the plane of the array, by displaying a binary pattern on the DMD. The result is that light normally incident on the DMD is redirected into two paths at ±24° respectively (i.e. 2 × ±12°), as illustrated in Fig. 2. In a typical single-pixel camera configuration the DMD is implemented as a programmable binary transmission mask where only the path of light arising from the micromirrors in the "on" state, corresponding to a value of "+1" in the binary pattern, is transmitted and the other path, corresponding to "0", is blocked. This can be used to structure the detected image intensities and is commonly referred to as structured detection (as previously illustrated in Fig. 2). Alternatively, the DMD can be used with a light source to project intensity patterns onto a scene, commonly referred to as structured illumination (as previously illustrated in Fig. 3).
The use of light can be optimised by measuring the light in both the positive and negative directions of the mirror tilt. Using two detectors in this manner enables a differential measurement to be performed. However, this is more commonly achieved using just one detector by displaying a pattern which is immediately followed by its contrast inverse. In addition, the background signal noise is noticeably reduced with the differential scheme, especially in the presence of illumination noise. Of course, this differential approach comes at the cost of doubling the number of binary patterns that need to be displayed on the DMD and hence, an increase in the time required to collect the data and reconstruct the image. However, DMDs are commercially available having binary pattern display rates of 22.7 kHz, which for relatively low-resolution applications allows near-video rate image reconstruction on a standard performance computer [8].
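The cost of the differential scheme in acquisition time can be estimated with simple arithmetic. The sketch below is a rough illustration under stated assumptions: the helper name and its parameters are hypothetical, reconstruction time is neglected, and the number of patterns is taken as a fixed fraction of the number of image pixels.

```python
def frame_rate_hz(pattern_rate_hz, num_pixels, sampling_ratio=1.0, differential=True):
    """Upper bound on image frame rate set by the pattern display rate alone."""
    patterns = num_pixels * sampling_ratio
    # Single-detector differential scheme: each pattern is followed by its inverse
    displays = patterns * (2 if differential else 1)
    return pattern_rate_hz / displays

# 22.7 kHz display rate, fully sampled 32 x 32 image, differential measurement
rate_32 = frame_rate_hz(22_700, 32 * 32)      # about 11 frames per second
# The same modulator at 64 x 64 resolution
rate_64 = frame_rate_hz(22_700, 64 * 64)      # below 3 frames per second
```

The quadratic growth in pattern count with linear resolution is why near-video rates are quoted only for relatively low-resolution images, and why reducing the sampling ratio is attractive.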
The superior modulation rate and broad wavelength response of available DMD systems, in comparison to those based on liquid crystal technology, make DMDs the common choice for use in computational imaging systems. They are particularly compatible with multi-spectral applications, where a small number of different detector types are used to measure the correlated intensities, assuming that the broad spectral response of the DMD is greater than the combined spectral responses of the individual detectors. The aluminium micromirrors of the DMD are compatible with light from the UV to the IR. However, careful consideration is required when operating at the longer wavelengths due to diffraction effects arising from the pitch of the micromirror array elements, typically 10-15 µm for many devices. Despite this limitation, standard DMDs can be used to indirectly modulate THz beams for THz single-pixel imaging (even when using wavelengths typically hundreds of µm). Stantchev et al. [57] used a DMD to spatially modulate an 800 nm pump beam which was imaged onto the back of a silicon wafer in order to modulate the THz beam. THz imaging systems have potential for applications in non-invasive imaging of concealed structures, such as in the semiconductor manufacturing industry. DMDs have also been demonstrated in other novel imaging applications. Gao et al. demonstrated compressed ultrafast photography (CUP) by using a DMD with a streak camera and based on compressed sensing [58], achieving single-shot CUP at one hundred billion frames per second.

LED arrays
The low frame rate of many single-pixel cameras and computational GI systems has limited their use for dynamic imaging applications. Following the early demonstrations of single-pixel imaging, many research groups have utilized compressive sampling in order to significantly reduce the number of mask patterns required to successfully reconstruct an image. However, there is still a computational cost associated with compressive sampling schemes. Recently Xu et al. [32] demonstrated a computational ghost imaging system that could continuously capture 32 × 32 pixel images of a dynamic scene at a rate of 1000 fps, approximately two orders of magnitude faster than other existing ghost imaging systems, by utilising an LED array for high-speed structured illumination. This was achieved by utilising the very fast (<1 µs) switching time of the LEDs, along with the symmetry present in the Hadamard basis set that was used.

Pattern choice
For a camera to image without using a pixelated sensor array we need to apply a series of masks to acquire the spatial information. In the early days of television this was achieved using a physical mask, a rotating Nipkow disc consisting of a spiral arrangement of holes [59]. The signal was measured as each of the holes rotated past the scene, and line-by-line this would construct an image [34,60]. The modern version of applying a mask is to use a DMD or SLM (see section 4), enabling the mask to be dynamic and displaying a set of carefully chosen masks. The mask set, or sampling basis, can be chosen from a range of options for making a single-pixel image measurement. The simplest method would be to emulate that of early television and measure a single area per-pixel, effectively raster scanning a single pixel over the scene; this per-pixel measurement works well with high light levels but is an inefficient use of the available light [5].
GI using individual pairs of entangled photons takes many hundreds of individual measurements to form an image [42]. As discussed in section 3, it is possible to project a pseudothermal light field consisting of speckle patterns to perform classical GI but this again takes many measurements to produce a useful image [24]. Section 7 discusses a range of strategies to perform compressive sensing in order to reduce the number of measurements required when using a random basis set. In contrast, an orthogonal basis set systematically samples the scene to acquire the image such that an image is broken down into its component spatial frequencies and recorded.

Random binary
Sampling with speckle patterns could be simulated with random patterns [27,61]. These random patterns could be grey-scale values, however if fast acquisition is desired then a DMD will be able to project a series of binary patterns at much faster rates. These random patterns will reconstruct the image [26], though it can take a very large number of samples to produce a low noise image [30]. A more efficient sampling can be performed by using differential measurements, taking measurements for both sides of the DMD by using two sensors and subtracting the measured signals. The method can also be improved by using exactly half the pixels for each measurement [29]. An example of the output of these differential measurements is shown in Fig. 4.

Hadamard transform
The Hadamard matrix can be used as a basis for various sensing and imaging applications, such as recording the spatial frequencies of an image [62,63] or multiplexing the direction of illumination in a scene [64,65]. In the case of a single-pixel camera, the use of a Hadamard basis to sample the image was demonstrated by Duarte et al. [5]. The Hadamard patterns are orthogonal with binary values of +1 or −1. The Hadamard matrix is derived recursively from the initial matrix $H_2$ to produce a matrix of any size $2^k$:

$$H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad (2)$$

$$H_{2^k} = \begin{pmatrix} H_{2^{k-1}} & H_{2^{k-1}} \\ H_{2^{k-1}} & -H_{2^{k-1}} \end{pmatrix} = H_2 \otimes H_{2^{k-1}}. \qquad (3)$$

These matrices are their own transpose and satisfy $H H^T = n I_n$, meaning that image reconstruction can be performed without matrix inversion. For image processing the naturally ordered Hadamard matrices can use the Walsh-Hadamard transform to calculate the result of a Hadamard matrix multiplied by a vector, with the existence of a fast Walsh-Hadamard transform (FWHT) making minimal demand on the computation required [66]. The imaging masks are created for an N pixel image (i.e. $\sqrt{N} \times \sqrt{N}$) by using the Hadamard matrix of size N × N. Each row is reshaped to the size of the image and the signal is measured for that pattern. An example of these patterns is shown in Fig. 4. The final image is reconstructed as the Hadamard matrix multiplied by the vector of measured signals S, producing a one-dimensional vector of the output image O that must be reshaped into the 2D image,

$$O = H \cdot S. \qquad (4)$$

The orthogonality of the Hadamard basis is maintained when the elements of each of the patterns are either +1 or −1, rather than the +1 or 0 that can be displayed on the DMD. Therefore, the differential signal acquisition approach is commonly used when displaying Hadamard patterns.
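The recursive construction and the fast transform mentioned above can be sketched in a few lines. This is a minimal illustration rather than an optimised implementation; the function names are chosen here for convenience, the matrix is built in natural (Sylvester) order using a Kronecker product, and the transform is the standard butterfly form of the FWHT.

```python
import numpy as np

def hadamard(n):
    """Natural-order Hadamard matrix of size n x n (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h2 = np.array([[1, 1], [1, -1]])
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.kron(h2, h)          # [[h, h], [h, -h]]
    return h

def fwht(vec):
    """Fast Walsh-Hadamard transform; equals hadamard(n) @ vec in O(n log n)."""
    a = np.asarray(vec, dtype=float).copy()
    n = a.size
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        blocks = a.reshape(-1, 2, h)            # pairs of half-blocks of length h
        sums = blocks[:, 0, :] + blocks[:, 1, :]
        diffs = blocks[:, 0, :] - blocks[:, 1, :]
        a = np.stack([sums, diffs], axis=1).reshape(n)
        h *= 2
    return a
```

If `s` holds the N differential signals for a scene sampled with the rows of `hadamard(N)`, then `fwht(s) / N` returns the flattened scene without forming the N × N matrix; the division by N is simply the scale factor from $H H^T = n I_n$.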
To sample the scene a measurement must be performed for both the +1 and −1 Hadamard values; this can be performed using either the two-detector or single-detector differential measurement scheme as discussed in section 4.3. This differential measurement removes any offset in the image due to background light, or slow variations in the illumination source brightness. However, this differential approach comes at the cost of requiring twice the number of patterns to be displayed on the DMD. To demonstrate how the number of patterns can be reduced and still recreate an image we can consider what happens when we reduce the frequency range of the Hadamard patterns used. The frequency spectrum can be determined by the number of changes in the pattern, i.e. how many times the image changes between −1 and +1 (for the Hadamard patterns this value is the same for all the rows and for all the columns in a single pattern). Figure 5 shows this measurement in the x and y directions, enabling a frequency spectrum to be produced with the signal measured for each pattern. The Hadamard spectrum shows the zero-frequency component in the top left and the maximum frequency in the lower right. The plot demonstrates that with orthogonal patterns the number of patterns used to capture the image can be reduced, and that doing so changes the resulting image quality. The difference between the ground truth and the produced image can be measured using the mean squared error (MSE) between the ground truth intensity image $I_{GT}$ and the reconstructed image $I$, defined as

$$\mathrm{MSE} = \frac{1}{N} \sum_{x,y} \left[ I_{GT}(x,y) - I(x,y) \right]^2. \qquad (5)$$

From this the peak signal-to-noise ratio (PSNR) in decibels (dB) is defined as

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{I_{\max}^2}{\mathrm{MSE}} \right), \qquad (6)$$

where $I_{\max}$ is the maximum possible pixel value of the image. These calculations are performed for the different numbers of patterns used, with a comparison of the Hadamard and Fourier basis shown in Fig. 5. A square cut-off is used to reduce the number of Hadamard patterns used to reconstruct the image. The result is that a significant reduction in the number of patterns can be made, which effectively reduces the number of pixels in the image.
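For readers wishing to reproduce this kind of comparison, the two image-quality metrics can be computed as follows. This is a minimal sketch using the standard definitions of MSE and PSNR; the function names and the `peak` parameter (the maximum possible pixel value, assumed to be 1.0 for normalised images) are choices made here.

```python
import numpy as np

def mse(ground_truth, reconstruction):
    """Mean squared error between two images of the same shape."""
    gt = np.asarray(ground_truth, dtype=float)
    rec = np.asarray(reconstruction, dtype=float)
    if gt.shape != rec.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((gt - rec) ** 2))

def psnr_db(ground_truth, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ground_truth, reconstruction)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))
```

A uniform error of 0.1 on an image normalised to a peak of 1.0, for example, gives an MSE of 0.01 and hence a PSNR of 20 dB.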

Fourier basis
Other sampling schemes have been based on Fourier encoding of the pattern set [31] with further work showing some advantages over the Hadamard sampling method [67]. Whereas the Hadamard patterns use arrays of binary values, the Fourier patterns use gray-scale values. These gray-scale values can be produced with a DMD by dithering the mirrors during acquisition, or by using a high-resolution DMD and having a "super-pixel" of several mirrors, where the light gradient is controlled by the ratio of the mirrors in the "on" and "off" states. The same frequency is displayed with different phase values, with methods using either 3 or 4 phase values equally spaced between 0 and 2π. For a square image consisting of N pixels the patterns are created for the spatial frequencies 0 to $(\sqrt{N} - 1)$ in both the x and y dimensions, with the frequencies u and v respectively. The pattern P(u, v) is generated for the image as

$$P(u,v) = \cos\left[ \frac{2\pi (u x + v y)}{\sqrt{N}} + \phi \right], \qquad (7)$$

where (x, y) are the pixel coordinates and φ is the phase. An example of these patterns is shown in Fig. 4. The intensity signal alone is insufficient to reconstruct the image, so the phase is acquired by repeating the measurement with different values of the phase term, φ, in Eq. (7). The Fourier spectrum component F(u, v) for the spatial frequencies u and v, defined for four values of φ, is

$$F(u,v) = \left[ D_0 - D_\pi \right] + i \left[ D_{\pi/2} - D_{3\pi/2} \right], \qquad (8)$$

where $D_\phi$ is the intensity measurement for each of the patterns P. An inverse Fourier transform applied to the Fourier spectrum will reconstruct the image. The Fourier spectrum allows for a sensible filtering of the number of patterns needed to reconstruct the image. The effect of reducing the number of sampling frequencies used to reconstruct an image is shown in Fig. 5. It has been demonstrated that changing the shape of the cut-off, from a square to a circle or diamond, can produce different fidelity in the image reconstruction while using the same number of patterns [67].
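The four-phase acquisition can be checked numerically with a simulated scene. The sketch below is illustrative and makes several assumptions: the function name is hypothetical, the scene is square and noise-free, every frequency is sampled, and the cosine patterns are offset and scaled to the range [0, 1] so that they could be shown on an intensity mask. The offset cancels in the two differences.

```python
import numpy as np

def fourier_single_pixel(scene):
    """Simulate four-step phase-shifted Fourier sampling of a square scene."""
    side = scene.shape[0]
    y, x = np.mgrid[0:side, 0:side]
    phases = (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)
    spectrum = np.zeros((side, side), dtype=complex)
    for v in range(side):
        for u in range(side):
            theta = 2 * np.pi * (u * x + v * y) / side
            # One simulated detector reading per phase-shifted pattern
            d = [np.sum(scene * 0.5 * (1 + np.cos(theta + phi))) for phi in phases]
            spectrum[v, u] = (d[0] - d[2]) + 1j * (d[1] - d[3])
    return spectrum

rng = np.random.default_rng(2)
scene = rng.random((8, 8))
spectrum = fourier_single_pixel(scene)
reconstruction = np.real(np.fft.ifft2(spectrum))
```

With this sign convention the measured spectrum coincides with the discrete Fourier transform of the scene, so `np.fft.ifft2` recovers it directly; with unscaled cosine patterns the result would differ only by a constant factor.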

3D imaging and ranging
Three-dimensional (3D) imaging and ranging is a research field that supports a wide range of applications including object detection and classification, surface mapping and 3D situation awareness for autonomous vehicles. Within the field of computational GI, two main techniques are used, each with its own advantages and drawbacks, which depend on the specific application.

3D computational ghost imaging
A common technique for capturing 3D images uses stereo vision [68], which is the extraction of 3D information from images of a scene obtained from different vantage points. However, these different images need to be aligned and have the correct geometry for the technique to be successful, and the processing is usually computationally costly. There is a wide range of articles on using multiple 2D images to estimate depth, and a review of the algorithms used can be found in Lazaros et al. [69]. An alternative technique using photometric stereo [70] captures a sequence of images, all from the same vantage point but under different lighting conditions. Each image in the sequence is lit using a different spatially separated source of illumination. These images are much easier to align, provided the sequence is captured fast enough to avoid movement of the scene between image frames. The resulting images differ mainly in the shading profile of the scene, from which the surface normals can be estimated.
Depth information can also be estimated from the 2D images obtained from single-pixel imaging or computational GI. A good example of this is the 3D computational imaging system demonstrated by Sun et al. [30], which uses a photometric stereo technique with multiple single-pixel detectors rather than multiple illumination sources. Multiple detectors in different positions are used to capture multiple images of a scene illuminated using a sequence of structured patterns. As in conventional photometric stereo imaging, the shading in each individual image appears as if the scene were illuminated from a different direction. Since the spatial structure of the images is determined by a single pattern projector, the images exhibit perfect pixel registration, and comparing these images allows the 3D form of the scene to be reconstructed.
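The step from shading to surface normals can be sketched as a per-pixel least-squares problem. The example below is a minimal illustration and not the processing used in Ref. [30]: it assumes a Lambertian surface with no shadows, so that each image obeys I = albedo × (direction · normal), and it assumes the effective lighting direction associated with each detector is known. The function name and the synthetic test data are hypothetical.

```python
import numpy as np

def photometric_stereo(images, directions):
    """Estimate unit surface normals and albedo from k >= 3 registered images.

    images:     array (k, h, w), one shaded image per detector.
    directions: array (k, 3), assumed unit vectors giving the effective
                lighting direction associated with each image.
    """
    k, h, w = images.shape
    intensities = images.reshape(k, h * w)
    # Solve directions @ g = intensities for every pixel, g = albedo * normal.
    g, *_ = np.linalg.lstsq(directions, intensities, rcond=None)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-12)  # guard against division by zero
    return normals.reshape(3, h, w), albedo.reshape(h, w)

# Synthetic check: gently tilted normals, four detectors around the optical axis.
rng = np.random.default_rng(1)
h, w = 4, 5
true_normals = np.stack([rng.uniform(-0.3, 0.3, (h, w)),
                         rng.uniform(-0.3, 0.3, (h, w)),
                         np.ones((h, w))])
true_normals /= np.linalg.norm(true_normals, axis=0)
true_albedo = rng.uniform(0.2, 1.0, (h, w))
directions = np.array([[0.5, 0.0, 1.0], [-0.5, 0.0, 1.0],
                       [0.0, 0.5, 1.0], [0.0, -0.5, 1.0]])
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
images = true_albedo * np.einsum("kc,chw->khw", directions, true_normals)

normals, albedo = photometric_stereo(images, directions)
```

Recovering a depth map from the estimated normals requires a further integration step, which is omitted here.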

Time-of-flight imaging
A time-of-flight (ToF) measurement determines the distance to an object by illuminating it with pulsed laser light and measuring the delay of the back-scattered pulses [71,72]. The distance d can be estimated by d = ∆tc/2, where ∆t is the ToF and c is the speed of light. ToF can be used in a single-pixel imaging configuration to provide information on the depth of the scene while the transverse spatial resolution is provided by the single-pixel image reconstruction, allowing a 3D representation [54,73,74]. Pulsed lasers are available with a temporal resolution in the tens of picoseconds and suitable detectors can be sensitive at the single photon level. Therefore, a ToF method is compatible with long-range, high-precision depth mapping. Figure 3 illustrates a structured illumination computational GI system that also incorporates a pulsed laser illumination source for ToF measurements (similar to that described in Ref. [33]). It is important to realise that these ToF measurement schemes are also compatible with single-pixel camera configurations, such as the one illustrated in Fig. 2 and the experiments reported by Howland et al. [20,21].
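As a quick numerical check of the relation d = cΔt/2, the sketch below converts between round-trip delay and distance. The function names are hypothetical, and the 10 ps figure is only an illustrative value within the "tens of picoseconds" range mentioned above.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_to_distance(delay_s):
    """Distance to the object for a measured round-trip delay in seconds."""
    return C * delay_s / 2.0

def distance_to_tof(distance_m):
    """Round-trip delay in seconds for an object at the given distance."""
    return 2.0 * distance_m / C

depth_per_10ps = tof_to_distance(10e-12)   # about 1.5 mm of depth per 10 ps
delay_at_5m = distance_to_tof(5.0)         # about 33 ns round trip at 5 m
```

Under these assumptions, a timing uncertainty of 10 ps corresponds to roughly 1.5 mm in depth, which is why picosecond-scale timing is associated with millimetre-scale depth precision.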
In the case of regular 2D computational imaging, one average intensity measurement is recorded for each mask or projected pattern in the sequence. In the 3D scheme, a series of intensity measurements is recorded for each pattern, with each element corresponding to the intensity returned from a different depth within the scene. Hence, a series of images can be reconstructed, one at each depth, forming a 3D data cube from which both the reflectivity and depth information can be extracted.
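The construction of the 3D data cube can be sketched as one 2D reconstruction per time bin. The example below is an idealised, noise-free illustration and not the method of any cited work: it assumes a complete set of ±1 Hadamard patterns (in practice obtained, for instance, from differential measurements), a single reflecting surface per pixel, and a hypothetical time-bin width of 100 ps. All function and variable names are illustrative.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum, m/s

def hadamard(order):
    """Sylvester-construction Hadamard matrix; order must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < order:
        h = np.block([[h, h], [h, -h]])
    return h

def simulate_signals(patterns, reflectivity, depth_bins, n_bins):
    """Time-resolved detector signal: one row per pattern, one column per time bin."""
    n_pixels = reflectivity.size
    cube = np.zeros((n_pixels, n_bins))
    # Each pixel returns its reflectivity in the time bin set by its depth.
    cube[np.arange(n_pixels), depth_bins.ravel()] = reflectivity.ravel()
    return patterns @ cube

def reconstruct_cube(patterns, signals):
    """Invert the orthogonal pattern set independently for every time bin."""
    n_pixels = patterns.shape[1]
    return patterns.T @ signals / n_pixels

side, n_bins, bin_width = 8, 16, 100e-12
rng = np.random.default_rng(2)
reflectivity = rng.uniform(0.2, 1.0, (side, side))
depth_bins = rng.integers(0, n_bins, (side, side))

patterns = hadamard(side * side)
signals = simulate_signals(patterns, reflectivity, depth_bins, n_bins)
cube = reconstruct_cube(patterns, signals).reshape(side, side, n_bins)

reflectivity_map = cube.max(axis=2)      # peak height gives reflectivity
peak_bins = cube.argmax(axis=2)          # peak position gives time of flight
depth_map = peak_bins * bin_width * C / 2.0  # metres, relative to the first bin
```

Taking the peak along the time axis is the simplest possible estimator; real signals with noise and finite pulse width would usually call for fitting or filtering of the temporal response.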
Some of the previous demonstrations of ToF single-pixel imaging (or single-pixel LiDAR) were based on photon counting (Geiger mode) detection [20,21]. However, despite the benefit of being able to image in low light conditions, photon counting detectors have the disadvantage of an inherent dead time (typically tens of nanoseconds) between successive measurements, which reduces the total detection efficiency. Consequently, the back-scattered photons from many illumination pulses must be measured in order to obtain an accurate temporal response from a 3D scene. Alternatively, a high-speed photodiode can measure the temporal response from a single illumination pulse. Sun et al. [75] demonstrated a single-pixel 3D imaging system using a high-speed photodiode for measuring the time-varying response of the back-scattered light, achieving a depth accuracy of 3 mm at a detection range of 5 m.

Regularisation techniques
Real images are not collections of random pixel values; rather, spatially adjacent pixels tend to have similar values to each other. Within traditional image processing this allows various denoising algorithms to be applied. Within single-pixel imaging denoising is also possible, but the same principles allow a form of compressed sensing in which the number of masks and associated measurements can be reduced to be smaller than the number of pixels in the image. Both denoising and compressed sensing can be based on a cost function for the reconstructed image which is derived from both the data and prior information; the image is then optimised to minimise the value of this cost function. The prior information can take several forms of varying significance. At its most basic the prior can, without loss of generality, assume that all pixel values are of positive intensity. Additionally, most natural scenes when expressed in the spatial frequency domain are sparse, i.e. many of the spatial frequencies have extremely low amplitude and can be discarded. This sparsity is the basis of many image compression techniques including the ubiquitous JPEG [37], where a discrete cosine transform is used based on a fixed low-dimensionality. To avoid the need to repeatedly calculate Fourier or similar transforms, an alternative prior is to recognise that either the total variation (TV) or the total curvature (TC) of the intensity distribution, O(x, y), of a natural image is small. These regularisation functions, R, are evaluated over the whole image, where, as before, N is the total number of pixels. This image regularisation can be considered alongside a measure of how well the reconstructed image accounts for the M measured data values. For Gaussian noise this is characterised by the average of the square of the difference between the measured and predicted signals, χ²/M, which is combined with the regularisation function, weighted by a factor λ, to create a cost function, C, for the image reconstruction.
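One common way of writing these quantities is given below. This is a sketch under stated assumptions rather than the exact expressions of the cited works: the anisotropic (sum of absolute derivatives) form, the normalisation by N, and the noise standard deviation σ in χ² are assumptions, with s_m the m-th measured signal and ŝ_m the signal predicted from the reconstructed image.

```latex
\begin{align}
R_{\mathrm{TV}} &= \frac{1}{N}\sum_{x,y}
  \left( \left|\frac{\partial O(x,y)}{\partial x}\right|
       + \left|\frac{\partial O(x,y)}{\partial y}\right| \right), \\
R_{\mathrm{TC}} &= \frac{1}{N}\sum_{x,y}
  \left( \left|\frac{\partial^{2} O(x,y)}{\partial x^{2}}\right|
       + \left|\frac{\partial^{2} O(x,y)}{\partial y^{2}}\right| \right), \\
\frac{\chi^{2}}{M} &= \frac{1}{M}\sum_{m=1}^{M}
  \frac{\left(s_{m} - \hat{s}_{m}\right)^{2}}{\sigma^{2}}, \qquad
C = \frac{\chi^{2}}{M} + \lambda R .
\end{align}
```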
The quantity λ sets the balance of the reconstruction between satisfying the data and satisfying the prior (captured in the regularisation function). It is typically set at a level such that, when optimised, we have χ²/M ≈ 1, ensuring that the reconstruction is the one which most satisfies the prior while still being statistically compatible with the data. As introduced above, if an image is reconstructed using the approach embodied in Eq. (1), the algorithm can be applied both when M > N (the number of measurements exceeds the number of pixels) and when M < N (the number of measurements is less than the number of pixels). When M > N the regularisation process is an example of denoising; when M < N it is an implementation of compressed sensing [76]. The literature around compressed sensing is extensive, not just for single-pixel cameras but more widely for high-dimensional measurement systems in general.
Having established a cost function upon which to optimise the reconstruction, there are further subtleties as to the statistical properties of the various regularisation functions. Assuming that the regularisation term is based on the sum of many terms, e.g. the coefficients of the image spatial frequencies, one can combine these coefficients into a single number, R, in various ways. Most obvious is to calculate the sum of the squares, ℓ2, of the individual coefficients, and a minimisation of this term will tend to suppress the large coefficients. However, real images of natural scenes often have a few dominant spatial frequencies, and a better goal is to promote sparsity in the spatial frequencies. To this end a more powerful regularisation is to calculate the sum of the moduli of the coefficients, ℓ1 [2,4]. The details behind this sophistication are beyond the scope of this article, but it is worth flagging how different statistical measures yield reconstructed or denoised images with different characteristics. There is no universal optimum measure; rather, the regularisation term should be chosen to best reflect the image type. If an ℓ1 regularisation is to be used then it is essential to undertake this regularisation in a basis in which the typical image can be described by the smallest number of non-zero coefficients (i.e. a basis in which the typical images are sparse), which for natural scenes is often the wavelet basis.
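The ingredients of such a cost function can be evaluated in a few lines. The sketch below is illustrative only: it assumes finite-difference forms of TV and TC normalised by the number of pixels, a cost of the form χ²/M + λR, random ±1 patterns with M < N, and noise-free simulated measurements. It evaluates the cost for candidate images but does not perform the optimisation itself. All names and parameter values are hypothetical.

```python
import numpy as np

def total_variation(image):
    """Sum of absolute first differences along both axes, per pixel."""
    dx = np.abs(np.diff(image, axis=0)).sum()
    dy = np.abs(np.diff(image, axis=1)).sum()
    return (dx + dy) / image.size

def total_curvature(image):
    """Sum of absolute second differences along both axes, per pixel."""
    dxx = np.abs(np.diff(image, n=2, axis=0)).sum()
    dyy = np.abs(np.diff(image, n=2, axis=1)).sum()
    return (dxx + dyy) / image.size

def cost(image, patterns, measurements, sigma, lam, regulariser=total_variation):
    """Cost C = chi^2 / M + lam * R for a candidate image."""
    predicted = patterns @ image.ravel()
    chi2_per_m = np.mean(((measurements - predicted) / sigma) ** 2)
    return chi2_per_m + lam * regulariser(image)

rng = np.random.default_rng(3)
smooth = np.outer(np.linspace(0.0, 1.0, 8), np.ones(8))  # linear ramp, N = 64
noisy = smooth + rng.normal(0.0, 0.2, smooth.shape)

patterns = rng.choice([-1.0, 1.0], size=(32, 64))        # M = 32 < N = 64
measurements = patterns @ smooth.ravel()                  # noise-free for simplicity

cost_smooth = cost(smooth, patterns, measurements, sigma=1.0, lam=0.1)
cost_noisy = cost(noisy, patterns, measurements, sigma=1.0, lam=0.1)

# Two coefficient vectors with the same sum of squares but different sparsity.
sparse = np.array([1.0, 0.0, 0.0, 0.0])
spread = np.array([0.5, 0.5, 0.5, 0.5])
l2_sparse, l2_spread = np.sum(sparse**2), np.sum(spread**2)          # 1.0 and 1.0
l1_sparse, l1_spread = np.sum(np.abs(sparse)), np.sum(np.abs(spread))  # 1.0 and 2.0
```

The last four values show why the two measures behave differently: the sum of squares cannot distinguish the sparse vector from the spread one, whereas the sum of moduli is smaller for the sparse vector and therefore favours it during minimisation.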

Machine learning
The machine learning approach to single-pixel imaging is a newer development than many of the other techniques associated with single-pixel imaging. The approach uses deep learning in the form of a convolutional neural network (CNN) to perform the reconstruction of an image based on fewer measurements than would be required for the orthogonal sampling methods or the traditional ghost imaging techniques. CNNs exploit advances in the speed of calculations performed by graphics processing units (GPUs), which allow higher computation rates than are available on conventional computer processors. CNNs have so far brought about breakthroughs in image processing, object identification, language processing and medical diagnosis [77,78].
An application of CNNs developed by many groups has been to reconstruct an image from random patterns [79][80][81]. This computational GI using a CNN has produced results equivalent to those of a regulariser method (as discussed in section 7). Sampling with random patterns is much less efficient than with a structured imaging basis. However, deep learning has allowed the sampling basis itself to be constructed so that it samples a scene as efficiently as possible. If we are able to construct the most efficient basis, then efficient imaging can be achieved with a minimal number of measurements. Such a learned sampling basis was demonstrated by Higham et al. [23]: for a 128 × 128 image, only 666 samples were used, about 4% of the 16,384 pixels, allowing 2D images to be reconstructed at video rates. The training of the CNN produces both the sampling basis (pattern set) and the reconstruction algorithm in the form of a trained neural network; results are shown in Fig. 6. This custom basis was later used to produce 3D images of a scene [33], where the deep-learned patterns were projected onto a scene and the depth image was recovered. Similar to the scheme described in section 6.2 and illustrated in Fig. 3, a pulsed laser illuminated a DMD which was used to structure the light, and (using photon-counting timing electronics) a time-of-flight measurement was made to collect the depth map of the scene.
Single-pixel imaging does not necessarily need to perform a full image reconstruction to detect and classify objects. A CNN has been used to develop a small number of patterns to classify and identify very fast moving objects [82]. This technique could be extended further to enable a sensing system that feeds into a control system, such as that of an autonomous vehicle, so that the navigation algorithms can react to a hazard without an image first being created and analysed, enabling much faster reaction times. Such sensing schemes are sometimes referred to as image-free classification.
Finally, using optical machine learning with a single-pixel detector may also be a possibility for image reconstruction. A diffractive neural network made up of a cascade of phase-only masks can reconstruct images without requiring the processing power of a computer [83].

Conclusions
We have provided a review of both single-pixel imaging and computational GI techniques and given a summary timeline of some of the main developments over the past twelve years. We have discussed some of the important aspects of the technique, including the modulation hardware, choice of pattern design, sampling strategy and the choice of detector types. While it is clear that single-pixel cameras and computational GI systems are similar in an optical sense, for the purposes of this review we found it convenient to maintain the distinction between the two. In this respect we recognise that single-pixel cameras are often based on structured detection and compressed sensing while computational GI is often based on structured illumination.
We have discussed two major advantages of single-pixel imaging which both relate to the choice of the single-pixel detector type. The first is in the design of low-cost cameras for imaging at wavelengths or multiple wavelengths outside the visible spectrum, where focal plane detector arrays are unavailable or prohibitively expensive. The second is time-resolved imaging where the time resolution of the single-pixel detector is vastly superior to that of the focal-plane array. Potential applications could be in the design of low-cost cameras for imaging at IR wavelengths, such as in gas leak detection, and for 3D imaging and ranging using LiDAR systems.
From very early on in the development of single-pixel imaging and computational GI there has been much research into ways to reduce both the data acquisition time and the image reconstruction time. Some of these techniques have been discussed in this review and include orthogonal sampling pattern bases, compressive sensing, high-speed spatial light modulation and machine learning algorithms. Machine learning techniques have shown promise in LiDAR systems for the high-speed 3D information and ranging required for situation awareness of autonomous vehicles. It is important to realise that in such detection and classification applications it is often sufficient to detect the characteristic intensity signals without needing to reconstruct the image. Hence, fast "image-free" detection and classification is a promising research field of single-pixel imaging which could lead to an exciting new range of unique sensing technologies.

Disclosures
The authors declare no conflicts of interest.