Confocal 3D reflectance imaging through multimode fibers without wavefront shaping

Imaging through optical multimode fibers (MMFs) has the potential to enable hair-thin endoscopes that reduce the invasiveness of imaging deep inside tissues and organs. Current approaches predominantly require active wavefront shaping and fluorescent labeling, which limits their use to preclinical applications and frustrates imaging speed. Here we present a computational approach to reconstruct depth-gated confocal images using a raster-scanned, focused input illumination. We demonstrate the compatibility of this approach with quantitative phase, dark-field, and polarimetric imaging. Computational imaging through MMF opens a new pathway for minimally invasive imaging in medical diagnosis and biological investigations.


Introduction
The use of optical multimode fibers (MMFs) as imaging conduits could enable hair-thin minimally invasive endoscopes that provide visual access to otherwise difficult to reach tissue sites [1]. Imaging through MMF has shown many advantages over other optical endoscopes: pliable geometry and lower cost compared to wide-field rod-lens endoscopes, minimized probe size and variable sampling rate and working distance compared to micro-electromechanical-system-(MEMS) based endoscopes, and dense mode population over the core area compared to fiber bundles and multi-core fibers. Although current approaches to imaging through MMFs have small tolerance to bending and generally require a rigid shape after calibration [2], progress with addressing this limitation [3,4,5] highlights the promising biomedical significance of MMF imaging in neuroscience and imaging-guided surgery and biopsy [6,7,8,9].
Several strategies have been proposed to compensate the chaos caused by mode mixing and dispersion in MMF transmission, including wavefront shaping (WFS) [10,11,12,13,14,15,7,9], speckle imaging [6,16], compressive imaging [17,18], and deep learning [19,20]. WFS, which is the primary strategy for endoscopic imaging of complex three-dimensional (3D) biological samples, generates a sharp distal focus through a MMF by taking into account the calibrated MMF transmission. In this approach, a distal focus is scanned through the sample for image formation. Combined with focusing through MMF using WFS, digital phase filtering on the detection pathway has been demon-strated to enhance the optical image contrast of non-fluorescent samples and allow optical sectioning to resolve axial structures [12] without the need for pulsed lasers as in time of flight methods [21].
Unfortunately, WFS techniques require complex systems and encounter limitations in imaging speed, focus accuracy, and power efficiency. One fundamental limitation is the physical latency of wavefront shaping modules. While an unprecedentedly fast WFS technique has been introduced to enable real-time focusing through scattering media by using an ultra-high-speed 1D modulator [22], imaging through MMF with this technique is associated with suboptimal focus contrast due to the uncompensated transverse dimension. Furthermore, most MMF imaging is so far geared towards preclinical imaging in mouse models and combines fluorescence labeling with physical scanning for high sensitivity and specificity. Imaging through MMFs for clinical diagnostic applications would benefit from exogenouslabel-free imaging. Towards this end, Choi et al. reported on wide-field label-free imaging through MMF that avoids WFS and offers high imaging speed by employing galvanometer scanning mirrors and a turbid lens imaging algorithm [6], but this approach does not provide optical sectioning or phase information, leading to low-contrast wide-field images.
This work builds on the simplicity of WFS-free MMF imaging and incorporates 3D optical sectioning to increase image contrast by introducing a new computational strategy that enables synthetic confocally gated imaging through MMF. The method achieves high-contrast reflectance imaging through point-spread function (PSF) engineering based on coherent transmission matrix (TM) operations. After initial calibration of a MMF -common to all current MMF imaging strategies -and obtaining its full forward TM by scanning a focal spot over the proximal side of the MMF, we measure the round-trip reflection matrix of the fiber including the sample at its distal end using the same proximal spot basis. Owing to reciprocity, the TM defines the bi-directional light transport through the MMF and describes light scrambling on both illumination and detection pathways [23]. We then digitally compensate the effect of transmission through the MMF on the reflection matrix by left-and right-multiplication with the regularized matrix inversion of the TM and its transpose, respectively. The individually addressable recovered illumination and detection modes enable remote confocal gating of selected observation planes (OPs). Using numerical refocusing, a single measurement of the reflection matrix yields volumetric images of the sample, with field of view (FOV) and spatial resolution on an OP determined by the fiber numerical aperture (NA) and the solid angle subtended by the fiber core. Finally, we show imaging of complex samples with versatile optical contrasts including dark-field and birefringence imaging to demonstrate the potential for label-free endoscopic biomedical imaging.

Overview
The monochromatic optical transmission through a general medium from an input surface to an output surface can be expressed by a coherent matrix, specifying the amplitude and phase evolution of the transmitted field between pairs of input and output spatial channels. As illustrated in Fig. 1(a), we express forward light transmission through a MMF as matrix T. The measurement of T, by determining the amplitude and phase of the output speckle pattern arising from focal illumination at each independent transverse location on the input fiber facet, calibrates the transmission from the proximal (P) end to a distal calibration plane (at z = 0). The MMF output speckle pattern on the calibration plane per each proximal input realization constitutes a column vector of T. In imaging mode with a sample at the fiber distal end, we illuminate and detect light from the fiber proximal end. This corresponds to a bi-directional light transport consisting of forward transmission through the MMF, free-space propagation to an OP modeled by Fresnel diffraction (H), speckle illumination on and backscattering from the sample, coupling back into the same MMF, and backward transmission to the proximal facet. The TM representing backward transmission T T is the transpose transformation of the forward TM T due to reciprocity [23], and the overall round-trip reflection matrix M, describing optical transmission from and to the proximal side matrix formalism can be expressed as where R quantifies the backscattering process of the light-sample interaction in the spot basis on the distal OP. R has intrinsic transpose symmetry, R = R T . Similar to T, each column of M is a proximally recorded speckle pattern per input realization. The transpose symmetry of M that follows from Eq.1 is experimentally attained as elaborated in the Methods section.
The computational reconstruction is illustrated in Fig. 1(b). With a previously measured T, we can digitally compensate the light scrambling during round trip MMF propagation and extract R from M by right-and left-multiplying it with the inverse and transpose inverse of HT, respectively. The H matrix can be numerically generated for arbitrary refocusing and controls the OP position (see Supplementary Materials S2). In practice, the measured T is generally non-square, ill-posed, and noisy, so we used Tikhonov regularization to approximate its inversion and transpose inversion, T -1(tik) and T -T(tik) , with the regularization parameter set to 10% of the largest singular value as justified by the In the imaging phase, we measure the coherent round-trip M through the same set of proximal channels as used for calibrating T, but now in the presence of a distal object. The H matrix accounts for free-space propagation from the calibration plane to an OP. The sample is illuminated by different speckle realizations due to the proximal 2D scan, and R matrix denotes light-sample interaction. The measured M and T are used for computational reconstruction. (b) By modeling the round-trip light transmission with matrix multiplications, we can compensate the MMF scrambling using the measured T to isolate the reflection matrixR. The image of the sample is reconstructed from the diagonal of |R| 2 , which is essentially synthesized confocal illumination and detection through each available channel. With numerical refocusing, we can complete 3D scan with a single measured M. The color map encodes complex values. amp.: amplitude. (c) Depth-gated confocal images and the corresponding γ profile of a 3D-structured sample. The d axis shows physical distance from the fiber facet. d is in the unit of µm. The scale bar is 50 µm.
L-curve method [24]. The approximated R,R, thus can be derived as where matrices are defined regardless of basis representation. When the input and output ofR are both in the spot basis, an adequate high-contrast image reconstruction of the en face scattering on the OP can be obtained by reshaping the intensity on the diagonal ofR into its corresponding 2D xy-layout. Physically, the diagonal elements ofR correspond to synthetic focal illumination and detection occurring through identical channels on the OP, creating a spatial confocal gating effect with a depth of focus determined by the effective NA available at each location on the OP. By varying H, we can numerically shift the OP along the optical axis to different distances and reconstruct the full addressable 3D image volume from a single measured M.

MMF calibration and sample reflectance measurement
All experiments used a 1-m-long step-index MMF with 105 µm core diameter and a NA of 0.22 (FG105LCA, Thorlabs) that theoretically supports ∼ 550 guided modes per polarization at λ = 1550 nm. The fiber was coiled with a minimum radius of curvature of ∼ 50 mm. The monochromatic calibration matrix T was measured by sequentially probing the MMF input channels with a focal spot, while complex detection was used for all distal output channels concurrently. Each input and output channel included two orthogonal polarization states: horizontal (H) and vertical (V). The focal spot position on the proximal input side was indexed by u and the speckle pattern exiting on the distal side was imaged with an off-axis holographic imaging system, whose object plane determined the calibration plane, approximately 100 µm away from the MMF distal facet. Images were flattened into column vectors of T directly in the Fourier domain, with output channels indexed by ν F . The input and output spatial channels of T have been ordered first by spatial coordinate, then by polarization.
In imaging experiments, we again sequentially coupled light into the MMF through the same set of proximal input states. We then recorded the round-trip light transmission on the proximal side by the same off-axis holography setup.
To preserve the symmetry between the illumination and the detection configurations and to obtain a square matrix M, we sampled the recorded complex output fields at the ordered positions identical to the set of input states. Numerical corrections to compensate for the physical misalignment were then applied to the output channels of the measured M to accurately match the input channels and recover the underlying transpose symmetry as previously described [23]. The detailed experimental setup and data processing are in Supplementary Materials S1.

Confocal image reconstruction
Using the sample measurement M and the pre-measured T, we then computedR following Eq. 2. The free-space propagation matrix H(z) from the calibration plane to a selected OP(z) was defined as a diagonal matrix in ν F (illustration in Fig. S1(b-1) and details in Supplementary Materials S2), considering the medium refractive index. The input and output basis ofR were converted from ν F to ν by multiplication with a pre-computed inverse discrete Fourier transform matrix. A 2D confocal intensity image I of sample reflectance at the OP was reconstructed by reshaping of the diagonal ofR as where the point (x, y) is mapped from the distal spatial channel ν to real-space coordinates, and [·] indicates matrix entries, arranged in row and column. For polarization-preserving samples, reconstructed images of co-polarized illumination and detection channels are identical and incoherently summed to increase signal. This computation was repeated for multiple values of z to generate 3D images from a single reflectance measurement M with depth expressed in d (referenced to distal facet). Intensity images were converted to base-10 logarithmic scale for display.

Wide-field image reconstruction
From the same measured M, we can also obtain wide-field imaging that is equivalent to the turbid lens imaging algorithm [25]. We compensated for the reverse MMF transmission of reflectance from the sample under the variety of speckle illuminations, and then incoherently averaged the reflectance to statistically compose a uniform illumination. In terms of matrix operations, we left-multiply Eq. 1 with H -T T -T(tik) where each column of the matrix product is the sample reflection resulting from a distinct speckle illumination. Widefield images were reconstructed by integrating the absolute square ofRHT along the input dimension into a single column vector and reshaping the vector back to 2D coordinates. T(:,u) means u th column vector of T. To simplify computation, the matrix product HT in Eq. 4 was assumed to be unitary, so that by Parseval's theorem the integrated row intensity of RHT is identical to that ofR. Confocal and wide-field images from the sameR can be thereafter fairly compared.

Quantitative phase imaging
In confocal imaging, due to the complex nature ofR, quantitative phase imaging of thin samples can be accomplished by taking the complex values of the diagonal elements of a computedR to form a complex 2D image (X), where the amplitude encodes the absolute reflectivity, and the phase quantifies changes in the wavefront of light propagating through the specimen and back.

Dark-field imaging
Off-diagonal elements ofR also contain abundant information of sample optical properties, which can be extracted through manipulations onR. For instance, each column ofR represents the scattering at OP in response to an illumination focused on a single channel q. Instead of collecting the intensity at the corresponding location on the matrix diagonal, the intensity of surrounding output channels ν was summed with weights L(ν, q) given by their Euclidean distance from the input channel on the xy-plane (the on-diagonal confocal signal thus has a zero weight) with a cutoff radius of 12 µm, and normalized by the overall intensity where the Cartesian point (ζ, ξ) maps to the distal channel indexed at ν. We named this metric scattering contrast (S). SinceR is transpose-symmetric, interchanging the illumination and detection renders identically reconstructed images. To compare with microscopy, for each location in the image plane, the scattering contrast S is essentially the combination of focused illumination and ring-shaped detection PSF, which captures positive signals from the boundaries of sample heterogeneity and is analogous to a dark-field confocal imaging scheme [26].

Polarization contrast
So far, computation of images fromR only considered co-polarized illumination and detection, where each distal spatial channel ν degenerates into ν H and ν V , and entries corresponding to input and output channels were used with the same polarization state. For birefringent samples such as collagen, illumination through a channel in a certain polarization state may induce cross-polarized backscattering. WhileR is symmetric and has inputs and outputs ordered first by coordinates and then by polarization, the diagonals of the two off-diagonal matrix quadrants represent crosspolarized detection, and the sample birefringence at individual image positions (x, y) can be resolved and characterized by assembling 2-by-2 Jones matrices, in the basis of orthogonal linear polarization states. From the Jones matrix at each channel, a retardation matrix was isolated using polar decomposition. Owing to the intrinsic transpose symmetry, the resulting matrix describes a linear retarder that can be characterized by its amount of retardance (ret) δ and optic axis (OA) φ orientation. Endogenous contrast within birefringent samples can thus be retrieved from this polarization-diverse measurement.

Depth-gated imaging through MMF without WFS
To demonstrate the depth gating effect of our computational reconstruction, we imaged a 3D-structured sample through the MMF, as shown in Fig. 1(c). The sample is a coverslip in air with buccal epithelial cells deposited on both surfaces. We computed the confocal image for each OP at varying distance d from the MMF distal facet (d = 0) and calculated the corresponding integrated reflectivity (γ) by summing the intensity over the entire en face image. The γ versus depth profile reveals three separated peaks, which inform reflective MMF facet and coverslip surfaces, and their axial positions at d = 0, 120, and 320 µm considering medium refractive index. The green and brown insets show highcontrast images of cells on the front (d = 120 µm) and back (d = 320 µm) surfaces of the coverslip, respectively. Our matrix approach, which combines confocal gating with numerical refocusing, thus enables 3D scan from a single measured M without WFS. For more details and additional results on this experiment, please see Supplementary Materials S6.
To further evaluate the confocal gating effect, we imaged a USAF resolution chart (R1D21P, Thorlabs) in different media and distances d through the MMF, as sketched in Fig. 2(a). In each medium, a sample reflectance matrixR was computed from a single measured M for each OP at varying depth to find in-focus position, and processed to reconstruct confocal and wide-field images for direct comparison, as shown in Fig. 2(b). The pixel-wise illumination and detection are also illustrated. Since the chart has a binary reflectance pattern across its surface, we can quantify the intensity image contrast as where I p and I g are the intensities of the chrome pattern and the glass substrate, respectively. In Fig. 2(c) and (d), I p and I g are averaged within selected regions of interest indicated by solid-line and dashed-line boxes for the chrome and glass substrate areas, respectively. In each imaging condition and modality, we calculated the corresponding γ profile, which is normalized by the highest value along the axial OP positions.
To directly study the confocal gating efficiency, we imaged the chart placed at d = 120 µm in air. In Fig. 2(c), the confocal method renders the chart patterns with high contrast due to the rejection of background signals from reflection at the MMF facet. The confocal image achieved a contrast of 0.9, close to the expected value of ∼0.92, assuming full reflection from the chrome pattern and 4% reflection at the air-glass interface. In Fig. 2(e), the profile reveals two prominent and separated peaks at d = 0 and d = 120 µm, corresponding to the MMF facet and the resolution chart, respectively. This shows the optical sectioning of confocal gating.
To test the capacity of computational confocal gating in the presence of additional sample scattering, we imaged the chart placed at d = 400 µm in agarose gel mixed with 2 wt.% intralipid, which corresponds to ∼0.36 mean free paths. In Fig. 2(d), the reduced confocal image quality may be due to: intralipid scattering that distorted the wavefront and the reconstructed images, degraded spatial resolution upon beam divergence at large d (elaborated in the following section), or the limitation of numerical H when the OP is far from the calibration plane. Despite the medium scattering and lower signal to background ratio when the chart is far from the facet, the confocal image maintained a high contrast of 0.96, compared to a theoretical value of ∼0.99 (assuming 0.4% reflection at the gel-glass interface). This could be attributed to the immersion gel having a refractive index closer to the glass material of chart and MMF, and the specular reflection from the chart substrate and MMF facet is attenuated. Meanwhile, the chart patterns still have full reflection, thereby resulting in a better image contrast. In Fig. 2(f), the peak in the γ profile at d = 400 µm corresponds to the chart, and, although weak, precisely informs on its physical location when assuming the medium refractive index to be 1.4. These results evidence the effective suppression of out-of-focus scattering and reflection without active WFS.
To theoretically compare the confocal method to the wide-field processing, one can juxtapose Eqs. 4 and 2 to find that the matrix multiplication also on the right side of M pre-compensates the light scrambling effect of the MMF forward transmission, and synthesizes sharp foci through the MMF on a selected OP. In contrast, the wide-field processing of R corresponds to speckle illumination, as illustrated in Fig. 2(b). When imaging in air as in Fig. 2(c), while the pattern with wide-field imaging stands out from the background on the OP at d = 120 µm, the strong background results in a low contrast of 0.48, and the corresponding γ profile in Fig. 2(e) remains a constant throughout the entire observation range. On the other hand, the wide-field image in Fig. 2(d) can barely distinguish the pattern from the background, resulting in poor contrast of 0.11. The uniform γ plot of wide-field imaging in Fig. 2(f) again exposes the lack of optical sectioning.

Efficient imaging with flexible reconstruction, field of view, and spatial resolution
Imaging through a MMF using WFS typically scans a focus along a pre-defined scanning trace and records a single image point from each focus location. In contrast, our method illuminates the sample with a sequence of MMF-induced speckle patterns and utilizes the camera for parallel sampling of all addressable locations in the imaging volume. This allows arbitrary definition of sampling grid and working distance in post processing of single measurement of M (elaborated in Supplementary Materials S3). As light diverges upon exiting the MMF distal end governed by the fiber NA, computational reconstruction can adapt to a growing FOV with increasing OP distance from the fiber facet. Here, we demonstrate this flexible reconstruction in MMF reflectance imaging and evaluate the resulting FOV and 3D spatial resolution as a function of distance to the tip of the fiber.
To mimic endoscopic imaging with a variable working distance, a resolution chart was mounted on a translation stage and positioned at different distances d = 10, 600, or 1200 µm. For each d, a round-trip M was measured and computational reconstruction with numerical refocusing was utilized to locate the axial position of the resolution chart. Figure 3(a)-(c) show the in-focus confocal images of different chart areas (color boxes in (d)) with physical dimensions of 123, 184, and 247 µm, when d = 10, 600, and 1200 µm, respectively. The illumination power on the chart was kept at ∼0.5 mW. Due to the beam divergence and limited laser power, camera exposure time was increased from 200 µs up to 1 ms for larger d to compensate for the declining photon collection. We filled the space between the fiber and the chart with index-matching gel (G608N3, Thorlabs) to mitigate the specular reflection from the MMF distal facet. In Fig. 3(a)-(c), imaging from farther away captures a more complete picture, as the FOV expands with increasing d. However, this comes at the expense of spatial resolution and collected reflectance power, as the patterns are severely blurred at d = 1200 µm, and background speckle becomes apparent. The finest detail of the chart, element 6 in group 7, can be resolved when the MMF is in close proximity of the facet, d = 10 µm, where the FOV is determined on a lower bound by the fiber core size. To quantify the spatial resolution at varying d, we inferred the lateral resolution, δx, from the smallest resolvable pattern on the chart, as shown by example dashed blue line and its linear intensity profile plot in 3(e). Also, since the chart serves as a sharp edge in the axial direction, we utilized the FWHM of the γ profile around the focused d to measure the axial resolution δz. The FOV of each computed image was characterized by its diameter ∅, set as twice the radius where the radially averaged intensity dropped below 1 % of the center. We conducted several imaging realizations and computed corresponding axial profile and radial mean intensity at d = 10 µm for statistical analysis, as plotted in red and black curves in 3(e), respectively. For comparison, the theoretically expected spatial resolutions were derived considering the effective on-axis NA defined by the minimum between the fiber NA and the solid angle subtended by the fiber core at the corresponding distance from the facet. The theoretical FOV was estimated using T by measuring the radial extent of averaged synthetic illumination patterns away from the facet (further details on theoretical resolution and FOV are presented in Supplementary Materials S4). The overall quantification results are shown in Fig. 3(f). While the experimental spatial resolutions are consistent with diffraction-limited theoretical values, the experimentally determined FOVs are up to 50 % smaller than expected with increasing distance. This may be due to the low light collection efficiency of reflectance from distant objects or the limitation of numerical H. Nevertheless, the agreement in scaling properties of experimental and theoretical values corroborates the flexibility in addressable spatial dimensions given by the degrees of freedom guided through the MMF. These results demonstrate the convenience of reconstructing the entire sample volume without a pre-defined scan pattern in practical setting where the sample distance is unknown. Moreover, this flexibility of the matrix approach allows confocal image reconstruction from partial measurements of M with illumination through only a subset of proximal spatial channels (as shown in Supplementary Materials S5). While reconstruction from partial measurements compromises background suppression, it accelerates the volume rate, which may be critical for real-time applications.

Multi-modal computational confocal imaging of label-free complex samples
To improve specificity in confocal MMF imaging without WFS for visualizing unlabeled biological specimens in a reflection mode, we leveraged the matrix approach to generate diverse contrasts from a measured round-trip M by applying different post-processing, and synthesized multiple imaging modalities to create signal specificity. Different strategies for image formation are described in the Methods section and illustrated in the following figures. The multi-contrast images in addition to the intensity images of the 3D imaging example in Fig. 1(c) can be found in the Supplementary Materials S6.
We started with non-birefringent samples to demonstrate multi-contrast imaging. Figure 4 (a) shows a typical reflection matrixR. The sample arrangement for these experiments, shown in Figure 4 (b), allowed imaging of a sample on a microscope glass slide in a reflection mode through the MMF, and also in transmission mode (t) with the distal imaging system and bright-field illumination through the MMF as ground-truth images. Figure 4 (c) shows the images of a monolayer of 3 µm polystyrene beads spread on the surface of a microscope slide and imaged in air at d = 100 µm. Since the reflectivity of the beads is orders of magnitude lower than that of the air-glass interface, the obtained round-tripR on in-focus OP has diagonal elements dominated by the specular reflection from the glass slide, resulting in beads silhouetted against the glass signal in the confocal intensity image (I) and featuring negative contrast, similar to other reports of reflectance imaging through optical fibers [25,12,27]. With the full knowledge ofR and following Eq. 7, we are able to extract scattering signal specifically from the beads and create a dark-field image (S) through numerical engineering of the system detection PSF (sketched in the inset). Physically, bead-induced roughness distorted the specular reflection wavefront, which made returning light partially cross-coupled to neighboring spatial channels. Intriguingly, the cross-coupling signals that delineate the beads provided a slightly higher resolving power than the confocal intensity image, as verified by comparing the line profiles of clustered particles. This exemplifies the benefit of computational reconstruction, whereas physical implementation of dark-field imaging would traditionally require an annular filter, axicon lens, or customized pinhole, and increase the system complexity [11,26,28].
To demonstrate multimodal MMF imaging including phase and dark-field imaging from the same measurement of an unlabeled biological specimen, human buccal epithelial cells were smeared on microscope glass slide and placed at d = 120 µm in air. The sample was laterally translated to image several overlapping areas, and at each lateral location a round-trip M was measured to reconstruct the corresponding image. Multiple images were then stitched together to make a composite image with a wider FOV. Fig. 4 (d) shows phase contrast (left, X), revealing the contour of cellular membranes and nuclei in its amplitude (coded in brightness), likely because they deflect the focused illumination which attenuates the reflected signals, thereby resulting in negative contrast. Furthermore, as shown by its color-coded phase, the variation in sample thickness or refractive index inhomogeneity provides an intrinsic phase contrast of the unlabeled sample likely caused by sub-cellular structures, with information not contained in the intensity image alone. The scattering contrast image (right, S) delivers complementary information by positively outlining the cellular membrane morphology along with some cytoplasmic organelles that can be roughly correlated with the transmission image.
To demonstrate polarization sensitive (PS) computational imaging through MMF based on our matrix approach and Eqs. 8, as illustrated in Fig. 5(a), we obtained reflection matrices of anisotropic materials including quarter-wave plate (QWP) and cholesterol crystals through the MMF. To validate quantitative retardation and OA measurements, a QWP (WPQ501, Thorlabs) was placed on a microscope slide and its proximal reflection was measured. As show in Fig. 5(b), the edge of the wave plate was imaged in different orientations to verify the OA orientation retrieved from polarization analysis. One M was measured in each orientation. Due to the round-trip light propagation, the QWP has an effective half-wave retardation, which leads to full attenuation in the co-polarized detection for confocal intensity images when the slow axis is 45°to the H or V polarizations, and partial attenuation in between. Consistently, the corresponding retardance images reveal a constant π rad retardation of the wave plate regardless of the orientation. On the other hand, the color-coded OA images (combined with brightness-coded ret images) show a rotating OA of the QWP with orientation exactly the same as the slow axis angle. Note that the OA colormap has a periodicity of π instead of 2π used in phase colormaps. Figure 5(c) shows another example with home-made plate-like cholesterol crystals (S25677, Fisher Science Education) on a microscope slide, which has a much weaker retardance due to its small thickness (tens of µm) yet uniform optic axis orientation. Judging from the values of retardation, the crystal may be thicker towards the bottom of the image. Since the crystal thickness is much smaller than the confocal gate, interference between the front and back surfaces results in en face fringes. A tighter confocal gate may be achieved by switching to MMFs with higher NA or choosing a shorter operating wavelength.

Discussion
Computational confocal imaging through MMFs is a novel matrix-based method to obtain depth-gated images using a proximal scanning spot basis for reflectance measurement without WFS, yielding multimodal 3D reflectance of unlabeled samples including confocal intensity, quantitative phase, dark-field, birefringent retardance, and optic axis contrast modalities. Despite the many documented imaging through MMFs, this is the first report of numerical PSF engineering, phase, and PS imaging through MMF in a reflection geometry.
High-contrast imaging through MMF frequently relies on fluorescent labeling or is operated in a transmission regime [17,29,15,7,13], which is incompatible with practical endoscopic applications. Fluorescence scanning microendoscopy furthermore suffers from photobleaching [30,31]. Our computational imaging approach instead efficiently extracts weak elastic scattering signals from unstained samples and operates in a reflection regime, making it favorable to practical endoscopic applications. The matrix approach moreover offers an elegant way of achieving full polarization management and leverages polarization as additional contrast mechanism. In comparison, WFS for physical focusing through MMF typically addresses only a single polarization state [12,7], to avoid the hardware complexity required for full polarization-control [13]. While PSF engineering through complex media in a transmission regime has been documented [32], our method here avoids SLM/DMD and optimization. More broadly speaking, illumination and detection with any respective PSF and in any polarization state can be readily engineered by weighting the entries ofR accordingly.
The matrix approach employs a simple proximal spot basis for illumination, which relaxes hardware requirements by accepting any 2D scanning module and avoids the need for WFS. The limiting factor in imaging speed of this work is the InGaAs-camera frame rate of 120 Hz, which may be directly improved by an order of magnitude by replacing it with a faster one or by shifting the operation wavelength towards visible wavelengths with more and even faster camera options. Recently, MMF calibration covering 256 degrees of freedom within only 34 ms has been demonstrated by using a field programmable gate array (FPGA) to address the general latency issue in hardware communication [33], which exemplifies the efficacy of program optimization and is readily applicable to speed up our implementation. Our approach also allows image formation from a partial round-trip measurement with as few as 200 input realizations, which trades off slight image quality for ∼ 85% measurement time reduction. Because the reconstruction of individ-ual spatial channels is independent from other channels, this offers high potential for parallelization of the processing using GPU acceleration. With careful engineering of the data acquisition and processing pipeline, fast video-rate imaging should be achievable. Fundamentally, imaging speed of our MMF imaging method is limited by computational complexity, and no longer by hardware as for WFS methods.
MMF imaging has a notorious intolerance to small fiber perturbations such as bending or looping. Even small fiber alterations typically require MMF re-calibration, which is exceedingly difficult in an endoscopic setting [23]. This admittedly remains the most fundamental limitation towards practical use of flexible MMF endoscopy. In our experiments, the calibrated 1-m-long MMF was fixed looping on the optic table and remained stable for several hours without perceivable TM change such that the same measured TM could be used for imaging different samples through the MMF. In a practical setting, an ultra-thin MMF could be mechanically shielded inside a rigid needle or hypodermic tubing to enable high-quality imaging through MMF without re-calibration. Furthermore, several promising strategies are being pursued to address the need for TM calibration without physical access to the distal fiber end: the installation of carefully designed passive optics or guide star to the MMF distal tip [34,3,5], compressive sampling of TM with sparsity constraint [35], or the use of graded-index MMF which has increased robustness of light transport to bending deformations.
The disclosed method offers 3D confocal imaging through MMFs with high signal specificity, yet is less hardwaredemanding than common WFS methods. This may expedite or create applications of minimally invasive MMF-based endoscopy in biomedicine, where probe size and cost are critical factors. For instance, deep brain imaging in neurosurgery, on-site inspection in needle biopsy, collagen imaging in arthroscopy, and tympanic cavity imaging in middle and inner ear surgery are potential uses of the technique. The same methodology may also be extended to optical imaging through other complex or turbid media, or other imaging technologies such as ultrasound tomography.

Conclusion
Accurate knowledge of light propagation through MMF can transform this low-cost optical component into a highthroughput and ultra-thin optical conduit for measuring elastic optical scattering by remote samples in a reflection regime. The demonstrated computational imaging through MMF based on round-trip measurements in a proximal spotbasis avoids the limitations in WFS and the use of fluorescent labeling, two important bottlenecks impeding practical MMF imaging. This may streamline the system design and, in combination with future advances in calibration stability of MMFs, stimulate the advent of hair-thin imaging probes that improve diagnostic performance, enhance guidance of existing interventions, and enable novel image-guided therapeutic procedures in clinical medicine.

Supplementary Materials for:
Confocal 3D reflectance imaging through multimode fibers without wavefront shaping S1 Experimental setup All experiments used a 1-m-long step-index MMF with 105 µm core diameter and a NA of 0.22 (FG105LCA, Thorlabs) that theoretically supports ∼ 550 guided modes per polarization. The fiber was coiled with a minimum radius of curvature of ∼ 50 mm. The monochromatic calibration matrix T was measured by sequentially probing the MMF input channels, while holographic detection was used for all output channels concurrently, as depicted in Fig. S1(a). Each input and output channel included two orthogonal polarization states: horizontal (H) and vertical (V). To alternate the illumination polarization between H and V, a laser beam (λ = 1550 nm and linewidth < 100 kHz) was linearly polarized and passed through a fiber-based electro-optical phase retarder (PR, Boston Applied Technologies). The laser was steered by a two-axis galvanometer scanning stage (GM, GVSM002-US, Thorlabs), and then focused by an objective lens (Plan Apo NIR Infinity Corrected, Mitutoyo) with a NA of 0.4 into a 2.5 µm full-width at half maximum (FWHM) spot on the proximal facet of the MMF. The angular spectrum of the spot exceeded the NA of the MMF to ensure efficient population of all modes. The focal spot position on the proximal input side was indexed by u and the speckle pattern exiting on the distal side was imaged with another identical objective lens and a tube lens (f = 30cm) onto an InGaAs camera (OW1.7-VS-CL-LP-640, Raptor Photonics) with exposure time of 20 µs at 120 frames per second. The object plane of the distal imaging system determined the calibration plane, which was approximately 100 µm away from the distal facet. We define d as the distance of the OP away from the MMF distal facet (at d = 0). A beam displacer (BD40, Thorlabs) was used in front of the camera to spatially separate the output into H and V polarization states. An angled plane reference wave polarized at 45 • independently interfered with the two speckle patterns on the camera to record the speckle field amplitude and phase through off-axis holography in both detection polarization states simultaneously. Images of the two polarization states were demodulated, spatially registered, and flattened into a column vector of T directly in the Fourier domain, with output channels at (k x , k y ) indexed by ν F . To uniformly probe all the MMF's guided modes, transmission was recorded for an oversampled grid of input spot positions u, typically ∼700 points for each input polarization state, sequentially generated by driving the GM and PR. The total acquisition time was 20 seconds. The input and output spatial channels of T have been ordered first by spatial coordinate, then by polarization.
In imaging experiments, as illustrated in Fig. S1(b), a sample was placed in front of the MMF distal tip (b-1) and the round-trip M was measured from the proximal side (b-2). We again sequentially coupled light into the MMF through the same set of proximal input states. Light with a power of 0.5 mW exited the distal facet and propagated towards the sample, where part of the light backscattered and coupled back into the same MMF. On the proximal side, we recorded the round-trip light transmission by decoupling its path from the illumination with a non-polarizing beam splitter and directing it to the same off-axis holography setup. The exposure time was set in the range 200-1000 µs depending on the sample. A complete round-trip sample measurement was acquired in 20 s. To preserve the symmetry between the illumination and the detection configurations and to obtain a square matrix M, we sampled the recorded complex output fields at the ordered positions identical to the set of input states. The matrix M was then constructed with the same procedure that was used to find T. Numerical corrections to compensate for the physical misalignment were then applied to the output channels of the measured M to accurately match the input channels and recover the underlying transpose symmetry as previously described [23].

S2 Resolving axial information with numerical refocusing
For a sample with volumetric structures, under weakly scattering regime and the Born approximation, we can express the reflection matrix R on the calibration plane (z = 0) as a summation of backscattering fields contributed from N individual OPs at varying axial positions, where r di , when in real-space ν representation, is a diagonal matrix with en face reflectivity over i th OP at z = d i , and H is a unitary TM modeling the loss-less free-space propagation from the calibration plane to the OP. Due to the unitary matrix properties, where the superscript -1, -T, †, and ⋆ indicate true inverse, true inverse of transpose, Hermitian transpose, and conjugate, respectively. Note that H simply reduces to an identity matrix when z = 0. According to Fresnel diffraction theory under paraxial approximation, the transfer function of a free-space propagation is a convolution kernel in real space, or a multiplicative quadratic phase term in the Fourier domain. Depending on the distance, z, of a selected OP, we can compute the Fourier phase term accounting for the propagation process where k x and k y are the coordinates in the in-plane momentum domain. The matrix H(z) in Fourier domain is then a diagonal matrix with main diagonal as F at distal channels, and incorporating the matrix into T through leftmultiplication extends the output of T to the OP at z. Plugging Eq.S1 into Eq.1 and then into Eq.2, and substituting respectively T -1(tik) with T -1(tik) H † (z) and T -T(tik) with H ⋆ (z)T -T(tik) using Eq. S2, we have a new transpose-symmetric reflection matrix R with input and output channels shifted to z which is the same as Eq. 2. If we set z = d j , Eq. S4 becomes where we isolate the in-focus and out-of-focus matrices. Assuming the out-of-focus reflective planes are separated from the in-focus plane by much more than a depth of focus, and the total background energy is uniformly distributed over all spatial channels, we can approximate the summation of out-of-focus terms in Eq. S5 as a complex matrix with random phases but a constant amplitude. Collecting the on-diagonal elements ofR dj hence leads to signal predominance by the en face reflectivity at z = d j and suppression of out-of-focus signals, or background rejection. In short, after measuring M, by obtainingR at intended axial position following Eq. S4 and then applying Eq.3 , we can digitally shift to the j th OP and image the en face reflectivity at z = d j without repeated measurements.

S3 Digital resampling of image physical and digital dimensions
The light transport through a MMF and interaction with a distal sample can be well modeled with measured TMs, which contain full complex propagation information of wave-vectors within the NA of the MMF. While the experimental T has output channels stored in Fourier domain, with an one-time measured M in the imaging phase, arbitrary resampling of 2D image dimensions and also digital adjustment of image size on any selected OPs can be readily configured based on Fourier relations. This offers flexible trade-off between image processing speed and accuracy in a pragmatic circumstance: a lower resampling rate or smaller physical dimension reduces the computational burden, which is suitable for a faster image preview, whereas a higher resampling rate produces a detailed and smooth image at the expense of longer processing duration. Here, we quantify the trade-off by timing the image processing on a personal computer with a 3.4 GHz Intel Core i7 CPU and 16 GB RAM using MATLAB.
For an arbitrary setting of image physical and digital dimensions, we upsampled the output spatial channels of T in the Fourier domain by interpolation, and pre-computed an inverse discrete Fourier Transform (iDFT) matrix for converting the distal channels to resampled real-space coordinates during the 2D real-space image reconstruction. We focus on the upsampling that corresponds to a valid augmentation to the initial pupil size on the calibration plane (∼105 µm in diameter). Note that the interpolation of T output channels and the calculation of iDFT matrices are performed prior to actual image formation processes. The necessary computation of images on an OP involves application of phase terms to T outputs for intended numerical refocusing, distal spatial channels conversion into real-space coordinates with the prepared iDFT matrix, left and right multiplication of M with regularized inversion of extended backward and forward TMs following Eq. S4 to retrieveR, and reshaping back to a 2D image using Eq.3.
In the experiment, the initial T had output channels accounting for 247 × 247 square area of camera recording pixels conjugating a physical size of 123 × 123 µm 2 , and the resolution chart as sample was placed on an OP at d = 10, 600, 1200 µm away from the facet. For each imaging setting, we timed only the necessary computation. As shown in Fig. S2

S4 3D resolution and field of view
To calculate the theoretical lateral and axial resolution, we need to first compute the effective NA, NA eff , specific to an OP at an axial position. While the effective NA may also be dependent of the lateral displacement from the optical axis, we only consider an on-axis point object on the OP for convenience. Given the object at a distance d away from the MMF facet, the effective NA can be calculated from the maximal angle formed with the point as the vertex and marginal rays within the MMF acceptance angle as sides, as illustrated in Fig. S3(a) and (b). When d is within the focal length of the MMF, Ω ∼ ηD/2NA, a full NA can be obtained, which is determined during MMF fabrication. Here, η is the medium refractive index, and θ a is the fiber acceptance angle. Once d is larger than this range, only a partial NA can be achieved due to the limited MMF diameter. The value of effective NA is summarized as Given the effective NA, we can then compute the expected lateral and axial resolution as in confocal microscopy [1] where we see that the axial resolution has a strong dependence on the system NA.
With the circular symmetry of fiber core shape, we can define the FOV on an OP as the diameter of a circular area with circumference from furthest off-axis points having normalized confocally detected intensity dropped below 1% threshold. Using the measured T of the MMF, we can free-space propagate each output light field per input to an OP and incoherently sum all output light intensity over each input realization. This results in a circular blob on the OP indicating the average illumination power at each spatial channel. Taking the spatial-channel-wise intensity square of the blob informs confocally detectable power, as shown in Fig. S3(c), where the OP is 600 µm away from the MMF distal facet. The low light coupling efficiency at FOV peripheral causes the vignetting effect on reconstructed images, and the quantified FOV has ∅∼260 µm by applying the threshold to plotted radius-wise mean intensity.

S5 Reconstructing confocal images from partial TM measurement
In the imaging phase, we can reconstruct confocal images from the round-trip measurement of M by obtaining the reflection matrix,R, processing its elements, and reshaping into 2D real-space coordinates at a selected OP. While the measurement of a full M by sequentially coupling light into all MMF proximal channels delivers maximal information of the distal sample bounded by the MMF throughput, intermediate confocal images for preview can also be reconstructed from a round-trip measurement with partial set of input realizations,M, which is a subset of M containing constituent column vectors, leading to a rectangular matrix. As illustrated in Figure S4(a), withM, we can reconstruct an speckled image on an OP from a computed reflection matrix,R, by respectively left and right multiplyingM with full T -T(tik) andT -1(tik) , which is the regularized inverse of a subset of T with constituent column vectors at input channels corresponding toM. Physically speaking, the image derived from the partial measurement corresponds to the distal sample under statistically non-uniform illumination. Using confocal intensity images I for demonstration here, we define the completeness of an intermediate image as the normalized intensity correlation, C, with the final image reconstructed from full M measurement, C = x,y I i (x, y)I f (x, y) x,y I i (x, y) x,y I f (x, y) , where I i and I f are intermediate and final images, respectively. The completeness arrives at C = 1 when I i = I f . Figure S4(b) shows examples of intermediate images with their quantified completeness. Here, the sample is a resolution chart, and the full M is a 1354-by-1354 square matrix. From the plot, we can see that the completeness quickly improves with the number of input realizations and achieves 90% with ∼200 input realizations, which is only ∼15% of the total number of realizations. The intermediate images start from speckled pattern and evolve to clean and high-contrast final confocal intensity image.

S6 3D confocal images with various contrasts
To demonstrate 3D imaging of biological samples through the MMF based on numerical refocusing, a sample with multiple layers was prepared following similar volumetric reconstruction experiments performed by others [2,3,4]. A proximal reflectance measurement of M through the MMF included reflectance from multiple layers of a sample, shown in Fig. S5(top), including buccal epithelial cells deposited on both surfaces of a glass coverslip with thickness of ∼200 µm, placed at d = 120 µm in air. From this single M, 3D volumetric imaging was computed by numerical refocusing and image reconstruction. The depth-dependent γ plot was consistent with the physical location of each reflective interface, considering the refractive indices of each layer (1.44 in glass). High-resolution confocal images with intensity, phase, and scattering contrasts were computed at the two individual coverslip surfaces (d = 120 and 320 µm). Both planes exhibited contrast from cell samples in all images, consistent with the transmission ground truth, and with high contrast, indicating the confocal gating efficacy. Note that in complex samples, because optical phase accumulates as light is reflected from further into the sample, the phase of shallower cells is overlaid on deeper-lying cells.