Imaging across multiple spatial scales with the multi-camera array microscope

This article experimentally examines different configurations of a novel multi-camera array microscope (MCAM) imaging technology. The MCAM is based upon a densely packed array of "micro-cameras" that jointly image across a large field-of-view at high resolution. Each micro-camera within the array images a unique area of a sample of interest, and all data acquired by the 54 micro-cameras are then digitally combined into composite frames, whose total pixel counts significantly exceed the pixel counts of standard microscope systems. We present results from three unique MCAM configurations for different use cases. First, we demonstrate a configuration that simultaneously images and estimates the 3D object depth across a 100 x 135 mm² field-of-view (FOV) at approximately 20 µm resolution, which results in 0.15 gigapixels (GP) per snapshot. Second, we demonstrate an MCAM configuration that records video across a continuous 83 x 123 mm² FOV with two-fold increased resolution (0.48 GP per frame). Finally, we report a third high-resolution configuration (2 µm resolution) that can rapidly produce 9.8 GP composites of large histopathology specimens.


Introduction
A general challenge in the design of optical microscopes is to identify strategies that overcome a fundamental trade-off between imaging resolution and field-of-view (FOV). Optical microscopes range from FOVs of several centimeters when resolving at multi-µm resolution, to less than a millimeter when imaging at sub-µm resolution [1]. The total number of spatial points resolvable by a standard optical microscope, commonly referred to as the imaging system space-bandwidth product (SBP) [2], is generally between 10 and 50 million (10-50 megapixels) [3].
There are a number of compelling reasons why new microscope techniques with increased SBP can be more useful than conventional microscopes. For example, a large-SBP microscope would enable observation of multiple model organisms, such as zebrafish (D. rerio), fruit flies (D. melanogaster), and nematodes (C. elegans) during natural movement over a large area [4,5,6,7]. In addition, a high SBP system would assist with the rapid inspection of large electronics components [8,9], semiconductor wafers [10], microfluidic systems [11] and various materials [12,13] during manufacturing. Finally, high SBP microscopes would also facilitate novel parallelized assays to increase the throughput of high-content imaging and screening experiments common to the fields of pharmacology, toxicology and drug discovery [14].
Unfortunately, large-SBP imaging cannot be easily achieved by directly increasing the FOV of a microscope objective lens capable of a desired imaging resolution. As specified by the lens scaling law [15], all lenses are affected by optical aberrations, and the size of optical aberrations scales linearly with the diameter of a lens. Large-diameter lenses must utilize additional optical elements to correct for aberrations to maintain a fixed resolution over a desired FOV. Additional elements must be added in a super-linear manner [15,16], which leads to a rapid increase in size, weight, complexity and cost of large-SBP microscope objective lenses [17]. At the same time, the largest currently available image sensors only contain several hundred megapixels [18], which presents a second challenge to directly scaling up standard single-lens microscope designs.
arXiv:2212.00027v2 [eess.IV] 28 Feb 2023
Instead of relying upon a single extremely large objective lens, most current microscopes utilize mechanical scanning to overcome typical FOV limits. Scanning systems capture multiple high-resolution images in a step-and-repeat manner, which are then tiled together into a final large-area composite. Scanning microscopes are available from a number of companies and form the foundation of modern wide-area inspection and whole-slide imaging (WSI) [19]. Unfortunately, mechanical scanning is inherently slow. For example, a single 96 well plate (8 x 12 cm² in size) typically requires approximately 8 minutes to scan [20,21]. In addition, while sequential scanning works well for static samples, it is not an option for rapidly moving samples.
An alternative to mechanical scanning is to scan a microscope's illumination while capturing images in a time-sequential manner. Structured illumination microscopy [22] and Fourier ptychography [23,24] process such variably illuminated images into final, high-resolution image composites. Similar strategies have also been employed utilizing speckle illumination [25]. Time-sequential measurement of stochastic or photoactivated emission is also standard practice in super-resolution microscopy [26,27]. Finally, alternative methods have utilized scanning spots produced by microlens arrays [28,29], as well as steering the illumination across large specimens via a parabolic mirror [30], to observe a larger sample area. While such approaches offer various merits over step-and-repeat scanning systems, the above methods still must capture multiple images over time, and thus cannot easily record large-area, high-resolution synchronized video of dynamic objects.
A multi-aperture design can circumvent the lens scaling law by utilizing many lenses and digital sensors to record images in parallel [31]. The first demonstrations of digital multi-aperture imaging originated in standard cameras designed for imaging distant objects and can be generally summarized in two categories. The first "direct" category uses multiple cameras arranged in a compact array with no additional collection optics. This design was explored to decrease optical form-factor [32], to implement light-field capture [33], to adopt hierarchical modular designs [34], and is now a common feature in many smartphone camera systems. The second category of approach adopts a cascaded or "multi-scale" lens design in which an array of cameras is focused upon an intermediate image, typically produced by a large, primary lens [35,36,31,37]. This latter cascaded strategy has been implemented in several interesting configurations [38], including a microscope that produces images at sub-micrometer resolution over approximately one square centimeter [39]. However, there has been limited work to date examining the direct use of multiple imaging systems to record image and video across a continuous area at high spatial resolution in parallel for applications in microscopy, which is the focus of this work.
There are a variety of benefits of direct arrays for microscopic imaging applications. For example, small optics exhibit fewer aberrations, and thus simpler, more compact, and less expensive lenses can be used within the array design. Similarly, small and inexpensive CMOS pixel arrays are now made in large quantities for the smartphone camera market. Put together, these two insights point to a relatively simple and inexpensive wide-area microscope design. In this work, we demonstrate the ability to use such an array to directly record images at micrometer-scale resolution over a FOV of several hundred square centimeters in parallel. Our primary design, which we term a multi-camera array microscope (MCAM), includes 54 micro-cameras that are integrated into a regularly packed grid with a 13.5 mm center-to-center spacing. In the following, we cover several configurations of this new technology to achieve different functionalities: the ability to record 3D snapshots across a 100 x 135 mm² area at approximately 20 µm full-pitch lateral resolution and 42 µm axial sensitivity, video across a continuous 83 x 123 mm² area at 2X higher lateral resolution, and high-resolution imaging with 9.8 GP over a similar area. As detailed below, each of these configurations is a function of the selected imaging magnification, which one can easily adjust using the same hardware.

2 Multi-camera array microscope design
We will begin by deriving several key properties, such as the relationship between magnification and pixel-limited resolution, for direct-array imaging systems. Our starting assumption is that the array of interest is planar, contains identical imaging systems that are arranged in a uniform grid, and is imaging a large two-dimensional specimen of interest. We will scrutinize these assumptions in later sections. As we aim to configure the entire array, and thus each lens-sensor pair, to directly image at micrometer-scale resolution, we will loosely refer to each lens-sensor pair as a microscope. It is useful to begin by considering how only two such individual microscopes side-by-side can be configured to image a continuous area (i.e., such that the FOV of each microscope abuts the other, see Figure 2(a)). Naturally, one would aim to place the microscopes as close as possible to one another. At the extreme limit, the sensors for each microscope would ideally sit immediately adjacent to one another.
Assuming this limit is achievable, we need to consider the objective lens placed in front of each sensor. As shown in Figure 2(a), a lens with a magnification M > 1 will result in a gap between the FOVs of adjacent microscopes. While it is possible to uniquely tilt each lens and sensor to minimize or remove this gap [40], such a configuration introduces several sources of experimental complexity that hinder large array development. Instead, one can simply use a lens with M ≤ 1 to ensure that the FOVs of each imaging system directly touch one another (Figure 2(a)). This leads to a first key property of direct array microscopy with a planar array: the magnification must not exceed unity to ensure continuous imaging across an extended FOV.
At first glance, this key property would appear problematic for achieving high-resolution imaging. In such low-magnification conditions, the finite sensor pixel size is critical for the overall resolution. The system's pixel-limited full-pitch resolution r_pix is easily found by projecting the finite pixel width δ onto the sample plane as r_pix = 2δ/M. Historic CCD and CMOS sensors contained pixels on the order of δ = 5-10 µm in width, which suggests that even in an ideally tiled array, the minimum full-pitch resolution would be 10-20 µm at best when configured for continuous coverage, which precludes many applications. Over the past years, however, the average pixel size of CMOS sensors has decreased dramatically. It is now common to find CMOS pixels in the range of δ = 0.7-1 µm [41]. Alternative sensor designs include even smaller pixel widths [42]. Modern CMOS pixel arrays, of the kind found in most smartphone cameras, achieve a minimum full-pitch resolution r_pix = 2δ/M between 1.4 and 2 µm when using a magnification of M = 1. This approximately matches the resolution offered by a current standard 4X or 10X objective lens with numerical apertures in the 0.15-0.3 range.
In practice, placing individual image sensors immediately adjacent to one another is challenging (in part due to mechanical packing and electrical routing constraints). Considering an image sensor with an active area of width s in one dimension, this implies that the inter-camera pitch p must satisfy p > s in a practical MCAM implementation (Figure 2(b)). The sensor pitch p also defines the lateral separation between the optical axes of adjacent systems in a planar array design, and thus the physical separation between the centers of adjacent image FOVs. To fulfill our aim of imaging a contiguous surface without any gaps, we must ensure that for each lens-sensor pair in the array, the imaged FOV is equal to the sensor pitch: FOV = p (see Figure 2(b), green dashed line). When configured to image across a continuous object plane without any gaps, the magnification of each imaging system, defined as the ratio between its image FOV and its object FOV, is then given by:

$$M = \frac{s}{p} \tag{1}$$

Inserting this required magnification into our definition of the imaging system's pixel-limited full-pitch resolution now provides a relationship in terms of sensor width s and pitch p:

$$r_{\mathrm{pix}} = \frac{2\delta p}{s} \tag{2}$$

for a configuration that images a continuous surface. Of course, the imaging system magnification can also be selected to be greater than or less than Eq. 1, which will impact the total FOV coverage of the array microscope and its resolution. We will detail these alternative configurations below. We plot pixel-limited resolution as a function of magnification for several common sensor pixel widths in Figure 2(c), which also highlights the three configurations of the MCAM investigated here.
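As a minimal numerical sketch of the relationships above, the following Python snippet evaluates the continuous-coverage magnification (sensor width divided by pitch) and the resulting pixel-limited resolution r_pix = 2δ/M for the sensor and pitch values quoted later in the Results (3120 x 4208 pixels, δ = 1.1 µm, p = 13.5 mm). The function names are illustrative assumptions, and the active sensor width is approximated here as pixel count times pixel width.

```python
# Values quoted in the text for the prototype sensor and array.
PIXEL_WIDTH_UM = 1.1          # pixel width delta, in micrometers
SENSOR_PIXELS = (3120, 4208)  # short side, long side (pixel counts)
PITCH_MM = 13.5               # inter-camera pitch p, in millimeters


def continuous_magnification(sensor_width_mm, pitch_mm):
    """Magnification at which one camera's object FOV equals the pitch: M = s / p."""
    if sensor_width_mm <= 0 or pitch_mm <= 0:
        raise ValueError("sensor width and pitch must be positive")
    return sensor_width_mm / pitch_mm


def pixel_limited_resolution_um(pixel_width_um, magnification):
    """Full-pitch pixel-limited resolution r_pix = 2 * delta / M, in micrometers."""
    if magnification <= 0:
        raise ValueError("magnification must be positive")
    return 2.0 * pixel_width_um / magnification


for n_pixels in SENSOR_PIXELS:
    # Assumption: active width s = pixel count * pixel width.
    s_mm = n_pixels * PIXEL_WIDTH_UM * 1e-3
    m = continuous_magnification(s_mm, PITCH_MM)
    r = pixel_limited_resolution_um(PIXEL_WIDTH_UM, m)
    print(f"s = {s_mm:.3f} mm, M = {m:.3f}, r_pix = {r:.2f} um")
```

Under these assumptions the shorter sensor side is the more restrictive one, since it requires the lower magnification to reach the neighboring camera's FOV.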
The resolution of each imaging system within the array is also impacted by its optics. Based on the specific pixel-limited resolution set by Eq. 2, it is useful to match the optical resolution of each single lens to this limit. In practice, this condition is relatively easy to satisfy for the compact imaging optics used by each lens-sensor pair, as we demonstrate for a number of configurations and as theoretically examined in prior work [43]. Each of the micro-camera imaging lenses is relatively small, since the lens diameter must be less than p, which is often on the order of 1 cm. Furthermore, small lenses ensure tight array packing. Following the lens scaling law outlined in Ref. [15], such small-diameter lenses are easily designed to offer high performance (i.e., minimal aberrations) across a wide range of specifications. While there are certainly optical limits on achievable resolutions and FOVs for each micro-camera, a primary benefit of utilizing a multi-aperture imaging arrangement is the ability to avoid many of the challenges associated with large lens design. In other words, since the required SBP of each micro-camera remains relatively small, designing a lens to meet pixel-limited resolution requirements is much easier (and cheaper) than attempting to create a large lens to capture a much larger SBP.
Finally, the number of individual lens-sensor pairs within the entire array needs to be considered. This number is simply a function of the desired total FOV for the array system. In this work, we first demonstrate a primary configuration that covers an approximate 100 x 135 mm² area using a 6 x 9 array of 54 micro-cameras. We also demonstrate the ability to tile together four individual multi-camera arrays to directly extend this total FOV by a factor of four. This straightforward scalability of the total FOV is a unique benefit of arrayed microscopes such as the MCAM system (see also Figure 7).
The selected magnification of all lens-sensor pairs within the array is a key design choice that drives three primary regimes of operation (see Figure 2(b)). We summarize these three unique operation regimes next, before experimentally demonstrating their benefits and trade-offs.

Configuration 1 - Multi-View Imaging
Using a large working distance and a low image magnification, the MCAM can be configured for "multi-view" imaging. In this configuration, the FOVs of individual lens-sensor pairs overlap so that each location in the sample is imaged by at least two unique micro-cameras (see Figure 2(b), purple). As we will show, a multi-view imaging geometry offers a number of interesting benefits. For instance, this allows stereoscopic imaging to estimate the depth of imaged objects, or to apply photogrammetry software to jointly reconstruct a height map of the imaged object while stitching together the final large-area composite. Alternatively, spectral or polarimetric multiplexing may be utilized to capture additional specimen information from such overlapped imagery. In the extreme scenario of all N micro-cameras within the array imaging the same object location, which is only possible when the object distance is increased to a large stand-off distance, the micro-camera array captures a light-field-type dataset [44], from which depth can be estimated [45,46,47,48]. We have found that a useful MCAM configuration is at the opposite limit, where a small amount of inter-camera overlap between immediately adjacent sensors can produce high-quality depth estimation, while the entire array can still yield a significantly increased imaging FOV.
The magnification of an MCAM configured for multi-view imaging typically satisfies

$$M_m \le \frac{s}{2p} \tag{3}$$

which indicates that each micro-camera FOV has a width of at least 2p, so that every point on the sample plane maps to at least two micro-cameras along each dimension (except at the boundaries). With square image sensors, this ensures that in two dimensions every point is imaged by 4 micro-cameras. For example, inserting the condition M_m = 1/4 (p = 2s) and a pixel size of δ = 1.1 µm into r_pix = 2δ/M yields an 8.8 µm lower bound for full-pitch resolution. This approximately matches the resolution provided by a 1.25X microscope objective lens (with 21 mm FOV diameter) commonly used for macroscopic (2D) inspection [1]. The number of micro-cameras required to image a desired total sample plane FOV area A in a multi-view configuration is A/4p². We note that parts of the FOV of the corner micro-cameras may not include more than one viewpoint per sample plane area, which must be taken into account during post-processing analysis.

Configuration 2 - Continuous FOV Imaging
The continuous FOV regime is entered when the magnification is increased past the multi-view scenario, but the entire surface of an object is still viewed by at least one micro-camera. This configuration has higher spatial resolution than the multi-view geometry and requires the following magnification:

$$\frac{s}{2p} < M_c \le \frac{s}{p} \tag{4}$$

The continuous FOV configuration was the primary focus of our explanation at the beginning of Section 2, where we explained how a continuous area (i.e., without any gaps in the FOV) is observed by a planar array when its magnification is less than one. Typically, a small amount of inter-camera image overlap (approx. 5-10%) is required for effective composite stitching.

Configuration 3 - Tiled High-Resolution Imaging
To obtain even higher image resolution, one can increase the magnification of each single micro-camera such that a gap appears between adjacent FOVs (Figure 2(a)):

$$M_t > \frac{s}{p} \tag{5}$$

Such a "tiled" configuration no longer images across a continuous sample area in one snapshot. Instead, it captures data from a discrete, non-contiguous sample area with a FOV width of s/M along one dimension. The achievable resolution of such a configuration depends on the selected magnification and, eventually, on the imaging optics. Based on r_pix = 2δ/M and assuming M_t = 2 and a δ = 1.1 µm pixel size, we can estimate the full-pitch pixel-limited resolution to be 1.1 µm for a tiled imaging arrangement, which matches the resolution achieved by a standard 10X microscope objective lens.
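The three regimes can be distinguished by comparing the object-side FOV of a single micro-camera, s/M, with the inter-camera pitch p. The short Python sketch below illustrates this along one dimension; the function name `mcam_regime` is a hypothetical helper, and the thresholds (FOV of at least 2p for multi-view, at least p for continuous coverage, below p for tiled imaging) are taken from the descriptions of the three configurations, ignoring the small stitching overlap needed in practice.

```python
def mcam_regime(magnification, sensor_width_mm, pitch_mm):
    """Classify the operating regime of a planar array along one dimension."""
    if magnification <= 0 or sensor_width_mm <= 0 or pitch_mm <= 0:
        raise ValueError("all arguments must be positive")
    fov_mm = sensor_width_mm / magnification  # object-side FOV of one camera
    if fov_mm >= 2.0 * pitch_mm:
        return "multi-view"    # every point seen by at least two cameras
    if fov_mm >= pitch_mm:
        return "continuous"    # adjacent FOVs touch or slightly overlap
    return "tiled"             # gaps between FOVs; scanning required


# Short sensor side of the prototype (3120 pixels of 1.1 um) and 13.5 mm pitch.
S_MM = 3120 * 1.1e-3
P_MM = 13.5
for m in (0.1, 0.2, 1.0):
    print(m, mcam_regime(m, S_MM, P_MM))
```

With the magnifications reported in the Results (0.1, 0.2 and 1), this classification reproduces the multi-view, continuous and tiled labels, respectively.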
In a tiled imaging configuration, the MCAM operates similarly to a large number of individual microscopes configured to image in parallel [33]. To observe a macroscopic area, mechanical scanning of either the specimen or the array can be used to fill in the FOV gaps over multiple image acquisitions. The number of scan locations required to fill in the FOV gaps across the entire specimen plane in one dimension is pM/s (i.e., the number of micro-camera FOVs of size s/M that fit within an inter-camera spacing p). Assuming square image sensors and a square micro-camera packing geometry, (pM/s)² unique scan locations must be visited during imaging.
Compared to standard step-and-repeat imaging with a single-lens microscope, an array with N micro-cameras has to visit N times fewer scan locations and can therefore image a specified surface at least N times faster, increasing imaging throughput accordingly. A second benefit is a greatly reduced scanning travel range. A tiled imaging MCAM only has to mechanically scan over the inter-camera spacing distance p to fill in missing FOV gaps, whereas a standard microscope must scan across the entire extent of the desired aggregate FOV. In other words, the scanned area shrinks to roughly 1/N of the aggregate FOV. In applications where throughput is important, this can result in rather dramatic savings, e.g., when imaging large histology slides or inspecting large semiconductor wafers for defects.
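The scan-count expression pM/s can be turned into a small planning calculation. The Python sketch below rounds the ratio up to an integer number of stage positions and adds an optional fractional overlap between neighboring snapshots; the `overlap` parameter and the function name are assumptions introduced for illustration and are not part of the pM/s expression itself.

```python
import math


def scan_positions_1d(pitch_mm, sensor_width_mm, magnification, overlap=0.0):
    """Stage positions needed along one dimension to fill the gap between cameras."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    fov_mm = sensor_width_mm / magnification   # object-side FOV width s / M
    step_mm = fov_mm * (1.0 - overlap)         # stage step between snapshots
    # Small tolerance so exact integer ratios are not rounded up by float error.
    return max(1, math.ceil(pitch_mm / step_mm - 1e-9))


PITCH_MM = 13.5
SHORT_SIDE_MM = 3120 * 1.1e-3   # assumed active width: pixel count * pixel width

ideal = scan_positions_1d(PITCH_MM, SHORT_SIDE_MM, magnification=1.0)
with_overlap = scan_positions_1d(PITCH_MM, SHORT_SIDE_MM, magnification=1.0, overlap=0.10)
print("positions per axis (no overlap):", ideal)
print("positions per axis (10% overlap):", with_overlap)
print("total snapshots for a square scan grid:", with_overlap ** 2)
```

For the prototype values at unit magnification and roughly 10% overlap, this estimate along the shorter sensor side agrees with the 5 x 5 scan grid reported in the Results.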

Results
To test each of the three imaging configurations, we constructed a prototype MCAM system containing a 6 x 9 array of micro-cameras. The 54 individual 13 megapixel CMOS sensors (ONSemi AR1335, 3120 x 4208 pixels, δ = 1.1 µm pixel width) were tiled at a p = 13.5 mm pitch on a single PCB board. We designed an optomechanical mount to hold a 6 x 9 array of customized lenses (25.05 mm effective focal length, f/4, 13 mm outer diameter, fabricated by Edmund Optics). These lenses were separated by the same p = 13.5 mm pitch. Each lens can be focused individually via a custom thread mount. To test all three imaging configurations, we used the same MCAM design with all lenses focused to a common plane, and adjusted the distance between the lenses and sensors by moving the optomechanical mount of the lens array. For tiled imaging (configuration 3), an additional 6 x 9 array of lenses was included to minimize aberrations (see details below). The sample working distance was controlled via a 3-axis stage.
Image data from all micro-cameras was routed to a single Field-Programmable Gate Array (FPGA) before transmission to a single desktop computer via a PCIe link. The FPGA allows control over the settings of all 54 camera sensors, e.g., exposure time and gain, through a global address. Once configured, a single command initiates synchronized image or video acquisition across all image sensors (approx. 0.7 GP per snapshot). Image and video data are transmitted from all sensors to the FPGA via high-speed serial data lanes. The FPGA then organizes and routes this data to a standard desktop computer with 128 GB of RAM and a 4TB solid-state drive for permanent storage. As detailed below, the system enables full-frame video recording at seven frames per second and higher frame rates at lower per-frame pixel counts.
We applied standard image stitching software to create all composite images post-capture unless stated otherwise. This customized software followed procedures currently available within the open-source Hugin code base [49]. In some experiments, it was beneficial to capture images of calibration targets to identify optimal stitching parameters before executing the imaging experiment. This pre-calibration step takes around two minutes for the first two imaging configurations and an hour for the tiled imaging configuration, and then allows essentially instantaneous stitching of subsequent newly captured frames (as long as the sample is flat and at the same depth).

Validation of optical resolution
We summarize the results of a first set of experiments designed to assess MCAM system resolution in Figure 3. We imaged a custom-designed resolution target with the printed area (green box in Figure 3) spanning 83 x 123 mm² at three different working distances associated with each of the three configurations (multi-view, continuous, and tiled design). For multi-view imaging, a working distance of WD_m = 250 mm yielded a magnification of M_m = 0.1. For continuous imaging, a working distance of WD_c = 140 mm yielded a magnification of M_c = 0.2. Finally, for tiled imaging, we mounted a second array of 6 x 9 matching lenses atop the existing lens array but flipped it to produce a 4f imaging system for each micro-camera. This optical layout yielded an approximate working distance of WD_t = 5 mm and magnification M_t = 1, as seen in Table 1.
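The reported magnifications for the first two configurations can be roughly cross-checked with the thin-lens equation. The sketch below is only an idealization: it assumes each lens behaves as a single thin lens with the quoted 25.05 mm focal length and that the working distance equals the object distance from the lens, neither of which is stated in the text. The 4f tiled arrangement, which uses two lenses per micro-camera, is not covered by this formula.

```python
def thin_lens_magnification(focal_length_mm, object_distance_mm):
    """Magnitude of the thin-lens magnification M = f / (d_o - f) for a real image."""
    if object_distance_mm <= focal_length_mm:
        raise ValueError("object must lie beyond the focal length for a real image")
    return focal_length_mm / (object_distance_mm - focal_length_mm)


FOCAL_LENGTH_MM = 25.05  # effective focal length quoted for the prototype lenses

# Working distances quoted for the multi-view and continuous configurations.
for label, wd_mm in (("multi-view", 250.0), ("continuous", 140.0)):
    m = thin_lens_magnification(FOCAL_LENGTH_MM, wd_mm)
    print(f"{label}: WD = {wd_mm:.0f} mm -> M ~ {m:.2f}")
```

Under these assumptions the estimates come out close to the reported values of 0.1 and 0.2.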
The total array FOV varied slightly for each imaging configuration, which we denote with colored box outlines (purple, green, and blue) in Figure 3. Additional key specifications for each design are highlighted in Table 1. Maximum full-pitch resolution limits for each setup were 20 µm, 10 µm, and 2 µm for multi-view, continuous, and tiled imaging, respectively. The trend in achieved resolution follows the mathematical derivation in Section 2. We also demonstrate that the resolution changes very little across different camera positions and across different image regions of a single camera for the multi-view and continuous configurations, as shown in supplementary Figure S1. Tiled imaging (configuration 3), however, exhibits limited aberrations at the corners of each micro-camera FOV, because the lenses utilized for this demonstration were designed for working distances from 90 mm to infinity, not the 5 mm used here. Future designs can utilize customized lens designs for tiled imaging, or crop such aberrated areas at the expense of requiring additional scanning. In the following subsections, we demonstrate MCAM imaging performance in each regime in a set of experiments.

3D imaging with multi-view
Stereo imaging is a standard technique in computer vision that can estimate object depth based on imaging from multiple views [50]. Through a similar principle, the multi-view imaging configuration of the MCAM (with 50% or more overlap between the FOVs of adjacent cameras) can be used to estimate the 3D height of specimens. The most direct explanation of stereoscopic depth estimation follows the principle of triangulation. As sketched in Figure 4(a), a point on the object plane will map to two image plane locations on two unique sensors in a multi-view setup configured with 50% overlap. Each imaged location will be at a certain lateral disparity, d_1 and d_2, with respect to the optical axes of cameras 1 and 2. From these measurable lateral disparities and known optical setup parameters, such as the inter-camera pitch p and the image plane distance I_d, object depth can be computed by triangulation (Eq. 6).
To verify this principle with our multi-view MCAM, we performed an experiment to measure the accuracy of this depth estimation by recording images of a standard USAF resolution target at a range of object distances. After calibrating the multi-view MCAM to exhibit slightly more than 50% overlap in each dimension (WD_m = 250 mm), we axially displaced the resolution target from -3 to 3 mm from the originally calibrated focal plane and captured images at 10 µm increments. From two acquired camera images per depth plane, we then used standard stereoscopic methods to 1) find common features across both images, 2) compute disparity distances d_1 and d_2 from the corresponding optical axes, and 3) apply Eq. 6 to estimate object depth. As plotted on the y-axis in Figure 4(b), we observe accurate performance across the 6 mm range with an experimental RMSE of 42.4 µm with respect to ground-truth depth.
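To make the triangulation step concrete, the following Python sketch uses the textbook pinhole stereo relation z = p·I_d/(d_1 + d_2). This is an assumed standard form and not necessarily the exact expression of Eq. 6: it treats each micro-camera as an ideal pinhole, measures depth z from the lens plane, and takes d_1 and d_2 as positive distances from each camera's optical axis toward the neighboring camera. The forward model `project_point` and all numerical values are hypothetical and serve only to check the inversion.

```python
def project_point(x_mm, z_mm, pitch_mm, image_distance_mm):
    """Pinhole forward model: disparities of a point seen by two adjacent cameras.

    x_mm is the lateral offset from camera 1's optical axis toward camera 2,
    z_mm is the distance of the point from the lens plane.
    """
    d1 = image_distance_mm * x_mm / z_mm
    d2 = image_distance_mm * (pitch_mm - x_mm) / z_mm
    return d1, d2


def depth_from_disparities(d1_mm, d2_mm, pitch_mm, image_distance_mm):
    """Standard triangulation: z = p * I_d / (d1 + d2)."""
    total = d1_mm + d2_mm
    if total <= 0:
        raise ValueError("summed disparity must be positive")
    return pitch_mm * image_distance_mm / total


# Hypothetical example: 13.5 mm pitch, 27.8 mm image distance, point at 250 mm.
P_MM, ID_MM = 13.5, 27.8
d1, d2 = project_point(x_mm=5.0, z_mm=250.0, pitch_mm=P_MM, image_distance_mm=ID_MM)
print(depth_from_disparities(d1, d2, P_MM, ID_MM))
```

Because the summed disparity is inversely proportional to depth, small axial displacements of the object translate into measurable disparity changes, which is what the depth-accuracy experiment exploits.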
Photogrammetric 3D reconstruction algorithms can also be used to produce dense, pixel-wise 3D surface height maps that are co-registered with the stitched photometric images (Figure 4(c)). The problem of estimating 3D information from a collection of multi-view 2D images has been extensively studied in the computer vision community at macroscopic scales [51,52] and more recently at smaller, mesoscopic (mm) scales [53]. Here, we adapted the algorithmic approach laid out in [53] to jointly reconstruct a stitched composite and 3D surface height map from all 54 individual MCAM sub-images. The central idea behind this computational procedure is to jointly match and register sample features viewed from multiple perspectives while converting parallax distortions to an estimated sample height. Unlike light-field microscopes, the multi-view MCAM can jointly provide a significantly enlarged image FOV with hundreds of resolved megapixels per snapshot, in addition to a 3D height map with an estimated axial sensitivity of 42 µm. Figure 4(c) demonstrates 3D surface height map formation for a leaf across a 135 cm² area, from which the 3D nature of the midrib is clearly highlighted.

Gigapixel video over a continuous area
For the next demonstration of MCAM imaging, we arranged the multi-camera array for continuous area imaging at a working distance of WD_c = 140 mm, which creates a small amount (approx. 5-10%) of inter-camera overlap (Figure 5(a)). As shown in Figure 2, this configuration provides a higher magnification and spatial resolution for contiguous area imaging as compared to the multi-view arrangement.
A video of a viscous fluid mixture (areas dyed red and green for visualization) was recorded at a frame rate of 5 Hz for a duration of 21 seconds (115 frames were recorded within 128 GB of RAM). Two exemplary frames highlighting liquid movement are shown in Figure 5(b) and an example recording is provided in Visualization 1, wherein the dynamics of small particulate matter are clearly observable. As detailed in the Discussion, current MCAM frame rates are limited by FPGA-to-workstation data transmission. Video data can be acquired at higher frame rates across the same contiguous area by reducing per-frame data with on-chip pixel binning, wherein adjacent pixel values are combined pre-transmission for a reduced final frame pixel count. For example, with the CMOS sensors and electronics utilized in the current configuration, a frame rate of approximately 28 frames per second is achieved with the use of 2X pixel binning, at the expense of a 2X lower pixel-limited resolution.
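The effect of 2X binning on per-frame data volume can be illustrated in software. The NumPy sketch below sums non-overlapping 2 x 2 pixel blocks; it is an assumption-laden stand-in for the on-chip operation, since whether the sensor sums or averages neighboring pixels (and how it treats color filter patterns) is not specified in the text.

```python
import numpy as np


def bin_pixels(frame, factor=2):
    """Sum non-overlapping factor x factor blocks of a 2D frame."""
    if frame.ndim != 2:
        raise ValueError("expected a 2D (single-channel) frame")
    h, w = frame.shape
    h_c, w_c = h - h % factor, w - w % factor   # crop to a multiple of factor
    blocks = frame[:h_c, :w_c].reshape(h_c // factor, factor, w_c // factor, factor)
    return blocks.sum(axis=(1, 3))


# Synthetic 8-bit frame with the full-frame sensor dimensions quoted in the text.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(3120, 4208), dtype=np.uint8)
binned = bin_pixels(frame, factor=2)
print(frame.shape, "->", binned.shape)
print("pixel count reduction:", frame.size / binned.size)
```

Halving the pixel count along each axis reduces the data per frame by a factor of four, which is consistent with the roughly four-fold frame rate increase quoted for 2X binning on a transmission-limited link.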

9.8 gigapixel tiled imaging
Finally, to demonstrate higher resolution MCAM imaging, we created a slightly modified optical arrangement and outfitted the specimen plane for limited translation scanning. For each of the 54 micro-cameras in the tiled imaging configuration, we arranged two of the same lenses used for multi-view and continuous imaging in a 4f configuration. This produced 54 unique 1x magnification images, each with an approximate 2 µm full-pitch resolution limit. The maximum lateral travel distance for MCAM tiled imaging is specified by p, the inter-camera spacing, which here is 13.5 mm (Figure 6(a)). This particular tiled imaging setup requires 5 x 5 scans (25 snapshots) to cover the entire FOV, where each snapshot's FOV overlaps with adjacent snapshots by approximately 10% for effective stitching. The resulting dataset from our 25 scans contains 1350 unique micro-camera images (13 megapixels per image). Figure 6(b) shows a set of two macaque brain slices (75 µm thick) arranged on slides, which were simultaneously imaged using the tiled MCAM configuration to produce final composites with 9.8 GP in total. In this example, along with lateral scanning to fill in inter-camera FOV gaps, we also executed axial scanning (10 slices at 10 µm increments) to account for the uneven surface of the relatively thick tissue specimens. For each micro-camera at each lateral scan position, we selected the most in-focus image via a Laplacian-based contrast metric for final composite synthesis. The bottom-left images in Figure 6(b) show the CA2 region of the hippocampus, while the bottom-right images highlight an area exhibiting gliosis.
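The per-camera focus selection described above can be sketched as follows. The exact Laplacian-based metric is not specified in the text, so this example assumes one common choice, the variance of a 4-neighbor discrete Laplacian, and applies it to a synthetic axial stack; the function names are illustrative.

```python
import numpy as np


def laplacian_focus_score(image):
    """Variance of the 4-neighbor discrete Laplacian (higher means sharper)."""
    img = np.asarray(image, dtype=np.float64)
    lap = (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
           - 4.0 * img[1:-1, 1:-1])
    return float(lap.var())


def most_in_focus(z_stack):
    """Return the index of the sharpest slice in a stack, plus all scores."""
    scores = [laplacian_focus_score(s) for s in z_stack]
    return int(np.argmax(scores)), scores


# Synthetic stack: one sharp random texture between two box-blurred copies.
rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
blurred = (sharp + np.roll(sharp, 1, axis=0) + np.roll(sharp, 1, axis=1)
           + np.roll(sharp, (1, 1), axis=(0, 1))) / 4.0
best_index, scores = most_in_focus([blurred, sharp, blurred])
print("sharpest slice:", best_index)
```

In a real acquisition the same scoring would be applied independently to the axial stack of every micro-camera at every lateral scan position before stitching.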

Discussion and Conclusion
The three configurations of the MCAM system presented in this work are a natural starting point for exploring the utility of parallelized acquisition in microscopy. By providing gigapixel-scale single-snapshot throughput, the MCAM technology opens up new possibilities for wide-area high-resolution 3D imaging and video. There are a number of existing challenges and future directions for MCAM technology improvement that will likely lead to exciting follow-up work, which we summarize below.
One primary challenge faced by multi-aperture microscope designs is data management. In this work, we utilized a single FPGA to aggregate Mobile Industry Processor Interface (MIPI) data directly from all sensors, which led to several key benefits. First, the single FPGA allowed us to directly synchronize video capture across all 54 cameras within the array, leading to the ability to record gigapixel-scale video data without motion artifacts. Second, we were able to achieve a maximum data transmission rate of approximately 5 GB/sec, which corresponds to approximately 7 frames per second video (8 bits/pixel) with our 54-camera arrangement. It is possible to crop the sensors or reduce the camera count to increase the frame rate; for example, a 3072×3072 square crop yields ∼10 frames per second. Transfer at such high data rates must account for limitations both in transmission links and in drive write speeds. In our prototype, we utilized a PCIe connection between the computer and the MCAM and stored data on pre-allocated RAM space before moving to a solid-state drive, which enabled approximately 249 seconds of video capture at full resolution. As solid-state storage volumes and write speeds continue to improve, we anticipate the ability to increase data transmission rates, and thus imaging frame rates, using the current array. We also anticipate that improved data transmission will open the door to novel MCAM designs that offer even larger full-frame sizes. Compression methods are likewise available to dramatically reduce video sizes in a lossless manner, for example [54,55,56], and we see the FPGA as a key means to pre-process image data for additional future speed-up.
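The frame rates quoted above follow from simple bandwidth arithmetic. The Python sketch below assumes that the link rate of approximately 5 GB/sec is the only bottleneck and that pixels are transmitted at 8 bits each with no protocol overhead; the constant and function names are illustrative.

```python
N_CAMERAS = 54
FULL_FRAME = (3120, 4208)      # sensor pixel dimensions quoted in the text
BYTES_PER_PIXEL = 1            # 8 bits per pixel
LINK_BYTES_PER_S = 5e9         # approximate maximum transmission rate


def pixels_per_snapshot(frame_shape, n_cameras=N_CAMERAS):
    """Total pixel count of one synchronized snapshot across the array."""
    height, width = frame_shape
    return height * width * n_cameras


def max_frame_rate(frame_shape, link_bytes_per_s=LINK_BYTES_PER_S,
                   bytes_per_pixel=BYTES_PER_PIXEL, n_cameras=N_CAMERAS):
    """Upper bound on frames per second if the link is the only bottleneck."""
    bytes_per_frame = pixels_per_snapshot(frame_shape, n_cameras) * bytes_per_pixel
    return link_bytes_per_s / bytes_per_frame


for label, shape in (("full frame", FULL_FRAME),
                     ("3072 x 3072 crop", (3072, 3072)),
                     ("2X binned", (FULL_FRAME[0] // 2, FULL_FRAME[1] // 2))):
    gp = pixels_per_snapshot(shape) / 1e9
    print(f"{label}: {gp:.2f} GP per snapshot, up to {max_frame_rate(shape):.1f} fps")
```

Under these assumptions the estimate gives roughly 7, 10 and 28 frames per second for the full-frame, cropped and 2X binned cases, in line with the values reported in the text.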
A second challenge is reaching a higher resolution. Following the analysis in Section 2, it is clear that resolution can be improved beyond our initial demonstrations. For tiled imaging, custom-designed lenses with superior magnification could match the resolution performance of standard high-NA objective lenses over the demonstrated 8 x 12 cm² FOV. For continuous and multi-view imaging, one primary driver of resolution is the inter-camera spacing p. Future MCAM designs can readily be created with p < 13.5 mm for a proportional increase in resolution performance. Smaller pixels and/or alternative single-photon detector arrays [42,57,58] may also be adopted. Our demonstrated design utilized 4:3 format CMOS sensors. It is alternatively possible to utilize square sensors for uniform overlap in x and y, or highly rectangular sensors for multi-view imaging with an overlap in just one spatial dimension (i.e., to facilitate stereoscopic imaging with just two views per sample plane location), which may lead to improved-resolution multi-view imaging. Finally, a non-rectangular (e.g., hexagonal) micro-camera packing geometry may also yield future performance gains.
It is also desirable to ensure that the macroscopic objects imaged here remain in sharp focus. For each of the three demonstrated configurations, we used standard calibration processes to first focus all 54 lenses within our prototype array onto a common plane. Afterward, however, care was required to ensure that each macroscopic sample was placed, and remained, in focus. As the MCAM transitions to higher-resolution designs with shallower depths of field, several strategies may prove helpful. For example, a phase mask may be included for extended depth-of-field imaging [59]. A per-lens autofocus capability would enable in-focus imaging across curved specimens, or even across surfaces whose height changes as a function of time.
In terms of imaging FOV, a particular MCAM design's coverage is simply proportional to the number of micro-cameras included within an array. If a larger FOV is required, then it is possible to directly join multiple MCAMs to form a larger aggregate array. An example of such an aggregate MCAM array, which images over a significantly larger FOV, is shown in Figure 7(a). In this demonstration, we combined four identical, separately constructed 4 x 6 micro-camera arrays into an aggregate 8 x 12 array that contained 96 individual lens-sensor pairs (10 megapixels each, 0.96 GP per snapshot). As demonstrated in Figure 7(b, c), such a configuration can provide high-resolution continuous image capture over a large, macroscopic area covering approximately 18 x 24 cm². Additional details about this design can be found in Ref. [60].
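The pixel budget of such an aggregate array follows directly from the module geometry. The sketch below is illustrative only: the function name is hypothetical, and the 2 x 2 tiling of modules is an assumption inferred from four 4 x 6 arrays forming an 8 x 12 array.

```python
def aggregate_array(module_rows, module_cols, tiles_down, tiles_across, megapixels_per_camera):
    """Return (rows, cols, camera count, gigapixels per snapshot) for tiled MCAM modules."""
    rows = module_rows * tiles_down
    cols = module_cols * tiles_across
    n_cameras = rows * cols
    gigapixels = n_cameras * megapixels_per_camera / 1000.0
    return rows, cols, n_cameras, gigapixels


# Four 4 x 6 modules, assumed to be tiled 2 x 2, with 10-megapixel sensors.
rows, cols, n_cameras, gigapixels = aggregate_array(4, 6, 2, 2, 10)
print(f"{rows} x {cols} array, {n_cameras} cameras, {gigapixels:.2f} GP per snapshot")
```

With these inputs the calculation reproduces the 8 x 12 layout, 96 lens-sensor pairs, and 0.96 GP per snapshot stated above. The same function can be used to estimate the raw pixel count of other hypothetical module arrangements.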
The MCAM can also be adapted to a wide variety of alternative imaging modalities. We have recently demonstrated multi-aperture fluorescence imaging [60]. Polarization [61], phase contrast [62] and aperture coding [63] techniques can also be readily implemented to extract additional specimen information. Likewise, variable-angle dark-field and bright-field illumination is an alternative direction for novel continuous and multi-view MCAM setups; it has previously been explored only in tiled imaging configurations that do not image a continuous area [64]. As with all digital microscopes, software will continue to play an increasingly important role in the preprocessing and analysis of the MCAM's gigapixel-sized datasets. While this work performed all processing post-capture, future implementations may benefit from on-the-fly or hardware-accelerated image stitching and/or surface height estimation. For application-specific scenarios such as object tracking and feature detection, various methods may be used to significantly reduce acquired file sizes on-the-fly while still maintaining salient image information.

Data availability
Data underlying the results may be obtained from the authors upon reasonable request.

Supplemental document
See Supplement 1 for supporting content.