Deep Learning Enabled Scalable Calibration of a Dynamically Deformed Multimode Fiber

Multimode fibers (MMF) are miniaturized, flexible, and high‐capacity information channels, promising to open up new applications in endoscopic imaging. However, precise light control through an MMF with continuous deformations is still a challenge. Here, a scalable calibration framework for a dynamically deformed MMF using deep learning is proposed. The proof‐of‐concept experiments demonstrate that the proposed continual generative adversarial model has the ability to characterize the MMF transmission states sequentially and detect the fiber deformation using proximal reflection in real‐time synchronously, allowing self‐adaptively cross‐state focusing through a semi‐flexible MMF without distal access after the scalable calibration. This framework is a continual learning scheme under extreme memory constraints where the model is able to synthesize training data and prevent forgetting the previously learned bending states. The proposed method paves the way for the experimental realization of scalable calibration of a dynamically deformed MMF.


Introduction
Miniaturized ultrathin endoscopes play a vital role in minimally invasive surgery and biological applications such as in vivo fluorescence microscopy. There is a need to develop nonrigid endoscopes to access different body cavities through a small incision. Multimode fibers (MMF) are miniaturized, flexible, and high-capacity information channels that may meet these extremely high demands due to their small diameter down to tens of microns and their ability to bend into acute angles. [1][2][3][4][5][6][7][8][9] However, the nature of MMF transmission leads to the scrambling of incident wavefronts, resulting in random speckle patterns at the fiber output. A number of techniques in the adaptive optics domain have recently been developed to overcome this transmission degradation and to permit the desired light control through MMF. [3,10][11][12] Recently, advances in complex modulation of the phase or the intensity of a light beam were enabled by the development of light-shaping hardware such as spatial light modulators (SLM) and digital micromirror devices (DMD). [13,14] Subsequently, the transmission matrix (TM) measurement of MMFs was explored, [2,3,8,9,15,16] allowing potential endoscopic applications. [8,9,17,18] The MMF transmission state is highly sensitive to external perturbations and environmental changes. [19][20][21] When the fiber is disturbed, the transmission state changes, and the precalibrated TM is no longer valid for the new MMF transmission state. Therefore, a completely rigid MMF endoscope was usually used to overcome this limitation, where the fiber was calibrated once for one specific spatial conformation. [8,9,22,23] Nevertheless, a flexible or semi-flexible endoscope is necessary for some applications. In most biomedical applications, inserting the MMF into deep tissue would induce inevitable shape and temperature changes. Real-time compensation for distortions was proposed in ref. [24], but additional feedback hardware was still needed. It was also reported that imaging with a fiber bent within a restricted radius of curvature range was achieved by using a particular S-shaped configuration. [25] A detailed analysis of the propagation-invariant modes within TMs was conducted but only using graded-index MMFs. [26] Recently, more compressive sensing MMF imaging schemes were proposed toward robustness of light transport or fast imaging speed. [27][28][29][30][31][32] On the other hand, it has been widely recognized that reflected light transmitted back through the fiber is beneficial to detect fiber deformation and is promising for imaging without distal access, [33][34][35][36][37] which is critical for practical MMF imaging. One early attempt used a virtual coherent point light source placed at the distal fiber tip to dynamically compensate for bending, and the light was focused through a semi-flexible MMF for which the number of conformations is limited. [33] Subsequently, focusing was maintained as the fiber was maneuvered to the target site prior to imaging by the addition of a partial reflector to the distal fiber end. [34] Recently, Gordon et al. have proposed a method introducing a thin stack of structured metasurface reflectors at the distal facet of the fiber, to characterize MMF TMs for lensless imaging without distal access. [36] Also, the transpose relationship between the backward and forward transmission through an MMF was verified in ref. [37], revealing that the direct retrieval of the forward TM from a round-trip measurement was impeded by the symmetry.
Deep learning techniques have been successfully applied to simple geometries (such as MNIST digits, letters) reconstruction [38][39][40][41] and spatially distributed data transmission [42,43] through MMFs. Particularly, it was also suggested that the neural network learned and generalized different transmission states when the MMF was bent or subject to continuous transmission characteristic variations. [38,39] Further, it was also reported that transmission of natural scenes through an MMF up to 10 m was achieved by statistically reconstructing the inverse TM for the fiber. [44] Turpin et al. demonstrated light scattering control in transmission and reflection with neural networks. [45] To precisely control the light propagation using existing algorithms through a dynamically deformed MMF is very challenging. First, different from other scattering media, the transmission properties of MMF are extremely susceptible to the fiber's deformation. A precalibrated TM or a fixed neural network for the MMF transmission can only be applicable to the current fiber state. Second, previous studies about continuous MMF transmission variations have an assumption that these schemes have full access to data collected previously as the fiber is dynamically deformed, [39] which was impractical due to the limited memory on the machine. Third, most of the methods described above require access to the distal facet of the fiber, which is not feasible in most realistic usage scenarios without bulky distal optics.
Here, we propose and demonstrate a scalable calibration framework using deep learning for a dynamically deformed MMF under extreme memory constraints (EMC). [46,47] We term this deep-learning-based approach DI-GAN (Generative Adversarial Network for Deep Imaging) and use it to characterize the MMF transmission states sequentially and detect the fiber deformation in real-time synchronously, enabling self-adaptively cross-state focusing through a semi-flexible MMF without distal access after the scalable calibration. DI-GAN, modified based on CVAE-GAN, [48] is trained using a conditional GAN in a continual fashion with mutually matched triplets of 1) various randomly generated input patterns coupled into the proximal fiber end, 2) the corresponding transmitted speckle images, and 3) reflected speckle images captured at the distal and proximal ends, respectively, under different bending positions of a dynamically deformed MMF. The network architecture is shown in Figure S3 (Section S4, Supporting Information), and the network is trained using data collected with the experimental setup shown in Figure 1. This framework accurately reconstructs the input patterns from the transmitted speckle images through a fiber with continuous deformations, naturally leading to the calibration of the fiber at the corresponding bending position and enabling the input excitation wavefront prediction for focusing light at the distal tip of the fiber.
We also leverage the reflection information transmitted back to the proximal end of the fiber to detect the fiber conformation in real time, and then spots can be self-adaptively generated on the distal fiber facet by projecting the corresponding input patterns inferred by the network without distal access. Therefore, access to the distal end (to collect transmitted speckle images) is only needed during the training stage. After training, access to the distal end is no longer necessary because the reflection is available at the proximal end. To fully utilize the reflection information and simplify the setup, our proof-of-concept experiment deploys an MMF coupler rather than a standard MMF or a reflector (Figure S2, more details in Section S3, Supporting Information).
The remainder of this paper is organized as follows. First, using DI-GAN, we implement the scalable calibration of a semi-flexible MMF under EMC by investigating the input-output relationships under different MMF transmission states. Second, we demonstrate DI-GAN-based transmission inference using reflection, which indicates the deterministic one-to-one mapping between the transmission and the reflection over the same input pattern. Third, we utilize the reflected speckle image to identify the current bending state and achieve self-adaptively focusing at an arbitrary state before and after MMF deformation. Finally, we show another important feature of DI-GAN: it enables focusing performance monitoring at the proximal end without distal access.

Experimental Setup
A schematic diagram of our experimental setup is depicted in Figure 1. The experiments are performed on an MMF coupler (Thorlabs, TM50R5F1A). Each port of the MMF coupler has a 0.8-m length fiber lead with a core diameter of 50 μm and NA = 0.22. The 1 × 2 MMF coupler is designed to split light between Port 2 and Port 3 with a 50:50 coupling ratio. In our setup, we use Port 2 as our signal input and Port 1 as a transmission output. Port 3 is used to capture the reflected light from Port 1 using Camera 2. The system consists of three modules: 1) laser modulation module; 2) collection module; and 3) bending module. In the laser modulation module, a collimated laser beam from a continuous-wave diode-pumped laser (532 nm, Cobolt Samba 50) is expanded to match the area of our DMD (Vialux V-7001, ≈22 kHz). The DMD can modulate the laser beam into desired random patterns with binary spatial amplitudes. Each pixel of the input pattern occupies 4 × 4 DMD micromirror pixels. For example, a pattern of 24 × 24 pixels can be obtained by maintaining the total micromirror pixel numbers at 96 × 96. Lenses L1, L2, L3, and L4 are placed in a 4f setup to project the modulated beam to the back focal plane of a microscope objective lens OL1 (Nikon CFI Plan Achro 20X, NA = 0.4). A pinhole filters the first diffraction order in the Fourier plane of L3, blocking the remaining orders. The beam is then coupled into Port 2 of the MMF coupler by OL1. In the collection module, the output from the fiber facet of Port 1 and Port 3 is both collected and collimated by a microscope objective (OL2, OL3, Olympus PLN 20X, NA = 0.4) and tube lens (TL1, TL2, Thorlabs AC254-200-A-ML). Camera 1 (QImaging optiMOS) and Camera 2 (JAI SP-500M-CXP4) are triggered simultaneously by the signal from the DMD. This allows two cameras to capture corresponding speckle images from the fiber facet when the DMD projects random patterns.
In the bending module, as shown in Figure 1, the pigtail of the MMF in Port 1 is initially placed between three opposing optical posts. The post in the middle is mounted on a single-axis translation stage. The bending state of the MMF can be changed by tuning the position of the post at a displacement step of 1 mm. In our proof-of-concept experiments, the data sets are collected when the fiber pigtail is sequentially bent to 10 different positions (from 0 to 9 mm). To perform more continuous control of fiber deformation, one can use a three-axis motorized translation stage (more discussions in Section 4). Examples of transmitted and reflected speckle images are illustrated in Figure S1a and S2a in Supporting Information, respectively.
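The macro-pixel mapping described for the DMD (each input-pattern pixel occupying 4 × 4 micromirrors, so that a 24 × 24 pattern fills 96 × 96 micromirrors) can be illustrated with a minimal sketch. The function name `to_micromirror_frame` and the use of `np.kron` are our own illustrative choices, not taken from the original setup software.

```python
import numpy as np

MACRO = 4          # micromirrors per pattern pixel along each axis (from the text)
PATTERN_SIZE = 24  # the input pattern is 24 x 24 pixels (from the text)


def to_micromirror_frame(pattern: np.ndarray, macro: int = MACRO) -> np.ndarray:
    """Expand every pattern pixel into a macro x macro block of micromirrors.

    Hypothetical helper: the Kronecker product with a block of ones repeats
    each binary pixel value over its macro-pixel.
    """
    block = np.ones((macro, macro), dtype=pattern.dtype)
    return np.kron(pattern, block)


# Example binary pattern; the fixed seed is only for reproducibility of the sketch.
rng = np.random.default_rng(0)
pattern = rng.integers(0, 2, size=(PATTERN_SIZE, PATTERN_SIZE), dtype=np.uint8)
frame = to_micromirror_frame(pattern)
print(frame.shape)
```

Sampling the expanded frame at every fourth micromirror recovers the original pattern, which is a quick sanity check for the mapping.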

Data Preprocessing and Preparation of Training Data
The system shown in Figure 1 operates at 200 frames per second (i.e., 200 input-transmission-reflection triplets per second), and 10 000 transmitted speckle images and reflected speckle images are collected for each fiber bending state sequentially. The 10 000 input patterns displayed on the DMD are randomly generated with an 'ON' to 'OFF' pixel ratio of 50:50 (implemented using random.randint function within the Python Numpy Library) in a 24 × 24 square configuration. Both transmission and reflection images are then downsampled to 96 × 96 pixels, using resize function with INTER_AREA interpolation within the Python cv2 Library to speed up training and reduce computer memory usage, without losing image information. [49] The dynamic range of image intensity is subsequently normalized to [-1, 1] in order to stabilize the training of the GAN network. For each data set collected at each bending state, 90% of the data (i.e., 9000 data triplets) are randomly selected for training using the train_test_split function within the Python Scikit-learn Library, and the remaining 10% of data form the testing data set.
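The preprocessing steps above can be sketched as follows. This is a NumPy-only illustration under stated assumptions: block averaging stands in for the cv2 `resize` call with `INTER_AREA` and assumes the raw image size is an integer multiple of 96; per-image min-max scaling is one possible way to reach [-1, 1], since the text does not specify the exact normalization; and an index permutation stands in for `train_test_split`. All function names are hypothetical.

```python
import numpy as np

np.random.seed(0)  # fixed seed only so that the sketch is reproducible


def random_dmd_patterns(n: int, size: int = 24) -> np.ndarray:
    """Binary patterns with each pixel ON or OFF with equal probability."""
    return np.random.randint(0, 2, size=(n, size, size), dtype=np.uint8)


def downsample_area(img: np.ndarray, out_size: int = 96) -> np.ndarray:
    """Block-average downsampling (stand-in for area interpolation)."""
    h, w = img.shape
    fy, fx = h // out_size, w // out_size
    if fy * out_size != h or fx * out_size != w:
        raise ValueError("image size must be an integer multiple of out_size")
    return img.reshape(out_size, fy, out_size, fx).mean(axis=(1, 3))


def normalize_to_pm1(img: np.ndarray) -> np.ndarray:
    """Min-max scale one image to [-1, 1] (assumed normalization)."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return 2.0 * (img - lo) / (hi - lo) - 1.0


def split_indices(n: int, train_fraction: float = 0.9, seed: int = 0):
    """Random train/test index split (stand-in for train_test_split)."""
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(round(n * train_fraction))
    return order[:n_train], order[n_train:]


patterns = random_dmd_patterns(10_000)
train_idx, test_idx = split_indices(len(patterns))
print(len(train_idx), len(test_idx))
```

With 10 000 triplets per bending state, the split yields the 9000 training and 1000 testing triplets stated in the text.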

Continual Generative Adversarial Model under EMC: DI-GAN
The proposal focuses on the accessibility to training data and the scalability of the model. This issue is often termed continual learning or incremental learning. [50] One of the most straightforward solutions for this problem is sequential fine tuning (SFT), where the network is trained sequentially with a sequence of independent tasks. The parameters of a network are first trained on previous data sets and then are fine tuned using the new data set to learn the current task. Nevertheless, this method will inevitably result in another problem, catastrophic forgetting, [51] where the network forgets its previous learning task. The reason is straightforward: the parameters of the network (optimized using the previous data set) have been tuned to adapt to the latest data distribution. Numerous efforts have been made to design network architectures and training algorithms that can prevent or at least alleviate catastrophic forgetting. [52] Based on how the task data are stored and utilized through the sequential learning process, there are three families of methods to prevent forgetting: replay methods, [53][54][55] regularization-based methods, [51,56] and parameter isolation methods. [57,58]
Figure 1. The experimental setup used to obtain the data under different bending states. The beam is expanded, collimated, and directed onto the DMD, the reflection of which is coupled into Port 2 of the MMF coupler. The transmission output from Port 1 of the MMF coupler is captured by Camera 1. Simultaneously, the reflection output from Port 3 of the MMF coupler is captured by Camera 2. The MMF pigtail in Port 1 is bent using three opposing optical posts, the middle one of which is mounted on a single-axis translation stage (shown in the inset below). L1-L4: bi-convex lenses; TL1, TL2: tube lens; DMD: digital micromirror device; OL1-OL3: objective lens; MMF: multimode fiber.
Inspired by these previous studies and given the impossibility of storing all data collected during the MMF bending, we propose our continual generative adversarial model under EMC. Instead of learning the different bending states of the fiber jointly at the same time in a single training process under the assumption that all data for training are available beforehand, [39] we develop a continual learning scheme under EMC where the model is able to synthesize training data for itself and the memory usage is extremely limited during the course of training. Compared with a conventional neural network, which forgets the previous states after transfer learning, our proposed scheme permits continual generalization of all bending states after the scalable calibration process. As shown in Figure 2, the DI-GAN updating is based on the mechanism of replaying memories from previous transmission states to consolidate them while learning new ones, preventing the models from forgetting the previous bending states. [59] Here, the memories refer to the image data synthesized by the previously trained model, as distinct from computer memory. If the synthesized image data match the data distribution of the ground truth very well, there is no need to allocate computer memory for data storage, and the training of the models satisfies the EMC. In our case, the training data set in each round contains 9000 newly collected data for the current bending state and 9000 × i network-synthesized data for the previous bending states, where i is the number of previous bending positions. For example, as shown in Figure 2, after training using the data set collected at the first bending position, the models are able to synthesize transmission and reflection data that match the data distribution of the first bending position very well. The fiber is then bent to the second position, so another data set is collected. In the next round of training, the newly collected data and the data synthesized by the previously trained models will be jointly used to train the new models. The training process continues until the MMF dynamic bending ends, and then the networks are able to generalize all bending states. See more details in Section S4 and S5, Supporting Information.
Figure 3a demonstrates DI-GAN-based DMD pattern reconstruction with high accuracy at different bending states (represented by digits from '0' to '9'), each one of which indicates the distance by which the fiber is displaced from its original position. By inputting the transmitted speckle images measured at the corresponding bending states into the DI-GAN (Encoder network), we recover the DMD patterns coupled into the fiber with ≈100% accuracy. Note that the high fidelity of the reconstructed DMD input patterns confirms the accurate characterization of the fiber with altered positions.
To further quantify DI-GAN (Encoder network) output, we calculate various average performance evaluations (see details in Section S1, Supporting Information); Table 1 illustrates that the reconstructed DMD patterns calculated from transmitted speckle images agree with the true binary DMD patterns at each MMF state very well.
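The composition of the per-round training set described in this section (9000 newly measured triplets plus 9000 × i synthesized triplets for the i previous bending states) can be sketched as a simple bookkeeping routine. The functions `collect_new` and `synthesize` are hypothetical placeholders that return labels instead of image triplets; in the actual framework they would correspond to the camera acquisition and to the previously trained DI-GAN networks, respectively.

```python
from typing import List, Tuple

# (origin, bending_state, sample_index); a placeholder for a real image triplet.
Triplet = Tuple[str, int, int]

N_TRAIN = 9000  # training triplets per bending state (from the text)


def collect_new(state: int) -> List[Triplet]:
    """Placeholder for measuring triplets at the current bending state."""
    return [("measured", state, k) for k in range(N_TRAIN)]


def synthesize(state: int) -> List[Triplet]:
    """Placeholder for replaying a previous state with the trained model."""
    return [("synthesized", state, k) for k in range(N_TRAIN)]


def build_round_dataset(current_state: int) -> List[Triplet]:
    """Measured data for the current state plus replayed data for earlier ones."""
    data = collect_new(current_state)
    for previous_state in range(current_state):
        data.extend(synthesize(previous_state))
    return data


for state in range(3):
    round_data = build_round_dataset(state)
    measured = sum(1 for origin, _, _ in round_data if origin == "measured")
    print(state, len(round_data), measured)
```

In this sketch only the 9000 measured triplets of the current state ever need to be stored; everything for earlier states is regenerated on demand, which is the sense in which the scheme operates under EMC.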

Scalable Calibration of a Semi-flexible MMF Using DI-GAN
Next, we apply DI-GAN to the data synthesis task for training future models continuously without forgetting the previous learning. As detailed in Section 2.3, the continual generative adversarial model, DI-GAN, is trained using the data triplets of images sequentially without direct access to the previously collected data under EMC. For these data triplets, we can easily generate a mass of random DMD patterns, and the trained DI-GAN will quickly synthesize realistic reflected speckle images (using Decoder network) and transmitted speckle images (using Generator network) over the corresponding bending states. The synthesized data triplets for the previous bending states are combined with the newly collected data triplets for the current state, and together they form the training data set for the next scalable calibration process. Figure 3b demonstrates that the predicted [...]
Figure 2. Illustration of the continual learning scheme under EMC. Training is first done on the data set available at the first bending state. After that, when the fiber is bent to the second position, the new data set is collected and used together with the network-synthesized data to train the network. The training process continues until the MMF dynamic bending ends, and then the networks are able to generalize all bending states.
As illustrated in Figure 3c, we compare the scalability between the proposal and a simple baseline, SFT, by showing the reconstruction of standard bars pattern after sequentially training the ten tasks (left to right: 0 to 9 mm). We observe that SFT completely forgets previous tasks in all data sets it learned before and can only reconstruct the DMD patterns at the last bending state, while the input patterns recovered by DI-GAN are in general clear and much more recognizable. This demonstrates that our network has a certain extrapolation ability.

DI-GAN-based Transmission Inference Using Reflection
Toward complete data synthesis, DI-GAN can also be used to perform transmitted speckle image inference, where DI-GAN (Generator network) can be trained using data pairs of DMD patterns and reflected speckle images as the model inputs. Figure 4 demonstrates our blind testing results for DI-GAN-based transmission inference using reflection. The trained Generator network digitally transfers the reflected speckle image into a transmitted speckle image while retaining high contrast, matching the ground truth to a high degree of similarity. For example, the predicted transmitted speckle image in the upper row of Figure 4a has a PCC of 0.9829 and an SSIM of 0.9554 with the ground truth image. To further qualify the performance of the proposed model architecture (using both DMD pattern and reflection as combined input, termed as DR2T-Generator), we also train a model only using reflection to infer transmission (termed as R2T-Generator). The R2T-Generator-inferred transmitted speckle images are also shown in the left column of Figure 4b for comparison, outputting more blurred images with smaller PCC and SSIM values. The pixels of DR2T-Generator-inferred transmitted speckle images are substantially sharper as compared to the R2T-Generator-inferred ones, providing a good match to the ground truth. The absolute difference images of the DR2T-Generator results and R2T-Generator results with respect to the corresponding ground-truth images are also provided on the right of Figure 4b, with MSE values reported, further demonstrating the success of [...]
So far, in Sections 3.1 and 3.2, we have blindly tested both transmission and reflection synthesis using DI-GAN (Decoder network and Generator network, respectively). Specifically, DI-GAN-based transmission inference using reflection indicates the deterministic one-to-one mapping between the transmission and the reflection over the same input pattern.
Although not demonstrated here, the reflected speckle image also carries the complete information of the input pattern, which can be recovered through the reflection with high accuracy, too (similar to what has been shown in Figure 3a). We have evaluated the data synthesis performance of DI-GAN under different metrics, which reveal the effectiveness and robustness of DI-GAN to the sequential learning task under EMC.
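Two of the similarity metrics quoted in this section, the Pearson correlation coefficient (PCC) and the mean squared error (MSE), can be computed as in the sketch below. These are the standard textbook definitions and are an assumption on our part; the exact definitions used for the reported values are those given in Section S1 of the Supporting Information. SSIM is omitted because it requires a windowed computation.

```python
import numpy as np


def pcc(a, b) -> float:
    """Pearson correlation coefficient between two images of equal shape."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0:
        raise ValueError("PCC is undefined for a constant image")
    return float((a * b).sum() / denom)


def mse(a, b) -> float:
    """Mean squared error between two images of equal shape."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


# Synthetic stand-ins for a ground-truth and a predicted 96 x 96 speckle image.
rng = np.random.default_rng(0)
truth = rng.random((96, 96))
predicted = truth + 0.05 * rng.standard_normal((96, 96))
print(pcc(truth, predicted), mse(truth, predicted))
```

A PCC close to 1 together with a small MSE indicates that the predicted speckle closely follows the measured one, which is how the values quoted for Figure 4 should be read.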

Self-adaptively Cross-state Focusing through a Semi-flexible MMF
In Section 3.1, we have demonstrated the ability of DI-GAN (Encoder network) to reconstruct the DMD input patterns with ≈100% accuracy. As demonstrated in ref. [45], the focus can be generated by using a neural network when the fiber conformation is kept stationary. To highlight the utility of DI-GAN for cross-state focusing through a dynamically deformed MMF, we take advantage of the DI-GAN to output the desired DMD input patterns for spot generation at the distal tip of the fiber. In Figure 3a, the inputs of the DI-GAN (Encoder network) are transmitted speckle images, while here, we directly replace the inputs with images with a focus at a desired location across the speckle. As illustrated in Figure 5, by appending the corresponding bending states, the different DMD patterns for focusing are digitally generated by the network. For one specific bending state, by inputting images with spots at different locations sequentially, the network can quickly output a series of input patterns for focusing. It has also been reported that focus could be generated at the distal end of MMF using TM information. [60,61] Specifically, the position of the focus is controlled by choosing the corresponding row in the TM; to implement the complex conjugate of one row of the TM as the DMD input field, the DMD pixels where the real part is positive are selectively turned on. For further comparison, we also show the input patterns generated by TM in the rightmost column in Figure 5. As can be seen in this comparison, the distribution of the pixels across DI-GAN-predicted input patterns is more separate and sparser than the TM-predicted ones, indicating that the DI-GAN enables more precise control of light by providing better modulation patterns.
Figure 5. DMD input patterns generation for cross-state focusing. By appending a bending state (digits from '0' to '9') to an image with a focus at the desired location (left) and passing it through a trained DI-GAN (Encoder network), a predicted DMD input pattern for focusing (middle) can be digitally obtained. The output pattern is then binarized using a threshold of 0.5 to obtain a binary DMD pattern (right). The DMD input patterns predicted by TM are also provided in the rightmost column, with denser and larger patches. Scale bars, 10 μm.
Next, we project the input patterns predicted by DI-GAN and TM onto the DMD, respectively, and observe the focus images captured at the distal fiber end. Figure 6 illustrates the focusing performance of these two methods (two DMD patterns highlighted using red dotted squares from the bottom row in Figure 5), showing an example of a focus that is around the fiber end center. The first row (Figure 6a-c) shows the focus evaluation of the DI-GAN method, and the second row (Figure 6d-f) shows the focus evaluation of the TM method. For comparison, we calculated the enhancement factor (EF, detailed in Section S2, Supporting Information) of these two foci. The EF of the DI-GAN-generated focus (about 74.45) is nearly double that of the TM-generated one (about 39.39). Besides, the full width at half maximum (FWHM, detailed in Section S2, Supporting Information) of the focus is calculated by performing a 2D Gaussian fit over the region of interest (ROI) around the focus center that covers ≈10 × 10 μm² (Figure 6b,e, zoom-in areas in the red dotted squares from Figure 6a,d). The intensity profiles of the foci along the x and y axes, plotted by fitting the data with a 2D Gaussian function, are also shown in Figure 6c,f, respectively. For example, in Figure 6c, the intensity profile of the focus in the ROI is shown using a blue mesh grid, and the multi-colored smooth surface represents the intensity 2D Gaussian fit, which is then projected onto the xoz and yoz planes, respectively. It is clear that the focus generated using DI-GAN has smaller FWHM (about 1.63 μm along the x-axis and 1.66 μm along the y-axis) than the TM one (about 2.18 μm along the x-axis and 2.15 μm along the y-axis).
The theoretical value of FWHM (Abbe diffraction limit, calculated as λ/(2NA)) in our case is around 1.21 μm, and the degradation results from the difference between the ideal input pattern and the practical input pattern. Since the DMD only supports binary amplitude modulation rather than continuous modulation (as with a liquid crystal on silicon SLM), such field transformation will inevitably result in a certain difference from the ideal case. Furthermore, we observe that the DI-GAN significantly improves the signal-to-noise ratio (SNR, detailed in Section S2, Supporting Information) of the focus, from 26.06 to 52.58, more than twice the SNR of the TM-generated one. In addition to focusing at the center of the speckle, a video showing the scanning with multiple spots enabled by the TM and by the proposed model can be found via the link given in Supporting Information as ref. [5], demonstrating that the CNN model has high enhancement and darker background.
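The figures quoted above can be checked with a few lines of arithmetic. The sketch below evaluates the Abbe limit λ/(2NA) for the 532 nm laser and the fiber NA of 0.22 stated in the Experimental Setup, and the EF and SNR ratios between the DI-GAN and TM foci. The relation between the FWHM and the standard deviation σ of a Gaussian is the standard one; whether the 2D fit in Section S2 is parameterized by σ is an assumption.

```python
import math

WAVELENGTH_UM = 0.532  # laser wavelength in micrometers (from the text)
NA_FIBER = 0.22        # numerical aperture of the fiber (from the text)


def abbe_limit_um(wavelength_um: float, na: float) -> float:
    """Abbe diffraction limit, lambda / (2 NA), in micrometers."""
    return wavelength_um / (2.0 * na)


def gaussian_fwhm(sigma: float) -> float:
    """FWHM of a Gaussian with standard deviation sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


abbe = abbe_limit_um(WAVELENGTH_UM, NA_FIBER)  # about 1.21 um
ef_ratio = 74.45 / 39.39                       # DI-GAN EF over TM EF, about 1.89
snr_ratio = 52.58 / 26.06                      # DI-GAN SNR over TM SNR, about 2.02

print(f"Abbe limit: {abbe:.2f} um")
print(f"EF ratio:   {ef_ratio:.2f}")
print(f"SNR ratio:  {snr_ratio:.2f}")
```

The computed values agree with the statements in the text: the theoretical FWHM is about 1.21 μm, the EF is nearly doubled, and the SNR is slightly more than doubled.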
Figure 6. Focus generation around the fiber end center with DI-GAN and TM. a) and d) Foci generated by projecting the two DMD input patterns using DI-GAN and TM highlighted in the bottom row in Figure 5. b) and e) Zoomed areas from the ROI (red dotted squares, covering ≈10 × 10 μm², i.e., 18 × 18 pixels) in (a) and (d) with 3σ circle area and FWHM values. c) and f) 2D Gaussian fit over (b) and (e). The blue mesh grid shows the intensity profile of the focus, and the multicolored smooth surface represents the intensity 2D Gaussian fit, which is then projected onto the xoz and yoz planes, respectively. FWHM values along the x-axis and y-axis are also provided. Experiments were repeated over 10 bending states, achieving similar results. Grid pitch: 0.5537 μm pixel⁻¹ (see calculation in Section S2, Supporting Information). Scale bars, 10 μm.
DI-GAN not only substantially surpasses the TM in focusing ability but also permits focusing performance monitoring at the proximal end without distal access. For MMF-based endoscope applications using scanning-based imaging technology, where the fiber tip would be inserted inside the deep tissue, the imaging quality primarily relies on the focusing performance, which cannot be evaluated directly. We show for the first time, to the best of our knowledge, that the trained DI-GAN is able to monitor the focusing performance at the proximal end without distal access. Figure 4a has shown that the transmitted speckle images at the distal fiber end could be predicted using the corresponding DMD patterns and reflected speckle images; similarly, as plotted in Figure 7, we use the DI-GAN (Generator network) to predict the focusing at the distal fiber end under the scenario where the distal tip is not accessible. A predicted focus is visible with high similarity to the experimentally captured focus, albeit with a 0.54 μm larger FWHM.
The last feature of the proposed DI-GAN is that the Classifier network enables real-time bending state identification during monitoring or focusing. Using the reflected speckle image as the network input, the Classifier network can quickly output its corresponding fiber bending conformation (classification onto one of the 10 classes), with 100% accuracy in our case. Then, we are able to detect whether the current fiber configuration is changed, and recall the corresponding DMD input patterns for focusing from the DMD's on-board memory. By doing so, we can achieve self-adaptively focusing at an arbitrary state before and after MMF bending. Figure 8 shows the process of self-adaptively cross-state focusing through a semi-flexible MMF to solve the bending problem. Before the focusing starts, the framework initializes itself to identify the current fiber deformation. In the initialization, the DMD tentatively projects arbitrary patterns and the reflected speckle images are observed at the proximal end, so the fiber bending state can be identified in real-time with the help of DI-GAN (Classifier network). Next, the DMD input patterns over each corresponding bending state are recalled and projected for focusing. Simultaneously, the focusing performance is monitored using DI-GAN (Generator network) and the fiber deformation is detected using DI-GAN (Classifier network) at the proximal end continuously. During the scanning, if the network detects a change of the fiber bending state, the different DMD input patterns for focusing are recalled. Therefore, self-adaptively focusing at an arbitrary state before and after MMF bending can be achieved.
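The recall logic described above (identify the bending state from each reflected speckle image, and recall a different stored DMD pattern whenever the state changes) can be sketched as a small control loop. Everything named here is hypothetical: `classify` stands in for the DI-GAN Classifier network, `pattern_bank` stands in for the patterns stored in the DMD's on-board memory, and the reflected frames are replaced by integer labels so that the sketch runs without hardware or a trained model.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def adaptive_focusing(
    reflections: Sequence[object],
    classify: Callable[[object], int],
    pattern_bank: Dict[int, str],
) -> List[Tuple[int, int, str]]:
    """Return one (frame_index, state, pattern) event per pattern recall.

    A recall happens at initialization and whenever the classified bending
    state differs from the state of the previous frame.
    """
    events: List[Tuple[int, int, str]] = []
    current_state = None
    for index, frame in enumerate(reflections):
        state = classify(frame)
        if state != current_state:
            # Deformation detected (or first frame): recall the stored pattern.
            events.append((index, state, pattern_bank[state]))
            current_state = state
    return events


# Toy stand-ins: ten stored patterns and a classifier that reads the label.
bank = {state: f"pattern_for_state_{state}" for state in range(10)}
frames = [0, 0, 3, 3, 3, 1]
recalls = adaptive_focusing(frames, classify=lambda frame: frame, pattern_bank=bank)
for event in recalls:
    print(event)
```

In this toy run the pattern is recalled three times: once at initialization and once for each of the two state changes in the frame sequence.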

Discussion and Conclusion
We have developed a unique framework, termed DI-GAN, powered by conditional GAN, that enables accurate input-pattern reconstruction through a fiber with continuous deformations and permits prediction of the input excitation wavefront for self-adaptive focusing of light at the distal tip of the fiber without distal access after the scalable calibration. This framework is a continual learning scheme under extreme memory constraints, in which the model is able to synthesize training data and prevent forgetting the previously learned bending states.
There are a few limiting factors in our proof-of-concept experiment. First, although it has been demonstrated that the deep learning method surpasses the TM method in its ability to precisely control light propagation through an MMF, we also observe that the focus quality achieved using neural networks is not uniform across the fiber end. The DMD used in our experiment supports only binary modulation, which yields an inevitable difference between the ideal input pattern and the actual input pattern. This difference in the input field at different positions results in a variation in the focus quality across the fiber end. To further improve the focus quality, more macro pixels on the DMD or a liquid crystal-based spatial light modulator (LC-SLM) could be used, since both approaches can substantially improve the effective modulation. On the other hand, for the current setup and model structure, the prediction of the DMD input pattern for focusing can be adjusted by changing the network input (focus intensity distribution), resulting in even, controllable, and measurable focus intensity delivery. Second, the number of MMF bending states in our proof-of-concept experiment is limited, and the cross-state calibration framework is only suitable for characterizing a semi-flexible MMF. Although we demonstrate only ten bending states within a small bending range (0 to 9 mm), the fiber can be bent over an arbitrary range because the different bending states are indicated using simple, discrete labels (digits from '0' to '9'). By appending the labels to the input images, the networks can generate the output images at the corresponding bending states. The discrete labels are, however, not fully descriptive and thus not suitable for referring to an arbitrary fiber deformation.
To extend the framework to a fully flexible fiber in the future, a continuous, user-defined reference that better indicates the fiber deformation needs to be explored and included in the training of the models. For example, the stage could be mounted on a three-axis motorized translation stage, allowing continuous, precise, and repeatable control of fiber deformation, and the data with a continuous reference could be collected accordingly. Third, we also note that training the continual learning model may be time consuming and challenging; however, this training is a one-time cost, and the model can be rapidly deployed once training is complete. Given the rapid development of high-performance computing technologies, neural networks may soon be trainable in a fraction of the time they currently require. Finally, the network architecture used in DI-GAN is fixed, which may fail to accommodate continually incoming tasks. A dynamic architecture expansion mechanism that ensures sufficient model capacity is considered feasible. In addition to the architecture, redesigning the loss function may also improve the data synthesis ability of the models.
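The label conditioning mentioned above, in which a discrete bending-state label is appended to each input image, can be illustrated as follows. The text does not specify how the labels are appended, so this sketch assumes one common conditional-GAN convention: the label is one-hot encoded and broadcast into constant planes that are stacked with the image as extra channels. The function name `append_state_label` and the constant `NUM_STATES` are hypothetical.

```python
import numpy as np

NUM_STATES = 10  # ten discrete bending states, labeled '0' to '9' in the text


def append_state_label(
    image: np.ndarray, state: int, num_states: int = NUM_STATES
) -> np.ndarray:
    """Stack a one-hot bending-state label onto an (H, W) image as extra channels.

    Returns an array of shape (1 + num_states, H, W): channel 0 is the image and
    channel 1 + state is all ones. The one-hot-plane encoding is an assumption,
    not a detail confirmed by the paper.
    """
    if image.ndim != 2:
        raise ValueError("expected a single-channel (H, W) image")
    if not 0 <= state < num_states:
        raise ValueError(f"state must be in [0, {num_states - 1}]")
    height, width = image.shape
    # One constant plane per state; only the plane of the active state is set.
    label_planes = np.zeros((num_states, height, width), dtype=image.dtype)
    label_planes[state] = 1
    return np.concatenate([image[np.newaxis], label_planes], axis=0)


if __name__ == "__main__":
    speckle = np.random.default_rng(0).random((8, 8)).astype(np.float32)
    conditioned = append_state_label(speckle, state=3)
    print(conditioned.shape)
```

A continuous deformation reference of the kind proposed above would replace the one-hot planes with real-valued planes (for example, stage coordinates), which is why the discrete encoding does not extend directly to arbitrary deformations.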
In summary, we have presented a scalable calibration framework for a dynamically deformed MMF under EMC using deep learning. Our proof-of-concept experiments demonstrate that the proposed continual generative adversarial model enables us to characterize the MMF transmission states sequentially and, synchronously, to detect the fiber deformation in real time, allowing self-adaptive cross-state focusing through a semi-flexible MMF without distal access after the scalable calibration. The proposed method paves the way for the experimental realization of scalable calibration of a dynamically deformed MMF and may lead to new flexible MMF-based endoscopes.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.