Machine-learning enhanced dark soliton detection in Bose-Einstein condensates

Most data in cold-atom experiments comes from images, the analysis of which is limited by our preconceptions of the patterns that could be present in the data. We focus on the well-defined case of detecting dark solitons -- appearing as local density depletions in a Bose-Einstein condensate (BEC) -- using a methodology that is extensible to the general task of pattern recognition in images of cold atoms. Studying soliton dynamics over a wide range of parameters requires the analysis of large datasets, making the existing human-inspection-based methodology a significant bottleneck. Here we describe an automated classification and positioning system for identifying localized excitations in atomic BECs utilizing deep convolutional neural networks to eliminate the need for human image examination. Furthermore, we openly publish our labeled dataset of dark solitons, the first of its kind, for further machine learning research.

Using cold-atom Bose-Einstein condensates (BECs), we focus on solitons: robust solitary waves that travel with unchanging size, shape, and speed [20,21]. These properties arise from an interplay between nonlinearity and dispersion that is present in many physical systems. Indeed, since their first observation in canals [22], solitons have been found in rivers and seas [23,24]; BECs [25,26]; optical fibers [27,28]; astronomical plasmas [29]; and even human blood vessels [30,31]. Due to their inherent stability, solitons in optical fibers [32] have found commercial applications in long-distance, high-speed transmission lines [33]. While the natural environment does not allow for the controlled study of quantum solitons, BECs are an excellent medium in which individual or multiple solitons can be created on demand, with all their properties, such as position and velocity, tuned as needed [34,35]. Most measurements in BEC experiments produce raw data in the form of images that, in our context, provide information about the solitons' positions within the BEC. The challenge is to efficiently and reliably identify the number of solitons and their locations. Traditional least-squares fitting techniques can locate solitons, provided that the soliton number is known in advance. Currently, the number of solitons is determined manually [35], and this human intervention inhibits the automated analysis of large datasets.

Figure 1. Schematic of the analysis flow: Step 1: measurement; Step 2: data preprocessing (raw data to processed data); Step 3: image classification (single soliton / no soliton / other excitation); Step 4: soliton positioning, feeding back to adjust control parameters.
Here, we describe our reliable automated soliton detection and positioning system, which takes image data as input and outputs whether a single soliton is present and, if so, its location. Since solitons are easily identified by human examination of images, this problem naturally connects to the field of computer vision and ConvNet-based image classification [36]. Our algorithm consists of a data preprocessor that converts raw data into a ConvNet-compatible format; a ConvNet image classifier that determines whether a single soliton is present; and a position regressor that locates the soliton within the BEC, when applicable (see figure 1 for a schematic of the analysis flow).
We show that our fully automated system performs comparably to our existing human image classifier, autonomously replicating the data analysis in Ref. [35]. In addition to developing a detection and positioning tool, we established a dataset of over 6 000 labeled experimental images of BECs with and without solitonic excitations; this dataset is available via the National Institute of Standards and Technology (NIST) Science Data Portal [37] and at data.gov.
The remainder of this paper is organized as follows: in section 2, we illustrate the workflow of the soliton detector and its preparation process. Then in section 3, we demonstrate the system, quantify its performance, and discuss the quality of the labeled dataset. Finally in section 4, we conclude and discuss possible future directions.

Soliton detection and positioning system
In this section we describe our fully automated method of soliton detection and positioning in images of BECs. Our four-step protocol, detailed in the following sections and depicted in figure 1, is outlined as follows.
Step 1: Measurement. The measurement consists of three raw images that are combined to produce a single image of the atomic density distribution.
Step 2: Data preprocessing. As shown in figure 1, the BEC is rotated with respect to the image frame orientation, and the region of interest where atoms are captured is a small fraction of the full image. To simplify soliton positioning, the data is first rotated to align the BEC orientation with the image frame and then cropped prior to the classification step.
Step 3: Image classification. The pre-trained ConvNet classifier determines whether a lone soliton is present in a given image. If so, step four is executed; otherwise, the image analysis terminates.
Step 4: Soliton positioning. The soliton position with respect to the BEC center is determined using a least-squares fit based on a one-dimensional (1D) model function.
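The four-step flow above can be sketched as a short pipeline. The function names and return values below are our own illustrative stand-ins, not the authors' implementation; each stage is stubbed out where the later sections give the details.

```python
import numpy as np

# Hypothetical stand-ins for the components described above; the names and
# return values are illustrative, not taken from the authors' code.
def preprocess(raw):
    """Step 2: rotate and crop the raw density image (stubbed)."""
    return raw

def classify(image):
    """Step 3: ConvNet classifier returning class probabilities (stubbed)."""
    return {"no soliton": 0.1, "single soliton": 0.8, "other excitations": 0.1}

def locate_soliton(image):
    """Step 4: 1D least-squares fit for the soliton position (stubbed)."""
    return 42.0  # illustrative position in pixels

def analyze(raw):
    """Run steps 2-4 on the combined absorption image from step 1."""
    image = preprocess(raw)
    probs = classify(image)
    label = max(probs, key=probs.get)
    if label != "single soliton":
        return label, None           # analysis terminates
    return label, locate_soliton(image)  # step 4 runs only for single solitons

label, position = analyze(np.zeros((132, 164)))
```

The early return mirrors the protocol: positioning is attempted only when the classifier reports exactly one soliton.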
We launch solitons using our recently developed 'improved' protocol, which simultaneously engineers the density and phase of the BEC wave function [35]. In contrast to the 'standard' protocol, which modifies only the BEC phase and can create solitons only within a small range of initial velocities, our protocol can create solitons with arbitrary initial velocity. The potentials for density engineering and phase imprinting are both generated by far-detuned laser light, spatially patterned by a digital micromirror device (DMD). Our protocol is summarized as follows: After the BEC is created, we reduce its local density by applying a repulsive dimple potential. Next, the DMD is reprogrammed to display a step function that illuminates only half of the BEC, imprinting the soliton's phase profile. To minimize creating additional density perturbations, the dimple potential is reapplied and its magnitude slowly ramped to zero. We note that our data contain additional solitonic excitations that, while representing different physical states (e.g., kink solitons, solitonic vortices, soliton rings, and so forth [40]), can result in similar images; we identify these simply as solitons in our analysis.
After solitons are created, we let them oscillate in the harmonic trapping potential for a variable evolution time. For evolution times much less than the trap period, additional density excitations from the soliton imprinting process are present. We then turn off the trapping potential and let the BEC evolve for a 15 ms time of flight, before absorption imaging the resulting density distribution [41].

Data preprocessing
We established a dataset of over 6.2 × 10³ images for ConvNet training; these images were taken from multiple experiments performed in a single lab over a span of two months. The raw images were obtained with a 648 × 488 pixel camera (Point Grey FL3) with 5.6 µm square pixels, labeled by i and j. Including the ≈ 6× magnification, each pixel has an effective size of 0.93 µm. The diffraction limit of the imaging system gives an optical resolution of ≈ 2.8 µm (roughly three pixels).
Absorption imaging combines three raw images into a single record of atomic density. In the first image I^A_{i,j}, a probe laser illuminates the BEC and the resulting intensity records the probe with the BEC's shadow. The second image I^P_{i,j} records only the probe intensity, and the third image I^BG_{i,j} is a dark frame containing any ambient background signal. The 2D column density follows from the Beer-Lambert law,

σ_0 n_{i,j} = −ln[(I^A_{i,j} − I^BG_{i,j}) / (I^P_{i,j} − I^BG_{i,j})],   (1)

where the resonant cross-section σ_0 = 3λ²/(2π) is derived from the wavelength λ of the probe laser. The dimensionless product σ_0 n_{i,j} is of order 1 in our data, so we express density in terms of this product. Figure 1 shows an example of the probe beam with atoms and the resulting density in the 'raw data' and 'image classifier' frames, respectively. In our raw data, the BEC occupies only a small region of the image, and the long axis of the BEC is rotated by about 43 degrees with respect to the camera. To facilitate ConvNet training, the images are rotated to align the BEC with the image frame and cropped to discard the large fraction of the image that contains no information about the BEC. Since the BEC's position and shape can vary between realizations of the same experiment, we implement a fitting approach to determine the position and size of the BEC.
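In code, the column-density computation is a few lines of numpy. The sketch below assumes a probe wavelength of 780 nm (our assumption; the text does not state λ) and clips the intensities to avoid taking the logarithm of non-positive values.

```python
import numpy as np

wavelength = 780e-9                        # assumed probe wavelength in m (not given in the text)
sigma0 = 3 * wavelength**2 / (2 * np.pi)   # resonant cross-section sigma_0 = 3*lambda^2/(2*pi)

def column_density_product(I_A, I_P, I_BG, eps=1e-9):
    """Dimensionless product sigma_0 * n_{i,j} from the three raw images:
    probe with shadow (I_A), probe only (I_P), and dark frame (I_BG)."""
    transmission = (I_A - I_BG) / np.clip(I_P - I_BG, eps, None)
    return -np.log(np.clip(transmission, eps, None))
```

For a pixel where the probe is attenuated by a factor e⁻¹, the returned product σ_0 n is exactly 1.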
Next, we fit every image to a column-integrated 3D Thomas-Fermi distribution [42], giving the 2D distribution

n_{i,j} = n_0 max(0, 1 − ((i − i_0)/R_i)² − ((j − j_0)/R_j)²)^{3/2} + δn.   (2)

This function describes the density distribution of a 3D BEC integrated along the imaging axis. We use six fit parameters: the BEC center coordinates [i_0, j_0]; the peak 2D density n_0; the Thomas-Fermi radii [R_i, R_j]; and an offset δn from small changes in probe intensity between images. Successful fitting requires acceptable initial guesses for all fit parameters. We obtained guesses for i_0 and j_0 by summing the image along the vertical and horizontal directions to obtain two 1D projections, from which we selected the average position of the five largest values as the initial guess. We took the largest value of the image as the guess for n_0 and used [R_i, R_j] = [66, 55] pixels, based on the typical radii over the whole dataset. The guess for the offset δn is zero. The results of these fits are included in our released dataset.
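This fitting step can be sketched with scipy; the synthetic image and parameter values below are illustrative, while the initial-guess heuristics follow the text.

```python
import numpy as np
from scipy.optimize import curve_fit

def thomas_fermi_2d(coords, i0, j0, n0, Ri, Rj, dn):
    """Column-integrated 3D Thomas-Fermi profile, flattened for curve_fit."""
    i, j = coords
    arg = 1.0 - ((i - i0) / Ri) ** 2 - ((j - j0) / Rj) ** 2
    return (n0 * np.clip(arg, 0.0, None) ** 1.5 + dn).ravel()

# Synthetic test image on a cropped-size grid (values illustrative)
ii, jj = np.meshgrid(np.arange(164.0), np.arange(132.0), indexing="ij")
true_params = (82.0, 66.0, 1.2, 66.0, 55.0, 0.0)
img = thomas_fermi_2d((ii, jj), *true_params).reshape(ii.shape)

# Initial guesses as described in the text
proj_i, proj_j = img.sum(axis=1), img.sum(axis=0)
i0_guess = np.argsort(proj_i)[-5:].mean()   # average position of the 5 largest values
j0_guess = np.argsort(proj_j)[-5:].mean()
p0 = (i0_guess, j0_guess, img.max(), 66.0, 55.0, 0.0)

popt, _ = curve_fit(thomas_fermi_2d, (ii, jj), img.ravel(), p0=p0)
```

With noiseless synthetic data the fit recovers the generating parameters; on real images the same guesses seed the fit of Eq. (2).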
We determined the 164 × 132 pixel extent of the cropping region by examining the radii [R_i, R_j] = [66(5), 58(3)] obtained from fits to 6.2 × 10³ images. We then centered the cropping region at [i_0, j_0] as determined from fits of each image separately. The process was validated on an additional 10⁴ images not included in our dataset. In the preprocessed images, dark solitons appear as vertically aligned density depletions and are easily visually identified (see top-left panel in figure 2(b)).
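A preprocessing sketch using scipy.ndimage is below. The rotation sign, interpolation order, and the assumption that the center is fitted after rotation are ours; only the 43-degree tilt and the 164 × 132 crop come from the text.

```python
import numpy as np
from scipy import ndimage

TILT_DEG = 43.0            # approximate BEC tilt relative to the camera frame
CROP_I, CROP_J = 164, 132  # cropping region determined from the fits

def rotate_and_crop(raw, i0, j0):
    """Rotate so the BEC long axis aligns with the image frame, then crop a
    fixed-size region centered on the fitted BEC center (i0, j0)."""
    rotated = ndimage.rotate(raw, -TILT_DEG, reshape=False, order=1)
    hi, hj = CROP_I // 2, CROP_J // 2
    return rotated[int(i0) - hi:int(i0) + hi, int(j0) - hj:int(j0) + hj]

# e.g. a full-frame 488 x 648 image cropped about its fitted center
patch = rotate_and_crop(np.zeros((488, 648)), i0=244, j0=324)
```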

Labeling
Three independent human labelers labeled the preprocessed data, categorizing the images into three classes: 'no soliton', 'single soliton', and 'other excitations'. The 'no soliton' class contains images that unambiguously contain no solitons; the 'single soliton' class describes images with one and only one soliton; and the 'other excitations' class covers any image that can be interpreted as neither 'no soliton' nor 'single soliton'. We did not include a separate 'two soliton' class in our demonstration because the small number of images with two solitons led to ineffective training.
The labeling process was carried out in eight batches, with each batch size limited by the attention span of the labelers. Once a given batch was completed, the resulting labels were compared and images with full agreement were set aside. The overall labeling agreement rate was 87 % (table 1 shows a comparison of the labeling agreement for all three classes), consistent across all batches. The remaining images were further analyzed and discussed until an agreement was reached. The final distribution of images between classes is as follows: 19.8 % in the no soliton class, 55.4 % in the single soliton class, and 24.8 % in the other excitations class. Figure 2(c) shows representative labeled images from each class. This labeled dataset was employed to train the ConvNet classifier and to test the positioning protocol.
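The per-batch agreement bookkeeping reduces to a few lines; a minimal sketch (label strings are illustrative):

```python
def agreement(labels_a, labels_b, labels_c):
    """Return the full-agreement rate and the indices of disputed images,
    which are set aside for further discussion."""
    disputed = [k for k, triple in enumerate(zip(labels_a, labels_b, labels_c))
                if len(set(triple)) > 1]
    rate = 1.0 - len(disputed) / len(labels_a)
    return rate, disputed

rate, disputed = agreement(
    ["single", "none", "other", "single"],
    ["single", "none", "single", "single"],
    ["single", "none", "other", "single"],
)
```

Images in `disputed` correspond to the 13 % of the dataset that required discussion before a consensus label was assigned.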

Image classification
Our ConvNet classifier, shown in figure 2(a), consists of five convolutional layers. Each layer is followed by a rectified linear unit (ReLU) activation function, defined as f(x) = max(0, x), and then a max pooling layer. The final max pooling layer is flattened and fully connected to a deep neural network with three hidden layers (256, 128, and 64 neurons, respectively) and an output layer (three neurons). Each hidden layer is followed by the ReLU activation function and, to reduce overfitting, a dropout layer that randomly eliminates neural connections with a frequency of 0.5 during each training stage. The output vector ξ = (ξ_1, ξ_2, ξ_3) is normalized by the softmax activation function, giving the final output probabilities P_m(ξ) = exp(ξ_m) / Σ_n exp(ξ_n).

The labeled dataset was divided into two subsets: 640 images (10.2 % of the dataset) were set aside as the testing set, while the remaining 5 617 images (89.8 %) were used for training during the model architecture development. Since our training dataset is unbalanced, i.e., its classes contain significantly different numbers of images, we balance it using augmentation techniques. We augment using three physically acceptable transformations: horizontal and vertical reflections, as well as a 180 degree rotation. All three transformations were applied to the no soliton and other excitations classes, increasing their size by a factor of four. For the single soliton class we used one randomly chosen transformation per image, doubling the size of this class. After augmentation, the three classes have a 0.28 : 0.38 : 0.34 fractional distribution. To model the small rotation angle present in different realizations of our BEC, we randomly rotate images by an angle in the range ±1 degree every time they are used during the training process. Finally, we applied an elliptical mask with radii [R_i, R_j] to each image, eliminating all technical noise outside the BEC to accelerate the training process.
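The balancing transformations and the softmax output stage can be sketched in numpy:

```python
import numpy as np

def augmentations(img):
    """The three physically acceptable transformations: horizontal reflection,
    vertical reflection, and 180-degree rotation."""
    return [np.flip(img, axis=1), np.flip(img, axis=0), np.rot90(img, 2)]

def softmax(xi):
    """Output probabilities P_m = exp(xi_m) / sum_n exp(xi_n)."""
    e = np.exp(xi - np.max(xi))   # shift by the max for numerical stability
    return e / e.sum()
```

Note that the 180-degree rotation is the composition of the two reflections, so the three transformations plus the identity form a closed set: augmenting twice never produces an orientation outside this set.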
Lastly, we preconditioned the data to have a range suitable for ConvNet input by uniformly scaling the image values to the [0, 1] range.
Since our testing dataset remains unbalanced, we assess the performance of trained models using the weighted F1 score [43]. When two models have similar weighted F1 scores, we first compare their accuracies as a tie-breaker, and if that fails we use the F1 score of the single soliton class.
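For concreteness, the weighted F1 score can be computed as follows; this is a minimal reimplementation (in practice a library routine such as scikit-learn's `f1_score` with `average='weighted'` does the same):

```python
import numpy as np

def weighted_f1(y_true, y_pred, classes=(0, 1, 2)):
    """Per-class F1 scores averaged with weights equal to each class's
    support, so that frequent classes dominate the overall score."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total, score = len(y_true), 0.0
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += np.sum(y_true == c) / total * f1
    return score
```

The support weighting is what makes the metric appropriate for an unbalanced test set: a class with few images cannot mask poor performance on the dominant classes.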
We used a semi-structured search through the model parameter space; the resulting performance for varying hyperparameters is detailed in appendix A.2. Once we determined the best performing model, we used a randomly selected 95 % of the training set for the final training. Training terminated when the F1 score on the remaining 5 % did not increase for five epochs, and we took the model prior to these five non-improving epochs as our final trained model. In summary, our model has a weighted F1 ≈ 0.9 and an accuracy ≈ 90 %, in excess of the 87.0 % human agreement ratio. The most frequent classifier errors conflate images from the single soliton class and the other excitations class: 6.9 % of the single soliton images are wrongly assigned to the other classes (P_1 < 0.2), and 4.3 % have no clear assignment (0.2 ≤ P_1 < 0.8). Figure 3(b) shows that the classifier works very well for the no soliton and single soliton classes. The classifier performs better when tested against data on which the human labelers initially agreed than against data on which they initially disagreed, suggesting that some disagreed-upon images may be truly ambiguous (see also the last column in table 2). In addition, we observe an anomalously large misclassification rate for human-agreed data in the other excitations class, resulting from the human labelers' use of this class when facing a dilemma. Furthermore, the wrongly classified data are distributed near the corners of figure 3(a), indicating a high degree of confidence in the misclassification.

Position regression
Once images containing only one soliton are identified, we locate the soliton position using a simple yet robust least-squares fitting procedure [43]. The first step consists of summing each 2D image along the j direction to obtain a 1D distribution n_i = Σ_j n_{i,j}. We fit the 1D distributions to the expected profile: that is, Eq. (2) integrated along the j direction. The initial guess for the peak density n^{1D}_0 was the maximum of the integrated distribution, and the remaining guesses were taken from the successful 2D fit. We subtract the fit from the integrated 1D profile to obtain the residuals Δ_i = n_i − n^{1D}_i. Ideally, this procedure results in a flat background containing a single dip, associated with the soliton, which we identify using a Gaussian fit. We use the minimum of Δ_i as the initial guess for the Gaussian amplitude, the position of the minimum as the initial center, 3 pixels for the width, and zero for the offset. This fit yields the soliton width, amplitude, and position.
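The residual fit can be sketched with scipy; the synthetic residual trace below stands in for real data, while the initial guesses follow the text.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(i, amp, center, width, offset):
    """Gaussian dip model for the 1D residuals (amp < 0 for a density depletion)."""
    return amp * np.exp(-0.5 * ((i - center) / width) ** 2) + offset

# Synthetic residuals: flat background with a single soliton dip (illustrative)
i = np.arange(164.0)
residuals = gaussian(i, amp=-0.4, center=90.0, width=3.0, offset=0.0)

# Initial guesses as in the text: minimum value and position, 3 px width, zero offset
p0 = (residuals.min(), i[np.argmin(residuals)], 3.0, 0.0)
popt, _ = curve_fit(gaussian, i, residuals, p0=p0)
amp, center, width, offset = popt
```

The fitted `center` is the soliton position relative to the image frame; subtracting the fitted BEC center i_0 gives the position relative to the condensate.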

Soliton detector
To test the performance of the fully automated soliton detection and positioning system, we use two sets of images containing oscillating dark solitons that were launched using the standard and improved protocols described in section 2.1, with 60 and 59 images, respectively.
In the first test, we used the improved-protocol dataset, with representative summed data n_i presented in the top panel of figure 4(a). As the solitons in these images are well pronounced, we expected the ConvNet to classify them easily. Out of 59 images, 52 were classified as single soliton and the remaining 7 as other excitations, in agreement with a human labeler. Solitons were then located in the first group by the positioning regressor (see figure 1). The middle and bottom rows in figure 4(a) plot the soliton position from manual and ConvNet identification, respectively. We fit i(t) = A sin(ωt + Φ) + i_0 to the soliton position data and compare the fitted parameters with those obtained from our previous manual approach. As can be seen by comparing the middle and bottom rows of figure 4(a), the performance of the automated protocol is essentially indistinguishable from the manual approach. The physical parameters from the ML classifier (A = 2(2) pixels and ω/2π = 2.3(7) Hz) were within one standard deviation of those obtained from manual soliton identification (A = 2(2) pixels and ω/2π = 2.3(6) Hz).
In the second test, we used images with solitons generated by the standard phase imprinting protocol. As can be seen in the top panel of figure 4(b), solitons in these images can be shallower than those in figure 4(a), making them potentially more difficult to distinguish from the no soliton and other excitations classes. Of the 60 images in this test, 22 were classified by the ConvNet as no soliton and 11 as other excitations, in agreement with a human labeler. The remaining 27 were classified as single soliton and were sent to the position regressor. The lower panels in figure 4(b) show the soliton position as a function of evolution time, obtained from manual [35] and ConvNet identification, respectively. Since Ref. [35] compared the soliton oscillation amplitude resulting from the two imprinting protocols, the authors did not limit themselves to images with a single soliton. Rather, when more than one soliton was created, the authors identified all the solitons but tracked only the one associated with a specific trajectory. Since the ConvNet classifier was trained to select images with single soliton excitations, the middle panel in figure 4(b) includes 12 more points than the bottom panel. Even with fewer data points, however, the parameters from the ML classifier (A = 34(3) pixels and ω/2π = 3.34(9) Hz) were within one standard deviation of those obtained from manual soliton identification (A = 35(2) pixels and ω/2π = 3.39(5) Hz).
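The oscillation fit used in both tests is a standard nonlinear least-squares problem; a sketch with synthetic data (the evolution times, noise level, and parameter values below are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def oscillation(t, A, omega, phi, i0):
    """Soliton position model i(t) = A sin(omega*t + phi) + i0."""
    return A * np.sin(omega * t + phi) + i0

rng = np.random.default_rng(seed=1)
t = np.linspace(0.0, 1.0, 40)                    # evolution times in s (illustrative)
true = (34.0, 2 * np.pi * 3.34, 0.3, 82.0)       # A [px], omega [rad/s], phi, i0 [px]
pos = oscillation(t, *true) + rng.normal(0.0, 1.0, t.size)

p0 = (30.0, 2 * np.pi * 3.3, 0.0, pos.mean())    # rough guesses near the data
popt, pcov = curve_fit(oscillation, t, pos, p0=p0)
perr = np.sqrt(np.diag(pcov))                    # 1-sigma parameter uncertainties
```

The square roots of the covariance diagonal give the one-standard-deviation uncertainties quoted for A and ω/2π in the comparisons above.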
The complete analysis resulting in both oscillation plots took under 148 s per series on a 2014 MacBook Pro. The expected performance relevant for in situ operation is ≈ 2.4 s per image, a relatively small overhead on top of the measurement time (about 12 s). In many cases, moreover, the analysis of an image can take place during the acquisition of the next image.

Soliton dataset
As with all ML techniques, the availability of the training data is essential for good performance of the trained classifier. To assure the reliability of the assigned labels, the full dataset was independently labeled by three labelers, as described in section 2.2. Our full soliton image dataset consists of 6 257 labeled images. There are 1 237, 3 468, and 1 552 images for no soliton, single soliton, and other excitations classes, respectively.
While for 5 445 (87.0 %) of the images the assigned labels were consistent between labelers, for the remaining 812 images (13.0 %) at least one labeler disagreed. These images were discussed further until an agreement was reached. As can be seen in table 1, the most challenging task was distinguishing between images with a single soliton and other excitations. This is likely because the phase imprinting method used to create solitons can also create other excitations that appear as density modulations or fringes in the BEC. Examples of such modulations can be seen in the off-diagonal images in figure 2(c). Additional discussion of the misclassified and mislabeled data can be found in appendix A.3. Our dataset includes the full-frame raw images, the cropped and rotated images used in this study, and the set of fitted integrated 2D Thomas-Fermi distribution parameters. This dataset is sufficient not only to reproduce our results but also to train and test alternative models with varying cropping size or image resolution [37].

Conclusion and outlook
In this manuscript, we present an automated dark soliton detection and positioning system that combines ML-based image classification with standard fitting techniques to track soliton dynamics in experimental images of BECs. We show that the system performs on par with more traditional approaches that rely on human input for soliton identification, creating the opportunity to study soliton dynamics in large datasets. We also make available the first dataset of images from a dark soliton BEC experiment, which provides an opportunity for the data science community to develop more sophisticated analysis tools and to further understand nonlinear many-body physics.
The performance of the classifier, as measured by the weighted F1 score, leaves room for improvement. While tuning the hyperparameters allowed us to substantially improve the initial performance, additional data are necessary to push the limits. However, human labeling is not only time-consuming but, as the analysis of the misclassified images revealed, also not always reliable. Other approaches, such as active learning [44], may be more suitable for this task. Such an enlarged dataset, in turn, will enable refining the soliton classifier and performing model uncertainty quantification [45,46], which is currently not accounted for. Together, these refinements may enable reliable in situ deployment.
This study was preconditioned on the assumption of a specific structure in the images, leading to our three classes. An enlarged dataset will enable employing unsupervised learning strategies [47] to possibly discover additional classes consistent with the data, without presumptions. Such unsupervised learning on soliton data is a prototype for ML-based discovery with cold-atom data in general.

Acknowledgment
This work was partially supported by NIST and NSF through the Physics Frontier Center at the JQI. We appreciate conversations with Yuchen Yue and Justin Elenewski.

A.1. Additional visualization of intermediate convolutional layers
In addition to figure 2(b) in the main text (visualizing the single soliton case), figure A1 shows the same intermediate layers for correctly classified sample images from the (a) no soliton and (b) other excitations classes. In both figures, we highlight two filters: the red box indicates a filter that activates for an image with a single soliton, while the blue box indicates a filter that activates for an image from the other excitations class. For an image from the no soliton class (figure A1(a)), neither highlighted filter is activated. This confirms that the filters in our model are trained to properly detect and locate features that are characteristic of each class in a previously unseen image.

A.2. Model parameter tuning
We used a naive semi-structured search to optimize our model parameters.
During parameter tuning, we used k-fold cross-validation to assess the generalizability of trained models. For each set of hyperparameters defining the overall model (e.g., kernel size and number of layers, both convolutional and hidden), the training set was split into k = 5 smaller sets ('folds'), four of which were used for training and one for validation. Once the model was trained using all five cross-validation combinations, the mean score was recorded and compared against scores from networks with other hyperparameters. We rank the hyperparameters by their importance: the number of filters in each convolutional layer, the number of dense-layer nodes, the optimizer, the convolution kernel size, the dropout rate, and the batch size. The parameters are optimized in this order, and the history of hyperparameter tuning is shown in table A1.
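An index-level sketch of the k = 5 cross-validation loop (pure numpy; the model training and scoring are stubbed out, as any classifier could be plugged in):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index arrays: each fold serves once as the
    validation set while the remaining k-1 folds form the training set."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for v in range(k):
        train = np.concatenate([folds[f] for f in range(k) if f != v])
        yield train, folds[v]

scores = []
for train_idx, val_idx in kfold_indices(n_samples=100, k=5):
    # Train the candidate model on train_idx and score it on val_idx (stubbed:
    # here we just record the training fraction to keep the sketch runnable).
    scores.append(len(train_idx) / 100)
mean_score = float(np.mean(scores))   # compared across hyperparameter settings
```

Shuffling before splitting matters here because consecutive images from the same experimental run are correlated; a random permutation spreads them across folds.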

A.3. Misclassified data
As discussed in section 2.1, in BEC experiments trapped condensates are often held for a certain period of time after phase imprinting. This is necessary to smooth out the various other excitations resulting from the phase imprinting process. However, when looking at only a single image (a single holding time), all information about the soliton dynamics is lost, and other excitations can be confused with shallow solitons. Among the 640 images in the test dataset, there are 68 images misclassified in one of six possible ways. As can be seen in figure 2(c), for our model the most confusion comes from distinguishing between the single soliton and other excitations classes. Upon reviewing the 58 images misclassified in this way, we find that of the 27 images classified as other excitations, only two clearly contain lone solitons. The remaining 25 images are ambiguous and thus should belong to the other excitations class. Interestingly, for this case, the classifier seems to be nearly as likely to be wrong as confused (see the middle columns in figure 3(b)). All 31 images classified as single solitons are true misclassifications. For these images, the classifier is confidently wrong (see the last columns in figure 3(b)). We suspect that it might be possible to improve the classifier by training with a smaller penalty for assigning an ambiguous image to the other excitations class. This also suggests that an active learning strategy might be better for training the model and labeling data than relying on a dataset labeled by humans.