Intelligent smartphone-based multimode imaging otoscope for the mobile diagnosis of otitis media.

Otitis media (OM) is one of the most common ear diseases in children and a common reason for outpatient visits to medical doctors in primary care practices. Adhesive OM (AdOM) is recognized as a sequela of OM with effusion (OME) and often requires surgical intervention. OME and AdOM exhibit similar symptoms, and it is difficult to distinguish between them using a conventional otoscope in a primary care unit. The accuracy of the diagnosis is highly dependent on the experience of the examiner. The development of an advanced otoscope with less variation in diagnostic accuracy by the examiner is crucial for a more accurate diagnosis. Thus, we developed an intelligent smartphone-based multimode imaging otoscope for better diagnosis of OM, even in mobile environments. The system offers spectral and autofluorescence imaging of the tympanic membrane using a smartphone attached to the developed multimode imaging module. Moreover, it is capable of intelligent analysis for distinguishing between normal, OME, and AdOM ears using a machine learning algorithm. Using the developed system, we examined the ears of 69 patients to assess its performance in distinguishing between normal, OME, and AdOM ears. In the classification of ear diseases, the multimode system based on machine learning analysis performed better in terms of accuracy and F1 scores than single RGB image analysis, RGB/fluorescence image analysis, and the analysis of spectral image cubes only. These results demonstrate that the intelligent multimode diagnostic capability of an otoscope would be beneficial for better diagnosis and management of OM.


Introduction
Otitis media (OM) is a spectrum of ear disorders that is fairly common, especially in children [1]. OM with effusion (OME) is defined as the presence of fluid in the middle ear without acute ear infection [2]. OME is one of the main types of OM and is responsible for 10%-15% of visits to the clinic in childhood [3]. OME is the most common cause of hearing loss in children in developed countries [4]. OME is related to poor language learning and school performance during childhood [5,6]. OME and adhesive OM (AdOM) are two common middle ear inflammations. Both are characterized by the presence of effusion in the middle ear chamber and often appear because of a previous acute bacterial infection in that area, which is distinguished by the formation of tympanic membrane adhesions with other middle ear structures in the case of AdOM [7,8]. They are nearly asymptomatic disorders and tend to evolve into chronic cases. If not diagnosed and treated promptly, they may cause ossicular chain damage and progress to cholesteatoma development [8]. Prompt and accurate diagnosis and management of ear disorders are critical to avoid these issues.
The detection of middle ear fluid is a requirement for a specific diagnosis of OM [9]. Commonly, the presence of effusion in the middle ear was confirmed using a pneumatic otoscope. However, the need to adequately seal the ear canal poses a challenge for the use of such tools. Alternatively, acoustic reflectometry may be applied for this, but this approach also presents a limitation because it only allows the acquisition of one-dimensional data. Many efforts have been made to include various functionalities in traditional otoscopy, such as spectroscopy, short-wave infrared imaging, and optical coherence tomography [10][11][12][13].
The clinical environment requires nurses and physicians to be in constant motion, meeting many patients a day in different locations. In that milieu, portable tools are essential to increase the efficiency of health care professionals [14]. Consequently, smartphones have been increasingly incorporated in the medical practice and thus allow aiding physicians in the clinical decision-making, monitoring patients with chronic diseases, and increasing their productivity and efficiency [15][16][17][18][19][20]. Additionally, smartphone-based otoscopy has also been reported as a diagnostic and monitoring tool for ear diseases [21][22][23].
Fluorescence was used to diagnose ear diseases in previous reports. Using chinchilla models, Spector et al. studied the potential of fluorescence spectroscopy to assess bacteria causing acute OM infections [24]. Using fluorescence spectroscopy, four bacteria causing acute OM infection were successfully distinguished in animal models. Levy et al. and Yim et al. tested the potential of fluorescence imaging for the diagnosis of cholesteatoma and OM, respectively [25,26]. Fluorescence imaging is useful for diagnosis; however, exogenous fluorophores are required [25,26]. In situ autofluorescence imaging has been used as a useful methodology for assessing biological tissues in medicine and biology. Cells and tissues naturally emit fluorescence, which is also known as autofluorescence, when excited by light at suitable wavelengths (ultraviolet, visible, or near-infrared). Autofluorescence from cells and tissues is closely related to their morphological and metabolic conditions. Therefore, comprehensive diagnostic information on diseases can be obtained via a noninvasive analysis of tissue autofluorescence without contrast agents [27]. Valdez et al. explored autofluorescence imaging of the middle ear in the detection of ear diseases [28]. They described a study of the potential of a multiwavelength autofluorescence imaging otoscope for the diagnosis of cholesteatoma.
In our previous work, we developed a smartphone-based spectral imaging otoscope, capable of obtaining a spectral image cube that contains spectral information for the diagnosis of middle ear diseases [29]. The system allowed differentiation between the normal and abnormal tympanic membranes. Therefore, it showed a high capability for the quantitative detection of chronic OM with high contrast, implying that the smartphone-based spectral imaging otoscope may have the potential for mobile diagnosis of various middle ear diseases. Previous studies demonstrated that the incorporation of multiple imaging modalities, acquiring complementary information from the sample, provides an important performance improvement compared to single imaging modality diagnostic tools [30][31][32]. Therefore, an advanced smartphone-based otoscope with multimode imaging and analysis capability must be developed for better versatility in the diagnosis of various ear diseases with high sensitivity and specificity.
We, therefore, developed an intelligent smartphone-based multimode imaging otoscope for better mobile diagnosis of ear diseases and demonstrated the potential of machine learning-based multimode image analysis in distinguishing between normal, OME, and AdOM ears.
To acquire different but complementary information about the tympanic membrane of ears, RGB, autofluorescence, and spectral imaging modalities are integrated with a smartphone-based otoscope, and further machine-learning-based multimode image analysis is implemented on the smartphone-based multimode otoscope. The intelligent smartphone-based multimode imaging otoscope was used to examine the ears of 69 patients with and without middle ear diseases such as OME or AdOM to evaluate it. To differentiate normal ears, OME, and AdOM, various machine learning techniques such as multilayer perceptron (MP), random forest (RF), logistic regression (LR), decision trees (DTs), and Naïve Bayes (NB) are trained and tested with a multimode dataset, acquired using our developed system. The machine learning techniques were further compared to conventional spectral classification algorithms, such as the spectral angle mapper (SAM) and Euclidean distance (ED), demonstrating the potential of the intelligent smartphone-based multimode imaging otoscope for mobile ear disease diagnosis.

Intelligent smartphone-based multimode otoscope
We developed an intelligent multimode smartphone-based otoscope. The devised otoscope comprises an interface module, an imaging module, an illumination module, a smartphone (Galaxy S8+, Samsung), and a custom-designed Android application. Figure 1(a) shows the schematics of the overall system components. The rear imaging module of the smartphone is an RGB camera with a resolution of 12 MP, a sensor size of 1/2.55 inches, and a pixel size of 1.4 µm. The aperture ratio and focal length of the lens are f/1.7 and 4 mm, respectively. The interface module includes an interface circuit and a 3.7 V Li-ion battery. The imaging module includes a set of optical lenses for collecting light from ear regions of interest, a high-pass filter for rejecting the excitation light from the collected light, and a smartphone for recording the light. The illumination module incorporates visible range light-emitting diodes (LEDs), high-power ultraviolet (UV) LEDs, band-pass filters to guarantee that excitation light at a selected wavelength from the UV LEDs is delivered onto a sample, and coupling lenses for delivering the light from the UV LEDs into optical fibers. The light from the optical fibers was directed to the sample regions of interest. A photographic image of the assembled apparatus is shown in Fig. 1(b).
The interface circuit mainly comprises a microcontroller unit (MCU) (Atmega 128A, Atmel), a Bluetooth low energy (BLE) module (RN4871, Microchip), and two LED drivers, one for driving the visible range LEDs (TLC5926, Texas Instruments) and the other for driving the high-power LEDs (STP04CM05, STMicroelectronics). The BLE module enables wireless connection with a smartphone, so that the system can be controlled via our custom-designed Android application. Three voltage regulators are also included in the interface circuit to supply power to the components on the board. A 3.7 V-to-3.3 V regulator (TPS73033, Texas Instruments) provides power to the BLE module, a 3.7 V-to-5 V regulator (NCP1402, ON Semiconductor) supplies power to the MCU, the visible range LEDs, and their LED driver, and an additional 3.7 V-to-5 V regulator (MCP73831, Microchip) capable of high-current output (∼400 mA) is used for the high-power LEDs. The interface circuit also enables the recharging of the lithium-ion battery via a micro-USB female connector. Figure 1(c) shows a block diagram of the interface board.
Twelve LEDs were used for white light, spectral, and UV autofluorescence imaging. Among the ten LEDs within the visible range, eight LEDs (USHIO LEDs) with peaks at 429. were incorporated into the system for spectral illumination. One extra light source at 555.83 nm (FWHM: 16.55 nm) was achieved by using a white LED (SST-20-WCS-A120-L4600, Luminus Devices) and an optical filter (65-098, Edmund Optics). Another white LED was used for white light imaging. Figure 2 shows the emission spectra of the LEDs used in the system. The emission spectra of the eight narrow-band color sources are shown in Fig. 2(a). Figure 2(b) exhibits the spectra of the white LED and the filtered white LED with peak emission at 555.83 nm. All visible-range LEDs were placed inside an LED multiplexer, as described in our previous work [29]. The output of the LED multiplexer was attached to one end of a bundle of visible range optical fibers with a diameter of To remove light at unwanted wavelengths from the excitation light, band-pass filters (84-078, Edmund Optics) were placed after the UV LEDs. Figure 2(b) also displays the spectra of the UV LED before and after passing through an optical band-pass filter. After the band-pass filter, a coupling lens (43-480, Edmund Optics) focused an excitation beam onto the bundle of fibers with a diameter of 200 µm (57-068, Edmund Optics), suited to the transmission of UV light. The measured irradiance of the UV light emitted by the optical fibers was 69.75 µW/cm².
Light from the fibers interacts with target regions of the ear, and the light reflected or emitted from the target regions is then collected by the lens system, which includes a high-pass filter with a cut-off wavelength of 400 nm (62-974, Edmund Optics) to ensure the removal of the excitation light for UV autofluorescence imaging. Images were then recorded using the smartphone camera. Finally, the acquired images were transferred via either LTE or Wi-Fi to a server, where the images were analyzed, and the results were then returned to the smartphone.

Preprocessing of multimode image data
For spectral image analysis, key preprocessing steps must be performed to ensure reliable spectral data classification. As the first preprocessing step, except for the white light and fluorescence images, all RGB images were converted to grayscale, followed by a flat-field correction. Image registration for all images was performed to compensate for the misalignments of images owing to hand movements during the intervals between the image acquisitions. In the preprocessing of an autofluorescence image, the contrast-limited adaptive histogram equalization (CLAHE) algorithm was applied to the image for contrast enhancement [33], and image registration of the contrast-enhanced image was performed. Finally, the preprocessed images were stacked in a multimode image cube that contained 12 images, which are the nine grayscale spectral images and the red (R), green (G), and blue (B) channels of the autofluorescence image. In addition, we compared the performance of the analysis of the multimode image cubes with various other data combinations to find the one that would provide the best data for the classification of OM. The data flow, from preprocessing to the generation of a segmented image, is shown in Fig. 3 for all the data combinations.

Fig. 3. Block diagram of the image processing and classification data path for all the tested data combinations, that is, multimode (yellow arrows), spectral (blue arrows), the combination of white and fluorescence (green arrows), and only white light images (gray arrows). The data follow different processing paths depending on the imaging modality through which they were acquired. Spectral images undergo flat-field correction and image registration, whereas autofluorescence images are contrast-enhanced and then registered. The registration is performed using the white-light image as a reference. The data are then ready for classification using a machine learning algorithm. After the labeled image is generated, background removal is executed.
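As an illustration of the preprocessing chain described above, the following minimal sketch covers grayscale conversion, flat-field correction, and stacking into a 12-channel cube. It is not the implementation used in this study: the function names, the unweighted grayscale average, and the particular flat-field formula (division by a reference image, rescaled by the reference mean) are assumptions, and image registration and CLAHE are omitted because they normally rely on an image processing library.

```python
from statistics import mean


def to_gray(rgb_image):
    """Convert an image of (R, G, B) pixels to grayscale by unweighted averaging."""
    return [[sum(pixel) / 3.0 for pixel in row] for row in rgb_image]


def flat_field(raw, flat):
    """Assumed flat-field formula: raw / flat, rescaled by the mean of the flat image.

    The flat (reference) image must not contain zeros.
    """
    gain = mean(v for row in flat for v in row)
    return [[r / f * gain for r, f in zip(raw_row, flat_row)]
            for raw_row, flat_row in zip(raw, flat)]


def stack_cube(spectral_images, autofluorescence_rgb):
    """Build cube[y][x] = spectral values followed by the autofluorescence R, G, B."""
    height, width = len(autofluorescence_rgb), len(autofluorescence_rgb[0])
    return [[[img[y][x] for img in spectral_images] + list(autofluorescence_rgb[y][x])
             for x in range(width)]
            for y in range(height)]


# Toy 2 x 2 example with hypothetical values.
flat = [[2.0, 4.0], [2.0, 4.0]]      # uneven illumination reference
raw = [[1.0, 2.0], [1.0, 2.0]]       # same scene brightness, uneven lighting
corrected = flat_field(raw, flat)    # illumination gradient removed

spectral = [corrected for _ in range(9)]             # nine spectral bands
autofluorescence = [[(0.1, 0.2, 0.3)] * 2 for _ in range(2)]
cube = stack_cube(spectral, autofluorescence)
print(len(cube[0][0]))               # number of channels per pixel
```

In this toy case the correction flattens the left-to-right illumination gradient, and every pixel of the cube carries nine spectral values plus three autofluorescence values.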

Spectral classification of the multimode ear images
We compared various machine learning and conventional spectral classification algorithms to find the algorithm that offers the highest classification accuracy using our developed system. The MP, RF, LR, DTs, NB, SAM, and ED algorithms were applied to our developed system for the classification of ear diseases of interest. These classification algorithms have been extensively used to analyze spectral imagery data [17,29,34-41].
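For readers unfamiliar with the two conventional algorithms, the sketch below shows the general idea of SAM and ED classification: each pixel spectrum is assigned to the class whose reference spectrum is closest, either in angle or in Euclidean distance. The two-band reference spectra and the function names are hypothetical and serve only to illustrate the technique.

```python
import math


def spectral_angle(a, b):
    """Angle (radians) between two spectra; insensitive to overall brightness."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard rounding errors
    return math.acos(cosine)


def euclidean(a, b):
    """Euclidean distance between two spectra; sensitive to brightness."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(pixel, references, distance):
    """Return the label of the reference spectrum closest to the pixel."""
    return min(references, key=lambda label: distance(pixel, references[label]))


# Hypothetical two-band reference spectra, for illustration only.
references = {"normal": [1.0, 1.0], "OME": [4.0, 1.0]}

# A "normal"-shaped spectrum recorded four times brighter.
bright_pixel = [4.0, 4.0]
print(classify(bright_pixel, references, spectral_angle))
print(classify(bright_pixel, references, euclidean))
```

The toy pixel has the spectral shape of the "normal" reference but a higher intensity, so SAM and ED disagree on its label. Neither method models the spread of spectra within a class, which is one way such reference-based classifiers can be affected by intra-class variation.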
For training and testing of machine learning algorithms for classifying normal (Class 1), OME (Class 2), and AdOM (Class 3) ears, ground truths were constructed with the assistance of medical doctors to select the specific regions for each class in all samples. Next, a binary mask was created for regions of interest and applied to the image cube [x, y, and λ (nine wavelengths + R, G, and B channels of an autofluorescence image)], followed by the removal of all null spectral signatures resulting from the masking process. After selecting only pixels from the regions of interest, spectral signatures from spectral images and intensity profiles of R, G, and B channels from the autofluorescence image were extracted at every selected pixel for each class. Additionally, before the classification, data standardization was applied to the extracted data because the maximum intensities in certain channels of autofluorescence images, especially the R channel, were significantly lower than those of other channels. The dataset was then divided into training and test sets in an 80/20 split.
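The standardization and 80/20 split described above can be sketched as follows. This is a minimal illustration under stated assumptions: standardization is taken to mean zero mean and unit variance per channel, the split is a seeded random shuffle, and the function names and toy data are hypothetical. The order of operations (standardize, then split) follows the text.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Scale every channel (column) to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]   # avoid dividing by zero
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def split_80_20(rows, labels, seed=0):
    """Shuffle the samples and return (train_x, train_y, test_x, test_y)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(0.8 * len(indices))
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Toy data: two channels with very different intensity ranges.
pixels = [[float(i), 100.0 + 2.0 * i] for i in range(10)]
classes = [i % 3 + 1 for i in range(10)]         # Class 1, 2, 3

scaled = standardize(pixels)
train_x, train_y, test_x, test_y = split_80_20(scaled, classes)
print(len(train_x), len(test_x))
```

After standardization both channels have comparable ranges, so a weak channel such as the autofluorescence R channel is not dominated by brighter ones during training.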
MP has been used for the classification of various types of images, including medical, satellite, and spectral images [17,22,35,42,43]. In this study, MP was applied to analyze multimode images. It consists of an input layer, a hidden layer, and an output layer. For the input layer, 12 nodes were used to accept pixels from the nine spectral images at different wavelengths, and the R, G, and B channels of one autofluorescence image. Here, the optimal number of hidden nodes and a regularization parameter were experimentally determined by training and testing cycles while keeping the other parameters fixed. MP exhibited the best performance with ten nodes in the hidden layer. For the output layer, four nodes were used to output four classes: normal, OME, AdOM, and an extra specular reflection class. The MP model used to classify data composed entirely of the spectral image cubes had nine input nodes, 17 hidden nodes, and four output nodes, and a regularization parameter of 0.03562. We also tested the case when the input data were the R, G, and B channels of both autofluorescence and color eardrum images. Here, the optimal number of nodes was six in the hidden layer. Finally, when the input data were the R, G, and B channels of a white light image of the eardrums, the best number of nodes in the hidden layer was four. A regularization parameter of 0.00028 was determined for the analysis of multimode images and the combination of autofluorescence and white light images, whereas it was determined to be 0.00045 for the analysis of a white light image of eardrums.
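To make the 12-10-4 layout of the multimode MP concrete, the following sketch runs a single forward pass through a network of that shape. Only the layer sizes come from the text. The ReLU hidden activation, the softmax output, the random untrained weights, and all function names are assumptions for illustration, and no training or regularization is shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_out x n_in) and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Affine transform: one weighted sum per output node."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)                       # subtract the max for stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden, output):
    """12 inputs -> 10 hidden nodes -> 4 class probabilities."""
    return softmax(dense(relu(dense(x, *hidden)), *output))


rng = random.Random(0)
hidden = init_layer(12, 10, rng)             # 9 spectral + 3 autofluorescence inputs
output = init_layer(10, 4, rng)              # normal, OME, AdOM, specular reflection

pixel = [0.1] * 12                           # one standardized multimode pixel (toy)
probabilities = forward(pixel, hidden, output)
n_parameters = sum(len(w) * len(w[0]) + len(b) for w, b in (hidden, output))
print(len(probabilities), n_parameters)
```

A network of this size has only 174 trainable parameters, which is consistent with a per-pixel classifier that has to return results quickly from a server to a smartphone.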
We further tested other machine-learning classifiers and conventional spectral classification algorithms. For the analysis of multimode data, an RF classifier was designed with 58 estimators, a maximum tree depth of ten, a minimum-samples-split of 60, and a minimum-samples-leaf of 66, while a DTs classifier had a maximum depth of nine, a minimum-samples-split of 130, and a minimum-samples-leaf of 142. An RF classifier with a maximum depth of seven, 58 estimators, minimum-sample-leaf of 110, and a minimum-sample-split of 174 was found to provide the best classification results when analyzing spectral image cubes only, whereas a DT with a maximum depth of ten, minimum-sample-leaf equal to 146 and minimum-sample-split of 114 yielded the best results with the same data. To analyze the combination of autofluorescence and white light images, the RF classifier had 24 estimators, a maximum tree depth of 8, a minimum-samples-split of 198, a minimum-samples-leaf of 78, while a DT classifier was defined with a max depth of 8, a minimum-samples-split of 182, and a minimum-samples-leaf of 194. Finally, the classification of white light images using an RF classifier was performed with the parameters such as 29 estimators, trees with a maximum depth of 6, a minimum-samples-split of 58, and a minimum-samples-leaf of 78, while the designed DTs classifier had a maximum tree depth of 9, a minimum-samples-split of 58 and a minimum-samples-leaf of 126. The selected criterion for the split quality measure was entropy. 
The LR classifier had a penalty of l2 and a C value of 1.8874×10⁻⁷, 3.2903×10⁻⁸, 1.4873×10⁻⁵, and 7.8805×10⁻⁸ for the multimode, the autofluorescence and white light, the spectral image cube, and the white light data types, respectively, while for the NB classifier, var-smoothing parameters of 0.2395, 0.0574, 1, and 1×10⁻⁹ were used to classify the multimode, the autofluorescence and white light, the spectral image cube, and the white light data types, respectively.
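For easier comparison, the hyperparameters reported in this section can be collected in a single configuration, as sketched below. The values are transcribed from the text. The key names follow a scikit-learn-like naming that matches the terms used above, but the text does not state which library was used, so the naming and the helper function are assumptions.

```python
# Hyperparameters transcribed from the text, keyed by input data type.
# Key names are an assumed, scikit-learn-like naming; the library is not stated.
HYPERPARAMS = {
    "multimode": {
        "mp": {"hidden_nodes": 10, "regularization": 0.00028},
        "rf": {"n_estimators": 58, "max_depth": 10,
               "min_samples_split": 60, "min_samples_leaf": 66},
        "dt": {"max_depth": 9, "min_samples_split": 130, "min_samples_leaf": 142},
        "lr": {"penalty": "l2", "C": 1.8874e-7},
        "nb": {"var_smoothing": 0.2395},
    },
    "spectral_cube": {
        "mp": {"hidden_nodes": 17, "regularization": 0.03562},
        "rf": {"n_estimators": 58, "max_depth": 7,
               "min_samples_split": 174, "min_samples_leaf": 110},
        "dt": {"max_depth": 10, "min_samples_split": 114, "min_samples_leaf": 146},
        "lr": {"penalty": "l2", "C": 1.4873e-5},
        "nb": {"var_smoothing": 1.0},
    },
    "autofluorescence_white": {
        "mp": {"hidden_nodes": 6, "regularization": 0.00028},
        "rf": {"n_estimators": 24, "max_depth": 8,
               "min_samples_split": 198, "min_samples_leaf": 78},
        "dt": {"max_depth": 8, "min_samples_split": 182, "min_samples_leaf": 194},
        "lr": {"penalty": "l2", "C": 3.2903e-8},
        "nb": {"var_smoothing": 0.0574},
    },
    "white_light": {
        "mp": {"hidden_nodes": 4, "regularization": 0.00045},
        "rf": {"n_estimators": 29, "max_depth": 6,
               "min_samples_split": 58, "min_samples_leaf": 78},
        "dt": {"max_depth": 9, "min_samples_split": 58, "min_samples_leaf": 126},
        "lr": {"penalty": "l2", "C": 7.8805e-8},
        "nb": {"var_smoothing": 1e-9},
    },
}

# Split quality measure reported for the tree-based classifiers.
TREE_SPLIT_CRITERION = "entropy"


def settings_for(data_type, model):
    """Return a copy of the reported settings for one data type and classifier."""
    settings = dict(HYPERPARAMS[data_type][model])
    if model in ("rf", "dt"):
        settings["criterion"] = TREE_SPLIT_CRITERION
    return settings


print(settings_for("multimode", "rf"))
```

Keeping the settings in one structure makes it straightforward to loop over the four data combinations when reproducing the comparison.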

Clinical trials
A clinical trial was performed at Seoul National University, Seoul, South Korea, with approval from the Institutional Review Board of the hospital. One otologist collected data from male and female patients with ages ranging from 2 to 80 years. The medical diagnosis was made based on microscopic images and an audiology test. For patients who had abnormalities in both ears, each ear was counted as a separate sample. This study was conducted in accordance with the Declaration of Helsinki. A total of 69 patients participated in this study. From the 69 patients, 30 normal samples, 30 OME samples, and 29 AdOM samples were acquired.

Analysis of a multimode image cube for the classification of normal and OM ears
After multimode images of normal, AdOM, and OME ears were acquired using our developed otoscope, the multimode images were analyzed using MP for their classification. Average spectral signatures of each class are shown in Fig. 4. These signatures were obtained by averaging the pixel values within a 50 × 50 window in the dataset. The areas for extraction of the signatures were indicated by an otologist. The vertical lines show the standard deviations for each wavelength. The graph in Fig. 4 shows that the most important features to distinguish between different classes are observed at 525 nm and 550 nm. In other wavelengths, a strong overlap of the standard deviation is noticed. Figure 5 shows white light (column a), autofluorescence (column b), and classified images (column c) of the tympanic membranes of normal, AdOM, and OME ears. Normal tympanic membranes exhibit low autofluorescence with a faint autofluorescence originating at the malleus and bony promontory, as reported in a previous study [28]. Additionally, vascular regions appear darker than normal tympanic membrane regions because of the strong absorption of UV light in the blood. In Fig. 5(b), the AdOM eardrum shows adhesion in the mesotympanum and attic areas along with effusion in the middle ear, resulting in a strong presence of autofluorescence. In the image, the regions of adhesion exhibit strong light-blue autofluorescence, whereas the effusion regions at the center of the eardrum exhibit greenish autofluorescence. However, in the OME autofluorescence image, no strong autofluorescence was observed. This could be because of the type of effusion, which is likely to be mucous in the case of AdOM and serous in the case of OME [44]. As previously suggested [28], the faint autofluorescence from the bony promontory cannot be distinguished in Fig. 5(b) (OME), probably because of the presence of effusion. Interestingly, strong autofluorescence from earwax was observed in the images. 
These results show that autofluorescence imaging could be used as a powerful additional aid for a specialist during OM diagnosis, allowing the visualization of features that cannot be observed using a regular otoscope.
Moreover, we performed spectral classification of the multimode images after pre-processing using MP. Figure 5(c) shows the classified images of the tympanic membranes of normal, AdOM, and OME ears. Green indicates pixels classified as normal, blue indicates pixels classified as OME, red indicates pixels labeled as adhesion, and black indicates pixels classified as specular reflection. A normal tympanic membrane was successfully classified (Fig. 5(c) (normal)). In the classified image for OME (Fig. 5(c) (OME)), most pixels in the tympanic membrane region were classified as OME (blue). Here, the pixels in the areas corresponding to earwax were classified as normal regions, however, since we did not train the algorithms to classify earwax, this can be considered misclassification. Additionally, in the classified image for AdOM, the areas of adhesions with the attic were distinguished from the areas where effusion was observed behind the tympanic membrane, except for adhesion regions at the center of the image, where a strong greenish autofluorescence is noted.
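The class-average spectral signatures of Fig. 4, obtained by averaging pixel values within a 50 × 50 window, can be computed along the lines of the sketch below. The cube layout (band, row, column), the function name, and the use of the population standard deviation are assumptions, and the synthetic cube only stands in for real image data.

```python
from statistics import mean, pstdev


def window_signature(cube, top, left, size=50):
    """Mean and standard deviation per band inside a size x size window.

    cube is indexed as cube[band][row][column].
    """
    means, stds = [], []
    for band in cube:
        values = [band[y][x]
                  for y in range(top, top + size)
                  for x in range(left, left + size)]
        means.append(mean(values))
        stds.append(pstdev(values))
    return means, stds


# Synthetic 9-band cube of 60 x 60 pixels; band k holds the constant value k.
cube = [[[float(k)] * 60 for _ in range(60)] for k in range(9)]

signature, spread = window_signature(cube, top=5, left=5)
print(signature)
print(spread)
```

Plotting the returned means against wavelength, with the standard deviations as vertical bars, gives a signature plot of the kind described for Fig. 4.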

Comparison of various machine learning and conventional techniques on the assessment of multimode otoscopic imagery
To evaluate the benefits of multimode image analysis in distinguishing between normal, AdOM, and OME ears using our developed otoscope, the result of multimode image analysis with a spectral image cube and an autofluorescence image was compared to the results obtained by the analysis of the combination of white light and autofluorescence images, spectral image cube only, and the analysis of white light images only as shown in Table 1. Table 1 is organized in order of accuracy, with the highest accuracy located on the top rows and the lowest on the bottom rows. The best metrics are highlighted in bold. The metrics shown in Table 1 were obtained through a 10-fold cross-validation process. MP provided the best outcome in the metrics considered.
In particular, the multimode image analysis resulted in the highest mean F1-score of 0.7320, area under the curve (AUC) of 0.9186, and accuracy of 0.7963. The RF closely followed the performance of the MP, but with a much lower F1 score in the AdOM class. A similar trend was observed for the three top classifiers. As expected, the traditional classification algorithms, such as SAM and ED, exhibit poorer performance than the machine learning algorithms because they are the most vulnerable to inter- and intra-class spectral signature variations. All the algorithms had lower outcomes in the AdOM class than in the other classes, implying that spectral and fluorescence imagery data may not be ideal for the diagnosis of AdOM. Table 2 shows the confusion matrix of the MP classifier, which exhibits the best performance in the classification of Normal, OME, and AdOM classes with the multimode data. The confusion matrix demonstrates that MP correctly classified 87.6871% and 82.7153% of the Normal and OME classes, respectively, but it showed the lowest performance in the classification of the AdOM class. In particular, the AdOM class was often misclassified as the OME class. The confusion matrices for all the classifiers at the different data combinations that provided the best results are found in Supplement 1 (Table S1). These results show that the machine learning-based multimode image analysis with a spectral image cube and an autofluorescence image not only provides additional qualitative information that can be visually examined by a specialist, but also enables more precise classification of ear diseases compared to image analysis with a single image and dual-mode images.

Figure caption fragment: [...] where strong greenish autofluorescence is seen, were misclassified as adhesion; (a, OME) white light image of another case of OM; (b, OME) autofluorescence image. Note that here, in contrast to (b, AdOM), less autofluorescence is seen; nevertheless, compared with Fig. 3(b, Normal), autofluorescence coming from the bony promontory cannot be noticed; (c, OME) classification map with most of the eardrum labeled in blue, agreeing with the OME diagnosis given to this ear.
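The relation between a confusion matrix and the reported accuracy and per-class F1 scores can be illustrated with the sketch below. The counts in the matrix are hypothetical and are not the values of Table 2; rows are taken as true classes and columns as predicted classes, which is an assumed convention.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def per_class_f1(cm):
    """F1 score of every class; rows are true classes, columns are predictions."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # class k labeled as another class
        fp = sum(cm[r][k] for r in range(n)) - tp    # other classes labeled as k
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return scores


# Hypothetical counts, ordered as Normal, OME, AdOM.
cm = [
    [8, 1, 1],
    [1, 8, 1],
    [2, 3, 5],   # AdOM is confused mostly with OME
]

f1 = per_class_f1(cm)
mean_f1 = sum(f1) / len(f1)
print(accuracy(cm), f1, mean_f1)
```

In this toy matrix the third class has the lowest F1 score because many of its samples fall into the second column, the same kind of pattern as the AdOM-to-OME confusion described above.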

Discussion
We built an intelligent smartphone-based multimode otoscope to collect multiple and complementary data that contain useful diagnostic information and then evaluated its performance for the diagnosis of ear diseases. We hypothesized that the addition of another imaging modality and intelligent multimode analysis capability into our smartphone-based spectral imaging otoscope would increase its potential for the diagnosis of ear diseases. Thus, we conducted clinical trials at a tertiary education hospital in Seoul, where one otology specialist collected data from normal, OME, and AdOM patients. The collected data were then analyzed using various machine learning and conventional spectral classification algorithms. In our previous work, we demonstrated the potential of a multispectral smartphone-based otoscope [29]. The capability of the system as a diagnostic tool was demonstrated using a small number of samples. However, in this work, we conducted extensive clinical trials to collect a large amount of data from patients using an advanced smartphone-based otoscope with additional imaging and analysis capabilities. Therefore, we could ensure that the intelligent multimode otoscope is more accurate than a conventional otoscope in diagnosing ear diseases. Few studies employed spectroscopy to diagnose ear diseases [10,11]. However, to the best of our knowledge, there is no study on multimode imaging including spectral imaging in the diagnosis of OM. Therefore, we developed the intelligent smartphone-based multimode otoscope including the spectral imaging capability. For the spectral imaging, we tried to minimize the overlap of the spectra of adjacent LEDs selecting the ones with the least FWHMs under considerations of availability in the market, size, and emitted light intensity. For autofluorescence imaging, light from two high-power UV LEDs was filtered and transmitted through optical fibers to the tympanic membrane of a patient. 
The measured irradiance of UV light was 69.75 µW/cm², about 14 times lower than the limit value of 1 mW/cm² recommended by the American Conference of Governmental Industrial Hygienists [45]. The device was designed to be attached to a Samsung Galaxy S8+. However, for the device to serve as a telemedicine tool in remote, low-resource areas, cost may be an important factor. Thus, the spectral imaging module needs to be further optimized so that it can be attached to more affordable smartphone models. This can be realized simply by constructing a mechanical structure capable of adjusting the position of a key optical system module. In the system, the diameter of the otoscope probe tip, including a removable and sterilizable cap, is 5.8 mm. Therefore, when the system is applied to a patient with a narrow ear canal, the probe may not be placed at the working distance from the tympanic membrane, thus resulting in the acquisition of a defocused image. This may affect the performance of the system. However, to some extent, a defocused image was also shown to hold useful diagnostic information [46]. As mentioned, spectroscopy, which is based on point detection, has been used for the diagnosis of ear diseases [10,11]. Since the areas of interest in the eardrum correspond to multiple pixels in an image, spectral imaging and analysis can, like spectroscopy, be used for the diagnosis of ear diseases even when slightly defocused images are acquired. In those cases, the classified images tend to show errors near the class boundaries. To increase pixel accuracy in spectral classification, defocused image acquisition should be avoided. This can be realized by fast image acquisition with appropriate illumination and an optical lens system with a large depth of focus.
One of the key advancements of our proposed system over the system described in [29] was the ability to acquire fluorescent images of the tympanic membrane of a patient in addition to its spectral imaging capability. This was realized via the inclusion of high-current LED sink drivers, high-power UV LEDs, optical filters, and specialized optical fibers into the otoscope. To date, a fluorescence imaging otoscope has been developed [28]. In this study, autofluorescence images provided better contrast for identifying the margins of the affected regions. As shown in Fig. 5, autofluorescence images provided additional information for the visualization of ear conditions that cannot be seen solely via the examination of images obtained under white light illumination. However, due to the low intensity of the emitted fluorescence from the middle ear structures, contrast enhancement was needed for a better visual analysis of the images. Furthermore, we noticed that the machine learning classifiers performed better on the data that had been through the contrast enhancement process. In contrast to the device described in [28], our system allows us to obtain a multispectral image cube at nine wavelengths and an autofluorescence image in mobile environments. This permits our system to provide more relevant spectral and autofluorescence information on the ear regions of interest, thus allowing a more precise distinction between ear diseases.
The machine learning-based analysis of the various combinations of data emphasized the importance of a multimode imaging device for the diagnosis of ear diseases. As shown in Table 1, MP was superior to other machine learning algorithms when trained with multimode data. Among the designated classes, the lowest metric values were for the classification of AdOM. This might be because of the similar spectral features of the AdOM and OME ears. Even the analysis of the multimode data was not sufficient to classify the AdOM class with an accuracy as high as that of the Normal and OME classes. One of the features that distinguish AdOM from other types of OM is the morphological difference caused by the adhesion of the tympanic membrane to the middle ear structures [7,8]. However, the multimode otoscope developed here is not suitable for acquiring detailed 3D morphological information on ears. Therefore, incorporating a 3D imaging modality that allows for quantitative assessment of eardrum morphology into the system may increase the accuracy of the system in classifying normal, OME, and AdOM ears.
In the clinical trials, patients with a broad age range were examined. Interestingly, age-related variations were found in the spectral signatures for each class. However, the significance of these findings needs to be further evaluated because of an unbalanced distribution of patients per age group in the dataset. Furthermore, the severity of the diseases also affects the spectral signatures, but severity was not considered as a parameter when building the datasets. These questions remain for future study.

Conclusions
We developed an intelligent multimode otoscope capable of obtaining different but complementary information on the eardrum. Specifically, the developed otoscope allows obtaining a white light image, which is required for real-time visualization, a spectral image cube containing nine channels in the visible range, and an autofluorescence image of the eardrum. Therefore, the system described here enables the acquisition of more quantitative and qualitative data using a handheld, fully portable, and ubiquitously connectable device, suitable for primary care environments. Using data collected from clinical trials, we showed that an autofluorescence image of the eardrum allows better visualization of features that cannot be distinguished solely by the analysis of images obtained using white light illumination. Machine learning-based analysis with multimode images yields better performance than analysis with single-mode or dual-mode images. It was found that MP could distinguish between normal, OME, and AdOM ears with a mean F1-score of 0.7320, AUC of 0.9186, and accuracy of 0.7963. However, the F1 score for AdOM was lower than the F1 scores for the normal and OME ears. The addition of 3D morphological imaging capabilities to the system could improve this. More advanced algorithms, such as deep learning networks, are now available, and their application to the system could also improve its overall performance. This remains future work. Overall, our findings suggest that the intelligent multimode otoscope can be a useful mobile diagnostic tool for the diagnosis of various ear diseases through the acquisition of additional qualitative and quantitative features in the middle ear.