Automatic Identification of Single Bacterial Colonies Using Deep and Transfer Learning

Bacterial classification is a vital step in medical diagnosis. This procedure normally has several stages. An early stage involves inspecting the morphology of the bacterial colonies. Traditionally, an expert determines the type of bacteria through visual inspection of the sample or through molecular biology techniques. Building on advances in image processing, specifically deep and transfer learning, and on the wide availability of cameras, we applied deep and transfer learning techniques to address this task without requiring expert knowledge or sample shipping. We used a convolutional neural network (CNN) to identify different bacterial colonies based on their appearance in images captured by cell phone cameras. In this paper, we collected a dataset that contains images of different bacteria taken by cell phone cameras with various settings. Images of two classes of bacterial colonies were obtained at King Abdulaziz City for Science and Technology. The dataset contains 8,043 images. The experimental results show that our application has high accuracy without requiring expert inspections.


I. INTRODUCTION
Machine learning is a subdiscipline of artificial intelligence (AI). According to [1], the aim of machine learning is "to develop algorithms that learn interpretation principles from training samples and apply them to new data from the same domain to make informed decisions". One of the most popular subdisciplines of machine learning is neural networks, specifically multilayer neural networks, which are commonly known as deep learning. Deep learning has become a popular research topic in different disciplines, as deep learning approaches allow computers to perform sophisticated tasks.
In recent years, the use of deep learning techniques in biology and medicine has increased significantly. Bacterial identification is a crucial step in disease diagnosis, infection treatment, tracking of disease outbreaks associated with pathogens, and infectious disease prevention. Bacterial identification has traditionally been carried out by handling bacterial cells through many steps from bacterial culturing to Gram staining and microscopy. Some of these steps take several hours to complete. Alternatively, bacteria could be accurately identified by studying bacterial genetic codes. This genetic analysis includes DNA extraction, purification, and polymerase chain reactions or sequencing in some cases, which is considerably more expensive than traditional methods and requires several costly instruments. Fortunately, with the development of machine learning and AI, bacterial identification can easily be achieved in a single step by using image recognition and machine learning. This technological advance offers a rapid and cost-effective technique that addresses the time and cost problems associated with existing bacterial identification methods. The development of an automatic bacterial identification method would be very beneficial for the healthcare and industrial sectors.
The associate editor coordinating the review of this manuscript and approving it for publication was Seifedine Kadry.
In recent years, the world has faced the COVID-19 pandemic. Scientists worldwide strove to develop appropriate treatments. This situation revealed the lack of fast detection techniques for microbes, and faster testing techniques are needed for microorganisms [2].
We propose an image processing approach that uses deep and transfer learning techniques to identify bacterial colonies based on the appearance of the colony morphology. This approach is an effective alternative to traditional methods such as molecular biology techniques, the VITEK 2 System, and mass spectrometry systems. In general, existing methods are expensive, have substantial time costs for data processing [3], and depend on instruments or expertise that are not available in all laboratories. Our bacterial colony classification approach is cost- and time-effective and achieves high accuracy.
The key contributions of this paper are summarized as follows:
We collected a new dataset of Escherichia coli (E. coli) and Klebsiella pneumoniae (K. pneumoniae) bacteria using mobile device cameras instead of microscope devices. To the best of our knowledge, this is the first bacterial dataset captured by RGB cameras for full and partial plates (no single colonies).
We developed and applied a deep learning-based approach on the newly collected dataset.
We conducted extensive experimental studies to evaluate our approach using various model architectures and imaging conditions, such as different camera angles, distances to bacteria and backgrounds.

II. RELATED WORK
Several recent studies have shown that deep learning techniques can be applied to develop valuable medical and biological applications using different algorithms [4]. These applications have been utilized in hospitals, veterinary clinics, and food industries as image classification tools in biological and medical settings [5]. A deep learning technique for classifying different species of bacteria was recently proposed by [6], and a set of single colony images was used to train their model. In comparison, in this paper, several models were trained to differentiate two types of bacteria using images of both full and partial plates acquired at different angles and distances that could be viewed by human eyes without requiring any technical devices that might simplify feature extraction. The convolutional neural network (CNN) model has been shown to identify five species of bacteria with 95% accuracy [7]. The CNN model has also been demonstrated to count bacterial colonies without human intervention [8]. Moreover, deep learning models identified 90% of the bacterial colonies in the first 3 hours of growth and more than 95% of the bacterial colonies after 12 hours [9]. Reference [2] used ResNet-18 and ResNet-50 models on a dataset that contained approximately 660 images of 33 species of bacteria and achieved 99.35% accuracy. Reference [10] used a machine learning process to classify 2,520 images of Klebsiella and achieved an accuracy of 96.71% using an extreme learning machine. Another study aimed to achieve early identification of pathogenic bacteria in food and water with a computational live bacterial detection system that regularly captures coherent microscopy images of bacterial growth inside a 60-mm-diameter agar plate and analyzes these time-lapse holograms using a deep neural network [9].
Transfer learning has been used to detect, identify, and classify bacteria. For instance, the authors in [11] performed bacterial classification using atrous convolutions and transfer learning. Atrous learning was applied to increase the number of dimensions. VGG-16 with transfer learning was utilized to perform atrous convolutions. Approximately 660 images in the DIBaS dataset were used to train, validate, and test the models, and the bacterial images were classified into 33 categories. The results show that the proposed atrous transfer learning model achieved good results, obtaining a classification accuracy of 95%.
The authors in [12] achieved bacterial classification using a pretrained DenseNet-201 model. Microscopic images of bacterial pathogens were classified into six categories. Transfer learning was applied with the frozen weight technique, in which the weights of all layers except the classification layer were held constant. After image augmentation and resizing, more than 40,000 image pairs were used to train and test the proposed model. The results indicated that DenseNet-201 outperformed VGG-16 and ResNet-18, achieving an accuracy of 99.2%.
The study in [13] identified and classified longitudinal bacteria using transfer learning with ResNet-18. The dataset contained approximately 5000 microscopic images of Thiosymbion bacteria. Each image included only one cell shape. With transfer learning, ResNet-18 accurately classified 99% of longitudinal cell divisions. Furthermore, [14] presented an open source tool called MotilityJ for detecting the spread of bacteria. The proposed tool was developed using deep transfer learning. Several deep learning models, such as ResNet-101 and EfficientNet-B3, were used for transfer learning. The results show that the proposed tool segmented bacterial colonies with up to 100% accuracy.
Transfer learning has also been used extensively to detect COVID-19 and pneumonia. For instance, [15] presented a classification approach for rapidly differentiating between COVID-19, pneumonia, and healthy persons by using VGG-19 and transfer learning on chest X-ray images. A freely available dataset of approximately four thousand (3797) images was used in their analyses. Several image processing techniques were applied to preprocess the images, including resizing the images to 256 × 256, applying a Gaussian filter, and equalizing the histograms across each RGB channel. Transfer learning was exploited by using VGG-19, a 19-layer CNN-based architecture pretrained on the ImageNet dataset. An accuracy of 97% was achieved on the test data. The results indicate that transfer learning can improve the performance of deep learning models.
In [16], the authors applied transfer learning to investigate the performance of a CNN for automatically detecting coronaviruses. Two datasets composed of X-ray images obtained from public medical repositories were used. Each dataset contained more than 1400 images, including COVID (200), healthy (500) and viral or bacterial pneumonia images (700). Transfer learning was applied because the number of confirmed COVID-positive images was small. With transfer learning, pretrained CNN model weights are used, which addresses the lack of training images. Five CNN models were used in their analyses: VGG-19, Inception, Xception, Inception ResNet v2, and MobileNet. The results indicate that MobileNet and VGG-19 achieved the highest accuracy of 96.7%.
A transfer learning approach for addressing the lack of training data available for identifying COVID-19, viral pneumonia and healthy individuals in lung X-ray images is presented in [17]. Model integration and transfer learning techniques were applied before classification. Specifically, pretrained ResNet-152 and ResNet-101 models were trained on the obtained datasets to accurately classify images into three classes. In the model integration stage, these models were combined by freezing their weights. Two publicly available datasets, namely, Chest X-ray and RSNA Pneumonia, were used for training and testing. The resulting model classifies a given image into one of three classes and achieved an accuracy of 96.1%.
A generative adversarial network (GAN) based on transfer learning to automatically detect pneumonia caused by COVID-19 is presented in [18]. The model classifies the images into two classes: healthy and pneumonia. The dataset contains approximately 5800 chest X-ray images. The small size of the dataset does not affect the model performance because transfer learning and the GAN prevent overfitting, thereby improving model performance. The GAN generates a large number (90%) of similar images to increase the number of training images while preventing overfitting. Four CNN models were selected to perform transfer learning and identify pneumonia images: ResNet, AlexNet, SqueezeNet, and GoogLeNet. The results indicate that ResNet displayed the best performance, achieving the highest accuracy of 99% on the test data.
A modified CNN transfer learning approach for detecting COVID-19 in chest X-ray and CT images is presented in [19]. Deep transfer learning was applied by modifying some layers of a pretrained AlexNet model. In addition to the COVID detection approach, the authors generated publicly available datasets composed of CT and X-ray images collected from several sources. The dataset contained approximately 500 images, including 120 X-ray images and 340 CT images. The approach detected COVID-positive images with 98% accuracy. Another study [20] proposed a transfer learning approach for identifying COVID-19 and differentiating COVID-19 from viral pneumonia, bacterial pneumonia, and healthy lungs. Two classification models (multiclass and binary) were implemented after training nine deep transfer learning models. These models range from eight layers in AlexNet to 201 layers in DenseNet. Two freely available datasets, including COVID-19 chest X-ray and pneumonia datasets, were used for transfer learning. The results indicate that ResNet achieved an average accuracy of 98.75%, and 100% accuracy was achieved in differentiating COVID-19 from healthy individuals, viral pneumonia, and bacterial pneumonia.
In [21], the authors proposed a computer-aided COVID-19 diagnosis algorithm based on preprocessed chest X-ray images. Numerous preprocessing techniques were applied to enhance the images and improve the classification accuracy. These preprocessing techniques include histogram equalization and low-filter algorithms. Next, these images were combined and used as input in a transfer learning CNN trained on a publicly available dataset of 8474 images. The VGG-16 transfer learning model classifies images into three classes, namely, COVID pneumonia, non-COVID pneumonia, and healthy, with an accuracy of 94.5%. The results indicate that preprocessing images can improve the detection process and COVID-19 detection results.
The authors in [22] presented an ensemble transfer learning approach for detecting COVID-19 in chest X-rays while differentiating between viral and bacterial pneumonia. Ensemble learning was applied by merging the results of several pretrained models, including ResNet v2, DenseNet-201 and VGG-16. The model learns deep features in chest X-ray images and uses these features to implement binary and multiclass classification. Two datasets with more than 1600 images were used for training and testing. The model achieved accuracies of 99% and 96.1% on the two datasets.
Another transfer learning technique for performing multiclass classification for COVID-19 detection is presented in [23]. Two datasets containing 5500 images of thoracic radiographs were used to train ResNet and classify images as healthy, viral pneumonia, or bacterial pneumonia. Furthermore, an experiment was performed to compare the performance of the CNN with and without transfer learning. The results indicate that the transfer learning approach with ResNet-50 achieved an accuracy of 97%, outperforming deep learning methods without transfer learning.
In [24], the authors proposed a deep transfer learning approach using ResNet-50, DenseNet-121 and VGG-19. In most of the literature, transfer learning is used with small datasets. In that study, however, a large number of images were used, and each image is classified into one of four categories, namely, opaque lungs, healthy lungs, COVID-19 pneumonia, and viral pneumonia. A large dataset with more than 20,000 chest X-ray images was used for transfer learning to train and validate the model, achieving an accuracy of 94%. The results indicate that transfer learning is a promising approach for identifying COVID-19 and conventional pneumonia in chest X-ray images.
Table 1 shows the results of some recent research related to our work. The table includes AI methods for classifying bacteria that achieved the highest reported accuracy on the dataset, including the sample size, dataset name and image type. These studies used datasets containing microscopic images of bacterial cells, whereas a new dataset of bacterial colony images was collected in our study, using mobile device cameras to directly capture the colonies without any microscopes or magnification devices.
As shown in Table 1, several popular CNN architectures, such as AlexNet [25], VGGNet [26], and ResNet [27], have been widely used in research. We experimented with various architectures to investigate the applicability and usability of our dataset and report all results to provide an in-depth review of the applicability and limitations of the proposed dataset.

III. APPROACH
This approach was developed based on the success of CNN-based models for solving image classification tasks by using previously labeled data to learn how to classify new data. We designed our approach using state-of-the-art architectures and evaluated it on our in-house collected dataset. Regardless of the architecture, the model is initialized with random weights or with weights from a model pretrained on another dataset. Before training on our data, the model's predictions are essentially arbitrary. During training, the weights are adjusted to obtain more reasonable outputs.
To initialize the model weights, we used the weights of a model trained on a different dataset, which is a common method in deep learning known as transfer learning. Transfer learning draws on what was learned while solving a previous task on another dataset to solve the current task of interest. Additionally, via experimentation, we found that transfer learning yields better performance than other approaches.
In this paper, we used various popular CNN architectures, such as ResNet-18, AlexNet, VGG-16, SqueezeNet, and DenseNet 161, to identify two bacterial species in digital images of bacterial colonies. During the training phase, as shown in Figure 1, the model was exposed to images of both classes accompanied by sample class labels. During each learning step, the model weights were adjusted to increase the number of times that the model predicted the correct class and decrease the number of times that the model predicted the incorrect class. With sufficient time and training data, the model should be able to classify a new sample correctly.
Our approach is simple and commonly used when applying CNNs for classification tasks. We used a CNN followed by a classifier layer that has two output values representing the different bacterial types, as shown in Figure 1. A cross-entropy loss and a softmax layer were used during training, aiming to minimize the loss via backpropagation during each iteration.
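To make the training objective concrete, the following is a minimal sketch of the softmax and cross-entropy computation for a two-output classifier. It is not the implementation used in this study: the function names, the class order, and the example logits are assumptions chosen for illustration only.

```python
import math

def softmax(logits):
    """Convert raw classifier outputs into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[label])

# Assumed class order; the paper does not state which output is which class.
CLASSES = ["E. coli", "K. pneumoniae"]

logits = [2.0, 0.0]                      # hypothetical classifier outputs
probs = softmax(logits)
loss_correct = cross_entropy(logits, 0)  # true class matches the larger logit
loss_wrong = cross_entropy(logits, 1)    # true class is the other one
```

The loss is small when the larger output corresponds to the true class and large otherwise, which is the quantity that backpropagation reduces at each iteration.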
During the testing phase, the CNN and classifier layer are both fixed, and their weights are held constant. The weights learned during the training phase are used to output the predicted results, which are later used to evaluate our approach, as shown in Figure 2. The CNN and classifier layers can vary, and we tested various networks.

IV. DATASET
The dataset collected in this study consists of 8,043 digital images of colonies of two different bacterial strains: E. coli and K. pneumoniae. These digital images were collected in the National Centre for Biotechnology-KACST in the Kingdom of Saudi Arabia. The bacterial strains were initially streaked on MacConkey agar plates and incubated at 37 °C for 24 hours. A single colony was taken with a 10 µl loop and restreaked on a fresh MacConkey agar plate for increased purity. These plates were incubated under the same conditions as the other plates in the dataset.
Three different mobile phones were used to collect these photos (iPhone Xs Max, iPhone 11 Pro and iPhone 7). All phone cameras were set at 1080p resolution to obtain higher-quality photos. The horizontal and vertical resolution of all images was 72 dots per inch (dpi).
Different shooting settings were also used in the collecting process in ''controlled'' and ''uncontrolled'' environments to obtain a wider range of image qualities, camera poses, lighting conditions, etc. In the controlled environment, the distance between the camera and plate, camera pose, lighting conditions, and camera itself were fixed during image collection. In the uncontrolled environment, these conditions were not fixed. Table 2 summarizes the conditions in the controlled and uncontrolled environments, while Figure 4 presents fragments of the images.
All images in the dataset were processed depending on whether the image was in the training or testing set. All training data were resized to 224 × 224 pixels and randomly flipped horizontally or vertically, and all values were converted to tensors. The preprocessing for the testing data, however, included only two steps: the images were resized to 224 × 224 pixels, and the pixels were normalized for color clarity. Figure 3 shows some sample images before and after preprocessing. It is worth noting that we applied different preprocessing steps for the training and testing data to increase the variety in the training data, thereby increasing the accuracy and noise robustness of our model.
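The two preprocessing paths can be sketched as follows. This is an illustrative sketch only, not the pipeline used in this study: it operates on a single-channel pixel grid stored as nested lists rather than on RGB tensors, the nearest-neighbour resizing method and the flip probability p are assumptions, and the normalization constants mean and std are left as parameters because their values are not stated in the text.

```python
import random

def resize_nearest(image, size=224):
    """Nearest-neighbour resize of a 2-D pixel grid to size x size."""
    height, width = len(image), len(image[0])
    return [[image[r * height // size][c * width // size]
             for c in range(size)] for r in range(size)]

def random_flip(image, rng, p=0.5):
    """Apply a horizontal and a vertical flip, each with probability p."""
    if rng.random() < p:
        image = [row[::-1] for row in image]  # horizontal: mirror columns
    if rng.random() < p:
        image = image[::-1]                   # vertical: mirror rows
    return image

def normalize(image, mean, std):
    """Scale 0-255 pixels to [0, 1], then standardize with mean and std."""
    return [[(px / 255.0 - mean) / std for px in row] for row in image]

def preprocess_train(image, rng):
    """Training path: resize, then random flips for augmentation."""
    return random_flip(resize_nearest(image), rng)

def preprocess_test(image, mean, std):
    """Testing path: resize, then normalize; no random augmentation."""
    return normalize(resize_nearest(image), mean, std)
```

Keeping the random flips out of the testing path makes evaluation deterministic, while the training path sees a different variant of each image across epochs.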

V. DATASET VALIDATION
The validity and authenticity of the newly collected images in the dataset used in this study were first confirmed by identifying both bacterial strains using the VITEK 2 system. Then, we tested the common features of colonies produced by both bacterial strains when streaked on MacConkey agar plates. These features include the color, appearance and texture of the colonies. The E. coli colonies are known to appear as round, dome-shaped colonies that are slightly depressed in the middle, with bump point locations in the middle of the depression. These colonies can also appear as rough-edged, flat, or flared colonies [28]. In contrast, K. pneumoniae colonies appear as round, dome-shaped, mucoid, pink colonies, as shown in Figure 4 [29], [30].

A. SYSTEM SETUPS AND MODEL PARAMETERS
We used five CNN architectures: ResNet-18 [31], AlexNet [25], VGG-16 [26], SqueezeNet [32], and DenseNet 161 [33]. All models were pretrained on the 1000-class ImageNet dataset [34]. The output of the classifier layer in all models was reshaped to suit our classification task. Therefore, we edited the classifier layer to have one of two classes, E. coli or K. pneumoniae, as its output. We used Google Colab to train the models. Google Colab provided different resources during each run; however, these resources affect only the running time, which is beyond the scope of this paper. The model parameters were set as follows:
Learning rate = 0.0001
Momentum = 0.9
Batch size = 8
Optimizer = Stochastic gradient descent
Number of epochs = 30
The above training settings were chosen after repeating Experiment #1 several times and estimating the model accuracy on a validation dataset, which is a subset of the training dataset.
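For illustration, the listed settings can be written as constants together with one common formulation of the stochastic gradient descent update with momentum. This is a sketch under assumptions: the text does not state which deep learning framework or which momentum variant was used, and the function name and the example weight and gradient values are hypothetical.

```python
# Training settings as listed above.
LEARNING_RATE = 1e-4
MOMENTUM = 0.9
BATCH_SIZE = 8
NUM_EPOCHS = 30

def sgd_momentum_step(weights, grads, velocity,
                      lr=LEARNING_RATE, momentum=MOMENTUM):
    """One SGD-with-momentum update.

    Assumed formulation: velocity = momentum * velocity + gradient,
    then weight = weight - lr * velocity.
    """
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_weights = [w - lr * v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Two steps on a single hypothetical weight with a constant gradient of 2.0.
w, v = [1.0], [0.0]
w, v = sgd_momentum_step(w, [2.0], v)
w, v = sgd_momentum_step(w, [2.0], v)
```

With a constant gradient the velocity accumulates, so the second step moves the weight further than the first. This is how momentum speeds up progress along consistent gradient directions.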

B. RESULTS
Given that the dataset and approach are the first of their kind, there are no baseline or existing state-of-the-art methods with which we can compare our model. Thus, we used multiple architectures and experimental conditions to provide sufficient evidence demonstrating the applicability and effectiveness of our approach. We evaluated our approach on each model with five different experimental settings in terms of the data distributions and selections for training and testing. Table 3 shows the number of bacterial plates used for each experiment.
In the first experiment, we used 241 controlled training images and 50 controlled testing images to evaluate the applicability of our approach in the easiest scenario where the training and testing data are similar. In the second experiment, we applied 5-fold cross-validation to assess the model under various data divisions and to measure the accuracy with more robustness. The dataset was divided as follows:
Fold 1 contains E. coli and K. pneumoniae training images from plates 1 to 8 and testing images from all other plates.
Fold 2 contains E. coli and K. pneumoniae training images from plates 9 to 16 and testing images from all other plates.
Fold 3 contains E. coli and K. pneumoniae training images from plates 17 to 24 and testing images from all other plates.
Fold 4 contains E. coli and K. pneumoniae training images from plates 25 to 32 and testing images from all other plates.
Fold 5 contains E. coli training images from plates 33 to 36, K. pneumoniae training images from plates 33 to 39, and testing images from all other plates.
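The plate-level division above can be sketched as a small helper that returns the training and testing plate numbers for a given fold. This is an illustrative sketch, not the code used in this study: the function name is hypothetical, and the total plate counts (36 for E. coli and 39 for K. pneumoniae) are assumptions inferred from the upper bounds given for Fold 5.

```python
# Assumed totals, inferred from the plate ranges listed for Fold 5.
N_PLATES = {"E. coli": 36, "K. pneumoniae": 39}

def fold_split(fold, n_plates):
    """Return (train_plates, test_plates) for a fold numbered 1 to 5.

    Folds 1-4 train on eight consecutive plates; fold 5 trains on the
    remaining plates (33 up to n_plates). All other plates are for testing.
    """
    if not 1 <= fold <= 5:
        raise ValueError("fold must be between 1 and 5")
    start = 8 * (fold - 1) + 1
    end = 8 * fold if fold < 5 else n_plates
    train = set(range(start, end + 1))
    test = set(range(1, n_plates + 1)) - train
    return train, test

# Plate sets for every fold and species.
splits = {
    (fold, species): fold_split(fold, n)
    for fold in range(1, 6)
    for species, n in N_PLATES.items()
}
```

Splitting by plate rather than by image keeps every image of a given plate on one side of the split, so no plate contributes to both training and testing within a fold.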
The third experiment used 291 controlled training images and 4,387 uncontrolled testing images to evaluate the effectiveness of the model when the test images were acquired under slightly different conditions and to measure the reduction rate. In the fourth experiment, we used 2,614 uncontrolled training images and 1,772 uncontrolled testing images to evaluate the robustness of the model on a variety of data. In the fifth experiment, we used 4,386 uncontrolled training images and 291 controlled testing images to assess the comparative performance of the model when trained on a variety of data versus controlled data. Table 4 shows the details of each experiment. The number of available images during the training phase varies in each experiment.
The results of all experiments evaluating the approach in various conditions and settings indicate the applicability of our method. As shown in Table 5, the VGG-16 architecture yielded very high accuracy, outperforming all other architectures under all conditions and image settings, including different cameras, angles, camera poses, and lighting conditions. The accuracy of VGG-16 in experiment #3 was 86%, which is expected given that the model was trained on images obtained under more controlled settings, i.e., easier tasks, and tested on images obtained under less controlled settings, i.e., harder tasks.
To gain more insight into the training phase, Figure 5 shows the loss during each epoch, which decreases as expected.
In general, the results are consistent with common knowledge about training on data with higher variety: a model trained on more varied (uncontrolled) data obtains higher robustness and accuracy than a model trained on less varied (controlled) data. Moreover, more training data improves model performance, thereby explaining the increased accuracy. The experimental results show the effectiveness of our approach.
To further analyze the results, in addition to the accuracy, we calculated the F1-score, precision and recall, and the results are shown in Tables 6 and 7. In general, the results confirm that the model has good performance and is consistent across different classes of bacteria.
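For reference, the per-class precision, recall, and F1-score can be computed from true and predicted labels as sketched below. This is a minimal illustration, not the evaluation code used in this study: the function name is hypothetical, and the five example labels are made up and are not results from our experiments.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1-score treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels for illustration only ("E" = E. coli, "K" = K. pneumoniae).
y_true = ["E", "E", "E", "K", "K"]
y_pred = ["E", "E", "K", "K", "K"]
metrics_e = per_class_metrics(y_true, y_pred, "E")
metrics_k = per_class_metrics(y_true, y_pred, "K")
acc = accuracy(y_true, y_pred)
```

Reporting the metrics per class shows whether errors are concentrated in one species, which overall accuracy alone does not reveal.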

VII. DISCUSSION AND CONCLUSION
This paper presented and evaluated a bacterial colony classification approach. The approach was developed and evaluated on real-world data that we collected in house. The collected data is, to the best of our knowledge, the first of its kind. The evaluation process contained various data conditions and architectures. Colonies of two bacterial species were prepared for this study: Escherichia coli and Klebsiella pneumoniae. The images were captured under two settings: controlled and uncontrolled. The controlled and uncontrolled datasets contained 321 and 7,722 images, respectively. A comprehensive explanation of five experiments performed on the dataset is provided in this paper. The applicability, effectiveness, and robustness of the dataset and approach were tested. The accuracy of the proposed method ranged from 72% to 100%. Higher accuracy was obtained by using the controlled dataset for training and testing or the uncontrolled dataset for training.
The higher accuracy reported when using the uncontrolled data for training was due to the increase in the amount of data and image variety in terms of pose and lighting conditions, which increased the robustness of the approach.
The proposed approach was trained using a dataset captured by different mobile devices on full or partial plates. This approach increases the speed and efficiency of the identification process in laboratories and does not require expert assistance. Thus, we believe that the proposed model offers considerable value to laboratories, as our method may accelerate their work and reduce the need for expert experience.