Predicting the Breed of Dogs and Cats with Fine-Tuned Keras Applications

Image classification is one of the most common applications of deep learning. Images of dogs and cats are widely used as examples for image classification models because they are relatively easy for the human eye to recognize. Classifying the breed of a dog or cat, however, adds its own complexity. In this paper, pre-trained Keras application models were fine-tuned on a new dataset of dogs and cats to predict the breed of an identified dog or cat. Keras applications are deep learning models that have been previously trained on general image datasets from ImageNet. Here, the ResNet-152 v2, Inception-ResNet v2, and Xception models, adopted from the Keras applications, are retrained to predict the breed among 21 classes of dogs and cats. Our results indicate that the Xception model produced the highest prediction accuracy: a training accuracy of 99.49%, a validation accuracy of 99.21%, and a testing accuracy of 91.24%. Its training time was about 14 hours, and its prediction time was about 18.41 seconds.


Introduction
Deep learning is commonly used to solve computer vision problems, with researchers building upon each other's work. Dean et al. [1] applied deep learning to speech recognition, and Krizhevsky et al. [2] developed a convolutional neural network (CNN) for image classification. Recognizing that building new, accurate CNNs is difficult because of the data and time required, researchers have fine-tuned existing models to improve results without that expense. Tajbakhsh et al. [3] demonstrated that a fine-tuned CNN model can produce better results than an all-new one, and Yosinski et al. [4] demonstrated the feature transferability of CNN models. Following this approach, we develop a new image classifier using a pre-built Keras model with a new dataset: our aim is to tailor a Keras model into a classifier for identifying the breeds of dogs and cats (CDC). The same approach could also be applied to develop new image classifiers that enable robots in automated factories to identify objects, using appropriate datasets of tools and workpieces.

Related Works
Image classification of dogs and cats has been frequently used as a case study for deep learning methods. Parkhi et al. [5] proposed a method for classifying images of 37 different breeds of dogs and cats with an accuracy of 59%. Panigrahi et al. [6] used a deep learning model to classify images of dogs and cats simply as "dog" or "cat", with a testing accuracy of 88.31%. Jajodia et al. [7] used a sequential CNN to make a similar basic distinction, with 90.10% accuracy. Reddy et al. [8], Lo et al. [9], Deng et al. [10], and Buddhavarapu et al. [11] utilized transfer learning with Keras models to create new ResNet-, Inception-ResNet-, and Xception-based models.

CDC Dataset
We adopted Keras models pre-trained on a general image dataset and fine-tuned them with our new CDC dataset of dog and cat images, as listed in Tab. 1. The data comprise images of 21 breeds of dogs and cats divided into training, validation, and testing sets as shown: 20,574 training images, 2,572 validation images, and 2,590 testing images, all taken from Dreamstime stock photos [12]. To test our approach, we adopted the Inception-ResNet v2, ResNet-152 v2, and Xception models from the Keras repository and fine-tuned them on the new dataset: we omitted the top layer of each Keras model, redefined the fully connected output layer to match the number of breeds, and trained on the training and validation sets. The training parameters are shown in Tab. 2. For ResNet-152 v2 and Inception-ResNet v2, we used 50 epochs and a batch size of 16; for Xception, 50 epochs and a batch size of 4. In all cases we used stochastic gradient descent (SGD) as the optimizer with a learning rate of 0.0001 and binary cross-entropy as the loss function.
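The fine-tuning setup described above can be sketched in TensorFlow/Keras roughly as follows. This is a minimal illustration, not the authors' actual code: the choice of Xception, the single dense output layer, and all other details beyond those stated in the text (pooling mode, metrics) are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

NUM_BREEDS = 21  # number of dog and cat breeds in the CDC dataset

# Load Xception pre-trained on ImageNet, omitting the top (classification) layer.
base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3), pooling="avg"
)

# Redefine the fully connected output layer for the 21 breeds.
model = models.Sequential([
    base,
    layers.Dense(NUM_BREEDS, activation="softmax"),
])

# SGD optimizer with a learning rate of 0.0001, as in Tab. 2,
# and binary cross-entropy as the loss function, as stated in the text.
model.compile(
    optimizer=optimizers.SGD(learning_rate=0.0001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```

Training would then proceed with `model.fit` over the training and validation image sets for the epoch and batch-size settings given above.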
We created the confusion matrix by applying each fine-tuned Keras model to the testing dataset. We then calculated the testing accuracy of each CDC-Keras model combination from the confusion matrix using Eq. (1):

accuracy = ( Σ_{i=1}^{n} TP_i ) / ( Σ_{i=1}^{n} (TP_i + FN_i) )     (1)

where TP_i is the true positive count for breed i, FN_i is the false negative count for breed i, and n is the number of breeds.

ResNet-152-Based CDC Model
He et al. [13] proposed ResNet v2 as an improvement of the residual network model: identity mappings directly propagate the forward and backward signals from one block to the next. ResNet v2 offers better performance than the previous version and comes in several variants with different numbers of hidden layers. ResNet-152 v2 has 152 hidden layers and uses a fixed input image size of 224 × 224 RGB pixels.
The training results for CDC using the fine-tuned ResNet-152 v2 model are shown in Fig. 1.

Inception-ResNet-Based CDC Model
Szegedy et al. [14] developed the Inception-ResNet v2 model as an improvement to Inception v3, adding residual connections to increase training speed and recognition performance. The input images to Inception-ResNet v2 are a fixed size of 299 × 299 RGB pixels. Training results of the fine-tuned Inception-ResNet-based CDC model are shown in Fig. 2. The final training and validation accuracies were about 98.97% and 98.94%, respectively. Training this combination required 12 hours and 29 minutes. The confusion matrix of the Inception-ResNet-based CDC model is shown in Tab. 4. The testing accuracy was about 89.50%.

Xception-Based CDC Model
Chollet [15] developed Xception as an improvement to Inception v3. The two models have roughly the same number of parameters, but Xception uses depthwise separable convolutions. The inputs to Xception are fixed-size 299 × 299 RGB images. The training results of the fine-tuned Xception-based CDC model are shown in Fig. 3. The final training and validation accuracies of the Xception-based CDC model were 99.50% and 99.23%, respectively. This combination required 13 hours and 48 minutes for training. The confusion matrix of the Xception-based CDC model is shown in Tab. 5. The testing accuracy was 91.24%.
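The efficiency of separable convolutions can be seen directly in Keras by comparing parameter counts for a standard and a depthwise separable convolution of the same shape; the layer sizes below are chosen arbitrarily for illustration and do not come from the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Identical 3x3 convolutions over a 64-channel input, producing 128 channels.
inp = tf.keras.Input(shape=(32, 32, 64))
standard = layers.Conv2D(128, 3, padding="same")
separable = layers.SeparableConv2D(128, 3, padding="same")
standard(inp)   # build the layers so their weights are created
separable(inp)

# A depthwise separable convolution factors spatial filtering (per channel)
# from channel mixing (1x1 pointwise), so it needs far fewer parameters.
print(standard.count_params())   # 3*3*64*128 + 128 = 73856
print(separable.count_params())  # 3*3*64 + 64*128 + 128 = 8896
```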

Model Comparison
Tab. 6 shows the training and validation accuracies for each fine-tuned CDC model. The table also includes values for the fine-tuned VGG19-based CDC model from our previous work [16]. The models evaluated in this paper all have higher accuracy than the VGG19-based CDC model, and the Xception-based CDC model had the highest accuracy of all. The testing accuracy for the Norwegian Forest Cat class was poor across models; we attribute this to similarities between the Norwegian Forest Cat and the Siberian and Maine Coon cats it was most commonly misclassified as. Fig. 4 shows images of these three breeds. Tab. 8 shows the training and testing times for all models. VGG19 had the shortest training time, 6 hours and 38 minutes, but the poorest accuracy. The Xception-based CDC model had the longest training time but achieved the highest accuracy and, among the models evaluated in this paper, required the least time to identify a breed.
We also used these models to identify individual dog and cat images, with results shown in Fig. 5 and prediction times in Tab. 9. The Xception-based CDC model was slower than the VGG19-based CDC model but faster than the other two models. Fig. 6 plots the overall performance of all models: Xception offered better accuracy than VGG19 at the cost of slower classification.
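Per-image prediction of the kind timed in Tab. 9 can be sketched as follows. This is an assumed workflow, not the authors' code: the model below is an untrained stand-in built in place (in practice one would load the fine-tuned weights), and the input is a random array standing in for a preprocessed photograph.

```python
import time
import numpy as np
import tensorflow as tf

# Stand-in for the fine-tuned Xception-based CDC model (untrained here;
# in practice the saved fine-tuned weights would be loaded instead).
base = tf.keras.applications.Xception(
    weights=None, include_top=False, input_shape=(299, 299, 3), pooling="avg"
)
model = tf.keras.Sequential([base, tf.keras.layers.Dense(21, activation="softmax")])

# Random stand-in for a preprocessed 299 x 299 RGB dog or cat image.
x = np.random.rand(1, 299, 299, 3).astype("float32")

start = time.time()
probs = model.predict(x)            # one probability per breed
breed_index = int(np.argmax(probs))
elapsed = time.time() - start
print(f"predicted breed class {breed_index} in {elapsed:.2f} s")
```

With a real image, one would first resize it to 299 × 299 and apply `tf.keras.applications.xception.preprocess_input` before calling `predict`.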

Conclusion
In this paper, we have presented an image classifier for identifying the breeds of dogs and cats that incorporates fine-tuned deep learning models from Keras. Our results show that the Xception-based CDC model has the highest accuracy among those tested, with training, validation, and testing accuracies of 99.49%, 99.21%, and 91.24%, respectively. Although the VGG19-based CDC model from our previous work trains the fastest of all the models, it offers the lowest accuracy. The higher accuracy of Xception comes at a cost: the longest training time, at 13 hours and 48 minutes. Nonetheless, Xception was the second-fastest at identifying breeds, behind only the relatively inaccurate VGG19.