Varied Image Data Augmentation Methods for Building Ensembles

Convolutional Neural Networks (CNNs) are used in many domains, but their need for large datasets to train robustly without overfitting makes them hard to apply in medical and similar fields. However, when large quantities of samples cannot be easily collected, various methods can still be applied to mitigate the problem, depending on the sample type. Among these, data augmentation has recently been in the spotlight, mostly because of the simplicity and effectiveness of its most widely adopted techniques. The research question addressed in this work is whether data augmentation techniques can help in developing robust and efficient machine learning systems to be used in different domains for classification purposes. To that end, we introduce new image augmentation techniques that make use of different methods, such as the Fourier Transform (FT), Discrete Cosine Transform (DCT), Radon Transform (RT), Hilbert Transform (HT), Singular Value Decomposition (SVD), Local Laplacian Filters (LLF), and the Hampel filter (HF). We define different ensemble methods by combining various classical data augmentation methods with the newer ones presented here. We performed an extensive empirical evaluation on 15 different datasets to validate our proposal. The obtained results show that the newly proposed data augmentation methods can be very effective even when used alone. The ensembles trained with different augmentation methods can outperform some of the best approaches reported in the literature, as well as compete with state-of-the-art custom methods. All resources are available at https://github.com/LorisNanni.


I. INTRODUCTION
Convolutional Neural Networks (CNNs) and their derivatives are a hot research topic, as these deep learners are among the top performers for image classification tasks. Leveraging the mathematical concept of convolution, CNNs learn kernels that extract salient features directly from the training sets, without human intervention or dedicated feature extraction algorithms. These learners are thus able to reach and surpass the efficacy and efficiency of handcrafted features thanks to their ability to perceive relationships in larger pixel clusters and to extract features independently of their position, by reducing the size of the input and expanding its depth while shaping it to the final output size. In general, CNNs need vast, labeled datasets to achieve acceptable results in classification problems due to their colossal number of parameters, and labeling requires human intervention. However, it is impossible to manually label the massive number of images needed to train CNNs (on the order of 14 million images with a thousand classes in the case of ImageNet [15]). In some cases, mostly in the medical and bioinformatic fields, where the number of obtainable samples is limited by external factors, collecting enough data for CNN training can be prohibitive due to cost, required expertise, and labor. When the datasets are instead enormous, researchers need access to powerful and costly machines that can handle the workload. Solutions to the problem of collecting big datasets have been used for a long time and are still being researched. The two most powerful techniques are based on transfer learning and data augmentation. The first makes use of CNN architectures pre-trained on enormous datasets, like ImageNet [15], which are then fine-tuned on smaller datasets. The second generates new samples based on the original ones to add to the training set. Other, newer methods include dropout layers [60], zero-shot or one-shot learning [50], [70], and batch normalization [60].
Data augmentation is adopted in different situations to overcome the scarcity of images and to help during the training phase of a model. For example, it is used to augment the information in low-light environments [32], as in underwater images, where it is necessary to maintain important details while at the same time enhancing the quality of the image [31], [73], [74].
Here we focus mostly on data augmentation due to its vital role in specific fields where big datasets cannot be created [27], [49], [61]. By enlarging the dataset, data augmentation promotes better generalization and reduces the problem of overfitting by adding and extracting information that is inherent in the training space. Most of the literature in this regard (see the surveys [27], [49], [61]) covers geometric transforms, color modification methods based on statistical probability, and learned methods such as Generative Adversarial Networks. Here, we analyze and evaluate the performance of ensembles built at the data level through combination by sum rule of different image manipulation methods, like the ones presented in [47].
The remainder of the paper is structured as follows: Section II reviews related work on data augmentation via image manipulation. Section III describes the new data augmentation approaches. Section IV describes the empirical evaluation, comparing all deep learning models trained with the image augmentation methods. Section V concludes this work and outlines further research on this topic.
The contributions of this article are as follows: i) We propose a set of new methods for data augmentation based on different image transformations. These new methods enjoy fast processing speed and are beneficial for different computer vision tasks. ii) We define different ensemble methods that combine different data augmentation methods during the training phase. Compared with existing machine learning methods, our ensembles based on data augmentation provide competitive results in different domains. iii) We provide an empirical evaluation of ensembles trained with classical and new data augmentation methods. Assessing several metrics and datasets, we show that making use of different and independent image augmentation methods is beneficial for ensembles.

II. RELATED WORK
A high-level taxonomy of data augmentation methods is depicted in Figure 1 [61]. The research study proposed in this work is focused on data augmentation methods computed by performing basic image manipulation. Most of these augmentation algorithms are easy to implement, thus providing feasible ways for scholars to adopt them. But practitioners must be careful in applying these image manipulations, because it is possible to produce new images that no longer belong to the same class as the original. For instance, flipping an image that represents the digit ''6'' would result in an image that represents the digit ''9.'' Among the geometric transformations, flipping, rotation, and translation are the most popular ones. Flipping, especially along the horizontal axis, is one of the simplest and most popular geometric transforms for data augmentation [61]. Rotating an image to the right or left by an angle in the range [1, 359] degrees is another typical geometric transformation, as is translation, which shifts a sample up, down, left, or right [61]. The latter can introduce undesirable noise [38]. Another technique consists of randomly cropping an image, which results in a new image with reduced size, often required to fit the input of a model. New images can also be generated by substituting random values into the original input. These techniques have been extensively evaluated in the literature [40], for instance, by comparing the performance of different AlexNets trained with these simple augmentation techniques and then assessing the performance on two different datasets [59]. On the standard ImageNet and CIFAR10 [28] benchmarks, models trained with data augmentation via image rotation provide better performance than models trained with data augmentation via translation, random cropping, or random values.
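As a minimal sketch of the geometric transforms just described (this is illustrative MATLAB, not the code used in any of the cited works; the 224x224 crop size is an assumed network input size):

    % Illustrative geometric augmentations; 'img' is any RGB training image.
    img = imread('peppers.png');                        % built-in example image
    flippedH = flip(img, 2);                            % horizontal flip
    flippedV = flip(img, 1);                            % vertical flip
    rotated  = imrotate(img, randi([1 359]), 'bilinear', 'crop');    % random rotation
    shifted  = imtranslate(img, [randi([-20 20]), randi([-20 20])]); % random translation
    % Random crop to a fixed network input size (assumed 224x224 here)
    r = randi(size(img, 1) - 223);
    c = randi(size(img, 2) - 223);
    cropped = img(r:r+223, c:c+223, :);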
Random erasing [76] and cutout [17] result in new images with occlusions; this is particularly useful as these methods reproduce standard situations in the real world, where objects are often only partially visible. A recent literature review of data augmentation methods reports on this type of technique [49]. Good performance has been obtained by a ResNet architecture trained on Fashion-MNIST, CIFAR10, and CIFAR100, where new images were generated by erasing portions of varying size from the original inputs [76].
Mixing images is another simple method for generating new images. This can be easily done by averaging the pixels between two or more images from the same class [25], or images can be transformed by adopting a transform whose outputs can be mixed, for instance, by chaining [23]. Data augmentation can also be accomplished by combining different techniques. For instance, in [25], random images were cropped and flipped before averaging the RGB channel values of each pixel in the images. Recently, non-linear transformations have been adopted to mix images [33], as have generative adversarial networks (GANs) [33].
Kernel filters are often used to blur or sharpen images with the aim of generating new images, for instance, by applying Gaussian blur with a sliding window of fixed size or by randomly swapping the values of the matrix in the filter window, as done by PatchShuffle [26]. The generation of new color spaces is another method adopted for data augmentation, with the positive side effect that possible existing illumination biases are removed [61]. It is also possible to compute a histogram of pixels in a color channel with the aim of applying different filters to each channel, or to convert one color space into another. But in some situations, for instance when changing RGB to grayscale, this operation can reduce the performance of a classifier [9]. Data augmentation is often produced by adding random noise to color distributions or by jittering and adjusting the contrast, saturation, and brightness of the original images [29], [59]. These color adjustments may result in the loss of valuable information. We point the reader to [65] for an exhaustive review of color space transforms for image augmentation.
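A hedged sketch of the filter- and color-based augmentations mentioned above; all parameter ranges are our illustrative choices, not values from the cited papers:

    img = imread('peppers.png');
    blurred   = imgaussfilt(img, 1 + 2*rand());          % random Gaussian blur
    sharpened = imsharpen(img, 'Amount', 0.5 + rand());  % random sharpening
    % Jitter brightness, saturation, and hue in HSV space
    hsv = rgb2hsv(img);                                  % doubles in [0,1]
    hsv(:,:,3) = min(hsv(:,:,3) * (0.8 + 0.4*rand()), 1);   % brightness
    hsv(:,:,2) = min(hsv(:,:,2) * (0.8 + 0.4*rand()), 1);   % saturation
    hsv(:,:,1) = mod(hsv(:,:,1) + 0.05*randn(), 1);         % small hue shift
    jittered = im2uint8(hsv2rgb(hsv));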
Sometimes, data augmentation techniques do not consider the entire training set. An example is PCA jittering [29], [41], [42], [59], [65], which performs data augmentation by multiplying the PCA components by a small number. In particular, only the first component, which is the most informative, is jittered [65]; alternatively, an image can be transformed by applying PCA or DCT and jittered by adding noise to all components before reconstructing the image [42].
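A sketch of PCA color jittering in the spirit described above (the exact procedures of [29], [59] may differ; pca() is from the Statistics and Machine Learning Toolbox):

    img = imread('peppers.png');
    X = reshape(im2double(img), [], 3);     % all pixels as rows, RGB as columns
    [coeff, ~, latent] = pca(X);            % principal axes of the color cloud
    alpha = 0.1 * randn(3, 1);              % small random magnitude per component
    shift = coeff * (alpha .* latent);      % perturbation along the principal axes
    jittered = reshape(min(max(X + shift', 0), 1), size(img));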

III. MATERIALS AND METHODS
In this section, we describe the data augmentation methods adopted from the literature and we introduce the new methods proposed in this study.

A. DATA AUGMENTATION MODELS
The number of images produced by each method used in this study is reported in Table 1.

1) OLD DATA AUGMENTATION METHODS
Old data augmentation methods are drawn from the literature. In particular, the methods labeled APP1 to APP11 are detailed in [47], while APP12 to APP14 are proposed in [48]. For the technical details of these methods, we point the interested reader to the original papers. APP1 produces 3 new images by geometric transformation. Starting from a given image, one image is generated by randomly reflecting the input from top to bottom and another one from left to right. The third transformation linearly scales the original image along both axes.
APP2 generates 6 new images by repeating APP1's operations with three additional ones: rotation, translation, and shearing.
APP3 generates 4 new images by replicating APP2 process without shear.
APP4 generates 3 new images by applying PCA-based transforms.
APP5 generates 3 new images by applying DCT-based transforms similar to the ones adopted in APP4.
APP6 generates 3 new images by altering the color space: the three images are constructed by altering contrast, altering sharpness, and color shifting.
APP7 generates 7 new images by altering the color space. The first four augmented images are produced by altering the pixel colors of the original image. Two images are generated by combining sharpening and a Gaussian filter. One image is generated by color shifting.
APP8 generates 2 new images by altering the color space followed by the application of two nonlinear mappings.
APP9 generates 6 new images by applying elastic deformations combined with low-pass filters.
APP12 generates 5 new images by combining DCT and the random selection of other images.
APP13 generates 3 new images by applying the Radon Transform (RT) in a different way.
APP14 generates 2 new images by applying the Fast Fourier Transform (FFT) and DCT.

2) NEW DATA AUGMENTATION METHODS
Five new approaches are proposed here. Notice that most of the methods that make use of direct transformations are grouped and depicted in Figure 3, while examples of the more complex data augmentations for each of the new methods are depicted in Figure 2.
APP15 produces 11 new images. The first image is generated by applying two consecutive DCT transforms to each of the matrices composing the three color planes, after which haze reduction and histogram equalization algorithms are applied for better readability (see Listing 1); a simplified sketch is also given below. The goal is not to extract a specific type of information, but rather to use the properties of the DCT to modify the image: since the inverse DCT is closely related to the forward DCT, taking two consecutive forward DCTs results in an image that appears similar to, but not the same as, the original. We exploit this DCT property to create new images from the original ones. The second image is obtained in a similar way by applying the FFT twice, but between the two applications all positions of the matrix whose modulus is higher than the average standard deviation of the columns are zeroed out (see Listing 2). Another image is generated through the FFT by averaging the phases of the obtained matrix with those of another random image from the same dataset; the inverse FFT is then applied to the result (see Listing 5). The fourth image is obtained by applying Singular Value Decomposition (SVD) and removing from the diagonal matrix all values lower than the maximum divided by a random integer in the range [50, 100]; the three matrices thus modified are then multiplied to get the final image (see Listing 3). The next three images make use of Local Laplacian Filtering for a random enhancement of the contrast of small details, a random smoothing of small details, and an overall increase of dynamic range and contrast. Another image is obtained through a transformation called color indexing, which reduces the number of colors available to represent the image to a random number in the range [8, 16] for each color plane. The ninth image uses a SuperPixel segmentation mask with a random number of clusters between 300 and 2000; for each cluster the mean color is computed and used to replace every pixel in the corresponding area (see Listing 4). A further image is obtained using the Hilbert transform, which extracts a discrete-time analytic signal from every column of the image in phase-and-modulus form; only the phase is kept to define the new image, after value normalization. The last image is the first distorted image of APP9. The pseudocode of these methods is reported in the listings, where STDrandomMask(matrix) returns a mask of the positions whose value lies within the average standard deviation of each row.
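As a minimal sketch, assuming our reading of Listing 1, the double-DCT image of APP15 could be computed as follows (two forward 2-D DCTs per color plane, followed by haze reduction and per-channel histogram equalization):

    img = im2double(imread('peppers.png'));
    out = zeros(size(img));
    for ch = 1:size(img, 3)
        out(:, :, ch) = dct2(dct2(img(:, :, ch)));   % forward DCT applied twice
    end
    out = mat2gray(out);             % rescale the result to [0,1]
    out = imreducehaze(out);         % haze reduction
    for ch = 1:size(out, 3)
        out(:, :, ch) = histeq(out(:, :, ch));       % histogram equalization
    end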
APP16. This method utilizes the DCT implementation presented for APP15 and the first distorted image of APP9.
APP17. This method introduces a new augmentation that makes use of the Hampel outlier-removal filter, generally used for signals. The image generated by this method is obtained by vectorizing the image, column by column, and then applying the filter with a measurement window of 20 and a standard deviation threshold of 1.5. Before being added to the augmented dataset, the linearized image is converted back to its original form by simply slicing it into columns of the final size. This method is combined with the DCT, Superpixel, and Hilbert augmentations of APP15. Listing 6 reports the pseudocode of this data augmentation method; a simplified sketch follows.
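A sketch of the Hampel-based augmentation under our interpretation of the description above (hampel() is from the Signal Processing Toolbox; we read the window of 20 as 20 neighbors per side):

    img = im2double(imread('peppers.png'));
    [h, w, nch] = size(img);
    out = zeros(h, w, nch);
    for ch = 1:nch
        col = reshape(img(:, :, ch), [], 1);   % vectorize column by column
        flt = hampel(col, 20, 1.5);            % window of 20, 1.5-sigma threshold
        out(:, :, ch) = reshape(flt, h, w);    % slice back into columns
    end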
APP18. This augmentation uses the same transforms as APP17 but substitutes the Superpixel method with the Laplacian-based one presented in APP15.
APP19. This method combines the FFT, Hilbert, and Hampel augmentations with a combination of APP1 through APP3 that contains: vertical and horizontal flips, random rotations in the range [1°, 180°], Gaussian noise addition, cropping of a random number of pixels in the range [0, 20] from every side, and hue, saturation, and brightness adjustments.
APP20. This method makes use of the first distorted image of APP9 together with the Superpixel (see APP15) and PixelShuffle methods, as well as a slight variation of the ''Patch shuffling'' technique [12]. The latter, instead of switching the places of regular patches of the original image, chooses sections of random dimensions to be randomly overlapped with other parts of the image (see Figure 2). The PixelShuffle method, also inspired by the ''Patch shuffling'' technique, shifts each pixel of the image to a randomly chosen position inside the 3 × 3 area surrounding it, starting from the top-left corner and progressing row by row. The outcome is a picture where the original pixels have been displaced to nearby positions, possibly resulting in multiple copies of the same pixel or in its removal from the resulting image. A sketch of a possible PixelShuffle implementation is shown below; Listing 7 reports the pseudocode of the full data augmentation method.
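A possible implementation of PixelShuffle, following our reading of the description above:

    img = imread('peppers.png');
    [h, w, ~] = size(img);
    out = img;
    for r = 1:h                                       % row-by-row scan
        for c = 1:w
            rr = min(max(r + randi([-1 1]), 1), h);   % random 3x3 offset,
            cc = min(max(c + randi([-1 1]), 1), w);   % clamped at the borders
            out(rr, cc, :) = img(r, c, :);            % neighbors may be overwritten
        end
    end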

B. DATASETS
In this work, we assess the proposed ensembles of augmentation methods on several image classification benchmarks. In particular, Table 2 reports all the information for each dataset: a short name, the original dataset name (if provided in the reference), the number of classes and samples, the size(s) of the images, the testing protocol, and the original reference. For the testing protocols, we adopt the following abbreviations:
• 5CV, 10CV indicate that a 5-fold or 10-fold cross-validation has been adopted;
• Tr-Te indicates a dataset pre-divided into training and testing sets in the original paper.
For instance, LAR and InfL have been divided with a three-fold division, and the different folds were provided by the authors. For PBC, the official protocol specifies that 88% of the images be included in the training set and 12% in the test set, with both sets maintaining the same per-class sample ratio as the original dataset. END includes a training set of 3302 images and an external validation set of 200 images. The performance indicator typically reported on these datasets is accuracy, which measures the rate of correct classifications. For the GRAV dataset, four different views are extracted at different durations from each glitch/image; the final score is therefore obtained by combining the four classification scores via the average rule. To assess the significance of the results, a statistical analysis has been performed using the Wilcoxon signed-rank test [14].

C. DEEP LEARNING MODELS
In this work, we adopt recent deep learning models for developing our ensembles and also as baselines to assess our proposal in the experimental analysis. In particular, we adopt ResNet50 [22], MobileNetv2 [55], EfficientNetB0 [64], and DenseNet [24]. These models are all recent convolutional neural networks, and they are adopted in this work to show the feasibility and effectiveness of the proposed solution.
ResNet50 is a deep convolutional neural network trained on the ImageNet dataset, with 50 layers. ResNet50 is considered to be a very effective and efficient model for image classification tasks, and it has been widely used in a variety of applications, including object detection, image recognition, and video classification. It is also often used as a base model for transfer learning, where it is fine-tuned for a specific task using a smaller dataset. One of the key features of ResNet50 is its use of residual connections, which allow the network to learn complex features more effectively by bypassing some of the layers and allowing the gradients to flow more directly through the network. This helps to alleviate the vanishing gradient problem, which can occur when training very deep networks and can make it difficult to learn meaningful features.
MobileNetv2 is a lightweight convolutional neural network designed for efficient on-device inference on mobile and embedded devices. It is based on the idea of depthwise separable convolutions, which allows the model to be more efficient and faster to compute than traditional CNNs. In a depthwise separable convolution, the input is first processed by a depthwise convolution, which applies a separate filter to each input channel, and then the resulting feature maps are processed by a pointwise convolution, which combines the feature maps using a 1 × 1 convolution (a layer-level sketch is given below). This allows the model to learn more complex features while still being efficient and fast to compute. MobileNetv2 also introduces the concept of inverted residuals, which are a modified version of the residual connections used in the ResNet architecture. Inverted residuals allow the model to learn more complex features by increasing the dimensionality of the feature maps in the bottlenecks, which are the layers that reduce the spatial dimensions of the feature maps.

EfficientNet is a family of convolutional neural network models that were developed to improve the efficiency and effectiveness of deep learning models. The EfficientNet models are designed to be scalable, so that they can be easily adapted to a variety of applications and datasets by adjusting the model size. They are characterized by their use of compound scaling, which allows them to improve the model performance by scaling up the network dimensions in a balanced way. The network dimensions include the number of channels in the convolutional layers, the spatial resolution of the input, and the depth of the network. By scaling these dimensions appropriately, the EfficientNet models can achieve better performance with fewer parameters and less computation than other CNNs. EfficientNet also introduces the concept of autotuning, which allows the model to automatically search for the optimal balance of network size, resolution, and depth for a given task.
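To make the depthwise separable convolution described above for MobileNetv2 concrete, the following is an illustrative Deep Learning Toolbox layer stack; the filter count (64) is a placeholder, not MobileNetv2's actual configuration:

    layers = [
        groupedConvolution2dLayer(3, 1, 'channel-wise', ...  % depthwise 3x3:
            'Padding', 'same')                               % one filter per channel
        batchNormalizationLayer
        reluLayer
        convolution2dLayer(1, 64)                            % pointwise 1x1 mixing
        batchNormalizationLayer
        reluLayer];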
DenseNet is a type of convolutional neural network that introduces the concept of dense connections, which allows the network to learn more complex features and improve performance. In a DenseNet, each layer is connected to all of the preceding layers, rather than just the immediately preceding layer as in traditional CNNs. This allows the network to incorporate features from all of the previous layers, which can be beneficial when learning from datasets with highly correlated features. DenseNet also uses a growth rate parameter to control the number of feature maps in each layer, which helps to reduce the number of parameters in the model and improve efficiency.

IV. EMPIRICAL EVALUATION
In this section we report the results of the empirical evaluation. All the experiments were run on a Windows Server 2019 machine with an Intel Core i9-10920X CPU (3.5 GHz), 256 GB of RAM, and an Nvidia Titan RTX GPU (24 GB, 1350 MHz); all methods were developed in MATLAB 2022a. We start our experiments by comparing the accuracy obtained by the following approaches:
• SA, a stand-alone network trained on APP3, which is a fairly standard data augmentation approach.
• BestSA, a stand-alone network trained on APP19, which produces the best average performance compared with all the other data augmentation sets.
• EnsDA_A, the fusion by sum rule among all the CNNs trained using APP1 to APP11; each net is trained with a different data augmentation approach. The data augmentation methods based on color spaces (i.e., APP6 to APP8) are not reported on VIR, HE, and MA since these datasets contain gray-level images.
• EnsDA_B, the same fusion as EnsDA_A with the addition of nets trained with the augmentation methods APP12 to APP14.
• EnsDA_C, the fusion by sum rule among those methods not based on feature transforms. Each approach is iterated twice (three times for VIR, HE, and MA, whose images are gray-level, so that the size of EnsDA_C is similar to that of EnsDA_B).
• EnsDA_Mix, the fusion by sum rule among the methods trained with APP1, APP2, and APP10:APP20;
• EnsDA_MixB, the fusion by sum rule among the methods trained on APP1:APP9 and APP15:APP19;
• EnsBase(X), a baseline ensemble intended to compare and validate the performance of EnsDA_*; EnsBase(X) combines (via sum rule) X networks trained separately on APP3, which is a fairly standard data augmentation approach. A minimal sketch of the sum-rule fusion is given after this list.
All the adopted models are pre-trained on ImageNet. In particular, in Tables 3, 4, and 5 we report the accuracy of the different architectures built on the different backbone networks.
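A minimal sketch of the sum-rule fusion behind the EnsDA_* ensembles: the per-class scores (softmax outputs) of the member networks are summed, and the class with the highest total is selected. Here 'scores' is an assumed cell array where scores{i} is a numSamples-by-numClasses matrix from the i-th network.

    fused = zeros(size(scores{1}));
    for i = 1:numel(scores)
        fused = fused + scores{i};            % sum rule
    end
    [~, predicted] = max(fused, [], 2);       % predicted class per test sample

The weighted sum rule used later by EnsTop_W would simply scale each scores{i} by its weight before the summation.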
In all three topologies, the highest average accuracy is obtained by EnsDA_MixB, which clearly outperforms the baseline EnsBase(14).
Accuracy is probably the most commonly used performance measure for classification problems, but it is less suitable for comparing classifiers because it depends on the choice of the classification threshold. To address this issue, the area under the ROC curve (AUC) [21] is a standard measure of choice when testing the performance of predictive models. It estimates the likelihood that a predictor ranks a randomly selected positive instance higher than a randomly selected negative one. This work uses the error under the ROC curve (EUC), an extension of AUC defined as EUC = 1 − AUC (a minimal sketch of its computation is given below). Tables 6, 7, and 8 show the performance of the proposed approaches in terms of EUC. Since EUC is a binary classification measure, for multiclass problems we report the EUC averaged over the classes. The results in Tables 6, 7, and 8 mainly confirm the trends observed for accuracy: the proposed ensembles outperform both the baseline ensemble and the individual methods based on data augmentation.
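A sketch of the EUC computation for a binary problem; perfcurve() is from the Statistics and Machine Learning Toolbox, and 'labels' and 'posScores' are assumed vectors of true classes and positive-class scores:

    [~, ~, ~, AUC] = perfcurve(labels, posScores, 1);
    EUC = 1 - AUC;                        % error under the ROC curve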
The only dataset where the new approach clearly performs worse than EnsDA_B, across all topologies, is the Gravity dataset, which is built from spectrograms. Since spectrograms have time and frequency on their axes, not all the standard image augmentation techniques are useful, e.g., the reflection used in APP15's FFT-based method. It is thus of primary importance to use augmentation techniques specific to spectrograms, such as linearly interpolating between pairs of real training examples [72]. This has been shown to improve the generalization performance of neural networks, especially when the data is limited or the model is prone to overfitting. In the literature, it is used in exactly the same way for both RGB images and spectrograms; the principle is the same, namely overlapping the signals [71].
In Table 9, we compare the different ensembles with the baseline ensemble (i.e., EnsBase(14)); the methods are compared using the Wilcoxon signed-rank test. It is very interesting to note that EnsDA_Mix outperforms EnsBase(14) with a p-value smaller than 0.1 in all the topologies and for both performance indicators. In this regard, it is worth noting that the highest p-value is reached with the model that employs DenseNet as a backbone, which is the model that achieves the best performance. It is very hard to boost the performance when the backbone is already at its best, because at some point the model reaches a plateau on a given dataset. EnsDA_Mix is the method suggested in this work.
In Table 13, we compare the performance of the EnsDA_Mix ensembles with the best methods reported in the literature on the same datasets. As can be observed, our proposed best method obtains state-of-the-art or comparable performance on all the datasets. Note that the performance indicator is the F1-measure for the LAR and InfL datasets, because that is the measure most commonly reported in the literature for these datasets. Moreover, we report the performance of two ensembles based on the three tested topologies:
• EnsTop, the sum rule among the three EnsDA_Mix ensembles built with ResNet50, MobileNetV2, and DenseNet201;
• EnsTop_W, as the previous one, but the methods are combined with a weighted sum rule in which DenseNet201 has weight 2, since DenseNet201 obtains on average the best performance.
We can say that EnsTop_W outperforms all the other approaches, and that the result is statistically significant, with a p-value < 0.05, in the following cases: EnsTop_W vs. EnsDA_Mix-ResNet50 (p-value = 0.0006) and EnsTop_W vs. EnsDA_Mix-MobileNetV2 (p-value = 0.0081).
As a final test, we computed the Matthews correlation coefficient (MCC) as the performance indicator for the three best-performing groups to determine which one is the best. In the literature [10], MCC has been shown to be a more reliable statistical indicator than accuracy, even for binary classification datasets. Table 10 reports the p-values among the different pairs of ensembles; it can be noticed that the performance of EnsDA_Mix is always statistically significant with respect to EnsBase(14), with a p-value always smaller than 0.05.
TABLE 13. Performance as a measure of accuracy (in %) compared with the best in the literature. In square brackets, we report the reference that provides the best performance on a dataset. ** On LAR and InfL, F1 is the performance measure. *** The method in [30] combines descriptors based on both object-scale and fixed-scale images. **** Only handcrafted features are used.
Notice that BestSA results in 13 new images for each original sample in the training set. It is unfeasible to apply EnsBase(14), using the method BestSA as a backbone, to all the datasets and for all the topologies, because this would become too expensive in terms of training time, even when GPUs are employed. In any case, our goal is to develop new ensemble methods leveraging different data augmentation methods.
The data augmentation methods do not increase the complexity of the system beyond the time needed to process the extra samples: the time spent to train the system is proportional to the number of images generated. For instance, on the 2D HELA dataset, a ResNet50 takes 400 seconds to complete the training process; when APP19 is applied, the same network takes 4400 seconds. Notice that APP19 produces 13 new images per original sample, making the resulting running time linear in the number of images generated.
In Table 11, we report the inference time in seconds taken to classify a batch of 100 images. We compare the time taken by a single model (specified in the first column) with that of an ensemble made of 15 classifiers. We can notice that the time spent by the ensembles is linear in the number of classifiers in the ensemble. This is a positive aspect: since each classifier is independent, this operation could be parallelized to speed up the process.

A. ABLATION ANALYSIS
In order to demonstrate the advantage of the proposed method, we perform an ablation study to verify whether the application of data augmentation increases the performance with respect to a baseline method. First, notice from Tables 3, 4, and 5 that the stand-alone network (column SA) provides performance that is on average lower than that of the ensembles. This is always true except on the Gravity dataset, where the performances are very close. On average, the adoption of the data augmentation methods in the ensemble provides an increase in performance of almost 3%.
To verify whether the contribution of the data augmentation methods is beneficial, we compared the performance of a stand-alone network with that of an ensemble trained with and without data augmentation. Results for some of the datasets are reported in Table 12, where ReLU reports the performance of a stand-alone network, 15ReLU is an ensemble trained without data augmentation, and EnsBase is the ensemble described in the previous sections. It can be noticed that adopting an ensemble approach always yields better performance than the stand-alone methods. Moreover, adopting a data augmentation method in the training process allows a further increase in the performance of the ensemble, providing another piece of evidence of the benefits brought by our proposal.

V. CONCLUSION
In this study, we compared combinations of pre-trained CNNs that were fine-tuned on various training sets while incorporating the best image manipulation methods for creating new images. We evaluated the performance of these networks and their combinations on various benchmarks representing different image classification tasks. As demonstrated in this study, the reliability of CNNs is improved by combining images produced by various data augmentation techniques into a data-level deep learning ensemble. Given the variety of benchmarks used, the method we use to construct CNN ensembles should be effective for the majority of imaging problems.
ANDREA LOREGGIA received the master's degree (cum laude) from the University of Padua, in 2012, and the Ph.D. degree in computer science, in 2016. He is currently an Assistant Professor with the Department of Information Engineering, University of Brescia. His studies are dedicated to designing and providing tools for developing intelligent agents capable of representing and reasoning with preferences and ethical-moral principles. His research interests in artificial intelligence span from knowledge representation to deep learning. He is a member of the UN/CEFACT Group of Experts, where he actively participates in the dissemination and sustainable development of technology.
SHERYL BRAHNAM received the master's degree from The City College of New York, in 1997, and the Ph.D. degree in computer science from the Graduate Center, City University of New York, in 2002. She is currently a Professor with Missouri State University. Her research interests include pattern recognition, face recognition, bioinformatics, and medical image analysis.
MICHELANGELO PACI received the B.Sc. and M.Sc. degrees in biomedical engineering, the B.Sc. degree in computer engineering, and the Ph.D. degree in bioengineering from the University of Bologna, Italy, in 2004, 2006, 2007, and 2013, respectively. After two years in industrial automation, from 2008 to 2009, he spent nine years in academia doing research at Tampere University, Finland, including in silico modeling of human cardiomyocytes, with a focus on in silico drug assays, and machine learning and texture analysis for image classification. Currently, he works in industry as a Software and Firmware Designer, while remaining a Docent in computational cardiology at Tampere University.
Open Access funding provided by 'Università degli Studi di Padova' within the CRUI CARE Agreement.