HistoClean: Open-source software for histological image pre-processing and augmentation to improve development of robust convolutional neural networks

Graphical abstract


Introduction
The growth of digital image analysis in clinical pathology and its subsequent case for use in clinical medicine has been supported by the conception of open-source digital image analysis (DIA) software [1][2][3]. Use of machine learning from predetermined features allows for the development of DIA algorithms within these software environments. This allows bio-image analysts and consultant histopathologists to answer difficult, specific research questions in human tissue [4]. The subsequent introduction of deep learning has revolutionised the development of DIA algorithms [5]. This has enabled potential solutions to tumour and biomarker detection, as well as tumour subtyping [6,7]. However, these solutions require domain-specific knowledge relating to the deep learning methodology, as well as awareness of hardware acceleration [8].
Consequently, open-source software to aid bio-image analysts without a background in computer vision to develop deep learning models has evolved [9,10]. Deep learning methodologies learn feature representations from the data without requiring predefined feature extraction. The resultant models can therefore be significantly more sensitive to dataset-specific attributes, such as irregularities in staining, batch effects and the quality of the digital slide [11,12]. Use of image pre-processing and augmentation prior to developing deep learning models can regularise the input images, thereby mitigating the potential for bias in the training of the CNN, or other deep learning models, and its independent validation [13][14][15][16]. Among these, the most common techniques include class balancing [17], image normalisation [18], and image augmentation [19]. These techniques often involve the use of multiple coding libraries, which in turn requires knowledge of the documentation before implementation. Herein we present HistoClean, an open-source, high-level, graphical user interface (GUI) for image pre-processing. HistoClean aims to complement other open-source software and deep-learning frameworks in the bio-image analysis ecosystem [9,10,20]. HistoClean's image pre-processing toolkit is divided into five functional modules based on computational methods frequently used in histological image pre-processing: image patching, whitespace thresholding, dataset balancing, image normalisation and image augmentation (Fig. 1). These modules can be used independently or in combination with each other as the user requires. HistoClean brings together image pre-processing techniques from across multiple Python libraries. This simplifies the image preparation phase of deep-learning analysis in a way that is transparent and maintains data integrity.
The process of developing deep learning models for histopathological analysis is a combined effort between computer scientists, biomedical scientists and pathologists. HistoClean aims to help bridge the knowledge gap between these domains by providing a point-and-click alternative to computer programming for these processes. The intended audience of this application is: i) biomedical scientists and pathologists, who can use the tool to evaluate how image pre-processing might influence visualisation of the underlying biology; and ii) computer scientists, who can apply the appropriate changes in a rapid and reproducible way, saving the time and effort of developing coding scripts in the process.
In this study, a practical example of how HistoClean can optimise input images for training a simple CNN to predict stromal maturity is described (Fig. 2). In evaluating these models, we demonstrate the benefit of image pre-processing for deep learning, even with a relatively simple CNN architecture, and introduce HistoClean as an open-source software solution to quickly implement and review these techniques.
The main contribution of this paper is the development of a novel, easy-to-use, point-and-click application for the rapid pre-processing and augmentation of image datasets for use in deep learning image analysis pipelines.

HistoClean application development
HistoClean was developed using Anaconda3 and Python 3.8. Code was written using the PyCharm integrated development environment. The GUI was developed using the Tkinter toolbox (v8.6). Initial development and testing of the software was performed on an Octane V laptop with an Intel Core i7-9700F 3.0 GHz processor and 32 GB Corsair 2400 MHz SODIMM DDR4 RAM, running the Windows 10 operating system. The application was converted to a .exe program using the PyInstaller Python package [21]. All testing was performed on the Windows 10 operating system. For ease of use, it is recommended that images are organised within directories corresponding to each image class. The application runs all processes on the CPU; no GPU is required. The application makes prominent use of multithreading, which scales to the number of cores in the CPU. The application has 160 user interaction points, all of which have exception handling for input characters and data types. The application is designed to allow the user complete control over the techniques applied. The modules outlined here can be used together or separately as the user requires.

User interface design
The HistoClean user interface was created utilising established simple-design principles, minimising the amount of on-screen text and interaction points while maintaining functionality [22]. The interface features a modular, single-window design with a focus on minimalism and displays a clear categorisation of the application's functions [23]. Icons were added to the module selection buttons to allow for quicker and easier identification of module functionality [24]. Upon selecting a module, users are walked through the process using the concept of procedural instruction [25], with a natural progression from the top of the screen to the bottom. We emphasised the principles of clarity and comprehensibility, with a reduced focus on aesthetics [26]. The primary colouration of black on light grey/white was chosen not only for visual clarity, but also for accessibility for users with colour-blindness. A wayfinding feature has been implemented in the module selection buttons, which darken according to which module is active at the time.
HistoClean features extensive error handling, which follows the principles of prevention, correction and recovery [27]. Examples of how each of these principles is utilised are as follows: HistoClean will prevent the user from entering non-numeric values where these are not appropriate. HistoClean will also automatically correct for one-channel images in the normalisation module by converting them to RGB beforehand. Finally, throughout the entirety of the program, user-interaction points that have been accidentally overlooked can be recovered via the use of feedback tools such as pop-ups and widget highlighting.
HistoClean is designed to be a standalone application. As such, the application was compiled as an executable file using Pyinstaller. All dependencies are included at download, with the user only needing to click on the application to begin.

Image patching module
CNNs require input image tiles to have consistent dimensions [28]. For this reason, HistoClean includes an image patching module that utilises the Python library Patchify [29]. This module interface allows the user to create image tile subsets from a larger input image to their specification and provides real-time feedback of the output to the user, facilitating straightforward evaluation and adjustment (Fig. 1a). This module can be used for block processing of n images organised within a common file directory. The user can select an output destination wherein the directory structure and naming conventions of the original images will be retained and populated with the requested image patches. The file names of these new image tiles are suffixed with their patch co-ordinates from the original image for reproducibility. Maintaining transparency in the pre-processing stages ensures that results can ultimately be traced back to their source, and that HistoClean does not damage original source data or impede data integrity and reproducibility.
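For illustration, a minimal sketch of how a single image can be tiled with Patchify is given below; the tile size, file names and saving convention are illustrative assumptions rather than HistoClean's internal implementation.

```python
# A minimal sketch of non-overlapping 250x250 tiling with patchify.
# File names are hypothetical placeholders.
import numpy as np
from patchify import patchify
from PIL import Image

image = np.array(Image.open("roi_01.png"))        # RGB array of shape (H, W, 3)
tiles = patchify(image, (250, 250, 3), step=250)  # non-overlapping 250x250 patches

# patchify returns an array of shape (rows, cols, 1, 250, 250, 3);
# each tile is saved with its patch co-ordinates in the file name.
for i in range(tiles.shape[0]):
    for j in range(tiles.shape[1]):
        tile = tiles[i, j, 0]
        Image.fromarray(tile).save(f"roi_01_{i}_{j}.png")
```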

Whitespace thresholding module

Image tiles derived from annotated regions frequently contain large areas of whitespace with little or no underlying tissue. To address this issue and improve the quality of input image tiles, HistoClean includes a tissue thresholding module that allows the user to remove image tiles from their dataset based on a minimum threshold of approximate tissue coverage. The method outlined in this paper uses binary thresholding to determine the percentage of positive pixels, representing tissue, and null pixels, representing whitespace (Fig. 3). Tissue coverage and relative intensity of the staining can vary significantly depending on any number of predisposing factors. Therefore, HistoClean's module interface allows the user to explore, in real time, different thresholds for dichotomising these pixels into tissue versus whitespace. In addition, adaptive thresholding is available for each image, as well as Otsu binarisation [31].
All of these thresholding options are provided by the OpenCV Python library [32]. These processes generate a binary mask for each image, which the GUI presents alongside the original image for review. Users can view five images simultaneously. Upon approval of a chosen threshold, image tiles falling below it are removed or relocated based on user preference.
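The sketch below illustrates the kind of thresholding performed, using OpenCV's binary and Otsu thresholding to estimate tissue coverage; the grey-level cut-off, tissue-fraction threshold and file name are illustrative assumptions rather than HistoClean defaults.

```python
# A minimal sketch of estimating tissue coverage with OpenCV, assuming
# whitespace appears as near-white pixels.
import cv2

img = cv2.imread("tile_0_0.png")
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Simple binary threshold: pixels darker than the cut-off count as tissue.
_, mask = cv2.threshold(grey, 220, 255, cv2.THRESH_BINARY_INV)

# Otsu binarisation chooses the cut-off automatically from the histogram.
_, otsu_mask = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

tissue_fraction = (mask > 0).mean()
if tissue_fraction < 0.5:   # e.g. discard tiles with less than 50% tissue
    print("Tile would be removed or relocated")
```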

Class balancing module
Class balancing is essential to prevent class bias of data when developing deep learning models [33]. For this reason, HistoClean includes a class balancing module that enables the user to equalise the number of images per class prior to training of the CNN (Fig. 1c). This requires that each class of images be provided in a separate directory by the user. The user can then choose between three balancing options: reducing the number of image tiles in each class to match the smallest class, increasing the number of image tiles in each class to match the largest class, or balancing the number of images in each class to the average number of images per class. The pre-requisite for using this functionality is that no class contains fewer than one eighth of the samples of the largest class; this pre-condition is reinforced through exception handling and prevents duplicate images arising from repeated augmentations. If the user balances the samples through class reduction, the image tiles in the larger class-specific dataset are relocated to a new directory, denoted 'Removed Images', or are permanently deleted based on user preference. If class size is balanced by the addition of image tiles, then a random assortment of image tiles, equal in number to the difference in size from the largest class-specific image dataset, is selected without replacement from within the smaller dataset(s). The selected image tiles are then augmented, thus balancing the number of image tiles in that class by the addition of 'new' image data. Image augmentation techniques are randomly selected from mirroring, clockwise rotation at 90°, 180° or 270°, or a combination of mirroring and a single rotation. This can create up to 7 unique images from a single image as required. A random number generator, seeded to the date and time of dataset balancing, determines the augmentation applied.

Fig. 1. HistoClean (a), an all-in-one toolkit for the pre-processing of images for use in deep learning. Modules include (b) whitespace estimation and filtering, implemented in the white space removal module, (c) tools for generation of image tiles from larger images, which are executed within the image patching module, (d) image normalisation, which standardises the colour grading of the images, (e) quick balancing, which balances the number of images in different classes by classic image augmentation, and (f) image pre-processing/augmentation, which provides further methods to expand an image set, add noise and accentuate image data.

Fig. 2 (caption, continued). Tiles are independently sorted into training, test and validation datasets at a patient level (c). Image pre-processing and augmentation is conducted on the tiles using HistoClean where appropriate in the training, test and validation datasets in order to prepare tiles for use in a convolutional neural network (d). Within a typical convolutional neural network, each tile is fed through a series of convolutional and pooling layers in order to create feature maps to differentiate between the two classes (e). These feature maps are then fed through several fully connected layers which determine which class the images belong to (f). Each tile is assigned a value used for class prediction; the prediction values for each tile are then aggregated in order to provide an overall class prediction per patient (g). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
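As an illustration of the geometric augmentations used by the class balancing module, the minimal sketch below applies a random combination of mirroring and 90°/180°/270° rotation to a tile; the function and file names are hypothetical and not HistoClean internals.

```python
# A minimal sketch of balancing-style augmentation: mirroring, a single
# 90/180/270 degree rotation, or mirroring combined with one rotation,
# giving up to 7 distinct variants of a tile.
import random
from PIL import Image, ImageOps

def random_variant(tile):
    angle = random.choice([0, 90, 180, 270])
    mirror = random.choice([True, False])
    if angle == 0 and not mirror:
        mirror = True                      # never return an unchanged copy
    out = ImageOps.mirror(tile) if mirror else tile
    return out.rotate(angle) if angle else out

random.seed()                              # seeds from the current date and time
augmented = random_variant(Image.open("immature_tile_3_2.png"))
augmented.save("immature_tile_3_2_aug.png")
```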

Image normalisation module
Histological images possess unique image colour, contrast, and brightness profiles. Batch effects in staining (Fig. 4a) can significantly influence model performance [13]. Image normalisation can be used to bring uniformity to the images in the dataset by adjusting the range of pixel values of an input image according to that of a target image [18]. For this reason, HistoClean includes an image normalisation module based on histogram matching from the Python library scikit-image [34]. Histogram matching works by comparing the cumulative histograms of pixel intensities from a target and an input image, before adjusting the pixel values of the input image according to the target image [35] (Fig. 4b). HistoClean's module interface allows the user to select a target image to normalise to, and to review examples of the histogram-matched images before committing to image normalisation of n images organised within a folder. This gives the user complete control over the normalisation process. These can be tiles from a single slide or from a cohort of slides. The normalised images are saved to a separate user-defined folder, or can replace the original images at the user's discretion. If saved in a separate folder, the directory structure of the original is replicated.
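For illustration, the sketch below shows histogram matching with scikit-image of the kind used by this module; the file names are placeholders and the exact call made by HistoClean is an assumption.

```python
# A minimal sketch of histogram matching an input tile to a target tile.
import numpy as np
from skimage import io
from skimage.exposure import match_histograms

target = io.imread("target_tile.png")
source = io.imread("input_tile.png")

# Match each colour channel of the input to the target image
# (older scikit-image versions use multichannel=True instead of channel_axis).
matched = match_histograms(source, target, channel_axis=-1)
io.imsave("input_tile_normalised.png", np.clip(matched, 0, 255).astype(np.uint8))
```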

Image augmentation and pre-processing module
It is not always possible to source large collections of histological images in the pursuit of developing deep learning models [36]. Image augmentation is a technique that can be used for the artificial expansion of image datasets to provide more training examples. In addition, image pre-processing can be used to enhance features already present in an image dataset in order to provide more specific features for CNN training [37]. By providing deep learning models with augmented data, the user can reduce the risk of overfitting and improve the generalisation ability of the CNN [36]. For this reason, HistoClean includes an image augmentation/pre-processing module based on the Python library imgaug [38]. This allows the user to select, review and apply the most popular image augmentation techniques used in the development of CNNs to their image dataset in real time (Fig. 1e). These include adjusting the colour range, contrast, blur and sharpness, noise, pixel and channel dropout and more.
There are over 50 pre-processing options available that can be used individually or in combination. Image files generated by augmentation are identifiable by their names, which incorporate the name of the root file from which the image was derived, so as to maintain data integrity. If a new image set is created, the directory structure of the original is replicated.
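The sketch below illustrates a few imgaug augmenters of the type exposed in this module; the specific augmenters, parameter ranges and file names are illustrative choices rather than HistoClean presets.

```python
# A minimal sketch of applying a small imgaug augmentation pipeline to a tile.
import imageio
import imgaug.augmenters as iaa

image = imageio.imread("tile_0_0.png")

seq = iaa.Sequential([
    iaa.GaussianBlur(sigma=(0.0, 1.0)),        # mild blur
    iaa.LinearContrast((0.9, 1.1)),            # small contrast jitter
    iaa.AdditiveGaussianNoise(scale=(0, 10)),  # add noise
    iaa.Dropout(p=(0.0, 0.05)),                # pixel dropout
])

augmented = seq(image=image)
imageio.imwrite("tile_0_0_aug.png", augmented)
```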

Patient samples
Ethical approval and access to diagnostic H&E stained slides from a retrospective cohort of oropharyngeal squamous cell carcinomas (OPSCC) for stromal maturity prediction by artificial intelligence was granted via the Northern Ireland Biobank (OREC 16/NI/0030; NIB19/0312) [39]. Briefly, patients with a primary oropharyngeal cancer diagnosed between 2000 and 2011 were identified and their diagnostic H&E slides retrieved from the Belfast Health and Social Care Trust courtesy of the Northern Ireland Biobank. All slides were digitised using a Leica Aperio AT2 at 40× magnification (0.25 µm/pixel). Virtual slides were saved in the .svs file format and imported into the open-source image analysis tool QuPath (v0.1.2) [1] to enable image annotation by a qualified histopathologist.

Classification of stromal maturity
Using the DIA software QuPath (v0.1.2), a trained pathologist reviewed all the diagnostic H&E slides from each case before identifying and annotating ROIs for classification of stromal maturity on the slide most representative of the malignant OPSCC. QuPath was used due to its built-in tools for the annotation of ROIs. Mature stroma was defined by the presence of fine, regular, elongated collagen fibres organised with approximately parallel orientation. Conversely, immature stroma was defined by disorganised, randomly orientated collagen fibres with or without the presence of oedema and myxoid-like degeneration [40,41]. Stromal maturity was determined as either mature or immature for each ROI by visual review. This was conducted by the pathologist, along with two other blinded independent assessors, based on previously published criteria [40,41]. Stromal maturity is a prognostic factor in cancer, with patients whose tumours contain immature stroma exhibiting significantly worse survival. The exact mechanisms behind this are not fully understood, but theories have emerged citing stromal gene expression and the influence the desmoplastic reaction has on epithelial-to-mesenchymal transition [42,43]. Representative images of mature and immature stroma were created and used as reference criteria for all assessors prior to classification (Fig. 5).

Image set preparation
Image tiles of size 250 × 250 pixels at 40× magnification (0.25 µm/pixel) were extracted from the ROIs that had been previously annotated in QuPath by the pathologist, using the built-in scripting functions. These dimensions and resolution were chosen to be large enough to allow the images to capture the intricacies of the stromal structure, but small enough to reduce computational expense and allow for larger training batch sizes. Tiles were organised in separate directories for mature and immature stroma, as determined by manual assessment, and further grouped into directories representing each patient. Images were divided at a patient level into three sets. First, the training set, which consisted of 70% of the patients, was used to train the CNNs. Second, the test set, which consisted of 15% of the patients, was used to evaluate model performance during training. Lastly, the independent validation set consisted of the remaining 15% of patients; this did not influence the training of the model and was instead used to evaluate model performance. This produced the baseline "Unbalanced" image set. Images were organised in this way to account for intra-patient heterogeneity of stromal maturity. All tiles from a heterogeneous patient existed within a single one of the training, test or independent validation sets and were not split among the three. This prevents the CNN from "recognising" patients between the three sets.
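For illustration, a minimal sketch of a patient-level 70/15/15 split, so that all tiles from a given patient remain in a single set, is shown below; the patient identifiers and random seed are illustrative.

```python
# A minimal sketch of splitting patients (not tiles) into train/test/validation.
import random

patients = [f"patient_{i:03d}" for i in range(189)]   # hypothetical patient IDs
random.seed(42)
random.shuffle(patients)

n = len(patients)
train = patients[: int(0.70 * n)]
test = patients[int(0.70 * n): int(0.85 * n)]
validation = patients[int(0.85 * n):]
# Tiles are then gathered from each patient's directory within its assigned set.
```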

Image pre-processing using HistoClean
In order to demonstrate the benefit of image pre-processing for the development of robust CNNs, seven independent image datasets were produced from the baseline image set. These utilised a combination of class balancing, image normalisation and pre-processing (Table 1).
Class balancing augmented the smaller image class to provide the same number of images as the larger class. This option was chosen because reducing the larger class would have resulted in a smaller volume of images for training, harming model accuracy. Balancing the classes was done with the aim of reducing training bias towards a single class. Image pre-processing was limited to embossing of the images (Intensity = 2, Alpha = 1) (Fig. 6). Embossing was chosen with the aim of accentuating the differences in the features between mature and immature stroma outlined in Section 2.3. The same target image was used in all normalised sets. Normalisation was done with the aim of removing any potential colour bias in the model. In particular, the histogram matching technique was chosen here as it offered less computational overhead than more advanced stain normalisation methods, such as the Reinhard [44] and Macenko [45] methods, with the understanding that this may cause image artefacts [18]. All image manipulation was conducted prior to input into the CNN. The processes for creating all of these image sets were timed. Augmentations were applied across the training, test and independent validation sets, with the exception of balancing, which was applied across the training and test sets only. HistoClean offers the ability to save to any servers connected to the computer operating system; as such, these separate image sets were saved to a local server.
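As an illustration of the embossing step, the sketch below applies imgaug's Emboss augmenter with the stated alpha and intensity; mapping "Intensity" to the strength parameter, and the file names, are assumptions rather than HistoClean's exact internal call.

```python
# A minimal sketch of embossing a tile with imgaug (Alpha = 1, Intensity = 2).
import imageio
import imgaug.augmenters as iaa

emboss = iaa.Emboss(alpha=1.0, strength=2.0)
tile = imageio.imread("mature_tile_1_1.png")
embossed = emboss(image=tile)
imageio.imwrite("mature_tile_1_1_embossed.png", embossed)
```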

CNN design
The CNNs used in these experiments were designed using PyTorch [46]. A core CNN architecture was established and trained independently from scratch on each of the 8 datasets. This network consists of five convolutional layers interlinked with five pooling layers (Fig. 7). The output of the final pooling layer is then flattened and fed into two fully connected layers, wherein stromal maturity is predicted using the softmax function in the final layer. The CNN architecture was kept relatively simple to reduce computational cost and training times, as well as to highlight the impact of image pre-processing using HistoClean. Training was carried out for 200 epochs, with a batch size of 150. Adam optimisation was used with a learning rate of 1e-6. Test batch size was set to 150 images. The softmax function in the CNN produced a probability for each input image ranging from 0 (predicted mature) to 1 (predicted immature). Stromal maturity of an input image was classified as immature if the stromal maturity probability was greater than or equal to 0.5; otherwise it was considered mature. After training on every fifth batch, the neural network calculated the accuracy and loss on a randomly selected test batch. If the test accuracy was greater than or equal to 65%, the weights and biases of the model were saved for further model evaluation. The weights and biases of the top 10 test batch accuracies were applied to the entire test set to obtain an improved evaluation of in-model performance. Only the model weights and biases that provided the top test accuracy were carried forward. These were then loaded into the CNN and applied to the independent validation image set. Stromal maturity predictions at an ROI level were produced by majority voting of individual tile classifications. In patients with heterogeneous ROI classification of stromal maturity, majority voting of the ROIs was used to determine classification at a patient level. This was done to remain comparable with manual assessment. If the number of predicted stromal immature and mature ROIs was equal, the patient was considered to have mature stroma overall. To enable comparison of how different input images affected training of the CNN, batch size, learning rate, loss function and optimiser were all kept constant throughout all experiments. Full code for the CNN can be found at https://github.com/HistoClean-QUB/HistoClean.
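For illustration, a minimal PyTorch sketch in the spirit of the architecture described (five convolution/pooling blocks, two fully connected layers and a softmax output, trained with Adam at a learning rate of 1e-6) is given below; the channel sizes, hidden width and class ordering are assumptions rather than the published implementation, which is available in the repository linked above.

```python
# A minimal sketch of a five-block CNN for 250x250 RGB tiles with a
# two-class softmax output.
import torch
import torch.nn as nn

class SimpleStromaCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        channels = [3, 16, 32, 64, 128, 256]          # assumed channel progression
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.ReLU(),
                       nn.MaxPool2d(2)]               # five conv + pool blocks
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, 256),              # 250x250 input -> 7x7 after 5 pools
            nn.ReLU(),
            nn.Linear(256, n_classes),
            nn.Softmax(dim=1),                        # per-class probabilities
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleStromaCNN()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-6)  # learning rate from the text
probs = model(torch.randn(2, 3, 250, 250))                 # toy batch of two tiles
```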

Statistical analysis
The pathologist stromal maturity scores were used as the ground truth for development of the CNN. Model evaluation was conducted against the ground truth (pathologist scores) for the best saved weights and biases in each of the image datasets at an individual tile, ROI and patient level. Confusion matrices were calculated to help determine the model's precision, recall and F1-scores. Receiver Operating Characteristic (ROC) curves were generated for assessment of the area under the curve (AUC) using the scikit-learn library [34] in Python 3.8 at a tile and ROI level. Due to the heterogeneous nature of some of the patients and the methods of aggregation used to predict outcome, ROC curves were not generated at the patient level.
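For illustration, a minimal sketch of the tile-level evaluation described (confusion matrix, precision/recall/F1 and ROC AUC) using scikit-learn is shown below; the label and probability arrays are placeholders for the pathologist ground truth and the model outputs.

```python
# A minimal sketch of tile-level evaluation against the pathologist ground truth.
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_recall_fscore_support,
                             roc_curve, roc_auc_score)

y_true = np.array([0, 1, 1, 0, 1])              # 0 = mature, 1 = immature (toy labels)
y_prob = np.array([0.2, 0.7, 0.4, 0.1, 0.9])    # softmax probability of immature stroma
y_pred = (y_prob >= 0.5).astype(int)            # 0.5 decision threshold from the text

cm = confusion_matrix(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
fpr, tpr, _ = roc_curve(y_true, y_prob)
auc = roc_auc_score(y_true, y_prob)
```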
Comparability between the best CNN model and the manual evaluation method was also assessed. Sensitivity, specificity, accuracy and their 95% confidence intervals were also calculated for the two additional independent manual stromal maturity classifications. For the purpose of this analysis, the model was considered a fourth evaluator. Inter-evaluator concordance was assessed using Fleiss' kappa. All biostatistical analyses were performed using R v3.6.1 [47].

Patient images
Classification of stromal maturity in digitally annotated ROIs was conducted on H&E stained slides from 197 patients with OPSCC. From these patients, 636 ROIs were annotated and evaluated manually. In total, 9.91% (63/636) of ROIs had insufficient stroma to produce tiles, resulting in 4.06% (8/197) of patients being excluded from further analysis in the study. Of the remaining patients, 33.86% (64/189) were found to have immature stroma in all ROIs assessed and 45.50% (86/189) were found to have mature stroma in all ROIs assessed. Classification of stromal maturity across ROIs was heterogeneous in 20.64% (39/189) of patients assessed. There were 29 heterogeneous patients in the training group, 4 in the test group and 6 in the independent validation group (Fig. 8). A complete breakdown of tiles, ROIs and patients can be found in Supplementary Figure 1.

Image set times
The time taken to perform each of the adjustments outlined in Table 1 was recorded. The use of multithreading allowed for the processing of the images in a rapid timeframe. As mentioned previously, the number of threads used scales with the number of CPU cores, allowing the user to carry out other tasks while HistoClean produces the new images.

Evaluation of image data sets in robust CNN development
The CNN was trained eight separate times from scratch using the eight separate image sets summarised in Table 1. Use of image pre-processing techniques was found to consistently improve model performance when compared to the baseline "Unbalanced" dataset across all levels of prediction assessed, from the probability of individual image tiles to the aggregation of probability at the patient level (Table 2). Image pre-processing conducted in the Balanced Embossed set provided the best overall accuracy at a tile, ROI and patient level (0.774, 0.835 and 0.857 respectively) as well as a superior F1-score (0.820, 0.844 and 0.846 respectively). From these results, the Balanced Embossed set was determined to be the best performing image set overall. In addition, the Balanced Embossed image set provided the best area under the curve (AUC) scores (0.839 and 0.963 at a tile and ROI level; Fig. 9).
The ability to predict stromal maturity using the CNN trained on the balanced embossed images was developed using the ground truth for stromal maturity in each ROI as provided by a single pathologist. Therefore, the sensitivity and specificity of manual classification of stromal maturity by two independent assessors to predict the pathologist scores were calculated and compared to results from the balanced embossed image-trained CNN, in order to determine how reproducible the original pathologist scores were. Both independent manual assessors and the balanced embossed image set-trained CNN demonstrated comparable sensitivity (100%; 95% CI, 77%-100%, for Assessor 1; 93%; 95% CI, 68%-100%, for Assessor 2; and 80%; 95% CI, 52%-96%, for the CNN) and specificity (86%; 95% CI, 57%-98%, for Assessor 1; 100%; 95% CI, 75%-100%, for Assessor 2; and 85%; 95% CI, 55%-98%, for the CNN) when classifying patients as having immature stroma based on the original pathologist scores. Moreover, the Fleiss' kappa score demonstrated good concordance between all three manual assessors and the CNN (κ = 0.785, p < 0.0001). A review of misclassification by the balanced embossed image set-trained CNN found that misclassification occurred most often when only a small number of tiles was available for stromal classification in that patient (Fig. 10a). At a tile level, misclassification by this model occurred whenever the image augmentation enhanced the presence of whitespace in immature stroma tiles, resulting in the embossed image being misclassified as mature stroma (Fig. 10b). In one patient, no tiles could be extracted from 3 of the 5 ROIs, resulting in an inversion of the stromal maturity prediction that was subsequently incorrect.

Discussion
As technology advances, so too does the demand for computational, high-throughput, cost-effective diagnostic tools for use in clinical medicine. This is particularly true in the field of clinical pathology, which has traditionally utilised fewer technological aids in spite of a depleting workforce [48,49]. Digital pathology involves the acquisition and review of ultra-high-resolution whole slide images using a computer monitor in place of a microscope [50]. Digitisation of histological slides benefits from remote access for diagnostic reporting, providing a quick and easy means of recourse for diagnoses of complex pathology through ease of sharing virtual slides with consultant histopathologists with a subspecialist interest [48]. In addition, slide digitisation permits the use of digital image analysis tools to quantify histological features objectively using AI, as seen in radiomics [52]. At present, use of digital image analysis algorithms by consultant histopathologists is limited due to a lack of modernisation in clinical pathology within the National Health Service, UK [51]. However, many consultant histopathologists recognise the benefit digital image analysis methodology could provide in streamlining the decision-making process [53].
In contrast to other medical and non-medical disciplines that have implemented AI-assisted DIA, there is a scarcity of appropriate pathological images for developing deep learning models in clinical pathology [54]. This is in part due to the relatively recent move towards digitisation of pathology services, but more often due to a lack of pathological material relating to the question of interest. Histological images are data rich and demonstrate significant heterogeneity across and within disease pathologies [55]. Therefore, the number of images required for effective deep learning is many orders of magnitude greater than that required when developing models using more classical machine learning methods. Depending on the model being developed, this may require image datasets to be sourced at a global scale. Consequently, this introduces image variability and potential bias into CNN learning through differences in laboratory practice, scanning procedures or the age of the sample being scanned [56]. This can have a pronounced effect on model learning and validation, particularly in small cohort studies, as each histological image possesses unique image colour, contrast and brightness profiles. This interlaboratory variation limits the efficacy of models developed from small cohort studies for use in practice. CNNs have already shown promise in several cancer types and in several different use cases. One study by Khosravi et al. evaluated the efficacy of both in-house and the current top pretrained models across numerous cancer types and in several different tasks [57]. Many of these models achieved >90% accuracy in the categories of tumour detection, biomarker detection and tumour subtyping in bladder, breast and lung cancers. Another study demonstrated the use of several pretrained neural networks to identify different growth patterns in lung adenocarcinoma, achieving accuracies of up to 85% [6].

In this study, we demonstrate the power of image pre-processing and augmentation and present a novel open-source GUI called HistoClean. Using a relatively simple CNN architecture, we clearly establish how use of image pre-processing techniques improves model generalisability for prediction of stromal maturity in an independent validation dataset. Further, we show that the best developed model, the balanced embossed model, had similar concordance, sensitivity and specificity to two further independent assessors of stromal maturity by manual review. However, we also show that poor choice of image pre-processing and augmentation techniques can introduce bias and noise. The use of image augmentation for dataset balancing helped to increase the small number of immature samples available for model development, whilst image pre-processing through embossing helped to accentuate the features of interest we wanted the model to train on. Therefore, to ensure successful model development, consideration of which techniques to implement should reflect the specific research question being asked. HistoClean offers a simple point-and-click GUI that allows users without a coding background to rapidly augment and pre-process images, utilising live feedback to evaluate these changes. This also aids computer scientists by removing the process of writing, running and re-running scripts. The minimalistic user interface, combined with the provided procedural instruction, creates an inherently user-friendly experience [22,23,25].
When trying to improve the accuracy of a CNN, development time is often spent refining the neural network and its hyperparameters, or using deeper networks. However, it is arguably just as, if not more, important to focus on the quality of the images used in training the network; a sentiment captured by the expression "rubbish in = rubbish out". This study illustrates how crucial it is to balance the number of input images across the classes to prevent model overfitting. This initial step significantly improved both overall accuracy and AUC at the tile, ROI and patient level. The strength of this action is also clearly demonstrated by the change in false mature and false immature rates when comparing the balanced dataset to the unbalanced dataset. This is evidenced by the increases in F1-score at the tile, ROI and patient level (0.187, 0.340 and 0.443 respectively, Table 2). In parallel to this, embossing alone also demonstrated increases in accuracy and AUC across all levels, as well as lessening the effect of a mature-dominant training set (Table 2). A synergistic improvement occurred when the dataset was both balanced and embossed, achieving an accuracy of 0.774 at a tile level. These improvements are in line with several other studies that use different augmentation techniques [58][59][60]. Importantly, HistoClean allowed the bio-image analyst to review the output of the image processing steps being applied within the software before proceeding to model development, providing an opportunity to discuss how particular image augmentations may enhance the qualitative features the pathologist used to define stromal maturity in the image.
The CNN used in this study is relatively simple. This case study demonstrates that high-quality input data for training, through the use of both pre-processing and augmentation techniques, can improve classification accuracy with a simple model architecture. Future studies utilising these same image augmentation and pre-processing techniques with more advanced deep learning models, such as VGG [61], AlexNet [62] and ResNet [63] architectures, would be of interest. The positive impact of these techniques may be less pronounced in such models due to their higher complexity. However, this would come at a much greater computational cost and longer training times, as well as requiring more high-powered computer hardware, which creates a barrier to entry for deep learning.
While HistoClean has proven to be a useful tool in this study, there are improvements which can be made, and concerns remain around the "black box" nature of deep learning. Explainability techniques such as saliency maps [70] highlight areas of interest on the original images, providing some insight into which features are contributing to the classification. As techniques like this continue to improve, the concerns around the blind nature of deep learning should be alleviated.

Conclusions
This study confirms that use of the image pre-processing and augmentation techniques available in HistoClean can advance the field of deep learning by facilitating arguably the most important step in CNN-centric experiments: image set preparation. There is currently a lack of easy-to-use, open-source GUI software to facilitate this process, which therefore often requires knowledge of computer programming. This study demonstrates the usefulness of HistoClean as an open-source software tool to implement image pre-processing techniques in image research, saving time and improving transparency and data integrity. HistoClean provides a rapid, robust and reproducible means of implementing these techniques in a way that can be used by experts, such as pathologists, to help identify which techniques could potentially be of use in their study, without the need for an inherent knowledge of coding. HistoClean also saves the user the effort of running and re-running scripts to assess how the pre-processing techniques may be affecting the underlying biology in the image. This in turn empowers researchers by allowing them to make better judgements on the optimal techniques to apply in their work. The application has been designed around the concepts of minimalism and procedural instruction to create an inherently user-friendly experience. The open-source nature of HistoClean allows for the continuous development of the application as more advanced augmentation and pre-processing techniques are identified and requested.

Declaration of Competing Interest
Dr. M.S.T has recently received honoraria for advisory work in relation to the following companies: Incyte, MindPeak, QuanPathDerivatives and MSD. He is part of academia-industry consortia supported by the UK government (Innovate UK). Dr J.J. is also involved in an academia-industry research programme funded by IUK. These declarations of interest are all unrelated to the submitted publication. All other authors declare no competing interests.
Health Agency in Northern Ireland, Seán Crummey Memorial Fund, and Tom Simms Memorial Fund.

Funding
This study was supported by a Cancer Research UK Accelerator grant (C11512/A20256). The funders had no role in study design, data collection, data analysis or interpretation of the data.