Research paper
CycleGAN for virtual stain transfer: Is seeing really believing?

https://doi.org/10.1016/j.artmed.2022.102420

Highlights

  • CycleGAN, often used for stain transfer, is sensitive to architectural changes.

  • Changes do not affect visual plausibility but do affect pre-trained deep models.

  • Stain transfer can perturb important diagnostic markers, limiting reliable use-cases.

Abstract

Digital pathology is an area prone to high variation due to multiple factors which can strongly affect the diagnostic quality and visual appearance of Whole-Slide Images (WSIs). State-of-the-art methods tend to deal with such variation through style-transfer inspired approaches. Usually, these solutions directly apply successful approaches from the literature, potentially with some task-related modifications. The majority of the obtained results are visually convincing; however, this paper shows that this is no guarantee that such images can be directly used for either medical diagnosis or reducing domain shift. This article shows that a slight modification in a stain transfer architecture, such as the choice of normalisation layer, while resulting in a variety of visually appealing results, surprisingly greatly affects the ability of a stain transfer model to reduce domain shift. Through extensive qualitative and quantitative evaluations, we confirm that translations resulting from different stain transfer architectures are distinct from each other and from the real samples. Therefore, conclusions drawn by visual inspection or pretrained model evaluation might be misleading.

Introduction

Digital pathology has become a rich area of innovation in both clinical application and research. However, its crucial process of staining is known to be prone to high variation [1] due to differences in tissue preparation (exposure time, tissue fixation, section thickness, etc.), scanner characteristics (sensor, resolution, storage format, etc.) or staining protocol. Examples of such variations for the case of kidney pathology are given in Fig. 1. These differences can affect automatic systems [2] as they represent a source of domain shift [3]. A pathologist is able to correct for these variations due to experience; however, current AI algorithms are not able to use such background knowledge. Thus, standardising the appearance of histological slides has become of great importance from both a diagnostic point-of-view and for the successful development and application of automated systems.

The standardisation is often addressed using computer vision techniques such as virtual staining — artificially changing the appearance of an image after its acquisition. Historically, research has focused on standardising the appearance of one particular stain, i.e. reducing the variation along the rows in Fig. 1. This is usually referred to in the literature as stain normalisation. However, for the sake of better understanding, herein this is referred to as intra-stain normalisation. Classical (non-deep) approaches to intra-stain normalisation use stain separation to isolate specific channels and then standardise the colour levels with respect to a reference image [4], [5], [6]. More recent approaches use machine learning or deep learning strategies to standardise image appearance [7], [8]. Nowadays, the problem of virtual staining is typically considered to be a style-transfer problem, and a number of successful approaches based on style transfer techniques developed for natural images have been adapted to digital pathology [9], [10], [11], [12]. Their introduction, however, enabled the possibility to translate between two (or more) physically different stains (i.e. stain translation). With such models, it becomes possible to reduce the variation along the columns of Fig. 1. This represents the second type of normalisation studied in this article, and is referred to as inter-stain normalisation. One of the most successful and widely applied approaches is CycleGAN [13] — an unsupervised and unpaired method that enables virtual staining between two stainings without any additional effort for data preparation.
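The classical intra-stain normalisation idea can be illustrated with a minimal sketch. The function below performs per-channel mean/standard-deviation matching against a reference image directly in RGB; this is a deliberate simplification for illustration only — Reinhard et al.'s method operates in a decorrelated lαβ colour space, and Macenko- or Vahadane-style approaches first perform stain separation.

```python
import numpy as np

def match_channel_stats(source, reference):
    """Map each colour channel of `source` so that its mean and standard
    deviation match those of `reference` (a simplified, RGB-space sketch
    of Reinhard-style colour normalisation)."""
    src = source.astype(np.float64)
    ref = reference.astype(np.float64)
    out = np.empty_like(src)
    for c in range(src.shape[-1]):
        s_mu, s_sd = src[..., c].mean(), src[..., c].std() + 1e-8
        r_mu, r_sd = ref[..., c].mean(), ref[..., c].std()
        # standardise the source channel, then rescale to reference stats
        out[..., c] = (src[..., c] - s_mu) / s_sd * r_sd + r_mu
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```

After this mapping, each channel of the output shares the reference image's first- and second-order colour statistics, which is the essence of reference-based intra-stain normalisation.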

Many works attest that CycleGAN-based methods for virtual staining can achieve translations that are visually indistinguishable from real samples [9], [10], [12], [14], [15]. Furthermore, many works propose extensions to the original CycleGAN architecture [9], [16], its loss function [12] or, with respect to a specific task, extend the training paradigm with additional modules [15], [17]. These works tend to rely on visual inspection to compare several approaches [15], [18], which may be unreliable [10]; or use consecutive slides stained differently [19] to validate the translation. However, such absolute comparisons are also limited since the staining process is prone to high variation and tissue structure will vary between consecutive slides.

As such, it is hard to quantitatively compare two methods or architectural changes. Moreover, assessing the quality of a translation is dependent on the purpose for which it will be used. Although CycleGAN-based approaches have achieved great success and the resulting translations look plausible,1 the use of artificially generated images is usually limited to the computer vision domain since, in the medical sense, these images can be untrustworthy, e.g. it is known that such approaches can hallucinate features [20], [21] and thus can be unreliable for diagnostic purposes.

Assuming that the translation is of high fidelity, these methods are more often used in the computer vision domain to reduce domain shift [12], [22]; or as a domain augmentation strategy to reduce the need for additional annotations [10], [23]. Since these approaches are becoming more commonplace, and new possibilities are being explored such as multi-stain segmentation [10] or improving tumour classification [24], it is of great importance to raise awareness of the sensitivity of such methods to some common, and rather small, changes.

As such, this article demonstrates that even the simplest architectural choices in CycleGAN-based models can play an important role in the ability of the obtained models to reduce domain shift, even when visual appearance is not affected. Although most models produce plausible translations, i.e. those visually indistinguishable from real samples, the huge performance differences observed when pretrained models are applied to translated images confirm that the quality of the translations differs. In this study, the datasets are chosen to be as representative as possible, containing both histochemical (HC) and immunohistochemical (IHC) stains, and different directions of translation are investigated. In order to limit the number of experimental degrees-of-freedom, the modifications to the original CycleGAN architecture are restricted to the normalisation layer. The original architecture uses instance normalisation; in this study, this is varied to other approaches commonly found in the literature: batch, layer, and group normalisation. We show that the translations obtained by varying the normalisation layer belong to different distributions, and are distinct from those of real samples, causing pretrained models to perform poorly.
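For readers unfamiliar with the normalisation variants involved, the four layers differ only in the axes over which feature statistics are computed. The following plain-NumPy sketch (affine scale/shift parameters omitted for brevity; this is not the paper's code) makes the distinction concrete for a feature tensor of shape (N, C, H, W):

```python
import numpy as np

def normalise(x, kind, groups=2, eps=1e-5):
    """Normalise feature maps x of shape (N, C, H, W).
    The four variants differ only in which axes the mean and variance
    are computed over."""
    n, c, h, w = x.shape
    if kind == "batch":       # per channel, over the whole batch: N, H, W
        mu = x.mean(axis=(0, 2, 3), keepdims=True)
        var = x.var(axis=(0, 2, 3), keepdims=True)
    elif kind == "layer":     # per sample, over all features: C, H, W
        mu = x.mean(axis=(1, 2, 3), keepdims=True)
        var = x.var(axis=(1, 2, 3), keepdims=True)
    elif kind == "instance":  # per sample AND channel, over H, W only
        mu = x.mean(axis=(2, 3), keepdims=True)
        var = x.var(axis=(2, 3), keepdims=True)
    elif kind == "group":     # per sample, over channel groups (C % groups == 0)
        g = x.reshape(n, groups, c // groups, h, w)
        mu = g.mean(axis=(2, 3, 4), keepdims=True)
        var = g.var(axis=(2, 3, 4), keepdims=True)
        return ((g - mu) / np.sqrt(var + eps)).reshape(n, c, h, w)
    else:
        raise ValueError(f"unknown normalisation: {kind}")
    return (x - mu) / np.sqrt(var + eps)
```

Instance normalisation (the CycleGAN default) removes per-image, per-channel contrast statistics, whereas batch normalisation ties every image to batch-level statistics; the article's point is that this seemingly minor choice changes the distribution of the generated translations.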

Furthermore, since manual visual inspection cannot determine a difference in quality between the translations, it follows that visual inspection cannot be used as a validation criterion for virtual staining.

The main contributions of this article are:

  • To demonstrate that relatively small changes in CycleGAN-based methods, such as different normalisation layers, can have a great impact on translation quality, from the perspective of its ability to reduce the domain shift introduced by both inter- and intra-stain variation.

  • To better define the limitations of visual inspection when assessing virtual staining.

  • To give evidence that physical differences between stains, in addition to architectural choices, can play an important role when applying virtual stain transfer for reducing inter-stain domain shift.

  • To show that generative approaches can be used to indicate whether a divergence from the true stain distribution has taken place or not when virtual staining is performed.

The remainder of this article is organised as follows: in Section 2, literature related to stain transfer, and particularly approaches based on the CycleGAN architecture, is reviewed. Section 3 gives a detailed description of the presented study and dataset. Section 4 presents the experimental results. Finally, Section 5 analyses stain translation models in terms of their visual quality, training stability, failure cases and generated data distributions.

Section snippets

Related work

Generally, two main sources of domain shift in Digital Pathology can be identified and these are illustrated in Fig. 2: intra-stain variability, which represents the visual differences of one particular stain; and inter-stain variability, which is the result of the physical/chemical differences between stainings. Addressing inter-stain variability is of interest when tackling tasks that are solvable across various stains (such as glomeruli segmentation [10]) whereas intra-stain variability is

Methods

In order to demonstrate the sensitivity of virtual stain transfer to the underlying architecture, the ubiquitous CycleGAN architecture is taken and different stain transfer models are created by replacing the normalisation layers in both the discriminators and the generators. To quantitatively validate the obtained translations, their ability to reduce domain shift introduced by inter- or intra-stain variation is measured for the task of glomeruli segmentation. This is achieved using the

Inter-stain variability

The translations obtained by many of the stain transfer models are plausible (see definition in Introduction), as will be discussed in more detail in Section 5.1.1. Nevertheless, the quantitative analysis performed using pretrained models shows that there are significant differences between their ability to reduce domain shift. Here, two directions are taken: by evaluating the PAS model’s performance on translations from the target stains to PAS (see Table 2); and by testing the models

Discussion

In this section, qualitative and quantitative assessments of the stain transfer models will be presented. The qualitative analysis includes visual assessment, which is presented in Section 5.1.1. However, the findings in Section 4 give strong evidence that this cannot be relied upon. Section 5.1.2 will further demonstrate this by highlighting the model’s instability during different training stages. Moreover, Section 5.1.3 presents some failure cases that can be easily overlooked by

Conclusions

To summarise, this article presents a study on the sensitivity of virtual stain transfer obtained by the most commonly used technique, CycleGAN, when used to reduce the domain shift introduced by both inter- and intra-stain variation (commonly referred to as stain translation and stain normalisation, respectively). In order to control the architectural differences between stain translation models, the experiments focused on different normalisation layers in the CycleGAN architecture.

Surprisingly, the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by: ERACoSysMed and e:Med initiatives by the German Ministry of Research and Education (BMBF); SysMIFTA (project management PTJ, FKZ 031L-0085A; Agence National de la Recherche, ANR, project number ANR-15-CMED-0004); SYSIMIT (project management DLR, FKZ 01ZX1608A); ArtIC, France project “Artificial Intelligence for Care” (grant ANR-20-THIA-0006-01) and co-funded by Région Grand Est, France, Inria Nancy - Grand Est, France, IHU of Strasbourg, France, University of

References (64)

  • Macenko M, Niethammer M, Marron JS, Borland D, Woosley JT, Guan Xiaojun, et al. A method for normalizing histology...
  • Reinhard E, et al. Color transfer between images. IEEE Comput Graph (2001)
  • Khan AM, et al. A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Trans Biomed Eng (2014)
  • Gadermayr M, et al. Generative adversarial networks for facilitating stain-independent supervised and unsupervised segmentation: A study on kidney histology. IEEE Trans Med Imaging (2019)
  • Vasiljević J, et al. Towards histopathological stain invariance by unsupervised domain augmentation using generative adversarial networks. Neurocomputing (2021)
  • Rana A, Yauney G, Lowe A, Shah P. Computational histological staining and destaining of prostate core biopsy rgb images...
  • Shaban MT, Baur C, Navab N, Albarqouni S. StainGAN: Stain style transfer for digital histological images. In: ISBI....
  • Zhu J-Y, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks....
  • Brieu N, Meier A, Kapil A, Schoenmeyer R, Gavriel CG, Caie PD, Schmidt G. Domain adaptation-based augmentation for...
  • Shrivastava A, Adorno W, Sharma Y, Ehsan L, Ali SA, Moore SR, et al. Self-Attentive Adversarial Stain Normalization....
  • Cai S, Xue Y, Gao Q, Du M, Chen G, Zhang H, et al. Stain style transfer using transitive adversarial networks. In:...
  • Mahapatra D, Bozorgtabar B, Thiran J-P, Shao L. Structure preserving stain normalization of histopathology images using...
  • Kang H, et al. StainNet: A fast and robust stain normalization network. Front Med (2021)
  • Lahiani A, Navab N, Albarqouni S, Klaiman E. Perceptual embedding consistency for seamless reconstruction of tilewise...
  • Cohen JP, Luck M, Honari S. Distribution Matching Losses Can Hallucinate Features in Medical Image Translation. In:...
  • Vasiljević J, Feuerhake F, Wemmert C, Lampert T. Self Adversarial Attack as an Augmentation Method for...
  • Gadermayr M, Appel V, Klinkhammer MB, Boor P, Merhof D. Which way round? A study on the performance of...
  • Wagner S, Khalili N, Sharma R, Boxberg M, Marr C, de Back W, et al. Structure-Preserving Multi-domain Stain Color...
  • Xu Z, et al. Effective immunohistochemistry pathology microscopy image generation using CycleGAN. Front Mol Biosci (2020)
  • Vahadane A, et al. Structure-preserving color normalization and sparse stain separation for histological images. IEEE Trans Med Imaging (2016)
  • Lampert T, Merveille O, Schmitz J, Forestier G, Feuerhake F, Wemmert C. Strategies for training stain invariant CNNs....
  • Mercan C, Reijnen-Mooij G, Martin DT, Lotz J, Weiss N, van Gerven M, et al. Virtual staining for mitosis detection in...