Research paper
CycleGAN for virtual stain transfer: Is seeing really believing?

https://doi.org/10.1016/j.artmed.2022.102420

Highlights

  • CycleGAN, often used for stain transfer, is sensitive to architectural changes.

  • Changes do not affect visual plausibility but do affect pre-trained deep models.

  • Stain transfer can perturb important diagnostic markers, limiting reliable use-cases.

Abstract

Digital pathology is an area prone to high variation due to multiple factors which can strongly affect the diagnostic quality and visual appearance of Whole-Slide Images (WSIs). State-of-the-art methods tend to deal with such variation through style-transfer inspired approaches. Usually, these solutions directly apply successful approaches from the literature, potentially with some task-related modifications. The majority of the obtained results are visually convincing; however, this paper shows that this is no guarantee that such images can be directly used for either medical diagnosis or reducing domain shift. This article shows that a slight modification in a stain transfer architecture, such as the choice of normalisation layer, while resulting in a variety of visually appealing results, surprisingly greatly affects the ability of a stain transfer model to reduce domain shift. Through extensive qualitative and quantitative evaluations, we confirm that translations resulting from different stain transfer architectures are distinct from each other and from the real samples. Therefore, conclusions drawn by visual inspection or pretrained model evaluation might be misleading.

Introduction

Digital pathology has become a rich area of innovation in both clinical application and research. However, its crucial process of staining is known to be prone to high variation [1] due to differences in tissue preparation (exposure time, tissue fixation, section thickness, etc.), scanner characteristics (sensor, resolution, storage format, etc.) or staining protocol. Examples of such variations for the case of kidney pathology are given in Fig. 1. These differences can affect automatic systems [2] as they represent a source of domain shift [3]. A pathologist is able to correct for these variations due to experience; however, current AI algorithms are not able to use such background knowledge. Thus, standardising the appearance of histological slides has become of great importance from both a diagnostic point-of-view and for the successful development and application of automated systems.

The standardisation is often addressed using computer vision techniques such as virtual staining — artificially changing the appearance of an image after its acquisition. Historically, research has focused on standardising the appearance of one particular stain, i.e. reducing the variation along the rows in Fig. 1. This is usually referred to in the literature as stain normalisation. However, for the sake of better understanding, herein this is referred to as intra-stain normalisation. Classical (non-deep) approaches to intra-stain normalisation use stain separation to isolate specific channels and then standardise the colour levels with respect to a reference image [4], [5], [6]. More recent approaches use machine learning or deep learning strategies to standardise image appearance [7], [8]. Nowadays, the problem of virtual staining is typically considered to be a style-transfer problem, and a number of successful approaches based on style transfer techniques developed for natural images have been adapted to digital pathology [9], [10], [11], [12]. Their introduction, however, enabled the possibility to translate between two (or more) physically different stains (i.e. stain translation). With such models, it becomes possible to reduce the variation along the columns of Fig. 1. This represents the second type of normalisation studied in this article, and is referred to as inter-stain normalisation. One of the most successful and widely applied approaches is CycleGAN [13] — an unsupervised and unpaired method that enables virtual staining between two stainings without any additional effort for data preparation.
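The classical intra-stain normalisation idea can be illustrated with a minimal sketch. The function below performs per-channel mean/standard-deviation matching against a reference image directly in RGB; this is a deliberate simplification for illustration only — Reinhard et al.'s method operates in a decorrelated lαβ colour space, and Macenko- or Vahadane-style approaches first perform stain separation.

```python
import numpy as np

def match_channel_stats(source, reference):
    """Map each colour channel of `source` so that its mean and standard
    deviation match those of `reference` (a simplified, RGB-space sketch
    of Reinhard-style colour normalisation)."""
    src = source.astype(np.float64)
    ref = reference.astype(np.float64)
    out = np.empty_like(src)
    for c in range(src.shape[-1]):
        s_mu, s_sd = src[..., c].mean(), src[..., c].std() + 1e-8
        r_mu, r_sd = ref[..., c].mean(), ref[..., c].std()
        # standardise the source channel, then rescale to reference stats
        out[..., c] = (src[..., c] - s_mu) / s_sd * r_sd + r_mu
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```

After this mapping, each channel of the output shares the reference image's first- and second-order colour statistics, which is the essence of reference-based intra-stain normalisation.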

Many works attest that CycleGAN-based methods for virtual staining can achieve translations that are visually indistinguishable from real samples [9], [10], [12], [14], [15]. Furthermore, many works propose extensions to the original CycleGAN architecture [9], [16], its loss function [12] or, with respect to a specific task, extend the training paradigm with additional modules [15], [17]. These works tend to rely on visual inspection to compare several approaches [15], [18], which may be unreliable [10]; or use consecutive slides stained differently [19] to validate the translation. However, such absolute comparisons are also limited since the staining process is prone to high variation and tissue structure will vary between consecutive slides.

As such, it is hard to quantitatively compare two methods or architectural changes. Moreover, assessing the quality of a translation is dependent on the purpose for which it will be used. Although CycleGAN-based approaches have achieved great success and the resulting translations look plausible,1 the use of artificially generated images is usually limited to the computer vision domain since, in the medical sense, these images can be untrustworthy, e.g. it is known that such approaches can hallucinate features [20], [21] and thus can be unreliable for diagnostic purposes.

Assuming that the translation is of high fidelity, these methods are more often used in the computer vision domain to reduce domain shift [12], [22]; or as a domain augmentation strategy to reduce the need for additional annotations [10], [23]. Since these approaches are becoming more commonplace, and new possibilities are being explored such as multi-stain segmentation [10] or improving tumour classification [24], it is of great importance to raise awareness of the sensitivity of such methods to some common, and rather small, changes.

As such, this article demonstrates that even the simplest architectural choices in CycleGAN-based models can play an important role in the ability of the obtained models to reduce domain shift, even when visual appearance is not affected. Although most models produce plausible translations, i.e. those visually indistinguishable from real samples, the huge performance differences observed when pretrained models are applied to translated images confirm that the quality of the translations differs. In this study, the datasets are chosen to be as representative as possible, containing both histochemical (HC) and immunohistochemical (IHC) stains, and different directions of translation are investigated. In order to limit the number of experimental degrees-of-freedom, the modifications to the original CycleGAN architecture are restricted to the normalisation layer. The original architecture uses instance normalisation; in this study, this is varied to other approaches commonly found in the literature: batch, layer, and group normalisation. We show that the translations obtained by varying the normalisation layer belong to different distributions, and are distinct from those of real samples, causing pretrained models to perform poorly.
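For readers unfamiliar with the normalisation variants involved, the four layers differ only in the axes over which feature statistics are computed. The following plain-NumPy sketch (affine scale/shift parameters omitted for brevity; this is not the paper's code) makes the distinction concrete for a feature tensor of shape (N, C, H, W):

```python
import numpy as np

def normalise(x, kind, groups=2, eps=1e-5):
    """Normalise feature maps x of shape (N, C, H, W).
    The four variants differ only in which axes the mean and variance
    are computed over."""
    n, c, h, w = x.shape
    if kind == "batch":       # per channel, over the whole batch: N, H, W
        mu = x.mean(axis=(0, 2, 3), keepdims=True)
        var = x.var(axis=(0, 2, 3), keepdims=True)
    elif kind == "layer":     # per sample, over all features: C, H, W
        mu = x.mean(axis=(1, 2, 3), keepdims=True)
        var = x.var(axis=(1, 2, 3), keepdims=True)
    elif kind == "instance":  # per sample AND channel, over H, W only
        mu = x.mean(axis=(2, 3), keepdims=True)
        var = x.var(axis=(2, 3), keepdims=True)
    elif kind == "group":     # per sample, over channel groups (C % groups == 0)
        g = x.reshape(n, groups, c // groups, h, w)
        mu = g.mean(axis=(2, 3, 4), keepdims=True)
        var = g.var(axis=(2, 3, 4), keepdims=True)
        return ((g - mu) / np.sqrt(var + eps)).reshape(n, c, h, w)
    else:
        raise ValueError(f"unknown normalisation: {kind}")
    return (x - mu) / np.sqrt(var + eps)
```

Instance normalisation (the CycleGAN default) removes per-image, per-channel contrast statistics, whereas batch normalisation ties every image to batch-level statistics; the article's point is that this seemingly minor choice changes the distribution of the generated translations.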

Furthermore, since manual visual inspection cannot determine a difference in quality between the translations, it follows that visual inspection cannot be used as a validation criterion for virtual staining.

The main contributions of this article are:

  • To demonstrate that relatively small changes in CycleGAN-based methods, such as different normalisation layers, can have a great impact on translation quality, from the perspective of its ability to reduce the domain shift introduced by both inter- and intra-stain variation.

  • To better define the limitations of visual inspection when assessing virtual staining.

  • To give evidence that physical differences between stains, in addition to architectural choices, can play an important role when applying virtual stain transfer for reducing inter-stain domain shift.

  • To show that generative approaches can be used to indicate whether a divergence from the true stain distribution has taken place or not when virtual staining is performed.

The remainder of this article is organised as follows: in Section 2, literature related to stain transfer, and particularly approaches based on the CycleGAN architecture, is reviewed. Section 3 gives a detailed description of the presented study and dataset. Section 4 presents the experimental results. Finally, Section 5 analyses stain translation models in terms of their visual quality, training stability, failure cases and generated data distributions.

Section snippets

Related work

Generally, two main sources of domain shift in Digital Pathology can be identified and these are illustrated in Fig. 2: intra-stain variability, which represents the visual differences of one particular stain; and inter-stain variability, which is the result of the physical/chemical differences between stainings. Addressing inter-stain variability is of interest when tackling tasks that are solvable across various stains (such as glomeruli segmentation [10]) whereas intra-stain variability is

Methods

In order to demonstrate the sensitivity of virtual stain transfer to the underlying architecture, the ubiquitous CycleGAN architecture is taken and different stain transfer models are created by replacing the normalisation layers in both the discriminators and the generators. To quantitatively validate the obtained translations, their ability to reduce domain shift introduced by inter- or intra-stain variation is measured for the task of glomeruli segmentation. This is achieved using the

Inter-stain variability

The translations obtained by many of the stain transfer models are plausible (see definition in Introduction), as will be discussed in more detail in Section 5.1.1. Nevertheless, the quantitative analysis performed using pretrained models shows that there are significant differences between their ability to reduce domain shift. Here, two directions are taken: by evaluating the PAS model’s performance on translations from the target stains to PAS (see Table 2); and by testing the models

Discussion

In this section, qualitative and quantitative assessments of the stain transfer models will be presented. The qualitative analysis includes visual assessment, which is presented in Section 5.1.1. However, the findings in Section 4 give strong evidence that this cannot be relied upon. Section 5.1.2 will further demonstrate this by highlighting the model’s instability during different training stages. Moreover, Section 5.1.3 presents some failure cases that can be easily overlooked by

Conclusions

To summarise, this article presents a study on the sensitivity of virtual stain transfer obtained by the most commonly used technique, CycleGAN, when used to reduce the domain shift introduced by both inter- and intra-stain variation (commonly referred to as stain translation and stain normalisation, respectively). In order to control the architectural differences between stain translation models, the experiments focused on different normalisation layers in the CycleGAN architecture.

Surprisingly, the

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by: ERACoSysMed and e:Med initiatives by the German Ministry of Research and Education (BMBF); SysMIFTA (project management PTJ, FKZ 031L-0085A; Agence National de la Recherche, ANR, project number ANR-15-CMED-0004); SYSIMIT (project management DLR, FKZ 01ZX1608A); ArtIC, France project “Artificial Intelligence for Care” (grant ANR-20-THIA-0006-01) and co-funded by Région Grand Est, France, Inria Nancy - Grand Est, France, IHU of Strasbourg, France, University of

References (64)

  • Macenko M, Niethammer M, Marron JS, Borland D, Woosley JT, Guan Xiaojun, et al. A method for normalizing histology...
  • Reinhard E, et al. Color transfer between images. IEEE Comput Graph (2001)
  • Khan AM, et al. A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Trans Biomed Eng (2014)
  • Gadermayr M, et al. Generative adversarial networks for facilitating stain-independent supervised and unsupervised segmentation: A study on kidney histology. IEEE Trans Med Imaging (2019)
  • Vasiljević J, et al. Towards histopathological stain invariance by unsupervised domain augmentation using generative adversarial networks. Neurocomputing (2021)
  • Rana A, Yauney G, Lowe A, Shah P. Computational histological staining and destaining of prostate core biopsy rgb images...
  • Shaban MT, Baur C, Navab N, Albarqouni S. StainGAN: Stain style transfer for digital histological images. In: ISBI....
  • Zhu J-Y, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks....
  • Brieu N, Meier A, Kapil A, Schoenmeyer R, Gavriel CG, Caie PD, Schmidt G. Domain adaptation-based augmentation for...
  • Shrivastava A, Adorno W, Sharma Y, Ehsan L, Ali SA, Moore SR, et al. Self-Attentive Adversarial Stain Normalization....
  • Cai S, Xue Y, Gao Q, Du M, Chen G, Zhang H, et al. Stain style transfer using transitive adversarial networks. In:...
  • Mahapatra D, Bozorgtabar B, Thiran J-P, Shao L. Structure preserving stain normalization of histopathology images using...
  • Kang H, et al. StainNet: A fast and robust stain normalization network. Front Med (2021)
  • Lahiani A, Navab N, Albarqouni S, Klaiman E. Perceptual embedding consistency for seamless reconstruction of tilewise...
  • Cohen JP, Luck M, Honari S. Distribution Matching Losses Can Hallucinate Features in Medical Image Translation. In:...
  • Vasiljević J, Feuerhake F, Wemmert C, Lampert T. Self Adversarial Attack as an Augmentation Method for...
  • Gadermayr M, Appel V, Klinkhammer MB, Boor P, Merhof D. Which way round? A study on the performance of...
  • Wagner S, Khalili N, Sharma R, Boxberg M, Marr C, de Back W, et al. Structure-Preserving Multi-domain Stain Color...
  • Xu Z, et al. Effective immunohistochemistry pathology microscopy image generation using CycleGAN. Front Mol Biosci (2020)
  • Vahadane A, et al. Structure-preserving color normalization and sparse stain separation for histological images. IEEE Trans Med Imaging (2016)
  • Lampert T, Merveille O, Schmitz J, Forestier G, Feuerhake F, Wemmert C. Strategies for training stain invariant CNNs....
  • Mercan C, Reijnen-Mooij G, Martin DT, Lotz J, Weiss N, van Gerven M, et al. Virtual staining for mitosis detection in...