Stain Normalization using Sparse AutoEncoders (StaNoSA): Application to digital pathology

https://doi.org/10.1016/j.compmedimag.2016.05.003

Highlights

  • Digital histopathology slides have many sources of variance.

  • These variances can cause algorithms to perform erratically.

  • Stain Normalization using Sparse AutoEncoders (StaNoSA) is introduced.

  • It standardizes color distributions of a test image to a single template image.

  • Validated in three experiments against five other color standardization approaches.

Abstract

Digital histopathology slides have many sources of variance, and while pathologists typically do not struggle with them, computer aided diagnostic algorithms can perform erratically. This manuscript presents Stain Normalization using Sparse AutoEncoders (StaNoSA) for use in standardizing the color distributions of a test image to that of a single template image. We show how sparse autoencoders can be leveraged to partition images into tissue sub-types, so that color standardization for each can be performed independently. StaNoSA was validated on three experiments and compared against five other color standardization approaches and shown to have either comparable or superior results.

Introduction

Digital pathology (DP) is the process by which histology slides are digitized to produce high resolution images via whole slide digital scanners (Gurcan et al., 2009). These digitized slides afford the possibility of applying image analysis techniques to tissue images for object detection, segmentation, and tissue classification. Such automated image analysis algorithms are highly relevant for nuclei detection, mitosis quantification, and tubule counting. They also enable higher-level, supervised learning tasks such as disease grading, thereby supporting the development of decision support algorithms for pathologists (Veta et al., 2014).

Prior to evaluation of the specimen under the microscope by a pathologist, the tissue is invariably treated with artificial or natural agents to stain the various cellular structures. Hematoxylin and Eosin (H&E) is one of the most routinely used stains for evaluating disease morphology; an example is shown in Fig. 1. The hematoxylin gives the nuclei a blue or purple appearance, while the eosin renders eosinophilic structures (e.g., cytoplasm, collagen, and muscle fibers) a pinkish hue.

Since the staining process is a chemical one, many variables can drastically change the overall appearance of the same tissue. For example, the specimen thickness, the concentration of the stain, its manufacturer, and the time and temperature at which the stain is applied can have significant implications for the appearance of the final tissue specimen. Fig. 1(a) shows an H&E stained gastrointestinal (GI) biopsy tissue image. Fig. 1(b) shows a sample taken from the same specimen but stained using a slightly different protocol (i.e., a different concentration of H&E), and as such it appears significantly darker.

The staining process is not the only source of visual variability in imaging of tissue specimens. The digitization process can also induce changes in tissue image appearance. For example, Fig. 1 shows the same physical specimen scanned using two different scanners. Differences in the scanning platform (e.g., bulbs, ambient illumination, sensor chips), image stitching algorithms, and acquisition technologies (e.g., compression, tiling, whiteness correction) can all induce substantive differences in the appearance of the resulting tissue image.

Pathologists are specifically trained to cope with these variations and for the most part do not struggle with diagnostic decision making or object counting/quantification (e.g., nuclear or mitosis counting). On the other hand, image analysis methods, especially supervised learning/classification algorithms for nuclei segmentation or tissue partitioning, typically find it more difficult to cope with variations in image appearance and staining (Monaco et al., 2012). For instance, if an algorithm is trained to identify nuclei based on chromatic cues from a single site, variations in staining might cause it to make a large number of errors on slightly differently stained images from a different site. This is further compounded for extremely large datasets curated from many different sites, such as The Cancer Genome Atlas (TCGA).

These variations in stain and tissue appearance have spurred recent research into color standardization and normalization algorithms intended to improve the performance of subsequent image analysis algorithms (Monaco et al., 2012; Khan et al., 2012). Often, a single image with optimal tissue staining and visual appearance is identified and designated the “template”. All other images to be standardized then have their intensity distributions mapped to match the distribution of the template image. Previous works (Hipp et al., 2011; Kong et al., 2007; Wang et al., 2007) have suggested that partitioning the image into constituent tissue subtypes (i.e., epithelium, nuclei, stroma, etc.) and matching distributions on a tissue-by-tissue basis is preferable to simply aligning global image distributions between the target and template images. In the context of histopathology, this process might involve first identifying stromal tissue, nuclei, lymphocytes, fatty adipose tissue, and cancer epithelium within both the target and template images, and then establishing correspondences between the tissue partitions in the two images. The tissue-specific distributions could then be aligned between the target and template images. While these tissue-specific alignment procedures (Khan et al., 2014) have had more success than global intensity alignment approaches (Jain, 1989), successfully identifying the partitions remains an open challenge. For example, nuclei segmentation on its own is a large area of research (Irshad et al., 2014; Sahirzeeshan and Madabhushi, 2012; Fatakdawala et al., 2010; Xu et al., 2016), yet nuclei represent only a single histologic primitive. More powerful and flexible approaches are therefore needed for automated partitioning of the entire tissue image into distinct tissue compartments.
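The global distribution-alignment baseline mentioned above can be sketched as per-channel histogram specification against the template. The following is a minimal illustration of that idea, not any of the cited implementations; the function names are ours:

```python
import numpy as np

def match_channel(source, template):
    """Map the intensity distribution of `source` onto that of `template`
    via CDF matching (classic histogram specification)."""
    s_vals, s_counts = np.unique(source.ravel(), return_counts=True)
    t_vals, t_counts = np.unique(template.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    t_cdf = np.cumsum(t_counts) / template.size
    # For each source intensity, pick the template intensity at the same quantile.
    mapped = np.interp(s_cdf, t_cdf, t_vals)
    return mapped[np.searchsorted(s_vals, source.ravel())].reshape(source.shape)

def global_match_rgb(target, template):
    """Global alignment: match each RGB channel independently, ignoring tissue type."""
    return np.stack([match_channel(target[..., c], template[..., c])
                     for c in range(3)], axis=-1)
```

Because this ignores which tissue type each pixel belongs to, an image dominated by stroma forced onto a nuclei-rich template will be distorted, which is exactly the weakness the tissue-partitioned approaches address.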

Our approach, Stain Normalization using Sparse AutoEncoders (StaNoSA), is based on the intuition that similar tissue types will cluster close to each other in a learned feature space. This feature space is derived in an unsupervised manner, freeing it from the need for domain-specific knowledge such as the “true” color of the tissue stains, a requirement of a number of other approaches (Tian et al., 2014). Our approach (see Fig. 2 for a high-level flowchart) employs sparse autoencoders (SAEs), a type of deep learning which iteratively learns filters that can optimally reconstruct an image. These filters provide the feature space in which our approach operates. Once the pixels are clustered in this deep-learned feature space into their individual tissue sub-types, tissue distribution matching (TDM) can occur on a per-channel, per-cluster basis. This TDM step alters the target image to match the template image's color space.
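The per-cluster matching flow can be sketched as follows: embed pixels with some feature extractor (in StaNoSA this would be the trained SAE; here it is an abstract callable), cluster the template's features, assign target pixels to the same centers so that clusters correspond across the two images, and histogram-match each cluster's channels independently. All names (`kmeans`, `assign`, `stanosa_like_tdm`) and the naive center-sharing correspondence are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def assign(X, centers):
    """Label each row of X with the index of its nearest center."""
    return np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)

def kmeans(X, k, n_iter=25):
    """Tiny Lloyd's k-means with a deterministic quantile-based init."""
    order = np.argsort(X[:, 0])
    centers = X[order[np.linspace(0, len(X) - 1, k).astype(int)]].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def match_channel(source, template):
    """CDF-based histogram matching of one 1-D intensity array onto another."""
    s_vals, s_counts = np.unique(source, return_counts=True)
    t_vals, t_counts = np.unique(template, return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    t_cdf = np.cumsum(t_counts) / template.size
    mapped = np.interp(s_cdf, t_cdf, t_vals)
    return mapped[np.searchsorted(s_vals, source)]

def stanosa_like_tdm(target_px, template_px, embed, k=3):
    """Per-cluster, per-channel distribution matching.
    target_px, template_px: (N, 3) RGB pixel arrays.
    embed: callable mapping (N, 3) pixels to feature vectors."""
    centers, r_lab = kmeans(embed(template_px), k)
    t_lab = assign(embed(target_px), centers)   # shared centers give correspondence
    out = target_px.astype(float).copy()
    for j in range(k):
        if not (np.any(t_lab == j) and np.any(r_lab == j)):
            continue  # this tissue cluster has no counterpart
        for c in range(3):
            out[t_lab == j, c] = match_channel(
                target_px[t_lab == j, c].astype(float),
                template_px[r_lab == j, c].astype(float))
    return out
```

The key design point is that matching is restricted to corresponding tissue sub-types, so a stroma-heavy target is not forced onto a nuclei-heavy template distribution.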

The main contribution of this work is a new TDM-based algorithm for color standardization of digital pathology images, which employs sparse autoencoders to automatically partition tissue and establish tissue-specific correspondences between the target and template images. Autoencoding is the unsupervised process of learning filters which can most accurately reconstruct input data when transmitted through a compression medium. By stacking this procedure into a multi-layer architecture, increasingly sophisticated data abstractions can be learned (Vincent et al., 2008). Additionally, as part of our approach we perturb the input data with noise and attempt to recover the original unperturbed signal, an approach termed denoising autoencoders (Vincent et al., 2008) that has been shown to yield robust features. StaNoSA is thus a fully automated way of transforming images of the same stain type into the same color space, so that variance arising from (a) technicians, (b) protocols, and (c) equipment is minimized.
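The denoising-autoencoder idea — corrupt the input, then reconstruct the clean signal — can be sketched as a single-layer, tied-weight model trained by plain gradient descent on mean-squared reconstruction error. This is an illustrative toy under our own assumptions (masking noise, sigmoid units, hyperparameters), not the paper's architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_denoising_ae(X, n_hidden=8, noise=0.3, lr=0.5, epochs=300, seed=0):
    """Single-layer denoising autoencoder with tied weights (illustrative).
    X: (n_samples, n_features) patches scaled to [0, 1].
    The input is corrupted with masking noise; the loss targets the CLEAN input."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, n_hidden))
    b_h = np.zeros(n_hidden)
    b_o = np.zeros(d)
    losses = []
    for _ in range(epochs):
        mask = rng.random(X.shape) > noise       # randomly zero out inputs
        Xc = X * mask
        H = sigmoid(Xc @ W + b_h)                # encode the corrupted input
        R = sigmoid(H @ W.T + b_o)               # decode with tied weights
        err = R - X                              # compare against the clean input
        losses.append(float(np.mean(err ** 2)))
        dR = err * R * (1.0 - R)                 # gradient at output pre-activation
        dH = (dR @ W) * H * (1.0 - H)            # backprop to hidden pre-activation
        W -= lr * (Xc.T @ dH + dR.T @ H) / n     # tied weights: encoder + decoder terms
        b_h -= lr * dH.sum(axis=0) / n
        b_o -= lr * dR.sum(axis=0) / n
    return W, b_h, b_o, losses
```

After training, the rows of `W` play the role of the learned filters: projecting a pixel patch through them yields the feature vector in which tissue sub-types are clustered.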

The rest of the paper is organized as follows. Section 2 reviews previous work in the field. Section 3 describes the approach (StaNoSA) and associated algorithms. Section 4 rigorously evaluates the method across two different datasets and compares it with a state-of-the-art approach and four other common methods. Section 5 contains the discussion. Finally, Section 6 presents our concluding remarks.

Section snippets

Previous work and novel contributions

Previous approaches (Khan et al., 2014) to color normalization for digital histopathology images tend to fall into one of two categories. The first set of approaches exploits staining characteristics directly, for example via Beer–Lambert's law through color de-convolution (Ruifrok and Johnston, 2001); these attempt to divide the image color space into individual stain contributions and normalize each individually. The second category of algorithms takes a statistical approach which relies on finding
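The first category can be illustrated with the classic Ruifrok–Johnston scheme: convert RGB to optical density via the Beer–Lambert law and unmix with the inverse of a stain matrix. The stain vectors below are the commonly quoted H&E defaults; real slides deviate from them, which is precisely the brittleness that motivates learned, image-specific approaches:

```python
import numpy as np

# Commonly quoted default H&E stain vectors (after Ruifrok & Johnston, 2001);
# actual slides deviate from these fixed values.
STAIN_RGB = np.array([
    [0.650, 0.704, 0.286],   # hematoxylin
    [0.072, 0.990, 0.105],   # eosin
    [0.268, 0.570, 0.776],   # residual/background channel
])
STAIN_RGB = STAIN_RGB / np.linalg.norm(STAIN_RGB, axis=1, keepdims=True)

def color_deconvolve(rgb):
    """Separate an 8-bit RGB image into per-stain concentration channels.
    rgb: (H, W, 3) uint8 array. Returns a (H, W, 3) float array."""
    # Beer-Lambert: optical density is -log of transmitted light fraction.
    od = -np.log10((rgb.astype(float) + 1.0) / 256.0)
    conc = od.reshape(-1, 3) @ np.linalg.inv(STAIN_RGB)
    return conc.reshape(rgb.shape)
```

Once the stain channels are separated, each can be normalized and the image recomposed; the accuracy of the whole pipeline, however, hinges on how well the fixed stain matrix describes the slide at hand.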

Notation

For all methods, we define the dataset Z = {C1, C2, …, CM} of M images, where an image C = (C, ψ) is a 2D set of pixels c ∈ C and ψ is the associated vectorial function which assigns RGB values. The template image T = Ca ∈ Z is chosen from Z; all other images in the dataset will be normalized to it. Without loss of generality we choose S = Cb ∈ Z to be the “target image”, which is to be normalized into the color space of T. See Table 1 for additional notation used in this manuscript.

Deep learning of filters from image patches

Denoising auto-encoders

Experimental evaluation

To rigorously evaluate our approach, we performed three experiments, each addressing a different source of variance in the color and appearance of pathology tissue slides. Specifically, we addressed variations induced by (a) differences in platforms and scanners, and (b) staining. Additionally, we evaluated the performance of a nuclear detection algorithm on the pre- and post-standardized images to assess the role of color standardization methods in facilitating

Discussion

Color standardization of digital histopathology images is critical to reducing stain variability and improving the robustness of computer assisted diagnostic and image quantification algorithms such as nuclei and mitoses detection. Previous approaches have potentially been handicapped by the necessity of accurately defining a stain matrix or requiring images to have similar tissue type representations in the image (i.e., similar proportions of stroma, nuclei). StaNoSA is able to circumvent the

Concluding remarks

In this work, we present a new color standardization approach called Stain Normalization using Sparse AutoEncoders (StaNoSA) which attempts to address the limitations of previous related approaches. We leverage the intuition that filters learned via deep learning tend to respond similarly to tissue sub-types having similar characteristics, even across images. This invariance allows for accurate, and unsupervised, partitioning of the tissue compartments for subsequent histogram matching and

Acknowledgements

Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under award numbers 1U24CA199374-01, R21CA167811-01, R21CA179327-01, and R21CA195152-01; the National Institute of Diabetes and Digestive and Kidney Diseases under award number R01DK098503-02; the DOD Prostate Cancer Synergistic Idea Development Award (PC120857); the DOD Lung Cancer Idea Development New Investigator Award (LC130463); the DOD Prostate Cancer Idea Development Award

References (31)

  • J.D. Hipp et al., Spatially invariant vector quantization: a pattern matching algorithm for multiple classes of image subject matter including pathology, J Pathol Inform (2011)

  • A. Basavanhally et al., EM-based segmentation-driven color standardization of digitized histopathology, Proc SPIE (2013)

  • Y. Bengio et al., Greedy layer-wise training of deep networks, Adv Neural Inf Process Syst (2007)

  • Y. Bengio, Learning deep architectures for AI, Found Trends Mach Learn (2009)

  • A. Coates et al., An analysis of single-layer networks in unsupervised feature learning

  • L. Dice, Measures of the amount of ecologic association between species, Ecology (1945)

  • H. Fatakdawala et al., Expectation-maximization driven geodesic active contour with overlap resolution (EMAGACOR): application to lymphocyte segmentation on breast cancer histopathology, IEEE Trans Biomed Eng (2010)

  • I.J. Goodfellow et al., Pylearn2: a machine learning... (2013)

  • M.N. Gurcan et al., Histopathological image analysis: a review, IEEE Rev Biomed Eng (2009)

  • P.W. Hamilton et al., Automated location of dysplastic fields in colorectal histology using image texture analysis, J Pathol (1997)

  • H. Irshad et al., Methods for nuclei detection, segmentation, and classification in digital histopathology: a review – current status and future potential, IEEE Rev Biomed Eng (2014)

  • A.K. Jain, Fundamentals of digital image processing (1989)

  • A.M. Khan et al., RanPEC: random projections with ensemble clustering for segmentation of tumor areas in breast histology images

  • A.M. Khan et al., A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution, IEEE Trans Biomed Eng (2014)

  • J. Kong et al., Computer-aided grading of neuroblastic differentiation: multi-resolution and multi-classifier approach