Contrast optimization of mass spectrometry imaging (MSI) data visualization by threshold intensity quantization (TrIQ)

Mass spectrometry imaging (MSI) enables the unbiased characterization of surfaces with respect to their chemical composition. In biological MSI, zones with differential mass profiles hint towards localized physiological processes, such as the tissue-specific accumulation of secondary metabolites, or diseases, such as cancer. Thus, the efficient discovery of ‘regions of interest’ (ROI) is of utmost importance in MSI. However, often the discovery of ROIs is hampered by high background noise and artifact signals. Especially in ambient ionization MSI, unmasking biologically relevant information from crude data sets is challenging. Therefore, we implemented a Threshold Intensity Quantization (TrIQ) algorithm for augmenting the contrast in MSI data visualizations. The simple algorithm reduces the impact of extreme values (‘outliers’) and rescales the dynamic range of mass signals. We provide an R script for post-processing MSI data in the imzML community format (https://bitbucket.org/lababi/msi.r) and implemented the TrIQ in our open-source imaging software RmsiGUI (https://bitbucket.org/lababi/rmsigui/). Applying these programs to different biological MSI data sets demonstrated the universal applicability of TrIQ for improving the contrast in the MSI data visualization. We show that TrIQ improves a subsequent detection of ROIs by sectioning. In addition, the adjustment of the dynamic signal intensity range makes MSI data sets comparable.


INTRODUCTION
Imaging methods such as microscopy, infrared thermography, and magnetic resonance tomography play a central role in science, technology, and medicine (Gomès, Assy & Chapuis, 2015;Doi, 2006).
Mass spectrometry imaging (MSI) datasets contain spatially resolved spectral information. The parallel detection of multiple compounds with high sensitivity established mass spectrometry (MS) as the first-choice tool for exploratory studies, particularly in combination with data mining methods (López-Fernández et al., 2015;Winkler, 2015). Numerous MSI technologies have been reported; the most commonly used MSI platform is based on matrix-assisted laser desorption/ionization (MALDI), suitable for a wide range of molecules (Rae Buchberger et al., 2018). However, conventional MSI The original DESI data set consists of four samples per image, which we separated for further processing.
All datasets comply with the imzML data format community standard (Schramm et al., 2012;Römpp et al., 2011), implemented in many proprietary and open source programs for MSI data processing (Weiskirchen et al., 2019).

Threshold intensity quantization (TrIQ) algorithm
An image is a function f ðx; yÞ that assigns an intensity level for each point x; y in a two-dimension space. For visualizing f ðx; yÞ on a computer screen or printer, the image must be digitized for both intensity and spatial coordinates. As MSI is a scanning technique, spatial coordinates x and y are already discrete values related to the lateral resolution of the scanning device. The intensity values provided by the MS ion detector are analog quantities that must be transformed into discrete ones. Quantization is a process for mapping a range of analog intensity values to a single discrete value, known as a gray level. Zero-memory is a widely used quantization method. The zero-memory quantizer computes equally spaced intensity bins of width w: where n represents the number of discrete values, usually 256; minðf Þ and maxðf Þ operators provide minimum and maximum intensity values. Quantization is based on a comparison with the transition levels t k : Finally, the discrete mapped value Q is obtained: (AI)MSI methods often produce outliers, i.e., infrequent extreme intensity values, which drastically reduce image contrast. The Threshold Intensity Quantization, or TrIQ, addresses this issue by setting a new upper limit T; intensities above this threshold will be Table 1 Mass spectrometry imaging data sets. AP-MALDI-Atmospheric pressure matrix assisted laser desorption/ionization, DESI-Desorption electrospray ionization, LAESI-Laser ablation/electrospray ionization, LTP-Low-temperature plasma, Res.-lateral resolution, Tol.-mass tolerance. References: Oetjen et al. (2015), Zheng, Bartels & Svatoš (2020), Maldonado-Torres et al. (2014), Maldonado-Torres et al. (2017), Römpp et al. (2010), Römpp et al. (2014). hðiÞ stands for the i bin's frequency within an image histogram, N is the image's pixel count. Given a target probability q it is possible to find the bin k whose CDF closely resembles q. Then, the upper limit of the bin k in h will be used as the threshold value T. The new transition levels can be defined as: Therefore, Q mapping will be Qðf ðx; yÞÞ ¼ 0; f ðx; yÞ t 1 k; t k , f ðx; yÞ t kþ1 n À 1; f ðx; yÞ . t nÀ1 8 < : (7) with k running from 1 to n À 1. From Eq. (7) follows that a higher k leads to a better approximation of q. Default values for k and q in RmsiGUI are 100 and 98%, respectively.
Using the default values of the TrIQ, 98% (= q) of the image's total intensity are visualized. Pixel intensities above the calculated threshold value T are limited to a maximum value. Therefore, the rescaled 100 (= k) bins visualize the dataset's intensity levels with more detail.

Implementation
Several programs and workflows for MSI data analysis employ the statistical language R (R Core Team, 2018), such as MSI.R (Gamboa-Becerra et al., 2015), Cardinal (Bemis et al., 2015), and the Galaxy MSI module (Föll et al., 2019). The Otsu segmentation method (Otsu, 1979) tested in this work comes with the R package EBImage (Pau et al., 2010). Recently, we published an R-based platform for MSI data processing with a graphical user interface, RmsiGUI, which provides modules for the control of an open hardware imaging robot (Open LabBot), the processing of raw data, and the analysis of MSI data (Rosas-Román et al., 2020). We integrated the TrIQ algorithm into RmsiGUI and provide the R code snippets for facilitating its adoption into other programs. The source code is freely available from the project repository https://bitbucket.org/lababi/rmsigui/.
We use the viridis color map, which is optimized for human perception and people with color vision deficiencies (Nuñez, Anderton & Renslow, 2018;Garnier et al., 2018). Reading and processing MSI data in imzML format are done with the MALDIquantForeign and MALDIquant libraries (Gibb & Strimmer, 2012;Gibb & Franceschi, 2019). Figure 1 shows the graphical user interface of RmsiGUI with the TrIQ option selector.

RESULTS AND DISCUSSION
The contrast of digital images can be enhanced with additional operations, such as global or local histogram equalization algorithms; however, such image processing tools do not preserve the original gray level scale's linearity. In contrast, our Threshold Intensity Quantization (TrIQ) approach finds an intensity threshold T for saturating the images' last gray level. Importantly, the linearity of the experimentally determined intensity scale is preserved.
In the next sections, we demonstrate the application of the TrIQ for the processing of mass spectrometry imaging (MSI) datasets. The term raw image is used in this paper for denoting images rendered with the default quantization method of R. Figure 2 shows the mass spectrometry image of a human colorectal adenocarcinoma sample, acquired with DESI and 100 lm spatial resolution (Oetjen et al., 2015). The imaged signal of 885.55 m/z corresponds to de-protonated phosphatidylinositol (18:0/20:4), [C 47 H 83 O 13 P-H] À (Tillner et al., 2017).

Contrast optimization
The direct plotting of the extracted m/z slice results in the image shown in Fig  A typical data transformation for imaging is the use of a logarithmic intensity scale. Figure 2B shows the image after applying the natural logarithm to the MSI signal intensities and default quantization. The contrast is improved. However, further operations would be necessary, such as the subtraction of the background level. Besides, the interpretation of the non-linear color scale is not intuitive.
The conventional sequence for improving the contrast of an MSI image is applying the zero-memory quantization of MSI data and a transformation function on the quantized pixels. Figure 2C shows the result of this process. Although improved contrast is gained with linear equalization, the corresponding histogram shows that many gray levels were lost in the transformation process. Only eighteen remaining gray levels contain data. This information loss is irreversible and may have adverse effects on the performance of standard segmentation algorithms. To make TrIQ and equalized images comparable, we selected an intensity threshold for saturating the equalized image near to the T threshold provided by the TrIQ algorithm. The shown linear equalization histogram has 256 bins.
Applying the TrIQ algorithm with P ¼ 0:95 resulted in Fig. 2D. Pixels with intensities above T ¼ 4; 244 were binned within the highest gray level. The corresponding histogram shows an almost flat frequency over a wide intensity range. Compared to the unprocessed image, the contrast of the image was drastically improved, and the linear intensity scale of the color representation was preserved. Figure 2E shows the histological staining of the tissue section analyzed. The mass image processed with the TrIQ algorithm reassembles best the anatomical details which are recognizable in the optical image.

Background optimization
Various causes can result in background noise, such as sample metabolites with low vapor pressure and matrix compounds. Figure 3A shows the rendering of signals from an Removing outliers with TrIQ and reducing gray levels lead to improved image brightness (see Figs. 3A and 3B). Nevertheless, background noise is also enhanced, and the sample shape is not well defined (see ion 209 m/z in Fig. 3B). There are two possibilities for background correction with TrIQ. The first option is reducing the number of gray levels, thus grouping a wider range of values within a single bin. Reducing the gray levels from 32 to 9 produced an almost perfectly uniform background and a well-defined sample shape, as shown in Fig. 3C. The second method finds a new black level threshold that substitutes the operator minðf Þ in Eq. (5). As the color bars for this approach indicate, the black level thresholds depend on the individual image data (see Fig. 3D). Both methods efficiently diminish the impact of background noise. But whereas reducing the gray levels is the simpler approach, defining a new black level threshold maintains the color depth.

Normalization for comparable mass spectrometry images
Comparing mass spectrometry images (MSI) is a challenge because standard quantization procedures create images with distinct intensity and color scales, even if they were measured under the same experimental conditions. Global TrIQ finds the highest T among a given MSI set. This threshold is used for computing the transition levels and mapping MSI intensities to discrete values on every image within the set. Figure 4 shows human colorectal adenocarcinoma images. The samples come from the same tissue, cut into slices with a thickness of 10 lm (Oetjen et al., 2015). All images visualize the abundance of the ion 885.55 m/z. The plotting of the raw data resulted in Figure 4 Global TrIQ applied to the ion 885.55 m/z of human colorectal adenocarcinoma DESI MSI slices, with P = 0.91 and 32 gray levels. Compared to the raw data visualization (A), the contrast is drastically enhanced by applying the TrIQ algorithm (B). The normalization also allows a direct comparison of the images (dataset from Oetjen et al. (2015)). Full-size  DOI: 10.7717/peerj-cs.585/ fig-4 images of low contrast, which are difficult to interpret and to compare (see Fig. 4A). Using the Global TrIQ algorithm, a maximum threshold T of 4,970 was calculated and applied for all twelve images; the accumulated probability was set to 0.91. The resulting color scale is the same for all images (see Fig. 4B). Thus, the relative abundance and distribution of the ion in multiple images can be evaluated at a single glance. Figure 5 provides another example of global TrIQ. The image compares the 62.1, 84.1, and 306.1 m/z ions of chili (Capsicum annuum) slices sampled at 1 mm lateral resolution with low-temperature plasma (LTP) MSI (Maldonado-Torres et al., 2014). LTP MSI detects small, volatile compounds. Therefore, the images resulting from this ambient ionization technique are noisy. TrIQ with P ¼ 0:98 and five gray levels improves the contrast. The remaining noise is efficiently removed with a 3 Â 3 median filter (see Fig. 5). The signals with a mass-to-charge ratio of 62.1 and 84.1 display a defined localization at the seeds and at the center of the placenta. The signal with 306.1 m/z is enriched in placenta tissue and corresponds to the [C 18 H 27  Both examples demonstrate the usefulness of TrIQ to normalize MSI data for image comparison.

Segmentation and visualization of regions of interest (ROI)
Finding regions of interest (ROI) in MSI datasets is essential for biomarker discovery and physiological studies. High-contrast and low-noise images favor the automated segmentation. Figure 6 compares binary masks obtained using the standard Otsu algorithm and TrIQ. The input image was zero-memory quantized with 256 bins before applying the Otsu algorithm. The TrIQ binary masks were obtained by setting TRUE all non-zero values of the median filtered images of Fig. 5. TrIQ-median filtered images show more extensive and homogenous regions compared to the Otsu masks and fit well with the anatomical fruit sections of the optical image (see Fig. 6). Figure 7 shows the Global TrIQ segmentation of mouse urinary bladder data, scanned with AP-MALDI MSI at a lateral resolution of 10 μm (Römpp et al., 2010;Römpp et al., 2014). The distribution of 741.53 m/z, identified as a sphingomyelin [C 39 H 79 N 2 O 6 P + K] þ ion, is related to muscle tissue. 743.54 m/z is associated with the lamina propria structure, while 798.54 m/z is mainly found in the urothelium (Römpp et al., 2010). Global TrIQ was applied with P = 0.95 and 25 gray levels. TrIQ and median filtering allow clear discrimination between muscle tissue and lamina propria using the marker ions 741.53 and 743.54 m/z. For separating the urothelium structure from the muscular tissue, the binary image for 798.54 m/z was calculated by zeroing gray levels below nine. In contrast, if the Otsu method is applied to the 798.54 m/z ion, the urothelium region is isolated automatically. This result is expected since the Otsu method assumes an image histogram distribution with a deep sharp valley between two peaks. Figure 8 shows an overlay of the TrIQ processed ion images, representing correctly the anatomical structures of the mouse urinary bladder.

Computational performance of the algorithm
For estimating the computational performance of the TrIQ algorithm, we reprocessed selected ion traces presented in this paper (DESI 885.55 m/z, LAESI 209 m/z, LTP 62.1 m/z, Figure 6 Image segmentation for C. annuum obtained with Otsu and TrIQ. For images with intensity outliers and noisy regions, TrIQ combined with median filtering gives larger and more uniform regions compared to the standard Otsu algorithm (dataset from Maldonado-Torres et al. (2014).
Full-size  DOI: 10.7717/peerj-cs.585/ fig-6 and AP-MALDI 741.53 m/z) and measured the time for applying the TrIQ. To test the suitability of the TrIQ algorithm for large images, we built synthetic slices by sequentially doubling the DESI data. Time calculation was executed 25 times to account for variations in system processes of the operating system. Figure 9 demonstrates the results from running the R script Timing. R (provided as supplemental code) on a standard Linux laptop (Intel(R) Core(TM) i7-7700HQ CPU with 2.80 GHz, 16 Gb RAM, Peppermint OS 10). On average, more than 1,000 pixels were processed per millisecond. In the tested range, i.e., up to >500,000 pixels, the execution speed was proportional to the image's size. Thus, the TrIQ algorithm is computationally efficient and scalable. Contrary to histogram equalization algorithms, TrIQ preserves the linearity of measured ion intensities. The processing with TrIQ facilitates the recognition of regions of interest (ROI) in MSI data sets, either by visual inspection of by automated segmentation algorithms, supporting the interpretation of MSI data in biology and medicine.
As with any data filtering method, TrIQ could remove or alter valuable information. Therefore, it is recommendable to compare raw and processed images and carefully adjust the target probability and the number of gray levels. MSI technology itself causes further sources of errors. Fixation agents, solvents, and matrix compounds can lead to background signals. To some extent, such off-sample ions can be removed manually, or using MSI software (Ovchinnikova et al., 2020). Technical variations caused by the sample topology or unstable ionization add additional inaccuracies (Bartels et al., 2017). Further, the ion count is not strictly proportional to the abundance of a molecule but depends on the local sample structure and composition, and the desorption/ionization principle. Thus, complementary MSI, optical and histological methods should be used on the same sample (Swales et al., 2019).
Applying TrIQ to a set of images equalizes their intensity scales and makes them comparable. We demonstrated the implementation of the TrIQ algorithms in R to process MSI data in the community format imzML. The algorithm is computationally fast and only requires basic operations, and thus can be quickly adapted to any programming language. TrIQ can be applied to improve any scientific data plotting with extreme values, respecting the original intensity levels of raw data.