Mistic: An open-source multiplexed image t-SNE viewer

Summary
Understanding the complex ecology of a tumor tissue and the spatiotemporal relationships between its cellular and microenvironment components is becoming a key component of translational research, especially in immuno-oncology. The generation and analysis of multiplexed images from patient samples is of paramount importance to facilitate this understanding. Here, we present Mistic, an open-source multiplexed image t-SNE viewer that enables the simultaneous viewing of multiple 2D images rendered using multiple layout options to provide an overall visual preview of the entire dataset. In particular, the positions of the images can be t-SNE or UMAP coordinates. This grouped view of all images allows an exploratory understanding of the specific expression pattern of a given biomarker or collection of biomarkers across all images, helps to identify images expressing a particular phenotype, and can help select images for subsequent downstream analysis. Currently, there is no freely available tool to generate such image t-SNEs.


In brief
Multiplex imaging of tissues allows the simultaneous imaging of multiple biomarkers on a tissue specimen of interest and is a critical tool for clinical cancer diagnosis and prognosis. A common way to visualize and better understand such multiplexed images is to utilize dimensionality reduction (DR) methods, where each image is abstracted as a point in the reduced space. We developed Mistic to enable the simultaneous viewing of multiple 2D multiplexed images by combining DR, image processing, and GUI programming.

INTRODUCTION
Multiplex imaging of tissues, which allows the simultaneous imaging of multiple biomarkers on a tissue specimen of interest, is a critical tool for clinical cancer diagnosis and prognosis. Historically, patient tissue samples stained with hematoxylin and eosin have been used as the gold standard for tumor diagnosis by indicating the presence of tumors and their grade. [1][2][3] With the advent of immunohistochemical (IHC) 4 staining and the flourishing of multiplexed imaging approaches that leverage IHC, immunofluorescence (IF), fluorescence in situ hybridization (FISH), 5,6 multiplexed ion beam imaging (MIBI), 7 cyclic labeling such as co-detection by indexing (CODEX), 8 cyclic immunofluorescence (CyCIF), 9 and imaging mass cytometry (IMC), 10 there is a wealth of potential data to be gleaned from a single section of tissue. Biomarkers can be observed and quantified with their tissue context completely conserved. Due to the multidimensional nature of the data from these multiplexed images, analysis requires computational pipelines to both interrogate and study how the tissue architecture, spatial distribution of multiple cell phenotypes, and co-expression of signaling and cell cycle markers are related and what patterns might exist.

THE BIGGER PICTURE
A crucial component of translational research is in exploiting tumor tissue for diagnostic or prognostic purposes. We believe this can best be achieved through a deeper understanding of the complex ecology of a tumor tissue and the spatiotemporal relationships between its cellular and microenvironment components. Multiplexed images from patient samples facilitate this understanding. We present Mistic, an open-source multiplexed image t-SNE viewer that enables the simultaneous viewing of multiple 2D multiplexed images to provide an overall visual preview of the entire dataset. This allows an exploratory understanding of underlying patterns in the data such as the specific expression pattern of a given biomarker across all images. Currently, there is no free tool to generate such image t-SNEs. Mistic aims to fill this gap by providing an easy-to-implement tool with simple functionality to view multiple images at once. Mistic supports images from Vectra, CyCIF, t-CyCIF, and CODEX.
Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems

There are several commercial software platforms available for quantifying and analyzing multiplex image data, for example, Imaris (from Oxford Instruments), 11 Amira (from Thermo Fisher Scientific), 12 and Halo (from Indica Labs). 13,14 There are also open-source software platforms, for instance, ImageJ, 15 CellProfiler, 16 V3D, 17 BioImageXD, 18 Icy, 19 FIJI, 20 and QuPath 21 for the analysis of two dimensional (2D) biological images. Most of these platforms allow for a single 2D image to be examined at any one time.
A common way to visualize and better understand multidimensional data, such as that coming from multiplex images, is to utilize dimensionality reduction methods such as uniform manifold approximation and projection (UMAP) 22 or t-distributed stochastic neighbor embedding (t-SNE), 23 where each image is abstracted as a dot in the reduced space. These approaches are especially useful when combined with clustering methods (e.g., Gaussian mixture models, 24,25 Louvain, 26 and Leiden 27) that can highlight key aspects of the data. While utilizing these approaches in our own work dealing with multiplexed images of non-small cell lung cancer (NSCLC) tumors, we realized that there could be a significant benefit to visualizing the actual tissue samples behind a UMAP or t-SNE scatter projection, thus giving rise to an "image t-SNE." In our specific application, inspection of the images that constituted each spatially segregated cluster revealed cluster-specific biomarker patterns that, along with the tumor phenotypes, could be mapped succinctly to the therapy response of each patient. Thus, the image t-SNE rendering aided both our understanding and intuition that there exist distinct tumor patterns that guide the clustering, and that these patterns can potentially inform why a specific therapeutic response emerged, leading to further biological insights. Motivated by the usefulness of the image t-SNE in our work and in our recent analysis of endometrial cancer, 28 which we discuss in section tissue microarray cores for endometrial cancer, we have developed Mistic, an open-source multiplexed image t-SNE viewer that enables the simultaneous viewing of multiple 2D images rendered using multiple layout options to provide an overall visual preview of the entire dataset. In particular, the positions of the images can be taken from t-SNE or UMAP coordinates.
This grouped view of the images further aids an exploratory understanding of the biomarkers' specific expression pattern across all images, helping to identify images expressing a particular phenotype or to select images for subsequent downstream analysis. Currently there is no freely available tool to generate such image t-SNEs (see Table 1). Software such as BioImageXD and Icy do offer a "gallery" or "stack montage" option, where a multichannel image is split into its individual channels to be viewed at once. Mistic is distinct in that multiple multichannel images can be processed and rendered at once using user-pre-defined coordinates (e.g., from t-SNE or UMAP analysis), random coordinates, or a grid layout. Mistic is agnostic as to how the t-SNE/UMAP 2D coordinates are generated by the user. Since t-SNE/UMAP rendering of a dataset is closely aligned to the specific research question, Mistic allows the user to utilize either t-SNE or UMAP projections (or newer ones as they emerge), based on the user's specific question.
In section image t-SNE-based visualization of multiplexed images from an NSCLC cohort shows marker expression clustering across different patient response groups, we illustrate the importance of visualizing multiplexed images using an image t-SNE in the context of NSCLC. In section Mistic: an open-source multiplexed image t-SNE viewer, we describe Mistic and its features in more detail. We run Mistic on several datasets using different data formats and describe these results in section generalizability and scalability experiments. In section discussion, we compare Mistic to alternative approaches and conclude with future work. Further details of the code and data can be found in section data and code availability.

RESULTS
Image t-SNE-based visualization of multiplexed images from an NSCLC cohort shows marker expression clustering across different patient response groups
We computationally analyzed 92 7-stain PerkinElmer Vectra images from nine patients with advanced/metastatic NSCLC with progression. 35 They were treated with an oral HDAC inhibitor (vorinostat) combined with a PD-1 inhibitor (pembrolizumab). Tumor biopsies were collected from all patients both pre- and on-treatment. Of the nine patients, four qualified as "Response 1" and five as "Response 2," where responses are based on how the tumors have progressed per the RECIST classification. 36 There are 34 images from patients having Response 1 and 58 images from patients classified as Response 2. Note that we have labeled the clusters, markers, patients, and responses in a generic fashion, since the biological conclusions arising from these data are not the purpose of this work.
We extracted the cell segments per field of view (FoV), built a count matrix with cells as rows and markers as columns, and clustered the count matrix to identify heterogeneous cell types, in particular tumor and immune cells. From these cell types, we automatically demarcated tumor-rich regions across images.
To further quantify the tumor-immune cell colocalization at the tumor border, we cluster the tumor-immune cells at the tumor border using a Gaussian mixture model (GMM). 24,25 The input matrix to the GMM consists of cells (as rows) and their marker distribution as features (columns). The rows of the input matrix are ordered based on the cluster assignments, and the Z score of each marker (column) is averaged over the cells (rows) of each cluster, where each cluster represents a cell type. Those markers that have a higher Z score per cluster are identified as the differentially expressed markers for that cluster. The clusters are visualized using a standard 2D t-SNE plot where each point represents an image (Figures 1A and 1B). The differentially expressed markers for each of the three clusters are shown in Figure 1A, and the corresponding patient responses of either Response 1 or Response 2 categories, which are known a priori, are depicted in Figure 1B. We see that there is a higher colocalization of different sets of markers for Response 1 and Response 2 patients, respectively (Figures 1A and 1B), indicating underlying structural differences between different patient response groups.
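The per-cluster Z-score averaging described above can be illustrated with a minimal sketch. It assumes that cluster labels are already available (for example, from a GMM) and uses synthetic data; the function names `cluster_marker_zscores` and `top_markers`, the marker names, and the choice of one top marker per cluster are hypothetical and are not taken from the analysis code used for this study.

```python
import numpy as np


def cluster_marker_zscores(expr, labels):
    """Average per-marker Z scores within each cluster.

    expr:   (n_cells, n_markers) marker expression matrix
    labels: (n_cells,) integer cluster assignment per cell
    """
    # Z-score each marker (column) across all cells
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    clusters = np.unique(labels)
    # One row per cluster: mean Z score of every marker in that cluster
    means = np.stack([z[labels == c].mean(axis=0) for c in clusters])
    return clusters, means


def top_markers(means, marker_names, k=1):
    """Names of the k markers with the highest mean Z score per cluster."""
    order = np.argsort(means, axis=1)[:, ::-1][:, :k]
    return [[marker_names[j] for j in row] for row in order]


# Synthetic example (assumption): three clusters, each enriched for one marker
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 100)
expr = rng.normal(0.0, 1.0, size=(300, 4))
for c in range(3):
    expr[labels == c, c] += 5.0

clusters, means = cluster_marker_zscores(expr, labels)
top = top_markers(means, ["M1", "M2", "M3", "M4"], k=1)
print(top)
```

In this toy setting, the marker with the highest mean Z score in each cluster is the one that was artificially enriched, which mirrors how differentially expressed markers are read off per cluster.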
To better understand how these clusters relate to the actual images, we generated an image t-SNE (Figure 1C) where each dot in the t-SNE of Figures 1A and 1B is replaced with its corresponding multiplexed image. This arrangement of images projected as an image t-SNE clearly highlights the difference in immune cell abundance across Response 1 and Response 2 patient groups.

Mistic: An open-source multiplexed image t-SNE viewer
In order to facilitate the generation and manipulation of image t-SNEs, we developed an image t-SNE viewer called Mistic (multiplexed image t-SNE viewer). Mistic allows the simultaneous viewing of multiple multiplexed images, where images can be arranged using either pre-defined coordinates (e.g., t-SNE or UMAP), randomly generated coordinates, or a grid view. Mistic is written in Python and uses Bokeh, 37 which is a Python library for creating interactive visualizations for modern web browsers, along with JavaScript. Mistic has the capability to load and display multiple multiplexed images along with the metadata for the images. In Table 2, we provide the different imaging formats and number of images Mistic can be scaled to. It produces publication-ready outputs that can be saved in PNG format. Additionally, it can be used as the initial image viewer for exploratory image analysis before switching to more comprehensive (but single-image) viewers such as ImageJ, 15 Fiji, 20 and QuPath. 21
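The three arrangements mentioned above (pre-defined coordinates, randomly generated coordinates, and a grid view) can be sketched as a function that returns the top-left pixel position of each thumbnail. This is an illustrative sketch only and not Mistic's implementation: the function name `layout_positions`, the canvas and thumbnail sizes, and the number of grid columns are all assumptions.

```python
import numpy as np


def layout_positions(n_images, mode, coords=None, canvas=(2000, 2000),
                     thumb=(64, 64), n_cols=10, seed=0):
    """Return an (n_images, 2) integer array of top-left (x, y) positions."""
    # Largest top-left corner that still keeps a thumbnail on the canvas
    room = np.array(canvas) - np.array(thumb)
    if mode == "coords":
        # Rescale user-supplied 2D coordinates (e.g., t-SNE or UMAP) to the canvas
        xy = np.asarray(coords, dtype=float)
        span = xy.max(axis=0) - xy.min(axis=0)
        span[span == 0] = 1.0  # avoid division by zero for degenerate input
        pos = (xy - xy.min(axis=0)) / span * room
    elif mode == "random":
        pos = np.random.default_rng(seed).random((n_images, 2)) * room
    elif mode == "grid":
        # Row-major grid with n_cols thumbnails per row
        idx = np.arange(n_images)
        pos = np.column_stack([(idx % n_cols) * thumb[0],
                               (idx // n_cols) * thumb[1]])
    else:
        raise ValueError(f"unknown layout mode: {mode!r}")
    return np.rint(pos).astype(int)


# Stand-in for real t-SNE coordinates (assumption: 92 images)
tsne_xy = np.random.default_rng(1).normal(size=(92, 2))
for mode in ("coords", "random", "grid"):
    pos = layout_positions(92, mode, coords=tsne_xy)
    print(mode, pos.min(axis=0), pos.max(axis=0))
```

Because the coordinates are only rescaled, the relative arrangement of images is the same regardless of which method produced the 2D projection.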

Descriptor
Mistic provides many of the standard image-viewing features that users have come to rely on and expect, through a user-input panel and two canvases. The user-input panel (Figure 2A) allows the user to select between (1) the stack montage view, where all the markers of a single multiplexed image can be viewed simultaneously, or (2) the multiple image view. For the latter, the user can choose markers for rendering the multiplexed images, optional image borders, the arrangement of the images by coordinates or grid, and the option to shuffle the order of image rendering for overlapping images. An overall color theme for Mistic can be chosen from black, blue, and gray. The user can also choose the imaging technique used to generate the images, such as Vectra, CyCIF, t-CyCIF, or CODEX (PhenoCycler). Mistic further provides two canvases for image rendering: a static canvas showing the image t-SNE (Figure 2B), which is generated based on user preferences, and a live canvas depicting the corresponding t-SNE scatterplot that uses the metadata from the images, where each image is represented as a dot (Figure 2C). We explain the two canvases in detail in the following subsections (image t-SNE rendered through the static canvas and metadata rendered through the live canvas).
Image t-SNE rendered through the static canvas
To view the multiplexed images simultaneously, Mistic offers the user the ability to choose from three different image layouts (see Figure 3): (1) t-SNE layout based on user-pre-defined coordinates; (2) vertical grid arrangement of all images; (3) random layout based on coordinates that Mistic generates. Depending on the specific

(1) response category of the patients (e.g., based on RECIST classification)
(2) treatment phase (such as pre-treatment or during treatment)
(3) cluster annotations that are based on the differential-expression analysis of the markers
(4) patient distribution
This metadata information may be provided by the user, using appropriate folders provided in Mistic's code repository, available here: https://github.com/MathOnco/Mistic. If no metadata is provided, the t-SNE scatterplot without any color coding will be rendered.
Hover tool for image identification. In order to identify each image in the static canvas, we have a hover functionality built into the live canvases. Hovering over each image provides information such as name of the image, name of the corresponding thumbnail, image coordinates, and all metadata per image (Figure 6).
Processing user inputs from Mistic GUI
Image processing based on markers selected. Each user-selected marker channel of the multiplexed image is denoised separately. We use the scikit-image 43 and SciPy 44 libraries for Python.
1. We use median filtering, a nonlinear digital filtering technique that is often used to remove noise while preserving edges, which improves morphology detection. Function used is scipy.ndimage.median_filter().
2. Next, we perform Otsu thresholding, an automatic thresholding method for image binarization. For each candidate threshold, it computes the spread of the pixel levels on each side of the threshold, i.e., among the pixels that fall in either the foreground or the background. The aim is to automatically find the threshold value at which the weighted sum of the foreground and background spreads (the within-class variance) is at its minimum. Function used is threshold_otsu() from scikit-image.
3. Based on the threshold, we close the gaps in the image to refine morphological boundaries. Function used is closing() from scikit-image.
4. To sharpen the morphological boundaries, we clear the boundaries using clear_border() from scikit-image, which removes objects that touch the image border.
5. The pixel intensities in each channel are then upweighted to preserve morphology.
The cleaned channels are then combined to form the cleaned multichannel image. The denoised image is stored as an array in the unsigned byte format (''uint8'') to enable easy format conversion.
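The denoising steps listed above can be sketched per channel as follows. This is a hedged illustration using the named SciPy and scikit-image functions, not a copy of Mistic's generate_image_tSNE(): the helper name `clean_channel`, the 3 x 3 filter and footprint sizes, the way the binary mask is applied to the intensities, and the `gain` used for upweighting are all assumptions.

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu
from skimage.morphology import closing
from skimage.segmentation import clear_border


def clean_channel(channel, median_size=3, gain=1.5):
    """Denoise one marker channel (sizes and gain are assumed values)."""
    # 1. Median filter: suppress speckle noise while preserving edges
    smoothed = ndimage.median_filter(channel, size=median_size)
    # 2. Otsu threshold: split pixels into foreground and background
    mask = smoothed > threshold_otsu(smoothed)
    # 3. Morphological closing: fill small gaps in the foreground
    mask = closing(mask, np.ones((3, 3), dtype=np.uint8))
    # 4. Remove foreground objects that touch the image border
    mask = clear_border(mask)
    # 5. Upweight the retained intensities (assumed: simple multiplicative gain)
    boosted = smoothed.astype(float) * mask * gain
    return np.clip(boosted, 0, 255).astype(np.uint8)


# Synthetic three-channel example (assumption): noisy background, bright square
rng = np.random.default_rng(0)
raw = rng.integers(0, 40, size=(3, 64, 64)).astype(np.uint8)
raw[:, 20:44, 20:44] += 100

# Combine the cleaned channels into one multichannel uint8 array
cleaned = np.stack([clean_channel(c) for c in raw], axis=-1)
print(cleaned.shape, cleaned.dtype)
```

Processing each channel independently keeps a noisy marker from influencing the threshold chosen for another marker.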
These are performed in generate_image_tSNE() in main.py in Mistic's code repository.
Inbuilt dimensionality reduction and Bayesian clustering. Mistic will generate 2D t-SNE coordinates and cluster the images if the t-SNE coordinates or cluster labels are not provided by the user. Each multiplexed image is abstracted to a vector of length 6, where the entries of the vector are the means of the initial six channels. These vectors are stacked to create a matrix that is input to a t-SNE generation function (TSNE() from sklearn.manifold) and subsequently clustered using BayesianGaussianMixture() from sklearn.mixture.
Border option. An image with a border is created by pasting the cleaned image onto a rectangle with a slightly larger height and width than the cleaned image. The rectangle is filled with a color based on the metadata provided by the user. These are performed in generate_image_tSNE() in main.py.
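A minimal sketch of the inbuilt dimensionality reduction, clustering, and border steps described above is given below, using synthetic images. It is not Mistic's code: the helper names, the perplexity, the number of mixture components, the border width, and the color palette are assumptions. The text does not state whether the clustering is run on the feature matrix or on the 2D embedding; this sketch clusters the feature matrix.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.mixture import BayesianGaussianMixture


def channel_mean_features(images, n_channels=6):
    """Abstract each (H, W, C) image as the means of its first n_channels channels."""
    return np.stack([img[..., :n_channels].mean(axis=(0, 1)) for img in images])


def embed_and_cluster(features, n_clusters=3, perplexity=5.0, seed=0):
    """Return 2D t-SNE coordinates and one cluster label per image."""
    coords = TSNE(n_components=2, perplexity=perplexity, init="random",
                  random_state=seed).fit_transform(features)
    # Assumption: cluster the feature matrix rather than the 2D embedding
    labels = BayesianGaussianMixture(n_components=n_clusters,
                                     random_state=seed).fit_predict(features)
    return coords, labels


def add_border(rgb, color, width=4):
    """Paste an (H, W, 3) uint8 image onto a slightly larger filled rectangle."""
    h, w, _ = rgb.shape
    framed = np.empty((h + 2 * width, w + 2 * width, 3), dtype=np.uint8)
    framed[...] = np.asarray(color, dtype=np.uint8)   # fill with border color
    framed[width:width + h, width:width + w] = rgb    # paste image in the center
    return framed


# Synthetic data (assumption): 30 seven-channel images from three groups
rng = np.random.default_rng(0)
profiles = rng.random((3, 7))
images = [rng.random((32, 32, 7)) * 0.2 + profiles[i % 3] for i in range(30)]

features = channel_mean_features(images)      # shape (30, 6)
coords, labels = embed_and_cluster(features)  # shapes (30, 2) and (30,)

# Color the border of one thumbnail by its cluster label (hypothetical palette)
palette = [(228, 26, 28), (55, 126, 184), (77, 175, 74)]
thumb = (rng.random((32, 32, 3)) * 255).astype(np.uint8)
framed = add_border(thumb, palette[labels[0]])
print(features.shape, coords.shape, framed.shape)
```

Here the border color is taken from the inferred cluster label; with user-provided metadata, the same pasting step would use a color derived from that metadata instead.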

Figure 6. Hover window
An example hover window that opens with the hover tool while mousing over a t-SNE dot on any of the live canvases (here shown for "Cluster annotations"). This live canvas is for the stack montage option discussed in section t-CyCIF image of primary lung squamous cell carcinoma.

OPEN ACCESS
this option via a plugin along with further image-adjusting functionalities (e.g., brightness, sharpness).
Bokeh plot tools
Each Mistic canvas uses the interactive Bokeh toolbar to save plots, select regions, and change plot parameters such as zoom level, reset, pan, etc. Figure 8 shows the set of plot tools used. Further documentation of the Bokeh toolbar and how to use it can be found here: https://docs.bokeh.org/en/latest/docs/user_guide/tools.html.

Generalizability and scalability experiments
t-CyCIF image of lung adenocarcinoma metastasis to lymph node
To show the generalizability of Mistic, we use t-CyCIF data from lung adenocarcinoma metastasized to the lymph node. [38][39][40] The image is in OME-TIFF format, 48 13 GB in size with dimensions 10,101 × 9,666, and it has 44 marker channels. To simultaneously test Mistic for scalability, we created duplicates of this image. Figure 9A shows the Mistic static canvas where 40 duplicate t-CyCIF images with six markers (CD45, keratin, α-SMA, FoxP3, PD-1, PD-L1) are rendered in rows. The zoomed-in composite image thumbnail is shown in Figure 9B with the corresponding composite image as seen in Minerva 47 for five markers (CD45, IBA1, keratin, α-SMA, DNA) (Figure 9C). Mistic allows the user to choose any number of markers for simultaneous viewing, while Minerva allows up to five markers. In Figures S2-S4, Mistic is shown on 50, 60, and 70 image repeats, respectively, where images are either rendered in rows or randomly.
t-CyCIF image of primary lung squamous cell carcinoma
For a single multiplexed image, Mistic provides the user a stack montage view made up of the individual markers. In Figure 7A, we show this option for the t-CyCIF image on primary lung squamous cell carcinoma [38][39][40] in OME-TIFF format for all 44 marker channels. We highlight the keratin channel in Figure 7B and show the corresponding channel using Minerva 47 (Figure 7C). Minerva provides single marker views for 12 markers, whereas Mistic renders all 44 channels as a montage.
Tissue microarray cores for endometrial cancer
A recent study on endometrial cancer 28 explored the effects of coordinated humoral response (from plasma cells) and cellular immune responses (from T and B cells) in the progression of four different human endometrial cancer subtypes: clear cell carcinoma, serous, endometrioid type high grade, and endometrioid type low grade.
These effects were studied by investigating the spatial colocalization and co-expression of polymeric immunoglobulin receptor (pIgR) by tumor cells with immunoglobulins A and G (IgA, IgG) secreted by B cells. The imaging data in this study consisted of 210 tissue microarray (TMA) cores from endometrial tumor samples stained for plasma cells, B cells, IgA, IgG, and

DISCUSSION
Understanding the complex ecology of a tumor tissue and the spatiotemporal relationships between its cellular and microenvironment components is becoming a key component of translational research, especially in immuno-oncology. The generation and analysis of multiplexed images from patient samples is of paramount importance to facilitate this understanding. In Table 1, we highlight different image viewers currently available as open-source or commercial software. While most software can handle the visualizing and processing of a single multiplex or microscopy image, to our knowledge, there exists no current image viewer allowing the simultaneous preview of multiple multiplexed images, rendered using t-SNE coordinates or random coordinates. Mistic does not provide additional image processing capabilities such as adjusting images for brightness, sharpness, etc., or detecting objects (segmentation), since Mistic was built with the motivation of providing a preliminary all-image view to aid in better informing quantitative downstream analysis such as identifying spatial patterns across the tumor-immune environment (sections image t-SNE-based visualization of multiplexed images from an NSCLC cohort shows marker expression clustering across different patient response groups and tissue microarray cores for endometrial cancer) and in visualizing specific marker channels (section t-CyCIF image of primary lung squamous cell carcinoma). Using the visuals from Mistic, selected single multiplexed images can be further analyzed using established software in Table 1. Software such as ml4A, 32 Mirador, 33 and OpenSeadragon 34 currently do not cater to 2D multiplexed images. Mistic aims to fill this gap by providing this simple functionality to view multiple images at once, while also giving users the option to view images based on a set of user choices.
In our test runs using 92 images (with dimension 1,024 × 1,024 pixels), Mistic takes under a minute to process and render the images according to the user options available (for user options, see section image t-SNE rendered through the static canvas and Figure 2A).
As part of future work, a few potential improvements will be introduced to Mistic. Once a set of images is identified using Mistic, we would like to render those images separately in the live panel. This would give the user an additional perspective to refine the selected images for further analysis. We also hope to integrate Mistic into one of the open-source software viewers listed in Table 1. This would require the development of an additional framework in React JavaScript, 50 which is the single largest user interface framework.
Through our generalized examples of NSCLC (consisting of 92 7-marker Vectra TIFF images from nine patients), lung adenocarcinoma (70 44-marker t-CyCIF OME-TIFF images), colorectal carcinoma (42 4-marker CyCIF OME-TIFF images), breast adenocarcinoma (88 64-marker CODEX OME-TIFF images), tonsil data (105 16-marker CODEX QPTIFF images), and endometrial cancer (210 PNG images from 107 patients), we have demonstrated the functionality and practicality of Mistic. Our aim is that Mistic will be used as a first step to viewing multiplexed images simultaneously. This all-image visual preview should facilitate preliminary insights into possible marker expression patterns, aiding downstream image analysis for predicting disease progression and identifying clinical biomarkers.

EXPERIMENTAL PROCEDURES
Resource availability
Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Alexander R. A. Anderson (alexander.anderson@moffitt.org).