colocr: an R package for conducting co-localization analysis on fluorescence microscopy images

Background. The co-localization analysis of fluorescence microscopy images is a widely used technique in biological research. It is often used to determine the co-distribution of two proteins inside the cell, which suggests that the two proteins could be functionally or physically associated. The limiting step in conducting microscopy image analysis in a graphical interface tool is the selection of the regions of interest for the co-localization of two proteins.
Implementation. This package provides a simple, straightforward workflow for loading fluorescence images, choosing regions of interest and calculating co-localization measurements. Included in the package is a shiny app that can be invoked locally to interactively select the regions of interest where two proteins are co-localized.
Availability. colocr is available on the Comprehensive R Archive Network (CRAN), and the source code is available on GitHub under the GPL-3 license as part of the rOpenSci collection, https://github.com/ropensci/colocr.


INTRODUCTION
Biologists use fluorescence microscopy imaging techniques in a variety of applications. Among them, co-localization analysis is the most widely used. It is often used to describe the co-distribution of two proteins that are functionally linked in the cell. The underlying assumption is that two proteins that localize closely are likely to interact and may share a common cellular function. Several methods have been developed for quantifying the co-localization of two intracellular proteins using fluorescence microscopy images. Nevertheless, co-localization analysis still has some limitations, and methods to deal with them, such as correcting for the chromatic shift of the color components, have also been proposed (Manders, 1997; Manders, Verbeek & Aten, 1993; Matsuda et al., 2018).
Multiple tools implement these methods with easy-to-use graphical interfaces, such as the Fiji distribution of ImageJ (Schindelin et al., 2012; Schneider, Rasband & Eliceiri, 2012). imager and magick are two R packages that can be used for similar image analysis (Barthelme, 2018; Ooms, 2018). Selecting the regions of interest (ROI) in a graphical interface is a critical step in these analyses. Often, this requires manual work by the user, which can be time consuming when processing tens or hundreds of images. A manual analysis is also very hard to reproduce or to rerun with minor parameter changes. Other programmatic image analysis tools have wider functionality and goals beyond this simple analysis, so inexperienced users might have a hard time using them.
Here, we present a simple package called colocr that can be used in the R environment (R Core Team, 2017). colocr enables quantifying the co-localization of two dyes in high-quality microscopy images obtained by staining with two different fluorescent probes. The functions in colocr map to the intuitive steps of the co-localization analysis and do not require prior knowledge of image analysis or advanced R. The package also offers a graphical user interface, built as a Shiny application, that can be launched locally or accessed online (Chang et al., 2016).

Data sources
The confocal fluorescence microscopy images presented in this article are from the DU145 prostate cancer cell line. In this experiment, the cell line was treated with two primary antibody probes for two proteins, RKIP and one of MAP1LC3B, PIK3CB, TBC1D5 or TOLLIP, and subsequently with two secondary antibody probes conjugated to different fluorescent dyes (Ahmed et al., 2018). The aim of this experiment is to determine the degree of co-localization of the two proteins in this cell line and to further describe their functional association in autophagy during tumor progression.

Co-localization measurements
The following is a brief discussion of the theory and interpretation of the measurements used in this package as measures of cellular co-localization. The articles by Manders, Verbeek & Aten (1993) and Dunn, Kamocka & McDonald (2011) describe the formal details of the statistics. For each of the co-localization measurements, we provide a definition, formula, range of values, interpretation and the situations where it is suitable.

Pearson's correlation coefficient
Pearson's correlation coefficient (PCC) is the covariance of the pixel intensities from the two channels divided by the product of their standard deviations. The mean of the intensities is subtracted from each pixel, which makes the coefficient independent of the background level. The PCC is calculated as follows:

$$\mathrm{PCC} = \frac{\sum_i (R_i - \bar{R})(G_i - \bar{G})}{\sqrt{\sum_i (R_i - \bar{R})^2 \times \sum_i (G_i - \bar{G})^2}}$$

where $R_i$ and $G_i$ are the intensities of the magenta and green channels at pixel $i$, and $\bar{R}$ and $\bar{G}$ are the average intensities. The values of PCC range between 1 and −1 for perfect correlations in the positive and negative directions, respectively, and 0 means no correlation. PCC measures both the co-occurrence and the proportionality of the pixel intensities, and is therefore best used in cases where the two dyes are expected to co-localize and to scale linearly.
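As a minimal sketch of the calculation described above, the following base R code computes the PCC as the sum of products of the mean-centered intensities divided by the square root of the product of their sums of squares. The vectors r and g and the helper pcc are illustrative names that are not part of colocr, and the simulated values merely stand in for pixel intensities from the two channels.

```r
# Simulated intensities for two channels (illustrative values, not real image data)
set.seed(1)
r <- runif(1000)                        # first channel
g <- 0.8 * r + rnorm(1000, sd = 0.05)   # second channel, scales linearly with the first

# Hypothetical helper: PCC from mean-centered intensities
pcc <- function(r, g) {
  r_c <- r - mean(r)
  g_c <- g - mean(g)
  sum(r_c * g_c) / sqrt(sum(r_c^2) * sum(g_c^2))
}

pcc_value <- pcc(r, g)

# The helper agrees with the built-in Pearson correlation
agrees_with_cor <- isTRUE(all.equal(pcc_value, cor(r, g)))

# Adding a constant background to either channel leaves the PCC unchanged
background_invariant <- isTRUE(all.equal(pcc(r + 0.2, g + 0.1), pcc_value))
```

The last check illustrates why subtracting the mean makes the coefficient independent of a uniform background level.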

Manders overlap coefficient
Manders overlap coefficient (MOC) is the fraction of pixels from each channel with values above the background. It doesn't require subtraction of the mean; therefore, the values are always between 0 and 1. The MOC is calculated as follows:

$$\mathrm{MOC} = \frac{\sum_i R_i G_i}{\sqrt{\sum_i R_i^2 \times \sum_i G_i^2}}$$

where $R_i$ and $G_i$ are the intensities of the magenta and green channels at pixel $i$. MOC is suitable in cases where the signals from the two proteins are expected to co-occur but not in proportion to each other.
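The following base R sketch illustrates the situation described above, where two signals co-occur without being proportional. The helper moc and the simulated vectors are illustrative assumptions and not part of colocr. The overlap coefficient is computed from the raw, non-negative intensities as the sum of their products divided by the square root of the product of their sums of squares.

```r
# Hypothetical helper: MOC from raw (non-negative) intensities, no mean subtraction
moc <- function(r, g) sum(r * g) / sqrt(sum(r^2) * sum(g^2))

# Two simulated signals that occupy the same pixels but vary independently there
set.seed(2)
r <- c(runif(500, 0.5, 1), rep(0, 500))   # signal in the first 500 pixels only
g <- c(runif(500, 0.5, 1), rep(0, 500))   # independent signal in the same pixels

moc_cooccur   <- moc(r, g)                # high: the signals occupy the same pixels
pcc_within    <- cor(r[1:500], g[1:500])  # near zero: no proportionality within the signal
moc_disjoint  <- moc(r, rev(g))           # zero: no pixel carries signal in both channels
moc_identical <- moc(r, r)                # one: identical channels
```

In this toy case the overlap coefficient stays high even though the intensities inside the shared area are uncorrelated, which is the scenario where MOC is more informative than PCC.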

Data objects & methods
colocr uses an S3 object called cimg from the imager package. All methods take this object as input, with the exception of image_load, which takes a single argument for the path to the image file. image_load and roi_select return a cimg object, with an additional attribute called label in the latter case. roi_show and roi_check return NULL and draw four and two plots, respectively. roi_test returns a data.frame. Table 1 summarizes the input and output of each function in the package.

Source code & reproducibility
The source code for the package is available on GitHub under the GPL-3 license as part of the rOpenSci on-boarding repository (https://github.com/ropensci/colocr). The code and the image in this document are available at https://github.com/BCMSLab/colocrart. A simplified version of this code is presented in the last section of this article. The full version of the code is provided in an additional file.

RESULTS & DISCUSSION
Here, we introduce an example from the published literature in which images from the DU145 prostate cancer cell line were stained with dyes for two proteins, RKIP/PEBP1 and LC3/MAP1LC3B (Ahmed et al., 2018). The aim of this experiment is to determine how much of the two proteins is co-localized or co-distributed in this particular cell line (Fig. 1).

Table 1: Input and output of each function in the package.
Function | Input | Output
image_load | Path to the image file. | Image object (cimg).
roi_select | Image object (cimg) and parameters to select regions of interest. | Image object (cimg) with a label attribute.
roi_show | Image object (cimg) with a label attribute. | Four plots: original image, low-resolution selected regions and two gray-scale images of the two channels with highlighted selected regions.
roi_check | Image object (cimg) with a label attribute. | Two plots: scatter plot and density distribution of the pixel intensities from the selected regions in the two channels.
roi_test | Image object (cimg) with a label attribute. | A data.frame with a column for each of the requested co-localization statistics and a row for each of the regions of interest.

Selecting regions of interest (ROI)
The function roi_select relies on different algorithms from the imager package. However, using the function to select the ROIs requires no background knowledge of the workings of these algorithms and can be done by trying different parameters and choosing the most appropriate ones. Typically, one wants to select the regions of the image occupied by a cell or a group of cells. The package can also select certain areas or structures within the cell if they are distinct enough. The default behaviour is to select the largest contiguous region of the image; the next (n) largest regions can be added using the n argument.
The selection of ROIs is achieved using morphological operations from imager (Barthelme, 2018). In brief, we start by selecting the structures in the gray-scale image using the default values of three major operations: threshold, grow (dilation) and shrink (erosion). Thresholding excludes the pixels below a certain value. Grow and shrink test whether a number of pixels outward and inward, respectively, belong to the structure. Two further operations that combine grow and shrink, fill and clean, can include and exclude gaps in the structure, respectively. In our experience, a suitable selection emerges easily by varying these parameters in a trial-and-error fashion.
This function returns a cimg object containing the original input image and an added attribute called label that indicates the selected regions. label is a vector of integers, with 0 for the non-selected areas, 1 for the first selected region, 2 for the second and so on. The selection can be assessed visually using roi_show. The function outputs four plots: the merged image, the pixel set and each of the two channels with highlighted ROIs (Fig. 2).
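To make the effect of these operations concrete, the following is a minimal base R sketch on a toy matrix. It is a stand-in written for illustration and is not the imager or colocr implementation: the helpers grow1 and shrink1 are hypothetical names, they act on a one-pixel, 3 x 3 neighbourhood only, and pixels outside the image are treated as belonging to the structure when shrinking.

```r
# Toy gray-scale image: a bright 5 x 5 structure with a dark gap and an isolated speckle
intensity <- matrix(0.1, nrow = 13, ncol = 13)
intensity[3:7, 3:7] <- 0.9
intensity[5, 5]     <- 0.2   # gap inside the structure
intensity[11, 11]   <- 0.8   # isolated bright pixel

# Threshold: keep the pixels above a cut-off value
mask <- intensity > 0.5

# Grow (dilation) by one pixel: a pixel is set if any pixel in its 3 x 3 neighbourhood is set
grow1 <- function(m) {
  padded <- matrix(FALSE, nrow(m) + 2, ncol(m) + 2)
  padded[2:(nrow(m) + 1), 2:(ncol(m) + 1)] <- m
  out <- matrix(FALSE, nrow(m), ncol(m))
  for (dr in 0:2) for (dc in 0:2) {
    out <- out | padded[seq_len(nrow(m)) + dr, seq_len(ncol(m)) + dc]
  }
  out
}

# Shrink (erosion) by one pixel, written as the dual of grow
shrink1 <- function(m) !grow1(!m)

filled  <- shrink1(grow1(mask))     # grow then shrink: the gap is included
cleaned <- grow1(shrink1(filled))   # shrink then grow: the speckle is excluded

pixel_counts <- c(thresholded = sum(mask), filled = sum(filled), cleaned = sum(cleaned))
```

In this toy case, growing then shrinking closes the one-pixel gap, and shrinking then growing removes the isolated pixel while restoring the extent of the main structure.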
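The role of the label attribute can be sketched in base R as follows. A plain four-dimensional array stands in for the cimg object (imager stores images as width x height x depth x colour channels), and the label values are made up for illustration; how colocr assigns them internally is not shown here.

```r
# A plain 4-D array standing in for a cimg object: width x height x depth x colour channels
set.seed(3)
img <- array(runif(6 * 6 * 1 * 3), dim = c(6, 6, 1, 3))

# Hypothetical labels: one integer per pixel, 0 = not selected
label_matrix <- matrix(0L, 6, 6)
label_matrix[1:3, 1:3] <- 1L   # first region (9 pixels)
label_matrix[5:6, 4:6] <- 2L   # second region (6 pixels)
attr(img, "label") <- as.vector(label_matrix)

# Use the labels to pull the intensities of one channel for each selected region
lab       <- attr(img, "label")
channel1  <- as.vector(img[, , 1, 1])
by_region <- split(channel1[lab > 0], lab[lab > 0])

region_sizes <- lengths(by_region)   # number of pixels in each selected region
```

Because the labels are stored alongside the pixel data, any downstream step can recover the intensities of a given region without repeating the selection.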

Quality assessment of pixel intensities
Both co-localization measurements implemented in this package quantify different aspects of the linear trend between the pixel intensities from the two channels of the image. Therefore, it is useful to visualize this trend and the distribution of the intensities to check whether the analysis is suitable. The expectation is that the pixel intensities from the two channels align with the diagonal in the first graph (scatter plot) and show nearly overlapping distributions with a similar pattern of pixel values in the second (density plot) (Fig. 3).
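The two diagnostic graphs can be approximated with base R graphics, as in the sketch below. The intensities are simulated and the plotting choices are assumptions made for illustration; the sketch does not reproduce the exact output of roi_check.

```r
# Simulated intensities from the selected regions of two channels (illustrative only)
set.seed(4)
r <- rbeta(2000, 2, 5)
g <- pmin(pmax(r + rnorm(2000, sd = 0.05), 0), 1)   # close to r, clipped to [0, 1]

# Kernel density estimates of the intensity distributions
d_r <- density(r)
d_g <- density(g)

op <- par(mfrow = c(1, 2))

# Scatter plot: points should align with the dashed diagonal
plot(r, g, pch = ".", xlab = "Channel 1 intensity", ylab = "Channel 2 intensity")
abline(0, 1, lty = 2)

# Density plot: the two curves should nearly overlap
plot(d_r, main = "", xlab = "Pixel intensity", ylim = range(d_r$y, d_g$y))
lines(d_g, lty = 2)

par(op)
```

A cloud of points far from the diagonal, or two clearly separated density curves, would suggest that a measure of linear association is not appropriate for the selected regions.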

Calculating co-localization measurements
The two measurements implemented in this package are the PCC and MOC. We described the rationale and the formulation of these measurements in Materials & Methods. Invoking the test is a single function call on the selected regions of interest. roi_test returns a data.frame with a column for each of the desired measurements and a row for each of the selected regions (n) (Table 2).
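The shape of such a result, one row per region and one column per measurement, can be sketched in base R as follows. The helpers pcc and moc, the simulated intensities and the column names are assumptions made for illustration and may differ from what roi_test returns.

```r
# Hypothetical helpers for the two measurements
pcc <- function(r, g) cor(r, g)
moc <- function(r, g) sum(r * g) / sqrt(sum(r^2) * sum(g^2))

# Simulated pixels: label 0 = background, labels 1 and 2 = selected regions
set.seed(5)
n_pixels <- 300
label <- rep(0:2, each = 100)
r <- runif(n_pixels)
g <- ifelse(label == 1, r, runif(n_pixels))   # region 1: identical channels; elsewhere independent

# Pixel indices of each selected region
selected <- label > 0
idx <- split(which(selected), label[selected])

# One row per region, one column per measurement
result <- data.frame(
  roi = as.integer(names(idx)),
  pcc = sapply(idx, function(i) pcc(r[i], g[i])),
  moc = sapply(idx, function(i) moc(r[i], g[i])),
  row.names = NULL
)
```

In this simulation the first region yields values of one for both measurements, while the second region, where the channels are independent, yields a PCC near zero and an intermediate MOC.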

Testing for statistical significance
While colocr doesn't implement any formal statistical tests for significance, it is an important issue to discuss. One can test the significance of the difference in co-localization between two groups (co-localized vs. uncorrelated probes) using a simple t-test. Alternatively, one can compare the observed co-localization measurement in one group to a null model generated from the same data. Dunn, Kamocka & McDonald (2011) discussed the difficulties in generating true random models to compare with the observations. For the purposes of comparison, probes that don't co-localize with the protein of interest (negative control) can be used. This comparison can be tested using a t-test when the observations are normally distributed; otherwise, non-parametric tests can be used.
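A comparison of this kind could look as follows in base R. The PCC values are made-up numbers used only to show the calls and do not come from any experiment; the choice of the Shapiro-Wilk test as a normality check is likewise an assumption.

```r
# Hypothetical PCC values, one per image (made-up numbers for illustration only)
pcc_probe   <- c(0.81, 0.78, 0.85, 0.80, 0.76, 0.83)   # probe expected to co-localize
pcc_control <- c(0.12, 0.20, 0.05, 0.15, 0.18, 0.09)   # negative control probe

# Check approximate normality in each group before choosing the test
normality <- sapply(list(probe = pcc_probe, control = pcc_control),
                    function(x) shapiro.test(x)$p.value)

# Welch two-sample t-test, for normally distributed observations
t_res <- t.test(pcc_probe, pcc_control)

# Wilcoxon rank-sum test, a non-parametric alternative
w_res <- wilcox.test(pcc_probe, pcc_control)

p_values <- c(t_test = t_res$p.value, wilcoxon = w_res$p.value)
```

With only a handful of images per group, the normality check has little power, so reporting the non-parametric result alongside the t-test is a cautious option.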

Processing a collection of images
To process a collection of images at once, the input for the functions should be a list of objects of the original type. Other parameter arguments can be single values that apply to all images or lists of the same length with specific values for each image. Similarly, the output of image_load, roi_select and roi_test is a list of objects of the original output type. For roi_show and roi_check, the output is the same set of plots for each image.
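The convention of passing either a single value or a per-image list can be illustrated with base R's Map, which recycles shorter arguments. This is a sketch of the calling pattern only: the function count_above and the toy matrices are hypothetical, and the way colocr handles lists internally is not shown here.

```r
# Three toy "images" as intensity matrices (simulated)
set.seed(6)
images <- replicate(3, matrix(runif(100), 10, 10), simplify = FALSE)

# Hypothetical per-image step: count the pixels above a threshold
count_above <- function(img, threshold) sum(img > threshold)

# A single value is recycled and applied to all images
same_threshold <- Map(count_above, images, 0.5)

# A list of the same length supplies a specific value for each image
own_threshold <- Map(count_above, images, list(0.2, 0.5, 0.8))

# Both calls return a list with one result per image
result_lengths <- lengths(list(same = same_threshold, own = own_threshold))
```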

Graphic user interface (Shiny application)
Arguably, selecting the regions of interest is the most time-consuming step in this analysis. Usually, one has to select the regions by hand when using image analysis software such as ImageJ. This package only semi-automates this step and still relies on the user's judgment on which parameters to use and whether the selected ROIs are appropriate. To simplify this step, the package provides a simple shiny app to determine these parameters interactively and use them in the rest of the workflow. This app can be invoked locally from within an R session or accessed online at the following address: https://mahshaaban.shinyapps.io/colocr_app2/.

Other image processing packages in R
The three main image processing packages available in R are imager, magick and EBImage (Barthelme, 2018; Ooms, 2018; Pau et al., 2010). imager wraps the CImg C++ library and magick wraps the Magick++ C++ library (Tschumperle, 2018; Bob Friesenhahn, 2018). Both packages and their underlying libraries provide a wide range of functionality for image processing and analysis. colocr uses some imager and magick functionality to simplify the co-localization analysis of microscopy images. Similarly, EBImage can be used to select areas of interest in images and extract pixel intensities.
In colocr, there are only a few high-level functions that map directly to the steps of the co-localization analysis. Users don't have to worry about the details of the data structures or the specifics of the applied morphological operations. In addition, the functions in this package are vectorized and can be used to process multiple images at once. The current implementation cannot handle 3D or time-series images, and only common bitmap and raster image types are supported by the read function. In summary, colocr uses existing image processing R packages to create a custom tool specific to co-localization analysis.

A case study from the published literature
We used the colocr package to reproduce an analysis from the published literature of the co-localization of RKIP with four different proteins in the DU145 cell line (Ahmed et al., 2018).
The co-localization of RKIP with each of the proteins was quantified in more than 5 images each and represented as PCC and MOC values (Fig. 4). The quantification was originally conducted using the ImageJ Fiji plugin and was found to agree with the colocr calculations in both PCC and MOC values, suggesting that colocr produces results consistent with those of the ImageJ Fiji plugin.

Typical colocr workflow
A typical colocr workflow starts by loading the merged images into an R session using image_load, followed by selecting the regions of interest using roi_select and, finally, calculating the desired co-localization measurements using roi_test. Optionally, roi_show highlights the selected regions on the images and roi_check visualizes the scatter and the density distributions of the pixel intensities. Figure 5 depicts the steps and the functions of the typical workflow.

Reproducing figures and table in this document
In this section, we present a simplified version of the code used to produce this document. Briefly, we load the required R libraries, construct a path to the image file (example image) and apply a typical workflow to calculate the co-localization measurements. First, we start by loading the two libraries imager and colocr.

library(imager)
library(colocr)
The example image used throughout the document is from the DU145 cell line stained for RKIP and LC3 in the first and second channel, respectively. The image is included in the package and can be accessed using system.file.
We load the image using image_load and show it along with the two channels (Fig. 1).