A R-Script for Generating Multiple Sclerosis Lesion Pattern Discrimination Plots

One significant characteristic of Multiple Sclerosis (MS), a chronic inflammatory demyelinating disease of the central nervous system, is the evolution of highly variable patterns of white matter lesions. Based on geostatistical metrics, the MS-Lesion Pattern Discrimination Plot reduces complex three- and four-dimensional configurations of MS-White Matter Lesions to a well-arranged and standardized two-dimensional plot that facilitates follow-up, cross-sectional and medication impact analysis. Here, we present a script that generates the MS-Lesion Pattern Discrimination Plot, using the widespread statistical computing environment R. Input data to the script are Nifti-1 or Analyze-7.5 files with individual MS-White Matter Lesion masks in Montreal Normal Brain geometry. The MS-Lesion Pattern Discrimination Plot, variogram plots and associated fitting statistics are output to the R console and exported to standard graphics and text files. Besides reviewing relevant geostatistical basics and commenting on implementation details for smooth customization and extension, the paper guides through generating MS-Lesion Pattern Discrimination Plots using publicly available synthetic MS-Lesion patterns. The paper is accompanied by the R script LDPgenerator.r, a small sample data set and associated graphics for comparison.


Introduction
Multiple sclerosis (MS), an inflammatory demyelinating disease of the central nervous system with neurodegenerative processes in the later course, affects more than 2.5 million people worldwide. It is the leading nontraumatic cause of serious neurologic disability in young adults. MS is initially characterized by phases of clinical relapses and remissions in 80-90% of the patients, and frequently followed by progression of disability over time. MS is highly variable-from benign to disastrous [1]: some patients may accumulate severe and irreversible disability within a few years, while others may show a benign course with just little or no disability even after decades. The hallmark of MS are lesions in the white and grey matter of the central nervous system, which are hyperintense on T2-weighted MRI sequences. MRI is the key technology to assess MS-lesion dissemination in space and time [2]. The number of MS-lesions, total lesion volume, spatial lesion pattern and single lesion shape are highly variable across patients with MS [3]. This makes the correlation of radiological data and clinical findings extremely challenging-a situation also known as the "clinicoradiological paradox" [4]. 2

of 10
Variography, a core method of classical geostatistics, proved suitable for explorative data analysis (EDA) of MS-White Matter Lesion patterns (MS-WML) and for extracting quantitative spatial-statistics metrics on MS-WML. The MS-Lesion Pattern Discrimination Plot (MS-LDP) summarizes these metrics in a clear and standardized form, to aid in followup, cross-sectional and medication impact analysis [5,6]. With the aid of the MS-LDP, significantly different evolution of MS-lesion patterns could be disclosed between male and female early MS cohorts [7].

Sample Data
Three phantoms of brains with MS-lesions (MNI_mild, MNI_moderate, MNI_severe) that were used for both illustrating this paper and as the accompanying sample data were downloaded from: https://brainweb.bic.mni.mcgill.ca/brainweb/anatomic_ms.html.

Software
The R-script LDPgenerator.r (Supplementary Materials, see Appendix A for description) is based on the statistical computing environment R [8] and was developed and tested in R version 3.6.0. LDPgenerator.r, the above-mentioned sample data and result files can be found in Appendix A. RNifti, a necessary R package can be downloaded and installed from the CRAN package repository: https://CRAN.R-project.org/package=RNifti. Geostatistics comprises a range of algorithms for characterizing, modelling [14] and simulation [15] of multidimensional data and associated uncertainty, including spatiotemporal data [16]. Originally developed for mining optimization [17], geostatistics subsequently found the way into a variety of space-time related fields. In medicine, early applications of geostatistics deal with the creation of geomedical maps, showing the geographical distribution of disease cases (https://www.esri.com/library/ebooks/geomedicine.pdf) [18]. Possible applications of geostatistics go far beyond 2D mapping in the macro scales, however. For example, geostatistical methods can be readily employed in the micro scales [19,20] and enable quantitative texture analysis [21,22]. Since this paper focuses on the application of the variogram to texture analysis of MS-lesion patterns, a short excursus to the foundation of variography seems reasonable. The variogram is a central EDA tool in classical two-point geostatistics ("classical" as compared to the more recent approach of Multiple-Point Geostatistics [23]). The variogram ( Figure 2) quantifies the spatial structure of measurements by contrasting the distance between pairs of measurement points (abscissa) and the associated variability (γ, on the ordinate). In other words, the variogram yields a measure of spatial correlation [24,25]. The empirical variogram γ(h) is straightforwardly computed using (Equation (1)):

Basics of the MS-Lesion Pattern Discrimination Plot
z(x) value of variable at some 3D location x, here z(x) is a voxel with z = binary variable (0 or 1); h lag vector of separation between observed data (units: here, mm); n(h) number of data pairs [z(x), z(x+h)] at lag h; γ(h) empirical variogram value for lag h. The γ(h) of a binary MS-WML is calculated by comparing the values (0 or 1) of all voxel pairs within a specified lag h, according to Equation (1). Calculating γ(h) for increasing lag distances h, the empirical variogram plot (short: "the variogram") is derived ( Figure 2).
Variograms of binary MS-WML generally start with small values of γ at small h (distance in Figure 2), which is due to the large correlation of adjacent voxel pairs. After an increase in γ with lag away from the origin, with further increases in h the correlation decreases, and the variogram levels off. The flatter the variogram near its origin, the more pronounced is the spatial correlation (i.e., in the current context: the larger the MS-lesions will be, on average). Computing variograms for specific lag orientations yields so-called directional variograms. Individual directional empirical variograms in the three major orthogonal directions of MNB geometry (x, y, z directions, compare Figure 1) can be used to disclose and quantify spatial anisotropies of MS-lesion patterns.
Since empirical variograms focus on visual inspection, several permissible variogram functions for quantifying empirical variograms were introduced [26]. These functions approximate an empirical variogram's shape by two parameters: the variogram range a, and the variogram sill c. Among the available variogram model functions, the exponential variogram model was found to be the most suitable for quantifying MS-lesion patterns [5]: c Sill; a Range; h lag vector of separation; γ(h) model variogram value for lag h. When exponential variogram models are separately fitted to x, y, z directional variograms (Equation (2) (Figure 3a). It yields information on MS-WML geometrical anisotropies-e.g., lesion confluence along the CSF system, isotropic growth of individual lesions or Dawson fingers. The MS-LDP ( Figure 3b) abstracts MS-WML geometry with just two parameters in combining above three value pairs by their means (Equation (3)): The MS-LDP quantifies important geometric aspects of MS-WML: a is considered a measure of spatial continuity, c is proxy of total lesion load. The higher a, the bigger and smoother (i.e., with less lesion surface roughness) are lesions; the higher c, the higher is total MS-lesion load. From analyzing a larger cohort of consistently acquired MS-WML, abscissa a, and ordinate c of the MS-LDP were defined to span: a . . . 0 to 3, c . . . −12 to −4 [6]. Figure 3a is a Component MS-LDP and Figure 3b is the associated MS-LDP, produced from MNI_mild, MNI_moderate and MNI_severe data. measure of spatial continuity, c is proxy of total lesion load. The higher a, the bigger and smoother (i.e., with less lesion surface roughness) are lesions; the higher c, the higher is total MS-lesion load. From analyzing a larger cohort of consistently acquired MS-WML, abscissa a, and ordinate c of the MS-LDP were defined to span: a … 0 to 3, c … −12 to −4 [6]. Figure 3a is a Component MS-LDP and Figure 3b is the associated MS-LDP, produced from MNI_mild, MNI_moderate and MNI_severe data. shows LDPgenerator.r code sections and sequence of operations. In short, the program flow is as follows: after the user has selected input and output files, for each input file, empirical variograms are calculated and exponential variogram models are fitted. Variogram graphics are generated and Component MS-LDP and MS-LDP parameters are stored in a container file. After processing all input files, plots are generated from the container file.
Below, code sections 1-5 are described in more detail (see Figure 4 and compare respective comments in LDPgenerator.r code): Section 1 (code lines 11-12): Package RNifti is loaded; this package is prerequisite for fast access to images stored in Nifti-1 or Analyze-7.5 medical image formats. Section 2 (code lines 14-29): Default values for variography parameters can be set by the user: image (binarization) threshold in case MS-lesion probability maps from automatic MS-lesion segmentation programs are to be processed, nonlinear least squares (nls) starting estimates, number of lags, graphics appearance.
ImageThreshold: The standard input to LDPgenerator is binary MS-WML, i.e., a binarized voxel array with MS-lesion voxels = 1 and all other voxels = 0. When floating-point MS-lesion probability maps with MS-lesion probabilities between 0.0 and 1.0 are input, ImageThreshold controls the conversion to binary: all voxels with values above ImageThreshold are set to 1.0 (MS-lesion), voxels with lower values are set to 0.0 (non-lesion). Figure 4 shows LDPgenerator.r code sections and sequence of operations. In sh program flow is as follows: after the user has selected input and output files, f input file, empirical variograms are calculated and exponential variogram models ted. Variogram graphics are generated and Component MS-LDP and MS-LDP para are stored in a container file. After processing all input files, plots are generated fr container file.  List of input files (*.nii, *.img, *.hdr): Input files can be selected from the list presented by the GUI. Input files must be formatted Nifti-1.0 (file type *.nii) or Analyze-7.5 (file pairs of type *.hdr,*.img). Mixing Nifti and Analyze files is possible. Analyze files can be selected by selecting either .hdr or .img. Make sure input files are explicitly 3D in the image headers (!) LDP container file (*.var): The user provides a file name, with standard extension var. The LDP container file (ASCII) contains variography data (compare Table 1) on each input file to generate the Component MS-LDP and MS-LDP graphics.

A Worked Example in 5 Steps
The associated sample data set comprises three MS-lesion probability maps ("brain phantoms") downloaded from brainweb https://brainweb.bic.mni.mcgill.ca/brainweb/ anatomic_ms.html, converted to the Analyze format. First, copy the file pairs MNI_mild.hdr, MNI_mild.img; MNI_moderate.hdr, MNI_moderate.img; MNI_severe.hdr, MNI_severe.img (files included in Supplementary Materials, see Appendix A for description) to a suitable directory. Then follow the 5 steps below: Step 1: From standard R, launch LDPgenerator.r (no changes of parameters necessary in the script); Step 2: Define input files: Navigate to the relevant directory and select MNI_mild.hdr, MNI_moderate.hdr and MNI_severe.hdr from the file list; Step 3: Define the LDP container output file: In the highlighted input box, type MNI.var (Supplementary Materials, see Appendix A for description);

Discussion
Based on the widespread, freely available R statistics environment, LDPgenerator.r provides routine production of MS-Lesion Pattern Discrimination Plots, associated variogram graphics and statistics.
The first implementation for generating MS-LDP that was used in [5][6][7] involved operating-system based scripting and incorporated, besides R scripts, two MS-Windows based software components. Obviously, this was tedious, error-prone and limited the generation of MS-LDP to the MS-Windows operating system. As compared to above original implementation, MS-LDP can now be produced easily in standard R, by just selecting input files and defining output files via the R standard graphical user interface.
The script runs with acceptable speed, processing time for a typical MS-WML is about one minute on a current PC. Clearly, an interpreted language like R has drawbacks regarding processing speed. There are several opportunities for script improvement, however: LDPgenerator.r involves several loops, parts of which could be re-formulated as optimized user defined functions, which in turn would speed up the script. Except of package RNifti, the current version of LDPgenerator.r is implemented on base R. Since the R environment enables straightforward code extension with a wealth of packages, here are some ideas for customization/improvements:

•
Adding a 3D viewer: The recent version of RNifti (1.1) contains a basic viewer for easy implementation of Nifti/Analyze viewing. Providing interactive viewing of MS-WML would facilitate setting the correct binarization threshold values, via enabling visual checking of resulting lesion probability maps. • Using a GUI package, e.g., https://r4stats.com/articles/software-reviews/r-guicomparison/. • Availability of state-of-art GUI elements like customizable buttons, input boxes or spinners would enable easy tuning of variography and graphics parameters. • Using an improved graphics package-see: https://cran.r-project.org/web/views/ Graphics.html. More sophisticated annotation elements like labels, scalable arrows, or improved legend elements would facilitate explorative data analysis of time series portrayed in the MS-LDP, e.g., data from follow-up MRI.
LDPgenerator.r was developed with R version 3.6.0 and tested with input data from variable sources. Some pitfalls occurred that need to be mentioned: LDPgenerator.r can process only Nifti and Analyze files that are declared "3D" in their file headers. Using R Studio to launch LDPgenerator.r, graphics generation failed with high-resolution (4k) graphics hardware.