mritc: A Package for MRI Tissue Classification

This paper presents an R package for magnetic resonance imaging (MRI) tissue classification. The methods include normal mixture models, hidden Markov normal mixture models, and a higher resolution hidden Markov normal mixture model, fitted by various optimization algorithms and by a Bayesian Markov chain Monte Carlo (MCMC) method. Functions are provided to obtain initial values for the parameters of the normal mixture models and for the spatial parameters. Supported input formats are ANALYZE, NIfTI, and a raw byte format. The function slices3d in the misc3d package is used for visualizing data and results. Various performance indices are provided for evaluating classification results. To improve performance, table lookup methods are used in several places, and vectorized computation takes advantage of conditional independence properties. Some computations are performed in C, and OpenMP is used to parallelize key loops in the C code.


Introduction
Magnetic resonance imaging (MRI) is a non-invasive method for imaging the inside of objects, for example different organs of animals and humans. Different types of images provide different contrasts to suit specific applications. Here we focus on structural imaging of the brain. Each image is a three-dimensional (3D) array of image intensities, one for each voxel (volume picture element). In the package mritc, we focus on using signal intensities from a T1-weighted MR acquisition. Figure 1(a) shows a coronal view of a brain. In a T1-weighted image, white matter (WM) tends to appear light gray, gray matter (GM) medium gray, and cerebrospinal fluid (CSF) dark gray (Hornak 2011). Brain tissue classification has many applications, for example in diagnosis of disease and preparation for surgery. Since manual tissue classification is labor intensive, automated methods have been proposed to match the quality of the manual technique at lower cost.

Figure 1: Coronal view of a brain from a T1-weighted image (a) and a density plot of the intensities of the three tissues (b).
MRI data consist of image intensities y_1, ..., y_N for N voxels in a 3D array, where N is large, for example 256 × 256 × 192. Typically a binary mask is used to remove non-brain tissue, and the remaining number of voxels (within the brain mask) is still quite large. Intensities are often scaled to [0, 255] and rounded to integers. A density plot of intensities from a low-noise MR image is shown in Figure 1(b). For each voxel i = 1, ..., N, the goal is to find its tissue type, denoted by z_i ∈ {1, 2, 3}, corresponding to the three major tissue types.
The motivation for the package was to make available some tissue classification methods, especially those using the Bayesian MCMC approach of Feng et al. (2012), that are not available elsewhere.
The next section reviews the classification approaches provided in the package. Section 3 discusses the computational techniques used to improve speed. Section 4 gives an overview of the available functions and presents several examples. Some discussion and directions for future work are presented in Section 5.

Methods available
The package provides tools for MRI tissue classification using normal mixture models and (partial volume, higher resolution) hidden Markov normal mixture models fitted by various methods. This section outlines the approaches available. For more up-to-date references on MRI tissue classification, see Feng et al. (2012) and references therein.

The pure voxel assumption
There are three modes in Figure 1(b): CSF intensities are usually small, GM intensities tend to fall in the middle, and WM intensities tend to be the largest. A mixture model is a natural choice when there are multiple modes in a density plot. Given the tissue structure z = (z_1, ..., z_N), the intensities are independent and normally distributed, with means and variances depending on the tissue types. Assuming the tissue types are independent leads to a simple normal mixture model (NMM),

f(y_i) = Σ_{k=1}^3 p_k φ(y_i; μ_k, σ_k^2),

where p_k is the proportion of tissue type k and φ(·; μ_k, σ_k^2) is the normal density with mean μ_k and variance σ_k^2. Parameters are easily estimated by the expectation-maximization (EM) algorithm (Dempster et al. 1977), and tissue types can be assigned using the Bayes classifier (Mitchell 1997).
Since the data form a 3D array, a natural question is how spatial information can be incorporated. Adjacent voxels are likely to contain the same tissue type, so a more realistic model accounts for this spatial homogeneity in z. The Potts model family provides simple models for spatial homogeneity:

p(z) = C(β)^{-1} exp( Σ_{i=1}^N α_i(z_i) + β Σ_{i∼j} w_ij f(z_i, z_j) ),   (1)

where C(β) is a normalizing constant and i∼j indicates that voxels i and j are neighbors. We need to define the neighborhood structure and then the relationships among neighboring voxels. For a 3D array, besides the six neighbors in the x, y, and z directions, one can add the twelve diagonal neighbors in the x−y, x−z, and y−z planes, and another eight on the 3D diagonals. This leads to a six-neighbor structure, an 18-neighbor structure, and a 26-neighbor structure (Figure 2).
The parameter β in (1), called the inverse temperature, determines the level of spatial homogeneity between neighboring voxels in the image. A zero β implies that neighboring voxels are independent; we use positive β values hereafter. The w_ij are weights, and we assume w_ij ≡ 1 hereafter. The term Σ_{i=1}^N α_i(z_i) is called the external field; the α_i(z_i) are functions of z_i. When β = 0, the external field completely characterizes the probability distribution of the independent z_i, i = 1, 2, ..., N.
The Potts model is an example of a Markov random field (MRF) model, since the conditional distribution of z_i given all other voxels equals its conditional distribution given its neighbors. Let I denote the indicator function, with I(x, y) = 1 when x = y and 0 otherwise.
For k = 2 components this model is called the Ising model (Ising 1925); for k > 2 it is the Potts (1953) model. The Ising model was originally proposed to describe the physical properties of magnets. Due to its power, the Ising model and its various versions have been widely used in other fields, such as brain models in cognitive science, information and machine learning theory (MacKay 2003), economics (Bourgine and Nadal 2004), sociology (Kohring 1996) and game theory (Hauert and Szabó 2005).
The most commonly used Potts model has no external field, w_ij ≡ 1, and f(z_i, z_j) = I(z_i, z_j):

p(z) = C(β)^{-1} exp( β Σ_{i∼j} I(z_i, z_j) ).

We refer to this as the simple Potts model hereafter.
In a hidden Markov normal mixture model (HMNMM), the spatial correlation among the tissue types of neighboring voxels is accommodated by taking p(z) to be a model from the Potts family; given the tissue types, the intensity values are independent and normally distributed, with means and variances depending on the tissue types.
The HMNMM can be fitted by the iterated conditional modes (ICM) algorithm, which alternately maximizes over each parameter with all others held fixed (Besag 1986), or by the hidden Markov random field EM (HMRFEM) algorithm, a variation of the EM algorithm (Zhang et al. 2001). Although ICM has drawbacks, such as sensitivity to the initial configuration and the visiting scheme and a tendency to get stuck in local maxima, it is fast and easy to implement. It is useful to run the ICM algorithm as a pilot study to make sure the hidden Markov normal mixture model gives reasonable results before using more time-consuming algorithms.
Alternatively, we can use a Bayesian MCMC method to fit the model by specifying a prior distribution p(µ, σ²) on (µ, σ²) and then using MCMC to compute characteristics of the posterior distribution p(µ, σ², z | y). Assume µ, σ², and z are a priori independent; the components of µ have independent normal priors with the ordering restriction µ_1 < ... < µ_k to avoid identifiability issues associated with label switching (Celeux et al. 2000); and the components of σ² have independent and identically distributed inverse gamma priors. Then the full conditionals are: µ independent normal; σ² independent inverse gamma; and z a Potts model with external field α_i(z_i) = log f(y_i | µ(z_i), σ(z_i)). Full Bayesian MCMC methods have rarely been used for human brain MRI segmentation; almost all previously proposed Bayesian methods use maximum a posteriori (MAP) estimation, which essentially reduces to an optimization problem (Feng et al. 2012).

Incorporating partial volume effects
In the previous section, all methods were based on the assumption that voxels are pure, containing only one tissue type. A more realistic model would take into account the fact that some voxels are not homogeneous: while some may contain only one tissue type, others on the interface will contain two or possibly three different tissue types. This phenomenon is called the partial volume (PV) effect (Pham et al. 2000).
To address this, one approach is to introduce intermediate classes: a combination of CSF and GM (CG) and a combination of GM and WM (GW). This helps reduce confounding in estimation. A number of studies have used this approach, and among them the Gaussian and partial volume (GPV) method, which uses a normal mixture model with dependent means and variances, performs well (Cuadra et al. 2005). The GPV method assumes that the means and variances of CG and GW are weighted averages of those of the corresponding pure tissues, and that the densities of CG and GW voxels are mean densities with respect to the distribution of the weights. There is no analytical solution to the resulting integrals, so numerical integration must be used. The algorithm implementing this with a Potts prior is referred to as the partial volume HMRFEM (PVHMRFEM) algorithm in the package.
We have adopted a different approach. Each voxel is divided in half along the x, y, and z directions, producing eight subvoxels. Each subvoxel is viewed as containing only one tissue type, and the observed voxel intensity is the sum of its eight unobserved subvoxel intensities. Conditional on the tissue types of the subvoxels, the subvoxel intensities are independent normals, and a spatial model is used at the subvoxel level. Furthermore, to capture the fact that CSF and WM rarely coexist in a voxel, we use a model of the form

p(z) = C(β₁, β₂)^{-1} exp( Σ_{i∼j} [ β₁ I(z_i, z_j) − β₂ I(|z_i − z_j|, 2) ] ),

with β₁, β₂ > 0 and tissue labels 1 = CSF, 2 = GM, 3 = WM, so that |z_i − z_j| = 2 exactly when one neighbor is CSF and the other WM. We call this the repulsion Potts model: neighboring subvoxels are most likely to share a tissue type, and it is very unlikely that one subvoxel is CSF while its neighbor is WM. We use a Bayesian formulation to fit this model. The higher resolution model fitted by the Bayesian MCMC (MCMCsub) provides more accurate tissue classification and also allows more effective estimation of the proportion of each voxel that belongs to each of the major tissue types. A similar subvoxel idea was used in Van Leemput et al. (2003). However, there are differences between their approach and ours; in particular, our method avoids problems such as classifying voxels containing both WM and GM as pure GM voxels (Feng et al. 2012).

Computational issues
MRI data sets are typically very large; for example, the data from BrainWeb (Collins et al. 1998), a simulated brain database, contain over 1.8 million voxels even after masking out non-brain voxels. Furthermore, when the higher resolution model in Section 2.2 is used, each voxel is divided into eight subvoxels. To improve speed, table lookup methods are used in various places, vectorization is used to take advantage of conditional independence, and some computations are performed in C, with OpenMP used to parallelize loops.

Table lookup
For a model from the Potts family, the conditional distribution of the tissue type of a (sub)voxel is determined by its neighbors, and the conditional distributions are identical for identical configurations of neighboring tissue types. Furthermore, the conditional distribution depends only on the numbers of neighboring (sub)voxels of each tissue type. We can therefore enumerate all possible cases together with their conditional probabilities and use a pre-computed table of probabilities when updating tissue types. For example, when each (sub)voxel has six neighbors and there are three tissue types, there are only 343 different cases in the table. To enumerate the cases, all configurations for the given numbers of neighbors and tissue types are first computed using a recursive method, and then the unique combinations of counts of neighbors of each tissue type are generated. For the commonly used default setups with three tissue types and six, 18, or 26 neighbors, the tables are pre-computed and saved for repeated use. For other setups, the tables are computed at runtime.

Conditional independence
Suppose the variables to be updated can be divided into disjoint blocks such that, given the variables in the other blocks, all variables within a block are conditionally independent. Then we can update one block at a time, generating the variables within each block independently, which allows the generation within a block to be vectorized or parallelized. For example, in Figure 3(a), under the four-neighbor structure in 2D (a voxel has neighbors to its south, north, east, and west), the white voxels are independent given the black ones, and vice versa. With this kind of independence, the vector z (indices of tissue types) can be updated in two steps: one for the blacks, one for the whites. This idea goes back to Besag's "coding methods" (Besag 1974); it is described in Wilkinson (2005), and a detailed discussion can be found in Winkler (2003). The scheme extends to 3D lattices with the six-neighbor structure: in Figure 3(b), the whites are independent given the blacks, and vice versa. The minimum number of blocks needed to make the variables within each block independent given the other blocks is called the chromatic number (Winkler 2003). The chromatic number for six neighbors in 3D is two, for 18 neighbors it is seven, and for 26 neighbors it is eight. See Feng (2008) for more details.
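The two-block decomposition for the six-neighbor case can be sketched as follows (an illustration of the idea, not the package's code):

```c
#include <assert.h>

/* Checkerboard (two-color) decomposition for the six-neighbor
   structure in 3D: voxel (x, y, z) gets color (x + y + z) mod 2.
   Six-neighbors differ by exactly one in exactly one coordinate, so
   a voxel and its neighbors always receive opposite colors; given
   all voxels of one color, the voxels of the other color are
   conditionally independent and can be updated together. */
static int color6(int x, int y, int z) {
    return (x + y + z) % 2;
}
```

All voxels with color 0 can be updated in one vectorized (or parallel) pass, then all voxels with color 1, giving a complete sweep in two steps.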
This conditional independence is used to vectorize the computation when updating the vector z. Combined with the table lookup method, this greatly improves speed. To improve speed further, embedded C code with parallel computation via OpenMP is used.

OpenMP
OpenMP is a framework for shared memory parallel computing. It is supported by most C/C++ and Fortran compilers and runtime systems currently in use, and is based on the fork/join model. For a concise introduction to OpenMP see Tierney (2007).
The following code snippet illustrates the use of OpenMP in a relatively simple case: the embedded C function that updates the (sub)voxel intensity values. The upper part is an excerpt of the sequential code and the lower part shows the corresponding parallel code. The additional coding effort is modest in this case.
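A minimal sketch of this pattern is given below. Here update_one is a hypothetical stand-in for the package's per-(sub)voxel sampling code, and the pragma and firstprivate clause follow the description in the text; when compiled without OpenMP support the pragma is simply ignored and the loop runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-(sub)voxel update; the real code draws a new
   intensity from a normal distribution. */
static double update_one(double mean, double sd, size_t i) {
    return mean + sd * ((i % 2 == 0) ? -1.0 : 1.0);
}

/* Sequential version of the update loop. */
static void update_seq(double *y, const double *mean, const double *sd,
                       size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = update_one(mean[i], sd[i], i);
}

/* Parallel version: the only changes are the OpenMP pragma and a
   signed loop index.  firstprivate gives each thread its own copy of
   the pointers, initialized from the sequential part. */
static void update_par(double *y, const double *mean, const double *sd,
                       size_t n) {
#pragma omp parallel for firstprivate(mean, sd)
    for (long i = 0; i < (long)n; i++)
        y[i] = update_one(mean[i], sd[i], (size_t)i);
}
```

Because each iteration writes to a distinct element of y and the update is conditionally independent across the block being updated, the loop parallelizes without locks.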

One of the advantages of OpenMP is that we can upgrade the code from sequential to parallel one step at a time. In each step, we just focus on part of the code. In this way, there is an incremental path to parallelism and the integrity of typically well tested serial code is maintained.
To realize parallelism, C compiler directives (#pragma) are used, and the keyword omp identifies a pragma as an OpenMP pragma. A for loop is parallelized by specifying parallel for. With a compiler that recognizes the directive the code runs in parallel; otherwise it runs sequentially. This makes it easy to port the code from one machine to another.
In the example there is also a scope clause, firstprivate, that specifies the sharing attributes of variables, and an implicit barrier for synchronization. In our experience it is a good idea to explicitly specify the scope of all variables used in a parallel region. In particular, it is important to give the firstprivate scope to variables that are initialized in the sequential portion and used, but not modified, in the parallel loop; failing to identify these as firstprivate can result in poor performance. Some compilers can infer this property in some cases but others cannot, so explicit specification helps ensure the code works well across a range of compilers. See Chandra et al. (2001) for a detailed discussion of OpenMP.
The computational time of the MCMCsub method on a full BrainWeb data set (with 1,838,675 voxels inside the mask) is shown in Table 1. The number of sweeps is 100, which appears to be sufficient to produce good classifications. The speedup is shown in Figure 4. The program was run on a Linux computer with two quad-core 2.3 GHz AMD Opteron processors and 48 GB RAM. The speedup is about 2.1 when eight processors are used. By Amdahl's law, no matter how well the parallel portion is coded and how many processors are used, overall performance is limited by the proportion of the code that can be parallelized, and the effect becomes more severe as the number of processors grows.
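Amdahl's law is easy to evaluate directly. As an illustrative calculation (not part of the package): a parallel fraction of roughly 0.6 gives a speedup of about 2.1 on eight processors, consistent with the observed figure, and a speedup of at most 1/(1 − 0.6) = 2.5 no matter how many processors are used.

```c
#include <assert.h>
#include <math.h>

/* Amdahl's law: with a fraction p of the work parallelizable and n
   processors, the ideal overall speedup is 1 / ((1 - p) + p / n).
   As n grows the speedup is bounded above by 1 / (1 - p). */
static double amdahl(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}
```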
If the number of processors is not specified explicitly, all available processors are used. The number can be set with the environment variable OMP_NUM_THREADS. Suppose the code is to be run with four processors. On Unix-like machines, run R with OMP_NUM_THREADS specified as follows.

OMP_NUM_THREADS=4 R
For Windows users, OMP_NUM_THREADS can be specified as follows. First, click "Start" to find the R icon. Second, right-click the icon and then click "Properties". Finally, add OMP_NUM_THREADS=4 to the "Target" field. Furthermore, the Windows version needs pthreadGC2.dll in C:\WINDOWS\system32; it can be obtained from ftp://sourceware.org/pub/pthreads-win32/dll-latest/lib/pthreadGC2.dll. Functions to query and set the number of processors from within the package may be provided in the future.

Overview of the package
In this section we first give an overview of the functions in the package and then present several detailed examples of its use.

Functions of the package
The ANALYZE, NIfTI, and raw byte file formats are supported for input and output. For the ANALYZE format, the functions are taken from the package AnalyzeFMRI.

Initial values of the means, variances, and proportions of the NMMs can be generated using Otsu's method, implemented by a fast algorithm in the function initOtsu. Another function to generate initial estimates is initProp. Otsu's algorithm was originally proposed for thresholding a gray-level image into a binary image (Otsu 1979); it was later generalized to multi-level thresholding, and a fast algorithm was proposed in Liao et al. (2001). Note that for MRI data it can be slow if the number of classes is greater than three (more than two threshold values), even with the fast algorithm, since Otsu's method uses an exhaustive search. It is fast enough with three classes, which is the typical case in MRI classification. For initProp, the threshold values are quantiles based on a guess of the proportions of the different components, which is much faster than initOtsu when there are more than three components.

Various spatial inputs for the different methods, such as the neighbors of each voxel, the blocks for conditional independence, and the subvoxels corresponding to each voxel, can be obtained using the function makeMRIspatial. The package supports the six-, 18-, and 26-neighbor structures typical for MRI (Figure 2).
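The core of three-class Otsu thresholding can be sketched as follows. This is an illustrative exhaustive search over two thresholds, showing the idea behind initOtsu rather than the fast algorithm of Liao et al. (2001) that the package uses.

```c
#include <assert.h>

/* Exhaustive-search Otsu thresholding for three classes on a 256-bin
   intensity histogram: choose thresholds t1 < t2 maximizing the
   between-class variance. */
static void otsu3(const double hist[256], int *t1, int *t2) {
    double best = -1.0;
    for (int a = 0; a < 255; a++) {
        for (int b = a + 1; b < 255; b++) {
            double w[3] = {0.0, 0.0, 0.0};   /* class weights */
            double s[3] = {0.0, 0.0, 0.0};   /* class intensity sums */
            for (int i = 0; i < 256; i++) {
                int c = (i <= a) ? 0 : (i <= b) ? 1 : 2;
                w[c] += hist[i];
                s[c] += i * hist[i];
            }
            if (w[0] == 0.0 || w[1] == 0.0 || w[2] == 0.0)
                continue;                     /* empty class: skip */
            double total = w[0] + w[1] + w[2];
            double mean = (s[0] + s[1] + s[2]) / total;
            double bc = 0.0;                  /* between-class variance */
            for (int c = 0; c < 3; c++) {
                double mc = s[c] / w[c];
                bc += w[c] * (mc - mean) * (mc - mean);
            }
            if (bc > best) { best = bc; *t1 = a; *t2 = b; }
        }
    }
}
```

The quadratic number of threshold pairs is what makes the exhaustive search slow when the number of classes grows, as noted above.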
The classification methods and their corresponding function names in the package are listed in Table 2.
For the default setup of the Potts models, a simple Potts model is used, except for MCMCsub and PVHMRFEM, where the repulsion Potts model is adopted. A six-neighbor structure in 3D is the default neighbor definition. The default value of β for PVHMRFEM is taken from Cuadra et al. (2005); for the other methods, the values were obtained through a validation procedure (Feng et al. 2012). The argument spatialMat specifies the form of f(z_i, z_j) in model (1): for a simple Potts model it is an identity matrix, and for a repulsion Potts model it specifies β_1/β and −β_2/β.
The function mritc provides a uniform and easy-to-use interface to all of the methods. The user must specify the input MR image intarr, the mask of the image, and the method to use; the other parameters are set automatically. The output is an object of class mritc, and generic print, summary, and plot methods are provided.
Furthermore, different measures of the accuracy of predictions relative to an available truth can be obtained through the function measureMRI. The measures available include: mis-classification rate, average mean squared error, the Dice similarity measure (DSM; Cuadra et al. 2005), the confusion table, and tissue volume error. The mis-classification rate compares whole-voxel classifications to the discrete anatomical model; this requires a form of rounding of the results and does not reflect all the PV information an approach provides. To assess the ability to capture the PV effect, the average mean squared error between the estimated and true tissue distributions within voxels is provided. Another performance measure is the DSM, defined as

DSM_t = 2 N^t_{a∩b} / (N^t_a + N^t_b),

where N^t_a and N^t_b are the numbers of voxels classified as tissue t by methods a and b respectively, and N^t_{a∩b} is the number of voxels classified as tissue t by both methods. The larger the DSM, the more similar the results of the two methods. Furthermore, the confusion table can be used to evaluate the results on a per-tissue basis: the element in row i and column j is the probability of classifying a voxel as category i given that the true category is j. The absolute differences between the estimated and true tissue volumes, relative to the truth, are also available.
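The DSM computation can be sketched as follows (an illustration of the formula, not the measureMRI implementation):

```c
#include <assert.h>
#include <math.h>

/* Dice similarity measure for tissue t between classifications a and
   b over n voxels: DSM_t = 2 * N_ab / (N_a + N_b), where N_a and N_b
   count voxels labeled t by each method and N_ab counts voxels
   labeled t by both. */
static double dsm(const int *a, const int *b, int n, int t) {
    int na = 0, nb = 0, nab = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] == t) na++;
        if (b[i] == t) nb++;
        if (a[i] == t && b[i] == t) nab++;
    }
    if (na + nb == 0)
        return 1.0;  /* neither method used label t: treat as agreement */
    return 2.0 * nab / (na + nb);
}
```

A DSM of 1 indicates that the two classifications label exactly the same voxels as tissue t, and 0 indicates no overlap at all.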

Examples
This section demonstrates the use of the package through four examples based on the data sets contained in the package. The data were adapted from the BrainWeb repository, reduced in size to save space and time for illustration.
The first example reads the T1-weighted image and its corresponding mask using readMRI and focuses on the voxels inside the brain. The classification is based on the NMM fitted by the EM algorithm using mritc.em. The initial values of the NMM are obtained using initOtsu.
Instead of the NMM, the second example uses the higher resolution HMNMM fitted by the Bayesian method. The spatial parameters are obtained using the function makeMRIspatial with argument sub = TRUE.
Compared with the first two, the third example is more comprehensive. The T1-weighted image and the mask are read in, and the image is plotted using the function slices3d from misc3d (Feng and Tierney 2008); then the wrapper function mritc is used to classify the image with the ICM algorithm. Next, the classification results are plotted and evaluated by the different metrics based on the truth, which gives the proportion of each tissue present in each voxel. Finally, the classification results and the original image are overlaid for an overall view.
First the image is read and plotted.
R> t1 <- T1
R> t1[mask == 1] <- 0
R> icm.class <- max.col(tc.icm$prob)
R> tc.icm$mask[tc.icm$mask == 1] <- icm.class
R> slices3d(t1, tc.icm$mask, col2 = terrain.colors(4)[4:1], alpha = 0.7)

In the last example, we show different ways to obtain initial estimates. As mentioned in Section 4.1, two functions are available. Compared with initOtsu, which needs only the number of threshold values, initProp requires a rough guess of the proportions of the different tissues. As for speed, the former is slower than the latter, though the difference is small when there are three tissue types.