Mokken Scale Analysis in R

Mokken scale analysis (MSA) is a scaling procedure for both dichotomous and polytomous items. It consists of an item selection algorithm to partition a set of items into Mokken scales and several methods to check the assumptions of two nonparametric item response theory models: the monotone homogeneity model and the double monotonicity model. First, we present an R package mokken for MSA and explain the procedures. Second, we show how to perform MSA in R using test data obtained with the Adjective Checklist.


Introduction
In this paper an R package (R Development Core Team 2007), called mokken, for Mokken scale analysis (MSA) is discussed. MSA is a scaling technique for ordinal data and is mainly used for scaling test and questionnaire data. MSA is closely related to nonparametric item response theory (IRT) models, which imply ordinal measurement. MSA consists of two parts: (1) an automated selection algorithm that partitions a set of ordinal variables (from here on called items) into scales (called Mokken scales) satisfying criteria related to nonparametric IRT models, possibly leaving some items unselected, and (2) methods to investigate assumptions of nonparametric IRT models.
The paper provides a short summary of the main concepts in MSA but is by no means exhaustive. For a more thorough discussion of nonparametric IRT, MSA, and data analysis strategies we refer to the following literature. Nonparametric IRT and MSA for dichotomous item scores were developed by Mokken (1971; also see Mokken and Lewis 1982) and extended to polytomous item scores by Molenaar (1991, 1997). Sijtsma and Molenaar (2002) gave an overview of nonparametric IRT and MSA, and provided many references and examples. Meijer and Baneke (2004) demonstrated how MSA can be used as a preliminary step to parametric IRT.
Currently available software for MSA consists of a commercial package called MSP5 for Windows (Molenaar and Sijtsma 2000) and a Stata module (Weesie 1999).
The remainder of the paper is organized as follows. Section 2 discusses nonparametric IRT models and several methods to check model assumptions. Section 3 discusses the functions in mokken. Section 4 gives a demonstration of MSA by applying the functions in mokken to personality test data.

Mokken scale analysis

2.1. Nonparametric IRT models
Suppose a test or a questionnaire contains a set of items which are numbered $1, \ldots, J$ and indexed by $j$. For convenience, but without loss of generality, suppose that each item has $m+1$ ordered answer categories. Let $X_j$ denote the score on item $j$ with realization $x_j = 0, 1, \ldots, m$. If $m = 1$ the item is called dichotomous; if $m > 1$ the item is called polytomous. The sum score is defined as $X_+ = \sum_{j=1}^{J} X_j$. In IRT it is assumed that a (possibly multidimensional) latent trait $\theta$ triggers the item responses. It is also assumed that the ordering of the scores of each item reflects the hypothesized ordering on $\theta$. Expression $X_j \geq x$ is called an item step (Sijtsma and Molenaar 2002, p. 122) and $P(X_j \geq x \mid \theta)$ is called the item step response function. Because $P(X_j \geq 0 \mid \theta) = 1$ for all $\theta$, the relation between item $j$ and $\theta$ is characterized by $m$ item step response functions: $P(X_j \geq 1 \mid \theta), \ldots, P(X_j \geq m \mid \theta)$. For dichotomous items the item step response function reduces to $P(X_j = 1 \mid \theta)$.
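The relation between item scores and item steps can be illustrated with a few lines of base R. The sketch below uses made-up scores of six respondents on a single item with $m = 4$; the object names `x`, `steps`, and `step_props` are illustrative only and are not part of mokken.

```r
# Scores of six hypothetical respondents on one item with m + 1 = 5 categories
x <- c(0, 2, 4, 1, 3, 2)
m <- 4

# Indicator matrix of the item steps X >= 1, ..., X >= m
steps <- sapply(seq_len(m), function(s) as.integer(x >= s))
colnames(steps) <- paste0("X>=", seq_len(m))

# Each item score equals the number of item steps that were passed
stopifnot(all(rowSums(steps) == x))

# Sample proportions of respondents passing each item step
step_props <- colMeans(steps)
print(step_props)
```

Because passing step $x + 1$ implies passing step $x$, the sample proportions can never increase with the step number.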
Four assumptions define the two most popular nonparametric IRT models.
Unidimensionality means that $\theta$ is a unidimensional latent variable. Local independence implies that the item responses only depend on $\theta$. Latent monotonicity means that the item step response functions are nondecreasing functions of $\theta$. (Latent monotonicity is usually referred to as monotonicity, but we prefer the term latent monotonicity to distinguish it from manifest monotonicity, which is introduced later on.) Nonintersection means that the item step response functions do not intersect.
The assumptions unidimensionality, local independence, and latent monotonicity define the most general nonparametric IRT model: the monotone homogeneity model (Mokken 1971), also known as the nonparametric graded response model (Hemker, Sijtsma, Molenaar, and Junker 1997). Assumptions unidimensionality, local independence, latent monotonicity, and nonintersection define the double monotonicity model (Mokken 1971). Several other nonparametric IRT models have been proposed (see van der Ark 2001, for an overview). All popular unidimensional parametric IRT models, such as the Rasch model (Rasch 1960), the two- and three-parameter logistic models (Birnbaum 1968), and the graded response model (Samejima 1969), also assume unidimensionality, local independence, and latent monotonicity. Therefore, investigation of the assumptions of nonparametric IRT models is also useful when parametric IRT models are used. In addition, parametric IRT models assume that the item step response functions have a parametric functional form.
Nonparametric IRT models have the following measurement properties. For dichotomous items, the monotone homogeneity model implies stochastic ordering of $\theta$ by $X_+$ (known under the acronym SOL), i.e., $P(\theta > a \mid X_+ = L) \geq P(\theta > a \mid X_+ = K)$ for all $a$ and for all $K < L$ (Hemker, Sijtsma, Molenaar, and Junker 1996; also see Grayson 1988; Huynh 1994). Because the monotone homogeneity model is the most general IRT model, SOL also holds for other popular IRT models for dichotomous item scores. In general, SOL does not hold for IRT models for polytomous item scores (Hemker et al. 1997), but for most models violations are rare if the number of items exceeds five (van der Ark 2005). For dichotomous items, the double monotonicity model allows an invariant ordering of the items on $\theta$. For polytomous items this is not the case. Sijtsma and Junker (1996) and Sijtsma and Hemker (1998) provided more details on item ordering.

Scalability coefficients
For each pair of items, there is an item-pair scalability coefficient $H_{ij}$; $i, j = 1, \ldots, J$ (Molenaar 1991). Let $\mathrm{COV}(X_i, X_j)$ be the covariance between $X_i$ and $X_j$, and let $\mathrm{COV}(X_i, X_j)^{\max}$ be the maximum covariance between $X_i$ and $X_j$ given the marginal distributions of $X_i$ and $X_j$.
If the variances of the scores on item $i$ and item $j$ are both positive, then $H_{ij}$ is the normed covariance between the item scores:

$$H_{ij} = \frac{\mathrm{COV}(X_i, X_j)}{\mathrm{COV}(X_i, X_j)^{\max}}. \qquad (1)$$

If $X_i$ or $X_j$ has zero variance, $H_{ij}$ can still be computed (Molenaar 1991) but Equation 1 is no longer true. In MSA, items belonging to the same Mokken scale should have positive item-pair scalability coefficients.
For each item, there is an item scalability coefficient $H_j$; $j = 1, \ldots, J$ (Molenaar 1991). Let $R_{-j} = X_+ - X_j$; $R_{-j}$ is called the rest score. Let $\mathrm{COV}(X_j, R_{-j})$ be the covariance between $X_j$ and $R_{-j}$, and let $\mathrm{COV}(X_j, R_{-j})^{\max}$ be the maximum covariance between $X_j$ and $R_{-j}$ given the marginal distributions of $X_j$ and $R_{-j}$. If $X_j$ and $R_{-j}$ both have positive variance, then $H_j$ is the normed covariance between the item score and the rest score:

$$H_j = \frac{\mathrm{COV}(X_j, R_{-j})}{\mathrm{COV}(X_j, R_{-j})^{\max}}. \qquad (2)$$

In MSA, items belonging to the same Mokken scale should have an item scalability coefficient greater than some positive lower bound $c$. As a rule of thumb, $c > .3$ (Sijtsma and Molenaar 2002). van Abswoude, van der Ark, and Sijtsma (2004) argued that $H_j$ can be interpreted in a similar way as the discrimination parameters in parametric IRT.
For the entire set of items, there is a test scalability coefficient $H$:

$$H = \frac{\sum_{j=1}^{J} \mathrm{COV}(X_j, R_{-j})}{\sum_{j=1}^{J} \mathrm{COV}(X_j, R_{-j})^{\max}}.$$
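As an illustration of the item-pair coefficient, the following base R sketch computes $H_{ij}$ for a small made-up data set. It is not the mokken implementation: the helper `h_pair` is hypothetical, and it assumes that the maximum covariance given the marginals is attained when both score vectors are sorted and paired (the comonotone arrangement).

```r
# Item-pair scalability coefficient for two item-score vectors.
# Assumption of this sketch: the maximum covariance given the marginal
# distributions is obtained by pairing the sorted scores.
h_pair <- function(x, y) {
  cov(x, y) / cov(sort(x), sort(y))
}

# Toy data: 8 respondents, 3 dichotomous items (made up for illustration)
X <- cbind(
  item1 = c(0, 0, 0, 0, 1, 1, 1, 1),
  item2 = c(0, 0, 1, 0, 1, 1, 1, 1),
  item3 = c(0, 1, 0, 0, 0, 0, 1, 1)
)

# Matrix of item-pair coefficients; the diagonal is left undefined
J <- ncol(X)
Hij <- matrix(NA_real_, J, J, dimnames = list(colnames(X), colnames(X)))
for (i in seq_len(J)) {
  for (j in seq_len(J)) {
    if (i != j) Hij[i, j] <- h_pair(X[, i], X[, j])
  }
}
print(round(Hij, 2))
```

In these toy data item1 and item2 form a perfect Guttman pattern, so their coefficient reaches its maximum, whereas the pairs involving item3 contain Guttman errors and yield lower values. The helper breaks down if an item has zero variance, in line with the caveat for Equation 1.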

Automated item selection algorithm
An important part of MSA is the partitioning of a set of items into Mokken scales and possibly a set of unscalable items. Mokken (1971, p. 184) defined a Mokken scale as a set of dichotomously scored items for which, for a suitably chosen positive lower bound $c$, all inter-item covariances are strictly positive and $H_j \geq c > 0$. This definition can be readily generalized to polytomously scored items. Partitioning a set of items into Mokken scales is done using an automated item selection algorithm (Mokken 1971, pp. 190-193) described in detail by Sijtsma and Molenaar (2002, Chapter 5) and van Abswoude et al. (2004). Two parameters, lower bound $c$ and nominal significance level $\alpha$, have to be specified by the researcher. Lower bound $c$ defines the minimum value of coefficients $H_j$ in the Mokken scale.
The recommended default value is $c = .3$ (Molenaar and Sijtsma 2000). Parameter $\alpha$ is the nominal significance level of the inequality tests used in the automated item selection algorithm and its recommended default value is .05.

Investigation of latent monotonicity
Manifest monotonicity is an observable property of the test data, and is defined as

$$P(X_j \geq x \mid R_{-j} = s) \geq P(X_j \geq x \mid R_{-j} = r) \quad \text{for all } j, \text{ all } x, \text{ and all } s > r. \qquad (3)$$

Junker and Sijtsma (2000) showed that for dichotomous items latent monotonicity implies manifest monotonicity. For polytomous items, some counterexamples have been found (Junker and Sijtsma 2000) but Molenaar and Sijtsma (2000) assume that in practice, also for polytomous items, manifest monotonicity is a valid test of latent monotonicity.
A first practical issue when using manifest monotonicity to investigate latent monotonicity is that the number of respondents having $R_{-j} = r$ may be too small for an accurate estimation of $P(X_j \geq x \mid R_{-j} = r)$. This is solved by grouping respondents with adjacent rest scores until the size of the rest score group exceeds a preset criterion called minsize (Molenaar and Sijtsma 2000, pp. 67-70). In fact, (3) becomes

$$P(X_j \geq x \mid R_{-j} \in \{s_1, \ldots, s_n\}) \geq P(X_j \geq x \mid R_{-j} \in \{r_1, \ldots, r_m\}) \quad \text{for } s_1 > r_m,$$

where $s_1, \ldots, s_n$ and $r_1, \ldots, r_m$ are $n$ and $m$ consecutive integers, respectively. For the default method to construct rest score groups see Molenaar and Sijtsma (2000, p. 67). It is advised to investigate latent monotonicity several times using different values for minsize.
A second practical issue is that some violations of manifest monotonicity may be too small to be relevant. Therefore, only violations greater than minvi (default value is .03) are reported and for each reported violation a significance test at level $\alpha = .05$ (without Bonferroni correction) is computed (Molenaar and Sijtsma 2000, p. 67).
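The two practical issues can be made concrete with a simplified base R sketch. The function `check_item_monotonicity` is hypothetical and is not the routine used in mokken or MSP5: it forms rest score groups by greedily merging adjacent rest scores until a group holds at least `minsize` respondents (an assumed simplification of the default grouping method), counts decreases larger than `minvi`, and performs no significance tests. The data are artificial.

```r
# Simplified check of manifest monotonicity for item j (illustrative only)
check_item_monotonicity <- function(X, j, minsize = 10, minvi = 0.03) {
  rest <- rowSums(X[, -j, drop = FALSE])
  values <- sort(unique(rest))
  # Greedy grouping: merge adjacent rest scores until a group is large enough
  group_of_value <- integer(length(values))
  g <- 1
  n_in_group <- 0
  for (k in seq_along(values)) {
    group_of_value[k] <- g
    n_in_group <- n_in_group + sum(rest == values[k])
    if (n_in_group >= minsize) {
      g <- g + 1
      n_in_group <- 0
    }
  }
  # An undersized last group is merged with the previous group
  if (n_in_group > 0 && g > 1) {
    group_of_value[group_of_value == g] <- g - 1
  }
  group <- group_of_value[match(rest, values)]
  # For each item step: proportions per group and decreases between groups
  lapply(seq_len(max(X)), function(x) {
    p <- as.vector(tapply(X[, j] >= x, group, mean))
    d <- outer(p, p, "-")      # d[a, b] = p[a] - p[b]
    viol <- d[upper.tri(d)]    # lower group minus higher group
    list(step = x, p = p,
         n_violations = sum(viol > minvi),
         max_violation = max(0, viol))
  })
}

# Artificial data: perfect Guttman patterns for 50 respondents on 4 items
k <- rep(0:4, each = 10)
X_good <- sapply(1:4, function(j) as.integer(j <= k))
# A fifth item that is endorsed only by respondents with the lowest scores
X_bad <- cbind(X_good, as.integer(k == 0))

good <- check_item_monotonicity(X_good, j = 1)
bad <- check_item_monotonicity(X_bad, j = 5)
print(good[[1]]$n_violations)
print(bad[[1]]$n_violations)
```

Rerunning the function with different values of `minsize` shows how the grouping affects the number of reported violations, which is why several values are advised above.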

Investigation of nonintersection
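Molenaar and Sijtsma (2000, pp. 74-88) describe three methods to investigate nonintersection: method pmatrix, method restscore, and method restsplit (method restsplit is not included in mokken). All three methods are a special case of the following implication. Consider two item steps ordered such that $P(X_i \geq x) \leq P(X_j \geq y)$. If local independence holds, then nonintersection,

$$P(X_i \geq x \mid \theta) \leq P(X_j \geq y \mid \theta) \quad \text{for all } \theta, \qquad (4)$$

implies manifest property

$$P(X_i \geq x \mid W = w) \leq P(X_j \geq y \mid W = w) \quad \text{for all } w, \qquad (5)$$

where $W$ is a manifest variable independent of $X_i$ and $X_j$.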

Method pmatrix
Method pmatrix was proposed by Mokken (1971, pp. 180-182). Manifest variable $W$ in (5) is the dichotomized score on item $k$. Mokken showed that if (4) holds then

$$P(X_i \geq x, X_k \geq z) \leq P(X_j \geq y, X_k \geq z) \qquad (6)$$

and

$$P(X_i < x, X_k < z) \geq P(X_j < y, X_k < z). \qquad (7)$$

The joint probabilities $P(X_i \geq x, X_k \geq z)$ are collected in a $Jm \times Jm$ matrix called the P(++) matrix. The rows and columns of the P(++) matrix correspond to the $Jm$ item steps ordered in popularity. The first row and column of the P(++) matrix correspond to the least popular item step, the last row and column correspond to the most popular item step. Entries in the P(++) matrix pertaining to the same item (i.e., $P(X_j \geq x, X_j \geq z)$; $j = 1, \ldots, J$; $x = 1, \ldots, m$; $z = 1, \ldots, m$) are not considered. The P(++) matrix can be used to test nonintersection in the following way. Equation 6 is equivalent to a P(++) matrix that has nondecreasing entries both rowwise and columnwise.
Similarly, the P(−−) matrix contains the joint probabilities $P(X_i < x, X_j < y)$. The rows and columns of the P(−−) matrix correspond to the same ordered item steps as the rows and columns of the P(++) matrix. Entries $P(X_j < x, X_j < z)$ ($j = 1, \ldots, J$; $x = 1, \ldots, m$; $z = 1, \ldots, m$), i.e., entries pertaining to the same item, are not considered. Equation 7 is equivalent to a P(−−) matrix that has nonincreasing entries both rowwise and columnwise.
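For dichotomous items ($m = 1$) every item has a single item step, so the P(++) matrix is simply the matrix of joint proportions of positive responses. The base R sketch below builds it for artificial Guttman data and counts rowwise decreases. The helpers `ppp_matrix` and `row_violations` are hypothetical and only mimic the idea of method pmatrix; they are not the code behind check.pmatrix.

```r
# P(++) matrix for dichotomous items, ordered from least to most popular
ppp_matrix <- function(X) {
  Ppp <- crossprod(X) / nrow(X)  # joint proportions P(X_i = 1, X_k = 1)
  diag(Ppp) <- NA                # same-item entries are not considered
  ord <- order(colMeans(X))      # least popular item first
  Ppp[ord, ord]
}

# Number of rowwise decreases larger than minvi (rows should be nondecreasing)
row_violations <- function(P, minvi = 0.03) {
  apply(P, 1, function(r) {
    r <- r[!is.na(r)]
    d <- outer(r, r, "-")        # d[a, b] = r[a] - r[b]
    sum(d[upper.tri(d)] > minvi) # entry further left exceeds entry further right
  })
}

# Artificial data: perfect Guttman patterns for 50 respondents on 4 items
k <- rep(0:4, each = 10)
X <- sapply(1:4, function(j) as.integer(j <= k))

P <- ppp_matrix(X)
print(P)
print(row_violations(P))
```

Because the matrix is symmetric, checking the rows also covers the columns. For polytomous items the same idea applies to the $Jm$ item step indicators instead of the $J$ item scores.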

Method restscore
A second choice for $W$ in (5) is rest score $R_{-i,-j} = X_+ - X_i - X_j$ (note that $X_+$ is not an adequate choice because it depends on $X_i$ and $X_j$). Equation 5 then becomes

$$P(X_i \geq x \mid R_{-i,-j} = r) \leq P(X_j \geq y \mid R_{-i,-j} = r) \quad \text{for all } r. \qquad (8)$$

Method restscore investigates nonintersection for each pair of items.
Similar practical issues as discussed in Section 2.4 apply to method restscore. First, the number of respondents having $R_{-i,-j} = r$ may be too small for an accurate estimation of $P(X_i \geq x \mid R_{-i,-j} = r)$ and rest score groups must be formed (Molenaar and Sijtsma 2000, pp. 74-78).
Second, some violations may be too small to be relevant. Therefore, only violations greater than minvi (default value is .03) are reported and for each reported violation a significance test at level $\alpha = .05$ (without Bonferroni correction) is computed (Molenaar and Sijtsma 2000, p. 78).
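A stripped-down version of method restscore for one pair of dichotomous items can be sketched in base R as follows. The function `check_pair_restscore` is hypothetical: it conditions on each observed value of $R_{-i,-j}$ without forming rest score groups and without significance tests, so it only illustrates the comparison that is being made.

```r
# Compare two dichotomous items conditional on the rest score R_(-i,-j)
check_pair_restscore <- function(X, i, j, minvi = 0.03) {
  rest <- rowSums(X[, -c(i, j), drop = FALSE])
  # 'lo' is the less popular item overall, 'hi' the more popular item
  if (mean(X[, i]) > mean(X[, j])) {
    lo <- j; hi <- i
  } else {
    lo <- i; hi <- j
  }
  p_lo <- tapply(X[, lo], rest, mean)
  p_hi <- tapply(X[, hi], rest, mean)
  # A violation occurs where the less popular item is clearly more popular
  data.frame(rest = as.integer(names(p_lo)),
             p_lo = as.vector(p_lo),
             p_hi = as.vector(p_hi),
             violation = as.vector(p_lo - p_hi) > minvi)
}

# Artificial data: perfect Guttman patterns for 50 respondents on 4 items
k <- rep(0:4, each = 10)
X <- sapply(1:4, function(j) as.integer(j <= k))

res <- check_pair_restscore(X, i = 1, j = 2)
print(res)
```

With real data, sparse rest scores would first be merged into groups as described above before the proportions are compared.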

Description of the functions in mokken
The package mokken contains five principal functions. Except for the graphics, the function names and the output in mokken are similar to function names and output in the package MSP5 for Windows (Molenaar and Sijtsma 2000). The graphics in mokken differ substantially from the graphics provided by MSP5 for Windows. The functions in mokken were tested on several real and simulated data sets. In all cases the results were identical to results obtained with MSP5 for Windows.

check.monotonicity
Description: Returns the results from the investigation of latent monotonicity.
Usage: check.monotonicity(X, minvi = .03, minsize = default.minsize)
Required arguments: X: matrix or data frame of numeric data containing the responses of N respondents to J items. Missing values are not allowed.
Value: Returns an object of class monotonicity.class containing results: a list with as many components as there are items. Each component itself is also a list containing the results of the check of manifest monotonicity. See Section 4 or Molenaar and Sijtsma (2000, pp. 66-74) for more detailed information; I.labels: the item labels; Hi: the item scalability coefficients $H_j$ (2); and m: the number of answer categories.
Details: S3 methods are available so summary and plot can be used for objects of class monotonicity.class. Let MC be an object of class monotonicity.class.
summary(MC) returns a matrix with a summary of the results of the investigation of latent monotonicity (see Section 4 for an example).
plot(MC, items = all) returns a graph (see Section 4 for an example).
items: vector containing the numbers of the items for which the results are depicted graphically. By default the results for all items are depicted.

check.pmatrix
Description: Returns the results from the investigation of nonintersection using method pmatrix.
Usage: check.pmatrix(X, minvi = .03)
Required arguments: X: matrix or data frame of numeric data containing the responses of N respondents to J items. Missing values are not allowed.
Optional arguments: minvi: minimum size of a violation that is reported (Molenaar and Sijtsma 2000, p. 71).
Value: Returns an object (of class pmatrix.class) containing Ppp: the P(++) matrix; Pmm: the P(−−) matrix; I.item: vector indicating to which items the rows and columns of the P(++) matrix belong; I.step: the labels of the item steps in order of popularity; I.labels: the item labels; Hi: the item scalability coefficients $H_j$ (2); and minvi: the value of minvi.
Details: The output is often extensive. S3 methods are available so summary and plot can be used for objects of class pmatrix.class. Let PC be an object of class pmatrix.class.
summary(PC) returns a list with two components. The first component contains a summary of the P(++) matrix per item and the second component contains a summary of the P(−−) matrix per item. See Section 4 for an example.
plot(PC, items = all, pmatrix = both) returns a graphic display of the results of the investigation of nonintersection using method pmatrix (see Section 4 for an example).
items: vector containing the numbers of the items for which the results are depicted graphically. By default the results for all items are depicted.
pmatrix: has values "ppp", "pmm", and "both". If pmatrix = "ppp", then the P(++) matrix is plotted; if pmatrix = "pmm", then the P(−−) matrix is plotted; if pmatrix = "both", then both the P(++) matrix and the P(−−) matrix are plotted.
No x-axis labels are provided in the plot if the number of item steps is greater than 10.

check.restscore
Description: Returns the results from the investigation of nonintersection using method restscore.
Usage: check.restscore(X, minvi = .03, minsize = default.minsize)
Required arguments: X: matrix or data frame of numeric data containing the responses of N respondents to J items. Missing values are not allowed.
Value: Returns an object (of class restscore.class) containing results: a list with as many components as there are item pairs. Each component itself is also a list containing the results of the check of nonintersection using method restscore. See Section 4 or Molenaar and Sijtsma (2000, pp. 74-78) for more detailed information; I.labels: the item labels; Hi: the item scalability coefficients $H_j$ (2); and m: the number of answer categories.
Details: The output is often extensive because results is a list of $J(J-1)/2$ components. The procedure can be slow for large numbers of items. S3 methods are available so summary and plot can be used for objects of class restscore.class. Let RC be an object of class restscore.class.
summary(RC) returns a matrix with a summary of the results of the checks of nonintersection using method restscore.
plot(RC, item.pairs = all) returns a graphic display of the results of the investigation of nonintersection using method restscore (see Section 4 for an example).
item.pairs: vector containing the numbers of the item pairs for which the results are depicted graphically. For example, item.pairs = 1 prints the results for items 1 and 2, item.pairs = 2 prints the results for items 1 and 3, item.pairs = J − 1 prints the results for items 1 and J, and item.pairs = J prints the results for items 2 and 3. By default the results for all item pairs are depicted.
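The numbering of item pairs can be looked up with base R. The sketch below assumes that the pairs are enumerated in the lexicographic order suggested by the first two examples, (1, 2), (1, 3), and so on; whether the package numbers the pairs exactly this way is an assumption and should be checked against its documentation.

```r
J <- 10                      # number of items, as in the Communality scale
pairs <- t(combn(J, 2))      # one row per item pair, in lexicographic order
colnames(pairs) <- c("i", "j")

# Total number of pairs equals J(J - 1)/2
n_pairs <- nrow(pairs)

# Pairs at positions 1, 2, J - 1, and J
print(pairs[c(1, 2, J - 1, J), ])

# Position of the pair consisting of items 4 and 5
pair_4_5 <- which(pairs[, "i"] == 4 & pairs[, "j"] == 5)
print(pair_4_5)
```

Under this ordering item 1 is paired with the $J - 1$ other items first, so the pair of items 2 and 3 occupies position $J$.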

The data
MSA was performed on Adjective Checklist (Gough and Heilbrun 1980) data, acl, which are available in mokken. The data (Vorst 1992) contain the scores of 433 students from the University of Amsterdam on 218 items from a Dutch version of the Adjective Checklist. Each item is an adjective with five ordered answer categories (0 = completely disagree, 1 = disagree, 2 = neither agree nor disagree, 3 = agree, 4 = completely agree). For each adjective, the respondents must consider to what degree it describes their personality, and mark the answer category that best fits this description.
Initially, the number of items in the Dutch version of the Adjective Checklist was larger than 218. Oosterveld (1989) suggested constructing 22 scales of 10 items each, and ignoring the remaining adjectives. Two items were not included in the administered test, leaving two scales with 9 items. Of the 218 items that constitute the 22 scales, 77 were negatively worded. The negatively worded items are indicated by an asterisk in the dimnames and their item scores were reversed. Of the 94,394 responses, 296 (.3%) were missing; item scores were imputed for the missing scores using method Two-Way with Error (for details on this imputation method see Bernaards and Sijtsma 2000) applied to each scale separately, yielding a completed 433 × 218 data matrix. Table 1 gives an overview of the 22 scales of the Adjective Checklist and their items.

The output shows that all $H_{ij}$ are positive, satisfying the first criterion of a Mokken scale. Seven out of ten $H_j$ are less than the lower bound $c = .3$, which violates the second criterion of a Mokken scale; especially the item scalability coefficient for Unintelligent* is low ($H_j = .11$). The scalability coefficient for the entire scale, $H$, equals .26, which is too low for the qualification "weak scale".
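The counts given in the description of the data can be verified with a few lines of R. The variable names are illustrative; the numbers are those stated in the data description.

```r
n_scales <- 22          # scales suggested by Oosterveld (1989)
items_per_scale <- 10
items_dropped <- 2      # items not included in the administered test
n_respondents <- 433
n_missing <- 296

n_items <- n_scales * items_per_scale - items_dropped
n_responses <- n_respondents * n_items
pct_missing <- 100 * n_missing / n_responses

print(c(items = n_items, responses = n_responses))
print(round(pct_missing, 2))   # percentage of missing responses
```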

Investigation of latent monotonicity
Investigating latent monotonicity in the scale Communality can be done using the following three commands.
R> monotonicity.com <- check.monotonicity(acl.com)
R> summary(monotonicity.com)
R> plot(monotonicity.com)
All computations are done in the function check.monotonicity; the most important component of the resulting object monotonicity.com is results, which is a list with as many components as there are items in the scale (i.e., 10), and each component itself is also a list with four components. These results are summarized by summary and visualized by plot.
The result of summary(monotonicity.com) is a matrix showing for each item a summary of the checks of manifest monotonicity.

Investigation of nonintersection
Investigating nonintersection in the scale Communality can be done using methods pmatrix and restscore.

Method pmatrix
Method pmatrix can be applied using the following three commands.
R> pmatrix.com <- check.pmatrix(acl.com)
R> summary(pmatrix.com)
R> plot(pmatrix.com)
Function check.pmatrix computes the P(++) matrix (pmatrix.com$Ppp) and the P(−−) matrix (pmatrix.com$Pmm). The size of the two matrices (Jm × Jm = 40 × 40) is too large to be useful for inspecting violations of nonintersection. Function summary reduces the quantity of the output by producing summaries of both the P(++) and the P(−−) matrix. In the summary of the P(++) matrix, the output shows for each item: the scalability coefficient $H_j$ (ItemH, Section 2.2), the number of active pairs (#ac), the number of violations greater than minvi (#vi), the average number of violations per active pair (#vi/#ac), the maximum violation (maxvi), the sum of the violations greater than minvi (sum), and sum/#ac. The output shows that items Reliable, Honest, Thankless*, Unfriendly*, and Dependable have some violations greater than 0.03. The violations are relatively small. A similar matrix is provided to summarize the results of the P(−−) matrix.
Function plot produces two graphs for each item $j$. In one graph the lines represent the rows of the P(++) matrix pertaining to item $j$, which should be nondecreasing if nonintersection holds. In the other graph the lines represent the rows of the P(−−) matrix pertaining to item $j$, which should be nonincreasing if nonintersection holds.
Figure 2 displays the four rows in the P(++) matrix pertaining to item 5 (Unintelligent*).
On the horizontal axis, the $(10 - 1) \times 4 = 36$ item steps of the remaining 9 items are displayed in ascending order of popularity; i.e., the tick on the extreme left of the horizontal axis is the least popular item step $X_2 \geq 4$, and the tick on the extreme right of the horizontal axis is the most popular item step $X_1 \geq 1$. Let $I_1, \ldots, I_{36}$ denote the 36 ordered item steps. The upper line in the graph connects $P(X_5 \geq 1, I_1), \ldots, P(X_5 \geq 1, I_{36})$, the next line in the graph displays $P(X_5 \geq 2, I_1), \ldots, P(X_5 \geq 2, I_{36})$, etc. A decreasing line indicates a violation of nonintersection. Figure 2 shows that there are a few very small violations (less than 0.03) in the rows of the P(++) matrix pertaining to item 5 (Unintelligent*). A similar graph is provided for the P(−−) matrix, where an increasing line indicates a violation of nonintersection.

Method restscore
Method restscore can be applied using the following three commands.

Automated item selection algorithm
The automated item selection algorithm can be applied to scale Communality as follows:
R> scale.com <- search.normal(acl.com)
This results in the vector scale.com (displayed as t(scale.com)). Computing the scalability coefficients for the first scale can be done as follows:
R> coefH(acl.com[, scale.com == 1])
Similarly, investigating latent monotonicity for the first Mokken scale is done by
R> monotonicity.com.1 <- check.monotonicity(acl.com[, scale.com == 1])
R> summary(monotonicity.com.1)
R> plot(monotonicity.com.1)
Investigating latent monotonicity for the second scale and checking nonintersection are done in a similar way.

Figure 1: Visualization of the check of manifest monotonicity for item 5 (Unintelligent*)

Figure 2: Visualization of the check of nonintersection using method pmatrix for item 5 (Unintelligent*)

Figure 3: Visualization of the check of nonintersection using method restscore for items 4 (Deceitful*) and 5 (Unintelligent*)

The default lower bound c = .3 results in two Mokken scales. One Mokken scale contains the adjectives Dependable, Reliable, Honest, and Deceitful* (indicated by a 1 in scale.com), and the other Mokken scale contains the adjectives Cruel*, Unfriendly*, Obnoxious*, and Thankless* (indicated by a 2 in scale.com). The items Unscrupulous* and Unintelligent* are unscalable (indicated by a 0 in scale.com).
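The way such a partition vector is used can be sketched in base R. The vector below is typed in by hand from the result described above rather than produced by the package, and the order of the items is arbitrary (the column order of acl.com is not assumed).

```r
# Partition vector reconstructed by hand from the reported result
scale.com <- c("Dependable" = 1, "Reliable" = 1, "Honest" = 1,
               "Deceitful*" = 1, "Cruel*" = 2, "Unfriendly*" = 2,
               "Obnoxious*" = 2, "Thankless*" = 2,
               "Unscrupulous*" = 0, "Unintelligent*" = 0)

# Items per scale; the component named "0" holds the unscalable items
items_by_scale <- split(names(scale.com), scale.com)
print(items_by_scale)

# Logical index of the kind used to select the columns of the first scale
in_scale_1 <- scale.com == 1
print(names(scale.com)[in_scale_1])
```

A logical index such as `scale.com == 1` is what selects the columns of one Mokken scale before the scalability coefficients or the model checks are recomputed for that scale.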