Flexible Rasch Mixture Models with Package psychomix

Measurement invariance is an important assumption in the Rasch model and mixture models constitute a flexible way of checking for a violation of this assumption by detecting unobserved heterogeneity in item response data. Here, a general class of Rasch mixture models is established and implemented in R, using conditional maximum likelihood estimation of the item parameters (given the raw scores) along with flexible specification of two model building blocks: (1) Mixture weights for the unobserved classes can be treated as model parameters or based on covariates in a concomitant variable model. (2) The distribution of raw score probabilities can be parametrized in two possible ways, either using a saturated model or a specification through mean and variance. The function raschmix() in the R package psychomix provides these models, leveraging the general infrastructure for fitting mixture models in the flexmix package. Usage of the function and its associated methods is illustrated on artificial data as well as empirical data from a study of verbally aggressive behavior.


Introduction
In item response theory (IRT), latent traits are usually measured by employing probabilistic models for responses to sets of items. One of the most prominent examples of such an approach is the Rasch model (Rasch 1960), which captures the difficulty (or, equivalently, easiness) of binary items and the respondent's trait level on a single common scale. Generally, a central assumption of most IRT models (including the Rasch model) is measurement invariance, i.e., that all items measure the latent trait in the same way for all subjects. If this assumption is violated, measurements obtained from such a model do not allow fair comparisons between subjects. A typical violation of measurement invariance in the Rasch model is differential item functioning (DIF; see Ackerman 1992).
Therefore, assessing the assumption of measurement invariance and checking for DIF is crucial when establishing a Rasch model for measurements of latent traits. Hence, various approaches have been suggested in the literature that try to assess heterogeneity in (groups of) subjects based either on observed covariates or on unobserved latent classes. If covariates are available, classical tests like the Wald or likelihood ratio test can be employed to compare model fits between some reference group and one or more focal groups (Fischer and Molenaar 1995). Typically, these groups are defined by the researcher based on categorical covariates or arbitrary splits in either numerical covariates or the raw scores (Andersen 1972). More recently, extensions of these classical tests have also been embedded into a mixed model representation (Van den Noortgate and De Boeck 2005). Another recently suggested technique is to recursively define and assess groupings in a data-driven way based on all available covariates (both numerical and categorical) in so-called Rasch trees (Strobl, Kopf, and Zeileis 2011).
Heterogeneity occurring in latent classes only (i.e., not observed or captured by covariates), however, is typically addressed by mixtures of IRT models. Specifically, Rost (1990) combined a mixture model approach with the Rasch model. If any covariates are present, they can be used to predict the latent classes (as opposed to the item parameters themselves) in a second step (Cohen and Bolt 2005). More recently, extensions of this mixture model approach have been suggested that encompass this prediction; see Tay, Newman, and Vermunt (2011) for a unifying framework.
In this paper, we introduce the psychomix package for the R system for statistical computing (R Development Core Team 2011) that provides software for fitting a general and flexible class of Rasch mixture models along with comprehensive methods for model selection, assessment, and visualization. The package leverages the general and object-oriented infrastructure for fitting mixture models from the flexmix package (Leisch 2004; Grün and Leisch 2008), combining it with the function RaschModel.fit() from the psychotools package (Zeileis, Strobl, and Wickelmaier 2011) for the estimation of Rasch models. All packages are freely available from the Comprehensive R Archive Network at http://CRAN.R-project.org/.
The reason for using RaschModel.fit() rather than other previously existing (and much more powerful and flexible) R packages for Rasch modeling, such as ltm (Rizopoulos 2006) or eRm (Mair and Hatzinger 2007), is reduced computational complexity: RaschModel.fit() is intended to provide a "no frills" implementation of simple Rasch models, useful when refitting a model multiple times in mixtures or recursive partitions (see also Strobl et al. 2011).
While psychomix was under development, another R implementation of the Rost (1990) model became available in package mRm (Preinerstorfer 2011). As this builds on specialized C++ code, it runs considerably faster than the generic flexmix approach; however, it only covers this one particular type of model and offers fewer methods for specifying, inspecting, and assessing (fitted) models. In psychomix, both approaches are reconciled by optionally employing the mRm solution as an input to the flexmix routines.
In the following, we first briefly review both Rasch and mixture models and combine them in a general Rasch mixture framework (Section 2).Subsequently, the R implementation in psychomix is introduced (Section 3), illustrated by means of simulated data, and applied in practice to a study of verbally aggressive behavior (Section 4).Concluding remarks are provided in Section 5.

Rasch mixture models
In the following, we first provide a short introduction to the Rasch model, subsequently outline the basics of mixture models in general, and finally introduce a general class of Rasch mixture models along with the corresponding estimation techniques.

The Rasch model
Success in solving an item or agreeing with it is coded as "1", while "0" codes the opposite response. The model suggested by Rasch (1960) uses the person's ability $\theta_i$ ($i = 1, \dots, n$) and the item's difficulty $\beta_j$ ($j = 1, \dots, m$) to model the response $y_{ij}$ of person $i$ to item $j$:

$$P(Y_{ij} = y_{ij} \mid \theta_i, \beta_j) = \frac{\exp\{y_{ij}(\theta_i - \beta_j)\}}{1 + \exp\{\theta_i - \beta_j\}}. \tag{1}$$

Under the assumption of independence, both across persons and across items within persons (see Fischer and Molenaar 1995, Chapter 1), the likelihood for the whole sample $y = (y_{ij})_{n \times m}$ can be written as the product of the likelihood contributions from Equation 1 for all combinations of subjects and items. It is parameterized by the vector of all person parameters $\theta = (\theta_1, \dots, \theta_n)^\top$ and the vector of all item parameters $\beta = (\beta_1, \dots, \beta_m)^\top$:

$$L(\theta, \beta) = \prod_{i=1}^{n} \prod_{j=1}^{m} \frac{\exp\{y_{ij}(\theta_i - \beta_j)\}}{1 + \exp\{\theta_i - \beta_j\}}. \tag{2}$$
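To make Equations 1 and 2 concrete, here is a minimal Python sketch (Python is used only for illustration; the function names rasch_prob and rasch_loglik are hypothetical and not part of psychomix or psychotools). It evaluates the response probability and the joint log-likelihood for given person and item parameters:

```python
import math

def rasch_prob(theta, beta):
    """P(Y = 1 | theta, beta) under the Rasch model (Equation 1)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def rasch_loglik(y, theta, beta):
    """Joint log-likelihood log L(theta, beta) of an n x m 0/1 matrix y (Equation 2)."""
    ll = 0.0
    for i, row in enumerate(y):
        for j, y_ij in enumerate(row):
            eta = theta[i] - beta[j]
            # log of exp(y * eta) / (1 + exp(eta))
            ll += y_ij * eta - math.log1p(math.exp(eta))
    return ll

y = [[1, 0, 1], [0, 0, 1]]
theta = [0.5, -1.0]
beta = [-0.5, 0.0, 1.0]
print(rasch_loglik(y, theta, beta))
```

Working on the log scale, as here, avoids numerical underflow when the product in Equation 2 runs over many persons and items.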
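On the basis of the number of correctly solved items, the so-called "raw" scores $r_i = \sum_{j=1}^{m} y_{ij}$, the likelihood can be factorized into a conditional likelihood of the item parameters $h(\cdot)$ and the score probabilities $g(\cdot)$:

$$L(\theta, \beta) = f(y \mid \theta, \beta) = h(y \mid r, \theta, \beta)\, g(r \mid \theta, \beta). \tag{3}$$

Because the scores $r$ are sufficient statistics for the person parameters $\theta$, the likelihood of the item parameters $\beta$ conditional on the scores $r$ does not depend on the person parameters $\theta$:

$$L(\theta, \beta) = h(y \mid r, \beta)\, g(r \mid \theta, \beta). \tag{4}$$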
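The sufficiency argument behind Equation 4 can be checked numerically. The following Python sketch (hypothetical helper names; brute-force enumeration that is only feasible for a handful of items) divides the probability of a response pattern by the probability of its raw score, i.e., it computes $h = f/g$ as in Equation 3, and shows that the result does not change with $\theta$:

```python
import itertools
import math

def pattern_prob(y, theta, beta):
    """P(y | theta, beta) for one response pattern (product of Equation 1)."""
    p = 1.0
    for y_j, b in zip(y, beta):
        eta = theta - b
        p *= math.exp(y_j * eta) / (1.0 + math.exp(eta))
    return p

def conditional_prob(y, theta, beta):
    """f(y | theta, beta) / g(r | theta, beta): the pattern probability
    divided by the probability of observing its raw score r."""
    r = sum(y)
    g = sum(pattern_prob(z, theta, beta)
            for z in itertools.product([0, 1], repeat=len(beta)) if sum(z) == r)
    return pattern_prob(y, theta, beta) / g

beta = [-1.0, 0.0, 0.5, 1.2]
y = (1, 0, 1, 0)
for theta in (-2.0, 0.0, 3.0):
    # the same value is printed for very different abilities
    print(theta, conditional_prob(y, theta, beta))
```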
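The conditional likelihood of the item parameters takes the form

$$h(y \mid r, \beta) = \prod_{i=1}^{n} h(y_i \mid r_i, \beta) = \prod_{i=1}^{n} \frac{\exp\left(-\sum_{j=1}^{m} y_{ij}\beta_j\right)}{\gamma_{r_i}(\beta)}, \tag{5}$$

where $\gamma_{r_i}(\cdot)$ is the elementary symmetric function of order $r_i$, capturing all possible response patterns leading to a certain score, i.e., $\gamma_r(\beta) = \sum_{z:\, \sum_j z_j = r} \exp\left(-\sum_{j=1}^{m} z_j \beta_j\right)$ (see Fischer and Molenaar 1995, Chapter 3, for details).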
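Computing $\gamma_r$ by enumerating all response patterns grows as $2^m$. A common alternative is the summation algorithm, which adds one item at a time. The Python sketch below uses it to evaluate the conditional log-likelihood of Equation 5; the names esf and cond_loglik are illustrative assumptions, and psychotools is not claimed to use exactly this scheme:

```python
import itertools
import math

def esf(beta):
    """gamma_0, ..., gamma_m of eps_j = exp(-beta_j) via the summation algorithm."""
    gamma = [1.0]
    for b in beta:
        eps = math.exp(-b)
        new = gamma + [0.0]
        for r in range(1, len(new)):
            new[r] += eps * gamma[r - 1]  # new item unsolved (gamma[r]) or solved
        gamma = new
    return gamma

def cond_loglik(y, beta):
    """log h(y | r, beta) of a 0/1 response matrix y (Equation 5)."""
    gamma = esf(beta)
    ll = 0.0
    for row in y:
        r = sum(row)
        ll += -sum(y_j * b for y_j, b in zip(row, beta)) - math.log(gamma[r])
    return ll

beta = [-1.0, 0.0, 0.5, 1.2]
y = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
print(cond_loglik(y, beta))

# Brute-force check for r = 2: sum over all patterns with two solved items.
brute = sum(math.exp(-sum(z_j * b for z_j, b in zip(z, beta)))
            for z in itertools.product([0, 1], repeat=len(beta)) if sum(z) == 2)
print(esf(beta)[2], brute)
```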
There are several approaches to estimating the Rasch model. Joint maximum likelihood (ML) estimation of $\beta$ and $\theta$ is inconsistent, so two other approaches have been established. Both are two-step approaches but differ in the way the person parameters $\theta$ are handled. For marginal ML estimation, a distribution for $\theta$ is assumed and integrated out in $L(\theta, \beta)$, or equivalently in $g(r \mid \theta, \beta)$. In the conditional ML approach, only the conditional likelihood of the item parameters $h(y \mid r, \beta)$ from Equation 5 is maximized for estimating the item parameters.
Technically, this is equivalent to maximizing $L(\theta, \beta)$ with respect to $\beta$ if one assumes that $g(r \mid \delta) = g(r \mid \theta, \beta)$ depends neither on $\theta$ nor on $\beta$, but potentially on other parameters $\delta$.
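As an illustration of conditional ML estimation, the following Python sketch maximizes Equation 5 by plain gradient ascent, identifying the scale through the restriction $\sum_j \beta_j = 0$. It is a didactic stand-in, not the algorithm used by RaschModel.fit(); the function cml_fit, its step size, and the simulated design are assumptions. The gradient uses $\partial \log \gamma_r / \partial \beta_j = -\varepsilon_j \gamma_{r-1}^{(j)} / \gamma_r$ with $\varepsilon_j = \exp(-\beta_j)$, where $\gamma^{(j)}$ omits item $j$:

```python
import math
import random

def esf(eps):
    """Elementary symmetric functions gamma_0, ..., gamma_m of the values eps,
    built up one item at a time (summation algorithm)."""
    gamma = [1.0]
    for e in eps:
        new = gamma + [0.0]
        for r in range(1, len(new)):
            new[r] += e * gamma[r - 1]
        gamma = new
    return gamma

def cml_fit(y, n_iter=2000, step=0.5):
    """Conditional ML item parameters via gradient ascent on the average
    conditional log-likelihood, with sum(beta) = 0 for identification."""
    n, m = len(y), len(y[0])
    col_sums = [sum(row[j] for row in y) for j in range(m)]
    score_counts = [0] * (m + 1)
    for row in y:
        score_counts[sum(row)] += 1
    beta = [0.0] * m
    for _ in range(n_iter):
        eps = [math.exp(-b) for b in beta]
        gamma = esf(eps)
        grad = []
        for j in range(m):
            gamma_j = esf(eps[:j] + eps[j + 1:])  # ESFs without item j
            # expected number solving item j given the raw scores; persons with
            # r = 0 or r = m cancel out and add nothing to the gradient
            expected = sum(score_counts[r] * eps[j] * gamma_j[r - 1] / gamma[r]
                           for r in range(1, m + 1))
            grad.append((expected - col_sums[j]) / n)
        beta = [b + step * g for b, g in zip(beta, grad)]
        center = sum(beta) / m
        beta = [b - center for b in beta]
    return beta

# Simulated data with known item difficulties (an illustrative design).
random.seed(1)
true_beta = [-1.5, -0.5, 0.5, 1.5]
y = []
for _ in range(3000):
    theta = random.gauss(0.0, 1.0)
    y.append([1 if random.random() < 1.0 / (1.0 + math.exp(-(theta - b))) else 0
              for b in true_beta])
est = cml_fit(y)
print([round(b, 2) for b in est])
```

Because only the score counts and item totals enter the gradient, each iteration needs the elementary symmetric functions but never the person parameters.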
In R, the ltm package (Rizopoulos 2006) uses the marginal ML approach while the eRm package (Mair and Hatzinger 2007) employs the conditional ML approach, i.e., uses and reports only the conditional part of the likelihood in the estimation of $\beta$. The latter approach is also taken by the RaschModel.fit() function in the psychotools package (Zeileis et al. 2011).

Mixture models
Mixture models are a generic approach for modeling data that is assumed to stem from different groups (or clusters) but group membership is unknown.

The likelihood $f(\cdot)$ of such a mixture model is a weighted sum (with prior weights $\pi_k$) of the likelihoods from several components $f_k(\cdot)$ representing the different groups:

$$f(y_i) = \sum_{k=1}^{K} \pi_k f_k(y_i). \tag{6}$$

Generally, the components $f_k(\cdot)$ can be densities or (regression) models. Typically, all $K$ components $f_k(\cdot)$ are assumed to be of the same type $f(y \mid \xi_k)$, distinguished through their component-specific parameter vector $\xi_k$.
If variables are present that do not influence the components $f_k(\cdot)$ themselves but rather the prior class membership probabilities $\pi_k$, they can be incorporated in the model as so-called concomitant variables (Dayton and Macready 1988). In the psychometric literature, such covariates predicting latent information are also employed, e.g., by Tay et al. (2011), who advocate a unifying IRT framework that also optionally encompasses concomitant information (labeled MM-IRT-C for mixed-measurement IRT with covariates). To embed such concomitant variables $x_i$ into the general mixture model notation, a model for the component membership probability $\pi(k \mid x_i, \alpha)$ with parameters $\alpha$ is employed:

$$f(y_i \mid x_i) = \sum_{k=1}^{K} \pi(k \mid x_i, \alpha)\, f(y_i \mid \xi_k),$$

where commonly a multinomial logit model is chosen to parametrize $\pi(k \mid x_i, \alpha)$, i.e., $\pi(k \mid x_i, \alpha) = \exp(x_i^\top \alpha_k) / \sum_{u=1}^{K} \exp(x_i^\top \alpha_u)$ with $\alpha_1 = 0$ for identifiability (see e.g., Grün and Leisch 2008; Tay et al. 2011). Note that the multinomial model collapses to separate $\pi_k$ ($k = 1, \dots, K$) if there is only an intercept and no real concomitants in $x_i$.
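A minimal Python sketch of the concomitant weights and the resulting mixture density, assuming a multinomial logit $\pi(k \mid x_i, \alpha) = \exp(x_i^\top\alpha_k)/\sum_u \exp(x_i^\top\alpha_u)$ with the first component as baseline ($\alpha_1 = 0$). The component densities passed to mixture_density are placeholders, and both function names are hypothetical:

```python
import math

def concomitant_weights(x, alpha):
    """Multinomial logit weights pi(k | x, alpha); alpha[k] holds the
    coefficients of component k, and alpha[0] is the all-zero baseline."""
    eta = [sum(a * v for a, v in zip(alpha_k, x)) for alpha_k in alpha]
    top = max(eta)
    w = [math.exp(e - top) for e in eta]  # shift by max(eta) for stability
    total = sum(w)
    return [v / total for v in w]

def mixture_density(component_densities, weights):
    """f(y_i | x_i) = sum_k pi(k | x_i, alpha) * f(y_i | xi_k)."""
    return sum(w * f for w, f in zip(weights, component_densities))

# Intercept-only x: the same weights for every person, i.e., plain pi_k.
alpha = [[0.0], [0.4], [-0.3]]
print(concomitant_weights([1.0], alpha))

# Intercept plus one covariate: the weights now vary with x.
alpha_x = [[0.0, 0.0], [0.4, 1.0]]
print(concomitant_weights([1.0, 0.0], alpha_x), concomitant_weights([1.0, 1.0], alpha_x))
```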

Flavors of Rasch mixture models
When combining the general mixture model framework from Equation 6 with the Rasch model based on Equation 1, several options are conceivable for two of the building blocks. First, the component weights can be estimated via a separate parameter $\pi_k$ for each component or via a concomitant variable model $\pi(k \mid x_i, \alpha)$ with parameters $\alpha$. Second, the full likelihood function $f(y_i \mid \xi_k)$ of the components needs to be defined. If a conditional ML approach is adopted, the conditional likelihood $h(y_i \mid r_i, \beta)$ from Equation 5 is clearly one part, but various choices for modeling the score probabilities are available. One option is to model each score probability with its own parameter, $g(r_i) = \psi_{r_i}$, while another (more parsimonious) option is to adopt a parametric distribution with fewer parameters (Rost and von Davier 1995). Note that while for a single-component model the estimates of the item parameters $\beta$ are invariant to the choice of the score probabilities (as long as these do not depend on $\beta$), this is no longer the case for a mixture model with $K \ge 2$, because the score probabilities then enter the posterior probabilities that weight each observation within each component.

Rost's original parametrization
One of these possible mixtures, the so-called "mixed Rasch model" introduced by Rost (1990), is already well established in the psychometric literature. It models the score probabilities through separate parameters $g(r_i) = \psi_{r_i}$ (under the restriction that they sum to unity) and does not employ concomitant variables. The likelihood contribution of observation $i$ in Rost's mixture model can thus be written as

$$f(y_i \mid \pi, \psi, \beta) = \sum_{k=1}^{K} \pi_k\, h(y_i \mid r_i, \beta_k)\, \psi_{r_i, k}. \tag{7}$$

This particular parametrization is implemented in the R package mRm (Preinerstorfer 2011).
Since subjects who solve either none or all items (i.e., $r_i = 0$ or $r_i = m$, respectively) do not contribute to the conditional likelihood of the item parameters (only a single response pattern leads to each of these scores, so $h(y_i \mid r_i, \beta) = 1$ for any $\beta$), they cannot be allocated to any of the components in this parametrization. Hence, Rost (1990) proposed to remove those "extreme scorers" from the analysis entirely and fix the corresponding score probabilities $\psi_0$ and $\psi_m$ at 0. However, if one wishes to include these extreme scorers in the analysis, the corresponding score probabilities can be estimated through their relative frequency (across all components), and the remaining score probabilities within each component are rescaled to sum to unity together with those extreme score probabilities. Nevertheless, the extreme scorers still do not contribute to the estimation of the mixture itself.
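The rescaling of score probabilities when extreme scorers are kept can be sketched in a few lines of Python. This is an illustrative assumption of how the rescaling works (the function name is hypothetical), following the description above: the extreme-score probabilities are the pooled relative frequencies, and the within-component probabilities for $r = 1, \dots, m-1$ are multiplied by $1 - \hat\psi_0 - \hat\psi_m$:

```python
def rescale_scores(psi_inner, n_zero, n_full, n_total):
    """Combine within-component score probabilities for r = 1, ..., m-1 with the
    pooled relative frequencies of the extreme scores r = 0 and r = m.
    Returns probabilities for r = 0, ..., m that sum to one."""
    p0 = n_zero / n_total
    pm = n_full / n_total
    scale = 1.0 - p0 - pm
    return [p0] + [scale * p for p in psi_inner] + [pm]

# m = 4 items: psi_inner covers r = 1, 2, 3; 10 and 30 of 400 persons score 0 and 4.
probs = rescale_scores([0.2, 0.3, 0.5], n_zero=10, n_full=30, n_total=400)
print(probs)
```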

Other score distributions
As noted by Rost and von Davier (1995), the disadvantage of this saturated model for the raw score probabilities is that many parameters need to be estimated ($K \times (m - 2)$, i.e., $m - 1$ non-extreme score probabilities per component minus one sum-to-unity restriction, not counting potential extreme scorers) that are typically not of interest. To check for DIF, the item parameters are of prime importance, while the raw score distribution can be regarded as a nuisance term. This problem can be alleviated by embedding the model from Equation 7 into a more general framework that also encompasses more parsimonious parametrizations. More specifically, a conditional logit model can be established containing some auxiliary regressors $z_{r_i}$ with coefficients $\delta$:

$$g(r_i \mid \delta) = \frac{\exp(z_{r_i}^\top \delta)}{\sum_{j=1}^{m-1} \exp(z_j^\top \delta)}. \tag{8}$$
The saturated model $g(r_i) = \psi_{r_i}$ is a special case obtained when constructing the auxiliary regressor from indicator (dummy) variables for the raw scores $2, \dots, m-1$:

$$z_{r_i} = \left(I(r_i = 2), \dots, I(r_i = m-1)\right)^\top.$$

As an alternative, Rost and von Davier (1995) suggest a specification with only two parameters that link to the mean and variance of the score distribution, respectively. More specifically, the auxiliary regressor is

$$z_{r_i} = \left(r_i / m,\; 4 r_i (m - r_i) / m^2\right)^\top,$$

so that $\delta$ pertains to the vector of location and dispersion parameters of the score distribution.
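The conditional logit score model, $g(r \mid \delta) = \exp(z_r^\top\delta)/\sum_{s=1}^{m-1}\exp(z_s^\top\delta)$, can be sketched in Python with both regressor choices. The dummy coding uses score 1 as the reference, and the mean/variance regressors are taken as $z_r = (r/m,\ 4r(m-r)/m^2)^\top$; treat the exact scaling of these regressors and all function names as assumptions of this sketch rather than as psychomix's internal code:

```python
import math

def score_probs(delta, z):
    """Conditional logit g(r | delta) over the non-extreme scores r = 1, ..., m-1;
    z maps each score r to its auxiliary regressor vector z_r."""
    eta = {r: sum(d * v for d, v in zip(delta, z_r)) for r, z_r in z.items()}
    top = max(eta.values())
    w = {r: math.exp(e - top) for r, e in eta.items()}
    total = sum(w.values())
    return {r: v / total for r, v in w.items()}

def saturated_regressors(m):
    """Dummy variables for scores 2, ..., m-1 (score 1 is the reference)."""
    return {r: [1.0 if r == s else 0.0 for s in range(2, m)] for r in range(1, m)}

def meanvar_regressors(m):
    """Location and dispersion regressors (r/m, 4 r (m - r) / m^2)."""
    return {r: [r / m, 4.0 * r * (m - r) / m ** 2] for r in range(1, m)}

m = 10
sat = score_probs([0.0] * (m - 2), saturated_regressors(m))  # m - 2 = 8 coefficients
mv = score_probs([1.0, 2.0], meanvar_regressors(m))          # only 2 coefficients
print(sat[5], mv[9])
```

With $\delta = 0$ the saturated version assigns equal probability to all $m - 1$ non-extreme scores, while the mean/variance version needs only two coefficients per component regardless of $m$.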

General Rasch mixture model
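Combining all elements of the likelihood yields a more general specification of the Rasch mixture model:

$$f(y_i \mid x_i, \alpha, \beta, \delta) = \sum_{k=1}^{K} \pi(k \mid x_i, \alpha)\, h(y_i \mid r_i, \beta_k)\, g(r_i \mid \delta_k), \tag{9}$$

with (a) the concomitant model $\pi(k \mid x_i, \alpha)$ for modeling component membership, (b) the component-specific conditional likelihood of the item parameters given the scores $h(y_i \mid r_i, \beta_k)$, and (c) the component-specific score distribution $g(r_i \mid \delta_k)$.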

Parameter estimation
Parameter estimation for mixture models is usually done via the expectation-maximization (EM) algorithm (Dempster, Laird, and Rubin 1977). It treats the unknown group membership as missing data and maximizes the likelihood on the basis of the observed values only. It iterates between two steps until convergence: estimation of group membership given the current parameters (E-step) and estimation of the components given the estimated membership (M-step).
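In the E-step, the posterior probability of each observation for each of the $K$ components is estimated through

$$\hat p_{ik} = \frac{\hat\pi_k\, f(y_i \mid \hat\xi_k)}{\sum_{u=1}^{K} \hat\pi_u\, f(y_i \mid \hat\xi_u)}, \tag{10}$$

using the parameter estimates for $\pi$ and $\xi$ from the previous iteration. In the case of concomitant variables, the component weights are $\hat\pi_{ik} = \pi(k \mid x_i, \hat\alpha)$.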
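A Python sketch of the E-step posterior $\hat p_{ik} = \hat\pi_{ik} f(y_i \mid \hat\xi_k) / \sum_u \hat\pi_{iu} f(y_i \mid \hat\xi_u)$, computed on the log scale because the component likelihoods of long response patterns can be tiny. The function name and input values are hypothetical:

```python
import math

def e_step(log_weights, log_densities):
    """Posterior probabilities p_ik from log prior weights log pi_ik and
    component log-likelihoods log f(y_i | xi_k), using log-sum-exp."""
    posteriors = []
    for lw_i, ld_i in zip(log_weights, log_densities):
        a = [lw + ld for lw, ld in zip(lw_i, ld_i)]
        top = max(a)
        total = sum(math.exp(v - top) for v in a)
        posteriors.append([math.exp(v - top) / total for v in a])
    return posteriors

# Two observations, two components with equal prior weights.
log_w = [[math.log(0.5), math.log(0.5)]] * 2
log_f = [[-3.0, -5.0], [-800.0, -801.0]]  # exp(-800) would underflow to 0
post = e_step(log_w, log_f)
print(post)
```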
In the M-step, the parameters of the mixture are re-estimated with the posterior probabilities as weights. Thus, observations deemed unlikely to belong to a certain component have little influence on estimation within this component. For each component, the weighted ML estimation can be written as

$$\hat\xi_k = \operatorname*{argmax}_{\xi_k} \sum_{i=1}^{n} \hat p_{ik} \log f(y_i \mid \xi_k) = \operatorname*{argmax}_{\beta_k, \delta_k} \sum_{i=1}^{n} \hat p_{ik} \left\{\log h(y_i \mid r_i, \beta_k) + \log g(r_i \mid \delta_k)\right\}, \tag{11}$$

which for the Rasch model amounts to separately maximizing the weighted conditional log-likelihood for the item parameters and the weighted score log-likelihood.
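For the saturated score model, the weighted score part of the M-step has a closed form: each $\psi_{r,k}$ is the posterior-weighted relative frequency of score $r$. A Python sketch (hypothetical function name; extreme scorers excluded as in Rost's original parametrization) follows; the item parameters would be updated separately by weighted conditional ML:

```python
def weighted_score_probs(scores, weights, m):
    """Saturated M-step for one component: psi_r is the posterior-weighted
    relative frequency of score r among the non-extreme scores 1, ..., m-1."""
    totals = {r: 0.0 for r in range(1, m)}
    for r, w in zip(scores, weights):
        if 0 < r < m:  # extreme scorers are left out
            totals[r] += w
    s = sum(totals.values())
    return {r: t / s for r, t in totals.items()}

scores = [1, 2, 2, 3, 0]
p_k = [0.9, 0.1, 0.8, 0.5, 0.7]  # posterior probabilities for component k
psi_k = weighted_score_probs(scores, p_k, m=4)
print(psi_k)
```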
The concomitant model can be estimated separately from the posterior probabilities, e.g., for a multinomial model:

$$\hat\alpha = \operatorname*{argmax}_{\alpha} \sum_{i=1}^{n} \sum_{k=1}^{K} \hat p_{ik} \log \pi(k \mid x_i, \alpha). \tag{12}$$

Finally, note that the number of components $K$ is not a standard model parameter (because the likelihood regularity conditions do not apply) and thus it is not estimated through the EM algorithm. It needs to be chosen either by the practitioner or by model selection techniques such as information criteria, as illustrated in the following examples.

User interface
The function raschmix() can be used to fit the different flavors of Rasch mixture models described in Section 2.3: with or without concomitant variables in $\pi(k \mid x_i, \alpha)$, and with different score distributions $g(r_i \mid \delta_k)$ (saturated vs. mean/variance parametrization). The function's synopsis is

raschmix(formula, data, k, subset, weights, scores = "saturated",
  nrep = 3, cluster = NULL, control = NULL,
  verbose = TRUE, drop = TRUE, unique = FALSE, which = NULL,
  gradtol = 1e-6, deriv = "sum", hessian = FALSE, ...)

where the lines of arguments pertain to (1) data/model specification processed within raschmix(), (2) control arguments for fitting a single mixture model, (3) control arguments for iterating across mixtures over a range of numbers of components K, all passed to stepFlexmix(), and (4) control arguments for fitting each model component within a mixture (i.e., the M-step) passed to RaschModel.fit(). Details are provided below, focusing on usage in practice first.
A formula interface with the usual formula, data, subset, and weights arguments is used: the left-hand side of the formula sets up the response matrix y and the right-hand side the concomitant variables x (if any). The response may be provided by a single matrix or a set of individual dummy vectors, both of which may be contained in an optional data frame. Example usages are raschmix(resp ~ 1, ...) if the matrix resp is an object in the working environment, or raschmix(item1 + item2 + item3 ~ 1, data = d, ...) if the item* vectors are in the data frame d. In both cases, ~ 1 signals that there are no concomitant variables; if there were, they could be specified as raschmix(resp ~ conc1 + conc2, ...).
The scores argument can be set to either "saturated" (see Equation 7) or "meanvar" for the mean/variance specification of Rost and von Davier (1995). Finally, the number of components K of the mixture is specified through k, which may be a vector, in which case a mixture model is fitted for each element.
To control the EM algorithm for fitting the specified mixture models, cluster may optionally specify starting probabilities pik, and control can set certain control arguments through a named list or an object of class "FLXcontrol". One of these control arguments, minprior, sets the minimum prior probability for every component. If, in an iteration of the EM algorithm, any component has a prior probability smaller than minprior, it is removed from the mixture in the next iteration. The default is 0, i.e., avoiding such shrinkage of the model. If cluster is not provided, nrep different random initializations are employed, keeping only the best solution (to avoid local optima). Finally, cluster can be set to "mrm", in which case the fast C++ implementation from mRm (Preinerstorfer 2011) is leveraged to generate optimized starting values. Again, the best solution of nrep runs of mrm() is used. Note that as of version 1.0 of mRm only the model from Equation 7 is supported in mrm(), resulting in suboptimal (but potentially still useful) posterior probabilities pik for any other model flavor.
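The minprior rule can be pictured with the following Python sketch. It is a hypothetical illustration, not flexmix's code: it takes each component's prior as its mean posterior probability, removes components below minprior, and (as an assumption) renormalizes the remaining posteriors of each observation:

```python
def drop_small_components(posteriors, minprior):
    """Hypothetical minprior rule: drop components whose prior (mean posterior)
    is below minprior and renormalize each observation's remaining posteriors."""
    n, k = len(posteriors), len(posteriors[0])
    priors = [sum(row[j] for row in posteriors) / n for j in range(k)]
    keep = [j for j in range(k) if priors[j] >= minprior]
    reduced = []
    for row in posteriors:
        kept = [row[j] for j in keep]
        s = sum(kept)
        # fall back to equal weights if all mass was on removed components
        reduced.append([v / s for v in kept] if s > 0 else [1.0 / len(keep)] * len(keep))
    return keep, reduced

post = [[0.9, 0.08, 0.02], [0.1, 0.85, 0.05], [0.5, 0.48, 0.02]]
keep, reduced = drop_small_components(post, minprior=0.05)
print(keep, reduced)
```

With minprior = 0, as in the default, no component can fall below the threshold, so nothing is removed.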
Internally, stepFlexmix() is called to fit all individual mixture models and takes the control arguments verbose, drop, and unique. If k is a vector, the whole set of models is returned by default, but one may choose to select only the best model according to an information criterion via which. For example, raschmix(resp, k = 1:3, which = "AIC", ...) or raschmix(resp ~ 1, data = d, k = 1:4, which = "BIC", ...).
The arguments gradtol, deriv, and hessian are used to control the estimation of the item parameters in each M-step (Equation 11), carried out via RaschModel.fit(). Function raschmix() returns objects of class "raschmix" or "stepRaschmix", depending on whether a single mixture model or multiple mixture models are fitted. These classes extend "flexmix" and "stepFlexmix", respectively; for more technical details, see the next section.
Table 1 gives an overview of the standard methods for extracting or displaying information that are available for "raschmix" objects, either directly or by inheritance.

Internal structure
As briefly mentioned above, raschmix() leverages the flexmix package (Leisch 2004;Grün and Leisch 2008) and particularly its stepFlexmix() function for the estimation of (sets of) mixture models.
The flexmix package is designed specifically to provide the infrastructure for flexible mixture modelling via the EM algorithm, where the type of a mixture model is determined through the model employed in the components.In the estimation process, this component model definition corresponds to the definition of the M-step (Equation 11).Consequently, the flexmix package provides the framework for fitting mixture models by leveraging the modular structure of the EM algorithm.Provided with the right M-step, flexmix takes care of the data handling and iterating estimation through both E-step and M-step.
The M-step needs to be provided in the form of a flexmix driver inheriting from class "FLXM" (see Grün and Leisch 2008, for details).The psychomix package includes such a driver function: FLXMCrasch() relies on the function RaschModel.fit() from the psychotools package for estimation of the item parameters (i.e., maximization of the conditional likelihood from Equation 5) and adds different estimates of raw score probabilities depending on their parameterization.
The reason for employing RaschModel.fit() rather than one of the more established Rasch model packages such as eRm or ltm is speed: RaschModel.fit() has been designed with reduced flexibility in order to save time when it is refitted multiple times, as in Rasch mixture models or in Rasch trees in the psychotree package (Strobl et al. 2011).
In the flexmix package, two fitting functions are provided.flexmix() is designed for fitting one model once and returns an object of class "flexmix".stepFlexmix() extends this so that either a single model or several models can be fitted.It also provides the functionality to fit each model repeatedly to avoid local optima.
When fitting models repeatedly, only the solution with the highest likelihood is returned.Thus, if stepFlexmix() is used to repeatedly fit a single model, it returns an object of class "flexmix".If stepFlexmix() is used to fit several models (repeatedly or just once), it returns an object of class "stepFlexmix".
This principle extends to raschmix(): If it is used to fit a single model, the returned object is of class "raschmix". If used for fitting multiple models, raschmix() returns an object of class "stepRaschmix". Both classes extend their flexmix counterparts.

Illustrations
For illustrating the flexible usage of raschmix(), we employ an artificial data set drawn from one of the three data generating processes (DGPs) suggested by Rost (1990) for the introduction of Rasch mixture models. All three DGPs are provided in the function simRaschmix() by setting the design to "rost1", "rost2", or "rost3", respectively. The DGPs contain mixtures of K = 1, 2, and 3 components, respectively, all with m = 10 items.

Here, a dataset from the second DGP is generated along with two artificial covariates x1 and x2.Covariate x1 is an informative binary variable (i.e., correlated with the true group membership) while x2 is an uninformative continuous variable.
R> set.seed(1)
R> r2 <- simRaschmix(design = "rost2")
R> d <- data.frame(
+   x1 = rbinom(nrow(r2), prob = c(0.4, 0.6)[attr(r2, "group")], size = 1),
+   x2 = rnorm(nrow(r2))  # assumed; the definition of x2 is truncated in the source
+ )
R> d$resp <- r2  # assumed; needed for resp ~ x1 + x2 with data = d further below

The Rost (1990) version of the Rasch mixture model, i.e., with a saturated score model and without concomitant variables, is fitted for one to three components. As no concomitants are employed in this model flavor, the matrix r2 can be passed to raschmix() without a formula:

R> m1 <- raschmix(data = r2, k = 1:3)
R> m1

To inspect the results, the returned object can either be printed, as with m1 above, or plotted, yielding a visualization of information criteria (see Figure 1). Both the printed display and the visualization show a big difference in information criteria across the number of components K, with the minimum always being attained at K = 2, thus correctly recovering the two latent classes constructed in the underlying DGP.
The values of the information criteria can also be accessed directly via the functions of the corresponding names.To select a certain model from a "stepRaschmix" object, the getModel() function from the flexmix package can be employed.The specification of which model is to be selected can either be an information criterion, or the number of components as a string, or the index of the model in the original vector k.In this particular case, which = "BIC", which = "2", and which = 2 would all return the model with K = 2 components.

In addition to the item parameters, the parameters() function can also return the parameters of the "score" model and the "concomitant" model (if any). The type of parameters can be set via the which argument. By default, parameters() returns both item and score parameters.
A comparison between estimated and true class membership can be conducted using the clusters() function and the corresponding attribute of the data, respectively.As already noticeable from the item parameters, the first component of the mixture matches the second true group of the data and vice versa.This label-switching property of mixture models in general can also be seen in the cross-table of class memberships.We thus have 38 misclassifications among the 1628 observations.
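Because of label switching, counting misclassifications requires matching estimated components to true groups first. A small Python sketch (hypothetical helper; it assumes there are no more estimated components than true groups) searches all relabelings and keeps the one with the fewest disagreements:

```python
import itertools

def misclassifications(true_labels, est_labels):
    """Smallest number of disagreements over all relabelings of the estimated
    components, which removes the arbitrary label switching."""
    true_set = sorted(set(true_labels))
    est_set = sorted(set(est_labels))
    best = None
    for perm in itertools.permutations(true_set, len(est_set)):
        mapping = dict(zip(est_set, perm))
        errors = sum(mapping[e] != t for t, e in zip(true_labels, est_labels))
        best = errors if best is None else min(best, errors)
    return best

true_groups = [1, 1, 1, 2, 2, 2]
estimated = [2, 2, 1, 1, 1, 1]  # estimated component 2 corresponds to true group 1
print(misclassifications(true_groups, estimated))
```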

R> m2
Call: raschmix(data = r2, k = 1:3, scores = "meanvar")

Figure 2: True (black) and estimated (blue/red) item parameters for the two model specifications, "saturated" (left) and "meanvar" (right), for the artificial scenario 2 data. Each panel shows the centered item difficulty parameters of Comp. 1 and Comp. 2 for items 1 to 10.
As in the saturated version of the Rasch mixture model, all three information criteria prefer the two-component model.Thus, this version of a Rasch mixture model is also capable of recognizing the two latent classes in the data while using a more parsimonious parametrization with 23 instead of 35 parameters.
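The parameter counts 35 and 23 can be reproduced by simple bookkeeping, assuming $m = 10$ items, $K = 2$ components, $m - 1$ free item parameters per component, $m - 2$ (saturated) or 2 (mean/variance) score parameters per component, and $K - 1$ free weights. A Python sketch with the hypothetical helper n_params; its concomitant term assumes an intercept plus one slope per covariate for each non-baseline component:

```python
def n_params(m, K, scores="saturated", n_concomitant=0):
    """Free parameters of a Rasch mixture with m items and K components
    (non-extreme scores only)."""
    items = K * (m - 1)  # one restriction per component
    score = K * (m - 2) if scores == "saturated" else K * 2  # psi_r or (location, dispersion)
    weights = (K - 1) * (1 + n_concomitant)  # intercept (+ slopes) per non-baseline component
    return items + score + weights

print(n_params(10, 2), n_params(10, 2, scores="meanvar"))
```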

R> logLik(m2b)
'log Lik.' -8747.084 (df=23)
R> logLik(m1b)
'log Lik.' -8738.606 (df=35)

The estimated parameters of the distribution of the score probabilities can be accessed through parameters(), while the full set of score probabilities is returned by scoreProbs(). The resulting item parameters for this particular data set are virtually identical to those from the saturated version, as can be seen in Figure 2.
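Information criteria follow directly from such log-likelihoods and degrees of freedom. A Python sketch using the standard definitions AIC $= -2\log L + 2\,\mathrm{df}$ and BIC $= -2\log L + \log(n)\,\mathrm{df}$, applied to the two values printed for m1b and m2b; BIC additionally needs the number of observations $n$ used in the fit, which is left as an input here:

```python
import math

def aic(loglik, df):
    """Akaike information criterion: -2 log L + 2 df."""
    return -2.0 * loglik + 2.0 * df

def bic(loglik, df, n):
    """Bayesian information criterion: -2 log L + log(n) df."""
    return -2.0 * loglik + math.log(n) * df

aic_sat = aic(-8738.606, 35)      # m1b, saturated scores
aic_meanvar = aic(-8747.084, 23)  # m2b, mean/variance scores
print(round(aic_sat, 3), round(aic_meanvar, 3))
```

With these values, the mean/variance model has the smaller AIC (by about 7), so under this criterion saving 12 parameters outweighs its slightly lower log-likelihood.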
To demonstrate the use of a concomitant variable model for the weights of the mixture, the two artificial variables x1 and x2 are employed. They are added on the right-hand side of the formula, yielding a multinomial logit model for the component weights (which only comes into play if two or more components are specified via k).
R> cm2 <- raschmix(resp ~ x1 + x2, data = d, k = 2:3, scores = "meanvar")

The BIC is used to compare the models with and without concomitant variables. In both cases, the two true groups are recognized correctly, while the model with concomitants manages to employ the additional information and reaches a somewhat improved model fit.

Empirical application: Verbal aggression
The verbal aggression dataset (De Boeck and Wilson 2004) contains item response data from 316 first-year psychology students along with gender and trait anger (assessed by the Dutch adaptation of the state-trait anger scale) as covariates (Smits, De Boeck, and Vansteelandt 2004). The 243 women and 73 men responded to 24 items constructed in the following way: following the description of a frustrating situation, subjects are asked to agree or disagree with a possible reaction. The situations are described by the following four sentences:
S1: A bus fails to stop for me.
S2: I miss a train because a clerk gave me faulty information.
S3: The grocery store closes just as I am about to enter.
S4: The operator disconnects me when I had used up my last 10 cents for a call.
Each reaction begins with either "I want to" or "I do" and is followed by one of the three verbally aggressive reactions "curse", "scold", or "shout", e.g., "I want to curse", "I do curse", "I want to scold", or "I do scold".
For our illustration, we use only the first two sentences, which describe situations in which others are to blame. Extreme-scoring subjects, agreeing with either none or all of the responses, are removed.
The posterior probabilities for the three components can be visualized via histogram(va12_mix3), by default using a square-root scale (yielding a so-called rootogram), as shown in Figure 3. In the ideal case, the posterior probabilities of the observations for each component are either high or low, yielding a U-shape in all panels. Here, the components are separated acceptably well.
The item profiles in the three components can be visualized via plot(va12_mix3) or xyplot(va12_mix3), with the output of the latter being shown in Figure 4. The first six items are responses to the first sentence (bus), the remaining six refer to the second sentence (train). The six reactions are grouped in "want"/"do" pairs: first for "curse", then "scold", and finally "shout".
The third component displays a zigzag pattern which indicates that subjects in this component always find it easier or less extreme to "want to" react a certain way rather than to actually "do" react that way.In the other two components this want/do relationship is reversed, except for the shouting response (to either situation).
In the first component, there are no big differences in the estimated item parameters. Neither the situation (S1 or S2) nor the type of verbal response (curse, scold, or shout) is particularly hard to agree to for subjects in this component. In components 2 and 3, the situation is also not very relevant, but subjects differentiate between the three verbal responses. This is best visible in component 2, where item difficulty clearly increases from response "curse" to response "shout". Thus, shouting is perceived as the most extreme verbal response, while cursing is considered a comparably moderate response. In component 3 this pattern is also visible, albeit not as prominently as in component 2.
One could also consider the 3-component model with concomitant variables, as its BIC was almost equivalent to that of the model without concomitant variables. The estimated item parameters are virtually identical between both models and are hence not shown here. Nevertheless, the link between the concomitant variables and the latent classes may still be of interest:

R> parameters(getModel(va12_mix2, which = "3"), which = "concomitant")
            1           2          3
(Intercept) 0 -0.76040110 -3.6721134
gendermale  0  1.66471685  1.4177908
anger       0  0.01155322  0.1268023

The absolute sizes of the coefficients reflect that there may be some association with gender but less with the anger score. However, as there is a slight increase in BIC compared to the model without concomitants, the association with the covariates appears to be relatively weak. In comparison to other approaches exploring the association of class membership with covariates ex post (e.g., as in Cohen and Bolt 2005), the main advantage of the concomitant variables model lies in the simultaneous estimation of the mixture and the influence of covariates.
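To read these coefficients on the probability scale, the multinomial logit can be evaluated for a given covariate profile. The following Python sketch plugs in the printed estimates (component 1 is the baseline with all coefficients at 0); the anger score of 20 is a hypothetical value chosen only for illustration:

```python
import math

# Concomitant coefficients for components 1-3 as printed above.
coef = {
    "(Intercept)": [0.0, -0.76040110, -3.6721134],
    "gendermale": [0.0, 1.66471685, 1.4177908],
    "anger": [0.0, 0.01155322, 0.1268023],
}

def prior_probs(male, anger):
    """pi(k | x, alpha) from the multinomial logit concomitant model."""
    eta = [coef["(Intercept)"][k] + coef["gendermale"][k] * male + coef["anger"][k] * anger
           for k in range(3)]
    w = [math.exp(e) for e in eta]
    total = sum(w)
    return [v / total for v in w]

female = prior_probs(male=0, anger=20)  # anger = 20 is a hypothetical value
male = prior_probs(male=1, anger=20)
print([round(p, 3) for p in female], [round(p, 3) for p in male])
```

For any fixed anger score, being male multiplies the odds of component 2 versus component 1 by $\exp(1.665) \approx 5.3$.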

Summary
Mixtures of Rasch models are a flexible means of checking measurement invariance and testing for differential item functioning. Here, we establish a rather general unifying conceptual framework for Rasch mixture models along with the corresponding computational tools in the R package psychomix. In particular, this includes the original model specification of Rost (1990) as well as more parsimonious parametrizations (Rost and von Davier 1995), along with the possibility to incorporate concomitant variables predicting the latent classes (as in Tay et al. 2011).
The R implementation is based on the infrastructure provided by the flexmix package, allowing for convenient model specification and selection. The rich set of methods for flexmix objects is complemented by additional functions specifically designed for Rasch models, e.g., extracting different types of parameters in different transformations and visualizing the estimated component-specific item parameters in various ways. Optionally, speed gains can be obtained from utilizing the C++ implementation in the mRm package for selecting optimal starting values. Thus, psychomix provides a comprehensive and convenient toolbox for the application of Rasch mixture models in psychometric research practice.

Figure 4: Item profiles for the 3-component Rasch mixture model on verbal aggression data. Items 1-6 pertain to situation S1 (bus), items 7-12 to situation S2 (train), each in the following order: want to curse, do curse, want to scold, do scold, want to shout, do shout.

Table 1: Methods for objects of class "raschmix".
The estimated score probabilities of the illustrative model are approximately equal across components and roughly uniform.
As mentioned above, the parameters of the concomitant model can be accessed via the parameters() function, setting which = "concomitant". The influence of the informative covariate x1 is reflected in the large absolute coefficient, while the estimated coefficient for the noninformative covariate x2 is close to zero. The corresponding estimated item parameters, parameters(cm2b, "item"), are not very different from those of the previous models (and are hence not shown here). This illustrative application shows that the inclusion of concomitant variables can provide additional information, e.g., that x1 but not x2 is associated with class membership. Note also that this is picked up although only a rather weak association was simulated here.