Zero and One Inflated Item Response Theory Models for Bounded Continuous Data

Bounded continuous data are encountered in many applications of item response theory, including the measurement of mood, personality, and response times and in the analyses of summed item scores. Although different item response theory models exist to analyze such bounded continuous data, most models assume the data to be in an open interval and cannot accommodate data in a closed interval. As a result, ad hoc transformations are needed to prevent scores on the bounds of the observed variables. To motivate the present study, we demonstrate in real and simulated data that this practice of fitting open interval models to closed interval data can majorly affect parameter estimates even in cases with only 5% of the responses on one of the bounds of the observed variables. To address this problem, we propose a zero and one inflated item response theory modeling framework for bounded continuous responses in the closed interval. We illustrate how four existing models for bounded responses from the literature can be accommodated in the framework. The resulting zero and one inflated item response theory models are studied in a simulation study and a real data application to investigate parameter recovery, model fit, and the consequences of fitting the incorrect distribution to the data. We find that neglecting the bounded nature of the data biases parameters and that misspecification of the exact distribution may affect the results depending on the data generating model.

, in which respondents indicate their item response on a line segment. Furthermore, item response times on cognitive test items with item-level deadlines (as advocated by, e.g., Goldhammer [2015]) can be considered bounded continuous responses, and finally, summed dichotomous item scores or ordinal item scores can (pragmatically) be considered as bounded and approximately continuous in some situations (see Dolan, 1994;Rhemtulla et al., 2012).
Similarly as in IRT models for categorical responses, the key of IRT models for bounded continuous responses is to model the expected value of the item response variable for a given person as a function of the underlying person and item parameters. In this article, we focus on bounded IRT models that use a monotonic and S-shaped form for this function. This is similar to the well-known one-and two-parameter logistic models for dichotomous data (Birnbaum, 1968), but it is different from unfolding IRT models (Coombs, 1964;Roberts et al., 2000;see Noel, 2014, for an approach to bounded continuous IRT modeling) that adopt a nonmonotonic form for the response function and censored factor analysis (Muthen, 1989) that adopt a monotonic step function.
One of the first attempts to formulate an IRT model for bounded continuous responses has been by Samejima (1973; see also Ferrando, 2002). Although different special cases exist in this general model, in the most popular and practically feasible special case, the S B distribution (Johnson, 1949) is assumed for the conditional distribution of the responses. Other bounded IRT models have been proposed based on the beta distribution (Noel & Dauvier, 2007; see also Revuelta et al., 2022, for a related approach in the common factor model), the simplex distribution (Flores et al., 2020), the truncated normal distribution (Müller, 1987), and a distribution based on a truncated exponential function (Verhelst, 2019). In addition, an unbounded normal distribution has been proposed (Ferrando, 2009;Mellenbergh, 1994;Thissen et al., 1983), which is equivalent to the common linear factor model for the continuous responses (Jöreskog, 1971;Spearman, 1904). This article is motivated by two observations about conventional bounded IRT models: First, interestingly, despite the importance of bounded continuous data in the applications mentioned above, the existing IRT approaches have mostly focused on responses in the open interval (0, 1), but not on responses in the closed interval [0,1]. Exceptions are the approaches by Verhelst (2019) and Müller (1987) however; unfortunately, these models are challenging to estimate (see Verhelst, 2019) hampering practical applications. As a result, if respondents use the end points of the continuous measurement scale, which-at least in our experience-happens often, the data need to be arbitrarily transformed to prevent the 0 and 1 scores in the dataset to allow the application of bounded IRT models for the open interval (see, e.g., Noel & Dauvier, 2007). We will show below that this practice can majorly affect the parameter estimates of the bounded response model. Second, although there are Zero and One Inflated IRT different models available for bounded continuous responses, due to the lack of a common modeling framework, these models have not been compared directly in terms of parameter recovery, robustness to misspecification, model fit, and real data applications. Therefore, to address the two issues above, we present a zero and one inflated IRT modeling framework for bounded continuous responses. In this framework, it is straightforward to accommodate the existing bounded IRT models above. In addition, a general Bayesian estimation procedure is proposed to fit and compare the different models. The outline is as follows: First, we review the conventional bounded IRT models and derive a general zero and one inflated approach to accommodate closed interval responses. Next, we show in a real dataset and a simulated dataset that only relatively mild zero or one inflation can already substantially affect the person and item parameter estimates in the conventional bounded IRT models. We then present the Bayesian procedure to estimate the zero and one inflated bounded IRT models and to study model fit. After that, we present the results of two simulation studies to investigate parameter recovery and examine how misspecification of the conditional distribution of the responses affects the modeling results. Finally, we present an application to 22 personality scales to compare the different models empirically. We end with a general discussion.

IRT Models for Bounded Continuous Data
Let X 0 pi 2 ½L; U denote the continuous bounded item score of person p ¼ 1; . . . ; N to item i ¼ 1; . . . ; n that can take values between a theoretical lower bound L and upper bound U. Commonly L ¼ 0 and U ¼ 100 in the case of visual analogue scales, L ¼ 0 and U is equal to the item deadline (e.g., in seconds) in the case of response times and L ¼ 0 and U ¼ n in the case of summed dichotomous item scores (in the case of 0, 1 scoring). Next, these item scores X 0 pi are transformed using X pi ¼ ðX 0 pi À LÞ= ðU À LÞ, such that X pi 2 ½0; 1. As mentioned above, similarly to IRT models for categorical data, IRT models for bounded continuous data focus on EðX pi jy p ; τ i Þ, the expected response of person p on item i conditional on the underlying latent person parameter which is on the real line, that is, y p 2 R, and the underlying vector of item parameters, τ i 2 R m , where m denotes the number of item parameters in a given model. Commonly, the relation between EðX pi jy p ; τ i Þ and y p and τ i is characterized by an S-shaped function similarly to the well-known one-and two-parameter models for dichotomous data (but see Noel, 2014, for bounded continuous IRT models that adopt nonmonotonic response functions). However, in the one-and two-parameter IRT models, which are based on the Bernoulli distribution, there is only one natural parameter to be modeled, while for bounded continuous IRT models, there are different suitable distributions that commonly include more natural parameters.
In the below, we discuss four bounded continuous IRT models mentioned above that are based on different distributions of X pi conditional on y p and τ i : the bounded IRT model based on the S B distribution for the conditional distribution of X pi (Samejima, 1973), the beta-IRT model (Noel & Dauvier, 2007), the simplex-IRT model (Flores et al., 2020), and the normal-IRT model (Ferrando, 2009;Mellenbergh, 1994;Thissen et al., 1983). We do not consider the models that are based on truncated distributions (Müller, 1987;Verhelst, 2019) because these models are challenging to estimate as mentioned above (although parameter estimation is feasible using pairwise item parameter estimation, see Verhelst, 2019).

S B -IRT Model
Probably, one of the most well-known models for bounded continuous responses is the model by Samejima (1973). Although the framework outlined by Samejima is much broader, arguably one of the most important special cases (in terms of practical applicability and statistical properties, see Ferrando, 2002) is based on the S B distribution. The S B distribution arises if a normally distributed variable, Z, is transformed according to Z 0 ¼ cðZÞ, where cð:Þ is the logistic function defined by cðdÞ ¼ ½1 þ expðÀdÞ À1 : See Figure 1 (left) for some example plots of this distribution. If the mean of Z is modeled using a linear IRT FIGURE 1. Examples of the different distributions adopted by the bounded-item response theory models. The parameters used for the S B distribution for, respectively, the black solid, striped, and dotted lines are: m ¼ 0, 0.2, 0.1 and d ¼ 1.5, 2.0, and 3.0 and for the gray solid, striped, and dotted lines: m ¼ À1.5, À1, and 0 and d ¼ 1, 1, and 1. In addition, the parameters used for the beta distribution for, respectively, the black solid, striped, and dotted lines are: a ¼ 0:8; 2; and 4 and b ¼ 0:8; 2; and 4 and for the gray solid, striped, and dotted lines: a ¼ 2, 3, and 4 and b ¼ 5; 5; and 5. Finally, the parameters used for the simplex distribution for, respectively, the black solid, striped, and dotted lines are: m¼ 0.3, 0.4, and 0.5 and f ¼ 1, 1, and 1 and for the gray solid, striped, and dotted lines: m ¼ 0.3, 0.4, and 0.5 and f ¼ 4, 4, and 4. Zero and One Inflated IRT parameterization, the following model arises, which we will refer to as the S B -IRT model: where a i 2 R þ is an item discrimination parameter on the positive real line, b i 2 R is an item easiness parameter, and d i 2 R þ is a dispersion parameter. The expressions for the conditional mean and variance of X pi are complicated and are not provided here (but we refer the reader to the appendix of Johnson [1949]). However, most importantly, EðX pi jy p ; τ i Þ has a symmetric S-shaped curve with its maximum slope at EðX pi jy p ; τ i Þ ¼ 1 2 for y p ¼ À b i a i , similarly to the twoparameter model for dichotomous responses (Johnson, 1949;see Ferrando, 2002). In addition, characteristic for this model is that the test information function is constant across y p , that is: while for the IRT models considered next, the test information is not constant across y p .

Beta-IRT Model
The beta-IRT model (Noel & Dauvier, 2007) assumes a beta distribution for X pi with person and item-specific shape parameters, that is: where Gð:Þ is the gamma function defined by GðdÞ ¼ where a pi 2 R þ and b pi 2 R þ . See Figure 1 (middle) for some example plots of this distribution. In the beta-IRT model, a pi and b pi are given by Molenaar et al.
with a i , b i , and y p defined as before, and with o i 2 R , so that a dispersion parameter is defined as o For the beta-IRT model, the conditional mean and variance of X pi are, respectively, given by and VARðX pi jy p ; where cð:Þ is defined before. Thus, the conditional mean has the same parametric form as the two-parameter logistic model. The original model proposed by Noel and Dauvier (2007) did not contain a discrimination parameter, a i ; however, we added this parameter to the model to ensure comparability among the different models considered in this study. The test information function for this model is given by where Oð:Þ is the trigamma function defined by OðdÞ ¼ q 2 lnGðdÞ qd 2 . Note that this expression differs slightly from that of Noel and Dauvier (2007) as in their model a i ¼ 1. The individual terms in Iðy p Þ are the item information functions, which are unimodal functions similar to the item information function in the twoparametric logistic model for dichotomous data with the maximum information at a i y p þ b i ¼ 0 and with the information about y p decreasing as ja i y p þ b i j increases.

Simplex-IRT model
In the simplex-IRT model (Flores et al., 2020), the conditional distribution of X pi is assumed to follow a simplex distribution (Barndorff-Nielsen & Jørgensen, 1991), that is: with m pi 2 ð0; 1Þ and with dispersion parameter f i 2 R þ . See Figure 1 (right) for some example plots of this distribution. Although relatively less well known as compared to the beta distribution, the simplex distribution has previously been applied in a generalized linear mixed modeling framework to analyze proportions (see Zhang et al., 2016, for applications and an implementation in R). Using Zero and One Inflated IRT the simplex distribution, an IRT model is specified by submitting m pi to a twoparameter logistic model decomposition (Flores et al., 2020): where cð:Þ, a i , b i , and y p are defined as before. The conditional mean and variance for the simplex-IRT model are then, respectively, given by and VARðX pi jy p ; where Gð:Þ is the upper incomplete gamma function defined by Gðx; dÞ ¼ ð 1 x t dÀ1 expðÀtÞdt and where cð:Þ is defined before.
The simplex-IRT model above has originally been proposed by Flores et al (2020) as a measurement model response times. Here, we study the model in the broader context of measurement models for bounded continuous responses. As Flores et al. focused on response times modeling, they did not provide an expression for the test information function. However, it is straightforward to derive, that is: where the expectation is taken in the distribution of X pi , that is, f ð:Þ in Equation 7, m pi is given by Equation 8, and lnLðy p ; τ i Þ is the log-likelihood function based on Equation 7. The form of the item information functions (i.e., the individual terms in Iðy p Þ) differs importantly from those of the beta-IRT model. That is, for the simplex-IRT model, the item information is increasing toward both ends of the y p -scale, whereas for the beta-IRT model, the information is decreasing toward the ends of the y p -scale. Similar to the beta-IRT model, however, the item information function of the simplex-IRT model has a maximum at Àb i =a i . As the simplex-IRT model is not well studied yet and the item and test information functions have not been derived before, we provide some example item information plots in Figure 2. As mentioned in the figure caption, the item Molenaar et al. Examples of the simplex-item response theory item information function (left column) and the corresponding marginal distributions (right column) for a i ¼ 0.5, j i ¼ 15, and b i equal to À1, À0.5, 0, 0.5, or 1. The vertical striped line indicates y p ¼ Àb i =a i . Note that in all situations, the information increases to infinity for y !À1 and y ! 1, but this is not visible in all plots.
Zero and One Inflated IRT information in the simplex-IRT increases to infinity for y ! À1 and y ! 1. This can be verified from Equation 11: In the squared first-order derivative of the log-likelihood with respect to y, @lnLðy p ; τ i Þ @y p 2 , it holds that for m pi ! 0 and m pi ! 1, which happens for y ! À1 and y ! 1, @lnLðy p ; τ i Þ @y p 2 approaches 1. As a result, the item information also approaches 1.
In comparing the distributions from the different models in Figure 1, it can be seen that all models can account for bimodality to some degree, which has shown to be relevant in psychological data by Noel (2014). The models still however differ in the exact form of the bimodality, for instance, in the beta distribution, the two modes always occur for X pi ! 0 and X pi ! 1, while for the S B and the simplex distribution, both modes can occur for 0 > X pi > 1.

Normal-IRT Model
An unbounded normal distribution has also been proposed for bounded continuous data. The main focus of this article is on the bounded IRT models above, but we will also consider the normal-IRT model as a reference to compare the results to. We focus on the parameterization of Mellenbergh (1994) and Thissen et al. (1983), that is: where a i , b i , and y p are defined as before, and where s 2 i 2 R þ is a dispersion parameter. The conditional mean and variance of X pi in the normal-IRT model are, respectively, given by a i y p þ b i and s 2 . In addition, similar like the S B -IRT model, the test information function is constant and given by

Comparability of the Parameters
As the models differ in their formulation, question arises how the parameters can be compared across the models. First, if y p is identified in the same way across all models, its estimates can readily be compared. That is, as y p is a latent variable, it lacks a scale. By identifying this scale in the same way in all models (e.g., by imposing a Normal(0,1) distribution), y p estimates can be compared across models.

Molenaar et al.
For the a i and b i parameters, the estimates from the beta-IRT and simplex-IRT model can also readily be compared as both parameters occur in the same logistic function between the conditional mean of X pi and y p (i.e., Equations 6 and 9). However, for the S B -IRT model, these parameters are on a different scale as the conditional mean of X pi is not a logistic function of y p , even though the function is S-shaped. Yet, there is a transformation possible, which enables comparability of both a i and b i across models. First, as noted above, for the S B -IRT model, it holds that EðX pi jy p ; τ i Þ ¼ 1 2 for y p ¼ À b i a i , which is also true for the beta-IRT model and the simplex model. Therefore, by relying on b Ã i ¼ À b i a i , the item easiness parameters can be meaningfully compared across models. To enable comparison of the discrimination parameters, we focus on a Ã i , which denotes the maximum slope of EðX pi jy p ; τ i Þ with respect to y p . As is well known, in a logistic IRT model like Equations 6 and 9, the curve has its maximum slope at Although the S B -IRT model does not rely on a logistic function for EðX pi jy p ; τ i Þ, it has been shown that the maximum slope in the S B -IRT model also occurs at y p ¼ À b i a i with the slope being given by (see, e.g., Ferrando, 2002) As a result, using Equation 13, a Ã i from the S B -IRT model can meaningfully be compared to a Ã i from the beta-IRT and simplex-IRT models. As the normal-IRT model does not account for the bounded nature of the responses, it is misspecified by definition. However, the model may provide a reasonable approximation in some situations (Ferrando, 2002). To enable a meaningful comparison of the parameters, similarly as above, we define b Ã i as the y p value in the normal-IRT model for which EðX pi jy p ; In addition, because in the normal-IRT model the slope of EðX pi jy p ; τ i Þ with respect to y p is constant, a i is equal to the maximum slope, that is, a Ã i ¼ a i . Thus, using these results, the parameters from the normal-IRT model can be compared to the transformed parameters from the bounded IRT models.

Two Motivating Examples: Consequences of Zero and One Inflation
Except for the normal-IRT models, all models above are unsuitable for responses in the closed interval [0,1]. That is, for X pi ¼ 0 or X pi ¼ 1, density f ðX pi jy p ; τ i Þ in the beta-IRT model is either equal to 0 or equal to 1 (depending on a pi and b pi in Equation 3) making the log-likelihood infinite for observations on the bounds of the response variable. In addition, for the S B -IRT and simplex-IRT Zero and One Inflated IRT models, the densities are undefined at X pi ¼ 0 and X pi ¼ 1. In practice, researchers recode responses on the bounds to prevent problems with the likelihood and to enable application of the models above. That is, 0 responses are recoded, so that they are slightly above the lower bound (e.g., to 1e-5), and 1 responses are recoded, so that they are slightly below the upper bound (e.g., to 1 -1e-5). However, even though the likelihoods of the models are now tractable, the models cannot account for an excess of scores near the bounds. Below, we present a real data example and a simulated data example to show that the consequences of not accounting for zero and one inflation can be quite severe.

Example 1: Adjectives Checklist
We took two scales from the Adjectives Checklist (ACL; Gough, & Heilbrun, 1980) data (N ¼ 244Þ that are analyzed in more depth in the real data illustration section. The first scale is the Affiliation scale (10 bounded continuous items), for which both end points of the response scale are hardly used (i.e., there is hardly zero or one inflation, only four subjects used the lower end point once, the upper end point is not used at all). The second scale is the Abasement (also 10 bounded continuous items), for which the lower end point of the response scale is used more frequently resulting in zero inflation. That is, the lower end point of the scale is used in 10.75% of the responses on average across the items from the scale.
To these data, we fit the conventional beta-IRT model for open interval data from Equations 3 through 5 and the zero-one inflated extension proposed in this article. To enable application of the conventional model to the closed interval data from the ACL, we recoded 0 into 1e-5 and 1 into 1 -1e-5 for the conventional model application. See Figure 3 for a plot of the estimates of y p , b i , a i , and o i in the conventional model and the estimates in the proposed model. If the estimates from the two models agree, they should scatter around the straight gray line. As can be seen, for the Affiliation scale with hardly any zero inflation, the estimates seem to agree. However, for the Abasement scale with substantial zero inflation, the estimates are systematically different for the item parameters and are very variable in the case of y p as compared to the Affiliation scale without inflation.
The zero inflation in this real data example thus seems to affect the estimates from the conventional beta-IRT model. This is also true for the other models discussed above (the simplex and S B models). However, of course, a more systematic approach is needed to demonstrate this. Therefore, below we verify this finding in a simulated data example.

Example 2: Simulated Data
We simulated seven datasets according to the conventional beta-IRT model using N ¼ 244 and n ¼ 10 similar to the real data above. The true item parameters a i , b i , and o i were set to the estimates as found for the Affiliation scale Molenaar et al. Zero and One Inflated IRT above. Person parameters y p are drawn from a normal distribution. In these data, we increased the amount of zero and one inflation using the newly proposed model (which will be discussed in more detail below). Specifically, we considered the scenarios, in which 0%, approximately 5%, or approximately 10% of the scores on each item are in the lower and/or upper end point. 1 We considered the following scenarios: (1) no inflation, (2) 5% zero inflation, (3) 10% zero inflation, (4) 5% one inflation, (5) 10% one inflation, (6) 5% zero and 5% one inflation, and (7) 10% zero and 10% one inflation. See the left column of plots in Figure 4 or 5 for the distribution of Item 1 in Scenarios 1, 2, 3, 6, and 7.
To these seven datasets, we fit the conventional beta-IRT model for open interval data from Equations 3 through 5. To enable application of this model to the closed interval data from Scenarios 2 through 7, we recoded 0 into 1e-5 and 1 into 1 -1e-5. See Figures 4 and 5 for the estimates of, respectively, b i and y p in the conventional model (left column of plots) and the estimates in the proposed data generating model (right column of plots) in Scenarios 1, 2, 3, 6, and 7. As can be seen, for the proposed model, the estimates of both b i and y p seem acceptably close to the true parameter values, while for the conventional model, the estimates are biased for b i and have increased variability for y p . Results for Scenarios 4 and 5 (which are not in the figure) are comparable to the results from Scenarios 2 and 3. In addition, the effect on the discrimination parameters a i is comparable to the effect on y p (i.e., increased variability in the estimates). Overall, it seems thus desirable to have an IRT approach available to takes zero and one inflation into account.

A Zero and One Inflated Bounded IRT Approach
Here, we present the zero and one inflated approach illustrated above. The idea is that we model a dummy variable, Z pi , which codes three possible outcomes of the response process: Z pi ¼ 0 if respondent p decides to score on the lower boundary of item i. Z pi ¼ 1 if respondent p decides to score between the lower and upper boundary of item i. Z pi ¼ 2 if respondent p decides to score on the upper boundary of item i.
Next, Z pi is submitted to a logistic graded response model (Samejima, 1969) with category threshold parameter g 1i 0 and g 2i 0 , for which it holds that g 0 2i < g 0 1i , so that where a i and y p are from the model under consideration, and PðZ pi ! cjy p ; τ i Þ ¼ 1 for c ¼ 0. Next, to facilitate interpretation, we use g 0i ¼ Àg 0 1i and g 1i ¼ Àg 0 2i , so that in the final model (see below), parameter g 0i directly models the conditional probability of X pi ¼ 0, and parameter g 1i directly models the conditional probability of X pi ¼ 1. Note that now, g 1i > g 0i . The probability distribution of Z pi is then given by Molenaar et al. Zero and One Inflated IRT FIGURE 5. Left column: Histograms of an example item (Item 1) in the different scenarios. Middle column: Estimates of the conventional beta-item response theory model (xaxis) and the true values (y-axis) for y p in the different scenarios. Right column: Estimates of the proposed model (x-axis) and the true values (x-axis) for y p in the different scenarios. The gray line indicates a one-to-one correspondence.
The final model consists of the joint conditional density of Z pi and X pi , which will be denoted by kð:Þ. As Z pi is deterministically related to X pi , Z pi itself can be neglected. Therefore, the final model is defined according to where f ð:Þ corresponds to the density from the original models above and τ i contains the item parameters of that model including g 0i and g 1i . A mechanism similar to the above is used by Ferrari (2010, 2012) to model zero or one inflation in the beta distribution and beta regression models. In those models however, kðX pi jy p ; τ i Þ is estimated freely for X pi ¼ 0 or for X pi ¼ 1 (i.e., the proportion of zeros or ones in the data), while here these probabilities are constrained according to the IRT model. These constraints make sure that information about the IRT parameters a i , b i , and y p is drawn from the zero and one scores. If estimated freely in the present approach, the zero and one scores will not contribute to the parameter estimation of these IRT parameters. An addition difference between the present work and the work by Ospina and Ferrari is that we consider zero and one inflation simultaneously, so that Z pi has three levels as discussed above. Ospina and Ferrari only considered zero or one inflation, by which Z pi only has two levels. Accommodating both zero and one inflation is desirable in the present IRT case as in continuous items, both end points may be used by the subjects.
Due to the inflation mechanism introduced above, the test information function of the models will change. The derivation of the test information function for the zero and one inflated bounded IRT model is given in the Appendix. Most importantly, the item and test information includes a contribution by the information from the zero and one scores and a contribution by the regular test information function Iðy p Þ from the bounded IRT model f ðX pi jy p ; τ i Þ used for 0 < X pi < 1 in Equation 16. Note that for the S B -IRT model, the resulting test information function is not constant anymore but has an inverted U-shape. The exact shapes of the test information functions are illustrated later for the different models.

Parameter Estimation
We implemented the models above in a Bayesian Markov Chain Monte Carlo (MCMC) framework. If θ denotes a vector of y 1 , . . . , y N , T denotes a matrix with the stacked τ i vectors for i ¼ 1; . . . ; n, and X denotes the N Â n matrix of item responses, then, the full posterior is proportional to pðθ; TjXÞ / sðθ; TjXÞ; where kð:Þ is given above. For all models, τ i contains logða i Þ, b i , g 0i , and g 1i . In addition, τ i contains logðd i Þ for the S B -IRT model, o i for the beta-IRT model, and logðf i Þ for the simplex-IRT model. Note that the number of free parameters is thus equal to N Â 5n for all models. To facilitate parameter estimation, we estimate the untransformed parameters (i.e., a i and b i instead of a Ã i and b Ã i ) as this parameterization is more stable for b i . However, the parameters can always be transformed afterwards. In addition, we estimate logða i Þ to avoid sign switching during estimation. In all models, the prior distribution of y p , gðÞ, is specified to be a Normal(0,1) distribution, and the prior distribution of τ i is specified to be independent Normal(0,10) distributions for each element of τ i . For g 1i , the normal prior is truncated below g 0i to ensure that g 1i > g 0i . The models are implemented in Stan using Rstan (Stan Development Team, 2019) in the R statistical computing environment (R core team, 2019). The scripts to fit the different models are available from www.dylanmolenaar.nl.

Model Comparison Using the Log Marginal Likelihood
To be able to select between the different models, we propose to use the fully marginalized log-likelihood, that is: The advantage of the log marginal likelihood is that it incorporates a penalty for model complexity in a natural way. As determining model complexity of the models under investigation in this study is not straightforward, we consider the log marginal likelihood better suitable to select between competing models as compared to, for example, the Deviance Information Criterion (DIC), Watanabe-Akaike Information Criterion (WAIC), and Bayesian Information Criterion (BIC). The marginal likelihood is the key ingredient of Bayes's factors commonly used for selection between two competing models. Here, we use the log Molenaar et al.
marginal likelihood as a fit index on its own by selecting the model with the highest log marginal likelihood as best fitting model; however, calculation of Bayes's factors is straightforward, which is illustrated in the real data application. Calculating the fully marginalized log-likelihood directly is infeasible due to the high number of nested integrals. However, the marginal likelihood can be approximated using bridge sampling (e.g., Gronau et al., 2017;Meng & Wong, 1996).

Simulation Study A: Item Parameter Recovery and Model Fit
In this simulation study, we simulate data according to the zero and one inflation mechanism in Equation 16 with f ðX pi jy p ; τ i Þ given by either the S B -IRT model (Equation 1), the beta-IRT model (Equations 3-5) or the simplex-IRT model (Equation 7). We use the same item parameters across replications to study parameter recovery of the item parameters. We use 12 items with true values given in Table 1. The true values for g 0i and g 1i correspond to the *10% zero and one inflation scenario from Figures 4 and 5, which we chose because it is the most challenging scenario for the inflated IRT model due to the reasonably large inflation in the data. In addition, for a i and b i , we use the same values across the different models as these parameters are either comparable (between the beta-IRT model and the simplex-IRT model due to Equations 6 and 9) or highly related (between the S B -IRT model and the other models). For the dispersion parameters, we use different values for the models as these parameters have different scales across the models. To give an illustration of how the simulated data look, we plotted the resulting densities and information functions for the different models in Figure 6 for Item 5. The distributions differ across items but are generally left-skewed comparable to Figure 6.
The true item parameter values in this study are inspired by the real dataset used in the example below. That is why the values of the item easiness parameters b i are chosen to represent a relatively easy test (this is what was found throughout most of the 22 scales in our real dataset, with the exception that for some scales the items are relatively difficult, but this generally only affects b i , so that the observed distributions are right skewed, but the simulation results below are equally applicable to these cases). We think that a relatively easy test as used in this simulation study is typical for tests with continuous items, which are mostly concerned with measuring constructs like personality and mood. Using a relatively easy test for the simulation does however result in smaller information about y p in its upper range. As the test information functions of the models differ in their shape, the relative easiness of the test will affect parameter estimates differently across the models. We think that this is interesting, as this will also happen in practice. However, this differential effect of the distribution of item easiness should be kept in mind when interpreting the results.

Zero and One Inflated IRT
In this study, we use sample sizes of 50, 100, and 200 persons. In addition, we use 100 replications. In each replication, we sample y p from a Normal(0,1) FIGURE 6. Density and item information function of Item 5 (b i ¼ 0:685; a i ¼ 0:5; g 0 ¼ À2; and g 1 ¼ 2Þ for the S B -IRT model (d i ¼ 2), the beta-IRT model (o i ¼ 2), and the simplex-IRT model (j i ¼ 7Þ. The zero and one inflation is not incorporated in the density plots as these are reflected by probabilities and not by densities. The zero and one inflation is however incorporated in the item information function. IRT ¼ item response theory. distribution. To the data in each replication, we fit the three bounded-IRT models subject to zero and one inflation (Equation 16). In addition, we estimate the unbounded normal-IRT model to investigate the effect of neglecting the bounded nature of the data. To enable a fair comparison to the other models, this model is also subjected to the zero and one inflation in Equation 16. Finally, the models are compared using the log marginal likelihood discussed above. We also considered the DIC (Spiegelhalter et al., 2002) and WAIC (Watanabe, 2010) model fit indices, to have a reference to compare the performance of the log marginal likelihood to. However, note that there are other fit indices that may perform better than the DIC and WAIC (e.g., the LOO-IC; Vehtari et al., 2017).

Parameter Recovery of the True Model
To study the parameter recovery, we focus on the posterior mean estimates of the correctly specified models in the different conditions in the simulation study for N ¼ 200. Tables 2 through 4 depict the mean squared error (MSE), the squared bias (BIAS 2 ), and the variance of the estimates (VAR) for the parameters of respectively the S B -IRT, the beta-IRT, and the simplex-IRT models for Items Molenaar et al.
1, 2, 5,6, 9, and 10 in the N ¼ 200 condition. To save space, we selected this subset to represent a mix of odd and even items (which differ in their item discrimination, see Table 1). For an adequate parameter recovery, the MSE is approximately equal to the VAR, which results in a BIAS 2 close to 0. As can be seen, for all parameters in all models, the parameter recovery seems acceptable with the difference between MSE and VAR being only notable in the third decimal. The results for the other sample size conditions (N ¼ 50 and N ¼ 100) are acceptable with adequate parameter recovery and a minor bias in the case of N ¼ 50, where the estimates are pulled toward their prior means.

Consequence of Misfit
To study the consequences for the item parameters of fitting an incorrect model to the data, we focus on the parameter estimates of the discrimination parameters and the easiness parameters across the different models for the different data scenarios in the simulation study. To enable a meaningful comparison, we focus on a Ã i and b Ã i as discussed above. Figures 7 and 8 depict the boxplots of the errors of, respectively, expðb Ã i Þ and a Ã i across replications for the different fitted models under the different data generating scenarios for Zero and One Inflated IRT N ¼ 200. We rely on expðb Ã i Þ for clarity of the figure as in a few cases b Ã i is large and negative which distorts the figure. 2 From Figure 7, it can be seen that for the bounded IRT models, b Ã i is hardly affected by misspecification. The only case where b Ã i is slightly underestimated is in the S B -IRT model in the case of simplex-IRT data. In the normal-IRT model, b Ã i is underestimated, in all data scenarios but the bias is minor. From Figure 8, it can be seen that model misspecification has a larger effect on a Ã i . That is, for the S B -IRT model, the simplex-IRT parameters are underestimated, and for the simplex-IRT model, the S B -IRT model parameter and the beta-IRT model parameters are overestimated. In addition, for the beta-IRT model, the simplex-IRT model parameters are underestimated. The beta-IRT model and the S B -IRT model are less biased with respect to each other. For the normal-IRT model, a Ã i is underestimated in all data scenarios with the bias being larger for the items with a larger item easiness.

Model Fit
In Table 5, the proportion of replications in which a given model is selected according to the log marginal likelihood, the DIC, and the WAIC is depicted (detection rates). As can be seen, the models are well separable: For N ¼ 50, the log marginal likelihood performs overall better compared to the DIC and WAIC  Zero and One Inflated IRT insignificant, p ¼ :591). For N ¼ 100, the log marginal likelihood overall still performs best, but with smaller differences. For N ¼ 200, the three fit measures perform optimal with all false positive rates equal to 0.00.

Conclusion
From the above, we can conclude that the parameter recovery of the different models is acceptable with MSE's close to the parameter variability. With respect to model selection, it appeared that for smaller sample sizes, the log marginal likelihood outperforms the DIC and WAIC, but for larger sample sizes, this difference diminishes. In addition, the three bounded IRT models are relatively robust with respect to each other in the estimation of the item easiness. However, with respect to the item discrimination, the beta-IRT and S B -IRT models are relatively robust to each other while they are biased if the data are generated from the simplex-IRT model. In addition, the simplex-IRT model appeared to be biased with respect to the beta-IRT and S B -IRT models. The normal-IRT model was biased in all scenarios with small effects on the easiness but relatively large effects on the discrimination parameter.

Simulation Study B: Person Parameter Recovery
Similarly, as in Simulation Study A, we simulate data according to the zero and one inflated bounded-IRT models above. However, we now use the same values for y p across replications to study parameter recovery of the person parameters. Specifically, we use the following levels for y p : À3, À2.5, À2, À1.5, À1, À0.5, 0, .5, 1, 1.5, 2, 2.5, and 3. The frequency with which the different levels for y p occur follows a standard normal distribution. In addition, we use 240 subjects, 6, 12, and 18 items, and 100 replications. In each replication, we sample the item parameters from a specific distribution (see Table 6). The choice for these distributions is again inspired by the real data application below. The test information across the replications on the basis of the true item parameters is depicted in Figure 9 for each model and n ¼ 18. To the data in each replication, we fit the same four models as in Simulation Study A and study parameter recovery of y p .

Results
For all models, a small shrinkage effect is found for the y p estimates in all models and conditions. That is, due to the finite numbers of item (18 at most), the y p estimates are pulled slightly to their prior mean. To be able to study parameter recovery, we adjust for the shrinkage effect by dividing the y p estimates by the Molenaar et al.
standard deviation of the estimates. As a result, all departures from the true y p values are due to bias and not due to shrinkage.

Parameter Recovery of the True Model
To study the parameter recovery, we focus on the estimates of the true model in the different conditions in the simulation study to see whether the expected squared bias approaches 0. Tables 7-9 contain the MSE, BIAS 2 , and VAR for the parameters of, respectively, the S B -IRT, the beta-IRT, and the simplex-IRT FIGURE 9. Test information functions across replications in Simulation Study B for the different models and n ¼ 18. Zero and One Inflated IRT models for the different values of y p and for a different number of items. As can be seen, for all models, the parameter recovery seems acceptable with the MSE being mostly due to parameter variability and with a small to neglectable contribution of the squared bias. For six items (n ¼ 6Þ, bias seems slightly larger for larger absolute values of y p , but this bias decreases for larger number of items. The MSE seems to follow the test information functions in Figure 9, at least for the beta-IRT and the simplex-IRT model, with the MSEs being somewhat smaller toward the upper y p values for the simplex-IRT model, while being somewhat smaller in the middle y p region for the beta-IRT model. Note that the values of the MSE itself cannot be compared between the models as these results are based on different data with different characteristics. For instance, the test information is overall much smaller for the data generated with the S B model (see Figure 9), which results in overall larger MSEs.

Consequences of Misfit
To study the consequences for the person parameters of fitting an incorrect model to the data, we focus on the parameter estimates of y p across the different models for the different data scenarios in the simulation study. Figure 10 depicts boxplots of the errors of the person parameters across replications for the different fitted models under the different data generating scenarios for 18 items. As expected due to the above, if the correct model is fit, the boxplots indicate no Molenaar et al.
bias. If the incorrect model is fit, it is most notable that the normal-IRT model is biased, with under estimation of the lower y p values and over estimation of the upper y p values. In addition, it seems that the S B -IRT and beta-IRT model are relatively robust with respect to each other. However, if the data are generated according to a simplex-IRT model, the y p estimates in both the S B -IRT and beta-IRT model are biased for the upper y p values. In addition, the simplex-IRT model is biased for the lower and upper y p values if the data follow a beta-IRT model. Figure 11 depicts the boxplots of the estimated posterior standard deviations of the person parameters across replications for each true value of y p in the different fitted models and under the different data generating scenarios for 18 items. As a reference, the boxes of the estimates in the true model are in gray. As can be seen, the normal-IRT model overestimates the posterior standard deviation under all data generation scenarios with the largest effect for the S B -IRT data scenario. For the other models, some differences are also evident, but smaller: For instance, in the beta-IRT, the posterior standard deviation is overestimated for larger values of y p in the S B -IRT data scenario. In addition, the S B -IRT and simplex-IRT underestimate the posterior standard deviation for larger y p values in the beta-IRT scenario. Finally, the beta-IRT model overestimates the posterior standard deviation for larger values of y p in the simplex scenario. Similarly as above, these local effects in the upper range of y p are due to positive skew in the simulated data. If these effects are reversed into negative skew, the lower range of y p will be affected. Zero and One Inflated IRT

Conclusion
The person parameter recovery is acceptable in all models. If an incorrect model is fit to the data, the person parameter estimates and posterior standard deviations in a normal-IRT model are biased across the full y p range. For the other models, the person parameters and posterior standard deviations may be biased in the upper or lower range of y p depending on the estimated model and the properties of the data under the data generation model.

Data
In this application, we apply the zero and one inflated IRT models from the present study to a dataset containing the responses of 244 respondents to 218 items from the ACL (Gough & Heilbrun, 1980) of which two scales were considered above in Motivating Example 1. The ACL is a personality questionnaire consisting of adjectives like "stable," "responsible," and "organized" to which respondents indicate to what degree this adjective applies to them. In the present study, the ACL was administered using a continuous response scale consisting of a 60-mm line segment. Responses are scored as the distance (in millimeters) from the left end of the line segment ("totally not applicable to me"). These responses are rescaled into the [0,1] interval to enable application of the present models. The ACL administration considered here covered 22 scales (the original ACL covers 30 scales). All scales contained 10 items, except for the two final scales 21 and 22 which contain nine items. In the ACL data, the upper end point of the response scale is never used. The average percentage of lower end point (zero) scores is between 0.2% and 10.7% (see Table 10).

Models
The four zero and one inflated IRT models considered in the simulation study are fit to each scale of the ACL separately. Aim is to see which models fit best and how the results from the different models compare to each other. In addition, we fitted the conventional models to see how the results differ. In our MCMC implementation, we used four chains of 10,000 samples from the posterior parameter distribution each. For each chain, the first 5,000 samples are discarded as burn-in. The chains are judged to be converged based on the split R-hat (Vehtari et al., 2021). For one scale, Scale 20 (Nurturant parent), the inflated S B -IRT model failed to converge with the split R-hat well above 1 for multiple parameters. Therefore, for Scale 20, we omit the results concerning the S B -IRT model. For the conventional models, we transformed all zero scores to 1e-05 except for the simplex-IRT model as this resulted in divergence of the dispersion parameter. For this model, we transformed the zeros to 0.01.

Results
Table 10 contains the log marginal likelihood for the different bounded-IRT models and the different ACL scales. Note that the log marginal likelihood can also be used to calculate Bayes's factors. For instance for the Communality scale, the log Bayes's factor between the S B -IRT model and the beta-IRT model equals 1446 À 1346 ¼ 100, indicating that evidence is strongly in favor of the S B -IRT model. Here, however, similarly as in the simulation study, we rely on the log marginal likelihood as a fit statistic, that is, the larger values indicate a better model fit. In addition, the DIC and WAIC fit indices agree about the best fitting model for all scales except Scale 22 (Adapted Child). For this scale, the DIC and Note. The log marginal likelihood for the best fitting model is indicated in boldface. "% zeros" indicates the percentage of zero scores averaged over the items of the corresponding scale. IRT ¼ item response theory. a For this scale, the S B -IRT model failed to converge.

Zero and One Inflated IRT
WAIC indicate the beta-IRT model as best fitting model, while the log marginal likelihood indicates the S B -IRT model. As in the results of the simulation study, the log marginal likelihood is associated with overall fewer false positives, and we rely on the log marginal likelihood and conclude that the S B -IRT model fits best for Scale 22.
As can be seen in Table 10, the beta-IRT model fits best to the majority of the scales followed by the S B -IRT model. The simplex-IRT model fits best to three of the 22 scales. Interestingly, the S B -IRT model fits best to the scales with the higher degrees of zero inflation. Below, we look closer to the results of the "Abasement" scale that was analyzed in the motivating example above and that has the most severe zero inflation on average (10.7%). Table 11 contains the posterior means and standard deviations for the a Ã i , b Ã i , dispersion, and g 0i parameters across models. Note that a Ã i and b Ã i are transformed parameters, which can be compared across the different models. In addition, g 1i are not considered as there are no one responses in the data. As can be seen from the table, results differ most notably between the normal-IRT model and the bounded-IRT models with the posterior means for b Ã i being systematically higher and a Ã i being systematically smaller for the normal-IRT model. The posterior means and standard deviations for the bounded-IRT models differ minorly for some items, but in general, the results tend to converge for the item parameters. Figure 12 contains histograms with fitted curves and item information for the zero-one inflated IRT models and the conventional bounded IRT models for three example items (2, 3, and 9) from the Abasement scale. As can be seen, generally, the information is larger in the inflation IRT model. In addition, the fitted curves differ notably across the inflation IRT models and the conventional models, especially in the case of a higher percentage of zero scores in the item. Figure 13 depicts the posterior means of y p of the Abasement scale for the four different models. As can be seen, for the normal-IRT model, these estimates differ substantially from the others especially in the upper and lower range of y p . In addition, for the S B -IRT and beta-IRT model, the posterior mean estimates are almost perfectly related, while for the simplex-IRT model, some minor differences are notable in the upper and lower range of y p .

Discussion
In this study, we proposed a zero and one inflated IRT framework for bounded continuous data in the closed interval. In two motivating examples, we showed in both real and simulated data that not taking zero and one inflation into account can seriously distort modeling results. Next, in the simulation study, we demonstrated the viability of the proposed framework and the suitability of the log marginal likelihood to select among the different models, even in small sample sizes. In addition, we studied the consequences of misfit in the distribution assumed by the continuous IRT models. It turned out that not modeling the Molenaar et al. Zero and One Inflated IRT bounded nature of the data can result in severe bias in the posterior means and standard deviations of the person and item parameters, but that misspecification of the bounded IRT model generally has a smaller impact on the results. Therefore, in practice, a general advice is to use the simplex-IRT model if (some of) the items show bimodality (as this model is most flexible in these cases) and to use the beta-IRT and S B models in the case of unimodal data (as these models are FIGURE 12. First three rows: Histograms with fitted curves and item information for the zero-one inflated item response theory (IRT) models and the conventional bounded IRT models for three example items (2, 3, and 9) from the Abasement scale. Fourth row: Test information functions. more flexible in these cases, although these models can handle some degree of bimodality). In addition, as misfit may still bias the results depending on the data generating model and the parameters of interest, it is always advisable to test different models and to base inferences on the best fitting model to decrease the misfit to a minimum. Although we thus argue for avoiding misfit of the conditional response distribution in practical applications involving bounded continuous data, the present methodology is fully parametric, so that there will always be some misfit. An alternative may be a nonparametric approach (e.g., based on splines, Jungbacker et al., 2014) or an approach based on mixtures (e.g., Dolan & Van der Maas, 1998); however, in these more complex models, more parameter uncertainty is introduced. Therefore, we emphasized the model fit and model selection aspect of the present topic, to hopefully decrease misfit while retaining the benefits from the parametric form of the distribution. To this end, further research may focus on FIGURE 13. Posterior mean estimates of y p for the Abasement scale.

Zero and One Inflated IRT
where cð:Þ is the logistic function, and all parameters are as defined in this article. Note that the test information implied by the traditional model, f ðX pi jy p ; τ i Þ is denoted Iðy p Þ in this article and is known for all models considered. The test information for the zero and one inflated model above is Below, we give each of the three expectations. For brevity, denote P 0 ¼ cðg 0i À a i y p Þ; Q 0 ¼ 1 À cðg 0i À a i y p Þ; P 1 ¼ cðg 1i À a i y p Þ; Q 1 ¼ 1 À cðg 1i À a i y p Þ: Then, the expectations are given by where Iðy p Þ denotes the test information function of the conventional model.

Acknowledgments
We thank Harry Vorst for providing the data in the application section and Esther Lietaert Peerbolte for her help and discussion on this topic. In memory of Don Mellenbergh, whose enthusiasm and interest in item response theory (IRT) was a great inspiration to the first author. Discussions between Don and the first author on the topic of continuous IRT models have inspired the present work.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes
1. In the model that we present later, we used the following values to create the data (depending on the scenario): g 0i ¼ À1 (no zero inflation), g 1i ¼ 1 (no one inflation), g 0i ¼ À3 (*5% zero inflation), g 0i ¼ À2 (*10% zero inflation), g 1i ¼ 3 (*5% one inflation), and g 1i ¼ 2 (*10% zero inflation). As mentioned in the text, the values for a i , b i , and o i were set to the estimates as found for the Affiliation scale. 2. Specifically, in the calculation of b Ã i ¼ À b i a i , one divides by a i . In the present simulation study, the true value of a i is 0.5 for the odd items. As a result, in a few replications, the estimate of a i is close to 0. This is unproblematic for the model, but, for these cases, b Ã i can become large and negative, which distorts a figure like Figure 7. We therefore relied on the bias of expðb Ã i Þ.