Abstract
In forced-choice questionnaires, respondents have to make choices between two or more items presented at the same time. Several IRT models have been developed to link respondent choices to underlying psychological attributes, including the recent MUPP (Stark et al. in Appl Psychol Meas 29:184–203, 2005) and Thurstonian IRT (Brown and Maydeu-Olivares in Educ Psychol Meas 71:460–502, 2011) models. In the present article, a common framework is proposed that describes forced-choice models along three axes: (1) the forced-choice format used; (2) the measurement model for the relationships between items and psychological attributes they measure; and (3) the decision model for choice behavior. Using the framework, fundamental properties of forced-choice measurement of individual differences are considered. It is shown that the scale origin for the attributes is generally identified in questionnaires using either unidimensional or multidimensional comparisons. Both dominance and ideal point models can be used to provide accurate forced-choice measurement; and the rules governing accurate person score estimation with these models are remarkably similar.
Notes
Here, the standard coding procedure in the Thurstonian choice literature (Maydeu-Olivares & Böckenholt, 2005) is adopted. It is important to note that at the point of coding, no assumptions are made about the underlying distributions, decision mechanisms, etc.
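As an illustration of the coding step only, the sketch below converts a ranking of the items in one block into binary outcomes for every pair of items, with outcome 1 when the first item of the pair is preferred. The function name `code_ranking` and the convention that rank 1 means "most preferred" are assumptions made for this example; they are not taken from the cited source.

```python
from itertools import combinations


def code_ranking(ranks):
    """Code a ranking of n items as n(n-1)/2 binary pairwise outcomes.

    ranks[i] is the rank given to item i (assumed: 1 = most preferred).
    Returns a dict mapping each pair (i, k), i < k, to 1 if item i is
    preferred to item k and 0 otherwise.
    """
    outcomes = {}
    for i, k in combinations(range(len(ranks)), 2):
        # A smaller rank means the item was placed higher in the ordering.
        outcomes[(i, k)] = 1 if ranks[i] < ranks[k] else 0
    return outcomes


# Example: in a block of three items, item 1 is ranked first,
# item 0 second, and item 2 third.
example = code_ranking([2, 1, 3])
```

Consistent with the note above, this step only records which item of each pair was preferred; it involves no distributional or decision-model assumptions.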
Negative weights in IP models do not make sense conceptually; hence, we use the squared values.
The double exponential (or Gumbel; sometimes referred to as Weibull) distribution has the cumulative distribution function \(F(z)=\exp (-\exp (-z))\).
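A well-known property of this distribution is that the difference of two independent standard Gumbel variables follows the standard logistic distribution. The sketch below checks that property by simulation. The function names, the utility means `mu_i` and `mu_k`, and the number of draws are hypothetical choices for illustration, not part of the original text.

```python
import math
import random


def gumbel_cdf(z):
    """Cumulative distribution function F(z) = exp(-exp(-z))."""
    return math.exp(-math.exp(-z))


def sample_gumbel(rng):
    """Draw one standard Gumbel variate by inverting the CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); rng.random() never returns 1.0
        u = rng.random()
    return -math.log(-math.log(u))


def logistic(d):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-d))


def simulate_choice_prob(mu_i, mu_k, n_draws, rng):
    """Share of draws in which item i has the larger utility, where each
    utility is its mean plus an independent standard Gumbel error."""
    wins = 0
    for _ in range(n_draws):
        if mu_i + sample_gumbel(rng) > mu_k + sample_gumbel(rng):
            wins += 1
    return wins / n_draws


rng = random.Random(0)
p_simulated = simulate_choice_prob(1.0, 0.0, 100_000, rng)
p_logistic = logistic(1.0 - 0.0)  # about 0.731
```

With enough draws, `p_simulated` should lie close to `p_logistic`, up to sampling error.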
Ignoring these assumptions and using the normal ogive link function results in probabilities that are different from those predicted by Thurstone’s model (12). Discrepancies depend on the combination of two utilities, and can be large. For normally distributed utilities, Thurstone’s model provides better prediction.
Unlike in paired comparison tasks, it is assumed that no items are repeated across the forced-choice questionnaire. This is common practice in questionnaire design.
For the partial ranking design whereby only one “best” item must be chosen, the multinomial logistic model of McFadden (16) may be used to model choices within each block, if it can be assumed that error variances are all equal. The choices for different blocks are independent conditional on the personal attributes, and the probability of observed response pattern is the product of probabilities of block choices. Since the assumption of equal error variances is often untenable, this model will not be considered further.
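For the "best"-only design described in the last note, a minimal sketch of the multinomial logistic choice probabilities is given below. It assumes equal error variances, as the note requires, and takes the mean utilities of the items in each block as given; how those utilities depend on the personal attributes is left to the measurement model and is not shown. The function names and inputs are hypothetical.

```python
import math


def best_choice_probs(utilities):
    """Multinomial logistic probabilities that each item in a block is
    chosen as "best", given the items' mean utilities."""
    m = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def pattern_probability(blocks, choices):
    """Probability of an observed response pattern: the product of the
    block choice probabilities, using conditional independence of blocks.

    blocks  -- list of lists of mean utilities, one list per block
    choices -- index of the item chosen as "best" in each block
    """
    prob = 1.0
    for utilities, chosen in zip(blocks, choices):
        prob *= best_choice_probs(utilities)[chosen]
    return prob


# Two blocks with equal utilities within each block: every item is
# equally likely to be chosen, so the pattern probability is 1/2 * 1/3.
p = pattern_probability([[0.0, 0.0], [0.0, 0.0, 0.0]], [0, 2])
```

The product form in `pattern_probability` holds only conditional on the personal attributes, which is the conditional independence assumption stated in the note.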
References
Andersen, E. B. (1976). Paired comparisons with individual differences. Psychometrika, 41(2), 141–157.
Andrich, D. (1989). A probabilistic IRT model for unfolding preference data. Applied Psychological Measurement, 13, 193–216.
Andrich, D. (1995). Hyperbolic cosine latent trait models for unfolding direct-responses and pairwise preferences. Applied Psychological Measurement, 20, 269–290.
Bartram, D. (2007). Increasing validity with forced-choice criterion measurement formats. International Journal of Selection and Assessment, 15, 263–272.
Bennett, J. F., & Hays, W. L. (1960). Multidimensional unfolding: Determining the dimensionality of ranked preference data. Psychometrika, 25, 27–43.
Block, J. (1961). The Q-sort method in personality assessment and psychiatric research. Springfield, IL: Charles C. Thomas.
Böckenholt, U. (2004). Comparative judgments as an alternative to ratings: Identifying the scale origin. Psychological Methods, 9, 453–465.
Böckenholt, U. (2006). Thurstonian-based analyses: Past, present and future utilities. Psychometrika, 71(4), 615–629.
Bradley, R. A. (1953). Some statistical methods in taste testing and quality evaluation. Biometrics, 9, 22–38.
Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39, 324–345.
Brady, H. E. (1989). Factor and ideal point analysis for interpersonally incomparable data. Psychometrika, 54, 181–202.
Brown, A. (2009). Doing less but getting more: Improving forced-choice measures with IRT. Paper presented at the 24th annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.
Brown, A., & Bartram, D. (2009–2011). OPQ32r Technical Manual. Surrey, UK: SHL Group.
Brown, A., & Maydeu-Olivares, A. (2010). Issues that should not be overlooked in the dominance versus ideal point controversy. Industrial and Organizational Psychology, 3, 489–493.
Brown, A., & Maydeu-Olivares, A. (2011). Item response modeling of forced-choice questionnaires. Educational and Psychological Measurement, 71, 460–502.
Brown, A., & Maydeu-Olivares, A. (2012). Fitting a Thurstonian IRT model to forced-choice data using Mplus. Behavior Research Methods, 44, 1135–1147.
Brown, A., & Maydeu-Olivares, A. (2013). How IRT can solve problems of ipsative data in forced-choice questionnaires. Psychological Methods, 18, 36–52.
Brown, A., & Maydeu-Olivares, A. (in press). Modeling forced-choice response formats. In P. Irwing, T. Booth, & D. Hughes (Eds.), The Wiley Handbook of Psychometric Testing. London: Wiley.
Chan, W. (2003). Analyzing ipsative data in psychological research. Behaviormetrika, 30, 99–121.
Cheung, M. W. L., & Chan, W. (2002). Reducing uniform response bias with ipsative measurement in multiple-group confirmatory factor analysis. Structural Equation Modeling, 9, 55–77.
Christiansen, N., Burns, G., & Montgomery, G. (2005). Reconsidering the use of forced-choice formats for applicant personality assessment. Human Performance, 18, 267–307.
Clemans, W. V. (1966). An analytical and empirical examination of some properties of ipsative measures. Psychometric Monographs, 14.
Coombs, C. H. (1950). Psychological scaling without a unit of measurement. Psychological Review, 57, 145–158.
Coombs, C. H. (1960). A theory of data. Psychological Review, 67, 143–159.
De Soete, G., & Carroll, J. D. (1983). A maximum likelihood method for fitting the wandering vector model. Psychometrika, 48, 553–566.
Drasgow, F., Chernyshenko, O. S., & Stark, S. (2009). Test theory and personality measurement. In J. N. Butcher (Ed.), Oxford handbook of personality assessment. London: Oxford University Press.
Drasgow, F., Chernyshenko, O. S., & Stark, S. (2010). 75 years after Likert: Thurstone was right! Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 465–476.
Huang, J., & Mead, A. D. (2014, July 7). Effect of personality item writing on psychometric properties of ideal-point and Likert scales. Psychological Assessment. Advance online publication. doi: http://dx.doi.org/10.1037/a0037273.
Jackson, D., Wroblewski, V., & Ashton, M. (2000). The impact of faking on employment tests: Does forced choice offer a solution? Human Performance, 13, 371–388.
Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York, NY: Wiley.
Luce, R. D. (1977). The choice axiom after twenty years. Journal of Mathematical Psychology, 15, 215–233.
Martin, B. A., Bowen, C.-C., & Hunt, S. T. (2002). How effective are people at faking on personality questionnaires? Personality and Individual Differences, 32, 247–256.
Maydeu-Olivares, A. (1999). Thurstonian modeling of ranking data via mean and covariance structure analysis. Psychometrika, 64, 325–340.
Maydeu-Olivares, A., & Böckenholt, U. (2005). Structural equation modeling of paired-comparison and ranking data. Psychological Methods, 10, 285–304.
Maydeu-Olivares, A., & Böckenholt, U. (2008). Modeling subjective health outcomes: Top 10 reasons to use Thurstone’s method. Medical Care, 46, 346–348.
Maydeu-Olivares, A., & Brown, A. (2010). Item response modeling of paired comparison and ranking data. Multivariate Behavioral Research, 45, 935–974.
McCloy, R., Heggestad, E., & Reeve, C. (2005). A silk purse from the sow’s ear: Retrieving normative information from multidimensional forced-choice items. Organizational Research Methods, 8, 222–248.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
McFadden, D. (1973). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in Econometrics. New York: Academic Press.
McFadden, D. (1976). Quantal choice analysis: A survey. Annals of Economic and Social Measurement, 5, 363–390.
McFadden, D. (2001). Economic choices. The American Economic Review, 91(3), 351–378.
Meade, A. (2004). Psychometric problems and issues involved with creating and using ipsative measures for selection. Journal of Occupational and Organisational Psychology, 77, 531–552.
Muthén, L. K., & Muthén, B. O. (1998–2012). Mplus user’s guide (7th ed.). Los Angeles, CA: Muthén & Muthén.
Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24, 3–32.
Schwarz, N., Knäuper, B., Hippler, H. J., Noelle-Neumann, E., & Clark, L. (1991). Rating scales: Numeric values may change the meaning of scale labels. Public Opinion Quarterly, 55, 570–582.
Shepard, R. N. (1957). Stimulus and response generalization: A stochastic model relating generalization to distance in psychological space. Psychometrika, 22, 325–345.
Stark, S., Chernyshenko, O., & Drasgow, F. (2005). An IRT approach to constructing and scoring pairwise preference items involving stimuli on different dimensions: The multi-unidimensional pairwise-preference model. Applied Psychological Measurement, 29, 184–203.
Stark, S., & Drasgow, F. (2002). An EM approach to parameter estimation for the Zinnes and Griggs paired comparison IRT model. Applied Psychological Measurement, 26, 208–227.
Takane, Y. (1987). Analysis of covariance structures and probabilistic binary choice data. Communication and Cognition, 20, 45–62.
Takane, Y. (1996). An item response model for multidimensional analysis of multiple choice data. Behaviormetrika, 23, 153–167.
Takane, Y., & De Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
Thurstone, L. L. (1928). Attitudes can be measured. American Journal of Sociology, 33, 529–554.
Thurstone, L. L. (1929). The measurement of psychological value. In T. V. Smith & W. K. Wright (Eds.), Essays in philosophy by seventeen doctors of philosophy of the University of Chicago (pp. 157–174). Chicago: Open Court.
Thurstone, L. L. (1931). Rank order as a psychophysical method. Journal of Experimental Psychology, 14, 187–201.
Tsai, R. C., & Böckenholt, U. (2001). Maximum likelihood estimation of factor and ideal point models for paired comparison data. Journal of Mathematical Psychology, 45, 795–811.
Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79(4), 281–299.
Vasilopoulos, N. L., Cucina, J. M., Dyomina, N. V., Morewitz, C. L., & Reilly, R. R. (2006). Forced-choice personality tests: A measure of personality and cognitive ability? Human Performance, 19, 175–199.
Zinnes, J. L., & Griggs, R. A. (1974). Probabilistic, multidimensional unfolding analysis. Psychometrika, 39, 327–350.
Acknowledgments
I am grateful to Alberto Maydeu-Olivares for his continuous support and helpful comments on an earlier draft of this paper.
Cite this article
Brown, A. Item Response Models for Forced-Choice Questionnaires: A Common Framework. Psychometrika 81, 135–160 (2016). https://doi.org/10.1007/s11336-014-9434-9
DOI: https://doi.org/10.1007/s11336-014-9434-9