Abstract
A Cultural Consensus Theory approach for ordinal data is developed, leading to a new model for ordered polytomous data. The model introduces a novel way of measuring response biases and also measures consensus item values, a consensus response scale, item difficulty, and informant knowledge. The model is extended as a finite mixture model to fit both simulated and real multicultural data, in which subgroups of informants have different sets of consensus item values. The extension is thus a form of model-based clustering for ordinal data. The hierarchical Bayesian framework is utilized for inference, and two posterior predictive checks are developed to verify the central assumptions of the model.
Notes
Note that these latter two parameters have generally different interpretations than in current typical IRT models, and this is further explained in the discussion.
One way of coding this (for Bayesian inference software) can be viewed in Appendix A.
When C=2 categories, the sum-to-zero constraint provides that γ 1=0.
These settings were found to work well in applications to both simulated and real data; however, researchers may consider exploring other prior distribution settings to optimize mixing within their specific applications.
As a finite mixture model, the MC-LTRM is subject to label-switching and mixing phenomena (Stephens 2000), which need to be addressed before calculating convergence diagnostics, model comparison statistics, and posterior predictive checks. For more information on handling these, see Section 3 in Anders and Batchelder (2012).
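As a minimal illustration of one simple post-processing remedy (a hypothetical sketch, not the full Stephens (2000) relabeling procedure), each posterior draw of the discrete membership vector can be permuted to best agree with a reference draw before summarizing:

```python
# Sketch: post-hoc relabeling of cluster-membership draws to undo label
# switching. For each MCMC draw, the permutation of cluster labels that
# maximizes agreement with a reference assignment is applied. Labels are
# assumed to be 0-based integers; all data below are illustrative.
from itertools import permutations

def relabel(draws, reference):
    """Permute cluster labels in each draw to best match the reference draw."""
    labels = sorted(set(reference))
    fixed = []
    for draw in draws:
        # choose the label permutation with the most matches to the reference
        best = max(permutations(labels),
                   key=lambda p: sum(p[d] == r for d, r in zip(draw, reference)))
        fixed.append([best[d] for d in draw])
    return fixed

# A draw whose labels are flipped relative to the reference is corrected:
print(relabel([[1, 1, 0, 0]], [0, 0, 1, 1]))  # [[0, 0, 1, 1]]
```

This brute-force search is only feasible for a small number of clusters V (V! permutations); Stephens (2000) gives scalable, decision-theoretic alternatives.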
All simulated data sets were generated from the hierarchical LTRM or MC-LTRM specified in Section 2.3, with hyperparameters drawn uniformly from sensible intervals within the diffuse hyperpriors.
In respect to the discrete parameters (the Ω i of the MC-LTRM), inspecting their trace plots to assess whether the chains have converged on similar distributions may be preferable to the \(\hat{R}\) diagnostic, as \(\hat{R}\) may have difficulty properly assessing convergence for discrete parameters whose chains are likely to converge to distributions with zero or near-zero variance (as driven by a strong signal in the data).
Simulations similar to the one here were performed with the MC-LTRM\(_{\lambda_{k} \not= 1}\). The parameter recovery and eigenvalue check results were comparable. In addition, the DIC often preferred the MC-LTRM\(_{\lambda_{k} \not= 1}\) over the MC-LTRM\(_{\lambda_{k} = 1}\) for data simulated by the MC-LTRM\(_{\lambda_{k} \not= 1}\), unless there was very little heterogeneity in the simulated λ k values.
In IRT models, the item difficulty is established by the response thresholds for each item; in contrast, the response thresholds of the LTRM pertain to characteristic response biases of each informant.
Note: another way of measuring consensus student abilities is to have N raters assess P productions from each student and link these hierarchically to M student ability traits (where M is the number of students involved); this has the benefit of using more than one assessment per grader on each student to calculate the consensus values of latent student ability and is the design of the HRM discussed previously. One could consider expanding the LTRM this way in future work.
References
Anders, R. (2013). CCTpack: cultural consensus theory applications to data. R package version 0.9.
Anders, R., & Batchelder, W.H. (2012). Cultural consensus theory for multiple consensus truths. Journal of Mathematical Psychology, 56, 452–469.
Batchelder, W.H., & Anders, R. (2012). Cultural consensus theory: comparing different concepts of cultural truth. Journal of Mathematical Psychology, 56, 316–332.
Batchelder, W.H., & Romney, A.K. (1986). The statistical analysis of a general condorcet model for dichotomous choice situations. In B. Grofman & G. Owen (Eds.), Information pooling and group decision making: proceedings of the second University of California Irvine conference on political economy (pp. 103–112). Greenwich: JAI Press.
Batchelder, W.H., & Romney, A.K. (1988). Test theory without an answer key. Psychometrika, 53, 71–92.
Batchelder, W.H., & Romney, A.K. (1989). New results in test theory without an answer key. In Roskam (Ed.), Mathematical psychology in progress (pp. 229–248). Heidelberg: Springer.
Buhrmester, M., Kwang, T., & Gosling, S.D. (2011). Amazon’s Mechanical Turk: a new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5.
Comrey, A.L. (1962). The minimum residual method of factor analysis. Psychological Reports, 11, 15–18.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533–559.
DeCarlo, L.T. (2005). A model of rater behavior in essay grading based on signal detection theory. Journal of Educational Measurement, 42, 53–76.
Fischer, G.H., & Molenaar, I.W. (1995). Rasch models: recent developments and applications. New York: Springer.
Fox, C.R., & Tversky, A. (1995). Ambiguity aversion and comparative ignorance. The Quarterly Journal of Economics, 110, 585–603.
Fox, J. (2013). Polycor: polychoric and polyserial correlations. R package version 0.7-8.
Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B. (2004). Bayesian data analysis (2nd ed.). Boca Raton: Chapman and Hall/CRC.
Gonzalez, R., & Wu, G. (1999). On the shape of the probability weighting function. Cognitive Psychology, 38, 129–166.
Green, D.M., & Swets, J.A. (1966). Signal detection theory and psychophysics. New York: Wiley.
Hruschka, D.J., Kalim, N., Edmonds, J., & Sibley, L. (2008). When there is more than one answer key: cultural theories of postpartum hemorrhage in Matlab, Bangladesh. Field Methods, 20, 315–337.
Johnson, V.E., & Albert, J.H. (1999). Ordinal data modeling. Statistics for social science and public policy. Berlin: Springer.
Karabatsos, G., & Batchelder, W.H. (2003). Markov chain estimation methods for test theory without an answer key. Psychometrika, 68, 373–389.
Kruschke, J.K. (2011). Doing Bayesian data analysis: a tutorial with R and BUGS. Amsterdam: Elsevier/Academic Press.
Lancaster, H., & Hamdan, M. (1964). Estimation of the correlation coefficient in contingency tables with possibly nonmetrical characters. Psychometrika, 29, 383–391.
Lee, M.D. (2011). How cognitive modeling can benefit from hierarchical Bayesian models. Journal of Mathematical Psychology, 55, 1–7.
Lord, F.M., Novick, M.R., & Birnbaum, A. (1968). Statistical theories of mental test scores (Vol. 47). Reading: Addison-Wesley.
Macmillan, N.A., & Creelman, C.D. (2005). Detection theory: a user’s guide (2nd ed.). Mahwah: Erlbaum.
Nering, M.L., & Ostini, R. (2011). Handbook of polytomous item response theory models. New York: Taylor and Francis.
Patz, R.J., Junker, B.W., Johnson, M.S., & Mariano, L.T. (2002). The hierarchical rater model for rated test items and its application to large-scale educational assessment data. Journal of Educational and Behavioral Statistics, 27, 341–384.
Plummer, M. (2003). JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd international workshop on distributed statistical computing (DSC 2003), Vienna, Austria.
Plummer, M. (2012). rjags: Bayesian graphical models using MCMC. R package version 3.2.0. http://CRAN.R-project.org/package=rjags.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danmarks Paedagogiske Institut.
Revelle, W. (2012). psych: procedures for psychological, psychometric, and personality research. Evanston: Northwestern University. R package version 1.2.1.
Rigdon, E.E. (2010). Polychoric correlation coefficient. In N.J. Salkind (Ed.), Encyclopedia of research design (pp. 1046–1049). Thousand Oaks: Sage.
Romney, A.K., & Batchelder, W.H. (1999). Cultural consensus theory. In R. Wilson & F. Keil (Eds.), The MIT encyclopedia of the cognitive sciences (pp. 208–209). Cambridge: MIT Press.
Romney, A.K., Batchelder, W.H., & Weller, S.C. (1987). Recent applications of cultural consensus theory. American Behavioral Scientist, 31, 163–177.
Romney, A.K., Weller, S.C., & Batchelder, W.H. (1986). Culture as consensus: a theory of culture and informant accuracy. American Anthropologist, 88, 313–338.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement.
Spearman, C.E. (1904). ‘General intelligence’ objectively determined and measured. The American Journal of Psychology, 15, 72–101.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B, 64, 583–639.
Sprouse, J., Wagers, M., & Phillips, C. (2012). A test of the relation between working-memory capacity and syntactic island effects. Language, 88, 82–123.
Stephens, M. (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 62, 795–809.
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393–408.
van der Linden, W.J., & Hambleton, R.K. (1997). Handbook of modern item response theory. Berlin: Springer.
Weller, S.W. (2007). Cultural consensus theory: applications and frequently asked questions. Field Methods, 19, 339–368.
Zhang, H., & Maloney, L.T. (2012). Ubiquitous log odds: a common representation of probability and frequency distortion in perception, action and cognition. Frontiers in Neuroscience, 6.
Acknowledgements
This research was funded by grants to the second author from the U.S. Air Force Office of Scientific Research (AFOSR), the Army Research Office (ARO), and an award from the Oak Ridge Institute for Science and Education (ORISE).
We would like to thank Jon Sprouse for making available to us his grammaticality data set. In addition, we are grateful to Zita Oravecz for her helpful advice and comments.
Appendices
Appendix A. JAGS Model Code
A.1 Latent Truth Rater Model (LTRM) Code
model{
#Data
for (i in 1:n){
for (k in 1:m){
tau[i,k] <- E[i]/lam[k]
pX[i,k,1] <- pnorm((a[i]*g[1]) + b[i],T[k],tau[i,k])
for (c in 2:(C-1)){
pX[i,k,c] <- pnorm((a[i]*g[c]) + b[i],T[k],tau[i,k]) - sum(pX[i,k,1:(c-1)])}
pX[i,k,C] <- (1 - sum(pX[i,k,1:(C-1)]))
X[i,k] ~ dcat(pX[i,k,1:C])}}
#Parameters
for (k in 1:m){
T[k] ~ dnorm(Tmu,Ttau)
ilogitT[k] <- ilogit(T[k])
lam[k] ~ dgamma(lamtau,lamtau)}
for (c in 1:(C-2)){
dg[c] ~ dnorm(0,.1)}
dg2[1:(C-2)] <- dg[1:(C-2)]
dg2[C-1] <- -sum(dg[1:(C-2)])
g <- sort(dg2)
for(c in 1:(C-1)){
ilogitg[c] <- ilogit(g[c])}
for (i in 1:n){
E[i] ~ dgamma(pow(Emu,2)*Etau,Emu*Etau)
a[i] ~ dgamma(atau,atau)
b[i] ~ dnorm(bmu,btau)}
#Hyperparameters
Tmu ~ dnorm(0,.1)
Ttau ~ dgamma(1,.1)
bmu <- 0
btau ~ dgamma(1,.1)
amu <- 1
atau ~ dgamma(1,.1)
Emu ~ dgamma(4,4)
Etau ~ dgamma(4,4)
lammu <- 1
lamtau ~ dgamma(4,4)}
A.2 Multi-Culture Latent Truth Rater Model (MC-LTRM) Code
model{
#Data
for (i in 1:n){
for (k in 1:m){
tau[i,k] <- E[i]/lam[k]
pX[i,k,1] <- pnorm((a[i]*g[1,om[i]]) + b[i],T[k,om[i]],tau[i,k])
for (c in 2:(C-1)){
pX[i,k,c] <- pnorm((a[i]*g[c,om[i]]) + b[i],T[k,om[i]],tau[i,k]) - sum(pX[i,k,1:(c-1)])}
pX[i,k,C] <- (1 - sum(pX[i,k,1:(C-1)]))
X[i,k] ~ dcat(pX[i,k,1:C])}}
#Parameters
for (v in 1:V){
for (k in 1:m){
T[k,v] ~ dnorm(Tmu[v],Ttau[v])
ilogitT[k,v] <- ilogit(T[k,v])}
for (c in 1:(C-2)){
dg[c,v] ~ dnorm(0,.1)}
dg2[1:(C-2),v] <- dg[1:(C-2),v]
dg2[C-1,v] <- -sum(dg[1:(C-2),v])
g[1:(C-1),v] <- sort(dg2[1:(C-1),v])
for(c in 1:(C-1)){
ilogitg[c,v] <- ilogit(g[c,v])}}
for (k in 1:m){
lam[k] ~ dgamma(1*lamtau,1*lamtau)}
for (i in 1:n){
om[i] ~ dcat(pi)
E[i] ~ dgamma(pow(Emu[om[i]],2)*Etau[om[i]],Emu[om[i]]*Etau[om[i]])
a[i] ~ dgamma(atau[om[i]],atau[om[i]])
b[i] ~ dnorm(bmu[om[i]],btau[om[i]])}
pi[1:V] ~ ddirch(alpha)
#Hyperparameters
for (v in 1:V){
alpha[v] <- 1
Tmu[v] ~ dnorm(0,.1)
Ttau[v] ~ dgamma(1,.1)
bmu[v] <- 0
btau[v] ~ dgamma(1,.1)
amu[v] <- 1
atau[v] ~ dgamma(1,.1)
Emu[v] ~ dgamma(4,4)
Etau[v] ~ dgamma(4,4)}
lammu <- 1
lamtau ~ dgamma(1,.1)}
Appendix B. Cities Questionnaire
B.1 Informants rated the following for either Irvine, New York, or Miami
1. Rate the amount of rain experienced during the fall
2. Rate the amount of snow experienced during the winter
3. Rate the level of humidity in the summer
4. Rate the general wind factor during the fall
5. Rate how cold it is during the winter
6. Rate how hot it is during the summer
7. Rate the range of temperatures experienced across the year
8. Rate the amount of people that use public transportation as the primary mode of transport
9. Rate the amount of crime that occurs
10. Rate the amount of ethnic/racial diversity
11. Rate how liberally minded the general population is
12. Rate how much “nightlife” the city has
13. Rate the population density of the city
14. Rate how close the ocean is
15. Rate how modernized the city is
16. Rate the air quality (smog level) of the city
17. Rate the cleanliness of the city
18. Rate how well-known the city is compared to other cities in the state
19. Rate the cost of living in the city
20. Rate the amount of homeless people living in the city
Appendix C. Spearman and Item Difficulty Properties of the Model
C.1 Spearman Law Property with Proof
Theorem 1
Suppose that Axioms 1*, 2–5, and 6* hold for the MC-LTRM. Then given fixed values of \(\pmb{\mathcal{T}}\), E, Λ, and Ω: for all i,j=1,…,N with i≠j,
where K is a random variable representing item indices, with probability density \(\operatorname{Pr}(K=k) = 1/M \ \forall k = 1, \ldots, M \).
Proof
Note that \(Y_{ik} = T_{\Omega_{i} k} + \epsilon_{ik}\) for the MC-LTRM and that ∀i,k,E(ϵ ik )=0. Further, by Axiom 4, conditional independence requires that all of the \((\epsilon _{ik})_{i=1}^{N}\) are conditionally independent for fixed \(\pmb{\mathcal {T}}\), E, Λ, and Ω. From these assumptions the terms in (C.1) can be calculated as follows. First, note that with the matrix of latent appraisals, Y=(Y ik ) N×M , the correlation between two informants over items is
Next, note that
Now from the zero mean and conditional independence properties of the error random variables, it is clear that (C.3) becomes
Similarly, other aspects of (C.2) can be computed, such as
so the result is
Next consider the terms on the right side of (C.1). Using the same methods, it is easy to calculate
where the computational formula for the variance of a random variable, \(\operatorname{Var}(X) = E(X^{2}) - E^{2}(X)\), is used. Finally, the third correlation is obtained as
When these three correlations are multiplied, the result is (13). □
While the triple correlation property behind the latent appraisals of the MC-LTRM is given in (13), note that if all informants share the same cultural truth (V=1), (13) reduces to
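With V=1 every informant shares the same latent truth \(T_{K}\), and the reduction can be verified directly; writing \(\sigma_{i}^{2} = E_{K}[\operatorname{Var}(\epsilon_{iK})]\), a sketch under the zero-mean, conditional-independence assumptions above is:

```latex
% V = 1: Y_{iK} = T_K + \epsilon_{iK}, so the error terms drop out of the covariance
\operatorname{Cov}(Y_{iK}, Y_{jK}) = \operatorname{Var}(T_K), \qquad
\operatorname{Var}(Y_{iK}) = \operatorname{Var}(T_K) + \sigma_i^2,
\]
\[
\operatorname{Corr}(Y_{iK}, Y_{jK})
  = \frac{\operatorname{Var}(T_K)}
         {\sqrt{\bigl(\operatorname{Var}(T_K)+\sigma_i^2\bigr)\bigl(\operatorname{Var}(T_K)+\sigma_j^2\bigr)}}
  = \operatorname{Corr}(Y_{iK}, T_K)\,\operatorname{Corr}(Y_{jK}, T_K).
```

That is, in the single-culture case the interinformant correlation factors into the product of each informant’s correlation with the latent truth, which is Spearman’s classic tetrad-style result.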
C.2 Item Difficulty Property with Proof
Theorem 2
Suppose that Axioms 1–6 hold for the LTRM and item difficulty is neutral (λ k =1). Then, for fixed T and E,
Proof
First, using conditional independence, for any item k, we have
From (2), Var(ϵ ik )=1/τ ik =λ k /E i , and under the assumption that λ k =1 in (3), Var(ϵ ik )=1/E i , and this is the same for all items k. □
C.3 Rasch Application for Including Item Difficulty
If the item difficulty is assumed heterogeneous (varies over items), then the parameter λ k >0 scales the magnitude of the informant precision, depending on the difficulty of the item; for example, assigning student grades is a type of categorization, and one might conceive of essays that are easier to grade than others. The logic behind incorporating the item difficulty in the model as in (3) is achieved by following earlier CCT work that adopts a form of the Rasch (1960) model (Batchelder & Romney, 1988; Karabatsos & Batchelder, 2003).
The Rasch model applies to a doubly indexed parameter, p ik with space (0,1), and since τ ik ∈(0,∞), one can see how the relationship in (3) develops by applying the Rasch model to
Then the Rasch model can be parameterized in several different ways (e.g., Fischer and Molenaar 1995), and in this case, a convenient way is to use the parameterization
where E i ,λ k >0. Note that solving (C.6) for τ ik yields (3).
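Since only (3) itself is fixed by the model, one explicit mapping consistent with the steps above (a sketch, in which the particular logistic-type link is an assumption) is:

```latex
% Map \tau_{ik} \in (0,\infty) to p_{ik} \in (0,1), then apply a Rasch-form parameterization
p_{ik} = \frac{\tau_{ik}}{1 + \tau_{ik}}, \qquad
p_{ik} = \frac{E_i}{E_i + \lambda_k}, \quad E_i, \lambda_k > 0.
```

Equating the two expressions and solving for \(\tau_{ik}\) yields \(\tau_{ik} = E_{i}/\lambda_{k}\), which is (3).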
The key to successfully reducing the number of estimated parameters for doubly indexed, subject-item quantities, such as τ ik , is to assume that the separate contributions that form (3) are additive on some scale with no interaction. In this case, additivity on a logarithmic scale is demonstrated by lnτ ik =lnE i −lnλ k . However, an issue inherited from the Rasch model is an identifiability problem that must be handled during estimation. One can see the identifiability problem by defining \(\forall c > 0, E_{i}^{*} = c E_{i}, \lambda_{k}^{*} = c \lambda_{k}\); then \(\frac{E_{i}}{\lambda_{k}} = \frac{E_{i}^{*}}{\lambda_{k}^{*}}\). When estimating a fixed-effects version of the model, the identifiability issue can be handled in several ways, such as by setting the mean of the item difficulty parameters to a neutral value, \(\bar{\lambda} = 1\). In the hierarchical version of the model, the hierarchical distribution mean can be set to a neutral value, μ λ =1, which was the method used in our analysis.
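The scaling indeterminacy and its fixed-effects resolution can be illustrated numerically (a hypothetical sketch with illustrative values; the model itself is estimated in JAGS):

```python
# Sketch: tau_ik = E_i / lambda_k is invariant to rescaling E and lambda
# by a common c > 0, so the scale is identified by pinning mean(lambda) = 1.
# All numeric values are illustrative, not estimates from the paper.

def taus(E, lam):
    """Matrix of tau_ik = E_i / lambda_k."""
    return [[Ei / lk for lk in lam] for Ei in E]

def close(A, B, tol=1e-12):
    """Elementwise approximate equality of two matrices."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

E = [0.8, 1.5, 2.0]          # informant competences (illustrative)
lam = [0.5, 1.0, 1.5, 2.0]   # item difficulties (illustrative)

# Rescaling both parameter sets leaves every tau_ik unchanged:
c = 3.7
assert close(taus([c * Ei for Ei in E], [c * lk for lk in lam]), taus(E, lam))

# Fixed-effects resolution: rescale so that the mean of lambda is 1
mbar = sum(lam) / len(lam)
lam_id = [lk / mbar for lk in lam]
E_id = [Ei / mbar for Ei in E]
assert abs(sum(lam_id) / len(lam_id) - 1) < 1e-12
assert close(taus(E_id, lam_id), taus(E, lam))   # likelihood unchanged
```

The hierarchical version used in the analysis accomplishes the same thing softly, by fixing the hierarchical mean μ λ =1 rather than renormalizing point estimates.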
Cite this article
Anders, R., Batchelder, W.H. Cultural Consensus Theory for the Ordinal Data Case. Psychometrika 80, 151–181 (2015). https://doi.org/10.1007/s11336-013-9382-9