Multinomial models with linear inequality constraints: Overview and improvements of computational methods for Bayesian inference☆
Introduction
Multinomial random variables form the backbone of discrete and categorical data analysis within psychology and the behavioral sciences. The key to any viable data analysis is the successful translation of an abstract theoretical hypothesis into a concrete, statistical model. As a simple example, consider the hypothesis that overconsumption of drugs (i.e., taking more tablets than prescribed) decreases with the number of daily doses (Paes, Bakker, & Soe-Agnie, 1997). To assess the validity of this prediction, one could test the statistical hypothesis that overconsumption is identical across all dosage regimes. If this hypothesis is rejected, one could carry out subsequent analyses to determine if the rates differ across the dosage conditions in a pairwise fashion. Yet, testing the “straw-man” model of all dosage conditions resulting in identical rates of overconsumption is not necessarily a faithful translation of the original hypothesis; rather, it is a means to an end, serving only as a pretext for carrying out tests on multiple pairs of dosage conditions.
To make this example more concrete, suppose we have three dosage regimes of the drug (i.e., once, twice, and three times daily) in a between-subjects design (Paes et al., 1997). We model the number of participants showing overconsumption in each condition as a binomial random variable and define the parameters $\theta_1$, $\theta_2$, and $\theta_3$ as the corresponding probabilities that an individual takes more tablets than prescribed. While we could test whether the three parameters are equal across all conditions (i.e., $\theta_1 = \theta_2 = \theta_3$), this does not directly follow from our original hypothesis, which only specified a monotonic relationship between overconsumption and dosage regimen. Testing the hypothesis of interest requires specifying an ordering relationship imposed on the overconsumption rates for each of the three dosage conditions: $\theta_1 \geq \theta_2 \geq \theta_3$ (1). Paired with the binomial likelihood function, these order constraints represent a more faithful statistical analysis of the hypothesis being tested (see also Hoijtink, 2011 for a full discussion). Testing order constraints such as these, and linear inequality constraints more generally, requires a bit more effort than simpler tests of equality, but, as we show, can be carried out efficiently and is more interpretable.
A key difficulty in analyzing inequality-constrained models and theories is that it can quickly become difficult to characterize the resulting restricted parameter space (e.g., Davis-Stober, 2012, Fishburn, 1992). Our drug dosage example is quite simple. Indeed, for Eq. (1), there are only two non-redundant pairwise order constraints, namely, $\theta_1 \geq \theta_2$ and $\theta_2 \geq \theta_3$. When combined with the inequality constraints that the probability of overconsumption must be between zero and one for all conditions (i.e., $0 \leq \theta_i \leq 1$ for $i = 1, 2, 3$), this completely characterizes the ordering relationships of interest. However, not all interesting hypotheses are so simple in structure. As we illustrate in Section 5.3, the random preference model of Regenwetter and Davis-Stober (2012) is far more complex with 75,834 non-redundant linear inequalities.
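As a minimal sketch of how such a system of inequalities can be checked in practice, the dosage constraints can be written in the generic form A·theta ≤ b. The matrix `A`, the vector `b`, and the helper `inside` below are illustrative names chosen for this sketch, and the direction of the ordering (rates decreasing from the first to the third regime) is an assumption read off the stated hypothesis.

```python
# Hypothetical encoding of the dosage example as A @ theta <= b.
# Each row of A, together with the matching entry of b, is one inequality.
A = [
    [-1.0, 1.0, 0.0],   # theta_2 - theta_1 <= 0, i.e. theta_1 >= theta_2
    [0.0, -1.0, 1.0],   # theta_3 - theta_2 <= 0, i.e. theta_2 >= theta_3
]
b = [0.0, 0.0]


def inside(theta, A, b, tol=1e-12):
    """Return True if every entry of theta is a probability and A theta <= b."""
    if any(t < 0.0 or t > 1.0 for t in theta):
        return False
    return all(
        sum(a_ij * t for a_ij, t in zip(row, theta)) <= b_i + tol
        for row, b_i in zip(A, b)
    )


print(inside([0.30, 0.20, 0.10], A, b))  # rates decrease across regimes
print(inside([0.10, 0.20, 0.30], A, b))  # rates increase across regimes
```

Only the two non-redundant order constraints are stored in `A`; the bounds between zero and one are checked separately, which keeps the matrix small.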
In general, bounded, linearly restricted parameter spaces can be defined in two different, yet equivalent, ways (Brøndsted, 2012). First, the restricted parameter space can be defined as the solution space to a system of a finite number of linear inequalities and equalities, similar to our drug dosage example. Alternatively, the same restricted parameter space can be defined as the convex hull of a set of extremal points (vertices). Let $\theta = (\theta_1, \theta_2, \theta_3)$. For our simple dosage example, the set of all extremal points is the set of all vectors, $\theta$, where each entry is equal to 0 or 1 and which satisfy the above inequalities, which yields the set: $(0, 0, 0)$, $(1, 0, 0)$, $(1, 1, 0)$, and $(1, 1, 1)$. Section 1.1 shows that it is often relatively easy to derive these vertices by enumerating all patterns that are predicted by a psychological theory even though it may be difficult to specify the corresponding system of inequality constraints (Regenwetter & Robinson, 2017).
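For the dosage example, the extremal points can be found by brute force: list all vectors of zeros and ones and keep those that respect the ordering. The sketch below assumes the ordering runs from the highest rate in the first regime to the lowest in the third, and the function name `satisfies_order` is hypothetical.

```python
from itertools import product


def satisfies_order(theta):
    """Order constraint of the dosage example: first >= second >= third."""
    return theta[0] >= theta[1] >= theta[2]


# Enumerate all 2^3 binary vectors and keep those inside the constrained space.
vertices = [v for v in product((0, 1), repeat=3) if satisfies_order(v)]
print(vertices)  # [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
```

This shortcut works here because the vertices of this particular parameter space are binary vectors; for arbitrary linear constraints the vertices need not have this form.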
Irrespective of how inequality constraints are formally specified, their statistical analysis has been a long-standing issue in mathematical psychology (Iverson & Falmagne, 1985) and statistics in general (Barlow et al., 1972, Robertson et al., 1988, Silvapulle and Sen, 2004). In classical statistics, basic results regarding the asymptotic distribution of the likelihood ratio test are valid when testing equality constraints, but are not when testing inequality constraints (Davis-Stober, 2009, Silvapulle and Sen, 2004). As a remedy, methods for inequality-constrained models have recently been developed in the Bayesian framework (Hoijtink et al., 2008, Karabatsos, 2005, Klugkist et al., 2005, Myung et al., 2005, Sedransk et al., 1985) or based on minimum description length (Heck et al., 2015, Klauer and Kellen, 2015, Rissanen, 1978). Multinomial models with inequality constraints have also been applied to the Bayesian analysis of contingency tables (e.g., Agresti and Hitchcock, 2005, Klugkist et al., 2010, Laudy and Hoijtink, 2007, Lindley, 1964). However, general-purpose software packages for Bayesian statistics such as JAGS (Plummer, 2003) or Stan (Stan Development Team, 2018) are often not suited for the analysis of models with complex inequality constraints. This is due to the fact that the boundary of the constrained parameter space is specified as a, typically complex, function of multiple parameters. As a result, the parameters are highly inter-dependent and often cannot be defined independently (for a counterexample with simple constraints, see Heck & Wagenmakers, 2016).
This article considers computational methods of carrying out Bayesian analyses on multinomial models with linear inequality constraints on the parameters. However, we go further than analyzing simple “toy” models such as the dosage example above and consider models defined by arbitrarily complex linear constraints on multinomial parameters. Analyzing this class of models is known to be computationally challenging, especially for highly complex linear constraints such as those defined by random preference models (Smeulders, Davis-Stober, Regenwetter, & Spieksma, 2018) and the axioms of additive conjoint measurement (Karabatsos, 2018). In the following, Section 1.1 highlights the relevance of inequality-constrained multinomial models for testing psychological theories. In Section 2, we introduce the notation, likelihood, and prior for multinomial models and the two types of representations for inequality constraints. Section 3 extends existing computational methods for binomial models with specific order constraints (e.g., Karabatsos, 2005, Myung et al., 2005) to multinomial models with arbitrary sets of linear inequalities. More precisely, we develop a general Gibbs sampler for parameter estimation and offer improved computational methods for estimating the encompassing Bayes factor for carrying out Bayesian model selection. Section 4 develops these methods for models that are specified by a set of predicted patterns using the vertex representation. This is useful because defining a restricted model may be straightforward for one type of representation but not the other, while switching between representations can be computationally infeasible (Avis, Bremner, & Seidel, 1997). In Section 5, we offer the R package multinomineq (Heck & Davis-Stober, 2019) and show how to apply inequality-constrained multinomial models in practice using concrete examples. Finally, Section 6 discusses the analysis of nested data, the choice of priors, and possible directions for future research.
Inequality constraints on multinomial parameters can arise in a number of ways. Similar to our drug consumption example, they can arise “organically” by directly instantiating the hypothesis of interest. For this example, the inequalities are implied by the natural hypothesis that the response categories should be ordered by dosage regimen. In this way, inequality constraints can provide a direct evaluation of the hypothesis of interest, in contrast to other, heuristic methods such as testing the equality of all three dosage condition parameters and then carrying out additional, post hoc analyses to determine directional differences. In later sections, we will consider other examples of linear inequality constraints that arise naturally from theoretic hypotheses that are more complex than simple order restrictions (Hilbig & Moshagen, 2014).
While not immediately obvious, linear inequality constraints can also arise when evaluating theories/models/axioms in which multiple predictions are made. Such theories are quite common, especially in the field of judgment and decision making. For example, consider the well-known transitivity of preference axiom (Regenwetter, Dana, & Davis-Stober, 2011). Depending upon an individual’s tastes, there are many ways for a decision maker to have transitive preferences over a set of choice alternatives. Evaluating multiple predictions of a theory simultaneously within a multinomial framework opens up additional ways to operationalize this theory of interest. As an example, we consider methods of stochastic specification for deterministic theories, although we note that the application of such methods (e.g., mixture methods) extends beyond the decision making domain (Davis-Stober, Morey, Gretton, & Heathcote, 2016).
Many psychological theories predict deterministic choice patterns across different contexts (e.g., different types of stimuli, items, conditions, measurement occasions, or pre-existing groups). For instance, a theory might provide a specific response pattern such as “participants prefer Option A over B in each of five choice scenarios” (Bröder & Schiffer, 2003). Often, however, theories predict more than one response pattern. As illustrated in Section 5.2 for the description-experience gap in the domain of risky gambles (Hertwig, Barron, Weber, & Erev, 2004), the hypothesis that participants assign more weight to small probabilities results in multiple predicted patterns. The complete set of predicted patterns can be obtained in different ways (Regenwetter & Robinson, 2017), for instance, by (a) translating a verbal theory into predicted patterns, (b) deriving algebraic implications of axioms or formal theories, and (c) brute force enumeration of all of the predictions made by the deterministic theory, typically under a set of theory-specific assumptions (e.g., theory parameter values). Irrespective of how the theoretical predictions are derived, observed choice frequencies are inherently noisy and exhibit a certain amount of variance both within and across persons or contexts. Hence, the question arises of how to define a stochastic model for empirical frequencies based on a set of deterministic predicted patterns (Carbone and Hey, 2000, Heck et al., 2017, Regenwetter and Davis-Stober, 2012, Regenwetter and Davis-Stober, 2018).
In multinomial models, each predicted choice pattern can be represented by a vector of probabilities of either one (an option is deterministically chosen) or zero (an option is not chosen; Bröder & Schiffer, 2003). Fig. 1 illustrates this for two independent binomial probabilities of preferring Option A over B in a control and an experimental condition, respectively. The three black points in Fig. 1A show three predicted patterns of a hypothetical theory, each represented by a vector with entries of zero or one. For instance, the pattern $(1, 0)$ represents the prediction that Option A is chosen in the control condition (since the first entry is one) whereas Option B is chosen in the experimental condition (since the second entry is zero).
To derive a stochastic model based on a set of predictions, it is important to consider why a psychological theory makes multiple predictions in the first place (Regenwetter & Robinson, 2017). A theory might assume that one of the predicted patterns consistently describes the “true” data-generating mechanism across all measurement occasions. According to this interpretation, theory-inconsistent responses merely emerge from unsystematic errors in responding (e.g., due to inattention) whereas latent preferences are stable. In our example, this assumption results in a stochastic model with two independent error probabilities for the two conditions. These error probabilities serve as free parameters and are usually constrained to be below a predefined, fixed threshold such as 20%. In Fig. 1B, this independent-error model is illustrated geometrically by square boxes around the three predicted patterns.
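The geometry of the independent-error model can be sketched in a few lines: a probability vector is admissible if it lies within the error threshold of some predicted pattern in every condition. The three patterns below are hypothetical placeholders (the patterns shown in Fig. 1 are not reproduced here), and `in_error_boxes` is an illustrative name.

```python
def in_error_boxes(theta, patterns, max_error=0.20):
    """True if theta is within max_error of one predicted pattern in every coordinate."""
    return any(
        all(abs(t - p) <= max_error for t, p in zip(theta, pattern))
        for pattern in patterns
    )


# Hypothetical predicted patterns for (control, experimental) conditions.
patterns = [(0, 0), (1, 0), (1, 1)]

print(in_error_boxes((0.90, 0.15), patterns))  # close to the pattern (1, 0)
print(in_error_boxes((0.50, 0.50), patterns))  # far from every pattern
```

Because the boxes around different patterns are disjoint for thresholds below 50%, the admissible region is a union of separate boxes and is therefore not convex.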
Alternatively, a theory might assume that latent preference states randomly fluctuate across measurement occasions (e.g., across time, persons, or situations), whereas the response process is error-free (Regenwetter & Robinson, 2017). This means that at each measurement occasion, one of the predicted patterns describes the “true” data-generating mechanism perfectly. However, since we do not know which latent states generated the responses in which trials, this error specification leads to a finite mixture model over the predicted patterns (Regenwetter et al., 2014). Fig. 1C shows the parameter space of this mixture model for our example. Essentially, the model permits only those probability vectors that are inside the triangle obtained by connecting the three predicted preference patterns by straight lines. Geometrically, this area is the convex hull of the finite number of predicted patterns and defines a convex polygon in two dimensions (cf. Eq. (8)). More generally, for three choice probabilities, the convex hull results in a convex polyhedron, and for an arbitrary number of probabilities, this geometric object is known as a convex polytope (Koppen, 1995, Suck, 1992).
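For three predicted patterns in two dimensions, membership in the convex hull can be checked by solving for the mixture weights directly: a point is inside the triangle exactly when all three weights are non-negative. The sketch below uses barycentric coordinates; the three vertices are hypothetical placeholders rather than the patterns of Fig. 1, and the function names are illustrative.

```python
def barycentric_weights(theta, v1, v2, v3):
    """Mixture weights (w1, w2, w3) with theta = w1*v1 + w2*v2 + w3*v3 and sum 1."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = theta, v1, v2, v3
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def in_convex_hull(theta, v1, v2, v3, tol=1e-12):
    """True if theta is a mixture of the three vertices (all weights >= 0)."""
    return all(w >= -tol for w in barycentric_weights(theta, v1, v2, v3))


# Hypothetical predicted patterns (vertices of the triangle).
v1, v2, v3 = (0, 0), (1, 0), (1, 1)

print(barycentric_weights((0.6, 0.3), v1, v2, v3))
print(in_convex_hull((0.6, 0.3), v1, v2, v3))
print(in_convex_hull((0.3, 0.6), v1, v2, v3))
```

The weights have a direct substantive reading under the mixture interpretation: they are the proportions of measurement occasions on which each latent pattern generated the response. This closed-form solution only exists for a triangle; with more vertices than dimensions plus one, the weights are generally not unique.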
The present paper is concerned with mixture models such as the one illustrated in Fig. 1C. Theoretically, these models assume random variation in the latent, data-generating process, which can be represented statistically as a mixture distribution over the finite set of predicted patterns (Regenwetter & Robinson, 2017). The parameter space of these models can equivalently be described by specifying explicit linear inequality constraints on choice probabilities, or by the convex hull of all response patterns that are predicted by a theory. These mixture models are quite general and, depending upon the experimental design, can provide a strong test of the theory/axiom of interest. For example, applied to a single individual with choice responses aggregated over multiple time points, a violation of a mixture model over a set of predictions provides evidence that this individual must have violated the theory of interest, because the model allows for an arbitrary distribution over all possible theory-consistent preferences.
Section snippets
Multinomial models with linear inequality constraints
In this section, we outline the notation, likelihood function, and prior distribution of multinomial models and introduce the two equivalent formal representations of linear inequality constraints.
Bayesian inference using the inequality representation
In this section, we summarize and improve computational methods for the Bayesian analysis of multinomial models given a set of linear inequality constraints.
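To illustrate the basic idea behind the encompassing Bayes factor mentioned in the introduction, the sketch below uses the simple counting estimator: the proportion of posterior samples that satisfy the constraints divided by the proportion of prior samples that do. This is the plain Monte Carlo version, not the improved methods the article develops. The counts `k` and `n` are made up for illustration, uniform Beta(1, 1) priors are assumed, and the ordering is assumed to run from the highest to the lowest rate.

```python
import random


def prop_satisfying(draw, n_samples, rng):
    """Proportion of sampled parameter vectors with theta_1 >= theta_2 >= theta_3."""
    hits = 0
    for _ in range(n_samples):
        t1, t2, t3 = draw(rng)
        hits += t1 >= t2 >= t3
    return hits / n_samples


# Made-up data: overconsumption counts k out of n participants per regime.
k = [30, 20, 10]
n = [100, 100, 100]


def draw_prior(rng):
    # Independent uniform Beta(1, 1) priors on the three binomial rates.
    return [rng.betavariate(1, 1) for _ in range(3)]


def draw_posterior(rng):
    # Conjugate update: Beta(1 + successes, 1 + failures) for each condition.
    return [rng.betavariate(1 + ki, 1 + ni - ki) for ki, ni in zip(k, n)]


rng = random.Random(1)
prior_prop = prop_satisfying(draw_prior, 20000, rng)
post_prop = prop_satisfying(draw_posterior, 20000, rng)
bayes_factor = post_prop / prior_prop  # constrained vs. encompassing model

print(prior_prop, post_prop, bayes_factor)
```

With three exchangeable continuous parameters, the prior proportion should be close to 1/6, since each of the six orderings is equally likely a priori. The counting estimator becomes inefficient when the constrained space is a tiny fraction of the encompassing space, because almost no samples fall inside it.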
Bayesian inference using the vertex representation
In the following, we develop computational tools for obtaining posterior samples and computing the Bayes factor for inequality-constrained multinomial models that are defined by the $V$-representation. Instead of providing a set of inequalities as in the $Ab$-representation, the $V$-representation uses a matrix $V$ that contains one vertex (e.g., a predicted pattern) per row as illustrated in Eq. (8). For many psychological theories, it is indeed easier to obtain a list of all admissible
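A vertex matrix of this kind can be used directly to generate parameter vectors that are guaranteed to lie inside the constrained space: draw non-negative mixture weights that sum to one and form the weighted average of the rows. The sketch below uses the four vertices of the dosage example (assuming rates ordered from highest to lowest) and Dirichlet(1, ..., 1) weights; the function name `random_mixture_point` is hypothetical, and this is not the sampler developed in the article.

```python
import random

# Vertex matrix for the dosage example: one vertex per row.
V = [
    (0, 0, 0),
    (1, 0, 0),
    (1, 1, 0),
    (1, 1, 1),
]


def random_mixture_point(V, rng):
    """Draw Dirichlet(1, ..., 1) weights and return (weights, weighted average of rows)."""
    gammas = [rng.gammavariate(1.0, 1.0) for _ in V]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    dim = len(V[0])
    theta = [sum(w * v[d] for w, v in zip(weights, V)) for d in range(dim)]
    return weights, theta


rng = random.Random(2019)
for _ in range(3):
    weights, theta = random_mixture_point(V, rng)
    print([round(t, 3) for t in theta])
```

Every generated vector respects the order constraint by construction, without the inequalities ever being written down. Note that for a general polytope, uniform weights over the vertices do not imply a uniform distribution over the parameter space.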
The R package multinomineq
We implemented the above computational methods for multinomial models with convex, inequality constraints in C++ using the linear-algebra library Armadillo (Sanderson, 2010). This has the advantage that many of the sequential computations can efficiently be performed using precompiled code. To also make the methods available to a broad audience, the functions are embedded in the R package multinomineq, which is freely available on GitHub (www.github.com/danheck/multinomineq/; Heck &
Discussion
In mathematical psychology in general and judgment and decision making in particular, many theories can be formulated by a set of linear inequality constraints on multinomial models (Iverson, 2006). This includes representational measurement theory (Karabatsos, 2001, Krantz et al., 1971), state-trace analysis (Prince et al., 2012), decision axioms such as transitivity (Myung et al., 2005, Regenwetter et al., 2011), random utility models (for a review, see Marley & Regenwetter, 2017), and
References (87)
- How good are convex hull algorithms? Computational Geometry, 11th ACM symposium on computational geometry (1997)
- How to assess a model’s testability and identifiability. Journal of Mathematical Psychology (2000)
- Generalized two- and three-dimensional clipping. Computers & Graphics (1978)
- Analysis of multinomial models under inequality constraints: applications to measurement theory. Journal of Mathematical Psychology (2009)
- A lexicographic semiorder polytope and probabilistic representations of choice. Journal of Mathematical Psychology (2012)
- Individual differences in the algebraic structure of preferences. Journal of Mathematical Psychology (2015)
- Extended formulations for order polytopes through network flows. Journal of Mathematical Psychology (2018)
- Bayes factors for state-trace analysis. Journal of Mathematical Psychology (2016)
- Primary facets of order polytopes. Journal of Mathematical Psychology (2016)
- Induced binary probabilities and the linear ordering polytope: a status report. Mathematical Social Sciences (1992)
- From information processing to decisions: formalizing and comparing probabilistic choice models. Cognitive Psychology
- Adjusted priors for Bayes factors involving reparameterized order constraints. Journal of Mathematical Psychology
- Testing order constraints: qualitative differences between Bayes factors and normalized maximum likelihood. Statistics & Probability Letters
- An essay on inequalities and order-restricted inference. Journal of Mathematical Psychology
- Statistical issues in measurement. Mathematical Social Sciences
- The exchangeable multinomial model as an approach to testing deterministic axioms of choice and measurement. Journal of Mathematical Psychology
- The flexibility of models of recognition memory: the case of confidence ratings. Journal of Mathematical Psychology
- Parametric order constraints in multinomial processing tree models: an extension of Knapp and Batchelder (2004). Journal of Mathematical Psychology
- The Bayes factor for inequality and about equality constrained models. Computational Statistics & Data Analysis
- Random utility representation of binary choice probabilities: critical graphs yielding critical necessary conditions. Journal of Mathematical Psychology
- Prior distributions for random choice structures. Journal of Mathematical Psychology
- A Bayesian approach to testing decision making axioms. Journal of Mathematical Psychology
- Modeling by shortest data description. Automatica
- Testing probabilistic models of choice using column generation. Computers & Operations Research
- Bayesian model selection for group studies. NeuroImage
- Geometric and combinatorial properties of the polytope of binary choice probabilities. Mathematical Social Sciences
- An encompassing prior generalization of the Savage–Dickey density ratio. Computational Statistics & Data Analysis
- Bayesian inference for categorical data analysis. Statistical Methods & Applications
- Computing convex hulls and counting integer points with polymake. Mathematical Programming Computation
- Statistical inference under order restrictions: Theory and application of isotonic regression
- Bayesian strategy assessment in multi-attribute decision making. Journal of Behavioral Decision Making
- An introduction to convex polytopes
- Which error story is best? Journal of Risk and Uncertainty
- A model-based test for treatment effects with probabilistic classifications. Psychological Methods
- Porta - polyhedron representation transformation algorithm
- Non-uniform random variate generation
- Data analysis using Stein’s estimator and its generalizations. Journal of the American Statistical Association
- Frequently asked questions in polyhedral computation
- Bayesian analysis of constrained parameter and truncated data problems using Gibbs sampling. Journal of the American Statistical Association
- Objective priors: an introduction for frequentists. Statistical Science
- Some do and some don’t? Accounting for variability of individual difference structures. Psychonomic Bulletin & Review
- A caveat on the Savage-Dickey density ratio: the case of computing Bayes factors for regression parameters. British Journal of Mathematical and Statistical Psychology
☆ The R package multinomineq can be installed from https://github.com/danheck/multinomineq/. Data and R code for the analyses are available at the Open Science Framework at https://osf.io/xv9u3/.
1 The first author was supported by the research training group Statistical Modeling in Psychology (GRK 2277), funded by the German Research Foundation (DFG).
2 The second author was supported by the National Science Foundation, United States (grant SES 14-59866) and the National Institute of Health, United States (grant K25AA024182).