Disequilibrium in behavior analysis: A disequilibrium theory redux

Disequilibrium theory is an approach to reinforcement that reconsiders the putative response-strengthening prowess of stimuli. This disequilibrium approach, the culmination of the response deprivation hypothesis, reliably predicts changes in behavior without reference to a response-strengthening process. While the strengthening model of reinforcement has received renewed and critical appraisal in behavior analysis, its appraisers have not fully considered the role that a disequilibrium conceptualization might play in their respective theories of reinforcement. In this essay we celebrate William Timberlake's legacy by elucidating the assumptions of disequilibrium theory and by exploring its predictions and implications within behavior analysis. We treat the disequilibrium approach to reinforcement as the theory of reinforcement in behavior analysis, and in doing so, we distinguish disequilibrium conditions from motivating operations and explore future directions regarding the potential to predict generalization and maintenance outcomes. The disequilibrium approach to reinforcement is not a mere deprivation operation used for the purpose of establishing a stimulus as a "reinforcer"; it is a general theory of behavior.


Introduction
William Timberlake is the epitome of Skinner's (1956) first unformalized principle of scientific method: "When you run onto something interesting, drop everything else and study it. I [Skinner] tore up the Parthenon and started over" (p. 223). The Parthenon on which Timberlake went to work was Skinner's operant chamber. He fashioned channels into the floors of operant chambers, released ball bearings along them, and elicited species-typical behaviors that were once obscured by the experimenter's choice of lever (Timberlake et al., 1982). He repositioned a food hopper, usually in the wall of an operant chamber, into the floor of a pigeon's chamber in order to mimic its ecological niche and ultimately show that pigeons were not as superstitious as Skinner (1948) once thought (Timberlake and Lucas, 1985). Most notable, in the eyes of these authors at least, was Timberlake's (and James Allison's) use of a wheel-running apparatus to uncover the necessary conditions of reinforcement, so as to displace the prevailing and presumptive reinforcement models of the time (Timberlake and Allison, 1974).
Reinforcement, arguably the foremost principle in behavior analysis, is that which follows a response, contingently or adventitiously, and strengthens that response as a result (Skinner, 1953; cf. Staddon, 1992). Behavior analysts may refer to the principles of learning and behavior as the principles of reinforcement because it is from the principle of reinforcement that all other principles of learning and behavior emanate: discrimination, generalization, and extinction, considered broadly. Discrimination and generalization are the products of responses reinforced (strengthened) in the presence of stimuli, while extinction is a process that is operative only with respect to responses previously reinforced (strengthened). Reinforcement is so central to any account of operant behavior that it is often taken for granted (Shahan, 2017); it is treated more like an assumption than an empirically testable law of effect. To question this assumption is to question the principle and principles of reinforcement, the warp and weft of behavior analysis. Timberlake did just that throughout his career when he questioned what it meant to call a stimulus a reinforcer (see, e.g., Timberlake, 2004). Timberlake and Allison (1974) were keen on the issues that a strengthening model of reinforcement wrought: "Despite its plausibility and widespread acceptance, the appeal of the strengthening process is primarily intuitive; the process itself is a mystery" (p. 149). Such a model of reinforcement "…directs experimentation and interpretation but cannot itself be tested" (p. 149). Timberlake and Allison's own model of reinforcement, their attempt to displace the implicit strengthening model underlying Skinner's theory of reinforcement (Timberlake, 1988), was the response deprivation hypothesis. Unlike the assumption of strengthening that could not be tested, Timberlake and Allison put forth a working hypothesis that could be verified in some cases, falsified in others, and elaborated to
accommodate the challenges it encountered. Skinner's law of effect, for instance, was pragmatic and inductive: to name a stimulus a reinforcer was expedient and true only to the extent that it was observed to have actually increased behavior. Timberlake and Allison's response deprivation hypothesis was pragmatic, but more deductive than inductive: to name a stimulus a reinforcer was based on a priori circumstances that established, with a degree of certainty, whether a given set of conditions would work to increase behavior (Timberlake and Farmer-Dougan, 1991). Timberlake and Allison's (1974) response deprivation hypothesis is a substantive model of reinforcement that was elevated to the level of theory in Timberlake (1980), Timberlake (1984a), and Timberlake (1993), to name a few. It predicts that which will increase (or decrease) behavior, and therefore, specifies the conditions for reinforcement (and punishment) a priori. "Unfortunately, despite the theoretical and empirical importance of this capability, it is rarely discussed in texts on learning or behavior modification" (Timberlake, 1984b, p. 385). If the response deprivation hypothesis is discussed in behavior analytic texts or with colleagues, it is relegated to a section for consideration next to Premack's (e.g., Pierce and Cheney, 2013; cf.
an exception, Domjan, 2015) or is subsumed by the concept of the motivating operation (Klatt and Morris, 2001; Michael, 1993a). The purpose of the current paper is to elucidate and treat the response deprivation hypothesis as the theory of reinforcement in behavior analysis (i.e., to explore its predictions and implications within behavior analysis). Skinner's theory of reinforcement is personally satisfying when it works, with our highly tuned apparatuses in the laboratory and with our seat-of-the-pants preference assessments in the field, but such is not a comprehensive theory of reinforcement (Konarski et al., 1981; Timberlake, 1988). Timberlake's approach to reinforcement refines and replaces Premack's, complements Skinner's by displacing its underlying strengthening model, and is more than a clever deprivation operation to establish a stimulus as a "reinforcer."

What is the reinforcer?
According to textbook parlance, the reinforcer is a stimulus that follows behavior and increases that behavior as a result. Behavior analysts, scientists and practitioners alike, seek out these stimuli in order to assert control via the contrivance of a contingency: if you do X, then you will get Y. Behavior X, measured in rate, duration, or percentage correct, increases because of one or more of the following: (1) Y was observed to increase behavior in another context, (2) Y was restricted for an extended period of time with no opportunity to obtain it, or (3) Y is a stimulus associated with high probability behavior. Number (1) refers to the presumed transsituationality of stimuli as reinforcers (Meehl, 1950), number (2) refers to the establishment of stimuli as reinforcers via motivating operations (Laraway et al., 2003), and number (3) refers to the hypothesis that high probability behavior reinforces low probability behavior (Premack, 1959). Each is a reason given for the "effectiveness" or "value" of a stimulus to reinforce. Different combinations of these three approaches are used in the laboratory and field. If (1) does not hold true, then defer to (2); if (2) is not feasible, then defer to (3) by guessing which stimulus is associated with high probability behavior. Premack's principle most closely resembles Timberlake and Allison's (1974) approach because of its emphasis on a probability differential between activities, but Premack's general rule of thumb is often the subject of misuse when behavior analysts guess the probabilities of behavior. For example, "Kids like playing Fortnite (high probability), so let's make that contingent upon doing their homework (low probability)" is true only to the extent that you never encounter the kids who "like" doing their homework (high probability).
Instead of gratuitously applying approaches (1) through (3), Timberlake and Allison (1974) generally asked: What do individuals usually do during their free time, without any constraints on their behavior, and what would happen if we disrupted that behavior with a contingency? At the heart of this question are activities. For Timberlake and Allison there are no "reinforcers" or "punishers," only reinforcement or punishment effects attributed to the disruption of an individual's unconstrained activities. Arrange a contingency such that an individual watches less Netflix than she usually does during her free time, and that individual will engage in more behavior to access it (a reinforcement effect); arrange a contingency such that the same individual watches more Netflix than she usually does during her free time, and she will stop engaging in behavior to access it (a punishment effect). On the face of it these examples might sound like establishing and abolishing operations, respectively, but for reasons we will discuss later, they are not.
The significance of activities to Timberlake and Allison's (1974) response deprivation theory of reinforcement is evident in their units of analysis: instrumental and contingent activities. Activities are designated as instrumental when they produce access to the contingent activities that depend upon them. For instance, pecking is instrumental in gaining access to food for a pigeon, while crying is instrumental in gaining access to food for an infant. The contingent activity is not the passive receipt of food, but the action of eating that food after having worked for it. The stream of daily activities that includes eating, drinking, playing, working, and sleeping is segmented for analysis in terms of instrumental and contingent activities. Timberlake and Allison showed that if you disrupt those activities, individuals will engage in more or less instrumental action to compensate for that disruption. That disruption of instrumental and contingent activities is called disequilibrium.

What is disequilibrium?

Timberlake and Allison's (1974) response deprivation hypothesis culminated in Timberlake's (1980) A Molar Equilibrium Theory of Learned Performance. Molar equilibrium theory is more commonly referred to as the "disequilibrium approach," and that is the name of the theory of reinforcement we will use throughout (Timberlake, 1984a; Timberlake and Farmer-Dougan, 1991). The disequilibrium approach is a theory unto its own because of the unique units of analysis, methodology, and models that flow from its assumptions. We will describe the assumptions of the disequilibrium approach throughout, and in doing so, we will elucidate one of its models. The model that captures the crux of the disequilibrium approach is the disequilibrium model of reinforcement and punishment (Eisenberger et al., 1967; Timberlake and Allison, 1974; Heth and Warren, 1978). The disequilibrium model of reinforcement and punishment can be called the basic disequilibrium model, for there are other disequilibrium models that extend it (see Heth and Warren, 1978; Timberlake and Wozny, 1979). For an advanced generalization of the disequilibrium approach, which encompasses and captures all of its important features, readers should see Hanson and Timberlake (1983) for the differential equations that account for many of the classic reinforcement schedule effects (i.e., fixed-ratio, fixed-interval, and variable-interval effects).
A general assumption of any disequilibrium model is that instrumental and contingent activities have an equilibrium, also referred to as a set-point or bliss-point (Timberlake, 1984a). Equilibrium is the steady-state responding of an individual when there are no constraints placed on their behavior. Equilibria are measured during free operant baselines, or paired baselines, in which there is an opportunity to engage in at least two simultaneously available activities. By measuring how much or how often an individual engages in each of the two available activities, scientists and practitioners can set the terms of a contingency to increase or decrease instrumental activity. The effectiveness of disequilibrium models, as technologies to change behavior, depends on a measure of an individual's equilibria. Equilibria, set-points, and bliss-points, however, are not fixed entities or ideals internal to the organism, as some critics of the disequilibrium approach feared (Hursh et al., 1984). The terms equilibrium, set-point, and bliss-point serve a methodological function just like basin of attraction in dynamical systems models and baseline in single-case experimental designs. Equilibria are a point of comparison and should be measured often because they reflect the activity patterns of organisms in state X, at time Y, and in context Z.
The notion that there are context-specific equilibria is the first assumption of the disequilibrium approach (Timberlake, 1980). The second assumption of the disequilibrium approach considers the imposition of contingencies that are natural or contrived: When a contingency arrangement, such as a fixed-ratio schedule, is discrepant with an individual's equilibrium, a disequilibrium condition will ensue and that individual will work to reduce it (Timberlake, 1980). Returning to the example of Netflix, let's say you eat 10 thin mints for every 60 min of Netflix watched. If we arranged a contingency such that you could only access 5 min of Netflix for every 10 thin mints eaten, you would experience what is called a "response deficit" (Timberlake, 1980). Five minutes of Netflix is less than your typical 60-min watching experience. This response deficit is the disequilibrium condition you will work to reduce by eating more thin mints to gain access to as close to 60 min of Netflix as possible. Eating 10 thin mints is the instrumental activity that gets you 5 min of access to Netflix, the contingent activity. At this rate, 50 thin mints eaten would get you only 25 min of access to Netflix overall, a serious response deficit that is offset only by eating more thin mints. But how many more can you really eat?
Another disequilibrium condition is called "response excess" (Timberlake, 1980). If you are granted access to 60 min of Netflix on the condition that you must eat 1 thin mint to get it, you will have met your equilibrium for watching Netflix but will be in deficit with respect to your cookie-eating habit. Eat one thin mint and you will get your 60 min of Netflix; eat a second thin mint and you will be required to watch another 60 min of Netflix, which puts you at 120 min of Netflix overall. One hundred and twenty minutes of Netflix is a response excess that can only be offset by putting that cookie down, now. The excess of Netflix is the disequilibrium condition you work to reduce, not by engaging in more cookie eating, but by engaging in less. Below we describe these disequilibrium conditions more precisely, with a brief overview of the basic disequilibrium model of reinforcement and punishment (see Jacobs et al., 2017, for a more extensive treatment and tutorial on the basic disequilibrium model).
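The two conditions just described reduce to a single comparison between the schedule ratio I/C and the baseline ratio Oi/Oc. A minimal sketch in Python, using the thin-mint/Netflix numbers from the text (the function and its output labels are ours, not Timberlake's):

```python
def disequilibrium(I, C, Oi, Oc):
    """Classify the condition an I/C schedule imposes relative to the
    free operant baseline Oi/Oc (basic disequilibrium model)."""
    if I * Oc > C * Oi:      # I/C > Oi/Oc, compared without division
        return "response deficit (reinforcement effect predicted)"
    elif I * Oc < C * Oi:    # I/C < Oi/Oc
        return "response excess (punishment effect predicted)"
    return "equilibrium (no change predicted)"

# Baseline from the text: 10 thin mints eaten per 60 min of Netflix watched.
Oi, Oc = 10, 60
print(disequilibrium(10, 5, Oi, Oc))   # 10/5 > 10/60 -> response deficit
print(disequilibrium(1, 60, Oi, Oc))   # 1/60 < 10/60 -> response excess
```

A schedule that merely reproduces the baseline ratio (e.g., 10 mints for 60 min) would fall in the third branch and predict no change.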

Reinforcement
According to the disequilibrium approach, the necessary condition for reinforcement is a response deficit. Timberlake and Allison (1974) referred to this as response deprivation, but so as not to conflate the conditions of reinforcement with motivating operations, such as time since one's last meal, we prefer response deficit (Timberlake, 1980). The basic disequilibrium model of reinforcement specifies a response deficit in the following form:

I/C > Oi/Oc (1)

where I and C represent the instrumental and contingent response requirements, respectively, and Oi and Oc represent free operant baseline measures of the instrumental and contingent activities, respectively. In relatively few terms the model says: If you impose an instrumental-to-contingent response requirement that is greater than what an individual usually does during his/her free time, then you will observe an increase in instrumental activity. Take an example from Konarski et al. (1980), whereby they increased a first-grader's coloring activity by making math the contingent activity. Using Eq.
(1) above, their contingency set I/C for coloring and math greater than Oi/Oc. In Dave's case, coloring increased relative to baseline and math activity remained in deficit. Dave reduced that deficit as much as was possible in session but was constrained by the I/C contingency. An important detail to note regarding Dave's case is that math, a low probability behavior compared to coloring, was used as the contingent activity. According to Premack's (1959) probability differential hypothesis, high probability behavior must be the contingent activity in order for there to be a reinforcement effect. In Dave's case, high probability behavior did not reinforce low probability behavior according to Premack's principle. Instead, when the low probability math activity was made contingent upon engaging in the high probability coloring activity, the high probability coloring activity increased. This is where the disequilibrium approach departs from the Premack principle: A probability differential between high and low probability behaviors is not the necessary condition for reinforcement effects. As long as I/C is greater than Oi/Oc, it does not matter which behavior is high probability or low probability (Timberlake and Wozny, 1979). This is to say that scientists and practitioners can designate, based on their analytic goals, either behavior as instrumental or contingent.

Punishment
In the case of punishment effects, the necessary condition is a response excess. Heth and Warren (1978) referred to this as response satiation, but for the same reason we prefer response deficit, we prefer Timberlake's (1980) usage of response excess. The basic disequilibrium model of punishment specifies response excess as the inverse of reinforcement:

I/C < Oi/Oc (2)

where all variables except the sign of inequality are the same. Here the model generally says that if you impose an instrumental-to-contingent response requirement that is less than what an individual usually does during his/her free time, then you will observe a decrease in behavior. Take an example from Realon and Konarski (1993), whereby they decreased a 19-year-old's self-injurious hand-mouthing by transforming it into an instrumental activity that produced an excess in leisure activity. Given Eq. (2), Realon and Konarski's contingency was as follows:

1/5 < 131.5/566

This was the contingency arranged for Thomas, where 131.5 was the mean frequency of instances of hand-mouthing and 566 was the mean duration of leisure engagement in seconds. The terms of I/C required Thomas to engage in 5 s of leisure activity for every 1 instance of hand-mouthing. Therefore, if Thomas engaged in hand-mouthing anywhere near his equilibrium of 131.5, then he would exceed his leisure activity equilibrium (Fig. 2).
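The arithmetic behind Thomas's contingency can be verified directly from the values reported above; a brief sketch:

```python
# Schedule and free operant baselines reported for Thomas
# (Realon and Konarski, 1993).
I, C = 1, 5              # 1 instance of hand-mouthing buys 5 s of leisure
Oi, Oc = 131.5, 566.0    # mean hand-mouthing count; mean leisure seconds

print(I / C)              # 0.2
print(round(Oi / Oc, 3))  # 0.232
print(I / C < Oi / Oc)    # True: response excess, punishment effect predicted

# Hand-mouthing at baseline would force 131.5 * 5 = 657.5 s of leisure,
# well above the 566-s leisure equilibrium.
print(Oi * C)             # 657.5
```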

The theory of reinforcement
The basic disequilibrium model of reinforcement and punishment encompasses and answers our initial question: What do individuals usually do during their free time, without any constraints on their behavior, and what would happen if we disrupted that behavior with a contingency? As shown in the studies cited thus far, individuals increase or decrease instrumental activity as a function of the imposed disequilibrium conditions. That is, changes in instrumental activity approach or maintain that which individuals usually do during their free time. An increase or decrease in instrumental activity, however, is not the only outcome specified by the disequilibrium approach. A third assumption of the disequilibrium approach is that reductions in disequilibrium result in learned performances when those reductions are reliably associated with stimuli and responses (Timberlake, 1980). This is to say that stimuli and responses will acquire new or modified functions when disequilibrium conditions are reduced. Thomas's hand-mouthing, for instance, came to serve an instrumental function across people and contexts (Realon and Konarski, 1993). The assumption that individuals learn from reductions in disequilibrium distinguishes disequilibrium conditions from motivating operations and has implications regarding response acquisition, stimulus control, and generalization. These implications have not been thoroughly brought to the fore in behavior analytic research and practice, so what follows is an examination of those implications when disequilibrium conditions are considered as distinct from motivating operations and when the disequilibrium approach is taken to be the theory of reinforcement.

Disequilibrium predictions and motivating operations
The motivating operation (MO) concept is characterized by two effects: the value-altering effect and the behavior-altering effect. While the value-altering effect is any operation that alters "the effectiveness of reinforcers and punishers," the behavior-altering effect is any operation that alters "the frequency of operant response classes related to those consequences" (Laraway et al., 2003, p. 412). A defining characteristic of the MO, which remains unstated, is its reliance on a strengthening model of reinforcement. "Reinforcers" and "punishers" are described as things endowed with an effectiveness to strengthen or suppress behavior (see Michael, 1993b). This is a major difference between the MO concept and disequilibrium conditions. According to the disequilibrium approach, there are no "reinforcers" or "punishers" to be more or less effective. Instead, there are context-specific instrumental and contingent equilibria that are susceptible to change via the imposition of a contingency that produces a disequilibrium condition. Additionally, it is the terms of a contingency (I/C < or > Oi/Oc) that have behavior-altering effects according to the disequilibrium approach; not pre-session deprivation and satiation operations or the administration of stimulants or depressants (see Poling et al., 2017, for types of MOs). In Holburn and Dougher's (1986) words: "It is important to note that the deprivation [response deficit] or satiation [response excess] is operating in the contingency; it does not require extended periods of time away from the treatment setting where an individual is denied access to (or bombarded with) the contingent response" (p. 72; italics theirs, bracketed terms added). This is not to say that there is no place for motivation within the disequilibrium approach. Rather, the disequilibrium approach's departure from a strengthening model of reinforcement requires a reconceptualization of Michael's (1993b) MO concept. In this reconceptualization, the effects of
MOs are not on the stimuli with which individuals interact. Arguably, and consistent with the disequilibrium approach, the effects of MOs are on the state of the organism, O (Killeen and Jacobs, 2017a). "O" is the term with which the state of the organism, physiological, emotional, or historical, is represented in the modulated, "four-term" contingency proposed by Killeen and Jacobs (2017a, 2017b).

Fig. 2. A punishment contingency exemplified by the re-plotted data for participant Thomas in Realon and Konarski (1993). The available data in the last intervention condition supplied the values for I/C.
If MOs operate on the state of the organism, O, then disequilibrium conditions operate on the organism's behavior (Oi and Oc). While the state variable, O, modulates which classes of stimuli an organism will approach or avoid (Killeen and Jacobs, 2017b), disequilibrium conditions set the terms for how much an organism will approach or avoid given that state. Free operant baseline methods are essential to the disequilibrium approach because they reflect the state of the organism in Oi and Oc. As a result, "…the equilibrium approach can specify conditions under which a deprived animal will show no increase in instrumental responding, other conditions under which an undeprived animal will show an increase in instrumental responding, and other conditions in which the reverse of these effects should occur" (Timberlake, 1980, p. 10). The predictions of the disequilibrium approach described by Timberlake (1980) provide analytic justification for why scientists and practitioners might want to distinguish MOs from disequilibrium. The evidence used to support the effect of MOs on stimulus control is a case in point. Lotfizadeh et al. (2012) replotted data from Thomas and King (1959) to show how food deprivation changed the shape and level of generalization gradients for pigeons at 60%, 70%, 80%, and 90% of ad libitum weight. Fig. 3 shows the mean responses of those pigeons with the 550-nm wavelength designated as the discriminative stimulus (SD). As Lotfizadeh et al.
(2012) describe it, "…visual analysis of the obtained generalization gradients suggests that the range of test stimuli that evokes responding is wider when deprivation is high than when it is lower…" (p. 98). In other words, there is a deterioration of stimulus control at higher levels of deprivation, for there is a greater number of responses in the presence of untrained stimuli. While Lotfizadeh and colleagues attribute Thomas and King's data to the effect of MOs on behavior, a disequilibrium analysis attributes their data to the effect of defective contingencies of reinforcement. Thomas and King did not control for disequilibrium conditions at each level of deprivation and did not implement discrimination training at any level of deprivation besides 80 percent. Interestingly, the level of deprivation with the steepest slope is the 80% free-feeding group.
A disequilibrium approach to Thomas and King's (1959) preparation would measure independent groups of pigeons' Oi and Oc at each level of deprivation, implement an I/C ratio greater than Oi/Oc during discrimination training at each level of deprivation, and, finally, test for stimulus control by presenting trained and untrained stimuli at each level of deprivation and in the absence of disequilibrium conditions. If the assumption that reductions in disequilibrium result in learned performances holds true, then each respective group of pigeons should exhibit a peak in the presence of the 550-nm wavelength with an overall mean number of responses that is comparable to their initial Oi and Oc equilibria. Fig. 4 is an illustration of hypothetical data predicted by this disequilibrium approach to stimulus control and motivation. Motivation, in terms of level of food deprivation, accounts for the overall mean level of responding, while disequilibrium conditions, at each level of deprivation during training, account for the discriminative performances at each level of deprivation. Notice that the 80% data series in Fig. 4 is Thomas and King's (1959) original data. We chose this data series, and made the 60%, 70%, and 90% data series proportional to it, because the 80% group of pigeons in Thomas and King's study did both discrimination training and generalization testing at 80% free-feeding weight. We are assuming that this is the sort of generalization gradient you would observe when disequilibrium conditions are controlled for in training and at each level of deprivation.
According to the disequilibrium approach, the motivational state of the organism is reflected in its free operant baseline. As a result, scientists and practitioners can set the terms of an I/C contingency in order to obtain learned performances, such as discriminative control, at any level of deprivation. Thomas and King (1959) asked how individuals would respond at different levels of food deprivation (O) but did not know to ask how individuals would respond given disequilibrium conditions relative to their current state (I/C < or > Oi/Oc). If the addition of controls for disequilibrium conditions results in more orderly patterns of behavior like that shown in Fig. 4, then scientists and practitioners will be in a better position to parse the individual and/or combined effects of motivation and disequilibrium on behavior. A test of the hypothetical results in Fig. 4 would not only be an important test of the assumptions of the disequilibrium approach but would also be a test of current orthodoxy in behavior analysis. If disequilibrium conditions can change the level and shape of generalization gradients, then the MO, as behavior analysts currently understand it, will require reconceptualization.
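A design of this sort amounts to fixing a training schedule for each deprivation group so that a response deficit holds at every level. The sketch below illustrates one way to do that; the baseline values and the doubling factor are invented for illustration, and only the rule I/C > Oi/Oc comes from the model:

```python
import math

def deficit_schedule(Oi, Oc, C=10, factor=2.0):
    """Pick an instrumental requirement I such that I/C exceeds the
    baseline ratio Oi/Oc by `factor`, imposing a response deficit."""
    I = math.ceil(factor * (Oi / Oc) * C)
    assert I / C > Oi / Oc   # response deficit is in place
    return I, C

# Hypothetical free operant baselines (Oi, Oc) per deprivation level;
# these numbers are invented for illustration only.
baselines = {"60%": (40, 20), "70%": (55, 30), "80%": (70, 35), "90%": (90, 45)}

for level, (Oi, Oc) in baselines.items():
    I, C = deficit_schedule(Oi, Oc)
    print(f"{level}: train at I/C = {I}/{C} (baseline Oi/Oc = {Oi/Oc:.2f})")
```

Because the schedule is derived from each group's own baseline, the same deficit rule holds whether deprivation raises or lowers Oi and Oc.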

Disequilibrium predictions and generalization
The preceding section concerned generalization gradients but did not directly address the issue of generalization proper. Broadly speaking, stimulus generalization is responding similarly in the presence of different stimuli, while response generalization is responding differently in the presence of the same stimuli. Scientists and practitioners program for or against these generalization outcomes based on their analytic goals. For instance, generalization from training to nontraining contexts is often desired in application. As Stokes and Baer (1977) pointed out, however, there is no clearly expressed technology for promoting generalization outcomes. Practitioners train and hope for generalization, and when it fails to occur, they are to blame because they did not adequately control for target and extraneous stimuli during training (Kirby and Bickel, 1988). According to the disequilibrium approach, generalization failures are not the failures of scientists and practitioners per se. Instead, generalization failures are a failure of the strengthening model of reinforcement to predict which stimuli and responses will acquire functions under which conditions.
When it is assumed that reductions in disequilibrium result in learned performances, it is expected that stimuli and responses reliably associated with those reductions will acquire stimulus and response functions that are discriminative and instrumental. This expectation, or hypothesis, can be tested within a disequilibrium approach because the conditions of disequilibrium are quantitatively specifiable in terms of the basic and extended disequilibrium models (see Timberlake and Wozny, 1979). The fourth and final assumption of the disequilibrium approach is an expression of this quantitative advantage: the size of the change in instrumental responding will be directly related to the size of the initial disequilibrium (Timberlake, 1980). In Timberlake's (1980) words, "Assumption 4 is patterned after the response deprivation hypothesis of Timberlake and Allison (1974) and is intended only as a working hypothesis, enabling the prediction of functional relations among variables" (p. 18). The functional relations among variables refer to the presence or absence of a discrepancy between Oi/Oc and I/C, which predicts how much or whether instrumental activity will change. This phrase can also refer to the acquisition of stimulus and response functions predicted by: (1) the presence of disequilibrium conditions that vary in size or (2) the absence of disequilibrium conditions altogether.
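Assumption 4 presupposes some measure of the size of a disequilibrium condition. One candidate operationalization, offered here as a sketch rather than as Timberlake's own metric, is the shortfall in contingent activity that would accrue if instrumental responding stayed at its baseline under the imposed schedule:

```python
def contingent_deficit(I, C, Oi, Oc):
    """One way to size a disequilibrium condition: the shortfall in
    contingent activity if instrumental responding stayed at baseline.
    Positive values indicate a deficit; negative values, an excess."""
    return Oc - (C / I) * Oi

# Thin-mint/Netflix numbers from the text (baseline: 10 cookies per 60 min).
print(contingent_deficit(10, 5, 10, 60))   # 60 - 5 = 55.0 min short
print(contingent_deficit(1, 60, 10, 60))   # 60 - 600 = -540.0: a large excess
```

A graded measure like this is what would let investigators relate the size of disequilibrium during training to generalization outcomes parametrically.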
The advantage of a disequilibrium approach to generalization is its potential to predict, and therefore to increase, the precision with which scientists and practitioners can control. By grading the size of disequilibrium conditions, specified by the disequilibrium models, scientists and practitioners could parametrically examine the relationships between the sizes of disequilibrium and generalization outcomes. The disequilibrium approach allows investigators to ask and test, for instance, whether generalization outcomes are a function of the size of disequilibrium conditions in training. When Stokes and Baer (1977) and Kirby and Bickel (1988) identified techniques for the promotion of generalization, they did not have or consider a theory of reinforcement with predictive capabilities like the disequilibrium approach. Their reliance on a strengthening model of reinforcement meant that there was no gradation of reinforcement effects (i.e., disequilibria) they could test against generalization outcomes. The implications of a disequilibrium approach to generalization, then, could entail the following: Failures of generalization to nontraining contexts could be due not just to extraneous or confounding variables, but also to the absence of disequilibrium conditions or to inconsequential sizes of disequilibrium conditions during training.

Disequilibrium predictions and maintenance
Compared to the strengthening model of reinforcement, the disequilibrium approach has much to offer in terms of quantifiable prediction and precise control. The downside of the disequilibrium approach, though, is its lack of research in areas important to behavior analysts. The literature on generalization from a disequilibrium approach, for instance, is practically nonexistent. As far as we know, Realon and Konarski (1993) is the only applied disequilibrium study that included a test of generalization across people and settings. Worse still, we know of no applied disequilibrium studies that have tested for the maintenance of instrumental activities on follow-up. Maintenance refers to the continuation of behavior change long after a behavior modification program has been terminated (Miltenberger, 2016). Maintenance poses a particularly interesting challenge to the disequilibrium approach because it raises these questions: Does the instrumental and contingent activity of individuals inevitably return to equilibrium when disequilibrium conditions are removed? Or do disequilibrium conditions have new equilibria as their result? These questions, and others like them, require testing in both the laboratory and the field.
As was the case with generalization, scientists and practitioners could test the sizes of disequilibrium conditions against the stability of instrumental activities on follow-up. A hypothesis based on the disequilibrium approach would suggest a lack of maintenance when disequilibrium conditions were relatively small during training and the presence of maintenance when disequilibrium conditions were moderate to large during training. Farmer-Dougan (1998) showed, with the predictions of the minimum-distance bliss-point model, that changes in the instrumental activity of preschool children were greatest when equilibrium was moderately disrupted. Such results point to the possibility that relatively moderate disequilibrium conditions could be predictive of more stable performances on follow-up. The generation of testable hypotheses such as these gives the disequilibrium approach an advantage over other theories of reinforcement that rely on best guesses or hopes for the best. As Timberlake (1980) put it: "In some sense (and with tongue in cheek), applying the empirical law of effect can be compared to operating a light switch. If the switch works, things are clear; if it does not, you're left in the dark" (p. 6).

Disequilibrium in 21st century behavior analysis
The preceding sections explicated the assumptions of the disequilibrium approach and its potentially beneficial implications for scientists and practitioners in behavior analysis. The implications we identified in the cases of motivation, generalization, and maintenance are not conjecture. They are hypotheses based on a theory of reinforcement that can be subjected to testing in the laboratory and field. Table 1 lists the disequilibrium assumptions in the order and words of Timberlake (1980):

1. To the extent that characteristics of responding in a free baseline are stable and can be recovered following schedule manipulations, there exists one or more equilibrium states that reliably instigate and control responding.

2. If a schedule forces a change in the relations among response characteristics in the absence of a change in the equilibrium states instigating and controlling them, a disequilibrium condition will be produced that the behavior of the organism will tend to reduce.

3. A disequilibrium condition will result in learned performance to the extent that responses and/or predictive stimuli are reliably associated with reduction in that disequilibrium. The reliable association may be based on present and past experience, and evolutionary history.

4. The size of the change in instrumental responding under a disequilibrium condition will be directly related to the size of the initial disequilibrium.

These are the assumptions that shaped our hypotheses. In order for disequilibrium theory to take hold within 21st century behavior analysis, though, scientists and practitioners will have to reconsider what it means to call a stimulus a reinforcer that strengthens behavior. Baum (2012), Cowie (2018), Killeen and Jacobs (2017a, 2017b), Shahan (2017), and others have done just this, but without having taken the disequilibrium approach to task. The current paper is a disequilibrium theory redux because we believe now is the time to bring back the ideas and works that were so far ahead of Timberlake's own time. As we see it, disequilibrium is everywhere and so could be studied extensively in translation and application.

Disequilibrium everywhere
The ubiquity of reinforcement recognized by Skinner (1981) is a ubiquity recognized by the disequilibrium approach as well. Rather than seeing reinforcers abound in stimuli, the disequilibrium approach is a lens through which one cannot help but see disequilibrium conditions abound in one's everyday activities. "From an equilibrium viewpoint, a schedule is a useful device for probing the operating characteristics of the organism" (Timberlake, 1980, p. 53, italics added). This is to say that disequilibrium is not limited to schedules imposed in the laboratory. Schedules were only devices or tools Timberlake used to test a more general theory of behavior: a disequilibrium theory of behavior. Therefore, disequilibrium characterizes the change in conditions that comes with shaping novel responses, targeting response effort and/or topography, and transforming previously contingent activities into instrumental ones. An infant's cry is instrumental in gaining access to a contingent activity, for example, until parents require fine-grained responses like "mom-ma" or "da-da". Disequilibrium is imposed when responses not already a part of an organism's repertoire are deemed instrumental by parents, teachers, or practitioners. When responses approximate "momma" or "dada", they reduce disequilibrium conditions and serve an instrumental function as a result, that function being to maintain or approach equilibrium.
Disequilibrium is the extra assignment from school or work that encroaches upon your opportunity for leisure. It is the exercise prescribed by your doctor that encroaches upon your work-break lunch: how many steps did you take today? Disequilibrium is the lecture gone too long and the one bite too many. At the day's end, disequilibrium is the sleep lost trying to engage in all of the above activities whilst maintaining equilibrium. "Returning to the concept of reinforcement, it can be seen, therefore, that to restrict any particular activity in which an animal engages is to perturb the distribution of all of the animal's activities, that is, to bring about global disequilibrium" (Turvey and Shaw, 1999, p. 102). The distribution of one's activities is perturbed daily, and persons engage in instrumental activities that restore, within limits, global equilibrium.

Discovering disequilibria everywhere
The ubiquity of disequilibrium has yet to be documented, however, because of the methods the disequilibrium approach uses to assess for it. Free-operant, paired baselines are useful in controlled settings, when animals or persons can be given the opportunity to engage in two or more activities, but are restrictive in cases when the therapeutic context is your office and the client is a verbally capable adult with a diagnosis of major depressive disorder. Additionally, paired baselines in one setting do not capture the ebb and flow of one's daily activities, or how their overall distribution might be disrupted to the detriment or benefit of the person seeking treatment. The disequilibrium approach, in its fullest, calls for a methodology that befits its subject matter (i.e., the broad range of activities, both big and small, that we engage in daily). An extension of the paired baseline is called for and may be envisioned in technologies like Ecological Momentary Assessment (EMA; Shiffman et al., 2008).
EMA includes a broad set of methods to collect real-time, in-the-moment information about an individual's behavior. This is usually accomplished by having the user electronically input information about their behavior on a regular basis. The user will either be prompted electronically (e.g., via text, email, or an app), or will be required to enter information when an event occurs (e.g., smoking a cigarette). EMA data have revealed previously unknown effects of interventions in clinical psychology and can even inform the course of an individual's treatment (Hayes et al., 2018). EMA, as an extension of the paired baseline, may dovetail the disequilibrium approach with Bronfenbrenner's (1977) experimental ecology and Timberlake's (2001) behavioral systems approach. An assessment of activities, within one's respective niche, could reveal the prevalence of disequilibrium conditions and inform laboratory models of everyday events to secure a more lucrative future for the experimental analysis of behavior (Killeen, 2018; Mace and Critchfield, 2010). Disequilibrium in 21st century behavior analysis will involve a test of its predictions in both the laboratory and field, and whether the disequilibrium approach can bring certain applied and ecologically valid technologies to bear will be one of its primary challenges (Dowdy and Jacobs, 2018; Jacobs et al., 2017).
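To illustrate how EMA records might feed a disequilibrium analysis, the sketch below aggregates a hypothetical log of self-reported activity durations into the free-baseline totals from which ratios like O_i/O_c could be computed. The log format, function, and activity names are our assumptions for illustration, not a feature of any EMA platform.

```python
from collections import defaultdict

def baseline_totals(records):
    """Aggregate EMA-style (activity, minutes) records into the
    free-baseline activity totals a disequilibrium analysis starts from.

    `records` is a hypothetical log format: a list of (activity, minutes)
    tuples entered by the user across the day.
    """
    totals = defaultdict(float)
    for activity, minutes in records:
        totals[activity] += minutes
    return dict(totals)

# A hypothetical day of self-reports.
log = [("leisure", 30.0), ("work", 50.0), ("leisure", 15.0), ("exercise", 10.0)]
totals = baseline_totals(log)
```

With totals in hand for any pair of activities, the ratio of one to the other stands in for the paired free-operant baseline that laboratory preparations provide.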

Conclusion
When Timberlake (1984c) challenged Skinner (1984) on what it meant to call a stimulus a reinforcer, Skinner's (1984) reply was that Timberlake "misunderstood operant conditioning…" (p. 508). At the risk of being similarly accused of misunderstanding operant conditioning, we have tried to describe and explore the implications of a full-fledged disequilibrium theory of reinforcement in behavior analysis. Disequilibrium theory is at present, we believe, in the second of three stages of a theory's career: "First, you know, a new theory is attacked as absurd; then it is admitted to be true, but obvious and insignificant; finally it is seen to be so important that its adversaries claim that they themselves discovered it" (James, 1907/1995, p. 76). Even though it is admitted to be true, disequilibrium theory is not given priority in our introductory behavior analytic texts, possibly because it is obvious in the case of wheel-running rats and seemingly insignificant in the case of addressing everyday human affairs. However, when "the diehards of the status quo feel an obligation to study one paper or another, to make a few grumbling comments, and perhaps to join in its exploration" (Feyerabend, 1975/2010, p. 23), we will have embarked upon that final stage of a theory's career. The discovery, which is William Timberlake's, is an empirically testable and predictive theory of reinforcement that identifies the necessary conditions for reinforcement. Disequilibrium theory casts old results (reinforcement schedule effects) into new terms, and as a result, makes unique predictions that increase the precision with which scientists and practitioners can control behavior.
This contingency was arranged for Dave, and the data represent mean durations of behavior in minutes during baseline (O_i/O_c) and intervention (I/C). Dave engaged in 12.1 min of coloring activity and 7.5 min of math activity during a free-operant baseline. The instrumental-to-contingent requirement was that Dave had to engage in 5 min of coloring activity to gain access to 2 min of math activity. The basic disequilibrium model predicts that Dave will engage in more minutes of coloring activity to gain as close to 7.5 min of math activity as possible. Fig. 1 is a graphical representation of Dave's equilibrium (O_i and O_c) compared to his activity following intervention (I/C > O_i/O_c). Note that Dave engaged in approximately 16 min of coloring (a reinforcement effect) but was still in response deficit, having accessed only 3.5 min of math activity.

Fig. 1. Re-plotted data for participant Dave in Konarski et al. (1980) depicting a reinforcement contingency. The dashed lines denote the coordinates of the instrumental (y-axis) and contingent (x-axis) activities.
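Dave's figures can be checked against the basic model directly. The short sketch below (variable names are ours) confirms that the schedule placed the contingent math activity in deficit relative to baseline, which is why a reinforcement effect on coloring was predicted.

```python
o_i, o_c = 12.1, 7.5    # baseline minutes of coloring (instrumental) and math (contingent)
i_req, c_access = 5, 2  # schedule: 5 min of coloring buys 2 min of math

baseline_ratio = o_i / o_c         # O_i / O_c, roughly 1.61
schedule_ratio = i_req / c_access  # I / C = 2.5

# I/C > O_i/O_c: the schedule deprives Dave of math relative to baseline,
# so the basic model predicts an increase in coloring (a reinforcement effect).
assert schedule_ratio > baseline_ratio
```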
A second figure presents a graphical representation of Thomas' equilibrium (O_i and O_c) compared to his activity following intervention (I/C < O_i/O_c). Thomas engaged in less hand-mouthing (19.3) than during baseline (131.5) because of the response excess of leisure activity (696 s) that occurred during the imposition of I/C. The response excess was the disequilibrium condition that Thomas reduced by engaging in less hand-mouthing. As was the case with reinforcement effects, punishment effects are a function of disequilibrium conditions, and as long as I/C is less than O_i/O_c, scientists and practitioners can designate either activity as instrumental or contingent.

Fig. 4. An illustration of hypothesized results, if Thomas and King (1959, Experiment 1) had controlled for disequilibrium conditions at each level of deprivation. See text for details.