Evidence of the unthinkable: Experimental wargaming at the nuclear threshold

Ongoing nuclear modernization programs in Russia, China, and the USA have reopened longstanding debates among scholars concerning whether tailored nuclear weapons are likely to have destabilizing consequences for international security. Without data to adjudicate this debate, however, these discussions have remained entirely theoretical. In this article, we introduce an experimental wargaming platform, SIGNAL, to quantify the effect of tailored nuclear capabilities on the nuclear threshold in a simulated environment. We then compare these results with a survey experiment using scenarios related to military basing, cyber operations, and nuclear threats from the wargame environment. While the survey experiments suggest that the presence of tailored nuclear capabilities increases the likelihood of conflict escalation, this trend diminishes in the wargaming context. Across both data-generating processes, we find support for the proposition that lower-yield nuclear weapons are used as a substitute for their higher-yield counterparts. These results have consequences for recent and ongoing policy debates concerning strategic posture and the future of arms control. This work also makes methodological contributions to the design and application of experimental wargaming for social science research, particularly for scenarios where data are limited or non-existent.


Introduction
Debates concerning the strategic impact of tailored nuclear weapons, that is, nuclear weapons designed to produce custom effects such as a low explosive yield or electromagnetic pulse (EMP) effects, have existed throughout the nuclear age, with some suggesting that they contribute to stability and others to instability. Nitze, writing during a period in which a doctrine of massive retaliation was ascendant, suggested as early as 1955 that adding tailored nuclear capabilities might reduce the vulnerability of the USA to nuclear blackmail by the Soviet Union (Buzzard, 1956; Nitze, 1956). Ten years later, McNamara suggested that NATO adopt the doctrine of flexible response with an emphasis on the role of theater nuclear weapons to ensure that the USA had the capability to respond to escalation (Powell, 1988), and in 1974, the Schlesinger doctrine outlined the uses of limited nuclear options as counterforce weapons (Schlesinger, 1975; Burr, 2005). More recently, nuclear modernization in Russia, China and the USA has rekindled academic arguments regarding the opportunities and pitfalls associated with tailored nuclear capabilities (Heginbotham et al., 2017; Podvig, 2018; Talmadge, 2019). The release of the 2018 Nuclear Posture Review announcing plans for a new low-yield nuclear warhead (the W76-2 variant), in particular, renewed this theoretical debate, with some suggesting that this development would be destabilizing while others argued that the capabilities are necessary for stability (Broad & Sanger, 2016; Narang, 2018; Long, 2018; Roblin, 2019; Facini, 2020).
However, in the absence of empirical data, how do we adjudicate these claims? How, if at all, might tailored nuclear capabilities impact the threshold for nuclear use? To address these questions, we introduce a first application of large-N experimental wargaming as a method of social science inquiry.
Below, we examine the impact of high-precision low-yield and enhanced-EMP nuclear weapons on the nuclear threshold using data from the Strategic Interaction Game between Nuclear Armed Lands (SIGNAL) experimental wargaming platform. Specifically, we compare player behavior and game outcomes with and without these weapons in the arsenal and examine the likelihood of nuclear use. To benchmark the study, we compare these results with a more traditional three-segment survey experiment that uses the same treatment in scenarios designed to approximate those from the wargame setting. Our analysis suggests that the inclusion of tailored nuclear capabilities in an arsenal may increase the likelihood of nuclear use and substitute for high-yield nuclear use. This effect was observed with statistical significance in the survey setting, an important finding given the widespread use of survey methods in the field. Finally, we reflect on the methodological contribution of the article and the potential applications of experimental wargaming to behavioral social science and international relations research.

Tailored nuclear options in theory
A lack of observational data poses a significant challenge to the empirical examination of nuclear issues (Colby & Gerson, 2013; Lieber & Press, 2017). As Gartzke, Kaplow & Mehta (2015) note, the literature often fails to account for the 'diverse portfolios of [nuclear] weapons with varying range, destructive power, and other characteristics'. Scholars have come to rely on theory and extrapolation from a limited number of cases to examine the potential effects of adding new capabilities to the nuclear arsenal (Brodie et al., 1946; Schelling, 1966; Zagare, 1985; Brewer & Blair, 1979; Powell, 1990; Larsen & Kartchner, 2014; Acton, 2015; Heimer, 2018). This scholarship has contributed a number of assertions related to the nuclear threshold. While some suggest that the 'nuclear-ness' of weapons explains patterns of non-use (Tannenwald, 1999, 2005), others posit that there remain conditions under which states may still engage in limited nuclear war (Larsen & Kartchner, 2014; Freedman & Michaels, 2019). This leaves us with a central question: how do nuclear capabilities with tailored effects shape the likelihood of nuclear use?
In the sections below, we outline two schools of thought pertaining to the impact of tailored nuclear weapons on escalation.
Tailored nuclear weapons and stability
Throughout the 1950s and 1960s, proponents of tailored nuclear capabilities outlined the benefits of using tactical nuclear weapons in a graduated deterrence architecture rather than in the 'massive retaliation' strategy that represented the orthodoxy of the period (Blackett, 1958; Kissinger, 1960; Osgood, 1979). Utilitarian arguments for the development of tailored nuclear capabilities go beyond deterrence to consider nuclear capabilities as warfighting tools to address discrete military challenges for which more traditional high-yield nuclear capabilities are ill equipped. Both academics and policymakers have suggested that nuclear weapons are needed that reliably produce 'special effects' with much lower collateral damage to destroy or otherwise neutralize targets (Dowler, Howard & Joseph, 1991; Potter et al., 2000; Blair, Carns & Vitto, 2004; Levi, 2004; Tertrais, 2011; Lieber & Press, 2013; Davis et al., 2019). The 2002 US Nuclear Posture Review notes three types of targets for tailored nuclear weapons: 'hardened or deeply buried facilities; chemical and biological agents; and mobile and relocatable targets'. 1 Others point to the substantially lower levels of collateral damage associated with the use of tailored nuclear weapons (Younger, 2000).
Scholars have also recently argued that tailored nuclear weapons offer a useful tool for improving crisis stability by controlling escalation (Colby, 2014; Kroenig, 2015, 2016, 2018). In work re-examining the withdrawal of nuclear forces in Europe, for example, Kroenig notes that the decision to 'eliminate tactical nuclear weapons from Europe has left Russia with a wide range of options on the nuclear escalation ladder', suggesting that the deployment of a symmetrical nuclear capability might limit these options (Kroenig, 2015, 2016). This has led some to argue that low-yield nuclear weapons enable a more credible nuclear deterrent by providing a reasonable response option in certain regional scenarios (Lieber & Press, 2009).

Tailored nuclear weapons and instability
Alternatively, two logics underpin the theory that limited nuclear capabilities are likely to increase the likelihood of nuclear use. First, nuclear weapons tailored to reduce civilian casualties may weaken moral norms associated with their use. Second, tailored nuclear weapons suffer from a discrimination problem, whereby an adversary cannot distinguish between high-yield and low-yield capabilities, which contributes to inadvertent escalation and conflict spirals.
Scholars note the potential for tailored nuclear weapons to reduce the nuclear threshold as they provide an attractive means to accomplish military objectives while limiting collateral damage (Halperin, 1961; Von Hippel et al., 1988; Rovere & Robertson, 2013; Doyle, 2016, 2017). The reduced incidental injury to civilians afforded by low-yield or enhanced-EMP nuclear weapons, for example, poses concerns that their deployment weakens the nuclear taboo by being perceived as a less dangerous nuclear option, eroding moral and ethical norms surrounding nuclear non-use (Tannenwald, 1999). Without the indiscriminate blast associated with traditional nuclear weapons, will the use of nuclear weapons be viewed as tolerable, desirable even, leading to a ripple effect that legitimizes nuclear use? Incidentally, some also argue that there is no guarantee that the amount of collateral damage will be substantially altered by these capabilities (Toon et al., 2019).
The employment of tailored nuclear capabilities may also have escalatory rather than dampening effects on conflict escalation. There is no assurance that a nuclear confrontation that begins with the use of tailored nuclear weapons will remain limited -with the potential for escalation to a strategic nuclear war with existential consequences (Daugherty, Levi & Von Hippel, 1986). Brodie, for example, suggests that when conflict models take into account the reciprocal use of low-yield nuclear weapons, the result is a conflict spiral: 'we tend in the end to get the same kind of utterly nihilistic result in considering unrestricted tactical war in the future that we get in unrestricted strategic war' (Brodie, 1955). This theoretical claim was later showcased in subsequent wargames in which 'practical exercises with simulated tactical nuclear weapons undermined any claims that such warfare could be kept limited' (Freedman & Michaels, 2019). More recently, scholars have argued that the deployment of sea-launched, low-yield nuclear weapons reduces the separation between conventional and nuclear escalation as adversaries do not know a priori whether or not an incoming missile is armed with a nuclear payload (Narang, 2018;Weber & Parthemore, 2019). Faced with the uncertainty of what type of nuclear capability an adversary is deploying or launching, there are fears that state leaders will prematurely (or pre-emptively) embark upon an escalatory response.
This theoretical literature yields the following question: if tailored nuclear capabilities are present, is nuclear use more or less likely? While some argue that tailored nuclear capabilities are destabilizing and others argue they are stabilizing, it is also possible that they have no impact. Without data, we cannot adjudicate these theoretical claims. Below, we present an experimental wargaming approach that attempts to provide that data.

Experimental wargaming
Much of our contemporary understanding of conflict escalation dynamics involving nuclear weapons relies on theory rather than empirics. In response, scholars have turned to alternative data-generating processes to examine nuclear issues. Traditional seminar-based wargames, for example, offer a mechanism for senior policymakers to engage with vexing geographical and geopolitical challenges (Pauly, 2018). Formal and computer-based models also serve as longstanding examples of this work (Powell, 1988, 1990). More recently, scholars have used survey and laboratory experiments to investigate nuclear matters, including the conditions under which subjects (often members of the public or undergraduate subjects) would resort to nuclear use (Press, Sagan & Valentino, 2013; Quek, 2016; Sagan & Valentino, 2017). These approaches, like all synthetic data-generating processes, have strengths and weaknesses, from the assumptions that simplify and underpin formal models to the lack of consequences associated with survey responses.
Here, we propose an approach that combines experimental methods with wargaming techniques that have been developed over the past six decades (Perla, 1990;Asal, 2005;Perla & McGrady, 2011;Sabin, 2012;Schofield, 2013). In the process, we provide a new tool for social scientists to interrogate theories on phenomena for which there are limited or no empirical data.
Wargames as experiments
Important characteristics of experimental design are often neglected in traditional wargaming, limiting the utility of such games for quantitative analysis and causal inference. However, when executed using experimental design principles, we suggest that these challenges can be overcome to enable a new methodological tool in the social science toolkit (Reddie et al., 2018). As Pauly notes, one of the major assets of wargaming in comparison with survey experiments is the degree to which participants are 'immersed' in the strategic environment of the game (Pauly, 2018). While still an abstraction of reality, wargames provide a rich environment for insight into human decisionmaking, where a wide range of potential scenarios and conflict dynamics can be captured for analysis (Lin-Greenberg, Pauly & Schneider, 2022).
The following characteristics make experimental wargaming particularly well suited to social science inquiry (Tingley & Walter, 2011;Hyde, 2015;Rathbun, Kertzer & Paradis, 2017). First, experimental wargames are repeatable and allow for inference on the basis of player behavior. Second, experimental wargames can be conducted using a control-treatment design, where all conditions within the experiment other than the treatment variable and the characteristics associated with each player are held constant. Third, experimental wargaming provides researchers with control over the variables under examination -in this case, the military capabilities provided to each player. Fourth, the instrumentation of the game can be optimized for data collection -particularly in digital settings. This is important given the data loss in traditional wargaming frameworks that use self-reporting or rapporteurs to collect data on game-level outcomes. Finally, experimental wargames allow for increased fidelity and complexity associated with the scenario in comparison with formal models and survey experiments.
Indeed, the application of experimental design principles to wargaming has already been leveraged by scholars carrying out longitudinal analysis on archived games as well as those creating small-N, analog experimental games to address nuclear, cyber, and drone warfare scenarios (Schneider, 2017;Pauly, 2018;Lin-Greenberg, 2018;Jensen & Valeriano, 2019). Below, we provide a brief description of the SIGNAL game architecture and address the mechanisms through which it addresses the research question -do tailored nuclear weapons lead to an increased likelihood of nuclear use?
SIGNAL design
SIGNAL is a three-player (1v1v1), turn-based experimental wargaming platform built upon a hexagonal grid. 2 All players (drawn from a convenience sample, UC Berkeley's Experimental Social Science Laboratory, and Amazon Mechanical Turk) enter the game through a web browser, watch a short video, and complete a tutorial and demographic survey before competing in a virtual world to achieve the highest relative score across three win conditions over five rounds of play. Two of these win conditions are economic, focused on building infrastructure (i.e. maximizing the number of towns, cities, and/or military bases) and gaining resources (including food, oil, iron, and precious metals), and the other is security-related, centered on minimizing the loss of territory (as opposed to commandeering territory to the greatest degree possible). The zero-sum nature of this competition is specifically designed to provide a competitive environment, but not to force military conflict (Letchford et al., 2022).
As illustrated in Figure 1, SIGNAL uses abstract 'countries' (denoted by their color as Green, Purple, and Orange) to reduce the risk of players interpolating real-world cases into the experimental environment. The game world and competitive dynamics are explicitly designed to not map onto any real-world scenario in favor of illuminating how players respond to strategic questions rather than caricaturing a specific conflict or a country's likely action(s). There are, of course, tradeoffs between the internal and external validity of the study in making this choice.
As a between-subjects, control-treatment experiment, the nuclear capabilities provided to the Green and Purple players vary. For the purposes of this experiment, Green and Purple are given nuclear weapons along with a set of conventional capabilities. This dyad either has nuclear forces comprising only high-yield nuclear weapons (the control condition) or those comprising high-yield and tailored nuclear weapons, specifically high-precision low-yield (HPLY) and enhanced-EMP weapons (the treatment condition). 3 Conventional military capabilities (infantry, naval, missile, and defense capabilities) provide alternative means to hold resources in the game and degrade an adversary's capabilities. The Orange player, while having the same conventional military capabilities as other players and a slightly increased access to resources, does not have nuclear capabilities. The experimental conditions are summarized in Table I. Game play is governed by a set of rules that do not require external adjudication, with play taking place on a round-by-round basis. In brief, each round comprises three phases: signaling, action, and upkeep.
The signaling phase allows players to simultaneously place signaling tokens on hexes in the game environment and stage (face down) infrastructure or military capability cards to enable potential action in the subsequent phase. The staging of action cards has the dual significance of enabling future actions and 'signaling' what types of potential actions may be taken. Players may use signaling tokens to bluff, for example, placing them on hexes where they do not intend to take action. There is also a cost to staging capabilities that resembles 'costly signaling' or a 'credible commitment' to act (Powell, 1990; Sagan & Suri, 2003; Yarhi-Milo, Kertzer & Renshon, 2018). 4 During the action phase, players make decisions regarding which of the staged action cards they will execute. 5 The turn order is randomized to ensure that no player has a consistent first-mover advantage that may influence their decisionmaking and the game dynamics. During the upkeep phase, players keep score, collect income, and verify that they have sufficient resources to support their population and infrastructure. This gameplay provides a rich and immersive environment for players to grapple with strategies surrounding nuclear weapons. For example, we observed players solicit no-first-use agreements and nuclear umbrellas, as well as
Table I. Treatment and control conditions tested using the SIGNAL framework. Here, HY represents high-yield nuclear weapons, T represents those players provided HPLY nuclear weapons and electromagnetic pulse (EMP) nuclear weapons, and CW represents conventional (non-nuclear) weapons.
Levels of analysis and measures of nuclear use
First, SIGNAL collects game-based data (N = 425) comprising all of the signaling and action moves undertaken by players within the game. Second, SIGNAL collects player data (N = 1275, of which N = 850 have nuclear capabilities) 6 from the game as well as demographic characteristics theorized to influence behavior, including age, political affiliation, occupation, and experience. 7 At the game level, we extract from the data the incidence and type of nuclear use in each game. 8 When considering player-level data, we are also interested in the characteristics of players that use nuclear weapons.
To scope the dependent variable, we use two different measures of nuclear use: nuclear first use as well as whether or not a player used nuclear weapons at any point during the game. Indeed, there is good reason to believe that the drivers of nuclear first use might be distinct from nuclear use and we endeavor to analyze both. 9 As SIGNAL is a fixed round game, there is the potential for players to employ limited backward induction, modifying their strategy in the last round toward the optimal actions required to achieve the win conditions -and knowing that there is no opportunity for retribution. This introduces a potential systematic bias in the analysis whereby player actions are governed by game mechanics as opposed to their own strategic decisionmaking. To quantify the impact of this, the data are analyzed with and without actions from the last round included in the dataset.
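The final-round robustness check described above can be sketched in a few lines. This is a minimal illustration rather than the SIGNAL codebase: the `Action` record, its field names, and the `rounds_played` lookup are hypothetical, and we assume the number of rounds each game actually ran is known from game metadata.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Action:
    """Hypothetical record of one executed action (field names are assumptions)."""
    game_id: str
    round: int
    player: str
    card: str


def drop_final_round(actions: List[Action], rounds_played: Dict[str, int]) -> List[Action]:
    """Keep only actions taken before the last round that each game reached."""
    return [a for a in actions if a.round < rounds_played[a.game_id]]


# Illustrative data only: a game that ran four rounds keeps rounds one to three.
actions = [
    Action("g1", 1, "Green", "Infantry"),
    Action("g1", 4, "Purple", "High-yield nuclear"),
    Action("g2", 4, "Green", "Infantry"),
    Action("g2", 5, "Green", "HPLY nuclear"),
]
kept = drop_final_round(actions, {"g1": 4, "g2": 5})
```

Running the same models on both `actions` and `kept` is one way to see how much of an estimate depends on last-round behavior.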

Data analysis
We use regression-based methods to interrogate the effect of the experimental treatment (nuclear capabilities) and demographic variables on the dependent variable of interest (nuclear use) (Draper & Smith, 1998). As the dependent variable is treated as dichotomous (i.e. nuclear weapons are either used or they are not), we use logistic regression. 10 Specifically, we apply a series of logistic regression models to test the effects of the treatment on the binary wargame outcome of interest, $Y$, where 0 indicates no nuclear use and 1 corresponds to nuclear use of any kind. Here, $x_1, \dots, x_k$ represents a set of predictor variables that might influence nuclear weapon use (e.g. the presence/absence of tailored nuclear capabilities, demographic characteristics, etc.). To determine the conditional probability, $p$, of nuclear use ($Y = 1$) for a given set of predictors, a logit transformation is applied of the form:
$$\ell = \ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$
where $\ell$ is the log odds and the coefficient values $\beta_0, \dots, \beta_k$ are obtained via maximum likelihood estimation. Using this approach, $\beta_0$ is an offset parameter corresponding to the log odds of nuclear use when the predictor variables are zero and $\beta_1, \dots, \beta_k$ represent the expected change in log odds of nuclear use for a one-unit increase in the corresponding predictor variable. These log odds can, in turn, be used to calculate the odds and probability of a particular outcome by exponentiation. For $\beta_1, \dots, \beta_k$, a positive or negative coefficient suggests that the predictor of interest has a positive or negative effect on the likelihood of nuclear use, respectively. The tables below report these coefficients as an estimate of the relationship between the predictors and the dependent variable of interest.
(Sagan & Valentino, 2017). We also posit that those with intimate knowledge of nuclear weapons and national security issues may be more reticent to use nuclear weapons. McIntyre et al. have also previously engaged with questions of how demographic characteristics might influence behavior in a wargame setting (McIntyre et al., 2006).
8 Staging a nuclear capability in the signaling phase is necessary but not sufficient to constitute nuclear use. All nuclear cards, whether 'High-yield', 'HPLY', or 'EMP', are coded as nuclear use, and each card used to field the capability includes 'nuclear' in its title.
9 We use the following exclusion criteria to determine those games included in the analysis: games must have lasted at least three complete rounds, all players must have stayed in the game for the duration, and players must have completed the demographic survey associated with the experiment.
10 Having a binary outcome variable violates the assumption of linearity in an ordinary least squares regression. Logistic regression addresses this issue via a logarithmic transformation of the outcome variable that allows us to model a non-linear association in a linear way, expressing the regression equation in logarithmic terms (a logit model). For a detailed description of this approach, see Menard (2002). We also include alternative model specifications in the Supplementary Materials to assess the influence of model choice on the result. As logit models assume the absence of multicollinearity, we run tests for multicollinearity between the variables using a variance inflation factor test.
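To make the logit model concrete, the sketch below works through the simplest case: a single binary predictor (treatment versus control). In that case the maximum likelihood estimates have a closed form, because the fitted probabilities equal the observed proportions in each group. The function names and the counts are our own illustrative assumptions and are not taken from the study data; models with several covariates would need an iterative fitting routine instead.

```python
import math
from typing import Tuple


def logit(p: float) -> float:
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(log_odds: float) -> float:
    """Map log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def fit_binary_predictor(used_control: int, n_control: int,
                         used_treat: int, n_treat: int) -> Tuple[float, float]:
    """Closed-form MLE of (b0, b1) for logit(p) = b0 + b1 * x with one binary x.

    Requires at least one use and one non-use in each group, otherwise the
    log odds are undefined.
    """
    b0 = logit(used_control / n_control)      # log odds of use in control games
    b1 = logit(used_treat / n_treat) - b0     # shift in log odds under treatment
    return b0, b1


# Hypothetical counts, for illustration only.
b0, b1 = fit_binary_predictor(used_control=40, n_control=100,
                              used_treat=50, n_treat=100)
p_control = inv_logit(b0)        # recovers the observed control proportion
p_treat = inv_logit(b0 + b1)     # recovers the observed treatment proportion
```

A positive `b1` here corresponds to higher odds of nuclear use in the treatment group, which is how the signs of the coefficients in the tables can be read.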

Nuclear use by game
We begin with an analysis of nuclear use by game wherein the dependent variable represents whether the players use nuclear weapons (of any type) in a given game. The treatment variable is binary, coded as a 0 for those games where tailored nuclear capabilities (i.e. enhanced EMP and HPLY nuclear weapons) are absent and as a 1 when they are present. 11 As a reminder, this article tests the symmetrical (peer competitor) condition in which both Green and Purple have identical nuclear capabilities.
Game-level results. Table III provides the results of a game-level logistic regression. Model 1 examines the effect of the presence of tailored nuclear capabilities on nuclear use. The coefficient and standard error suggest that there is a positive but statistically insignificant relationship between the presence of tailored nuclear capabilities and nuclear use. Put another way, there is only a 2% increase in the odds of nuclear use when tailored nuclear capabilities are present -and this effect does not rise to the level of statistical significance. Indeed, as the 'tailored' row shows, there is no statistically significant difference between the treatment and control games in the sample; this result is consistent regardless of the covariates included in the analysis. In the second, third, and fourth models, we include demographic variables theorized to influence the decision of a player to employ nuclear capabilities (Press, Sagan & Valentino, 2013;Sagan & Valentino, 2017). Each characteristic is treated as binary on a per-player basis, e.g. national security represents whether or not a player has work experience in the national security field, a player coded as more conservative has moderate to conservative political leanings and reported knowledge represents whether or not a player reports knowledge of nuclear issues. However, it is important to note that the demographic characteristics in the analysis represent an aggregation of player characteristics in the game. For example, the female characteristic of the game ranges from 0, in the case of a game that includes no women, to 3, for a game that comprises all female players. A game that includes two women and one man is scored as a 2, and so on. This approach does not address the potential for interaction effects between players of different types, as others have observed in team settings (Pauly, 2018).
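The aggregation of player characteristics into game-level covariates can be illustrated as follows. This is a minimal sketch under our own assumptions: the indicator names (`female`, `college`, `national_security`) are hypothetical labels, and each player is represented as a dictionary of 0/1 values.

```python
from typing import Dict, List


def aggregate_game_covariates(players: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum binary (0/1) player indicators across the players in one game.

    With three players, each game-level covariate ranges from 0 to 3.
    """
    keys = players[0].keys()
    return {k: sum(p[k] for p in players) for k in keys}


# Illustrative game with two women and one man.
game_players = [
    {"female": 1, "college": 1, "national_security": 0},
    {"female": 1, "college": 0, "national_security": 0},
    {"female": 0, "college": 1, "national_security": 1},
]
game_covariates = aggregate_game_covariates(game_players)
```

Because the covariates are simple counts, this coding treats each additional player with a given characteristic as having the same effect, which is why it cannot capture interactions between player types.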
Despite representing the largest wargaming dataset of its kind, we still have only 425 games in our dataset. As a result, we consider covariates in tranches (Model 2 and Model 3) before including all of the demographic characteristics in Model 4. As is the case with the treatment condition (Model 1), these demographic characteristics do not appear to be shaping a decision to use nuclear weapons inside of the SIGNAL wargame environment in a statistically significant manner. Those games that include more players that are women, have a college degree, report knowledge of nuclear issues, and are over 29 years of age have a lower likelihood of using nuclear weapons in the game. 12 By exponentiating the coefficients, each can be translated into a percentage that reflects how adding an additional player with a specific characteristic will affect the likelihood of nuclear use. We find in Model 4, for example, that each addition of a player over the age of 29 reduces the probability of nuclear use by 8.4%, all else equal. Those games that include more players that report a background working in national security and are more conservative report a higher likelihood of nuclear use, although, again, without rising to the level of statistical significance. 13 The results of these analyses are also depicted in graphical form in Figure 2. It is clear from this visualization that each parameter estimate (i.e., $\beta_1, \dots, \beta_k$) overlaps with zero within the estimated uncertainties. That is, there is no statistically significant difference in the likelihood of nuclear use between the treatment and control games, irrespective of the particular covariates included in the analysis. While not statistically significant, in Models 1-4, the inclusion of tailored nuclear weapons in the arsenal results in an increased likelihood of nuclear use.
The differences in findings between reported knowledge (decreased likelihood of nuclear use) and experience working in national security roles (increased likelihood of nuclear use) are particularly interesting in light of current debates concerning the appropriateness of sampling elites and non-elites. Specifically, some have questioned the appropriateness of using non-elite samples to address research questions pertaining to national and international security issues, with elites generally understood to be current or former senior policymakers in government. Others have found little quantitative evidence for gaps between elite and non-elite behavior (Kertzer, 2022). Using data concerning education, self-reported subject matter expertise, and occupation, there appear to be only negligible differences across games that include higher or lower numbers of players with these markers of 'elite-ness'.
Addressing final round effects. The SIGNAL wargame has a fixed number of rounds known to players at the outset of the game. To explore whether the findings reported above may be driven in part by players deciding to use their nuclear capabilities in the final round of the game without fear of reciprocal action, we extend the analysis of the game-level data to examine a subset in which the final round of play is discarded from the dataset. 14 The results of this analysis are shown in Table IV. Once again, the treatment has a positive (and not statistically significant) effect on the probability of nuclear use, although it is worth noting that the coefficient associated with the experimental treatment is considerably larger in Model 5 (0.214) compared to Model 1 (0.021). Put another way, when the last round is removed, the odds of nuclear use rise 24% when tailored nuclear capabilities are in the game compared with the 2% rise associated with the same treatment noted above. Further, the coefficient obtained in Model 5 is positive within one standard deviation.
12 In an attempt to speak to the existing survey experiment literature, our demographic covariates reflect the models in Sagan & Valentino (2017). The age cut-off at 30 years of age represents one of the exceptions and was made given the available sample (very few players of SIGNAL are over the 50 years of age cut-off used in Sagan and Valentino's work).
13 The demographic survey allows a respondent to select a moderate political ideology, in addition to liberal and conservative leanings. For the purposes of this analysis, we code moderates as conservative for the results reported in Table III.
14 For a game that runs to four rounds, for example, only the first three rounds would be included in the analysis. This theoretical concern regarding iterated versus non-iterated games is well-established in the existing literature (Axelrod & Hamilton, 1981).
Comparing the demographic coefficients from Model 4 in Table III and Model 6 in Table IV, the coefficients associated with age and education also shift considerably, suggesting that younger players and those with college degrees are most sensitive to last-round effects. All in all, these results suggest that when the last round of game data are taken out of the analysis, the presence of tailored nuclear capabilities may have a greater impact on the likelihood of nuclear use in the game, all else equal.
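The percentage changes in odds quoted for the treatment coefficients follow directly from exponentiating the logit coefficients. The short sketch below reproduces that arithmetic for the two coefficient values reported in the text (0.021 for Model 1 and 0.214 for Model 5); the helper name is our own.

```python
import math


def pct_change_in_odds(coef: float) -> float:
    """Percentage change in the odds implied by a logit coefficient."""
    return (math.exp(coef) - 1.0) * 100.0


model_1 = pct_change_in_odds(0.021)   # roughly a 2% rise in the odds
model_5 = pct_change_in_odds(0.214)   # roughly a 24% rise in the odds
```

Note that these are changes in odds, not in probability; the corresponding change in probability depends on the baseline rate of nuclear use in the control games.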
Testing for substitution. Recall that both the instability and stability schools discussed above noted the potential for tailored nuclear capabilities to substitute for their high-yield counterparts. To test this proposition, we compare the use of high-yield nuclear capabilities across the two experimental conditions using the same modeling approaches as outlined above. To construct this dependent variable, we create a dichotomized measure of whether a player uses a high-yield nuclear card in the game or not. The results of these analyses are included in Table V. Models 7 and 8 suggest that including tailored nuclear capabilities in a game decreases the odds of high-yield nuclear use by approximately 12% compared with games in which only high-yield nuclear weapons are present. These negative findings are not statistically significant, however. Interestingly, those players who report a background in national security appear more likely to use high-yield nuclear weapons, all else equal. Taken together, these results suggest that there is good reason to further interrogate the substitution effect associated with tailored nuclear capabilities in future work.
Summarizing the analysis of the game-level data, nuclear use appears more likely when tailored nuclear capabilities are present. Additionally, high-yield nuclear use appears less likely when tailored nuclear capabilities are present. This suggests that the substitution effect

Nuclear use by player
The second set of statistical models presented here uses the player as the unit of analysis. Rather than coding the game based on whether it is a treatment or control game, these analyses examine player actions given their random assignment of nuclear capabilities. Mirroring the analyses above, a player in the treatment game (coded as 1) has access to high-yield and tailored nuclear weapons. A player in the control game (coded as 0) only has access to high-yield nuclear weapons. Non-nuclear players were removed from the dataset as they could not be reasonably expected to cross the nuclear threshold without the requisite capabilities.
With the additional granularity of the player-level data, we ask two related questions: what are the determinants of an individual player deciding to use nuclear weapons; and what are the determinants of an individual player deciding to use nuclear weapons first? The distinction between nuclear use and nuclear first use is particularly important given that the cognitive drivers for each may be different. 15 In simple terms, a decision to escalate a conventional war to a nuclear conflict is, at least in theory, distinct from a decision to reciprocate in kind.
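The distinction between nuclear use and nuclear first use can be made concrete with a small coding sketch. The event layout and function name are hypothetical, and the sketch assumes that first use means being the first player in a game to play any nuclear card; the article's codebook may define it differently.

```python
def code_player_outcomes(events):
    """Return {player: {'nuclear_use': 0/1, 'first_use': 0/1}} for one game.

    `events` is a chronologically ordered list of (player, is_nuclear)
    tuples; this layout is a hypothetical stand-in for the game log.
    """
    outcomes = {}
    threshold_crossed = False
    for player, is_nuclear in events:
        record = outcomes.setdefault(player, {"nuclear_use": 0, "first_use": 0})
        if is_nuclear:
            record["nuclear_use"] = 1
            if not threshold_crossed:
                # Only the player who crosses the threshold is coded as first use.
                record["first_use"] = 1
                threshold_crossed = True
    return outcomes


# Toy game: B escalates first, A reciprocates, C never uses nuclear weapons.
events = [("A", False), ("B", True), ("A", True), ("C", False)]
outcomes = code_player_outcomes(events)
```

Under this coding, both A and B count as nuclear users, but only B counts as a first user.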
We once again turn to a series of logit models to examine the effects of the treatment on the likelihood of a player using nuclear weapons. The results of these analyses are shown in Table VI. Models 9 and 10 examine the effects of the predictors on nuclear first use. The coefficient of 0.009 and standard error of 0.143 reported in Model 9 suggest that the additional tailored nuclear capabilities have a negligible impact on the likelihood of nuclear first use. Model 10, which takes demographic characteristics into account, also suggests that the treatment has little impact on nuclear first use. The results of Model 10 also suggest that female players and those players who reported subject matter expertise in the national security field are less likely to use nuclear weapons first in the SIGNAL wargaming environment, with significance at p < 0.05 and p < 0.10, respectively. This is an important finding given that it is at odds with Sagan and Valentino's finding that women are just as likely as men to assume hawkish behavior (Sagan & Valentino, 2017). Models 11 and 12 return to an analysis of nuclear use rather than nuclear first use. Model 11 reports a similar positive, statistically insignificant result (0.021). Model 12 once again reports the demographic results, none of which are statistically significant and all of which are broadly in line with the game-level analysis, suggesting that the aggregation of player actions to the game level does not meaningfully alter the findings.
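To see why the Model 9 estimate is read as negligible, the quoted coefficient and standard error can be turned into a test statistic and an interval. The sketch below assumes a standard Wald test with a normal approximation, which is an assumption about how the reported significance levels were obtained; only the two quoted numbers come from the article.

```python
import math

coef, se = 0.009, 0.143  # Model 9 treatment estimate and standard error

# Wald z statistic and two-sided p-value under a normal approximation.
z = coef / se
p_value = math.erfc(abs(z) / math.sqrt(2.0))

# Approximate 95% confidence interval on the log-odds scale.
ci_low, ci_high = coef - 1.96 * se, coef + 1.96 * se

print(f"z = {z:.3f}, p = {p_value:.2f}")
print(f"95% CI (log-odds): [{ci_low:.3f}, {ci_high:.3f}]")
print(f"95% CI (odds ratio): [{math.exp(ci_low):.2f}, {math.exp(ci_high):.2f}]")
```

The interval spans zero on the log-odds scale, so the data are compatible with both a modest decrease and a modest increase in the odds of first use.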
These analyses using the player as the unit of analysis provide two important insights. First, there are important differences between nuclear first use and subsequent nuclear use, as evidenced by a comparison of Tables VI and III. Second, the presence of tailored nuclear capabilities continues to have a positive, but statistically insignificant, effect on nuclear use at the player level (Models 11 and 12).

Method comparison
When developing a new method of inquiry, we would ideally validate the approach against the empirical record. However, one of the primary justifications for developing a new methodological approach to interrogate nuclear issues is the dearth of empirical data with which to test existing theories regarding nuclear deterrence and conflict escalation, particularly in the context of novel and emerging technologies. To address this challenge, we use a survey experiment to explore the same research question examined using the SIGNAL experimental wargaming platform and compare the findings.

SIGNAL survey design
The SIGNAL survey is a three-segment factorial vignette experiment designed to approximate a series of scenarios faced by players inside the SIGNAL wargame environment. In the survey, respondents provide recommendations to their state leader in the face of an evolving crisis. We randomly assign military capabilities to both the survey respondent and the fictional adversary in three ways (no nuclear capabilities, high-yield nuclear capabilities, high-yield and tailored nuclear capabilities), resulting in a 3 × 3, between-subjects survey experiment design. For the purposes of this article, we are concerned with the two conditions that are corollaries to the treatment and control conditions in the SIGNAL experimental wargame environment described above. 16 In the first segment of the vignette, respondents faced a scenario in which an adversary plans to build a military base in a near neighbor. Then, respondents faced an unattributed cyber attack. Finally, respondents faced a nuclear threat scenario. The baseline vignettes remained the same across treatments, with the experiment introducing two sources of variation.
First, we vary the capabilities ascribed to the adversary based upon the experimental condition randomly assigned to the research subject, where the notional adversary was randomly assigned no nuclear capabilities, only high-yield nuclear weapons, or a combination of HPLY, enhanced-EMP, and high-yield nuclear weapons. Second, we vary the military, economic, and diplomatic policy responses that players could choose to advise on the basis of their randomly assigned treatment.
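The structure of the 3 × 3 between-subjects design can be sketched by enumerating the cells and drawing one per respondent. The arsenal labels, the uniform assignment probabilities, the seed, and the sample size are illustrative assumptions; the article does not report the assignment mechanism in this section.

```python
import itertools
import random

ARSENALS = ("none", "high_yield", "high_yield_plus_tailored")  # hypothetical labels

# Every (respondent arsenal, adversary arsenal) pairing: 3 x 3 = 9 cells.
CELLS = list(itertools.product(ARSENALS, repeat=2))


def assign_cell(rng: random.Random):
    """Draw one between-subjects condition uniformly at random (an assumption)."""
    return rng.choice(CELLS)


# The two symmetric cells that mirror the wargame's control and treatment games.
CONTROL = ("high_yield", "high_yield")
TREATMENT = ("high_yield_plus_tailored", "high_yield_plus_tailored")

rng = random.Random(0)  # fixed seed so the toy assignment is reproducible
assignments = [assign_cell(rng) for _ in range(1000)]
analysis_sample = [c for c in assignments if c in (CONTROL, TREATMENT)]
```

Only the two symmetric cells enter the comparison with the wargame; the remaining seven cells belong to the wider survey design.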

Survey experiment results
Here, we interrogate the impact of the presence of tailored nuclear capabilities on the respondent's policy advice. To best approximate the decisionmaking process faced by players in the SIGNAL wargame environment, we used all three segments in the multisegment survey as the unit of analysis. The treatment condition refers to the case in which respondents (and the fictional adversary) are provided with high-yield and tailored nuclear capabilities. The control condition refers to the case in which respondents (and the adversary) are provided with only high-yield nuclear capabilities. Nuclear use, for the purposes of the survey experiment, refers to respondents choosing any of the three nuclear use policy options. Table VII provides the results of these analyses. As shown in Model 13, the coefficient estimating the effect of tailored nuclear capabilities (1.136) suggests that respondents provided with HPLY and enhanced-EMP nuclear weapons in addition to high-yield nuclear capabilities have approximately three times the odds of using nuclear weapons in comparison with respondents who are provided with only high-yield nuclear capabilities. This finding is statistically significant at the p < 0.01 level. In Model 14, we assess the same demographic covariates used above and find that those respondents 30 years of age and older (−1.087) have a lower likelihood of recommending nuclear use, while those who identify as more politically conservative (0.641) have a higher likelihood of recommending nuclear use, consistent with Sagan and Valentino's recent work (Sagan & Valentino, 2017). In Models 15 and 16, we test the likelihood of respondents recommending the use of high-yield nuclear capabilities with and without the presence of tailored nuclear capabilities in the arsenal.
If there is a substitution effect, as found in the wargaming above, we would expect to see a negative coefficient associated with the tailored condition, particularly as overall nuclear use, as established in Models 13 and 14, is higher in this condition. As in the SIGNAL experimental wargame, the survey data suggest that the presence of tailored nuclear capabilities has a negative effect (−0.652) on the likelihood of recommending high-yield nuclear use. That is, respondents with tailored nuclear capabilities are roughly half as likely to employ their high-yield nuclear weapons. Unlike the results of the wargame analysis above, this finding is statistically significant.
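The two survey coefficients quoted above (1.136 and −0.652) are on the log-odds scale, so the "three times" and "half" readings are odds ratios. The sketch below converts them and shows how the implied change in probability depends on the starting point; the baseline probabilities are hypothetical values chosen only for illustration and are not taken from the survey data.

```python
import math


def shifted_probability(baseline_prob: float, coef: float) -> float:
    """Apply a logit coefficient to a baseline probability."""
    odds = baseline_prob / (1.0 - baseline_prob) * math.exp(coef)
    return odds / (1.0 + odds)


or_any_nuclear = math.exp(1.136)   # odds ratio for any nuclear use
or_high_yield = math.exp(-0.652)   # odds ratio for high-yield use

print(f"odds ratio, any nuclear use: {or_any_nuclear:.2f}")
print(f"odds ratio, high-yield use:  {or_high_yield:.2f}")

# Hypothetical baselines, chosen only to show how the mapping varies.
for p0 in (0.05, 0.20, 0.40):
    p1 = shifted_probability(p0, 1.136)
    print(f"baseline {p0:.2f} -> treated {p1:.2f} (ratio {p1 / p0:.2f})")
```

The ratio of probabilities approaches the odds ratio only when the baseline probability is small, which is worth keeping in mind when reading the coefficients as relative likelihoods.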
As respondents may select as many policy recommendations as they deem appropriate, an analysis of how respondents rank nuclear options in their guidance may help to better understand whether respondents viewed their recommendation as an important strategic decision. The results of these analyses are included in Table VIII. Here, we examine the effect of the additional tailored nuclear capabilities on the likelihood of respondents ranking nuclear use in the top five, top three, or as the top policy option, shown in Models 17, 18, and 19, respectively. In Model 17, the presence of tailored nuclear capabilities has the same positive, statistically significant effect (0.933) reported above, although the coefficient is lower than in the unranked analysis. In Model 18, which examines when nuclear use is ranked within the top three recommendations, the coefficient (0.835) decreases further. Model 19, wherein nuclear use is ranked as the top policy guidance, reports a positive estimate (0.652), but the finding is no longer statistically significant. Taken together, these models suggest that the effect of tailored nuclear capabilities on the decision to recommend nuclear use lessens as the respondent's commitment to that option strengthens. This suggests that respondents may ultimately prefer alternative capabilities but that the presence of tailored capabilities places nuclear weapon use squarely on the table.
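The three ranked outcomes (top five, top three, top option) can be derived from a single ranked list of recommendations. The option labels and the example ranking below are hypothetical; the sketch only illustrates how one response would be coded for the three dependent variables.

```python
# Hypothetical labels standing in for the three nuclear use policy options.
NUCLEAR_OPTIONS = {"nuclear_option_1", "nuclear_option_2", "nuclear_option_3"}


def nuclear_in_top_k(ranking, k):
    """Return 1 if any nuclear option appears among the first k ranked choices."""
    return int(any(option in NUCLEAR_OPTIONS for option in ranking[:k]))


# One respondent's ranked recommendations, most preferred first.
ranking = [
    "sanctions",
    "diplomatic_protest",
    "cyber_response",
    "nuclear_option_2",
    "conventional_strike",
]

indicators = {k: nuclear_in_top_k(ranking, k) for k in (5, 3, 1)}
```

This respondent would be coded as recommending nuclear use in the top-five analysis but not in the top-three or top-option analyses.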
In summary, our analysis of the SIGNAL survey data suggests that there is a positive, statistically significant relationship between the presence of tailored nuclear capabilities and nuclear use under symmetric conditions. The survey data also provide further evidence for a substitution effect, whereby tailored nuclear capabilities are likely to be used in lieu of their high-yield counterparts. While there are similarities in the findings between the wargame and survey analysis, there are also differences that point to the different laboratory effects across the two experimental environments worthy of further study.

Conclusion
Across wargaming and survey methods, the evidence presented in this article provides limited support for the proposition that tailored nuclear capabilities increase the likelihood of crossing the nuclear threshold. Amid policy debates concerning the appropriate mix of nuclear capabilities as nuclear weapons states modernize their arsenals, in general, and concerns surrounding the proliferation of low-yield nuclear weapons, in particular, this finding, which is suggestive of the destabilizing consequences of tailored nuclear capabilities, raises important questions for both academics and policymakers to consider. Our analysis also finds support for the proposition that low-yield nuclear weapons substitute for their high-yield counterparts, suggesting that even in nuclear conflict, players internalize distinctions in the use of different types of force.
This article also showcases the use of experimental wargaming methods to create an immersive environment for carrying out quantitative social science research. The results discussed above also point to important differences between experimental wargames and surveys as data-generating processes. Further work is undoubtedly needed to understand the laboratory effects associated with wargame design. For example, do team-based decisionmaking processes yield different results than the individual-level decisionmaking implemented in SIGNAL? Does the number of players inside a game setting matter: would three-player games evolve differently than 10-player games? Would different win conditions yield different results? How might digital vs. analog settings influence player behavior? 17 To answer these methodological questions, we look forward to scholars of behavioral social science implementing, manipulating, and testing experimental wargame designs. Perhaps most significantly, this work represents a model framework that combines experimental and gaming methods to interrogate research questions pertaining to international security. Our approach addresses some of the methodological concerns associated with alternative synthetic data-generating processes, from formal models that bake in simplifying assumptions to traditional wargames that rely on adjudication and offer idiosyncratic results. Moreover, the development of this method and its comparative benefits vis-à-vis existing approaches offer particular advantages with regard to data-starved policy and academic debates concerning the risks posed by emerging military capabilities (from hypersonic missiles to the integration of 'AI technologies') that the existing literature struggles to adjudicate.
While Quinlan, quoted at the top of this article, is right that scholars do not have empirical data regarding nuclear weapons use, it is our hope that this work serves as an initial demonstration of the potential utility of experimental wargaming for large-N analysis by revisiting a long-held and policy-relevant research question related to deterrence, strategy and international security (Quinlan, 2009). For research questions in which observational data are limited, experimental wargaming methods represent a compelling new tool for social science inquiry.

Replication Data
The dataset, Online appendix, codebook, and replication files for the empirical analysis in this article are available at https://www.prio.org/jpr/datasets/. All analyses were conducted using R.