Can money buy control of Congress?

Can a political party spend enough across electoral campaigns to win a majority in the U.S. Congress? Prior research on campaign spending minimizes the importance of campaign heterogeneity and does not aggregate effects across campaigns, which leaves it unable to address this question. We instead tackle the question with a system-level analysis of campaign expenditures. First, using a flexible machine learning approach, we show that spending has substantial and nonlinear marginal effects on outcomes at the campaign level. Second, by aggregating these effects across the entire U.S. Congress, we show that large seat swings that change congressional control were possible in the past at expenditure levels consistent with those observed today, once the most extreme levels are excluded. However, this possibility appears to have faded over the past decade. Our approach also allows us to illustrate the often significant effects that eliminating campaign spending could have.

To simulate hypotheticals, we used the KRLS models detailed in the paper and Appendix C. For each chamber, we used both the predicted means and the estimated variance-covariance matrices for the set of contested seats, drawing 5000 samples from the multivariate normal distribution. We did so for four cases: (1) actual spending levels; (2) zero spending advantage, in which we set Democratic Expenditure Advantage to zero and held total spending at its minimum observed level for all seats; (3) cases in which the Democratic candidate spent more, in which we set Democratic Expenditure Advantage at a quantile above the median such that it was positive; and (4) cases in which the Republican candidate spent more, in which we set Democratic Expenditure Advantage at a quantile below the median such that it was negative. Finally, we accounted for all seats that were not up for election, were uncontested, or were dropped from the dataset for some other reason. We used these simulations to illustrate hypothetical control of Congress under different spending profiles.
In more detail, we produced these simulations with an R function, described below. The function takes four arguments: (1) a type (the string "actual" or "zero") or a quantile (a number from 0.05 to 0.95); (2) a fitted KRLS model (for either the House or the Senate); (3) the dataset used to produce that model; and (4) the desired number of simulations, which defaults to 5000.
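The R function itself is not reproduced here. As a rough illustration only, its argument check might resemble the Python sketch below. The helper name `check_sim_args` and its messages are hypothetical. The model and dataset arguments are omitted because validating them depends on the KRLS object.

```python
def check_sim_args(type_or_quantile, n_sims=5000):
    """Validate the scenario argument and simulation count (hypothetical helper).

    Returns "actual", "zero", or the quantile as a float; raises on illegal input.
    """
    if isinstance(type_or_quantile, str):
        if type_or_quantile not in ("actual", "zero"):
            raise ValueError('type must be "actual" or "zero"')
        scenario = type_or_quantile
    elif isinstance(type_or_quantile, (int, float)) and not isinstance(type_or_quantile, bool):
        if not 0.05 <= type_or_quantile <= 0.95:
            raise ValueError("quantile must lie in [0.05, 0.95]")
        scenario = float(type_or_quantile)
    else:
        raise TypeError('expected "actual", "zero", or a numeric quantile')
    # The number of simulations must be a positive integer (default 5000).
    if isinstance(n_sims, bool) or not isinstance(n_sims, int) or n_sims < 1:
        raise ValueError("n_sims must be a positive integer")
    return scenario


print(check_sim_args("zero"))
print(check_sim_args(0.95))
```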
The function first checks that the arguments are valid and then defines three objects: pred_X, a new dataset of predictors that starts as a copy of the predictors used in the KRLS model; the vector of observed values of Democratic Expenditure Advantage; and the vector of years corresponding to each contest in the dataset. If the function is being used to simulate values at a quantile of Democratic Expenditure Advantage, it then calculates that spending level and stores it as simulated_dem_spending_advantage. We index the hypotheticals in this last case by the observed quantile of Democratic Expenditure Advantage, ranging from the 5th percentile (a large Republican expenditure advantage) to the 95th percentile (a large Democratic expenditure advantage).
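The quantile step can be sketched in Python as follows. The text does not say which quantile definition was used, so this sketch assumes R's default (type 7, linear interpolation). The helper name and the toy advantage values are illustrative and are not data from the paper.

```python
import math


def quantile_type7(values, p):
    """Sample quantile with linear interpolation (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p            # fractional index into the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


# Toy Democratic Expenditure Advantage values, in dollars (illustrative only).
dem_adv = [-900_000, -250_000, -40_000, 10_000, 120_000, 600_000]
simulated_dem_spending_advantage = quantile_type7(dem_adv, 0.95)
print(simulated_dem_spending_advantage)
```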
What happens next depends on the type or quantile requested, but in every case the function constructs the appropriate predictor values (i.e., pred_X). When "actual" is requested, pred_X is left unaltered. When "zero" is requested, the column of pred_X that stores Democratic Expenditure Advantage is set to 0, and the column that stores log Total Expenditure is set to its minimum observed value.
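The following Python sketch shows the "actual" and "zero" branches. It assumes that pred_X is a list of rows with the hypothetical column names `dem_adv` and `log_total`, and the helper name is also hypothetical.

```python
def build_pred_X(pred_X, scenario):
    """Return predictor rows for the "actual" or "zero" scenario.

    pred_X: list of dicts with hypothetical keys "dem_adv" (Democratic
    Expenditure Advantage) and "log_total" (log Total Expenditure).
    """
    if scenario == "actual":
        return [dict(row) for row in pred_X]       # unaltered copy
    if scenario == "zero":
        min_log_total = min(row["log_total"] for row in pred_X)
        # No spending advantage, and total spending at its observed minimum.
        return [{**row, "dem_adv": 0.0, "log_total": min_log_total} for row in pred_X]
    raise ValueError('only "actual" and "zero" are handled here')


rows = [{"dem_adv": 2.5e5, "log_total": 13.1}, {"dem_adv": -4.0e4, "log_total": 11.8}]
print(build_pred_X(rows, "zero"))
```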
For any hypothetical quantile value q ∈ [0.05, 0.95], pred_X is altered to represent a scenario in which, for all races in the middle 90% of the Democratic Expenditure Advantage distribution, one of the two candidates is "topped up." Specifically, for any positive value of simulated_dem_spending_advantage, we sweep through contests in the middle 90% of Democratic Expenditure Advantage and identify those in which the Democratic candidate's spending advantage fell short of the desired level. In those rows, we set Democratic Expenditure Advantage to simulated_dem_spending_advantage and change log Total Expenditure to the value that would result from the additional spending. In other words, for every contest outside the tails of the spending distribution, we check whether the Democratic candidate's spending advantage was smaller than the amount specified in the hypothetical and, if so, raise it to that amount. This is what we mean by "topped up." We follow a similar process when simulated_dem_spending_advantage < 0, which corresponds to hypothetical scenarios in which Republican candidates outspend their Democratic rivals. Specifically, for all contests in the middle 90% of Democratic Expenditure Advantage, we check whether the Democratic candidate's spending advantage was greater than the amount specified in the hypothetical and, if so, reduce it to that amount.
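The topping-up rule can be sketched in Python as follows. The sketch rests on three assumptions: Democratic Expenditure Advantage equals D − R, Total Expenditure equals D + R (both in dollars), and only the trailing side spends more. Under these assumptions, moving the advantage from adv to a adds exactly |a − adv| to the total. The helper name `top_up` and the row keys are hypothetical.

```python
import math


def top_up(rows, a, lo, hi):
    """Top up trailing candidates so that in-range contests reach advantage a.

    rows: dicts with hypothetical keys "dem_adv" (D - R, dollars) and
    "log_total" (log of D + R). lo, hi: bounds of the middle 90% of dem_adv.
    """
    out = []
    for row in rows:
        adv = row["dem_adv"]
        in_middle = lo <= adv <= hi
        dem_short = a > 0 and adv < a          # Democrat below the target lead
        rep_short = a < 0 and adv > a          # Republican below the target lead
        if in_middle and (dem_short or rep_short):
            total = math.exp(row["log_total"])
            # Only the trailing side spends more, by exactly |a - adv| dollars.
            new_total = total + abs(a - adv)
            out.append({**row, "dem_adv": a, "log_total": math.log(new_total)})
        else:
            out.append(dict(row))              # tails and leaders are unchanged
    return out


rows = [
    {"dem_adv": -1.0e5, "log_total": math.log(5.0e5)},   # Democrat trails: topped up
    {"dem_adv": 3.0e5, "log_total": math.log(9.0e5)},    # already ahead: unchanged
    {"dem_adv": 4.0e7, "log_total": math.log(6.0e7)},    # in the tail: unchanged
]
print(top_up(rows, a=2.0e5, lo=-1.0e6, hi=1.0e6))
```

For example, if D = 200 and R = 300 (advantage −100, total 500), topping up to a = 200 raises D to 500, so the total becomes 800 = 500 + |200 − (−100)|.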
Given the completed pred_X object, we produce simulated values of Democratic Vote Share in all races by drawing from a multivariate Normal distribution whose mean equals the predicted values from the fitted KRLS model evaluated at pred_X and whose variance-covariance matrix is the one provided by KRLS. It is important to draw all races jointly from one multivariate Normal distribution because the predictions for different observations covary. We draw 5000 such simulations of the vector of Democratic Vote Share values.
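The joint draw can be sketched in stdlib Python using a Cholesky factor: mean + L z, with z standard Normal. This is a sketch, not the authors' R code. It assumes a positive definite covariance matrix, whereas a KRLS variance-covariance matrix may need a small value added to its diagonal first. The means and covariances below are toy numbers.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def draw_mvn(mean, cov, n_sims=5000, seed=None):
    """Draw n_sims joint vectors of vote shares as mean + L z, z ~ N(0, I)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    n = len(mean)
    draws = []
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return draws


# Two contests whose predicted Democratic Vote Shares covary (toy numbers).
sims = draw_mvn([50.5, 49.0], [[4.0, 3.0], [3.0, 4.0]], n_sims=5000, seed=1)
```

Drawing each contest independently would discard the off-diagonal terms, so the simulated seat totals would have the wrong variance.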
Finally, for each simulation and each contest, we record whether Democratic Vote Share was above 50%; if so, we count the race as a Democratic victory. We then sum the Democratic victories for each year and simulation. The function returns a data.frame containing the simulated numbers of seats held by Democrats and Republicans for each year and each simulation, along with the type or quantile requested.
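The tally step can be sketched in Python as follows. The sketch assumes that Democratic Vote Share is a two-party share in percent, so a value above 50 is a Democratic win, and it returns a list of dicts in place of an R data.frame. The helper name and keys are hypothetical.

```python
from collections import Counter


def count_seats(draws, years, scenario):
    """Tally simulated Democratic and Republican seats by simulation and year.

    draws: list of simulations, each a list of Democratic Vote Shares (percent)
    aligned with `years`, the election year of each contest.
    """
    out = []
    for sim_id, shares in enumerate(draws, start=1):
        dem_wins = Counter()
        contests = Counter()
        for share, year in zip(shares, years):
            contests[year] += 1
            if share > 50.0:                  # Democratic victory in this race
                dem_wins[year] += 1
        for year in sorted(contests):
            out.append({
                "sim": sim_id,
                "year": year,
                "dem_seats": dem_wins[year],
                "rep_seats": contests[year] - dem_wins[year],
                "scenario": scenario,
            })
    return out


rows = count_seats([[51.2, 49.0, 60.3], [40.1, 55.0, 45.5]], [2016, 2016, 2018], "actual")
for row in rows:
    print(row)
```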
After running this function, we merge in the observed outcomes of races that were dropped from the dataset (e.g., those with unopposed candidates). The result is a set of 5000 simulated outcomes for each chamber, each year, and each hypothetical expenditure advantage.
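Merging in the observed outcomes amounts to adding fixed seat counts to every simulated draw, as in the Python sketch below. For the Senate, the fixed counts would also include seats not up for election that year. The helper name and the toy counts are hypothetical.

```python
def add_fixed_seats(sim_rows, fixed):
    """Add seats that were not simulated (e.g., unopposed races) to each draw.

    fixed: {year: (dem_seats, rep_seats)} taken from observed outcomes.
    """
    merged = []
    for row in sim_rows:
        dem, rep = fixed.get(row["year"], (0, 0))
        merged.append({**row,
                       "dem_seats": row["dem_seats"] + dem,
                       "rep_seats": row["rep_seats"] + rep})
    return merged


sim_rows = [{"sim": 1, "year": 2018, "dem_seats": 200, "rep_seats": 180}]
print(add_fixed_seats(sim_rows, {2018: (30, 25)}))
```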
The paper reports on simulations for "actual", "zero", maximum Democratic Expenditure Advantage (quantile = 0.95), and maximum Republican Expenditure Advantage (quantile = 0.05), as well as quantile values between 0.05 and 0.95, with increments of 0.01.
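The scenario grid implied above can be enumerated in a few lines of Python, assuming both endpoints, 0.05 and 0.95, are included. Rounding guards against floating-point drift in the 0.01 steps.

```python
# Scenarios: "actual", "zero", and quantiles 0.05, 0.06, ..., 0.95.
quantile_grid = [round(0.05 + 0.01 * i, 2) for i in range(91)]
scenarios = ["actual", "zero"] + quantile_grid

n_sims = 5000
print(len(quantile_grid), len(scenarios), len(scenarios) * n_sims)  # 91 93 465000
```

Under this assumption, each chamber involves 93 scenarios and 465,000 simulated vote-share vectors.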

Appendix B: Dataset Construction
Our datasets comprise all House and Senate contests in the 20 general elections from 1980 to 2018. In the House, we have 8700 (= 20 × 435) observations, and in the Senate, 667. For comparability, we exclude off-cycle Senate elections that were held on general election dates to fill the balances of incomplete terms.
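A quick check of the count: biennial general elections from 1980 through 2018 inclusive number 20, which matches the 20 × 435 House total.

```python
election_years = list(range(1980, 2019, 2))   # 1980, 1982, ..., 2018
house_seats = 435
print(len(election_years), len(election_years) * house_seats)  # 20 8700
```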
For both the House and Senate datasets, we started with the "Statistics of the Presidential and Congressional Election," compiled and published by the Clerk of the House after each election, from which we gathered all candidates in each contest. We isolated the contests with one Democratic candidate and one Republican candidate and recorded their names and vote totals.
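The isolation step can be sketched in Python as follows, assuming one record per candidate keyed by a contest identifier. The keys, helper name, and toy candidates are hypothetical. Minor-party candidates do not disqualify a contest in this sketch.

```python
from collections import defaultdict


def two_party_contests(candidates):
    """Keep contests with exactly one Democrat and one Republican.

    candidates: dicts with hypothetical keys "contest", "party", "name", "votes".
    Returns {contest: {"D": (name, votes), "R": (name, votes)}}.
    """
    by_contest = defaultdict(list)
    for cand in candidates:
        by_contest[cand["contest"]].append(cand)
    kept = {}
    for contest, cands in by_contest.items():
        dems = [c for c in cands if c["party"] == "D"]
        reps = [c for c in cands if c["party"] == "R"]
        if len(dems) == 1 and len(reps) == 1:   # minor-party candidates are ignored
            kept[contest] = {"D": (dems[0]["name"], dems[0]["votes"]),
                             "R": (reps[0]["name"], reps[0]["votes"])}
    return kept


candidates = [
    {"contest": (2018, "XX", 1), "party": "D", "name": "Dem A", "votes": 120_000},
    {"contest": (2018, "XX", 1), "party": "R", "name": "Rep B", "votes": 110_000},
    {"contest": (2018, "XX", 1), "party": "G", "name": "Grn C", "votes": 5_000},
    {"contest": (2018, "XX", 2), "party": "D", "name": "Dem D", "votes": 150_000},
]
print(two_party_contests(candidates))
```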
We then merged these data with selected columns from Gary Jacobson's dataset on quality challengers and Adam Bonica's Database on Ideology, Money in Politics, and Elections (DIME), matching on state, district (for the House), election cycle, and candidate name, and cleaning where necessary. In addition to challenger quality measures, Jacobson's data provided indicators for whether a seat was open, had a Democratic incumbent, or was contested immediately after redistricting, as well as for other events that made a contest noncomparable with contests between two opposing major party candidates. Jacobson also provides presidential vote share at the House district level. For midterm elections, these figures come from the most recent presidential election where possible; for presidential election years, they come from concurrent results. In both cases, they are adjusted for redistricting where necessary. Results are similar when we drop House races held concurrently with presidential elections. In the Senate, we use statewide Democratic presidential vote share from the most recent election. DIME provided matches to both Bonica's ideology measure (CF scores) and FEC identifiers. We rely on CF scores, which measure candidate ideology based on the giving patterns of political donors, because they provide ideological point estimates for challengers as well as incumbents; alternatives such as DW-NOMINATE provide estimates only for incumbents. Combining original sources with data collected separately by Jacobson and Bonica also allowed us to triangulate any discrepancies.
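The text lists the matching fields (state, district, election cycle, name) but not the cleaning rules. The Python sketch below therefore assumes a simple surname-based key that tolerates "LAST, FIRST" versus "First Last Jr." formats and accents. All names and keys are hypothetical. Unmatched rows are returned for manual cleaning.

```python
import re
import unicodedata

SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def _tokens(text):
    """Lowercase, strip accents, and split on anything that is not a letter."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z]+", " ", ascii_text.lower()).split()


def surname(name):
    """Surname from "LAST, FIRST" or "First Last Jr." formats (heuristic)."""
    if "," in name:
        tokens = _tokens(name.split(",")[0])
    else:
        tokens = [t for t in _tokens(name) if t not in SUFFIXES]
    return tokens[-1] if tokens else ""


def merge_on_keys(primary, secondary):
    """Join rows on (year, state, district, surname); return (merged, unmatched)."""
    def key(row):
        return (row["year"], row["state"], row["district"], surname(row["name"]))

    index = {key(row): row for row in secondary}
    merged, unmatched = [], []
    for row in primary:
        match = index.get(key(row))
        if match is None:
            unmatched.append(row)          # left for manual cleaning
        else:
            merged.append({**match, **row})
    return merged, unmatched


primary = [{"year": 2018, "state": "XX", "district": 1, "name": "SMITH, JOHN", "votes": 10}]
secondary = [{"year": 2018, "state": "XX", "district": 1, "name": "John Smith", "cfscore": -0.8}]
print(merge_on_keys(primary, secondary))
```

A surname key would collide if two candidates in one contest shared a surname. Such cases would still need manual review.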
More generally, we include variables such as ideology and state population for several reasons. First, each has been used in prior work explaining vote totals; see, for example, Abramowitz (1988) on state population in Senate races and Ensley (2009) on ideological differences. Second, they help account for potential differences in bang-for-the-buck. Larger populations might imply a lower marginal effect of money. The marginal effect of money might also differ for candidates closer to and farther from the district median, perhaps because money helps candidates become better known. Third, they capture other possible connections between those variables and the effect of money on elections. For instance, ideology may influence available money, and available money may help determine expenditures. One could imagine other scenarios as well. Because KRLS lets the data, in a sense, determine the model specification, including both expenditures and ideology allows the model to capture any such influence if it exists.
Using these contest and candidate identifiers, we next merged in the relevant expenditure variables from the FEC "all candidates file." Specifically, we selected the following columns: "Candidate identification," "Candidate name," "Party affiliation," "Candidate state," "Candidate district," "Primary election status," "General election status," and "Total disbursements." We used all but the last column to identify total disbursements for all major candidates in the universe of contests, repairing missing or incorrect observations where necessary. For Senate contests, we also brought in the log of each state's Voting-Eligible Population for a given election year, using data from McDonald's United States Elections Project. This controls for the possibility that money has a lower marginal effect in larger populations.
Given these cleaned major party candidates, we next added major independent candidates who caucused with a major party (e.g., Bernie Sanders, Virgil Goode). We also used the available sources to identify events that rendered races noncomparable. For example, we identified all Louisiana contests in which the jungle primary included more than one major party candidate from a single party. In those cases, we replaced the row with the runoff election where possible. To accommodate cases in which a candidate won the jungle primary outright despite being one of several major party candidates, we created a variable named "Jungle," which we later used to exclude that row from analysis. Similarly, we identified other unusual cases, including contests with two major party candidates from the same party because of a top-two primary, and contests with only one major party candidate and an independent who had not yet caucused with the opposing party.
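The Louisiana rule can be sketched in Python as follows, assuming candidate lists for the jungle primary and, when one was held, the runoff. The helper name and record layout are hypothetical.

```python
from collections import Counter


def resolve_jungle(primary, runoff=None):
    """Return (candidates_to_use, jungle_flag) for a Louisiana contest.

    primary, runoff: lists of dicts with a "party" key ("D", "R", or other).
    """
    per_party = Counter(c["party"] for c in primary if c["party"] in ("D", "R"))
    crowded = any(n > 1 for n in per_party.values())
    if not crowded:
        return primary, False          # no same-party crowding: keep as is
    if runoff:
        return runoff, False           # replace the row with the runoff
    return primary, True               # won outright: flag "Jungle", exclude later


primary = [{"party": "R"}, {"party": "R"}, {"party": "D"}]
runoff = [{"party": "D"}, {"party": "R"}]
print(resolve_jungle(primary, runoff))
print(resolve_jungle(primary))
```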
Once we had identified all major party candidates in all House and Senate races, we gathered, cleaned, and merged in outside spending. Specifically, we gathered all spending on "electioneering communications,"1 "communication costs,"2 "party coordinated expenditures,"3 and "independent expenditures."4 These data identify spending by outside groups that refers to specific candidates. Many of these outside groups are considered "dark money" groups because they do not disclose their donors; their spending is nonetheless included whenever they report spending money in particular contests. Indeed, many congressional contests attract considerable amounts of dark money, and that spending is transparent even though the donors to those groups remain unknown.5 We next cleaned these data, repairing mangled identification numbers, candidate names, geographic identifiers, and so on. For independent expenditures, the records also included indicators for whether the group supported or opposed the candidate. We coded party coordinated expenditures as supporting the candidate. For electioneering communications and communication costs, we had to supply such indicators case by case. To do so, we coded interest groups as conservative or liberal where possible, based on evidence from opensecrets.org. Such coding was not possible for major trade associations that sometimes campaigned on behalf of candidates from both parties, but the vast majority of spending was easily classified this way. Using this strategy, we measured outside spending for and against each candidate.
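The classification rules can be sketched in Python as follows. One mapping is an assumption that the text does not state: a liberal group is taken to support Democrats and oppose Republicans, and a conservative group the reverse. Category labels, keys, and helper names are also hypothetical, as are the toy records.

```python
from collections import defaultdict


def side_of(record, candidate_party, group_lean):
    """Classify one spending record as "for" or "against" its candidate, or None."""
    category = record["category"]
    if category == "independent":                 # reported support/oppose flag
        return "for" if record["supports"] else "against"
    if category == "party_coordinated":           # coded as support
        return "for"
    # Electioneering communications and communication costs: infer from group lean.
    lean = group_lean.get(record["group"])        # "liberal", "conservative", or None
    if lean is None:
        return None                               # e.g., bipartisan trade associations
    favored_party = "D" if lean == "liberal" else "R"
    return "for" if candidate_party[record["candidate"]] == favored_party else "against"


def outside_spending(records, candidate_party, group_lean):
    """Total outside spending for and against each candidate, plus unclassified dollars."""
    totals = defaultdict(lambda: {"for": 0.0, "against": 0.0})
    unclassified = 0.0
    for rec in records:
        side = side_of(rec, candidate_party, group_lean)
        if side is None:
            unclassified += rec["amount"]
        else:
            totals[rec["candidate"]][side] += rec["amount"]
    return dict(totals), unclassified


party = {"C1": "D", "C2": "R"}
lean = {"Group L": "liberal", "Group C": "conservative"}
records = [
    {"candidate": "C1", "category": "independent", "supports": False, "group": "Group C", "amount": 500.0},
    {"candidate": "C1", "category": "party_coordinated", "group": "Party committee", "amount": 300.0},
    {"candidate": "C2", "category": "electioneering", "group": "Group L", "amount": 200.0},
    {"candidate": "C2", "category": "communication_cost", "group": "Trade Assn", "amount": 50.0},
]
totals, unclassified = outside_spending(records, party, lean)
print(totals, unclassified)
```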
Importantly, outside spending changed dramatically during our study period because of legal changes, including the Bipartisan Campaign Reform Act and Citizens United. Consequently, we reestimated our models on subsets of our dataset. The conclusions presented in the paper do not appear to change dramatically under these inclusion criteria.

Appendix C: KRLS Model Description & Results
Our key measure of Democratic Expenditure Advantage is an interaction term that is held at zero for observations in the top 5% and bottom 5% of its distribution. We do this because (1) the distributions of Democratic Expenditure Advantage in both chambers are leptokurtic, meaning that they have fat tails and extreme outliers, and (2) inferences based on these tail values are unlikely to extrapolate well to the bulk of the distribution. In the Senate, the middle 90% of the spending distribution ranges from −$10.5M to $11.4M, but the full range runs from −$49M to $78M, and its excess kurtosis is 17.8. Similarly, in the House, the middle 90% ranges from −$1.7M to $1.7M, but the full range runs from −$24M to $20M, and its excess kurtosis is 43.7.
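The kurtosis statistic and the middle-90% rule can be illustrated with a short Python sketch. The text specifies neither the kurtosis estimator nor the quantile definition, so the sketch assumes the simple moment estimator and R's default (type 7) quantiles. The toy data are not the paper's.

```python
import math


def quantile_type7(values, p):
    """Sample quantile with linear interpolation (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def excess_kurtosis(xs):
    """Moment-based excess kurtosis, m4 / m2**2 - 3 (0 for a Normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def zero_tails(xs, lower=0.05, upper=0.95):
    """Hold values outside the [lower, upper] quantile band at zero."""
    lo, hi = quantile_type7(xs, lower), quantile_type7(xs, upper)
    return [x if lo <= x <= hi else 0.0 for x in xs]


toy = [0.0] * 98 + [-10.0, 10.0]
print(excess_kurtosis(toy))          # 47.0
trimmed = zero_tails([float(x) for x in range(-50, 51)])
```

Two outliers among 98 zeros are enough to push excess kurtosis to 47, which shows how a handful of extreme races can dominate the raw distribution.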
The observations in the tails may be misleading for several reasons. First, campaign spending likely has diminishing returns. Second, tail values may often come from quixotic campaigns by wealthy candidates who go on to lose badly. Third, our hypotheticals of interest do not include the possibility of such gigantic investments in all races; instead, we are interested in the much more plausible ranges indicated by the middle 90% of the distribution.
We fit all KRLS models using the bigKRLS package (Mohanty and Shaffer, 2019) in R.
Figure S1 illustrates the same results as Figure 1 in the main text, but without separate lines for incumbents, to make comparisons with later figures in this appendix easier. At the aggregate level, estimates are statistically significant in every case. As noted above, the LASSO is linear and, like any linear model, cannot capture interaction effects unless the proper interaction terms are selected a priori. Its estimates are therefore likely depressed by the smaller marginal effects in the tails. Of the three nonparametric methods, KRLS produced the smallest estimates of average marginal effects in both chambers. To the extent that the results in the paper are biased by the choice of learning method, this bias is plausibly conservative.
To replicate Figure S1 with alternative methods, we plot the estimated conditional average marginal effects in appendix Figure S2. Because the LASSO is linear, its estimated marginal effect is constant (a flat line), so we omit it from the figure. Of the three nonparametric methods, KRLS is the most conservative, although its estimates are very similar to those from SVM. SVM also yields plausible marginal effects, with a range somewhat larger than that of KRLS, especially in the House. In contrast, the estimates from random forests are often implausible, with average marginal effects three to five times as large as those from KRLS and SVM. Further, the ranges of marginal effects from random forests are extreme (between −170% and 2103% in the House, and between −38% and 186% in the Senate), placing estimates well outside the logically possible range of the outcome variable.
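The Python sketch below illustrates why a linear learner yields a flat line. It computes pointwise marginal effects by central finite differences for two toy prediction functions, one linear and one saturating. The finite-difference method is a generic numerical device with hypothetical function names. It is not necessarily how bigKRLS computes its derivatives, and the text does not describe how marginal effects were computed for random forests.

```python
import math


def pointwise_marginal_effects(predict, rows, column, step=1e-4):
    """Central finite-difference derivative of predict() with respect to one column."""
    effects = []
    for row in rows:
        up = {**row, column: row[column] + step}
        down = {**row, column: row[column] - step}
        effects.append((predict(up) - predict(down)) / (2 * step))
    return effects


def linear_model(row):
    # A linear fit (like the LASSO): the slope on "adv" is the same everywhere.
    return 48.0 + 2.0 * row["adv"] + 0.5 * row["z"]


def saturating_model(row):
    # A nonlinear fit: large effects near zero advantage, small ones in the tails.
    return 50.0 + 10.0 * math.tanh(row["adv"])


rows = [{"adv": a, "z": 1.0} for a in (-3.0, -1.0, 0.0, 1.0, 3.0)]
print(pointwise_marginal_effects(linear_model, rows, "adv"))
print(pointwise_marginal_effects(saturating_model, rows, "adv"))
```

Averaging the second list gives a conditional average marginal effect that depends on where the observations fall. The first list is constant by construction.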
Based on these comparisons, we are most confident in the KRLS estimates, which we report in the main paper, because they offer competitive out-of-sample predictive accuracy, produce plausible marginal effects, and are the most conservative of the three nonparametric methods. The LASSO likely performs worst in out-of-sample predictive accuracy because it does not accommodate heterogeneity in marginal effects unless interaction terms are explicitly specified.7 While RF offers the best out-of-sample predictive accuracy, our method of calculating marginal effects from RF yielded wildly implausible estimates.8 Finally, because SVM is competitive with KRLS on both plausibility and predictive accuracy, we use it in Supplemental Appendix Section I as a robustness check, replicating our analysis entirely with this method.

Robustness of Nonlinear Effect Estimates
Figure S1: Points are estimated marginal effects of Democratic Expenditure Advantage based on KRLS models. Lines are LOESS fits.

Figure S2: Replication of Figure S1 with alternative methods. The nonlinear finding from KRLS is replicated by both support vector machine regression and random forest regression.
Figure S3: Replication of Figure S1 with SVM and the nonparametric bootstrap.

Figure S4: Replication of Figure 2 from the main paper with SVM and the nonparametric bootstrap. Densities indicate simulated distributions of the number of seats held by Democrats under the hypothetical cases with Democrats' advantage held at the 95th percentile (darker/blue) and at the 5th percentile (lighter/red).