Optimizing the tie-breaker regression discontinuity design

Motivated by customer loyalty plans, we study tie-breaker designs which are hybrids of randomized controlled trials (RCTs) and regression discontinuity designs (RDDs). We quantify the statistical efficiency of a tie-breaker design in which a proportion $\Delta$ of observed customers are in the RCT. In a two line regression, statistical efficiency increases monotonically with $\Delta$, so efficiency is maximized by an RCT. That same regression model quantifies the short term value of the treatment allocation and this comparison favors smaller $\Delta$ with the RDD being best. We solve for the optimal tradeoff between these exploration and exploitation goals. The usual tie-breaker design experiments on the middle $\Delta$ subjects as ranked by the running variable. We quantify the efficiency of other designs such as experimenting only in the second decile from the top. We also consider more general models such as quadratic regressions.


Introduction
Airlines, hotels and other companies may offer incentives such as free upgrades to their most loyal customers in the expectation that those customers will respond favorably with future business. The companies wish to measure the impact of those incentives while also trying to get the greatest benefit from them. An e-commerce company might want to offer some analytic tools to the customers most likely to benefit from them, while also measuring the impact of offering those tools.
These companies can rank their customers, offer the incentive to the highest ranked ones, and then measure impact with a regression discontinuity design (RDD). Or they can run a randomized controlled experiment (RCT) and measure impact by comparing results from customers with and without the incentive. The RDD is expected to have the greatest immediate payoff while the RCT is known to be more statistically efficient.
This tradeoff is naturally handled in a tie-breaker design. For a running variable $x$, subjects in a tie-breaker design are allocated to a control condition if $x \le A$, to a test condition if $x \ge B$, and their treatment (test or control) is randomized if $A < x < B$. If $A = B$ then no subjects are randomized and the data follow an RDD as introduced by Thistlethwaite and Campbell (1960). The treatment effect is estimated as the extent to which the regression has a jump discontinuity where $x = A = B$. At the other extreme, if all the $x$ values are above $A$ and below $B$, then the design is an RCT as described in texts on causal inference (Imbens and Rubin, 2015) or on experimental design (Box et al., 1978; Wu and Hamada, 2011). Tie-breaker designs are also called cutoff designs (Cappelleri and Trochim, 2003) and the running variable is also called an assignment variable or a forcing variable. Sometimes we refer to subjects getting the treatment or not, in place of getting test and control levels of the treatment.
A tie-breaker design has been used to evaluate the effects of post-secondary aid in Nebraska. In that setting, $x$ was a student ranking. Students were triaged into top, middle and bottom groups. The top students received aid, the bottom ones did not, and those in the middle group were randomized to receive aid or not. Aiken et al. (1998) report on a study about allocation of students to remedial English classes where the running variable is a measure of students' reading ability before they matriculate.
Our interest is in optimizing the size of the RCT within a tie-breaker experiment. The RCT is well known to be more statistically efficient than the RDD. See for instance Jacob et al. (2012a, Section 6). However the positive impact from the test condition is ordinarily going to be better in an RDD. Companies may have more to gain by increasing business from their best customers. Similarly, merit-based scholarships are used when one wants to get academically stronger students into a class. There is thus an exploration-exploitation tradeoff here; the RCT is better for measuring impact while the RDD is expected to have more positive impact on the subjects under study.
It is possible to study this tradeoff via extensive Monte Carlo simulations or similar numerical exploration. While that approach can be used with very detailed assumptions about the distribution of $x$ and flexible models for the response of interest, it does not provide much insight into the general nature of the tradeoff. We consider a special case where the running variable has been rescaled to have a symmetric distribution centered at $x = 0$, and the experimental range is from $A = -\Delta$ to $B = \Delta$. We will use a linear regression model for the response with a separate slope and intercept for test and control. In this setting, Jacob et al. (2012a, Section 6) found the RCT to be 4 times as efficient as the RDD when $x$ is uniformly distributed.
Figure 1 illustrates tie-breaker designs for four values of $\Delta$. The assignment variable there has a Gaussian distribution that we assume has been centered and scaled. The outcome variable is simulated from a linear model with a constant treatment effect. For instance, in the third panel, the top 1/6 of customers get the treatment, the bottom 1/6 do not, and a fraction $\Delta = 2/3$ of the data in the middle have randomized allocation. For a Gaussian allocation variable, the experimental region in the middle of the data is where the data are most densely packed, which will typically be desirable.
This paper is organized as follows. Section 2 introduces a two-line regression relating an outcome to the assignment variable. The slope and intercept vary between treatment and control. The assignment variable will not always be Gaussian, but we can always rank order it, so that section is based on the ranks. Section 3 shows that the statistical efficiency of incorporating $\Delta > 0$ experimentation versus the plain regression discontinuity design at $\Delta = 0$ is $1 + 3\Delta^2(2 - \Delta^2)$. Thus, statistical efficiency is a monotone increasing function of the amount of experimentation. At the extreme, a pure RCT with $\Delta = 1$ is 4 times as efficient as the RDD.
We ordinarily expect that our outcome variable will show the greatest gains if we give the treatment to the highest ranked customers. Section 4 quantifies that cost in the two-line regression model and trades it off against statistical efficiency. The optimal $\Delta$ is then dependent on the ratio between the value per customer of the short term return and the value of the information per customer that we get for a given $\Delta$. Although an experiment might be designed for a linear model, once the data are collected there may be nonlinearities that warrant a more flexible model. Section 5 repeats our analysis of the linear model for a pair of quadratic regression models. In this case, the regression discontinuity design has a much higher variance than the experiment does. This is in line with recent findings of Gelman and Imbens (2017). Section 6 revisits the Gaussian case that we illustrate in Figure 1. Here a full RCT is $\pi/(\pi - 2) \approx 2.75$ times as efficient as the RDD. It is qualitatively similar to the uniform case. Section 7 describes a numerical version of our approach that does not require a simplistic regression model. One can always use brute force optimization of a Monte Carlo simulation. We show how to replace the simulation inner loop by matrix algebra allowing faster and more thorough optimization. The tie-breaker literature has emphasized experiments in the middle range of the running variable $x$. Section 8 looks at off center experiments, such as experimenting in just the second decile from the top. In our motivating applications, the incentive might only be offered to a small fraction of customers. Section 9 contains a short discussion of how to use the findings.
We close this introduction with some additional references. Since Thistlethwaite and Campbell (1960), there have been many applications of regression discontinuity designs, particularly in economics and political science. Textbook treatments and surveys may be found in Angrist and Pischke (2009), Angrist and Pischke (2014), Jacob et al. (2012b), Imbens and Lemieux (2008), Jacob et al. (2012a), Klaauw (2008), and Lee and Lemieux (2010).
It is well known in the literature that experiments are more efficient than regression discontinuity designs. Section 6 of Jacob et al. (2012a) discusses this point in depth. They include the four-fold efficiency improvement we get for uniformly distributed running variables and a factor of 2.75 for normal running variables. The latter goes back to Goldberger (1972).
For an historical note, a tradeoff of this kind appeared in the Lanarkshire milk experiment, described by Student (1931). The goal was to measure the effect of a daily ration of milk on the health of school children. Among many complications was the fact that some of the schools chose to give the rations to the students that they thought needed it most. While that may have been the most beneficial way to allocate the school's milk, it was very damaging to the process of learning the causal impact of the milk rations. A tie-breaker experiment might have been a good compromise.

[Figure 1: one panel per value of $\Delta$, the last titled "Delta = 1 (RCT)". Horizontal axis: assignment variable. Vertical axis: outcome.]
Figure 1: Illustrative data for tie-breaker designs with $\Delta \in \{0, 1/3, 2/3, 1\}$, and a standardized Gaussian assignment variable. The regression discontinuity design has $\Delta = 0$, the randomized controlled trial has $\Delta = 1$. Treated points are plotted in red, control in black. Allocation is deterministic for $x$ outside the blue lines.

Setup
We begin with a simple setting where there are an even number $N$ of customers $i = 1, \dots, N$, and exactly $N/2$ of them will receive the treatment. There is an "assignment variable" $x_i \in \mathbb{R}$ that measures the suitability of the customer for the program. The assignment variable might be the output of a statistical machine learning model based on multiple variables, or it could be based on a subjective judgment of one or more experts or stakeholders. We will simplify the problem by transforming $x_i$ to be equispaced in the interval $[-1, 1]$. That is, after sorting the customers in increasing order of $x_i$, we make a rank transformation to $x_i = (2i - N - 1)/N$. If $N = 6$, the assignment variable is $(-5, -3, -1, +1, +3, +5)/6$. Let $z_i$ indicate the treatment status; subjects that receive the treatment have $z_i = +1$ and subjects that do not receive the treatment have $z_i = -1$.
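We denote the experimental interval by $(-\Delta, +\Delta)$ for $\Delta$ in $[0, 1]$. In our hybrid design the treatment assignment takes the form:
$$z_i = \begin{cases} +1, & x_i \ge \Delta \\ \pm 1 \text{ at random}, & |x_i| < \Delta \\ -1, & x_i \le -\Delta. \end{cases} \tag{1}$$
If $\Delta = 0$, then we have a classic RDD with the discontinuity at $x = 0$. If $\Delta = 1$, then we have a classic RCT. If $0 < \Delta < 1$, then we have a tie-breaker design with $\Delta$ measuring the amount of randomization.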
The random allocation in equation (1) will make half of the $z_i$ for $|x_i| < \Delta$ equal $1$ and the other half will be $-1$. One way to do this is to choose $z_i = 1$ for a simple random sample of half of the elements in $R = \{i \mid |x_i| < \Delta\}$. Stratified schemes, setting $z_i = 1$ for exactly one random member of each consecutive pair of indices in $R$, are also easy to implement.
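The rank transformation and the hybrid allocation can be sketched in a few lines of Python. This is only an illustration under the setup described above: the function names `rank_scores` and `tie_breaker_assign` are hypothetical, and the randomized region uses the simple-random-sample scheme (half treated, half control) rather than a stratified one.

```python
import random

def rank_scores(n):
    """Equispaced scores x_i = (2i - n - 1)/n for i = 1, ..., n."""
    return [(2 * i - n - 1) / n for i in range(1, n + 1)]

def tie_breaker_assign(x, delta, rng):
    """Return z_i in {-1, +1} for each score in x.

    Deterministic outside (-delta, delta): +1 if x >= delta, -1 if x <= -delta.
    Inside, a simple random sample of half of the subjects gets +1.
    """
    # Start from the deterministic rule; the middle entries are overwritten below.
    z = [1 if xi >= delta else -1 for xi in x]
    middle = [i for i, xi in enumerate(x) if abs(xi) < delta]
    treated = set(rng.sample(middle, len(middle) // 2))
    for i in middle:
        z[i] = 1 if i in treated else -1
    return z

if __name__ == "__main__":
    rng = random.Random(2024)
    x = rank_scores(12)
    for delta in (0.0, 0.5, 1.0):
        print(delta, tie_breaker_assign(x, delta, rng))
```

With an even number of subjects the scores are symmetric about zero and never equal to zero, so the randomized set always has even size and exactly half of all subjects end up treated.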
The impact of the treatment is measured by a scalar outcome $Y$ where $Y_i$ is a measure of the benefit derived from customer $i$. We suppose that the delay time between setting $z_i$ and observing $Y_i$ is long enough to make bandit methods (see for instance, Scott (2015)) unsuitable. We will instead compare experimental designs using the following two-line regression model:
$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \beta_3 x_i z_i + \varepsilon_i \tag{2}$$
where $\varepsilon_i$ are IID random variables with mean $0$ and finite variance $\sigma^2 > 0$. Our analysis is based on the regression model (2) instead of the randomization because the treatment for subjects with $x$ outside $(-\Delta, \Delta)$ is not random. The effect of the treatment averaged over customers $i = 1, \dots, N$ is $2\beta_2$. The factor of 2 comes from comparing $z_i = 1$ to $z_i = -1$. We can also estimate whether the effect increases or decreases with $x$, through the coefficient $\beta_3$. The quantity $2\beta_2$ is also the magnitude of the treatment effect on a (hypothetical) average customer with $x = 0$.
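Under model (2), we can distinguish customers for whom the treatment is effective from those for whom it is not. Suppose that $\tau$ is the incremental cost of offering the treatment to one customer. The expected gain from treating a customer at $x$ is $2(\beta_2 + \beta_3 x)$. If $\beta_3 > 0$, then there is a cutpoint $x_* = (\tau/2 - \beta_2)/\beta_3$ and the treatment pays off on average for customers with $x_i \ge x_*$. If $x_* \notin [-1, 1]$ then the treatment either pays off on average at all $x$, or pays off on average for no $x$. If $\beta_3 < 0$, then the treatment only pays off for customers with $x_i \le x_*$. We discuss that case further in Section 4.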

Efficiency in the two-line model
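We will analyze the data $(x_i, Y_i)$ for $i = 1, \dots, N$ by fitting model (2) by least squares. The parameter of interest is $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)^T$ and we assume that $Y_i$ are independent random variables with $\mathrm{Var}(Y_i) = \sigma^2$. Let $X \in \mathbb{R}^{N \times 4}$ be the design matrix with $i$'th row $(1, x_i, z_i, x_i z_i)$. Then the least squares estimate is $\hat\beta = (X^TX)^{-1}X^TY$ and $\mathrm{Var}(\hat\beta) = (X^TX)^{-1}\sigma^2$. Because $\sigma^2$ does not depend on $\Delta$, we can compare designs assuming that $\sigma = 1$.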
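Next, we look at how $X^TX$ depends on $\Delta$. For large $N$ we can replace $\sum_i x_i^2$ by $N\int_{-1}^{1} x^2\,dx/2 = N/3$. Similar integral approximations yield
$$\frac{1}{N}X^TX \approx \begin{pmatrix} 1 & 0 & 0 & \phi \\ 0 & 1/3 & \phi & 0 \\ 0 & \phi & 1 & 0 \\ \phi & 0 & 0 & 1/3 \end{pmatrix} \tag{3}$$
where
$$\phi = \phi(\Delta) = \int_{\Delta}^{1} x\,dx = \frac{1 - \Delta^2}{2}$$
is the average value of $z \times x$ over the design. The approximation error in (3) is $O_p(1/\sqrt{N})$ when the random $z_i$ are assigned by simple random sampling and it is much smaller under stratified sampling. We will work with (3) as if it were exact.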
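We can reorder the rows and columns of (3) to make it block diagonal,
$$\frac{1}{N}X^TX \approx \begin{pmatrix} 1 & \phi & 0 & 0 \\ \phi & 1/3 & 0 & 0 \\ 0 & 0 & 1 & \phi \\ 0 & 0 & \phi & 1/3 \end{pmatrix},$$
where the rows and columns are labeled $(1, xz, z, x)$ after the variables that the $\beta_j$ multiply and $\phi = \phi(\Delta)$. It follows that, with the coefficients in the order $(\hat\beta_0, \hat\beta_3, \hat\beta_2, \hat\beta_1)$,
$$\mathrm{Var}(\hat\beta) \approx \frac{1}{N}\,\frac{1}{1/3 - \phi^2} \begin{pmatrix} 1/3 & -\phi & 0 & 0 \\ -\phi & 1 & 0 & 0 \\ 0 & 0 & 1/3 & -\phi \\ 0 & 0 & -\phi & 1 \end{pmatrix}. \tag{5}$$
Thus the variances scale by $(1/3 - \phi^2)^{-1}$. The individual component variances are $N\,\mathrm{Var}(\hat\beta_0) = N\,\mathrm{Var}(\hat\beta_2) = 1/(1 - 3\phi^2)$ and $N\,\mathrm{Var}(\hat\beta_1) = N\,\mathrm{Var}(\hat\beta_3) = 3/(1 - 3\phi^2)$. These variances are smallest for small values of $\phi$, corresponding to large values of $\Delta$. That is, the more randomized experimentation there is in the data, the less variance there is in the estimates. Therefore, the regression discontinuity design is worst and the randomized experiment is best. Larger values of $\phi$ also induce stronger correlations among the $\hat\beta_j$.
The estimated gain from the intervention for a customer with a given $x$ is $\hat E(Y \mid x, z = 1) - \hat E(Y \mid x, z = -1) = 2(\hat\beta_2 + x\hat\beta_3)$. Because $\hat\beta_2$ and $\hat\beta_3$ belong to different blocks of (5), they are uncorrelated. Next
$$\mathrm{Var}\bigl(2(\hat\beta_2 + x\hat\beta_3)\bigr) \approx \frac{4(1 + 3x^2)}{N(1 - 3\phi^2)} = \frac{16(1 + 3x^2)}{N\bigl(4 - 3(1 - \Delta^2)^2\bigr)} \tag{6}$$
after some algebra. The relative efficiency of the experiment versus regression discontinuity is
$$\frac{\mathrm{Var}\bigl(2(\hat\beta_2 + x\hat\beta_3)\bigr)\big|_{\Delta = 0}}{\mathrm{Var}\bigl(2(\hat\beta_2 + x\hat\beta_3)\bigr)\big|_{\Delta = 1}} = 4 \tag{7}$$
for all $x$. That is, the randomized experiment with $N/4$ observations is as informative as the regression discontinuity with $N$ observations and this holds uniformly over all levels of the assignment variable $x$. This factor of 4 is given by Jacob et al. (2012a).
Figure 2 shows the variance of the treatment effect parameters as a function of $\Delta$. Some values from the plot are shown in Table 1. The regression discontinuity design has four times the variance of the experiment as we saw in equation (7). The slope coefficient for treatment always has three times the variance of the intercept coefficient as follows from (5). Figure 3 shows the variance of the estimated impact versus $x$ for several choices of $\Delta$.
[Figure 2: Variance vs Delta. Horizontal axis: $\Delta$, where 0 = regression discontinuity and 1 = experiment.]
Method                     $\Delta$   $\mathrm{Var}(\hat\beta_2)$   $\mathrm{Var}(\hat\beta_3)$
Regression discontinuity   0          4/N                           12/N
Experiment                 1          1/N                           3/N
Table 1: Variance of $\hat\beta_2$ (treatment effect intercept) and $\hat\beta_3$ (treatment effect slope) under regression discontinuity ($\Delta = 0$) and randomized experiment ($\Delta = 1$). It assumes that $\mathrm{Var}(Y \mid x, z) = 1$.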
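The variance formulas of this section can be checked numerically. The sketch below assumes the closed form $\phi(\Delta) = (1 - \Delta^2)/2$, which is what the uniform-score integral of $z \times x$ gives, and compares the resulting large-$N$ variances with a direct computation of $N (X^TX)^{-1}$ for one simulated design. The function names are hypothetical, NumPy is assumed to be available, and the randomized region uses the stratified scheme (one random member of each consecutive pair is treated).

```python
import numpy as np

def phi(delta):
    """Average of z*x for uniform scores on [-1, 1] (assumed closed form)."""
    return (1 - delta**2) / 2

def scaled_variances(delta):
    """Large-N values of (N*Var(beta2_hat), N*Var(beta3_hat)) with sigma = 1."""
    d = 1 - 3 * phi(delta) ** 2
    return 1 / d, 3 / d

def efficiency_vs_rdd(delta):
    """Variance at delta = 0 divided by the variance at the given delta."""
    return scaled_variances(0.0)[0] / scaled_variances(delta)[0]

def empirical_scaled_variances(n, delta, seed=0):
    """N * diag((X^T X)^{-1}) entries for beta2 and beta3 in one simulated design."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, n + 1)
    x = (2 * i - n - 1) / n
    z = np.where(x >= delta, 1.0, -1.0)          # deterministic rule
    inside = np.flatnonzero(np.abs(x) < delta)   # randomized region
    # Stratified allocation: treat one random member of each consecutive pair.
    for a, b in zip(inside[0::2], inside[1::2]):
        if rng.random() < 0.5:
            z[a], z[b] = 1.0, -1.0
        else:
            z[a], z[b] = -1.0, 1.0
    X = np.column_stack([np.ones(n), x, z, x * z])   # columns (1, x, z, xz)
    cov = n * np.linalg.inv(X.T @ X)
    return cov[2, 2], cov[3, 3]

if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.0):
        print(delta, scaled_variances(delta), empirical_scaled_variances(2000, delta))
```

Under these assumptions `efficiency_vs_rdd(delta)` agrees with the expression $1 + 3\Delta^2(2 - \Delta^2)$ quoted in the introduction.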

[Figure 3: Variance of treatment effect vs x, linear regression. Horizontal axis: target location $x$. Vertical axis: $N \times$ variance. Top curve: regression discontinuity, $\Delta = 0$. Bottom curve: experiment, $\Delta = 1$. Step size 0.1.]
Figure 3: Variance of $2(\hat\beta_2 + x\hat\beta_3)$ versus $x$ in the two-line model (2), for $\Delta$ between 0 and 1 in steps of 0.1. Note that the vertical axis is logarithmic.

Cost of experimentation
We ordinarily expect the value of the incentive to increase with the variable x. In that case the greatest return on the N customers in the experiment arises from the regression discontinuity design with ∆ = 0. The information gain from ∆ > 0 comes at some cost in the present sample. This section quantifies that cost.
For a deterministic allocation of $z = 1$ or $z = -1$ we have $E(Y \mid x, z) = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 zx$. When $z$ is chosen randomly with $\Pr(z = 1) = \Pr(z = -1) = 1/2$, then $E(Y \mid x) = \beta_0 + \beta_1 x$. It follows that the expected gain per customer in the hybrid design is
$$g(\Delta) = \beta_0 + \beta_3 \int_{\Delta}^{1} x\,dx = \beta_0 + \beta_3\,\frac{1 - \Delta^2}{2}.$$
Neither $\beta_1$ nor $\beta_2$ appears in this gain and the value of $\beta_0$ does not affect our choice of $\Delta$. Only $\beta_3$, which models how the payoff from the incentive varies with the assignment variable $x$, makes a difference. Compared to the regression discontinuity design with $\Delta = 0$, the cost of incorporating experimentation is $N(g(0) - g(\Delta)) = N\beta_3\Delta^2/2$, which grows slowly as $\Delta$ increases from zero and then rapidly as $\Delta$ approaches one.
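As a check on the cost formula $N(g(0) - g(\Delta)) = N\beta_3\Delta^2/2$, the expected gain per customer can be computed by averaging $E(Y \mid x)$ over $x \sim U[-1, 1]$. The sketch below is illustrative only: the function names are hypothetical, and the closed form $g(\Delta) = \beta_0 + \beta_3(1 - \Delta^2)/2$ is the one implied by that cost formula.

```python
def expected_gain(delta, beta0, beta3):
    """Closed-form gain per customer, g(delta) = beta0 + beta3 * (1 - delta^2) / 2."""
    return beta0 + beta3 * (1 - delta**2) / 2

def expected_gain_numeric(delta, beta, m=20000):
    """Midpoint-rule average of E(Y | x) over x ~ U[-1, 1] in the hybrid design."""
    b0, b1, b2, b3 = beta
    total = 0.0
    for k in range(m):
        x = -1 + (2 * k + 1) / m          # midpoint of the k-th cell
        if abs(x) < delta:
            ez = 0.0                      # randomized: E(z | x) = 0
        else:
            ez = 1.0 if x > 0 else -1.0   # deterministic allocation
        total += b0 + b1 * x + b2 * ez + b3 * ez * x
    return total / m

if __name__ == "__main__":
    beta = (1.0, 0.7, 0.3, 0.5)           # illustrative coefficients only
    for delta in (0.0, 0.25, 0.5, 1.0):
        print(delta, expected_gain(delta, beta[0], beta[3]),
              expected_gain_numeric(delta, beta))
```

Changing `b1` or `b2` in the numerical version leaves the result unchanged up to discretization error, which matches the remark that only $\beta_3$ matters for the choice of $\Delta$.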
If $\beta_3 > 0$, then as expected, we gain the most from the regression discontinuity design and the least from the experiment. This is a classic exploration-exploitation tradeoff.
It is also possible that some settings have $\beta_3 < 0$. This might happen if the incentive is additional free tutoring in the educational context, or if it is advice on how to best use an e-commerce company's products in a context where higher performing customers already knew about the advice. In these cases the minimal cost is to give the incentive to the bottom $N/2$ customers and not the top $N/2$ customers. The analysis of this paper goes through by reversing the customer ranking, thereby replacing $x$ by $-x$ and also changing the sign of $\beta_3$.
Now we turn to optimizing the choice of $\Delta$ given some assumptions on the relative value of the information in the data for future decisions and the expected gain on the experiment. The precision (inverse variance) of our estimate of $\beta$ is a linear function of $N$ and so is the expected gain. We can therefore trade off precision per customer with gain per customer. We think that $\beta_3$ is the most important parameter so we take the precision gain per customer to be
$$p(\Delta) = \frac{1}{N\,\mathrm{Var}(\hat\beta_3)} = \frac{1}{3} - \phi(\Delta)^2 = \frac{1}{3} - \frac{(1 - \Delta^2)^2}{4}. \tag{8}$$
Alternatively, we could focus on $2\beta_2$ which is both the average gain per customer and the gain for the customer at $x = 0$. The precision for $2\hat\beta_2$ turns out to be $3p(\Delta)/4$ so it is perfectly aligned with precision on $\beta_3$. More generally the gain from the incentive at any specific $x$ has a variance given by (6). Any weighted average of precision of $2(\hat\beta_2 + \hat\beta_3 x)$ over points $x \in [-1, 1]$ is a scalar multiple of $p(\Delta)$ from (8).
We trade off gain per customer and precision per customer with the value function
$$v(\Delta) = g(\Delta) + \lambda p(\Delta) = \beta_0 + \beta_3\,\frac{1 - \Delta^2}{2} + \lambda\Bigl(\frac{1}{3} - \frac{(1 - \Delta^2)^2}{4}\Bigr) \tag{9}$$
where $\lambda > 0$ measures the value for future decisions of having greater precision on $\beta_3$.
Proposition 1. Let $v(\Delta)$ be given by equation (9) with $\lambda > 0$ and $\beta_3 \ge 0$. Then the maximum of $v$ over $\Delta \in [0, 1]$ occurs at
$$\Delta_* = \begin{cases} \sqrt{1 - \beta_3/\lambda}, & \beta_3 \le \lambda \\ 0, & \beta_3 > \lambda. \end{cases} \tag{10}$$
Proof. Let $\gamma = \Delta^2$. We will first maximize $v = c - \beta_3\gamma/2 - \lambda(1 - \gamma)^2/4$ over $0 \le \gamma \le 1$, where $c$ does not depend on $\gamma$. Now $v$ has a unique maximum over $\gamma \in \mathbb{R}$ at $\gamma_* = 1 - \beta_3/\lambda$. The maximizing $\gamma$ is $\gamma_*$ when $0 \le \gamma_* \le 1$, it is $0$ when $\gamma_* < 0$ and it is $1$ when $\gamma_* > 1$. Equation (10) translates these results back to the optimal $\Delta$.
We see from equation (10) that the decision depends on the critical ratio $\beta_3/\lambda$. The numerator reflects the value of more efficient allocation and the denominator captures the value of improved information gathering. When $\beta_3 \ge \lambda$ then the discontinuity design with $\Delta = 0$ is optimal. The full experiment, $\Delta = 1$, is never optimal unless $\beta_3 = 0$ or the value $\lambda$ of information to be used in future decisions is infinite. Figure 4 shows the value $\Delta_*$ from equation (10) versus the ratio $r = \beta_3/\lambda$ of the short term to long term value coefficients. The function is nearly equal to $1 - r/2$ near the origin and has negative curvature on $0 \le r \le 1$. If future uses are important enough that $r \le 1/10$, then one should use $\Delta \gtrsim 1 - 0.1/2 = 0.95$. That is, when the future is very important the optimal hybrid is very close to an RCT.
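The optimal amount of experimentation in Proposition 1 is simple to compute. The sketch below assumes the value function has the form used in the proof, $v = c - \beta_3\gamma/2 - \lambda(1 - \gamma)^2/4$ with $\gamma = \Delta^2$, and cross-checks the closed form against a grid search. The function names `value` and `optimal_delta` are hypothetical.

```python
import math

def value(delta, beta3, lam, beta0=0.0):
    """Gain per customer plus lam times precision per customer on beta3."""
    gain = beta0 + beta3 * (1 - delta**2) / 2
    precision = 1 / 3 - (1 - delta**2) ** 2 / 4
    return gain + lam * precision

def optimal_delta(beta3, lam):
    """Maximizer of value() over delta in [0, 1], for beta3 >= 0 and lam > 0."""
    if beta3 < 0 or lam <= 0:
        raise ValueError("requires beta3 >= 0 and lam > 0")
    # gamma* = 1 - beta3/lam is at most 1 here, so only the lower clip is needed.
    return math.sqrt(max(0.0, 1 - beta3 / lam))

if __name__ == "__main__":
    grid = [k / 1000 for k in range(1001)]
    for r in (0.0, 0.1, 0.5, 1.0, 2.0):
        best_on_grid = max(grid, key=lambda d: value(d, r, 1.0))
        print(r, optimal_delta(r, 1.0), best_on_grid)
```

Only the ratio $r = \beta_3/\lambda$ enters the answer, so the example fixes $\lambda = 1$ and varies $\beta_3$.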

Quadratic regression
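A quadratic regression model allows a richer exploration of the treatment effect. We consider
$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \beta_3 x_i z_i + \beta_4 x_i^2 + \beta_5 x_i^2 z_i + \varepsilon_i. \tag{11}$$
For instance, model (11) allows for the possibility that the treatment pays off if and only if $x$ is in some interval. It also allows for a situation where the payoff only comes outside of some interval. This model has even (symmetric) predictors $1$, $xz$, $x^2$ and odd (antisymmetric) predictors $x$, $z$, $zx^2$. As in the linear case, the even and odd predictors are orthogonal to each other. Now $(1/N)X^TX$ is a $6 \times 6$ block diagonal matrix. Some of the entries are
$$\frac{1}{N}\sum_i x_i^4 \approx \frac{1}{5} \quad\text{and}\quad \phi_3(\Delta) = \frac{1}{N}\sum_i z_i x_i^3 \approx \int_{\Delta}^{1} x^3\,dx = \frac{1 - \Delta^4}{4}$$
as well as $\phi(\Delta)$ from Section 3 that we call $\phi_1(\Delta)$ here. We find that, with the predictors ordered as $(1, xz, x^2, z, x, zx^2)$,
$$\frac{1}{N}X^TX \approx \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M = \begin{pmatrix} 1 & \phi_1 & 1/3 \\ \phi_1 & 1/3 & \phi_3 \\ 1/3 & \phi_3 & 1/5 \end{pmatrix}.$$
Once again we get a block diagonal pattern with two identical blocks. This is a consequence of $z^2 = 1$, and it will happen for more general models with odd and even predictors.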

[Figure 5: Variance of treatment effect vs x, quadratic regression. Horizontal axis: target location $x$. Vertical axis: $N \times$ variance. Top curve: regression discontinuity, $\Delta = 0$. Bottom curve: experiment, $\Delta = 1$. Step size 0.1.]
Figure 5: Variance of $2(\hat\beta_2 + x\hat\beta_3 + x^2\hat\beta_5)$ versus $x$ in the quadratic model (11), for $\Delta$ between 0 and 1 in steps of 0.1.
Figure 5 shows the variance of the estimated impact versus $x$ for several choices of $\Delta$. Notice that the variance is given on a logarithmic scale there. The regression discontinuity design, $\Delta = 0$, in the top curve there has extremely large variances, especially where $|x|$ is close to 1. The randomized design at the bottom has much smaller variance. Even the maximum variance in the experiment (at $x = 1$) is no larger than the minimum variance in the regression discontinuity model (at $x = 0$).
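The curves in Figure 5 can be approximated from the large-$N$ moment matrix. The sketch below rests on several assumptions that are not spelled out in the surviving text: the quadratic model is taken to have predictors $1, x, z, xz, x^2, x^2z$ with coefficients $\beta_0, \dots, \beta_5$ (inferred from the Figure 5 caption), the scores are uniform on $[-1, 1]$, and the averages of $zx$ and $zx^3$ are $(1 - \Delta^2)/2$ and $(1 - \Delta^4)/4$. The function names are hypothetical and NumPy is assumed to be available.

```python
import numpy as np

def quadratic_block(delta):
    """3x3 moment block shared by the even and odd predictors (large-N limit)."""
    phi1 = (1 - delta**2) / 2      # average of z * x
    phi3 = (1 - delta**4) / 4      # average of z * x**3
    return np.array([[1.0, phi1, 1 / 3],
                     [phi1, 1 / 3, phi3],
                     [1 / 3, phi3, 1 / 5]])

def scaled_effect_variance(x, delta):
    """N * Var of 2*(b2 + x*b3 + x^2*b5) with sigma = 1."""
    inv = np.linalg.inv(quadratic_block(delta))
    # Odd block order (z, x, z x^2) carries (b2, b1, b5);
    # even block order (1, x z, x^2) carries (b0, b3, b4).
    # b3 is uncorrelated with b2 and b5 because it sits in the other block.
    return 4 * (inv[0, 0] + 2 * x**2 * inv[0, 2] + x**4 * inv[2, 2]
                + x**2 * inv[1, 1])

if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.0):
        print(delta, [round(float(scaled_effect_variance(x, delta)), 2)
                      for x in (0.0, 0.5, 1.0)])
```

Under these assumptions the block for $\Delta = 0$ is the $3 \times 3$ Hilbert matrix, which is one way to see why the regression discontinuity design is so poorly conditioned for the quadratic model.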

Gaussian case
The original assignment variable might have a nearly Gaussian distribution. Or we might believe that the two-line linear model fits better if we have transformed the assignment variable rank to normal scores $x_i = \Phi^{-1}((i - 1/2)/N)$, where $\Phi(\cdot)$ is the cumulative distribution function of the $N(0, 1)$ distribution.
We will experiment on the central data with $|x_i| \le \tau$, choosing $\tau$ to get a fraction $\Delta$ of data in the experiment. That leads to $\tau = \Phi^{-1}((1 + \Delta)/2)$. After reordering the variables we find in this case that
$$\frac{1}{N}X^TX \approx \begin{pmatrix} 1 & \phi_G & 0 & 0 \\ \phi_G & 1 & 0 & 0 \\ 0 & 0 & 1 & \phi_G \\ 0 & 0 & \phi_G & 1 \end{pmatrix}.$$
Compared to the uniform scores case, the diagonal has changed from $(1, 1/3, 1, 1/3)$ to $(1, 1, 1, 1)$. The value of $\phi$ from the uniform case changes to
$$\phi_G = \phi_G(\Delta) = 2\int_{\tau}^{\infty} x\varphi(x)\,dx = 2\varphi(\tau),$$
where $\varphi(\cdot)$ is the $N(0, 1)$ probability density function. For this Gaussian case, all 4 estimated coefficients $\hat\beta_j$ have the same variance, with $N\,\mathrm{Var}(\hat\beta_j) = 1/(1 - \phi_G^2)$. The variances for uniform assignment variables were not all the same. The difference stems from the points $x_i$ having variance $1/3$ in the uniform case instead of variance $1$ here. As before, as $\Delta$ increases, $\phi_G$ decreases and so $\mathrm{Var}(\hat\beta_j)$ decreases.
Now we work out the efficiency of the RCT compared to the RDD. For the RCT, $\Delta = 1$ yields $\tau = \infty$ and then $\phi_G = 0$. For the RDD, $\Delta = 0$ yields $\tau = 0$ and then $\phi_G = 2\varphi(0)$. Thus the efficiency of the RCT compared to the RDD is
$$\frac{1}{1 - 4\varphi(0)^2} = \frac{1}{1 - 2/\pi} = \frac{\pi}{\pi - 2} \approx 2.75$$
as reported by Goldberger (1972). This is somewhat less than the efficiency gain of 4 in the uniform case. The efficiency versus $\Delta$ (not shown) has a qualitatively similar shape to the black curve for the coefficient of $z$ in the uniform case (Figure 2).
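The Gaussian efficiency can be reproduced with the Python standard library. This sketch assumes $\phi_G(\Delta) = 2\varphi(\tau)$ with $\tau = \Phi^{-1}((1 + \Delta)/2)$, where $\varphi$ is the standard normal density; that value follows from averaging $|x|$ over $|x| \ge \tau$ and matches the stated $\phi_G = 2\varphi(0)$ at $\Delta = 0$. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)

def phi_gauss(delta):
    """Average of z*x for Gaussian scores with a central fraction delta randomized."""
    if delta >= 1:
        return 0.0                     # tau is infinite: everything is randomized
    tau = STD_NORMAL.inv_cdf((1 + delta) / 2)
    return 2 * STD_NORMAL.pdf(tau)

def scaled_variance_gauss(delta):
    """Common value of N*Var(beta_j_hat) for all four coefficients, sigma = 1."""
    return 1 / (1 - phi_gauss(delta) ** 2)

def rct_vs_rdd_efficiency():
    """Variance under the RDD divided by variance under the RCT."""
    return scaled_variance_gauss(0.0) / scaled_variance_gauss(1.0)

if __name__ == "__main__":
    print(rct_vs_rdd_efficiency(), math.pi / (math.pi - 2))
    for delta in (0.0, 1 / 3, 2 / 3, 1.0):
        print(round(delta, 3), scaled_variance_gauss(delta))
```

The special case for `delta >= 1` is needed because `inv_cdf` is only defined for probabilities strictly between 0 and 1.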

General numerical approach
The two-line model for a running variable $x$ with a symmetric distribution made it simple to study central experimental windows of the form $(-\Delta, \Delta)$. In that setting the means of $x_i$ and $z_i$ were both zero, and the variance of parameter estimates depended on just one quantity, $\Delta$. We may want to use a more general regression model, allow experimental windows that are not centered around the middle value of $x$, have $x$ values that are not uniform or Gaussian, and we might also want to use models other than two regression lines. There might even be more than one running variable as in Abdulkadiroglu et al. (2017). The price for this flexibility is high; users have to answer some hard questions about their goals, and then do numerical optimization over parameters with a potentially expensive Monte Carlo inner loop. In this section we show that the inner loop can be done algebraically.
We suppose that prior to treatment assignment, customer $i$ has a known feature vector $F_i \in \mathbb{R}^d$ which includes an intercept variable equal to 1, but not the treatment variable $z_i$. For instance in the linear and quadratic models, the features $F_i$ are $(1, x_i)^T$ and $(1, x_i, x_i^2)^T$, respectively. In the regression model, $E(Y_i) = F_i^T(\beta + \gamma)$ for the treated customers $i$ and $E(Y_i) = F_i^T(\beta - \gamma)$ for the others. Here $\gamma \in \mathbb{R}^d$ models the effect of treatment.
The generalized tie-breaker study works with a vector $\theta \in \mathbb{R}^d$ and sets
$$z_i = \begin{cases} +1, & \theta^T F_i \ge \Delta \\ \pm 1 \text{ at random}, & |\theta^T F_i| < \Delta \\ -1, & \theta^T F_i \le -\Delta. \end{cases}$$
In the random case, we suppose that $z_i = 1$ with probability $p$ and is $-1$ with probability $1 - p$ where $p$ need not be $1/2$. Because $F_i$ contains an intercept term, the experimental window $|\theta^T F_i| < \Delta$ need not be centered on a central value of $\theta^T F_i$. The analyst must now choose $\Delta \ge 0$, $\theta \in \mathbb{R}^d$ and $p \in (0, 1)$. The analogue of our previous approach is to find the matrix $(X^TX)^{-1}$ where
$$X^TX = \begin{pmatrix} A & B \\ B & A \end{pmatrix}, \qquad A = \sum_i F_i F_i^T, \qquad B = \sum_i E(z_i \mid F_i)\,F_i F_i^T.$$
The lower right corner of $X^TX$ is $A$ because it is using $E(z_i^2 \mid F_i) = 1$. Averaging over the outcomes of $z_i$ this way is statistically reasonable when $n \gg d$. If $\varepsilon_i$ are independent with mean zero and variance $\sigma^2$, then
$$\mathrm{Var}\begin{pmatrix} \hat\beta \\ \hat\gamma \end{pmatrix} = \sigma^2 \begin{pmatrix} A & B \\ B & A \end{pmatrix}^{-1}.$$
This averages over the outcomes $\varepsilon_i$ so that they do not have to be simulated. One can now do brute force numerical search for good values of $\theta$ and $p$ and $\Delta$. A good choice would yield a favorably small $\mathrm{Var}(\hat\gamma)$. A bad choice will yield a larger variance covariance matrix. A very bad choice would lead to singular $X^TX$ and one would of course reject the corresponding triple $(\theta, \Delta, p)$. For instance, such a singularity would happen if $\max_i \theta^T F_i < -\Delta$ which is an obviously poor choice because then no customers would be in the treatment group.
Using a formula for the inverse of a block matrix we get
$$\mathrm{Var}(\hat\gamma) = \sigma^2\bigl(A - BA^{-1}B\bigr)^{-1}.$$
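A minimal sketch of the algebraic inner loop follows. It assumes that the relevant matrices are $A = \sum_i F_i F_i^T$ and $B = \sum_i E(z_i \mid F_i) F_i F_i^T$, and that the variance of the treatment coefficients is the Schur-complement block $\sigma^2 (A - B A^{-1} B)^{-1}$ of the inverse of the matrix with blocks $A, B, B, A$. The function name `gamma_covariance` and its arguments are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def gamma_covariance(F, theta, delta, p=0.5, sigma=1.0):
    """Covariance of the treatment coefficients, averaged over random z_i.

    F     : (n, d) feature matrix whose first column is the intercept
    theta : length-d vector defining the score theta^T F_i
    delta : half-width of the randomized window |theta^T F_i| < delta
    p     : probability of treatment inside the window
    """
    F = np.asarray(F, dtype=float)
    score = F @ np.asarray(theta, dtype=float)
    # E(z_i | F_i): +1 above the window, -1 below it, 2p - 1 inside it.
    ez = np.where(score >= delta, 1.0,
                  np.where(score <= -delta, -1.0, 2 * p - 1))
    A = F.T @ F
    B = F.T @ (ez[:, None] * F)
    # Schur complement A - B A^{-1} B; raises LinAlgError if a matrix is singular.
    return sigma**2 * np.linalg.inv(A - B @ np.linalg.solve(A, B))

if __name__ == "__main__":
    n = 2000
    x = (2 * np.arange(1, n + 1) - n - 1) / n
    F = np.column_stack([np.ones(n), x])          # two-line model features
    for delta in (0.0, 0.5, 1.0):
        print(delta, np.diag(n * gamma_covariance(F, [0.0, 1.0], delta)))
```

With features $(1, x_i)$ and $\theta = (0, 1)^T$ this reduces to the two-line model, where $\gamma = (\beta_2, \beta_3)^T$, so the printed diagonals can be compared with $1/(1 - 3\phi^2)$ and $3/(1 - 3\phi^2)$. A search over $(\theta, \Delta, p)$ would call this function in its inner loop and discard triples for which the inversion fails.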

Non-central experimental regions
Our treatment of the two-line model assumed that the experimental region was in the center of the range of the running variable. For a loyalty program one might prefer instead to allocate the benefit in a different way. Perhaps the top 10% get the benefit, and the next 10% are randomized to receive the benefit or not, while the bottom 80% do not get the benefit. For a less expensive incentive, the company might want to offer it to the top 50% of customers and then randomize it to the bottom 50%. We can model these options by taking
$$z = \begin{cases} +1, & x \ge b \\ \pm 1 \text{ with probability } 1/2 \text{ each}, & a < x < b \\ -1, & x \le a \end{cases}$$
for $a \le b$.
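Let the running variable $x \in \mathbb{R}$ be random with $E(x^4) < \infty$. Let $z = 1$ with probability $p(x)$ and $z = -1$ otherwise. Then letting $X$ be the design matrix in the two-line regression, with columns ordered as $(1, x, z, xz)$, and noting that $z^2 = 1$, we have
$$\frac{1}{n}X^TX = \begin{pmatrix} D & C \\ C & D \end{pmatrix} + O_p(n^{-1/2}), \qquad D = \begin{pmatrix} 1 & E(x) \\ E(x) & E(x^2) \end{pmatrix}, \qquad C = \begin{pmatrix} E(z) & E(xz) \\ E(xz) & E(x^2z) \end{pmatrix}$$
under random sampling of $x_i$ and $z_i$ given $x_i$ for $i = 1, \dots, n$. The $O_p(n^{-1/2})$ error holds because $E(x^4) < \infty$. The error could be less than $O_p(1/\sqrt{n})$ if $p(x)$ is a simple enough function to make stratification tractable.
We can center $x$ so that $E(x) = 0$ and then $D = \mathrm{diag}(1, E(x^2))$.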
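We can scale $x$ to get $E(x^2) = 1$ so that $D = I_2$. We retain more general scaling because $x \sim U[-1, 1]$ has $E(x^2) = 1/3$ and rescaling would require working with the less convenient distribution $U[-\sqrt{3}, \sqrt{3}]$. We need the inverse of a block matrix containing just two unique square blocks. The following proposition specializes block matrix inversion to our case.
Proposition 3. Let $C$ and $D$ be $k \times k$ matrices for which $D$ and $S = D - CD^{-1}C$ are invertible. Then
$$\begin{pmatrix} D & C \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} S^{-1} & -D^{-1}CS^{-1} \\ -D^{-1}CS^{-1} & S^{-1} \end{pmatrix}.$$
Using Proposition 3 we get
$$n\,\mathrm{Var}(\hat\beta) \approx \sigma^2 \begin{pmatrix} S^{-1} & -D^{-1}CS^{-1} \\ -D^{-1}CS^{-1} & S^{-1} \end{pmatrix}, \qquad S = D - CD^{-1}C.$$
Our primary interest is in $\mathrm{Var}(\hat\beta_3)$, for the coefficient of $xz$. This is governed by the lower right element of $(D - CD^{-1}C)^{-1}$. Now
$$D - CD^{-1}C = \begin{pmatrix} 1 - E(z)^2 - \dfrac{E(xz)^2}{E(x^2)} & -E(xz)\Bigl(E(z) + \dfrac{E(x^2z)}{E(x^2)}\Bigr) \\ -E(xz)\Bigl(E(z) + \dfrac{E(x^2z)}{E(x^2)}\Bigr) & E(x^2) - E(xz)^2 - \dfrac{E(x^2z)^2}{E(x^2)} \end{pmatrix}.$$
The asymptotic value of $n\,\mathrm{Var}(\hat\beta_3)$ depends on certain integrals. For the case of primary interest to us with $x \sim U[-1, 1]$, and $p(x) = 1/2$ in the experimental region $(a, b)$, these are
$$E(z) = -\frac{a + b}{2}, \qquad E(xz) = \frac{2 - a^2 - b^2}{4}, \qquad E(x^2z) = -\frac{a^3 + b^3}{6}.$$
Table 2 shows $\mathrm{Var}(\hat\beta_3)$ for various designs. The first two are the full experiment and the RDD discussed previously. Next is an experiment on just the bottom half of $x$. This strategy is inadmissible by our criteria. It has more variance than the RDD and also lower allocation efficiency.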
Next, the table shows Var(β 3 ) for an experiment on just the second 10% of the running variable, from the 80th to the 90th percentiles of the U [−1, 1] distribution. Just below it is an equal sized experiment in the middle. We see that experimenting in the middle is much more informative. Shifting the experimental region to one side reduces the sample size for either the treatment or control level of z. It also affects the correlations among predictors in the two line model.
The variance for experimenting on the second decile looks large compared to the central experiments. That design can be viewed as a central experiment on the middle third of the data lying between the 70th and 100th percentiles of $x$. Experimenting on the middle third of $[-1, 1]$ involves taking $a = -1/3$ and $b = 1/3$ which yields $N\,\mathrm{Var}(\hat\beta_3) \approx 7.36$. However if we had only experimented over the range 0.4 to 1.0 (with cut points at 0.6 and 0.8) then $N$ would be only 0.3 times as large as it is in the second decile experiment. Furthermore, reducing the range of $x$ by a factor of 0.3 multiplies $\beta_3$ by $1/0.3$ and $\mathrm{Var}(\hat\beta_3)$ by $1/0.3^2$. To adjust for these factors we divide $7.36/N$ by $0.3^3$ and get $272.72/N$. As a result doing the experiment on the second 10% really is better than just doing a central 1/3 experiment on the top 30%.
One tiny experiment involves just randomizing for one percent of the data centered on the median of $x$. We get a variance of $11.99/N$ for this compared to $12/N$ for the RDD, so the tiny experiment is almost identical to the RDD. We can move the location of the tiny experiment. Table 2 shows the results for a tiny experiment near the 80th and 90th percentiles of $x$. These are quite similar to skewed RDDs where the cutpoint is off center.
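The variances quoted in this section can be recomputed from three moments of the design. The sketch below assumes $x \sim U[-1, 1]$, treatment with probability $1/2$ on $(a, b)$, control below $a$ and treatment above $b$, which gives $E(z) = -(a + b)/2$, $E(xz) = (2 - a^2 - b^2)/4$ and $E(x^2z) = -(a^3 + b^3)/6$. It then takes the lower right element of $(D - CD^{-1}C)^{-1}$ with $D = \mathrm{diag}(1, 1/3)$. The function name `scaled_var_beta3` is hypothetical.

```python
def scaled_var_beta3(a, b):
    """n * Var(beta3_hat) for x ~ U[-1, 1], sigma = 1, randomizing on (a, b)."""
    if not -1 <= a <= b <= 1:
        raise ValueError("requires -1 <= a <= b <= 1")
    m2 = 1 / 3                        # E(x^2)
    c0 = -(a + b) / 2                 # E(z)
    c1 = (2 - a**2 - b**2) / 4        # E(x z)
    c2 = -(a**3 + b**3) / 6           # E(x^2 z)
    # Entries of S = D - C D^{-1} C with D = diag(1, m2), C = [[c0, c1], [c1, c2]].
    s11 = 1 - c0**2 - c1**2 / m2
    s12 = -c1 * (c0 + c2 / m2)
    s22 = m2 - c1**2 - c2**2 / m2
    # Lower right element of the inverse of a symmetric 2x2 matrix.
    return s11 / (s11 * s22 - s12**2)

if __name__ == "__main__":
    designs = {
        "full experiment": (-1.0, 1.0),
        "RDD": (0.0, 0.0),
        "bottom half": (-1.0, 0.0),
        "second decile": (0.6, 0.8),
        "middle third": (-1 / 3, 1 / 3),
        "tiny central": (-0.01, 0.01),
    }
    for name, (a, b) in designs.items():
        print(f"{name:16s} {scaled_var_beta3(a, b):8.2f}")
```

The design labels in the example follow the cases discussed in the text; the values in the paper's Table 2 are not reproduced here, so any comparison with that table should be made against the original.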

Discussion
In an incentive plan, a regression discontinuity design rewards the a priori best customers but it has severe disadvantages if one wants to follow up with regression models to measure impact. There is a tradeoff between estimation efficiency and allocation efficiency. Proposition 2 provides a principled way to translate estimates or educated guesses about the present value of the incentives and future value of information into a choice of ∆ in a hybrid experiment.
In industrial settings, the incentive under study will change over time. Experience with similar though perhaps not identical prior incentive plans then gives some guidance for making the tradeoff.
We have examined a simple linear model because it is easiest to work with and is a reasonable starting point in many contexts. Analysts have many more models at their disposal when the data come in. Section 5 on the quadratic model provides a warning: the RDD becomes very unreliable already with this model which is only slightly more complicated than the two-line model.
In some applications, the allocation variable may be the output of a scoring model based on many customer variables. We expect that incorporating randomness into the design will give better data for refitting such an underlying scoring model, but following up that point is outside the scope of this article. The effects are likely to vary considerably from problem to problem.