DELTA-METHOD INFERENCE FOR A CLASS OF SET-IDENTIFIED SVARS

This paper studies Structural Vector Autoregressions that impose equality and/or inequality restrictions to set-identify a single shock (e.g., a monetary shock). We make three contributions to the literature. (i) We present an algorithm to compute—for each horizon, each variable, a fixed vector of reduced-form parameters, and a given collection of equality and/or inequality restrictions—the largest and smallest values of the coefficients of the structural impulse-response function. (ii) We provide conditions under which the largest and smallest values of the structural parameters are directionally differentiable functions of the reduced-form parameters. (iii) We propose a computationally convenient delta-method confidence interval for the set-identified coefficients of the structural impulse-response function. We present sufficient conditions that guarantee the pointwise consistency in level of the suggested inference approach. To illustrate our results, we use a monetary Structural Vector Autoregression estimated with monthly U.S. data. We set-identify an unconventional monetary policy shock that decreases the two-year government bond rate upon impact, but has no effect on the nominal federal funds rate. We impose two additional sign restrictions on the contemporaneous responses of inflation and output. We use our confidence bands to assess the effects of the announcement of the second part of the so-called Quantitative Easing program in August 2010. (JEL Classification: C1, C32, E47).


INTRODUCTION
An increasingly popular practice in empirical macroeconomics is to set-identify the parameters of a Structural Vector Autoregression [SVAR]. This approach was pioneered by Faust (1998), Canova and De Nicoló (2002), and Uhlig (2005). Most of the follow-up studies have relied on Bayesian methods to construct posterior credible sets for the structural coefficients of the impulse-response function.
There has been recent interest in studying non-Bayesian approaches to summarize uncertainty in set-identified SVARs. Moon, Schorfheide, and Granziera (2013) [MSG13] propose Projection/Bonferroni frequentist inference based on a moment-inequality minimum-distance framework. Giacomini and Kitagawa (2014) [GK14] propose robust-Bayesian inference using multiple priors for rotation matrices. Gafarov, Meier, and Montiel Olea (2016) [GMM16] propose frequentist inference based on the projection of a Wald ellipsoid for the SVAR reduced-form parameters. None of these approaches requires the specification of prior beliefs by the researcher.
This paper contributes to the non-Bayesian analysis of set-identified SVARs by proposing a novel delta-method confidence interval for the coefficients of the impulse-response function [IRF]. Broadly speaking, our approach is based on a closed-form characterization of the endpoints of the identified set (given a vector of reduced-form parameters and a collection of binding inequality constraints). Our delta-method confidence interval takes the form of a plug-in estimator for the identified set plus/minus standard errors. In terms of theoretical results, we establish the pointwise consistency in level of our confidence interval. In terms of practical considerations, we argue that the computational cost of our procedure compares very favorably with other non-Bayesian procedures and also with the standard Bayesian algorithm described in Uhlig (2005).
The main limitation of our approach is that the delta-method confidence interval is only defined for SVAR models that impose equality and inequality restrictions on a single structural shock (e.g., a monetary shock). Admittedly, this is problematic, as some popular applications of set-identified SVARs feature restrictions on multiple structural innovations. 1 In spite of this observation, single-shock set-identified models have been applied in several empirical studies: the effects of monetary policy on output [Uhlig (2005)], the impact of monetary policy on the housing market [Vargas-Silva (2008)], the effects of labor market shocks on worker flows [Fujita (2011)], the effects of exchange rates on aggregate prices [An and Wang (2012)], and the effect of optimism shocks on business cycle fluctuations [Beaudry, Nam, and Wang (2014)]. Thus, we think there is room for our results to have an impact on empirical work.
Empirical Application-Unconventional Monetary Policy Shocks: To illustrate the usefulness of our main results, we estimate a monetary Structural Vector Autoregression using monthly U.S. data from July 1979 to December 2007 (a sample that deliberately ends one semester before the financial crisis begins). The goal of our exercise is to use pre-crisis data to learn about the responses of macroeconomic variables to shocks that have effects similar to the 'unconventional' monetary policy interventions implemented after the crisis.
In 'conventional' descriptions of monetary policy, the short-term nominal interest rate is assumed to be the central bank's policy instrument. Following any adjustment by the monetary authority, the market participants-households and firms, both domestic and foreign-use available information to form expectations about the future level of longer-term real interest rates relevant for their consumption and investment decisions.

1 SVAR applications for the oil market set-identify both demand and supply shocks using sign restrictions and elasticity bounds [Kilian and Murphy (2012)]. The same is true for recent labor market applications [Baumeister and Hamilton (2015)]. Mountford and Uhlig (2009)-one of the most cited applications of set-identified SVARs-use sign restrictions to identify a government revenue shock as well as a government spending shock, while controlling for a generic business cycle shock and a monetary policy shock.
The recent Great Recession has forced the Federal Reserve to consider alternative mechanisms to affect market beliefs about the future of real interest rates. Two examples of such unconventional policies are the Federal Open Market Committee's forward guidance announcements and the Federal Reserve's large-scale asset purchases program. Broadly speaking, through forward guidance "the Federal Open Market Committee provides an indication to households, businesses, and investors about the stance of monetary policy expected to prevail in the future". 2 In a similar fashion, the asset purchase program of the Federal Reserve intends to "put downward pressure on yields of a wide range of longer-term securities, support mortgage markets, and promote a stronger economic recovery". 3 With this motivation in mind, we set-identify an unconventional monetary policy [UMP] shock as an innovation that decreases the two-year government bond rate upon impact, but has no effect on the nominal federal funds rate. 4 We consider two additional sign restrictions on the contemporaneous responses of inflation and output. Namely, we assume that-upon impact-neither inflation nor output can respond negatively to a UMP shock.

Notation: The Kronecker product between matrices A and B is denoted by A ⊗ B. The vector e_i^m ∈ R^m denotes the i-th column of the identity matrix of dimension m. If B is a matrix of dimension n × n, B_i ≡ B e_i^n denotes its i-th column. If the dimension of e_i^n is obvious, we omit the superscript n.

MODEL AND OVERVIEW OF THE MAIN THEORETICAL RESULTS
This section introduces notation, the class of SVAR models we consider, and presents a brief overview of the main methodological results in the paper. It is our hope that this brief summary (which contains references to the main propositions and lemmas in the paper) contributes to the understanding of the theoretical basis behind our delta-method confidence interval.
Notation: This paper studies the n-dimensional Structural Vector Autoregression (SVAR) with p lags; i.i.d. structural innovations distributed according to F ; and unknown n × n structural matrix B: see Lütkepohl (2007), p. 362.
The object of interest is the k-th period ahead structural impulse response of variable i to a particular shock j (e.g., a monetary shock). In the SVAR model this parameter is given by the (k, i, j)-coefficient of the structural impulse-response function:

λ_k,i,j ≡ e_i′ C_k(A) B_j,   (2.2)

where B_j ≡ Be_j and e_i and e_j denote the i-th and j-th column of the identity matrix I_n. 5 An auxiliary object in the estimation of the structural parameters is the vector of reduced-form parameters in the SVAR model:

µ ≡ ( vec(A)′, vech(Σ)′ )′,   (2.3)

where A denotes the autoregressive coefficients in the VAR model and Σ denotes the covariance matrix of the residuals. These parameters can be estimated directly from the data by Ordinary Least-Squares. The (reduced-form) parameter space is M.
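To fix ideas, the reduced-form parameters (A, Σ) in (2.3) can be estimated by Ordinary Least-Squares. The following sketch is our own illustration on simulated data (the helper name `var_ols` is hypothetical, not code from the paper); it fits a VAR(p) with an intercept and returns the stacked lag coefficients together with the residual covariance matrix:

```python
import numpy as np

def var_ols(Y, p):
    """OLS estimates of the reduced-form VAR(p) parameters (A, Sigma).

    Y is a T x n data matrix.  A stacks the lag-coefficient matrices
    [A_1, ..., A_p] into an n x (n*p) block; Sigma is the covariance
    matrix of the OLS residuals.  An intercept is included in the
    regression but dropped from the output.
    """
    T, n = Y.shape
    X = np.hstack([Y[p - l:T - l] for l in range(1, p + 1)])  # lags 1..p
    X = np.hstack([X, np.ones((T - p, 1))])                   # intercept
    coef = np.linalg.lstsq(X, Y[p:], rcond=None)[0]           # (n*p+1) x n
    U = Y[p:] - X @ coef                                      # OLS residuals
    return coef[:-1].T, U.T @ U / (T - p)

# Simulated VAR(1) example: Y_t = 0.5 * Y_{t-1} + u_t, u_t ~ N(0, I).
rng = np.random.default_rng(0)
T, n = 600, 2
Y = np.zeros((T, n))
for t in range(1, T):
    Y[t] = 0.5 * Y[t - 1] + rng.standard_normal(n)
A_hat, Sigma_hat = var_ols(Y, p=1)
```

With 600 simulated observations the diagonal entries of the estimated A_1 are close to the true value 0.5.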
Set-Identifying Restrictions: Let R(µ) ⊆ R^n be a set of inequality and equality restrictions imposed on B_j. 6 A common practice in empirical macroeconomics is to use the restrictions in R(µ) to set-identify the structural parameter in (2.2) as a function of the reduced-form parameters in (2.3). In our paper, the set R(µ) takes the form:

R(µ) ≡ { B_j ∈ R^n : Z(µ)′B_j = 0_{m_z×1},  S(µ)′B_j ≥ 0_{m_s×1} },   (2.4)

where Z(µ) is a matrix of dimension n × m_z and S(µ) is a matrix of dimension n × m_s. The matrix Z(µ) collects the equality restrictions specified by the researcher (we assume there are m_z of them). The matrix S(µ) collects the inequality restrictions (we assume there are m_s of them). We assume that both Z(µ) and S(µ) are differentiable functions of µ.

5 The transformation C_k(A) that appears in equation (2.2) is defined recursively by C_k(A) = Σ_{m=1}^{k} C_{k−m}(A) A_m for k ≥ 1, with C_0 ≡ I_n; see Lütkepohl (1990), p. 116.
Scope: The simple formulation in (2.4) allows the researcher to incorporate any equality or inequality restriction that is linear in B_j (with coefficients that may depend differentiably on µ). Thus, our analysis allows for the following identifying restrictions:

a) Restrictions on the response of variable i at horizon k to an impulse in the j-th shock, as in Uhlig (2005).

b) Long-run restrictions on the response of variable i to an impulse in the j-th shock, as in Blanchard and Quah (1989).

c) Restrictions on the j-th column of (H′)^{-1}, as in Rubio-Ramirez, Caldara, and Arias (2015).

d) Elasticity bounds, as in Kilian and Murphy (2012); for example, restrictions that bound the ratio of two impact responses by some constant b ∈ R.

Endpoints of the identified Set:
The main results in this paper concern the endpoints of the identified set for the structural parameters, given µ. The endpoints of the identified set (which we sometimes refer to as the maximum and minimum response) are defined as follows:

Definition (Endpoints of the identified set): Given a vector of reduced-form parameters µ, we define the endpoints of the identified set for λ_k,i,j as the functions:

v̄_k,i,j(µ) ≡ max_{B_j ∈ R^n} e_i′ C_k(A) B_j  s.t.  B_j = Be_j for some B with BB′ = Σ, and B_j ∈ R(µ),   (2.5)

v_k,i,j(µ) ≡ min_{B_j ∈ R^n} e_i′ C_k(A) B_j  s.t.  B_j = Be_j for some B with BB′ = Σ, and B_j ∈ R(µ).   (2.6)

The function v̄_k,i,j(µ) corresponds to the largest value of the structural parameter λ_k,i,j subject to the restriction that B_j ∈ R(µ) and also that B_j is the j-th column of a square root of Σ. The lower bound is defined analogously.
Overview of the main results: Our delta-method confidence interval is supported by the three theoretical results described in the abstract. Our results can be summarized as follows:

• Summary of Lemma 1 (Characterization of the maximum and minimum response given a fixed set of active constraints): We show that v̄_k,i,j(µ) and v_k,i,j(µ) are the value functions of a mathematical program whose Karush-Kuhn-Tucker points can be described analytically-up to a set of 'active' inequality constraints. 7 More concretely, we show that the maximum response for λ_k,i,j is equal to either plus or minus the function:

v_k,i,j(µ; r) ≡ ( e_i′ C_k(A) Σ^{1/2} M_{Σ^{1/2} r} Σ^{1/2} C_k(A)′ e_i )^{1/2},

where Σ^{1/2} is the symmetric square root of Σ, M_X denotes the projection onto the orthogonal complement of the column space of X, and r is a matrix collecting the gradient vectors of the constraints in R(µ) that are active at a maximum. The minimum response is obtained analogously.
• Summary of Proposition 1 (Algorithm to evaluate the maximum and minimum response): We use the closed-form expressions of Lemma 1 to present an algorithm that allows a researcher to evaluate the endpoints of the identified set given a vector of reduced-form parameters. The algorithm evaluates different collections of active constraints (different matrices r) and selects the constraints that generate the largest (or smallest) value function-after checking that the inequality constraints not included in r are satisfied.
• Summary of Lemma 2 (Differentiability of the maximum and minimum response for a fixed set of active constraints): We establish the differentiability of the function v_k,i,j(µ; r), which depends on a fixed set of active constraints. We allow the constraints in the n × l matrix r (with l ≤ n − 1) to depend on the reduced-form parameters. We show that the derivative of v_k,i,j with respect to µ admits a closed-form expression in which r_k, the k-th column of r, enters weighted by w*_k, the k-th component of the multiplier vector w*. We argue that λ* and w* can be interpreted as the Lagrange multipliers associated with the constraint BB′ = Σ and with the active constraints in r.
• Summary of Proposition 2 (Directional differentiability of the endpoints): We use the formula in Lemma 2 to show that the functions v̄_k,i,j(·) and v_k,i,j(·) are directionally differentiable, in a sense we make precise. We relate the expression of the directional derivative to the generalized versions of the envelope theorems in the work of Fiacco and Ishizuka (1990) and Bonnans and Shapiro (2000). We argue that directional differentiability of the value functions (as opposed to full differentiability) arises due to the possibility that different structural models lead to the maximum (or minimum) response. In particular, let R*(µ) denote the set of active restrictions that yield the same maximum response and assume, for simplicity, that v̄_k,i,j(µ) > 0. We show that the directional derivative in direction h is given by the largest value of v̇_k,i,j(µ; r)′h over r ∈ R*(µ).

• Summary of Proposition 3 (Delta-method confidence interval): We establish the pointwise consistency in level of a delta-method confidence interval, which takes the form:

[ v_k,i,j(µ̂_T) − z_{1−α/2} σ̂_(k,i,j),T / √T ,  v̄_k,i,j(µ̂_T) + z_{1−α/2} σ̂_(k,i,j),T / √T ],

where µ̂_T is the typical OLS estimator for the VAR reduced-form parameters, z_{1−α/2} is the (1 − α/2) quantile of a standard normal, and σ̂_(k,i,j),T is our formula for the standard errors based on the directional derivatives.
Outline: In the remaining part of the paper, we formalize these propositions and apply them to conduct inference about the responses to an unconventional monetary shock.

RUNNING EXAMPLE: UNCONVENTIONAL MONETARY SHOCKS
This section introduces our empirical application, which will be used as a running example to illustrate our assumptions and results.
Monetary SVAR: We consider a simple 4-variable model that includes the Consumer Price Index (CPI_t), the Industrial Production Index (IP_t), the 2-year Treasury Bond rate (2yTB_t), and the Federal Funds rate (FF_t). 8 We take a logarithmic transformation of CPI_t and IP_t and then work with first differences for all variables. Thus, our vector of macro variables is:

Y_t ≡ ( Δ ln CPI_t , Δ ln IP_t , Δ 2yTB_t , Δ FF_t )′.

We set the number of lags equal to 11 using the Bayesian Information Criterion (p = 11).
The time span of the monthly series is July 1979 to December 2007 (T = 342). To keep our exposition as simple as possible, we ignore potential co-integration issues between short-term and long-term interest rates. Without loss of generality, we assume that the column of B corresponding to a UMP shock is the first column; B_1 ≡ Be_1. Our equality/inequality restrictions are summarized in Table I.

The suggested set-identification strategy in this paper is not new. Baumeister and Benati (2013) study an analogous 'spread' monetary shock that leaves the short-term nominal rate unchanged, but affects the spread between the ten-year Treasury-bond yield and the policy rate. They consider a Bayesian SVAR with time-varying parameters and stochastic volatility combined with demand and supply structural shocks that satisfy zero/sign restrictions as in Rubio-Ramirez, Waggoner, and Zha (2010). Their main result is that the long-term yield spread exerts a powerful effect on both output growth and inflation. All their inference is Bayesian, while ours is frequentist. In addition, our SVAR model does not consider time-varying parameters, stochastic volatility, or restrictions on other nonmonetary shocks.

THE ENDPOINTS OF THE IDENTIFIED SET
In this section we formalize Lemma 1 and Proposition 1. We consider the problem of finding the maximum response to an impulse in the j-th structural shock subject to m z equality ('zero') restrictions and m s inequality ('sign') restrictions. The focus on the maximum and the minimum is an intermediate step to conduct frequentist inference about the coefficients of the impulse-response function.
This section makes two assumptions on the sign and zero restrictions allowed in the model.
First, we require the number of zero restrictions to be less than n − 1. Second, we assume that every collection of n − 1 − m z inequality restrictions and the m z equality restrictions are linearly independent everywhere in the parameter space.
Just as before, the set R(µ) is given by (2.4).

Running Example-R(µ): In the UMP example, the set of restrictions R(µ) corresponds to (see Table I) the single zero restriction on the impact response of the federal funds rate and the three sign restrictions on the impact responses of the 2-year Treasury Bond rate, inflation, and output; that is, Z(µ) = e_4 and S(µ) = [ e_1 , e_2 , −e_3 ]. We note that the equality and inequality restrictions in our example do not depend on the reduced-form parameters (neither Z nor S depend on µ).
Main Assumptions: The first assumption in this section requires the number of zero restrictions to be no larger than n − 1. The rationale behind Assumption 1 is as follows: if m_z > n − 1, then R(µ) = {0_{n×1}} for every µ at which there are n linearly independent equality restrictions. This is problematic, as the latter implies that there is no B_j ∈ R(µ) such that B_j = Be_j for some B with BB′ = Σ (provided B is invertible). 9

Assumption 1: m_z ≤ n − 1.
9 If B is invertible, then Σ is invertible and B′Σ^{-1}B = I_n, which implies B_j′Σ^{-1}B_j = 1. Therefore, if B_j = Be_j for some square root B of Σ, then B_j must be different from 0_{n×1}.

Our second assumption imposes a 'linear independence' condition on the equality and inequality restrictions of the model (given a particular value of the reduced-form parameter µ). Let e_1^{m_s}, e_2^{m_s}, . . . , e_{m_s}^{m_s} denote the m_s different columns of the identity matrix I_{m_s}. Let e(k) denote an m_s × k matrix formed by collecting any k ≤ n − 1 − m_z columns of I_{m_s}. Note that for any matrix S, the matrix Se(k) selects k columns of S.
Definition: We say that Z(µ) and S(µ) are linearly independent at µ if for any k ∈ N with k ≤ n − 1 − m_z and any selection e(k), the matrix

R(µ; e(k)) ≡ [ Z(µ) , S(µ)e(k) ]

has full column rank (rank m_z + k).
We use this definition to state the following assumption:

Assumption 2: The parameter space M is such that Z(µ) and S(µ) are linearly independent at every µ ∈ M.

This 'linear independence' property plays an important role in the characterization of the maximum and minimum response in terms of Karush-Kuhn-Tucker conditions.
Running Example-Assumptions 1 and 2: In our application m_z = 1, so Assumption 1 holds and we only need to verify Assumption 2. As we mentioned before, the matrix e(k) is a selector matrix. For example, let e(2) be given by the first and third columns of I_3. Then S(µ)e(2) collects the first and third columns of S(µ), and the matrix R(µ; e(2)) = [Z(µ), S(µ)e(2)] is formed by collecting the gradient of the unique equality restriction and the gradients of the first and third inequality restrictions in S. Note that regardless of the number k of columns selected from S-and regardless of what M is-the resulting matrix R(µ; e(k)) will always have full column rank.
Verifying Assumption 2 with more general restrictions requires additional work. For example, suppose that the researcher is interested in including the restriction e_2′C_1(A)B_1 ≥ 0. This restriction says that the UMP shock cannot decrease the growth rate of Industrial Production even one period after the shock. Since C_1(A) = A_1, the vector e_2′C_1(A) is equal to the second row of A_1, which we can denote as (A_1,(2,1), A_1,(2,2), A_1,(2,3), A_1,(2,4)). The matrix S(µ) now contains the additional column C_1(A)′e_2. Hence, we conclude that Assumption 2 will be satisfied as long as M is such that A_1,(2,j) ≠ 0 for all j = 1, . . . , 4, which means that each of the entries in the first lag of Y_{t−1} has predictive power for Y_t after controlling for the rest of the lags.
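Assumption 2 can also be verified numerically at any given µ by brute-force enumeration of the selection matrices e(k). The sketch below is our own illustration (the helper name `assumption2_holds` is hypothetical, and the ordering of the columns of Z and S mirrors the zero/sign pattern of the UMP example as described in the text):

```python
import numpy as np
from itertools import combinations

def assumption2_holds(Z, S, tol=1e-10):
    """Check that [Z, S e(k)] has full column rank m_z + k for every
    k <= n - 1 - m_z and every selection e(k) of k columns of S."""
    n, m_z = Z.shape
    m_s = S.shape[1]
    for k in range(n - m_z):                        # k = 0, ..., n - 1 - m_z
        for cols in combinations(range(m_s), k):
            R = np.hstack([Z, S[:, list(cols)]])
            if np.linalg.matrix_rank(R, tol=tol) < m_z + k:
                return False
    return True

# UMP example (n = 4): one zero restriction on the federal funds rate and
# three sign restrictions (column ordering of S chosen for illustration).
e = np.eye(4)
Z = e[:, [3]]                                       # FF rate: zero on impact
S = np.column_stack([e[:, 0], e[:, 1], -e[:, 2]])   # inflation, output, -2yTB
ok = assumption2_holds(Z, S)
```

Replacing a column of S by a copy of the zero-restriction gradient makes the check fail, since the stacked matrix then loses full column rank.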
The third assumption is the following:

Assumption 3: The matrices Z(µ) and S(µ) are differentiable functions of the reduced-form parameter µ.
We are not aware of equality/inequality restrictions in the SVAR literature that do not satisfy this property. In particular, all the examples given on p. 5 of this paper satisfy Assumption 3.

Lemma 1: Closed-form Solution for the maximum response given an active set of constraints
In this section we show that, given a collection r ∈ R^{n×m} of 'active' constraints (m ≤ n − 1), the maximum response is determined in closed form (and up to sign) by the Karush-Kuhn-Tucker conditions of the programs (2.5) and (2.6).
Lemma 1 Let r be a matrix of dimension n × m, m ≤ (n − 1) collecting the gradients of the 'active' (binding) constraints at a solution of the mathematical program (2.5). Suppose that Assumption 1 holds and suppose that Z(µ) and S(µ) are linearly independent at µ.
Then the maximum response is given by either plus or minus the norm of the residual of the projection of Σ^{1/2}C_k(A)′e_i onto the space spanned by the columns of Σ^{1/2}r; that is, by

v_k,i,j(µ; r) ≡ ( e_i′ C_k(A) Σ^{1/2} M_{Σ^{1/2} r} Σ^{1/2} C_k(A)′ e_i )^{1/2}   (4.1)

or by minus this quantity,

−v_k,i,j(µ; r).   (4.2)

The corresponding candidate solutions x*(µ; r) are given by Σ^{1/2} M_{Σ^{1/2} r} Σ^{1/2} C_k(A)′ e_i divided by either (4.1) or (4.2). Consequently, the sign of v̄_k,i,j(µ) depends on which of the two solutions x*(µ; r) (the one with (4.1) in the denominator or the one with (4.2)) satisfies the sign restrictions that are not in r.
Proof: See Appendix A.1 for the proof, which uses the necessary Karush-Kuhn-Tucker conditions of the optimization problem to characterize the maximizers given a set r of active constraints. 10 Figure 1 presents a graphical representation of the mathematical program of interest. Figure 2 presents an intuitive description of the solution.
10 To guarantee the existence of Karush-Kuhn-Tucker multipliers we use the fact that Z(µ) and S(µ) are linearly independent at µ. Our assumption implies that the mathematical program defining the endpoints of the identified set satisfies a linear independence constraint qualification (see Fiacco and Ishizuka (1990), p. 224).

Figure 1 provides a graphical representation of the mathematical program (2.5), where BB′ = Σ has been replaced by the 'ellipsoid' constraint x′Σ^{-1}x = 1, x ≡ B_j ∈ R^3 (this equivalence will not hold, in general, if there are restrictions on multiple shocks). The objective function corresponds to the hyperplane with normal vector C_k(A)′e_i ∈ R^3. In this example, there is only one equality restriction, with normal vector given by the (blue, solid) line. This restriction requires the contemporaneous impact of the j-th shock on the third variable to be zero. Note that without the equality restriction the maximizer and minimizer would be given by the points at which the hyperplane is tangent to the ellipsoid.
One way to think about the solution to the problem of interest is explained in Figure 2. Suppose there are only equality constraints, collected in Z. Any feasible x must satisfy Z′x = 0; that is, the selected value of x must lie in the orthogonal complement of the columns of Z. The quadratic equality constraint further restricts the choice variable: after the rotation x̃ ≡ Σ^{-1/2}x, it requires x̃′x̃ = 1.

Figure 2 depicts the case in which x ∈ R^3 and there is only one zero restriction. The solution to the program must lie in the orthogonal complement of Z (blue, thin, solid). In this picture the orthogonal complement corresponds to the space spanned by the blue, thick, solid lines. This implies that the rotated solution, denoted x̃ ≡ Σ^{-1/2}x, must be of the form M_{Σ^{1/2}Z} y for some y ∈ R^3. Hence, the only relevant part of x′Σ^{-1}x = 1 becomes the projected version of it: y′M_{Σ^{1/2}Z} y = 1, represented by the black, solid ellipsoid. One can find the value of the problem by projecting the gradient of the objective function on the orthogonal complement of Σ^{1/2}Z (arrow) and selecting a direction in the ellipsoid proportional to it. The value function v̄_k,i,j(A, Σ) will be given by the norm of the arrow.
An application of the Cauchy-Schwarz inequality shows that the positive value in (4.1) gives the maximum response in (2.5). 11

11 The argument uses the fact that M_{Σ^{1/2}Z} is idempotent.
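The closed-form value in (4.1) and the two candidate solutions are easy to evaluate numerically. The following sketch is our own illustration (hypothetical helper names; it uses the symmetric square root of Σ) of the projection-residual characterization in Lemma 1:

```python
import numpy as np

def sym_sqrt(Sigma):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(Sigma)
    return V @ np.diag(np.sqrt(w)) @ V.T

def lemma1_value(c, Sigma, r):
    """Value and KKT candidates for: max c'x s.t. x' Sigma^{-1} x = 1, r'x = 0.

    c is C_k(A)' e_i; r (n x m, m <= n - 1, m >= 1) stacks the gradients of
    the active restrictions.  Returns (v, x_plus, x_minus), where v >= 0 is
    the norm of the residual of projecting Sigma^{1/2} c onto the column
    space of Sigma^{1/2} r.
    """
    S = sym_sqrt(Sigma)
    u, B = S @ c, S @ r
    resid = u - B @ np.linalg.lstsq(B, u, rcond=None)[0]  # annihilate col(B)
    v = float(np.linalg.norm(resid))
    if v < 1e-12:                      # degenerate case, handled separately
        return 0.0, None, None
    x = S @ resid / v                  # candidate solution x*(mu; r)
    return v, x, -x

# Example: Sigma = diag(4, 1, 1), c = e_1, one active restriction r = e_3.
Sigma = np.diag([4.0, 1.0, 1.0])
c = np.array([1.0, 0.0, 0.0])
r = np.array([[0.0], [0.0], [1.0]])
v, x_plus, x_minus = lemma1_value(c, Sigma, r)
```

In the example, x_plus satisfies the ellipsoid constraint x′Σ^{-1}x = 1, is orthogonal to the active restriction, and attains the value v.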

Proposition 1: Algorithm to evaluate the maximum and minimum response
We have provided a closed-form expression (up to a sign) for the maximum response v k,i,j (µ), given a collection r of active restrictions. We now answer the following question: how does one compute the maximum response v k,i,j (µ) for a given value of µ?
We use the result in Lemma 1 to state the solution of the mathematical program (2.5) that includes both equality and inequality restrictions. The main result in this section is that such a problem can be solved by 'activating' different combinations of inequality constraints. In other words, the problem in (2.5) can be solved by finding the largest value among the Karush-Kuhn-Tucker points that satisfy a feasibility constraint.
Additional Notation Illustrated with our Example-1) Collection of Active Constraints: Fix (A, Σ) and, in a slight abuse of notation, let Z and S denote Z(µ) and S(µ). Define first Z ∈ R^{n×m_z} as the matrix that collects all of the m_z zero restrictions; in our empirical application, Z = e_4. Define also R_1 as the collection of all matrices that activate one of the m_s inequality restrictions; that is, R_1 corresponds to the collection of matrices that impose one of the inequality restrictions as an equality restriction. In our example, R_1 collects the matrices formed by appending one column of S to Z.

(Footnote 11, continued.) Therefore, the rotated program maximizes the objective subject to y′M_{Σ^{1/2}Z}y = 1. By the Cauchy-Schwarz inequality this program is bounded above by (e_i′C_k Σ^{1/2} M_{Σ^{1/2}Z} Σ^{1/2} C_k′e_i)^{1/2}. This value can be achieved by x*(A, Σ; Z) in Lemma 1.
More generally, for l ≤ n − m_z − 1, consider the collection R_l of matrices that activate l of the m_s sign restrictions in the SVAR model. Note that the collection R_l has m_s!/(l!(m_s − l)!) elements and that R_{m_s} has a unique element, in which all the sign restrictions of the model are active (provided m_s ≤ n − m_z − 1). In our example, n − m_z − 1 = 2. There are 3 different subsets of two elements of {1, 2, 3}: {1, 2}, {1, 3}, and {2, 3}. Thus, R_1 and R_2 denote the collections of active constraints formed by choosing one and two of the columns of S, respectively.
Additional Notation Illustrated with our Example-2) Feasibility: We define the feasibility of a vector x ∈ R^n (with respect to the sign restrictions) as the indicator

1_{m_s}(x) ≡ 1{ S′x ≥ 0_{m_s×1} },

where, following convention, ≥ is taken component-wise whenever the binary relation is applied to vectors. Hence, x ∈ R^n is a feasible point for the mathematical program (2.5) if and only if 1_{m_s}(x) = 1 and x satisfies the equality restrictions in Z. 12 In the context of our example, a point x is feasible if it satisfies the zero restriction on the federal funds rate as well as the three sign restrictions; that is, if it satisfies both the equality and the inequality restrictions.
Proposition 1 (Algorithm to evaluate the maximum and minimum response): Let R denote the collection of all possible combinations of up to n − 1 active constraints (each matrix r ∈ R collecting the m_z equality restrictions and some subset of the inequality restrictions), and for r ∈ R define v_k,i,j(µ; r) as the function in Lemma 1. Let c be a positive and large constant. Consider the candidate value functions, defined as follows:

If v_k,i,j(µ; r) ≠ 0, set f+_max(µ; r) and f−_max(µ; r) equal to +v_k,i,j(µ; r) and −v_k,i,j(µ; r), respectively, whenever the corresponding solution x*±(µ; r) is feasible, and equal to −c otherwise.

If v_k,i,j(µ; r) = 0 and there is a point x* ≠ 0 satisfying the equality restrictions in r and also the inequality restrictions that are not included in r, set f±_max(µ; r) = 0.

If v_k,i,j(µ; r) = 0 and there is no point x* ≠ 0 satisfying the equality restrictions in r and the inequality restrictions that are not in r, set f±_max(µ; r) = −c.

Then v̄_k,i,j(µ) = max_{r∈R} max{ f+_max(µ; r), f−_max(µ; r) }. That is, the value function v̄_k,i,j(µ) is obtained by computing the Karush-Kuhn-Tucker points in Lemma 1 for each r, penalizing the value v_k,i,j(µ; r) if infeasible, and maximizing over all the possible values of r. The minimum value is obtained analogously.
Proof: The intuition behind the proof is as follows. Note that any combination of active sign restrictions r for which x*+(A, Σ; r) (or x*−(A, Σ; r)) is well-defined and feasible must yield a value that is, by definition, no larger than v̄_k,i,j(A, Σ). Thus, we only have to show that max_{r∈R} max{ f+_max(A, Σ; r), f−_max(A, Σ; r) } ≥ v̄_k,i,j(A, Σ). Since Lemma 1 showed that the value of the program (2.5) must be of the form f+_max(A, Σ; r) or f−_max(A, Σ; r) for some r ∈ R, the result follows. The proof is formalized in Appendix A.2.
Algorithm to evaluate the endpoints of the identified set: The proposition above shows that in order to solve the mathematical problem in (2.5) it is sufficient to apply the following algorithm:

1. Activate different combinations of the m_s sign restrictions. Collect the original m_z equality restrictions and the inequality restrictions that were activated in the matrix r. The matrix should have no more than n − 1 columns. Note that the total number of matrices r will be given by:

Σ_{l=0}^{n−1−m_z} m_s! / ( l! (m_s − l)! ).

2. Compute the candidate value functions ±v_k,i,j(A, Σ; r) for each of the elements r ∈ R.
3. If v_k,i,j(A, Σ; r) ≠ 0, verify whether x*+(A, Σ; r) satisfies the sign restrictions that were not included in r; that is, verify the feasibility of the solution x*+(A, Σ; r). If the primal feasibility condition is satisfied, keep the candidate value. If the primal feasibility condition is violated, penalize v_k,i,j(A, Σ; r) to guarantee that it is never a solution. Proceed analogously for x*−(A, Σ; r).

4. Maximizing the candidate values over r ∈ R gives the largest value; the analogous minimization gives the smallest value.

Using the algorithm in the UMP example: We use the algorithm to evaluate the identified set in the running example. We fix µ at its estimated OLS value, denoted µ̂_T, and we report v̄_k,i,j(µ̂_T) and v_k,i,j(µ̂_T) for the cumulative IRFs. 13 The scale in Figure 3 corresponds to a one standard deviation structural UMP shock.
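The steps above can be condensed into a short routine. The implementation below is our own illustrative sketch of Proposition 1 in the ellipsoid form of the program (the degenerate v_k,i,j(µ; r) = 0 branch is omitted for brevity, and the helper names are hypothetical):

```python
import numpy as np
from itertools import combinations

def _sym_sqrt(Sigma):
    w, V = np.linalg.eigh(Sigma)
    return V @ np.diag(np.sqrt(w)) @ V.T

def endpoints(c, Sigma, Z, S, tol=1e-8):
    """Endpoints (min, max) of c'x over
       {x : x' Sigma^{-1} x = 1, Z'x = 0, S'x >= 0}.

    Enumerates active sets r = [Z, selected columns of S], computes the two
    KKT candidates of Lemma 1 for each r, keeps the feasible ones, and takes
    the extrema.  Degenerate zero-value cases are not handled in this sketch.
    """
    n, m_z, m_s = Sigma.shape[0], Z.shape[1], S.shape[1]
    Sq = _sym_sqrt(Sigma)
    vals = []
    for l in range(n - m_z):                     # activate l sign restrictions
        for cols in combinations(range(m_s), l):
            r = np.hstack([Z, S[:, list(cols)]])
            u, B = Sq @ c, Sq @ r
            resid = u - B @ np.linalg.lstsq(B, u, rcond=None)[0]
            v = np.linalg.norm(resid)
            if v < 1e-12:
                continue
            for x in (Sq @ resid / v, -Sq @ resid / v):
                if (S.T @ x >= -tol).all():      # primal feasibility of x*
                    vals.append(c @ x)
    return min(vals), max(vals)

# Toy program: extrema of x_1 + x_2 on the unit sphere in R^3,
# with the zero restriction x_3 = 0 and the sign restriction x_2 >= 0.
I3 = np.eye(3)
lo, hi = endpoints(np.array([1.0, 1.0, 0.0]), I3, I3[:, [2]], I3[:, [1]])
```

In the toy program the sign restriction binds only at the lower endpoint, which is why the minimum is −1 rather than −√2.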
We consider first the equality/inequality restrictions in Table I. We note that evaluating the endpoints of the identified set for the 4 variables in the VAR, over 40 horizons, takes around 0.1 seconds. We then include an additional inequality restriction on the response of output to an expansionary UMP shock. Namely, we assume that even one period after the shock, the cumulative effect on IP cannot be negative (e_2′(C_0 + C_1(A))B_1 ≥ 0). A comparison between the two collections of restrictions suggests that, in this example, the noncontemporaneous constraint has almost no additional identification power.
Of course, one could use the Bayesian algorithm in Uhlig (2005) to approximate the value of the endpoints. Given D draws of the reduced-form parameters (A, Σ) and a unit vector q ∈ R^n, one could report the maximum and minimum value of {λ^d_k,i,j(A, Σ, q)}_{d=1}^{D} over the different draws. This algorithm is a random grid search approach to solve the programs (2.5) and (2.6). Figure 6 in the appendix presents a comparison between the different approaches. The grid search takes around 300 seconds to run and underestimates the identified set.

13 The formula for the maximum (minimum) k-th period ahead cumulative IRF replaces C_k(Â_T) by the sum Σ_{m=0}^{k} C_m(Â_T).

Figure 3 (caption): Endpoints of the identified set under the restrictions in Table I; (blue, crosses) endpoints of the identified set with the additional restriction e_2′(C_0 + C_1(A))B_1 ≥ 0.
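A minimal version of such a random grid search, holding the reduced-form parameters fixed, can be sketched as follows (our own illustration; the zero restrictions are imposed by drawing inside the relevant null space, and the helper names are hypothetical):

```python
import numpy as np

def grid_search_endpoints(c, Sigma, Z, S, draws=5000, seed=0):
    """Approximate the identified-set endpoints by random draws of q.

    Draws unit vectors q such that x = chol(Sigma) q satisfies the zero
    restrictions Z'x = 0 by construction, keeps the draws satisfying the
    sign restrictions S'x >= 0, and records the extrema of c'x.  The
    approximation always lies inside the true identified set.
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma)
    # orthonormal basis of the null space of Z' L (assumes m_z >= 1):
    _, s, vt = np.linalg.svd(Z.T @ L)
    rank = int((s > 1e-10).sum())
    N = vt[rank:].T                                 # n x (n - rank)
    vals = []
    for _ in range(draws):
        q = N @ rng.standard_normal(N.shape[1])     # draw inside null space
        q /= np.linalg.norm(q)
        x = L @ q
        if (S.T @ x >= 0).all():                    # sign restrictions
            vals.append(c @ x)
    return min(vals), max(vals)

# Toy program: extrema of x_1 + x_2 with x_3 = 0 and x_2 >= 0 on the unit
# sphere in R^3; the exact endpoints are (-1, sqrt(2)).
I3 = np.eye(3)
lo_hat, hi_hat = grid_search_endpoints(np.array([1.0, 1.0, 0.0]), I3,
                                       I3[:, [2]], I3[:, [1]])
```

Because every accepted draw lies inside the identified set, the reported extrema can only understate the true endpoints, consistent with the comparison discussed above.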

DIRECTIONAL DIFFERENTIABILITY OF THE ENDPOINTS
Once again, let R(µ) denote the set of all possible combinations of up to n − 1 active constraints. 14 Let us denote the typical element in R(µ) as r(µ), which we take to be an n × l matrix with l ≤ n − 1. We will continue working with the auxiliary function v_k,i,j(µ; r(µ)) defined in Lemma 1, where we now explicitly acknowledge the possible dependence of r on µ. In Lemma 1 we have shown that if r(µ) is the active set of constraints at a solution of the program (2.5), then v̄_k,i,j(µ) equals either plus or minus v_k,i,j(µ; r(µ)), as long as v̄_k,i,j(µ) ≠ 0. In order to establish the differentiability of v̄_k,i,j(µ) we prove the following intermediate result.
Lemma 2 (Differentiability results for a given active set of constraints): If r(µ) is differentiable with respect to µ and v_k,i,j(µ; r(µ)) ≠ 0, then v_k,i,j(µ; r(µ)) is differentiable with respect to µ, with a derivative v̇_k,i,j(µ; r(µ)) that admits a closed form in which r_k(µ), the k-th column of r(µ), enters weighted by w*_k, the k-th component of the multiplier vector w*.
14 The dependence of the set R on the parameter µ was omitted in the previous section for notational simplicity.
The envelope theorem sheds light on the derivative formula provided in Lemma 2. Note first that, for a fixed active set r(µ), the program admits an auxiliary Lagrangian of the form

L(x; λ, w) ≡ e_i′C_k(A)x + λ(1 − x′Σ^{-1}x) − w′r(µ)′x,

where λ is the Lagrange multiplier corresponding to the quadratic equality restriction and w ∈ R^l is the vector of Lagrange multipliers corresponding to the l equality restrictions.
The envelope theorem suggests that v̇_{k,i,j}(µ; r(µ)) is given by the formula in Lemma 2. We confirm this intuition in the proof of Lemma 2, provided v_{k,i,j}(µ; r(µ)) ≠ 0.
We now establish the differentiability of v_{k,i,j}(µ). Without loss of generality, assume that the upper endpoint satisfies v̄_{k,i,j}(µ) > 0 and the lower endpoint satisfies v̲_{k,i,j}(µ) < 0. For a fixed vector of reduced-form parameters define the sets: The set R̄*(µ) collects the different combinations of active constraints that could generate the maximum value; it is a singleton if and only if the program (2.5) has a unique solution. The set R̲*(µ) for the minimum is defined analogously.
Proposition 2 (Directional differentiability of the endpoints of the identified set) Suppose, w.l.o.g., that v̄_{k,i,j}(µ) > 0 and v̲_{k,i,j}(µ) < 0. Then, for any sequence h_n ∈ R^d such that h_n → h and any sequence t_n → ∞: Thus, v̄_{k,i,j} and v̲_{k,i,j} are directionally differentiable functions of the reduced-form parameters, with directional derivative: Proof: See Appendix A.4. Remark: Bayesian credible sets based on the quantiles of the posterior distribution of v(µ) will be asymptotically equivalent to the frequentist bootstrap (which is not consistent in this case).
These results imply that typical frequentist and Bayesian inference for directionally differentiable functions is problematic. The next section shows that the special form of the directional derivative in the class of SVAR models studied in this paper allows the researcher to conduct delta-method inference, with a slight adjustment to the standard errors.

DELTA-METHOD INFERENCE
This section proposes a delta-method confidence interval of the form: where µ̂_T is the OLS estimator of µ, defined as: We work under the assumption that √T(µ̂_T − µ) is asymptotically normal with some covariance matrix Ω. A common formula to estimate the asymptotic variance of µ̂_T is: We use the results in Proposition 2 and the asymptotic normality of µ̂_T to suggest the following formula for σ̂_{(k,i,j),T}: where R(µ̂_T) is the set of all possible active constraints evaluated at µ̂_T.
Main Result in this Section: Let P denote the data generating process and let I^R_{k,i,j}(µ(P)) denote the identified set for the structural parameter λ_{k,i,j} given the equality/inequality restrictions in R(µ). This section shows that, under our proposed specification of σ̂_{(k,i,j),T}: Consequently, the delta-method confidence interval presented in this paper is pointwise consistent in level. We now describe the main large-sample assumptions concerning P.
Data Generating Process: The SVAR parameters (A_1, . . . , A_p, B, F) define a probability measure, denoted P, over the data observed by the econometrician. Our main assumption concerning P is as follows: Assumption 4 (Asymptotic Normality of µ̂_T) The data generating process P is such that, for µ(P) ∈ R^d: and Ω̂_T →_p Ω(P).
Thus, our only restriction on (A_1, A_2, . . . , A_p, B, F) is that, whatever these parameters are, the OLS estimator µ̂_T is asymptotically normal with a covariance matrix that can be estimated consistently.
Delta-Method for Directionally Differentiable Functions: Dümbgen (1993), Shapiro (1991), and Fang and Santos (2014) have shown that if v is a directionally differentiable function with directional derivative v̇_µ(h) (in direction h, evaluated at µ), then: whenever Assumption 4 holds. 15 Proposition 2 in the previous section established that the directional derivative of v̄_{k,i,j}-in direction h, evaluated at µ-is given by: where R̄*(µ) collects the active constraints that generate v̄_{k,i,j}(µ). Thus, Proposition 2 and Assumption 4 imply that:
15 The map v : R^r → R is said to be Hadamard directionally differentiable at µ ∈ R^r, tangentially to R^r, if there is a continuous (not necessarily linear) map v̇(·; µ) : R^r → R such that: lim_{t↓0} [v(µ + t h_t) − v(µ)]/t = v̇(h; µ) for every h ∈ R^r and every h_t → h. The function v(·) is fully differentiable at µ if and only if the map v̇(·; µ) is linear. See Fang and Santos (2014) for a recent, elegant exposition of directionally differentiable functions. See also Shapiro (1990).
Inference for Directionally Differentiable Functions: How can we use the delta-method result above to construct a confidence set for λ_{k,i,j}? Our suggestion-which
exploits the specific form of the directional derivative in the SVAR context-is to consider: where R(µ̂_T) is the set of all the different collections of active constraints evaluated at µ̂_T. The resulting standard error is then used to enlarge the plug-in estimators of the endpoints of the identified set. The suggested confidence interval is shown to be pointwise consistent in level. 16 This is formalized in the following proposition.
Proposition 3 (Pointwise Consistency in Level of the Delta-Method Confidence Interval) Let σ̂_{(k,i,j),T} be defined as in (6.1). Suppose there are at most n − 1 equality restrictions and that the data generating process P is such that Z(µ(P)) and S(µ(P)) are linearly independent at µ(P), as defined in Section 4, and are differentiable. Suppose, in addition, that: Then, under Assumption 4: Proof: See Appendix A.5.
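The interval construction analyzed here can be sketched as follows. The inputs are hypothetical: the plug-in endpoints and the candidate directional derivatives (one gradient per collection of active constraints in R(µ̂_T)) must be supplied by the user; the max over candidates mirrors the max over active sets in the directional derivative of Proposition 2.

```python
import numpy as np

def delta_method_ci(v_low, v_high, grads_low, grads_high, Omega, T, z=0.9945):
    """Delta-method interval for the identified set of one IRF coefficient.
    v_low, v_high: plug-in endpoints evaluated at mu_hat (user-supplied).
    grads_low, grads_high: candidate directional derivatives, one d-vector per
    candidate active-constraint set (user-supplied; illustrative interface).
    The standard error is the largest delta-method value over the candidates."""
    se_low = max(np.sqrt(g @ Omega @ g) for g in grads_low)
    se_high = max(np.sqrt(g @ Omega @ g) for g in grads_high)
    # Enlarge each plug-in endpoint by z * se / sqrt(T)
    return v_low - z * se_low / np.sqrt(T), v_high + z * se_high / np.sqrt(T)
```

Taking the maximum over candidate active sets makes the standard error conservative when the endpoint is only directionally differentiable.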
The intuition behind our proof is as follows. Note first that if λ belongs to the identified set (i.e., λ ∈ I^R_{k,i,j}(µ(P))), then λ must lie between the minimum and the maximum responses; that is, λ ∈ [v̲_{k,i,j}(µ(P)), v̄_{k,i,j}(µ(P))]. Consequently, one can show that: is larger than or equal to: Thus, a sufficient condition for the validity of the delta-method confidence interval is that it covers the identified set with probability at least 1 − α. Note that the probability of covering the identified set can be written as one minus the sum of the following two terms: Using our large-sample assumptions and the delta-method for directionally differentiable functions, these probabilities are approximately equal to: Take any r* ∈ R̄*(µ(P)) for which σ̄_{(k,i,j)}(r*)² ≡ v̇_{k,i,j}(µ(P), r*)′ Ω v̇_{k,i,j}(µ(P), r*) > 0 (we have assumed that such an r* exists). It follows that: and the last term is bounded above by α/2. Analogously, we can select an r* ∈ R̲*(µ(P)) for which σ̲_{(k,i,j)}(r*)² ≡ v̇_{k,i,j}(µ(P), r*)′ Ω v̇_{k,i,j}(µ(P), r*) > 0, and we can show that: which is also bounded above by α/2. These inequalities suffice to establish the pointwise validity of our delta-method approach.
16 The question of how to build a uniformly consistent in level delta-method confidence set for a set-identified parameter is beyond the scope of this paper. For readers interested in uniform inference for set-identified parameters in SVARs, our suggestion is to apply the projection approach developed in Gafarov et al. (2016). Compared to GMM16, the delta-method approach described in this paper is faster to implement.
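In our notation, the coverage decomposition in the argument above can be written explicitly as follows; this is a sketch, with CS_T denoting the delta-method confidence interval (our shorthand), and the bound on each term is the one invoked in the proof.

```latex
1 - P\left( \mathrm{CS}_T \supseteq I^{R}_{k,i,j}(\mu(P)) \right)
  = P\Big( \underline{v}_{k,i,j}(\hat{\mu}_T)
           - z_{1-\alpha/2}\,\hat{\sigma}_{(k,i,j),T}/\sqrt{T}
           > \underline{v}_{k,i,j}(\mu(P)) \Big)
  + P\Big( \overline{v}_{k,i,j}(\hat{\mu}_T)
           + z_{1-\alpha/2}\,\hat{\sigma}_{(k,i,j),T}/\sqrt{T}
           < \overline{v}_{k,i,j}(\mu(P)) \Big),
```

with each term on the right-hand side asymptotically bounded above by α/2.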
Monte-Carlo Evidence: We conduct a simple Monte-Carlo exercise to study the coverage probability of our delta-method confidence set. We set (1 − α) = .68, which implies that z_{1−α/2} = .9945. We generate 10,000 draws from the multivariate normal model N_d(µ̂_T, Ω̂_T/T) and, for each draw (denoted µ*), we compute the confidence interval: We check whether [v̲_{k,i,j}(µ̂_T), v̄_{k,i,j}(µ̂_T)] is contained in the confidence interval or not. The estimated probability provides a lower bound on the coverage of the identified parameter.
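The Monte-Carlo check just described can be sketched as follows. The `endpoints` callable is a hypothetical interface mapping a reduced-form draw to the plug-in endpoints and standard error; everything else follows the text (draws from N(µ̂_T, Ω̂_T/T), interval rebuilt at each draw).

```python
import numpy as np

def mc_coverage(mu_hat, Omega_hat, T, endpoints, z=0.9945,
                n_sim=10_000, seed=0):
    """Estimated probability that the delta-method interval, rebuilt at each
    draw mu* ~ N(mu_hat, Omega_hat/T), contains the target interval
    [v_lo(mu_hat), v_hi(mu_hat)].  `endpoints` maps mu -> (v_lo, v_hi, se);
    this interface is illustrative, not the paper's."""
    rng = np.random.default_rng(seed)
    v_lo0, v_hi0, _ = endpoints(mu_hat)     # target identified-set estimate
    hits = 0
    for _ in range(n_sim):
        mu_star = rng.multivariate_normal(mu_hat, Omega_hat / T)
        v_lo, v_hi, se = endpoints(mu_star)
        covered = (v_lo - z * se / np.sqrt(T) <= v_lo0
                   and v_hi0 <= v_hi + z * se / np.sqrt(T))
        hits += covered
    return hits / n_sim
```

In a toy scalar design with correctly specified standard errors, the estimated coverage is close to the nominal 68%.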
The results are reported in Figure 4.
[Figure 4 caption: Coverage probabilities for the model µ* ∼ N(µ̂_T, Ω̂_T/T), with T = 342. The values µ̂_T and Ω̂_T correspond, respectively, to the estimators of the reduced-form parameter and of its asymptotic covariance matrix in the UMP application. (Blue, Solid Line) Nominal confidence level for the delta-method confidence interval (68%).]

UNCONVENTIONAL MONETARY SHOCKS
As we mentioned before, the identification strategy in this paper was motivated by two mechanisms used by the Federal Reserve to affect market beliefs during the Great Recession: forward guidance announcements and the large-scale asset purchase program. We will focus on one particular episode of the Great Recession illustrating the role of forward guidance.
Figure 5 uses our delta-method approach to construct confidence bands for the evolution of the levels of the four variables in the monetary SVAR. We fix all the variables at their July 2010 levels and trace their evolution (over a 12-month window) according to the confidence set for their cumulative responses. The motivation for this exercise is as follows. Suppose that-back in August 2010-an econometrician is asked to provide confidence bands for the evolution of IP, CPI, 2YTB, and FF after the August 2010 announcement of the Federal Open Market Committee (FOMC). The econometrician observes the realization of the macroeconomic variables from July 1979 until August 2010, but deliberately ignores the two years of data after the crisis (to avoid introducing structural changes, stochastic volatility, or any other feature that would complicate the estimation of the VAR).
The econometrician uses the data up to December 2007-half a year before the financial crisis-to conduct delta-method inference on the cumulative responses to a one-standard-deviation unconventional monetary policy shock. The econometrician then uses these cumulative responses to get a rough idea of the evolution of the variables (in levels) following the announcement of the Federal Reserve in August 2010. The econometrician assumes a linear trend for CPI and IP, and ignores the sampling uncertainty coming from the trend estimation when reporting the bands.
An ex-post evaluation of this exercise (over a window of 12 months) is reported in Figure 5. 17 We note that the observed dynamics of CPI, IP, 2YTB, and FF from August 2010 to July 2011 largely fall within the bounds motivated by our delta-method confidence interval. More precisely, our delta-method confidence interval misses the observed value in at most three out of 12 months, which means that our 68% confidence set covers each of these variables at least 75% of the time. We also report the 68% Bayesian credible sets.
Computational Cost: We close this section with some comments regarding the computational cost of our delta-method procedure. Most of the work required to compute the endpoints of the identified set and their derivatives is analytical. Consequently, practitioners can expect the computational burden of our procedure to be low. The implementation of our delta-method confidence interval in the running example takes only around .15 seconds (on a standard laptop with a 2.4 GHz Intel Core i7). With the same equipment, the standard Bayesian implementation required around 327 seconds for 10,000 draws (which means that we could have constructed roughly 2,000 delta-method confidence intervals in the time it took to generate the Bayesian credible set).
Comparison with the Projection Approach: Figure 7 presents a comparison between the delta-method approach and the projection confidence interval recently proposed by Gafarov et al. (2016) [GMM16]. The projection confidence interval has three theoretical properties that we were not able to verify for the delta-method approach. First, the projection confidence interval is uniformly consistent in level. Second, the projection confidence interval yields valid inference for the whole impulse-response function and not only its scalar coefficients. Third, the projection confidence interval has-in large samples-Bayesian credibility of at least the nominal level (for a large class of priors).
In order to exploit our formulas for the endpoints of the identified set, we followed a different algorithm from the one suggested in GMM16: we used a random grid over the Wald ellipsoid for the reduced-form parameters and reported the range of IRFs over this grid. The implementation of the projection confidence set (based on a random grid of 10,000 points) took around 1,300 seconds (4 times slower than the Bayesian credible set and roughly 8,000 times slower than the delta-method). We note that the projection confidence interval (which is wider than the delta-method bands) contains the realized values of IP, CPI, 2YTB, and FF at every horizon under consideration.
Comparison with Calibrated Projection: The projection confidence interval covers the structural parameters more often than necessary. GMM16 show that when the endpoints of the identified set are differentiable, one can project a Wald ellipsoid with a radius given by z²_{1−α} to eliminate the projection bias. The calibrated confidence set is approximately given by: where σ̄_{(k,i,j),T} and σ̲_{(k,i,j),T} are estimators of the asymptotic variances of the plug-in estimators of the endpoints of the identified set. Figure 8 reports the calibrated projection.
Under the differentiability assumption, both σ̄_{(k,i,j),T} and σ̲_{(k,i,j),T} are smaller (for every data realization) than σ̂_{(k,i,j),T}. This suggests that-whenever the endpoints of the identified set are fully differentiable-the calibrated projection should deliver narrower confidence intervals than our delta-method approach, as our formula for the standard error takes into account the possibility that the endpoints are only directionally differentiable. In terms of computation time, the calibrated projection takes approximately 109,365 seconds (solving the nonlinear program described in GMM16).
Comparison with the Robust Approach: Finally, Figure 6 in the Appendix reports the robust-Bayesian credible set of Giacomini and Kitagawa (2014). The implementation of the robust-Bayes credible set (based on 10,000 posterior draws and using our algorithm to evaluate the endpoints) took around 9,106 seconds. 18

CONCLUSION
This paper focused on set-identified SVARs that impose equality and inequality restrictions to set-identify a single structural shock. For this class of models, the endpoints of the identified set have special properties that allow for an intuitive and computationally simple approach to frequentist inference. Specifically, the paper made three contributions: (i) We presented an algorithm to compute-for each horizon, each variable, a fixed vector of reduced-form parameters, and a given collection of equality and/or inequality restrictions-the largest and smallest values of the coefficients of the structural IRF (see Proposition 1). Our algorithm does not require random sampling from the space of rotation matrices or unit vectors. Instead, we treated the bounds of the identified set as the maximum and minimum values of a mathematical program whose solutions we can characterize analytically.
(ii) We provided sufficient conditions under which the largest and smallest values of the structural parameters are directionally differentiable functions of the reduced-form parameters (see Proposition 2). This result seems to be of interest in its own right and could, for example, be used to explore the frequentist properties of the robust-Bayesian procedure in Giacomini and Kitagawa (2014).
(iii) We proposed a computationally convenient delta-method confidence interval for the set-identified coefficients of the structural IRF. We presented sufficient conditions to guarantee the pointwise consistency in level of the suggested inference approach. The delta-method in this paper exploited the structure of the directional derivative.

APPENDIX A: MAIN RESULTS

A.1. Lemma 1
Let S(µ) denote the n × m_s matrix of m_s 'sign' restrictions and let Z(µ) denote the m_z × n matrix of 'zero' restrictions. For notational simplicity, we deliberately omit the dependence of the equality/inequality restrictions on µ. The problem in equation (2.5) is equivalent to: The auxiliary Lagrangian function is given by: Since Z(µ) and S(µ) are linearly independent at µ, we can characterize the maximum response using the Karush-Kuhn-Tucker conditions for the mathematical program in (2.5). The Karush-Kuhn-Tucker necessary conditions for this problem are as follows: plus the additional dual-feasibility constraint requiring the Lagrange multipliers ω_{2i} to be smaller than or equal to zero.
Let x*(A, Σ, r) be one (out of possibly many) maximizers of the program of interest, and suppose that the m × n matrix r (m ≤ n − 1) collects all the restrictions that are active (binding). Because of Assumption 1 and the fact that Z and S are linearly independent at µ, the matrix r has full rank m, and m must be smaller than or equal to n − 1. Using stationarity, primal feasibility, and complementary slackness at x* we get: where v_{k,i,j}(A, Σ) denotes the value of the maximum response when the constraints in r are active. Thus, the Lagrange multiplier w*_1 is unique and given by: Note also that w*_1 = 0 if and only if v_{k,i,j}(A, Σ) = 0. We now show that if v_{k,i,j}(A, Σ) ≠ 0 there are unique w*_2 and w*_3 that satisfy the Karush-Kuhn-Tucker conditions. Left-multiplying the stationarity condition by Σ we have: where w collects the nonzero components of ω_2 and all the components of ω_3.
Consequently, the value function given the active constraints r is given by either: We will use the first-order conditions to find the vector of Lagrange multipliers w and show that it is unique given v_{k,i,j}(A, Σ) ≠ 0. Note that: Under the assumptions of the Lemma, r has rank m. If v_{k,i,j}(A, Σ) ≠ 0, the equation above holds if and only if: Consequently, the Lagrange multipliers for the active restrictions are unique. To conclude the proof, we derive an explicit expression for the value function in terms of (A, Σ). To do so, note that: Therefore, the equation above and (A.2) imply that if v_{k,i,j}(A, Σ) ≠ 0 then either: Furthermore, since any solution for which r is the set of binding constraints satisfies 2w*_1 x*′ = (C_k(A)′e_i − rw)′Σ, the solution vectors for which the constraints in r are binding are: In either case the Lagrange multipliers for the active constraints are given (as shown above) by:
Remark to the proof of Lemma 1: If v_{k,i,j}(A, Σ) = 0, then neither the maximizer x* nor the Lagrange multipliers w* are unique. Note that any x ∈ R^n orthogonal to C_k(A)′e_i satisfying the ellipsoid constraint x′Σ^{−1}x = 1 and the sign/zero restrictions is a solution. In addition, any vector of Lagrange multipliers satisfying the equation:
Case 2.2: If v_{k,i,j}(A, Σ) = 0, there is an x* ≠ 0 in the choice set. Hence, the Karush-Kuhn-Tucker conditions imply that C_k(A)′e_i is a linear combination of the active constraints that generate the value of zero (which means there is an r* such that f+_max(A, Σ; Steps 1 and 2 show that the value function v_{k,i,j}(A, Σ) is obtained by computing the Karush-Kuhn-Tucker points in Lemma 1 for each r, penalizing the value v_{k,i,j}(A, Σ; r) if infeasible, and maximizing over all the possible values of r. The proof for the lower bound is analogous.
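The closed-form characterization underlying Lemma 1 can be illustrated in the special case with only zero (equality) restrictions active and no binding sign restrictions; the formula below is a sketch under that assumption (with `Zr` a hypothetical stack of active zero restrictions), not the paper's general algorithm.

```python
import numpy as np

def max_response(c, Sigma, Zr=None):
    """Value and maximizer of max_x c'x subject to x' Sigma^{-1} x = 1 and
    Zr x = 0 (sketch of the equality-constrained case behind Lemma 1).
    The KKT conditions give value sqrt(c' S c), where S is Sigma projected
    onto the null space of Zr in the Sigma metric."""
    if Zr is None or Zr.size == 0:
        S = Sigma
    else:
        ZS = Zr @ Sigma
        # S = Sigma - Sigma Zr' (Zr Sigma Zr')^{-1} Zr Sigma
        S = Sigma - ZS.T @ np.linalg.solve(Zr @ Sigma @ Zr.T, ZS)
    val = float(np.sqrt(c @ S @ c))
    x_star = S @ c / val if val > 0 else None  # unique maximizer when val != 0
    return val, x_star
```

As in the remark above, when the value is zero the maximizer returned here is not unique and the function signals this by returning None.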

A.3. Lemma 2
Note that if r_l(µ) is differentiable and v_{k,i,j}(µ, r_l(µ)) ≠ 0, then the function: is differentiable as well. Moreover, the function: is also differentiable. Therefore: (where we have applied the chain rule for matrix derivatives).
We now use the envelope theorem to compute this derivative. Note that we have shown the existence of unique multipliers λ* ∈ R and w* ∈ R^l such that: Therefore: ∂v_{k,i,j}(µ, r_l(µ))
Note also that: where r_{k,l}(µ) denotes the k-th column of r_l(µ). Consequently: where w*_k is the k-th entry of the vector of Lagrange multipliers w*. This gives the partial derivative of v_{k,i,j}(µ, r_l(µ)) with respect to vec(A). We note that this derivative can also be written as: which is the expression given in the overview. Finally, to get the derivative with respect to vec(Σ), we note that: Since each term √T(v_{k,i,j}(Â_T, Σ̂_T; r_l(µ̂_T)) − v_{k,i,j}(µ; r_l(µ))) in the previous expression is, by Lemma 2, approximately equal to v̇_{k,i,j}(µ; r_l(µ))′h, then