Abstract
This chapter studies partially identified structures defined by a finite number of moment inequalities. When the moment function is misspecified, the conventional identified set becomes difficult to interpret; more seriously, it can be empty. We define a pseudo-true identified set whose elements can be interpreted as least-squares projections of the moment functions that are observationally equivalent to the true moment function. We then construct a set estimator for the pseudo-true identified set and establish its \(O_{p}(n^{-1/2})\) rate of convergence.
Notes
- 1.
Here, we take the indicators (or instruments) \(1_A(z)\) as given. The indicators \(1_{A}(z)\) could be replaced by any finite vector of measurable non-negative functions of \(z\). Andrews and Shi (2011) give examples of such functions.
- 2.
The players do not need to know the \(F\)’s, but these are important to the econometrician.
- 3.
For this example, \(\Theta _I\) is never empty as long as the number (\(2K\)) of moment inequalities equals the number of parameters \((\ell )\).
- 4.
We are indebted to an anonymous referee for pointing out a relationship between BMM’s framework and ours. General incomplete linear moment restrictions are given by \(E[V(Z^{\prime }\theta -Y)]=E[Vu(V)]\), where \(V\) is a vector of random variables, and \(u\) is an unknown bounded function. See BMM for details.
- 5.
Their framework does not consider misspecification. Their object of interest is therefore the conventional identified set \(\Theta _I\). In our setting, the sample criterion function degenerates, i.e., \(Q_n(\theta ,s)=0\), on a neighborhood of \(\Theta _{*}\times \mathcal S _0\) under Assumption 3.2 (iv).
- 6.
We are indebted to an anonymous referee for this point.
- 7.
Since the mean value theorem applies only element by element to the vector in (A.8), the mean value \(\bar{\theta }_n\) differs across elements. For notational simplicity, we write \(\bar{\theta }_n\) in what follows, but it should be understood that these mean values differ from element to element. For the measurability of these mean values, see, for example, Jennrich (1969).
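The element-by-element mean-value expansion described in Note 7 can be sketched generically as follows; this is a schematic for a vector-valued map \(g:\mathbb R ^p\rightarrow \mathbb R ^J\), not the exact display (A.8):

```latex
% Each coordinate j gets its own mean value \bar{\theta}_{n,j} on the segment
% between \hat{\theta}_n and \theta_*; a single common mean value need not exist.
g^{(j)}(\hat{\theta}_n)
  = g^{(j)}(\theta_*)
  + \nabla_\theta g^{(j)}(\bar{\theta}_{n,j})^{\prime}\,(\hat{\theta}_n - \theta_*),
\qquad
\bar{\theta}_{n,j} = \lambda_j \hat{\theta}_n + (1-\lambda_j)\,\theta_*,
\quad \lambda_j \in [0,1], \quad j = 1,\ldots,J.
```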
References
Ai, C., and X. Chen (2003): “Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions”, Econometrica, 71(6), 1795–1843.
Aliprantis, C. D., and K. C. Border (2006): Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, Berlin.
Andrews, D. W. K. (1994): “Chapter 37: Empirical Process Methods in Econometrics”, vol. 4 of Handbook of Econometrics, pp. 2247–2294. Elsevier, Amsterdam.
Andrews, D. W. K., and X. Shi (2011): “Inference for Parameters Defined by Conditional Moment Inequalities”, Discussion Paper, Yale University.
Bajari, P., C. L. Benkard, and J. Levin (2007): “Estimating Dynamic Models of Imperfect Competition”, Econometrica, 75(5), 1331–1370.
Bontemps, C., T. Magnac, and E. Maurin (2011): “Set Identified Linear Models”, CeMMAP Working Paper.
Chen, X. (2007): “Large Sample Sieve Estimation of Semi-Nonparametric Models”, Handbook of Econometrics, 6, 5549–5632.
Chernozhukov, V., H. Hong, and E. Tamer (2007): “Estimation and Confidence Regions for Parameter Sets in Econometric Models”, Econometrica, 75(5), 1243–1284.
Ciliberto, F., and E. Tamer (2009): “Market Structure and Multiple Equilibria in Airline Markets”, Econometrica, 77(6), 1791–1828.
Folland, G. (1999): Real Analysis: Modern Techniques and Their Applications, vol. 40. Wiley-Interscience, New York.
Guggenberger, P., J. Hahn, and K. Kim (2008): “Specification Testing under Moment Inequalities”, Economics Letters, 99(2), 375–378.
Ichimura, H., and S. Lee (2010): “Characterization of the Asymptotic Distribution of Semiparametric M-Estimators”, Journal of Econometrics, 159(2), 252–266.
Jennrich, R. I. (1969): “Asymptotic Properties of Nonlinear Least Squares Estimators”, Annals of Mathematical Statistics, 40(2), 633–643.
Kaido, H., and H. White (2010): “A Two-Stage Approach for Partially Identified Models”, Discussion Paper, University of California San Diego.
Lindenstrauss, J., D. Preiss, and J. Tiser (2007): “Differentiability of Lipschitz Maps”, in Banach Spaces and Their Applications in Analysis, pp. 111–123.
Luttmer, E. G. J. (1996): “Asset Pricing in Economies with Frictions”, Econometrica, 64(6), 1439–1467.
Manski, C. F., and E. Tamer (2002): “Inference on Regressions with Interval Data on a Regressor or Outcome”, Econometrica, 70(2), 519–546.
Molchanov, I. S. (2005): Theory of Random Sets. Springer, Berlin.
Newey, W. (1994): “The Asymptotic Variance of Semiparametric Estimators”, Econometrica, 62(6), 1349–1382.
Newey, W. K., and D. McFadden (1994): “Large Sample Estimation and Hypothesis Testing”, Handbook of Econometrics, 4, 2111–2245.
Pakes, A. (2010): “Alternative Models for Moment Inequalities”, Econometrica, 78(6), 1783–1822.
Pakes, A., J. Porter, K. Ho, and J. Ishii (2006): “Moment Inequalities and Their Application”, Working Paper, Harvard University.
Ponomareva, M., and E. Tamer (2010): “Misspecification in Moment Inequality Models: Back to Moment Equalities?” Econometrics Journal, 10, 1–21.
Santos, A. (2011): “Instrumental Variables Methods for Recovering Continuous Linear Functionals”, Journal of Econometrics, 161, 129–146.
Sherman, R. P. (1993): “The Limiting Distribution of the Maximum Rank Correlation Estimator”, Econometrica, 61(1), 123–137.
Tamer, E. (2003): “Incomplete Simultaneous Discrete Response Model with Multiple Equilibria”, The Review of Economic Studies, 70(1), 147–165.
van der Vaart, A. W., and J. A. Wellner (1996): Weak Convergence and Empirical Processes: with Applications to Statistics. Springer, New York.
White, H. (1982): “Maximum Likelihood Estimation of Misspecified Models”, Econometrica, 50(1), 1–25.
Mathematical Proofs
1.1 Notation
Throughout the appendix, let \(\Vert \cdot \Vert \) denote the usual Euclidean norm. For each \(s,s^{\prime }\in \mathcal S \), let \(\rho (s,s^{\prime }):=\sup _{x\in \mathcal X }\max _{j=1,{\ldots } ,L}|s^{(j)}(x)-s^{\prime (j)}(x)| \). For each \(a\times b\) matrix \(A\), let \(\Vert A\Vert _{op}:=\min \{c:\Vert Av\Vert \le c\Vert v\Vert ,v\in \mathbb R ^{b}\}\) be the operator norm. For any symmetric matrix \(A\), let \(\xi (A)\) denote the smallest eigenvalue of \(A\).
For a given pseudometric space \((T,\rho )\), let \(N(\epsilon ,T,\rho )\) be the covering number, i.e., the minimal number of \(\epsilon \)-balls needed to cover \(T\). For each measurable function \(f:\mathcal X \rightarrow \mathbb R \) and \(1\le p<\infty \), let \(\Vert f\Vert _{L^{p}}:=E[|f(X)|^p]^{1/p}\), provided that the integral exists. Similarly, let \(\Vert f\Vert _\infty :=\inf \{c:P(|f(X)|>c)=0\}\). For a given function space \(\mathcal G \) equipped with a norm \(\Vert \cdot \Vert _{\mathcal G }\) and \(l,u\in \mathcal G \), let \([l,u]:=\{f\in \mathcal G :l\le f\le u\}\). For each \(f\in \mathcal G \), let \(B_{\epsilon ,f}:=\{[l,u]:l\le f\le u,\Vert l-u\Vert _{\mathcal G }<\epsilon \}\) be the \(\epsilon \)-bracket of \(f\). The bracketing number \(N_{[]}(\epsilon ,\mathcal G ,\Vert \cdot \Vert _{\mathcal G })\) is the minimum number of \(\epsilon \)-brackets needed to cover \(\mathcal G \). An envelope function \(G\) of a function class \(\mathcal G \) is a measurable function such that \(|g(x)|\le G(x)\) for all \(g\in \mathcal G \) and \(x\in \mathcal X \). For each \(\delta >0\), the bracketing integral of \(\mathcal G \) with an envelope function \(G\) is defined as \(J_{[]}(\delta ,\mathcal G ,\Vert \cdot \Vert _{\mathcal G }):=\int _0^\delta \sqrt{1+\ln N_{[]}(\epsilon \Vert G\Vert _{\mathcal G },\mathcal G ,\Vert \cdot \Vert _{\mathcal G })}\,d\epsilon \).
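As a concrete illustration of the covering-number definition (a numerical aside, not part of the proofs), the interval \([0,1]\) under the absolute-value metric satisfies \(N(\epsilon ,[0,1],|\cdot |)=\lceil 1/(2\epsilon )\rceil \), so the covering number grows only polynomially as \(\epsilon \rightarrow 0\) and the corresponding entropy integral is finite:

```python
import math

def covering_number(eps: float) -> int:
    """Minimal number of closed eps-balls (intervals of radius eps)
    needed to cover [0, 1] under the absolute-value metric."""
    # Centers at eps, 3*eps, 5*eps, ... cover [0, 2*N*eps],
    # so the minimal N satisfies 2*N*eps >= 1.
    return math.ceil(1.0 / (2.0 * eps))

for eps in (0.25, 0.1, 0.05):
    print(eps, covering_number(eps))
# 0.25 -> 2 balls, 0.1 -> 5 balls, 0.05 -> 10 balls
```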
1.2 Projection
Proof of Proposition 2.1.
Note that under the conditions of Example 2.1, Assumption 2.3 holds. This ensures \(\mathcal S _0\) is nonempty. By Eq. (13), \(\Theta _{*}\) is nonempty. Furthermore, let \(\theta \in \Theta _I,\) and for each \(z\in \mathcal Z \), let \(r_\theta (z):=z^{\prime }\theta \). Note that \(r_\theta \in \mathcal S _0\). Thus, (13) holds with \(s=r_\theta \), which ensures the first claim.
For the second claim, note that the condition \(E[Y_U|Z]=E[Y_L|Z]=Z^{\prime }\theta _0\) a.s. implies that any \(\theta \in \Theta _I\) must satisfy
By the rank condition on \(D\), the unique solution to (A.1) is \(\theta _0-\theta =0\). Thus, \(\{\theta _0\}=\Theta _I\). Since \(\{\theta _0\}\subseteq \Theta _{*}\) by the first claim, it suffices to show that \(\theta _0\) is the unique element of \(\Theta _{*}\). For this, note that under our assumptions, \(\mathcal S _0=\{s_0\}\) with \(s_0(z)=z^{\prime }\theta _0\). Thus, \(\Theta _{*}=\{\theta _0\}\). This completes the proof.\(\square \)
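To make the two claims concrete, consider the simplest intercept-only case \(Z\equiv 1\) (a hypothetical simplification for illustration only): the identified set reduces to the interval \([E[Y_L],E[Y_U]]\), which collapses to a point when \(E[Y_L]=E[Y_U]\), mirroring the point-identification argument above. A minimal sketch of the sample analog:

```python
import numpy as np

def interval_identified_set(y_l, y_u):
    """Sample analog of Theta_I = [E[Y_L], E[Y_U]] for the
    intercept-only interval-outcome model E[Y_L] <= theta <= E[Y_U]."""
    return float(np.mean(y_l)), float(np.mean(y_u))

# Hypothetical interval data: Y is only known to lie in [Y_L, Y_U].
y_l = np.array([0.1, 0.4, 0.2, 0.3])
y_u = np.array([0.9, 1.0, 0.8, 0.7])
lo, hi = interval_identified_set(y_l, y_u)
print(lo, hi)  # 0.25 0.85

# Point identification: when Y_L = Y_U the set is a singleton.
lo0, hi0 = interval_identified_set(y_u, y_u)
```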
1.3 Consistency of the Parametric Part
For each \(s\in \mathcal S \), let \(\theta _{\ast }(s):=\mathop{\rm arg\,min}_{\theta \in \Theta }Q(\theta ,s)\) and \(\hat{\theta}_{n}(s):=\mathop{\rm arg\,min}_{\theta \in \Theta}Q_{n}(\theta ,s)\).
Lemma A.1
Suppose that Assumptions 3.4 and 3.2 (iv) hold. Then, (i) for each \(x\in \mathcal X \) and any \(s,s^{\prime }\in \mathcal S \), there exists a function \(C_{1}:\mathcal X \rightarrow \mathbb R _{+}\) such that
(ii) For each \(x\in \mathcal X \), \(j=1,{\ldots } ,L,\) and any \(s,s^{\prime }\in \mathcal S \), there exists a function \(C_{2}:\mathcal X \rightarrow \mathbb R _{+}\) such that
Proof of Lemma A.1
Assumption 3.4 ensures that
Assumption 3.2 (iv) ensures that for each \(s\in L^2_{\mathcal S ,L}\), \(\theta _{*}(s)=\Pi _{\mathcal R _\Theta }s\) is uniquely determined, where \(\Pi _{\mathcal R _\Theta }\) is the projection mapping from the Hilbert space \(L^2_{\mathcal S ,L}\) to the closed convex subset \(\mathcal R _{\Theta }\). Furthermore, Lemma 6.54 (d) in Aliprantis and Border (2006) and the fact that \(\rho \) is stronger than \(\Vert \cdot \Vert _W\) imply
for some \(c>0\). Combining (A.4) and (A.5) ensures (i). Similarly, Assumption 3.4 ensures that for each \(x\in \mathcal X \)
Combining (A.5) and (A.6) ensures (ii). \(\square \)
Proof of Theorem 3.1
Step 1: Let \(s\in \mathcal S \) be given. For each \(\theta \in \Theta \), let \(Q_s(\theta ):=Q(\theta ,s)\) and \(Q_{n,s}(\theta ):=Q_n(\theta ,s)\). By Assumption 3.2 (iv) and Theorem 6.53 in Aliprantis and Border (2006), \(Q_s\) is uniquely minimized at \(\theta _{*}(s)\). By Assumption 3.2 (i), \(\Theta \) is compact. By Assumption 3.2, \(Q_s\) is continuous. Furthermore, Assumption 3.4 ensures the applicability of the uniform law of large numbers. Thus, \(\sup _{\theta \in \Theta }|Q_{n,s}(\theta )-Q_s(\theta )|=o_p(1)\). Hence, by Theorem 2.1 in Newey and McFadden (1994), \(\hat{\theta }_n(s)-\theta _{*}(s)=o_p(1)\).
By Assumptions 3.2 (v), 3.4 (ii), and the fact that \(\hat{\theta }_n(s)\) is consistent for \(\theta _{*}(s)\), \(\hat{\theta }_n(s)\) solves the first order condition:
with probability approaching one. Expanding this condition at \(\theta _{*}(s)\) using the mean-value theorem applied to each element of \(\nabla _\theta Q_n(\theta ,s)\) yields
where \(\bar{\theta }_n(s)\) lies on the line segment that connects \(\hat{\theta }_n(s)\) and \(\theta _{*}(s)\) (see Note 7). For each \(s\in \mathcal S _0^{\bar{\eta }}\), let
Below, we show that the function class \(\Psi :=\{f_s:f_s=\psi ^{(j)}_s, s\in \mathcal S _0^{\bar{\eta }}, j=1,2,{\ldots },J\}\) is a Glivenko–Cantelli class.
By Assumption 3.4 (ii), Lemma A.1, the triangle inequality, and the Cauchy–Schwarz inequality, for any \(s,s^{\prime }\in \mathcal S \),
where \(F(x):=(C_2(x)\Vert W\Vert _{op}(M+R(x))+(1+C_1(x))\Vert W\Vert _{op}R(x))\times \sqrt{L}\). For any \(\epsilon >0\), let \(u:=\epsilon /(2\Vert F\Vert _{L^1})\). By Theorem 2.7.11 in van der Vaart and Wellner (1996) and Assumption 3.2 (ii), we obtain
For each \(j=1,{\ldots },L\), let \(\mathcal S _0^{\bar{\eta },(j)}:=\{s^{(j)}:s\in \mathcal S _0^{\bar{\eta }}\}\). For each \(j, g\in \mathcal S _0^{\bar{\eta },(j)},\) and \(\epsilon >0\), let \(B^{(j)}_\epsilon (g):=\{f\in \mathcal S _0^{\bar{\eta },(j)}:\Vert f-g\Vert _{\infty }<\epsilon \}\). Similarly, for each \(s\in \mathcal S _0^{\bar{\eta }}\), let \(B_{u,\rho }(s):=\{f\in \mathcal S _0^{\bar{\eta }}:\rho (f,s)<u\}\). As we will show below, \(N_j:=N(u,\mathcal S _0^{\bar{\eta },(j)},\Vert \cdot \Vert _\infty )\) is finite for all \(j\). Thus, for each \(j\) there exist \(f_{1,j},{\ldots },f_{N_j,j}\in \mathcal S _0^{\bar{\eta },(j)}\) such that \(\mathcal S _0^{\bar{\eta },(j)}\subseteq \bigcup _{l=1}^{N_j}B_{u}^{(j)}(f_{l,j})\). We can then obtain a grid of distinct points \(f_1,{\ldots },f_N\in \mathcal S _0^{\bar{\eta }}\) such that \(f_i^{(j)}=f_{l,j}\) for some \(1\le l\le N_j\), where \(N=\prod _{j=1}^L N_j\). Then, by the definition of \(\rho \), \(\mathcal S _0^{\bar{\eta }}\subseteq \bigcup _{i=1}^N B_{u,\rho }(f_i)\). Thus,
where the last inequality follows from Assumption 3.2 (ii)–(iii) and Theorem 2.7.1 in van der Vaart and Wellner (1996). By Theorem 2.4.1 in van der Vaart and Wellner (1996), \(\Psi \) is a Glivenko–Cantelli class.
Note that, by Assumptions 3.2 (v) and 3.4, \(\theta _{*}(s)\) solves the population analog of (A.7). Thus,
These results together with the strong law of large numbers whose applicability is ensured by Assumptions 3.3 and 3.4 (ii) imply
Step 2: In this step, we show that the Hessian \(\nabla ^2_\theta Q_n(\theta ,s)\) is invertible with probability approaching 1 uniformly over \(\mathcal N _{\bar{\epsilon },\bar{\eta }}\). Let \(\mathcal H :=\{h_{\theta ,s}:\mathcal X \rightarrow \mathbb R :h_{\theta ,s}(x)=H^{(i,j)}_W(\theta ,s,x)+2\nabla _\theta r^{(i)}_{\theta }(x)^{\prime }W\nabla _\theta r^{(j)}_{\theta }(x), 1\le i,j\le p,\theta \in \Theta ,s\in \mathcal S _0^{\bar{\eta }}\}\). Note that \(h_{\theta ,s}\) takes the form:
for some \(1\le i,j\le p, \theta \in \Theta \), and \(s\in \mathcal S ^{\bar{\eta }}_0\). Consider the function classes \(\mathcal F _1:=\{D^\alpha _\theta r^{(k)}_\theta :\theta \in \Theta ,|\alpha |\le 2,k=1,{\ldots },L\}\) and \(\mathcal F _2:=\{s^{(k)}:s\in \mathcal S _0^{\bar{\eta }},k=1,{\ldots },L\}\). Assumptions 3.2 (i), 3.4, and Theorem 2.7.11 in van der Vaart and Wellner (1996) ensure \(N_{[]}(\epsilon ,\mathcal F _1,\Vert \cdot \Vert _{L^2})\le N(u,\Theta ,\Vert \cdot \Vert )<\infty \) with \(u:=\epsilon /(2\Vert C\Vert _{L^2})\). Assumption 3.2 (ii)–(iii) and Corollary 2.7.2 in van der Vaart and Wellner (1996) ensure \(N_{[]}(\epsilon ,\mathcal F _2,\Vert \cdot \Vert _{L^2})\le N_{[]}(\epsilon , \mathcal C ^\gamma _M(\mathcal X ),\Vert \cdot \Vert _{L^2})<\infty \). Since \(\mathcal H \) can be obtained by combining functions in \(\mathcal F _1\) and \(\mathcal F _2\) by additions and pointwise multiplications, Theorem 6 in Andrews (1994) implies \(N_{[]}(\epsilon ,\mathcal H ,\Vert \cdot \Vert _{L^2})<\infty \). This bracketing number is given in terms of the \(L^2\)-norm, but we can also obtain a bracketing number in terms of the \(L^1\)-norm. For this, let \(h_1,{\ldots },h_p\) be the centers of \(\Vert \cdot \Vert _{L^2}\)-balls that cover \(\mathcal H \). Then, the brackets \([h_i-\epsilon ,h_i+\epsilon ],i=1,{\ldots },p\) cover \(\mathcal H \), and each bracket has length at most \(2\epsilon \) in \(\Vert \cdot \Vert _{L^1}\). Thus, \(N_{[]}(\epsilon ,\mathcal H ,\Vert \cdot \Vert _{L^1})<\infty \). By Theorem 2.7.1 in van der Vaart and Wellner (1996), \(\mathcal H \) is a Glivenko–Cantelli class. Hence, uniformly over \(\Theta \times \mathcal S _0^{\bar{\eta }}\),
Note that \(d_{H,W}(\hat{\mathcal S }_n,\mathcal S _0)=o_p(1)\) by Assumption 3.6. Thus, \((\bar{\theta }_n(s),s)\in \mathcal N _{\bar{\epsilon },\bar{\eta }}\) with probability approaching one. By Assumption 3.5 and (A.15), there exists \(\delta >0\) such that the smallest eigenvalue of \(\nabla ^2_\theta Q_n(\bar{\theta }_n(s),s)\) exceeds \(\delta \) uniformly over \(\mathcal N _{\bar{\epsilon },\bar{\eta }}\). Thus, the Hessian \(\nabla ^2_\theta Q_n(\bar{\theta }_n(s),s)\) in (A.8) is invertible with probability approaching 1.
Step 3: Steps 1–2 imply that, uniformly over \(\mathcal S _0^{\bar{\eta }}\),
where we used the fact that \(\Vert \theta _{*}(s)-\theta _{*}(s^{\prime })\Vert \le \Vert s-s^{\prime }\Vert _{W}\) by Lemma 6.54 (d) in Aliprantis and Border (2006).
Step 4: Finally, note that by Step 3,
Equation (18) and Assumption 3.6 then ensure the desired result. \(\square \)
1.4 Convergence Rate
The following lemma controls the rate at which \(\hat{\Theta }_n\) covers \(\Theta _{*}\). Given a sequence \(\{\eta _{n}\}\) such that \(\eta _{n}\rightarrow 0\), we let \(V^{\eta _n}(s):=\{\theta ^{\prime }:\Vert \theta ^{\prime }-\theta _{*}(s)\Vert \le e_n\}\), where \(e_n=O_p(\eta _n)\), and let \(\mathcal N _{\eta _n,0}:=\{(\theta ,s):\theta \in V^{\eta _n}(s),s\in \mathcal S _0\}\).
Lemma A.2
Suppose Assumptions 2.1–2.3, 3.1–3.2, and 3.6 hold. Let \(\{\delta _{1n}\}\) and \(\{\epsilon _{n}\}\) be sequences of non-negative numbers converging to 0 as \(n\rightarrow \infty \). Let \(G:\Theta \times \mathcal S \rightarrow \mathbb R _{+}\) be a function such that \(G\) is jointly measurable and lower semicontinuous. For each \(n\), let \(G_{n}:\Omega \times \Theta \times \mathcal S \rightarrow \mathbb R \) be a function such that for each \(\omega \in \Omega \), \(G_{n}(\omega ,\cdot ,\cdot )\) is jointly measurable and lower semicontinuous, and for each \((\theta ,s)\in \Theta \times \mathcal S \), \(G_{n}(\cdot ,\theta ,s)\) is measurable. Let \(\Theta _{*}:=\{\theta \in \Theta :G(\theta ,s)=0,s\in \mathcal S _{0}\}\) and \(\hat{\Theta }_{n}:=\{\theta \in \Theta :G_{n}(\theta ,s)\le \inf _{\theta ^{\prime }\in \Theta }G_{n}(\theta ^{\prime },s)+c_{n},s\in \hat{\mathcal S }_{n}\}\). Suppose that \(d_{H}(\hat{\Theta }_{n},\Theta _{*})=O_{p}(\delta _{1n})\). Suppose further that there exist a positive constant \(\kappa \) and a neighborhood \(V(s)\) of \(\theta _{*}(s)\) such that
for all \(\theta \in V(s),s\in \mathcal S _{0}\). Suppose that uniformly over \(\mathcal N _{\delta _{1n},0}\),
Then
Proof of Lemma A.2
The proof of this lemma is similar to that of Theorem 1 in Sherman (1993). By (A.19), (A.20), and the Hausdorff consistency of \(\hat{\Theta }_n\), it follows that, uniformly over \(\mathcal N _{\delta _{1n},0}\),
with probability approaching 1. As in Theorem 1 in Sherman (1993), write \(K_n\Vert \theta -\theta _{*}(s)\Vert \) for the \(O_p(\Vert \theta -\theta _{*}(s)\Vert /\sqrt{n})\) term, where \(K_n=O_p(1/\sqrt{n})\), and note that the \(o_p(\Vert \theta -\theta _{*}(s)\Vert ^2)\) term is bounded from below by \(-\frac{\kappa }{2}\Vert \theta -\theta _{*}(s)\Vert ^2\) with probability approaching 1. Thus, we obtain
Completing the square, we obtain
Taking square roots gives
Thus,
This completes the proof. \(\square \)
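The completing-the-square step can be written out as follows; writing \(\Delta :=\Vert \theta -\theta _{*}(s)\Vert \), this is a sketch of the algebra under the bounds above, not the exact displays:

```latex
% From the quadratic lower bound and the bound on G_n, with probability
% approaching 1,
\frac{\kappa}{2}\,\Delta^{2}
  \le K_{n}\,\Delta + O_{p}(\epsilon_{n}) + c_{n}
\;\Longrightarrow\;
\frac{\kappa}{2}\Bigl(\Delta - \frac{K_{n}}{\kappa}\Bigr)^{2}
  \le \frac{K_{n}^{2}}{2\kappa} + O_{p}(\epsilon_{n}) + c_{n}.
% Taking square roots and using \sqrt{a+b} \le \sqrt{a}+\sqrt{b} together
% with K_n = O_p(1/\sqrt{n}):
\Delta
  \le \frac{K_{n}}{\kappa}
   + \sqrt{\frac{K_{n}^{2}}{\kappa^{2}}
       + \frac{2}{\kappa}\bigl(O_{p}(\epsilon_{n}) + c_{n}\bigr)}
  = O_{p}(n^{-1/2}) + O_{p}(\epsilon_{n}^{1/2}) + O_{p}(c_{n}^{1/2}).
```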
The following lemma controls the rate at which \(\hat{\Theta }_n\) is contracted into a neighborhood of \(\Theta _{*}\). Given \(s\in \mathcal S \) and a sequence \(\{\delta _n\}\) such that \(\delta _n\rightarrow 0\), let \(U^{\delta _n}(s):=\{\theta \in \Theta :\Vert \theta -\theta _{*}(s)\Vert \ge \delta _n\}\).
Lemma A.3
Suppose Assumptions 2.1–2.3, 3.1–3.2, and 3.6 hold. Let \(G_{n}\) be defined as in Lemma A.2. Suppose that there exist positive constants \((k,\kappa _{2})\) and a sequence \(\{\delta _{1n}\}\) such that
with probability approaching 1 for all \(\theta \in U^{\delta _{n}}(s)\) with \(\delta _{n}:=(k\delta _{1n}/\sqrt{n})^{1/2}\) and \(s\in \mathcal S _{0}^{\bar{\eta }}\). Then,
Proof of Lemma A.3
Note first that \(\hat{\mathcal S }_n\) is in \(\mathcal S _0^{\bar{\eta }}\) with probability approaching 1 by Assumption 3.6. Let \(\tilde{c}_n:=\sqrt{n}c_n\) and \(\bar{c}_n:=\max \{\kappa _2k\delta _{1n},\tilde{c}_n\}\). Let \(\epsilon _n:=(\bar{c}_n /(\kappa _2 \sqrt{n}))^{1/2}\). Then, uniformly over \(\mathcal S _0^{\bar{\eta }}\),
Since \(\sqrt{n}G_n(\hat{\theta }_n(s),s)\le \tilde{c}_n\) for all \(s\in \hat{\mathcal S }_n\), the results above ensure
This ensures the claim of the Lemma. \(\square \)
Proof of Theorem 3.2
We first show (A.19) holds with \(G(\theta ,s)=Q(\theta ,s)\). For this, we use the second-order Taylor expansion of \(Q(\theta ,s)\). For \(\theta \in V^{\delta _{1n}}(s)\), it holds by Assumptions 3.2 (v) and 3.4 that
where \(\bar{\theta }(s)\) is on the line segment that connects \(\theta \) and \(\theta _{*}(s)\). By (15), \(Q(\theta _{*}(s),s)=0\), and by the first-order condition for optimality, \(\nabla _\theta Q(\theta _{*}(s),s)=0\). Thus, it follows that
where \(\kappa :=\inf _{\theta \in \Theta ,s\in \mathcal S _0} \xi (\nabla ^2_\theta Q(\theta ,s))/2\), and \(\kappa >0\) by Assumption 3.5.
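Spelled out, the expansion just described gives the quadratic lower bound (a sketch under the stated assumptions):

```latex
Q(\theta,s)
  = \underbrace{Q(\theta_{*}(s),s)}_{=0\ \text{by (15)}}
  + \underbrace{\nabla_{\theta}Q(\theta_{*}(s),s)^{\prime}}_{=0\ \text{(first-order condition)}}
    (\theta-\theta_{*}(s))
  + \tfrac{1}{2}\,(\theta-\theta_{*}(s))^{\prime}\,
      \nabla^{2}_{\theta}Q(\bar{\theta}(s),s)\,
      (\theta-\theta_{*}(s))
  \;\ge\; \kappa\,\Vert \theta-\theta_{*}(s)\Vert^{2},
```

since the smallest eigenvalue of the Hessian is at least \(2\kappa \) by Assumption 3.5; this is (A.19) with \(G=Q\).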
We next show that (A.20) holds for
In what follows, let \(\hat{E}_n\) denote the expectation with respect to the empirical distribution. Using the Taylor expansion of \(G_n\) and \(G\) with respect to \(\theta \) at \(\theta _{*}(s)\), we may write
where
Thus, for (A.20) to hold, it suffices to show that \( S_{1n}(\theta ,s)=O_p(\Vert \theta -\theta _{*}(s)\Vert /\sqrt{n})+o_p(\Vert \theta -\theta _{*}(s)\Vert ^2)\) and \(S_{2n}(\theta ,s)=O_p(\epsilon _n)\) for some \(\epsilon _n\rightarrow 0\). For \(S_{1n}\), note that our assumptions suffice for the conditions of Lemma A.4. Thus, \(\Phi \) is a \(P_0\)-Donsker class. This ensures \(S_{1n}(\theta ,s)=O_p(\Vert \theta -\theta _{*}(s)\Vert /\sqrt{n})+o_p(\Vert \theta -\theta _{*}(s)\Vert ^2)\). We now consider \(S_{2n}\). For each \(s\in \mathcal S _0\) and \(x\in \mathcal X \), let \(\phi _s(x):=\nabla _\theta r_{\theta _{*}(s)}(x)^{\prime }W\nabla _\theta r_{\theta _{*}(s)}(x)\). Note that
where the last inequality follows from Lemma B.1 of Ichimura and Lee (2010). Now, Markov’s inequality, Lemma A.4, and Assumption 3.4 (ii) ensure that \(S_{2n}=O_p(\epsilon _n)\), where \(\epsilon _n=n^{-1/2}\delta _{1n}^2\).
We further set \(c_n=0\). Note that the estimator defined in (17) with \(c_n=0\) equals the set estimator \(\hat{\Theta }_n=\{\theta :G_n(\theta ,s)\le \inf _{\theta \in \Theta }G_n(\theta ,s)\}.\) By Assumption 3.7 and Step 4 of the proof of Theorem 3.1, we may take \(\delta _{1n}=O_p(n^{-1/4})\) as an initial rate. Lemma A.2 then implies that \(\vec d_H(\Theta _{*},\hat{\Theta }_n)=O_p(\epsilon ^{1/2}_n)\), where \(\epsilon _n=O_p(n^{-1/2}\delta _{1n}^2)=O_p(n^{-1})\). Thus, \(\vec d_H(\Theta _{*},\hat{\Theta }_n)=O_p(n^{-1/2})\).
Now we consider \(\vec d_H(\hat{\Theta }_n,\Theta _{*})\). We show that (A.28) holds for \(G_n\). For each \(\theta \) and \(s\), let \(L_n(\theta ,s):=\frac{1}{n}\sum _{i=1}^{n}(s(X_{i})-r_{\theta }(X_{i}))^{\prime }W(s(X_{i})-r_{\theta }(X_{i}))\). Let \(s\in \mathcal S ^{\bar{\eta }}_0\) and \(\theta \in U^{\delta _{1n}}(s)\). A second-order Taylor expansion of \(G_n(\theta ,s)=L_n(\theta ,s)-L_n(\theta _{*}(s),s)\) with respect to \(\theta \) at \(\theta _{*}(s)\) gives
with probability approaching 1 for some \(\kappa _2>0\), where \(\bar{\theta }_n(s)\) is a point on the line segment that connects \(\theta \) and \(\theta _{*}(s)\). The last inequality follows from Step 3 of the proof of Theorem 3.1 and Assumption 3.5.
Set \(\tilde{c}_n=0\). Then, Lemma A.3 implies \(\vec d_H(\hat{\Theta }_n,\Theta _{*})=O_p(\delta _{1n}^{1/2}/n^{1/4})\). Setting \(\delta _{1n}=O_p(n^{-1/4})\) refines this rate to \(O_p(n^{-3/8})\). Repeated applications of Lemma A.3 then imply \(\vec d_H(\hat{\Theta }_n,\Theta _{*})=O_p(n^{-1/2})\). As both of the directed Hausdorff distances converge to 0 at the stochastic order of \(n^{-1/2}\), the claim of the theorem follows. \(\square \)
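The rate-improvement recursion can be checked numerically: each application of Lemma A.3 maps a rate \(n^{-a}\) into \(n^{-(a/2+1/4)}\), and the fixed point of \(a\mapsto a/2+1/4\) is \(a=1/2\). A small sketch iterating on the exponents only:

```python
def improve(a: float) -> float:
    """One application of Lemma A.3: a rate n^{-a} becomes
    n^{-(a/2 + 1/4)}, i.e. delta_{1n}^{1/2} / n^{1/4}."""
    return a / 2.0 + 0.25

exponents = [0.25]                # initial rate n^{-1/4}
for _ in range(30):
    exponents.append(improve(exponents[-1]))

print(exponents[1])               # 0.375 -> the n^{-3/8} rate in the text
print(round(exponents[-1], 6))    # 0.5   -> the n^{-1/2} limit
```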
Lemma A.4
Suppose Assumptions 3.2 and 3.4 hold. Then \(\Phi \) is a \(P_0\)-Donsker class.
Proof of Lemma A.4
The proof of Theorem 3.1 shows that each \(f_s\in \Phi \) is Lipschitz in \(s\). For any \(\epsilon >0\), Assumption 3.2 (ii)–(iii), Theorems 2.7.11 and 2.7.2 in van der Vaart and Wellner (1996), and (A.12) imply
where \(C\) is a constant that depends only on \(k,\gamma ,L\), and \({\text{ diam}}(\mathcal X )\). Thus, for any \(\delta >0\),
Example 2.14.4 in van der Vaart and Wellner (1996) ensures that \(\Phi \) is \(P_0\)-Donsker. \(\square \)
1.5 First Stage Estimation
In the following, we work with the population criterion function \(\mathcal Q \), defined for each \(s\in \mathcal S \) by
Lemma A.5
Suppose that Assumption 3.9 (i) holds. Let the criterion function be given as in (A.40). Then, there exists a positive constant \(C_{2}\) such that
Proof of Lemma A.5
Let \(s\in \mathcal S \) be arbitrary. For any \(s_0\in \mathcal S _0\), \(E[\varphi ^{(j)}(X,s_0)]\le 0\) for \(j=1,{\ldots }, l\). Let \(V\) be an open set that contains \(s\) and \(s_0\). By Assumption 3.9 (i) and Theorem 1.7 in Lindenstrauss et al. (2007), it holds that
where \(\tilde{V}_j:=\{g\in V:\dot{\varphi }^{(j)}_{g}\;{\text{ exists}}\}\). Let \(C_2:=\sum _{j=1}^l \sup _{g \in \mathcal S }\Vert \dot{\varphi }^{(j)}_{g}\Vert ^2_{op}\). It holds that \(0<C_2<\infty \) by the hypothesis. We thus obtain
for all \(s_0\in \mathcal S _0\). Note that \(s_0\mapsto \Vert s-s_0\Vert _W\) is continuous and \(\mathcal S _0\) is compact by Assumption 3.2 (ii)–(iii) and Assumption 3.10 (i). Taking infimum over \(\mathcal S _0\) then ensures the desired result. \(\square \)
Lemma A.6
Suppose Assumption 3.9 (ii) holds. Let the criterion function be given as in (A.40). Then there exists a positive constant \(C_{3}\) such that
Proof of Lemma A.6
If \(s\in \mathcal S _0\), the conclusion is immediate. Suppose that \(s\notin \mathcal S _0\). By Assumption 3.9 (ii), there exists \(s_0\in \mathcal S _0\) such that
Let \(C_3:= C_j\). Thus, the claim of the lemma follows. \(\square \)
In the following, let \(\mathcal G :=\{g:g(x)=\varphi _{s}^{(j)}(x),s\in \mathcal S ,j=1,{\ldots } ,l\}\).
Lemma A.7
Suppose Assumptions 3.2, 3.4 , and 3.8 hold. Then \(\mathcal G \) is a \(P_0\)-Donsker class.
Proof of Lemma A.7
By Assumption 3.8, \(\varphi ^{(j)}_s\) is Lipschitz in \(s\). The rest of the proof is the same as that of Lemma A.4. \(\square \)
Proof of Theorem 3.3
We establish the claims of the theorem by applying Theorem B.1 in Santos (2011). Note first that Assumption 3.2 (ii)–(iii) and Assumption 3.10 (i) ensure that \(\mathcal S \) is compact. This ensures condition (i) of Theorem B.1 in Santos (2011). Condition (ii) of Theorem B.1 in Santos (2011) is ensured by Assumption 3.10. Lemma A.7 ensures that uniformly over \(\Theta _n\)
Thus, condition (iii) of Theorem B.1 in Santos (2011) holds with \(C_1=1\) and \(c_{2n}=n^{-1}\). Lemma A.5 ensures that \(\mathcal Q (s)\le \inf _{s_0\in \mathcal S _0}C_2\Vert s-s_0\Vert _W^2\) for some \(C_2>0\). Thus, condition (iv) of Theorem B.1 in Santos (2011) holds with \(\kappa _1=2\). Now, the first claim of Theorem B.1 in Santos (2011) establishes
Furthermore, Lemma A.6 ensures \(\mathcal Q (s)\ge \inf _{s_0\in \mathcal S _0}C_3\Vert s-s_0\Vert ^2\) for some \(C_3>0\). This ensures condition (v) of Theorem B.1 in Santos (2011) with \(\kappa _2=2\). Now, the second claim of Theorem B.1 in Santos (2011) ensures
Since \((b_n/a_n)^{1/2}/\delta _n\rightarrow \infty \), the claim of the theorem follows. \(\square \)
Proof of Corollary 3.1
In what follows, we explicitly show \(\mathcal Q _{n}\)’s dependence on \(\omega \in \Omega \). Let \(\mathcal Q _{n}:\Omega \times \mathcal S \rightarrow \mathbb R \) be defined by \(\mathcal Q _{n}(\omega ,s)=\sum _{j=1}^l(\frac{1}{n}\sum _{i=1}^{n}\varphi ^{(j)}(X_{i}(\omega ),s))_+^2\). By Assumption 2.3, \(\varphi \) is continuous in \(s\) for every \(x\) and measurable for every \(s\). Also note that \(X_i\) is measurable for every \(i\). Thus, by Lemma 4.51 in Aliprantis and Border (2006), \(\mathcal Q _{n}\) is jointly measurable in \((\omega ,s)\) and lower semicontinuous in \(s\) for every \(\omega \). Note that \(\mathcal S \) is compact by Assumptions 3.2 (ii)–(iii) and 3.10 (i), which implies \(\mathcal S \) is locally compact. Since \(\mathcal S \) is a metric space, it is a Hausdorff space. Thus, by Proposition 5.3.6 in Molchanov (2005), \(\mathcal Q _n\) is a normal integrand defined on a locally compact Hausdorff space. Proposition 5.3.10 in Molchanov (2005) then ensures the first claim.
Now we show the second claim using Theorem 3.3 (i). Assumptions 2.1–2.3 hold with \(\varphi \) defined in (5). Assumption 3.2 holds by our hypothesis with \(\gamma =1\). Assumption 3.3 is also satisfied by the hypothesis. Note that for each \(j\), \(\varphi ^{(j)}(x,s)=(y_L-s(z))1_{A_k}(z)\) or \(\varphi ^{(j)}(x,s)=(s(z)-y_U)1_{A_k}(z)\) for some \(k\in \{1,{\ldots }, K\}\). Without loss of generality, let \(j\) be an index for which \(\varphi ^{(j)}(x,s)=(y_L-s(z))1_{A_k}(z)\) for some Borel set \(A_k\). For any \(s,s^{\prime }\in \mathcal S \),
It is straightforward to show the same result for other indexes. Thus, Assumption 3.8 is satisfied.
Now for \(j\) such that \(\varphi ^{(j)}(x,s)=(y_L-s(z))1_{A_k}(z)\), note that
Thus, the Fréchet derivative is given by \(\dot{\varphi }^{(j)}_s(h)=E[h(Z)(-1_{A_k}(Z))]\). By Proposition 6.13 in Folland (1999), the norm of the operator is given by \(\Vert \dot{\varphi }^{(j)}_s\Vert _{op}=E[|-1_{A_k}(Z)|^2]^{1/2}=P_0(Z\in A_k)^{1/2}>0\), which ensures the boundedness (continuity) of the operator. It is straightforward to show the same result for other indexes. Hence, Assumption 3.9 (i) is satisfied. By construction, Assumption 3.10 (i) is satisfied, and Assumption 3.10 (ii) holds with \(\delta _n\asymp J_n^{-1}\) (see Chen 2007). These ensure the conditions of Theorem 3.3 (i). Thus, the second claim follows.
For the third claim, let \(s\in \mathcal S \setminus \mathcal S _0\). Then, there exists \(j\) such that \(E[\varphi ^{(j)}(X_i,s)]>0\). Without loss of generality, suppose that \(E[\varphi ^{(j)}(X_i,s)]=E[(Y_{L,i}-s(Z_i))1_{A_k} (Z_i)]\ge \delta >0\). Let \(s_0\in \mathcal S _0\) be such that
Such \(s_0\) always exists by the intermediate value theorem. Then, for \(j\) with which \(\varphi ^{(j)}(x,s)=(y_L-s(z))1_{A_k}(z)\), it follows that
Thus, we have
where \(C:=\inf _{q\in E}E[q(Z_i)1_{A_k}(Z_i)]\) and \(E:=\{q\in \mathcal S :\Vert q\Vert _W=1,E[q(Z_i)1_{A_k} (Z_i)]>0\}\). Since \(C\) is the minimum value of a linear function over a convex set, it is finite. Furthermore, by the construction of \(E\), it holds that \(C>0\). Thus, Assumption 3.9 (ii) holds. Thus, by Theorem 3.3 (ii), the third claim follows. \(\square \)
Proof of Corollary 3.2
We show the claim of the corollary using Theorem 3.2. Note that we have shown, in the proof of Corollary 3.1, that Assumptions 2.1–2.3, 3.2 (i)–(iii), and 3.3 hold. Thus, to apply Theorem 3.2, it remains to show Assumptions 2.4, 3.2 (iv), and 3.4–3.7.
Assumption 2.4 is satisfied by the parameterization \(r_\theta (z)=\theta ^{(1)}+\theta ^{(2)}z\). For Assumption 3.2 (iv), note that \(\mathcal R _\Theta \) is given by
Since \(\Theta \) is convex, for any \(\lambda \in [0,1]\), it holds that \(\lambda r_\theta +(1-\lambda )r_{\theta ^{\prime }}=r_{\lambda \theta +(1-\lambda )\theta ^{\prime }}\in \mathcal R _\Theta \). Thus, Assumption 3.2 (iv) is satisfied. For Assumption 3.4, note first that \(r_\theta \) is twice continuously differentiable on the interior of \(\Theta \). Because \(r_\theta \) is linear, \(\max _{|\alpha |\le 2}|D^{\alpha }_\theta r_\theta (z)-D^{\alpha }_\theta r_{\theta ^{\prime }}(z)|\le (1+z^2)^{1/2}\Vert \theta -\theta ^{\prime }\Vert \) by the Cauchy–Schwarz inequality. By the compactness of \(\mathcal Z \), \(C(z):= (1+z^2)^{1/2}\) is bounded. Thus, Assumption 3.4 (i) is satisfied. Similarly, \(\max _{|\alpha |\le 2}\sup _{\theta \in \Theta }|D^\alpha _\theta r_\theta |\le \max \{1,|z|,C(1+z^2)^{1/2}\}=:R(z)\), where \(C:=\sup _{\theta \in \Theta }\Vert \theta \Vert \). By the compactness of \(\mathcal Z \) and \(\Theta \), \(R\) is bounded. Thus, Assumption 3.4 (ii) is satisfied. Note that the Hessian of \(Q(\theta ,s)\) with respect to \(\theta \) is given by \(2E[(1,Z)(1,Z)^{\prime }]\), which depends on neither \(\theta \) nor \(s\) and is positive definite by the assumption that \(Var(Z)>0\). Thus, Assumption 3.5 is satisfied. Assumptions 3.6 and 3.7 are ensured by Corollary 3.1. Now the conditions of Theorem 3.2 are satisfied. Thus, the claim of the corollary follows. \(\square \)
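The positive-definiteness claim for the Hessian \(2E[(1,Z)(1,Z)^{\prime }]\) can be illustrated numerically; this is a sketch with hypothetical data, not part of the proof:

```python
import numpy as np

def hessian(z: np.ndarray) -> np.ndarray:
    """Sample analog of 2*E[(1,Z)(1,Z)'] for the linear
    parameterization r_theta(z) = theta1 + theta2 * z."""
    X = np.column_stack([np.ones_like(z), z])
    return 2.0 * (X.T @ X) / len(z)

z = np.array([0.0, 0.5, 1.0, 1.5, 2.0])    # Var(Z) > 0
eigs = np.linalg.eigvalsh(hessian(z))
print(eigs.min() > 0)                       # True: positive definite Hessian

z_const = np.full(5, 1.0)                   # Var(Z) = 0: rank-deficient case
eigs_const = np.linalg.eigvalsh(hessian(z_const))
print(abs(eigs_const.min()) < 1e-12)        # True: singular Hessian
```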
© 2013 Springer Science+Business Media New York
Kaido, H., White, H. (2013). Estimating Misspecified Moment Inequality Models. In: Chen, X., Swanson, N. (eds) Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1653-1_13
Print ISBN: 978-1-4614-1652-4
Online ISBN: 978-1-4614-1653-1