
Abstract

This chapter studies partially identified structures defined by a finite number of moment inequalities. When the moment function is misspecified, the conventional identified set becomes difficult to interpret; more seriously, it can be empty. We define a pseudo-true identified set whose elements can be interpreted as the least-squares projections of the moment functions that are observationally equivalent to the true moment function. We then construct a set estimator for the pseudo-true identified set and establish its \(O_{p}(n^{-1/2})\) rate of convergence.


Notes

  1.

    Here, we take the indicators (or instruments) \(1_A(z)\) as given. The indicators \(1_{A}(z)\) could be replaced by any finite vector of measurable non-negative functions of \(z\). Andrews and Shi (2011) give examples of such functions.

  2.

    The players do not need to know the \(F\)’s, but these are important to the econometrician.

  3.

    For this example, \(\Theta _I\) is never empty as long as the number (\(2K\)) of moment inequalities equals the number of parameters \((\ell )\).

  4.

    We are indebted to an anonymous referee for pointing out a relationship between BMM’s framework and ours. General incomplete linear moment restrictions are given by \(E[V(Z^{\prime }\theta -Y)]=E[Vu(V)]\), where \(V\) is a vector of random variables, and \(u\) is an unknown bounded function. See BMM for details.

  5.

    Their framework does not consider misspecification. Their object of interest is therefore the conventional identified set \(\Theta _I\). In our setting, the sample criterion function degenerates, i.e., \(Q_n(\theta ,s)=0\), on a neighborhood of \(\Theta _{*}\times \mathcal S _0\) under Assumption 3.2 (iv).

  6.

    We are indebted to an anonymous referee for this point.

  7.

    Since the mean value theorem applies only element by element to the vector in (A.8), the mean value \(\bar{\theta }_n\) differs across elements. For notational simplicity, we write a single \(\bar{\theta }_n\) in what follows, but it should be understood that these values differ from element to element. For the measurability of these mean values, see, for example, Jennrich (1969).

References

  • Ai, C., and X. Chen (2003): “Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions”, Econometrica, 71(6), 1795–1843.

  • Aliprantis, C. D., and K. C. Border (2006): Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, Berlin.

  • Andrews, D. W. K. (1994): “Empirical Process Methods in Econometrics”, in Handbook of Econometrics, vol. 4, pp. 2247–2294. Elsevier, Amsterdam.

  • Andrews, D. W. K., and X. Shi (2011): “Inference for Parameters Defined by Conditional Moment Inequalities”, Discussion Paper, Yale University.

  • Bajari, P., C. L. Benkard, and J. Levin (2007): “Estimating Dynamic Models of Imperfect Competition”, Econometrica, 75(5), 1331–1370.

  • Bontemps, C., T. Magnac, and E. Maurin (2011): “Set Identified Linear Models”, CeMMAP Working Paper.

  • Chen, X. (2007): “Large Sample Sieve Estimation of Semi-Nonparametric Models”, in Handbook of Econometrics, vol. 6, pp. 5549–5632.

  • Chernozhukov, V., H. Hong, and E. Tamer (2007): “Estimation and Confidence Regions for Parameter Sets in Econometric Models”, Econometrica, 75(5), 1243–1284.

  • Ciliberto, F., and E. Tamer (2009): “Market Structure and Multiple Equilibria in Airline Markets”, Econometrica, 77(6), 1791–1828.

  • Folland, G. (1999): Real Analysis: Modern Techniques and Their Applications, vol. 40. Wiley-Interscience, New York.

  • Guggenberger, P., J. Hahn, and K. Kim (2008): “Specification Testing under Moment Inequalities”, Economics Letters, 99(2), 375–378.

  • Ichimura, H., and S. Lee (2010): “Characterization of the Asymptotic Distribution of Semiparametric M-Estimators”, Journal of Econometrics, 159(2), 252–266.

  • Jennrich, R. I. (1969): “Asymptotic Properties of Nonlinear Least Squares Estimators”, Annals of Mathematical Statistics, 40(2), 633–643.

  • Kaido, H., and H. White (2010): “A Two-Stage Approach for Partially Identified Models”, Discussion Paper, University of California, San Diego.

  • Lindenstrauss, J., D. Preiss, and J. Tiser (2007): “Differentiability of Lipschitz Maps”, in Banach Spaces and Their Applications in Analysis, pp. 111–123.

  • Luttmer, E. G. J. (1996): “Asset Pricing in Economies with Frictions”, Econometrica, 64(6), 1439–1467.

  • Manski, C. F., and E. Tamer (2002): “Inference on Regressions with Interval Data on a Regressor or Outcome”, Econometrica, 70(2), 519–546.

  • Molchanov, I. S. (2005): Theory of Random Sets. Springer, Berlin.

  • Newey, W. K. (1994): “The Asymptotic Variance of Semiparametric Estimators”, Econometrica, 62(6), 1349–1382.

  • Newey, W. K., and D. McFadden (1994): “Large Sample Estimation and Hypothesis Testing”, in Handbook of Econometrics, vol. 4, pp. 2111–2245.

  • Pakes, A. (2010): “Alternative Models for Moment Inequalities”, Econometrica, 78(6), 1783–1822.

  • Pakes, A., J. Porter, K. Ho, and J. Ishii (2006): “Moment Inequalities and Their Application”, Working Paper, Harvard University.

  • Ponomareva, M., and E. Tamer (2010): “Misspecification in Moment Inequality Models: Back to Moment Equalities?”, Econometrics Journal, 10, 1–21.

  • Santos, A. (2011): “Instrumental Variables Methods for Recovering Continuous Linear Functionals”, Journal of Econometrics, 161, 129–146.

  • Sherman, R. P. (1993): “The Limiting Distribution of the Maximum Rank Correlation Estimator”, Econometrica, 61(1), 123–137.

  • Tamer, E. (2003): “Incomplete Simultaneous Discrete Response Model with Multiple Equilibria”, Review of Economic Studies, 70(1), 147–165.

  • van der Vaart, A. W., and J. A. Wellner (1996): Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.

  • White, H. (1982): “Maximum Likelihood Estimation of Misspecified Models”, Econometrica, 50(1), 1–25.


Mathematical Proofs

1.1 Notation

Throughout the appendix, let \(\Vert \cdot \Vert \) denote the usual Euclidean norm. For each \(s,s^{\prime }\in \mathcal S \), let \(\rho (s,s^{\prime }):=\sup _{x\in \mathcal X }\max _{j=1,{\ldots } ,l}|s^{(j)}(x)-s^{\prime (j)}(x)|\). For each \(a\times b\) matrix \(A\), let \(\Vert A\Vert _{op}:=\min \{c:\Vert Av\Vert \le c\Vert v\Vert ,v\in \mathbb R ^{b}\}\) be the operator norm. For any symmetric matrix \(A\), let \(\xi (A)\) denote the smallest eigenvalue of \(A\).

For a given pseudometric space \((T,\rho )\), let \(N(\epsilon ,T,\rho )\) be the covering number, i.e., the minimal number of \(\epsilon \)-balls needed to cover \(T\). For each measurable function \(f:\mathcal X \rightarrow \mathbb R \) and \(1\le p<\infty \), let \(\Vert f\Vert _{L^{p}}:=E[|f(X)|^p]^{1/p}\) provided that the integral exists. Similarly, let \(\Vert f\Vert _\infty :=\inf \{c:P(|f(X)|>c)=0\}\). For a given function space \(\mathcal G \) equipped with a norm \(\Vert \cdot \Vert _{\mathcal G }\) and \(l,u\in \mathcal G \), let \([l,u]:=\{f\in \mathcal G :l\le f\le u\}\). For each \(f\in \mathcal G \), let \(B_{\epsilon ,f}:=\{[l,u]:l\le f\le u,\Vert l-u\Vert _{\mathcal G }<\epsilon \}\) be the set of \(\epsilon \)-brackets containing \(f\). The bracketing number \(N_{[\,]}(\epsilon ,\mathcal G ,\Vert \cdot \Vert _{\mathcal G })\) is the minimum number of \(\epsilon \)-brackets needed to cover \(\mathcal G \). An envelope function \(G\) of a function class \(\mathcal G \) is a measurable function such that \(|g(x)|\le G(x)\) for all \(g\in \mathcal G \) and all \(x\in \mathcal X \). For each \(\delta >0\), the bracketing integral of \(\mathcal G \) with an envelope function \(G\) is defined as \(J_{[]}(\delta ,\mathcal G ,\Vert \cdot \Vert _{\mathcal G }):=\int _0^\delta \sqrt{1+\ln N_{[]}(\epsilon \Vert G\Vert _{\mathcal G },\mathcal G ,\Vert \cdot \Vert _{\mathcal G })}d\epsilon \).
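As a supplementary worked example (ours, not the chapter's): if the bracketing entropy grows polynomially, say \(\ln N_{[]}(\epsilon ,\mathcal G ,\Vert \cdot \Vert _{\mathcal G })\le C\epsilon ^{-r}\) with \(r<2\) (the shape delivered for Hölder balls such as \(\mathcal C ^{\gamma }_M(\mathcal X )\) below), then, using \(\sqrt{1+x}\le 1+\sqrt{x}\),

$$\begin{aligned} J_{[]}(\delta ,\mathcal G ,\Vert \cdot \Vert _{\mathcal G })\le \int _0^\delta \big (1+\sqrt{C}\,\epsilon ^{-r/2}\big )d\epsilon =\delta +\frac{\sqrt{C}\,\delta ^{1-r/2}}{1-r/2}<\infty . \end{aligned}$$

This is the calculation used for (A.39) below, where \(r=k/\gamma \).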

1.2 Projection

Proof of Proposition 2.1.

Note that under the conditions of Example 2.1, Assumption 2.3 holds. This ensures \(\mathcal S _0\) is nonempty. By Eq. (13), \(\Theta _{*}\) is nonempty. Furthermore, let \(\theta \in \Theta _I,\) and for each \(z\in \mathcal Z \), let \(r_\theta (z):=z^{\prime }\theta \). Note that \(r_\theta \in \mathcal S _0\). Thus, (13) holds with \(s=r_\theta \), which ensures the first claim.

For the second claim, note that the condition \(E[Y_U|Z]=E[Y_L|Z]=Z^{\prime }\theta _0\) a.s. implies that any \(\theta \in \Theta _I\) must satisfy

$$\begin{aligned} E[Z1\{Z\in A_j\}]^{\prime }(\theta _0-\theta )= 0,\quad j=1,2,{\ldots },K. \end{aligned}$$
(A.1)

By the rank condition on \(D\), the unique solution to (A.1) is \(\theta _0-\theta =0\). Thus, \(\{\theta _0\}=\Theta _I\). Since \(\{\theta _0\}\subseteq \Theta _{*}\) by the first claim, it suffices to show that \(\theta _0\) is the unique element of \(\Theta _{*}\). For this, note that under our assumptions, \(\mathcal S _0=\{s_0\}\) with \(s_0(z)=z^{\prime }\theta _0\). Thus, \(\Theta _{*}=\{\theta _0\}\). This completes the proof.\(\square \)

1.3 Consistency of the Parametric Part

For each \(s\in \mathcal S \), let \(\theta _{\ast }(s):=\mathop{\rm arg\,min}_{\theta \in \Theta }Q(\theta ,s)\) and \(\hat{\theta}_{n}(s):=\mathop{\rm arg\,min}_{\theta \in \Theta}Q_{n}(\theta ,s)\).
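To fix ideas about the profiled minimizer \(\hat{\theta }_{n}(s)\): when \(r_\theta (x)=D(x)\theta \) is linear in \(\theta \) (as in Corollary 3.2, where \(r_\theta (z)=\theta ^{(1)}+\theta ^{(2)}z\)), \(\hat{\theta }_{n}(s)\) solves the \(W\)-weighted normal equations in closed form. A minimal numerical sketch under this linearity assumption (all helper names are ours, not the chapter's):

```python
import numpy as np

def theta_hat(s_vals, design, W):
    """Profiled minimizer of Q_n(theta, s) for linear r_theta(x) = D(x) theta:
    solves the W-weighted normal equations
      (sum_i D_i' W D_i) theta = sum_i D_i' W s(X_i).
    s_vals : (n, L) values s(X_i); design : (n, L, p) matrices D(X_i); W : (L, L)."""
    p = design.shape[2]
    A, b = np.zeros((p, p)), np.zeros(p)
    for Di, si in zip(design, s_vals):
        A += Di.T @ W @ Di
        b += Di.T @ W @ si
    return np.linalg.solve(A, b)

# Example with L = 1 and r_theta(z) = theta1 + theta2 * z:
rng = np.random.default_rng(0)
z = rng.uniform(size=200)
design = np.stack([np.ones_like(z), z], axis=1)[:, None, :]   # (n, 1, 2)
s_vals = (0.5 + 2.0 * z)[:, None]                             # s(z) = 0.5 + 2z
print(theta_hat(s_vals, design, np.eye(1)))                   # approx [0.5, 2.0]
```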

Lemma A.1

Suppose that Assumptions 3.4 and 3.2 (iv) hold. Then, (i) there exists a function \(C_{1}:\mathcal X \rightarrow \mathbb R _{+}\) such that, for each \(x\in \mathcal X \) and any \(s,s^{\prime }\in \mathcal S \),

$$\begin{aligned} \Big \Vert r_{\theta _{*}(s)}(x)-r_{\theta _{*}(s^{\prime })}(x)\Big \Vert \le C_{1}(x)\rho (s,s^{\prime }); \end{aligned}$$
(A.2)

(ii) there exists a function \(C_{2}:\mathcal X \rightarrow \mathbb R _{+}\) such that, for each \(x\in \mathcal X \), \(j=1,{\ldots } ,L,\) and any \(s,s^{\prime }\in \mathcal S \),

$$\begin{aligned} \Big \Vert \nabla _{\theta }^{(j)}r_{\theta _{*}(s)}(x)-\nabla _{\theta }^{(j)}r_{\theta _{*}(s^{\prime })}(x)\Big \Vert \le C_{2}(x)\rho (s,s^{\prime }). \end{aligned}$$
(A.3)

Proof of Lemma A.1

Assumption 3.4 ensures that

$$\begin{aligned} \Big \Vert r_{\theta _{*}(s)}(x)-r_{\theta _{*}(s^{\prime })}(x)\Big \Vert \le L^{1/2} C(x) \Big \Vert \theta _{*}(s)-\theta _{*}(s^{\prime })\Big \Vert . \end{aligned}$$
(A.4)

Assumption 3.2 (iv) ensures that for each \(s\in L^2_{\mathcal S ,L}\), \(\theta _{*}(s)=\Pi _{\mathcal R _\Theta }s\) is uniquely determined, where \(\Pi _{\mathcal R _\Theta }\) is the projection mapping from the Hilbert space \(L^2_{\mathcal S ,L}\) to the closed convex subset \(\mathcal R _{\Theta }\). Furthermore, Lemma 6.54 (d) in Aliprantis and Border (2006) and the fact that \(\rho \) is stronger than \(\Vert \cdot \Vert _W\) imply

$$\begin{aligned} \Big \Vert \theta _{*}(s)-\theta _{*}(s^{\prime })\Big \Vert \le \Big \Vert s-s^{\prime }\Big \Vert _{W}\le c\rho (s,s^{\prime }), \end{aligned}$$
(A.5)

for some \(c>0\). Combining (A.4) and (A.5) ensures (i). Similarly, Assumption 3.4 ensures that for each \(x\in \mathcal X \)

$$\begin{aligned} \Big \Vert \nabla ^{(j)}_\theta r_{\theta _{*}(s)}(x)-\nabla ^{(j)}_\theta r_{\theta _{*}(s^{\prime })}(x)\Big \Vert \le J^{1/2} C(x) \Big \Vert \theta _{*}(s)-\theta _{*}(s^{\prime })\Big \Vert . \end{aligned}$$
(A.6)

Combining (A.5) and (A.6) ensures (ii). \(\square \)

Proof of Theorem 3.1

Step 1: Let \(s\in \mathcal S \) be given. For each \(\theta \in \Theta \), let \(Q_s(\theta ):=Q(\theta ,s)\) and \(Q_{n,s}(\theta ):=Q_n(\theta ,s)\). By Assumption 3.2 (iv) and Theorem 6.53 in Aliprantis and Border (2006), \(Q_s\) is uniquely minimized at \(\theta _{*}(s)\). By Assumption 3.2 (i), \(\Theta \) is compact. By Assumption 3.2, \(Q_s\) is continuous. Furthermore, Assumption 3.4 ensures the applicability of the uniform law of large numbers. Thus, \(\sup _{\theta \in \Theta }|Q_{n,s}(\theta )-Q_s(\theta )|=o_p(1)\). Hence, by Theorem 2.1 in Newey and McFadden (1994), \(\hat{\theta }_n(s)-\theta _{*}(s)=o_p(1)\).

By Assumptions 3.2 (v), 3.4 (ii), and the fact that \(\hat{\theta }_n(s)\) is consistent for \(\theta _{*}(s)\), \(\hat{\theta }_n(s)\) solves the first order condition:

$$\begin{aligned} \nabla _\theta Q_n(\theta ,s)=\frac{1}{n}\sum _{i=1}^n \nabla _\theta r_\theta (X_i)^{\prime }W(s(X_i)-r_\theta (X_i))=0, \end{aligned}$$
(A.7)

with probability approaching one. Expanding this condition at \(\theta _{*}(s)\) using the mean-value theorem applied to each element of \(\nabla _\theta Q_n(\theta ,s)\) yields

$$\begin{aligned} \nabla ^2_\theta Q_n(\bar{\theta }_n(s),s)(\hat{\theta }_n(s)-\theta _{*}(s))=\frac{1}{n} \sum _{i=1}^n \nabla _\theta r_{\theta _{*}(s)}(X_i)^{\prime }W(s(X_i)-r_{\theta _{*}(s)}(X_i)), \end{aligned}$$
(A.8)

where \(\bar{\theta }_n(s)\) lies on the line segment that connects \(\hat{\theta }_n(s)\) and \(\theta _{*}(s)\) (see footnote 7). For each \(s\in \mathcal S _0^{\bar{\eta }}\), let

$$\begin{aligned} \psi _s(x):=\nabla _\theta r_{\theta _{*}(s)}(x)^{\prime }W(s(x)-r_{\theta _{*}(s)}(x)). \end{aligned}$$
(A.9)

Below, we show that the function class \(\Psi :=\{f_s:f_s=\psi ^{(j)}_s, s\in \mathcal S _0^{\bar{\eta }}, j=1,2,{\ldots },J\}\) is a Glivenko–Cantelli class.

By Assumption 3.4 (ii), Lemma A.1, the triangle inequality, and the Cauchy–Schwarz inequality, for any \(s,s^{\prime }\in \mathcal S \),

$$\begin{aligned} |\psi ^{(j)}_s(x)-\psi ^{(j)}_{s^{\prime }}(x)|\le&\;\Big \Vert (\nabla ^{(j)}_\theta r_{\theta _{*}(s)}(x)-\nabla ^{(j)}_\theta r_{\theta _{*}(s^{\prime })}(x))^{\prime }W \Big \Vert \nonumber \\&\;\times \Big \Vert s(x)-r_{\theta _{*}(s)}(x)\Big \Vert +\Big \Vert \nabla ^{(j)}_\theta r_{\theta _{*}(s^{\prime })}(x)^{\prime }W\Big \Vert \nonumber \\&\;\times \Big \Vert [s(x)-s^{\prime }(x)]+[r_{\theta _{*}(s^{\prime })}(x)-r_{\theta _{*}(s)}(x)]\Big \Vert \nonumber \\ \le&\; (C_2(x)\Vert W\Vert _{op}(M+R(x))+(1+C_1(x))\Vert W\Vert _{op}R(x))\nonumber \\&\qquad \qquad \qquad \qquad \quad \;\times \sup _{x\in \mathcal X }\Big \Vert s(x)-s^{\prime }(x)\Big \Vert \nonumber \\ \le&\; F(x)\rho (s,s^{\prime }), \end{aligned}$$
(A.10)

where \(F(x):=(C_2(x)\Vert W\Vert _{op}(M+R(x))+(1+C_1(x))\Vert W\Vert _{op}R(x))\times \sqrt{L}\). For any \(\epsilon >0\), let \(u:=\epsilon /(2\Vert F\Vert _{L^1})\). By Theorem 2.7.11 in van der Vaart and Wellner (1996) and Assumption 3.2 (ii), we obtain

$$\begin{aligned} N_{[]}(\epsilon ,\Psi ,\Vert \cdot \Vert _{L^1})&=N_{[]}(2 u\Vert F\Vert _{L^1},\Psi ,\Vert \cdot \Vert _{L^1})\nonumber \\&\le N(u,\mathcal S _{0}^{\bar{\eta }},\rho ). \end{aligned}$$
(A.11)

For each \(j=1,{\ldots },L\), let \(\mathcal S _0^{\bar{\eta },(j)}:=\{s^{(j)}:s\in \mathcal S _0^{\bar{\eta }}\}\). For each \(j\), \(g\in \mathcal S _0^{\bar{\eta },(j)},\) and \(\epsilon >0\), let \(B^{(j)}_\epsilon (g):=\{f\in \mathcal S _0^{\bar{\eta },(j)}:\Vert f-g\Vert _{\infty }<\epsilon \}\). Similarly, for each \(s\in \mathcal S _0^{\bar{\eta }}\), let \(B_{u,\rho }(s):=\{f\in \mathcal S _0^{\bar{\eta }}:\rho (f,s)<u\}\). As we will show below, \(N_j:=N(u,\mathcal S _0^{\bar{\eta },(j)},\Vert \cdot \Vert _\infty )\) is finite for all \(j\). Thus, for each \(j\) there exist \(f_{1,j},{\ldots },f_{N_j,j}\in \mathcal S _0^{\bar{\eta },(j)}\) such that \(\mathcal S _0^{\bar{\eta },(j)}\subseteq \bigcup _{l=1}^{N_j}B_{u}^{(j)}(f_{l,j})\). We can then obtain a grid of distinct points \(f_1,{\ldots },f_N\) such that \(f_i^{(j)}=f_{l,j}\) for some \(1\le l\le N_j\), where \(N=\prod _{j=1}^L N_j\). Then, by the definition of \(\rho \), \(\mathcal S _0^{\bar{\eta }}\subseteq \bigcup _{i=1}^N B_{u,\rho }(f_i)\). Thus,

$$\begin{aligned} N\big (u,\mathcal S _{0}^{\bar{\eta }},\rho \big )\le \prod _{j=1}^{L}N\big (u,\mathcal S _0^{\bar{\eta },(j)},\Vert \cdot \Vert _\infty \big )\le N\big (u,\mathcal C ^\gamma _M(\mathcal X ),\Vert \cdot \Vert _\infty \big )^{L}<\infty , \end{aligned}$$
(A.12)

where the last inequality follows from Assumption 3.2 (ii)–(iii) and Theorem 2.7.1 in van der Vaart and Wellner (1996). By Theorem 2.4.1 in van der Vaart and Wellner (1996), \(\Psi \) is a Glivenko–Cantelli class.

Note that, by Assumptions 3.2 (v) and 3.4, \(\theta _{*}(s)\) solves the population analog of (A.7). Thus,

$$\begin{aligned} E[ \nabla _\theta r_{\theta _{*}(s)}(X_i)^{\prime }W(s(X_i)-r_{\theta _{*}(s)}(X_i))]=E[\psi _{s}(X_i)]=0. \end{aligned}$$
(A.13)

These results, together with the uniform law of large numbers whose applicability is ensured by Assumptions 3.3 and 3.4 (ii), imply

$$\begin{aligned} \sup _{s\in \mathcal S _{0}^{\bar{\eta }}}\left|\frac{1}{n}\sum _{i=1}^n\psi ^{(j)}_s(X_i)\right|=o_p(1),\quad j=1,{\ldots }, J. \end{aligned}$$
(A.14)

Step 2: In this step, we show that the Hessian \(\nabla ^2_\theta Q_n(\theta ,s)\) is invertible with probability approaching 1 uniformly over \(\mathcal N _{\bar{\epsilon },\bar{\eta }}\). Let \(\mathcal H :=\{h_{\theta ,s}:\mathcal X \rightarrow \mathbb R :h_{\theta ,s}(x)=H^{(i,j)}_W(\theta ,s,x)+2\nabla _\theta r^{(i)}_{\theta }(x)^{\prime }W\nabla _\theta r^{(j)}_{\theta }(x), 1\le i,j\le p,\theta \in \Theta ,s\in \mathcal S _0^{\bar{\eta }}\}\). Note that \(h_{\theta ,s}\) takes the form:

$$\begin{aligned} h_{\theta ,s}(x)&=2\sum _{k=1}^L\sum _{h=1}^L\frac{\partial ^2 r_\theta ^{(h)}(x)}{\partial \theta _i\partial \theta _j}W^{(h,k)}\big (s^{(k)}(x)-r_\theta ^{(k)}(x)\big )\\&\quad +2\sum _{k=1}^L\sum _{h=1}^L\frac{\partial r_\theta ^{(h)}(x)}{\partial \theta _i}W^{(h,k)}\frac{\partial r_\theta ^{(k)}(x)}{\partial \theta _j} \end{aligned}$$

for some \(1\le i,j\le p, \theta \in \Theta \), and \(s\in \mathcal S ^{\bar{\eta }}_0\). Consider the function classes \(\mathcal F _1:=\{D^\alpha _\theta r^{(k)}_\theta :\theta \in \Theta ,|\alpha |\le 2,k=1,{\ldots },L\}\) and \(\mathcal F _2:=\{s^{(k)}:s\in \mathcal S _0^{\bar{\eta }},k=1,{\ldots },L\}\). Assumptions 3.2 (i), 3.4, and Theorem 2.7.11 in van der Vaart and Wellner (1996) ensure \(N_{[]}(\epsilon ,\mathcal F _1,\Vert \cdot \Vert _{L^2})\le N(u,\Theta ,\Vert \cdot \Vert )<\infty \) with \(u:=\epsilon /(2\Vert C\Vert _{L^2})\). Assumption 3.2 (ii)–(iii) and Corollary 2.7.2 in van der Vaart and Wellner (1996) ensure \(N_{[]}(\epsilon ,\mathcal F _2,\Vert \cdot \Vert _{L^2})\le N_{[]}(\epsilon , \mathcal C ^\gamma _M(\mathcal X ),\Vert \cdot \Vert _{L^2})<\infty \). Since \(\mathcal H \) can be obtained by combining functions in \(\mathcal F _1\) and \(\mathcal F _2\) by additions and pointwise multiplications, Theorem 6 in Andrews (1994) implies \(N_{[]}(\epsilon ,\mathcal H ,\Vert \cdot \Vert _{L^2})<\infty \). This bracketing number is given in terms of the \(L^2\)-norm, but we can also obtain a bracketing number in terms of the \(L^1\)-norm. For this, let \(h_1,{\ldots },h_p\) be the centers of \(\Vert \cdot \Vert _{L^2}\)-balls that cover \(\mathcal H \). Then, the brackets \([h_i-\epsilon ,h_i+\epsilon ]\), \(i=1,{\ldots },p\), cover \(\mathcal H \), and each bracket has \(\Vert \cdot \Vert _{L^1}\)-length at most \(2\epsilon \). Thus, \(N_{[]}(\epsilon ,\mathcal H ,\Vert \cdot \Vert _{L^1})<\infty \). By Theorem 2.4.1 in van der Vaart and Wellner (1996), \(\mathcal H \) is a Glivenko–Cantelli class. Hence, uniformly over \(\Theta \times \mathcal S _0^{\bar{\eta }}\),

$$\begin{aligned} \nabla ^2_\theta Q_n(\theta ,s)&=\frac{1}{n}\sum _{i=1}^n H_W(\theta ,s,X_i)+ 2\nabla _\theta r_{\theta }(X_i)^{\prime }W\nabla _\theta r_{\theta }(X_i)\nonumber \\&\stackrel{p}{\rightarrow }E[H_W(\theta ,s,X_i)+ 2\nabla _\theta r_{\theta }(X_i)^{\prime }W\nabla _\theta r_{\theta }(X_i)]. \end{aligned}$$
(A.15)

Note that \(d_{H,W}(\hat{\mathcal S }_n,\mathcal S _0)=o_p(1)\) by Assumption 3.6. Thus, \((\bar{\theta }_n(s),s)\in \mathcal N _{\bar{\epsilon },\bar{\eta }}\) with probability approaching one. By Assumption 3.5 and (A.15), there exists \(\delta >0\) such that \(\nabla ^2_\theta Q_n(\bar{\theta }_n(s),s)\)’s smallest eigenvalue is above \(\delta \) uniformly over \(\mathcal N _{\bar{\epsilon },\bar{\eta }}\). Thus, the Hessian \(\nabla ^2_\theta Q_n(\bar{\theta }_n(s),s) \) in (A.8) is invertible with probability approaching 1.

Step 3: Steps 1–2 imply that, uniformly over \(\mathcal S _0^{\bar{\eta }}\),

$$\begin{aligned} \Vert \theta _{*}(s)-\hat{\theta }_n(s^{\prime })\Vert&=\Vert \theta _{*}(s)-\theta _{*}(s^{\prime })+\theta _{*}(s^{\prime })-\hat{\theta }_n(s^{\prime })\Vert \nonumber \\&\le \Vert \theta _{*}(s)-\theta _{*}(s^{\prime })\Vert +2\delta ^{-1}\sup _{s\in \mathcal S _{0}^{\bar{\eta }}}\left\Vert\frac{1}{n}\sum _{i=1}^n\psi _s(X_i)\right\Vert\nonumber \\&\le \Vert s-s^{\prime }\Vert _{W}+o_p(1), \end{aligned}$$
(A.16)

where we used the fact that \(\Vert \theta _{*}(s)-\theta _{*}(s^{\prime })\Vert \le \Vert s-s^{\prime }\Vert _{W}\) by Lemma 6.54 (d) in Aliprantis and Border (2006).

Step 4: Finally, note that by Step 3,

$$\begin{aligned} \vec d_H(\Theta _{*},\hat{\Theta }_n)&=\sup _{\theta \in \Theta _{*}}\inf _{\theta ^{\prime }\in \hat{\Theta }_n}\Vert \theta -\theta ^{\prime }\Vert =\sup _{s\in \mathcal S _0}\inf _{s^{\prime }\in \hat{\mathcal S }_n}\Vert \theta _{*}(s)-\hat{\theta }_n(s^{\prime })\Vert \nonumber \\&\le \sup _{s\in \mathcal S _0}\inf _{ s^{\prime }\in \hat{\mathcal S }_n}\Vert s-s^{\prime }\Vert _W +o_p(1)\end{aligned}$$
(A.17)
$$\begin{aligned} \vec d_H(\hat{\Theta }_n,\Theta _{*})&=\sup _{\theta ^{\prime }\in \hat{\Theta }_n}\inf _{\theta \in \Theta _{*}}\Vert \theta -\theta ^{\prime }\Vert =\sup _{s^{\prime }\in \hat{\mathcal S }_n}\inf _{s\in \mathcal S _0}\Vert \theta _{*}(s)-\hat{\theta }_n(s^{\prime })\Vert \nonumber \\&\le \sup _{s^{\prime }\in \hat{\mathcal S }_n}\inf _{s\in \mathcal S _0}\Vert s-s^{\prime }\Vert _W+o_p(1). \end{aligned}$$
(A.18)

Equation (18) and Assumption 3.6 then ensure the desired result. \(\square \)
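As a computational aside (ours, not the chapter's): the directed Hausdorff distances that Theorems 3.1 and 3.2 control can be evaluated on finite grids of parameter values as follows:

```python
import numpy as np

def directed_hausdorff(A, B):
    """Directed distance: sup_{a in A} inf_{b in B} ||a - b||,
    for finite point sets A of shape (m, p) and B of shape (k, p)."""
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # (m, k)
    return dists.min(axis=1).max()

def hausdorff(A, B):
    """d_H(A, B) = max of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Example: two interval-shaped sets in R^1
A = np.linspace(0.0, 1.0, 50)[:, None]
B = np.linspace(0.1, 1.2, 50)[:, None]
print(directed_hausdorff(A, B), directed_hausdorff(B, A), hausdorff(A, B))
```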

1.4 Convergence Rate

The following lemma controls the rate at which \(\hat{\Theta }_n\) covers \(\Theta _{*}.\) Given a sequence \(\{\eta _{n}\}\) such that \(\eta _{n}\rightarrow 0\), we let \(V^{\eta _n}(s):=\{\theta ^{\prime }:\Vert \theta ^{\prime }-\theta _{*}(s)\Vert \le e_n\}\), where \(e_n=O_p(\eta _n)\), and let \(\mathcal N _{\eta _n,0}:=\{(\theta ,s):\theta \in V^{\eta _n}(s),s\in \mathcal S _0\}\).

Lemma A.2

Suppose Assumptions 2.1–2.3, 3.1–3.2, and 3.6 hold. Let \(\{\delta _{1n}\}\) and \(\{\epsilon _{n}\}\) be sequences of non-negative numbers converging to 0 as \(n\rightarrow \infty \). Let \(G:\Theta \times \mathcal S \rightarrow \mathbb R _{+}\) be a function such that \(G\) is jointly measurable and lower semicontinuous. For each \(n\), let \(G_{n}:\Omega \times \Theta \times \mathcal S \rightarrow \mathbb R \) be a function such that for each \(\omega \in \Omega \), \(G_{n}(\omega ,\cdot ,\cdot )\) is jointly measurable and lower semicontinuous, and for each \((\theta ,s)\in \Theta \times \mathcal S \), \(G_{n}(\cdot ,\theta ,s)\) is measurable. Let \(\Theta _{*}:=\{\theta \in \Theta :G(\theta ,s)=0,s\in \mathcal S _{0}\}\) and \(\hat{\Theta }_{n}:=\{\theta \in \Theta :G_{n}(\theta ,s)\le \inf _{\theta ^{\prime }\in \Theta }G_{n}(\theta ^{\prime },s)+c_{n},s\in \hat{\mathcal S }_{n}\}\). Suppose that \(d_{H}(\hat{\Theta }_{n},\Theta _{*})=O_{p}(\delta _{1n})\). Suppose further that there exists a positive constant \(\kappa \) and a neighborhood \(V(s)\) of \(\theta _{*}(s)\) such that

$$\begin{aligned} G(\theta ,s)\ge \kappa \Vert \theta -\theta _{*}(s)\Vert ^{2} \end{aligned}$$
(A.19)

for all \(\theta \in V(s),s\in \mathcal S _{0}\). Suppose that uniformly over \(\mathcal N _{\delta _{1n},0}\),

$$\begin{aligned} G_{n}(\theta ,s)=G(\theta ,s)+O_{p}(\Vert \theta -\theta _{*}(s)\Vert /\sqrt{n})+o_{p}(\Vert \theta -\theta _{*}(s)\Vert ^{2})+O_{p}(\epsilon _{n}). \end{aligned}$$
(A.20)

Then

$$\begin{aligned} \vec {d}_{H}(\Theta _{*},\hat{\Theta }_{n})=O_{p}(\max \{c_{n}^{1/2},\epsilon _{n}^{1/2},1/\sqrt{n}\}). \end{aligned}$$

Proof of Lemma A.2

The proof of this lemma is similar to that of Theorem 1 in Sherman (1993). By (A.19), (A.20), and the Hausdorff consistency of \(\hat{\Theta }_n\), it follows that, uniformly over \(\mathcal N _{\delta _{1n},0}\),

$$\begin{aligned} c_n\ge \kappa \Vert \theta -\theta _{*}(s)\Vert ^2+O_p(\Vert \theta -\theta _{*}(s)\Vert /\sqrt{n}) + o_p(\Vert \theta -\theta _{*}(s)\Vert ^2)+O_p(\epsilon _n), \end{aligned}$$
(A.21)

with probability approaching 1. As in the proof of Theorem 1 in Sherman (1993), bound the \(O_p(\Vert \theta -\theta _{*}(s)\Vert /\sqrt{n})\) term from below by \(-K_n\Vert \theta -\theta _{*}(s)\Vert \), where \(K_n\ge 0\) and \(K_n=O_p(1/\sqrt{n})\), and note that the \(o_p(\Vert \theta -\theta _{*}(s)\Vert ^2)\) term is bounded from below by \(-\frac{\kappa }{2}\Vert \theta -\theta _{*}(s)\Vert ^2\) with probability approaching 1. Thus, we obtain

$$\begin{aligned} \frac{\kappa }{2}\Vert \theta -\theta _{*}(s)\Vert ^2-K_n\Vert \theta -\theta _{*}(s)\Vert \le c_n+O_p(\epsilon _n). \end{aligned}$$
(A.22)

Completing the square, we obtain

$$\begin{aligned} \frac{1}{2}\kappa (\Vert \theta -\theta _{*}(s)\Vert -K_n/\kappa )^2\le c_n+O_p(\epsilon _n)+\frac{1}{2}K_n^2/\kappa =c_n+O_p(\epsilon _n)+O_p(1/n). \end{aligned}$$
(A.23)

Taking square roots gives

$$\begin{aligned} \Vert \theta -\theta _{*}(s)\Vert&\le (2/\kappa )^{1/2}c_n^{1/2}+K_n/\kappa +O_p(\epsilon _n^{1/2})+O_p(1/\sqrt{n})\end{aligned}$$
(A.24)
$$\begin{aligned}&= O_p(c^{1/2}_n)+O_p(\epsilon ^{1/2}_n)+O_p(1/\sqrt{n}). \end{aligned}$$
(A.25)

Thus,

$$\begin{aligned} \vec d_H(\Theta _{*},\hat{\Theta }_n)&= \sup _{s\in \mathcal S _0}\inf _{\theta \in \hat{\Theta }_n}\Vert \theta -\theta _{*}(s)\Vert \end{aligned}$$
(A.26)
$$\begin{aligned}&\le \sup _{s\in \mathcal S _0}\inf _{\theta \in V^{\delta _{1n}}(s)}\Vert \theta -\theta _{*}(s)\Vert \nonumber \\&\le O_p(c^{1/2}_n)+O_p(\epsilon ^{1/2}_n)+O_p(1/\sqrt{n}). \end{aligned}$$
(A.27)

This completes the proof. \(\square \)

The following lemma controls the rate at which \(\hat{\Theta }_n\) is contracted into a neighborhood of \(\Theta _{*}\). Given \(s\in \mathcal S \) and a sequence \(\{\delta _n\}\) such that \(\delta _n\rightarrow 0\), let \(U^{\delta _n}(s):=\{\theta \in \Theta :\Vert \theta -\theta _{*}(s)\Vert \ge \delta _n\}\).

Lemma A.3

Suppose Assumptions 2.1–2.3, 3.1–3.2, and 3.6 hold. Let \(G_{n}\) be defined as in Lemma A.2. Suppose that there exist positive constants \((k,\kappa _{2})\) and a sequence \(\{\delta _{1n}\}\) such that

$$\begin{aligned} G_{n}(\theta ,s)\ge \kappa _{2}\Vert \theta -\theta _{*}(s)\Vert ^{2} \end{aligned}$$
(A.28)

with probability approaching 1 for all \(\theta \in U^{\delta _{n}}(s)\) with \(\delta _{n}:=(k\delta _{1n}/\sqrt{n})^{1/2}\) and \(s\in \mathcal S _{0}^{\bar{\eta }}\). Then,

$$\begin{aligned} \vec {d}_{H}(\hat{\Theta }_{n},\Theta _{*})=O_{p}(\delta _{1n}^{1/2}/n^{1/4})+O_{p}(c_{n}^{1/2}). \end{aligned}$$

Proof of Lemma A.3

Note first that \(\hat{\mathcal S }_n\subseteq \mathcal S _0^{\bar{\eta }}\) with probability approaching 1 by Assumption 3.6. Let \(\tilde{c}_n:=\sqrt{n}c_n\) and \(\bar{c}_n:=\max \{\kappa _2k\delta _{1n},\tilde{c}_n\}\). Let \(\epsilon _n:=(\bar{c}_n /(\kappa _2 \sqrt{n}))^{1/2}\). Then, uniformly over \(\mathcal S _0^{\bar{\eta }}\),

$$\begin{aligned} \inf _{\Theta \cap U^{\epsilon _n}(s)}\sqrt{n}G_n(\theta ,s)\ge \kappa _2 \sqrt{n}\epsilon _n^2 = \bar{c}_n \ge \tilde{c}_n. \end{aligned}$$
(A.29)

Since \(\sqrt{n}G_n(\hat{\theta }_n(s),s)\le \tilde{c}_n\) for all \(s\in \hat{\mathcal S }_n\), the results above ensure

$$\begin{aligned} \vec d_H(\hat{\Theta }_n,\Theta _{*})&=\sup _{s\in \hat{\mathcal S }_n}\inf _{\theta \in \Theta _{*}}\Vert \hat{\theta }_n(s)-\theta \Vert \\&\le \sup _{s\in \hat{\mathcal S }_n}\Vert \hat{\theta }_n(s)-\theta _{*}(s)\Vert \le \epsilon _n=O_p(\delta _{1n}^{1/2}/n^{1/4})+O_p(\tilde{c}_n^{1/2}/n^{1/4}). \end{aligned}$$

This ensures the claim of the Lemma. \(\square \)

Proof of Theorem 3.2

We first show (A.19) holds with \(G(\theta ,s)=Q(\theta ,s)\). For this, we use the second-order Taylor expansion of \(Q(\theta ,s)\). For \(\theta \in V^{\delta _{1n}}(s)\), it holds by Assumptions 3.2 (v) and 3.4 that

$$\begin{aligned} Q(\theta ,s)=&\;Q(\theta _{*}(s),s)+\nabla _\theta Q(\theta _{*}(s),s)^{\prime }(\theta -\theta _{*}(s))\nonumber \\&\; +\frac{1}{2}(\theta -\theta _{*}(s))^{\prime }\nabla ^2_\theta Q(\bar{\theta }(s),s)(\theta -\theta _{*}(s)), \end{aligned}$$
(A.30)

where \(\bar{\theta }(s)\) is on the line segment that connects \(\theta \) and \(\theta _{*}(s)\). By (15), \(Q(\theta _{*}(s),s)=0\), and by the first-order condition for optimality, \(\nabla _\theta Q(\theta _{*}(s),s)=0\). Thus, it follows that

$$\begin{aligned} Q(\theta ,s)= \frac{1}{2}(\theta -\theta _{*}(s))^{\prime }\nabla ^2_\theta Q(\bar{\theta }(s),s)(\theta -\theta _{*}(s))\ge \kappa \Vert \theta -\theta _{*}(s)\Vert ^2, \end{aligned}$$
(A.31)

where \(\kappa :=\inf _{\theta \in \Theta ,s\in \mathcal S _0} \xi (\nabla ^2_\theta Q(\theta ,s))/2\), and \(\kappa >0\) by Assumption 3.5.

We next show that (A.20) holds for

$$\begin{aligned} G_{n}(\theta ,s)=&\; \frac{1}{n}\sum _{i=1}^{n}(s(X_{i})-r_{\theta }(X_{i}))^{\prime }W(s(X_{i})-r_{\theta }(X_{i}))\nonumber \\&\quad -\frac{1}{n}\sum _{i=1}^{n}(s(X_{i})-r_{\theta _{*}(s) }(X_{i}))^{\prime }W(s(X_{i})-r_{\theta _{*}(s) }(X_{i})). \end{aligned}$$
(A.32)

In what follows, let \(\hat{E}_n\) denote the expectation with respect to the empirical distribution. Using the Taylor expansion of \(G_n\) and \(G\) with respect to \(\theta \) at \(\theta _{*}(s)\), we may write

$$\begin{aligned} G_n(\theta ,s)-G(\theta ,s)=S_{1n}(\theta ,s)+S_{2n}(\theta ,s), \end{aligned}$$
(A.33)

where

$$\begin{aligned} S_{1n}(\theta ,s)&:= -2(\theta -\theta _{*}(s))^{\prime }(\hat{E}_n-E)[\nabla _\theta r_{\theta _{*}(s)}(x)^{\prime }W(s(x)-r_{\theta _{*}(s)}(x))]\nonumber \\&+o_p(\Vert \theta -\theta _{*}(s)\Vert ^2) \end{aligned}$$
(A.34)
$$\begin{aligned} S_{2n}(\theta ,s)&:= (\theta -\theta _{*}(s))^{\prime }(\hat{E}_n-E)[\nabla _\theta r_{\theta _{*}(s)}(x)^{\prime }W\nabla _\theta r_{\theta _{*}(s)}(x)](\theta -\theta _{*}(s)).\nonumber \\ \end{aligned}$$
(A.35)

Thus, for (A.20) to hold, it suffices to show that \( S_{1n}(\theta ,s)=O_p(\Vert \theta -\theta _{*}(s)\Vert /\sqrt{n})+o_p(\Vert \theta -\theta _{*}(s)\Vert ^2)\) and \(S_{2n}(\theta ,s)=O_p(\epsilon _n)\) for some \(\epsilon _n\rightarrow 0\). For \(S_{1n}\), note that our assumptions suffice for the conditions of Lemma A.4. Thus, \(\Phi \) is a \(P_0\)-Donsker class. This ensures \(S_{1n}(\theta ,s)=O_p(\Vert \theta -\theta _{*}(s)\Vert /\sqrt{n})+o_p(\Vert \theta -\theta _{*}(s)\Vert ^2)\). We now consider \(S_{2n}\). For each \(s\in \mathcal S _0\) and \(x\in \mathcal X \), let \(\phi _s(x):=\nabla _\theta r_{\theta _{*}(s)}(x)^{\prime }W\nabla _\theta r_{\theta _{*}(s)}(x)\). Note that

$$\begin{aligned} E\left[\sup _{(\theta ,s)\in \mathcal N _{\delta _{1n},0}}\left|S_{2n}(\theta ,s)\right|\right]&\le \delta _{1n}^2 n^{-1/2}E\left[\sup _{s\in \mathcal S _0}\left|\mathbb G _n\phi _s\right|\right]\nonumber \\&\le n^{-1/2}\delta _{1n}^2 C J_{[]}(1,\mathcal S _0,\Vert \cdot \Vert _{L^2})\left\Vert\sup _{s\in \mathcal S _0}|\phi _s|\right\Vert_{L^2}, \end{aligned}$$
(A.36)

where the last inequality follows from Lemma B.1 of Ichimura and Lee (2010). Now, Markov’s inequality, Lemma A.4, and Assumption 3.4 (ii) ensure that \(S_{2n}=O_p(\epsilon _n)\), where \(\epsilon _n=n^{-1/2}\delta _{1n}^2\).

We further set \(c_n=0\). Note that the estimator defined in (17) with \(c_n=0\) equals the set estimator \(\hat{\Theta }_n=\{\theta :G_n(\theta ,s)\le \inf _{\theta \in \Theta }G_n(\theta ,s)\}.\) By Assumption 3.7 and Step 4 of the proof of Theorem 3.1, we may take \(\delta _{1n}=O_p(n^{-1/4})\) as an initial rate. Lemma A.2 then implies that \(\vec d_H(\Theta _{*},\hat{\Theta }_n)=O_p(\epsilon ^{1/2}_n)\), where \(\epsilon _n=O_p(n^{-1/2}\delta _{1n}^2)=O_p(n^{-1})\). Thus, \(\vec d_H(\Theta _{*},\hat{\Theta }_n)=O_p(n^{-1/2})\).

Now we consider \(\vec d_H(\hat{\Theta }_n,\Theta _{*})\). We show that (A.28) holds for \(G_n\). For each \(\theta \) and \(s\), let \(L_n(\theta ,s):=\frac{1}{n}\sum _{i=1}^{n}(s(X_{i})-r_{\theta }(X_{i}))^{\prime }W(s(X_{i})-r_{\theta }(X_{i}))\). Let \(s\in \mathcal S ^{\bar{\eta }}_0\) and \(\theta \in U^{\delta _{1n}}(s)\). A second-order Taylor expansion of \(G_n(\theta ,s)=L_n(\theta ,s)-L_n(\theta _{*}(s),s)\) with respect to \(\theta \) at \(\theta _{*}(s)\) gives

$$\begin{aligned} G_n(\theta ,s)&=\nabla _\theta L_n(\theta _{*}(s),s)^{\prime }(\theta -\theta _{*}(s))+\frac{1}{2}(\theta -\theta _{*}(s))^{\prime }\nabla _\theta ^2 L_n(\bar{\theta }_n(s),s)(\theta -\theta _{*}(s))\nonumber \\&=o_p(1)+\frac{1}{2}(\theta -\theta _{*}(s))^{\prime }\nabla _\theta ^2 L_n(\bar{\theta }_n(s),s)(\theta -\theta _{*}(s))\nonumber \\&\ge \kappa _2\Vert \theta -\theta _{*}(s)\Vert ^2, \end{aligned}$$
(A.37)

with probability approaching 1 for some \(\kappa _2>0\), where \(\bar{\theta }_n(s)\) is a point on the line segment that connects \(\theta \) and \(\theta _{*}(s)\). The last inequality follows from Step 3 of the proof of Theorem 3.1 and Assumption 3.5.

Set \(\tilde{c}_n=0\). Then, Lemma A.3 implies \(\vec d_H(\hat{\Theta }_n,\Theta _{*})=O_p(\delta _{1n}^{1/2}/n^{1/4})\). Setting \(\delta _{1n}=O_p(n^{-1/4})\) refines this rate to \(O_p(n^{-3/8})\). Repeated applications of Lemma A.3 then imply \(\vec d_H(\hat{\Theta }_n,\Theta _{*})=O_p(n^{-1/2})\), as sketched below. Since both directed Hausdorff distances converge to 0 at rate \(O_p(n^{-1/2})\), the claim of the theorem follows. \(\square \)
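As a supplementary calculation (not in the original), the rate improvement from iterating Lemma A.3 can be made explicit: with \(\tilde c_n=0\), each application maps a rate exponent \(a\) (i.e., \(\delta _{1n}=O_p(n^{-a})\)) into \(a/2+1/4\), since \(\delta _{1n}^{1/2}/n^{1/4}=n^{-(a/2+1/4)}\). Hence

$$\begin{aligned} a_{k+1}=\frac{a_{k}}{2}+\frac{1}{4},\qquad a_{0}=\frac{1}{4}\quad \Longrightarrow \quad a_{k}=\frac{1}{2}-\frac{1}{2^{k+2}}, \end{aligned}$$

so \(a_1=3/8\), and the fixed point of the recursion is \(a=1/2\), matching the claimed \(O_p(n^{-1/2})\) rate.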

Lemma A.4

Suppose Assumptions 3.2 and 3.4 hold. Then \(\Phi \) is a \(P_0\)-Donsker class.

Proof of Lemma A.4

The proof of Theorem 3.1 shows that each \(f_s\in \Phi \) is Lipschitz in \(s\). For any \(\epsilon >0\), Assumption 3.2 (ii)–(iii), Theorems 2.7.11 and 2.7.2 in van der Vaart and Wellner (1996), and (A.12) imply

$$\begin{aligned} \ln N_{[]}(\epsilon \Vert F\Vert _{L^2},\Phi ,\Vert \cdot \Vert _{L^2}) \le \ln N(\epsilon /2,\mathcal S _{0}^{\bar{\eta }},\rho )^L \le C (1/\epsilon )^{k/\gamma }, \end{aligned}$$
(A.38)

where \(C\) is a constant that depends only on \(k,\gamma ,L\), and \({\text{ diam}}(\mathcal X )\). Thus, for any \(\delta >0\),

$$\begin{aligned} J_{[]}(\delta ,\Phi , \Vert \cdot \Vert _{L^2})\le \int \limits _{0}^{\delta } \sqrt{1+C (1/\epsilon )^{k/\gamma }}d\epsilon <\infty . \end{aligned}$$
(A.39)

Example 2.14.4 in van der Vaart and Wellner (1996) ensures that \(\Phi \) is \(P_0\)-Donsker. \(\square \)

1.5 First Stage Estimation

In what follows, we work with the population criterion function \(\mathcal Q \) defined, for each \(s\in \mathcal S \), by

$$\begin{aligned} \mathcal Q (s):=\sum _{j=1}^{l}\big (E[\varphi ^{(j)}(X_i,s)]\big )_{+}^{2}. \end{aligned}$$
(A.40)
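For intuition, the sample analog of (A.40) in the interval-outcome example of Corollary 3.1 replaces each expectation by a sample mean and keeps only the positive parts. A minimal sketch (function and variable names are ours, not the chapter's):

```python
import numpy as np

def Q_n(s, z, y_low, y_up, cells):
    """Sample analog of (A.40) for the interval-outcome inequalities
      E[(Y_L - s(Z)) 1_{A_k}(Z)] <= 0  and  E[(s(Z) - Y_U) 1_{A_k}(Z)] <= 0:
    sums the squared positive parts of the sample moments.
    s : callable; cells : list of boolean masks encoding 1_{A_k}(z_i)."""
    sz = s(z)
    total = 0.0
    for mask in cells:
        total += max(np.mean((y_low - sz) * mask), 0.0) ** 2
        total += max(np.mean((sz - y_up) * mask), 0.0) ** 2
    return total
```

Any \(s\) whose criterion value lies within the first-stage slack of the minimum over the sieve is then retained in \(\hat{\mathcal S }_n\).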

Lemma A.5

Suppose that Assumption 3.9 (i) holds. Let the criterion function be given as in (A.40). Then, there exists a positive constant \(C_{2}\) such that

$$\begin{aligned} \mathcal Q (s)\le \inf _{s_{0}\in \mathcal S _{0}}C_{2}\Vert s-s_{0}\Vert _{W}^{2}. \end{aligned}$$

Proof of Lemma A.5

Let \(s\in \mathcal S \) be arbitrary. For any \(s_0\in \mathcal S _0\), \(E[\varphi ^{(j)}(X_i,s_0)]\le 0\) for \(j=1,{\ldots }, l\). Let \(V\) be an open set that contains \(s\) and \(s_0\). By Assumption 3.9 (i) and Theorem 1.7 in Lindenstrauss et al. (2007), it holds that

$$\begin{aligned} \mathcal Q (s)&\le \sum _{j=1}^l\Big (E[\varphi ^{(j)}(X_i,s)]-E[\varphi ^{(j)}(X_i,s_0)]\Big )^2_+\nonumber \\&\le \left(\sum _{j=1}^l \Vert \sup _{g \in \tilde{V}_j} \dot{\varphi }^{(j)}_{g}\Vert ^2_{op}\right)\Vert s-s_0\Vert _W^{2}, \end{aligned}$$
(A.41)

where \(\tilde{V}_j:=\{g\in V:\dot{\varphi }^{(j)}_{g}\;{\text{ exists}}\}\). Let \(C_2:=\sum _{j=1}^l \Vert \sup _{g \in \mathcal S } \dot{\varphi }^{(j)}_{g}\Vert ^2_{op}\). It holds that \(0<C_2<\infty \) by the hypothesis. We thus obtain

$$\begin{aligned} \mathcal Q (s)\le C_2\Vert s-s_0\Vert ^2_W \end{aligned}$$
(A.42)

for all \(s_0\in \mathcal S _0\). Note that \(s_0\mapsto \Vert s-s_0\Vert _W\) is continuous and \(\mathcal S _0\) is compact by Assumption 3.2 (ii)–(iii) and Assumption 3.10 (i). Taking the infimum over \(\mathcal S _0\) then ensures the desired result. \(\square \)

Lemma A.6

Suppose Assumption 3.9 (ii) holds. Let the criterion function be given as in (A.40). Then there exists a positive constant \(C_{3}\) such that

$$\begin{aligned} \mathcal Q (s)\ge \inf _{s_{0}\in \mathcal S _{0}}C_{3}\Vert s-s_{0}\Vert _{W}^{2}. \end{aligned}$$

Proof of Lemma A.6

If \(s\in \mathcal S _0\), the conclusion is immediate. Suppose that \(s\notin \mathcal S _0.\) By Assumption 3.9 (ii), there exists \(s_0\in \mathcal S _0\) such that

$$\begin{aligned} \mathcal Q (s)= \sum _{j\in \mathcal I (s)}(E[\varphi ^{(j)}(X_i,s)])^2 \ge C_j\Vert s-s_0\Vert _W^2. \end{aligned}$$
(A.43)

Let \(C_{3}:=\min _{j}C_{j}\). Thus, the claim of the lemma follows. \(\square \)

In the following, let \(\mathcal G :=\{g:g(x)=\varphi _{s}^{(j)}(x),s\in \mathcal S ,j=1,{\ldots } ,l\}\).

Lemma A.7

Suppose Assumptions 3.2, 3.4, and 3.8 hold. Then \(\mathcal G \) is a \(P_0\)-Donsker class.

Proof of Lemma A.7

By Assumption 3.8, \(\varphi ^{(j)}_s\) is Lipschitz in \(s\). The rest of the proof is the same as that of Lemma A.4. \(\square \)

Proof of Theorem 3.3

We establish the claims of the theorem by applying Theorem B.1 in Santos (2011). Note first that Assumption 3.2 (ii)–(iii) and Assumption 3.10 (i) ensure that \(\mathcal S \) is compact. This ensures condition (i) of Theorem B.1 in Santos (2011). Condition (ii) of Theorem B.1 in Santos (2011) is ensured by Assumption 3.10. Lemma A.7 ensures that uniformly over \(\Theta _n\)

$$\begin{aligned} \mathcal Q _{n}(s)=\mathcal Q (s)+O_p(n^{-1}). \end{aligned}$$
(A.44)

Thus, condition (iii) of Theorem B.1 in Santos (2011) holds with \(C_1=1\) and \(c_{2n}=n^{-1}\). Lemma A.5 ensures that \(\mathcal Q (s)\le \inf _{s_0\in \mathcal S _0}C_2\Vert s-s_0\Vert _W^2\) for some \(C_2>0\). Thus, condition (iv) of Theorem B.1 in Santos (2011) holds with \(\kappa _1=2\). Now, the first claim of Theorem B.1 in Santos (2011) establishes

$$\begin{aligned} d_{H,W}(\hat{\mathcal S }_n,\mathcal S _0)=o_p(1). \end{aligned}$$
(A.45)

Furthermore, Lemma A.6 ensures \(\mathcal Q (s)\ge \inf _{s_0\in \mathcal S _0}C_3\Vert s-s_0\Vert _W^2\) for some \(C_3>0\). This ensures condition (v) of Theorem B.1 in Santos (2011) with \(\kappa _2=2\). Now, the second claim of Theorem B.1 in Santos (2011) ensures

$$\begin{aligned} d_{H,W}(\hat{\mathcal S }_n,\mathcal S _0)=O_p(\max \{(b_n/a_n)^{1/2},\delta _n\}). \end{aligned}$$
(A.46)

Since \((b_n/a_n)^{1/2}/\delta _n\rightarrow \infty \), the claim of the theorem follows. \(\square \)

Proof of Corollary 3.1

In what follows, we explicitly show \(\mathcal Q _{n}\)’s dependence on \(\omega \in \Omega \). Let \(\mathcal Q _{n}:\Omega \times \mathcal S \rightarrow \mathbb R \) be defined by \(\mathcal Q _{n}(\omega ,s)=\sum _{j=1}^l\big (\frac{1}{n}\sum _{i=1}^{n}\varphi ^{(j)}(X_{i}(\omega ),s)\big )_+^2\). By Assumption 2.3, \(\varphi \) is continuous in \(s\) for every \(x\) and measurable for every \(s\). Also note that \(X_i\) is measurable for every \(i\). Thus, by Lemma 4.51 in Aliprantis and Border (2006), \(\mathcal Q _{n}\) is jointly measurable in \((\omega ,s)\) and lower semicontinuous in \(s\) for every \(\omega \). Note that \(\mathcal S \) is compact by Assumptions 3.2 (ii)–(iii) and 3.10 (i), which implies \(\mathcal S \) is locally compact. Since \(\mathcal S \) is a metric space, it is a Hausdorff space. Thus, by Proposition 5.3.6 in Molchanov (2005), \(\mathcal Q _n\) is a normal integrand defined on a locally compact Hausdorff space. Proposition 5.3.10 in Molchanov (2005) then ensures the first claim.

Now we show the second claim using Theorem 3.3 (i). Assumptions 2.1–2.3 hold with \(\varphi \) defined in (5). Assumption 3.2 holds by our hypothesis with \(\gamma =1\). Assumption 3.3 is also satisfied by the hypothesis. Note that for each \(j\), \(\varphi ^{(j)}(x,s)=(y_L-s(z))1_{A_k}(z)\) or \(\varphi ^{(j)}(x,s)=(s(z)-y_U)1_{A_k}(z)\) for some \(k\in \{1,{\ldots }, K\}\). Without loss of generality, let \(j\) be an index for which \(\varphi ^{(j)}(x,s)=(y_L-s(z))1_{A_k}(z)\) for some Borel set \(A_k\). For any \(s,s^{\prime }\in \mathcal S \),

$$\begin{aligned} |\varphi ^{(j)}(x,s)-\varphi ^{(j)}(x,s^{\prime })|=|(s^{\prime }(z)-s(z))1_{A_k}(z)|\le \rho (s,s^{\prime }). \end{aligned}$$
(A.47)

It is straightforward to show the same result for other indexes. Thus, Assumption 3.8 is satisfied.

Now for \(j\) such that \(\varphi ^{(j)}(x,s)=(y_L-s(z))1_{A_k}(z)\), note that

$$\begin{aligned} |\bar{\varphi }^{(j)}(s+h)-\bar{\varphi }^{(j)}(s) -E[h(Z)(-1_{A_k}(Z))]|=0. \end{aligned}$$
(A.48)

Thus, the Fréchet derivative is given by \(\dot{\varphi }^{(j)}_s(h)=E[h(Z)(-1_{A_k}(Z))]\). By Proposition 6.13 in Folland (1999), the norm of the operator is given by \(\Vert \dot{\varphi }^{(j)}_s\Vert _{op}=E[|-1_{A_k}(Z)|^2]^{1/2}=P_0(Z\in A_k)^{1/2}>0\) (see the check below), which ensures the boundedness (continuity) of the operator. It is straightforward to show the same result for the other indexes. Hence, Assumption 3.9 (i) is satisfied. By construction, Assumption 3.10 (i) is satisfied, and Assumption 3.10 (ii) holds with \(\delta _n\asymp J_n^{-1}\) (see Chen 2007). These ensure the conditions of Theorem 3.3 (i). Thus, the second claim follows.
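As a check on the operator-norm value just used (a supplementary calculation, taking \(\Vert \cdot \Vert _W\) to be the \(L^2(P_0)\) norm on which \(\dot{\varphi }^{(j)}_s\) acts, as the displayed value presumes): by the Cauchy–Schwarz inequality,

$$\begin{aligned} \Vert \dot{\varphi }^{(j)}_s\Vert _{op}=\sup _{\Vert h\Vert _{W}=1}\big |E[h(Z)1_{A_k}(Z)]\big |=\Vert 1_{A_k}\Vert _{L^2}=E[1_{A_k}(Z)^2]^{1/2}=P_0(Z\in A_k)^{1/2}, \end{aligned}$$

with the supremum attained at \(h=1_{A_k}/\Vert 1_{A_k}\Vert _{L^2}\).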

For the third claim, let \(s\in \mathcal S \setminus \mathcal S _0\). Then, there exists \(j\) such that \(E[\varphi ^{(j)}(X_i,s)]>0\). Without loss of generality, suppose that \(E[\varphi ^{(j)}(X_i,s)]=E[(Y_{L,i}-s(Z_i))1_{A_k} (Z_i)]\ge \delta >0\). Let \(s_0\in \mathcal S _0\) be such that

$$\begin{aligned} E[(Y_{L,i}-s_0(Z_i))1_{A_k}(Z_i)]=0. \end{aligned}$$
(A.49)

Such \(s_0\) always exists by the intermediate value theorem. Then, for \(j\) with which \(\varphi ^{(j)}(x,s)=(y_L-s(z))1_{A_k}(z)\), it follows that

$$\begin{aligned} E[\varphi ^{(j)}(X_i,s)]&=E[(Y_{L,i}-s(Z_i))1_{A_k}(Z_i)]-E[(Y_{L,i}-s_0(Z_i))1_{A_k}(Z_i)]\nonumber \\[6pt]&=E[(s_0(Z_i)-s(Z_i))1_{A_k}(Z_i)]>0. \end{aligned}$$
(A.50)

Thus, we have

$$\begin{aligned} E[\varphi ^{(j)}(X_i,s)]\ge C\Vert s_0-s\Vert _W, \end{aligned}$$
(A.51)

where \(C:=\inf _{q\in E}E[q(Z_i)1_{A_k}(Z_i)]\) and \(E:=\{q\in \mathcal S :\Vert q\Vert _W=1,E[q(Z_i)1_{A_k} (Z_i)]>0\}\). Since \(C\) is the infimum of a linear functional over a convex set, it is finite. Furthermore, by the construction of \(E\), it holds that \(C>0\). Thus, Assumption 3.9 (ii) holds, and the third claim follows by Theorem 3.3 (ii). \(\square \)

Proof of Corollary 3.2

We show the claim of the corollary using Theorem 3.2. Note that we have shown, in the proof of Corollary 3.1, that Assumptions 2.1–2.3, 3.2 (i)–(iii), and 3.3 hold. Thus, to apply Theorem 3.2, it remains to show Assumptions 2.4, 3.2 (iv), and 3.4–3.7.

Assumption 2.4 is satisfied by the parameterization \(r_\theta (z)=\theta ^{(1)}+\theta ^{(2)}z\). For Assumption 3.2 (iv), note that \(\mathcal R _\Theta \) is given by

$$\begin{aligned} \mathcal R _\Theta =\big \{r_\theta :r_\theta =\theta ^{(1)}+\theta ^{(2)}z, \quad \theta \in \Theta \big \}. \end{aligned}$$

Since \(\Theta \) is convex, for any \(\lambda \in [0,1]\), it holds that \(\lambda r_\theta +(1-\lambda )r_{\theta ^{\prime }}=r_{\lambda \theta +(1-\lambda )\theta ^{\prime }}\in \mathcal R _\Theta \). Thus, Assumption 3.2 (iv) is satisfied. For Assumption 3.4, note first that \(r_\theta \) is twice continuously differentiable on the interior of \(\Theta \). Because \(r_\theta \) is linear, \(\max _{|\alpha |\le 2}|D^{\alpha }_\theta r_\theta (z)-D^{\alpha }_\theta r_{\theta ^{\prime }}(z)|\le (1+z^2)^{1/2}\Vert \theta -\theta ^{\prime }\Vert \) by the Cauchy–Schwarz inequality. By the compactness of \(\mathcal Z \), \(C(z):= (1+z^2)^{1/2}\) is bounded. Thus, Assumption 3.4 (i) is satisfied. Similarly, \(\max _{|\alpha |\le 2}\sup _{\theta \in \Theta }|D^\alpha _\theta r_\theta |\le \max \{1,|z|,C(1+z^2)^{1/2}\}=:R(z)\), where \(C:=\sup _{\theta \in \Theta }\Vert \theta \Vert \). By the compactness of \(\mathcal Z \) and \(\Theta \), \(R\) is bounded. Thus, Assumption 3.4 (ii) is satisfied. Note that the Hessian of \(Q(\theta ,s)\) with respect to \(\theta \) is given by \(2E[(1,z)(1,z)^{\prime }]\), which depends on neither \(\theta \) nor \(s\) and is positive definite by the assumption that \(Var(Z)>0\). Thus, Assumption 3.5 is satisfied. Assumptions 3.6 and 3.7 are ensured by Corollary 3.1. Now the conditions of Theorem 3.2 are satisfied. Thus, the claim of the Corollary follows. \(\square \)
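Putting the two stages together for this linear example, the following schematic simulation (illustrative only: the data-generating process, the grid sieve, and the slack \(c_n\) are all our hypothetical choices, not the chapter's) collects near-minimizers of the first-stage criterion over a quadratic sieve and then projects each retained \(s\) onto \(\mathcal R _\Theta =\{\theta ^{(1)}+\theta ^{(2)}z\}\) by least squares:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n = 500
z = rng.uniform(-1, 1, n)
y = 0.3 + 1.0 * z + rng.normal(0, 0.2, n)              # latent outcome, theta0 = (0.3, 1.0)
y_low, y_up = y - rng.uniform(0, 0.5, n), y + rng.uniform(0, 0.5, n)
cells = [z < 0, z >= 0]                                # instruments 1_{A_k}(z)

def Q_n(beta):
    """First-stage criterion: sample analog of (A.40) over the sieve
    s_beta(z) = beta1 + beta2 z + beta3 z^2."""
    sz = beta[0] + beta[1] * z + beta[2] * z ** 2
    q = 0.0
    for mask in cells:
        q += max(np.mean((y_low - sz) * mask), 0.0) ** 2
        q += max(np.mean((sz - y_up) * mask), 0.0) ** 2
    return q

grid = np.linspace(-1.0, 2.0, 16)
Q = {b: Q_n(b) for b in product(grid, grid, grid)}
q_min, c_n = min(Q.values()), 1e-3                     # slack c_n is a tuning choice
S_hat = [b for b, q in Q.items() if q <= q_min + c_n]  # first-stage set estimate

D = np.column_stack([np.ones(n), z])                   # second stage: project s onto r_theta
Theta_hat = np.array([np.linalg.lstsq(D, b[0] + b[1] * z + b[2] * z ** 2, rcond=None)[0]
                      for b in S_hat])
print(len(S_hat), Theta_hat.min(axis=0), Theta_hat.max(axis=0))
```

The printed coordinate-wise minima and maxima give a crude summary of the resulting \(\hat{\Theta }_n\); in the chapter's asymptotic theory the second-stage slack is \(c_n=0\) and the sieve dimension \(J_n\) grows with \(n\), so the finite grid here is purely for illustration.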


Copyright information

© 2013 Springer Science+Business Media New York


Cite this chapter

Kaido, H., White, H. (2013). Estimating Misspecified Moment Inequality Models. In: Chen, X., Swanson, N. (eds) Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1653-1_13
