Identification Robust Inference for the Risk Premium in Term Structure Models

We propose identification robust statistics for testing hypotheses on the risk premia in dynamic affine term structure models, using the moment equation specification proposed for these models in Adrian et al. (2013). We extend the subset (factor) Anderson-Rubin test from Guggenberger et al. (2012) to models with multiple dynamic factors and time-varying risk prices. Unlike projection-based tests, it provides a computationally tractable manner to conduct identification robust tests on a larger number of parameters. We analyze the potential identification issues arising in empirical studies. Statistical inference based on the three-step estimator from Adrian et al. (2013) requires knowledge of the factors' quality and is misleading without full-rank betas or when the sampling errors are of comparable size to the loadings. Empirical applications show that some factors, though potentially weak, may drive the time variation of risk prices, and that weak identification issues are more prominent in multi-factor models.


Introduction
A variety of dynamic affine term structure models (DATSMs) have been developed since the foundational work of Vasicek (1977) and Cox et al. (1985). They help to understand the movements of bond yields and to analyze financial markets. DATSMs are empirically appealing for their tractability and simple characterization of how risks get priced. Many studies employ this framework. To list a few: Cochrane and Piazzesi (2005) apply affine term structure models to study time variation in expected excess bond returns using a single powerful explanatory factor; Wu and Xia (2016) use affine models to summarize the macroeconomic effects of unconventional monetary policy; Ang and Piazzesi (2003) investigate how macroeconomic variables affect bond prices and the dynamics of the yield curve; Buraschi and Jiltsov (2005) study the properties of the nominal and real risk premia of the term structure of interest rates; and Goliński and Zaffaroni (2016) incorporate long memory state variables into the term structure model. We adopt the DATSM setup developed by Adrian et al. (2013), which nests a general class of linear asset pricing models and can be regarded as a linear asset pricing model with time-varying risk premia and dynamic factors.
Many recent studies have developed approaches to estimate DATSMs. Most of them involve a time-consuming numerical optimization procedure resulting from the models' high non-linearity. Inference concerning the coefficients faces similar challenges. Another undesirable feature, as pointed out by, e.g., Hamilton and Wu (2012), is that identification can be problematic. Lack of identification, e.g., due to unspanned factors (Adrian et al. (2013)), and the resulting relatively flat surface of the likelihood also lead to unsatisfying inference results (e.g., Kan and Zhang (1999), Gospodinov et al. (2017), Kleibergen (2009), Hamilton and Wu (2012), Dovonon and Renault (2013), Beaulieu et al. (2016), Khalaf and Schaller (2016), Cattaneo et al. (2022)).
Unspanned factors refer to those that only affect the dynamics of bond prices under the historical measure but not the risk neutral one (see, e.g., Joslin et al. (2014)). The presence of such factors has been documented in empirical studies (e.g., Ludvigson and Ng (2009), Adrian et al. (2013)). They can lead to identification challenges, since varying the risk premia parameters (the parameters of the risk neutral pricing measure) associated with these factors does not strongly influence bond prices. Adrian et al. (2013) allow for the presence of unspanned factors under prior knowledge of which factors are unspanned. Their proposed estimation procedure differs for cases with and without unspanned factors.
Because of these identification issues, traditional inference methods based on t-tests and Wald statistics become unreliable for conducting inference on the risk prices in DATSMs (e.g., Stock and Wright (2000), Kleibergen (2005), Antoine and Renault (2020), Andrews and Cheng (2012), Antoine and Lavergne (2022)). We therefore propose easy-to-implement identification robust test procedures which are valid even when the model is not identified due to unspanned factors. The test procedures we provide can be used to study the time-varying risk premia of linear asset pricing models. Our proposed inference procedures use the framework presented in Adrian et al. (2013), where the risk of bond prices is modeled as a linear function of observed factors. The risk of bond prices can then be decomposed into two parts: a time-constant and a time-varying part. We propose statistics for testing hypotheses specified on all parameters of the time-varying component and on just subsets of them. The paper is organized as follows: Section 2 introduces the DATSM. Section 3 states the three-step estimation procedure from Adrian et al. (2013) and shows that it is sensitive to identification issues; it also shows the empirical relevance of these identification issues using the data from Adrian et al. (2013). Section 4 introduces the identification robust tests for the time-varying risk premia, conducts a small simulation experiment and applies the tests in a one time-varying risk factor setting. Section 5 introduces the identification robust tests for testing hypotheses on subsets of the time-varying risk premia and applies them to a variety of multi-factor settings using the data from Adrian et al. (2013). The final sixth section concludes.
We use the following notation throughout the paper: ⊗ and vec(·) represent the Kronecker product and the vectorization operator, respectively; Σ^(1/2) is the lower triangular Cholesky decomposition of the positive definite symmetric matrix Σ, such that Σ = Σ^(1/2) Σ^(1/2)'; for an N × K dimensional full rank matrix A: P_A = A(A'A)^(-1)A' and M_A = I_N − P_A.
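As a concrete illustration of this notation, the following sketch computes P_A, M_A and the Cholesky factor Σ^(1/2) for an arbitrary illustrative matrix A (the inputs are not taken from the paper's data):

```python
import numpy as np

def projection(A):
    """P_A projects onto the column space of the full-rank N x K matrix A;
    M_A = I_N - P_A projects onto its orthogonal complement."""
    P = A @ np.linalg.inv(A.T @ A) @ A.T
    return P, np.eye(A.shape[0]) - P

def chol_half(Sigma):
    """Lower triangular Cholesky factor L, so Sigma = L @ L.T."""
    return np.linalg.cholesky(Sigma)

# Illustrative inputs.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))
P_A, M_A = projection(A)
Sigma = A.T @ A          # symmetric positive definite by construction
L = chol_half(Sigma)
```

By construction P_A A = A and M_A A = 0, and L is lower triangular with L L' = Σ, matching the conventions above.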

Dynamic affine term structure models (DATSMs)
We briefly discuss the popular class of DATSMs with observed factors. Instead of working directly with the implied yields on an n-period bond as usually done in the term structure literature, we make use of the excess holding return of an n-period bond as in Adrian et al. (2013).
We first illustrate the model set-up following Adrian et al. (2013) and thereafter consider tests associated with the risk premia. For P_{t,n} the price at time t of a zero-coupon bond maturing at time t + n, the pricing kernel M_{t+1} is such that P_{t,n} = E_t(M_{t+1} P_{t+1,n−1}).

Regression estimator and Wald based inference
To estimate the price of risk, Adrian et al. (2013) propose a three-step procedure akin to the two-pass procedure from Fama and MacBeth (1973): 1. Estimate the dynamics of the factors X_t by least squares to obtain the factor innovations v̂_{t+1}. 2. Estimate the return equations by least squares to obtain â^(n), d̂^(n) and β̂^(n), n = 1, . . . , N.
3. Estimate λ_0 and Λ_1 by cross-sectional regressions of the estimates from the first two steps on the β̂^(n), n = 1, . . . , N. The three-step procedure essentially regresses transformed returns on estimated β's. Two issues can hamper the reliability of the final stage: the sampling error of the estimates resulting from the previous stages and the quality of the β's. The first issue is negligible when the information in the data dominates the asymptotically vanishing sampling errors, and only slight modifications of the asymptotic variance estimator used in the test statistics are necessary to accommodate it. However, when modeling the unspanned factors, Adrian et al. (2013) show that the entries in the β's corresponding to unspanned factors are zero (Assumption 3.1), which leads to identification problems (e.g., Kleibergen (2009), Beaulieu et al. (2013), Kleibergen and Zhan (2015)).
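The three steps can be sketched on simulated data as follows. This is a stylized illustration, not the exact implementation of Adrian et al. (2013): the small-sample convexity adjustment in the third step is omitted, and all names (`Lam1`, `lam0`, the calibrated values) are illustrative.

```python
import numpy as np

# Simulate K = 2 factors and N = 8 excess returns from a linear pricing model.
rng = np.random.default_rng(1)
T, N, K = 2000, 8, 2
beta = rng.standard_normal((N, K))            # full-rank loadings
lam0 = np.array([0.5, -0.3])                  # constant risk price
Lam1 = np.array([[0.4, 0.0], [0.1, 0.2]])     # time-varying risk price
Phi = 0.5 * np.eye(K)                         # factor VAR(1) coefficient

X = np.zeros((T + 1, K))
v = 0.1 * rng.standard_normal((T + 1, K))
for t in range(T):
    X[t + 1] = Phi @ X[t] + v[t + 1]

prices = lam0 + X[:-1] @ Lam1.T               # time-varying risk prices
r = prices @ beta.T + v[1:] @ beta.T + 0.01 * rng.standard_normal((T, N))

# Step 1: VAR(1) for the factors, keep the innovations v_hat.
Z1 = np.column_stack([np.ones(T), X[:-1]])
B1, *_ = np.linalg.lstsq(Z1, X[1:], rcond=None)
v_hat = X[1:] - Z1 @ B1

# Step 2: regress returns on a constant, lagged factors and innovations.
Z2 = np.column_stack([np.ones(T), X[:-1], v_hat])
B2, *_ = np.linalg.lstsq(Z2, r, rcond=None)
a_hat, c_hat, beta_hat = B2[0], B2[1:1 + K].T, B2[1 + K:].T

# Step 3: cross-sectional regressions of a_hat and c_hat on beta_hat.
lam0_hat, *_ = np.linalg.lstsq(beta_hat, a_hat, rcond=None)
Lam1_hat, *_ = np.linalg.lstsq(beta_hat, c_hat, rcond=None)
```

With full-rank β and a large sample, the last step recovers λ_0 and Λ_1; the identification problems discussed next arise exactly when columns of β̂ are (close to) zero, so that the cross-sectional regressions in step 3 are (nearly) multicollinear.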
Assumption 3.1. Potential existence of (nearly) unspanned factors: β = (B, C), where B represents the spanned factors and is of full rank K_B ≤ K, while C = O(1/√T) reflects the (nearly) unspanned factors. If K_B = K, there are no (nearly) unspanned factors.
Adrian et al. (2013) show that Assumption 3.1 embeds the unspanned factors that do not affect the dynamics of bonds under the historical pricing measure. The rows of β comprised of close to zero values correspond to unspanned factors. Adrian et al. (2013) assume that the location of these rows is known, and their three-step estimation procedure excludes the unspanned factors in the second step by only including spanned factors. The third step would otherwise encounter identification issues resulting from a classical multicollinearity problem. It is straightforward to show that zero β's lead to an identification problem, because any value of the λ's associated with the zero elements in β yields the same expected excess returns. When we instead just have small β's, comparable in magnitude to their estimation error, we similarly encounter such an identification problem (e.g., Kleibergen (2009), Antoine and Renault (2009), Kleibergen and Zhan (2020), Kleibergen et al. (2022)).
Proposition 3.1 states some well-known results from the weak identification literature. It shows that the risk premia estimator Λ̂ becomes inconsistent in the presence of weak factors because it converges to a random variable with a non-normal distribution. This results since varying the value of the risk premia associated with the weak factors does not change the excess returns. The conventional Wald statistic for testing the null hypothesis H_0 : Λ_1 = Λ_1^0 then no longer converges to a χ²-distributed random variable, and the same holds for subset tests based on this estimator. Therefore, the conventional test statistics can be misleading in the presence of unspanned factors.
Proposition 3.1. (a) If β is of full rank, Λ̂ is consistent and asymptotically normal. (b) Under weak (nearly unspanned) factors, Λ̂ converges to a random variable that follows a non-standard distribution, so Λ̂ no longer converges to the true value Λ at rate √T.
Proof. See the Online Supplementary Appendix.
We focus on inference concerning the time-varying component of the risk prices, Λ_1. We therefore demean the one-period (log) excess holding returns by subtracting their time-series averages, z̄_t for z = r, X, v and e respectively. Stacking the equations for N different maturities, with R̄_{t+1} = (r̄_{t+1,1} . . . r̄_{t+1,N})', β = (β^(1) . . . β^(N))' and ē_t = (ē_{t,1} . . . ē_{t,N})', the pricing equation closely resembles the beta-pricing model for the returns on (portfolios of) assets: r_{t+1} = βλ + βF_{t+1} + u_{t+1}, with r_t an N-dimensional vector of returns on N assets, β the N × K dimensional beta matrix and F_t a K-dimensional vector of risk factors. A further important similarity is the reduced rank structure that both models imply, which becomes obvious using a slight re-specification: the N × 2K and N × (K + 1) dimensional matrices β(Λ_1 : I_K) and β(λ : I_K) are each at most of rank K, so apart from the largest K singular values, all, i.e. K and 1 respectively, singular values of these matrices are zero. Further, for Λ_1 and λ to be well defined, β should be of full rank. When β is near a reduced rank value, in other words if some factors are weak/unspanned, we encounter an identification issue which is also reflected by more than K or 1, respectively, singular values of the above matrices being equal or close to zero.
Table 1: Least squares estimates of the β's associated with the excess returns on bonds with 11 different maturities of 6, 12, 18, . . . , 60, 84 and 120 months over the sample period 1987:01-2011:12. The factors are the first five principal components generated using the cross-section of bond yields for maturities 3, . . . , 120 months (data taken from Adrian et al. (2013)). Based on Proposition 3.1(a), we report LS estimates of the β's with t-statistics in round brackets and p-values of F-tests in square brackets, testing the null hypothesis H_0 : β_j = 0 that each column is jointly zero.
The Kleibergen-Paap rank statistic (see Kleibergen and Paap (2006)) testing H_0 : rank(β) = 4 equals 1.6561 [p-value: 0.9764].
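The reduced rank structure of β(Λ_1 : I_K) is easy to verify numerically. The sketch below, using simulated β and Λ_1 rather than the paper's estimates, shows that the smallest K singular values are zero when β has full rank, and that an additional singular value collapses when one column of β is close to zero, as with a (nearly) unspanned factor:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 11, 5
beta = rng.standard_normal((N, K))
Lam1 = rng.standard_normal((K, K))

# Full-rank beta: Phi = beta (Lam1 : I_K) is N x 2K with rank K.
Phi = beta @ np.hstack([Lam1, np.eye(K)])
s = np.linalg.svd(Phi, compute_uv=False)   # singular values, descending

# A (nearly) unspanned factor: shrink one column of beta towards zero.
beta_weak = beta.copy()
beta_weak[:, -1] *= 1e-8
s_weak = np.linalg.svd(beta_weak @ np.hstack([Lam1, np.eye(K)]),
                       compute_uv=False)
```

With the full-rank β, exactly the smallest K singular values are numerically zero; with the weak column, K + 1 of them are near zero, which is the pattern the scree plots below look for.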
Illustrating unspanned and weak factors in an empirical study
In practice, there is no direct prior knowledge of the number of unspanned or weak factors. To show their presence and empirical relevance, we use the data from Adrian et al. (2013), i.e., the zero coupon yield data constructed by Gürkaynak et al. (2007). Table 1 shows the factor loading estimates and relevant tests corresponding to the five principal component (PCA) factors employed in Adrian et al. (2013). Table 1 shows that many elements of β are small and not significantly different from zero at the 5% significance level. Though the F-tests on the columns of β are significant, the rank test of the β-matrix indicates potential identification problems, because for these factor loadings we cannot reject the reduced rank null hypothesis at the 5% significance level.
Table 2: R̄_t uses the demeaned excess returns on the same bonds as in Table 1; X̄_t involves the five PCA factors in (a) and the five PCA factors with one additional macro factor in (b).
Figure 1 complements Table 1 and shows a scree plot of the (log) singular values of the matrix which estimates β(Λ_1 : I_K). In case of strong spanned factors, only the smallest K, here five, singular values should be close to zero, while Figure 1(a) shows that the smallest six singular values are close to zero. This indicates a weak/unspanned factor problem, which is further reflected by the rank test not rejecting rank(β) = 4 in Table 1 at the 5% significance level.
As previously noted in the literature (e.g., Kleibergen and Zhan (2020), Kleibergen et al. (2022)), weak identification issues are often present when macro factors are used. Following Ang and Piazzesi (2003), we construct one macro factor, the real activity measure, which is the first principal component of four variables that capture real US macro activity: the "Help Wanted Advertising in Newspapers (HELP)" index, unemployment (UE), the growth rate of employment (EMPLOY), and the growth rate of industrial production (IP). As shown in Table 2, the macro factor is much less correlated with returns and thus is more likely to result in an identification issue. This is further reflected in the scree plot in Figure 1(b), which shows that while there are six factors, the smallest seven singular values are close to zero. Table 2 also shows a tiny value for the rank test statistic, which provides another indication of a weak/unspanned factor problem.
The identification problems revealed in Figures 1-2 and Tables 1-2 not only affect the validity of the estimators but also the reliability of traditional inference procedures. Adrian et al. (2013) assume that the positions of the zero rows in β are known when dealing with unspanned factors. We remain agnostic about this and provide testing procedures concerning Λ_1 which are identification robust without the need for prior knowledge of the unspanned factors.

Identification robust tests of time-varying risk premia
The identification robust tests are based on the sample moment vector, which results from the observed factors having no predictive power for the prediction error ē_{t+1,n}. Our sample moment vector for Λ_1 is f_T(Λ_1) = (1/T) Σ_{t=1}^T vec((R̄_t − βΛ_1 X̄_{t-1}) X̄'_{t-1}), with Q̂_XX = (1/T) Σ_{t=1}^T X̄_{t-1} X̄'_{t-1}, and its derivative with respect to vec(Λ_1) is −(Q̂_XX ⊗ β). We next make an assumption regarding the large sample behavior of the sample moment vector and its derivative.
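The moment vector and its Jacobian can be sketched on simulated data as follows; the data generating process and all names here are illustrative stand-ins, and the linearity of the moment in Λ_1 lets us check the Kronecker-product Jacobian against a finite-difference derivative:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, K = 500, 6, 2
beta = rng.standard_normal((N, K))
Lam1_0 = np.array([[0.3, 0.0], [0.0, 0.2]])      # hypothesized value
Xlag = rng.standard_normal((T, K))               # stand-in for Xbar_{t-1}
Rbar = (Xlag @ Lam1_0.T) @ beta.T + 0.05 * rng.standard_normal((T, N))

def f_T(L):
    """Sample moment: vec of (1/T) sum_t (Rbar_t - beta L Xbar_{t-1}) Xbar_{t-1}'."""
    resid = Rbar - Xlag @ L.T @ beta.T           # T x N residuals
    return ((resid.T @ Xlag) / T).ravel(order="F")   # column-major vec

Q_XX = Xlag.T @ Xlag / T
J = -np.kron(Q_XX, beta)                         # derivative w.r.t. vec(Lam1)
```

Since vec(βΛ_1 Q_XX) = (Q_XX ⊗ β)vec(Λ_1), a small perturbation of Λ_1 moves f_T exactly along J, which is what makes the Jacobian estimator below tractable.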
where the Jacobian J = −(Q_XX ⊗ β), Q̂_XX →_p Q_XX, and ψ_f and ψ_q are NK and NK³ dimensional random vectors. Assumption 4.1 is a high-level assumption which resembles Assumption 1 in Kleibergen (2005) and holds under mild conditions. Assumption 4.1 holds irrespective of Assumption 3.1. Assumption 2.1 is sufficient for Assumption 4.1, but our proposed test statistics can be applied to more general cases than the model implied by Assumption 2.1. For our setting, with Ψ_q = vecinv(ψ_q) and vech(A) containing the unique elements of a symmetric matrix A, ψ_q has NK³ elements while the number of unique elements in Ψ_β and Ψ_XX equals NK + ½K(K + 1), so the joint normal distribution of (ψ_f, ψ_q) is allowed to be degenerate. When v_t is normal (as assumed in Assumption 2.1) or its third moment equals zero, ψ_β and ψ_XX are also independently distributed.
The identification robust statistics use an estimator D̂_T(Λ_1^0) of the Jacobian whose limit behavior under H_0 : Λ_1 = Λ_1^0 is independent of the limit behavior of the sample moment, see Kleibergen (2005), with V̂_ff(Λ_1^0) a consistent estimator of the covariance matrix V_ff(Λ_1^0) of the sample moment, and ψ_{q.f} independent of ψ_f. We can next define the identification robust Factor Anderson-Rubin (FAR), (Kleibergen) Lagrange multiplier (KLM) and JKLM statistics for testing H_0 : Λ_1 = Λ_1^0:
FAR(Λ_1^0) = T f_T(Λ_1^0)' V̂_ff(Λ_1^0)^(-1) f_T(Λ_1^0),
KLM(Λ_1^0) = T f_T(Λ_1^0)' V̂_ff(Λ_1^0)^(-1/2) P_{V̂_ff(Λ_1^0)^(-1/2) D̂_T(Λ_1^0)} V̂_ff(Λ_1^0)^(-1/2) f_T(Λ_1^0),
JKLM(Λ_1^0) = FAR(Λ_1^0) − KLM(Λ_1^0).
The limiting distributions are a direct result of Assumption 4.1 and do not depend on the rank of the Jacobian or β, so the limiting distributions hold regardless of Assumption 3.1.
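The FAR statistic is the easiest of the three to sketch. The small Monte Carlo below, a stylized design rather than the paper's calibration, estimates the moment covariance from the per-period moment contributions and illustrates that the χ²(NK)-based rejection frequency under H_0 stays close to the nominal 5% level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
N, K, T, reps = 4, 2, 500, 500
crit = stats.chi2.ppf(0.95, N * K)               # chi-squared(NK) critical value

def far_stat(resid, Xbar):
    """AR-type statistic: T * f' V_ff^{-1} f, with f the sample moment and
    V_ff the sample covariance of the per-period moment contributions."""
    g = np.einsum("tn,tk->tnk", resid, Xbar).reshape(len(resid), -1)
    f = g.mean(axis=0)
    V = np.cov(g, rowvar=False)
    return len(resid) * f @ np.linalg.solve(V, f)

rejections = 0
for _ in range(reps):
    Xbar = rng.standard_normal((T, K))
    resid = rng.standard_normal((T, N))          # H0 true: residuals are noise
    rejections += far_stat(resid, Xbar) > crit
rate = rejections / reps                         # close to the nominal 0.05
```

Because the statistic is a quadratic form in the moment vector alone, its null distribution does not depend on the rank of β, which is the source of the identification robustness.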

Illustrative simulation and empirical results
We conduct a single-factor model simulation study to illustrate the performance of the proposed robust joint tests. The parameters of the data generating process (DGP) are calibrated to the data from Adrian et al. (2013). In particular, we fix the sample size at T = 300 and use the eleven excess returns as in Table 1. We calibrate with the first PCA factor from Adrian et al. (2013) to mimic the strongly identified case and with the third one for the weak identification setting. Figure 2 shows power curves of the conventional t-statistic and the robust test statistics in both the strongly and weakly identified cases. For both settings, the FAR, KLM and JKLM tests are all size correct, while the Wald test is size distorted under weak identification and size correct but biased under strong identification. Under weak identification, the KLM test has some power loss away from the hypothesized value, because of which it is preferable to combine it, in a conditional or unconditional manner, with the J-test to improve power, see Moreira (2003) and Kleibergen (2005).
We use the robust tests to analyze the time-varying component of the risk premium. A detailed description of the involved excess returns and risk factors was given for Tables 1-2. Figure 3 shows the p-values for testing the risk premium associated with each of the six factors in a single factor model with the identification robust tests. A p-value larger than, say, 5% implies that we cannot reject the null at the 5% significance level. Figure 3 shows that for most cases the JKLM test leads to unbounded 95% confidence sets even for strong factors, e.g., the first PCA factor, since the p-value curves are above the 5% line over the whole interval of analyzed values of the risk premium. This is consistent with the smaller power observed in our simulation exercises and results since the JKLM test primarily tests misspecification.
For all factors, the FAR and KLM tests provide bounded 95% confidence sets, since only over bounded regions are the p-values above the 5% line, even for potentially weak factors such as the fifth PCA factor and the macro factor. The latter implies that in a single factor setting, all these risk premia are identified. For the higher-order PCA factors, the robust tests, however, result in 95% confidence sets that differ from those resulting from the Wald test. Most striking is that a zero value for the risk premium is not rejected for strong factors, such as the first and second PCA factors, but is rejected for potentially weak factors. For example, the null hypothesis that Λ_1 = 0 is rejected by the FAR and KLM tests for both the fifth factor and the macro factor. This is partly in line with Adrian et al. (2013), who highlight the role of the higher-order principal components, as the time variation may be largely driven by, e.g., the fifth principal component. Therefore, some factors may be weak but still be important for interpreting the (time-varying) expected returns.

Identification robust sub-vector testing
The identification robust tests introduced in the previous section are for testing hypotheses specified on all elements of Λ_1. We are often interested in testing hypotheses specified on just subsets of the parameters. When we analyze multi-factor models, testing whether or not a certain factor's risk premium exhibits time variation requires testing a specific row of Λ_1, while testing whether a factor drives the time variation requires testing the corresponding column of Λ_1. Under our current settings, projection-based versions of the identification robust tests would allow us to test such hypotheses whilst preserving the size of the test, see Dufour and Taamouti (2005). These tests, however, lead to reduced power, so we extend the robust subset FAR test (sFAR) of Guggenberger et al. (2012) for testing hypotheses on all elements in a row of Λ_1; the extension to all elements in a column of Λ_1 is similar.
Without loss of generality, we consider testing the hypothesis that the risk premia associated with one specific factor, say the first, are all equal to λ_1^0, i.e. H_0 : λ_1 = λ_1^0. Under H_0, the N × 2K dimensional reduced rank parameter matrix Φ in the equation for the stacked returns contains the component β_2(Λ_2 : I_{K−1}), whose rank equals K − 1, which shows that H_0 implies that the smallest K singular values of a known transformation of Φ are equal to zero. The sFAR statistic for testing H_0 : λ_1 = λ_1^0 therefore corresponds with a rank test of this reduced rank hypothesis. The bounding distribution of the limiting distribution of the sFAR statistic relies upon a Kronecker product structure (KPS) of the asymptotic covariance matrix S of the least squares estimator of the linear model (see Guggenberger et al. (2012)). We note that v_t is not directly observed, so using its estimate v̂_t adds additional sampling error. To implement the sFAR test, we therefore make the following assumption.
Assumption 5.1. There exist symmetric positive definite matrices Ω ∈ R^{2K×2K} and Σ ∈ R^{N×N} such that S = Ω ⊗ Σ, with S the asymptotic covariance matrix of the least squares estimator.
The asymptotic normality underlying Assumption 5.1 is a direct result of Assumption 2.1. If v_t were directly observed, so that ẽ_t = e_t, Assumption 2.1 would imply Ω = var((X̄'_t, v'_{t+1})') and Σ = var(e_t). Because of the additional sampling error due to the generated regressor v̂_{t+1}, Assumption 5.1 does, however, not provide the exact specifications of Ω and Σ.
Table 3: KPS test (KPST) statistics for testing the null hypothesis H_0 : S = Ω ⊗ Σ for some symmetric positive definite matrices Ω ∈ R^{2K×2K} and Σ ∈ R^{N×N}. All four cases use excess returns on bonds with maturities 3, 12, 24, 60, 90, 120 months from Adrian et al. (2013). For the factors X_t, (1) uses the macro factor (real activity) and the level factor (first PCA factor), (2) uses the level factor (first PCA factor) and the slope factor (second PCA factor), (3) uses the macro factor (real activity) and the slope factor (second PCA factor), and (4) uses the macro factor (real activity) and the curvature factor (third PCA factor).
We use the KPS test (KPST) from Guggenberger et al. (2022) to test the proximity of a KPS matrix to the covariance matrix S. Table 3 reports the KPST results and shows that the KPS restriction on S is a realistic assumption, since none of these tests reject the null hypothesis that the covariance matrix has a KPS at the 5% significance level. A by-product of the KPS test is the KPS factorization of Ŝ, see Guggenberger et al. (2022).
Proof. See Appendix.
Because of the KPS covariance structure Ω̂ ⊗ Σ̂, we can compute the sFAR statistic from the characteristic polynomial stated in Proposition 5.2.
Proposition 5.2. Under Assumptions 4.1 and 5.1, let V̂_Φ = (Ψ̂ ⊗ Σ̂). The sFAR statistic then equals T times the sum of the K smallest roots of a characteristic polynomial involving the estimated Φ, Ψ̂ and Σ̂, and its limiting distribution is bounded by a χ² distribution. The bound on the limiting distribution holds regardless of Assumption 3.1.
Proof. See Appendix.
Figure 4 illustrates the χ² bound on the limiting distribution of the sFAR statistic stated in Proposition 5.2. The density of the limiting distribution of the sFAR statistic is simulated for a two-factor model, which uses the first two PCA factors to mimic the strongly identified case and the third and fifth PCA factors to mimic weak identification. The limiting distribution of the sFAR statistic is χ² when the model is strongly identified. In the weakly identified case, as shown in Figure 4, the limiting distribution is bounded by the χ² distribution. Using χ² critical values for the sFAR test thus controls the size of the test.
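To fix ideas, the sketch below computes a statistic of this type, T times the sum of the K smallest roots of det(μΩ̂ − Φ̂'Σ̂^(-1)Φ̂) = 0, via a generalized symmetric eigenvalue problem. This is a generic rank-statistic computation under hypothetical inputs (identity weighting matrices, a simulated rank-K Φ), not the exact characteristic polynomial of Proposition 5.2:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(5)
N, K, T = 8, 2, 1000
# Hypothetical Phi of rank K (as implied by H0), observed with O(1/sqrt(T)) noise.
Phi0 = rng.standard_normal((N, K)) @ rng.standard_normal((K, 2 * K))
Phi_hat = Phi0 + rng.standard_normal((N, 2 * K)) / np.sqrt(T)

# Illustrative positive definite weighting matrices standing in for Sigma and Omega.
Sigma = np.eye(N)
Omega = np.eye(2 * K)

# Roots of det(mu * Omega - Phi' Sigma^{-1} Phi) = 0, in ascending order.
roots = eigh(Phi_hat.T @ np.linalg.solve(Sigma, Phi_hat), Omega,
             eigvals_only=True)
stat = T * roots[:K].sum()       # T times the sum of the K smallest roots
```

When Φ̂ estimates a matrix of exact rank K, the K smallest roots are O(1/T), so the statistic stays bounded, which is the mechanism behind the χ² bound above.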
For projection-based tests on λ_1, the involved test statistic has to be computed over a grid of points for the partialled out parameters. This becomes computationally burdensome when the number of partialled out parameters increases because of the increased dimension of Λ_1 resulting from more factors. This makes our proposed approach involving the sFAR test more empirically appealing, because it does not involve an extensive grid search. In practice, Assumption 5.1 can be relaxed by using the KPST as a pre-test before conducting robust subset testing, as described in Guggenberger et al. (2022).
The value of the sFAR statistic at parameter values distant from zero provides a diagnostic to indicate whether the confidence sets of the hypothesized parameters are bounded. These values are therefore indicative of weak identification.
Proposition 5.3. For tests of H_0 : λ_1 = cλ_1^0, with λ_1^0 a fixed vector of length one and c a scalar, lim_{c→∞} sFAR(cλ_1^0), the realized value of the sFAR statistic at a distant value of λ_1 in the direction of λ_1^0, equals T times the sum of the K smallest roots of a characteristic polynomial involving an orthonormal matrix that is orthogonal to λ_1^0. The limit sFAR statistic is uniformly bounded from below by the minimum eigenvalue of a rank statistic of the factor loading matrix β.
Proposition 5.3 provides a way of verifying whether the confidence sets resulting from the sFAR statistic are bounded or unbounded in specific directions (Dufour (1997), Kleibergen and Zhan (2020)). The lower bound is a rank test statistic concerning the rank of the factor loading matrix β (see Kleibergen and Paap (2006)), so Proposition 5.3 shows that the sFAR test evaluated at distant values relates to the rank of β. Proposition 5.3 also explains why, when we encounter weak identification issues with β's close to reduced rank, we obtain unbounded confidence sets. The lower bound is sharp when K = 1, as indicated in the proof of Proposition 5.3, in which case also Theorem 12 in Kleibergen (2021) applies. When K = 1 and the Kleibergen-Paap rank test is significant at the 5% significance level, Proposition 5.3 implies that the sFAR test leads to bounded 95% confidence sets for λ_1. Table 4 reports the Kleibergen-Paap rank test for different factor settings for the data from Adrian et al. (2013). It shows that, in line with Figure 3, all single-factor models have bounded 95% confidence sets for the time-varying risk premia, which are less likely to be bounded when we include more than three factors. The fifth factor, though identified in a single-factor setting, suffers from weak identification problems when we include other factors. Table 4 also shows that the rank test statistic is a good indicator of unboundedness, as small values of the rank test statistic suggest unbounded confidence sets.
Table 4: Kleibergen-Paap rank statistic testing H_0 : rank(β) = K − 1 (K denotes the number of factors) and its associated [p-value] in square brackets, for varying factor combinations. The column headed by (i), for i = 1, . . . , 5, states which factor combinations are used when using i factors. All cases use excess returns on bonds with maturities of 2, 3, 12, 60, and 120 months and different combinations of the five PCA factors from Adrian et al. (2013). We mark with one star if the lower bound of the limit sFAR (see Proposition 5.3) indicates bounded 95% confidence sets in every direction, and with double daggers if the associated 95% confidence sets of the time-varying risk premia parameters of one or more factors are unbounded.

Power of the sFAR test
To illustrate the power of the sFAR test, we compute power curves for two settings calibrated to the data discussed previously. Figure 6 shows the two-dimensional power curves that result when jointly testing the two risk premia parameters associated with a single factor in a two factor model. The left hand side of Figure 6 shows the power curves for a strongly identified setting, while the right hand side does so for a weakly identified setting. The power curves on the right hand side show that the sFAR test is not consistent in weakly identified settings, since the rejection frequencies do not converge to one when we move away from the hypothesized value.
Figure 6: Simulated power surfaces (rejection frequencies) of sFAR tests on the Λ_1's w.r.t. the first factor of a two-factor model: the left panel (a) is a strongly identified setting calibrated to the two-factor model with excess returns of bonds with maturities 3, 60, 120 months using the first and second PCA factors, while the right panel (b) is a weakly identified setting calibrated to the two-factor model with excess returns of bonds with maturities 3, 60, 120 months using the third and fifth PCA factors. Dotted lines (γ_i, y, 0.05) mark the positions of the calibrated risk premia values, the γ_i's, at the 5% level.

Identification robust confidence sets for risk premia
We use the sFAR test to construct confidence sets for the risk premia resulting from two and three factor models. Figure 7 shows the 90, 95 and 99% joint confidence sets for the two risk premia of one specific factor in a two factor model using the data from Adrian et al. (2013), while Figure 8 does so for the three risk premia of one specific factor in a three factor model. Size correct confidence sets for the individual risk premia result from projecting the joint confidence sets on the axes. When using four or more factors, the number of risk premia concerning one factor is at least four, so we would have to use projection-based tests based on the sFAR statistic to be able to visualize these confidence sets. For expository purposes, and since Table 4 shows that some of these confidence sets are unbounded, for example, the one that results when using all five factors, we refrain from using more than three factors.
Figure 7 shows the two dimensional confidence sets for the two risk premia of one factor for all different specifications using the five PCA factors discussed previously in a two factor model. The two dimensional confidence sets in Figure 7 vary a lot. Quite a few are empty, so all values of the parameters are rejected even at the 99% confidence level, indicating that the model is misspecified. This occurs, for example, when combining the first PCA factor with either the second, third or fourth PCA factor. There are also settings where the confidence set is bounded and well behaved, which occurs, for example, when using the third and fourth PCA factors. Other confidence sets are unbounded and/or cover the whole two-dimensional space, which occurs, for instance, when using the third and fifth PCA factors.
For this combination, the 90% confidence set for the two risk premia on the third factor is unbounded but excludes an area of the parameter space, while the 90% confidence set of the two risk premia on the fifth factor covers the whole two-dimensional space. Table 4 also shows that the combination of the third and fifth PCA factors leads to a smaller rank test statistic than the other factor combinations in two-factor models, which is in line with the unbounded confidence set in Figure 7 for the third and fifth PCA factors. This is all indicative of weak identification when using both the third and fifth factors.
Figure 8 shows the joint confidence sets for the three risk premia associated with a single factor in a three factor model. The first column of Figure 8 does so for a factor model containing the first three PCAs as factors, while the second column does so using the first, third and fifth PCAs as factors. Unlike when using two factors, the first column shows that the confidence sets are no longer empty but bounded, which shows that the risk premia for the first three PCA factors are well identified and that the model is no longer misspecified. This is confirmed by the p-value of the rank test on the β's. This contrasts with using the first, third and fifth PCAs as factors: the confidence sets in the second column of Figure 8 are all unbounded, indicating weak identification of the risk premia, which is further reflected by the p-value of the rank test on the β's. Table 4 also shows that the model including the first, third and fifth PCA factors has a much smaller rank test statistic than the one with the first three PCA factors within the three-factor model. The three dimensional confidence sets in Figure 8 result from the sFAR test by partialling out the six risk premia associated with the other factors.
Hence, if we were to compute these confidence sets using projection with the identification robust tests, we would have to specify a nine-dimensional grid for the risk premia and compute the identification robust tests for all values on this nine-dimensional grid. This is computationally infeasible, or close to it. The sFAR test hence provides a computationally tractable manner to conduct identification robust tests on a larger number of parameters.
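A back-of-the-envelope sketch of this computational gap (the grid resolution of 50 points per dimension is a hypothetical choice, not from the paper): in a three-factor model, the sFAR test only grids over the three risk premia of interest, while projection requires a grid over all nine.

```python
# Illustrative comparison of grid-based test inversion costs.
POINTS_PER_DIM = 50  # hypothetical grid resolution per parameter

def grid_points(dims: int, per_dim: int = POINTS_PER_DIM) -> int:
    """Number of grid points for a dims-dimensional test inversion."""
    return per_dim ** dims

sfar_cost = grid_points(3)        # sFAR: grid over the 3 premia of interest
projection_cost = grid_points(9)  # projection: grid over all 9 premia
ratio = projection_cost // sfar_cost
print(sfar_cost, projection_cost, ratio)
```

At this resolution the projection approach requires 50^6 (over fifteen billion) times as many test evaluations as the sFAR approach, which is why the paper calls the former computationally infeasible.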

Conclusion
We propose identification robust test procedures for testing hypotheses on risk premia in dynamic affine term structure models. The robust subset factor Anderson-Rubin test extends the sFAR test from the linear asset pricing model to allow for tests on multiple risk premia and, unlike projection-based testing, provides a computationally tractable manner to conduct identification robust tests on a larger number of parameters. Our empirical results show that, especially in the case of multiple factors, weak identification is pervasive and traditional tests are likely misleading. We use empirical settings from the literature on affine term structure models, see e.g. Adrian et al. (2013) and Ang and Piazzesi (2003), to illustrate our results and the importance of using weak identification robust test procedures.

A Appendix: Proofs
Suppose that there exists a consistent estimator Ŝ for S; a by-product of the KPS test is then the KPS factorization for Ŝ (Guggenberger et al. (2022)). We briefly discuss how it operates for a matrix A ∈ R^(kp×kp) with a block structure indexed by j = 1, . . . , p. Consider a singular value decomposition (SVD) of R(Ŝ):
Proposition A.1.1 shows that the distribution of the eigenvalues under the null hypothesis depends only on the nuisance parameters M. Proposition A.1.3 argues that, to derive the upper bound, we only need to focus on the nuisance parameters M in the "strongly identified" setting, where the eigenvalues of M′M satisfy κ_1 > κ_2 > · · · > κ_{K−1} > κ_K = κ_{K+1} = · · · = 0 and the (K − 1)-th largest eigenvalue, κ_{K−1}, goes to infinity. This is because the "strongly identified" setting is shown to lead to the largest probability of large values of the K smallest eigenvalues.
In the following, we first provide Proposition A.1 and then continue to discuss the joint distribution of the eigenvalues of the eigenvalue problem from Proposition A.1.1 and the associated approximate conditional distribution of the smallest eigenvalues in Section A.2.1.
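The SVD-based factorization used in the discussion above can be sketched numerically; the matrix below is a generic low-rank example, not the paper's R(Ŝ).

```python
import numpy as np

# Minimal sketch: an SVD splits a matrix into orthogonal directions and
# singular values; rank statistics such as the Kleibergen-Paap test are
# built from the smallest singular values.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rank 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U * s @ Vt, A)            # exact reconstruction
numerical_rank = int(np.sum(s > 1e-10))      # count non-negligible values
```

A reduced rank shows up as (near-)zero trailing singular values, which is what a rank test statistic measures.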
Proof of A.1.1. The FAR statistic reads as follows, and the sample moments f_T(Λ_1, X) of the model under consideration for given Λ_1 are: Note that d̂ is the least squares estimator resulting from the second step of the three-step procedure, which implies that: Provided that Assumption 5.1 holds, the above result implies that we can choose V̂

Substituting (A.2) and (A.3) into (A.1) gives
which via the trace operator can be rewritten as above, where q is a (2K − 1) × K matrix of full column rank K. Let q* denote the set of (2K − 1) × K matrices of full column rank K, and q** the set of all matrices q for which q′A(Λ^0)′Ψ A(Λ^0)q = I_K; then it follows directly that: Denote q̂ = arg min_{q∈q**} FAR(λ^0_1, q). We prove A.1.1 by linking FAR(λ^0_1, q̂) with an eigenvalue problem and showing that, under the null hypothesis λ_1 = λ^0_1, with probability approaching one we have: Theorem 1.2 from Sameh and Wisniewski (1982) implies that FAR(λ^0_1, q̂) equals T times the sum of the K smallest eigenvalues of the eigenvalue problem, where x is a (2K − 1)-dimensional eigenvector and κ a scalar eigenvalue. T times the sum of the K smallest eigenvalues κ of the above eigenvalue problem equals T times the sum of the K smallest roots of the characteristic polynomial: Pre- and post-multiplying equation (A.6) by |C| yields |µI_{2K−1} − Ξ′Ξ| = 0. Therefore, the eigenvalue problem (A.5) is equivalent to the eigenvalue problem |µI_{2K−1} − Ξ′Ξ| = 0, and FAR(λ^0_1, q̂) equals the sum of the K smallest eigenvalues of Ξ′Ξ.
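The closing equivalence, the statistic as the sum of the K smallest eigenvalues of Ξ′Ξ, can be checked numerically; Ξ below is a generic random matrix with illustrative dimensions, not the paper's construction.

```python
import numpy as np

# Sum of the K smallest eigenvalues of Xi'Xi equals the sum of the
# K smallest squared singular values of Xi.
rng = np.random.default_rng(1)
K = 3
Xi = rng.standard_normal((2 * K - 1, 2 * K - 1))

eigs = np.linalg.eigvalsh(Xi.T @ Xi)        # ascending order
stat = eigs[:K].sum()                        # K smallest eigenvalues

svals = np.linalg.svd(Xi, compute_uv=False)  # descending order
assert np.isclose(stat, np.sort(svals**2)[:K].sum())
```

In practice this means the statistic can be computed from one SVD rather than by solving the determinantal equation directly.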

Proof of A.1.2. Under the hypothesis H
β_2 β_{2·u}, and thus the rank of M, denoted ρ(M), is smaller than or equal to K − 1. Therefore, the null space of M, denoted by N(M), is at least K-dimensional. In the following, we use the decomposition Ξ = Z + M with Z ∼ N(0, I).
For an arbitrary n-by-n real symmetric matrix A, the k-th largest eigenvalue can, via the min-max characterization (also known as the Courant–Fischer expression), be expressed as λ_k(A) = min_U max_{x∈U, x≠0} (x′Ax)/(x′x), where the minimum is over all (n − k + 1)-dimensional subspaces U of R^n. Employing this characterization, the j-th eigenvalue of Ξ′Ξ is κ̂_j. For j < K we have 2K − 1 − j + 1 > K, so we can no longer choose U ⊆ N(M) when κ_{K−1} → ∞, as the null space would be K-dimensional. Consider M′M = QPQ′, where Q is an orthogonal matrix whose columns are eigenvectors of M′M and P is a diagonal matrix whose entries are the eigenvalues κ_i, i ≥ 1. Denote by Q_i the eigenvector associated with κ_i.
Then for any (2K − 1 − j + 1)-dimensional linear subspace U of R^{2K−1}, U ∩ span(Q_i, i ≥ K − 1) is nontrivial. Hence, which implies that κ̂_j →_p ∞. Moreover, Q_{n_h} → Q, where M_{n_h}′M_{n_h} = Q_{n_h}P_{n_h}Q_{n_h}′, Q_{n_h} is an orthogonal matrix whose columns are eigenvectors of M_{n_h}′M_{n_h} and P_{n_h} is a diagonal matrix whose entries are the eigenvalues κ_i(M_{n_h}′M_{n_h}), i ≥ 1. Denote by Q_{n_h,i} the eigenvector associated with κ_i(M_{n_h}′M_{n_h}); by construction Q_{n_h,i} → Q_i. Since we allow for weak identification, κ_i(M_{n_h}′M_{n_h}), i ≤ K − 1, can be zero when all factors are unspanned, or bounded when all factors are weak.
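The divergence argument above can be illustrated numerically; the sizes and the rank-(K−1) signal matrix M below are generic choices for the sketch, not the paper's construction.

```python
import numpy as np

# When M has rank K-1 and its nonzero eigenvalues grow, the K-1 largest
# eigenvalues of Xi'Xi = (Z + M)'(Z + M) diverge while the K smallest
# remain bounded -- the "strongly identified" limit discussed above.
rng = np.random.default_rng(2)
K = 3
n = 2 * K - 1                                      # n = 2K - 1 = 5
Z = rng.standard_normal((n, n))                    # noise part

results = {}
for scale in (10.0, 1000.0):
    M = np.zeros((n, n))
    M[:K - 1, :K - 1] = scale * np.eye(K - 1)      # rank K-1 signal
    results[scale] = np.linalg.eigvalsh((Z + M).T @ (Z + M))  # ascending

# K-1 largest eigenvalues grow with the signal strength...
assert results[1000.0][-(K - 1)] > 1000.0**2 / 2
# ...while the K smallest stay of the order of the noise Z only.
assert results[1000.0][K - 1] < 100.0
```

This separation is what allows the limiting behavior of the test statistic, built from the K smallest eigenvalues, to be analyzed conditionally on the diverging ones.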

Proof of
By construction, the following two properties hold.
where the minimum is over all (2K − 1 − j + 1)-dimensional subspaces U of R 2K−1 that are orthogonal to the linear spaces spanned by vectors Q i , i < K.

For
We prove the above two properties which lead directly to the final conclusion.

Proof of equation (A.8). Note that
Hence, for the sequence {M_h, h ≥ 1}, if, for example, Q_i′x ≠ 0 for some i < K with ‖x‖ = 1, then with probability approaching one as h increases: Combining the above equation with equation (A.7), we know that for κ_j(Ξ_{M_h}′Ξ_{M_h}), j ≥ K, the minimum in equation (A.8) is achieved over linear spaces that are orthogonal to Q_i, i < K, as h increases. This completes the proof of the first statement.

Proof of equation (A.9). For
Therefore, we arrive at the final conclusion that for j ≥ K, where the second to last equality is due to the construction restricting the rate of increase of κ_{h,1} such that κ_{h,1} = o(·).

A.2.1 Joint distribution of the eigenvalues of the eigenvalue problem in Proposition A.1.1 and the approximate conditional distribution

Proposition A.1 links the sFAR statistic with the smallest eigenvalues of an eigenvalue problem. We first discuss the distributional behavior of the eigenvalues of the eigenvalue problem in Proposition A.1.1 under the "strongly identified" setting, which then leads to the distributional behavior of the smallest eigenvalues. We derive an approximate conditional density of the smallest eigenvalues given the larger ones, and then show in Corollary A.2 that the critical values constructed from this conditional density are asymptotically bounded by those generated from a χ² distribution. Guggenberger et al. (2019) discuss the case where M′M is a reduced-rank 2-by-2 matrix with eigenvalues κ_1 > κ_2 = 0 and κ_1 increasing to infinity. This section extends the analysis to the more general "strongly identified" specification in which M′M is a reduced-rank L-by-L matrix of rank K − 1 with eigenvalues κ_1 > · · · > κ_{K−1} > κ_K = · · · = κ_L = 0 and κ_{K−1} increasing to infinity.
The following Proposition A.2 replicates the discussion in Muirhead (1978) for an approximation of the density of all eigenvalues of noncentral Wishart matrices. This gives rise to an approximate conditional density of the (L − K + 1) smallest eigenvalues of the noncentral Wishart matrix given the remaining eigenvalues as shown in (A.12).
Substituting (A.11) into (A.10) gives an asymptotic representation of the joint density of all latent roots, which corresponds to equation (6.5) in Muirhead (1978).
Substituting the result of Proposition A.3 into (A.12) gives equation (A.14). Next, we prove the remaining statements. The following result holds for some real number ξ_{x/κ} between 0 and x/κ̂ when x ≤ κ̂ − κ̂^{−2(1− )} (A.17). The expansion (A.17) implies that for x_ĵ κ̂_i², equation (A.12) can be expressed as: Following the result in Proposition A.3, equation (A.18) implies that: A similar argument applies in the remaining case, which leads to equation (A.16).
As for the remaining results, note that when x_L ≥ N − (K − 1), and as the term on the left-hand side dominates terms of order O(κ̂), the above inequality implies q_{1−α,t_1} < q_{1−α,t_2}. Similarly, when x_L ≥ N − 2, the derivative with respect to κ̂_i is positive when κ̂_i is large, and jointly with equation (A.14) this implies that q_{1−α,t_1} is increasing in κ̂_i, i ≤ K − 1.
Corollary A.2 further verifies the numerical analysis in Guggenberger et al. (2019) and extends it to a general setting. The results in Section A.2 essentially lead to the proof of Proposition 5.2, which we briefly summarize in Section A.3.
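Since the conditional critical values are asymptotically bounded by χ² quantiles, a conservative critical value is a single quantile lookup; the sizes below (K factors, N test assets) are illustrative assumptions, not the paper's application.

```python
from scipy.stats import chi2

# Conservative critical value from the bounding chi-square distribution
# with K(N - (K - 1)) degrees of freedom (illustrative K and N).
K, N = 3, 11
df = K * (N - (K - 1))            # = 27 degrees of freedom here
crit_95 = chi2.ppf(0.95, df)      # 5%-level conservative critical value
```

Rejecting when the sFAR statistic exceeds this quantile then controls size even under weak identification, at the cost of some conservativeness relative to the conditional critical values.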

A.3 Proof of Proposition 5.2.
The first result, that the sFAR statistic equals the sum of the K smallest eigenvalues, follows directly from the proof of Proposition A.1.1. Next, we prove that lim_{T→∞} sFAR(λ^0_1) ≺ χ²(K(N − (K − 1))).
The smallest K roots are calculated from the polynomial, which can be rewritten as: Hence, when c goes to infinity, the characteristic polynomial becomes: Since Φ̂ = (d̂ . . . β̂), we have Φ̂ × diag(λ^0_{1,⊥}, I_{K−1}) = (d̂λ^0_{1,⊥} . . . β̂). The subset AR statistic now equals the sum of the K smallest roots of the above characteristic polynomial, which depend on λ^0_{1,⊥}. For example, when λ_1 = e_{1,k}, λ^0_{1,⊥} = 0. The above characteristic polynomial can then be rewritten, by pre- and post-multiplying the matrices in the determinant with Ψ̂(λ^0_{1,⊥})^{−1/2} and its transpose respectively, as: and thus Cauchy's interlacing inequality implies that the sum of the K smallest roots of the above polynomial is bounded from below by the minimum eigenvalue of Ψ̂^{−1/2}V β̂′Σ̂^{−1}β̂ Ψ̂^{−1/2}V. Therefore, the limit sFAR statistic is bounded from below uniformly by this minimum eigenvalue.

A.2.2 Adrian et al. (2015)

Here r_t = ln P_{t,1} represents the continuously compounded risk-free rate. The property of the pricing kernel such that P_{t,τ_{n+1}} = E[M_{t+1}P_{t+1,τ_n}] implies that: From the moment generating function of a normal distribution and the assumption that {R_{t+1}, v_{t+1}} are jointly normal, we know from (A.21) that: We can decompose unexplained returns into two parts: one explained by the innovation shock (v_{t+1}), the other by i.i.d. errors e, and then: The above results imply from (A.22) that: Adrian et al. (2015) consider a less restricted model setting with a different pricing kernel specification. The model closely resembles affine term structure models, and it takes unspanned factors into account. Unspanned factors are factors that are not correlated with contemporaneous excess returns but contribute to forecasts of excess returns. Though usually factors are either significant for the cross section or significant for forecasts, this model allows factors to act in both dimensions.
The state variables X_t satisfy equation (A.19), X_{t+1} = µ + ΦX_t + v_{t+1}, and fall into three categories: X_{1,t} ∈ R^{k_1}, X_{2,t} ∈ R^{k_2}, X_{3,t} ∈ R^{k_3}. The K_C × 1 vector C_t = (X_{1,t}′, X_{2,t}′)′ is for the cross section, whose innovations have significant non-zero β's in (A.23); the K_F × 1 vector F_t = (X_{2,t}′, X_{3,t}′)′ is for forecasts; and u_t = (v_{1,t}′, v_{2,t}′)′ denotes the innovations in the C_t factors. The new pricing kernel M_{t+1} in Adrian et al. (2015) satisfies, where R_{i,t+1} denotes the holding period return of asset i in excess of the risk-free rate. As a consequence, where β_{i,t} = var(u_{t+1}|F_t)^{−1} cov(u_{t+1}, R_{i,t+1}|F_t). Therefore, we can write R_{i,t+1} = β_{i,t}′(λ_0 + λ_1 F_t) + β_{i,t}′u_{t+1} + e_{i,t+1}, (A.23) the structure of which implies that our proposed tests are applicable when we assume time-constant β's in this framework.
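A simulation sketch of the return equation (A.23) with time-constant β's; all sizes and parameter values below are hypothetical choices for illustration, and risk prices are held constant (λ_1 = 0) to keep the sketch short.

```python
import numpy as np

# Simulate R_{t+1} = beta (lam0 + u_{t+1}) + e_{t+1} and recover beta by
# OLS of returns on observed factor innovations (illustrative setup).
rng = np.random.default_rng(3)
T, N, Kc = 5000, 10, 2                       # periods, assets, factors
beta = rng.standard_normal((N, Kc))          # constant loadings
lam0 = np.array([0.1, -0.2])                 # constant risk prices
u = rng.standard_normal((T, Kc))             # factor innovations
e = 0.1 * rng.standard_normal((T, N))        # idiosyncratic errors

R = (lam0 + u) @ beta.T + e                  # excess returns

# Regress returns on a constant and the innovations.
X = np.column_stack([np.ones(T), u])
coef = np.linalg.lstsq(X, R, rcond=None)[0]  # (1 + Kc) x N
beta_hat = coef[1:].T                        # drop intercept row
assert np.max(np.abs(beta_hat - beta)) < 0.05
```

With strong (full-rank) β's the loadings are recovered precisely; the identification problems studied in the paper arise exactly when β is (near) reduced rank or of the same order as the sampling error.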

A.2.3 Hamilton and Wu (2012)
Hamilton and Wu (2012) analyze yields, assuming that equations (A.19)-(A.20) hold. The risk-free one-period yield y^{(1)}_{t,r} and the yield of an n-period bond, y^{(n)}_t, satisfy y^{(1)}_{t,r} = δ_0 + δ_1′X_t and y^{(n)}_t = a_n + b_n′X_t + u_{t,n}. (Our notation differs slightly from the original one in Hamilton and Wu (2012) to remain consistent with the previous discussion.)
If we consider the data transformation r^{(n)}_t = n y^{(n)}_t − (n + 1) y^{(n+1)}_{t−1} + y^{(1)}_{t−1,r}, the above equations imply that, with c^{(n)} = n a_n − (n + 1) a_{n+1} + δ_0 = β^{(n)′}λ_0 + (β^{(n)′}Σ_v β^{(n)}/2 − β^{(n)′}µ), β^{(n)} = n b_n, and e_t = n u_{t,n} − (n + 1) u_{t−1,n+1}, our tests are valid under proper restrictions on e_t. The constant term c^{(n)} has a different structure than the one indicated in Adrian et al.

In the following, β is an N × K matrix, which is the transpose of the factor loading matrix we use in the main text, and we denote f = (µ, d), with Ψ_β a stacked form of ψ_β, the asymptotic distribution of √T vec(β̂ − β), such that (Ψ_β)_{i,j}, 1 ≤ i ≤ K, 1 ≤ j ≤ N, equals (ψ_β)_{(i−1)N+j}, and denote by K_{a,b} the commutation matrix such that K_{a,b} vec(A) = vec(A′) for an a × b matrix A.
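The commutation-matrix identity K_{a,b} vec(A) = vec(A′) can be verified directly; the construction below is a standard permutation-matrix implementation, shown with a small illustrative matrix.

```python
import numpy as np

def commutation(a: int, b: int) -> np.ndarray:
    """K_{a,b} as a permutation matrix acting on column-stacked vec(A)."""
    K = np.zeros((a * b, a * b))
    for i in range(a):
        for j in range(b):
            # vec(A) index of A[i, j] is j*a + i; vec(A') index is i*b + j
            K[i * b + j, j * a + i] = 1.0
    return K

A = np.arange(6.0).reshape(2, 3)
vecA = A.flatten(order="F")                  # column-major vec
vecAT = A.T.flatten(order="F")               # vec of the transpose
assert np.allclose(commutation(2, 3) @ vecA, vecAT)
```

Since K_{a,b} is a permutation matrix, it is orthogonal with K_{a,b}′ = K_{b,a}, which is what makes it convenient in asymptotic covariance derivations.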
The statement (a) is a direct result following Adrian et al. (2013). Here we only list the specifications of V_β, C_{Λ,β}, and V_Λ.
and A_β is an NK × K matrix, A_β = ⊕_{i=1}^N β_{(i)}, where ⊕ denotes the matrix direct sum, such that A ⊕ B = [A 0; 0 B].
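The matrix direct sum used in this definition simply places its arguments on the block diagonal; a minimal sketch with illustrative matrices:

```python
import numpy as np
from scipy.linalg import block_diag

# A (+) B stacks A and B on the block diagonal with zeros elsewhere,
# mirroring the construction A_beta = (+)_i beta_(i).
A = np.array([[1.0, 2.0]])        # 1 x 2 block
B = np.array([[3.0], [4.0]])      # 2 x 1 block
C = block_diag(A, B)              # 3 x 3 result

assert C.shape == (3, 3)
assert np.allclose(C[:1, :2], A)
assert np.allclose(C[1:, 2:], B)
```

Applied row by row, this yields the block-diagonal A_β whose i-th block carries the loadings of the i-th test asset.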
We discuss two special cases to show statement (b). We first introduce some new notation, given the three-stage estimator in the form in Adrian et al. (2013): and thus we can study the asymptotic properties of Λ̂ through the terms T_{i,j}. For convenience in the following, unless mentioned otherwise, we denote by T_{i,j} the previous T_{i,j} divided by √T, such that vec(T_{1,1}) = Λ. Based on the above derivations, we see that T_{1,2} and T_{1,5} converge to zero in probability but the remaining terms do not; thus, given β = 0, the estimator Λ̂ does not converge to Λ in probability but converges to Λ plus a term having a non-standard distribution.
If β = B/√T, where B is full rank, then, again considering the decomposed terms (we only show those that do not converge to zero in probability), the results imply that Λ̂ does not converge to Λ in probability but again converges to Λ plus a term having a non-standard distribution.
We only analyze one of the two approaches (without knowledge of unspanned factors) proposed by Adrian et al. (2013), since the two approaches are equivalent, as the following proposition suggests. (2) The second step obtains estimates of a^{(n−1)}, d^{(n−1)}, β^{(n−1)}_I, and Σ_e by: Notice that the following two equations hold, (B.25), which directly leads to the equality c = d + Φ′β. The equality (B.24) is obvious due to the orthogonality ξ̂_β X_− = 0. Therefore, we only need to show equation (B.25).
where the last equality is due to the fact that (I − P_{M_{ι_T}X} P_{M_{ι_T}X_−})∆ = 0,