Semiparametric Identification in Panel Data Discrete Response Models

This paper studies partial identification in fixed effects panel data discrete response models. In particular, semiparametric identification in linear index binary and ordered response panel data models with fixed effects is examined. It is shown that under unrestrictive distributional assumptions on the fixed effect and the time-varying unobservables and failure of point identification, informative bounds on the regression coefficients can still be derived under fairly weak conditions. Partial identification is achieved by eliminating the fixed effect and finding features of the distribution that do not depend on the unobserved heterogeneity. Numerical analysis illustrates how these sets shrink as the support of the explanatory variables increases and how higher variation in the unobservables results in wider identification bounds.


Introduction
This paper studies semiparametric identification in fixed effects panel data discrete response models, and in particular linear index binary and ordered response panel data models with additively separable fixed effects. It is shown that in models with unrestrictive distributional assumptions on the fixed effect and the time-varying unobservables where point identification fails, informative identification bounds on the regression coefficients can still be derived under fairly weak conditions. The approach builds on the work on semiparametric identification in static binary panel data models of Rosen and Weidner (2013,WP) and extends it to dynamic binary and static and dynamic ordered response fixed effects panel data models. Partial identification is achieved by relying on features of the distribution of observable variables that do not depend on the fixed effect. Numerical analysis illustrates how these sets shrink as the support of the explanatory variables increases and how higher variation in the unobservables results in wider identification bounds.
It is often noted that individuals who were observed making a specific choice in the past are more likely to make the same choice in the future. For example, in labour economics it has been documented that the employment choices individuals make in different periods are functions of past employment decisions. In panel data demand settings, when consumers make a specific purchasing decision regularly, it is often evident that decisions are linked intertemporally. Such correlation might be a result of the formation of habits, persistence in choice or the presence of switching costs.
As pointed out by Heckman (1981), in panel data models intertemporal correlation between choices in different periods generally arises through the presence of time-invariant unobservables as well as lagged dependent variables in the utility function. In the first case, individuals differ by some unobserved time-invariant factor that correlates decisions over time, also referred to as spurious state dependence. In the second, past choices enter the utility function directly and therefore affect future decisions. This dependence is referred to as true state dependence. Ignoring this dynamic behaviour of individuals can result in inconsistent estimates of regression coefficients and other quantities of interest, such as price elasticities. Linear panel data models, where the dependent outcome is continuous, can be seen as solving an omitted variables problem arising from the presence of this additively separable time-invariant unobserved component. Even when this fixed effect is not restricted to be independent of the explanatory variables, point identification of the regression parameters can be achieved by differencing out this additively separable time-invariant unobserved heterogeneity.
In non-linear panel data models with additively separable fixed effects, the differencing approach cannot be directly implemented. Identification and estimation in these models therefore rely heavily on the assumptions econometricians place on the individual-specific heterogeneity, and the challenges these models face have been well documented in the literature.
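The differencing argument for linear models can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration: the two-period design, the sample size, the slope value and the function names `first_difference_ols` and `pooled_ols` are hypothetical and not taken from the paper. The fixed effect is deliberately correlated with the regressor, so pooled OLS is biased while the first-difference estimator is not.

```python
import numpy as np

def first_difference_ols(y, x):
    """OLS slope from regressing (y_2 - y_1) on (x_2 - x_1), no intercept."""
    dy = y[:, 1] - y[:, 0]
    dx = x[:, 1] - x[:, 0]
    return float(dx @ dy / (dx @ dx))

def pooled_ols(y, x):
    """Pooled OLS slope (with intercept) that ignores the fixed effect."""
    xc = x.ravel() - x.mean()
    yc = y.ravel() - y.mean()
    return float(xc @ yc / (xc @ xc))

rng = np.random.default_rng(0)
n, beta = 20_000, 1.0                           # hypothetical sample size and slope
alpha = rng.normal(size=n)                      # time-invariant heterogeneity
x = alpha[:, None] + rng.normal(size=(n, 2))    # regressor correlated with alpha
y = beta * x + alpha[:, None] + rng.normal(size=(n, 2))

print("pooled OLS slope:       ", pooled_ols(y, x))
print("first-difference slope: ", first_difference_ols(y, x))
```

In this design the pooled slope is pulled away from the true value by the correlation between the regressor and the fixed effect, whereas differencing removes the fixed effect exactly because it enters additively.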
Choosing between random and fixed effects, how to deal with initial conditions and lagged dependent variables, as well as the incidental parameters problem and the calculation of marginal effects, have all been extensively studied by a number of authors. A detailed summary of developments can be found in Arellano and Honoré (2001) and more recently in Arellano and Bonhomme (2011).
This paper studies identification in fixed effects linear index discrete response models, without imposing distributional assumptions on the additively separable unobserved heterogeneity. It uses an approach similar to that used in linear panel data models, where identification relies on observable implications in which the additively separable fixed effect does not appear. As the model is non-linear, it is shown that eliminating the individual fixed effect leads to partial identification of the regression parameters. This paper examines two different discrete response settings. The first setting corresponds to the case where individuals' choice set consists of a binary outcome, for example whether to purchase a specific product or not in a given period, and examines semiparametric identification in linear index binary panel data models. Several papers, including Chamberlain (1984, 2010), Honoré (2002) and Honoré and Kyriazidou (2000), have shown that in linear index panel data models with binary outcomes, parametric point-identification of the regression parameters when regressors have bounded support can only be achieved under the assumption of independently and identically logistically distributed time-varying unobservables. Manski (1987), using a conditional version of the maximum score estimator, shows that the panel data binary model can have a median regression interpretation and that inference is still possible if the disturbances for each panel member are known to be time-stationary conditional on the strictly exogenous variables and the fixed effects, provided the strictly exogenous explanatory variables vary enough over time, with at least one component having unbounded support. Honoré and Lewbel (2002) also show point-identification of binary panel data models with predetermined regressors, under the assumption that there exists a special regressor which, conditional on the rest of the regressors and the instruments, is independent of the fixed effect.
Point-identification of the regression parameters in discrete response panel data models relies on assumptions, such as logistically distributed unobservables and large support conditions, that are strong, untestable and difficult to satisfy in applications. In an attempt to overcome this limitation, several papers study semiparametric identification in a general class of panel data models that results in partial identification of parameters and quantities of interest. For example, Chernozhukov, Hahn, and Newey (2005) focus on nonparametric bound identification, estimation and inference in multinomial panel data models with correlated random effects with only discrete explanatory variables. Honoré and Tamer (2006) study bounds on parameters in dynamic discrete choice models, mainly focusing on the initial condition problem. In linear panel data settings Rosen (2012) studies the identifying power of conditional quantile restrictions in short panels with fixed effects, while Chernozhukov, Fernández-Val, Hahn, and Newey (2013) provide sharp identification bounds for average and quantile treatment effects in fully parametric and semiparametric nonseparable panel data models. In a very recent paper, Pakes and Porter (2014) develop a new approach to semiparametric analysis of multinomial models with additively separable fixed effects. Set identification of the model parameters in this setting comes through the weak monotonicity assumption on the index function and the group homogeneity condition on the disturbances conditional on the contemporaneous explanatory variables and the fixed effects. This paper focuses on semiparametric identification of the regression coefficients in linear index binary panel data models with additive fixed effects, where the main focus is on the elimination of the unobserved heterogeneity.
No assumptions are imposed on the fixed effect, which is allowed to be correlated with the explanatory variables and the time-varying unobservables. Identification is achieved by finding features of the distribution that do not depend on the fixed effect. This approach extends the work on semiparametric identification in static binary panel data models in Rosen and Weidner (2013,WP), hereafter RW2013, to dynamic binary panel data models, where last period's choice directly enters the current period's decision rule. Conditioning on the initial condition, identification relies on individuals who switch options from one period to the next. It is shown that the joint probability of the choices these individuals make in two consecutive periods is bounded by features of the distribution that do not depend on the unobserved fixed effect. Therefore, under an exogeneity condition for the time-varying unobservables, this allows for the derivation of identification bounds on the coefficients of the contemporaneous explanatory variables as well as the lagged dependent variables, without parametric assumptions on the fixed effect or the time-varying unobservables.
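The role of the logistic assumption in the point-identification results cited above can be seen in a short numerical sketch. It assumes a two-period model of the form Y_t = 1(x_t β + α + V_t ≥ 0) with i.i.d. standard logistic V_t; this functional form, the parameter values and the function names are assumptions made for illustration. Under these assumptions, the probability of the sequence (0, 1) conditional on switching equals the logistic CDF evaluated at (x_2 − x_1)β, whatever the value of the fixed effect.

```python
import numpy as np

def logistic_cdf(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + np.exp(-z))

def prob_01_given_switch(x1, x2, beta, alpha):
    """P(Y1=0, Y2=1 | Y1+Y2=1, x, alpha) under i.i.d. logistic errors."""
    p1 = logistic_cdf(x1 * beta + alpha)   # P(Y1 = 1 | x, alpha)
    p2 = logistic_cdf(x2 * beta + alpha)   # P(Y2 = 1 | x, alpha)
    p01 = (1.0 - p1) * p2                  # switch from 0 to 1
    p10 = p1 * (1.0 - p2)                  # switch from 1 to 0
    return p01 / (p01 + p10)

beta, x1, x2 = 0.7, 0.0, 1.0               # hypothetical values
for alpha in (-3.0, 0.0, 2.5):
    print(alpha, prob_01_given_switch(x1, x2, beta, alpha))
print("closed form:", logistic_cdf((x2 - x1) * beta))
```

The printed conditional probabilities coincide across values of α, which is the invariance that the conditional logit argument exploits. Without the logistic assumption this cancellation is not available in general, which motivates the bounds approach taken in the paper.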
Panel data binary response models are suitable for explaining individual choices when the choice set includes two alternatives, such as the choice between purchasing a product or not, the choice between two different brands and the choice between employment and unemployment.
Most of the papers dealing with identification of discrete choice panel data models use the binary case as an example. However, when the choice set includes more than two alternatives, the binary choice model may fail to take into account all the information.
Several papers, recognizing that the choice set individuals face may include more than two options, have extended the binary response panel data model to a multinomial response panel data model where the choice set includes a variety of unordered alternatives. Although these papers provide clear identification results, they usually require the comparison of each option against every other alternative, which might be intractable and computationally heavy in practice. This paper departs from these models and imposes some additional shape restrictions on the functional form, such that the binary model is extended to an ordered model. Such a method reduces the number of between-alternative comparisons needed and gives a clearer picture of which option combinations are needed for identification.
Models that impose ordering restrictions can describe situations where consumers choose between vertically differentiated products, such as the choice between flying first, business or economy class and the choice between buying a two-blade or a three-blade razor, as in Shaked and Sutton (1983), Berry (1994), Bresnahan (1987), and Aristodemou and Rosen (2015,WP). In market settings where consumers are observed purchasing a specific product regularly, consumers base their purchasing decisions on past choices, or decisions are correlated because consumers are locked in. As a result, dynamics play a crucial role in individuals' consumption decisions in these markets.
Identification of panel data ordered response models has not been extensively studied in the literature. Following the work by Honoré (1992), which shows how to consistently estimate the parameters in the truncated/censored panel data model, this paper focuses on the "in-between" case of ordered outcomes. Since every ordered response model can be expressed as a dichotomous/binary response model, parametric point-identification can be achieved under the assumptions of logistically distributed time-varying unobservables, as discussed in Chamberlain (1984, 2010), and time-constant thresholds. As discussed in Baetschmann, Staub, and Winkelmann (2015), the literature has estimated the fixed effects ordered logit model either with a single dichotomization with constant or individual-specific thresholds, or by combining all possible dichotomizations using various estimation methods, such as a two-step minimum distance method and the generalized method of moments. The former approach is inefficient as it does not use all the information and, in the case of individual-specific thresholds, has been shown to be inconsistent. The latter can be consistent and nearly efficient under correct specification.
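The dichotomization step mentioned above can be sketched in a few lines. The helper names `dichotomize` and `all_dichotomizations`, the coding of the ordered outcome as integers and the example data are hypothetical choices made for illustration, not a description of any particular estimator in the cited literature.

```python
import numpy as np

def dichotomize(y, cutoff):
    """Binary indicator 1(y >= cutoff) for an ordered outcome y."""
    return (np.asarray(y) >= cutoff).astype(int)

def all_dichotomizations(y, categories):
    """One binary outcome per cutoff above the lowest category."""
    cutoffs = sorted(categories)[1:]        # the lowest category gives no split
    return {k: dichotomize(y, k) for k in cutoffs}

y = np.array([0, 2, 1, 1, 2, 0])            # hypothetical outcomes in {0, 1, 2}
for cutoff, binary in all_dichotomizations(y, categories=(0, 1, 2)).items():
    print(cutoff, binary)
```

A single dichotomization keeps only one of these binary outcomes, which is why it discards information, while approaches that combine all dichotomizations use every cutoff.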
The approaches used in the literature rely on the distributional assumption of iid logistically distributed unobservables independent of the fixed effect and do not use all the information provided. Furthermore, they require time-invariant thresholds, which might be too restrictive in some settings. This paper departs from these approaches and examines semiparametric identification of fixed effects ordered response panel data models without imposing distributional assumptions on the fixed effect or the time-varying unobservables. The complete structure of the ordered choice model is used, thus utilizing information provided from the additional shape restrictions, while at the same time allowing for the possibility of time-varying thresholds.
As already discussed, allowing the fixed effect to be correlated with the explanatory variables creates an endogeneity problem. Chesher (2010) and Chesher and Smolinski (2012) derive sharp identification bounds in nonparametric cross-sectional ordered response models in the presence of endogenous variables. They show that these bounds can shrink at a relatively fast rate as the relevance of the instruments increases and as the number of ordered outcomes becomes larger. Furthermore, they show that when the ordered model is expressed as a binary response model, the identified sets are not sharp. In a more general setting, a series of papers, Chesher, Rosen, and Smolinski (2013) and Chesher and Rosen (2012, 2013), study identification of instrumental variables models with discrete dependent variables. This paper departs from the instrumental variables approach and tackles this problem by eliminating the unobserved heterogeneity and finding features of the distribution of the observed outcomes that do not depend on this time-invariant unobservable. In contrast to the binary response panel data model, identification of the regression parameters does not rely only on individuals who switch choices in two consecutive periods. It is shown that individuals who choose the middle option in two consecutive periods are also a useful source of identification. The greater number of pairs that can be used in the ordered model in comparison to the binary model can help in achieving tighter bounds for the regression parameters, whilst keeping the number of inequalities tractable. Furthermore, the use of information from individuals who stay with the same option might be useful in comparing the behaviour of switchers to non-switchers.
In this paper identification in the static and dynamic, binary and ordered panel data models is studied. The main result of this paper is that in general classes of panel data discrete response models where point-identification is known to fail, informative identification sets for the regression coefficients can still be derived. Numerical examples examine how these sets behave under different conditions. The rest of the paper is structured as follows. Section 2 reviews the work in RW2013 and examines identification in dynamic binary response models. Section 3 extends the binary response panel data model to an ordered response panel data model and examines identification under the static and dynamic settings. Section 4 gives some numerical results for the binary panel data model. Section 5 concludes with some final remarks and future steps.

The Binary Response Panel Data Model
This section examines identification in fixed effects binary response panel data models. Each individual in the population is observed for two time periods, $t = 1$ and $t = 2$, and in each time period can choose one outcome from the set of binary options, $\mathcal{Y}_t \equiv \{0, 1\}$. Therefore, each individual is characterized by a set of observables $(Y, X)$ such that $Y = (Y_1, Y_2)$, $X = (X_1, X_2)$, and a set of unobservables $(V, \alpha)$, where $V = (V_1, V_2)$ and $\alpha \in \mathbb{R}$. The utility that an individual with covariates $x_t$, $y_{t-1}$ and unobservables $v_t$, $\alpha$ receives from choosing a specific outcome $y_t$ in period $t$ is given by equation (1), where $x_t$ are observed individual characteristics, $y_{t-1}$ is the option chosen last period, $\alpha$ is the time-invariant individual fixed effect, unobserved by the econometrician, $v_t$ is the time-varying component, also unobserved by the econometrician, and $g(\cdot)$ is a function not yet specified, which is an element of some parameter space $\Theta$. Normalizing the utility of the outside option, $Y_t = 0$, to zero in each period, and under the assumption that in each period $t$ individuals choose the outcome that maximizes (1), the panel data binary response model can be shown to be equivalent to equation (2). Such a setting could be used to describe situations where individuals in each period decide whether to purchase a product or not, or whether to seek employment or stay unemployed.
In all these situations persistence in decisions has been documented by a number of authors.

Static Binary Panel Data Model
This section examines the simplest version of the model in (2), where the lagged dependent variable does not enter the decision rule directly and the correlation between choices in different periods comes only through the unobservables $\alpha$ and $V_t$, as given in equation (3).
Assumption 1. [...] where $\mathcal{F}$ contains the Borel sets. The support of $(X$ [...]
Assumption 2. For each value of $x \in \mathcal{X}$ there is a proper conditional distribution of $(Y_1, Y_2)$ given $X = x$, $P_0(y_1, y_2 \mid x) \equiv P(Y_1 = y_1 \wedge Y_2 = y_2 \mid X = x)$, and the conditional probability of each pair $(y_1, y_2)$ is point identified over the support of $(Y_1, Y_2)$ for almost every $x \in \mathcal{X}$.
Assumption 3. The conditional distribution of $V$ given $X = x$, $F_{V|X}$, is absolutely continuous with respect to Lebesgue measure with everywhere positive density and the marginal [...]
Assumption 4. The conditional distribution of $\alpha$ given $X = x$ is absolutely continuous with respect to Lebesgue measure with everywhere positive density on $\mathbb{R}$ and marginal distribution $F_{\alpha|X}$.
Assumption 5. $X$ and $V$ are stochastically independent. $\alpha$ is allowed to be correlated with both $V$ and $X$ in an arbitrary way. The joint distribution of $(V, \alpha)$ conditional on $X = x$ is given by $F_{(V,\alpha)|X}$.
Assumptions 3-5 provide restrictions on the conditional distributions of the unobservables given the observed covariates. These conditional distributions are elements of the generic collection of conditional distributions defined as [...]. Assumption 1 defines the underlying probability space and notation for the support of the random variables $(Y, X, V, \alpha)$. Assumption 2 stipulates that the conditional distribution of $(Y_1, Y_2)$ given covariates $x$ is point identified for almost every $x \in \mathcal{X}$, as would be the case, for example, under random sampling. Assumptions 3 and 4 require that the time-varying unobservable $V$ and the unobserved heterogeneity $\alpha$ are absolutely continuously distributed conditional on $X$ with full support in Euclidean space. Assumption 5 imposes independence of $X$ and $V$, but allows $\alpha$ and $V$ to be arbitrarily correlated, with joint distribution $F_{(V,\alpha)|X}$.
Assumption 4 imposes no restrictions on the time-invariant unobservable and allows correlation with the explanatory variables. Therefore, the presence of $\alpha$ creates an endogeneity problem that needs to be addressed for identification and consistent estimation of the parameters of interest. In linear panel data models with continuous outcomes, differencing out the fixed effect is sufficient to guarantee point-identification of the regression parameters. This paper examines identification in linear index discrete panel data models and mimics the approach used in linear panel data models with continuous outcomes to solve the problem of the fixed effect. To be able to proceed, the following additional assumption is imposed.
Assumption 6. The individual specific part, $g(\cdot)$, of the model in (3) is linear in the variables such that [...]
Finally, the set of structures $S \equiv (\beta, F_{(V,\alpha)|X})$ admitted by model (3) is defined through Assumption 7.
Assumption 7. $S \equiv (\beta, F_{(V,\alpha)|X}) \in \mathcal{S}$ is a specified collection of parameters $\beta$ and joint distributions of the time-varying unobservable and the unobserved heterogeneity, $F_{(V,\alpha)|X}$. Such an $S$ is called a structure.
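To make the assumptions concrete, the following is a minimal sketch of one data-generating process compatible with them. The specific choices are assumptions made for illustration only: the threshold-crossing form Y_t = 1(X_t β + α + V_t ≥ 0), a scalar regressor with support {0, 1, 2}, logistic V drawn independently of X, and an ad hoc rule that makes α depend on both X and V. The function name `simulate_static_panel` is hypothetical.

```python
import numpy as np

def simulate_static_panel(n, beta, rng):
    """Two-period draws from Y_t = 1(X_t*beta + alpha + V_t >= 0)."""
    x = rng.integers(0, 3, size=(n, 2)).astype(float)    # X_t in {0, 1, 2}
    v = rng.logistic(size=(n, 2))                        # V drawn independently of X
    # alpha is correlated with X and V through an arbitrary, ad hoc rule
    alpha = -x.mean(axis=1) + 0.5 * v[:, 0] + rng.normal(size=n)
    y = (x * beta + alpha[:, None] + v >= 0).astype(int)
    return y, x, v, alpha

rng = np.random.default_rng(1)
beta = 1.0                                               # hypothetical true value
y, x, v, alpha = simulate_static_panel(5_000, beta, rng)
switchers = y[:, 0] != y[:, 1]
print("share of switchers:", switchers.mean())
```

Only X and Y would be available to the econometrician; V and α are returned here purely so that claims about the unobservables can be checked in simulation.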

Identified Set: Binary Static Panel Data Model
Point-identification of the regression coefficients under both the logistic distribution assumption, Chamberlain (1984), and the time-stationarity assumption, Manski (1987), comes by observing individuals who change behaviour from one period to the next. This behaviour gives rise to features of the distribution that do not depend on the unobserved heterogeneity.
This paper uses exactly the same approach, and finds features of the distribution that do not depend on α, without imposing these strong assumptions on the time-varying unobservables.
As will be shown below, removing the unobserved heterogeneity, even under Assumption 6, leads to partial identification of the regression parameters. Under Assumption 6, the model in (3) in each period $t \in \{1, 2\}$ is given by equation (6). This model has been studied in RW2013. From the specification of the model in (6), the authors obtain the regions $R^{SB}_{(y_1,y_2)}(x; \beta)$ that partition $\mathrm{supp}(V, \alpha)$ for all $(Y_1, Y_2) = (y_1, y_2)$ and $(X_1, X_2) = (x_1, x_2)$ such that: [...] and the conditional joint distribution of $(V, \alpha) \mid X = x$ for any given pair $(y_1, y_2) \in \mathcal{Y}_1 \times \mathcal{Y}_2$ is [...], where $P(y_1, y_2 \mid x) = P(Y_1 = y_1 \wedge Y_2 = y_2 \mid X = x)$.
Theorem 1. [...] be a structure admitted by model (6). If $S^{SB}$ is an observationally equivalent structure to $S_0$, then under Assumptions 1-7, $\beta^{SB}$ satisfies, for a.e. $x \in \mathcal{X}$, [...]
Proof. The proof follows the work in RW2013 and is provided here for completeness. The crucial element in Theorem 1 is that, by limiting attention to individuals who change their choice from one period to the next, equation (8) becomes invariant to changes in the value of the unobserved time-invariant heterogeneity, $\alpha$. This implies that only realizations of the following two types can be used: [...] The choice probabilities of these events are given by: [...] From (9) it is clear: [...] Then [...] For any given $X = x$ and using Assumption 5, [...] The above relations provide restrictions on the distribution of $\Delta V \mid X$, for any realization of $x \in \mathcal{X}$, that do not depend on the fixed effect $\alpha$. The events $\{Y_1 = 0 \wedge Y_2 = 0\}$ and $\{Y_1 = 1 \wedge Y_2 = 1\}$ provide no restrictions on $\Delta V$ and cannot be used to eliminate the fixed effect $\alpha$. As in the binary logit fixed effects model, and as discussed in Honoré (2002), the individuals who do not switch cannot be used to identify the regression parameter, since for any value of $\beta$ the choices these individuals make can be rationalized by extremely large or extremely small values of the fixed effect. In other words, these events provide no restrictions on the values the fixed effect can take for a given value of the regression parameters. The distribution $\Delta V \mid X \sim F_{\Delta V|X}$ is equivalent to $\Delta V \sim F_{\Delta V}$ by Assumption 5. Notice that in order for the bounds in Theorem 1 to be informative, there should exist $x \in \mathcal{X}$ such that $x_1 \neq x_2$ with positive probability.
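The elimination of the fixed effect in the proof can be written out as a short worked derivation. It is a sketch under an assumed form of the model, $Y_t = 1(X_t\beta + \alpha + V_t \ge 0)$, with $\Delta V = V_2 - V_1$ and $\Delta x = x_2 - x_1$; the weak inequality and the sign convention are assumptions chosen to agree with the thresholds $-\Delta X\beta$ that appear in the text, and are not copied from the original displays.

```latex
% Assumed model: Y_t = 1(X_t\beta + \alpha + V_t \ge 0), t = 1, 2.
% Switching events imply restrictions on \Delta V that do not involve \alpha:
\begin{align*}
\{Y_1 = 0 \wedge Y_2 = 1\}
  &\iff \{x_1\beta + \alpha + V_1 < 0 \le x_2\beta + \alpha + V_2\}
  \implies \{\Delta V > -\Delta x\,\beta\}, \\
\{Y_1 = 1 \wedge Y_2 = 0\}
  &\iff \{x_2\beta + \alpha + V_2 < 0 \le x_1\beta + \alpha + V_1\}
  \implies \{\Delta V < -\Delta x\,\beta\}.
\end{align*}
% Taking probabilities conditional on X = x, and using independence of V and X
% together with continuity of F_{\Delta V}:
\begin{align*}
P(0,1 \mid x) &\le 1 - F_{\Delta V}(-\Delta x\,\beta), &
P(1,0 \mid x) &\le F_{\Delta V}(-\Delta x\,\beta),
\end{align*}
% which combine into a two-sided bound on F_{\Delta V} at the point -\Delta x\beta:
\begin{equation*}
P(1,0 \mid x) \;\le\; F_{\Delta V}(-\Delta x\,\beta) \;\le\; 1 - P(0,1 \mid x).
\end{equation*}
```

The non-switching events do not yield such a restriction because subtracting the two period inequalities is only possible when they point in opposite directions.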
It can be clearly seen that the bounds change with the value of $x \in \mathcal{X}$.
Suppose that $X_t \in \{0, 1\}$. There are two distinct cases where $x_t$ changes from $t = 1$ to $t = 2$, $x = (x_1, x_2) \in \{(0, 1), (1, 0)\}$, and the bounds in (13) are given by: [...] Now suppose that $X_t \in \{0, 1, 2\}$. There are six distinct cases where $x_t$ changes from period $t = 1$ to period $t = 2$, $x = (x_1, x_2) \in \{(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)\}$. Then the bounds in (13) are given by: [...] From the above it is clear that different values of $(x_1, x_2)$ lead to different identified distributions for $(Y_1, Y_2)$. Therefore, to characterize the identified set for $\beta$, the greatest lower bound and the smallest upper bound of (13) over all possible pairs $(x_1, x_2)$ need to be used. Following RW2013, take an arbitrary constant $\omega \in \mathbb{R}$ and notice that: [...] Therefore [...] The upper and lower bounds are then defined as: [...] Furthermore, notice that the lower and the upper bounds change with the value of $\omega$. When $-\Delta X\beta \le \omega$, [...] and when $-\Delta X\beta \ge \omega$, [...] Therefore, (15) is equivalent to [...]
Theorem 2. (RW2013) Let Assumptions 1-7 hold. The identified set for $\beta$ is given by: [...]
Theorem 3. (RW2013) Let Assumptions 1-7 hold. Then if $\beta \in \Theta^{SB}$, there exists a structure $S^{SB} = (\beta^{SB}, F^{SB}_{(V,\alpha)|X})$ that satisfies the restrictions of the model and is observationally equivalent to the structure $S_0$ that generates $P_0(y_1, y_2 \mid x)$. The sharp identified set for $\beta$ and $F_{(V,\alpha)|X}$ is given by: [...]
Proof. The proofs of Theorem 2 and Theorem 3 can be found in RW2013.
Proof. From Theorem 1 the existence of the conditional joint distribution of (V, α)|X for every β ∈ Θ SB implies the existence of the marginal distribution F α|X for every β ∈ Θ SB and a marginal distribution F V with an associated marginal distribution of the difference F ∆V .
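A rough numerical sketch of how membership in such an identified set could be checked is given below. It rests on one reading of the sup/inf characterization: a candidate $b$ is kept if, for every $\omega$, the supremum of $P(1,0 \mid x)$ over $\{x : -\Delta x\, b \le \omega\}$ does not exceed the infimum of $1 - P(0,1 \mid x)$ over $\{x : -\Delta x\, b \ge \omega\}$. On a finite support this is the same as a pairwise comparison. The design used to generate population probabilities (scalar regressor, logistic errors, a normal fixed effect centred at a function of x and approximated by quadrature) and all function names are assumptions made for illustration.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

def switch_probs(x1, x2, beta0, sigma=0.5, nodes=41):
    """Population P(1,0|x) and P(0,1|x) in a hypothetical design."""
    z, w = hermegauss(nodes)                    # quadrature for a standard normal
    w = w / w.sum()
    alpha = -beta0 * (x1 + x2) / 2 + sigma * z  # alpha | x, correlated with x
    p1 = logistic_cdf(x1 * beta0 + alpha)       # P(Y1 = 1 | x, alpha)
    p2 = logistic_cdf(x2 * beta0 + alpha)       # P(Y2 = 1 | x, alpha)
    return float(w @ (p1 * (1 - p2))), float(w @ ((1 - p1) * p2))

def in_identified_set(b, support, p10, p01, tol=1e-12):
    """Pairwise form of the sup/inf condition on a finite support."""
    for x, xp in itertools.product(support, repeat=2):
        idx = -(x[1] - x[0]) * b
        idxp = -(xp[1] - xp[0]) * b
        # If idx <= idxp, monotonicity of F requires P(1,0|x) <= 1 - P(0,1|x')
        if idx <= idxp and p10[x] > 1 - p01[xp] + tol:
            return False
    return True

beta0 = 2.0                                     # hypothetical true coefficient
support = list(itertools.product((0, 1, 2), repeat=2))
probs = {x: switch_probs(x[0], x[1], beta0) for x in support}
p10 = {x: p[0] for x, p in probs.items()}
p01 = {x: p[1] for x, p in probs.items()}
grid = np.linspace(-3, 3, 13)
accepted = [float(b) for b in grid if in_identified_set(b, support, p10, p01)]
print(accepted)
```

With a single scalar regressor the check depends on the candidate $b$ only through the ordering of $-\Delta x\, b$ across support points, so in this sketch the restrictions can only discriminate on the sign of $b$. With several regressors the ordering depends on relative magnitudes of the coefficients, which is where richer support would be expected to matter.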

Dynamic Binary Panel Data Models
Section 2.1 studies static panel data models, where individuals' choices are correlated across different periods only through the unobserved heterogeneity. In panel data settings with repeated observations it is natural to assume that individuals' past choices directly affect current and future decisions. For example, an individual's decision to seek employment in the current period is likely to be affected by his employment status last period. The dynamic binary response panel data model that includes the lagged dependent variable as an additional explanatory variable is equivalent to model (2). Last period's choice directly affects the decision, so the choice in period $t - 1$ needs to be taken into account. This creates an initial condition problem in modeling the choice in period $t = 1$, since the choice in period $t = 1$ depends on the choice in period $t = 0$. In handling this problem an approach similar to Wooldridge (2005) is used. In particular, in this paper it is assumed that the outcome in period $t = 0$, $Y_0 = y_0$, is observed; however, no assumptions are imposed about its generation or about its relation with the fixed effect, such that the set of conditioning covariates consists of $(x, y_0) \in \mathcal{X} \times \mathcal{Y}_0$. Section 2.2.1 formalizes the assumptions.

Model Assumptions
Assumption 8. The observed data comprise a random sample of $N$ individuals from the [...] where $\mathcal{F}$ contains the Borel sets. The support of $(X$ [...]
Assumption 9. For each value of $x \in \mathcal{X}$ and $y_0 \in \mathcal{Y}_0$ there is a proper conditional distribution of $(Y_1, Y_2)$ given $X = x$ and $Y_0 = y_0$, and the conditional probability of each pair $(y_1, y_2)$ is point identified over the support of $(Y_1, Y_2)$ for almost every $x \in \mathcal{X}$ and $y_0 \in \mathcal{Y}_0$.
Assumption 10. [...] absolutely continuous with respect to Lebesgue measure with everywhere positive density and the [...]
Assumption 11. The conditional distribution of $\alpha$ given $X = x$, $Y_0 = y_0$ is absolutely continuous with respect to Lebesgue measure with everywhere positive density on $\mathbb{R}$ and marginal distribution $F_{\alpha|X,Y_0}$.
Assumption 12. $X$ and $V$ are stochastically independent conditional on $Y_0$, i.e. $V \perp X \mid Y_0$. $\alpha$ is allowed to be correlated with both $V$ and $X$ in an arbitrary way. The joint distribution of [...]
Assumptions 8-11 have the same interpretation as in Section 2.1.1. Assumption 12 imposes only conditional independence between $X$ and $V$ conditional on $Y_0$, which is less restrictive than the assumptions imposed in the literature, such as $V \perp (X, Y_0)$ or specifying the conditional distribution $V \mid X, Y_0$. This assumption allows, for example, correlation between $V_0$ and $(V_1, V_2)$. Assumptions 10-12 provide restrictions on the conditional distributions of the unobservables given the observed covariates. These conditional distributions are elements of the generic collection of conditional distributions defined as [...]. As this paper examines identification in discrete linear-index panel data models, the following additional assumption, similar to Assumption 6, is also imposed.
Assumption 13. The individual specific part, $g(\cdot)$, of model (2) is linear in the variables such that [...]
Under Assumption 13 the model in (2) is given by equation (18). To account for last period's choice, this paper examines two different settings. In the first setting, $l(Y_{t-1}) = 1(Y_{t-1} = 1)$, such that the model is given by equation (19), and in the second, [...]. The rationale behind these settings is that the options in the models considered in this paper have a qualitative rather than quantitative meaning. The first setting is equivalent to the standard specification of dynamic binary panel data models, where the lagged dependent variable enters as an additional regressor in the form of $Y_{t-1}$ and the parameter $\gamma$ measures the "impact" of choosing option $Y_{t-1} = 1$. In the second setting, the parameter $\gamma$ captures the effect of choosing the same option from one period to the next, rather than "a unit change" effect, and can be interpreted as a state dependence variable. The main focus of this paper is on the first case. Finally, the set of admissible structures [...]
Assumption 14. $S \equiv (\beta, \gamma, F_{(V,\alpha)|X,Y_0}) \in \mathcal{S}$ is a specified collection of parameters $\beta$ and $\gamma$, and joint distributions of the time-varying unobservable and the unobserved heterogeneity. Such an $S$ is called a structure.
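The first dynamic setting can be illustrated with a small simulation sketch. The threshold-crossing form Y_t = 1(X_t β + γ Y_{t−1} + α + V_t ≥ 0), the parameter values, the rule generating the initial condition and the function name `simulate_dynamic_panel` are all assumptions made for illustration. The initial condition is generated by an ad hoc rule that depends on α, while V is drawn independently of everything else, which is stronger than the conditional independence the assumptions require.

```python
import numpy as np

def simulate_dynamic_panel(n, beta, gamma, rng):
    """Draws from Y_t = 1(X_t*beta + gamma*Y_{t-1} + alpha + V_t >= 0), t = 1, 2."""
    x = rng.integers(0, 3, size=(n, 2)).astype(float)   # X_t in {0, 1, 2}
    v = rng.logistic(size=(n, 2))                       # time-varying unobservables
    alpha = rng.normal(size=n) - x.mean(axis=1)         # fixed effect correlated with X
    # Initial condition: ad hoc rule depending on alpha, no model imposed on it
    y0 = (alpha + rng.normal(size=n) >= 0).astype(int)
    y1 = (x[:, 0] * beta + gamma * y0 + alpha + v[:, 0] >= 0).astype(int)
    y2 = (x[:, 1] * beta + gamma * y1 + alpha + v[:, 1] >= 0).astype(int)
    return y0, y1, y2, x, v

rng = np.random.default_rng(2)
beta, gamma = 1.0, -0.5                                 # hypothetical true values
y0, y1, y2, x, v = simulate_dynamic_panel(5_000, beta, gamma, rng)

# Switchers (Y1, Y2) = (1, 0) among individuals with Y0 = 0
mask = (y0 == 0) & (y1 == 1) & (y2 == 0)
dv = v[:, 1] - v[:, 0]
dx = x[:, 1] - x[:, 0]
print("switchers:", mask.sum())
print("all satisfy dV < -dX*beta - gamma:", bool(np.all(dv[mask] < -dx[mask] * beta - gamma)))
```

The final check shows, draw by draw, that the switching event is contained in an event defined only in terms of ΔV, which is the kind of feature that does not depend on the fixed effect.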
Under Assumptions 8-14 the identified set of admissible structures, denoted by $\mathcal{S}_0$, is characterized. Define by $u(\tilde{y}_1, \tilde{y}_2, x, y_0, \alpha, v_1, v_2; \beta, \gamma)$ the utility an individual with covariates $x$, $y_0$ and unobservables $\alpha$, $v_1$, $v_2$ receives from choosing any pair $(\tilde{y}_1, \tilde{y}_2) \in (\mathcal{Y}_1, \mathcal{Y}_2)$, and by $R_{(y_1,y_2)}(x, y_0; \beta, \gamma)$ the region of unobservables $(V, \alpha)$ associated with $(y_1, y_2)$ maximizing [...] Then $\mathcal{S}_0$ is characterized by equation (21), and the identified set for the model parameters $(\beta, \gamma)$ is then characterized by equation (22). Point-identification of the regression coefficients in a dynamic binary response model under the logistic distribution assumption comes from observing individuals for (at least) four time periods who change behaviour between periods $t = 2$ and $t = 3$, as shown in Honoré and Kyriazidou (2000). This behaviour gives rise to features of the distribution that do not depend on the unobserved heterogeneity. This paper uses the same approach, and finds features of the distribution that do not depend on $\alpha$, without imposing these strong assumptions on the time-varying unobservables. As will be shown below, removing the unobserved heterogeneity, even under Assumption 13, leads to partial identification of the regression parameters.

Identified Set: Binary Response Dynamic Panel Data Model
This section considers an extension of model (6) that includes last period's choice and uses the specification in (19). The model in (18) can then be expressed as: The model in (23) is equivalent to standard linear-index dynamic binary response panel data models, like the one discussed in Honoré and Kyriazidou (2000), where the lagged binary dependent variable enters as an additional regressor. As in Section 2.1 identification of the model parameters (β, γ) comes through features of the distribution that are invariant to changes in α, by considering the joint probability of the choices individuals make in periods t = 1 and such that for all (V, α) ∈ (V, A), (Y 1 , Y 2 ) = (y 1 , y 2 ) when X = x and Y 0 = y 0 are given by: The conditional joint probabilities of F (V,α)|X,Y 0 for any given pair (y 1 , almost every x ∈ X and y 0 ∈ Y 0 are thus defined by, where P (y 1 , y 2 |x, From the regions in (24) and Figure 1(a), it can be seen that the model in (23) is complete in the sense that conditioning on any value of the exogenous variables and the initial condition, there is a unique solution to the individual's problem with probability one. Therefore, applying the same approach as in Section 2.1 that removes α, identification sets of the form of (21) and (22) can be derived. Since the outcome in period t = 0 appears in the generation of the outcome in period t = 1, the outcomes in all three periods t = {0, 1, 2} are used. Assumption 11 imposes no restrictions on the fixed effect and identification of the parameters β and γ through the elimination of α comes only by observing individuals who switch in periods t = 1 and t = 2 for each of the values of y 0 . This implies that consideration of observations of the following types are required, Proof. The proof of Theorem 4 follows similar arguments as in Theorem 1 and is provided in Appendix A.1. 
Figures 1(a) and 1(b) plot the regions of unobservables given in equations (24), conditional on Y 0 = 0 and γ < 0, and provide an outline of the main idea. As can be seen from Figure 1(a), the model in (23) is complete, since conditional on any value of the explanatory variables and the initial condition there is a unique solution to the individual decision problem with probability one. Using this fact it can be shown that the probability of any switching event is bounded by the probability of an event that does not depend on the fixed effect. Figure 1(b) illustrates this for the event (Y 1 , Y 2 ) = (1, 0) conditional on Y 0 = 0 and γ < 0: changing α moves the region of (1, 0)|Y 0 = 0 up and down the line ∆V = −∆Xβ − γ. Therefore, it The proof is provided in Appendix A.2 and follows similar arguments as in Section 2.1 and in Chesher (2013) and RW2013.
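The invariance to α illustrated in Figure 1(b) can be traced in a short derivation. The sketch below assumes that model (23) takes the threshold-crossing form $Y_t = 1\{X_t\beta + \gamma Y_{t-1} + \alpha + V_t \ge 0\}$; this form is an assumption made here for illustration, chosen to agree with the line ∆V = −∆Xβ − γ in Figure 1(b).

```latex
\begin{align*}
Y_1 = 1 \mid Y_0 = 0 &\iff V_1 + \alpha \ge -X_1\beta, \\
Y_2 = 0 \mid Y_1 = 1 &\iff V_2 + \alpha < -X_2\beta - \gamma, \\
\text{so}\quad (Y_1, Y_2) = (1, 0) \mid Y_0 = 0
  &\implies \Delta V = V_2 - V_1 < -\Delta X\beta - \gamma .
\end{align*}
```

Subtracting the two inequalities cancels α, so the implied restriction on ∆V holds whatever value the fixed effect takes.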
In dynamic models this period's choice depends on last period's choice. This implies that the choice in period t = 1 depends on the choice in period t = 0, which is the first period observed in the sample. Unless this period coincides with the first period of the process, it will depend on previous (unobserved) periods, the exogenous variables in period t = 0 and the joint distribution of the outcome in the first period and the unobserved heterogeneity. This joint distribution is (in general) different from the joint distribution of future outcomes and the unobserved heterogeneity. Therefore, since V and α are allowed to be dependent, Assumption 12 of V ⊥ X|Y 0 does not imply (in general) V ⊥ (X, Y 0 ). Notice that unlike the Honoré and Kyriazidou (2000) |X,Y 0 that satisfies the restrictions of the model and is observationally equivalent to structure S 0 that generates P 0 (y 1 , y 2 |x, y 0 ). Bounds for β, γ and F (V,α)|X,Y 0 are given by the set: and sup x:−∆xβ≤ω The identified set in Theorem 6 does not specify the sharp identified sets for (β, γ, F (V,α)|X,Y 0 ).
The presence of the lagged dependent variable creates additional difficulties and defining the sharp identified set of structures is left for future work.
Suppose that instead of Assumption 12 of X ⊥ V |Y 0 , Assumption 5 of X ⊥ V as in Section 2.1.1 is imposed. Then the identified set for β and γ can be expressed in terms of the unconditional probabilities such that, Theorem 7. Let Assumptions 8-11, 5, 13-14 hold. Then the (unconditional) identified set for (β, γ) is given by, Proof. The proof applies the Law of Total Probability and is provided in Appendix A.3.

Dynamic Binary Panel Data Model with Changing Behaviour
Consider now the extension of model (6) that includes last period's choice, but allows for the lagged dependent variable to enter as specified by (20), Such a specification would be suitable in modeling cases where individuals form habits or are somehow locked-in once they made a decision and therefore staying with the same option from one period to the next has an additional effect on their utility, captured by the state dependence parameter γ. The individual decision rule in (18) can be written as, As in Section 2.2.2 the choice in period t − 1 directly affects the utility, and hence the choice in period t. Consider now the two cases where Y t−1 = Y t and Y t−1 = Y t , when Y t = Y t+1 .
Using the utility notation in (1), U t = g(X t , Y t−1 , α, V t ), in the first case the underlying utility function in model (27) becomes, and γ does not appear in the determination of U t . In the latter case the utility function becomes, and γ appears in the equation determining U t . Notice one important feature of the model in (27): the underlying within-period decision rule for each y ∈ Y t changes given Y t−1 . For This implies that, as will be shown below, the model in (27) is both incomplete and incoherent¹.
The regions of unobservables (V, α), R DB (y 1 ,y 2 ) (x, y 0 ; β, γ) associated with each (Y 1 , Y 2 ) = (y 1 , y 2 ) choice when X = x and Y 0 = y 0 are given by,

¹ The within-period decision rule will be different for different outcomes even in the case where the direction of change is explicitly modeled and allowed to have heterogeneous coefficients.
Proposition 1. From the regions defined in (31) and Figure 2, it can be seen that the dynamic model in (27) is incomplete, in the sense that ∃(x, y 0 ) ∈ X × Y 0 conditional on which there is not a unique solution to model (27) with probability 1.
Proof. From (31) the regions of unobservables associated with the choice pairs (Y 1 , Y 2 ) = (1, 0) and (Y 1 , Y 2 ) = (0, 0) when Y 0 = 0 are defined by, Suppose also γ < 0 and consider any (V, α) ∈ V * such that Then it can be shown that: This implies that the model in (27)  Proof. Suppose γ < 0 and Y 0 = 0. Consider any (V, α) ∈ V * * , where If −x 1 β − γ ≤ V 1 + α then Y 1 = 1 and Y 0 = 1, which contradicts the conditioning on This implies that conditioning on For (V, α) ∈ V * * it also means that −x 2 β < V 2 + α < −x 2 β − γ. This can correspond to −x 2 β < V 2 + α and V 2 + α < −x 2 β − γ. However, conditioning on Y 1 = 1 and Y 0 = 0 none of the two inequalities can hold. First take, −x 2 β < V 2 + α. This corresponds to the event Y 2 = 1|Y 1 = 0, which contradicts the Y 1 = 1 from above. Then the event V 2 + α < −x 2 β − γ corresponds to Y 2 = 0|Y 1 = 0 which again contradicts Y 1 = 1. Therefore, V * * = ∅ and the dynamic model in (27) is incoherent. Figure 2 shows the regions V * and V * * . The important feature of model in (27) is the presence of current period's outcome in the generation of current period's utility. Unless γ = 0, the model in (27) is logically inconsistent as discussed in Maddala (1986). Such a model would resemble models of simultaneous response games like Tamer (2003) and more general binomial response models with dummy endogenous regressors as in Lewbel (2007). The incompleteness and incoherency generally result in an inability to obtain point-identification of the regression parameters. Chesher and Rosen (2012) examine identification in incomplete and incoherent models and the application of these results to the model in (27) is left for future research.

The Ordered Choice Panel Data Model
Section 2 studies identification in binary panel data models. As discussed in Section 1 several papers, such as Chintagunta, Kyriazidou, and Perktold (2001)  This section extends the model (2) to a model of three ordered outcomes, where in every period t = 1 and t = 2 each individual chooses one option from the set Y t = {0, 1, 2}.
Such a model could be used, for example, in describing consumers' choices when faced with vertically differentiated alternatives such that if all the options were offered at the same price, where A ≾ B denotes that B is weakly preferred to A. Y t = 0 denotes the outside option and corresponds to not choosing any of the available alternatives. By extending the model in (2) to include options that are ordered, the choice rule for each individual can be expressed as, where x t are observed individual characteristics, y t−1 is last period's choice, α is the time-invariant individual heterogeneity unobserved by the econometrician, v t is the time-varying component unobserved by the econometrician and g() is a function not yet specified, which is an element of some parameter space Θ. c = {c 11 , c 12 , c 21 , c 22 } ∈ C are the threshold parameters in the ordered model, such that C ⊆ R 4 and c 2t > c 1t , ∀t ∈ {1, 2}. They can be interpreted as disutility parameters, exogenously determined and common to all individuals, incurred in period t to obtain quality level y. For the rest of the paper the vector c is assumed to be observed. In applications these parameters are more likely to be unknown and would therefore be included in the identified sets. The simplification is made to reduce the dimension of the identified set for the model parameters. In future research and applications the restriction that the thresholds are observed will be relaxed.
As already discussed in Section 1, the ordered response panel data model has not been extensively studied in the literature, and existing work has mainly focused on redefining the ordered response model as a set of binary response models and imposing logistically distributed unobservables. This paper departs from this approach and uses the ordered structure of the model directly to characterize the identified set, without imposing distributional assumptions on the unobserved time-varying components or the fixed effect. Such an approach uses more information than the binary response representation and provides informative identification bounds for the regression parameters.

Static Ordered Panel Data Model
This section examines identification of model (32) with no lagged dependent variables, where correlation comes only through the presence of the unobserved heterogeneity. Applying Assumption 6 of linearity in the variables to (32), the static ordered response model for the periods t = 1 and t = 2 can be expressed as, Under Assumptions 1-7 the identified set of admissible structures, denoted by S 0 , is characterized. Denote by u(ỹ 1 ,ỹ 2 , x, c, α, v 1 , v 2 ; β) the utility an individual receives from choosing any pair (ỹ 1 ,ỹ 2 ) ∈ (Y 1 , Y 2 ) given the observables x, c and the unobservables α, v 1 , v 2 , and by R (y 1 ,y 2 ) (x, c; β) the region of unobservables (V, α) for which (y 1 , y 2 ) maximizes this utility. Then S 0 is characterized by the condition

$$F_{(V,\alpha)|X,c}\big(R_{(y_1,y_2)}(x, c; \beta)\big) = P_0(y_1, y_2 \mid x, c) \quad \text{a.e. } x \in \mathcal{X} \text{ and } c \in \mathcal{C},$$

and the identified set for the model parameters β collects the values of β for which some admissible F (V,α)|X,c satisfies the same condition.

Identified Set: Ordered Response Static Panel Data Model
As in Section 2 no assumptions are imposed on the fixed effect, and identification bounds for the regression parameters, β, are derived by finding features of the distribution that do not depend on α. The regions R SO (y 1 ,y 2 ) (x, c; β) that partition the support of (V, α) such that (Y 1 , Y 2 ) = (y 1 , y 2 ) when X = x and for any fixed c, are given by: The conditional joint probabilities of (V, α)|X = x, c, F (V,α)|X,c , for any given pair (y 1 , y 2 ) ∈ (Y 1 × Y 2 ), conditional on x ∈ X and for any fixed c ∈ C are given by From the regions defined in (34) and plotted in Figure 3 it can be seen that for any fixed vector c the model in (33) is complete, in the sense that conditional on any value of the exogenous variables x ∈ X , there is a unique solution to the individual decision problem with probability one.
In addition to the individuals who change from period t = 1 to t = 2, it can be shown that information that does not depend on α is also provided by individuals who choose the same option, Y = 1, in periods t = 1 and t = 2 such that, This might prove helpful when comparing the behaviour of switchers to non-switchers.
Theorem 8. Under Assumptions 1-7, for any fixed parameter vector c, if S SO is an observationally equivalent structure to S 0 , then for any x ∈ X β SO satisfies where ∆X = X 2 − X 1 and ∆V = V 2 − V 1 .
Proof. The proof that the conditional probabilities P (y 1 , y 2 |x, c) in Theorem 8 are bounded by features of the distribution that do not depend on the fixed effect α is provided in Appendix A.4. Notice that in addition to the probabilities of the switching events, the conditional probability of the "in-between" event (1, 1) is also bounded by a conditional probability invariant to the fixed effect. The rationale behind this result is that, as discussed in Section 2, in the binary model the events (Y 1 , Y 2 ) = (0, 0) and (1, 1) give no information on β, since the behaviour in these "extreme" cases can be matched by extremely small or extremely large values of α regardless of the value of β. This is also true in the ordered model for the events (0, 0) and (2, 2). However, as can also be seen from Figure 3, in the ordered response model with Y t = {0, 1, 2} considered in this section, the middle event (Y 1 , Y 2 ) = (1, 1) restricts the values the fixed effect can take for each value of the parameter β, and hence it can be used to identify the regression coefficients.
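A short derivation shows why the middle event carries information. It is only a sketch: it assumes that the static ordered model (33) has the usual threshold-crossing form, in which $Y_t = 1$ exactly when $c_{1t} \le X_t\beta + \alpha + V_t < c_{2t}$. This form and the direction of the inequalities are assumptions made for illustration and may differ from the paper's exact convention.

```latex
\begin{align*}
Y_1 = 1 &\iff c_{11} \le X_1\beta + \alpha + V_1 < c_{21}, \\
Y_2 = 1 &\iff c_{12} \le X_2\beta + \alpha + V_2 < c_{22}, \\
\text{so}\quad (Y_1, Y_2) = (1, 1)
  &\implies c_{12} - c_{21} < \Delta X\beta + \Delta V < c_{22} - c_{11}.
\end{align*}
```

Differencing the two periods removes α, and because the middle category is bounded on both sides the event (1, 1) yields a two-sided restriction on ∆Xβ + ∆V. The events (0, 0) and (2, 2) are each bounded on one side only, so differencing them gives no restriction that is free of α.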
Theorem 9. Let Assumptions 1-7 hold, for any fixed parameter vector c. Using the definitions in (37), an outer region for β is given by the set: a.e. x ∈ X and c ∈ C and for any fixed ω, ω′, ω″ ∈ R
Proof. The proof is provided in Appendix A.5. Notice that following Theorem 8, information on the parameters β comes also from the non-switchers who choose option (Y 1 , Y 2 ) = (1, 1).
The identified set in Theorem 9 provides an outer region of the projection of the joint identified set for (β, F (V,α)|X,c ), defined in the theorem below.
Theorem 10. Let Assumptions 1-7 hold for any fixed parameter vector c. Then if β ∈ Θ SO , there exists a structure S SO = β SO , F SO (V,α)|X,c that satisfies the restrictions of the model and is observationally equivalent to structure S 0 that generates P 0 (y 1 , y 2 |x, c). The identified set for β and F (V,α)|X,c is given by: Notice that Theorem 10 does not characterize the sharp identified set. Sharpness remains an open question to be addressed in future work.

Dynamic Ordered Panel Data Model
Persistence in choice is evident when modeling individual behaviour in many different cases.
For example, in modeling consumer demand in markets with differentiated products where consumers are observed purchasing products regularly, it is often observed that consumers' choices are linked intertemporally. As discussed in Section 1, in addition to spurious state dependence, this correlation can also come through true state dependence, where past choices directly affect the utility function. This is also true in cases where consumers purchase a specific product regularly and decisions are over sets of differentiated alternatives. As discussed in Dubé, Hitsch, and Rossi (2009, 2010), the persistence in consumers' brand and quality choices is evident even after controlling for the fixed effects. A common explanation is that consumers face monetary, psychological and search costs that induce a lock-in effect, loyalty to specific alternatives and habit formation.
To capture these effects the model in (32) Similarly to the dynamic binary response panel data model, the presence of the lagged dependent variables results in an initial conditions problem. This paper does not examine the initial conditions problem; instead it is assumed that the outcome in the initial period t = 0, y 0 ∈ Y 0 ≡ {0, 1, 2}, is known, while the model generating it is not specified and the explanatory variables in that period do not need to be known. Notice that since the choice set is now Y 0 ≡ {0, 1, 2} the parameters of interest are (β, γ) ∈ Θ, with γ = (γ 1 , γ 2 )².
Under Assumptions 8-14 the identified set of admissible structures, denoted by S 0 , is characterized.

² Wooldridge (2005) discusses dynamic ordered response models conditional on the initial condition, modeling the distribution of the fixed effect given the initial condition (for example, under a normality assumption), which is a different approach from the one used in this paper.

Identified Set: Dynamic Ordered Response Model
As in Section 2.2 the initial condition Y 0 is assumed to be known, however no information for its generation is assumed, and the joint probabilities for a sequence of events in periods t = 1 and t = 2 will be conditional on the choice in period t = 0. As the model is complete, the regions R DO (y 1 ,y 2 ) (x, y 0 , c; β, γ) such that for all (V, α) ∈ (V, A), (Y 1 , Y 2 ) = (y 1 , y 2 ) when X = x , Y 0 = y 0 and c, partition the support of (V, α). The conditional joint probability of (V, α)|X, Y 0 , c, when X = x, Y 0 = y 0 and c is fixed, for any pair (y 1 , y 2 ) ∈ Y 1 × Y 2 , is given by where P (y 1 , y 2 |x, y 0 , c) = P (Y 1 = y 1 ∧ Y 2 = y 2 |X = x, Y 0 = y 0 , c).
As in Section 3.1, for the dynamic ordered panel data model, for any value of Y 0 = y 0 , with no assumption on the fixed effect, restrictions on the realization of ∆V |X, Y 0 , c and hence information for the values β, γ that does not depend on the unobserved heterogeneity parameter are provided by the following observations, Appendix A.6 proves that the probabilities of the sequences of events described in (41) conditional on a specific value of y 0 ∈ {0, 1, 2}, X = x and c ∈ C are bounded by the features of the distribution that do not depend on the fixed effect. Theorem 11 formalizes the identified set for (β, γ).
Theorem 11. Let Assumptions 8-14 hold for any fixed parameter vector c. The outer region for β and γ is given by the set: a.e. x ∈ X , y 0 ∈ Y 0 and c ∈ C The proof is similar to the proof of Theorem 9 and therefore omitted.

Numerical Examples
This

Static Binary Panel Data Model
Consider the two time-period static binary panel data model, where X t = (X 1t , X 2t ), β = (β 1 , β 2 ), $\alpha \mid X \sim N(\bar{X}\delta, \sigma_\alpha^2)$ with $\bar{X} = \tfrac{1}{2}(X_1 + X_2)$ and δ = (1, −1), and $V_t \mid X, \alpha \overset{iid}{\sim} f(\cdot)$. The true parameter value is β 2 = 1, after the normalization β 1 = 1, and $\sigma_\alpha^2 = 1$. For the purpose of comparison the baseline PGP is similar to the one used in RW2013 3 . Tables 1 and 2 give the identified sets for β 2 as described in Theorem 2 under different specifications for the distribution of the time-varying unobservables and as the support of the discrete explanatory variables (X 1t , X 2t ) changes. The first probit specification in Table 1, with $V_t \mid X, \alpha \overset{iid}{\sim} N(0, 1)$, is the same as in RW2013. In addition to the specification in RW2013, Table 1 also provides identified sets under the probit specification with $V_t \overset{iid}{\sim} N(0, \pi^2/3)$ and the standard logit specification, V t ⊥ X, α with iid logistic distribution.

[Tables 1 and 2: identified sets for β 2 , by support of (X 1t , X 2t ).]

From Tables 1 and 2 two main conclusions can be drawn. The first is that as the support of the explanatory variables increases the identified sets become narrower. This suggests that even though the model only partially identifies the regression parameters, the sets shrink around the true value as the support of the explanatory variables increases.
Secondly, it is evident that the model with $V_t \overset{iid}{\sim} N(0, \pi^2/3)$ errors and the standard logit model, whose unobservables have variance $\mathrm{var}(V_t) = \pi^2/3$, give almost identical identified sets. Since the two distributions are very similar, they tend to behave in a similar way. This might have implications for the identification and estimation of such models that are not suggested by the theoretical results.
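The similarity of the two error distributions can be checked directly. The following is a minimal sketch, not part of the original analysis, that compares the standard logistic CDF with the CDF of a normal distribution with the same variance, π²/3, on a grid; the grid range and spacing are arbitrary choices.

```python
import math

def logistic_cdf(v):
    """CDF of the standard logistic distribution (variance pi^2 / 3)."""
    return 1.0 / (1.0 + math.exp(-v))

def normal_cdf(v, sd):
    """CDF of a mean-zero normal distribution with standard deviation sd."""
    return 0.5 * (1.0 + math.erf(v / (sd * math.sqrt(2.0))))

# Standard deviation that matches the variance of the standard logistic.
SD_MATCHED = math.pi / math.sqrt(3.0)

# Largest absolute gap between the two CDFs on an illustrative grid.
grid = [i / 100.0 for i in range(-800, 801)]
max_gap = max(abs(logistic_cdf(v) - normal_cdf(v, SD_MATCHED)) for v in grid)
print("largest CDF gap on the grid:", round(max_gap, 4))
```

The gap stays within a few percentage points everywhere on the grid, which is consistent with the two specifications producing nearly the same identified sets.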

Changing PGP: Variance of unobservables
This section provides identified sets for β 2 under different PGPs for the variance of the time-varying unobservables and the fixed effect when the support of (X 1t , X 2t ) is {−2, −1, 0, 1, 2}. Table 3 gives the identified set as expressed in Theorem 2 when the true variances of the time-varying unobservables and the fixed effect change. As can be seen, the identified set for β 2 becomes wider for PGPs with higher variance of the unobservables.
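The simulation design can be sketched in a few lines. The code below is illustrative only: it assumes the threshold-crossing rule Y t = 1{X t β + α + V t ≥ 0}, normal time-varying unobservables, and covariates drawn uniformly and independently from the discrete support, none of which is confirmed by the text above. The function names `simulate_static`, `switchers_respect_bounds` and `switch_share` are hypothetical. The sketch does not compute the identified set of Theorem 2; it only verifies the α-free restriction that switching implies, namely ∆V > −∆Xβ for (0, 1) and ∆V < −∆Xβ for (1, 0).

```python
import random

def simulate_static(n, support, beta=(1.0, 1.0), delta=(1.0, -1.0),
                    sigma_alpha=1.0, sigma_v=1.0, seed=0):
    """Draw (y1, y2, dx_beta, dv) tuples from a two-period static binary design."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Assumption: each covariate is uniform on the discrete support.
        x1 = (rng.choice(support), rng.choice(support))
        x2 = (rng.choice(support), rng.choice(support))
        xbar = [0.5 * (a + b) for a, b in zip(x1, x2)]
        # Fixed effect correlated with covariates: alpha | X ~ N(xbar'delta, sigma_alpha^2).
        alpha = rng.gauss(sum(d * m for d, m in zip(delta, xbar)), sigma_alpha)
        v1, v2 = rng.gauss(0.0, sigma_v), rng.gauss(0.0, sigma_v)
        idx1 = sum(b * x for b, x in zip(beta, x1))
        idx2 = sum(b * x for b, x in zip(beta, x2))
        y1 = int(idx1 + alpha + v1 >= 0)  # assumed threshold-crossing rule
        y2 = int(idx2 + alpha + v2 >= 0)
        draws.append((y1, y2, idx2 - idx1, v2 - v1))
    return draws

def switchers_respect_bounds(draws, tol=1e-9):
    """Check the alpha-free inequalities implied by the two switching events."""
    for y1, y2, dxb, dv in draws:
        if (y1, y2) == (0, 1) and not dv > -dxb - tol:
            return False
        if (y1, y2) == (1, 0) and not dv < -dxb + tol:
            return False
    return True

def switch_share(draws):
    """Fraction of individuals whose choice differs across the two periods."""
    return sum(1 for y1, y2, _, _ in draws if y1 != y2) / len(draws)

SUPPORT = [-2, -1, 0, 1, 2]
for sigma_v in (1.0, 2.0):
    sample = simulate_static(20000, SUPPORT, sigma_v=sigma_v, seed=1)
    print(sigma_v, round(switch_share(sample), 3), switchers_respect_bounds(sample))
```

The check passes for any draw of α because the fixed effect cancels when the two latent indices are differenced; only the distribution of ∆V, and hence its variance, governs how informative the switching probabilities are.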

Dynamic Binary Response Panel Data Model
Consider the two time-period dynamic binary response panel data model as the one described in Section 2.2, where X t = (X 1t , X 2t ), β = (β 1 , β 2 ), $\alpha \mid X \sim N(\bar{X}\delta, 1)$ with $\bar{X} = \tfrac{1}{2}(X_1 + X_2)$ and δ = (1, −1), and $V_t \mid X, \alpha \overset{iid}{\sim} N(0, 1)$. The true parameter values are β 2 = 1 and γ = 0.5 after normalizing β 1 = 1. , as the support of (X 1t , X 2t ) increases. It is clear that, for the specific range of values chosen for the grid of γ, the identified set for β 2 shrinks as the support of the discrete explanatory variables (X 1t , X 2t ) increases; however, with three periods the identified set for γ is only bounded from above. Recall that, as shown in Honoré and Kyriazidou (2000), the parameters in the dynamic binary response model with one lagged dependent variable and logistically distributed unobservables are point-identified with four periods. Extending the model in Section 2.2 to include more than three time periods is currently in progress.
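A sketch of this dynamic design is given below. It assumes the rule Y t = 1{X t β + γY t−1 + α + V t ≥ 0}, uniform covariates on the discrete support, and a purely hypothetical initial condition Y 0 = 1{α + V 0 ≥ 0}; the paper leaves the generation of Y 0 unspecified, so that line is for illustration only. The function names are likewise hypothetical. The check uses the fact that differencing the two latent indices gives ∆Xβ + ∆V + γ(Y 1 − Y 0 ), which must be positive for (0, 1) and negative for (1, 0), whatever the value of α.

```python
import random

def simulate_dynamic(n, support, beta=(1.0, 1.0), gamma=0.5,
                     delta=(1.0, -1.0), seed=0):
    """Draw (y0, y1, y2, dx_beta, dv) tuples from a dynamic binary design."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Assumption: each covariate is uniform on the discrete support.
        x1 = (rng.choice(support), rng.choice(support))
        x2 = (rng.choice(support), rng.choice(support))
        xbar = [0.5 * (a + b) for a, b in zip(x1, x2)]
        alpha = rng.gauss(sum(d * m for d, m in zip(delta, xbar)), 1.0)
        v0, v1, v2 = (rng.gauss(0.0, 1.0) for _ in range(3))
        y0 = int(alpha + v0 >= 0)  # hypothetical initial condition
        idx1 = sum(b * x for b, x in zip(beta, x1))
        idx2 = sum(b * x for b, x in zip(beta, x2))
        y1 = int(idx1 + gamma * y0 + alpha + v1 >= 0)
        y2 = int(idx2 + gamma * y1 + alpha + v2 >= 0)
        draws.append((y0, y1, y2, idx2 - idx1, v2 - v1))
    return draws

def dynamic_switchers_respect_bounds(draws, gamma, tol=1e-9):
    """Check the alpha-free inequalities for switchers, conditional on y0."""
    for y0, y1, y2, dxb, dv in draws:
        # Difference of the period-2 and period-1 latent indices (alpha cancels).
        gap = dxb + dv + gamma * (y1 - y0)
        if (y1, y2) == (0, 1) and not gap > -tol:
            return False
        if (y1, y2) == (1, 0) and not gap < tol:
            return False
    return True

GAMMA = 0.5
sample = simulate_dynamic(20000, [-2, -1, 0, 1, 2], gamma=GAMMA, seed=1)
print(dynamic_switchers_respect_bounds(sample, GAMMA))
```

For Y 0 = 0 and the event (1, 0) the condition reduces to ∆V < −∆Xβ − γ, the boundary plotted in Figure 1(b).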

Concluding Remarks
This paper studies identification of discrete response panel data models with fixed effects. Un- As discussed in Section 1, when the time-varying unobservables are independent and identically distributed with a logistic distribution, the regression parameters in the linear index static and dynamic binary models can be point identified. The feature of the distribution these papers use that does not depend on the unobservable α is the conditional probability of the outcome variable in a specific period taking a specific value, conditional on the event that individuals change at some period in the past. The feature of the distribution that does not depend on the unobserved heterogeneity in this paper is the joint probability of the two outcome variables taking different values in two periods. If Assumptions 3 and 10 are strengthened such that the time-varying unobservables follow an independent and identically distributed logistic distribution, the bounds provided in this paper might still fail to be singletons. Chamberlain (1984, 2010) and Honoré and Kyriazidou (2000) prove point-identification of the regression parameters, when the regressors have bounded support, under the assumptions that the time-varying unobservables follow a logistic distribution and are independent of both the explanatory variables (X 1 , X 2 ) and α. Under Assumptions 5 and 12, V is allowed to be correlated with α. Imposing the assumption that V ⊥ α may or may not lead to point-identification, but this would require an additional assumption which might not be credible or testable.
In conclusion, even though the identification bounds in this paper might not be singleton sets, they provide information on the regression parameters under fairly weak conditions. Since the sets do not depend on any distributional assumption on the unobservables, they can provide information for a general class of linear index static and dynamic discrete response panel data models with fixed effects. Furthermore, they are relatively simple to construct and therefore might be easy to use for computation and inference. Extending the models to incorporate more time periods is currently in progress and, as an immediate next step, the models studied in this paper will be applied to consumer demand for differentiated products.

A Proofs of Theorems
A.1 Proof of Theorem 4
This section provides the proof of Theorem 4. Following similar arguments as in Theorem 1, it can be shown that the event probabilities described in Theorem 4 do not depend on α.
Consider the sequences of events in (26). Conditioning on Y 0 = 0 event A implies the following conditional probability for the event (Y 1 , Y 2 ) = (0, 1), Conditioning on Y 0 = 0 event B implies the following conditional probability for the event Conditioning on Y 0 = 1 event C implies that the conditional probability of event (Y 1 , Y 2 ) = (0, 1) is equivalent to Conditioning on Y 0 = 1 event D implies that the conditional probability of the event (Y 1 , Y 2 ) = (1, 0) is equivalent to For any fixed X = x and by applying Assumption 12 , then which completes the proof.

A.2 Proof of Theorem 5
This section proves Theorem 5. Using arguments similar to the ones in Section 2.1 and in Chesher (2013) and RW2013, it can be shown that for any constant ω ∈ R, conditioning on and conditioning on Y 0 = 1, The relations in (43) and (44) in combination with Assumption 12 imply that, ∀ω ∈ R: These equalities in combination with (43), (44) and by applying Assumption 12 imply that (45) is equivalent to: which completes the proof.

A.3 Proof of Theorem 7
To prove the unconditional identified set in Theorem 7, first notice that Define P (Y 0 = 0|X = x) = P 0 (x) and P (Y 0 = 1|X = x) = P 1 (x), which are fully observed.
Following relations (43) and (44), ∀ω ∈ R: Define by Then multiplying by P 0 (x) and P 1 (x) such that, implies that the relations in (47) can be expressed as where the last result follows from Assumption 5 of V ⊥ X. The last relation implies that, This completes the proof.

A.4 Proof of Theorem 8
The conditional probabilities P (Y 1 = y 1 ∧ Y 2 = y 2 |X = x, c) = P (y 1 , y 2 |x, c) of the events given in (35) are then given by: From (48) it can be shown that: Then using Assumption 5, for any fixed c ∈ C, the relations in (49) can be expressed as: The inequalities in (50) lead to bounds for β, such that for any given X = x, The above relations show that changes in choice from period t = 1 to t = 2 provide restrictions on the distribution of ∆V that do not depend on the fixed effect, α. In the binary case discussed in Section 2 the events (Y 1 , Y 2 ) = (0, 0) and (1, 1) give no information on β, since the behaviour in these "extreme" cases can be matched by extremely small or extremely large values of α. This is also true for the static ordered model for the events (0, 0) and (2, 2).
However, in the ordered response model with Y t = {0, 1, 2} considered in this section, there is an "in-between" event, (Y 1 , Y 2 ) = (1, 1), that provides information on β without involving the fixed effect. To see this, consider the joint probability of the event (36): By combining it can be shown that: which does not depend on α. This completes the proof.