Estimating Dynamic Panel Data: A Practical Approach to Estimating Long Panels

Panel data methodology is one of the most popular tools for quantitative analysis in the field of social sciences, particularly on topics related to economics and business. This technique allows simultaneously addressing individual effects, numerous periods and, in turn, the endogeneity of the model or of the independent regressors. Despite these advantages, there are several methodological and practical limitations to performing estimations with this tool. There are two types of models that can be estimated with panel data: static and dynamic. The former is the most developed, while dynamic models still have some theoretical and practical constraints. This paper focuses precisely on the latter, dynamic panel data, using an approach that combines theory and praxis, and paying special attention to its applicability to macroeconomic data, especially datasets with a long period of time and a small number of individuals, also called long panels.


Introduction
Studies on panel data methodology began in the twentieth century, answering new questions that pooled data analysis or time series could not directly solve. The first works on this methodology were focused on linear regressions and static models, where fixed and random effects were determined assuming a fixed temporal effect, without paying enough attention to endogenous relationships. To analyse these interactions, a new tool was later developed: dynamic models; Balestra & Nerlove (1966), Nerlove (1971) and Maddala (1971, 1975) are some of the first works. Finally, in the 70's, empirical studies on dynamic panel data began to be published in specialized journals. Dynamic panel data methodology offers some advantages in comparison to the static version: the possibility to address the heterogeneity of the individuals, and also the use of several instrumental variables, typically lagged variables, in order to deal with the endogeneity of the variables of the model. Moreover, along with the estimation of models with endogenous variables, it is possible to perform more sophisticated models (Ruíz-Porras 2012). However, dynamic panel data also has some weaknesses. First, estimators can be unstable and the reported values could depend on characteristics of the sample. Also, the use of lagged variables cannot necessarily deal with serial correlation problems (Pérez-López 2008). In addition, it is complex to find appropriate instruments for some endogenous regressors when only weak instruments are available. Nevertheless, one of the main limitations of this methodology is the analysis of long time periods (long t) and few individuals (short n), which could result in the overidentification of the model (Ruíz-Porras 2012).
Several empirical studies in the field of economics use databases with long time periods and a small number of individuals, for example when researchers try to understand the effect of key factors on the performance of companies, industries or territories. In order to build these models, some authors have proposed to treat the equations from the different cross-section units as a system of seemingly unrelated regression equations (SURE) and then estimate the system by generalized least squares (GLS) techniques (Pesaran 2006), while others consider panel data a more adequate methodology for these estimations. The latter situation is the main target of this article, which provides some alternatives to face it and estimate dynamic models with long panels (long t and short n).
During the nineties, studies of endogenous models using dynamic panel data (DPD) became usual and several works on this methodology were carried out. Relevant contributions on DPD by Arellano & Bond (1991), Arellano & Bover (1995), Blundell & Bond (1998) and Roodman (2009) were provided in order to improve the understanding of complex economic processes in empirical research. Although more than thirty years have passed since the first works, this technique still has some open questions. Thus, the purpose of this article is to guide the reader in the use of dynamic panel data and provide some clues to overcome the limitations that arise when panels are formed by a long t and a short n. This restriction is addressed using Stata, and solutions are offered in this text.
The paper is made up of four sections. The next section offers a review of dynamic panel data, including the models to be analyzed with this methodology. Then, a detailed description of how to estimate long panels is included.
The fourth section includes examples of endogenous model estimates using Stata.
Finally, concluding remarks are provided.

Evolution and Advance on Panel Data Methodology
In the last fifty years, panel data methodology has become one of the most popular tools for empirical studies in different fields of knowledge. There has been important progress in the knowledge of static models, but the dynamic version still presents some theoretical and practical constraints. The purpose of this paper is to provide some clues and recommendations for the use of dynamic panel data, specifically for estimating endogenous models with long panels.
Panel data is a statistical tool to estimate models using a number of individuals (companies, countries, households, etc.) across a defined period of time. This technique differs from cross-sectional analysis, which studies several individuals at a specific point in time, and from time-series methodology, which analyses the same individual across time. Thus, the use of panel data requires two conditions: data from different individuals (n) collected over time (t). In addition to these conditions, restrictions may also arise due to the number of observations and the relationship between n and t. The recommendation for a panel data model is to use a large number of individuals (n) and a small period of time (t), in order to have adequate degrees of freedom and avoid overidentification. This methodology has been used more frequently in studies at the firm level, because such databases usually have a large number of individuals (n) observed over a short period of time (t). This condition offers the advantage of capturing the variability of the phenomenon through the observation of a large number of cases. At an aggregate level (e.g. countries, regions, sectors, etc.), whose databases frequently have a small n/t ratio, even less than 1, serious difficulties arise when endogenous models are estimated. Figure 1 shows an example comparing OLS and panel analysis in which the individual effects have been taken into account. Different results (models) are obtained depending on whether the estimation is performed by OLS or by panel methods, as a consequence of the individual effects, which can be captured by panel data methodology. In fact, the individual effects (dashed line) generate a greater slope of the function than the OLS estimate (solid line), better adjusting the model to the observed data and improving its explanatory capacity.
As mentioned above, there are two main types of panel data: static panels, used to estimate static models, and dynamic panels, more suitable for endogenous models. Static panels can be classified into models with fixed or random effects, depending on how they consider the individual effects, assuming in both cases that these effects are constant over time. This restriction limits the capacity of static models to capture time-varying dynamics or endogeneity. On the contrary, dynamic panel data allows us to treat the endogeneity of the variables and of the model.
From an evolutionary perspective, Nelson & Winter (1982) and Dosi (1988) indicated that endogenous models are highly dependent on the past and its accumulative process. Dynamic panels allow including an endogenous structure in the model through instrumental variables. This endogeneity is defined as the existence of correlation between the dependent variable and the error term, which may arise from the causal relationship between the variables explaining the model (Mileva 2007, Wooldridge 2013), inadequate data quality, autoregression and autocorrelated errors, and/or the omission of relevant variables. In economic terms, endogeneity can be interpreted as the effect of the past on the present, both on the model (dependent variable) and on the independent variables, or as the causality relationship between regressors and explained variable over time.
The inclusion of the dependent variable as a regressor, consistent with the work reported by classical authors such as Arellano & Bond (1991), Arellano & Bover (1995) and Blundell & Bond (1998), is performed by using lagged endogenous terms as a way to avoid correlation problems between variables. The second term of the function (the regressors) corresponds to the lag of the dependent variable (Y(i,t−n)) plus the independent variables (X(it)); because the causality is related to time, the regressor is included as the lag of Y(it). In addition, not only the lagged variables can be used as instruments of the endogenous variables, but also other independent variables correlated with the target regressor but not with the error term of the model. In general, these types of instruments are not easy to detect, and many times they are only weakly correlated with the endogenous variable.
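As a reference, this structure corresponds to the standard dynamic panel specification (a sketch in generic notation, not taken verbatim from the original; here μ_i denotes the unobserved individual effect and ε_it the idiosyncratic error):

```latex
y_{it} = \gamma\, y_{i,t-1} + \beta' x_{it} + \mu_i + \varepsilon_{it},
\qquad i = 1, \dots, n; \quad t = 2, \dots, T
```

Because $y_{i,t-1}$ is correlated with $\mu_i$ by construction, OLS and the within estimator are inconsistent for this model, which motivates the instrumental-variable strategies discussed below.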

Types of Dynamic Panel data
The evolution of the analysis of dynamic panel data and the construction of new estimators have introduced new possibilities for the analysis of endogenous models, with special focus on the econometric treatment of endogeneity.
Two main ways have been developed to address the endogeneity of the models, in addition to traditional instrumental variables: the first is to build instrumental variables in levels, while the second corresponds to the generation of those variables in differences. However, even when the literature has shown advances in these analyses, there are some difficulties in the application of dynamic panel data. This is particularly discussed in this document.

Dynamic Panel Functions: Instrumental Variables in Differences and Levels
The first method for the treatment of the endogeneity problem uses instrumental variables obtained through lags of the endogenous variables. Depending on the estimator used, these lags may be applied in differences or in levels, giving rise to instruments in differences or instruments in levels, and correspondingly to equations in differences or equations in levels. Considering the way instruments are built in dynamic panel data, it is possible to find different estimators. The first one was developed by Arellano and Bond in 1991 (Arellano & Bond 1991); it is known as Difference GMM because it uses lagged values as instruments for the equations in differences.
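The standard forms behind these labels (a hedged sketch following the usual presentation in Arellano & Bond 1991 and Arellano & Bover 1995, with the notation of the model above) are:

```latex
% Equation in differences: first-differencing removes the individual effect \mu_i
\Delta y_{it} = \gamma\, \Delta y_{i,t-1} + \beta' \Delta x_{it} + \Delta \varepsilon_{it}

% Instruments for the equation in differences: lagged levels of the variable
E\left[ y_{i,t-s} \, \Delta \varepsilon_{it} \right] = 0, \qquad s \ge 2

% Instruments for the equation in levels: lagged differences (added by System GMM)
E\left[ \Delta y_{i,t-1} \, (\mu_i + \varepsilon_{it}) \right] = 0
```

Difference GMM uses only the first set of moment conditions; System GMM combines both sets, instrumenting the level equations with lagged differences.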
Later, an estimator was developed that uses as instrumental variables the lags in both differences and levels. This change made it possible to work with panels composed of a small number of periods, and therefore with a small number of instruments. It is known as System GMM and was developed by Arellano & Bover (1995).
A third estimator was developed by Roodman (2006): the xtabond2 command. This estimator follows the same logic as System GMM, but introduces more options in the use of the instruments. In addition, xtabond2 allows us to treat separately the endogeneity of the dependent and the independent variables.
As we have mentioned, System GMM uses the instruments in levels and differences, combining the equations in differences and in levels into one system. The error term ε(it) has two orthogonal components: the fixed (individual) effects and the idiosyncratic shocks. The use of dynamic panel data in differences (Difference GMM) and in system (System GMM) requires different commands in Stata: xtabond (Arellano & Bond 1991). This command uses as instrumental variables the lags of the endogenous variable in differences (Difference GMM).
xtdpdsys (Arellano & Bover 1995). This command uses as instrumental variables of the endogenous variable the lags in differences and levels (Difference and System GMM). xtabond2 (Roodman 2006). Similarly to xtdpdsys, it uses as instrumental variables of the endogenous variable the lags in levels and differences. This is not an official Stata command, but an option provided by Roodman (2006). xtdpd. It is used for the regression of endogenous variables with instruments in differences or levels. According to Cameron & Trivedi (2009), this command allows correcting the model for moving-average errors, detected by the Arellano and Bond test (second-order autocorrelation).
In addition, the estimators mentioned above allow us to do the analysis through two alternatives, One step and Two steps, depending on whether the weight matrix is homoscedastic or heteroscedastic. The literature indicates that Two-step estimators are more efficient; therefore, the use of the heteroscedastic matrix is recommended in this type of estimation.
One step: it uses the homoscedastic weight matrix for the estimation. Two steps: it uses the heteroscedastic weight matrix for the estimation.
The differentiation between these alternatives is key for the determination of overidentification in a dynamic model, as we will analyze in the next section.

Main Issues in the Estimation of Dynamic Panel data Using GMM
The utilization of GMM in the estimation faces two main issues: the proliferation of instruments and the serial autocorrelation of the errors. Both issues become more severe when the panel is made up of a sample with a long period of time and a reduced number of individuals.
The proliferation of instruments refers to the generation of an excessive number of instruments, which causes overidentification of the model. Previous authors propose to treat the equations from the different cross-section units as a system of seemingly unrelated regression equations (SURE) and then estimate the system by generalized least squares (GLS) techniques (Pesaran 2006). Specifically, when N is small relative to T and the errors are uncorrelated with the regressors, cross-section dependence can be modelled using SURE (Chudik, Pesaran & Tosetti 2011).
Although Pesaran (2006) pointed out some ideas for dealing with different sizes of N and T, the proposal is not totally appropriate when both N and T are large, as is the case in country studies. For large N and T, some authors propose restricting the covariance matrix of the errors using a common factor specification with a fixed number of unobserved factors (Hoechile 1933, Phillips & Sul 2003). However, some econometric errors occur when N is large. For that reason, Pesaran (2006) proposed applying the Common Correlated Effects (CCE) estimators when N and T tend to infinity.
However, it is quite common to have a small N and a large T. In this case, four solutions have been proposed: 1) running a separate regression for each group and averaging the coefficients over groups; 2) combining the data, defining a common slope, allowing for fixed or random intercepts and estimating pooled regressions (Mairesse & Griliches 1988); 3) taking the data average over groups and estimating aggregate time-series regressions (Pesaran, Pierse & Kumar 1989, Lee, Pesaran & Pierse 1990); and 4) averaging the data over time and estimating cross-section regressions on group means (Barro 1991). These solutions present some limitations: the group mean estimator obtained by averaging the coefficients for each group is consistent for large N and T, but the pooled and aggregate estimators are not consistent in dynamic models and there is a bias (Pesaran & Smith 1995).
Another approach deals with databases formed by small N and large T using panel data methodology as a solution to this condition (N small in comparison to T). This work follows that line of research.

Revista Colombiana de Estadística 41 (2018) 3152
The use of a large number of individuals over a short period of time is the most common type of data in dynamic panel analysis; such panels are called short panels. The literature does not specify a number of individuals (n) or time periods (t) to classify panels as long or short. However, some authors have indicated the following rule of thumb: a suitable n could be greater than 100, while t should not exceed 15 periods, and ideally should be less than 10, if the target is to estimate dynamic models with panel data (Roodman 2009). When we try to estimate dynamic models with panels formed by a relatively small n (n < 100) and a large t (t > 15), using lags of the variables as instruments of the endogenous terms, we find additional difficulties due to the panel data structure and the way the instrumental variables are generated. This fact is caused by the incorporation of lags of the endogenous variable as its instrument(s), which must be correlated with the endogenous regressor while satisfying E(μ|x) = 0. This choice of instruments (lags of the endogenous variables) solves the problem of finding a suitable instrument for the endogenous regressors. Anderson & Hsiao (1981), Arellano & Bond (1991) and Arellano & Bover (1995) have demonstrated the importance of lags as instruments and their relevance for estimating dynamic models. Nevertheless, when using long panels an important obstacle emerges: the proliferation of instruments (Roodman 2009). This is because the number of instruments to be generated is directly related to the length of the panel (number of periods). For example, for a variable with t = 5, the number of potential instruments under GMM methodology is 12 (9 from the equations in differences and 3 from the equations in levels).
In the case of the equations in differences, the number of instruments grows with the number of periods. If there are one or more endogenous variables, the number of instruments increases even more, as each regressor is instrumented by all its differences and levels (with GMM). This proliferation of instruments was initially seen as favorable, since it increased the efficiency of the estimator (Arellano & Bond 1991); however, it causes overidentification of the model, mainly when the number of degrees of freedom is small, e.g. when there are few individuals. Therefore, as the panel grows in periods and decreases in number of individuals, the probability of overidentification increases. Table 1 shows a brief summary of the main problems that arise with the use of panel analysis, where H0 = the overidentification restrictions hold.
In order to avoid overidentification of the model, the number of individuals or groups must be greater than the number of instruments used. Therefore, reducing the number of instruments becomes a necessary condition when we use long panels. The literature shows various alternatives to solve this problem, depending on the nature of the model, the purpose of the analysis, the length of the panel and the characteristics of the variables. The first alternative is to reduce t, dividing the analysis into two sections (two separate models). Another possibility is to group the periods (e.g. using biennia, triennia or others). However, these options are limited, because they reduce the information available for the analysis, affecting the variance.
Another alternative is to reduce the instruments through the restriction of lags.
As an endogenous model is specified by incorporating the lag(s) of the dependent variable (Y) as regressor(s), it is common to limit the lags to one or two periods, that is, Y(t−1) and Y(t−2) (commonly known as L1 and L2, respectively). If we suspect a delayed (endogenous) effect, it is recommended to add more lags, a situation that can be offset by the elimination of those closest to t0, since each L (lag of each endogenous regressor) incorporates more instruments.

It is also possible to reduce the generation of instrumental variables, either lags of Y or of the endogenous regressors, using only equations in differences or in levels. In addition, if we need to further reduce these variables, we can restrict the lags of each variable to a value between t−1 and t−n; in other words, we can estimate the model using as instruments only those generated for a time interval and not for the entire period of the panel.
In order to select the option to reduce the number of instruments, some criteria are proposed:
• Sample characteristics: number of individuals (n) and time periods (t).
• Literature review on the characteristics of the model (endogenous or not) and of the regressors.
• Serial correlation between the model's errors.
• Overidentification: when there are many instruments, more than one alternative may be required to limit them.
In endogenous models, in addition to the overidentification discussed above, additional drawbacks related to second-order serial autocorrelation of the residuals can arise, indicating that the instrument used is not consistent. Given this limitation, we constantly need to test the instrumental variables in order to define the most appropriate regressor, because even when their number is suitable, the serial autocorrelation problem may remain. To identify whether there is autocorrelation, the Arellano and Bond test should be used. The null hypothesis is:
• Ho: There is no autocorrelation.

When the Arellano and Bond test indicates that there is serial correlation at both orders, we are probably facing a unit-root model.

Modeling Endogenous Functions With Panel Data: Step by Step
This section contains the syntax for the use of dynamic panel data in Stata and the interpretation of the set of estimators: xtabond, xtdpdsys and xtabond2.

xtabond Estimator (Instrumental Variables in Differences)
To perform a regression using xtabond we will distinguish between models with endogenous, exogenous and/or predetermined variables.
First, we should run the estimation without the vce(robust) option, and then apply the Sargan test (this test only works without that option).
The order of the syntax is as follows: xtabond indicates to Stata that you are using dynamic panel data; then the dependent variable (vardep) has to be written, followed by the independent exogenous variables (in the example: var1, var2, var3 and varn). Finally, after the comma and with the expression lags, you should introduce the number of lags of the dependent variable used as regressor. Second, the model is estimated with the option vce(robust). After this option we will add the Arellano and Bond test to determine the existence of serial autocorrelation (estat abond).
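A minimal sketch of this two-stage routine (vardep, var1–var3 and the panel identifiers id and year are placeholder names, not taken from the original text):

```stata
* Declare the panel structure (id and year are placeholder variable names)
xtset id year

* Step 1: estimate without vce(robust), since the Sargan test requires it
xtabond vardep var1 var2 var3, lags(1)
estat sargan

* Step 2: re-estimate with robust errors and test for serial autocorrelation
xtabond vardep var1 var2 var3, lags(1) vce(robust)
estat abond
```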
In order to indicate to Stata that some independent variables are predetermined, we use the following syntax after the comma: pre(var4 var5, lagstruct(#,#)). Inside the parentheses we introduce the predetermined variables (in the example, var4 and var5) and the limitation of the lags (lagstruct(#,#)). The first # indicates the number of lags introduced in the model, and the second # indicates the maximum number of lags.
The maximum number of lags will depend on the time period of the sample, taking into account that, when the estimator in differences is used, one period is lost for each difference.
In the estimation, other options can be used to limit the instruments: 1. maxldep(#): maximum number of lags of the dependent variable that can be used as instruments.
2. maxlags(#): maximum number of lags of the predetermined and endogenous variables.
Here, var1, var2 and var3 are the exogenous independent variables, var4 and var5 are the predetermined independent variables, and var6, var7 and var8 are the endogenous independent variables.
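Putting these pieces together, a hedged sketch of a full xtabond call with the three types of regressors (the lag limits are illustrative, and the option names follow Stata's xtabond documentation rather than the original text):

```stata
* var1-var3 exogenous; var4-var5 predetermined; var6-var8 endogenous
xtabond vardep var1 var2 var3, lags(1)         ///
    pre(var4 var5, lagstruct(1,2))             ///
    endogenous(var6 var7 var8, lagstruct(1,2)) ///
    maxldep(3) maxlags(2) vce(robust)
estat abond
```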
xtdpdsys Estimator (Instrumental Variables in Differences and Levels)
The description of the syntax for this estimator (xtdpdsys) is similar to that of xtabond (explained in the paragraphs above); the only difference is the use of the command xtdpdsys. At the level of methodology, the main difference between both estimators (xtabond and xtdpdsys) is the treatment used for building the instrumental variables: the first one uses only instrumental variables in differences, while the second one also uses instrumental variables in levels.

xtabond2 Estimator (Instrumental Variables in Dierences and Levels)
Stata has some estimators that use instrumental variables in levels and differences (xtdpdsys and xtdpd). However, the estimator xtabond2 has some advantages over the latter ones; for instance, it allows excluding the lags of the dependent variable as regressors.
To use this estimator it is necessary to install the command in Stata. To do that, findit xtabond2 must be written in the command bar.
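Either of the following lines locates or installs the command (ssc install fetches it from the SSC archive):

```stata
findit xtabond2
* or install it directly from SSC:
ssc install xtabond2, replace
```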
As has been mentioned, xtabond only uses as instrumental variables the lags in differences, which reduces the number of instruments used in the regression. In addition to these, xtabond2 uses the lags in levels, increasing the size of the matrix (system of equations) and the number of instruments of the endogenous variable(s). Therefore, the first one, xtabond, is recommended when the period is long, while the latter, xtabond2, is better for a panel with a short period of time, given that it incorporates the instruments in levels, reducing the loss of information.
xtabond2 can use instruments in differences and in levels. This information is incorporated into the model with the following expressions: instruments in differences and levels (gmmstyle), only differences (option eq(diff)) or only levels (option eq(level)).
To run the analysis on Stata with xtabond2, the instructions are divided into two parts; the rst one identies the variables that we are going to analyze, and the second one indicates how those variables are going to be incorporated into the model (endogenous, predetermined or exogenous). This second part also introduces the restrictions. Both parts of the equation are separated by a comma.
First, we introduce the dependent variable with its lags and then the independent variables. If we want to incorporate the dependent variable as a regressor, this must be specified between the dependent and independent variables using the syntax l.vardep for the first lag of the dependent variable, or l(#) for other lags. This same structure is used for the specification of the independent variables through their lags.
There are two ways for giving instructions to Stata in the treatment of the variables.
a. gmmstyle or gmm: for endogenous and predetermined variables. b. ivstyle or iv: for exogenous variables.
xtabond2 does not require postestimation commands for the Sargan and Hansen tests (overidentification) or for the serial autocorrelation of the error term, because these tests are reported directly.
In the following lines we describe the syntax in xtabond2 for the use of exogenous, predetermined and endogenous variables.
robust is the instruction for working with heteroscedasticity.
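An illustrative xtabond2 call combining these elements (variable names and lag choices are placeholders; gmm() receives the endogenous terms and iv() the exogenous ones):

```stata
* L.vardep and var6 treated as endogenous; var1 and var2 as exogenous
xtabond2 vardep L.vardep var1 var2 var6,  ///
    gmm(L.vardep var6, lag(1 2))          ///
    iv(var1 var2)                         ///
    robust twostep
```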

Models with Independent Variables as Predetermined
There are three alternatives for the specification of predetermined variables that report the same results. The last syntax used the first option: it indicates that var4 and var5 are predetermined variables and, therefore, they are specified with the option gmm. Moreover, the exogenous variables use the option iv.
The independent variables can be introduced using one or more lags. This is expressed in the first part of the equation (e.g., if we want to use one lag, it should be expressed with the prefix l., as in l.(var6), meaning that the variable var6 will enter the model through its first lag).
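One of the equivalent ways to declare var4 and var5 as predetermined in xtabond2, sketched with illustrative lag limits (for predetermined variables, lags from 1 onward are valid instruments; all names are placeholders):

```stata
xtabond2 vardep L.vardep var1 var4 var5,  ///
    gmm(L.vardep, lag(1 2))               ///
    gmm(var4 var5, lag(1 2))              ///
    iv(var1)                              ///
    robust twostep
```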
Table 2 shows a summary of the model estimation alternatives to deal with endogeneity and the corresponding commands in Stata. The statistic reported is χ². The number in parentheses next to the χ² corresponds to the quantity of instruments above those needed; the difference between the total number of instruments and the leftover instruments is the optimal number of instruments for the model.
The interpretation of the Sargan test is as follows:

Null hypothesis
Ho: All the overidentification restrictions are valid.
Criterion of rejection or acceptance: Prob > chi2 ≥ 0.05 (5%). If the probability obtained is equal to or higher than 0.05, the instruments used in the estimation are valid and therefore overidentification does not exist; there is no evidence to reject the null hypothesis. However, if the probability is lower than 0.05, the data suggest that the instruments are not valid and, as a consequence, there is overidentification in the model; therefore, we reject the null hypothesis.
If the probability is close to 1, this does not mean that the instruments are valid; it means that the asymptotic properties of the test have not been attained. In that case, we should also reject Ho, as in the case where the probability is lower than 0.05 (Roodman 2009).
Given that the estimator uses the highest quantity of available instruments and the probability of overidentification is high, when the Sargan test leads to rejection it is recommended to apply some restrictions to the generation of instruments. For doing that, we can use the following options: for xtabond and xtdpdsys, maxlags or maxldep; for xtabond2, lags, collapse, eq(level) and eq(diff).
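Hedged illustrations of these instrument-limiting options in each family of commands (all variable names are placeholders):

```stata
* xtabond / xtdpdsys: cap the lags used to build instruments
xtabond vardep var1, lags(1) maxldep(2) maxlags(2) vce(robust)

* xtabond2: restrict lag depth, collapse the instrument matrix,
* and/or keep only the equations in differences
xtabond2 vardep L.vardep var1,                ///
    gmm(L.vardep, lag(1 2) collapse eq(diff)) ///
    iv(var1) robust twostep
```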

Hansen Test
This test is available for xtabond2 and is calculated directly when this command is used. In addition, it is recommended to use it with the heteroscedastic weight matrix (Two step). The interpretation of the test is as follows. Null hypothesis (same as Sargan): Ho: All the overidentification restrictions are valid.
Criterion of rejection/acceptance: Prob > χ² ≥ 0.05 (5%). If the probability is close to 1, it means that the asymptotic properties of the test have not been attained, and therefore we must also reject Ho (Roodman 2009).
As a recommendation, P(χ²) should be in the range 0.05 ≤ P(χ²) < 0.8, the optimum being 0.1 ≤ P(χ²) < 0.25. If P(χ²) is out of that range, the model could be overidentified and the introduction of some restrictions in the generation of instruments might be needed. For the application of these tests, Stata uses the following commands: Sargan test: estat sargan, used after the One-step estimation; Hansen test: reported directly when xtabond2 is used.
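In practice, the sequence of tests looks as follows (a sketch with placeholder names):

```stata
* Sargan test: run after a one-step, non-robust estimation
xtabond vardep var1, lags(1)
estat sargan

* Hansen and Sargan tests are reported directly in the xtabond2 output
xtabond2 vardep L.vardep var1, gmm(L.vardep, lag(1 2)) iv(var1) robust twostep
```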

Arellano and Bond Autocorrelation Test
Dynamic panel data introduces the condition of no correlation in the error terms (Cameron & Trivedi 2009). To test this, we use the Arellano and Bond test.
We should expect the probability of AR(2) (Pr > z) to be not significant at 5%. This confirms the absence of serial autocorrelation in the errors.
The interpretation of this test is as follows. Null hypothesis: Ho: Autocorrelation does not exist.

Criteria of rejection/acceptation
The decision is based on AR(2): when the probability (Pr > z) is higher than 0.05, we cannot reject the null hypothesis, that is to say, the error terms are not serially correlated.

Conclusions
Panel data methodology has become one of the most popular tools used by researchers and academics who try to explain economic phenomena through empirical analysis. Panel data allows incorporating into the analysis the effects of individuals and time, which gives it a great advantage over cross-sectional or time-series approaches.
Findings and new contributions have made it possible to estimate dynamic models, allowing the analysis of endogenous processes as evolutionary theory proposes. Most of the works in this regard have been conducted using databases made up of a large number of individuals and a short period of time (typical of microdata); however, when estimating panels with few individuals and extended periods of time, some limitations arise.
The main restriction in these cases arises from the overidentification of the model, due to the proliferation of instruments for the endogenous regressors when we use the GMM alternative including equations in levels and differences. This situation requires adjustments (options) and the application of several considerations to properly estimate models with this type of database and methodology.
Although there are important advances in the study of panel data, the dynamic version still requires additional efforts. Therefore, this article addresses this restriction in order to guide researchers in implementing dynamic panel data using the Stata software. In particular, this paper helps scholars to understand the origin of overidentification and provides some tools to solve it.
Among the main strategies to deal with overidentification described in this article are: restricting the lags of the dependent variable used as a regressor of the model; limiting the use of lags to generate instruments of the endogenous independent variables; and avoiding the simultaneous use of equations in levels and differences. In addition, researchers should pay attention to the tendency toward serial autocorrelation in this type of model; thus, both challenges must be addressed simultaneously.
All of the above must be permanently checked through statistical tests in order to verify that the conditions and restrictions of the estimation are satisfied. Therefore, explanatory variables should be incorporated step by step, avoiding overidentification and allowing a better fit of the model. This article is not free of weaknesses, since the objective is to provide practical methodological support for researchers who are not specialists in econometrics. The focus of this work is not the building of a theory or the search for a new estimator or a specific test for this type of panel data; this paper only tries to provide a guide to estimate dynamic models using panels. In this sense, this work offers a way to carry out quantitative studies on several phenomena from data collected over a long time series with a small number of individuals, which is common in databases of countries, regions, or settings where the observed unit has a limited population.