Sectoral Contribution to Nigeria’s Gross Domestic Product (GDP) Growth Rate: A Study of Multicollinearity in Aggregated Time Series Data

The study examined aggregated GDP time series data for the period 1991-2010 to analyze the contribution of each sector to Nigeria’s GDP growth, and to observe the deficiencies inherent in the use of the Ordinary Least Squares (OLS) technique in GDP studies. Secondary data were used, sourced from the Central Bank of Nigeria (CBN) and the National Bureau of Statistics (NBS). Analysis was carried out using graphs, tables, OLS regression, pairwise correlation, and Generalized Ridge Regression (GRR) analysis. Findings show an increasing trend in aggregate GDP over the study period, with an average GDP growth rate of 4.98%. OLS regression detected severe multicollinearity in the GDP time series data, with poor relationships and a high R² value but few significant variables. The pairwise correlation coefficients also showed the presence of moderate to severe multicollinearity in the aggregate GDP variables. As a remedial measure, Generalized Ridge Regression analysis was employed. The GRR result showed a better-behaved model than the OLS, implying a more precise estimation of the regression coefficients, which give a clearer picture of each sector’s contribution to the GDP growth rate within the twenty-year period. Therefore, it would be more appropriate for researchers in GDP time series studies to account for the problem of multicollinearity by utilizing the Ridge regression technique for estimation, since all Ridge regression models give more robust estimates than the OLS models.
Original Research Article. Ejiba and Omolade; JSRR, 11(1): 1-13, 2016; Article no.JSRR.26364


INTRODUCTION
Every sector of an economy contributes to its development. No matter how small a sector's contribution to national income, it adds to the aggregate income of the economy and thus contributes directly or indirectly to its gross domestic earnings [1]. Gross Domestic Product (GDP) is the value of final goods and services produced in the economy during a given period [2]. Since independence in 1960, the Nigerian economy has remained weak, narrow and externally oriented, with the primary production activities of agriculture and of mining and quarrying (including crude oil and gas) accounting for about 65% of GDP and over 80% of government revenues. In addition, the primary production activities account for over 90% of foreign exchange earnings and 75% of employment [3]. In contrast, the secondary activities of manufacturing and of building and construction, which traditionally have greater potential for generating employment, broadening the productive base of the economy and generating sustainable foreign exchange earnings and government revenues, account for a mere 4.14% and 2.0% of gross output, respectively.
Over the last seven years, certain changes have taken place in the structure of output in the economy; prominent among them is the entry of the telecommunications sector, which has witnessed explosive and sustained real GDP growth. In spite of this, the Nigerian economy continues to grapple with a number of challenges that have hampered efforts at economic transformation. The economy has a disarticulated and narrow productive base and is yet to achieve the structural changes required to jump-start rapid and sustainable growth and development, while the sectoral linkages in the economy are also weak [4]. Although the economy experienced respectable GDP growth rates, averaging over 6.5% per annum between 2006 and 2012 [5], this growth has neither brought commensurate employment nor reduced the level of poverty in the country.
For effective economic planning and economic growth, however, governments require GDP time series studies, which give policy makers ample information on the performance of government. According to [6], time series data such as GDP, stock prices, money supply, the consumer price index, the annual homicide rate and automobile sales figures are observations recorded over time. Unlike the arrangement in cross-sectional data, the chronological ordering of observations in a time series conveys potentially important information.
A key feature of time series data that makes it more difficult to analyze than cross sectional data is the fact that economic observations can rarely, if ever, be assumed to be independent across time.
Forecasting economic relationships may therefore be plagued by estimation problems, including the problem of multicollinearity.
An empirical investigation that attempts to measure the validity of the theory against the observable data [7] may thus be required.

Conceptual Review
Shantha [8] defined multicollinearity as the existence of a linear relationship among the independent variables. This occurs when too many variables have been included in the model and a number of different variables measure similar phenomena. It is defined in terms of a lack of independence, or of the presence of interdependence, signified by high intercorrelations within a set of variables, and under this view it can exist quite apart from the nature, or even the existence, of a dependency relationship between X and a dependent variable Y [9]. When severe, it causes difficulty in making predictions, inferences and estimations, as well as in selecting an appropriate set of variables for the model. In such cases, any inferences based on the parameter estimates of the model become invalid [10]. According to [11], multicollinearity may be due to the data collection method employed, constraints on the model or in the population being sampled, model specification such as adding polynomial terms to the regression model, and an over-determined model, defined as a model with more explanatory variables than observations. It must thus suffice to emphasize that multicollinearity is both a symptom and a facet of poor experimental design [12]. It is a statistical issue, because it inflates the variance of the ordinary least squares estimator and so constitutes a threat, often a very serious one, both to the proper specification and to the effective estimation of the structural relationships commonly sought through regression techniques. It is also a numerical issue, in the sense that small errors in the input may cause large errors in the output. The presence of serious multicollinearity reduces the accuracy of the parameter estimates in a linear regression model. Many authors even suggest that an examination for the existence of multicollinearity should be routinely performed as an initial step in regression analysis.
As rightly opined by [10], multicollinearity causes major interpretative problems in regression analysis: coefficients with the wrong sign, unstable and inconsistent parameter estimates, and regression coefficients that appear insignificant when they are in fact significant. It is thus essential to investigate and detect its presence in order to reduce its destructive effects.

Literature Review
A number of studies on GDP have been carried out in Nigeria. [13] studied the role of stock market development in economic growth in Nigeria using 15-year time series data (1994-2008) and the Ordinary Least Squares technique. The result revealed a negative but non-significant relationship between GDP and stock market capitalization, and a positive relationship between GDP and liquidity. To examine the severity of multicollinearity, collinearity diagnostics were performed on the independent variables, giving Tolerance values of 0.033, 0.34 and 0.495 respectively, which suggests the presence of multicollinearity among the variables. Finally, the authors concluded that "we should view with caution the notion that stock market size is not significant for economic growth since multicollinearity exists in the data used for this analysis".
Ahungwa et al. [14] carried out a trend analysis of the contribution of agriculture to the GDP of Nigeria for the period 1960-2012. The study utilized multiple regression analysis, specified as a double-log model, to examine the contribution of agriculture to Gross Domestic Product (GDP). The study showed a statistically significant relationship between GDP and both the agricultural and industrial sectors, with an R² value of 99%.
Oziengbe [15] studied the relative impacts of federal capital and recurrent expenditures on Nigeria's economy (1980-2011) using multiple regression, co-integration and an Error Correction Model (ECM). The OLS results show that only the signs on the coefficients of government expenditure (GOVEXP) and remittance (REMIT) conform to a priori expectations. The signs on the other variables, including foreign direct investment (FDI) and official development assistance and foreign aid (ODAAID), do not. The study concluded that, within the period under review, government expenditure and personal remittances from Nigerians abroad contributed significantly to the size of the nation's economy, while the effects of FDI and ODAAID were negative and not significant. The study also drew attention to the fact that "we must be cautious not to insinuate that foreign aid, ODA and foreign direct investment pose dangers to the economy as they could be vital ingredients for economic growth and development if properly harnessed". This assertion raises the question of the appropriateness of the techniques utilized in these studies since, according to [8], classical multiple regression analysis may produce various symptoms of multicollinearity, including overstated regression coefficients, incorrect signs, and highly unstable predictive equations. In the light of this, the present study aims to draw attention to the dangers of using the OLS technique to examine GDP time series data, bearing in mind the possible existence of severe multicollinearity, which could render inference less reliable, in spite of the contention by [16] that multicollinearity is God's will and not a problem with OLS or statistical technique.

Study Area
The study was conducted in Nigeria, a country in West Africa with a population of over 160 million people, sharing land borders with the Republic of Benin in the west, Chad and Cameroon in the east, and Niger in the north. Its coast lies on the Gulf of Guinea in the south, and it borders Lake Chad to the northeast. Nigeria has an area of 923,768 square kilometers and lies between latitudes 4° and 14° N and longitudes 3° and 14° E of the Greenwich meridian, entirely within the tropical zone.

Data
The study made use of secondary data obtained from the published documentation of the National Bureau of Statistics (NBS) and the Central Bank of Nigeria (CBN) bulletin on the Gross Domestic Product (GDP) of Nigeria over a 20-year interval (1991-2010). The sectoral contributions of the identified sectors and sub-sectors of the nation's GDP (at 1984 constant factor cost, in N million) were considered for data collection. The fifteen (15) identified sectors include: agriculture, crude petroleum, mining and quarrying, manufacturing, utilities, building and construction, transportation, communication, wholesale and retail trade, hotel and restaurants, finance and insurance, real estate and business services, housing, community social and personal services. From the total GDP and from the sub-sectors, the contribution of each sector to the aggregate national GDP was obtained for the period under study. From the aggregate GDP obtained for each year, and following [5], the GDP growth rate was computed as:
$$\text{GDP growth rate}_t = \frac{Y_t - Y_{t-1}}{Y_{t-1}} \times 100$$
where $Y_t$ = aggregate GDP in year $t$ and $Y_{t-1}$ = aggregate GDP in year $t-1$ (the previous year).
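The growth-rate computation can be illustrated with a minimal sketch. The function name and the GDP figures below are hypothetical and are not taken from the study's dataset; the formula is the year-on-year percentage change defined above.

```python
def gdp_growth_rates(gdp):
    """Year-on-year growth rates (percent) for a list of aggregate GDP values.

    Each rate is 100 * (Y_t - Y_{t-1}) / Y_{t-1}, so a series of n values
    yields n - 1 growth rates.
    """
    return [100.0 * (curr - prev) / prev for prev, curr in zip(gdp, gdp[1:])]


# Hypothetical aggregate GDP figures (illustrative only, not the study's data).
gdp = [100.0, 105.0, 110.25]

rates = gdp_growth_rates(gdp)
average_rate = sum(rates) / len(rates)
print(rates, average_rate)
```

Averaging the annual rates in this way is one plausible route to a summary figure such as the average growth rate reported in the abstract, although the source does not state which averaging method was used.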

Analytical Technique
The analytical techniques employed include Ordinary Least Square (OLS) regression, and Generalized Ridge Regression Analysis (GRR).
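To make the contrast between the two estimators concrete, the sketch below fits OLS and a ridge estimator to two nearly collinear predictors. It rests on several assumptions: the data are constructed for illustration, the function `ridge_2` is hypothetical, the ridge constant k = 1 is arbitrary, and the estimator shown is ordinary (single-k) ridge, a simpler relative of the generalized ridge regression used in the study.

```python
def ridge_2(x1, x2, y, k=0.0):
    """Solve (X'X + kI) b = X'y for two centered predictors.

    k = 0 gives the OLS estimate; k > 0 gives the ordinary ridge estimate.
    """
    s11 = sum(a * a for a in x1) + k
    s22 = sum(b * b for b in x2) + k
    s12 = sum(a * b for a, b in zip(x1, x2))
    g1 = sum(a * c for a, c in zip(x1, y))
    g2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * g1 - s12 * g2) / det, (s11 * g2 - s12 * g1) / det)


# Constructed data: x2 is x1 plus a small perturbation, so the two are
# almost perfectly collinear. The data-generating coefficients are (1, 1).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.9, -1.2, 0.2, 0.8, 2.1]
noise = [-0.3, 0.6, -0.6, 0.6, -0.3]
y = [a + b + u for a, b, u in zip(x1, x2, noise)]

ols = ridge_2(x1, x2, y, k=0.0)    # unstable: one coefficient changes sign
ridge = ridge_2(x1, x2, y, k=1.0)  # shrunken: both coefficients positive
print("OLS:  ", ols)
print("Ridge:", ridge)
```

In this constructed case a small disturbance is enough to push the OLS coefficients far from the values that generated the data, while the ridge estimate trades a little bias for much greater stability.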

Contribution of sectors to aggregate GDP
The model employed was estimated by Ordinary Least Squares (OLS), fitting the implicit function $Y_t = f(X_{1t}, X_{2t}, \ldots, X_{15t}, u_t)$ in different functional forms.
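As a hedged sketch, the linear functional form of this implicit function, together with the differenced forms used later as a remedial measure, would conventionally be written as below. The notation is an assumption, since the source does not define the symbols: $Y_t$ is taken to be the GDP growth rate in year $t$, $X_{it}$ the contribution of sector $i$, $\beta_i$ the regression coefficients and $u_t$ the error term.

```latex
% Assumed notation (not defined in the source): Y_t = GDP growth rate,
% X_{it} = contribution of sector i in year t, u_t = error term.
\begin{align}
  Y_t &= \beta_0 + \sum_{i=1}^{15} \beta_i X_{it} + u_t
      && \text{(linear form)} \\
  \Delta Y_t &= \sum_{i=1}^{15} \beta_i \,\Delta X_{it} + \Delta u_t,
      \quad \Delta Y_t = Y_t - Y_{t-1}
      && \text{(first difference)} \\
  \Delta^2 Y_t &= \sum_{i=1}^{15} \beta_i \,\Delta^2 X_{it} + \Delta^2 u_t,
      \quad \Delta^2 Y_t = Y_t - 2Y_{t-1} + Y_{t-2}
      && \text{(second difference)}
\end{align}
```

The intercept $\beta_0$ drops out on differencing because it is constant over time.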

Detecting multicollinearity in the GDP time series data
To detect multicollinearity, OLS regression analysis was conducted and the R² value and the number of significant variables were observed. In addition, pairwise correlations of the selected sectors of the economy, Tolerance, the Variance Inflation Factor (VIF), and the Condition Index (CI) were also used.
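Two of these diagnostics can be sketched in a few lines. The example below computes the VIF of each predictor as the corresponding diagonal element of the inverse correlation matrix, and Tolerance as its reciprocal. The data and function names are hypothetical; the Condition Index, which requires an eigenvalue decomposition, is left out of this sketch.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tolerance(columns):
    """VIF_j is the j-th diagonal of the inverse correlation matrix; TOL_j = 1 / VIF_j."""
    inverse = invert(correlation_matrix(columns))
    vifs = [inverse[j][j] for j in range(len(columns))]
    return vifs, [1.0 / v for v in vifs]


# Hypothetical predictors: x2 nearly duplicates x1, x3 is unrelated to both.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
x3 = [1.0, -2.0, 1.0, 1.0, -2.0, 1.0]

vifs, tolerances = vif_and_tolerance([x1, x2, x3])
print("VIF:", vifs)
print("TOL:", tolerances)
```

Under the rule of thumb applied in this study, a VIF above 10 (equivalently, a Tolerance below 0.1) flags a highly collinear variable; in this constructed example the first two predictors exceed that threshold and the third does not.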

Remedial measures for multicollinearity
The remedial measures adopted in the study include:
(1) Acquisition of additional data (analysed by OLS): time series data for a further 10 years (1981-1990) were added to the initial dataset (1991-2010) to give a total of 30 years.
(3) Transformation of the data by taking the second difference of the series (differences of successive values of the variables) to account for non-stationarity of the time series data. The original (linear) OLS regression equation was first differenced, and the first-differenced form was then differenced again to give the second-differenced form.
(4) Dropping one or more of the collinear variables from the model, and re-specification of the model, that is, re-specifying the linear functional form of the OLS model stated above as a log-linear model.
(5) Generalized Ridge Regression Analysis (GRRA): the regression estimates obtained from the GRRA are compared with the OLS estimates of the initial sample in terms of the predictive accuracy of the model.
Table 1 gives the average contribution of each sector to the aggregate national GDP growth rate, as well as the trend in aggregate GDP and its growth rate over the period under review (1991-2010).

Sectoral Contribution to Aggregate GDP
From

OLS regression analysis
To assess the individual sectoral contributions to the GDP growth rate, and to detect the presence, type and severity of multicollinearity, an OLS model in linear functional form was employed. The result shows a high R² with only a few statistically significant t-ratios (the "classic" symptom of multicollinearity), and the other values also indicate the presence of strong multicollinearity. Agriculture, utilities, and finance and insurance show a negative relationship with the GDP growth rate (-1.532E-5, -.001, and -3.525E-5 respectively), while the other sectors show a positive relationship. Manufacturing (.007) is significant at 1% and wholesale and retail (.027) at 5%, while utilities (.078) and producers of government services (.089) are significant only at 10%. R² (.995) and its adjusted value (.974) suggest a well-behaved model, yet few of the variables are significant, which indicates the presence of multicollinearity. This result shows a similar trait to those obtained in the previous GDP time series studies discussed in the review of literature.
From Table 2, the different ways of detecting multicollinearity and its severity reviewed in the literature indicate that building and construction, hotel and restaurants, housing and dwelling, and community, social and personal services have tolerance (TOL) values of zero and hence exhibit the greatest collinearity among the variables of the model. Only the crude petroleum sector has a VIF value below 10 (6.694), indicating moderate collinearity, while the other 14 sectors are highly collinear, with building and construction (9225.752) the highest.

Pairwise correlation
The pairwise correlation estimates, or correlation matrix, show the correlation among the various sectors of the economy. High pairwise correlation among the sectors' contributions to the GDP growth rate is one way of determining the presence of multicollinearity. Table 3 shows a few high pairwise correlation coefficients among the explanatory variables, while most coefficients are moderate or low, suggesting weak or no serious collinearity. This method is not usually reliable in identifying severe collinearity, as it gives many low pairwise correlation coefficients. The reason is that high pairwise correlation is a sufficient but not a necessary condition for the existence of multicollinearity [17].
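The point that high pairwise correlation is sufficient but not necessary can be shown with a small constructed example. In the sketch below (hypothetical data and function name), the fourth variable is an exact linear combination of the other three, which is perfect multicollinearity, yet no pairwise correlation exceeds about 0.58.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Three mutually uncorrelated variables (constructed for illustration).
x1 = [1, 1, -1, -1]
x2 = [1, -1, 1, -1]
x3 = [1, -1, -1, 1]
# x4 is an exact linear combination of the others: perfect multicollinearity.
x4 = [a + b + c for a, b, c in zip(x1, x2, x3)]

columns = [x1, x2, x3, x4]
pairwise = [abs(pearson(columns[i], columns[j]))
            for i in range(4) for j in range(i + 1, 4)]
largest = max(pairwise)
print("largest pairwise |r|:", largest)
```

A correlation matrix alone would rate these variables as only moderately related, which is why diagnostics that consider all predictors jointly, such as Tolerance, VIF and the Condition Index, are also needed.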

Acquisition of additional data
As one of the remedial measures for reducing multicollinearity, additional data for 1981-1990 were added to the initial time series to make a total of 30 years. The OLS regression result for this measure is shown in Table 4, and the extent to which the severity of multicollinearity has been reduced is assessed by comparing it with the OLS regression results in Table 2. With the additional data, mining and quarrying, hotel and restaurants, finance and insurance, real estate and business service, and housing and dwelling (-.001, -.001, -5.138E-5 and -.010 respectively) show a negative relationship with the GDP growth rate, while the other sectors show a positive relationship. The real estate and business service sector (-.010) exhibits the greatest influence on the GDP growth rate, similar to Table 2. Crude petroleum (.001), manufacturing (.000), wholesale and retail (.006), real estate and business service (.003), producers of government services (.001), and community, social and personal services (.001) are positive and significant at 1%. Transport (.017) and finance and insurance (.041) are significant at 5%, and mining and quarrying (.063) at 10%. Also, R² (.979) and its adjusted value (.953) are high in Table 6 relative to those of Table 4. The different indicators point to the fact that collinearity among the sectors was reduced as a result of the added data. In Table 4, after the acquisition of additional data, more coefficients (9) are significant than in Table 2. More coefficients in Table 4 have tolerance (TOL) values greater than zero than in Table 2. Thus the explanatory variables exhibit greater correlation before the acquisition of additional data than after it. Also, more VIF values in Table 4 are closer to 10 than in Table 2, though none of the VIF values is below 10.
The Condition Index (CI) value (211.821), being above 30, indicates a severe case of multicollinearity, but a less severe one than the CI value (275.559) before the acquisition of additional data. Therefore, adding data to the initial dataset made collinearity among the sectors less severe.

Re-specification of the model
One remedial measure that produced results similar to those of the additional-data measure in Table 4 is re-specifying the model as a log-linear regression model, as shown in Table 5.
The OLS results from the log-linear model, namely R² (.993), adjusted R² (.965), the number of significant variables (9) and the CI value (291.465), indicate severe multicollinearity: less severe than that of the linear model (403.722) but more severe than after the acquisition of additional data (211.821).

Dropping variables from the model
As this is a severe case of multicollinearity, another remedial measure is to drop highly collinear explanatory variables from the model to reduce the collinearity among them. Variables were dropped on the basis of their tolerance and CI values, leaving 5 sectors as shown in Table 6. From the table, utilities (-.002) and transport (-9.835E-5) show a negative relationship, while the other sectors show a positive relationship with the GDP growth rate. Manufacturing (.000), utilities (.006), and communication (.001) are significant at 1%, with relatively high R² (.834) and adjusted R² (.775). Tolerance values for all the variables are well above zero (i.e. the variables are less collinear) compared with the results in Tables 2, 4 and 5. VIF values are well below 10, indicating far less collinearity for each of these variables. The CI value (19.619) is less than 30 and lower than the CI values of all the other remedial measures. All these indicate that multicollinearity has been reduced to a moderate level, unlike in the previous cases. Thus, dropping some highly collinear variables reduced severe multicollinearity more than any other remedial measure. However, as is often seen, excluding these variables also led to a model specification error, as some of the excluded variables are important contributors to aggregate GDP and the GDP growth rate.

Transformation of variables
Because of the specification bias inherent in the previous measure (dropping of variables), transforming the variables is a preferable way of reducing multicollinearity. In this study, the variables were transformed by taking the second difference of the time series data. Differencing was used because these variables tend to move in the same direction over time and are non-stationary. Thus, transforming the variables by differencing helps to minimize their linear dependence as well as the non-stationarity or unit root problem. Using the differenced series, agriculture, crude petroleum, mining and quarrying, manufacturing, utilities, hotel and restaurants, finance and insurance, and housing and dwelling (-7.786E-6, -3.533E-5, -.002, -2.269E-6, -.001, -.004, -1.120E-5, and -.002 respectively) show a negative relationship with the GDP growth rate, while the other sectors show a positive relationship. Manufacturing (.071), wholesale and retail (.068), and finance and insurance (.089) are the only significant sectors, and only at the 10% level, fewer than in the other remedial measures discussed above. R² (.961) and its adjusted value (.688) are high, with few significant variables, as shown in Table 7. Other indicators showed that multicollinearity was reduced more by transforming the variables than by the other remedial measures, even though this measure has the fewest significant variables. There are more coefficients in Table 7 with tolerance (TOL) values greater than zero than in Table 2.
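The differencing transformation can be sketched as follows. The helper name and the series are hypothetical; the example uses a series with a quadratic trend, for which one round of differencing leaves a linear trend and a second round removes the trend entirely.

```python
def difference(series, order=1):
    """Difference a series `order` times; each pass drops one observation."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


# Hypothetical trending series (values of t squared), not the study's data.
trend = [0, 1, 4, 9, 16, 25]

first = difference(trend, order=1)   # still trending upward
second = difference(trend, order=2)  # constant: the trend is removed
print(first, second)
```

Because each pass drops one observation, second differencing a 20-year annual series leaves 18 observations. With 15 regressors this leaves very few degrees of freedom, which may help to explain the wide gap between R² and adjusted R² in the differenced model.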

Ridge regression analysis results
The estimates obtained from the ridge regression analysis show that all the sectors are positively related to the GDP growth rate, in line with a priori expectations, as indicated in Table 8. This estimate is called a generalized ridge estimate [17]. From the results, the contributions of crude oil, manufacturing, and finance and insurance to the GDP growth rate were significant at 1%, while those of mining and quarrying and of communication were significant at 5%. The R² value (.837) shows a good fit, as the sectors explain 83.7% of the observed variation in the GDP growth rate. Hence, the problem of multicollinearity observed and illustrated in the GDP time series studies above could be reduced using the ridge regression technique. According to [8], all ridge regression models are better than Ordinary Least Squares when the multicollinearity problem exists, and the best model is the Generalized Ridge Regression because it has

CONCLUSIONS
Every identified sector of an economy is believed to contribute to GDP, from which the growth rate is estimated. Studies have been conducted to examine the contribution of each sector within the Nigerian economy and its relevance to the nation's economy. However, estimating such contributions often involves multiple regression, in which time series data are fitted to the regression model. Associated with multiple regression is the problem of multicollinearity, which may adversely affect the results obtained and lead to misleading inference. From the analysis conducted, ridge regression analysis would be more appropriate in such studies, as it is a more reliable method for drawing inference about the contributions of the various sectors to GDP. The GRR model is better behaved than the OLS model, implying a more precise estimation of the regression coefficients, which give a clearer picture of each sector's contribution to the GDP growth rate over the twenty-year period 1991-2010. Therefore, it would be more appropriate for researchers in GDP time series studies to account for the problem of multicollinearity by utilizing the Ridge regression technique for estimation, since all Ridge regression models give more robust estimates than the OLS models.