Reproducibility of statistical data, academic publications and policy implications: Evidence from Ghana

This paper examines the accuracy, validity and presentation of statistical evidence and assesses the implications of irreproducibility associated with variations in sample size for academic research and policy-making. The 2012/13 Ghana Living Standards Survey (GLSS), 10 academic publications and the Free Senior High School (SHS) policy in Ghana are used to address the objectives of the paper. The data show that about 20 per cent of the tables in the Main Report of the GLSS Six are irreproducible and 10 per cent of the tables have outcomes worth re-examining; in terms of completeness in the presentation of statistical evidence, only 3 out of the 27 sampled tables report the sample size used. In addition, nine out of the 10 academic publications use half of the original sample size, two-fifths of the publications do not report the sample size for the descriptive statistics, and two of the papers report different sample sizes for the descriptive statistics and the regression analysis.


Value of the Data
The data provide comprehensive socio-economic information on Ghana at the regional and national levels. Among other things, the data:
Provide an extensive database for national and regional planning. For instance, they provide a reliable basis for a comparative analysis of consumption as a proportion of household production.
Provide information on trends in household consumption and expenditure at a greater level of disaggregation. Together with the earlier rounds (GLSS 1-5), the GLSS 6 provides a basis for comparing socio-economic and health conditions in Ghana over time.
Give a detailed account of the structure and distribution of wages and the conditions of work of the country's labour force.
Provide wide-ranging standard data for compiling current statistics on average earnings, hours of work, and time rates of wages and salaries, so as to indicate wage and salary differentials between branches of industry, geographic regions, occupations and the sexes.

Data
The GLSS Six, conducted in 2012/13, collected detailed information on demographic characteristics of households; education; health; employment; migration and tourism; housing conditions; household agriculture; household expenditure; income and its components; access to financial services, credit and assets; and governance, peace and security [1]. The data are hierarchical, at the individual, household and community levels, which facilitates the analysis of both unit-specific and contextual factors that shape behaviours and policy outcomes. The official reports emanating from the GLSS Six are: the Main Report; Poverty Profile in Ghana 2005-2013; Labour Force Report; Child Labour Report; Governance, Peace and Security Report; Water Quality Testing Report; and the Community Report [1].
The GLSS Six data, sourced from the GSS website, comprise 102 instrument-section-specific data files (not including aggregated files), and the number of variables in each file ranges from 8 to 136. The Main Report of the GLSS Six, which has 212 tables and figures in total (excluding those in the annex), is used as the reference document for examining the accuracy of the data by attempting to reproduce a sample of its tables and figures. These tables and figures appear in 11 of the 13 chapters of the Main Report (all chapters except the introductory and concluding ones). Stratified and simple random sampling were employed, with chapters as strata, guided by a pre-determined target of covering at least 10% of all tables and figures and by a minimum of two and a maximum of three per chapter. Specifically, two tables were selected from chapters with 20 or fewer tables and figures, and three from chapters with more than 20 (see Appendix A).
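The selection rule can be sketched in Python as a minimal illustration; it is not the author's Stata syntax (Appendix B). The helper names `tables_to_sample` and `sample_tables`, the fixed seed, and the chapter contents below are hypothetical and do not reflect the real GLSS Six table counts.

```python
import random

def tables_to_sample(n_items: int) -> int:
    """Rule described in the text: 2 tables/figures if a chapter has 20 or
    fewer, 3 if it has more than 20."""
    return 2 if n_items <= 20 else 3

def sample_tables(chapters: dict[str, list[str]], seed: int = 2018) -> dict[str, list[str]]:
    """Stratified sampling with chapters as strata and simple random
    sampling (without replacement) within each chapter."""
    rng = random.Random(seed)  # fixed seed so the draw itself can be reproduced
    picks = {}
    for chapter, items in chapters.items():
        k = min(tables_to_sample(len(items)), len(items))  # guard tiny chapters
        picks[chapter] = rng.sample(items, k)
    return picks

# Hypothetical chapters (illustrative counts only).
chapters = {
    "Education": [f"Table 3.{i}" for i in range(1, 16)],  # 15 items -> 2 drawn
    "Health": [f"Table 4.{i}" for i in range(1, 26)],     # 25 items -> 3 drawn
}
picks = sample_tables(chapters)
total_items = sum(len(v) for v in chapters.values())
total_picked = sum(len(v) for v in picks.values())
print(picks)
print(f"{total_picked} of {total_items} items sampled ({100 * total_picked / total_items:.1f}%)")
```

With the real counts, the minimum of two draws in each of the 11 chapters already gives 22 of 212 items (about 10.4 per cent), which meets the 10 per cent target on its own.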
The second objective of the paper is to assess the implications of irreproducibility of statistical evidence arising from the non-availability of, and variation in the use or presentation of, the number of observations. Ten working papers and journal articles published in the last three years (2015-17) that employ the GLSS Six raw data and regression techniques are used for the assessment. A conscious effort was made to select five papers each for individual-level and household-level regression analysis. The population of working papers and journal articles satisfying these criteria was identified using the Google Scholar, ResearchGate and ScienceDirect search engines. The search yielded 11 papers, but one of them lacked a full citation and was therefore dropped.
Table 1. Ability to reproduce statistical evidence reported in GLSS Six (columns: Tables; Title of Table; Reproducible).

Materials and Methods
To minimise differences in reported statistical evidence and to encourage a culture of reproducibility, the STATA syntax used to generate the tables in this paper is provided in Appendix B. Table 1 shows that 80 per cent of the sampled tables in the GLSS Six Main Report are reproducible. For every chapter of the Main Report that has tables (Chapters 2 to 12), at least one sampled table was reproducible, which lends a degree of confidence to the reliability of the reported statistical evidence across thematic issues. In seven of the eleven chapters, all sampled tables were reproduced: Migration, Housing, Agriculture, Expenditure, Finance, Non-farm Enterprise and Governance. The tables that could not be reproduced call for an investigation into the source of the differences. However, the ability to investigate and identify the source of variation depends strictly on the comprehensiveness of the available information and on the analyst's skills. The variation may therefore stem from an error by the author, an incomplete understanding of the data manipulation process and/or of the assumptions underlying the generation of variables, or an error by GSS.
Generally, derived variables and means were harder to reproduce than proportions. This is expected, since in practice the ability to reproduce statistical evidence from an existing dataset depends on comprehensive metadata and the availability of a syntax file, a practice yet to be adopted by most producers and users of Ghanaian data. The lack of comprehensive documentation, especially a methodological manual for the GLSS Six, may account for the irreproducibility of derived variables. The handling of outliers and of variables with legitimate zero responses may also account for the difficulty of reproducing tables with continuous outcomes.
To show the extent of variation between the statistical evidence produced in this paper and that reported in the GLSS Six Main Report, each irreproducible table is presented for discussion. This makes it possible to assess whether the differences alter the overall patterns that would guide the discussion and policy recommendations.
In Table 2, more than 50 per cent of the values generated are reproducible. The variations arise in the proportions of the different Christian denominations. The differences between the values generated in this paper and those reported by GSS range from 1.4 percentage points (Catholics in Accra (GAMA)) to 26 percentage points (Catholics in Rural Savannah). Notably, these variations do not change the pattern of religious denominations across the geographical areas. It is conjectured that the deviation arises from the use of different numbers of observations in computing the proportions for the broad religious denominations and for the sub-categories of Christianity: the sub-categories use the Christian sub-sample as the base, while the broad religious categories use the total number of households. While this is understandable, failing to report the different sample sizes within a single table may lead to erroneous interpretation and use of the disaggregated data in further analysis.
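The effect of switching the base can be illustrated with a short Python sketch. The counts below are hypothetical, not GLSS Six figures; the point is only that the same Catholic count yields different percentages depending on whether all households or only Christian households form the denominator, which is why each base should be reported.

```python
def share(count: int, denominator: int) -> float:
    """Percentage of `count` in `denominator`."""
    return 100 * count / denominator

# Hypothetical counts for a single locality.
households = 1000
christian = 700
catholic = 140

print(share(christian, households))  # 70.0, broad category on the all-household base
print(share(catholic, households))   # 14.0, Catholics as a share of ALL households
print(share(catholic, christian))    # 20.0, Catholics as a share of CHRISTIAN households
```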
The GSS statistical evidence on the educational attainment of Ghanaians aged 15 years and older is at variance with the values obtained from the reproducibility exercise. The differences range from -24.9 to 29.9 percentage points (see Table 3). None of the values in Table 3 was reproducible, and the variations alter the pattern of educational attainment across levels. Specifically, while GSS reports that the largest group (44.6%) of Ghanaians aged 15 years and older have less than MSLC/BECE, the reproducibility exercise in this paper shows that the largest group (45.6%) have attained MSLC/BECE/Vocational.
Two of the three tables sampled from the chapter on health (Tables 4.7 and 4.15a) were irreproducible. Table 4.7, titled "Average consultation fees and payments for medicines (GH¢) two weeks preceding the interview (excluding those who paid nothing) by locality and sex", offered an opportunity to reproduce means and compare the outcome with the reproduction of proportions. Of the eight sub-categories of health expenditure in Table 4.7, the mean for one (consultation) was reproduced exactly, while the means for all the others varied substantially and in both directions, that is, the reproduced values were either higher or lower than those reported in the GLSS Six Main Report. Some of the differences exceeded 200 per cent of the reported value, as in the case of total medical expenses in Other Urban, Rural Forest and Rural Savannah areas. The extent of the variation also altered the pattern of health expenditure across the geographical areas.
For instance, the overall treatment fees reported by GSS show that the average cost in GAMA exceeds that in Other Urban areas by GH¢20.32, whereas the reproducibility exercise shows the reverse pattern: Other Urban areas have a higher mean expenditure, by a margin of GH¢35.53.
Further scrutiny of the raw data revealed substantial outliers across the health expenditure categories. The reproduction exercise did not adjust the means for outliers because there is no documentation on how extreme values were treated in producing the statistical evidence in the GLSS Six Main Report. Although the health expenditure variables are taken directly from the raw data (they are not derived variables), the variations are larger than those in the earlier irreproducible tables, which can also be attributed to the influence of extreme values. The matter is worth investigating, as in some instances the effect of the outliers was substantial.
Table 4. Irreproducibility of Table 4.7: Average consultation fees and payments for medicines (GH¢) two weeks preceding the interview (excluding those who paid nothing) by locality and sex.
Table 4.15a, titled "Percentage distribution of women aged 15-49 years who used contraceptives by amount paid and age group", is the second irreproducible table in the chapter on health. In contrast to Table 4.7, only two of the seven payment categories were irreproducible. However, as with Table 4.7, the variations in these two cases are large, in some instances exceeding 500 per cent. For example, for the payment bracket GH¢1.00-1.99 and age group 20-24, the author's computation differs from the statistical evidence reported by GSS by 804 per cent. Although the values in the table are percentages, they rest on the treatment of a continuous variable (amount paid for contraceptives). This supports the earlier proposition that statistics based on continuous and derived variables are harder to reproduce than proportions.
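Two points in this discussion, the difference between a gap in percentage points and a relative gap in per cent, and the sensitivity of means to extreme values, can be illustrated with a minimal Python sketch. The cell values and expenses below are hypothetical and are not taken from Tables 4.7 or 4.15a; the helper names are introduced here for illustration only.

```python
import statistics

def pp_difference(author: float, reported: float) -> float:
    """Absolute gap between two percentages, in percentage points."""
    return author - reported

def pct_difference(author: float, reported: float) -> float:
    """Relative gap, in per cent of the reported value (reported must be non-zero)."""
    return 100 * (author - reported) / reported

# Hypothetical cell values: a small absolute gap can be a very large relative gap.
print(pp_difference(9.0, 1.0))   # 8.0 percentage points
print(pct_difference(9.0, 1.0))  # 800.0 per cent

# Hypothetical expenses (GH¢) with and without a single extreme value.
expenses = [10, 12, 15, 11, 14]
with_outlier = expenses + [1000]
print(f"{statistics.mean(expenses):.1f} {statistics.mean(with_outlier):.1f}")      # 12.4 177.0
print(f"{statistics.median(expenses):.1f} {statistics.median(with_outlier):.1f}")  # 12.0 13.0
```

One extreme value moves the mean by an order of magnitude while barely moving the median, which is why undocumented outlier rules can make means hard to reproduce.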
The comparison for Table 4.15a is presented in Table 5. The fifth irreproducible table was identified in the chapter on employment. The unemployment rate in Ghana has long attracted the attention of stakeholders because it is low and differs across surveys: [1], based on the GLSS Six, indicates an unemployment rate of 5.2 per cent, while [2], based on the Labour Force Survey (LFS), presents 11.9 per cent. Although issues of conceptualisation, definition, standardisation, international benchmarking and local context are not examined in this paper, this observation calls for them to be reconsidered. Compared with the other irreproducible tables, Table 6 shows relatively marginal differences between the values obtained from the reproducibility exercise and those in the GLSS Six Main Report, with the major variations observed for the unemployed. The variables for the economically active and inactive, employed and unemployed are all derived and require some assumptions, hence the need for comprehensive information to support reproducibility.

Validity
A re-examination of Table 2 of this paper, titled "Irreproducibility of Table 2.10: Household heads by religion and locality", shows that the column percentages reported in the GLSS Six Main Report, excluding the breakdown of Christian denominations, should sum to 100 per cent. However, Columns 3, 5, 7, 9 and 11 present totals exceeding 100 per cent. Allowing for rounding, which can make row or column totals deviate by about plus or minus 1 percentage point, the deviation from the expected 100 per cent in Table 2 ranges from 13.6 to 29.1 percentage points.
Table 7 compares the statistical evidence on educational attainment documented in the Main Report of the GLSS Six with the GLSS Five (2005-06), the 2010 GPHC, the 2014 GDHS and the values computed in this paper. Allowing for differences in the dates of the respective census and surveys, the assumptions underlying the grouping of educational attainment, and variations in the use of educational level, grade, attainment and completion, the proportion of Ghanaians with less than MSLC/BECE reported in the GLSS Six differs strikingly from the other reports. The comparison also reveals variations between the GLSS Six and the other surveys and census in the proportion of Ghanaians who have obtained MSLC/BECE/Vocational.
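The column-total check applied to Table 2 can be sketched in Python. This assumes, as the text suggests, that the denomination breakdown rows use the Christian sub-sample as their base and are therefore excluded from the total; the row labels, column names and values are hypothetical, and the 1-point tolerance mirrors the rounding allowance described above.

```python
def column_total_gap(column: dict[str, float], exclude: frozenset[str] = frozenset()) -> float:
    """How far a column's total departs from 100, ignoring rows in `exclude`
    (e.g. sub-category rows computed on their own base)."""
    total = sum(v for k, v in column.items() if k not in exclude)
    return total - 100

def flag_columns(table: dict[str, dict[str, float]],
                 exclude: frozenset[str] = frozenset(),
                 tolerance: float = 1.0) -> list[str]:
    """Columns whose totals miss 100 by more than `tolerance` percentage points."""
    return [name for name, col in table.items()
            if abs(column_total_gap(col, exclude)) > tolerance]

# Hypothetical table: broad categories plus a Christian denomination breakdown.
table = {
    "Locality A": {"Christian": 70.0, "Islam": 20.0, "Traditional": 5.0, "None": 5.5,
                   "Catholic": 20.0, "Pentecostal": 35.0},
    "Locality B": {"Christian": 60.0, "Islam": 25.0, "Traditional": 10.0, "None": 20.0,
                   "Catholic": 15.0, "Pentecostal": 30.0},
}
christian_breakdown = frozenset({"Catholic", "Pentecostal"})

print(flag_columns(table, christian_breakdown))  # ['Locality B'] (total of 115)
```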
The third mode of validation uses Table 4, titled "Irreproducibility of Table 4.7: Average consultation fees and payments for medicines (GH¢) two weeks preceding the interview (excluding those who paid nothing) by locality and sex", to examine the plausibility of the values. Here, validity is assessed by comparing the values across localities. Based on broad expenditure patterns, expenditure in the Greater Accra Metropolitan Area (GAMA) is expected to be higher than in the other four localities, and expenditure in urban areas is expected to be higher than in rural areas (see Chapter 10 of the 2014 GSS Main Report). Of the eight sub-categories of health expenditure in Table 4, three (consultation fees, total medical expenses and transport expenses) show a higher mean expenditure in rural areas than in GAMA. Prima facie, this is contrary to the expected pattern and therefore calls for comparison with previous surveys and for comprehensive information on the processes, assumptions and context. Although one could argue that there are circumstances in which expenditure in rural areas would exceed that in urban settlements, the scope (three of the eight sub-categories) and intensity (extent of variation) of the deviation clearly call for re-examination. The sparse distribution of health facilities in rural areas could explain relatively higher health-related transport costs compared with urban settlements, but it is difficult to explain why total medical expenses and consultation fees would be higher in rural areas.
Also, the average total medical expenses for all rural areas ...
To further test the validity of the statistical evidence in Table 4.7, the aggregates are examined using the mean of means of the health expenditure sub-categories across localities. From an urban-rural perspective, the mean of means shows that expenditure in urban areas is 12 per cent higher than in rural areas (GH¢234.07 and GH¢208.98 respectively). However, Table 10.7 of the same GLSS Six Main Report shows that mean annual household cash expenditure in urban areas is 34.13 per cent higher than in rural areas (GH¢169 and GH¢126 respectively). Moreover, the 2012-13 observations are at variance with the pattern of health expenditure across localities in the 2005-06 wave of the GLSS, when expenditure in all three of these sub-categories was higher in GAMA and urban areas than in the rural localities. These observations underscore the need to verify the reproducibility of statistical evidence before using the findings for policy-making and further analysis.
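The two urban-rural comparisons can be checked with a short Python sketch. The "mean of means" is taken here to be an unweighted average of sub-category means, which is an assumption about the computation; the sub-category values in the first part are hypothetical, while the comparison figures are those quoted in the text.

```python
from statistics import fmean

def pct_higher(a: float, b: float) -> float:
    """How much larger a is than b, in per cent of b."""
    return 100 * (a - b) / b

# Mean of means: unweighted average of sub-category means (hypothetical values).
urban_subcategory_means = [120.0, 300.0, 282.0]
print(fmean(urban_subcategory_means))  # 234.0

# Figures quoted in the text.
print(round(pct_higher(234.07, 208.98), 1))  # 12.0 (health expenditure, mean of means)
print(round(pct_higher(169, 126), 2))        # 34.13 (household cash expenditure)
```

Because an unweighted mean of means ignores how many respondents stand behind each sub-category mean, it is only a rough aggregate; it still suffices to show that the urban-rural gap in health expenditure is much narrower than the gap in overall cash expenditure.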

Comprehensiveness
The GLSS Six has more than 230 data files, which makes it cumbersome for further analysis. This, coupled with the lack of a methodological manual on how the data files can be combined and on the assumptions and procedures for generating derived variables, increases the likelihood of errors when the data are used for further analysis. By contrast, the GDHS is organised compactly and has extensive documentation on the processes for combining data files, as well as detailed information on the assumptions and procedures for generating derived variables.
Related to the packaging of the data is the style and comprehensiveness of the presentation of the statistical evidence. For both, emphasis is placed on only one issue: the handling and presentation of the number of observations. For any summary statistic to be meaningful to a user, the number and composition of the observations are the most important factors. It is therefore important for all statistical outputs to include the number of observations and their characteristics (including, but not limited to, the application of weights, conditions for inclusion and exclusion, and issues related to standardisation and normalisation). Almost all the tables and figures in the GLSS Six provide minimal information on these attributes; indeed, only one in every ten tables in the GLSS reports the number of observations. The risk of not reporting the number of observations is that a statistic can be reproduced with a different set of observations with different attributes, as illustrated in Table 8. The treatment of weights is also not discussed; the type of weight and its distinction across units of analysis are critical pieces of information that must be disclosed to improve the chances of reproducibility.
Table 5.14 of the GLSS Main Report, titled "Currently employed population 15 years and older with contract, unions, tax deductions and employee benefits by sex", is used to demonstrate the relevance of reporting the number of observations used to generate statistical evidence. In Table 8 of this paper, the column headed "Sample 15 Years and Older" exactly reproduces the statistical evidence documented in the GLSS Six Main Report. The column headed "Entire Sample" ignores the age restriction (15 years and older) and generates the statistical evidence for everyone who is working.
Table 7. Validation of statistical evidence on educational attainment reported in GLSS Six.

Allowing for differences of less than one per cent, all the statistical evidence is reproduced, but with a different sample, one that includes children. It is worth noting that, based on the correlates presented in Table 8, children are unlikely to benefit from contracts, union membership, tax deductions, social security and pension schemes. Thus, approximately the same statistical evidence has been reproduced even though the sample sizes differ by 73 (6709 for the entire sample compared with 6636 for the sample with the age restriction; see the last row of Table 8). The implications of such variations in sample size will surface as the data are used for more detailed analysis because, for instance, averages are sensitive to outliers whereas cross-tabulations are not. Such occurrences may lead to erroneous policies and/or counterintuitive results.
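The situation in Table 8, where two nearly identical statistics rest on different samples, can be sketched in Python. The `Worker` records and the `summarise` helper are hypothetical illustrations, not GLSS Six variables; the sketch simply returns the sample size alongside each percentage so that the difference in base is visible.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Worker:
    age: int
    has_contract: bool

def summarise(workers: list[Worker], min_age: Optional[int] = None) -> tuple[int, float]:
    """Return (n, % with a contract), optionally restricted to age >= min_age."""
    sample = [w for w in workers if min_age is None or w.age >= min_age]
    n = len(sample)
    if n == 0:
        raise ValueError("empty sample")
    return n, 100 * sum(w.has_contract for w in sample) / n

# Hypothetical working population: 20 adults and 1 working child.
workers = ([Worker(30, True) for _ in range(3)]
           + [Worker(40, False) for _ in range(17)]
           + [Worker(12, False)])

for label, min_age in [("15 years and older", 15), ("entire sample", None)]:
    n, pct = summarise(workers, min_age)
    print(f"{label}: n = {n}, {pct:.1f}% with a contract")
# 15 years and older: n = 20, 15.0% with a contract
# entire sample: n = 21, 14.3% with a contract
```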

Comparability of sample sizes employed
The rest of the paper addresses the second objective by discussing how variation in sample sizes influences summary statistics, regression analysis and policy direction. Table 9 presents the sample sizes used for the descriptive statistics and regression analysis of the 10 working papers and journal articles, and compares them with the original GLSS sample. Four of the ten papers did not indicate the sample size for the descriptive statistics, and one of these four did not indicate the sample size for either the descriptive statistics or the regression analysis. For the six papers that reported the sample size for both, the values were the same except in two cases, in which the sample size for the regression analysis exceeded that for the descriptive statistics. Based on the original sample size of the GLSS Six (16,772 households and 72,372 individuals), the last column of Table 9 gives the proportion of the sample used for the regression analysis in each of the ten papers.
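The checks behind Table 9 can be sketched in Python. The helper names are hypothetical; the GLSS Six totals are those given in the text, and the 6,383-household example is the reduced sample of Paper No. 3 mentioned in the next section. The flag inputs in the usage are hypothetical.

```python
from typing import Optional

GLSS6_HOUSEHOLDS = 16_772
GLSS6_INDIVIDUALS = 72_372

def share_of_original(n_used: int, level: str) -> float:
    """Percentage of the original GLSS Six sample used in an analysis."""
    base = {"household": GLSS6_HOUSEHOLDS, "individual": GLSS6_INDIVIDUALS}[level]
    return 100 * n_used / base

def sample_size_flags(desc_n: Optional[int], reg_n: Optional[int]) -> list[str]:
    """Reporting issues of the kind tallied in Table 9."""
    flags = []
    if desc_n is None:
        flags.append("descriptive n not reported")
    if reg_n is None:
        flags.append("regression n not reported")
    if desc_n is not None and reg_n is not None and reg_n > desc_n:
        flags.append("regression n exceeds descriptive n")
    return flags

print(round(share_of_original(6383, "household"), 1))  # 38.1
print(sample_size_flags(None, 5000))  # ['descriptive n not reported']
print(sample_size_flags(4000, 4500))  # ['regression n exceeds descriptive n']
```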

Variations in locational patterns based on differences in sample sizes
Tables 10 and 11 allow an assessment of the distribution of respondents across the geographical areas that were critical to the sampling procedure of the original data. Regional representation was prioritised in the implementation of the GLSS and reflects population density in the respective regions of the country. Columns 2 and 3 of Table 10 present the urban and regional distribution of households and individual respondents in Ghana, and Columns 4 and 5 present the corresponding distributions based on the samples used by the two papers. The distribution of households across urban and regional areas is the same in Columns 2 and 4, as expected given the same sample size. By contrast, the patterns in Columns 3 and 5, where the sample sizes differ, vary. For example, the GSS evidence indicates that Eastern is the third most populous region, whereas in the fifth column Northern is the third most populous. The concern is accentuated in the case of Paper No. 3 (Table 11): the regional distributions for both sub-samples (migrant and non-migrant) suggest that Upper West is the most populous region and that Greater Accra is one of the two least populous regions in Ghana. Although the authors provide detailed procedures and reasons for using a reduced sample of 6,383 households instead of 16,772, the observed variations raise concerns about the generalisability of the findings. Verifying and assessing the implications of using a reduced sample deserve discussion in academic publications.
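A simple way to detect the kind of reordering seen in Tables 10 and 11 is to rank regions by their representation in the full and reduced samples and report any changes in rank. The Python sketch below uses unweighted counts and hypothetical region labels ("Region A" to "Region D"); a real comparison would apply the survey's sampling weights.

```python
from collections import Counter

def region_ranks(regions: list[str]) -> list[str]:
    """Regions ordered from most to least represented (unweighted counts)."""
    return [region for region, _ in Counter(regions).most_common()]

def rank_changes(full: list[str], reduced: list[str]) -> dict[str, tuple[int, int]]:
    """Regions whose rank differs between samples, as (full rank, reduced rank)."""
    r_full, r_reduced = region_ranks(full), region_ranks(reduced)
    return {reg: (r_full.index(reg) + 1, r_reduced.index(reg) + 1)
            for reg in r_full
            if reg in r_reduced and r_full.index(reg) != r_reduced.index(reg)}

# Hypothetical respondent regions in a full and a reduced sample.
full = ["Region A"] * 5 + ["Region B"] * 4 + ["Region C"] * 3 + ["Region D"] * 2
reduced = ["Region A"] * 4 + ["Region B"] * 3 + ["Region D"] * 2 + ["Region C"] * 1

print(rank_changes(full, reduced))  # {'Region C': (3, 4), 'Region D': (4, 3)}
```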

Policy implications of irreproducibility
The value of evidence-based policy-making is well documented [3], leading to its advocacy in all circumstances, including interventions that are undeniably needed. Thus, while the Free SHS policy in Ghana is undeniably a worthy cause, it is imperative to assess the role of data in supporting the policy direction and in tracking its outputs, outcomes and impact. Before discussing the implications of irreproducibility for policy-making and tracking, it is important to consider the type of data a particular policy needs. For the purposes of this paper, it is assumed that education has been prioritised among other issues and that the concern is the choice of investing in a particular educational level at a given point in time. The data required to decide on a particular educational level need to capture the mobility of pupils and students across levels to determine the entry level that requires focused intervention. Panel data for a particular cohort of children would serve this purpose. Cross-sectional data, like the GLSS, provide only a snapshot and are therefore not ideal for tracking performance over time. However, since there are no long-term national data designed to track the enrolment and graduation of a given cohort of pupils and students, the GLSS is a useful substitute. With this limitation in mind, it is imperative to ensure that the available cross-sectional data are free of other problems such as non-comparability across surveys and censuses, and irreproducibility.
With reference to Table 7, the second and third columns show different patterns across the four levels of educational attainment, and these point to different levels for prioritisation. The evidence reported in the GLSS Six points to MSLC/BECE as the focus, since 44.6 per cent of Ghanaians aged 15 years and older have not attained this level; on this evidence, the GLSS Six does not support the Free SHS policy. However, the GLSS Five indicates that about two-fifths of Ghanaians attain MSLC/BECE/Vocational and only 13.6 per cent complete Secondary/SSS/SHS or higher, so the evidence in the GLSS Five aligns with the rationale for the Free SHS. The statistical evidence on educational attainment reported in the GLSS Six is irreproducible. This is of further concern because the calculations in this paper support the Free SHS. The experience of irreproducibility dampens confidence in the data and adversely affects their use to inform policy. This further justifies the clarion call for comprehensive documentation, including a methodological manual and the syntax for generating statistical evidence, in the release of nationally conducted surveys such as the GLSS.

Transparency document. Supplementary material
Transparency data associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2018.04.008.