Age structure and age heaping: solving Ireland’s post-Famine digit preference puzzle

The quality of age reporting in Ireland worsened in the years after the 1845–1852 Great Irish Famine, even as measures of educational attainment improved. We show how Ireland’s age structure partly accounts for this seemingly conflicting pattern. Specifically, we argue that a greater propensity to emigrate typified the youngest segment (23–32-year-olds) used in conventional indices of age heaping. Any quantification of age heaping patterns must therefore be interpreted considering an older underlying population which is inherently more likely to heap. We demonstrate how age heaping indices can adjust for such demographic change by introducing age standardization.


Introduction
Cliometricians continue to be intrigued by patterns of digit preference in the distribution of ages. The phenomenon, now widely referred to as age heaping, first came to prominence among economic historians when Mokyr (1983) drew a link between digit preference and numeracy skills in his analysis of Irish emigrants to the USA. The basic idea is that people who round their age are unable to calculate their true age, so more age heaping in a population implies a lower ability to count in that population. Viewed as a novel solution to the lack of historical data on human capital attainment, age heaping indices have since been energetically applied to a wide range of datasets to shed new light on basic education levels across time and space. However, a recent debate between economic historians questions the efficacy of age heaping as a numeracy and human capital indicator. Brian A'Hearn, Alexia Delfino, and Alessandro Nuvolari (in A'Hearn et al. 2022a) argue that age heaping propensity cannot represent numerical skills alone and posit two other explanations based on an analysis of historical Italian censuses: (1) state capacity, where the accuracy of census recording reflects available state resources; and (2) culture, where digit preference mirrors changes in the social admiration of youth and the elderly. They conclude that changing age heaping patterns are a combined outcome of the forces of modernization rather than a narrow consequence of human capital accumulation. Joerg Baten, Giacomo Benati, and Sarah Ferber (in Baten et al. 2022) reply that A'Hearn et al.'s (2022a) concerns are exaggerated and reiterate the age-heaping-as-numeracy interpretation. A rejoinder by A'Hearn et al. (2022b) then argues that they are not dismissing age-heaping-as-numeracy, but rather adding nuance to the interpretation.
We further augment this exchange by identifying, and then addressing, the "Irish heaping puzzle": a strange increase in age heaping, or a fall in the quality of age reporting, between the 1841 and 1871 censuses, despite increasing literacy, school enrolment and educational attainment (see figure 1). Our analysis suggests a fourth explanation for patterns of age heaping: it is at least partially a mechanical phenomenon, driven by demographic forces. When we adjust for the underlying age structure of the population, our heaping puzzle is reduced in size. We therefore suggest that future studies of age heaping should account for age structure when making comparisons across time and space. Accounting for demographic changes, the difference in age heaping in Ireland between 1841 and 1871 roughly halved. Compared with the experience of other European countries, this adjustment in ABCC values is of the same order of magnitude as changes measured across much longer horizons. In Baten et al.'s (2014) study of the impact of higher wheat prices, a 1.63 ABCC decrease is reported as a 'big' effect. Our adjustment is similar in scale.
We illustrate how the underlying age structure of a population under study influences its measure of heaping by showing that younger populations are less inclined to report rounded ages. We argue that Ireland's population was automatically more likely to heap in 1871 because it was significantly older. Without taking this demographic change into consideration, economic historians could incorrectly assume that Ireland had experienced a decline in numeracy, a reduction in state capacity, or a change in culture. Instead, the reality is a famine-induced "premature ageing" effect, precipitated by Ireland's Great Famine-era excess mortality and accelerated migration experiences. Ireland's post-Famine emigrants were in large measure young female and male domestic servants and young male agricultural laborers. Only 25% of the cohort aged 10-19 in 1841 remains in the 1871 census, whereas 32% of the cohort aged 20-29 in 1841 remains in 1871.13
Documenting a reversal of age heaping in famine-affected areas is not by itself a novel contribution. Manzel (2009) observes a famine-related reversal for Spain; Baten et al. (2010) for China; and Baten et al. (2014) for England. For Ireland, Baten et al. (2014) first notice the pattern we explore here, while Blum et al. (2017) speculate that its causes lie with the fact that the census is completed by male household heads. However, to our knowledge, we are the first to systematically test the hypothesis that famine-induced demographic change is a chief driver of such heaping reversals, and the first to design a modified index that incorporates age standardization as a solution.14
[Footnote fragment] [...] see: Mukherjee and Mukhopadhyay (1988), Noumbissi (1992), Bailey and Mukannah (1993), Spoorenberg and Dutreuilh (2007), Fayehun et al. (2020), and Singh et al. (2021).
13 [...] measurements due to selective migration and mortality. The appendix to their paper provides a useful discussion and statistical analysis of the issue of age grouping and heaping propensity.
[Figure caption fragment] [...] 90-years-and-over. Source: BPP 1843; BPP 1876.
The implication of our findings is that age structure correction should be carried out by other researchers too. Demographic change may be particularly pronounced in the Irish case, but all populations experience year-on-year changes due to fluctuations in birth and death rates, aging and changes to life expectancy, and patterns of immigration and emigration. In particular, the type of rural-urban migration typical of nineteenth-century industrialization distorts population distributions and makes direct rural-urban comparisons fraught with difficulty in the absence of some sort of demographic standardization correction. More locally to Ireland, our findings also suggest a more explicit role for demographic aging in debates about the island's post-Famine economic performance.15
We proceed as follows: Section 2 sets out the Irish puzzle; Section 3 demonstrates the salience of population age structure in explaining the Irish puzzle; Section 4 develops a standardization correction approach to account for age structure in heaping indices; and Section 5 discusses the applicability of our approach to other times and places, including Spain between 1877 and 1910.

Ireland's digit preference puzzle
There are several alternative indices available with which to evaluate the extent of age heaping in a single metric. The Whipple index (WI) was the first to be popularized by economic historians.16 It is the ratio of the number of people reporting an age ending in 0 or 5 to one-fifth of all age statements, multiplied by 100, where the population is restricted to those aged between 23 and 62:
$$\mathrm{WI} = \frac{n_{25} + n_{30} + \dots + n_{60}}{\frac{1}{5}\sum_{i=23}^{62} n_i} \times 100 \tag{1}$$
where n is the number of individuals reporting that specific age. As calculated, the WI gives greater weight to the bottom of the age distribution in a traditional "pyramid"-shaped population: because younger people are more numerous, younger age groups affect the WI more than older ones. A WI value of 500 indicates that all age statements end in 0 or 5; a value of 100 indicates no heaping. A'Hearn, Baten and Crayen, on the suggestion of Gregory Clark, modify the WI to range between 0 and 100, where 0 indicates that everyone reports an age terminating in 0 or 5, and 100 that there is no heaping on ages terminating in 0 and 5 (A'Hearn et al. 2009). Advocates of this modified index, since named the ABCC index after its instigators, interpret it as the percentage of the population that is numerate. It is given as follows:
$$\mathrm{ABCC} = \left(1 - \frac{\mathrm{WI} - 100}{400}\right) \times 100 \ \text{ if } \mathrm{WI} \geq 100; \quad \text{otherwise } \mathrm{ABCC} = 100 \tag{2}$$
What is less well documented in the economic history literature is that, contemporaneous to the A'Hearn et al. (2009) contribution, an active debate among demographers had also led to a modified WI. This modification, due to Spoorenberg and Dutreuilh (2007), generalizes the WI to all digits. They create a 'total modified Whipple's index' that combines the separate indices for each terminal digit into a single index, which ranges between 0 and 16. This can then be transformed into a modified ABCC that ranges between 0 and 100, as per Beltrán Tapia et al. (2022). For demographers, the purpose of these modified indices is to assess which changes in digit preference drive improvements in age reporting quality, including for digits other than 0 and 5.
14 Ó Gráda (2006) was also aware of the problem of demography in his study of Jewish migrants from Tsarist Russia included in the 1911 Irish census. Rather than constructing a single age heaping index, he reported separate age heaping indices for each age band. Tollnek and Baten (2024) largely dismiss population aging as a potential source of bias in their handbook chapter on the use of age heaping metrics as cliometrics methodology.
15 For recent perspectives on post-Famine economic performance, see: Begley et al. (2016), Henderson (2019), and Kenny et al. (2023).
16 Or as Whipple referred to it, "method of adjusting data troubled with these concentrations on the round numbers" (Whipple 1923, p. 180). Meanwhile, the Myers blended population method index is a weighted sum of the number of persons reporting ages ending in each of the ten terminal digits (Myers 1940, 1954), while the Bachi method involves applying the Whipple method repeatedly to determine the extent of preference for each final digit (Bachi 1951).
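To make the Whipple and ABCC indices described above concrete, the following is a minimal Python sketch using their conventional definitions. The function names (`whipple_index`, `abcc`) and the two toy age distributions are illustrative assumptions, not the authors' code or census data.

```python
from typing import Mapping


def whipple_index(counts: Mapping[int, float], lo: int = 23, hi: int = 62) -> float:
    """Whipple index for ages lo..hi (inclusive).

    counts maps a reported age to the number of people reporting it.
    """
    ages = range(lo, hi + 1)
    total = sum(counts.get(a, 0) for a in ages)
    if total == 0:
        raise ValueError("no observations in the requested age range")
    # Ages ending in 0 or 5 are exactly the multiples of 5.
    heaped = sum(counts.get(a, 0) for a in ages if a % 5 == 0)
    return 100.0 * heaped / (total / 5.0)


def abcc(wi: float) -> float:
    """ABCC transformation of a Whipple index (bounded above at 100)."""
    if wi < 100.0:
        return 100.0
    return (1.0 - (wi - 100.0) / 400.0) * 100.0


# Hypothetical populations, for illustration only.
no_heaping = {age: 10 for age in range(23, 63)}  # every age equally common
full_heaping = {age: (50 if age % 5 == 0 else 0) for age in range(23, 63)}

wi_none = whipple_index(no_heaping)
wi_full = whipple_index(full_heaping)
abcc_none = abcc(wi_none)
abcc_full = abcc(wi_full)
```

The two toy populations reproduce the boundary cases given in the text: a uniform age distribution sits at the "no heaping" end of both scales, while a population reporting only ages ending in 0 or 5 sits at the opposite end.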
Our data are taken from the decennial Irish census, with particular focus on the returns for 1841 and 1871. The 1841 census is recognized as an important juncture in the history of the Irish census, with significant improvements in data collection methods that formed the basis for subsequent censuses (Linehan 1991). It is the last census taken before the Great Irish Famine and has therefore been used extensively by scholars looking into the determinants of the famine (see, especially, Mokyr 1983; and contributions to Crowley et al. 2012). Our use of the 1841 and 1871 censuses avoids the serious pitfalls of the later 1911 census, where age reporting was perversely affected by the Old Age Pensions Act, which incentivized the elderly to exaggerate their age to gain access to newly introduced welfare payments (Lee 1969; Budd and Guinnane 1991).
In table 1, we report age heaping indices for Ireland across our four decennial census points: 1841, 1851, 1861, and 1871. The table suggests that the greater heaping in 1871 (as compared to 1841) was not an exception; heaping was more pronounced in all three census years after 1841. By contrast, reported literacy in Ireland increased from roughly 1-in-2 in 1841 to 7-in-10 by 1871. This runs counter to the correlation we would expect to see if heaping is an indicator of basic human capital: it would mean that while the Irish were learning to read, they were forgetting how to count.
Table 2 reports age heaping by age group in Ireland across the same four decennial census points. We mainly focus here on males, because female ages are typically reported to census officials by their male household heads; women's ages arguably only tell us something more about men (see Blum et al. 2017). The table shows an aging population and an increased heaping propensity post-Famine. In addition, it illustrates a greater heaping propensity among older persons, which is not unique to the Irish case; similar observations have been made in different contexts (A'Hearn et al. 2009; Crayen and Baten 2010; Beltrán Tapia et al. 2022).
Ireland was the first constituent polity of the UK to receive state-funded primary education, from 1831 (Blum et al. 2017). National schools taught the "three R's" (reading, writing, and arithmetic), with book-keeping being an especially sought-after subject (Coolahan 1981; Clarke 2008). It is unsurprising to see this education policy bearing fruit in terms of the reduction in illiteracy 40 years after the policy was implemented. But it is quite surprising that there was an increase in heaping over the same period, if heaping is to be interpreted narrowly as numeracy skills. It is this puzzle that we now attempt to solve.

Explaining the Irish puzzle
To explore Ireland's curious heaping reversal, we use tabulated age data from the 1841 and 1871 censuses of Ireland. Unlike the intervening Irish censuses, these have the advantage of reporting ages by county (32 counties, equivalent to NUTS-3) and therefore permit spatial comparisons across the island. Figure 2 maps ABCC values by county for 1841 and 1871 and shows a striking decline in ABCC across most of the polity.
Our focus is the changing age structure of the population: the Irish population became older after the Famine. Among the population used in calculating the ABCC index, the main difference was a 4.35 percentage-point decrease in the share of the population aged 23-32 and a 5.05 percentage-point increase in the share of the population aged 53-62. This was primarily a consequence of increased emigration flows to Great Britain and North America. Irish migrants were overwhelmingly young single individuals (male and female); they tended not to migrate in family groups (Ó Gráda and O'Rourke 1997).
We model the change in ABCC at the county level between 1841 and 1871 using an OLS regression as a basic variable decomposition exercise. This provides evidence on the potentially salient factors for digit preference patterns. Following our hypothesis that age structure can help account for the heaping reversal, our main variables of interest are the shares of the population in the different age bands used in the Whipple calculation. We repeat this decomposition exercise with the change in illiteracy rates between 1841 and 1871, to see whether the results differ from our findings for heaping.
We add Famine-era excess mortality and migration as control variables. Famine-era excess mortality is Mokyr's (1983) average annual excess death rate between 1846 and 1851. Famine-era migration is Ó Gráda and O'Rourke's (1997) estimate of migration between the 1841 and 1851 censuses. We include the initial ABCC level in 1841 to control for level effects. Table 3 reports the descriptive statistics and table 4 the regression results. To aid in the comparison of the size of regression coefficients, Appendix table A2 reports standardized regression coefficients for table 4's regression results; there we express the coefficients in terms of a single, common set of statistically reasonable units.
The results provide evidence on the extent to which the changing age structure accounts for the change in ABCC over time. In specification 1, changing age structure and ABCC in 1841 explain 34% of the variation in the dependent variable. The age structure variables (omitting the oldest group) are positive and significant, indicating that a more positive change in these younger group shares is associated with a more positive change in the ABCC. With the inclusion of Famine-era migration and excess mortality in specification 3, the magnitude of the age structure effects is smaller but still statistically significant. Notably, Famine-era migration is more statistically important than Famine-era mortality. This is to be expected; it was the oldest and the youngest that were most prone to perish during the Famine (Mokyr and Ó Gráda 1982). However, Famine-era migration was the main driver of changing age structure, specifically among those age groups included in the most widely-used age heaping indices. Demographic aging also matters when looking at illiteracy in specifications 4-6. Younger generations increasingly benefitted from the rise of publicly funded education, and this invariably fueled declining illiteracy over time. But of crucial importance was the increased literacy of the female population; the educational attainment of women was emphasized by both Fitzpatrick (1986) and Blum et al. (2017). Here also Famine-era migration and excess mortality account for decreased illiteracy.
Table 4. OLS regressions of the change in ABCC and illiteracy between 1841 and 1871
The 1871 census also provides information on literacy by age, distinguishing those aged 20-40 and those aged over 40. Figure 3 plots the literacy and ABCC values for these two cohorts, side-by-side. Each dot represents a county, the size of the dots represents the population size of each county, and the line of best fit uses population weights. There is a clear distinction between those under and over 40; for any given level of literacy, the older cohort heaps significantly more, by 5 and 8 percentage points for men and women respectively. Crayen and Baten (2010) argue that age heaping is best analyzed by birth cohort rather than from a population average. However, comparing different birth cohorts derived from one census source (i.e., from a cross-section) introduces selection bias. For example, using the 1871 census and comparing those born in the 1820s with those born in the 1810s introduces bias from selective mortality and migration. Equally, isolating a single birth cohort reported across different decadal censuses (i.e., time series) introduces selection biases. This is especially problematic in the Irish case where, for example, only 25% of the cohort aged 10-19 in 1841 remains in the 1871 census, and 32% of the cohort aged 20-29. Indeed, looking at a single cohort over time shows a pronounced fall in the ABCC values in figure 4.
An alternative approach is to compare the same age group, not cohort, across census years. This allows us to compare like with like: youth with youth, middle-aged with middle-aged, and elderly with elderly. Figure 5 illustrates the change in ABCC values between the two census points by sex and age bin. There is very little change in the 23-32 age group for men and women, while the largest change is in the 43-52 age group. Women see negligible decline in the 53-62 age group, whereas for men there is still a substantial decline. Notably, however, the gaps here are much less pronounced than the change within the individual cohort shown in figure 4. Yet, beyond the advantage of age groups over cohorts, we must also account for age structure variability, which is overlooked in age heaping metrics. Our technical solution is to adjust the conventional Whipple index by standardizing it by age. Adjusting for age composition is commonly carried out when comparing other demographic variables, notably mortality rates. A classic example from the USA is the comparison of mortality rates in Florida, which is famously a destination for retirees, and Alaska, which has a younger population. Without adjusting for age structure, the Florida-Alaska comparison gives a misleading impression of mortality. The same principle applies to age heaping.
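The logic of the mortality comparison can be sketched with direct standardization. The example below is a minimal illustration under stated assumptions: the age bands, rates and population shares are invented for the purpose (they are not Florida or Alaska figures), and `weighted_rate` is a hypothetical helper name.

```python
def weighted_rate(age_shares, age_rates):
    """Average of age-specific rates, weighted by a set of age shares."""
    if abs(sum(age_shares.values()) - 1.0) > 1e-9:
        raise ValueError("age shares must sum to 1")
    return sum(age_shares[band] * age_rates[band] for band in age_rates)


# Hypothetical deaths per 1,000, identical in both populations by construction.
age_rates = {"0-39": 2.0, "40-64": 8.0, "65+": 50.0}

# Hypothetical age structures: one older population, one younger.
older_shares = {"0-39": 0.40, "40-64": 0.35, "65+": 0.25}
younger_shares = {"0-39": 0.65, "40-64": 0.30, "65+": 0.05}

# Crude rates use each population's own age structure.
crude_older = weighted_rate(older_shares, age_rates)
crude_younger = weighted_rate(younger_shares, age_rates)

# Standardized rates apply one common (here: averaged) age structure to both.
standard_shares = {
    band: (older_shares[band] + younger_shares[band]) / 2 for band in age_rates
}
std_older = weighted_rate(standard_shares, age_rates)
std_younger = weighted_rate(standard_shares, age_rates)
```

Because the age-specific rates are identical by construction, the gap between the two crude rates is purely compositional, and it disappears once both populations are evaluated on the same standard age structure.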

Whipple made a distinction between "secessive" and "accessive" populations, the former being a population which has excess emigration and the latter one with excess immigration. Yet the Whipple Index itself compares the relative frequency of age statements terminating in 0 and 5 for the population aged between 23 and 62 without allowance for the age structure of the population. This is because Whipple's original purpose was to adjust for 'errors in age' rather than for making comparisons of populations across time and space. Simply put, this index, and its various derivatives, is not being applied in the way that it was originally intended.
We can account for age structure differences by constructing a weighted average Whipple index, using standardized population weights to make our adjustment. As the magnitude of error is in individual ages, the weights are derived from the underlying data source by grouping the ages, where m are the age-standardized weights for each age group. Standardized estimates using national average age group shares then adjust for the differences in population structure and enable us to see whether the changing composition of the population explains the differences in heaping observed. Essentially, our age-standardized Whipple index (AgeWI) allows us to compare like-with-like. Our index can be incorporated into an ABCC computation to provide an age-standardized ABCC (AgeABCC) by replacing WI with AgeWI in Equation 2. We use weights derived from the 1841 census (reported in table 5) to directly compare 1841's younger population with 1871's older population. Reweighting censuses to synthesize a stable population across time is analogous to using a base year in the calculation of real GDP, or, perhaps more aptly still, using fixed base-year quantities in calculating a price index.
The effect of this adjustment is best illustrated by comparing the unadjusted and adjusted ABCC values for 1871, shown in figure 6. The adjustment (i.e., standardization) using the 1841 weights increases the mean ABCC values for both men and women. As a robustness exercise we include composite weights using Belgium's 1900 census and England and Wales's 1901 census, inspired by Whipple's observation that neither Belgium nor England were reported as populations with an "abnormal use of rounding" (Whipple 1923, table 32). We also use weights calculated from the US population in 1900. Finally, we report equal weights, as implied by A'Hearn et al.'s (2022a) footnote 38, for comparison.36
Our adjustment leads to a significant alteration of the "change in ABCC between 1841 and 1871" variable used as the dependent variable in our regression analysis. Figure 7 depicts in a boxplot the percentage difference of the change (1841-1871) between unadjusted and standardized figures. After standardization, the change in ABCC values was 30% lower for males and 33% lower for females. In extreme cases, such as in County Westmeath, the difference between 1841 and 1871 was 66% lower for males and 177% lower for females; the latter reflecting a decrease in heaping when standardized figures are used. County Dublin, which is unusual due to the reduction in heaping it experienced from 1841 to 1871, has larger changes in ABCC when standardized figures are used. This heterogeneity underscores the sizeable implications of standardization.
Figure 7. Percentage difference of ΔABCC (1841-1871) between unadjusted and standardized. Source: See table 3.
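The standardization described above can be sketched in Python as a weighted average of bin-level Whipple indices, AgeWI = Σ m_a · WI_a, with the weights m_a summing to one. This functional form is an assumption inferred from the description of a "weighted average Whipple index"; the helper names, the four ten-year bins and all counts are hypothetical and are not the census figures.

```python
AGE_BINS = [(23, 32), (33, 42), (43, 52), (53, 62)]


def whipple_index(counts, lo, hi):
    """Whipple index for ages lo..hi (inclusive)."""
    ages = range(lo, hi + 1)
    total = sum(counts.get(a, 0) for a in ages)
    heaped = sum(counts.get(a, 0) for a in ages if a % 5 == 0)
    return 100.0 * heaped / (total / 5.0)


def abcc(wi):
    """ABCC transformation of a Whipple index (bounded above at 100)."""
    return 100.0 if wi < 100.0 else (1.0 - (wi - 100.0) / 400.0) * 100.0


def age_standardized_wi(counts, weights, bins=AGE_BINS):
    """Weighted average of bin-level Whipple indices (assumed form of AgeWI)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(m * whipple_index(counts, lo, hi)
               for m, (lo, hi) in zip(weights, bins))


def make_counts(bin_sizes, heaped_shares, bins=AGE_BINS):
    """Build toy age counts with a chosen share of heaped reports per bin."""
    counts = {}
    for (lo, hi), size, share in zip(bins, bin_sizes, heaped_shares):
        heap_ages = [a for a in range(lo, hi + 1) if a % 5 == 0]
        other_ages = [a for a in range(lo, hi + 1) if a % 5 != 0]
        for a in heap_ages:
            counts[a] = size * share / len(heap_ages)
        for a in other_ages:
            counts[a] = size * (1.0 - share) / len(other_ages)
    return counts


# Identical heaping behaviour within each age bin in both years (hypothetical).
heaped_shares = [0.2, 0.2, 0.3, 0.4]
early = make_counts([400, 300, 200, 100], heaped_shares)  # younger population
late = make_counts([300, 250, 250, 200], heaped_shares)   # older population

base_weights = [0.4, 0.3, 0.2, 0.1]  # age shares of the early population

wi_early = whipple_index(early, 23, 62)
wi_late = whipple_index(late, 23, 62)
age_wi_late = age_standardized_wi(late, base_weights)
age_abcc_late = abcc(age_wi_late)
```

In this toy setup, heaping within each age bin is the same in both populations, so the higher unadjusted index for the older population is purely compositional; reweighting it with the base-year age shares brings it back to the base-year value. Ten-year bins starting at 23 each contain exactly two ages ending in 0 or 5, which is why the one-fifth scaling can be applied bin by bin.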

Discussion and conclusion
When Mokyr (1983) first made the connection between age heaping and numeracy skills, he was particularly interested in whether Irish emigrants to North America were positively or negatively selected on human capital. We show there is a second type of selection he should have been concerned with: a mechanical process due to selective emigration by age which creates a statistical illusion of differences in age heaping between sample populations. Mokyr's selection concern is something real, not spurious, and is something our standardization procedure does not address directly. But we argue that his concerns can be addressed only once we correct for a sample's age structure.
Indeed, our analysis suggests Ireland's nineteenth-century age heaping puzzle can be better understood by considering the island's changing age structure over time. Those appearing in the later 1871 census were the survivors and "remainers" in a post-Famine Ireland, where mortality and emigration had selectively yielded an older Irish population that is naturally more inclined to report rounded ages, i.e., age heap. These demographic changes had a sizeable impact on the age structure of the island, which experienced "premature aging" in the decades after the Famine (Gilleard 2016). The implications of our findings for other studies using age heaping as a development indicator are twofold. Firstly, for comparative purposes, stable populations are the most appropriate for the application of heaping metrics. The Irish case shows that demographic shocks can induce changes in heaping propensity which are inconsistent with changes in other human capital or state capacity indicators. This is especially important when heaping is used as the sole indicator of human capital development because, while there is generally a positive correlation with other human capital indicators, this is not always the case. Secondly, standardization should be considered for use in other studies of age heaping as it adds credibility to comparisons across time and space and is derived from methods used by demographers. In some cases standardization will produce negligible differences, but this does not negate the value of enhancing the precision of our metrics.
Our study provides impetus for future inquiry. At a local level, for the case of Ireland, it raises questions about the impact of an older (thus more heaping-inclined) population for post-Famine economic and social development. This is especially interesting given the social conservatism and lack of industrialization that prevailed in the post-Famine decades. More generally, our argument raises questions about the role of standardization in the comparative methods of economic history. Of particular interest for future debate is the choice of an appropriate benchmark when comparing different times and places.
We illustrate our conclusion with the case of Spain, a country with considerable emigration from the 1880s, with over three million Spaniards leaving for Latin America in particular (Sánchez-Alonso 2000). As in other European emigrant nations, Spanish migrants tended to be young and single; while 15% of Spain's population was between the ages of 20 and 39, 32% of Spanish migrants to Argentina were in this age group. Spanish migrants tended to be more literate than those remaining in Spain. The heavy outpouring of emigrants makes Spain comparable to the Irish case. And just like Ireland, there were only a few cities that absorbed internal migrants from rural areas. Those that moved to the cities tended to be more literate than those that remained in their provinces of origin (Beltrán Tapia and de Miguel Salanova 2017).
We apply our standardization methods to data from Beltrán Tapia et al. (2022), who found, like our Irish case, that heaping did not decline as literacy rose. Figure 8 depicts our results. We use the 1877 and 1910 Spanish censuses as these represent, respectively, the years immediately prior to the mass emigration waves and the later years of the migration boom. Our approach resolves Beltrán Tapia et al.'s age heaping puzzle; we find that the ABCC index rose after we correct for Spanish demographic patterns distorted by high levels of emigration by young people. Furthermore, the remaining levels of age heaping are very low, which supports Beltrán Tapia et al.'s view that the data collection process may have influenced how ages were recorded, possibly because they were cross-referenced with other registers by census administrators.
Finally, how large are our adjustments? Table 6 compares our adjusted ABCC values with unadjusted results from the pioneering study of this field (A'Hearn et al. 2009). Our adjustments are quantitatively large compared to the temporal changes in age heaping patterns reported there. Over our 30-year measurement period, our adjustments represent a −0.05 ABCC points per year equivalent change for men. This is similar in magnitude to the total unadjusted per-year change in ABCC that took place in France, and larger than that in Germany, across a 100-year measurement period, and similar to the change in Italy, Norway and Poland across a 50-year period.
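As a rough check on scale, and assuming the per-year figure is a simple linear annualization of the adjustment over the 30-year period (an assumption, since the underlying table 6 values are not reproduced here), the implied cumulative adjustment for men is approximately:

$$|{-0.05}| \ \text{ABCC points per year} \times 30 \ \text{years} \approx 1.5 \ \text{ABCC points}$$

This is close to the 1.63 ABCC decrease that the Introduction cites from Baten et al. (2014) as a 'big' effect.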

Figure 2. ABCC index by county in the 1841 and 1871 censuses. Note: Bins are nested means derived from the mean of all for both years, the mean between the minimum and the mean, and the mean between the mean and the maximum for both years. Source: BPP 1843; BPP 1876.
Note (table 4): Robust standard errors in parentheses, *** P < 0.01, ** P < 0.05, * P < 0.1. Illiteracy is the percentage of persons who can neither read nor write of persons 5 years and upwards.

Figure 3. Literacy and ABCC values for Ireland in 1871. (A) Males. (B) Females. Note: Lines of best fit are fitted values from weighted OLS regressions, where weights are county population. Source: Own calculations using BPP 1876.

Figure 4. ABCC values of cohorts born in the 1810s reported in the 1841 and 1871 censuses. Note: Boxplot demonstrating the locality, spread and skewness of cohorts through their quartiles. We use the 23-32 cohort from 1841 and the 53-62 cohort from 1871. Source: See table 3.

Figure 5. Change in ABCC values by age-category between the 1841 and 1871 censuses. Note: Outliers plotted as individual points. Source: See table 3.
[Footnote fragment] The US Census Bureau has developed a set of Microsoft Excel workbooks, which it calls its Population Analysis System (PAS), that calculates the Whipple, Myers and other such indices: https://www.census.gov/data/software/pas.html.

Table 1. Estimates of heaping (Whipple and ABCC), illiteracy and industrial production, 1841-1871 censuses. Note: A higher Whipple or lower ABCC implies more heaping. Illiteracy is the percentage of persons who can neither read nor write of persons 5 years and upwards. Output is measured as industrial output. Productivity is labor productivity, measured as industrial output per worker. Source: BPP 1843; BPP 1851; BPP 1863a, LVII; BPP 1863b, LXI; BPP 1876. Output data from Kenny et al. (2023).

Table 2. Estimates of heaping (ABCC), male, by age group, 1841-1871 censuses. Note: Pop. share is share of total male population between the ages of 23 and 62 in census year; 23-62 total is the standard unweighted ABCC index. Source: See table 1.

Table 5. Population weights and sex ratios by age bin. Source: 1841 census weights derived from BPP 1843; 1871 from BPP 1876; composite weights based on England and Wales's 1901 census (BPP 1904) and Belgium's 1900 census (Ministre de l'Intérieur et de l'Instruction Publique de Belgique 1903) weighted by population size; US 1900 weights from United States Census Office (1901); Spain 1877 from Beltrán Tapia et al. (2022).
Figure 6. Unadjusted and age-standardized ABCC indices in 1871. Source: See table 3.
36 Corresponding age pyramids for each are reported in Appendix figure A4.

Table 6. Comparison of magnitude changes in ABCC over time. Source: Ireland is authors' own calculations; comparison group is from table 4 of A'Hearn et al. (2009).