Correcting for artifactual correlation between misreported month of birth and attained height-for-age reduces but does not eliminate measured vulnerability to season of birth in poorer countries

ABSTRACT
Background: Height-for-age z-scores (HAZ) are associated with month of birth (MOB) in many nutrition surveys, but that link could be an artifactual result of measurement error in child birthdates.
Objective: We corrected estimates of the associations between HAZ and MOB for a common type of age misreporting, to measure the remaining seasonality in HAZ and identify country characteristics associated with vulnerability to seasonal changes in early life.
Design: We used nationally representative repeated cross-sections from all available Demographic and Health Surveys (DHS), totaling 1,363,806 children from 218 surveys in 72 countries over 1986–2016, to estimate the seasonal patterns in HAZ by MOB within each survey. Then, we corrected these estimates for each survey's random errors in recorded birth month implied by differences in attained height between children reported as born in December of one year versus January of the next. Indicators of seasonal variation between other months were modeled as functions of national-level incomes using linear regression, and visualizations were constructed using nonparametric local polynomial smoothing regressions.
Results: Over all surveys, misreporting MOB accounted for about one-eighth of the gap in attained height between the worst and best months to be born, which averaged 0.41 HAZ in the raw data and 0.34 HAZ after correction for age misreporting. A linear correction reduced apparent seasonality of HAZ by MOB in 49 of 72 countries, and the remaining nonartifactual differences by season of birth were larger in countries with lower average income per capita.
Conclusions: Measurement error in child MOB helps to explain the association between attained height and seasonal variation in early life environments, but significant seasonality in HAZ by MOB remains in many poor countries. Higher national income is associated with smoother outcomes across birth months, and birth registration efforts would improve nutrition research.


Introduction
Measuring the nutritional consequences of seasonal changes in disease exposure and the food environment can improve understanding of how poorer families cope with risk, avoid vulnerability, and gain resilience to the many nutritional risk factors they face (1,2). Nutrition smoothing is the ability of an individual or a group of people to maintain stability in their nutritional status and health despite changes in their household and community environment, including seasonal fluctuations in weather as well as sudden shocks such as natural disasters or pest infestations (3).
Whether a population is able to smooth its nutritional outcomes can often be measured using associations between attained height and birth circumstances, due to the sensitivity of linear growth to different risk factors (4). Whatever a population's average level of stunting over the year, seasonal fluctuations can adversely affect children's human capital development (5) and linear growth (6-8). Direct measurement of relevant climate and weather variables might not be feasible, and in any case researchers can be interested primarily in whether attained heights are affected by all of the many factors that vary by season (9,10). Seasonality in height and other nutrition indicators by month of birth (MOB) has been observed in several populations and varies depending on region (11). The mechanisms for these patterns might be related to the endocrine system (12), energy intake (13), dietary quality (14), poverty (15), birth- or conception-related factors (16), pregnancy-related factors (17), or the disease environment (18).
Analyses of seasonality in child heights depend on the accurate measurement of child birthdates (19,20). This is a difficult task, especially because globally only ∼65% of children aged <5 y have had their births registered (21). Larsen et al. (20) recently discovered artifacts that cause an apparent seasonal pattern in child heights by MOB in the Demographic and Health Surveys (DHS). One of these artifacts is a gradient in height-for-age z-scores (HAZ), observed as an implausibly smooth linear increase in average HAZ monthly from January to December, and an implausibly large step change in mean HAZ from those reportedly born in December of one year to January of the next (19,20). HAZs indicate linear growth for individual children compared with healthy children of the same age and sex. This implausible monthly gradient and December-January gap can be explained by random error in reported birth months.
Random measurement errors in exposure variables cause imprecise coefficient estimates, attenuation bias, and a loss of statistical power, but in this case random misreporting of birth months within calendar years interacts with linear growth to create artifactual seasonality in observed HAZ. The objective of this study was to correct for the implausible gradient in the relation between HAZ and MOB, and thereby reveal nonartifactual seasonality in attained heights by MOB across countries.

Methods
We used a collection of nationally representative repeat cross-sections of data from the DHS to estimate patterns in child heights by MOB (22). The DHS are the largest collection of comparable health and nutrition microdata in the world, and are typically conducted at 5-y intervals in many low- and middle-income countries in collaboration with national statistics offices. Women of childbearing age (age 15-49 y) are the primary subjects of DHS data along with their children aged <5 y. Our focus was these children's height-for-age, for which we have about 1.4 million observations (Supplemental Figure 1). To build this dataset we appended 218 of the Standard DHS together in Stata 15/MP (23). This collection spanned 72 countries and 31 y. Then, we generated binary variables indicating reported MOB. Next, we estimated HAZ as a function of MOB, controlling for age-in-months and sex, by survey with mother fixed-effects using the ordinary least squares (OLS) estimator. Age and sex controls were used to improve the precision of estimated coefficients on MOB, reducing errors attributable to differences in the ages, sexes, or other attributes of children with each reported birth month in any given survey, or among siblings with the same mother in estimates that use maternal fixed-effects.
In Equation 1, i indexes mothers, j indexes children, and m indexes months. MoB_m denotes 11 binary indicators for reported MOB estimated with January as the omitted reference group, Age is measured as number of months in linear and quadratic terms, Male is a binary indicator which equals 1 if the child is male and 0 if the child is female, δ_i are mother fixed-effects, and μ_ij is an independent and identically distributed error term. To account for correlation in omitted influences on HAZ in each survey site, robust SEs for each coefficient were estimated with clustering by enumeration area. Coefficient estimates from these models were used to construct indicators of seasonality in heights, using separate regressions for each survey.
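Equation 1 itself does not appear in the text above. A reconstruction consistent with the variable definitions just given might look like the following. The intercept α_0, the coefficient symbols β_1 to β_3, and the exact subscript placement are assumptions made for illustration, not the authors' notation.

```latex
% Hypothetical reconstruction of Equation 1 from the surrounding definitions.
% January (m = 1) is the omitted reference month, so the sum runs over m = 2..12.
\begin{equation*}
\mathrm{HAZ}_{ij} = \alpha_0
  + \sum_{m=2}^{12} \alpha_m \,\mathrm{MoB}_{m,ij}
  + \beta_1 \,\mathrm{Age}_{ij}
  + \beta_2 \,\mathrm{Age}_{ij}^{2}
  + \beta_3 \,\mathrm{Male}_{ij}
  + \delta_i + \mu_{ij}
\end{equation*}
```

Under this reading, each α_m is the mean difference in HAZ between children reported as born in month m and those reported as born in January, and the model is estimated separately for each survey.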
To calculate and visualize gradients for worst-to-best months to be born, the matrix of estimated coefficients for the model in Equation 1, by survey, was exported into a new database. The set of results across surveys therefore formed a matrix of estimated coefficients with 218 rows and 15 columns. Each row contained the set of coefficient estimates for one survey. The columns contained the coefficients estimated on the following variables: 11 month-of-birth binary variables, age, age², sex, and a constant term. Also included were the following scalars in subsequent columns, by survey: F statistics, R², and total number of observations from each OLS regression.
To correct the estimated coefficients on MOB for the measurement errors, we followed Larsen et al. (20) in assuming that the implausibly large gaps between December and January births and the implausibly smooth gradients observed in estimated coefficients over each successive month within the year were due to random misreporting of MOB. For children who were actually born in midyear (July), if their MOB is misreported earlier in the year (between January and June), then they are in fact younger than they are reported to be, and therefore their height-for-age would be calculated as lower than it should be, leading to lower average HAZs in the earlier months of the year. The reverse is true if their MOB is misreported as occurring later in the year (between August and December).
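The mechanism described above can be written as a short derivation. The setup is a simplification introduced here for illustration and is not taken from the paper: a share p of children have a reported birth month m drawn uniformly from the 12 months of their true birth year, the true birth month t is uniform over the year, and overstating a child's age by 1 mo lowers the calculated HAZ by a constant amount g > 0.

```latex
% Illustrative derivation under the stated simplifying assumptions (p, g, and
% uniform misreporting are hypothetical, introduced only for this sketch).
\begin{align*}
\text{age}^{\text{reported}} - \text{age}^{\text{true}} &= t - m
  && \text{(months, same calendar year)} \\
E[\,t \mid \text{misreported}\,] &= \frac{1}{12}\sum_{t=1}^{12} t = 6.5 \\
E[\,\text{bias in HAZ} \mid m\,] &= -\,g\,p\,(6.5 - m) = g\,p\,(m - 6.5) \\
E[\,\text{bias} \mid m = 12\,] - E[\,\text{bias} \mid m = 1\,] &= 11\,g\,p
\end{align*}
```

Under these assumptions the expected bias is linear in the reported month, negative from January to June, positive from July to December, and zero at the midpoint of the year. That is the shape of the artifactual gradient and December-January step described in the text.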
With equal probability that births in each month will be misreported as occurring in earlier or later months, the only children whose recorded birthdate is an unbiased estimate of their true age are those born at the midpoint of each year, between June and July. To correct for the artifactual gradient in HAZ effects associated with each successive reported MOB, we subtracted one-twelfth of the estimated coefficient on December births from the estimate for each successive month, with one-half of the total added back so the gradient was rotated around the midpoint of each year and July 1 births became the reference category. The estimated coefficient on each recorded MOB is denoted α^s_m, where m is 1 for January and 12 for December, and the corrected estimate of true seasonal effects in each month, net of measurement error, is given by Equation 2. This correction could be visualized as a linear rotation of each MOB coefficient around the midpoint of the year between June and July, allowing a comparison of true seasonality in HAZ with other indicators of health. For surveys with a larger estimated gap between December and January births (α^s_12), the slope of this linear correction was steeper. Without the adjustment, seasonality in these other health outcomes and HAZ were not comparable, due to the systematic bias in estimated HAZ caused by random measurement error in birthdates.
After we corrected the estimated coefficients for measurement error in child birthdates, we calculated 2 primary outcome variables to indicate seasonality in HAZ by survey, using the estimated coefficients on the MOB binary variables. Both seasonality indicators were calculated with the original raw coefficients and the corrected coefficients, to compare in visualizations. The first indicator of seasonality was the gap in HAZ between the worst and best months to be born in a given country and year (denoted Gap). This was calculated as the absolute value of the minimum of the estimated coefficients on MOB minus the maximum of the estimated coefficients on MOB (Equation 3). This indicator reflected the potential disparity in HAZ between the worst and best months to be born for the participants of each given survey. The larger the estimated gap, the more substantial the presence of seasonality in heights.
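The correction and the Gap indicator are described only in words here. One literal reading of that description is sketched below. The tilde notation for corrected coefficients, the hat on the estimated coefficients, and the cumulative m/12 form of the subtraction are assumptions; the published equations may differ in detail.

```latex
% Hypothetical reading of the linear correction and of the Gap indicator.
% \hat{\alpha}^{s}_{1} = 0 because January is the omitted reference month.
\begin{align*}
\tilde{\alpha}^{s}_{m} &= \hat{\alpha}^{s}_{m}
  - \frac{m}{12}\,\hat{\alpha}^{s}_{12}
  + \frac{1}{2}\,\hat{\alpha}^{s}_{12},
  \qquad m = 1,\dots,12 \\
\mathrm{Gap}^{s} &= \left|\, \min_{m} \tilde{\alpha}^{s}_{m}
  - \max_{m} \tilde{\alpha}^{s}_{m} \,\right|
\end{align*}
```

With this form the adjustment is zero at m = 6. Replacing m/12 with (m − 0.5)/12 would instead place the pivot exactly between June and July, which may be closer to the rotation the text describes.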
The second indicator of seasonality (Equation 4) was the standard deviation of the corrected coefficients on MoB_m (denoted SD). This indicator captured variation in vulnerability within the calendar year. The larger the estimated SD of coefficients on MoB_m, the more substantial the presence of seasonality in HAZ. January was the reference month in estimated coefficients for each of these models, and July 1 was the corrected reference for the HAZ coefficients, so each coefficient could be interpreted as the mean difference in HAZ associated with each MOB, relative to others in their survey with the same age and sex, and relative to siblings with the same mother in regressions with maternal fixed-effects.
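The correction and the 2 seasonality indicators can be illustrated with a minimal Python sketch. It is not the authors' Stata code. The function names are hypothetical, the m/12 form of the rotation is one reading of the verbal description, and the use of the sample SD over all 12 monthly values (including the January reference) is an assumption.

```python
import statistics
from typing import List, Sequence


def linear_correction(coefs: Sequence[float]) -> List[float]:
    """Rotate 12 monthly coefficients (January first, December last).

    Assumed form: subtract m/12 of the December coefficient from month m,
    then add back one-half of the December coefficient.
    """
    if len(coefs) != 12:
        raise ValueError("expected 12 monthly coefficients, January to December")
    december = coefs[11]
    return [
        alpha - (month / 12.0) * december + 0.5 * december
        for month, alpha in enumerate(coefs, start=1)
    ]


def gap(coefs: Sequence[float]) -> float:
    """Absolute difference between the lowest and highest monthly coefficient."""
    return abs(min(coefs) - max(coefs))


def sd(coefs: Sequence[float]) -> float:
    """Sample standard deviation of the monthly coefficients (assumed ddof = 1)."""
    return statistics.stdev(coefs)
```

Applied to a purely artifactual pattern, such as coefficients that rise by a constant step each month and contain no other seasonality, this version of the correction returns a flat profile, so both Gap and SD fall to approximately zero.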
After the 2 seasonality indicators were constructed using the reduced 218-row matrix, the datasets of estimated coefficients were merged by DHS survey year with data on gross domestic product (GDP) per person, measured at purchasing power parity (PPP) prices, from the Penn World Tables (24). Missing values of GDP were linearly interpolated by year when possible. Then, we used OLS to estimate the relations between the 2 measures of seasonality, either Gap or SD, and GDP per person in that country at the time of the survey, as well as the survey year and year squared to capture any global trends over time (Equation 5).
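Equation 5 is likewise not shown in the text. A specification matching the description above could be written as follows. The coefficient symbols γ, the country index c, and the survey-year index t are assumptions introduced for illustration, and the results suggest that variants with GDP only and with the time trend only were also estimated.

```latex
% Hypothetical reconstruction of Equation 5; Y is either Gap or SD for the
% survey conducted in country c in year t.
\begin{equation*}
Y_{ct} = \gamma_0 + \gamma_1 \ln(\mathrm{GDP}_{ct})
  + \gamma_2 \,\mathrm{Year}_{t} + \gamma_3 \,\mathrm{Year}_{t}^{2}
  + \varepsilon_{ct},
\qquad Y \in \{\mathrm{Gap}, \mathrm{SD}\}
\end{equation*}
```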
In addition to the regression analyses, we constructed visualizations of nutrition smoothing across months of birth before and after the adjustment for measurement errors and with respect to the national-level covariates. These visualizations were Epanechnikov-kernel weighted local polynomial smoothing regressions of degree zero. Using nonparametric methods is a flexible way to see differences across groups without having to assign a functional form. Nonparametric regressions estimated the means and CIs for each outcome as continuous functions of the variables on the x-axis: MOB, and the natural logarithm of GDP as an indicator of national-level incomes per person. We did not bring other data to the study for causal inference methods such as instrumental variables, or for replication and validation, because our aim was specifically to correct for the implausible gradient in HAZ over the calendar year that is observed on average globally. With additional data such as an instrumental variable that is correlated with true MOB and unrelated to HAZ, further research could identify causes of seasonality and modifiable factors to smooth nutrition outcomes at specific locations.

Results
Table 1 presents descriptive statistics for DHS by global region for Middle East/North Africa/West Asia/Europe, sub-Saharan Africa, South and Southeast Asia, Latin America and the Caribbean, Central Asia, and for the sample as a whole. The largest number of observations comes from sub-Saharan Africa, and HAZs were lowest in South and Southeast Asia, with a mean HAZ of −1.48 (1.48 SDs below the reference median).
Table 2 presents descriptive statistics for the collapsed matrix of estimated coefficients across the 218 included surveys, and summarizes data on real GDP. The outcome variables are listed in order of how they were constructed, first correcting for measurement error in birthdates, then controlling for mother fixed-effects, and finally for measurement error in birthdates and mother fixed-effects. Across all surveys, there was a mean 0.23 HAZ points (0.17 SD) gap between reported December-born and reported January-born children. Before correcting for measurement error in child birthdates, the mean gap between the worst and best months to be born for HAZ was 0.41 HAZ, and after the linear correction, this mean gap declined to 0.34 HAZ. There were substantial country-level differences in these preadjustment and postadjustment means (Tables 3 and 4). After controlling for mother fixed-effects and clustering errors by community in the original survey-level regressions, the mean gap in HAZ decreased less, from 0.65 HAZ to 0.62 HAZ. Similarly, without controlling for mother fixed-effects, the mean of the SDs declined from 0.13 SD to 0.10 SD after accounting for measurement error in birthdates. After controlling for mother fixed-effects, the mean of the SDs declined only slightly after the correction, from 0.20 SD to 0.19 SD.
Tables 3 and 4 summarize the changes by country in 2 seasonality indicators before and after correcting for measurement error in child months of birth: Gap (Table 3) and SD (Table 4). Data in these tables were sorted by relative differences in each indicator, with the largest changes in seasonality after the linear correction at the top of the table and declining as the table continues. When measured by the gap between the worst and best months to be born, seasonality in child heights declined by between 0.20% and 37.87% in 49 countries. All regions of the world were represented in the 23 countries where seasonality measured by Gap increased, by between 0.17% and 68.78%. Of the 10 countries in the DHS collection with the largest relative decrease in HAZ seasonality after the linear correction, 7 were located in sub-Saharan Africa (Table 3). When measured by the SD of coefficient estimates on months of birth, seasonality in child heights declined by between 0.55% and 45.41% in 50 of 72 countries after correcting for measurement error, and increased by between 0.10% and 67.46% for 22 countries.
Figure 1 demonstrates the effects of the linear adjustment for measurement error in child MOB across the whole sample.
The solid line shows the steady increase in estimated coefficients of HAZ on child MOB across the year, an artifactual relation that was the result of random measurement error in child MOB (19,20). The dotted line shows how these estimated coefficients changed after correcting for the random measurement error, eliminating the implausibly large gap in HAZs between December-born and January-born children. Figure 1 includes all countries with available anthropometric data, and so a near-horizontal relation would be expected because weather and climate across the year vary greatly among the included countries (19,20). Country-level investigations are necessary to ascertain where, if anywhere, seasonality in child heights was still present after the adjustment.
Figure 2 shows how seasonality in HAZ, measured by the gap between worst and best months to be born, was related to GDP per capita during the year of the survey in each country. A wider gap between the dotted and solid lines in these charts indicates more measurement error in MOB in the original DHS data. The solid line was not corrected for measurement error in birthdates, and showed a negative association between seasonality in HAZ and GDP per capita. This negative association was less pronounced but still present after correcting for measurement error in child birthdates, as indicated with the dotted line. The difference between preadjustment and postadjustment for measurement error in birthdates was nonexistent for the highest income countries, perhaps because the original measurement error was not especially pervasive for those surveys. For countries in the low-to-middle income range, the measurement error accounted for about 10% of the gap in HAZ between the worst and best months to be born.
Figure 3 shows a similar pattern when seasonality was measured by the SD of MOB coefficients: measurement error accounted for about 10% of the observed seasonality before the linear correction, but only for countries in the bottom and middle of the income distribution. For higher income countries, the relation between HAZ seasonality and GDP per capita did not differ between preadjustment and postadjustment estimates.
Figures 4 and 5 are examples of country-level changes in the appearance of seasonality in HAZs before and after correcting for measurement error in child MOB. For illustration, we chose 2 countries, Zambia and Bangladesh, located in 2 different regions of the world, which had very different appearances of seasonality in child heights after the linear adjustment for measurement error in child MOB. First, directly comparing Figure 4 with Figure 5, seasonality in HAZs was still present in Bangladesh after correcting for measurement error in child MOB, but not in Zambia. In Zambia, any seasonality in HAZs was erased by the linear adjustment for measurement error in child MOB. Before correcting for the artifactual relation between HAZ and MOB, there was a gap of about 0.25 HAZ between December-born and January-born children in Zambia. After the adjustment, no gap was apparent between December- and January-reported births. In Bangladesh, there was still a gap of about 0.05 HAZ between the best month to be born (August) and the worst month to be born (March), even after correcting for measurement error in the MOBs. The comparison between Bangladesh and Zambia indicates that misreporting in MOB was more prevalent in Zambia than Bangladesh, perhaps reflecting the difference in birth registration systems between the countries.
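The smoothed lines in these figures are described in Methods as Epanechnikov-kernel weighted local polynomial regressions of degree zero, which amounts to a kernel-weighted local mean. A minimal Python sketch of that estimator follows. It is illustrative only and is not the authors' Stata code: the function names and the bandwidth argument are hypothetical, and the CIs shown in the figures are not computed here.

```python
import math
from typing import List, Sequence


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, and 0 otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_constant_smooth(
    x: Sequence[float],
    y: Sequence[float],
    grid: Sequence[float],
    bandwidth: float,
) -> List[float]:
    """Degree-zero local polynomial (kernel-weighted mean) of y at each grid point."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    fitted = []
    for point in grid:
        weights = [epanechnikov((xi - point) / bandwidth) for xi in x]
        total = sum(weights)
        if total == 0.0:
            # No observations fall within one bandwidth of this grid point.
            fitted.append(math.nan)
        else:
            fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return fitted
```

In this setting x would be the natural logarithm of GDP per person (or MOB) and y a seasonality indicator such as Gap or SD. A wider bandwidth averages over more surveys and gives a smoother curve, at the cost of hiding local features.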
The associations between seasonality in child HAZ and national income are presented in Table 5. For illustration, an example of these regressions and the calculations for the Gap and SD seasonality indicators is presented in Table 6 for Zambia, which had 5 Standard DHS surveys included in the full collection, implemented between 1992 and 2013. All models in Table 5 were estimated using the HAZ coefficients, corrected to account for measurement error in child birthdates, and the original regressions from which the coefficients came were estimated using mother fixed-effects. Seasonality in HAZs as measured by both indicators, Gap and SD, was negatively associated with GDP per capita. Given that the GDP covariate was log-transformed, the coefficients can be interpreted as semielasticities of HAZ seasonality with respect to GDP. Thus, a 1% increase in GDP at 2011 PPP prices was associated with a 0.065 reduction in the HAZ gap between the worst and best months to be born, and a 0.019 reduction in the SD across all estimated coefficients on months of birth. These estimated associations are meaningful because GDP typically grows over time and because seasonality indicators are measured at the population level. Although 0.065 HAZ points might not be clinically significant to an individual child, shifts in the seasonal distribution of HAZ of that magnitude are substantial. Seasonality in HAZ also declined over time, as indicated by the estimated coefficients on the time trend in columns 3 and 4 of Table 5. Each additional year reduced the HAZ gap between the worst and best months to be born by 0.01 HAZ points on average, and reduced the SD across all estimated coefficients on months of birth by 0.003 on average. Income and time are collinear due to economic development during this period, and the closest correlation between seasonality in HAZ and income is shown in the models in columns 5 and 6.
In summary, seasonality in HAZ decreased slowly over time and has a small negative association with GDP, after correcting all estimates for measurement error in MOB. Low R² values for the models in Table 5 are likely due to the relatively coarse measurement of population well-being in the GDP indicator. Several other household- and individual-specific factors also affect seasonality in HAZ, such as care practices, food intake, disease status, and livelihoods. By design, the models presented in Table 5 were not intended to account for most of the seasonality in HAZ, only to assess the associations between GDP and seasonality in HAZ. We would not expect GDP and time to be the sole determinants of seasonality in HAZ at the national level, but data limitations and the potential for measurement error precluded the use of other possibly relevant variables.

Discussion
Even after accounting for random measurement error of birthdates that leads to spurious patterns in child heights throughout the year, seasonality in HAZ by MOB was still present in many of the poorest countries. This indicates that season of birth is still a determinant of linear growth in many but not all contexts, threatening long-term human capital development. Many of the countries with remaining nonartifactual seasonality in HAZ are in sub-Saharan Africa. In 9 countries (Côte d'Ivoire, Comoros, Ghana, Moldova, Kazakhstan, Togo, Thailand, Uzbekistan, and Albania), the remaining nonartifactual gap in HAZ between the worst and best months to be born was still >1.0 HAZ. Country-specific seasonal patterns can be helpful for interpreting these results, for example, in Bangladesh. The main lean season in Bangladesh occurs during October-November (13,25). Given the patterns seen in HAZ by MOB in Bangladesh after correcting for measurement error in child MOB, it appears that being born during the rice harvest season in February-March is worse for future height attainment than being born 2 months before the lean season in October-November. Therefore, having the complementary feeding stage begin in August, right before the lean season, is worse for subsequent linear growth compared with being born just before the lean season when newborn infants would be protected from food shortages by breastfeeding, and then be able to start their complementary feeding stage as the harvest season begins. Further location-specific analyses are important to understand specific issues related to birth registration systems and other constraints on national health survey accuracy.
Table 4 footnote: Data shown are means of the SD indicator of seasonality by included country (see text), before and after correcting for measurement error in child MOB. MOB, month of birth.
The negative association between overall seasonality in child heights (after correcting for misreported MOB) and national-level incomes reflects height as a cumulative, intergenerational indicator of well-being. The negative association between seasonality in heights and GDP also speaks to the broad range of policies and conditions needed to promote resilience and protect families from adverse conditions throughout the calendar year. Further work is necessary to understand the determinants of nutrition smoothing in specific contexts, including the strategic use of longitudinal data, the concurrent measurement of agriculture, nutrition, and health variables, and incorporating more nutritional information on older children, adolescents, and adults into national-level surveys.

Study limitations
There were 4 main limitations of this study. First, we analyzed matrices of regression results from 218 individual surveys, merged with other data at country- and year-levels. Whereas the geographic and temporal coverage was substantial, the countries and years for which data were available depended on where surveys could be implemented, and did not include many of the world's most vulnerable populations. Second, additional subnational analyses would be valuable, especially because climatic and agricultural risks vary widely within countries. A third limitation was that not all components of the original regression results were used for analysis, namely, the SEs of estimated coefficients on MOB. Instead, we focused on the estimated MOB coefficients themselves. In future work, we would aim to incorporate additional information relating to hypothesis testing, such as the SEs or CIs of estimated MOB coefficients, to gain a deeper understanding of nutrition smoothing and its variability.
Fourth, we assumed that the errors in mismeasurement of birthdates were random and equally distributed across the year. A more specific approach to dealing with this measurement error might be possible, such as by analyzing recorded birthdates by survey enumerator or calculating an individual child's risk of having a mismeasured birthdate based on observable factors.
Table 5 footnote: Coefficients are OLS estimates of the associations between each given variable and an indicator of seasonality: the absolute value of the gap between the worst and best months to be born (Gap) and the SD of estimated coefficients on months of birth (SD), after correcting for measurement error in month of birth. Covariates are measured at national and annual levels. Original models were estimated by survey for 218 surveys using OLS for HAZ as a function of age, age², sex, and mother fixed-effects. **P < 0.01, ***P < 0.001. GDP, Real 2011 Gross Domestic Product at Purchasing Power Parity; HAZ, height-for-age z-score; OLS, ordinary least squares.

Future research on nutrition smoothing and resilience
Several important questions remain about seasonality in child HAZ and nutrition smoothing. Estimating the amount of stunting that could be eliminated by the smoothing of HAZ outcomes throughout the year could be useful, as well as examining what economic, environmental, and social factors facilitate nutrition smoothing at the national and subnational levels. For example, public health infrastructure and market access might allow families to overcome seasonal environmental risks to their children's health (26). Building on work on gender bias in the intrahousehold allocation of foods, researchers could estimate differences between boys and girls in the smoothing of their HAZ outcomes throughout the year.
Some research questions about nutrition smoothing can be answered using existing data and literature or by developing merged databases that combine different types of data, whereas others could require specialized data collection. For example, including more nutritional outcome data on older children and adolescents would be valuable, especially given that their anthropometric measurements would be less sensitive to artifactual measurement error in MOB. Existing data, information, and knowledge are also valuable. For example, systematic reviews of existing literature on early-life shocks within local contexts would be valuable for synthesizing what is already known about nutrition smoothing in particular places. Making it easier for researchers, especially those in low- and middle-income countries, to study the effects of early-life shocks in their own communities would be productive (27). Investigating the mechanisms for how early-life shocks affect later health is becoming more feasible due to advances in measurement and in database management for climate and nutrition variables. For example, remote-sensing data have become particularly valuable for obtaining objective information about climatic conditions at particular times (28,29), or for assessing the risk or severity of famine or drought. Using remote-sensing data does not come without challenges. Remotely sensed climate databases are subject to various biases depending on the particular data-generating processes, but judicious care and various strategies can assess database quality for particular research questions, or address shortcomings during analysis.
Another relatively recent advance of interest to researchers who primarily use publicly available data is the Living Standards Measurement Study-Integrated Surveys on Agriculture, which concurrently measure agricultural and health microdata in nationally representative panels in close collaboration with national government ministries (30). The Demographic and Health Surveys, which collect data in nationally representative repeat cross-sections about every 5 y in low- and middle-income countries, are now including spatial covariate datasets with their geocoded microdata (22). Improving survey enumeration and birth registration efforts to increase the quality of data on MOB would make true seasonal patterns more easily apparent. With the use of these publicly available datasets, there are opportunities for researchers to investigate nutrition smoothing and its local determinants. Making it easier to merge national surveys or censuses with environmental or climate data would also be useful.
There is still no unified understanding of the consequences of seasonal risks to child nutrition. The child health effects of climate and other outside factors are often substantial, and not often homogeneous within countries or across different subgroups of the population in question. Research studies in this area do not often investigate mechanisms directly, largely due to challenges with needed data. Instead, the focus has generally been on measuring the effects on nutrition of early-life shocks or seasonal cycles within specific contexts. These are useful exercises, especially given the heterogeneity in effects described above, and future work should attempt to go deeper in examining mechanisms and biological and social pathways.