Introduction

According to the “Out of Africa” or “African Replacement” hypothesis, every non-African human being is descended from a founder group of Homo sapiens living in Africa, a small subset of whom at some point in history dispersed into the wider world and displaced earlier human species resident there (reported by Stringer and McKie 1996). This explains the greater genetic diversity found in sub-Saharan-African populations in comparison to other ethnicities (Taillon-Miller et al. 2000).

One example of a heritable region found to be more genetically diverse in sub-Saharan-Africans is the insulin (INS) variable number of tandem repeats (VNTR), an 11p15.5 chromosomal locus at which particular 14–15 base pair repeats are present in variable numbers in different individuals within a population (Stead and Jeffreys 2002; Stead et al. 2003). In African populations the INS VNTR has been divided into 22 lineages, whereas in non-African populations there are only three lineages (grouped as class I, and classes IIIA and IIIB, according to the number of repeats in the VNTR). Deviation from Mendelian transmission ratios has been observed with these two major classes (classes I and III; Eaves et al. 1999), implicating this genetic region as conferring a survival advantage. This may operate through associations with size at birth (i.e. birth weight, birth length or head circumference at birth) since: (1) historically low birth weight (LBW) has been the major determinant of neonatal mortality (McCormick 1985), (2) genetic diversity that alters size at birth may therefore have influenced survival in the past (Dunger et al. 2007), and (3) in non-African populations the INS VNTR has been variably associated with size at birth and post-natal weight gain (Dunger et al. 1998; Ong et al. 2004; Mitchell et al. 2004; Bennett et al. 2004; Lindsay et al. 2003; Heude et al. 2006; Vu-Hong et al. 2006; Landmann et al. 2006; Osada et al. 2007; Mook-Kanamori et al. 2007). Inconsistent reports in different populations may reflect genetic heterogeneity and selection of cases. Opposite associations have been observed in different populations (Lindsay et al. 2003), which could reflect linkage with adjacent polymorphisms in imprinted cluster genes on chromosome 11 that have been variably associated with weight gain (Gu et al. 2002; Petry et al. 2005; Paquette et al. 1998).
Examples of these include insulin-like growth factor 2 (IGF2), whose protein product IGF-II is very important in foetal growth, and is both exclusively paternally expressed in foetal life and located adjacent to the INS gene, and H19, an IGF2 repressor which is exclusively maternally expressed during foetal life (Lustig et al. 1994).

Such genetic factors may interact with effects associated with other established factors related to size at birth and early weight gain, including pre-pregnancy maternal weight and maternal weight gain during pregnancy, gestational age, gender and parity (Ong et al. 2002). Whilst in developed countries maternal macronutrient intake during pregnancy may (Moore et al. 2004) or may not (Mathews et al. 1999) contribute to size at birth, in less developed populations such relationships are better established. An example of this has been observed in The Gambia, where severe seasonal variation in nutrition leads to a drop in birth weight of around 250 g and an increased prevalence of babies born with a LBW (Ceesay et al. 1997). In such populations with varying amounts and qualities of nutrition, there are likely to have been intense genetic pressures historically to boost foetal and early growth and therefore chances of surviving the critical neonatal period. In light of this we chose to study this Mandinkan African population from The Gambia who still experience extreme environmentally related nutritional pressure which affects size at birth, post-natal weight gain and long term survival (Ceesay et al. 1997). As well as having the nutritional pressures on their foetal and early growth, they also have the increased genetic diversity (in comparison to non-African populations) of the African INS VNTR. We sought to test the hypothesis that genetic variation around the INS VNTR in this population is associated with size at birth and early growth.

Materials and methods

Population

This was a retrospective cohort study of all live births in three subsistence-farming villages of the West Kiang District in The Gambia from 1976 to the present. There is a seasonal agricultural system that revolves around an annual rainy season, which occurs from July to November. Regular ante- and post-natal care has been provided by resident midwives to all women of childbearing age since the mid-1970s. Birth weights have been recorded from three villages (Keneba, Manduar and Kantong Kunda) since 1976. They were recorded to the nearest 10 g by the trained midwives with a Salter spring balance and tared sling (Salter Industrial Measurements Ltd., West Bromwich, UK), which was calibrated regularly. LBW was defined as weight <2,500 g. Measurement of gestational age was introduced in 1978; it was assessed by medical doctors within 5 days of delivery using the physical and neurological scoring system validated by Dubowitz et al. (1970). Prematurity was defined as a gestational age <37 completed weeks. Small for gestational age (SGA) was defined as a birth weight below the tenth percentile for gestational age based on the reference standard by Williams et al. (1982). Thick and thin blood smears for malaria routinely taken from women during 43,385 antenatal and well-women clinic visits were positive for malaria on 3,276 occasions between 1978 and 2003. For a more detailed description of the population and its surrounds see Rayco-Solon et al. (2004).
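The two threshold definitions above (LBW and prematurity) can be written as a short sketch. The function name and record layout are hypothetical and not taken from the study database; SGA is left out because it depends on the Williams et al. (1982) reference percentiles, which are not reproduced here.

```python
LBW_THRESHOLD_G = 2500          # LBW: birth weight < 2,500 g
PRETERM_THRESHOLD_WEEKS = 37    # prematurity: < 37 completed weeks


def classify_birth(weight_g, gestational_age_weeks=None):
    """Return LBW and preterm flags for one birth record (hypothetical layout)."""
    flags = {"lbw": weight_g < LBW_THRESHOLD_G}
    # Gestational age was only recorded from 1978 onwards, so it may be missing.
    if gestational_age_weeks is None:
        flags["preterm"] = None
    else:
        flags["preterm"] = gestational_age_weeks < PRETERM_THRESHOLD_WEEKS
    return flags
```

Both thresholds are strict inequalities, so a baby weighing exactly 2,500 g is not classed as LBW.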

There were two previous nutritional supplementation trials, which included at least one of the three villages (Ceesay et al. 1997; Prentice et al. 1987). The first-order effects of supplementation (fitted with the use of a single indicator variable) on preterm births and on SGA were not significant. The seasonality of neither preterm births nor SGA was significantly modified by supplementation when season was modelled by a simple binary variable indicating the hungry season (July to November). Thus, we ignored supplementation in the rest of the analysis.

The recording of births and deaths in these villages predated the formation of the Joint Gambian Government/Medical Research Council Gambia Ethics Committee. Approval for the continuation of the demographic surveillance was granted when the committee was formed in 1981. Approval for the genetic component of the study was granted in 2003. Informed consent was gained from either each study participant or their parents, as appropriate.

DNA samples

DNA from this population was extracted from blood, buffy coats or peripheral blood mononuclear cells using standard methods as part of the establishment of the Gambian DNA biobank (Sirugo et al. 2004). INS single nucleotide polymorphism (SNP) genotyping was successfully performed on 2,613–2,652 individuals, with between 124 and 553 informative DNA trios (i.e. mother, father and offspring) depending on the genotype distribution of the SNP in question and the availability of recorded birth phenotype (i.e. length, weight and head circumference). For each of the SNPs between 97.9 and 99.4% of the samples were successfully genotyped. Repeat genotyping gave 99.0–99.7% concordance rates.

Genotyping strategy

To genotype all the SNPs close to the INS VNTR would have involved genotyping 57 SNPs (Stead et al. 2003) and given our population size, would have meant that some INS haplotypes were present in less than 10 people. Hence we developed a strategy whereby genotyping a reduced number of SNPs would give us closely related haplotype groups (after reconstructing phases) containing individuals with closely related haplotypes (Stead and Jeffreys 2002). Using the published African haplotype prevalences (Stead et al. 2003) and by assuming that a DNA sample’s assigned haplotype was correct if it contained the INS VNTR haplotype that was most prevalent within that group, it was found that 85.3% of samples could have their haplotypes correctly assigned by genotyping just 5 SNPs (SNPs 24, 27, 28, 34, 38) and 1 indel (‘SNP’ 69) [from Stead et al. (2003); Fig. 1]. This strategy was therefore chosen to optimise the balance between the amount of genotyping ‘effort’ (cost, amount of DNA to be used, etc.) that needed to be made and useful genetic information gained. A subset of 100 randomly chosen DNA samples were then genotyped for these SNPs to validate them in our population. SNP allele population prevalences did not differ from those previously published in different African populations (Supplementary Table SD1 and Stead and Jeffreys 2002) so we then proceeded to use this genotyping strategy across the whole population.
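The assignment-accuracy calculation described above can be sketched as follows. This is an illustration under stated assumptions: the haplotype names, frequencies and group labels are hypothetical placeholders, not the published prevalences from Stead et al. (2003).

```python
from collections import defaultdict


def fraction_correctly_assigned(haplotype_freq, haplotype_group):
    """Expected fraction of samples carrying the most prevalent haplotype
    of the group that the reduced SNP panel places them in."""
    best_in_group = defaultdict(float)
    for hap, freq in haplotype_freq.items():
        group = haplotype_group[hap]
        best_in_group[group] = max(best_in_group[group], freq)
    return sum(best_in_group.values()) / sum(haplotype_freq.values())


# Hypothetical example: four full haplotypes collapsing into two groups.
example_freq = {"h1": 0.40, "h2": 0.10, "h3": 0.30, "h4": 0.20}
example_group = {"h1": "A", "h2": "A", "h3": "B", "h4": "B"}
example_accuracy = fraction_correctly_assigned(example_freq, example_group)
```

A panel with more SNPs splits the haplotypes into more groups and so raises this fraction, at the cost of more genotyping; that is the trade-off described above.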

Genotyping of INS VNTR polymorphisms

SNPs 24 (rs3842738), 27 (rs689), 28 (rs5506), 38 (rs3842752) and 69 (rs3842740) were genotyped using the 5′-nuclease technique and standard protocols (Taqman, Assays by Design; Applied Biosystems, Warrington, Cheshire, UK). Primer and probe sequences are shown in Table 1. PCR reactions contained 4 ng genomic DNA, 1× TaqMan universal PCR master mix, forward and reverse primers (both 72 nM), and VIC- and FAM-labelled probes (both 16 nM) in a total reaction volume of 5 μl. The reaction mixes were incubated at 50°C for 2 min and 95°C for 10 min, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. Completed PCRs were read on an ABI Prism 3100 Genetic Analyzer (Applied Biosystems) and analysed using their allelic discrimination sequence detection software.

Table 1 Primer and probe sequences used in the 5′-nuclease assays and PCR-RFLP assay (for SNP 34) used to genotype the INS SNPs

SNP 34 (rs3842748) was genotyped using PCR amplification and restriction fragment length polymorphism analyses. Oligonucleotide primer sequences are shown in Table 1. A measure of 10 ng of DNA was amplified along with 1× Bioline NH4 buffer, 200 μM each dNTP, 1 mM magnesium, 6 pmol each oligonucleotide primer and 0.5 units Biotaq DNA polymerase (Bioline, London, UK) in a total reaction volume of 10 μl. The mix was incubated for 5 min at 94°C, followed by 19 cycles of 94°C for 45 s, 59°C for 45 s (dropping 0.5°C per cycle) and 72°C for 45 s. After this the mix underwent 15 cycles of 94°C for 45 s, 49°C for 45 s and 72°C for 45 s, followed by a final incubation at 72°C for 10 min. The PCR mix was then incubated with 1 unit DraIII (New England Biolabs, Hitchin, Herts., UK) at 37°C for 16 h. This produced a 223 bp band for the C allele and 118 and 105 bp bands corresponding to the G allele when the sample was separated by agarose gel electrophoresis.
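The touchdown annealing schedule and the band-to-genotype reading described above can be sketched as below. Two assumptions are made for illustration: the first of the 19 touchdown cycles is taken to anneal at 59°C with each later cycle 0.5°C lower, and band sizes are treated as exact integers, which real gel readings are not. The function names are hypothetical.

```python
def touchdown_annealing_temps(start_c=59.0, step_c=0.5, touchdown_cycles=19,
                              plateau_c=49.0, plateau_cycles=15):
    """Annealing temperature (degrees C) for every cycle of the PCR."""
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    return touchdown + [plateau_c] * plateau_cycles


def call_snp34(bands_bp):
    """Idealised genotype call from the set of band sizes seen on the gel."""
    bands = set(bands_bp)
    has_c = 223 in bands              # uncut product: C allele
    has_g = {118, 105} <= bands       # DraIII-cut product: G allele
    if has_c and has_g:
        return "CG"
    if has_c:
        return "CC"
    if has_g:
        return "GG"
    return None                       # no interpretable bands


temps = touchdown_annealing_temps()
```

Under these assumptions the last touchdown cycle anneals at 50°C, one degree above the 49°C plateau, and the two digested fragments sum to the 223 bp uncut product (118 + 105).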

Statistical analyses

We examined separately the association of each SNP with four continuous outcomes: weight, length and head circumference at birth and weight gain in the first 2 years of life. Whilst straightforward least squares regression was used for the birth outcomes, weight gain analysis was performed in three stages. First weight-for-age z scores were calculated for each observation using age- and sex-specific means and standard deviations calculated from the Keneba child clinic data. Next the slope and its standard error for the regression line of z score on age was calculated for each child. Finally these slopes were used as the outcome variable in a least squares regression on genotype. Only children who had at least 5 observations before 2 years of age and at least one in both the first and second years of life, and those for whom standard error for the regression slope of z score on age was less than 0.4, were included in this analysis.
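The three-stage weight gain analysis can be sketched in plain Python as follows. This is a minimal illustration rather than the Stata code used in the study: the reference means and standard deviations are passed in by the caller, and measuring age in years is an assumption (the 0.4 standard-error cut-off is unit dependent).

```python
import math


def weight_for_age_z(weight_kg, ref_mean_kg, ref_sd_kg):
    """Stage 1: z score against an age- and sex-specific reference."""
    return (weight_kg - ref_mean_kg) / ref_sd_kg


def slope_and_se(ages, z_scores):
    """Stage 2: least squares slope of z score on age, with its standard error.
    Requires at least three observations at more than one distinct age."""
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(z_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, z_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(ages, z_scores))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def eligible(ages_years, slope_se, min_obs=5, max_se=0.4):
    """Inclusion rule: at least 5 observations before 2 years of age, at least
    one in each of the first and second years, and slope SE below 0.4."""
    under_two = [a for a in ages_years if a < 2]
    return (len(under_two) >= min_obs
            and any(a < 1 for a in under_two)
            and any(1 <= a < 2 for a in under_two)
            and slope_se < max_se)
```

The per-child slopes from stage 2 then serve as the outcome in the final regression on genotype (stage 3).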

Each SNP was fitted in regression analysis as a linear term: the effect of each copy of the minor allele was assumed to be additive. Both the maternal and infant genotypes were considered. We also considered parent-of-origin effects for the infant’s genotype. Where these could be determined, we additionally fitted models with separate terms for alleles derived from mother and father. In order to test for a parent-of-origin effect these last models were reparameterised in terms of an interaction term for the difference between the effects of an allele according to whether it was inherited from the father or the mother. In order to examine the original class I/class III axis found to be predictive of birth size in non-African populations we constructed a variable computed from the difference between the number of copies of classes I and III haplotypes (defined by SNP 27, and SNPs 34 and 38, respectively; see Fig. 1). This variable therefore took values between −2 and 2; individuals without either of these haplotypes were uninformative and omitted from this analysis.
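The genotype codings described above can be sketched as follows. The function names are hypothetical, and the total/difference split shown is one common way to reparameterise separate maternal and paternal terms; it is an assumption that the study used this exact form.

```python
def additive_code(genotype, minor_allele):
    """Additive coding: number of copies (0, 1 or 2) of the minor allele."""
    return sum(1 for allele in genotype if allele == minor_allele)


def parent_of_origin_terms(maternal_allele, paternal_allele, minor_allele):
    """Reparameterise maternal (m) and paternal (p) indicators as a total and
    a difference. Because b_m*m + b_p*p equals
    ((b_m + b_p)/2)*(m + p) + ((b_p - b_m)/2)*(p - m),
    the coefficient on 'difference' tests for a parent-of-origin effect."""
    m = int(maternal_allele == minor_allele)
    p = int(paternal_allele == minor_allele)
    return {"total": m + p, "difference": p - m}


def class_score(n_class_i, n_class_iii):
    """Class I minus class III haplotype copies (range -2 to 2).
    Returns None for uninformative individuals carrying neither class."""
    if n_class_i + n_class_iii == 0:
        return None
    return n_class_i - n_class_iii
```

For example, an infant carrying one class I and one class III haplotype scores 0, whereas an infant carrying neither is dropped from the class analysis rather than scored 0.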

Fig. 1

Median joining network showing genetically related INS haplotypic groups in Mandinkan people from The Gambia, established by genotyping SNPs 24 (rs3842738), 27 (rs689), 28 (rs5506), 34 (rs3842748) and 38 (rs3842752), and indel 69 (rs3842740). SNP numbers and diagram adapted from Stead et al. (2003). Reproduced with permission from Cold Spring Harbor Laboratory Press © 2009, Stead et al. (2003)

The main analyses of SNPs presented here fitted a general linear model of association, with random effects to allow for intra-family correlation using the generalised least squares method (StataCorp LP 2007). Secondary analyses for comparison purposes were fitted using a quantitative transmission disequilibrium test (qTDT) model (Allison 1997), regressing outcome on genotype whilst controlling for mating type (AA × Aa, Aa × Aa or aa × Aa; other mating types are uninformative and hence excluded from this analysis). An extension of this method (George et al. 1999) was used to incorporate parent-of-origin effects. In order to allow for the inclusion of several children from each family, we again fitted a “between-family” random effect, as suggested by Allison (1997). Polygamy is very common in this society (children in this study had on average 2.4 full siblings and 2.6 half-siblings, 2.5 of whom have a different mother), complicating the definition of nuclear families. Since children sharing the same mother are more closely related than those sharing the same father (due to exposure to both genes transmitted from the mother and the maternal intrauterine environment), for the purposes of this analysis we defined family members as children sharing the same mother. (We are aware that this is clearly imperfect since there will also be some dependence between other relatives. However, we are reassured that these could be ignored since even allowing for siblings did not profoundly affect conclusions.) Each analysis was repeated with and without the following covariates: sex of the child, village where the birth took place, year of birth (linear trend), season of birth (binary: hungry versus harvest, i.e. first half of the year vs. second) and parity (binary according to whether the infant was the mother’s first delivery). qTDT analysis was not appropriate for the class I/III variable so only the association models with random effects were fitted.
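The mating-type filter used by the qTDT can be illustrated with a short sketch. It assumes genotypes are stored as two-character strings such as "AA", "Aa" or "aa"; the function names are hypothetical. A parental pair is informative when at least one parent is heterozygous, which yields exactly the three mating types listed above.

```python
def normalise(genotype):
    """Order the two alleles consistently, so that 'aA' becomes 'Aa'."""
    return "".join(sorted(genotype))


def mating_type(parent1, parent2):
    """Label a parental pair irrespective of which parent is listed first."""
    return " x ".join(sorted([normalise(parent1), normalise(parent2)]))


def is_informative(parent1, parent2):
    """True when at least one parent is heterozygous."""
    return any(len(set(genotype)) == 2 for genotype in (parent1, parent2))
```

Pairs such as AA × aa or AA × AA transmit the same alleles to every child, so they carry no within-family information on transmission and are excluded.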
All statistical analysis was performed using Stata version 10 (StataCorp LP 2007). For the association analyses, based on genotyping 2,700 samples under an additive model and using the methods of Lange et al. (2002, 2004), the study had statistical power of 90.2–99.9% (α = 0.05–0.001) to detect a polymorphism explaining 1% of the variance in birth weight. To allow for the potential effects of multiple testing, a Bonferroni correction was applied such that P < 0.008 was considered statistically significant throughout.
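The significance threshold can be checked with a line of arithmetic. The sketch below assumes the correction divides a nominal α of 0.05 by the six polymorphisms genotyped (five SNPs and one indel); the text does not state the divisor explicitly, so this is an inference.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumption: one test family per genotyped polymorphism (5 SNPs + 1 indel).
threshold = bonferroni_threshold(0.05, 6)   # approximately 0.0083
```

Truncated to three decimal places this gives the P < 0.008 cut-off used throughout.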

Results

General population characteristics

Table 2 shows the birth characteristics from the three villages from which our study population was drawn. Table 3 shows the genotype frequencies for the INS genotypes from our population, all of which were consistent with Hardy Weinberg equilibrium. Figure SD1 in the supplementary data section shows the linkage disequilibrium between the 5 INS SNPs and 1 indel that were genotyped in this population.
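A Hardy Weinberg check of the kind reported in Table 3 can be sketched with a one-degree-of-freedom chi-square test. The text does not say which test was used, so this is an illustrative choice rather than the authors' method, and the counts in the example are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of genotype counts against Hardy Weinberg
    proportions. Both alleles must be present, otherwise an expected count
    is zero and the statistic is undefined."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts that sit exactly at Hardy Weinberg proportions.
chi2_example, p_example = hwe_chi_square(25, 50, 25)
```

A large P value, as in this example, indicates no evidence of departure from equilibrium; for rare alleles with small expected counts an exact test would be preferable.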

Table 2 Summary of the birth characteristics from each of the three villages in the West Kiang District in The Gambia from which the study population was drawn
Table 3 Numbers (and %) of individuals split by INS genotypes in the West Kiang District Gambian population. All the genotypes were consistent with Hardy Weinberg equilibrium

Simple associations of genotypes and allele transmission with birth and early growth characteristics

Table 4 shows the effect sizes of the associations between the various INS VNTR SNPs and the birth and early growth phenotypes. The supplementary data Table SD2 shows the effect sizes of the linkage and association of the transmission of these SNP alleles with phenotype using qTDT, when trying to confirm these associations. None of the associations were significantly modified by adjusting for the season of birth.

Table 4 Association (modelled with random effects and adjusted for sex of the child, village where the birth took place, mother’s parity and season and year of birth) between birth and early growth characteristics and INS (a) SNP 24, (b) SNP 27, (c) SNP 28, (d) SNP 34, (e) SNP 38 and (f) SNP 69. Results of the statistical analyses are presented as effect size (β coefficient) and 95% confidence interval

SNP 24 was not associated with any of the foetal and early growth phenotypes. However, SNP 27 showed a significant association between birth length and the infant’s allele transmitted from the mother, controlled for maternal genotype (P = 0.004; by qTDT P = 0.015).

Significant associations shown by SNP 28 were only evident with post-natal growth, where maternal transmission of the T allele (controlled for the maternal genotype) tended to be associated with greater post-natal weight gain than when the C allele was transmitted (P = 0.006; by qTDT P = 0.009). There appeared to be parent-of-origin effects on post-natal weight gain, with only maternal transmission showing the above association (P = 0.006; by qTDT P = 0.022). SNPs 34, 38 and 69 failed to show any significant associations with foetal and early growth phenotypes.

Comparisons between INS VNTR classes I and III subjects as found in non-sub-Saharan-African populations

There was no significant difference between INS VNTR classes I and III infants or mothers in birth weight, head circumference at birth or length at birth (all P > 0.05). 95% confidence intervals for the infant class I versus class III effect were: birth weight (−22, 38 g), head circumference (−0.7, 1.5 mm) and birth length (−1.2, 4.7 mm).

Discussion

This study is the first that we are aware of finding significant associations in an in situ African population combining polymorphisms in a more genetically diverse sub-Saharan African gene, in this case INS and its VNTR, with markers of foetal and early growth. Whilst associations have been variably observed with SNPs that flank the INS VNTR in non-African populations, we reasoned that studying an almost exclusively Mandinkan population living in The Gambia would give us an opportunity to search for enhanced associations with purely African polymorphisms. Recently some of the SNPs and phenotypes that we investigated were studied in African Americans (Adkins et al. 2008), but this population would have been unlike the one we studied due to both the different environmental exposures (nutritional, medical and otherwise) and likely genetic admixture (Parra et al. 1998).

We studied 2 SNPs and 1 indel that were African-specific. All three polymorphisms may have undergone recurrent mutation (or gene conversion) in the past (Stead and Jeffreys 2002), so a given allele at each of these SNPs might be present on very different ancestral lineages. Associations with these polymorphisms therefore have to be interpreted with caution. We also studied a further 3 SNPs that were not African-specific. Of these six polymorphisms we could find no associations between markers of foetal and early growth and SNPs 24, 34, 38 and 69, despite there being evidence that SNP 69 can activate a downstream cryptic 5′ splice site to extend the INS pre-mRNA 5′ leader (Královičová et al. 2006), and that SNP 34 in the offspring and SNP 24 in the mother may be associated with risk of being SGA in African Americans (Adkins et al. 2008).

The infant’s maternally transmitted (African-specific) SNP 28 allele was associated with post-natal weight gain, an important factor for neonatal survival (Victora et al. 2001). The association was adjusted for the maternal genotype suggesting that it was not caused by maternal genotype effects on the intrauterine environment. The INS gene is not considered one of the classically imprinted genes, unlike nearby H19 and IGF2, common polymorphisms in which have been associated with foetal growth in non-African populations (Petry et al. 2005; Kaku et al. 2007). There is evidence that the imprinting of INS is developmentally regulated, however, such that only the paternal allele is expressed in the yolk sac (Moore et al. 2001). Our parent-of-origin effect obviously reflects an association with the maternally transmitted allele, however, so the mechanism is unclear although not unique for this gene (Adkins et al. 2008). This may reflect linkage disequilibrium with SNPs in nearby paternally imprinted genes such as H19, polymorphic variation in which was associated with early weight gain in a British population (Petry et al. 2005).

SNP 27 has been extensively studied in relation to early growth due to its almost perfect linkage disequilibrium with the two major INS VNTR classes found in non-African populations (Barratt et al. 2004). In these populations some studies have shown associations with size at birth or early growth (Dunger et al. 1998; Ong et al. 2004; Lindsay et al. 2003; Heude et al. 2006; Osada et al. 2007) and some have not (Mitchell et al. 2004; Bennett et al. 2004; Vu-Hong et al. 2006; Landmann et al. 2006; Mook-Kanamori et al. 2007). This possibly reflects differences in linkage disequilibrium with a putative functional polymorphism between these different populations, although SNP 27 itself might be functional (Královičová et al. 2006). In the current study we could find no association between these phenotypes and the classical VNTR classes, or between SNP 27 and either birth weight or head circumference. Where we did find a significant association was between SNP 27 and birth length. This confirms one of our original findings in Caucasians, although the association was in the opposite direction (Dunger et al. 1998), as has previously been observed for the association with birth weight in Pima Indians (Lindsay et al. 2003). People in the current study with the ‘T’ allele were shorter at birth but of similar weight, suggesting that they may have had more body fat. Birth weight associations may therefore not have been evident partially due to poor maternal nutrition during pregnancy.

This is the first time, to the best of our knowledge, where a gene previously found to be in association with a particular quantitative trait that is related to neonatal mortality in non-African populations has been studied in an attempt to find further associations between the more polymorphic African gene and phenotype from an environment with higher neonatal mortalities. Where we found significant associations, confirmation in similar populations will be necessary if a suitable population can be found. To interpret our data we applied a Bonferroni correction to account for multiple testing, although this is controversial for correlated data (Rothman 1990). The statistical power for this study, calculated a priori using an effect size based on the one that we originally observed for the INS VNTR in a contemporary British population (Dunger et al. 1998), may not have been realised in practice given that other studies have failed to replicate such an effect size (e.g. Bennett et al. 2004; Mitchell et al. 2004). Nevertheless, given the tight confidence intervals that were gained, suggesting precise group mean estimates, we can be confident that we did not miss any associations with large or moderate effect sizes.

In conclusion, this study suggests that in the tribally and socially pure Gambian population that we studied there are associations between SNPs in the INS region (one of which is African-specific) and foetal and early growth characteristics, which contribute to the overall polygenic associations with size at birth (Dunger et al. 2007). Despite D′ values suggesting a high degree of linkage disequilibrium between the polymorphisms studied, in our population SNPs 34 and 27 were the only two polymorphisms genotyped where there was significant correlation of any extent between genotypes, so associations with different SNPs cannot be explained solely by linkage disequilibrium. We could find no further evidence of this genetic region providing a survival advantage when the system was stressed by maternal undernutrition taking place in the rainy season.