Introduction

Although genotype-environment (GE) interaction is the current focus of much behavioral genetic research, GE correlation appears to be a more widespread phenomenon (Plomin and Davis 2006). GE interaction refers to genetic sensitivity to environments in the sense that the effects of the environment can depend on genetics and the effects of genetics can depend on the environment (Kendler and Eaves 1986). In contrast, GE correlation refers to genetic exposure to the environment in that experiences can be correlated with genotype. Identifying GE correlation involves treating environmental measures as dependent measures in quantitative genetic analyses, and for this reason such research has been called the nature of nurture (Plomin and Bergeman 1991). Beginning with the pioneering work of Rowe (1981, 1983), dozens of twin and adoption studies have shown ubiquitous genetic influence on widely used measures of the environment (Plomin 1994). A recent review of 55 independent genetic studies using environmental measures found an average heritability of 0.27 across 35 different environmental measures (Kendler and Baker 2007).

In developmental psychology, the main long-term goal of GE correlation research is to explain the extent to which genetic factors mediate associations between ostensibly environmental measures and measures of children’s development (Plomin 1994). In terms of measures of parenting which are the mainstay of such studies, the key question is about the direction of effects, the extent to which parenting is the cause or the effect of children’s development (Bell 1968). For this reason, it is reasonable to investigate genetic influence of children on their parents’ parenting. For example, much GE correlation research in quantitative genetics uses twins who are children but analyzes parental reports of parenting towards the children (Plomin 1994). Finding evidence for genetic influence in such a design suggests that parenting reflects genetic differences in children, which is central to the direction-of-effects issue. However, the investigation of GE correlation is rich in other possibilities for understanding ways in which genetics impacts associations between parenting measures and children’s development (Plomin et al. 1977). It is also possible to use children’s self-reports of their perceptions of their parents’ parenting, which would assess genetic factors in the children themselves that contribute to heritability of such perceptions of parenting. In addition, it is possible to study genetic influences on parenting using a design in which the parents are the twins, which would assess genetic factors in the parents themselves that contribute to heritability regardless of whether the parenting measure is associated with children’s development.

Because of our interest in the direction-of-effects issue, we focused on a parenting measure known to relate to children’s development and conducted a genome-wide association study of this parenting measure using the DNA of the children. We used a parent-report measure called the Confusion, Hubbub, and Order Scale (CHAOS; Matheny et al. 1995) because it shows stronger (negative) associations with cognitive development in childhood than other environmental measures such as socioeconomic status (Petrill et al. 2004; Pike et al. 2006). Family chaos involves ‘environmental confusion’: the lack of organization and calm in the household. Although the CHAOS measure has not yet been used in published GE correlation research, the review mentioned above included results for three measures of family organization, which yielded an average heritability of 0.25 (Kendler and Baker 2007). We have found that a self-report version of the CHAOS measure produced a heritability estimate of 0.50 based on MZ and DZ twin correlations of 0.66 and 0.41 (unpublished) in a sample of 3,000 9-year-old twin pairs in our Twins Early Development Study (TEDS; Oliver and Plomin 2007), the sample used in the present study.
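As a rough check on the twin correlations quoted above, Falconer's formula estimates heritability as twice the difference between the MZ and DZ correlations. The sketch below assumes that simple formula; the text does not state which estimator was actually used, and the function name is illustrative only.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate: twice the difference between MZ and DZ correlations."""
    return 2.0 * (r_mz - r_dz)


# Twin correlations quoted in the text for the self-report CHAOS measure.
h2 = falconer_h2(0.66, 0.41)
print(round(h2, 2))
```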

Research on GE correlation will be greatly advanced when genes are identified that are responsible for the heritability of environmental measures (Jaffee and Price 2007), just as research on GE interaction was advanced by identifying genes that interact with the environment in affecting behavioral development (Caspi et al. 2003, 2002). Candidate gene associations have been reported with marital status (Dick et al. 2006) and adults’ retrospective reports of how they were parented (Lucht et al. 2006). We used five SNPs reported to be associated with general cognitive ability in 7-year-olds (Butcher et al. 2005a, b) as a composite ‘SNP set’ and found that the SNP set correlated significantly with the CHAOS measure completed by parents when their children were 3 and 4 years old (Harlaar et al. 2005). Maternal education and paternal occupational class were not correlated with the SNP set. This evidence for GE correlation using measured genes and measured environments motivated us to conduct a genome-wide association scan for genes associated with family chaos.

Genomewide association scans are now possible using SNP microarrays (Hirschhorn and Daly 2005), although many issues remain to be resolved such as gene-centered versus genome-centered approaches (Neale and Sham 2004), common versus rare variants, sample size, and design (Carlson et al. 2004; Newton-Cheh and Hirschhorn 2005; Thomas et al. 2005; Wang et al. 2005). However, microarrays are expensive and can be used only once, which makes them impractical for genotyping the very large samples needed to detect gene associations of small effect size. One economical strategy for screening large samples for small effects is to pool DNA for groups such as cases versus controls for a disorder or low versus high groups for a quantitative trait (Darvasi and Soller 1994; Knight and Sham 2006; Norton et al. 2004). We have combined the strengths of microarrays and DNA pooling in a method we call SNP microarrays and pools (SNP-MaP). We and others have shown that pooled DNA can be genotyped reliably on microarrays (Butcher et al. 2004; Kirov et al. 2006; Meaburn et al. 2005; Meaburn et al. 2006; Pearson et al. 2007) and we have used the SNP-MaP method to identify genes associated with general cognitive ability (Butcher et al. 2005b, 2007) and with reading (Meaburn et al. 2007) using a multistage design that includes confirmation by individual genotyping of SNPs nominated in the SNP-MaP scan.

In the present study, we apply the SNP-MaP method in a two-stage association scan of family chaos in a representative UK sample of 6,000 4-year-old children in 3,000 families. In the first stage, we used pooled DNA to screen for the largest allele frequency differences among 490,041 autosomal SNPs, comparing low chaos families (N = 469) and high chaos families (N = 369). In the second stage, we individually genotyped 48 SNPs nominated by SNP-MaP and tested them for association in an unselected representative sample of 3,529 children; genotyping an unselected sample allows us to test the quantitative trait locus (QTL) hypothesis by assessing the extent to which the SNPs are associated with CHAOS throughout the distribution. The goal of this two-stage design was to balance false positive and false negative results in the search for associations of small effect size.

Methods

Participants

The Twins Early Development Study (TEDS) is a large, longitudinal study set up to investigate the genetic and environmental bases of cognitive and behavioral development (Oliver and Plomin 2007; Trouton et al. 2002). TEDS recruited families of twins born in England and Wales in 1994, 1995 and 1996. Nearly 16,000 families were contacted, of whom over 11,000 agreed to participate. Parents completed questionnaire booklets in the year following the birth of the twins that assessed a range of background variables, with subsequent questionnaire booklets sent before the children’s birthdays. The sample is representative of the UK population (ascertained by comparison with the 1994 census data from the Office of National Statistics), although fewer mothers of twins are in full-time work outside the home. We excluded children with severe current medical problems, children who had suffered severe problems at birth, and children whose mothers had suffered severe problems during pregnancy. Unknown or uncertain zygosity was also grounds for exclusion. We also excluded twins whose first language was other than English. Finally, in order to avoid issues of population stratification, we included only twins whose parents reported their ethnicity as ‘white’, which is 94% of the sample (comparable to the UK population). The sample used in the present study included 4,650 families for whom DNA was available as well as environmental measures when the children were 4 years of age.

Measure

The degree of chaos in the home was assessed at both 3 and 4 years of age by parents (98% mothers) using the Confusion, Hubbub, and Order Scale (CHAOS; Matheny et al. 1995). The CHAOS questionnaire has been validated through comparison with direct observations in the home environment (Matheny et al. 1995). More than a dozen publications have used the CHAOS measure; a recent paper concludes that “the CHAOS scale provides an adequate and economical measure of home confusion and disorganization that should prove useful in clinical research with diverse populations” (Dumas et al. 2005).

The short version of the CHAOS measure that we used consisted of six items rated on a five-point scale (1 = definitely untrue, 5 = definitely true), including the following examples: “You can’t hear yourself think in our home” and “we are usually able to stay on top of things” (reverse scored). As mentioned earlier, in our research, the CHAOS measure assessed by parents when their children were 3 and 4 years of age correlated more highly with cognitive development than did other environmental measures such as socio-economic status; moreover, CHAOS correlated with cognitive development independently of socio-economic status (Pike et al. 2006), as has been found in other studies as well (Dumas et al. 2005). In our study when the children were 3 and 4 years old, CHAOS correlated 0.28 with low socio-economic status, 0.41 with maternal depression, 0.27 with negative maternal feelings towards the children, and 0.30 with harsh discipline towards the children (Pike et al. 2006).

A total chaos score was generated at 3 years and at 4 years by summing the items (following reverse scoring so that high values = high chaos). In our sample, the scale yielded acceptable internal consistency at both ages (Cronbach’s alpha = 0.63). The scale yielded a correlation of 0.69 from 3 to 4 years (N = 4,650 families), indicating considerable stability across a year, which is a lower-limit estimate of test–retest reliability. Because reliability is increased by aggregating data at multiple measurement occasions, we averaged the CHAOS scores at 3 and 4 years.
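The scoring steps described above (reverse scoring, summing, internal consistency, and averaging across ages) can be sketched as follows. This is an illustrative sketch only: the positions of the reverse-keyed items are hypothetical because the text does not list them, and the function names are not taken from the study.

```python
from statistics import pvariance

SCALE_MAX = 5
# Hypothetical: the text does not say which item positions are reverse-keyed.
REVERSE_KEYED = {1, 3}


def total_score(items):
    """Sum 1-5 ratings after flipping reverse-keyed items (high = high chaos)."""
    return sum((SCALE_MAX + 1 - x) if i in REVERSE_KEYED else x
               for i, x in enumerate(items))


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-family item lists (already keyed)."""
    k = len(rows[0])
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def composite(score_age3, score_age4):
    """Average the total scores from the two measurement occasions."""
    return (score_age3 + score_age4) / 2
```

Averaging the two occasions is the aggregation step the text motivates on reliability grounds; the sketch does not attempt to reproduce the reported alpha of 0.63, which depends on the actual item data.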

Design and Procedures

The design and procedures are described briefly in this section; greater detail can be found in other publications (Butcher et al. 2007; Meaburn et al. 2007).

Stage 1: SNP microarrays and pooling (SNP-MaP) screen of low versus high groups

Low and high CHAOS families were selected from the TEDS sample of more than 4,000 families with twins for whom DNA on both twins and CHAOS data on the family were available. A 33% cut-off (i.e., the top and bottom third) was used to select families from the CHAOS score distribution. In addition, as part of an ongoing GE interaction study of general cognitive ability (‘g’), families were also required to score in either the top or bottom 15% of the general cognitive ability distribution. These criteria resulted in the selection of 469 low CHAOS families and 369 high CHAOS families. Allele frequencies for the low and high groups were indexed by the average of 10 independent DNA sub-pools (biological replicates) per group; each individual was randomly ascribed to one sub-pool.

Stage 2: Testing the QTL hypothesis by individually genotyping SNPs nominated by SNP-MaP in an unselected sample

In Stage 2 of the study, the QTL hypothesis was tested by individually genotyping an independent sample. Because the foundation sample contains twin pairs, only one twin per pair was selected (N = 4,655). We also excluded Stage 1 individuals and MZ co-twins of Stage 1 individuals. (Although it would be acceptable to include MZ co-twins of Stage 1 individuals in Stage 2, CHAOS is a family-wide measure, which means including an MZ co-twin is tantamount to including the pooled individual themselves because the genotype and the phenotype are the same.) After removing these individuals, 4,183 individuals remained, of whom 3,529 had CHAOS data (z-score range of the sample was −2.3 to 4.0). The sample provides 100%, 99%, and 76% power to detect an additive single-locus genetic effect explaining 1%, 0.5% and 0.2% of the total variance of CHAOS scores, respectively, uncorrected for multiple testing (P < 0.05, one-tailed) (Purcell et al. 2003). These power estimates refer to the SNPs themselves; power is of course lower for detecting indirect associations with other polymorphisms lying between the SNPs assessed on the Affymetrix GeneChip 500 K array. Nonetheless, with these SNPs, two-thirds of the SNPs are in high LD (r² > 0.8) with a SNP genotyped in HapMap (Pe’er et al. 2006); with this in mind, the power to capture a truly contributory variant by indirect association is equivalent to that of genotyping a sample of 2,823 individuals. Such a scenario has 100%, 96% and 66% power to detect QTLs with the same parameters as previously mentioned.
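The power reasoning above can be explored with a simple normal approximation. The sketch below is not the Genetic Power Calculator cited in the text (Purcell et al. 2003), so its output need not match the published percentages exactly. The tail convention is exposed as a parameter, and the reduction in effective sample size under indirect association is assumed to be N multiplied by r², which is consistent with 3,529 × 0.8 ≈ 2,823.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(n, var_explained, alpha=0.05, tails=1):
    """Approximate power to detect an effect explaining `var_explained` of variance."""
    shift = sqrt(n * var_explained / (1.0 - var_explained))
    if tails == 2:
        crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
        return _STD_NORMAL.cdf(shift - crit) + _STD_NORMAL.cdf(-shift - crit)
    crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return _STD_NORMAL.cdf(shift - crit)


def effective_n(n, r2_ld):
    """Equivalent sample size when a typed SNP tags the causal variant at r2_ld."""
    return round(n * r2_ld)


n_direct = 3529
n_indirect = effective_n(n_direct, 0.8)
for h in (0.01, 0.005, 0.002):
    for tails in (1, 2):
        print(h, tails,
              round(approx_power(n_direct, h, tails=tails), 2),
              round(approx_power(n_indirect, h, tails=tails), 2))
```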

DNA pool construction

Each individual selected for the low or high CHAOS groups was randomly assigned to one of ten sub-pools for each group. Genomic DNA for each individual, extracted from buccal swabs (Freeman et al. 2003) and suspended in EDTA TE buffer (0.01 M Tris–HCl, 0.001 M EDTA, pH 8.0), was quantified in triplicate using PicoGreen™ dsDNA quantitation reagent (Invitrogen). Upon obtaining reliable triplicate readings, each individual contributed the same amount of DNA to their respective sub-pool. Because individual samples differed in their concentrations, a range of volumes of individual DNAs was added to permit equimolar DNA contributions to the sub-pools. We deemed 1 μl the minimum volume that could be pipetted into a sub-pool without compromising accuracy. Therefore, the amount of DNA contributed to the sub-pools was determined by the mass of DNA contained in 1 μl of the most concentrated individual sample, in this case 98.6 ng/μl. Each individual therefore contributed 98.6 ng to the DNA pool. The range of concentrations for the 20 sub-pools was 14.7–17.2 ng/μl (low CHAOS) and 15.7–17.2 ng/μl (high CHAOS).
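The equal-mass pooling arithmetic described above can be sketched as follows. Only the 98.6 ng target comes from the text; the sample identifiers and the other concentrations are hypothetical values chosen for illustration.

```python
TARGET_NG = 98.6  # mass contributed by every individual (from the text)


def volume_ul(concentration_ng_per_ul, target_ng=TARGET_NG):
    """Volume needed for one individual to contribute `target_ng` of DNA."""
    return target_ng / concentration_ng_per_ul


def pool_concentration(concentrations, target_ng=TARGET_NG):
    """Resulting sub-pool concentration: total mass divided by total volume."""
    total_volume = sum(volume_ul(c, target_ng) for c in concentrations)
    return len(concentrations) * target_ng / total_volume


# Hypothetical individual concentrations in ng/ul; the most concentrated
# sample (98.6 ng/ul) needs exactly the 1 ul minimum volume.
samples = {"A": 98.6, "B": 49.3, "C": 24.65}
volumes = {name: volume_ul(conc) for name, conc in samples.items()}
print(volumes)
print(round(pool_concentration(list(samples.values())), 2))
```

Fixing the target mass at the content of 1 μl of the most concentrated sample guarantees that no individual requires less than the minimum pipetting volume.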

SNP microarray allelotyping of pooled DNA

Each of the 20 DNA pools was allelotyped using the GeneChip® Mapping 500 K Array set in accordance with the standard protocol for individual DNA samples (see the GeneChip® Mapping 500 K Assay Manual for full protocol). Each microarray was scanned using the GeneChip® Scanner 3000 with High-Resolution Scanning Upgrade, which was controlled using GeneChip® Operating software (GCOS) v1.4. Cell intensity (.cel) files were analyzed using GTYPE. Each of the twenty DNA sub-pools was assayed on a separate microarray set; for quality control checks, a reference DNA individual provided by the manufacturer (sample number 100103) was also assayed on a separate microarray set.

Generation of SNP-MaP allele frequency estimates

Relative Allele Signal (RAS) scores, calculated using the 10 K MPAM Mapping algorithm, have been shown to be reliable and valid indices of allele frequency in pooled DNA (Brohede et al. 2006; Butcher et al. 2004; Craig et al. 2005; Kirov et al. 2006; Liu et al. 2005; Meaburn et al. 2005, 2006; Simpson et al. 2005). Details of how probesets on Affymetrix Mapping GeneChip® microarrays are used to calculate allele frequency estimates are described elsewhere (Butcher et al. 2007). Allele frequency estimates for the 500 K microarray set were calculated manually from the raw probe intensity data exported as a .txt file.

Selection of SNPs from Stage 1

To select SNPs for individual genotyping, we derived a rank-based composite score based on five criteria from the Stage 1 dataset. The derivation of this composite score is presented elsewhere (Butcher et al. 2007). Briefly, the five criteria were: (1) greater average allele frequency difference between low and high CHAOS groups, (2) smaller average variance of the low and high CHAOS groups (i.e., variance across the DNA pooled allele frequency estimates for each group), (3) smaller average variance within each microarray (i.e., variance across the multiple probesets that form the microarray’s allele frequency estimate), (4) greater number of successful replicate pools, and (5) greater minor allele frequency, as indexed by the average of the low and high CHAOS groups. Because we expected many more putatively significant associations from Stage 1 than could realistically be individually genotyped (>5,000 at P < 0.01), we used this composite to choose the 48 SNPs with the highest composite scores. The SNP screen was restricted to the autosomes because the DNA pools included both boys and girls, which complicates analyses of SNPs on the X chromosome.
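One way to picture a rank-based composite of the five criteria is to rank every SNP on each criterion and sum the ranks. The sketch below assumes equal weighting and simple summation, which is an assumption made for illustration; the actual derivation is given in Butcher et al. (2007) and may differ. The field names and the three demonstration SNPs are hypothetical.

```python
# (field name, True if larger values are more favorable); names are hypothetical.
CRITERIA = [
    ("freq_diff", True),   # (1) allele frequency difference between groups
    ("pool_var", False),   # (2) variance across replicate pools
    ("array_var", False),  # (3) variance within each microarray
    ("n_pools", True),     # (4) number of successful replicate pools
    ("maf", True),         # (5) minor allele frequency
]


def rank_positions(values, higher_is_better):
    """Rank from 1 (least favorable) to n (most favorable); ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def composite_scores(snps):
    """Sum of per-criterion ranks for each SNP (equal weights assumed)."""
    totals = {s["id"]: 0 for s in snps}
    for key, higher in CRITERIA:
        for s, r in zip(snps, rank_positions([s[key] for s in snps], higher)):
            totals[s["id"]] += r
    return totals


def top_snps(snps, k=48):
    """Identifiers of the k SNPs with the highest composite scores."""
    totals = composite_scores(snps)
    return sorted(totals, key=totals.get, reverse=True)[:k]


demo = [
    {"id": "a", "freq_diff": 0.20, "pool_var": 0.001, "array_var": 0.001, "n_pools": 10, "maf": 0.30},
    {"id": "b", "freq_diff": 0.10, "pool_var": 0.002, "array_var": 0.002, "n_pools": 9, "maf": 0.20},
    {"id": "c", "freq_diff": 0.05, "pool_var": 0.003, "array_var": 0.003, "n_pools": 8, "maf": 0.10},
]
print(composite_scores(demo))
print(top_snps(demo, k=2))
```

A composite of this kind explains how a SNP with a very large allele frequency difference can still be passed over if its estimates are noisy.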

Individual genotyping

After excluding Stage 1 individuals and selecting just one twin per pair as described earlier, the 3,529 individuals were genotyped using the Applied Biosystems’ SNPlex™ genotyping system and analyzed using GeneMapper v4.0 software (Applied Biosystems). SNPlex is a capillary electrophoresis-based multiplex genotyping system capable of genotyping up to 48 SNPs per sample per well (Tobler et al. 2005). In addition to the 3,529 TEDS individuals, 88 CEPH individuals who have been genotyped as part of the HapMap Project (The International HapMap Consortium 2003; The International HapMap Consortium 2005) were obtained from the Coriell Institute to assess genotyping quality and error rate. Reference genotypes of CEPH individuals for the selected SNPs were downloaded from HapMart, the data mining tool for downloading HapMap data (http://hapmart.hapmap.org/BioMart/martview).

Because quantitative genetic research strongly suggests that the majority of genetic effects are additive, we were primarily interested in testing SNPs for their additive effect. Therefore, genotypes of SNPs passing quality control (see below) were tested for additive genetic effects using a Pearson correlation (r). In addition, we followed a procedure recommended by Balding (2006) to test whether a non-additive model (ANOVA) predicted significantly better than an additive model (linear regression).
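The two analyses described above can be illustrated with genotypes coded as minor allele counts (0, 1, 2). The sketch computes a Pearson correlation for the additive test and an F statistic comparing a linear (additive) model with a model that fits a separate mean per genotype. It is one common way to compare such nested models and is not claimed to reproduce the exact procedure of Balding (2006). The coding scheme and function names are assumptions, and no P value is computed because the standard library has no F distribution.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def additive_vs_general_f(genotypes, phenotypes):
    """F statistic (1 and n - 3 df) for the improvement of a genotype-means
    model over a linear trend; needs three genotype groups with some scatter."""
    n = len(genotypes)
    mean_y = sum(phenotypes) / n
    ss_total = sum((y - mean_y) ** 2 for y in phenotypes)
    # Residual sum of squares of the additive (linear regression) model.
    rss_additive = ss_total * (1.0 - pearson_r(genotypes, phenotypes) ** 2)
    # Residual sum of squares around each genotype group's own mean.
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss_general = sum(sum((y - sum(ys) / len(ys)) ** 2 for y in ys)
                      for ys in groups.values())
    return (rss_additive - rss_general) / (rss_general / (n - 3))
```

A small F value indicates that allowing dominance adds little beyond the additive trend, which is the pattern the text later reports for rs12820468.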

Genotyping quality control for individual genotyping

The following sequential criteria were applied: SNPs were omitted from analysis if poor genotype clusters prevented GeneMapper software from making calls or if a SNP showed >1 genotype mismatch between CEPH genotypes deposited in HapMap and those derived using in-house genotyping methods. Individuals were omitted if their SNP call rate was <65%. Finally, for each SNP, low peak height genotypes (<25% of the average peak height) were removed; we apply this procedure because poor quality samples often exhibit high background noise that SNPlex can mistake as heterozygotes. It is important to control for this as an excess of heterozygotes will artificially inflate the type-I error rate of Hardy–Weinberg equilibrium tests.
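To illustrate why spurious heterozygote calls matter for Hardy–Weinberg checks, the sketch below applies a one-degree-of-freedom chi-square goodness-of-fit test to genotype counts. The text does not specify which Hardy–Weinberg test was used, so the choice of test is an assumption, and the genotype counts are invented for illustration.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic and P value (1 df) for departure from
    Hardy-Weinberg proportions; assumes both alleles are observed."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, the upper-tail chi-square probability equals erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2.0))


# Invented counts: proportions in equilibrium versus a heterozygote excess.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(10, 80, 10))
```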

Results

Stage 1: SNP microarrays and pooling (SNP-MaP) screen of low versus high groups

SNP-MaP allele frequencies for the 20 DNA pools were calculated. In order to increase the reliability of SNP-MaP allele frequency estimates, we required allele frequency estimates from a minimum of 6 (out of 10) replicates for both high and low groups. We also excluded SNPs with minor allele frequencies lower than 0.05 as power to detect association in this range is greatly reduced. After these exclusion criteria, the autosomal genomewide screen consisted of 448,944 SNPs from the 500 K microarray set.

The average allele frequency for the low and high CHAOS groups was calculated for each SNP. The correlation between the low and high CHAOS groups was 0.992, indicating that the rank order of allele frequencies was highly reliable overall, a test analogous to genomic control. Accordingly, between-group differences were small: Fig. 1 illustrates that 90% of the SNPs exhibited between-group differences smaller than 0.05, with a mean between-group absolute difference of 0.027 for the whole dataset (range: 0.00–0.28).

Fig. 1

A histogram illustrating the distribution of absolute allele frequency differences between low and high CHAOS groups derived through pooled DNA on microarrays. The y-axis indicates the number of SNPs and the x-axis shows absolute allele frequency differences between low and high CHAOS groups. The figure shows that the vast majority of allele frequency differences are small and that the mean absolute allele frequency difference between low and high CHAOS groups is about 0.027. The x-axis is elongated to accommodate outliers, which are a logical source of candidate SNPs to follow up. The total number of SNPs is 448,944 because SNPs represented by fewer than 6 out of 10 replicates were removed

As explained in Methods, SNPs selected for individual genotyping in Stage 2 were chosen on the basis of a ranked composite score which took into account the between-group allele frequency difference, variance between- and within-biological replicate microarrays, number of successfully assayed arrays and minor allele frequency. Due to financial restrictions, we were limited to individually genotyping in SNPlex a single probeset of 48 SNPs with the highest composite scores. The mean absolute difference between low and high SNP-MaP allele frequency estimates for these 48 SNPs was 0.11 (ranging from 0.05 to 0.24). The seven SNPs with the largest between-group allele frequency differences were not selected as they exhibited high levels of variance, which counted unfavorably in the composite selection score. Figure 2 places the 48 selected SNPs in the context of the full dataset by plotting the average allele frequency of the low CHAOS group against that of the high CHAOS group. Details about the 48 selected SNPs can be found in Table 1.

Fig. 2

A scatterplot showing the 48 top-ranked SNPs (crosses) against the background of 448,944 unselected SNPs comparing allele frequencies for the low CHAOS group (x-axis) and the high CHAOS group (y-axis). The figure also displays the density of SNPs as a function of low CHAOS versus high CHAOS allele frequency differences; density of SNP clusters increases as the heat map changes from light grey (sparse clusters) through to dark grey (dense clusters). Allele frequency differences are small with the majority of small differences occurring for SNPs with minor allele frequencies of 0.10–0.25, which reflects the representation of SNPs with these allele frequencies on the Affymetrix microarray. The correlation between low and high CHAOS allele frequencies was 0.992 indicating high reliability of the rank order of allele frequencies across the low and high CHAOS groups

Table 1 Summary of Stage 1 and Stage 2 genotyping for the 48 SNPs selected by ranked composite score using pooled DNA to screen the genome

Individual genotyping quality control

In our SNPlex analysis, three out of 48 SNPs (rs11263591, rs3843872 and rs4839628) exhibited poor call rates across plates due to poor genotype clustering and were omitted from further analyses. The remaining SNPs showed acceptable genotyping error rates as measured by the concordance between our in-house derived genotypes for 88 CEPH individuals and the genotypes of the same CEPH individuals available from HapMap: we observed 3 mismatches out of 3,954 genotypes (error rate < 0.1%). In each of these errors, a homozygote was erroneously called as a heterozygote.

A total of 393 individuals (11%) showing low call rates across SNPs were omitted; fragmented DNA is a pre-requisite to running SNPlex and sub-optimal fragmentation is the likely cause of these low call rates. We also excluded an additional 9% of genotypes per SNP whose peak heights were <25% of the average peak height for that SNP across the study. Finally, with 45 SNPs, none would be expected to depart from Hardy–Weinberg equilibrium at P < 0.01; however, 4 SNPs (rs10001415, rs1030303, rs11950448 and rs7970012) did show significant departures and these SNPs were omitted from subsequent analysis. At the cost of reduced sample size, these conservative criteria improved observed genotypic distributions under Hardy–Weinberg equilibrium, tightened genotype clusters in SNPlex, and left the distribution of CHAOS unchanged. After excluding the 7 aforementioned SNPs, we observed 128,299 (88.7%) of a possible 144,689 genotypes. After excluding samples with poor call rates and low peak heights, we used 117,062 genotypes to perform association analysis.

Stage 2: Testing the QTL hypothesis by individually genotyping SNPs nominated by SNP-MaP in an unselected sample

The 41 SNPs nominated by Stage 1 that passed quality control were tested across the unselected sample of 3,529 children in order to test the QTL hypothesis directly by assessing the extent to which the SNPs are associated with CHAOS throughout the distribution. Each individual’s genotypes for the 41 SNPs were tested for additive genotypic effects. With 41 tests and a nominal one-tailed alpha level of 0.05, 2 significant results would be expected on the basis of chance alone. (We used a one-tailed test because the difference observed in Stage 2 was required to be in the same direction as that seen in Stage 1 screening.) Only one SNP (rs12820468) was significantly associated in the predicted direction with individual differences in CHAOS throughout the distribution. A summary of Stage 1 and Stage 2 results for the 48 SNPs selected (including SNP locations) is provided in Table 1.
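The chance expectation quoted above is simply the number of tests multiplied by the alpha level. The sketch below also gives the binomial probability of observing at least a given number of nominally significant results, under the simplifying assumption that the 41 tests are independent and that every null hypothesis is true; that assumption is ours and is not stated in the text.

```python
from math import comb


def expected_false_positives(n_tests, alpha):
    """Expected number of nominally significant results when all nulls hold."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(at least k of n_tests independent null tests reach significance)."""
    return sum(comb(n_tests, j) * alpha ** j * (1 - alpha) ** (n_tests - j)
               for j in range(k, n_tests + 1))


print(round(expected_false_positives(41, 0.05), 2))
print(round(prob_at_least(1, 41, 0.05), 3))
```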

Figure 3 presents the results for rs12820468 in terms of standardized mean quantitative trait CHAOS scores for the three SNP genotypes. As can be seen from Fig. 3, the SNP appears to show dominance for the rarer C allele, with homozygotes and heterozygote carriers appearing to be susceptible to selecting more disordered environments. However, following the procedure suggested by Balding (2006), we compared additive and non-additive models and found that the non-additive model did not fit significantly better than the additive model. We also examined the associations separately for boys and girls, but no significant differences were found; because our Stage 1 design included boys and girls, it would favor SNPs that show effects in both sexes.

Fig. 3

Genotype-by-phenotype plot for SNP rs12820468 illustrating the effect of genotype (x-axis) on standardized CHAOS scores (y-axis). The best-fitting genetic model was additive despite the apparent effect of dominance

Discussion

In the first genomewide association scan of an environmental measure, we chose to study family chaos using the parent-report CHAOS measure with DNA of the children because we are interested in the role of GE correlation in the mediation of associations between parenting and children’s development. CHAOS is an especially interesting parenting measure because it correlates more highly with children’s cognitive development than do other environmental measures, including socio-economic status. Like other measures of the family environment, there is evidence from quantitative genetic studies for genetic influence. The present study attempted to bring the power of genome-wide association (GWA) to bear on identifying some of the DNA variation in children responsible for genetic influence on parent-reported CHAOS.

We found one SNP associated with family chaos that reached a nominal significance level of P < 0.05. However, with 41 SNPs nominated in a first SNP-MaP stage using pooled DNA for low and high CHAOS groups we would expect 2 SNPs to remain significant on the basis of chance alone. For this reason, we conclude that despite having 99% power to detect SNP associations that account for more than 0.5% of the variance, we were unable to detect any SNP associations that met conservative criteria for significance. That is, because two significant associations in the second stage of the design would be expected on the basis of chance alone, we assume that the single significant association that emerged (rs12820468) is a chance result. Despite this, it is worth noting that rs12820468 is located in intron 7 of transmembrane protein 16D (TMEM16D) and although the likelihood of the SNP showing functionality is low, it lies in an LD block containing 4 indels (including a 9 bp deletion) and numerous repeat elements. TMEM16D is a large (∼334 Kb) protein coding gene of unknown function located on chromosome 12q23.1-q23.2 and exists in 3 known isoforms. Overall, TMEM16D shows some conservation of features with primates but little with placental mammals or vertebrates. Because functional plausibility is unclear, more work on the TMEM16D gene might be warranted in future molecular genetic research on family chaos and its correlates.

The power of the present design leads us to draw a more far-reaching conclusion: We conclude that it is unlikely that any SNPs of large effect contribute to heritable influence on family chaos as assessed by parents of young children using DNA of the children. As mentioned in the Introduction, we would only find SNP associations if parental reports of CHAOS are correlated with SNPs in their children; this is the key test of GE correlation mediation of the relationship between parenting and children’s development. However, if this were not one’s goal, it may be easier to find SNP associations using children’s own reports of CHAOS or to find SNP associations between parents’ reports of CHAOS and the parents’ DNA. Alternatively, power to detect associations underlying such evocative GE correlations may be increased by directly studying the behaviors of twins that evoke the parenting. If these behaviors of twins are more heritable, then these studies might also yield more associations.

Another possibility is that SNPs not represented on the Affymetrix 500 K array, as well as other polymorphisms (e.g., copy number variation, indels, microsatellites etc.), may have passed through our screen unnoticed. In this regard, it is noteworthy that the same design and sample used in this study have been successful in identifying six SNPs associated with general cognitive ability even though the average effect size of the six SNP associations was only 0.2% (Butcher et al. 2007). This finding suggests that the two-stage SNP-MaP design followed by individual genotyping of an independent unselected sample can identify SNP associations of small effect size; the present finding is important in demonstrating that the design does not always yield positive findings. One important difference between the two studies is that general cognitive ability is nearly twice as heritable as measures of family environment which could indicate that it will be more difficult to find genes associated with family environment, although it is not necessarily the case that it is easier to find genes for more heritable traits. Another difference between the two studies is that in the SNP-MaP stage, we used top and bottom thirds of the CHAOS distribution to select our low and high CHAOS groups, whereas the low and high general cognitive ability groups were selected from the top and bottom sixths. Given roughly equal sample sizes in the two studies, the less severe selection for CHAOS results in less power than in the study of general cognitive ability.

Nonetheless, the evidence for the heritability of measures of the family environment such as family chaos is persuasive (e.g., Kendler and Baker 2007), which implies that differences in DNA sequence are ultimately responsible for the heritability. It is likely that the DNA differences responsible for this heritability have such small or subtle effects that even more powerful strategies will be needed to detect them. Identifying genes associated with environmental measures will be worth the effort because they will foster research on an active model of experience in which individuals select, modify and create environments on the basis of their genetic proclivities (Plomin 1994). In other words, genetic effects on behavior do not stop at the skin—genetic effects need to be considered in relation to an ‘extended phenotype’ that includes effects on individuals’ environments (Dawkins 1982, 2004).