Headliners: Smoking: The Role of the Parent in Deterring Child Smoking, as Seen by Rural Native American and White Parents

Kegler MC, Malcoe LH. 2005. Anti-smoking socialization beliefs among rural Native American and white parents of young children. Health Educ Res 20(2):175–184. 
 
Studies suggest that there are differences between the races in parental “anti-smoking socialization”—that is, how parents influence their children’s expectations regarding the feasibility, acceptability, and consequences of smoking cigarettes. For instance, black parents are more likely than white parents to set ground rules regarding tobacco use for their children, and are less likely to assume that teens will inevitably experiment with smoking. Now Lorraine Halinka Malcoe and NIEHS grantee Michelle C. Kegler of Emory University have compared antismoking socialization beliefs among rural white and Native American parents. Better information on how beliefs vary racially could help shape more effective ways of teaching parents to deter their children from smoking. 
 
Teen smoking rates vary significantly between racial and ethnic groups. According to data from the Centers for Disease Control and Prevention for the year 2000, 31.8% of white high school students reported smoking in the past 30 days. Hispanic students were next at 22.6%, followed by Asian Americans at 20.6%, and blacks at 16.8%. Data on smoking among Native American teenagers are not as readily available, but some studies have indicated the rate among Native Americans overall is comparable to or higher than that of whites. In 2000, 36% of adult Native Americans smoked, compared with 24.1% of white adults. 
 
The study showed that Native American and white parents were similar in their antismoking socialization beliefs with one exception: Native American parents were less likely to believe that schools are better than parents at teaching children about the dangers of smoking. Less educated parents were more likely to believe that strictly forbidding children to smoke only makes them want to smoke more. Consistent with earlier results, parents of both races had less stringent beliefs and a lesser sense of parental efficacy compared to black parents. 
 
Methods to bolster antismoking socialization beliefs of less-educated parents may be important in preventing children in low-income rural communities with high smoking rates from beginning to smoke. Although limited in size and scope, this study provides evidence that future research should focus on ways to increase parental communication of antismoking beliefs and assessment of whether such interventions result in lower rates of smoking onset.


Background
To find new susceptibility loci for complex diseases in the human genome, a large number of case and control samples is required. An old approach with a new perspective is the pooling of cases and controls. The larger the number of analyzed SNPs, the more striking are the advantages of a pooling study. With advanced microarray technology it is now possible to analyze SNPs throughout the whole genome. With the Human Mapping 500 K array set from Affymetrix and the BeadChips from Illumina, over 500,000 SNPs can be genotyped on two arrays. Different groups have tested the reliability of Affymetrix microarrays for pooling studies with either the 10 K array [1][2][3][4][5][6] or the 50 K array [7,8]. On these arrays, each SNP is interrogated by 40 probes (20 on the plus and 20 on the minus strand). On the 250 K arrays over 90% of the SNPs are represented by only 24 probes (some SNPs are only on the plus or the minus strand). This reduction of probes, as well as the reduction of the feature size from 18 μm (10 K) and 8 μm (50 K) to 5 μm (250 K), could have a negative influence on the outcome of pooling results. To examine whether this is the case, we tested the Nsp I 250 K array, which represents 262,264 SNPs and is part of the 500 K array set. According to the Data Sheet from Affymetrix, over 85% of the human genome is covered by SNPs within 10 kb distance with this array set. If allelotyping of pooled DNA is feasible with these arrays, whole genome association studies including thousands of samples could be performed within a few weeks in a cost-effective manner.

10 K array
To assess the measurement error in our lab, we estimated the allele frequency in a pool of 26 DNA samples previously genotyped in our lab with the 10 K array. We calculated the allele frequency with three methods (see Material and Methods). As reference data for the correction of unequal allele signals, we took either data generated in our lab ("our") or data from other labs ("web" or "brohede"). Of the 10,561 SNPs on the 10 K array, the allele frequency of 3,574 SNPs could be estimated with all three methods. In Table 1, we show the mean and median error (absolute difference between known and estimated allele frequency), the correlation coefficient between known and estimated allele frequency, and the standard deviation (SD) between the four replicates. As expected, the estimates were better when using the reference data generated in our lab. The PPC method was the most accurate method with a mean error of 0.043. However, the k-correction with heterozygous RAS values gave only slightly worse results with an error of 0.046. In comparison with other methods, the PPC is the only algorithm that uses only perfect match data. To determine whether the k-correction can be improved by utilizing just perfect match data, we set all cell intensity values in the original cell files to zero. Then we derived a perfect-match-RAS and reanalyzed the data using the k-correction with heterozygous references. The resulting estimates gave an average error of 0.108. Applying a second degree polynomial to these perfect-match-RAS values reduced the error to 0.054. However, for "normal" RAS values the second degree polynomial did not improve the error.
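
The accuracy measures reported above (mean and median absolute error, and the correlation between known and estimated allele frequency) can be illustrated with a minimal Python sketch. The function names and the toy values are assumptions made for illustration only; they are not the scripts or the data used in the study.

```python
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def accuracy_summary(known, estimated):
    """Mean error, median error and correlation for paired allele frequencies.

    The error is the absolute difference between known and estimated
    allele frequency, as defined in the text.
    """
    errors = [abs(k - e) for k, e in zip(known, estimated)]
    return {
        "mean_error": mean(errors),
        "median_error": median(errors),
        "correlation": pearson_r(known, estimated),
    }


# Toy values invented for illustration (NOT data from the study).
known = [0.10, 0.25, 0.50, 0.75, 0.90]
estimated = [0.14, 0.22, 0.55, 0.70, 0.93]
summary = accuracy_summary(known, estimated)
print(summary)
```

In practice the two lists would hold one entry per SNP that could be estimated by the method under evaluation.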

250 K array
Of the 262,264 SNPs on the Nsp 250 K array, the rs numbers of 195,158 SNPs could be identified from the HapMap CEPH Population (NCBI_Build35). We excluded 137 SNPs (3 on Chr. 1, 128 on Chr. 2, 6 on Chr. 16) which had inconsistent genotype information in the two sources (e.g. rs1364648, Affymetrix annotation: A/G, minus-strand; HapMap data: C/G, plus-strand). Of the remaining SNPs, 122,754 had a 100% call rate in the 88 HapMap samples. For the evaluation, 104,141 SNPs could be used because they had at least one "AB" genotype (required for k-correction) in the 56 reference samples genotyped in our lab. Table 2 shows the mean error, the correlation coefficient between known and estimated allele frequency, and the standard deviation between the pool replicates. We also specified how the accuracy depended on the number of pool replicates, the number of reference RAS values (with AB genotype), the minor allele frequency, and the SNP type. As expected, we found that the mean error decreased with the number of pool replicates. The mean error also decreased with the number of "AB" reference samples and with increasing minor allele frequency. To see whether the error improves with higher allele frequencies only because of a higher number of "AB" references or vice versa, we adjusted both parameters and found the same trend. We could further show that the estimation of the allele frequency in A/T SNPs was significantly less accurate than in G/C SNPs (p < 0.001). The same trends were found for the 10 K array (results not shown).
For the reference samples, arrays with less than 93% call rate were excluded. For pooled DNA, however, the call rate normally is around 80%, because many SNP frequencies lie between homozygous and heterozygous frequencies. To test whether the call rate can be partially explained by the detection rate (MDR), we plotted the call rates against detection rates from 100 Nsp and 100 Sty arrays previously analyzed with individual DNA in our lab (Figure 1). According to the regression curve, a call rate of 93% corresponds to a detection rate of about 97.8%. One of our 250 K arrays (hybridized with pooled DNA) had a detection rate of 96.7%. It was therefore considered to be of poor quality and was excluded. This array also had a significantly poorer accuracy (error: 0.075). In the other four arrays (with MDR >99.2%) a high MDR also correlated with a low error (see Figure 2).
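
The quality filter described above can be sketched in a few lines of Python. This is only an illustration: the 97.8% threshold and the 96.7% value come from the text, whereas the array names and the remaining detection rates are hypothetical placeholders chosen to lie above 99.2%.

```python
# Detection-rate threshold in percent; per the text, it corresponds to
# roughly a 93% call rate on the regression curve.
MDR_THRESHOLD = 97.8


def split_by_detection_rate(arrays, threshold=MDR_THRESHOLD):
    """Partition arrays into (kept, excluded) by detection rate in percent."""
    kept = {name: mdr for name, mdr in arrays.items() if mdr >= threshold}
    excluded = {name: mdr for name, mdr in arrays.items() if mdr < threshold}
    return kept, excluded


# Hypothetical array names. Only 96.7 is a value reported in the text;
# the other detection rates are placeholders above 99.2.
pool_arrays = {
    "pool_rep1": 99.5,
    "pool_rep2": 99.3,
    "pool_rep3": 96.7,
    "pool_rep4": 99.6,
    "pool_rep5": 99.4,
}

kept, excluded = split_by_detection_rate(pool_arrays)
print("kept:", sorted(kept))
print("excluded:", sorted(excluded))
```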

Discussion
With our data from the 10 K array, we could confirm that, of the three tested methods, the PPC algorithm [1] gave the best estimates. Compared to other methods, this algorithm (a) utilizes the signal intensities from individual probes (not RAS values); (b) takes only data from the perfect matches; (c) applies a second degree polynomial for correction of unequal hybridization; and (d) uses reference information from all three genotypes (AA, AB, BB). Our results suggest that none of these parameters alone is responsible for the good performance of the PPC algorithm, but rather the combination of all. However, the need for all three genotypes in the reference samples limits the
Table 1 footnote: The errors are based on estimates from 3574 SNPs which could be analyzed by all methods. *Data used for normalization: "our" = 34 individuals analyzed in our lab, "Brohede" = 26 individuals analyzed in the lab of Brohede et al. [1], "web" >3000 individuals analyzed in the lab of Craig et al. [9], files are available under [15].
To avoid the use of reference data in a case-control study with pooled samples, it is also possible to directly compare the signal intensities of the perfectly matching probes between cases and controls as shown by Macgregor et al. [7]. In this study, the use of a correction for unequal hybridization signals had only little effect upon the results. However, even slight improvements can be important for finding low susceptibility genes in pooling studies.
Despite the reduction of the feature number and feature size, the absolute error between real and estimated allele frequency with the 250 K array was as low as the one for the 10 K array when using Simpson's k-correction. The correlation between real and estimated allele frequency was even higher with the 250 K array, and the standard deviation was lower. However, our results from the 10 K and the 250 K array are not directly comparable, because (a) pools were constructed from different DNA samples, (b) the experimental protocol was different, (c) different scanners were used for both chips, and (d) the software used for data extraction was different.
As shown in Table 2, the accuracy of the allele frequency estimation improved with the number of pool replicates. The absolute error between three and four replicates only decreased by 0.001. Therefore, we assume that the addition of further technical replicates would not substantially improve the accuracy. In our study, we used pools of identical samples. However, for a case-control study, it might be advantageous to use pools of independent samples to capture the variance between the individuals. In this case, an increase of replicates can improve the accuracy. With increasing number of "AB" references, the error decreased to 0.024 when 35 references were present. In our study, the mean error was smaller when the minor allele frequency was higher. This was also true for the 10 K results using the PPC algorithm, which is in contrast to the results published by Brohede et al. [1], where the best estimates were obtained at minor allele frequencies <0.1. Interestingly, the accuracy of A/T SNPs was found to be significantly worse than the accuracy of G/C SNPs on the 250 K array. This is probably due to the higher affinity of the G-C hydrogen bond compared to the A-T bond. For the stability of the entire hybridization complex, an unspecific hybridization with "A" or "T" is relatively less important than with "G" or "C". Here we analyzed only one of the two 250 K arrays from the 500 K set. The only difference between the two arrays is the cleavage site in the first fragmentation step. Therefore, we assume that both arrays, Nsp and Sty, perform equally well.
Pooling of samples has several disadvantages compared to a case-control study analyzing individual genotypes: (a) associations which do not result in a significant change of the allele frequency can be overlooked; (b) measurement errors can lead to false results; (c) stratification of the population by age, sex, disease subtype, etc. has to be done before the analysis; (d) haplotype analysis is only possible under certain conditions [10,11]; and (e) analysis of gene-gene interactions cannot be performed. However, with advancing technologies and algorithms, the mean measurement error can probably be reduced to values < 0.03 [1,4]. The use of linkage information should improve the likelihood of finding "real" associations and detect false positive SNPs. Taking the HapMap information (Build 35) for the 10 K array, we found ~30% of the SNPs to be linked to their downstream SNP (LOD >3); with the 500 K array set it was ~50%. With this high linkage, the allele frequency of one SNP can be partly explained by the allele frequency of a linked SNP. To take advantage of this fact, two recent publications propose to use p-value combinations in a sliding-window concept [9,12]. With increasing number of analyzed SNPs and better linkage information most haplotypes can be explained by individual SNPs [13].
Figure 1. Graph showing the correlation between detection rate (MDR) and call rate. Data derived from 100 NspI and 100 StyI arrays, hybridized with individual DNA. A 93% call rate corresponds to about 97.8% MDR.
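
To make the sliding-window idea concrete, the following Python sketch combines the p-values of neighbouring SNPs within a fixed window. The text does not state which combination rule the cited publications use; Fisher's method is chosen here purely as an assumption for illustration. Fisher's method presumes independent tests, which linked SNPs do not strictly satisfy, so the combined values should be read as a ranking aid rather than as calibrated p-values. All function names are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine p-values with Fisher's method (assumes independent tests).

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom; for even degrees of freedom the survival
    function has the closed form used below. All p-values must be > 0.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def sliding_window_p(p_values, window=3):
    """Combined p-value for each run of `window` consecutive SNPs."""
    return [
        fisher_combined_p(p_values[i:i + window])
        for i in range(len(p_values) - window + 1)
    ]


# Toy single-SNP p-values in chromosomal order (invented for illustration).
snp_p = [0.50, 0.01, 0.02, 0.60]
print(sliding_window_p(snp_p, window=2))
```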

Conclusion
We think that DNA pooling might be a useful and affordable tool for detecting new candidate genes for genetic diseases, especially at a whole genome level. However, this has to be proven in future association studies with pooled DNA.

DNA pooling and microarray analysis
The determination of the DNA concentration in the individual DNA samples was done with PicoGreen reagent (Molecular Probes) using a standard curve of λ-DNA.
From each sample, 50 ng genomic DNA was taken for the pool construction. For the 10 K array, we pooled 26 DNA samples that had previously been genotyped individually with the 10 K array. For the 250 K array we pooled 88 samples from the HapMap CEPH Population, whose genotype information is available at the HapMap homepage [14]. From individual or pooled samples, 250 ng DNA was analyzed on the GeneChip Human Mapping 10 K Xba 131 array or the 250 K Nsp array (Affymetrix) according to the manufacturer's protocols. For each of the 10 K and 250 K arrays, four replicates of the same DNA pool were processed and hybridized on four different days. Imaging of the microarrays was performed using either the GCS3000 scanner (10 K array) or the upgraded GCS3000-G7 scanner (250 K array) from Affymetrix. Genotype calls and probe intensity data were extracted with the GDAS software using default parameters ( [9]. For this correction we excluded RAS1 and RAS2 values with standard deviation >1 (SD from 4 pools) and set values <0 and >1 to 0 and 1, respectively. As reference data for the k-corrections (Simpson et al. and Craig et al.) we used RAS values from 34 arrays analyzed with individual DNA in our lab or RAS values from over 3000 arrays on the web page [15] provided by Craig et al. [9]. The polynomial-based probe-specific correction (PPC) from Brohede et al. uses information of the individual perfect match probe pairs from all three genotypes [1]. As reference data for correction, we used 34 arrays previously analyzed in our lab or k-correction data from 26 arrays kindly provided by Jesper Brohede as external reference.

Estimation of allele frequency with the 250 K array
For the 250 K arrays, the k-correction proposed by Simpson, et al. was used to estimate the allele frequencies [6].
Heterozygous RAS values were taken from a set of 56 arrays (all with call rates >93%), which were previously analyzed with individual DNA in our lab. The average RAS
Figure 2. Graph showing the correlation between detection rate (MDR) and the error (absolute difference between estimated and known allele frequency). Each cross stands for one 250 K array, all hybridized with the same DNA pool.
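
A k-correction of this kind can be sketched as follows. The sketch uses the commonly written form of the correction, in which the frequency of allele A is A / (A + k·B) and k is the A/B signal ratio observed in heterozygous references. Deriving k from the average heterozygous RAS, treating RAS as A / (A + B), and ignoring the separate RAS1 and RAS2 values are simplifying assumptions made here; the function names are hypothetical and this is not the authors' implementation.

```python
from statistics import mean


def k_from_heterozygous_ras(het_ras_values):
    """Derive k (A/B signal ratio in heterozygotes) from the average RAS.

    Assumes RAS = A / (A + B), so A / B = RAS / (1 - RAS).
    """
    avg_ras = mean(het_ras_values)
    if not 0.0 < avg_ras < 1.0:
        raise ValueError("average heterozygous RAS must lie strictly between 0 and 1")
    return avg_ras / (1.0 - avg_ras)


def k_corrected_frequency(pool_ras, k):
    """Estimate the frequency of allele A in a pool from its RAS value.

    Equivalent to A / (A + k * B) rewritten in terms of RAS.
    Values outside [0, 1] are set to 0 or 1 before correction.
    """
    pool_ras = min(1.0, max(0.0, pool_ras))
    return pool_ras / (pool_ras + k * (1.0 - pool_ras))


# Toy values invented for illustration (NOT data from the study).
# If heterozygotes give RAS = 0.6 instead of the ideal 0.5, a pool RAS of
# 0.6 should be corrected to an allele frequency of 0.5.
k = k_from_heterozygous_ras([0.58, 0.60, 0.62])
print(k_corrected_frequency(0.6, k))
```

When the heterozygous references show no allelic imbalance (average RAS of 0.5), k equals 1 and the corrected frequency is simply the pool RAS.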