Introduction

Smoke exposure is one of the most serious health problems worldwide1. Smoking creates a heavy disease burden and is associated with a 50% higher mortality rate from all causes among men who are smokers2. Active smoking is currently the most preventable cause of death, disability and various chronic diseases3,4,5,6,7,8,9. China is the largest tobacco grower and consumer in the world10 and the disease burden resulting from tobacco smoking is high11,12. One study recently conducted in East Asia demonstrated a smoking rate of 52.9% in adult Chinese men (aged 20–69 years) between 2008 and 2011 (ref. 13).

A study on twins in 2011 showed that susceptibility to smoking behaviour is influenced by genetic factors14, and family linkage analyses and candidate gene association studies have confirmed this finding15,16,17,18. Since 2005, genome-wide association studies (GWASs) of smoking behaviour (regular smoking, cigarettes per day and smoking initiation (SI) age) have identified 21 single-nucleotide polymorphisms (SNPs) with significant genome-wide associations (P < 5 × 10⁻⁸) in or near the following genes: CHRNB3, CHRNA6, BDNF, CHRNA3, CHRNA5, AGPHD1, CHRNB4, CYP2A6 and EGLN2 (refs 19,20,21,22,23,24,25,26). Many of these genes are expressed in or known to act in nicotine or dopamine receptor or brain-derived neurotrophic factor pathways.

Although GWASs have identified 21 SNPs associated with smoking behaviour19,20,21,22,23,24,25,26, each SNP accounted for only a very small fraction of the variation in smoking behaviour and the results were unstable. The variable results among studies may be related to differences in effect sizes, sample sizes, genetic heterogeneity, genomic confounders, linkage disequilibrium (LD) and spurious associations27. Furthermore, the study populations of these GWASs did not include Chinese individuals. Thus, we conducted this study to verify these SNPs in a Chinese population and subsequently create a genetic score combining the effects of these SNPs on smoking behaviour.

Design and Methods

Study sample

We conducted two population-based, cross-sectional surveys in 2001 and 2010 on elderly residents (aged ≥60 years) of the Wanshoulu district. As described in our previous study28,29, a 2-step randomized cluster sampling method was used to select 2,277 participants (943 males and 1,334 females) in 2001 and 2,102 participants (848 males and 1,254 females) in 2010. After excluding 818 participants duplicated in both surveys and 8 unsuccessful genotyping results, a total of 3,553 participants (1,477 males and 2,076 females) were included as our study sample (Fig. 1). Trained interviewers met with the participants face-to-face to complete a standardized questionnaire addressing a range of demographic factors, medical history and health-related behaviours (particularly smoking exposure status).

Figure 1

Flow diagram of the study population.

Measurement of smoking behaviour

A smoker was defined as a person who had ever smoked a tobacco product daily for at least 6 months30. A heavy smoker was defined as a person who had ever smoked more than 20 cigarettes per day31. Additionally, an SI age of ≤18 years was used as a measurement of smoking behaviour1 because previous studies have shown that, compared with SI during adulthood, tobacco use prior to 18 years of age leads to behavioural consequences (such as drug abuse) during adulthood, in addition to more serious health consequences (including mental and physical effects)32.

Measurement of covariates

The categories of educational attainment included 0–6 years (primary school or less), 6–12 years (middle school to high school or the equivalent) and ≥13 years (completed a university or other tertiary education). The occupation types were classified into the following three categories: white collar (professional, government), light physical labour (skilled worker, service, merchant) and hard physical labour (farmer, factory worker, manufacturing and transportation worker). Ethnicity was classified into the following two categories: Han and minority. Body mass index (BMI) was classified into the following three categories: normal (<24.00), overweight (24.00–27.99) and obese (≥ 28.00)33. Sports activity time was classified into the following three categories: <1 hour/week, 1–4 hours/week and >4 hours/week.

Genotyping

The standard proteinase K-phenol-chloroform method was used to extract DNA from whole peripheral blood samples. The laboratory staff was blinded to the identities of the subjects and their smoking status.

Among the 21 previously reported SNPs, we excluded rs1051730, rs879048, rs2036527, rs8034191, rs11638372 and rs16969968 because their minor allele frequencies (MAFs) were <0.1 in the HAPMAP-CHB (Chinese Han Beijing) population (Supplementary Table S1). The 15 remaining candidate SNPs were included in our analysis (Fig. 2). The MassARRAY system was used to genotype the candidate SNPs.

Figure 2

The process for choosing candidate SNPs.

Genetic score

LD analysis of the 15 genotyped SNPs revealed an LD block (Supplementary Fig. S1); using the Tagger function ("run tagger") in HAPLOVIEW, we chose rs6474412 as the tag SNP to represent this block (Supplementary Table S2). To evaluate the effects of these SNPs on smoking behaviour, we examined the SNPs in four genetic models (dominant model, recessive model, heterogeneous codominant model and additive model34) and in males and females separately. We then excluded the SNPs with no significant effect on smoking behaviour in our population. The final genetic score was built on 7 SNPs (Supplementary Tables S3–9).

Similar to previous studies that evaluated genetic scores for smoking behaviour35 and obesity36, we constructed our genetic score using 3 methods. In the first two methods, each SNP was weighted according to the size of its relative effect (β coefficient) using two types of β coefficients: β1 was derived from our population and adjusted for demographic characteristics (age, gender, education, occupation and ethnicity), BMI and sports activity time; β2 was derived from the results of GWASs and meta-analyses (Table 1)19,20,21,22,23,24,25,26. The third method used the unweighted counts of risk alleles to construct the score.
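The three scoring methods differ only in the weight given to each risk allele. The following Python sketch illustrates this under stated assumptions: the function name `genetic_score`, the SNP labels and the β values are hypothetical placeholders and are not taken from Table 1 or from the analysis software used in this study.

```python
from typing import Mapping, Optional


def genetic_score(
    risk_allele_counts: Mapping[str, int],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Sum risk-allele counts (0, 1 or 2 per SNP), optionally weighted by beta."""
    total = 0.0
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1 or 2")
        # With no weights supplied, every risk allele contributes 1 (method 3).
        weight = 1.0 if weights is None else weights[snp]
        total += weight * count
    return total


# Hypothetical participant and hypothetical beta coefficients (illustration only).
participant = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
betas = {"snp_a": 0.10, "snp_b": 0.05, "snp_c": 0.20}

weighted = genetic_score(participant, betas)  # methods 1 and 2: only the betas differ
unweighted = genetic_score(participant)       # method 3: plain risk-allele count
```

Passing β1 or β2 as `weights` would correspond to the first and second methods, respectively; omitting `weights` gives the unweighted count.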

Table 1 The 7 SNPs used to calculate the genetic score for smoking behaviour.

Statistical analysis

HAPLOVIEW software version 4.2 (http://www.broadinstitute.org/haploview) was used for analyses of Hardy-Weinberg equilibrium (HWE) and LD and for running Tagger. SPSS version 19.0 (serial No. 5076595) was used for the data analysis. The significance level for all tests was set at a two-tailed α value of 0.05. The differences in means and proportions were tested using t-tests and chi-squared tests, respectively. Logistic regression models were used to estimate the odds ratio (OR) of the genetic score for smoking behaviour.
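As an illustration of what an HWE check involves, the sketch below implements the textbook chi-squared goodness-of-fit test with 1 degree of freedom for a biallelic SNP. This is an assumption made for illustration: it is not necessarily the procedure that HAPLOVIEW applies internally, and the function name `hwe_chi_square` and the genotype counts are hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Return (chi-squared statistic, P value) for deviation from HWE."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip genotype classes with zero expected count (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-squared distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts that match HWE expectations exactly.
chi2_stat, p_val = hwe_chi_square(25, 50, 25)
```

A P value below the chosen threshold would indicate deviation from HWE; for SNPs with small expected genotype counts, an exact test is generally preferred over this approximation.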

Ethical considerations

The committee for medical ethics of the Chinese PLA General Hospital examined and approved our study; this study was performed in accordance with the ethical guidelines of the Declaration of Helsinki (version 2002). Each study participant provided written informed consent prior to completing the questionnaire.

Results

Patient characteristics

A total of 3,553 participants (1,477 males and 2,076 females) were included in our study. The average age was 70.29 ± 6.43 years. There were 1,067 smokers and 2,486 never smokers in our sample population; the two groups differed in gender (P < 0.001) and education (P = 0.007), but no significant differences were detected in age, ethnicity, occupation, BMI or sports activity time (P > 0.05) (Table 2). Table 3 shows the genotype frequencies of the 7 SNPs.

Table 2 Baseline characteristics of the participants.
Table 3 Genotype frequencies of the 7 SNPs.

Effect of genetic score on smoking behaviour

Genetic score type 1

Risk alleles from the imputed data (0, 1 or 2) for each SNP were weighted according to their relative β coefficients (β1, Table 1), which were estimated from our data after adjusting for demographic characteristics (age, gender, education, occupation and ethnicity), BMI and sports activity time. Weighted risk alleles were summed for each individual to generate a type 1 genetic score representing the individual’s risk allele score (ranging from 0.06 to 0.88; average: 0.42 ± 0.14). The participants were divided into three groups according to tertiles (0.36 and 0.48): group 1 included participants with a genetic score <0.36; group 2 comprised participants with a genetic score 0.36–0.48; and group 3 included participants with a genetic score >0.48.
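The tertile grouping can be sketched in Python as follows. The helper names `tertile_cutoffs` and `assign_group` are hypothetical, and `statistics.quantiles` uses its own default interpolation rule, which may differ from the rule applied by SPSS; only the boundary handling (scores equal to a cutoff fall into group 2) follows the group definitions given above.

```python
import statistics
from typing import Sequence, Tuple


def tertile_cutoffs(scores: Sequence[float]) -> Tuple[float, float]:
    """Return the two cut points that split the scores into three groups."""
    low, high = statistics.quantiles(scores, n=3)
    return low, high


def assign_group(score: float, low: float, high: float) -> int:
    """Group 1: score < low; group 2: low <= score <= high; group 3: score > high."""
    if score < low:
        return 1
    if score <= high:
        return 2
    return 3


# Using the type 1 cutoffs reported in the text (0.36 and 0.48);
# the example scores themselves are hypothetical.
groups = [assign_group(s, 0.36, 0.48) for s in (0.30, 0.36, 0.48, 0.50)]
```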

Through logistic regression analysis, we found that participants with a high genetic score (group 3) had a 26% higher risk of trying smoking and a 29% higher risk for SI at ≤18 years old after adjusting for age, gender, education, occupation, ethnicity, BMI and sports activity time. Among males, the ORs were even higher (1.37 and 1.37, respectively), whereas in females, the association was not significant (Table 4).

Table 4 Effect of genetic score 1 (β1) on smoking behaviour.

Genetic score type 2

Risk alleles from the imputed data (0, 1 or 2) per SNP were weighted for their relative β coefficients (β2, Table 1), which were estimated from previously reported GWASs and meta-analyses. Weighted risk alleles were summed for each individual to generate the type 2 genetic score representing the individual’s risk allele score (ranging from 0.14 to 3.53; average: 1.54 ± 0.70). The participants were divided into three groups according to tertiles (1.05 and 1.95): group 1 had a genetic score <1.05; group 2 had a genetic score of 1.05–1.95; and group 3 had a genetic score >1.95.

Regarding the type 2 genetic score, we found that participants with a high genetic score (group 3) had a 24% higher risk of trying smoking and a 28% higher risk for SI at ≤18 years of age after adjusting for age, gender, education, occupation, ethnicity, BMI and sports activity time. Among males, the ORs were even higher (1.37 and 1.42, respectively), whereas in females, the association was not significant (Table 5).

Table 5 Effect of genetic score 2 (β2) on smoking behaviour.

Genetic score type 3

Risk alleles from the imputed data (0, 1 or 2) per SNP were unweighted and summed for each individual, generating the type 3 genetic score as a representation of the individual’s risk allele score (ranging from 2 to 14; average: 7.47 ± 1.80). The participants were divided into three groups according to tertiles (7 and 9): group 1 had a genetic score <7; group 2 had a genetic score of 7–9; and group 3 had a genetic score >9.

Regarding the type 3 genetic score, we found that participants with a high genetic score (group 3) had a 34% higher risk of trying smoking and a 43% higher risk for SI at ≤18 years of age after adjusting for age, gender, education, occupation, ethnicity, BMI and sports activity time. Among males, the ORs were even higher (1.42 and 1.46, respectively), whereas in females, the association was not significant (Table 6).

Table 6 Effect of genetic score 3 (unweighted) on smoking behaviour.

Receiver-operating characteristic (ROC) curves

ROC curves were constructed using age, gender, education, occupation, ethnicity, BMI and sports activity time in addition to genetic score types 1, 2 and 3 (Fig. 3). The areas under the curve (AUCs) of the three types of genetic scores were 0.832, 0.832 and 0.832 for predicting smoking status in the total population; 0.673, 0.673 and 0.674 in males; and 0.724, 0.724 and 0.723 in females, respectively (Fig. 3). These results indicated that the three types of genetic scores predicted smoking similarly well. Because it is the simplest to interpret and to extrapolate to other settings, the unweighted genetic score represents the ideal choice.

Figure 3

ROC curves of the four prediction models in total, male and female populations.

Next, we compared the AUCs of the models containing age, gender, education, occupation, ethnicity, BMI and sports activity time with and without the unweighted genetic score. The AUCs with and without the score were 0.832 and 0.817 in the total population, 0.674 and 0.613 in males and 0.723 and 0.707 in females, respectively. This difference was significant in males (P < 0.05) (Fig. 3).
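For readers less familiar with the AUC, it equals the probability that a randomly chosen smoker receives a higher predicted value than a randomly chosen never smoker, with ties counted as one half. The Python sketch below computes it directly from that definition; the function name `auc` and the example values are hypothetical, and the AUCs reported here were obtained with the statistical software described in the Methods, not with this code.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted probabilities for smokers and never smokers.
example_auc = auc([0.8, 0.4], [0.6, 0.2])
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, so the value added by the genetic score is the difference between the AUCs of the models with and without it.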

Furthermore, the average scores of the smoking group, the heavy smoking group and the SI at ≤18 years of age group were significantly higher than those of the never smoking group, both among males and in the total population (Table 7).

Table 7 Average scores of never smokers, smokers, heavy smokers and SI at ≤18 years group.

Discussion

In this study, we retested all 18 significant SNPs (P < 5 × 10⁻⁸) from GWASs conducted on smoking behaviour (cigarettes smoked per day (CPD), SI) in a Chinese population; we then chose 7 of these SNPs to derive genetic scores. We derived three types of genetic scores to evaluate the genetic risk of smoking behaviour (smoking, heavy smoking and SI at ≤18 years of age) and found that the evaluation capacities of these three scores were approximately the same. Furthermore, we linked genetic risk and smoking behaviour (smoking, heavy smoking and SI at ≤18 years of age) in a Chinese population.

Certain SNPs were significant in GWASs conducted in European or African American populations; however, the MAFs of these SNPs in the Chinese population were too low for our analysis. Furthermore, of the 15 candidate SNPs, 4 SNPs displayed no association with smoking behaviour in the Chinese population, although significant associations were found in GWASs conducted on other populations. We identified 7 SNPs that impacted the susceptibility to smoking behaviour in the Chinese population (similar to the reported GWAS results). Moreover, both our study and previous studies found SNPs with common and unique features in terms of MAF, haplotype blocks and effects in different populations37,38.

Previous genetic score studies have used two methods to create the genetic score: 1) summing the unweighted SNPs35 and 2) summing SNPs weighted by their effect36. To our knowledge, this study is the first to compare the effects of different genetic score generation methods and we found that the three types of genetic scores elicited similar effects on smoking behaviour.

Furthermore, we found that the genetic score was significantly associated with smoking behaviour (smoking status or SI at ≤18 years of age) in the Chinese population. This result is consistent with that of a study performed in New Zealand35, in which individuals with elevated genetic risk were more likely to convert to daily smoking as teenagers and progressed more rapidly from SI to heavy smoking.

However, the present study has several limitations. First, the candidate SNPs that were chosen from the GWAS results were mainly identified in European or African American populations; only a few such studies have been reported in Chinese populations. This may have decreased the reliability of the findings regarding SNPs related to smoking behaviour in the Chinese population. In addition, the SNPs from the USA/Northern European populations may not be suitable or sufficient to create a genetic score in the Chinese population. Thus, additional GWASs of large samples from the Chinese population should be conducted to create a more suitable genetic score for this population. Second, the small number of women who smoked in our study may have decreased the stability of the results in women. Third, the genetic score created in our study requires verification in a larger Chinese sample.

To conclude, in this study, we tested GWAS-significant SNPs associated with smoking behaviour in a Chinese population and constructed three types of genetic scores. We found that the effects of the three types of genetic score were similar; however, because it is the simplest to interpret and to extrapolate, the unweighted genetic score represents the ideal choice. Furthermore, the genetic score was significantly associated with smoking behaviour (smoking status and SI at ≤18 years of age). The results of this study may guide relevant health education for those with a high genetic score and promote smoking control to improve the health of the population.

Additional Information

How to cite this article: Yang, S. et al. Genetic scores of smoking behaviour in a Chinese population. Sci. Rep. 6, 22799; doi: 10.1038/srep22799 (2016).