Genetic Variants Modifying the Risk of Lung Cancer and Its Subtypes: A Comprehensive Meta-analysis and a Case-control Study

Association studies on lung cancer have often yielded conflicting and inconclusive results. We performed a comprehensive meta-analysis to dissect the precise effects of the candidate variants. We searched for association studies on lung cancer from the Indian subcontinent. Heterogeneity was assessed with Cochran's Q-test. Both overall and histotype-stratified meta-analyses were done using fixed-effect and random-effects models. Smoking status-stratified subgroup analysis and effect modification tests were done. An associated variant with significant heterogeneity was genotyped in an eastern Indian population to investigate the contribution of potential confounders, followed by a comprehensive meta-analysis across world populations. Significant heterogeneity was observed for 8 variants. Both fixed-effect and random-effects meta-analysis of 24 variants showed FDR-corrected associations of rs3547/XRCC1 and rs1048943/CYP1A1 with lung cancer along with 5 nominal associations. del1/GSTT1, rs4646903/CYP1A1, and rs1048943/CYP1A1 were associated with adenocarcinoma, squamous cell carcinoma, and both, respectively. rs4646903/CYP1A1 was associated with lung cancer among smokers with significant effect modification by smoking. rs1048943/CYP1A1 was associated with lung adenocarcinoma in the East Indian case-control study. rs1048943/CYP1A1 was associated with lung cancer across world populations. Our work confirms the risk loci for lung cancer and its subtypes in the context of smoking and other aetiological factors, which could aid in personalised treatment.


Introduction
Despite several measures taken against tobacco smoking and consumption, lung cancer remains one of the leading causes of cancer-related mortality worldwide, with a low 5-year survival rate [1]. Epidemiological data suggest that the global lung cancer burden has risen to 2.1 million new cases and 1.8 million deaths, the latter being close to 1 in 5 of all cancer deaths [2]. The increasing incidence of lung cancer in the Indian subcontinent and East Asia is evident in a recent epidemiological report [3]. Lung cancer incidence varies greatly across geographical regions that comprise an admixture of diverse populations [4]. In India, lung cancer constitutes 5.9% of all new cancer cases and 8.1% of all cancer-related deaths in both sexes [2]. The northeastern state of Mizoram accounts for the highest reported incidence of lung cancer in both sexes [4]. Earlier reports stated that India contributes approximately one million of the total five million lung cancer deaths worldwide [5], and the death toll is projected to rise to 1.5 million by 2020 [5,6]. Smoking tobacco has been considered a significant factor in lung carcinogenesis [1,7]. Apart from tobacco smoking, betel quid chewing [8,9], diet [10-12], biofuel exposure [10-15], asbestos exposure [10,11,16] and other environmental pollutants [10,11,17,18] also contribute to lung carcinogenesis. Earlier studies have revealed a rise in the incidence of lung cancer among never smokers [19], particularly in women of East Asian origin [20,21].
Genome-wide association studies (GWAS) in the Chinese population have identified 16 susceptibility loci (p ≤ 5.00 × 10⁻⁸) associated with lung cancer risk [22,23], and 4 of these loci showed evidence of association with lung cancer risk in smokers [22]. Similarly, another GWAS on subjects of European ancestry with 29,266 lung cancer patients and 56,450 controls identified 18 susceptibility loci (p ≤ 5.00 × 10⁻⁸), including 10 novel loci.
Interestingly, the association of the 10 novel loci varied across different histological subtypes. Out of the 10 loci, four were associated with overall lung cancer risk, while the remaining 6 loci were associated with lung adenocarcinoma [21]. Most GWAS were done on subjects of European or Chinese descent, and the majority of the identified risk alleles have not been evaluated in the population of the Indian subcontinent despite several candidate gene association studies [24-33]. Contradictory outcomes of case-control association studies of the same polymorphism by different authors have made it difficult to identify the overall effect of the genes and the genetic variations on lung cancer susceptibility in the region. For example, one study showed no association between the GSTT1 deletion polymorphism and lung cancer in 100 cases and 76 controls [34], but another study with 146 cases and 146 controls showed a significant association of the GSTT1 deletion polymorphism with lung cancer [35]. In another study, rs1048943 of CYP1A1 showed a significant association with lung cancer [31], while the same variant showed no significant association with lung cancer in a different study from the Indian subcontinent [36]. The differences in genetic association across the geographical regions of the Indian subcontinent, which comprise distinct population groups, might be attributed to gene-gene and gene-environment interactions, which could act as potential modulators of lung cancer risk [37]. The contradictory outcomes for a variant between different studies could be due to small sample sizes, clinical heterogeneity between the populations, and racial/ethnic differences [38] within the Indian subcontinent.
Further, the differences in socio-economic and cultural practices in different parts of the Indian subcontinent might contribute to diverse lifestyle habits like smoking, chewing of tobacco and betel quids, and alcohol consumption, and to differing exposure to air pollutants, asbestos and other occupational hazards, which in turn could modify the risk of the disease. This brings forth the importance of meta-analysis, a powerful statistical method [39] for assessing the pooled effect of the variant(s) on lung cancer susceptibility in the concerned population by combining the individual study data. The current study aimed to estimate the magnitude of the effect of the reported candidate associations on lung cancer susceptibility in the Indian subcontinent through a meta-analysis pipeline (Supplementary Information, Fig. S1) as followed in our previous pilot study [40]. The meta-analysis was followed by a case-control study of the lung cancer-associated polymorphic variant that remained significant after multiple testing adjustment and showed significant heterogeneity (Supplementary Information, Fig. S2). Furthermore, the significant variants of the above analysis were meta-analysed across the world population to contrast the associations across populations of varying ethnicities.

Methods
The scheme of analysis followed in this study is summarised in Supplementary Information, Fig. S2.

Identification and Eligibility of studies
The current study followed the PRISMA guidelines [41]. Systematic mining of databases such as PubMed, Scopus, and Web of Science was done to select appropriate studies using the following keywords: (SNP/SNPs/polymorphisms/single nucleotide polymorphisms/SNVs/SNV/Mutation/Variants/Genotypes/Alleles); (Lung cancer/Lung Carcinoma/Lung malignancy/Lung neoplasm); (India/Pakistan/Nepal/Bangladesh/Bhutan/Sri-Lanka/Maldives/Afghanistan). All case-control genetic association studies on lung cancer were curated and selected manually by two authors and rechecked by the other authors.

Inclusion and Exclusion criteria
The studies for meta-analysis were selected according to the following specific inclusion criteria: (a) samples should be from populations belonging to the countries of the Indian subcontinent; (b) case-control genotype data should be reported for each polymorphic variant; (c) the data source should be a full research article; (d) all association studies published till 31st December 2019 were considered; (e) only polymorphisms reported in at least three independent studies on different sample populations were considered; (f) studies should be published in English. The exclusion criteria were as follows: (a') incomplete data from review articles, letters, editorials, comments and case reports; (b') duplicated studies using the same population; (c') studies with no genotype count among cases and controls; (d') studies in languages other than English; (e') unpublished data; and (f') systematic reviews and meta-analyses.

Data extraction
Data extraction from the literature was done following the specific inclusion and exclusion criteria. The data collected from the selected studies were (1) first author surname, (2) year of publication, (3) mean age with standard deviation, (4) sex, (5) smoking status, (6) histological types, (7) genetic polymorphisms, (8) genotype-specific case-control data and (9) geographical region of sampling.

Selection of a genetic model for meta-analysis
Apart from del1/GSTT1, del2/GSTM1, rs3025039/VEGFA, rs1048943/CYP1A1 and rs4646903/CYP1A1, the remaining 20 polymorphic variants were analysed in 3 different genetic models, i.e. additive, dominant and recessive models. The variants del1/GSTT1 and del2/GSTM1 were only analysed in the recessive model, while rs1048943/CYP1A1, rs4646903/CYP1A1, and rs3025039/VEGFA were analysed in a dominant model. We restricted our analysis to a particular genetic model for the mentioned variants because of the nature of the data reported in the studies selected for the analysis.
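The way genotype counts map onto the dominant and recessive models can be illustrated with a minimal Python sketch. It is not the authors' code (the study used R's 'glm'); the function names and the Haldane-Anscombe zero-cell correction are assumptions made for illustration. For a single binary predictor, the logistic-regression log odds ratio and its Wald standard error coincide with the closed-form 2×2 expressions used here. The additive (per-allele) model needs an iterative logistic fit and is not covered.

```python
import math

def collapse_counts(counts, model):
    """Collapse (ref_hom, het, var_hom) genotype counts into (exposed, unexposed).

    dominant : het + var_hom vs ref_hom   (e.g. AG+GG vs AA)
    recessive: var_hom vs ref_hom + het   (e.g. GG vs AG+AA)
    """
    ref_hom, het, var_hom = counts
    if model == "dominant":
        return het + var_hom, ref_hom
    if model == "recessive":
        return var_hom, ref_hom + het
    raise ValueError("model must be 'dominant' or 'recessive'")

def log_odds_ratio(case_counts, control_counts, model):
    """Return (log OR, standard error) for one study under the chosen model."""
    a, b = collapse_counts(case_counts, model)     # cases: exposed, unexposed
    c, d = collapse_counts(control_counts, model)  # controls: exposed, unexposed
    if min(a, b, c, d) == 0:
        # Assumption: add 0.5 to every cell when a cell is empty
        # (Haldane-Anscombe correction); the source does not say how
        # zero cells were handled.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se
```

With hypothetical counts of (50, 30, 20) in cases and (70, 20, 10) in controls, the dominant model compares 50 carriers and 50 non-carriers among cases against 30 and 70 among controls.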

Meta-analysis
Inter-study heterogeneity was evaluated using Cochran's Q test (p < 0.1) [42]. In the presence of significant heterogeneity, the heterogeneity index (I²) quantifies the degree of inconsistency across studies using the formula $I^2 = \frac{Q - (n - 1)}{Q} \times 100\%$, where Q is Cochran's Q statistic and 'n' is the number of studies selected. The I² value is expressed as a percentage, with cut-offs of 25%, 50% or 75% conventionally signifying the presence of low-, mid- or high-grade heterogeneity, respectively. Meta-analysis was conducted in R 3.4.2 with the package 'metafor' [43] on lung cancer genetic association reports from the Indian subcontinent. Logistic regression of lung cancer status on variant genotype was done using additive, recessive and dominant effect models (using the R function 'glm') to obtain the study-level effect sizes (log odds ratios), standard errors and 95% confidence intervals. Both fixed-effect meta-analysis (inverse-variance weighting) and random-effects meta-analysis (DerSimonian-Laird method) were used to combine the study-level estimates (using the 'rma.uni' function in the R package 'metafor') [43]. The analysis estimates cumulative odds ratios and 95% confidence intervals (95% CI) to determine the overall evidence of statistical association (p < 0.05) of the reported variants with lung cancer risk. The Benjamini-Hochberg method was used to control the false discovery rate (FDR) at the 5% level (p_FDR < 0.05).
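The pooling steps described above can be sketched in plain Python. This is an illustrative re-implementation, not the authors' code: the study used 'rma.uni' from 'metafor', and the function names below are hypothetical. It computes the inverse-variance fixed-effect estimate, Cochran's Q, I² (truncated at zero, a common convention), and the DerSimonian-Laird random-effects estimate from study-level log odds ratios and standard errors.

```python
import math

def pool_fixed(effects, ses):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)

def heterogeneity(effects, ses):
    """Cochran's Q, its degrees of freedom (n - 1) and I^2 in percent."""
    weights = [1.0 / se ** 2 for se in ses]
    estimate, _ = pool_fixed(effects, ses)
    q = sum(w * (y - estimate) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2

def pool_random_dl(effects, ses):
    """DerSimonian-Laird random-effects estimate, standard error and tau^2."""
    weights = [1.0 / se ** 2 for se in ses]
    q, df, _ = heterogeneity(effects, ses)
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Between-study variance tau^2 is added to each within-study variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    total = sum(re_weights)
    estimate = sum(w * y for w, y in zip(re_weights, effects)) / total
    return estimate, math.sqrt(1.0 / total), tau2
```

A pooled odds ratio and its 95% CI follow by exponentiating the estimate and estimate ± 1.96 × SE. When tau² is zero the two models give the same result; as tau² grows, the random-effects weights become more equal, so large studies dominate less.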

Effect on histological subtypes
The genotype counts of the cases were stratified within the histological subtypes of lung cancer for 5 variants only. The remaining variants lacked the histological subtype-stratified genotype counts for the cases and were not included in the analysis. Similarly, a logistic regression of the status of histological subtypes of lung cancer on variant genotype was done in three genetic models (using the R function 'glm'), as mentioned above, to obtain the study-level effect sizes (log odds ratios), standard errors and 95% confidence intervals. Following this, both fixed-effect and random-effects meta-analyses were used to combine the study-level estimates (using the 'rma.uni' function in the R package 'metafor') [43] to assess the effect of variants on each lung cancer subtype.

Effect Modification by Smoking
For the variants nominally associated with lung cancer, we looked for smoking status-stratified summary data (genotype counts) as described earlier [40,44].
The stratified study-level β-coefficient (log OR) and standard errors (SE) were calculated using logistic regression from the stratum-specific genotype counts reported in the respective studies. A fixed-effect meta-analysis was used to obtain overall effect estimates, confidence intervals and p-values within each subgroup (i.e., for the 'smoker' and 'non-smoker' groups). Finally, a meta-regression in the fixed-effect model was done with stratum (i.e., smoker/non-smoker group) as a moderator variable (using the 'rma.uni' function in the R package 'metafor').
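A minimal Python sketch of this effect-modification test follows, under stated assumptions. With a single two-level moderator, the fixed-effect meta-regression coefficient equals the difference between the two subgroup pooled estimates, and its variance is the sum of their variances, so the test reduces to a z-test on that difference. The function names are hypothetical and the normal (Wald) reference distribution is assumed; this is not the authors' R code.

```python
import math

def pooled(studies):
    """Fixed-effect (inverse-variance) pooling of (log_or, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    estimate = sum(w * y for w, (y, _) in zip(weights, studies)) / total
    return estimate, math.sqrt(1.0 / total)

def effect_modification(smokers, non_smokers):
    """Compare pooled log ORs between two strata.

    Each argument is a list of (log_or, se) pairs, one per study.
    Returns (difference, se_difference, z, two_sided_p).
    """
    b_s, se_s = pooled(smokers)
    b_n, se_n = pooled(non_smokers)
    diff = b_s - b_n                           # moderator coefficient
    se_diff = math.sqrt(se_s ** 2 + se_n ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided normal p-value
    return diff, se_diff, z, p
```

A small p-value indicates that the variant's log odds ratio differs between smokers and non-smokers, which is the sense in which smoking modifies the effect.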

Effect of Geographical Region
The selected studies for this meta-analysis did not report the race/ethnicity of the patients and controls. Thus, we grouped the studies into geographic regions (North, South, East and West) and conducted a meta-analysis within each subgroup. This analysis was done to assess the region-specific association of the polymorphic variants with lung cancer risk, if any [45].

Publication Bias
A visual inspection of funnel plots [46] along with Egger's regression test was done to evaluate the asymmetry (p < 0.05) of the funnel plots and thereby estimate publication bias, if any, among the selected studies. Egger's regression test uses a weighted regression model with multiplicative dispersion and was applied only to those variants that were reported in 10 or more studies.
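Egger's test can be sketched in Python in its classic form: an ordinary least-squares regression of the standardised effect (log OR divided by its SE) on precision (1/SE), where an intercept far from zero suggests funnel-plot asymmetry. This form is algebraically equivalent to a weighted regression of the effect on its standard error with a multiplicative dispersion term. The function name is hypothetical, and the p-value lookup against a t distribution with n − 2 degrees of freedom is left out because the standard library has no t-distribution function.

```python
import math

def egger_test(log_ors, ses):
    """Classic Egger regression; returns (intercept, se_intercept, t, df)."""
    n = len(log_ors)
    if n < 3:
        raise ValueError("need at least 3 studies")
    x = [1.0 / se for se in ses]                  # precision
    z = [y / se for y, se in zip(log_ors, ses)]   # standardised effect
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar             # asymmetry coefficient
    df = n - 2
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(rss / df * (1.0 / n + x_bar ** 2 / sxx))
    if se_intercept > 0:
        t_stat = intercept / se_intercept
    else:
        t_stat = math.copysign(math.inf, intercept)  # perfect fit
    return intercept, se_intercept, t_stat, df
```

The returned t statistic is compared with a t distribution on df degrees of freedom to obtain the asymmetry p-value. With few studies the test has little power, which is consistent with restricting it to variants reported in 10 or more studies.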

Case-Control analysis in the East Indian population
Patients with confirmed lung cancer of any histological subtype were recruited from the Saroj Gupta Cancer Centre and Research Institute and the Department of CHEST, IPGME&R, in Kolkata. Both males and females were considered without any specific sex bias. Patients with a recent history of tobacco smoking were selected. Individuals who had quit smoking ≥ 15 years before the date of sample collection were not considered. Similarly, controls with a confirmed smoking history were recruited from the Department of CHEST, IPGME&R in Kolkata, and belonged to the same geographical region as the patients. The controls were clinico-radiologically confirmed healthy smokers aged ≥ 55 years [2] and without any history of cancer. All participants of the study were asked to fill in a detailed questionnaire, and their informed consent for voluntary participation was obtained before sample collection following the ethical guidelines of the concerned institutes and the Declaration of Helsinki, 1964. A case-control association analysis of rs1048943/CYP1A1 with lung cancer was conducted in the East Indian population, including 101 cases and 413 controls. The polymorphism was selected from the current meta-analysis on lung cancer as the polymorphic variant that was significant after FDR correction and showed significant heterogeneity between the studies. The primer sequences used for the PCR of the 204 base pair fragment harbouring the polymorphism rs1048943 of CYP1A1 were as follows: CYP1A1-F: 5'-CTGTCTCCCTCTGGTTACAGGAAGC-3', and CYP1A1-R: 5'-TTCCACCCGTTGCAGCAGGATAGC-3'. The PCR conditions were as follows: 94°C for 5 min; 30 cycles of 94°C for 40 s, 61°C for 40 s and 72°C for 40 s; 72°C for 7 min; hold at 4°C. Following PCR, a quality check of the amplicons was done in 6% PAGE. The PCR amplicons were digested with the BsrDI restriction enzyme at 65°C for 2 h.
Covariate-adjusted (adjusted for age, sex, ethnicity, smoking intensity in pack-years, alcohol consumption, tobacco and betel quid chewing, and asbestos exposure) unconditional logistic regression of lung cancer status on genotype was performed using R 3.4.2 software.

Meta-analysis of significant variants in the global population
The variants associated with lung cancer (after FDR correction) in the sample population of the Indian subcontinent were further meta-analysed in the global population following a similar protocol and the PRISMA guidelines [41], as mentioned above. PubMed, Scopus, and Web of Science repositories were searched to select appropriate studies using the following keywords: (CYP1A1/XRCC1); (CYP1A1*2C/rs1048943/rs3547/single nucleotide polymorphisms/SNV/Mutation/Variants/Alleles/Genotypes). The data from our case-control analysis of the East Indian population were included in this meta-analysis. Inclusion and exclusion criteria similar to those of the above meta-analysis were followed, except that the selection of the studies was not confined to any particular geographical region or ethnicity. The variant rs1048943 (CYP1A1) was analysed in the dominant model only, while rs3547 (XRCC1) was analysed in additive, dominant and recessive models. Similar categories of data were extracted from the selected studies as in the above meta-analysis. Heterogeneity between studies was assessed using Cochran's Q test (p < 0.1). A meta-analysis following a similar protocol, as mentioned above, was conducted. The genotypes were stratified by the histological subtypes and geographical regions reported in the selected studies. This was followed by subgroup analyses and an effect modification test using a fixed-effect meta-regression model for smoking status.

Study characteristics
Systematic mining of the databases with the search strings mentioned above revealed 1060 hits, which were screened down to 50 studies following the specific inclusion/exclusion criteria set for the proposed study (Figure 1). These 50 studies included 24 polymorphisms from 12 genes with 9,487 cases and 10,455 controls (Table 1). Covariate-specific case-control data, particularly tobacco smoking, mean age, histological status, and geographical region of the subjects, were recorded from all 50 studies selected for meta-analysis (Supplementary Information, Table S1).

Meta-analysis for the overall association of polymorphic variants with lung cancer
Analysis of heterogeneity between studies using the I² metric revealed 5 variants, namely rs1695/GSTP1, rs1042522/TP53, rs9282861/SULT1A1, rs25487/XRCC1 and rs3025039/VEGFA, with mid-grade heterogeneity in a dominant model. Similarly, rs1048943/CYP1A1 showed low-grade (25% < I² < 50%) heterogeneity and rs2010963/VEGFA showed high-grade (I² > 75%) heterogeneity in the dominant model. In the additive model, rs9282861/SULT1A1, rs25487/XRCC1, rs3547/XRCC1, rs2010963/VEGFA, and rs419558/DKK2 showed low-grade heterogeneity, while rs1042522/TP53 showed mid-grade (50% < I² < 75%) heterogeneity. In the recessive model, del1/GSTT1, del2/GSTM1, rs9282861/SULT1A1, rs2010963/VEGFA, and rs44772/DKK2 showed low-grade heterogeneity, and rs3547/XRCC1 showed high-grade heterogeneity. Cochran's Q test also revealed the presence of significant heterogeneity between studies (p < 0.1) for 8 variants, including rs1695/GSTP1, rs1042522/TP53, rs9282861/SULT1A1, rs25487/XRCC1, rs1048943/CYP1A1, rs3025039/VEGFA, rs2010963/VEGFA, and rs25487/XRCC1, in the dominant model only. The results of the fixed-effect and random-effects meta-analysis are shown in Table 2. The fixed-effect (FE) meta-analysis gives an average effect under the assumption that all effect estimates are assessing a common underlying effect. The random-effects (RE) meta-analysis is a variation of the FE method, which incorporates the assumption that different studies estimate different yet related effects. While the FE model is more powerful in the absence of heterogeneity, the RE model gives more robust estimates of the overall effect, where large studies are unlikely to dominate, particularly in the presence of heterogeneity. Therefore, the results for both FE and RE models are discussed in this meta-analysis.
Interestingly, for the variant rs1048943/CYP1A1, most of the signal was driven by a single study [31], which could be the reason behind the observed heterogeneity between studies. A random-effects meta-analysis revealed that rs1048943/CYP1A1 was significantly associated with the overall lung cancer risk in the Indian subcontinent in a dominant (AG+GG vs. AA: OR = 2.07; 95% CI = 1.32-3.27; p = 0.002) model (Table 2). However, adjustment for multiple testing by Benjamini-Hochberg false discovery rate (FDR) correction (p_FDR < 0.05) revealed a significant association of rs1048943/CYP1A1 (p_FDR = 0.027) and rs3547/XRCC1 (p_FDR = 0.027) in a dominant model (Table 2). The associations of rs1048943/CYP1A1 and rs3547/XRCC1 with overall lung cancer risk in a dominant model are depicted in forest plots (Figure 2).
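The Benjamini-Hochberg adjustment behind the p_FDR values can be sketched in Python as follows. The function name is hypothetical and the code is an illustration rather than the authors' implementation; it follows the standard step-up procedure, in which each sorted p-value is multiplied by m/rank and monotonicity is enforced from the largest p-value downwards.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is the minimum of p * m / rank over the current and higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A variant is declared significant at the 5% FDR level when its adjusted value is below 0.05. Because of the running minimum, two variants with different raw p-values can share the same adjusted value.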

Evaluation of Publication Bias
To check for potential publication bias, we examined funnel plots constructed from the odds ratios and standard errors of the analysis. Visual inspection of the funnel plots for del1/GSTT1 and del2/GSTM1 (Supplementary Information, Fig. S3), rs4646903/CYP1A1 (Supplementary Information, Fig. S6), and rs1048943/CYP1A1 (Figure 2 A') did not show any apparent asymmetry. Furthermore, Egger's regression test did not find significant evidence of asymmetry in the funnel plots for rs4646903/CYP1A1 (p = 0.35), del1/GSTT1 (p = 0.93), and del2/GSTM1 (p = 0.07). Thus, we did not find evidence for significant publication bias for the variants studied here.

Association based on the Geographical Region
Only 6 out of 24 variants were stratified according to the geographical zones of the Indian subcontinent. The remaining variants were reported from the same zones of the subcontinent, as mentioned above. Hence, they were not considered in this analysis. The analysis revealed del1/GSTT1 and rs1048943/CYP1A1 to be associated with lung cancer risk in the populations of the "North" and "South" zones of the subcontinent in a recessive and dominant model, respectively (Supplementary Information, Table S2). The variant del2/GSTM1 was associated with lung cancer risk only in the population of the "North" zone of the subcontinent in a recessive model, while the variant rs4646903/CYP1A1 was associated with lung cancer only in the population of the "East" zone in a dominant model (Supplementary Information, Table S2). The variant rs25487/XRCC1 was associated with lung cancer in the population of the "South" zone of the subcontinent in additive and dominant models, and also with lung cancer in the population of the "East" zone of the subcontinent in a recessive model (Supplementary Information, Table S2). The variant rs1042522/TP53 was associated with lung cancer in the "East" zone of the subcontinent (Supplementary Information, Table S2).
Similarly, rs1048943/CYP1A1 was associated with squamous cell carcinoma (AG+GG vs. AA: OR = 3.53; 95% CI = 2.05-6.08; p = 0.00005) with no heterogeneity. However, the variant rs1042522/TP53 was not associated with any of the histological subtypes. The associations of these variants with different histological subtypes in different models of analysis are depicted in forest plots (Figure 3). The detailed results of the histology-stratified meta-analysis are summarised in Table 3. The association of variants with specific histological subtypes supports the genetic heterogeneity of lung cancer and differences in the pathogenesis of the subtypes, which could be useful for better prognosis and therapy.

Subgroup Analysis-Association of variants based on Tobacco Smoking status
Subgroup analysis for del1/GSTT1, del2/GSTM1, rs4646903/CYP1A1, rs1048943/CYP1A1, rs25487/XRCC1, and rs9282861/SULT1A1 was done using the smoking status-stratified summary data separately within the "Smoker" and "Non-Smoker" subgroups. Data for the remaining variants were not available in the studies selected for this meta-analysis. The analysis revealed rs4646903/CYP1A1 and rs25487/XRCC1 to be associated with lung cancer in smokers in dominant (TC+CC vs. TT: OR = 2.26, 95% CI = 1.44-3.53; p = 0.0004) and recessive (AA vs. AG+GG: OR = 0.48, 95% CI = 0.27-0.86; p = 0.01) models, respectively (Table 4 A). No evidence of significant heterogeneity was found. Interestingly, the variant genotype 'AA' of rs25487/XRCC1 was found to confer protection from lung cancer in the smoker population of the Indian subcontinent. However, the variant rs1048943/CYP1A1 was associated with lung cancer in both smokers (AA vs. AG+GG: OR = 2.23, 95% CI = 1.47-3.37; p = 0.0002) and non-smokers (AA vs. AG+GG: OR = 1.74, 95% CI = 1.10-2.73; p = 0.02) in a dominant model. Interestingly, the magnitude of association was higher in smokers than in non-smokers. No significant heterogeneity was observed for the variant in either smokers or non-smokers. The associations of these variants with smoking status in different models of analysis are depicted in forest plots (Figure 4). Furthermore, we tested for effect modification of the 6 polymorphic variants by smoking status using a fixed-effect meta-regression model with smoking status as a moderator variable. The difference in effect size (log-OR) between smokers and non-smokers is denoted by the coefficient of the moderator, which revealed 2 polymorphic variants, viz. del1/GSTT1 (p = 0.015) and rs4646903/CYP1A1 (p = 0.01), whose effect on lung cancer risk was significantly modified by smoking (Table 4 B).

Association of rs1048943/CYP1A1 with lung cancer in a case-control dataset
The detailed demographic and clinical attributes of the study sample from East India are summarised in Supplementary Information, Table S3. Out of the 2 variants confirmed to be associated by the meta-analysis after FDR correction, namely rs3547/XRCC1 and rs1048943/CYP1A1, the latter showed evidence of significant heterogeneity (Q = 1.93, I² = 48.32, p = 0.092). For the variant rs1048943/CYP1A1, a consistent pattern of association with lung cancer was observed in the Northern region of the subcontinent, but a significant association was lacking in the Eastern region. We hypothesised that this heterogeneity might be explained by covariate-specific and subgroup-stratified analysis. One of the major reasons for heterogeneity in the crude analysis is the uneven distribution of confounders/subgroups across studies. Hence, to understand the source of heterogeneity, we genotyped this polymorphic variant in a case-control lung cancer dataset comprising 101 cases and 413 controls from a representative East Indian population, where several covariates had been measured.
The age and pack-years of smoking were converted to categorical variables based on their respective mean values. The case-control analysis revealed no significant association of rs1048943 of CYP1A1 with lung cancer in the additive effect model (GG vs. GA vs. AA: OR = 1.24, 95% CI = 0.79-1.93; p_adj = 0.35), dominant effect model (GG+GA vs. AA: OR = 1.33, 95% CI = 0.825-2.16; p_adj = 0.24), or recessive effect model (GG vs. AG+AA: OR = 0.49, 95% CI = 0.1-4.56; p_adj = 0.53), adjusted for pack-years of smoking, age, alcohol consumption, tobacco and betel quid chewing, and asbestos exposure (Supplementary Information, Table S4). A representative image of the RFLP analysis is shown in Supplementary Information, Fig. S7. No significant association of rs1048943 was found in any of the covariate subgroups, such as pack-years of smoking, age, alcohol, tobacco and betel quid chewing, and asbestos exposure (Supplementary Information, Table S5). However, stratification of genotype counts by histological subtype revealed rs1048943 of CYP1A1 to be nominally associated with lung adenocarcinoma (ADC) in additive (GG vs. GA vs. AA: OR = 1.75; 95% CI = 1.01-3.05; p_additive = 0.047) and dominant (GG+GA vs. AA: OR = 1.99; 95% CI = 1.10-3.63; p_dominant = 0.024) effect models adjusted for pack-years, alcohol consumption, tobacco and betel quid chewing, and asbestos exposure (Supplementary Information, Table S6). The results indicate the probable implication of the variant in specific histopathology of lung cancer, supporting the notion of genetic heterogeneity and variability of lung tumour subtypes, both in origin and pathogenesis.

Meta-analysis of rs1048943 (CYP1A1) and rs3547 (XRCC1) in world population
The significant finding of the above study was replicated in a worldwide sample population. A literature search with the specific keywords revealed a total of 2617 hits for the variant rs1048943 (CYP1A1), and 2224 hits for rs3547 (XRCC1), published worldwide till 31st December 2019. Our case-control association was included in the pool of hits, which increased the total number of hits for rs1048943 (CYP1A1) to 2618. Following the specific inclusion/exclusion criteria, 49 studies (including 1 study reporting both the variants) were selected for the meta-analysis (Supplementary Fig. S8). These 48 studies included 2 variants from 2 genes with 12,393 cases and 13,841 controls. All the covariate and demographic summary data for the two variants are listed in tabular form (Supplementary Information, Table S7).
Cochran's Q test showed evidence of significant heterogeneity between studies (p = 0.086) for rs1048943/CYP1A1 only. Fixed-effect meta-analysis revealed a significant association of rs1048943/CYP1A1 (AG+GG vs. AA: OR = 1.22; 95% CI = 1.06-1.41; p = 0.01) with lung cancer across the world population. The random-effects meta-analysis also revealed a significant association of rs1048943/CYP1A1 (AG+GG vs. AA: OR = 1.22; 95% CI = 1.001-1.483; p = 0.048) with lung cancer across the world population (Supplementary Information, Table S8). The analysis was performed only in the dominant model because of the nature of the data presented in the selected studies. The study-level and combined associations of rs1048943/CYP1A1 with lung cancer are depicted in forest plots (Supplementary Information, Fig. S9). A fixed-effect meta-analysis revealed a significant association of rs3547/XRCC1 with lung cancer across the world population in all three genetic models (Supplementary Information, Table S8). However, a random-effects meta-analysis showed a significant association of rs3547/XRCC1 with the disease in only the additive and dominant models (Supplementary Information, Table S8). The study-level and combined associations of rs3547/XRCC1 with lung cancer are depicted in forest plots (Supplementary Information, Fig. S10).
Qualitative assessment of the funnel plots revealed no significant publication bias for rs1048943/CYP1A1 (Supplementary Information, Fig. S9). Further, Egger's regression test found no significant asymmetry of the funnel plot for rs1048943/CYP1A1 (p = 0.96).
Stratification of the case-control genotype data by histological subtypes of lung cancer revealed a significant association of rs1048943/CYP1A1 with squamous cell carcinoma (AG+GG vs. AA: OR = 1.39; 95% CI = 1.07-1.82; p = 0.015) in the fixed-effect model only (Supplementary Information, Table S9 and Fig. S11). The variant rs3547/XRCC1 lacked relevant case-control genotype data stratified by histology. Subgroup analysis revealed a significant association of rs1048943/CYP1A1 with lung cancer in smokers in a dominant model (AG+GG vs. AA: OR = 1.54, 95% CI = 1.17-2.02; p = 0.002) (Supplementary Information, Table S10 and Fig. S12). Furthermore, we tested for effect modification of the variant by smoking status using a fixed-effect meta-regression model with smoking status as a moderator variable. The difference in effect size (log-OR) between smokers and non-smokers is denoted by the coefficient of the moderator, which revealed no significant modification of the effect on lung cancer risk by smoking (p = 0.47) (Supplementary Information, Table S11).
Further, the rs1048943/CYP1A1 genotype data were stratified by the geographical region of the selected studies, which revealed a significant association (p < 0.05) of the variant with lung cancer in the Indian and Australian populations in both fixed-effect and random-effects models (Supplementary Information, Table S12 and Fig. S13).

Discussion
Our study presents the rst comprehensive meta-analysis of 24 variants of 12 genes across 51 studies from the Indian subcontinent that provides an insight into the combined effect of each variant on overall and covariate-strati ed lung cancer risk in the region.The lack of signi cant publication bias con rms that the results were not overestimated under the in uence of any bias in the published articles.Although GWAS data mining revealed no signi cant association of rs1048943/CYP1A1 with lung cancer, it showed a signi cant association of the CYP1A1 gene with hypertension and habitual coffee consumption.Therefore, the association of the variant with lung cancer could be modi ed by coffee consumption or smoking tobacco.The variant rs1048943/CYP1A1 was found to be associated with lung cancer risk in East Asians 47 , which shows the colinearity of the ndings of this study to the present study as discussed here.The CYP1A1 (Cytochrome P450, family 1, member A1; 15q22-24) gene encodes a bulky phase I endoplasmic xenobiotic metabolism enzyme, present in lung tissue.The enzyme catalyses the activation of reactive electrophilic compounds, including benzo[a]pyrenes and PAHs present in tobacco smoke 48 .It promotes the formation of DNA adducts, which imparts a genotoxic effect that could lead to DNA lesions and cause lung cancer.The variant rs1048943A > G of CYP1A1 locus causes a single amino acid substitution (Ile > Val) in the heme-binding region, which increases enzyme activity, enhancing the activation of procarcinogens in tobacco smoke.It in uences the metabolism of environmental carcinogens, such as tobacco smoke that modi es lung cancer susceptibility 48 .
The superfamily of glutathione-S-transferases (GSTs) comprises multifunctional enzymes that catalyse the conjugation of the reduced tripeptide glutathione to various electrophilic and hydrophobic substrates, resulting in their detoxification and effective elimination from the cell. Thus, they help to reduce the carcinogenic load that accumulates in cells due to smoking. The null genotypes of the deletion polymorphisms of glutathione-S-transferase theta 1 (GSTT1) and glutathione-S-transferase mu 1 (GSTM1) are frequently found to be associated with lung cancer, with evidence of effect modification by tobacco smoking. The null genotype results in the absence of the enzyme from the cell, conferring a higher risk of lung cancer. Inconsistent reports on the association of the null genotypes of GSTT1 (del1) and GSTM1 (del2) have led to confusion regarding their true effect on disease pathogenesis 34,35. Ethnic/racial differences in the association of the GSTT1 null genotype with lung cancer have been reported, with the frequency of the null genotype found to be significantly higher in Asians than in Caucasians 49.
The Dickkopf-related protein 2 (DKK2) gene encodes a secretory protein belonging to the Dickkopf family. The DKK2 protein bears two cysteine-rich regions and is involved in embryonic development through the Wnt/β-catenin signalling pathway. DKK2 exhibits a bimodal function as an agonist or antagonist of the Wnt/β-catenin signalling pathway 50, depending on the cellular context and the presence of the co-factor kremen2. The activity of DKK2 is modulated by the Wnt co-receptors LDL-receptor related protein 5 (LRP5) and 6 (LRP6) 51. Aberrant expression of DKK2 has been observed in many tumours, including epigenetic silencing of the expression of DKK2 in ovarian carcinoma 52, hepatocellular carcinoma 53, and renal carcinoma 54. RNAi-mediated silencing of DKK2 is frequently observed in tongue squamous cell carcinoma 55 and oesophageal adenocarcinoma 56. These reports are suggestive of an anti-tumour effect of DKK2. However, the upregulation of DKK2 promotes cell proliferation and invasion through the Wnt signalling pathway in prostate cancer 57, Ewing sarcoma 58, and colorectal cancer 59. These examples illustrate how complex the cellular context-dependent function of DKK2 is. DKK2 has been found to promote angiogenesis that is distinct from VEGF-dependent angiogenesis 60 in forming closer interconnections of the vessels.
Interestingly, Dkk2-induced blood vessels consistently show higher coverage of endothelial cells (ECs) by pericytes and smooth muscle cells (SMCs), which are involved in vessel maturity and stability. Dkk2-mediated angiogenesis consists of a signalling cascade induced through LRP6-mediated APC/Asef2/Cdc42 activation. DKK2 promotes tumour progression by suppressing cytotoxic immune cell activation in colorectal carcinoma 61 and NSCLC 62 with APC mutations. In a recent study 26, the heterozygous genotypes of rs17037102/DKK2 and rs419558/DKK2 conferred an increased risk of lung cancer. A combination of all 3 genotypic variants of DKK2 confers a four-fold increase in lung cancer risk.
The protein encoded by XRCC1 (X-ray repair cross-complementing 1; 19q13.31) efficiently repairs single-strand DNA breaks formed by exposure to ionising radiation and alkylating agents. XRCC1 interacts with DNA ligase III, polymerase beta and poly (ADP-ribose) polymerase to participate in base excision repair. The protein plays a role in DNA processing during meiosis and DNA recombination in germinal epithelial cells. Moreover, XRCC1 harbours a rare microsatellite polymorphism, which is associated with varying radiosensitivity in cancer 63. Polymorphisms of XRCC1, such as Arg194Trp (exon 7), Arg280His (exon 10) and Arg399Gln (exon 11), were reported to confer an increased risk of lung cancer [64][65][66], with inconsistencies across different populations [67][68][69][70][71].
Analysis of the variants by histological subtype of lung cancer revealed del1/GSTT1 to be associated with lung adenocarcinoma, rs4646903/CYP1A1 with lung squamous cell carcinoma, and rs1048943/CYP1A1 with both lung adenocarcinoma and lung squamous cell carcinoma. Thus, stratification of the genotypes by histological subtype improved the potential for risk assessment. Identification of subtype-specific genetic risk markers helps to design targeted early detection and prevention strategies. Moreover, the identification of histotype-associated genetic markers may help define the mechanisms underlying the currently unknown origins of morphological variation, which could contribute to the development of personalised treatment modalities for subtype-specific lung cancer cases. Furthermore, subgroup analysis of 6 variants stratified by smoking status revealed rs1048943 of the CYP1A1 gene to be significantly associated with lung cancer in both smokers (p = 0.0002) and non-smokers (p = 0.02). However, meta-regression analysis revealed no effect modification of rs1048943 on lung cancer by smoking (p = 0.43). The variant rs4646903 of the CYP1A1 gene showed an association with lung cancer in smokers only (p = 0.0004). Interestingly, meta-regression analysis showed significant effect modification by smoking for del1 of the GSTT1 gene (p = 0.015) and rs4646903 of the CYP1A1 gene (p = 0.01), which suggests that the effect of these variants on lung cancer risk depends on smoking status.
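The logic of testing effect modification by smoking can be illustrated with a simpler stand-in for meta-regression: a two-subgroup z-test on the difference between the pooled log odds ratios of smokers and non-smokers. This is a sketch under stated assumptions, not the authors' analysis; the function names are hypothetical, the subgroup estimates are assumed to be independent, and a normal approximation is used for the p-value.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def subgroup_difference_test(log_or_a, se_a, log_or_b, se_b):
    """Test whether two subgroup log odds ratios differ (illustrative sketch).

    Assumption: the two pooled estimates come from independent sets of
    individuals, so the variance of their difference is the sum of variances.
    """
    diff = log_or_a - log_or_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    return {
        "ratio_of_ors": math.exp(diff),  # OR in subgroup A relative to subgroup B
        "z": z,
        "p": normal_two_sided_p(z),
    }
```

A variant can be significant in both subgroups and still show no effect modification: what matters here is the difference between the two subgroup estimates relative to its standard error, not whether each estimate differs from zero on its own.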
Based on the meta-regression analysis, there is no significant effect modification for the remaining variants, although it can be surmised that there may be interaction in the biological mechanisms leading to lung cancer. Subgroup analysis based on covariates such as age, sex, ethnicity, exposure type and dose was not done due to the lack of sufficient reports on the population of the Indian subcontinent. The Indian subcontinent consists of a highly heterogeneous population with considerable admixture among different ethnicities, which could modify the linkage disequilibrium structure of the population 72. This could contribute to significant heterogeneity between the studies. Therefore, a geographical region-based association analysis of the variants was performed to assess their effect estimates in different geospatial clusters of the Indian subcontinent.
The variant rs1048943A > G of the CYP1A1 locus is a non-synonymous polymorphic variant, which imparts an individual effect on lung cancer risk in various populations 31,73,74. On the other hand, rs3547/XRCC1 is a synonymous polymorphic variant that has shown no association with lung cancer in the Korean 75, Chinese 76, Latino 77, and African-American 77 populations. In the Korean population, the variant shows an association with lung cancer as part of a haplotype with other XRCC1 variants, such as rs3547-rs25487-rs25486-rs25489-rs2293036-rs1799778-rs1001581-rs12611088-rs3213282. The meta-analysis was followed by a case-control analysis in the East Indian population, which revealed no association of rs1048943A > G of CYP1A1 with overall lung cancer risk but did reveal an association with lung adenocarcinoma after adjustment for covariates. Thus, our case-control study reveals rs1048943/CYP1A1 to be a histological subtype-specific variant for lung cancer in the East Indian population, which could be a potential target for personalised therapy and histology-specific drug design for lung cancer patients. This finding is consistent with the outcomes of the current meta-analysis. The studies included in this meta-analysis that were reported from the Eastern region of the subcontinent also show a lack of association of the risk genotype (GG) of the polymorphic variant rs1048943/CYP1A1, as summarised in Table 1 and Supplementary Information, Table S1 78,79.
Our replication meta-analysis across world populations supports, with higher statistical power, the role of the variants rs1048943 (CYP1A1) and rs3547 (XRCC1) in conferring overall lung cancer risk, particularly in smokers. Interestingly, rs1048943 (CYP1A1) shows no effect modification by smoking status on lung cancer risk, which suggests that the association in smokers may have occurred by chance. In the larger sample, the variant rs1048943 (CYP1A1) shows an association with squamous cell carcinoma, which is indicative of a population-specific effect of the variant on different histological subtypes of lung cancer. The association of rs1048943 (CYP1A1) across various populations shows the relevance of the variant to lung cancer risk in a population-specific manner, which could be critical in designing personalised treatment and precision medicine for patients of diverse populations.
Interestingly, of the 12 genes selected for meta-analysis, 5 belong to the xenobiotic metabolism pathway, 3 belong to the DNA repair pathway, 3 belong to the Wnt/β-catenin pathway regulating various physiological aspects of lung cancer, and 1 belongs to the angiogenic pathway. The xenobiotic metabolism and DNA repair pathways could be the significant 'modifier' and 'driver' pathways leading to altered gene-environment interaction and the development of lung cancer. Genes of the xenobiotic metabolism pathway are involved in the metabolism and detoxification of tobacco smoke components to reduce the intracellular carcinogenic load. Some genes of the xenobiotic metabolism pathway also induce bio-activation of procarcinogens into potent carcinogens that can quickly form DNA adducts, leading to subsequent mutagenesis. Further, genes belonging to the DNA repair pathway function to repair DNA damage induced by tobacco smoking and radiation. Detailed text mining of the available reports that met the inclusion criteria revealed the association of xenobiotic metabolism genes (XMG) and DNA repair genes (DRG) with the risk of lung cancer development.
In the current study, we were unable to perform subgroup and meta-regression analysis for other covariate risk factors for all the variants due to the unavailability of sufficient data in the selected studies.
The subtype-specific polymorphic variants identified in the current meta-analysis could support personalised therapy and the development of precision medicine. The identification, through meta-analysis, of genetic variants with evidence of an influence on lung cancer risk may provide new insights into the fundamental biological pathways involved in the development of lung cancer, which could help future research. Further, identification of lung cancer risk variants may also be beneficial in the assessment of risk scores for accurate population risk stratification and decision making, which could be of potential value in targeting primary prevention and lung cancer screening modalities in a population-specific manner.

Figures

Table 1
Details of the selected studies for meta-analysis

Table 2
A comprehensive list of meta-analysis results showing the overall association of the variants with lung cancer, with crude odds ratio (OR), 95% confidence interval (CI), P FDR (Benjamini-Hochberg false discovery rate (FDR)-corrected P-value), heterogeneity indices H² and I², and publication bias by Egger's test. Both the genetic model and the model used for meta-analysis are also mentioned.
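The Benjamini-Hochberg correction named in this caption can be sketched as below. This is a generic illustration of the standard step-up procedure, not the authors' code; the function name `benjamini_hochberg` is hypothetical and any p-values passed to it are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A variant is then reported as FDR-significant when its adjusted value falls below the chosen threshold (for example 0.05), and as nominally significant when only the uncorrected p-value does.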

Table 4
[A] The results of the subgroup analysis stratified by smoking status. Significant associations are depicted in bold. [B] The results of the effect modification of variants on lung cancer by smoking status. Significant associations are depicted in bold. P-value < 0.05*, < 0.01**, < 0.001***. OR, crude odds ratio; 95% CI, 95% confidence interval; I² and H² are measures of heterogeneity. § P Het < 0.1 (Cochran's Q test). †