Dataset on the effect of Benzene exposure on genetic damage, hematotoxicity, telomere length and polymorphisms in metabolic and DNA repair genes

In this paper, we present an occupational dataset for evaluating the effects of benzene exposure on biomarkers of genetic damage, indicated by cytokinesis-block micronucleus (MN) frequency; hematotoxicity, indicated by white blood cell (WBC) count; and the molecular marker telomere length (TL). To further elucidate the mechanism of benzene-induced damage, we evaluate the effects of polymorphisms in environmental response genes, covering 18 sites in metabolic and DNA repair genes, and the interaction between gene polymorphism and benzene exposure. This dataset supplements the submitted research [1], which focuses on the TL biomarker. Subject sampling, biomarker detection, and data analysis are described in detail.


© 2020 Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Specifications Table
Subject: Public Health and Health Policy
Specific subject area: Environmental and occupational health
Type of data

Value of the Data
• These data provide biomarkers of genetic damage, indicated by cytokinesis-block micronucleus frequency, and of hematotoxicity, indicated by white blood cell counts, in benzene-exposed workers.
• These data also provide demographic characteristics and telomere length of benzene-exposed workers and controls.
• These data present 18 environmental response gene polymorphisms, including metabolic and DNA repair genes.

Data
In this paper, we provide data on benzene-exposed workers recruited from a shoe plant in Wenzhou, China. The dataset covers the effects of benzene exposure on peripheral blood biomarkers, namely cytokinesis-block micronucleus frequency, white blood cell (WBC) count, and telomere length, together with polymorphic sites in environmental response genes, including metabolic and DNA repair genes. Fig. 1 shows the analyzed dose-response relationship between the effect biomarkers and the cumulative exposure dose (CED) of benzene (mg/m³-year). The raw data are provided with this article as supplementary material. Table 1 presents the analyzed effects of gene polymorphisms and benzene exposure on peripheral blood WBC count. Multiple linear regression indicated that rs13181 in XPD (Lys751Gln), null GSTM1, and rs3813867 in CYP2E1 were associated with reduced WBC. For rs3212986 in ERCC1 (3′-UTR G8092T), the TG genotype was associated with increased WBC.

Subjects
In total, 294 benzene-exposed participants, aged 17 to 57 years, were enrolled in 2011 from Wenzhou, China. A control group of 102 indoor workers was selected in the same year from the same city. Before the study, we estimated the required sample sizes for MN frequency, TL, and genetic polymorphism.

Sample size estimation for TL
The sample size for comparing the means of two samples was calculated with significance level α = 0.05 and power 1 − β = 0.9, where δ is the mean difference between the benzene-exposed group and the control group. According to previous reports [2], δ = TL(exposure) − TL(control) = 1.37 − 1.26 = 0.11, the largest S = 0.23, and n1 = n2 = 90. Therefore, at least 90 benzene-exposed workers and 90 controls were needed in the data.

Note to Table 1: Multiple linear regression was used to analyze the effect of each gene polymorphism on white blood cell count in separate models, adjusting for gender, age, smoking and alcohol use. CED, cumulative exposure dose, was treated as a continuous variable.
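The equation behind the TL sample-size estimate (δ = 0.11, S = 0.23) is not reproduced in the text. As a rough check only, the sketch below applies the standard normal-approximation formula for comparing two means, n = 2[(z(α/2) + z(β))·S/δ]². It assumes a two-sided α and that S is a standard deviation; the function name `n_per_group_two_means` is hypothetical. It gives about 92 per group, close to but not identical to the reported 90, so the authors' exact formula or rounding may differ.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(delta, sd, alpha=0.05, power=0.9):
    """Per-group sample size for comparing two means (normal approximation).

    Assumption: two-sided alpha and a common standard deviation `sd`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 1.28
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2


# Values quoted in the text: delta = 1.37 - 1.26 = 0.11, largest S = 0.23
n_tl = n_per_group_two_means(delta=0.11, sd=0.23)
print(ceil(n_tl))
```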

Sample size estimation for polymorphisms of metabolic and DNA repair genes
According to the pre-experiment, the frequency of mutant allele carriers (heterozygous plus homozygous mutant) was 20%. With δ, the difference in TL between wild-type and mutant workers, postulated as 0.15, the sample-size equation gave 154 benzene-exposed workers. Generally, the larger the sample size, the more reliable the results for polymorphism data. Our sample size can detect a TL difference of 0.15.
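The equation for the polymorphism estimate is not shown in the text either. One formula that appears consistent with the reported figure is the total sample size for comparing two means with unequal group fractions, N = (z(α/2) + z(β))²·S² / (δ²·p·(1 − p)). The sketch below is a plausibility check under stated assumptions, not the authors' confirmed method: it assumes a two-sided α = 0.05, power 0.9, and S = 0.23 carried over from the TL estimate, and the function name is hypothetical.

```python
from statistics import NormalDist


def total_n_unequal_groups(delta, sd, p, alpha=0.05, power=0.9):
    """Total sample size when a fraction `p` of subjects falls in one group.

    Assumption: two-sided alpha, common standard deviation `sd`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 * sd ** 2 / (delta ** 2 * p * (1 - p))


# Text values: carrier frequency 20%, delta = 0.15; sd = 0.23 is an assumption
n_poly = total_n_unequal_groups(delta=0.15, sd=0.23, p=0.20)
print(round(n_poly, 1))
```

Under these assumptions the result is about 154.4, in line with the 154 benzene-exposed workers reported.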

Sample size estimation for MN frequency
Micronucleus (MN) frequency follows a Poisson distribution, and the sample size was calculated with the formula for comparing the rates of two samples, where λ1 and λ2 are the means of the control and exposed workers, respectively. We postulated α = 0.05 and β = 0.10. According to previous data [3], λ1 and λ2 were 7.06 and 4.83, giving a sample size of 21 each for the exposed and control groups. According to previous data from our team [4], with the lowest MN frequency in the exposed workers, a sample size of 44 was needed for the exposed and control groups in the dataset.
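The Poisson formula itself is missing from the text. As an illustration only, the sketch below uses a common normal approximation for comparing two Poisson means, n = (z(α) + z(β))²·(λ1 + λ2) / (λ1 − λ2)². The one-sided α is an assumption chosen because it matches the reported value; the same formula with a two-sided α would give about 25. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_poisson(lam1, lam2, alpha=0.05, beta=0.10):
    """Per-group sample size for comparing two Poisson means.

    Assumption: one-sided alpha, normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(1 - beta)
    return (z_alpha + z_beta) ** 2 * (lam1 + lam2) / (lam1 - lam2) ** 2


# Means quoted in the text from previous data [3]
n_mn = n_per_group_poisson(lam1=7.06, lam2=4.83)
print(ceil(n_mn))
```

Rounded up, this gives 21 per group, matching the figure in the text. The second estimate of 44 depends on data from [4] that are not given here, so it is not recomputed.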
The KASP genotyping platform requires two allele-specific, competing forward primers, one labeled with FAM and the other with HEX, and one common reverse primer (Table 2). The last column of Table 2 gives the sequence of each polymorphic site and its mutant alleles.