Dataset of allele, genotype and haplotype frequencies of four LIN28B gene polymorphisms analyzed for association with age at menarche in Russian women

In this paper, we present the allele, genotype and haplotype frequencies of four single nucleotide polymorphisms (SNPs) in the LIN28B gene (rs4946651, rs7759938, rs314280, rs314276) in a sample of Russian women. These SNPs had previously been identified as associated with age at menarche in genome-wide association studies (GWAS). Information about age at menarche was obtained using a questionnaire. The frequencies of alleles, genotypes and haplotypes of the four SNPs are reported for three groups: the whole sample, individuals with early age at menarche (<12 years), and those with average age at menarche (12–14 years).


© 2019 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Subjects
The participants were recruited through the Perinatal Centre of the Belgorod Regional Clinical Hospital of St. Joasaph during 2008–2013. All participants were unrelated women of self-declared Russian descent living in Central Russia [8]. The following exclusion criteria were adopted: non-Russian descent, a birthplace outside of Central Russia, malignant tumors of the pelvic organs and breast, benign tumors and hyperplastic disorders of the reproductive organs (leiomyoma, endometriosis, and endometrial hyperplasia), chronic severe diseases of the vital organs (heart, respiratory or renal failure), and severe autoimmune diseases. The research protocol was approved by the Regional Ethics Committee of Belgorod State University. Written informed consent for participation was obtained from all individuals enrolled in the research.
Information about age at menarche (AAM) was obtained using a questionnaire. AAM was defined as the age (in full years) at first menses. Each participant was asked the question: "How old were you when you had the first menses?" Women with AAM ≥18 years (n = 4) and women who refused to answer (n = 13) were excluded from the research. In total, 674 females participated in the research.
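The exclusion rule and the AAM groups described in this article can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study: the function name `classify_aam` and the group labels are hypothetical, and the label "other" for women with AAM of 15–17 years (who appear only in the whole sample) is an assumption, since the article reports frequencies only for the early and average groups.

```python
from typing import Optional


def classify_aam(aam_years: Optional[int]) -> Optional[str]:
    """Assign a self-reported AAM (full years) to an analysis group.

    Returns None for women who would be excluded: no answer given,
    or AAM of 18 years or more.
    """
    if aam_years is None or aam_years >= 18:
        return None  # excluded from the research
    if aam_years < 12:
        return "early"  # early AAM: <12 years
    if aam_years <= 14:
        return "average"  # average AAM: 12-14 years
    # Assumption: AAM of 15-17 years counts toward the whole sample only.
    return "other"


if __name__ == "__main__":
    for answer in (10, 12, 14, 16, 18, None):
        print(answer, classify_aam(answer))
```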

Data format
Raw and analyzed data

Experimental factors
Total genomic DNA was isolated from buffy coat using the standard phenol-chloroform method.

Experimental features
DNA samples were genotyped using the Sequenom MassARRAY® iPLEX platform, which is based on MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry.

Data source location
Belgorod, Russia

Data accessibility
The data is available with this article

Value of the data
The genetic variants in LIN28B gene may play a role in age at menarche. The data on the allele, genotype and haplotype frequencies are important because they contribute to understanding genetic structure of populations. The data can be used in research of a genetic basis of age at menarche and menarche-associated multifactorial diseases (obesity, breast cancer, osteoporosis, uterine leiomyoma, endometriosis, preeclampsia and others) in various populations.

Blood sample collection and DNA handling
The phlebotomy was performed by a certified nurse. Five milliliters of blood were taken from the ulnar vein into a plastic vial (Vacutainer®) with 0.5 M EDTA solution (pH = 8.0). Lymphocyte DNA was extracted by the standard phenol-chloroform technique and quantified with a Nanodrop 2000 spectrophotometer (Thermo Scientific, Inc.). Only samples with A260/A280 = 1.7–2.0 were used for the analysis.
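The purity criterion can be expressed as a simple filter. The sketch below is illustrative only: the function name `passes_purity` and its parameters are hypothetical, and treating both bounds of the 1.7–2.0 interval as inclusive is an assumption.

```python
def passes_purity(a260: float, a280: float,
                  low: float = 1.7, high: float = 2.0) -> bool:
    """Return True if the A260/A280 ratio lies within [low, high].

    a260 and a280 are absorbance readings at 260 nm and 280 nm.
    The inclusive bounds are an assumption of this sketch.
    """
    if a280 <= 0:
        return False  # ratio undefined; treat the reading as unusable
    ratio = a260 / a280
    return low <= ratio <= high


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(passes_purity(1.85, 1.0))  # within the accepted interval
    print(passes_purity(1.50, 1.0))  # below the accepted interval
```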

SNP genotyping
DNA samples were genotyped using the Sequenom MassARRAY® iPLEX platform at the Centre of Genomic Sciences (University of Hong Kong). The procedures for DNA sample preparation and data quality control are described elsewhere [10].

Statistical analysis
Conformity of the SNP genotype distributions to Hardy-Weinberg equilibrium was checked using the chi-square test. No significant differences in allele frequencies were found between the group with early age at menarche (<12 years) and the group with average age at menarche (12–14 years) (p > 0.05). The Haploview version 4.2 software (https://www.broadinstitute.org/haploview/haploview) was used to quantify the linkage disequilibrium (LD) between rs4946651, rs7759938, rs314280 and rs314276 in the LIN28B gene. Haplotype frequencies were determined using the EM algorithm. The LD block structure was defined using the Solid Spine of LD algorithm [11] provided by Haploview 4.2. The degree of linkage disequilibrium between the four SNPs in each group was estimated as Lewontin's coefficient D′, where no color (D′ = 0) indicates that LD is weak or nonexistent and dark red (D′ = 1) indicates strong pairwise LD between SNPs (Fig. 1).
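For readers who wish to reproduce the two statistics named above, the following Python sketch shows the textbook forms of the Hardy-Weinberg chi-square test (1 degree of freedom) and of Lewontin's D′ for a pair of biallelic SNPs. It is a minimal illustration under stated assumptions, not the Haploview implementation: the function names are hypothetical, the genotype counts in the usage example are invented, and the haplotype frequency passed to `lewontin_d_prime` is assumed to have been estimated already (for example, by an EM algorithm).

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Takes observed genotype counts (AA, AB, BB) and returns
    (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def lewontin_d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Absolute value of Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2.
    p_a, p_b: frequencies of allele A and allele B, respectively.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max <= 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max


if __name__ == "__main__":
    # Invented genotype counts, for illustration only.
    chi2, p_value = hwe_chi_square(300, 290, 84)
    print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
    # Invented allele and haplotype frequencies, for illustration only.
    print(f"D' = {lewontin_d_prime(0.30, 0.35, 0.40):.3f}")
```

The p-value uses the identity that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, which avoids a dependency on external statistics libraries.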