Human exome sequence data in support of somatic mosaicism in carotid atherosclerosis

Understanding the mechanisms underlying the connection between somatic mosaicism and cardiovascular disease is likely essential for the future of personalized medicine. This article is aimed at providing data on somatic mosaicism in human carotid atherosclerosis. An advanced carotid atherosclerotic plaque and white blood cells were collected simultaneously from each patient (eight Slavic males, aged 67 ± 3.8 years [mean ± SD]) to assess the spectrum of germline and somatic genetic variants. Exome sequencing of DNA from the samples was performed with the SureSelect Clinical Research Exome Enrichment Kit (Agilent Technologies) and HiSeq 1500 (Illumina). The dataset contains germline and somatic single-nucleotide variants and small indels identified in the advanced carotid atherosclerotic plaque and white blood cells of each patient. This dataset does not include copy number variants owing to a lack of suitable tools for reliable calculation of copy numbers from exome sequencing data on cancer-unrelated samples. The dataset should help to understand somatic mosaicism in cardiovascular diseases and to identify copy number variants by means of more appropriate newer tools in the future.


© 2021 The Author(s)

Value of the Data
• The data on both germline and somatic mutations associated with carotid atherosclerosis can serve as the basis for studies on its pathogenesis and searches for biomarkers of disease severity and for new therapeutic targets.
• The dataset should be useful for identifying copy number variants with more appropriate newer tools in the future.
• The data can be helpful for further research into somatic mosaicism in atherosclerosis and clonal expansion in noncancerous tissues.
• The data can be integrated with other multi-omics studies or existing databases to better understand molecular mechanisms of complex traits and diseases.

Data Description
Clinical information about the patients with advanced carotid atherosclerosis is available in Mendeley Data and comprises the following characteristics: ID, age, sex, height, weight, body mass index (BMI), waist circumference (WC), ischemic heart disease (IHD) symptoms, comorbidities, atherosclerotic-plaque description, and echocardiogram data. All patients had a history of coronary artery disease, abdominal obesity, arterial hypertension, hypercholesterolemia, and New York Heart Association class II heart failure.
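For readers who wish to cross-check the BMI column against the height and weight columns, BMI is weight in kilograms divided by the square of height in metres. The sketch below assumes that height is recorded in centimetres and weight in kilograms; these units, the function name, and the example values are assumptions for illustration and are not taken from the dataset.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """Return BMI in kg/m^2 from weight in kilograms and height in centimetres."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # convert centimetres to metres
    return weight_kg / (height_m ** 2)


# Hypothetical values, not a patient from the dataset.
example_bmi = body_mass_index(weight_kg=90.0, height_cm=180.0)
```

If the clinical table uses other units, the conversion line would need to be adjusted accordingly.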
All exomes were sequenced on the Illumina HiSeq platform in 2 × 150 bp paired-end format. The dataset provides raw data, alignment data, and called and filtered data for both the white blood cells and the advanced carotid atherosclerotic plaque collected from each of the eight patients.
FASTQ raw data files and BAM alignment data files (hg19) were deposited at NCBI under BioProject accession PRJNA758796 (https://www.ncbi.nlm.nih.gov/bioproject/758796). The data cover eight patients, each of whom has two BioSamples: white blood cells and the carotid atherosclerotic plaque (Table 1). Each BioSample accession is associated with SRA accessions linked to the FASTQ raw data files and BAM alignment data files.
Germline single-nucleotide polymorphisms (SNPs) and small indels detected in white blood cells and carotid atherosclerotic plaques of patients with atherosclerosis are stored in Mendeley Data (doi: 10.17632/hj68dfm5sm.1) in VCF format. The data are not filtered. A brief description of this dataset is presented in Table 2. In total, 1 281 674 genetic variants were called, of which 199 034 are small indels.

Somatic SNPs and small indels were deposited in Mendeley Data (doi: 10.17632/hj68dfm5sm.1) in the same VCF format. The data are not filtered. A brief description of this dataset is presented in Table 3.
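Counts of single-nucleotide variants and small indels like those reported above can be approximated directly from a VCF file. The following is a minimal sketch using only the Python standard library; the function names and the demo records are hypothetical, and the classification rule (one-base REF and ALT is a single-nucleotide variant, unequal lengths is an indel) is a common convention rather than the procedure used to produce the tables. Because the sketch counts each ALT allele separately, its totals may differ from per-record counts reported by other tools.

```python
import gzip
from collections import Counter
from typing import Iterable, TextIO


def open_vcf(path: str) -> TextIO:
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as 'snv', 'indel', or 'other'."""
    if alt in {".", "*"} or alt.startswith("<") or "[" in alt or "]" in alt:
        return "other"  # missing, spanning-deletion, symbolic, or breakend allele
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # equal-length multi-base substitution


def count_variant_types(lines: Iterable[str]) -> Counter:
    """Count ALT alleles by type, skipping header and blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # VCF columns REF and ALT
        for alt in alts:
            counts[classify_allele(ref, alt)] += 1
    return counts


# Hypothetical demo records, not taken from the deposited files.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.",
    "chr2\t300\t.\tC\tT,CA\t50\tPASS\t.",
]
demo_counts = count_variant_types(demo)
```

For a downloaded file, the same function can be applied to an open handle, for example `with open_vcf(path) as handle: counts = count_variant_types(handle)`, where `path` points to a local copy of one of the VCF files.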

Experimental Design, Materials and Methods
Matched white blood cells and a carotid atherosclerotic plaque were collected from each patient with carotid atherosclerosis (eight patients, Slavic males, aged 67 ± 3.8 years [mean ± SD]). The specimens of atherosclerotic plaques were obtained during planned carotid endarterectomy.
All patients had carotid artery stenosis of more than 90%. Tissue biopsies were frozen and stored in nitrogen prior to DNA extraction.
Total genomic DNA was extracted from the white blood cells and carotid atherosclerotic plaques with the DNeasy Blood and Tissue Kit (Qiagen) according to the manufacturer's protocol. Before DNA extraction, each plaque specimen was briefly washed separately in a buffer consisting of 0.05% 2-mercaptoethanol, 0.5 M EDTA (pH 8.0), and 1 × PBS. The specimens were then shredded with a disposable scalpel. This step helps proteinase K and the lysis buffer from the kit to work properly and minimizes the risk of rapid DNA degradation.
Library preparation was done using the SureSelect XT2 (SSXT2) Reagent Kit and the SureSelect Clinical Research Exome V2 Exome Enrichment Kit (Design ID S06588914), according to the manufacturer's protocol (Agilent Technologies). The quality and concentration of DNA libraries were assessed on a Qubit 3.0 instrument (Thermo) and TapeStation 2200 (Agilent Technologies). Sequencing was carried out on the Illumina HiSeq 1500 platform in 2 × 150 bp paired-end format.
Reads were aligned to the hg19 reference genome with the BWA-MEM algorithm [1]. Germline and somatic SNPs and indels were called with the Best Practices Workflows of the Genome Analysis Toolkit (ver. 4) [2]. Somatic SNPs and indels were identified by comparing the exome sequences of the leukocytes and the carotid atherosclerotic plaque of each patient. Summary statistics were computed with vcftools [3] and bcftools [4].
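The core commands of such a workflow can be sketched as follows. The text names only BWA-MEM and the GATK 4 Best Practices Workflows, so the choice of HaplotypeCaller for germline calling and Mutect2 for plaque-versus-blood somatic calling is an assumption, as are all file names, sample names, and the thread count. Intermediate steps such as sorting, duplicate marking, and base quality recalibration are omitted. The functions only build argument lists and do not run anything.

```python
from typing import List


def bwa_mem_cmd(ref_fasta: str, fastq_r1: str, fastq_r2: str,
                sample: str, threads: int = 4) -> List[str]:
    """Build a paired-end `bwa mem` command; SAM output goes to stdout."""
    # bwa expects a literal backslash-t between read-group fields.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return ["bwa", "mem", "-t", str(threads), "-R", read_group,
            ref_fasta, fastq_r1, fastq_r2]


def haplotypecaller_cmd(ref_fasta: str, bam: str, out_vcf: str) -> List[str]:
    """Build a GATK4 HaplotypeCaller command (assumed germline caller)."""
    return ["gatk", "HaplotypeCaller", "-R", ref_fasta, "-I", bam, "-O", out_vcf]


def mutect2_cmd(ref_fasta: str, plaque_bam: str, blood_bam: str,
                blood_sample_name: str, out_vcf: str) -> List[str]:
    """Build a GATK4 Mutect2 command with blood as the matched normal (assumed)."""
    return ["gatk", "Mutect2", "-R", ref_fasta,
            "-I", plaque_bam, "-I", blood_bam,
            "-normal", blood_sample_name, "-O", out_vcf]


# Hypothetical file and sample names for one patient.
align = bwa_mem_cmd("hg19.fa", "P1_plaque_R1.fastq.gz", "P1_plaque_R2.fastq.gz",
                    sample="P1_plaque")
germline = haplotypecaller_cmd("hg19.fa", "P1_blood.bam", "P1_blood.vcf.gz")
somatic = mutect2_cmd("hg19.fa", "P1_plaque.bam", "P1_blood.bam",
                      blood_sample_name="P1_blood", out_vcf="P1_somatic.vcf.gz")
```

Each list can be passed to `subprocess.run` once the tools and reference files are available locally. The value given to `-normal` must match the sample name in the read group of the blood BAM file.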

Ethics Statements
The study protocol was approved by the Ethical Committee of Research Institute of Cardiology (approval number: 203). Informed consent was obtained from all patients involved in the experiments.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.