A- Clinical samples:
This retrospective study included 51 Lebanese patients with BC who were referred to Hôtel Dieu de France hospital (HDF) in Beirut between 2000 and 2019 and who met the following inclusion criteria: age ≥ 18 years and a current or previous diagnosis of BC at any stage.
Approval to conduct the study was obtained from the Ethics Committee of Saint Joseph University, Beirut, Lebanon (HDF 1176), and the study was performed in accordance with the Declaration of Helsinki. All patients who agreed to participate signed an informed consent covering participation, sample collection, and data publication. Medical data for patients hospitalized at HDF were retrieved from the hospital archive. Each patient completed a questionnaire assessing the main environmental risk factors for BC. Peripheral blood was then collected from each participant, and DNA was extracted using the salting-out method (18).
B- WES analysis:
To identify germline variants responsible for BC in the studied population, WES was performed, followed by a targeted analysis of a panel of 127 genes of interest.
Exon capture and sequencing: The exome was captured using SureSelect Human All Exon reagents (Agilent Technologies, Santa Clara, CA) according to the manufacturer’s standard protocol. The concentration of each library was determined using Agilent’s QPCR NGS Library Quantification Kit (G4880A). Samples were pooled prior to sequencing, with each sample at a final concentration of 10 nM. Sequencing was performed on the Illumina HiSeq 2000 platform using TruSeq v3 chemistry.
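The normalization step above reduces to the dilution relation C1 × V1 = C2 × V2. The Python sketch below computes, for each library, the volumes of library and diluent needed to reach 10 nM. It assumes that "a final concentration of each sample equal to 10 nM" means each library was diluted to 10 nM before pooling; the qPCR concentrations and the 20 µL working volume are hypothetical placeholders, not values from this study.

```python
"""Dilution volumes to bring each exome library to 10 nM before pooling.

Sketch only: the library concentrations and the 20 uL working volume are
hypothetical placeholders, not values from this study.
"""
from typing import Tuple

TARGET_NM = 10.0  # per-library concentration stated in the text


def dilution_volumes(stock_nm: float, final_volume_ul: float,
                     target_nm: float = TARGET_NM) -> Tuple[float, float]:
    """Return (library_ul, diluent_ul) using C1 * V1 = C2 * V2."""
    if stock_nm < target_nm:
        # A library below target cannot be diluted up to it.
        raise ValueError("library is already below the target concentration")
    library_ul = target_nm * final_volume_ul / stock_nm
    return library_ul, final_volume_ul - library_ul


if __name__ == "__main__":
    # Hypothetical qPCR concentrations (nM) for three libraries.
    qpcr_nm = {"S01": 40.0, "S02": 25.0, "S03": 12.5}
    for sample, conc in qpcr_nm.items():
        lib, dil = dilution_volumes(conc, final_volume_ul=20.0)
        print(f"{sample}: {lib:.2f} uL library + {dil:.2f} uL diluent")
```

Combining equal volumes of libraries normalized this way yields an equimolar pool; the function raises an error instead of returning a negative diluent volume when a library is already below 10 nM.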
Mapping and alignment: Read files (FASTQ) were generated from the sequencing platform using the manufacturer’s proprietary software. Reads were aligned to the hg19/b37 reference genome using the Burrows-Wheeler Aligner (BWA) package v0.6.1 (19). Local realignment of the mapped reads around potential insertion/deletion (indel) sites was carried out with the Genome Analysis Toolkit (GATK) v1.6 (20). Duplicate reads were marked using Picard v1.62, and additional BAM file manipulations were performed with SAMtools v0.1.18 (21). Base quality scores (Phred scale) were recalibrated using GATK’s covariate recalibration. SNPs and indels were called for each sample using the GATK Unified Genotyper (22) with high-stringency settings and annotated with VarAFT software 2.131 (23), which incorporates information from dbSNP147 and ExAC (24).
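Two of the steps above rest on simple rules: Phred-scaled base qualities, where Q = −10 log10(P) for an error probability P, and duplicate marking. The Python sketch below decodes a quality string, assuming the Phred+33 offset used by SAM and Illumina 1.8+ FASTQ, and flags duplicates with a simplified rule loosely modeled on Picard's default scoring (reads sharing chromosome, 5′ position and strand form a group, and the read with the highest summed base quality is kept). The `Read` fields are hypothetical, and this is not Picard's implementation, which also handles read pairs, clipping and libraries.

```python
"""Phred quality decoding and a simplified duplicate-marking rule.

Assumptions: Phred+33 ASCII encoding and single-end reads with hypothetical
fields. This is not Picard's MarkDuplicates, which also handles read pairs,
clipping and libraries.
"""
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


def phred_to_error(q: int) -> float:
    """Error probability P for a Phred score Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def decode_qualities(qual: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ/SAM quality string to integer Phred scores."""
    return [ord(c) - offset for c in qual]


@dataclass
class Read:
    name: str
    chrom: str
    five_prime: int  # 5' alignment coordinate on the reference
    reverse: bool    # True if the read aligned to the reverse strand
    qual: str        # Phred+33 quality string


def mark_duplicates(reads: List[Read]) -> Set[str]:
    """Return the names of reads flagged as duplicates.

    Reads sharing chromosome, 5' position and strand form one group; the read
    with the highest summed base quality is kept, the others are flagged.
    """
    groups: Dict[Tuple[str, int, bool], List[Read]] = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.five_prime, read.reverse)].append(read)
    duplicates: Set[str] = set()
    for group in groups.values():
        best = max(group, key=lambda r: sum(decode_qualities(r.qual)))
        duplicates.update(r.name for r in group if r is not best)
    return duplicates
```

For example, the quality character `I` decodes to Q40 (an error probability of 1 in 10,000) and `5` to Q20 (1 in 100).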
C- Genes of interest:
WES raw data were evaluated for GSTM1 and NAT2, two genes previously associated with BC, and for the following panel of genes involved in different types of hereditary cancer: AIP, ALK, APC, ATM, ATR, AXIN2, BAP1, BARD1, BLM, BMPR1A, BRCA1, BRCA2, BRIP1, BUB1B, CASR, CDC73, CDH1, CDK4, CDKN1B, CDKN1C, CDKN2A, CEBPA, CHEK2, CTC1, CTNNA1, CYLD, DDB2, DICER1, DIS3L2, DKC1, EGLN1, EPCAM, ERCC1, ERCC2, ERCC3, ERCC4, ERCC5, EXT1, EXT2, EZH2, FAN1, FANCA, FANCB, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCI, FANCL, FANCM, FH, FLCN, GALNT12, GATA2, GPC3, GREM1, HOXB13, HRAS, KIF1B, KIT, LZTR1, MAX, MC1R, MEN1, MET, MITF, MLH1, MLH3, MRE11, MSH2, MSH6, MUTYH, NBN, NF1, NF2, NHP2, NOP10, NTHL1, PALB2, PDGFRA, PHOX2B, PMS2, POLD1, POLE, POLH, POT1, PRKAR1A, PRSS1, PTCH1, PTCH2, PTEN, RAD50, RAD51C, RAD51D, RB1, RECQL4, RET, RUNX1, SDHA, SDHAF2, SDHB, SDHC, SDHD, SLC45A2, SLX4, SMAD4, SMARCA4, SMARCB1, SMARCE1, STK11, SUFU, TERC, TERT, TINF2, TMEM127, TP53, TSC1, TSC2, TYR, VHL, WRAP53, WRN, WT1, XPA, XPC, and XRCC2.
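Restricting the exome-wide call set to the genes named above is a set-membership filter. In the Python sketch below, the gene symbols are copied from the list in the text, while the variant records and their `gene` key are hypothetical stand-ins for annotated VarAFT output.

```python
"""Restricting annotated WES variants to the genes of interest.

Sketch only: the variant records and the "gene" key are hypothetical; the
gene symbols are copied from the panel listed in the text.
"""
from typing import Dict, List

CANDIDATE_GENES = {"GSTM1", "NAT2"}

HEREDITARY_PANEL = set("""
AIP ALK APC ATM ATR AXIN2 BAP1 BARD1 BLM BMPR1A BRCA1 BRCA2 BRIP1 BUB1B
CASR CDC73 CDH1 CDK4 CDKN1B CDKN1C CDKN2A CEBPA CHEK2 CTC1 CTNNA1 CYLD DDB2
DICER1 DIS3L2 DKC1 EGLN1 EPCAM ERCC1 ERCC2 ERCC3 ERCC4 ERCC5 EXT1 EXT2 EZH2
FAN1 FANCA FANCB FANCC FANCD2 FANCE FANCF FANCG FANCI FANCL FANCM FH FLCN
GALNT12 GATA2 GPC3 GREM1 HOXB13 HRAS KIF1B KIT LZTR1 MAX MC1R MEN1 MET MITF
MLH1 MLH3 MRE11 MSH2 MSH6 MUTYH NBN NF1 NF2 NHP2 NOP10 NTHL1 PALB2 PDGFRA
PHOX2B PMS2 POLD1 POLE POLH POT1 PRKAR1A PRSS1 PTCH1 PTCH2 PTEN RAD50 RAD51C
RAD51D RB1 RECQL4 RET RUNX1 SDHA SDHAF2 SDHB SDHC SDHD SLC45A2 SLX4 SMAD4
SMARCA4 SMARCB1 SMARCE1 STK11 SUFU TERC TERT TINF2 TMEM127 TP53 TSC1 TSC2 TYR
VHL WRAP53 WRN WT1 XPA XPC XRCC2
""".split())

GENES_OF_INTEREST = CANDIDATE_GENES | HEREDITARY_PANEL


def in_genes_of_interest(variants: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only variants whose annotated gene symbol is in the gene set."""
    return [v for v in variants if v["gene"] in GENES_OF_INTEREST]
```

Encoding the panel as a set also makes its size easy to check against the 127 genes stated for the panel.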
D- Analysis and interpretation of the detected variants:
Variants in the genes of interest were filtered to retain only protein-altering variants, including truncating variants (stop gain/loss, start loss, or frameshift), canonical splice-site variants, in-frame indels affecting protein-coding regions, variants within the intron-exon boundary (ten bases flanking the exon boundaries), and missense variants, based on a frequency of < 10% in dbSNP v137, the 1000 Genomes Project, the Genome Aggregation Database (gnomAD), and our in-house database of 300 exomes. The resulting variants were evaluated against ClinVar; those reported as pathogenic in ClinVar at least once were initially retained and then reassessed using prediction software and a literature review (25).
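The filtering criteria above can be expressed as a few predicates. The Python sketch below is one reading of the text, not the authors' pipeline: the consequence labels, field names and `Variant` record are hypothetical, the < 10% frequency threshold is applied to every retained variant in each recorded database, a database with no entry for a variant is treated as passing, and the later reassessment with prediction software and the literature is left out because it is not a mechanical rule.

```python
"""Protein-altering, frequency and ClinVar filters for panel variants.

Sketch under assumptions: consequence labels, field names and the record
layout are hypothetical; thresholds mirror the text (10 bp boundary, < 10%).
"""
from dataclasses import dataclass, field
from typing import Dict, List, Optional

TRUNCATING = {"stop_gained", "stop_lost", "start_lost", "frameshift"}
OTHER_PROTEIN_ALTERING = {"canonical_splice", "inframe_indel", "missense"}
MAX_BOUNDARY_DISTANCE = 10  # bases flanking each exon boundary
MAX_FREQUENCY = 0.10        # frequency must be strictly below 10%


@dataclass
class Variant:
    gene: str
    consequence: str
    distance_to_exon: Optional[int] = None  # bases into the intron; None if exonic
    frequencies: Dict[str, float] = field(default_factory=dict)  # database -> frequency (0 to 1)
    clinvar_pathogenic_reports: int = 0


def is_protein_altering(v: Variant) -> bool:
    """True for truncating, splice, in-frame indel or missense consequences,
    or for intronic variants within ten bases of an exon boundary."""
    if v.consequence in TRUNCATING | OTHER_PROTEIN_ALTERING:
        return True
    return (v.distance_to_exon is not None
            and v.distance_to_exon <= MAX_BOUNDARY_DISTANCE)


def is_rare(v: Variant) -> bool:
    """True when every recorded database frequency is below 10%.
    Databases missing from `frequencies` are not checked (assumption)."""
    return all(f < MAX_FREQUENCY for f in v.frequencies.values())


def clinvar_candidates(variants: List[Variant]) -> List[Variant]:
    """Variants passing both filters and reported pathogenic at least once."""
    return [v for v in variants
            if is_protein_altering(v) and is_rare(v)
            and v.clinvar_pathogenic_reports >= 1]
```

Keeping each criterion in its own predicate makes it easy to report how many variants each step removes, or to swap in a different reading of the frequency rule.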