Pharmacogenetics driving personalized medicine: analysis of genetic polymorphisms related to breast cancer medications in Italian isolated populations

Breast cancer is the most common cancer in women and is characterized by a highly variable clinical outcome among individuals treated with equivalent regimens and novel targeted therapies. In this study, we used a population-based approach that intersects high-throughput genotype data from Friuli Venezia Giulia (FVG) isolated populations with publicly available pharmacogenomics information to estimate the frequency of genotypes correlated with responsiveness to breast cancer treatment, with the aim of improving the clinical management of this disease in an efficient and cost-effective way. A list of 80 variants reported to be related to the efficacy or toxicity of breast cancer drugs was obtained from the PharmGKB database. Forty-one were present in the FVG, 1000G European (EUR) and ExAC (Non-Finnish European) databases. Their frequency was extracted using PLINK software and the differences were tested by Fisher's exact test. Statistical analyses revealed that 13 out of the 41 (32 %) variants were significantly different in frequency in our sample as compared to the EUR/ExAC cohorts. For nine variants a level of evidence (LOE) was available; these included polymorphisms related to cyclophosphamide, tamoxifen, doxorubicin, fluoropyrimidine and paclitaxel. In particular, for trastuzumab two variants were detected: (1) rs1801274-G, within FCGR2A and associated with decreased efficacy (LOE 2B); (2) rs1136201-G, located within ERBB2 and associated with increased toxicity (LOE 3). Both variants were underrepresented in the FVG population compared to the EUR/ExAC population, suggesting a high therapeutic index of this drug in our population. Moreover, as regards fluoropyrimidines, the frequency of two polymorphisms within the DPYD gene associated with drug toxicity (i.e., the rs2297595-C allele and the rs3918290-T allele, LOE 2A and 1, respectively) was extremely low in the FVG population, suggesting that a larger number of FVG patients could benefit from the full dosage of fluoropyrimidine therapy.
All these findings increase the overall knowledge on the prevalence of specific variants related to breast cancer treatment responsiveness in the FVG population and highlight the importance of assessing gene polymorphisms related to cancer medications in isolated communities.

Background
The development of refined technologies for genetic analysis (e.g., next-generation sequencing, genotyping, etc.), paired with a continuous optimization of computational and bioinformatic tools, has recently unveiled the scope of human genetic variation. These high-throughput approaches led to the discovery of novel disease-associated variants, germline mutations responsible for rare genetic diseases, and, as far as the cancer field is concerned, to the identification of somatic mutations predictive of treatment responsiveness [1]. The characterization of the patient-specific genetic makeup is critical for the development of personalized interventions. A medication that is proven efficacious in many patients often fails to work in others. Furthermore, even if a certain drug is active, it may still cause serious side effects [2]. Pharmacogenomics addresses this issue by seeking to identify genetic contributors to human variation in drug efficacy and toxicity, with the hope of developing personalized treatments.
Breast cancer is the most common cancer in women worldwide. Early detection combined with appropriate treatment has proved to be effective in reducing the risk of death and relapse [3][4][5]. Nevertheless, in the adjuvant setting, only a few patients will actually benefit from the treatment. Similarly, a wide degree of variation in treatment sensitivity is observed in the metastatic setting.
There is a tremendous effort to identify factors associated with treatment responsiveness [6][7][8]. Although most studies have focused on tumor characteristics, it is clear that the host's genetic makeup can influence treatment tolerability and outcome.
The effect of several major antineoplastic agents is influenced by genetic polymorphisms of different nature, ranging from the target itself (e.g., trastuzumab and HER2) [9] to metabolic pathways (e.g., capecitabine and DPYD) [10]. In breast cancer, as for other diseases, there is a high degree of heterogeneity in terms of clinical outcome among individuals treated with equivalent regimens such as hormonal agents (e.g., tamoxifen), cytotoxic agents (e.g., capecitabine), and targeted therapies [11].
In view of this heterogeneity, an increased awareness of the distribution of risk variants within a specific population (or community) is critical to plan tailored healthcare interventions [12]. To the best of our knowledge, many studies have described drug response related to variations in the general population, but none has so far analyzed the prevalence of defined risk alleles (i.e., variants associated with treatment toxicity or treatment failure) in isolated communities such as those described below. In fact, the detection of an unusually high rate of one or more risk variants in a defined community could prompt the local authority to implement ad hoc screening strategies, which would not otherwise be cost-effective in the general population. In addition, physicians can use this information to refine the risk-benefit ratio of a specific intervention in everyday clinical practice.
Here, we developed an analytic pipeline that intersects high-throughput genotype data with publicly available pharmacogenomics information. Through this approach, we described the frequency of genetic markers correlated with responsiveness to breast cancer treatment in the Friuli Venezia Giulia (FVG) population, with the aim of improving clinical management of this disease in a cost-effective way. FVG is an autonomous region located in North-Eastern Italy at the border with Austria and Slovenia, with 1.2 million inhabitants. Thanks to this autonomy and to the effective clinical data exchange between different health service providers, FVG has developed a regional health care network (RHCN) ensuring continuity of care and improved health services for citizens of this region. This exchange is facilitated by an information technology (IT) infrastructure that allows a continuous and up-to-date secure exchange of medical data and records. As of 2008, medical data from hospitals, primary care physicians, and pharmacies (digital prescription records) are all accessible through this IT infrastructure in a strictly regulated manner.
Moreover, in 2009, two population-based research pilot projects were started. One is the FVG genetic park, focused on studying six isolated communities of this region (Erto-Casso, Clauzetto, Resia, Sauris, San Martino del Carso, and Illegio) for an overall number of approximately 2500 inhabitants [13]. The second one is the MoMa, which aims at investigating dysmetabolic syndromes in another large community of the region (Montereale/Maniago, approx. 15,000 inhabitants). Both volunteer-based projects, representing approximately 2 % of the adult population living in the FVG autonomous region, have established a biobank which stores biological samples and a large collection of clinical data. Taken together, the existence of all the elements mentioned above places the FVG Region in a globally unique position for the implementation of genomic medicine.

Sample collection, DNA sampling and genotyping
One thousand five hundred ninety samples from the FVG genetic park project (six isolated villages, see Fig. 1) were used for our genomic analyses (Additional file 1: Table S1) [13]. Data from a standardized health examination, including a series of deep phenotypes (neurological, psychiatric, audiological, ophthalmological, cardiovascular, etc.), and from a questionnaire on health-related topics, such as lifestyle and diet, were collected from all participants. The exact number of participants divided by village, together with information about sex and mean age, is reported in Additional file 1: Table S1. DNA from blood samples was extracted using standard protocols. All the 1590 subjects were genotyped using the HumanExome BeadChip. In addition, a subset of 1259 individuals was genotyped using the Illumina HumanCNV370-Quadv3_C (300 K), and another subset (N = 331) with the Illumina HumanOmniExpress-12v1-Multi_C chip (700 K) (see Fig. 2). Genotype quality control and data cleaning were performed as previously described [13]. Considering the different genotyping platforms, the number of samples available for the analysis of each variant ranged from 331 to 1590 (the vast majority of cases), as described in Table 1. All participants have signed a broad informed consent form (a 5-year follow-up is now in progress), which allows the continuous updating of epidemiological data through periodical linking to National electronic databases and registries. The research was conducted according to the ethical standards defined by the Declaration of Helsinki. The study was approved by the Institutional Review Board of IRCCS Burlo Garofolo PROT CE/v-78.

Comparison between populations
All variants reported to be related to the efficacy or toxicity of breast cancer drugs were extracted from the PharmGKB database (https://www.pharmgkb.org/ [14]), obtaining a set of 80 single nucleotide polymorphisms (SNPs), all located on autosomal chromosomes. The overlap between these data and our population's genotypes resulted in two catalogs of 37 and 23 loci (from the joint dataset and the HumanExome chip, respectively), for a unique list of 41 variants. For these 41 variants we extracted frequency information from 379 people of the 1000G Project data (EUR population, representing the general European cohort) [15], 33,368 people (Non-Finnish European individuals) from the ExAC database [16], and 1590 individuals from the FVG population, using PLINK software [17].
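The overlap step described above can be sketched as a set intersection followed by a union. This is only an illustration under stated assumptions: the function name `overlap_catalog` is hypothetical, the assignment of rsIDs to the two platforms is invented for the toy input, and the identifiers `rs0000001` and `rs0000002` are placeholders, not real study variants.

```python
def overlap_catalog(pharmgkb_ids, platform_ids_by_chip):
    """Intersect a PharmGKB variant list with each genotyping platform.

    Returns (per-platform overlaps, unique union of all overlapping variants).
    """
    pharmgkb = set(pharmgkb_ids)
    per_platform = {
        chip: pharmgkb & set(ids) for chip, ids in platform_ids_by_chip.items()
    }
    # A variant found on several platforms is counted once in the unique list.
    unique = set().union(*per_platform.values()) if per_platform else set()
    return per_platform, unique


# Toy input (hypothetical platform membership, placeholder rsIDs).
pharmgkb = ["rs1801274", "rs1136201", "rs3918290", "rs2297595", "rs4880"]
platforms = {
    "joint": ["rs1801274", "rs1136201", "rs4880", "rs0000001"],
    "exome": ["rs1801274", "rs3918290", "rs0000002"],
}
per_platform, unique = overlap_catalog(pharmgkb, platforms)
print({chip: len(ids) for chip, ids in per_platform.items()}, len(unique))
```

In the same way, two catalogs of 37 and 23 loci can yield a unique list of 41 variants when some loci are present on both platforms.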
For each variant, differences in allele frequencies between the three populations were tested by Fisher's exact test. Because the ExAC database includes far more people than the 1000G data, its frequency estimates are more precise, which leads to smaller (more significant) p values in the comparisons with ExAC. Because of the exploratory nature of our analysis, no multiple testing corrections were performed, and a p value ≤0.05 was considered statistically significant.
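As an illustration of the test named above, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2x2 table of allele counts. It is not the software used in the study; the function names are hypothetical. The example counts use the rs3918290 carrier numbers reported in the Results (2 CT carriers among 1581 FVG subjects, 5 among 379 EUR subjects), with the added assumption that no TT homozygotes are present in either cohort.

```python
from math import exp, lgamma


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Rows are populations, columns are allele counts (risk allele, other allele).
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_total = _log_comb(row1 + row2, col1)

    def log_prob(x):
        # Hypergeometric log-probability of x risk alleles in population 1.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_total

    log_observed = log_prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p_value = sum(
        exp(lp)
        for lp in map(log_prob, range(x_min, x_max + 1))
        if lp <= log_observed + 1e-7
    )
    return min(1.0, p_value)


# rs3918290 T allele: 2 CT carriers among 1581 FVG subjects, 5 among 379 EUR.
# Assumption: no TT homozygotes in either cohort, so T alleles = carriers.
fvg_t, fvg_n = 2, 2 * 1581
eur_t, eur_n = 5, 2 * 379
p_dpyd = fisher_exact_two_sided(fvg_t, fvg_n - fvg_t, eur_t, eur_n - eur_t)
print(f"p = {p_dpyd:.2e}")
```

Working in log space with `lgamma` keeps the computation stable for large cohorts such as ExAC, where the binomial coefficients themselves would be very large.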

Aggregated analysis of risk variants
In order to estimate the overall genotype-related risk for therapeutic agents, six main classes of drugs were defined as follows: (1) fluoropyrimidines, (2) alkylating agents, (3) taxanes, (4) anthracyclines/tumor antibiotics, (5) antiestrogens, and (6) monoclonal antibodies. For each class of drugs, all the variants associated with breast cancer medications and having a different frequency between the FVG and the EUR/ExAC populations were pooled, and the proportion of the "at risk" genotypes in our population was defined.
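The pooling step can be sketched as counting the subjects who carry an at-risk genotype at every variant of a drug class. This is a hedged illustration only: the function name `at_risk_proportion`, the toy subjects, and in particular the choice of which genotypes count as "at risk" are assumptions made for the example, not definitions taken from the study.

```python
def at_risk_proportion(subjects, risk_genotypes):
    """Fraction of subjects carrying an at-risk genotype at every listed variant.

    subjects: list of dicts mapping rsID -> genotype string (e.g. "AG").
    risk_genotypes: dict mapping rsID -> set of genotypes treated as at risk.
    A subject with a missing genotype at any variant is not counted as a carrier.
    """
    if not subjects:
        return 0.0
    carriers = sum(
        all(subject.get(rsid) in risk for rsid, risk in risk_genotypes.items())
        for subject in subjects
    )
    return carriers / len(subjects)


# Hypothetical at-risk definitions for the alkylating-agent class.
alkylating_risk = {"rs4880": {"GG", "AG"}, "rs4244285": {"AA"}}

# Toy genotypes for four hypothetical subjects.
toy_subjects = [
    {"rs4880": "GG", "rs4244285": "AA"},  # at risk at both variants
    {"rs4880": "AG", "rs4244285": "GG"},  # at risk at rs4880 only
    {"rs4880": "AA", "rs4244285": "AA"},  # at risk at rs4244285 only
    {"rs4880": "AG", "rs4244285": "AA"},  # at risk at both variants
]
combined = at_risk_proportion(toy_subjects, alkylating_risk)
print(f"{100 * combined:.1f} % carry all at-risk genotypes")
```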

Results
A cohort of 1590 individuals from six different isolated villages located in the FVG region was used as the base for the study (Fig. 1 and Additional file 1: Table S1). Out of 80 variants and 58 genes related to breast cancer medications according to the PharmGKB database, 41 variants in 32 genes were present in our genotyping platforms for the six isolated populations (Additional file 2: Table S2 and Fig. 2). The frequencies of 13 out of 41 variants (i.e., 32 %) were different between the FVG and EUR/ExAC cohorts (Table 1). When available, the level of evidence (LOE) of the variant-drug combination, scored from 1 (annotation for which pharmacogenomics guidelines are implemented in clinical practice) to 4 (annotation based on a case report, non-significant study or in vitro, molecular or functional assay evidence only), is reported in Table 2. A more detailed description of these differences and their clinical impact is given in the following sections. We will first describe differences for those SNPs for which an association with activity or toxicity has been reported (LOE 1-4), see Table 2A. Then, we will discuss differences for the SNPs that were reported as "related" to breast cancer medications but for which the evidence supporting the association is lacking (LOE not reported; Table 2B). Finally, we will present the cumulative frequency of at risk genotypes for the defined categories of drugs.

Genetic variants associated with breast neoplasm medications
By using our analytical pipeline, we found nine variants associated with drug toxicity or efficacy that have a different distribution between the FVG and the EUR/ExAC populations and are supported by a certain LOE (Table 2A). The level of association was reported as statistically significant in at least one study for seven of them (LOE 1-3) and borderline significant for one of them (rs4244285; LOE 4). In one case (rs714368), two studies reported opposite results (LOE 4), as described below (Table 2A). The type and the position of these variants are reported in Table 2A. Eight of them are missense variants, while two of them are annotated as splicing variants. Of the remaining three, one is a synonymous variant, one is located in the 3′ UTR and one is located in an intergenic region. A more detailed description of each variant, including data on in silico prediction, is reported in Additional file 3: Table S3. Considering the variants that affect splicing, one is reported to be a splice donor (rs3918290) and the other one (rs776746) a splice acceptor located in the 5′ UTR. In this light, a change in these sites could have functional consequences.
Figure caption: Step (1) 80 variants and 58 genes related to breast cancer medications according to the PharmGKB database were considered. Step (2) Variants selected in the previous step were overlapped with data from the available genotyping platforms in the FVG cohort: 41 variants and 32 genes were used for further analysis. Step (3) Frequencies for the 41 variants selected in step 2 were compared between FVG and EUR cohorts, resulting in a set of 13 variants. Among them, for nine variants the association was supported by a certain LOE.
Doxorubicin and cyclophosphamide are the backbone of the chemotherapeutic regimens used for the treatment of breast cancer patients. Cyclophosphamide is a pro-drug that needs to be oxidized to exert its cytotoxic effect. This step is catalyzed by a number of cytochrome P450 enzymes, including CYP2C19 [18]. The cellular uptake of doxorubicin is mediated by ABCB1 and SLC22A16 cationic transporters [19].
As regards CYP2C19, the rs4244285 variant has been reported to be associated with differential response to the cyclophosphamide-doxorubicin adjuvant regimen. The frequency of the A allele in the EUR/ExAC cohorts and in the mixed population is 15 %, while in our cohort it is 8 % (p = 1.1E−03) (Table 1). The GG genotype was the most frequent in the FVG cohorts (with the highest number in the Resia valley) (Table 2A). According to a single retrospective study [19], individuals bearing the AA genotype show a trend towards a poorer outcome if treated with a cyclophosphamide-doxorubicin polychemotherapeutic regimen (LOE 4), although other studies are partially discordant [20].
As for the SLC22A16 gene, the rs714368 polymorphism, which is related to doxorubicin response, has a frequency of 35 % for the C allele in the Asian population; the frequency of this allele in the EUR/ExAC cohorts is 21-22 %, while in our population it is 27 % (p = 6.9E−03) (Table 1). A minor effect (p = 5.5E−2) of increased exposure to doxorubicin associated with homozygosity for the rare rs714368 C allele was observed in Lal et al. [21]. Conversely, a more recent study [19] has associated carriage of the C allele, in either the homozygous or the heterozygous state, with a decreased incidence of dose delay. Considering that the T allele is more frequent in the FVG cohorts (Table 2A), there is a higher proportion of individuals who may experience higher toxicity from doxorubicin (LOE 4). The rs4880 G allele (located within the superoxide dismutase 2, SOD2, gene) has been correlated with lower survival in breast cancer patients treated with chemotherapy (p = 1E−3). However, the effect was mostly restricted to those treated with cyclophosphamide-based adjuvant regimens (p value for cyclophosphamide-genotype interaction = 2.3E−2, LOE 2B), an association replicated in two independent cohorts (US and Norwegian patients) [22]. More recently, the AA genotype has been associated with statistically significantly better progression-free and overall survival in patients treated with adjuvant tamoxifen [23]. In the EUR/ExAC cohorts the frequency of the G allele is 46 and 51 %, respectively, while in the FVG population it is 54.5 % (p = 1.9E−02; 7.8E−02), suggesting a quite high percentage of at risk genotypes in the FVG region (Table 2A).
The introduction of the anti-HER2 monoclonal antibody (mAb) trastuzumab in clinical practice has revolutionized the treatment of HER2-positive breast cancer patients. The mechanism of action of such mAbs relies on their ability to inhibit the target surface molecules and to trigger antibody-dependent cellular cytotoxicity (ADCC), a process which involves fragment crystallizable (FC) receptors. Based on some retrospective studies, it could be speculated that polymorphisms of FC receptors resulting in differential FC affinity can reduce the activity of those mAbs [24].
Regarding rs1801274 (FCGR2A gene), the frequency of the G allele in our cohort is 42 %, while in the EUR/ExAC cohorts it is 49-50 % (p = 1.5E−02; 9.2E−07) (Tables 1 and 2A). Exome chip data support this result. The most frequent genotype in the FVG cohorts is AG. In trastuzumab-treated patients, this genotype, as compared to the AA genotype, was significantly associated with decreased response and shorter progression-free survival (LOE 2B) [25,26]. Nevertheless, these data should be considered with caution. In fact, the association between rs1801274 FCGR2A and response to trastuzumab in the metastatic [25] or neoadjuvant [26] setting comes from retrospective analyses of small patient cohorts. Conversely, Hurvitz et al. failed to reproduce these observations when evaluating more than one thousand patients enrolled in an adjuvant randomized trial [24,27].
Although trastuzumab is generally well tolerated, its use is often associated with clinically relevant cardiotoxicity; particular caution in the qualification for treatment is therefore necessary, and genotype information could improve the physician's decision-making process. A recent investigation has shown that the protein modification induced by the rs1136201 ERBB2 polymorphism may render cardiomyocytes dependent upon HER2 signaling and more sensitive to trastuzumab-mediated toxicity [28] (LOE 3). According to this study, patients with the AA genotype may have a decreased risk of cardiotoxicity as compared to patients with the AG genotype following trastuzumab administration (p = 5.8E−3) [28]. The frequency of the G allele for rs1136201 is 17 % in the FVG cohorts and between 24 and 25 % in the EUR/ExAC cohorts (p = 3.6E−5; 2.4E−13), while the frequency of low risk AA patients in our cohort is 69 % (Tables 1 and 2A).
Fluoropyrimidines (i.e., capecitabine, 5-fluorouracil) are another class of drugs widely used in breast cancer. Dihydropyrimidine dehydrogenase (DPD) eliminates more than 80 % of the administered drug and is the rate-limiting enzyme for fluoropyrimidine catabolism. Cancer patients carrying mutations in the dihydropyrimidine dehydrogenase gene (DPYD) have a high risk of developing severe adverse drug effects following fluoropyrimidine administration. These side effects include myelosuppression, mucositis, neurotoxicity, hand-foot syndrome, and diarrhea, which can be life-threatening. Guidelines that recommend alternative drugs or dose adjustments according to DPYD genotype have been developed [29,30].
The rs3918290 is the best-studied DPYD variant and the T allele is associated with increased toxicity (LOE 1). This intronic polymorphism results in a splicing variant that skips an entire exon and yields a nonfunctional protein [33]. The frequency of the T allele was extremely low in our population (0.063 % vs 0.6-1 %, FVG vs EUR/ExAC populations, respectively, Table 2A).
Homozygous TT genotypes (i.e., those with a severe outcome) are not present in our cohort, while there are two CT heterozygous individuals out of 1581 (five heterozygous cases out of 379 samples in the EUR cohort). A low frequency of the at risk DPYD rs2297595 C and rs3918290 T genotypes might be a good indicator for physicians (see Table 2A). We also noticed a different distribution of rs1048943 (CYP1A1 gene) in our vs the EUR/ExAC cohort. CYP1A1 is likely involved in the metabolism of fluoropyrimidines [34]. Metastatic breast cancer patients carrying the TT genotype may suffer a decreased progression-free survival when treated with capecitabine plus docetaxel [35] (LOE 3). The frequency of the C allele in our cohort was 6.8 % (6.3 % in Exome chip data) and 3-4 % in the EUR/ExAC cohort (p = 2.4E−2, 5.7E−18). Although the PharmGKB website reports an association between this variant and taxanes, Vaclavikova et al. indicate that CYP1A1 does not metabolize taxanes [36], implying that the association noticed is driven by the effect of this polymorphism on capecitabine.
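The rs3918290 T allele frequency quoted above can be recovered from the reported genotype counts, since each subject contributes two alleles. The short sketch below shows the arithmetic; the function name `allele_frequency` is hypothetical, and the count of 1579 CC subjects is derived here as 1581 minus the two CT carriers.

```python
def allele_frequency(n_hom_alt, n_het, n_hom_ref):
    """Frequency of the alternate allele from diploid genotype counts."""
    n_subjects = n_hom_alt + n_het + n_hom_ref
    if n_subjects == 0:
        raise ValueError("no genotyped subjects")
    # Homozygotes carry two copies of the allele, heterozygotes carry one.
    return (2 * n_hom_alt + n_het) / (2 * n_subjects)


# rs3918290 in FVG: 0 TT, 2 CT, and 1579 CC (derived as 1581 - 2).
freq_t = allele_frequency(n_hom_alt=0, n_het=2, n_hom_ref=1579)
print(f"T allele frequency: {100 * freq_t:.3f} %")  # 2 / 3162 -> 0.063 %
```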
Finally, we found a different distribution of the rs776746 (CYP3A5) T allele between the EUR cohort (8 %) and our population (5 %). In this regard, a recent study reported this variant as significantly associated with severe neutropenia in breast cancer patients treated with paclitaxel (LOE 3), another critical drug widely used in both the metastatic and adjuvant settings [37] (Table 2A). This polymorphism has also been studied in the setting of adjuvant tamoxifen, but no correlation with the risk of disease recurrence has been detected [38].

Genetic variants with a doubtful role in relation to breast cancer medications
In addition to the above-mentioned associations, we found four variants (rs2369049 close to the TCL1A gene, rs2072671 within the CDA gene, rs1801159 within the DPYD gene, and rs6214 within the IGF1 gene) in genes listed as "related" to breast cancer medications by PharmGKB that have a different distribution between the FVG and EUR/ExAC cohorts but for which the LOE of the drug-variant association was not reported. By reviewing the pertinent literature we found that, for all but one of these polymorphisms (rs2369049-TCL1A), the underlying studies were largely negative [39,40]. As for rs2369049, however, two studies detected an association with exemestane toxicity, but in opposite directions [39,41].

Aggregated analysis of variants at risk
To further increase our knowledge on combined at risk genotypes (i.e., pooling together the available genetic data) related to each treatment, we divided therapeutic agents into six main classes (see "Methods" section). The combined presence of the described risk alleles (Table 2A) was checked within each class.
For three of those classes, the members (i.e., the breast cancer medications) did not share any of the reported at risk genotypes. We then defined, for the three remaining classes, the proportion of at risk subjects within each class by calculating the overall cumulative frequency of at risk genotypes (Table 3). The results are described as follows.
Fluoropyrimidines (capecitabine and 5-fluorouracil). No subjects carried all three at risk genotypes [i.e., rs2297595 (DPYD gene), rs1048943 (CYP1A1 gene), and rs3918290 (DPYD gene)]. We observed that only 1.26 % of the analyzed samples carried both at risk genotypes for the rs2297595 and rs1048943 polymorphisms, while 0.06 % of subjects carried the rs3918290 and rs1048943 at risk alleles or the rs3918290 and rs2297595 risk alleles.
Alkylating agents (cyclophosphamide). Approximately 1 % (1.20 %) of our sample could be at risk of a poor outcome being a carrier of both at risk genotypes for rs4880 and rs4244285 variants.

Discussion
In this study, taking advantage of our participation in a Pilot National Project on a specific set of isolated communities of North-Eastern Italy (i.e., the FVG population), we described the distribution of polymorphisms related to breast cancer medications and compared it with that of the EUR population and the ExAC database. Although the frequencies of most polymorphisms were similar between the EUR, ExAC and FVG cohorts, more than 30 % of the variants analyzed differed significantly among these cohorts. A certain LOE was reported for 9 out of those 13 variants, while for four of them the related literature fails to detect any kind of association (LOE not reported).
Variants for which a LOE was available include polymorphisms linked to cyclophosphamide, tamoxifen, doxorubicin, fluoropyrimidine and paclitaxel. As for cyclophosphamide, one polymorphism (i.e., rs4880-G) within the SOD2 gene was associated with poor outcome (LOE 2B) and showed a higher frequency in the FVG as compared to the EUR/ExAC cohort. Another variant, located in the CYP2C19 gene (i.e., rs4244285-A), had a lower frequency in our population. Importantly, rs4880-G has also been associated with poor response to tamoxifen (LOE 3). Notably, only a small proportion of the FVG population (i.e., 1.2 %) carried both at risk polymorphisms.
Regarding doxorubicin and paclitaxel, we observed that two alleles, rs714368-T (SLC22A16) and rs776746-C (CYP3A5), associated respectively with toxicity to doxorubicin (LOE 4) and paclitaxel (LOE 3), displayed a higher frequency in the FVG cohort. These findings imply that those two drugs could be less well tolerated by FVG patients.
As for trastuzumab, one variant within the FCGR2A gene (rs1801274-G), which is associated with decreased efficacy (LOE 2B), and one variant within the ERBB2 gene, which is implicated in cardiomyopathy development (LOE 3), were underrepresented in the FVG population, suggesting a higher therapeutic index of this drug in our population as compared to that expected in the overall European population. Nevertheless, 18.4 % of subjects in our cohort carried the at risk genotypes for both rs1136201 and rs1801274, highlighting that a considerable proportion of patients in FVG displays a particularly unfavorable genotype even though the frequency of such polymorphisms is lower than the one observed in the EUR cohort.
As far as fluoropyrimidines are concerned, the frequency of two polymorphisms within the DPYD gene associated with drug toxicity (i.e., the rs2297595 C allele and the rs3918290 T allele, LOE 2A and 1, respectively) was extremely low in the FVG population. A number of studies have conclusively demonstrated that patients with a functional alteration of the DPYD gene can experience life-threatening adverse events following fluoropyrimidine administration. The rs3918290 is the best-characterized DPYD variant and dosing guidelines have been developed. The Clinical Pharmacogenetics Implementation Consortium (CPIC) guideline for dihydropyrimidine dehydrogenase genotype and fluoropyrimidine dosing recommends selecting alternative drugs in case of homozygosity for the rs3918290 T allele and reducing the dose by 50 % or selecting an alternative drug in case of heterozygosity [29]. The T allele was extremely rare in the FVG population and homozygous TT genotypes (i.e., those with a severe outcome) were not present in our cohort, while there were only 2 CT heterozygous individuals out of 1581. The frequency of the rs1048943 T allele (CYP1A1 gene), which has been associated with toxicity of capecitabine-containing regimens, was higher in the FVG population. However, the clinical relevance of this association is not clearly defined (LOE 3). Importantly, only 0.06 % of the FVG patients carried both at risk genotypes for the rs3918290 and rs1048943 polymorphisms and only 1.2 % for the rs2297595 and rs1048943 variants. Overall, considering the clinical relevance of the DPYD data, these results suggest that a larger number of FVG patients could benefit from the full dosage of fluoropyrimidine therapy.
As for tamoxifen, several studies have assessed the impact of the cytochrome CYP2D6 genotype on treatment responsiveness, but the results are conflicting. Although CYP2D6 data were not available in our cohort, a recent meta-analysis of 25 studies enrolling more than 13 thousand individuals concluded that there is insufficient evidence to support CYP2D6 genotyping in patients treated with tamoxifen [40].
The information derived from our study will be transferred to the Regional Health Care Network in order to prepare specific leaflets to accurately inform local hospitals and physicians, allowing the implementation of genomic medicine. While we found that one-third of the analyzed variants had a different frequency in our cohort vs the EUR/ExAC cohorts, we also noticed that for most of the assessed targets the respective LOE was weak (LOE 3-4) and further investigations are needed to confirm the reported associations. Implementation of such an approach in the breast cancer clinical setting could fill this knowledge gap, which is a necessary step to prospectively refine the impact of a patient-based personalized treatment.

Conclusions
In conclusion, our exploratory study highlights the importance of assessing gene polymorphisms related to cancer medications in isolated populations. In particular, the finding that specific functional variants, strongly associated with toxicity or lack of efficacy, are