Regulatory VCAN polymorphism is associated with shoulder pain and disability in breast cancer survivors

Shoulder morbidity following breast cancer treatment is multifactorial. Despite several treatment- and patient-related factors being implicated, unexplained inter-individual variability exists in the development of such morbidity. Given the paucity of relevant genetic studies, we investigated the role of polymorphisms in candidate proteoglycan genes. We conducted a cross-sectional study on 254 South African breast cancer survivors to evaluate associations between shoulder pain/disability and ten single nucleotide polymorphisms (SNPs) within four proteoglycan genes: ACAN (rs1126823 G>A, rs1516797 G>T, rs2882676 A>C); BGN (rs1042103 G>A, rs743641 A>T, rs743642 G>T); DCN rs516115 C>T; and VCAN (rs11726 A>G, rs2287926 G>A, rs309559 A>G). Participants were grouped into no–low and moderate–high shoulder pain/disability categories based on total pain/disability scores of < 30 and ≥ 30, respectively, on the Shoulder Pain and Disability Index (SPADI). The GG genotype of VCAN rs11726 was independently associated with an increased risk of being in the moderate-to-high shoulder pain (P = 0.005, OR = 2.326, 95% CI = 1.259–4.348) or disability (P = 0.011, OR = 2.439, 95% CI = 1.235–4.762) categories, after adjusting for participants’ age. In addition, the T-T-G inferred allele combination of BGN (rs743641–rs743642)–VCAN rs11726 was associated with an increased risk of being in the moderate-to-high shoulder disability category (P = 0.002, OR = 2.347, 95% CI = 1.215–4.534). Our study is the first to report that VCAN rs11726, independently or interacting with BGN polymorphisms, is associated with shoulder pain or disability in breast cancer survivors. Whereas our findings suggest an involvement of proteoglycans in the etiology of shoulder pain/disability, further studies are recommended.


Introduction
Shoulder pain and disability are common chronic upper-limb morbidities among female breast cancer survivors [1,2]. We recently reported that 74% and 62% of breast cancer survivors report some level of pain and functional disability, respectively, at least 1 year post-surgery [1]. The most recent systematic review of upper-limb morbidity amongst breast cancer survivors, by Hidding et al. [2], estimates the prevalence of upper-limb pain and disability (reduced range of motion) at 9-68% and 6-31%, respectively, beyond 1 year post-surgery. Shoulder pain and disability are strongly correlated [1], and may persist beyond 7 years following treatment [3]. Given that such morbidities have a negative impact on the quality of life of affected individuals [4], understanding the underlying etiology remains an urgent need.
The etiology underlying shoulder pain/disability remains poorly understood. Contributing treatment-related factors include type of breast surgery, type of axillary surgery and adjuvant therapy, with absolute risk increases of 1-21% for persistent pain [2,5,6]; patient-related factors include age, presence of preoperative pain, presence of acute post-operative pain and genetic predisposition, with absolute risk increases of 2-7% for persistent pain [6,7]. Nonetheless, there remains a paucity of studies investigating the role of genetic factors in the inter-individual variability in developing shoulder pain/disability amongst breast cancer survivors.
A growing body of evidence, from non-cancer-related conditions, supports the role of polymorphisms within genes encoding structural and regulatory extracellular matrix (ECM) proteins in modulating susceptibility to shoulder pathology [8-11]. Although no associations between polymorphisms in proteoglycan-encoding genes and non-cancer-related shoulder conditions have been reported to date, proteoglycans are important ECM components whose expression levels are altered in such conditions [12,13]. In particular, changes in expression of the proteoglycans aggrecan (ACAN), versican (VCAN), biglycan (BGN) and decorin (DCN) have been reported in rotator cuff disease [12,13]. Moreover, associations have been reported between proteoglycan gene polymorphisms and other connective tissue conditions such as anterior cruciate ligament ruptures [10,14-16]. The hyalectan proteoglycans, including ACAN and VCAN, are important structural components in connective tissues such as tendons and ligaments. The small leucine-rich proteoglycans (SLRPs), including BGN and DCN, regulate collagen fibrillogenesis and are important modulators of angiogenesis and the transforming growth factor beta (TGF-β) signaling pathway, amongst others [17]. Given the role of proteoglycans in the ECM of connective tissues, functional polymorphisms in proteoglycan-encoding genes may lead to altered signaling and/or biomechanical properties in tissues such as tendons or ligaments of the shoulder. Indeed, studies on animal models have demonstrated that changes in expression of decorin and biglycan alter mechanical properties of tendons including failure load, stiffness, dynamic modulus and viscosity [18]. We hypothesize that polymorphisms in proteoglycan-encoding genes may be associated with shoulder pain/disability amongst breast cancer survivors.
Our aim, therefore, was to investigate the association between candidate gene polymorphisms within proteoglycan-encoding genes and shoulder pain/disability following breast cancer treatment in women.

Study design
This was a pilot, cross-sectional genetic association study based on the candidate gene approach.

Participants and setting
A total of 254 participants were recruited by convenience sampling between 2013 and 2018 from the waiting room of the Oncology Clinic of a tertiary public teaching hospital in South Africa. All eligible participants (Table 1) agreeing to participate gave written informed consent. The recruited participants self-identified as being of 'mixed-ancestry' ethnicity, a rich genetic admixture ancestrally derived from immigrants from Western Europe, West Africa and Asia and from the indigenous Southern African populations [19].

Study procedures
Study procedures have been previously reported [7]. Briefly, eligible consented participants completed the Shoulder Pain and Disability Index (SPADI) questionnaire and had blood drawn by venipuncture at the cubital fossa of the unaffected side using EDTA vacutainer tubes. Whole blood samples were immediately stored at −20 °C until total DNA extraction using the method described by Lahiri et al. [20].

Patient-reported outcome measure
The primary outcome measure in this study was the SPADI, a validated and reliable patient-reported questionnaire with two domains: Pain (5 items) and Disability (8 items) [21,22]. Participants rated pain or difficulty associated with specific activities of daily living on a visual analog scale (VAS) of 0 (no pain/difficulty) to 10 (extreme pain/difficulty). Symptom scores for both SPADI domains were reported as percentages of possible total scores [22]. Pain and disability scores were categorized according to score effects on activities of daily living and clinical relevance [7]; SPADI scores > 30 are regarded as having moderate-severe effects on activities of daily living [23], while patients with specific shoulder pain diagnoses, or on pain medication, were reported to have scores > 30 [21]. The reference 'no-low' category consisted of participants with SPADI pain/disability scores < 30, whereas the case 'moderate-high' category consisted of participants with SPADI pain/disability scores ≥ 30.
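The scoring and categorization described above can be sketched as follows. This is a minimal illustration, assuming every item is answered on the 0-10 scale; the function names and the example ratings are hypothetical and are not part of the SPADI instrument or of the study's analysis code.

```python
def spadi_domain_score(item_ratings):
    """Return a SPADI domain score as a percentage of the maximum possible score.

    item_ratings: ratings from 0 (no pain/difficulty) to 10 (extreme),
    e.g. 5 items for Pain or 8 items for Disability.
    Assumption: every item was answered (no missing-item handling).
    """
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    if any(r < 0 or r > 10 for r in item_ratings):
        raise ValueError("ratings must lie between 0 and 10")
    return 100.0 * sum(item_ratings) / (10 * len(item_ratings))


def spadi_category(score, cutoff=30.0):
    """Classify a domain score using the cut-off of 30 described in the text."""
    return "moderate-high" if score >= cutoff else "no-low"


# Hypothetical ratings for one participant
pain_items = [2, 4, 3, 5, 1]                  # 5 Pain items
disability_items = [0, 1, 2, 0, 3, 1, 0, 2]   # 8 Disability items

pain = spadi_domain_score(pain_items)              # 15 of 50 possible points
disability = spadi_domain_score(disability_items)  # 9 of 80 possible points
print(pain, spadi_category(pain))
print(disability, spadi_category(disability))
```

A score of exactly 30 falls in the moderate-high category here, matching the ≥ 30 definition of the case group.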

Single-nucleotide polymorphism (SNP) selection
SNPs with global minor allele frequency > 0.15 in the ENSEMBL database (http://www.ensembl.org) were selected for investigation based on meeting one or more of the following criteria: (i) identified from a whole exome sequencing project on risk factors for tendinopathy or musculoskeletal soft tissue injuries [24]; (ii) functional significance, based on reported effects on gene expression or protein function; (iii) located in regulatory gene regions; (iv) previous associations with multifactorial soft-tissue shoulder conditions.
A total of ten SNPs within four proteoglycan-encoding genes were included (Tables 4 and 5). In order to ensure robust genetic association analyses, only SNPs with call rates > 95% and Hardy-Weinberg p values > 0.05 were included.
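The two quality-control thresholds above can be illustrated with a short sketch. It assumes a 1-degree-of-freedom chi-square goodness-of-fit test for Hardy-Weinberg equilibrium, which is a common approximation and not necessarily the test used in this study (an exact test is reported in the Results). The function names and genotype counts are hypothetical.

```python
import math


def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call (None = no call)."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    n_aa, n_bb: observed homozygote counts; n_ab: observed heterozygote count.
    Assumption: both alleles are observed, so every expected count is > 0.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(genotypes, min_call_rate=0.95, min_hwe_p=0.05):
    """Apply the call-rate and Hardy-Weinberg thresholds to one A>G SNP."""
    calls = [g for g in genotypes if g is not None]
    counts = (calls.count("AA"), calls.count("AG"), calls.count("GG"))
    return (call_rate(genotypes) > min_call_rate
            and hwe_chi_square_p(*counts) > min_hwe_p)


# Hypothetical SNP: 231 samples, of which 4 failed genotype calling
sample = ["AA"] * 40 + ["AG"] * 110 + ["GG"] * 77 + [None] * 4
print(round(call_rate(sample), 3), passes_qc(sample))
```

For small samples or rare alleles an exact test is generally preferred over the chi-square approximation sketched here.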

Genetic analyses
Genotyping was performed using TaqMan™ assays (Applied Biosystems) in 96-well plates, following the manufacturer's instructions, in a QuantStudio™ 3 Real-Time PCR System (Applied Biosystems) at the Division of Exercise Science and Sports Medicine, University of Cape Town. Negative controls (no DNA sample), positive controls (DNA of known genotypes) and replicates (sample duplicates) were included in every plate to evaluate the reliability of the PCR and detect potential genotyping errors. The genotyping data were analyzed on Thermo Fisher Cloud genotyping analysis Software Version: 3.3.0-SR2-build 21 with automatic genotype calling for nine SNPs: ACAN (rs1126823 A>G, rs1516797 T>G); BGN (rs1042103 G>A, rs743641 A>T, rs743642 G>T); DCN (rs516115 C>T); and VCAN (rs11726 A>G, rs2287926 G>A, rs309559 A>G). Due to less-efficient amplification for the ACAN rs2882676 A>C SNP, genotypes were manually called and compared with the manual calls of an independent blinded technical support member, with 99.7% similarity.

Bias
Nine percent (23 out of 254) of participants could not provide blood samples because they were lost after consent when they went for further medical examination in the clinic. Although participants who provided blood may differ from those who did not, this is unlikely as all participants were randomly identified and consented.

Statistical analysis
The calculation of sample size for this study, using QUANTO version 1.2.469 [25], was described previously [7]. A sample size of N = 231 was regarded as likely sufficient to detect odds ratios of ≥ 2.5 for allele frequencies ≥ 0.15, assuming an expected average baseline risk for shoulder pain (32%) and disability (25%), for dominant or additive genetic models [7].
Demographic and clinical data were analyzed using Statistica version 13.2.70 [26]. Mann-Whitney U tests were used to evaluate differences in quantitative characteristics between the shoulder pain/disability categories, given that the data were not normally distributed. Fisher's exact and Chi-square analyses were performed to evaluate differences in categorical demographic and clinical characteristics between the shoulder pain/disability categories.
The genotype data were analyzed using RStudio version 1.3.895 running R version 3.6.3 [27,28]. Chi-square and Fisher's exact tests were used to evaluate differences in the genotype, allele and inferred haplotype frequencies between the shoulder pain/disability categories. Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) were calculated using R package 'genetics' version 1.3.8.1.2 [29]. Logistic regression analyses were performed using R package 'SNPassoc' version 1.9-2 to evaluate the association between SNP genotype and shoulder pain/disability category membership [30]; the best model (with the lowest Akaike Information Criterion (AIC)) was chosen among dominant, recessive and log-additive models. Using the R package 'haplo.stats' version 1.7.9 [31], inferred haplotypes for the ACAN, BGN and VCAN polymorphisms were constructed using the genotype data for each SNP investigated. To investigate possible gene-gene interactions in modulating risk for shoulder pain/disability, inferred allele combinations were constructed using the relevant genotype data for the genes. The choice of SNPs for inferred allele combination construction was based on stepwise backward elimination logistic regression analysis. In each step, the least informative SNP, whose exclusion lowered (and therefore improved) the AIC of the model, was removed until three SNPs remained, representing the best three-SNP model for shoulder pain or disability. To avoid saturating the models while controlling for confounding, only participants' age, which was shown to be associated with our primary outcomes, was included in all multivariate regression models. For all inferred haplotypes or allele combinations, a low haplotype frequency cut-off of 4% was used to improve validity. Stepwise regression analyses were performed using R package 'MASS' version 7.3-51.5 [32]. R package 'ggplot2' version 3.3.2 was used to produce all graphs [33]. The level of significance was set as p < 0.05.
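The choice among dominant, recessive and log-additive models by lowest AIC can be illustrated with a small, self-contained sketch. It is not the 'SNPassoc' implementation used in the study: it fits an unadjusted single-SNP logistic regression (no age covariate) by Newton-Raphson in plain Python, and the genotype counts, function names and risk allele coding are hypothetical.

```python
import math

MODELS = ("dominant", "recessive", "log-additive")


def encode(genotype, risk_allele, model):
    """Code a genotype string such as 'AG' under one genetic model."""
    copies = genotype.count(risk_allele)   # 0, 1 or 2 copies of the risk allele
    if model == "dominant":
        return int(copies >= 1)
    if model == "recessive":
        return int(copies == 2)
    if model == "log-additive":
        return copies
    raise ValueError(f"unknown model: {model}")


def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, log_likelihood). Assumes x takes at least two values
    and the data are not perfectly separated.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        loglik += math.log(p) if yi == 1 else math.log(1.0 - p)
    return b0, b1, loglik


def aic(loglik, n_params=2):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * loglik


# Hypothetical genotype counts (not the study's data)
cases = ["AA"] * 3 + ["AG"] * 10 + ["GG"] * 12
controls = ["AA"] * 9 + ["AG"] * 16 + ["GG"] * 10
genotypes = cases + controls
status = [1] * len(cases) + [0] * len(controls)

results = {}
for model in MODELS:
    x = [encode(g, "G", model) for g in genotypes]
    b0, b1, loglik = fit_logistic(x, status)
    results[model] = {"OR": math.exp(b1), "AIC": aic(loglik)}
    print(f"{model:12s} OR = {math.exp(b1):.3f}  AIC = {aic(loglik):.2f}")

best_model = min(results, key=lambda m: results[m]["AIC"])
print("lowest AIC:", best_model)
```

Because all three models have the same number of parameters in this sketch, the lowest AIC simply identifies the coding with the highest likelihood; an adjusted analysis would add age as a further covariate to each model.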

Results
Differences in clinical and demographic characteristics between pain/disability categories
Participants in the moderate-high shoulder pain category were significantly younger compared with those in the no-low shoulder pain category (53.8 (45.3-64.3) vs. 60.8 (53.5-65.5), p = 0.001) (Table 2). Similarly, participants in the moderate-high shoulder disability category were significantly younger compared with those in the no-low disability category (54.4 (45.0-64.9) vs. 60.4 (53.2-65.2), p = 0.014) (Table 3). However, no significant differences (p > 0.05) were noted between participants in the no-low and moderate-high shoulder pain/disability categories for all other variables. Although not statistically significant, a lower proportion of participants in the moderate-high shoulder pain/disability category underwent the more aggressive surgeries (mastectomy and axillary lymph node dissection) compared with those in the no-low shoulder pain/disability category (Tables 2 and 3). In addition, a higher proportion of participants in the [...] (Tables 2 and 3). Receipt of adjuvant radiotherapy was only notable for shoulder pain categories, with a higher proportion of participants in the moderate-high category receiving it compared with participants in the no-low category (Table 2). Interestingly, a significantly (p = 0.014) higher proportion of participants with the GG (88.2%, n = 60) genotype for ACAN rs1126823 A>G received hormonal therapy compared with those with AA or AG (73.0%, n = 111) genotypes (Supplementary Table 1). A significantly (p = 0.034) lower proportion of participants with the TT (58.3%, n = 14) genotype for BGN rs743641 A>T received hormonal therapy compared with participants with AA or AT (80.1%, n = 157) genotypes. Furthermore, a significantly (p = 0.001) higher proportion of participants with the AA (92.9%, n = 65) genotype of ACAN rs2882676 A>C had mastectomy compared with those with AC or CC (73.1%, n = 114) genotypes, whereas individuals with a TT (66.7%, n = 44) genotype for DCN rs516115 C>T were significantly (p = 0.007) less likely to have mastectomy compared with CT or CC (84.0%, n = 136) genotype carriers.

Genotype/allele frequency distributions between shoulder pain/disability categories
The genotype frequencies of the VCAN rs11726 A>G polymorphism were significantly different (p < 0.05) between the no-low and moderate-high categories for both shoulder pain and disability, including after adjustment for participants' age (Tables 4 and 5). In particular, the GG genotype of VCAN rs11726 A>G was significantly more common (p = 0.005; OR = 2.326, 95% CI = 1.250-4.348) in the moderate-high shoulder pain category (48%) in comparison with the no-low shoulder pain category (29%) (Table 4). Similarly, the GG genotype of VCAN rs11726 A>G was significantly more common (p = 0.011; OR = 2.439, 95% CI = 1.235-4.762) in the moderate-high shoulder disability category (51%) in comparison with the no-low shoulder disability category (30%) (Table 5). No significant differences were noted in allele frequency distributions between the no-low and moderate-high categories for the VCAN rs11726 A>G polymorphism (Tables 4 and 5). However, there was a trend (p = 0.069) towards over-representation of the A allele of VCAN rs11726 A>G in the no-low shoulder disability category (44%) in comparison with the moderate-high disability category (33%) (Table 5).
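For readers less familiar with how an odds ratio and its confidence interval arise from genotype counts, the following sketch computes an unadjusted odds ratio with a Wald (Woolf) 95% interval from a 2 × 2 table. The counts are hypothetical and are not the study's data; the estimates reported above come from age-adjusted logistic regression, so they would not be expected to match this simpler calculation.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald (Woolf) confidence interval.

    a: cases with the risk genotype      b: cases without it
    c: controls with the risk genotype   d: controls without it
    Assumption: all four counts are > 0. z = 1.96 gives a 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not the study's data): GG vs. AA + AG carriers
or_gg, ci_low, ci_high = odds_ratio_ci(a=36, b=39, c=45, d=110)
print(f"OR = {or_gg:.3f}, 95% CI = {ci_low:.3f}-{ci_high:.3f}")
```

The interval is symmetric around the odds ratio on the logarithmic scale, which is why the reported upper limits lie further from the point estimate than the lower limits.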
For both shoulder pain and shoulder disability, no significant differences (p > 0.05) in the genotype/allele frequency distributions were noted between the no-low and moderate-high categories for the following SNPs: ACAN rs1126823 A>G, ACAN rs1516797 T>G, ACAN rs2882676 A>C, BGN rs1042103 G>A, BGN rs743641 A>T, BGN rs743642 G>T, DCN rs516115 C>T, VCAN rs2287926 G>A and VCAN rs309559 A>G (Tables 4 and 5). The genotype distributions for the whole group were in Hardy-Weinberg equilibrium (HWE exact test p > 0.05) for all SNPs investigated in this study (Supplementary Table 1).

Inferred haplotype frequency distributions between shoulder pain/disability categories
No significant differences were noted in the frequency distribution of the ACAN (rs1126823 A>G–rs1516797 T>G–rs2882676 A>C), BGN (rs1042103 G>A–rs743641 A>T–rs743642 G>T) or VCAN (rs11726 A>G–rs2287926 G>A–rs309559 A>G) inferred haplotypes between the no-low and moderate-high shoulder pain/disability categories (p > 0.05) (Fig. 1).

Inferred allele combination frequency distributions between shoulder pain/disability categories
No significant differences (p > 0.05) were noted in the frequency distribution of the inferred DCN rs516115 C>T–VCAN (rs2287926 G>A–rs11726 A>G) allele combinations between no-low and moderate-high shoulder pain categories (Fig. 2A). However, significant differences were noted in the frequencies of the BGN (rs743641 A>T–rs743642 G>T)–VCAN rs11726 A>G (p = 0.011) inferred allele combinations between participants with no-low and moderate-high shoulder disability (Fig. 2B). In particular, the T-T-G inferred allele combination of BGN (rs743641 A>T–rs743642 G>T)–VCAN rs11726 A>G was significantly over-represented (p = 0.002; OR = 2.347, 95% CI = 1.215-4.534) in the moderate-high shoulder disability category compared with the no-low shoulder disability category (Fig. 2B). Moreover, a trend was noted towards over-representation (p = 0.050) of the T-T-A allele combination of BGN (rs743641 A>T–rs743642 G>T)–VCAN rs11726 A>G in the no-low shoulder disability category in comparison with the moderate-high disability category (Fig. 2B).

Discussion
There is a paucity of studies investigating the role of genetic factors in modulating susceptibility to shoulder pain or disability amongst breast cancer survivors. Whereas proteoglycan gene polymorphisms or expression levels have been implicated in connective tissue conditions of the shoulder or other sites such as the knee [10,12-16], their role in shoulder morbidity amongst breast cancer survivors is unknown. We found associations between VCAN rs11726 A>G genotype and BGN (rs743641 (251A>T)–rs743642 (318G>T))–VCAN rs11726 (1429A>G) inferred allele combinations, and shoulder pain/disability among women following breast cancer treatment. To the best of our knowledge, our study is the first to report associations between proteoglycan gene polymorphisms and shoulder pain/disability amongst breast cancer survivors. This adds to the body of evidence indicating genetic predisposition to shoulder pain or disability following breast cancer treatment [7]. The VCAN rs11726 (1429A>G) polymorphism, which was associated with shoulder pain/disability both independently and in an inferred allele combination with BGN rs743641 (251A>T) and rs743642 (318G>T), is located in the 3' UTR gene region and has been shown to alter VCAN gene expression levels (Ensembl genome browser data, database version 100.38, Genome Reference Consortium Human Build 38) [34]. In particular, the G allele of VCAN rs11726 demonstrates higher levels of expression relative to the ancestral A allele in skeletal muscle and pancreatic cells [34]. The VCAN rs11726 polymorphism is also located within the long non-coding transcript, VCAN-AS1, which is an antisense RNA transcript for VCAN and hence another possible mechanism for regulating VCAN expression.
Although nonsignificant, it was interesting to observe a trend (p = 0.069) towards under-representation of the G allele amongst participants reporting no-low levels of shoulder disability compared with those who reported moderate-high levels of shoulder disability (Table 5). Increasing the sample size is therefore important for evaluating the association of these alternate alleles with shoulder disability. The exact mechanism by which the VCAN rs11726 polymorphism may lead to shoulder disability or pain is not clear, given the versatile and complex functions of VCAN [17]. Possibly, VCAN interacts with important signaling factors such as the fibrogenesis factor TGF-β and the inflammatory factor nuclear factor kappa B (NF-κB), which may contribute to shoulder morbidity by promoting fibrogenesis and nociceptive signaling, respectively [17,35,36]; perhaps increased expression of VCAN amongst G allele carriers leads to enhanced fibrosis and pain signaling in the shoulder in response to late treatment effects amongst breast cancer survivors. The BGN rs743641 and rs743642 polymorphisms are both located in the 3' UTR gene regulatory region, but have neither reported gene expression correlations nor previous associations with connective tissue disorders of the shoulder. Another possible explanation for the role of the associated VCAN and BGN polymorphisms in this study could be that they are in linkage disequilibrium with other SNPs that are involved in the development of pain or disability. Whereas no previous associations have been reported between VCAN rs11726, BGN rs743641 or rs743642, and shoulder morbidity to date, other polymorphisms in BGN have been implicated in connective tissue disorders such as ACL ruptures [14,16]. In addition, upregulation of VCAN and BGN expression has been demonstrated in a rat model of rotator cuff injury [37].
Consistent with previous reports on upper limb morbidity amongst breast cancer survivors [3,6,38], participants' age was significantly associated with both shoulder pain and shoulder disability following breast cancer treatment in our study. Younger participants were more likely to be in the moderate-high shoulder pain or disability category. The link between age and pain reporting remains unclear. One possible explanation is the reported reduction in pain sensitivity with age, as determined from pressure pain threshold (PPT) measurements, which may be relevant for the movement-related pain that is measured by the SPADI instrument in our study [39]. Given that PPT measurements are subjective, the reported association may reflect changes in pain perception with age. Contrary to previous reports [2,5,6,40,41], type of breast surgery, having axillary surgery and receipt of adjuvant therapy were not significantly associated with shoulder pain or shoulder disability in our study. In fact, a higher frequency of the more aggressive surgical procedures (mastectomy and axillary lymph node dissection (ALND), compared with the conservative wide local excision (WLE) and sentinel lymph node biopsy (SLNB)) was observed in the no-low pain/disability group (Tables 2 and 3). This finding may perhaps be specific to our cohort; pain reporting has been associated with ethnicity [1], and the average time after treatment in our cohort is longer than that of most similar studies. Consistent with our findings, De Groef et al. [42] reported a high prevalence of upper limb morbidity amongst sentinel node-negative breast cancer patients who underwent less invasive surgery, suggesting that upper limb morbidity amongst breast cancer survivors may not be largely explained by factors related to surgical management after long follow-up periods.
[Figure caption fragment: ... and (E and F) VCAN (rs11726 A>G–rs2287926 G>A–rs309559 A>G) inferred haplotypes between participants with no-low and moderate-high shoulder pain/disability following breast cancer treatment. Global p values (adjusted for participants' age) are noted centrally at the top of each graph.]
It was interesting to note the genotype associations of ACAN rs1126823 G>A, ACAN rs2882676 A>C, BGN rs743641 A>T and DCN rs516115 C>T with treatment characteristics in our cohort. This may reflect, at least in part, the role of proteoglycans in the development and progression of cancer, thereby influencing treatment options and selections. Although no studies to date have demonstrated their roles in breast cancer, BGN and DCN have been implicated in the development and progression of endometrial, bladder, colon, blood or lung cancers [43].

Limitations
Despite being adequate for medium-large effect sizes (OR ≥ 2.0), our sample size was largely underpowered (power < 80%) for small effect sizes (OR = 1.5). As a result, and given the exploratory nature of the study, we did not adjust for multiple comparisons based on the number of SNPs (familywise error rate). Larger sample sizes may detect significant differences in other clinical/treatment characteristics and genotype/allele distributions included in this study. Although clinical relevance was used in creating shoulder pain/disability categories, there was no wide score gap between them. Therefore, close to the boundary score of 30, individuals with otherwise similar shoulder pain/disability characteristics may be in different categories. Ethnicity was determined by self-report, a less reliable method than genetic ancestry estimates, and therefore there is a possibility of undetermined population stratification in our sample. While determination of genetic ancestry is very useful in detecting population stratification, this study only tested targeted loci using functional polymorphisms in a hypothesis-driven approach. Applying genetic ancestry estimates in this case would require a completely different study, much larger than the one described in this manuscript, such as a genome-wide association study. Lastly, associations between SNPs in ACAN/BGN/DCN genes and treatment characteristics (Supplementary Table 1) may have an

Conclusion
Our findings provide evidence of association between polymorphisms in proteoglycan-encoding genes and shoulder pain/disability among women following breast cancer treatment. Future studies in independent populations with larger sample sizes are warranted to replicate our findings and further characterize the reported associations.