Introduction

Growth and differentiation factor 5 (GDF5), also known as cartilage-derived morphogenetic protein 1, is a signaling molecule that acts extracellularly to promote the development, maintenance and repair of synovial joint tissues, particularly bone and cartilage.1 It achieves this by regulating the expression of a number of anabolic genes, including COL2A1, which codes for type II collagen, and ACAN, which codes for aggrecan.1 Homozygous mutations within the GDF5 gene can lead to severe skeletal defects such as those observed in Grebe type and Hunter–Thompson chondrodysplasias.2, 3 Furthermore, brachydactyly type C, which is characterized by the shortening or loss of the phalanges, is due to haploinsufficiency of GDF5 and resides in patients heterozygous for the Grebe type mutation CDMPC400Y.4 These discoveries led to the hypothesis that skeletal elements can vary in their sensitivity to the loss of GDF5 function.4 This hypothesis is supported by studies of the brachypodism mutation in mice, which involves a frame-shift mutation of gdf5 that produces a truncated and inactive GDF5. Mice homozygous for the mutation present with overt skeletal defects, whereas heterozygous mice do not manifest an obvious abnormal phenotype but are predisposed to develop an osteoarthritis (OA)-like condition when challenged.5, 6

Based on the known role of GDF5 in skeletogenesis, Miyamoto et al7 selected GDF5 as a plausible candidate for harboring OA susceptibility in humans. OA is an age-related common polygenic disease characterized by the thinning and loss of the articular cartilage in synovial joints such as the hips and the knees.8 This study by Miyamoto et al7 revealed that a single-nucleotide polymorphism (SNP) in the 5′-untranslated region (5′-UTR) of GDF5, rs143383, was associated at genome-wide significance with OA in a Japanese population. This result was quickly replicated in a number of other OA populations.9 rs143383 is a C/T transition and the OA-associated T-allele of rs143383 has been shown to produce less mRNA transcript compared with the ancestral C-allele, indicating that reduced GDF5 expression is likely to be the mechanism through which this OA susceptibility is operating.7, 10 The effect that rs143383 has on GDF5 expression is modulated by a second C/T transition SNP, rs143384, which is also located within the 5′-UTR of the gene; luciferase reporter assays have revealed that the reduction in GDF5 expression occurs only when the T-allele of rs143383 is present on the same chromosome with the T-allele of rs143384.11

rs143383 is common in Asians and Europeans, with a minor allele frequency (MAF) >20% for the C-allele. Genome-wide association scans of polygenic traits have revealed that common alleles individually contribute only modestly to trait variance and there are claims that rare variants of greater individual impact could account for some of the missing heritability.12, 13 As such we were keen to assess whether GDF5 harbored rare variants that could influence OA susceptibility by either acting as independent risk factors or by modulating the influence of rs143383, akin to rs143384. We therefore performed a deep sequence analysis of GDF5, on >1900 European individuals, and focused our sequencing on the proximal regulatory elements of the gene (promoter, UTRs, intron/exon boundaries) and on the protein coding sequence. This revealed an absence of rare GDF5 variants, with the gene harboring either known common variants, with MAFs ≥2.5%, or very rare novel variants, with MAFs ≤0.006%.14 In total, the study uncovered six novel variants; five within the protein coding sequence of GDF5 and one in the proximal promoter, 41 bp upstream of the transcription start site. Clearly, the extremely low frequency of these variants means that none can have a genetic impact at the population level on OA risk. We were, however, intrigued by the −41 bp variant, since this C/A transversion is located in a conserved sequence of GDF5 and a search of databases revealed that the novel A-allele is likely to impact on the binding of trans-acting regulators of gene transcription. If the −41 bp variant does highlight a genuine regulatory domain of GDF5, then this may offer us with a DNA site that could be exploited to manipulate the expression of the gene and therefore potentially alleviate the detrimental effect that the T-allele of rs143383 has on GDF5 expression and therefore on OA risk. In this report we describe the experiments that were performed to assess whether the −41 bp variant is functional.

Materials and methods

Construction of GDF5-pGL3-basic luciferase reporter plasmids

We investigated the −41 bp variant in combination with rs143383 and rs143384. The physical order of the three variants is −41bp-rs143383-rs143384. To generate constructs for use in the GDF5 promoter/5′-UTR luciferase reporter gene assays, a 403-bp fragment of genomic DNA corresponding to the GDF5 proximal promoter and part of the 5′-UTR (−97 to +305 with respect to the transcriptional start site of GDF5) and encompassing the three variants was PCR amplified using gene-specific primers containing either a MluI (forward primer 5′-ACTCAACGCGTGGATTCAAAACTAGGGG-3′) or a BglII (reverse primer 5′-GCACAAGATCTAGCCGCTGAATGACACCAAAG-3′) restriction site at the 5′ end of the primer. The PCR fragment was then cloned into the MluI/BglII sites of the purified pGL3-basic vector’s multiple cloning site (Promega, Southampton, UK). Plasmid DNA was transformed into MACH 1 bacterial cells (Invitrogen, Paisley, UK) and following growth on agar plates, clones were picked and sequenced by Sanger chemistry using capillary electrophoresis. The starting input DNA for this cloning strategy came from the individual in whom the −41 bp variant had been originally detected.14 This individual was heterozygous for the −41 bp variant and for rs143383 and rs143384, and carried a −41 bp (A) – rs143383 (C) – rs143384 (C) haplotype (in future known as the A-C-C haplotype) as well as a −41 bp (C) – rs143383 (T) – rs143384 (T) haplotype (in future known as the C-T-T haplotype). To generate A-T-T and C-C-C haplotypes we used the Agilent Quickchange II site-directed mutagenesis kit (Agilent, Wokingham, UK). To generate A-T-T we used the C-T-T clone and the forward primer 5′-CAGCATTACGCCATTATTCCTTCTTGGAAA-3′ and the reverse primer 5′-TTTCCAAGAAGGAATAATGGCGTAATGCTG-3′. To generate C-C-C we used the A-C-C clone and the forward primer 5′-CAGCATTACGCCATTCTTCCTTCTTGGAAA-3′ and the reverse primer 5′-TTTCCAAGAAGGAAGAATGGCGTAATGCTG-3′. Purified DNA for each clone was prepared using the Maxi-Prep kit (Qiagen, Crawley, UK) before transfection.

Transfection of cell lines

Two human cell lines were tested; the SW1353 chondrosarcoma cell line and the MG63 osteosarcoma cell line. SW1353 cells were seeded in Dulbecco’s modified Eagle’s medium/nutrient mixture F-12 (DMEM/F-12; Gibco, Paisley, UK), containing 5% fetal bovine serum (FBS), penicillin, streptomycin (Sigma Aldrich, Gillingham, UK) and glutamine at a density of 17 500 cells per well in a 48-well cell culture plate (Costar, Gillingham, UK). MG63 cells were seeded as above but with Dulbecco’s modified Eagle’s medium (DMEM; Gibco). SW1353 cells were cultured for 48 h and MG63 cells for 24 h. The cells were then transfected using 500 ng of pGL3 plasmid DNA and 15 ng of pTK-RL Renilla plasmid in combination with Exgen 500 Transfection reagent (Fermentas, Sankt Leon-Rot, Germany). Twenty-four hours after transfection the cells were lysed, using passive lysis buffer (Promega), and the protein extracted and stored at −20°C. Lysate sample was mixed with luciferase activating reagent II (Promega) and a 1-s luciferase activity reading was measured using a MicroLumat Plus LB96V luminometer (EG&G Berthold, Bad Wildbad, Germany). Stop and Glo (Promega) was added to each sample to measure renilla activity, using the same exposure time. An empty vector transfection was performed as a control. Each experiment contained six replicates, and was repeated three times, producing a total of 18 data points. A Student’s t-test was performed to assess any significant differences in luciferase activity between the different haplotypes.

Nuclear protein extraction

SW1353 and MG63 cells were plated on 500 cm2 plates at a density of 20 × 106 cells per plate. After 24 h the culture medium was removed and the cells were washed in ice-cold PBS. The PBS was then removed and the cells were scraped into 5 ml of fresh PBS and centrifuged for 30 s at 10 000 g at 4°C. The cells were then re-suspended in 1 ml of hypotonic buffer (Roche, Burgess Hill, UK) and incubated on ice for 15 min. The cells were then re-pelleted and the pellet re-suspended in ice-cold hypotonic buffer supplemented with 0.25 M sucrose. The cells were centrifuged and the pellet was re-suspended in 1.5 ml high salt buffer (20 mM HEPES, pH 7.9, 420 mM NaCl, 20% glycerol (v/v), 1 mM DTT, 10 mM NaF, 1 mM Na3VO4, and one complete protease inhibitor cocktail tablet per 50 ml of buffer). Following a 30-min incubation on ice the cells were centrifuged and the supernatant containing nuclear protein was collected and stored at −80°C.

Electrophoretic mobility shift assay (EMSA)

Fluorescently labeled (5′ DY682) oligonucleotides (Eurofins MWG Operon, London, UK) were re-suspended in water (Sigma Aldrich) to a concentration of 100 pmol/μl. The oligonucleotides were annealed to create double-stranded probes at 95 °C for 5 min in a solution containing EMSA annealing buffer (Odyssey Infrared EMSA kit, Li-Cor Biosciences, Cambridge, UK). Following this incubation, the reaction was allowed to cool to room temperature for 2–3 h. Five μg of nuclear protein extract was then incubated with 200 fmol of double-stranded probe and incubated for 20 min at room temperature and in the dark. The samples were then electrophoresed through a native 2.3% (weight/volume) polyacrylamide gel at 100 V at 4 °C in the dark for 4 h, or until the dye front had reached the gel end. Visualization was carried out using an Odyssey Infrared Imager (Li-Cor Biosciences). When carrying out supershift EMSAs, 2 μg of antibody (Transcruz, Santa Cruz, CA, USA) was added in place of the unlabeled competitor oligonucleotides. Supplementary Table S1 lists the nucleotide sequences of the two −41 bp probes and of the competitor oligonucleotides used.

YY1 knockdown

Cells were seeded at a density of 4000 cells per well in a 96-well cell culture plate (Costar) for PCR or 125 000 cells per well in a six-well plate (Costar) for protein extraction. Transfection of siRNA was then performed using ON TARGET Plus smartpool human siRNA (Dharmacon, Chicago, IL, USA). The cells were lysed using Cells to CT lysis buffer (Invitrogen) and the RNA was reverse transcribed using MMLV (Invitrogen). Gene expression was measured by quantitative real-time PCR using PrimeTime Mini qPCR Assays for YY1 and GAPDH (Integrated DNA Technologies, Coralville, IA, USA), an ABI gene expression assay for GDF5 (Applied Biosystems, Foster city, CA, USA) and an ABI PRISM 7900HT Sequence Detection System. Expression of YY1 and of GDF5 relative to GAPDH was determined using the 2−ΔCt method. Twelve replicates were performed. Differences in expression were assessed using a Student’s t-test. Cells from the six-well plate were harvested using lysis buffer (50 mM Tris, 10% glycerol, 50 mM NaF, 1 mM EGTA, 1 mM EDTA, 10 mM glycerol phosphate, 1% Triton X-100, 1x complete inhibitor cocktail, 1 μ M microcystin-LR and 1 mM Na3VO4). Following quantification using Bradford reagent (Expedeon, Cambridge, UK), 10 μg of protein was used for western blotting with a monoclonal β actin antibody (Sigma Aldrich) as a loading control and the same YY1 antibody as used in the EMSAs.

YY1 expression in human articular chondrocytes

We collected articular cartilage specimens from OA patients who had undergone elective joint replacement surgery of the hip or the knee, using informed consent and local ethical committee approval. RNA was extracted from the cartilage and cDNA was synthesized as described previously.15 The expression of YY1 was then assessed by quantitative real-time PCR, as described above. For protein analysis the chondrocytes were extracted from the cartilage extracellular matrix by enzymatic digestion using hyaluronidase, trypsin and collagenase. Following extraction, chondrocytes were cultured in DMEM culture medium containing 10% FBS, 2 mM L-glutamine, 200 IU/ml penicillin, 200 μg/ml streptomycin and 40 IU/ml Nystatin. These cells were then seeded at a density of 500 000 cells/well in a six-well plate. When the cells were 80–90% confluent the nuclear protein was extracted, as described above.

Results

The −41 bp variant was discovered in an individual who was heterozygous at this site and also for the rs143383 and rs143384 SNPs. Cloning and sequencing of this person’s DNA revealed the two haplotypes to be A-C-C and the C-T-T. We therefore investigated the effect of the −41 bp variant within the context of these natural haplotypes and in the context of the alternative A-T-T and C-C-C haplotypes.

In the comparison of A-C-C with C-T-T it was apparent that A-C-C drove greater luciferase expression than C-T-T, with a 30% difference (P<0.001) seen in MG63 and a 20% difference (P<0.001) seen in SW1353 (Figure 1 and Supplementary Table S2). As noted earlier, it is already known that the T-allele of rs143383, when in combination with the T-allele of rs143384, results in reduced expression. As such, the greater expression seen here for the A-C-C haplotype could reflect the functionality of rs143383 and rs143384 rather than any functionality of −41 bp. To resolve this dilemma we studied the alternative haplotypes A-T-T and C-C-C. When the A-T-T haplotype was compared with the C-T-T haplotype, A-T-T drove greater expression, with a 21% difference (P<0.001) in MG63 and a 10% difference (P=0.044) in SW1353. Furthermore, when the C-C-C haplotype was compared with the A-C-C haplotype, C-C-C resulted in reduced expression, with a 17% difference (P=0.0072) in MG63 and a 25% difference (P<0.001) in SW1353. These data clearly demonstrate that the −41 bp variant is itself functional and that the A-allele discovered in our deep-sequencing experiments mediates increased gene expression.

Figure 1
figure 1

Luciferase (LUC) reporter assays of GDF5 promoter/5′-UTR constructs in the osteogenic MG63 cell line (a) and the chondrogenic SW1353 cell line (b). A schematic drawing of the promoter/5′ UTR construct is shown at the top left of each panel and highlights the relative positions of the −41 bp, rs143383 and rs143384 polymorphisms. Data are the fold expression in relation to the control empty pGL3 vector, and are shown as the mean and standard error (SE) of the mean from three independent experiments, each performed with six biological replicates. P-values were calculated using a Student’s t-test. *P<0.05, **P<0.01, ***P<0.001; NS, not significant.

What was particularly exciting about our luciferase data were the discovery that introducing a -41-bp A-allele on to a rs143383-rs143384 T-T haplotype neutralizes or even reverses the capacity of this T-T haplotype to mediate reduced expression. This is apparent when one compares the relative expression seen between the C-C-C and the A-T-T haplotypes in Figure 1. In MG63 the A-T-T expression is equivalent to the C-C-C expression (P=0.21) despite A-T-T harboring the T-alleles of rs143383 and rs143384, while in SW1353 the A-T-T expression is now greater than that of C-C-C (P=0.0025).

Having established that the −41 bp variant is functional in respect of gene expression we then went on to use EMSAs to characterize trans-acting factors that bind to this site. We investigated both alleles of −41 bp and nuclear extracts from MG63 and SW1353 cells. We observed a differential pattern of protein:DNA complexes on addition of labeled DNA probes corresponding to each allele, implying a difference in nuclear protein binding between the alleles (lanes C and A in Supplementary Figure S1). Based on intensity there were four major protein:DNA complexes for the A-allele, two of which were also observed for the C-allele. The four protein:DNA complexes were outcompeted with increasing concentrations of unlabeled competitor, indicated with arrows in Supplementary Figure S1. The unlabeled A-allele competitor was then used to compete binding to the C-allele and vice versa in order to assess the affinity of binding to each allele. The A-allele competitor was able to compete binding of the protein:DNA complexes using a lower amount of competitor in comparison with the C-allele competitor, implying that the protein complexes are binding more strongly to the A-allele.

The −41 bp variant resides within a sequence of GDF5 that is conserved amongst mammals (Supplementary Figure S2). Using the online prediction databases TESS and Promo 3, we identified several transcription factors whose consensus sequences indicated a potential binding with the sequence encompassing the variant and in which this binding is predicted to be influenced by the C/A substitution. We tested the ability of YY1, SOX9, abaA, SEF4, VDR and GRLF consensus sequences to out-compete the binding to the −41bp C-allele and A-allele probes, using the competitor oligonucleotides listed in Supplementary Table S1. We did not detect any competition when using abaA, GRLF, Sef4, SOX9 or VDR competitors (data not shown). We did, however, detect competition for YY1 in both MG63 and SW1353 as indicated by the arrow in Figure 2, which highlights the protein:DNA complex band that demonstrates the clearest example of this.

Figure 2
figure 2

EMSAs performed on MG63 nuclear protein extract and on SW1353 nuclear protein extract using increasing concentrations of the YY1 consensus sequence unlabeled competitor probe. C and A correspond to the fluorescently labeled −41bp C-allele and A-allele probes used, respectively. YY1 Comp represents the unlabeled YY1 consensus sequence competitor probe added at 5 × , 10 × and 50 × relative to the C-allele and A-allele probes. Arrows depict the location of a protein:DNA complex whose intensity clearly diminishes as more competitor is added.

We subsequently carried out a YY1 antibody supershift EMSA using both MG63 and SW1353 nuclear extracts (Figure 3). The protein complex that we had highlighted in Figure 2 now showed a clear supershift on addition of the YY1 antibody in both cell types, confirming binding of YY1 protein. The supershift occurs for both the C and the A alleles.

Figure 3
figure 3

Antibody supershift EMSA performed on MG63 nuclear extract and on SW1353 nuclear extract. Con is the control antibody while YY1 is the YY1 polyclonal antibody. The lower arrows in both panels depict the same band seen in Figure 2, which has now supershifted following binding to the YY1 antibody (upper arrows). A supershift occurs for both the C and A alleles.

Overall, the EMSA data identified YY1 as a trans-acting factor that binds to the GDF5 sequence containing the −41 bp variant with YY1 binding more avidly to the rare A-allele.

We then used siRNA to knockdown YY1 in MG63 cells and assessed the effect this had on GDF5 expression (Figure 4). Our successful knockdown of YY1 (P<0.001) resulted in a reduction of YY1 protein and a subsequent significant reduction in expression of GDF5 (P=0.0097). Similar data was obtained for SW1353 (data not shown). This result supports YY1 as an activator of GDF5 expression. We finally demonstrated expression of YY1 in OA articular chondrocytes by quantitative PCR and the presence of YY1 protein in the same cells by western blotting (Supplementary Figure S3).

Figure 4
figure 4

Gene and protein expression levels 48 h after knockdown of YY1 using siRNA in MG63 cells. Left, YY1 gene expression when treated with YY1 siRNA compared with that of a non-targeting (NT) siRNA. Middle, GDF5 gene expression when treated with YY1 siRNA compared with that of a NT siRNA. Data shown are gene expression relative to that of the GAPDH housekeeper plus the SE of the mean. The YY1 expression data are presented as 2−(YY1-GAPDH) while the GDF5 expression data are presented as 2−(GDF5-GAPDH), with statistical analysis performed using a Student’s t-test (***P<0.001, **P=0.0097). Right, western blot of YY1 protein from cells treated with either NT siRNA or a YY1 siRNA. β actin antibody was used as a protein housekeeper to confirm equal loading.

Discussion

Using a range of techniques we have demonstrated that the −41 bp variant discovered by our deep-sequencing analysis of GDF5 is functional, with the novel A-allele mediating increased gene expression. One of the particularly exciting outcomes of our luciferase assay was the discovery that this A-allele is able to neutralize the reduced expression that is mediated by the OA-associated T-allele of rs143383. This result clearly demonstrates that the activity of a susceptibility allele is highly context specific and that negative effects on gene expression mediated by cis-acting regulatory sites are potentially modifiable by targeting other cis sites within the gene.

We have previously demonstrated that DNA methylation of CpG dinucleotides can modulate the expression of GDF5.16 However, the −41 bp variant neither destroys nor creates a CpG site nor is it part of a CpG island. Furthermore, the luciferase reporter assays and EMSA experiments that we performed do not use native DNA but instead use amplified DNA and synthesized oligonucleotide primers, respectively. As such they do not recapitulate any methylation profile that may exist in the native DNA. The differences that we identified between the expression of the C and A alleles of the −41 bp variant are not therefore accounted for by differential methylation of GDF5.

We identified YY1 as a trans-acting factor that differentially binds to the C and A alleles of the −41 bp variant, with more avid binding to the A-allele. Knocking down YY1 led to a reduction in GDF5 expression, implying that YY1 is an activator of GDF5 expression. YY1 (Yin Yang 1) is a widely expressed zinc-finger nuclear protein that is conserved between species and which can act as a transcriptional repressor or activator, depending on cellular context.17 It has roles in cellular proliferation, differentiation and apoptosis, with yy1 knockout mice having in utero lethality.18 As far as we are aware the protein has not been implicated in OA pathogenesis. Its discovery as a regulator of GDF5 expression now opens up the possibility that it can be exploited to modulate the expression of this important OA susceptibility locus. This may be initiated by screening compound libraries to identify proteins or other factors that can upregulate the expression of YY1 in cartilage chondrocytes in vitro. Such approaches are routine in industry and are becoming established in academia.19

When we carried out our deep sequencing of GDF5 our primary goal was to assess whether this gene harbored other, novel, polymorphisms that could directly contribute to OA genetic susceptibility at a population level; by definition, such polymorphisms would need to be common. Our experiment, however, only identified extremely rare variants, with MAFs ≤0.006%.14 It is becoming increasingly apparent that the human genome has an abundance of such rare, often unique, variants with their existence accounted for by the relatively recent explosion in the growth of the population.20 The −41 bp variant would appear to fit into this category and its extremely low frequency means that it cannot impact at the population level on OA risk.

Nevertheless, our functional analysis of the variant has demonstrated that despite its uniqueness it offers insight into the regulation of expression of GDF5 and has provided us with a protein, YY1, that modulates GDF5 expression and which can now be exploited to potentially alleviate the OA risk coded for by this gene. Deep sequencing therefore offers not only a means to identify novel susceptibility alleles but also a means to identify a mechanism to counteract the effects of the susceptibility alleles that are already known.