Identical G+1 to A Mutations in Three Different Introns of the Type III Procollagen Gene (COL3A1) Produce Different Patterns of RNA Splicing in Three Variants of Ehlers-Danlos Syndrome AN EXPLANATION FOR EXON SKIPPING WITH SOME OTHERS*

Identical G+1 mutations in three different introns of the gene for type III procollagen (COL3A1) that cause aberrant splicing of RNA were found in three probands with life-threatening variants of Ehlers-Danlos syndrome. Because the three mutations were in a gene with multiple and homologous exons, they provided an interesting test for factors that influence aberrant splicing. The G+1 to A mutation in intron 16 caused extensive exon skipping, the G+1 to A mutation in intron 20 caused both use of a cryptic splice site and retention of all the intron sequences, and the G+1 to A mutation in intron 42 caused efficient use of a single cryptic splice site. The different patterns of RNA splicing were not explained by evaluation of potential cryptic splice sites in the introns by either their homology with 5'-splice sites from other genes or by their delta G(0)37 values for binding to U1 RNA. Instead, the results suggested that the patterns of aberrant RNA splicing were primarily determined by the relative rates at which adjacent introns were normally spliced.

A series of mutations that alter RNA splicing have been shown to cause a number of genetic diseases (4-23). The genes for the major fibrillar collagens present an unusual target for RNA splicing mutations, since the large triple-helical domains of the proteins are encoded by 42 exons that are similar in both size and nucleotide sequence (24-26). The typical exon is 54 bp, and the remaining exons of 45, 99, 108, and 162 bp appear to be related in size to the typical 54-bp exon, either as multiples of 54 bp or by loss of 9 bp from 54- or 108-bp exons. Also, all of the 42 exons coding for the repeating -Gly-X-Y- sequences of the large triple-helical domains of the collagens begin with a complete codon for glycine. Therefore, mutations that delete an exon or cause "exon skipping" in RNA splicing generate an mRNA that is in-frame both in terms of coding sequences and in terms of the repeating -Gly-X-Y- sequence. Four such mutations have recently been reported in the gene for type I procollagen in probands with osteogenesis imperfecta (20,21) or Ehlers-Danlos syndrome type VII (EDS-VII) (22,23). Three of these mutations were single base mutations (20,22,23) and the fourth was a 19-base pair deletion (21). All four mutations produced efficient exon skipping so that the coding sequences of an exon were eliminated from mRNA transcripts and shorter proα chains were synthesized.
Here we report three single base mutations in the gene for type III procollagen that cause aberrant splicing of RNA. All three mutations were found in probands with the type IV variant of EDS (27-29), the most serious form of the disease, which often has life-threatening consequences such as rupture of arteries or hollow organs. Each of the mutations was a single base substitution that converted the first G of an intervening sequence to an A. The three mutations, however, had very different effects in terms of the efficiency with which they induced exon skipping or the use of cryptic splice sites.

MATERIALS AND METHODS
Cultures of skin fibroblasts were prepared (30) from the three probands, and RNA was extracted with guanidinium isothiocyanate and purified by centrifugation through cesium chloride (31). The RNA was used for S1 nuclease protection experiments using five different single-stranded antisense DNA probes coding for sequences of type III procollagen. The uniformly labeled DNA probes were prepared by cloning appropriate fragments (0.7-1.5 kilobases) of the cDNAs for type III procollagen (32) into the filamentous bacteriophage M13. The inserts in M13 were used to synthesize single-stranded antisense DNA labeled with [α-32P]dCTP as described previously (20,21). The labeled probes were electrophoretically separated from template strands in a 5% polyacrylamide DNA sequencing gel and were electroeluted (31).
To prepare a cDNA template for PCR, total RNA was isolated from cultured skin fibroblasts as described above (30,31). The RNA was used to synthesize double-stranded cDNA as described by Gubler and Hoffman (33) with a commercial kit (Bethesda Research Laboratories). For the PCR (34), four sets of oligonucleotides based on the cDNA sequence (32) were used as primers to generate overlapping products that included all the coding sequences of the α chain domain and, therefore, made it possible to detect any mutations in these sequences (Fig. 1). The primers were 30-mers in which 20 nucleotides were complementary to the cDNA and 10 nucleotides contained sequences of restriction sites convenient for cloning.
To isolate genomic DNA templates for the PCR, DNA was extracted from 175-cm2 flasks of cultured skin fibroblasts with sodium dodecyl sulfate and proteinase K (31). Three sets of oligonucleotide primers were used for the PCR. For proband I (+1 IVS16), the 5' primer was identical to coding sequences in exon 15 and the 3' primer was complementary to coding sequences in exon 17. For proband II (+1 IVS20), the 5' primer was identical to coding sequences in exon 19 and the 3' primer was complementary to coding sequences in exon 21. For proband III (+1 IVS42), the 5' primer was identical to sequences in intron 41 and the 3' primer was complementary to sequences in intron 43 (35).
For DNA sequencing, the PCR products were cloned into M13mp18 or M13mp19, and the clones were sequenced with the dideoxynucleotide method (36). Some of the clones were sequenced with standard procedures using radionucleotides. Other sequences were determined using fluorescently tagged M13 primers and analyzing the products on an automated DNA sequencer (model 370A, Applied Biosystems).

Definition of RNA Splicing Patterns-To determine the RNA splicing patterns, a series of single-stranded DNA probes were prepared for probe protection experiments. For proband I (+1 IVS16), a BamHI/BamHI cDNA fragment (nucleotides 173 to 1034 in Ref. 32) extending from exon 2 (37) to exon 16 (35) was ligated to a genomic BamHI/EcoRI fragment extending from exon 16 to intron 17 (from nucleotide 1035 in the cDNA to about position +738 in intron 17) (35). The ligated product was used as a template for the PCR using a 5' primer identical to sequences in exon 6 (37) and a 3' primer complementary to codons in exon 17 (35). The PCR product was then cloned into M13 to prepare the single-stranded probe as described above except that the two probes generated were end-labeled with [α-32P]dCTP and the Klenow fragment of DNA polymerase I after digestion with AccI or NcoI. For proband II (+1 IVS20), cDNA from the proband's mRNA was used for the PCR using a 5' primer identical to codons in exon 15 and a 3' primer complementary to codons in exon 21 (32,35). The PCR products were separated on NuSieve agarose (FMC BioProducts). The largest fragment, corresponding to RNA with unspliced sequences from intron 20, was cloned into M13 and sequenced. Nucleotide sequencing of the clone demonstrated that the insert contained the expected coding sequences and all the sequences of intron 20. A single-stranded DNA probe was prepared as described above except that digestion was done with A&. For proband III (+1 IVS42), a hybrid cDNA-genomic DNA probe was prepared. The 5' half contained cDNA sequences from the PvuII site in exon 34 to the PvuII site in exon 42 (32). The cDNA sequences were ligated to a genomic fragment extending from the PvuII site in exon 42 to an EcoRI site in intron 44 (35). The ligated DNA was then used as a template for a PCR with a 5' primer identical to coding sequences in exon 34 and a 3' primer complementary to sequences ending with nucleotide +34 of intron 43. The PCR product was cloned into M13 and sequenced. A labeled single-stranded DNA probe was prepared as described above except that the DNA was end-labeled with [α-32P]dGTP after BstEII digestion.
The appropriate single-stranded DNA probes were used for S1 nuclease protection experiments under the conditions described above except that 50-400 units of S1 were used.
Analysis for Potential 5'-Splice Sites-To score normal and potential 5'-splice sites with the method of Shapiro and Senapathy (38), the sequences were searched for -GT- dinucleotides and a score was computed for each -GT- dinucleotide in the context of the two nucleotides preceding and the four nucleotides following it. The scoring method was modified slightly (38) in that the -GT- dinucleotide used to identify potential 5' sites did not contribute to the score. The frequency (in percent) with which a particular nucleotide occurred in a given position relative to the -GT- was read from the subtable for primate genes (38). The total ($t$), or sum of the values for the six positions, was then used to give a score according to the formula $\text{score} = 100\,(t - t_{\min})/(t_{\max} - t_{\min})$, where $t_{\min}$ and $t_{\max}$ are the minimum and maximum possible totals for the six positions. To calculate ΔG°37 for binding of U1 RNA to normal and potential 5'-splice sites, the free energy parameters of Freier et al. (39) were used.
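The scoring procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the function names are hypothetical, and `TOY_TABLE` holds made-up placeholder percentages rather than the primate frequencies published in Ref. 38, which would have to be substituted to reproduce the scores reported here.

```python
from typing import Dict, List, Tuple

# One {base: percent} column per scored position: the two bases before the
# GT and the four bases after it. PLACEHOLDER values for illustration only;
# these are NOT the primate frequencies of Shapiro and Senapathy (Ref. 38).
TOY_TABLE: List[Dict[str, float]] = [
    {"A": 60, "C": 10, "G": 15, "T": 15},
    {"A": 10, "C": 5, "G": 80, "T": 5},
    {"A": 50, "C": 5, "G": 40, "T": 5},
    {"A": 70, "C": 10, "G": 10, "T": 10},
    {"A": 10, "C": 5, "G": 80, "T": 5},
    {"A": 15, "C": 15, "G": 20, "T": 50},
]


def score_site(context: str, table: List[Dict[str, float]]) -> float:
    """Score six context bases (2 upstream + 4 downstream of a GT).

    Implements score = 100 (t - t_min) / (t_max - t_min); the GT itself
    does not contribute, as in the modified method described in the text.
    """
    if len(context) != len(table):
        raise ValueError("context must have one base per table column")
    t = sum(column[base] for column, base in zip(table, context))
    t_min = sum(min(column.values()) for column in table)
    t_max = sum(max(column.values()) for column in table)
    return 100.0 * (t - t_min) / (t_max - t_min)


def scan_for_sites(seq: str, table: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Return (0-based index of the G, score) for every GT with full context."""
    seq = seq.upper()
    hits = []
    for i in range(2, len(seq) - 5):
        if seq[i:i + 2] == "GT":
            context = seq[i - 2:i] + seq[i + 2:i + 6]
            hits.append((i, score_site(context, table)))
    return hits
```

With the placeholder table, `scan_for_sites("AGGTAAGT", TOY_TABLE)` finds a single site whose context matches the highest-valued base in every column and therefore scores 100. Normalizing by `t_min` and `t_max` keeps scores on a 0-100 scale whatever frequency table is supplied.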

Initial Screening of RNA Using S1 Nuclease Protection Experiments-To search for mutations in mRNAs from the three probands, five single-stranded cDNA probes spanning all the coding sequences of the human α1(III) chain (32) were prepared and used for S1 nuclease protection experiments. With RNA from proband I (+1 IVS16), one probe was only partially protected (Fig. 2). The probe was an XhoI/SalI fragment of 708 nt (nt 975-1682 in the cDNA, extending from exon 15 to exon 25; Ref. 32) that was uniformly labeled. About half of the probe was fully protected by the RNA from the proband's fibroblasts, and about half was cleaved to two fragments (Fig. 2). One of the cleaved fragments was 620 nt. The other was apparently too small to detect on the polyacrylamide gels. The results, therefore, suggested that the RNA had an aberrant structure about 88 nt from either the 5'- or 3'-end of the probe, i.e. at about nt 1063 (13 nt downstream of the junction between exons 16 and 17) or nt 1594 (15 nt upstream of the junction between exons 23 and 24) in the cDNA (32).
RNAs from probands II and III fully protected the five probes (not shown).
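The arithmetic behind the two candidate positions for the aberrant structure in proband I can be written out as a short sketch. The function name is hypothetical; the numbers passed to it (probe coordinates 975-1682 and the 620-nt fragment) are those given in the text.

```python
def aberrant_positions(probe_start: int, probe_end: int, large_fragment: int):
    """Locate a single S1 cleavage from the size of the larger fragment.

    Returns (probe length, size of the undetected small fragment,
    candidate position counted from the 5' end,
    candidate position counted from the 3' end), in cDNA coordinates.
    """
    probe_len = probe_end - probe_start + 1
    small_fragment = probe_len - large_fragment
    from_5_prime = probe_start + small_fragment
    from_3_prime = probe_end - small_fragment
    return probe_len, small_fragment, from_5_prime, from_3_prime


# Probe spanning nt 975-1682 of the cDNA, cleaved to a 620-nt fragment.
print(aberrant_positions(975, 1682, 620))  # (708, 88, 1063, 1594)
```

A uniformly labeled probe cannot distinguish which end the small fragment came from, which is why both nt 1063 and nt 1594 remained candidates until the cDNA was sequenced.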
Sequencing of Coding Sequences of the cDNAs-A series of overlapping PCR products were prepared using mRNA-derived cDNA from fibroblasts from the three probands (Fig. 1). The PCR products were subcloned into M13 and sequenced. To further characterize the mutation in proband I (Fig. 2), PCR products 1 and 2 were sequenced. In order to find mutations in probands II and III, multiple clones of all PCR products were sequenced.
With proband I (+1 IVS16), six M13 clones of PCR product 1 showed a deletion of all 18 codons of exon 16. Since exon 16 contains nt 997-1050 of the cDNA (32), the results were consistent with the S1 nuclease experiments if allowance is made for the difficulty of accurately estimating the size of the protected fragments (Fig. 2). Four M13 clones had a normal sequence in the same region.
With proband II (+1 IVS20), alterations in nucleotide sequences were also seen in M13 clones of PCR product 1 (Fig. 1). Of 156 M13 clones that were partially sequenced, 60 had an insertion of the first 24 nt from intron 20, three had an insertion of all 132 nt from intron 20, two lacked all the codons of exon 20, and 91 had a normal sequence. The 24- and 132-nt insertions all had a single-base mutation that converted the first G of intron 20 to an A and thereby altered the consensus sequence of -GT- that has been found in most of the introns of eukaryotic genes reported to date (38,40). The 24- and 132-nt insertions did not alter the reading frame, but the 132-nt insertion contained a stop codon at positions +34 to +36 of intron 20. The results were consistent with the negative results of the S1 nuclease experiments if it is assumed that the presence of intron sequences in the transcript produced looping out of the RNA but not of the labeled DNA probe used to protect the RNA.
With proband III (+1 IVS42), 45 M13 clones from PCR product 4 had an insertion of the first 30 nt from intron 42, and three clones had a normal sequence in the region. The 30-nt insertion had a single-base mutation that converted the first G of intron 42 to an A and thereby altered the consensus sequence of -GT-. The 30-nt insertion did not change the reading frame. Again, the results were consistent with the negative results of the S1 nuclease experiments if it is assumed that the presence of intron sequences produced looping out of the RNA but not of the labeled cDNA probe.
The data from the experiments were used to design primers and to carry out PCRs using genomic DNA as a template. The products of the PCRs were again cloned into M13 for nucleotide sequencing.
The results confirmed the G+1 to A mutation in intron 20 in proband II and the G+1 to A mutation in intron 42 in proband III. They also demonstrated that proband I had an identical G+1 to A mutation in the first nucleotide of intron 16.
Pattern of RNA Splicing Induced by the Three Mutations-To define the pattern of RNA splicing produced by the three single-base mutations in the three probands, a further series of S1 nuclease protection experiments were carried out. Because only about half of the genomic structure of the human type III procollagen gene is known (see Refs. 32, 35, and 37), several different strategies were used to prepare single-stranded DNA probes containing the appropriate coding and intervening sequences (see "Materials and Methods"). To examine RNA splicing in proband I (+1 IVS16), an end-labeled hybrid DNA probe of 865 nt was prepared. The probe contained part of exon 10, all of exons 11-16, intron 16, and the first 50 nt of exon 17 (Fig. 3). RNA from the proband's fibroblasts protected fragments of 820, 373, 349, and 295 nt. The fragment of 295 nt indicated exon skipping in RNA splicing so that the 18 codons of exon 16 were eliminated from the mRNA. The fragment of 349 nt reflected normal splicing of the RNA. The fragment of 373 nt indicated use of a cryptic 5'-splice site at position +25 of intron 16. The sequence of the cryptic 5'-splice site was -GGGTATAA-.
The fragment of 820 nt reflected a failure to splice out intron 16, since it contained all 421 nt of intron 16. (Cleavage of the 865-nt probe to a fragment of 820 nt was more easily seen in other gels (not shown).) An additional band of about 312 nt varied widely in intensity from one experiment to another. The fragment of 312 nt indicated a deletion of 37 nt of exon 16, but no potential cryptic splice site was found in exon 16 that could account for the band. Therefore, the origin of this fragment was not apparent. Densitometry of the gels demonstrated that about 71% of the mRNA from the mutated allele (half of the total mRNA) was spliced by exon skipping, about 21% was spliced by insertion of 24 nt from intron 16, about 1% was processed without any splicing of the 421 nt of intron 16, and about 8% was accounted for by the undefined deletion (Table I).
In parallel experiments with proband II (+1 IVS20), the probe was 382 nt long (Fig. 4). The probe contained part of exon 19, all of exon 20, all of intron 20, and part of exon 21. RNA from the proband's fibroblasts generated single-stranded DNA fragments of 112, 136, and 340 nt. The fragment of 112 nt indicated normal RNA splicing. The fragment of 136 nt indicated that the cryptic 5'-splice site at position +25 of the intron was used in splicing of some RNA. The sequence of the cryptic splice site was -TGGTTATT-.
The fragment of 340 nt indicated that some of the RNA was spliced so that all 132 nt of intron 20 were retained. The fragment of 58 nt that might reflect skipping of exon 20 was not detected. As indicated above, evidence of exon skipping was obtained by sequencing of cDNA clones generated from PCR products, in that two of the 156 cDNA clones lacked the codons of exon 20. Densitometry of the gel indicated that about 53% of the RNA from the mutated allele was spliced by insertion of the first 24 nt from intron 20, about 34% by insertion of all the 132 nt from the intron, and about 12% by an undefined insertion of 113 nt (Table I).
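The protected-fragment sizes reported for probands I and II follow a simple additive pattern, sketched below. The sketch assumes (this is an interpretation, not a statement from the text) that the labeled end lies upstream, so that each visible fragment runs from the label to the point where the RNA and the probe diverge. The function name is hypothetical. For proband II, the 96 nt of exon 21 in the probe is back-calculated from the observed 340-nt fragment, so that value is not an independent check.

```python
def expected_fragments(normal: int, exon_len: int, cryptic_pos: int,
                       intron_len: int, downstream_exon: int) -> dict:
    """Predict protected fragment sizes (nt) for an end-labeled probe.

    normal          -- label to the 3' end of the exon before the mutated intron
    exon_len        -- length of that exon
    cryptic_pos     -- intron position of the cryptic GT (e.g. 25 for +25)
    intron_len      -- full length of the mutated intron
    downstream_exon -- nt of the next exon present in the probe
    """
    return {
        "exon skipping": normal - exon_len,
        "normal splicing": normal,
        # A cryptic site at +n leaves the first n - 1 intron nt in the mRNA.
        "cryptic site": normal + (cryptic_pos - 1),
        "intron retained": normal + intron_len + downstream_exon,
    }


# Proband I: all input values are taken from the text.
print(expected_fragments(349, 54, 25, 421, 50))
# Proband II: downstream_exon=96 is an ASSUMPTION (340 - 112 - 132).
print(expected_fragments(112, 54, 25, 132, 96))
```

For proband I the predicted sizes are 295, 349, 373, and 820 nt, matching the observed bands. For proband II the prediction for exon skipping is 58 nt, the size of the fragment that was looked for but not detected.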
In parallel experiments with proband III (+1 IVS42), the probe was about 1259 nt long (Fig. 5). Densitometry of the gel indicated that essentially all of the RNA from the mutated allele was spliced by using the cryptic splice site in intron 42 (Table I).
Analysis of Potential 5'-Splice Sites-To explore why the three mutations produced different RNA splicing patterns (Table I and Fig. 6, I), an analysis of the potential 5'-splice sites was carried out with the statistical procedure of Shapiro and Senapathy (38), in which potential 5'-splice sites are scored relative to the frequency with which the same bases are found in 542 normal 5'-splice sites of primate genes. In the analysis, the sequence complementary to the 5' end of U1 RNA scores 100. The normal and the potential 5'-splice sites were also evaluated with the free energy parameters of Freier et al. (39).
Comparison of the statistical scores for the normal 5'-splice sites (Fig. 7 and Table II) indicated that the normal sites in the three introns had higher values than any potential sites in the same introns after sites within 70 nt of the next exon were excluded (41). Comparison of the ΔG°37 values did not, however, present as consistent a picture. For introns 16 and 20, the normal 5' sites had more favorable ΔG°37 values than any potential sites in the two introns. This was not true for intron 42, in which four potential sites had more favorable ΔG°37 values (-5.8 to -7.1 kcal/mol) than the normal 5' site (-3.5 kcal/mol).
The data from the three mutations were examined to determine whether they explained why specific cryptic sites were used and why exon skipping occurred in proband I (+1 IVS16) and not in the other two probands. The results indicated that when a cryptic site was used, there was a preference for the most 5' site with a high statistical score or with a favorable ΔG°37 value. In proband I (+1 IVS16), the +9 site in intron 16 was not used, apparently because it had unfavorable values by both parameters (34% and -0.6 kcal/mol). The +25 site was used, whereas the +153 site was not, apparently because the +25 site was more 5'. The same situation appeared to hold for proband II and proband III in that there was preference for cryptic sites near the beginning of the intron.

Fig. 6 (legend, partial): ... and this intron is spliced before the intron containing the G+1 mutation. In proband II (+1 IVS20), the observation that 34% of the RNA retains all of the sequences of intron 20 is consistent with the scheme shown. In proband III (+1 IVS42), the data do not exclude the possibility that there is rapid cleavage of the cryptic site at +31 in intron 42 before splicing of intron 41.
The data did not, however, provide a simple explanation as to why exon skipping occurred in proband I (+1 IVS16) but not in the other two probands. In particular, intron 16 had two potential 5' sites with ΔG°37 values more favorable (-5.2 and -5.4 kcal/mol) than the two best potential sites in intron 20 (-2.1 and -2.7 kcal/mol). Also, one of the sites at position +153 in intron 16 had a statistical score (47%) similar to the statistical score (49%) of the best site at position +25 in intron 20. Therefore, the values predicted that the extent of exon skipping and use of cryptic splice sites should be comparable in proband I (+1 IVS16) and proband II (+1 IVS20). Instead, 70% of the RNA from the mutated allele in proband I was spliced by exon skipping and no more than about 1% was processed by exon skipping in proband II (Table I).

DISCUSSION
The results here describe the first RNA splicing mutations in the gene for type III procollagen. All three mutations were found in probands with EDS IV. All the mutations consisted of single base changes that converted the first G in an intervening sequence to an A, and thereby destroyed the sequence of -GT- found as the first two nucleotides of most intervening sequences (38,40). Two of the mutations occurred adjacent to homologous 54-bp exons coding for the repetitive -Gly-X-Y- amino acid sequences characteristic of collagens. The third mutation was adjacent to a similar 108-bp exon. The consequences of the three single base mutations, however, were very different in terms of the patterns of RNA splicing they generated (Fig. 6, I, and Table I). The mutation in proband I (+1 IVS16) primarily caused exon skipping. The identical mutation in intron 20 in proband II (+1 IVS20) produced little exon skipping, but about 53% of the RNA was spliced through a cryptic site in intron 20 and about 34% was processed without any splicing of the sequences from intron 20. An identical G+1 mutation in proband III (+1 IVS42) produced highly efficient use of a single cryptic site in intron 42.
The observation that all three G+1 to A mutations altered RNA splicing is consistent with a large number of previous observations. Analyses of over 3000 introns indicated that 99.4% (40) or more (38) contained the dinucleotide sequence of -GT- at the 5' end of the intron. Most of the deviations from the -GT- rule were a change of the T to a C (13,40,42), and only three introns, all in immunoglobulins, lacked a G in the first position (38). Also, a series of mutations that change the -GT- dinucleotide sequence at the 5' ends of introns were shown to markedly alter RNA processing (9-11,16,22,43).
The three mutations defined here are the first description of RNA splicing mutations that are caused by identical single base mutations in different introns of a gene with multiple and homologous exons. Therefore, they provide an interesting test for factors that influence aberrant RNA splicing. Evaluation of potential 5' sites in the three introns indicated that when a cryptic site was used, there was a preference for the most 5' site that had a favorable statistical score as defined by Shapiro and Senapathy (38) or a favorable ΔG°37 value for binding to U1 RNA as calculated by the free energy parameters of Freier et al. (39). The data on the potential 5' sites, however, did not provide any simple explanation as to why extensive exon skipping occurred with the mutation found in proband I (+1 IVS16) but not with the mutations found in the other two probands.
The extensive exon skipping seen with the mutation in proband I (+1 IVS16) suggests the sequence of events illustrated in Fig. 6, IIA. The normal splicing of an intron involves binding of a U1 snRNP complex to the 5' end of the intron and binding of a U2 and U5 snRNP to the 3' end (44,45). The 5' end is then cleaved, a lariat loop is formed by the ligation of the 5' end to an A at the U2 snRNP binding site, and the process is completed by cleavage of the 3' end and joining of the two exons. The exon skipping observed in proband I (+1 IVS16) can be explained by a U2 and U5 snRNP complex binding to the 3' end of intron 16 before a similar complex binds to the 3' end of intron 15. The absence of significant exon skipping in probands II and III suggests that a U2 and U5 snRNP complex binds to the 3' end of the adjacent upstream intron before a similar complex binds to the 3' end of the intron containing the mutation (Fig. 6, IIB). In proband II (+1 IVS20), the observation that 34% of the RNA was processed without any splicing of intron 20 (Table I) is consistent with the sequence shown in Fig. 6, IIB. In proband III (+1 IVS42), the sequence of events may differ slightly in that the cryptic site at position +31 may be efficiently cleaved before intron 41 is spliced. The two different sequences of events (Fig. 6, II) depend on the assumption that the introns adjacent to the mutations are normally spliced at different rates relative to the intron containing the mutation. Different rates of splicing of introns from multi-exon genes were demonstrated in vitro for the murine interleukin-3 gene, which contains four introns (46). The reasons for a preferential order in the splicing of transcripts from multi-exon genes are not apparent. However, examination of further mutations in collagen genes or in comparable multi-exon gene constructs may define the factors that are critical.
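The ordering argument above can be made concrete with a toy calculation. This is purely an illustration and not an analysis from the present study: it assumes, hypothetically, that commitment of the 3' end of the mutated intron and commitment of the 3' end of the upstream intron are independent first-order events with rates `k_mut` and `k_up`, and that exon skipping occurs whenever the mutated intron commits first. Under that assumption the skipped fraction is k_mut / (k_mut + k_up), the standard result for a race between two exponential waiting times.

```python
def skip_probability(k_mut: float, k_up: float) -> float:
    """Fraction of transcripts in which the mutated intron's 3' end commits first.

    k_mut and k_up are HYPOTHETICAL first-order rates (same arbitrary units);
    neither rate was measured in the experiments described in the text.
    """
    if k_mut < 0 or k_up < 0 or k_mut + k_up == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_mut / (k_mut + k_up)


def rate_ratio_for(p_skip: float) -> float:
    """Ratio k_mut / k_up that would give a skipped fraction p_skip."""
    if not 0 <= p_skip < 1:
        raise ValueError("p_skip must be in [0, 1)")
    return p_skip / (1 - p_skip)


# Equal rates split the transcripts evenly between the two orders.
print(skip_probability(1.0, 1.0))  # 0.5
# Rate ratio that would reproduce the ~71% skipping seen in proband I.
print(rate_ratio_for(0.71))
```

In this toy picture, the roughly 71% exon skipping in proband I and the roughly 1% in proband II would correspond to rate ratios above and far below 1, respectively, which is one way of stating the assumption that adjacent introns are normally spliced at different relative rates.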
A technical point of interest was that probe protection experiments with a labeled cDNA probe and S1 nuclease did not detect the abnormal transcripts produced by two of the mutations. The results demonstrated, therefore, that under the conditions employed here S1 nuclease did not cleave a labeled DNA probe in RNA-DNA hybrids in which there was looping out of RNA but not the DNA. At the same time, it was apparent that probe protection experiments with the appropriate genomic probe were necessary to establish the relative abundance of aberrantly spliced RNAs, since data obtained from sequencing 10 or more randomly selected M13 clones of PCR products were frequently misleading.
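The discrepancy between clone counts and densitometry can be seen by converting the proband II clone counts reported above into percentages of the aberrant clones. The sketch below uses only numbers given in the text; the function and variable names are illustrative.

```python
def percentages(counts: dict) -> dict:
    """Convert raw counts to percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Aberrant M13 clones from proband II (91 normal clones excluded).
clones_II = {
    "cryptic site (24-nt insertion)": 60,
    "intron retained (132-nt insertion)": 3,
    "exon skipping": 2,
}

# Densitometry estimates for the mutated allele in proband II, as quoted in
# the text from Table I (percent; exon skipping was "no more than about 1%").
densitometry_II = {
    "cryptic site (24-nt insertion)": 53,
    "intron retained (132-nt insertion)": 34,
    "exon skipping": 1,
    "undefined insertion (113 nt)": 12,
}

for name, pct in percentages(clones_II).items():
    print(f"{name}: clones {pct}%, densitometry about {densitometry_II[name]}%")
```

By clone counting, transcripts retaining all of intron 20 make up under 5% of the aberrant products, whereas densitometry put them at about 34%. The text does not state the cause of this difference, but it illustrates why the genomic-probe protection experiments were needed to establish relative abundance.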