Mutations Away from Splice Site Recognition Sequences Might cis-Modulate Alternative Splicing of Goat αs1-Casein Transcripts
STRUCTURAL ORGANIZATION OF THE RELEVANT GENE*

(Received for publication, September 16, 1991)
Christine Leroux‡, Nathalie Mazure, and Patrice Martin
From the Laboratoire de Génétique Biochimique, Institut National de la Recherche Agronomique, Domaine de Vilvert, 78350 Jouy-en-Josas, France
αs1-Casein variants F and D, synthesized in goat milk at lower levels than variant A, essentially differ from it by internal deletions of 37 and 11 amino acid residues, respectively. Northern blot analysis of mRNAs encoding αs1-casein F and A and sequencing of the relevant cloned cDNAs, as well as sequencing of in vitro amplified genomic fragments, revealed multiple alternatively processed transcripts from the F allele. Although correctly spliced messengers were identified, most of the F mRNAs lacked three exons. These exons, further identified as exons 9, 10, and 11, together encode the 37 amino acid residues present in αs1-casein variant A but missing in variant F. Exon 9 codes for the sequence present in variant A but deleted in variant D. A single nucleotide deletion in exon 9 and two insertions, 11 and 3 base pairs in length, in the downstream intron, were identified as mutations potentially responsible for the alternative skipping of these 3 exons. From a computer-predicted secondary structure it appeared that the 11-base pair insertion might be involved in base-pairing interactions with the intron 5' splice site, which might consequently be less accessible to U1 snRNA. We also report here the complete structural organization of the goat αs1-casein transcription unit, deduced from polymerase chain reaction experiments. It contains 19 exons scattered within a nucleotide stretch nearly 17 kilobase pairs long.
Caseins, which are synthesized under multi-hormonal control in the mammary gland of mammals, amount to nearly 80% of the proteins in ruminants' milk. These proteins are cemented by a calcium phosphate salt to form large and stable colloidal particles, referred to as casein micelles. Bovine caseins, which have been the most thoroughly studied, consist of four polypeptide chains: αs1-, αs2-, β-, and κ-caseins, the primary structures of which are known (reviewed in Ref. 1). Post-translational processing, such as phosphorylation, glycosylation, and limited proteolysis by plasmin, increases this complexity.
* This work was sponsored by the French Ministry of Research and Technology. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) X59835 and X59836.
‡ Supported in part by a fellowship from the council of Representatives of the "Poitou-Charentes" region.
In the goat species (Capra hircus) caseins are likewise constituted of these four polypeptides, but a quantitative allelic variability, particularly for the αs1-casein, further adds to this complexity. It has been demonstrated by electrophoretic techniques that the extensive polymorphism observed in the goat is under the control of at least seven autosomal alleles, termed αs1-CnA, B, C, D, E, F, and O, which segregate according to Mendelian expectations (2, 3). The A, B, and C alleles are associated with a high content (3.6 g/liter) of αs1-casein in milk, while αs1-CnD and F are associated with a low content (0.6 g/liter) and αs1-CnE (previously named αs1-CnB-) with an intermediate content (1.6 g/liter). αs1-CnO is probably a true null allele. The most frequently encountered alleles in French flocks are αs1-CnE and F, which together represent between 75 and 84%, according to the breeds. RFLP analysis confirmed these data at the DNA level (4) and revealed the existence of two additional alleles (αs1-CnF' and ").
The primary structure of the goat αs1-casein variant B has been recently established (5). The polypeptide chain and its bovine counterpart have the same length (199 residues) and are both 8 amino acid residues longer than the monomorphic ovine αs1-casein. While genetic variants αs1-CnA, B, C, and E only differ by amino acid substitutions, variants αs1-CnD and F appear to be αs1-CnB variants internally deleted of 11 and 37 residues, respectively (5, 6). Both deletions start at the same position (residue 59) and lead to the loss of the multiple phosphorylation site, a hydrophilic cluster of five contiguous phosphoseryl residues: SerP64-SerP-SerP-SerP-SerP-Glu-Glu70. It has been suggested that both deletions arise from an improper processing of the primary transcript (6). Peptide as well as DNA structural data strongly support such a hypothesis. Indeed, the deletion occurring in αs1-casein D ends between two glutamate residues located next to the multiple phosphorylation site. In the β-casein gene, two glutamate codons belonging to two contiguous exons contribute to the conserved multiple phosphorylation site (7, 8).¹ It was therefore tempting to ascribe the loss of peptides Gln59 to Glu69 and Gln59 to Leu95 in goat αs1-casein D and F, respectively, to mutational events inducing an out-splicing of one or several putative exons during the processing of primary transcripts (pre-mRNAs).
In an attempt to substantiate this hypothesis, we have undertaken to analyze and compare, mainly using the polymerase chain reaction (PCR) technique, the structural organization of the αs1-CnA and αs1-CnF alleles in the region containing the putatively out-spliced exons. In this report, we show that most of the transcripts from the αs1-CnF allele are actually aberrantly spliced, and lack three exons. However, properly spliced messengers are also produced, as well as transcripts in which up to five exons are missing. Nine different transcripts were isolated and characterized. Our results suggest that a single base deletion within the first unspliced exon and insertions occurring within the downstream intron might be responsible for reducing the efficiency and accuracy of the splicing machinery, which leads to exon-skipping and, in some instances, to the activation of cryptic splice sites. In addition, we report here the complete structural organization of the goat αs1-casein transcription unit, which was deduced from PCR experiments.
¹ C. Provot, M. A. Persuy, and J.-C. Mercier, submitted for publication.

EXPERIMENTAL PROCEDURES
RNA Preparation and Northern Blot Analysis-Mammary tissue was obtained from two freshly slaughtered goats, one homozygous A and one homozygous F at the αs1-Cn locus. Total RNA was prepared by guanidinium thiocyanate extraction (10) and poly(A)+ RNA isolated from total RNA by two successive chromatographic runs on oligo(dT)-cellulose (11). For Northern transfer analysis, total or poly(A)+ RNAs were treated with glyoxal as described (12), electrophoresed in 1.5% agarose gels, and transferred onto Biodyne B nylon filters (Pall BioSupport Corp., Glen Cove, NY). The membranes were probed with an ovine αs1-casein cDNA (13) labeled with [α-32P]dCTP according to the random priming method (14), and treated following the recommendations of the manufacturer.
cDNA Synthesis and Cloning in pUC18-Sequential synthesis of double-stranded cDNA was performed essentially as described (15) using poly(A)+ RNAs from goat lactating mammary gland as templates. The first strand cDNA was primed with oligo(dT) and synthesized with reverse transcriptase, while the second strand was synthesized by Escherichia coli DNA polymerase I after treatment of the mRNA/cDNA hybrid with E. coli ribonuclease H. Finally, the double-stranded cDNA was filled in with T4 DNA polymerase, and the blunt-ended cDNA thus obtained was inserted into SmaI-digested pUC18 plasmid vector. E. coli DH5α competent cells (BRL) were transformed to ampicillin resistance with recombinant plasmid DNA using the supplier's protocol. Transformants were transferred onto nylon membranes (Amersham Corp.) and screened by colony hybridization with the ovine αs1-casein cDNA probe.
Genomic DNA Preparation-Goat genomic DNA was prepared from leucocytes isolated from the plasma fraction of EDTA-anticoagulated peripheral blood samples, as described previously (4).
Oligonucleotide Preparation-Oligonucleotides, the sequences of which are given below, were synthesized using β-cyanoethyl amidite chemistry on either an 8600 Biosearch or an Applied Biosystems PCR-Mate DNA synthesizer. After cleavage, dimethoxytrityl-protected oligonucleotides were purified by RP-HPLC on a DeltaPak C18 column (Waters) using a gradient of acetonitrile. The dimethoxytrityl group was then manually removed by a 20-min treatment in 80% acetic acid at room temperature. Finally, oligonucleotides were dried down under vacuum and resuspended in distilled water, and their concentration was adjusted to 50 pmol/µl. The oligonucleotides used were as follows. Their numbering depends on their orientation. Oligonucleotides with odd numbers are in the mRNA 5' to 3' direction and even ones in the opposite direction. The sequence of primers BT21, BT27, and BT28 was 5' extended with an EcoRI recognition site used for cloning the PCR-amplified fragments.
Polymerase Chain Reaction Amplification and Analysis of PCR Products-In vitro DNA amplification was performed with the thermostable DNA polymerase of Thermus aquaticus in a thermal cycler (Perkin-Elmer Cetus), essentially as described (16). A typical 100-µl reaction mix consisted of 10 µl of 10 × PCR buffer (500 mM KCl, 100 mM Tris-HCl, 15 mM MgCl2, 0.1% (w/v) gelatin, pH 8.3), 5 µl of 5 mM dNTPs mix, 1 µl (50 pmol) of each amplimer, 0.5 µl (12.5 ng of recombinant plasmid or cDNA synthesis reaction mixture) to 4 µl (1 µg of genomic DNA) of template DNA, and 0.5 µl (2.5 units) AmpliTaq DNA polymerase (Perkin-Elmer Cetus). To avoid evaporation, mixes were overlaid with 70 µl of light mineral oil. After an initial denaturing step (94 °C for 10 min), the reaction mix was subjected, unless otherwise indicated, to the following three-step cycle, which was repeated 25 times: denaturation for 2 min at 94 °C, annealing for 2 min at 52 to 63 °C, and extension for 2 min at 72 °C. Five µl of each reaction mix was analyzed by electrophoresis, in the presence of ethidium bromide (0.5 µg/ml), either in a 0.8% SeaKem (FMC) agarose slab gel in TBE buffer or in a 3-6% NuSieve (FMC) agarose slab gel in TEA buffer, depending on the size of amplified DNA fragments. Amplified products were analyzed by Southern blotting (17). Pall Biodyne B nylon transfer membranes were prehybridized in 5 × SSPE (0.9 M NaCl, 50 mM NaH2PO4, 5 mM EDTA, pH 7.7), 1% Denhardt's solution (0.2% Ficoll, 0.2% polyvinylpyrrolidone, 0.2% bovine serum albumin), 0.5% sodium dodecyl sulfate, sonicated herring sperm DNA at 125 µg/ml for 3 h at 42 °C and then hybridized at the same temperature for 3 h with oligonucleotides radiolabeled (3 10' cpm/µg) at the 5' end with [γ-32P]ATP at 1.5 × 10⁶ cpm/ml. Membranes were washed twice for 10 min at room temperature in 5 × SSPE, then for 15 min at 50 °C in 5 × SSPE and, finally, at 55 °C for 15 min in 5 × SSPE. They were then autoradiographed at -70 °C with an intensifying screen for 1-3 h.
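The final concentrations implied by the reaction mix above follow from a simple dilution rule and can be checked with a short calculation. The sketch below is illustrative only: the helper name `final_conc` is hypothetical, and the "5 mM dNTPs mix" is treated as a single stock concentration because the text does not say whether 5 mM refers to each nucleotide or to the total.

```python
# Final concentrations in the 100-µl reaction mix, using the stock values
# quoted in the text. The helper name is an illustrative assumption.

def final_conc(stock_conc, stock_vol_ul, total_vol_ul=100.0):
    """Dilution rule: C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_vol_ul / total_vol_ul

# 10 µl of 10x PCR buffer (500 mM KCl, 100 mM Tris-HCl, 15 mM MgCl2)
buffer_10x_mM = {"KCl": 500.0, "Tris-HCl": 100.0, "MgCl2": 15.0}
final_buffer_mM = {salt: final_conc(conc, 10.0) for salt, conc in buffer_10x_mM.items()}

# 5 µl of the 5 mM dNTP stock (assumed to be one stock concentration)
dntp_mM = final_conc(5.0, 5.0)

# 50 pmol of each primer in 100 µl: pmol/µl is numerically equal to µM
primer_uM = 50.0 / 100.0

print(final_buffer_mM, dntp_mM, primer_uM)
```

Under these assumptions the mix is 1 × in buffer (50 mM KCl, 10 mM Tris-HCl, 1.5 mM MgCl2), with 0.25 mM dNTP stock equivalent and 0.5 µM of each primer.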
Before sequencing, PCR products obtained from genomic DNA were first either purified from low melting point agarose gels or phenol/chloroform-extracted and ethanol-precipitated, then phosphorylated with T4 polynucleotide kinase and cloned into SmaI-digested pUC18.
DNA Sequence Analysis-Nucleotide sequencing was performed according to the dideoxy nucleotide chain termination procedure (18) using [α-35S]dATP (Amersham) and T7 DNA polymerase (Pharmacia LKB Biotechnology Inc.) on either single- or double-stranded DNA (19).

RESULTS
Transcript Analysis
Northern Blot Analysis-Whereas goat αs1-casein A transcripts yielded only a single band in Northern blot analysis, allele F, surprisingly, gave rise both to normal-sized transcripts and to the shorter mRNA that was expected (Fig. 1). The amount of αs1-casein mRNA transcribed from the F allele was estimated to be at least 6 times lower than that transcribed from the A allele. Quantitative dot blot analysis and in vitro amplification experiments using β-casein cDNA as an internal standard (results not shown) further confirmed this result.
Nucleotide Sequence of the cDNA Encoding Goat αs1-Casein A-Two mammary cDNA libraries, representative of A and F homozygous goats at the locus αs1-Cn, were constructed in pUC18. Of the 70 positive clones isolated from the "A" library (pUC18QAcDNA), three having inserts among the longest were chosen and their DNA sequenced entirely on both strands, giving an almost full-length cDNA (Fig. 2). In their overlapping regions the three sequences were identical and displayed initiation and stop codons defining the coding region and at least part of the poly(A) tail, 17 ± 3 nucleotides (nt) downstream from a consensus (AATAAA) polyadenylation signal, depending on the polyadenylation site used. The degree of similarity with its ovine (13) and bovine (20) counterparts was 99 and 95%, respectively, counting as one difference the 24-nt sequence (residues 466-489) which is lacking in the ovine species. Despite repeated efforts, we did not succeed in isolating positive clones from the "F" library (pUC18QFcDNA) with inserts long enough to yield useful information. Consequently, we amplified αs1-CnF DNA fragments from the relevant cDNA synthesis reaction mix (FcDNA), using the PCR technique.
The F Allele Yields Multiple Forms of mRNA-Three pairs of oligonucleotide primers, BT21/BT22, BT17/BT38, and BT21/BT42 (their positions are given in Fig. 5), whose sequences were derived from the goat αs1-casein A cDNA nucleotide sequence, were used to further analyze the different forms of αs1-casein F mRNA. The amplified fragments corresponding to these transcripts were subsequently extensively characterized. Total cDNA from allele A (AcDNA) was amplified in parallel as a control. Results of the Southern blot analysis of DNA fragments amplified with the first two pairs of primers are summarized in Table I. A transcript 111 nt shorter than that from the A allele was expected from the protein structure data (6). Therefore, amplifying the FcDNA between primers BT21 and BT22 was expected to yield a 306-bp fragment instead of the 417-bp fragment obtained with the AcDNA. In addition to a 306-bp fragment, two longer fragments, of approximately 420 and 380 bp, were amplified (Fig. 3a) and also hybridized with radiolabeled BT22 (Fig. 3b). Moreover, these two fragments, but not the 306-bp fragment, hybridized with BT25, which indicates that they contain the related sequence (Fig. 3c). This result suggested, in accordance with the Northern blot analysis, that the F allele gives rise to at least two classes of transcript, one of which apparently has a structure closely related to that of the unique transcript from the A allele. This was confirmed by the results of amplifying FcDNA between BT17 and BT38. A major 210-bp fragment, again ~110 nt shorter than that (318 bp) obtained with AcDNA, was amplified, together with at least three additional fragments ranging in size between 320 and 260 bp (Fig. 3a). Southern blot analysis revealed that the 210-bp band probably comprised several different amplified fragments (Fig. 3b), which did not hybridize when the blot was probed with BT25 (Fig. 3c).
[Fig. 3 legend fragment: ... and BT17/BT38 (lanes 3 and 4). The cDNA samples which were amplified were prepared from mammary gland poly(A)+ RNAs taken from lactating goats which were either homo...]
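The fragment sizes expected from the F allele follow directly from subtracting the 111-nt deletion from the sizes obtained with AcDNA. A minimal sketch of that arithmetic is given here; the variable names are illustrative and only the numbers come from the text.

```python
# Expected amplicon sizes for the short F transcript, assuming the only
# difference from the A cDNA is the 111-nt deletion (37 codons x 3 nt).

DELETION_NT = 37 * 3

# Fragment sizes (bp) obtained with A cDNA, as reported in the text
a_amplicons = {"BT21/BT22": 417, "BT17/BT38": 318}

expected_f = {pair: size - DELETION_NT for pair, size in a_amplicons.items()}
print(expected_f)
```

This gives 306 bp for BT21/BT22, as stated, and 207 bp for BT17/BT38, which is compatible with the major band of about 210 bp estimated from the gel.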

[...] the F6, F7, and F9 sequences, which display additional deletions including the 24-nt duplicated sequence. In addition to a missing codon (CAG located between nt 277 and 279), also absent in F9, F8 was characterized by a 33-nt deletion starting, as did the 111-nt deletion, at position 220. The corresponding 33-nt sequence, in which the deoxycytidyl phosphate residue lacking in the four other long forms is located, encodes the peptide region that is deleted in the goat αs1-casein D variant. The occurrence of these multiple forms of mRNAs, which was also demonstrated using the controlled primer extension method (21), strongly suggested an abnormal processing of a unique primary transcript of the αs1-CnF allele. To test this hypothesis, we analyzed and compared the structure of the relevant regions of the A and F αs1-Cn alleles (results not shown).

Organization and Nucleotide Sequence of the Genomic Region
Two pairs of primers, BT17/BT12 and BT23s/BT12, were used to amplify genomic DNA from alleles A and F. BT23s and BT12 mimic sequences of αs1-casein A cDNA at the opposite extremity of each strand of the 111-nt sequence missing in the αs1-casein F cDNA (see Fig. 9b). Two fragments, 2.45 and 0.95 kb in length, were amplified from each allele with the primers BT17/BT12 and BT23s/BT12, respectively (Fig. 6). Such a result confirms that the nucleotide sequence encoding the internal peptide, which is deleted in variant F, is present at the genomic level. However, the occurrence of two classes of αs1-casein transcripts, one which displayed the 111-nt internal deletion and one which did not, might reflect the existence of two copies of the αs1-casein gene per haploid genome, one copy carrying a deletion of at least 0.95 kb, i.e. spanning the region from BT23s to BT12. If so, amplifying the region between BT17 and BT22 from αs1-CnF homozygous goat genomic DNA should generate two fragments, differing in size by a minimum of 950 bp. This was observed neither with the genomic DNA of the goat from which the cDNA library had been made, nor with that of three other F homozygous goats examined. Moreover, the striking similarity and simplicity of the patterns displayed by both A and F alleles after digestion with various restriction endonucleases (4) argue convincingly in favor of a single copy of the αs1-casein gene. This implies that the internal deletion characterizing the αs1-casein F variant is not due to a genomic deletion.
Since BT17 ends precisely where the 111-nt deletion and BT23s start in the cDNA, we assumed that these two contiguous exon sequences are separated from each other at the genomic level by a 1.5-kb intron. A unique 4-kb fragment was obtained by amplification with BT17/BT22. The structural organization of the genomic region surrounding the 111-nt coding sequence absent from short F mRNAs was determined by a similar strategy. Using the PCR products as starting material, a basic restriction map was constructed for both alleles (see Fig. 8a). The 1.35-kb HaeIII-EcoRI and 1.25-kb EcoRI fragments were subcloned into pUC18, and at least three independent clones were sequenced for each fragment and each allele. In addition, the 0.95-kb DNA fragments, generated from both alleles by in vitro amplification using the BT23s/BT12 pair of primers, were cloned into pUC18, and at least four independent clones were sequenced. The 111-nt sequence missing in short F cDNA forms appeared to consist of three exons, 33, 24, and 54 base pairs in length, separated by introns of 0.8 and 0.1 kb (Fig. 7). These three coding sequences were further identified as exons 9-11, respectively (see below).
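The sizes quoted above are mutually consistent, which can be verified with a few lines of arithmetic. The sketch below uses only numbers given in the text; the variable names are illustrative.

```python
# Consistency check on the exon and intron sizes reported in the text.

exon_lengths_nt = {9: 33, 10: 24, 11: 54}

skipped_nt = sum(exon_lengths_nt.values())      # sequence missing in short F cDNAs
residues_f = skipped_nt // 3                    # residues deleted in variant F
residues_d = exon_lengths_nt[9] // 3            # residues deleted in variant D (exon 9 only)

# Each exon is a whole number of codons, so skipping any of them
# leaves the reading frame unchanged.
all_in_frame = all(length % 3 == 0 for length in exon_lengths_nt.values())

# Intron between BT17 and BT23s, from the two genomic amplicons (in bp)
intron_bp = 2450 - 950

print(skipped_nt, residues_f, residues_d, all_in_frame, intron_bp)
```

The three exons add up to 111 nt, i.e. the 37 residues missing in variant F, exon 9 alone accounts for the 11 residues missing in variant D, and the difference between the 2.45-kb and 0.95-kb amplicons gives the 1.5-kb intron assumed in the text.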
Comparative analysis identified three major mutational events. Within exon 9 of the F allele, a single nucleotide (cytidyl phosphate residue) was missing. This result was later confirmed by allele-specific directed amplification using BT45/BT22 and BT23/BT22 as pairs of primers complementary to the F and A sequences, respectively, thus demonstrating that such a mutation was not an error introduced during the amplification process. Two insertions, of 11 and 3 bp, within the downstream intron (intron 9) were also detected in the F allele. The larger insertion (CGTAATGTTTC), which appeared to be nearly a perfect duplication of the preceding 11-nt sequence (CATAAAGTTTC), was located 73 bp downstream from the 5' splice site, while the 3-bp insertion (AAT or TAA) interrupted a polypyrimidine stretch (14 T) upstream from the ninth intron 3' splice site (Fig. 8b). In addition, eight scattered point mutations were detected within the sequenced region. Seven of these were intronic G → A/A → G and C → T/T → C transitions at a distance from splice site recognition sequences, and some may reflect the low fidelity of Taq polymerase. The last point mutation, a C → G transversion, affected the antepenultimate nucleotide of exon 10. The 3-bp insertion occurring within intron 9 is the only mutation affecting a splice site consensus sequence. Although additional undetected sequence alterations may be involved, the cause of the skipping of exons 9, 10, and 11 is very likely to lie within the mutations described above.
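The statement that the 11-bp insertion is nearly a perfect duplication of the preceding 11-nt sequence can be made concrete by a position-by-position comparison of the two sequences quoted above. This is only an illustrative sketch; the variable names are not from the original work.

```python
# Position-by-position comparison of the 11-bp insertion with the
# preceding 11-nt sequence, both as quoted in the text.

preceding = "CATAAAGTTTC"
inserted = "CGTAATGTTTC"

# 1-based positions at which the two sequences differ
mismatches = [
    (pos, a, b)
    for pos, (a, b) in enumerate(zip(preceding, inserted), start=1)
    if a != b
]
identical = len(inserted) - len(mismatches)

print(mismatches)
print(f"{identical}/{len(inserted)} positions identical")
```

The two sequences differ at positions 2 (A versus G) and 6 (A versus T), so 9 of the 11 positions are identical.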

Structural Organization of the Goat αs1-Casein Gene
Though the caseins represent one of the most rapidly diverging protein families, three regions of the "calcium-sensitive" casein (αs1, β, and αs2) mRNAs remain highly conserved: the 5' noncoding region, the signal peptide-coding region, and the regions encoding the ubiquitous multiple phosphorylation site (7). Genes encoding the calcium-sensitive caseins have conserved a similar organization at the 5' end of the transcription unit and share common structural patterns upstream from it, which strongly supports the hypothesis of a common ancestor (22-24). In the rat, the first two 5' exons of the genes encoding the calcium-sensitive caseins exhibit a highly conserved structure (23). Exon 2, which is consistently 63 nt in size, comprises the end of the 5'-untranslated region (12 nt) plus 17 codons, including those encoding the signal peptide, which is invariably 15 amino acid residues long. The same structure is also found for the bovine and the ovine β-casein gene (8).¹ The 5' region of the bovine αs1-casein gene probably has a similar structure (25); therefore, it seemed reasonable to expect that the 5' region of the caprine αs1-casein gene might be organized similarly. In addition, intervening sequences occurring between coding exons of the casein genes so far sequenced belong to class 0 introns (26), since they interrupt the reading phase between codons. With these considerations in mind, and given the results reported above referring to the clean skipping of three exons during RNA processing, we hypothesized that the internal deletion occurring within the bovine αs1-casein A variant (27), as well as sequences lacking in both the ovine and the caprine (F allele) αs1-casein mRNA (Fig. 5), might also be the consequence of exon-skipping events. Based on these observations, we sought to identify the most probable positions of exon junctions within the goat αs1-casein cDNA sequence. Since primer-template mismatches at the 3'-terminal base are known to reduce amplification yield dramatically, a set of primers having their 3' extremities at putative junctions between two contiguous exons were designed and used to confirm the position and to estimate the size of introns after in vitro amplification of genomic DNA (A allele). The DNA fragments generated were analyzed by agarose gel electrophoresis (Fig. 9a). The overall structural organization of the goat αs1-casein transcription unit (Fig. 9b) was deduced from these results and from sequence data. The organization of the 3'-noncoding region was elucidated by sequencing, from both extremities, DNA fragments generated by amplification between the pairs of primers BT67/BT42 and BT57/BT56. The boundaries of exons 4 and 5 were confirmed by cloning and sequencing PCR-amplified fragments between BT27 and BT48, and the junction sequences at the 3' and 5' boundaries of exons 14 and 15, respectively, were likewise determined.
[Figure legend fragment: ... unsequenced intron region whose size is given. Exon sequences are shown as codons, and their deduced amino acid sequences are given ...]
The main feature of the αs1-casein gene is its extremely split architecture. It contains 19 small exons, ranging in size from 24 to 154 bp in the coding region, spread over a ~17-kb transcription unit. Our results for the 5' region are consistent with the organization of the first five exons previously reported for the rat αs1-casein gene (25). At least five of the seven 24-bp exons (exons 6, 7, 10, 13, and 16), including the duplicated sequence (exons 10 and 13) alternatively skipped in the goat F allele transcripts, appear to have originated from the same ancestral exon, since their sequences show more than 58% similarity with a consensus sequence, as well as a remarkable conservation at their 3' end (Fig. 10). These exons are also very similar to the fourth exon of the bovine and sheep β-casein genes that encodes the multiple phosphorylation site. More strikingly, the similarity reaches 66% when the last seven codons of exon 7 of the goat αs1-casein gene are compared with the fourth exon (21 bp) of the rat β-casein gene. This strongly supports the hypothetical evolutionary pathway of the calcium-sensitive casein gene family through intra- and intergenic duplications, first predicted from amino acid sequence data (28, 22) and further substantiated (7). As expected, the 24-nt sequence, constitutively deleted in the ovine transcript, corresponds to an exon (exon 16) probably skipped during processing of the ovine αs1-casein primary transcripts.

DISCUSSION
The results presented here indicate that the previously reported internal deletions within the goat αs1-casein variants F and D (6) are due not to a genomic deletion but to the out-splicing of three exons and one exon, respectively, which occurs during pre-mRNA processing. However, with the F allele, in addition to the major transcript form having three skipped exons, more or less deleted αs1-CnF mRNAs, as well as correctly spliced transcripts, were also identified. These multiple forms are likely to originate in a dysfunction of the splicing machinery in which large multicomponent complexes, the spliceosomes, are involved. Inaccurate splicing, by promoting selection of cryptic splice sites and/or exon-skipping, can be assumed to be due to the sequence alterations within the F allele. We suggest that the cis-acting mutations that might be responsible for the alternative skipping of exons 9, 10, and 11 are precisely those features that differentiate the A and F alleles, i.e. the two insertions occurring within the ninth intron and the single nucleotide (C) deletion occurring within exon 9.
[Figure legend fragment: ... below. The vertical solid arrow and arrowheads indicate the single nucleotide (deoxycytidyl phosphate residue) deletion and intron insertions, respectively, detected in the F allele sequence. Underlined nucleotides correspond to homologous sequences putatively duplicated.]
[Figure legend fragment: ... (25). Open bars represent introns, and exons are depicted by large, stippled black (exon constitutively out-spliced from ovine αs1-casein mRNA) and white (exons alternatively skipped from goat αs1-casein F mRNA) boxes. Sizes of introns (upper italic numbers) and exons (lower numbers) are indicated in kb and bp, respectively. Arrows topped by encircled numbers represent primers and indicate their positions. Sequences of cloned PCR products have also been taken into account to construct this schematic representation of the αs1-casein transcription unit (not to scale).]

The cis-elements known to be required for pre-mRNA splicing in higher eukaryotes essentially include consensus intron sequences at the 5' splice site, at the 3' splice site, and the lariat branch point (29, 30). The larger insertion in the αs1-CnF allele (CGTAATGTTTC), located 73 nucleotides downstream from the 5' splice site of the 776-bp ninth intron, is situated at a distance from consensus sequences known to be involved in the splicing process. In contrast, the 3-nt insertion (AAT or TAA), interrupting the long polypyrimidine stretch (14 T) upstream from the ninth intron 3' splice site, might reduce the "spliceability" of exon 9 to exon 10. Therefore, if the intron is considered the basic unit recognized by the splicing machinery, one would expect an out-splicing of exon 10 alone, which has never been observed. Conversely, exons 10 and 11, when they are out-spliced, are always both simultaneously skipped. Whether this occurs before or after intron 10 has been removed is unknown. Considering that the intron 10 sequence, which is only 90 bp in length, is strictly identical in both the A and F alleles, one can reasonably assume that the 5' and 3' splice sites of the F allele primary transcript are functional. Consequently, this intron could be quickly and efficiently removed, implying that exons 10 and 11 could be skipped en bloc as a single 78-nt exon. However, skipping of exon 9 remains unexplained, especially since we have not detected any mutation within the splice site recognition sequences that surround this exon. Therefore, the factors responsible for alternative splicing might not be directly and solely the consequence of intron sequence alterations, but might also be due to exon mutations. Deletion, insertion, or even more subtle changes (substitutions) of exon sequences play a crucial role in splice site selection, especially in the case of regulated alternative splicing (31-33). Frequently, normal splice sites adjacent to the altered exon are not used (34). Considering the emerging concept of the exon as the basic unit of assembly of the spliceosome and that splice site selection is determined by splicing factor interactions across exons (35), a single nucleotide deletion (deoxycytidyl phosphate residue) at position 23 within exon 9 would be expected to influence the selection of the 5' splice site of the downstream intron and to have dramatic consequences for removal of the upstream intron. Our results are consistent with such a model, since, in addition to the prevalent exon-skipped F mRNAs, we identified a mature transcript (F6) lacking the last 5 nucleotides (GUGAG) of exon 9, which could have been recognized and used as a cryptic 5' splice site. It is worth noting that the exon nucleotide deletion thus restores an in-frame mRNA. Conversely, it seems unlikely that the G → C transversion, at the first nucleotide of the last codon of exon 10, could have any effect on the out-splicing of this exon. However, it is clear that, taken together, the structural modifications characterizing the F allele in this region drastically disrupt the splicing process and influence splice site selection. This is further exemplified by the deletion of the first codon (CAG) of exon 11 (F8 and F9), through the activation of an alternative 3' splice site, probably influenced by distal sequences.
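Two points in the argument above lend themselves to a quick check: that GUGAG resembles the intronic part of the 5' splice site consensus (GURAGU, where R is a purine), and that losing these 5 nucleotides in addition to the single-nucleotide deletion removes a whole number of codons. The sketch below is an illustration under those stated assumptions; the function name and the small IUPAC table are hypothetical helpers, not part of the original analysis.

```python
# Check 1: does GUGAG fit the first five positions of the intronic
# 5' splice site consensus GURAGU (R = A or G)?
# Check 2: is the reading frame preserved when the 1-nt exon deletion
# is combined with the loss of the last 5 nt of exon 9?

IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "Y": "CU", "N": "ACGU"}

def fits_consensus(consensus, seq):
    """True if every base of seq is allowed by the IUPAC symbol at that position."""
    return len(consensus) == len(seq) and all(
        base in IUPAC[symbol] for symbol, base in zip(consensus, seq)
    )

INTRON_CONSENSUS = "GURAGU"   # intronic part of AG|GURAGU
cryptic_site = "GUGAG"        # last 5 nt of exon 9, absent from transcript F6

cryptic_fits = fits_consensus(INTRON_CONSENSUS[: len(cryptic_site)], cryptic_site)

exon9_nt = 33
frameshift_alone = (exon9_nt - 1) % 3 != 0   # 1-nt deletion only: frame is lost
removed_in_f6 = 1 + 5                        # deleted C plus the 5 nt spliced out
f6_in_frame = removed_in_f6 % 3 == 0

print(cryptic_fits, frameshift_alone, f6_in_frame)
```

GUGAG matches the first five consensus positions, the single deletion on its own shifts the frame, and the combined loss of 6 nucleotides corresponds to two codons, in line with the remark that the F6 transcript is in frame.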
Spliceosomes are essentially composed of small nuclear ribonucleoprotein particles (snRNPs). U1 snRNP interacts through RNA-RNA base pairing with the 5' splice site consensus sequence AG|GURAGU (36) to ensure accurate splice site selection. It is generally accepted that the context of U1 snRNP binding sites is a critical factor for 5' splice site selection; secondary structures of primary transcripts have been invoked as determinants of splice site accessibility (37-40). One explanation for the variety of the F allele transcripts is that the 5' splice site is sequestered within a secondary structure. To test this possibility, a computer-predicted secondary structure was determined, using the Zuker algorithm (41) with a 150-nt window. The free energy values used were those previously reported (42). Analysis of the potential secondary structure revealed that the 11-nt insertion, identified within the ninth intron of the F allele genomic sequence, might be involved in base-pairing interactions with the 5' splice site of intron 9 (Fig. 11). If formed, such a hypothetical stem-loop structure would reduce the accessibility of the 5' splice site to splicing factors, including U1 snRNA, and would activate a cryptic 5' splice site. This raises the possibility that RNA secondary structure, in addition to the frameshift mutation (single-nucleotide exon deletion) occurring within the ninth exon, is responsible for the alternative splicing of exon 9. We propose that exons 10 and 11, which may be efficiently spliced together, might be simultaneously skipped as if they constituted a single larger exon, because of the AAT insertion within the polypyrimidine stretch at the 3' end of the ninth intron. However, this scenario remains highly speculative, and further experiments are needed to substantiate it. As a first step, we plan to study the splicing of allelic minigene constructs in transfected animal cells.
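To illustrate what matching a sequence against the 5' splice site consensus means in practice, the sketch below scores windows of an RNA sequence against AGGURAGU (the consensus with the exon|intron boundary mark removed), treating R as A or G. This is a plain position-by-position match count, not a base-pairing energy calculation and not the Zuker folding algorithm used for the structure prediction. The function names, the threshold, and the toy sequence are all hypothetical; the toy sequence is not the goat sequence.

```python
# IUPAC codes needed for the consensus; R stands for a purine (A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG"}

CONSENSUS_5SS = "AGGURAGU"  # AG|GURAGU without the boundary mark


def match_score(site, consensus=CONSENSUS_5SS):
    """Count positions of `site` that are compatible with `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(site, consensus))


def scan_for_sites(sequence, min_score, consensus=CONSENSUS_5SS):
    """Return (offset, window, score) for every window scoring >= min_score."""
    rna = sequence.upper().replace("T", "U")  # accept DNA or RNA input
    width = len(consensus)
    hits = []
    for offset in range(len(rna) - width + 1):
        window = rna[offset:offset + width]
        score = match_score(window, consensus)
        if score >= min_score:
            hits.append((offset, window, score))
    return hits


if __name__ == "__main__":
    # The pentamer at the 3' end of exon 9 fits the GURAG core of the consensus.
    print(match_score("GUGAG", "GURAG"))
    # Toy sequence (not from the goat gene) containing one perfect match.
    print(scan_for_sites("CCAGGTGAGTCC", min_score=8))
```

The GUGAG pentamer reported at the end of exon 9 is compatible with every position of the GURAG core, which is consistent with its possible use as a cryptic 5' splice site. Whether such a site is actually accessible depends on its structural context, which a match count of this kind cannot capture.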
Genomic sequence data are lacking to explain the deletion of exons 13 and 16. However, it should be pointed out that exon 16 is constitutively skipped in the ovine species (C. Leroux and P. Martin, unpublished results). We hypothesize that this exon is located within an unfavorable environment, such that some intron mutational event triggers either its alternative splicing, as in the goat species for the F allele, or its constitutive skipping, as in the ovine species. In contrast, exon 13, which is nearly a perfect duplication of exon 10 (23 out of the 24 nt are identical for the F allele), has been observed, so far, to be alternatively spliced only in the goat αs1-CnF allele. Further intron sequence data are needed to gain insight into this additional out-splicing event. This is currently under investigation in our laboratory. We are also examining other mutant alleles, including D and E, which are associated with low and intermediate protein synthesis levels, respectively.
The multiplicity of aberrantly spliced transcripts from the αs1-CnF allele may reflect a relatively weak spliceability of F mRNA precursors. This, in turn, may lead to the &fold reduction in the amount of mature transcripts observed, which accounts for the lower αs1-casein content of milk produced by goats bearing this allele. Elsewhere, it was shown that translation termination mutations in internal exons of the dhfr (43) and β-globin (44) genes gave rise to a low-RNA phenotype. The authors, who concluded that the occurrence of nonsense codons could affect RNA processing, proposed two models in which either splicing and nuclear transport, or nuclear scanning of the reading frame of an RNA molecule, could be coupled to its translation. Following these models, one can propose that the premature translational termination codon, generated by a frameshift due to the C deletion in exon 9 of the αs1-CnF gene, could partly explain the low level of these transcripts.
Interestingly, another example of abnormal RNA splicing may be that of the rare bovine αs1-casein variant A, which differs from common variants by a 13 amino acid internal deletion. Recently, a cDNA clone for bovine αs1-casein variant A was isolated from a mammary gland cDNA library constructed with tissue of a homozygous B cow (27). This result suggests that abnormal splicing may occasionally occur in the course of the maturation of primary transcripts from a usually correctly spliced allele (αs1-CnB).
It is unlikely that exon-skipping is restricted to the αs1-casein gene and that it does not occur with the other calcium-sensitive casein genes. In the ovine and caprine species, αs2-casein exists as two non-allelic forms translated from four types of mRNAs resulting from a combination of insertions and deletions, possibly due to aberrant splicing, affecting the 5'-untranslated and the coding regions (45,50).⁴ In the bovine species, a deletion of 9 residues in αs2-casein variant D (47), again affecting a multiple phosphorylation site and associated with a lower amount of casein, as in the goat αs1-casein variants D and F, may also be due to the loss of an exon. Alternative splice site selection in the course of processing of primary transcripts allows multiple mRNAs to form and, subsequently, multiple protein isoforms to be produced from a single gene. In the case of the F allele, the variety of transcripts observed should yield no fewer than seven different products, in addition to the major form (αs1-casein F1), which was previously characterized (6). It should be pointed out that several minor protein forms, probably corresponding to the translation products of the multiple mRNAs, are faintly visible in Western blots.⁶ We plan to examine, in in vitro translation experiments, whether such peptide chains exist, particularly the truncated hybrid protein arising from the frameshift mutation, which introduces a premature stop codon within correctly spliced F mRNAs (F5/F7). This putative protein, made up of 85 amino acid residues and displaying the N-terminal sequence (58 residues) of the mature αs1-casein, would no longer be phosphorylated and would have lost its C terminus. It would be of interest to assess whether such a peptide, which should possess properties differing from those of wild-type αs1-casein, has any impact on the micellar structure.
The results reported here favor the concept that both coding and intervening sequences are of great importance in permitting accurate and effective splice site selection. However, transfection experiments are required to determine whether the predicted secondary structure at the 5' splice site of the ninth intron has any relevance to the differential splicing of the αs1-casein gene transcript. If this turns out to be the case, even the double selective pressure exerted at the nucleotide level appears to be insufficient to maintain both the splicing consensus sequence and the ubiquitous multiple phosphorylation recognition site (7). Indeed, intron mutations in the goat αs1-casein gene, situated at a distance from this region, would be able to provoke the loss of the multiple phosphorylation site through alternative splicing events.