Alternative Splicing of the NC1 Domain of the Human α3(IV) Collagen Gene
DIFFERENTIAL EXPRESSION OF mRNA TRANSCRIPTS THAT PREDICT THREE PROTEIN VARIANTS WITH DISTINCT CARBOXYL REGIONS*

Three clones of NC1 of α3(IV) collagen, named Q1, L5, and V, were isolated from human kidney; these predict three variant α3(IV) NC1 domains of 232, 60, and 199 amino acid residues, respectively, with unique COOH termini. The human collagen IV gene (COL4A3) was isolated and characterized, and it was shown that the cDNA variants arose from alternative splicing by deletion of exon 4 in L5 and deletion of exon 2 in V. The mRNA transcripts were differentially expressed in fetal and adult human kidney, with Q1 the major species. Exon 4- L5 lacked 183 residues from the carboxyl terminus, with a frameshift producing a unique 11-amino acid terminal peptide. In exon 2- V a frameshift resulted in a unique V carboxyl terminus of 53 novel amino acids with a new glycosylation site. The size of recombinant proteins indicated that the frameshifts and new stop codons were as predicted. The multiple forms of the α3(IV) NC1 region may contribute to autoimmune glomerular disease and hereditary nephritis, in which this portion of the collagen IV molecule is thought to play an important role.

Collagen IV is a major structural component of basement membranes and consists of at least six α-chains, α1-6(IV), which are present in different amounts in different basement membranes. Each of the α-chains

The abbreviations used are: GBM, glomerular basement membrane; DHFR, dihydrofolate reductase; PCR, polymerase chain reaction; RT, reverse transcription; IPTG, isopropyl-1-thio-β-D-galactopyranoside; PAGE, polyacrylamide gel electrophoresis; nt, nucleotide; bp, base pair.

...spontaneously forms in patients with Goodpasture's syndrome and some other forms of glomerulonephritis is present in the α3(IV) NC1 domain (7-10). The cDNA for the α3(IV) NC1 domain has been reported, and its genomic DNA has been published (11-13). The gene is located on chromosome 2, region q35-37 (11). Expression of cDNA or use of synthetic peptides suggests that the COOH terminus of the α3(IV) NC1 harbors major autoantibody-reactive epitope(s), although additional reactive sites, including other collagen IV α-chains and its NH2-terminal 7 S region, may be involved in some patients (14-17). In addition, the GBM antigenic epitopes reactive with the human anti-GBM antibodies are not detected in affected patients from some kindreds with hereditary nephritis termed Alport syndrome (18,19), and recent studies indicate that the NC1 regions of the α3(IV) and α4(IV) chains are missing in collagenase digests of renal basement membranes from patients with Alport syndrome (20). Antibodies formed in Alport syndrome patients after transplantation with a normal kidney react with antigens believed to represent the α3(IV) and possibly α5(IV) NC1 regions (21,22). α5(IV), whose gene maps to the X chromosome (4), has been found to be differently mutated in kindreds of patients with Alport syndrome, with deletions of highly conserved cysteines and disruption of disulfide bridges (23-25).
The relationship between mutations in α5(IV) and the absence of α3(IV) or α4(IV) NC1 regions is currently unclear and could represent alterations in synthesis, abnormalities in assembly, or antigenic epitope masking or cleavage.
While cloning the cDNA for the human NC1 domain of α3(IV), different-sized clones were observed, and the genomic DNA for the NC1 domain was recovered to understand the intron/exon structure and to confirm that the different cDNAs had arisen by alternative splicing. The differential expression pattern of the different mRNAs was determined by RNase protection assay in fetal and adult kidneys to better assess the possible contribution of the alternative forms in development.

MATERIALS AND METHODS
RT-PCR to Generate the cDNAs of NC1 Domain-First strand cDNA synthesis was performed using human kidney total RNA and murine leukemia virus reverse transcriptase with a random hexanucleotide primer. The 100-µl reaction mixture contained standard enzyme buffer, 5 µg of total RNA, 20 units of RNasin, 500 pmol of hexanucleotide primer, 10 mM dithiothreitol, 1 mM of each dNTP, and 200 units of reverse transcriptase, and was heated to 95 °C for 10 min. Five µl of the mixture were used for two separate PCR reactions, one with NC1-5' (5'-CAAACCACAGCAATTCCTTCA-3') and NC1-3', and the other with V-5' (5'-GCCC?TGAGCCTTATATAAGC-3') and NC1-3' (60 °C was used for annealing during 35 cycles). The oligonucleotides were synthesized on an ABI model 380B synthesizer (Applied Biosystems, Foster City, CA).

Construction of a Partial Genomic DNA Library-Five hundred µg of purified DNA from human placenta were digested with EcoRI, and 10 µg of digested DNA were used for Southern blot (Fig. 1). The rest of the DNA was recovered by electrophoresis in a 0.7% low melting point gel. The DNA bands identified by Southern blot (3 and 1.6 kb) were purified and ligated into EcoRI-digested, calf intestinal alkaline phosphatase (CIP)-treated λZAP II, and were packaged in Gigapack.

Screening and Sequencing of Human Genomic Clones for the NC1 Domain of α3(IV)-Two kinds of libraries (3 and 1.6 kb) were screened with riboprobes transcribed from cDNA coding for the NC1 domain. Screening of both libraries was performed under stringent hybridization conditions. Approximately 6 × 10⁶ recombinant phages from the 3- and 1.6-kb libraries were screened. Hybridization to filters was carried out overnight at 55 °C in Hybrisol (Oncor, Gaithersburg, MD) with 200 µg/ml heat-denatured, sheared salmon sperm DNA, 5 µg/ml Escherichia coli DNA, and 2 × 10⁶ cpm/ml probe. Filters were washed once with 2 × SSC, 0.1% SDS at 37 °C for 30 min, three times at 55 °C for 30 min, and two times at 65 °C with 0.1 × SSC and 0.1% SDS. After two rounds of screening, the positive plaques were picked and the phagemids carried in the λZAP II (Stratagene) recombinants were rescued with helper phage in accordance with the manufacturer's instructions. Genomic DNA was subcloned into M13mp19 for single strand sequencing or into pGEMs for double strand sequencing with Sequenase (U. S. Biochemical Corp.).
RNA Extraction and RNase Protection Assay-Total RNA was isolated from 12- and 18-week fetal kidneys (Advanced Bioscience Resources, Inc., Alameda, CA) as well as adult human kidneys by the single step method (26). Five µg of total RNA from each sample were hybridized with 1 × 10⁵ cpm of the appropriate [32P]UTP-labeled antisense riboprobe at 55 °C for at least 10 h. The unhybridized RNA was then digested with RNase T1 and RNase A, as described elsewhere (27). The dried blots were scanned, and radioactivity was quantitated on the Ambis radioanalytic imaging system (Ambis Systems, San Diego, CA), as previously described (27).
Amino Acid Sequence Comparison-Protein sequences were studied and compared for molecular mimicry using the IG-Suite package (IntelliGenetics, Inc., Mountain View, CA). Sequence hydropathy and structural plots were assessed with the GCG package (Genetics Computer Group (1991), Madison, WI).
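The hydropathy analysis mentioned above can be illustrated with a minimal sketch of a sliding-window Kyte-Doolittle profile. This is not the GCG implementation used in the study; the function name `hydropathy_profile`, the window size of 7, and the example peptide are assumptions chosen only for illustration.

```python
# Kyte-Doolittle hydropathy scale (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Return the mean Kyte-Doolittle score of every full window along seq."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical peptide, for illustration only (not a sequence from this paper).
example = "MKTLLVAGNKSDDE"
profile = hydropathy_profile(example, window=7)
print([round(x, 2) for x in profile])
```

Positive window means indicate hydrophobic stretches and negative means hydrophilic ones; the window size trades resolution for smoothness.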

RESULTS
Isolation and Characterization of α3(IV) NC1 Domain cDNAs-Three different sizes of clones for α3(IV) NC1 were obtained by RT-PCR of human kidney RNA (Fig. 2). When NC1-5' and NC1-3' primers were used, a 657-bp clone was obtained and named Q1. Two other smaller clones, 479 and 484 bp in length, were also found and were named L5 and V, respectively. The sequence of Q1 was found to correspond exactly to the published sequence of the NC1 domain of α3(IV), except that the codon for amino acid 86 was determined to be ACA rather than ATA, as reported (12), and the codon for amino acid 155 was ACC rather than GCC (11). The difference at amino acid 86 was also confirmed in the V clone and later in the genomic DNA. These differences indicate that the codons for amino acids 86 and 155 specify threonines rather than isoleucine and alanine, respectively.
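The amino acid consequences of the two codon revisions can be checked against the standard genetic code. The short sketch below covers only the four codons named in the text; the variable and function names are illustrative assumptions.

```python
# Only the four codons discussed in the text, from the standard genetic code.
CODON_TO_AA = {"ATA": "Ile", "ACA": "Thr", "GCC": "Ala", "ACC": "Thr"}

# (residue number, previously reported codon, codon found in Q1)
REVISIONS = [(86, "ATA", "ACA"), (155, "GCC", "ACC")]


def translate_revisions(revisions):
    """Return (residue, reported amino acid, observed amino acid) tuples."""
    return [
        (residue, CODON_TO_AA[reported], CODON_TO_AA[observed])
        for residue, reported, observed in revisions
    ]


changes = translate_revisions(REVISIONS)
for residue, reported_aa, observed_aa in changes:
    print(f"residue {residue}: {reported_aa} -> {observed_aa}")
```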
Observation of the known intron/exon patterns of COL4A1 and COL4A5 suggested that L5 and V could have arisen by alternative splicing. Sequencing of L5 and V indicated that the deletion in L5 could have been due to "skipping" of exon 4, and the deletion in V could have been caused by "skipping" of exon 2, in both cases with frameshifts and the use of new stop codons. In Fig. 2, the difference in size between L5 and V was only 5 bp, and the amplified L5 and V overlapped, as shown in the second band. For better separation, V-5' and NC1-3' were used for PCR, in which 394-bp Q1 and 221-bp exon 4-V were obtained (Fig. 2, lane 5). The third band (Fig. 2, lanes 1-3) was caused by mispriming with the upstream primer, NC1-5'. This was because the NC1 domain of α3(IV), like that of α1(IV) (28), contained a repeat symmetry in which the two halves of the protein had a high degree of homology. The NC1-5' primer, 5'-CAAACCACAGCAATTCCTTCA-3', with a sequence corresponding to the 5' half nucleotides (nt) 64-87 of the NC1 domain, had an 80% homology with the 3' half nt 394-428 of the NC1 domain, 5'-...

Isolation of the 3' End of the Human COL4A3 Gene-After two rounds of screening the 3- and 1.6-kb EcoRI libraries, two positive clones were obtained from the 3-kb library and one positive clone was obtained from the 1.6-kb library. Selective sequencing of the exon/intron junctions indicated that the exon sizes and exon/intron patterns of the NC1 domains of α1(IV), α5(IV), and α3(IV) were in very good alignment (24,29). The exon/intron structure of α2(IV) was different from that of α3(IV) in that two exon/intron junctions were missing; however, the remaining exons of both shared the same borders (30). Although there were great similarities in the exon sizes and exon borders, the intron sizes themselves were different among COL4A3, COL4A1, and COL4A5 (13,24,31). Intron 2 and intron 4 of COL4A3 were approximately 1500 and 950 bp, respectively. The sequence of the shortest COL4A3 intron, intron 3 (Fig. 3), was only 126 bp, which was smaller than the recent size estimates reported by Quinones et al. (13). Intron 2, intron 3, and intron 4 of COL4A1 are 2.9 kb, greater than 13 kb, and 960 bp, respectively (31). The corresponding sizes for COL4A5 are 345 bp, 2.05 kb, and 5 kb, respectively (24). The sizes of exon 2 and exon 4, which were involved in alternative splicing, were 173 and 178 bp, respectively, and both were exons with split codons, which explains the frameshifts and different stop codons found in L5 and V. All of the introns started with gt and ended with ag. The exon/intron structure determined for the COL4A3 genomic DNA confirmed the initial impression that L5 and V could indeed have arisen through alternative splicing. The unique DNA sequences and predicted amino acid sequences for the L5 and V clones are shown in Fig. 4.
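The link between exon length, frameshift, and PCR product size can be verified with simple arithmetic. The sketch below uses only the lengths stated in the text (657-bp and 394-bp Q1 products, 173-bp exon 2, 178-bp exon 4); the function name `skip_exon` is an illustrative assumption.

```python
Q1_PRODUCT_BP = 657            # NC1-5' / NC1-3' product from Q1
Q1_PRODUCT_V5_BP = 394         # V-5' / NC1-3' product from Q1
EXON_BP = {2: 173, 4: 178}     # exons removed in V (exon 2) and L5 (exon 4)


def skip_exon(full_length_bp, exon_bp):
    """Return (product length after skipping, whether the reading frame shifts)."""
    return full_length_bp - exon_bp, exon_bp % 3 != 0


l5_bp, l5_shift = skip_exon(Q1_PRODUCT_BP, EXON_BP[4])
v_bp, v_shift = skip_exon(Q1_PRODUCT_BP, EXON_BP[2])
v_bp_v5, _ = skip_exon(Q1_PRODUCT_V5_BP, EXON_BP[2])

print(f"L5: {l5_bp} bp, frameshift={l5_shift}")
print(f"V:  {v_bp} bp, frameshift={v_shift}")
print(f"V with the V-5' primer: {v_bp_v5} bp")
print(f"L5/V size difference: {v_bp - l5_bp} bp")
```

Neither exon length is a multiple of three, so removing either one shifts the downstream reading frame, consistent with the new stop codons described for L5 and V.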
Fig. 3. COL4A3 intron 3 (lowercase) and flanking exon sequence (uppercase): TAT ATA AGC AG gtaaaaatccaatcccctagttttacaatgggaccaagtgaatc acttcccttgtaatggaatgaaaggcagcacatgacagtggcgcc atagtctttgtttcatgttacag A TGC ACT GTT

Expression of Three Clones in E. coli-To confirm the predicted splicing, frameshift, and alternate stop codon usage, Q1, L5, and V were expressed. The cell lysate from transformed HMS174(DE3) cells with or without induction by IPTG was examined by SDS-PAGE. After Coomassie Blue staining, the expected sizes of recombinant proteins fused with DHFR (26 kDa) were found: 52 kDa for Q1, 50 kDa for V, and 33 kDa for L5 (Fig. 5).
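As a rough plausibility check of these sizes, the fusion protein mass can be estimated from the number of residues. The sketch below assumes an average residue mass of about 110 Da (a common rule of thumb, not a value from this paper) and assumes that the expressed inserts correspond to the 232-, 199-, and 60-residue NC1 variants given in the abstract.

```python
DHFR_KDA = 26.0           # size of the DHFR fusion partner stated in the text
AVG_RESIDUE_KDA = 0.110   # assumed average mass per amino acid residue

NC1_RESIDUES = {"Q1": 232, "V": 199, "L5": 60}   # from the abstract
REPORTED_KDA = {"Q1": 52, "V": 50, "L5": 33}     # sizes reported from SDS-PAGE


def estimated_fusion_kda(n_residues):
    """Estimate the fusion protein mass as DHFR plus n_residues average residues."""
    return DHFR_KDA + n_residues * AVG_RESIDUE_KDA


for name, n in NC1_RESIDUES.items():
    estimate = estimated_fusion_kda(n)
    print(f"{name}: estimated {estimate:.1f} kDa, reported {REPORTED_KDA[name]} kDa")
```

The estimates are only approximate, since apparent sizes on SDS-PAGE depend on composition as well as length.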
Change in Expression of V and L5 mRNA in the Kidney during Development and Adult Life-Total RNA prepared from human kidneys at different months of age and from adults was analyzed by ribonuclease protection assay using different probes. Hybridization of the Q1A301 probe (containing exon 4), followed by ribonuclease digestion, identified a 301-nt band from the exon 4+ Q1 and V, and a 106-nt band from the exon 4- L5 (Fig. 6). Hybridization with the exon 4- L5A123 probe yielded a fully protected 123-nt band from the exon 4- L5 and a truncated 106-nt fragment from both the exon 4+ Q1 and V mRNAs (Fig. 6). Hybridization with the Q1B248 probe, which covered exon 2, yielded a fully protected 248-nt band from the exon 2+ Q1, a truncated 184-nt band from the exon 2- V, and a truncated 180-nt band from the exon 2-L5 (Fig. 7). There was no separation of the 184-nt V and 180-nt L5 bands when Q1B248 was used as a probe. Hybridization with the exon 2- VA223 probe (Fig. 7) yielded a fully protected 223-nt band from the exon 2- V. A second 85-nt band was also present in both the exon 2+ Q1 and the L5 mRNAs. The high density of the 85-nt band was caused by the overlap from both Q1 and L5.

Fig. 8 summarizes the ratios of Q1/V, L5/V, and Q1/L5, as quantitated by the radioactivity detected in the protected bands with the Ambis system, and represents the average of three independent experiments (the 18-week values were not included due to a shortage of RNA for replicates). Because different amounts of radiolabeled nucleotide can be present in the protected fragments depending on their length and nucleotide content, the uncorrected ratio was not canonical; however, it could serve as an indicator of the fluctuations of various mRNA concentrations. For final values of different mRNAs, the counts were corrected to compensate for the different numbers of [32P]UTP that could be incorporated in the protected fragments.
The calculations used the following formula: net counts = Ls/A × Lp/Ls, where Ls is the length of the protected fragment (nt), Lp is the length of the longest protected fragment on the gel, and A is the number of adenine nucleotides in the antisense DNA template. This formula was not used for the exon 4+/exon 4- or exon 2+/exon 2- ratios because some bands were a mixture of two kinds of mRNAs. The formula was used to calculate the Q1/V, L5/V, and Q1/L5 ratios (Fig. 8).
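The general idea behind such a correction, that a protected fragment carrying more labeled uridines gives more signal per molecule, can be sketched as follows. This is not the authors' formula, which cannot be recovered unambiguously from the text above; it is a simplified illustration that divides the raw counts by the number of uridines in the protecting portion of the probe. The function names, sequences, and counts are hypothetical.

```python
def labeled_uridines(protected_sense_seq):
    """Count U residues in the antisense probe segment pairing with this sense sequence.

    Every A in the sense strand pairs with one (labeled) U in the antisense probe.
    """
    return protected_sense_seq.upper().count("A")


def counts_per_uridine(raw_counts, protected_sense_seq):
    """Normalize raw counts by the number of labeled uridines in the fragment."""
    n_u = labeled_uridines(protected_sense_seq)
    if n_u == 0:
        raise ValueError("protected fragment contains no labeled uridine")
    return raw_counts / n_u


# Hypothetical protected regions and raw counts, for illustration only.
long_fragment = "ATGCAAGTACCATAAG"   # 7 A residues, so 7 U in the probe
short_fragment = "ATGCAAGT"          # 3 A residues, so 3 U in the probe

long_signal = counts_per_uridine(700, long_fragment)
short_signal = counts_per_uridine(150, short_fragment)
print(f"corrected long/short ratio: {long_signal / short_signal:.2f}")
```

Here the uncorrected ratio of the two bands would be 700/150, whereas the corrected ratio reflects the relative number of molecules under the stated assumption of uniform labeling.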
Studies with Q1A or L5A as a probe during development revealed changing exon 4+/exon 4- (Q1, V/L5) ratios. A similar fluctuation of exon 4+/exon 4- ratios was found in assays with these two probes. The VA probe was the only probe that could be used to monitor the Q1/V, L5/V, and Q1/L5 ratios individually (Fig. 8). When VA was used as a probe, the lowest Q1/V ratio occurred in the 12-week fetal kidney, with a slight increase thereafter. In contrast, the highest Q1/L5 ratio was found in the 12-week fetal kidney, with a decrease from 4.0 to 3.4 between the 12-week fetal kidney and the adult kidney. The L5/V ratio was found to be up-regulated during development, which correlated with the Q1/L5 ratio. Among the three types of mRNAs, Q1 was the major type, about 13-15 times more abundant than V and 3-4 times more abundant than L5. According to the L5/V and Q1/L5 ratios, L5 was up-regulated during development and aging.
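The pairwise ratios can be converted into approximate fractions of the total α3(IV) NC1 transcript pool. The sketch below uses illustrative values picked from within the reported ranges (Q1/V = 14, Q1/L5 = 3.5); these are not measured values from Fig. 8, and the function name is an assumption.

```python
def fractions_from_ratios(q1_to_v, q1_to_l5):
    """Convert Q1/V and Q1/L5 ratios into fractional abundances of Q1, V, and L5."""
    q1 = 1.0
    v = q1 / q1_to_v
    l5 = q1 / q1_to_l5
    total = q1 + v + l5
    return {"Q1": q1 / total, "V": v / total, "L5": l5 / total}


# Illustrative ratios chosen from within the ranges given in the text.
fractions = fractions_from_ratios(q1_to_v=14.0, q1_to_l5=3.5)
for name, value in fractions.items():
    print(f"{name}: {100 * value:.1f}% of transcripts")
print(f"implied L5/V ratio: {fractions['L5'] / fractions['V']:.1f}")
```

Because the three ratios are not independent, any two of them fix the third: L5/V equals (Q1/V) divided by (Q1/L5).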
Structure Analyses of Novel Peptides of V and L5-Deduced primary sequences of L5 and V were analyzed for hydropathy (Kyte-Doolittle) and acid-base composition by using the Hyd program. The unexpected appearance of an N-linked glycosylation site, "NKS," was found in the deduced primary amino acid sequence of V transcripts. Areas of linear amino acid homology between Q1, L5, V, and other sequences in the various data banks were sought by computer analysis. Numerous examples of 5- and 6-amino acid linear homologies were found. Table I lists examples of 7-amino acid linear homologies which might lead to instances of molecular mimicry possibly associated with induction of the autoimmune anti-GBM antibody response.

DISCUSSION
Our COL4A3 genomic DNA confirmed the exon/intron structure reported by Quinones et al. (11). The structure was similar to COL4A1 and COL4A5, but with two more exon/intron junctions than COL4A2 (24, 29, 30). As with most other collagen genes, a junctional exon (exon 5) encoded the transition between the 3' end of the Gly-X-Y repeat sequence of the triple-helical collagenous domain and the functionally distinct noncollagenous NC1 domain. Interestingly, an RGDS motif in the collagenous region was adjacent to the NC1 domain and shared exon 5 with the NC1. This could mean that the RGDS motif has some functional relationship with the NC1 domain.
The normal splicing pattern of exon 2 and exon 4 in human kidney, as investigated by means of solution hybridization, revealed that three types of transcripts were present and that the exon 2+/exon 2- and exon 4+/exon 4- ratios, of potentially great importance, varied during development and aging. The differential expression of these mRNA transcripts suggests that the relative amounts are controlled by some unknown factors and could be related to human disease (see below). Inspection of the sequences of the L5 and V alternative splice acceptor sites in the COL4A3 NC1 gene indicated that they do not differ significantly from each other when compared with the consensus splice acceptor sequence. The V type, exon 2- transcripts were slightly higher in the fetal than in the adult kidney when studied with the VA probe, so that the ratios of Q1/V and L5/V kept increasing from the 12th week of fetal age. In contrast, L5 increased slightly during development in the samples studied to date. The change in V type mRNA suggested that it may play an important role during fetal development, possibly related to the novel peptide of V generated by alternative splicing. Alternative splicing of exon 2 has been suggested in α5(IV) (32). Even if alternative splicing similar to that noted here for α3(IV) happened in α5(IV), the novel carboxyl-terminal amino acids generated by exon 2 deletion in α3(IV) would not be produced in α5(IV) because the frameshift that occurs generates an immediate termination codon.

[Fig. 2 legend fragment: NC1-5' primer annealed to a similar sequence in the second half of NC1 domain.]

The alternatively spliced products and the associated changes in the COOH termini of the predicted products would have an impact on the structure of the various molecular forms. Some predictions can be made on the basis of the available structural information. The four leaf clover-like structure of the NC1 region of collagen IV proposed by Siebold et al. (33), which appears to be common for all the NC1 α-chain regions based on conservation of cysteine residues, would be greatly disturbed by the alternatively spliced L5 and V transcripts. In L5, 11 of the 12 conserved cysteine residues would be missing, thereby preventing formation of intermolecular cross-links via disulfide bridges. In V, only one of the homologous subdomains could form normally since the alternative splicing would remove the five terminal cysteines. The V type mRNA predicts that an N-glycosylation site, NKS, would be formed, which would be unique among collagen IV NC1 molecules. The introduction of the glycosylation site could also alter the structure of the COOH terminus, thereby changing the ability of the alternatively spliced products to form the expected end-to-end assembly of α(IV) in the GBM.
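An N-linked glycosylation site such as NKS fits the consensus sequon N-X-S/T, where X is any residue except proline. A minimal sketch of a sequon scan is shown below; the function name and the test peptide are hypothetical and are not taken from the deduced V sequence.

```python
import re

# N-X-S/T sequon with X != P; the lookahead allows overlapping sites to be found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq):
    """Return (1-based position, tripeptide) for each N-glycosylation sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq.upper())]


# Hypothetical peptide, for illustration only.
peptide = "AGNKSLWNPTQ"
sites = find_sequons(peptide)
print(sites)
```

In this example NKS is reported, whereas NPT is rejected because proline occupies the middle position.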
In normal conditions, the ratio between Q1 and the other two isoforms might serve to control the amount of the mature α3(IV). Since L5 and V may not be functional and could not be incorporated into the mature protein, this could serve as a potential mechanism to control the amount of mature functional α3(IV). Actual on/off regulation of gene expression at the level of splicing is a very common event (34).
The identification of L5 and V alternatively spliced mRNAs for the human NC1 domain of α3(IV) and the observation that they vary in relationship to the predominant form, Q1, during development may add to our understanding of GBM antigenic and possibly associated structural variations in certain autoimmune forms of glomerulonephritis and in some forms of hereditary nephritis. The NC1 region of α3(IV) carries epitopes reactive with spontaneously formed human anti-GBM antibodies found in patients with Goodpasture's syndrome and some forms of rapidly progressive glomerulonephritis. The epitopes in the α3(IV) NC1 are reported to be near the carboxyl terminus, an area of the predominant Q1 molecule which would be missing in the two alternatively spliced variants, L5 and V. The changes in these alternatively spliced forms during development could possibly relate to the reported observations of impaired reactivity of spontaneously formed human anti-GBM antibodies with human renal tissue from individuals during the first year of life (35).
Electron microscopic defects in the structure of GBM are found in patients with the hereditary nephritis associated with Alport syndrome (36). The GBM in some kindreds with Alport syndrome lacks antigens reactive with spontaneously formed human anti-GBM antibodies, suggesting that a defect in production or assembly of the α3(IV) molecule into the GBM is a part of the abnormality seen in this condition. The relationship of alterations in α3(IV) and the recently reported genetic variations in α5(IV) in X-linked forms of Alport syndrome, as well as in some kindreds of autosomal inheritance, remains to be defined. Persistence of the developmental splicing patterns in patients with Alport syndrome could contribute to the structural abnormalities in the GBM. Alterations of the relative amounts of the predicted alternatively spliced L5 and V mRNA products could greatly alter the carboxyl-terminal regions of α3(IV), presumably affecting their ability to form dimers and interchain reactions necessary for normal assembly of GBM. The decreased ability to assemble the GBM could, in turn, contribute to the observed structural abnormalities.
The induction of typically transient anti-GBM antibody responses in patients with certain major histocompatibility segregations (37,38) is of major interest, but is poorly understood.
A flu-like illness is reported by about 50% of patients prior to identification of renal or pulmonary disease. An influenza A2 infection was detected by rising antibody titers in one patient with anti-GBM antibody disease (39). The linear amino acid sequences of the common and unique regions of the three alternatively spliced mRNAs were compared to see if any areas of striking homology to exogenous or endogenous proteins were evident which might serve as sites for induction of an immune response via the mechanism of molecular mimicry (40,41). An area of 7-amino acid linear homology (Table I) was found between the influenza A hemagglutinin and Q1. Of interest, a 7-amino acid sequence at the NH2 terminus of GMP-140 (Table I) was found to be homologous with 7 amino acids of the deduced primary sequence of the V transcripts, in which there was a glycosylation site. GMP-140 (P-selectin) is important in recruitment of inflammatory and immune cells.
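A search for short linear homologies of this kind can be illustrated with an exact shared k-mer scan between two protein sequences. This is a simplified stand-in, not the IG-Suite analysis used in the study: it reports only identical 7-residue stretches, and both example sequences are hypothetical.

```python
def shared_kmers(seq_a, seq_b, k=7):
    """Return (k-mer, 1-based position in seq_a, 1-based position in seq_b)."""
    a, b = seq_a.upper(), seq_b.upper()
    index = {}
    for i in range(len(a) - k + 1):
        index.setdefault(a[i:i + k], []).append(i + 1)
    hits = []
    for j in range(len(b) - k + 1):
        kmer = b[j:j + k]
        for i in index.get(kmer, []):
            hits.append((kmer, i, j + 1))
    return hits


# Hypothetical sequences, for illustration only.
query = "MKTAYIAKQRQISFVK"
target = "GGAYIAKQRLL"
hits = shared_kmers(query, target, k=7)
print(hits)
```

Indexing the k-mers of one sequence first keeps the scan linear in the combined sequence length, which matters when the target is a whole data bank rather than a single protein.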
The demonstration of alternative splicing of the α3(IV) NC1 domain and the differential expression of the three mRNA products during development provide a new way to begin to examine both the possible antigenic implications in autoimmune disease and the structural abnormalities of hereditary nephritis characterized by the inability to detect this antigen.