A cross-species analysis of the cystic fibrosis transmembrane conductance regulator. Potential functional domains and regulatory sites.

To help elucidate the function of the cystic fibrosis transmembrane conductance regulator (CFTR), we have undertaken a cross-species analysis of the DNA sequence which encodes this protein. We have isolated and characterized the cDNA of the bovine homologue of CFTR. The deduced amino acid sequence shows high overall identity with the published sequences from human and mouse, although there is marked variability between the different potential functional domains. The region around human amino acid 508, which is deleted in 70% of cystic fibrosis chromosomes, is highly conserved across species; of the missense cystic fibrosis mutations reported to date, all of the amino acids in the normal human sequence are conserved in the bovine and mouse sequences. A single amino acid encoded by the human cDNA (Ser-434) is missing in the bovine sequence, and there are two amino acids encoded by the bovine sequence which are absent in the human. These all stem from in-frame 3-base omissions within the sequences. In addition to the cow, we amplified the DNA sequences encoding a portion of the R-domain from sheep, monkey, rabbit, and guinea pig. These sequences show relatively low overall sequence identity (63%), but nearly all of the potential protein kinase A and protein kinase C phosphorylation sites are conserved over all of the species examined. Our results suggest functional significance for certain highly conserved residues and putative domains within CFTR.

Cystic fibrosis (CF),¹ the most common lethal inherited disease in the Caucasian population, is a multisystem disorder whose pathophysiologic changes include the classic clinical triad of chronic pulmonary disease, pancreatic insufficiency, and elevated sweat electrolytes (1,2). Following the localization of the CF locus to chromosome 7 (3-11), a strategy of chromosome walking and jumping was used to identify the CF gene, which contains 27 exons and spans approximately 250 kb of DNA (12-14). A 3-base pair deletion in exon 10 of this gene is found in approximately 70% of CF-associated chromosomes. Other sequence aberrations have been identified in the CF gene, and the number of mutations identified totals more than 95.² The gene is predicted to encode a protein named the cystic fibrosis transmembrane conductance regulator (CFTR). Overexpression of CFTR in CF cells in culture complements the chloride conductance abnormality, which is characteristic of these cells (15,16).

* This work was supported in part by a post-doctoral fellowship from the National Cystic Fibrosis Foundation (to G. D.) and grants from the G. Harold and Leila Y. Mathers Charitable Foundation and The Ben Franklin Partnership. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M76128.
‖ To whom correspondence should be addressed: Div. of Genetics, Children's Hospital of Philadelphia, 34th St. and Civic Center Blvd., Philadelphia, PA 19104. Tel.: 215-590-2944; Fax: 215-590-3850.
¹ The abbreviations used are: CF, cystic fibrosis; CFTR, cystic fibrosis transmembrane conductance regulator; ABC, ATP-binding cassette; kb, kilobase(s); PCR, polymerase chain reaction.
Despite these recent major advances in the identification and characterization of the CF gene, our understanding of the function of CFTR and the molecular pathophysiology in this devastating disease is far from complete. Protein data base searches revealed that CFTR is similar to a group of proteins referred to as the ATP-binding cassette (ABC) family of transmembrane transporters (13). The ABC family of proteins has both prokaryotic and eukaryotic members, whose common function is the ATP-dependent transport of molecules or ions into or out of the cell (17). Structurally, these membrane-spanning transporters are characterized by highly conserved nucleotide-binding sites. The deduced amino acid sequence of CFTR shows considerable sequence similarity to the nucleotide-binding sites of the members of this family. In addition, hydropathy plots suggest that CFTR has 12 membrane-spanning domains arranged in two clusters of six, similar to the P-glycoproteins, which are a large subgroup of the ABC family of proteins. Based on these similarities, the proposed structure of CFTR suggests that CFTR may function as a membrane-spanning ATP-dependent transporter.
One domain of CFTR not common to the other members of the ABC family is the R-domain, a 241-residue hydrophilic portion of the protein operationally defined by those amino acids (590-830) encoded by exon 13 of the human gene. The R-domain contains numerous potential phosphorylation sites (nine protein kinase A and seven protein kinase C consensus phosphorylation sites), suggesting that this domain may function in a regulatory capacity. However, the deduced amino acid sequence of the R-domain shares no significant similarity with any known protein, and while much has been speculated, no function has been demonstrated for this unique domain.
Considerable effort has focused on the identification of CFTR nucleotide sequence aberrations that are found in patients with CF. In addition to the obvious importance in the molecular diagnosis of CF, the identification of mutations associated with disease can implicate which amino acids and/or protein domains have functional significance. However, construction of a map of functional domains based on a large series of mutations is subject to ascertainment bias. For example, certain regions of a gene may be more susceptible to mutations; alternatively, some mutations might be incompatible with fetal survival. In addition, definitive proof that a DNA sequence aberration is a mutation responsible for disease requires functional analysis.

² L.-C. Tsui, personal communication.
Recent studies have examined the possible physiological function of the CFTR by overexpression of the recombinant protein in non-epithelial cells (18, 19). The results show cAMP-stimulated anion permeability in cells where such conductance is normally not observed, suggesting that CFTR either directly or indirectly regulates chloride channels. Since the cells used in these studies do not otherwise conduct chloride, a simple explanation compatible with the data is that CFTR may itself be a chloride channel. This explanation has been contested, however, in a recent report (20).
As an initial phase of our studies on the biochemical function of CFTR, we have undertaken a cross-species comparison of the CFTR cDNA. Presented here is the cloning and sequence characterization of the bovine CFTR cDNA. We focused particular attention on the R-domain and compared sequences from four additional species. The results of this study identify evolutionarily conserved residues/domains that are likely to have functional importance and complement analysis of human CFTR mutations. In addition, this analysis provides information necessary to rationally select amino acids to be targeted for site-directed mutagenesis studies of function. During preparation of this paper, the sequence of the mouse homologue of CFTR was published (21), and this information helps to extend our analysis.

MATERIALS AND METHODS
General Methodology-All reagents were reagent grade from J. T. Baker Inc. or Fisher, unless otherwise noted. All bacteriological media were from Difco. Restriction enzymes were purchased from Bethesda Research Laboratories. Oligonucleotides were made by the Protein/Nucleic Acid Core Facility, Department of Pediatrics, University of Pennsylvania School of Medicine.
cDNA Cloning-The techniques used for cDNA library construction have been described (22), and reagents were from Invitrogen (San Diego, CA) unless noted otherwise. Total mRNA was isolated from bovine tracheal epithelium tissue (23) and poly(A+) RNA was then selected using oligo(dT)-cellulose (3'+5', Inc., West Chester, PA). First strand cDNA was synthesized using reverse transcriptase and an oligo(dT) primer. Second strand cDNA was synthesized using RNase H digestion of the RNA-DNA hybrids and Escherichia coli DNA polymerase I. T4 DNA polymerase was used to blunt-end the cDNA, and then hemiphosphorylated NotI/EcoRI adaptors were added by blunt-end ligation. A portion (0.3 µg) of this cDNA was size-fractionated by agarose gel electrophoresis, and a fraction (1000-8000 bp) was recovered by electroelution. The cDNA was ligated to EcoRI-digested λgt10 (Stratagene), and the recombinant phage were packaged using Gigapak-Gold packaging extract (Stratagene). Approximately 5 × 10' independent phage were obtained. A second cDNA library was constructed using a similar strategy except that random hexamers (Boehringer Mannheim) were used for first strand cDNA synthesis. For both libraries, approximately 10' phage were plated at a density of 3 × 10⁴/150-mm plate on a lawn of C600 hfl E. coli. Duplicate lifts were made using Colony/Plaque Screen filters (Du Pont), and the filters were screened sequentially using standard techniques (24). Stringency of hybridization was reduced to allow for cross-species hybridization when appropriate (hybridization at 42 °C in 20% formamide, 5 × SSC; final wash at 55 °C, 6 × SSC).
PCR Amplification-PCR amplification was carried out using standard protocols (25). In general, amplifications were performed in a Perkin-Elmer Cetus thermal cycler, using recombinant Taq DNA polymerase (AmpliTaq, Perkin-Elmer Cetus). Initial denaturation was at 94 °C; amplification was performed for 30 cycles of 1 min at 94 °C, 1 min at 55 °C, and 3 min at 72 °C. Primers for producing the human CFTR sequence used to isolate the bovine cDNA were as follows.

5'-TCTAGATGCGATCTGTGAGCCGAGTCTT-3'

5'-GTCGACTTATTGAGAAGGAAATGTTC-3'
Primers used in the first amplification of bovine cDNA for isolation of CF2000/PCR were as follows. Nested primers for the second amplification were as follows.

5'-TCGCTGAATGATGTGACCTT-3' (HCF2487A)
Primers used in the amplification of bovine genomic DNA from exons 9 and 19 were as follows.
RNA for Northern blot analysis was purified from fresh tissue, fractionated by agarose gel electrophoresis in the presence of formaldehyde, and blotted to nylon membranes as above. Radioactively labeled DNA probes were hybridized to the immobilized RNA in 20% (v/v) formamide, 5 × SSC, 5 × Denhardt's, 0.1% (w/v) sodium dodecyl sulfate at 42 °C and washed in 0.1 × SSC, 0.1% sodium dodecyl sulfate at 65 °C.
DNA Sequencing-Purified plasmid DNA was sequenced using the dideoxy termination method with T7 DNA polymerase (U. S. Biochemical Corp.). Pools of amplified DNA from PCR reactions were sequenced directly after purification by agarose gel electrophoresis, using oligonucleotide primers end-labeled with 32P, and dideoxynucleotides in a modified PCR amplification (26). All DNA sequence was obtained from both strands, including that for PCR and plasmid templates.

RESULTS
Cloning and Sequencing of the Bovine CFTR cDNA-For the initial screening, a cDNA library was constructed from bovine tracheal RNA in a non-expression vector, λgt10, using oligo(dT) primers. The probe used for screening was a DNA fragment spanning residues 3600-3816 of the human CFTR sequence, obtained by PCR amplification of human genomic DNA using primers from the published sequence (13). Plaque lifts were screened by hybridization of probe under reduced stringency to allow for cross-species hybridization (hybridization at 42 °C in 20% formamide, 5 × SSC; final wash at 55 °C in 6 × SSC). Three positive clones were identified and were purified by two additional rounds of screening at progressively lower plaque densities. Clone 3.3 bears extensive similarity to the human CFTR and corresponds to the 3' portion of the human cDNA. To obtain the 5' portion of the cDNA, a pool of bovine tracheal cDNA was amplified by PCR using special conditions. A 3' primer (BCFTR 165) based on the known bovine sequence from BCFTR 3.3 and a 5' primer (HCFTR 155) based on the human sequence were used. The oligonucleotide sequence of the upstream primer was specifically chosen because it contained 3 thymidines on the 3' terminus of its 24-base sequence. The presence of thymidines on the 3' end of PCR primers reportedly relaxes the requirements for perfect complementation (27). This amplification yielded a single product which was detectable by Southern blot analysis. This PCR product was isolated by agarose gel electrophoresis, and was then used as a template for a second round of PCR, using a nested downstream primer (BCFTR 100) and a nested upstream primer (HCFTR 210). This second PCR amplification yielded a single product of ~2.0 kb which was detectable by ethidium bromide staining. The band was gel-purified and sequenced by direct PCR-based sequencing. This PCR product was also subcloned into a plasmid vector and yielded a clone, pCF2000, which was sequenced in its entirety. The PCR product represented the region of the human CFTR sequence beginning with nucleotide 211 and extending to base 2303, overlapping with the first 80 nucleotides of λBCFTR 3.3.

FIG. 1. Human CFTR is shown with potential functional domains (13). Below are the three bovine clones, λBCFTR3.3, pCF2000/PCR, and pCF109. Restriction enzyme sites in the bovine clones are also shown: B, BamHI; H, HindIII; X, XhoI.
The pCF2000 insert was used as a probe for screening a second cDNA library constructed using randomly primed reverse transcription. This screening yielded a single positive clone with a 1.6-kb insert. This insert was subcloned into a plasmid vector, and sequence analysis indicated that this clone, pCF109, extended the bovine CFTR sequence by an additional 140 bases. The three overlapping clones are shown in Fig. 1, with a schematic view of the correspondingly aligned human CFTR cDNA regions encoding the putative functional domains as indicated.
The complete nucleotide sequence of the bovine CFTR cDNA is presented in Fig. 2. Inspection of the sequence revealed the presence of an open reading frame of 1481 amino acids, with extensive similarity to the human sequence. The open reading frame begins with an ATG codon at base 61 and continues to a stop codon at base 4503. The nucleotide sequence surrounding this ATG codon fulfills the consensus for a start codon as defined by Kozak (28). A polyadenylation signal was found at base 6095, and the sequence ends with a poly(A) tail.
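The coordinates quoted for the open reading frame can be checked with simple arithmetic. The following is a minimal sketch, assuming 1-based inclusive numbering and assuming that base 4503 marks the last coding base immediately before the stop codon; the function name is illustrative and not part of the original analysis.

```python
# Values quoted in the text (1-based, inclusive numbering is an assumption).
ORF_START = 61      # first base of the initiating ATG codon
ORF_CODONS = 1481   # amino acids in the open reading frame


def orf_last_base(start: int, codons: int) -> int:
    """Return the 1-based position of the last coding base of an ORF."""
    return start + 3 * codons - 1


last_coding = orf_last_base(ORF_START, ORF_CODONS)
# Under the same assumption, the stop codon occupies the next three bases.
stop_codon = (last_coding + 1, last_coding + 3)
print(last_coding, stop_codon)
```

Under these assumptions, 1481 codons starting at base 61 end at base 4503, which agrees with the position given for the end of the reading frame.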
Compared with the published sequence for human CFTR, the bovine cDNA shows an overall DNA sequence identity of 80%. Over the translated region the identity is 91%. When comparing sequences from each of the 27 human exons, all but five are 90-97% identical. Included in these five is exon 13, which encodes the R-domain and is only 85% identical at the nucleotide level.
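Identity figures of this kind are computed from an alignment. The sketch below shows one common convention, in which columns containing a gap in either sequence are skipped; the text does not state how gaps were treated, so this convention, the function name, and the toy sequences are all assumptions for illustration.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences of equal length.

    Columns where either sequence has a gap are ignored (an assumed
    convention; other conventions count gaps as mismatches).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, not taken from the CFTR sequences.
toy_a = "ATGCAGAGGT"
toy_b = "ATGCAAAGGT"
toy_identity = percent_identity(toy_a, toy_b)
print(toy_identity)
```

The same function applies unchanged to aligned amino acid sequences, since it only compares characters column by column.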
Comparison of Deduced Amino Acid Sequences of CFTR-When the deduced amino acid sequences of bovine and human CFTR are aligned, we observe high overall similarity (Fig. 3). There is a 90% sequence identity between the two molecules; the similarity is 96% when allowing for conservative substitutions.³ The mouse sequence is less similar, 76% overall (Fig. 3). The second and sixth intracellular domains are particularly well conserved, as are isolated stretches of the first, fourth, and seventh. The carboxyl-terminal tail (residues 1430-1481), the R-domain (590-830), and an isolated segment of the seventh intracellular domain (1172-1188) are notably divergent. The two nucleotide binding domains share 95% sequence identity overall (the mouse sequence shares 84% identity with human and cow over this region). Within this domain, the segments which are most similar to the other members of the ABC transport family (residues 433-473, 488-513, 542-584, 1219-1259, 1277-1302, and 1340-1382) have 96% identity between the human and cow sequences. The 2nd, 5th, 6th, and 11th proposed transmembrane domains are perfectly conserved in the three species, and the 1st and 12th are 95% identical. The other six transmembrane domains are less well conserved (75-90% identity). Both potential sites of N-linked glycosylation in the human sequence (13) are conserved in the bovine and mouse sequences, including the Asn-Xaa-Thr/Ser consensus sequence, found at residues 894 and 900.
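The Asn-Xaa-Thr/Ser consensus mentioned above is straightforward to locate in a protein sequence. The following is a minimal sketch using the motif exactly as written in the text (stricter definitions also exclude proline at the Xaa position, which is not applied here); the function name and the toy peptide are hypothetical.

```python
import re

# Asn, any residue, then Ser or Thr. The lookahead lets overlapping
# sequons (for example "NNTS") each be reported.
SEQUON = re.compile(r"N(?=.[ST])")


def find_sequons(protein: str, offset: int = 0) -> list[int]:
    """Return 1-based positions of Asn residues in Asn-Xaa-Thr/Ser motifs.

    `offset` shifts the numbering when `protein` is a fragment of a
    longer sequence.
    """
    return [m.start() + 1 + offset for m in SEQUON.finditer(protein)]


# Toy peptide, not a CFTR fragment.
toy_sites = find_sequons("AANGSANLTQ")
print(toy_sites)
```

Running the same scan on each species' sequence and comparing the positions in alignment coordinates is one way to test whether a site is conserved.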
Several sequence differences are particularly noteworthy (Fig. 4). In human exon 9 there are three nucleotides encoding a serine residue (Ser-434) that are absent in the bovine sequence. In two other sites there are single codons in the bovine sequence that are absent in the human. There is a glycine codon in the bovine sequence not found in exon 13 of the human sequence, between amino acids 773 and 774. There is an aspartic acid codon in the bovine sequence, not found in exon 19 of the human sequence, between amino acids 1171 and 1172. These three amino acid differences all stem from 3-base omissions in one species or the other, which is reminiscent of the F508 deletion found among 70% of CF alleles. Similar omissions are seen when comparing the mouse sequence (21) with the bovine and human sequences (Fig. 3).
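Omissions of this kind appear in a nucleotide alignment as gap runs whose length is a multiple of three, so the downstream reading frame is preserved. The sketch below illustrates the check on a toy alignment; the function names and sequences are hypothetical and are not the CFTR sequences.

```python
import re


def gap_runs(aligned: str, gap: str = "-") -> list[tuple[int, int]]:
    """Return (0-based start column, length) for each run of gap characters."""
    pattern = re.escape(gap) + "+"
    return [(m.start(), m.end() - m.start()) for m in re.finditer(pattern, aligned)]


def in_frame_omissions(aligned: str, gap: str = "-") -> list[tuple[int, int]]:
    """Keep only gap runs whose length is a multiple of 3 (frame-preserving)."""
    return [(start, length) for start, length in gap_runs(aligned, gap) if length % 3 == 0]


# Toy alignment: the second sequence lacks one codon relative to the first.
toy_ref = "ATGGGTTCTGAA"
toy_alt = "ATGGGT---GAA"
toy_gaps = in_frame_omissions(toy_alt)
print(toy_gaps)
```

A gap run that is not a multiple of three would instead shift the reading frame, which is why single-codon omissions are tolerated in a way that 1- or 2-base omissions generally are not.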
In order to determine whether the three codon changes were due to genetic polymorphisms particular to the cows whose RNA was the source of the library, or are characteristic of the wild-type sequence of B. taurus, we examined the DNA sequence from two independent breeds of cow. Genomic DNA was isolated from Jersey and Holstein cows and subjected to PCR amplification in the exons containing the sequence variations. These amplified fragments were sequenced by direct PCR sequencing (data not shown) and were observed to be identical to the sequence obtained from the cDNA clones, indicating that these interesting variations are not bovine polymorphisms or cloning artifacts.
Sequence Conservation of the R-domain between Several Mammalian Species-The sequence differences between cow and human in the R-domain are striking. We therefore amplified a portion of the R-domain (Fig. 2, residues 603-776) from four additional species (monkey, sheep, rabbit, and guinea pig) and determined their DNA sequence by direct PCR sequencing. Oligonucleotide primers complementary to conserved regions of the human and bovine cDNA sequences were used for amplification. We selected oligonucleotide primers whose sequences contained several thymidine residues on their 3' ends (27).

[FIG. 2. Complete nucleotide sequence of the bovine CFTR cDNA with the deduced amino acid sequence.]

[...] 63% identical residues between all seven species over the 173 residues examined. As expected based on evolutionary considerations [...]
While a high level of divergence is seen in portions of the R-domain, it is interesting to note that all but two potential serine and threonine phosphorylation sites (Fig. 5, underlined) in the compared sequences are conserved. Also of interest is the site of the bovine glycine amino acid "insertion" between residues 773 and 774 (see Fig. 4) of the human sequence. The human sequence is conserved in monkey, rabbit, and guinea pig. In the sheep sequence, the glycine of the bovine sequence is replaced by a cysteine and in the mouse it is phenylalanine (21). Furthermore, a proline residue (human Pro-750) found in all other species is absent in the guinea pig. A total of five omissions are seen in this region in the mouse sequence.

Expression of CFTR Message in Bovine Trachea-RNA from whole lung and tracheal mucosa was subjected to Northern blot analysis using the 4.0-kb cDNA insert from clone pBCFTR3.3 as a probe. Under high stringency a single band of approximately 6.5 kb was observed in the tracheal mucosa lane, while no signal was seen with lung RNA (Fig. 6A), even upon long exposure. Both lanes had identical amounts of RNA as evidenced by ethidium bromide staining of the gel (data not shown) and by hybridization to a bovine α-tubulin probe.
Alternative splicing has been reported for P-glycoprotein (29) and for the human CFTR (30). A large percentage of CFTR message in human trachea lacks exon 9. Consequently we examined the bovine tracheal RNA for the presence of such an alternatively spliced message. cDNA from tracheal RNA was amplified by PCR in the region corresponding to human exons 8-10. PCR products were separated by electrophoresis and stained with ethidium bromide. A PCR product corresponding to a normally spliced message containing exon 9 was easily detected by ethidium bromide staining (data not shown) and by Southern blot analysis (Fig. 6B). However, we failed to detect alternatively spliced mRNA similar to that seen in human CFTR, either by ethidium bromide staining or Southern blot analysis (Fig. 6B).

DISCUSSION
The identification of the CF gene based on chromosomal localization has led to a challenging situation in which there is extensive understanding of the molecular genetics of the disease but an inadequate understanding of the basic biochemistry of the corresponding protein and its function. We now report the cross-species sequence comparison of CFTR, which complements current approaches to understand the function of this important protein. The very recent report of the mouse sequence extends and strengthens this analysis (21).
A Comparison of Nucleotide Sequence Reveals 3-Base Omissions-In order to identify functionally important residues and domains of CFTR, we have cloned the cDNA which encodes bovine CFTR and have compared the sequence with that of the human and mouse sequences. This analysis has revealed that the unusual 3-base pair deletion that characterizes the most common mutation in CF also characterizes one type of sequence difference seen across species. Three sites within the CFTR sequence have been observed to contain 3-base omissions when comparing the human and the bovine sequences. These sequence differences are probably characteristic of the wild-type sequence, since they are found in unaffected humans and healthy cows (Fig. 4). Interestingly, the location of each of these differences is in proximity to the location where nucleotide insertion or deletion mutations have been found in some CF patients. This may indicate that these sections of the CF gene are more susceptible to DNA recombination. In addition, a single amino acid omission is also noted in this region of the guinea pig sequence (Fig. 5). A total of six amino acid omissions were reported in the mouse sequence (21).

FIG. 5. Comparison of deduced amino acid sequences in a portion of the R-domain from bovine, sheep, human, monkey, guinea pig, rabbit, and mouse CFTR. Numbers refer to the bovine amino acids (see Fig. 2). Dashes represent identical amino acids to the bovine sequence. An asterisk represents a gap. Potential serine and threonine phosphorylation sites (14) are underlined. Human and mouse sequences were from the literature (13, 21).
A Comparison of Putative Functional Domains: Clues about the R-domain and Certain Transmembrane Domains-Fig. 7 shows a schematic model of the predicted structure of CFTR, based on our interpretation of hydropathy plots, computer predictions of α-helical regions, and sequence comparisons, and is presented for purposes of discussion. It includes an indication of the sequence differences between cow, human, and mouse (blue circles). This comparison serves as a basis for identifying functionally significant residues. The analysis is complemented by a cross-species comparison with P-glycoproteins. Devine et al. (29) studied four P-glycoproteins which confer a multidrug resistance phenotype (hamster pgp-1, mouse mdr-1, mouse mdr-3, and human mdr-1). Parallels may be drawn between the sequence comparisons of P-glycoprotein and CFTR, two types of ABC transporters. The nucleotide binding domains of the P-glycoproteins are well conserved (93% identity of the first and 90% of the second) as are those of CFTR (84% between cow, human, and mouse). There are three P-glycoprotein transmembrane domains that are very similar: the 2nd (83%), 4th (100%), and 11th (90%). The remaining nine transmembrane domains are 43-76% conserved. The functional significance of P-glycoprotein transmembrane domains was demonstrated recently (31). Mouse mdr1 multidrug resistance function could be abolished by substituting certain transmembrane domains of the mouse mdr2, a closely related molecule that is not capable of conferring the MDR phenotype. In CFTR, some of the transmembrane domains (1st, 2nd, 3rd, 5th, 6th, 8th, 11th, and 12th) are highly conserved between cow, human, and mouse, while the other four are less well conserved. Dean et al. (32) have previously shown that the fifth and sixth transmembrane domains of CFTR are impressively conserved in Xenopus laevis as well. This high conservation implies functional importance.
Yet the CFTR sequences are quite different from the transmembrane domains of the p-glycoproteins and other members of the ABC family of transporters (data not shown), suggesting that they have distinct functions which are probably related to the type of ligand transported by the respective proteins.
Within the R-domain, which was originally proposed as a regulatory domain, the amino acid sequence identity between human and cow is lower (85%) than for any other region of the coding sequence. Sequence divergence is also seen in the mouse sequence (68% identity with cow). This suggests that for this domain the functional requirements are not linked to conserved primary structure. However, all of the potential sites of protein kinase A (Fig. 7, green) and protein kinase C phosphorylation (Fig. 7, yellow) in this region have been conserved not only between human and cow, but also between monkey, sheep, rabbit, and guinea pig (Fig. 5). All but two phosphorylation sites in this region are present in the mouse sequence (21). These data support the previous speculation that protein phosphorylation in the R-domain may play an important role in the function of the protein.

FIG. 7. Schematic model of CFTR structure showing sequence differences between human, mouse, and bovine molecules, and missense mutations found in CF patients. Conservative amino acid changes are shown in light blue, non-conservative in dark blue. Consensus sites for phosphorylation by protein kinase A are in green, and by protein kinase C are in yellow. Missense mutations found in patients with CF are shown in red, with mutations resulting in a premature termination codon indicated by a red dash. Black arrows indicate amino acid omissions between bovine and human sequences. Mutations are as reported in the literature (12, 32, 33, 39-46). Intracellular domains are outlined: NBF 1, putative first nucleotide-binding fold; NBF 2, putative second nucleotide-binding fold.
Evaluation of the Sites of Human Mutations-A number of amino acids of the human CFTR molecule have been hypothesized to be functionally important based on the detection of sequence aberrations in DNA from patients with CF. Indeed, of the missense CF mutations reported to date, all of the amino acid residues are conserved between the normal human and the bovine sequence (Fig. 7, red). These residues are also conserved in the mouse sequence. Of the sites of mutations found in CF patients, most important is the region which encodes the first proposed nucleotide-binding fold, human exons 10 and 11. Exon 10 is the location of the 3-base deletion which occurs in 70% of all CF alleles. This deletion leads to a loss of a phenylalanine residue which is conserved in the bovine CFTR, the mouse CFTR (21), and in some of the other ABC family members (13). Human exon 11 is the site of several missense mutations (33). The amino acids of this region are well conserved in the cow and mouse, supporting the hypothesis that this region is important from a functional standpoint. It might be reasonable to speculate that the cluster of mutations found in exon 11 is evidence of this region being particularly susceptible to nucleotide change; however, the nucleotides of this region are 95% identical to those of the cow and 84% identical in the mouse. This degree of identity is among the highest of the various exons and suggests that exon 11 is not a "hot spot" of nucleotide change. Rather, the data are consistent with the hypothesis that this region encodes a CFTR segment that is critical to normal protein function, has no increased propensity for nucleotide change, and is relatively intolerant to amino acid change.
An Analysis of Sites of Post-translational Modification-In the human CFTR there are a total of 29 potential protein kinase C and 10 potential protein kinase A phosphorylation sites (13). The protein kinase C sites are distributed throughout the putative intracellular domains of CFTR and all but two sites (Ser-256 and Ser-1444) are conserved in the cow. All but one of the protein kinase A sites (Thr-788) are conserved, and eight of these nine conserved sites are located in the R-domain. The importance of such phosphorylation is consistent with the indication that cAMP-dependent protein phosphorylation directly or indirectly regulates Cl⁻ conductance (34). Relatively little attention has been given to the possibility of CFTR regulation through tyrosine phosphorylation. There are 2 tyrosines in the human sequence that are adjoined by a single acidic residue on the amino-terminal side (380, 385) and 1 tyrosine with two neighboring acidic residues (515). All of these are conserved in the cow and mouse, suggesting that in vivo they might be phosphorylation sites (35, 36). Consistent with the notion of regulation by phosphorylation, recent studies on the multidrug resistance P-glycoproteins have shown that phosphorylation has a major impact on the transport function of these proteins (37). However, the comparison with CFTR has limitations since the P-glycoproteins do not have an R-domain. Within the putative fourth extracellular loop of the human CFTR are two consensus sites for N-linked glycosylation. Both of these sites are conserved in the cow and mouse. The conservation of phosphorylation and glycosylation sites in the CFTR suggests that these potential post-translational modifications are important for CFTR function.
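Potential serine/threonine phosphorylation sites of the kind counted above are usually identified by scanning for short consensus motifs. The sketch below assumes the widely used minimal motifs (Arg/Lys-Arg/Lys-Xaa-Ser/Thr for protein kinase A and Ser/Thr-Xaa-Arg/Lys for protein kinase C); the text does not spell out the exact motifs applied, so these patterns, the function names, and the toy peptide are assumptions for illustration.

```python
import re

# Assumed minimal consensus motifs; lookaheads allow overlapping matches.
PKA_MOTIF = re.compile(r"(?=[RK][RK].[ST])")  # acceptor is the 4th residue
PKC_MOTIF = re.compile(r"(?=[ST].[RK])")      # acceptor is the 1st residue


def pka_sites(protein: str) -> list[int]:
    """1-based positions of Ser/Thr acceptors in [RK][RK]-x-[ST] motifs."""
    return [m.start() + 4 for m in PKA_MOTIF.finditer(protein)]


def pkc_sites(protein: str) -> list[int]:
    """1-based positions of Ser/Thr acceptors in [ST]-x-[RK] motifs."""
    return [m.start() + 1 for m in PKC_MOTIF.finditer(protein)]


# Toy peptide, not a CFTR fragment.
toy_peptide = "AARRNSAKSLRG"
toy_pka = pka_sites(toy_peptide)
toy_pkc = pkc_sites(toy_peptide)
print(toy_pka, toy_pkc)
```

A motif match only marks a potential site; whether a given residue is phosphorylated in vivo has to be established experimentally.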
There are 17 cysteines in the human sequence, 4 in transmembrane domains, and 11 in various intracellular domains. In the cow and mouse all of the transmembrane cysteines and 9 of the intracellular cysteines are conserved. Cysteines can form disulfide bonds, which could contribute to the stability of CFTR protein structure.
Expression of CFTR mRNA-We find a single CFTR message of approximately 6.3 kb from bovine tracheal mucosa. Little or no message is observed under the same conditions for RNA from bovine peripheral lung tissue, consistent with the results from human tissues (13).
Alternative splicing has been reported for P-glycoproteins in a number of different multidrug-resistant cell lines (29). It has also been observed that the CFTR message undergoes alternative splicing in normal human tracheal RNA. Specifically, all 183 base pairs of exon 9 have been observed to be spliced from the message in about 25% of CFTR transcripts, and in one case in up to 66% of transcripts (30). When we looked for a similar splicing pattern in pooled RNA from the bovine trachea, we found no evidence by PCR analysis. This may reflect a species difference, but it suggests that alternative splicing in this region is not required for normal function in the cow.
Concluding Remarks-Most of the information regarding potential CFTR functional domains has emerged from indirect analysis by sequence comparison through data base searches and identification of human mutations. The cross-species comparison that we present here strengthens these studies. It demonstrates the diversity which is tolerated in a wild-type sequence, and it highlights certain residues and domains that are particularly well conserved. Numerous questions are suggested by the data. Within the R-domain, numerous amino acid changes are observed, but the phosphorylation sites are conserved. Are the amino acid changes reflective of random drift, or do they indicate selection for species-specific interactions? Will a chimeric molecule replacing the human R-domain with that from the cow maintain a wild-type CFTR function? Are the conserved phosphorylation sites of the R-domain important for regulation of CFTR? An interesting and unexpected pattern of conservation of the transmembrane domains was observed. Is this related to ligand transport? Could the order of transmembrane domains be changed or could cross-species chimeras be constructed and function be preserved?
These data, together with the other two sources of indirect information on function, will promote a rational, systematic formulation of testable hypotheses regarding structure and function of CFTR.