cDNA, Deduced Polypeptide Structure and Chromosomal Assignment of Human Pulmonary Surfactant Proteolipid, SPL(pVal)*

In hyaline membrane disease of premature infants, lack of surfactant leads to pulmonary atelectasis and respiratory distress. Hydrophobic surfactant proteins of M_r = 5,000-14,000 that enhance the rate of spreading and the surface tension lowering properties of phospholipids during dynamic compression have been isolated from mammalian surfactants. We have characterized the amino-terminal amino acid sequence of pulmonary proteolipids from ether/ethanol extracts of bovine, canine, and human surfactant. Two distinct peptides were identified and termed SPL(pVal) and SPL(Phe). An oligonucleotide probe based on the valine-rich amino-terminal amino acid sequence of SPL(pVal) was utilized to isolate cDNA and genomic DNA encoding the human protein, termed surfactant proteolipid SPL(pVal) on the basis of its unique polyvaline domain. The primary structure of a precursor protein of 20,870 daltons, containing the SPL(pVal) peptide, was deduced from the nucleotide sequence of the cDNAs. Hybrid-arrested translation and immunoprecipitation of labeled translation products of human mRNA demonstrated an M_r = 22,000 precursor protein, the active hydrophobic peptide being produced by proteolytic processing to M_r = 5,000-6,000. Two

Pulmonary surfactant is composed primarily of the phospholipids phosphatidylcholine and phosphatidylglycerol, with lesser amounts of surfactant-associated proteins. Two groups of surfactant proteins have been distinguished on the basis of differential solubility in organic solvent systems. Surfactant-associated protein of M_r = 35,000 (SAP-35) has been identified as an abundant glycoprotein present in numerous mammalian surfactants (1). SAP-35 is insoluble in ether/ethanol or chloroform/methanol (1, 2) and arises from an mRNA encoding a 23,000-dalton polypeptide containing an approximately 70-amino acid, collagen-like amino-terminal domain (2-4). Hydrophobic small molecular weight proteins soluble in organic solvents have also been detected in a variety of mammalian surfactants (5-12). In recent work from this laboratory, we identified proteins of M_r = 6,000-14,000 in the ether/ethanol extracts of surfactant that were unrelated to SAP-35 or its fragments (11, 12). These same proteins were detected in surfactant extract preparations used clinically for treatment of hyaline membrane disease (11-15). cDNAs encoding one of these proteins, human surfactant proteolipid with amino terminus of phenylalanine, SPL(Phe), an M_r = 7,500 peptide derived from an M_r = 40,000 precursor, were recently reported by this laboratory (16). cDNA encoding SP-18, a protein homologous to SPL(Phe), was recently isolated from canine lung (17). Reconstitution of small molecular weight surfactant proteins with synthetic phospholipids imparts virtually complete surfactant-like properties to the mixture, including rapid surface absorption and surface tension lowering during dynamic compression (9, 10, 18). In the present work, we have identified a novel, small molecular weight hydrophobic surfactant protein, herein termed surfactant proteolipid with a polyvaline domain, or SPL(pVal).

MATERIALS AND METHODS
T4 DNA ligase and DNA restriction endonucleases were obtained from New England Biolabs, Beverly, MA. Reverse transcriptase was obtained from Life Sciences, Inc., St. Petersburg, FL, and used according to the manufacturer's recommendations. Escherichia coli strain Y1090 (ΔlacU169 proA+ Δlon araD139 strA supF [trpC22::Tn10] (pMC9)), purchased from Clontech Laboratories, Palo Alto, CA, was used as the host strain for λgt11. E. coli JM103 or JM109 (Pharmacia LKB Biotechnology Inc.) was used for growth of pUC plasmid and M13 phage subclones. The phage cloning vector λgt11 was used for the construction of a cDNA library as described by Young and Davis (19). pUC19 and M13 were used for subcloning and DNA sequencing as described by Messing (20). The λgt11 library was constructed from human lung poly(A)+ RNA prepared from tissue of an adult male, obtained immediately at death, as previously described (16). The tissue was provided by the National Diabetes Tissue Interchange, Washington, D.C. Wheat germ translation reagent was purchased from Promega Biotec Inc. and [35S]methionine from Du Pont-New England Nuclear. A human genomic lymphocyte library in λEMBL3 was purchased from Clontech Laboratories, Inc., Madison, WI.
Protein Purification-Hydrophobic surfactant proteins were purified from ether/ethanol extracts of human, canine, and bovine surfactant obtained by alveolar lavage as previously described (11, 12). Proteins were then delipidated by silicic acid chromatography in chloroform/methanol, followed by extensive dialysis in chloroform/methanol. Silver stain analysis of these preparations after SDS-PAGE revealed protein of M_r approximately 5,000-6,000 as the most abundant protein; progressively smaller amounts of protein of M_r approximately 14,000, 18,000, and 26,000 were observed in these preparations both by silver stain and by immunoblot analysis after SDS-PAGE, as previously reported (11, 12). Increased amounts of the larger proteins were observed when the proteins were separated in the absence of sulfhydryl reducing agents. Antiserum utilized for immunoprecipitation was generated in albino rabbits by repeated injection of bovine proteolipid prepared from chloroform/methanol extracts of surfactant (12).

The abbreviations used are: SAP-35, surfactant-associated protein of M_r = 35,000; SPL(Phe), human surfactant proteolipid with amino terminus of phenylalanine; SPL(pVal), surfactant proteolipid of M_r approximately 5,000-6,000 containing polyvaline; SDS, sodium dodecyl sulfate; Pipes, 1,4-piperazinediethanesulfonic acid; PAGE, polyacrylamide gel electrophoresis.
Amino Acid Sequence Analysis-Automated Edman degradation was performed as previously described on the essentially delipidated proteins using an Applied Biosystems Model 470A vapor-phase protein sequencer (2, 21). In every analysis the protein was deposited on Polybrene-treated glass fiber filters. Phenylthiohydantoin derivatives of the amino acids were identified by high performance liquid chromatography on an Altex Ultrasphere ODS (4.5 × 250 mm) column, preceded by a precolumn (4.6 × 45 mm) of the same resin, as previously reported (2, 21).
Screening of Lung cDNA Library-SPL(pVal) clones were identified from a λgt11 cDNA library prepared from poly(A)+ RNA of human lung tissue. Duplicate filter lifts were made from phage plated at 20,000 plaque-forming units/plate and screened with an oligonucleotide probe. The probe was (GTN)&, where N is either deoxycytidine or deoxyinosine (22), and was labeled with [γ-32P]dATP using T4 polynucleotide kinase. Labeling reactions were greater than 99% complete as determined by descending chromatography on DEAE-cellulose paper in a 0.3 M ammonium formate solvent. Filters were prehybridized in 6 × SSPE (with 90 mM citrate), 2 × Denhardt's solution. SSPE contains 0.18 M NaCl, 10 mM NaPO4 (pH 7.7), 1 mM EDTA. Denhardt's solution contained 0.02% bovine serum albumin, 0.02% Ficoll, 0.02% polyvinylpyrrolidone, 0.1% SDS, and 50 μg/ml salmon sperm DNA. Hybridization was in the same solution with the polyvaline oligonucleotide probe at a concentration of 1.5 ng/ml, specific activity 10^9 cpm/μg. Filters were washed eight times in large volumes of 6 × SSPE at 37 °C, air-dried, and exposed to XAR film for 96 h at -80 °C with Cronex Lightning Plus intensifying screens. EcoRI inserts from the positive λgt11 clones were subcloned into M13 for sequence analysis. Sequence was obtained using the dideoxy method of Messing (20) utilizing either Klenow fragment of E. coli DNA polymerase I or the reverse transcriptase quasi end-labeling technique of Brunner et al. (23).
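The degenerate triplet GTN used for the probe covers all four valine codons of the standard genetic code, which is why a polyvaline stretch can be targeted with a simple repeat. The short Python sketch below is an illustration only and is not part of the original methods; the function names are hypothetical, and the repeat count passed to `polyvaline_probe` is a placeholder because the repeat number is not legible in the probe formula as given.

```python
# Valine codons of the standard genetic code (DNA sense strand).
VALINE_CODONS = {"GTA": "V", "GTC": "V", "GTG": "V", "GTT": "V"}


def expand_gtn():
    """Return every codon matched by the degenerate triplet GTN."""
    return ["GT" + n for n in "ACGT"]


def polyvaline_probe(repeats, wobble="I"):
    """Build a (GT<wobble>) repeat; the text allows C or I (inosine) at N."""
    if wobble not in ("C", "I"):
        raise ValueError("wobble base must be 'C' or 'I'")
    return ("GT" + wobble) * repeats


codons = expand_gtn()
all_valine = all(VALINE_CODONS.get(c) == "V" for c in codons)

# The repeat count here is a placeholder, not the value used in the paper.
example_probe = polyvaline_probe(repeats=4)

print("GTN expands to:", codons)
print("all encode valine:", all_valine)
print("example probe:", example_probe)
```

Because every expansion of GTN encodes valine, the third position carries no coding information, so a single base (or inosine) can be placed there without excluding any valine codon.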
Screening of the Genomic Library-Clone 334.2 was labeled with [α-32P]dCTP using the nick translation reagent kit (Bethesda Research Laboratories Life Technologies, Inc., Gaithersburg, MD). This probe was used to screen the human lymphocyte genomic library in λEMBL3. Approximately 2 × 10' clones were screened at an initial plating density of 30,000 plaque-forming units/plate. The filters were screened in duplicate. The second and third screens were performed at 10- and 100-fold lower density, respectively. Phage DNA was prepared essentially as described by Maniatis et al. (24). To identify genomic fragments containing SPL(pVal) sequences, the cDNA described above was used in Southern blot analysis of genomic DNA after digestion with various restriction endonucleases. Southern blot analysis was performed using probes labeled by the random primer technique (Pharmacia LKB Biotechnology Inc.) and hybridized with the DNA in 5 × SSC, 5 × Denhardt's solution, 0.1% SDS at 65 °C. SSC contains 150 mM NaCl, 15 mM sodium citrate (pH 7.0). The nitrocellulose filters were washed twice (5 min each) at room temperature in 2 × SSC, 0.1% SDS, then twice in 2 × SSC, 0.1% SDS at 65 °C and four times in 0.2 × SSC, 0.1% SDS at 65 °C (20-30 min/wash). DNA fragments were subcloned into M13 vector mp8, mp9, mp18, or mp19 for dideoxy sequencing by the quasi end-labeling method (23).
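The working-strength salt concentrations implied by the stated 1 × SSPE and 1 × SSC compositions follow from simple scaling. The Python sketch below is illustrative arithmetic only; the dictionary and function names are assumptions, and only the 1 × compositions and fold strengths are taken from the text.

```python
# 1x compositions as stated in the methods (all values in mM).
SSPE_1X = {"NaCl": 180.0, "NaPO4": 10.0, "EDTA": 1.0}
SSC_1X = {"NaCl": 150.0, "sodium citrate": 15.0}


def working_strength(recipe_1x, fold):
    """Scale a 1x recipe to the given fold strength (mM, rounded to 3 places)."""
    return {name: round(conc * fold, 3) for name, conc in recipe_1x.items()}


# Fold strengths named in the hybridization and wash steps.
steps = [
    ("6x SSPE (oligonucleotide prehybridization and washes)", SSPE_1X, 6),
    ("5x SSC (Southern hybridization)", SSC_1X, 5),
    ("2x SSC (low-stringency washes)", SSC_1X, 2),
    ("0.2x SSC (high-stringency washes)", SSC_1X, 0.2),
]

for label, recipe, fold in steps:
    print(label, working_strength(recipe, fold))
```

The progression from 2 × to 0.2 × SSC lowers the sodium concentration tenfold, which is what makes the final 65 °C washes the more stringent ones.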
Dideoxy Nucleotide RNA-directed Sequencing-Dideoxynucleotide RNA-directed sequencing was done by a modification of the procedure described by Geliebter et al. (27).
Chromosomal Assignment-The 32P-labeled SPL(pVal) cDNA (334.2) was hybridized to DNA obtained from the mouse-human chromosomal panels previously described by Griffin et al. (28) using essentially identical methodology; the hybridizations were kindly performed by Webster Cavenee, Ludwig Institute, Montreal, Canada.

Cloning of cDNA for Human SPL(pVal)-An oligonucleotide probe based on the polyvaline sequence was utilized to screen a λgt11 expression library generated from human lung poly(A)+ RNA. Nucleotide sequence analysis of one clone (334.2) of 0.3 kilobases revealed an open reading frame predicting close identity to the amino acid sequence determined directly from the human protein, and this clone was used to isolate other clones from the same cDNA library (Fig. 1). Sequence analysis of nine unique clones resulted in a consensus sequence predicting a larger polypeptide precursor. Two distinct classes of cDNAs were detected by sequence analysis, differing by the absence of 18 bases (463 to 480) in the 3'-coding region in clones 311.3 and 13.1 compared to other clones including TP11.2. The Ile-Pro-Cys-Cys peptide was found within the reading frame of a larger polypeptide, suggesting that the hydrophobic peptide of M_r = 5,000-6,000 arises from proteolytic processing of a precursor protein at both the amino and carboxyl termini (Fig. 2). Both clone 334.2 and clone -11.2 hybridized to a single 0.85-0.9-kilobase RNA after Northern blot analysis of human lung RNA. Abundance of the RNA was greater in adult than fetal lung RNA (data not shown). Hybrid-arrested translation and immunoprecipitation with antiserum generated against bovine proteolipid resulted in complete arrest of a single M_r = 22,000 polypeptide (Fig. 3).

therefore utilized to predict the complete SPL(pVal) mRNA.

The broken underline indicates sequence from mRNA-directed dideoxy sequencing using a synthetic oligonucleotide based on sequence from clone RJ-21 as a primer. This sequence is present in the first exon of the SPL(pVal) genomic clone VG519. Solid underlined amino acid sequence delineates predicted SPL(pVal) sequence that matches amino acid sequence obtained directly on human SPL(pVal) protein. Overlined DNA sequence is the 18-base pair sequence that is absent in cDNA clones 311.3 and 13.1. The predicted sequence and obtained sequence match at 16 of 17 amino acids, the difference being His32 instead of Asn.

FIG. 4. Hydropathy analysis of the SPL(pVal) precursor. Hydropathy analysis using a span of 11 amino acids was performed on the entire predicted amino acid sequence according to the procedure of Kyte and Doolittle (30). Hydrophobicity is plotted as a function of residue number from Met1 to the carboxyl terminus, Ile197. Values indicating hydrophobic and hydrophilic regions are above and below 0.5, respectively, as indicated by the horizontal dotted line.
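The hydropathy profile of Fig. 4 is a sliding-window average of per-residue hydropathy values. The Python sketch below shows one way such a profile can be computed with the published Kyte and Doolittle scale and a span of 11 residues. It is a minimal illustration, not the program used in this work; `hydropathy_profile` and `TOY_SEQUENCE` are hypothetical names, and the toy sequence is invented for demonstration and is not the SPL(pVal) precursor.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, span=11):
    """Return (residue_number, mean hydropathy) for each full window.

    The residue number is the 1-based position of the window's center.
    """
    if span % 2 == 0 or span < 1:
        raise ValueError("span must be a positive odd number")
    if span > len(sequence):
        raise ValueError("span is longer than the sequence")
    scores = [KD[aa] for aa in sequence.upper()]
    half = span // 2
    profile = []
    for center in range(half, len(scores) - half):
        window = scores[center - half:center + half + 1]
        profile.append((center + 1, sum(window) / span))
    return profile


# Invented toy sequence: charged flanks around a valine-rich core.
TOY_SEQUENCE = "KDEKRSVVVVVVLIVVVIVGKDEKRS"

profile = hydropathy_profile(TOY_SEQUENCE, span=11)
peak_residue, peak_value = max(profile, key=lambda item: item[1])
print("windows:", len(profile))
print("most hydrophobic window centered at residue", peak_residue)
```

With an odd span, each averaged value is assigned to the central residue, so the first and last five residues of a sequence have no plotted value at a span of 11.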
RNA Sequence Analysis-Direct nucleotide sequence of SPL(pVal) was derived from human lung RNA. The RNA-directed sequence overlapped with the cDNA sequence (Fig. 2) and ended at a clear stop. An oligonucleotide based on the overlapping sequence was synthesized and utilized to locate this 5' exon from genomic DNA encoding SPL(pVal): 5'-A-G-C-A-A-G-A-T-G-G-A-T-G-T-G-G-G-C-

Cloning of Genomic DNA-Screening the genomic library with the 334.2 probe resulted in the identification of 16 clones. Two classes of restriction fragment patterns were identified in these clones. The synthetic oligonucleotide based on the RNA sequencing experiment (above) was utilized to identify the 5' region of the SPL(pVal) gene. Sequence from the 5'-coding region of genomic clone VG519 overlapped exactly with the first RNA sequence and the 5' cDNA sequences (Fig. 2). This genomic sequence is in the first exon of the SPL(pVal) gene, located 23 base pairs downstream from a consensus TATAA sequence, further identifying the 5'-untranslated region of the SPL(pVal) RNA. Complete analysis of SPL(pVal) genomic DNAs will be reported separately. The 5'-untranslated portion of the mRNA begins 60 base pairs prior to a potential initiation site which fits the criteria for a mammalian ribosomal binding site. There are two potential ATG start sites (bases 27 and 55) at the 5' end of the mRNA, the more 5' site most closely meeting the criteria described by Kozak (29). The 3' end of several cDNAs demonstrates a polyadenylation addition sequence predicting the end of SPL(pVal) RNA. Two distinct classes of cDNA were isolated which encode the SPL(pVal) active region, the predicted mRNAs differing in the coding region 3' to the active M_r = 5,000-6,000 peptide, wherein a deletion of 18 base pairs may result in two distinct polypeptide precursors differing by six amino acids. The predicted amino acid sequence is nearly identical to the partial amino acid sequence obtained from the surfactant proteolipid from human surfactant.

The predicted amino acid sequence of the entire SPL(pVal) precursor was deduced from the cDNA clones, RNA sequencing, and the genomic DNA. The precursor comprises 197 amino acids (or 191 with the deletion), representing a 21,000-dalton polypeptide. The size of the predicted polypeptide is consistent with the hybrid-selected translation product of M_r = 22,000. Hydropathy analysis of the precursor protein was performed using the methods described by Kyte and Doolittle (30) (Fig. 4). There was no discernible signal peptide at the amino terminus, and the precursor polypeptide contained no asparagine-linked glycosylation sites, contrasting with the SPL(Phe) precursors, which contain one or two potential asparagine-linked glycosylation sites (16). The SPL(pVal) peptide begins at Ile26, and the domain including amino acids Leu37 to Ser61 is compatible with a membrane-associated or spanning domain of 25 amino acids. This region contains the repeated valine residues. The precise carboxyl terminus of SPL(pVal) has not been identified directly, and numerous attempts to isolate proteolytic or CNBr fragments of the canine or bovine proteolipid have been unsuccessful.
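Two small consistency checks on the sequence data can be sketched in Python: reading the legible portion of the synthetic oligonucleotide from its first ATG, and confirming that an 18-base deletion spanning bases 463 to 480 removes whole codons. This is an illustrative sketch; the function and variable names are hypothetical, the oligonucleotide is truncated in the text so only the legible bases are used, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_from_first_atg(dna):
    """Translate complete codons starting at the first ATG; stop at a stop codon."""
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Legible part of the oligonucleotide (the source text truncates the 3' end).
OLIGO = "AGCAAGATGGATGTGGGC"
peptide = translate_from_first_atg(OLIGO)

# Deletion of bases 463-480 inclusive, as stated for the shorter cDNA class.
deleted_bases = 480 - 463 + 1
in_frame = deleted_bases % 3 == 0
long_precursor = 197
short_precursor = long_precursor - deleted_bases // 3

print("peptide from first ATG:", peptide)
print("deleted bases:", deleted_bases, "in frame:", in_frame)
print("precursor lengths:", long_precursor, short_precursor)
```

Because 18 is a multiple of 3, the deletion leaves the downstream reading frame intact, which is consistent with the two precursors differing by exactly six amino acids.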
The identification of the polypeptide sequence of this hydrophobic protein completely distinguishes SPL(pVal) from SPL(Phe) and from surfactant-associated protein of 35,000 daltons, SAP-35. A small molecular weight protein with a composition consistent with the protein previously termed "surfactant apolipoprotein B" (1) has recently been identified as a carboxyl-terminal domain of SAP-35 (21). Mixtures of phospholipids with SAP-35 or its fragments were found to have relatively weak surface active properties compared to the surfactant proteolipids of M_r = 6,000-14,000 (11, 21). Canine and bovine SPL(pVal) alone conferred virtually full biophysical activity to phospholipid mixtures (18). Previous studies suggesting the importance of SAP-35 alone in conferring this surfactant activity may have been confounded by the presence of the surfactant proteolipids in lipid extracts used in those reconstitution studies (31). The present work supports previous studies in which small molecular weight surfactant proteins were distinguished from SAP-35 on the basis of amino acid composition and immunoreactivity (11, 12, 32).
Chromosomal Assignment-The 32P-labeled SPL(pVal) clone (334.2) was hybridized to mouse-human chromosomal hybrids containing all human chromosomes as previously characterized (28). Hybridization was observed with hybrids containing only human chromosome 8 (data not shown).
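Assignment with a somatic cell hybrid panel rests on concordance: the gene is assigned to the human chromosome whose presence or absence across the hybrids matches the hybridization signal in every line. The Python sketch below illustrates that logic only. The panel composition and signals are entirely hypothetical toy values, not the data of this study or of the panels cited, and the function name is an assumption.

```python
def concordant_chromosomes(panel, signal):
    """Return chromosomes whose presence matches the signal in every hybrid.

    panel:  dict mapping hybrid name -> set of human chromosomes retained
    signal: dict mapping hybrid name -> True if the probe hybridized
    """
    candidates = set().union(*panel.values())
    return sorted(
        chrom for chrom in candidates
        if all((chrom in retained) == signal[hybrid]
               for hybrid, retained in panel.items())
    )


# Hypothetical toy panel (invented for illustration only).
toy_panel = {
    "hybrid_1": {"1", "8", "12"},
    "hybrid_2": {"3", "8", "21"},
    "hybrid_3": {"1", "3", "12"},
    "hybrid_4": {"8", "17"},
    "hybrid_5": {"12", "17", "21"},
}
toy_signal = {
    "hybrid_1": True,
    "hybrid_2": True,
    "hybrid_3": False,
    "hybrid_4": True,
    "hybrid_5": False,
}

assignment = concordant_chromosomes(toy_panel, toy_signal)
print("chromosomes fully concordant with the signal:", assignment)
```

In practice a panel is chosen so that each chromosome has a distinct retention pattern; a chromosome is excluded as soon as one hybrid shows signal without it or retains it without signal.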
Summary-The present work identifies a human hydrophobic surfactant polypeptide, SPL(pVal), its polypeptide precursor, and complete mRNA sequence. The SPL(pVal) peptide and the control of SPL(pVal) expression during lung development may be useful in diagnosis and treatment of hyaline membrane disease and other pulmonary disorders associated with surfactant deficiency.