Molecular Cloning of a Human Serum Protein Structurally Related to Complement Factor H*

Two cDNA clones, termed H36-1 and H36-2, were isolated from a human liver cDNA library. Clone H36-1 appears to represent the recently isolated human serum proteins h37 and h42, two differently glycosylated forms of a protein antigenically related to human complement factor H. The deduced H36-1 protein sequence is 327 amino acids long and possesses a leader sequence. The secreted part of the protein is composed of five tandem repeating units, termed short consensus repeats (SCRs). SCR 1 and 2 display high homology to the corresponding region of the recently isolated murine factor H-related cDNA clone 13G1. In contrast, the 3'-end of the H36-1 clone shows sequence homology to the 3'-end of human complement factor H. The second clone, H36-2, is nearly identical to H36-1: within the 1148 base pairs where the two clones overlap, their nucleotide sequences differ at nine positions. One nucleotide exchange in the H36-2 sequence, located within SCR 1, creates a stop codon (TAA). Consequently, the corresponding mRNA cannot code for a functional protein, suggesting that this clone is a transcribed pseudogene. These two clones represent new human members of the family of proteins structurally related to complement factor H.

Complement factor H is a 150-kDa serum glycoprotein that plays a regulatory role in the alternative pathway of complement activation (1). cDNA clones for the human and murine factor H molecules have been isolated and were derived from 4.4-kb¹ mRNAs (2,3). Both human and murine factor H consist of 20 tandem repeating units of approximately 60 amino acids, termed short consensus repeats (SCRs). These SCRs, which may have evolved by gene duplication, are conserved structural elements in a family of proteins termed regulators of complement activation (4). Members of this family include C4-binding protein and at least four membrane proteins: the C3b/C4b receptor (CR1), decay accelerating factor, membrane cofactor protein, and the C3d/Epstein-Barr virus receptor (CR2) (5). However, the SCR motif is not limited to complement proteins, as additional proteins such as the b chain of clotting factor XIII, the lymphocyte homing receptor, and the interleukin 2 receptor contain SCR elements (5).

*This work was supported by the Bundesministerium für Forschung und Technologie Project 01VM8811 (to C. S. and R. D. H.).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) X56209 and X56210.
¹ The abbreviations used are: bp, base pairs; SCR, short consensus repeat; kb, kilobase(s).
Additional transcripts related to complement factor H have been described by Northern analyses in humans and in mice. Three complement factor H-related mRNA species of 3.5, 2.8, and 1.8 kb have been described in the murine system, and their corresponding cDNAs have been isolated (6). In human liver RNA, two additional transcripts of 1.8 and 1.4 kb have been identified using 5'- and 3'-specific probes of the factor H cDNA (7,8). cDNA clones representing the human 1.8-kb transcript have been isolated and characterized; this mRNA is most likely derived from the factor H genomic locus by alternative splicing (9,10). The human cDNA for the 1.4-kb mRNA has not been isolated so far.
Recently, two differently glycosylated forms (termed h37/h42) of a human serum protein antigenically related to factor H were isolated in our laboratory (11). The NH2-terminal sequence of this protein showed a high degree of homology to the NH2-terminal sequence of a factor H-related protein encoded by the murine cDNA clone 13G1 (6). The deduced protein sequence of the murine clone has a putative leader sequence, and the secreted portion is composed of five SCRs. While SCR 1 and 2 have no striking homology to known SCRs, both SCR 3 and SCR 4 show very high homology to SCR 19 of murine complement factor H. In addition, SCR 5 and the 3'-untranslated region are almost identical to SCR 20 and the 3'-untranslated region of murine factor H. A specific fragment of this 13G1 cDNA identified a 1.8-kb transcript in murine liver RNA (6).
Here we describe the characterization of two cDNA clones related to the recently described human serum protein (h37/h42) (11). The H36-1 cDNA clone appears to represent the two differently glycosylated, factor H-related proteins. The protein derived from this cDNA displays a hydrophobic leader sequence, is organized in five SCRs, and has two consensus sequences for N-linked glycosylation. The second clone, H36-2, most likely represents a transcribed pseudogene, since it contains an internal stop codon within SCR 1.

MATERIALS AND METHODS
Labeling of Oligonucleotide Probes and Screening-A human liver cDNA library synthesized by random priming and cloned into pGEM-4Z was kindly provided by Dr. K. Stanley, Heart Research Institute, Sydney, Australia. It was plated and screened with two oligonucleotide probes. The first probe was synthesized according to the amino acid sequence covering the NH2-terminal end of the purified h42/h37 proteins, using a degenerate code. The second oligonucleotide probe covered 30 nucleotides of SCR 19 in the factor H sequence. The oligonucleotides were labeled with [γ-32P]ATP and T4 polynucleotide kinase (12). After hybridization in 5 X Denhardt's solution, 6 X SSC (standard sodium citrate), 0.05% sodium dodecyl sulfate at 37 °C, the filters were washed in an identical solution at various temperatures.
Sequence Analysis of cDNA Clones-The nucleotide sequences of the cDNA inserts were determined by the dideoxy chain termination method (13) using [α-35S]dATP and Sequenase II (U. S. Biochemicals). Various oligonucleotides were synthesized and used as primers to sequence the two cDNA clones in both orientations.
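The first screening probe was back-translated from a peptide sequence with a degenerate code, so the probe is a mixture of every oligonucleotide that could encode that peptide. As a minimal sketch (the actual probe sequence is not given here, and the peptides below are hypothetical examples), the size of such a mixture is the product of the number of synonymous codons for each residue in the standard genetic code, which is why stretches rich in Met and Trp (one codon each) are usually preferred for probe design.

```python
from math import prod

# Synonymous codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
    "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide: str) -> int:
    """Number of distinct oligonucleotides that could encode `peptide`.

    `peptide` uses one-letter amino acid codes; every codon position is
    counted in full (the final wobble base is not trimmed).
    """
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


if __name__ == "__main__":
    # Hypothetical peptides, not the h37/h42 NH2-terminal sequence.
    print(probe_degeneracy("MWFK"))   # 4: Met/Trp-rich, low degeneracy
    print(probe_degeneracy("LSRLS"))  # 7776: Leu/Ser/Arg-rich, high degeneracy
```

Trimming the probe before the last wobble position, or using inosine at ambiguous positions, are common ways to reduce this number further; neither is modeled here.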
Southern Blot Analysis-Human DNA (10 µg) isolated from Raji B cells was digested to completion with EcoRI or BamHI, separated in a 1.0% agarose gel, and transferred to a nylon membrane (PALL, Biodyne).
RNA Isolation and Northern Blot Analysis-Total cellular RNA was extracted with guanidine thiocyanate and isolated by centrifugation over CsCl (14). The RNA concentration was determined spectrophotometrically, and 8 µg of total cellular RNA was separated by electrophoresis in a formaldehyde-agarose gel and subsequently transferred to a nylon membrane (PALL).
Nick Translation and Hybridization-The following inserts or fragments were used for hybridization: (i) a near full-length probe representing the H36-2 cDNA; (ii) a 5'-fragment, the 290-bp DdeI fragment of H36-2, covering the 5'-untranslated region, the leader sequence, and the sequence encoding SCR 1; (iii) a 3'-fragment, a cDNA clone representing positions 3233-3760 of the factor H cDNA, which correspond to SCR 18-20 of factor H and to SCR 3-5 of H36-1 (and of H36-2); and (iv) the 1050-bp cDNA clone representing the 3'-end of the human 1.8-kb factor H mRNA (9). The DNA fragments were labeled with 32P by random priming (Amersham Corp.) and used for hybridization at 42 °C (5 X Denhardt's, 5 X SSC, 0.1% sodium dodecyl sulfate, 250 µg/ml denatured salmon sperm DNA, and 50% formamide). After hybridization for 14-18 h, the filters were washed at a final stringency of 0.1 X SSC at 55 or 60 °C. The filters were exposed at -70 °C using intensifying screens (Quanta III, Du Pont).

RESULTS
Isolation of H36-1 and H36-2 cDNA Clones-A randomly primed human liver cDNA library was screened with oligonucleotide probes related to human complement factor H. The sequence of the first probe was derived from the NH2-terminal sequence of the previously isolated factor H-related serum protein using a degenerate code (11), and the second probe covered 30 bp of SCR 19 of human factor H. The two longest cDNA inserts, termed H36-1 and H36-2 and 1149 and 1196 bp in length, respectively, were isolated and subjected to sequence analysis. The two clones overlapped and showed 99% identity. Clone H36-2 extended 48 bp further at the 5'-end, and clone H36-1 extended one nucleotide further at the 3'-end (Fig. 1A). While clone H36-1 started within an open reading frame representing a protein of 327 amino acids, clone H36-2 had an in-frame stop codon at positions 2-4 and a potential initiation site (ATG) at positions 41-43. The sequence CCA AGC ATG matches the consensus sequence for initiation sites, GCC ACC ATG (15), at six of nine positions (including the ATG). In their overlapping region, the two cDNA clones H36-1 and H36-2 differ at a total of nine nucleotides (Fig. 1A), and the derived amino acid sequences differ at four positions (Fig. 1B). The sequence of the H36-2 clone displays a point mutation at position 196. This mutation creates an in-frame stop codon within SCR 1 at amino acid 52, so any secreted product would be very short and most likely functionally inactive. The next in-frame ATG is located at position 527 and would encode a Met in SCR 3. A cDNA with such a long 5'-untranslated region, encoding a protein that would initiate in the middle of an SCR, is unlikely to represent a functional protein. We therefore conclude that clone H36-2 represents a transcribed pseudogene. In contrast, clone H36-1 appears to represent the h37/h42 proteins. The complete open reading frame predicts a protein of 330 amino acids with a mass of 37.6 kDa.
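The initiation-context comparison and the codon arithmetic in this paragraph can be checked mechanically. The sketch below assumes 1-based nucleotide positions as in the text, with the ATG of H36-2 at position 41; the function names are illustrative and not taken from any published tool. It counts identities between an initiation context and the consensus, converts a nucleotide position into a codon number, and scans for the first in-frame stop codon (the toy sequence in the usage lines is hypothetical).

```python
from typing import Optional

KOZAK_CONSENSUS = "GCCACCATG"  # consensus initiation context cited in the text (ref. 15)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def kozak_matches(context: str, consensus: str = KOZAK_CONSENSUS) -> int:
    """Number of positions at which `context` matches the consensus."""
    if len(context) != len(consensus):
        raise ValueError("context and consensus must have the same length")
    return sum(a == b for a, b in zip(context.upper(), consensus.upper()))


def codon_number(position: int, atg_position: int) -> int:
    """1-based codon number containing nucleotide `position`, counting from
    the ATG whose A lies at `atg_position` (both positions 1-based)."""
    if position < atg_position:
        raise ValueError("position lies upstream of the ATG")
    return (position - atg_position) // 3 + 1


def first_in_frame_stop(seq: str, atg_position: int) -> Optional[int]:
    """Codon number of the first stop codon read in frame from the ATG at
    1-based `atg_position`, or None if no stop codon is found."""
    seq = seq.upper()
    start = atg_position - 1  # convert to a 0-based index
    for n, i in enumerate(range(start, len(seq) - 2, 3), start=1):
        if seq[i:i + 3] in STOP_CODONS:
            return n
    return None


if __name__ == "__main__":
    print(kozak_matches("CCAAGCATG"))   # 6 of 9, as stated in the text
    print(codon_number(196, 41))        # 52: the point mutation lies in codon 52
    print(codon_number(527, 41))        # 163: (527 - 41) is divisible by 3, so in frame
    print(first_in_frame_stop("GCATGAAATAAGG", 3))  # 3 (hypothetical toy sequence)
```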
Two potential N-linked glycosylation sites with the sequence Asn-X-Ser/Thr were found at positions 126-128 (Asn-Ile-Ser) and 194-196 (Asn-Trp-Thr). A hydrophilicity analysis of the predicted full-length polypeptide revealed a hydrophobic NH2-terminal region typical of signal peptides of secreted proteins (16). According to the criteria for signal peptide cleavage, this leader sequence is cleaved at position 18 (position 15 in the amino acid sequence derived from clone H36-1). That this site is actually used for cleavage was confirmed by the NH2-terminal sequence of the isolated proteins h37 and h42 (11). The molecular mass of the secreted, unglycosylated product was calculated to be 35.7 kDa, consistent with the molecular mass determined for the deglycosylated serum protein (11).
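Potential N-linked glycosylation sites are found by scanning the deduced protein for the Asn-X-Ser/Thr sequon. The sketch below does this with a regular expression; it adds the commonly applied rule that X is not Pro, which the text does not state, and uses a lookahead so that overlapping sequons are also reported. The protein strings in the usage line and tests are hypothetical.

```python
import re
from typing import List, Tuple

# Asn-X-Ser/Thr with X != Pro (assumed rule); the lookahead finds overlapping sites.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based Asn position, tripeptide) for each potential site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_sequons("MNISAKNWTPNPS"))  # [(2, 'NIS'), (7, 'NWT')]; NPS is skipped
```

A sequon only marks a possible site; whether it is actually used, which the text invokes to explain the h37 and h42 forms, has to be determined experimentally.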

Structural Analysis and Homology-Structural alignment of the secreted protein encoded by the cDNA clone H36-1 indicated that the protein is composed of five SCRs (Fig. 2A). Each SCR contains the four characteristic Cys (C) residues; in addition, an Asn (N), a Val (V), a Tyr (Y), a Gly (G), and a Pro (P) residue are conserved in all five repeating units (all boxed in Fig. 2A). SCR 1 and 2 of H36-1 are highly homologous to SCR 1 and 2 of the predicted protein of the murine factor H-related cDNA clone 13G1 (6) but have no striking homology to any SCR of human factor H. The similarity of SCR 1 and 2 to the murine sequences is 79.4% at the DNA level and 72.8% at the protein level. In contrast, SCR 3-5 and the 3'-untranslated region show no significant homology to the murine 13G1 clone but are almost identical to SCR 18-20 and to the 3'-end of human complement factor H (Fig. 2B); this homology is 98.3% at the DNA level and 98.9% at the protein level.
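The DNA- and protein-level percentages quoted here depend on an alignment (Fig. 2) and on how gaps and conservative substitutions are scored, which the text does not specify. As a minimal sketch under one common convention (identity only, columns containing a gap excluded), the percent identity of two pre-aligned sequences can be computed as follows; the short sequences in the usage lines are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical columns in two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap are excluded from the count.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if gap not in (x, y)]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGTACGT", "ACGAACGT"))  # 87.5
    print(percent_identity("AC-T", "ACGT"))          # 100.0 (gap column skipped)
```

A protein "similarity" score that also credits conservative substitutions would need a substitution matrix and will generally be higher than the identity computed here.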
A total of 12 nucleotide differences was observed between H36-1 and factor H in the conserved region (Fig. 2B): one in SCR 3 (homologous to SCR 18 of factor H), two in SCR 4 (homologous to SCR 19), four in SCR 5 (homologous to SCR 20), and five in the 3'-untranslated region. Two of these nucleotide changes result in amino acid substitutions: H36-1 has a Leu (L) at position 290 (numbering for the predicted full-length polypeptide) instead of the Ser (S) of factor H, and an Ala (A) at position 296 instead of a Val (V). In the 3'-untranslated region, an A instead of a C at position 1094 creates the sequence ATTTA in clone H36-1, a sequence motif that has been associated with mRNA instability (17).
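Counting nucleotide differences in an aligned region and locating the ATTTA motif are both simple string scans. The sketch below assumes the two sequences are already aligned without gaps and reports 1-based positions; the sequences in the usage lines are hypothetical and only illustrate how a single C-to-A exchange can create an ATTTA element.

```python
from typing import List


def mismatch_positions(a: str, b: str) -> List[int]:
    """1-based positions at which two gap-free aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a.upper(), b.upper()), start=1) if x != y]


def find_motif(seq: str, motif: str = "ATTTA") -> List[int]:
    """1-based start positions of `motif` in `seq`, overlapping hits included."""
    seq, motif = seq.upper(), motif.upper()
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i + 1)
        i = seq.find(motif, i + 1)
    return hits


if __name__ == "__main__":
    before, after = "GCTTTAG", "GATTTAG"          # hypothetical 3'-UTR fragments
    print(mismatch_positions(before, after))      # [2]
    print(find_motif(before), find_motif(after))  # [] [2]
```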

Fig. 2 (legend fragment): ...structure. Residues conserved in all five SCRs are boxed, and the two...
Southern Hybridization-Southern analysis was performed to analyze the genomic structure of the H36-1 and H36-2 loci. The almost full-length H36-2 cDNA probe hybridized to several bands of varying intensity in EcoRI- and BamHI-cleaved human genomic DNA (Fig. 3); for example, the 9.5-kb EcoRI band was stronger than the 7.9- or 5.9-kb bands. The EcoRI fragments that hybridized to the H36-2 probe represented at least 50 kb of genomic DNA. Because of the homology between H36-2, H36-1, and factor H, this probe is expected to cross-hybridize to genomic fragments of factor H. Two smaller fragments were therefore used for genomic analysis: a 5'-fragment covering the 5'-untranslated region, the leader sequence, and SCR 1 of both the H36-1 and H36-2 sequences; and a 3'-fragment representing SCR 18-20 of factor H, a region that is nearly identical to SCR 3-5 of H36-1 and H36-2 (compare Fig. 2B). The fragments detected by each of these probes, the H36-specific 5'-fragment and the 3'-fragment, covered about 30 kb of human genomic DNA. The size of the fragments detected with these end-specific probes is large enough to suggest that factor H and H36-1 (and/or H36-2) are represented by distinct genomic loci.
Fig. 3 (legend fragment): A, schematic diagram of the probes. Three probes were used: (i) the near full-length cDNA probe H36-2; (ii) a 5'-fragment, the 290-bp DdeI fragment of H36-2, covering the 5'-untranslated region, the leader sequence, and SCR 1; and (iii) a 3'-fragment derived from SCR 18-20 of human complement factor H. Owing to the homology of SCR 18-20 with SCR 3-5 of H36-1 and H36-2, this fragment also hybridized to DNA representing these two cDNAs.
Expression Analysis-To analyze the expression of complement factor H-related transcripts, Northern blot analyses were performed with human liver RNA. The filters were probed with the nearly full-length H36-2 cDNA and with smaller fragments. The H36-2 cDNA clone hybridized to several transcripts (Fig. 4, Table I). The smallest band, of about 1.4 kb, is expected to represent the mRNAs of H36-1 and H36-2. The same probe hybridized to a relatively broad band migrating near the 28 S ribosomal RNA. The 5'-fragment, which was expected to be specific for the 1.4-kb mRNAs, detected the 1.4-kb species and, surprisingly, additional mRNA near the 28 S ribosomal RNA (Fig. 4). The 3'-probe detected two dominant RNA species of 1.4 and 4.4 kb, and an mRNA of about 5.5 kb that hybridized with lower intensity (Fig. 4). The 1050 fragment, which represents the 3'-end of the recently described 1.8-kb factor H-related mRNA, was used as a control (9); owing to its homology to complement factor H, this probe also hybridized to the 4.4-kb mRNA. The relatively broad band near the 28 S ribosomal RNA detected with the H36-2 and 5'-probes seems to represent several mRNA transcripts, one of which is the 4.4-kb mRNA encoding complement factor H. However, this 4.4-kb mRNA hybridized as a distinct band to the 3'- and 1050 probes (Fig. 4), so the other signals may be derived from additional, so far uncharacterized mRNA species. In addition to this broad band, an mRNA of about 5.5 kb was detected with the 3'-probe (Fig. 4); the same RNA could also be detected with the H36-2 probe upon longer exposures (data not shown). These data indicate that several additional transcripts with sequences similar to complement factor H and to complement factor H-related cDNA clones are present in RNA derived from human liver (Table I).
Table I footnotes: (a) For a description of the probes, compare Fig. 3 and "Materials and Methods." (b) The mRNA sizes were calculated by comparing the mobility of the RNAs to that of 28 S and 18 S ribosomal RNAs. (c) This band was detected upon longer exposures. (d) This smear most likely represents several mRNA species of various sizes, among them the 4.4-kb mRNA that encodes factor H; this 4.4-kb mRNA was detected with the 3'-fragment as a distinct band.
This protein is distinct from the h42 protein, which represents the differently glycosylated form of h37 and which is most likely encoded by the 1.4-kb mRNA.
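The mRNA sizes in Table I were estimated by comparing their mobility with that of the 28 S and 18 S ribosomal RNAs. A common way to do this, assumed here rather than taken from the paper, is to treat log(size) as linear in migration distance and interpolate between the two rRNA markers. The marker sizes (about 5.0 and 1.9 kb, approximate values for human 28 S and 18 S rRNA) and the distances in the usage lines are illustrative assumptions; bands outside the marker range, such as the 1.4-kb species, are extrapolations and correspondingly less reliable.

```python
import math


def size_from_mobility(distance: float, d_28s: float, d_18s: float,
                       size_28s_kb: float, size_18s_kb: float) -> float:
    """Estimate an RNA size (kb) from its migration distance.

    Assumes log10(size) varies linearly with distance, calibrated by the
    28 S and 18 S rRNA bands (distances in any consistent unit, e.g. mm).
    Values outside the two markers are extrapolated.
    """
    if d_18s == d_28s:
        raise ValueError("marker distances must differ")
    log28, log18 = math.log10(size_28s_kb), math.log10(size_18s_kb)
    slope = (log18 - log28) / (d_18s - d_28s)
    return 10 ** (log28 + slope * (distance - d_28s))


if __name__ == "__main__":
    # Hypothetical gel: 28 S at 20 mm, 18 S at 30 mm; assumed marker sizes in kb.
    for d in (18.0, 25.0, 33.0):
        print(d, round(size_from_mobility(d, 20.0, 30.0, 5.0, 1.9), 2))
```

With only two calibration points this is a straight line in log space; a ladder with more markers would allow the curvature of real gels to be checked.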

DISCUSSION
To isolate cDNA clones representing the recently isolated human complement factor H-related serum proteins h42/h37, we screened a human liver cDNA library with oligonucleotide probes. Two almost identical cDNA clones, termed H36-1 and H36-2, were isolated and sequenced. The two clones overlap for 1148 bp and differ at nine nucleotides in this region. The longest open reading frame of the two clones represents a protein of 330 amino acids. Several arguments suggest that the cDNA clone H36-1 represents the recently isolated human serum proteins h42 and h37: (i) the protein derived from the cDNA clone has a putative leader sequence indicative of a secreted protein; (ii) the predicted NH2 terminus of the secreted product is identical to the NH2 terminus of the serum protein; (iii) differential usage of the two putative N-linked glycosylation sites could give rise to the two differently glycosylated forms (h42/h37) observed in human serum; and (iv) the molecular mass of the secreted, unglycosylated protein represented by the cDNA clone was calculated to be 35.7 kDa. The deglycosylated h37 and h42 proteins had identical mobility, and under reducing conditions their mass was estimated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis to be 38 kDa. This mass is in good agreement with that calculated for the secreted cDNA product.
The two cDNA clones H36-1 and H36-2 are 1149 and 1196 bp in length. The H36-2 probe hybridized to mRNAs of 1.4 and 4.4 kb. Most likely these two cDNAs represent near full-length clones of the 1.4-kb mRNA. Both clones have a 3'-untranslated region, and clone H36-2 also has a 5'-untranslated region and an ATG initiation site at positions 41-43. The nucleotide sequence of H36-1 starts 8 bp after this translation initiation site, indicating that this clone is not full-length. The 3'-untranslated region of both clones is incomplete, as neither possesses a poly(A) tail. However, the 3'-untranslated region of the 1050 cDNA clone, which represents the 3'-terminal end of the 1.8-kb mRNA, is 271 bp long and ends with a poly(A) tail. Assuming a 3'-untranslated region of similar length for the H36-1 and H36-2 clones, about 100 nucleotides would be missing at the 3'-end. The two clones therefore seem to represent the 1.4-kb mRNA.
Homology and Function-The murine cDNA clone 13G1 is the potential murine homologue of the human H36-1 clone. Like the protein represented by the H36-1 clone, the protein represented by the 13G1 cDNA has a putative leader sequence, and its secreted part is composed of five SCRs. While SCR 1 and 2 of this predicted murine protein do not show any striking homology to the murine complement factor H sequence, SCR 3-5 as well as the 3'-untranslated region are related to murine factor H: both SCR 3 and SCR 4 are homologous to SCR 19, and SCR 5 and the 3'-untranslated region are highly homologous to SCR 20 and the 3'-untranslated region of factor H, respectively. This mouse cDNA hybridized to an mRNA of 1.8 kb (6).
SCR 1 and 2 of the proteins deduced from the H36-1 cDNA and the murine 13G1 cDNA have a homology of 72.8%, which is higher than the overall homology of the human and murine factor H proteins (61.2%). This strongly suggests that SCR 1 and 2 play an important role in the specific biological function of these molecules. In contrast, SCR 3-5, which are conserved with factor H of the same species, may play a role similar to that of the corresponding part of complement factor H.
As the sequences of SCR 1 and 2 are distinct from any sequence of complement factor H, the H36 protein may exert functions distinct from those of complement factor H. It will be of interest to determine its precise biological role.
Genomic Structure-Are the H36-1 and H36-2 cDNA clones derived from gene loci distinct from the factor H gene, or are they alternatively spliced products transcribed from the factor H locus? The overlapping sequences of H36-1 and H36-2 differ at a total of nine nucleotides, and these differences can be explained by causes other than cloning artifacts. Alignment of the overlapping sequences of H36-1, H36-2, and factor H indicates that four of the nucleotide changes observed between the H36-1 and H36-2 clones represent mutations in the H36-1 sequence; at these positions, the H36-2 sequence is identical to factor H.
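The reasoning in this paragraph uses factor H as a reference: where H36-1 and H36-2 differ and one of them matches factor H, the change is attributed to the other clone. A minimal sketch of that bookkeeping follows. It assumes the three sequences are already aligned over a region they share (such as SCR 3-5 and the 3'-untranslated region) and that the factor H base reflects the ancestral state, a parsimony assumption the text does not spell out. The short sequences in the usage line are hypothetical.

```python
from typing import Dict, List


def attribute_differences(h36_1: str, h36_2: str, factor_h: str) -> Dict[str, List[int]]:
    """Classify each column where H36-1 and H36-2 differ, using factor H as reference.

    Returns 1-based positions under 'H36-1' (H36-2 matches factor H),
    'H36-2' (H36-1 matches factor H), or 'unresolved' (neither matches).
    """
    if not len(h36_1) == len(h36_2) == len(factor_h):
        raise ValueError("aligned sequences must have the same length")
    result: Dict[str, List[int]] = {"H36-1": [], "H36-2": [], "unresolved": []}
    for pos, (a, b, ref) in enumerate(zip(h36_1, h36_2, factor_h), start=1):
        if a == b:
            continue  # the two clones agree; nothing to attribute
        if b == ref:
            result["H36-1"].append(pos)  # H36-1 departs from factor H
        elif a == ref:
            result["H36-2"].append(pos)  # H36-2 departs from factor H
        else:
            result["unresolved"].append(pos)
    return result


if __name__ == "__main__":
    print(attribute_differences("ACGTAC", "ACCTAA", "ACCTAG"))
    # {'H36-1': [3], 'H36-2': [], 'unresolved': [6]}
```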
The nucleotide differences in the conserved regions of the H36-1, H36-2, and factor H cDNAs are unlikely to be due to allelic variation or to cloning artifacts, and the cDNAs may therefore be derived from distinct genes. Several cDNA clones representing H36-1, H36-2, or factor H were isolated from the same cDNA library, which is derived from a single individual, and so far no additional mutation has been detected. It therefore seems most likely that both the H36-1 and the H36-2 mRNAs are transcribed from loci distinct from the factor H gene. Southern analysis with the H36-2 probe, which also hybridized to part of the factor H gene, revealed several genomic fragments covering at least 50 kb. In addition, the genomic locus of the potential murine homologue 13G1 has been shown to be distinct from the murine factor H locus (6).
Several members of the superfamily of SCR-containing genes are located on human chromosome 1 in a region termed the regulation of complement activation gene complex (18). These genes are characterized by their common usage of SCRs, and, with few exceptions, each SCR is encoded by a single exon. The SCRs conserve structural features in functionally distinct molecules, and their structural relatedness suggests that their genes evolved from a common ancestral gene by duplication events. Recent structural analyses indicate that duplication is still an ongoing process; duplication events have been reported for CR1 and C4bp (19-21). Similar duplication events may be responsible for the existence of two H36 loci, both of which are transcribed to mRNA.
Northern Blot Experiments-These suggest the existence of additional factor H-related mRNA species. Besides the previously described RNA species of 4.4, 1.8, and 1.4 kb, which represent complement factor H and related cDNAs, additional mRNAs were detected here (Fig. 4, Table I). The various probes used for Northern analysis of human liver RNA detected (i) an RNA of about 5.5 kb, which hybridized with the H36-2 cDNA probe and with the 3'-fragment; this signal was relatively weak and is shown in Fig. 4 with the 3'-probe, and an mRNA of identical size could be detected with the H36-2 probe upon longer exposures (data not shown); (ii) an mRNA of about 5.0 kb, detected with the H36-2 and 5'-probes, which is distinct from the 4.4-kb mRNA coding for factor H because it is also detected with the 5'-fragment, which is specific for the H36-1 and H36-2 cDNAs; (iii) an mRNA of about 4 kb, detected with both the H36-2 probe and the 5'-fragment; (iv) a transcript of about 2 kb, detected with the 3'-probe derived from SCR 18-20 of factor H; and (v) an mRNA of about 5.0 kb, detected with the cDNA probe derived from the 3'-end of the 1.8-kb mRNA.
To exclude cross-hybridization to other related SCR-encoding transcripts, the Northern blots were washed at high stringency. Each of the five additional mRNA transcripts should therefore contain domains with high homology to the respective probes. In human liver RNA, an additional transcript of about 5.5 kb has also been detected with a cDNA probe (B38.1) that represents the 1.8-kb mRNA (8). In the murine system, additional complement factor H-related mRNAs of 3.5 and 2.8 kb and two distinct transcripts of 1.8 kb have been described (6). The isolation of new cDNA clones encoding these mRNAs of complement factor H-related proteins, as well as the purification of the encoded proteins from human serum, will help to establish their relatedness to human complement factor H. It will be exciting to define the biological functions of the individual members of this family of complement factor H-related proteins.