The Gene for Mouse Epidermal Filaggrin Precursor ITS PARTIAL CHARACTERIZATION, EXPRESSION, AND SEQUENCE OF A REPEATING FILAGGRIN UNIT*

Filaggrin is an important keratin intermediate filament-associated protein of terminally differentiated mammalian epidermis. Its aberrant expression has been implicated in a number of keratinizing disorders. We have isolated and sequenced a cDNA clone to mouse filaggrin, of 1.479 kilobase pairs, which represents less than 10% of the full-length mRNA estimated by Northern blot analysis to be 17 kilobases long. The cDNA clone delineates a 744-base pair repeat. This encodes a protein of 248 amino acids or 26,330 Da, which is almost identical to the known properties of mouse filaggrin in size, amino acid composition, and charge. Total mouse genomic DNA and the filaggrin gene isolated from a cosmid library were found to contain a super-stoichiometric repeat of the same size. These data support the hypothesis (Haydock, P.V., and Dale, B.A. (1986) J. Biol. Chem. 261, 12520-12525) that filaggrin is initially synthesized as a polyprotein precursor containing many tandem copies. However, our data suggest that the repeating filaggrin units of the precursor are not separated by "large linker" peptides as suggested by these authors. In situ hybridization was used to show that the filaggrin precursor mRNA is located precisely over the granular layer of the epidermis, indicating that expression of this gene is regulated at the transcriptional level as for the differentiation-specific keratin 1 protein. These probes will now permit detailed studies on the regulation of expression of the filaggrin gene.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) 503458.
To whom all correspondence should be addressed: Bldg. 10, Room 12N238, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892.
The process of normal growth and development in mammalian epidermis involves the migration of cells committed to terminal differentiation from the proliferative basal layer into the suprabasal layers. Concomitantly, certain sets of specific proteins are synthesized that eventually comprise the bulk of fully differentiated stratum corneum cells. Among these are a defined pair of keratin intermediate filament chains (designated keratin 1 and keratin 10) (1-4), a nonfilamentous matrix protein termed filaggrin (5-9), and cell envelope proteins including involucrin (10). Numerous perturbations of genetic and environmental origin are known to disrupt this orderly process and are manifested in several well-known disorders of keratinization. In certain disorders of human and mouse epidermis, the amount of keratohyalin, which is in part composed of the filaggrin precursor, is either diminished or excessive (11-13). However, the molecular bases of these disorders remain unknown.
Of these differentiation-specific proteins, filaggrin appears to have a very unusual biosynthetic history. It is initially synthesized as a large, insoluble, highly phosphorylated precursor which is deposited as keratohyalin granules (14-16). During terminal differentiation, this precursor is dephosphorylated and proteolytically cleaved to a smaller functional filaggrin molecule that can aggregate keratin intermediate filaments and promote disulfide-bond formation amongst the intermediate filaments in vitro and presumably in vivo as well (8, 16-18). Based on peptide mapping studies, it has been suggested that the filaggrin precursor has a repeating structure, consisting of several filaggrin repeats arranged in tandem or separated by linker peptides that are excised during processing (19-22). Accordingly, there are numerous steps at which filaggrin biosynthesis could be altered, perhaps promoting disease.
In this paper, we have isolated and characterized a cDNA clone encoding a portion of the mouse filaggrin precursor containing one full filaggrin repeat, and have used it to characterize the repeating structure of the gene. We have also employed in situ hybridization in an initial approach to study the regulation of expression of this gene.

EXPERIMENTAL PROCEDURES
Construction of cDNA Library-Newborn mouse (BALB/c) epidermal RNA was fractionated on a 5-30% sucrose density gradient as described (23). Fractions corresponding to species larger than the 28 S ribosomal RNA peak were collected and enriched for poly(A)+ RNA by chromatography on oligo(dT)-cellulose (2), from which a cDNA library was constructed in pBR322 (2).
Isolation and Characterization of Mouse Cosmid Genomic Clones-A mouse cosmid library (D2 cos NIH) constructed with the vector pSV13 containing inserts of genomic DNA from mouse 3T3 cells averaging 30-40 kbp¹ (24, 25) was a generous gift of Dr. Michael Gottesman (National Institutes of Health) and was utilized as described (24-26). First-round screening was done on filters containing about 3 × 10⁴ bacterial colonies/15-cm filter with nick-translated probes generated from coding and 3'-noncoding portions of a cDNA clone, and selected colonies were purified to homogeneity (26, 27). The cosmid DNA was then purified for characterization as described (28).
¹ The abbreviations used are: kbp, kilobase pair; kb, kilobase; bp, base pair.
Molecular Biology Techniques-RNA (Northern) and DNA (Southern) blots were done as before (2, 29). Synthetic oligonucleotides were labeled at their 5'-ends (29). In some experiments they were also used as primers to extend mouse epidermal poly(A)+ RNA with reverse transcriptase (28, 30).
High molecular weight genomic DNA from mouse 3T3 cells was purchased from Oncor Inc. Both cDNA and cosmid clones were used to hybrid select mRNA from mouse epidermal poly(A)+ RNA (2), which was then translated in vitro using a reticulocyte lysate system (Du Pont-New England Nuclear) and a mixture containing 100 µCi each of 3H-labeled arginine, histidine, and glycine (23). Translation products were immunoprecipitated with a rabbit polyclonal antibody to mouse filaggrin which was kindly supplied by Dr. Robert Goldman (Northwestern University). Appropriate restriction enzyme fragments of the cDNA and cosmid clones were subcloned into M13mp18 or mp19 vectors (Bethesda Research Laboratories) for DNA sequencing, or pGEM-3 (Promega Biotec) vectors for the construction of sense and antisense riboprobes for in situ hybridization (4, 31). DNA sequencing was performed using Maxam-Gilbert procedures with the modifications reported before (29) and by the dideoxy chain termination method using M13 (32).

RESULTS
Isolation and Characterization of cDNA Clones to Filaggrin Precursor-We have isolated and sequenced three tryptic peptides from mouse filaggrin which are GHQGAHQEQGR, QVHSGVQVEGR, and GHQHQHQR.² Several synthetic oligonucleotides corresponding to these sequences were prepared, and in initial experiments on Northern blots a number of these hybridized strongly to 28 S and 18 S ribosomal RNA. However, a negative strand probe of sequence 3'-GT(C/T)CCNCGGGT(G/A)GT(C/T)CT(C/T)GT, synthesized from the first peptide (identical to one isolated by Resing et al. (21)), also revealed a large species of about 17 kb. We were able to diminish the ribosomal RNA hybridization by primer extension of this oligonucleotide, such that only the major species at 17 kb was detected (Fig. 1). We conclude that this represents the true size of the intact mRNA species for the mouse filaggrin precursor. Other investigators have previously identified a very large, although extensively degraded, mRNA for the filaggrin precursor (22, 23).
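The design of the negative strand probe can be retraced from the genetic code. The sketch below back-translates part of the first tryptic peptide into degenerate codons and complements them. The function names are illustrative, and the choice of residues 3-9 (QGAHQEQ) as the window covered by the 20-mer is an inference from the printed probe, not something the text states.

```python
# IUPAC degenerate codons (standard genetic code) for the residues needed here.
# Arginine is omitted on purpose: its six codons do not reduce to one clean triplet.
DEGENERATE_CODON = {"G": "GGN", "H": "CAY", "Q": "CAR", "A": "GCN", "E": "GAR"}

# Base-by-base complement, including the IUPAC ambiguity codes used above
# (R = purine, Y = pyrimidine, N = any base).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N", "R": "Y", "Y": "R"}


def sense_strand(peptide):
    """Back-translate a peptide into a degenerate sense sequence, written 5'->3'."""
    return "".join(DEGENERATE_CODON[residue] for residue in peptide)


def antisense_3_to_5(sense):
    """Complement the sense strand position by position, so the result reads 3'->5'."""
    return "".join(COMPLEMENT[base] for base in sense)


# Assumed window: residues 3-9 of GHQGAHQEQGR; the 20-mer ends two bases
# into the last glutamine codon.
sub_peptide = "GHQGAHQEQGR"[2:9]  # "QGAHQEQ"
probe = antisense_3_to_5(sense_strand(sub_peptide))[:20]
print(probe)
```

Under these assumptions the degenerate positions (Y, R, N) fall where the printed probe shows alternative bases, while the authors evidently fixed some positions (for example CGG rather than CGN) to limit the complexity of the oligonucleotide mixture.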
This extended oligonucleotide was then used to probe the cDNA library constructed from the size-fractionated RNA, from which many clones encoding filaggrin were identified. The longest clone was pFM3-2, and its nucleotide and deduced amino acid sequences are shown in Fig. 2. It is 1.479 kbp long and has two polyadenylation signal sequences at the 3'-end, but does not possess a poly(A) tail.
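As a quick consistency check on the deduced amino acid sequence, a 90-nucleotide stretch of the pFM3-2 sequence from Fig. 2 can be translated with the standard genetic code and compared with the residues printed above it. This is a minimal sketch: the `translate` helper is illustrative, and the reading frame is assumed to start at the first base of the excerpt.

```python
# Standard genetic code, built from the conventional TCAG-ordered 64-letter string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Excerpt of the cDNA sequence and the residues printed above it in Fig. 2.
EXCERPT = (
    "GGTGACAGCCAAGTCCATTCTGGAGTCCATGTCGAGCCCAGCGGGGGCAGATCCTCATCT"
    "GCCAACAGGAGGGCTGGTTCCAGCTCTGGC"
)
PRINTED_RESIDUES = "GDSQVHSGVHVEPSGGRSSSANRRAGSSSG"

protein = translate(EXCERPT)
print(protein == PRINTED_RESIDUES)
```

The translated stretch contains QVHSGV, the first six residues of the tryptic peptide QVHSGVQVEGR, which illustrates the close but not perfect agreement between peptide and cDNA sequences.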
This clone possesses a single open reading frame for the first 939 bp, encoding 313 amino acids. Notably, within the coding sequences, the first 195 bp (65 amino acids) share about 80% nucleotide and amino acid sequence homology with the last 195 bp (65 amino acids) of the coding sequences. This thereby delineates a 744-bp unique sequence repeat. This repeat encodes a protein of 248 amino acids with a molecular weight of 26,330, which is close to the value of 25,840 deduced for mouse filaggrin by analytical ultracentrifugation (8). The calculated amino acid composition of this repeat is almost identical to the known amino acid composition of mouse filaggrin and its high molecular weight precursor (Table I). The protein (Fig. 2, residues 1-248) is strongly basic (pI = 11.7), as originally appreciated from its properties (6-8), although there is an unusual tyrosine-rich acidic peptide region (Fig. 2, residues 151-163) of pI ~4.5. Moreover, sequences of peptides previously described for mouse filaggrin (21, 22)² show more than 90% homology to portions of the cDNA sequence (see Fig. 2). Therefore, we suggest that the 248-amino acid repeat constitutes filaggrin.
² P. M. Steinert, unpublished data.
This was probed with the cDNA clone pFM3-2. A single band estimated to be about 17 kb long is seen, together with some breakdown products typically seen with large mRNA species (33). The positions of migration of the 28 S and 18 S ribosomal RNA species are shown. Similar results were obtained when probed with the extended oligonucleotide or the antisense riboprobes generated for in situ hybridization.
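The numbers in this paragraph follow from simple arithmetic on the clone, which the sketch below makes explicit. All inputs are values quoted in the text; the variable names are illustrative, and the mean residue mass is a derived quantity rather than a figure the authors report.

```python
ORF_BP = 939             # open reading frame at the 5'-end of pFM3-2
OVERLAP_BP = 195         # stretch at the end of the ORF that repeats its first 195 bp
CLONE_BP = 1479          # total length of pFM3-2
MRNA_NT = 17000          # mRNA size estimated from the Northern blot
REPEAT_MASS_DA = 26330   # calculated mass of the encoded repeat

orf_residues = ORF_BP // 3                 # amino acids encoded by the ORF
repeat_bp = ORF_BP - OVERLAP_BP            # period of the repeat
repeat_residues = repeat_bp // 3           # amino acids per repeat
noncoding_bp = CLONE_BP - ORF_BP           # 3'-noncoding sequence in the clone
clone_fraction = CLONE_BP / MRNA_NT        # share of the mRNA covered by the clone
mean_residue_mass = REPEAT_MASS_DA / repeat_residues

print(orf_residues, repeat_bp, repeat_residues, noncoding_bp)
print(f"{clone_fraction:.1%} of the mRNA, {mean_residue_mass:.1f} Da per residue")
```

The clone therefore covers a little under 9% of the estimated mRNA, consistent with the statement that it represents less than 10% of the full-length message.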
Two probes were generated from the cDNA clone to study the filaggrin precursor gene further in both total mouse genomic DNA and isolated cosmid clones. The coding probe consisted of the first 180 bp from the 5'-end to a PstI site; the 3'-noncoding probe consisted of a 179-bp fragment from a DraI site to the 3'-end (Fig. 2).
Southern Blot Analysis of Mouse Genomic DNA-In Fig. 3, total mouse genomic DNA was cut with eight restriction enzymes, four of which cut the 744-bp repeat of the cDNA clone and four of which do not, separated on a 0.6% agarose gel, transferred to Nytran (Schleicher & Schuell), and examined with the two probes. Using the four enzymes BamHI, BanI, PstI, and XmaI, which cut the 744-bp repeat of the cDNA clone, the coding probe detects bands at 1.5 to 15 kbp, depending on the enzyme utilized, as well as a prominent super-stoichiometric band at about 750 bp in each case (Fig. 3, lanes 1-4). However, the 3'-noncoding probe recognized only the bands at 1.5 to 15 kbp in size (Fig. 3, lanes 9-12). In contrast, when genomic DNA was cut with the four enzymes EcoRI, HindIII, KpnI, and RsaI, which do not cut the 744-bp repeat of the cDNA clone, a single much larger band is identified in each case with both probes, ranging in size from about 19 to 50 kbp (Fig. 3, lanes 5-8 and lanes 13-16). We conclude that the 19-50-kbp bands represent genomic DNA fragments containing the near- or full-length filaggrin precursor gene. The super-stoichiometric 750-bp band represents repeating filaggrin units along the gene. The 1.5-15-kbp bands seen with both the coding and 3'-noncoding probes contain the 3'-end of the gene and flanking sequences.
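The reasoning behind the super-stoichiometric band can be illustrated with a toy digest of a tandemly repeated gene. In this sketch the number of repeats, the flank lengths, and the position of the site within each repeat are hypothetical values chosen only for illustration; only the 744-bp repeat length comes from the data.

```python
from collections import Counter

REPEAT_BP = 744        # repeat length from the cDNA sequence
N_REPEATS = 15         # hypothetical number of tandem repeats
FLANK_5_BP = 3000      # hypothetical 5'-flanking DNA up to the next site
FLANK_3_BP = 2000      # hypothetical 3'-noncoding plus flanking DNA
SITE_OFFSET = 180      # hypothetical position of the site within each repeat


def digest(length, cut_sites):
    """Return fragment lengths for a linear molecule cut at the given positions."""
    bounds = [0] + sorted(cut_sites) + [length]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


total_bp = FLANK_5_BP + N_REPEATS * REPEAT_BP + FLANK_3_BP

# An enzyme that cuts once per repeat releases one repeat-sized fragment
# for every pair of adjacent sites, plus two junction fragments.
once_per_repeat = [FLANK_5_BP + i * REPEAT_BP + SITE_OFFSET for i in range(N_REPEATS)]
fragments = Counter(digest(total_bp, once_per_repeat))

# An enzyme with no site in the repeat leaves the whole region as one fragment.
uncut = digest(total_bp, [])

print(fragments)
print(uncut)
```

In this model the repeat-sized fragment is present in many copies per gene, which is why it appears as a band far more intense than the single-copy junction fragments, and why enzymes without a site in the repeat yield one large band.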
FIG. 2. Nucleotide and deduced amino acid sequence of the cDNA clone pFM3-2. The likely polyadenylation signal sequences, the PstI and DraI restriction enzyme sites used to make the coding and 3'-noncoding probes, and the BamHI, BanI, and XmaI repeating restriction enzyme sites are underlined. The deduced amino acid sequence is given above the nucleotide sequence, using the single letter code. The sequence identified by our synthetic oligonucleotide is indicated (---).

Isolation and Characterization of Cosmid Clones Containing the Mouse Filaggrin Precursor Gene-In order to examine this repeating structure in more detail, we isolated a large genomic fragment containing filaggrin precursor gene sequences from a cosmid library. Of the 12 cosmid clones selected by both the coding and 3'-noncoding probes, all had the same restriction enzyme profile. One of these clones, cFM6-1A2, was studied further with the same enzymes used above. The four enzymes which cut the 744-bp repeat of the cDNA clone all yielded a super-stoichiometric band at about 750 bp (Fig. 4, lanes 3-6).
When examined with the coding probe, bands at 1.5-15 kbp as well as the super-stoichiometric band at 750 bp were recognized, exactly as seen above in total genomic DNA (Fig. 4, lanes 12-14). The 3'-noncoding probe recognized only the 1.5-15 kbp bands (Fig. 4, lanes 19-22). The four enzymes that do not cut the 744-bp repeat of the cDNA clone all yielded an array of larger DNA fragments (Fig. 4, lanes 7-10), only one of which was recognized by both the coding and 3'-noncoding probes (Fig. 4, lanes 15-18 and 23-26).

for HindIII, 25 kbp versus 18 kbp for KpnI, and 19 kbp versus 15 kbp for RsaI. Therefore it is likely that the cosmid clone does not contain the full-length gene, having sequences omitted from the 5'-end. Nevertheless, from the size of the RsaI fragment recognized by both probes within the cosmid clone (about 15 kbp, Fig. 4, lanes 18 and 26), it can be estimated that the cosmid could contain as many as 15 filaggrin repeats as well as the entire 3'-end of the gene.
Translation of Hybrid-selected mRNA-The cDNA clone and cosmid clone were used to hybrid-select poly(A)+ RNA, which was then translated in vitro. Fig. 5 shows translation of both keratins and high molecular weight proteins, the latter of which could be specifically immunoprecipitated by a mouse anti-filaggrin antibody. The cosmid clone was more efficient at selecting mRNA for the high molecular weight filaggrin precursor, probably because it contains a much longer coding region for hybridization.
the endogenous translation artifact (using 1 µl of the 25-µl translation reaction). Lanes 5-7, same but translation products immunoprecipitated (using 10 µl of translation reaction).
In Situ Hybridization-In order to identify the location of the mouse filaggrin precursor mRNA within epidermal tissue, we employed in situ hybridization with the coding and 3'-noncoding probes defined above. Both probes decorated the upper living layers of the epidermis (Fig. 6, a and b), coinciding precisely with the granular layer and location of the keratohyalin granules of the epidermis. The coding probe yielded a much more intense signal than the 3'-noncoding probe. Fig. 6c shows a negative control; a sense strand to the 3'-noncoding probe yielded only a weak but variable background signal. A sense strand to the coding region, however, yielded a stronger signal uniformly over the section, probably due to binding to ribosomal RNA as verified by Northern blots (data not shown). In Fig. 6d is shown a second control using a probe encoding the mouse keratin 1 chain. It localized predominantly to the spinous layers, diminishing to only a weak signal in the granular layers of the epidermis.

DISCUSSION
We report here the isolation and sequence of a cDNA clone for mouse epidermal filaggrin that comprises 1.479 kbp of the approximately 17 kb of the mRNA. The clone contains 313 amino acids of coding sequences which delineate a unique 248-amino acid (744 bp) repeating sequence. The properties of this sequence are identical to those expected or known for mouse filaggrin, including molecular weight, amino acid composition, charge, and, moreover, close similarity to a number of peptide sequences isolated by us and others. In addition, both the cDNA clone and the genomic cosmid DNA clone hybrid selected an mRNA species which was translated in vitro into a large protein product that could be immunoprecipitated with an anti-filaggrin antibody. Using portions of the cDNA clone as probes, we have identified a super-stoichiometric repeat of about 750 bp in total mouse genomic DNA and in the isolated cosmid clone bearing a portion of the filaggrin gene. We conclude that this super-stoichiometric 750-bp band and the 744-bp repeat in the cDNA sequence represent similarly sized repeating elements.
Only the granular layers are decorated. The more intense signal in a is consistent with the presence of several repeats along the mRNA of the filaggrin precursor. c, a control using a sense riboprobe (having the same polarity as the mRNA) to the 3'-noncoding probe. d, a probe consisting of a 900-bp fragment of the mouse keratin 1 cDNA clone (4) decorates the suprabasal layers below the granular layer.
All of these observations point to the conclusion that filaggrin is initially translated from a large mRNA species encoding a polyprotein precursor which contains multiple filaggrin repeats. The data permit the construction of a model for this precursor gene using the PstI digestion data as an example (Fig. 7). We predict that a 2-kbp fragment extends from the PstI site identified in the coding portion of the cDNA clone through the 3'-noncoding region into DNA sequences flanking the gene. The super-stoichiometric 744-bp band indicates that the gene consists of a series of 744-bp repeats. The enzymes BamHI, BanI, PstI, and XmaI, which cut the 744-bp repeat sequence of the cDNA clone once, each generate repeats of this size, although the fragments produced by different enzymes obviously cover overlapping sequences. Other enzymes such as HhaI and DdeI that cut the cDNA sequence more than once generated many super-stoichiometric fragments, totaling about 744 bp in size (data not shown). Therefore, these data indicate that the repeats are conserved in size and nucleic acid sequence, as certain enzymes either cut the gene into repeating fragments of 744 bp or do not cut it at all. As the cosmid clone does not contain the 5'-end of the gene, we do not yet know how many repeats there are in the full-length gene. However, restriction analyses with RsaI (Figs. 3 and 4) suggest that the entire coding region of the gene lies within a genomic fragment of about 19 kbp. This is close to the size of the mRNA transcribed from the gene, estimated to be about 17 kb (Fig. 1). If there are in fact no introns, the entire coding region within the 19-kbp RsaI fragment could contain as many as 24 filaggrin repeats.
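The upper bound on the number of repeats can be approximated by dividing the available length by the repeat length. The text does not show how the figure of 24 was reached; the sketch below is one hedged way to get close to it, assuming (our assumption, not the authors') that the 540 bp of 3'-noncoding sequence found in pFM3-2 is subtracted first and that there are no introns.

```python
REPEAT_BP = 744
RSAI_FRAGMENT_BP = 19000            # genomic RsaI fragment recognized by both probes
MRNA_NT = 17000                     # mRNA size from the Northern blot
NONCODING_3PRIME_BP = 1479 - 939    # 3'-noncoding sequence present in pFM3-2

# Whole repeats that fit if the entire fragment were repeat sequence.
naive_bound = RSAI_FRAGMENT_BP // REPEAT_BP

# Whole repeats that fit once the known 3'-noncoding sequence is set aside.
bound_minus_noncoding = (RSAI_FRAGMENT_BP - NONCODING_3PRIME_BP) // REPEAT_BP

# The same estimate starting from the mRNA length instead of the genomic fragment.
bound_from_mrna = (MRNA_NT - NONCODING_3PRIME_BP) // REPEAT_BP

print(naive_bound, bound_minus_noncoding, bound_from_mrna)
```

Both the genomic and the mRNA-based estimates land in the low twenties, so the exact count depends mainly on how much 5'- and 3'-noncoding sequence the transcript carries.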
These data and our model for the repeating structure of the mouse filaggrin gene are consistent with those published in a recent study based on genomic blots of rat and mouse DNA (22), but differ in certain details largely because we present here for the first time complete sequence information of a filaggrin repeat. Published models have proposed that the mRNA encodes a precursor consisting of a series of repeating filaggrin units separated by a linker peptide between each (21, 22) or every second or third (20) repeat. A genomic mapping study estimated this repeat to be 850 bp long (22). Peptide mapping and pulse-chase experiments suggested that the repeating unit was about 267 amino acids long, of which approximately 219 consisted of filaggrin and 45-50 represented the linker (21). It was further suggested that the linker peptides were tyrosine-rich, since certain tyrosine-containing peptides identified in the high molecular weight precursor could not be found in isolated filaggrin (21). These authors thus proposed that the precursor is processed into filaggrin by excision of large tyrosine-rich linker peptides. Our blotting and sequence data have revealed a repeating sequence of 744 bp encoding 248 amino acids. Thus any linker must be included within each and every repeating unit. Since the 248-amino acid repeat encodes a protein almost identical in size to the known size of mouse filaggrin (8), we calculate that the linker must be considerably shorter than 45-50 amino acids, if in fact it exists at all. Therefore, our data point toward alternative hypotheses for the processing of the filaggrin precursor: namely that the high molecular weight precursor protein is spliced at defined peptide bond(s), or that only a short linker peptide is excised. It is possible that the unusual acidic tyrosine-rich peptide region may serve as a signal sequence for processing, whether or not it is excised.
Interestingly, most variations in amino acid sequences observed between our sequence deduced from the cDNA clone and those of published peptides (21, 22) exist in the vicinity of this acidic region.
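The argument that any linker must be short can be put in rough numbers. The sketch below converts the measured filaggrin mass into an approximate residue count using the mean residue mass of the repeat. This assumes a uniform residue mass across the repeat and ignores the experimental error of the ultracentrifugation value, so the result is indicative only.

```python
REPEAT_RESIDUES = 248       # residues encoded by one 744-bp repeat
REPEAT_MASS_DA = 26330      # calculated mass of the repeat
FILAGGRIN_MASS_DA = 25840   # mouse filaggrin by analytical ultracentrifugation (8)

# Earlier model (21): about 219 filaggrin residues plus a 45-50 residue linker.
earlier_model_unit = (219 + 45, 219 + 50)

# Assumption: every residue contributes the same mean mass.
mean_residue_mass = REPEAT_MASS_DA / REPEAT_RESIDUES
filaggrin_residues = FILAGGRIN_MASS_DA / mean_residue_mass
implied_linker = REPEAT_RESIDUES - filaggrin_residues

print(f"filaggrin is roughly {filaggrin_residues:.0f} residues")
print(f"implied linker is roughly {implied_linker:.1f} residues")
print(f"earlier model repeat unit: {earlier_model_unit[0]}-{earlier_model_unit[1]} residues")
```

On these assumptions only about five residues per repeat are unaccounted for, an order of magnitude fewer than the 45-50 residues proposed earlier.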
While the present data define for the first time the complete sequence of a repeating filaggrin unit, the actual amino- and carboxyl-terminal sequences of it remain unknown. If the acidic tyrosine-rich region defines the ends of filaggrin, this means that the filaggrin unit encoded at the 3'-end of the mRNA is curiously truncated. Since there are several filaggrin repeats along the precursor, an incomplete filaggrin unit at the end may not be biologically important. Characterization of the structure of the entire gene should resolve these remaining questions. The insertion into expression vectors of cDNA and genomic sequences having repeating sequences containing selective deletions should aid in addressing these questions. In addition, the generation of antibodies to defined peptide regions of the 248-amino acid repeat will aid in the study of the processing of the precursor protein to filaggrin.
We have also used in situ hybridization to study the expression of the filaggrin precursor gene. Our data reveal that the mRNA is located precisely over the granular layer, consistent with the location of the keratohyalin granules in the epidermis. The coding probe yielded a much more intense signal than the 3'-noncoding probe since, as expected from our model of the structure of the gene (Fig. 7), the coding region is repeated many times along the mRNA. In addition, the variability of the signal with the 3'-noncoding probe suggests that the 3'-end of the filaggrin mRNA may be especially susceptible to degradation, as seen for other mRNA systems (34). A keratin 1 control was employed to show that the mRNAs of the two species are predominantly located in different levels of the epidermis. Thus, while it is well known that this keratin and filaggrin are both markers of mammalian epidermal differentiation (35), these data show that the two proteins are not coordinately expressed, since the mRNA for the filaggrin precursor is expressed at a much later stage of differentiation.
Moreover, these in situ hybridization data also establish that the gene for the filaggrin precursor is regulated at the transcriptional level since its mRNA is stably expressed only when the cells migrate into the granular layers. However, we cannot rule out the possibility that the filaggrin precursor mRNA is very rapidly turned over in the lower differentiating cell layers of the epidermis and becomes more stable only in the granular layers.
The use of these in situ hybridization probes will now permit detailed studies on the expression of the filaggrin precursor in both normal and abnormally keratinizing epidermis, especially in disorders where the degree of filaggrin expression appears to be altered (11, 12).