Mutational Tail Loss Is an Evolutionary Mechanism for Liberating Marapsins and Other Type I Serine Proteases from Transmembrane Anchors*

Background: Vertebrate marapsins can be either type I transmembrane proteases or unanchored.
Results: Point mutations liberated marapsins from transmembrane peptides independently in human-related primates and other mammalian clades. Soluble marapsins are active and inhibitor-resistant.
Conclusion: Mutational tail loss transformed transmembrane marapsins and related proteins into soluble proteases.
Significance: These findings suggest a general evolutionary mechanism for evolving proteases with new properties and functions.

Human and mouse marapsins (Prss27) are serine proteases preferentially expressed by stratified squamous epithelia. However, mouse marapsin contains a transmembrane anchor absent from the human enzyme. To gain insights into physical forms, activities, inhibition, and roles in epithelial differentiation, we traced tail loss in human marapsin to a nonsense mutation in an ancestral ape, compared substrate preferences of mouse and human marapsins with those of the epithelial peptidase prostasin, designed a selective substrate and inhibitor, and generated Prss27-null mice. Phylogenetic analysis predicts that most marapsins are transmembrane proteins. However, nonsense mutations caused membrane anchor loss in three clades: human/bonobo/chimpanzee, guinea pig/degu/tuco-tuco/mole rat, and cattle/yak. Most marapsin-related proteases, including prostasins, are type I transmembrane proteins, but the closest relatives (prosemins) are not. Soluble mouse and human marapsins are tryptic with subsite preferences distinct from those of prostasin, lack general proteinase activity, and unlike prostasins resist antiproteases, including leupeptin, aprotinin, serpins, and α2-macroglobulin, suggesting the presence of non-canonical active sites. Prss27-null mice develop normally in barrier conditions and are fertile without overt epithelial defects, indicating that marapsin does not play critical, non-redundant roles in development, reproduction, or epithelial differentiation. In conclusion, marapsins are conserved, inhibitor-resistant, tryptic peptidases. Although marapsins are type I transmembrane proteins in their typical form, they mutated independently into anchorless forms in several mammalian clades, including one involving humans. Similar pathways appear to have been traversed by prosemins and tryptases, suggesting that mutational tail loss is an important means of evolving new functions of tryptic serine proteases from transmembrane ancestors.

The human degradome is estimated to include 176 serine proteases and serine protease homologs (1). Of these, the largest subset belongs to the S1 family of trypsin-like proteins, including familiar proteases associated with digestion of food; with clotting and fibrinolytic and complement cascades; and with neutrophil, lymphocyte, and mast cell granules and host defense against infection. S1 serine proteases serve wide ranging functions and take many forms. The majority is secreted and soluble. A minority is membrane-associated via peptide or glycosylphosphatidylinositol (GPI) anchors. The membrane group contains proteins with a transmembrane anchor that is C- or N-terminal to the catalytic domain (types I and II, respectively) (2,3). Although genes encoding type I transmembrane proteins comprise but a small fraction of the serine protease degradome, phylogenetic analysis suggests that at least one group of extant soluble serine proteases (namely, mast cell α/β/δ-tryptases) evolved from type I proteins related to γ-tryptases and prostasins (4-9). However, the putative conversion lacks a mechanism and has not been observed within a specific clade of mammalian proteases. Seeking such evidence, we focused on marapsin (also known as pancreasin and PRSS27), which belongs to a group of serine proteases featuring members that can be either soluble or type I transmembrane proteins but are otherwise closely related (e.g. human versus mouse (10)) with highly conserved catalytic domains in mammals and some other vertebrates (7,11). Marapsins, therefore, provide an opportunity to test the hypothesis that some soluble serine proteases evolved from membrane-anchored proteases and to explore genetic mechanisms of the conversion.
Characterization of marapsins has been limited. Insights into enzymatic activity derive from studies of recombinant human marapsin/pancreasin, which is tryptic, preferring P1 Arg as assessed with a small panel of peptidyl 4-nitroanilides (10). No general protease activity has been identified for the human enzyme, which has the characteristic (unusual for a tryptic serine protease) of resisting inactivation by broad spectrum inhibitors such as aprotinin and soybean trypsin inhibitor. This suggests that the active site is restricted compared with classical trypsin-like serine proteases. The observed conservation of marapsin catalytic domains led us to hypothesize that substrate selectivity and inhibitor resistance are general characteristics of marapsins. Both human and mouse marapsins are modified by N-glycosylation (10,12). Additionally, the mouse enzyme contains a 34-residue, C-terminal hydrophobic extension (absent in the human enzyme) predicted to span cell membranes (10). Mouse marapsin expressed in mammalian cells is membrane-associated but shed by phosphatidylinositol-specific phospholipase C (3), suggesting that the transmembrane peptide is swapped for a GPI anchor (12). In agreement with hydropathy analysis as well as with predictions that the human enzyme lacks a C-terminal anchor, native human marapsin is secreted from an esophageal cell line (13). The marapsin gene is expressed by a variety of human, mouse, and opossum tissues (9,10,12,13). In humans and mice, it is especially abundant in stratified squamous epithelia of cornea, larynx, esophagus, and cervix (13,14) and is induced in wounded and psoriatic skin (13,14). These patterns of expression in combination with evidence of gene conservation in mammals encouraged speculation that marapsin serves critical functions, such as epithelial differentiation.
To test hypotheses concerning origination of soluble marapsins from ancestral transmembrane proteases, clade-specific enzymatic properties of marapsins, and importance to epithelial development, we analyzed vertebrate marapsin and related protein sequences, compared mouse and human marapsin extended substrate preferences with those of prostasin, and characterized a marapsin-deficient mouse.

EXPERIMENTAL PROCEDURES
Data Mining-Full amino acid sequences of marapsins not previously published or annotated were obtained via BLAST searches of high throughput genome sequence and whole genome shotgun databases at the National Center for Biotechnology Information. Human and mouse marapsin genes and cDNAs (10) were used as query sequences. Previously unreported amino acid sequences of marapsins and marapsin-like proteases were predicted from genomic DNA using existing human and mouse gene structures as a template following standard rules for placement of intron-exon boundaries as detailed in prior publications involving phylogenetic studies of related serine proteases (5,7,8,10). Resulting protein sequences were aligned using Geneious software (Biomatters, Auckland, New Zealand). Candidate marapsins were considered to be intact if there were no catalytic domain insertions or deletions relative to published, cDNA-validated human and mouse sequence (3,10,13) and if tryptic serine protease catalytic and specificity triad residues were conserved.
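The intactness criterion above can be expressed as a short check on one row of a pairwise alignment. The following is a minimal sketch only, not the procedure used in this study: the function name `is_intact`, the toy sequences, and the column positions are hypothetical, and the rows are assumed to cover only the catalytic domain.

```python
def is_intact(ref_row, cand_row, key_columns):
    """Return True if a candidate has no indels and keeps the key residues.

    ref_row, cand_row: equal-length rows of a pairwise alignment restricted
        to the catalytic domain ('-' marks a gap).
    key_columns: dict mapping 0-based alignment column -> required residue
        (for example, catalytic and specificity triad positions).
    """
    if len(ref_row) != len(cand_row):
        raise ValueError("alignment rows must have equal length")
    # A gap in either row means an insertion or deletion relative to the reference.
    if "-" in ref_row or "-" in cand_row:
        return False
    # Every key residue must be present at its alignment column.
    return all(cand_row[col] == res for col, res in key_columns.items())


# Toy rows (made up for illustration; not marapsin sequence).
reference = "IVGGHAADSGG"
key = {4: "H", 7: "D", 8: "S"}
print(is_intact(reference, "MVGGHSADSGG", key))  # True: no gaps, key residues kept
print(is_intact(reference, "MVGGH-ADSGG", key))  # False: deletion
print(is_intact(reference, "MVGGHAADAGG", key))  # False: key Ser replaced
```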
Sequencing of Marapsin Genes-Gorilla genomic DNA from cultured fibroblasts was obtained from the Primate Cell Repositories of the Coriell Institute for Medical Research (Camden, NJ). DNA encoding the marapsin gene sixth exon was amplified by PCR using the primer pair 5′-GGGTTCTTGATGAGGAAGTCCGTTGAG and 5′-AGCTGGCACACAGGCTGGGTTTTTATT. Amplimers were cloned into pCR2.1-TOPO (Invitrogen) and sequenced.
Identification of Truncation Mutations-Transcript alterations leading to tail loss in translated sequences of vertebrate marapsins were identified by alignment of nucleotide and predicted amino acid sequence bracketing the open reading frame of marapsin gene exon 6. Molecular time estimates of truncation mutations were calculated using TimeTree.
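To illustrate the kind of comparison described above, the sketch below counts in-frame codons before the first stop codon in each sequence and reports sequences that terminate earlier than a chosen reference. It is an illustration under stated assumptions rather than the analysis pipeline used here: the function names and the short nucleotide strings are hypothetical, and each sequence is assumed to start in frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(orf_nt):
    """Count in-frame sense codons before the first stop codon (or sequence end)."""
    orf_nt = orf_nt.upper()
    count = 0
    for i in range(0, len(orf_nt) - 2, 3):
        if orf_nt[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


def find_truncations(orfs, reference):
    """Return {name: codons lost} for sequences that stop before the reference does."""
    ref_len = codons_before_stop(orfs[reference])
    return {
        name: ref_len - codons_before_stop(seq)
        for name, seq in orfs.items()
        if codons_before_stop(seq) < ref_len
    }


# Toy in-frame sequences (made up; not marapsin exon 6).
toy = {
    "anchored": "ATGCGACTGGTTTAA",   # ATG CGA CTG GTT | TAA
    "truncated": "ATGTGACTGGTTTAA",  # ATG | TGA (premature stop)
}
print(find_truncations(toy, "anchored"))  # {'truncated': 3}
```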
Prediction of Post-translational Modifications-Sites containing sequence Asn-Xaa-Ser/Thr (where Xaa is any amino acid) were considered potential sites of N-glycosylation. Candidate C-terminal transmembrane segments were identified by the Dense Alignment Surface algorithm (15) as applied to marapsins with open reading frame extensions relative to the human catalytic domain. Possible sites of GPI anchor attachment in marapsins with predicted C-terminal transmembrane peptide segments were identified using BigPI (16).
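The N-glycosylation rule stated above (Asn-Xaa-Ser/Thr, with Xaa being any amino acid) can be sketched as a regular expression scan. The function name `n_glyc_sites` and the test peptide are hypothetical. Many predictors additionally exclude Pro at the Xaa position; the rule as stated here does not, so this sketch does not either.

```python
import re

# A lookahead is used so that overlapping motifs (e.g. N-N-T-S) are all reported.
SEQUON = re.compile(r"(?=N[A-Z][ST])")


def n_glyc_sites(protein):
    """Return 1-based positions of Asn residues that begin an Asn-Xaa-Ser/Thr motif."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Toy peptide (made up for illustration).
print(n_glyc_sites("MNGTAANPSQNNTS"))  # [2, 7, 11, 12]
```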
Phylogenetic Analysis-To probe cladistic relationships between catalytic domains, the analysis was restricted to sequence starting from Met1 of the mature catalytic domain to avoid potential errors or biases arising from miscalls of prepro sequences that lie on small exons (10). On the C-terminal end, aligned sequences were cut off at conserved His239 of marapsin (or the equivalent residue in related proteases) to avoid biasing effects of length variations in protease C termini; not all proteases include a membrane-spanning domain, which can extend the sequence by 30 or more residues. A rooted tree was generated by unweighted pair group with arithmetic mean analysis using methods described previously (7).
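For readers unfamiliar with unweighted pair group with arithmetic mean (UPGMA) clustering, the following is a minimal sketch of the general technique, not the software or settings used in this study. It assumes a simple mismatch fraction as the distance measure, returns topology only (no branch lengths), and omits bootstrap resampling; the function names and four toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatched positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster aligned sequences by UPGMA and return the topology as a Newick string."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Distance from the merged cluster to each remaining cluster is the
        # average over all leaf pairs, i.e. weighted by cluster size.
        for other in sizes:
            if other in pair:
                continue
            d_a = dist[frozenset((a, other))]
            d_b = dist[frozenset((b, other))]
            dist[frozenset((merged, other))] = (
                d_a * sizes[a] + d_b * sizes[b]
            ) / (sizes[a] + sizes[b])
        # Drop all distances that involve the two clusters just merged.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# Toy alignment (made up for illustration).
toy = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "AAAATTTT", "D": "TTTTTTTT"}
print(upgma(toy))  # (((A,B),C),D);
```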
Recombinant Expression of Soluble Proteases-Mouse marapsin was expressed in CHO cells as a soluble protein comprising residues Met1-Gln292 with the native hydrophobic C-terminal transmembrane peptide replaced by a His8 epitope. This activated enzyme, which contained the native, disulfide-linked propeptide, was purified from supernatants of conditioned medium by nickel affinity chromatography (13). Recombinant full-length human marapsin (Met1-Lys290-Ala-Ala-Ala-His8) was expressed and purified similarly in its activated form (13). Combinatorial substrate profiling studies used mouse myeloma cell-expressed recombinant mouse marapsin (R&D Systems, Minneapolis, MN) comprising residues Ala23-Thr290 with the native hydrophobic C-terminal transmembrane peptide replaced by a His6 epitope. Met1-Gln292-His8 mouse marapsin was subjected to PAGE (4-12% NuPAGE bis-Tris gel with MES-SDS buffer, Invitrogen) under reducing (50 mM dithiothreitol) and non-reducing conditions followed by Coomassie Blue staining. Recombinant soluble mouse prostasin with C-terminal transmembrane sequence deleted at Ser313 was produced in Escherichia coli and activated as described (17).
Mapping of Protease Subsites by Profiling with Combinatorial Fluorogenic Tetrapeptide Library-A positional scanning synthetic combinatorial library approach (18) as applied to other serine proteases (19,20) was used to identify residues preferred by recombinant soluble mouse and human marapsins in each of four positions (P1-P4) among fluorogenic tetrapeptides. A similar approach was used for mouse prostasin (170 nM). Because of limited recombinant material and prior work with mouse and human prostasins showing tryptic activity and preference for P1 Arg, this work tested mouse prostasin using libraries with Arg fixed in the P1 position. Recombinant marapsins were studied at 300 nM in 50 mM Tris, 250 mM NaCl, 0.05% Brij 35 (pH 8.0) at 25°C.
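Two pieces of arithmetic underlie this profiling approach as described in the Fig. 3 legend: the number of peptides in each sublibrary when one position is fixed and three are varied among 20 residues, and normalization of raw fluorescence to the most preferred residue at a subsite. The sketch below illustrates both. The function names and the example readings are hypothetical and are not data from this study.

```python
def sublibrary_size(n_residues=20, varied_positions=3):
    """Number of distinct peptides in a mixture when one position is held fixed."""
    return n_residues ** varied_positions


def normalize_profile(rfu_by_residue):
    """Scale raw readings so that the most preferred residue at a subsite equals 1.0."""
    top = max(rfu_by_residue.values())
    if top <= 0:
        raise ValueError("at least one reading must be positive")
    return {aa: value / top for aa, value in rfu_by_residue.items()}


print(sublibrary_size())  # 8000

# Made-up relative fluorescence readings for three P1 residues.
print(normalize_profile({"R": 200.0, "K": 50.0, "A": 0.0}))
# {'R': 1.0, 'K': 0.25, 'A': 0.0}
```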
Design, Generation, and Validation of Custom Substrate and Inhibitor-A colorimetric 4-nitroanilide (4NA) substrate based on residues preferred by mouse marapsin at each subsite as determined using the positional scanning combinatorial library was synthesized by Anaspec (San Jose, CA) along with a potentially selective, peptidic, covalent (chloromethyl ketone) inhibitor. Potency and selectivity of the custom synthesized substrate and inhibitor were assessed in comparisons of recombinant mouse and human marapsins, recombinant mouse prostasin (R&D Systems) and matriptase (R&D Systems), and cattle trypsin (Sigma-Aldrich). Mouse marapsin also was tested against potential natural peptide and protein substrates containing favored sequences based on results of combinatorial screening of tetrapeptide fluorogenic substrates. Proteinaceous and small molecular weight inhibitors selected for the ability to inactivate other tryptic serine proteases were screened using the general tryptic protease substrate butyloxycarbonyl-QAR-4NA (0.5 mM; Bachem Americas, Torrance, CA). These inhibitors included 4-(2-aminoethyl)benzenesulfonyl fluoride, benzamidine, aprotinin, soybean trypsin inhibitor (all from Sigma-Aldrich), leupeptin (Enzo Life Sciences, Plymouth Meeting, PA), and nafamostat mesylate (Santa Cruz Biotechnology, Santa Cruz, CA).
Generation of Prss27−/− Mice-Marapsin (Prss27)-deficient mice were generated in collaboration with Lexicon Pharmaceuticals (The Woodlands, TX) using gene trapping-based insertional mutagenesis (21) and retroviral vectors with splice acceptor sites to target expressed genes in mice. Disrupted genes were identified by 3′ rapid amplification of cDNA ends, and a library of embryonic stem cell clones was created (21). Clone OST34330 contained a disrupted Prss27 gene (deleting exons 2 through 6) as established by DNA blotting of EcoRI-digested genomic DNA using probes corresponding to genomic sequence internal and external to the targeting vector sequence. The internal probe yielded 27.0- and 5.7-kb bands for wild type and targeted genomes, respectively, whereas the external probe yielded 27.0- and 16.7-kb bands, respectively. Targeted ES cell (129/SvEvBrd) clones were microinjected into C57Bl/6 (albino) blastocysts to generate chimeric mice that were crossed with C57Bl/6 mice to generate 50% 129/SvEvBrd, 50% C57Bl/6 progeny. Resulting heterozygous offspring were interbred to produce Prss27−/− mice, which were backcrossed via single nucleotide polymorphism-assisted genome scanning (The Jackson Laboratory, Bar Harbor, ME) into a C57Bl/6J background until congenic. Single nucleotide polymorphism analysis suggested that the Prss27−/− mice are >99% identical to C57Bl/6J+/+ mice. Genotyping was performed on tail lysates via a PCR-based assay using primer pairs specific for wild type (forward, 5′-CAGGTAGGACTTAAGTGTCC and reverse, 5′-CCAGCAGGACCTGGTATATG, generating a 278-bp band) and mutant (forward, 5′-CCTGCCCTGCATCCTTGTATGG and reverse, 5′-GCAGCGCATCGCCTTCTATC, generating a 266-bp band) chromosomes. The PCR was performed for 35 cycles of melting, annealing, and extension at 94, 60, and 72°C, respectively. A marapsin genomic clone in pKOS and targeted ES cell genomic DNA served as wild type (+/+) and heterozygote (+/−) positive controls, respectively.
Immunohistochemical Characterization of Mouse Tissues-Formalin-fixed, paraffin-embedded sections were deparaffinized in xylene and hydrated. Sections were blocked with 10% goat serum for 30 min at room temperature and then stained for 1 h with 0.9 μg/ml monoclonal hamster anti-mouse marapsin antibody 3B2 (13) or with hamster IgG control antibody (BioLegend, San Diego, CA; 0.9 μg/ml). After two washes in PBS, sections were incubated for 30 min with biotinylated goat anti-hamster IgG (Santa Cruz Biotechnology; 1.3 μg/ml). After two further washes, sections were stained by incubation for 5 min with Vectastain ABC Reagents (Vector Laboratories, Burlingame, CA) and metal-enhanced diaminobenzidine (Thermo Fisher Scientific, Waltham, MA). Sections then were counterstained with hematoxylin, dehydrated in a graded series of alcohols, cleared in xylene, and mounted.

Marapsins Diverged from Related Serine Proteases in Premammalian Vertebrates and Are Highly Conserved in Mammals-As shown in the cladogram in Fig. 1, marapsins (as deduced from genomic or cDNA sequence) are present in a wide variety of placental and marsupial mammals and in reptiles. The breadth of representation in mammalian genomes is similar to that of prostasin (7,22), a related enzyme that is essential for embryonic development and postnatal survival in mice (23,24). Marapsin genes are more widespread than those that encode γ-tryptase, which is absent from genomes of dogs and several other mammals and lacks an identified ortholog in non-mammalian vertebrates (7,9). Marapsin catalytic domain sequence conservation is high in comparison with related enzymes like testisin and γ-tryptase as reflected by cladogram tine length, which is proportional to the number of sequence mismatches between a given pair of sequences. High level conservation also is evident from the alignment in Fig. 2, which shows little variation in sequence and even less in length except in the C-terminal tail, which is not part of the standard catalytic domain exemplified by trypsins and β-tryptases (5,25). Gene preservation and high level conservation of protein sequence can indicate biological importance and conserved function. In this regard, it is notable that all of the marapsins (a subset of which are shown in Fig. 2) are intact in that they lack major insertions, deletions, or truncations and possess the "catalytic triad" residues (His57, Asp102, and Ser195 by standard chymotrypsinogen numbering) essential in serine proteases for attacking and hydrolyzing peptide bonds. Each of the sequences also contains the "specificity triad" residues (Asp189, Gly216, and Gly226 by chymotrypsinogen numbering) featured in serine proteases with trypsin-like primary specificity for basic amino acids at the site of cleavage (26-28). These triad residues line the pocket that accommodates the basic side chain of lysine or arginine substrates.

FIGURE 1. Cladogram. This rooted tree probing relationships between vertebrate marapsins and related serine proteases was generated by unweighted pair group with arithmetic mean analysis of aligned protein sequences. To allow comparison of proteins with signal peptides, propeptides, and C termini of varying length, the alignment was limited to mature catalytic domains with deletion of C-terminal extensions present in a subset of the proteases. Nodes were assigned if predicted by at least 550 of 1000 iterations of bootstrap resampling. Mouse granzyme A and human granzyme A, which are tryptic serine proteases not closely related to marapsins, prostasins, or tryptases, together serve as an outgroup. Clades of proteases known or predicted to lack a membrane anchor are red. The other proteases are predicted to be type I transmembrane proteins based on the presence of a C-terminal, hydrophobic extension of membrane-spanning length. Note that channel-activating protease (CAP-1) appears to be a frog ortholog of mammalian prostasins. Accession numbers of sequences used in tree construction are given in supplemental Table S1.

APRIL 12, 2013 • VOLUME 288 • NUMBER 15

Evolution of Serine Proteases by Mutational Tail Loss
The large number of extant vertebrate marapsin and marapsin-related sequences contributing to the tree strongly reinforces prior suggestions that the catalytic domain of marapsin is most closely related to that of the tryptic protease prosemin/Prss22/tryptase ε (6,9,12). Identification of reptilian marapsins predicts that marapsins last shared ancestry with prosemin before mammals evolved as a separate lineage from other vertebrates. One conserved and highly idiosyncratic feature of the mature catalytic domain is methionine as the first residue, which is Ile or Val in the vast majority of other trypsin-like serine proteases, including the closest relatives of marapsin shown in Fig. 1. The number and position of cysteines are highly conserved in mammalian marapsins, including Cys110 (Fig. 2), which is predicted to form a disulfide linkage with the propeptide (10), and are similar to those of prosemin, prostasin, testisin, and γ-tryptase. The propeptide itself (not shown) includes highly conserved consensus sequence CGRPRMLNR (see supplemental Table S2), which begins with the Cys−9 proposed to link with Cys110 and ends at the proposed site of cleavage activation at universally conserved Arg−1. As shown by the alignment in Fig. 2, two consensus sites of predicted N-glycosylation (Asn21 and Asn45) are highly conserved in mammalian marapsins. This conservation combined with direct evidence from electrophoretic responses to N-glycosidase treatment that sugars are attached to one or more of these sites in recombinant marapsins (3,10) suggests that N-glycosylation may be required for stability or function.

Human Marapsin and Selected Other Marapsins Evolved into Soluble Proteases via Nonsense Mutations in Ancestors with Transmembrane Segments-As shown by the alignments in Fig. 2, the most conspicuous structural variation among mammalian marapsins is the length of the C-terminal region. The majority of marapsins, including reptilian versions, have a C-terminal extension that is hydrophobic and predicted to span lipid bilayers (10). In the mouse, this sequence can be exchanged for a lipid (GPI) membrane anchor (3). Indeed, most marapsins with a hydrophobic C-terminal extension have a consensus GPI anchor attachment site (for example, Asn269 of mouse and rat sequences and Ser268 of gorilla, gibbon, and orangutan sequences; see Fig. 2). However, human marapsin (10) and several other mammalian marapsins identified in the present work lack this hydrophobic extension as well as consensus GPI attachment sites; indeed, native human marapsin is secreted as a soluble protein (13). Thus, some marapsins have a peptide or lipid transmembrane anchor (and are typical type I transmembrane proteins), whereas others described in ensuing paragraphs appear to be typical soluble proteins. Although marapsin-related prosemins lack a C-terminal hydrophobic sequence and are not predicted to be transmembrane proteins, the slightly more distant cousins of marapsin in Fig. 1, including prostasin, γ-tryptase, and testisin, all are known or predicted to be type I transmembrane tryptic proteases (2,3,5,17,29) and are basal to marapsins as well as to prosemins on the tree. This is evidence that soluble marapsins and prosemins evolved from type I proteases rather than the other way around.

FIGURE 2. Alignment of marapsins. Selected mammalian marapsin sequences beginning with Met1 of the predicted mature catalytic domain after activation are aligned. The nine absolutely conserved Cys residues, one of which (Cys110) is predicted to link with propeptide Cys−9 (not shown), are marked with a "#." Conserved consensus sites of N-glycosylation are green. The absolutely conserved catalytic triad residues (His41, Asp90, and Ser195) and specificity triad residues (Asp189, Gly216, and Gly226) essential for S1 serine protease function and tryptic specificity are yellow and cyan, respectively. Other aligned residues that are identical to those in the mouse marapsin sequence are black. Sequence-terminating stop codons are red, revealing marked length variations.
A Nonsense Mutation in an Ancestral Great Ape Caused Loss of the C-terminal Transmembrane Anchor in Human, Chimpanzee, and Bonobo Marapsins-As shown in Fig. 2, the predicted catalytic domains of human, chimpanzee, and bonobo marapsins end well before the corresponding sequences of other primates, including otherwise closely related great apes (gorilla and orangutan), a lesser ape (gibbon), and old and new world monkeys (sequences not shown; for species, see cladogram in Fig. 1) and a distantly related "primitive" primate (galago). The missing 33 residues include a predicted C-terminal membrane-spanning peptide and GPI anchor addition site. Therefore, human, chimpanzee, and bonobo marapsins are predicted to be soluble, i.e. neither peptide- nor lipid-anchored to cell membranes. As revealed by the nucleotide alignments in supplemental Fig. S1, the genetic basis of the truncation is the same for humans and for the other two affected great apes: namely, a nonsense point mutation converting the Arg257 codon into a stop codon. Identification of this mutation in humans and their two closest phylogenetic relatives among primates (as reflected by the clades in Fig. 1) suggests that this mutation occurred after humans last shared a common ancestor with gorillas ~8 million years ago (30) but before ancestors of chimpanzees, bonobos, and humans diverged into separate lineages ~6 million years ago (30). The high degree of homology in aligned exon 6 nucleotide sequences between human and ape sequences past the site of the nonsense mutation as revealed in supplemental Fig. S1 further supports the comparatively recent occurrence of the mutation. Despite the introduction of the earlier stop codon, which truncates the translated sequence, the relic of the ancestral C-terminal hydrophobic tail persists as an untranslated open reading frame.
A Separate Nonsense Mutation in an Ancestral Rodent Caused Transmembrane Anchor Loss in Marapsins of Guinea Pig-related Mammals Native to Two Continents-As seen in Fig. 2, early truncation of marapsin was also identified in members of rodent infraorder Hystricognathi, which are related only distantly to rats and mice. These animals (guinea pig, degu, social tuco-tuco, and naked mole rat) have a T→C nonsense mutation changing Gln266 to a stop codon as shown in supplemental Fig. S2. The resulting truncation occurs at a site very close to that seen in humans, chimpanzees, and bonobos and produces a catalytic domain of 255 residues, which is one residue shorter than human and 36 residues shorter than the rat and mouse catalytic domains, which contain a predicted transmembrane peptide and GPI attachment site. Thus, guinea pig and related truncated marapsins, like human marapsin, provide evidence of a distinct but durable tail loss mutation, which occurred in an ancestral rodent after the lineages leading to mice, rats, and squirrels diverged from the ancestors of guinea pigs. Although the effect of this mutation is similar to the one identified in humans and closely related great apes, it is a distinct and otherwise unrelated event and occurred much longer ago (between 43 and 78 million years ago), as indicated by the much greater overall sequence divergence within this clade of affected rodents (as reflected by the length of tines in the Fig. 1 cladogram) and by the fact that the group includes mammals that are native to two long separated continents (Africa for mole rats and South America for guinea pig, degu, and social tuco-tuco).
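Both truncations described so far are attributed to single nonsense point mutations, one at an Arg codon and one at a Gln codon. As a hedged illustration of why a single nucleotide change suffices, the sketch below lists which codons for these two residues in the standard genetic code lie one substitution away from a stop codon. The function names are hypothetical, and the sketch does not assume which codon any species actually carries; those sequences are shown in the supplemental figures.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")

# Standard genetic code codons for the two residues named in the text.
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")
GLN_CODONS = ("CAA", "CAG")


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def one_step_to_stop(codons):
    """Return {codon: [stop codons reachable by a single nucleotide change]}."""
    hits = {}
    for codon in codons:
        stops = [s for s in STOP_CODONS if hamming(codon, s) == 1]
        if stops:
            hits[codon] = stops
    return hits


print(one_step_to_stop(ARG_CODONS))  # {'CGA': ['TGA'], 'AGA': ['TGA']}
print(one_step_to_stop(GLN_CODONS))  # {'CAA': ['TAA'], 'CAG': ['TAG']}
```

Under the standard code, only two of the six Arg codons can become a stop codon in one step, whereas either Gln codon can do so through a single first-position change.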

Recent Marapsin Gene Mutations Caused Truncation and Probable Transmembrane Anchor Loss in Cattle-like Mammals-Another marapsin truncation was identified in a third small clade: cattle and yaks, which are a subset of ruminant mammals, which in turn are a subset of ungulates. The truncation was noted in genomes from at least two varieties of domestic cattle: Bos primigenius taurus and Bos primigenius indicus. As shown in Fig. 2, the mutation truncates marapsin at a site after that observed in the primate and rodent subgroups discussed above. Nonetheless, it eliminates the hydrophobic C-terminal peptide that is requisite for forming a transmembrane anchor, which in turn is required for attaching GPI. Thus, cattle and yak marapsins are predicted to be unattached to membranes. As revealed by Fig. 2 and by the nucleotide sequence alignments in supplemental Fig. S3, the truncation is not present in other bovids (sheep and goat) and ungulates (horse, rhinoceros, and pig) and therefore occurred between 4.7 and 30 million years ago. Two nucleotide changes are required to convert the corresponding codon in sheep to the stop codon in cattle and yaks. Without additional sequence information from other related species, it is not yet clear whether sheep and goats acquired an additional mutation after divergence from the ancestors of Bos species or whether the latter accumulated sequential mutations after divergence from ancestors of sheep. Based on overall sequence conservation and the cladogram in Fig. 1, the early termination event now present in cattle and yaks is more recent than the one observed in the rodent clade given that sheep appear to have shared a common ancestor with cattle more recently than guinea pigs shared an ancestor with naked mole rats.

Human and Mouse Marapsin Subsite Preferences Differ from Those of Prostasin-As shown in Fig. 3 by screening of combinatorial tetrapeptidic substrates, the preferences of recombinant soluble human and mouse marapsins are similar, with both peptidases preferring tryptic residues at position P1 at the site of hydrolysis as anticipated based on strict conservation of specificity triad residues Asp/Gly/Gly lining the primary specificity pocket of nearly every serine protease of tryptic specificity (20,27). Although some tryptic serine proteases readily hydrolyze substrates with either Arg or Lys at P1 (18), the marapsins strongly prefer Arg. In the adjoining position at P2, polar residues (excepting Arg and Lys) are favored, and Gly and Pro are disfavored. P3 accommodates polar and hydrophobic amino acids with Gly and Pro disfavored as at P2. Hydrophobic residues are favored at P4. Substrate peptide preferences of recombinant soluble mouse prostasin at P2, P3, and P4 (with P1 fixed at Arg) contrast strongly with those of the marapsins, thereby suggesting that the natural targets of these two classes of enzymes may not overlap.
A Tetrapeptide Substrate Based on Marapsin Cleavage Preferences Is Selective for Marapsin versus Prostasin-To validate results of combinatorial peptide screening and to create a useful assay substrate for marapsin, a tetrapeptide substrate, succinyl-L-Tyr-Leu-Asn-Arg-4NA (YLNR-4NA), based on sequences preferred by marapsins and not by mouse prostasin (see Fig. 3) was synthesized. As shown in Fig. 4A, this substrate is selective for mouse marapsin over mouse prostasin, whereas commercially available substrate QAR-4NA (which is a substrate for human prostasin and detects prostasin and matriptase activity in cultured cells (31)) was preferred by mouse prostasin.
A Tetrapeptide Covalent Inhibitor Is Selective for Mouse Marapsin-To assess potential for developing a selective inhibitor, we tested peptidic compound biotinyl-L-Tyr-Leu-Asn-Arg-chloromethyl ketone (YLNR-chloromethyl ketone), which was custom synthesized based on results of combinatorial screening. As shown in Fig. 4B, YLNR-chloromethyl ketone potently inactivates mouse marapsin and is selective based on weaker inhibition of other tryptic serine proteases, including mouse prostasin, mouse matriptase and cattle trypsin.
Marapsin Resists Inhibitors-Although inactivated by YLNR-chloromethyl ketone, mouse marapsin is unusually resistant to other inhibitors that readily inhibit trypsin and other tryptic serine proteases as revealed by Fig. 5A. Mouse marapsin also resists inhibitors present in serum as indicated by the gel filtration chromatogram in Fig. 5B that shows preservation of marapsin activity upon addition to mouse serum, indicating that it resists inactivation by serum serpins, such as α1-antitrypsin. Most marapsin-like activity appears in an elution position of ~30 kDa (relative to standard proteins), which is well below that of an α2-macroglobulin-bound complex (~750 kDa (32)), suggesting that marapsin also largely eludes capture by α2-macroglobulin. Smaller amounts of activity appear at ~160 and ~700 kDa (in addition to QAR-4NA-hydrolyzing activity present in unspiked serum), possibly representing marapsin homo- or hetero-oligomers and a small fraction of α2-macroglobulin-bound marapsin, respectively. Overall, 53% of marapsin activity added to serum was recovered. Furthermore, reducing and non-reducing SDS-PAGE of recombinant mouse marapsin (Fig. 5C) does not reveal formation of disulfide-linked oligomers as a potential explanation of inhibitor resistance.
Marapsin Lacks General Peptidase and Proteinase Activity-Notwithstanding its ability to hydrolyze tetrapeptides with fluorogenic or chromogenic leaving groups, marapsin lacks general peptidase and proteinase activity. Among potential natural peptide targets tested and not hydrolyzed by mouse marapsin despite prolonged incubations are LL-37 cathelicidin, substance P, calcitonin gene-related peptide, neurotensin, vasoactive intestinal peptide, and peptide YY, some of which contain sequences potentially favored based on results of combinatorial peptide screening. Proteins not cleaved include mouse epidermal growth factor receptor, mouse hepatocyte growth factor, human NK4-like protein (33), Staphylococcus aureus protein A, human keratin 14, mouse interleukin 17, and mouse pro-matrix metalloproteinase-9 (data not shown). We detected no gelatinolytic activity of mouse and human marapsins by gelatin zymography and no direct caseinolytic activity of the mouse enzyme (data not shown).
FIGURE 3. Comparison of marapsin and prostasin subsite preferences. Mouse and human marapsins and mouse prostasin were profiled using fluorogenic tetrapeptide substrates. For the marapsins, primary specificity was tested in a combinatorial library in which each of the designated 20 amino acids was held constant in turn as residues P2 through P4 were varied. Therefore, each assay condition tested a mixture of 8000 different peptide substrates for a given P1 residue. The amidolytic activity of marapsin liberates 7-amino-4-carbamoylmethylcoumarin. The initial readout in relative fluorescence units was normalized to the result for the most preferred amino acid in each subsite. The other panels show results of similar profiling at positions P2, P3, and P4, respectively, by fixing amino acids at the designated position and varying the residues at the remaining positions. Prostasin was profiled at positions P2, P3, and P4 with P1 fixed at Arg. Abbreviations for amino acids are given in standard one-letter code; n is norleucine (substituting for Met). Error bars represent S.D.
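The arithmetic behind the positional scanning design in the Fig. 3 legend can be illustrated with a minimal sketch. It assumes a 20-letter residue alphabet in which norleucine ("n") stands in for Met; the exact alphabet, the function names, and the raw readouts are illustrative assumptions rather than details from this study. Holding one position constant and varying the other three gives 20 × 20 × 20 = 8000 tetrapeptides per assay condition, and each readout is then scaled to the most preferred residue in its subsite.

```python
from itertools import product

# Assumed alphabet: 20 residues with norleucine ("n") in place of Met.
# Only the count of 20 and the norleucine substitution come from the legend.
RESIDUES = list("ACDEFGHIKLnNPQRSTVWY")


def sublibrary(fixed_position, fixed_residue, length=4):
    """Yield every peptide (written P4..P1) with one position held constant.

    fixed_position uses protease subsite numbering: 1 means P1, the residue
    immediately N-terminal to the scissile bond (last character here).
    """
    index = length - fixed_position  # P1 is the last character of the string
    for combo in product(RESIDUES, repeat=length - 1):
        peptide = list(combo)
        peptide.insert(index, fixed_residue)
        yield "".join(peptide)


def normalize(rfu_by_residue):
    """Scale readouts so the most preferred residue in a subsite equals 1.0."""
    top = max(rfu_by_residue.values())
    return {res: value / top for res, value in rfu_by_residue.items()}


# One assay condition: P1 fixed at Arg, P2 through P4 varied.
p1_arg_pool = list(sublibrary(1, "R"))
print(len(p1_arg_pool))  # 20**3 substrates in the mixture

# Hypothetical raw fluorescence readouts, not data from the paper.
example = normalize({"R": 1200.0, "K": 300.0, "A": 60.0})
print(example)
```

Normalizing within each subsite makes the panels comparable across enzymes with very different absolute activities, at the cost of hiding those absolute differences.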
Prss27−/− Mice Develop Normally and Are Fertile-The deleted portion of the marapsin gene, which affects exons 2 through 6 and includes the entire catalytic domain, is shown schematically in Fig. 6A. Mice that were genetically modified to lack functional marapsin genes and raised in barrier facilities lacked overt phenotypic defects as Prss27+/− heterozygotes and Prss27−/− homozygotes. This was also true of mice backcrossed for 10 or more generations into a C57BL/6 background. Fertility of Prss27+/+, Prss27+/−, and Prss27−/− mice was similar, as were litter size, perinatal survival, and weight gain. Marapsin gene deletion in Prss27−/− mice was confirmed by PCR-based genotyping (Fig. 6) and by using antibodies raised against mouse marapsin to survey selected tissues for marapsin expression (Fig. 7).
Epithelial Histology and Marapsin Expression in Prss27+/+ and Prss27−/− Mice-As shown in Fig. 7, the histological appearance of stratified squamous epithelium was similar in Prss27+/+ and Prss27−/− esophagus, which was identified previously as a prominent site of marapsin expression (13), suggesting that marapsin is not essential for development, differentiation, or sloughing of esophageal epithelium. In tissue sections, incubation with anti-marapsin antibody detected immunoreactive material in stratified squamous epithelium derived from Prss27+/+ but not Prss27−/− esophagus, confirming the specificity of this antibody for marapsin.

DISCUSSION
The present work identifies mammals "caught in the act" of shedding the transmembrane domain of marapsin via nonsense mutation while maintaining high level conservation of the catalytic domain. Our observations that tail loss occurred independently in several clades and was preserved through millions of years of evolution suggest that tail loss is not deleterious and that positive selection may be at work. Theoretically, human marapsin and other marapsins lacking a C-terminal transmembrane hydrophobic peptide could have evolved from ancestors that were type I transmembrane proteins. Alternatively, transmembrane proteins could have evolved from soluble proteins by acquisition of hydrophobic tails. At least three lines of evidence in this study suggest that extant soluble marapsin evolved from transmembrane ancestors rather than the other way around. One piece of evidence is that most of the close relatives of marapsin are known or predicted to be type I transmembrane proteins. This includes prostasin, which is GPI-anchored in mice and humans and may have deeper origins in vertebrate evolution than marapsin, as revealed by supplemental Fig. S4 and by the analysis in Fig. 1. Second, tail loss in the three small clades depicted in Fig. 1 occurred comparatively recently in mammalian evolution and by distinct nonsense mutation events in each clade. These events are temporally and stochastically incompatible with an evolutionary sequence involving acquisition of a C-terminal anchor by tailless soluble proteins as a mechanism of generating the observed isoforms of marapsin. In specific reference to C-terminal anchor loss in a subset of primates, humans, chimpanzees, bonobos, and gorillas comprise subfamily Homininae within the family Hominidae (great apes), and humans, chimpanzees, and bonobos comprise tribe Hominini within subfamily Homininae. 
Thus, the appearance and preservation of the point mutation causing marapsin tail loss in Hominini but not in other great apes, monkeys, and other "lesser" primates follows the expected cladistic line of descent from early primates to humans (30). Finally, the evidence that human and other soluble marapsins were generated by recent mutational tail loss is supported by the presence of an open reading frame distal to the site of nonsense mutation that preserves the untranslated remnant of ancestral transmembrane sequence. In soluble mast cell tryptases, which are proposed to have evolved from type I transmembrane ancestral proteases much earlier in mammalian evolution (5,7,34), this trace of the ancestral tail is no longer evident. This evolutionary mechanism as applied to marapsins, mast cell tryptases, and potentially to other proteases can be relevant for type I but not for type II proteases because a nonsense mutation in a codon of the transmembrane segment of a type II serine protease would prevent the catalytic domain itself from being translated.
FIGURE 4. Selectivity of a tetrapeptide substrate and inhibitor synthesized based on mouse marapsin subsite preferences. A shows the relative preference of mouse marapsin, prostasin, and matriptase and bovine trypsin for custom tetrapeptide substrate YLNR-4NA versus nonspecific tripeptide substrate QAR-4NA (both substrates 0.5 mM). YLNR-4NA synthesized based on results of combinatorial peptide substrate profiling of marapsin (as shown in Fig. 3) was highly selective for mouse marapsin over prostasin and matriptase but less so in relation to trypsin in which subsite preferences are less pronounced. QAR-4NA was preferred by all proteases except marapsin. Specific activity for both substrates was much lower for marapsin and prostasin than for matriptase and trypsin. Error bars represent S.E. B shows the effect on QAR-4NA-hydrolyzing activity of preincubating marapsin, prostasin, matriptase, and trypsin with custom tetrapeptide-based inhibitor YLNR-chloromethyl ketone in various inhibitor/enzyme ([I]/[E]) molar ratios. Residual activity after incubation with inhibitor is shown as a percentage of activity without inhibitor.
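The contrast drawn above between type I and type II proteases follows from the direction of translation, and a toy model may make the reasoning concrete. In the sketch below, the domain names, domain lengths, and function names are hypothetical placeholders, not measurements or methods from this study. A protein is represented as an ordered list of domains from N to C terminus, and a nonsense mutation discards the affected domain and everything C-terminal to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    length: int  # residues; illustrative values only


def truncate(domains, stop_in):
    """Return the domains translated before a nonsense mutation in `stop_in`.

    Translation proceeds from N to C terminus, so the mutated domain and
    everything after it are lost. For simplicity the partially translated
    domain is dropped rather than kept as a fragment.
    """
    kept = []
    for domain in domains:
        if domain.name == stop_in:
            break
        kept.append(domain)
    return kept


# Hypothetical layouts, N terminus first.
TYPE_I = [
    Domain("signal", 20),
    Domain("propeptide", 10),
    Domain("catalytic", 240),
    Domain("transmembrane", 25),  # C-terminal anchor
]
TYPE_II = [
    Domain("cytoplasmic", 50),
    Domain("transmembrane", 25),  # N-terminal anchor
    Domain("catalytic", 240),
]


def retains_catalytic(domains, stop_in="transmembrane"):
    """True if the catalytic domain survives a stop codon in the anchor."""
    return any(d.name == "catalytic" for d in truncate(domains, stop_in))


print(retains_catalytic(TYPE_I))   # anchor lost, catalytic domain kept
print(retains_catalytic(TYPE_II))  # catalytic domain never translated
```

In this simplified picture, tail loss by a single point mutation yields a soluble catalytic domain only when the anchor lies C-terminal to it, which is the type I arrangement.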
FIGURE 7. Esophageal histology and marapsin immunoreactivity in Prss27+/+ and Prss27−/− mice. The top panels are low and higher power photomicrographs of esophageal tissue sections incubated with anti-mouse marapsin antibody. Arrows indicate immunoreactive stratified squamous epithelial cells lining the esophageal lumen of Prss27+/+ mice. The granular brown staining in the most superficial layer represents lumenal debris and non-nucleated, keratinized cell-derived material. Prss27−/−-derived sections lacked specific staining. The bottom panels show the lack of immunoreactivity in serial sections of Prss27+/+ and Prss27−/− esophageal squamous epithelium incubated with non-immune (isotype control) antibody. Sections were counterstained with hematoxylin. Scale bars, 50 μm.
These observations reveal that although marapsins in at least three groups of mammals lack a C-terminal transmembrane segment, the transmembrane version is the typical mammalian form and was also the likely form of the enzyme in ancestral, premammalian vertebrates. Given the major variation in the length of marapsin C-terminal sequence, it is perhaps surprising that the catalytic domain of marapsin is strongly conserved in mammals and other vertebrates. The preservation of catalytic triad residues His57, Asp102, and Ser195 (using standard chymotrypsinogen numbering) and trypsin-like specificity triad residues Asp189, Gly216, and Gly226 indicates that proteolytic activity is likely to be important to marapsin function. The demonstration of similar peptidolytic activity in recombinant soluble versions of human and mouse marapsins further predicts that catalytic competence is a feature of the catalytic domain whether it originates as a transmembrane protein or as a protein secreted in a soluble form.
Notwithstanding these considerations, mouse and human marapsins both are enzymatically weak in comparison with trypsin, matriptase, and mast cell tryptases and are similar to prostasins in this regard (31) although with divergent peptide subsite preferences as revealed by the combinatorial peptide library screening results in Fig. 2. Although it is possible that marapsins cleave a yet-to-be-identified protein target with high efficiency, our studies to date identify little or no proteinase activity for mouse and human marapsins. Therefore, it is possible that marapsin serves a regulatory function that does not depend on peptidase activity. We also note that the last four residues of observed consensus propeptide sequence CGRPRMLNR are similar to the custom YLNR-4NA peptide synthesized based on results of combinatorial screening of mouse marapsin and determined to be cleaved preferentially by marapsin. This suggests the possibility of direct marapsin autoactivation, which is consistent with the observed spontaneous activation of recombinant human and mouse marapsins during expression and purification. Alternatively, promarapsin could be activated in a multistep process by initial hydrolysis at Arg−5 followed by dipeptidyl peptidase I- or cathepsin-mediated removal of the remnant peptide as may occur with some mast cell tryptases (35)(36)(37) and mastins (38).
The general reluctance of mouse and human marapsins to hydrolyze proteins and peptides at tryptic sites, even at sites that would seem to be favored based on identified preferences in tetrapeptide substrates, suggests that the active site may be blocked in a manner that hinders access to substrates (and inhibitors). The overall weak activity toward tetrapeptide 4NA substrates also is consistent with a partially blocked or "collapsed" active site potentially due to the absence of a yet-to-be-identified allosteric activator or cofactor (39) or to a requirement for active site conformation change induced by binding of a highly specific substrate as occurs for complement factor D (40). A potentially similar blockade was identified in structures derived from crystallized versions of prostasin (41,42), which as noted is a type I transmembrane tryptic serine peptidase related to marapsin but with differing substrate preferences and patterns of expression. The active site blockade may be more extreme in marapsin based on higher level resistance to inhibitors. Indeed, although the present work identifies low molecular weight inhibitors, marapsin appears to resist larger (i.e. proteinaceous) inhibitors, including broad spectrum tryptic serine protease inhibitors like aprotinin, serum serpins, and α2-macroglobulin. A partially blocked or collapsed active site can explain inhibitor resistance as well as weak general peptidase activity. Although some proteinases, notably mast cell tryptases (25) and mastins (38), achieve a measure of protection from larger inhibitors by self-associating into active site-shielding oligomers, the gel filtration behavior of mouse marapsin (Fig. 5) suggests that marapsin remains uninhibited as a monomer among many potential antipeptidases in serum. Furthermore, mouse and human marapsins do not seem to form disulfide-linked dimers or higher order oligomers.
Therefore, we propose that the most likely explanation of inhibitor resistance is a prostasin-like, partially blocked active site in the monomeric catalytic domain.
It should be stressed that our phylogenetic analysis shows that the closest relatives of marapsin are not prostasins but prosemins, which appear to be soluble (i.e. not transmembrane) proteases in mammals. However, both marapsins and prosemins appear to have evolved from type I transmembrane proteases. On this basis, we propose that an early (possibly premammalian) ancestral prosemin underwent mutational loss of its C-terminal hydrophobic tail as some marapsins have done recently, leaving no extant membrane-anchored prosemins among identified mammalian proteases. Although marapsin and prostasin share similarities as type I transmembrane tryptic P1 Arg-preferring peptidases with partially protected active sites, there are key contrasts, including as already noted functional differences in peptide substrate preferences, inhibitor susceptibility, and tissue patterns of expression. Perhaps the most compelling contrast in regard to function is the respective phenotypes of prostasin-deficient (Prss8−/−) and marapsin-deficient (Prss27−/−) mice. Prss8−/− mice die during embryogenesis, and mice with skin-selective deficits of prostasin die in the early postnatal period (23). Prss27−/− mice as shown in the present work are viable and fertile without overt defects when raised in specific pathogen-free barrier conditions, arguing against a non-redundant, critical role in fetal or early postnatal development. Nonetheless, strong conservation of the marapsin gene in mammals and other vertebrates suggests that roles await discovery. In particular, the availability of the marapsin-deficient mice reported here allows explorations of the importance of marapsin in epithelial responses to infection, injury, and other environmental challenges.