The Rhesus Macaque (Macaca mulatta) Sperm Proteome*

Mass spectrometry-based proteomics has facilitated sperm composition studies in several mammalian species, but no studies have been undertaken in non-human primate species. Here we report the analysis of the 1247 proteins that comprise the Rhesus macaque (Macaca mulatta) sperm proteome (termed the MacSP). Comparative analysis with previously characterized mouse and human sperm proteomes reveals substantial levels of orthology (47% and 40%, respectively) and widespread overlap of functional categories based on Gene Ontology analyses. Approximately 10% of macaque sperm genes (113/1247) are significantly under-expressed in the testis as compared with other tissues, which may reflect proteins specifically acquired during epididymal maturation. Phylogenetic and genomic analyses of three MacSP ADAMs (A-Disintegrin and Metalloprotease proteins), ADAM18-, 20- and 21-like, provide empirical support for sperm genes functioning in non-human primate taxa that have been subsequently lost in the lineages leading to humans. The MacSP contains proteasome proteins of the 20S core subunit, the 19S proteasome activator complex and an alternate proteasome activator PA200, raising the possibility that proteasome activity is present in mature sperm. Robust empirical characterization of the Rhesus sperm proteome should greatly expand the possibility for targeted molecular studies of spermatogenesis and fertilization in a commonly used model species for human infertility.

The application of mass spectrometry (MS)-based proteomics, coupled with whole genome annotation of an increasing number of species, has greatly extended our knowledge of sperm composition. Traditional methods used to assess sperm composition, including the use of sperm-specific antibodies and 2D gel electrophoresis, have identified a limited number of sperm proteins. These traditional studies have been augmented in recent years by high-throughput and highly sensitive MS (shotgun proteomics), which has substantially increased the accuracy of peptide identification, resulting in a significant increase in proteome coverage. Indeed, advances in MS instrumentation, data acquisition, and the availability of genome annotations have, for example, increased sperm proteome coverage in Drosophila from 381 (1) to 1108 proteins (2) over a five-year period.
Two main MS-based methodologies have been applied to study sperm composition: (i) 2D PAGE followed by spot excision and MS and (ii) digestion of proteins, followed by MS/MS analysis of the resulting peptides (3). Although each method has its own advantages and disadvantages, a far greater level of proteome coverage is obtained using MS/MS (4). A previous comparative study found that each method identified proteins not found by the other, and it has therefore been suggested that these methods should be used to complement each other (5). Thus, although no single methodology yet exists capable of producing a complete whole cell proteome, MS/MS methods provide deeper and broader coverage and are therefore the current method of choice. Shotgun proteomics has characterized sperm proteomes in a variety of taxa including plants, invertebrates and mammals such as human, mouse, rat, and bull (3, 6-11). These studies achieve varying levels of proteome coverage as a result of several factors including the choice of MS equipment, sample acquisition, purification, solubilization, and fractionation schemes. Although these different approaches make direct comparisons difficult, they nevertheless have provided invaluable information regarding the composition of sperm and have helped to identify novel proteins that play important roles in sperm function and reproduction.
In this study we use MS-based proteomics to elucidate the sperm proteome of a species of Old World monkey, the Rhesus macaque (Macaca mulatta). Due primarily to their genetic and physiological similarities to humans, Rhesus macaques are the most widely used nonhuman primate model system for basic and applied biomedical research (12). Rhesus macaques are also used extensively as a model of human reproduction; numerous similarities at the molecular level have been observed between gametes of the two species, which is why Rhesus macaques have become a useful model system for fertility and assisted reproductive technology research (13). A more complete knowledge of the sperm proteome will facilitate reproductive studies using the Rhesus macaque as a model organism. However, despite its widespread use in reproductive biology, the macaque sperm proteome (MacSP) has yet to be characterized.
Although insight into the MacSP will facilitate reproductive studies using the Rhesus macaque as a model organism, this knowledge can also be used to better understand the composition of human sperm. Sperm mature and gain fertilization competency as they traverse the epididymis, a specialized duct that connects the testis to the vas deferens (14). During the maturation process, sperm lose or modify a number of their surface proteins and gain additional transient or permanent surface proteins in a well-organized manner, and it is only after emerging from the cauda epididymis that sperm are motile and considered fertilization competent (14,15).
Proteomic studies of human sperm have been undertaken (3,6,10), identifying between 98 and 1760 sperm proteins; however, these studies used sperm from ejaculates, which complicates sperm proteome analysis. A previous study identified 923 proteins present in human seminal plasma (16), which is likely to be only a fraction of the seminal plasma proteome. Human sperm proteome data sets derived from ejaculates make it difficult to differentiate which of the identified proteins are sperm or seminal plasma constituents. For example, the semenogelins, a major seminal protein family, are not expressed in the testis but are found in sperm proteomes determined from ejaculates (6,10). Such highly abundant seminal proteins may mask lower abundance integral sperm proteins and inhibit their identification by MS. To avoid these problems, we collected mature sperm directly from the cauda epididymis of the Rhesus macaque, thus avoiding contamination from seminal plasma proteins.
In the present study, sperm proteins were separated using 1D SDS-PAGE and digested, and the resulting peptides were analyzed by LC MS/MS. Using high stringency parameters for peptide identification, we conservatively identified 1247 proteins from purified samples of Rhesus macaque sperm. Given their close evolutionary relationship, the Rhesus macaque and human share 93% nucleotide homology (12). Data from this study can be used to complement what is currently known about the composition of human sperm and provide a more useful proxy of human sperm proteome composition than the proteomes of non-primate mammals for which data are available. Studies of sperm composition, especially those in human, can be applied to develop novel molecular-based clinical diagnostic tests of sperm quality, the assessment of which is currently limited to evaluating parameters such as sperm count, morphology and motility. In addition, knowledge of sperm components can lead to the discovery of novel contraceptives and infertility treatments.

EXPERIMENTAL PROCEDURES
Tissue Preparation and Isolation of Mature Sperm-Intact epididymides were harvested from two euthanized adult macaques (Macaca mulatta) that had previously been used for a series of neurophysiological experiments. Each animal was euthanized and tissue processed successively over a 2 day period and stored in PBS at 4°C until used. Each intact epididymis was removed and both proximal and distal ends ligated. All procedures were approved by the Institutional Animal Care and Use Committee of Northwestern University.
Mature and motile sperm were collected from the cauda epididymis by needle puncture into 50 ml conical centrifuge tubes. Sperm pellets were obtained by differential low speed centrifugations using a tabletop centrifuge (Sorvall). Purified sperm samples were obtained by repeated cycles of centrifugation and pellet resuspension in standard Ringers solution. Following each pellet resuspension, sperm purity was assessed using a DNA-based fluorescence assay. Equal 5 µl aliquots of sperm were mounted on a microscope slide along with an equal volume of PBS containing 1.0 µg/ml of 4′,6-diamidino-2-phenylindole (DAPI; Invitrogen) and subsequently examined using a Zeiss Axioskop equipped with DIC and epifluorescence optics (Fig. 1). Digital images were captured with an onboard CCD camera and exported to Photoshop.
Protein Solubilization and Quantitation-Sperm samples were solubilized in SDS and quantified using the EZQ® Protein Quantitation Kit (Invitrogen, Carlsbad, CA). Protein fluorescence was measured using a Typhoon Trio+ (Amersham Biosciences/GE Healthcare) equipped with a 488 nm laser and 610 nm bandpass filter. ImageQuant™ TL software was used to analyze fluorescence data. A standard curve was generated using fluorescence data from control samples of known concentration and used to determine sperm sample concentration.
1-Dimensional SDS-PAGE-A 1 mm 10% NuPAGE® Novex® Bis-Tris Mini Gel was set up using the XCell SureLock Mini-Cell system (Invitrogen) as per manufacturer's instructions for reduced samples. Fifty micrograms of Rhesus macaque sperm protein was loaded in triplicate and the gel was run for 35 min at a constant 200 V. Following electrophoresis, the gel was stained using SimplyBlue™ SafeStain (Invitrogen) and destained as per manufacturer's instructions. The gel was transferred to a gel slicer (built in house) and each of the three lanes was separated from the rest of the gel by cutting vertically. Each lane was then cut into 16 equal sections and half of each section was transferred to a 96-well plate, resulting in 48 total gel slices. Gel slices were stored at −80°C until needed.
In-Gel Digestion of Proteins/MS Analysis-Using a scalpel, each gel slice was cut into smaller (~1 × 1 × 1 mm) pieces which were transferred to a 0.6 ml microcentrifuge tube (one tube per gel slice, for a total of 48 tubes). A standard in-gel digestion protocol was performed on each gel slice as described previously (17). Gel slices were further destained using 50 mM ammonium bicarbonate/50% acetonitrile and dehydrated with 100% acetonitrile. Proteins were reduced and alkylated by treatment with dithiothreitol and iodoacetamide and subsequently digested using 15 ng/µl trypsin. Protein digests were extracted with 5% formic acid and 50% acetonitrile, and dried down using a vacuum centrifuge. Resulting peptides were reconstituted in 20 µl of 0.1% formic acid.
Liquid Chromatography Mass Spectrometry-Extracted peptides were analyzed by nanoflow reverse phase liquid chromatography using a nanoAcquity LC system (Waters, Milford, MA) coupled in-line with the linear trap quadrupole LTQ Orbitrap Velos instrument (Thermo Fisher Scientific). The nano LC system included a Symmetry C18 5 µm, 180 µm × 20 mm trap column and a BEH130 C18 1.7 µm, 100 µm × 100 mm analytical column (Waters). Mobile phases A and B consisted of 0.1% formic acid in water and 0.1% formic acid in acetonitrile, respectively. Samples were loaded onto the trap column for desalting and preconcentration with 99%/1% mobile phase A:B at a flow rate of 5 µl/min for 3 min. Peptides were separated on the analytical column at a flow rate of 500 nl/min with a two-step linear gradient consisting of 7% B to 25% B in 72 min and 25% B to 45% B in 10 min. The electrospray ion source consisted of a nanospray head (Thermo Fisher Scientific) coupled with a coated PicoTip fused silica spray tip with OD 360 µm, ID 20 µm, and 10 µm diameter emitter orifice (New Objective, Inc.). Samples were analyzed using positive ion spray voltage and heated capillary temperature of 1.9 kV and 220°C, respectively.
Mass spectrometry data were collected with the instrument operating in data dependent MS/MS mode. MS survey scans (m/z 300-2000) were acquired in the Orbitrap analyzer with a resolution of 60,000 at m/z 400 and an accumulation target of 1 × 10^6. This was followed by the collection of MS/MS scans of the 15 most intense precursor ions with a charge state ≥2 and an intensity threshold above 500 in the LTQ, with an accumulation target of 10,000, an isolation window of 2 Da, a normalized collision energy setting of 35%, and an activation time of 30 ms. Dynamic exclusion was used with repeat count, repeat duration, and exclusion duration of 1, 30 s, and 60 s, respectively.
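The standard-curve step can be sketched as a least-squares line fitted to the fluorescence of standards of known concentration, which is then inverted to estimate the concentration of an unknown sample. This is a minimal illustration that assumes a linear response; the readings, units, and function names are hypothetical and are not values or software from this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_signal(signal, slope, intercept):
    """Invert the standard curve: signal = slope * concentration + intercept."""
    return (signal - intercept) / slope


# Hypothetical standards (mg/ml) and their fluorescence readings.
std_conc = [0.0, 0.5, 1.0, 2.0]
std_fluor = [100.0, 600.0, 1100.0, 2100.0]

slope, intercept = fit_line(std_conc, std_fluor)
sample_conc = concentration_from_signal(1600.0, slope, intercept)
```

In practice the sample reading should fall within the range spanned by the standards, since extrapolating beyond the curve is less reliable.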
Peptide Identification and Protein Annotation-The mass spectra data files were analyzed using Sequest (Thermo Fisher Scientific, San Jose, CA, USA; version 1.2.0.208) and X!Tandem (The GPM; version 2007.01.01.1), searching against the NCBI Macaca mulatta protein fasta file (dated 10/14/10, with 27088 entries). Sequest and X!Tandem searches used a fragment ion mass tolerance of 0.80 Da and a parent ion tolerance of 10.0 ppm. Iodoacetamide derivative of cysteine was specified as a fixed modification, whereas oxidation of methionine was specified as a variable modification. Results were merged using Scaffold (Proteome Software) version 3.0.8, which calculated False Discovery Rates (FDRs) using a reverse concatenated decoy database. Peptide identifications were accepted if they could be established at greater than 95.0% probability as specified by the Peptide Prophet algorithm (18) and protein identifications were accepted if they could be established at greater than 99.0% probability and contained at least three identified peptides. Protein probabilities were assigned by the Protein Prophet algorithm (19). Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony.
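The acceptance criteria above (peptide probability greater than 95.0%, protein probability greater than 99.0%, at least three identified peptides) can be expressed as a simple filter. The sketch below is illustrative only: the data layout, identifiers, and function name are hypothetical, it assumes that "identified peptides" means distinct peptide sequences, and it does not reproduce how Scaffold applies these thresholds internally.

```python
from collections import defaultdict

PEPTIDE_MIN_PROB = 0.95   # peptide probability must exceed this value
PROTEIN_MIN_PROB = 0.99   # protein probability must exceed this value
MIN_PEPTIDES = 3          # minimum accepted peptides per protein


def accept_proteins(peptide_hits, protein_probs):
    """Return sorted protein IDs passing all three criteria.

    peptide_hits: iterable of (protein_id, peptide_sequence, probability)
    protein_probs: dict mapping protein_id to protein-level probability
    """
    accepted_peptides = defaultdict(set)
    for protein_id, sequence, prob in peptide_hits:
        if prob > PEPTIDE_MIN_PROB:
            accepted_peptides[protein_id].add(sequence)
    return sorted(
        pid
        for pid, seqs in accepted_peptides.items()
        if len(seqs) >= MIN_PEPTIDES
        and protein_probs.get(pid, 0.0) > PROTEIN_MIN_PROB
    )


# Hypothetical example: P2 has only two passing peptides and
# P3 falls below the protein-level threshold.
hits = [
    ("P1", "AAK", 0.99), ("P1", "CCR", 0.97), ("P1", "DDK", 0.96),
    ("P2", "EEK", 0.99), ("P2", "FFR", 0.90), ("P2", "GGK", 0.98),
    ("P3", "HHK", 0.99), ("P3", "IIR", 0.99), ("P3", "LLK", 0.99),
]
probs = {"P1": 0.995, "P2": 0.999, "P3": 0.98}
accepted = accept_proteins(hits, probs)
```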
Protein identifications from Scaffold after analysis with Sequest and X!Tandem consisted of Macaque RefSeq (NCBI) protein IDs. These protein IDs were entered into BioMart (biomart.org, version 0.7) using the Ensembl Genes 54 (Sanger UK) Macaca mulatta genes (MMUL_1.0) database and used to obtain cross-referenced Ensembl Gene IDs and associated gene names. The same method was used to obtain human orthologs. Macaque and human Ensembl numbers were matched to each RefSeq protein ID in a 1:1 fashion such that there was only one macaque and one human Ensembl number per RefSeq protein ID, thus the size of the original data set was maintained. A gene name was assigned to each of the 1247 proteins identified in this study. Macaque Ensembl IDs and Human Ensembl IDs were assigned to 94% (1170/1247) and 93% (1165/1247) of proteins, respectively.
Gene Ontology, Sperm Phenotype and Genomic Analysis-The gene group functional profiling (g:GOSt) analysis tool from g:Profiler (20) was used to conduct Gene Ontology (GO) analyses. Macaque Ensembl Gene IDs (1166) for the genes expressed in the MacSP were queried against the Macaca mulatta data set. Significance of over-represented GO categories, as compared with the whole macaque proteome, was determined using the Benjamini-Hochberg FDR significance threshold. The expected number of proteins in each functional category in the whole macaque proteome was calculated for comparison with the MacSP. The distribution of functional categories in the MacSP as compared with the whole proteome is shown in Fig. 2. GO category enrichment is indicated by an asterisk (p ≤ 0.05). A description of each of the GO terms can be found at www.geneontology.org.
Mammalian phenotype information was obtained from the MGI mammalian phenotype database (http://www.informatics.jax.org/) using annotated orthology relationships with the mouse and human. These results were further curated to obtain a comprehensive list of abnormal sperm morphology or process phenotypes. GO category enrichment was conducted using g:Profiler. The chromosomal distribution of macaque sperm genes was obtained from the gene map coordinate file available from NCBI (Cyto_gene mapview 1.2). A Chi-squared test was used to compare the proportion of X-linked and autosomal macaque sperm genes to the genome-wide distribution of genes.
Mouse and Human Sperm Proteome Comparisons-To compare the MacSP to other published proteomes, we obtained the MouseSP and HumanSP IDs from the published literature (6,9) and converted them to their corresponding Ensembl IDs using BioMart (biomart.org). The resulting 934 mouse and 925 human Ensembl gene IDs were then used to identify the corresponding orthologs present in the MacSP data set.
Network analyses were conducted on the genes found to be common to the macaque, mouse, and human SPs and on the genes found to be common to both the macaque and mouse, and the macaque and human sperm proteomes. Networks were created using GO Biological Process Annotations (downloaded 23.05.2013) as implemented in the ClueGo plugin v2.0.5 (21) of Cytoscape v3.0.0 (22). For the macaque, mouse, and human SP overlap, an enrichment (right-sided hypergeometric) test was used with a Benjamini-Hochberg multiple test correction. GO Tree Levels (min = 1; max = 2) and GO term restriction (min#genes = 3, min% = 1%) were set and terms were grouped using a Kappa Score Threshold of 0.3. For the macaque/mouse and macaque/human SP overlap networks, an enrichment (right-sided hypergeometric) test was used with a Benjamini-Hochberg multiple test correction. GO Tree Levels (min = 1; max = 3) and GO term restriction (min#genes = 5, min% = 4%) were set and terms were grouped using a Kappa Score Threshold of 0.3. For all networks, resulting groups consisted of a minimum of two terms, and groups sharing >50% of terms were merged.
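A two-category Chi-squared comparison of this kind can be sketched as a goodness-of-fit test with one degree of freedom, for which the p-value has the closed form erfc(sqrt(χ²/2)). The gene counts and genome-wide X-linked fraction below are hypothetical placeholders, not the values used in this study, and the function names are illustrative.

```python
from math import erfc, sqrt


def chi2_p_value_1df(chi2):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2.0))


def x_linkage_test(n_x_linked, n_autosomal, genome_x_fraction):
    """Compare observed X-linked/autosomal counts with genome-wide expectation."""
    total = n_x_linked + n_autosomal
    expected = [total * genome_x_fraction, total * (1.0 - genome_x_fraction)]
    observed = [n_x_linked, n_autosomal]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2_p_value_1df(chi2)


# Hypothetical counts: 30 X-linked and 970 autosomal sperm genes,
# with 4% of all genes in the genome assumed to be X-linked.
chi2, p = x_linkage_test(30, 970, 0.04)
```

With one degree of freedom, a statistic of 4.91 corresponds to a p-value of roughly 0.027, which can be confirmed by calling `chi2_p_value_1df(4.91)`.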
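The enrichment statistics named above can be illustrated with a small standard-library sketch: a right-sided hypergeometric test for one GO term, followed by a Benjamini-Hochberg adjustment across terms. This is not the ClueGO or g:Profiler implementation, and the function names and example numbers are hypothetical.

```python
from math import comb


def hypergeom_right_tail(k, n, K, N):
    """P(X >= k): probability of at least k annotated genes in a list of n,
    drawn from a background of N genes of which K carry the annotation."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical term: 4 of 10 background genes annotated, 3 genes in the list,
# 2 of which carry the annotation.
p_term = hypergeom_right_tail(2, 3, 4, 10)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```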
Phylogenetic Analysis-The MacSP contains 3 ADAM-like proteins annotated as ADAM18, 20, and 21-like. Protein BLAST queries of these proteins identified higher similarity to other ADAMs, in particular ADAMs 3, 4, and 6. To further examine this, all available macaque, human, chimpanzee, mouse, and rat ADAM sequences (101 total) were obtained from NCBI. Alignment and phylogenetic analysis of these protein sequences were conducted in MEGA5 (23) as described below with 50 bootstrap replicates to investigate placement of the ADAM-like proteins in the tree (not shown).
For subsequent analyses, a subset of 22 amino acid sequences (supplemental Table S1) from the macaque, human, chimpanzee, mouse, and rat for ADAMs 3, 4, 6, 18, 20, and 21 were aligned by ClustalW as implemented by MEGA5. All ambiguous positions were removed for each sequence pair and there were a total of 939 positions in the final data set. Evolutionary history was inferred by using the Maximum Likelihood method based on the JTT matrix-based model (22) and the bootstrap consensus tree, as shown in Fig. 3, was inferred from 1000 replicates (24). The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically as follows. When the number of common sites was < 100 or less than one fourth of the total number of sites, the maximum parsimony method was used; otherwise the BIONJ method with MCL distance matrix was used. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Evolutionary analyses were conducted in MEGA5. Orthology was determined using reciprocal best protein BLAST queries of ADAM18, 20, and 21-like against mouse (Mus musculus) and rat (Rattus norvegicus) proteins.
Expression Analysis-Gene expression data for five rhesus macaque tissues (cerebral cortex, pancreas, thymus, testis, and an immortalized fibroblast cell line) obtained from three different research centers (25) were downloaded from the NCBI Gene Expression Omnibus (GEO, series GSE7094). The data set from the Yerkes National Primate Research Center was omitted from our analysis because its total intensities were markedly higher than those of the other two. Affymetrix Probeset Identifiers were matched to Ensembl macaque sperm gene IDs using BioMart. In total, there were 1774 Affymetrix probeset matches to macaque sperm genes, some represented by multiple probes on the array. Multiple matches were collated using the similarity of gene expression intensities across samples implemented in the Inferno software program (https://code.google.com/p/inferno4proteomics/), resulting in 1041 MacSP genes for which expression data were available.
The average expression of the MacSP data set in the testis was compared with other tissues (cerebral cortex, pancreas, thymus, fibroblast). The ratio of average expression of sperm proteome genes to the averages in the other tissues was log2 transformed to determine the fold change of that particular gene in the testis as compared with all other tissues (supplemental Table S2).
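The fold-change calculation described above can be sketched for a single gene as the log2 of the ratio between its mean testis intensity and the mean of its per-tissue averages elsewhere. The intensities and function name below are hypothetical, and averaging the four other tissues with equal weight is an assumption about how the background was formed.

```python
from math import log2


def log2_fold_change(testis_values, other_tissue_means):
    """log2(mean testis expression / mean expression across other tissues)."""
    testis_mean = sum(testis_values) / len(testis_values)
    background = sum(other_tissue_means) / len(other_tissue_means)
    return log2(testis_mean / background)


# Hypothetical intensities for one gene: two testis samples, and the averages
# for cerebral cortex, pancreas, thymus, and fibroblast.
lfc = log2_fold_change([800.0, 800.0], [100.0, 100.0, 300.0, 300.0])
```

A value of 1 on this scale corresponds to a twofold overexpression in the testis, and a value of -1 to a twofold underexpression.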
Statistical tests were performed to determine if any sperm proteome genes were over- or underexpressed in the testis compared with other tissues and cell types. Gene expression intensities in the testis samples were compared with a background data set that consisted of the four averaged sets taken from the GSE7094 series. A standard t test, as implemented in the Inferno program, identified a set of 242 genes differentially expressed in the testis (p < 0.05) (supplemental Table S2).
Network Analysis-A network analysis was conducted on the 242 genes expressed in macaque sperm that are significantly over- or underexpressed in the testis. The network was created by comparing GO Molecular Function Annotations (downloaded 05.01.2012) for over- and underexpressed genes as implemented in the ClueGo plugin v1.4 (21) of Cytoscape v2.8.2 (22). A two-sided hypergeometric enrichment and depletion test was used with a Benjamini-Hochberg multiple test correction. GO Tree Levels (min = 1; max = 3) and GO term restriction (min#genes = 5, min% = 1%) were set and terms were grouped using a Kappa Score Threshold of 0.1. Resulting groups consisted of a minimum of two terms, and groups sharing >50% of terms were merged.

Sperm Protein Identification by MS-Shotgun MS/MS identified 1247 proteins in purified macaque sperm samples using strict criteria for mass spectra analysis as well as protein identification (supplemental Table S3). False Discovery Rates (FDRs) estimated using a reverse concatenated decoy database gave peptide and protein FDRs of <1%, supporting the high quality of the data set. Three replicate sperm samples were analyzed and each sample resulted in the identification of between 5449 and 7523 unique peptides that identified between 928 and 1056 proteins per sample (Table I). The average coverage of identified proteins ranged from 17.8% to 24.0% with an average across all samples of 20.7% (Table I, supplemental Table S3). Although the cutoff for the minimum number of peptides identified per protein was set to 3, the average number of peptide hits per protein was over twice this value (6.46), reflecting the depth of the MS data. In addition, 59.5% (742/1247) of proteins in our data set were identified in all three samples whereas 82.4% (1028/1247) of proteins were identified in at least two samples (Fig. 1).
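The replicate-overlap figures can be reproduced from per-sample protein lists by counting, for each protein, the number of replicates in which it was identified. The sketch below uses hypothetical protein IDs and an illustrative function name; the final lines only recompute the percentages from the counts reported above.

```python
from collections import Counter


def replicate_support(replicates):
    """Return ({k: proteins seen in at least k replicates}, total proteins)."""
    seen = Counter(pid for rep in replicates for pid in set(rep))
    at_least = {
        k: sum(1 for count in seen.values() if count >= k)
        for k in range(1, len(replicates) + 1)
    }
    return at_least, len(seen)


# Hypothetical identifications from three replicate samples.
reps = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "E"}]
support, total = replicate_support(reps)

# Percentages recomputed from the counts reported in the text.
pct_all_three = round(100 * 742 / 1247, 1)
pct_at_least_two = round(100 * 1028 / 1247, 1)
```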
Abundant proteins in the MacSP were ranked by spectral counts, providing a semiquantitative measure of relative protein levels (Table II, supplemental Table S3) (26). By far the most abundant protein was A-kinase anchor protein 4-like (AKAP4), a known constituent of mammalian sperm. Orthology searches of the corresponding gene found that it shares one-to-one orthology with AKAP4 of 7 primate species, including human, as well as rat and mouse. AKAP4 was also identified as the most abundant protein in the rat sperm proteome (7) and is the most abundant protein constituent of the sperm fibrous sheath (27). Several keratins, including KRT1, 2, 5, 6a, 10, and 16, were also found to be among the 20 most abundant proteins identified in the MacSP. Although keratins are often associated with sample contamination, these keratins have previously been demonstrated to be present in the human sperm nucleus (5). Additional keratins have also been identified in the rat, human and mouse sperm proteomes (2,6,8,10), suggesting that these proteins play important functional roles in sperm. LOC703932, the 13th most abundant protein, has high sequence homology with keratin 9 from several species including human, orangutan (Pongo abelii), and marmoset (Callithrix jacchus). Keratin 9, along with other proteins, has been found to comprise the perinuclear ring of the manchette, which nucleates extending microtubules in the developing spermatid (28).
Mammalian Sperm Phenotypes-We identified 74 proteins in the MacSP which, when mutated or defective in mice or humans, result in abnormal sperm development or function (Table III). This includes 37 proteins associated with abnormal sperm morphology and 50 associated with abnormal physiological, motility, capacitation, or fertilization processes. Statistical analysis of functional over-representation among these genes found that abnormal sperm morphology phenotypes are enriched for proteins that localize to the flagellum (p = 2.50e-8; GO:19861) and cilium (p = 2.00e-4; GO:5929), including the structural genes TEKT2, 3 and 4 and ODF2. As expected, abnormal sperm process genes are significantly enriched for genes involved in fertilization (p = 8.04e-16; GO:9566), sperm-egg recognition (p = 1.64e-4; GO:35036) and those which localize to the acrosomal vesicle (p = 4.68e-11; GO:1669). Notably this group includes a number of highly studied sperm proteins involved in sperm-egg interaction, including IZUMO1, ADAM1, AKAP4, CRISP1, ACR, ZPBP1, and ZPBP2.
Gene Ontology (GO) Functional Analysis-Annotated molecular functions were obtained for 72% (896/1247) of the genes in the data set, which were placed among 13 broad functional categories (Fig. 2). Three of the categories (catalytic, antioxidant, and electron transport) were statistically enriched in the MacSP (p < 0.05) as compared with the entire genome. Further analyses of the two most abundant annotated categories in the MacSP, catalytic and binding functions (described below), revealed a higher proportion of proteins with a catalytic function, but a slightly lower proportion of proteins with binding function in the MacSP (Fig. 2). With regard to catalytic function, the MacSP contains an overabundance of proteins with oxidoreductase (116 proteins, p = 3.44e-17), hydrolase (210 proteins, p = 4.83e-12), and isomerase (20 proteins, p = 1.69e-2) activity as compared with the whole proteome. The MacSP is enriched for oxidoreductases that act on NADH and NADPH (58 proteins, p = 3.90e-08), including dehydrogenases that act on NADH (26 proteins, p = 9.78e-08). The MacSP also contains hydrolases enriched in peptidase activity (65 proteins, p = 1.75e-05), particularly threonine-type peptidases/endopeptidases (14 proteins, p = 1.77e-10), and hydrolases acting on acid anhydrides (92 proteins, p = 5.55e-09).
The MacSP is enriched in a number of binding proteins including small molecule binding (205 proteins, p = 2.
Comparison to the Mouse and Human Sperm Proteomes-Previous studies have characterized the mouse and human sperm proteomes but used fundamentally different sperm isolation techniques. The mouse sperm proteome was determined using sperm purified from the cauda epididymis, consistent with the methods used in this study, whereas the human analysis used sperm purified from total ejaculates. To assess the possible impact of nonsperm ejaculate contamination within the human sperm proteome, we calculated the number of orthologous proteins within each proteome (supplemental Table S4). Based on sperm proteome size and expected overlap based on total genome orthology, the MouseSP and MacSP are expected to share 589 orthologs whereas the HumanSP and MacSP are expected to share 692 orthologs. Contrary to what would be expected based on their evolutionary relationship, we identified a significantly higher (p = 0.0037) proportion of orthology relationships between the macaque and mouse sperm proteomes (436 proteins; 47% of total proteome, 74% of expected) than between the MacSP and HumanSP (368 proteins; 40% of total proteome, 53% of expected). This finding is likely attributable to differences in sample preparation, MS techniques and the potential for the human sperm proteome to contain a number of contaminating seminal proteins that are not conserved components of sperm. This illustrates the difficulty in comparing sperm proteome data across species and across studies where varying methods are employed.
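The percentages quoted for the two pairwise comparisons follow directly from the counts given in the text. The short check below assumes that the "total proteome" denominators are the 934 mouse and 925 human Ensembl gene IDs listed in the Experimental Procedures; the dictionary layout and helper name are illustrative only.

```python
def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


comparisons = {
    # shared orthologs, comparison proteome size (assumed), expected orthologs
    "mouse": {"shared": 436, "proteome_ids": 934, "expected": 589},
    "human": {"shared": 368, "proteome_ids": 925, "expected": 692},
}

summary = {
    species: (
        percent(values["shared"], values["proteome_ids"]),  # % of total proteome
        percent(values["shared"], values["expected"]),      # % of expected
    )
    for species, values in comparisons.items()
}
```

Under that assumption the sketch gives 47% and 74% for the mouse comparison and 40% and 53% for the human comparison, matching the reported figures.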
related to sperm biology including response to oxidative stress, sperm motility, and sexual reproduction. These common biological processes are also reflected in the triple overlap between the MacSP, HumanSP and MouseSP as detailed below.
Overall, 209 proteins comprise a core data set common to all three sperm proteomes (supplemental Table S4). We analyzed the pattern of enrichment of the GO functional categories statistically over-represented in the triple overlap data set using Cytoscape (Fig. 4). As expected from the pair-wise comparisons, a number of functional categories related to reproduction, fertilization, and sperm-egg recognition are represented, along with metabolic and energy production processes. The high level of functional coherence observed reflects a significant coverage of the proteomes by our MS approach; however, a number of differences between the three sperm proteomes were also apparent (see Discussion).
Phylogenetic Analysis of Macaque ADAM-like Proteins-Consistent with biochemical studies of mammalian sperm and functional studies of sperm-egg fusion (29, 30), our analysis identified eight members of the ADAM family in the MacSP. Among these, three proteins were annotated as ADAM-like, including ADAM18-, 20-, and 21-like, indicating high sequence similarity with the annotated ADAM18, 20, and 21 genes. The ADAM family of proteins has undergone significant expansion in the mouse lineage, whereas many of the human orthologs are pseudogenes. Because many ADAM proteins have evolved key roles in sperm biology, we were interested in determining whether these ADAM-like proteins (i) were mis-annotated and represent known mammalian ADAMs or (ii) represented a functional expansion of ADAM proteins in the primate lineage. Frequent gene turnover and pseudogenization among the ADAM gene family led us to suspect that these genes may be mis-annotated because of gene gain/loss during primate evolution. Construction and examination of the ADAM family phylogenies using all available ADAM sequences for the macaque, human, chimpanzee, mouse, and rat robustly groups ADAM18, 20 and 21 within a monophyletic group, exclusive of the ADAM18, 20 and 21-like proteins. Further phylogenetic and reciprocal BLAST analysis demonstrates a close evolutionary relationship and phylogenetic clustering between the ADAM18-like and ADAM3, ADAM20-like and ADAM4, and ADAM21-like and ADAM6 groups (Fig. 3). The identification of orthologs of ADAM3, 4, and 6 in the MacSP provides empirical evidence of sperm genes functioning in primate taxa that have been subsequently lost in the human lineage.
Genomic Analysis of the MacSP-The X chromosome is predicted to contain an underrepresentation of genes with male biased expression or function because (1) the X chromosome spends two-thirds of its life history in females, and may therefore be a selectively unfavorable location for such genes, and (2) the mammalian X chromosome is inactivated following the meiotic stages of spermatogenesis. These predictions have been supported in the analysis of male-biased expressed genes in Drosophila and mouse (31,32) and in the documented patterns of gene movement between the autosomes and the X chromosome (33,34). An analysis of the macaque sperm proteome further supports these predictions, as the chromosomal distribution of sperm proteome genes is not uniform and displays a 28.3% underrepresentation on the X chromosome (χ² = 4.91; p < 0.05).
Expression Analysis-Testis expression values for 1041 MacSP genes were extracted from previous macaque expression data sets (supplemental Table S2) and compared with the average expression of these genes in other tissues. In the relative fold change distribution (Fig. 5), the majority of genes (74%, 768/1041) fell in the interval between ±2-fold and were thus not significantly different from their relative expression in other tissues. A minority of genes were under-expressed by more than twofold (117/1041; 11%), and less than 1% (8/1041) were under-expressed fivefold or more. Only a modest fraction (156/1041; 15%) were overexpressed by >twofold, with only ~4% (37/1041) overexpressed by >fivefold compared with other tissues. Interestingly, only nine genes (Table IV) were overexpressed by >10-fold. Functional analysis of the 37 genes overexpressed by >fivefold revealed that five genes (SLC25A31, L1TD1, ZPBP, PACRG, and REEP6) are unannotated for molecular function. However, GO annotates ZPBP for the biological process "binding of sperm to the zona pellucida," and two genes, SLC25A31 and PACRG, are annotated for the cellular component "mitochondria." In humans, SLC25A31 is further annotated as having transporter activity, as being involved in transmembrane transport, and as a component of the flagellum. The remaining 32 genes were annotated for binding functions, including nucleotide, metal ion, and protein binding. Nearly half (15/32) also possess catalytic activity and are annotated as hydrolases and transferases. Of the 32 genes, a number were also annotated as having structural molecule activity (5/32) and enzyme regulator activity (5/32).
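The fold-change classes described above can be tallied in a few lines. The sketch below uses only the counts reported in this section; the bin labels and variable names are illustrative and do not reflect the authors' analysis code.

```python
TOTAL_GENES = 1041  # MacSP genes with testis expression data

# Counts reported in the text for each relative fold-change class.
fold_change_bins = {
    "within +/- 2-fold": 768,
    "under-expressed > 2-fold": 117,
    "overexpressed > 2-fold": 156,
}


def percentages(bins: dict, total: int) -> dict:
    """Return each bin's share of `total` as a percentage, rounded to one decimal."""
    return {label: round(100.0 * count / total, 1) for label, count in bins.items()}


# The three classes partition the full gene set.
assert sum(fold_change_bins.values()) == TOTAL_GENES

for label, pct in percentages(fold_change_bins, TOTAL_GENES).items():
    print(f"{label}: {pct}%")
```

The three classes sum to 1041, so the reported 74%, 11%, and 15% account for every gene in the comparison.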
Table IV details the annotated molecular functions and biological processes of the nine genes found to be more than 10-fold overexpressed in the testis. Despite being present in sperm and highly overexpressed in the testis, only three genes (SERPINA5, DDX4, and MYLK) are annotated as being involved with reproduction, and only SERPINA5 and DDX4 are annotated as being involved in spermatogenesis. Five genes are annotated as having nucleotide binding molecular functions, three of which are also annotated as nucleic acid binding proteins. The remaining two genes, TUBA3E and MYLK, bind GTP and ATP, respectively. Overall, the nine highly overexpressed genes possess a range of molecular functions and are involved in diverse biological processes. Our analyses further identified 242 genes expressed in macaque sperm that are significantly over- or underexpressed in the testis relative to other tissues (supplemental Table S2). Interestingly, these genes were split about evenly: 128 over- and 113 underexpressed. Although sperm originate in the testis, this result illustrates that testis tissue expression is not necessarily informative with regard to sperm composition. Sperm generally acquire proteins in two different tissues, the testis during spermatogenesis and the epididymis during the post-testicular maturation process. This analysis can, however, provide some information about the tissue of origin of sperm proteins. For example, the 128 significantly overexpressed testis genes represent a subclass of proteins with sperm-specific functions. Conversely, genes underexpressed in the testis, but whose protein products have been identified in the MacSP, may be translated and added to sperm during epididymal maturation. Additional functional genomic and bioinformatics analyses are needed to determine the overall significance of these protein classes. Fig. 6 shows a molecular function network analysis for the significantly over- and underexpressed genes.
Nodes enriched with overexpressed genes are shown in red, and those enriched with underexpressed genes are shown in green. The most abundant annotated functional categories of over- and underexpressed genes include catalytic, binding, and structural molecules (data not shown), all of which are represented in the network analysis. However, genes with oxidoreductase activity tend to be underexpressed in the testis as compared with other tissues, whereas genes with lyase, transferase, hydrolase, and peptidase activity tend to be overexpressed. A similar number of over- and underexpressed genes have a binding function; however, nucleotide binding proteins, including ribonucleotide and purine nucleotide binding proteins, tend to be overexpressed, along with calmodulin binding proteins. Genes with an unannotated molecular function comprise 34% of both overexpressed (44/128) and underexpressed (39/114) genes, similar in proportion to all genes of the MacSP.

FIG. 6. Network molecular function analysis for genes found to be expressed in macaque sperm that are significantly over- or under-expressed in the testis at p < 0.05 (242 genes total). Functional category nodes enriched for overexpressed genes are shown in red, and under-expressed shown in green.

DISCUSSION

We used strict parameters for peptide and protein inclusion to generate a robust Rhesus macaque sperm proteome containing 1247 proteins. These criteria required peptide identifications at greater than 95.0% probability as specified by the Peptide Prophet algorithm (18), and protein identifications were accepted only if they could be established at greater than 99.0% probability and contained at least 3 identified peptides. Previous sperm proteomics studies have used different search parameters, including proteins identified by two, or in some cases even one, peptide (2, 6, 7). We chose our parameters to ensure the accuracy of the identified proteins and to reduce false positives, which is particularly crucial for comparative studies of sperm proteome composition designed to gain evolutionary insights into sperm function and origin.
Relevance to Studies of Human Fertility-Despite many similarities, several functional differences can be observed between the macaque/mouse and macaque/human SP overlap networks (supplemental Fig. S1). Interestingly, the MacSP and MouseSP network contains a group related to complement activation, whereas this node is absent in the MacSP and HumanSP network. Indeed, the MacSP and MouseSP contain several proteins involved in complement activation, including C3, CD46, and CD55, that were not identified in the HumanSP. CD46 is a rapidly evolving protein in the mouse and is associated with the sperm acrosome, although its precise role in acrosome biology is unknown, whereas CD55 is thought to play a role in modulating the complement pathway (35, 36). In the sexual reproduction group, proteins involved in sperm-egg recognition are enriched in the MacSP and MouseSP network but not in the MacSP and HumanSP network. Further examination revealed three proteins present in the MacSP and MouseSP but absent from the HumanSP: ADAM3 (a pseudogene in humans), ZAN, and ZPBP2, all of which have previously characterized sperm phenotypes (see Table III). These differences could reflect the unique evolutionary trajectory of these proteins in the human lineage and therefore provide targets for future studies into their functional significance.
The MacSP and HumanSP network analysis reveals a number of functional categories that are not present in the MacSP and MouseSP overlap networks, including homeostasis, negative regulation of cell growth, cell cycle regulation, antigen processing and presentation, membrane organization, and maintenance of localization. Several proteins involved in membrane organization were identified in the MacSP and HumanSP but not in the MouseSP, including ANXA1, ANXA2, CACNA1A, HSPA4, RPS27A, and SYNE2. Two proteins involved in antigen processing and presentation, RPS27A and NPEPPS, were also absent from the MouseSP but present in the MacSP and HumanSP. Finally, the MacSP and HumanSP contain several proteins involved in maintenance of localization that were not identified in the MouseSP, including ARHGAP21, FLNA, GAA, JUP, SYNE1, SYNE2, and TLN1. Overall, the proteins common to the MacSP and HumanSP fall into a much broader range of functional categories than the proteins common to the MacSP and MouseSP. A deeper understanding of the significance of these differences awaits further MS verification and functional studies.
Several previous studies have used MS to investigate the proteomic composition of human ejaculates, a combination of sperm and seminal fluids. A previous study identified 923 proteins in seminal fluid and noted that a few proteins (e.g. semenogelins) present at high concentrations complicated identification of lower abundance proteins (16). Likewise, proteomic analyses of sperm from ejaculates could mask low abundance proteins and reduce the number of sperm proteins identified, while at the same time making it difficult to distinguish sperm proteins from seminal fluid proteins. For example, a previous study identified heat shock proteins HSPA5 and HSPD1 on the surface of ejaculated human sperm; however, it was unclear whether these proteins are components of sperm or simply adsorbed to the sperm surface following ejaculation (37). The identification of HSPA5 and HSPD1 in macaque epididymal sperm suggests that these proteins are sperm components before ejaculation. This finding is further supported by the identification of HSPA5 and HSPD1 in mouse epididymal sperm (9). Because the present study used epididymal sperm devoid of seminal fluid, its data can, by inference, be extended to the human sperm proteome.
A study by Johnston et al. identified 1760 proteins in human sperm, but the resulting list of sperm proteins was not published (10). Moreover, this study acknowledged that many of the proteins found on ejaculated sperm are synthesized in the accessory sex glands and acquired long after the morphological differentiation of the spermatozoa, potentially indicating that a large portion of the data set was composed of seminal proteins rather than sperm proteins. A subsequent study, using proteins separated into Triton X-100 soluble and insoluble fractions, identified 1053 proteins in human sperm (6). This data set contained semenogelins, which are a predominant component of semen (38-40). GO functional analysis also revealed that the most prevalent proteins are those with binding and catalytic functions (6), consistent with our analysis of the MacSP. Additionally, we observed that oxidoreductases, including oxidoreductases acting on NADH and NADPH, and proteins with electron carrier and antioxidant activity are highly enriched in the macaque, mouse, and human sperm proteomes. The abundance of these GO categories presumably reflects the demand on sperm to generate energy for motility.
We compared the human sperm proteome (the 925 of 1053 proteins for which we were able to identify macaque orthologs) with the MacSP and found 368 proteins common to both proteomes. Based on their evolutionary relationship, it was expected that the macaque and human sperm proteomes would share a higher degree of overlap than the macaque and mouse sperm proteomes. Interestingly, we observed a higher degree of overlap between the macaque and mouse sperm proteomes. This observation may be because of a number of factors, including (i) contamination of the HumanSP with seminal proteins, leading to the problems discussed above, (ii) differences in sample preparation and MS, leading to selection or enrichment of particular types of proteins, or (iii) significant evolutionary differences in sperm composition between humans and other primates, reducing the amount of overlap observed between the Rhesus macaque and human sperm proteomes.
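The overlap comparison amounts to intersecting two ortholog-mapped protein sets. The following is a minimal sketch, assuming each proteome is represented as a set of identifiers already mapped to macaque orthologs; the function name and the placeholder identifiers are hypothetical and are not the study's data.

```python
def overlap_fraction(reference: set, query: set) -> float:
    """Fraction of `reference` proteins that are also present in `query`."""
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(reference & query) / len(reference)


# Placeholder identifiers only (not real proteome membership).
human_mapped = {"P1", "P2", "P3", "P4"}
macaque = {"P1", "P2", "P3", "P9"}
print(overlap_fraction(human_mapped, macaque))  # 0.75

# Using the counts reported in the text: 368 shared of 925 mapped human proteins.
print(round(100 * 368 / 925))  # 40
```

With the reported counts, the shared proteins amount to about 40% of the 925 ortholog-mapped human sperm proteins.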
Sperm Proteasome-The sperm proteasome has recently become of interest because of its likely role in facilitating penetration through the egg vitelline coat during fertilization, sperm capacitation, and the acrosome reaction (41, 42). GO analysis revealed that the MacSP, MouseSP, and HumanSP are all highly enriched for proteins in the proteasome complex, as determined using g:Profiler (p = 5.86e-10, p = 2.08e-28, and p = 3.93e-24, respectively). The 26S proteasome functions to degrade ubiquitinated proteins and consists of a 20S core particle and a 19S regulatory complex positioned at one or both ends of the core. We identified 28 proteasome proteins in the MacSP (Fig. 7), including PSMA1-6, PSMA8 (a testis-specific paralog of PSMA7), and PSMB1-7, which together comprise the entire 20S core particle; PSMC3, 4, 5, and 6 and PSMD1-2, which comprise most of the 19S base structure; and PSMD3, 7, and 11-14, which comprise most of the 19S lid structure. A previous study found that one component of the alpha proteasome complex, PSMA3, was reduced in patients with asthenozoospermia as compared with normozoospermic patients, suggesting that proteasome activity (or some unknown component of the proteasome) may play a role in sperm motility (43). We also identified PSMG2, which forms a heterodimer with PSMG1 (not identified) and promotes assembly of the 20S core subunit (44).
Finally, an alternate proteasome activator protein, PSME4 (PA200), was also identified in the MacSP. PSME4 has been proposed to replace the 19S regulatory complex at one or both ends of the 20S core particle (reviewed in (45)) and to play a role in DNA double strand break repair (46); however, mice lacking this gene show no developmental defects (47). These mice did, however, exhibit a marked reduction in male, but not female, fertility as a result of pre- and postmeiotic defects in spermatogenesis that reduce the number of normal sperm (47). Interestingly, PSME4 exhibits broad tissue expression, with the highest level of expression detected in the testis. Identification of a myriad of proteasome subunits in mature sperm suggests that mature proteasomes are present in sperm as they leave the epididymis and that different proteasome complexes, such as those employing the 19S or PSME4 (PA200) regulatory complex, may exist in mature sperm.
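The subunit shorthand used in this section can be expanded mechanically to check the tally of proteasome proteins. The sketch below is illustrative: the helper name is hypothetical, the group labels follow the text, and counting PSMG2 and PSME4 toward the total of 28 is an assumption about how the proteins were tallied.

```python
import re

# A gene-family prefix followed by numbers and/or numeric ranges,
# e.g. "PSMA1-6" or "PSMD3,7,11-14".
_SHORTHAND = re.compile(r"([A-Z]+)(\d+(?:-\d+)?(?:,\d+(?:-\d+)?)*)")


def expand(shorthand: str) -> list:
    """Expand subunit shorthand into a list of individual gene symbols."""
    match = _SHORTHAND.fullmatch(shorthand)
    if match is None:
        raise ValueError(f"unrecognized shorthand: {shorthand!r}")
    prefix, spec = match.groups()
    symbols = []
    for part in spec.split(","):
        start, _, end = part.partition("-")
        stop = int(end) if end else int(start)
        symbols.extend(f"{prefix}{n}" for n in range(int(start), stop + 1))
    return symbols


# Subunits as listed in the text, grouped by proteasome substructure.
groups = {
    "20S core": ["PSMA1-6", "PSMA8", "PSMB1-7"],
    "19S base": ["PSMC3,4,5,6", "PSMD1-2"],
    "19S lid": ["PSMD3,7,11-14"],
    "assembly factor / alternate activator": ["PSMG2", "PSME4"],  # assumed counted
}

expanded = {
    name: [symbol for item in items for symbol in expand(item)]
    for name, items in groups.items()
}
for name, symbols in expanded.items():
    print(f"{name}: {len(symbols)}")
print("total:", sum(len(symbols) for symbols in expanded.values()))
```

Under this grouping the expansion yields 14 core, 6 base, and 6 lid subunits, which together with PSMG2 and PSME4 gives 28.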
Divergence of Mating Systems and Proteome Diversification-Aminopeptidases-Although their precise function in sperm has yet to be elucidated, aminopeptidases appear to play an important role in sperm development and function, and several members are constituents of sperm across diverse taxa. In Drosophila, the leucine aminopeptidase gene family has undergone significant expansion and functional diversification and represents the most abundant proteins in sperm; however, their function(s) have yet to be determined (48). Four aminopeptidases were identified in the MacSP: leucine aminopeptidase 3, LAP3; aminopeptidase-like 1, NPEPL1; an aspartyl aminopeptidase, DNPEP; and a puromycin-sensitive aminopeptidase, NPEPPS. All four aminopeptidases identified in the MacSP have been shown in other mammals to play crucial roles in sperm development or function and are likely conserved constituents of sperm across a variety of taxa; however, further studies are required to understand their roles in sperm biology.
LAP3 has also been identified in the mouse (9), rat (7), human (6), and Drosophila (2) sperm proteomes and thus appears to be a conserved sperm constituent across taxa. The MacSP also includes a paralog of LAP3, NPEPL1 (ensembl.org), also identified in the rat sperm proteome (7). A previous study identified both LAP3 and NPEPL1 as potential targets for protein S-nitrosylation (49), which has been proposed to occur in the female reproductive tract because oviduct cells produce biologically significant levels of NO (50). It is thus possible that this modification plays a key role in activating these proteins and that they play a role in capacitation, the acrosome reaction, or sperm-egg binding.
DNPEP has also been identified in the mouse (9), rat (7), and human (5) sperm proteomes, and a recent MS study identified DNPEP in bovine epididymosomes from both the caput and cauda epididymis (51). Epididymosomes have been proposed to play a role in the transfer of proteins to sperm during the maturation process in the epididymis. However, only a few proteins are known to be acquired by this mechanism. Although it remains unclear whether DNPEP is added to sperm via epididymosomes during the maturation process, this protein has been suggested to play a role in remodeling epididymal sperm components (51). A previous study identified DNPEP in mouse cauda epididymal sperm and found that peptide intensity, as measured by MS, increased more than twofold in capacitated sperm, suggesting significant posttranslational modification (52). This study also predicted that DNPEP has a phosphorylation motif and thus may undergo posttranslational modification during capacitation.

FIG. 7. Graphical representation of the 26S proteasome. Subunits identified in the MacSP are shaded, whereas those that were not identified remain white. Every subunit comprising the 20S core particle (PSMA1-6, PSMA8, and PSMB1-7) was identified, as well as PSMG2 (not pictured), which functions to facilitate 20S assembly. The majority of the base and lid complexes, which comprise the 19S regulatory particle, were also identified. Although not depicted, PSME4 (PA200) was also identified in the MacSP and functions to replace one or both of the 19S regulatory particles. Fig. adapted from KEGG (www.genome.jp/kegg/).
NPEPPS, a puromycin sensitive aminopeptidase, was also identified in the rat and human sperm proteomes and is believed to play a role in male fertility and protection against protein aggregation (53). NPEPPS deficient mice exhibit a wide range of phenotypes including loss of behavioral mating activity, germ cell degeneration, lower testis weight, sperm number, and sperm motility (54).
ADAM Proteins-Hydrolases were one category of catalytic proteins found to be highly enriched in the MacSP (p = 4.83e-12) and include peptidases and, in particular, metalloproteases. The MacSP contains several members of the ADAM (a disintegrin and metalloprotease) family of proteins, including ADAM1, 2, 10, 30, and 32. These ADAMs, excluding ADAM10, are known to exhibit testis-specific or testis-biased expression and have been previously identified in mature sperm.
ADAMs 1 and 2 are expressed as heterodimers on the sperm surface, are essential for sperm migration through the female reproductive tract and for zona pellucida binding on the egg, and have also been suggested to play a role in the regulation and localization of sperm proteins (55). Although present in many nonhuman primates, ADAM1 is nonfunctional in humans as a result of a variety of mutations that disrupt the reading frame (56). Many ADAMs involved in reproduction and expressed on the sperm surface have been found to be under positive selection (9, 57), and in the macaque the adhesion domain of ADAM2 has been found to be under accelerated evolution (57). The function of ADAMs 30 and 32 in sperm is unknown; however, previous work (58, 59) determined that they are localized on the sperm surface, and ADAM32 has been suggested to play a role in sperm development or fertilization. Notably, ADAM30 is not found on the surface of testicular sperm (58), suggesting that it may be acquired during epididymal transit.
Additionally, we identified ADAM10 in the MacSP, which to our knowledge is the first time this ADAM has been identified as a constituent of mature sperm. ADAM10 is expressed in a variety of tissues and is highly expressed in the central nervous system (60). Previous studies have attempted to generate ADAM10 knockout mice; whereas heterozygous males have normal phenotypes and are fertile, homozygous mice are not viable and die early in embryogenesis (61). Although heterozygous males were fertile, this study did not assay whether they exhibited a reduction in fertility or abnormal sperm physiology or number, which remains to be elucidated. ADAM10 did not exhibit signs of positive selection in the mammalian lineage (57), which suggests that ADAM10 may play a role in a conserved cellular process unrelated to sperm-egg binding. Although the role of ADAM10 in mature sperm has yet to be investigated, it may play a role in DNA damage induced apoptosis, as ADAM10, along with ADAM17, increases in abundance and cell surface localization in rat spermatocytes during DNA damage induced apoptosis (62). ADAM10 has previously been implicated in the NOTCH and EGFR signaling pathways. Given the known role of the EGFR signaling pathway in both capacitation and the acrosome reaction (reviewed in (63)), ADAM10 may play a role in one or both of these processes.
The MacSP also contains three ADAM-like proteins annotated as ADAM18-, 20-, and 21-like. Protein BLAST queries of these proteins identified higher similarity to other ADAMs, and further examination of the ADAM family phylogenies using all available ADAM sequences (obtained from NCBI) for the human, chimpanzee, mouse, and rat clearly places ADAM18, 20, and 21 in their own monophyletic group (Fig. 3). This analysis allowed us to assign ADAM18-like to the ADAM3 group, ADAM20-like to the ADAM4 group, and ADAM21-like to the ADAM6 group. Interestingly, human ADAMs 3, 4, and 6 are annotated as pseudogenes (64, 65) and thus are not present in the human protein database. This most likely explains the mis-annotation of the macaque ADAMs 3, 4, and 6, given that annotation of the macaque proteome was likely based largely on the human (12). Although the human genome contains two copies of ADAM3 (CYRN1 and CYRN2), a previous study demonstrated that 27% of fertile men carry a large deletion in CYRN1 and that CYRN2 contains stop codons in all three reading frames (66). Although CYRN1 and CYRN2 are transcribed, they are not translated and are thus nonfunctional (66). Most pseudogenes exist as disabled copies of functional parent genes; however, a recent study identified ADAM3 as a unitary pseudogene in humans. A unitary pseudogene has no corresponding functional parental gene, and loss of ADAM3 function is thought to have occurred in the human lineage after human-chimpanzee divergence (65). Likewise, the human ADAM4 genes contain a number of stop codons, whereas the ADAM6 gene has a frameshift mutation, resulting in nonfunctional pseudogenes (64). The identification of orthologs of ADAM3, 4, and 6 in the MacSP provides empirical evidence of sperm genes functioning in primate taxa that have been subsequently lost in the human lineage. In species for which ADAMs 3, 4, and 6 have been studied, they exhibit testis-biased expression and have been identified in sperm (reviewed in (59)).
Despite the widespread use of rodents in studies of male infertility, nonhuman primates (especially Old World Monkeys such as the Rhesus macaque) are indispensable models for the study of a wide range of physiological, hormonal, and molecular reproductive processes in humans. Characterization of the macaque sperm proteome, using highly sensitive and robust proteomic and statistical methodologies, has identified a wealth of proteins with identified or predicted functions in human or mouse sperm and therefore provides a powerful foundation for future molecular genetic studies of human spermatogenesis and, importantly, studies that link population-level genetic variation to sperm function and competitive ability. Previous studies using similar proteomic approaches in humans have been limited by the availability of "pure" sperm isolated from the epididymis, further emphasizing the relevance of the macaque as a model for the study of male fertility. The macaque sperm proteome has also highlighted specific cases where the molecular basis of sperm function is divergent between closely related primate species and thus provides novel empirical support for the rapid evolution of genetic systems that differentiate humans from their closest relatives.

* This article contains supplemental Fig. S1 and Tables S1 to S4.
‖ To whom correspondence should be addressed: Center for Infectious Diseases and Vaccinology and Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University, Tempe, AZ.