Parallel Proteomics to Improve Coverage and Confidence in the Partially Annotated Oryctolagus cuniculus Mitochondrial Proteome

The ability to decipher the dynamic protein component of any system is determined by the inherent limitations of the technologies used, the complexity of the sample, and the existence of an annotated genome. In the absence of an annotated genome, large-scale proteomic investigations can be technically difficult. Yet the functional and biological species differences across animal models can lead to selection of partially or nonannotated organisms over those with an annotated genome. The outweighing of biology over technology leads us to investigate the degree to which a parallel approach can facilitate proteome coverage in the absence of complete genome annotation. When studying species without complete genome annotation, a particular challenge is how to ensure high proteome coverage while meeting the bioinformatic stringencies of high-throughput proteomics. A protein inventory of Oryctolagus cuniculus mitochondria was created by overlapping “protein-centric” and “peptide-centric” one-dimensional and two-dimensional liquid chromatography strategies; with additional partitioning into membrane-enriched and soluble fractions. With the use of these five parallel approaches, 2934 unique peptides were identified, corresponding to 558 nonredundant protein groups. 230 of these proteins (41%) were identified by only a single technical approach, confirming the need for parallel techniques to improve annotation. To determine the extent of coverage, a side-by-side comparison with human and mouse cardiomyocyte mitochondrial studies was performed. A nonredundant list of 995 discrete proteins was compiled, of which 244 (25%) were common across species. The current investigation identified 142 unique protein groups, the majority of which were detected here by only one technical approach, in particular peptide- and protein-centric two-dimensional liquid chromatography. 
Although no single approach achieved more than 40% coverage, the combination of three approaches (protein- and peptide-centric two-dimensional liquid chromatography and subfractionation) contributed 96% of all identifications. Parallel techniques ensured minimal false discovery, and reduced single peptide-based identifications while maximizing sequence coverage in the absence of the annotated rabbit proteome.

The ability to detect the dynamic protein component of any system is determined by the inherent limitations of the technologies used, the complexity of the sample, and the existence of an annotated genome. Although there are numerous annotated species, species variations can, on occasion, make it impossible to compromise on the choice of model organism. With the underlying biological question an important factor in experimental design, a significant limitation to high-throughput proteomics is the inherent difficulty associated with the selection of a non- or partially sequenced and/or annotated species. Species differences result in cellular and molecular heterogeneity, even though broad functional homogeneity is retained. The heart, for example, although functioning as a muscular pump, shows vast species heterogeneity, including major distinctions in the rate of contraction between rodents (600 beats/minute; mouse) and larger mammals (60 beats/minute; dog) (1). The species discrepancy in heart rate has been shown to be unrelated to myofibrillar density (2). Instead, there is a close relationship between heart rate and mitochondrial content, oxygen consumption, and oxidative capacity (1-3). These species differences increase the possibility of differences in their respective cardiomyocyte proteomes.
Oryctolagus cuniculus (rabbit) is often selected for myocardial studies, along with other species lacking deep genome coverage, including dog (3081 UniProtKB identifiers), pig (7948 identifiers), and to a lesser extent, bovine (15,575 identifiers). Because the rabbit whole-genome shotgun sequence is not complete, annotation of the associated proteins is also incomplete. The challenges that this presents to proteomic studies are often outweighed by the physiology and pathophysiology of the species being more similar to human. Animal models are often used because obtaining sufficient amounts of appropriate human tissue can be challenging. Human myocardium is generally scarce and rarely obtained in large quantities from nonpathophysiological states, potentially influencing protein localization and modification status, thereby altering the types of protein identified. The applicability to human tissue is the gold-standard measurement of any animal model. For cardiac studies, functional discrepancy between rodents and large mammals is a major consideration for species selection (4-8). Rabbit myocardium is a particularly suitable cardiac model, as it is the smallest mammalian species that accurately mimics several human physiological parameters, including heart rate, electron transport chain coupling, and mitochondrial density (2). For the purposes of the current study, we aimed to "reverse engineer" the rabbit cardiomyocyte mitochondrial proteome through the combined use of (i) multiple high-throughput proteomic techniques and (ii) sequence homology by mammalian phylogeny (9).
The mitochondria have been implicated in numerous disease models including diabetes (10), cardiovascular disease (as reviewed by (11)), cancer (12), and neurological disorders (13). Although the overall role is generally uniform, mitochondria have evolved to facilitate diverse morphological and functional roles (14), meeting the specific needs of not only the individual tissue, but also the species (15-17). Previous mitochondrial proteomic studies have observed this tissue distinction at the protein level (18-20). The numerous mitochondrial protein inventories created to date, most commonly by proteomics and GFP targeting, have utilized species with sequenced genomes.
Previous mitochondrial proteomic investigations have utilized a range of approaches, both gel-based and gel-free. The combination of one-dimensional gel electrophoresis (1-DE) and on-line peptide fractionation (GeLC-MS; gel-based) has been successfully applied to the study of the cardiomyocyte mitochondria from well-annotated species, Mus musculus (537 protein groups; (20)) and Homo sapiens (542 protein groups; (21)), whereas a gel-free approach identified 406 protein groups in Homo sapiens (22). Mitochondrial proteomic profiles have also been generated across multiple organs obtained from mouse (591 protein groups from brain, heart, liver, and kidney; (23)) and rat (689 protein groups from muscle, heart, and liver; (18)) with the use of GeLC-MS. In investigations of mitochondria enriched from the liver, gel-free methods resolved 297 protein groups (mouse; (24)), with further partitioning of the inner mitochondrial membrane identifying 348 and 182 protein groups from rat and mouse liver, respectively (25,26). The commonality across these large proteomic studies is the use of species with well-annotated genomes. By comparison, mitochondrial proteomics performed in species with incomplete genome annotation has, to date, been achieved using traditional two-dimensional electrophoresis (27,28).
To help circumvent the need for annotated genomes, traditional two-dimensional gel electrophoresis (2-DE) methods can be utilized (8,29,30), whereby with adequate separation, single proteins can be resolved for analysis with mass spectrometry. The mitochondria can, however, pose distinct challenges for such approaches, because of the bias toward basic proteins (31,32) and the extensive inner and outer mitochondrial membranes (33). These membranes create functionally dissimilar compartments, with the mitochondrial matrix (internal to the inner mitochondrial membrane) and intermembrane space (between the inner mitochondrial membrane and outer mitochondrial membrane) containing additional proteins with essential functions. Previous studies have shown the impact of the intrinsic properties of mitochondrial proteins on 2-DE analysis, with only 77 unique inner mitochondrial membrane proteins identified by comparison with 342 identified using gel-free methodologies (32). Although 1-DE approaches are not subject to the same limitations, alone they are insufficient to observe low-abundance proteins. Gel-free methods are assumed to be more amenable to such investigations (as reviewed by (34,35)) but, as mentioned previously, have the prerequisite of an annotated genome.
We hypothesize that depth and breadth of coverage of the partially annotated rabbit mitochondrion would be enhanced by using more than one gel-free protein or peptide separation method on the intact mitochondria or their subproteomes, with success ultimately dependent upon the ability of the selected technique to adequately solubilize and partition (as reviewed by (36,37)) the mitochondrial proteins. Furthermore, because of the lack of complete genome coverage of the rabbit, protein sequence homology from other mammals with phylogenetic relationships (9) must be used. We therefore also aim to determine the ability of these parallel methodologies to minimize (i) false discovery rates; (ii) redundancy at numerous levels; and (iii) "one-hit wonders," while (iv) maximizing sequence coverage; these being factors that influence the ultimate success of such a study.
Subfractionation of Membrane-Enriched Proteins of the Mitochondria-Inner mitochondrial membranes were isolated according to previously reported protocols (38), and mitochondrial membranes were enriched by the modified sodium carbonate precipitation method as previously described (34,39) using the Bio-Rad Membrane II Protein Extraction Kit (Bio-Rad, Hercules, CA). The membrane-enriched pellets from both preparations were resolubilized in either 2.5% n-octyl β-D-glucopyranoside (protein-centric) or 6 mol/L urea/2 mol/L thiourea (peptide-centric). To limit potential interference from lipids associated with the mitochondrial membranes, delipidation was performed by phase partitioning using the ProteoSolve LRS Kit (Pressure Biosciences Inc., West Bridgewater, MA). This step was only performed on the membrane-enriched preparations because of the relatively high ratio of lipid to protein suggested in the mitochondrial membranes (39). In addition to investigating the membrane-enriched fraction, the soluble fraction produced from the sodium carbonate precipitation method was also analyzed to observe loosely associated and soluble proteins. Soluble and associated proteins were concentrated and desalted using solid phase extraction (C-18 Sep-Pak cartridges, Waters Corp., Milford, MA) prior to analysis. Collectively, these fractionated samples will be regarded as "fractionated mitochondria."
Protein-centric 2-DLC (PF2D)-Analysis of denatured, intact mitochondrial proteins was carried out on a PF2D (Beckman-Coulter, Fullerton, CA) as previously described (32,40,41).
Briefly, proteins were separated in the first dimension through a chromatofocussing column (250 mm × 2.1 mm, Eprogen, Darien, IL), where proteins were fractionated through a gradient formed through the transition from 100% start buffer (6 mol/L urea, 25 mmol/L Bis-Tris, pH 8.5) to 100% eluent buffer (6 mol/L urea, 10% v/v Polybuffer 74 (GE Healthcare), pH 4 in 20% isopropanol) at a flow rate of 0.2 ml/min as previously described (40,41). Following sample injection, a stable baseline was established at pH 8.5 for 20 min prior to the initiation of the pH gradient, following which the column was washed with 1 mol/L NaCl. Elution profiles were monitored at 280 nm. Fractions were collected every 5 min while stable pH was detected (pH 8.5 and 4.0), and during the pH gradient fractions were collected at 0.3 pH intervals. Fractions from the first dimension were sequentially injected onto the second dimension reverse phase-high performance liquid chromatography (RP-HPLC) column (33 mm × 4.6 mm, 1.5 µm nonporous ODS-IIIE C18 silica beads, Eprogen). Protein elution from RP-HPLC at a constant 50°C was monitored at 214 nm from injection (minute 0) through until 100% solvent B. Fractions were collected every 0.25 min during a 15-min linear gradient from 25% to 75% solvent B (solvent A: 0.1% trifluoroacetic acid in ddH2O; solvent B: 0.08% trifluoroacetic acid in 100% acetonitrile) at a flow rate of 0.75 ml/min. Those chromatofocussing fractions collected during stable pH were pooled prior to injection onto RP-HPLC. In this case, fractions were collected every 0.25 min during a 40-min linear gradient from 30% to 70% solvent B at a flow rate of 0.75 ml/min. All reverse-phase fractions were stored at −80°C for further analysis.
Protein-centric 1-DLC-Both intact and fractionated mitochondria were separated by RP-HPLC (33 mm × 4.6 mm, 1.5 µm nonporous ODS-IIIE C18 silica beads, Eprogen). A 125-µg aliquot of solubilized proteins was diluted in 0.1% trifluoroacetic acid/10% acetonitrile prior to loading onto RP-HPLC, kept constant at 50°C. Fractions were collected every 0.4 min during a 40-min linear gradient from 30% to 70% solvent B at a flow rate of 0.75 ml/min. All reverse-phase fractions were stored at −80°C for further analysis. To complement the linear separation of proteins, a stepwise (sawtooth) gradient from 10% to 100% B was also implemented. Proteins were eluted from the RP-HPLC in packets, as determined by prolonged (5 min) steps of solvent B. Fractions were collected every 1.25 min from 10%, 20%, 30%, 40%, 50%, and 60% solvent B. Using electrospray ionization-tandem mass spectrometry (ESI-MS/MS), it was determined that proteins were only detectable in fractions collected at 10%-40% solvent B (data not shown). The four fractions collected during these steps were pooled prior to storage at −80°C.
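The linear gradients described above can be sketched numerically. The following minimal Python example is an illustration only: the function names are hypothetical, and the fraction counts it derives (for example, 100 fractions for a 40-min gradient sampled every 0.4 min) follow from the stated collection intervals rather than from figures reported in the text.

```python
def percent_b(t_min, start_b, end_b, duration_min):
    """Solvent B percentage at time t_min of a linear gradient.

    Times outside the gradient window are clamped to its start or end.
    """
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (end_b - start_b) * t / duration_min


def fraction_count(duration_min, interval_min):
    """Number of fractions collected at a fixed interval across the gradient."""
    return round(duration_min / interval_min)


# 40-min gradient from 30% to 70% solvent B, fractions every 0.4 min.
midpoint_b = percent_b(20, 30, 70, 40)
n_fractions_1dlc = fraction_count(40, 0.4)

# 15-min gradient from 25% to 75% solvent B, fractions every 0.25 min.
n_fractions_2dlc = fraction_count(15, 0.25)

print(midpoint_b, n_fractions_1dlc, n_fractions_2dlc)
```

Under these assumptions the 40-min gradient rises by 1% solvent B per minute, so each 0.4-min fraction spans roughly 0.4% solvent B.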
Peptide-centric Approaches-Following urea/thiourea solubilization of both intact and fractionated mitochondria, proteins (125 µg) were reduced and alkylated with 5 mmol/L tris(2-carboxyethyl)phosphine and 10 mmol/L iodoacetamide. Proteins were initially digested in limiting amounts of Lys-C for 6 h at room temperature. Given the high concentration of urea, the mixture required dilution in five volumes of 50 mmol/L ammonium bicarbonate prior to proteolysis in sequencing grade modified porcine trypsin. To prevent chemical modifications, the digestion was performed at 25°C for 24 h. Prior to peptide-centric 1-DLC and 2-DLC, peptides were desalted and concentrated with the use of C-18 SPE Sep-Pak cartridges (Waters Corp., Milford, MA), following which the samples were dried and stored at −80°C. Peptides prepared for peptide-centric 1-DLC were loaded directly into on-line RP-HPLC ESI-MS/MS. For peptide-centric 2-DLC, the peptide mixture was fractionated by strong cation exchange chromatography (SCX) on an 1100 RP-HPLC system (Agilent, Santa Clara, CA) using a PolySulfoethyl A column (2.1 × 100 mm, 5 µm, 300 Å, PolyLC, Columbia, MD), by first dissolving the sample in 4 ml SCX loading buffer (25% v/v acetonitrile, 10 mmol/L KH2PO4 pH 2.8, adjusted with 1 N phosphoric acid). The sample was loaded onto the column and washed isocratically for 30 min at 250 µl/min. Peptides were eluted by a gradient of 0-350 mmol/L KCl (25% v/v acetonitrile, 10 mmol/L KH2PO4 pH 2.8) over 40 min at a flow rate of 250 µl/min. The 214 nm absorbance was monitored and 15 SCX fractions were collected along the gradient. Each SCX fraction was dried down prior to the second dimension.
Identification of Unique Peptides and Proteins-Tandem mass spectra were extracted within Sorcerer (Sage-N, Milpitas, CA) using the ReAdW program. Charge state deconvolution and deisotoping were performed. All tandem mass spectrometry (MS/MS) samples were analyzed using Sequest (ThermoFinnigan, San Jose, CA; version v.27, rev. 11). Sequest was set up to search against the SWISS-PROT mammalian database (70894 entries; Protein Knowledgebase release 55.0) from UniProt knowledgebase (UniProtKB; SWISS-PROT + TrEMBL), assuming trypsin digestion, with a maximum of two missed cleavages. Reverse database searching was performed simultaneously to ensure false discovery rates (FDR) were minimal (FDR ≤ 3.7%). The rate was slightly relaxed, given the lack of the annotated rabbit genome. Sequest was searched with parent ion tolerance set to either 0.06 Da (Orbitrap) or 1.2 Da (LTQ; monoisotopic) and fragment tolerance set to 1.0 Da. Oxidation of methionine was specified in Sequest as a variable modification. Iodoacetamide derivative of cysteine was specified in Sequest as a fixed modification. In total, 763,475 scanning events took place across all preparations, including all technical replicates. Scaffold (version Scaffold_2_02_00, Proteome Software Inc., Portland, OR) was used to validate MS/MS-based peptide and protein identifications. All Sequest-derived peptide identifications were analyzed manually (44,098 spectra) if they had deltaCn scores greater than 0.10 and XCorr scores greater than 2.1, 3.5, and 3.5 for doubly, triply, and quadruply charged peptides, respectively. Those that did not meet these criteria were excluded immediately (719,377 scanning events). For inclusion, spectra were required to fulfill numerous requirements as outlined previously (42).
Briefly, these requirements included (i) assignment of the majority of ions detected, including the most intense peaks, to parent or daughter ions and their associated peaks (arising from loss of water or amine); (ii) assignment of large ions (not related to the parent ion) to Pro/Asp; (iii) at least five isotopically resolved ions in sequential order, from both b- and y-ion series, matching theoretical peptide fragments; and (iv) a maximum of one peak resolving with sufficient s/n ratios that could not be assigned to parent and/or daughter ions and associated peaks. A total of 6521 spectra failed to meet these criteria. For those spectra that met requirements (iii) and (iv) but failed to meet (i) and (ii), de novo sequencing was attempted (3947 spectra), as previously described (8). Briefly, amino acid sequences were deduced from the mass differences between ions of the y- or b-ion "ladder," following which peptide sequences were used to search the UniProtKB (SwissProt and TrEMBL) database using the program BLASTP "short nearly exact matches" (43). For inclusion of these de novo sequenced peptides, requirements (i) and (ii) needed to be met. Those that did not meet these criteria were excluded (10,459 spectra). In the end, 33,639 spectra met all the requirements for inclusion. No singly charged peptides were examined, given that ESI imparts a single charge, in addition to the charged C-terminus created as a result of tryptic digestion. Protein identifications were accepted if they contained at least one identified peptide (Supplemental Table 1). As previously described, single identifying peptides (or one-hit wonders) were excluded if they were (i) not present in both technical replicates or (ii) only sequenced and matched once within an individual dataset (44). For those proteins with only one identifying peptide (Supplemental Fig. 1) that met the criteria for inclusion, verification was performed by manual interpretation of the MS/MS spectra.
Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. Removal of possible sources of redundancy was achieved at three levels (naming redundancy, sequence redundancy, and species redundancy) by external validation of every sequenced peptide with the use of the BLASTP program, searching the UniProtKB. Within each experiment, numerous individual peptides were sequenced on multiple occasions (Supplemental Table 2). The number of MS/MS replicates for each peptide ranged from 2 to 35. In addition, identification of a specific protein isoform was included if a peptide or an amino acid sequence unique to that particular isoform was observed. To enable cross-species comparisons with previously published human (21,22) and mouse (20) cardiomyocyte preparations, all proteins detected were mapped to Mouse UniProtKB (Taxonomy 10090; 59,533 identifiers) identifiers. To limit the species effects, all proteins detected were also mapped to Human UniProtKB (Taxonomy 9606; 95,621 identifiers) identifiers. When a protein could not be mapped with 99% homology, the BLASTP "short nearly exact matches" program was used to identify the protein with the closest homology. If the homology was less than 90%, then it was included as a separate entry.
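The score-based prefilter and the spectrum accounting described in this section can be expressed as a short Python sketch. It is illustrative only: the function and variable names are hypothetical, the thresholds and counts are those quoted in the text, and the manual inspection criteria (i) to (iv) are not modeled.

```python
# XCorr minimum by precursor charge state, as quoted in the text.
XCORR_MIN = {2: 2.1, 3: 3.5, 4: 3.5}
DELTA_CN_MIN = 0.10


def passes_score_filter(charge, xcorr, delta_cn):
    """Return True if a peptide-spectrum match qualifies for manual review."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        # Singly charged ions (and any charge state not listed) were not examined.
        return False
    return xcorr > threshold and delta_cn > DELTA_CN_MIN


# Spectrum counts quoted in the text.
total_scans = 763_475
manually_reviewed = 44_098
excluded_on_review = 10_459

excluded_by_score = total_scans - manually_reviewed
accepted = manually_reviewed - excluded_on_review

print(excluded_by_score)  # 719377
print(accepted)           # 33639
```

The two differences reproduce the 719,377 scanning events excluded on score alone and the 33,639 spectra ultimately accepted.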

RESULTS AND DISCUSSION
The aim of the current study was to use proteomic technologies to improve coverage of the partially annotated rabbit mitochondrial proteome through the use of five parallel methodological approaches. Current coverage of the Oryctolagus cuniculus (rabbit) protein database represents <3% of that for human (2501 rabbit versus 95,621 human identifiers in the UniProtKB database). Traditionally, this would limit the ability to undertake high-throughput proteomics; however, with the use of multidimensional fractionation to ensure adequate partitioning of the complex sample and sequence homology across mammalian groups with phylogenetic relationships to rabbits (including rodentia and primates), we have attempted to circumvent this. To determine the success of the current study in achieving this aim, we performed a cross-experimental comparison with previous cardiomyocyte mitochondrial studies undertaken in species with deep genome coverage.
The current study used between 1 (peptide 1-DLC) and 3 dimensions (subfractionation) of separation (Fig. 1) including: (i) peptide-centric 1-DLC; (ii) peptide-centric 2-DLC; (iii) protein-centric 1-DLC; (iv) protein-centric 2-DLC; and (v) subfractionated mitochondria (Table I) to partition mitochondrial proteins and peptides prior to mass spectrometry. When taken together, these five approaches enabled identification of 558 nonredundant proteins and 2934 unique peptides, in the absence of a fully annotated genome (Table I). The overall coverage of the cardiomyocyte mitochondrial proteome was improved by this parallel approach, where the total number of nonredundant peptides was nearly twofold greater than the number of peptides identified by any single technique (subfractionation; 1520 peptides). This was also reflected in the total number of nonredundant proteins, which was nearly 1.5-fold greater than any single approach (peptide-centric 2-DLC; 379 proteins). Of the 558 nonredundant proteins that were identified, there was a bimodal distribution across the pH spectrum (Supplemental Fig. 2a), with a skew toward the low molecular mass range (10-20 kDa; 21%; Supplemental Fig. 2b). Each method also showed a partiality for a specific biochemical property; for example, protein-centric 2-DLC identified a significant proportion of proteins with high pI (>9.5), and peptide-centric 2-DLC displayed a clear propensity for neutral proteins (pH 7) (Supplemental Fig. 3). We found that the multifaceted approach, one that utilizes multiple separation methods and the intrinsic biological features of mitochondria, was required to create a protein inventory that is equivalent to those from annotated species, as no single method would provide sufficient depth of coverage.
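The gain from combining the approaches can be checked directly from the totals quoted above. This is a minimal arithmetic sketch in Python; the function name is hypothetical, and the inputs are the combined and best single-method counts reported in the text.

```python
def fold_gain(combined_count, best_single_count):
    """Ratio of the combined nonredundant total to the best single method."""
    return combined_count / best_single_count


# Peptides: 2934 combined versus 1520 from subfractionation alone.
peptide_gain = fold_gain(2934, 1520)

# Proteins: 558 combined versus 379 from peptide-centric 2-DLC alone.
protein_gain = fold_gain(558, 379)

print(f"peptides: {peptide_gain:.2f}-fold, proteins: {protein_gain:.2f}-fold")
```

With these inputs the ratios are about 1.93 for peptides and about 1.47 for proteins.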
To ascertain the ability of the current study to sufficiently cover the mitochondrial proteome, in the absence of the complete rabbit genome annotation (2501 identifiers, SWISS-PROT + TrEMBL), it was necessary to compare it with large-scale investigations of human and mouse cardiomyocyte mitochondria, both of which have fully annotated genomes (59,533 mouse identifiers; 95,621 human identifiers; SWISS-PROT + TrEMBL). Although there is mitochondrial heterogeneity between species and tissues at the level of protein expression, there is sufficient homology in protein sequences across mammals to enable this comparison. This homology was deemed to be of sufficient value as to circumvent the use of de novo sequencing algorithms, which can be problematic when analyzing ion trap data (45).
Proteomic Comparison of Partially and Fully Annotated Genomes-The ability to perform a cross-experimental comparison is dependent upon factors ranging from sample selection through to proteomic experimental conditions (including sample preparation, separation, and identification techniques). The rabbit cardiomyocyte mitochondrion that was used in this study was compared with protein inventories of human (21,22) and mouse (20) cardiomyocyte preparations. Of the 995 nonredundant proteins identified, only 19% (189 proteins) were observed in all studies (Fig. 2a). Of the proteins that were observed across all five orthogonal strategies used in the current study (108 proteins), over 90% (99 proteins) were also observed in mouse and human cardiomyocyte mitochondria studies (20-22). This may indicate that observation of proteins across multiple technologies improves the likelihood of true mitochondrial localization and may suggest that these proteins are highly abundant within the mitochondria, independent of species and tissue differences. Perhaps a more valid comparison is one that investigates not the results of these individual proteomic studies, but the proteins observed at the level of the organisms. Rabbits (Lagomorpha) have been traditionally grouped with rodentia (including rats and mice) under the superorder of Glires on the basis of morphological features (as reviewed by (46)). At the protein level, studies have suggested Lagomorpha are more closely related to primates than rodents, based on sequence homology across 88 proteins (9). When the mitochondrial proteins identified are compared, rather than the sequences comprising them, the mouse (20), rabbit, and human (21,22) proteome studies each identified a similar number of proteins (535, 556, and 555 proteins, respectively). As can be observed in Fig. 2b, this proteomic comparison may suggest more shared features between primates (human) and Lagomorpha (rabbit), concurring with the results of Graur et al. (9).
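A species-level comparison of this kind reduces to set operations on mapped protein identifiers. The Python sketch below illustrates the idea under stated assumptions: the function name is hypothetical, and the identifiers P1 to P6 are placeholders rather than proteins from the study.

```python
def three_way_overlap(rabbit, human, mouse):
    """Partition three protein sets into shared and species-specific groups."""
    return {
        "all_three": rabbit & human & mouse,
        "rabbit_only": rabbit - human - mouse,
        "human_only": human - rabbit - mouse,
        "mouse_only": mouse - rabbit - human,
        "nonredundant": rabbit | human | mouse,
    }


# Placeholder identifiers, for illustration only.
rabbit = {"P1", "P2", "P3", "P4"}
human = {"P1", "P2", "P4", "P5"}
mouse = {"P1", "P2", "P5", "P6"}

regions = three_way_overlap(rabbit, human, mouse)
shared_pct = 100 * len(regions["all_three"]) / len(regions["nonredundant"])

# The same ratio applied to the study-level figures quoted in the text.
reported_shared_pct = 100 * 189 / 995

print(sorted(regions["rabbit_only"]), round(shared_pct, 1), round(reported_shared_pct))
```

Mapping all identifications to a common identifier space first, as described in the methods, is what makes the set intersections meaningful.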
The current study failed to detect 89 proteins that were identified in human and mouse mitochondrial studies, including 16 subunits of the 28S and 39S ribosome. Although it is difficult to determine the reason for this absence, if we compare the two primate studies, where one utilized gel-based (21) and the other, a gel-free (22) approach, we can observe the same absence of 28S ribosome subunits in the Gaucher et al. investigation, as the current study. We therefore hypothesize that this is a result of an experimental discrepancy between gel-based and gel-free methods, as we were otherwise able to detect low-abundance proteins, including other proteins contained in the 28S and 39S ribosomes.
The current study contributed 142 unique proteins, a similar proportion to that of both mouse (Pagliarini et Supplemental Fig. 4), a compendium of 31 mitochondrial inventories across diverse tissues and species (47), showed that of the 142 unique proteins, 69% (99 proteins) had been suggested to associate with the mitochondria, confirming the sequence homology/expression heterogeneity across mammals. This group included proteins involved in mitochondrial fission (mitochondrial fission factor; MFF HUMAN) and fusion (mitofusin-2; MFN2 HUMAN), which are vital to the regulation of mitochondrial morphology and distribution (as reviewed by (11)); and multiple components of the 39S mitochondrial ribosomal complex (RM16, RM42, RM45, RM50, and RM51 subunits). The vast majority of these unique proteins (108 proteins) were identified by a single orthogonal approach, including the low abundance proteins of the 39S ribosomal complex. This suggests that to ensure depth of coverage, the parallel approach is essential. Overall, both side-by-side comparisons (study comparison and species comparison) showed that even in the absence of complete genome annotation, a high-throughput proteomics study is possible, detecting a comparable number of proteins as those undertaken in species with annotated genomes. This was only possible however, with the (i) use of 5 different separation strategies and (ii) common amino acid sequence similarities across mammalian species. Combining the results from peptide- and protein-centric 2-DLC and subfractionated approaches, 96% of protein groups and all but 372 peptides (13%) were identified.
Fig. 2b shows the comparison between primates (21,22), rodents (20), and Lagomorpha. In this species comparison, 244 protein groups (25%) were identified across all investigations. Each species identified a unique subset of protein groups, which were compared with the total number of proteins identified within that species to give the percentages presented.
To decipher the strengths of each approach, we performed a five-way comparison (Supplemental Fig. 5). By comparison with the total number of proteins identified by each separation strategy (Table I), peptide-centric 2-DLC provided the highest proportion of unique protein groups (28%; 106 protein groups), followed by protein-centric 2-DLC (23%; 56 protein groups), subfractionation (14%; 44 protein groups), protein-centric 1-DLC (10%; 21 proteins), and peptide-centric 1-DLC (1%; 3 protein groups). The majority of the 230 proteins that were detected by a single method (Table II) were identified by multiple observations of a single peptide (with caveats outlined in Methods and discussed below). As mentioned previously, numerous components of the 39S mitochondrial ribosomal complex and transporter inner membrane complex were only observed by one strategy. With the exception of peptide-centric 1-DLC, all strategies contributed at least two unique subunits of the 39S complex and at least one unique transporter inner membrane subunit (Table II). Given the relatively low abundance of the proteins contained in these complexes, these results may suggest that proteins identified by a single approach represent those expressed at low levels. It is therefore important not to overlook these proteins as they specifically enhance coverage of the rabbit cardiomyocyte mitochondrial proteome.
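The per-method contributions can be tallied as follows. This Python sketch uses the unique protein group counts quoted above; the helper function and the placeholder identifiers in the toy example are hypothetical and serve only to show how method-specific identifications might be derived from per-method sets.

```python
# Unique protein groups per strategy, as quoted in the text.
unique_by_method = {
    "peptide-centric 2-DLC": 106,
    "protein-centric 2-DLC": 56,
    "subfractionation": 44,
    "protein-centric 1-DLC": 21,
    "peptide-centric 1-DLC": 3,
}
total_unique = sum(unique_by_method.values())
share_of_inventory = 100 * total_unique / 558


def method_unique(ids_by_method):
    """For each method, return the identifiers seen by no other method."""
    unique = {}
    for method, ids in ids_by_method.items():
        others = set().union(
            *(v for k, v in ids_by_method.items() if k != method)
        )
        unique[method] = ids - others
    return unique


# Toy example with placeholder identifiers.
toy = {
    "peptide_2dlc": {"P1", "P2", "P3"},
    "protein_2dlc": {"P1", "P4"},
    "subfractionation": {"P1", "P2"},
}
toy_unique = method_unique(toy)

print(total_unique, round(share_of_inventory), toy_unique["peptide_2dlc"])
```

The five counts sum to the 230 single-method proteins, which is about 41% of the 558 nonredundant protein groups.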
At the peptide level, once again peptide-centric 2-DLC resulted in the highest proportion of unique peptides observed (35%; 535 peptides), followed by protein-centric 2-DLC (30%; 234 peptides) and subfractionation (27%; 412 peptides). Protein-centric 1-DLC (20%; 163 peptides) and peptide-centric 1-DLC (15%; 181 peptides) contributed the fewest unique peptides overall. When taken together, these results tend to indicate that peptide-centric 1-DLC is not a feasible approach as a stand-alone technique, which is unsurprising given that all potential mitochondrial proteins are separated across 1 dimension alone. The strengths of peptide-centric 1-DLC lie in its ability to provide validation for the remaining preparations when proteins are identified by a single peptide (one-hit wonders) and to enhance protein sequence coverage. With large proteomic studies becoming increasingly popular, the need to ensure accurate representation of the data and, by proxy, the proteome of interest is increasingly imperative (48); as such, one-hit wonders are traditionally not included in such investigations. We found it necessary, however, to include such proteins in the current study.
Bioinformatic Challenges: Protein Sequence Coverage and "One-Hit Wonders"-Traditionally, protein matches for which fewer than two discrete peptides can be identified are removed from large proteomic studies. This practice has obvious caveats: a single peptide within a protein may exist in a post-translationally modified form, or the genome may be incompletely annotated. In the current study, when the approaches were considered individually, 68% of the protein groups identified (380 out of 556) were identified by a single peptide (one-hit wonders). When the five approaches were combined, however, 143 of these one-hit wonders were identified with additional peptides contributed by the other approaches. For example, ATP synthase e (ATP5I_PIG) was identified by a single peptide following peptide-centric 2-DLC; however, protein-centric 1- and 2-DLC contributed an additional unique peptide, and the subfractionated preparation contributed a further two peptides. In the end, we identified ATP synthase e using three discrete peptides (48% sequence coverage) through the combination of these four approaches. As discussed above, because the rabbit genome is incompletely annotated, it was important to combine the results from the multistage approach, rather than selecting one or two methods alone, to minimize the number of proteins identified as one-hit wonders.
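The rescue of one-hit wonders by pooling approaches can be illustrated with a short sketch. It assumes, as a simplification, that a protein counts as a one-hit wonder if at least one approach identifies it by exactly one distinct peptide, and that it is "rescued" when the pooled peptide set contains two or more distinct peptides. The function names, approach labels, protein names, and peptide strings below are hypothetical placeholders, not identifications from this study.

```python
from collections import defaultdict


def pool_peptides(observations):
    """Group peptides per (approach, protein) and per protein across approaches.

    observations: iterable of (approach, protein, peptide) tuples.
    """
    per_approach = defaultdict(set)  # (approach, protein) -> distinct peptides
    pooled = defaultdict(set)        # protein -> distinct peptides, all approaches
    for approach, protein, peptide in observations:
        per_approach[(approach, protein)].add(peptide)
        pooled[protein].add(peptide)
    return per_approach, pooled


def one_hit_wonders(observations):
    """Return (one_hit, rescued) protein sets under the stated assumptions."""
    per_approach, pooled = pool_peptides(observations)
    # One-hit wonder: some approach supports the protein with a single peptide.
    one_hit = {
        protein
        for (_, protein), peptides in per_approach.items()
        if len(peptides) == 1
    }
    # Rescued: pooling the approaches yields at least two distinct peptides.
    rescued = {protein for protein in one_hit if len(pooled[protein]) >= 2}
    return one_hit, rescued


# Toy observations with placeholder names, NOT data from the study.
toy_observations = [
    ("peptide_2DLC", "PROT_A", "PEPTIDEONE"),
    ("protein_1DLC", "PROT_A", "PEPTIDETWO"),
    ("subfraction", "PROT_A", "PEPTIDETHREE"),
    ("peptide_2DLC", "PROT_B", "PEPTIDEFOUR"),
    ("protein_2DLC", "PROT_B", "PEPTIDEFOUR"),
    ("peptide_2DLC", "PROT_C", "PEPTIDEFIVE"),
    ("peptide_2DLC", "PROT_C", "PEPTIDESIX"),
]

one_hit, rescued = one_hit_wonders(toy_observations)
print("one-hit wonders:", sorted(one_hit))
print("rescued by pooling:", sorted(rescued))
```

In this toy input, `PROT_B` stays a one-hit wonder because both approaches observe the same peptide, which is the situation in which independent approaches add validation but not additional evidence.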
Not only did this multistage approach ensure that proteins were identified by more than one peptide on several occasions, but it also improved sequence coverage. We found that 113 proteins showed improved sequence coverage when the methodologies were combined. These included NAD(P) transhydrogenase (NNTM_BOVIN), which was identified by four unique peptides (5% sequence coverage) following separation by protein-centric 1-DLC. This increased to 39 discrete peptides (36% sequence coverage) with the addition of the remaining four methodologies. The increased sequence coverage could be observed by investigating the overlap of the five orthogonal strategies at both the protein and peptide level (Supplemental Fig. 5). Only 159 peptides were observed by all five orthogonal methods, a relatively small contribution (5%) by comparison with the total number of unique peptides (see Table I). These peptides represent the most abundant proteins detected in the current study, with multiple peptides identified from ATP synthase-α and -β. At the protein level, however, more proteins were identified by all five strategies (108 protein groups) than were uniquely identified by any individual approach (the highest being peptide-centric 2-DLC, with 106 unique protein groups). Of the 108 proteins that were observed across all preparations, 98 (91%) were identified with more than one unique peptide contributed by any single strategy (Supplemental Table 3). These unique peptides increased protein sequence coverage by up to 40% (e.g. NDUA8_HUMAN). This suggests that although there was redundancy across the five strategies at the protein level, the combination of approaches shows limited redundancy at the peptide level, thereby contributing to the increased sequence coverage.
In the current study, where high-throughput proteomics was performed in the absence of an annotated genome, the increased sequence coverage achieved makes this a valuable contribution to the overall protein identification confidence.
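Sequence coverage, as used above, is the fraction of a protein's residues that fall within at least one identified peptide. The sketch below shows one way to compute it and why non-redundant peptides from different approaches raise coverage while repeated peptides do not. It assumes exact string matching of peptides against the protein sequence; the function name and the short toy sequence are hypothetical and do not correspond to any protein in this study.

```python
def sequence_coverage(protein_seq: str, peptides) -> float:
    """Return the fraction of residues covered by at least one peptide."""
    if not protein_seq:
        return 0.0
    covered = [False] * len(protein_seq)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein_seq.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein_seq.find(peptide, start + 1)
    return sum(covered) / len(protein_seq)


# Toy 10-residue sequence and peptides, NOT data from the study.
toy_protein = "MKTAYIAKQR"
approach_1 = ["MKT"]          # one approach alone
approach_2 = ["AKQR", "MKT"]  # a second approach adds one new peptide

print(f"{sequence_coverage(toy_protein, approach_1):.0%}")
print(f"{sequence_coverage(toy_protein, approach_1 + approach_2):.0%}")
```

Because coverage is computed on residues rather than on peptide counts, observing `MKT` a second time leaves the value unchanged, whereas the new peptide `AKQR` extends it.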
One consideration of the current study is the increased "observation" redundancy within protein-centric approaches. By maintaining intact proteins, these approaches can provide information on discrete populations of protein variants. At the same time, however, protein-centric approaches led to over six observations (based on tandem-MS spectra) per peptide, whereas peptide-centric approaches resulted in a significantly lower number (3.4 observations/peptide). This is most likely because distinct populations of a protein resolve independently owing to a biochemical characteristic, such as the presence of a post-translational modification. As previously described (32), discrete populations of proteins can be observed with altered retention times in protein-centric approaches, indicating changes in biochemical properties (Fig. 3) (49). Therefore, the partitioning of proteins into numerous fractions by protein-centric methods (1-DLC in particular) may reduce the number of unique peptides observed because the same peptide is observed repeatedly.

FIG. 3. Subpopulations of mitochondrial proteins resolved by protein-centric approaches.
To determine the ability of protein-centric approaches to identify protein subpopulations, the number of peptides (y axis) attributable to ATP synthase-α and -β was plotted against the protein-centric fractions (x axis) in which they were observed.

Bioinformatic Challenges: False Discovery Rates-Large proteomic studies can be computationally validated with the use of FDR, whereby searching against a reversed database allows the likelihood of false positives to be estimated. In the current study, the FDR was relaxed slightly given the absence of genome annotation and the subsequent inclusion of additional species to ensure maximal coverage. The caveat of this approach was the introduction of sequence redundancy. This resulted in a true versus false one-hit wonder paradox, whereby searching against multiple species with high sequence homology, but not sequence identity, produced false one-hit wonders. Although the majority of peptides were correctly assigned to the species with the closest homology, peptides with amino acid substitutions were assigned to a divergent species. This is best exemplified by trifunctional enzyme subunit α (ECHA_MOUSE): the majority of the 22 peptides sequenced in the course of the current investigation were assigned to the mouse protein, but three peptides were homologous to different species (residues 441-455, ADMVIEAVFEDLSLK, from ECHA_HUMAN; residues 327-334, FGELAMTK, from ECHA_PIG; and residues 520-530, DTTASAVAVGLK, from ECHA_RAT). Analysis of these sequences reveals low homology as a result of single amino acid substitutions (Fig. 4). A falsely elevated FDR is possible in the absence of an annotated genome and must therefore be considered. In the current study, we found that the proportion of false one-hit wonders was higher when protein-centric 2-DLC approaches were used (59% misassigned).
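The reverse-database idea can be sketched as follows. This is an illustration under stated assumptions, not the search configuration used in this study: it assumes that decoy entries are made by reversing each target sequence, and that the FDR at a score threshold is estimated as the number of decoy matches divided by the number of target matches at or above that threshold, which is one common convention among several. The function names, the `REV_` prefix, the accession `TOY_PROT`, and the scores are hypothetical.

```python
def reverse_database(proteins: dict) -> dict:
    """Build decoy entries by reversing each target sequence."""
    return {f"REV_{accession}": seq[::-1] for accession, seq in proteins.items()}


def estimate_fdr(matches, threshold: float) -> float:
    """Estimate FDR as decoy matches / target matches at or above threshold.

    matches: list of (score, is_decoy) pairs, one per peptide-spectrum match.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


# Toy database and match scores, NOT data from the study.
toy_db = {"TOY_PROT": "MKTAYIAKQR"}
decoy_db = reverse_database(toy_db)

toy_matches = [
    (50.0, False),
    (40.0, False),
    (30.0, False),
    (25.0, True),
    (20.0, False),
    (10.0, True),
]

print(decoy_db)
print(estimate_fdr(toy_matches, threshold=20.0))
print(estimate_fdr(toy_matches, threshold=30.0))
```

Relaxing the threshold, as in the toy example, admits more target matches at the price of more decoy matches. When several homologous species are searched together, a peptide assigned to a divergent species is still a target match, so this estimate alone does not flag the false one-hit wonders discussed above.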

CONCLUSIONS
The current study applied one, two, and three dimensions of separation to improve coverage of the partially annotated rabbit cardiomyocyte mitochondrial proteome and found that (i) for adequate coverage and bioinformatic confidence in the absence of an annotated genome, all five strategies combined are required; (ii) 25% of mitochondrial proteins are common to rabbit, mouse, and human studies of cardiomyocyte mitochondria; and (iii) the converse was also true, with 25% of proteins uniquely identified within each species. We found that bioinformatic confidence was improved by (i) the significantly reduced population of proteins identified by a single peptide (one-hit wonders) when the five approaches were combined; (ii) increased sequence coverage through the reduced peptide redundancy across the five strategies; (iii) "observation" redundancy, a consequence of discrete protein variants resolving individually with protein-centric approaches (at the cost of unique peptides and proteins); and (iv) the identification of false one-hit wonders, caused by poor sequence conservation with fully annotated genomes, the removal of which resulted in reduced FDR and enhanced sequence coverage. Thus, a combination of high-dimensional separation of proteins and peptides, in parallel with further subfractionation, is a successful approach to providing proteome coverage for rabbit, identifying a similar number of proteins to studies of species with annotated genomes while meeting the bioinformatic requirements of large-scale proteomic studies.

In the current study we observed an increased number of one-hit wonders, where a single peptide was used to identify a given protein. These included false one-hit wonders, where species variations resulted in single peptides being attributed to divergent species (e.g. trifunctional enzyme subunit α). The majority of the 22 peptides sequenced were attributed to the sequenced mouse protein (ECHA_MOUSE). However, three peptides showed species-specific variations. The peptide spanning residues 327-334 was identified as ECHA_PIG, showing sequence divergence at residues 331 (Ala) and 332 (Met). A similar divergence was observed for the peptides spanning residues 441-455 (ECHA_HUMAN; 451 Asp; 453 Ser; 454 Leu) and 520-530 (ECHA_RAT; 521 Thr; 522 Thr; 530 Lys). Species variations resulted in each peptide being attributed to one species uniquely.