High-definition De Novo Sequencing of Crustacean Hyperglycemic Hormone (CHH)-family Neuropeptides

A complete understanding of the biological functions of large signaling peptides (>4 kDa) requires comprehensive characterization of their amino acid sequences and post-translational modifications, which presents significant analytical challenges. In the past decade, there has been great success with mass spectrometry-based de novo sequencing of small neuropeptides. However, these approaches are less applicable to larger neuropeptides because of the inefficient fragmentation of peptides larger than 4 kDa and their lower endogenous abundance. The conventional proteomics approach focuses on large-scale determination of protein identities via database searching, lacking the ability for in-depth elucidation of individual amino acid residues. Here, we present a multifaceted MS approach for identification and characterization of large crustacean hyperglycemic hormone (CHH)-family neuropeptides, a class of peptide hormones that play central roles in the regulation of many important physiological processes of crustaceans. Six crustacean CHH-family neuropeptides (8–9.5 kDa), including two novel peptides with extensive disulfide linkages and PTMs, were fully sequenced without reference to genomic databases. High-definition de novo sequencing was achieved by a combination of bottom-up, off-line top-down, and on-line top-down tandem MS methods. Statistical evaluation indicated that these methods provided complementary information for sequence interpretation and increased the local identification confidence of each amino acid. Further investigations by MALDI imaging MS mapped the spatial distribution and colocalization patterns of various CHH-family neuropeptides in the neuroendocrine organs, revealing that two CHH-subfamilies are involved in distinct signaling pathways.

Neuropeptides and hormones comprise a diverse class of signaling molecules involved in numerous essential physiological processes, including analgesia, reward, food intake, learning and memory (1). Disorders of the neurosecretory and neuroendocrine systems influence many pathological processes. For example, obesity results from failure of energy homeostasis in association with endocrine alterations (2,3). Previous work from our lab, using crustaceans as model organisms, found that multiple neuropeptides were implicated in the control of food intake, including RFamides, tachykinin-related peptides, RYamides, and pyrokinins (4-6).
Crustacean hyperglycemic hormone (CHH) family neuropeptides play a central role in energy homeostasis of crustaceans (7-17). The hyperglycemic response of the CHHs was first reported after injection of crude eyestalk extract in crustaceans. Based on their preprohormone organization, the CHH family can be grouped into two sub-families: subfamily-I containing CHH, and subfamily-II containing molt-inhibiting hormone (MIH) and mandibular organ-inhibiting hormone (MOIH). The preprohormones of subfamily-I have a CHH precursor-related peptide (CPRP) that is cleaved off during processing, whereas the preprohormones of subfamily-II lack the CPRP (9). Uncovering their physiological functions will provide new insights into neuroendocrine regulation of energy homeostasis.
Characterization of CHH-family neuropeptides is challenging. They are comprised of more than 70 amino acids and often contain multiple post-translational modifications (PTMs) and complex disulfide bridge connections (7). In addition, physiological concentrations of these peptide hormones are typically below picomolar level, and most crustacean species do not have available genome and proteome databases to assist MS-based sequencing.
MS-based neuropeptidomics provides a powerful tool for rapid discovery and analysis of a large number of endogenous peptides from the brain and the central nervous system. Our group and others have greatly expanded the peptidomes of many model organisms (3, 18-33). For example, we have discovered more than 200 neuropeptides, with several neuropeptide families consisting of as many as 20-40 members, in a simple crustacean model system (5, 6, 25-31, 34). However, a majority of these neuropeptides are small peptides 5-15 amino acid residues long, leaving a gap in identifying larger signaling peptides from organisms without a sequenced genome. The observed lack of larger peptide hormones can be attributed to the lack of effective de novo sequencing strategies for neuropeptides larger than 4 kDa, which are inherently more difficult to fragment using conventional techniques (34-37). Although classical proteomics studies examine larger proteins, these tools are limited to identification based on database searching with one or more matching peptides, without complete amino acid sequence coverage (36,38).
Large populations of neuropeptides from 4-10 kDa exist in the nervous systems of both vertebrates and invertebrates (9,39,40). Understanding their functional roles requires sufficient molecular knowledge and a unique analytical approach. Therefore, developing effective and reliable methods for de novo sequencing of large neuropeptides at the individual amino acid residue level is an urgent gap to fill in neurobiology. In this study, we present a multifaceted MS strategy aimed at high-definition de novo sequencing and comprehensive characterization of the CHH-family neuropeptides in the crustacean central nervous system. The high-definition de novo sequencing was achieved by a combination of three methods: (1) enzymatic digestion and LC-tandem mass spectrometry (MS/MS) bottom-up analysis to generate detailed sequences of proteolytic peptides; (2) off-line LC fractionation and subsequent top-down MS/MS to obtain high-quality fragmentation maps of intact peptides; and (3) on-line LC coupled to top-down MS/MS to allow rapid sequence analysis of low abundance peptides. Combining the three methods overcomes the limitations of each, and thus offers complementary and high-confidence determination of amino acid residues. We report the complete sequence analysis of six CHH-family neuropeptides including the discovery of two novel peptides. With the accurate molecular information, MALDI imaging and ion mobility MS were conducted for the first time to explore their anatomical distribution and biochemical properties.

EXPERIMENTAL PROCEDURES
Materials and Chemicals-All chemical reagents were obtained from Sigma-Aldrich (St. Louis, MO) unless otherwise noted. Optima grade formic acid, ACN, water, and methanol were purchased from Fisher Scientific (Pittsburgh, PA).
Animals, Tissue Dissection and Extraction-Blue crabs Callinectes sapidus and Jonah crabs Cancer borealis were shipped from the Fresh Lobster Company (Gloucester, MA), and then maintained in artificial seawater. The animals were anesthetized in ice, and the sinus glands (SGs) and pericardial organs (POs) were dissected and collected in acidified methanol. The tissue was then homogenized and extracted with acidified methanol. After centrifugation, supernatant fractions were combined and concentrated to dryness. The sample was re-suspended in 100 μl of water for further analysis (5). The detailed protocol is described in Supplementary Materials.
HPLC Fractionation-HPLC separations were performed with a Waters Alliance HPLC system (Milford, MA). The mobile phases included solution A (water containing 0.1% formic acid) and solution B (acetonitrile (ACN) containing 0.1% formic acid). Approximately 50 μl of extract was injected onto a Phenomenex Gemini C18 column (2.1 mm i.d., 150 mm length, 5 μm particle size; Torrance, CA). The separations consisted of a 120 min gradient of 5-95% solution B. The flow rate was 0.2 ml/min. Fractions were automatically collected every 2 min with a Rainin Dynamax FC-4 fraction collector, lyophilized, re-suspended in 20 μl of water, and stored at −80°C.
MALDI-TOF/TOF Analysis-A model 4800 MALDI-TOF/TOF analyzer (Applied Biosystems, Framingham, MA) equipped with a 200 Hz, 355 nm Nd:YAG laser was used. Acquisitions were performed in positive ion reflectron mode. Instrument parameters were set using the 4000 Series Explorer software (Applied Biosystems). Mass spectra were obtained by averaging 1000 laser shots covering mass range m/z 500-4000 in reflectron mode and m/z 2000-10000 in linear mode. MS/MS was achieved by 1 kV CID. For sample analysis, 0.4 μl of sample was spotted on the MALDI plate first and allowed to dry, followed by the addition of 0.4 μl of 2,5-dihydroxybenzoic acid (DHB) matrix (4).
Bottom-up MS on Nano-LC-ESI-QTOF-An aliquot of 3 μl of peptide fraction was reduced and alkylated by dithiothreitol (DTT) and iodoacetamide (IAA), followed by digestion with trypsin, Glu-C and Lys-C (41) (see Supplemental materials for details). Nano-LC-ESI-QTOF MS/MS was performed using a Waters nanoAcquity UPLC system coupled to a QTOF Micro mass spectrometer (Waters, Milford, MA) as described previously (5). The MS/MS raw data were converted to peak list (.pkl) files using ProteinLynx software 2.4 (Waters) (5). Peptides were identified by searching against an NCBInr 20090726 protein database (9330197 sequences; 3196564765 residues) using the Mascot v2.1 search engine. Trypsin or Glu-C was selected as the enzyme, allowing up to two missed cleavages. Carboxylmethyl cysteine was specified as a fixed modification, and methionine oxidation and pyro-Glu as variable modifications. Precursor and MS/MS tolerances were set within 30 ppm and 0.6 Da for monoisotopic mass, respectively. Peptide charge states of 1+, 2+, and 3+ were included.
Off-line Top-down MS on ESI-LTQ-FTICR-A 0.5 μl aliquot of peptide fraction was reduced by incubation in 2.5 mM DTT for 1 h at 37°C, desalted by C18 ZipTip, and resuspended in 10 μl of 50% ACN containing 2% formic acid. The sample was analyzed using a 7T linear trap quadrupole (LTQ)/Fourier transform ion cyclotron resonance (FTICR) (LTQ-FT Ultra) hybrid mass spectrometer (Thermo Scientific Inc., Bremen, Germany) equipped with an automated chip-based nano-ESI source (Triversa NanoMate; Advion BioSciences, Ithaca, NY) as described previously (42). For CID and ECD fragmentation, individual charge states of peptide molecular ions were first isolated and then dissociated using 22-28% of normalized collision energy for CID or 4% electron energy for ECD with a 60 ms duration with no delay. Typically, 1000 transients were averaged to ensure high quality MS/MS spectra (42). All FTICR spectra were processed with Xtract Software (Xcalibur 2.0.5, Thermo Scientific Inc., Bremen, Germany) using an S/N threshold of 1.5 and a fit factor of 40%, and validated manually. The resulting mass lists were further assigned using the in-house developed "Ion Assignment" software. The assigned ions were manually validated to ensure the quality of assignments (42).
On-line Top-down MS on Nano-LC-ESI-LTQ-Orbitrap Elite-A 1 μl aliquot of crude tissue extract was reduced by incubation in 2.5 mM DTT for 1 h at 37°C, desalted by C18 ZipTip, and resuspended in 10 μl of water containing 0.2% formic acid. On-line top-down MS was carried out on an Ultimate 3000 RSLCnano system coupled to an Orbitrap Elite mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). A 0.5 μl aliquot of peptide sample was injected onto a 2 cm, 150 μm i.d. PLRP-S (dp 5 μm, pore size 1000 Å) trap column. A 10 cm, 75 μm i.d. PLRP-S column was used for separation. The gradient was delivered at 300 nL/min, starting at 5% B (95% acetonitrile and 0.2% formic acid) and rising to 10% B at 7 min, 50% B at 50 min, and 85% B at 58 min. The mass spectrometer was operated in the data-dependent mode to switch automatically between full-MS (scan 1), higher-energy collision dissociation (HCD)-MS2 (scan 2), and electron transfer dissociation (ETD)-MS2 (scan 3). The isolation width was set at 10 Da (36). The data processing method is the same as for the off-line top-down method described above.
MALDI Imaging-Immediately following dissection, the eyestalk was embedded in gelatin (100 mg/ml aqueous) and snap-frozen. Sectioning into 12 μm slices at −25°C was performed on a cryostat (Microtom HM505E, Waldorf, Germany), and the slices were thaw-mounted onto a glass slide (Bruker Daltonics). An airbrush was used to spray coat the tissues with DHB. The airbrush was held perpendicular to the glass slide at a distance of 35 cm. Five coats of matrix were applied by spraying each sample for 30 s with 1 min dry time between each application (6).
Mass spectrometric analyses were performed in the linear, positive mode at +20 kV accelerating potential on a time-of-flight mass spectrometer (Bruker Autoflex III TOF-TOF; Bruker Daltonics, Bremen, Germany), which was equipped with a Smartbeam laser capable of operating at a repetition rate of 200 Hz, with optimized delayed extraction time and the laser beam size set to medium. Laser energy was optimized for signal-to-noise in each preparation. Using Bruker Protein Standard 1 (Bruker Daltonics, Bremen, Germany), a linear external calibration was applied to the instrument before data collection. Mass spectral data sets were acquired over a whole eyestalk section using FlexImaging software (Bruker Daltonics, Bremen, Germany) in the mass range of m/z 3000 to 10,000, with a raster step size of 50 μm and 500 laser shots per spectrum. After data acquisition, molecular images were reconstituted using the FlexImaging software. Data were normalized using FlexImaging software, and each m/z signal was plotted ± 10 mass units (6).
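The idea of plotting each m/z signal within ± 10 mass units can be illustrated with a minimal Python sketch. This is not the FlexImaging implementation; the function name `ion_image`, the data layout (a dictionary of pixel coordinates mapped to peak lists), and the two-pixel data set are hypothetical and serve only to show how a windowed ion image is assembled.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]                 # (x, y) raster position
Spectrum = List[Tuple[float, float]]    # (m/z, intensity) pairs

def ion_image(pixels: Dict[Pixel, Spectrum],
              target_mz: float,
              window: float = 10.0) -> Dict[Pixel, float]:
    """Sum the intensity within target_mz +/- window for every pixel."""
    image = {}
    for xy, spectrum in pixels.items():
        image[xy] = sum(intensity for mz, intensity in spectrum
                        if abs(mz - target_mz) <= window)
    return image

# Hypothetical two-pixel data set (values invented for illustration only).
pixels = {
    (0, 0): [(8470.0, 5.0), (8478.0, 12.0), (9200.0, 3.0)],
    (0, 1): [(8400.0, 7.0), (9195.0, 20.0)],
}
img = ion_image(pixels, target_mz=8473.0)
print(img)
```

In this toy case only the first pixel contains signal inside the window, so only that pixel would appear bright in the reconstructed image.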

Establishment and Validation of the High-definition De Novo Sequencing Strategy-Our MS strategy for identification of CHH-family neuropeptides from the crustacean nervous system (Fig. 1) involves three steps: peptide candidate scanning, in silico homology searching, and de novo sequencing. Although the cDNA sequence of the C. sapidus Cas-SG-CHH (Cas, C. sapidus; SG, sinus gland) preprohormone has been obtained by a PCR-based cloning strategy, its amino acid sequence has not been identified by MS or Edman degradation (16,17). Here, the multifaceted strategy is established and validated by performing MS-based sequencing of the Cas-SG-CHH peptide.
Peptide Candidate Scanning-Two unique features of the CHH-family neuropeptides, a molecular weight (MW) ranging from 8 to 10 kDa and the presence of three disulfide bonds (12), were used as screening criteria for candidate identification. The sinus gland organs were pooled from ten animals, homogenized, lyophilized and subjected to reversed phase (RP)-HPLC fractionation (Fig. 2A), followed by off-line direct infusion on an ESI-LTQ-FTICR mass spectrometer. Multiply charged CHH-family peptide ions were detected in the high-resolution spectrum of Fraction #17 (Fig. 2B), and the accurate MW of this large peptide was determined as 8472.948 Da.
To screen for disulfide bonds, the peptide candidate was treated with DTT to reduce the disulfide bonds and then with IAA to alkylate the free thiol groups, followed by analysis via MALDI-MS. MALDI can tolerate higher levels of salt and mostly produces singly charged ions, facilitating mass comparison. Fig. 2C and 2D show the MALDI mass spectra of the original and derivatized peptides, respectively. The ions at m/z 8728, 8777, and 8824 correspond to the reduced peptides alkylated with attachment of 4, 5, and 6 carbamidomethyl groups, respectively. Incomplete alkylation might be caused by the large peptide size. These results suggest the presence of three disulfide bonds in the peptide Cas-SG-CHH.
In Silico Sequence Homology Searching-Sequence homology searching can aid in de novo sequencing of small neuropeptides. Here, we designed a bottom-up method to extend the utility of the protein database for sequencing of large neuropeptides. The candidate peptide was digested by trypsin and analyzed by LC-MS/MS, followed by Mascot searching against the NCBI database. The first two hits were the CHH preprohormones of C. sapidus (52% sequence match) and Portunus trituberculatus (29% sequence match). The goal of our method is to analyze neuropeptides from species without proteome and genome databases. We therefore assumed that no such knowledge was available for the target peptide Cas-SG-CHH, and the homologous CHH preprohormone from P. trituberculatus (supplemental Fig. S1) was used as a reference sequence for de novo sequencing. CHH family members share the characteristic feature of six cysteines located in identical or similar positions, i.e. C7, C23/C24, C26/C27, C39/C40, C43/C44, and C52/C53. Therefore, the sequence AA64-AA139 of the P. trituberculatus CHH preprohormone (supplemental Fig. S1) was mined as a reference peptide for the following de novo sequencing.
Bottom-up De Novo Sequencing-The tryptic fragments of Cas-SG-CHH were analyzed by LC-QTOF-MS/MS and MALDI-TOF/TOF-MS/MS. Fig. 3A is the MALDI-TOF/TOF mass spectrum of the tryptic peptides of Fraction #17. Fig. 3B displays a representative MS/MS spectrum of a tryptic peptide. Using the preprohormone sequence of P. trituberculatus CHH as a reference, the MS/MS spectra were carefully analyzed with the software PepSeq (6) to assign fragment ions and determine peptide sequences. Supplemental Table S1 lists all the sequenced tryptic peptides arising from Cas-SG-CHH. By this bottom-up sequencing method, 81% of the sequence was determined.
Off-line Top-down De Novo Sequencing-To determine the rest of the sequence, the intact Cas-SG-CHH was fragmented with top-down MS using CID on an ESI-ultra-high resolution (UHR)-QTOF maXis mass spectrometer via direct infusion (Fig. 4A). Two sets of ions, b55-b65 and y7-y15, generated by cleavage of amide bonds close to the peptide C terminus, were detected with intense signals. However, only a few fragment ions from the middle region of the peptide chain were observed. Although only 14% of b and y ions were assigned, a sequence tag ⁵⁶LLIMDNFEEY⁶⁵ (in this study, the isobaric I/L residues were assigned using homologous sequences) was confidently determined from the intense C-terminal fragmentation (Fig. 4A). Similarly, this intact peptide was fragmented on a different instrument, an ESI-LTQ-FTICR mass spectrometer, with CID and ECD, resulting in poor fragmentation as well (data not shown). A possible explanation is that fragmentation of this peptide is hindered by its tertiary structure resulting from three disulfide bonds, which crosslink the residues between Cys7 and Cys52. Therefore, denaturation of peptides by breaking disulfide bonds with DTT and IAA could facilitate peptide fragmentation. However, the large peptide size causes incomplete alkylation as shown previously (Fig. 2D).
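Reading a sequence tag from a ladder of consecutive b ions amounts to matching the mass difference between neighboring ions to residue masses. The Python sketch below illustrates this under stated assumptions: the ladder is synthetic (built from the tag itself on top of an arbitrary placeholder mass of 6000.0 Da, not measured data), the ions are treated as deconvoluted to a single charge state, and the function names and the 0.01 Da tolerance are hypothetical. It is not the procedure implemented in PepSeq or the "Ion Assignment" software.

```python
# Monoisotopic residue masses (Da) for a subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "T": 101.04768, "I": 113.08406, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}

def read_tag(ladder, tol=0.01):
    """Assign a residue to each gap between consecutive fragment masses."""
    tag = []
    for lower, upper in zip(ladder, ladder[1:]):
        delta = upper - lower
        hits = [aa for aa, mass in RESIDUE_MASS.items()
                if abs(mass - delta) <= tol]
        # Isobaric residues (I and L) cannot be separated by mass alone.
        tag.append("/".join(hits) if hits else "?")
    return tag

def synthetic_ladder(start_mass, sequence):
    """Build an idealized ladder of 11 masses for a 10-residue tag."""
    ladder = [start_mass]
    for aa in sequence:
        ladder.append(ladder[-1] + RESIDUE_MASS[aa])
    return ladder

# 6000.0 Da is an arbitrary placeholder for the mass of the first ion.
ladder = synthetic_ladder(6000.0, "LLIMDNFEEY")
tag = read_tag(ladder)
print(tag)
```

The three leading positions come back as the ambiguous pair I/L, which is why the isobaric residues have to be assigned from homologous sequences rather than from the fragment masses.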
Here, we adopted a different strategy for peptide denaturation, in which the peptide was treated with DTT in urea solution for complete reduction of disulfide bonds, and then stored in 50% ACN containing 2% formic acid. The acidic environment prevented the disulfide bonds from reforming although the free thiol groups were not blocked by protective groups. After storing the sample either at 5°C for 5 h or −20°C for 4 days, no disulfide bond formation was observed (data not shown). The subsequent fragmentation of the reduced peptide on the LTQ-FTICR with CID and ECD pro-
Sequence Assembling-Fig. 5C summarizes the sequence coverage identified by the three de novo sequencing methods (bottom-up, off-line top-down and on-line top-down). These methods are complementary and maximize the sequence identification. The identified sequence percentages are 81%, 75%, and 50%, respectively. Only two residues AA32 and AA35
In summary, this multifaceted MS strategy offers a systematic approach to elucidate amino acid sequences of the CHH-family neuropeptides, which is applicable to characterization of large intact peptide hormones with or without known cDNA sequences.
CHH-family Neuropeptidome in C. sapidus and C. borealis-With the strategies described above, we have discovered and identified six CHH-family neuropeptides and two modified isoforms from C. sapidus and C. borealis (Table I). For each peptide, the MWs of both intact and "reduced" peptides were measured to improve the identification confidence ("reduced" refers to the peptides after DTT reduction), in which the theoretical and experimental MWs match within 17.5 ppm. The sequence alignments among homologous species are shown in Fig. 6. This high-level confidence for complete sequence coverage arises from a combination of the three de novo sequencing methods.
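How partial coverages from complementary methods add up can be sketched as a set union over residue positions. In the Python sketch below, the 20-residue length and the position ranges assigned to each method are hypothetical toy values, not the coverages reported in Fig. 5C; the variable names are likewise illustrative.

```python
def coverage(positions, length):
    """Percentage of residue positions that were identified."""
    return 100.0 * len(positions) / length

length = 20  # hypothetical toy peptide length

# Hypothetical residue positions identified by each method.
methods = {
    "bottom-up": set(range(1, 13)),          # residues 1-12
    "off-line top-down": set(range(8, 18)),  # residues 8-17
    "on-line top-down": set(range(15, 21)),  # residues 15-20
}

# Residues identified by at least one method.
combined = set().union(*methods.values())

# Residues supported by two or more independent methods.
supported_twice = {p for p in combined
                   if sum(p in s for s in methods.values()) >= 2}

for name, positions in methods.items():
    print(name, coverage(positions, length))
print("combined", coverage(combined, length))
print("supported by >= 2 methods", sorted(supported_twice))
```

In this toy case no single method exceeds 60% coverage, yet the union covers every position, and the overlapping regions are the ones where independent evidence raises the confidence of individual residue assignments.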

Identification of Two Novel CHH-family Neuropeptides in C. borealis-
The Novel MIH-Although C. borealis is a widely used animal model in neurobiological studies, there is no available genomic database, making peptide identification in this model organism challenging (25,27). With the multifaceted MS strategy described here, we found one novel MIH in the sinus gland. The accurate MWs of the intact and "reduced" forms were determined as 8932.302 and 8938.230 Da, respectively. From the 6-Da mass difference, we inferred that this peptide contains three disulfide bonds, placing it as a putative candidate for the CHH-family neuropeptides. The homology search using the bottom-up proteomic method resulted in a hit of the MIH from a homologous species, Cancer pagurus (43), with 89% of matched peptide coverage (supplemental Fig. S3). In the subsequent bottom-up de novo sequencing, multiple proteases (trypsin, Glu-C and Lys-C) were employed to generate proteolytic peptides from different cleavage sites. Following LC-MS/MS analysis, computer-assisted sequencing resulted in identification of the proteolytic peptides listed in Table II. In addition, these results indicated that the Mascot search caused a false positive identification of the tryptic peptide ⁶⁸QWVGILGAGRE⁷⁸ (supplemental Fig. S3). By overlapping these proteolytic peptides and referring to the homologous sequence of Cancer pagurus Cap-SG-MIH, we identified the sequence AA1-AA59 of the new Cab-SG-MIH in C. borealis. The sequence AA60-AA77 remained unknown.
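The inference of three disulfide bonds from the mass difference between the intact and reduced forms follows from the fact that reducing one disulfide bond adds two hydrogen atoms. A minimal Python sketch of this arithmetic is given below; the function name `count_disulfides` is hypothetical, and the calculation assumes that reduction is the only difference between the two forms.

```python
H_MASS = 1.007825  # monoisotopic mass of a hydrogen atom (Da)

def count_disulfides(intact_mw, reduced_mw):
    """Estimate the number of disulfide bonds from the reduction mass shift.

    Each reduced disulfide bond gains two hydrogens (about 2.016 Da).
    """
    delta = reduced_mw - intact_mw
    return round(delta / (2 * H_MASS))

# MWs of the intact and reduced forms of the putative MIH given in the text.
n_bonds = count_disulfides(8932.302, 8938.230)
print(n_bonds)
```

With the two MWs quoted above, the shift of 5.928 Da rounds to three bonds.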
Previous work on sequencing of the C. sapidus Cas-SG-CHH showed that CID fragmentation of the intact (nonreduced) CHH peptide resulted in ample cleavage of C-terminal amide bonds, facilitating identification of the C-terminal residues. Similarly, we fragmented this putative Cab-SG-MIH peptide on the ESI-LTQ-FTICR using CID, and observed one set of b ions with intense signals arising from sequential cleavage of the C-terminal amide bonds. Accordingly, AA68-AA77 was determined as ⁶⁸EWVGILGAGS⁷⁷ with a C-terminal amidation. The remaining task was to identify AA60-AA67.
In the homologous Cap-SG-MIH (supplemental Fig. S3), AA67 is an arginine and thus AA60-AA67 forms a tryptic peptide during digestion. So, it is possible that in the putative Cab-SG-MIH the residues AA60-AA67 also form a tryptic peptide. Based on this hypothesis, the MW of the tryptic peptide can be calculated by Peptide Mass Deduction, i.e. the measured Cab-SG-MIH peptide MW minus the calculated remaining sequence mass (8938.230 − 8003.781 = 934.449 Da). By carefully searching the bottom-up de novo sequencing data, one tryptic peptide was found matching the MW 934.449 Da (Fig. 7B). Analysis of the MS/MS spectrum using PepSeq determined its sequence as ⁶⁰TAEMSQLR⁶⁷ (Fig. 7C). With the addition of this fragment, the complete sequence of this novel Cab-SG-MIH was determined. To confirm it, off-line and on-line top-down MS/MS were carried out. In the resulting ECD spectrum (Fig. 7D), observation of a series of z• ions confirms the sequence ⁶³MSQL⁶⁶, which was determined by the bottom-up tryptic peptide in Fig. 7C. The fragmentation map resulting from ECD and CID (Fig. 5B) shows 58% of identified sequence coverage, and that of ETD and HCD gives 50% of sequence coverage (supplemental Fig. S4). Fig. 7E illustrates the sequencing process step by step. This approach offers a confident and accurate sequence analysis for
Fig. S5) with 89% of matched sequence coverage (44). With the bottom-up de novo sequencing method using multiple proteases for digestion, twelve proteolytic peptides corresponding to 97% sequence coverage (Table III) were identified to generate a putative MOIH sequence, the sequence order should be HK rather than KH. Using an off-line top-down method, this putative Cab-SG-MOIH could not be selected for fragmentation because of its low abundance and suppressed ionization (data not shown). The on-line top-down method successfully fragmented this putative Cab-SG-MOIH using ETD and HCD, which produced a fragmentation map with 52% of sequence coverage (supplemental Fig. S6).
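The Peptide Mass Deduction step described above for Cab-SG-MIH can be checked with a short calculation: the deduced mass of the missing stretch is compared with the monoisotopic mass computed for the sequence TAEMSQLR. The Python sketch below uses standard monoisotopic residue masses; the function names and the use of a ppm error as the comparison metric are illustrative assumptions rather than a description of the software used in this work.

```python
# Monoisotopic residue masses (Da) for the residues in TAEMSQLR.
RESIDUE_MASS = {
    "T": 101.04768, "A": 71.03711, "E": 129.04259, "M": 131.04049,
    "S": 87.03203, "Q": 128.05858, "L": 113.08406, "R": 156.10111,
}
WATER = 18.01056  # added once per free (unmodified) peptide

def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

# Measured MW of the reduced peptide minus the calculated remaining mass,
# using the two values quoted in the text.
deduced = 8938.230 - 8003.781
theoretical = peptide_mass("TAEMSQLR")
error = ppm_error(deduced, theoretical)

print(round(deduced, 3), round(theoretical, 3), round(error, 1))
```

The deduced and calculated masses agree to within 10 ppm, which is consistent with the assignment of the tryptic peptide to the missing AA60-AA67 stretch.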

MS-based Distribution Mapping and Conformation Analysis Reveal Biological Significance-
Distribution Mapping by MALDI Imaging-The functional roles that various compounds play in an organism are closely related to their locations. The X-organ/sinus gland complex located in eyestalks represents a major neuroendocrine structure in decapod crustaceans (10). Previous studies using immunohistochemical techniques for peptide profiling indicate that the subfamily-I peptides (CHH) rarely overlap with the subfamily-II peptides (MIH and MOIH) (9,14,15,45). However, the immunohistochemical method suffers from antibody cross-reaction because the epitope peptides share a high degree of sequence homology (see Fig. 6) (14). To overcome this limitation, we demonstrated the first use of the MALDI MS imaging technique to map the endogenous CHH-family peptides in crustacean neurosecretory structures.
Table I). The MALDI imaging results of the entire eyestalk reveal that all these peptides are located in the sinus gland (Fig. 8B). Interestingly, the zoom-in images (Figs. 8C-8G) of the sinus gland illustrate that Cab-SG-CPRP and the subfamily-I peptides, Cab-SG-CHH-I and Cab-SG-CHH-II, are co-localized, and that the subfamily-II peptides, Cab-SG-MIH and Cab-SG-MOIH, are also co-localized. Moreover, Cab-SG-CPRP, Cab-SG-CHH-I and Cab-SG-CHH-II have almost no overlap with Cab-SG-MIH and Cab-SG-MOIH, exhibiting distinct distribution patterns for the two CHH subfamily peptides. Furthermore, a similar distribution pattern was observed from MALDI imaging of the blue crab C. sapidus sinus gland, which shows that Cas-SG-MIH and Cas-SG-CHH are differentially distributed, and Cas-SG-CPRP and Cas-SG-CHH are co-localized (data not shown). These results are consistent with previous studies using immunohistochemical methods.
Conformation Analysis by Ion Mobility MS-The N-terminal glutamine of neuropeptides can be modified by cyclization of the glutamine via condensation of the α-amino group with the side-chain carboxyl group (pyro-Glu modification). Modified peptides show variation of half-life and biological activity related to conformational changes (46). The Cab-SG-CHH-I and Cab-SG-CHH-II in Fig. 8A are the modified and unmodified isoforms, respectively. The release sites of the two isoforms are co-localized in the sinus gland (Fig. 8C and 8D), suggesting that they exhibit similar tissue-specific distribution patterns. Thus, it is interesting to investigate the potential peptide conformational change induced by this modification. Supplemental Fig. S7 shows the calibrated collision cross section (CCS) (47) of Cab-SG-CHH-I and Cab-SG-CHH-II, with a CCS difference of 40 Å² measured by ion mobility spectrometry, suggesting that the N-terminal pyro-Glu modification may cause intrinsic shape or conformational changes of neuropeptides.

Statistical Analysis Evaluates Performance Characteristics of the Three De Novo Sequencing Strategies-
Complementary Characteristics-Implementation of three de novo sequencing strategies provides more confident and effective sequence elucidation of large peptides. Here, we use Cas-SG-CHH, Cas-SG-MIH, Cab-SG-CHH and Cab-SG-MIH as model peptides to evaluate three figures of merit: number of identified residues (Fig. 9A), number of residues in sequence tags (Fig. 9B), and number of cleavage sites (Fig. 9C). These results indicate that the three strategies provide complementary information. Combined use of these strategies leads to 60%, 100%, and 36% boosts in the values of the three evaluated aspects, respectively (Supplemental Fig. S8).
Local Identification Confidence-The combination of off-line and on-line top-down methods can also improve the local identification confidence (48)

Advantages of the High-definition De Novo Sequencing Approach-We have described a multifaceted strategy for identification and characterization of the CHH-family neuropeptides. High-definition de novo sequencing not only offers complementary sequencing information but also improves confidence for elucidation of individual amino acid residues. Specifically, the bottom-up de novo sequencing method, employing multiple protease digestion and the LC-MS/MS technique, enables deep amino acid sequencing by analysis of the overlapped proteolytic peptides (37,49). However, complications arise from incomplete characterization of alternative splice forms, labile PTMs and truncated isoforms, leading to misidentification of the native intact peptide isoforms (35,50,51). Combining bottom-up and top-down methods can overcome this issue. The off-line and on-line top-down de novo sequencing methods combine multiple fragmentation techniques including CID, ECD, HCD and ETD (52), offering comprehensive cleavage of intact large peptide molecules while retaining site-specific PTMs. This allows us to obtain the precise molecular details of these important peptide hormones.
Off-line Top-down Strategy-One challenge for top-down MS/MS of large peptides and proteins is the limited capability to detect low-abundance fragment ions. To overcome this problem, our off-line top-down experiment was performed on a platform established by Ge and coworkers (42), in which a 7T LTQ-FTICR is coupled with a chip-based nano-ESI source, enabling femtomole-level sample consumption and consistent spectrum acquisition. Thousands of MS/MS scans were averaged to generate a high-quality top-down MS/MS spectrum, which facilitated the detection of numerous low-abundance fragment ions. For example, the ECD spectrum in Fig. 4B resulted from averaging 1000 scans with enhanced S/N ratio, which allowed us to detect the low-abundance ions and obtain 75% sequence coverage (Figs. 5A and 5C).
On-line Top-down Strategy-Another challenge of top-down MS/MS is the incompatibility of the long duty cycle of the FT transient with the fast LC-MS time scale, in which analytes of interest are (co)eluted in a short time window (36). Here, we carried out the on-line top-down experiment on a platform constructed by Kelleher and co-workers (36), in which a newly developed ultrahigh resolution LTQ-Orbitrap Elite FTMS system (53) was coupled with nano-LC for high throughput LC-MS/MS analysis. The optimized ion transfer optics facilitate high speed top-down MS/MS and fast switching between CID, HCD, and ETD, enabling sensitive and rapid identification of the eluted peptide ions. For instance, Cab-SG-MOIH was co-eluted with Cab-SG-CHH from LC fractionation. The major component Cab-SG-CHH suppressed the detection of Cab-SG-MOIH, and thus the off-line top-down fragmentation of Cab-SG-MOIH could not be performed. In contrast, using the on-line top-down system, HCD and ETD spectra were collected with one scan, resulting in 52% sequence coverage (supplemental Fig. S6).
Sample Preparation Strategy-In addition, we employed a simple strategy to dramatically improve the fragmentation efficiency of large peptides containing extensive disulfide bonds. The peptides are reduced by DTT and stored in acidic solvent followed by MS analysis, instead of IAA alkylation, which usually causes incomplete reaction because of large peptide sizes and thus decreases the target ion intensities. Compared with our previous study (34) using IAA alkylation on CHH, this method resulted in a ~3-fold increase of observed fragment ions and identified sequence coverage (comparison of fragmentation maps shown in supplemental Fig. S11). Furthermore, this method can potentially be applied to top-down proteomics to enhance fragmentation efficiency and sequence coverage.
Homology Search Strategy-In this study, the CHH-family peptides share a high degree of sequence homology (Fig. 6). Therefore, a homology search using tryptic peptides in Mascot (54) ensures finding the desired homologous preprohormone. Alternatively, other algorithms with homology search functions, such as PLGS (55) and PEAKS (48), were examined and gave similar results. To make use of the on-line top-down LC-MS/MS data (fragmentation of entire large peptides), ProSightPC (36) enables searching against both a manually curated peptide database and a database of homologous preprohormones. In addition, a manual homology search using NCBI Blast (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with the sequence tag LLIMDNFEEY (Fig. 4A) yielded the same preprohormones as Mascot.
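The idea behind a sequence-tag homology search can be sketched in a few lines. This is not the scoring used by Mascot, PLGS, PEAKS, ProSightPC, or Blast; it is a simplified ungapped identity scan, and the database entries are made-up toy sequences, not real preprohormones. Only the tag LLIMDNFEEY comes from the text.

```python
def best_window_identity(tag, sequence):
    """Return (best fraction identity, start index) over all ungapped windows."""
    best = (0.0, -1)
    for start in range(len(sequence) - len(tag) + 1):
        window = sequence[start:start + len(tag)]
        matches = sum(a == b for a, b in zip(tag, window))
        identity = matches / len(tag)
        if identity > best[0]:
            best = (identity, start)
    return best

def rank_by_tag(tag, database):
    """Rank database entries by their best ungapped identity to the tag."""
    scored = [(name,) + best_window_identity(tag, seq) for name, seq in database.items()]
    return sorted(scored, key=lambda entry: entry[1], reverse=True)

# Toy database: hypothetical sequences invented for this illustration.
toy_db = {
    "toy_exact": "AAGGLLIMDNFEEYKKCC",       # contains the tag verbatim
    "toy_homolog": "AAGGLLVMDNFEDYKKCC",     # two substitutions within the tag
    "toy_unrelated": "GSTPWQRHKAGSTPWQRHKA",  # shares no residues with the tag
}

for name, identity, start in rank_by_tag("LLIMDNFEEY", toy_db):
    print(name, identity, start)
```

A homologous entry still ranks well despite substitutions, which is the property that lets a de novo tag retrieve a related preprohormone from another species.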
Applicability of the Multifaceted Strategies-Although in this work we employed multiple mass spectrometers to demonstrate the complementary nature of various instruments and the applicability of the strategies, the high-definition sequencing of large peptides can be accomplished using two or three instruments, depending on the goals of the study. For example, the combined use of a MALDI-based mass spectrometer and a high-resolution nano-ESI tandem MS instrument with multiple fragmentation techniques would enable both high sequence coverage of peptides and spatial mapping of the peptides of interest. The former instrument can be used for fast peptide candidate screening and MALDI imaging, and the latter MS platform can be employed for bottom-up, off-line top-down, and on-line top-down experiments, in which the nano-ESI source is coupled with either an off-line infusion device or an on-line LC system.

Functional Aspects of the Identified CHH-family Neuropeptides-Previous studies in crustacean endocrinology highlighted the central role of the CHH-family neuropeptides in the signaling system of energy homeostasis, and proposed a model that employs neurotransmitters to control secretion and release of CHHs, followed by triggering the downstream energy metabolism by the second messengers (7,8). However, the precise molecular mechanisms underlying the interactions between neurotransmitters and CHHs at the cellular and network levels remain elusive. This is, in part, because of a lack of analytical capabilities to identify and characterize these low abundance endogenous signaling molecules in a complex microenvironment.
Our current large neuropeptidome analysis by high-definition de novo sequencing allowed precise characterization of six CHH-family peptides with PTMs, including the novel MIH and MOIH neuropeptides. Their detailed molecular information will contribute to further functional and physiological studies exploring the mechanisms modulating the animal's metabolism, osmoregulation, molting and reproduction (9). One application is to study their regulatory roles in energy homeostasis, with the goal of establishing a simplified neuroendocrine model of energy regulation using crustaceans (8). A critical element of studying the complex interactions between multiple molecular players is the ability to map their spatial distributions and colocalization patterns. Toward this end, we employed MALDI imaging mass spectrometry and ion mobility MS to analyze the spatial distribution and molecular conformation of several CHH-family neuropeptides.
Generally, each CHH-family peptide plays a distinct functional role. For example, CHHs regulate blood glucose metabolism (16); MIHs suppress the synthesis of ecdysteroids, delaying molting (12); and MOIHs inhibit the synthesis of methyl farnesoate and thus control somatic and gonadal growth (43). Nevertheless, many previous studies have suggested that these large peptides are multifunctional hormones and often display overlapping biological activities (8,9). For instance, the osmoregulatory function of CHH is related to crustacean molt cycles (56). Thus, visualization of the CHH-family peptide hormone release sites may provide insights into the neurosecretion and signal transduction pathways as well as the complex hormonal integration of these processes. Many efforts have been directed to the use of immunohistochemical techniques for distribution analysis of the CHH-family peptides in the neurosecretory system (9,14,15,45). In general, the immunoreactivities of subfamily-I peptides rarely overlap with those of subfamily-II peptides, but co-localization of individual isoforms among subfamily-II peptides has been observed. Our MALDI imaging results of both C. sapidus and C. borealis sinus glands (Fig. 8) provide the first direct biochemical evidence to confirm this distribution pattern. This MS-based imaging method provides unambiguous identification and simultaneous measurement and mapping of multiple CHH-family peptides, including intact CHH and CPRP as well as MIH and MOIH, using a single eyestalk tissue section. This new method overcomes several limitations of traditional immunohistochemical techniques, such as cross-reactivity and limited throughput, by offering accurate and simultaneous mapping of numerous endogenous CHH-family peptide isoforms in a single experiment. Nonetheless, the MALDI MS imaging technique does not rival the cellular spatial resolution offered by antibody-based staining techniques. Here, we used 50 × 50 μm² pixels for MS image acquisition and visualization, enabling the generation of spatial distribution of various CHH-family neuropeptides within a ~1 × 1 mm² tissue area of this important neuroendocrine organ.
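To put the stated pixel size in perspective, the arithmetic can be sketched as follows. The function name is hypothetical, and the tissue is treated as an idealized square of exactly 1 × 1 mm, which the text gives only as an approximate area.

```python
import math

def pixel_grid(width_um, height_um, pixel_um):
    """Return (pixels across, pixels down, total pixels) for a raster of square pixels."""
    nx = math.ceil(width_um / pixel_um)
    ny = math.ceil(height_um / pixel_um)
    return nx, ny, nx * ny

# Idealized 1 x 1 mm area (1000 x 1000 um) sampled with 50 um pixels.
print(pixel_grid(1000, 1000, 50))  # (20, 20, 400)
```

Each ion image of such an area is therefore built from about 400 spectra, which is coarse compared with cellular dimensions and consistent with the resolution caveat noted above.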

CONCLUSIONS
The overall work described here represents a new route to discovery and characterization of large neuropeptides. This multifaceted MS-based strategy involves a comprehensive and systematic implementation of peptide candidate scanning, in silico homology searching, de novo sequencing, distribution mapping, and conformation analysis. The accurate sequence, spatial distribution pattern, and conformational analysis of the CHH-family neuropeptides were elucidated with a combination of MS tools. This high-definition de novo sequencing strategy combines well-established bottom-up, on-line top-down and off-line top-down methods, offering complementary sequence information at the residue level. Because CHH-family peptides represent the typical molecular features of large neuropeptides, this multifaceted strategy is applicable to comprehensive characterization of large peptidomes in other biological systems.
With knowledge of the precise molecular details of these CHH-family neuropeptides, future work will focus on studying their functional roles and modulation mechanisms. An ongoing project based on a novel quantitative top-down MS method will enable monitoring of CHH-family peptide release and deciphering of the regulatory pathways in energy homeostasis and feeding behavior.
Acknowledgments-We thank Bruker Daltonics for graciously loaning the Autoflex III MALDI TOF/TOF mass spectrometer. We are also grateful to the UW School of Pharmacy Analytical Instrumentation Center for access to UHR-TOF maXis, to Lisa Xu and Huseyin Guner for experimental assistance with LTQ-FTICR, and to Emma Doud for assistance in editing this manuscript.
* This work is supported in part by the National Institutes of Health (NIH) grant (R01DK071801 to LL) and the National Science Foundation grant (CHE-0967784 to LL). YG acknowledges support from the Wisconsin Partnership Fund for a Healthy Future and NIH R01HL096971. NLK thanks support from NIH grants R01 GM067193 and P30 DA018310 and the Searle Funds at the Chicago Community Trust (to Chicago Biomedical Consortium). LL acknowledges an H. I. Romnes Faculty Research Fellowship. CJ thanks an Oversea Training Fellowship and UW Vilas Conference Presentation Funds.
This article contains supplemental Figs. S1 to S12 and Tables S1 to S3.