Mass spectrometry‐based top‐down and bottom‐up approaches for proteomic analysis of the Moroccan Buthus occitanus scorpion venom

Buthus occitanus (B. occitanus) is one of the most dangerous scorpions in the world. Despite the involvement of B. occitanus in severe cases of envenomation in Morocco, no study has yet focused on the proteomic composition of the Moroccan B. occitanus scorpion venom. Mass spectrometry‐based proteomic techniques are commonly used to study scorpion venoms, and combining top‐down and bottom‐up approaches facilitates screening by giving a global view of the structural features of such complex matrices. Here, we provide a partial overview of B. occitanus venom in order to explore the diversity of its toxins and, subsequently, to understand their effects. To this end, a combination of top‐down and bottom‐up approaches was applied using nano‐high‐performance liquid chromatography coupled to nano‐electrospray tandem mass spectrometry (nano‐LC‐ESI MS/MS). The LC‐MS results showed that B. occitanus venom contains around 200 molecular masses ranging from 1868 to 16 720 Da, most of which fall between 5000 and 8000 Da. The combined top‐down and bottom‐up LC‐MS/MS results allowed the identification of several toxins, mainly those acting on ion channels, including toxins targeting sodium (NaScTxs), potassium (KScTxs), chloride (ClScTxs), and calcium channels (CaScTx), as well as antimicrobial peptides (AMPs), amphipathic peptides, myotropic neuropeptides, and hypothetical secreted proteins. This study reveals the molecular diversity of B. occitanus scorpion venom and identifies components that may have useful pharmacological activities.

Each year, scorpion stings cause more than 1.5 million cases of envenomation and over 2600 deaths worldwide, mainly in tropical and subtropical countries of South America, Asia, and North Africa [1]. Most of these cases are caused by scorpions of the Buthidae family, which includes dangerous species known for their lethal venoms [2]. The venom of these species is a heterogeneous cocktail of compounds, including inorganic substances, enzymes, mucopolysaccharides, allergenic compounds, and peptides that are highly toxic toward the ion channels of excitable cells [3][4][5][6]. In Morocco, 26 819 scorpion stings were reported in 2019 by the Poison Control and Pharmacovigilance Center of Morocco, corresponding to an incidence of 75.3 cases per 100 000 inhabitants [7]. These figures reflect the country's diverse scorpion fauna, represented by over 50 species that are mainly distributed in the central and southwestern provinces of the kingdom [8]. Among these species, the yellow scorpion Buthus occitanus (B. occitanus) appears to be one of the most dangerous, as its toxic venom is responsible for the majority of envenomation cases [9].
Although several studies have been carried out on this venom [10][11][12][13], no study has yet focused on the proteomic composition of the Moroccan B. occitanus scorpion venom, despite its medical importance. Various strategies exist to screen scorpion venoms, ranging from conventional strategies that target a single toxin to high-throughput screening platforms that give a detailed view of all toxic components. Mass spectrometry-based proteomic approaches remain among the most fundamental tools for deciphering the complexity of such matrices, owing to major advances in instrumentation and software and to improvements in omics strategies (peptidomic, proteomic, transcriptomic, and genomic) [14][15][16][17][18][19]. Two approaches that have significantly improved the proteomics workflow are the top-down approach, a rapid analytical workflow for intact proteins, and the bottom-up approach, which requires proteolytic digestion of the proteins before mass spectrometry analysis. These approaches yield mass fingerprints, primary structural information, and post-translational modifications [20][21][22][23]. Applied singly or in combination, they have increased the number of characterized venoms and identified toxins in several proteomic studies [24][25][26][27][28][29]. In this context, this work aimed to provide an overview of the peptidome (< 30 kDa) of B. occitanus scorpion venom, and thus to explore its toxin arsenal, using a combination of top-down and bottom-up approaches based on nano-high-performance liquid chromatography coupled to nano-electrospray tandem mass spectrometry (nano-LC-ESI MS/MS).

Venom milking
Specimens of B. occitanus were collected from the region of Oualidia (32°44′N 9°01′W), in western Morocco. Venom was collected by electrical stimulation, pooled, centrifuged at 10 000 × g for 20 min, freeze-dried, and stored at −20°C until use [30].
Disulfide bonds in the venom filtrate (< 30 kDa) were reduced with 10 mM DTT in ammonium bicarbonate buffer (50 mM, pH 8.3) for 45 min at 56°C. The free cysteine residues were then carbamidomethylated by incubation with 50 mM iodoacetamide (IAA) in ammonium bicarbonate (50 mM, pH 8.3) for 1 h in the dark. The proteins/peptides were then desalted on ZipTip C4 (Millipore Corporation, Billerica, USA) and concentrated in a Savant SpeedVac (Thermo Scientific, San Jose, CA, USA).
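Reduction and carbamidomethylation shift each toxin's mass by a predictable amount, which is what allows native masses to be paired with their reduced/alkylated counterparts. The sketch below computes that shift, assuming that every cysteine is engaged in a disulfide bridge in the native toxin and is fully alkylated afterwards; the function name is hypothetical and the constants are standard monoisotopic values.

```python
# Monoisotopic masses (Da)
H_ATOM = 1.00782503207      # H atom gained by each Cys when its disulfide is reduced
CARBAMIDOMETHYL = 57.02146  # C2H3NO added to each free thiol by iodoacetamide

def reduction_alkylation_shift(n_cys):
    """Mass increase (Da) when n_cys disulfide-bonded cysteines are reduced
    and fully carbamidomethylated (assumes no free Cys in the native form)."""
    if n_cys % 2:
        raise ValueError("expected an even number of disulfide-bonded cysteines")
    return n_cys * (H_ATOM + CARBAMIDOMETHYL)

# e.g. a long-chain NaScTx with four disulfide bridges (8 Cys)
shift = reduction_alkylation_shift(8)
print(f"{shift:.4f} Da")
```

Each cysteine contributes about 58.029 Da in total (one hydrogen from reduction plus one carbamidomethyl group), so an eight-cysteine toxin should appear roughly 464.23 Da heavier after reduction and alkylation.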

Top-down proteomics
Intact and reduced/alkylated B. occitanus venom filtrates were analyzed on an Orbitrap Fusion™ Lumos™ mass spectrometer (Thermo Scientific™, Waltham, MA, USA) coupled to a Dionex HPLC system (Fig. 1).
Proteins/peptides were eluted directly from the column into the mass spectrometer, which was operated in positive mode with a spray voltage of 1.6 kV. MS spectra were acquired at a resolution setting of 120 000.
MS/MS analysis was performed in data-dependent acquisition mode: the 10 most abundant precursor ions were selected for EThcD (electron-transfer/higher-energy collision dissociation) fragmentation, with a dynamic exclusion time of 90 s. MS/MS spectra were acquired at a resolution setting of 120 000 over a mass range of 150 to 2000 m/z.
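The data-dependent logic (top 10 precursors per survey scan, 90 s dynamic exclusion) can be modeled in a few lines. This is a simplified, hypothetical model rather than the instrument's firmware: the 10 ppm matching window for excluded m/z values is an assumption not stated in the text, and charge-state and intensity-threshold filters are ignored.

```python
def select_precursors(scans, top_n=10, exclusion_s=90.0, tol_ppm=10.0):
    """Simplified top-N data-dependent precursor selection with dynamic exclusion.

    scans: survey (MS1) scans in time order, each as (time_s, [(mz, intensity), ...]).
    Returns a list of (time_s, mz) selections.
    tol_ppm is an assumed m/z matching window for the exclusion list.
    """
    exclusion = []   # (mz, expiry_time_s)
    selections = []
    for t, peaks in scans:
        # drop exclusion entries whose exclusion window has elapsed
        exclusion = [(mz, expiry) for mz, expiry in exclusion if expiry > t]
        picked = 0
        for mz, _ in sorted(peaks, key=lambda p: p[1], reverse=True):
            if picked == top_n:
                break
            if any(abs(mz - ex) / ex * 1e6 <= tol_ppm for ex, _ in exclusion):
                continue  # fragmented recently: skip this precursor
            selections.append((t, mz))
            exclusion.append((mz, t + exclusion_s))
            picked += 1
    return selections

scans = [
    (0.0, [(500.0, 100.0), (600.0, 50.0)]),
    (30.0, [(500.0, 100.0), (700.0, 10.0)]),
    (100.0, [(500.0, 100.0)]),
]
print(select_precursors(scans, top_n=1))
# [(0.0, 500.0), (30.0, 700.0), (100.0, 500.0)]
```

With `top_n=1`, the ion at m/z 500 is skipped at 30 s because it is still excluded, so a less intense ion is fragmented instead; it becomes eligible again once its 90 s window has expired. This is the intended effect of dynamic exclusion: spreading MS/MS events over more distinct precursors.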
For the bottom-up analyses, peptides were eluted at 250 nL·min⁻¹ using a gradient of 3-22% solvent B over 112 min, then 22-38% solvent B over 35 min, and finally 38-60% solvent B over 15 min. The Q-Exactive Plus was operated in data-dependent acquisition mode. MS and MS/MS spectra were acquired at a resolution of 60 000, and the 10 most abundant precursor ions were selected for HCD fragmentation with the collision energy set to 27. Singly charged precursors and those with a charge state > 7 were excluded.
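The three-segment gradient can be written as a piecewise-linear %B program. The sketch assumes that the gradient starts at 3% B at t = 0 and that the segments run back to back; loading, wash, and re-equilibration steps are not described in the text and are omitted. The breakpoint list is our own encoding, not an instrument method file.

```python
from bisect import bisect_right

# (time in min, % solvent B) breakpoints of the three linear segments
GRADIENT = [(0.0, 3.0), (112.0, 22.0), (147.0, 38.0), (162.0, 60.0)]
FLOW_NL_PER_MIN = 250.0

def percent_b(t_min, program=GRADIENT):
    """Linearly interpolated %B at time t_min (clamped outside the program)."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return program[0][1]
    if t_min >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, t_min)  # times[i-1] <= t_min < times[i]
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

total_min = GRADIENT[-1][0]
total_volume_ul = FLOW_NL_PER_MIN * total_min / 1000  # volume pumped over the gradient

print(percent_b(56.0))    # 12.5 (midway through the first segment)
print(total_volume_ul)    # 40.5
```

Encoding the program as breakpoints makes it easy to read off the solvent composition at the retention time of any peptide of interest.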
In-gel digestion
First, 2 mg of venom filtrate was denatured for 5 min at 95°C in LDS sample buffer and then separated by SDS/PAGE on a 4-20% polyacrylamide gradient gel (SDS Precast Gel RunBlue, 4-20%, 12 well; Expedeon, CA, USA). Electrophoresis was performed on a Bio-Rad system at a constant voltage of 140 V, and the separated proteins were stained with Coomassie Brilliant Blue R (InstantBlue; Expedeon, CA, USA).
Stained bands corresponding to proteins/peptides with masses < 30 kDa (Fig. S1) were manually excised into small cubes of about 1 mm³ and washed with Milli-Q water, 50 mM ammonium bicarbonate, and 50% acetonitrile (ACN). The gel pieces were then reduced in-gel with DTT (10 mM) in ammonium bicarbonate buffer (50 mM, pH 8.3) for 45 min at 56°C, alkylated with IAA (50 mM) in ammonium bicarbonate buffer (50 mM, pH 8.3) for 20 min in the dark, and digested overnight at 37°C with 0.1 µg of trypsin (Promega) [31]. The enzymatic reaction was stopped by adding 5 µL of 5% formic acid (FA), and the peptides were desalted on ZipTip C18. After drying, the digested peptides were dissolved in 100 µL of 0.1% (v/v) FA and analyzed on a liquid chromatography-tandem mass spectrometry (LC-MS/MS) system composed of a nano-flow HPLC pump and an Orbitrap Q-Exactive mass spectrometer (Thermo Scientific) with a nano-electrospray ion source, as described above.

Fig. 1. Experimental workflow performed in this study. B. occitanus venom was milked by electrical stimulation and applied to a 30 kDa filter. For top-down venomics, the flow-through containing toxins < 30 kDa was analyzed on the Thermo Scientific™ Orbitrap Fusion Lumos Tribrid mass spectrometer. For the bottom-up approach, two digestion methods were used: (1) in-solution digestion, in which the flow-through containing toxins < 30 kDa was directly reduced with DTT, alkylated with IAA, and digested with trypsin; and (2) in-gel digestion, in which the stained gel band was excised into small cubes, reduced, alkylated, and digested. The digested peptides were then desalted with ZipTip and analyzed on the Orbitrap Q-Exactive mass spectrometer.
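Trypsin cleaves C-terminal to Lys and Arg, classically not when the next residue is Pro. An in silico digest using the search settings given under Data analysis (up to two missed cleavages, minimum of five residues) can be sketched as follows; the function name and the toy sequence are hypothetical.

```python
import re

def tryptic_peptides(seq, max_missed=2, min_len=5):
    """In silico trypsin digest: cleave after K/R unless the next residue is P.

    Returns peptides with up to `max_missed` missed cleavages and at least
    `min_len` residues, ordered by start position.
    """
    # positions just after each K/R that is not followed by P
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", seq)]
    bounds = [0] + [s for s in sites if s < len(seq)] + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # a span from bounds[i] to bounds[j] contains j - i - 1 missed cleavages
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            pep = seq[bounds[i]:bounds[j]]
            if len(pep) >= min_len:
                peptides.append(pep)
    return peptides

toy = "GASTKLLLLRPVVVVRWWWWW"   # hypothetical sequence
print(tryptic_peptides(toy, max_missed=0))
# ['GASTK', 'LLLLRPVVVVR', 'WWWWW']
```

The R-P bond in the toy sequence is left intact, so `LLLLRPVVVVR` survives as a single fully cleaved peptide. Allowing missed cleavages adds the longer overlapping spans that a search engine will also consider.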

Data analysis
The top-down liquid chromatography-mass spectrometry (LC-MS) data of the native B. occitanus venom filtrate were deconvoluted using the Xtract algorithm within Thermo Scientific Xcalibur 2.2 software (Thermo Fisher Scientific).
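Deconvolution converts a series of multiply charged ions into a single neutral mass. Xtract itself fits isotope patterns and is proprietary; the sketch below shows only the two basic relations that such algorithms rely on: neutral mass from m/z and charge, and charge from the spacing of ¹³C isotope peaks. Function names are hypothetical, and the example ion is constructed from a mass quoted in the text, not taken from the raw data.

```python
PROTON = 1.007276467    # Da, proton mass
C13_DELTA = 1.0033548   # Da, 13C - 12C mass difference

def neutral_mass(mz, z):
    """Neutral mass M of an [M + zH]z+ ion observed at m/z."""
    return z * (mz - PROTON)

def charge_from_isotope_spacing(delta_mz):
    """Charge state estimated from the m/z spacing of adjacent isotope peaks."""
    return round(C13_DELTA / delta_mz)

# Hypothetical 8+ ion of a 7431.33 Da toxin
mz_8plus = 7431.33 / 8 + PROTON
print(charge_from_isotope_spacing(0.1254))   # 8
print(round(neutral_mass(mz_8plus, 8), 2))   # 7431.33
```

In practice, every charge state of the same protein should return the same neutral mass, which is how deconvolution software groups peaks belonging to one component.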
For protein identification, data from both venomic nano-LC-MS/MS approaches were processed with Proteome Discoverer 2.2 (Thermo Fisher Scientific) against the UniProtKB database (downloaded on 2016-10-11; taxon identifier 6855; 4309 entries).
Processing parameters were as follows: an MS (precursor) mass tolerance of 50 ppm and an MS/MS (fragment) mass tolerance of 0.3 Da. One unique peptide was required for protein identification, the minimum peptide length was five amino acids, and the false discovery rate (FDR) cutoff was 1%. Trypsin was selected as the enzyme, with a maximum of two missed cleavages for the bottom-up analysis. Variable modifications included methionine oxidation and cysteine carbamidomethylation; no fixed modification was set.
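The tolerance and FDR settings translate into simple checks. The sketch below assumes a plain target-decoy estimate (decoy hits divided by target hits above a score threshold); Proteome Discoverer's own validators, such as Percolator, are more elaborate, and the function names and toy scores are hypothetical.

```python
def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6

def precursor_ok(observed, theoretical, tol_ppm=50.0):   # MS tolerance from the text
    return abs(ppm_error(observed, theoretical)) <= tol_ppm

def fragment_ok(observed, theoretical, tol_da=0.3):      # MS/MS tolerance from the text
    return abs(observed - theoretical) <= tol_da

def score_cutoff(psms, fdr=0.01):
    """Lowest score threshold whose target-decoy FDR (decoys/targets) is <= fdr.

    psms: iterable of (score, is_decoy); higher scores are better.
    Returns None if no threshold satisfies the FDR. Ties are not handled specially.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            cutoff = score
    return cutoff

# Toy PSM list: 20 targets scoring 100..81, then a decoy, a target and a decoy
psms = [(s, False) for s in range(100, 80, -1)] + [(80, True), (79, False), (78, True)]
print(score_cutoff(psms))   # 81
```

Note the asymmetry of the settings: the precursor tolerance is relative (50 ppm is 0.05 Da at 1000 Da but about 0.37 Da at 7400 Da), whereas the fragment tolerance is an absolute 0.3 Da.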

Mass spectrometry-based proteomic approaches
All proteomic analyses relied solely on UniProtKB database-dependent searches, without any manual de novo sequence annotation; therefore, most of the reported peptide annotations remain approximations. It should also be stressed that the relative abundances and percentages of the described peptides are based purely on the number of identifications, not on concentrations, because no quantitative analysis was performed.

Top-down proteomics
The total ion chromatogram (TIC) generated from the top-down LC-MS analysis of native B. occitanus venom filtrate (Fig. 2) gave a partial picture of the venom complexity, with around 60 peaks, most of them detected with high relative abundance.
The mass fingerprint of B. occitanus venom was generated by manual deconvolution of the spectra acquired by the top-down LC-MS approach, yielding a total of 197 monoisotopic masses ranging from 1868 to 16 720 Da (Table 1): one mass below 2000 Da, 28 masses between 2000 and 5000 Da, 147 masses between 5000 and 8000 Da, and 21 masses above 8000 Da.

Fig. 2 (partial caption). The x-axis represents the relative abundance (%), and the y-axis the retention time (min).
Fig. 3 (partial caption). Spectra were deconvoluted, and the resulting monoisotopic masses were distributed according to their MW.
The most represented molecular masses were those from 5000 to 8000 Da, followed by those between 2000 and 5000 Da, which account for 74% and 14%, respectively, of the total number of measured molecular masses (Fig. 3).
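The mass classes and their shares follow directly from the counts reported above. In the sketch below, the bin edges are lower-inclusive, which is an assumption because the text does not state where a mass of exactly 2000, 5000, or 8000 Da falls. As stressed earlier, the percentages are shares of the number of masses, not of venom abundance.

```python
# Mass classes used in the text; lower-inclusive edges are an assumption.
BINS = [
    (0.0, 2000.0, "<2000 Da"),
    (2000.0, 5000.0, "2000-5000 Da"),
    (5000.0, 8000.0, "5000-8000 Da"),
    (8000.0, float("inf"), ">8000 Da"),
]

def mass_class(mass):
    for lo, hi, label in BINS:
        if lo <= mass < hi:
            return label
    raise ValueError(f"mass out of range: {mass}")

def shares(counts):
    """Percentage of the total number of masses in each class (counts, not abundances)."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}

# Counts reported for the 197 deconvoluted masses
counts = {"<2000 Da": 1, "2000-5000 Da": 28, "5000-8000 Da": 147, ">8000 Da": 21}
for label, pct in shares(counts).items():
    print(f"{label:>13}: {pct:.1f}%")
```

Given the full list of deconvoluted masses, `mass_class` would rebuild the same counts by tallying each mass into its class.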
The analysis of the reduced/alkylated B. occitanus venom filtrate by tandem mass spectrometry allowed the identification of 68 peptides with molecular weights (MW) from 1959.13 to 7943.53 Da. The detected experimental sequences are shown in Table 2 […] (Fig. 4). The other peptides corresponded approximately to toxins previously identified in other scorpion species, with sequence identities ranging from 17% to 98% (Fig. S2). The detected peptides were divided into five categories on the basis of their molecular functions according to the UniProtKB database (https://www.uniprot.org): 63 neurotoxins acting on sodium channels […] (Fig. 5A).
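Sequence identity figures such as the 17% to 98% range depend on how identity is defined. The sketch below computes it for two sequences that have already been aligned (the alignment step is not shown), counting only columns where neither sequence has a gap. Other tools divide by the alignment length or by the shorter sequence instead, so values are not always comparable across tools. The function name and sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Identity is computed over columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)

print(percent_identity("ACDE-G", "ACDFHG"))  # 80.0
```

Here four of the five ungapped columns match; dividing by the full alignment length of six would instead give about 66.7%.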

Bottom-up proteomics
For the bottom-up workflow, two digestion methods were used: (a) in-solution digestion, in which the flow-through containing toxins < 30 kDa was directly reduced with DTT, alkylated with IAA, and digested with trypsin; and (b) in-gel digestion, in which the gel band corresponding to peptides under 30 kDa (Fig. S1) was excised into small cubes that, after a series of washes, were reduced, alkylated, and digested. In-gel digestion led to the identification of 36 peptides, and in-solution digestion to the identification of 37. The detected peptides showed sequence similarity with peptides from other scorpion species, with sequence coverage ranging from 10.23% (P68721) to 86.15% (P01489) for the in-gel digestion and from 8.75% (P0C294) to 92.86% (P80669) for the in-solution digestion.
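Sequence coverage is the fraction of a protein's residues covered by at least one identified peptide. The sketch below uses exact string matching, so it ignores the isobaric Leu/Ile ambiguity and any modifications; the function name and the toy sequence are hypothetical.

```python
def sequence_coverage(protein, peptides):
    """Percentage of residues in `protein` covered by at least one peptide."""
    covered = set()
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # record every (possibly overlapping) occurrence
            covered.update(range(start, start + len(pep)))
            start = protein.find(pep, start + 1)
    return 100.0 * len(covered) / len(protein)

print(sequence_coverage("MKTAYIAKQR", ["KTAY", "AYIA"]))  # 60.0
```

Overlapping peptides are counted once per residue, which is why the two four-residue peptides here cover six residues rather than eight.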
The categories of peptides identified by in-gel digestion were as follows: 27 NaScTxs, seven KScTxs, and two ClScTxs (Table 3). In-solution digestion identified 24 NaScTxs, eight KScTxs, and three ClScTxs, as well as one entry sharing 60% similarity with neurotoxin Tx-2 (P83406) from Hottentotta judaicus, which could correspond to a scorpion calcium channel activator (CaScTx). Besides neurotoxins, one amphipathic peptide was detected by this digestion method (Table 4).
As mentioned above, we aimed to gain a deeper understanding of the B. occitanus peptidome (< 30 kDa) and thus of the molecular diversity of its toxins. We therefore combined the top-down and bottom-up data sets to obtain a global and comprehensive characterization of this venom.
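Because the data sets overlap, the number of distinct identifications is the size of their union rather than their sum. The text reports 68 (top-down), 36 (in-gel), and 37 (in-solution) peptides, that is, 141 in total, against 102 distinct peptides after combination. The sketch below merges identifications by accession and records which data sets reported each one; the accessions are hypothetical placeholders, not the study's results.

```python
def merge_identifications(**datasets):
    """Union of accessions across data sets, plus the data sets reporting each one."""
    union = set().union(*datasets.values())
    found_in = {
        acc: sorted(name for name, accs in datasets.items() if acc in accs)
        for acc in union
    }
    return union, found_in

# Toy accession sets (hypothetical placeholders)
union, found_in = merge_identifications(
    top_down={"ACC1", "ACC2", "ACC3"},
    in_gel={"ACC2", "ACC4"},
    in_solution={"ACC3", "ACC4", "ACC5"},
)
print(len(union))          # 5
print(found_in["ACC4"])    # ['in_gel', 'in_solution']
```

Keeping the per-accession source list, rather than only the merged set, shows how much each approach contributes uniquely, which is the argument for combining top-down and bottom-up data.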

Discussion
Envenomation following scorpion stings is one of the most frequent emergencies in large parts of the world, especially in North Africa, where the reported incidence and lethality are highest [1]. Morocco carries a high risk of envenomation owing to its large and diverse scorpion fauna. Among the scorpion species living in this country, the yellow scorpion B. occitanus is one of the most dangerous, with a venom responsible for severe cases of envenomation.
Because of the limited knowledge about the composition and toxin arsenal of B. occitanus venom, in this study we aimed to provide the first overall view of this scorpion venom peptidome and its molecular diversity, using mass spectrometry-based top-down and bottom-up approaches.
Top-down data showed that the venom of B. occitanus is very complex, containing around 200 components with MWs ranging from 1868 to 16 720 Da. Similar numbers of components have been reported in previous studies [32][33][34]; other venoms contain fewer components, such as those of Leiurus abdullahbayrami (45 masses) and Opisthacanthus elatus (106 masses) [35,36], whereas some scorpion venoms are more complex, such as those of Pandinus cavimanus (390 masses) and Centruroides limpidus (395 masses) [37,38]. In addition, the MW distribution showed that < 1% of the components had masses < 2000 Da, 14% had masses from 2000 to 5000 Da, 74% from 5000 to 8000 Da, and 10% above 8000 Da, whereas the MW distribution of the French B. occitanus venom showed an abundance of molecules from 2000 to 3000 Da and below 2000 Da [39]. Most importantly, the complete sequences of five toxins were identified with 100% sequence coverage by the top-down approach. These neurotoxins were detected for the first time in this venom; all belong to the NaScTx category and share high sequence similarity with toxins identified in other scorpion species: neurotoxin BmK-II (P59360), beta-insect depressant toxin BotIT4 (P55903), beta-insect depressant toxin BaIT2 (P80962), insect toxin LqhIT5 (P81240), and insect toxin BsIT4 (P82814). Notably, the observed sequence matched to the P59360 entry (MW 7431.33 Da) showed 100% similarity with neurotoxin BmK-II, isolated from the Chinese scorpion Mesobuthus martensii, which is active on mammalian and insect Nav channels [40]. In contrast, the sequence detected for the P81240 entry (6611.8 Da) carries an N-terminal methionine that is not present in the database sequence of insect toxin LqhIT5, an excitatory insect beta-toxin from the scorpion Leiurus hebraeus [41].
Similarly, the observed sequence for the P82814 entry (6954.15 Da) corresponds 100% to insect toxin BsIT4, a depressant insect beta-toxin isolated from Hottentotta tamulus sindicus [42]. The observed sequence of the peptide matched to the depressant toxin BotIT4 (6837.96 Da) also carries an N-terminal methionine that is not present in the database sequence. This toxin, first identified in the Tunisian scorpion Buthus tunetanus [43], also shares 100% sequence identity with the P80962 entry (6845.9 Da), the beta-insect depressant toxin BaIT2 isolated from the scorpion Buthacus arenicola [44]. Such high amino acid sequence similarity, seen in both depressant toxins and in the other peptides, is common among scorpion toxins.
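An extra N-terminal methionine, as reported for the P81240 and BotIT4 matches, can be checked from masses alone: adding Met to a chain increases its monoisotopic mass by exactly one Met residue (131.04049 Da). The sketch assumes an otherwise unmodified chain (no C-terminal amidation or other PTMs), a user-supplied number of disulfide bridges, and a 10 ppm tolerance. The function names and the toy sequence are hypothetical. A mass match alone does not localize the extra residue; the fragment spectra are needed for that.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
TWO_H = 2.01565  # lost for each disulfide bridge formed

def mono_mass(seq, n_disulfides=0):
    """Monoisotopic mass of an unmodified peptide chain with optional disulfides."""
    return sum(RESIDUE[aa] for aa in seq) + WATER - n_disulfides * TWO_H

def explained_by_extra_n_met(observed, db_seq, n_disulfides=0, tol_ppm=10.0):
    """True if `observed` matches the database sequence plus one N-terminal Met."""
    expected = mono_mass("M" + db_seq, n_disulfides)
    return abs(observed - expected) / expected * 1e6 <= tol_ppm

toy = "GASTK"  # hypothetical database sequence
print(round(mono_mass("M" + toy) - mono_mass(toy), 5))   # 131.04049
print(explained_by_extra_n_met(mono_mass("M" + toy), toy))  # True
```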
Overall, the combined top-down and bottom-up data sets identified 102 different peptides in B. occitanus venom; for comparison, 147 proteins were characterized in the venom of the yellow Brazilian scorpion Tityus serrulatus, 60 of them by the top-down approach [45]. Neurotoxins were the best-represented category in our venom, mainly NaScTxs (77%); these neurotoxins are abundant in species of the Buthidae family [38,46,47] and less represented in non-Buthidae scorpions [33,48,49]. NaScTxs are responsible for envenomation symptoms [39], and their high content in B. occitanus venom could explain the involvement of this scorpion in lethal envenomation cases in the country.
Among the entries corresponding to NaScTxs are alpha-like toxins, a type of toxin already identified in several Buthus species. However, the alpha-toxin Bot1 (P01488) had not previously been found in Moroccan Buthus species other than Buthus mardochei [39][50][51][52][53], and it was identified here with high sequence coverage (98.48% in the top-down data set). We also identified, for the first time in this scorpion venom, peptides corresponding to atypical NaScTxs, such as makatoxin-1, a fragment of makatoxin-2, toxin Cg2, chain […] in venom toxin meuNa32, and toxin AaHIT4 (which could bind to receptor site 3 or 4 of the sodium channel) [33].
Besides NaScTxs, KScTxs (14%) and ClScTxs (3%) were identified; these two categories of peptides have shown activity against autoimmune diseases and cancers, respectively [54][55][56][57][58]. We also identified one entry sharing 60% similarity with neurotoxin Tx-2 (P83406), a calcium channel activator first identified in Buthotus judaicus. This category of toxins has been identified in few scorpion species, for example Parabuthus transvaalicus (kurtoxin) and Parabuthus granulatus (kurtoxin-like I), but had never been detected in a Moroccan scorpion venom [59,60]. Finally, peptides belonging to the toxin Acra category were also detected in B. occitanus venom; these toxins probably act on ion channels.
Some peptides with antibacterial activities were also found, for example the amphipathic peptide B8XH50 and the AMP AcrAP1 (A0A059UI30); this category is commonly present in scorpion venoms because of its role in protecting the venom glands and its involvement in neurotoxic effects [61][62][63][64][65]. Other components were identified at low percentages, such as orcokinin, a myotropic neuropeptide identified in crustaceans, insects, and arachnids [17,66], and hypothetical secreted proteins, whose activities are unknown. Finally, some of the detected toxins were identified as fragments and chains, possibly as a result of toxin proteolysis. This process appears to be a common post-translational modification (PTM) in scorpion and snake venoms, although its biological relevance remains unclear [17,45].
This study deciphered the peptide arsenal of the Moroccan B. occitanus scorpion venom from a proteomic perspective, without de novo sequence annotation. These findings are a step toward a deeper understanding of this scorpion venom; nevertheless, complete identification of this complex matrix remains challenging, especially without a species-specific database and/or a complete genome sequence for this scorpion.

Conclusion
Herein, we reported the first proteomic study of the Moroccan B. occitanus scorpion venom peptidome, using mass spectrometry-based top-down and bottom-up venomic approaches. The combination of these approaches allowed the identification of 102 components, classified approximately into different categories, mainly neurotoxins (96%), including NaScTxs (77%), KScTxs (14%), ClScTxs (3%), CaScTx (1%), and toxin Acra (1%). We also identified AMPs (1%), amphipathic peptides (1%), hypothetical secreted proteins (1%), and myotropic neuropeptides (1%). This study is a step toward a deeper understanding of B. occitanus venom; nevertheless, complete identification of this complex matrix remains challenging, especially without a species-specific database and a complete genome sequence.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.