Bottom-Up Design Approach for OBOC Peptide Libraries

One-bead-one-compound peptide libraries, developed following the top-down experimental approach, have attracted great interest in the identification of potential ligands or active peptides. By exploiting a reverse experimental design approach based on the bottom-up strategy, we aimed to develop simplified, maximally diverse peptide libraries that resulted in the successful characterization of mixture components. We show that libraries of 32 and 48 components can be successfully detected in a single run using chromatography coupled to mass spectrometry (UPLC-MS). The proposed libraries were further theoretically evaluated in terms of their composition and physico-chemical properties. By combining the knowledge obtained on single libraries we can cover larger sequence spaces and provide a controlled exploration of the peptide chemical space both theoretically and experimentally. Designing libraries by using the bottom-up approach opens up the possibility of rationally fine-tuning the library complexity based on the available analytical methods.

OBOC combinatorial libraries allow the synthesis of large and diverse mixtures of peptide sequences and their concurrent screening for affinity interactions [1,2,12-18]. The four main steps employed in the development of OBOC peptide libraries are: (a) library synthesis, (b) screening against biological targets of interest, (c) deconvolution of the sequence, followed by (d) validation of peptide entities [19]. Several examples have been developed and characterized following this scheme [12,16,20-22]. In addition, a variety of optimized methods have been proposed to improve both synthesis and screening processes, such as the use of color coding [23], fluorescent dyes [17], immobilization of beads on solid supports [24] and others [3,14,18,21,25]. This typical top-down design approach to OBOC peptide libraries leads to the efficient synthesis of large libraries but often fails at the sequence identification step after screening. The unambiguous confirmation of peptide sequences remains a major bottleneck due to mixture complexity [3,12,26]. Such complexity often hampers straightforward sequence elucidation due to the co-existence of palindromic or isobaric sequences.
For this purpose, we set out to determine whether a reverse experimental design approach based on the bottom-up strategy could lead to the efficient characterization of random peptide library mixtures by reducing the complexity of single libraries. Specifically, the focus was put on quality analysis of libraries designed by genetic algorithms to minimize sequence overlap and thus facilitate their characterization by methods routinely available in research laboratories. The advantage of the proposed strategy is that it yields fully characterized smaller portions of the sequence space (composed of 10¹-10² components) that can subsequently be combined to encompass larger portions of it. Additionally, it allows a controlled increase in library size and complexity, giving rise to rationally designed platforms able to provide more informative data on single sequence behavior.

Bottom-Up Experimental Approach for the Design of OBOC Peptide Libraries
The bottom-up experimental approach to OBOC peptide library design is based on choosing the library complexity to match the available instrumental characteristics, such as resolving power and mass accuracy. The goal is rapid design and efficient analysis of single peptide sequences from simplified mixtures, which could be performed routinely after synthesis by UPLC-MS or other complementary techniques. This approach consists of three main steps, schematically represented in Figure 1: (a) the choice of amino acids (input) to obtain random peptide library designs (output) [29], (b) sequence-based property evaluation, followed by (c) library synthesis and characterization. Having a smaller library does not necessarily solve the issue of overlapping peptides. Therefore, we implemented a search-based algorithm to find optimal solutions, maximal in size and minimal in overlap.
To show the applicability of the bottom-up design approach, two libraries were designed and characterized. The first step was the choice of amino acids to be included in a specific library based on their properties, which consequently dictate the properties of the library. In our previous work, we developed a genetic algorithm (GA) for OBOC peptide library design that produces random simplified libraries in which all components can be distinguished by their mass [29]. The parameters are the number of positions (r), i.e., peptide length, the list of possible residues for each position and the mass variability parameter (ΔT) [29]. We chose two algorithm-assisted designs for further experimental validation that consisted of five (Library 1) and six (Library 2) positions r where variability was introduced [29]. For Library 1, six amino acids (s, e, r, w, a, G) were proposed for each position (x1-x5) and one fixed residue (y) was used in the last position (x6), while for Library 2, seven amino acids (s, e, r, w, a, G, i) were put in each position and two fixed positions (p, y) were used (x3 and x7). This input is fed to the algorithm, which produces a series of possible output options for library design [29]. The chosen output suggestions were taken from the 90-100% mass diversity region for ΔT = 1. Our focus was put on the maximum (above 90%) mass and sequence diversity region because we believe it would lead to a more straightforward chromatographic and mass analysis. In both examples, all-D peptide libraries were considered.
Libraries were designed to contain amino acids important for protein-protein interactions with the aim to develop libraries that could be useful for screening of specific protein-protein or receptor-ligand interactions [26,30].
Next, a sequence-based analysis of relevant properties such as hydrophobicity, polarity, stability and potential for interactions through H-bonding (Section 2.2) was performed [31]. Finally, split-and-mix synthesis and subsequent characterization, taking into account the type of analytical instrument or platform, its resolving power and the molecular complexity of the library, were carried out by (a) UPLC-MS, (b) direct-injection high-resolution MS to increase mass accuracy and throughput and (c) nano LC-MS/MS to increase sensitivity, confirm the exact amino acid sequences and offer the possibility to work with minimal amounts of sample (Section 2.3). Linear regression and Spearman's correlation coefficient were used to analyze the relationship between experimentally (UPLC) determined retention times (Rt) and computed library properties.

Theoretical Evaluation of Library Properties
The composition and physico-chemical properties (Cruciani properties: polarity, hydrophobicity and H-bonding [32]; hydrophobicity index, hydrophobic moment, aliphatic index, Boman index, net charge, isoelectric point and instability index) of the synthesized libraries were calculated using existing scripts from the R package Peptides [33,34]. To determine peptide composition, amino acids were classified based on their side chains into nine classes (tiny, small, aliphatic, aromatic, non-polar, polar, charged, basic and acidic) [33]. We expanded the existing classification by adding two more classes: (10) sulfur (C, M) and (11) hydroxyl (S, T).
Composition calculations are graphically presented in Figure 2, where each property is expressed in mole%, allowing for the direct comparison of the classes of properties present in L1 and L2. The two libraries show similar composition, with the main differences observed for the aliphatic, polar and non-polar classes. This was expected because the same amino acids (s, e, r, w, a, G, y) were used for library construction. Different combinations of them were employed to make up the sequences in each library, with L2 having additional isoleucine and proline residues that contribute to the differences in overall library polarity.
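The mole% composition for the two added classes can be illustrated with a short Python sketch. This is not the R script used in the study: the function name `class_mole_percent` and the example sequences are hypothetical, and only the two class definitions stated in the text (sulfur: C, M; hydroxyl: S, T) are taken from the source. Lowercase letters (D-amino acids) are treated like their L counterparts for counting purposes.

```python
from collections import Counter

# Only the two classes added in this work are listed; their members are from the text.
EXTRA_CLASSES = {"sulfur": set("CM"), "hydroxyl": set("ST")}


def class_mole_percent(peptides, classes=EXTRA_CLASSES):
    """Return the mole% of residues belonging to each class across all peptides."""
    # Count every residue in the library, ignoring the D/L case convention.
    counts = Counter(res for pep in peptides for res in pep.upper())
    total = sum(counts.values())
    return {
        name: 100.0 * sum(counts[r] for r in members) / total
        for name, members in classes.items()
    }


if __name__ == "__main__":
    # Hypothetical two-peptide mixture, for illustration only.
    print(class_mole_percent(["sts", "aca"]))
```

Passing a different `classes` dictionary would extend the same calculation to any other side-chain classification.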
In addition, violin plots were used to compare the distributions of the calculated properties in the two libraries, including the Cruciani properties [32]. From these, we can conclude that L1 shows higher polarity and H-bonding acceptor propensity while having slightly lower hydrophobicity compared to L2.
Hydrophobicity is an important stabilization force in protein folding but also determines the chromatographic separation of peptides. Hydrophobicity index and hydrophobic moment (Figure 3b,c) show moderate values, with L2 being slightly more hydrophobic than L1. As expected, the aliphatic index (Figure 3d) is higher in L2 because of the presence of isoleucine and proline, which are absent in L1. Aliphatic amino acids contribute to the thermal stability of proteins [35].
The isoelectric point (IP) and the net charge are variables that affect the solubility of peptides under specific pH conditions. We considered these variables at physiological pH. The designs for L1 (one E and two R residues) and L2 (two E and one R residues) suggest that the libraries will show net charges of similar magnitude, with L1 being predominantly positively charged and L2 predominantly negatively charged (Figure 3f). Accordingly, L1 shows IPs above the physiological pH, while L2 shows IPs below it (Figure 3e).
The Boman index provides an estimate of the potential of a peptide to bind to other proteins [36]. Values above 2.4 indicate a high binding potential. It can be observed that L1 has an overall higher binding potential than L2 (Figure 3g).
The instability index calculations (Figure 3h), available for L-peptides, provided an estimate of the stability of the proposed library components based on the recognition of pairs of stable dipeptides [37]. According to this parameter, the all-L libraries were predicted to be unstable, with instability indices above the threshold value of 40 [31]. With the intention of creating more stable libraries, we opted for the synthesis of all-D peptide libraries. The OBOC method allows for easy and fast introduction of D- or other unnatural amino acids during library synthesis [1,2,13].
Properties-based chemical space evaluation is a valuable tool to gain insight into possible issues that could arise during chromatographic separations. In reverse-phase liquid chromatography, peptides are separated according to their hydrophobicity. If peptides are too hydrophobic, their detection might fail due to solubility issues, sustained non-covalent interactions with the stationary phase in the LC column or insufficient separation prior to MS. On the other hand, very hydrophilic peptides could suffer from insufficient retention on the LC column prior to MS. A library of the same size but with a more hydrophilic composition would require adaptation of the LC method and a shallower solvent gradient to allow for detection and separation of all peptide components. Therefore, insight into hydrophobicity, polarity, net charge and other properties can anticipate these drawbacks, which can be addressed during the library design step in a bottom-up approach through the choice of amino acids.
Chromatographic separations provide useful information about analyte properties. Linear regression was used to analyze the relationship between experimentally (UPLC) determined retention times (Rt) and library composition calculated with R, to understand which peptide properties most strongly influenced Rt. For this purpose, the calculated properties for both libraries were pooled to cover a larger sequence space. Additional data on new libraries can be added to cover even larger sequence spaces, with the goal of gaining knowledge on growing portions of it. The obtained relationships (Figures S2-S4) indicate moderate linearity for monoisotopic mass (R² = 0.48), the Cruciani H-bonding property (R² = 0.48) and aromaticity (R² = 0.44), a low level of linearity for the tiny class (R² = 0.3) and Cruciani polarity (R² = 0.22), and no linearity for the other properties (R² < 0.15). As expected, it is not trivial to link a single property to the Rt measurements [38]; rather, a balance of different properties is involved.
Using Spearman's correlation coefficient (ρ), a nonparametric statistical measure that does not assume normality in the data distribution, we analyzed the strength of the monotonic relationship between Rt and the computed peptide properties (Figure S5). Only some properties showed an appreciable relationship, while the others exhibited weak or very weak relationships. Clearly, the relationship is more complex than a linear one, and a plethora of peptide properties used together may better describe the libraries and could be used to assess the expected trend of Rt values.
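The two statistics used here can be sketched in a few lines of standard-library Python. This is an illustrative implementation, not the analysis code used for the study: the function names are hypothetical, R² is computed as the squared Pearson coefficient (valid for simple one-variable linear regression), and Spearman's ρ is computed as the Pearson coefficient of the average ranks.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def r_squared(x, y):
    """Coefficient of determination of a simple linear fit of y on x."""
    return pearson_r(x, y) ** 2


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical property values and retention times, for illustration only.
    prop = [1.0, 2.0, 3.0, 4.0]
    rt = [1.0, 4.0, 9.0, 16.0]
    print(r_squared(prop, rt), spearman_rho(prop, rt))
```

In the hypothetical example the relationship is monotonic but not linear, so ρ reaches 1 while R² stays below 1, which is the kind of discrepancy that motivates reporting both measures.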

Library Synthesis and Characterization by UPLC-MS, High-Resolution Mass Spectrometry and Nano LC-MS/MS
The synthesis of L1 and L2 was performed on monosized 30 µm TentaGel beads [9]. Peptide sequences were grown step-wise using the split-and-mix methodology, following the conventional Fmoc-SPPS protocol [39,40]. L1 had the first residue y (x6) preloaded, and splitting into two equal parts was required for all the remaining positions to obtain the 32-member library with 100% variability (Figure 2). L2 had the first residue y (x7) preloaded and proline fixed in position x3, while positions x1, x2, x4 and x6 required splitting into two equal parts. Position x5 was split into three equal parts to obtain the 48-member library with 98% variability. The next step was the assessment of the successful synthesis of the proposed libraries (Section 2.1). For this purpose, library components were released into solution prior to analysis using ammonia vapors to obtain peptides amidated at the C-terminus [9]. The library analysis was performed by UPLC-MS to determine whether we could detect all 32 (L1) or 48 (L2) components in a single run based on their retention times and monoisotopic masses.
Molecules 2020, 25, 3316
The UPLC-MS data for L1 (Figure 4), consisting of photodiode array (PDA) chromatograms and total ion chromatograms (TIC), were split into two regions, zero to four minutes (Figure 4a) and six to eight minutes (Figure 4b), representing the time frames in which all the peptides were detected (Figure S6a). In Figure 4c,d, MS spectra of nine representative peptides are shown. The full list of detected sequences alongside their retention times (Rt) is provided in Table S1 and Figure S7.
The UPLC-MS data for L2 (Figure 5), consisting of the PDA and TIC chromatograms, were split into two retention time regions, 2 to 5.5 min (Figure 5a) and 6 to 9 min (Figure 5b), representing the time frames in which all the peptides were detected (Figure S6b). The MS spectra of 15 representative peptides are shown in Figure 5c,d, while the full list of detected sequences alongside their retention times is provided in Table S2 and Figure S8.
In addition, high-resolution mass spectrometry analysis of L1 was performed by direct infusion on the LTQ-FT Ultra mass spectrometer to confirm the successful synthesis of all library components. All 32 peptides, with masses consistent with amidation at the C-terminus, were detected in the sample. In Figures S9 and S10, the high-resolution MS spectra of the wsrasy and werasy sequences are shown. For the full list of MS spectra of the L1 components, please refer to supplementary File 2.
Next, the mass accuracy of the UPLC-MS and direct injection LTQ measurements was evaluated in terms of mean absolute error (MAE) and root mean square error (RMSE). The definitions of these two standard metrics are given in Table 1, where $\hat{m}_i$ represents the monoisotopic mass of the i-th peptide measured by the technique, $m_i$ represents the actual monoisotopic mass of the i-th peptide and n is the number of peptides within the library. The squared term in RMSE makes this metric more sensitive to large errors, so it was used in addition to MAE to estimate the suitability of the techniques. Because MAE and RMSE are nearly identical for each technique (0.19-0.20 for UPLC-MS and 0.0007 for LTQ), we conclude that the individual errors are similar in size. Moreover, the results presented in Table 1 show that the LTQ yields measurements that are more than two orders of magnitude more accurate than those obtained by UPLC-MS.
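The two error metrics follow their standard definitions and can be sketched as below. The function names and the example masses are hypothetical and serve only to illustrate the calculation; they are not measured values from the study.

```python
import math


def mae(measured, actual):
    """Mean absolute error between measured and actual monoisotopic masses."""
    return sum(abs(m_hat - m) for m_hat, m in zip(measured, actual)) / len(actual)


def rmse(measured, actual):
    """Root mean square error; squaring weights large deviations more heavily."""
    return math.sqrt(
        sum((m_hat - m) ** 2 for m_hat, m in zip(measured, actual)) / len(actual)
    )


if __name__ == "__main__":
    # Hypothetical masses, for illustration only.
    actual = [100.0, 200.0, 300.0]
    measured = [100.2, 200.0, 299.8]
    print(mae(measured, actual), rmse(measured, actual))
```

RMSE can never be smaller than MAE, and the two coincide only when all absolute errors are equal, which is why nearly identical values of the two metrics point to errors of similar size.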
Mass spectrometry-based analysis of peptides allows the identification of peptide molecules based on their monoisotopic masses for single or multiply charged ions, created during ionization in the ion source. The analysis of complex peptide mixtures is based on similar principles applied to single peptide analysis. The quality of the MS data depends upon the analytical mass accuracy and resolution. However, with the increase of sample complexity in terms of library size and sequence length, signals of single peptides might overlap and hamper the unambiguous assignments [41].
In the top-down strategy, several approaches have been used for the identification of peptide sequences from complex OBOC libraries, with MALDI-based MS being the method of choice [16,42-45]. Frequently, the identified peptides were further analyzed with a variety of other, non-MS-based methods, including isothermal titration calorimetry (ITC), nuclear magnetic resonance (NMR), enzyme-linked immunosorbent assays (ELISA) and surface plasmon resonance (SPR), with the intention of proving efficient interactions of the identified peptides with their ligands. Nevertheless, most of the identified hits failed to provide conclusive data on ligand-target or peptide-protein interactions or showed a tendency to give false positives. This problem emerged during the ligand validation process, due to the inability to unambiguously identify sequences during the screening step [27]. This could arise from the inadequacy of the MS-based instrumental method to detect the exact permutation involved in the interaction rather than the most efficiently detectable sequences from the mixture, i.e., the sequences that are preferentially ionized using electrospray (ESI), MALDI or other ionization techniques.
Mass spectrometry is a powerful tool for the analysis of complex peptide libraries [9]. The high resolving power of available instruments allows for the detection and successful resolution of thousands of peptides in a single spectrum [41]. It is a considerably faster method compared to LC-MS. However, the instruments are expensive and require highly skilled personnel to conduct such an analysis. Often, laboratories do not have such equipment. The bottom-up design approach proposed here would allow one to routinely characterize simplified libraries using UPLC-Q-MS. Therefore, the proposed reduction in sample complexity will lead to a higher probability of each individual peptide being selected for fragmentation, i.e., detected.

Conclusions
In this paper, we proposed a novel bottom-up experimental design approach for OBOC peptide libraries based on the choice of the appropriate library complexity to match the instrument characteristics (i.e., resolution power, accuracy) and availability. Unlike previous attempts to synthesize large and complex libraries, we wanted to exploit the available instrumental techniques coupled to existing tools for peptide properties estimation to obtain precise and efficiently detectable peptide libraries.
The value of this strategy is the combination of the experimental data and the possibility to computationally help the design and evaluation of mixture properties. After evaluating the possible library designs and choosing the appropriate library complexity, sequence-related properties such as hydrophobicity, charge, isoelectric point and others could be assessed. Such a rational analysis of peptide libraries before synthesis allows us to anticipate possible challenges that could be encountered during synthesis or chromatographic and mass characterizations.
We showed that UPLC-MS, high-resolution MS and nano LC-MS/MS were suitable instruments for the analysis of the mixture of 32 peptides (L1). In addition, we characterized all 48 peptides (L2), including the two peptides with overlapping masses, in a single run using UPLC-MS. The obtained results confirmed the suitability of the proposed methodology for sequence elucidation and provided insight into allowed library complexity for the proposed methodology. Accordingly, when the available instrumentation and methods are suitable, the library complexity can be fine-tuned to match the analytical instrument characteristics. Hence, the complexity of the library will be increased through the algorithm-assisted library design only if the analytical resolving power allows for it. With this approach, we guarantee the possibility of efficiently detecting all the library components prior to more complex screenings. We envision that precise and conclusive data can be obtained that will lead to the successful identification of hits.
The goal was to create a platform for OBOC library design and characterization based on the bottom-up strategy starting with simplified peptide mixtures designed using genetic algorithms to obtain fully characterized portions of the explored sequence space. By combining the data obtained for several new libraries, we could expand our knowledge of the available sequence space. Thus, frustration linked to big screenings and inadequate analysis with inconclusive validation results could be avoided. However, the choice of the suitable design model, top-down or bottom-up, will ultimately depend on the nature of the specific scientific question and the laboratory resources available. We envision that the strategy proposed here will be of interest to a wider peptide chemistry community.

GA-Assisted Library Design
The libraries were designed using the Non-dominated Sorting Genetic Algorithm II (NSGA-II), as reported previously [46]. Briefly, the following parameters of the GA model were used: individual: single library; population size: 500; representation: bitstring (length depending on the input); generations: 100 × bitstring length, or 50 generations without an average relative change in the best fitness function value; crossover rate: 80%; crossover function: scattered; mutation rate: 1%; mutation function: bitflip; selection function: stochastic uniform; elitism: 5%; Pareto fraction: 20%; distance measure function: distance crowding; fitness functions: library size and percentage of peptides without mass overlap.
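The two fitness functions named above (library size and percentage of peptides without mass overlap) can be illustrated with a minimal Python sketch. This is not the published GA code: the function names, the example position lists and the interpretation of ΔT as the minimum mass difference to every other peptide are assumptions made for illustration. The residue masses are standard monoisotopic values, and masses are computed for the free-acid form, whereas the synthesized peptides are C-terminally amidated.

```python
from itertools import product

# Standard monoisotopic residue masses (Da) for the residues used in L1 and L2.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "I": 113.08406, "E": 129.04259, "R": 156.10111, "Y": 163.06333,
    "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq):
    """Monoisotopic mass of the free-acid peptide; D/L case is ignored."""
    return sum(MONO[res] for res in seq.upper()) + WATER


def enumerate_library(positions):
    """All sequences obtained by choosing one residue per position."""
    return ["".join(combo) for combo in product(*positions)]


def mass_unique_fraction(peptides, delta_t=1.0):
    """Fraction of peptides whose mass differs from all others by >= delta_t."""
    masses = sorted(peptide_mass(p) for p in peptides)
    unique = 0
    for i, m in enumerate(masses):
        # In a sorted list only the two neighbors can be closer than delta_t.
        left_ok = i == 0 or m - masses[i - 1] >= delta_t
        right_ok = i == len(masses) - 1 or masses[i + 1] - m >= delta_t
        if left_ok and right_ok:
            unique += 1
    return unique / len(masses)


if __name__ == "__main__":
    # Hypothetical design (not one of the published libraries).
    design = [["a", "s"], ["e", "r"], ["y"]]
    library = enumerate_library(design)
    print(len(library), mass_unique_fraction(library, delta_t=1.0))
```

A multi-objective search would then try to maximize both returned quantities at once; permuted sequences such as "aey" and "eay" are isobaric and therefore lower the second objective.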

Solid-Phase Peptide Synthesis (SPPS) Procedure
The synthesis of peptide libraries was performed using the "split and mix method" [29]. All solid-phase manipulations were done manually in polypropylene syringes, each fitted with a polyethylene porous disk. All peptide sequences were grown step-wise using standard Fmoc SPPS on TentaGel® resin (particle size: 30 µm monosized; capacity: 0.24 mmol/g) preloaded with the 4-hydroxymethylbenzoic acid (HMBA) linker, followed by the introduction of the first amino acid (Fmoc-Tyr(D)-OH) through ester bond formation. HMBA was coupled to the resin in DMF through a peptide bond, in the same way as described below for the peptide chain growth, using a three-fold excess of HMBA over the resin, TBTU and DIPEA. Following this step, Fmoc-Tyr(D)-OH was coupled in DCM using a 4-fold excess of amino acid, 4 equivalents of DIC and 0.4 equivalents of DMAP. The reaction was performed for 30 min and repeated three times.
The chain growth was performed with a three-fold excess of amino acid over the resin in DMF, using TBTU and DIPEA as activating and coupling reagents in 1:2 ratio respectively relative to the amino acid. Fmoc removal was carried out with 20% (v/v) piperidine in DMF.
The side-chain protecting groups were removed using a cleavage cocktail of 95% TFA, 2.5% TIS and 2.5% water. The library was extensively washed with water, followed by washes in DMF, DCM and diethyl ether. The Kaiser colorimetric test was used for the detection of primary amines [39].
The library peptides were cleaved from the resin using ammonia vapors to obtain peptide mixtures amidated (NH2) at the C-terminus [40]. The peptide mixtures were then dissolved in 20% (v/v) acetonitrile/water, the beads were removed by centrifugation at 13,000 rpm for 30 min, and the peptides in solution were filtered and characterized by UPLC-MS, high-resolution mass spectrometry and nano LC-MS/MS to confirm the detection of all the library components.

Quantification of Resin Loading Capacity
To measure the loading capacity of the resin, piperidine washes were collected and measured by UV spectroscopy after coupling the first amino acid. After the coupling of Fmoc-Tyr(D)-OH to the HMBA-resin, Fmoc was removed in 5 mL of 20% (v/v) piperidine/DMF, the supernatant was collected and the Fmoc absorbance was recorded at 301 nm by UV spectroscopy. The loading was calculated using the following equation: $X = \dfrac{A \cdot V}{\varepsilon \cdot m \cdot b}$, where X is the loading of the resin in mmol/g, A is the Fmoc absorbance at 301 nm, V is the volume of solvent in mL, ε is the molar extinction coefficient of Fmoc at 301 nm (7800 M⁻¹ cm⁻¹), m is the mass of the resin in g and b is the optical path length in cm. The new loading was calculated to be 0.17 mmol/g.
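The Beer-Lambert calculation behind this measurement can be sketched as follows. The sketch assumes that the loading is obtained by dividing the product of absorbance and volume (in mL) by the product of extinction coefficient, resin mass and optical path length, which gives the result directly in mmol/g. The function name, the 1 cm default path length and the example input values are hypothetical; the absorbance and resin mass used in the study are not reported in the text.

```python
def fmoc_loading(absorbance, volume_ml, resin_mass_g,
                 epsilon=7800.0, path_length_cm=1.0):
    """Resin loading in mmol/g from the Fmoc absorbance at 301 nm.

    A / (epsilon * b) is the Fmoc concentration in mol/L, which equals mmol/mL;
    multiplying by the volume in mL gives mmol, and dividing by the resin
    mass in g gives mmol/g.
    """
    return (absorbance * volume_ml) / (epsilon * resin_mass_g * path_length_cm)


if __name__ == "__main__":
    # Hypothetical reading: A = 0.78 in 5 mL for 10 mg of resin.
    print(fmoc_loading(absorbance=0.78, volume_ml=5.0, resin_mass_g=0.010))
```

If the supernatant is diluted before the measurement, the absorbance would additionally need to be multiplied by the dilution factor, which is not part of this sketch.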

The Split and Mix Method
L1 had the first residue y (x6) preloaded, and splitting into two equal parts was required for all the remaining positions to obtain the 32-member library. L2 had the first residue y (x7) preloaded and proline fixed in position x3, while positions x1, x2, x4 and x6 required splitting into two equal parts. Position x5 was split into three equal parts to obtain the 48-member library. The splitting into two equal parts was achieved by adding 4 mL of DMF to the reaction vial (v1) containing the resin beads. Next, 1 mL of bead suspension in DMF (v1) was added to each new vial, i.e., 2 mL of bead suspension was distributed (1 mL per vial) into two new reaction vials (v2 and v3). The following step consisted of adding 2 mL of DMF to the starting reaction vial (v1), taking out 2 mL (distributed 1 mL per new vial) and repeating this process 4 more times until all the beads were transferred from v1 to v2 and v3. The splitting into three equal parts was performed by adding 6 mL of DMF to the reaction vial (v1), followed by distributing 3 mL of bead suspension, i.e., 1 mL per new vial (v2, v3, v4). The rest of the process was similar to the one described above, with the difference that 3 mL of DMF was added to the starting reaction vial (v1) and 3 mL was taken out (distributed 1 mL per new vial), repeating this process until all the beads were transferred from v1 to v2, v3 and v4. The chain growth was performed as described above (Section 4.3). After the addition of each new amino acid, the contents of the reaction vessels were mixed and Fmoc removal was carried out. After the necessary washes, the splitting was repeated until the last amino acids were coupled. Following the completion of the synthesis, the beads were mixed together and the peptides were cleaved from the resin.
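The library sizes quoted above follow directly from the number of portions at each position, since the number of distinct sequences in a split-and-mix synthesis is the product of the per-position split counts (a fixed or preloaded residue counts as one). The short sketch below reproduces the arithmetic; the function and variable names are hypothetical.

```python
from math import prod


def library_size(splits_per_position):
    """Number of distinct sequences given the split count at each position."""
    return prod(splits_per_position)


# Positions x1..x6 for L1: five positions split in two, y fixed at x6.
L1_SPLITS = [2, 2, 2, 2, 2, 1]
# Positions x1..x7 for L2: proline fixed at x3, x5 split in three, y fixed at x7.
L2_SPLITS = [2, 2, 1, 2, 3, 2, 1]

if __name__ == "__main__":
    print(library_size(L1_SPLITS), library_size(L2_SPLITS))
```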

R Package Peptides Property Calculations
The aaComp function was used, which classifies amino acids based on their size, side chains, hydrophobicity, charge and their response to pH 7 into 9 classes (tiny, small, aliphatic, aromatic, non-polar, polar, charged, basic, acidic). The script was modified to include two additional amino acid classes: sulfur and hydroxyl. In addition, library properties were assessed through Cruciani properties.
Funding: This work was supported by the Croatian Science Foundation/Hrvatska zaklada za znanost (UIP-2019-04-7999), by the University of Rijeka under the project number (uniri-pr-tehnic-19-10), by MINECO-FEDER (BIO2016-75327-R) and by the Generalitat de Catalunya (XRB and 2017SGR0998).