Deconstructing the Potency and Cell-Line Selectivity of Membranolytic Anticancer Peptides

Current cancer treatments damage healthy cells and tissues, causing short-term and long-term side effects. New treatments are desired that show greater selectivity toward cancer cells and evade the common mechanisms of multidrug resistance. Membranolytic anticancer peptides (mACPs) hold promise against cancer and multidrug resistance. Amphipathicity, hydrophobicity, and net charge of mACPs participate in their respective interactions with cell membranes and their overall inhibition of cancer cells. To support the design of cell-line selective mACPs, we investigated the relationships that amino acid composition, physicochemical properties, sequence motifs, and sequence homology could have with their potency and selectivity towards several healthy and cancer cell lines. Sequence length and net charge are known to affect the selectivity of mACPs between cancer and healthy cell lines. Our study reveals that increasing the net charge or flexibility (i.e., small and aliphatic residues) influences their selectivity between cancer cell lines with comparable lipid compositions.


Introduction
Conventional cancer treatments such as surgery, chemotherapy, and radiation therapy have limited or no selectivity toward cancer cells, causing short-term and long-term side effects. [1,2] For instance, surgery and radiation therapy for prostate cancer increase the risk of urinary incontinence, erectile dysfunction, and intestinal complications. [3,4] Likewise, breast cancer chemotherapy leads to premature menopause, which increases the risk of osteoporosis and impaired fertility. [5] Current anticancer drugs present promiscuous cellular targeting and insufficient cellular drug uptake. In addition, cancer cells are susceptible to developing drug resistance against these agents through multiple mechanisms. [6,7] These limitations decrease drug efficacy in advanced stages and increase the likelihood of metastatic disease. Therefore, new treatments [8] and new drug delivery strategies [9] that present greater selectivity towards cancer cells and evade the common mechanisms of multidrug resistance are sought after.
Anticancer peptides (ACPs) are small biologics fewer than 50 residues long. The peptides are generally amphipathic and cationic due to a high proportion of basic and hydrophobic residues. [10,11] These properties are unsurprising, as many ACPs were originally discovered as broad-spectrum antimicrobial peptides. They fold into various structures, such as α-helices (full length or kinked), β-sheets, or a combination of the two, or they do not fold into any secondary structure (random coils), although random-coiled ACPs may adopt a structure when approaching cell membranes. [12] ACPs can kill cancer cell lines through different mechanisms: (i) some peptides enter the cell and attack the mitochondrial membrane, triggering apoptosis (non-membranolytic ACPs), while (ii) others directly destabilise the cell membrane, generating pores and causing necrosis (membranolytic ACPs, mACPs). [13,14] Besides their intrinsic anticancer activity, many peptides such as HNP-1 [15] possess an additional immunomodulatory function, [16] which recruits immune cells to attack cancer cells. Others inhibit angiogenesis, [16] the formation of blood vessels necessary for cancer proliferation and metastasis. [17] Furthermore, some ACPs modulate essential proteins; for example, Pep-7 cancels the activity of the E7 oncoprotein of human papillomavirus type 16 (one of the types related to the development of cervical cancer) and reactivates the pRb/E2F pathway, which is necessary for the correct control of the cell cycle. [16,18] Cancer cells express specific membrane lipids [19,20] and proteins [21] on their surfaces, which mACPs possibly target. These peptides are the most studied ACPs with selectivity toward cancer cells that do not induce multidrug resistance. [16,22] Although the selectivity of mACPs is not yet fully understood, both the components of cancer cell membranes [8,10,23-26] and the key global physicochemical properties of mACPs matter. [12,24,27-31] To that extent, specific studies in α-helical cationic mACP families (i.e., aureins, cecropins, magainins, temporins) have revealed that sequence length [24,27] and net charge [31] influenced their selectivity between cancer and healthy cell lines. Understanding the key characteristics that promote the selectivity across cancer cell lines would support the discovery and design of cell-line selective mACPs.
In the present work, we studied the selectivity potential of 138 mACPs exhibiting inhibition towards two or more of 77 cancer cell lines using our in-house index, CCSI. We compared amino acid composition, physicochemical properties, sequence motifs, and sequence homology between two classes, selective and non-selective mACPs, before studying their relationships to potency and cell-line selectivity.

Defining cancer cell inhibition and selectivity
We collected 138 membranolytic anticancer peptides (mACPs) from DBAASP [32] using the Ranking Search option and specific filters. Each peptide exhibited inhibition against more than two specific cell lines, as expressed by their IC50 values. Alongside the sequences, we identified 11 normal healthy cell lines (HCLs, Figure 2) and 77 distinct cancer cell lines (CCLs). Among the 138 peptides, only 0.7 % were evaluated against seven or more CCLs; 12.3 % against five CCLs; 18.8 % against four CCLs; and 65.9 % against three CCLs (Figure 1A). The synthetic peptide Buforin-2(5-13)[17-20]3 (DBAASP ID 887) was the most extensively evaluated peptide, tested against 23 cancer cell lines. We analysed the ratios across all evaluated cancer cell lines per peptide from biological experiments. Figure 1B lists the 77 cancer cell lines across 15 cancer types. Each bar represents the number of peptides tested against one cell line. The three most tested CCLs are related to blood disorders: K562, CCRF-CEM, and Jurkat E6-1, tested against 48, 30, and 29 peptides, respectively. Blood cancers are the most represented type of cancer in this analysis, with 11 cell lines. Besides blood cancer, mACPs have equally inhibited cell lines from other organs: brain (U251-MG), breast (MDA-MB-361), cervix (HeLa S3), colon (SW480 CCL-228), lung (NCI-H157, A549), and prostate (PC-3); see Supporting Information Lists S1-S2 for their full names. In contrast, kidney, oral, ovary, pancreatic, skin, and stomach cancers were the least investigated cancer types.
Subsequently, we determined the selectivity index (SI) for each sequence, which is traditionally defined as the ratio between the cytotoxic activities (expressed as IC50 values) of a single compound (e.g., a peptide) against one HCL and one CCL (Eq. 3). In our dataset, roughly a third of the sequences (47) were reported against one of the HCLs (List S1). In the absence of SI values for most sequences and to support the design of cancer cell-specific mACPs, we studied how the peptide physicochemical descriptors and other features contribute to discriminating between cancer cell membranes. We developed our in-house cancer cell-line selectivity index, CCSI. We hypothesised that the differences in cytotoxicity of a peptide against more than two CCLs could be linked to their differences in lipid composition. Following the same reasoning, an untested mACP might display different cytotoxic values against one or more HCLs. Like SI, CCSI results from the difference in cytotoxic activity between two cell lines, using the highest and the lowest pIC50 values from two CCLs for a single peptide (Equations 1-2). A higher pIC50 value denotes a more potent mACP. A cell-line selective mACP (ACP-S) has a CCSI greater than or equal to 0.5; otherwise, the peptide is labelled as non-selective (ACP-NS). Our dataset included 43 ACP-S and 95 ACP-NS. We illustrated their cytotoxic activity (expressed as pIC50 values) towards three or more of the 77 cancer cell lines for both ACP-S (Figure S1) and ACP-NS (Figure S2). In both figures, the rows represent all tested peptides, whereas the columns indicate all 77 cancer cell lines. The pIC50 values are depicted with a gradient from yellow (low) to red (high). In Figure S1, we denoted the selectivity of 43 mACPs (IDs: 9549 … 1003), where at least two tested cell lines presented divergent colours, indicating distinct pIC50 values. Both heatmaps in Figure S2 illustrated the lack of selectivity for all 95 ACP-NS (IDs: 5211 … 197, 12315 … 5213), with most tested cell lines exhibiting uniform colours, marking pIC50 values of the same order of magnitude. The exceptions to those observations are associated with the definition of CCSI. In all three heatmaps, blank spaces also depicted the scarcity of biological data, often due to the costs and material availability of running wet-lab experiments. No single peptide was evaluated against all 77 CCLs; the best example is DBAASP ID 887, with 23 cancer cell lines.
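The CCSI classification described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that pIC50 is the negative decimal logarithm of the IC50 expressed in mol/L, that CCSI is the simple difference between the highest and lowest pIC50 of a peptide, and that IC50 values are supplied in µM. The function names and the example IC50 values are hypothetical.

```python
import math

CCSI_THRESHOLD = 0.5  # cut-off stated in the text for a selective peptide


def pic50(ic50_micromolar):
    """Convert an IC50 in µM to pIC50 = -log10(IC50 in mol/L) (assumed units)."""
    return -math.log10(ic50_micromolar * 1e-6)


def ccsi(ic50_by_cell_line):
    """Difference between the highest and lowest pIC50 of a single peptide."""
    values = [pic50(v) for v in ic50_by_cell_line.values()]
    if len(values) < 2:
        raise ValueError("CCSI needs at least two cancer cell lines")
    return max(values) - min(values)


def label(ic50_by_cell_line):
    """Return 'ACP-S' when CCSI >= 0.5, otherwise 'ACP-NS'."""
    return "ACP-S" if ccsi(ic50_by_cell_line) >= CCSI_THRESHOLD else "ACP-NS"


# Hypothetical IC50 values (µM), not taken from the dataset.
peptide_a = {"HeLa S3": 2.0, "K562": 20.0}   # tenfold gap, CCSI of about 1.0
peptide_b = {"HeLa S3": 10.0, "K562": 12.0}  # similar potency on both lines

print(label(peptide_a), label(peptide_b))
```

Under these assumptions, a CCSI of 0.5 corresponds to a roughly threefold difference in IC50 between the most and least sensitive cancer cell lines.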
Out of curiosity, we compared our CCSI values with the available SI values. We first reported the SI values for the 47 peptides showing cytotoxic activity against at least 3 of the 27 CCLs and for each of the 11 HCLs, as illustrated in Figures 2A-B. The figures depicted the selectivity on a negative logarithmic scale, −log10(SI), for visualization purposes, where higher values indicated greater selectivity. In general, many peptides exhibited lower selectivity against some of the 27 CCLs when compared to the following four healthy cell lines: PNT1A (prostate cells), GES-1 (human gastric epithelial cells), HMEC-1 (human microvascular endothelial cells) and HUVEC (human umbilical vein endothelial cells). For example, the non-selective mACP Arminin-1a(40-70)-NH2 (ID: 12161) was tested against five CCLs (THP-1, K562/ADM, K562, Jurkat, and HL-60) and three human HCLs: PBMC (peripheral blood mononuclear cells), HEK293 (human embryonic kidney cells), and HUVEC (human umbilical vein endothelial cells). Irrespective of the CCLs, we noted that SI values were moderate (e.g., −log10(SI) ≈ 0, SI HL-60: 4.41, purple) against HEK293 or HUVEC, and they were elevated (e.g., −log10(SI) > 0.5 or SI > 2, SI HL-60: 23.99, dark purple-black) against PBMC. However, the SI values corresponding to the five CCLs appeared in the same colour tones for a given HCL, suggesting similar cell-line selectivity indices and the peptide's non-selective nature. The peptide can selectively inhibit the growth of these cancer cell lines compared to these healthy ones, but it cannot differentiate across cancer cell lines. Similar observations could be made for most peptides listed in Figures 2A-B except for a handful of selective mACPs (e.g., IDs: 1195, 3414, 3418, 3427, 3429, 11805, and 11807).
Dr. Fabien Plisson obtained his PhD in 2012 from the University of Queensland (UQ), Australia, combining natural product chemistry, kinase drug discovery, and chemoinformatics with Prof. Robert J. Capon and Spanish biotech company Noscira. He carried out postdoctoral studies in peptide drug design at UQ, with Prof. David P. Fairlie in collaboration with Pfizer. In 2017, he started his own laboratory focusing on the discovery and design of bioactive peptides. His research merges drug discovery, protein engineering, structural bioinformatics, and artificial intelligence.
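The selectivity index discussed above can be illustrated with a short sketch. It assumes that SI is the IC50 against the healthy cell line divided by the IC50 against the cancer cell line, which is consistent with the statement that a higher SI means greater selectivity; the function names and the IC50 values are hypothetical and are not taken from Figure 2.

```python
SI_CUTOFF = 2.0  # values below 2 are described in the text as general toxicity


def selectivity_index(ic50_healthy, ic50_cancer):
    """SI = IC50(healthy cell line) / IC50(cancer cell line), same units."""
    if ic50_healthy <= 0 or ic50_cancer <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_healthy / ic50_cancer


def si_table(ic50_healthy_by_line, ic50_cancer_by_line):
    """SI for every (healthy, cancer) cell-line pair of one peptide."""
    return {
        (hcl, ccl): selectivity_index(h_value, c_value)
        for hcl, h_value in ic50_healthy_by_line.items()
        for ccl, c_value in ic50_cancer_by_line.items()
    }


# Hypothetical IC50 values (µM) for one peptide.
healthy = {"HUVEC": 50.0, "PBMC": 200.0}
cancer = {"HL-60": 10.0, "K562": 12.5}

for pair, si in si_table(healthy, cancer).items():
    flag = "selective" if si > SI_CUTOFF else "generally toxic"
    print(pair, round(si, 2), flag)
```

A peptide such as this hypothetical one would show similar SI values across cancer cell lines for a given healthy cell line, which is the pattern the text associates with a non-selective (ACP-NS) peptide.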
CCSI indicated cell-line selectivity across CCLs, whereas SI denoted the selectivity between one CCL and one HCL. We performed linear regression analyses between our CCSI and SI values. Figure S3 summarises eight scatter plots with sufficient data for analysis, each relating one cancer cell line to one of two healthy endothelial cell lines (HMEC-1, HUVEC). The linear relationships were virtually non-existent, with R² ≈ 0, for four CCLs: MCF-7, MDA-MB-435S, and U251-MG against HMEC-1, and the human colon cancer cell line SW480 CCL-228 against HUVEC. In contrast, the other four cancer cell lines presented apparent (weak-moderate) positive correlations between the respective CCSI and SI values. For example, we observed R² ≈ 0.33 for the T lymphoblastoid cell line CCRF-CEM, R² ≈ 0.45 for cervical adenocarcinoma HeLa S3, and R² ≈ 0.67-0.68 for NCI-H157 and PC-3. What could explain the different responses between all cancer cell lines is unclear. To assess whether the cut-off used in our CCSI-based classification was adequate, we used the equations derived from the regression analyses of the last four cell lines (NCI-H157, PC-3, CCRF-CEM, HeLa S3). We substituted each x with a CCSI value of 0.5 (used to define a selective ACP) and obtained corresponding SI values (y) greater than 2, in agreement with the definition of the selectivity index. [33] Notably, the coefficients of determination (R²) were relatively low, indicating high variability in the data and limited precision for the predicted SI values, whereas the low p-values supported the positive relationships between CCSI and SI values.
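The cut-off check described above amounts to fitting a line SI = intercept + slope × CCSI and evaluating it at CCSI = 0.5. The sketch below uses ordinary least squares written out by hand; the data points are hypothetical, and the actual regression settings used for Figure S3 are not given in this section.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equally sized samples of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# Hypothetical (CCSI, SI) pairs for one cancer/healthy cell-line combination.
ccsi_values = [0.1, 0.3, 0.5, 0.8, 1.1]
si_values = [1.2, 2.1, 2.4, 4.0, 4.9]

slope, intercept, r2 = fit_line(ccsi_values, si_values)
si_at_cutoff = predict(slope, intercept, 0.5)
print(f"R2 = {r2:.2f}, predicted SI at CCSI 0.5 = {si_at_cutoff:.2f}")
```

With a low R², the predicted SI at the cut-off carries wide uncertainty, which is the caveat raised in the text.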

Statistical analysis of AAC and physicochemical properties
Our previous observations demonstrated that selective and non-selective mACPs responded differently against the same cancer cell lines. In other words, the peptides would present different characteristics for a fixed set of membrane lipid compositions. First, the 43 selective sequences (ACP-S) contain between 11 and 35 residues (median: 17, Q1: 15, Q3: 25), whereas their 95 non-selective counterparts (ACP-NS) include between 11 and 84 residues (median: 17, Q1: 13, Q3: 20.5). Overall, length was not a discriminatory factor. Subsequently, we measured the probabilities of all amino acids for each class, summarised in Table 1. Small (G, S, P), hydrophobic (I, W, Y, M) and polar (Q, N, T) amino acids populated ACPs at more modest levels. We noted that most mACPs had more positively charged residues (K > H, R) than negatively charged amino acids (D, E). The high proportions of lysine (K) and leucine (L) are consistent with their amphipathic character as membranolytic peptides. None contain cysteine residues. Finally, both alanine (A) and valine (V) were found at higher levels among ACP-S, unlike phenylalanine (F), which was more frequent among ACP-NS.
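The amino acid composition comparison above can be reproduced in outline as follows. This sketch assumes that class-level probabilities are pooled over all residues of a class (the text does not say whether frequencies were pooled or averaged per peptide), and the two short sequences are hypothetical placeholders rather than dataset entries.

```python
from collections import Counter


def pooled_frequencies(sequences):
    """Frequency of each amino acid over all residues of a peptide class."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def group_frequency(sequences, residues):
    """Cumulative frequency of a residue group, e.g. aromatic 'WYF'."""
    freqs = pooled_frequencies(sequences)
    return sum(freqs.get(aa, 0.0) for aa in residues)


# Hypothetical sequences standing in for one peptide class.
example_class = ["KKLL", "FAKL"]

print("aromatic f(W,Y,F):", group_frequency(example_class, "WYF"))
print("charged f(K,R,H,D,E):", group_frequency(example_class, "KRHDE"))
print("small/aliphatic f(A,G,V,I,L,M):", group_frequency(example_class, "AGVILM"))
```

Running the same functions on the ACP-S and ACP-NS sequence lists would give per-class values that can be compared side by side, in the spirit of Table 1.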
In addition to the amino acid composition analysis, we compared the means of 81 physicochemical properties (PCPs) between ACP-S and ACP-NS. We identified a total of 31 PCPs with significant differences (p ≤ 0.05). We then adjusted the p-values to control the false discovery rate, leading to a final list of 11 PCPs with significant differences (p ≤ 0.05) between the two mACP classes. In Figure 3, we detailed the distributions of the 11 PCPs. Five properties related to different hydrophobicity scales (Chothia hydrophobicity, Zimmerman hydrophobicity, Cruciani Property (CP) hydrophobicity, hydrophobicity index, and hydrophobic moment). Most non-selective peptides (ACP-NS) possessed higher levels of hydrophobicity and bulkier, more steric properties than their selective counterparts (except on the Chothia hydrophobicity scale), which we could associate with their elevated levels of aromatic residues, in particular phenylalanine (cumulative frequency f(W,Y,F) = 0.098 versus 0.140, ACP-S vs. ACP-NS; Table 1). Among the other properties, selective mACPs carried slightly fewer charged residues (f(K,R,H,D,E) = 0.317 versus 0.339) than their non-selective counterparts but bore similar enrichments in polar amino acids (f(S,T,N,Q) = 0.078 vs. 0.081). In contrast, the non-selective mACPs contained fewer small and aliphatic amino acids (f(A,G,V,I,L,M) = 0.487 vs. 0.413) and presented higher local flexibility indices.
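The false-discovery-rate step above can be sketched with the Benjamini-Hochberg procedure. This is an assumption: the section states only that the false discovery rate was controlled, not which procedure was used, and the statistical test that produced the raw p-values is not specified here either. The p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four physicochemical properties.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
kept = [i for i, p in enumerate(adjusted) if p <= 0.05]
print(adjusted, kept)
```

In this toy example three properties pass the unadjusted threshold but only one survives adjustment, mirroring the reduction from 31 to 11 PCPs reported above.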

Explaining potency and cell-line selectivity with amino acid composition
We investigated whether these general observations in AAC would persist regardless of the targeted cancer cell lines. We picked the 11 most evaluated cell lines (Figure 1) across different organ tissues: blood (4), brain (1), breast (1), cervix (1), colon (1), lung (2), and prostate (1). We first illustrated the pIC50 values (y) for all mACPs tested against a single cell line (x), where each peptide was labelled as selective (triangle) or non-selective (circle). We also indicated how frequently positively or negatively charged residues (Figure 4A) and small or bulky amino acids (Figure 4B) occurred in each mACP. In both figures, ACP-S and ACP-NS displayed pIC50 values across the same range except for HeLa S3, where the ACP-S (triangles) were more potent than their non-selective counterparts (pIC50 ACP-S > 4.7). In Figure 4B, we noted that ACP-S would often contain more small residues (f(G,C,A,P,S) > 0.0) than their non-selective counterparts (circles) across several cell lines (i.e., U251-MG, NCI-H157, and PC-3). Likewise, we observed non-selective mACPs with higher levels of hydrophobic residues across other cell lines (i.e., Jurkat E6-1, K562; Figure 4A). We did not report any selective mACP for MDA-MB-361. Both trends indicated that our previous observations about amino acid composition are likely cell line-specific. During that exercise, we noted that peptides would hold constant or varying amino acid compositions across a given range of inhibition for cell lines from one or more organ tissues (e.g., Jurkat E6-1, K562, MDA-MB-361). The cancer cell line HL-60 exhibited a quasi-linear relationship (R² = 0.66, Figure S5) between the peptides' inhibitory activities (pIC50) and their differential enrichments, where the most potent sequences carried multiple positively charged residues.
The relationships between the peptide sequences and their pIC50 values for other cell lines could not be explained solely from their differential enrichments (Figure 4A, Figures S5-S6). Instead, we represented the distribution of differential enrichments (y) for all mACPs tested against each cancer cell line (x), as illustrated in Figures 4C and 4D. Several cancer cell lines presented a similar or identical distribution of peptides within a specific range of differential enrichments. For example, the breast cancer cell line MDA-MB-361 displayed the narrowest ranges of differential enrichments, where the tested mACPs were predominantly cationic (0.35 to 0.50, Figure 4C) and hydrophobic (−0.10 to −0.50, Figure 4D). In contrast, the brain cancer cell line U251-MG was tested against the broadest range of peptides. Nevertheless, both sets of mACPs inhibited these cancer cell lines with the same order of magnitude, i.e., pIC50 values ranging from 4.0 to 5.6 (Figures 4A and 4B). We measured the goodness of fit between all CCL distributions using the two-sample Kolmogorov-Smirnov test, considering the different mACP sample sizes. The results are summarised in Figures 4E and 4F. Both figures indicated that cancer cell lines U251-MG, NCI-H157, and PC-3 formed one strong cluster. Likewise, the cancer cell lines HeLa S3, SW480, and CCRF-CEM adopted identical or similar distributions, whereas blood CCLs Jurkat E6-1 and K562 fitted moderately. We hypothesised that the cancer cell lines with similar or identical distributions either (1) were tested against the same set of mACPs or against homologous mACPs, or (2) possess similar membrane lipid compositions that favour mACPs with a particular amino acid composition. Regarding the first hypothesis, we evaluated the proportions of matching mACPs across the 11 CCLs. High goodness of fit between cancer cell lines resulted from identical peptide sequences (Table S1).
The same twenty-five mACPs inhibited the cancer cell lines HeLa S3, SW480, and CCRF-CEM. A minimum of 15 common peptides exhibited anticancer activity against U251-MG, NCI-H157, and PC-3. Finally, Jurkat E6-1 and K562 cell lines were tested simultaneously using twenty-seven mACPs. In that last example, both cell lines responded similarly to the peptides, as illustrated by individual pIC50 values in Figures 4A and 4B, suggesting similar membrane lipid compositions. Both are blood cancer cell lines. In contrast, the peptides tested against HeLa S3, SW480, and CCRF-CEM or U251-MG, NCI-H157, and PC-3 showed different individual pIC50 values. These cell lines belong to different cancer types and may present distinct membrane lipid types or compositions. We curated lipidomic studies [34-39] of some of these cancer cell lines (i.e., HL-60, K562, SW480, A549, PC-3), comparing their contents in phospholipids or fatty acids; see Supporting Information Lipid_composition_CCLs.xlsx. Our analyses neither supported nor rejected the second hypothesis.
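The pairwise comparison of cell-line distributions described above relies on the two-sample Kolmogorov-Smirnov statistic, the largest gap between two empirical cumulative distribution functions. The sketch below computes only that statistic with the standard library; the p-values reported in Figures 4E and 4F would normally come from a statistics package (for example `scipy.stats.ks_2samp`). The differential enrichment values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The ECDFs only change at observed values, so checking those suffices.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Hypothetical differential enrichments of peptides tested on three cell lines.
line_1 = [0.35, 0.40, 0.42, 0.50]
line_2 = [0.36, 0.41, 0.45, 0.48, 0.50]
line_3 = [-0.10, 0.05, 0.10, 0.20]

print(ks_statistic(line_1, line_2))  # small gap: similar distributions
print(ks_statistic(line_1, line_3))  # large gap: distinct distributions
```

Because the samples may have different sizes, each empirical distribution is normalised by its own length, which matches the remark about differing mACP sample sizes.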

Analysis of sequence motifs and homologous sequences
Our comparative analysis between ACP-S and ACP-NS subsets pointed to subtle differences in AAC and PCPs, resulting from different distributions between polar, charged and hydrophobic residues. From a design strategy standpoint, we could convert a non-selective ACP-NS peptide to its ACP-S counterpart with a few site-directed mutations. Therefore, we evaluated the similarity between the two subsets regarding sequence motifs (domains) and sequence homology.
Using the MEME suite, [40] we first calculated the most common sequence domains. We [...] Table 1). The most popular domain among non-selective membranolytic peptides was an 11-residue-long domain that capped the N-terminal region of a third of the sequences (33, ≈35 %; Figure 5D). It was also the sole domain enriched in phenylalanine, as measured in Table 1. The second most popular domain contained 15 residues and leaned towards the C-terminal region for 6 non-selective sequences (≈6 %, Figure 5E). Likewise, the last ACP-NS domain was 20 residues long and was located in the C-terminal region of 4 sequences (≈4 %, Figure 5F). Both domains started with polar amino acids. With regard to the nature of the residues, we discerned that both ACP-S and ACP-NS peptides alternated between hydrophobic residues (in green) and charged residues (mainly positive, in purple), and occasionally presented polar amino acids (in orange), clustered into short domains of 2-4 residues. Such alternations are characteristic of amphipathic membrane-active peptides.
The high proportion of sequences with similar domains suggested the presence of homologous peptides. Therefore, we clustered mACPs into families guided by moderate-high sequence similarities [S_OSA, Eq. (5)] and coincidental name tags (e.g., Temporin-PE). We catalogued 58 families (Table S3) [...] Figure 7. We also indicated the differential enrichments in positively charged residues (Figures 6A and 7A), aliphatic and aromatic residues (Figures 6B and 7B) and small residues (Figure 7B). Most non-selective mACPs with higher levels of hydrophobic residues that were conjointly tested against blood cancer cell lines Jurkat E6-1 and K562 and breast cancer cell line MDA-MB-361 belonged to the P18 and CRAMP-18 families. These peptides demonstrated anticancer activity within the same range (CRAMP-18: pIC50 ≈ 4-4.5 and P18: pIC50 ≈ 4-5.5). Members of the BP100 family were exclusively reported against blood cancer cell line K562. Likewise, HeLa S3, SW480, and CCRF-CEM were evaluated with a handful of selective mACPs from the HAL-1 and LL-III families and non-selective homologues derived from Macropin-1. These families presented moderate levels of charged, aliphatic and aromatic residues. Selective LL-III and non-selective Macropin-1 homologues exhibited potent anticancer activity with pIC50 values ranging between 4.8 and 5.6 against CCRF-CEM and HeLa S3, suggesting similar membrane lipid compositions of both cell lines.
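The family clustering above relies on the Optimal String Alignment (OSA) distance, which counts insertions, deletions, substitutions and transpositions of adjacent residues. The sketch below implements the standard OSA distance; the normalisation into a similarity score (one minus the distance divided by the longer sequence length) is an assumption, since Eq. (5) is not reproduced in this section, and the example sequences are hypothetical.

```python
def osa_distance(a, b):
    """Optimal String Alignment distance between two sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent residues.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def osa_similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


# Hypothetical short sequences differing by one adjacent swap.
print(osa_distance("KLAK", "KALK"), osa_similarity("KLAK", "KALK"))
```

Pairs whose similarity exceeds a chosen threshold could then be grouped into the same family; the threshold behind "moderate-high" similarity is not stated in this section.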
Our Kolmogorov-Smirnov half-matrices revealed moderate fittings between cancer cell lines A549, HL-60 and PC-3, implying possible overlaps in positively charged mACPs (HL-60/A549: KS p-value of 0.765, Figure 4E) or small and aliphatic peptides (HL-60/PC-3: KS p-value of 0.799, Figure 4F). However, most peptides differed in nature and belonged to distinct mACP families across all three cancer cell lines, with few exceptions, such as Buforin-2, Gaegurin 5 and PTP homologues (Figure 7). Despite their heterology, many highly potent and non-selective mACPs against the three cancer cell lines shared similar characteristics: elevated levels of positively charged lysine, arginine, and histidine (Figure 7A), and moderate-high hydrophobicity (Figure 7B). The most potent example is the KLW peptide, with a pIC50 value over 6 when tested against lung cancer cell line A549. The five cancer cell lines U251-MG, NCI-H157, A549, HL-60, and PC-3 might present comparable levels of negatively charged lipids on their cell surfaces to accommodate these non-selective mACPs.

General discussion
Developing peptide-based drugs that selectively inhibit cancer cells would establish efficient and reliable therapeutic solutions. [51] Membranolytic ACPs (mACPs) represent one promising avenue with remarkable selectivity towards cancer cells, evading multidrug resistance. The selectivity capability of a peptide drug is traditionally defined as the ratio of cytotoxicity to its biological activity; Badisa and co-workers [33] established the selectivity index (SI) using IC50 values against a normal healthy cell line (HCL) and a cancer cell line (CCL). A higher SI value means greater selectivity, while a value less than 2 indicates the general toxicity of a peptide. Membranolytic ACPs have demonstrated broad-spectrum inhibition of cancer cell lines and limited toxicity against normal cells. Numerous attempts at designing selective mACPs focused on studying their physicochemical properties or the characteristics of cancer cell membranes, as summarised in reviews. [10,12,24,41,42] Despite their sequence heterology and origin, many mACPs presented similar physicochemical properties, including 10-30 residues in length, a net charge between +2 and +9, 40-60 % hydrophobic amino acids, and they predominantly fold into α-helices. These properties sometimes correlate with an increase in activity and selectivity [24,28,31] against targeted cancer cells. These studies have also attributed the selectivity (or lack thereof) of α-helical cationic mACPs to their interactions with negatively charged components on cancer cell surfaces (e.g., phospholipids, glycoproteins, glycolipids, proteoglycans) and characteristics such as membrane potential, membrane permeability [42] and asymmetry, [43] cholesterol content and cell surface area.
Figure 5. Six top-ranking sequence domains for selective peptides (A-C) and non-selective peptides (D-F) using the MEME suite. [40] Hydrophobic residues, charged residues and polar residues are depicted in green, purple, and orange, respectively.
In the present study, we evaluated the potential of mACPs to inhibit two or more cancer cell lines selectively. We created CCSI, the cancer cell-line selectivity index, to distinguish selective peptides from non-selective counterparts, using the highest and the lowest pIC 50 values from two CCLs for a single peptide (Eq. 1-2). A higher pIC 50 value denoted a more potent mACP. A selective mACP (ACP-S) had a CCSI equal to or superior to 0.5; otherwise, the peptide was labelled as non-selective (ACP-NS). Selective peptides would exhibit different magnitudes of anticancer activity across targeted CCLs, whereas the nonselective examples inhibited cancer cell lines with similar intensity. We assembled 43 ACP-S and 95 ACP-NS with pIC 50 values across 77 cancer cell lines (Figure 1). Considering that SI and CCSI differ, we would not expect to observe similar features linked to the potency and selectivity of mACPs. However, we discovered weak-moderate positive relationships between CCSI and SI values against specific cancer cell lines (HeLa S3, CCRF-CEM, NCI-H157, and PC-3) - Figure S3.
Membranolytic ACPs presented high levels of lysine, leucine, alanine, glycine, isoleucine, and valine (Table 1). Some of these residues appear more frequently in α-helical ACPs than in non-anticancer and antimicrobial sequences. [44,45] Raghava's research group has notably used the different amino acid frequencies and k-mers to develop early machine-learning predictors of anticancer activity. [44,46,47] Looking at differences between cell-line selective and non-selective mACPs, we noted that the former group included many more small residues (i.e., alanine, valine). In contrast, the latter contained higher levels of phenylalanine, leucine, and positively charged residues such as lysine (Table 1). These differences translated into bulkier and more hydrophobic non-selective peptides in terms of global physicochemical properties. Selective mACPs differed from their non-selective counterparts by magnitudes of hydrophobicity and flexibility (Figure 3). Interestingly, Vermeer and co-workers identified that adding small proline kinks to α-helical cationic peptides induced conformational flexibility influencing their activity against cancer cells. [48] However, these observations did not hold across all 11 cancer cell lines (Figure 4). Our sequence analyses using the MEME suite and the Optimal String Alignment distance revealed that such characteristics could be reduced to a handful of mACP families (Figures 5-7). Potency and cell-line selectivity did not go hand-in-hand across all targeted cancer cell lines. In fact, they would correlate with specific physicochemical properties of mACPs across three clusters of cancer cell lines. In the first cluster, grouping HeLa S3 (cervix), CCRF-CEM (blood), and SW480 (colon), more hydrophobicity led to more potent mACPs, whereas a greater net charge diminished their cell-line selectivity, as illustrated by selective HAL-1 and LL-III homologues (sequence motif Figure 5A) and non-selective Macropin-1 examples (Figure 6).
These findings corroborated previous structure-activity relationship studies of α-helical cationic peptide P [49] and temporin-1CEa, [50] where hydrophobicity led to stronger anticancer activities against multiple cancer cell lines (cervix: HeLa, melanoma: A375, B16, lung: A549 and breast: MCF-7, MDA-MB-231, Bcap-37). Because the pIC50 values were in the same order of magnitude across cell lines, neither study pursued the rational design of selective peptide analogues. In the second cluster, made of cancer cell lines NCI-H157 and A549 (lung), PC-3 (prostate), U251-MG (brain), and HL-60 (blood), the design of more potent mACPs was associated with an increase in positively charged residues. The design of potent α-helical antimicrobial peptides has led to a similar relationship. [51-53] Harris and co-workers [24] have previously mentioned the critical role that both lysine and arginine side chains play in interacting with membranes via a "snorkelling" mechanism. The cell-line selective mACPs like dermaseptins contained consecutive small residues, e.g., alanine, glycine (sequence motif Figure 5B), whereas non-selective examples (e.g., TsAP-1 and TsAP-2) presented higher levels of hydrophobicity (Figure 7). The α-helix-forming propensity and amphipathic nature of alanine-rich peptides have long been associated with their ability to penetrate microbial and cancer cell membranes. [52] The presence of consecutive small residues, i.e., alanine and valine, may stabilize these α-helical structures [54] and reduce their local flexibility (Figure 3) to favour cell-line selectivity. Dermaseptins are a well-documented family of antimicrobial and anticancer peptides derived from frog skins. [55] In 2017, Dos Santos and co-workers revealed the role of cell surface glycosaminoglycans in the anticancer mechanism of action of dermaseptin-B2, [56] which might explain the cell-line selectivity of this peptide and other members of the dermaseptin family toward certain cell lines. Finally, the third and last cluster included blood cancer cell lines Jurkat E6-1 and K562 as well as breast cancer cell line MDA-MB-361. In that cluster, we only reported cell-line non-selective mACPs from three populated peptide families: P18, CRAMP-18 and BP100 homologues (sequence motif Figure 5D). Their potency was partly linked to their hydrophobicity, supporting previous observations against multiple cancer cell lines. [31,49]

Conclusions
Our study identified critical physicochemical properties that distinguish 43 cell-line selective membranolytic anticancer peptides from their 95 non-selective counterparts. Hydrophobicity, net charge, and the presence of small and aliphatic residues influenced the potency and selectivity of these mACPs across cancer cell lines in a cell-specific manner. In some cases, adding hydrophobic and bulkier residues led to more potent peptides, whereas an increase in positively charged residues made the peptides more selective against a cluster of cancer cell lines (i.e., HeLa S3, CCRF-CEM, and SW-480). In others, the presence of successive small residues helped mACPs gain stability and cell-line selectivity against another cluster of cancer cell lines (i.e., NCI-H157, A549, PC-3, U251-MG, and HL-60). Interestingly, these cancer cell lines responded similarly to a defined set of mACPs (homologues or peptides sharing physicochemical properties), suggesting that these clusters might present comparable lipid compositions on their cell surfaces. Lipidomic studies are needed to support or reject this hypothesis. Despite these observations, the limited number of mACPs provided insufficient grounds to generalise design rules for mACPs or to develop cell line-specific machine learning models. Medium-to-large libraries of mACP homologues (with varying amino acid composition and physicochemical properties) should be evaluated across the widest possible range of cancer cell lines and paired with lipidomic studies.

Experimental Section

Datasets
Original dataset. We initially gathered 348 peptide sequences from DBAASP (Database of Antimicrobial Activity and Structure of Peptides) [32] using the Ranking Search option and the following selection criteria: the sequences could be of any length, of natural or synthetic origin, and had to be labelled as monomers. Their chemical structure had to contain only proteinogenic residues (no unusual amino acids or interchain bonds) and no N-terminal modification. Sequences with an amidated C-terminus were retained. All human cancer cell lines available in the database were included, regardless of the culture medium. Activity had to be reported as IC50, defined as the peptide concentration at which cell viability is reduced by 50 %, where a lower value denotes greater potency. [57] All peptides demonstrated anticancer activity against at least three of the 77 listed cancer cell lines. Only peptides that targeted the lipid bilayer(s) were retained, since they represent membranolytic peptides. Finally, we selected the sequences that were evaluated against more than two cancer cell lines, leading to a final set of 138 peptide sequences.
ACP-S and ACP-NS subsets. All 138 mACPs exhibited anticancer activity against at least three specific cell lines, as expressed by their IC50 values. We converted these values to their corresponding negative logarithmic transformations (pIC50

Features
Amino acid composition (AAC). The amino acid composition (AAC) is the fraction of each amino acid type within a peptide sequence of length N. The fractions f of all 20 natural amino acids r (r = 1, 2, …, 20) were calculated using the equation [Eq. (4)]; they were computed for each of the M sequences in a given dataset and then averaged.
The AAC values were calculated using the R package protr (v.1.0-1). [58] The results for the ACP-S and ACP-NS datasets are summarized in Table 1.
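The AAC calculation described above can be illustrated with a minimal Python sketch. It is not the protr-based R workflow used in the study; it assumes that the fraction of residue r is its count divided by the sequence length N, and that the dataset value is the plain mean over the M sequences. The function names and the toy sequences are hypothetical.

```python
from collections import Counter

# One-letter codes of the 20 proteinogenic amino acids
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(sequence):
    """Fraction of each of the 20 amino acids in one peptide sequence."""
    counts = Counter(sequence.upper())
    n = len(sequence)
    return {aa: counts.get(aa, 0) / n for aa in AMINO_ACIDS}

def mean_aac(sequences):
    """Average the per-sequence fractions over all M sequences of a dataset."""
    m = len(sequences)
    per_sequence = [aac(s) for s in sequences]
    return {aa: sum(p[aa] for p in per_sequence) / m for aa in AMINO_ACIDS}

# Hypothetical toy sequences, not peptides from the study
example = ["KKLL", "AAAK"]
print(mean_aac(example)["K"])  # 0.375 (mean of 0.5 and 0.25)
```

Averaging per-sequence fractions gives every peptide the same weight regardless of its length, which differs from pooling all residues of a dataset before counting.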
Sequence motifs. We searched for sequence motifs using the MEME Suite website, [40] specifically the Multiple EM for Motif Elicitation (MEME) tool (v.5.1.1), [59] which finds recurring ungapped patterns (motifs) within a set of sequences. The search parameters are summarized in Table S1. We used the MEME classic mode and the standard protein alphabet for motif discovery, in which a set of sequences is supplied and the tool identifies enriched motifs. For the site distribution, we selected the option zero or one occurrence per sequence (zoops), where MEME assumes that each sequence can contain at most one occurrence of every motif. We requested ten motifs per search; the tool continued to search until ten motifs were found or one of the other thresholds (i.e., the maximum execution time) was reached. Finally, we left the advanced options at their default values.

Statistical tests
We assessed the normality of the dataset distributions (i.e., physicochemical properties) for both groups (ACP-S and ACP-NS) using the Shapiro-Wilk and Lilliefors tests, according to the sample sizes, before evaluating which dataset(s) had the same distribution in both groups. We compared the variances with either the F-test for a normally distributed dataset (ND) or the Fligner-Killeen test for a non-normally distributed dataset (AD). We compared the physicochemical properties of membranolytic ACP-S and ACP-NS by applying one of three statistical tests: (1) Welch's t-test for NDs with different variances, (2) the Wilcoxon test (also known as the Wilcoxon rank-sum test) for ADs with the same variance, and (3) the Kolmogorov-Smirnov test for ADs with different variances, using a significance level α of 0.05. We controlled the false discovery rate with the Benjamini-Hochberg method using the same value of α. All tests were performed using R (v.3.6.3) [62] and RStudio (v.1.2.5033). [63] The statistical pipeline is shown in Figure S1.
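The false discovery rate correction mentioned above can be sketched as follows. This is an illustrative pure-Python version of the standard Benjamini-Hochberg step-up adjustment, not the R code used in the study; the raw p-values are hypothetical and the function name is our own.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the rank decreases (and never exceed 1)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

ALPHA = 0.05  # significance level stated in the text
p_raw = [0.001, 0.02, 0.03, 0.2]  # hypothetical p-values, one per property
p_adj = benjamini_hochberg(p_raw)
significant = [p <= ALPHA for p in p_adj]
print(significant)  # [True, True, True, False]
```

A property is then called significantly different between ACP-S and ACP-NS when its adjusted p-value does not exceed α.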

Sequence similarity and mACP families
We measured the sequence similarity of each mACP to all other 137 peptides using the Optimal String Alignment distance (d_OSA). Each distance between two sequences was calculated using the R package stringdist. [64] This distance counts the minimum number of deletions, insertions, substitutions, and transpositions of adjacent residues (with each substring edited at most once) necessary to convert sequence A into sequence B. From d_OSA, we derived the sequence similarity index S_OSA using the equation [Eq. (5)]. A high similarity value means that few edits are needed to convert sequence A into sequence B. Peptides with an S_OSA of 0.95-1 were considered analogous. They formed mACP families guided by both S_OSA similarity and matching names. Subsequently, mACP families consisting of two or more peptides were selected, and their members were identified as ACP-S or ACP-NS.
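The distance described above can be illustrated with a minimal Python sketch of the standard OSA dynamic programme. It is not the stringdist implementation used in the study. The similarity function assumes a normalisation by the length of the longer sequence; Eq. (5) is not reproduced in this section and may differ. Function names and example sequences are hypothetical.

```python
def osa_distance(a, b):
    """Optimal String Alignment distance between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i residues of a
    for j in range(cols):
        d[0][j] = j  # insert all j residues of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent residues
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

def osa_similarity(a, b):
    """Assumed similarity index: 1 - d_OSA / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest

# Hypothetical toy sequences: one substitution over four residues
print(osa_similarity("KKLL", "KKLA"))  # 0.75
```

Under this assumed normalisation, a threshold of 0.95 tolerates at most one edit per 20 residues of the longer sequence.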

Multiple sequence alignment
We aligned each mACP family (containing at least one ACP-NS and one ACP-S) using MUSCLE (Multiple Sequence Comparison by Log-Expectation) [65] and MEGA (Molecular Evolutionary Genetics Analysis). [66] We used the Jalview program [67] to colour each amino acid as either hydrophobic, charged, or polar.