Trifluoroacetic acid as excipient destabilizes melittin causing the selective aggregation of melittin within the centrin-melittin-trifluoroacetic acid complex a)

Trifluoroacetic acid (TFA) may be the cause of the bottleneck in high-resolution structure determination for protein-peptide complexes. Fragment-based drug design often involves the use of synthetic peptides, which contain TFA as an excipient. Our goal was to explore the effects of this excipient on a model complex: centrin-melittin-TFA. We performed Fourier transform infrared spectroscopy, two-dimensional infrared correlation spectroscopy, and spectral simulations to analyze the amide I'/I'* band for the components and the ternary complex. Melittin (MLT) was observed to have increased helicity upon its interaction with centrin, followed by the thermally induced aggregation of MLT within the ternary complex in the presence of TFA.

I. INTRODUCTION
Trifluoroacetic acid (TFA) is commonly found as an excipient in synthetic peptides. The presence of this excipient is due to the process of solid phase synthesis, whereby this strong counter ion is used to remove the protecting group and direct the synthesis of the peptide. We have chosen melittin (MLT) as a model peptide for this study to evaluate the effects of this excipient. 1-8 Melittin is a component of bee venom (Apis mellifera) composed of 26 amino acids (Ac-GIGAILKVLATGLPTLISWIKNKRKQ-NH₂) whose C-terminal end is positively charged. The third component in the complex studied is centrin, a highly conserved calcium binding protein (CaBP) belonging to the tenth largest family of proteins within the mammalian genome, the EF-hand superfamily. 9,10 Chlamydomonas reinhardtii centrin (Cr cen) is localized to the nucleus-basal body connectors, the distal striated fibers, and the flagellar apparatus in algae. 11-14 The protein comprises 172 residues (NCBI Accession No. PO5434) and, like all EF-hand proteins, centrin responds to the cellular Ca²⁺ influx by selectively binding Ca²⁺ at four helix-loop-helix calcium binding sites.
Cr cen is highly stable and has been well characterized in the presence and absence of cations using Fourier transform infrared (FT-IR), 15,16 circular dichroism (CD), 15,17 and nuclear magnetic resonance (NMR) 17,18 spectroscopies. The centrin-MLT complex has been studied by our group and its structure has been determined in the absence of TFA. 19 Protein aggregation at the initial state of the sample, prior to performing any crystallization screens, may well be the cause of the bottleneck observed in high-resolution structure determination of protein-peptide complexes. It is possible that the presence of excipients can induce protein aggregation and thus prevent the successful crystallization of the complex.

a) Contributed paper, published as part of a Special Topic: Invited Papers of the 2nd International BioXFEL Conference, Ponce, Puerto Rico, 14-16 January 2015.
b) Author to whom correspondence should be addressed. Electronic mail: belinda.pastrana@upr.edu.

II. MATERIALS AND METHODS
Cr cen was bacterially expressed, isolated, and purified as described in Pastrana-Rios et al. 15,19,20 The use of Spectra9 ¹³C-enriched minimal media from Cambridge Isotopes Laboratory, Inc. (Andover, MA) allowed for the expression of recombinant ¹³C-labeled Cr cen using E. coli BL21 λDE3 cells. Centrin's subsequent isolation, purification, and characterization were performed as described in the above-mentioned procedure. The MLT synthetic peptide was purchased from Abgent, Inc. (San Diego, CA) and used without further purification, i.e., without removal of residual TFA. The concentration of each protein was determined using the predicted molar extinction coefficient (M⁻¹ cm⁻¹): Cr cen ε₂₈₀ = 1490 and MLT ε₂₈₀ = 5500.
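The concentration determination described above follows the Beer-Lambert law, A = ε·c·l. The following is a minimal sketch only: the 1 cm path length, the absorbance readings, and the function name are assumptions for illustration and are not taken from the study.

```python
# Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l).
# Extinction coefficients (M^-1 cm^-1) are the predicted values quoted in the text.
EPSILON_280 = {"Cr cen": 1490.0, "MLT": 5500.0}


def molar_concentration(absorbance_280, epsilon, path_length_cm=1.0):
    """Return the molar concentration from an absorbance reading at 280 nm.

    The 1 cm default path length is an assumption, not a value from the paper.
    """
    if epsilon <= 0 or path_length_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance_280 / (epsilon * path_length_cm)


if __name__ == "__main__":
    # Hypothetical absorbance readings, for illustration only.
    readings = {"Cr cen": 0.298, "MLT": 0.550}
    for name, a280 in readings.items():
        conc = molar_concentration(a280, EPSILON_280[name])
        print(f"{name}: {conc * 1e6:.1f} uM")
```

Because the two extinction coefficients differ by almost a factor of four, the same absorbance reading corresponds to very different molar concentrations of the two components, which matters when preparing the fixed molar ratios used here.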

A. FT-IR spectroscopy
The ¹³C-Cr cen-MLT-TFA complex (1:1:10 and 1:2:20, molar ratios), ¹³C-Cr cen, or MLT-TFA complex (w/w) sample was dialyzed under the desired conditions and then lyophilized repeatedly (5×), while re-dissolving the sample in D₂O (99 at. % D), thus allowing for a fully H → D exchanged protein sample. Typically, a 35 µl aliquot of 60 mg/ml of the protein sample in 50 mM 4-(2-hydroxyethyl)-1-piperazine ethane sulfonic acid (HEPES) buffer, 150 mM NaCl, 4 mM MgCl₂, and 4 mM CaCl₂ at pD 6.6 was deposited on a 49 × 4 mm custom milled CaF₂ window with a fixed path length of 40 µm. A reference cell was prepared similarly, and both cells were set in a custom dual chamber cell holder. The temperature within the cell was controlled via a Neslab RTE-740 refrigerated bath (Thermo Electron Corp., NH) and monitored with a thermocouple positioned in close contact with the sample cell. The temperature accuracy was estimated to be within 1 °C. Ten minutes were routinely allowed for thermal equilibrium to be reached before spectral acquisition was begun. The temperature range studied was 5-98 °C. The instrument used was a Magna-IR 550 Spectrophotometer (Thermo Electron Corporation, Madison, WI) equipped with an MCT detector, a sample shuttle, and an interface. Typically, 512 scans were co-added, apodized with a triangular function, and Fourier transformed to provide a resolution of 4 cm⁻¹ with the data encoded every 2 cm⁻¹.

B. Spectral analysis
The spectral region of interest, the amide I' band, has been simplified by the H → D exchange of the sample; as a result, only the peptide bond carbonyl stretching vibrations are observed, along with the arginine side chain modes. 15,16,21-26 When the protein sample has also been homogeneously ¹³C-labeled in addition to the H → D exchange, the ¹³C carbonyl stretching modes (Cr cen in this case) are termed the amide I'* band. 21,22 The objective of this method is to allow for the simultaneous study of two protein components. In addition, TFA absorbs at a discrete wavelength, as a sharp and narrow band, thus allowing for the simultaneous study of all three components.
2D IR correlation spectroscopy is a technique developed by Noda 28,29 and used extensively by our group. 15,16,19-21,30-32 The technique uses a series of sequential FT-IR spectra collected as a function of a perturbation (temperature, H/D exchange, ligand titration, pH, excipient, etc.) to generate synchronous and asynchronous plots. The spectra are collected at regular intervals during the perturbation. In general, the analysis spreads the acquired spectral data into two dimensions, thus enhancing the spectral resolution in the synchronous plot and providing information on the sequence of molecular events in the asynchronous plot. 15,19,21,28-30 Both plots provide information on the coupling of vibrational modes within the spectral region of interest (1720-1540 cm⁻¹). Consequently, this technique is sensitive to backbone vibrational modes as well as certain side chain modes (i.e., arginine, aspartates, glutamates, and tyrosine) being perturbed and, as a result, one can identify changes in these modes as a function of the perturbation. Furthermore, we have established a method that has proven useful in the simulation of FT-IR spectra and 2D IR correlation plots. 31,32 The characteristic sub-bands within the amide I'/I'* contour were used to simulate the 2D IR correlation plots.
Baseline correction, spectral overlay, peak pick routines, deconvolution, and 2D IR correlation analysis were performed using a kinetics program for MATLAB (MathWorks, Natick, MA) generously provided by Dr. Erik Goormaghtigh from the Free University of Brussels, Belgium.
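As an illustration of how synchronous and asynchronous plots can be generated from a series of sequential spectra, the sketch below implements the generalized 2D correlation formalism with the Hilbert-Noda transformation matrix in plain Python. It is not the MATLAB program used in this work; the function name, the use of the mean spectrum as reference, and the toy data are assumptions made for illustration.

```python
import math


def correlation_2d(spectra):
    """Return (synchronous, asynchronous) matrices for a series of spectra.

    spectra: list of m spectra recorded at evenly spaced perturbation steps,
             each a list of n intensities on a common wavenumber axis.
    """
    m = len(spectra)
    if m < 2:
        raise ValueError("at least two spectra are required")
    n = len(spectra[0])

    # Dynamic spectra: subtract the mean spectrum (used here as the reference).
    mean = [sum(row[i] for row in spectra) / m for i in range(n)]
    dyn = [[row[i] - mean[i] for i in range(n)] for row in spectra]

    # Hilbert-Noda transform along the perturbation axis:
    # N[j][k] = 0 if j == k, else 1 / (pi * (k - j)).
    hil = [
        [sum(dyn[k][i] / (math.pi * (k - j)) for k in range(m) if k != j)
         for i in range(n)]
        for j in range(m)
    ]

    scale = 1.0 / (m - 1)
    sync = [[scale * sum(dyn[j][a] * dyn[j][b] for j in range(m))
             for b in range(n)] for a in range(n)]
    asyn = [[scale * sum(dyn[j][a] * hil[j][b] for j in range(m))
             for b in range(n)] for a in range(n)]
    return sync, asyn


if __name__ == "__main__":
    # Toy data: three "bands" followed over five temperature steps.
    toy = [
        [0.0, 1.0, 0.0],
        [1.0, 1.0, 0.0],
        [2.0, 1.0, 1.0],
        [3.0, 1.0, 3.0],
        [4.0, 1.0, 6.0],
    ]
    sync, asyn = correlation_2d(toy)
    print("synchronous:", sync)
    print("asynchronous:", asyn)
```

In this formulation the synchronous matrix is symmetric and the asynchronous matrix is antisymmetric. Bands whose intensities change strictly in phase give no asynchronous cross peak, and a band that does not change gives no peaks at all.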

III. RESULTS
A. Protein and peptide sample
The mass/charge ratio determined by Matrix Assisted Laser Desorption Ionization Time of Flight (MALDI-TOF) mass spectrometry was consistent with the incorporation of the ¹³C-label. In addition, the loss of the first residue (Met) due to proteolytic cleavage at the amino terminal end was observed (data not shown), and varying Ca²⁺ and Mg²⁺ coordination was also evident. Consequently, the protein sample was dialyzed against the appropriate buffer to ensure similar conditions for the pure component as well as the ternary complex.

B. FT-IR spectroscopy
FT-IR spectroscopy is sensitive to conformational changes observed within proteins. Specifically, the amide I'/I'* band (1720-1540 cm⁻¹), due to the stretching vibrations of the MLT peptide and the isotopically labeled protein, ¹³C-Cr cen, was analyzed. In general, all spectral data sets are arranged in columns and similar plot types are arranged in rows to facilitate comparison and analysis.

D. Thermal dependence studies
We compared the FT-IR spectra in the spectral region of 1720-1540 cm⁻¹ (the amide I'/I'* band) within the temperature range of 5-98 °C in the presence of Ca²⁺ and Mg²⁺ for ¹³C-Cr cen, MLT-TFA, and the ¹³C-Cr cen-MLT-TFA ternary complex (Figs. 1(a)-1(c), respectively). The asymmetrical appearance of the amide I'* band for ¹³C-Cr cen (Fig. 1(a)) is due to the underlying contributions of the β-sheet and helical components and some overlap by the guanidinium stretching vibrations, which is mainly due to Cr cen arginines. The shoulder at 1625 cm⁻¹ is the result of the ¹³C-labeled carbonyl stretching modes associated with the π- and α-helices as the major contributors of the changes observed. The amide I' band for the MLT-TFA complex (Fig. 1(b)) lacks the aggregation peak (1619 cm⁻¹); instead, it contains TFA (1673 cm⁻¹), β-sheet (1631.3 cm⁻¹), and the antiparallel β-sheet component (1685 cm⁻¹), also referred to as β-turn. In the case of the ternary complex (Fig. 1(c)), the aggregation peak at 1619 cm⁻¹ is evident along with the decrease in helical composition of the MLT component within the ternary complex.

E. 2D IR correlation analysis
This technique enhances the resolution of the spectral region of interest (Figs. 2(a)-2(c)). 15,19-21,27-29 The synchronous plots (Figs. 2(d)-2(f)) contain auto peaks and cross peaks arising from peaks whose intensities change in phase, while the asynchronous plots (Figs. 2(g)-2(i)) are comprised of peaks that change in intensity out of phase from each other and establish the order of molecular events involved in the aggregation and the role TFA had in this process. 36 The synchronous plots for the components (¹³C-Cr cen and MLT-TFA) studied within the spectral region 1720-1540 cm⁻¹ are shown in Figs. 2(d) and 2(e), respectively. In each case, a spectral region is observed (1720-1650 cm⁻¹ for ¹³C-Cr cen and 1600-1540 cm⁻¹ for MLT-TFA) with little or no overlap to allow for the simultaneous study of these protein components. The auto and cross peaks with their band assignments are consistent with the band assignments presented above for the same spectral region. The auto peak with the greatest intensity change for ¹³C-Cr cen (Fig. 2(d)) was the β-sheet (1591.8 cm⁻¹) followed by the hinge loops (1636.0 cm⁻¹). For the MLT-TFA complex (Fig. 2(e)), it was the β-sheet (1637.7 cm⁻¹) followed by the TFA (1673 cm⁻¹).

FIG. 1. Overlaid FT-IR spectra of (a) ¹³C-Cr cen, (b) the MLT-TFA complex, and (c) the ¹³C-Cr cen-MLT-TFA complex. The amide I' and I'* bands for the MLT-TFA complex and labelled centrin are shown along with the changes in the spectral features of each component as the temperature is increased from 5 °C to 98 °C within the spectral region of 1740-1520 cm⁻¹.

F. Molecular events
Molecular events for the components (¹³C-Cr cen and MLT-TFA) within the spectral region 1720-1540 cm⁻¹ were studied using the synchronous plots, shown in Figs. 2(d) and 2(e), and the asynchronous plots, shown in Figs. 2(g) and 2(h), respectively. The sequential order of molecular events leading up to thermal denaturation, established from the asynchronous and synchronous plots, is shown in Figs. 3(a) and 3(b) and summarized in supplementary material Table 1S. 37 More importantly, the information provided can be related to the relative stability of the structural motifs within each protein. For ¹³C-Cr cen, shown in Fig. 3(a), the α-helical contributions (1597.3 cm⁻¹) were the least stable, followed by the arginine modes, specifically the guanidinium N-D asymmetric stretch (1611 cm⁻¹), the ¹³C-N stretch (1572 cm⁻¹), and the N-D asymmetric stretch (1584.7 cm⁻¹), followed by the π-helix (1624.7 cm⁻¹) located at the C-terminal end of the protein. The calcium binding loops (1640 cm⁻¹) along with the hinge loops (1636.0 cm⁻¹) were perturbed at high temperatures, consistent with our previous work. 15 In the case of MLT within the MLT-TFA complex, the 3₁₀-helix (1643 cm⁻¹) and β-sheet (1631.3 cm⁻¹) are perturbed first, followed by the TFA (1671.5 cm⁻¹) and the single arginine (1606 cm⁻¹ and 1582.4 cm⁻¹) found at the C-terminal end of the peptide, then the kink (1664.4 cm⁻¹) found near the middle of the peptide. Finally, the antiparallel β-sheet (1685 cm⁻¹) component, due to intermolecular hydrogen bonds with other MLT molecules in solution, is deemed to be the most stable structural motif within the peptide (Fig. 3(b)).
MLT's aggregation within the Cr cen-MLT-TFA complex (1:2:20, molar ratio) is shown in the overlaid spectra (Fig. 2(c)), and the 2D IR correlation spectroscopy contour plots are shown in Figs. 2(f) and 2(i). The sequential order of events determined by applying Noda's rules 28,29 for the interpretation of the asynchronous and synchronous plots is shown in Fig. 4 and summarized in supplementary material Table 2S. 37 The spectral data for the entire temperature range 5-95 °C were used to understand the interaction within the complex and the thermally induced aggregation process in the presence of TFA. ¹³C-Cr cen and MLT give rise to an intense negative cross peak in the asynchronous plot (Fig. 2(i)) at 1643, 1595 cm⁻¹ (ν₁, ν₂), suggesting an interaction between MLT's 3₁₀-helix and Cr cen's helices. Also observed is the 1643, 1595 cm⁻¹ cross peak, suggesting an interaction of MLT's 3₁₀-helix with ¹³C-Cr cen's α-helices. Finally, the cross peak at 1673, 1595 cm⁻¹ suggests an interaction between TFA and ¹³C-Cr cen's helical regions through the arginine residues located in these regions.
The sequential order of molecular events of the ¹³C-Cr cen-MLT-TFA complex (1:2:20, molar ratio) during the thermal perturbation was determined to be the following: arginine's ¹³C-N stretch (1572 cm⁻¹), followed by aggregation (1619 cm⁻¹) and the ¹³C-Cr cen helical regions along with arginines' N-D stretching (1584.7 cm⁻¹). The perturbation of centrin by TFA is followed by MLT's random coil (rc) segments adopting a 3₁₀-helical conformation due to the peptide's interaction with centrin. TFA is then perturbed, followed by the 3₁₀-helix (1646 cm⁻¹), then the kink (1664.4 cm⁻¹) found in the middle of the peptide, and finally the remaining antiparallel β-sheet (1685 cm⁻¹). Furthermore, as MLT progressively loses its interaction with Cr cen's helices due to the presence of TFA, the peptide's 3₁₀-helix begins to aggregate. This event is confirmed by the cross peak splitting at 1673-1618, 1584.7 cm⁻¹. This process describes the formation of the complex and the TFA-induced MLT aggregation during thermal perturbation and the interaction with centrin.
We determined the yield of aggregated species to be 12% MLT. These results have been included in a granted method patent application. 36 The ¹³C-Cr cen-MLT-TFA complex (1:1:10, molar ratio) within the spectral region of 1720-1540 cm⁻¹ is shown in supplementary material Fig. 4S. 37 These results validate the mechanism proposed for the ternary complex at the higher MLT-TFA molar ratio discussed above. Briefly, the similarities in the peak pattern distribution observed for the ternary complex at the 1:1:10 and 1:2:20 molar ratios provide greater confidence in the described mechanism of aggregation. Specifically, the TFA (1673 cm⁻¹), MLT 3₁₀-helical segment (1643 cm⁻¹), and aggregation (1619 cm⁻¹) peaks substantiate the association with MLT, as seen from the negative cross peaks observed in both the synchronous and asynchronous plots. More importantly, the correlation between centrin's arginines (1584 and 1572 cm⁻¹) and TFA, the factor that inhibits the interaction between MLT and centrin and thereby results in MLT's aggregation, is also evident.
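The sequential orders discussed in this section rest on Noda's sign rules for a cross peak at (ν₁, ν₂). A minimal sketch of those rules is given below; the function name, the tolerance, and the example intensities are hypothetical and are not values read from Figs. 2(f) and 2(i).

```python
def sequential_order(sync_value, async_value, tol=1e-12):
    """Apply Noda's sign rules to a cross peak at (v1, v2).

    Same sign in the synchronous and asynchronous plots: v1 changes before v2.
    Opposite signs: v1 changes after v2.
    No asynchronous intensity: the two changes are simultaneous.
    No synchronous intensity: the sign rules cannot assign an order.
    """
    if abs(async_value) <= tol:
        return "simultaneous"
    if abs(sync_value) <= tol:
        return "undetermined"
    same_sign = (sync_value > 0) == (async_value > 0)
    return "v1 before v2" if same_sign else "v1 after v2"


if __name__ == "__main__":
    # Hypothetical cross peak intensities, for illustration only.
    examples = {
        "(1643, 1595)": (0.8, -0.3),
        "(1673, 1595)": (-0.5, -0.2),
    }
    for label, (phi, psi) in examples.items():
        print(label, "->", sequential_order(phi, psi))
```

Applying the rule to every assigned cross peak and chaining the pairwise results yields an ordered list of events of the kind summarized in the supplementary tables.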

G. Spectral simulations
Spectral simulations of the amide I'* band for the centrin-MLT-TFA complex (1:2:20 molar ratio), shown in Fig. 5, were performed to further understand the aggregation process. Simulation of spectral components was used to simplify the complex amide I'/I'* band and generate the 2D IR correlation spectroscopy plots for the temperature range of 5-95 °C. The spectral components chosen for the simulation were the ¹³C-Cr cen α-helical and β-sheet components (1597 and 1588 cm⁻¹) and the TFA (1673 cm⁻¹), as observed within Fig. 5(a). The simulated 2D IR correlation plots (Figs. 5(d) and 5(g)) were able to reproduce the splitting pattern of the cross peaks observed between the TFA and centrin main backbone vibrational modes. A second simulation involved MLT's 3₁₀-helix and the aggregation peak (Fig. 5(b)), where the aggregation phenomenon involving MLT's 3₁₀-helix was confirmed by the appearance of the aggregation auto peak (Fig. 5(e)). By summing the contributing bands considered in the two previous simulations, shown in Figs. 5(a) and 5(b), we generated the final set of simulated spectra (Fig. 5(c)) and, from it, the simulated mechanism of aggregation. Once again, the peak patterns (Figs. 5(f) and 5(i)) are consistent with the experimental 2D IR correlation plots observed in Figs. 2(f) and 2(i).
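The idea of simulating a temperature series from a small number of sub-bands can be sketched as follows. The band positions are taken from the text, but everything else is an assumption made for illustration: the Gaussian band shape, the band widths, the linear intensity changes, and all function names. The actual band parameters used for Fig. 5 are not reproduced here.

```python
import math


def gaussian_band(wavenumbers, center, height, fwhm):
    """Gaussian band with the given peak height and full width at half maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [height * math.exp(-0.5 * ((v - center) / sigma) ** 2)
            for v in wavenumbers]


def simulate_spectrum(wavenumbers, bands):
    """Sum Gaussian sub-bands given as (center, height, fwhm) tuples."""
    total = [0.0] * len(wavenumbers)
    for center, height, fwhm in bands:
        for i, y in enumerate(gaussian_band(wavenumbers, center, height, fwhm)):
            total[i] += y
    return total


def simulate_series(wavenumbers, temperatures, band_model):
    """band_model(T) returns the list of (center, height, fwhm) at temperature T."""
    return [simulate_spectrum(wavenumbers, band_model(t)) for t in temperatures]


def toy_band_model(temp_c):
    # Hypothetical linear intensity changes between 5 and 95 C.
    frac = (temp_c - 5.0) / 90.0
    return [
        (1643.0, 1.0 - 0.5 * frac, 16.0),  # MLT 3_10-helix, decreasing
        (1619.0, 0.6 * frac, 12.0),        # aggregation band, increasing
    ]


if __name__ == "__main__":
    # 1720 to 1540 cm^-1 in 2 cm^-1 steps, matching the data spacing in the text.
    axis = [1720.0 - 2.0 * i for i in range(91)]
    temps = [5.0 + 10.0 * i for i in range(10)]
    series = simulate_series(axis, temps, toy_band_model)
    print(len(series), "spectra of", len(series[0]), "points each")
```

A series generated this way can be passed to a 2D correlation routine to check which minimal set of bands reproduces the experimental auto and cross peak pattern.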

H. Thermal dependence plots
Thermal dependence plots (Fig. 6) were generated using the intensities for the secondary structure contribution of interest within the amide I'/I'* band (1720-1540 cm⁻¹), normalized using the largest intensity contribution within this same spectral region, shown in Fig. 2(c) for the empirical spectral data and in Fig. 5(c) for the simulated spectra. For each spectral component of interest (i.e., TFA, aggregation, and the MLT α-helical component), the intensity ratio was used to determine the fractional peak intensity. Therefore, a value of 1 is interpreted as the largest contributing peak within the amide I'/I'* contour. Determining the fractional intensities establishes the degree of change within the structural motif of the protein in relation to its overall structural components. We determined the fractional loss of MLT's helical component and, similarly, the extent of MLT aggregation within the ¹³C-Cr cen-MLT complex, shown in Figs. 6(a) and 6(b). In Fig. 6(c), we observed a slight intensity change for the TFA component. In addition, the onset of the aggregation process of MLT within the Cr cen-MLT-TFA complex was at ~45 °C (Fig. 6(d)). Finally, the thermal dependence plots for TFA (Fig. 6(e)) and the aggregation peak (Fig. 6(f)) concur on the onset temperature, 45 °C, at which the extent of MLT aggregation begins to increase.
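The normalization behind the thermal dependence plots can be sketched as below. This is an illustration under stated assumptions: a single reference intensity is used for the whole series, the onset criterion (a fixed rise above the lowest-temperature value) is hypothetical, and the intensity values are invented rather than read from Fig. 6.

```python
def fractional_intensities(intensities, reference_max):
    """Divide each band intensity by the largest intensity in the spectral region.

    A value of 1 then corresponds to the largest contributing peak.
    """
    if reference_max <= 0:
        raise ValueError("reference intensity must be positive")
    return [value / reference_max for value in intensities]


def onset_temperature(temperatures, fractions, rise=0.05):
    """Return the first temperature at which the fraction exceeds its initial
    value by more than `rise`, or None if it never does.

    The `rise` criterion is an assumption made for this sketch.
    """
    baseline = fractions[0]
    for temp, frac in zip(temperatures, fractions):
        if frac - baseline > rise:
            return temp
    return None


if __name__ == "__main__":
    # Hypothetical aggregation band intensities, for illustration only.
    temps = [5, 15, 25, 35, 45, 55, 65]
    aggregation = [0.04, 0.04, 0.04, 0.06, 0.20, 0.40, 0.60]
    fractions = fractional_intensities(aggregation, reference_max=2.0)
    print("fractions:", fractions)
    print("onset:", onset_temperature(temps, fractions))
```

Expressing every component on the same fractional scale is what allows the loss of helical content and the gain in aggregation to be compared directly across temperature.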

IV. DISCUSSION
FT-IR spectroscopy allows for the simultaneous study of two proteins and their interaction when one of the protein components is homogeneously ¹³C-labeled, causing a shift to lower wavenumbers and effectively separating the stretching modes for each protein. The magnitude of the shift is dependent on the mass of the atoms involved in the vibrational mode. 21 We were able to discern the differences in conformational dynamics of these proteins when compared to the ternary complex. In addition, the N-D Arg stretching modes of centrin account for twelve residues in its sequence, while for MLT there is only one Arg residue. TFA was interacting mainly with centrin's Arg residues and MLT's Lys residues, thus affecting the C-terminal region of the MLT peptide. Also, the 2D IR correlation analysis for the components revealed the intrinsic flexibility of MLT in the presence of TFA to be within the β-sheet region (1631.3 cm⁻¹), while for centrin it was determined to be the short β-sheet (1591.8 cm⁻¹) and the hinge loops (1636 cm⁻¹) within the EF-hand domains. In contrast, TFA plays a role in the stability of the centrin protein by increasing its helical contributions through interaction with centrin's arginine residues in the ternary complex. The aggregation of MLT involves the helical region newly adopted by the peptide due to its interaction with centrin at low temperatures. As the temperature increases, TFA's role in further perturbing both centrin and MLT continues to elicit MLT's aggregation because the peptide is not able to interact with centrin. The MLT-centrin interaction is being inhibited by the presence of TFA. Direct molecular evidence has been presented to support this mechanism via 2D IR correlation analysis at two different molar ratios of the ternary complex. We also performed spectral simulations to further define the minimal requirements for the aggregation process to occur.
This novel approach has proven vital to the understanding of complex interactions and the mechanism of aggregation involving side chain interactions as the key player. In the end, the flexibility of these protein components within the ternary complex was key to determining the mechanism of aggregation and how TFA played a crucial role in the aggregation. More importantly, the aggregation fraction increased with a concomitant decrease in MLT's helical content, as shown for the ¹³C-Cr cen-MLT-TFA complex (1:2:20, molar ratio).
In recent years, a tendency among spectroscopists has been to subtract the characteristic peak assigned to TFA at 1673 cm⁻¹. The alleged justification is that the physical removal of this excipient is time consuming and that TFA has no effect on the synthetic peptide. Our goal was to provide empirical evidence that disproves this assumption and shows that it may lead to erroneous molecular biophysical conclusions. Finally, circular dichroism does not allow for such a study to be performed because only electronic transitions are monitored, without the possibility of discerning the contribution of the protein components by using isotope labeling as presented herein.
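The mass dependence of the isotope shift noted in this discussion can be estimated with a simple harmonic oscillator model, in which the wavenumber scales with the inverse square root of the reduced mass. The sketch below treats the carbonyl as an isolated diatomic C=O oscillator, which is an assumption: the real amide I' mode is coupled to other motions, so the observed shift differs from this estimate. The function names and the 1650 cm⁻¹ starting value are illustrative.

```python
import math

# Approximate isotopic masses in atomic mass units.
MASS = {"12C": 12.0, "13C": 13.00335, "16O": 15.99491}


def reduced_mass(m1, m2):
    """Reduced mass of a two-body oscillator."""
    return m1 * m2 / (m1 + m2)


def shifted_wavenumber(nu_cm1, light="12C", heavy="13C", partner="16O"):
    """Estimate the band position after isotopic substitution.

    Harmonic diatomic approximation: nu is proportional to sqrt(1 / mu).
    """
    mu_light = reduced_mass(MASS[light], MASS[partner])
    mu_heavy = reduced_mass(MASS[heavy], MASS[partner])
    return nu_cm1 * math.sqrt(mu_light / mu_heavy)


if __name__ == "__main__":
    unlabeled = 1650.0  # illustrative amide I' position, not a fitted value
    labeled = shifted_wavenumber(unlabeled)
    print(f"{unlabeled:.1f} -> {labeled:.1f} cm^-1")
```

Even this crude model predicts a downshift of a few tens of wavenumbers, which is the order of separation that lets the labeled protein and the unlabeled peptide be followed in the same spectrum.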

V. CONCLUSIONS
This approach is essential to understanding the role of excipients in proteins and their potential role in protein aggregation. Moreover, vibrational spectroscopy, 2D IR correlation spectroscopy, and the associated spectral simulations provided a complete molecular description of the aggregation mechanism and the extent of aggregation in a highly complex sample. Although we employed ¹³C isotope labeling in this study, we envision the application of 2D IR correlation spectroscopy to evaluate protein-peptide complexes in general. The pure components and mixtures can be evaluated under varying formulation conditions to screen for the presence of aggregate species. This type of evaluation would aid in ensuring the use of high-quality proteins and peptides for crystallization.