Rapid and Highly Efficient Separation of i-Motif DNA Species by CE-UV and Multivariate Curve Resolution

The i-motif is a class of nonstandard DNA structure with potential biological implications. A novel capillary electrophoresis method with ultraviolet absorption spectrophotometric detection (CE-UV) has been developed for the rapid analysis of the i-motif folding equilibrium as a function of pH and temperature. The electrophoretic analyses are performed in reverse polarity with 32 cm long fused silica capillaries permanently coated with hydroxypropyl cellulose (HPC), using an appropriate conditioning procedure to achieve good repeatability. However, the electrophoretic separation between the folded and unfolded conformers of the studied cytosine-rich i-motif sequences (i.e., TT, Py39WT, and nmyc01) is compromised, especially for Py39WT and nmyc01, which yield completely overlapping peaks. Therefore, deconvolution with multivariate curve resolution-alternating least-squares (MCR-ALS) was required to efficiently resolve the folded and unfolded species present at different concentration levels at pH 6.5 and between 12 and 40 °C, taking advantage of the small differences in their electrophoretic mobilities and UV spectra. MCR-ALS also provided quantitative information that was used to estimate melting temperatures (Tm), which are similar to those determined by UV and circular dichroism (CD) spectroscopies. The results demonstrate that CE-UV assisted by MCR-ALS may become a very useful tool to gain novel insight into the folding of i-motifs and other complex DNA structures.


Section S1. Procedures to prepare HPC coated capillary and CE-UV analysis
A 5% (m/v) HPC solution was prepared by heating and sonicating for 1 hour at 40 °C. It was then kept overnight at room temperature to eliminate bubbles and filtered through a 0.22 µm filter before use. A 100 cm long piece of bare fused silica capillary was flushed at 930 mbar using the CE instrument with methanol (10 min), 1 M KOH (10 min), water (10 min), 1 M HCl (10 min), and HPC solution (1 h). Afterwards, about 2 cm was cut from both ends of the capillary to prevent clogging, and the capillary was dried with N₂ gas at 1.5 bar using a set-up consisting of a Kitasato flask connected to a N₂ source and closed with a rubber cap secured with a clamp. Most of the capillary was outside the Kitasato flask so that it could be placed inside a home convection oven (Moulinex OX4448 Optimo, Groupe SEB, Lyon, France). The oven was modified to accommodate the capillary while leaving its outlet end outside, immersed in a vial with water, so that N₂ bubbles could be observed, indicating that there was no clogging. After observing bubbles for 10 min, the oven temperature was set and maintained at 140 °C for 1 hour while N₂ was flowing. The temperature was monitored externally with a K-type thermocouple thermometer (Proster TL253, Shenzhen, China). Temperatures above 160 °C must be avoided to prevent polymer degradation. After completing the first coating layer from the inlet end of the capillary, a second layer was coated from the outlet end. To prepare this second layer, the capillary was flushed at 930 mbar from the outlet end using the CE instrument with HPC solution (1 h), and the N₂ flushing and thermal stabilization procedure was repeated. Once the second layer was completed, the capillary segments that had been outside the oven during heating were cut off. Under optimized conditions, capillaries with a total length (L_T) of 32 cm were used for CE-UV experiments. The UV detection window was made at 8.5 cm from the outlet (23.5 cm effective length, L_D), using a scalpel to remove the external polyimide coating in order to avoid damaging the internal HPC coating.
All capillary flushes in CE-UV were performed at 930 mbar. HPC capillaries were conditioned before each analysis with water (10 min), 0.1 M NH₄OH (10 min), water (10 min), and BGE (10 min). Additionally, before the first analysis, new capillaries were equilibrated by applying the separation voltage for 10 min. A separation voltage of 10 kV (reverse polarity, anode at the outlet) was selected to ensure that Ohm's law was fulfilled and that heat was dissipated appropriately (Ohm's law was not fulfilled above 15 kV). All separations were conducted at this voltage and at temperatures from 12 to 40 °C, with the autosampler kept at the same temperature. This was especially important to avoid large temperature mismatches between sample and BGE when working at 12 °C. Note that at all the studied temperatures and with the different BGEs, the electric current was around 50 µA at 10 kV, which could slightly increase the effective temperature inside the capillary (the input power was 1.56 W/m in the 75/375 µm i.d./o.d. capillary), as suggested by Solinova et al.
(Solinova, V.; Kasicka, V. Electrophoresis 2013, 34, 2655-2665). The pressure and time of the hydrodynamic injection were adjusted to inject the same sample volume (~25 nL) at the different temperatures, taking into account the changes in viscosity and the Hagen-Poiseuille equation [18]. The DNA samples in the different BGEs were injected at 40 mbar for 3 s (12 °C), 35 mbar for 3 s (20 °C), 30 mbar for 3 s (30 °C), and 25 mbar for 3 s (40 °C). Electropherograms at 254 nm were monitored during the experiments, while UV spectra were also recorded from 190 to 400 nm. The BGE in the vials used for voltage application was refreshed after every analysis. All experiments were repeated at least three times. All samples and solutions were filtered through 0.22 µm filters before use. For overnight storage, the capillary was flushed with water (10 min), 0.1 M NH₄OH (10 min), and water (10 min) to prevent coating damage and to avoid salt buildup inside the capillary, the prepunchers, and the electrodes. Both ends of the capillary were immersed in vials with water during storage to prevent drying. For long-term storage, the capillary was washed in the same way and dried with air.
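The injection settings and the quoted power per unit length can be checked with a few lines of arithmetic. The sketch below applies the Hagen-Poiseuille equation, V = ΔP·π·r⁴·t / (8·η·L_T), to the reported pressures and times, and computes P/L = V·I / L_T. It assumes that the viscosity of the BGE can be approximated by literature values for pure water at each temperature; these viscosities are not given in the text.

```python
import math

# Reported injection settings (Section S1): pressure in mbar, 3 s at every temperature
INJECTION_MBAR = {12: 40, 20: 35, 30: 30, 40: 25}
INJECTION_TIME_S = 3.0

# ASSUMPTION: approximate viscosity of pure water (mPa·s), used as a proxy for the BGE
WATER_VISCOSITY_MPA_S = {12: 1.236, 20: 1.0016, 30: 0.7972, 40: 0.6527}

INNER_DIAMETER_M = 75e-6   # 75 µm i.d. capillary
TOTAL_LENGTH_M = 0.32      # L_T = 32 cm


def injected_volume_nl(pressure_mbar, time_s, viscosity_mpa_s,
                       inner_diameter_m=INNER_DIAMETER_M,
                       length_m=TOTAL_LENGTH_M):
    """Hagen-Poiseuille: V = dP * pi * r^4 * t / (8 * eta * L), returned in nL."""
    delta_p_pa = pressure_mbar * 100.0        # 1 mbar = 100 Pa
    eta_pa_s = viscosity_mpa_s * 1e-3         # 1 mPa·s = 1e-3 Pa·s
    radius_m = inner_diameter_m / 2.0
    volume_m3 = (delta_p_pa * math.pi * radius_m ** 4 * time_s
                 / (8.0 * eta_pa_s * length_m))
    return volume_m3 * 1e12                   # 1 m^3 = 1e12 nL


def power_per_length_w_per_m(voltage_v, current_a, length_m=TOTAL_LENGTH_M):
    """Joule power dissipated per unit capillary length: P/L = V * I / L_T."""
    return voltage_v * current_a / length_m


if __name__ == "__main__":
    for temp_c, p_mbar in INJECTION_MBAR.items():
        vol = injected_volume_nl(p_mbar, INJECTION_TIME_S, WATER_VISCOSITY_MPA_S[temp_c])
        print(f"{temp_c} °C, {p_mbar} mbar x {INJECTION_TIME_S:.0f} s: {vol:.1f} nL")
    print(f"P/L at 10 kV and 50 µA: {power_per_length_w_per_m(10e3, 50e-6):.2f} W/m")
```

With these assumed viscosities, the four settings give roughly 23.6 to 27.9 nL, which is close to the ~25 nL target. The power per unit length reproduces the 1.56 W/m quoted above.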

Section S2. Application of MCR-ALS to analyze i-motif CE-UV data
The most common way to present CE results is by means of single-wavelength electropherograms (e.g., 254 or 260 nm for DNA). Single-wavelength raw electropherograms were converted to comma-separated values (csv) format with a built-in option of the ChemStation software and then imported into Excel 2019 (Microsoft Inc., Redmond, WA, USA) for graphical representation and Gaussian fitting using the Solver add-in. In addition, modern commercial CE instruments with a UV absorption diode array detector (DAD) can simultaneously measure absorbance at more than one wavelength during the separations (e.g., from 190 to 400 nm, as in this study), producing multiwavelength electropherograms (i.e., multivariate data).
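The Gaussian fitting was done with the Excel Solver add-in. The sketch below shows an equivalent least-squares fit of two overlapped Gaussian peaks using SciPy's `curve_fit`. It is an alternative implementation, not the authors' workbook, and the electropherogram, peak parameters, and initial guesses are all made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def two_gaussians(t, a1, mu1, s1, a2, mu2, s2):
    """Sum of two Gaussian peaks: heights a, apex times mu, standard deviations s."""
    return (a1 * np.exp(-0.5 * ((t - mu1) / s1) ** 2)
            + a2 * np.exp(-0.5 * ((t - mu2) / s2) ** 2))


def gaussian_area(a, s):
    """Area of a Gaussian peak of height a and standard deviation s."""
    return a * abs(s) * np.sqrt(2.0 * np.pi)


# Hypothetical, noise-free single-wavelength electropherogram with two overlapped peaks
t = np.linspace(4.0, 8.0, 801)                   # migration time (min)
TRUE_PARAMS = (10.0, 5.6, 0.12, 6.0, 5.9, 0.15)
signal = two_gaussians(t, *TRUE_PARAMS)

# Rough initial guesses, analogous to the starting cell values given to Solver
p0 = (9.0, 5.55, 0.10, 5.0, 5.95, 0.13)
popt, _ = curve_fit(two_gaussians, t, signal, p0=p0)

for label, (a, mu, s) in (("peak 1", popt[:3]), ("peak 2", popt[3:])):
    print(f"{label}: apex {mu:.3f} min, area {gaussian_area(a, s):.4f}")
```

As with Solver, the result depends on reasonable starting values. Completely overlapped peaks, such as those of Py39WT and nmyc01, cannot be resolved reliably from a single wavelength, and this limitation motivates the multiwavelength approach that follows.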
Multivariate data analysis was applied in this study to resolve peaks from i-motif species that were not completely separated by CE. Multiwavelength CE-UV raw electropherograms were converted to csv format using a macro available with the ChemStation software and then imported into the MATLAB environment (MATLAB R2016a, The MathWorks Inc., Natick, MA, USA). Each electropherogram yielded a matrix of absorbance values with m rows (i.e., the m times at which absorbance was measured) and n columns (i.e., the n wavelengths at which absorbance was measured). From the analysis of this matrix, it was possible to determine the number of species or components present, as well as their concentration profiles (i.e., pure migration profiles: absorbance vs. time) and pure spectra (absorbance vs. wavelength). The components could then be quantified from the concentration profiles and identified from the pure spectra. To achieve a good mathematical resolution of these components, their pure spectra should be as different as possible. For components with completely overlapping pure spectra, mathematical resolution may become impossible.
Mathematically, this decomposition can be written as:

$$D = CS + E$$

where D is the matrix of the multiwavelength electropherogram (m × n), C is the matrix containing the concentration profiles of each component (m × Nc, where Nc is the number of components), S is the matrix containing the pure spectrum of each component (Nc × n), and E contains the data not explained by the model (m × n, which should be close to random noise). This decomposition can be accomplished by using MCR-ALS.
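The shapes involved in the bilinear model D = CS + E can be illustrated with random matrices. This is a minimal sketch: the sizes, the random C and S, and the noise level are all hypothetical. The lack-of-fit formula shown is the relative residual commonly reported for MCR-ALS models.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, nc = 300, 101, 2          # m times, n wavelengths, Nc components (illustrative sizes)
C = rng.random((m, nc))         # hypothetical concentration profiles, m x Nc
S = rng.random((nc, n))         # hypothetical pure spectra, Nc x n
E = 1e-3 * rng.standard_normal((m, n))  # small random noise, m x n

D = C @ S + E                   # bilinear model D = CS + E, m x n


def lack_of_fit_percent(D, D_model):
    """Relative lack of fit (%): 100 * sqrt(sum of squared residuals / sum of squared data)."""
    residuals = D - D_model
    return 100.0 * np.sqrt(np.sum(residuals ** 2) / np.sum(D ** 2))


print(D.shape, f"lack of fit = {lack_of_fit_percent(D, C @ S):.3f} %")
```

When the model is correct, the residual D − CS contains only the noise E, so the lack of fit stays small.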
In this study, MCR-ALS was applied to the simultaneous analysis of the multiwavelength electropherograms recorded for TT, Py39WT, and nmyc01 at pH 6.5 and at temperatures from 12 to 40 °C. As required for this multivariate analysis, the pure spectra of the Nc species (in this case, the folded and unfolded conformations) are assumed to be invariant over the studied temperature range. Therefore, every spectrum measured in these analyses is considered a linear combination of the pure spectra of the different conformations. This hypothesis has been shown to be valid for multivariate data recorded in spectroscopically monitored melting experiments.
The scheme of the simultaneous analysis is given in Figure S1. Analyzing a column-wise augmented matrix yields an augmented matrix containing the concentration profiles of the folded and unfolded species at the four temperatures considered (C_12, C_20, C_30, and C_40), as well as a single matrix S containing their pure spectra.
In addition, an augmented matrix E containing the data not explained by the model is obtained. Once the concentration profiles of the species are obtained, their relative concentrations can be estimated from the ratios of the areas of the peaks in these concentration profiles. MCR-ALS analysis was carried out following standard procedures for the determination of the number of components (singular value decomposition, SVD) and of the initial estimates (simple-to-use interactive self-modelling mixture analysis, SIMPLISMA). ALS optimization was performed under non-negativity constraints for the concentration and spectral profiles, with spectral normalization (equal length).
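Column-wise augmentation and the SVD step can be sketched with NumPy. Electropherograms recorded at different temperatures share the wavelength axis, so they are stacked one on top of another and analyzed together with a single S. All profiles, spectra, and folded fractions below are hypothetical. The rank rule (count the singular values above an approximate noise ceiling) is one simple heuristic and not necessarily the criterion the authors applied.

```python
import numpy as np

rng = np.random.default_rng(1)
wavelengths = np.arange(220, 321)                       # 101 wavelengths, 1 nm step


def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)


# Hypothetical pure spectra (Nc x n), shared by every temperature
S_true = np.vstack([gaussian(wavelengths, 260, 15), gaussian(wavelengths, 275, 20)])

# Hypothetical electropherograms at four temperatures (the number of time points may differ)
blocks = []
for temp_c, n_times, folded in [(12, 400, 0.9), (20, 380, 0.7), (30, 420, 0.4), (40, 400, 0.15)]:
    t = np.arange(n_times)
    C_k = np.column_stack([folded * gaussian(t, 180, 20),
                           (1 - folded) * gaussian(t, 220, 25)])
    blocks.append(C_k @ S_true + 1e-3 * rng.standard_normal((n_times, wavelengths.size)))


def augment_columnwise(matrices):
    """Column-wise augmentation: stack matrices that share the same wavelength axis."""
    n_cols = matrices[0].shape[1]
    if any(Dk.shape[1] != n_cols for Dk in matrices):
        raise ValueError("all electropherograms must share the same wavelength axis")
    return np.vstack(matrices)


def estimate_n_components(D, noise_sd, safety=2.0):
    """Count singular values above a noise threshold.

    ASSUMPTION: threshold = safety * noise_sd * (sqrt(m) + sqrt(n)), which approximates
    the largest singular value of an m x n matrix of i.i.d. noise.
    """
    sv = np.linalg.svd(D, compute_uv=False)
    m, n = D.shape
    threshold = safety * noise_sd * (np.sqrt(m) + np.sqrt(n))
    return int(np.sum(sv > threshold)), sv


D_aug = augment_columnwise(blocks)
n_components, sv = estimate_n_components(D_aug, noise_sd=1e-3)
print(D_aug.shape, n_components, np.round(sv[:4], 3))
```

In this simulated case, two singular values stand well above the noise ceiling, which corresponds to the folded and unfolded species.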

Section S3. Validation of MCR-ALS to analyze i-motif CE-UV data
Before analyzing the experimental CE-UV data with MCR-ALS, the performance of this chemometric method was tested. This was done by analyzing an augmented data matrix D constructed from a set of simulated concentration profiles of two species (C_o) and their corresponding pure spectra (S_o). If MCR-ALS performs correctly on CE-UV data, the concentration profiles (C) and pure spectra (S) calculated with this methodology should match the concentration profiles and pure spectra used to construct the data matrix D.
This validation procedure is presented here in three main steps.
Step 1. Construction of the augmented data matrix (D) from the simulated concentration profiles (C_o) and pure spectra (S_o).
To construct the augmented data matrix D, the following steps were followed:
1. The data set was defined (Figure S2).
2. The number of species or components (Nc) was defined. In this case, only two components were assumed to be present (Nc = 2) in all experiments. These components corresponded to hypothetical "folded" (blue in the following figures) and "unfolded" (red) species. The ratio of the concentrations of these two species was temperature dependent: at low temperature the "folded" species was the major one, whereas the "unfolded" species predominated at higher temperatures.
4. A matrix S_o containing the pure spectra of the "folded" and "unfolded" species was constructed from two experimentally measured UV spectra. The dimensions of S_o were 2 (rows, i.e., the number of species) × 101 (columns, i.e., 101 wavelengths from 220 to 320 nm with a 1 nm step). The matrix S_o is represented graphically in Figure S4.

Step 2. Analysis with MCR-ALS of the augmented data matrix D.
When applied to experimental data, the goal of MCR-ALS is to determine the number of components or species present in the mixture and to calculate their concentration profiles and pure spectra. The calculated concentration profiles may allow quantification, and the calculated pure spectra may provide qualitative information on the nature of the species. For a general data matrix, the decomposition is:

$$D = CS + E$$

where E contains the data not explained by the model and should be close to random noise.
The decomposition of the augmented data matrix D is represented graphically in Figure S7. In the validation procedure included here, the concentration profiles in matrix C and the pure spectra in matrix S calculated with MCR-ALS are expected to be very similar to those in matrices C_o and S_o, respectively, which were used to construct matrix D.
The following steps were used to decompose the matrix D: 1. First, an estimate of the purest spectra was obtained by means of the SIMPLISMA method.
2. Second, an iterative process was started in which matrices C and S were calculated in alternating steps. During this optimization, constraints such as the non-negativity of absorptivities in S and of concentrations in C were applied. Without constraints, infinitely many pairs of matrices C and S could explain the data in matrix D equally well.
3. The process was stopped when a maximum number of iterations was reached or when a convergence criterion was fulfilled.
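The three steps above can be sketched in NumPy on simulated data, in the spirit of the validation procedure. This is not the MATLAB MCR-ALS toolbox used in the study, and several simplifications are assumptions. Initial spectra are taken from the rows of D at the two peak apexes, a hand-picked stand-in for SIMPLISMA. Non-negativity is imposed by zeroing negative values after each unconstrained least-squares step rather than with a true non-negative least-squares solver. Each spectrum is normalized to unit length, and C is rescaled so that the product CS is unchanged.

```python
import numpy as np


def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)


def mcr_als(D, S_init, max_iter=500, tol=1e-12):
    """Minimal MCR-ALS sketch: alternate least-squares estimates of C and S for D ≈ C S.

    ASSUMPTIONS: negatives are set to zero after each step (approximate non-negativity),
    and spectra are normalized to equal (unit Euclidean) length.
    """
    S = S_init / np.linalg.norm(S_init, axis=1, keepdims=True)
    lof_prev = np.inf
    for iteration in range(1, max_iter + 1):
        C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T     # solve C S = D for C
        C = np.clip(C, 0.0, None)
        S = np.linalg.lstsq(C, D, rcond=None)[0]           # solve C S = D for S
        S = np.clip(S, 0.0, None)
        norms = np.linalg.norm(S, axis=1)
        norms[norms == 0.0] = 1.0                          # avoid dividing by zero
        S = S / norms[:, None]                             # equal-length spectra
        C = C * norms[None, :]                             # keep C @ S unchanged
        residuals = D - C @ S
        lof = 100.0 * np.sqrt(np.sum(residuals ** 2) / np.sum(D ** 2))
        if abs(lof_prev - lof) < tol:                      # convergence criterion
            break
        lof_prev = lof
    return C, S, lof, iteration


# Simulated validation data (hypothetical shapes; the study used measured UV spectra)
rng = np.random.default_rng(2)
t = np.arange(350)
wl = np.arange(220, 321)
C_o = np.column_stack([gaussian(t, 150, 15), 0.6 * gaussian(t, 200, 20)])
S_o = np.vstack([gaussian(wl, 255, 12), gaussian(wl, 275, 15)])
D = C_o @ S_o + 1e-4 * rng.standard_normal((t.size, wl.size))

# Initial spectra: rows of D at the two peak apexes (hand-picked stand-in for SIMPLISMA)
S_init = np.clip(D[[150, 200], :], 0.0, None)

C, S, lof, n_iter = mcr_als(D, S_init)
print(f"lack of fit {lof:.3f} % after {n_iter} iterations")
```

In this simulated case, each component has a selective time region, so the non-negativity constraint is enough to recover spectra that closely match S_o. Without such selectivity, rotational ambiguity can remain even under constraints.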
When analyzing experimental data, MCR-ALS allows the analytical (or absolute) concentration of a species i present in a mixture (c_i) to be calculated by comparison with the corresponding standard (c_std). More precisely, quantitation is done using this equation:

$$c_i = c_{std}\,\frac{A_i}{A_{std}}$$

where A_i and A_std are the areas obtained from the concentration profiles of species i and of the standard, respectively.
Given the working temperature limitations of the CE instrument used in this work, it was not possible to analyze samples in which only the folded or only the unfolded species was present; hence, standards were not available for absolute quantification. As an alternative, a relative quantification was done based on the following equation:

$$x_i = \frac{A_i}{A_{folded} + A_{unfolded}}$$

Three approaches based on this equation were investigated. In all of them, the areas were calculated from the MCR-ALS concentration profiles obtained with the non-negativity constraint for absorbances and concentrations. The approaches differed in the treatment of the pure spectra in S: no normalization, normalization to equal height, or normalization to equal length. The resulting concentration profiles and pure spectra calculated in each case with MCR-ALS are shown in Figure S8.

Table S1. Values of the area ratios for the C_o (simulated) and C (calculated) concentration profiles, with no normalization, normalization to equal height, and normalization to equal length.
The best recovery of the areas was obtained when normalization was applied, with slightly better recovery for normalization to equal length.
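Normalization matters because MCR-ALS determines C and S only up to a scale factor per component. Rescaling a spectrum changes the area of the matching concentration profile, and therefore the area ratio. The sketch below uses hypothetical profiles and spectra to implement the three normalization options and the area-based quantification. The relative form x_i = A_i / Σ_j A_j and the absolute form c_i = c_std·A_i/A_std are assumed here, and areas are computed with the trapezoidal rule.

```python
import numpy as np


def gaussian(x, center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)


def normalize_spectra(C, S, mode="length"):
    """Rescale the pure spectra (rows of S) and compensate C so that C @ S is unchanged.

    mode: "none", "height" (maximum of each spectrum = 1) or "length" (Euclidean norm = 1).
    """
    if mode == "none":
        return C.copy(), S.copy()
    if mode == "height":
        scale = S.max(axis=1)
    elif mode == "length":
        scale = np.linalg.norm(S, axis=1)
    else:
        raise ValueError(f"unknown normalization mode: {mode}")
    return C * scale[None, :], S / scale[:, None]


def peak_areas(C, t):
    """Area under each concentration profile (columns of C), trapezoidal rule."""
    dt = np.diff(t)[:, None]
    return np.sum(0.5 * (C[1:] + C[:-1]) * dt, axis=0)


def relative_fractions(areas):
    """Relative quantification (assumed form): x_i = A_i / sum_j A_j."""
    return areas / areas.sum()


def absolute_concentration(area_i, area_std, conc_std):
    """Absolute quantification against a standard: c_i = c_std * A_i / A_std."""
    return conc_std * area_i / area_std


# Hypothetical profiles and spectra; S has arbitrary (unequal) scales
t = np.linspace(0.0, 10.0, 1001)
wl = np.arange(220, 321)
C = np.column_stack([gaussian(t, 4.0, 0.3), 0.5 * gaussian(t, 5.0, 0.4)])
S = np.vstack([2.0 * gaussian(wl, 260, 15), gaussian(wl, 275, 20)])

for mode in ("none", "height", "length"):
    C_n, S_n = normalize_spectra(C, S, mode)
    print(mode, np.round(relative_fractions(peak_areas(C_n, t)), 3))
```

The product CS, and therefore the fit to D, is identical for all three options. Only the partition of the signal between C and S changes, and with it the area ratios.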
Overall, MCR-ALS analysis of CE-UV data provided good recovery of the concentration profiles and pure spectra used in the simulation. The best results, in terms of area recovery, were obtained with normalization to equal length, and good recovery of the pure spectra was achieved with all three approaches. Accordingly, the procedure was considered validated, and normalization to equal length was applied in the subsequent analysis of the experimental data shown in the manuscript.

Section S4. Spectroscopically monitored melting experiments
CD and UV absorption spectroscopies were used to investigate the unfolding of the i-motifs formed by TT, Py39WT, and nmyc01 at pH 6.5 (Figure S9). Regarding CD spectroscopy, at pH 6.5 and 5 °C, all three sequences showed the characteristic features of i-motif structures, namely positive bands at 225 and 285 nm and a negative band at 265 nm. Upon heating, the intensity of the bands decreased rapidly, which was related to the unfolding of the i-motif structure. At temperatures higher than 50 °C, the CD spectra showed features that could be related to partially folded strands. The melting temperatures (Tm) were determined from the ellipticity curves at 285 nm (see the graph insets).
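Reading Tm from a melting curve can be sketched as follows, assuming a unimolecular two-state transition and flat pre- and post-transition baselines. The curve is normalized to a folded fraction using its first and last points, and Tm is taken as the temperature at which that fraction equals 0.5. The transition parameters and ellipticity baselines are hypothetical, and the authors' exact fitting procedure is not specified here.

```python
import numpy as np

R_CAL = 1.98720  # gas constant, cal·K⁻¹·mol⁻¹


def two_state_fraction_folded(T_k, dH_unfold_cal, dS_unfold_cal):
    """Fraction folded for a unimolecular two-state transition F <-> U."""
    K_unfold = np.exp(-(dH_unfold_cal - T_k * dS_unfold_cal) / (R_CAL * T_k))
    return 1.0 / (1.0 + K_unfold)


def tm_from_melting_curve(T_k, signal):
    """Tm as the temperature at which the normalized signal crosses 0.5.

    ASSUMPTION: flat baselines taken as the first and last points; the curve must be
    monotonic (it may increase or decrease with temperature).
    """
    frac = (signal - signal[-1]) / (signal[0] - signal[-1])   # 1 at low T, 0 at high T
    return float(np.interp(0.5, frac[::-1], T_k[::-1]))      # np.interp needs increasing xp


# Hypothetical transition: dH_unfold = 40 kcal/mol and Tm = 310.15 K (37 °C), so dS = dH / Tm
dH_u = 40000.0
dS_u = dH_u / 310.15
T_k = np.arange(278.15, 363.65, 1.0)        # 5 to 90 °C in 1 K steps
theta = two_state_fraction_folded(T_k, dH_u, dS_u)
ellipticity = 1.0 + 7.0 * theta             # made-up baselines: 8 mdeg folded, 1 mdeg unfolded
tm_k = tm_from_melting_curve(T_k, ellipticity)
print(f"Tm = {tm_k - 273.15:.1f} °C")
```

Sloping baselines, which are common in CD and UV melts, would require fitting linear pre- and post-transition baselines before normalization.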

In addition to the Tm values, the changes in enthalpy and entropy associated with the unfolding process were also calculated. The values obtained agreed with those determined previously (Table S2).

Table S2. Thermodynamic data for the folding of the i-motif structures calculated from spectroscopically monitored melting experiments. A two-state folding process was assumed in the calculations. Values are given as average ± standard deviation (n = 2).
Figure S11 shows the scheme of the 3'E (left) to 5'E (right) conformational equilibrium in the TT sequence. Bases other than cytosines have not been included in the diagram for the sake of simplicity. The area of each conformer was measured from the resolved concentration profile of the folded species (Figure 4b in the main text, blue). Table S3 shows the equilibrium constants calculated from the areas of each conformer (given as relative areas). From the slope and intercept of ln(Keq) vs. 1/T, the changes in enthalpy and entropy that characterize this equilibrium were determined (-6.0 kcal·mol⁻¹ and -19.5 cal·K⁻¹·mol⁻¹, respectively) (Figure S12). Note that, given the short temperature range, ΔH and ΔS are assumed to be temperature independent (i.e., the heat capacity change is negligible), as required for a linear ln(Keq) vs. 1/T relationship. The value calculated at 40 °C did not fit the model, probably because the areas of the 5'E and 3'E peaks were determined with high uncertainty at this temperature, as both peaks are very small (Figure 4b in the main text, blue). Comput.Chem, 2011, 32, 170-173).
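The van 't Hoff treatment described above can be sketched as a linear fit of ln(Keq) against 1/T. The slope gives -ΔH/R and the intercept gives ΔS/R. Because Table S3 is not reproduced here, the constants below are synthetic. They are generated from the reported ΔH = -6.0 kcal·mol⁻¹ and ΔS = -19.5 cal·K⁻¹·mol⁻¹ at 12, 20, and 30 °C, so the example only demonstrates the calculation and does not re-analyze the measured data.

```python
import numpy as np

R_CAL = 1.98720  # gas constant, cal·K⁻¹·mol⁻¹


def vant_hoff_fit(T_k, K_eq):
    """Linear van 't Hoff fit: ln K = -dH/(R T) + dS/R.

    Returns dH (kcal/mol) and dS (cal/K/mol); assumes both are temperature independent.
    """
    slope, intercept = np.polyfit(1.0 / np.asarray(T_k), np.log(K_eq), 1)
    return -slope * R_CAL / 1000.0, intercept * R_CAL


def equilibrium_constant(T_k, dH_kcal, dS_cal):
    """K = exp(-(dH - T dS) / (R T)), with dH in kcal/mol and dS in cal/K/mol."""
    return np.exp(-(dH_kcal * 1000.0 - T_k * dS_cal) / (R_CAL * T_k))


# Synthetic (not measured) constants generated from the reported dH and dS
T_k = np.array([12.0, 20.0, 30.0]) + 273.15
K_synthetic = equilibrium_constant(T_k, -6.0, -19.5)
dH, dS = vant_hoff_fit(T_k, K_synthetic)
print(f"dH = {dH:.2f} kcal/mol, dS = {dS:.1f} cal/K/mol")
print(f"predicted Keq at 288 K: {equilibrium_constant(288.0, dH, dS):.2f}")
```

With the reported values, this relationship predicts a Keq of about 1.96 at 288 K, which is of the same order as the value of 1.88 derived from Lieblein et al. and discussed with Table S3.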

Figure S1. Scheme representing the simultaneous analysis of the multiwavelength electropherograms recorded for the i-motif sequences at pH 6.5 and four different temperatures (here, 12, 20, 30, and 40 °C). The red rectangle in the matrix D_12 shows how an electropherogram measured at 254 nm and 12 °C is located within the column-wise augmented matrix.
Figure S2.Schematic representation of the data set.

Figure S4 .Figure S5 .Figure S6 .
Figure S4. Graphical representation of the pure spectra of the "folded" and "unfolded" species.
5. The matrix D_12 was calculated by multiplying the simulated concentration profiles at 12 °C (C_o,12) by the pure spectra in S_o.

Figure S7.Schematic representation of the decomposition of the augmented data matrix D.

Figure S8.Concentration profiles and pure spectra calculated with MCR-ALS after applying the non-negativity constraint and no normalization, normalization to equal height, and normalization to equal length.

Figure S9. CD spectra measured along the melting experiments at pH 6.5: (a) TT, (b) Py39WT, and (c) nmyc01. Insets show the melting curves measured at 285 nm. A 2 µM DNA sample was analyzed in all cases with a BGE of 15 mM KH₂PO₄ at pH 6.5.

Figure S11.Scheme of the 3'E (left) to 5'E (right) conformational equilibrium in TT sequence

Figure S12. Graphical representation of ln(Keq) vs. 1/T for the conformational equilibrium in the TT sequence. The labels indicate the temperatures at which the equilibrium constant was calculated.

Table S3. Calculated equilibrium constants from the areas of each TT sequence conformer (the areas are given as relative areas).

From the values reported by Lieblein et al. (reference 42 in the main text), a Keq of 1.88 was calculated for this equilibrium at 288 K (15 °C). This value lies between the Keq values found in our work at 12 and 20 °C.