Hopfield Neural Network-Based Algorithm Applied to Differential Scanning Calorimetry Data for Kinetic Studies in Polymorphic Conversion

A general kinetic equation was employed in this work to simulate differential scanning calorimetry (DSC) data. Random noise was added to generate one thousand synthetic curves, which were used to evaluate the performance of the Levenberg-Marquardt (LM) algorithm and of a Hopfield neural network (HNN)-based algorithm in the fitting process. The HNN-based algorithm gave better results for two different sets of initial conditions: exact and approximate values. After this statistical analysis, DSC experimental data at three heating rates for losartan potassium, an antihypertensive drug, were fitted by the HNN method using different initial conditions to obtain the activation energy and frequency factor. Additionally, the parameters of the kinetic model were recovered accurately, showing that the conversion is a complex process, since these values do not correspond to any ideal model described in the literature.


Introduction
Differential scanning calorimetry (DSC) is an accurate technique widely used to investigate the thermal behavior of materials, and it can be applied in kinetic studies of polymorphic conversion [1]. In this experiment, the temperature is increased linearly and quantitative calorimetric information is obtained [2]. The main physicochemical properties investigated by this technique are the glass transition [3,4], the heat capacity discontinuity at the glass transition [5], purity [6], and the heats of fusion and reaction. Solid materials such as polymers and drugs have had their properties and/or polymorphic conversion kinetics extensively explored by this technique [7-9]. Kinetic studies based on DSC data assume that the process obeys the Arrhenius law. The kinetic triplet (activation energy, pre-exponential factor and kinetic model) can be determined from the general equation
$$\frac{d\alpha}{dT} = \frac{A}{\beta}\exp\left(-\frac{E_a}{RT}\right) f(\alpha) \tag{1}$$
with $\alpha$ the conversion degree, $\beta$ the heating rate, $A$ the frequency factor, $E_a$ the activation energy, $R$ the gas constant and $T$ the temperature. A general kinetic model can be considered, $f(\alpha) = \alpha^m (1 - q\alpha)^n$, in which the mechanism, determined by the parameters $m$, $n$ and $q$, describes the physics and chemistry of the process [10-12]. Traditionally, equation 1 is fitted to DSC data with the Levenberg-Marquardt (LM) algorithm to determine the kinetic triplet. Neural networks can also fit experimental data with high accuracy and reduced computational effort [13-16]. The application of artificial neural networks to filter and deconvolute calorimetric signals was first proposed by Sbirrazzuoli and Brunel [13]. In their study, synthetic DSC curves were fitted, and the error analysis was based on an objective function defined as the difference between the synthetic and the computed data.
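As a minimal sketch, equation 1 with the general model $f(\alpha) = \alpha^m(1 - q\alpha)^n$ can be written as a plain Python function. The function names `f_general` and `dalpha_dT` are hypothetical, and SI units are assumed ($T$ in K, $E_a$ in J mol⁻¹, $A$ in s⁻¹, $\beta$ in K s⁻¹).

```python
import math

R = 8.314  # gas constant, J K^-1 mol^-1


def f_general(alpha, m, n, q):
    """General kinetic model f(alpha) = alpha**m * (1 - q*alpha)**n."""
    return alpha**m * (1.0 - q * alpha) ** n


def dalpha_dT(T, alpha, A, Ea, beta, m, n, q):
    """Equation 1: d(alpha)/dT = (A/beta) * exp(-Ea/(R*T)) * f(alpha).

    Assumed units: T in K, Ea in J mol^-1, A in s^-1, beta in K s^-1.
    """
    return (A / beta) * math.exp(-Ea / (R * T)) * f_general(alpha, m, n, q)


# With m = 0 and q = 1 the general model reduces to the reaction-order model (1 - alpha)**n.
rate = dalpha_dT(400.0, 0.3, A=math.exp(18), Ea=74e3, beta=10 / 60, m=0, n=2, q=1)
```

Choosing $m = 0$, $q = 1$ recovers the reaction-order mechanism used for the synthetic data, while fractional $m$, $n$ and $q$ describe the complex mechanisms discussed later.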
In this work, a Hopfield neural network (HNN)-based algorithm is proposed to fit synthetic DSC curves and to retrieve the kinetic parameters ($\ln A$, $E_a$, $m$, $n$, $q$). This procedure has already been explored by the present research group in other works [15-20]. The performance of the network is investigated using one thousand curves with 2.5% random noise added point by point. The proposed method is also compared with the LM algorithm in terms of accuracy and of the computational time needed to determine the parameters.
After this theoretical analysis, experimental data for the polymorphic conversion of losartan potassium (LOK), an antihypertensive drug, are investigated. To treat the process as complex kinetics, i.e., to allow several mechanisms to occur at the same time, a general equation is used in the fitting procedure. This assumption is justified because the retrieved parameters do not correspond to any ideal model.

Methodology
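Synthetic DSC heat-flow curves, $dH/dt$, over a temperature interval can be simulated assuming a known transformation. For this, the activation energy $E_a$, the pre-exponential factor $k_0$, the enthalpy $\Delta H$ and a mechanism $f(\alpha)$ are chosen and used in the kinetic equation
$$\frac{d\alpha_i}{dt} = k(T_i)\, f(\alpha_i) \tag{2}$$
in which $\alpha_i = H_i/\Delta H$ and $H_i$ is the partial area calculated at time $i$. The rate constant is given by the Arrhenius law, $k(T) = k_0\exp(-E_a/RT)$, with $R$ the gas constant (8.314 J K⁻¹ mol⁻¹).
Making the appropriate substitutions, Sbirrazzuoli [14,21] showed that a mechanism for homogeneous kinetic processes of the form $f(\alpha_i) = (1 - \alpha_i)^n$ is appropriate to describe a large number of DSC curves, so that
$$\frac{dH_i}{dt} = \Delta H\,\frac{d\alpha_i}{dt} = \Delta H\, k_0\exp\left(-\frac{E_a}{RT_i}\right)(1 - \alpha_i)^n \tag{3}$$
Thus, to generate synthetic DSC data it is first necessary to compute $\alpha_i$. For a linear heating program, $T(t) = T_0 + \beta t$, separating variables in equation 2 with this mechanism gives
$$\int_0^{\alpha_i}\frac{d\alpha}{(1-\alpha)^n} = \int_{t_0}^{t_i} k_0\exp\left(-\frac{E_a}{RT(t)}\right)dt \tag{4}$$
Defining
$$\Phi_i = \frac{k_0}{\beta}\int_{T_0}^{T_i}\exp\left(-\frac{E_a}{RT}\right)dT \tag{5}$$
one obtains, for $n \neq 1$,
$$\alpha_i = 1 - \left[1 + (n-1)\Phi_i\right]^{1/(1-n)} \tag{6}$$
The simulated DSC data are obtained from this result through equation 3, by calculating $\alpha_i$ and its time derivative. Taking the logarithm of equation 3 gives the general equation
$$\ln\frac{dH_i}{dt} = \ln(\Delta H\, k_0) - \frac{E_a}{RT_i} + n\ln(1 - \alpha_i) \tag{7}$$
This equation is treated in the present work by the LM and HNN-based algorithms to determine the reaction order $n$, the activation energy and the pre-exponential factor.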
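The simulation route described above can be sketched with NumPy: compute the Arrhenius rate constant, integrate it over temperature to get $\Phi = (1/\beta)\int_{T_0}^{T} k(T')\,dT'$, apply the closed form $\alpha = 1 - [1 + (n-1)\Phi]^{1/(1-n)}$ for the reaction-order model, and then the heat flow $dH/dt = \Delta H\,k(T)(1-\alpha)^n$. This is a sketch under stated assumptions: the temperature integral uses the trapezoidal rule on the supplied grid, conversion below the first temperature is neglected ($\alpha = 0$ at $T_0$), the 10 K min⁻¹ heating rate in the example is illustrative, and the name `synthetic_dsc` is hypothetical.

```python
import numpy as np

R = 8.314  # gas constant, J K^-1 mol^-1


def synthetic_dsc(T, beta, Ea, ln_k0, n, dH):
    """Synthetic DSC curve for f(alpha) = (1 - alpha)**n (closed form, n != 1).

    T     : increasing temperatures in K (linear heating, T = T0 + beta t)
    beta  : heating rate in K s^-1
    Ea    : activation energy in J mol^-1
    ln_k0 : natural logarithm of the pre-exponential factor (k0 in s^-1)
    dH    : total enthalpy of the transformation
    Returns (alpha, dH/dt). Assumes alpha = 0 at the first temperature.
    """
    T = np.asarray(T, dtype=float)
    k = np.exp(ln_k0 - Ea / (R * T))                 # Arrhenius law
    # Phi_i = (1/beta) * integral of k(T) dT from T0 to T_i, trapezoidal rule
    steps = 0.5 * (k[1:] + k[:-1]) * np.diff(T)
    phi = np.concatenate(([0.0], np.cumsum(steps))) / beta
    # Closed-form conversion; for n < 1 the bracket becomes negative after
    # complete conversion, so the result is only meaningful while it is > 0.
    alpha = 1.0 - (1.0 + (n - 1.0) * phi) ** (1.0 / (1.0 - n))
    dHdt = dH * k * (1.0 - alpha) ** n               # heat flow, equation 3
    return alpha, dHdt


# Synthetic-study parameters: Ea = 74 kJ mol^-1, ln k0 = 18, n = 2, 353-453 K;
# the heating rate of 10 K min^-1 is an assumption for illustration.
T = np.linspace(353.0, 453.0, 100)
alpha, dHdt = synthetic_dsc(T, beta=10.0 / 60.0, Ea=74e3, ln_k0=18.0, n=2.0, dH=77e3)
```

For $n = 2$ the closed form reduces to $\alpha = \Phi/(1 + \Phi)$, which is a convenient hand check of the implementation.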

Nonlinear least-squares optimization and statistical analyses
To evaluate the performance of the LM and HNN-based algorithms, it is necessary to establish a multi-objective error function for the problem, given by
$$E(\mathbf{w}) = \frac{1}{2}\sum_{r=1}^{M}\sum_{i=1}^{N}\left(Y_{r,i,\mathrm{exp}} - Y_{r,i,\mathrm{cal}}(\mathbf{w})\right)^2 \tag{8}$$
in which $N$ and $M$ are the numbers of points and heating rates, respectively. $Y_{r,i,\mathrm{exp}}$ is the synthetic heat flow from equation 3 and $Y_{r,i,\mathrm{cal}}$ is the recovered heat flow, both for the $r$-th heating rate at time $i$. The determination of the kinetic parameters by fitting DSC data is investigated using 1000 synthetic $dH/dt(T)$ curves for each heating rate, each with 2.5% random error incorporated point by point. This noise level was chosen because it is a common error in experimental DSC curves.
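The error function $E = \tfrac{1}{2}\sum_r\sum_i (Y_{r,i,\mathrm{exp}} - Y_{r,i,\mathrm{cal}})^2$ and the noisy replicas can be sketched as follows. The text states only that 2.5% random error is added point by point; modelling it as multiplicative Gaussian noise, $Y(1 + 0.025\,\varepsilon)$ with $\varepsilon \sim N(0, 1)$, is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def add_relative_noise(y, level=0.025, rng=None):
    """Copy of y with point-by-point noise of relative size `level`.

    Assumed noise model: y * (1 + level * eps), eps ~ N(0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    return y * (1.0 + level * rng.standard_normal(y.shape))


def multi_objective_error(y_exp, y_cal):
    """Equation 8: 1/2 * sum over heating rates r and points i of (Y_exp - Y_cal)**2.

    y_exp, y_cal : arrays of shape (M, N), M heating rates with N points each.
    """
    e = np.asarray(y_exp, dtype=float) - np.asarray(y_cal, dtype=float)
    return 0.5 * np.sum(e**2)


def noisy_replicas(y_clean, n_curves=1000, level=0.025, seed=0):
    """Stack n_curves independent noisy copies of a clean (M, N) data set."""
    rng = np.random.default_rng(seed)
    return np.stack([add_relative_noise(y_clean, level, rng) for _ in range(n_curves)])
```

A fixed seed makes the set of 1000 replicas reproducible, so the same noisy curves can be fed to both fitting algorithms.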

Levenberg-Marquardt (LM) fitting
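The LM algorithm solves the nonlinear problem by adding a first-order regularization term to the Newton iteration. Applying Newton's method to the error function,
$$E(\mathbf{w} + \delta\mathbf{w}) \approx E(\mathbf{w}) + \nabla E(\mathbf{w})^{\mathsf T}\delta\mathbf{w} + \frac{1}{2}\,\delta\mathbf{w}^{\mathsf T}\,\nabla^2 E(\mathbf{w})\,\delta\mathbf{w} \tag{9}$$
with $\mathbf{w}$ the vector of parameters ($E_a$, $k_0$ and $n$). At the minimum, the derivative of $E(\mathbf{w} + \delta\mathbf{w})$ with respect to $\delta\mathbf{w}$ vanishes, and therefore
$$\nabla^2 E(\mathbf{w})\,\delta\mathbf{w} = -\nabla E(\mathbf{w}) \tag{10}$$
Considering the error function given by equation 8, with $\mathbf{e}(\mathbf{w}) = Y_{r,i,\mathrm{exp}} - Y_{r,i,\mathrm{cal}}(\mathbf{w})$ and $\mathbf{J}$ the Jacobian matrix,
$$\nabla E(\mathbf{w}) = \mathbf{J}^{\mathsf T}\mathbf{e}(\mathbf{w}) \tag{11}$$
$$J_{kj} = \frac{\partial e_k}{\partial w_j} \tag{12}$$
The Hessian term in equation 10 can be written explicitly as
$$\nabla^2 E(\mathbf{w}) = \mathbf{J}^{\mathsf T}\mathbf{J} + \mathbf{S}(\mathbf{w}), \qquad \mathbf{S}(\mathbf{w}) = \sum_k e_k(\mathbf{w})\,\nabla^2 e_k(\mathbf{w}) \tag{13}$$
Neglecting the contribution of $\mathbf{S}(\mathbf{w})$, $\nabla^2 E \approx \mathbf{J}^{\mathsf T}\mathbf{J}$. Using this result with equations 10 and 11, the parameters can be obtained by solving
$$\mathbf{J}^{\mathsf T}\mathbf{J}\,\delta\mathbf{w} = -\mathbf{J}^{\mathsf T}\mathbf{e} \tag{14}$$
In this work, however, the term $\mathbf{S}(\mathbf{w})$ in equation 13 is replaced by a first-order regularization,
$$\left(\mathbf{J}^{\mathsf T}\mathbf{J} + \mu\mathbf{I}\right)\delta\mathbf{w} = -\mathbf{J}^{\mathsf T}\mathbf{e} \tag{15}$$
with $\mathbf{I}$ the identity matrix and $\mu$ the regularization parameter.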
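A compact LM loop that repeatedly solves $(\mathbf{J}^{\mathsf T}\mathbf{J} + \mu\mathbf{I})\,\delta\mathbf{w} = -\mathbf{J}^{\mathsf T}\mathbf{e}$ is sketched below. The forward-difference Jacobian, the rule for adapting $\mu$ (divide by 10 after a successful step, multiply by 10 otherwise) and the stopping tests are common textbook choices assumed here, not details given in the text; the function names are hypothetical.

```python
import numpy as np


def jacobian_fd(residual, w, h=1e-7):
    """Forward-difference Jacobian J_kj = d e_k / d w_j of the residual vector."""
    e0 = residual(w)
    J = np.empty((e0.size, w.size))
    for j in range(w.size):
        dw = np.zeros_like(w)
        dw[j] = h * max(1.0, abs(w[j]))
        J[:, j] = (residual(w + dw) - e0) / dw[j]
    return J


def levenberg_marquardt(residual, w0, mu=1e-3, tol=1e-12, max_iter=200):
    """Minimise E(w) = 1/2 ||e(w)||^2, with e(w) = Y_exp - Y_cal(w)."""
    w = np.asarray(w0, dtype=float)
    e = residual(w)
    E = 0.5 * e @ e
    for _ in range(max_iter):
        J = jacobian_fd(residual, w)
        g = J.T @ e                                    # gradient of E
        A = J.T @ J + mu * np.eye(w.size)              # regularised Gauss-Newton matrix
        dw = np.linalg.solve(A, -g)
        e_new = residual(w + dw)
        E_new = 0.5 * e_new @ e_new
        if E_new < E:          # accept: move towards Gauss-Newton behaviour
            w, e, E = w + dw, e_new, E_new
            mu = max(mu / 10.0, 1e-12)
            if np.linalg.norm(dw) < tol * (np.linalg.norm(w) + tol):
                break
        else:                  # reject: move towards gradient-descent behaviour
            mu *= 10.0
            if mu > 1e12:
                break
    return w


# Usage on a toy problem (not DSC data): y = a * exp(-b x) with a = 2.0, b = 1.3
x = np.linspace(0.0, 4.0, 50)
y = 2.0 * np.exp(-1.3 * x)
w_fit = levenberg_marquardt(lambda w: y - w[0] * np.exp(-w[1] * x), [1.0, 1.0])
```

Small $\mu$ makes the step close to the Gauss-Newton step; large $\mu$ makes it a short step along the negative gradient, which is what keeps LM stable far from the minimum.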

Hopfield neural network (HNN) fitting
The HNN is a recurrent single-layer network in which all units are connected. Neurons $i$ and $j$ are connected by a weight factor $T_{ij}$. The state of a neuron, $u_i$, is determined by a weighted sum over all neurons connected to it and by an external input $I_i$:
$$\frac{du_i}{dt} = \mu_i\left[\sum_j T_{ij}\, f(u_j(t)) + I_i\right] \tag{16}$$
This equation represents the HNN with learning rate $\mu_i$, usually equal to one. In equation 16, $u_i(t)$ is the state of neuron $i$ at time $t$ and $f(u_j(t))$ is the activated state of a neuron $j$ connected to neuron $i$. The neuron state is activated by a function $f$, the activation function, which is monotonically increasing, continuous and well behaved.
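Representing the individual error in equation 8 as
$$e_k(\mathbf{w}) = Y_{k,\mathrm{exp}} - Y_{k,\mathrm{cal}}(\mathbf{w}) \tag{17}$$
with $k$ running over all pairs $(r, i)$, the problem to be solved by the HNN at the minimum is
$$\mathbf{J}(\mathbf{w})\,\delta\mathbf{w} = -\mathbf{e}(\mathbf{w}) \tag{18}$$
with $\mathbf{J}(\mathbf{w})$ the Jacobian matrix. Setting $\delta w_i = f_i = f(u_i(t))$, the time derivative of the error function (equation 8, evaluated for the linearized residual $\mathbf{e} + \mathbf{J}\,\delta\mathbf{w}$) is
$$\frac{dE}{dt} = \sum_i \frac{\partial E}{\partial f_i}\,\frac{df_i}{du_i}\,\frac{du_i}{dt} \tag{19}$$
If the condition $dE/dt \le 0$ is imposed, i.e., the error decreases over time, and since $df_i/du_i > 0$, one is left with
$$\frac{du_i}{dt} = -\frac{\partial E}{\partial f_i} \tag{20}$$
For the error function of equation 8, this equation can be written as
$$\frac{du_i}{dt} = -\sum_k J_{ki}\left(\sum_j J_{kj}\, f_j + e_k\right) \tag{21}$$
Therefore, the temporal evolution of the neurons in the network is given by
$$\frac{du_i}{dt} = \sum_j T_{ij}\, f_j + I_i \tag{22}$$
with
$$T_{ij} = -\left(\mathbf{J}^{\mathsf T}\mathbf{J}\right)_{ij}, \qquad I_i = -\left(\mathbf{J}^{\mathsf T}\mathbf{e}\right)_i \tag{23}$$
Equation 22 is integrated with a fourth-order Runge-Kutta method. Multiple solutions can be observed during this integration process. The learning process consists of updating the parameters in the error function and is stopped when the network has learned the process, i.e., when the error function has converged.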
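The HNN fitting loop can be sketched as an RK4 integration of $du_i/dt = \sum_j T_{ij} f(u_j) + I_i$ with $\mu_i = 1$, $T = -\mathbf{J}^{\mathsf T}\mathbf{J}$ and $I = -\mathbf{J}^{\mathsf T}\mathbf{e}$, followed by the parameter update $\mathbf{w} \leftarrow \mathbf{w} + f(\mathbf{u})$. The activation $f = \tanh$, the RK4 step size $1/\lambda_{\max}(\mathbf{J}^{\mathsf T}\mathbf{J})$, the zero initial state and the outer relinearisation loop are assumptions for illustration; `hnn_step` and `hnn_fit` are hypothetical names, not the authors' code.

```python
import numpy as np


def hnn_step(J, e, dt=None, tol=1e-10, max_steps=20000):
    """Integrate du/dt = T f(u) + I with classical RK4 and return dw = f(u).

    T = -J^T J and I = -J^T e, so at equilibrium f(u) solves J dw = -e in the
    least-squares sense. Assumed activation: f = tanh (bounds dw to (-1, 1)).
    """
    T = -J.T @ J
    I = -J.T @ e
    f = np.tanh

    def rhs(u):
        return T @ f(u) + I

    if dt is None:
        # assumed step: 1 / largest eigenvalue of J^T J keeps RK4 stable
        dt = 1.0 / np.linalg.eigvalsh(J.T @ J).max()
    u = np.zeros(J.shape[1])
    for _ in range(max_steps):
        k1 = rhs(u)
        k2 = rhs(u + 0.5 * dt * k1)
        k3 = rhs(u + 0.5 * dt * k2)
        k4 = rhs(u + dt * k3)
        u = u + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        if np.linalg.norm(rhs(u)) < tol:  # du/dt ~ 0: the network has settled
            break
    return f(u)


def hnn_fit(residual, jacobian, w0, n_outer=50, tol=1e-12):
    """Relinearise e(w) = Y_exp - Y_cal(w) and add the HNN step dw until it is negligible.

    residual(w) returns e(w); jacobian(w) returns de/dw.
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(n_outer):
        dw = hnn_step(jacobian(w), residual(w))
        w = w + dw
        if np.linalg.norm(dw) < tol * (1.0 + np.linalg.norm(w)):
            break
    return w
```

Because $\tanh$ is bounded, each component of $\delta\mathbf{w}$ stays in $(-1, 1)$, so parameters should be scaled to order one. When the linearized step fits inside that box, the equilibrium of the network coincides with the Gauss-Newton step $\mathbf{J}^{\mathsf T}\mathbf{J}\,\delta\mathbf{w} = -\mathbf{J}^{\mathsf T}\mathbf{e}$.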

Experimental data
The thermal behavior of LOK was determined using a DSC60 Shimadzu cell (Tokyo, Japan), calibrated with indium (melting point: $T_{onset}$ = 156.63 °C, $\Delta H_{fus}$ = 28.45 J g⁻¹), under a dynamic nitrogen atmosphere at 50 mL min⁻¹, at heating rates of 8, 10 and 12 °C min⁻¹ from 30 to 400 °C, in a closed aluminum crucible, with a sample mass of about 1.5 mg, accurately weighed. The thermogravimetric (TG) curve was obtained with a Shimadzu DTG60 thermobalance (Tokyo, Japan) at a heating rate of 10 °C min⁻¹ from 30 to 600 °C, under a dynamic nitrogen atmosphere at 50 mL min⁻¹, in an alumina crucible, with a sample mass of about 2.5 mg, accurately weighed.

Computation of DSC curves with added experiment-like statistical noise
The kinetic parameters $E_a$ = 74 kJ mol⁻¹ and $\ln(k_0)$ = 18, together with the reaction order $n$ = 2 in equation 7, were used to produce the synthetic data in the temperature interval from 353 to 453 K (this interval follows from the kinetic parameters chosen, which place the transformation within it). Following equation 6, the $\alpha_i$ data and their derivative were computed to determine $dH_i/dt$ as in equation 3, with $\Delta H$ = 77 kJ. For experimental data, $\alpha_i$ is instead computed as $\alpha_i = H_i/\Delta H$, with $H_i$ the partial area calculated at time $i$. To test the synthetic data numerically, the $\alpha_i$ values were calculated from both equations; they are shown in Figure 1 for three different heating rates, and both methods give results in fair agreement. Figure 2 presents the synthetic $dH_i/dt$ data determined from the $\alpha_i$ curves, together with another 1000 curves with 2.5% random noise added point by point.
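The numerical check of the two routes to $\alpha_i$ (the closed form $\alpha = 1 - [1 + (n-1)\Phi]^{1/(1-n)}$ with $\Phi = (k_0/\beta)\int_{T_0}^{T} e^{-E_a/RT'}dT'$, and the partial-area ratio $H_i/\Delta H$) can be reproduced approximately with the sketch below. Assumptions: heating rates of 8, 10 and 12 K min⁻¹ (the synthetic heating rates are not stated here), $\Delta H$ = 77 kJ taken as 77 000 J, no conversion below 353 K, SciPy's `quad` for the temperature integral and the trapezoidal rule for the partial areas. The helper names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad

R = 8.314                                        # J K^-1 mol^-1
EA, LN_K0, N_ORDER, DH = 74e3, 18.0, 2.0, 77e3   # Ea (J mol^-1), ln k0, n, Delta H (J)
T0, T1 = 353.0, 453.0                            # simulated interval in K


def k(T):
    """Arrhenius rate constant in s^-1."""
    return np.exp(LN_K0 - EA / (R * T))


def alpha_closed_form(T, beta):
    """Closed-form alpha from the temperature integral of k(T) (n != 1)."""
    phi = np.array([quad(k, T0, Ti)[0] for Ti in T]) / beta
    return 1.0 - (1.0 + (N_ORDER - 1.0) * phi) ** (1.0 / (1.0 - N_ORDER))


def alpha_partial_area(T, beta):
    """alpha_i = H_i / Delta H, with H_i the partial area under dH/dt up to t_i.

    Linear heating gives dt = dT / beta, so H_i = (1/beta) * integral of (dH/dt) dT.
    """
    dHdt = DH * k(T) * (1.0 - alpha_closed_form(T, beta)) ** N_ORDER   # equation 3
    steps = 0.5 * (dHdt[1:] + dHdt[:-1]) * np.diff(T) / beta
    return np.concatenate(([0.0], np.cumsum(steps))) / DH


T = np.linspace(T0, T1, 501)
for beta_K_per_min in (8.0, 10.0, 12.0):       # assumed heating rates
    beta = beta_K_per_min / 60.0               # K s^-1
    gap = np.max(np.abs(alpha_closed_form(T, beta) - alpha_partial_area(T, beta)))
    print(f"beta = {beta_K_per_min:4.1f} K/min  max |difference| = {gap:.1e}")
```

Any remaining difference comes from the discretisation of the partial areas, so it shrinks as the temperature grid is refined.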
Comparative results of LM and HNN-based algorithms
Case 1: The process is known and the kinetic parameters are provided to confirm the kinetics
In this case, the process is known and the correct parameters are provided to the algorithms as initial values. One hundred simulated points equally spaced in the temperature domain were used in each curve. Figure 3 presents the histogram of the parameters determined by fitting the 1000 noisy $dH_i/dt$ curves. The retrieved parameters and the standard deviations computed by the LM and HNN-based methods are shown in Table 1. Both algorithms gave results very close to the exact values used to simulate the data, $E_a$ = 74 kJ mol⁻¹, $\ln(k_0)$ = 18 and $n$ = 2, with small standard deviations in both cases. This small dispersion around the exact values indicates good accuracy of both methods.
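A scaled-down version of the Case 1 and Case 2 experiments can be sketched with SciPy's MINPACK-based LM (`scipy.optimize.least_squares(..., method="lm")`), fitting $\ln(dH_i/dt) = \ln(\Delta H\,k_0) - E_a/RT_i + n\ln(1-\alpha_i)$ simultaneously to three noisy curves and collecting the mean and standard deviation of the retrieved parameters. This is not the authors' code: the heating rates (8, 10 and 12 K min⁻¹), the multiplicative Gaussian noise model, fitting in logarithmic form, keeping $\alpha_i$ noise-free and the reduced number of replicas in the example are all simplifying assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

R = 8.314                                   # J K^-1 mol^-1
TRUE = np.array([74.0, 18.0, 2.0])          # Ea (kJ mol^-1), ln k0, n used to simulate
BETAS = np.array([8.0, 10.0, 12.0]) / 60.0  # assumed heating rates, K s^-1
T = np.linspace(353.0, 453.0, 100)          # 100 equally spaced points per curve


def alpha_closed(T, beta, Ea_kJ, ln_k0, n):
    """Closed-form alpha for (1 - alpha)**n, n != 1; trapezoidal temperature integral."""
    k = np.exp(ln_k0 - 1e3 * Ea_kJ / (R * T))
    steps = 0.5 * (k[1:] + k[:-1]) * np.diff(T)
    phi = np.concatenate(([0.0], np.cumsum(steps))) / beta
    return 1.0 - (1.0 + (n - 1.0) * phi) ** (1.0 / (1.0 - n))


def log_heat_flow(p, T, alpha, dH=77e3):
    """Equation 7: ln(dH/dt) = ln(dH k0) - Ea/(R T) + n ln(1 - alpha)."""
    Ea_kJ, ln_k0, n = p
    return np.log(dH) + ln_k0 - 1e3 * Ea_kJ / (R * T) + n * np.log1p(-alpha)


ALPHAS = [alpha_closed(T, b, *TRUE) for b in BETAS]      # kept noise-free (assumption)
CLEAN = [np.exp(log_heat_flow(TRUE, T, a)) for a in ALPHAS]


def fit_once(noisy, p0):
    """LM fit of equation 7 to all heating rates at once (multi-objective residual)."""
    def residual(p):
        return np.concatenate([np.log(y) - log_heat_flow(p, T, a)
                               for y, a in zip(noisy, ALPHAS)])
    return least_squares(residual, p0, method="lm").x


def monte_carlo(n_curves=1000, p0=TRUE, level=0.025, seed=0):
    """Fit n_curves noisy replicas; return mean and standard deviation of (Ea, ln k0, n)."""
    rng = np.random.default_rng(seed)
    fits = np.empty((n_curves, 3))
    for c in range(n_curves):
        noisy = [y * (1.0 + level * rng.standard_normal(y.size)) for y in CLEAN]
        fits[c] = fit_once(noisy, p0)
    return fits.mean(axis=0), fits.std(axis=0)


# Case 1 starts from the exact values; Case 2 from Ea = 100, ln k0 = 10, n = 0.
mean1, std1 = monte_carlo(n_curves=200, seed=1)
mean2, std2 = monte_carlo(n_curves=200, p0=np.array([100.0, 10.0, 0.0]), seed=2)
```

With $\alpha_i$ fixed, equation 7 is linear in ($E_a$, $\ln k_0$, $n$), so LM converges from either starting point here; the harder real situation, in which $\alpha_i$ itself comes from noisy partial areas, is not reproduced by this sketch.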
An important aspect to highlight is the computational time of the two algorithms, presented in Table 2. Although the LM method is also accurate, the HNN-based fitting algorithm is 535 times faster.
The fits by HNN and LM are presented in Figure 2 for three heating rates and simulated data without error for the Case 1 analysis. The fair agreement between the retrieved curves and the simulated data confirms the potential of both methods. Figure 4 shows the histograms of the parameters retrieved by the LM and HNN-based algorithms for the 1000 synthetic DSC curves with random noise. The narrower histograms obtained with the HNN-based algorithm demonstrate its higher precision compared with LM.
Case 2: The process is unknown but an estimate is provided to test the algorithms
In this case, nothing is known about the process, but estimates of the parameters are provided to the algorithms as the initial guess. The initial estimates were $E_a$ = 100 kJ mol⁻¹, $\ln(k_0)$ = 10 and $n$ = 0, and the same 1000 DSC curves with 2.5% noise were used. Figure 5 presents the histograms obtained with the LM and HNN-based algorithms. The results for each parameter and the corresponding standard deviations are shown in Table 1. In this case as well, the HNN-based algorithm produced narrower histograms for all parameters, showing that it is more precise.
For adequate manufacturing of drug products by the pharmaceutical industry, a detailed description of the material composition and of the process instructions is required. For example, it is very important to know about any physical or chemical transformation of the substances, such as the polymorphic conversion of losartan potassium. This knowledge safeguards public health and is required by Good Manufacturing Practices (GMP), as enforced by regulatory agencies [22,23]. Therefore, analytical techniques such as DSC, combined with an accurate and robust methodology such as HNN, whose runtime is compatible with market dynamics and which meets the Pharmaceutical Quality System, are fundamental for drug production that complies with the quality standards required by regulatory agencies and the market [20,24]. HNN can be a robust tool for Process Analytical Technology (PAT), described within Quality by Design (QbD) as an emerging system for quality assurance in pharmaceutical processes [25-27]. It is therefore a promising routine methodology for treating DSC data in kinetic studies of polymorphic conversion, as shown for losartan potassium in the next section.
Experimental results: losartan potassium
LOK is an orally active antihypertensive agent, a nonpeptide angiotensin II receptor antagonist with no side effects, which demonstrates its efficacy, safety and, therefore, its clinical relevance [28]. The occurrence of polymorphism in LOK was previously described [29], and the conversion of LOK form I to form II can be induced by heat [30]. In this work, the kinetics of this polymorphic conversion is studied from DSC data using the HNN-based algorithm.
The DSC curve of LOK (Figure 6, solid line) shows a small endothermic event between 228 and 245 °C (circled) without mass loss, as shown by the TG curve (Figure 6, short-dotted line), which corresponds to the enantiotropic conversion of form I to form II ($T_{onset}$ = 231.3 °C, $\Delta H$ = 13.16 J g⁻¹). These two polymorphic forms were confirmed by X-ray powder diffraction, as previously described [29]. A strong endothermic peak at 271.5 °C ($T_{onset}$) with an enthalpy of 95.1 J g⁻¹ (without mass loss, as shown by the TG curve) corresponds to the melting of form II. Decomposition then starts at 278 °C, with a mass loss of 55% in three steps, as observed in the TG curve. These results refer to a heating rate of 10 °C min⁻¹.
Experimental data acquired at three heating rates (8, 10 and 12 °C min⁻¹) were considered, and the HNN and LM methods were used to fit equation 7 in a multi-objective function, as in equation 8. Three heating rates were used to follow the International Confederation for Thermal Analysis and Calorimetry (ICTAC) recommendations for an accurate determination of kinetic parameters [31]. Nevertheless, neither the HNN nor the LM algorithm gave reasonable fits. This result suggests that the model in equation 7 is not adequate to describe the LOK conversion process. Thus, a general kinetic model [11,12] was used to describe the experimental data effectively:
$$\frac{dH_i}{dt} = \Delta H\, k_0\exp\left(-\frac{E_a}{RT_i}\right)\alpha_i^m\left(1 - q\alpha_i\right)^n \tag{24}$$
The HNN algorithm determined $\ln A$, $E_a$, $m$, $n$ and $q$ by fitting the experimental data at the three heating rates. Table 3 presents the results together with the residual errors. Since the term $k_0\exp(-E_a/RT)$ in equations 2 and 3 admits several combinations of $k_0$ and $E_a$ that give the correct amplitude for $d\alpha/dt$ and $dH/dt$, it is necessary to analyze the initial conditions used to start the minimization. Results for $k_0$ and $E_a$ obtained from different initial conditions (keeping the initial values $m$ = 1.0, $q$ = 1.0 and $n$ = 1.0) are presented in Table 3. To analyze the sensitivity of the method, slightly different values of $k_0$ and $E_a$ were used as initial conditions. A difference smaller than 10 s⁻¹ in the frequency factor and 10 kJ mol⁻¹ in the activation energy is enough to yield different retrieved values for these parameters. However, the model parameters ($m$, $n$, $q$) remain unchanged for the several initial conditions tested. In this calculation, the tolerance in the sum of squared errors was set to 10⁻⁶. The root-mean-square deviation (rmsd) is calculated as
$$\mathrm{rmsd} = \sqrt{\frac{1}{MN}\sum_{r=1}^{M}\sum_{i=1}^{N}\left(Y_{r,i,\mathrm{exp}} - Y_{r,i,\mathrm{cal}}\right)^2} \tag{25}$$
in which $Y_{r,i,\mathrm{exp}}$ is the experimental and $Y_{r,i,\mathrm{cal}}$ the calculated data for $N$ data points at each of the $M$ heating rates $r$. From Table 3, all results are chemically acceptable.
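The general model $dH_i/dt = \Delta H\,k_0 e^{-E_a/RT_i}\alpha_i^m(1 - q\alpha_i)^n$, the rmsd over all heating rates and points, and the choice of the smallest-rmsd solution can be sketched as follows. Assumptions: $\alpha_i$ is supplied from the experimental partial areas, all arrays have shape (M, N), the candidate parameter sets are passed as dictionaries, and the function names are hypothetical.

```python
import numpy as np

R = 8.314  # gas constant, J K^-1 mol^-1


def heat_flow_general(T, alpha, k0, Ea, m, n, q, dH=1.0):
    """General model: dH/dt = dH * k0 exp(-Ea/(R T)) * alpha**m * (1 - q alpha)**n.

    alpha comes from the experimental partial areas (alpha_i = H_i / Delta H);
    with the default dH = 1 the curve is expressed as d(alpha)/dt.
    """
    T = np.asarray(T, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return dH * k0 * np.exp(-Ea / (R * T)) * alpha**m * (1.0 - q * alpha) ** n


def rmsd(y_exp, y_cal):
    """Root-mean-square deviation over all heating rates r and points i."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    return float(np.sqrt(np.mean((y_exp - y_cal) ** 2)))


def pick_smallest_rmsd(y_exp, candidates, T, alpha):
    """Return (index, rmsd) of the candidate parameter set with the smallest rmsd.

    candidates : list of dicts with keys k0, Ea, m, n, q (e.g. rows of a results table)
    T, alpha   : arrays of shape (M, N) for M heating rates and N points.
    """
    scores = [rmsd(y_exp, heat_flow_general(T, alpha, **c)) for c in candidates]
    best = int(np.argmin(scores))
    return best, scores[best]
```

Because several ($k_0$, $E_a$) pairs can give nearly the same amplitude, ranking candidates by rmsd is only a tie-breaker; the stability of $m$, $n$ and $q$ across candidates is the more informative check.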
As all these solutions are physically coherent, the solution with the smallest rmsd can be chosen, although this parameter varies only in the third decimal place. Therefore, the LOK conversion can be described by the parameters $k_0$ = 6.376 × 10¹² s⁻¹, $E_a$ = 136.5 kJ mol⁻¹, $m$ = 0.6, $n$ = 0.85 and $q$ = 1.0. This indicates that the process occurs as a complex event, since these values do not correspond to any ideal kinetic model. The solution with the smallest rmsd and the LOK experimental data are shown in Figure 7.

Conclusions
The HNN methodology has some advantages over conventional optimization algorithms (in this case, LM) in terms of adaptability and robustness in handling experimental error and unknown initial guesses in kinetic studies using DSC experimental data. Even when only an approximate initial guess is provided, the HNN algorithm converges to the correct answer, with a narrow normal distribution around the correct values, as seen for the simulated data. The kinetics of the polymorphic conversion of losartan potassium was studied with this method, and the results imply that the process occurs as a complex event involving the contribution of various kinetic models, since the $m$, $n$ and $q$ parameters do not correspond to any ideal model. A small residual error was also obtained when fitting the experimental data, for an activation energy of 135 kJ mol⁻¹ and a frequency factor of 6.376 × 10¹² s⁻¹.
Since the model parameters are insensitive to the initial conditions, $m$, $n$ and $q$ can be obtained with great precision, which confirms that the polymorphic conversion occurs as a complex process. Nevertheless, an accurate determination of the activation energy and frequency factor requires data at several heating rates, as presented by Ozawa [32].