Long-Term Amorphous Drug Stability Predictions Using Easily Calculated, Predicted, and Measured Parameters

ABSTRACT: The purpose of this study was to develop a predictive model of the amorphous stability of drugs with particular relevance for poorly water-soluble compounds. Twenty-five representative neutral poorly soluble compounds with a diverse range of physicochemical properties and chemical structures were systematically selected from an extensive library of marketed drug products. The physical stability of the amorphous form, measured over a 6 month period by the onset of crystallization of amorphous films prepared by melting and quench-cooling, was assessed using polarized light microscopy. The data were used as a response variable in a statistical model with calculated/predicted or measured molecular, thermodynamic, and kinetic parameters as explanatory variables. Several multiple linear regression models were derived, with varying balance between calculated/predicted and measured parameters. It was shown that inclusion of measured parameters significantly improves the predictive ability of the model. The best model demonstrated a prediction accuracy of 82% and included the following as parameters: melting and glass transition temperatures, enthalpy of fusion, configurational free energy, relaxation time, number of hydrogen bond donors, lipophilicity, and the ratio of carbon to heteroatoms. Good predictions were also obtained with a simpler model, which comprised easily acquired quantities: molecular weight and enthalpy of fusion. Statistical models are proposed to predict long-term amorphous drug stability. The models include readily accessible parameters, which are potentially the key factors influencing amorphous stability. The derived models can support faster decision making in drug formulation.


■ INTRODUCTION
Amorphization 1 is a strategy that is increasingly employed to improve dissolution rates and hence bioavailability of poorly water-soluble drugs. The principal disadvantage of this approach is the risk of premature drug recrystallization from its formulation, leading to reduced bioavailability. Water-soluble polymers are commonly mixed with drugs to form amorphous solid dispersions or solid solutions in order to improve the physical stability of an amorphous compound. 2 Polymer selection is generally based on experience and some knowledge of the physicochemical properties of the constituent materials.
To determine a suitable formulation, different drug−excipient combinations and ratios are typically prepared and tested. This is time-consuming and can be challenging when only a limited quantity of a compound is available, as is typically the case at the early stages of drug-product development. 3 The efficiency of this screening process for a particular poorly soluble drug could be significantly improved by the development of predictive models as a basis for the rational selection of suitable manufacturing and formulation strategies. 4,5 With the current study in mind, such models could predict the stability of pure amorphous drugs at an early stage of development based on their properties and indicate the applicability of amorphous formulation strategies.
Amorphous drug stability is influenced by many factors independent of the drug molecule itself, which renders its prediction difficult. 6 These factors include environmental conditions (e.g., humidity, temperature, mechanical stress), 7,8 the preparation method (e.g., solvent evaporation, melting and quench-cooling, and cryo-milling), 7,9 and preparation conditions (e.g., cooling rate, processing temperature, and time). 7 Thus, different stability values may be reported by different independent research groups. Nevertheless, a number of compound properties which influence the glass-forming ability (GFA) and physical stability have been suggested to date. 6,7 Using principal component analysis (PCA) 10,11 for a set of 51 compounds, it was found that compounds with comparatively high molecular weight (M r ) and complex molecular structure displayed increased GFA. Mahlin et al. 12 used partial least-squares discriminant analysis to predict the GFA from molecular structure. Their model suggested that molecular descriptors related to size, symmetry, branching, number of aromatic rings, and distribution of electronegative atoms impacted the GFA for 75% of their test compounds. In another study Mahlin and Bergstrom 13 derived a rule of thumb that molecules with M r > 300 can easily be transformed to an amorphous state. Most of these studies identified parameters that are likely to influence GFA only. Their impact on amorphous stability, however, was not further explored.
The relationship between amorphous drug stability above the glass transition temperature (T g ) and thermodynamic parameters was studied by Graeser et al., 9 employing univariate correlation analysis. It was found that the configurational entropy (S c ) displayed stronger correlation with physical stability than configurational enthalpy (H c ) and free energy (G c ). Mahlin and Bergstrom 13 showed that T g and M r can be used to predict the physical stability upon storage for 78% of compounds which they used to build their model. In addition, they found a strong relationship between stability and crystallization temperature (T c ).
The results gathered from these studies are very promising. However, since only a small number of parameters were investigated, correlations between different properties were not identified, and the results are not consistent. Furthermore, the materials studied were often not selected to be representative of a larger group of compounds. Due to these limitations, no generally applicable model is available for the prediction of amorphous drug stability. Our research sought to bridge this gap by exploring factors which influence the crystallization tendency of a group of structurally and physicochemically diverse poorly soluble compounds and by using these data to develop a predictive statistical model for amorphous drug stability.

■ MATERIALS AND METHODS
Database Building and Selection of a Sample Set. Descriptors used for compound selection were chosen from those most commonly known to contribute to AstraZeneca models of the drug metabolism and pharmacokinetic properties of "small molecules" (personal communication, AstraZeneca). These descriptors have also been shown to have an impact on GFA, amorphous stability, 10,12 compound bioavailability, 14 and the stability of drugs formulated as solid dispersions. 15 The descriptors were either calculated directly from 2D molecular structure or predicted using established structure−activity relationship models. AstraZeneca's in silico predictions portal, C-Lab, 16,17 was used to calculate and predict M r , the number of nonterminal rotatable bonds (rotB), aromatic rings, hydrogen bond donors (HBD), hydrogen bond acceptors (HBA), heavy atoms, polar surface area (PSA), and lipophilicity (clogP) from Daylight/Biobyte molecular fingerprints of all compounds in the database. Algorithms implemented in C-Lab and ALOGPS 18 (www.vcclab.org) were used for the estimation of intrinsic aqueous solubility (logS w ).
A diverse sample set of 25 compounds (Table 5; Supporting Information) was selected from the database of 1327 marketed pharmaceutical compounds (Figure 1) available from the DrugBank repository, www.drugbank.ca. The database was initially reduced to 533 poorly soluble (logS w < −4; S w < 10⁻⁴ M) 19 and low-mass drug molecules (M r < 800). Compounds exhibiting predicted poor chemical stability (e.g., those with disulfide bonds and conjugated double bonds), a permanent charge, or zwitterions were excluded. To eliminate complex effects of ionization on solubility, the present study was further limited to 171 neutral compounds (Figure 2A; blue and violet squares), defined as molecules that are uncharged within the
In practice, compound selection was also constrained by material availability, cost, and safety considerations, which, for example, ruled out the inclusion of controlled drugs. Also, due to practical limitations, compounds with a T g < 0°C were excluded, as were drugs that degraded on heating above their melting temperature.
To ensure that the sample set was representative of the chemical and physicochemical diversity of neutral poorly soluble drugs in the database, the compounds were chosen using principal component analysis (PCA) as shown in Figure 2A. The variables responsible for the observed diversity within the compound set are shown on the PCA loading plot (Figure 2B). For instance, compounds in the upper right quadrant of the PCA plot (Figure 2A) tend to have comparatively high M r and a high number of rotB (Figure 2B). Physicochemically diverse compounds were selected with the Maximin I option in AstraZeneca's in-house software, IDEAL, by maximizing the minimum distance between any two compounds. 20 The PCA was performed using three principal components, PC 1, PC 2, and PC 3, which, respectively, explained 45.83%, 28.75%, and 13.05% of the variation in the input parameters.
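The diversity-driven selection can be illustrated with a small sketch. A maximin design spreads the chosen compounds out by keeping the smallest distance between any two selected points as large as possible. The greedy heuristic below is only an illustration of that idea and is not the algorithm implemented in IDEAL, whose details are not given here; the function names, the starting point, and the toy descriptor values are all hypothetical.

```python
import math


def autoscale(rows):
    """Mean-center each column and scale it to unit variance (sample std).

    Assumes no column is constant, otherwise the division fails.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def maximin_select(points, k, start=0):
    """Greedy maximin: repeatedly add the point farthest from those chosen."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt]))
                   for d, p in zip(nearest, points)]
    return chosen


# Hypothetical descriptor table: (molecular weight, rotatable bonds).
descriptors = [[276.0, 3.0], [280.0, 3.0], [384.0, 6.0], [502.0, 9.0], [330.0, 4.0]]
scaled = autoscale(descriptors)
picked = maximin_select(scaled, k=3)
```

In practice the distances would be computed on the principal component scores rather than on two raw descriptors, and the greedy result depends on the starting compound.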
These data were preprocessed by mean-centering and scaling each of the variables in the data set to have unit variance. Simultaneously, using the FLUSH cluster option in IDEAL, all neutral compounds were grouped into 35 clusters by applying a 0.55 Tanimoto similarity threshold 21 (Figure 9; Supporting Information). Each cluster consisted of compounds with a similar chemical structure. Molecules were compared on the basis of binary fingerprints (www.daylight.com). For the sample set, compounds were selected from different clusters to ensure chemical diversity. Because some clusters were larger than others, the probability of selecting compounds from these clusters was higher, which led to a few similar chemical structures in the final sample set (e.g., felodipine and nifedipine). Despite the structural similarity between these two compounds, they were shown to have different amorphous stability. 22 Individual parameters which may be responsible for these differences could be investigated in detail in future studies.
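The similarity measure behind the clustering can be sketched as follows. The Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch represents a fingerprint as a set of "on" bit positions, which is an assumption made for brevity; the bit positions shown are invented, and the FLUSH clustering procedure itself is not reproduced.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Convention chosen here: two empty fingerprints count as identical.
    return len(a & b) / union if union else 1.0


THRESHOLD = 0.55  # similarity threshold quoted in the text

# Hypothetical fingerprints (bit positions are illustrative only).
fp_1 = {1, 2, 3, 4}
fp_2 = {1, 2, 3, 5}
fp_3 = {3, 4, 5, 6}

similar_12 = tanimoto(fp_1, fp_2) >= THRESHOLD  # 3 shared / 5 total = 0.6
similar_13 = tanimoto(fp_1, fp_3) >= THRESHOLD  # 2 shared / 6 total = 0.33
```

Under this threshold the first two toy fingerprints would be candidates for the same cluster, while the first and third would not.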
The selected sample set was assessed to be representative for the database of 533 poorly soluble drugs using side-by-side histograms ( Figure 3) of calculated and predicted variables. For M r , HBD, HBA, rotB, heavy atoms, rings, and logS w , the sample set was representative of the poorly soluble drugs because the two distributions showed the same mean location and variance. For PSA and clogP, the sample was not entirely representative of the population because the two distributions were not coincident. This could be related to a possible bias present in the data due to the applied sample selection criteria. The values of molecular descriptors for the selected compounds are presented in Table 6 (Supporting Information).
Measurements of Amorphous Stability at Temperatures below T g . Amorphous stability, measured at temperatures below T g , was used as a response variable for the development of a statistical model. In order to obtain the amorphous form of a compound, the as-received material was spread uniformly between two glass cover slides and heated to 20°C above its respective melting point and then quench-cooled to 0°C at a nominal rate of 130°C/min using a heating stage (Linkam; THMS600).
The absence of chemical degradation resulting from the heat applied to melt the samples was confirmed with differential scanning calorimetry (DSC; Q 2000, TA Instruments, New Castle, DE, U.S.A.) and solution nuclear magnetic resonance spectroscopy. The amorphous nature of the freshly prepared samples was determined by the absence of birefringence under cross-polarized light. 10 Subsequently, the samples were stored in a MetzSyn 48 reaction station at 5, 10, 20, and 40°C below the T g of each drug down to a minimum of 0°C. Even though in practice the same storage temperature is often used, in our work compounds were kept at individual sub-T g temperatures to avoid fast nucleation and crystallization due to increased molecular mobility above T g . Individual sub-T g temperatures were selected for each compound to ensure a similar decrease in mobility for each compound. In addition, this approach ensures that new compounds studied in the future can easily be compared to the data reported here without the need to study them at a new temperature.

Molecular Pharmaceutics

In order to minimize the effect of atmospheric moisture on crystallization during storage, the sandwiched samples were subjected to a constant flow of dry nitrogen. Three samples of each compound were evaluated with a polarized light microscope (PLM), an Olympus IX 50 equipped with a video camera, for the presence or absence of amorphous material. Samples were scheduled to be tested at different storage times of 1 h, 3 h, 1 day, 1 week, 1 month, 2 months, 4 months, and 6 months. Due to practical circumstances, these times were shifted slightly for some compounds; for example, felodipine and celecoxib were analyzed after 5 and 6 days, respectively, rather than 1 week. This approach differs from that reported in the literature, 9,13 where a compound was defined as stable on storage if it lost less than 50% of the amorphous content during a 1 month period, determined with DSC. The samples were discarded after analysis to avoid effects due to changing temperatures and in order to avoid possible contamination with atmospheric moisture after removal from the controlled storage environment. For that reason, a total of 96 samples of each compound were prepared (3 samples analyzed at 4 different storage temperatures at 8 time intervals).
Crystallization of the studied compounds typically initiated at the circumference of the glass slides and then progressed toward the interior of the amorphous film (Figure 4). Flutamide and compound Y, however, crystallized in the opposite fashion, from the interior rather than the edge of the film. It was observed that compounds stored at higher temperatures tend to be less stable than compounds stored at lower temperatures. As the variability in crystallization tendency between all compounds was the highest at a storage temperature of T g − 5°C (Figure 4), the amorphous stability at this temperature was used as the response variable in the statistical model. The strength of its linear relationship with explanatory variables was improved by expressing the stability values on a logarithmic scale, which was used throughout this study. The values of amorphous stability of selected compounds are presented in Table 6 (Supporting Information).
Measurements of Thermodynamic and Kinetic Parameters. Thermodynamic and kinetic parameters were measured for the compounds in the sample set and were used together with molecular descriptors as explanatory variables in a statistical model. In particular, other workers have shown previously that higher values of T g , S c , and low molecular mobility are correlated with greater amorphous stability for the compounds studied therein. 6,7 Measurements of melting temperature (T m ), T g , and enthalpy of fusion (H f ) for all compounds in the sample set were performed by DSC. Samples of 3−5 mg were analyzed in both as-received crystalline powder and amorphous form. Amorphous material was obtained by heating the crystalline solid at a rate of 10°C/min to 10°C above the T m followed by rapid cooling to 0°C at a nominal rate of 30°C/min. To avoid absorption of moisture by the samples and to protect from oxidative chemical degradation, the heat cool/cycle was performed under a flow (50 mL/min, nominal) of dry nitrogen.
Heat capacity (C p ) values were used to calculate configurational thermodynamic parameters, H c , S c , and G c at T g , based on thermodynamic formulas. 23 As the configurational parameters are related to the supercooled liquid state, 9 T g was considered the closest temperature to the glassy state where the amorphous stability is measurable. To accurately measure C p values for both the crystalline and amorphous form, 24,25 quasiisothermal modulated DSC was used. A temperature amplitude of 1°C, a modulation period of 100 s, an isothermal temperature of 0°C, an isothermal time of 10 min, and a temperature increment of 5°C were used such that the number of increments was dependent on the melting temperature of the tested material (e.g., T m = 210°C results in 42 increments). To ensure reproducibility of the C p measurements through improved contact of the sample with the DSC pan, powdered samples (10−20 mg) were compacted into 5 mm diameter discs 26 using an IR press (Specac, U.K.) with a 2 ton compression applied for 30 s. For the C p measurements, Tzero DSC pans with pinholes (TA, New Castle, DE) were used to allow volatile products to escape during heating. The measurement accuracy was also controlled by measuring the C p of a sapphire standard every 3 runs and ensuring that it did not differ by more than 5% from the tabulated literature value.
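The calculation of the configurational parameters can be illustrated with a simplified sketch. It uses the commonly quoted relations H c (T) = H f − ∫ ΔC p dT and S c (T) = H f /T m − ∫ (ΔC p /T) dT, both integrated from T to T m , together with G c = H c − T·S c , where ΔC p is the heat capacity difference between the amorphous and crystalline forms. Whether these are exactly the formulas of ref 23 is an assumption. The sketch further assumes a temperature-independent ΔC p , which gives closed-form expressions, whereas the study measured C p as a function of temperature. All numerical inputs are hypothetical.

```python
import math


def configurational_parameters(T, Tm, Hf, dCp):
    """Return (Hc, Sc, Gc) at temperature T, assuming a constant dCp.

    T, Tm : temperatures in kelvin
    Hf    : enthalpy of fusion in J/mol
    dCp   : Cp(amorphous) - Cp(crystalline) in J/(mol K), taken as constant
    """
    Hc = Hf - dCp * (Tm - T)               # configurational enthalpy
    Sc = Hf / Tm - dCp * math.log(Tm / T)  # configurational entropy
    Gc = Hc - T * Sc                       # configurational free energy
    return Hc, Sc, Gc


# Hypothetical compound: Tm = 450 K, Hf = 30 kJ/mol, dCp = 100 J/(mol K),
# evaluated at an assumed Tg of 330 K.
Hc, Sc, Gc = configurational_parameters(T=330.0, Tm=450.0, Hf=30000.0, dCp=100.0)
```

At T = T m the expressions reduce to H c = H f , S c = H f /T m , and G c = 0, which is a quick consistency check on any implementation.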
The relaxation time (τ) was obtained through enthalpy relaxation experiments, performed using conventional DSC according to methods reported in the literature. 22,27 Initially, all amorphous compounds were held for 10 min at 20°C above their respective T g and were subsequently cooled at a rate of 30°C/min to their respective annealing temperatures (T a ) of 23°C below T g corresponding to the maximum enthalpy recovery proposed in another study. 22 A sample was held at the selected T a for a period of 0, 2, 5, 10, and 15 h. After annealing, the sample was reheated at a rate of 10°C/min and an endotherm of varying magnitude in the glass transition region was observed. The size of the resulting endotherm was dependent on compound properties and storage time. It represented the recovery of the enthalpy (ΔH) which was lost during structural relaxation upon sample storage below T g and is directly related to the molecular mobility of the sample under the applied conditions. Curve-fitting the Kohlrausch−Williams−Watts (KWW) equation 27 to the measured data for each compound was used to determine τ. The values of thermodynamic and kinetic parameters for selected compounds are presented in Table 6 (Supporting Information).
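A minimal sketch of how τ might be extracted from such data is given below. It assumes the usual KWW form, ΔH(t) = ΔH ∞ [1 − exp(−(t/τ)^β)], with the maximum recoverable enthalpy ΔH ∞ known in advance (it is often estimated as ΔC p (T g − T a )). Instead of a nonlinear curve fit, the sketch uses the linearization ln(−ln(1 − ΔH/ΔH ∞ )) = β ln t − β ln τ and an ordinary least-squares line, which is a substitute technique chosen to keep the example dependency-free. The function name and the synthetic data are hypothetical.

```python
import math


def fit_kww(times, dH, dH_inf):
    """Estimate (tau, beta) from enthalpy recovery via the linearized KWW form."""
    xs, ys = [], []
    for t, h in zip(times, dH):
        if t <= 0 or not 0 < h < dH_inf:
            continue  # the double logarithm is undefined for these points
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1.0 - h / dH_inf)))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two usable data points")
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    intercept = my - beta * mx        # equals -beta * ln(tau)
    tau = math.exp(-intercept / beta)
    return tau, beta


# Synthetic, noise-free data generated with tau = 40 h and beta = 0.5.
anneal_h = [0.0, 2.0, 5.0, 10.0, 15.0]
dH_max = 8.0  # hypothetical maximum enthalpy recovery, J/g
recovered = [dH_max * (1.0 - math.exp(-(t / 40.0) ** 0.5)) for t in anneal_h]
tau, beta = fit_kww(anneal_h, recovered, dH_max)
```

With real, noisy measurements a nonlinear fit that weights the raw ΔH values is generally preferable, because the double logarithm amplifies errors at short annealing times.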
Multiple Linear Regression Model Selection. Multiple linear regression modeling of the compound stability data was carried out using the publicly available software R (http://www.r-project.org/). All utilized packages are available under the general public license at http://cran.r-project.org/web/packages/. The software package leaps 28 was used to perform multiple linear regression. Calculated, predicted, and measured parameters of the selected sample set were used unit-less as explanatory variables in the derived model equations. In order to select parsimonious models (the simplest of models which explains the data equally well) and to penalize overfitted ones, different model selection algorithms (backward, forward, stepwise, and best subset) and selection criteria (adjusted R 2 , Bayesian Information Criterion (BIC), Mallows' Cp) 29 were used. The listed selection criteria penalize models according to the number of independent variables and help to select parameters that improve the model more than would be expected by chance.
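How the three criteria penalize model size can be made concrete with a short sketch. The functions below use textbook definitions for a linear model with n observations and p predictors plus an intercept; the BIC is written in its Gaussian form with additive constants dropped, so absolute values may differ from those reported by the leaps package, although model rankings should be comparable. Function names and the example numbers are hypothetical.

```python
import math


def adjusted_r2(rss, tss, n, p):
    """Adjusted R^2 from residual and total sums of squares."""
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))


def bic(rss, n, p):
    """Gaussian BIC up to an additive constant; p predictors plus intercept."""
    return n * math.log(rss / n) + (p + 1) * math.log(n)


def mallows_cp(rss, sigma2_full, n, p):
    """Mallows' Cp; sigma2_full is the error variance of the full model."""
    return rss / sigma2_full - n + 2 * (p + 1)


# Hypothetical comparison for n = 25 compounds: an 8-predictor model versus
# a 2-predictor model, with invented sums of squares.
n, tss = 25, 100.0
large = adjusted_r2(rss=18.0, tss=tss, n=n, p=8)
small = adjusted_r2(rss=41.0, tss=tss, n=n, p=2)
```

A model with little bias has a Cp close to its number of fitted parameters (p + 1), which is the property referred to when Cp is described as approximately equal to the number of parameters.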

■ RESULTS AND DISCUSSION
Influence of Measured, Calculated, and Predicted Parameters on Amorphous Drug Stability. Correlations observed between the amorphous drug stability and physicochemical parameters are shown in Figure 5, where asterisks denote parameters which are significantly correlated with amorphous stability at a 5% significance level (see key nomenclature in Table 4). It was found that H f is significantly negatively correlated with amorphous stability. No evidence of such a good correlation between H f and the amorphous stability was found in the literature, although it has been frequently emphasized that H f is an important driving force for crystallization. 7,10,11 This indicates that the amount of energy supplied to break intermolecular interactions in a crystalline material during melting is associated with compound crystallization tendency. As melting precedes the formation of the amorphous state, the energy supplied during melting increases the internal energy of the amorphous compound and thus lowers its physical stability.
In addition, in the present study, it was observed that compounds with high amorphous stability have a higher M r , a higher number of heavy atoms and a larger number of rotB. The correlation with rotB was, however, not significant at the 5% level. It has been postulated that the higher amorphous stability of these compounds is the result of the complex and flexible molecule structures impeding orientation in a crystal lattice. 10 Conversely, the exceptionally low stability of flutamide (≤1 h) is probably driven at least in part by its very low M r and relatively planar structure, which promotes crystallization. Amorphous stability was also moderately correlated with the number of HBD at a 5% significance level, and less so with the number of HBA. Nevertheless, a stronger correlation would have been expected based on previous studies, which postulated that hydrogen bond interactions increase the stability of the amorphous state by formation of poorly packed aggregates that impede crystal formation. 30 It was also reported that the ability to form an amorphous state can be dependent on the location and symmetry of hydrogen bonding groups. 31
Interesting observations were made when the correlations between drug stability and measured parameters were investigated further. The correlation between each of the three configurational thermodynamic parameters, H c , S c , and G c , and the physical stability of the 25 compounds in this study is very weak. It can be observed, however, that the correlation with G c is stronger than with H c and S c . It was previously suggested for nifedipine that H c is one of the key factors governing fast recrystallization from the amorphous state. 22 It was also postulated that compounds having lower S c can be more easily orientated to undergo crystallization. 26 For instance, fast crystallization of amorphous griseofulvin was reported to be linked with low configurational entropy and high molecular mobility. 6 This correlation was shown to be much stronger than with G c . Contrary to other studies, 9,26 our results show that G c has a more pronounced effect on amorphous stability than S c or H c . The negative correlation indicates that an increase in G c is related to a lower physical stability.

Figure 5. Correlation between amorphous stability and calculated, predicted and measured parameters. Error bars denote standard deviation. Asterisks indicate parameters which are significantly correlated with amorphous stability at a 5% significance level.

The underlying reasons for the correlations of some parameters with amorphous stability remain unclear. A positive correlation was observed for PSA, although it is thought that anisotropic interactions between polar molecules should favor the formation of a crystalline structure. In agreement with this hypothesis, the ratio of carbon to heteroatoms (CHA) and clogP were positively correlated with amorphous stability and logS w negatively so, although these correlations were very weak. Interestingly, a moderate positive correlation was found between the amorphous stability and the number of both aromatic and aliphatic rings, with a slightly higher R value for the latter parameter. It is thought, however, that molecules with a high number of aromatic rings are generally planar and thus should have a higher tendency to crystallize. 12 Similarly to findings in the literature, 9 only a small correlation was observed between amorphous stability and τ even though τ is thought to be involved in crystallization processes. 7 For T g , a parameter which is commonly considered a good indicator of amorphous stability, 32,33 the correlation was significant at the 5% level but only moderate (R = 0.43). The correlation between amorphous stability and T m was also extremely weak for the studied compounds, although it can be speculated that materials with higher T m should be more stable in a crystalline form because these require more energy to disrupt molecular interactions. Thus, a prominent negative correlation with amorphous stability would be expected. The discussed data (Figure 5) indicate that a single factor is not sufficient to explain the complex crystallization behavior.
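The statement that a moderate correlation such as R = 0.43 can still be significant at the 5% level follows from the sample size. A short sketch, assuming the significance test is the standard two-sided t-test for a Pearson correlation coefficient (the test actually used is not stated in the text), shows the arithmetic for n = 25 compounds. The critical value is a tabulated constant for 23 degrees of freedom, and the function names are hypothetical.

```python
import math

# Two-sided 5% critical value of Student's t for n - 2 = 23 degrees of
# freedom, taken from standard tables.
T_CRIT_DF23 = 2.069


def t_statistic(r, n):
    """t statistic for testing a Pearson correlation r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| that reaches significance for a given critical t."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


t_tg = t_statistic(0.43, 25)          # about 2.28, just above 2.069
r_min = critical_r(T_CRIT_DF23, 25)   # about 0.40
significant = abs(t_tg) > T_CRIT_DF23
```

Under this assumption any |R| above roughly 0.40 is significant for 25 compounds, so statistical significance alone says little about how much of the variation in stability a single parameter explains.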
Selection and Validation of Statistical Models. Several parsimonious multiple linear regression (MLR) models were derived to predict amorphous drug stability. The chosen iterative procedure to select the optimal model algorithm and analysis criterion is illustrated in Figure 6. For each selection method the model with the largest adjusted R 2 , minimum BIC or minimum Mallows' Cp (approximately equal to the number of parameters in the model) was selected. For a fixed criterion and selection method 25 resampling iterations were performed, where each compound in the sample set was left out for the model generation and used to test the derived model. Based on all iterations, the average mean square error (MSE) for the selected method and criterion was calculated ( Table 1). The stepwise selection method and BIC criterion were chosen as optimal based on the lowest relative MSE (Table 1; in bold italic). MSEs for 64% of compounds were <0.5 with the stepwise selection method and BIC criterion, which indicated a good fit (Figure 7). Models tested using flutamide (iteration 13), amcinonide (iteration 25), and indapamide (iteration 24) were, however, characterized by relatively high MSE (>1). This is probably a result of extreme values of measured parameters, for instance T g , which were not used in the compound selection process.
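The resampling scheme can be sketched as a leave-one-out loop: each compound is removed in turn, the model is refitted on the remaining compounds, and the squared prediction error for the left-out compound is recorded. The sketch below keeps the set of predictors fixed and fits by ordinary least squares through the normal equations, whereas in the study the variable selection itself was repeated in every iteration; that simplification, the helper names, and the toy data are assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)


def loo_squared_errors(X, y):
    """Squared prediction error for each observation when it is left out."""
    errors = []
    for i in range(len(y)):
        coef = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = coef[0] + sum(c * x for c, x in zip(coef[1:], X[i]))
        errors.append((y[i] - pred) ** 2)
    return errors


# Toy data: two predictors and a response that is exactly linear in them.
X_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.5]]
y_demo = [1.0 + 2.0 * a - 0.5 * b for a, b in X_demo]
mse = sum(loo_squared_errors(X_demo, y_demo)) / len(y_demo)
```

Because the toy response is exactly linear, the leave-one-out mean square error is essentially zero; with real data it estimates how well the model predicts compounds it has not seen.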
At each resampling iteration, a different best model equation was estimated, because the model was fitted to a slightly different training set. Model 1 (eq 1) was derived most frequently during the resampling process. The coefficients, however, changed for each iteration depending on the training set. The coefficients in Model 1 (eq 1) were therefore calculated on the basis of the entire sample set. The standard errors of the
Interestingly, Model 1 does not include M r , which was shown to be significantly correlated with amorphous stability. Instead, however, it incorporates HBD, which is correlated with M r (R = 0.70). As the two variables contribute in a similar way to the amorphous compound stability, only HBD was eventually selected by the "stepwise" algorithm. Nevertheless, some of the correlated variables, T m and T g (R = 0.83) and H f and G c (R = 0.69), are present in Model 1, making its interpretation less than straightforward. The value and sign of the regression coefficients can therefore not be directly interpreted as a measure of the relationship between the predictors and stability. 29 For example, H f is strongly negatively correlated with amorphous drug stability (Figure 5). A small increase in H f implies that much more energy is required to disrupt the crystalline structure. In contrast, eq 1 suggests a small decrease in stability only. Similarly, G c has a weak negative correlation with amorphous drug stability (Figure 5), which indicates that the free energy difference between the amorphous and crystalline states is of little importance in prediction of amorphous drug stability. The high value and positive sign of the coefficient in eq 1 suggest, however, that G c is more strongly related to stability than H f . This disagrees with our earlier observations and can be explained by cross-correlations with other variables. These cross-correlations lead to a change in the importance of the predictors (coefficients). 29 As this study aimed to derive a predictive rather than explanatory model of amorphous stability, 34 single coefficients in this equation do not have to be interpretable.
The summary plot of the measured stability as a function of the predicted stability using Model 1 is shown in Figure 8. The black data points represent all compounds in the training set. They lie nearly on the best-fit regression line (red dotted line), which highlights a good fit. Only 24 out of 25 data points are visible because two overlap at 2.23 log(days) measured stability and 2.13 log(days) predicted stability. The black dotted lines indicate the 95% confidence intervals. These lines are not smooth as they are based on 8 predictors. It can be noticed that some data points in Figure 8 lie on a vertical line instead of being randomly distributed (values of approximately 0 and 2 on the abscissa; indicated by arrows). The physical stability was measured at defined time intervals. All compounds at a value of 0 were stable for less than 1 day and compounds with a value of 2 were stable for greater than 168 days. The time of crystallization was, however, not determined precisely as the samples were not monitored continuously. The vertical arrangement of measured stabilities at discrete times is therefore an artifact, but the predictions from the derived model incorporate scatter.
For the best model, residuals 29 were calculated as differences between measured and predicted values of the response variable for each compound. The compliance with the underlying assumptions of multiple linear regression was then evaluated. The first assumption, that the residuals should have a constant variance across all predicted values of stability, was satisfied for all four models. In addition, the residuals of the best models selected with backward and stepwise selection methods were approximately normally distributed. The analysis of residuals reflected how different the predicted values of stability are from the observed values based on the training set only. In order to further demonstrate how well the model performs for compounds outside of this study and to avoid an overfitting bias, an external validation is required.
Some predictors in Model 1 (i.e., G c and τ) need to be measured experimentally using the procedures described earlier, and these experiments require several days to complete. From a practical point of view, it is desirable to make use of the predictive power of models that exclude all values needing time-consuming and/or technically challenging measurements, as well as variables which are difficult to predict. Table 2 shows a selection of equations that are based on calculated and easily predicted input parameters only. A parameter that is easy to predict is clogP. Conversely, logS w is more difficult to predict in part due to the lack of reliable and reproducible data from solubility measurements. 35,36 Other equations include only parameters that can easily be calculated, predicted, or measured with conventional DSC such as T m , H f , and T g . It should be noted, however, that the model equations based only on calculated or predicted parameters have a low adjusted R 2 value of 0.33. These models have, therefore, a relatively poor predictive capability. In contrast, including measured variables significantly increases the prediction accuracy to an adjusted R 2 value of 0.70 (Model 4) and 0.82 (Model 1). This implies that some measured parameters need to be incorporated into the model to achieve a sufficiently accurate prediction of the amorphous drug stability.
Three model equations, Models 4, 5, and 6, with moderate adjusted R 2 values and easily accessible parameters, were tested on external data obtained from the publication of Baird and Taylor 10 (Table 3). These three models correctly predicted 60% of compound stabilities with differences of less than 4 days. In addition, the stability of felodipine predicted with Model 4 was closer to the observed value than the one predicted with Models 5 or 6. This close agreement for all three models was achieved despite different sample selection criteria, preparation methods, and storage conditions used for the compounds in Table 3 compared to this work. Model 6 was tested on 6 additional compounds (Table 3; the identities and some of the physicochemical properties of these compounds are not revealed at the request of AstraZeneca). This latter model consists of two parameters only, M r and H f , which are highly correlated with amorphous drug stability. For the entire test set consisting of 11 compounds, Model 6 correctly predicted 64% of compound stabilities. This is in line with the determined R 2 value of the model derived for the initial data set.
These results indicate that these simple model equations are applicable outside the scope of the sample set. In particular, Model 6 is promising due to its moderately high predictive power and its structural simplicity. The prediction accuracy of Model 1 is expected to be better, given its higher adjusted R² value. However, this remains to be demonstrated on external data.

Table 3 footnote: The amorphous drug stabilities were predicted with Model 4 (a), 5 (b), and 6 (c), as indicated by the superscripts.

■ CONCLUSIONS
In this study, several multiple linear regression models based on calculated/predicted and measured parameters were derived to predict the long-term amorphous stability of neutral poorly soluble drugs. Care was taken to avoid overfitting of the models through the systematic selection of the compound training set, the application of different selection algorithms, and the use of selection criteria that penalize overparameterized models. By varying the balance between calculated, predicted, and measured parameters, it was shown that inclusion of measured parameters improves the predictive ability of the models more than 2-fold. By means of univariate analysis, it was demonstrated that the amorphous stability of the representative sample set is moderately correlated with Mr (R = 0.59) but more strongly correlated with Hf (R = −0.73). The model equation incorporating only these two variables resulted in a prediction accuracy of 59%. It was possible to improve the model predictions up to 82% by incorporating more parameters. The best predictive accuracy was found when Tm, Tg, Hf, Gc, τ, HBD, clogP, and CHA were included in the model. The leave-one-out validation of this model showed a small mean square error, which highlighted the high quality of fit.
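A leave-one-out validation of a two-predictor linear model can be sketched in Python as shown below. This is a minimal pure-Python illustration under stated assumptions, not the procedure or software used in this study: the helper names `ols_fit` and `loo_mse`, the synthetic Mr and Hf values, and the coefficients used to generate the response are all hypothetical.

```python
import random


def ols_fit(rows, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    A = [[1.0] + list(r) for r in rows]
    k = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def loo_mse(rows, y):
    """Refit with each compound held out in turn; average its squared error."""
    sq_errors = []
    for i in range(len(y)):
        coef = ols_fit(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        pred = coef[0] + sum(b * x for b, x in zip(coef[1:], rows[i]))
        sq_errors.append((y[i] - pred) ** 2)
    return sum(sq_errors) / len(sq_errors)


# Synthetic stand-in data for 25 compounds (hypothetical, not measured values).
rng = random.Random(0)
rows = [(rng.uniform(200.0, 600.0), rng.uniform(20.0, 60.0)) for _ in range(25)]
# Arbitrary generating coefficients plus noise, chosen only for illustration.
stability = [0.02 * mr - 0.10 * hf + rng.gauss(0.0, 0.5) for mr, hf in rows]
print(loo_mse(rows, stability))
```

A small leave-one-out error relative to the spread of the response indicates that the fit does not depend strongly on any single compound. In practice a numerical library would normally replace the hand-written solver.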
This work demonstrates that long-term amorphous drug stability can be predicted with a good degree of confidence using a combination of easily calculated, predicted, or measured parameters. The correlations of these parameters with amorphous stability provide insight into the key factors that influence the mechanisms driving crystallization. The authors recognize the importance of confirming these findings with an expanded sample set in the future; for this study, the best use of available resources was made by careful selection of representative compounds. The predictive power of the selected models should therefore be further validated on a larger external data set. Once successfully validated, such models could assist in faster and more cost-effective decision making, especially in preformulation phases of drug development where amorphous drug formulation strategies are under consideration.

Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.molpharmaceut.5b00409.
Chemical structures of selected compounds, their suppliers, CAS numbers, and IUPAC names; properties of selected compounds; bar chart demonstrating results of clustering analysis (PDF)