A novel quality by design approach for developing an HPLC method to analyze herbal extracts: A case study of sugar content analysis

The aim of this study was to present a novel analytical quality by design (AQbD) approach for developing an HPLC method to analyze herbal extracts. In this approach, critical method attributes (CMAs) and critical method parameters (CMPs) of the analytical method were determined using the same data collected from screening experiments. As an example, an HPLC-ELSD method for the separation and quantification of sugars in Codonopsis Radix extract (CRE) and Astragali Radix extract (ARE) samples was developed with this approach. Potential CMAs and potential CMPs were identified from the analytical target profile. After the screening experiments, the retention time of the D-glucose peak in CRE samples, the signal-to-noise ratio of the D-glucose peak in CRE samples, and the retention time of the sucrose peak in ARE samples were considered CMAs. The initial and final composition of the mobile phase, flow rate, and column temperature were found to be CMPs using a standard partial regression coefficient method. The probability-based design space was calculated using a Monte-Carlo simulation method and verified by experiments. The optimized method was validated as accurate and precise and was then applied to the analysis of CRE and ARE samples. The present AQbD approach is efficient and suitable for samples with complex compositions.


Introduction
Currently, the concept of quality by design (QbD) has been increasingly applied to the development and optimization of analytical methods, which is known as analytical quality by design (AQbD). Recently, many analytical methods have been developed following an AQbD approach, such as the capillary electrophoresis method [1], Karl Fischer titration methodology [2], the supercritical fluid chromatography method [3] and so on. Compared to a traditional analytical method development approach, such as the One-Factor-At-a-time (OFAT) approach [4]
No specific permissions were required for the described field studies. The locations are neither privately owned nor protected by the Chinese government. No endangered or protected species were sampled. The standard substance of D-fructose (99.5%) was purchased from Aladdin Chemistry Co., Ltd. (Shanghai, China). The standard substance of D-glucose (> 99.8%) was purchased from Sangon Biotech Co., Ltd. (Shanghai, China). The standard substance of sucrose (99%) was purchased from Sigma-Aldrich Co., Ltd. (Shanghai, China). HPLC-grade acetonitrile was obtained from Merck (Darmstadt, Germany). Triethylamine was of guaranteed reagent grade and purchased from Aladdin Chemistry Co., Ltd. (Shanghai, China). Ultrahigh-purity water was produced using a Milli-Q water purification system from Millipore (Milford, MA, USA).

Sample preparation
First, 50.0 g of Astragali Radix or Codonopsis Radix was extracted three times by reflux extraction with water as the extractant. Overall, 400, 300, and 300 mL of water were used for the first, second, and third extractions, respectively. The extraction time was 0.5 h for each extraction. All the obtained extracts were mixed and filtered. The AREs or CREs were then stored in a refrigerator (BL-240/241L, Shanghai Yisi Scientific Industry Co., Ltd.) before analysis. The extract samples were diluted with 85% (v/v) aqueous acetonitrile. Then, the solution was centrifuged at 12,000 rpm for 10 min with an Eppendorf microcentrifuge (Minispin, Eppendorf AG, Hamburg, Germany). The supernatant was filtered through a 0.22-μm Millipore filter unit, and the filtrate was collected as the sample solution.

HPLC analysis
All the quantitative analyses of the sugars were performed on an Agilent 1100 high-performance liquid chromatography system (Agilent Technologies, Palo Alto, CA, USA). The analytes were detected by an Alltech 2000ES ELSD. The separations were carried out on a Waters XBridge Amide column (4.6×250 mm, 5 μm, Waters, Milford, MA, USA). The samples and standards were separated with linear gradient elution. The mobile phase was composed of solvent A (an appropriate amount of triethylamine in water) and solvent B (an appropriate amount of triethylamine in acetonitrile). In addition, there was a column wash of 60% B in mobile phase for 10 min after each run and column equilibration with initial mobile phase composition for 10 min. A mixed standard stock solution containing D-fructose, D-glucose and sucrose was prepared. Standard solutions with other concentrations were prepared by diluting the stock solution with 85% (v/v) aqueous acetonitrile. All standards were filtered through 0.22-μm Millipore membranes before analysis. The injection volume of samples or standards was 5 μL. The ELSD impactor was set to OFF mode. The gain value was fixed at 1 during all the experiments. Calibration curves were established, and quantitative analyses of samples were based on the calibration plots of the logarithm of peak areas versus the logarithm of concentrations for each sugar.
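The log-log calibration described above can be sketched as follows. This is a minimal illustration, not the authors' calculation: the function names and the standard concentrations and peak areas are hypothetical, and the fit is an ordinary least-squares line through log10(peak area) versus log10(concentration).

```python
import math

def fit_loglog_calibration(conc, area):
    """Least-squares fit of log10(area) = slope * log10(conc) + intercept."""
    x = [math.log10(c) for c in conc]
    y = [math.log10(a) for a in area]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient of the log-log line
    return slope, intercept, r

def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate a concentration from a peak area."""
    return 10 ** ((math.log10(area) - intercept) / slope)

# Hypothetical standards that follow area = 2.0 * conc**1.5 exactly
conc = [0.1, 0.2, 0.5, 1.0, 2.0]
area = [2.0 * c ** 1.5 for c in conc]
slope, intercept, r = fit_loglog_calibration(conc, area)
estimated = concentration_from_area(2.0 * 0.8 ** 1.5, slope, intercept)
```

The logarithmic transform is used because the ELSD response is generally not linear in concentration, whereas the log-log relationship is approximately linear over a working range.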

Experimental design
Sugar analysis can be performed using an isocratic elution system of acetonitrile-water [25,26]. However, a linear gradient elution system of acetonitrile-water may lead to better resolutions with a shorter analysis time [27]. Therefore, a linear gradient was adopted in this work. Some separation and detection factors were investigated, including initial solvent B content in mobile phase (X1), final solvent B content in mobile phase (X2), the flow rate of the mobile phase (X3), column temperature (X4), gradient run time (X5), the proportion of triethylamine in the mobile phase (X6), ELSD drift tube temperature (X7), and flow rate of nitrogen gas (X8). The linear gradient elution was conducted as follows: t: 0-X5 (min), B%: X1-X2 (%). The coded and uncoded values of each parameter are shown in Table 1. A two-level fractional designed experiment with three center points was utilized to analyze the effects of these eight parameters on the analytical results, as shown in Table 2.
After preliminary experiments, some separation and detection parameters were fixed as follows: gradient run time of 37 min, the proportion of triethylamine in the mobile phase of 0.3%, drift tube temperature of 100 °C, and flow rate of nitrogen gas of 1.8 L/min. A Box-Behnken design with five center points was then used to evaluate the quantitative relationships between the CMPs and the CMAs, as shown in Table 3.
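As an illustration of the Box-Behnken layout, the sketch below generates the coded design matrix for four factors with five center points. It assumes the textbook pairwise construction (a 2x2 factorial for every pair of factors with the remaining factors at the center level), which applies to three to five factors; the run order and the actual factor levels in Table 3 are not reproduced here, and the `decode` helper with its example range is hypothetical.

```python
from itertools import combinations, product

def box_behnken(n_factors, n_center):
    """Coded Box-Behnken runs: +/-1 on each pair of factors, 0 elsewhere."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(run)
    # Replicated center points, used to estimate experimental variability
    runs.extend([0] * n_factors for _ in range(n_center))
    return runs

def decode(coded, low, high):
    """Convert a coded level (-1, 0, +1) to the uncoded value of a factor."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + coded * half_range

design = box_behnken(n_factors=4, n_center=5)
n_runs = len(design)  # 6 factor pairs x 4 runs + 5 center points
```

With four CMPs this construction gives 24 edge-midpoint runs plus the replicated center points.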

Data processing
To estimate which parameters were significant for the responses, the standard partial regression coefficient method was used to analyze the results of the two-level fractional design and select CMPs [28,29]. First, the response values were standardized according to Eq (1).
$$Y_i' = \frac{Y_i - \bar{Y}_i}{SD_i} \tag{1}$$
where $Y_i'$, $Y_i$ and $\bar{Y}_i$ represent the standardized value, the measured value, and the average value of each response, respectively; $SD_i$ is the standard deviation of each response; and the index i (i = 1, 2, 3) denotes the retention time of the D-glucose peak in CRE samples, the SNR value of the D-glucose peak in CRE samples, and the retention time of the sucrose peak in ARE samples, respectively. Multiple linear regression analysis was then used to calculate standard partial regression coefficients according to Eq (2).
$$Y_i' = a_{0,i} + \sum_{j=1}^{8} a_{j,i} X_j \tag{2}$$
where $a_{0,i}$ is a constant; $X_j$ represents the potential CMPs; and $a_{j,i}$ is the standard partial regression coefficient. The absolute values of the $a_{j,i}$ were weighted and summed to evaluate the total influence of each parameter $X_j$ on all the responses, as seen in Eq (3).
$$A_j = \sum_{i=1}^{3} w_i \left| a_{j,i} \right| \tag{3}$$
where $A_j$ is called the importance factor. Parameters with higher $A_j$ values were expected to have greater influences on the responses. In this study, each response was considered equally important, so each weight $w_i$ was 1/3. By applying multivariate regression analysis, quadratic models were built to describe the quantitative relationships between CMPs and CMAs according to Eq (4).
$$Y = b_0 + \sum_{i=1}^{4} b_i X_i + \sum_{i=1}^{4} b_{ii} X_i^2 + \sum_{i=1}^{3} \sum_{j=i+1}^{4} b_{ij} X_i X_j \tag{4}$$
where $b_0$ is a constant; and $b_i$, $b_{ii}$ and $b_{ij}$ represent the regression coefficients for linear, quadratic, and interaction terms, respectively. The analysis of the results was performed using Design Expert (version 8.0.6, Stat-Ease Inc., USA). Based on the specific goals of the CMAs, a Monte-Carlo method was performed using an in-house MATLAB program (R2016a, Version 9.0, The MathWorks Inc., USA) to calculate the design space [30]. The detailed calculation processes were described in previous work [31]. A brief description is given as follows. The experimental results were assumed to follow a normal distribution. The mean value of the normal distribution was assumed to be the measured response value. The relative standard deviation (RSD) of the normal distribution was assumed to be the same as that of the center points in the Box-Behnken designed experiments. In every simulation, random response values were drawn from these distributions and modeled by Eq (4).
The prediction values of CMAs were obtained using the models built with the random response values. The probability of meeting all the analytical goals was then calculated based on the model prediction results. The design space was defined as the region where this probability was higher than 0.90. The simulation was repeated 10,000 times to obtain the probability-based design space. In the Monte-Carlo simulation, coded values of variables were used.
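The simulation loop described above can be sketched in a much reduced form. This is not the authors' MATLAB program: to stay short it uses a single coded factor with a one-factor quadratic model in place of the four-factor form of Eq (4), and all run data, RSD values and limits below are hypothetical. In each iteration every measured response is replaced by a normal draw centred on the measured value, the model is refitted by least squares, and the prediction at the candidate point is checked against all goals.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)

def quadratic_terms(x):
    """Model terms for one coded factor: constant, linear, quadratic."""
    return [1.0, x, x * x]

def probability_of_meeting_goals(x_runs, responses, x_point, n_sim=10000, seed=0):
    """responses: list of (measured values, RSD as a fraction, lower, upper)."""
    rng = random.Random(seed)
    rows = [quadratic_terms(x) for x in x_runs]
    point = quadratic_terms(x_point)
    hits = 0
    for _ in range(n_sim):
        ok = True
        for y_measured, rsd, lower, upper in responses:
            # Random responses: normal, mean = measured value, SD = RSD * mean
            y_sim = [rng.gauss(y, abs(y) * rsd) for y in y_measured]
            coef = fit_least_squares(rows, y_sim)
            pred = sum(c * t for c, t in zip(coef, point))
            if not (lower <= pred <= upper):
                ok = False
                break
        hits += ok
    return hits / n_sim

# Hypothetical data: one coded factor, a retention time and an SNR response
x_runs = [-1, -1, 0, 0, 0, 1, 1]
retention = [12.0, 12.2, 15.0, 15.1, 14.9, 19.0, 19.2]
snr = [70.0, 72.0, 60.0, 61.0, 59.0, 40.0, 42.0]
responses = [(retention, 0.02, float("-inf"), 16.0),
             (snr, 0.05, 50.0, float("inf"))]
p_low = probability_of_meeting_goals(x_runs, responses, -1.0, n_sim=2000)
p_high = probability_of_meeting_goals(x_runs, responses, 1.0, n_sim=2000)
```

Evaluating this probability over a grid of candidate points and keeping those above 0.90 would give a probability-based design space in the sense used here.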

Method validation
After the design space was obtained, an operating point with a high probability of attaining the CMA goals was chosen for the optimized analytical method. Then, method validation experiments were carried out for ARE and CRE samples separately, including tests of linearity and sensitivity, analytical precision, stability and accuracy. The limit of detection (LOD) and the limit of quantification (LOQ) were determined at SNR values of 3:1 and 10:1, respectively. The same sample solution was injected six consecutive times to evaluate injection precision. Six sample solutions were prepared in parallel and tested during a single day for intra-day precision. Inter-day precision was evaluated by analyzing replicate samples on three consecutive days. The stability of a sample solution was evaluated at regular intervals for 24 h. To assess accuracy, certain amounts of sugars were added to the sample solution and then analyzed. The recovery was calculated as the ratio of the measured content of each sugar to the added content. All those results were evaluated by RSD values of the peak areas or the contents of the corresponding components.
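The two summary statistics used in validation can be sketched as below. The numbers are hypothetical, and the recovery function assumes the common spiked-recovery convention in which the measured content of the added sugar is the total found in the spiked sample minus the amount originally present; the text does not spell out this detail.

```python
import statistics

def rsd_percent(values):
    """Relative standard deviation (%), using the sample standard deviation."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

def recovery_percent(found_in_spiked, original, added):
    """Spiked recovery (%): (found - original) / added."""
    return 100 * (found_in_spiked - original) / added

# Hypothetical peak areas from replicate injections of one sample solution
areas = [98.0, 100.0, 102.0]
injection_rsd = rsd_percent(areas)

# Hypothetical spike: 10.0 mg originally present, 5.0 mg added, 15.1 mg found
recovery = recovery_percent(15.1, 10.0, 5.0)
```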

Results and discussion
The novel AQbD approach for herbal extracts
Unlike the conventional AQbD approach described in the introduction, the novel AQbD approach determines CMAs and CMPs in sequence from the same data set. Therefore, the present approach can be more suitable and efficient when dealing with complex systems. Botanical extracts are usually mixtures with many unknown ingredients. When the analytical parameters change, peak separation may be dramatically affected. Therefore, it is difficult to identify CMAs in the initial stage of HPLC method development. In most cases, potential CMAs can be proposed based on prior knowledge and then confirmed based on experimental results. If these experiments are also used for CMP selection, CMAs and CMPs can be determined from the same set of experimental data, which is very efficient.

CMA and CMP identification
Typical chromatograms of standards and samples are shown in Fig 1. CRE samples were found to contain two sugar peaks, D-fructose and D-glucose, in this study. Xu et al. [32] reported that ARE samples mainly contained two sugars, D-fructose and sucrose, according to the results of thin-layer chromatography, HPLC and GC-MS analysis. A similar conclusion could be drawn in this study. Several chromatographic performance criteria must be considered, including the resolution between adjacent peaks, the SNR of the target components, analysis time, and so on. The resolution between adjacent peaks measures the HPLC separation performance. A resolution value above 1.5 generally indicates good separation between adjacent peaks. The SNR of a chromatographic peak is an important system suitability parameter that reflects the sensitivity of the detector. Peak symmetry and peak width describe the peak shape, which is related to the accuracy of the quantitative results. In addition, the analysis time is commonly used as a CMA in method development because a shorter run time is usually favored. The retention time of the last peak is used to represent the method run time.
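For reference, the resolution criterion can be written as a small calculation. The text does not state which resolution formula the chromatography software applied, so the sketch assumes the common definition based on baseline peak widths; the retention times and widths are hypothetical.

```python
def resolution(t1, t2, w1, w2):
    """Resolution from retention times and baseline peak widths (same units)."""
    return 2 * (t2 - t1) / (w1 + w2)

def is_well_separated(rs, threshold=1.5):
    """Apply the 1.5 threshold mentioned in the text."""
    return rs > threshold

# Hypothetical adjacent peaks at 10.0 and 11.6 min, each 1.0 min wide at baseline
rs = resolution(10.0, 11.6, 1.0, 1.0)
separated = is_well_separated(rs)
```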
In this study, some of these criteria were studied as potential CMAs, including the retention time of the D-glucose peak in CRE samples (Y1), the SNR value of the D-glucose peak in CRE samples (Y2), the retention time of the sucrose peak in ARE samples (Y3), the width of D-glucose peak in CRE samples (Y4), the SNR value of D-fructose in ARE samples (Y5), and the resolution between the D-fructose peak and its subsequent peak in ARE samples (Y6). The index values of potential CMAs were collected in the two-level fractional designed experiments, as shown in Table 2.
In Table 2, some of the criteria met the analytical requirements for all the experiments, which indicates that those criteria were not CMAs. For example, the resolution values between the D-fructose peak and its subsequent peak in ARE samples were all higher than 1.50, which means that satisfactory separation was achieved. The SNR value of D-fructose in ARE samples was higher than 70, whereas the SNR value of D-glucose in the Codonopsis Radix sample varied significantly according to analytical conditions, from 26.51 to 77.18. Since the peak area of D-glucose in the Codonopsis Radix sample was much smaller compared to other components, the SNR value of that peak was taken as a CMA. In addition, the retention time of the D-glucose peak in the CRE samples and the retention time of the sucrose peak in the ARE samples were used as CMAs to reduce the analysis time.
Because potential CMAs included resolution, signal-to-noise ratio, peak width, and retention time, a total of 8 potential CMPs covering chromatographic operation and detection were selected based on prior knowledge in this work. The initial and final solvent B content in the mobile phase, the proportion of triethylamine in the mobile phase, gradient run time, column temperature, and flow rate were considered to be potential CMPs for better separation of the target sugars. The ELSD parameters, including drift tube temperature and flow rate of nitrogen gas, were considered as potential CMPs for better detectability. The results of multiple linear regression analysis and the normalized regression coefficients of each response are shown in Table 4. Importance factor values are also listed in Table 4. The analysis parameters with the Aj value ranked in the top four of all the eight potential CMPs were selected as CMPs. It was concluded that initial solvent B content in the mobile phase (X1), final solvent B content in the mobile phase (X2), flow rate of the mobile phase (X3), and column temperature (X4) were CMPs.
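The ranking by importance factor can be illustrated with a small sketch. It is not the authors' calculation and does not use the data in Table 2 or Table 4: the design is a hypothetical three-factor two-level design with one center point, and the responses are made up. Because such a coded design is orthogonal, the least-squares slope for each factor reduces to sum(x*y)/sum(x*x), which the code relies on; a non-orthogonal design would need a full multiple regression.

```python
import statistics
from itertools import product

def standardize(values):
    """Eq (1)-style standardization: subtract the mean, divide by the SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def regression_coefficients(design, y):
    """Least-squares slopes for an orthogonal coded design."""
    coefs = []
    for j in range(len(design[0])):
        col = [run[j] for run in design]
        coefs.append(sum(x * v for x, v in zip(col, y)) / sum(x * x for x in col))
    return coefs

def importance_factors(design, responses, weights):
    """Weighted sum of absolute standardized coefficients for each factor."""
    totals = [0.0] * len(design[0])
    for y, w in zip(responses, weights):
        for j, a in enumerate(regression_coefficients(design, standardize(y))):
            totals[j] += w * abs(a)
    return totals

# Hypothetical design: 2^3 factorial plus one center point (coded units)
design = [list(run) for run in product((-1, 1), repeat=3)] + [[0, 0, 0]]
# Hypothetical responses generated from known linear effects
y1 = [10 + 3 * r[0] + 0.5 * r[1] for r in design]
y2 = [50 - 5 * r[0] + 1 * r[2] for r in design]
y3 = [25 + 4 * r[0] + 1 * r[1] for r in design]

factors = importance_factors(design, [y1, y2, y3], [1 / 3, 1 / 3, 1 / 3])
ranked = sorted(range(len(factors)), key=lambda j: factors[j], reverse=True)
```

Selecting the CMPs then amounts to taking the leading entries of `ranked`, four of eight in the study.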

Effects of CMPs on CMAs
The results of the Box-Behnken experiments are shown in Table 3. The regression coefficients, P values, and R² values of the regression models between CMPs and CMAs are listed in Table 5. For each CMA, the R² value was higher than 0.80, which indicates that the model explains most of the variation in the experimental data. According to the P values, the linear effects of initial solvent B content in the mobile phase, the flow rate, and column temperature were significant for all three CMAs. Final solvent B content in the mobile phase significantly affected two CMAs: the retention time of the D-glucose peak in CRE samples and the retention time of the sucrose peak in ARE samples. The interaction term between the initial and final solvent B content in the mobile phase was significant for the retention time of the D-glucose peak in CRE samples.
The contour plots were obtained to analyze the effects of CMPs on CMAs, as shown in S1-S3 Figs. It was inferred from S1 Fig that a lower initial and final solvent B content in mobile phase would result in a shorter retention time of D-glucose peak in CRE samples. Increasing water content of mobile phase would reduce the retention time of the sugar peak. The retention time of D-glucose peak would also be decreased by increasing the flow rate and column

Design space development and operating point selection
To obtain optimum chromatographic results, the design spaces were calculated based on the specific goals of each CMA. Since shorter analysis time was required, the upper limits of the retention time of D-glucose peak and the retention time of the sucrose peak were set at 16 and 26 min, respectively. Meanwhile, a higher SNR value for a chromatographic peak was an important reflection of the analytical sensitivity. Therefore, the lower limit of the SNR value of the D-glucose peak in CRE samples was set at 50. Then, the Monte-Carlo method was performed to calculate the design spaces with a probability above 0.90 to attain CMA limits. The design spaces obtained were irregular regions, as shown in Fig 2(a)-2(d).
To verify the design space, three combinations of analysis conditions with high probabilities of attaining the CMA goals were selected and applied in the method verification experiments. The combined values of each CMP are presented in Table 6, and the other fixed HPLC parameters were the same as in the Box-Behnken designed experiments. These three methods were named A, B, and C, respectively. The verification points in the design spaces are shown in Fig 2(e)-2(g). Verification results are listed in Table 6. The experimental values of the retention time of the D-glucose peak in CRE samples and the retention time of the sucrose peak in ARE samples were roughly consistent with the predicted values. Marked differences between the predicted and experimental values were observed for the SNR of the D-glucose peak in CRE samples. This result was likely due to the lower R² value of the model used for prediction.

HPLC-ELSD method verification
The analytical conditions of method C were finally used to determine the sugar concentrations, and method validation experiments were carried out. The conditions were as follows. Initial solvent B content in the mobile phase was 85%. Final solvent B content in the mobile phase was 76%. Flow rate of the mobile phase was 0.9 mL/min. Column temperature was 34 °C. Gradient run time was 37 min. The proportion of triethylamine in the mobile phase was 0.3%. Drift tube temperature was 100 °C. Flow rate of nitrogen gas was 1.8 L/min.
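The linear gradient of method C can be expressed as a simple interpolation, which may help when transferring the program to another instrument. The sketch below uses the initial and final solvent B contents and the run time listed above; the function name is illustrative, and the 60% B column wash and the re-equilibration step are deliberately left out.

```python
def percent_b(t, initial_b=85.0, final_b=76.0, run_time=37.0):
    """Solvent B content (%) at time t (min) during the linear gradient."""
    if not 0 <= t <= run_time:
        raise ValueError("t must lie within the gradient run time")
    return initial_b + (final_b - initial_b) * t / run_time

b_start = percent_b(0.0)
b_mid = percent_b(18.5)
b_end = percent_b(37.0)
```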
The linearity of the method was confirmed by establishing calibration curves for each sugar. The calibration curve equations and their corresponding determination coefficients, analytical ranges, limits of detection and limits of quantification are listed in Table 7. The correlation coefficient values (R > 0.9990) indicate a high level of linearity for each sugar. The results of analytical precision, stability and accuracy of this method are summarized in Table 8. All the RSD values were less than 5%, which indicates that the precision of the method met the requirements of the analysis and that the sample solutions were stable within 24 h. The average recovery of sugars at three concentration levels ranged from 95.4% to 105.4%, and the RSD values ranged from 1.69% to 3.65%, which indicates that the method had good accuracy.

Conclusion
The present paper describes a novel AQbD approach to develop robust analytical methods. In this approach, data collected from screening experiments were used to determine CMAs and CMPs in sequence. The development of the HPLC-ELSD method for the quantification of sugars in extract solutions of Codonopsis Radix and Astragali Radix was used as an example. Potential CMAs and potential CMPs were obtained after analytical target profiling. A two-level fractional designed experiment was employed to select the CMPs and CMAs. Three CMAs were determined: the retention time of the D-glucose peak in CRE samples, the SNR of the D-glucose peak in CRE samples, and the retention time of the sucrose peak in ARE samples. Four CMPs, namely the initial and final solvent B content in the mobile phase, flow rate, and column temperature, were identified with a standard partial regression coefficient method. After the Box-Behnken experiments, the quantitative models between CMPs and CMAs were successfully constructed. The design space was then calculated using a Monte-Carlo simulation method and verified experimentally. A set of analytical conditions with a high probability of attaining the CMA goals was recommended, including initial solvent B content in the mobile phase of 85%, final solvent B content in the mobile phase of 76%, flow rate of mobile phase of 0.9 mL/min, and column temperature of 34 °C. The developed method was validated successfully and applied to simultaneously determine the contents of D-fructose, D-glucose and sucrose in CRE and ARE samples.

Supporting information S1