Automatic Rietveld refinement by robotic process automation with RIETAN-FP

ABSTRACT Rietveld analysis requires the manual, trial-and-error refinement of various parameters. To reduce the associated human cost, we have developed a robotic process automation (RPA) system for the Rietveld analysis program RIETAN-FP. Our RPA programs determine the background parameters automatically and use black-box optimization to search for peak profile parameters that yield a small weighted-profile R factor (Rwp). We evaluated the programs on X-ray powder diffraction patterns of anatase TiO2, Ca5(PO4)3F, and BaSO4 and verified that RPA can automate Rietveld analysis in simple cases. This indicates that RPA can be used to conduct black-box optimization with graphical user interface (GUI) applications in materials science.


Introduction
Rietveld analysis is an indispensable technique in materials research [1][2][3][4][5][6]. Because the crystal structure can be determined from powder diffraction data, it has been widely used for the structural analysis of crystalline materials, including those for which single crystals cannot be obtained. A Rietveld refinement involves refining various parameters, such as lattice constants, peak profile functions, and background. However, refining these parameters requires tedious manual trial and error, which consumes considerable human effort and time, and in many instances a Rietveld analysis specialist is needed to complete the refinement. To automate this process, we present a robotic process automation (RPA) system that estimates the parameters for Rietveld analysis on a personal computer.
RPA can automate routine mouse operations on a personal computer [7,8]. Because such automation accelerates routine tasks and reduces human error, RPA is attracting considerable attention in industry [9,10]. Accordingly, combining RPA with artificial intelligence (AI) or machine learning (ML) is expected to be fruitful in several disciplines: even when a task requires human judgment, an AI/ML-coupled RPA system can perform it automatically by substituting AI or ML for human thought.
In materials science, many methods have been developed that use AI or ML in place of human reasoning, for example, to select candidates for subsequent experiments or simulations that improve material properties. Such methods are known as black-box optimization [11][12][13][14][15], and numerous successful applications in materials science have been reported [16][17][18][19][20]. Black-box optimization has also been used to accelerate Rietveld analysis by determining background functions, lower/upper bounds for 2θ, etc. [21]. In addition, automation tools for Rietveld analysis have been actively developed over the past decade [22][23][24][25]. The current study develops one such tool by combining a black-box optimization technique with RPA. Compared with previous automation tools, we consider RPA, whose automated operations are visually easy to follow on a personal computer, to offer the following benefits: (1) Operations that have so far been performed by hand in reliable, long-standing Rietveld analysis software can be automated. Because these on-screen operations are familiar, errors in the refinement process itself are easily detected by monitoring them, and the RPA programs are easy to improve. (2) Visualization of the automated procedures is useful for education and for training in the use of RIETAN-FP. (3) Because RPA automation is simple to comprehend, the barrier to introducing an RPA system for Rietveld analysis is low.
Although numerous tools for Rietveld analysis exist, the current study focuses on RIETAN-FP developed by Izumi [26], because RIETAN-FP is a very reliable, long-standing software package with many users. Accordingly, the proposed RPA programs for Rietveld analysis were developed for RIETAN-FP (Figure 1). The RPA programs are based on the Sikulix [27] framework, whose scripts can be written in Python. The constructed RPA programs are of two types: the first (run_background_rietan.sikuli) automatically estimates the background parameters of the input intensities, whereas the second (run_combo_rietan.sikuli) automatically performs black-box optimization to estimate the peak profile parameters in Rietveld refinement. These RPA programs target laboratory-obtained X-ray powder diffraction patterns and assume that a structure model can be predetermined. In addition, only the single-phase case is considered, and the split pseudo-Voigt function is used as the profile function [28]. In other words, we assume that a crystallographic information file (CIF) of the target compound exists. By reading the CIF with VESTA [29], an INS file, which is the input file for RIETAN-FP, is obtained. Using the INS file generated in this way, RPA performs the background analysis and peak profile parameter optimization with RIETAN-FP. To demonstrate the effectiveness of our RPA, Rietveld analyses of anatase TiO2, Ca5(PO4)3F, and BaSO4 were performed using the developed RPA programs, and suitable outcomes were obtained. The developed RPA systems are available at https://github.com/tsudalab/RIETAN-RPA.

Rietveld analysis by RIETAN-FP using RPA
Section 2.1 describes the flow of Rietveld analysis by RIETAN-FP using RPA, including the preparation of the target data. The two developed programs are described in detail in Secs. 2.2 and 2.3.

Flow of Rietveld analysis by RIETAN-FP
Rietveld analysis by RIETAN-FP and the developed RPA programs is performed in accordance with the following procedures.
(1-i) Target.int contains the powder diffraction data, specifically, two-dimensional data of 2θ (deg) and intensity (counts).
(1-ii) VESTA converts a CIF into an INS file for RIETAN-FP containing the crystallographic information of the target compound and various basic initial parameters. The INS file is renamed Target.ins. This file is the input file for RIETAN-FP and contains the information needed for the subsequent processes.
(1-iii) In the INS file (Target.ins), the variable 'NBEAM' is set appropriately. This parameter specifies the type of radiation used to measure the powder diffraction: NBEAM = 0, 1, and 2 indicate neutron, X-ray, and synchrotron X-ray powder diffraction, respectively. Note that the released version of our RPA codes is intended for conventional X-ray powder diffraction, that is, NBEAM = 1.
(1-iv) The type of profile function is selected in the INS file. RIETAN-FP implements four kinds of profile functions, selected by 'NPRFN': NPRFN = 0, 1, 2, and 3 correspond to the pseudo-Voigt, split pseudo-Voigt, modified split pseudo-Voigt, and split Pearson VII functions, respectively. Here, we use the split pseudo-Voigt function, i.e., NPRFN = 1.
(1-vi) The first RPA program (run_background_rietan.sikuli) is executed step by step to estimate appropriate background parameters. In RIETAN-FP, a Legendre polynomial expansion is used as the background function:

$$y_{\mathrm{b}}(2\theta_i) = \sum_{j=0}^{M-1} b_j F_j(q_i),$$

where F_j is the Legendre polynomial of degree j and q_i is 2θ_i normalized to the range [−1, 1]. Up to M = 12 terms can be used, and the role of this RPA program is to determine the coefficients b_j (j = 0, …, 11). Target.ins, Target.int, and RIETAN.bat, the RIETAN-FP batch file, are stored in the same folder, and this folder is kept open on the screen. Because the most recent version of RIETAN-FP uses RIETAN.command instead, RIETAN.bat is distributed alongside the RPA programs.
(1-vii) The second RPA program (run_combo_rietan.sikuli) is executed to determine the peak profile parameters, such as 'U', 'V', and 'W', which determine the full width at half maximum of the profile function, using Bayesian optimization. In addition to the files prepared in (1-vi), COMBO.exe, data.csv, result.txt, result_opt.txt, and opt_para.txt are stored in the same folder, and this folder is kept open on the screen.
Notably, if an incorrect CIF is prepared for the target pattern, the resulting Rwp will be high or many refinements will diverge. Thus, this procedure can also be used to check whether the selected CIF is a suitable model for the data.
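To make the background model in (1-vi) concrete, the Python sketch below evaluates a Legendre-polynomial background for a given set of coefficients b_j with NumPy. It assumes that q_i is a linear mapping of 2θ_i onto [−1, 1] over the measured range; the function name and the example coefficients are hypothetical, and RIETAN-FP's own implementation may differ in detail.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_background(two_theta, coeffs):
    """Evaluate y_b(2θ) = sum_j b_j F_j(q) with Legendre polynomials F_j.

    Assumption: q is a linear map of 2θ onto [-1, 1] over the measured range.
    """
    two_theta = np.asarray(two_theta, dtype=float)
    if len(coeffs) > 12:
        raise ValueError("at most 12 coefficients (b_0 ... b_11) are used")
    lo, hi = two_theta.min(), two_theta.max()
    q = (2.0 * two_theta - (hi + lo)) / (hi - lo)  # maps [lo, hi] onto [-1, 1]
    return legendre.legval(q, coeffs)  # sum of coeffs[j] * P_j(q)


two_theta = np.linspace(10.0, 90.0, 801)
bkg = legendre_background(two_theta, [120.0, -15.0, 4.0])  # hypothetical b_0, b_1, b_2
```

Because every Legendre polynomial equals 1 at q = 1 and (−1)^j at q = −1, the background at the two ends of the range is simply the sum and the alternating sum of the coefficients, which is a quick sanity check.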

RPA program to determine the background parameters
To determine the background parameters automatically, an RPA program (run_background_rietan.sikuli) is proposed. When this program is run, the following procedure is executed on the Windows computer.
(2-i) RIETAN-FP can either update the INS file or leave it unchanged when it is executed; this is controlled by 'NUPDT'. NUPDT = 0 means that the INS file is not updated, and NUPDT = 1 means that it is updated. Initially, NUPDT is set to 1 so that the INS file is updated. In addition, the variable 'SCALE' holds the scale factor that scales the calculated intensities; the scale factor is refined by setting SCALE = 1. 'NVOXA', 'NVOXB', and 'NVOXC' are parameters used for maximum entropy method (MEM) analysis. Because our RPA does not perform MEM analysis, these parameters are set to 0. Furthermore, RIETAN-FP allows up to three different preferred-orientation vectors to be selected, where 'IHP', 'IKP', and 'ILP' in the INS file represent their components along the a-, b-, and c-axes, respectively. Here, IHP1 = 0, IKP1 = 1, and ILP1 = 0, so that the first preferred-orientation vector lies along the b-axis.
(2-ii) In RIETAN-FP, the flag of a parameter to be refined is set to 1, and that of a parameter to be fixed is set to 0. The first three bits of the last bit sequence on the line labeled 'CELLQ' determine whether the lattice constants are refined. This bit sequence is changed to 1110000 to refine the lattice constants. Because the purpose of this program is to determine the background, the profile function is not refined; therefore, the last bits of the profile function parameters 'FWHM12', 'ASYM12', and 'ETA12' are all set to 0.
(2-iii) RIETAN-FP is run to determine the background. Because refining all 12 background parameters at once is likely to trap the refinement in a local minimum, they are refined in stages. First, only b_0 and b_1 are refined. The number of refined parameters is then increased by two at each stage until all b_j are determined.
(2-iv) After the background is subtracted, the profile parameters are refined by executing RIETAN-FP. At this stage, a sufficiently small Rwp is not yet obtained.
(2-v) NUPDT is set to 0 to prevent the INS file from being updated in this state.
More specific operations performed by RPA are summarized in Table 1, and the operation is shown in Supplemental Movie 1.
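The staged strategy in (2-iii) and the keyword changes in (2-i) and (2-v) can be sketched in Python as follows. This is not how the RPA program works (it edits Target.ins through the GUI); both helpers are hypothetical. background_schedule only generates the 0/1 flags of b_0, …, b_11 for each stage, and set_keyword assumes that INS assignments look like 'NUPDT = 1: comment' at the start of a line, which may not match every INS file.

```python
import re


def background_schedule(n_total=12, step=2):
    """Return one list of 0/1 refinement flags (b_0 ... b_{n_total-1}) per stage.

    Stage k refines the first k * step coefficients and fixes the rest,
    mirroring the order described in (2-iii).
    """
    if n_total % step != 0:
        raise ValueError("n_total must be a multiple of step")
    return [
        [1 if j < n_refined else 0 for j in range(n_total)]
        for n_refined in range(step, n_total + 1, step)
    ]


def set_keyword(ins_text, name, value):
    """Hypothetical helper: change the first 'NAME = old' assignment to 'NAME = value'.

    Assumes assignments such as 'NUPDT = 1: comment' start a line;
    raises KeyError if no such assignment is found.
    """
    pattern = re.compile(
        rf"^(\s*{re.escape(name)}\s*=\s*)(\S+?)(?=[\s:]|$)", re.MULTILINE
    )
    new_text, n_subs = pattern.subn(rf"\g<1>{value}", ins_text, count=1)
    if n_subs == 0:
        raise KeyError(name)
    return new_text


stages = background_schedule()
for stage, flags in enumerate(stages, start=1):
    print(f"stage {stage}: {''.join(map(str, flags))}")

ins = "NBEAM = 1: X-ray\nNUPDT = 1: update INS\n"
ins = set_keyword(ins, "NUPDT", 0)  # corresponds to step (2-v)
```

Six stages result for 12 coefficients refined two at a time, the last of which frees all b_j.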

RPA program to determine the peak profile parameters
To perform black-box optimization and determine peak profile parameters with a small Rwp, we must first define a search space of candidate initial values of the peak profile parameters. High-dimensional black-box optimization problems are well known to be difficult to solve. For the split pseudo-Voigt function, 10 peak profile parameters must be refined. Specifically, RIETAN-FP has three parameters, 'U', 'V', and 'W', that determine the full width at half maximum of the split pseudo-Voigt function; three asymmetry parameters, 'a0', 'a1', and 'a2'; and four decay parameters, 'eta_L0' and 'eta_L1' for low-angle decay and 'eta_H0' and 'eta_H1' for high-angle decay. However, a ten-dimensional search space is too large for black-box optimization to handle. We therefore assume that the initial values of the asymmetry and decay parameters share the same positive value, except for a2, which is assigned the negative of that value. We thus created the four-dimensional search space D by combining U ∈ {−1, −0.1, −0.01, −0.001, 0.001, 0.01, 0.1, 1}, V ∈ {−1, −0.1, −0.01, −0.001, 0.001, 0.01, 0.1, 1}, …
Table 1. Detailed procedures of run_background_rietan.sikuli to optimize the background function.
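The construction of the four-dimensional search space D can be sketched in Python as a Cartesian product of candidate lists, together with the expansion of a point (U, V, W, s) into the 10 profile parameters, where s is the shared asymmetry/decay value. The U and V lists are taken from the text; the lists for W and s are not given in this excerpt, so the values below are hypothetical placeholders.

```python
import itertools

# From the text: candidate initial values for U and V.
U_VALUES = [-1, -0.1, -0.01, -0.001, 0.001, 0.01, 0.1, 1]
V_VALUES = list(U_VALUES)
# Hypothetical placeholders: the candidate lists for W and for the shared
# asymmetry/decay value s are not given in this excerpt.
W_VALUES = [0.001, 0.01, 0.1, 1]
S_VALUES = [0.001, 0.01, 0.1, 1]


def expand(u, v, w, s):
    """Map a point (U, V, W, s) of D to the 10 split pseudo-Voigt parameters."""
    return {
        "U": u, "V": v, "W": w,
        "a0": s, "a1": s, "a2": -s,   # a2 takes the negative of the shared value
        "eta_L0": s, "eta_L1": s,     # low-angle decay
        "eta_H0": s, "eta_H1": s,     # high-angle decay
    }


# Every combination of candidate values: the search space D.
SEARCH_SPACE = list(itertools.product(U_VALUES, V_VALUES, W_VALUES, S_VALUES))
```

Tying seven parameters to one value s is what reduces the problem from ten to four dimensions; with these placeholder lists, D has 8 × 8 × 4 × 4 = 1024 points.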
To estimate the profile parameters automatically, an RPA program (run_combo_rietan.sikuli) is proposed. Bayesian optimization (BO) is used as the black-box optimization technique to optimize the profile parameters [30,31]. For BO, the Windows application COMBO.exe [32], which is based on the Bayesian optimization package COMBO, is used. Given the search space, BO efficiently proposes the next candidate parameters that are expected to yield a smaller Rwp. The search space is provided as data.csv, which contains the parameter space D′ and the previously obtained values of Rwp. Note that COMBO maximizes its objective function; therefore, the negative Rwp is used here as the objective function. When this RPA program is run, the following procedure is executed on the Windows computer.
(3-i) COMBO.exe is executed to retrieve the next candidate profile parameters from D′. If fewer than five parameter sets have been evaluated, the next candidate parameters are selected at random from the search space. The selected candidate profile parameters are output in the Command Prompt window.
(3-ii) The candidate peak profile parameters proposed by COMBO.exe are written into the INS file. Here, RPA automatically translates the candidate from D′ to D.
(3-iii) RIETAN-FP is executed, and the Rwp value and the converged peak profile parameters are read from the Target.lst file. If the Rietveld refinement diverges, Target.lst contains no Rwp; in this case, Rwp is set to 100.
(3-iv) The resulting negative Rwp is written into data.csv in the row of the evaluated peak profile parameters.
(3-v) The candidate peak profile parameters and the negative Rwp obtained by RIETAN-FP are written to result.txt, and the refined peak profile parameters and the negative Rwp to result_opt.txt. These text files serve as a record of the results.
(3-vi) Steps (3-i) through (3-v) are repeated. Finally, the peak profile parameters with the smallest Rwp are written to opt_para.txt.
More specific operations performed by RPA are summarized in Table 2, and an example run is shown in Supplemental Movie 2.
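The cycle (3-i) to (3-vi) can be summarized by the Python sketch below. It is not the RPA program itself, which drives COMBO.exe, Excel, and Notepad through the GUI. The evaluate function, which returns the text of Target.lst, is hypothetical, and parse_rwp assumes that Target.lst reports the R factor as 'Rwp = <number>'; the actual format may differ. The BO step is represented by a placeholder propose callable. When it is omitted, every cycle uses random sampling, which corresponds to the RS baseline in the results.

```python
import random
import re

DIVERGED_RWP = 100.0  # Rwp assigned when Target.lst contains no Rwp, as in (3-iii)


def parse_rwp(lst_text):
    """Hypothetical parser: assumes Target.lst reports 'Rwp = <number>'.

    The last match is used; if none is found, the run is treated as diverged.
    """
    matches = re.findall(r"Rwp\s*=\s*(\d+(?:\.\d*)?)", lst_text)
    return float(matches[-1]) if matches else DIVERGED_RWP


def run_loop(search_space, evaluate, n_cycles=30, n_random=5, propose=None, seed=0):
    """Sketch of the (3-i) to (3-vi) cycle.

    evaluate(candidate) -> text of Target.lst (stands in for running RIETAN-FP).
    propose(history, remaining) -> next candidate (stands in for COMBO.exe).
    With propose=None, every cycle uses random sampling (RS).
    """
    rng = random.Random(seed)
    remaining = list(search_space)
    if not remaining:
        raise ValueError("empty search space")
    history = []  # (candidate, objective) pairs; objective = -Rwp, as in data.csv
    for cycle in range(min(n_cycles, len(remaining))):
        if cycle < n_random or propose is None:
            candidate = remaining.pop(rng.randrange(len(remaining)))
        else:
            candidate = propose(history, remaining)
            remaining.remove(candidate)
        rwp = parse_rwp(evaluate(candidate))
        history.append((candidate, -rwp))  # COMBO maximizes, so store -Rwp
    best_candidate, best_objective = max(history, key=lambda item: item[1])
    return history, best_candidate, -best_objective  # best Rwp, as in opt_para.txt
```

Mapping divergence to Rwp = 100 keeps failed runs in the history as very poor points instead of dropping them, so the proposer still learns to avoid those regions.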

Target materials
X-ray powder diffraction patterns of anatase TiO2, Ca5(PO4)3F, and BaSO4 were used to demonstrate the effectiveness of our developed RPA programs. The details are as follows.

Anatase TiO2
This example is the X-ray powder diffraction pattern of the anatase structure (space group: I41/amd). Anatase (2N grade) was purchased from Kojundo Chemical Lab., Inc., Japan. The X-ray source is Cu Kα radiation with R12 = 0.5 and CTHM1 = 0.7998. The initial structural parameters were obtained from the Materials Project database [33]. The intensity data were measured at room temperature using a Rigaku MiniFlex600 benchtop powder XRD instrument.

Ca5(PO4)3F
This example is the X-ray powder diffraction pattern of a hexagonal structure (space group: P63/m). The X-ray source is Cu Kα radiation with R12 = 0.5 and CTHM1 = 0.7998. The initial structural parameters and intensity data were extracted from a sample file distributed with the RIETAN-FP program [34].

Table 2. Detailed procedures of run_combo_rietan.sikuli to optimize peak profile parameters.
1. Open the Command Prompt, drag and drop COMBO.exe onto the Command Prompt window, and press the Enter key.
2. Copy the selected profile parameters output in the Command Prompt window.
5. Read the Rwp value and the converged peak profile parameters from the Target.lst file generated by RIETAN-FP in Notepad.
6. Write the negative Rwp into data.csv in the row of the evaluated peak profile parameters in Excel.
7. Write the candidate peak profile parameters and the negative Rwp obtained by RIETAN-FP into result.txt in Notepad.
8. Write the refined peak profile parameters and the negative Rwp into result_opt.txt in Notepad.
9. Write the peak profile parameters with the smallest Rwp into opt_para.txt in Notepad.

BaSO4
This example is the X-ray powder diffraction pattern of an orthorhombic structure (space group: Pnma). The X-ray source is Cu Kα radiation with R12 = 0.0 and CTHM1 = 0.78987. The initial structural parameters were obtained from SpringerMaterials [35]. The intensity data were extracted from a sample file distributed with the RIETAN-FP program [34].
Figure 2(a-c) shows the minimum Rwp obtained so far as a function of the number of cycles used to search for peak profile parameters with a small Rwp. In the first five cycles, the parameters are selected at random, and BO is used for the remaining 25 cycles. The results of five independent runs with different initial parameters are averaged. Here, BO is compared with random sampling (RS), in which the next candidate parameters are selected at random in every cycle. There is no significant difference between the two methods, but both yield a steady decrease in Rwp via RPA. For anatase TiO2, RPA quickly identifies better parameters with either method. In contrast, for Ca5(PO4)3F and BaSO4, the parameters found by BO after 30 cycles are superior to those found by RS. The histograms of Rwp, which include the samples from all five independent runs, are summarized in Figure 2(d-f). BO yields more samples with low Rwp values than RS and, in particular, fewer non-convergent samples. Thus, BO tends to identify better parameters that result in a low Rwp. The results of Rietveld analysis with the optimal parameters determined by RPA using BO are summarized in Figure 3. Anatase clearly converges faster than the other two compounds, because its pattern has few peaks and the problem is simple; longer optimization is required when the number of peaks is large. For anatase TiO2 and Ca5(PO4)3F, the obtained Rwp is small enough for a correct Rietveld analysis. In contrast, for BaSO4, Rwp is not sufficiently small.
This is because the preferred-orientation parameter was not refined by this RPA. For crystals with anisotropy, a better Rwp can be obtained by refining the preferred-orientation parameter. For BaSO4, Rwp = 9.323% is obtained by refining the orientation parameters starting from the optimal parameters found by our RPA. To refine the orientation parameter, the last bits of 'PREF' are changed to 010000. This refinement is not included in our RPA and must therefore be done manually. If the results obtained by RPA are unsatisfactory, manual optimization by a Rietveld analysis specialist is necessary. Even so, the developed RPA programs can significantly reduce the specialist's workload.
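The convergence curves in Figure 2(a-c) plot, for each cycle, the smallest Rwp found so far, averaged over independent runs, and the histograms single out non-convergent samples. The Python sketch below shows one way to compute these quantities from recorded Rwp sequences. The data are made up for illustration, and non-convergent samples are assumed to carry the placeholder Rwp = 100 used in (3-iii).

```python
import itertools
import statistics

DIVERGED_RWP = 100.0  # placeholder Rwp for non-convergent runs, as in (3-iii)


def best_so_far(rwp_values):
    """Running minimum: the smallest Rwp obtained up to each cycle."""
    return list(itertools.accumulate(rwp_values, min))


def mean_curve(runs):
    """Average the running-minimum curves of several runs, cycle by cycle."""
    curves = [best_so_far(run) for run in runs]
    return [statistics.mean(values) for values in zip(*curves)]


def count_diverged(runs):
    """Number of non-convergent samples across all runs."""
    return sum(value == DIVERGED_RWP for run in runs for value in run)


# Made-up Rwp sequences (%) for two runs of four cycles each.
runs = [[20.0, 15.0, 100.0, 12.0], [18.0, 100.0, 14.0, 16.0]]
curve = mean_curve(runs)
```

A running minimum never increases, so each averaged curve is non-increasing even when individual cycles diverge.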

Optimization results
In addition, the sampling results reveal the correlations between the refined parameters. Figure 4 shows a heat map of the correlations between the refined parameters obtained in one of the independent runs. Large positive or negative correlations exist between many parameters. This information is useful for performing Rietveld analysis manually after RPA execution; that is, local minima can be avoided by adjusting the peak profile parameters so as to break these correlations.
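A correlation matrix like the one behind Figure 4 can be computed from the sampled refined parameters with NumPy, as in the sketch below. It assumes one row per RIETAN-FP run and one column per refined parameter; the synthetic data and the 0.8 threshold are hypothetical, plotting the heat map is omitted, and in practice non-convergent samples (Rwp = 100) would likely be excluded first.

```python
import numpy as np


def parameter_correlations(samples):
    """Pearson correlation matrix; rows are samples, columns are refined parameters."""
    samples = np.asarray(samples, dtype=float)
    return np.corrcoef(samples, rowvar=False)


def strong_pairs(corr, names, threshold=0.8):
    """List parameter pairs whose absolute correlation is at least `threshold`."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Synthetic example: V is constructed to anti-correlate with U; W is independent.
rng = np.random.default_rng(0)
u = rng.normal(size=50)
v = -2.0 * u + 0.05 * rng.normal(size=50)
w = rng.normal(size=50)
names = ["U", "V", "W"]
corr = parameter_correlations(np.column_stack([u, v, w]))
print(strong_pairs(corr, names))
```

Listing the strongly correlated pairs gives a short checklist of parameters to decouple, for example by fixing one of them, during a subsequent manual refinement.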

Discussion and summary
In this study, RPA programs that run RIETAN-FP with black-box optimization were developed to automate Rietveld analysis. The constructed RPA programs are of two types: the first estimates the background parameters of the input intensities, whereas the second performs black-box optimization to estimate appropriate peak profile parameters. The developed RPA programs target laboratory-obtained X-ray powder diffraction patterns and assume that a structure model can be predetermined. Furthermore, only the single-phase case is considered, and the split pseudo-Voigt function is used as the peak profile function. The developed programs were tested using the X-ray powder diffraction patterns of anatase TiO2, Ca5(PO4)3F, and BaSO4. As a result, for anatase TiO2 and Ca5(PO4)3F, the Rietveld refinement was successful, and automation of Rietveld