Direct Methods Optimised for Solving Crystal Structure by Powder Diffraction Data: Limits, Strategies, and Prospects

Ab-initio crystal structure solution from powder diffraction data is demanding because the three-dimensional experimental information collapses onto the one-dimensional 2θ axis of the pattern. Different strategies are described that aim to improve the extraction of the integrated intensities from the experimental pattern, in order to make structure solution by direct methods more straightforward. Particular attention is devoted to the EXPO program: some of its features are analysed and results are shown.

values, it is difficult to estimate the noise contribution correctly; c) preferred orientation: the crystallites are not always randomly oriented, and this modifies the ratios between the experimental intensities. Because of these problems, the estimation of |F_h| is a crucial step in ab-initio solution from powder data: the more reliable the extracted integrated intensities, the higher the probability of solving the structure.

The Integrated Intensity Extraction Process
Two methods are widely used for extracting the integrated intensities from the powder pattern: the Pawley method [3] and the Le Bail method [4].
The Pawley method is based on a non-linear least-squares procedure in which the integrated intensities are refined together with the profile parameters. Because of peak overlap, the least-squares refinement is often unstable and can yield negative integrated intensities, which must be discarded. For this reason the method needs positivity constraints [5], [6].
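The sketch below makes the positivity issue concrete. It fits integrated intensities for a cluster of overlapping peaks by least squares, with the constraint that every intensity is ≥ 0. It is a simplification and not the Pawley method as implemented in ALLHKL. The following choices are all assumptions made for illustration: the Gaussian peak shape, the fixed peak positions and width (a real Pawley fit also refines profile and cell parameters, which makes it non-linear), and the projected coordinate-descent solver. The function names are hypothetical.

```python
import math


def gaussian(x, centre, sigma):
    """Unit-area Gaussian peak profile (assumed peak shape)."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_intensities_nonneg(two_theta, y_net, centres, sigma, n_sweeps=500):
    """Least-squares integrated intensities for fixed peak profiles, constrained to be >= 0.

    y_net holds background-subtracted counts; the constrained quadratic problem is
    solved by projected coordinate descent on the normal equations.
    """
    cols = [[gaussian(x, c, sigma) for x in two_theta] for c in centres]
    n = len(centres)
    # Normal equations: A = Phi^T Phi, b = Phi^T y (Phi = matrix of peak profiles)
    a = [[sum(p * q for p, q in zip(cols[j], cols[k])) for k in range(n)] for j in range(n)]
    b = [sum(p * y for p, y in zip(cols[j], y_net)) for j in range(n)]
    intens = [0.0] * n
    for _ in range(n_sweeps):
        for j in range(n):
            # Exact minimiser along coordinate j, then projection onto intensity >= 0
            grad_j = sum(a[j][k] * intens[k] for k in range(n)) - b[j]
            intens[j] = max(0.0, intens[j] - grad_j / a[j][j])
    return intens


if __name__ == "__main__":
    grid = [19.8 + 0.005 * k for k in range(91)]
    centres, sigma = [20.00, 20.05], 0.03
    # Noise-free pattern built from intensities 100 and 30
    y = [100 * gaussian(x, 20.00, sigma) + 30 * gaussian(x, 20.05, sigma) for x in grid]
    print(fit_intensities_nonneg(grid, y, centres, sigma))
    # A pattern whose unconstrained fit needs a negative second intensity
    y_bad = [100 * gaussian(x, 20.00, sigma) - 5 * gaussian(x, 20.05, sigma) for x in grid]
    print(fit_intensities_nonneg(grid, y_bad, centres, sigma))
```

In the second case the unconstrained solution would be (100, -5). The constrained fit sets the second intensity to zero instead, and the first reflection absorbs the best non-negative compromise.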
The Le Bail method is an iterative decomposition algorithm based on the Rietveld formula [7]. The integrated intensity is calculated according to:

$$I_{\mathbf h} = \sum_i \frac{y_{\mathrm{calc}}(i,\mathbf h)\,\left[y_{\mathrm{obs}}(i) - y_{\mathrm b}(i)\right]}{\sum_{\mathbf k} y_{\mathrm{calc}}(i,\mathbf k)}$$

The summation over i runs over the peak range. y_obs(i) is the experimental count at the angle 2θ_i, and y_b(i) is the background contribution. y_calc(i, h) is the count calculated at 2θ_i due to reflection h, and the sum over k runs over all reflections contributing at 2θ_i. The Le Bail method starts from arbitrary values of the integrated intensities (for example, all equal). The formula is then applied cyclically, with each cycle computing y_calc(i, h) from the intensities of the previous one. The method converges rapidly and gives positive values if the background is properly estimated. However, it tends to equipartition the intensity among strongly overlapping reflections.
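The following is a minimal sketch of the Le Bail iteration above. It assumes Gaussian peaks of known position and width, with profiles normalized to unit sum over the sampled points. Under that assumption y_calc(i, h) = I_h·profile_h(i), and the formula returns intensities on the same scale as the counts. The peak shape, the sampling and the names `unit_profile` and `le_bail` are illustrative assumptions, not the EXTRA implementation.

```python
import math


def unit_profile(two_theta, centre, sigma):
    """Gaussian peak sampled on the 2theta grid and scaled to unit sum (assumed shape)."""
    g = [math.exp(-0.5 * ((x - centre) / sigma) ** 2) for x in two_theta]
    s = sum(g)
    return [v / s for v in g]


def le_bail(two_theta, y_obs, y_b, centres, sigma, start, n_cycles=500):
    """Iterate I_h <- sum_i y_calc(i,h) (y_obs(i) - y_b(i)) / sum_k y_calc(i,k)."""
    profiles = [unit_profile(two_theta, c, sigma) for c in centres]
    intens = list(start)
    for _ in range(n_cycles):
        new = [0.0] * len(centres)
        for i in range(len(two_theta)):
            # y_calc(i, h) for every reflection, from the previous cycle's intensities
            contrib = [intens[h] * profiles[h][i] for h in range(len(centres))]
            total = sum(contrib)
            if total <= 0.0:
                continue  # no reflection contributes at this point
            net = y_obs[i] - y_b[i]
            for h, c in enumerate(contrib):
                new[h] += c / total * net  # share of the net count assigned to h
        intens = new
    return intens


if __name__ == "__main__":
    grid = [19.8 + 0.005 * k for k in range(91)]
    bkg = [10.0] * len(grid)
    p1, p2 = unit_profile(grid, 20.00, 0.03), unit_profile(grid, 20.05, 0.03)
    y = [b + 1000 * a + 300 * c for b, a, c in zip(bkg, p1, p2)]
    # Partially overlapping peaks: the iteration converges towards 1000 and 300
    print(le_bail(grid, y, bkg, [20.00, 20.05], 0.03, [1.0, 1.0]))
    # Peaks modelled at the same position: equal starts give an equal split (650 each)
    print(le_bail(grid, y, bkg, [20.00, 20.00], 0.03, [1.0, 1.0]))
```

The second call shows the equipartition tendency in its extreme form. When two reflections have identical calculated profiles, nothing in the formula can separate them.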

The Direct Methods Efficiency With Powder Data
To assess which of the two methods is more suitable for combination with direct methods, it is useful to consider the reliability parameter R_F of the extracted amplitudes:

$$R_F = \frac{\sum_{\mathbf h} \left|\,|F_{\mathbf h}|_{\mathrm{extracted}} - |F_{\mathbf h}|_{\mathrm{true}}\right|}{\sum_{\mathbf h} |F_{\mathbf h}|_{\mathrm{true}}}$$

Here the summation is over the reflections. |F_h|_extracted is the structure factor modulus extracted by one of the two methods, and |F_h|_true is the modulus calculated from the published atomic parameters. The profile reliability parameter

$$R_P = \frac{\sum_i \left|y_{\mathrm{obs}}(i) - y_{\mathrm{calc}}(i)\right|}{\sum_i y_{\mathrm{obs}}(i)}$$

is also considered. Here the summation extends over the profile counts, and y_obs(i) and y_calc(i) are the observed and calculated counts, respectively.
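Both indicators translate directly into code. In the sketch below (function names hypothetical), extracted moduli are taken as the square roots of extracted intensities. This is an assumption that ignores Lorentz-polarization and multiplicity corrections. The usage lines show why a perfect profile fit does not guarantee good amplitudes. For two exactly coincident reflections, any split of their summed intensity reproduces the profile equally well (R_P = 0). Yet an equal split of a true 100:0 pair gives R_F = 1.

```python
import math


def r_f(f_extracted, f_true):
    """R_F = sum_h | |F_h|_extracted - |F_h|_true | / sum_h |F_h|_true."""
    num = sum(abs(abs(fe) - abs(ft)) for fe, ft in zip(f_extracted, f_true))
    return num / sum(abs(ft) for ft in f_true)


def r_p(y_obs, y_calc):
    """R_P = sum_i |y_obs(i) - y_calc(i)| / sum_i y_obs(i)."""
    num = sum(abs(yo - yc) for yo, yc in zip(y_obs, y_calc))
    return num / sum(y_obs)


if __name__ == "__main__":
    # Two coincident reflections with true intensities 100 and 0;
    # an equipartitioned extraction assigns 50 to each.
    true_f = [math.sqrt(100.0), 0.0]
    equal_split_f = [math.sqrt(50.0), math.sqrt(50.0)]
    print(r_f(equal_split_f, true_f))            # 1.0 (up to rounding)
    print(r_p([100, 200, 100], [110, 190, 100]))  # 0.05
```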
Table 1 gives crystallochemical information for some test structures: code name, space group, unit-cell content, experimental 2θ range and number of reflections in that range. They cover a fairly wide variety of cases.
Table 2 lists, for several test structures, the R_F and R_P values obtained from the integrated intensities extracted by the EXTRA program [8] and by the ALLHKL program [3]. EXTRA is a Le Bail-based package, whereas ALLHKL uses the Pawley method. The R_P values in Table 2 are small, but the R_F values are large (0.4 on average). This means that:
a) a low R_P value is a necessary but not sufficient condition for a reliable extraction;
b) the accuracy of the extracted integrated intensities is very low, which is why ab-initio solution from powder data is not straightforward.
Moreover, the R_F values from EXTRA are always smaller than those from ALLHKL. This would suggest that the Le Bail method should be preferred, although the behaviour may stem from the equipartition tendency of the Le Bail approach. Nevertheless, we showed that the statistical efficiency of direct methods improves when Le Bail-extracted intensities are used [9].

The Le Bail Method Advantage
The Le Bail method offers an important advantage: it is very sensitive to the starting point, so better starting intensities lead to better extracted intensities. This is shown in Table 3.
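The sensitivity to the starting point is easiest to see in the limiting case of two reflections with exactly the same peak position. The sketch below runs a simplified Le Bail cycle on a sampled profile; the profile values and function names are illustrative assumptions. It shows that the data then carry no information about how to split the intensity, so the cycle simply preserves the ratio of the starting values. Equal starting values reproduce the equipartition, while a better starting ratio is kept. For strongly but not exactly overlapping reflections, the data do carry some information on the split. However, the iteration moves away from the starting ratio only slowly, so after a limited number of cycles the result still depends on where it started.

```python
def le_bail_cycle(y_net, profiles, intens):
    """One Le Bail cycle on net counts, with y_calc(i, h) = I_h * profile_h(i)."""
    new = [0.0] * len(intens)
    for i, net in enumerate(y_net):
        contrib = [ih * p[i] for ih, p in zip(intens, profiles)]
        total = sum(contrib)
        if total > 0.0:
            for h, c in enumerate(contrib):
                new[h] += c / total * net
    return new


def run(y_net, profiles, start, n_cycles=20):
    """Apply n_cycles Le Bail cycles starting from the given intensities."""
    intens = list(start)
    for _ in range(n_cycles):
        intens = le_bail_cycle(y_net, profiles, intens)
    return intens


if __name__ == "__main__":
    shape = [0.1, 0.2, 0.4, 0.2, 0.1]        # one peak shape, unit sum (illustrative)
    profiles = [shape, shape]                # two reflections at the same position
    y_net = [1300.0 * s for s in shape]      # summed intensity 1300
    print(run(y_net, profiles, [1.0, 1.0]))   # about [650, 650]: equipartition
    print(run(y_net, profiles, [10.0, 3.0]))  # about [1000, 300]: starting ratio kept
```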

The EXPO Program
EXPO integrates the EXTRA and SIR-POW [10] programs; the latter is devoted to structure solution by direct methods. EXPO needs only minimal information: the experimental powder pattern, the cell parameters, the space group and the unit-cell content (Fig. 1 shows an example of a minimal EXPO input). Its main steps are:
1) Extraction of the integrated intensities (EXTRACTION routine);
2) Normalization of the extracted intensities (NORMALIZATION routine). The normalization rule requires that

$$\langle |E_{\mathbf h}|^2 \rangle = 1$$

where E_h is the normalized structure factor. The reflections with large |E| values are the statistically meaningful ones. The statistical analysis of the normalized structure factors can reveal the presence of pseudo-translational symmetry and/or preferred orientation.

3) Calculation of the structure invariant relationships, i.e. triplets and quartets (INVARIANT routine).
The statistical reliability of the structure invariants is taken into account. The phases thus obtained are used to calculate a Fourier map, whose maxima are searched for and interpreted chemically. The map is then optimized by combining successive structure factor calculations with preliminary least-squares cycles. EXPO can therefore reach the structure solution starting from minimal experimental information. Thanks to the sensitivity of the Le Bail method to the starting point, EXPO is more than a trivial combination of the two programs. It can feed information that becomes available during the structure solution process back into the extraction routine, to improve the estimates of the structure factor moduli.
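Steps 2 and 3 can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from EXPO:
- |E_h|² is obtained by dividing each intensity by the mean intensity of its resolution shell, using equal-count shells in (sinθ/λ)² and taking the ε_h factors as 1. This enforces ⟨|E_h|²⟩ = 1 shell by shell.
- The space group is P1.
- Triplet reliability is measured by Cochran's equal-atom parameter G = 2|E_h E_k E_{h+k}|/√N, where N is the number of atoms in the unit cell.

All function names are hypothetical.

```python
import math
from itertools import combinations


def normalize(refl, n_shells=2):
    """refl: list of (hkl, s2, intensity) with s2 = (sin(theta)/lambda)^2.

    Returns {hkl: |E|}, with |E|^2 = I / <I>_shell so that <|E|^2> = 1 in each shell.
    """
    ordered = sorted(refl, key=lambda r: r[1])
    n, e = len(ordered), {}
    for s in range(n_shells):
        shell = ordered[s * n // n_shells:(s + 1) * n // n_shells]
        if not shell:
            continue
        mean = sum(r[2] for r in shell) / len(shell)
        for hkl, _, inten in shell:
            e[hkl] = math.sqrt(inten / mean)
    return e


def neg(v):
    return tuple(-x for x in v)


def canon(v):
    """One representative for h and -h, which share |E| (Friedel's law)."""
    return max(v, neg(v))


def find_triplets(e, n_atoms, e_min=1.0):
    """Triplets phi_h + phi_k + phi_(-h-k) ~ 0 among reflections with |E| >= e_min.

    Returns (h, k, h+k, G) with Cochran's G = 2 |E_h E_k E_(h+k)| / sqrt(N).
    """
    strong = {canon(h): v for h, v in e.items() if v >= e_min}
    seen, found = set(), []
    for h, k in combinations(sorted(strong), 2):
        for kk in (k, neg(k)):                    # try both h + k and h - k
            hk = tuple(a + b for a, b in zip(h, kk))
            key = frozenset((h, k, canon(hk)))
            if canon(hk) not in strong or key in seen:
                continue                          # third reflection weak, or already found
            seen.add(key)
            g = 2.0 * strong[h] * strong[k] * strong[canon(hk)] / math.sqrt(n_atoms)
            found.append((h, kk, hk, g))
    return found


if __name__ == "__main__":
    data = [((1, 0, 0), 0.01, 400.0), ((0, 1, 0), 0.02, 225.0),
            ((1, 1, 0), 0.03, 100.0), ((2, 0, 0), 0.04, 4.0)]
    e = normalize(data)
    for trip in find_triplets(e, n_atoms=16, e_min=0.5):
        print(trip)
```

A triplet found from one pair is found again from the other two pairs of its members. The `seen` set keeps only the first occurrence.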

The Use of Prior Information
The following types of information provided by the solution process can be used as prior information to improve the extraction of the integrated intensities:
a) Pseudo-translational symmetry information [11].
When a structure shows pseudo-translational symmetry, a fraction of its electron density repeats itself after a shift u. This means that

$$\rho_p(\mathbf r) = \rho_p(\mathbf r + \mathbf u)$$

where ρ_p is the fraction p of the electron density that obeys the pseudo-symmetry and u is the pseudo-translation vector. In EXPO, the statistical analysis of the |E| values can reveal the presence of pseudo-symmetry. It can also recognize both the extent of the pseudo-symmetry (the fractional scattering power, FSP) and its type (the vector u). If pseudo-symmetry occurs, a coefficient α_h is associated with each reflection h. If α_h is equal to zero, the reflection is a superstructure reflection; otherwise it is a substructure reflection. In the pseudo-symmetry case the normalization rule is violated, and

$$\langle |E_{\mathbf h}|^2 \rangle = 1 + (\alpha_{\mathbf h} - 1)\,\mathrm{FSP}$$

This statistical information can be exploited in a subsequent intensity-recycled Le Bail extraction. In that extraction, the random starting integrated intensities are modulated by the term [1 + (α_h - 1)·FSP] of the previous formula. In this way the intensities of the substructure reflections are increased and those of the superstructure reflections are decreased. The new intensity estimates are more accurate than those from the traditional Le Bail extraction, and the phasing process gives better results.
b) Probabilistic estimate information [12].
In the INVARIANT routine, EXPO can provide a probabilistic estimate of each structure factor modulus from the triplet relationships, taking into account the positivity of the electron density in reciprocal space. This works in both the centric and the acentric case. An intensity-recycled Le Bail extraction can then be carried out using these statistical amplitude estimates as starting values.
c) The Patterson information [13].
EXPO can calculate a Patterson map from the integrated intensities obtained by a traditional Le Bail extraction. The map is modified (the origin peak is reduced and low-intensity points are set to zero) and then inverted. The squared structure factor moduli thus obtained can be used as the starting point of a new Le Bail extraction.
d) The located fragment information [14].
If a traditional Le Bail extraction run of EXPO locates a fragment correctly, the structure factor moduli calculated from the recognised atomic positions can be used as the starting point of an intensity-recycled Le Bail process.
In summary, the sensitivity of the Le Bail method to the starting point can be exploited by using different kinds of prior information to bring the starting point closer to the true one. In this way the extraction is more efficient and the structure solution becomes more reliable. Table 4 shows the results obtained with prior information. It gives the R_F values for the traditional Le Bail extraction (R_D) and for the use of pseudo-symmetry information (R_PSEUD), Patterson information (R_PATT) and probabilistic estimates (R_PROB). The last column corresponds to starting the Le Bail algorithm from the true intensities. The results show that prior information lowers R_F with respect to the traditional Le Bail extraction, bringing it closer to the values in the last column. Pseudo-symmetry information can be applied only when pseudo-symmetry is detected. Table 5 gives, for some test structures, the R_F values obtained with fragment information (R_FRAG). It also lists the R_D value of the traditional Le Bail extraction and the fragment selected in the asymmetric unit (with the corresponding percentage in parentheses). These results confirm the advantage of exploiting prior information. Prior information can therefore help when the solution obtained after a traditional Le Bail extraction is not reliable.
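Three of the starting-point builders described above can be sketched compactly:
- (a) modulation of starting intensities by [1 + (α_h - 1)·FSP];
- (c) modification and inversion of a Patterson map;
- (d) |F_h|² calculated from a located fragment.

To keep the sketch short and self-contained, it makes several assumptions that are not taken from EXPO. First, the pseudo-translation u is taken to be of order n (n·u a lattice vector). In that case α_h = n when h·u is an integer and 0 otherwise, which reproduces the rule that α_h = 0 marks superstructure reflections. Second, the Patterson map is one-dimensional, and its origin-damping and low-density cut-off parameters are invented for illustration. Third, the fragment is expanded to P1, with point atoms and no thermal motion. All names are hypothetical.

```python
import cmath
import math
from fractions import Fraction


# (a) Pseudo-translational symmetry; pass u as Fractions (or ints) for exact h.u
def alpha(h, u, order):
    """alpha_h = order if h.u is an integer (substructure reflection), else 0."""
    dot = sum(Fraction(hi) * Fraction(ui) for hi, ui in zip(h, u))
    return order if dot.denominator == 1 else 0


def modulate(start, reflections, u, order, fsp):
    """Scale starting intensities by the factor [1 + (alpha_h - 1) * FSP]."""
    return [i0 * (1 + (alpha(h, u, order) - 1) * fsp) for i0, h in zip(start, reflections)]


# (c) Patterson information, one-dimensional sketch (requires max h < n_grid / 2)
def patterson_recycle(intens, n_grid=128, origin_halfwidth=2, origin_scale=0.2, low_cut=0.0):
    """intens: {h: |F_h|^2}. Returns new |F_h|^2 from the modified, inverted map."""
    def cos_hm(h, m):
        return math.cos(2.0 * math.pi * h * m / n_grid)

    # P(x_m) = sum_h |F_h|^2 cos(2 pi h x_m), omitting F000 and a constant factor
    p = [sum(i * cos_hm(h, m) for h, i in intens.items()) for m in range(n_grid)]
    top = max(p)
    for m in range(n_grid):
        if min(m, n_grid - m) <= origin_halfwidth:
            p[m] *= origin_scale      # reduce the origin peak
        elif p[m] < low_cut * top:
            p[m] = 0.0                # set low-intensity points to zero
    # Inversion; negative results are clipped because they serve as starting intensities
    return {h: max(0.0, 2.0 / n_grid * sum(p[m] * cos_hm(h, m) for m in range(n_grid)))
            for h in intens}


# (d) Located fragment information
def fragment_intensities(reflections, atoms):
    """|F_h|^2 from fragment atoms given as (f, (x, y, z)) in fractional coordinates."""
    out = []
    for h in reflections:
        f_h = sum(f * cmath.exp(2j * math.pi * sum(a * b for a, b in zip(h, xyz)))
                  for f, xyz in atoms)
        out.append(abs(f_h) ** 2)
    return out


if __name__ == "__main__":
    refl = [(1, 0, 0), (2, 1, 0), (3, 0, 1)]
    print(modulate([100.0] * 3, refl, (Fraction(1, 2), 0, 0), order=2, fsp=0.5))  # [50.0, 150.0, 50.0]
    print(patterson_recycle({1: 50.0, 2: 10.0, 3: 30.0}))
    print(fragment_intensities([(1, 0, 0), (2, 0, 0)], [(1.0, (0, 0, 0)), (1.0, (0.5, 0, 0))]))
```

With origin_scale = 1 and no low-density cut-off, the Patterson round trip returns the input intensities unchanged. The recycled values therefore differ from the Le Bail ones only through the map modification.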
Volume 109, Number 1, January-February 2004
The following suggestions can be considered to optimise the use of prior information and to avoid combining kinds of prior information that are correlated with one another:
1) if pseudo-symmetry is detected, especially when the detected percentage is large, it is convenient to use it;
2) if no pseudo-symmetry is detected but the structure contains heavy atoms, the Patterson information can improve the results;
3) the probabilistic estimate information can be used in all cases;
4) if a fragment is located, it can be exploited.
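Read as a procedure, the four suggestions select which kinds of prior information are worth trying for a given structure. The sketch below encodes them literally; the function name and the boolean inputs are assumptions. It returns candidate options rather than a recipe for combining them, in line with the warning about correlated prior information. It also does not weigh the size of the detected pseudo-symmetry percentage mentioned in suggestion 1.

```python
def choose_prior_information(pseudo_symmetry_detected, heavy_atoms, fragment_located):
    """Candidate kinds of prior information according to suggestions 1)-4)."""
    options = ["probabilistic estimate"]                 # 3) usable in all cases
    if pseudo_symmetry_detected:
        options.append("pseudo-translational symmetry")  # 1)
    elif heavy_atoms:
        options.append("Patterson")                      # 2) heavy atoms, no pseudo-symmetry
    if fragment_located:
        options.append("located fragment")               # 4)
    return options


if __name__ == "__main__":
    print(choose_prior_information(False, True, False))  # ['probabilistic estimate', 'Patterson']
```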
The structure solutions supplied by EXPO are shown in Table 6. For each test structure, the table gives:
- the maximum (sinθ/λ)² value;
- the number of reflections;
- the corresponding number of independent observations (see [15] for details);
- the number of atoms to be found in the asymmetric unit;
- the number of atoms found by EXPO (in a traditional Le Bail extraction or an intensity-recycled run).

Most of the structures are completely solved. This does not happen when the data quality is poor, that is, with a small (sinθ/λ)² value and/or a high degree of overlap.

The Random Approach
When no prior information is available, or when it is poor, a recently developed procedure can be attempted [16]. It is based on a random approach: for each cluster of overlapping reflections, several random partitions of the overall cluster intensity are considered. The partition giving the best fit (the lowest R_P value) in the local range of the cluster is selected as the most reliable one. It provides the integrated intensities used as starting values in the Le Bail formula. The random procedure is applied before each Le Bail cycle. The merit of the new approach is that it breaks the Le Bail tendency to equipartition the intensity of a group of overlapping reflections. Modifying the equipartitioned intensities is a necessary goal when they correspond to reflections that are pivotal in the phasing process. The results of the random procedure are shown in Table 7. For some test structures, it gives the phase error for the traditional Le Bail extraction (ERR1) and for the random procedure (ERR2). The ERR2 values are always much lower (better) than the ERR1 values. The new procedure modifies a small number of reflections that are very important in the phasing process, and in doing so it markedly improves phasing. This holds even though, on average, it does not give more accurate structure factor moduli.
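The selection step of the random approach can be sketched for a single cluster as follows. The sketch relies on assumptions not taken from [16]:
- Gaussian peaks of known position and width;
- background already subtracted;
- partitions drawn uniformly on the simplex (normalized exponential variates);
- R_P computed on the net counts of the cluster range.

The number of trials and all names are illustrative. In the procedure described above, this step would be repeated before each Le Bail cycle, with the selected partition used as the starting values.

```python
import math
import random


def unit_profile(two_theta, centre, sigma):
    """Gaussian peak on the 2theta grid, scaled to unit sum (assumed shape)."""
    g = [math.exp(-0.5 * ((x - centre) / sigma) ** 2) for x in two_theta]
    s = sum(g)
    return [v / s for v in g]


def best_random_partition(two_theta, y_net, centres, sigma, n_trials=500, seed=0):
    """Try random partitions of the cluster intensity; keep the one with lowest local R_P."""
    rng = random.Random(seed)
    profiles = [unit_profile(two_theta, c, sigma) for c in centres]
    total = sum(y_net)                          # overall net intensity of the cluster
    best, best_rp = None, math.inf
    for _ in range(n_trials):
        w = [rng.expovariate(1.0) for _ in centres]
        w_sum = sum(w)
        frac = [v / w_sum for v in w]           # uniform random point on the simplex
        calc = [sum(total * f * p[i] for f, p in zip(frac, profiles))
                for i in range(len(two_theta))]
        # Local R_P on net counts (the denominator sum(y_net) equals total)
        rp = sum(abs(yo - yc) for yo, yc in zip(y_net, calc)) / total
        if rp < best_rp:
            best, best_rp = [total * f for f in frac], rp
    return best, best_rp


if __name__ == "__main__":
    grid = [19.8 + 0.005 * k for k in range(91)]
    p1, p2 = unit_profile(grid, 20.00, 0.03), unit_profile(grid, 20.05, 0.03)
    y_net = [800 * a + 200 * b for a, b in zip(p1, p2)]
    # Equipartition would assign 500 to each reflection; the best random split is near 800/200
    print(best_random_partition(grid, y_net, [20.00, 20.05], 0.03))
```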

The POLPO Procedure
The solution provided by direct methods is frequently incomplete. This happens in particular for heavy-atom structures, where the heavy atoms are easily located but the light atoms are hard to recognise. The traditional approach to completing a partial solution combines Fourier map calculations with Rietveld refinement; this process is not trivial, not automatic and not fast. The new POLPO procedure [17] has been introduced in EXPO to complete the structure once the cations are located. The procedure uses polyhedral information and is based on a Monte Carlo technique. The starting point is the set of cation positions supplied by direct methods. The user supplies the polyhedral information through directives specifying:
- the polyhedron type;
- the corresponding cation label;
- the expected average polyhedral distance;
- the distance tolerance;
- the angle tolerance.

The procedure automatically calculates the cation connectivity [17]. Several configurations obeying the requested polyhedral and connectivity rules are built, with the geometrical construction allowing for the distance and angle tolerances. Configurations that are chemically inconsistent are rejected. Among the remaining configurations, the model giving the best fit between the observed and calculated profiles (the lowest R_P value) is selected. Table 8 shows the POLPO results:
- the number of feasible solutions obtained;
- the lowest R_P value, corresponding to the chosen model;
- the number of anions located in the asymmetric unit (the true number in parentheses);
- the average distance between the POLPO positions and the true ones;
- the CPU time.

All the structures are completed in a short time. The discrepancy in the number of anions has two causes: the starting cation positions are imperfectly located, and POLPO builds the polyhedra in a geometrically perfect way.
The POLPO procedure is currently being enhanced with the aim of completing a structure when only some cations are positioned.
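The geometrical part of the construction can be illustrated for the simplest case, a tetrahedron around a single cation. This is not the POLPO algorithm: the connectivity analysis and the final selection by R_P are omitted. The way configurations are generated is an assumption made for illustration: a randomly rotated ideal tetrahedron with small random distortions and cation-anion distances drawn within the distance tolerance. A configuration is accepted only if all anion-cation-anion angles are within the angle tolerance. The "chemical consistency" test is also an assumption: no anion may lie closer than a minimum contact distance to any other cation. Coordinates are Cartesian, in Å, and all names are hypothetical.

```python
import math
import random

TETRA = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]  # ideal tetrahedral directions
IDEAL_ANGLE = math.degrees(math.acos(-1.0 / 3.0))            # about 109.47 degrees


def random_rotation(rng):
    """Rotation matrix from a random unit quaternion (uniform over orientations)."""
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]


def angle(u, v):
    """Angle between vectors u and v, in degrees."""
    c = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def build_tetrahedra(cation, other_cations, d_mean, d_tol, ang_tol, min_contact,
                     n_trials=200, jitter=0.05, seed=1):
    """Monte Carlo tetrahedra around `cation`; returns the accepted anion sets."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_trials):
        rot = random_rotation(rng)
        vecs = []
        for v in TETRA:
            # Rotate the ideal direction and add a small random distortion
            u = [sum(rot[r][c] * v[c] for c in range(3)) + rng.gauss(0.0, jitter) for r in range(3)]
            d = d_mean + rng.uniform(-d_tol, d_tol)      # bond length within tolerance
            norm = math.hypot(*u)
            vecs.append([d * comp / norm for comp in u])
        # Geometric test: every anion-cation-anion angle within the angle tolerance
        if any(abs(angle(vecs[i], vecs[j]) - IDEAL_ANGLE) > ang_tol
               for i in range(4) for j in range(i + 1, 4)):
            continue
        anions = [[cation[k] + v[k] for k in range(3)] for v in vecs]
        # Assumed chemical test: no anion too close to another cation
        if any(math.dist(a, c) < min_contact for a in anions for c in other_cations):
            continue
        accepted.append(anions)
    return accepted


if __name__ == "__main__":
    # Illustrative values: a tetrahedrally coordinated cation with mean bond length 1.62 A
    tets = build_tetrahedra((0.0, 0.0, 0.0), [(3.1, 0.0, 0.0)], 1.62, 0.05, 5.0, 2.5)
    print(len(tets), "configurations accepted")
```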

Conclusions
Thanks to its graphical interface, EXPO is very user-friendly. It offers several ways of overcoming the difficulties in solving crystal structures ab initio from powder diffraction data. The next version of EXPO will include:
- N-TREOR [18], a modified and updated version of the indexing program TREOR90 [19];
- the POLPO procedure;
- new strategies for optimising the Fourier map.

1 Certain commercial equipment, instruments, or materials are identified in this paper to foster understanding. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.