Reconstruction of SAXS Profiles from Protein Structures

Small angle X-ray scattering (SAXS) is used for low resolution structural characterization of proteins often in combination with other experimental techniques. After briefly reviewing the theory of SAXS we discuss computational methods based on 1) the Debye equation and 2) Spherical Harmonics to compute intensity profiles from a particular macromolecular structure. Further, we review how these formulas are parameterized for solvent density and hydration shell adjustment. Finally we introduce our solution to compute SAXS profiles utilizing GPU acceleration.


Introduction
Small angle X-ray scattering (SAXS) is an experimental structural characterization method for rapid analysis of biological macromolecules in solution [ -6]. Because the samples do not need to be crystallized, they can be studied in different pH environments and concentrations leading to insightful structure-function relationships. The overall SAXS scattering profile is calculated by subtracting the scattering profile of the blank buffer solution from the profile of the sample dispersed in solution. SAXS data has been used to filter a set of protein models by comparing the SAXS profile of each model with the experimental SAXS profile [7,8]. The SAXS profile has been incorporated as a term in the scoring function to obtain a protein model consistent with the experimental SAXS data [9]. An exciting feature in modern SAXS is identifying and modeling protein flexibility from an ensemble set of different conformers to fit experimental SAXS data [ 0, ]. This requires a large library of starting conformers as input to the algorithm [ 2]. After a suitable library of conformers has been generated or found, the experimental SAXS data are used as a constraint in an algorithm to determine which combination of conformers optimally fit the data. The scattering intensity (I) is represented by a linear combination of the selected conformers. In this process the algorithm must decide ) Which conformers to use and 2) How many conformers are required to accurately recreate the experimental SAXS profile. Critical to the success of this task are the underlying algorithms used to compute a SAXS profile from a proposed protein model. In this review we highlight different methods to accomplish this task. We recognize that these methods are not exhaustive of all methods, but represent a sampling of different approaches that provide insight to the process of computing SAXS profiles from atomic coordinates. For a more comprehensive review of small angle X-ray scattering theory we recommend several reviews [ -3, 3].
X-ray scattering is observed when differences in electron density exist in a given sample and X-rays generated from a source device pass through the sample. Although both coherent and incoherent scattering is possible, we will confine our considerations to coherent scattering because incoherent scattering is negligibly weak at very small angles [ ]. Elastic (without energy change) electron scattering is influenced by all atomic orbitals. Because atomic orbitals have different shapes according to their atomic group, the X-ray scattering provides information on the structure of the target sample.
The scattering process occurs as electrons resonate with the frequency of the X-rays passing through the object. As the electrons resonate, they emit coherent secondary waves which undergo both constructive and destructive interference. Because of destructive interference, the superposition of waves with all possible phases will lead to zero scattering at a scattering angle of 2θ [ ]. The scattering maximum I(0) will be theoretically observed at a scattering angle of zero where all waves are in phase. Because of the high intensity of the incident X-ray beam, a beam stop is placed between the detector and the beam to prevent it from distorting the scattering profile. I(0) must therefore be computed rather than experimentally observed.
To illustrate the scattering process, consider a linearly polarized monochromatic X-ray beam incident on a single electron with charge e and mass m. The periodic electric field of the incident X-ray produces a force on the electron (F = qeE) where F is the overall force the electron experiences, qe is the charge of the electron and E is the electric field of the incident X-ray. This force causes the electron to oscillate with the same frequency as itself. The equations governing this behavior are shown below beginning with the electric field equation: where E is the electric field, E0 is the maximum value of the electric field, ω is the frequency of oscillation of the wave-field, t is time, and δ is the phase constant.
By Newton's second law of motion we equate the two equations of force:

CSBJ
Abstract: Small angle X-ray scattering (SAXS) is used for low resolution structural characterization of proteins often in combination with other experimental techniques. After briefly reviewing the theory of SAXS we discuss computational methods based on ) the Debye equation and 2) Spherical Harmonics to compute intensity profiles from a particular macromolecular structure. Further, we review how these formulas are parameterized for solvent density and hydration shell adjustment. Finally we introduce our solution to compute SAXS profiles utilizing GPU acceleration.
where m is the mass and a is the acceleration. The acceleration the electron experiences due to the periodic electric field is computed by dividing by the mass: where the amplitude A0 is: The electromagnetic radiation at a given distance with magnitude r from the charge q that experiences acceleration a has an electric field component: where c is the speed of light, r is the magnitude of the position vector, qe is the charge, a is the acceleration and α is the angle between a and r. If the position of r is perpendicular to the incident beam (which is true for SAXS experiments) then α = 90° and sinα = . Combining this simplification with the electric field component and substituting A0 for a:

( )
Now imagine instead of a single electron, we have an electron cloud. As incident X-rays pass through an electron cloud with the origin at the center, most of them travel through the cloud without scattering, while a small fraction (< %) of the incident X-rays are scattered. This can be seen from the scattered to incident amplitude ratio:

( )
where re is the constant Thomson scattering length and r is the distance from the object to the detector.
Because re is small, the scattered-to-incident amplitude ratio reveals that a single electron scatters a very small fraction of the incident X-rays. For example, at a sample to detector distance of three meters (typical for SAXS experiments), the amplitude ratio is: ε For an fuller description of the physics of X-ray scattering and the mathematics of waves we refer to the notes of Dr. Robert Blessing [ 4].
Because the scattered waves are coherent, the resulting amplitudes are added and the intensity is given by the absolute square of the amplitude [ ]: where An is the resulting amplitudes of all scattered waves and I is the scattering intensity. In Thompson elastic scattering all secondary waves have the same intensity and is given by [ ]: ( ) where Ip is the primary intensity and Is is the intensity of the secondary waves. The term e 2 /mc 2 is the classical electron radius and, r is the distance from the object to the detector. For small angles the polarization factor ( + cos 2 2θ)/2 is approximately one leaving [ ]: ( ) Figure 1. SAXS Experimental Setup. X-rays with a constant wavelength are first focused by the collimator and then pass through the purified sample in solution. A small fraction of the X-Rays scatter as they encounter electrons in the sample. The detector captures these scattered X-rays as intensity values. The final scattering profile is the difference between the profile of a blank buffer solution and a solution containing the purified sample.
We will assume the amplitude and intensity of all secondary waves to be one for this discussion. With this framework, each secondary wave is represented by the complex function e i φ where φ is the phase. Because the amplitude and intensity are one, all waves differ only by their phase. The phase of the scattered wave depends on the position of the oscillating electrons in space.
The phase of the secondary waves is 2π/λ multiplied by the path difference between the scattered and incident waves. In the diagram, we let s0 represent the direction of the incident beam and we let s represent the direction of the scattered beam. The path difference of a point P, specified by r, against the origin O is: -r•(s-so). The phase is given by [ ]: The term (s-so) is symmetric to the incident and scattered beam with magnitude of 2sinθ. In this representation θ represents half the scattering angle. The momentum transfer vector q is independent of the distance to the detector and the wavelength ( ) and defines the scattering curve in reciprocal space with units of Å -. The momentum transfer vector has the same direction as (s-so) and the magnitude is given by substituting 2sinθ for (s-so):

| |
where 2θ is the scattering angle. We refer to q as the magnitude of the momentum transfer vector q. In the literature, this term has been defined multiple ways and one must be aware of the convention used. For example the symbols h and s have been used in place of q. Sometimes s is defined as s = (2sinθ)/λ with q = 2πs. Others define θ rather than 2θ as the scattering angle. In this review we use the convention for q shown above with 2θ as the scattering angle. Large interatomic distances contribute primarily to the scattered X-ray intensity at small scattering angles, whereas short interatomic distances primarily contribute to X-ray intensity at large scattering angles. The information content of a SAXS profile is small compared to other high resolution experimental techniques because the overall scattering profile represents the orientationally averaged contribution of all atoms in all orientations. The SAXS scattering curve contains information related to the overall shape of the molecule and is routinely used for the validation of structural models [ 5, 6].
The term electron density is frequently used in the literature in the place of electron density difference or contrast. The electron density ρ is the number of electrons per unit volume. In SAXS experiments only the electron density difference ρ2 -ρ (ρ2 is the electron density of the sample, ρ is the electron density of the solvent) is measurable. If ρ2 = ρ , then scattering is not observed because the waves scattered in any direction will cancel out. During a SAXS experiment the electron density of the buffer solution is subtracted from the density of the combined sample and buffer solution leaving the electron density of the sample without background solution.
The electron density function ρ(r) is defined in real space for non-negative values. It is a histogram of equivalent pairwise atomic distances in a given sample. Because of the solution subtraction, the electron density it is zero everywhere except for defined electron distances in the sample where identical distances add together.
If we have the distance distribution function then the scattering curve I(q) can be calculated by Fourier inversion[ ]: ∫ Likewise the distance distribution function ρ(r) can be calculated by Fourier inversion of the scattering curve [ ]: ∫ Theoretical scattering curves can be computed for a model of a given shape and compared with experimental data using either the intensity calculation I(q) or the distance distribution function p(r). The distance distribution function allows the deduction of the largest particle dimension dmax and is the distance at which the p(r) drops to zero.  defining accurate macromolecular structures, conformations and assemblies in solution [3]. Pair-wise distances between each atom are represented. The distances are symmetric and are represented twice by the double arrows. The P(r) function will be zero whenever a particular distance is not defined by the geometry of the sample.
Proteins are built up from the arrangement of amino acids which are built up from the arrangement of atoms differing by side chain arrangement. Imagine a protein sample in a fixed orientation. The centers of mass of each atom may be designated by r , r2, …, rn, and their amplitudes with respect to each mass center by f , f2, …, fn. The total amplitude is [ ]: ∑ where the additional phase factor describes the position of the atom and fj(q) is the amplitude. The intensity is the absolute square of the amplitude, averaged over all orientations:

〈∑ ∑ 〉
When j=k the phase factor reduces to one. This situation represents the contribution to the intensity diffracted by the atoms alone. The situation j≠k represents the interference between the atoms, according to the relative distance (rj-rk). Each amplitude f has a phase: ‖ ‖ . This figure depicts the experimental SAXS curves and parameters measured for Pyrococcus furiosis PF1282 rubredoxin (magenta), a 'designed' scaffoldin protein S4 (red), a 'designed' minicellulosome containing three catalytic subunits (green), and the DNA-dependent protein kinase (blue). (a) Dmax of the scattering particle is a simple function of molecular weight for perfect spheres, but not for proteins that adopt different shapes. Envelopes correspond to ab-initio models calculated from experimental curves using GASBOR. (b) The experimental scattering curves for each protein show that the intensity of scattering falls more slowly for rebredoxin (RG 11 Å ; magenta) than the minicellulosome (RG 82 Å; green). (c) The linear region of the Guinier plot, from which RG and I(0) can be derived, is a function of the RG. (d) Each protein has both a substantially different Dmax as well as pair-distribution function, reflecting the different atomic arrangements.

Reconstruction of SAXS Profiles from Protein Structures
Splitting the atomic diffraction (j=k) from the interference between atoms (j≠k) yields:

∑ 〈∑ ∑ | || | 〉
In SAXS experiments there is no fixed origin because particles are sampled in all orientations. The phase is dependent on a fixed origin. By averaging over all orientations and restricting atoms to be spherical, the phase vanishes, (φk -φj) = 0 and fj becomes independent of orientation. Furthermore, spherical averaging of all orientations is given by:

〈 〉
This representation of the spherical averaging is known as the Debye factor [ 7]. The final Debye formula is: In this format the amplitudes f are calculated by computing the atomic structure factors. The atomic diffraction and interference between atom sums can be combined together to give the form of the Debye equation frequently cited in the literature: where rij = | ri -rj | are the x,y,z positions of atoms i and j. The Debye formula given above takes the atomic x,y,z coordinates as input and returns the intensity as a function of momentum transfer q. This double sum of all atoms in a given system for each computed q value has a computational cost of O(N 2 ). The quadratic cost is a prohibitive barrier for atomic level application of the Debye formula for large systems (N > 0,000). In the case of structural refinement for SAXS, the scattering profile must be computed from all pairs of interactions with atoms in the molecule. In high-throughput applications the profile must be computed thousands of times, while in an iterative ensemble analysis, the profile must be computed hundreds of thousands of times. Because of the high computational cost, different methods have been developed to reduce the number of necessary calculations to compute intensity. Before we discuss the approximations to the Debye formula, we must first understand the structure factors fi(q) and fj(q).
The atomic form factor is a fundamental physical quantity in solid state physics. It is the Fourier transform of an electron distribution around a nucleus of a given atom and carries information on the electron wave function. The X-ray scattering power of a given atom will depend on the number of electrons it contains. As the number of electrons contained in an atom increases (higher atomic number), the scattering power increases. As the scattering angle increases, the scattering power decreases. A scattering angle of zero results in the maximum scattering factor for a particular atom which is equal to Z -the atomic number. The form factor approximations are based on the combination of relativistic Dirac-Slater wave functions and numerical Hartree-Fock wave functions . These Hartree-Fock structure factors were computed from q = 0 to q = .5 at intervals of 0.0 Å -. For convenience, they were fit to a 5-gaussian (Cromer-Mann) analytic function: ∑ where fv,i (q) is the structure factor of a particular atom at a given qvalue in vacuo. The constants a , a2, a3, a4, b , b2, b3, b4, and c are the Cromer-Mann coefficients for a given atom, and q is the momentum transfer in inverse angstroms. Tables for the Cromer-Mann  coefficients are found in the International Tables for X-Ray  Crystallography[22]. This approximation is valid in the q-ranges for SAXS scattering experiments from 0 to ≈ 0.33Å - [2,3]. For larger qranges, a 6-gaussian approximation must be used which is valid from 0 to ≈ 6.0Å - [2 ].
In addition to the vacuo contribution to the form factors, the solvent makes a critical contribution to the overall scattered intensity. The solvent effect is considered by modeling the solvent as an electron gas with density equal to the average electron density of the solvent [23]. Taking the solvent effect into account, the overall structure factor of the atom is the combination of the structure factor representing the excluded solvent subtracted from the form factor for a given atom: where fs,i is the structure factor of the hypothetical atom that represents the displaced solvent. The displaced solvent scattering term fs,i is given by: where ρ is the electron density of the solvent. For pure water this is 0.334e Å -3 . Vi is the solvent volume V displaced by atom i and is calculated from the van der Waals radius of the atom. [23,24]. The exponential term is the normalized Fourier transform of the Gaussian sphere. This sphere corresponds to the excluded volume around the atom.
The electron density surrounding the scattering body is calculated by computing the number of electrons per liter of solvent and then converting that to the number of electrons in a cubic angstrom. This excess electron density is then added to the density of pure water. Proteins have an electron density around 0.44e Å -3 [2]. The electron density of the solvent should maximize difference between itself and the electron density of the sample to maximize contrast in SAXS experiments. The derivation for the electron density of pure water with a density of g/mL is shown below: Now that we have reviewed the theory of X-ray scattering and have an idea of the Debye equation with a costly double sum over all atoms, we are ready to review methods using the Debye equation designed to maximize accuracy while minimizing computation time.
In 994, Pantos and Bordas used an approach to simulate SAXS patterns of large molecules by building models of closely packed spheres that are much larger than individual atoms thereby reducing N for the calculation. This was incorporated into the software

Reconstruction of SAXS Profiles from Protein Structures
package DALAI. They used the Debye formula to compute an intensity profile of the proposed model [25]: The first sum gives the intensity for spheres in isolation, while the double sum give the contributions from density-density correlations. To reduce the computational task in the double summations of the Debye equation, all spheres were given the same radius and mass density. The structure factor product Fi(q)Fj(q) is now constant for each value of q and can be pulled out of the double sum. The Debye formula becomes: At this point in the formulation, Pantos and Bordas have not compromised the accuracy of the calculation for the reduced sphere model. They moved the bulk of the computation to the initial state of the algorithm. The calculation of rij is still O(N 2 ). To model large structures requiring a large number of spheres, they approximated pairwise distances between atoms. In this approach pair distances are grouped into a histogram of bin sizes based on the experimental data resolution. Without binning, the number of pairwise distance terms is equal to N(N-)/2. In this method the distances were quantized to multiples of dmax/ 00 where dmax is the maximum particle dimension. The resolution increases with decreasing bin size and decreases with increasing bin size. The resolution adjustment blurs the sampling grid by an undetectable amount in the resolution range of the simulation. The pair distance matrix of rjk values are now a vector of distances weighted by the number of distances occurring in a given bin. The scattering formula becomes:

∑ ∑
where m(rk) is the bin population at pair distance rk and the limits of the sum are the number of distance bins. This method is valid when protein structures are modeled with multiple spheres of constant radii and mass density. When this condition is met, the structure factor calculation can be brought out of the double sum. The Debye calculation can then be binned leading to change of an O(N 2 ) calculation to O(N). Prior to this calculation the pairwise distances must be pre-computed and binned which is still an O(N 2 ) calculation. The speed increase by this algorithm is dependent on the number of spheres used to model the system. An advantage of this method is that the pairwise distance matrix must only be computed once and can then be reused during the course of analysis.
In 20 0, Stovgaard et al, used the Debye formula for calculating the scattering curve combined with a coarse-grained representation of protein structure to address the high computational cost [26]. This approach led to a significant speed-up in computational time when compared with the all atom calculation. In this approximation, amino acids were represented by two scattering bodies or dummy atomsone representing the backbone, and the other representing the side chain. These dummy atoms were placed at the respective center of mass of the atomic group they represented. They had to estimate 2 form factor values for this approximation -one for alanine, one for glycine, one for the backbone, and 8 for the remaining side chains. They recreated these functions for each of the 2 form factors by binning the q-range into intervals of equal width (0.0 5 Å -) and then computing a form factor estimate for each of the 2 form factor types in each of the q-bins. They sampled form factor values from a training set of 297 structures with lengths between 50 and 400 residues and calculated a form factor estimate from the centroid in each bin. The SAXS curves generated through the Debye formula with dummy atom form factors for 50 proteins were compared with SAXS curves generated for the same proteins through CRYSOL with high agreement.
This method is contingent upon the accuracy of the form factor estimates for the dummy atoms and relies on a training set of 297 proteins to represent amino acids in nature. Amino acid residues behave differently in different environments, and caution must be used to ensure the training set accurately represents the environment of the protein of interest. The authors state that two additional developments with this method are needed: ) a proper description of the hydration layer and 2) a probabilistic description of the experimental errors associated with a SAXS experiment. This is currently under development in the PHAISTOS software package.
In the same year that PHAISTOS was published, the Sali Lab published their approach to the Debye formula and made their web server FoXs publically available [27]. To account for the displaced solvent and hydration shell, the structure factor contribution for a given atom is given by: where fv,i (q) is the form factor of a particular atom at a given q-value without the effects of excluded volume and a water shell, and fs,i is the structure factor for the excluded volume, and the last term is the structure factor of the hypothetical molecule that represents the displaced solvent. Si is the solvent accessible surface area for a given heavy atom and fw,i is the form factor of water. This approach is novel because it models the hydration shell as a function of the solvent accessible surface area of a given atom. The parameter c is used to adjust the electron density contrast while the parameter c2 is used to adjust the hydration shell thickness. The structure factor of water is given by the sum of all atomic form factors in water: The computed profile was fit to a given experimental SAXS profile by minimizing the chi function with respect to c, c , and c2:

Reconstruction of SAXS Profiles from Protein Structures
Similar to DALAI, FoXs has the structure factor calculation moved out of the double sum of the Debye formula. Instead of modeling uniform space filling spheres, they assumed an identical modulation of fi(q) for different atoms i: where the modulation function E(q) is equal for all atoms. This approximation creates a system of different scattering masses but equal shape. The pairwise distance distribution function represents population at a given distance r and is given in this approximation as: ∑ where (r-dij) is the Dirac-Delta distribution and r is a given pairwise distance. In this representation, only the form factor with a constant q = 0 is considered, which reduces the value to the atomic number Z of the given value. The intensity is given by:

∫
The modulation function E 2 (q) is parameterized as: The parameter b was determined by computing the SAXS profile with the original Debye formula using the non-approximated form factors and then computing the SAXS profile with the approximated form factors and initial guess of the b parameter. The parameter b=0.23±0.0 Åwas chosen to minimize the difference between both profiles from 30 random protein structures from the Protein Data Bank. This approximation typically speeds to calculation of the Debye formula by two orders of magnitude.
In 20 , the Zhang lab at the University of Michigan introduced SAXSTER, an online tool to improve protein template recognition by using SAXS profiles [28]. In their approach they also simulate the SAXS intensity profile according to the Debye equation. Instead of summing over all atoms, they sum over all atoms plus the explicit water atoms. The equation is: where W is the number of "dummy" water molecules around the protein representing the hydration shell. The initial structure factor equations are identical to equations previously shown. To account for the explicit water molecules around the model, they started from a face-centered cubic (FCC) lattice system with edge length Lcell. Each point in the lattice represents a water molecule. The overall structure factor is given by subtracting the excluded solvent from the atomic form factor and adding the explicit water contribution from the lattice. The protein structure is projected onto the FCC system and only water molecules in the range of 3.5-6.5 Å to any Cα atoms are kept. The density of the water molecules in the lattice system is defined by: where N is the number of points in the FCC lattice system, V is the volume of the system, k is the number of unit cells in the x,y,z directions and L = k * Lcell. L represents the maximum length for each direction. In a FCC lattice system, the water contribution from each corner of the cubic cell is /8 and the contribution from each face is /2. There are eight corners and six sides yielding an effective water contribution of four (8( /8) + 6( /2)). Each water molecule consists of 0 electrons yielding 40 (water contribution of four * 0 electrons) electrons per cubic cell. The number of excess electrons per volume in the hydration shell relative to the bulk water is: The thickness of the hydration shell is thus controlled by the edge Length of the FCC system. The threading-based models are composed of α-carbons only and the SAXS computations are given by: This form of the ρ(r) function is very similar to FoXs. The difference is that the water molecules are explicitly summed over. In the approximation, a new structure factor must be derived to represent the α-carbons: where 〈 〉 denotes the average over all residues of the same type calculated from 200 randomly selected PDB structures. The term f(q) is computed by the initial structure factor equations previously shown. This procedure produces 20 effective structure factors for each amino acid type. In the case of water, its scattering factor is calculated by the modified Debye equation with n = 3, rij = 0 and Fi(q) being the vacuum form factors for either hydrogen or oxygen.
In the methods previously described, the orientational averaging of the scattered waves was computed analytically using the Debye relation [ 7]:

〈 〉
Instead of analytically computing the orientational averaging, another method is to use a mathematical representation of the scattering body (or protein) that uses the rotational properties of spherical tensors. In this formulation the scattering body is expanded in terms of an infinite series of spherical harmonics. The orthogonality properties of the basis functions simplify the averaging of the harmonic series from which an overall scattering intensity can be computed. These basis functions are built from spherical Bessel functions, and normalized spherical harmonics of degree m and order L. This approach reduces the computational complexity from O(N 2 ) to O(N).
The scattering amplitude in vacuo of a particle with N atoms is: where ALm(q) are the partial amplitudes and are given by: ∑ Because of the orthogonality properties of spherical harmonics, the cross terms cancel and the intensity calculation is reduced to [30]: The huge advantage of spherical harmonics is that the complexity is reduced from O(N 2 ) to O(N). The integrand for averaging over the sphere in the proceeding equation is approximated by an L = O(qD) band limited function in a spherical harmonic basis where q is the momentum transfer vector and D is the maximum dimension of the sample. It is insufficient to use L smaller than qD/2 because any value less than this violate Nyquist Shannon sampling[3 ] for periodic functions. At least L 2 = O(q 2 D 2 ) sampling points are needed to provide an accurate integration of bandwidth L. Any index above L does not improve the fit for a given qmax, while any index below L will result in systematic errors in the calculation[32].
By the early 990s there were many studies showing the importance of modeling the water molecules surrounding a given macromolecule when recreating SAXS profiles from atomic coordinates. For example, Grossman et. al compared experimental SAXS profiles with SAXS profiles computed from different configurations of dimers, trimers, and tetramers. They optimized the agreement between experimental and simulated scattering profiles by placing solvent molecules on a diamond-shaped grid surrounding the structure [33]. In their results, the computed SAXS profile with the best fit to the experimental SAXS profile consisted of a solvent shell of 7 6 water oxygens up to a maximum distance of 3. 5 Å from the protein surface. Their results suggested that the water shell very close to the surface of a protein differs in electron density from the remaining bulk water and thus contributes to x-ray scattering.
In 995 Svergun et. al released CRYSOL -a program to compute SAXS intensity profiles from atomic coordinates while considering the hydration shell surrounding the target sample [24]. There were lingering questions concerning the true cause of the electron density contrast conditions surrounding a sample in solution. Was the density contrast caused by a water layer or could the contrast be explained by side chains moving freely on the protein surface? Three years later in 998 Svergun et al. confirmed in a combined Xray and neutron scattering study that the differing electron contrast conditions were more likely caused by a denser hydration shell rather than a higher mobility of the side-chains on the protein surface [34]. Water modeling is critical to the correct interpretation of SAXS profiles and computational methods are under development today to improve chemistry constraints, improve geometric constraints (surface curvature), and incorporate experimental data from high-angle SAXS [35].
Currently, popular approaches for modeling the hydration shell are to: ) place water molecules on the surface of the protein, 2) simulate the solvation shell by surrounding the protein with a continuous outer envelope, 3) simulate the solvation shell and excluded volume by computing a modified scattering factor.
CRYSOL employed the second approach to model the hydration shell and extended the multipole expansion and spherical harmonics formulation to handle not only the vacuo scattering, but the excluded volume and hydration shell.[9] In this formulation the intensity is given by: where Aa(q) is the in vacuo scattering, Ac(q) is the excluded volume scattering and Ab(q) is the border layer scattering, δρ = b -0, where 0 is the average scattering density of the solvent surrounding the particle and b is the average scattering density of the border layer around the particle with thickness Δ. 〈 〉 stands for the average over all particle orientations and Ω is the solid angle in reciprocal space, q = (q, Ω). Each of the three amplitudes is represented via its multipole components. Because of the orthogonal properties of the spherical harmonics, all cross terms cancel in the average over Ω, leading to: The value L defines the resolution of the particle. This approach works best with shapes that can be described using spherical harmonics which include most globular and extended proteins. Spherical harmonics is less adept at handling shapes that contain internal cavities such as shells and donuts.
[24] Additionally this method uses by default a harmonic order of 5, with a maximum value of 50. This gives the method a complexity of O(MN) with M=q 2 D 2 . This can lead to errors when a harmonic order greater than 50 is necessary based on the size of the protein and desired qmax.
In CRYSOL there are several adjustable parameters used when calculating predicted data that best match the experimental curve. These parameters are: the effective atomic radii multiplier which scales the solvent volume displaced by each atom (vi), the electron density contrast of the surface solvent layer (c2) and the total displaced solvent volume (c ), approximately equal to the variation of the electron density of the displaced solvent relative to bulk water. The need for adjustable parameters in CRYSOL becomes clear when studying SAXS profile reproducibility for distinct samples of the same protein on different instruments. The characteristic features of the experimental scattering profiles are conserved between experiments, but the experimental variation of the scattered intensity at higher q-values depends on the extrapolated intensity at I(0) [36]. Because of the beamstop in a SAXS experiment, I(0) cannot be directly observed. One method to extrapolate this value is to compute the slope of the intensity profile in the initial linear region of the scattering profile (the Guinier region) and extrapolate to the yintercept.
The adjustable parameters in CRYSOL absorb this variability by changing the level of the higher-q features of the predicted data relative to the low-q intensities.
Fifteen years after the introduction of the original CRYSOL program, Alexander Grishaev, Liang Guo, Thomas Irving and Ad Bax introduced AXES in 20 0 -a program for fitting SAXS data to macromolecular structure and ensembles of structures [36]. The program AXES was designed to be more discriminating than CRYSOL when evaluating poorly or incorrectly modeled protein structures. On a set of small well-studied proteins that had X-ray crystallography and solution NMR data they reported an improvement in fit by 0-50% by χ score. This set was comprised of four proteins -hen egg white lysozyme, cytochrome c, the B3 domain of protein G (GB3) and ubiquitin.
They reformulated the approach to fitting SAXS data by explicitly taking into account the sources of experimental data variability: where α accounts for the uncertainty in the measurements and c accounts for the variability of the detector and X-ray fluorescence. These uncertainties appear responsible for the systematic difference between repeated experimental data sets. Taking these uncertainties into account, the computed scattering intensity is: where Ω is the average taken over a discrete set of molecular orientations relative to the incident beam, solv is the average taken over the displaced and surface water sets, and ens is the average over the ensemble of macromolecular structures. The program AXES models the hydration shell directly by using explicit water molecules from a pre-equilibrated water box. Using this approach they tested how well they could discriminate different models of the same protein. They generated 2000 models of GB3 using Rosetta and fit the experimental SAXS data to all of the models using both CRYSOL and AXES. The CRYSOL fits yielded χ values that were much lower for poor models (models with a high RMSD relative to the native structure) than the native structure. This behavior is indicative of overfitting. Using AXES, they did not observe significantly better fits for the poor Rosetta models. Furthermore, when provided chemical shift guided Rosetta models with the correct fold, AXES correctly assigned higher χ values to nonnative structures.
The cost of this higher precision comes at the price of computation time. AXES is more than an order of magnitude slower than CRYSOL due to the averaging of the scattering amplitudes of the displaced and surface solvent sets over 20 different configurations. Among these configurations are: 6 elementary scattering functions averaged over angular orientations, macromolecular conformers, and molecular solvent configurations for a given electron density contrast of the surface solvent layer. Currently several avenues for computation speedup are under development.
We previously mentioned three popular approaches for treating the hydration shell and excluded solvent. They were: ) to place water molecules on the surface of the protein and compute scattering profiles with explicit water molecules, 2) simulate the solvation shell by surrounding the protein with a continuous outer envelope, 3) simulate the solvation shell and excluded volume by computing a modified scattering factor. The drawback to the first approach is the computational cost to construct the explicit solvent model. The drawback of the second approach occurs for proteins containing cavities. Assuming a uniform layer around a cavity or hole will introduce artificial areas without any electron density. The drawback of the third approach is the appearance of non-uniformities in the electron density by overlapping dummy atoms.
In 20 2, Liu et al proposed a new method to address the limitations of excluded solvent and hydration shell modeling [30]. In their approach they parameterized the Fourier transform of the electron density distribution function p(r) by a Zernike polynomial expansion with spherical harmonics. Zernike polynomials are orthogonal functions on the unit ball. They reformulated the SAXS intensity calculation as:

∑ ∑
where jn is the spherical Bessel function of order n.

∑ ∑
where cnlm is the Zernike moments from three-dimensional objects and knn'l is either a positive or negative coefficient given by: The Zernike moments are computed by a linear combination of the geometric moments of the object:

∑ ̅̅̅̅̅̅
where Mrst is the geometric moment and are the coefficients. The procedure to compute the coefficients are given by the Novotni and Klein algorithm [37].

∫ | |
The geometric moments are computed from a scattering object that has been segmented into a series of small volume cubes called voxels. Voxels are used in 3D graphics for the visualization and analysis of medical and scientific data. In this case the voxelization process maps electron density from the scatterer (or protein) into voxels from which the geometric moments can be computed. From this process, multiple sets of voxels are created: ) P -the set of nonzero electron density voxels, 2) S+B -the set of voxels representing the excluded solvent and surface bound solvent, and 3) S -the set of voxels representing the excluded solvent.
The Zernike moments of all three voxelized objects are combined by a weighted sum to produce one set of Zernike moments from which the scattering intensity is computed. The computational complexity of this algorithm is O(N), but prior to computation, the voxelized object must be created in a preprocessing step.
The advantage of the Zernike expansion method is that it can model holes or cavities of structures that spherical harmonics traditionally has difficulty with. This approach also incorporates all solvent-accessible surfaces into the overall scattering profile. When compared on a set of ten experimental proteins with high resolution crystal structures, this method had similar results with the spherical harmonic expansion method. This method offers an extension to spherical harmonic expansion methods that may improve the fit to experimental data by improved hydration shell and excluded volume treatment of structures with cavities or holes. It is included in the SASTBX software package.
In 20 2, the SAXS algorithm in PHAISTOS was accelerated using general purpose graphical processing units (GPGPUs) [38]. This method utilizes Bayesian probability statistics to compute the form factors in the Debye equation for protein models built from either one or two scattering bodies. The speed up using GPU's was measured from protein sizes ranging from 64 to 8 92 scattering bodies. They reported a 6x speed up for proteins with 64 scattering bodies. As the proteins increased in size the speed up increased to a maximum speed up of 394x for proteins with 8 92 scattering bodies.
Because of the uncertainty introduced into the accuracy of the Debye equation by approximation methods, we devised a method to compute the intensity directly without approximating structure factor calculations (unpublished). Furthermore, we model the hydration shell as a function of the solvent accessible surface area of a given atom analogous to FoXs. Our method BCL::SAXS offsets the high computational cost of the Debye formula by simultaneously computing multiple pieces of the equation using the parallel architecture of graphical processing units (GPUs). The Debye formula can be framed as an NxN square matrix of N-atom rows by N-atom columns where N is the number of atoms in a given protein.
The pairwise Euclidean distances (rij) are calculated from the upper triangle of the matrix. The diagonal is set to zero and the lower triangle is a symmetric mirror of the upper triangle. Each GPU thread computes a partial Debye sum.

∑
This results into a matrix of q rows by N-atom columns where q is the momentum transfer and N is the total number of atoms. These partial values are summed across each column to complete the intensity computation: ∑ This approach removes the uncertainty introduced by structure factor approximation while maintaining the efficiency of methods with structure factor approximations. The speed up using GPU's was measured from protein sizes ranging from 832 atoms (PDB ID: O26) atoms to 9 ,846 (PDB ID: VSZ). Using a GTX680 GPU card, we observed a 5x speed up for the smaller protein ( O26). For the largest protein in our set ( VSZ) we observed a speed up of 707x for protein VSZ using the same graphics card. By leveraging GPU's, we absorb the O(N 2 ) cost while achieving substantial reduction in computation time without sacrificing accuracy by introducing approximations to the Debye formula.

Conclusion
In this review we focused on proteins as a scattering body, but RNA and DNA can be studied as well using SAXS. These algorithms represent a sampling of methods for SAXS profile reconstruction and are not representative of all the approaches that exist. Another approach that expands these ideas was published in 20 2. In this work, Gumerov et. al proposed a Hierarchal algorithm based on a fast multipole method (FMM) to compute SAXS profiles [32]. For a review of timing and accuracy for protein of varying sizes and shapes with either spherical harmonic or Debye implementations we refer to their work. In each of the algorithms presented, there was a trade-off between speed and accuracy. In order to use the Debye formula for protein structure analysis, approximations were made to the equation to move terms out of the double sum. The uncertainty introduced by this approach is a subject for further study. In order to model with spherical harmonics, the correct harmonic order must be set and the shape complexity of the scattering body must be considered. We expect that more algorithms in the near future will take advantage of the parallelizable form of the Debye equation and use GPU acceleration to obtain the necessary computational speed without the uncertainty introduced by structure factor approximation and momentum transfer binning.
Furthermore, to standardize testing of SAXS algorithms we echo the suggestion of Rambo and Tainer and believe a reference dataset should be created with experimental SAXS profiles and PDB models [35]. This dataset would be comprised of proteins of varying sizes and shapes and folds. All new and existing methods should be benchmarked against this set to identify strengths and weakness of any given algorithm.