An Iterative Bézier Method for Fitting Beta-sheet Component of a Cryo-EM Density Map

Abstract Cryo-electron microscopy (Cryo-EM) is a powerful technique to produce 3-dimensional density maps for large molecular complexes. Although many atomic structures have been solved from cryo-EM density maps, it is challenging to derive atomic structures when the resolution of density maps is not sufficiently high. Geometrical shape representation of secondary structural components in a medium-resolution density map enhances modeling of atomic structures.We compare two methods in producing surface representation of the β-sheet component of a density map. Given a 3-dimensional volume of β-sheet that is segmented from a density map, the performance of a polynomial fitting was compared with that of an iterative Bézier fitting. The results suggest that the iterative Bézier fitting is more suitable for β-sheets, since it provides more accurate representation of the corners that are naturally twisted in a β-sheet.

: β-sheet, β-strands and β-sheet image. A β-sheet is composed of multiple β-strands, each shown as a ribbon in (A). An image (gray) of a β-sheet is superimposed with its atomic structure of β-sheet (ribbons) in (A) and a polynomial surface (golden dots) in (B). An example of the region where the surface does not fit well the image is marked with an arrow.
5-10Å resolution. In spite of the difficulty in distinguishing individual β-strands from a medium-resolution map, we previously showed that it is possible to predict the position of the β-strands of a β-sheet through the analysis of β-sheet twist [28]. A unique nature of a β-sheet is that each one is right-handed twisted [13]. StrandTwister utilizes this special character of a β-sheet to model the orientation of β-strands [28].
Given an atomic structure of a protein, a twist angle can be calculated along the peptide orientation and has been shown to have right-handed twist on a β-strand [13]. For a given atomic model of a β-sheet, the twist angle is often less than 40 ∘ [17,25]. However, when precise location of atoms is not available, as for a β-sheet image, precise calculation of a twist angle is challenging. We measured a twist angle that is represented by two neighboring β-traces, each of which is a central line of the β-strand [28]. In such a case, accurate calculation of twist angles depends on accurate modeling of a β-sheet.
Since the density image of a β-sheet appears as a thin layer, two previous methods exist to represent a β-sheet from its 3D image. Thinning and pruning is an image processing technique, and has been used to produce a thin surface from a 3D image at the β-sheet region [21]. This method is an effective way to detect a β-sheet, but the resulting surface is not smooth enough for direct calculation of twist angles. A polynomial fitting method was used in StrandTwister [27,28]. The polynomial was selected using least square fitting of the voxels in the β-sheet image. Although the surface appears to fit well in most part of the image, it does not fit accurately for a large β-sheet, particularly at the corner regions (arrow in Figure 1). It is possible that a polynomial is not a sufficient model to represent a twisted surface.
In this paper, we explore a general mathematical model to represent the surface of any given β-sheet, large or small. We propose to use an iterative Bézier surface to fit a β-sheet image. We show that the proposed model improves fitting accuracy, particularly for the corners of a β-sheet.

Method
The surface model is derived iteratively using a Bézier model. At each iteration, a set of control points are identified and used to guide the surface model. The overall process includes four major steps: (1) initial plane fitting; (2) generating initial bounding box of the β-sheet image; (3) iterative fitting of Bézier surfaces; (4) definition of a polygonal boundary of the surface.

Pre-processing
Before iterative fitting of Bézier surfaces, initial control points need to be identified and the initial boundary of the image need to be outlined. Since a β-sheet is a thin layer of density, a simple geometrical representation of a β-sheet is a plane. We utilized a plane to assist the initialization of the boundary and control points. To search for the best-fit plane of the β-sheet image, we define the plane in (1) and (2) and search for the best fit plane using (3).
Where p = (x, y, z) is a point in 3-dimensional space, α and β are angles variable between 0 and 2π. ∆ is a real number. P is the set of all points in the β-sheet image.
In principle a least square fitting of a plane will determine the best-fit plane of the β-sheet image, but we implemented an approach simply varying α, β, and ∆ based on the feedback of the fitting error in (3). It appears to work in all test cases. Note that the accuracy of the initial plane is not critical. The optimization stops when a desired threshold of error is reached. Once a plane was found, the initial boundary and control points were defined by projecting the β-sheet image to the plane to create a 2-dimensional image. A rectangle was identified on the 2-dimensional image using Deming regression [15], a heuristic method to approximate a bounding box. This heuristic method is in O(n) where n is the number of points to be bounded. Once the bounding box is defined, the four corner points are projected back to the β-sheet image to define the initial four control points in 3-dimensional space.

Iterative Bézier Surface Fitting
Bézier surface is a parametric surface S defined as in (4).
In which C i,j is a control point in 3-dimensional space, i = 0, .., n; j = 0, . . . , m. A control point is the position of a selected voxel of the β-sheet image through a pre-processing step. B n i (u) is defined as in (5).
Using the initial four control points derived in the pre-processing step, an initial surface is decided by searching for a Bézier surface that generates the smallest fitting error evaluated as Least Squared Orthogonal Projection distance in (6).
p is a point on the β-sheet image, andp is a point on surface S with the shortest distance from p. d(p,p) is the distance between point p andp. For a general surface, calculating point projection onto a surface is a non-trivial optimization problem. We approximate the closest point using a fast local optimization. Initially a set of points were uniformly distributed on the surface. Given a voxel in the β-sheet image, its closest point on the surface is searched in the local neighborhood of an initial point using gradient descent method. To search for the surface with the minimal orthogonal distance, the position of control points were searched in a local neighborhood. The iterative step of fitting Bézier surfaces consists of adding more control points and regenerating approximately best-fit surfaces. The number of control points changed from 2 × 2 to 3 × 3 and 4 × 4, as examples. The number of control points nm is chosen to balance between precision and performance. Figure 2A was generated with n = 6 and m = 10. Because each point projection takes a constant amount of time, our heuristic allows us to compute RMSE of orthogonal in O(n) time. B: An example of the long edges (gray lines) that are pruned after Delaunay triangulation using χ algorithm [16]. Dark lines are the resulting boundary of the point set.

Polygonal Boundary of the β-sheet Surface
A Bézier surface is traditionally defined as a patch, with surface variables u and v ranging from 0 to 1. However, a β-sheet will not have the shape of a square patch. Thus, a way to outline the β-sheet region on the Bézier surface is needed. We first projected the voxels of the β-sheet image onto the optimized Bézier surface. We then calculated the Delaunay triangulation of the point set using the code provided at https://github.com/ ironwallaby/delaunay.
After the triangles are generated using Delaunay, long edges at the boundary are iteratively pruned using χ algorithm until a desired maximum edge length is reached [16] (Figure 2). A polygonal boundary is generated this way on the Bézier surface to outline the boundary of the β-sheet. Note that the Delaunay triangulation and the pruning using χ algorithm has O(nlgn) time, since the algorithm uses a relational map to represent and access the triangulation efficiently. An example of the surface boundary derived for a β-sheet of protein 1CHD (PDB ID) is shown in Figure 2.

Polynomial Fitting of a β-sheet Image
Previous studies have shown that an order 3 polynomial can be an approximation of a beta-sheet. We summarize the polynomial fitting here. More details can be found in [27,28]. Given an isolated 3D image of a beta-sheet, it was first aligned with the Z-axis of the coordinate system using the normal vector that passes through the center of the image. Least square fitting of the image using (7) was performed to determine parameters A to J. Finally, the beta-sheet image and the polynomial surface were rotated back to the original DE GRUYTER OPEN B position. z = Ax 3 + By 3 + Cx 2 + Dy 2 + Ex 2 y + Fy 2 x + Gxy + Hx + Iy + J Fitting error was estimated using the RMSE between the image and the surface in (8).
Where z i is the Z coordinate of i-th voxel of the β-sheet image P, and z i is the Z coordinate of its corresponding point on surface S. Note that the image and the surface are registered using the normal vector of image P and the Z-axis of surface S.

Results
Iterative Bézier surface fitting was tested using two sets of β-sheet images. The first set contains eight simulated β-sheet images (see 3.1) and the second contains five β-sheet images extracted from cryo-EM density map EMD1237 (see 3.2). We compared two surface fitting methods -the polynomial fitting and the iterative Bézier fitting. In evaluation of the performance, we compared the root-mean-square-error (RMSE) between the β-sheet image and the surface model. Since the polynomial fitting was implemented in StrandTwister, the RMSE calculation used in StrandTwister, as in (8), was directly used in comparison. The RMSE between the β-sheet image and the Bézier surface was calculated after each voxel in the image is registered with its closest point on the surface model. We also performed visual inspection by superposition of the atomic structure, the image, and the surface model of the β-sheet.

Fitting Simulated β-sheet Images
For each of the eight simulated test cases, the atomic structure of a β-sheet was used to simulate the βsheet image. The atomic structures were downloaded from the Protein Data Bank, and all atoms on a β-sheet were used to simulate the 3-dimensional image of the β-sheet to 10 Å resolution using molmap function in Chimera [24]. To guide the surface fitting towards the dense points of the image, a density threshold was used to generate the β-sheet image. The same threshold was used in each test case for both the polynomial and Bézier fitting. An example of the simulated β-sheet image (gray region in Figure 4) is superimposed with the atomic structure (ribbon in Figure 4). Although most of the polynomial surface (golden dots) is fairly close to the Bézier surface (blue), differences are seen mostly at the corner or the edge of the image (Figure 4). Bézier  Figure 4 and Figure 3 use the same color annotation.
surface appears to be closer to the atomic structure (ribbon) at the corner in this case than the polynomial surface.
Each of the eight β-sheets contains three to ten β-strands. We observed that the RMSE error between the image and the Bézier surface (column 3 of Table 1) is consistently smaller than that for the polynomial surface (column 4 of Table 1). As an example, β-sheet 1A12_A has four strands. The RMSE between the image and Bézier surface is 1.01 Å ( Table 1). The RMSE between the β-sheet image and the polynomial surface is 1.15 Å. The major difference in fitting appears to be located at the corner region ( Figure 3). Bézier surface represents more accurately at the corner of a β-sheet, and that is expected since the shape of the corners will be mostly amplified by the twist of a β-sheet. Our test suggests that the iteratively fitted Bézier surface is more accurate than the polynomial surface. Although this case involves a β-sheet of only four strands, one of the β-strands has a significant kink, as seen for some β-sheets. It is possible that a polynomial is not the best way to fit such a case. We noticed that a Bézier surface is more flexible to represent such a shape.  Similar result is observed for 1CHD-SH1 that has seven β-strands. In this case, the β-sheet is larger than previous case 1A12_A. We observe a difference of 0.54 Å RMSE when two different surfaces are used in calculating RMSE. Although some of the difference might be resulted from two slightly different ways to register a voxel with its closest point on the surface, some of the RMSE difference is expected to be from the different fitting at the corners, since visual inspection clearly shows difference at the corners ( Figure 4). It is possible that a flexible surface is needed to fit a larger β-sheet image, particularly when the corners of a β-sheet appear highly-twisted, presumably from the accumulation of twist from every pair of neighboring β-strands of the β-sheet. We observe that the Bézier surface is better in representing the twist of a β-sheet and provides a more accurate fitting of the β-sheet image (Figure 4).

Fitting Cryo-EM β-sheet Images
The performance of iterative Bézier surface fitting was further tested using five β-sheet volumes of an experimentally derived cryo-EM density map. Cryo-EM density map (EMD-1237, 7.2 Å resolution) was aligned with atomic structure 2GSY (PDB ID) and was downloaded in September 2011. Chain A of 2GSY was used to mask the density region corresponding to chain A. SSETracer was then used to identify α-helices and β-sheets. Chain A of 2GSY has seven β-sheets, two of them are small β-sheets with 2 of 3 β-strands. The five larger ones (with four to six β-strands) were segmented out using SSETracer and were used in the test. We observed similar results as for the test using simulated density maps. The RMSE fitting error of iterative Bézier surface is consistently lower than that of the polynomial surface (Table 2). We also observed that the fitting error is generally slightly higher for cryo-EM volumes than for simulated volumes (Table 1 and Table 2). A possible reason is that cryo-EM maps generally have lower quality than simulated maps.

Discussions
Deriving β-strands from β-sheet density volume requires accurate representation of the β-sheet surface. It was shown in StrandTwister method that candidate sets of β-strand traces can be drawn on the surface, and the correct orientation of β-strand traces can be identified based on maximum twist angle measurement. Although a polynomial surface fit well in the central region of a β-sheet, it does not fit well at the corners. The inaccurate surface representation at the corners may result in inaccuracy of β-traces at the corner regions. We show that an iterative fitting of a Bézier surface provides more flexible surface representation for the twisted corners. We expect that the resulting β-traces on the Bézier surface are more accurately predicted than those on a polynomial surface.

Conclusions
Cryo-EM technique has recently been used to produce 3-dimensional images for many large molecular complexes. When the resolution of 3-dimensional images is not sufficient to derive atomic structure directly from the image, it is still challenging to interpret the image. At medium resolution of 5-8 Å, secondary structures such as α-helices and β-sheets show their characteristic shape. In order to model β-strands from a β-sheet image, we propose an iterative fitting method using Bézier surface. This method is compared to the polynomial fitting that we previously used in StrandTwister [28]. A test involving eight simulated β-sheet images of various sizes shows that the fitting error RMSE is reduced for all cases. Iterative Bézier surfaces is capable of representing twisted corners of a β-sheet image more precisely. Our future work will involve the development of a method to predict β-strand with the enhanced accuracy than StrandTwister.