EMNUSS: a deep learning framework for secondary structure annotation in cryo-EM maps

Abstract Cryo-electron microscopy (cryo-EM) has become one of important experimental methods in structure determination. However, despite the rapid growth in the number of deposited cryo-EM maps motivated by advances in microscopy instruments and image processing algorithms, building accurate structure models for cryo-EM maps remains a challenge. Protein secondary structure information, which can be extracted from EM maps, is beneficial for cryo-EM structure modeling. Here, we present a novel secondary structure annotation framework for cryo-EM maps at both intermediate and high resolutions, named EMNUSS. EMNUSS adopts a three-dimensional (3D) nested U-net architecture to assign secondary structures for EM maps. Tested on three diverse datasets including simulated maps, middle resolution experimental maps, and high-resolution experimental maps, EMNUSS demonstrated its accuracy and robustness in identifying the secondary structures for cyro-EM maps of various resolutions. The EMNUSS program is freely available at http://huanglab.phys.hust.edu.cn/EMNUSS.

It is challenging to build accurate structure models for cryo-EM maps [26]. Rigid fitting and flexible fitting are commonly used methods to fit atomic structures into EM maps, but they are only possible if template structures are available. Without template structures, de novo modeling tools are needed to build fullatom models into EM density maps. However, the application For those de novo modeling methods, secondary structure information plays a vital role. For example, EM-Fold places the secondary structure elements predicted from sequence to detected α-helix density rods [16]. Iterative secondary structure refinement was also used by Pathwalking [17].
With the exponentially increasing number of deposited EM maps, various algorithms have been developed for secondary structure detection in cryo-EM maps. Traditional methods detect density regions that correspond to typical α-helices or β-sheets [30][31][32][33]. Recently, deep learning has also been applied in the task of secondary structure classification for EM maps. In the recent decade, many attempts had been made to develop deep learning frameworks for secondary structure prediction in cryo-EM maps [34][35][36], including state-of-the-art Emap2sec [37] and Haruspex [38].
Emap2sec was developed for EM maps of intermediate resolutions (middle resolution), which shows promising results and provides a novel approach to structural interpretation of maps at intermediate resolution. The updated version, Emap2sec+ [39], extended the method to detect nucleic acid region in EM maps and adopted a more advanced convolutional neural network architecture, ResNet [40]. Emap2sec and Emap2sec+ perform classification for each voxel in given EM map with a stride of 2 Å. However, there are limitations with the methods. On one hand, the input of Emap2sec are cubic voxels of side length 11 Å, which is shorter than the length of an average α-helix with 10 residues (15 Å), thus may lead to reduced accuracy. On the other hand, Emap2sec/Emap2sec+ may also be incompatible with large EM maps since these maps have astronomical number of voxels. Haruspex was developed to detect nucleotides and protein secondary structure in high-resolution EM maps using a U-Net (UNet)-style [41] deep learning architecture. The full convolutional architecture of Haruspex enables its fast and accurate prediction even on personal laptops without GPU. Nevertheless, there are also limitations for Haruspex. For convenience, Haruspex rescales the grid interval of a cryo-EM map to 1.1 Å if it is outside [1.0, 1.2] Å. However, the inconformity of grid interval will confound the feature extraction of convolution layers. In addition, Haruspex tries to predict the 'unassigned' regions for a given EM map. Such 'unassigned' regions may cause imbalance on classification and reduce the predictive power of Haruspex.
To overcome the shortcomings of existing approaches, we have introduced a novel secondary structure determination framework for cryo-EM maps at both high and intermediate resolutions, named EMNUSS. EMNUSS adopts a three-dimensional (3D) nested UNet architecture [42] that can fast and accurately predict protein secondary structures for cryo-EM maps of varied sizes. EMNUSS showed a significantly improved performance on three datasets including simulated maps, middle resolution experimental maps and high-resolution experimental maps.

Network architecture
We adopted a nested UNet architecture to assign secondary structure for EM density maps. Figure 1 shows an overview of the EMNUSS architecture. Nested UNet (UNet++) [42] starts with an encoder subnetwork or backbone followed by a decoder subnetwork. 3D maxpooling layer with stride of 2 was used as downsampling, and trilinear interpolation layer with zoom factor of 2 was used as upsampling. Compared with the UNet used by Haruspex, UNet++ has dense skip connections on skip pathways, which improves gradient flow. Moreover, the deep super-vision used in UNet++ enables model pruning and improves performance. The input of our network are density chunks with a grid interval of 1.0 Å. The chunk size was set to 40 × 40 × 40 in order to cover the secondary structure element including surrounding interaction partners. The output of our network are annotated chunks of the same size. Each voxel in the annotated chunk has three channels containing the probabilities that this voxel is close to an α-helix residue, a β-strand residue or a coil residue.

Data collection
To be comparable with existing methods, the test sets of simulated maps and middle resolution maps used in Emap2sec were also used to evaluate our EMNUSS. For simulated maps, the authors of Emap2sec collected a nonredundant set of 2000 SCOPe structures [43]. we used the e2pdb2mrc.py program from the EMAN2 package [44] (v.2.11) to generate simulated EM maps for each structure at resolutions of 6.0 and 10.0 Å. Maps simulated from 34 structures were used as test set (Supplementary Table  1) and maps simulated from the remaining 1964 structures were used to train our EMNUSS.
For experimental maps at middle resolution, the test set of 43 experimental EM maps at 5-10 Å resolution in Emap2sec was used to evaluate EMNUSS (Supplementary Table 2). Instead of using 4-fold cross-validation like Emap2sec, we decided to build a nonredundant training set for middle resolution maps. All the EM density maps at 5-10 Å resolution that have associated PDB models were downloaded from EMDataResource. Any PDB structure and its corresponding EM map that met the following criteria were removed: (i) including nucleic acids, (ii) containing backbone atoms only, (iii) including missing chain, (iv) severe misfits between map and deposited model and (v) having any chain with over 30% sequence identity with any chain in the test set. In order to ensure criteria (iv), we calculated the cross-correlation between the deposited map and the map simulated from deposited model at the same resolution using UCSF chimera. The value of 0.65 was chosen as the threshold of cross-correlation. Afterwards, all the remaining maps were manually checked. Finally, the remaining training cases were clustered using a greedy algorithm. Two models are considered to be similar if any chain in the first model has >95% sequence identity with any chain in the second model. Case with the largest number of similar cases was chosen as a representative of the biggest cluster, and then the biggest cluster was removed from calculation. This procedure was repeated until all the cases are clustered. The final nonredundant training set consists of the representatives of each cluster. A total of 120 experimental EM maps with resolution ranging from 5.0 to 9.5 Å were retained as training set for experimental maps at middle resolution (Supplementary Table 3).
For experimental maps at high resolution, all the EM density maps with resolutions ranging from 2.0 to 4.0 Å that have associated PDB models were downloaded from the EMDB. Any PDB structure and its corresponding EM map that met the following criteria were removed: (i) including nucleic acids, (ii) containing backbone atoms only, (iii) including missing chain and (iv) severe misfits between map and deposited model. The remaining models were clustered using greedy algorithm at 30% sequence identity cutoff. A total of 468 experimental EM maps with resolution ranging from 2.0 to 4.0 Å were retained as the dataset for experimental maps at high resolution (Supplementary Table 4). The training and testing were performed on this dataset using 3-fold cross-validation.

Data processing of training maps
The grid size of the maps was unified to 1.0 Å by applying trilinear interpolation. For training, the ground truth secondary structure of each voxel was assigned according to the closest backbone atom (N, C or Cα atom) within 3.0 Å, which was annotated using STRIDE [45]. Residues with structure codes of H, G or I were labeled as α-helices (Helix). Residues with codes of B/b or E were labeled as β-strands (Sheet). The rest residues were labeled as Others. All the unassigned voxels (no backbone atom within 3.0 Å) were marked and excluded from training. Namely, the unassigned voxels will not contribute to the loss during training. The input density chunks of EMNUSS were of size 40 × 40 × 40, and the output predicted chunks of EMNUSS were of the same size. The input density maps (and their corresponding annotated ground truth maps) were cut into overlapping boxes of size 60 × 60 × 60 with a slide stride of 30. Density values of each map were clipped to be equal or greater than 0.0, and then normalized in each box to the range 0-1.0 by the maximum density value of each box. To ensure effective training, boxes that have ≥95% unassigned voxels were excluded from training.

Network training
Different EMNUSS models were trained for the datasets with different resolution ranges and all the training parameters were the same for different models. In total, 10% of the training maps were used as validation set during network training. Afterwards, all the maps from the training set and validation set were cut into boxes of size 60 ×60 × 60. The input training data were augmented through random 90 • rotations and by randomly cropping the input 60 × 60 × 60 box into a 40 × 40 × 40 chunk. The network was implemented with Pytorch. For each model, the network was trained for at most 300 epochs with 180 boxes employed in one batch. Adam optimizer was adopted to minimize the crossentropy loss for the prediction. The loss is weighted by [0.25, 0.5, 0.25], which is roughly the reciprocal of the ratio for the numbers of voxels of three secondary structures (Helix, Sheet and Others) in the dataset. The initial learning rate was set to 1e-3, and no L2 regularization was applied (weight_decay = 0). Learning rate decay was adopted. Namely, the learning rate will be reduced to 1/2 of its current value if the average loss on training set does not descend for 4 epochs. The training procedure will be stopped when the learning rate reaches a minimum value of 1e-5. The model with the least validation loss was used in evaluation.

Evaluation of the EMUSS
We tested our EMNUSS on three different test sets. For each case in the test set, the EM density map was cut into boxes of size  Only the cube of size 30 × 30 × 30 at the center of each chunk was used to reconstruct the annotated map. In order to compare our results with Emap2sec, similar evaluation criteria were used. Secondary structure annotation by EMNUSS was evaluated at the voxel and amino acid residue levels. For each voxel in the map, the ground truth secondary structure was taken from the closest Cα atom within 3.0 Å, which was defined by STRIDE annotation. The secondary structure of a Cα atom was considered as correctly predicted if the majority of neighboring voxels that were within 3.0 Å of the atom have the correct secondary structure assignment. As the background in the EM map does not contain atomic information, we have also excluded those background voxels below a certain density threshold from the experimental maps. Therefore, each voxel in the EM map will have one of the three secondary structure classes (Helix, Sheet or Others) in our EMUSS model. To test the robustness of EMUSS, two different values of density thresholds were used in this study: 0.0 or the author-recommended contour level. A residue is defined to be below the threshold if any voxel around this residue (within 1.0 Å) has a density value below threshold. F1 score, which is the harmonic mean of the precision and recall of the assignments, was used to evaluate the performance of EMNUSS on voxels level. For the residue-level evaluation, residue Q3 accuracy was reported, which is the fraction of correctly assigned residues for the entire protein and for each of the three secondary structure classes. Comparison with related works.
EMNUSS was compared with Emap2sec on the test set of simulated maps and middle resolution maps. For the test set of simulated EM maps, we simply used the benchmarking values published in Emap2sec paper. For the test set of middle resolution EM maps, the trained models of Emap2sec taken from Code Ocean [46] were used. The input density data of Emap2sec were voxels of size 11 Å × 11 Å × 11 Å. The grid size of the maps was unified to 1.0 Å by applying trilinear interpolation of the electron density in the maps. The data were obtained from a map by traversing along the three dimensions of the voxels with a stride of 2.0 Å. As shown in Supplementary Table  2, compared with Phase 1, the Phase 2 network of Emap2sec performed worse, especially for those coil regions, such that in the following discussion we only compare EMNUSS to Phase 1 of Emap2sec. EMNUSS was also compared with Haruspex on all three test sets. The Haruspex model was taken from the authors' Github repository at https://github.com/thorn-lab/haruspex. The output of Haruspex was the annotated map of four channels representing codependent probabilities for four classes ('helix', 'sheet', 'nucleotide' or 'unassigned'). Since nucleic acids were excluded from the test sets used in this study, the 'nucleotide' channel of Haruspex was ignored in classification.

Secondary structure prediction for simulated maps
We first evaluated the performance of EMNUSS on the test set of 34 diverse SCOPe [43] protein domain structures. EM density maps are simulated from these structures at two different resolutions, 6.0 and 10.0 Å. Figure 2 shows a comparison of EMNUSS and Emap2sec in secondary structure detection at the voxel and residue levels. The corresponding benchmarking results are listed in Table 1. It can be seen from the figure that our EMNUSS significantly outperformed Emap2sec on the simulated map set. For the maps simulated at 6.0 Å, EMNUSS achieved an average overall voxel F1 score of 0.902, compared with 0.796 by Emap2sec (Figure 2A). The average voxel F1 scores for three different secondary structure classes were 0.922, 0.864 and 0.865 for Helix, Sheet and Others, respectively. At the residue level, EMNUSS also achieved a high-average overall residue Q3 accuracy of 0.918, which is much higher than 0.831 by Emap2sec. The average residue Q3 accuracies for three secondary structure classes were 0.941, 0.927 and 0.863 for Helix, Sheet and Others, respectively. Compared with the maps simulated at 6.0 Å, there was a slight decline in the performance of EMNUSS for the maps simulated at 10.0 Å ( Figure 2B). An average overall voxel F1 score of 0.880 and an average overall residue Q3 accuracy of 0.900 were achieved by EMNUSS, which are still significantly higher    Figure 3 shows several examples of secondary structure annotation by EMNUSS [47]. Figure 3A and B shows the results for the maps simulated at 6.0 Å. Figure 3C and D shows the results for the maps simulated at 10.0 Å. It can be seen from the figure that the predicted secondary structures shown at the right side agree with the ground truth secondary structure assignment shown at the left side very well. Interestingly, for SCOPe entry d1a26a1, the β-bridge structure between two residues ILE_691 and HIS_742 ignored by Emap2sec was correctly   Figure 3E). These results demonstrated that our EMNUSS was indeed in capable of learning secondary structure information from EM maps.

Secondary structure prediction for middle-resolution experimental maps
Then, we compared EMNUSS with Emap2sec and Haruspex on the middle-resolution test set of 43 experimental EM maps with resolutions ranging from 5.0 to 9.5 Å. Two different values of thresholds were used: 0.0 or the author-recommended contour level. Figure 4 shows a comparison of the accuracies among the three methods at the voxel and residue levels. It can be seen from the figure that EMNUSS significantly outperformed Emap2sec and Haruspex on the middle-resolution experimental maps set. The corresponding benchmarking results are listed in Table 2 Table 2 for details of the evaluation result of each map.
Another notable features is that there is only a little difference in performance for EMNUSS when choosing different threshold values. However, a sharp decline in performance was found for Emap2sec when using 0.0 as the threshold. It may be understood because Emap2sec filtered their training voxels according to the author-recommended contour level. In contrast, EMNUSS was trained with chunks that were cut from raw EM density maps, and thus, its solid performance was ensured on the entire map. Figure 5 shows several examples of secondary structure annotation by EMNUSS for the experimental EM maps at middle resolution. EMNUSS performed well on EMD-8724, a 6.8 Å map for the yeast V-ATPase in complex with Legionella pneumophila effector SidK [48], as illustrated in Figure 5A. EMNUSS successfully predicted most of the helix and strand regions, and thus yielded a high overall residue Q3 accuracy of 0.789 (0.801) when using 0.0 (contour) as the threshold. Figure 5B shows the results of EMD-8549, which is a 5.7 Å map for Vps4-Vta1 complex [49]. EMNUSS successfully predicted most of the helix and strand regions except for the peripheral helices that have relatively weak and fuzzy electron density signals in the EM map. The overall residue Q3 accuracy was 0.709 by using 0.0 as the threshold and increased to 0.824 after excluding those peripheral helices from evaluation by using contour as the threshold. For EMD-3329 in Figure 5C, which is a 6.1 Å map for the complex of Tor and Lst8 [50], EMNUSS successfully predicted the helix region in the middle and the β-strand-rich region on both sides and achieved high overall residue Q3 accuracies of 0.780 and 782 by using 0.0 and contour as the thresholds, respectively. Figure 5D gives the example of EMD-3201, which is an 8.3 Å map for E. coli replicative DNA polymerase complex in DNA-free state [51]. EMNUSS successfully detected the overall architecture of two β subunits on the top. However, due to the poor resolution, EMNUSS crossed the borderline between helices and strands for the other subunits, resulted in a lower overall residue Q3 accuracy of 0.612 (0.620) using 0.0 (contour) as the threshold, compared with the above three examples.

Secondary structure prediction for high-resolution experimental maps
We further compared EMNUSS with Haruspex on the highresolution dataset of 468 experimental maps with resolutions ranging from 2.0 to 4.0 Å. The training and testing of EMNUSS were performed on this dataset using 3-fold cross-validation. Figure 6 shows the performance of EMNUSS and Haruspex at the voxel and residue levels. The corresponding benchmarking results are listed in Table 3. It can be seen from the figure that EMNUSS performed significantly better than Haruspex on the high-resolution dataset. When using 0.0/author-recommended contour level as the threshold, EMNUSS achieved a high average overall voxel F1 score of 0.838/0.846, compared with 0.595/0.612 for Haruspex. At the residue level, EMNUSS also achieved a higher average overall residue Q3 accuracy of 0.852 and 0.861 when using 0.0 and contour as the thresholds, compared with 0.741 and 0.746 for Haruspex, respectively. Similar to the situation in middle-resolution test set, when using residue Q3 accuracies as the criteria, Haruspex showed a comparable performance with EMNUSS on predicting alpha helices but performed substantially worse than EMNUSS on predicting β-sheets and coils. The significantly higher voxel F1 scores for all three secondary structure classes achieved by EMNUSS suggests the better performance of EMNUSS than Haruspex. See Supplementary Table  4 for details of the evaluation results of each map. Figure 7 shows several examples of secondary structure annotation by EMNUSS for the experimental EM maps at resolutions ranging from 2.0 to 4.0 Å. Figure 7A shows the results of EMD-9195, which is a 3.1 Å map for ADP-bound human mitochondrial Hsp60-Hsp10 football complex [52]. EMNUSS accurately annotated most of the helix and strand regions on this example, and thus yielded a high overall residue Q3 accuracy of 0.906 (0.923) when using 0.0 (contour) as the threshold. A β-strand-rich example EMD-4789 is displayed in Figure 7B, which is a 3.2 Å map for the pore structure of Clostridium perfringens epsilon toxin [53]. The core β-barrel region and peripheral helices were precisely annotated by EMNUSS. The overall residue Q3 accuracies were 0.853 and 0.868 when using 0.0 and contour as the threshold, respectively. Figure 7C gives the example of EMD-4919, which is a 2.9 Å map for mechanosensitive channel MSCS embedded in the membrane bilayer [54]. Although the transmembrane helices region (at the bottom) was coated with lipid bilayer in the EM map, EMNUSS successfully predicted most of the helix and strand regions and achieved a high overall residue Q3 accuracy of 0.932 (0.935) when using 0.0 (contour) as the threshold. One of the worst predictions made by EMNUSS is EMD-20579, which is a 3.8 Å cryo-EM map for RET/GFRa3/ARTN extracellular complex [55] (Figure 7D). EMNUSS achieved overall residue Q3 accuracies of 0.656 (0.659) when using 0.0 (contour) as the threshold on this example. As shown in the figure, EMNUSS yielded correct secondary structure annotation for almost the whole map except for the β-sheet-rich CLD4 domains marked by black circles. One of the possible reasons for such wrong prediction is that the electron density volumes around the β-sheets differ from typical density volumes of β-sheets. As illustrated in Figure 7E1, the density volume for the sheet is not completely connected. Thus, EMNUSS mistook the β-sheet for parallel helices. Figure 7E2 is an enlarged view of the β-barrel region in Figure 7B. The parallel β-strands can be seen clearly from the density volume. However, high-resolution map is indeed not essential for EMNUSS to make right predictions. As shown in Figure 7E3, another β-sheet region was successfully predicted for EMD-20579. Although individual β-strands are not visible here, the electron density signals of individual β-strands form an integrated surface, which is one of the typical electron density signals for a β-sheet. The essential reason of the incorrect predictions on such atypical density volumes is that the network is somehow short sighted and not acquainted with maps at lower resolutions. Thus, in order to improve the performance of EMNUSS, on one hand, the receptive field of EMNUSS should be broadened by optimizing the network architecture. On the other hand, a more detailed division of training maps according to resolutions is a potential way to improve the performance of EMNUSS.

Impact of map resolution
We further explored the impact of map resolution on the accuracy of EMNUSS. Since there was no significant difference in the performance of EMNUSS for choosing different threshold values, we only discussed the results using the author-recommended contour level as the threshold in this section. Figure 8A shows the overall residue Q3 accuracy relative to the map resolution on the middle-resolution set. It can be seen from the figure that the performance of EMNUSS depends on the resolution. The Pearson correlation coefficient between residue Q3 accuracy and resolution was -0.746. Figure 8B shows the overall residue Q3 accuracy relative to the map resolution on high-resolution set. The performance of EMNUSS seems to have no significant correlation with resolution on the high-resolution set. The Pearson correlation coefficient between residue Q3 accuracy and resolution was only -0.138. Comparing Figure 8A and Figure 8B, we can find that the performance of EMNUSS has a significantly lower correlation with map resolution on the high-resolution dataset than on the middle-resolution set. Considering the fact that we have 300+ maps (in each crossvalidation round) as the training set for high-resolution dataset compared with only 120 maps for middle-resolution dataset, this phenomenon may be attributed to the fact that EMNUSS was sufficiently trained on the high-resolution dataset. On the contrary, in order to improve the performance and robustness for the maps at middle resolution, more maps are needed for training.

Running time
In addition to the high accuracy of EMNUSS, another preeminent character of EMNUSS is its high computational efficiency. We have measured the running times of EMNUSS for different sizes of input maps on the test set of 468 high-resolution experimental maps. On a NVIDIA Tesla K80 GPU device with 12 GB graphic memory, EMNUSS can finish a prediction for any test case in no more than 2 min. The average running time of EMNUSS was only 31.3 s, which is significantly faster than 70.1 s for Haruspex. Figure 9 shows that there is a power-law relationship between the running time of EMNUSS and the input map size. The relationship can be fitted with the following formula: t = 6.241x 0.412 (1) where t is the expected running time in seconds for a map of size x MB. Therefore, the running time for EMNUSS will increase much slower than the map size does, which is especially encouraging for large-size maps.

Conclusion
We developed a deep learning framework to annotate protein secondary structures in EM density maps, which is referred to as EMNUSS. EMNUSS was extensively evaluated on three diverse test sets of simulated maps, middle-resolution experimental maps and high-resolution experimental maps. It was shown that EMNUSS significantly improved the accuracy of secondary structure detection and outperformed the existing approaches. Given its high efficiency and accuracy, it is anticipated that EMNUSS will serve as a valuable tool for secondary structure annotation for cryo-EM maps and help determine atomic structures for EM maps. With the increasing number of deposited EM density maps, the performance of EMNUSS can be further improved. One of our possible subsequent works of EMNUSS is to adding it into structural modeling for EM maps.

Key Points
• Secondary structure information is valuable for the atomic structure determination from cryo-EM maps. • A novel deep learning framework was proposed to detect protein secondary structures in EM density maps.
• Our model adopts a 3D nested UNet architecture for fast and accurate prediction.

Supplementary data
Supplementary data are available online at https://academic. oup.com/bib.

Data availability
The EMNUSS program is available at http://huanglab.phys. hust.edu.cn/EMNUSS. The raw data of the secondary structure annotation results are provided in the Supplementary Information. The data that support the findings of this study are available from the corresponding author upon request.