Deep learning improves macromolecule identification in 3D cellular cryo-electron tomograms

Moebel, Emmanuel; Martinez-Sanchez, Antonio; Lamm, Lorenz; Righetto, Ricardo D.; Wietrzynski, Wojciech; Albert, Sahradha; Larivière, Damien; Fourmentin, Eric; Pfeffer, Stefan; Ortiz, Julio; Baumeister, Wolfgang; Peng, Tingying; Engel, Benjamin D.; Kervrann, Charles

doi:10.1038/s41592-021-01275-4

Article
Published: 21 October 2021

Nature Methods volume 18, pages 1386–1394 (2021)

An Author Correction to this article was published on 16 November 2021

This article has been updated

Abstract

Cryogenic electron tomography (cryo-ET) visualizes the 3D spatial distribution of macromolecules at nanometer resolution inside native cells. However, automated identification of macromolecules inside cellular tomograms is challenged by noise and reconstruction artifacts, as well as the presence of many molecular species in the crowded volumes. Here, we present DeepFinder, a computational procedure that uses artificial neural networks to simultaneously localize multiple classes of macromolecules. Once trained, the inference stage of DeepFinder is faster than template matching and performs better than other competitive deep learning methods at identifying macromolecules of various sizes in both synthetic and experimental datasets. On cellular cryo-ET data, DeepFinder localized membrane-bound and cytosolic ribosomes (roughly 3.2 MDa), ribulose 1,5-bisphosphate carboxylase–oxygenase (roughly 560 kDa soluble complex) and photosystem II (roughly 550 kDa membrane complex) with an accuracy comparable to expert-supervised ground truth annotations. DeepFinder is therefore a promising algorithm for the semiautomated analysis of a wide range of molecular targets in cellular tomograms.

**Fig. 2: Target generation strategies for training.**

**Fig. 3: Analysis of algorithm performance on the synthetic dataset (SHREC 2019 challenge).**

**Fig. 4: Comparison of score maps obtained with template matching and DeepFinder, and analysis of structural resolution through subtomogram averaging (Dataset 2).**

**Fig. 5: DeepFinder localizes small macromolecules in cellular tomograms.**

Data availability

The synthetic dataset (Dataset 1) is available on the website of the SHREC 2019 challenge (http://www2.projects.science.uu.nl/shrec/cryo-et/2019/). A tomogram from the experimental dataset of C. reinhardtii cells (Dataset 2)³⁴,⁶⁰ can be found in the EMDB under accession number EMD-3967 (ref. ⁶¹). The test tomogram of the Chlamydomonas pyrenoid used for subtomogram averaging (Dataset 3)³⁷ can be downloaded from the EMDB under accession number EMD-12749, and the raw tilt series data for this tomogram are available at the Electron Microscopy Public Image Archive (EMPIAR) under accession number EMPIAR-10694. All four tomograms used to train and test the detection of PSII in Chlamydomonas thylakoids (Dataset 4)³⁸ can be downloaded from the EMDB under accession numbers EMD-10780, EMD-10781, EMD-10782 and EMD-10783.

Code availability

The code can be downloaded for free from our GitLab website (https://gitlab.inria.fr/serpico/deep-finder) along with accompanying documentation (https://deepfinder.readthedocs.io/en/latest/). DeepFinder is embedded into the new release of Scipion⁶² (https://github.com/scipion-em/scipion-em-deepfinder), an open-source image processing framework for cryo-electron microscopy (http://scipion.i2pc.es/).

Each step of DeepFinder shown in Fig. 1a can be executed either with scripts using the API (examples are provided) or through the graphical user interface. These steps may also be embedded in other workflows, for example, if the user needs only the segmentation step. To implement DeepFinder, we used Keras (http://keras.io), an open-source library written in Python that runs on top of the TensorFlow framework.

All training procedures were run on an Nvidia Tesla K80 GPU, with CUDA 8 and cuDNN 6. In the code, we display the memory consumption of DeepFinder for different training parameters.

Change history

16 November 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41592-021-01349-3

References

Schaffer, M. et al. Optimized cryo-focused ion beam sample preparation aimed at in situ structural studies of membrane proteins. J. Struct. Biol. 197, 73–82 (2017).
Frank, J. Approaches to large-scale structures. Curr. Opin. Struct. Biol. 5, 194–201 (1995).
McEwen, B., Renken, C., Marko, M. & Mannella, C. Principles and practice in electron tomography. Methods Cell Biol. 89, 129–168 (2008).
McIntosh, R., Nicastro, D. & Mastronarde, D. New views of cells in 3D: an introduction to electron tomography. Trends Cell Biol. 15, 43–51 (2005).
Nicastro, D., Frangakis, A., Typke, D. & Baumeister, W. Cryo-electron tomography of neurospora mitochondria. J. Struct. Biol. 129, 48–56 (2000).
Guesdon, A., Blestel, S., Kervrann, C. & Chrétien, D. Single versus dual-axis cryo-electron tomography of microtubules assembled in vitro: limits and perspectives. J. Struct. Biol. 181, 169–78 (2013).
Best, C., Nickell, S. & Baumeister, W. Localization of protein complexes by pattern recognition. Methods Cell Biol. 2007, 615–638 (2007).
Albert, S. et al. Direct visualization of degradation microcompartments at the ER membrane. Proc. Natl Acad. Sci. USA 117, 1069–1080 (2020).
Förster, F., Pruggnaller, S., Seybert, A. & Frangakis, A. S. Classification of cryo-electron sub-tomograms using constrained correlation. J. Struct. Biol. 161, 276–286 (2008).
Martinez-Sanchez, A. et al. Template-free detection and classification of membrane-bound complexes in cryo-electron tomograms. Nat. Methods 17, 209–216 (2020).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
LeCun, Y., Kavukcuoglu, K. & Farabet, C. Convolutional networks and applications in vision. In Proc. IEEE Int. Symp. on Circuits and Systems, 253–256 (2010).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Proc. Neural Inf. Processing Systems (NIPS) (eds Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q.) 1–9 (2012).
Long, J., Shelhamer, E. & Darrell, T. Fully convolutional networks for semantic segmentation. In Proc. Conf. Comput. Vis. Pattern Recognition (CVPR), 3431–3440 (2015).
Falk, T. et al. U-net—deep learning for cell counting, detection, and morphometry. Nat. Methods 16, 67–70 (2019).
Belthangady, C. & Royer, L. Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction. Nat. Methods 16, 1215–1225 (2019).
Ouyang, W., Aristov, A., Lelek, M., Hao, X. & Zimmer, C. Deep learning massively accelerates super-resolution localization microscopy. Nat. Biotechnology 36, 460–468 (2018).
Weigert, M. et al. Content-aware image restoration: pushing the limits of fluorescence microscopy. Nat. Methods 12, 1090–1097 (2018).
Wu, X., Zeng, X., Zhu, Z., Gao, X. & Xu, M. Template-based and template-free approaches in cellular cryo-electron tomography structural pattern mining. Comp. Biol. 11, 1146–1152 (2019).
Wang, F. et al. DeepPicker: a deep learning approach for fully automated particle picking in cryo-EM. J. Struct. Biol. 195, 325–336 (2016).
Al-Azzawi, A., Ouadou, A., Tanner, J. J. & Cheng, J. AutoCryoPicker: an unsupervised learning approach for fully automated single particle picking in cryo-EM images. BMC Bioinform. 20, 326 (2019).
Wagner, T. et al. SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM. Commun. Biol. 2, 218 (2019).
Bepler, T. et al. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nat. Methods 16, 1153–1160 (2019).
Tegunov, D. & Cramer, P. Real-time cryo-electron microscopy data preprocessing with Warp. Nat. Methods 16, 1146–1152 (2019).
Chen, M. et al. Convolutional neural networks for automated annotation of cellular cryo-electron tomograms. Nat. Methods 14, 983–985 (2017).
Che, C. et al. Improved deep learning based macromolecules structure classification from electron cryo tomograms. Mach. Vis. Appl. 29, 1227–1236 (2018).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Proc. Med. Image Comput. Comput. Assist. Interv. (MICCAI) 9351, (eds Navab, N., Hornegger, J., Wells, W. M. & Frangi, A.) 234–241 (2015).
Förster, F. & Hegerl, R. Structure determination in situ by averaging of tomograms. Cell. Electron Microsc. 79, 741–767 (2007).
Gubins, I. et al. SHREC’19 Track: classification in cryo-electron tomograms. In Proc. Eurographics Workshop on 3D Object Retrieval, SHREC–3D Shape Retrieval Contest 2019 https://www2.projects.science.uu.nl/shrec/cryo-et/2019/ (Utrecht Univ., 2019).
Hrabe, T. et al. PyTOM: a python-based toolbox for localization of macromolecules in cryo-electron tomograms and subtomogram analysis. J. Struct. Biol. 178, 177–188 (2012).
Gubins, I. et al. SHREC 2020: classification in cryo-electron tomograms. Comput. Graphics 91, 279–289 (2020).
Moebel, E. & Kervrann, C. A Monte Carlo framework for missing wedge restoration and noise removal in cryo-electron tomography. J. Struct. Biol.; X 4, 100013 (2020).
Rolnick, D., Veit, A., Belongie, S. & Shavit, N. Deep learning is robust to massive label noise. Preprint at arXiv https://arxiv.org/abs/1705.10694v2 (2017).
Pfeffer, S. et al. Dissecting the molecular organization of the translocon-associated protein complex. Nat. Communications 8, 14516 (2017).
Chen, Y., Pfeffer, S., Hrabe, T., Schuller, J. M. & Förster, F. Fast and accurate reference-free alignment of subtomograms. J. Struct. Biol. 182, 235–245 (2013).
Sanchez-Garcia, R., Segura, J., Maluenda, D., Carazo, J. & Sorzano, C. Deep consensus, a deep learning-based approach for particle pruning in cryo-electron microscopy. IUCrJ. 5, 854–865 (2018).
Freeman-Rosenzweig, E. et al. The eukaryotic co₂-concentrating organelle is liquid-like and exhibits dynamic reorganization. Cell 171, 148–162 (2017).
Wietrzynski, W. et al. Charting the native architecture of chlamydomonas thylakoid membranes with single-molecule precision. eLife 9, e53740 (2020).
Förster, F., Han, B. G. & Beck, M. Visual proteomics. Meth. Enzymol. 483, 215–243 (2010).
Vendeville, A., Larivière, D. & Fourmentin, E. An inventory of the bacterial macromolecular components and their spatial organization. FEMS Microbiol. Rev. 35, 395–414 (2011).
Gipson, B. R. et al. Automatic recovery of missing amplitudes and phases in tilt-limited electron crystallography of two-dimensional crystals. Phys. Rev. E. 84, 011916 (2011).
Deng, Y. et al. ICON: 3D reconstruction with ‘missing-information’ restoration in biological electron tomography. J. Struct. Biol. 195, 100–112 (2016).
Biyani, N. et al. Image processing techniques for high-resolution structure determination from badly ordered 2D crystals. J. Struct. Biol. 203, 120–134 (2018).
He, S. et al. The structural basis of Rubisco phase separation in the pyrenoid. Nat. Plants 6, 1480–1490 (2020).
Sheng, X. et al. Structural insight into light harvesting for photosystem II in green algae. Nat. Plants 5, 1320–1330 (2019).
Kingma, D. P. & Ba, J. L. ADAM: a method for stochastic optimization. Preprint at arXiv https://arxiv.org/abs/1412.6980v9 (2014).
Salehi, S. S. M., Erdogmus, D. & Gholipour, A. Tversky loss function for image segmentation using 3D fully convolutional deep networks. In Proc. MICCAI workshop on Machine Learning in Medical Imaging (MLMI), (eds Wang, Q., Shi, Y., Suk, H. I. & Suzuki, K.) 379–387 (2017).
Milletari, F., Navab, N. & Ahmadi, S.-A. V-Net: fully convolutional neural networks for volumetric medical image segmentation. In Proc. IEEE Int. Conf. 3D Vision (3DV), 565–571 (2016).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proc. Int. Conf. Learn. Representation (eds Bengio, Y. & LeCun, Y.) 1–14 (2015).
Comaniciu, D., Meer, P. & Member, S. Mean Shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24, 603–619 (2002).
Martinez-Sanchez, A., Garcia, I., Asano, S., Lucic, V. & Fernandez, J.-J. Robust membrane detection based on tensor voting for electron tomography. J. Struct. Biol. 186, 49–61 (2014).
Kremer, J. R., Mastronarde, D. N. & McIntosh, J. R. Computer visualization of three-dimensional image data using IMOD. J. Struct. Biol. 116, 71–76 (1996).
Zivanov, J. et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. eLife 7, e42166 (2018).
Bharat, T. B. & Scheres, S. Resolving macromolecular structures from electron cryo-tomography data using subtomogram averaging in RELION. Nat. Protoc 11, 2054–2065 (2016).
Harauz, G. & van Heel, M. Exact filters for general geometry three dimensional reconstruction. Optik 78, 146–156 (1996).
Rosenthal, P. B. & Henderson, R. Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J. Mol. Biol. 333, 721–745 (2003).
Chen, S. et al. High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy. Ultramicroscopy 135, 24–35 (2013).
Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Goddard, T. et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis. Protein Sci. 27, 14–25 (2018).
Albert, S. et al. Proteasomes tether to two distinct sites at the nuclear pore complex. Proc. Natl Acad. Sci. USA 114, 201716305 (2017).
Henderson, R. Avoiding the pitfalls of single particle cryo-electron microscopy: Einstein from noise. Proc. Natl Acad. Sci. USA 110, 18037–18041 (2013).
de la Rosa-Trevìn, J. et al. Scipion: a software framework toward integration, reproducibility and validation in 3D electron microscopy. J. Struct. Biol. 195, 93–99 (2016).

Acknowledgements

This work was jointly supported by the Fourmentin-Guilbert Foundation and Région Bretagne (Brittany Council). Calculations were performed on the Inria Rennes computing grid facilities partly funded by France-BioImaging infrastructure (French National Research Agency—ANR-10-INBS-04-07, ‘Investments for the future’) and at the Max Planck Institute for Biochemistry computing cluster, Martinsried, Germany. L.L., R.D.R., W.W., T.P. and B.D.E. were supported by DFG grant no. EN 1194/1-1 as part of FOR 2092, The Munich School for Data Science (MUDS) and Helmholtz Association. A.M.-S. was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2067/1-390729940.

We thank F. Förster and M. Killinger for fruitful discussions about cryo-ET data analysis and about deep learning applied to the analysis of large 3D volumes, respectively.

We thank the organizers of the SHREC 2019 and SHREC 2020 challenges for helpful assistance and for providing the template matching results: I. Gubins and R.C. Veltkamp (Utrecht University, Department of Information and Computing Sciences), G. van der Schot and F. Förster (Utrecht University, Department of Chemistry).

Finally, we thank S. Prima for careful reading of the paper and valuable suggestions and comments.

Author information

Authors and Affiliations

Serpico Project-Team, Centre Inria Rennes-Bretagne Atlantique and CNRS-UMR 144, Inria, CNRS, Institut Curie, PSL Research University, Campus Universitaire de Beaulieu, Rennes Cedex, France
Emmanuel Moebel & Charles Kervrann
Department of Computer Science, Faculty of Sciences, University of Oviedo, Oviedo, Spain
Antonio Martinez-Sanchez
Health Research Institute of Asturias (ISPA), Avenida Hospital Universitario s/n, Oviedo, Spain
Antonio Martinez-Sanchez
Institute of Neuropathology, Cluster of Excellence ‘Multiscale Bioimaging: from Molecular Machines to Networks of Excitable Cells’, University of Göttingen, Göttingen, Germany
Antonio Martinez-Sanchez
Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany
Lorenz Lamm, Ricardo D. Righetto, Wojciech Wietrzynski & Benjamin D. Engel
Helmholtz AI, Helmholtz Zentrum München, Neuherberg, Germany
Lorenz Lamm & Tingying Peng
Max Planck Institute of Biochemistry, Martinsried, Germany
Sahradha Albert, Stefan Pfeffer, Julio Ortiz & Wolfgang Baumeister
Fourmentin-Guilbert Scientific Foundation, Noisy-le-Grand, France
Damien Larivière & Eric Fourmentin
Zentrum für Molekulare Biologie der Universität Heidelberg, Heidelberg, Germany
Stefan Pfeffer
Ernst Ruska-Centre, Wilhelm-Johnen-Straße, Jülich, Germany
Julio Ortiz
Department of Chemistry, Technical University of Munich, Garching, Germany
Benjamin D. Engel

Contributions

E.M. designed and implemented the presented DeepFinder method and carried out the biocomputing experiments. C.K. supervised the project and was in charge of overall direction and planning. E.F., D.L. and C.K. devised the project and the main conceptual ideas, with assistance from A.M.-S. B.D.E. and W.B. facilitated access to datasets. B.D.E., S.A., W.W. and S.P. provided the C. reinhardtii datasets and annotations (Datasets 2, 3 and 4). A.M.-S., J.O. and B.D.E. conceived experiments on real datasets. L.L., R.D.R., W.W. and T.P. performed experiments on datasets depicting thylakoid membranes and pyrenoid matrices within vitreously frozen C. reinhardtii cells. E.M., B.D.E. and C.K. cowrote the paper. All authors provided critical feedback and helped shape the research, analysis and paper.

Corresponding authors

Correspondence to Benjamin D. Engel or Charles Kervrann.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Rita Strack was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Two workflows for macromolecule localization in cryo-ET.

a, Conventional processing pipeline based on template matching. b, DeepFinder (analysis stage): a multi-class approach able to localize particles of several different macromolecular species in one pass. Panels a and b highlight why DeepFinder is more agile than template matching when several macromolecule classes need to be localized. c, CNN architecture used in DeepFinder and based on U-Net²⁷. The architecture adopts the encoder-decoder paradigm, which produces an output volume with the same size as the input volume. Each green box represents a convolutional layer. The number of filters n and the filter size s are labeled as n × (s × s × s). All convolutional layers are followed by a ReLU activation function, except the last layer, which uses a soft-max function. The up-sampling is achieved with up-convolutions (also called ‘backward-convolutions’). Feature maps from different scales are combined by concatenation along the channel dimension. The total number of architecture parameters is approximately 903k. More precisely, this number depends slightly on N_cl, the number of classes: 902,928 + N_cl × 33.
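
The parameter count quoted in this caption can be checked with a few lines of arithmetic. The sketch below only evaluates the stated formula, 902,928 + N_cl × 33; the function name is illustrative and not part of the DeepFinder API. Reading the factor 33 as 32 weights plus one bias per output class is an assumption that the caption does not state.

```python
def deepfinder_param_count(n_classes: int) -> int:
    """Evaluate the caption's formula: 902,928 + N_cl * 33."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    return 902_928 + 33 * n_classes


# Example with N_cl = 3: the class-dependent term adds only 99 parameters,
# which is why the total stays close to 903k for any realistic class count.
print(deepfinder_param_count(3))  # 903027
```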

Extended Data Fig. 2 DeepFinder graphical user interface.

a, Training interface composed of a first window for parametrizing the procedure and a second window for displaying the training metrics in real-time. b, Segmentation interface which also opens a data visualization tool. This tool allows the user to explore the tomogram with superimposed segmentations. In addition, DeepFinder also incorporates interfaces for tomogram annotation, target generation and clustering (see the documentation at https://gitlab.inria.fr/serpico/deep-finder for more information).

Extended Data Fig. 3 Analysis of algorithm performance on the synthetic dataset (SHREC’20 challenge).

a, Performance (F₁-score) of DeepFinder, UMC and template matching algorithms and ability of algorithms to discriminate between 12 classes/subclasses of macromolecules. The highest (best) possible value of an F₁-score is 1.0 and the lowest (worst) possible value is 0. The scores of template matching were provided by the SHREC’20 challenge organizers (Utrecht University, Department of Information and Computing Sciences and Department of Chemistry). b, Performance of DeepFinder implemented as a multi-class network architecture and as an architecture made of 12 binary networks. These two architectures differ only by the number of output neurons. c, Influence of the training target generation method (‘shapes’ versus ‘spheres’). In the case of ‘shapes’, the exact shapes of the macromolecules have been used to annotate the tomograms. In the case of ‘spheres’, the shape and the orientation of macromolecules are not needed to generate the training targets. This analysis used eight tomograms for training, one tomogram for validation, and one tomogram for testing.
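
The ‘spheres’ strategy in panel c can be sketched as follows: each annotated particle position is turned into a labeled sphere, so neither the shape nor the orientation of the macromolecule is needed. This is a minimal pure-Python illustration, not the DeepFinder implementation; the function name, the argument layout and the per-class integer radii are assumptions made for the example.

```python
def make_sphere_target(shape, particles, radius_by_class):
    """Build a label volume (nested lists) with one sphere per particle.

    shape: (nz, ny, nx) of the tomogram.
    particles: iterable of (z, y, x, class_id) voxel coordinates.
    radius_by_class: dict mapping class_id to an integer radius in voxels.
    Background voxels keep the label 0; where spheres overlap, the
    particle listed last wins.
    """
    nz, ny, nx = shape
    target = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    for pz, py, px, class_id in particles:
        r = radius_by_class[class_id]
        # Only visit the bounding box of the sphere, clipped to the volume.
        for z in range(max(0, pz - r), min(nz, pz + r + 1)):
            for y in range(max(0, py - r), min(ny, py + r + 1)):
                for x in range(max(0, px - r), min(nx, px + r + 1)):
                    if (z - pz) ** 2 + (y - py) ** 2 + (x - px) ** 2 <= r * r:
                        target[z][y][x] = class_id
    return target


# Hypothetical example: one particle of class 2 at the centre of a 9x9x9 volume.
target = make_sphere_target((9, 9, 9), [(4, 4, 4, 2)], {2: 2})
labeled_voxels = sum(v != 0 for plane in target for row in plane for v in row)
```

In practice the radius would be chosen per class from the approximate particle size, which is the only prior knowledge this annotation style requires.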

Extended Data Fig. 4 Evolution of F₁-scores with respect to sizes of the training sets (number of tomograms) on the synthetic SHREC dataset (12 classes).

Scores are displayed for both the SHREC 2019 (a) and SHREC 2020 (b) editions. This figure gives an estimate of the amount of annotated data needed to identify macromolecules. This amount depends on the size of the target macromolecule: smaller targets require more annotations. Each tomogram contains on average 208 macromolecules per class. The macromolecules have been categorized into four groups (large, medium, small and tiny). This analysis used eight tomograms for training, one tomogram for validation, and one tomogram for testing.

Extended Data Fig. 5 Evolution of F₁-score with respect to training iterations and training set size on real cryo-ET Dataset #2, Chlamydomonas reinhardtii (3 classes).

a, The loss, which quantifies the segmentation quality, is computed for the training set as well as for the validation set. Comparing both curves allows assessment of the generalization capabilities of DeepFinder. The curves for both sets should ideally overlap; a gap between them indicates overfitting (the network memorizes training samples instead of learning discriminating features). One epoch equals 100 training iterations. b, The F₁-score, which quantifies the localization performance, computed on the test set. The F₁-score is obtained by comparing the membrane-bound ribosomes found by DeepFinder to expert annotations. The time axis has been obtained using a Tesla K80 GPU. The curve indicates that competitive particle picking results are obtained after 20 epochs, or 4.3 hours on this GPU. This analysis used 48 tomograms for training, one tomogram for validation, and eight tomograms for testing. c, In a similar fashion to Extended Data Fig. 4, this curve provides an estimate of the quantity of training data required to achieve a competitive result. It appears that this quantity is 1,400 ribosomes (nine tomograms), which is a typical size for a cryo-ET dataset. At first glance, this estimate seems to contradict the estimates in Extended Data Fig. 4: the numbers do not coincide (the curve labeled ‘Large’ estimates that quantity at 208 particles). Note that SHREC’19 is a synthetic dataset, composed of 12 classes. Here, we are dealing with a real cellular dataset consisting of three classes (membrane, mb-ribo and ct-ribo). It appears that having a larger number of classes enables the use of smaller training sets. On the other hand, the case of real data is more difficult, notably because of the presence of ‘label noise’ (errors due to the annotation pipeline) and other sources of signal corruption such as the missing wedge, the contrast transfer function and the low signal-to-noise ratio (in part caused by increased molecular crowding inside cells). This analysis used one tomogram for validation, and eight tomograms for testing.
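
The timing figures in panel b translate into a rough per-iteration cost. The short calculation below uses only the numbers from the caption (100 iterations per epoch, 20 epochs, 4.3 hours on a Tesla K80); the helper name is illustrative, and the result is an average that ignores any validation or I/O overhead.

```python
ITERATIONS_PER_EPOCH = 100   # from the caption
EPOCHS_TO_COMPETITIVE = 20   # from the caption
HOURS_ON_K80 = 4.3           # from the caption


def seconds_per_iteration(hours, epochs, iterations_per_epoch):
    """Average wall-clock seconds spent on one training iteration."""
    return hours * 3600.0 / (epochs * iterations_per_epoch)


total_iterations = EPOCHS_TO_COMPETITIVE * ITERATIONS_PER_EPOCH
avg_seconds = seconds_per_iteration(
    HOURS_ON_K80, EPOCHS_TO_COMPETITIVE, ITERATIONS_PER_EPOCH
)
# 2,000 iterations in total, or roughly 7.7 s per iteration on average.
print(total_iterations, round(avg_seconds, 2))
```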

Extended Data Fig. 6 Quantitative analysis of overlap with expert annotations on cellular cryo-ET data (Dataset #2, mb-ribos).

We varied the thresholds of template matching (a) and DeepFinder (b) to compute the Recall (ratio between the number of true positives and the number of particles in the ground truth), Precision (ratio between the number of true positives and the number of detected particles) and F₁-score (2 × (Recall × Precision) / (Recall + Precision)) curves. The threshold parameter for template matching is the constrained correlation coefficient, and for DeepFinder it is the cluster size, which corresponds to the macromolecule volume (in voxels). We obtained a maximum F₁-score of 0.86 for DeepFinder and a maximum F₁-score of 0.50 for template matching (with no post-classification step, see Extended Data Fig. 1a). Template matching and DeepFinder both have good Recall values, but template matching has a lower Precision than DeepFinder. This suggests that template matching can be recommended to select many candidates, but a time-consuming post-classification is required to improve Precision. DeepFinder has much higher Precision values, which confirms the results from the synthetic dataset (SHREC’19 challenge). This analysis used 48 tomograms for training, one tomogram for validation, and eight tomograms for testing.
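
The Recall, Precision and F₁ definitions above, and the way the curves are obtained by sweeping a threshold, can be sketched in a few lines. The example assumes that each detection has already been matched against the expert annotations, so that it carries a true-positive flag; the matching step, the function names and the toy numbers are all hypothetical.

```python
def precision_recall_f1(n_true_positive, n_detected, n_ground_truth):
    """Return (precision, recall, F1) from raw counts, guarding against 0/0."""
    recall = n_true_positive / n_ground_truth if n_ground_truth else 0.0
    precision = n_true_positive / n_detected if n_detected else 0.0
    denom = recall + precision
    f1 = 2 * recall * precision / denom if denom else 0.0
    return precision, recall, f1


def sweep_cluster_size(detections, n_ground_truth, thresholds):
    """Compute one (threshold, precision, recall, F1) tuple per threshold.

    detections: list of (cluster_size_in_voxels, is_true_positive).
    A detection is kept when its cluster size reaches the threshold.
    """
    curve = []
    for t in thresholds:
        kept = [is_tp for size, is_tp in detections if size >= t]
        scores = precision_recall_f1(sum(kept), len(kept), n_ground_truth)
        curve.append((t,) + scores)
    return curve


# Toy data (hypothetical): five detections, four particles in the ground truth.
detections = [(120, True), (95, True), (80, True), (30, False), (15, False)]
curve = sweep_cluster_size(detections, n_ground_truth=4, thresholds=[0, 50])
```

In this toy case, raising the threshold from 0 to 50 voxels removes the two small false positives, so Precision rises while Recall is unchanged. The same sweep applies to template matching with the correlation coefficient in place of the cluster size.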

Extended Data Fig. 7 DeepFinder handles ice contamination on the lamella surface.

a, Tomogram slice depicting the border of a FIB-milled lamella. The lamella contains a Chlamydomonas reinhardtii cell, with a lamella surface suffering from ice contamination. b, Tomogram slice with superimposed DeepFinder segmentation. Most of the ice contamination artifacts have been correctly classified as ‘background’. Nonetheless, some misclassifications exist, as can be observed in the zoomed-in boxes (in dashed red). In boxes 1 and 2, DeepFinder confuses some artifacts with membranes, and some features are wrongly classified as membrane-bound ribosomes. Such misclassifications can be filtered out, either by masking the boundaries of the lamella, or by rejecting segmented objects that are too small (using the ‘cluster size’ attribute given by the clustering step of the DeepFinder analysis stage). This analysis used 48 tomograms for training, one tomogram for validation, and eight tomograms for testing.
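
Both filtering options mentioned in this caption can be expressed as a simple post-processing pass over the object list. The sketch below is illustrative only: the dictionary keys, the function name and the example values are hypothetical, and the lamella is approximated as a flat slab between two z coordinates, which is a simplification of a real, possibly tilted, lamella.

```python
def filter_detections(objects, min_cluster_size, lamella_z_range=None):
    """Drop objects that are too small or that lie outside the lamella slab.

    objects: list of dicts with the (hypothetical) keys 'z' and 'cluster_size'.
    min_cluster_size: smallest accepted cluster size, in voxels.
    lamella_z_range: optional (z_min, z_max); objects outside are rejected.
    """
    kept = []
    for obj in objects:
        if obj["cluster_size"] < min_cluster_size:
            continue  # too small to be a genuine particle
        if lamella_z_range is not None:
            z_min, z_max = lamella_z_range
            if not (z_min <= obj["z"] <= z_max):
                continue  # sits on or beyond the lamella surface
        kept.append(obj)
    return kept


# Hypothetical object list: one good particle, one tiny cluster, one surface hit.
objects = [
    {"z": 60, "cluster_size": 150},
    {"z": 55, "cluster_size": 12},
    {"z": 3, "cluster_size": 140},
]
kept = filter_detections(objects, min_cluster_size=50, lamella_z_range=(10, 110))
```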

Extended Data Fig. 8 The generalization potential of DeepFinder on P19 cells.

DeepFinder was trained on the Chlamydomonas (algae) dataset and then applied on a tomogram of mouse P19 cells (EMD-10439). Although the ribosome has a different structure for the two species, for a given voxel size (13.68 Å) the structures are similar enough for DeepFinder to identify and localize mb-ribo particles in a P19 cell. a, Tomographic slice with both the superimposed segmented cell membrane (gray) and mb-ribo particles (blue). b, Average density from 300 mb-ribo particles. c, Histogram of mb-ribo particle distance from the nearest cell membrane. In this histogram, the maximum mode is located at 136.8 Å, which corresponds to the ribosome radius. This analysis used 48 tomograms for training, one tomogram for validation, and one tomogram for testing.
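
The histogram in panel c relies on converting voxel distances to ångströms: at 13.68 Å per voxel, the reported mode of 136.8 Å corresponds to 10 voxels. A minimal sketch of this measurement is given below, assuming particle positions and membrane voxels are available as coordinate tuples; the brute-force search and the function names are illustrative choices, not the method used by the authors.

```python
import math
from collections import Counter

VOXEL_SIZE_ANGSTROM = 13.68  # voxel size stated in the caption


def nearest_membrane_distance(particle, membrane_voxels,
                              voxel_size=VOXEL_SIZE_ANGSTROM):
    """Distance in Å from a particle to its closest membrane voxel (brute force)."""
    return voxel_size * min(math.dist(particle, m) for m in membrane_voxels)


def histogram_mode(distances, bin_width):
    """Return the centre of the most populated histogram bin."""
    counts = Counter(int(d // bin_width) for d in distances)
    best_bin = max(counts, key=counts.get)
    return (best_bin + 0.5) * bin_width


# Hypothetical coordinates (z, y, x) in voxels.
membrane = [(0, 0, 0), (0, 0, 30)]
d = nearest_membrane_distance((0, 0, 10), membrane)  # 10 voxels from the membrane
```

For a full segmentation, a Euclidean distance transform of the membrane mask would be far more efficient than this pairwise search.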

Supplementary information

Supplementary Information

Supplementary Table 2, Figs. 1–8 and Note 1 (with two figures).

Reporting Summary

About this article

Cite this article

Moebel, E., Martinez-Sanchez, A., Lamm, L. et al. Deep learning improves macromolecule identification in 3D cellular cryo-electron tomograms. Nat Methods 18, 1386–1394 (2021). https://doi.org/10.1038/s41592-021-01275-4

Received: 15 April 2020
Accepted: 18 August 2021
Published: 21 October 2021
Issue Date: November 2021
DOI: https://doi.org/10.1038/s41592-021-01275-4
