High-confidence 3D template matching for cryo-electron tomography

Cruz-León, Sergio; Majtner, Tomáš; Hoffmann, Patrick C.; Kreysing, Jan Philipp; Kehl, Sebastian; Tuijtel, Maarten W.; Schaefer, Stefan L.; Geißler, Katharina; Beck, Martin; Turoňová, Beata; Hummer, Gerhard

doi:10.1038/s41467-024-47839-8

Article
Open access
Published: 11 May 2024

Nature Communications volume 15, Article number: 3992 (2024)

Abstract

Visual proteomics attempts to build atlases of the molecular content of cells, but the automated annotation of cryo-electron tomograms remains challenging. Template matching (TM) and methods based on machine learning detect structural signatures of macromolecules. However, their applicability remains limited in terms of both the abundance and size of the molecular targets. Here we show that the performance of TM is greatly improved by using template-specific search parameter optimization and by including higher-resolution information. We establish a TM pipeline with systematically tuned parameters for the automated, objective and comprehensive identification of structures with confidence 10 to 100-fold above the noise level. We demonstrate high-fidelity and high-confidence localizations of nuclear pore complexes, vaults, ribosomes, proteasomes, fatty acid synthases, lipid membranes and microtubules, and individual subunits inside crowded eukaryotic cells. We provide software tools for the generic implementation of our method that is broadly applicable towards realizing visual proteomics.

Introduction

Cryo-electron tomography (CryoET) images the cellular environment in situ without labels and with fully preserved context^1,2. Recent advances in hardware and acquisition techniques have enabled CryoET to routinely image, with high throughput, cell volumes in their native state and obtain structures of abundant macromolecular complexes with near molecular resolution^3,4,5. However, lacking a uniform established method, the localization of particles in the tomograms remains highly customized, specific to each target, at best semi-automatic, and relying on strong manual input (such as the definition of geometric surface for large pleomorphic assemblies)^6,7,8,9,10 or extensive, often manual corrections for the false positives in an initial automated assignment^3,4,11. The confident identification of a sufficient number of particles for a challenging target such as the nuclear pore complexes (NPC) can thus take months or years of manual annotation of literally hundreds of tomograms^12,13. An automated, general, and reliable localization method would bring us closer to realizing the promise of visual proteomics^14,15,16,17 to build molecularly detailed representations of complex cellular landscapes from CryoET data.

Reliable assignment of molecular identities in tomograms is challenging due to both the biological context and the specifics of CryoET processing. Cells are crowded environments, and the proteins within them are structurally heterogeneous and vary widely in size and abundance. The physical limitations of the acquisition procedure further complicate particle localization^18,19,20. In CryoET, the electron dose is limited to prevent sample radiation damage, which results in a low signal-to-noise ratio in the acquired tilt series. The maximum sample tilt of about ±60 degrees results in incomplete angular sampling known as the missing wedge problem in the three-dimensional (3D) reconstruction. In addition, the electron micrographs are conventionally captured out of focus. To recover the high-resolution information, it is thus necessary to accurately determine the defocus and correct the contrast transfer function (CTF)²⁰. Visual proteomics needs to overcome these challenges for the reliable assignment of molecular identities to noisy 3D images of highly complex cellular volumes.

Manual tomogram annotation is still widely used despite being labor intensive, intrinsically subjective, and incomplete²¹. Template-based computational approaches²² use known objects (templates) and compare them with the data by calculating a similarity metric (usually a constrained cross-correlation)^{15,16,23,24,25}. In contrast, template-free methods iterate to cluster particles and determine patterns without imposing any structure^26,27. However, their accuracy and efficiency need improvement²². Deep-learning algorithms, including classification and semantic segmentation, have been applied to CryoET^22,28,29,30. Recently, implementations such as DeepFinder²⁹, DeePiCt³⁰, and TomoTwin³¹ have shown promising results in segmenting tomograms and identifying the positions of common macromolecular complexes. However, these methods require extensive annotations for training and are less effective in detecting low-abundance particles²², so far limiting their use to detect ribosomes and similarly sized particles. Furthermore, they determine only positions, and further processing is needed to determine particle orientations.

Template matching (TM)^{15,16,23,24,25} is typically used with low-resolution templates of the macromolecular complex of interest on down-sampled tomograms to reduce computational cost and avoid template bias. Large numbers of false positive hits are removed either manually, thereby lowering the objectivity of the approach, or through a multistep classification procedure, which is computationally expensive and can fail if the number of particles is small. In addition, the data down-sampling limits the ability to localize smaller or weak-signal particles³². In theory, the ability of TM to localize the particles with high confidence should be connected to the quality of the template and how well it resembles the actual data. However, in practice, it has not been objectively shown how TM depends on the type of template and parameters such as voxel size, masks, resolution filtering, and the number of orientations.

In this study, we establish a high-confidence TM pipeline and combine it with CryoET imaging for visual proteomics of eukaryotic cells. We show that the performance of TM depends not only on the size of the template, but also on its experimental origin and shape, and more importantly on the angular increment in orientational sampling in a template-specific manner. Furthermore, tomogram voxel size (magnification), filtering, and resolution have to be considered as optimization parameters. We demonstrate the power of optimized TM to localize nuclear pore complexes (NPCs), vault proteins, ribosomes, proteasomes, microtubules, and lipid membranes inside a single dataset. We establish that TM can identify low-abundance and low-density complexes with high fidelity, as exemplified by the identification of ribosome-loaded vaults. We show that TM quantitatively captures conformers and subunits, and we provide a parameter estimation software tool together with recommendations for setting template-dependent search parameters.

Results

High-confidence template matching for in situ macromolecule localization

We comprehensively tested our TM pipeline on tomograms of Dictyostelium discoideum and on exemplary tomograms from Schizosaccharomyces pombe and human tissue culture (Hek293) cells obtained from lamellae milled with cryo-focused ion beam microscopes⁴ (see Fig. 1 for the workflow and “Methods” section for details on data acquisition). Starting from a library of the best available templates for a series of candidate features, we performed TM of each template in a tomogram independently and assigned particle identities to the points with high constrained cross-correlation (CC). The locations and orientations of the assigned peaks permit the visualization and analysis of the spatial interactions of the features. We used a total of 21 templates on data from three different species (Table 1), at different voxel sizes and with multiple search parameters including the number of orientations and filters (see “Methods” section for details). Templates in the library were obtained from different sources including subtomogram averaging (STA), homology modeling, the protein data bank (PDB)³³, the electron microscopy data bank (EMDB)³⁴, and molecular dynamics simulations (see “Methods” section for details).

**Fig. 1: Template matching for visual proteomics.**

Table 1 Tested cases for template matching

We used the STOPGAP^35,36 software framework, and re-implemented it as a GPU-accelerated version (https://gitlab.mpcdf.mpg.de/bturo/gapstop_tm), to calculate the actual cross-correlation between templates and tomograms, maximizing the cross-correlation of the template according to its orientation and positions. This framework takes into account the missing wedge, angular tilt step, defocus, and electron dose (see “Methods” section and ref. ³⁶ for details). For each template, with optimized search parameters (see next section), peaks several standard deviations above noise appear in the z-score map. High-confidence peaks correspond to the position where the center of the template is placed to best reproduce the data from the tomograms.

Figure 1 summarizes the TM procedure. We used a library that includes templates for the NPC³⁷, the 80S ribosome⁴, and the nuclear envelope obtained by STA from tomograms of D. discoideum. For the proteasome³⁸ and microtubule³⁹, we used the previously reported human structures (PDB-id: 6rgq [https://www.rcsb.org/structure/6RGQ] (human 20S proteasome structure), PDB-id: 3jar [https://www.rcsb.org/structure/3JAR] (microtubule structure), respectively). For the vault, we created a density map starting from an atomic model generated by homology modeling. With each of the templates, we performed TM, initially at 4-binned data with a voxel size of 8.704 Å and then also at higher resolution (2-binned 4.352 Å/voxel and unbinned 2.176 Å/voxel). By progressing hierarchically to higher resolution, we aimed to capitalize on the high signal content of the data collected with the latest-generation hardware.

We transformed the cross-correlation volumes into z-score maps by subtracting the average and then dividing by the standard deviation (σ), both calculated for each template across the entire map. We use z-scores unless otherwise stated, as they quantify peak heights relative to the background in a particular tomogram. In the z-score representation, a peak at the center of the NPC is typically ~10 standard deviations (σ) above the map noise, while the vault and the ribosome have peaks with z-score values of ~30 and ~40, respectively (Fig. 1). For isolated objects such as the vault or ribosome, the peaks appear insular and sharp, while membranes or microtubules show elongated and continuous peaks consistent with the extended and repetitive character of the objects. Remarkably, TM also identifies low-density and low-abundance particles with high fidelity (Fig. 1). Automatic and semi-automatic particle detection algorithms have been widely tested for high-contrast and abundant macromolecular complexes in tomograms (e.g., ribosomes). However, fundamental macromolecular complexes such as the NPC or vault, which are scarce (2–3 copies per tomogram) and have low density, are particularly challenging. With optimal parameters, TM results in strong peaks for both macromolecular complexes (Fig. 1) and finds all positions identifiable by expert inspection. This finding is important in two ways: firstly, these complexes are fundamental for our understanding of cellular function, and secondly, given their low abundance, harnessing all the particles is key for visual proteomics analysis.
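The z-score transformation described above can be illustrated with a minimal sketch. It assumes a flattened list of cross-correlation values standing in for a 3D map; the toy background values, the planted peak, and the threshold of 5 are hypothetical and are not taken from the paper or from GAPSTOP^TM.

```python
import random
from statistics import fmean, pstdev


def to_z_scores(cc_values):
    """Subtract the map-wide mean and divide by the map-wide standard deviation."""
    mean = fmean(cc_values)
    sigma = pstdev(cc_values)
    if sigma == 0:
        raise ValueError("constant map: standard deviation is zero")
    return [(value - mean) / sigma for value in cc_values]


def peaks_above(z_scores, threshold):
    """Return (index, z-score) pairs for all values strictly above the threshold."""
    return [(i, z) for i, z in enumerate(z_scores) if z > threshold]


# Hypothetical toy map: Gaussian background plus one planted strong peak.
rng = random.Random(0)
cc_map = [rng.gauss(0.0, 0.01) for _ in range(1000)]
cc_map[500] = 0.5

z_map = to_z_scores(cc_map)
hits = peaks_above(z_map, threshold=5.0)
print(hits)
```

Because the mean and standard deviation are computed over the whole map, the resulting peak heights are relative to the background of that particular tomogram, which is what makes them comparable across templates.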

Assessment of parameters that impact on the performance of TM

The success of TM depends on the accurate tuning of various parameters, but clear guidelines on how to adjust them are missing. We analyzed various parameters and found that optimal TM requires systematic tuning of the bandpass filters (Fig. 2a, b), template (Supplementary Figs. 1–4) and mask size (Fig. 2c), voxel size (Supplementary Fig. 5) and angular sampling (Supplementary Figs. 5 and 6). Optimal parameter values depend on the quality of the data as well as the size and shape of the object (Fig. 2d–f and Supplementary Figs. 1–4).

**Fig. 2: Optimization of the search parameters in template matching.**

We first assessed the impact of frequency contents. For the ribosome, NPC subunit (C8-symmetric rotational segment), half vault, and microtubule templates, peaks decay with increasing high-pass filter, i.e., when low-resolution information is gradually removed (Fig. 2a). The low-pass filter has a less pronounced effect, although the z-score slightly increased when high-resolution information was included (Fig. 2b). This analysis implies that for ribosome, NPC subunit, vault and microtubule, TM detection benefits from retaining higher resolution information in the data.

Regarding mask sizes, we found that mask tightness has a negligible effect for ribosomes and microtubules as long as the template is completely contained (Fig. 2c). However, for membrane-associated structures such as the NPC, a shaped mask may exclude the membrane from the template, improving TM performance (Supplementary Fig. 7).

Angular scanning should be optimized in a template-specific manner

The above analysis indicated that the impact of parameters such as voxel size or the number of orientations sampled depends on the template mass, shape, and size. To systematically investigate this, we developed a Python-based tool to evaluate TM parameters in silico (see details in Methods and examples in Supplementary Figs. 1–4). The in silico evaluation of multiple templates showed that the CC depends almost linearly across different templates on the fraction of overlapping voxels between the rotated template and the object (Fig. 2d, e), a relation that would be exact if voxel intensities were strictly zero or one. The number of overlapping voxels depends on both angular sampling and object shape (Fig. 2e). This effect is particularly pronounced for hollow objects such as the vault and elongated structures such as protofilaments. In such cases, even small rotations lead to a large decrease in the number of overlapping voxels and hence in the cross-correlation. Thus, templates that require finer orientation sampling to be localized with high confidence will demand more computational power for detection with similar performance (Supplementary Figs. 8 and 9). We conclude that general recommendations for sampling during template matching cannot be made. Instead, optimal angular sampling is highly dependent on template shape and should be individually tested. Therefore, our pipeline allows us to optimize parameters in silico in a template-specific manner, prior to analyzing experimental data, to then channel the available computational power toward those templates that require more fine-grained scanning. For example, the variation of cross-correlation with angular distance (Supplementary Figs. 2b, 3b, and 4b) provides an initial guide for estimating axis-dependent angular steps. Angular steps that result in <40% decrease in cross-correlation are considered sufficient, as illustrated for the vault (Supplementary Figs. 2 and 9) and the NPC subunit (Supplementary Figs. 4 and 8). Our Python-based tool will allow users to do this systematically for any template.
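The shape dependence of the overlap fraction can be illustrated with a toy calculation. The sketch below is not the authors' Python tool: it uses hypothetical binary 2D shapes (an elongated rod and a disc, with dimensions chosen for illustration), rotates each about its center, and counts the fraction of voxels shared with the unrotated shape. It then scans for the largest angular step that keeps at least 60% overlap, using the overlap fraction as a stand-in for the cross-correlation under the binary-intensity assumption stated above.

```python
from math import cos, radians, sin


def rod(x, y):
    """Hypothetical elongated object: 41 x 5 voxels, centered at the origin."""
    return abs(x) <= 20.5 and abs(y) <= 2.5


def disc(x, y):
    """Hypothetical compact object: disc of radius 10.5 voxels."""
    return x * x + y * y <= 10.5 ** 2


def overlap_fraction(inside, angle_deg, half_box=24):
    """Fraction of object voxels that remain inside the object after rotation."""
    c, s = cos(radians(angle_deg)), sin(radians(angle_deg))
    total = shared = 0
    for x in range(-half_box, half_box + 1):
        for y in range(-half_box, half_box + 1):
            if not inside(x, y):
                continue
            total += 1
            # Map the voxel center into the frame of the rotated object.
            if inside(c * x + s * y, -s * x + c * y):
                shared += 1
    return shared / total


def largest_step(inside, min_fraction=0.6, max_angle=90):
    """Largest angle (in whole degrees) before the overlap first drops below min_fraction."""
    best = 0
    for angle in range(1, max_angle + 1):
        if overlap_fraction(inside, angle) < min_fraction:
            break
        best = angle
    return best


for name, shape in (("rod", rod), ("disc", disc)):
    print(name, largest_step(shape))
```

In this toy setting the disc tolerates any in-plane rotation, whereas the rod loses overlap within a few degrees, mirroring the qualitative statement that elongated and hollow templates need finer angular sampling.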

Quantitative localization of ribosomes

Although the qualitative detection of ribosomes was reported⁴, reliable particle detection with minimal false negative rates is a prerequisite for quantitative analysis of the localization and interaction of molecular complexes. We assessed the ability of optimized TM to locate individual ribosome positions and orientations by comparing the results of TM with existing annotations of the cytosolic 80S ribosomes for D. discoideum⁴. The annotations were obtained in a multistep classification procedure, with an initially oversampled set of ribosomes, using Relion⁴⁰, as described in ref. ⁴, which resulted in a map with resolutions up to 4.5 Å.

Figure 3 shows the results for TM on 4-binned data (8.704 Å/voxel). Motivated by our in silico evaluation (Supplementary Fig. 5), we assessed the effect of the number of orientations by sampling the rotational space in angular steps of 30, 20, 10, and 5 degrees (576, 1944, 15192, and 119952 orientations) and selected TM peaks corresponding to local maxima in the z-score map that are above a threshold (Fig. 3) and clearly inside the lamella borders. We considered a particle in the ground truth as TM detected if it was located within 10 nm (~1/3 of the ribosome diameter) of a TM peak. With increased numbers of orientations, the z-scores of the peaks increased and with that the percentage of TM-detected particles (Fig. 3c, d; see also Supplementary Figs. 5 and 6). With orientations separated by ~5 degrees, TM detected ~95% of the 437 previously annotated particles with a mean distance to the TM peak of (3.73 ± 1.57) nm (Fig. 3f) and with orientations that closely matched the annotated orientations (Fig. 3e). Consequently, the averages of the particles detected and orientated by TM recapitulate the density of the 80S ribosome with high sensitivity and accuracy without the need for a multistep classification process (Fig. 3h, i), similar to recent reports⁴¹. This suggests that TM can be used for a quantitative accounting of the particles present in the tomograms, whereby false negative detections appear to be minimal. Our analysis shows that the comprehensive search of the rotational space enhances the quantitative capability of TM⁴¹ in a trade-off with increased computational cost.
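The detection criterion used above (an annotated particle counts as detected if a TM peak lies within 10 nm) can be sketched as follows. The coordinates are hypothetical toy values in nanometers, and the function name is illustrative rather than part of any published tool.

```python
from math import dist


def detected_fraction(annotated, tm_peaks, max_dist_nm=10.0):
    """Return the fraction of annotated positions with a TM peak within
    max_dist_nm, together with the nearest-peak distances of the detected ones."""
    nearest = []
    for position in annotated:
        d = min((dist(position, peak) for peak in tm_peaks), default=float("inf"))
        if d <= max_dist_nm:
            nearest.append(d)
    fraction = len(nearest) / len(annotated) if annotated else 0.0
    return fraction, nearest


# Hypothetical positions in nm (x, y, z).
annotated = [(0, 0, 0), (50, 0, 0), (0, 80, 0), (100, 100, 100)]
tm_peaks = [(3, 4, 0), (50, 0, 6), (0, 95, 0)]

fraction, distances = detected_fraction(annotated, tm_peaks)
print(fraction, distances)
```

The list of nearest distances is what would feed a summary such as the mean distance between annotation and TM peak.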

**Fig. 3: Template matching locates the 80S ribosome with high spatial and rotational accuracy.**

High-confidence TM reveals membrane compartments

Accurate segmentation of membranes is crucial for visualizing cellular landscapes, and to the best of our knowledge, TM has not yet been used to detect cellular membranes. We tested TM for membrane segmentation with models of different origins and sizes (Table 1, Figs. 1 and 4). The first template was the map created from a frame in the trajectory of an atomistic simulation of a membrane in explicit water (atomic model). The second and third models were averages of the nuclear envelope obtained by subtomogram averaging with diameters of 43.5 nm (small STA) and 87 nm (large STA), respectively. For comparison, during TM, cylindrical masks with a diameter of 34.8 nm were used for both the atomistic and the small STA, while a cylindrical mask with a diameter of 76.5 nm was used for the large STA (see “Methods” section).

The inner and outer membranes of the nuclear envelope were detected using any of the three templates (atomistic, small STA, large STA; see Supplementary Movie 1). The atomistic and small STA templates performed roughly on par. Increasing the number of orientations (20, 10, and 2 degrees at 4-binned data with 8.704 Å/voxel) consistently decreased the background noise (Fig. 4), sharpening the peaks, and increased the confidence in the TM detection. False positives for the small templates (atomistic, small STA), e.g., from a microtubule segment (Fig. 4 left; see also Fig. 1) are suppressed by using the large STA template (or, visually, by recognizing the lacking 2D extension). However, the large STA model gives only a weak signal for curved membranes, pointing to the need for an expanded model set of membrane patches of varying curvature.

**Fig. 4: Template matching for the segmentation of membranes in 3D.**

Although computationally expensive compared to other segmentation methods, template matching for membranes has several strengths. For example, the template matching output could be used as an initial annotation for training deep-learning algorithms. In addition, TM not only predicts the positions of the membranes in the tomogram but also provides voxel-by-voxel normal vectors, which in turn enables a detailed analysis of the local properties of the membranes. The latter could also be used as an automatic input for triangulation methods and/or as a starting point for simulations of membrane dynamics.

Detection of subunits and conformer subpopulations

We tested the ability of TM to localize subunits and assign substates of ribosomes, the NPC, and microtubule fragments. We generated templates for the subunits of the D. discoideum NPC according to its C8 symmetry, microtubule protofilaments, the small (40S) and large (60S) ribosomal subunits, and for two prominent 80S ribosome states capturing the ratchet-like motion essential for protein synthesis⁴².

For the ribosomal subunits, we performed TM on 2-binned data (4.352 Å/voxel) with orientations every 10 degrees, since TM on 4-binned tomograms showed inconclusive peaks. A sub-volume of the tomogram was analyzed independently with three different templates: 80S, 60S, and 40S (Fig. 5). Similar to the 4-binned data (Fig. 3), the TM localized 96.9% of the 80S annotated ribosomes with z-score peaks up to 114 (Fig. 5b, c). Furthermore, when comparing the positions and orientations of the subunits, TM correctly predicted the location of the subunits and their relative orientations (Fig. 5d). Small but noticeable differences between the orientations of the subunits with respect to the position of the 80S reflect the limited angular sampling. Finally, using all the TM peaks detected by the 60S (90 particles) or only the unannotated (34 particles) TM peaks detected by the 60S, we recovered features from the 80S (see Supplementary Fig. 10), demonstrating the high quality of the particles found.

**Fig. 5: Template matching predicts the relative orientations of ribosome subunits and assigns ribosome rotational states.**

By comparing the relative TM z-scores on 2-binned data (4.352 Å/voxel) with orientations every 10 degrees, we could correctly assign the ratcheting state of the small subunit of individual ribosomes in space (Fig. 5e–h). Two known representative ratcheting states of the D. discoideum ribosome were used as templates⁴: rotated (EMD-15815 https://www.emdataresource.org/EMD-15815) and unrotated (EMD-15812 https://www.emdataresource.org/EMD-15812), and the states were assigned using the expectation-maximization algorithm (see "Methods" section for details) to predict the mixture of subpopulations (Fig. 5g), similar to previous studies⁴³. Although the rotated and unrotated templates share most of the density with only a slight rotation of the 40S (Fig. 5f), the TM assignments differentiated between the rotated and unrotated states, matching the existing annotations in 77.7% and 82.4% of cases, respectively (Fig. 5e, h). It is worth noting that there are other intermediate rotation states, and the binding of multiple cofactors to the ribosome along the translation cycle⁴ may affect the TM z-scores and ultimately the state assignment, which may account for non-matching particles.
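To illustrate how an expectation-maximization step can turn per-particle z-scores into state assignments, the sketch below fits a two-component Gaussian mixture to the difference between the z-scores obtained with the two templates. This is a generic mixture-model EM written for illustration; the choice of the z-score difference as input, the initialization, and the toy values are assumptions, and the authors' actual formulation is the one given in their Methods.

```python
from math import exp, pi, sqrt


def gaussian(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def em_two_states(deltas, n_iter=50, min_sigma=1e-3):
    """Fit a two-component 1D Gaussian mixture; return weights, means,
    standard deviations, and per-particle responsibilities."""
    if not deltas:
        raise ValueError("no data")
    # Deterministic initialization: components start at the two extremes.
    mu = [min(deltas), max(deltas)]
    spread = (mu[1] - mu[0]) / 4 or 1.0
    sigma = [spread, spread]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each particle.
        resp = []
        for x in deltas:
            p = [weight[k] * gaussian(x, mu[k], sigma[k]) for k in (0, 1)]
            total = p[0] + p[1]
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: update weights, means, and standard deviations.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            weight[k] = nk / len(deltas)
            mu[k] = sum(r[k] * x for r, x in zip(resp, deltas)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, deltas)) / nk
            sigma[k] = max(sqrt(var), min_sigma)
    return weight, mu, sigma, resp


# Hypothetical z-score differences (rotated template minus unrotated template).
deltas = [-6.1, -5.2, -4.8, -5.5, 4.9, 5.6, 6.3, 5.1, 4.4]
weight, mu, sigma, resp = em_two_states(deltas)
# Component 1 starts at the largest difference, i.e. the "rotated-like" group.
labels = [0 if r[0] > r[1] else 1 for r in resp]
print(labels, weight)
```

The fitted weights estimate the mixture of subpopulations, while the responsibilities give a per-particle assignment together with a measure of how ambiguous it is.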

TM also finds NPC subunits. Directly from the z-score maps, we could detect the C8-symmetric rotational segments of the NPC (Fig. 6a) with high confidence, as demonstrated by performance metrics (Supplementary Fig. 7), after performing TM on 4-binned tomograms (8.704 Å/voxel) and sampling orientations every 10 degrees as suggested by our in silico analysis. Interestingly, no peaks were detected using an NPC from a different species as a template (Supplementary Fig. 7), highlighting the role of template information content.

**Fig. 6: Template matching detects NPC subunits, microtubule protofilaments and ribosome-loaded vaults.**

To investigate the effect of template size, we used segments of microtubules differing in size. Using appropriate sampling (2-binned, 2.446 Å/voxel, 5 degrees), TM resolved the individual αβ-tubulin dimers as distinct peaks with the 13-fold symmetry of microtubules (Fig. 6b) when the protofilament template was used. This is apparent in tomograms of both D. discoideum and Hek293 cells (Supplementary Fig. 11). We further masked a single αβ-tubulin dimer (Fig. 6b and Supplementary Fig. 12). Despite the low combined mass of only 100 kDa, TM achieves good statistics both in terms of true positives and (likely) false negatives. Although the subunit segmentation along the filament was blurred, it is evident in the longitudinal z-scores along the axial lines passing through protofilaments that the local maxima correspond to the subunits in the microtubule lattice for both the protofilament template and the αβ-tubulin dimer (Supplementary Fig. 12). When cylindrical segments of different sizes are used as templates, microtubules are detectable at lower resolution (8.704 Å/voxel, 10 degrees), but the true positive rate is reduced with decreasing template size (Supplementary Fig. 13).

Overall, these results demonstrate that TM can find subunits of macromolecular complexes with high accuracy and precision.

High-confidence TM identifies vault-encapsulated ribosomes in situ

The biological function of the vault particle remains mysterious. A few interactors binding to the inside surface have been reported^44,45, which, in line with its capsule-like morphology, has led to speculation that vaults may enclose other particles and transport cargo within the cell. To the best of our knowledge, however, evidence for vaults encapsulating cargo in situ is still missing. Three of the vaults in the tomogram of Figs. 5a and 6c contain 80S ribosomes with highly significant z-scores (vaults: 54, 32, and 59; ribosomes: 63, 46, and 77, in Fig. 6c(i)–(iii), respectively). Note that TM reports excellent performance metrics for the identification of ribosomes and vaults (Supplementary Fig. 14). These findings support the hypothesis that vaults can be cargo-loaded in situ. Whether the encapsulation occurred during vault biogenesis or by transient opening remains to be further investigated.

High-confidence TM identifies macromolecular complexes in other species: comparison with state-of-the-art tools

We further tested the versatility and performance of high-confidence TM on a recently published tomographic dataset of S. pombe³⁰. We selected this dataset because it was used to test two recent deep-learning tools to localize particles in tomograms (DeePiCt³⁰ and DeepFinder²⁹) and annotations exist. We performed TM for ribosomes (80S), fatty acid synthase (FAS), membrane, and NPCs on a tomogram reconstructed from the tilt series reported for S. pombe (EMPIAR-10988 [https://www.ebi.ac.uk/empiar/EMPIAR-10989/]; TS_043)³⁰ (see Methods and Supplementary Fig. 15). Templates for S. pombe ribosomes (EMD-14426 [https://www.emdataresource.org/EMD-14426])³⁰, FAS (EMD-14412 [https://www.emdataresource.org/EMD-14412])³⁰ and the NPC (EMD-11373 [https://www.emdataresource.org/EMD-11373])⁴⁶ were obtained from the EMDB³⁴. For the membrane, we used the large STA template described above (see Fig. 4). From the whole NPC template, a smaller template of a rotational segment was extracted in a procedure analogous to Fig. 6a. TM was performed on one 4-binned tomogram (13.48 Å/voxel) for all the templates. We used angular steps of 5 degrees for the 80S, FAS, and NPC subunits, and 2 degrees for membranes.

For the ribosome localization, TM had an F1 score of 0.77, which is comparable to DeepFinder (median F1 = 0.83) and DeePiCt (median F1 = 0.79). TM performs significantly better on FAS (F1 = 0.70, Supplementary Fig. 15) than DeepFinder (median F1 = 0.11) and DeePiCt (median F1 = 0.46). Finally, in contrast to DeepFinder and DeePiCt that faced challenges locating the NPC, TM demonstrated its capability to precisely identify the individual NPC subunits with z-scores >20 (Supplementary Fig. 15d), as confirmed by expert inspection. TM also localized membranes with a generic, unadjusted template. All results for DeepFinder and DeePiCt were taken from ref. ³⁰.
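For reference, the F1 score used in this comparison is the harmonic mean of precision and recall. The minimal sketch below computes it from counts of true positives, false positives, and false negatives; the counts in the example are hypothetical and are not the values behind the scores reported above.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, computed from detection counts."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 70 correct picks, 20 spurious picks, 40 missed particles.
score = f1_score(true_pos=70, false_pos=20, false_neg=40)
print(round(score, 2))
```

Because F1 penalizes both spurious and missed picks, it is a stricter summary than the detected fraction alone.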

Discussion

The comprehensive identification of particles in electron tomograms remains challenging. Despite its conceptual simplicity, template matching has been considered a low-precision method, and its application has been limited by the low signal-to-noise ratio of tomographic data, the scarce availability of suitable templates, and the lack of objective optimization of search parameters. Here, we have shown that template matching can identify the positions and orientations of multiple macromolecular complexes in living cells with high accuracy and fidelity. For this task, templates can be used from multiple sources such as data banks, simulations, homology modeling, or volumetric data from the tomograms. For maximum efficiency, we developed software for an in silico parameter optimization and GPU-accelerated Python Stopgap for template matching (GAPSTOP^TM).

With optimized TM, we achieved a mass resolution of 100 kDa in experimental tomograms of a crowded cell. Using a generic template for human tubulin, we could readily localize individual tubulin subunits in a high-resolution CryoET map of D. discoideum cells (Fig. 6b, Supplementary Fig. 12). High-confidence TM thus pushes into a particle size regime in situ that covers much of cellular biology.

By exploiting geometric and contextual features, one can further improve the likelihood of finding objects by template matching. Vaults, for example, are of low abundance with a low-contrast interior, but their unique shape facilitates identification with confidence (Fig. 1 and Supplementary Fig. 2). Spatial extent is also important for TM. Another strategy to search for smaller objects is to decrease the voxel size (the same object will occupy more voxels, increasing the range of frequencies describing the template), which allowed us to locate ribosomal subunits and distinguish between ribosomal substates. However, the location of smaller isolated objects poses an additional challenge: the unambiguous validation of the peaks. In the cases presented here, we used annotated data (Figs. 3, 5 and Supplementary Figs. 5, 14, and 15) and expert inspection (Figs. 1, 4, 6 and Supplementary Figs. 7–9 and 11–16). However, when considering smaller structures, an increase in the number of peaks and volumes of high significance is expected. To overcome this challenge, we envision a hierarchical approach in which we mask parts of the volume where we have high confidence in the presence of an object and then perform a focused search for smaller objects. Still, relying on template matching alone may be insufficient, and additional information, such as abundance data would need to be incorporated to effectively analyze TM results.

High-confidence TM outperforms existing deep-learning-based tools in the reliable detection of challenging low-abundance and low-density complexes (e.g., NPC). TM delivers competitive or superior statistics compared to recent approaches based on artificial neural network architectures^29,30. Unlike existing deep-learning-based approaches^29,30,31, TM does not require any prior training, which makes it possible to precisely localize subunits and identify functional substates by distinguishing between multiple conformations (Figs. 5 and 6 and Supplementary Fig. 12). However, the widespread use of template matching was limited by its computational expense, due to the nature of the algorithm that evaluates each voxel in the volume and the need for extensive angular sampling (up to hundreds of thousands of orientations)⁴¹. This problem was exacerbated in the current STOPGAP implementation with limited parallelization across CPUs. Yet the TM algorithm is by construction embarrassingly parallel and relies on Fourier transformation, which is highly efficient on graphics processing units (GPUs). Therefore, we developed and released GAPSTOP^TM to harness the GPU’s parallel capabilities for TM while preserving the unique missing wedge and noise correlation modeling of the STOPGAP implementation³⁶. Significant speedups by using GPUs were also reported for pytom_tm⁴¹.

Finally, the TM workflow can readily be combined with AI-based approaches^{22,28,29,30,31}. At one end of the pipeline, AI can be used to optimize TM parameters and, at the other end, to integrate the outputs across template families into classification scores. At the center of the pipeline, however, the 3D CC score is highly efficient and captures the relevant physics by being rigorously proportional to the log-likelihood for Gaussian noise in the 3D map⁴⁷ (see “Methods” section). In the future, TM-annotated tomograms can be used to train and validate AI-based particle localization methods.

Taken together, our analysis demonstrates the detection of various objects, with high confidence, in cryo-electron tomograms acquired with the latest hardware. By expanding the repertoire of templates, e.g., from AlphaFold⁴⁸ and molecular dynamics simulations, TM should help us assign molecular identities to the large parts of tomograms currently unassigned. High-confidence TM thus changes the workflow in CryoET through fast, automated, objective, and comprehensive feature identification. In turn, CryoET combined with high-confidence TM brings us closer to the goal of visual proteomics: to map the positions and orientations of all macromolecular complexes within living cells.

Methods

Experimental tomograms

The tilt series of D. discoideum used in this study, as well as the annotations for the ribosomes and their substates, were previously reported (codes: EMPIAR-11845 and EMPIAR-11899)⁴. The tilt series for S. pombe were obtained from the Electron Microscopy Public Image Archive (EMPIAR)⁴⁹, code: EMPIAR-10988³⁰. The tilt series for H. sapiens have EMPIAR code: EMPIAR-11538. For the three species, the cell culture, sample preparation, data acquisition, and image processing are detailed in the original publications. For D. discoideum, in brief, tilt series were collected at 300 kV on a Titan Krios G2 microscope equipped with a Gatan BioQuantum-K3 imaging filter in counting mode and a Titan Krios G4 microscope equipped with a cold FEG, Selectris X imaging filter, and Falcon 4 direct electron detector in counting mode. Projections had a pixel size of 2.176 Å and 1.223 Å for D. discoideum, 3.37 Å for S. pombe, and 1.223 Å for H. sapiens, respectively, and were acquired in a dose-symmetric acquisition scheme⁵⁰ with 2° increments^4,51. For D. discoideum, S. pombe, and H. sapiens, the initial tomogram reconstruction was performed in eTomo from IMOD⁵², and the established parameters were used to reconstruct the tomograms with 3D-CTF correction using novaCTF⁵³. The corrected tomograms were used for TM either in their unbinned form or with applied binning of 2, 4, or 8.

Template matching

We performed TM using STOPGAP³⁵ and GAPSTOP^TM (GPU-Accelerated Python STOPgap for Template Matching) for the cases described in Table 1. STOPGAP is an open-source, freely available MATLAB-based code: https://github.com/williamnwan/STOPGAP. GAPSTOP^TM is a Python implementation of the STOPGAP framework that speeds up TM by 10–100 times through GPU utilization. It is available to the community via its repository: https://gitlab.mpcdf.mpg.de/bturo/gapstop_tm. Documentation and installation instructions are provided for ease of use (https://bturo.pages.mpcdf.de/gapstop_tm).

As input, both STOPGAP and GAPSTOP^TM require a template, a list of orientations to probe (angular sampling), a wedge list, definitions of the filters, and the reconstructed tomogram. Details on the preparation of the templates are given below. The list of orientations was generated using the STOPGAP function generate_angle_list, which samples the angle space uniformly on a grid and also takes into account the symmetry of the template (see Table 1). The wedge list contains the acquisition parameters for the individual tilts. In particular, it must contain the pixel size, the tilt angle, the defocus, and the electron dose. The low-pass filter passes low-frequency signals and attenuates high-frequency signals, whereas the high-pass filter passes high-frequency signals and attenuates low-frequency signals. In STOPGAP, both are specified in voxels as the radius of a spherical mask applied in Fourier space (i.e., these values depend on the dimensions of the template). See further details in the STOPGAP documentation. For each template, we obtained a map of the local CC maxima over orientations, which we turned into z-scores as $z=(\mathrm{CC}-\mu)/\sigma$, with $\mu$ and $\sigma$ the average and standard deviation of CC values across the map, respectively.
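The z-score normalization described above can be sketched in a few lines. This is a dependency-free illustration, not the STOPGAP or GAPSTOP^TM code: the function name `zscore_map`, the flat-list representation of the CC map, the toy values, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import fmean, pstdev

def zscore_map(cc_values):
    """Turn a flat list of CC scores into z-scores, z = (CC - mu) / sigma."""
    mu = fmean(cc_values)
    # Assumption: population standard deviation over all voxels of the map.
    sigma = pstdev(cc_values, mu)
    if sigma == 0:
        raise ValueError("CC map is constant; z-scores are undefined")
    return [(cc - mu) / sigma for cc in cc_values]

# Toy CC "map" (hypothetical numbers): noise-level scores plus one strong peak.
cc_map = [0.01, 0.02, 0.00, 0.03, 0.01, 0.02, 0.01, 0.35, 0.02, 0.01]
z_map = zscore_map(cc_map)

# Index of the highest-scoring voxel.
peak_index = max(range(len(z_map)), key=z_map.__getitem__)
```

For a real tomogram the same arithmetic would be applied to the full 3D array of orientation-maximized CC values.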

In silico peak analysis

The template weighting and CC calculation methods from STOPGAP³⁵ have been ported to Python and extended to output additional information relevant to the input parameters. The inputs are the same as for the original TM, but instead of a whole tomogram, a small volume is used. The volume can be either the same as the template (typically an STA map or a model) or a subtomogram (obtained either based on an existing ground truth or by manual picking). For full peak analysis, one must also provide a density mask, which is a binary (or tapered) map corresponding to the density of the template (or alternatively a threshold to create one during the analysis). In addition to the z-score map and the angles map, the peak analysis provides information on the TM progress as well as the analysis of the template and the resulting maps. A table shows the dependence of the template orientation on the CC scores and on the number of overlapping voxels. For the template, it computes the dimensions, the number of voxels in the density mask, and a solidity calculated as the number of voxels in the density map divided by the volume of its convex hull. It also returns the value of the peak, its exact location, and line profiles through the peak along each dimension. The angle map is used to compute three maps of angular distances, where each voxel contains the angular distance in degrees between the orientation encoded in the angle map and the starting template orientation. The first map contains the angular distance of the full orientation and is computed using a quaternion-based cosine similarity formula⁵⁴. The second map contains the angle between the normal vectors of the final and the initial orientation, which encodes the rotation on the cone. The third map contains the angle between the in-plane vectors. The maps provide information on whether the CC scores are more sensitive to cone or in-plane rotation (or neither) and thus can be used to determine sufficient angular sampling. 
Finally, the key results of the peak analysis are summarized in a PDF file to provide an easy-to-read overview for the user. While the tool is most useful for determining the optimal setup for GAPSTOP^TM (or deciding its feasibility), it can also be used to analyze the origin of false positive results by testing a template against a map containing a different structure. For example, a ribosome template can be tested against a map containing a proteasome to determine the pixel size and filtering needed to distinguish the two with sufficient confidence. Similarly, the membrane template can be tested against a microtubule structure to determine the size of the template and mask necessary to pick mostly membranes. Lastly, it is possible to turn off the missing wedge weighting to analyze its impact on the peak shape, or to add an angular offset to the starting orientation to see how it affects the peak value for a given angular sampling.
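The first two angular-distance maps can be illustrated with a short sketch. It assumes unit quaternions in (w, x, y, z) order and uses the common formula θ = 2·arccos(|q₁·q₂|) for the full rotation distance; the exact formula used in cryoCAT may differ in detail. All function names here are hypothetical.

```python
import math

def quat_from_axis_angle(axis, angle_deg):
    """Unit quaternion (w, x, y, z) for a rotation about `axis` by `angle_deg`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half) / norm
    return (math.cos(half), ax * s, ay * s, az * s)

def angular_distance_deg(q1, q2):
    """Full angular distance between two orientations, in degrees."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return math.degrees(2.0 * math.acos(min(1.0, dot)))

def rotate_vector(q, v):
    """Rotate vector v by unit quaternion q."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * (q_vec x v); then v' = v + w * t + q_vec x t
    tx, ty, tz = 2 * (y * vz - z * vy), 2 * (z * vx - x * vz), 2 * (x * vy - y * vx)
    return (vx + w * tx + (y * tz - z * ty),
            vy + w * ty + (z * tx - x * tz),
            vz + w * tz + (x * ty - y * tx))

def cone_angle_deg(q1, q2, normal=(0.0, 0.0, 1.0)):
    """Angle between the rotated normal vectors (rotation "on the cone")."""
    n1, n2 = rotate_vector(q1, normal), rotate_vector(q2, normal)
    dot = sum(a * b for a, b in zip(n1, n2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

identity = (1.0, 0.0, 0.0, 0.0)
in_plane = quat_from_axis_angle((0, 0, 1), 30.0)  # pure in-plane rotation
tilted = quat_from_axis_angle((1, 0, 0), 40.0)    # tilts the normal vector
```

A pure in-plane rotation changes the full distance but leaves the cone angle at zero, which is the kind of separation the three maps are meant to expose.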

Membrane templates

For the atomistic membrane template, we used the final lipid bilayer of a 28-ns molecular dynamics simulation of a 40 × 40 nm² membrane patch (7164 lipids) in explicit water, using the setup and protocol of ref. ⁵⁵. Input files and the final output structure can be found in the public repository⁵⁶. The small STA and large STA models were obtained as subtomogram averages of the nuclear envelope with diameters of 43.5 nm and 87 nm, respectively. In TM, cylindrical masks with a diameter of 34.8 nm were used for the atomistic and small STA models. For the large STA model, the diameter was increased to 76.5 nm.

Creation of density maps from atomic models

In the cases where an atomic model was available (membrane and vault), we used the molmap function of ChimeraX⁵⁷ with a resolution of 3.5 Å. Here, each atom is represented by a 3D Gaussian. The width of the Gaussian is given by the resolution, while the amplitude is proportional to the atomic number. Afterward, we used EMAN2⁵⁸ to rescale and resample the map to match the voxel size of the respective tomogram.
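The idea of representing each atom by a 3D Gaussian can be sketched as follows. This is not the ChimeraX implementation: the function name `simulate_density`, the nested-list grid, the two-atom input, and the unnormalized amplitude are illustrative assumptions. The relation σ = resolution/(π√2) is also an assumption here, taken from our understanding of the documented molmap default.

```python
import math

def simulate_density(atoms, resolution, grid_shape, voxel_size):
    """Sum one Gaussian per atom on a regular grid.

    atoms: iterable of (x, y, z, atomic_number), coordinates in angstrom.
    Returns a nested list density[i][j][k].
    """
    # Assumed width convention (see lead-in): sigma = resolution / (pi * sqrt(2)).
    sigma = resolution / (math.pi * math.sqrt(2.0))
    nx, ny, nz = grid_shape
    density = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for ax, ay, az, atomic_number in atoms:
        for i in range(nx):
            for j in range(ny):
                for k in range(nz):
                    r2 = ((i * voxel_size - ax) ** 2
                          + (j * voxel_size - ay) ** 2
                          + (k * voxel_size - az) ** 2)
                    # Amplitude proportional to the atomic number (unnormalized).
                    density[i][j][k] += atomic_number * math.exp(-r2 / (2.0 * sigma ** 2))
    return density

# Hypothetical two-atom "model": a carbon and an oxygen 2 angstrom apart.
atoms = [(4.0, 4.0, 4.0, 6), (6.0, 4.0, 4.0, 8)]
density = simulate_density(atoms, resolution=3.5, grid_shape=(9, 9, 9), voxel_size=1.0)
```

The triple loop is kept for readability; a practical implementation would be vectorized and would restrict each Gaussian to a few σ around its atom.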

Templates for the ribosomal subunits 40S and 60S

To generate templates of the large 60S and small 40S ribosomal subunits, the ribosome structure of a translating D. discoideum ribosome from EMD-15810⁴ was segmented in ChimeraX⁵⁷ using the Segger function⁵⁹. A fitted eukaryotic ribosome atomic model (PDB-id: 5LZS⁶⁰) was used to guide this procedure.

Statistical assignment of ribosomal substates

To assign the substates of the 80S ribosome, we performed TM using two templates corresponding to the rotated (EMD-15815 [https://www.emdataresource.org/EMD-15815]) and unrotated (EMD-15812 [https://www.emdataresource.org/EMD-15812]) ribosomal states. High-confidence peaks were extracted from each TM map with their respective z-scores.

For a given particle, defined by its coordinates, we computed the ratio between the two TM z-scores as:

$$x_{i}=\frac{\mathrm{cc}_{i\text{-}\mathrm{rot}}}{\mathrm{cc}_{i\text{-}\mathrm{unrot}}} \tag{1}$$

where $\mathrm{cc}_{i\text{-}\mathrm{rot}}$ and $\mathrm{cc}_{i\text{-}\mathrm{unrot}}$ are the z-scores obtained for particle $i$ with the rotated and unrotated templates, respectively. To assign the rotational substate of each particle, we used a Gaussian mixture model (GMM). Specifically, we assumed that the distribution of x can be modeled as a linear superposition of two Gaussian distributions, one for the rotated state and the other for the unrotated state. The probability density function of the GMM can be written as:

$$p(\mathbf{x})=\pi_{\mathrm{rot}}\,\mathcal{N}(\mathbf{x}\mid\mu_{\mathrm{rot}},\Sigma_{\mathrm{rot}})+(1-\pi_{\mathrm{rot}})\,\mathcal{N}(\mathbf{x}\mid\mu_{\mathrm{unrot}},\Sigma_{\mathrm{unrot}}) \tag{2}$$

where $\mathcal{N}(\mu,\Sigma)$ represents a Gaussian probability density function with mean $\mu$ and variance $\Sigma$.

To estimate optimal parameters for $\pi_{\mathrm{rot}}$, $\mu_{i}$, and $\Sigma_{i}$, we used the sklearn.mixture⁶¹ Python implementation of the expectation-maximization (EM) algorithm. The EM algorithm alternates between computing the expected values of the latent variables (the assignment of each data point to a mixture component) and updating the parameters of the GMM to maximize the log-likelihood of the observed data. Specifically, the E-step computes the posterior probability of each mixture component for each data point, given the current estimates of the parameters, while the M-step updates the parameters to maximize the expected complete log-likelihood of the data, given the posterior probabilities.
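The E- and M-steps can be made concrete with a hand-written EM loop for a two-component, one-dimensional mixture. This is a stand-in for illustration only and does not use or reproduce the sklearn.mixture implementation. The function names, the quartile-based initialization, the synthetic ratios, and the convention that the component with the larger mean is the rotated state are all assumptions.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_two_gaussians(xs, n_iter=200):
    """EM for a 1D mixture of two Gaussians; returns (pi, mu, var, responsibilities)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    # Crude initialization: means at the lower/upper quartile, shared variance.
    mu = [xs_sorted[n // 4], xs_sorted[3 * n // 4]]
    mean = sum(xs) / n
    var0 = sum((x - mean) ** 2 for x in xs) / n
    var = [var0, var0]
    pi = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each data point.
        resp = []
        for x in xs:
            w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = w[0] + w[1]
            resp.append((w[0] / total, w[1] / total))
        # M-step: re-estimate weights, means, and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-12)
    return pi, mu, var, resp

# Synthetic z-score ratios (hypothetical): 300 "unrotated" and 200 "rotated" particles.
rng = random.Random(0)
ratios = [rng.gauss(0.9, 0.03) for _ in range(300)] + [rng.gauss(1.1, 0.03) for _ in range(200)]
pi, mu, var, resp = fit_two_gaussians(ratios)

# Assumption: component 1 (larger mean ratio) corresponds to the rotated state.
labels = ["rotated" if r[1] > 0.5 else "unrotated" for r in resp]
```

The posterior probabilities in `resp` give a per-particle confidence for the assignment, which is the quantity a hard label would otherwise hide.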

Determination of the receiver operating characteristic (ROC) curves and F₁ scores

Adjusting the thresholds for the z-score derived from template matching results in varying balances between specificity and sensitivity. This trade-off can be depicted in a graphical representation known as a receiver operating characteristic (ROC) curve, as detailed in ref. ⁶².

The ROC curve illustrates sensitivity (true positive rate - TPR) on the y-axis and 1 − specificity (false positive rate - FPR) on the x-axis. The area under this curve provides a concise metric summarizing the classifier’s overall performance.

$$\mathrm{Sensitivity\ or\ true\ positive\ rate}=\frac{TP}{TP+FN} \tag{3}$$

$$\mathrm{False\ positive\ rate}=\frac{FP}{FP+TN} \tag{4}$$

Another common measure of predictive performance is the F₁-score.

$$\mathrm{F}_{1}\ \mathrm{score}=\frac{2TP}{2TP+FP+FN} \tag{5}$$

To generate ROC curves and obtain F₁ scores for ribosomes, NPC, vault, and microtubules, we compared peaks from the TM with annotated particles. We considered peaks from the template matching with z-scores > 5 and separated by half the diameter of the template. For the 80S ribosomes, we used the annotated particles from a previous publication obtained with Relion classification⁴ as reference. Ground truth for the vault and NPC was obtained by expert manual picking followed by subtomogram averaging. For microtubules, a set of ground truth peaks was obtained by expert manual annotation. A particle was considered found if the TM peak was closer than 10, 10, 15, and 5 nm to the annotated ground truth for ribosomes, vaults, NPC subunits (C8-symmetric rotational segment), and microtubules, respectively.
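The distance criterion can be sketched as a simple matching routine. The text does not specify how ties or multiple peaks near one annotation were handled, so the greedy one-to-one matching in order of decreasing z-score is an assumption, as are the function name `match_peaks` and the toy coordinates.

```python
import math

def match_peaks(peaks, ground_truth, max_distance_nm):
    """Label each TM peak as true positive if it lies closer than
    max_distance_nm to a not-yet-matched annotated particle.

    peaks: list of (x, y, z, z_score), coordinates in nm.
    ground_truth: list of (x, y, z) in nm.
    Returns (is_true_positive per peak, number of unmatched annotations).
    """
    unmatched = set(range(len(ground_truth)))
    is_tp = [False] * len(peaks)
    # Assumption: peaks with higher z-scores claim annotations first.
    order = sorted(range(len(peaks)), key=lambda i: peaks[i][3], reverse=True)
    for i in order:
        position = peaks[i][:3]
        best, best_d = None, max_distance_nm
        for g in unmatched:
            d = math.dist(position, ground_truth[g])
            if d < best_d:
                best, best_d = g, d
        if best is not None:
            unmatched.discard(best)
            is_tp[i] = True
    return is_tp, len(unmatched)

# Toy example (hypothetical coordinates), using the 10 nm ribosome criterion.
peaks = [(0.0, 0.0, 0.0, 12.0), (50.0, 0.0, 0.0, 8.0), (3.0, 0.0, 0.0, 6.0)]
annotations = [(2.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
is_tp, n_missed = match_peaks(peaks, annotations, max_distance_nm=10.0)
```

Here the third peak is counted as a false positive because the nearby annotation was already claimed by a stronger peak; a many-to-one rule would give a different count.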

We varied the threshold on the z-scores to calculate true positive rates (sensitivity) and false positive rates. Plotting these rates against each other yields the ROC curves; together with the maximum F₁ score, they illustrate the performance of the template matching method across different structures.
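A threshold sweep of this kind can be sketched as follows, using TPR = TP/(TP + FN), FPR = FP/(FP + TN), and the F₁ score of Eq. (5). How true negatives were counted is not stated in the text; here they are taken to be candidate peaks that are not annotated particles and fall below the threshold, which is an assumption. The function names and toy scores are hypothetical.

```python
def confusion_at_threshold(z_scores, is_true, n_unmatched_truth, threshold):
    """Confusion counts when peaks with z >= threshold are called positive."""
    pairs = list(zip(z_scores, is_true))
    tp = sum(1 for z, t in pairs if t and z >= threshold)
    fp = sum(1 for z, t in pairs if not t and z >= threshold)
    # Missed particles: true peaks below threshold plus annotations with no peak.
    fn = sum(1 for z, t in pairs if t and z < threshold) + n_unmatched_truth
    # Assumption: negatives are non-annotated candidate peaks below threshold.
    tn = sum(1 for z, t in pairs if not t and z < threshold)
    return tp, fp, fn, tn

def roc_and_best_f1(z_scores, is_true, n_unmatched_truth=0):
    """Sweep thresholds; return ROC points (FPR, TPR), best F1, and its threshold."""
    points, best_f1, best_thr = [], 0.0, None
    for thr in sorted(set(z_scores)):
        tp, fp, fn, tn = confusion_at_threshold(z_scores, is_true, n_unmatched_truth, thr)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        points.append((fpr, tpr))
        if f1 > best_f1:
            best_f1, best_thr = f1, thr
    return points, best_f1, best_thr

# Toy peak list (hypothetical): z-scores and whether each peak matched an annotation.
z_scores = [12.0, 9.0, 8.0, 7.0, 6.0, 5.5]
is_true = [True, True, False, True, False, False]
roc_points, best_f1, best_threshold = roc_and_best_f1(z_scores, is_true)
```

In this toy case the F₁ score peaks at a threshold of 7, where all three true particles are kept and one false positive remains.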

Template matching produces maximum-likelihood solution

For Gaussian noise of width $\sigma$ in the intensities of a CryoET 3D map $M$, the likelihood $L$ that a feature in the map is consistent with a template $T$ rotated and translated by ${R}$ is proportional to

$$L\propto \exp \left[-\frac{\sum_{i,j,k}\left(M_{ijk}-(RT)_{ijk}\right)^{2}}{2\sigma^{2}}\right] \tag{6}$$

By multiplying out the square, summing over the voxels $i,j,k$, and recognizing that the “${M}^{2}$” and “${({RT})}^{2}$” terms are constant, we find that

$$L\propto \exp \left[\frac{\sum_{i,j,k}M_{ijk}(RT)_{ijk}}{\sigma^{2}}\right] \tag{7}$$

The term in the exponent is exactly the cross-correlation CC between map and template divided by $\sigma^{2}$. Analogous to single-particle 2D images⁴⁷, the cross-correlation of template and map is thus the log-likelihood scaled by the squared noise amplitude. CC optimization over template rotation and translation $R$ thus gives the maximum-likelihood solution.
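The intermediate step can be written out explicitly by expanding the square in the exponent of Eq. (6). This only restates the argument above; the constant $C$ is introduced here for illustration and collects the “$M^{2}$” and “$(RT)^{2}$” terms that the text treats as constant:

$$-\frac{1}{2\sigma^{2}}\sum_{i,j,k}\left(M_{ijk}-(RT)_{ijk}\right)^{2}=\frac{1}{\sigma^{2}}\sum_{i,j,k}M_{ijk}(RT)_{ijk}-\frac{1}{2\sigma^{2}}\sum_{i,j,k}\left[M_{ijk}^{2}+(RT)_{ijk}^{2}\right]$$

$$\ln L=\frac{\mathrm{CC}}{\sigma^{2}}+C,\qquad \mathrm{CC}=\sum_{i,j,k}M_{ijk}(RT)_{ijk}$$

Under this assumption, the $R$ that maximizes CC is also the $R$ that maximizes $L$.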

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The previously published structures for the NPC subunits (H. sapiens) EMD-14325, EMD-14328 and EMD-14330, the NPC (S. pombe) EMD-11373, the 80S ribosome (D. discoideum) EMD-15810, EMD-15812, and EMD-15815, the 80S ribosome (S. pombe) EMD-14426 [https://www.ebi.ac.uk/emdb/EMD-14426], the fatty acid synthase (S. pombe) EMD-14412, the 20S proteasome (H. sapiens) EMD-4877 and the microtubule (H. sapiens) EMD-6351 are accessible through the Electron Microscopy Data Bank. The previously published tilt series for S. pombe, EMPIAR-10988, are available through the Electron Microscopy Public Image Archive. The previously published structures 7R5J (NPC structure), 6RGQ (human 20S proteasome structure), and 3JAR (microtubule structure) are available through the Protein Data Bank. Molecular dynamics setups, templates generated in this study, and supplementary raw data have been deposited in Zenodo (https://doi.org/10.5281/zenodo.10819130)⁵⁶. Source data are provided with this paper.

Code availability

All code used for this study is part of the public repositories. GAPSTOP^TM is available at https://gitlab.mpcdf.mpg.de/bturo/gapstop_tm⁶³. Documentation and installation instructions are provided for ease of use (https://bturo.pages.mpcdf.de/gapstop_tm). The in silico peak analysis is part of the Contextual Analysis Tools for CryoET and subtomogram averaging (cryoCAT). The source code of cryoCAT is available in the following repository: https://github.com/turonova/cryoCAT⁶⁴. A detailed notebook outlining the parameters and usage of the in silico peak analysis can be found here: https://github.com/turonova/cryoCAT/blob/main/docs/source/tutorials/peak_analysis/peak_analysis.ipynb.

References

Beck, M. & Baumeister, W. Cryo-electron tomography: can it reveal the molecular sociology of cells in atomic detail? Trends Cell Biol. 26, 825–837 (2016).
Mahamid, J. et al. Visualizing the molecular sociology at the HeLa cell nuclear periphery. Science 351, 969–972 (2016).
Xue, L. et al. Visualizing translation dynamics at atomic detail inside a bacterial cell. Nature 610, 205–211 (2022).
Hoffmann, P. C. et al. Structures of the eukaryotic ribosome and its translational states in situ. Nat. Commun. 13, 7435 (2022).
Xing, H. et al. Translation dynamics in human cells visualized at high resolution reveal cancer drug action. Science 381, 70–75 (2023).
Mattei, S., Glass, B., Hagen, W. J. H., Kräusslich, H.-G. & Briggs, J. A. G. The structure and flexibility of conical HIV-1 capsids determined within intact virions. Science 354, 1434–1437 (2016).
Wan, W. et al. Structure and assembly of the Ebola virus nucleocapsid. Nature 551, 394–397 (2017).
Burt, A. et al. Complete structure of the chemosensory array core signalling unit in an E. coli minicell strain. Nat. Commun. 11, 743 (2020).
Hoffmann, P. C. et al. Electron cryo-tomography reveals the subcellular architecture of growing axons in human brain organoids. eLife 10, e70269 (2021).
Zila, V. et al. Cone-shaped HIV-1 capsids are transported through intact nuclear pores. Cell 184, 1032–1046.e18 (2021).
Gemmer, M. et al. Visualization of translation and protein biogenesis at the ER membrane. Nature 614, 160–167 (2023).
Mosalaganti, S. et al. AI-based structure prediction empowers integrative structural analysis of human nuclear pores. Science 376, eabm9506 (2022).
Wozny, M. R. et al. In situ architecture of the ER–mitochondria encounter structure. Nature 618, 188–192 (2023).
Nickell, S., Kofler, C., Leis, A. P. & Baumeister, W. A visual approach to proteomics. Nat. Rev. Mol. Cell Biol. 7, 225–230 (2006).
Beck, M. et al. Visual proteomics of the human pathogen Leptospira interrogans. Nat. Methods 6, 817–823 (2009).
Förster, F., Han, B.-G. & Beck, M. Visual proteomics. Methods in Enzymol. 483, 215–243 (2010).
Bäuerlein, F. J. B. & Baumeister, W. Towards visual proteomics at high resolution. J. Mol. Biol. 433, 167187 (2021).
Wan, W. & Briggs, J. A. G. Cryo-electron tomography and subtomogram averaging. Methods Enzymol. 579, 329–367 (2016).
Turoňová, B., Marsalek, L. & Slusallek, P. On geometric artifacts in cryo electron tomography. Ultramicroscopy 163, 48–61 (2016).
Turk, M. & Baumeister, W. The promise and the challenges of cryo‐electron tomography. FEBS Lett. 594, 3243–3261 (2020).
Hecksel, C. W. et al. Quantifying variability of manual annotation in cryo-electron tomograms. Microsc. Microanal. 22, 487–496 (2016).
Wu, X. et al. Template-based and template-free approaches in cellular cryo-electron tomography structural pattern mining. In Computational Biology (eds. Division of Biomedical Science, University of the Highlands and Islands, UK & Husi, H.) 175–186 (Codon Publications, 2019).
Böhm, J. et al. Toward detecting and identifying macromolecules in a cellular context: template matching applied to electron tomograms. Proc. Natl Acad. Sci. USA 97, 14245–14250 (2000).
Frangakis, A. S. et al. Identification of macromolecular complexes in cryoelectron tomograms of phantom cells. Proc. Natl Acad. Sci. USA 99, 14153–14158 (2002).
Lucas, B. A. et al. Locating macromolecular assemblies in cells by 2D template matching with cisTEM. eLife 10, e68946 (2021).
Xu, M. et al. De novo structural pattern mining in cellular electron cryotomograms. Structure 27, 679–691.e14 (2019).
Zeng, X. et al. High-throughput cryo-ET structural pattern mining by unsupervised deep iterative subtomogram clustering. Proc. Natl Acad. Sci. USA 120, e2213149120 (2023).
Chen, M. et al. Convolutional neural networks for automated annotation of cellular cryo-electron tomograms. Nat. Methods 14, 983–985 (2017).
Moebel, E. et al. Deep learning improves macromolecule identification in 3D cellular cryo-electron tomograms. Nat. Methods 18, 1386–1394 (2021).
de Teresa-Trueba, I. et al. Convolutional networks for supervised mining of molecular patterns within cellular context. Nat. Methods 20, 284–294 (2023).
Rice, G. et al. TomoTwin: generalized 3D localization of macromolecules in cryo-electron tomograms with structural data mining. Nat. Methods 20, 871–880 (2023).
Förster, F., Pruggnaller, S., Seybert, A. & Frangakis, A. S. Classification of cryo-electron sub-tomograms using constrained correlation. J. Struct. Biol. 161, 276–286 (2008).
Berman, H. M. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Lawson, C. L. et al. EMDataBank unified data resource for 3DEM. Nucleic Acids Res. 44, D396–D403 (2016).
Wan, W., Khavnekar, S., Wagner, J., Erdmann, P. & Baumeister, W. STOPGAP: a software package for subtomogram averaging and refinement. Microsc. Microanal. 26, 2516–2516 (2020).
Wan, W., Khavnekar, S. & Wagner, J. STOPGAP, an open-source package for template matching, subtomogram alignment, and classification. Acta Crystallogr. Sect. D: Struct. Biol. 80, 336–349 (2024).
Hoffmann, P. C. et al. Nuclear pores as conduits for fluid flow during osmotic stress. Preprint at bioRxiv https://doi.org/10.1101/2024.01.17.575985 (2024).
Toste Rêgo, A. & da Fonseca, P. C. A. Characterization of fully recombinant human 20S and 20S-PA200 proteasome complexes. Mol. Cell 76, 138–147.e5 (2019).
Zhang, R., Alushin, G. M., Brown, A. & Nogales, E. Mechanistic origin of microtubule dynamic instability and its modulation by EB proteins. Cell 162, 849–859 (2015).
Zivanov, J. et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. eLife 7, e42166 (2018).
Chaillet, M. L. et al. Extensive angular sampling enables the sensitive localization of macromolecules in electron tomograms. Int. J. Mol. Sci. 24, 13375 (2023).
Frank, J. & Agrawal, R. K. A ratchet-like inter-subunit reorganization of the ribosome during translocation. Nature 406, 318–322 (2000).
Lucas, B. A., Zhang, K., Loerch, S. & Grigorieff, N. In situ single particle classification reveals distinct 60S maturation intermediates in cells. eLife 11, e79272 (2022).
Kickhoefer, V. A. et al. The 193-Kd vault protein, Vparp, is a novel poly(Adp-ribose) polymerase. J. Cell Biol. 146, 917–928 (1999).
Kickhoefer, V. A. et al. The telomerase/vault-associated protein Tep1 is required for vault RNA stability and its association with the vault particle. J. Cell Biol. 152, 157–164 (2001).
Zimmerli, C. E. et al. Nuclear pores dilate and constrict in cellulo. Science 374, eabd9776 (2021).
Cossio, P. & Hummer, G. Bayesian analysis of individual electron microscopy images: towards structures of dynamic and heterogeneous biomolecular assemblies. J. Struct. Biol. 184, 427–437 (2013).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Iudin, A. et al. EMPIAR: the Electron Microscopy Public Image Archive. Nucleic Acids Res. 51, D1503–D1511 (2023).
Turoňová, B. et al. Benchmarking tomographic acquisition schemes for high-resolution structural biology. Nat. Commun. 11, 876 (2020).
Tuijtel, M. W. et al. Thinner is not always better: optimising cryo lamellae for subtomogram averaging. Sci. Adv. https://doi.org/10.1126/sciadv.adk6285 (2024) (in press).
Kremer, J. R., Mastronarde, D. N. & McIntosh, J. R. Computer visualization of three-dimensional image data using IMOD. J. Struct. Biol. 116, 71–76 (1996).
Turoňová, B., Schur, F. K. M., Wan, W. & Briggs, J. A. G. Efficient 3D-CTF correction for cryo-electron tomography using NovaCTF improves subtomogram averaging resolution to 3.4 Å. J. Struct. Biol. 199, 187–195 (2017).
Kuipers, J. B. Quaternions and Rotation Sequences: A Primer with Applications to Orbits, Aerospace, and Virtual Reality (Princeton Univ. Press, Princeton, NJ, 2007).
Schaefer, S. L. & Hummer, G. Sublytic gasdermin-D pores captured in atomistic molecular simulations. eLife 11, e81432 (2022).
Cruz-León, S. et al. Data for high-confidence 3D template matching for cryo-electron tomography. Zenodo https://doi.org/10.5281/ZENODO.10819130 (2024).
Goddard, T. D. et al. UCSF ChimeraX: meeting modern challenges in visualization and analysis: UCSF ChimeraX visualization system. Protein Sci. 27, 14–25 (2018).
Tang, G. et al. EMAN2: an extensible image processing suite for electron microscopy. J. Struct. Biol. 157, 38–46 (2007).
Pintilie, G. D., Zhang, J., Goddard, T. D., Chiu, W. & Gossard, D. C. Quantitative analysis of cryo-EM density map segmentation by watershed and scale-space filtering, and fitting of structures by alignment to regions. J. Struct. Biol. 170, 427–438 (2010).
Shao, S. et al. Decoding mammalian ribosome-mRNA states by translational GTPase complexes. Cell 167, 1229–1240.e15 (2016).
Garreta, R. & Moncecchi, G. Learning Scikit-Learn: Machine Learning in Python: Experience the Benefits of Machine Learning Techniques by Applying Them to Real-World Problems Using Python and the Open Source Scikit-Learn Library (Packt Publishing Ltd, Birmingham, UK, 2013).
Fawcett, T. ROC graphs: notes and practical considerations for researchers. Mach. Learn. 31, 1–38 (2004).
Turonova, B. GAPStop(TM) - GPU Accelerated Python-base Stopgap for Template Matching. [Software] https://doi.org/10.5281/ZENODO.10822455 (2024).
Turonova & makubans. turonova/cryoCAT: v0.2.0. [Software] https://doi.org/10.5281/ZENODO.10820843 (2024).
Ermel, U. H., Arghittu, S. M. & Frangakis, A. S. ArtiaX: an electron tomography toolbox for the interactive handling of sub‐tomograms in UCSF ChimeraX. Protein Sci. 31, e4472 (2022).

Acknowledgements

This work was funded by the Max Planck Society and the Chan Zuckerberg Initiative for Visual Proteomics Imaging (grant number 2021-234666, M.B., B.T., and G.H.). The Max Planck Computing and Data Facility is acknowledged for computational resources. We thank Stefanie Böhm for critical reading of the manuscript and Sonja Welsch and Iskander Khusainov for fruitful discussions. We also thank Martin Simonovsky for helpful discussions on peak analysis, Jürgen Köfinger and Jakob Bullerjahn for discussions on the Gaussian mixture model, Agnieszka Obarska-Kosinska for help with the template of the human NPC subunit, and Huaipeng Xing for assistance with the human tilt series.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Max-von-Laue-Str. 3, 60438, Frankfurt am Main, Germany
Sergio Cruz-León, Stefan L. Schaefer & Gerhard Hummer
Department of Molecular Sociology, Max Planck Institute of Biophysics, Max-von-Laue-Str. 3, 60438, Frankfurt am Main, Germany
Tomáš Majtner, Patrick C. Hoffmann, Jan Philipp Kreysing, Maarten W. Tuijtel, Katharina Geißler, Martin Beck & Beata Turoňová
IMPRS on Cellular Biophysics, Max-von-Laue-Str. 3, 60438, Frankfurt am Main, Germany
Jan Philipp Kreysing & Katharina Geißler
Max Planck Computing and Data Facility, Gießenbachstraße 2, 85748, Garching, Germany
Sebastian Kehl
Institute of Biochemistry, Goethe University Frankfurt, 60438, Frankfurt am Main, Germany
Martin Beck
Institute of Biophysics, Goethe University Frankfurt, 60438, Frankfurt am Main, Germany
Gerhard Hummer

Contributions

S.C.L., M.B., B.T., and G.H. conceived the project. P.C.H. and M.W.T. acquired the data. S.C.L. and J.P.K. analyzed the data, and S.L.S. performed the MD simulations. S.C.L., T.M., S.K., and B.T. wrote code. K.G. contributed to the analysis and interpretation of the vault protein results. S.C.L., J.P.K., S.L.S., M.B., B.T., and G.H. wrote the manuscript, and S.C.L., T.M., P.C.H., J.P.K., M.W.T., S.L.S., K.G., B.T., M.B., and G.H. edited the manuscript. M.B., B.T., and G.H. supervised the project and obtained funding.

Corresponding authors

Correspondence to Martin Beck, Beata Turoňová or Gerhard Hummer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Movie 1

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cruz-León, S., Majtner, T., Hoffmann, P.C. et al. High-confidence 3D template matching for cryo-electron tomography. Nat Commun 15, 3992 (2024). https://doi.org/10.1038/s41467-024-47839-8

Received: 31 October 2023
Accepted: 12 April 2024
Published: 11 May 2024
DOI: https://doi.org/10.1038/s41467-024-47839-8