Sub-pixel electron detection using a convolutional neural network

Modern direct electron detectors (DEDs) provided a giant leap in the use of cryogenic electron microscopy (cryo-EM) to study the structures of macromolecules and complexes thereof. However, the currently available commercial DEDs, all based on the monolithic active pixel sensor


Introduction
Cryogenic electron microscopy (cryo-EM) of biological samples depends on the recording of a small number of incident electrons that can be used to form an image before the sample is destroyed by radiation damage. Therefore the detector plays a more important role for these samples than for less or non-radiation sensitive samples. The emergence of direct electron detectors in the form of the monolithic active pixel sensors (MAPS) has been a breakthrough in cryo-EM (reviewed in [1]). Combined with advances in processing and software algorithms, the direct electron detector has enabled structural biologists to reveal macromolecular structures at near-atomic resolution using limited by radiation damage to the sample.
The MAPS technology already existed but considerable improvement for cryo-EM was achieved by back-thinning the sensor to, eventually, less than 30 µm thickness. The effect of back-thinning is the reduced chance of an electron scattering back into the sensor layer and depositing its energy in pixels adjacent to the point of impact. This back scattering will degrade the MTF performance of a detector [8]. With the current generation of MAPS detector (Thermo Fisher Falcon3 [9] 1 , Gatan K3 and Direct Electron DE-64) the limit of back-thinning has been reached (Guerrini, personal communication).
Combined with back-thinning, a further enhancement was the introduction of electron counting mode in which every electron is localised and given an equal weight [11]. A number of different algorithms for event localisation have been published [12,13]. For the current range of MAPS detectors, these algorithms require a sufficiently low electron flux to prevent the signal from adjacent incident electrons from overlapping with each other within one frame. Low electron fluxes provide multiple downsides; (a) The lengthy exposure times, up to one minute or more for Falcon3 detectors, compromises the total throughput during a Single Particle Analysis (SPA) study. Sample drift is very significant during long exposures and needs to be corrected for. (b) These detectors are unsuitable for diffraction experiments, which would wreck the counting algorithm and might even damage the detector by radiation [14]. (c) At the onset of the exposure of a pristine biological sample, one would expect to be able to observe the highest resolution features of such sample. However, in practice, the high resolution information of the early frames within a recording are down weighted or even removed as they are disturbed by factors such as beam induced movements [15][16][17]. These fast movements cannot be accurately corrected for from the data from the current DEDs. Details of the specific algorithms used by commercial MAPS detectors have not been released, but accurate event localisation is still seen as a bottleneck [1].
An alternative approach to back-thinning of the detector is to fully absorb the electron and to obtain as much information about it as possible, thereby optimising the ability to computationally localise the point of impact. Hybrid pixel detectors (HPD) allow this alternative approach and can have a much higher readout speed. HPDs are characterised by having their application-specific integrated circuit (ASIC) separate from the sensor layer. The sensor layer consists of a pixelated piece of semiconductor with individual bump bonds connecting it to the readout electronics chip. Such detector has recently been used to collect preliminary SPA data at 100 keV [18,19]. The Medipix family of HPDs have a relatively large pixel size of 55 µm when compared to MAPS detectors, but this allows for more per-pixel electronics. In the Medipix HPDs, each pixel has its own signal threshold and counter and this gives it a noise-free readout and high dynamic range (reviewed in [20]). These properties have made them very appropriate sensors for electron diffraction and STEM [21][22][23][24]. Any electron moving through silicon with an energy over 80 keV will spread well beyond the 55 µm pixel pitch. It was shown that by increasing the signal threshold far beyond the noise edge of the Medipix such that each electron will be recorded in one single pixel, a near perfect MTF could be restored [25]. However, in that case some electrons remain undetected and this reduces the detectors DQE. To overcome the problem of the electron affecting multiple pixels, Mir et al. employed the Medipix3RX equipped with a charge sharing mode (CSM) that effectively combines the charge of a particle in four adjoining pixels to one pixel [26,27]. It accomplishes this by having additional logic in the analog front-end of the ASIC. They also showed that the Medipix3RX using CSM for 80 keV and 120 keV can maximise the MTF without leaving electrons undetected [28]. Above 120 keV the charge is spread beyond the four pixels used by the CSM and the MTF still degrades. Therefore, at those energies, more information about the primary electron track is required to localise the point of impact accurately.
The Timepix3 is currently the latest generation of the Medipix HPD family [29]. This ASIC is capable of simultaneously measuring the time that a signal is over the comparator threshold (Time over Threshold; ToT) and the timing of when the signal crosses the comparator threshold (Time of Arrival; ToA). The ToT effectively gives a measure for the amount of energy deposited in a pixel. The ToA accuracy is determined by the ASIC's fast ToA clock which can achieve a time binning of 1.5625 ns (640 MHz). Any 200 keV incident electron will travel most of a 300 µm thick silicon sensor layer at a velocity higher than 50% light speed and thus at much shorter time scales than 1.5625 ns. However, in passing, the incident electron will create electron hole pairs (E-H pairs) in the sensor. These charge carriers will drift towards the electrodes under influence of the bias voltage and will have a drift time well over 1.5625 ns. Drift velocity for holes in silicon are in the order of 10 6 cm s -1 [30]. Therefore in a fully depleted 300 µm silicon sensor layer drift times for holes can range up to 30 ns. The ToA information thus effectively gives a z-position where the incident electron created a E-H pair in the sensor layer [31]. This principle was used by Bergmann et al. for the 3D reconstruction of a 120 GeV high energy pion particles and accompanying 2 to 3 MeV delta rays [32].
Here, we show that the ToT and ToA information can be used to reconstruct the point of impact of the electron in the sensor layer at energies typically used in cryo-EM (200-300 keV). Instead of attempting to directly reconstruct the point of impact using the ToT and ToA information, we present an ad hoc prediction model using a convolutional neural network (CNN). Neural networks have become an increasingly popular machine learning method over recent years and CNNs are well suited for extracting 2D feature information from training input. Neural networks have been used in other detector applications for reconstructing the impact position of a particle hitting a sensor layer. At the ATLAS experiment of the Large Hadron Collider (at CERN) neural networks are being employed to separate and determine the direction of high energy particles hitting their detector planes [33]. For Positron Emission Tomography (PET) neural networks have been employed to determine the incident position of a 512 keV X-ray hitting a scintillating crystal block [34,35]. A notable difference between both the high energy particles (> TeV) detected in the ATLAS experiment and the visible light photons detected from a scintillating crystal in the case of PET, is that electrons will undergo multiple scattering events while travelling the sensor layer. This trajectory of the electron through multiple pixels, while erratic, follows rules and patterns. The neural network is, by training, recognising the pattern and able to deduct the trajectory and thereby the incident position of the electron.
For training a neural network, usually a ground-truth dataset is used; here, this would mean a dataset with known incident positions and their corresponding detector output. Within an electron microscope the electron beam cannot easily be limited to a sub-pixel area of the detector and therefore the incident position is not known with enough accuracy for training purposes. Instead, in here, we have chosen to generate training data by means of simulation. After training the neural network, the obtained prediction model can be applied to experimental data to improve the MTF of the Timepix3. We implement a method for correcting the non-uniform ToT response of the Timepix3. Finally, we illustrate that we are able to image, in electron counting mode, the useful dose-lifetime of a protein within one second.

Installation of detector
The Timepix3 detector chip assembly, camera housing, SPIDR readout electronics board and control software were provided by Amsterdam Scientific Instruments (ASI) [23]. The vacuum camera housing provided temperature control through water cooling, X-ray shielding and a mechanical aluminium shutter positioned 5 mm above 1 During the review process of this manuscript the Falcon4 was announced [10].
the sensor layer. The Timepix3 was assembled in a quad configuration giving a total of 512 by 512 pixels. In this configuration we reached a maximum readout speed of 110 Mhit/s. The SPIDR readout board developed by the Dutch National Institute for Subatomic Physics (Nikhef) provided a 10 Gbit/s fibre optic connection to the detector PC [36]. The detector control and SPIDR readout software SoPhy were developed by ASI. Using SoPhy, a detector equalisation method was performed where the global threshold and the per pixel thresholds were set close to the electronic noise edge. The detector was mounted under a FEI Tecnai Arctica operating at 200 keV at Maastricht University ( Fig. 1), the Netherlands and under a FEI Tecnai G2 Polara operating at 300 keV at C-CINA, Basel, Switzerland. The sensor consisted of 300 µm and 500 µm of silicon in Maastricht and Basel, respectively. The sensor was connected as a single slab, via bump bonds, to the quad configured ASIC. The thickness was chosen to optimally absorb the respective 200 keV and 300 keV incident electrons within the sensor. Whereas the camera housing is identical in both setups, the flight tubes have different lengths, resulting in a post-magnification of 2.06 and 1.28 relative to the nominal magnification for Maastricht and Basel, respectively. The field of view of the images collected on the Polara was restricted in a circular fashion by the dimensions of the flight tube. Full dose radiation shielding checks were performed successfully on both sites.

Monte Carlo simulations
Monte Carlo simulations provided us with the training data of an incident electron hitting a known position on the sensor layer and its potential detector responses. The Geant4 framework has been widely applied for Monte Carlo simulations of particles passing through matter [37]. Using the Geant4 framework, the simulation tool Geant4Medipix has been developed to simulate a number of chips from the Medipix family, including the Timepix3 [38,39]. We used it to simulate the  passing of electrons through the sensor, the drifting of the E-H pairs to the electrodes as well as the readout electronics response. The Geant4Medipix code was adapted to be multi-threaded and provide HDF5 output such that a large number (n=50000) of events can be simulated efficiently. Simulations were performed for a 300 µm or 500 µm thick silicon slab with 20x20 pixels at a 55 µm pitch. The simulated readout electronics were configured to match the properties of the Timepix3. Each simulation event consisted of one electron accelerated at 200 keV or 300 keV hitting a random position within one pixel of the 20x20 matrix. This ensured an even distribution of incident positions within one pixel. For each simulation event multiple pixels are contributing and this led to a variable number of hits in the output. Each hit contained information of both ToT and ToA. The lowest ToA value of a cluster was set to 0 where after the ΔToA of each hit in the cluster was calculated. The cluster output of each event was normalised to a 10x10 matrix with each cluster starting at the 0,0 position. This last step ensured that the simulated cluster output could be compared to the output obtained from the detector.
The output of the simulations were globally validated with experimental detector output using a variety of histograms. For each cluster, either experimental or simulated, the sum of ToT values was calculated. The simulations were scaled to the experimental data by adjusting the simulated Krummenacher current such that the most abundant summed ToT values would coincide [40]. The simulator includes a parameter to describe electronic noise contributions from the analogue front end. This parameter required minor tuning to further optimise the concurrence between simulation and experiment. Fig. 2 shows a 2D histogram of the number of hits in a cluster versus the summation of the ToT values of all hits within such a cluster. The 2D histogram for simulated data (Fig. 2a) is in good agreement with experimental data (Fig. 2b). The main difference occurs in the lower left corner of the histograms, where the experimental data shows more counts. The histogram bin at [1,1] can be attributed to X-rays generated in the electron microscope which were not simulated. There is a tail visible between the [1,1] bin and the peak of the histogram in both the experimental and simulated data which can be attributed to backscattered electrons from the sensor. These electrons are not fully absorbed and leave smaller clusters behind with less energy deposited. They account for about 10% of the total number of electrons. The experimental data also contained data points such as stray cosmic rays, edge events, edge pixels, masked or unresponsive pixels, and coincident electrons, which were not included in the simulation.

Correcting for non-uniform pixel response in Time over Threshold
Every pixel of the Timepix3 has its own correlation between the amount of energy deposited and the time the signal is over the threshold (ToT). A transmission electron microscope provides a very monochromatic beam of less than 1 keV spread. This means that every pixel should have a similar distribution of the ToT signal. By using a flat field of the electron beam and sampling a large number of events it is possible to correct for non-uniformities in the silicon sensor layer and the pixel electronics. These inhomogeneities can be attributed to local differences in the lithography process. Using a minimum of 10 Ghit of flat field data, a per-pixel correction was calculated. For this correction, an average response was calculated with roughly 50% of the active pixels. Pixels with a response too far from the average were discarded. This average response was then transformed into a normalised cumulative curve. This curve was used as an ideal reference cumulative distribution curve and was fitted with multiple 3 th -order polynomial. For every pixel the cumulative distribution curve was calculated and by means of histogram matching a new ToT distribution was created for each of these pixels by being matched to the conclusive 3 th -order polynomial curves (Fig. 3a and b). This created a per-pixel correction look-up table (512x512x1024) of a measured ToT value to a corrected ToT value (Fig. 3c).

Methods for incident electron event localisation
Six different algorithms for electron localisation were tested: random position (as a control), centroid, highest ToT, highest ToA, CNN trained on ToT (CNN-ToT) and CNN trained on ToT and ToA (CNN-ToT-ToA). The random method selects a random sub-pixel position within the pixels forming a cluster. The centroid method calculates the geometrical center of a cluster where each hit has been weighted according to their ToT values. The highest ToT and highest ToA method selects the centre of the pixel with the highest ToT and ToA value respectively or randomly between them in case of a draw.
Training of the CNN was performed using 50,000 independently simulated events on either just the ToT channel or on both the ToT and ToA channel. The CNN used the simulated 10x10 matrix as input and consists of a separable 2D convolutional layer followed by three times a drop-out/dense layer to gradually reduce to an x and y output. It used the Adam optimiser and ReLU activation method [41,42]. It was trained for 200 epochs towards convergence. The CNN has been implemented in Tensorflow 1.4 using Python3 and Keras 2.1.
The event localisation methods were tested using 50,000 events for both 200 keV electrons and 300 keV electrons and calculated the mean distance, the root mean square deviation (RMSD) as well as the median between the simulated incident positions and the predicted electron positions. Fig. 4 shows three example prediction plots and

Processing pipeline to form an image from raw data
The Timepix3 ASIC delivered, in data driven mode, a stream of hits, each hit containing positional, ToT and ToA information. Several steps were needed to process these raw hits into a final image or movie (Fig. 6). First, the ToT values within the raw data stream were corrected using the obtained ToT lookup table (Section 4). Subsequently, we searched for clusters of hits which are formed by a single incident electron. The DBSCAN clustering algorithm [43] was chosen experimentally for its speed in handling clusters of various sizes. The euclidean distance between all hits was calculated and the DBSCAN parameter ϵ (eps), specifying the radius of a neighbourhood with respect to another hit, was set to 1. Clusters were formed by hits which were directly adjacent to each other in time and space: adjacent hits were selected that occur within a time interval of 50 ToA clock ticks ( ≈ 78 ns). The 50 ToA clicks were scaled to a value of 1, to match the euclidean distance of 1. Then, clusters were filtered based on their cluster size and cluster ToT sum. These values were between 2 and 10 for the cluster size and between 200 and 400 for cluster ToT sum for 200 keV and 4 and 14 for the cluster size and between 350 and 525 for the cluster ToT sum for 300 keV (Fig. 2c and d). The lowest ToA value of a cluster is set to 0 and thereby the ΔToA of each hit in the cluster was calculated. A sub-pixel position was determined for each individual cluster by the selected event localisation algorithms. Finally, these obtained positions were placed within the complete image frame. To compensate for a skewed sub-pixel distribution the edges of the subpixels within the original pixel were adjusted such that the distribution became uniform. These adjustments were at most 5% (Fig. 7).
The processing pipeline, including event localisation, was written in Python3 making use of the Numpy, Scipy, Keras and Tensorflow libraries [44]. Processing of 70 MHit (typical 1 second exposure at an electron flux of 40 e /Å 2 /s at 200 keV) still lacks significantly behind with the exposure time. Data conversion currently takes about 2 min, the event localisation step about 3 min (Intel Xeon E5-2680, 20 cores, NVIDIA GeForce GTX 1080 Ti). The current cluster finding algorithm needs further optimisation. Fig. 6b and c show the unprocessed hit image and the processed CNN-ToT event localised image of a protein sample recorded in Maastricht. A truncated mutant of the Mycobacterium tuberculosis protein EspB was expressed in Escherichia coli, affinity purified using a nickelcolumn on a His6-tag, SEC purified, concentrated to 1.3 mg/ml, prepared on R1.2/1.3 Quantifoil grids 300 Au mesh (www.quantifoil.com), using the Vitrobot (Blot force 5, blot time 4). Images were recorded over 2 seconds, using a pixel magnification of 1.24 Ångström and an electron flux of 40 e /Å 2 /s. At 200 keV each electron hits on average 4.9 pixels (Fig. 2c). This electron flux corresponded to 51.3 Mhit/s at the detector and thus well within the achieved maximum of 110 Mhit/s.

Validation of prediction model
A visual way to asses the performance of various event localisation methods is to analyse the image of an EM grid [45]. We obtained low magnification images to ensure sample drift is minimal. The resulting images had high contrast with well-defined spots in their power spectra that extends beyond the Nyquist frequency of the detector. Fig. 8 shows the image taken of UltrAuFoil 200 mesh grids at 225x magnification. To quantify the differences between an event localisation method and the original image, spots at increasing spatial resolution were chosen. From the ratio of the normalised amplitudes the MTF enhancement could be obtained ( Fig. 8 and supplemental Figure S1 and S2) [12].
The MTF of the detector with and without event localisation methods applied was measured using the knife edge method. The  benefit of this method is that it measures the MTF over all spatial frequencies [8]. Knife edge images were obtained using the aluminium shutter positioned approximately one centimetre above the detector. The shutter was positioned to partially cover the detector. MTFs were calculated from the images formed by hits alone as well as from images reconstructed by the event localisation method (Fig. 9) [46]. DQE is a less relevant measure here as, among others, the noise spectrum cannot be accurately determined due to replacement of events in counting algorithms [9]. The MTFs shown in Fig. 9 validates the benefits of the presented event localisation methods described in here.

Discussion, conclusion and outlook
Every electron counts when using cryo-EM for the imaging of radiation sensitive samples. Accurate event localisation is seen as a bottleneck for optimal cryo-EM detector performance [1]. In here, we have shown a new method for event localisation using the Timepix3, with its unique ToT and ToA channels.
Like most pixelated detectors, the Timepix3 detector displays a nonuniform pixel response that needs to be corrected (Fig. 3). The histogram normalisation described in Section 4 corrects for ToT inhomogeneities. Upon correction, we can measure the energy of each incident electron with a resolution of 13.6 and 23.5 keV for 200 keV and 300 keV respectively (Fig. 3d). This allows us to distinguish primary electrons from other events such as cosmic and X-rays (Fig. 2). The energy resolving power may have some applications for thick specimens, albeit its resolution remains far above the energy resolutions used in energy-filtered EM or electron energy loss spectroscopy.
Unfortunately, there was still a systematic pattern left after eventlocalisation using the CNN trained on both ToT and ToA information ( Figure S3), which can be attributed to inhomogeneities in the ΔToA data. Our data suggests that there is a systematic difference in ToA   Figure S1 and S2 all other described event localisation methods are included for 200 keV and 300 keV data respectively. response between pixels in the ASIC. The observed pattern hints at the source being the super pixel structure of the Timepix3 ASIC. Others have also reported seemingly similar systematic patterns and reported solutions in the form of extra calibration steps [47]. Such calibration steps would have required us to unmount the detector from the microscope. Future alternative calibration steps may overcome that requirement.
We could simulate the response of the Timepix3 detector using Monte Carlo simulations (Section 3). This enabled us to numerically evaluate the performance of the detector at different energies, for different sensor layers and thickness. The simulations confirmed that primary electrons would excite multiple pixels of a silicon sensor layer at energies > 80 keV, calling for accurate sub-pixel localisation schemes. The availability of the simulated electron events allowed us to use these as training data for machine learning methods.
We show that selecting the centre of a pixel with the highest ToT value (the pixel which received the most energy), provides the least accurate point-of-impact location, both at 200 keV and 300 keV (Figs. 4 and 7). This may be due to the electron losing most of its energy in the last part of its trajectory [7]. The highest-ToT method may thus, in many cases, be selecting the last pixel of an electron trajectory. Similarly, the centroid method is therefore selecting a pixel away from the point-of-impact.
The ToA channel provides unique data about the relative timing of the signal that each pixel within a cluster received. The ΔToA value of a pixel proved to be a very effective measure for the z-position of the trajectory of the incident electron (Section 1). A high ΔToA value reflects a pixel where the trajectory of the incident electron was high in the sensor layer. Such a high ΔToA value is therefore likely to be close to the incident electron position as the electrons travel mainly downwards.
Machine learning methods can improve on the point-of-impact localisation accuracy compared to traditional methods. We arrived at using a convolutional neural network (CNN) due to its suitability for handling multi-channel 2D features and ease of use in the available software frameworks. A CNN, capable of exploiting both ToA and ToT channels simultaneously, gave the best point-of-impact localisation results. Surprisingly, the CNN trained on only ToT information is, at both 200 keV and 300 keV, performing nearly as well as the model trained on both ToT and ToA information.
Analysing the sub-pixel assignments of the event localisation methods showed a non-uniform distribution (Fig. 7). For both the experimental and the simulated data, an even distribution of electrons across each pixel was used. However, the different event localisation methods do not provide an even distribution as outcome. While it cannot be excluded that some of the non-uniform distribution originates from the event localisation model itself, it is striking that others have also reported non-uniform sub-pixel distributions using the Timepix3 [48]. This could hint at a possible problem with the ASIC. We applied a pragmatic approach to correct for the observed non-uniform sub-pixel assignments (Section 6). We validated our CNN event localisation scheme by analysing high contrast images of UltrAuFoil grids and determining the MTF using the knife edge method (Section 7, Figs. 8 and 9). Both CNN models show significant improvements. We compared the obtained MTF curves with the theoretical MTF values using Eq. 10 and Eq. 14 of [7]. By calculating the point spread function from the data underlying Fig. 5 and fitting Eq. 10 to find λ, theoretical MTF curves (Eq. 14) were obtained that were better than the observed ones (supplemental Figure S4). Future work on providing improved neural network training data and better accounting for residual ΔToA inhomogeneities, could minimise these differences.
Our results indicate that hybrid pixel detectors can be used as a counting direct electron detector for cryo-EM at 200 keV or 300 keV in imaging or diffraction mode. Using the Timepix3 we were able to image the entire useful dose lifetime (40 e /Å 2 ) of a protein within a single second exposure. This provides great prospects for single particle applications. Being able to work with a higher flux mitigates sample-stage drift issues and enhances throughput. As the Timepix3 in data driven mode does not record frames, but rather a stream of events, alternative data collection and storage schemes could be envisioned. A synchronised sample-beam movement with a continuous streaming of localised electrons could accelerate cryo-EM by at least an order of magnitude [49]. This will require several improvements. These include the handling of the area in between the adjacent quadrants of the chip, the effect of masked pixels and the systematic pattern in the ΔToA data. The tileability of the Timepix4 should address the limited field of view of the Timepix3, whereas the maximum count rate per detector area and ToA time resolution should also be improved for the Timepix4. As of this writing, the Timepix4 is being developed, just like numerous alternative detector developments. These could all make huge impact in getting better data faster, both in imaging and diffraction mode.
We envision that the use of neural networks could also help in improving the point-of-impact localisation procedures implemented for current MAPS detectors. Electrons make erratic tracks through the sensor layer which can be trained to a neural network when an accurate simulator is available. Alternatively, the neural network could be trained with experimental data provided a small sub-pixel beam could be directed to a known position within a pixel of such a detector. The success of a CNN trained on ToT alone makes us optimistic that the gap between actual and ideal detectors can be further narrowed in the years to come.

Data availability
The dataset [50] containing simulated data, unprocessed experimental data, the ToT-correction matrices, and the CNN models, together with instructions to reproduce results and figures, are publicly available on Zenodo.

Declaration of Competing Interest
The Maastricht University filed a patent with some of the authors as inventors regarding sample inspection for cryo-EM as outlined in this manuscript.