SynEM: Automated synapse detection for connectomics

Nerve tissue contains a high density of chemical synapses, about 1 per µm3 in the mammalian cerebral cortex. Thus, even for small blocks of nerve tissue, dense connectomic mapping requires the identification of millions to billions of synapses, of which about 80-90% are excitatory synapses. While the focus of connectomic data analysis has been on neurite reconstruction, synapse detection becomes limiting when datasets grow in size and dense mapping is required. Here, we report SynEM, a method for automated detection of excitatory synapses from conventionally en-bloc stained 3D electron microscopy image stacks. The approach is based on a segmentation of the image data and focuses on classifying borders between neuronal processes as synaptic or non-synaptic. SynEM yields 98% precision and recall in binary excitatory cortical connectomes with no user interaction. It scales to large volumes of cortical neuropil, plausibly even whole-brain datasets. SynEM removes the burden of manual synapse annotation for large densely mapped connectomes.


INTRODUCTION
The ambition to map neuronal circuits in their entirety has spurred substantial methodological developments in large-scale 3-dimensional microscopy (Denk & Horstmann, 2004, Hayworth et al., 2006, Knott et al., 2008, Eberle et al., 2015, making the acquisition of datasets as large as 1 cubic millimeter of brain tissue or even entire brains of small animals at least plausible (Mikula et al., 2012, Mikula & Denk, 2015. Data analysis, however, is lagging far behind (Helmstaedter, 2013).
One cubic millimeter of gray matter in the mouse cerebral cortex, spanning the entire depth of the gray matter and comprising several presumed cortical columns (Fig. 1a), for example, contains at least 4 kilometers of axons, about 1 kilometer of dendritic shafts, about 1 billion spines (contributing an additional 2-3 kilometers of spine neck path length) and about 1 billion synapses (Fig. 1b). Initially, neurite reconstruction was so slow, that synapse annotation comparably paled as a challenge (Fig. 1c): when comparing the contouring of neurites (proceeding at 200-400 work hours per millimeter neurite path length) with synapse annotation by manually searching the volumetric data for synaptic junctions (Fig. 1d, proceeding at about 0.1 hour per µm 3 ), synapse annotation consumed at least 20-fold less annotation time than neurite reconstruction (Fig. 1c). An alternative strategy for manual synapse detection is to follow reconstructed axons (Fig. 1e) and annotate sites of vesicle accumulation and postsynaptic partners. This axon-focused synapse annotation reduces synapse annotation time by about 8-fold for dense reconstructions (proceeding at about 1 min per potential contact indicated by a vesicle accumulation, which occurs every about 4-10 µm along axons in mouse cortex, Fig. 1e). Since in the mammalian cerebral cortex, 80-90% of synapses are excitatory, the detection of inhibitory synapses is about an order of magnitude less time consuming.
With the development of substantially faster annotation strategies for neurite reconstruction, however, the relative contribution of synapse annotation time to the total reconstruction time has substantially changed. Skeleton reconstruction  together with automated volume segmentations , Berning et al., 2015, allow to proceed at about 7-10 hours per mm path length (mouse retina, Helmstaedter et al., 2013) or 4-7 hours per mm (mouse cortex, Berning et al., 2015), thus about 50-fold faster than manual contouring. Recent improvements in online data delivery and visualization (webKnossos, Boergens, Berning et al., in revision) further reduce this by about 5-10 fold. Thus, synapse detection has become a limiting step in dense large-scale connectomics. Importantly, any further improvements in neurite reconstruction efficiency would be bounded by the time it takes to annotate synapses. Therefore, automated synapse detection for large-scale 3D EM data is critical.
High-resolution EM micrographs are the gold standard for synapse detection (Gray, 1959, Colonnier, 1968. Images acquired at about 2-4 nm in-plane resolution have been used to confirm chemical synapses using the characteristic intense heavy metal staining at the postsynaptic membrane, thought to be caused by the accumulated postsynaptic proteins ("postsynaptic density", PSD), and an agglomeration of synaptic vesicles at the membrane of the presynaptic terminal.
While synapses can be unequivocally identified in 2-dimensional images when cut perpendicularly to the synaptic cleft ( Fig. 1f), synapses at oblique orientations or with a synaptic cleft in-plane to the EM imaging are hard or impossible to identify. Therefore, the usage of 3D EM imaging with a high resolution of 4-8 nm also in the cutting dimension (FIB/SEM, Knott et al., 2008) is ideal for synapse detection. For such data, automated synapse detection is available and successful (Kreshuk et al., 2011, Becker et al., 2012, 2013. However, FIB-SEM currently does not scale to large volumes required for connectomics of the mammalian cerebral cortex. Serial Blockface EM (Denk & Horstmann, 2004) scales to such mm 3 -sized volumes.
However, SBEM provides a resolution just sufficient to follow all axons in dense neuropil and to identify synapses across multiple sequential images, independent of synapse orientation (the resolution typically is about 10x10x30 nm 3 ; Fig. 1g). In this setting, synapse detection methods developed for high-in plane resolution data do not provide the accuracy required for fully automated synapse detection (see below).
Here we report SynEM, an automated excitatory synapse detection method based on an automated segmentation of large-scale 3D EM data (using SegEM, Berning et al., 2015) from mouse cortex obtained using SBEM. SynEM is aimed at providing fully automated excitatory connectomes from large-scale EM data in which manual annotation or proof reading of synapses is not feasible. SynEM achieves precision and recall for single-synapse detection of 94% and 89%, and for binary neuron-toneuron connectomes of 98% and 98% without any human interaction, essentially removing the synapse annotation challenge for large-scale excitatory mammalian connectomes.

Interface classification
We consider synapse detection as a classification of interfaces between neuronal processes as synaptic or non-synaptic ( Fig. 1j; see also Mishchenko et al., 2010, Kreshuk et al., 2015, Huang et al., 2016. This approach relies on a volume segmentation of the neuropil sufficient to provide locally continuous neurite pieces (such as provided by SegEM, Berning et al., 2015, for SBEM data of mammalian cortex), for which the contact interfaces can be evaluated.
The unique features of excitatory synapses are distributed asymmetrically around the synaptic interface: presynaptically, large vesicle pools extend into the presynaptic terminal over at least 100-200 nm; postsynaptically, the PSD has a width of about 20-30 nm. To account for this surround information our classifier considers the subvolumes adjacent to the neurite interface explicitly and separately, unlike previous approaches (Kreshuk et al., 2015, Huang et al., 2016, up to distances of 40, 80, and 160 nm from the interface, restricted to the two segments in question ( Fig. 1k; the interface itself was considered as an additional subvolume). We then compute a set of 11 texture features (Table 1, this includes the raw data as one feature), and derive 9 simple aggregate statistics over the texture features within the 7 subvolumes. In addition to previously used texture features (Kreshuk et al., 2011, Table 1), we use the local standard deviation, an intensity-variance filter and local entropy to account for the low-variance ("empty") postsynaptic spine volume and presynaptic vesicle clouds, respectively (see Fig. 1l for filter output examples and Fig. 1m for filter distributions at an example synaptic and non-synaptic interface). The "sphere average" feature was intended to provide information about mitochondria, which often impose as false positive synaptic interfaces when adjacent to a plasma membrane. Furthermore, we employ 5 shape features calculated for the border subvolume and the two subvolumes extending 160 nm into the pre-and postsynaptic processes, respectively. Together, the feature vector for classification had 3224 entries for each interface (Table 1).

SynEM workflow and training data
We developed and tested SynEM on a dataset from layer 4 (L4) of mouse primary somatosensory cortex (S1) acquired using SBEM (dataset ex145_07x2,Boergens et al.,in prep.; the dataset was also used in developing SegEM, Berning et al., 2015).
The dataset had a size of 93 x 60 x 93 µm 3 imaged at a voxel size of 11.24 x 11.24 x 28 nm 3 . The dataset was first volume segmented (SegEM, Berning et al., 2015, Fig. 1j, see Fig. 1n for a SynEM workflow diagram). Then, all interfaces between all pairs of volume segments were determined, and the respective subvolumes were defined.
Next, the texture features were computed on the entire dataset and aggregated as described above. Finally, the shape features were computed. Then, the SynEM classifier was implemented to output a synapse score for each interface and each of the two possible pre-to-postsynaptic directions ( Fig. 2a-c). The SynEM score was then thresholded to obtain an automated binary classification of interfaces into synaptic / non-synaptic (θ in Fig. 2a). Since the SynEM scores for the two possible synaptic directions were rather disjunct in the range of relevant thresholds (Fig. 2b), we used the larger of the two scores for classification. By introducing a second threshold θ 2 on the SynEM score, one can use SynEM instead in a semi-automated setting where a range of interfaces (between θ 1 and θ 2 Suppl. Fig. 1b) is manually inspected to further improve precision and recall. In the following, we report results on the fully-automated version of SynEM (Fig. 2d,e).
We obtained labels for SynEM training and validation by presenting raw data volumes of (1.6 x 1.6 x 0.7-1.7 ) µm 3 that surrounded the segment interfaces to trained student annotators (using a custom-made annotation interface in Matlab, Suppl. Fig. 1a). The raw data was rotated such that the interface was most vertically oriented in the image plane presented to the annotators; the two interfacing neurite segments were colored transparently for identification (this could be switched off by the annotators when inspecting the synapse, see Methods for details). Annotators were asked to categorize the presented interface as either non-synaptic, pre-topostsynaptic, or post-to-presynaptic (Fig. 2a, Suppl. Fig. 1a). The synaptic labels were then verified by an expert neuroscientist. A total of 75,383 interfaces (1,858 synaptic, 73,525 non-synaptic) were annotated in image volumes drawn from 40 locations within the entire EM dataset. About 80% of the labels (1467 synaptic, 61,619 non-synaptic) were used for training, the remaining were used for validation.
Initially, we interpreted the annotator's labels in a binary fashion: irrespective of synapse direction, the label was interpreted as synaptic (and non-synaptic otherwise, Fig. 2c, "binary"). We then augmented the training data by including mirror-reflected copies of the originally presented synapses, maintaining the labels as synaptic (irrespective of synapse direction) and non-synaptic (Fig. 2c, "augmented"). Finally, we changed the labels of the augmented training data to reflect the direction of synaptic contact: only synapses in one direction were labeled as synaptic, and nonsynaptic in the inverse direction ( Fig. 2c "directed").  (Table 1). This classifier provided only about 70% precision and recall ( Fig. 2d). We then extended the feature space by adding more texture features capturing local image statistics (Table 1) and shape features. In particular, we added filters capturing local image variance in an attempt to represent the "empty" appearance of postsynaptic spines, and the presynaptic vesicle clouds imposing as high-frequency high-variance features in the EM images. Also, we added more subvolumes over which features were aggregated (see Fig. 1k), increasing the dimension of the feature space from 603 to 3224. Together with additional aggregate statistics, the classifier reached about 75% precision and recall. A substantial improvement was obtained by switching from an ensemble of decision-stumps (onelevel decision tree) trained by AdaBoostM1 (Freund & Schapire, 1997) as classifier to decision stumps trained by LogitBoost (Friedman et al., 2000). In addition, the directed label set proved to be superior. Together, these improvements yielded a precision and recall of 87% and 86% on the validation set (Fig. 2d).
We then evaluated the best classifier from the validation set ( Fig. 2d, 'Direct & Logit') on a separate test set. This test set was a dense volume annotation of all synapses in a randomly positioned region containing dense neuropil of size 5.8 x 5.8 x 7.2 µm 3 from the L4 mouse cortex dataset. All synapses were identified by 2 experts, which included the reconstruction of all local axons, and validated once more by another expert on a subset of synapses. In addition we marked all shaft (to a large majority inhibitory) synapses in the test set and evaluated our approach for all 235 synapses in the test set and for 204 spine (to a large majority excitatory) synapses only (Fig.   2e). We compared SynEM to the synapse detection algorithms from Kreshuk et al., 2011, Becker et al., 2012and Kreshuk et al., 2014 on the test set ( Fig. 2e, only best result from Becker et al., 2012 shown, see Methods for training details). Especially in the high recall regime SynEM was able to significantly outperform previous approaches on the SBEM dataset with its lower resolution (see also Suppl. Table 1 for a comparison of detection methods). SynEM achieves precision and recall rates for single excitatory synapse detection of 94% and 89%, respectively (at SynEM score threshold θ s , black circle in Fig. 2e). This suggests that the asymmetry in pre-vs. postsynaptic aggregation volumes in fact contributed to classification performance, with a focus on the presynaptic vesicle cloud and the postsynaptic density.

SynEM for connectomes
We so far evaluated SynEM on the basis of the detection performance of single synaptic interfaces. Since we are interested in measuring the connectivity matrices of mammalian cortical circuits (connectomes) we obtained a statistical estimate of connectome error rates based on synapse detection error rates. We assume that the goal is a binary connectome containing the information whether pairs of neurons are connected or not. Automated synapse detection provides us with weighted connectomes reporting the number of synapses between neurons, from which we can obtain binary connectomes by considering all neuron pairs with at least γ nn synapses as connected (Fig. 3a). Synaptic connections between neurons in the mammalian cerebral cortex have been found to be established via multiple synapses per neuron pair (Fig. 3b, Feldmeyer et al., 1999, Feldmeyer et al., 2002, Feldmeyer et al., 2006, Frick et al., 2008, Markram et al., 1997, ranges 1-8 synapses per connection, mean 4.3 ± 1.4). The effect of synapse recall R s on recall of neuron-toneuron connectivity R nn can be estimated (Fig. 3c) for each threshold γ nn given the distribution of the number of synapses per connected neuron pair n syn . For connectomes in which neuron pairs with at least one detected synapse are considered as connected (γ nn = 1), a neuron-to-neuron connectivity recall R nn of 98% can be achieved with a synapse detection recall R s of 70.6% (Fig. 3c, black arrow) if synapse detection is independent between multiple synapses of the same neuron pair. SynEM achieves 99.3% synapse detection precision P s at this recall (Fig. 2e).
The resulting precision of neuron-to-neuron connectivity P nn then follows from the total number of synapses in the connectome N syn = N 2 *c r *<n syn >, with c r the pairwise connectivity rate, about 20% for local connections in cortex (Feldmeyer et al., 1999), <n syn > the mean number of synapses per connection (4.3 ± 1.4, Fig. 3b), and N 2 the size of the connectome. A fraction R s of these synapses is detected (true positive detections, TPs). The number of false positive (FP) synapse detections was deduced from TP and the synapse precision P s as FP=TP*(1-P s )/P s , yielding R s *N syn *(1-P s )/P s false positive synapse detections. These we assumed to be distributed randomly on the connectome and estimated how often at least γ nn synapses fell into a previously empty connectome entry. These we considered as false positive connectome entries, whose rate yields the binary connectome precision P nn (see Methods for details of the calculation). At R nn of 98%, SynEM yields a neuron-to-neuron connection precision P nn of 98% (Fig. 3d, black arrow, Fig. 3e). Error rates of 2% for missed connections and for wrongly detected connections are well below the noise of synaptic connectivity so far found in real biological circuits (e.g., Helmstaedter et al., 2013, Bartol et al., 2015, and thus likely sufficient for a large range of studies involving the mapping of cortical connectomes. In summary, SynEM provides fully automated synapse detection for binary mammalian connectomes up to 98% precision and recall, a range which was previously prohibitively expensive to attain by existing methods (Fig. 3f).

Local cortical connectome
We applied SynEM to a local cortical connectome between 100 axons and 100 postsynaptic processes in the dataset from L4 of mouse cortex (Fig. 4a, b). We first detected all contacts and calculated the total contact area between each pair of preand postsynaptic processes ("contactome", Fig. 4c). We then classified all contacts using SynEM (at the classification threshold θ nn (Table 3) yielding 98% neuron-toneuron precision and recall) followed by binarization at γ nn = 1 to obtain the binary connectome C bin (Fig. 4d).
In order to obtain an improved weighted connectome C w (Fig. 4e), we finally applied SynEM to those contacts belonging to neurite pairs indicated as connected in the binary connectome. Here we used SynEM at the classification threshold θ s yielding 89% single synapse recall and 94% single synapse precision. The detected synapses were clustered when they were closer than 1500 nm for a given neurite pair. This allowed us to concatenate large synapses with multiple active zones or multiple contributing SegEM segments into one (Suppl. Fig. 1d).
The resulting connectome contained 608 synapses distributed over 474 connections ( Fig. 4e,f). We finally determined whether total contact area was predictive of synaptic connectivity for these connections in cortex (Fig. 4g). Note that while in the retina ( Fig. 3e in Helmstaedter et al., 2013), total contact of more than about 1 µm 2 was highly predictive of synaptic contact, in cortex such prediction is not possible (see also Mishchenko et al., 2010, Bock et al., 2011, Kasthuri et al., 2015, Lee et al., 2016.

DISCUSSION
We report SynEM, a toolset for automated synapse detection in connectomics. The particular achievement is that the synapse detection for densely mapped excitatory connectomes from the mammalian cerebral cortex is fully automated yielding below 2% residual error in the connectome. With this, synapse detection is removed as a bottleneck in large-scale mammalian connectomics.
Evidently, synapse detection is facilitated in high-resolution EM data, and becomes most feasible in FIB-SEM data at a resolution of about 4-8 nm isotropic (Kreshuk et al., 2011). Yet, only by compromising resolution for speed (and thus volume) of imaging, the mapping of large, potentially even whole-brain connectomes is becoming plausible. Therefore it was essential to obtain automated synapse detection for EM data that is of lower resolution, but actually is scalable to such volumes.
In addition to high image resolution, synapse detection is substantially simplified for human annotators when employing recently proposed special fixation procedures that enhance the extracellular space in 3D EM data (Pallotto et al., 2015). In such data, direct touch between neurites has a very high predictive power for the existence of a (chemical or electrical) synapse, since otherwise neurite boundaries are separated by extracellular space. Thus, it is expected that such data substantially simplifies automated synapse detection. The advantage of SynEM is that it achieves fully automated excitatory synapse detection in conventionally stained and fixated 3D EM data, in which neurite contact is frequent. Such data is widely used, and acquiring such data does not require special fixation protocols.
Together, SynEM resolves excitatory synapse detection for high-throughput cortical connectomics of mammalian brains, making the efficiency of neurite reconstruction again the bottleneck for connectomic analysis.

Annotation time estimates
Neuropil composition (Fig. 1b) was considered as follows: Neuron density of 157,500 per mm 3 (White & Peters, 1993), axon path length density of 4 km per mm 3 and dendrite path length density of 1 km per mm 3 (Braitenberg & Schüz, 1998), spine density of about 1 per µm dendritic shaft length, with about 2 µm spine neck length per spine (thus twice the dendritic path length), synapse density of 1 synapse per µm 3 (Merchan-Perez et al., 2014) and bouton density of 0.1 -0.25 per µm axonal path length (Braitenberg & Schüz, 1998). Annotation times were estimated as 200 -400 h per mm path length for contouring, 3.7 -7.2 h/mm path length for skeletonization , Berning et al., 2015, 0.6 h/mm for flight-mode annotation (Boergens, Berning et al., in revision), 0.1 h/µm 3 for synapse annotation by volume search (estimated form the test set annotation) and an effective interaction time of 60 s per identified bouton for axonbased synapse search. All annotation times refer to single-annotator work hours, redundancy may be increased to reduce error rates in neurite and synapse annotation in these estimates (see Helmstaedter et al., 2011).

EM image dataset and segmentation
SynEM was developed and tested on a SBEM dataset from layer 4 of mouse primary somatosensory cortex (dataset 2012-09-28_ex145_07x2, K.M.B. and M.H., unpublished data, see also Berning et al., 2015). Tissue was conventionally en-bloc stained  with standard chemical fixation yielding compressed extracellular space (compare to Pallotto et al., 2015).

Neurite interface extraction and subvolume definition
Interfaces between a given pair of segments in the SegEM volume segmentation were extracted by collecting all voxels from the one-voxel boundary of the segmentation for which that pair of segments was present in the boundary's 26neighborhood. Then, all interface voxels for a given pair of segments were linked by connected components, and if multiple connected components were created, these were treated as separate interfaces. Interface components with a size of 150 voxels or less were discarded.
To define the subvolumes around an interface used for feature aggregation (Fig. 1k), we collected all voxels that were at a maximal distance of 40, 80 and 160 nm from any interface voxel and that were within either of the two adjacent segments of the interface. The interface itself was also considered as a subvolume yielding a total of 7 subvolumes for each interface.

Feature calculation
11 3-dimensional image filters with one to 15 instances each (Table 1) were calculated as follows and aggregated over the 7 subvolumes of an interface using 9 summary statistics, yielding 3224 features per directed interface. Image filters were applied to cuboids of size 548x548x268 voxels, each, which overlapped by 72,72 and 24 voxels in x,y and z dimension, respectively, to ensure that all interface subvolumes were fully contained in the filter output. and analogously for the other partial derivatives.
Gaussian smoothing was defined as I * g σ .
Difference of Gaussians was defined as (I * g σ -I * g kσ ), where the standard deviation of the second Gaussian filter is multiplied element-wise by the scalar k.
Laplacian of Gaussian was defined as for the neighborhood U (x,y,z) of location (x, y, z) with |U (x,y,z) | number of elements and calculated using MATLABs stdfilt function.
Sphere average was defined as the mean raw data intensity for a spherical neighborhood U r with radius r around the voxel of interest, with = {( , , )| 2 + 2 + (2 ) 2 ≤ 2 } ∩ 3 , where Z 3 is the 3 dimensional integer grid; x,y,z are voxel indices; z anisotropy was approximately corrected.
The intensity/variance feature for voxel location (x, y, z) was defined as for the neighborhood U (x,y,z) of location (x, y, z).
The set of parameters for which filters were calculated is summarized in Table 1. 11 shape features were calculated for the border subvolume and the two 160 nmrestricted subvolumes, respectively. For this, the center locations (midpoints) of all voxels of a subvolume were considered. Shape features were defined as follows: The number of voxel feature was defined as the total number of voxels in the subvolumes. The voxel based diameter was defined as the diameter of a sphere with the same volume as the number of voxels of the subvolumes. Principal axes lengths were defined as the three eigenvalues of the covariance matrix of the respective voxel locations. Principal axes product was defined as the scalar product of the first principal components of the voxel locations in the two 160 nm-restricted subvolumes.
Voxel based convex hull was defined as the number of voxels within the convex hull of the respective subvolume voxels (calculated using the convhull function in MATLAB).

Generation of training and validation labels
Interfaces were annotated by 3 trained undergraduate students using a customwritten GUI (in MATLAB, Suppl. Fig. 1a). A total of 40 non-overlapping rectangular volumes within the center 86 x 52 x 86 μm 3 of the dataset were selected (39 sized 5.6 x 5.6 x 5.6 μm 3 each and one of size 9.6 x 6.8 x 8.3 μm 3 ). Then, all interfaces within these volumes were extracted as described above. Interfaces with a center of mass less than 1.124 µm from the volume border were not considered. For each interface, a raw data volume of size 1.6 x 1.6 x 0.7 μm 3 , centered on the center of mass of the interface voxel locations was presented to the annotator. When the center of mass was not part of the interface, the closest interface voxel was used.
The raw data was rotated such that the second and third principal components of the interface voxel locations (restricted to a local surround of 15x15x7 voxels around the center of mass of the interface) defined the horizontal and vertical axes of the displayed images. First, the image plane located at the center of mass of the interface was shown. The two segmentation objects were transparently overlaid (Suppl. Fig. 1a) in separate colors (the annotator could switch the labels off for better visibility of raw data). The annotator had the option to play a video of the image stack or to manually browse through the images. The default video playback started at the first image. An additional video playback mode started at the center of mass of the interface, briefly transparently highlighted the segmentation objects of the interface, and then played the image stack in reverse order to the first plane and from there to the last plane. In most cases, this already yielded a decision. In addition, annotators had the option to switch between the 3 orthogonal reslices of the raw data at the interface location (Suppl. Fig. 1a). The annotators were asked to label the presented interfaces as non-synaptic or synaptic. For the synaptic label, they were asked to indicate the direction of the synapse (see Suppl. Fig. 1a). In addition to the annotation label interfaces could be marked as "undecided". Interfaces were annotated by one annotator each. The interfaces marked as undecided were validated by an expert neuroscientist. In addition, all synapse annotations were validated by an expert neuroscientist, and a subset of non-synaptic interfaces was cross-checked. Together, 75,383 interfaces (1858 synaptic, 73,525 non-synaptic) were labeled this way. Of these, the interfaces from 8 label volumes (391 synaptic and 11906 non-synaptic interfaces) were used as validation set; the interfaces from the other 32 label volumes were used for training.

SynEM classifier training and validation
The target labels for the binary, augmented and directed label sets were defined as described in the Results (Fig. 2c). We used boosted decision stumps (level-one decision trees) trained by the AdaBoostM1 (Freund & Schapire, 1997) or LogitBoost (Friedman et al., 2000) implementation from the MATLAB Statistical Toolbox (fitensemble). In both cases the learning rate was set to 0.1 and the total number of weak learners to 1500. Misclassification cost for the synaptic class was set to 100.
Precision and recall values of classification results were reported with respect to the synaptic class. For validation, the binary label set was used, irrespective of the label set used in training. If the classifier was trained using the directed label set then the thresholded prediction for both orientations were combined by logical OR.

Test set generation and evaluation
To obtain an independent test set disjunct from the data used for training and validation, we randomly selected a volume of size 512 x 512 x 256 voxels (5.75 x 5.75 x 7.17 μm 3 ) from the dataset that contained no soma or dominatingly large dendrite. One volume was not used because of unusually severe local image alignment issues which are meanwhile solved for the entire dataset. The test volume had the bounding box [3713,2817,129,4224,3328,384] in the dataset. First, the volume was searched for synapses (see Fig. 1d) in webKnossos (Boergens, Berning et al., under review) by an expert neuroscientist. Then, all axons in the volume were skeleton-traced using webKnossos. Along the axons, synapses were searched (strategy in Fig. 1e) by inspecting vesicle clouds for further potential synapses.
Afterwards the expert searched for vesicle clouds not associated with any previously traced axon and applied the same procedure as above. In total, that expert found Next, all interfaces between pairs of segmentation objects in the test volume were extracted as described above. Then, the synapse labels were assigned to those interfaces whose border voxels had any overlap with one of the 278 voxel-labeled synaptic interfaces. Afterwards, these interface labels were again proof-read by an expert neuroscientist. Finally, interfaces closer than 160 nm from the boundary of the test volume were excluded to ensure that interfaces were fully contained in the test volume. The final test set comprised 235 synapses out of which 31 were labeled as shaft/inhibitory. With this we obtained a high-quality test set providing both voxellabeled synapses and synapse labels for interfaces, to allow the comparison of different detection methods.
For the calculation of precision and recall, a synapse was considered detected if at least one interface that had overlap with the synapse was detected by the classifier (TPs); a synapse was considered missed if no overlapping interface of a given synapse was detected (FNs); and a detection was considered false positive (FP) if the corresponding interface did not overlap with any labeled synapse.

Comparison to previous work
The approach of Becker et al., 2012 was evaluated using the implementation provided in Ilastik . This approach requires voxel labels of synapses. We therefore first created training labels: an expert neuroscientist created sparse voxel labels at interfaces between pre-and postsynaptic processes and twice as many labels for non-synaptic voxels for five cubes of size 3.4

Pairwise connectivity model
The neuron-to-neuron connection recall was calculated assuming an empirical distribution p(n) of the number of synapses n between connected neurons given by published studies (see Supp. Table 2, Feldmeyer et al., 1999, Feldmeyer et al., 2002, Feldmeyer et al., 2006, Frick et al., 2008, Markram et al., 1997. We further assumed the number of retrieved synapses given by a binomial model with retrieval probability given by the synapse classifier recall R s on the test set: Where γ nn is the threshold on the number of synapses between a neuron pair to consider it as connected (see Fig. 3a). This equates to the neuron-to-neuron recall: To compute the neuron-to-neuron precision, we first calculated the expected number of false positive synapse detections (FP s ) made by a classifier with precision P s and recall R s : where N syn is the total number of synapses in a dataset calculated from the average number of synapses per connected neuron pair <n syn > times the number of connected neuron pairs N con and c r is the connectivity ratio given by N con /N 2 with N the number of neurons in the connectome.
We then assumed that these false positive synapse detections occur randomly and therefore are assigned to one out of N 2 possible neuron-to-neuron connections with a frequency FP s /N 2 .
We then used a Poisson distribution to estimate the number of cases in which at least γ nn FP s synapses would occur in a previously zero entry of the connectome, yielding a false positive neuron-to-neuron connection (FP nn ).
Finally, the true positive detections of neuron-to-neuron connections in the connectome TP nn are given in terms of the neuron-to-neuron connection recall R nn by = 2 .
Together, the neuron-to-neuron connection precision P nn is given by The connectivity ratio was set to c r = 0.2 (Feldmeyer et al., 1999).

Local connectome
For determining the local connectome (Fig. 4) between 100 pre-and 100 postsynaptic processes, we used 100 axonal skeleton tracings (traced at 1 to 5-fold redundancy) and 100 dendrite skeleton tracings. All volume objects which overlapped with any of the skeleton nodes were detected and concatenated to a given neurite volume. Then, all interfaces between pre-and postsynaptic processes were classified by SynEM. The area of each interface was calculated as in (Berning et al., 2015) and the total area of all contacts between all neurite pairs was calculated ( Fig. 4c). To obtain the binary connectome C bin (Fig. 4d), we applied the SynEM score threshold θ nn (Fig. 2e, Table 3), and the connectome threshold γ nn = 1. To obtain a weighted connectome C w (Fig. 4e), we re-analyzed all interfaces between neurites determined to be connected in the binary connectome. This time, the SynEM score threshold θ s was applied (see Fig. 2e, Table 3). Detected synaptic interfaces were clustered using hierarchical clustering (single linkage, distance cutoff 1,500 nm) if the interfaces were between the same pre-and postsynaptic objects.
The connection probability as a function of total contact area between neurites (Fig. 4 g) was calculated as the fraction of neurite-to-neurite connections with at least A c total contact area that was synaptically connected.

AUTHOR CONTRIBUTIONS
Conceived and initiated the project: MH; supervised the project: MH and PvdS; Developed algorithms, implemented algorithms, analyzed data: BS; provided segmentations and contributed to algorithm development: MB; provided EM data: KMB; provided expert synapse annotations: AG; wrote the paper: MH and BS. Figure 1 Synapse detection by classification of neurite interfaces. (a) Sketch of mouse primary somatosensory cortex (S1) with circuit modules ("barrels") in cortical layer 4 and minimum required dataset extent for a "barrel" dataset (250 µm edge length) and a dataset extending over the whole cortical depth from pia to white matter (WM) (1 mm edge length). (b) Number of synapses and neurons, total axonal, dendritic and spine path length for the example datasets in (a) (White & Peters, 1993, Braitenberg & Schüz, 1998, Merchan-Perez et al., 2014. and (g) 12 nm in conventionally en-bloc stained and fixated tissue , Hua et al., 2015. Note that synapse detection in high-resolution data is much  Note that SynEM scores are disjunct in a threshold regime used for best single synapse performance (θ s ) and best neuron-to-neuron recall and precision (θ nn ), see Strategy for label generation. Based on annotator labels (Ann. Label), three types of label sets were generated: Initial label set ignored interface orientation (Binary);

FIGURE LEGENDS
Augmented label set included mirror-reflected interfaces (Augment.); Directed label set used augmented data but considered only one synaptic direction as synaptic (Directed). (d) Classification performance for different features, aggregation statistics, classifier parameters and label sets. Init: initial classifier used (see Table   1). The initial classifier was extended by using additional features (Add feat, see  Table 3 for the 10 highest importance feature instances, Table 1 for feature name abbreviations, and text for details. Scale bars, 500 nm. pairs obtained from paired recordings in rodent cerebral cortex (Feldmeyer et al., 1999, Feldmeyer et al., 2002, Feldmeyer et al., 2006, Frick et al., 2008, Markram et al., 1997. Average distribution (cyan) is used for the precision estimates in the following (see Suppl.