Convolutional Neural Networks for cytoarchitectonic brain mapping at large scale

Human brain atlases provide spatial reference systems for data characterizing brain organization at different levels, coming from different brains. Cytoarchitecture is a basic principle of the microstructural organization of the brain, as regional differences in the arrangement and composition of neuronal cells are indicators of changes in connectivity and function. Automated scanning procedures and observer-independent methods are prerequisites to reliably identify cytoarchitectonic areas, and to achieve reproducible models of brain segregation. Time becomes a key factor when moving from the analysis of single regions of interest towards high-throughput scanning of large series of whole-brain sections. Here we present a new workflow for mapping cytoarchitectonic areas in large series of cell-body stained histological sections of human postmortem brains. It is based on a Deep Convolutional Neural Network (CNN), which is trained on a pair of section images with annotations, with a large number of un-annotated sections in between. The model learns to create all missing annotations in between with high accuracy, and faster than our previous workflow based on observer-independent mapping. The new workflow does not require preceding 3D-reconstruction of sections, and is robust against histological artefacts. It processes large data sets with sizes in the order of multiple Terabytes efficiently. The workflow was integrated into a web interface, to allow access without expertise in deep learning and batch computing. Applying deep neural networks for cytoarchitectonic mapping opens new perspectives to enable high-resolution models of brain areas, introducing CNNs to identify borders of brain areas.


Introduction
Human brain atlases provide a spatial framework for localizing information retrieved from neuroscientific studies of different brains, addressing brain organization from different angles and including different data modalities. The cerebral cortex of the brain is organized into cortical areas, which each have a specific functional role. They can be identified in cell body stained sections based on cytoarchitecture. Regional differences in the spatial arrangement and composition of the cells covary with changes in connectivity and function [1]. Cytoarchitectonic borders can be identified in microscopic scans of histological brain sections, based on the analysis of the arrangement and distribution of cells, their different morphology and size, as well as differences in the appearance and relative thickness of cortical layers. Such criteria have been formulated for the first time more than a century ago to map the cerebral cortex, and still serve as guidelines for cytoarchitectonic analysis [2] . Different approaches have been proposed in the past to identify positions of borders in a reliable manner [3,4,5]. The de-facto standard for identifying borders of cytoarchitectonic areas in the human cerebral cortex is a method based on multivariate statistical image analysis [3], which has been applied for the identification of more than 200 areas to date [6]. To map the whole extent of an area in both hemispheres, and to capture its intersubject variability through studies in large samples, however, is extremely time-and labor-intensive: Cytoarchitectonic maps need to aggregate properties across many histological sections and multiple brains. To address this challenge, mapping includes a subset of histological sections (every 15-60ths section, i.e. 0.3mm to 1.2mm distance between sections) of ten human postmortem brains resulting in analyses of several hundred sections per area, which corresponds to a workload in the order of one or even several person years per area [6].
Recent high-throughput scanning devices and powerful compute resources enable a much higher degree of automation in digitalization and analysis of whole human brain sections at microscopical resolution. Technological progress has made it possible to 3D-reconstruct a complete postmortem brain at 20 micron spatial resolution with more than 7000 sections -the BigBrain [7]. This highresolution brain model opens the possibility to produce complete maps of cytoarchitectonic areas at full microscopic resolution, and to cover large image stacks with brain areas extending across thousands of sections Hereby, each section image has up to 120 000 × 80 000 pixels image size each.
In order to address these challenges, a method is required, which 1. automatically classifies brain areas based on cytoarchitectonic criteria, 2. handles series with thousands of 2D images of histological sections with data in the Giga-to Terabyte range, 3. is robust against histological artefacts, which are inevitable in large section series, 4. provides stable results independently of the cutting plane, e.g. when changes in the cutting direction relative to the brain tissue prevents analysis of the 6-layered structure of the cerebral cortex (in the following referred to as oblique cuts), and 5. can be operated and supervised by neuroscience experts without requiring advanced computer science skills.
Previous experience in cytoarchitectonic mapping has shown that the identification of brain areas considers multiple parameters. This is true for traditional visual inspection using a light microscope, as well as for automated mapping approaches. It involves complex multi-scale texture patterns, from the level of neurons up to a level of cortical layers and areas. However, several parameters that can be used for identification of cortical areas heavily depend on the cutting plane of the histological sections with respect to the orientation of cortical columns. The highly folded cerebral cortex of the human brain hereby poses particular challenges, since brain areas may appear in a very different way in dependence on the cutting angle. Thus, brain mapping needs to operate in a variable data space, where no restrictions should be made on the orientation of the cutting plane relative to the course of cortical layers and the brain surface. In addition, automated brain mapping needs to consider variation in tissue quality and staining, as well as histological artefacts. Finally, automated mapping methods must take into account variations in cytoarchitecture between different brains and lead to identical parcellations, even if interindividual differences in cytoarchitecture are large.
Previous work on automated cytoarchitectonic area segmentation [8,9] proposes to use Convolutional Neural Networks (CNNs) for automatic segmentation of multiple cytoarchitectonic areas across multiple human brains. This is a remarkably challenging task, as the model needs to be robust against the considerable interindividual variability of the human brain, inevitable histological artefacts, variations in staining, and oblique cuts, to name only a few of the constraints. At the same time, it has to be highly sensitive to variations of cytoarchitecture in different brain areas, which may be subtle. This may result in a need for large amounts of training data, which is difficult to cover. Consequently, such generalized segmentation models are still subject to active research.
We here propose a new workflow for cytoarchitectonic mapping of a target area across large or complete series of histological human brain sections with a high degree of automation. The workflow is illustrated in Fig. 1. Following a "divide & conquer" approach, the full extent of a target brain area a is subdivided into intervals of sections, which are enclosed by annotations created at approximately regular section intervals. Separate CNNs are then trained for each interval, using the enclosing annotations as training data. This results in a set of local segmentation models, each specialized to automatically map only the tissue sections which fall into the corresponding interval.
By training local models for each interval of target area a, an interactive workflow is obtained that allows an expert to label cytoarchitectonic areas in full stacks of histological sections with minimal manual annotation, aided by Deep Learning, and at a speed that matches high throughput image acquisition.
In this work, we 1. introduce a method to automatically map cytoarchitectonic brain areas across large series of histological human brain sections (Sec. 2), 2. evaluate its precision on 18 cytoarchitectonic areas from the BigBrain dataset [7] to investigate its applicability to a wide range of different brain areas, 3. assess its precision for two areas in three brains with variable staining protocols [7,10,11] to investigate robustness against interindividual differences and different staining procedures, and 4. create highly detailed and complete 3D maps of four areas in the BigBrain dataset and evaluate their anatomical plausibility 1 .

GLI-based mapping of cytoarchitectonic areas for training and validation
Our proposed method requires annotations of the target area at roughly regular intervals in approximately 1% of sections in the stack. Such annotations consist of localizations of areal borders in the section, and are defined using the well-established GLI-based mapping procedure described in [3]. This approach starts by scanning the histological images and by building a Gray Level Index (GLI) image [3]. The GLI is a measure of the volume fraction of cell bodies [12]. In a next step, profiles extending from the cortical surface to the white matter border are extracted along Laplacians, which reflect laminar changes in the volume fraction of cell bodies, and thus encode cytoarchitecture. The cortical surface and the white matter border are manually identified. Using a sliding window procedure across the cortical ribbon, the similarity of blocks of profiles is being estimated by the Mahalanobis distance, a multi-variate distance measure, at each position, that is combined with a Hotelling's t-test for checking significance. Borders between areas are indicated by significant peaks in the Mahalanobis distance function. The positions of borders are then labeled in the image. These borders are then used as a basis for the network training and validation.

Datasets
The datasets used in this study comprise image series of histological sections of three human brains, which have been stained for neuronal cell bodies [6,11]. The brains vary in terms of cytoarchitecture and folding pattern, as well as staining properties, presence of histological artifacts and other features (Fig. 2). Areas have been mapped in the past (cf. Sec. 2.1) using at least every 60th section of the series. These maps provide the basis to train the neural network models and to perform automatic segmentation in previously unseen, close by sections.
The first dataset -denoted as B20 -is based on the original histological sections of the publicly available microscopic 3D model BigBrain [7]. The dataset consists of images of 7404 coronal sections with a thickness of 20µm. A modified Merker stain [13] was used to stain cell bodies. A subset of sections was scanned at 1µm resolution using a high-throughput light-microscopic scanner 1. Visual areas hOc1, hOc2 [10], hOc3v [14] and hOc5 [15]. Additional annotations at an interval of approximately 30 (0.6mm) sections were created for hOc5, as well as on a small set of sections containing hOc3v [16,17,18,19].
The BigBrain dataset has been fully reconstructed at 20µm [7] and therefore opens the possibility to investigate the 3D consistency of the computed maps after transformation into the reconstructed space.
Brain areas differ in cytoarchitecture, as well as in size and in how much the morphology of an area changes across a series of consecutive brain sections. This has implications for the amount of annotations required to capture the relevant properties of certain areas. For example, hOc1 is large and shows only moderate changes across consecutive sections. In comparison, hOc5 is considerably v 1 cm B C A Figure 2: Example images of cell body stained histological human brain sections taken from datasets B20 (A), B01 (B) and AAHB (C). All sections were sampled from a comparable region of the occipital lobe. Differences arise from intersubject variability and variations in staining and histological processing protocols. Locations of detail views (2mm × 2mm) are marked with red squares. For B20 and B01 , only the right hemisphere is shown. AAHB only includes a single hemisphere. Cerebellum was removed from B20 and AAHB for visualization. Scale bar: 1cm (same for all three sections).
smaller, and hOc3v changes considerably across consecutive sections (see Fig. 8, C-F), resulting in a need for more annotations to capture their structure.
The second dataset -B01 -has also been used for mapping in the past, whereby every 15th section of the whole series of sections was stained and digitized. This brain was 3D reconstructed with a spatial resolution of 1mm isotropic [6]. Annotations for visual areas hOc1 and hOc2 at an interval of approximately every 60th section [10] in a subset of sections have been used. This dataset serves to investigate robustness against intersubject variability, while the lab protocol is similar to the one used for B20 .
The third dataset -AAHB -, comes from the Allen Adult Human Brain Atlas [11]. It includes 106 unevenly spaced, publicly available sections. In contrast to the first two series of images, it differs in thickness (50µm), and the staining method (Nissl staining). Annotations are provided for cortical and subcortical gray matter according to a modified Brodmann scheme on one hemisphere (cf. [11]). This dataset is used to investigate robustness of the proposed method against variable lab protocols and delineation criteria with respect to areas hOc1 and hOc2, which correspond to "primary visual cortex (striate cortex, area V1/17)" (identifier 10269) and "parastriate cortex (area V2, area 18)" (identifier 10271), respectively, in the Allen ontology.

Local segmentation models
Annotations of cytoarchitectonic areas based on GLI mapping (Sec. 2.1) were used to train CNNs, which we refer to as local segmentation models. Each local segmentation model f a [s 1 ,s 2 ] was trained on two sections s 1 and s 2 (the training sections) with available annotations for a target area a. Trained local segmentation models were then applied to "fill the gaps", i.e. to automatically vi segment the target area in sections enclosed by the respective training sections s 1 and s 2 (Fig. 1).
The focus on a single target area and a spatially restricted stack of consecutive sections reduces cytoarchitectonic and morphological variations that need to be captured by the respective models, which we expect to result in improved performance compared to training models for multiple areas or a wider range of sections as proposed in [8].
We trained local segmentation models for 18 cytoarchitectonic areas in B20 and two areas in each of B01 and AAHB . Fig. 3 gives an overview of sections used for the individual areas. Most local segmentation models were trained on two training sections with annotations at ∼ 2.4mm distance, corresponding to ∼ 120 sections for B20 and B01 and 48 sections for AAHB . Additional local segmentation models with a reduced interval size of 60 sections (1.2mm) were trained for areas hOc3v and hOc5 to account for highly variable morphology (hOc3v, see for hOc1 in B20 ) were processed using the closest available local segmentation model. For example, model f B20−hOc1 [181,301] was also applied to the section interval [1, 181].

Neural network architecture
For local segmentation models, the modified U-Net architecture [31] proposed by [8] was extended into a multi-scale neural network model (Fig. 4, C). U-Nets have proven to be very powerful for many applications in biomedical image segmentation (e.g. [32,33]). They consist of an encoder and decoder branch, which are linked by skip-connections between layers of corresponding spatial resolution to allow recovery of fine-grained details during upsampling. To show the benefit of using a multi-scale variant of U-Nets, three network variants were used: A high-resolution encoder network (HR), a low-resolution network (LR), and a combined multi-scale architecture (MS ).
High Resolution Encoder architecture ( HR). The architecture proposed in [8] was used as base architecture (Fig. 4, A). A high resolution encoder E HR receives high resolution input patches with a size of 2025 × 2025 pixels at 2µm pixel resolution (4.05 × 4.05mm 2 ) and enables recognition of fine-grained microstructural textures. It consists of six convolutional blocks, with the number of filters set to {16, 32, 64, 64, 128, 128} respectively. All but the last block are followed by a maxpooling operation with pool size 2 and stride 2. The first layer of the first block in E HR uses a filter size of 5 and a stride of 4, which increases the receptive field while keeping memory consumption and computational effort tangible. All remaining convolutional layers of E HR use a kernel size of 3 and stride 1. Weights of E HR were pre-initialized from the trained self-supervised network proposed in [9]. The decoder consists of four convolutional blocks with the number of filters set to {128, 64, 64, 32} respectively. Each block is preceded by an upsampling block, which consists of a nearest neighbor upsampling with kernel size 2 and stride 2, followed by a zero-padded convolutional layer with kernel size 2 and stride 1. All convolutional operations in the network are followed by batch normalization [34] and Rectified Linear Unit (ReLU) non-linearity.
Multi-scale network architecture ( MS). The multi-scale network architecture was obtained by attaching a low resolution encoder E LR as a second branch to HR, which receives lower resolution image patches with a size of 682×682 pixels at 16µm pixel resolution (10.912×10.912mm 2 ), centered at the same location as E HR patches. This branch allows to learn features at the scale of local cortical folding patterns. Although such macroscopic features are not generally representative of cytoarchitecture in human brains, as they vary largely between individuals [2], they are appropriate in the present setting due to the locality of the network models. E LR is based on E HR , and composed of six convolutional blocks with the same number of filters as E HR .  Figure 4: Illustration of investigated neural network architectures. A) High resolution architecture (HR) from [8], which can capture fine-grained microstructural textures. B) Low resolution architecture (LR), which can capture macroscopic tissue features. C) Proposed multi-scale architecture (MS ) to capture both fine and coarse grained tissue features. EHR is pre-initialized with weights of the self-supervised network proposed in [9]. Numbers at the top of each block denote the number of filters used in the convolutional layers of this block. Numbers at the bottom denote the physical output spacing in µm per pixel for layers which change the physical spacing of the features.
filters use a filter size of 3 and a stride of 1. Convolutional layers in the first block use a dilation rate of 1, while all other convolutional layers within E LR use a dilation rate of 2 to enlarge the receptive field.
Low Resolution Encoder architecture ( LR). The third architecture is based on HR, but replaces the encoder E HR with E LR (Fig. 4, B). By design, this model can only recognize macroscopic tissue features, and no detailed cytoarchitectonic properties at the level of cell bodies.

Training strategy
Stochastic gradient descent with Nesterov momentum [35] was used as optimizer for training the neural network models. Training was performed for 3000 iterations. The learning rate was initially set to 0.01 and decreased by a factor of 0.5 after 1000, 1400, 1800, 2200 and 2600 iterations.
Momentum was set to 0.9. Categorical cross-entropy with a weight decay of 0.0001 was used as loss function.
ix Background class labels. [8] reported convergence problems when training models with a single background class that includes both white and gray matter components, resulting in a mix of tissue parts with very high and very low similarity to the target area under the same classification label.
Thus, the general background class was split into separate labels for gray matter (cor) and white matter (wm), resulting in a semantic segmentation problem with the four classes bg, wm, cor, and the target area a. For splitting the background class into wm and cor, different strategies were used for each dataset: 1. For B20 , a volumetric tissue classification presented in [36] was projected onto the 2D histological sections using transformations provided by the authors of [7].
2. For B01 , the gray white matter segmentation described in [8] was used.
3. For AAHB , the respective delineations available from the Allen ontology [11] were used.
Patchwise training. The full resolution scans of the whole-brain sections are by far too large to be used for training. Thus, a patchwise training procedure as also proposed in [31,8,9] was employed.
However, due to the locality of local segmentation models, patches were sampled only in the direct proximity of the target brain area a, to effectively teach the models to distinguish a from its immediate surroundings. Only pixels with a distance of 5mm or less to any pixel annotated as a were considered as potential center points for training patches. x Interface (MPI) using mpi4py [38]. Training was implemented using TensorFlow [39]. Distributed training was performed using Horovod [40] and synchronous distributed stochastic gradient descent.
Batch size was set to 16 image patches per GPU, resulting in a total effective batch size of 64 patches per iteration. The linear learning rate scaling rule for distributed training proposed in [41] was employed, scaling the learning rate by the number of GPUs 3 . Batch normalization statistics were computed independently for each GPU and not averaged during training. Software code is publicly available 4 .

Web-based interactive workflow for efficient cytoarchitectonic mapping
The proposed workflow was implemented as an interactive web application to provide direct user control over the segmentation workflow through a web browser 5  Data synchronization between the web server and compute nodes is handled by the backend service.
4. After inspecting the segmentation quality, the user can choose to enter additional training data, either reducing the size of the current interval or initiating the next interval in the stack.

Validation framework and strategy
Additional sections with annotations in between the training sections were used for validating performance of local segmentation models on sections that were not seen during training (orange diamonds in Fig. 1). Segmentations of these test sections were quantitatively evaluated using the F1 score (also known as Dice score or Sørensen-Dice index), computed as the harmonic mean of precision of recall. Auxiliary labels added to ensure convergence (Sec. 2.5) were excluded from F1 score calculation, as the focus lies on segmentation performance for target area a.
Similar to the proximity sampling strategy employed for training (Sec. 2.5), segmentations on sections not seen during training were only created and evaluated in the approximate region containing a on the respective sections. These approximate regions were determined by projection of the closest reference annotations for a to the image in question using conventional linear image registration based on robust image features as in [42].
The benefit of a multi-scale architecture was investigated by training separate local segmentation models with neural network architectures HR, LR and MS for all areas in B20 . For HR and MS , the high resolution encoder E HR was initialized with the weights of the network from [9]. Furthermore, the performance of multiple local segmentation models, each trained on a local subset of sections as described in Sec. 2.3, was compared to the performance of one single model trained on all annotations available for a target area a in the following way: For each target area in the B20 dataset, one model was trained using the union of all training sections of the local segmentation models (blue squares in Fig. 1), using the same training strategy as for local segmentation models.
As the models are not local and cannot rely well on tissue morphology, this comparison was only performed using the HR architecture, again pre-initializing the high-resolution encoder E HR with weights from [9]. Models trained on the full stack instead of local intervals are denoted as HR (all).
The robustness of the proposed method against intersubject variability in brain structure and differences in staining protocols was investigated by training local segmentations models (with MS architecture) for areas hOc1 and hOc2 in datasets B01 and AAHB .

Generating high-resolution 3D cytoarchitectonic maps in the BigBrain dataset
Non-linear transformations described in [7,43] from 2D histological sections into 3D reconstructed space available for the BigBrain dataset [7] were used to generate 3D maps for areas hOc1, hOc2, hOc3v and hOc5 from 2D segmentations produced by our method. Segmentations were obtained using the workflow described in Sec. 2.3 and checked for quality by an expert (e.g. plausibility and consistency across consecutive sections). For areas hOc3v and hOc5, results of segmentation models trained with a training interval size of 1.2mm were used for reconstruction (marked with * in Fig. 3). Between 8% (hOc3v) and 23% (hOc1) of sections containing the investigated areas were not used for reconstruction due to histological artifacts (e.g. resulting from long-term storage or staining inhomogeneities). Segmentations that passed the quality check were transformed into xii raw volume smoothed volume the 3D reconstructed space. Excluded sections were replaced by interpolations from neighboring sections, using Laplacian fields as proposed in [44].
Resulting 3D maps were smoothed using a median filter with kernel size 11 × 11 × 11 pixel to compensate for small artefacts. The size of the filter was chosen to match the expected precision of annotations at boundaries (not higher than 100µm), translating to 5 voxels at the target resolution To assess the improvement in 3D consistency and anatomical plausibility gained by the proposed workflow, a reference reconstruction of area hOc1 was computed, which performs a direct 3D interpolation between reference annotations obtained by GLI mapping. This reference reconstruction does not use the local segmentation models, and relies only on reference annotations and 3D reconstruction. It was computed by transforming the annotations of the training sections (blue squares in Fig. 3) into the 3D reconstructed space, and filling the gaps by Laplacian field interpolation [44].
The anatomical consistency of 3D reconstructed maps was further evaluated by computing their volume and surface area, which were then compared to reference values from [10] . The volume of each area was computed by counting the total number of labeled voxels and multiplying the result by the physical size of each voxel.
The surface area was computed by first extracting a closed surface mesh of each area using the marching cubes algorithm [45]. The subset of mesh vertices lying on the pial surface was then determined by including all triangles where the cortical depth [46] was smaller than 0.25. To obtain the cortical depth of each mesh vertex, the procedure described in [47] was applied to the cortical ribbon defined by the gray and white matter segmentation provided with the BigBrain model [36]. The result was a volumetric dataset with voxels in the white matter labelled 1, voxels outside the brain labelled 0, and voxels inside the isocortex labelled with values between 0 and 1, representing their cortical depth according to the equivolumetric model [46]. Cortical depths of mesh vertices were then looked up in this volume. Finally, the surface area of the pial surface for each cytoarchitectonic area was computed by summing up the area of all triangles associated to the pial surface.
Both volume and surface area measurements were corrected for tissue shrinkage [10]. The volume-based shrinkage factor for B20 has been determined in [48] based on the fresh weight and the volume after histological processing as f V = 1.931. From this, an area-based (2D) shrinkage

Results
The performance of the models HR, HR (all), LR and MS differed between each other. Lowest mean and median performance were obtained by HR (all), followed by HR. Both LR and MS resulted in considerably higher mean and median F1 scores than the high resolution architectures, with lower standard deviations. Highest mean and median performance was obtained by MS (Table   1, Fig. 6).
Performance also differed significantly between brain areas. All architectures show comparably good performance for hOc1. For most areas however, LR and MS achieved considerably higher performance than HR and HR (all). For areas hOc3v and hOc5, where additional models were trained with reduced distance between training sections (indicated by * in Fig. 6), performance of LR and MS increased considerably when decreasing the distance between training sections, while only minor improvements were observed for HR and HR (all).
Representative image patches segmented by the MS architecture for each investigated area extracted from test sections of B20 are shown in Fig. 7. True positive, false positive and false negative predictions are indicated in green, red and blue, respectively. A large share of incorrectly classified pixels belonged to cortical regions with highly oblique cutting angles (Fig. 8 B, C). While large rifts tended to be excluded from the prediction (Fig. 8, A), smaller rifts or tissue foldings were correctly segmented as surrounding area (Fig. 7, A, D      Scores obtained for areas hOc1 and hOc2 were overall consistent across different brain samples (Fig. 9). In all three cases, scores obtained for hOc2 were lower compared to hOc1. Lowest median F1 score for hOc2 was obtained for B20 , along with an increased variance. Example patches showing the border between hOc1 and hOc2 on test sections extracted from approximately identical brain regions in the three datasets are shown in Fig. 10.
Locations, orientations and shapes of reconstructed 3D maps (computed using steps described in Sec. 2.8) were anatomically plausible and consistent (Fig. 11). The 3D map of hOc5 showed partially missing extremal ends along the posterior anterior axis. Volume and surface estimates from the 3D maps reported in Table 2 corresponded well with the numbers reported in [10]. Surface areas of hOc1, hOc2 and hOc5 were largely confirmed with the reference values, as well as the volumes derived from automatic segmentations of areas hOc1 and hOc2. The reconstructed volume of area hOc5 stood out by being considerably smaller than the reference volume.  Figure 11: 3D maps of visual cytoarchitectonic areas hOc1 (yellow), hOc2 (blue), hOc3v (red) and hOc5 (green), obtained by transforming the independent 2D segmentations generated by the proposed method into the 3D reconstructed space of the B20 dataset. A+B: Spatial embedding of reconstructed areas into the 3D reconstructed BigBrain volume. C+D: Detailed view of reconstructed cytoarchitectonic areas. E+F: Comparison of hOc1 reconstructed based on our proposed method (E) and based on an interpolation between annotations in the reconstructed space, using Laplacian fields as proposed in [44] (F). Arrows in F) mark example locations demonstrating shortcomings of the interpolation based reconstruction. Axes x, y and z correspond to left-to-right, posterior-to-anterior and ventral-to-dorsal directions, respectively. Axis labels are specified in mm and correspond to positions in the 3D reconstructed BigBrain space. Table 2: Estimated volumes (in mm 3 ) and surface areas (in mm 2 ) of brain areas derived from the full 3D maps in the 3D reconstructed space of the B20 dataset. Reference mean µ and standard deviation σ were computed based on male subjects from [10]. Shrinkage corrected of volumes and surface areas was performed using correction factors fV = 1.931 and fA = 1.551 respectively [48]. Comparison of corresponding 3D reconstructions of area hOc1 (Fig. 11 E vs. Fig. 11 F) showed that the proposed approach provided anatomically more consistent results than direct spatial interpolation of GLI-based annotations, while both build on the same annotation effort.
3D interpolation produced abrupt transitions in anterior-posterior direction (Fig. 11, F, 1) and only captured structures already contained in the reference annotations, leading to inconsistencies near fine-grained morphological structures (e.g. Fig. 11 F, 2 and 3). The proposed method often produced reasonable segmentations for sections outside the training interval (Fig. 11, E, 1), which interpolation cannot provide by definition.

Discussion
In this work, we proposed a novel Deep Learning based workflow to create segmentations of cytoarchitectonic areas in large series of histological human brain sections using only a limited set of manually created annotations. We evaluated this approach across different cytoarchitectonic areas, brain samples and staining protocols. As a concrete use case, we then applied it to create high-resolution 3D maps of areas hOc1, hOc2, hOc3v and hOc5 in the BigBrain [7].

Quality of derived 3D maps in the BigBrain
The proposed method produced 3D maps with a high degree of anatomical consistency and identified cytoarchitectonic areas precisely in the histological brain sections. Partially missing extremal ends remain a challenge, as seen in anterior-posterior direction of hOc5. Such parts are often difficult to identify even using manual methods. Therefore, training data for such extremal ends is difficult to obtain. The segmentation of extremal ends could potentially be addressed by providing additional GLI-based mappings (at the cost of additional annotation effort), or by an explicit shape-based inference step on top of the pixel segmentation. The 3D map of hOc1 created with the proposed method is superior to the map obtained by direct spatial interpolation between GLI-based annotations. Methods based on 3D interpolation inherit any error in the alignment of xix consecutive sections, making them inappropriate for stacks with only linear or no 3D reconstruction.
The proposed method does not assume any prior 3D reconstruction -in fact its outputs might be used to guide image registration with landmarks.

Practical usefulness of the implemented workflow
The presented method showed good robustness against intersubject variability and different histological processing protocols. Thus it largely overcomes the need for brain or area specific parameter adjustments, which makes it well suited to be used as a self-contained tool for neuroscientists. Consequently, it was possible to implement it into a web application that provides a practical mapping workflow for end users from different disciplines. The web application is currently used by sections to generate precise segmentations of the complete stack, corresponding to approximately 9 working hours. Altogether, including quality checks and computations, the presented workflow allows precise mapping of a large brain area in the order of 1-2 weeks -a task that would require almost a year of work with previously established methods.

Ability to distinguish higher associative areas
In contrast to primary areas such as the primary visual cortex hOc1, so called higher associative areas have a less distinct cytoarchitecture, and less prominently differ from their neighbouring areas. Such observation lead Bailey and von Bonin to the conclusion that it is almost impossible to reliably distinguish such areas from each other, and to define borders between them [49]. This view is not supported any more due to the possibility to identify cytoarchitectonic borders in a reliable and reproducible way (for an overview see [50]). However, the fact that intersubject differences between identical areas of different brains may exceed cytoarchitectonic differences between two neighboring areas in one and the same brain creates challenges for modern brain mapping [23].
[51] also addressed automated mapping of histology. They segmented brain structures in a serial stack of human brain sections from the Allen Human Brain Atlas [11] (dataset AAHB used in our experiments). They used annotations from [11] on a small set of sections at regular intervals, in order to train a probabilistic model that combines multi-atlas segmentation with a CNN to segment the remaining sections. Compared to the present work however, their approach is restricted to brain structures that can be recognized at a resolution of 250µm. The authors confirm in their paper that xx more subtle classes, in particular subdivisions of the isocortex, introduce excessive noise with their approach. The method presented here segmented both hOc1 and hOc2 in the same dataset with high accuracy by including more fine-grained texture features into the models, thus going clearly beyond this restriction.

Benefits of using local segmentation models
Previous work on automatic cytoarchitectonic brain mapping based on machine learning emphasized that it is essential to find efficient strategies for exploiting available training data and prior information, e.g. by incorporating probabilistic priors from brain atlases [8] or self-supervised learning [9]. The key idea of this work is to use multiple local segmentation models, each of which focuses on a spatially restricted subset of sections in a specific brain area. It does not aim to learn a general classification model that captures multiple areas across many sections and brains. This is an explicit design decision in order to maximize practical benefit. The significantly improved performance of HR compared to HR (all) showed the benefit of using local segmentation models.
The local models can exploit specific cytoarchitectonic features in a particular interval of a specific brain area, while models that aim to generalize across areas and brains need to capture more general features. Due to their strict locality, local models can further rely on morphological features like folding patterns which are not generally representative for cytoarchitecture, and vary largely between individuals [2]. Consequently, the incorporation of E LR in architectures LR and MS brought a significant performance gain compared to HR and HR (all), which suggests that macroscopic tissue features are important for the local models. Our intuition that a combination of microscopic and macroscopic tissue features should be optimal is confirmed by a Wilcoxon signed-rank test [52] (p = 0.0011), that showed better performance of the multiscale architecture MS compared to LR.
A major advantage of the local segmentation models is the ability to flexibly adjust the distance between training sections to account for regions with particularly simple or complex properties. This has been demonstrated for the challenging areas hOc5 and hOc3v, where a reduction of the distance between training sections from 120 (2.4mm) to 60 (1.2mm) improved precision to a satisfactory level while keeping the annotation effort tractable. In a similar fashion, larger areas or areas with distinct cytoarchitectonic features (e.g. hOc1) can be segmented with a coarser set of training sections, in this case reducing annotation effort.
On the downside of such local models, hyperparameter assessment (e.g. for learning rate or model architecture) is not straightforward when training multiple models on different training sets and evaluating them on individual test sets. Model performance needs to be evaluated across several areas, sections and brains, which can be computationally expensive and lead to a slow development process. xxi

Mapping at highly oblique cutting angles
Many of the remaining classification errors coincide with highly oblique cutting angles of the tissue. As also reported in [3,8,9], identification of cortical areas is almost impossible at such angles, because the laminar composition of the cortex is then almost invisible in the 2D section. In such cases, experts would consult adjacent sections to identify areas, which the proposed method cannot do. An extension of the method considering multiple adjacent sections for classification might be able to overcome this issue.

Conclusion
A novel method based on Convolutional Neural Networks (CNNs) was introduced for automated mapping of cytoarchitectonic areas in large series of histological human brain sections. Segmentation models were trained for segmentation of different cytoarchitectonic areas in histological stacks obtained from three different brain samples. A key idea is to train separate local segmentation models based on annotations of one specific target area in only two training sections, to focus the learning process on microscopic and macroscopic tissue features close to the training sections. After training, local segmentation models were able to accurately segment sections in between their respective training sections. By concatenating results from multiple local segmentation models, segmentations for complete brain areas can be obtained. The proposed method opens up new possibilities to map complete stacks of histological human brain sections in a highly automated fashion, and thus provides an important basis for building high resolution human brain maps for datasets like BigBrain.
To the best of our knowledge, the maps of areas hOc1, hOc2, hOc3v and hOc5 computed for the BigBrain model using this method are the first high-resolution 3D maps of human cytoarchitectonic areas created from full stacks of histological sections at cellular resolution. These maps enable precise studies of area-specific morphological and columnar features at microscopic resolution, and in combination with existing cortical layer maps [53] an investigation into layer-specific aspects of each region. Dense maps further enable straightforward mapping from the volume to the whole brain mesh surface, which in turn facilitates comparison with other modalities, especially in-vivo imaging.
They represent an important contribution for using BigBrain as a microscopic resolution reference space, since they provide direct links to probabilistic cytoarchitectonic reference parcellations at the macroscopic scale [6] that are widely used in neuroimaging studies. As such, our work makes an important contribution to linking neuroscientific findings across spatial scales.
CRediT authorship contribution statement

Ethics Statement
The study carried out requires no separate ethical approvals. Postmortem brains were obtained in accordance to legal and ethical regulations and guidelines. Brain tissue for datasets B01 and B20 was obtained through the body donor program of the department of anatomy of the Heinrich Heine University Düsseldorf and with approval of the ethics committee of the medical faculty of the Heinrich Heine University Düsseldorf. Brain tissue for dataset AAHB was obtained from the University of Maryland Brain and Tissue Bank and with approval by the Human Investigation Committees and Institutional Ethics Committees of the University of Maryland.

Declaration of competing interest
The authors declare no competing interests.