Made to measure: An introduction to quantifying microscopy data in the life sciences

Images are at the core of most modern biological experiments and are used as a major source of quantitative information. Numerous algorithms are available to process images and make them more amenable to measurement. Yet the nature of the quantitative output that is useful for a given biological experiment is uniquely dependent upon the question being investigated. Here, we discuss the three main types of information that can be extracted from microscopy data: intensity, morphology, and object counts or categorical labels. For each, we describe where they come from, how they can be measured, and what may affect the relevance of these measurements in downstream data analysis. Acknowledging that what makes a measurement ‘good’ is ultimately down to the biological question being investigated, this review aims to provide readers with a toolkit to challenge how they quantify their own data and to be critical of conclusions drawn from quantitative bioimage analysis experiments.


INTRODUCTION
6][7][8] These methods most often adapt computer vision algorithms to the specificities of biological image data, such as poor contrast and the presence of complex objects, with the aim of speeding up and streamlining the quantification process. 9 Although automated analysis remains a very challenging endeavour even with the best of images, it opens the possibility to analyse data at scale, in a reproducible and objective manner. 10
While greatly enabling biological research, the democratisation of quantitative image analysis tools also poses some challenges. One of the biggest challenges is identifying which of the dozens of quantifications that can be generated by these tools are most informative and relevant to the biological question being studied. To navigate this, a helpful strategy is to scrutinise all steps from experimental design to image acquisition and ultimately data processing and analysis. This allows each experimental stage to be tailored to best inform the hypothesis being investigated, but can also highlight aspects that may adversely influence the quantitative output. The first step consists of defining the goal of the imaging experiment (what quantitative property is at the centre of the question being investigated) prior to acquiring data and using this information to guide experimental design. 11 Once data start being generated, the focus shifts to the many experimental factors and acquisition parameters that may impact the quality of imaging. These can include sample and labelling properties, such as photobleaching and crosstalk, as well as hardware specifics.][14] During an experiment, appropriately recording imaging parameters as metadata is crucial to ensure that downstream analysis is carried out in a meaningful way. 15
It is crucial that images within an experiment are acquired in similar ways and have broadly similar properties such that they all remain within acceptable tolerances of the analytical methods. This may not always be possible though, as different biological perturbations may inherently affect image quality (for example, in fluorescence microscopy, via a change in the expression level of a fluorescently tagged protein). Similarly, the details of any image processing applied postacquisition, such as methods to remove noise or improve contrast/resolution, should be documented and recorded. After optimising experimental design, acquisition parameters and postprocessing, and recording metadata, what remains is to analyse the acquired image data. At this point, the last step is to outline the best way to pull out the desired measurements. This raises questions about which aspects of image data can be measured, which methods are available to do so, and what their limitations are.
Several excellent review papers assess the performance of image analysis methods in a benchmarking setting, meaning that methods are evaluated on the one general task they were designed to solve. General tasks, such as segmentation 16,17 and denoising, 18 are however generally not the end goal of an experiment. Indeed, the quantitative output needed to explore a biological question is rarely an improved or partitioned image. Instead, segmentation and denoising are examples of operations that enable the final quantification task, which is usually to measure a specific phenotype. To complement the existing literature on image analysis algorithms, this review focuses on the quantification problem. We discuss fundamental concepts that are relevant to image quantification, review common categories of quantitative readouts that can be extracted from image data, and introduce the appropriate metrics to do so. We identify three main categories of quantitative information: image intensity, morphology, and counts or labels. We describe the contrast generation, image formation and image processing methods that are relevant to all these categories, and review how each different quantitative readout can be extracted from the image data. We also discuss aspects impacting the quantification of each type of measurement. We close the paper with a discussion on further processing and analysis of image-based measurements and on quality control and assurance. Considering the breadth of the topic, we choose to limit the scope of this paper to individual 2D images and 3D volumes, acquired with either fluorescence microscopy or electron microscopy (EM). We therefore do not cover measurements that are specific to time-series, multichannel imaging, or to data obtained with specialised imaging modalities such as superresolution microscopy or single particle electron microscopy.
In order to formulate what one wants to measure, one must first understand what can be measured. By providing an overview of the main categories of quantitative measurements in image analysis, this review aims to give life science researchers a framework to appreciate and scrutinise their own image data. We also aim to give insights on aspects of an experiment that impact the relevance of measurements in downstream analysis, and therefore enable readers to be critical about whether the conclusions of studies involving image quantification are meaningful or not.

Fundamental concepts
Microscopy images provide a window to the biological world on length scales beyond what the human eye can see. Although they capture a faithful representation of reality, images are a 'snapshot' of the real world, not the real world itself. This seemingly philosophical distinction is crucial when images are used as a source of quantitative information. If anything is measured from an image, it becomes necessary to understand the steps involved to make reality visible, to capture it as a digital image, and to make this digital image amenable to measurements, as this is required to gauge the impact of the imaging pipeline on retrieved measurements. This section introduces general concepts that are central to any quantitative endeavour involving biological images.

'Garbage in, garbage out'
While this article reviews different types of measurements that can be extracted from microscopy data, and how to interpret such measurements, the limiting factor will always be the quality of the original image being analysed.
What may be the most important principle to bear in mind is 'garbage in, garbage out': flawed input will inevitably result in flawed output. In other words, quantitative analysis readouts can only ever be as good as the data they were extracted from, and image analysis algorithms cannot compensate for fundamentally bad image data. The general principles discussed hereafter, and the quantitative readouts explored in the rest of the paper, are all to be read through the lens of this key concept.

Contrast generation
The sample preparation and contrast generation processes fundamentally differ across imaging modalities.
Contrast is most often induced by biochemical labels that are artificially introduced in samples specifically for imaging. Contrast can also be obtained in a label-free manner with dedicated optical components, as is the case for phase contrast and Differential Interference Contrast (DIC) imaging. 19 In fluorescence microscopy, molecules of interest are labelled with fluorescent species such as organic dyes and fluorescent proteins. Regardless of the microscope used and downstream analysis performed, the way labelling is performed must be taken into account to provide context to any quantification. For example, if nonendogenous fluorescent protein fusion constructs are being used, how closely do measured intensities reflect endogenous protein distributions? 20 If immunofluorescence techniques are being used, what factors are impacting the ratio between the number of epitopes of interest and the number of fluorophores (e.g. primary and secondary antibody concentrations, antigen masking, affinity, clonality of antibodies)? 21 The microscope modality also affects how much out-of-focus fluorescence contaminates the in-focus fluorescence measurements (Figure 1A-D).
With EM imaging, the mechanism of contrast generation varies significantly depending upon the type of experiment performed, as well as the type of microscope and detectors used (Figure 1E-H). For most types of EM experiment, contrast is introduced during sample preparation. Sample contrast is enhanced by the introduction of heavy metals (e.g. osmium, lead, uranium, gold, silver etc.), which bind to lipids, proteins, carbohydrates etc. via chemical reactions. The method, sequential order of addition, temperature, time of incubation, etc. can all impact the process by which the contrast is incorporated into the sample. 22

Image formation
The entire purpose of any microscope is to transmit the biological information contained within the sample to the detector. The pixel values in the acquired images will depend on several factors in the imaging process, and understanding these factors is important for contextualising quantitative intensity measurements and assessing their accuracy.
In fluorescence microscopy, the intensities in an image represent the number of photons emitted by excited fluorophores. The absolute number of photons emitted by fluorescent molecules in the sample is primarily determined by the intensity of the excitation illumination incident on the sample. Increasing illumination intensity, regardless of the source (typically a lamp, LED, or laser), usually results in an increase in the number of photons emitted by fluorescence. For relatively low excitation intensities, a linear increase in excitation intensity corresponds to a linear increase in the intensity of emitted fluorescence. However, higher excitation intensities can lead to nonlinear saturation of emitted fluorescence 23 (Figure 2A and B), and increase the rate at which permanent photobleaching of fluorophores occurs (Figure 2C). The dependence of emitted fluorescence intensity on illumination intensity means that any local variations in illumination intensity within a field-of-view will affect local fluorescence intensity. The 'flatness' of the illumination can be characterised and corrected for further quantitative measurements; this can be done using a homogeneously fluorescent test sample 12,24 (Figure 2D), or via computational methods 25,26 without a test sample.
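As a rough illustration of the test-sample approach to illumination flatness described above, the sketch below divides an image by a normalised reference image of a homogeneous sample. The formula is a commonly used flat-field correction and is an assumption here, not the procedure of the cited references; the function name `flat_field_correct`, the optional dark frame and the synthetic data are all hypothetical.

```python
import numpy as np


def flat_field_correct(raw, flat, dark=None):
    """Divide out illumination non-uniformity estimated from a flat reference.

    raw  : image of the sample
    flat : image of a homogeneously fluorescent test sample
    dark : optional image acquired without illumination (detector offset)
    """
    raw = raw.astype(np.float64)
    flat = flat.astype(np.float64)
    if dark is not None:
        raw = raw - dark
        flat = flat - dark
    # Normalise the reference to a mean of 1 so that the corrected image
    # stays on roughly the same intensity scale as the raw image.
    gain = flat / flat.mean()
    return raw / gain


# Synthetic example: a uniform sample seen through uneven illumination
# that falls from 1.0 at the centre to 0.5 at the corners.
yy, xx = np.mgrid[0:64, 0:64]
illumination = 1.0 - 0.5 * ((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 32 ** 2)
raw = 100.0 * illumination    # uniform sample, uneven illumination
flat = 500.0 * illumination   # homogeneous test slide, same illumination

corrected = flat_field_correct(raw, flat)
print("std before:", raw.std(), "std after:", corrected.std())
```

In this idealised case the corrected image is uniform again. With real data, noise in the reference image propagates into the result, so the reference is usually averaged over several acquisitions.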
In conventional transmission electron microscopy (TEM), images are formed by detecting electrons that pass through the thin (e.g. <100 nm) sample and reach the detection mechanism (e.g. electron-multiplying charge coupled device (EM-CCD), direct electron detector or film) (Figure 1E). Alternatively, in scanning electron microscopes (SEM), a focused beam of electrons is scanned across the sample and the resulting secondary electrons (SE) and/or backscattered electrons (BSE) and/or x-rays are collected to form an image. The most common SEM approach is the collection of SE that have interacted with the surface of a sample, where the resulting image is a view of the surface topology of the sample with a large depth of field (Figure 1F). However, BSE can also be separately collected and mapped onto the sample, providing information about the sample's elemental composition (Figure 1G). BSE imaging has recently been exploited in a collection of volume EM (vEM) techniques, where either arrayed sections or blocks of 'resin embedded, fixed, contrasted samples' are automatically imaged (Figure 1H), generating a large 3D volume of ultrastructural data at nm resolution, across scales of 10s-100s of microns. 27,28 Regardless of the detection technique, it is important to be aware that working distance, magnification, accelerating voltage, landing energy, exposure/dwell time, probe size and current are just some of the parameters that impact the resulting images in terms of resolution, depth of field, focus and contrast.
FIGURE 1 (partial caption) In the upper triangle, more heavy metal produces more BSEs and thus a stronger signal reaches the detector (appear light), while regions with less metal incorporation produce less signal (appear dark). As this view is an inversion of the more traditional TEM images, researchers often invert image data (lower triangle) to allow for more comparative interpretation. Green ellipses represent increasing interaction volumes of the electron beam with increasing voltage of the primary electron beam. All EM scale bars = 500 nm.
The final acquired images are formed by binning (see Glossary in Table 1) the detected photons (fluorescence microscopy) or electrons (EM) into pixels. The pixel size in a microscopy image plays a critical role in determining what quantitative information can be retrieved. The physical distance that each pixel represents (the pixel size) is primarily determined by properties of the detection path (for cameras, the physical size of the pixels on the chip, and for point detectors, the scanning parameters) and the total magnification of the system.

Digital image (pre)processing
Per the Nyquist-Shannon sampling theorem, retaining the resolution of a continuous signal (i.e. the spatially varying distribution of photons/electrons incident on the detector) in discrete digital space (i.e. the pixels in the acquired image) requires sampling at a frequency at least double that of the smallest resolvable feature. 29 When the Nyquist-Shannon theorem is applied to the two-dimensional nature of images, the theoretical pixel size for adequate sampling should in fact be ∼2.8 times smaller than the resolving power of the microscope. 30 This sampling should be observed if very fine structures within the sample are to be measured and quantified, as larger pixel sizes will lead to a loss of information due to undersampling. As a result, the accuracy of any measurement of the same biological structure varies depending on the magnification (which determines the pixel size) and numerical aperture (NA) of the microscope objective (which determines the optical resolution), as shown in Figure 3.
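The sampling criterion above can be checked with simple arithmetic. The sketch below combines the theoretical resolution used in the Figure 3 caption (emission wavelength/(2 × NA)) with the ∼2.8 sampling factor; the function names and the example set-up (600 nm emission, NA 1.4, 6.5 μm camera pixels) are hypothetical values chosen for illustration.

```python
def abbe_resolution_nm(emission_wavelength_nm, numerical_aperture):
    """Theoretical lateral resolution: emission wavelength / (2 x NA)."""
    return emission_wavelength_nm / (2.0 * numerical_aperture)


def max_pixel_size_nm(resolution_nm, sampling_factor=2.8):
    """Largest pixel size that still samples the resolution adequately in 2D."""
    return resolution_nm / sampling_factor


def image_pixel_size_nm(camera_pixel_um, total_magnification):
    """Pixel size in the sample plane for a camera-based system."""
    return camera_pixel_um * 1000.0 / total_magnification


# Hypothetical set-up: 600 nm emission, NA 1.4 objective, 6.5 um camera pixels.
resolution = abbe_resolution_nm(600.0, 1.4)
limit = max_pixel_size_nm(resolution)
for magnification in (60, 100):
    pixel = image_pixel_size_nm(6.5, magnification)
    status = "adequate" if pixel <= limit else "undersampled"
    print(f"{magnification}x: pixel {pixel:.1f} nm, limit {limit:.1f} nm, {status}")
```

Under these assumptions the same objective NA is adequately sampled at 100× total magnification but undersampled at 60×, which is why pixel size and NA have to be considered together.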
In addition to spatially binning detected photons or electrons into pixels, detectors also convert the measured intensity into an integer number. This value depends on the intensity of emitted fluorescence or scattered electrons, as well as detector settings such as gains and offsets (see Glossary in Table 1). However, it also depends on the bit depth of the acquired images. Bit depth determines the range of values that can be digitally stored within a pixel; most microscopy data is acquired at 8-, 12-, or 16-bit depth. A pixel can only store a number within the range 0 to $2^N - 1$, where $N$ is the bit depth. During image processing, there are occasions where the bit depth of an image is changed. This is typically when a mathematical operation is performed on the image that generates values that are beyond the range of the bit depth (for example, a negative number) or have a noninteger component. Changing the bit depth will influence downstream measurements, as we shall see later.
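To make the role of bit depth concrete, the sketch below lists the storable range for common bit depths and shows what happens when an operation produces values outside that range. It assumes NumPy arrays as the image container, which is one common choice rather than anything prescribed here; the array values are made up.

```python
import numpy as np


def max_value(bit_depth):
    """Largest value a pixel can store at a given bit depth: 2^N - 1."""
    return 2 ** bit_depth - 1


for n in (8, 12, 16):
    print(f"{n}-bit range: 0 to {max_value(n)}")

a = np.array([10, 200], dtype=np.uint8)
b = np.array([20, 100], dtype=np.uint8)

# Subtracting in 8-bit unsigned arithmetic cannot represent -10,
# so the first result silently wraps around to a large positive value.
wrapped = a - b

# Converting to a signed or floating-point type first preserves negative
# and noninteger results, at the cost of a changed bit depth.
signed = a.astype(np.int16) - b.astype(np.int16)
ratio = a.astype(np.float32) / b.astype(np.float32)

print(wrapped, signed, ratio)
```

Many image analysis packages clip or rescale instead of wrapping, so it is worth checking how a given tool handles out-of-range values before relying on the result.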
Just like contrast generation and image formation, digital preprocessing steps prior to quantification can dramatically affect measurements. One must therefore remain mindful of the impact of these operations on the readout of interest. As an example, whatever gets suppressed by a thresholding operation may be considered as 'background' but could in fact contain relevant image information in addition to noise and nonspecific signal.

Image segmentation
It is rare that every pixel in an image is considered relevant experimental output. Most often, measurements are to be extracted from specific regions of interest, which correspond either to relevant objects or to portions thereof. Segmentation is the process of partitioning an image into different regions, whether background and foreground (semantic, see Glossary in Table 1) or individual objects (instance, see Glossary in Table 1). Segmentation, whether instance or semantic, can be a challenging problem to solve because of the diversity of structures in microscopy data. Some features of interest can be relatively simple to segment (Figure 4A), while others can be highly challenging (Figure 4G-J). The fundamental nature of this challenge has however led to the development of numerous solutions which are available to reuse and adapt, relying either on classical image processing methods or on recent machine learning tools. 16 Segmentation algorithms generally output a 'mask' (see Glossary in Table 1), which consists of labels for each pixel in the original image (Figure 4). Such masks can either be binary in the case of semantic segmentation, meaning that pixels (or voxels in the case of 3D volumes) are either labelled 0 (background) or 1 (foreground) (Figure 4D and H), or composed of integer numbers for instance segmentation, whereby all pixels (or voxels) labelled with the same integer value belong to the same object instance (Figure 4E). As an alternative to masks, instance segmentation algorithms can also output object contours (also sometimes called outlines) or surfaces. In 2D images, each individual object is then identified by the list of 2D coordinates of the pixels composing its contour (Figure 4F and I). In 3D volumes, surfaces can either be represented as a list of 3D voxel coordinates, or as a more structured set of vertices and faces called a mesh (Figure 4J) (see Glossary in Table 1). Contour (outline) or surface representations and mask representations of individual objects can easily be converted into one another by filling the former and finding the boundaries of the latter using classical image processing methods such as connected components or boundary tracing (e.g. the classical marching cubes algorithm). The relevant quantity to be measured for each biological question (e.g. ensemble vs. individual readout, internal vs. membrane readout) informs the choice of algorithm. If individual objects are not needed, then a binary semantic mask may be sufficient. If only membranes/interfaces are of interest, then individual contours or surfaces may be sufficient.
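The relationship between semantic masks, instance masks and contours described above can be sketched on a toy image. The example assumes NumPy and SciPy (`scipy.ndimage`) as one possible toolset; the synthetic mask is made up, and the contour is returned as an unordered list of boundary pixels rather than a traced outline.

```python
import numpy as np
from scipy import ndimage as ndi

# Binary (semantic) mask: foreground vs. background, two separate objects.
semantic = np.zeros((8, 10), dtype=bool)
semantic[1:4, 1:4] = True   # object 1: 3 x 3 square
semantic[5:7, 6:9] = True   # object 2: 2 x 3 rectangle

# Instance mask: connected components give each object its own integer label.
instances, n_objects = ndi.label(semantic)

# Boundary pixels: foreground pixels that disappear after one erosion.
boundary = semantic & ~ndi.binary_erosion(semantic)

# Contour of object 1 as a list of (row, column) coordinates.
contour_1 = np.argwhere(boundary & (instances == 1))

# Back from the contour to a filled mask of the same object.
outline = np.zeros_like(semantic)
outline[tuple(contour_1.T)] = True
refilled = ndi.binary_fill_holes(outline)

print(n_objects, len(contour_1), np.array_equal(refilled, instances == 1))
```

Which representation to keep depends on the readout: the instance mask supports per-object measurements over the object interior, whereas the contour alone is enough when only the object border is of interest.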
Relying on computer-based algorithms to automate the process of segmentation has been a topic of central interest since the early days of microscopy image analysis. 31 In the past decade, advances in deep machine learning have revolutionised bioimage processing and analysis in general, and segmentation algorithms in particular. 32,33 Larger benchmark datasets of microscopy images and crowdsourced improvements on model architecture have pushed the limits of achievable accuracy and generalisation. 34 A large variety of powerful automated segmentation algorithms based on deep learning are now available in open-source software and usable by researchers with little to no computer science expertise. 16 The democratisation of the use of artificial intelligence in image analysis also comes with many challenges, for instance around reproducibility. Although these aspects will not be covered in this paper, other excellent reviews explore them in depth. 35,36
FIGURE 3 Impact of acquisition parameters on quantitative measurements. The same field-of-view of fixed BPAE cells stained with Mitotracker Red CMXRos (green) and DAPI (magenta) imaged with a widefield microscope with different air and oil (n = 1.515) objectives and additional optical magnifications ('Full field-of-view'). The number of cells in the FOV, theoretical resolution (∆d) (emission wavelength/(2 × NA)), and resolution as measured using Image Decorrelation Analysis 89 are listed. Scale bars: 20×, 100 μm; 40× and 60×, 50 μm; 100× and 150×, 20 μm. 'Nucleus' column shows a crop of the same nucleus from each magnification (larger white box in full FOV). The nucleus was segmented using Otsu thresholding after applying a 100 nm Gaussian blur to the crop, with the threshold border indicated in yellow. Area, circularity ('Circ.') and roundness ('Round.') values are indicated below. Nucleus scale bars = 5 μm. 'Mitochondria' columns show a crop of mitochondria staining at each magnification (smaller white box in full FOV). A line profile was drawn across the same region (between the arrowheads) and intensity profiles are plotted in the right-hand column (line averaging width of 5 pixels). Distances between prominent adjacent peaks were measured between the dashed lines and are indicated below the images. Mitochondria scale bars = 1 μm.

MEASURING IMAGE INTENSITY
When acquiring microscopy data, one of the first things a researcher checks, either by qualitatively inspecting the image or by examining the digital pixel values, is the intensity.

What is intensity?
Intensities in a fluorescence microscopy image correspond to photons emitted by excited fluorophores, as described in Section 1.4. Most fluorescence intensity quantifications made from images are of the fluorophores themselves, not directly of the biological molecules of interest. This should be taken into consideration when translating any results from fluorescence intensity quantification into biological conclusions. Depending on the imaging modality used in fluorescence microscopy, intensity information may also arise from fluorescent sources in the sample other than the labelled structures in the focal plane. In techniques capable of optical sectioning, such as confocal microscopy (Figure 1B), two-photon microscopy, and TIRF (see Glossary in Table 1, Figure 1D), out-of-focus fluorescence does not reach the detector, whereas images acquired using widefield microscopy will contain out-of-focus intensities (Figure 1A). All fluorescence microscopy images may also contain intensity contributions from autofluorescence (endogenous fluorescent species present within the sample in the absence of intentional labelling). Confocal z-stacks are frequently projected into a single 2D image for visualisation and analysis (Figure 1C); a 'sum slices' or average intensity projection will retain intensity information, whereas a maximum intensity projection will produce sharper images but with intensities that do not correspond to the total amount of fluorescence below each pixel and thus should not be used for intensity quantification.
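The difference between sum and maximum intensity projections can be seen on a tiny synthetic z-stack. The sketch below assumes a NumPy array ordered as (z, y, x); the pixel values are made up for illustration.

```python
import numpy as np

# Toy z-stack with 3 planes and 2 pixels, ordered (z, y, x).
stack = np.zeros((3, 1, 2))
stack[:, 0, 0] = [30, 0, 0]     # pixel A: all signal in one plane (total 30)
stack[:, 0, 1] = [20, 20, 20]   # pixel B: signal spread over 3 planes (total 60)

sum_projection = stack.sum(axis=0)    # retains the total signal per pixel
mean_projection = stack.mean(axis=0)  # sum projection divided by plane count
max_projection = stack.max(axis=0)    # keeps only the brightest plane

print("sum :", sum_projection[0])
print("mean:", mean_projection[0])
print("max :", max_projection[0])
```

Pixel B contains twice as much signal as pixel A, which the sum and average projections report correctly, whereas the maximum projection makes pixel A look brighter than pixel B.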
When fluorescence intensity measurements are important, potential inhomogeneity in the illumination across the field-of-view can also create unwanted variability and should therefore be circumvented (see Section 1.4).
For EM, image intensity corresponds to electrons that reach the detection mechanism having interacted with the sample. For example, in TEM the number of electrons that reach the detector is impacted by microscope parameters (beam kV and intensity) and the sample characteristics (sample thickness and regional electron density of sample throughout its thickness), as illustrated in Figure 1E. Electron dense regions block the electron path and appear dark, while electron lucent regions allow the electrons to reach the detector and appear light. As biological samples often have little inherent capacity for differentially affecting the electron path, heavy metals are usually introduced into the sample to provide differential contrast, as described briefly in Section 1.3. These sample preparation protocols vary widely and can give different views of the sample, as shown in the lower panel of Figure 1E. For BSE imaging in an SEM, the amount of heavy metal incorporated into each part of the sample, as well as the landing energy of the focused electron beam, also influences the intensity captured by the detector, as shown in Figure 1G and H. For SE imaging in an SEM, aside from accelerating voltage, the intensity of the signal detected is additionally impacted by the differential angles of the detector and primary electron beam, as well as how the primary and secondary electrons interact with the limiting shape of the sample (Figure 1F). Depending upon the SEM configuration, this can result in images that appear 3D with regions of highlights and shadows, or comparatively flat images with apparently poorer signal to noise. For all EM imaging modalities, the detector settings, including exposure time and whether averaging (line/frame) or automatic gain or scaling is performed, can impact acquired image intensity as well.

TABLE 1 Glossary

Binning
(Acquisition or analysis) In acquisition, the assimilation of signal into a finite-sized pixel in an image. In analysis, the process of combining the output of adjacent pixels to increase signal, thereby losing resolution.

Gain
(Acquisition) An amplification factor applied to the readout from the photon/electron detector in order to produce the image. It adjusts the sensitivity of the camera, but also amplifies the noise.

Offset
(Acquisition) The minimal intensity captured by the photon/electron detector.

Pipeline
(Analysis) A series of data processing steps that allows extraction of quantitative metrics from raw image data.

Deconvolution
(Analysis) The computational process of enhancing image contrast using knowledge of the way the microscope forms images.

Semantic
(Analysis) Semantic, in the context of segmentation, describes the association of each pixel of an image with a label, typically 'foreground' or 'background'.

Instance
(Analysis) Individual occurrence of an object type. For example, an image with 3 circles has 3 instances of a 'circle' object. In microscopy, 'instances' often correspond to specific biological structures.

Mask
(Analysis) Image in which all pixels/voxels that are part of the foreground are set to an integer value (e.g. 1 or 255), and all pixels/voxels that are part of the background are set to 0.

Mesh
(Analysis) A set of vertices and faces that define polygons (often triangles) and, when taken together, form a surface covering of a 3D object.

Quantifying intensity
Pixel (or voxel) values are a direct readout of intensity itself. Aggregated measurements summarising collections of pixel values, for instance over the area of an object of interest, are however often more useful than individual pixel intensities. Intensity is therefore usually measured as a distribution, either over the entire image or over a region of interest. That distribution can be analysed with the classical toolkit of statistics: represented as a histogram or characterised by a small number of summary statistics such as the mode, median, standard deviation and higher central moments when appropriate. Intensity-based measurements extracted from a fluorescence microscopy image, be that from the raw data or after processing, are usually comparative. Standalone measurements of pixel or object intensity in images are indeed often meaningless; they must be reported in the context of some baseline condition such as the background intensity, or the intensity of a comparable structure under a different biological condition. For such comparisons to be made accurately, it is critical that acquisition parameters such as illumination intensity, magnification, pixel dwell time (point detectors) or exposure time (cameras) and detector gains are recorded and ideally kept consistent between different images. Any image processing pipelines should be applied equivalently to each image, including the ones that do not look like they 'need' it. For some biological measurements, it makes sense to work with the absolute fluorescence values in images, such as monitoring the expression of a GFP-tagged protein during successive cell divisions. 37 However, relative or normalised fluorescence intensities are often used so that results from different images can be aggregated. Overall, any absolute measurements of intensity from fluorescence microscopy modalities such as widefield and confocal microscopy are critically dependent on labelling, acquisition settings and postprocessing. If comparisons of fluorescence intensity are to be made between different images, then these parameters should be as identical as possible in each case.
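The idea of summarising intensity over a region of interest and reporting it relative to a baseline can be sketched as follows. The helper names (`intensity_summary`, `background_subtracted_mean`, `make_image`), the choice of background subtraction followed by a ratio to a control image, and the synthetic images are all assumptions made for illustration; other normalisation schemes may be more appropriate for a given experiment.

```python
import numpy as np


def intensity_summary(image, mask):
    """Summary statistics of the pixel values inside a boolean mask."""
    values = image[mask].astype(np.float64)
    return {
        "n_pixels": int(values.size),
        "mean": float(values.mean()),
        "median": float(np.median(values)),
        "std": float(values.std()),
    }


def background_subtracted_mean(image, object_mask, background_mask):
    """Mean object intensity relative to the mean background intensity."""
    return float(image[object_mask].mean() - image[background_mask].mean())


def make_image(background, signal):
    """Synthetic 6 x 6 image: constant background with a 2 x 2 object."""
    image = np.full((6, 6), background, dtype=np.uint16)
    image[2:4, 2:4] = signal
    return image


object_mask = np.zeros((6, 6), dtype=bool)
object_mask[2:4, 2:4] = True
background_mask = ~object_mask

control = make_image(background=100, signal=300)
treated = make_image(background=120, signal=520)

control_signal = background_subtracted_mean(control, object_mask, background_mask)
treated_signal = background_subtracted_mean(treated, object_mask, background_mask)
relative_change = treated_signal / control_signal

print(intensity_summary(control, object_mask))
print(control_signal, treated_signal, relative_change)
```

The same masks and the same processing are applied to both images, in line with the requirement that pipelines be applied equivalently to every image being compared.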
For EM images, electron density or intensity is rarely absolutely quantified, as routinely controlling the factors involved in (a) generating contrast (sample preparation), (b) detecting electron density (microscope configuration, inherent sample characteristics and detector settings) and (c) calibrating these, is fraught with challenges. 9][40] Second, for thin section imaging (TEM or BSE in SEM) one must be aware that the morphology of a structure, and its presentation within the volume of the section, can impact its resulting intensity profile in an image. For example, a limiting membrane of the endoplasmic reticulum cut perpendicular to the electron beam (arrow, Figure 1E) can result in a very different intensity profile from the same membrane cut parallel to the electron beam (arrowhead, Figure 1E). Third, without tilt tomography, it is difficult to rule out that other electron dense structures, above or below the structure of interest, may be present in the section at that position in x and y, and impact any intensity measurements. Should the research question warrant electron density quantification, great care should be taken to minimise variations in sample preparation and to use the same microscope and detector settings, so as to ensure homogeneous and comparable illumination and the absence of under/over exposure. It is also beneficial to pick an intrinsic, unaffected structure within the samples that can be used as an internal control for electron density calibration. As a notable exception, signal intensity is often quantified in x-ray microanalysis, where the energy spectrum of electron induced x-rays can provide relative quantitative information about the elemental composition of the sample. 41

Factors impacting intensity information
A vast range of image processing operations can be applied to raw images following acquisition. If quantitative intensity information is to be extracted following image processing, it is important to understand how processing affects image intensity (Figures 1C and 5). In particular, the processed values must still be linearly related to the number of fluorescent molecules present in a given region of the image (Figure 5D). Iterative deconvolution (see Glossary in Table 1) methods have been shown experimentally to be largely linear with respect to intensity, although this can be microscope-dependent 42 (Figure 5, 'Deconvolution'). An example of a nonlinear image processing operation is the Super-Resolution Radial Fluctuations (SRRF) method 43 (Figure 5, 'SRRF'). This method can increase both the contrast and the resolution of an image dataset but should not be used for quantitative intensity measurements.
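One simple way to probe whether a processing step preserves the linear relationship discussed above is to scale the input and check that the output scales by the same factor. The sketch below does this for two stand-in operations, a box blur (linear) and a gamma adjustment (nonlinear); these stand-ins and the helper names are assumptions for illustration and do not model deconvolution or SRRF. Passing the test is a necessary, not a sufficient, condition for linearity.

```python
import numpy as np


def box_blur_1d(signal, width=3):
    """Moving average: a linear filter."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")


def gamma_adjust(signal, gamma=0.5):
    """Gamma adjustment: a nonlinear intensity transform."""
    return signal ** gamma


def scales_linearly(operation, signal, scale=2.0):
    """True if scaling the input scales the output by the same factor."""
    return bool(np.allclose(operation(scale * signal), scale * operation(signal)))


profile = np.array([0.0, 10.0, 40.0, 10.0, 0.0])  # made-up intensity profile

print("box blur:", scales_linearly(box_blur_1d, profile))
print("gamma   :", scales_linearly(gamma_adjust, profile))
```

In practice the same idea can be applied experimentally, for instance by processing images of samples with known relative fluorophore concentrations and checking that measured intensities keep the expected ratios.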
The effect of bit depth (see Section 1.5) on intensity information is somewhat analogous to the effect of pixel size on spatial information; higher bit depths provide higher 'sampling' of intensities, which can provide higher precision for quantitative measurements. Critically, measurements should not be made from any pixels having either the minimum (0) or maximum ($2^N - 1$) value, as this is likely to represent incomplete or 'clipped' information in the image; unless the image was acquired in a very low fluorescence intensity regime, pixels of value 0 may in fact represent a range of different 'real' fluorescence intensities that are below the range of the detector settings, and pixels of value $2^N - 1$ may represent a range of real fluorescence intensities that are saturating the detector. Intensity quantification is still valid when performed on images after increasing the bit depth, but no intensity quantifications should be made from images following conversion to a lower bit depth. This is because conversion to a lower bit depth requires a rescaling of pixel values so that they fit within the smaller range, which results in a loss of information from the image.
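Both points above, clipped pixels and the information lost when reducing bit depth, can be illustrated on a handful of made-up 12-bit values. The function name `clipped_fractions` is hypothetical, and rescaling by integer division is only one possible conversion scheme; software packages differ in how they rescale.

```python
import numpy as np


def clipped_fractions(image, bit_depth):
    """Fractions of pixels at the minimum (0) and maximum (2^N - 1) value."""
    max_val = 2 ** bit_depth - 1
    return float(np.mean(image == 0)), float(np.mean(image == max_val))


# Made-up 12-bit pixel values stored in a 16-bit container.
img12 = np.array([0, 100, 101, 115, 4095, 4095], dtype=np.uint16)

low_fraction, high_fraction = clipped_fractions(img12, bit_depth=12)

# Pixels that are neither at the floor nor saturated can be measured.
measurable = (img12 > 0) & (img12 < 2 ** 12 - 1)

# Convert 12-bit to 8-bit by rescaling: 4096 levels map onto 256 levels,
# so every block of 16 input values collapses onto one output value.
img8 = (img12 // 16).astype(np.uint8)

print(low_fraction, high_fraction, measurable.sum())
print(img12.tolist(), "->", img8.tolist())
```

After conversion, the distinct 12-bit values 100 and 101 can no longer be told apart, which is the loss of information that makes down-converted images unsuitable for intensity quantification.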
In addition to bit depth, another important concept when measuring image intensities is that of dynamic range. This can have two slightly different meanings, depending on context. When referring to detectors, the dynamic range refers to the minimum and maximum light intensities incident on the detector that can be measured simultaneously. When referring to images, however, it usually means the minimum and maximum pixel values within a single image. During image acquisition, one ideally wants intensities that span the full dynamic range of the detector, but this is impractical for many applications, especially with high bit-depth detectors. Generally, acquisition parameters should be adjusted to maximise image dynamic range as far as is practical without causing saturation or significant bleaching (in the case of fluorescence), to provide a wide range of values for precise extraction of quantitative information.
Deep learning-based methods are an emerging family of processing methods for fluorescence microscopy images. Such methods typically require training a neural network with pairs of high-quality and low-quality images of the same field-of-view; the network attempts to 'learn' what series of image processing operations should be applied to reliably convert low-signal images into images closely matching the high-signal equivalent. New low-quality images (without a high-quality equivalent) can then be provided to the trained neural network, which outputs a high-quality prediction. Example applications of these algorithms include increasing the signal-to-noise ratio of low-signal images 44 and increasing the resolution of images, 45 among others. 18 Because these methods impact image intensity in a nonlinear manner (Figure 5, 'CARE'), it is strongly recommended that intensity-dependent quantification is not performed on images processed with deep learning methods.

MEASURING MORPHOLOGY
Most microscopy data, regardless of the modality, hold information that pertains to morphology. Although the concept of 'morphology' is intuitively understood by everyone, it can be challenging to define precisely what it means.

What is morphology?
Loosely characterised as the visual appearance in terms of form or structure, morphology is critical in many biological processes because it reflects and influences the physiological state of living systems. 46,47 Though it may be tempting to measure everything that can be measured and ask questions later, it is recommended to identify what kind of morphological information will be relevant to the question being investigated and how this information may be impacted by the imaging or analysis process. The shape of the objects of interest, for example those labelled with a membrane marker, is the most common readout of morphology in bioimage analysis. 48 In addition, texture information is also available in imaging modalities that capture intracellular components such as organelles and cytoskeletal elements. 49,50 Although the type of image feature informing on morphology may vary (whether edges, textures or a mix of both), most morphology measurements are extracted for individual biologically relevant objects. They therefore share the need for segmentation upstream of the actual quantification step.

Quantifying morphology
While segmentation is a necessary step towards the quantification of morphology, it is a means, not an end: the output of segmentation is used as a basis to quantify morphology. This is worth keeping in mind when assessing the level of accuracy needed from segmentation, as subtle differences in morphology can be missed if segmentation accuracy is too low (Figure 3, mitochondria). Conversely, large morphological properties such as object size may not require segmentation accuracy to the single pixel or voxel (Figure 3, nucleus). The scale of the morphological readout of interest thus also informs on how precise the segmentation must be for it to be quantitatively captured.
A subset of commonly used handcrafted measurements of morphology in 2D is listed in Table 2. Some of these metrics are adapted from general concepts of geometry and others have been carefully engineered relying on image processing tools. All have been designed to quantify the geometrical (for shape) or visual (for texture) nature of an object in an intuitive and interpretable manner, and several can be directly extended to 3D. Different measurements capture different aspects of morphology, sometimes with very subtle differences (e.g. roundness and circularity).
An alternative route is to let machines learn morphology descriptors directly from the data. This is relevant in many cases, from situations where morphology is too ambiguous to make it possible to craft a relevant set of features, to cases where the biological phenomenon of interest is too poorly understood to allow predicting which aspect of morphology is discriminative. When used well, machine learning strategies, whether supervised or unsupervised, can produce descriptors of morphology that are less biased and that capture information better than manually designed ones, at the expense of interpretability and at a higher computational cost. For instance, morphology descriptors of individual cell types can be learned by a deep neural network in an unbiased manner relying solely on 3D shape and texture from EM volumes, without specifying a biological question. 55 Despite the effectiveness of deep neural networks, it is usually not possible to reverse-engineer the exact nature of the morphological features they rely on, making learned representations potentially difficult to interpret. Efforts to investigate and compare published approaches on benchmark or reference datasets are invaluable to navigate these available options. 56
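The subtle difference between two area-ratio descriptors can be made concrete with a small sketch. It assumes that circularity is the ratio of the object area to that of a circle with the same perimeter, giving 4πA/P², and that roundness is the ratio of the object area to that of a circle with the same width, giving 4A/(πw²), where the 'width' is taken here as the longest extent of the object. Exact definitions vary between software packages, so these formulas and the example shapes are illustrative assumptions.

```python
import math

def circularity(area, perimeter):
    """Object area divided by the area of a circle with the same perimeter.

    A circle of perimeter P has area P**2 / (4 * pi), hence 4 * pi * A / P**2.
    """
    return 4.0 * math.pi * area / perimeter ** 2

def roundness(area, width):
    """Object area divided by the area of a circle with the same width.

    A circle of diameter w has area pi * w**2 / 4, hence 4 * A / (pi * w**2).
    """
    return 4.0 * area / (math.pi * width ** 2)

# A perfect disc of radius 1: both descriptors are (numerically) 1.
disc = {"area": math.pi, "perimeter": 2 * math.pi, "width": 2.0}

# Hypothetical disc with a ragged outline: same area and width,
# but a perimeter 1.5 times longer.
ragged = {"area": math.pi, "perimeter": 3 * math.pi, "width": 2.0}

# A 4 x 1 rectangle, with the width taken as its longest side.
rectangle = {"area": 4.0, "perimeter": 10.0, "width": 4.0}

for shape in (disc, ragged, rectangle):
    shape["circularity"] = circularity(shape["area"], shape["perimeter"])
    shape["roundness"] = roundness(shape["area"], shape["width"])
```

Under these definitions, a ragged outline lowers circularity while leaving roundness unchanged, whereas elongation lowers both.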

Factors impacting morphological quantification
When measuring the morphology of objects within images, it is critical to consider both the pixel size and the resolution of the image, as these set the lower limit on the measurement differences that can be captured by the data (Figure 3). This is especially important if any measurement approaches the resolution limit of the acquired image. For example, any measured sizes should remain above the theoretical resolution limit. If many objects in the image are measured to have sizes comparable to the resolution limit of the system, then this may be a population of objects of varying sizes smaller than what can be resolved. One should also remember that many morphological measurements are computed on 2D projections of structures that are actually three-dimensional, as for instance in widefield fluorescence (Figure 1A) and projected confocal stacks (Figure 1C). These images do not consider the third dimension and may therefore be misleading when quantifying morphology.
It is important to keep in mind that the morphology we observe in a microscopy image is a product of both the sample preparation and the imaging process. Any measurements extracted to describe it are therefore strongly influenced by factors that may not be immediately relevant or obvious to the biological phenomenon of interest. When considering sample preparation, for instance, some proteins commonly used as organelle markers can show apparently normal organelle morphology while other proteins (also used as organelle markers) reveal aberrant morphology. For example, the ER protein Calnexin was shown to reveal ER with apparently normal morphology in cells depleted of GBF1, while Calreticulin, another ER protein, revealed aberrant ER morphology in the same cells. This was later validated by correlative light and electron microscopy (CLEM). 57 Similarly, if the biological perturbation impacts intracellular trafficking pathways, then the normal intracellular localisation of standard organelle markers may also be affected. Misinterpretation of imaging data can be avoided by using additional markers, performing trafficking time courses, and using additional experimental approaches such as biochemical assays and CLEM. When considering imaging, refractive index mismatch between the microscope lens and the sample medium can also impact morphological quantifications and severely affect downstream analysis, especially for 3D bioimage data. Beyond acquisition, image processing algorithms also impact morphology measurements, as nonlinear operations can, for example, significantly alter the results of automatic thresholding (Figure 5B). Keeping in mind how morphology measurements are computed is therefore crucial to determine whether the considered readouts can be meaningfully compared across different datasets. Whenever absolute image intensity is involved, for instance when relying on texture descriptors that are not only based on relative variations of intensity, one must question whether intensity can reasonably be compared across different images, as discussed in Section 2. Similarly, the observed shape depends on the image resolution in x, y and z and can be strongly affected by how biological samples have been prepared for imaging. Most fluorescence and electron microscopy setups acquiring 3D image volumes produce anisotropic data, meaning that voxels have a different physical size along the x, y and z axes. In such cases, any measurement assuming isotropic voxel size will result in misleading or plainly wrong quantitative readouts. In EM, each TEM is technically specified to provide resolution in the angstrom range, but the ultimate resolution of the acquired images (what can actually be visually resolved) is impacted by the sample, the heavy metals introduced, the thickness and density of the sample, and image acquisition parameters. Introduction of significant amounts of heavy metals may coat ultrastructural features thickly and make it difficult to resolve finer ultrastructural details, thereby limiting the possibility of quantifying morphology. Besides microscope resolution, sample preparation is a notoriously strong factor influencing morphology. Having a good understanding of how different types of preparation distort the morphology of samples therefore provides crucial information on whether measurements can be considered biologically relevant or not.
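The effect of anisotropic voxels on even the simplest 3D measurements can be sketched as follows. The function names, the toy mask and the voxel spacings are hypothetical; in practice the voxel size should be read from the acquisition metadata.

```python
import numpy as np

def physical_volume(mask, voxel_size_zyx):
    """Volume of a 3D binary mask in physical units (e.g. cubic micrometres)."""
    voxel_volume = float(np.prod(voxel_size_zyx))
    return int(np.count_nonzero(mask)) * voxel_volume

def physical_extent(mask, voxel_size_zyx):
    """Bounding-box extent of the mask along z, y and x in physical units."""
    coords = np.argwhere(mask)
    extent_voxels = coords.max(axis=0) - coords.min(axis=0) + 1
    return extent_voxels * np.asarray(voxel_size_zyx, dtype=float)

# Toy object spanning 5 x 10 x 10 voxels (z, y, x).
mask = np.zeros((10, 20, 20), dtype=bool)
mask[2:7, 5:15, 5:15] = True

# Hypothetical anisotropic spacing: z steps five times coarser than x and y.
voxel_size = (0.5, 0.1, 0.1)
volume = physical_volume(mask, voxel_size)
extent = physical_extent(mask, voxel_size)

# Wrongly assuming isotropic voxels underestimates the volume five-fold here.
naive_volume = physical_volume(mask, (0.1, 0.1, 0.1))
```

In voxel units the toy object looks flattened along z, whereas in physical units it is elongated along z; any shape descriptor computed without the voxel size would therefore be misleading.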
Electron microscopy has a long-standing history of investigating the effect of sample preparation, 58 with examples specifically focusing on morphology preservation. 59 The need for strategies that minimally alter the structure of the imaged sample has inspired several modern fixation techniques. 60,61 As always, optimisation is required to find a sensible balance of all aspects of the experimental design from sample preparation to quantification, with the ultimate aim to address the research question in mind.

COUNTING AND LABELLING
Intensity and morphology can be considered 'first-order measurements' as they focus on quantifying purely visual information. In contrast, 'second-order measurements' focus on aggregating and combining morphology and intensity metrics to quantify structures that are externally defined.

What are counts and labels?
Counts straightforwardly refer to the number of occurrences of a given structure or object in an image. Labels refer to identifiers from a limited and usually fixed set of possibilities (e.g. 'mitochondria' or 'nuclei') assigned to objects in a process referred to as classification. Since labels are used to group objects into categories or classes, they can be referred to as 'categorical labels' or 'class labels'. Labels are most often derived from human-defined categories (e.g. different experimental conditions, different subcellular structures, different cell types), but can also be inferred directly from the result of a clustering algorithm. In that case, the label assigned to an object reflects the identity of the cluster it belongs to and may take an arbitrary numerical value.

Quantifying and interpreting labels and object counts
Individual object count is immediately obtained whenever objects can be segmented. Labels can be retrieved from morphology and intensity measurements extracted from individual objects and combined in a feature vector through a classification or clustering process, depending on whether annotated examples of categories are available. Labels can also be automatically recovered directly from image crops, without an intermediate measurement extraction step, relying on machine learning. 62 If no other readouts are required, counting and labelling can therefore simply involve a bounding box detection process and do not necessarily require the definition of precise object boundaries (and therefore segmentation).
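The first statement, that a count follows directly from a segmentation, can be sketched with a small connected-component labelling routine. This is a deliberately simple illustration under stated assumptions (2D binary mask, 4-connectivity, hypothetical function name `label_objects`); established image analysis libraries provide faster and more general implementations.

```python
import numpy as np

def label_objects(mask):
    """Give each 4-connected foreground region of a 2D mask its own integer label."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    n_rows, n_cols = mask.shape
    count = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r, c] or labels[r, c]:
                continue  # background, or already part of a labelled object
            count += 1
            labels[r, c] = count
            stack = [(r, c)]
            while stack:  # flood fill from the seed pixel
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < n_rows and 0 <= nx < n_cols
                    if inside and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = count
                        stack.append((ny, nx))
    return labels, count

# Toy binary mask containing three separate objects.
mask = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
], dtype=bool)

labels, n_objects = label_objects(mask)

# Per-object area in pixels, skipping label 0 (background).
areas = np.bincount(labels.ravel())[1:]
```

The labelled image corresponds to an instance segmentation: the object count is the number of labels, and per-object measurements such as area follow from the same array.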
Because of its 'second-order' nature, counting exploits known information about the structure or object to be detected. This can be achieved through strong priors, as exploited by blob detectors 63 and the Hough transform, 64 or through a curated example of the object of interest, as is the case in template matching. 65,66 Blob detectors exploit digital filters with specific shapes, such as the Laplacian of Gaussian or the Determinant of Hessian, to detect agglomerates of pixels fitting within a circle of predefined radius (Figure 4B). The circular Hough transform, in contrast, is an algorithm designed to detect occurrences of perfect circles in an image. Both blob detectors and the Hough transform can successfully identify round structures such as nuclei as they appear in fluorescence microscopy (Figure 4C) and serve as a basis for the study of more complex cellular processes. 67 Template matching can be tuned to detect an object of choice (the 'template') and is the preferred method to identify molecular complexes in EM tomograms. 68,69 Both the Hough transform and template matching are examples of algorithms that can provide object counts without going through a segmentation step. When visual appearance varies so significantly that a single good object representative is hard to identify, deep learning methods can learn to detect occurrences of complex structures from large collections of visual examples. 70,71 Although initially designed for the detection of highly structured objects such as cars and human faces in natural images, the same algorithms have been shown to generalise well enough to fluorescence microscopy data to allow counting. 72
At the extreme, labelling may require neither segmentation nor object detection. Classification can be successfully carried out from tiles, obtained by splitting an image into a square grid. 73 Labels are then assigned to each tile, thus providing a readout of the categories present in the image without relying on individually defined objects. This approach is successfully exploited in digital pathology, where object segmentation is particularly challenging. 74,75
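The tile-based strategy can be sketched as follows. Only the splitting into a square grid reflects the text; the classifier is a hypothetical placeholder (a threshold on mean intensity) standing in for whatever trained model a real pipeline would use, and the tile size, threshold and class names are arbitrary.

```python
import numpy as np

def split_into_tiles(image, tile_size):
    """Split a 2D image into non-overlapping square tiles.

    Rows and columns that do not fill a complete tile are cropped.
    Returns an array of shape (n_rows, n_cols, tile_size, tile_size).
    """
    n_rows = image.shape[0] // tile_size
    n_cols = image.shape[1] // tile_size
    cropped = image[:n_rows * tile_size, :n_cols * tile_size]
    tiles = cropped.reshape(n_rows, tile_size, n_cols, tile_size)
    return tiles.swapaxes(1, 2)

def classify_tile(tile, threshold):
    """Placeholder classifier: a trained model would replace this rule."""
    return "signal" if tile.mean() > threshold else "background"

# Toy image with one bright square region.
image = np.zeros((8, 10))
image[0:4, 4:8] = 100.0

tiles = split_into_tiles(image, 4)
tile_labels = [
    [classify_tile(tiles[i, j], 50.0) for j in range(tiles.shape[1])]
    for i in range(tiles.shape[0])
]
```

Each tile receives a label without any object being outlined, so the readout is a coarse map of categories across the image rather than a per-object measurement.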

Factors impacting labels and count information
The number of elements present in an image or their category are seemingly absolute measurements, and it is thus reasonable to expect these readouts to be comparable across microscopy data. It is however important to keep in mind that, due to their 'second-order' nature, count and label measurements ultimately rely on morphology and intensity features. As such, when comparing across images, one should carefully consider how the nature of the data may affect morphology and intensity measurements, as discussed in the other sections of this paper, and, in turn, influence the results of a counting or labelling quantification pipeline.

BEYOND INDIVIDUAL MEASUREMENTS
The process of quantifying image microscopy data in biology goes beyond understanding individual types of measurements and what they reflect. Once extracted, these measurements are meant to be used to carry out statistical analyses and support conclusions made on the experiments that images captured.

Assembling and processing feature vectors
As discussed in previous sections, many different measurements related to any individual object's intensity, morphology, or identity can be extracted, making it challenging to know a priori which ones will be most informative. The best approach is therefore to assemble a large number of such measurements into a feature vector, which is often simply a list of numbers used to quantitatively represent an object. Being able to measure specific readouts from any input image is therefore of utmost interest, as exemplified by the wealth of available software and libraries providing accessible ways to extract most of the measurements reviewed in previous sections. 76
A feature vector is said to be handcrafted when it is constructed with measurements that are predefined by the experimenter. The intention behind assembling a large group of measurements is to empirically capture as many aspects of an object as possible, in a quantitative manner, to describe it as precisely and unambiguously as possible. With this goal in mind, a valid strategy is to create feature vectors by indiscriminately measuring everything one can think of measuring. Although it may make sense to gather more measurements than needed to ensure that no important information is omitted, feature vectors built in that way often end up being highly redundant. This is because different handcrafted measurements may be directly related to one another (for instance roundness and compactness, see Table 2) or may be derived from the same geometrical properties (for instance area, perimeter and circularity). The more numbers a feature vector is composed of, the higher the dimensionality of the space it lives in, and therefore the harder it is to visualise and make sense of. Strong correlation between many elements or large amounts of duplicated elements in a feature vector only make matters worse, as the difficulty of recovering discriminative information increases in higher-dimensional spaces. Feature selection methods such as the Fisher score 77 can be used to limit redundancy, and dimensionality reduction techniques such as Principal Component Analysis can help prune the collection of measurements and retain only a small number of the most informative elements. 78
Feature vectors can alternatively be inferred directly from image data relying on machine learning. 79 In this latter case, the numbers composing the feature vector are readouts generated by an automated algorithm through complex combinations of the original image information and cannot be readily expressed with a simple mathematical expression like handcrafted measurements (see Table 2). Feature vectors built in this way have the potential to be more discriminative than handcrafted ones and to capture properties of objects that human-defined measurements would miss, but lose interpretability as a trade-off. Although automatically generated, learned feature vectors may benefit from feature selection and dimensionality reduction just as much as handcrafted ones.
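The redundancy of handcrafted feature vectors and the effect of Principal Component Analysis can be illustrated with a short sketch. The synthetic objects are hypothetical discs whose radius, area and perimeter are by construction derived from the same quantity, plus an unrelated intensity feature; PCA is implemented here through a singular value decomposition of the standardised feature matrix, which is one common way of computing it.

```python
import numpy as np

def standardise(features):
    """Give every feature (column) zero mean and unit variance."""
    return (features - features.mean(axis=0)) / features.std(axis=0)

def pca(features, n_components):
    """PCA via SVD of the standardised feature matrix (rows are objects)."""
    z = standardise(features)
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    scores = z @ vt[:n_components].T   # objects projected on the components
    return scores, explained

# Synthetic feature vectors for 200 hypothetical disc-shaped objects.
rng = np.random.default_rng(0)
n_objects = 200
radius = rng.uniform(1.0, 5.0, n_objects)
area = np.pi * radius ** 2
perimeter = 2.0 * np.pi * radius
mean_intensity = rng.normal(100.0, 10.0, n_objects)
features = np.column_stack([radius, area, perimeter, mean_intensity])

# Pairwise correlations expose redundant measurements.
correlation = np.corrcoef(features, rowvar=False)

# Two components retain almost all of the variance of the four features.
scores, explained = pca(features, 2)
```

In this toy example the three size-related features are almost interchangeable, so two principal components summarise four measurements. The components are, however, linear mixtures of the original features and are less directly interpretable than any single one of them.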

Quality control and quality assurance
As already stated in Section 1.1, no matter how well-designed the analysis component of a microscopy experiment, if the images being input have poor quality, or the sample preparation and labelling have been poorly designed or executed, then the results obtained from analysis will have little meaning. In fluorescence microscopy, the most commonly used metrics for assessing image quality are signal-to-noise ratio (SNR) and spatial resolution. SNR values alone are however insufficient to tell whether an image contains important biological information or not, and whether it will be good enough for quantitative analysis. Spatial resolution measurements are not necessarily a direct indicator of image quality but can be useful for contextualising morphological measurements. For example, a large number of measurements clustered at the resolution limit indicates that it may be necessary to use higher resolution data to study the structure of interest. Measuring image properties such as SNR and resolution and recording them alongside other metadata from image acquisition helps to add context to results from quantitative analysis. 80
It is also important to recognise and reduce bias in quantitative image analysis. One major avenue for this is investing time in creating automated analysis pipelines whereby batches of images acquired under different biological conditions can be analysed in the same manner, free from any user input. Where automated analysis is not practical, or manual parameter selection is required, blinding can help reduce user bias. 14 Batch effects, defined as nonbiological experimental variations that confound measurements, are a common source of bias, with possibly dramatic consequences on end results. 81,82 The influence of batch effects is further demonstrated by Shamir et al., who show that intensity and morphology measurements computed on microscopy images composed only of background signal can allow identifying different organelles. 83
As stated throughout this article, sample preparation, acquisition parameters (such as illumination intensity, magnification, exposure time and detector gains) and experimental parameters (such as timestamps and sample id) must be recorded for each image whenever measurements are meant to be compared. Similarly, all parameters that can be kept constant should ideally remain as identical as possible across images. Batch effects can be further mitigated at the level of image data by correcting for intensity variations, 25,26 or with feature normalisation. 84 A good summary of strategies to identify and correct batch effects is provided in Caicedo et al. 9
The laboratory standard for assessing the legitimacy of a scientific analysis is quality control and performance metrics, and image quantification is no exception. Although plenty of established metrics are available to assess the success of algorithms that carry out segmentation, detection, counting and classification among many others, identifying metrics that faithfully reflect performance across datasets and use-cases remains an open challenge. 85 Although not quantitative, visual inspection remains a robust quality control strategy. This endeavour may however be highly nontrivial when dealing with high-dimensional, dynamic datasets or with rare events, and can be greatly facilitated by dedicated software tools. 86 Ultimately, the most powerful measure of quality control is reproducibility: the experimental procedures, microscope hardware specifications, image acquisition parameters, and image quantification algorithms provided in a published study should allow other researchers to recover its quantitative conclusions. 87
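As an illustration of the feature normalisation mentioned above as a way to mitigate batch effects, the sketch below standardises each feature separately within each batch. This is one simple form of normalisation and not necessarily the one used in the cited work; the function name, the batch identifiers and the toy values are hypothetical. It assumes that every batch contains a comparable mix of biological conditions; if conditions are confounded with batches, this operation would also remove real biological differences.

```python
import numpy as np

def normalise_per_batch(features, batch_ids):
    """Z-score every feature column separately within each batch."""
    features = np.asarray(features, dtype=float)
    batch_ids = np.asarray(batch_ids)
    normalised = np.empty_like(features)
    for batch in np.unique(batch_ids):
        rows = batch_ids == batch
        mean = features[rows].mean(axis=0)
        std = features[rows].std(axis=0)
        normalised[rows] = (features[rows] - mean) / std
    return normalised

# Toy data: one intensity feature measured on two acquisition days.
# The second day is offset by 100 units for purely technical reasons.
features = np.array([[10.0], [12.0], [14.0], [110.0], [112.0], [114.0]])
batch_ids = np.array(["day1", "day1", "day1", "day2", "day2", "day2"])

normalised = normalise_per_batch(features, batch_ids)
```

After normalisation the two days become directly comparable in this toy case, which is only meaningful because the batch identifier was recorded alongside the measurements.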

CONCLUSIONS
The most important message of this paper is that any measurements extracted from microscopy image data can only be as good as the image data themselves. Sample preparation and image acquisition therefore significantly impact the quality of quantitative readouts.
Here, we focused on three big families of measurements: image intensity, morphology and object counts or categorical labels. Unless carried out on the entire image at once, most measurements require a first step of instance segmentation. It is generally difficult to accurately compare intensity measurements, whether in electron or light microscopy images. Normalisation to a reference provides a way around this but requires significant care to be done meaningfully. Morphology, unlike intensity, is challenging to define generally as it relates to shape, texture, and complex combinations thereof. Object counts and categorical labels obviously require objects to be identified and assigned but may not necessarily need precise outlines. These types of readout can therefore often be obtained without explicitly segmenting individual objects.
Other key aspects of image quantification are quality control and quality assurance. While it is common practice to account for known distortions and aberrations introduced by sectioning and imaging in specific imaging modalities such as EM, assessing the accuracy or 'success' of these corrections and their impact on downstream measurement is sometimes challenging. Identifying good metrics to assess whether a quantitative readout makes sense can be difficult, and plenty of confounders may adversely affect the extracted measurements. In addition to informing on the type of measurements that can meaningfully be extracted from image data, the essential information about image acquisition, sample preparation, and processing provided by metadata is therefore also crucial to allow randomisation and mitigate batch effects at the analysis stage.
In summary, the problem of quantifying microscopy data should be approached like any experiment: start by stating a clear definition of the objectives, propose a strategy, study the implications of each step involved in that strategy and challenge the results accordingly. Automation should also be prioritised as much as possible to minimise bias and maximise reproducibility. Such a holistic approach to bioimage quantification is the safest way to ensure that meaningful measurements are extracted and that they are handled in a scientifically rigorous manner.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2023 European Molecular Biology Laboratory (EMBL) and The Authors. Journal of Microscopy published by John Wiley & Sons Ltd on behalf of Royal Microscopical Society.

FIGURE 1
Image formation in fluorescence and electron microscopy. (A) Widefield microscopy of live S. pombe cells expressing sfGFP-tubulin (curved lines in schematic). The whole sample volume is illuminated simultaneously (cyan shading and 'excitation' bar on schematic) and fluorescence is captured on a camera (green 'detection' bar on schematic). Fine structure can be swamped by out-of-focus fluorescence. (B) Confocal microscopy of the same field-of-view as (A). Diffraction-limited laser spots are scanned individually (laser-scanning confocal) or swept in an array (spinning disk confocal, shown here) across the field-of-view; a pinhole (or array of pinholes) in the detection path prevents out-of-focus fluorescence from reaching the detector (point detector for laser-scanning confocal, camera for spinning disk), allowing for good contrast in fine structures at a range of sample depths. (C) Confocal slices can be acquired at a range of sample depths to form a 'z-stack'. This 3D volume can be projected to a single image by adding the images together ('Sum slices') or picking the most intense value for each pixel ('Max intensity'). (D) TIRF microscopy involves the generation of an evanescent field that only illuminates the volume of the sample within a few hundred nanometres of the coverslip. Emitted fluorescence is captured on a camera. Only structures near the coverslip (here, microtubules at the bottom of cells) are present in images. Note, this is a different field-of-view than in (A)-(C). All fluorescence scale bars = 10 μm. (E) TEM showing mitochondria in a thin section of an embedded cell. Different sample preparation protocols are shown: a conventional protocol causes the membrane of the mitochondria to be electron dense (appear dark), while a Tokuyasu protocol causes them to be electron lucent (appear light). Tokuyasu image courtesy of I. J. White. (F) SEM with secondary electron (SE) detection of an exocytosis event on the surface of an endothelial cell, where the topological structure of Von Willebrand factor strings released from the cell can be visualised. (G) SEM with back scattered electron (BSE) detection of the sample shown in (F), where gold-labelled antibodies highlight Von Willebrand factor strings released from the cell. Images courtesy of K. O'Neill and D. Cutler. (H) SEM with BSE detection of a thin section of a resin embedded sample.

FIGURE 2
Impact of illumination on fluorescence intensity measurements. (A) Measured fluorescence intensities of Alexa Fluor 488 Phalloidin (measured from box 'A'), Mitotracker Red (box 'M') and DAPI (box 'D') in response to increasing LED illumination intensity in fixed BPAE cells (widefield microscopy, Plan Apo VC 60× Oil objective NA = 1.4, 100 ms exposure). Dashed lines indicate what a linear relationship between illumination intensity and fluorescence intensity would be. (B) Measured fluorescence intensities of NLS-GFP and Nup60-mCherry in live Schizosaccharomyces pombe cells (strain GD250 as described in Dey et al. 67) in response to increasing laser illumination intensity (spinning disk confocal microscopy, SR HP Apo TIRF 100× AC Oil objective NA = 1.49, 100 ms exposure). Intensities were measured from regions of each channel above the Otsu threshold (portions of masks shown in corners of image). (C) Continuous spinning disk confocal imaging for 30 seconds of S. pombe cells expressing sfGFP-tubulin (strain AV2434, as described in Vještica et al. 88) at either 10% or 70% 488 nm laser intensity results in different photobleaching characteristics. Intensity was measured as the mean intensity above the Otsu threshold for each image. (D) Nuclear fluorescence intensities measured from live S. pombe cells expressing NLS-GFP ('Uncorrected') are a product of the true concentration of protein per nucleus and the flatness of the excitation illumination ('Illumination'). Inhomogeneous illumination can be corrected by dividing the acquired image by the illumination image (here, a homogeneously fluorescent slide). Image acquisition as in (B). Scale bars in all panels = 10 μm.

FIGURE 4
Outputs of object counting and segmentation. (A) A spinning disk confocal image of S. pombe cells expressing the nuclear marker NLS-sfGFP (strain AV1200, Vještica et al. 88). Scale bar = 10 μm. If the task is only to count objects in an image, rather than extract morphological information, then this can be performed using peak detection on the image following difference-of-Gaussians filtering (B), or if strong geometric priors are known, a method such as the circular Hough transform on the edge-detected image (C). Segmentation of objects can produce various outputs. (D) Semantic segmentation divides an image into two classes: foreground (i.e. objects of interest, black) and background (white). Such an image is also referred to as a binary mask. (E) Instance segmentation divides an image into background (black) and 'instances' of the object of interest. Each different instance here is randomly assigned a different colour. (F) Segmented objects can alternatively be represented by their boundaries rather than a solid object. (G) BSE image from SEM of HIV infected human monocyte induced macrophage. 90 Scale bar = 10 μm. (H) Segmentation of intracellular plasma membrane-connected compartment (IPMC) shown as mask. (I) Segmentation of IPMC shown as outlines. (J) 3D rendered mesh of segmentation of IPMC. Magenta boxes highlight locations of insets shown magnified below.

TABLE 1
Glossary of key technical terms in image quantification.
TIRF (Acquisition): Total Internal Reflection Fluorescence. A fluorescence microscopy technique that uses total internal reflection of excitation light at the interface between the coverslip and the sample to generate a field of light that is most intense at the interface and exponentially decays with increasing depth into the sample (over a range of a few hundred nanometres). This allows for axially restricted excitation of fluorophores close to the coverslip.

FIGURE 5
Effect of image processing on downstream quantification. (A) Images of a fixed BPAE cell with mitochondria stained. 'Raw' image has not been processed following acquisition; other images are processed versions of this image via the methods stated. Inset corresponds to white rectangle in the left image. Scale bars: 10 μm (large image), 5 μm (magnified inset). (B) Mitochondria segmented from the inset images using Otsu thresholding. Note that morphological analysis of these segmentations would yield different results between different image processing methods. (C) Histograms of pixel values within the large images in (A), where count refers to the number of pixels. Histograms cover the full range of pixel values in each image; note that both the shape of the histogram and this value range vary with different processing methods. (D) For each pixel in the processed images, the pixel value is plotted against the value of the corresponding pixel in the raw unprocessed image. The grey line indicates a 1:1 relationship between processed and unprocessed pixel values (i.e. no change following processing).

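TABLE 2
Common handcrafted morphology features in 2D. [Table body only partially recovered; surviving entries are listed in their original order.]
(feature name not recovered): Ratio of the object area to that of a circle with the same perimeter
Roundness: Ratio of the object area to that of a circle with the same width
Compactness: (description not recovered)
(feature name not recovered): [...] shape or texture into the Fourier basis
Descriptors are based on an object's shape (object geometry), texture (image intensity), or either shape or texture.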

ACKNOWLEDGEMENTS
VU is supported by EMBL internal funding. JJB was supported by MRC core funding to the MRC Laboratory for Molecular Cell Biology at University College London, award code MC_U12266B and WT funding (218278/Z/19/Z). SC is supported by a Royal Society University Research Fellowship (URF\R1\211329). ACC is funded by the BiPAS CDT at King's College London. The authors thank James Levitt of the Nikon Imaging Centre at King's College London for his support and assistance in this work. The authors thank Gautam Dey and his lab at EMBL, Heidelberg for providing S. pombe strains, and David Nkwe, Annegret Pelchen-Matthews, Lucy Collinson, Mark Marsh, Krupa O'Neill, Daniel Cutler and Ian White for permission to use collaborative data for figure panels. Open access funding enabled and organized by Projekt DEAL.

ORCID
Virginie Uhlmann https://orcid.org/0000-0002-2859-9241

REFERENCES