The magnitude vector of images

The magnitude of a finite metric space has recently emerged as a novel invariant quantity that measures the effective size of a metric space. Despite encouraging first results demonstrating the descriptive abilities of the magnitude, such as its ability to detect the boundary of a metric space, the potential use cases of magnitude remain under-explored. In this work, we investigate the properties of the magnitude on images, an important data modality in many machine learning applications. By endowing each individual image with its own metric space, we are able to define the concept of magnitude on images and analyse the individual contribution of each pixel with the magnitude vector. In particular, we theoretically show that the previously known properties of boundary detection translate to edge detection abilities in images. Furthermore, we demonstrate practical use cases of magnitude for machine learning applications and propose a novel magnitude model that consists of a computationally efficient magnitude computation and a learnable metric. By doing so, we address the computational hurdle that used to make magnitude impractical for many applications and open the way for the adoption of magnitude in machine learning research.


Introduction
The topology community has recently invested much effort in studying a newly introduced quantity called magnitude [1]. While it originates from category theory, where it can be seen as a generalisation of the Euler characteristic to metric spaces, the magnitude of a metric space is most intuitively understood as an attempt to measure the effective size of a metric space [2]. As a descriptive scalar, this quantity extends the set of other well-known descriptors such as the rank, diameter or dimension. However, unlike those descriptors, the properties and potential use cases of magnitude are still under-explored. Because the metric space structure of datasets is a natural object of study when it comes to understanding fundamental machine learning concepts such as regularization, magnitude appears to be a promising and powerful concept in machine learning: in addition to its ability to describe the metric space of whole datasets, the magnitude can also be studied at the sample level, by considering each sample as its own metric space. Following this line of thought, magnitude vectors were introduced as a way to characterise the contribution of each data sample to the overall magnitude of the dataset, such that the sum of the elements of the magnitude vector amounts to the magnitude. This makes it possible to assess the individual contribution of each data point and their relative connectivity in the whole dataset. Indeed, magnitude vectors have been shown to detect boundaries of metric spaces, with boundary points exhibiting larger contributions to the magnitude [3].
Building upon these recent advances, we study the concept of magnitude of images, an important data modality for a plethora of machine learning applications. We endow each image with its own metric space and explore the properties of the magnitude and the magnitude vector for different choices of metric space structure. We extend the concept of a boundary of a metric space to images and show that it corresponds to edge detection abilities. We thus investigate the potential of magnitude for edge detection architectures and propose a new magnitude model. This model consists of a learnable metric on images followed by an efficient approximation of the magnitude on the learnt metric space. Our experiments show that this architecture is on par with existing edge detection approaches and thus represents a first promising use case for magnitude in machine learning applications. What is more, we compare the magnitude model and the Sobel filter edge detectors from a topological perspective and find that the two filters differ radically, with the magnitude model displaying more cycles and connected components. This points towards a potential complementarity of both approaches.
Our contributions can be summarised as follows:
• We formalise the notion of magnitude vectors for images and investigate the impact of the choice of different metrics on the images. We further derive analytic forms for the magnitude vector of special cases of images.
• Based on this formalism, we provide a theoretical framework to understand the edge detection capabilities of magnitude vectors and link it to the previously known interpretation of the boundary of a metric space. This framework provides a first basis for theoretically motivating the usage of magnitude in machine learning applications.
• We propose new efficient approximations for the computation of magnitude vectors on images, thereby facilitating the usage of magnitude in applications.
• We introduce a novel magnitude model that combines a learnable metric on images with a computationally efficient approximation of the magnitude vector of the image in the resulting metric space. We evaluate the ability of this model to perform edge detection on images and show that it compares favourably with existing edge detection implementations. We also evaluate the topological properties of the detected edges and find them substantially different from those of comparable methods, suggesting a complementarity of the magnitude model with previous works.
This paper is organised as follows. In Section 2 we provide a theoretical framework for the behaviour of the magnitude measure on images. In Section 3, we consider the practicalities of computing the magnitude vector of images and provide a speedup algorithm. In Section 4 we evaluate the approximation methods described in this paper and perform experiments on the edge detection capabilities of the magnitude vector. Our results are summarised in Section 5.

Theoretical Results
We start by introducing the essential notions of the theory of magnitude, magnitude measures, and magnitude vectors. We proceed by laying out how an image can be viewed as a compact metric space and derive explicit formulae for the magnitude measure on the space of one-dimensional images. We further show how the magnitude measure for two-dimensional images can be approximated by one-dimensional images.

Mathematical Background
We start by formally introducing the notion of finite and compact metric spaces.

Definition 2.1. A metric space is an ordered pair $(B, d)$, where $B$ is a finite or compact set and $d$ is a metric on $B$. If $B$ is finite, then we denote the cardinality of $B$ by $|B|$; if $B$ is a compact set, then $|B|$ denotes its dimensionality.
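In many applications, the set $B$ is a set of vectors $B \subset \mathbb{R}^n$ and the metric considered is the $\ell_p$ norm. In order to define the magnitude of such a space we first define the similarity matrix of a metric space.

Definition 2.2. Given a finite metric space $(B, d)$, its similarity matrix is

$$\zeta_B(i, j) = e^{-d(B_i, B_j)}, \qquad 1 \le i, j \le |B|.$$

We are now in a position to define the magnitude vector and the magnitude of a finite metric space.

Definition 2.3. Let $(B, d)$ be a finite metric space with invertible similarity matrix $\zeta_B$. The magnitude vector of $B$ is $w_B = \zeta_B^{-1} \mathbb{1}$, where $\mathbb{1}$ is the all-ones vector of length $|B|$, and the magnitude of $B$ is the sum of its entries, $\mathrm{Mag}(B) = \sum_{i=1}^{|B|} w_B(i)$.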
Not every finite metric space has a magnitude. In particular, the magnitude is not defined when the similarity matrix is not invertible; the magnitude therefore characterises the structure of a metric space to some extent. For compact metric spaces an analogous notion exists.
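The finite case can be made concrete with a few lines of NumPy. The sketch below assumes the standard definitions (similarity matrix $\zeta_B(i,j) = e^{-d(B_i, B_j)}$, magnitude vector obtained by solving $\zeta_B w = \mathbb{1}$, magnitude equal to the sum of $w$); the function names are illustrative and not taken from any existing package.

```python
import numpy as np

def similarity_matrix(points, p=1):
    """zeta[i, j] = exp(-d(points[i], points[j])) for the l_p metric."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, ord=p, axis=-1)
    return np.exp(-dist)

def magnitude_vector(points, p=1):
    """Solve zeta w = 1; NumPy raises LinAlgError if zeta is singular."""
    zeta = similarity_matrix(points, p)
    return np.linalg.solve(zeta, np.ones(len(zeta)))

# Sanity check: two points at distance t have magnitude 2 / (1 + exp(-t)).
t = 1.5
w = magnitude_vector([[0.0], [t]])
magnitude = w.sum()
```

Solving the linear system rather than forming the inverse explicitly is a numerical choice made here for stability; it yields the same vector whenever the similarity matrix is invertible.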

Definition 2.4 ([4]). Consider a metric space $X = (B, d)$ with a compact set $B$ and a metric $d$. A finite, signed Borel measure $\nu$ is called a magnitude measure on $X$ if the following relation holds for all $y \in X$:

$$\int_X e^{-d(x, y)} \, \mathrm{d}\nu(x) = 1.$$

Furthermore, the magnitude of $X$ is given by

$$\mathrm{Mag}(X) = \int_X \mathrm{d}\nu(x).$$

We see that the compact and the finite metric space cases are analogous: sums are replaced by integrals and weight vectors are replaced by magnitude measures.

An image as a compact metric space
What is an image? Different fields have produced distinct definitions. In computer science and computer vision, an image is typically conceptualised as an array of pixels, where each pixel consists of a number of channels (usually 1 or 3), therefore corresponding to a tensor. In this work, we propose an alternative conception and consider a digital image as a set of points in some ambient space, as formalised below.

Definition 2.5 (Digital Image).
A digital image is a set of points ∈ R 2+n of the form {(i, j, c

The magnitude of images
The most straightforward approach to defining the magnitude of an image is to choose a metric $d$ and calculate the magnitude vector directly on the set of points of a digital image (Definition 2.5) using Definition 2.3. However, this approach is unsatisfactory for both computational and theoretical reasons:
(i) Computationally, this requires the inversion of a (#pixels × #pixels)-matrix, for which the computational cost grows cubically with the number of pixels.
(ii) Theoretically, this makes it very challenging to trace how the individual pixel weights are formed. Indeed, the magnitude computation is global and all pixels can potentially contribute to the magnitude weight of each pixel.
To address the above limitations, we propose an alternative approach based on continuous images (analogue and digitised). We then show experimentally that this alternative accurately recovers the digital scenario, up to some numerical aberrations.
For each pair of points in the image domain $D$, $x, x' \in [0, w] \times [0, h]$, we can express the corresponding points on the image (in the $\mathbb{R}^{2+n}$ ambient space) by $\mathbf{x} = (x, \Phi(x))^T$ and $\mathbf{x}' = (x', \Phi(x'))^T$, or by $\mathbf{x} = (x, \Phi_s(x))^T$ and $\mathbf{x}' = (x', \Phi_s(x'))^T$ for digitised images.
By choosing a metric $d$ for the set of tuples $X = \{\mathbf{x} = (x, \Phi(x)) : x \in D\}$, we can view each image as a compact metric space $\mathcal{X} = (X, d)$. A natural starting point for our derivations is the definition of the magnitude measure $\nu$, which satisfies

$$\int_X e^{-d(\mathbf{x}, \mathbf{y})} \, \mathrm{d}\nu(\mathbf{x}) = 1 \tag{1}$$

for all $\mathbf{y} \in X$.
Substituting the definition of $\mathbf{x}$, we obtain an explicit integral equation for the magnitude measure of an analogue image:

$$\int_D e^{-d\left((x, \Phi(x))^T, \, (x', \Phi(x'))^T\right)} \, \mathrm{d}\nu(x') = 1 \tag{2}$$

for all $x \in D$. Unfortunately, Equation 2 is in general analytically intractable. However, for specific cases we can obtain an explicit solution, as we show in this work.
Remark 2.11. Note that by using the map $\Phi$, we can rephrase the problem of integration over a (possibly non-compact) space in $\mathbb{R}^{2+n}$ as an integration over a bounded plane in $\mathbb{R}^2$ (which is always compact) with a modified metric.

1D images
One-dimensional images are images with a one-dimensional domain $D = [0, w]$. The simplest such image is a constant line segment. We start by restating the magnitude measure of a line segment as proven in [4].
Proof. The proof follows from direct integration.
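A quick numerical cross-check of the line-segment case may be useful here. The sketch assumes that the magnitude measure of $[0, w]$ has the form $\frac{1}{2}(\mu + \delta_0 + \delta_w)$ with $\mu$ the Lebesgue measure, which is what the line-image formula later in this section reduces to for $\alpha = 0$. For $n$ equally spaced samples with spacing $h$, the finite magnitude vector is known in closed form: interior weights equal $\tanh(h/2) \approx h/2$ and endpoint weights equal $1/(1 + e^{-h}) \approx 1/2$. The function name is illustrative.

```python
import numpy as np

def line_weights(n, w):
    """Magnitude vector of n equally spaced points on [0, w] (l1 metric)."""
    x = np.linspace(0.0, w, n)
    zeta = np.exp(-np.abs(x[:, None] - x[None, :]))
    return np.linalg.solve(zeta, np.ones(n))

n, w = 201, 4.0
h = w / (n - 1)                       # grid spacing
weights = line_weights(n, w)
interior = weights[1:-1]              # approx. h / 2: half the Lebesgue measure
endpoints = weights[[0, -1]]          # approx. 1 / 2: the two point masses
total = weights.sum()                 # approx. 1 + w / 2
```

As the grid is refined, the interior weights approach half the grid spacing and the endpoint weights approach one half, which matches the assumed continuous measure.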
Equipped with the results on a line segment, we now focus on the meaning of the metric $d((x, \Phi(x))^T, (x', \Phi(x'))^T)$. Recall that the domains we are interested in for 1D images are always line segments (which correspond to a single row of pixels). The brightness and colour channels of the images modulate the typical metrics defined on $D$. We thus turn to the question of exactly how this metric is modified.
In the case of an analogue single-channel image, the image surface is diffeomorphic to a bounded plane or, in other words, it is a warped plane. Therefore, two points on the plane are connected via the geodesic distance, or curve length in the case of a one-dimensional image. Choosing a metric on the domain, e.g. an $\ell_1$ metric, determines how exactly this distance is calculated. In the case of a one-dimensional domain this is

$$d(\mathbf{x}, \mathbf{x}') = \left| \int_{x_1}^{x_1'} \sqrt{1 + \phi'(t)^2} \, \mathrm{d}t \right| \quad \text{for } \ell_2, \tag{3a}$$

$$d(\mathbf{x}, \mathbf{x}') = \left| \int_{x_1}^{x_1'} \left( 1 + |\phi'(t)| \right) \mathrm{d}t \right| \quad \text{for } \ell_1. \tag{3b}$$

In the case of digitised images, we encounter discontinuous step functions. In this case, the curve length is the usual distance on a plane plus the absolute height of the steps between two points, as can be seen from direct integration of equation 3.
Note that the constant image is a special case of the line image with $\alpha = 0$. Substituting the line image into equation 3b, we obtain

$$d(\mathbf{x}, \mathbf{x}') = (1 + |\alpha|) \, |x_1 - x_1'|.$$

Therefore, equation 2 becomes

$$\int_0^w e^{-(1 + |\alpha|) |x_1 - x_1'|} \, \mathrm{d}\nu(x_1') = 1$$

for all $x_1 \in [0, w]$. Using Theorem 2.12, we conclude that the magnitude measure for constant or line images is given by $\nu(X) = \frac{1}{2}(|\alpha| + 1)(\mu(x_1) + \delta_0 + \delta_w)$. For more complex analogue images, or metrics other than $\ell_1$, finding a magnitude measure becomes analytically intractable. Therefore, we turn our focus to digitised images. We begin by considering a one-dimensional image, $D = [0, w]$, with a single channel and two pixels. Note that this scenario corresponds to the usual step function where the step is located at $w/2$.

Lemma 2.14. Let $\phi_s(x_1) = \gamma H(x_1 - w/2) + c$, where $H(\cdot)$ is the Heaviside function with convention $H(0) = 0$, constants $w, c \in \mathbb{R}^+$, and $\gamma \in \mathbb{R}$. The magnitude measure of the metric space defined by $\phi_s(x_1)$ in the domain $D = [0, w]$ with $\ell_1$ metric is given by

Proof. Calculating the curve length from equation 3b and substituting the magnitude measure into equation 1, we obtain .
We now split the interval [0, w] into three parts, [0, x 1 ), [x 1 , w/2], (w/2, w] and compute the three integrals From the first interval, we obtain I 1 = 1 2 .The second interval gives , and the third interval integrates to . We then have Again, we divide the domain into three parts, [0, w/2), [w/2, x 1 ], (x 1 , w] and compute the integrals I 1 , I 2 , I 3 .This gives ), and I 3 = 1 2 .We also obtain Remark 2.15 (Reflection property).Note that the second case is also covered by reflecting the function about the y-axis and integrating from w to 0, i.e. (1) let z = −x, (2) let −w 0 → − 0 −w and (3) shift z → z + w.This leaves the integral I invariant and we refer to this property as the reflection property.
We can extend Lemma 2.14 to many pixels in a one-dimensional digitised image via induction.
, where H(•) is the Heaviside function with convention H(0) = 0, w, c ∈ R + , and γ i ∈ R. The magnitude measure of the metric space induced by φ(x 1 ) in the domain D = [0, w] with 1 metric is given by Proof.The proof follows the proof of Lemma 2.14.We first consider three pixels with the curve length metric of equation 3, . Now, we consider three cases x 1 ≤ w/3, w/3 < x 1 ≤ 2w/3, and 2w/3 < x 1 ≤ w.
Corollary 2.18 (Multi-channel 1D-images). Theorem 2.16 can be applied to multi-channel one-dimensional images, with a magnitude measure

An illustration of the results of this section is provided in Figure 1. To obtain the numerical magnitude we treated the step function as a one-dimensional digital image and proceeded via matrix inversion. The theoretical magnitude is calculated from Corollary 2.18. It can be seen that our results are generally in good agreement with numerical calculations. There are, however, some minor differences between the numerical results and our theoretically obtained magnitude measure. We attribute this to two factors, namely numerical inaccuracies in the matrix inversion and discretisation effects. Discretisation effects occur because we consider a finite set of points and, therefore, any infinitely steep step necessarily needs to be approximated by a steep but finite step.
Remark 2.19. Note that one-dimensional images can also be viewed as time series, although the magnitude measure ignores the arrow of time.
Remark 2.20. Instead of step functions, we could have chosen a piece-wise linear interpolation between the pixels to create a continuous surface. The proofs for this construction (e.g. a piece-wise linear continuous curve in one dimension) are analogous to the step function case. In fact, the $\ell_2$ curve length is also tractable. However, piece-wise linear functions are not good models for images.
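The numerical side of this comparison can be sketched as follows: sample a two-pixel step function, treat the samples $(x, \phi(x))$ as a finite metric space with the $\ell_1$ metric, and solve for the magnitude vector. The grid size, step height and function name are illustrative choices rather than the settings behind Figure 1. For a single step the $\ell_1$ distances are those of points on a line with one enlarged gap, for which the weight of a point with neighbour gaps $a$ and $b$ is known to be $\frac{1}{2}(\tanh(a/2) + \tanh(b/2))$.

```python
import numpy as np

def step_image_weights(n, w, gamma, c=0.0):
    """Magnitude vector of a sampled step function under the l1 metric."""
    x = np.linspace(0.0, w, n)
    phi = c + gamma * (x > w / 2)            # Heaviside step with H(0) = 0
    pts = np.stack([x, phi], axis=1)         # points (x, phi(x)) in R^2
    dist = np.abs(pts[:, None, :] - pts[None, :, :]).sum(axis=-1)
    weights = np.linalg.solve(np.exp(-dist), np.ones(n))
    return x, weights

n, w, gamma = 200, 4.0, 2.0
x, weights = step_image_weights(n, w, gamma)
h = w / (n - 1)
# Largest interior weight sits next to the step (domain endpoints excluded).
peak = int(np.argmax(weights[1:-1])) + 1
```

Away from the step and the domain boundary the weights stay at the grid-dependent constant $\tanh(h/2)$, while the two samples adjacent to the step carry a visibly larger weight.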

2D images
Equipped with exact calculations of the magnitude measure of digitised images in one dimension, we now aim to generalise these results to two-dimensional images. We first extend Theorem 2.12 to a bounded plane with the $\ell_0$ (Hamming) and $\ell_1$ (Manhattan) metrics.
Proof.Note that in both cases we can find a magnitude measure ν(x) such that the integral can be expressed as For the Hamming metric we note that it is only = 0 on a set of Lebesgue measure 0, namely x 1 = x 1 or x 2 = x 2 respectively.Therefore, the integrand equals 1 and we can express the magnitude measure as ν( which completes the proof.In the case of the Manhattan metric, we note that each integral equals the integral over a real line with magnitude measure ν(L [a,b] ) and ν(L [c,d] ) respectively.
A more general version of Theorem 2.21 has also been obtained in [6, Proposition 3.7]. We illustrate the ramifications of Theorem 2.21 by generalising 2.13 to two-dimensional domains.
Remark 2.22. Interestingly, it seems that the Hamming distance does not suffer from boundary effects. However, due to the nature of this distance, one also reduces the information carried by the metric. Furthermore, the Hamming distance is not robust to noise, i.e. small perturbations in the pixel values may have large effects on the pixel distance.
for all $x \in D$. Using Theorem 2.21, we conclude that the magnitude measure for constant or line images is given by $\nu(X) = (|\alpha| + 1)\nu(D)$. Note that these results generalise straightforwardly to multi-channel images by letting, for example,

The obvious difficulty in obtaining results for more general images is that the geodesic distance is not as straightforward to calculate. In fact, for general images, this is analytically intractable. A simplifying assumption we can make is to consider a rank-1 approximation of an image. In this case, the image is the outer product of two one-dimensional (digitised) images. Therefore, we can use Theorem 2.21 and Theorem 2.16 to derive the magnitude measure.
Proof. This follows from Theorem 2.21 (ii) and Theorem 2.16.
Again, the above corollary can be straightforwardly generalised to colour images using Corollary 2.18. We note that the rank-1 assumption holds for almost no image; we therefore consider another approximation, which we call the independence approximation.
In the independence approximation we treat each pixel in a digitised image as if it were a pixel in a rank-1 image. That is, for a given location $(x_1, x_2) \in \mathbb{R}^2$, we apply Corollary 2.24 to obtain a local magnitude measure of the pixel

Therefore, we only consider the step functions at the edges of each pixel to obtain a weight for the pixel. Even though this is not a global magnitude measure, it is a reasonable approximation in practice (see Subsection 4.2) and, since it only relies on local pixel information, is computationally very efficient.

Interpretation of the magnitude measure
In the previous sections we calculated explicit magnitude measures for digitised images with the $\ell_1$ metric.
In this section, we interpret the meaning of these measures in the context of computer vision. First, we observe that the magnitude measure is a local property, that is, it only depends on the immediate neighbourhood of the point $x$ in a domain $D$. This holds exactly for one-dimensional images and we show, based on one-dimensional considerations, that it also holds at least approximately for two-dimensional images. These results can be used in constructing efficient algorithms when applying our results from digitised images to digital images. In Figure 1 we empirically show that the values of the magnitude measure calculated from our formulae (based on compact metric spaces) are reproduced in the numerical calculations on finite metric spaces modulo some numerical "discretisation effects".
Next, we investigate a potential interpretation for the value of the magnitude measure of a digitised image.Note that in the absence of any steps, it is just a constant equal to half the Lebesgue measure (ignoring boundary effects).Analogously, in numerical experiments the constant is determined by the grid spacing.The magnitude measure has a value larger than this constant only when steps are present, in other words, when the pixel brightness changes.In computer vision a "rapid change in pixel brightness" is usually referred to as an edge in an image and algorithms whose aim is to find edges in images are called "edge detectors".Therefore, we argue that computing the magnitude measure (or vector) of an image performs an edge detection task as it is large in the presence of an edge.
To determine how large a step needs to be in order to count as an "edge", we recall the definition of an exponential probability distribution. Let $Z \in [0, \infty)$ be a random variable and $\lambda > 0$. The exponential distribution is given by the probability density function (PDF)

$$p(x) = \lambda e^{-\lambda x}$$

with cumulative distribution function (CDF)

$$P(Z \le x) = 1 - e^{-\lambda x}.$$

If we let $\lambda = 1$ and $x = |\gamma_i|$, then we notice that the prefactor to the singular part of the magnitude measure at the step locus has the form of the CDF of an exponential distribution. In the case of two-dimensional images, we conjecture that a multivariate exponential distribution of the form $p(z) \sim e^{-f(z)}$, where $f(\cdot)$ is any continuous function, needs to be considered. Both approximations we introduced in Subsection 2.3.2 can be considered as a probabilistic independence assumption, i.e. $p(z) = p(z_1)p(z_2)$. Using this interpretation, we can consider any threshold for edges as the probability of a step being smaller than the threshold, and in Subsection 4.3 we implement a magnitude-based edge detector. This application immediately leads to the main machine learning task of this paper.
Question: Given that the $\ell_1$ magnitude vector has edge detection capabilities, can we learn a metric which improves the edge detection of the magnitude vector?
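The probabilistic reading of step heights can be illustrated with a minimal sketch. It assumes only the exponential CDF $1 - e^{-\lambda |\gamma|}$ discussed above; the threshold value of 0.5 and the function names are hypothetical choices for illustration, not values used in the experiments.

```python
import numpy as np

def edge_probability(step_heights, lam=1.0):
    """Exponential CDF 1 - exp(-lam * |gamma|) evaluated at each step height."""
    return 1.0 - np.exp(-lam * np.abs(step_heights))

def detect_edges(signal, threshold=0.5, lam=1.0):
    """Flag the steps between neighbouring pixels whose CDF value exceeds threshold."""
    gamma = np.diff(np.asarray(signal, dtype=float))   # step heights gamma_i
    return edge_probability(gamma, lam) > threshold

signal = [0.0, 0.1, 0.1, 2.0, 2.1]
edges = detect_edges(signal)
```

Only the jump from 0.1 to 2.0 is flagged in this toy signal, since its CDF value is roughly 0.85 while the small steps stay below 0.1.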
To answer this question, we first need to investigate efficient ways of calculating the magnitude vector of an image.

Speedup Algorithms and Learnable Metrics
In this section, we introduce two important tools in order to make magnitude computations more accessible and applicable. The first is an algorithm to speed up magnitude vector calculations based on the reasoning of the previous sections. The second is a neural network architecture which serves as a few-shot deep-learning-based edge detector.

Efficient approximations of the magnitude vector of images
Although already mentioned, we briefly reiterate that a major speedup (at the cost of accuracy) follows directly from Corollary 2.24. One can use Fourier filters (which have efficient implementations in every major programming language) to calculate the step heights in the $x_1$- and $x_2$-directions and transform them into the approximate magnitude measure.
The second potential speedup is described in Algorithm 1 and also uses the locality property of the magnitude measure. In particular, it is a divide-and-conquer algorithm, where the image is first split into several overlapping patches (to account for boundary effects). Then the magnitude vector of each patch is calculated via matrix inversion, the appropriate boundaries are removed and the resulting patches are stitched together again. Therefore, the run time is linear in the number of patches and cubic in the number of pixels in a patch, since matrix inversion is $O(n^3)$ for an $n \times n$ matrix. While theoretical intuition for the correctness of this algorithm is presented in the previous sections, we provide further empirical evidence and runtime comparisons in Subsection 4.2.

Algorithm 1: Heuristic speedup
Input: A digital image (img) tensor $(c \times h \times w)$, a metric $d$, a patch size $(h_p, w_p)$, an overlap $\delta$
Output: A magnitude vector as an $(h \times w)$ tensor.
/* First split the image into n overlapping patches */
1 zeroPad(img):
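Since the listing of Algorithm 1 is cut short here, the following is only a sketch of the patch-based idea as described in the prose: zero-pad, split into overlapping patches, compute each patch's magnitude vector by solving the linear system, discard the overlap and stitch the results. The padding layout, the patch traversal, the fixed $\ell_1$ metric on (row, column, channels) and all function names are assumptions made for illustration.

```python
import numpy as np

def magnitude_vector(points):
    """Solve zeta w = 1 with zeta = exp(-l1 distance) between rows of points."""
    dist = np.abs(points[:, None, :] - points[None, :, :]).sum(axis=-1)
    return np.linalg.solve(np.exp(-dist), np.ones(len(points)))

def patched_magnitude(img, patch=(8, 8), overlap=2):
    """Approximate magnitude vector of a (c, h, w) image, one patch at a time."""
    c, h, w = img.shape
    hp, wp = patch
    pad_h, pad_w = (-h) % hp, (-w) % wp      # make h and w multiples of the patch size
    # Zero-pad so that every patch carries `overlap` extra pixels on each side.
    padded = np.pad(img, ((0, 0), (overlap, overlap + pad_h),
                          (overlap, overlap + pad_w)), mode="constant")
    out = np.zeros((h + pad_h, w + pad_w))
    for top in range(0, h + pad_h, hp):
        for left in range(0, w + pad_w, wp):
            block = padded[:, top:top + hp + 2 * overlap,
                           left:left + wp + 2 * overlap]
            bh, bw = block.shape[1:]
            ii, jj = np.meshgrid(np.arange(bh), np.arange(bw), indexing="ij")
            # One point (row, col, channel values) per pixel of the block.
            pts = np.column_stack([ii.ravel(), jj.ravel(), block.reshape(c, -1).T])
            weights = magnitude_vector(pts.astype(float)).reshape(bh, bw)
            # Keep the patch interior, drop the overlapping border.
            out[top:top + hp, left:left + wp] = \
                weights[overlap:overlap + hp, overlap:overlap + wp]
    return out[:h, :w]

rng = np.random.default_rng(0)
img = rng.random((1, 16, 16))
approx = patched_magnitude(img, patch=(8, 8), overlap=2)
```

Each linear system involves only the pixels of one padded patch, so the cost grows linearly with the number of patches instead of cubically with the number of pixels in the whole image.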

A pullback metric for edge detection
We now return to the machine learning question posed at the end of Section 2. Can we learn a metric to improve the edge detection of the magnitude vector? This task can be loosely placed in the subfield of machine learning called metric learning [7]. Traditionally, metric learning is a supervised learning technique which tries to find a metric that minimises the distance between two related points (i.e. points that are in the same class) and maximises the distance between points which are unrelated (i.e. are in different classes). Many techniques for metric learning have been developed, including triplet losses [8] and triplet networks [9].
In the case of optimising the magnitude vector one can think of the label as a one-channel (grey-scale or binary) image which labels the ground truth we are trying to approximate, e.g. a manually annotated edge map in an edge detection dataset. Although this ground truth image is a per-pixel labelling, it is not straightforward to use it as a label for a point in the input metric space (recall that the similarity matrix has to be inverted and summed). Therefore, "classical" metric learning techniques cannot be applied and an alternative route needs to be taken. First, we formulate the learning task. Given a set $B$, we want to find a metric $d$ such that

$$\zeta_B^{-1} \mathbb{1} \approx y,$$

where $y$ is the ground truth label and $\zeta_B$ is the similarity matrix of Definition 2.3. Note that finding functions which are guaranteed to be metrics (in particular, ones that fulfil the triangle inequality) is a highly non-trivial task. Before considering this issue, we proceed by defining a loss function. One possible loss function for this learning task is the $\ell_2$ loss or mean squared error (MSE) loss,

$$\mathcal{L}(d) = \frac{1}{|B|} \left\| \zeta_B^{-1} \mathbb{1} - y \right\|_2^2. \tag{7}$$

Notice that calculating the MSE loss involves a matrix inversion whose computational cost is prohibitive for large images. A straightforward application of Algorithm 1 is also not possible, as we have no theoretical guarantees that the learnt metric has the same locality properties as the $\ell_1$ metric. Therefore, we need to restrict the function classes which we approximate. A possible solution presents itself in the form of a pullback metric.
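Definition 3.1 (Pullback metric). Let $X$ and $Y$ be two metric spaces and $f : X \to Y$ be an injective function. The pullback metric on $X$ is given by $d_X(x, x') = d_Y(f(x), f(x'))$ for all $x, x' \in X$.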
Suppose we let $Y$ be the original image metric space and $d$ be the $\ell_1$ metric. Then, we can immediately apply Algorithm 1 to make the loss computations tractable. The machine learning task also reduces to finding an injective function $f : Y \to X$ (usually referred to as an embedding). Observing that a function $f : X \to Y$ is injective if there exists a function $g : Y \to X$ such that $g(f(x)) = x$ for all $x \in X$, we can parameterise the function $f$ by an autoencoder neural network. Fundamentally, there exist two different autoencoder architectures: compressive autoencoders, which approximate a function $\mathbb{R}^n \to \mathbb{R}^m$ where $m < n$, and expansive autoencoders, for which $m \ge n$. Most autoencoders studied in the literature are compressive autoencoders due to their favourable theoretical properties and their ability to perform non-linear dimensionality reduction. One well-known disadvantage of expansive autoencoders is the fact that without further constraints they can learn the identity function, i.e. $f(x) = g(x) = x$ for all $x \in X$. Furthermore, in most machine learning applications one would like to reduce the dimensionality of the data, not expand it. In the case of metric learning in the magnitude setting, however, these properties of expansive autoencoders are favourable as:
(i) The input dimensionality of the model is already low: $n + 2$, where $n$ is the number of channels (usually 1 or 3).
(ii) If the $\ell_1$ metric in the input space is the best metric, then we want our model to be able to learn the identity mapping.
(iii) The MSE loss of equation 7 is a natural regularizer for the latent space of the autoencoder.
Taking these benefits into account, we design a magnitude edge-detector with an expansive autoencoder in the next section.
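The pullback construction can be sketched in a few lines, assuming the standard definition $d_X(x, x') = d_Y(f(x), f(x'))$ with the $\ell_1$ metric on the target space. The embedding below is a hand-written, hypothetical stand-in for a trained expansive encoder: it maps $\mathbb{R}^n$ injectively to $\mathbb{R}^{2n}$ and is used purely for illustration.

```python
import numpy as np

def pullback_l1(f, points):
    """Pairwise pullback distances d(x, x') = || f(x) - f(x') ||_1."""
    z = np.stack([f(p) for p in points])
    return np.abs(z[:, None, :] - z[None, :, :]).sum(axis=-1)

def expansive_embedding(x, scale=2.0):
    """Hypothetical injective map R^n -> R^(2n); not a trained encoder."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, scale * x])

# Three pixels encoded as (row, column, brightness).
points = np.array([[0.0, 0.0, 0.2],
                   [0.0, 1.0, 0.9],
                   [1.0, 0.0, 0.2]])
dist = pullback_l1(expansive_embedding, points)
weights = np.linalg.solve(np.exp(-dist), np.ones(len(points)))
```

Because the embedding is injective, the pulled-back distances inherit symmetry, the triangle inequality and separation of distinct points from the $\ell_1$ metric, so no metric axioms have to be enforced on the learnt function directly.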

Experiments
In this section we perform experiments to validate our theoretical claims and investigate the power of the magnitude measure as an edge detector.

Datasets
For our experiments we used the BIPED dataset version 2 [10]. BIPEDv2 is a new benchmark dataset specifically designed for edge detection. BIPEDv2 consists of 250 real-world images and annotated ground truths. The dataset is split into a training dataset of 200 annotated images and a test dataset of 50 annotated images. The resolution of the images is 1280 × 720, which results in almost one million pixels per image. The ground truth annotations have been generated by computer vision experts and moderated by an administrator. The viability of the ground truths has also been confirmed using machine learning methods.

Accuracy of different magnitude approximations
The first set of experiments empirically validates and investigates the various approximate methods for calculating the magnitude vector of images introduced in the paper. All of our calculations are performed on the test set of BIPEDv2. In order to generate a ground truth magnitude, we rescale the original image to a resolution of 200 × 200. This requires inverting a 40000 × 40000 matrix, which is feasible on current machines. We then tested our three approximations, namely (1) the rank-1 approximation, (2) the independence approximation, and (3) the patched Algorithm 1.
To evaluate the performance, we first min-max scale the ground truth and the approximated images such that the magnitude values are between zero and one. Then we calculate three performance metrics, namely the maximum absolute deviation between the ground truth and the approximated magnitude vector ($\ell_\infty$ distance), the normalised Frobenius norm given by $\text{error} = \sum_i (\text{ground truth}_i - \text{magnitude vector approx}_i)^2$, and the correlation between the two images. The results are presented in Figure 3. As expected, the rank-1 approximation is by far the worst, with a low average correlation between the ground truth and the approximated magnitude. A strong correlation is present in the local approximation; however, the maximum deviation and Frobenius norm are still large compared to the patched approximations. The patched algorithm generally provides a good approximation to the ground truth, with correlation values close to one and comparatively small absolute deviation and Frobenius norm. The number of patches seems to have only a minor effect on the approximation accuracy, which is expected from the theoretical intuition behind Algorithm 1. The computation time is drastically decreased by Algorithm 1, with the smallest average computation time obtained for patches of 25 × 25 pixels, resulting in 64 patches in total. We attribute the increase in computation time for smaller patches to the computational overhead involved in processing a larger total number of patches.
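The three evaluation quantities described above can be sketched as follows. The min-max scaling, maximum absolute deviation and Pearson correlation follow the text directly; the exact normalisation of the Frobenius-type error is not recoverable from the text, so dividing the sum of squared differences by the number of pixels is an assumption, as are the function and key names.

```python
import numpy as np

def min_max(a):
    """Rescale an array linearly to the range [0, 1]."""
    a = np.asarray(a, dtype=float)
    return (a - a.min()) / (a.max() - a.min())

def compare(ground_truth, approx):
    """Compare two magnitude images after min-max scaling."""
    gt, ap = min_max(ground_truth), min_max(approx)
    diff = gt - ap
    return {
        "max_abs_dev": np.abs(diff).max(),                  # l_inf distance
        "squared_error": (diff ** 2).sum() / diff.size,     # assumed normalisation
        "correlation": np.corrcoef(gt.ravel(), ap.ravel())[0, 1],
    }

gt = np.arange(12.0).reshape(3, 4)
res = compare(gt, 2.0 * gt + 5.0)   # an affine copy is identical after scaling
```

Because both images are min-max scaled first, any positive affine rescaling of the approximation leaves all three quantities unchanged.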

Learning the magnitude metric
Finally, we evaluate the capabilities of the magnitude vector as an edge detector. We compare our results against standard edge detection baselines, namely the Sobel filter, the Canny edge detector and the "vanilla magnitude", as well as the current state-of-the-art deep learning models Dexined [10] and the context-aware tracing strategy (CATS) [11].

Baselines:
The Sobel edge detector [12] is a classical method used in computer vision. It relies on two convolution filters, the Sobel operators, to extract the gradients in the horizontal and vertical direction from the image. The edge maps, which correspond to an edge probability, are calculated by taking the absolute value of the gradient at each pixel. The first step to compute the Sobel edge map is to calculate a grey-scale image from the colour image using the formula $c_\text{grey} = 0.2989\,c_\text{red} + 0.5870\,c_\text{green} + 0.1140\,c_\text{blue}$.
Then, we apply a Gaussian blur to the image and, finally, we use the Sobel operators with a given filter size. As a postprocessing step we, again, use min-max scaling. The Gaussian and Sobel filter sizes are hyperparameters which can, in principle, be optimised. However, in our experiments we use a Gaussian filter size of 5 and a Sobel filter size of 3. The Canny edge detector [13] builds on the Sobel filters and combines the gradient evaluation with a non-maximum suppression step, where the edge maps are sharpened, and a double-thresholding procedure to extract the edges. Unlike all the other methods considered here, the output of the Canny detector is a binary image, where pixel values of one correspond to edges and zeros correspond to non-edges. The thresholds and Sobel filter sizes are again hyperparameters, which we optimise by requiring that the misclassification rate on the training set is minimised, i.e. the overlap of ones and zeros between the ground truth edge annotation and the Canny output is maximised.
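The Sobel baseline pipeline (grey-scale conversion, blur, Sobel operators, min-max scaling) can be sketched with NumPy alone. The grey-scale weights and filter sizes are those stated above; the 5-tap binomial kernel standing in for the Gaussian blur, the edge padding at the image border and the function names are assumptions, and a production baseline would more likely rely on an image-processing library.

```python
import numpy as np

def to_grey(rgb):
    """rgb: (h, w, 3) array; weights taken from the grey-scale formula above."""
    return rgb @ np.array([0.2989, 0.5870, 0.1140])

def filter2d(img, kernel):
    """Same-size cross-correlation with edge padding at the border."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def sobel_edge_map(rgb):
    """Grey-scale, blur, 3x3 Sobel gradients, min-max scaled gradient magnitude."""
    grey = to_grey(np.asarray(rgb, dtype=float))
    b = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0    # assumed 5-tap binomial blur
    blurred = filter2d(grey, np.outer(b, b))
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    gx, gy = filter2d(blurred, kx), filter2d(blurred, kx.T)
    mag = np.hypot(gx, gy)                             # gradient magnitude
    return (mag - mag.min()) / (mag.max() - mag.min())

rgb = np.zeros((10, 10, 3))
rgb[:, 5:, :] = 1.0                                    # vertical step edge
edge_map = sobel_edge_map(rgb)
```

On this synthetic step image the edge map peaks in the two columns on either side of the step and vanishes in the flat regions.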
The "vanilla magnitude" is simply the magnitude-transformed test image. Since an exact transformation is infeasible, we use Algorithm 1 to speed up the magnitude calculation. We use a Gaussian blur with filter size 5 as a preprocessing step and min-max scaling as a postprocessing step to obtain an edge probability map.
The Sobel and Canny detectors are both pixel-level methods which only use pixel intensities and their neighbours. Recently, deep learning methods for edge detection have also been developed. Most noteworthy are the Dexined [10] and the CATS [11] algorithms. Both rely on training a convolutional neural network in a supervised fashion, where the input is the image and the label is the edge map. What is special about these methods is that they can leverage information from the deeper layers in the neural network to obtain a better representation of the image. Since the magnitude vector calculation is also a pixel-level method, we naturally expect the deep learning methods to outperform our approach. In each scenario, we set aside 20% of the training patches as a validation set.
Our model: A graphical illustration of our edge detection model can be found in Figure 4. Essentially, it consists of two parts: a trained model to find a good pixel embedding and an image transformer for inference. In the training step, we sample a patch from an image (with overlap to account for boundary effects) and reshape it such that we obtain a training batch of shape (#pixels × #features). The features used are always the channel brightnesses (3 values: red, green, and blue) and the positional encoding as in Definition 2.5. Additionally, we can create more features by using a "feature extractor" or backbone. In the forward step, we feed this batch through the autoencoder, as outlined in Subsection 3.2, and use the latent space representation to calculate the magnitude vector of the image patch. The image transformer transforms a full high-resolution image to a full-resolution magnitude vector using the generated latent-space representation of the image and Algorithm 1. As a postprocessing step, we perform min-max scaling of the absolute values of the magnitude vector elements. The absolute value is taken since numerical instabilities in the matrix inversion can lead to negative magnitude vector values.
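The data flow of this pipeline can be illustrated with a small NumPy sketch. Several parts are assumptions made purely for illustration: the positional encoding is replaced by normalised (row, column) coordinates rather than the encoding of Definition 2.5, the trained autoencoder is replaced by the identity map, the latent distance is taken to be the $\ell_1$ distance, and the similarity matrix is taken to be $\exp(-d)$. All function names are hypothetical.

```python
import numpy as np


def patch_to_batch(patch):
    """Reshape an (H, W, 3) RGB patch into a (#pixels, #features) batch.

    Features are the three channel brightnesses plus a stand-in
    positional encoding: row and column coordinates scaled to [0, 1].
    """
    h, w, c = patch.shape
    rows, cols = np.meshgrid(np.linspace(0.0, 1.0, h),
                             np.linspace(0.0, 1.0, w), indexing="ij")
    pos = np.stack([rows, cols], axis=-1)
    return np.concatenate([patch, pos], axis=-1).reshape(h * w, c + 2)


def edge_map_from_latent(latent, shape):
    """Magnitude vector of the latent points, post-processed as in the text."""
    # Pairwise l1 distances between latent points (assumed metric).
    dist = np.abs(latent[:, None, :] - latent[None, :, :]).sum(axis=-1)
    zeta = np.exp(-dist)                           # similarity matrix
    weights = np.linalg.solve(zeta, np.ones(len(latent)))
    weights = np.abs(weights)                      # guard against negative values
    span = weights.max() - weights.min()
    scaled = (weights - weights.min()) / span if span > 0 else np.zeros_like(weights)
    return scaled.reshape(shape)


# Usage: the identity map stands in for the trained encoder.
rng = np.random.default_rng(0)
patch = rng.random((4, 4, 3))
batch = patch_to_batch(patch)                      # shape (16, 5)
edge_map = edge_map_from_latent(batch, patch.shape[:2])
```

In a trained model, `batch` would first be passed through the encoder, and the latent representation would replace `batch` in the call to `edge_map_from_latent`.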

Evaluation:
To rigorously evaluate our model performance, there are, in principle, two strategies: an indirect strategy, where the edge detection is evaluated via some downstream computer vision task, or a direct strategy, where the computed edge maps are compared directly to the ground truth annotation. We adopt the latter strategy in order to gain insight into the main advantages and disadvantages of the magnitude approach. To this end, we use four evaluation metrics commonly used in edge detection tasks [10], namely:
1. the Optimal Dataset Scale (ODS), where one threshold separating edges/non-edges is calculated for the entire dataset.
2. the Optimal Image Scale (OIS), where one threshold is calculated per image.

the average recall at 50% precision (R50).
In order to calculate these measures, we use standard postprocessing tasks such as non-maximal suppression (NMS) and morphological thinning of the edge maps.
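The difference between a dataset-wide threshold (ODS) and a per-image threshold (OIS) can be made concrete with a simplified sketch. It assumes exact pixel-wise matching between thresholded edge maps and boolean ground truth, uses the F-score as the quantity being optimised, and reports OIS as the mean of the per-image best F-scores. It omits NMS, thinning and any tolerance-based matching, so it is not the benchmark procedure itself; the function names are hypothetical.

```python
import numpy as np


def confusion_counts(pred, gt, threshold):
    """True positives, false positives and false negatives at a threshold."""
    binary = np.asarray(pred) >= threshold
    gt = np.asarray(gt, dtype=bool)
    tp = int(np.sum(binary & gt))
    fp = int(np.sum(binary & ~gt))
    fn = int(np.sum(~binary & gt))
    return tp, fp, fn


def f_score(tp, fp, fn):
    """F-score written directly in terms of the confusion counts."""
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom > 0 else 0.0


def ods_ois(preds, gts, thresholds):
    """Simplified ODS and OIS F-scores for lists of edge maps."""
    # table[i, k] holds (tp, fp, fn) for image i at threshold k.
    table = np.array([[confusion_counts(p, g, t) for t in thresholds]
                      for p, g in zip(preds, gts)])
    # ODS: one threshold for the whole dataset, counts pooled over images.
    ods = max(f_score(*row) for row in table.sum(axis=0))
    # OIS: best threshold chosen separately for every image.
    ois = float(np.mean([max(f_score(*row) for row in image)
                         for image in table]))
    return float(ods), ois
```

Since every image may pick its own optimal threshold, the OIS score computed this way is typically at least as large as the ODS score.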
Topological properties of magnitude images: Prior to performing a formalised investigation of the magnitude edge maps, we aim to quantify their structural, i.e. topological, properties. To this end, we leverage recent advances in computational topology and calculate the persistent homology of the edge images that we obtain via the Sobel edge detector or our magnitude-based method. As shown by Hu et al. [14], topological features are suitable to evaluate segmentations, for instance. Treating each image as a cubical complex [15], we extract topological features in dimension 0 (connected components) and dimension 1 (cycles) of the resulting edge images; we summarise all features using the norm of their corresponding Betti curve [16], i.e. a simplified description of their topological complexity. As Figure 6 shows, the magnitude and Sobel edge images are structurally substantially different (in terms of topological summary statistics). Specifically, we see that magnitude exhibits a larger degree of topological complexity in terms of both connected components and cycles, thus underscoring the fact that Sobel edge images and magnitude edge images indeed capture qualitatively different structures.
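To give an intuition for the Betti curve summary, the following sketch computes a dimension-0 Betti curve of an image and its norm. It is a deliberate simplification and not the persistent homology computation described above: it only counts connected components, it assumes a sublevel-set filtration with 4-connectivity between pixels, it samples the curve at a finite list of thresholds, and it takes the norm to be the $L_1$ norm approximated by the trapezoidal rule. Dimension-1 features (cycles) would require a dedicated computational topology library.

```python
import numpy as np


def count_components(mask):
    """Count 4-connected components of True pixels via flood fill."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    count = 0
    for i in range(h):
        for j in range(w):
            if not mask[i, j] or seen[i, j]:
                continue
            count += 1                      # new component found
            seen[i, j] = True
            stack = [(i, j)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        stack.append((ny, nx))
    return count


def betti0_curve(image, thresholds):
    """Number of components of the sublevel set {image <= t} for each t."""
    image = np.asarray(image, dtype=float)
    return np.array([count_components(image <= t) for t in thresholds])


def curve_norm(curve, thresholds):
    """Approximate L1 norm of a sampled Betti curve (trapezoidal rule)."""
    curve = np.asarray(curve, dtype=float)
    steps = np.diff(np.asarray(thresholds, dtype=float))
    return float(np.sum(0.5 * (curve[1:] + curve[:-1]) * steps))
```

A larger norm indicates that more connected components persist over a wider range of thresholds, which is the sense in which the summary measures topological complexity.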

Results:
The numerical results are presented in Table 1, with example edge maps given in Figure 5. As can be seen, the best magnitude model (Model I) is on par with the Sobel edge detector, potentially performing only slightly worse. This is surprising as, unlike the Sobel method, the magnitude is not a purpose-built edge detector. The Canny edge detector also did not perform well according to our evaluation metrics. However, this can be explained by the fact that the Canny edge detector does not provide a probabilistic edge map; rather, it outputs a binary image with an already-implied hard decision boundary between edges and non-edges. Models II and III performed worst among all magnitude-based models. This could be attributed to the limited number of training examples used. The vanilla magnitude performed surprisingly well, although slightly worse than the optimised magnitude models. There seemed to be very little difference between the random and single-shot scenarios, hinting at the fact that single images provide enough diversity to obtain a good latent space embedding. As expected, all pixel-level methods performed worse than the modern convolutional neural networks. This can be explained by the fact that more global information is available to these models. One noteworthy aspect is that the current model embeds single pixels into a latent space, therefore fine-tuning the distance between the pixels, but not taking into account more global information. This could be optimised in future versions of the magnitude edge detector. We also reiterate that no rigorous optimisation of the autoencoder or feature-extractor architecture was performed. The results provide a clear indication that careful optimisation of these parameters can lead to substantial performance increases (or decreases).
Qualitatively, the Sobel and magnitude edge maps are very similar; however, the magnitude edge maps are usually slightly darker. This could also be attributed to our postprocessing steps, in particular to taking the absolute value of the magnitude vector elements before min-max scaling. Some details are lost in the local approximation; however, strong edges are still present.

Conclusion
In this paper, we introduced the magnitude vector of images. We started by outlining three different theoretical models for images and explained how they relate to each other. We then proceeded to show how the magnitude measure could be obtained for each of these image models and proved some foundational results.
The main theoretical contribution of this paper consists of explicitly deriving the magnitude measures for one-dimensional digitised images with the $\ell_1$ metric. Specifically, we showed that the magnitude measure is mostly constant throughout the image, except at the step locations of step functions, where it is singular (with a measure corresponding to the CDF of an exponential distribution). This allowed us to theoretically motivate the ability of the magnitude measure to perform edge detection. We also considered two-dimensional images. However, due to the analytical intractability, approximation strategies were introduced. Based on these analytical results, we developed a patched speedup algorithm, which makes the magnitude vector calculation of high-resolution images feasible. We also considered refinements to the metric by introducing the notion of a pullback metric.
We performed a number of experiments validating our theoretical results; in particular, we empirically showed the validity of the patched magnitude calculation. In the final part of this paper, we presented results on the edge-detection capabilities of the magnitude vector with and without trained latent-space embeddings. The results of the experiments are promising, and we found that the magnitude edge detector is approximately on par with the popular Sobel method, while still exhibiting substantially different topological, i.e. connectivity, properties.
These proof-of-principle experimental results open up a number of avenues for future research, both theoretical and experimental. Major theoretical advances could consist of finding better approximations for the magnitude calculation in order to circumvent the matrix inversion step. Future experimental research could be directed towards finding better feature extractors or alternative metric learning procedures. In particular, it would be beneficial to be able to harness non-local pixel information.

Definition 2.3.
Consider a finite metric space $(B, d)$ of cardinality $|B| = n$ with metric $d$. Denote its similarity matrix by $\zeta_B$ with inverse $\zeta_B^{-1}$. The magnitude vector of element $B_i$ is given by w
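As a small numerical illustration of this definition, the sketch below assumes the usual construction in the magnitude literature: the similarity matrix has entries $\exp(-d(B_i, B_j))$, and the magnitude vector is the vector of row sums of its inverse, i.e. the solution $w$ of $\zeta_B w = \mathbf{1}$. The function names are hypothetical, and the $\ell_1$ distance helper is only one possible choice of metric.

```python
import numpy as np


def l1_distances(points):
    """Pairwise l1 distances between the rows of an (n, k) array."""
    points = np.asarray(points, dtype=float)
    return np.abs(points[:, None, :] - points[None, :, :]).sum(axis=-1)


def magnitude_vector(dist):
    """Magnitude vector of a finite metric space given its distance matrix.

    Assumes the similarity matrix zeta = exp(-dist). Solving zeta w = 1 is
    equivalent to summing the rows of the inverse, but avoids forming the
    inverse explicitly.
    """
    zeta = np.exp(-np.asarray(dist, dtype=float))
    return np.linalg.solve(zeta, np.ones(zeta.shape[0]))


# Usage: three equally spaced points on a line.
weights = magnitude_vector(l1_distances([[0.0], [1.0], [2.0]]))
magnitude = weights.sum()  # the magnitude is the sum of the vector entries
```

For the three collinear points, the two end points receive a larger weight than the interior point, which is the boundary effect that motivates using the magnitude vector for edge detection.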

Theorem 2.12 ([4, Theorem 2]). Let $\mu$ be the Lebesgue measure of a line segment $L_{[a,b]}$, $[a, b]$, and let $\delta_a$ and $\delta_b$ be the Dirac measures at the respective end points. Then the magnitude measure $\nu$ on $L_{[a,b]}$ is given by ν

Figure 1: An illustration of the magnitude calculation of a two-channel, two-pixel, one-dimensional image. The solid lines represent the step functions, the dashed blue line is the numerical magnitude and the dotted orange line represents the theoretical magnitude.

Theorem 2.21. Let $\mu$ be the Lebesgue measure on the real line and $\nu(L_{[a,b]})$ be the magnitude measure on a line segment $L_{[a,b]}$. Then a magnitude measure on a bounded plane $P_{[a,b]\times[c,d]}$ is given by

Figure 2: Two 1D images. The brightness channel is constant across at least one of the dimensions.

/* Find magnitude vector of each patch */
mag_patches = []
for patch in patch_array do
    zeta
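A patched magnitude computation in the spirit of the fragment above can be sketched as follows. This is not a reproduction of Algorithm 1: it assumes a single-channel image, non-overlapping square patches, pixels represented by (row, column, intensity) under the $\ell_1$ metric, and a similarity matrix $\exp(-d)$. The function names and the absence of overlap handling are simplifications of this sketch.

```python
import numpy as np


def pixel_points(patch):
    """Represent every pixel as a point (row, column, intensity)."""
    rows, cols = np.indices(patch.shape)
    return np.stack([rows.ravel(), cols.ravel(), patch.ravel()],
                    axis=1).astype(float)


def magnitude_vector(points):
    """Solve zeta w = 1 with zeta = exp(-l1 distance) (assumed metric)."""
    dist = np.abs(points[:, None, :] - points[None, :, :]).sum(axis=-1)
    return np.linalg.solve(np.exp(-dist), np.ones(len(points)))


def patched_magnitude(image, patch_size):
    """Compute the magnitude vector patch by patch and stitch the results.

    Each patch needs a linear solve of size (patch pixels)^2 instead of
    one solve of size (image pixels)^2, which is where the speedup
    comes from.
    """
    image = np.asarray(image, dtype=float)
    out = np.zeros_like(image)
    height, width = image.shape
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = image[top:top + patch_size, left:left + patch_size]
            mag = magnitude_vector(pixel_points(patch))
            out[top:top + patch_size, left:left + patch_size] = mag.reshape(patch.shape)
    return out


# Usage: an 8 x 8 image split into four 4 x 4 patches.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
approx = patched_magnitude(image, patch_size=4)
```

Choosing `patch_size` equal to the image size recovers the exact magnitude vector under these assumptions, which gives a simple way to check approximation quality on small images.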

Figure 3: Benchmark experiments performed on the 50 test images of BIPEDv2. We test the computational speedup as well as the approximation quality of Algorithm 1 and the rank-1 and local approximations outlined in Subsection 2.3.2.
The image transformer module

Figure 4: A graphical overview of the magnitude edge detector. During training, we train the autoencoder (and optionally the feature extractor) as presented in 4a, and during inference, we use the image transformer of 4b.

Figure 5: Example edge maps taken from the test set of BIPEDv2. We see that the ground truth annotation of the images is not always exact and, therefore, any pixel-level evaluation should be taken with care. We compare the Sobel filter output, our best-performing magnitude model and the local approximation of the vanilla magnitude. The colours have been inverted for better visibility.

Figure 6: Topological complexity in terms of the norm of Betti curves for magnitude (red) and Sobel edge images (blue).

Table 1: The edge detection performance of our magnitude model and baselines.