Spline-base d me dial axis transform representation of binary images (cid:2)

Medial axes are well-known descriptors used for representing, manipulating, and compressing binary images. In this paper, we present a full pipeline for computing a stable and accurate piece-wise B-spline representation of Medial Axis Transforms (MATs) of binary images. A comprehensive evaluation on a benchmark shows that our method, called Spline-based Medial Axis Transform (SMAT), achieves very high compression ratios while keeping quality high. Compared with the regular MAT representation, the SMAT yields a much higher compression ratio at the cost of a slightly lower image quality. We illustrate our approach on a multi-scale SMAT representation, generating super-resolution images, and free-form binary image deformation. © 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )

B-splines [17] to automatically compute a compact spline representation of the MAT of a 2D binary shape. Representing MATs with splines requires fewer (control) points to store than contour-based encoding, which can help image compression. However, this method does not offer a separate control of the MAT simplification and spline approximation.
In this paper, we propose an alternative approach to Zhu et al. [16] that extracts pixel-based MATs directly from binary images and represents them with splines. Our Spline-based MAT (SMAT for short) has the following features: • Generality: SMAT can directly treat any pixel (binary) image, without requiring the extraction of a densely-sampled vectorrepresentation of the shape contour. • Algorithm advancement: We propose a more refined splinefitting scheme, which includes adaptive B-spline fitting and a merge-split algorithm. • Computational scalability: We inherit the real-time performance of the underlying GPU-based MAT extraction, allowing for high-throughput image processing applications. • Evaluation: We show that SMAT effectively represents binary images with high accuracy and high compression ratio by measuring five quality metrics. • Applications: We demonstrate the potential of SMAT by applications in super-resolution image generation, multiscale MAT representation, and free-form image deformation.
The remainder of the paper is organized as follows. Section 2 presents related work in skeletonization, medial axis computation, and B-spline modelling. Section Table 1 MAT computation methods as a function of the representation of the input I and its MAT S I , with our method (SMAT) indicated.
Rep. of ∂I raster vector Rep. of S I raster distance field, thinning distance field vector SMAT Voronoi methods the construction of SMAT. Section 4 evaluates SMAT on various images and against five quality metrics. Section 5 presents three applications of SMAT. Section 6 discusses our proposal. Finally, Section 7 concludes the paper.

Related work
We structure related work into the computation of MATs ( Section 2.1 ), medial stability and accuracy ( Section 2.2 ), and spline representation models ( Section 2.3 ).

Medial axis computation
As an intrinsic shape representation, the notion of medial axes was first introduced by Blum [18,19] defined as the locus of centers of maximal disks contained in a shape. Formally, for a binary image I ∈ R 2 with boundary ∂I, the distance transform DT I is defined The medial axis, or skeleton, of I is then defined as That is, the medial axis S I of I is the set of points inside I with at least two different closest points, f 1 and f 2 , on the boundary ∂I. The points f i are also called feature points of the medial point x [20,21] . The medial axis transform MAT of I is the set { x ∈ S I , DT I (x ) } of medial points and their maximal disk radii. The union of these maximal disks exactly reconstructs I, which makes the MAT a dual representation of shape. For a full discussion of MATs and their properties, we refer to [22,23] . Analytic solutions of Eq. (2) are very hard to compute, and require analytic descriptions of ∂I, which are in general not available. Hence, one computes the MAT by approximating both the input boundary ∂I, but also the MAT itself. Two such main approximations exist: Raster methods represent ∂I and/or S I on a fixed pixel grid; vector methods represent ∂I and/or S I by a piecewisecontinuous description in R 2 , e.g. , using polylines or higher-order curves. With this model, existing MAT computation methods can be further classified as follows (see also Table 1 and related surveys [23,24] ). Morphological thinning methods [25] represent both the image and MAT on pixel grids. They erode I inwards with constant speed until left with a one-pixel-thin connected structure representing S I . However, such methods do not in general guarantee that S I is centered within I, i.e. , DT I can be poorly approximated.
Geometric methods [26,27] find S I as a subset of the edges of the Voronoi diagram of a polyline representation of ∂I. However, such methods require specific sampling conditions for the points describing ∂I to be met [28] . The method of Zhu et al. [16] also falls in this class. While very accurate and compact in representation, finding a vector representation of ∂I for shapes provided in raster (image) form is not evident. Finally, distance field methods [20,21,[29][30][31][32] compute a raster representation of DT I from either raster or vector representations of ∂I, and next find S I along singularities of DT I . Most current MAT methods fall in this class, given that DT I can be estimated exactly and in linear time [20,21,32] . Such methods can be further accelerated on the GPU, yielding real-time MAT computation [33,34] .
Our method (SMAT) combines the advantages of distance field methods (it accepts any binary image as input, has real-time performance, and computes DT I accurately) with those of a vector representation of S I (built-in smoothness, compact storage). SMAT is, to our knowledge, the first (and thus only) method producing vector representations of the MAT from raster representations of input shapes ( Table 1 ).

Medial axis stability and accuracy
Medial axes S I estimated from Eq. (2) are notoriously unstable [24] : Small perturbations along ∂I, created e.g. by sampling inherent to both raster and vector representations, introduce many so-called spurious medial branches, which contribute little (or not at all in practice) to the description of I, but considerably complicate S I . Effort has been invested in simplifying medial axes, by removing (parts of) the spurious branches, to make them stable. However, a simplified S I cannot exactly represent, or encode, I. Hence, accuracy (of representing a shape) and stability (of MAT computation) are related, but competing goals. We classify attempts to improve stability and accuracy into two groups, as follows.
Medial-axis-based methods aim mainly to compute a stable, or regularized, ˜ S I by removing spurious branches from S I . For this, one can estimate the so-called importance ρ(x ) of every medial point x ∈ S I , as the boundary length between the feature points f 1 and f 2 of x . Only medial points with ρ(x ) above a user-given threshold are taken over from S I into ˜ S I . Importance thresholding is simple to implement for both raster [31,32] and vector [26,27] medial representations, delivers connected skeletons, and has an intuitive interpretation: the image ˜ I reconstructed from ˜ S I replaces all bumps along ∂I shorter than the threshold by circular arcs, effectively acting like a low-pass noise-boundary filter. The salience metric [35] replaces ρ(x ) by which removes only the noise-induced medial branches. This yields reconstructions ˜ I where small-scale boundary bumps are smoothed out, but important (salient) corners are kept untouched. Other similar metrics include the angle between feature vectors [21,[36][37][38] and divergence of the distance transform [39] . Such metrics have proven very effective in producing simplified stable skeletons (also for 3D shapes [40] ). However, they optimize for accuracy only implicitly , given the fact that spurious branches represent only small-scale details along ∂I. Distance-based MATs have been also used to compress grayscale images based on the (simplified) MATs of their threshold-sets [41,42] .
Reconstruction-based methods approach the joint stabilityaccuracy problem by maximizing reconstruction accuracy, i.e. , the difference between I and ˜ I [43] . This can be estimated using the Hausdorff distance [44] between (sampled representations) of ∂I and ∂ ˜ I , defined as where h (A, B ) is the one-sided Hausdorff distance given by The MAT method of Zhu et al. [16] uses this approach to compute the simplified ˜ S I by iteratively removing endpoints from S I , continuously checking their reconstruction error ( Eq. (5) ) and stopping when this reaches a user-allowed level. This is computationally expensive. Moreover, while H is a recognized metric for Five rectangular shapes with different amounts of noise added to the boundary (a1-a5) and the comparisons (b1-b5) between the reconstructions (grey regions) and the original boundaries (black contours) of (a1) and (a5).

Table 2
Accuracy evaluation of the MATs for the images in Fig. 1  comparing contours, it is sensitive to noise or outliers [45] . This can prevent regularization: we cannot remove spurious branches from S I since they will create small changes in ˜ I that H highly penalizes. This can be improved by replacing Eq. (4) by a variant using the average operator, i.e. , leading to the average Hausdorff distance Besides considering the similarity of boundaries , one can take the overall shapes into account. The Jaccard similarity coefficient [46] achieves this by considering the size of the intersection, normalized by the size of the union, of two shapes Jaccard(I, ˜ The Multi-Scale Structural SIMilarity (MS-SSIM) index [47,48] provides an advanced top-down model of how the human visual system interprets images. Although designed to measure the similarity of grayscale images, it can also be used for binary images. Both Jaccard and MS-SSIM range in [0,1], where 1 indicates the two input shapes are exactly the same, while 0 means the two are completely different. Fig. 1 (a1-a5) shows five rectangular shapes with randomly added noise on their boundary, and their simplified medial axes for σ 0 = 1 . 5 . We see that, as the noise increases, the simplified medial axes change little, and are thus quite stable to noise. Table 2 shows the above four quality metrics H, H , Jaccard, and MS-SSIM for the reconstructed images. The more noise on the boundary, the larger H and H , and the lower the Jaccard and MS-SSIM scores. To give an intuitive understanding of these values, Fig. 1 (b1-b5) compares the reconstructed images (gray) with the original boundaries (black) of (a1-a5), respectively. For (b1), the two almost overlap, yielding a very small H = 0 . 08% of the image diagonal, and Jaccard and MS-SSIM scores above 0.99. The reconstruction in (b5) preserves the salient features of the original image while removing small-scale boundary noise. This still yields a small H = 0 . 45% of the image diagonal, and high Jaccard and MS-SSIM scores.

B-spline representation
Storing and manipulating simplified medial axes ˜ S I and their corresponding MATs can benefit from the observation that such structures correspond to piecewise smooth curves (branches) [23] . Following this, Yushkevich et al. [17] first proposed to model the MAT with cubic B-splines. However, their method needs to build a template continuous medial representation model manually , which is then manipulated to fit a target shape. Zhu et al. [16] improve this by proposing a fully automatic way to represent MATs with Bsplines. Yet, their method handles only vector representations (of both the shape and its medial axis). In contrast, our method uses raster representations for both I and S I ( Section 3 ), and converts the latter into a vector representation using splines. This (1) makes our method directly applicable to any binary image, without the need to extract a piecewise-continuous contour ∂I with sampling guarantees; and (2) provides vector-based medial representations for any raster-based MAT computation method, in contrast to [16] , which only works with the Voronoi-based MAT method of [27] .
B-splines are a common and preferred way of specifying very smooth curves ( C (d−1) continuity for degree d) in computer graphics and geometric design [49] . Given n + 1 control points (CPs) p 0 , . . . , p n and a knot vector U = [ u 0 , . . . , u m ] , the B-spline curve of degree d is defined as [50] The functions N i,d (u ) are the B-spline basis functions , defined recursively via A B-spline curve given by n + 1 control points, m + 1 knots, and degree d must satisfy m = n + d + 1 . The knot vector is either open or periodic . In this work, we use open-uniform knot vectors, given by where a and b are usually set to 0 and 1 respectively. This allows generating a B-spline curve based only on a set of n + 1 control points and degree d with 0 < d < n + 1 .

Proposed SMAT representation
We compute our SMAT representation as follows (see also Fig. 2 ). We start with a binary image I as input. For simplicity of exposition, we next consider I has a single foreground connected component (black pixels in Fig. 2 a); in practice, our implementation handles multiple such components, one at a time. We next compute the full MAT (S I , DT I ) using the method in [31] . However, any other raster-based MAT computation can be used as well. Next, we compute the simplified medial axis by upper-thresholding the salience metric σ ( Section 2.2 ) by a user-specified value σ 0 . The simplified MAT is given by where DT I is the restriction of DT I to the pixels of S I . We use the simplified MAT as input for our SMAT construction. For this, we segment S I into separate branches ( Section 3.1 ) and fit them using splines ( Section 3.2 ). Finally, we encode the entire set of spline control points efficiently ( Section 3.3 ). The resulting encoding can be then used to reconstruct an approximation ˜ I of the input shape I ( Section 3.4 ).

Medial axis segmentation
To carry out the piecewise B-spline fitting, the simplified medial axis should be segmented into branches ( Fig. 2 , Step 3). Algorithm 1 outlines the segmentation procedure. First, we clean up S I so it is 8-connected. We next characterize medial points by the number of neighbors in S I they have, as follows: Branch endpoints have a single neighbor; regular points have two neighbors; and branch junctions have three or more neighbors. We then find an endpoint (or an arbitrary point if no endpoints exist) x of S I and start tracing along the medial axis from there, adding the discovered MAT points (y , DT I (y )) to the current branch B . When arriving at a junction or endpoint, we add B to the branch-set B and, if at a junction, start tracing new branches from all medial neighbors n of the current point y .

B-Spline fitting
To each branch B i = { (x ∈ S I , DT I (x )) } found in the branch-set B = { B i } , we fit a B-spline curve C i using a least-squares algorithm ( Fig. 2 , step 4). For details of the least-squares fit, we refer to [51] . Given a user-provided fitting error γ 0 between B i and C i , we compute the minimal number of needed control points N and spline degree d by an adaptive algorithm ( Section 3.2.1 ). We further minimize the number of needed splines C i across several branches by a merge-split algorithm ( Section 3.2.2 ).

Adaptive-degree fitting
Although it is common to use quadratic ( d = 2 ) or cubic ( d = 3 ) B-splines to approximate a set of points, we compute d so as to get the lowest number of control points N needed to reach an error below the user-given threshold γ 0 . Computing d follows the con-   Table 3 Number of control points N needed to fit each of the four medial branches for the quadratic, cubic, and adaptive schemes in Fig. 3 . Values in brackets give the degree d of each B-spline.

Branch
Quadratic Cubic Adaptive 19 (2) 14 (3) 11 (5) Total 41 37 30 by √ 2 γ max [52] , where γ max is the maximal fitting error over all splines C i . Thus, users can set γ 0 either directly -to control the medial axis approximation error -or based on the desired boundary approximation error H(I , ˜ I ) .  Table 3 lists the number of control points and degree for the four medial branches of the shape, for each of the above three splinefitting schemes. For long and curved branches, like B 4 (black), our scheme uses a high degree d = 5 to reduce the required N; for short and straight branches, like B 3 (green), our method reduces N by using a lower degree d = 1 than the quadratic and cubic schemes. Overall, our adaptive-degree B-spline fitting saves about 20% control points.

Merge-split algorithm
Algorithm 2 describes the adaptive B-spline fitting for a single branch. When the medial axis S I is only slightly simplified, S I contains many short branches corresponding, as outlined in Section 2.2 , to small-scale details along ∂I. Encoding these requires many control points, so we propose to merge these short branches to alleviate this. For each branch-fragment A , we find all fragments where N A + B i is the number of control points needed to fit the merged branch A ∪ B i , and N A and N B j are the number of control points required for branch-fragments A and B j , respectively.
A separate issue is that there may be long and curved branches which are difficult to fit with a B-spline, requiring more than N max (set to 15 in practice) control points. We address this by splitting such branches before fitting.
be the branch to be split; then B i 1 = { (x k ∈ S I , DT I (x k )) } n/ 2 k =0 and B i 2 = { (x k ∈ S I , DT I (x k )) } n k = n/ 2+1 are the two branches obtained by splitting B i in half . We consecutively split long and curved branches in half until all resulting branch-segments can be fitted by Algorithm 2 with splines with fewer than N max control points, and the fitting error is under the user-specified γ 0 .
To illustrate the effect of the merge-split (MS) algorithm, Fig. 4 compares SMAT results for a lizard shape (taken from [16] ) when applying the MS algorithm (b) and not applying it (a). We set γ 0 = 0 . 35% of the diagonal of the image. MS creates a SMAT using only 57 control points instead of the 65 required if one B-spline per MAT branch is used. The negative numbers in Fig. 4 show the drop in control points, per branch, due to MS. The long and curved branch along the tail is split into two segments (black and green). Fewer control points are needed after splitting since the two segments can be separately fit with splines using different degrees. For short and straight branches, like the ones corresponding to a finger of the fore leg, merging them into one branch also decreases the control point count. Additionally, using MS greatly reduces the total number of splines from 27 to 20 used since several connected MAT branches can be approximated by a single spline. More information on the advantages of MS for reducing the information needed to store the SMAT is given in the supplementary material.

Reconstruction
We reconstruct the approximation ˜ I of I from the SMAT representation ( Section 3.3 ) as follows. For each spline b i , the open-uniform knot vector can be determined following Eq. (11) .
The basis functions N i,d (u ) are next computed using Eq. (10) , followed by generating the B-spline ( Eq. (9) ). Each such spline is next rasterized on the desired pixel grid, using either the original image resolution (w, h ) or, if desired, a higher resolution (see next Section 5.1 ). For rasterization, we first split each branch b i into Bézier (polynomial) segments. This uses knot insertion to ensure that each internal knot has multiplicity equal to d i , which is done efficiently using the Oslo algorithm [53] . Each Bézier segment is then rasterized using adaptive binary subdivision based on de Casteljau's algorithm [50] . The adaptive subdivision proceeds until the maximum distance of all inner Bernstein-Bézier control points from the line-segment given by the end-points of the current (sub-)segment is below pixel precision, at which point the (sub-)segment is drawn using Bresenham's line-drawing algorithm based on the mentioned end-points.
Once the medial pixels x i with DT values DT i are evaluated this way, we reconstruct ˜ I as the union of disks with centers x i and radii DT i , using the efficient procedure for computing this union from [31] .

Implementation
We implemented the MAT simplification ( Eq. (12) ), Algorithms 1 and 2 , the SMAT encoding ( Section 3.3 ), and spline rasterization ( Section 3.4 ) in C++. We compute exact Euclidean MATs and also reconstruct the initial image from a rasterized SMAT using the public CUDA implementation provided at [54] . Our entire method, including source code, datasets, and evaluation scripts, is publicly available [55] .

Evaluation methodology
SMAT depends on two parameters ( Fig. 2 ): the salience threshold σ 0 , which gives the simplification of the medial axis S I , and the tolerance γ 0 that tells how accurately B-splines fit medial branches. We evaluate SMAT based on two factors: Similarity Q of the reconstruction ˜ I provided by SMAT ( Section 3.4 ) to the original shape I. Here, Q stands for any of the metrics H, H , MS-SSIM, and Jaccard ( Section 2.2 ).
Compression ratio CR that measures how more compact SM( ˜ I ) is as compared to I. We define CR = | I| / | SM ( ˜ I ) | . Here, SM ( ˜ I ) is the size (in bytes) of the SMAT storage scheme outlined in Section 3.3 .
In contrast, | I| , i.e. , the storage needed for shape I, can be defined in many ways, depending on how I is represented, e.g. by chain coding [2] , geometrical approximation [3] , or bitmap-based encoding [11,12] . We next model | I| as the size (in bytes) of the contour ∂I. Table 4 Evaluation of SMAT for an animal shape. Given the above, the total quality of SMAT can be modeled as To find a good trade-off between Q and CR, we do a grid-search over σ 0 and γ 0 . We use σ 0 ∈ { 0 . 1 , 0 . 5 , 1 . 0 , 1 . 5 } as this range was indicated in the original salience paper [35] as producing MAT simplifications that remove small-scale noise but keep salient de-  Table 4 shows the resulting values for all four similarities Q, the compression ratio CR , and also the total number of control points N in the SMAT, for all values of σ 0 and γ 0 , for a binary image also used in [31,39] . Several other examples are available in the supplementary material.
To understand the trends in Table 4 , we use next two scatterplots ( Figs. 5 ) showing H vs CR and MS-SSIM vs CR , respectively. The plots of H vs CR and Jaccard vs CR are similar to Fig. 5 (a) and (b), respectively, and are not included for space constraints. In each plot, the sixteen points correspond to combinations of (σ 0 , γ 0 ) values. We encode σ 0 in four base colors (hues), and γ 0 in the size of the bullets. Fig. 5 (a) shows a roughly direct correlation of H with CR , while Fig. 5 (b) shows an inverse correlation of MS-SSIM with CR , as expected. In general, the larger σ 0 , the more simplified S I , so the lower the MS-SSIM and the higher the H and CR . Overall, the quality has not decreased much, but the CR has been greatly improved. Under a certain σ 0 , γ 0 determines how close the B-spline is to the current skeleton. Although there is no very strict trend, in general, with the increase of γ 0 , Q tends to decrease while CR increases.

Joint compression-quality evaluation
We next evaluate SMAT's ability to compress images and retain similarity by a benchmark containing 30 images of five different types ( Table 5 and Fig. 6 ), mainly selected from the MPEG-7 benchmark [56] . To optimize both reconstruction similarity Q and CR for  Table 4 . each shape, we next set Q = MS-SSIM , as we argue that, from the set of four considered similarity metrics, MS-SSIM best represents how humans perceive two shapes I and ˜ I as being similar. Also, we combine MS-SSIM and CR into a simple joint quality metric where CR for a given shape is its CR value normalized by the maximal CR over all shapes in the benchmark. This way, both CR and MS-SSIM range in [0,1], so can be combined in Eq. (15) . The optimization is the result that maximizes Q over all 16 studied (γ 0 , σ 0 ) values. Fig. 6 shows the obtained results. Rows indicate shapes in the five benchmark classes (see Table 5 ). Cyan shows the reconstruction ˜ I with optimal Q . Black outlines show the boundaries ∂I of the input shapes. In all cases, the reconstruction is visually almost identical to the original shape, also confirmed by the high MS-SSIM values. Compression values CR also are in general quite high, less so for shapes having many small-scale details such as (d6) and (e6). Images (a6), (e1), and (e6) show that SMAT can handle shapes with holes with no problem. Relative boundary length, defined as boundary length | ∂I| divided by the diagonal of the image, is colorcoded in the legend in Fig. 6 . Fig. 7 summarizes the SMAT performance on the 30 images in Fig. 6 by a scatterplot of MS-SSIM vs CR . Colors indicate the relative boundary length just as in Fig. 6 . We see a slight inverse correlation of high CR and MS-SSIM with the relative boundary length, which is expected: shorter boundaries have, on average, fewer details, so are easier to encode by SMAT. This is also visible in the fact that complex shapes (types d and e) reach lower CR and MS-SSIM, while simpler shapes (types a and b) reach higher CR and MS-SSIM. We also see the relatively strong effect that even tiny boundary details have on CR : Shapes a1, b1, b4, and b2 are arguably of very similar visual complexity and all have short boundaries (dark blue in Fig. 7 ). All compress with a very high MS-SSIM > 0 . 985 . However, their CR 's vary between 37.2 (a1) and 133 (b2). This is caused by tiny, pixel-size, noise, present e.g. along a1's boundary, but largely absent for the other three shapes. To keep such tiny details, we need to keep many branches in the MAT.

Comparison with compressed MAT representation
Besides comparing SMAT with the ground-truth I, it is interesting to compare it with other methods that provide compressed MAT representations. One such method [41] essentially uses the same MAT extraction [31] and simplification [35] as SMAT, but next compresses the pixel-chains in S I using delta encoding rather than B-splines as we do. Fig. 8 shows the average MS-SSIM vs CR for the five shape types in our benchmark for SMAT and the delta method in [41] . For that method, we define is the size (in bytes) needed to store S I with delta encoding for a binary image, which is an efficient way to store 8-connected pixelpaths. As in our case, ( Section 4.1 ), | I| denotes the size (in bytes) needed to store ∂I. In the figure, the larger the γ 0 (the larger the filled dots), means the larger the approximation error of the spline, so the lower the quality, the higher the CR . When γ 0 equals 0.002, the MS-SSIM score of SMAT is only 0.002 lower than the one of the delta encoding, but SMAT yields CR values 2 up to 6 times higher.

Comparison with Zhu et al. [16]
Result quality: Fig. 10 compares SMAT with Zhu et al. [16] , which, as mentioned earlier, is the most similar method (in aims) to ours. This comparison must however be done carefully. As mentioned ( Section 2.3 ), their input shape boundaries ∂I must be carefully (densely) sampled to capture the boundary topology faithfully [28] . How this is done is not further detailed. Also, they use for reconstruction error Q the one-sided Hausdorff distance ( Eq. 5 ) from these sample points P ∈ P of ∂I to the boundary ∂ ˜ I of the reconstructed shape. This leads to a smaller distance value than if all points of ∂I were considered. Fig. 9 illustrates this. If the (one-sided) Hausdorff distance is defined as h (I, ˜ [16] , then some of the distance values (like l in the figure) will be less than (maximally equal to) val- In contrast to the above, Fig. 10 (a2-c2) shows the two-sided H ( Eq. 4 ), and considers every pixel on both boundaries ∂I and ∂ ˜ I . We used here images extracted from the paper [16] , since no data or code for that method was available. In this image, is the onesided Hausdorff distance used by [16] explained above. The other metrics are ours, explained earlier. Overall, SMAT produces results that are very similar to Zhu et al., but requires 15% fewer control points N. Importantly, H, the double-sided Hausdorff distance we use for SMAT, is always larger than the one-sided Hausdorff distance used by Zhu et al., by definition -meaning that our test for accurate reconstruction is more stringent than the one in Zhu et al. Parameters: Zhu et al. offer a single parameter ˆ which controls both the MAT simplification and the spline fitting. Small ˆ values, thus, will keep most of the raw MAT branches and also tightly fit B splines to them. Conversely, large ˆ values will simplify the MAT significantly and fit B splines looser. In contrast, we  [27] ) to the simplified MATs takes tens of seconds for a shape of 512 2 pixels. In contrast, our total time, from receiving I up to computing SM( ˜ I ) , is only tens of milliseconds (see Fig. 10 a2-f2).   9. Analysis of Hausdorff distance used in [16] . P is one of the sample points of the original shape boundary ∂I. ∂ ˜ I is the boundary of the reconstructed shape.

Applications
In addition to the shape or MAT compression ( Section 4 ), SMAT also provides other useful results. We outline here superresolution medial axes ( Section 5.1 ), multiscale SMAT representations ( Section 5.2 ), and shape manipulation by MAT control points ( Section 5.3 ).

Super-resolution
As B-splines are smooth, one can rasterize them at any resolution in the reconstruction step ( Section 3.4 ). This does not incur any extra storage: the same SMAT representation ( Section 3.3 ) can be used. Fig. 11 shows the effect of increasing the resolution by 10 times on a jagged shape. The top images show the SMAT reconstruction of a shape at its original resolution ( 200 × 200 pixels). Bottom images show the SMAT reconstruction, from the same SM representation , at a 10 times higher resolution than the input image. As visible, the super-resolution reconstruction removes the discretization artifacts of the original reconstruction while keeping the reconstructed boundary ∂ ˜ I (line separating black from white in the image) smooth. To our knowledge, this is the only method providing super-resolution raster MATs apart from [57] . In comparison to [57] , SMAT is fully generic and simpler, i.e. , does not need to use special (image-based) interpolation tricks to create the super-resolution MATs.

Multiscale SMAT representation
SMAT uses as input the simplified MATs S I of its input shapes I ( Section 3 ). These simplified MATs depend on the user-provided salience parameter σ 0 . In practice, this means that if a user simplified S I too much (by setting σ 0 too high), one would have to re-run the entire SMAT pipeline, which is tedious and time consuming. We improve this by proposing a multiscale SMAT representation.   Instead of simplifying the MAT ( Fig. 2 , Step 2), we encode the full MAT together with the importance ( ρ) value at each MAT point as a 4D curve-set (S I .x, S I .y, DT I , ρ) , using B-splines. As we encode both DT I and ρ, we can next compute the salience σ using Eq. 3 .
A single multiscale SMATs allows generating an entire family of progressively simplified MATs, and their corresponding reconstructions, by running only the σ 0 thresholding on the decoded S I produced by Step 6 in Fig. 2 , followed by reconstruction (Step 7). Fig. 12 (b) shows the 4D data (S I .x, S I .y, DT I , ρ) in red, fitted by B-splines (green) for the jagged shape in Fig. 11 . Videos showing additional results on this experiment are in the supplementary material. Figs. 12 (c1-c6) show the gradually-simplified MATs S I and corresponding reconstructions ˜ I for increasing thresholds σ 0 .
Multiscale SMAT is cost-effective: obtaining the six simplifications in Fig. 12 (c1-c6) is about five times faster than running six single-scale SMATs. Note that producing such multiscale SMATs is not possible with [16] : Indeed, for any change of the error user parameter ˆ proposed there, the entire pipeline of MAT simplification and spline-fitting has to be re-run, which is expensive, as outlined in Section 4.4 .

Shape manipulation
Manipulating 2D shapes is important in applications like character animation [58,59] and image editing [60] . Several methods such as m-reps [61] , subdivision skeletons [62] , and medial surface deformation [63] have used MATs to this end, by changing the shape via changing its medial axis. Still, picking suitable control points to manipulate the MAT is not easy. In contrast, SMAT allows deforming a shape simply by manipulating the SMAT descriptor. Fig. 13 shows five deformations where we keep the human body and horse rump fixed and change the horse legs by adding ( + c), deleting ( −c), moving ( m ) control points and/or increasing the B-spline degrees ( + d , −d ). A few of the manipulations are outlined next. To overlap the two hind legs and vary the curvature of their joint silhouette (d, e) we increased the number of control points for the leg-to-body connection branch (dark blue) and also reduced the number of control points for these legs. A similar edit was done to the purple branch in (f) to overlap the front legs. To get a curled up right front leg (dark red, image (e)), we reduced the degree of its B-spline to 1 so as to better control it to clearly show each leg joint, and next moved its control points as desired.

Discussion
We next discuss several aspects of our SMAT representation. Ease of use: SMAT has two parameters that affect the trade-off between compression ratio and reconstruction accuracy: σ 0 and γ 0 . Intuitively, σ 0 controls how many small-scale details (bumps) on the shape boundary are removed (MAT simplification); while γ 0 controls how much the MA branches are 'smoothed out' into splines (MAT approximation). Speed: Since our MAT is implemented on the GPU ( Section 2.1 ), the SMAT pipeline is very fast. For a shape of 512 2 pixels, the entire pipeline takes only a few tens of milliseconds on a Linux PC with an Nvidia RTX 2060 GPU. Following [41] , the complexity of the SMAT computation is linear in the number of pixels in the input image. Comparison: It is useful to summarize the differences between SMAT and Zhu et al. [16] . A key point is that Zhu et al. simplify MATs based on the reconstruction error , which naturally yields small errors. SMAT simplifies MATs based on how salient their points are for shape perception. Hence, our reconstructions can have a possibly larger Hausdorff distance H to the initial shape, as we are not explicitly optimizing for H. Both approaches are valid but for different goals: If one wants simplified MATs that reconstruct a shape as close as possible with respect to H, and is fine with computation times of minutes, Zhu et al. is to be used. If one accepts small reconstruction errors in non-salient shape parts, desires a multiscale MAT for simplification with different thresholds, and needs a real-time response, then our method is to be used. Limitations: SMAT cannot (yet) be lossless, i.e. , yield an exact, zero Hausdorff-distance-to-original, reconstruction. To fully reconstruct the input shape, the full , unsimplified, MAT should be used. Fig. 14 (a) shows the full medial axis of a simple rectangular shape. As Fig. 14 (b) shows, the full-MA branches are very close. Algorithm 1 will segment this MA into tens of thousands of very short branches (shown in Fig. 14 (c) with one color per such branch). The spline fitting ( Sections 3.2.1 and 3.2.2 ) will approximate-and-merge such branches, resulting in a SMAT that cannot perfectly reconstruct the input shape. However, SMAT can achieve 0.3% approximation error and an MS-SSIM score of up to 0.99 ( Section 4 ). We argue this is sufficient for most applications such as shape matching, retrieval, and deformation, which do not require perfectly lossless encoding.

Conclusion
We have presented SMAT, a method for encoding the medial axis transform (MAT) of raster (binary image) shapes with B-splines. For this, we simplify raster MATs using a salience metric, segment them into branches and branch-segments, and optimize the fit of a set of B-splines over the resulting segments to minimize the number of required control points while maximizing the spline-to-MAT fit. We evaluated SMAT on a collection of raster shapes of different types and complexities, using several quality metrics for image and shape comparison. Our results show that SMAT can achieve visually indistinguishable results to the original images, while reducing the space needed to store the MAT by one to two orders of magnitude. SMAT has only two simple-to-set parameters: the degree of MAT simplification and desired MAT approximation error. We showed how SMAT enables generating super-resolution images, can capture multiscale MATs, and allows shape manipulation. SMAT is implemented on the GPU making its application real-time for images up to 10 0 0 2 pixels.
We next aim to extend SMAT beyond binary shapes, to encode grayscale and color images and potentially 2D and 3D scalar fields for scientific visualization applications. Separately, we aim to explore the potential of SMAT for image vectorization applications.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.