Interactive image manipulation using morphological trees and spline-based skeletons



Introduction
Image manipulation plays a key role in image processing and computer graphics. Many image modification methods exist [1][2][3][4][5], most of which are based on raster techniques. Images represented in vector form [6] have been shown to be easier and more natural for humans to edit, mainly because vector images are represented using higher-level primitives, often controlled by arrangements of control points with an intuitive and predictable influence on the image.
Recently, Wang et al. [7] generated vector representations of the medial axis transform (MAT) from raster representations of input shapes and explored its potential use for binary image manipulation. While generating very interesting deformation results, their method is limited to binary images and to only the most basic operations. They subsequently extended the spline-based MAT [7] to the spline-based dense medial descriptors (SDMD) [8] to compress grayscale and color images. Equipped with this vector image representation, we now explore its suitability for image manipulation. To this end, we develop an experimental tool that lets users interactively manipulate grayscale and color images. It exploits SDMD to its full potential by giving the user both local and global control over the elements of the representation. Our contributions are as follows:
• Generality: Our tool can directly handle any raster image of any resolution;
• Interactivity: Except for the initial encoding process, which can be pre-computed and is calculated only once, all subsequent manipulations run in real time, giving users instant feedback;
• Applications: We demonstrate the good performance of our tool in a variety of applications, including watermark removal, image deformation, data augmentation for machine learning tasks, artistic effect generation, image rearrangement, and clothing design.
We start by reviewing related work (Section 2), which is followed by a detailed description of our image manipulation tool (Section 3). Then we show concrete applications of our tool (Section 4) and discuss its merits and limitations (Section 5), before concluding this paper (Section 6).
Fig. 1. (a) Free-form deformation example from [1]. (b) Skeleton-based image manipulation in [4]. (c) A physically-based approach example taken from [3].

Related work
We structure related work into two groups: image manipulation methods (Section 2.1) and morphological tree representations (Section 2.2).

Image manipulation methods
Image manipulation has attracted a lot of research over the years due to its popularity and commercial importance. One such application that attracts a lot of attention is image or shape deformation, which can be roughly classified as follows.

Free-form deformation (FFD)
is a popular approach for image (and shape) manipulation [1,9,10]. This method explicitly divides the (image) space into many domains, e.g., lattices [1] and cages [11,12], and manipulates each domain by moving control points defined in it, as illustrated in Fig. 1(a). While allowing precise and flexible control [4,13], setting FFD domains is tedious, requiring the user to laboriously manipulate many control vertices [3]. In addition, FFD methods do not take into account the natural way in which objects move in the real world [14,15].
Skeleton-based approaches are also widely used for shape deformation; they use a pre-defined skeleton to manipulate the input shape [4,16]. Note that this skeleton is not exactly the medial axis used in [7,8]. Rather, it is similar to the bones of a character, see Fig. 1(b). The typical workflow of skeleton-based methods is to bind the components of the character to be edited to a pre-defined skeleton such that each component follows the motions of its associated bones. Skeleton-driven approaches are also commonly used in the deformation of 3D shapes [17][18][19]. While offering intuitive control of 2D or 3D shapes, binding a shape to a skeleton, either manually or automatically, is not a trivial task [17], especially for shapes lacking an obvious bone-and-joint structure, e.g., jellies [3], to mention just one salient example.

Physics-based methods
[3,14,20,21] can be regarded as variants of detail-preserving differential mesh deformation techniques [22], which deform shapes by modeling their rigidity. These methods allow the user to directly manipulate a shape through a click-and-drag interface, as shown in Fig. 1(c), and generate physically natural results by minimizing local shape distortion. However, such methods are computationally expensive, resulting in slow convergence, and require careful tuning of several parameters [14].

Image deformation techniques, as described above, are most suitable for images with sharply delineated and simple shapes. For more complex images, additional image manipulation applications have been investigated. Pérez et al. proposed Poisson Image Editing [2], a gradient-based image manipulation method, which offers a simple and efficient way to perform many operations, such as seamless cloning, contrast enhancement, texture flattening, and local illumination/color changes. Since then, numerous applications have exploited the benefits of working in the gradient domain. Raskar et al. [23] presented a class of image fusion techniques to automatically combine images of a scene captured under different illuminations. Levin et al. [24] proposed a technique for image stitching which combines several individual images that have overlapping regions. Sun et al. [25] formulated the problem of natural image matting as one of solving Poisson equations with the matte gradient field. Finally, Arias et al. [26] proposed a general variational framework for non-local image inpainting. More related work can be found in [27].
In recent years, deep learning-based methods have significantly boosted the performance of image manipulation due to the availability of large amounts of data that one can train on [5,[28][29][30]. These methods mainly focus on a task called image-to-image translation, which aims to convert a specific aspect of a given image into another, ranging from changing the facial expression [5] or hair color [29] of a person to modifying the seasons of scenery images [30]. While yielding amazing image manipulation results, these methods require a significant number of labeled image pairs. To avoid this, Vinker [31] introduced a novel method for training deep conditional generative models from a single image. After training, this method is able to perform challenging image manipulation tasks by modifying the primitive representation. However, this approach requires training a separate network for every image, which can be expensive on large datasets. Furthermore, deep learning-based methods generally do not have a convenient interface for user-interactive operation.
In this paper, we propose an interactive image manipulation method that differs from all the previously described techniques. It integrates two novel elements: the icicle representation for morphological trees (described next) and the SDMD [8] used for image representation.

Morphological tree representation
As is well known, medial descriptors, or skeletons, can only be computed for binary shapes. Thus, in order to represent a grayscale image I with skeletons, we decompose I into n (256 for 8-bit images) binary images (called level sets) by upper thresholding: T_i = {x | I(x) ≥ i}, 0 ≤ i < n. This works efficiently for image compression tasks [7,8]. Yet, when it comes to image manipulation, finer-grained spatial control of each level set is required. Fig. 2 shows an example. The synthetic image (a) considered in the figure contains nested triangles and nested disks; (b) then shows its four upper-level sets. When one wants to remove, rotate, scale, or move those triangles, it is inconvenient to manipulate them individually. To address this, we propose to use the morphological tree representation [32][33][34], which represents hierarchically all connected components of an image. Thus, a morphological tree is a complete representation of an image.
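The level-set decomposition above can be sketched in a few lines (a minimal illustration with toy data; the function and variable names are ours, not part of the SDMD implementation):

```python
import numpy as np

def upper_level_sets(image):
    """Decompose a grayscale image I into its upper level sets
    T_i = {x | I(x) >= i}, one binary mask per gray level i."""
    return [image >= i for i in range(int(image.max()) + 1)]

# A tiny example: three nested squares of increasing brightness.
img = np.zeros((8, 8), dtype=np.uint8)
img[1:7, 1:7] = 1
img[2:6, 2:6] = 2
img[3:5, 3:5] = 3
sets = upper_level_sets(img)
print(len(sets))  # 4 level sets; each one is nested inside the previous
```

A skeleton is then computed per binary mask; note that T_0 always covers the entire image domain.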
The most common morphological tree representations include trees of shapes [35] and component trees [36]. The latter are usually represented by compact and non-redundant data structures such as the max-tree, whose nodes correspond to the elements of CC(T_i), where CC(T_i) denotes the set of either 4- or 8-connected components of the threshold sets T_i. The max-tree representation of Fig. 2(a) is shown in (c). From that, one can either process each triangle shape individually, or, alternatively, all triangles collectively by selecting all descendant nodes of node E. Component trees can be computed and processed efficiently [37][38][39] and are widely used in object recognition [39], 3D segmentation [40], and remote sensing [41].
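The sets CC(T_i) can be enumerated with standard connected-component labeling; the sketch below lists the components that become max-tree nodes (parent links omitted, and efficient implementations use the dedicated algorithms of [37–39] rather than re-labeling every level):

```python
import numpy as np
from collections import deque

def label_components(mask):
    """4-connected component labeling of a binary mask via BFS flood fill."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    n = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and labels[sy, sx] == 0:
                n += 1
                labels[sy, sx] = n
                q = deque([(sy, sx)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = n
                            q.append((ny, nx))
    return labels, n

def threshold_set_components(image):
    """Enumerate CC(T_i) for every upper threshold set T_i: together,
    these components are the nodes of the max-tree."""
    nodes = {}
    for i in range(int(image.max()) + 1):
        labels, n = label_components(image >= i)
        nodes[i] = [labels == k for k in range(1, n + 1)]
    return nodes

img = np.zeros((6, 10), dtype=np.uint8)
img[1:5, 1:4] = 2   # a bright block
img[1:5, 6:9] = 1   # a dimmer block
nodes = threshold_set_components(img)
print(len(nodes[1]), len(nodes[2]))  # 2 components at level 1, 1 at level 2
```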
To interactively manipulate a grayscale image with component trees, many visualization tools have been proposed [42,43]. In such tools, the user either sets parameters for the manipulation task or selects regions in the input image. Next, the tool shows interactively the filtering or segmentation results. However, since max-trees of natural images have tens of thousands of nodes (see the example in Fig. 3), the user only interacts with the image and parameters, and not directly with the max-tree.
To simplify the structure of component trees, Tavares et al. [44] proposed a simplification procedure based on two attributes: the extinction value [45] and the area of nodes. They further improved the simplification by applying an area-difference filter, yielding a more meaningful graphical representation of component trees [46], as shown in Fig. 4. However, the simplified tree is no longer a complete representation of the original image. To alleviate this, we propose a new representation of component trees, icicle plots [47], in Section 3.1. Icicle plots not only contain all the information of the original image, but are also more compact and better organized.

Proposed method
As stated in Section 2.1, our proposed method combines two novel elements: an icicle representation for component trees and interactive spline manipulation. Fig. 5 demonstrates the pipeline of our proposed method. Given a grayscale image, we first compute its max-tree or min-tree, which is next represented as an icicle plot (Section 3.1). All the nodes in the icicle plot are associated with their corresponding spline control points (step 1). Next, we allow users to select single or multiple nodes for subsequent manipulation (step 2). Section 3.2 describes several methods for node selection. The associated connected components and control points of the selected nodes are displayed for interactive spline deformation (step 3), which is described in detail in Section 3.3. Finally, the manipulated image is reconstructed (step 4).

Icicle plot representation
Icicle plots [47][48][49], also called icicle trees, represent hierarchical data in the form of stacked rectangles, usually ordered from top to bottom, following the order of nodes in a tree from its root to its leaves. Compared with other representations, such as node-link visualizations, icicle plots allow an easier reading of the nesting relationships, the areas of the nodes, and various attributes of the nodes, such as, in our case, the grayscale of the encoded objects, their perimeter, circularity, complexity, or the number of skeleton points or spline control points. As such, we choose the icicle plot metaphor to represent the hierarchical component trees of grayscale images. Fig. 6 shows the icicle plot of the max-tree for the synthetic image in Fig. 2. The selected image is deliberately simple, for illustration purposes. Each icicle, or node, corresponds to a connected component (CC) in the upper-level sets of the input image. As visible, the brightest disk (filled with red) in the original image corresponds to the node marked by the red box in the lower right corner of the icicle tree. The slender orange rectangle at the top is the root node, which takes up the entire width. Each child node is placed under its parent with a width proportional to the area of its component. The gray level down to which each node extends is exactly the gray value of its CC; the height spanned by each node on the grayscale bar is thus the gray-level difference between the level of its CC and the previous level. The fill color of each node can be coded by various attributes, e.g., the number of skeleton points, as shown in Fig. 6. Other attributes, including area, perimeter, circularity, and complexity, are also implemented in the tool and available via its user interface.
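The width-proportional layout rule can be sketched as follows (a minimal sketch; the tree, areas, and gray values below are hypothetical toy data, not taken from the paper's figures):

```python
def layout_icicle(children, areas, gray, node, x0, width, parent_gray, out=None):
    """Compute icicle rectangles (x, y_top, width, height): each child is
    placed under its parent with width proportional to its component's
    area; a node's vertical extent runs from its parent's gray level
    down to its own gray level."""
    if out is None:
        out = {}
    out[node] = (x0, parent_gray, width, gray[node] - parent_gray)
    kids = children.get(node, [])
    total = sum(areas[c] for c in kids)
    x = x0
    for c in kids:
        w = width * areas[c] / total  # child widths share the parent's span
        layout_icicle(children, areas, gray, c, x, w, gray[node], out)
        x += w
    return out

# Toy tree: the root contains components A (which nests A1) and B.
children = {'root': ['A', 'B'], 'A': ['A1']}
areas = {'root': 100, 'A': 60, 'B': 20, 'A1': 30}
gray = {'root': 0, 'A': 64, 'B': 128, 'A1': 192}
rects = layout_icicle(children, areas, gray, 'root', 0.0, 1.0, 0)
print(rects['A'], rects['B'])
```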
To decrease the number of pixels needed to render a node of the morphological tree while keeping the nodes visually separated, we draw the rectangles of the plot with no border and apply a shading scheme. In this approach, each rectangle is a quad primitive with an associated HSV base color, which is tessellated by the graphics hardware. During the tessellation process, we compute a luminance profile which replaces the value (V from HSV) channel of the HSV color. To obtain a good visual separation of the nodes, we apply an asymmetrical cushion-like [50,51] luminance profile. This profile is obtained by computing two 1D cubic Bézier curves: one sampled vertically and another horizontally in the quad tessellation process. The final luminance of a pixel is then obtained by multiplying the samples of the two Bézier curves at the parametric (u, v) coordinates produced by the tessellation shader. Fig. 7 graphically shows the process of obtaining the luminance profile from the vertical and horizontal curves (or profiles).
Using this approach, by default, we set the control points to [0.9, 0.9, 0.9, 0.45] for both curves, which results in a high value of luminance at the top left, gradually reducing towards the bottom right. This shading produces a dark bottom-right region that meets the bright top-left regions of its neighboring rectangles, leading to a visual separation between the nodes. An example of tree rendering using these parameters can be found in Fig. 7.
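The cushion profile can be reproduced directly from these control values; the sketch below is a CPU-side illustration of what the tessellation shader computes, multiplying the two Bézier samples:

```python
def cubic_bezier(p, t):
    """Evaluate a 1D cubic Bezier with control values p[0..3] at t in [0, 1]."""
    s = 1.0 - t
    return s**3 * p[0] + 3 * s**2 * t * p[1] + 3 * s * t**2 * p[2] + t**3 * p[3]

def cushion_luminance(u, v, profile=(0.9, 0.9, 0.9, 0.45)):
    """Luminance of a node pixel: the product of the horizontal and
    vertical profiles sampled at the quad's parametric (u, v) coordinates.
    This value replaces the V channel of the node's HSV base color."""
    return cubic_bezier(profile, u) * cubic_bezier(profile, v)

# Bright near (0, 0) (top-left), darkest at (1, 1) (bottom-right).
print(round(cushion_luminance(0.0, 0.0), 3), round(cushion_luminance(1.0, 1.0), 4))
```

The dark bottom-right of one rectangle thus abuts the bright top-left of its neighbor, which is what creates the border-free visual separation.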
Since we modify the color luminance for shading, the rendering uses an iso-luminant color map [52,53] (see Fig. 7, bottom right). Although icicle plots are compact and can display hundreds of nodes, depending on the area of the nodes and window size, the nodes might be displayed too small. To address this issue, we developed zoom in, zoom out, and zoom restore functionalities. For complex images, the user can explore the connected components of the image by clicking around the nodes of the tree, in the pixels of the image, or moving the selection to parent (by pressing the key P) and zoom in to explore related connected components of small details of the image. When the tree is zoomed in on, the user can hold the Alt key for panning around the tree. When the small details editing is finished, the user can bring back the full tree visualization by using the zoom-restore functionality. An example of this process is shown in a video recording in the supplementary material [54] and a zoomed in tree is depicted in Fig. 8.
To sum up, icicle representations clearly show the nesting relationship, the size, the gray level, and other custom attributes of connected components of an input image. In addition, they depict tree nodes using only a few pixels and yet keep a visual separation of neighboring nodes using a shading approach. Having this compact and well-organized representation, we are now ready to perform image manipulation. The proposed interactive image editing tool is the combination of a high-level global manipulation and a more detailed deformation of local components.

Global manipulation
Global manipulation, also considered to be inter-node manipulation, generally includes removing or restoring single or multiple nodes or CCs, which is useful for applications such as image segmentation, local luminance changes, and watermark removal (Section 4.1). Node selection can be implemented manually or algorithmically, as described next.
Manual selection refers to the user directly selecting the node or component that one wants to cut out or restore in the interface. Since the image components in the left window and the nodes in the right window are associated, one can select the part one wants to manipulate by either directly clicking the node in the icicle tree or the component on the image. In both ways, one can select and deselect multiple nodes by holding down the shift key. When a set of nodes are selected, their connected components are painted in red on the image panel, so the user can quickly see the region they are about to edit. The clicking-and-selection operation is straightforward and convenient. Yet, this operation can be cumbersome when there are plenty of nodes to be manipulated. To address this, we added a function to select all descendant nodes of the currently selected node by clicking icon A. Fig. 9(b) illustrates this by manipulating a simple art deco image (a). Deleted nodes are set to translucent. As visible, by discarding all descendant nodes of node D, their corresponding components (the rightmost petals in multiple levels) on the image are also eliminated, which indirectly achieves the effect of local brightness changes. The operation of restoring a node is similar to deleting one. By selecting a deleted (translucent) node and then clicking the node inclusion icon (C), one can restore the node. However, one cannot add nodes that did not exist in the original tree.
In algorithm-based selection, the user sets the number of layers L in the parameter-setting area and then runs the SDMD method, which selects and retains the L most representative layers using the cumulative histogram layer selection scheme. Starting from a layer T↑_i, the method checks the difference between the cumulative histogram values A(T↑_i) and A(T↑_j), with j = i + m. If the difference between A(T↑_i) and A(T↑_j) is smaller than a threshold λ, we increase j until the inequality is satisfied. At that point, we select layer T↑_j and repeat the process until we reach the last layer. To set a suitable λ for a given L, we do an L-to-λ conversion by binary search; see details in [56]. Fig. 9(c) shows the selection result when L is set to 3. The original image contains tens of level sets or layers. Although not easily visible, there are various grayscale values at the edges of the petals. By setting L to 3, the method selects the three most informative layers, as indicated by the red arrows. As can be seen from the results in the left window, almost no important information is removed. Algorithm-based selection can only preserve or remove all nodes of a given layer. Still, in combination with the manual operations described above, more refined global manipulation can be achieved.
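A sketch of this greedy selection (our own simplified rendering; A holds the cumulative-histogram values A(T↑_i), which are non-increasing for upper level sets, and λ would come from the L-to-λ binary search of [56]):

```python
def select_layers(A, lam):
    """Greedy cumulative-histogram layer selection: from layer i, advance
    j until A[i] - A[j] >= lam, then keep layer j and restart from it."""
    selected, i = [], 0
    while i < len(A) - 1:
        j = i + 1
        while j < len(A) - 1 and A[i] - A[j] < lam:
            j += 1
        selected.append(j)
        i = j
    return selected

# Seven layers; only levels with a large enough cumulative-area jump survive.
print(select_layers([100, 98, 90, 89, 60, 59, 10], lam=20))  # [4, 6]
```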
Global manipulation is useful for image segmentation [57][58][59]. Fig. 10 illustrates a skull-stripping segmentation [60,61] by showing several components (A, A1, A2, B, C) of a magnetic resonance (MR) image ( Fig. 10(a)) reconstructed from several nodes and their corresponding descendants. As visible, the whole brain (A), including the brain stem (A1) and cerebellum (A2), as well as the parotid tissue (B) and nasal tissue (C), are successfully segmented. Moreover, our proposed method is fast and does not require any preprocessing such as intensity normalization or denoising.

Local manipulation
Local manipulation, also seen as per-node deformation, mainly refers to deforming a single connected component by manipulating its spline control points (CPs). Prior work [7] illustrated several preliminary operations, including moving, adding, and removing CPs and increasing or decreasing the degree of the spline representing the medial axis transform. In this section, we further expand this idea by presenting more functions. We use node D in Fig. 9(a) as an example to introduce our user interface. By selecting node D and then clicking the icon to the right of icon C, we open the user manipulation interface, as shown in Fig. 11. We next introduce, one by one, all the tools we propose for this task.
Displaying all CPs: By clicking icon A, all CPs of the current component are shown in the manipulation interface. Each component has one or several skeleton branches, thus resulting in one or more splines. Control points on the same spline are connected by lines of the same color, which indicates the degree of the spline, as shown at the bottom of the interface. In contrast to icon A, the function of icon B is to make all CPs invisible.
Changing the radius/degree: When the mouse hovers over a CP, its radius size is displayed, and the radius value is also updated in the L area. When a point is clicked, it is highlighted in blue. Then, when holding down the shift key and scrolling the mouse wheel, the radius (both the graphical representation and the actual value) changes accordingly. The operation of modifying the degree is similar, except that the shift key has to be replaced with the D key.
Adding a CP to the spline: Icon D allows users to add a CP to the spline. Note that the clicked point needs to fall in the (invisible) rectangle formed by any two consecutive CPs of the spline. Otherwise, a new spline (with two CPs) will be created upon such a click.

Removing CPs in a spline:
The user is allowed to remove one or more CPs in a spline by pressing icons E or G. One can also delete the entire spline via icon F.

Rotating/scaling CPs:
Icon H is used for rotating all selected CPs. After clicking this icon, one first needs to select a rotation center, then select the CPs to be processed by dragging the displayed rubber band marker with the mouse. Next, one can hold down the R key and scroll the mouse wheel to specify the desired rotation angle. The scaling function is similar, except that icon H and the R key have to be replaced with icon I and the S key.
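Geometrically, both operations apply a standard similarity transform to the selected CPs about the chosen center; a minimal sketch (function names are ours, and the actual tool also updates the on-screen spline immediately):

```python
import math

def rotate_points(points, center, angle_deg):
    """Rotate selected control points about a user-chosen center."""
    a = math.radians(angle_deg)
    cx, cy = center
    return [(cx + (x - cx) * math.cos(a) - (y - cy) * math.sin(a),
             cy + (x - cx) * math.sin(a) + (y - cy) * math.cos(a))
            for x, y in points]

def scale_points(points, center, factor):
    """Scale selected control points towards or away from the center."""
    cx, cy = center
    return [(cx + (x - cx) * factor, cy + (y - cy) * factor) for x, y in points]

print(rotate_points([(2.0, 0.0)], (0.0, 0.0), 90))  # approximately [(0.0, 2.0)]
print(scale_points([(2.0, 0.0)], (0.0, 0.0), 0.5))  # [(1.0, 0.0)]
```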
Copying/cutting CPs: These two functions are similar. First, the user can select one or more CPs, then press the C/X key, then click somewhere for the CP(s) to be pasted and press the V key to effectuate the actual CP pasting.
Reconstruction: Icons J and K are used to reconstruct the manipulated component and the whole image, respectively. The changed splines are first rasterized on the desired pixel grid to generate the manipulated skeletons. Then, we reconstruct the component with the medial disk envelope method [8]. Fig. 12 shows the manipulation of a shadow puppet character, which covers most of the above operations, including deleting CPs (−c), decreasing radius values (−r), and moving (M), rotating (R), scaling (S), and copying (C) control points. We start by making the figure's head smaller by pressing icon I, next selecting all the CPs that represent the head, and then scrolling the mouse wheel down to adjust them to the appropriate size. Then we move all CPs down slightly to make the result more realistic. For the left arm, we intend to separate the hand from the body. For this, we first delete the selected spline A (with 5 CPs) in Fig. 12(a), then rotate the arm clockwise by about 30 degrees, and next copy the right hand to the left side and rotate it by a suitable angle. We also decrease the radius of both CPs of the spline at the elbow by 8 pixels. In addition, the left leg and right arm of the character are also rotated by about 20 degrees clockwise and counterclockwise, respectively.
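Conceptually, reconstruction takes the rasterized skeleton samples (x, y, r) and unions their medial disks; the sketch below shows that final step in its simplest form (the actual tool uses the more elaborate disk-envelope rendering of [8]):

```python
import numpy as np

def reconstruct_from_mat(samples, shape):
    """Reconstruct a binary component as the union of medial disks.
    `samples` are (x, y, r) triples obtained by rasterizing the
    (possibly edited) splines of the medial axis transform."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape, dtype=bool)
    for x, y, r in samples:
        mask |= (xs - x) ** 2 + (ys - y) ** 2 <= r ** 2
    return mask

# Three overlapping disks along one skeleton branch give an elongated blob.
mask = reconstruct_from_mat([(4, 5, 2), (6, 5, 2), (8, 5, 2)], (11, 13))
print(mask.sum())
```

Editing a control point's radius or position simply changes the disks that get rasterized, which is why spline edits translate so directly into shape edits.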

Applications
In the previous section, we introduced our proposed interactive image editing tool. Section 3.2 introduced several schemes for selecting multiple icicle nodes. Section 3.3 demonstrated the internal manipulation of a single node. Combining the two, i.e., moving, scaling, rotating, removing, and pasting multiple nodes at once, enables more interesting and powerful applications of our method, as detailed next.

Visible watermark removal
Visible watermarks are widely used in images and videos to protect copyright ownership. Analyzing watermark removal helps to strengthen anti-attack techniques in an adversarial way, which has attracted increasing attention and become a hot research topic [62][63][64][65]. Due to the uncertainty of the size, shape, color, transparency, and location of watermarks, developing an automatic visible watermark removal method remains a difficult task. Some techniques even require user guidance [62,63] or assume that test images share the same watermark region [66]. Our image manipulation tool provides a way to remove watermarks interactively, rather than automatically as the methods mentioned above do. Our proposed method is very simple. We first select watermark-related nodes through the schemes introduced in Section 3.2. Then we enter local manipulation (Section 3.3) and press icon E in Fig. 11 to delete all control points associated with the watermark. Two manipulation demonstrations are available in the supplementary material [54]. Fig. 13 shows the results of our method on six watermarked images. As can be seen from the three grayscale images, our method works well not only for images where the embedded watermark is brighter than the surrounding area (Fig. 13(b1, f1)), which can be easily manipulated with the selection-and-deletion scheme described in Section 3.2, but also for images where the embedded watermark has an intensity similar to or lower than the surrounding area (Fig. 13(d1)). Our proposed tool also yields good results for color images (a2, c2, e2) by manipulating their three components, e.g., YUV, independently. However, since the manipulated image is reconstructed from skeletons, this watermark removal method also has the common drawbacks of skeleton-based image representation methods [8,56,67], i.e., it cannot deal well with images with many thin and small-scale details, such as plants on the mountains (b2, e2) and animal fur (c2, d2).
Yet, the skeleton-based method is good at processing images with relatively large shapes but thin watermark shapes, such as shown in Fig. 13(a1). For such images, the SDMD method with suitable parameters (by setting the SDMD control bar in Fig. 9) inherently removes those thin watermark patterns even without using global or local manipulation. This process takes only a few seconds (see the demonstration in the supplementary material), rather than several minutes as with the GIMP tool (see details in Section 5.1).

Image deformation
The previous section has shown how to achieve good watermark removal performance by combining the multiple node selection in the global manipulation (Section 3.2) with the CPs deletion in the local manipulation (Section 3.3). In this section, we combine the node selection with deletion and addition of CPs in local manipulation to implement image deformation.
Fig. 14 illustrates an example by showing several steps to remove glasses from a cartoon avatar. We first remove the glasses' lenses (step 1) by deleting nodes A, B, and C in Fig. 14(b2) by pressing icon B in Fig. 9(a). Then we eliminate the glasses' frame (step 2) by selecting node D and all its descendants in (b2), entering the local manipulation interface (c2), and deleting all CPs related to the frame shape (region E in c2).
Step 2 produces a very light eye contour (see (c1)), so in the next step, we aim to darken the eye outline (step 3). We select only node D in (b2) and enter the user interface (d2). Then we use the CP adding function (icon C in Fig. 9(a)) to put in several splines and manipulate their CPs to form the eye contour, as shown in region F in (d2). Now we successfully remove the glasses from the original image (a1) and generate a reasonable result (d1).
Following the same idea, we further generate four other facial changes, as shown in Fig. 15(a1-a4). To generate (a1), we first remove all CPs related to the glasses and the eyes. Then we add four splines to represent the smiling eyes. The remaining ones use similar operations. We first delete control points that represent the smiling mouth, then we add the new mouth (a2), mustache (a3), and beard (a4) in turn by adding new splines. Fig. 15(b1-b4) shows the manipulations of a more complex running horse image. To generate (b2), we first rotate all the CPs representing the horse counterclockwise by about 30 degrees, then rotate the two hind legs of the horse clockwise by about 35 degrees. Next, we rotate the fore cannon bones clockwise by certain angles to achieve the bending of the forelegs. The demonstration is available in the supplementary material [54]. We manipulate the remaining two examples (b3, b4) similarly, mainly by rotating and moving the control points representing the front legs.

Dataset augmentation
Many machine learning setups require a high number of samples to avoid overfitting and to increase their performance. However, acquiring annotated samples for a dataset is usually hard and costly [68]. Thus, producing new samples by applying transformations to existing samples in a dataset using data augmentation techniques has been used to address these issues [68][69][70]. We show an example of using our tool to automatically augment a handwritten digit dataset by randomly applying small changes to the samples. The general idea is simple: we take an annotated sample (in this case, we already know its class), and then produce different versions of this sample by randomly generating small changes on the control points of different threshold levels.
As a proof of concept, we apply this method to augment a subset of the MNIST dataset [71] using our tool's functionality to generate images by applying random changes; this can be accessed by clicking on icon C (Fig. 11). In this experiment, we created a dataset called 50-MNIST by randomly choosing 50 samples for each digit of the MNIST dataset and training an SVM on it. After that, we augmented the dataset. We scaled up the images 10 times to allow our tool to successfully produce relevant skeletons that can be manipulated. For each image, we ran the SDMD pipeline, keeping only the 10 most relevant threshold values, and selected all nodes other than the root node (to retain all details of the digit). For each transformation T ∈ {dis, rad, rot, sca} (displacement, radius, rotation, and scaling), we set a minimum and maximum parameter value and computed the mean µ_T = (min_T + max_T)/2 and the standard deviation σ_T = max_T − min_T of a normal distribution for that transformation's parameter. Then, for each control point and each transformation, we drew a parameter from its corresponding normal distribution and applied the transformation. Using a script, we generated 15 new images with randomly drawn parameters for each sample of 50-MNIST. This procedure produced images that are slightly perturbed versions of the original sample. Examples of the generated images are shown in Fig. 16. We then combined the 50-MNIST dataset and the generated samples to train an SVM, using the same hyper-parameters that were used to train the SVM on 50-MNIST. We computed performance scores using the MNIST test dataset for both SVMs: one trained on 50-MNIST and one on the augmented 50-MNIST. The computed scores are shown in Table 1. Overall, our data augmentation strategy increased the accuracy of the model from 0.83 to 0.88. It is worth noting that our approach produces small shape perturbations, as observed in Fig. 16, such as in the shape of the holes in digits zero and two, and in the top and tail lines of digit two.
As far as we know, such shape changes are not easily obtained with other randomized techniques. In addition, we applied a similar approach to create 15-EMNIST, randomly picking 15 samples for each class of EMNIST [72], and increased the accuracy of the SVM model from 0.47 to 0.54 (the full classifier table is available in the supplementary material [54]). These results show the potential of our tool for data augmentation tasks.
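The per-transformation parameter drawing described above can be sketched as follows (the parameter ranges below are hypothetical, chosen only for illustration; the actual ranges used in the experiment are not reproduced here):

```python
import random

def draw_parameters(ranges, rng=random):
    """Draw one parameter per transformation T from N(mu_T, sigma_T),
    with mu_T = (min_T + max_T) / 2 and sigma_T = max_T - min_T."""
    params = {}
    for name, (lo, hi) in ranges.items():
        params[name] = rng.gauss((lo + hi) / 2, hi - lo)
    return params

# Hypothetical ranges for displacement, radius, rotation, and scale.
ranges = {'dis': (-1.0, 1.0), 'rad': (-0.5, 0.5),
          'rot': (-5.0, 5.0), 'sca': (0.95, 1.05)}
random.seed(0)
print(draw_parameters(ranges))
```

Drawing a fresh parameter set per control point, rather than per image, is what yields the localized shape perturbations seen in Fig. 16.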

Other applications
In this section, we combine node selection from the global manipulation (Section 3.2) with further local-manipulation features (Section 3.3), including scaling, moving, rotating, cutting, and copying CPs, to implement additional applications.
Artistic illumination effects can be achieved with our method. Fig. 17 shows two examples that illustrate the potential of our tool to produce such effects. The enlargement (a2), diminution (b2), movement (a3, b3), and removal (a4, b4) of the white light spots are implemented by scaling up, scaling down, moving or rotating, and removing, respectively, all control points of all nodes representing these highlights in the three components (YUV) of the tomato (a1) or the copper-ball image (b1). The manipulation demonstration is available in the supplementary material [54]. While our tool handles these simple objects with ease, we do not claim that it already provides a well-established relighting mechanism for very complex objects.
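The scaling used to enlarge or shrink a light spot can be sketched as a uniform scale of the spot's medial control points about their centroid. This is a simplified stand-in for the tool's scaling feature; the `(x, y, r)` tuple layout and the function name are our assumptions.

```python
def scale_cps(cps, factor):
    """Scale medial control points (x, y, radius) about their centroid.
    Scaling positions and radii together grows (factor > 1) or shrinks
    (factor < 1) the whole reconstructed highlight region."""
    cx = sum(x for x, _, _ in cps) / len(cps)
    cy = sum(y for _, y, _ in cps) / len(cps)
    return [(cx + factor * (x - cx),
             cy + factor * (y - cy),
             factor * r)
            for x, y, r in cps]
```

Applying the same factor to all CPs of all nodes encoding a highlight, in all three YUV components, reproduces the enlargement/diminution edits of Fig. 17.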
Image rearrangement is also easily implemented with our method. Fig. 18 shows an example in which the birds' positions are rearranged. We aim to exchange the positions of the two birds on the far right (D, E) and the two in the middle (B, C), and to rotate bird D. We start by cutting the CPs (by pressing the X key) that encode the two birds on the far right (D, E) and pasting them (by pressing the V key) into an empty space in the spline manipulation interface; see the demonstration in the supplementary material [54]. Then we select the CPs representing birds B and C by dragging the rubber-band marker with the mouse and move them to the far right. Next, we select and move the CPs that represent birds D and E to the second and third positions. Finally, we use the rotate function (icon G in Fig. 11) to rotate bird D by about 30 degrees clockwise.
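The rotation applied to bird D amounts to a rigid rotation of the selected control points about their centroid. The sketch below is an illustration under the same assumed `(x, y, r)` layout as above, not the tool's actual code.

```python
import math

def rotate_cps(cps, angle_deg):
    """Rotate control points (x, y, radius) about their centroid.
    In image coordinates (y grows downward), this formula turns the
    selection clockwise for positive angles; radii are unchanged
    because rotation is rigid."""
    a = math.radians(angle_deg)
    cx = sum(x for x, _, _ in cps) / len(cps)
    cy = sum(y for _, y, _ in cps) / len(cps)
    out = []
    for x, y, r in cps:
        dx, dy = x - cx, y - cy
        out.append((cx + dx * math.cos(a) - dy * math.sin(a),
                    cy + dx * math.sin(a) + dy * math.cos(a),
                    r))
    return out
```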
Clothing design can also be performed with our tool. Fig. 19 gives two examples. For Fig. 19(a), we intend to change the long-tube shoe into a short-tube one and to make the thick sole thinner. To do so, we first eliminate the CPs that encode the long tube, and then reduce the radius of all CPs of the splines representing the sole by about one third. For Fig. 19(b), we simply add two stripes to the T-shirt by adding several splines encoding the stripes. Note that the added stripes must be placed at the same position in all three components (YUV) of the color image, otherwise false colors are produced; see Section 5. As the figure shows, our approach generates images of good quality and fidelity.
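Adding a stripe means adding a spline defined by a few control points and rasterizing points along it. The actual tool fits B-splines [7]; as a minimal stand-in we evaluate a Bézier curve with de Casteljau's algorithm, which likewise turns a short CP list into a smooth curve.

```python
def de_casteljau(cps, t):
    """Evaluate a Bezier curve at t in [0, 1] by repeatedly linearly
    interpolating between consecutive control points."""
    pts = list(cps)
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

# A gently curved stripe from three control points, sampled at 11 spots
# (coordinates are illustrative, not taken from Fig. 19):
stripe = [de_casteljau([(10, 40), (50, 20), (90, 40)], i / 10)
          for i in range(11)]
```

Placing the same `stripe` curve in the Y, U, and V components keeps the three channels aligned, which is exactly the precaution noted above.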

Discussion
In this section, we discuss several aspects of our interactive image manipulation tool.

Comparison with GIMP
GNU Image Manipulation Program (GIMP) is a mature and widely used tool. An advanced interface and rich feature set allow GIMP to handle both simple image painting and complex image manipulation. Table 2 compares the performance of GIMP and our method on the seven applications introduced in Sections 3.2 and 4. The operation time listed in the table counts only the manipulation time for each task, i.e., for GIMP, it excludes the time to open the image, and for our tool, it excludes the initial encoding process (see Section 5.2). All experiments were performed on a Linux PC with an Nvidia RTX 2060 GPU. The participant is familiar with both our tool and GIMP, and has normal vision without color blindness. All manipulation demonstrations are available in the supplementary material [54].

Table 2: Comparison of GIMP and our tool for seven different applications in terms of manipulation convenience (top) and operation time (bottom; in minutes). Manipulation convenience is judged on a Likert scale ranging from 'very easy' to 'very hard' (++, +, +/−, −, −−).
Image segmentation: For the skull-stripping segmentation (Fig. 10), GIMP takes around 4 min. Segmenting each tissue requires carefully tracing the edges of the object with the mouse using the free or fuzzy selection tool; the selection (the "marching ants") can then be refined by adding to and subtracting from the current selection. With our tool, however, it is only a matter of finding the nodes representing each tissue, which takes about 1 min.

Watermark removal: To remove the watermarks, GIMP can use the clone or healing tool to cover a watermark with nearby texture, which takes around 3 min to clean an image with regular watermarks (such as (f1) in Fig. 13). However, for images with many watermark patterns (like (a1) in Fig. 13), this process takes more than 10 min. In contrast, our tool only needs to remove the control points representing the watermarks, which takes 1 to 3 min.
Image deformation: In GIMP, the cage transform is used to deform an image. However, this feature is not handy when a certain part of the image needs to be rotated; for that task, the rotate tool combined with the clone tool can help. To achieve the deformation in Fig. 15(b2), GIMP takes about 6 min, whereas our method needs only the rotation function (icon H in Fig. 11), which takes around 3 min.
Dataset augmentation: For a single sample, our tool generates random changes by simply moving the corresponding CPs, which takes only tens of seconds. GIMP can achieve this using the cage and healing tools, which takes about 2 min. Most importantly, our tool can automatically produce a large number of sample variants by randomly adjusting the positions of the sample's CPs, which is practically impossible in GIMP.
Artistic effect: Both GIMP and our method can achieve artistic results easily and quickly. To shrink the light spot in Fig. 17(a1), GIMP can use the shrink-area mode of the warp transform, while our tool simply applies the scaling feature (icon I in Fig. 11).
Image rearrangement: This application is implemented with cut and paste in both GIMP and our tool. The difference is that GIMP pastes the content onto a different layer, whereas our tool operates on the same image layer, which takes less time.
Clothing design: GIMP can achieve this application by drawing directly on the image with the pencil or ink tool, or by erasing content with the eraser. Our method can add new elements by adding new splines (icon D in Fig. 11) or remove content by deleting the corresponding CPs (icon G in Fig. 11).
To sum up, compared with GIMP, our tool implements the above seven applications with equal or greater convenience in similar or less time. Furthermore, our tool offers one capability that GIMP lacks: automatically producing a large number of sample variants for machine learning tasks. However, we acknowledge that the seven applications evaluated in the table are the ones our tool excels at. Some applications are easy to achieve with GIMP but not with our tool, such as drawing content with a brush, copying content from another image, or adding text. In terms of quality, we admit that our skeleton-based method does not handle images with fine details well, which is not a problem for GIMP.

Running time
The longest (slowest) step in our end-to-end pipeline is the initial encoding process, since all image information needs to be encoded, including all components on all layers. Take the original image in Fig. 10 as an example, whose intensities span from 0 to 255. Although its resolution is only 320 × 320 pixels, encoding all the information, i.e., computing skeletons and running spline fitting for all components, takes 118 s on a commodity PC. Encoding runtime further depends on the morphological tree size (number of nodes), which in turn depends on the image content: large regions with tiny details, such as foliage and fur, produce many nodes. Thus, encoding a larger image can be faster than encoding a smaller one if its morphological tree contains sufficiently fewer nodes. We ran runtime experiments on a dataset of images with different content and sizes varying from 256 × 256 to 640 × 480, and plot the runtime of the encoding process against the number of max-tree nodes (Fig. 20). More detail on this experiment, as well as other plots relating runtime to image size and grayscale resolution, is available in the supplementary material [54]. Fortunately, the encoding operation only needs to be executed once. Once encoding is complete, all subsequent operations, whether deleting or adding icicle nodes, manipulating CPs, or reconstructing components or images, run in real time. In practice, inexperienced users take approximately 2 to 4 min to remove the watermark from an image or to achieve a reasonable image deformation result, and around 1 to 2 min for image rearrangement, artistic effect generation, or clothing design.
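The relation between encoding time and max-tree size reported in Fig. 20 can be summarized with a simple linear fit. The sketch below shows this under entirely made-up (node count, seconds) measurements; the numbers are not from the actual experiment, and a linear model is only one plausible summary of the plotted trend.

```python
def fit_line(pts):
    """Ordinary least-squares fit of time ~ a * nodes + b."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Hypothetical measurements: (max-tree nodes, encoding seconds).
samples = [(1000, 12.0), (3000, 35.0), (5000, 60.0), (8000, 95.0)]
slope, intercept = fit_line(samples)  # seconds per extra node, offset
```

Such a fit makes the observation concrete: a small image with a large max-tree can cost more encoding time than a large image with a compact tree.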

Limitations
Our interactive tool cannot yet handle the three components of color images simultaneously; it only facilitates their manipulation separately. This leads to false colors, ghosting, and artifacts when changes to the three components do not coincide. Fig. 21 shows two examples. In (a), when the stripes added to the Y and U components of the right image in Fig. 19 are in different positions, ghosting is introduced, as indicated by the arrows. Similarly, in (b), when the eye shapes added to the three components of the image in Fig. 15(a1) do not coincide, artifacts and false colors are produced; see where the arrow points. However, we argue that this ghosting and these false colors have little perceptual impact, and they can be avoided with careful manipulation.
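Why misaligned edits produce false colors follows directly from the YUV-to-RGB conversion: a pixel whose luminance (Y) was edited but whose chroma (U, V) was not inherits stale chroma, tinting the result. The sketch below uses the standard BT.601 conversion; the illustrative pixel values are our own.

```python
def yuv_to_rgb(y, u, v):
    """BT.601 YUV (all channels 0-255, U/V biased by 128) to RGB,
    clamped to [0, 255]."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return tuple(int(max(0, min(255, round(c)))) for c in (r, g, b))

# A bright stripe added to Y at a pixel with neutral chroma stays gray:
aligned = yuv_to_rgb(200, 128, 128)
# The same Y edit over stale, non-neutral chroma yields a false color:
mismatch = yuv_to_rgb(200, 100, 160)
```

This is why the added stripes and eye shapes must occupy identical positions in all three components.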
We admit that for inexperienced users, some per-node manipulations may not be fully intuitive, e.g., adding splines to form the eye shape in Fig. 14(d2), or performing complex deformation operations on the legs of the binary horse shape (see the supplementary material [54]). Achieving these requires some practice and experience with our tool. Apart from that, we believe that removing, moving, rotating, scaling, cutting, and copying groups of CPs as a whole are all easy for inexperienced users to understand and perform. Moreover, the global manipulation (Section 3.2) is straightforward to learn. The interface has only two windows: the left one displays the image, while the right one shows the icicle tree; see Figs. 6 and 9, respectively. The two windows are linked: when one clicks an icicle node on the right, its corresponding component is highlighted on the left, as shown in Fig. 6, and vice versa. In the supplementary material [54], we provide our full source code and all demonstration videos for replication purposes.

Conclusion
In this paper, we have presented a novel interactive image manipulation tool that combines the spline-based medial axis for shape manipulation [7] with an icicle representation of component trees. Dealing with component trees instead of threshold sets allows finer-grained spatial control of each level set. We have demonstrated in detail how to operate our tool in Section 3. To verify its effectiveness, we have illustrated it with several applications on real-world images. Manipulating icicle nodes globally (Section 3.2), such as removing multiple nodes in the icicle tree, already achieves simple watermark removal tasks (Fig. 13(a, e)). Adding local spline manipulations enables more interesting applications: combining global manipulation with control-point deletion achieves more complex watermark removal tasks (Fig. 13(b, c, d, f)), while combining it with removing and adding CPs achieves interesting image deformations (Section 4.2). We have also shown an application of these deformations to data augmentation, generating new samples for a handwritten digit dataset (Section 4.3). Finally, we have combined global manipulation with further local-manipulation features, including scaling, moving, rotating, cutting, and copying CPs, to implement artistic illumination effects, image rearrangement, and clothing design (Section 4.4).

Several future work directions are possible. First, more functions can be added to our tool. One possibility is to allow opening two images simultaneously and stitching their content, such as seamlessly stitching objects from one image into the background of another. Additional functions, such as allowing users to directly move a node up or down in the icicle plot interface to change the intensity of that node, can also be considered. Second, making our tool process color images more conveniently is also important to study.
Finally, we aim to explore the potential of our tool for further applications, such as image smoothing and image abstraction.