1 Introduction

In 3D shape analysis, the extraction of feature descriptors aka local shape descriptors is a fundamental step [18]. Early research on feature based methods mainly focused on rigid shape analysis [10, 11]. A large number of descriptors have been developed, such as the local surface patch [9], spin image [14] and rotation projection statistics [12]. The development of non-rigid descriptors is more challenging due to the large degrees of freedom resulting from the local deformations. Several methods have been proposed, such as the geodesic mapping [13] and conformal factors [17]. However, these methods are sensitive to the topological noise and the geometric noise, which are inevitable in these applications. In the recent years, an intrinsic geometric property known as diffusion geometry has become popular and achieves the best performance [2, 5, 15, 18, 20, 22, 23, 27]. It is based on the spectral decomposition of the Laplace-Beltrami operator associated with a shape, and uses eigenvalues and eigenvectors to construct the diffusion distance, which provides an intuitive interpretation of the shape properties in terms of spatial frequency [15]. This work falls in the category of diffusion geometric framework.

The Local Binary Descriptor (LBD) has attracted a significant interest in the analysis of 2D images due to its computational simplicity and discriminative power [8, 28]. However, little efforts have been made to extend the LBD framework to the field of 3D shape analysis [26]. The key to the development of an LBD for non-rigid shapes is the construction of a repetitive Local Reference Frame (LRF), because the LBD requires an intrinsic order for computation. Several spatial structures have been proposed to facilitate feature descriptors for non-rigid shape analysis. In [29], an LRF is constructed using the surface normal and two vectors which lie on the tangent plane (a common method used for rigid shape descriptors). This LRF is not invariant under non-rigid transformations, because the relative positions of the points will change. In [7], a ‘multiple circular geodesic pathways’ is defined on the local surface using a fixed number of increasing geodesic distances. Within each circle, points are sampled in a clockwise direction with respect to the surface normal. In [15], the local surface is charted by shooting geodesic outwards from the feature point to form a polar coordinate system, where the ‘angle’ is defined as tantamount to the geodesic shooting direction, and ‘radius’ as the geodesic distance. However, neither of the methods solves the problem of orientation ambiguity during the construction of the LRF. In [26], the local surface is modeled as a structure of ordered and concentric rings around the central facet based on the categorization of the facets on its contour. Since it is fully dependent on the structure of the mesh, the LRF is not intrinsic and robust.

In this paper, we develop a local descriptor for non-rigid shape analysis, called Heat Diffusion based Local Binary Descriptor (HD-LBD). There are two main contributions in this paper. First, we construct a new repetitive Local Polar Coordinate System (LPCS) which is invariant under isometric transformations. Second, we develop a binary descriptor facilitated with the LPCS for non-rigid shape analysis. Experiments were performed to demonstrate the effectiveness of the proposed method.

2 Background

Diffusion geometry is one of the most successful approaches for non-rigid shape analysis. Reuter et al. [20] exploited the Laplace-Beltrami spectra as an intrinsic shape descriptor called Shape-DNA. Rustamov et al. proposed the Global Point Signature (GPS) [22] by associating each point with an \(l^{2}\) sequence formed by the eigenfunctions and the eigenvalues of the Laplacian. Sun et al. [23] developed the Heat Kernel Signature (HKS) based on the analysis of heat diffusion process. Kokkinos et al. [5] introduced the Scale Invariant Heat Kernel Signature (SI-HKS) using Fourier transform. Kokkinos et al. [15] developed the Intrinsic Shape Context (ISC) by aggregating HKS of the feature point’s local neighborhood to further improve its descriptiveness. In [2], the Wave Kernel Signature (WKS) was proposed based on a different physical model, in which one evaluated the probability of a quantum particle with a certain energy distribution to be located at a point. Litman et al. [18] and Windheuser et al. [27] used machine learning techniques to learn the spectral descriptor for a specific task (e.g. human recognition). Our work follows on the idea of diffusion geometry. Unlike [5, 15, 23], we use the HKS of a particular scale to define the scalar field on a manifold.

A large amount of LBDs exist in the field of 2D image analysis. Most of the comparison-based descriptors can be considered to be variants of the local binary pattern proposed in [19], where the intensities of some predefined pairs of neighboring pixels are compared to form a binary string for a feature point. In Binary Robust Independent Elementary Feature (BRIEF) [6], Binary Robust Invariant Scalable Key-point (BRISK) [16], Oriented FAST and Rotated BRIEF (ORB) [21] and Fast Retina Key-point (FREAK) [1], various ways of pixel pairs sampling were proposed. Ordinal Spatial Intensity Distribution (OSID) [24] and the Local Intensity Order Pattern (LIOP) [25] incorporated spatial information to improve the LBD’s discriminative ability. The works in [8, 30] proposed to use the full ranking of a set of pixels as a local descriptor, which is expected to encode the complete comparative information among pixels. The recent work in [26] proposed a framework to compute the local binary-like-patterns directly on a triangular-mesh. However, their work is specially developed for shapes with photometric and geometric information. To the best of our knowledge, we are the first to develop a local binary descriptor for non-rigid shape analysis.

3 Proposed Method

Fig. 1.
figure 1

Block diagram of the Local Heat Diffusion Binary descriptor.

Fig. 2.
figure 2

HKS as a function of t for a feature point and its neighboring points in Fig. 1 (c). A small t separates these points, in which case HKS reflects the local properties of the surface. As t increases, the values of HKS are almost the same, because HKS captures the global structure of a larger neighborhood.

Suppose we are given a discrete representation of shape as a triangular mesh (VET) with \(n_{V}\) vertices \(\{v_{1},\ldots ,v_{n_{V}}\}\), \(n_{E}\) edges \(\{(v_{i_{1}},v_{j_{1}}),\ldots ,(v_{i_{n_{E}}},v_{j_{n_{E}}})\}\) and \(n_{T}\) faces \(\{(v_{i_{1}},v_{j_{1}},v_{k_{1}}),\ldots ,(v_{i_{n_{T}}},v_{j_{n_{T}}},v_{k_{n_{T}}})\}\). Our goal is to derive a local shape signature that is invariant under isometric deformations.

An illustrative example of the proposed binary descriptor is given in Fig. 1. Basically our Heat Diffusion based Local Binary Descriptor (HD-LBD) is an extension from 2D images to 3D shapes. Therefore, the first step is the definition of scalar functions on manifolds [29] (see Sect. 3.1). Based on the definition of the scalar fields, an intrinsic Local Polar Coordinate System (LPCS) is constructed around the feature point (see Sect. 3.2 for details). Finally, in Sect. 3.3, we aggregate the information of the neighboring surface to form the binary string.

3.1 Scalar Field Definition

We model shapes as Riemannian manifolds M (possibly with boundary) embedded in \(R^{3}\). Let g be the scalar field defined on M. The real valued function g represents the geometric or photometric information of the shapes. In this paper, we consider the heat diffusion property, explained below.

The heat diffusion process over M is governed by the heat equation,

$$\begin{aligned} (\triangle _{M } + \frac{\partial }{\partial t} )u(v,t) = 0, \end{aligned}$$
(1)

where \(\triangle _{M }\) denotes the positive semi-definite Laplace-Beltrami operator of M, a Riemannian equivalent of the Laplacian. The solution u(vt) describes the amount of heat on the manifold at point v in time t with an initial condition u(v, 0). Since M is compact, \(u(v,t) = \int _{M }^{\infty }h_{t}(v,v')u(v')dv'\). \(h_{t}(v,v')\) is called heat kernel, and can be thought of as the amount of heat transferred from v to v’ in time t given a unit heat source at v. According to the spectral decomposition theorem, the heat kernel can be presented as

$$\begin{aligned} h_t(v,v') = \sum _{i\ge 1} e^{-\lambda _i t} \varPhi _i(v) \varPhi _i(v'), \end{aligned}$$
(2)

where \(\lambda _i\) and \(\varPhi _i\) are the \(i^{th}\) eigenvalues and corresponding eigenfunctions of the Laplace-Beltrami operator. Its restriction to the temporal domain results to

$$\begin{aligned} h_t(v) = \sum _{i\ge 1} e^{-\lambda _i t} \varPhi _i^{2}(v), \end{aligned}$$
(3)

known as the heat kernel signature. This signature is not only concise and commensurable, but it is still informative and invariant to isometric deformations [23]. More importantly, the descriptor captures the geometric information of the local surface over a number of scales (multi-scale), which is determined by the time parameter t, as shown in Fig. 2. Particularly, for small values of t, it is related to the manifold curvature according to

$$\begin{aligned} h_t(v) = \frac{1}{4\pi t} + \frac{K(v)}{12\pi } + \circleddash (t), \end{aligned}$$
(4)

where \(K(v) \) denotes the Gaussian curvature at point v.

Therefore, we adopt HKS over a small t to define the scalar field, which is supposed to reflect the intrinsic property of the local surface around the point v.

3.2 Intrinsic Local Reference Frame

Given a feature point v and a support radius r (defined using the geodesic metric), a local surface M’ is cropped from the mesh M. \(V_{N}=\{v_{i1},\ldots ,v_{ik}\}\) are the points lying on M’, and \(N_{1}(v)\) is the set of directly connected vertices to v, called 1-ring neighborhood. The construction of the Local Polar Coordinate System (LPCS) involves two steps: first to find the reference direction and then to chart the surface (see below for details).

Reference Direction. The first and the key step to construct the LPCS is to find its reference direction. Its accuracy directly determines the repeatability of the coordinate system. We adopt the method of intensity centroid [16], which is used to describe the orientation of a key point in the image domain. This method assumes that a corner’s intensity is the offset from its centroid, and this vector can be used as an orientation. Specifically, the moments of a patch P are defined as

$$\begin{aligned} m_{pq} = \sum _{(x_{i},y_{i})\subseteq P} x_{i}^p y_{i}^q I(x_{i},y_{i}), \end{aligned}$$
(5)

where (x, y) is the cartesian coordinate of a pixel, and I is its intensity. With these moments, the centroid can be found at

$$\begin{aligned} c = ( \frac{m_{10}}{m_{00}}, \frac{m_{01}}{m_{00}} ). \end{aligned}$$
(6)

The orientation of the patch is assumed to be

$$\begin{aligned} \theta = atan2(m_{01},m_{10}) \end{aligned}$$
(7)

In our method, we use \(N_{1}(v)\) to compute the reference direction. In our case, the intensities of the pixels are replaced by the values of \(h_t(v)\) (the previously defined scalar function on the shape). The coordinates of \(N_{1}(v)\) are approximated as follows: first, we map v’s 1-ring triangles onto the plane partitioning it into several segments with angle ratios remained; then a rectangular coordinate system is constructed around the feature point centred at v. Thus, coordinates of all vertexes belonging to \(N_{1}(v)\) are obtained.

Finally, we derive the reference direction of the local polar coordinate system (the face \(T_{i}\) on which the reference direction lies and the deviation angle \(\theta _{i}\) from the triangle edge), denoted as R on the 1-ring triangles and R’ on the mapped plane.

Surface Charting. A mesh can be viewed as a piece-wise planar approximation of the underlying smooth surface. Using the standard unfolding procedure in [3], the local surface made up of triangles can be transformed into an image patch that is unevenly sampled. Similar to [15], the construction of the local polar coordinate system consists of 2 steps: directions initialization and propagation. The initial directions are established by first mapping the 1-ring triangles onto the plane, partitioning the plane into several segments of equal angles with respect to the reference direction, and finally mapping back to the mesh. The order of the directions can be clockwise or counter-clockwise. In order to resolve ambiguities, we adopt a simple yet practical solution similar to [26], that the direction on the mapped plane nearest to the reference frame is chosen as the next. In this way, all the directions (Fig. 1(c)) are ordered in a uniform way. Afterwards, the initial directions are propagated outwards from 1-ring (using the standard unfolding procedure [3]) until they reach the boundary of the ‘image patch’ defined by the radius r. Thus, the LPCS (Fig. 1(d)) is constructed, where the ‘reference direction’ is R and its extension, ‘angle’ is the angle between the geodesic shooting direction and R, and the ‘radius’ is the geodesic distance from v.

4 Local Binary Descriptor

With the previously defined scalar function (Sect. 3.1) and the constructed local polar coordinate system (Sect. 3.2), the local surface M’ around vertex v can be regarded as an image patch, on which the local binary descriptor is defined. In the case of 2D image analysis, a number of ways are used to extract point pairs for the construction of a local binary descriptor, such as [1, 6, 16, 21]. In our method, we propose the bit vector (defined in Eq. 9) based on all pairwise intensity comparisons, which turned out to be highly discriminative (Sect. 5.3).

To be specific, we define a test \(\tau \) of a point pair \(N_{i}(p_{i1},p_{i2})\) on the local surface M’ as

$$\begin{aligned} \tau (h_{t};p_{i1},p_{i2}) = \left\{ \begin{array}{ll} 1 &{} \text {if}\; h_{t}(p_{i1})<h_{t}(p_{i2}) \\ 0 &{} \text {otherwise} \end{array} \right. \end{aligned}$$
(8)

where \(h_{t}(p_{i})\) is the value of heat kernel signature with parameter t at \(p_{i}=(\rho _{i},\theta _{i})\).

The test in Eq. 8 considers only the information at a single point \(p_{i}\) in the neighborhood of v, and is therefore quite noise-sensitive. In order to increase the stability and repeatability of the descriptor, we include all the points falling into each bin of the local polar coordinate system, and use the average of their intensities as a unit for test (the same approach used in [28]).

The choice of the set of location pairs \(N_{i}(p_{i1},p_{i2})\) uniquely defines a set of binary tests. In our method, we propose a bit string (binary) descriptor with dimension \(n_{d}\) equal to the cardinality of \(N_{i}(p_{i1},p_{i2})\) as

$$\begin{aligned} f_{n_{d}}(p) = \sum _{1<=i<=n_{d}} 2^{i-1} \tau (h_{t};p_{i1},p_{i2}). \end{aligned}$$
(9)

5 Experiments

The experiments that we carried out have three main goals. First, we examined the repetitiveness of the proposed local reference frame. Second, we compared our proposed descriptor with other state-of-the-art techniques to show its effectiveness. Finally, we examined the effect of the parameters on the performance of the descriptor.

5.1 Dataset

The performance of our binary descriptor was evaluated on the TOSCA dataset [4]. We followed the experimental protocol in [18] using human shapes (12 female shapes in class ’vitoria’, and 2 different male figures containing 7 and 20 poses in class ’david’ and ’michael’ respectively). In each class, an extrinsically symmetric ’null’ shape undergoes near-isometric deformations. Objects within the same class have the same triangulation and an equal number of vertices numbered in a compatible way, which can be used as a per-vertex ground truth correspondence. A typical number of vertices on each shape is about 50000. In order to reduce the computational load and storage complexity, all the shapes were downsampled to 10000 vertices, maintaining compatible triangulations and ground-truth correspondences. We used the finite elements scheme in [20] to obtain the first 400 eigenvalues and eigenvectors of the Laplace-Beltrami operator on each shape. Then, the heat kernel signature was computed in a particular scale according to Fig. 2.

5.2 Performance of the Local Polar Coordinate System

To assess the uniqueness of the LPCS, we measured the repetitiveness of its key component - the reference frame (because it determines the initial directions, and the coordinate system is the propagations of initial directions across adjoint triangles in a standard unfolding way). The performance is evaluated in two aspects: the percentage of reference frames lying in the same face, and if so the errors. We randomly sampled 1000 points on the ’null shape’, and extracted the corresponding points within each class. The reference directions were computed and compared for each points. According to Sect. 3.1, we choose small t for the computation. The results are summarised in Table 1 and Fig. 3 respectively. When t = 10, it achieved the best performance, which we choose to compute the scalar field in the rest of the experiments.

Table 1. The percentage of reference frames lying in the same face computed on 3 different shape classes.
Fig. 3.
figure 3

Deviations of the reference direction for two corresponding points.

Fig. 4.
figure 4

CMC curves of different descriptors on the TOSCA dataset.

Fig. 5.
figure 5

Influence of parameter selection on the descriptor’s performance

5.3 Performance of the Binary Descriptor

We used a quantitative criteria to evaluate the performance of the descriptor, called Cumulative Match Characteristic (CMC). The CMC curve evaluates the probability of finding the correct match within the first k best matches. The hit rate at k is calculated by sorting all of the distances in ascending order, and calculating the fraction of correct match. We first extracted 500 furthest point samples from the null shapes using the geodesic metric, and then found corresponding points on the deformed shapes. We set rad = 2 (‘rad’ is the geodesic distance) between rings, 5 rings and 8 rays respectively, and finally compared with the method in [18] using the code made available by the authors. From Fig. 4 we can observe that, our proposed descriptor greatly improves the performance especially at the first few shoots.

5.4 Parameter Selection

Our descriptor has three free parameters: the number of rings/rays and the geodesic distance between rings (rad), which determine the total area of each bin in the Local Polar Coordinate System. Here, we set rad = 2. We simulated various parameters to study the effect of these parameters on the descriptor’s performance. The performance evaluations are shown in Fig. 5 with varying numbers of rings (nRings = 2, 3, 4, 5 and 6) and rays (nRays = 5, 6, 7, 8 and 9), while in the first experiment nRays = 8 and in the second nRings = 4. It can be observed that the performance improves as the number of rings increases, and then remains almost the same. On the other hand, with an increasing number of rays, the performance also improves to some extent, but drops afterwards. In general, with more rings and rays, more information can be captured by the descriptor. However, the performance will degrade if there are too many rays due to a low mesh resolution and sensitivity to noise. A high number of rings will include redundant information of the local surface and will greatly increases the descriptor’s dimension.

6 Conclusion

We introduced a binary descriptor equipped with an intrinsic local polar coordinate system to the field of non-rigid shapes. The descriptor is developed based on the analysis of the heat diffusion process defined on manifolds. Since the binary descriptor requires an ordered support for its computation, we construct a local reference frame which is supposed to be intrinsic on the shape, robust and repetitive. Our experiments reveal its effectiveness.