
DiffusionNet: Discretization Agnostic Learning on Surfaces

Published: 07 March 2022


Abstract

We introduce a new general-purpose approach to deep learning on three-dimensional surfaces based on the insight that a simple diffusion layer is highly effective for spatial communication. The resulting networks are automatically robust to changes in resolution and sampling of a surface—a basic property that is crucial for practical applications. Our networks can be discretized on various geometric representations, such as triangle meshes or point clouds, and can even be trained on one representation and then applied to another. We optimize the spatial support of diffusion as a continuous network parameter ranging from purely local to totally global, removing the burden of manually choosing neighborhood sizes. The only other ingredients in the method are a multi-layer perceptron applied independently at each point and spatial gradient features to support directional filters. The resulting networks are simple, robust, and efficient. Here, we focus primarily on triangle mesh surfaces and demonstrate state-of-the-art results for a variety of tasks, including surface classification, segmentation, and non-rigid correspondence.


1 INTRODUCTION

Recently, there has been significant interest in learning techniques for non-uniform geometric data, inspired by the tremendous success of convolutional neural networks (CNNs) in computer vision. A particularly challenging setting is extending the power of CNNs to learning directly on curved surfaces [Masci et al. 2015; Bronstein et al. 2017; Poulenard and Ovsjanikov 2018; Hanocka et al. 2019]. Unlike volumetric [Maturana and Scherer 2015] or point-based [Qi et al. 2017a] approaches, surface-based methods exploit the connectivity of the surface representation to improve performance and, furthermore, can be robust in the presence of non-rigid deformations, making them a strong solution for many tasks such as deformable shape matching [Masci et al. 2015; Boscaini et al. 2016].

However, although the field has largely been focused on the benchmark accuracy of networks for such problems, at least two other major roadblocks remain for achieving the full potential of learning on surfaces. First, whereas real-world geometric data come from a variety of sources, existing networks are strongly tied to a particular representation (e.g., triangulations or point clouds) or even discretization resolution. Hence, training cannot benefit from all available data. One popular strategy is to simply convert all data to a common representation (e.g., via point sampling), but this approach has well-known drawbacks (sampling a high-quality, detailed surface can alias thin features, lose informative details, etc.). Second, many existing mesh-based architectures do not scale well to high-resolution surface data. Though coarse inputs are sufficient for, e.g., classification tasks, they preclude potential future applications such as high-fidelity geometric analysis and synthesis.

A key technical difficulty in surface-based learning is defining appropriate notions of convolution and pooling—two main building blocks in traditional CNNs. Unfortunately, unlike the Euclidean case, there is no universal canonical notion of convolution on surfaces. Existing approaches have tried to address this challenge through a variety of solutions such as mapping to a canonical domain [Sinha et al. 2016; Maron et al. 2017], exploiting local parametrizations [Masci et al. 2015; Boscaini et al. 2016; Wiersma et al. 2020], or applying convolution on the edges of the mesh [Hanocka et al. 2019]. However, the use of more advanced and delicate geometric operations, such as computing geodesics or parallel transport, has a significant impact on both the robustness and scalability of the resulting methods. Perhaps even more importantly, existing surface-based approaches are often too sensitive to the underlying mesh structure and thus unable to generalize to significantly different sampling and triangulations between training and test sets. As a result, despite significant recent progress in geometric deep learning [Cao et al. 2020; Greengard 2020], current methods typically struggle to cope with the variability, complexity, and scale encountered in real-world surface and mesh-based settings.

In this work, we propose a method that exploits the surface representation but is both scalable and robust in the presence of significant sampling changes (see Figure 1). Our main observation is that expensive and potentially brittle operations used in previous works [Masci et al. 2015; Poulenard and Ovsjanikov 2018; Wiersma et al. 2020] can be replaced with two basic geometric operations: a learned diffusion layer for information propagation and a spatial gradient for capturing anisotropy. Discretizing these operations with the principled techniques of discrete differential geometry [Meyer et al. 2003; Crane et al. 2013] then automatically endows the resulting networks with both robustness and scalability, while maintaining the simplicity of the learning framework.


Fig. 1. Surface learning methods must generalize to shapes represented differently from the training set to be useful in practice, yet many existing approaches depend strongly on mesh connectivity. Here, our DiffusionNet trained for human segmentation with limited variability seen during training automatically generalizes to widely varying mesh samplings (left), scales gracefully to resolutions ranging from a simplified model to a large raw scan (middle), and can even be evaluated directly on point clouds (right).

Remarkably, we show that combining these basic geometric operations yields neural networks that are not only robust and scalable but also achieve state-of-the-art results in a wide variety of applications, including deformable surface segmentation, classification, as well as unsupervised and supervised non-rigid shape matching. Perhaps even more fundamentally, our DiffusionNet offers a unified perspective across representations of surface geometry—in principle it can be applied to any geometric representation where one has a Laplacian and gradient operator. In this article, for example, we show how the same architecture achieves accurate results for both meshes and point clouds and even allows training on one and evaluating on the other.

Contributions. The main contributions of this work are as follows:

  • We show that a simple learned diffusion operation is sufficient to share spatial data in surface learning.

  • We introduce spatial gradient features for learning local directional filters.

  • Inspired by these insights, we present DiffusionNet, an architecture for learning on surfaces that has many advantages, including robustness to discretization, and achieves state-of-the-art results on several benchmarks.


2 RELATED WORK

Applying deep learning techniques to three-dimensional (3D) shapes is a rich and extensive area of research. We review the approaches most closely related to ours and refer the interested reader to recent surveys, including Bronstein et al. [2017], Xu et al. [2016], and Cao et al. [2020].

View-based and volumetric methods. Most early geometric deep learning-based methods directly leveraged tools developed for 2D images and thus mapped 3D shapes onto the plane either using multi-view renderings [Su et al. 2015; Wei et al. 2016; Kalogerakis et al. 2017] or more global, often parametrization-based techniques such as panoramas [Shi et al. 2015; Sfikas et al. 2017], geometry images [Sinha et al. 2016], or metric-preserving mappings [Ezuz et al. 2017], among many others.

Another direct approach to applying convolution to 3D shapes relies on volumetric voxel grid representations, which has led to a variety of methods, including Maturana and Scherer [2015] and Wu et al. [2015] and their efficient extensions [Wang et al. 2017; Klokov and Lempitsky 2017]. Such techniques can, however, be computationally expensive and difficult to apply to detailed deformable shapes.

2.1 Learning on Surfaces

Methods that learn on 3D surfaces directly typically fall into two major categories, based either on point cloud or triangle mesh representations.

Point-based methods. A successful set of methods for learning on 3D shapes represented as point clouds was pioneered by the PointNet [Qi et al. 2017a] and PointNet++ [Qi et al. 2017b] architectures, which have been extended in many recent works, including PointCNN [Li et al. 2018], DGCNN [Wang et al. 2019], PCNN [Atzmon et al. 2018], and KPConv [Thomas et al. 2019], to name a few (see also Guo et al. [2020] for a recent survey). Moreover, recent efforts have also been made to incorporate invariance and equivariance of the networks with respect to various geometric transformations, e.g., Deng et al. [2018], Hansen et al. [2018], Li et al. [2021], Poulenard et al. [2019], Zhang et al. [2019], and Zhao et al. [2020]. The major advantages of point-based methods are their simplicity, flexibility, applicability in a wide range of settings, and robustness in the presence of noise and outliers. However, first, their overall accuracy can often be lower than that of methods that explicitly use surface (e.g., mesh) connectivity when it is available. Second, though effective on static mechanical objects and scenes, point-based methods may not be well suited for deformable (non-rigid) shape analysis, requiring extremely large training sets and significant data augmentation to achieve good results, e.g., for non-rigid shape matching applications [Groueix et al. 2018; Donati et al. 2020]. Globally supported point-based networks were recently considered in Peng et al. [2020]; our method naturally allows global support via learned diffusion.

Surface and graph-based methods. To address the limitations of point-based approaches, several methods have been proposed that operate directly on mesh surfaces and thus can learn filters that are intrinsic and robust to complex non-rigid deformations. The earliest pioneering approaches in this direction generalize convolutions [Masci et al. 2015; Boscaini et al. 2016; Monti et al. 2017; Fey et al. 2018], typically using local surface parameterization via the logarithmic map. Unfortunately, local parameterizations are only defined up to rotation in the tangent plane, leading to several methods that address this issue through the design of equivariant surface networks [Poulenard and Ovsjanikov 2018; He et al. 2020; Wiersma et al. 2020; Yang et al. 2021; Haan et al. 2021; Mitchel et al. 2021]. Likewise, operating on vector-valued data in a local tangent space can expand the expressivity of the filter space [Wiersma et al. 2020; Mitchel et al. 2021; Beani et al. 2021]. Our method leverages learned gradient features (Section 3.4), which geometrically require only a local spatial gradient operation, and sidesteps the challenge of equivariant filters by using only inner products, which are naturally invariant. These gradient features are built on local differential operators, which have also been exploited in other recent methods (e.g., Eliasof and Treister [2020] and Jiang et al. [2018]).

Surface mesh structure has also been used in a variety of graph-like approaches that specifically leverage mesh connectivity [Hanocka et al. 2019; Verma et al. 2018; Lim et al. 2018; Gong et al. 2019; Feng et al. 2019; Milano et al. 2020; Hajij et al. 2020; Bodnar et al. 2021], the structure of discrete operators [Smirnov and Solomon 2021], or even random walks along edges [Lahav and Tal 2020], among others. While accurate, these methods can be costly on densely sampled shapes and often are not robust to significant changes in the mesh structure (Figure 3).

2.2 Spectral Methods

Our use of diffusion is also loosely related to techniques that operate in the spectral domain and often exploit the link between convolution and operations in a derived (e.g., Fourier or Laplace–Beltrami) basis, including Bruna et al. [2014], Levie et al. [2018], and Sun et al. [2020]. Such methods have a long history in graph-based learning and are well rooted in data analysis more broadly, including Laplacian eigenmaps [Belkin and Niyogi 2003], spectral clustering [Vallet and Lévy 2008], and diffusion maps [Coifman et al. 2005]. In geometry processing, spectral methods have been used for a range of tasks, including multi-resolution representation [Levy 2006], segmentation [Rustamov 2007], and matching [Ovsjanikov et al. 2012], among others [Vallet and Lévy 2008; Zhang et al. 2010].

Unfortunately, Laplacian eigenfunctions depend on each shape, and thus coefficients or learned filters from one shape are not trivially transferable to another. Levie et al. [2019] argue for transfer between discretizations of the same shape, but 3D geometric learning typically demands transfer between different shapes. Functional maps [Ovsjanikov et al. 2012] can be used to “translate” coefficients between shapes, and have been used, for example, in Yi et al. [2017] with spectral filter learning. Deep functional maps [Litany et al. 2017] propose to learn features in the primal domain, which are then projected onto the Laplace–Beltrami basis for functional map estimation. However, the features are still learned either with multi-layer perceptrons (MLPs) starting from pre-computed descriptors [Litany et al. 2017; Roufosse et al. 2019; Ginzburg and Raviv 2020; Halimi et al. 2019] or using point-based architectures [Donati et al. 2020].

Instead, we propose an approach that learns the parameters of a diffusion process that is directly transferable across shapes and, as we show below, can be used effectively in applications like non-rigid shape matching. We also stress that DiffusionNet is not spectral in nature and only uses spectral operations as an acceleration technique for evaluating diffusion efficiently.

We also note that our use of the Laplacian in defining the diffusion operator is related to methods based on polynomials of the Laplacian [Kostrikov et al. 2018; Defferrard et al. 2016], CayleyNets [Levie et al. 2018], and their recent application in shape matching using ACSCNNs [Li et al. 2020b]. However, we demonstrate that complex polynomial filters can be replaced with simple learned diffusion and, moreover, that gradient features can inject orientation information into the network, improving performance and robustness.

Similarly to our approach, diffusion for smooth communication has been explored on graphs [Klicpera et al. 2019; Xu et al. 2019], images [Liu et al. 2016], and point clouds [Hansen et al. 2018]. In contrast, our method directly learns a diffusion time per-feature (which significantly improves performance, Table 7), incorporates a learned gradient operation, and is applied directly to mesh surfaces.

Pooling. In surface learning, it is nontrivial to define pooling, especially on meshes where it often amounts to mesh simplification [Hanocka et al. 2019]. Various recent operations have been proposed for point cloud [Lin et al. 2020; Hu et al. 2020], mesh [Milano et al. 2020; Zhou et al. 2020], or even graph pooling [Ma et al. 2020; Li et al. 2020a]. A key advantage of our approach is that it automatically supports global spatial support without any downsampling operation, simplifying implementation and improving learning.


3 METHOD

Our method consists of three main building blocks: MLPs applied at each point to model pointwise scalar functions of feature channels, a learned diffusion operation for propagating information across the domain, and local spatial gradient features to expand the network’s filter space beyond radially symmetric filters. In this section, we describe these main numerical components, and then we assemble them into an effective architecture in Section 4. Our method is defined in a representation-agnostic manner; applying it to meshes or point clouds simply amounts to assembling the appropriate Laplacian and gradient matrices as we discuss below.

3.1 Pointwise Perceptrons

On a mesh or point cloud with \( V \) vertices, we consider a collection of \( D \) scalar features defined at each vertex. Our first basic building block is a pointwise function \( f : \mathbb {R}^D \rightarrow \mathbb {R}^D \), which is applied independently at every vertex to transform the features. We represent these pointwise functions as a standard MLP with shared weights across all vertices. Although these MLPs can fit arbitrary functions at each point, they do not capture the spatial structure of the surface or allow any communication between vertices, so a richer structure is needed.

Past approaches for communication have ranged from global reductions to explicit geodesic convolutions—instead, we will demonstrate that a simple learned diffusion layer effectively propagates information, without the need for potentially costly or error-prone computations.

3.2 Learned Diffusion

In the continuous setting, diffusion of a scalar field \( u \) on a domain is modeled by the heat equation, (1) \( \begin{equation} \tfrac{d}{dt} u_t = \Delta u_t, \end{equation} \) where \( \Delta \) is the Laplacian (or, more formally, the Laplace–Beltrami operator). The action of diffusion can be represented via the heat operator \( H_t \), which is applied to some initial distribution \( u_0 \) and produces the diffused distribution \( u_t \); this action can be defined as \( H_t(u_0) = \exp (t \Delta) u_0 \), where \( \exp \) is the operator exponential. Over time, diffusion is an increasingly-global smoothing process: for \( t=0 \), \( H_t \) is the identity map, and as \( t \rightarrow \infty \) it approaches the average over the domain.

We propose to use the heat equation to spatially propagate features for learning on surfaces; its principled foundations ensure that results are largely invariant to the way the surface is sampled or meshed. To discretize diffusion, one replaces \( \Delta \) with the weak Laplace matrix \( L \) and mass matrix \( M \). Here, \( L \) is a positive semi-definite sparse matrix \( L \in \mathbb {R}^{V \times V} \) with the opposite sign convention such that \( M^{-1} L\!\approx \!-\Delta \). The number of entries in \( L \) and \( M \) is generally \( O(V) \), scaling effectively to large inputs (Table 6). On triangle meshes, we will use the cotan-Laplace matrix, which is ubiquitous in geometry processing applications [MacNeal 1949; Pinkall and Polthier 1993; Crane et al. 2013]; for point clouds, we will use the related Laplacian from Sharp and Crane [2020]. This matrix has also been defined for voxel grids [Caissard et al. 2019], polygon meshes [Bunge et al. 2020], tetrahedral meshes [Alexa et al. 2020], and so on. The weak Laplace matrix is accompanied by a mass matrix \( M \), such that the rate of diffusion is given by \( -M^{-1} L u \). Here, \( M \) will be a “lumped” diagonal matrix of areas associated with each vertex.
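
To make the discretization concrete, the following minimal sketch assembles the weak cotan-Laplace matrix \( L \) and lumped mass matrix \( M \) for a triangle mesh with SciPy, following the sign convention above. The function name and explicit per-face loop are illustrative, not taken from our released code, which vectorizes this construction.

```python
import numpy as np
import scipy.sparse as sp

def cotan_laplacian(verts, faces):
    """verts: (V, 3) float positions; faces: (F, 3) int triangle indices.
    Returns (L, M): weak cotan-Laplace and lumped mass matrices."""
    V = verts.shape[0]
    rows, cols, vals = [], [], []
    areas = np.zeros(V)
    for f in faces:
        p = verts[f]
        area = 0.5 * np.linalg.norm(np.cross(p[1] - p[0], p[2] - p[0]))
        areas[f] += area / 3.0           # lumped (barycentric) vertex areas
        for c in range(3):               # corner k, opposite edge (i, j)
            i, j, k = f[(c + 1) % 3], f[(c + 2) % 3], f[c]
            e1, e2 = verts[i] - verts[k], verts[j] - verts[k]
            w = 0.5 * np.dot(e1, e2) / np.linalg.norm(np.cross(e1, e2))
            rows += [i, j, i, j]
            cols += [j, i, i, j]
            vals += [-w, -w, w, w]       # off-diagonal -w, diagonal +w
    L = sp.csr_matrix((vals, (rows, cols)), shape=(V, V))  # duplicates are summed
    M = sp.diags(areas)
    return L, M
```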

We define a learned diffusion layer \( h_t : \mathbb {R}^{V} \rightarrow \mathbb {R}^{V} \), which diffuses a feature channel \( u \) for learned time \( t \in \mathbb {R}_{\ge 0} \). In our networks, \( h_t(u) \) is applied independently to each feature channel, with a separate learned time \( t \) per channel. Learning the diffusion parameter is a key strength of our method, allowing the network to continuously optimize for spatial support ranging from purely local to totally global and even choose different receptive fields for each feature (Figure 4). We thus sidestep challenges like manually choosing the support radius of a convolution or sizes for a pooling hierarchy.

In the language of deep learning, diffusion can be viewed as a kind of smooth mean (average) pooling operation with many benefits: It has a geometrically principled meaning, its support ranges from purely local to totally global via the choice of diffusion time, and it is differentiable with respect to diffusion time, allowing spatial support to be automatically optimized as a network parameter.

A note on generality. Remarkably, eschewing traditional representations of convolutions in favor of diffusion does not reduce the expressive power of our networks. This is supported by the following theoretical result (that we prove in Appendix A), which shows that radially symmetric convolutions are contained in the function space defined by diffusion followed by a pointwise map:

Lemma 1 (Inclusion of Radially Symmetric Convolutions).

For a signal \( u: \mathbb {R}^2 \rightarrow \mathbb {R} \), let \( U_r(p) : \mathbb {R}_{\ge 0} \rightarrow \mathbb {R} \) denote the integral of \( u \) along the \( r \)-sphere at \( p \), and let \( u_t(p) : \mathbb {R}_{\ge 0} \rightarrow \mathbb {R} \) denote the signal value at \( p \) after diffusion for time \( t \). Then there exists a function transform \( \mathcal {T} \) that recovers \( U_r(p) \) from \( u_t(p), \) \( \begin{equation*} U_r(p) = \mathcal {T}[u_t(p)](r). \end{equation*} \) Thus convolution with a radial kernel \( \alpha : \mathbb {R}_{\ge 0} \rightarrow \mathbb {R} \) is given by \( \begin{equation*} (u * \alpha)(p) = \int _{\mathbb {R}^2} \alpha (|q-p|) u_0(q)\ dq = \int _{0}^{\infty } \alpha (r) \mathcal {T}[u_t(p)](r)\ dr, \end{equation*} \) which is a pointwise operation at \( p \) on the diffused values \( u_t(p) \).

This fact is significant, because it suggests that simple and robust diffusion can, without loss of generality, be used to replace complicated operations such as radial geodesic convolution.

Importantly, we will also extend our architecture beyond radially symmetric filters by incorporating gradient features (Section 3.4).

3.3 Computing Diffusion

Many numerical schemes could potentially be used to evaluate the diffusion layer, \( h_t(u) \), from direct solvers [Chen et al. 2008] to hierarchical schemes [Vaxman et al. 2010; Liu et al. 2021]. In particular, we seek schemes that are efficient as well as differentiable, to enable network training. Here we describe two simple methods considered in our experiments. The first scheme we consider is an implicit timestep, which is straightforward but requires solving large sparse linear systems, and the second is spectral expansion, which uses only efficient dense arithmetic at evaluation time but requires some modest precomputation. Both are easily implemented using common numerical libraries, and we observe that networks trained with either approach have similar accuracy. Efficiency is evaluated in Section 5.6; we generally recommend spectral acceleration.

3.3.1 Direct Implicit Timestep.

Perhaps the simplest effective approach to simulate diffusion is a single implicit Euler timestep, (2) \( \begin{equation} h_t(u) := (M + tL)^{-1} M u, \end{equation} \) which amounts to solving a (sparse) linear system for each diffusion operation. Using an implicit backward timestep rather than an explicit forward timestep is crucial, as it makes the scheme stable, allows global support, and yields a reasonable approximation of diffusion after just one step. Solving linear systems (including derivative computation) is supported in modern learning software frameworks, allowing Equation (2) to implement a learnable diffusion block. However, this amounts to solving a distinct large linear system for each channel, and GPU-based computation may fall back to solving dense linear systems, which means that direct implicit timesteps may not scale well to large problems.
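
A minimal PyTorch sketch of Equation (2), using a dense solve for clarity; in practice \( L \) is sparse, and a sparse solver or the spectral scheme below is preferable for large meshes. The function name is illustrative.

```python
import torch

def diffuse_implicit(u, L, M_diag, t):
    """u: (V,) one feature channel; L: (V, V) Laplacian (dense here);
    M_diag: (V,) lumped vertex areas; t: scalar learned diffusion time."""
    A = torch.diag(M_diag) + t * L     # system matrix (M + tL)
    b = M_diag * u                     # right-hand side M u
    return torch.linalg.solve(A, b)    # differentiable w.r.t. u and t
```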

3.3.2 Spectral Acceleration.

An alternate approach is to leverage a closed-form expression for diffusion in the basis of low-frequency Laplacian eigenfunctions [Vallet and Lévy 2008; Zhang et al. 2010]. Once the eigenbasis has been precomputed, diffusion can then be evaluated for any time \( t \) via elementwise exponentiation. Truncating diffusion to a low-frequency basis incurs some approximation error, but we find that this approximation has little effect on our method (Figure 13), perhaps because diffusion quickly damps high-frequency content regardless.

For weak Laplace matrix \( L \) and mass matrix \( M \), the eigenvectors \( \phi _i \in \mathbb {R}^V \) are solutions to (3) \( \begin{equation} L \phi _i = \lambda _i M \phi _i, \end{equation} \) corresponding to the first \( k \) smallest-magnitude eigenvalues \( \lambda _1, \ldots , \lambda _k \). We normalize them so that \( \phi _i^T M \phi _i = 1 \). These eigenvectors are easily precomputed for each shape of interest via standard numerical packages [Lehoucq et al. 1998]; the inset figure shows several example functions \( \phi _i \) for a surface of a human shape.
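
Continuing the sketch above, the eigenbasis of Equation (3) can be precomputed with SciPy's shift-invert Lanczos solver; the parameter choices here are illustrative defaults rather than our exact code.

```python
import scipy.sparse.linalg as sla

k = 128  # basis size used in our experiments (see Appendix B)
# shift-invert targets the smallest eigenvalues of L phi = lambda M phi;
# the returned eigenvectors are M-orthonormal up to solver tolerance
evals, evecs = sla.eigsh(L, k=k, M=M, sigma=1e-8, which="LM")
```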

Let \( \Phi := [\phi _i] \in \mathbb {R}^{V \times k} \) be the stacked matrix of eigenvectors, which form an orthonormal basis with respect to \( M \). We can then project any scalar function \( u \) to obtain its coefficients \( c \) in the spectral basis via \( c \leftarrow \Phi ^T M u \) and recover values at vertices as \( u \leftarrow \Phi c \). Conveniently, diffusion for time \( t \) is easily expressed as an elementwise scaling of spectral coefficients according to \( c_i \leftarrow e^{-\lambda _i t} c_i \). The diffusion layer \( h_t(u) \) is then evaluated by projecting onto the spectral basis, evaluating pointwise diffusion, and projecting back (4) \( \begin{equation} h_t(u) := \Phi \begin{bmatrix}e^{-\lambda _1 t} \\ \vdots \\ e^{-\lambda _k t} \end{bmatrix} \odot (\Phi ^T M u), \end{equation} \) where \( \odot \) denotes the Hadamard (elementwise) product. This operation is efficiently evaluated using dense linear algebra operations like elementwise exponentiation and matrix multiplication and is easily differentiable with respect to \( u \) and \( t \).
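
The sketch below implements Equation (4) as a learnable layer with one diffusion time per channel, optimized by backpropagation as described in Section 3.2. The shapes and the nonnegativity handling (an absolute value on \( t \)) are our assumptions for illustration; the released implementation may differ.

```python
import torch
import torch.nn as nn

class SpectralDiffusion(nn.Module):
    def __init__(self, n_channels):
        super().__init__()
        # one learned diffusion time per feature channel
        self.t = nn.Parameter(0.1 * torch.ones(n_channels))

    def forward(self, u, evals, evecs, M_diag):
        """u: (V, D) features; evals: (k,); evecs: (V, k); M_diag: (V,)."""
        c = evecs.T @ (M_diag[:, None] * u)                # project: c = Phi^T M u
        decay = torch.exp(-evals[:, None] * self.t.abs())  # (k, D), t kept nonnegative
        return evecs @ (decay * c)                         # unproject: Phi (decay * c)
```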

Remarks. We emphasize that DiffusionNet can still learn high-frequency outputs despite the use of a low-frequency basis (e.g., in Section 5.4). Intuitively, diffusion is used for communication across points, for which a low-frequency approximation is typically sufficient, while MLPs and gradient features learn high-frequency features as needed for a task. Additionally, we note that DiffusionNet is not a spectral learning method—spectral coefficients are never used to represent filters or latent data, and thus no issues arise due to differing eigenbases on different shapes. Spectral acceleration is merely one possible numerical scheme to compute diffusion.

3.4 Spatial Gradient Features

Our learned diffusion layer enables propagation of information across different points on a shape, but it supports only radially symmetric filters about a point. The last building block in our method enables a larger space of filters by computing additional features from the spatial gradients of signal values at vertices (Figure 6). Specifically, we construct features from the inner products between pairs of feature gradients at each vertex, after applying a learned scaling or rotation.

Evaluating gradients. We will express the spatial gradient of a scalar function on a surface as a 2D vector in the tangent space of each vertex. These gradients can be evaluated by a standard procedure, choosing a normal vector at each vertex (given as input or locally approximated) and then projecting neighbors into the tangent plane—either 1-ring neighbors on a mesh or \( m \)-nearest neighbors in a point cloud. The gradient is then computed in the tangent plane via least-squares approximation of the function values at neighboring points (see Mukherjee and Wu [2006] for analysis). These gradient operators at each vertex can be assembled into a sparse matrix \( G \in \mathbb {C}^{V \times V} \), which is applied to a vector \( u \) of real values at vertices to produce gradient tangent vectors at each vertex. This matrix does not depend on the features and can be precomputed once for each shape. We use complex numbers as a convenient notation for tangent vectors in an arbitrary reference basis in the tangent plane of each point (as in Knöppel et al. [2013] and Sharp et al. [2019], etc.). If the normals are consistently oriented, then the imaginary axis is chosen to form a right-handed basis in 3D with respect to the outward surface normal.
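
As a rough sketch of this construction, the least-squares fit at a single vertex can be written as follows; neighbor selection and a unit normal are assumed given, and the full matrix \( G \) is assembled by stacking one such row per vertex. Names here are illustrative.

```python
import numpy as np

def vertex_gradient_row(verts, center, neighbors, normal):
    """Least-squares tangent gradient at one vertex (unit normal assumed).
    Returns complex weights for [center] + neighbors encoding the 2D gradient."""
    # build an arbitrary orthonormal tangent basis (e1, e2) for this normal
    e1 = np.cross(normal, [1.0, 0.0, 0.0])
    if np.linalg.norm(e1) < 1e-6:                   # normal parallel to x-axis
        e1 = np.cross(normal, [0.0, 1.0, 0.0])
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(normal, e1)
    # project edge vectors to neighbors into the tangent plane
    edges = verts[neighbors] - verts[center]        # (m, 3)
    X = np.stack([edges @ e1, edges @ e2], axis=1)  # (m, 2) planar coordinates
    # fit: X @ grad ~ u[neighbors] - u[center]; the pseudoinverse gives weights
    # acting linearly on function values, which become one sparse row of G
    W = np.linalg.pinv(X)                           # (2, m)
    w = W[0] + 1j * W[1]                            # tangent vectors as complex numbers
    return np.concatenate([[-w.sum()], w])          # weight for center, then neighbors
```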

Learned pairwise products. Equipped with per-vertex spatial gradients of each channel, we learn informative scalar features by evaluating an inner product between pairs of feature gradients at each vertex after a learned linear transformation. Inner products are invariant to rotations of the coordinate system, so these features are invariant to the choice of tangent basis at vertices, as expected. Putting it all together, given a collection of \( D \) scalar feature channels, for each channel \( u \) we first construct its spatial gradient as \( z_u \in \mathbb {C}^{V} \), a vector of local 2D gradients per-vertex, (5) \( \begin{equation} z_u := G u, \end{equation} \) and then at each vertex \( v \), we stack the local gradients of all channels to form \( w_v \in \mathbb {C}^{D} \) and obtain real-valued features \( g_v \in \mathbb {R}^D \) as (6) \( \begin{equation} g_v := \textrm {tanh}(\textrm {Re} (\overline{w}_v \odot A w_v)), \end{equation} \) where \( A \) is a learned square \( D \times D \) matrix, and taking the real part \( \textrm {Re} \) after a complex conjugate \( \overline{w}_v \) is again just a notational convenience for dot products between pairs of 2D vectors. This means the \( i^{\textrm {th}} \) entry of the output at vertex \( v \) is given by the dot product \( g_v(i) = \textrm {tanh}(\textrm {Re} \lbrace \sum _{j=1}^D \overline{w}_v(i) A_{ij} w_v(j) \rbrace) \), so that each inner product is scaled by a learned coefficient \( A_{ij} \). The outer \( \textrm {tanh}(\cdot) \) nonlinearity is not fundamental, but we find that it stabilizes training.
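
A minimal PyTorch sketch of Equations (5) and (6); \( G \) is written as a dense complex tensor for simplicity (sparse in practice), and the `with_rotations` flag anticipates the complex-versus-real choice of \( A \) discussed next.

```python
import torch
import torch.nn as nn

class GradientFeatures(nn.Module):
    def __init__(self, n_channels, with_rotations=True):
        super().__init__()
        # complex A learns rotations and scalings (oriented surfaces);
        # real A learns scalings only (unoriented inputs), see discussion below
        dtype = torch.cfloat if with_rotations else torch.float
        self.A = nn.Parameter(torch.eye(n_channels, dtype=dtype))

    def forward(self, u, G):
        """u: (V, D) real features; G: (V, V) complex gradient matrix."""
        w = G @ u.to(G.dtype)                  # Eq. (5): per-vertex gradients
        Aw = w @ self.A.to(w.dtype).T          # learned transform of gradients
        # Eq. (6): rotation-invariant inner products, squashed by tanh
        return torch.tanh((torch.conj(w) * Aw).real)
```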

The choice of \( A \) as a complex or real matrix has a subtle relationship with the orientation of the underlying surface. Multiplying \( w_v(j) \) by a complex scalar both rotates and scales local gradient vectors before taking inner products (recall that complex multiplication can be interpreted as a rotation and scaling in the complex plane). In contrast, real \( A \) only allows scaling. However, the direction of rotation (clockwise or counter-clockwise) depends on the choice of the outward normal and hence on orientation—so surfaces with consistently oriented normals gain a richer representation by learning a complex matrix, whereas surfaces without consistent orientation (e.g., raw point clouds) should restrict to real \( A \).

In Figure 5, a small synthetic experiment (detailed in Appendix C) demonstrates how these learned rotations allow our method to disambiguate bilateral symmetry even in a purely intrinsic representation, a common challenge in non-rigid shape correspondence.


4 DIFFUSIONNET ARCHITECTURE

The previous section establishes three ingredients for learning on surfaces: an MLP applied independently at each vertex to represent pointwise functions, learned diffusion for spatial communication, and spatial gradient features to model directional filters. We combine these ingredients to construct DiffusionNet (Figure 7), composed of several DiffusionNet blocks. This simple network operates on a fixed channel width \( D \) of scalar values throughout, with each DiffusionNet block diffusing the features, constructing spatial gradient features, and feeding the result to an MLP.

We include residual connections to stabilize training [He et al. 2016], as well as linear layers to convert to the expected input and output dimension. When appropriate, results at the edges or faces of a mesh can be computed by averaging network outputs from the incident vertices, e.g., to segment the faces of a mesh. Various activations can be appended to the end of the network based on the problem at hand, such as a softmax for segmentation, or a global mean followed by a softmax for classification; otherwise, this same architecture is used for all experiments.
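The following sketch assembles the pieces above into one block; the choice to concatenate the input, diffused, and gradient features before the MLP, as well as the MLP depth, are illustrative simplifications of Figure 7.

```python
import torch
import torch.nn as nn

class DiffusionNetBlock(nn.Module):
    def __init__(self, width):
        super().__init__()
        self.diffusion = SpectralDiffusion(width)      # sketch from Section 3.3.2
        self.grad_features = GradientFeatures(width)   # sketch from Section 3.4
        # pointwise MLP over the concatenated input, diffused, and gradient features
        self.mlp = nn.Sequential(
            nn.Linear(3 * width, width), nn.ReLU(), nn.Linear(width, width))

    def forward(self, u, evals, evecs, M_diag, G):
        u_diff = self.diffusion(u, evals, evecs, M_diag)
        u_grad = self.grad_features(u_diff, G)
        out = self.mlp(torch.cat([u, u_diff, u_grad], dim=-1))
        return u + out                                 # residual connection
```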

Remarkably, we do not find it necessary to use any spatial convolutions or pooling hierarchies on surfaces—avoiding these potentially complex operations helps keep DiffusionNet simple and robust.

Invariance. DiffusionNet is invariant to rigid motion of the underlying shapes as long as the input features remain unchanged, due to the intrinsic geometric nature of diffusion and spatial gradients. The overall invariance then depends on the choice of input features.

4.1 Input Features

DiffusionNet takes a vector of scalar values per vertex as input features. Here, we consider two simple choices of features; others could easily be included when available. Most directly, we simply use the raw 3D coordinates of a shape as input; rotation augmentation can be used to promote rigid invariance when inputs are not consistently aligned. When rigid or even non-rigid invariance is desired, we instead use the heat kernel signatures (HKS) [Sun et al. 2009] as input; these signatures are trivially computed from the spectral basis in Section 3.3.2. Due to the intrinsic nature of our approach, with HKS as input, the networks are invariant to any orientation-preserving isometric deformation of the shape. Higher-order descriptors such as SHOT [Tombari et al. 2010] seem unnecessary and may be unstable under remeshing [Donati et al. 2020].
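
For concreteness, a short sketch of the HKS computation from the precomputed eigenbasis of Section 3.3.2, using the identity \( \mathrm{hks}_t(v) = \sum _i e^{-\lambda _i t} \phi _i(v)^2 \) and the 16 logarithmically spaced times used in our experiments (Section 5).

```python
import numpy as np

# evals: (k,) eigenvalues, evecs: (V, k) eigenvectors from Section 3.3.2
ts = np.logspace(np.log10(0.01), np.log10(1.0), 16)  # 16 times on [0.01, 1]
decay = np.exp(-np.outer(ts, evals))                 # (16, k) e^{-lambda_i t}
hks = (evecs ** 2) @ decay.T                         # (V, 16) input features
```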


5 EXPERIMENTS AND ANALYSIS

The same network architecture achieves state-of-the-art results across many tasks and, more importantly, offers new and valuable capabilities. See the appendix for additional analyses.

Setup. We use the same basic 4-block DiffusionNet architecture and training procedure for all tasks, varying the network size from a small 32-width (30k parameter) to a large 256-width (1.8M parameter) DiffusionNet according to the scale of the problem. The shape of the first and last linear layers is adapted to the input and output dimension for the problem. MLPs use ReLU activations and optionally dropout after intermediate linear layers. We let “xyz” and “hks” denote networks with positions and heat kernel signatures as input, respectively. All inputs are centered and scaled to be contained in a unit sphere, and heat kernel signatures are sampled at 16 values of \( t \) logarithmically spaced on \( [0.01, 1] \). We do not use any data augmentation, except random rotations in tasks where positions are used as features yet a rotation-invariant network is desired.

We fit DiffusionNet using the ADAM optimizer with an initial learning rate of 0.001 and a batch size of 1, training for 200 epochs and decaying the learning rate by a factor of 0.5 every 50 epochs. Cross-entropy loss is used for labelling problems. Spectral acceleration is used to evaluate diffusion except where noted, truncated to a \( k=128 \) eigenbasis. On point clouds, 30 nearest-neighbors are used to assemble matrices. Test accuracies are measured after the last epoch of training.
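
The optimization schedule above corresponds to the following standard PyTorch setup; `model` and the data loop are placeholders for the task at hand.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(200):
    for u, evals, evecs, M_diag, G, labels in dataset:  # batch size 1
        optimizer.zero_grad()
        preds = model(u, evals, evecs, M_diag, G)
        loss = loss_fn(preds, labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```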

Implementation details. Precomputation to assemble matrices and compute the Laplacian eigenbasis for spectral acceleration is performed once as a preprocess on the CPU using SciPy [Virtanen et al. 2020; Lehoucq et al. 1998]. Networks are implemented in PyTorch [Paszke et al. 2019] and evaluated on a single GPU with standard backpropagation. Performance is discussed in Section 5.6; we find that DiffusionNet is very efficient and scalable compared to recent mesh-based learning methods. Code and reproducible experiments are available at github.com/nmwsharp/diffusion-net.

5.1 Classification

We first apply DiffusionNet to classify meshes in the SHREC-11 dataset [Lian et al. 2011], which has 30 categories of 20 shapes each. We demonstrate that DiffusionNet learns successfully even in the presence of limited data. As in the other cited results, we train on just 10 samples per class; our results are averaged over 10 trials of the experiment with random training splits. We train with a cross-entropy loss using a label smoothing factor of 0.2 (see discussion in Goyal et al. [2021]). We use a 32-width DiffusionNet for hks features and a 64-width DiffusionNet for xyz features with rotation augmentation. DiffusionNet achieves the highest reported accuracy when applied directly on the original dataset models or to the simplified models common in recent mesh-based learning work (Table 1).

Table 1. SHREC11 Classification

Method | Accuracy
GWCNN [Ezuz et al. 2017] | 90.3%
MeshCNN\( ^\dagger \) [Hanocka et al. 2019] | 91.0%
HSN\( ^\dagger \) [Wiersma et al. 2020] | 96.1%
MeshWalker\( ^\dagger \) [Lahav and Tal 2020] | 97.1%
PD-MeshNet\( ^\dagger \) [Milano et al. 2020] | 99.1%
HodgeNet\( ^\dagger \) [Smirnov and Solomon 2021] | 94.7%
FC\( ^\dagger \) [Mitchel et al. 2021] | 99.2%
DiffusionNet - xyz\( ^\dagger \) | 99.4%
DiffusionNet - xyz | 99.0%
DiffusionNet - hks\( ^\dagger \) | 99.5%
DiffusionNet - hks | 99.7%

DiffusionNet achieves nearly-perfect accuracy classifying 30-class SHREC11 [Lian et al. 2011] while training on just 10 samples per class. Results marked by \( ^\dagger \) are trained and tested on simplified models.

5.2 Segmentation

Molecular segmentation. We evaluate a 128-width DiffusionNet on the task of segmenting RNA molecules into functional components, using the dataset introduced by Poulenard et al. [2019]. This dataset consists of 640 RNA surface meshes of about 15k vertices each, extracted from the Protein Data Bank [Berman et al. 2000] and labelled at each vertex according to 259 atomic categories, with a random 80-20 train-test split. We learn these labels directly on the raw meshes, as well as on point clouds of 4096 uniformly sampled points as in past work [Poulenard et al. 2019]. For comparison, we cite point cloud results reported in Poulenard et al. [2019] and additionally train methods related to ours, SplineCNN [Fey et al. 2018] and a Dirac Surface Network [Kostrikov et al. 2018], on meshes. We also attempted to train MeshCNN [Hanocka et al. 2019] and HSN [Wiersma et al. 2020] but found the former prohibitively expensive, while the latter could not successfully preprocess the data. Our method achieves state-of-the-art accuracy on both the mesh and point cloud variants of the problem (Table 2, Figure 8). Learning directly on the mesh yields greater accuracy, perhaps because no information is lost to point sampling and the surface structure is preserved.


Table 2. RNA Surface Segmentation

Human segmentation. We train a 128-width DiffusionNet with dropout to segment human body parts on the composite dataset of Maron et al. [2017], containing models from several other human shape datasets [Bogo et al. 2014; Anguelov et al. 2005; Adobe 2016; Vlasic et al. 2008; Giorgi et al. 2007]. Additionally, we cite a variety of reported results from other approaches on this task, as reported by the respective original authors and by Wiersma et al. [2020]. For clarity, we distinguish between variants of this task in past work that used simplified meshes and soft ground truth; more details are presented in Appendix C. Our model is quite effective using both rotation-augmented raw coordinates and heat kernel signatures as input (Table 3).


Table 3. Human Part Segmentation

5.3 Functional Correspondence

Functional maps compute a correspondence between a pair of shapes by finding a linear transformation between spectral bases, aligning some set of input features [Ovsjanikov et al. 2012]. Recent work has shown that learned features can improve performance, e.g., Donati et al. [2020] and Litany et al. [2017]. Here, we demonstrate that using DiffusionNet as a feature extractor outperforms other recent approaches, yielding state-of-the-art correspondence results in both the supervised and weakly supervised variants of the problem. We emphasize that the spectral representation in functional maps is unrelated to the spectral acceleration from Section 3.3.2, which is merely a scheme for evaluating diffusion; DiffusionNet itself does not learn in the spectral domain.
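
As a reference point, a minimal sketch of the functional map estimation step that consumes learned features: features on each shape are projected into that shape's eigenbasis, and the map \( C \) is found by least squares. Regularizers used in the deep functional map pipelines cited above are omitted, and the function name is illustrative.

```python
import torch

def estimate_fmap(feat_x, feat_y, evecs_x, evecs_y, mass_x, mass_y):
    """feat_*: (V, D) learned features; evecs_*: (V, k); mass_*: (V,) vertex areas."""
    A = evecs_x.T @ (mass_x[:, None] * feat_x)   # (k, D) coefficients on shape X
    B = evecs_y.T @ (mass_y[:, None] * feat_y)   # (k, D) coefficients on shape Y
    # find C with C A ~ B, i.e. solve A^T C^T ~ B^T in the least-squares sense
    C = torch.linalg.lstsq(A.T, B.T).solution.T  # (k, k) functional map X -> Y
    return C
```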

Our experiments follow the setup of Donati et al. [2020], training and evaluating on both SCAPE [Anguelov et al. 2005; Ren et al. 2018] and FAUST [Bogo et al. 2014], including training on one dataset and evaluating on the other. In the supervised setting, we fit dataset-provided correspondences, and we generate rigid-invariant models by randomly rotating all inputs for training and testing. For the weakly supervised setting, we use the dataset and losses advocated in Sharma and Ovsjanikov [2020], where rigid alignment of the input is used as weak supervision, without known correspondences. In all cases, we extract point-to-point maps between test shapes and evaluate them against ground truth dense correspondences; for simplicity, we compare all methods without post-processing the maps, though we also report accuracies after postprocessing with ZoomOut for our method [Melzi et al. 2019]. In addition to citing results obtained using the KPConv feature extractor [Thomas et al. 2019] by Donati et al. [2020], we also train HSN [Wiersma et al. 2020], ACSCNN [Li et al. 2020b], and our own 128-width DiffusionNet with dropout. We also tried MeshCNN [Hanocka et al. 2019], but it proved to be prohibitively expensive at 14 hours per epoch.

As shown in Table 4, DiffusionNet yields state-of-the-art results for non-rigid shape correspondence in both the supervised and weakly supervised settings, especially when transferring between datasets. One might wonder whether the improvement in this task truly stems from our DiffusionNet architecture or from the use of HKS features. As shown in the same table, training KPConv with HKS input features confirms that DiffusionNet yields significant improvements regardless. Figure 9 visualizes the resulting correspondences on a challenging test pair, where only DiffusionNet achieves a high-quality correspondence when generalizing after training on a different dataset.

Table 4. Functional Map Correspondence

Method | FAUST | SCAPE | F on S | S on F
KPConv [Thomas et al. 2019] | 3.1 | 4.4 | 11.0 | 6.0
KPConv - hks | 2.9 | 3.3 | 10.6 | 5.5
HSN [Wiersma et al. 2020] | 3.3 | 3.5 | 25.4 | 16.7
ACSCNN [Li et al. 2020b] | 2.7 | 3.2 | 8.4 | 6.0
DiffusionNet - hks | 2.7 | 3.0 | 3.8 | 3.0
DiffusionNet - xyz | 2.7 | 3.0 | 3.3 | 3.0
+ ZoomOut | 1.9 | 2.4 | 2.4 | 1.9
WSupFMNet | 3.3 | 7.3 | 11.7 | 6.2
WSupFMNet + DiffusionNet - xyz | 3.8 | 4.4 | 4.8 | 3.6
+ ZoomOut | 1.9 | 2.6 | 2.7 | 1.9

Our approach yields state-of-the-art correspondences when used as a feature extractor for deep functional maps, both in the supervised (top, as in Donati et al. [2020]) and the weakly supervised setting (bottom, as in Sharma and Ovsjanikov [2020]). The “+ ZoomOut” rows apply ZoomOut post-processing to the previous result [Melzi et al. 2019]. “X on Y” means train on X and test on Y. Values are mean geodesic error \( \times 100 \) on unit-area shapes.

5.4 Discretization Agnostic Learning

A key benefit of DiffusionNet compared to many past approaches is that its outputs are robust to changes in the discretization of the input (e.g., different meshes of the same shape or a mesh vs. a point cloud). This property is essential for practical applications, where meshes encountered at inference time are likely to be tessellated differently from the training set. Other invariants (e.g., rigid invariance) can be encouraged via data augmentation, but for discretization it is impractical to generate augmented inputs across all possible sampling patterns. Below, we demonstrate that DiffusionNet generalizes quite well across discretizations without any special regularization or augmentation, especially compared to recent mesh-based methods.

We study discretization invariance using a popular formulation of the shape correspondence task on the FAUST human dataset [Bogo et al. 2014], where each vertex of a mesh is to be labelled with the corresponding vertex on a template mesh. Importantly, these input meshes are already manually aligned templates, with exactly identical mesh connectivity. Past work has achieved near-perfect accuracy in this problem setup (e.g., Fey et al. [2018] and Li et al. [2020b]); however, we suggest that these models primarily overfit to mesh graph structure. In contrast, DiffusionNet learns an accurate and general function of the shape itself, despite the synthetic setup.

To experimentally quantify this effect, we construct a version of the FAUST test set after remeshing with several strategies: orig is the original test mesh, iso is a uniform isotropic remeshing, dense refines the mesh in randomly sampled regions, qes first refines the meshes and then applies quadric error simplification [Garland and Heckbert 1997], and cloud is a point cloud with normals sampled from the surface (Figure 10). Related remeshings have appeared in Poulenard and Ovsjanikov [2018] and Wiersma et al. [2020], but the procedure therein left large regions of the mesh unchanged. Ground truth for evaluation is defined via nearest-neighbor on the original test mesh. We train on the 3D coordinates of the 80 standard FAUST registered meshes but evaluate on the remeshed set to mimic the practical scenario where the training set contains meshes tessellated via some particular common strategy, yet the fitted model must be applied to totally different meshes encountered in the wild. We also train several other methods—details are presented in Appendix C.

Table 5 and Figure 11 show how other mesh-based approaches degrade rapidly under remeshing; only DiffusionNet yields accurate correspondences that are largely stable under remeshing and resampling. Some point-based methods avoid the dependence on connectivity yet do not match the overall accuracy of the surface-based DiffusionNet.

Table 5. Accuracy under Remeshing and Resampling (remeshed/sampled variants)

Method | orig | iso | dense | qes | cloud
ACSCNN | 0.05 | 35.29 | 19.09 | 41.15 | N/A
SplineCNN | 3.51 | 31.09 | 27.95 | 40.43 | N/A
HSN | 9.57 | 20.01 | 24.84 | 25.40 | N/A
PointNet (vertices) | 3.83 | 2.92 | 3.04 | 2.67 | 2.60
PointNet (sampled) | 9.99 | 4.25 | 7.84 | 4.44 | 4.13
DGCNN (vertices) | 2.44 | 14.58 | 15.72 | 27.00 | 32.13
DGCNN (sampled) | 6.52 | 4.30 | 5.57 | 3.61 | 2.66
DiffusionNet | 0.33 | 0.68 | 0.62 | 0.82 | 2.59

DiffusionNet automatically retains highly accurate results under changes in meshing and sampling, while many other approaches overfit to mesh connectivity. Here we give correspondence errors on our remeshed FAUST dataset after training on template meshes, measured in mean geodesic distance \( \times 100 \) after normalizing by the geodesic diameter.

5.5 Transfer across Representations

Not only are the outputs of DiffusionNet consistent across remeshing and resampling of a shape, but furthermore the same network can be directly applied to different discrete representations. The only geometric data required for DiffusionNet are the Laplacian, mass, and spatial gradient matrices, which are easily constructed for many representations. Because all geometric operations in the network are defined in terms of these standard matrices, fitted network weights retain the same meaning across different representations. This enables us to train on one representation and evaluate on another, as seen in Figure 1 and the cloud column of Section 5.4, without any special treatment or fine-tuning. In the future, DiffusionNet opens the door to heterogeneous training sets that intermingle mesh, point cloud, and other surface data from various sources.

5.6 Efficiency and Robustness

Runtime. DiffusionNet requires only standard linear algebra operations for training and inference and is thus straightforward and efficient on modern hardware. As an example, DiffusionNet with spectral acceleration trains on the 14k-vertex RNA meshes (Section 5.2) in 38 ms per input and requires 2.2 GB of GPU memory. Preprocessing is performed on the CPU once for each input; for these RNA meshes, preprocessing takes 5.4 seconds and generates 12 MB of data each, composed mainly of the Laplacian eigenbasis \( \Phi \) for spectral acceleration. Table 6 summarizes the runtime performance of DiffusionNet and several other recent methods on the human segmentation task (Section 5.2), including preprocessing, training with gradient computation, and inference. All timings are measured on a 24-GB Titan RTX GPU and dual Xeon 5120 2.2-GHz CPUs.


Table 6. Runtime of Mesh-Based Learning Methods

Scaling. Most significantly, DiffusionNet’s efficiency enables direct learning on common mesh data without dramatic simplification, in contrast to other recent mesh-based schemes. As an example, the segmentation meshes from Maron et al. [2017] have up to 13k vertices, yet recent approaches simplify/downsample to roughly 1k vertices for training, as shown in Figure 2 [Hanocka et al. 2019; Wiersma et al. 2020; Lahav and Tal 2020; Mitchel et al. 2021]. In contrast, our networks easily run at full resolution on this and other datasets, paving the way for adoption in practice and improving accuracy due to preserved details (e.g., in Table 2). We even demonstrate DiffusionNet on a large, 184k-vertex raw scan mesh from FAUST—again no special treatment is needed (Table 6 and Figure 1).


Fig. 2. Many recent mesh-based learning methods are applied only to dramatically simplified inputs (Section 5.6), while our method easily processes full-resolution models, preserving detail and facilitating adoption.


Fig. 3. Although past methods have achieved high-accuracy benchmark results for learning on meshes [Li et al. 2020b; Fey et al. 2018], they are prone to over-fitting to the mesh connectivity rather than learning the underlying shape structure (Section 5.4). In contrast, DiffusionNet learns an accurate representation-agnostic solution, which even supports training on meshes and evaluating on a point cloud (last column).


Fig. 4. We propose to learn a diffusion time for each feature channel, automatically tuning spatial support during training. The histograms show the learned times at each block in a DiffusionNet trained for segmentation; the times marked by the dashed lines are visualized by diffusing a point source from the starred point. The first block uses mainly local diffusion, while a channel in the last block finds nearly global support.

Robustness. DiffusionNet is also very robust to poor-quality input data; diffusion is a stable smoothing operation, and our method does not require any complex geometry processing operations such as geodesic distance [Masci et al. 2015], edge collapse [Hanocka et al. 2019], parallel transport [Wiersma et al. 2020], or managing pooling hierarchies with upsampling/downsampling. Even the gradient matrix \( G \) is the result of a stable least-squares fit. If desired, techniques like the intrinsic Delaunay Laplacian on meshes can be used to further increase robustness [Bobenko and Springborn 2007; Sharp and Crane 2020], though we do not find this necessary in our experiments. We demonstrate in Figure 1 that DiffusionNet can be applied directly to a low-quality, nonmanifold raw scan mesh without any issues.


6 CONCLUSION

We present a new approach for learning on surfaces that is built by using learned diffusion as the main network component, with spatial gradient features to inject directional information. Our method is very efficient to train and evaluate, is robust to changes in sampling, and even generalizes across representations, in addition to achieving state-of-the-art results on a range of tasks.

Limitations. DiffusionNet is designed to leverage the geometric structure of a surface; consequently, it is not automatically robust to topological errors or outliers. In fact, diffusion does not allow any communication at all between distinct components of a surface, leading to nonsensical outputs in the presence of spuriously disconnected components (see inset). Subsequent work might mitigate this limitation by combining diffusion with other notions of communication, such as global pooling (à la Qi et al. [2017a]) or edge convolutions over latent nearest-neighbors [Wang et al. 2019].

Our networks are intentionally agnostic to local discretization and thus may not be suited for tasks where one learns some property of the local discrete structure, such as denoising or mesh modification. Finally, although our method discourages overfitting to mesh sampling (Section 5.4), it cannot be guaranteed to eliminate it entirely, and we still observe a small drop in performance when transferring between representations; further investigation will seek to close this gap.

Future work. DiffusionNet can be applied to any surface representation for which a Laplacian matrix and spatial gradients can be constructed. This opens the door to directly learning—and even transferring pretrained networks—on a wide variety of surface representations, from occupancy grids [Caissard et al. 2019] to subdivision surfaces [De Goes et al. 2016]. More broadly, DiffusionNet need not be restricted to explicit surfaces and could easily be adapted to other geometric domains like volumetric meshes, curve networks, implicit level sets, depth maps, or images. We believe that grounding geometric deep learning in the mathematically and computationally well-established diffusion operation will offer benefits across surface learning and beyond.

APPENDICES

A AN ARGUMENT FOR GENERALITY

In Section 3.2, we propose diffusion at various learned timescales followed by a learned pointwise function as the essential components of our method. Although this formulation clearly offers nonlocal support to the pointwise functions, it is not immediately clear how general the resulting function space is. In particular, it is significant to show that this function space includes at least radially symmetric convolutions, a basic building block that has appeared widely in past work. The treatment of radially symmetric convolutions arises because points on surfaces do not generally have canonical tangent coordinates, though it should be noted that recent work has since focused on expanding beyond symmetric filters, and our own method includes gradient features for precisely this purpose. Lemma 1 states that, at least in the flat, continuous setting, this function space is sufficiently general to represent radially symmetric convolutions. Here, we give a full version of this argument and some discussion.

Consider a scalar field \( u: \mathbb {R}^2 \rightarrow \mathbb {R} \) in the plane. Let \( U_r(p): \mathbb {R}_{\ge 0} \rightarrow \mathbb {R} \) denote the integral of the field \( u \) along the sphere with radius \( r \) centered at \( p \), i.e., \( U_r(p) = \int _{\partial B(p,r)} u(y) dy \). Recall that \( u_t(p) : \mathbb {R}_{\ge 0} \rightarrow \mathbb {R} \) denotes the value of \( u \) at \( p \) after diffusion for time \( t \). We are interested in \( U_r(p) \), because it will enable the evaluation of radially symmetric convolutions against \( u \). The crux of our argument is to show that \( U_r(p) \) can be recovered from \( u_t(p) \), which we will formalize by showing the existence of a function transform \( \begin{equation*} \mathcal {T}: (\mathbb {R}_{\gt 0} \rightarrow \mathbb {R}) \rightarrow (\mathbb {R}_{\gt 0} \rightarrow \mathbb {R}) \end{equation*} \) such that (7) \( \begin{equation} U_r(p) = \mathcal {T}[u_t(p)](r). \end{equation} \) The heat kernel solution for \( u_t(p) \) is given by (8) \( \begin{equation} u_t(p) = \int _{\mathbb {R}^2} u(q) \frac{1}{4 \pi t} e^{-\frac{|p-q|^2}{4t}} dq = \int _{0}^\infty U_r(p) \frac{1}{4 \pi t} e^{-\frac{r^2}{4t}} dr, \end{equation} \) where the second equality moves to a radial integral, recalling that \( U_r(p) \) is defined as the integral of \( u \) along the sphere of radius \( r \) at \( p \). Calculation verifies that this integral has the form of a Laplace transform of \( U_r(p) \), (9) \( \begin{equation} u_t(p) = \frac{1}{4\pi t}\mathcal {L}\left[\frac{1}{2 \sqrt {r}}U_{\sqrt {r} }(p)\right]\left(\frac{1}{4t}\right). \end{equation} \) The Laplace transform is injective [Lerch 1903], which allows us to consider the inverse transform \( \begin{equation*} U_r(p) = \mathcal {T}[u_t(p)](r). \end{equation*} \) And in fact, \( \mathcal {T} \) will have the form of an inverse Laplace transform, up to reparameterization by \( \frac{1}{t} \) and constant coefficients.

Now that we have established the existence of \( \mathcal{T} \), it is straightforward to evaluate a radially symmetric convolution via a pointwise map applied to diffused values. Convolution against any radially symmetric kernel \( \alpha(r): \mathbb{R}_{\ge 0} \rightarrow \mathbb{R} \) is given by (10) \( \begin{align} (u * \alpha)(p) &= \int_{\mathbb{R}^2} \alpha(|p-q|)\, u(q)\, dq \nonumber \\ &= \int_{0}^\infty \alpha(r)\, U_r(p)\, dr \\ &= \int_{0}^\infty \alpha(r)\, \mathcal{T}[u_t(p)](r)\, dr. \nonumber \end{align} \) In this sense, the function space defined by diffusion followed by a pointwise map contains the space of radially symmetric convolutions, completing our argument.
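This containment can be checked numerically in the special case where \( \alpha \) lies in the span of heat kernels, so that no inverse transform is needed and the pointwise map is simply linear. The following minimal sketch (our own illustration, using NumPy/SciPy on a periodic grid) verifies that a pointwise combination of diffused values matches direct convolution against the corresponding radially symmetric kernel:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Random field on a periodic grid; 2D heat diffusion for time t is a
# Gaussian blur with sigma = sqrt(2 t).
rng = np.random.default_rng(0)
u = rng.standard_normal((128, 128))
t1, t2 = 1.5, 6.0   # two diffusion times
w1, w2 = 0.7, -0.3  # pointwise (linear) combination weights

# Left side: diffuse, then combine pointwise.
lhs = w1 * gaussian_filter(u, sigma=np.sqrt(2 * t1), mode="wrap") \
    + w2 * gaussian_filter(u, sigma=np.sqrt(2 * t2), mode="wrap")

# Right side: convolve directly against the radially symmetric kernel
# alpha = w1 * G_{t1} + w2 * G_{t2}, obtained here by diffusing a delta.
delta = np.zeros_like(u)
delta[0, 0] = 1.0
alpha = w1 * gaussian_filter(delta, sigma=np.sqrt(2 * t1), mode="wrap") \
      + w2 * gaussian_filter(delta, sigma=np.sqrt(2 * t2), mode="wrap")
rhs = np.real(np.fft.ifft2(np.fft.fft2(u) * np.fft.fft2(alpha)))

assert np.allclose(lhs, rhs, atol=1e-8)
```

General radial kernels additionally require the transform \( \mathcal{T} \), which in our networks is absorbed into the learned pointwise MLP.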

Extending this treatment from \( \mathbb {R}^2 \) to curved manifolds would require a deeper analysis, though the same essential properties hold for diffusion on surfaces. Furthermore, we treat only the continuous setting above rather than the discrete setting where pointwise maps are approximated via finite-dimensional MLPs, and diffusion is evaluated at a collection of times \( t \). More generally, it would be valuable to extend this analysis to formalize the stability properties of diffusion, à la Kostrikov et al. [2018] and Perlmutter et al. [2020]. Nonetheless, we consider this argument to be important evidence that diffusion followed by pointwise functions is an expressive function space, supported by the strong results of our method in practical experiments.

B ANALYSIS

Ablation. To validate the components of our approach, we consider a simple ablation study on the full-resolution human segmentation task from Section 5.2, using rotation-augmented raw coordinates as input. The variant no diffusion omits the diffusion layer from each DiffusionNet block, fixed-time diffusion manually specifies a diffusion time, no gradient features omits the gradient features, and unlearned gradient features includes gradient features but omits the learned transformation of gradient vectors \( A \). We observe a noticeable drop in accuracy when omitting any of the components of the method (Table 7). Manually specifying shared, non-optimal diffusion times (\( t=0.1 \), \( t=0.5 \)) yields a network with significantly worse accuracy compared to our learned approach. A key advantage of our learned diffusion is that this time is automatically tuned by the optimization process, individually for each feature channel.

Table 7. Ablation Study on Human Segmentation

Ablation                               Accuracy
no diffusion                           31.4%
fixed-time diffusion, \( t=0.1 \)      89.1%
fixed-time diffusion, \( t=0.5 \)      81.6%
no gradient features                   84.1%
unlearned gradient features            85.6%
full method                            90.6%

An ablation study evaluated on the human segmentation task. Omitting any of the components of our method leads to a significant drop in performance. Manually fixing a non-optimal diffusion time also impairs performance—our learned procedure automatically optimizes a diffusion time for each channel.

Spectral basis size. When evaluating diffusion with spectral acceleration (Section 3.3.2), increasing the size \( k \) of the spectral basis more accurately resolves diffusion at the cost of increased computation. In Figure 13, we vary \( k \) for the FAUST vertex-labeling correspondence task as in Table 5, measuring accuracy on the original test set. We find performance degrades significantly with fewer than 64 eigenvectors on this problem, while larger bases offer negligible benefit—our experiments use \( k=128 \) eigenvectors as a safe default.
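As a concrete illustration of this spectral evaluation, the sketch below implements diffusion with a learned time per channel in a truncated eigenbasis. It is a minimal sketch in PyTorch with our own class and argument names, not the released implementation; we assume the eigenvectors are orthonormal with respect to the lumped mass matrix.

```python
import torch

class LearnedSpectralDiffusion(torch.nn.Module):
    """Diffuse each feature channel for a learned time t >= 0,
    evaluated in a truncated Laplacian eigenbasis (cf. Section 3.3.2)."""

    def __init__(self, n_channels):
        super().__init__()
        # one learnable diffusion time per channel; exp keeps t positive
        self.log_t = torch.nn.Parameter(torch.zeros(n_channels))

    def forward(self, x, evals, evecs, mass):
        # x:     (V, C) per-vertex features
        # evals: (K,)   Laplacian eigenvalues
        # evecs: (V, K) eigenvectors, mass-orthonormal
        # mass:  (V,)   lumped vertex masses (areas)
        t = torch.exp(self.log_t)                          # (C,)
        coeffs = evecs.T @ (mass[:, None] * x)             # spectral coefficients, (K, C)
        decay = torch.exp(-evals[:, None] * t[None, :])    # e^{-lambda_k t_c}, (K, C)
        return evecs @ (coeffs * decay)                    # diffused features, (V, C)
```

In this basis, diffusing all channels at all learned times amounts to a few dense matrix products, independent of mesh connectivity.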

C EXPERIMENT DETAILS

Here we provide additional methodology details for experiments.

Orientation. Figure 5 shows the results of a simple artificial experiment in which we segment the left vs. right side of human models from the FAUST dataset [Bogo et al. 2014] using a purely intrinsic 32-width DiffusionNet with HKS as input. On the original dataset, asymmetric biases—such as a template mesh with asymmetric connectivity—make it unintentionally easy to distinguish left from right. We cancel the effect of these biases by augmenting the dataset with a copy of each mesh that has been mirrored across the left-right axis (preserving orientation by inverting triangles). With a complex-valued \( A \), our network is able to easily distinguish left from right with 99.9% accuracy, despite both a purely intrinsic architecture and intrinsic input features. Restricting to real-valued \( A \) removes the effect; the network is unable to disambiguate the symmetry, with a totally random 50.0% accuracy.
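The distinction between complex- and real-valued \( A \) can be made concrete with a short sketch of the gradient features; the names and shapes here are our own illustration rather than the released code, assuming per-vertex, per-channel tangent gradients are encoded as complex numbers:

```python
import torch

def gradient_features(z, A_re, A_im=None):
    # z:    (V, C) complex tangent-plane gradients, one per vertex and channel
    # A_re: (C, C) real channel-mixing matrix
    # A_im: (C, C) optional; together A = A_re + i*A_im rotates tangent
    #       vectors (orientation-aware), while a real A only scales them
    A = torch.complex(A_re, A_im) if A_im is not None else A_re.to(z.dtype)
    w = z @ A.T  # mix channels with the (possibly complex) matrix A
    # the real part of this Hermitian pairing does not depend on the
    # arbitrary choice of tangent basis at each vertex
    return torch.tanh((torch.conj(z) * w).real)
```

Mirroring a shape conjugates the tangent-plane gradients, so a complex \( A \) (which rotates) can distinguish the two orientations, whereas a real \( A \) cannot.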

Fig. 5. In a synthetic experiment, we demonstrate how our networks can successfully segment the left and right sides of bilaterally symmetric models even in a purely intrinsic formulation (left), because rotation in the tangent space for gradient features encodes a notion of orientation. Replacing this rotation with scaling (i.e., using a real matrix \( A \) in Equation (6)) removes the sensitivity to shape orientation (right) but also avoids the need for consistent outward normals. See Appendix C for details.

Fig. 6. Diffusion followed by an MLP enables the network to learn radially symmetric filters (left); introducing gradient features expands the space to include directional filters (right), while remaining invariant to the choice of local tangent basis. Here, we take a DiffusionNet block trained for segmentation and visualize the learned filter via channels of a normalized signal that maximizes the block output at the center point.

Fig. 7. We present DiffusionNet, a simple and effective architecture for learning on surfaces. It is composed of successive identical DiffusionNet blocks. Each block diffuses every feature for a learned timescale, forms spatial gradient features, and applies a spatially shared pointwise MLP at each vertex in a mesh/point cloud/etc. These networks achieve state-of-the-art performance on surface learning tasks without any explicit surface convolutions or pooling hierarchies, in part because they automatically optimize for variable spatial support (see, e.g., Figure 4).

Fig. 8. Segmenting RNA molecules with our method achieves accurate results when applied either directly to meshes or to sampled point clouds.

Fig. 9. DiffusionNet is highly effective as a feature extractor for nonrigid correspondence via functional maps, shown here in the supervised setting (Section 5.3). Correspondences are visualized by transferring a texture through the map. All methods yield a visually plausible solution when trained on the same dataset as the query pair (SCAPE, top row), but only DiffusionNet yields good results when generalizing after training on a different dataset (FAUST, bottom row).

Fig. 10. Examples of the remeshed FAUST test dataset used in Table 5.

Human segmentation. All results are given in Table 3. Past work has used different variations of this dataset, both in terms of the input data and evaluation criteria. The original dataset presented by Maron et al. [2017] contains moderately large meshes of up to 12k vertices, with segmentations labeled per face, and accuracy is reported as the fraction of faces in the entire test set that were classified correctly. The experiments from Wiersma et al. [2020] deviate slightly: They remap the ground truth to vertices and train and test on a subsampling of the vertex set; nonetheless, we group these results with the original dataset for the sake of simplicity as they are very similar.

MeshCNN [Hanocka et al. 2019] generated a simplified version of the dataset in which the meshes have fewer than 1k vertices and segmentations have been remapped to edges. Additionally, test evaluation uses a soft ground truth that allows multiple correct labels for edges at the boundary between two regions. For comparison, we also apply DiffusionNet to this variant of the task, denoted by \( ^\dagger \) in Table 3: we generate a prediction per edge by averaging per-vertex outputs onto edges before applying the final softmax (see the sketch below) and evaluate against the same soft ground truth. Finally, PD-MeshNet [Milano et al. 2020] generated per-face labels for the MeshCNN simplified models and trained and tested on these without any soft ground truth; we denote this variant by \( ^\ddagger \) and again evaluate DiffusionNet with per-face predictions.
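The \( ^\dagger \) evaluation reduces to a one-line averaging step; a minimal sketch (the helper name is ours, not part of any benchmark code):

```python
import torch

def vertex_logits_to_edge_logits(logits, edges):
    # logits: (V, n_classes) per-vertex network outputs
    # edges:  (E, 2) vertex indices of each mesh edge
    # average the two endpoint logits onto each edge, before the softmax
    return 0.5 * (logits[edges[:, 0]] + logits[edges[:, 1]])
```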

Across all variants, DiffusionNet achieves highly accurate performance. Unlike many of these methods, DiffusionNet can easily be trained directly on the original meshes without any special treatment. Even methods that evaluate on full-resolution models may be scalable only due to special pre- and post-processing schemes, which add complexity to adoption in practice—for instance, MeshWalker [Lahav and Tal 2020] trains on simplified meshes and then applies an upsampling and smoothing scheme to handle full resolution data.

Discretization agnostic learning. To investigate robustness to discretization on our remeshed FAUST dataset, we train several recent mesh-based and point-based surface learning methods, in addition to our own 256-width DiffusionNet with dropout. For mesh-based methods, we train SplineCNN [Fey et al. 2018], ACSCNN [Li et al. 2020b], and HSN [Wiersma et al. 2020]; we also tried MeshCNN [Hanocka et al. 2019] but found it prohibitively expensive. For point-based methods, we train PointNet [Qi et al. 2017a] and DGCNN [Wang et al. 2019], considering both the vertex set itself as a point set and a random point cloud sampled on the surface, in which case we predict on the samples and project the results back to vertices by nearest neighbor (see the sketch following this paragraph). For a fair comparison, all models are trained with only vertex positions as input (or the constant function, for ACSCNN and SplineCNN), and we augment during training with random rotations about the vertical axis to encourage rotation invariance. Wherever possible, we mimic the training configuration of the original work or make a best effort to find suitable parameters for this task. We note that some models perform slightly worse than previously reported results, presumably due to the use of simpler input features or learning in a rotation-invariant setting rather than on aligned data.
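The nearest-neighbor projection step can be sketched as follows (helper and variable names are ours, for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

def project_point_predictions_to_vertices(points, point_preds, vertices):
    # points:      (P, 3) positions of the sampled point cloud
    # point_preds: (P, ...) per-point network predictions
    # vertices:    (V, 3) mesh vertex positions
    # copy each vertex's prediction from its nearest sampled point
    _, nearest = cKDTree(points).query(vertices, k=1)
    return point_preds[nearest]
```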

In general, only DiffusionNet learns accurate correspondences that are robust to remeshing and resampling. In particular, ACSCNN still produces nearly perfect results on the original template meshes even in the rotation-invariant setting but yields essentially random noise after any remeshing. Perhaps unsurprisingly, point-based methods are less prone to overfitting the mesh connectivity (though the DGCNN on vertices still manages to do so) but are still notably less accurate than mesh-based techniques. Figure 11 gives full geodesic error plots corresponding to Table 5.

Fig. 11. Accuracy curves for vertex-labelling correspondence on the FAUST dataset, as in Table 5. The first plot gives accuracy on the original test meshes, and the subsequent plots denote testing on our remeshed variants of the test set. For each plot, the \( x \) -axis is the geodesic error \( \times 100 \) after normalizing by geodesic diameter, and the \( y \) -axis is the percentage of predicted correspondences within that error.

Fig. 12. Erroneous segmentation results due to disconnected components.

Fig. 13. The effect of varying the size \( k \) of the truncated basis for spectral diffusion evaluation, measured via error on the FAUST vertex-labelling correspondence from Table 5 (orig.). We use \( k=128 \) eigenvectors in all experiments.

REFERENCES

  1. Adobe. 2016. Adobe Mixamo 3D Characters. Retrieved from www.mixamo.com.
  2. Alexa Marc, Herholz Philipp, Kohlbrenner Maximilian, and Sorkine-Hornung Olga. 2020. Properties of Laplace operators for tetrahedral meshes. In Computer Graphics Forum, Vol. 39. Wiley Online Library, 55–68.
  3. Anguelov Dragomir, Srinivasan Praveen, Koller Daphne, Thrun Sebastian, Rodgers Jim, and Davis James. 2005. SCAPE: Shape completion and animation of people. In ACM SIGGRAPH 2005 Papers. 408–416.
  4. Atzmon Matan, Maron Haggai, and Lipman Yaron. 2018. Point convolutional neural networks by extension operators. ACM Trans. Graph. 37, 4 (2018), 1–12.
  5. Beaini Dominique, Passaro Saro, Létourneau Vincent, Hamilton Will, Corso Gabriele, and Liò Pietro. 2021. Directional graph networks. In Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 139.
  6. Belkin Mikhail and Niyogi Partha. 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neur. Comput. 15, 6 (2003), 1373–1396.
  7. Berman Helen M., Westbrook John, Feng Zukang, Gilliland Gary, Bhat Talapady N., Weissig Helge, Shindyalov Ilya N., and Bourne Philip E. 2000. The protein data bank. Nucleic Acids Res. 28, 1 (2000), 235–242.
  8. Bobenko Alexander I. and Springborn Boris A. 2007. A discrete Laplace–Beltrami operator for simplicial surfaces. Disc. Comp. Geom. 38, 4 (2007).
  9. Bodnar Cristian, Frasca Fabrizio, Wang Yu Guang, Otter Nina, Montufar Guido, Liò Pietro, and Bronstein Michael M. 2021. Weisfeiler and Lehman go topological: Message passing simplicial networks. In ICLR 2021 Workshop on Geometrical and Topological Representation Learning.
  10. Bogo Federica, Romero Javier, Loper Matthew, and Black Michael J. 2014. FAUST: Dataset and evaluation for 3D mesh registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3794–3801.
  11. Boscaini Davide, Masci Jonathan, Rodolà Emanuele, and Bronstein Michael. 2016. Learning shape correspondence with anisotropic convolutional neural networks. In Advances in Neural Information Processing Systems. 3189–3197.
  12. Bronstein Michael M., Bruna Joan, LeCun Yann, Szlam Arthur, and Vandergheynst Pierre. 2017. Geometric deep learning: Going beyond Euclidean data. IEEE Sign. Process. Mag. 34, 4 (2017), 18–42.
  13. Bruna Joan, Zaremba Wojciech, Szlam Arthur, and LeCun Yann. 2014. Spectral networks and locally connected networks on graphs. In Proceedings of the International Conference on Learning Representations (ICLR'14).
  14. Bunge Astrid, Herholz Philipp, Kazhdan Misha, and Botsch Mario. 2020. Polygon Laplacian made simple. In Computer Graphics Forum, Vol. 39. Wiley Online Library, 303–313.
  15. Caissard Thomas, Coeurjolly David, Lachaud Jacques-Olivier, and Roussillon Tristan. 2019. Laplace–Beltrami operator on digital surfaces. J. Math. Imag. Vis. 61, 3 (2019), 359–379.
  16. Cao Wenming, Yan Zhiyue, He Zhiquan, and He Zhihai. 2020. A comprehensive survey on geometric deep learning. IEEE Access 8 (2020), 35929–35949. https://ieeexplore.ieee.org/abstract/document/9003285.
  17. Chen Yanqing, Davis Timothy A., Hager William W., and Rajamanickam Sivasankaran. 2008. Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Trans. Math. Softw. 35, 3 (2008), 1–14.
  18. Coifman Ronald R., Lafon Stephane, Lee Ann B., Maggioni Mauro, Nadler Boaz, Warner Frederick, and Zucker Steven W. 2005. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. Natl Acad. Sci. U.S.A. 102, 21 (2005), 7426–7431.
  19. Crane Keenan, De Goes Fernando, Desbrun Mathieu, and Schröder Peter. 2013. Digital geometry processing with discrete exterior calculus. In ACM SIGGRAPH Courses. 1–126.
  20. De Goes Fernando, Desbrun Mathieu, Meyer Mark, and DeRose Tony. 2016. Subdivision exterior calculus for geometry processing. ACM Trans. Graph. 35, 4 (2016), 1–11.
  21. Defferrard Michaël, Bresson Xavier, and Vandergheynst Pierre. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems. 3844–3852.
  22. Deng Haowen, Birdal Tolga, and Ilic Slobodan. 2018. PPF-FoldNet: Unsupervised learning of rotation invariant 3D local descriptors. In Proceedings of the European Conference on Computer Vision (ECCV'18). 602–618.
  23. Donati Nicolas, Sharma Abhishek, and Ovsjanikov Maks. 2020. Deep geometric functional maps: Robust feature learning for shape correspondence. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'20). 8589–8598.
  24. Eliasof Moshe and Treister Eran. 2020. DiffGCN: Graph convolutional networks via differential operators and algebraic multigrid pooling. In Advances in Neural Information Processing Systems.
  25. Ezuz Danielle, Solomon Justin, Kim Vladimir G., and Ben-Chen Mirela. 2017. GWCNN: A metric alignment layer for deep shape analysis. In Computer Graphics Forum, Vol. 36. Wiley Online Library, 49–57.
  26. Feng Yutong, Feng Yifan, You Haoxuan, Zhao Xibin, and Gao Yue. 2019. MeshNet: Mesh neural network for 3D shape representation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 8279–8286.
  27. Fey Matthias, Lenssen Jan Eric, Weichert Frank, and Müller Heinrich. 2018. SplineCNN: Fast geometric deep learning with continuous B-spline kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 869–877.
  28. Garland Michael and Heckbert Paul S. 1997. Surface simplification using quadric error metrics. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. 209–216.
  29. Ginzburg Dvir and Raviv Dan. 2020. Cyclic functional mapping: Self-supervised correspondence between non-isometric deformable shapes. In Proceedings of the European Conference on Computer Vision (ECCV'20).
  30. Giorgi Daniela, Biasotti Silvia, and Paraboschi Laura. 2007. Watertight models track. In Shape Retrieval Contest 2007. CNR-IMATI, Via De Marini 6, 16149 Genova.
  31. Gong Shunwang, Chen Lei, Bronstein Michael, and Zafeiriou Stefanos. 2019. SpiralNet++: A fast and highly efficient mesh convolution operator. In Proceedings of the IEEE International Conference on Computer Vision Workshops.
  32. Goyal Ankit, Law Hei, Liu Bowei, Newell Alejandro, and Deng Jia. 2021. Revisiting point cloud shape classification with a simple and effective baseline. In Proceedings of the International Conference on Machine Learning.
  33. Greengard Samuel. 2020. Geometric deep learning advances data science. Commun. ACM 64, 1 (Dec. 2020), 13–15.
  34. Groueix Thibault, Fisher Matthew, Kim Vladimir G., Russell Bryan, and Aubry Mathieu. 2018. 3D-CODED: 3D correspondences by deep deformation. In Proceedings of the European Conference on Computer Vision (ECCV'18).
  35. Guo Yulan, Wang Hanyun, Hu Qingyong, Liu Hao, Liu Li, and Bennamoun Mohammed. 2020. Deep learning for 3D point clouds: A survey. IEEE Trans. Pattern Anal. Mach. Intell. (2020).
  36. De Haan Pim, Weiler Maurice, Cohen Taco, and Welling Max. 2021. Gauge equivariant mesh CNNs: Anisotropic convolutions on geometric graphs. In Proceedings of the International Conference on Learning Representations.
  37. Haim Niv, Segol Nimrod, Ben-Hamu Heli, Maron Haggai, and Lipman Yaron. 2019. Surface networks via general covers. In Proceedings of the IEEE International Conference on Computer Vision. 632–641.
  38. Hajij Mustafa, Istvan Kyle, and Zamzmi Ghada. 2020. Cell complex neural networks. In NeurIPS 2020 Workshop on Topological Data Analysis and Beyond.
  39. Halimi Oshri, Litany Or, Rodola Emanuele, Bronstein Alex M., and Kimmel Ron. 2019. Unsupervised learning of dense shape correspondence. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4370–4379.
  40. Hanocka Rana, Hertz Amir, Fish Noa, Giryes Raja, Fleishman Shachar, and Cohen-Or Daniel. 2019. MeshCNN: A network with an edge. ACM Trans. Graph. 38, 4 (2019), 1–12.
  41. Hansen Lasse, Diesel Jasper, and Heinrich Mattias P. 2018. Multi-kernel diffusion CNNs for graph-based learning on point clouds. In Proceedings of the European Conference on Computer Vision (ECCV'18).
  42. He Kaiming, Zhang Xiangyu, Ren Shaoqing, and Sun Jian. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
  43. He Wenchong, Jiang Zhe, Zhang Chengming, and Sainju Arpan Man. 2020. CurvaNet: Geometric deep learning based on directional curvature for 3D shape analysis. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2214–2224.
  44. Hu Qingyong, Yang Bo, Xie Linhai, Rosa Stefano, Guo Yulan, Wang Zhihua, Trigoni Niki, and Markham Andrew. 2020. RandLA-Net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'20). IEEE.
  45. Jiang Chiyu Max, Huang Jingwei, Kashinath Karthik, Prabhat, Marcus Philip, and Niessner Matthias. 2018. Spherical CNNs on unstructured grids. In Proceedings of the International Conference on Learning Representations.
  46. Kalogerakis Evangelos, Averkiou Melinos, Maji Subhransu, and Chaudhuri Siddhartha. 2017. 3D shape segmentation with projective convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'17).
  47. Klicpera Johannes, Weissenberger Stefan, and Günnemann Stephan. 2019. Diffusion improves graph learning. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS'19).
  48. Klokov Roman and Lempitsky Victor. 2017. Escape from cells: Deep kd-networks for the recognition of 3D point cloud models. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'17). IEEE, 863–872.
  49. Knöppel Felix, Crane Keenan, Pinkall Ulrich, and Schröder Peter. 2013. Globally optimal direction fields. ACM Trans. Graph. 32, 4 (2013), 1–10.
  50. Kostrikov Ilya, Jiang Zhongshi, Panozzo Daniele, Zorin Denis, and Bruna Joan. 2018. Surface networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2540–2548.
  51. Lahav Alon and Tal Ayellet. 2020. MeshWalker: Deep mesh understanding by random walks. ACM Trans. Graph. 39, 6 (2020), 1–13.
  52. Lehoucq Richard B., Sorensen Danny C., and Yang Chao. 1998. ARPACK Users' Guide: Solution of Large-scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM.
  53. Lerch Matyáš. 1903. Sur un point de la théorie des fonctions génératrices d'Abel. Acta Math. 27, 1 (1903), 339–351.
  54. Levie Ron, Huang Wei, Bucci Lorenzo, Bronstein Michael M., and Kutyniok Gitta. 2019. Transferability of spectral graph convolutional neural networks. arXiv:1907.12972. Retrieved from https://arxiv.org/abs/1907.12972.
  55. Levie Ron, Monti Federico, Bresson Xavier, and Bronstein Michael M. 2018. CayleyNets: Graph convolutional neural networks with complex rational spectral filters. IEEE Trans. Sign. Process. 67, 1 (2018), 97–109.
  56. Lévy Bruno. 2006. Laplace–Beltrami eigenfunctions towards an algorithm that "understands" geometry. In Proceedings of the IEEE International Conference on Shape Modeling and Applications 2006 (SMI'06). IEEE, 13.
  57. Li Maosen, Chen Siheng, Zhang Ya, and Tsang Ivor W. 2020a. Graph cross networks with vertex infomax pooling. In Proceedings of the Conference and Workshop on Neural Information Processing Systems (NeurIPS'20).
  58. Li Qinsong, Liu Shengjun, Hu Ling, and Liu Xinru. 2020b. Shape correspondence using anisotropic Chebyshev spectral CNNs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14658–14667.
  59. Li Xianzhi, Li Ruihui, Chen Guangyong, Fu Chi-Wing, Cohen-Or Daniel, and Heng Pheng-Ann. 2021. A rotation-invariant framework for deep point cloud analysis (unpublished).
  60. Li Yangyan, Bu Rui, Sun Mingchao, Wu Wei, Di Xinhan, and Chen Baoquan. 2018. PointCNN: Convolution on X-transformed points. Adv. Neural Inf. Process. Syst. 31 (2018), 820–830.
  61. Lian Z., Godil A., Bustos B., Daoudi M., Hermans J., Kawamura S., Kurita Y., Lavoué G., and Suetens P. 2011. Shape retrieval on non-rigid 3D watertight meshes. In Proceedings of the Eurographics Workshop on 3D Object Retrieval (3DOR'11).
  62. Lim Isaak, Dielen Alexander, Campen Marcel, and Kobbelt Leif. 2018. A simple approach to intrinsic correspondence learning on unstructured 3D meshes. In Proceedings of the European Conference on Computer Vision (ECCV'18).
  63. Lin Zhi-Hao, Huang Sheng-Yu, and Wang Yu-Chiang Frank. 2020. Convolution in the cloud: Learning deformable kernels in 3D graph convolution networks for point cloud analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'20).
  64. Litany Or, Remez Tal, Rodola Emanuele, Bronstein Alex, and Bronstein Michael. 2017. Deep functional maps: Structured prediction for dense shape correspondence. In Proceedings of the IEEE International Conference on Computer Vision. 5659–5667.
  65. Liu Hsueh-Ti Derek, Zhang Jiayi Eris, Ben-Chen Mirela, and Jacobson Alec. 2021. Surface multigrid via intrinsic prolongation. ACM Trans. Graph. 40, 4, Article 80 (July 2021), 13 pages.
  66. Liu Risheng, Zhong Guangyu, Cao Junjie, Lin Zhouchen, Shan Shiguang, and Luo Zhongxuan. 2016. Learning to diffuse: A new perspective to design PDEs for visual analysis. IEEE Trans. Pattern Anal. Mach. Intell. 38, 12 (2016), 2457–2471.
  67. Ma Zheng, Xuan Junyu, Wang Yu Guang, Li Ming, and Liò Pietro. 2020. Path integral based convolution and pooling for graph neural networks. In Advances in Neural Information Processing Systems, Larochelle H., Ranzato M., Hadsell R., Balcan M. F., and Lin H. (Eds.), Vol. 33. 16421–16433.
  68. MacNeal Richard. 1949. The Solution of Partial Differential Equations by Means of Electrical Networks. Ph.D. Dissertation. Caltech.
  69. Maron Haggai, Galun Meirav, Aigerman Noam, Trope Miri, Dym Nadav, Yumer Ersin, Kim Vladimir G., and Lipman Yaron. 2017. Convolutional neural networks on surfaces via seamless toric covers. ACM Trans. Graph. 36, 4 (2017), Article 71.
  70. Masci Jonathan, Boscaini Davide, Bronstein Michael, and Vandergheynst Pierre. 2015. Geodesic convolutional neural networks on Riemannian manifolds. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 37–45.
  71. Maturana Daniel and Scherer Sebastian. 2015. VoxNet: A 3D convolutional neural network for real-time object recognition. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'15). IEEE, 922–928.
  72. Melzi Simone, Ren Jing, Rodolà Emanuele, Sharma Abhishek, Wonka Peter, and Ovsjanikov Maks. 2019. ZoomOut: Spectral upsampling for efficient shape correspondence. ACM Trans. Graph. 38, 6 (2019), 1–14.
  73. Meyer Mark, Desbrun Mathieu, Schröder Peter, and Barr Alan H. 2003. Discrete differential-geometry operators for triangulated 2-manifolds. In Visualization and Mathematics III. Springer, 35–57.
  74. Milano Francesco, Loquercio Antonio, Rosinol Antoni, Scaramuzza Davide, and Carlone Luca. 2020. Primal-dual mesh convolutional neural networks. Adv. Neural Inf. Process. Syst. 33 (2020), 952–963.
  75. Mitchel Thomas W., Kim Vladimir G., and Kazhdan Michael. 2021. Field convolutions for surface CNNs. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV'21). 10001–10011.
  76. Monti Federico, Boscaini Davide, Masci Jonathan, Rodola Emanuele, Svoboda Jan, and Bronstein Michael M. 2017. Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5115–5124.
  77. Mukherjee Sayan and Wu Qiang. 2006. Estimation of gradients and coordinate covariation in classification. J. Mach. Learn. Res. 7 (Nov. 2006), 2481–2514.
  78. Ovsjanikov Maks, Ben-Chen Mirela, Solomon Justin, Butscher Adrian, and Guibas Leonidas. 2012. Functional maps: A flexible representation of maps between shapes. ACM Trans. Graph. 31, 4 (2012), 1–11.
  79. Paszke Adam, Gross Sam, Massa Francisco, Lerer Adam, Bradbury James, Chanan Gregory, Killeen Trevor, Lin Zeming, Gimelshein Natalia, Antiga Luca, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems. 8026–8037.
  80. Peng Yifan, Lin Lin, Ying Lexing, and Zepeda-Núñez Leonardo. 2020. Efficient long-range convolutions for point clouds. arXiv:2010.05295.
  81. Perlmutter Michael, Gao Feng, Wolf Guy, and Hirn Matthew. 2020. Geometric wavelet scattering networks on compact Riemannian manifolds. In Proceedings of the 1st Mathematical and Scientific Machine Learning Conference, Proceedings of Machine Learning Research, Vol. 107, Lu Jianfeng and Ward Rachel (Eds.). 570–604.
  82. Pinkall Ulrich and Polthier Konrad. 1993. Computing discrete minimal surfaces and their conjugates. Exp. Math. 2, 1 (1993).
  83. Poulenard Adrien and Ovsjanikov Maks. 2018. Multi-directional geodesic neural networks via equivariant convolution. ACM Trans. Graph. 37, 6 (2018), 1–14.
  84. Poulenard Adrien, Rakotosaona Marie-Julie, Ponty Yann, and Ovsjanikov Maks. 2019. Effective rotation-invariant point CNN with spherical harmonics kernels. In Proceedings of the International Conference on 3D Vision (3DV'19). IEEE, 47–56.
  85. Qi Charles R., Su Hao, Mo Kaichun, and Guibas Leonidas J. 2017a. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17).
  86. Qi Charles Ruizhongtai, Yi Li, Su Hao, and Guibas Leonidas J. 2017b. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems. 5105–5114.
  87. Ren Jing, Poulenard Adrien, Wonka Peter, and Ovsjanikov Maks. 2018. Continuous and orientation-preserving correspondences via functional maps. ACM Trans. Graph. 37, 6 (2018), 1–16.
  88. Roufosse Jean-Michel, Sharma Abhishek, and Ovsjanikov Maks. 2019. Unsupervised deep learning for structured shape matching. In Proceedings of the IEEE International Conference on Computer Vision. 1617–1627.
  89. Rustamov Raif M. 2007. Laplace–Beltrami eigenfunctions for deformation invariant shape representation. In Proceedings of the 5th Eurographics Symposium on Geometry Processing. 225–233.
  90. Sfikas Konstantinos, Theoharis Theoharis, and Pratikakis Ioannis. 2017. Exploiting the PANORAMA representation for convolutional neural network classification and retrieval. In Eurographics Workshop on 3D Object Retrieval.
  91. Sharma Abhishek and Ovsjanikov Maks. 2020. Weakly supervised deep functional maps for shape matching. Advances in Neural Information Processing Systems 33 (2020).
  92. Sharp Nicholas and Crane Keenan. 2020. A Laplacian for nonmanifold triangle meshes. In Computer Graphics Forum, Vol. 39. Wiley Online Library, 69–80.
  93. Sharp Nicholas, Soliman Yousuf, and Crane Keenan. 2019. The vector heat method. ACM Trans. Graph. 38, 3 (2019).
  94. Shi Baoguang, Bai Song, Zhou Zhichao, and Bai Xiang. 2015. DeepPano: Deep panoramic representation for 3-D shape recognition. IEEE Sign. Process. Lett. 22, 12 (2015), 2339–2343.
  95. Sinha Ayan, Bai Jing, and Ramani Karthik. 2016. Deep learning 3D shape surfaces using geometry images. In Proceedings of the European Conference on Computer Vision. Springer, 223–240.
  96. Smirnov Dmitriy and Solomon Justin. 2021. HodgeNet: Learning spectral geometry on triangle meshes. ACM Trans. Graph. 40, 4, Article 166 (July 2021), 11 pages.
  97. Su Hang, Maji Subhransu, Kalogerakis Evangelos, and Learned-Miller Erik. 2015. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision. 945–953.
  98. Sun Jian, Ovsjanikov Maks, and Guibas Leonidas. 2009. A concise and provably informative multi-scale signature based on heat diffusion. In Computer Graphics Forum, Vol. 28. Wiley Online Library, 1383–1392.
  99. Sun Zhiyu, Rooke Ethan, Charton Jerome, He Yusen, Lu Jia, and Baek Stephen. 2020. ZerNet: Convolutional neural networks on arbitrary surfaces via Zernike local tangent space estimation. In Computer Graphics Forum. Wiley Online Library.
  100. Thomas Hugues, Qi Charles R., Deschaud Jean-Emmanuel, Marcotegui Beatriz, Goulette François, and Guibas Leonidas J. 2019. KPConv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE International Conference on Computer Vision. 6411–6420.
  101. Tombari Federico, Salti Samuele, and Di Stefano Luigi. 2010. Unique signatures of histograms for local surface description. In Proceedings of the European Conference on Computer Vision. Springer, 356–369.
  102. Vallet Bruno and Lévy Bruno. 2008. Spectral geometry processing with manifold harmonics. In Computer Graphics Forum, Vol. 27. Wiley Online Library, 251–260.
  103. Vaxman Amir, Ben-Chen Mirela, and Gotsman Craig. 2010. A multi-resolution approach to heat kernels on discrete surfaces. In ACM SIGGRAPH 2010 Papers. 1–10.
  104. Verma Nitika, Boyer Edmond, and Verbeek Jakob. 2018. FeaStNet: Feature-steered graph convolutions for 3D shape analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2598–2606.
  105. Virtanen Pauli, Gommers Ralf, Oliphant Travis E., Haberland Matt, Reddy Tyler, Cournapeau David, Burovski Evgeni, Peterson Pearu, Weckesser Warren, Bright Jonathan, van der Walt Stéfan J., Brett Matthew, Wilson Joshua, Millman K. Jarrod, Mayorov Nikolay, Nelson Andrew R. J., Jones Eric, Kern Robert, Larson Eric, Carey C. J., Polat İlhan, Feng Yu, Moore Eric W., VanderPlas Jake, Laxalde Denis, Perktold Josef, Cimrman Robert, Henriksen Ian, Quintero E. A., Harris Charles R., Archibald Anne M., Ribeiro Antônio H., Pedregosa Fabian, van Mulbregt Paul, and SciPy 1.0 Contributors. 2020. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 17 (2020), 261–272.
  106. Vlasic Daniel, Baran Ilya, Matusik Wojciech, and Popović Jovan. 2008. Articulated mesh animation from multi-view silhouettes. In ACM SIGGRAPH 2008 Papers. 1–9.
  107. Wang Peng-Shuai, Liu Yang, Guo Yu-Xiao, Sun Chun-Yu, and Tong Xin. 2017. O-CNN: Octree-based convolutional neural networks for 3D shape analysis. ACM Trans. Graph. 36, 4 (2017), Article 72.
  108. Wang Yue, Sun Yongbin, Liu Ziwei, Sarma Sanjay E., Bronstein Michael M., and Solomon Justin M. 2019. Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 38, 5 (2019).
  109. Wei Lingyu, Huang Qixing, Ceylan Duygu, Vouga Etienne, and Li Hao. 2016. Dense human body correspondences using convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'16). IEEE, 1544–1553.
  110. Wiersma Ruben, Eisemann Elmar, and Hildebrandt Klaus. 2020. CNNs on surfaces using rotation-equivariant features. ACM Trans. Graph. 39, 4, Article 92 (July 2020), 12 pages.
  111. Wu Zhirong, Song Shuran, Khosla Aditya, Yu Fisher, Zhang Linguang, Tang Xiaoou, and Xiao Jianxiong. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1912–1920.
  112. Xu Bingbing, Shen Huawei, Cao Qi, Cen Keting, and Cheng Xueqi. 2019. Graph convolutional networks using heat kernel for semi-supervised learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI'19). 1928–1934.
  113. Xu Kai, Kim Vladimir G., Huang Qixing, Mitra Niloy, and Kalogerakis Evangelos. 2016. Data-driven shape analysis and processing. In SIGGRAPH ASIA 2016 Courses. ACM, Article 4.
  114. Yang Zhangsihao, Litany Or, Birdal Tolga, Sridhar Srinath, and Guibas Leonidas. 2021. Continuous geodesic convolutions for learning on 3D shapes. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 134–144.
  115. Yi Li, Su Hao, Guo Xingwen, and Guibas Leonidas J. 2017. SyncSpecCNN: Synchronized spectral CNN for 3D shape segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2282–2290.
  116. Zhang Hao, Van Kaick Oliver, and Dyer Ramsay. 2010. Spectral mesh processing. In Computer Graphics Forum, Vol. 29. Wiley Online Library, 1865–1894.
  117. Zhang Zhiyuan, Hua Binh-Son, Rosen David W., and Yeung Sai-Kit. 2019. Rotation invariant convolutions for 3D point clouds deep learning. In Proceedings of the International Conference on 3D Vision (3DV'19). IEEE, 204–213.
  118. Zhao Yongheng, Birdal Tolga, Lenssen Jan Eric, Menegatti Emanuele, Guibas Leonidas, and Tombari Federico. 2020. Quaternion equivariant capsule networks for 3D point clouds. In Proceedings of the European Conference on Computer Vision (ECCV'20).
  119. Zhou Yi, Wu Chenglei, Li Zimo, Cao Chen, Ye Yuting, Saragih Jason, Li Hao, and Sheikh Yaser. 2020. Fully convolutional mesh autoencoder using efficient spatially varying kernels. In Advances in Neural Information Processing Systems, Larochelle H., Ranzato M., Hadsell R., Balcan M. F., and Lin H. (Eds.), Vol. 33. 9251–9262.
