Continuous remeshing for inverse rendering

We present a novel method for joint optimization and remeshing and apply it to inverse rendering. Rapid advances in differentiable rendering in recent years have paved the way for fast inverse rendering of complex scenes. But serious problems with gradient-based optimization of triangle meshes remain. Applying gradient steps to the vertices can lead to mesh defects, such as flipped triangles, crumpled regions, and self-intersections. Choosing a good vertex count is crucial for optimization quality and performance, but is usually done by hand. Moreover, meshes with fixed triangulation struggle to adapt to complex geometry. Our novel method tackles all these problems by applying an adaptive remeshing step in every iteration of the optimization loop. By immediately collapsing suspicious triangles, we avoid and heal mesh defects. We use a closed-loop-controlled, location-dependent edge length. We compare our solution to state-of-the-art methods and find that it is faster and more accurate. It produces finer meshes with fewer defects, requires less parameter tuning, and can reconstruct more complex objects.


INTRODUCTION
Inverse rendering is the problem of reconstructing an unknown 3D scene from 2D images. Conventional methods in this field are usually based on finding correspondences between the images. Obviously, the shading of the corresponding surface region needs to be similar for the involved viewing directions. Wherever the surface is not rough enough, or the lighting is not smooth enough, the matching may completely fail. 1 In recent years, differentiable rendering emerged to solve this problem at its root. Given a powerful enough differentiable renderer and a powerful enough gradient-based general-purpose optimizer, any inverse rendering problem can, in theory, be solved by minimizing an image loss, using a simple optimization loop. However, solving for a 3D scene, with thousands or millions of strongly coupled parameters, is often a hard and highly non-convex problem.
For differentiable rendering, as with conventional rendering, surfaces are usually represented by triangle meshes. But the simplicity and efficiency of mesh-based surfaces come with a major drawback. Optimizing the vertex positions independently can lead to flipped triangles, distortions, and self-intersections. Once these defects occur, the optimization often gets stuck in a local minimum.

F I G U R E 1
Two examples of sparse and inconsistent gradients (red arrows). A sphere (gray) is optimized towards a target mesh (yellow). Left: Sparse silhouette gradients pull out a small number of vertices while keeping all other vertices unchanged. Right: Inconsistent silhouette gradients from two images crumple the mesh at the intersection of the silhouette cones.
Mesh vertices receive two different kinds of gradients from differentiable renderers:
-Shading gradients, coming from deviating surface normals, reflection properties, or lighting, and
-Silhouette gradients, coming from deviating shape.
A major source of critical mesh distortions are sparse or inconsistent silhouette gradients. Figure 1 illustrates two important examples. For growing regions (left drawing), silhouette gradients continuously pull the same sparse vertices out of the silhouette, keeping all other vertices unchanged. In contrast, for shrinking regions (right drawing), silhouette gradients push vertices into the silhouette. As a result, when averaged over time, shrinking silhouette gradients are not sparse. But gradients coming from different images may be inconsistent, often leading to strong crumpling in the affected zones.
When using a fixed triangulation and standard optimizers like Adam, 2 these problems can only be avoided by using strong regularization terms and small learning rates. 1 But these regularizations penalize non-smooth surfaces and therefore lead to incorrect results.
Finer meshes require stronger regularization and smaller learning rates. The reason is simple: with smaller vertex distances, updates are more likely to cause critical distortions. Therefore, starting with a coarse mesh and stepwise refining it during the optimization can greatly speed up the optimization and make it more robust, as seen in previous work. [3][4][5] This basically splits the optimization into multiple stages. But this approach requires additional hyperparameters that strongly depend on the complexity of the scene.
We propose a method that generalizes the idea of coarse-to-fine optimization. We keep the triangle size spatially and temporally within a near-optimal range by continuously and adaptively remeshing the surface. Keeping a good triangle size avoids the majority of critical distortions and face flips. Remaining defects are immediately healed by collapsing suspicious edges. Our method is scale-invariant and can handle models of different complexity without parameter-tuning. Figure 2 shows an intermediate state of our method reconstructing the Lucy model with color-coded triangle sizes for visualization. While red areas are still coarse, blue areas are already close to the target surface and can be refined. The process can be best seen in our supplemental videos.
In summary, our contributions are: -An optimization rule that continuously estimates the optimal local edge length.
-A GPU-friendly adaptive remeshing algorithm, fast enough to be applied after each optimization step.
We give an overview of related work in Section 2, explain our basic ideas in Section 3, present our method in Section 4 and our implementation in Section 5. In Section 6, we show results and compare our solution to other methods.


RELATED WORK

Differentiable rendering
-Differentiable rasterizers project the triangles onto the camera image. Liu et al. 8 proposed Soft Rasterizer, which smoothens the rasterization process to get gradients from discontinuities. PyTorch3D, a very popular pytorch library by Ravi et al., 9 lowered the barrier for beginners. In 2020 finally, Laine et al. 10 from NVIDIA published Nvdiffrast, a very fast OpenGL-based differentiable rasterizer.
-Differentiable ray tracers, on the other hand, cast rays through each pixel of the camera image. While that enables handling all kinds of physical interactions of the light, rendering is usually much slower than with rasterizers. Li et al. 11 handle discontinuities by sampling additional rays on the object edges. In 2019, Nimier-David et al. 12 released Mitsuba 2, an extremely versatile physically correct renderer. Since then, new methods were developed to improve and greatly speed up differentiable ray tracing. 13-15

Mesh optimization for inverse rendering
Various methods exist to avoid distortions, face flips, and self-intersections:
-Mesh regularization adds different kinds of penalty terms to the rendering loss to smooth the mesh. 9,16 Laplacian and bi-Laplacian are the most popular terms.
-Diffeomorphic mesh flows utilize ordinary differential equations to force smooth surfaces. 4,17 While that solves the problem of face-flipping and self-intersection, the method typically requires large neural networks and is therefore comparably slow.
-Nicolet et al. 3 optimize differential coordinates instead of raw vertices to smooth out sparse gradients. This can also be seen as a Laplacian preconditioning. The method is fast and robust for smooth surfaces, but the reconstruction of higher frequencies is typically slow.
-For all methods above, coarse-to-fine optimization is generally beneficial to robustness and performance, as reported by Nicolet et al., 3 Cheng et al., 4 and Luan et al. 5 The optimization loop is split into about 2-10 stages. After each stage, the mesh is refined using standard remeshing methods.

Remeshing
Back in 1993, Hoppe et al. 18 first proposed an explicit remeshing procedure, using edge collapse, edge split and edge flip. Botsch et al. 19 further investigated the method. Surazhsky et al. propose an area-based smoothing technique. 20 Jakob et al. 21 optimize both the edge orientations and vertex positions.

Adam optimizer
The well-known Adam 2 optimization rule is, in short:

g ← ∇_x Φ(x)   (1)
m_1 ← β_1 m_1 + (1 − β_1) g   (2)
m_2 ← β_2 m_2 + (1 − β_2) g²   (3)
m̂_1 ← m_1 ∕ (1 − β_1^t)   (4)
m̂_2 ← m_2 ∕ (1 − β_2^t)   (5)
x ← x − α m̂_1 ∕ (√m̂_2 + ε)   (6)

where Φ is the objective function (loss), x is a parameter to optimize, g is the parameter gradient, α is the learning rate, t is the iteration index, m_1 is the first moment (momentum), m_2 is the second moment (uncentered variance), and m̂_i are the bias-corrected moments.

IsotropicAdam
When applied to triangle meshes, x in Equations (1) and (6) usually holds the vertex positions. With V being the number of vertices, there are 3V degrees of freedom. Optimizing these parameters independently breaks spatial symmetry in Equation (6). So it is useful to see x as a vector-valued parameter array (denoted in bold); in other words, x, g, m_1 become matrices of size V × 3. Now it is easy to fix the symmetry break in m_2 by using the spatial L2-norm of g:

m_2 ← β_2 m_2 + (1 − β_2) ||g||²   (9)
x ← x − α m̂_1 ∕ (√m̂_2 + ε)   (10)

We call this slightly modified update rule IsotropicAdam. || ⋅ || in Equation (9) denotes the L2-norm over the three spatial coordinates. Note that m_2 is now scalar-valued again, so IsotropicAdam requires substantially fewer calculations than Adam.
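One IsotropicAdam step can be sketched in a few lines (NumPy for illustration; the paper's implementation uses PyTorch, and the learning rate default here is arbitrary):

```python
import numpy as np

def isotropic_adam_step(x, g, m1, m2, t, lr=0.1,
                        beta1=0.9, beta2=0.999, eps=1e-8):
    """One IsotropicAdam step on a V x 3 vertex array.

    Identical to Adam except that the second moment is accumulated from
    the spatial L2-norm of each vertex gradient, so m2 is one scalar per
    vertex and the update direction is not distorted per coordinate.
    """
    m1 = beta1 * m1 + (1 - beta1) * g                                # V x 3
    m2 = beta2 * m2 + (1 - beta2) * np.linalg.norm(g, axis=1) ** 2   # V
    m1_hat = m1 / (1 - beta1 ** t)                                   # bias correction
    m2_hat = m2 / (1 - beta2 ** t)
    x = x - lr * m1_hat / (np.sqrt(m2_hat)[:, None] + eps)
    return x, m1, m2
```

Because the denominator is the same for all three coordinates of a vertex, the very first step moves each vertex by lr ⋅ g∕||g||, in other words by (at most) the learning rate, regardless of the gradient magnitude.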

Relative velocity
The change of x in Equations (6) and (10) is the discrete vertex velocity. It is the product of the learning rate and a term that the authors of Adam call signal-to-noise ratio, 2 which is quite misleading. In fact, the term is related to signal and noise, but it is not their ratio. As we are dealing with vertex velocity, we prefer the name relative velocity:

ν = m̂_1 ∕ (√m̂_2 + ε)   (11)

We omit ε from here on for better readability.
It is easier to understand the meaning of ν if we investigate the one-dimensional case, neglect the different values for β_1 and β_2 and introduce a real signal-to-noise ratio SNR = m_1 ∕ σ, using the standard deviation σ and its well-known property σ² = m_2 − m_1²:

ν = m_1 ∕ √m_2 = m_1 ∕ √(σ² + m_1²) = SNR ∕ √(1 + SNR²)   (12)

The last expression in Equation (12) is a sigmoid function of SNR, shown in Figure 3. We can see from this simplified version of ν that ||ν|| is close to one for large SNR (uniform gradients far away from the target surface) and goes to zero for small values of SNR (varying gradients close to the target surface).
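This relationship can be verified numerically; a small sketch with symbols as in the text (β terms neglected, 1D gradients):

```python
import numpy as np

rng = np.random.default_rng(0)
mean, sigma = 0.8, 0.5                 # gradient "signal" and noise level
g = rng.normal(mean, sigma, 200_000)   # stream of noisy 1D gradients

m1 = g.mean()                          # first moment (signal)
m2 = (g ** 2).mean()                   # second raw moment
nu = m1 / np.sqrt(m2)                  # relative velocity

snr = m1 / sigma                       # a *real* signal-to-noise ratio
# m2 = sigma^2 + m1^2 turns nu into a sigmoid of SNR:
assert abs(nu - snr / np.sqrt(1 + snr ** 2)) < 1e-2
```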

F I G U R E 4
The learning rate α defines a trust region with radius r = α ⋅ l around the vertices.

F I G U R E 5
We optimize a 1D mesh (black line) to match a target shape (gray dashed line). Please also watch our supplemental video.
The same holds in the three-dimensional case. IsotropicAdam updates vertices with uniform gradients at a maximum speed close to the learning rate α. When the vertices get closer to the target surface, gradients typically become more and more noisy, and IsotropicAdam reduces the update speed accordingly.

Learning rate and edge length
As explained above, the learning rate α can be seen as a maximum vertex speed. Note, however, that because of the different values for β_1 and β_2, ||ν|| can actually be larger than one in some cases. The authors of Adam interpret α as a trust region around the current parameter value. In the context of triangle meshes, a single iteration should not flip triangles, so the trust radius is directly related to the edge length, for example r = l∕2, see Figure 4. Knowing that the step size in Equation (10) should be proportional to l, we modify the update rule and redefine α to get a scale-invariant learning rate:

x ← x − α l ν   (13)

Optimal edge length
We have seen in Sections 1 and 3.3 that choosing a good triangle size (edge length) is crucial for the performance and robustness of the optimization. Let's look at a very simple 1D example to find out how to choose the optimal edge length. We optimize a chain (black line in Figure 5) to match a target curve (gray/yellow). Analogous to the shading gradients in the 3D case, we minimize the mean squared error of the chain normals, using Adam with a learning rate proportional to the edge length, see Equation (13). The gray lines in Figure 7 show the results for different fixed vertex counts. As expected, the coarse chains converge faster, but once they reach the target surface, they get stuck with no further improvement. Ideally, we would adapt the edge length to the distance to the surface. But the surface is of course unknown, and so we need to estimate the distance somehow. Fortunately, the relative speed ||ν|| is closely related to the distance to the target surface, as discussed in Section 3.2. ν depends on the actual triangulation, so it cannot serve as an absolute distance estimation, but at least it indicates whether the current edge length is too short, good or too long. Knowing this, we can build a linear closed-loop controller for estimating the optimal edge length l_ref. We initialize l_ref with the initial edge length and update it in each iteration using

l_ref ← l_ref (1 + G (ν̄ − ν_ref))   (14)

where G is a gain constant, ν̄ is the mean relative speed of all vertices and ν_ref ∈ [0, 1] is a fixed reference value, indicated in Figure 3 with blue lines. We choose ν_ref = 0.3 and G = 0.5 for the 1D experiment. (It turns out that the same numbers also work well for 3D meshes.) Figure 6 illustrates the edge length controller. Continuously reducing the edge length would introduce unnecessary interpolation errors. So, we use a relative edge length tolerance of tol = 0.6 and remesh once the deviation |l − l_ref| > l_ref ⋅ tol (green area in Figure 6).
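The controller and the remeshing trigger can be sketched as follows (the multiplicative update form and the function names are our illustration; the constants are the values given above):

```python
NU_REF, GAIN, TOL = 0.3, 0.5, 0.6   # reference speed, gain, edge length tolerance

def update_l_ref(l_ref, nu_mean):
    """One step of the closed-loop edge length controller (assumed
    multiplicative form): vertices still moving fast (nu_mean > NU_REF)
    keep or grow their reference edge length; slow, nearly converged
    vertices get a shorter one so they can be refined."""
    return l_ref * (1 + GAIN * (nu_mean - NU_REF))

def needs_remesh(l, l_ref):
    """Remesh only once the actual edge length deviates too much,
    to avoid unnecessary interpolation errors."""
    return abs(l - l_ref) > l_ref * TOL
```

For example, a chain far from the target (mean relative speed 0.9) grows its reference length, while a nearly converged one (0.05) shrinks it; remeshing triggers only once the actual edge length leaves the tolerance band.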
The green line in Figure 7 shows the results of the adaptive remeshing. As intended, the result is at any point during optimization better than the respective best result with fixed vertex count.

METHOD
To recap the 1D example in Section 3.4, we used the mean relative speed ν̄ of all vertices as a feedback value to control the reference edge length l_ref in Equation (14). We then applied a global remeshing every time the reference edge length deviated too much from the actual edge length. We could do the same global remeshing for triangle meshes. But for non-trivial surfaces, the relative speed (the distance to the target) is always locally varying, so we use a per-vertex reference edge length with continuous adaptive remeshing. Putting everything together, our final update rule is:

x ← x − α l_ref ν   (21)
l_ref ← l_ref (1 + G (ν̄ − ν_ref))   (22)
(x, f, l_ref) ← remesh(x, f, l_ref)   (23)

where the mean relative speed ν̄ is now evaluated per vertex. In the last Equation (23), we apply a remeshing step using the vertices x, faces f and reference edge length l_ref, as explained in Section 4.1.

Remeshing
We remesh after each vertex update to immediately heal any defects. A full remeshing at each iteration would be too expensive, but we can apply a single step of an iterative remeshing algorithm, embedding the remeshing loop continuously into the optimization loop. Our remeshing method is based on the popular explicit surface remeshing, [18][19][20] which iteratively applies edge-split, edge-collapse, edge-flip, smoothing, and relocating, where, in our case, relocating is already covered by the gradient-based updates. These steps normally iterate sequentially over the edges. We adapt the algorithm to process the vertices in parallel to be able to run the complete remeshing on a GPU. This is not trivial, as interference of parallel operations must be prevented. Moreover, we extend edge-collapse to better heal defects. Our remeshing algorithm is quite complex and includes many heuristics, but it is based on proven standard methods and so it works out of the box without any parameter tuning, using only the mesh and target edge length as inputs.
Remeshing during the optimization covers three main objectives: -Keep the (local) edge length close to the optimum.
-Heal defects coming from sparse or inconsistent vertex gradients.
-Keep the mesh uniform and well triangulated.
The first objective is covered by edge-split and edge-collapse. Splitting and collapsing seriously impact the triangulation quality, so it is crucial to avoid unnecessary splits and collapses. Therefore, we use an edge length tolerance, typically tol = 0.5.
F I G U R E 8 Left: Selected edges of an existing mesh (black/gray) are split. We use an equal-length example for better illustration. New vertices are inserted at the midpoints (red). Right: The corners of affected triangles are moved counter-clockwise to the new vertices (c 2 and c 3 in this case), shrinking the existing triangle (gray). At each of these corners, a new triangle is inserted (blue, green).

F I G U R E 9
We detect face flips (folds) by comparing each face normal with the mean normal of its corner vertices.
The second objective is covered by edge-collapse and the third by edge-flip together with Laplacian regularization. We now describe all remeshing steps in detail:

Edge split
We split all edges that are longer than l max in parallel. Figure 8 illustrates the algorithm. First, all existing triangles (gray) are processed: If an edge of a triangle is to be split, the corner to the left of the edge is moved to the midpoint of the edge. That shrinks affected triangles in-place. Then, the resulting holes are filled with new triangles (colored).
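The selection step of the split can be vectorized so that all over-long edges are found in one parallel pass (NumPy sketch; the actual implementation uses pytorch tensors, and the function name is ours):

```python
import numpy as np

def edges_to_split(vertices, edges, l_max):
    """Return indices and midpoints of all edges longer than l_max.

    One vectorized pass over the edge list, with no sequential edge
    traversal, so the same pattern runs unchanged on GPU tensors.
    """
    ev = vertices[edges]                               # E x 2 x 3
    lengths = np.linalg.norm(ev[:, 1] - ev[:, 0], axis=1)
    mask = lengths > l_max
    return np.nonzero(mask)[0], ev[mask].mean(axis=1)  # indices, midpoints
```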

Edge collapse
This is the most challenging part of our method. Edge-collapse is responsible for avoiding and healing defects. On the other hand, edges to be collapsed must be chosen carefully to prevent interference with other collapses running in parallel. Moreover, surface topology must be preserved. We check all faces for face flips in each iteration. Face flips are actually small folds, as illustrated in Figure 9, and there is no obvious dividing line between real folds in the target surface and optimization defects. We use the normalized area-weighted sum of face normals n_f as vertex normals. For each triangle, we calculate the mean of its corner vertex normals and use that as a reference face normal n_N. A face is considered flipped if the dot product ⟨n_f, n_N⟩ < 0 (see Figure 9). That works best if none of the adjacent triangles is flipped. So, again, it is important to apply the remeshing in each iteration of the optimization, before more triangles are flipped.
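The flip test described above can be sketched as follows (NumPy for illustration; the function name is ours, and area weighting comes for free from the un-normalized cross product):

```python
import numpy as np

def detect_flipped_faces(vertices, faces):
    """Flag faces whose normal opposes the mean normal of their
    corner vertices."""
    tri = vertices[faces]                                        # F x 3 x 3
    fn = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])  # area-weighted face normals
    vn = np.zeros_like(vertices)
    for k in range(3):                                           # scatter-add onto corner vertices
        np.add.at(vn, faces[:, k], fn)
    vn /= np.linalg.norm(vn, axis=1, keepdims=True) + 1e-12      # normalized vertex normals
    ref = vn[faces].mean(axis=1)                                 # reference normal per face
    return np.einsum('ij,ij->i', fn, ref) < 0                    # <n_f, n_N> < 0
```

On a small planar fan where one triangle has inverted winding, only that triangle is flagged, because its neighbors still dominate the vertex normals.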
Due to the topology-preserving rule (see below), narrow triangles may permanently resist collapsing, although they have edges shorter than l_min. That leads to narrow tube artefacts in the results. Therefore, we also try to collapse all faces with an area A < l²_min ∕ 2. For each face candidate (flipped or small), we choose an edge to be collapsed. Normally the shortest edge of the triangle is the best choice, but again, the topology-preserving rule may prevent an edge collapse, so we randomly choose one of the two other edges with a 10% chance each.

F I G U R E 10
We flip edges in parallel to minimize the overall degree loss Φ_d. In this example, the red edge is to be replaced by the green one. Inequation (30) gives d̃_1 + d̃_2 − d̃_3 − d̃_4 = 3 > 2, so the edge flip reduces Φ_d.
We rank the candidate edges using a collapse priority. Then we remove all candidates that are closer than three edges to a candidate with a higher priority rank, to prevent interference between the collapses. For each remaining candidate, we count the number of edge pairs that connect the start and end vertex of the candidate. If it is larger than two, a collapse would cause a topology change, so the candidate is removed.
For all remaining edge candidates, the start and end vertices are merged into the edge midpoints. We also merge m_1, m_2, ν, and l_ref jointly with the vertex positions, so the remeshing smoothly integrates with the vertex update.

Edge flip
The purpose of edge flip is to keep the degree (valence) of the vertices close to its optimal value. The degree d_i of a vertex i is the number of associated edges; its optimal value is 6 (or 4 for boundary vertices). 19 We minimize the sum of squared degree errors d̃_i = d_i − d_opt:

Φ_d = Σ_i d̃_i²   (29)

Figure 10 illustrates a case where a single edge flip reduces the total Φ_d. We renumber the involved vertices with i = {1, ..., 4}, where vertices 1 and 2 are the endpoints of the flipped edge (Δd_i = −1) and vertices 3 and 4 are the opposite corners (Δd_i = +1). For a single edge flip to reduce Φ_d, we get the simple inequation

d̃_1 + d̃_2 − d̃_3 − d̃_4 > 2   (30)

We prioritize the edge flip candidates by ΔΦ_d and remove all candidates that are closer than one edge to a candidate with a higher priority rank, to prevent interference.
Finally, we check that both triangle normals after the edge flip are not flipped with respect to the mean of the original ones.
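The degree criterion in Inequation (30) can be checked against the definition of Φ_d by brute force (a small sketch; function names are ours, d_opt = 6 for interior vertices):

```python
def flip_improves(d1, d2, d3, d4, d_opt=6):
    """True if flipping edge (v1, v2) to (v3, v4) reduces Phi_d.
    v1 and v2 lose one edge each, v3 and v4 gain one."""
    e = lambda d: d - d_opt                    # degree error
    return e(d1) + e(d2) - e(d3) - e(d4) > 2

def delta_phi(d1, d2, d3, d4, d_opt=6):
    """Brute-force change of the squared degree error sum."""
    before = sum((d - d_opt) ** 2 for d in (d1, d2, d3, d4))
    after = ((d1 - 1 - d_opt) ** 2 + (d2 - 1 - d_opt) ** 2
             + (d3 + 1 - d_opt) ** 2 + (d4 + 1 - d_opt) ** 2)
    return after - before
```

Expanding the squares gives ΔΦ_d = 4 − 2(d̃_1 + d̃_2 − d̃_3 − d̃_4), so the two tests agree exactly for all degree combinations.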

Smoothing and regularization
Explicit remeshing always includes a smoothing step, usually Laplacian smoothing. This smoothing is essential to achieve a good triangulation while splitting, collapsing and flipping edges. Some methods apply a tangential smoothing to avoid surface shrinking, 19 but that requires reliable normal vectors, and so it did not work out in our experiments. Note that, as we embed the iterative remeshing into the optimization loop, this smoothing step is equivalent to a Laplacian regularization term in the optimization loss.
As already discussed, Laplacian regularization is required not only as part of the remeshing, but also to smooth out sparse or inconsistent gradients. This is particularly important when the vertex updates are large with respect to the edge length, in other words, when the relative speed ||ν|| is large. On the other hand, when the mesh is close to the final result and the relative speed is low, smoothing is undesirable, as it impacts the result. So, we use a speed-weighted smoothing

x_i ← x_i + c ||ν_i|| (L x)_i

where c is a smoothing hyperparameter and L is the combinatorial Laplacian matrix.
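The speed-weighted smoothing can be sketched per vertex (the constant c and the neighbor-list representation are illustrative; the Laplacian here is the combinatorial one, moving each vertex towards its neighbor average):

```python
import numpy as np

def speed_weighted_smooth(x, nu, neighbors, c=0.1):
    """Smooth fast (far-from-target) vertices strongly and leave
    converged vertices almost untouched.

    x: V x 3 vertex positions, nu: V relative speeds,
    neighbors: list of neighbor index lists per vertex.
    """
    lap = np.stack([x[nb].mean(axis=0) for nb in neighbors]) - x  # (Lx)_i
    return x + c * nu[:, None] * lap
```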

IMPLEMENTATION
We implemented our method in pure python, using pytorch 22 for remeshing and optimization. Source code and data are available at https://github.com/Profactor/continuous-remeshing.
Pytorch is designed to work efficiently on fixed-sized tensors, so we first experimented with remeshing operations on static tensor shapes. It turned out that this is only faster for small meshes (V ≲ 10³-10⁴). For large meshes, due to the vast amount of useless operations, the calculation overhead outweighs the saved resizing overhead. So, we use dynamic shapes, but avoid resizing where possible.
Pytorch's scatter operations are heavily used for smoothing and remeshing operations. In order to avoid resizing, we use the value zero to mask unused elements in referencing tensors (e.g., faces, edges) and prefix the referenced tensor (e.g., vertices) with a single dummy element. Using this trick, we can perform fast "masked" scattering and other operations on fixed-sized tensors.
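The dummy-element trick in NumPy terms (np.add.at plays the role of pytorch's scatter-add; the data is illustrative):

```python
import numpy as np

real_vertices = np.array([[0., 0, 0], [1., 0, 0], [0., 1, 0]])
vertices = np.concatenate([np.zeros((1, 3)), real_vertices])  # row 0 = dummy

# Two face slots; the second is unused and masked with zeros, so all of
# its indices point at the dummy row. All real indices are shifted by +1.
faces = np.array([[1, 2, 3],
                  [0, 0, 0]])

# Scatter one normal per face onto its corner vertices, with no branching:
face_normals = np.array([[0., 0, 1], [9., 9, 9]])  # garbage in the masked slot
vertex_normals = np.zeros_like(vertices)
for k in range(3):
    np.add.at(vertex_normals, faces[:, k], face_normals)

# The garbage from the masked face piled up harmlessly on the dummy row:
assert np.allclose(vertex_normals[1:], [[0., 0., 1.]] * 3)
```

The same fixed-size tensors can then be reused across iterations without resizing; only the slice `[1:]` ever reaches downstream computations.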
Using pytorch's just-in-time compiler (TorchScript) can accelerate remeshing, but the saving is usually less than 10%. We believe that there is great potential for optimization using for example custom Cuda kernels.
Where not otherwise stated, the experiments in this article were made with 16 images per reconstruction and an image size of 512 × 512 pixels. We chose the fast rasterizer Nvdiffrast 10 for differentiable rendering. As we focus on remeshing and optimization, the simplest possible shading was used, a linear mapping of Cartesian normal vector components to Rgb components. This is equivalent to a first order spherical harmonics approximation of a diffuse reflection. 23 One of the major strengths of our solution is its strict scale-invariance. All operations take the edge length or reference edge length into account, for example in the update rule (21). This enables our method to optimize meshes from coarse to very fine using the same hyperparameters. In order to achieve scale invariance for the whole optimization, the gradients coming from the differentiable renderer must also be linear with respect to scaling. It is not at all obvious that the rendering loss provides such linearity and that the renderer gives correct linear gradients.
We tested the scale-linearity of Nvdiffrast and observed very good results for shading gradients, but significant limitations with silhouette gradients at small triangle sizes. This is a known limitation of Nvdiffrast and is related to its silhouette search, which relies on the silhouette triangles being visible in the image. 10 We found in our experiments that severe mesh defects arise if we reduce the edge length below the size of one pixel (0.005 for our setup).
For unknown reasons, we sometimes observe huge gradients on single pixels coming from Nvdiffrast. This happens only once in hundreds of images but can lead to severe artefacts. As a workaround, we limit the gradients to |g| < 10 ⋅ |m 1 | which has no impact on regular gradients.
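The workaround can be sketched as a per-vertex rescaling (names and the eps guard are ours; note that the limit is zero while m_1 is still zero at the very start):

```python
import numpy as np

def clamp_outlier_gradients(g, m1, factor=10.0, eps=1e-12):
    """Rescale vertex gradients so that |g| < factor * |m1| per vertex,
    suppressing rare huge single-pixel gradients while leaving regular
    gradients untouched."""
    limit = factor * np.linalg.norm(m1, axis=1, keepdims=True)
    norm = np.linalg.norm(g, axis=1, keepdims=True)
    scale = np.minimum(1.0, limit / (norm + eps))
    return g * scale
```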

RESULTS
All experiments in this article were performed on an Intel i7-11700K CPU and an NVIDIA RTX 3090 GPU.

F I G U R E 11
We benchmark our remeshing implementation on meshes of different sizes. The durations of the three remeshing steps are stacked, so the top line shows the total duration.

Figure 11 shows results from our remeshing benchmarks. We first measure edge-split alone, then edge-collapse alone, and finally all three steps together. In total, one remeshing iteration takes about five milliseconds for small meshes, and computing time grows linearly with vertex count for large meshes. While the remeshing duration depends only on the vertex count, rendering heavily depends on image size and count as well as shading. With our very simplistic shading, an iteration without remeshing takes about 10 ms. Physically based shading typically takes much longer, which makes the remeshing costs often negligible.

Comparison
We compare our solution to two state-of-the-art methods: 3
-Adam with Laplacian regularization.
-Adam-Remesh: Adam optimization with regularization and remeshing at fixed intervals, using the method of Botsch et al. 19 as implemented in C++ by Sellán. 24

We selected seven test models. The two baseline methods struggle to reconstruct the two challenging models Deer and Smilodon, so we grouped the models by complexity:
-Basic models: Nefertiti, Bunny, Armadillo, Lucy, Horse. 25
-Complex models: Deer, 26 Smilodon. 27

All models were scaled and shifted to fit into a unit sphere (radius r = 1). A sphere with r = 0.5 was used as the initial mesh, with 10,242 vertices for Adam and 162 vertices for Adam-Remesh and Ours. Images were rendered as explained in Section 5. The mean absolute pixel difference (RGBA) was used as the objective function.
All methods use a linear warm-up phase (learning ramp) at the beginning of the optimization (and after each remeshing for Adam-Remesh), during which the actual learning rate is ramped linearly from zero to its nominal value.
The hyperparameters of all three methods were carefully tuned to minimize the overall mean squared surface distance of the five Basic Models after three seconds of optimization. (We calculate distances both from the reconstructed mesh to the target mesh and vice versa.) In addition, Adam-Remesh was tuned again using only the two Complex Models (results shown as Adam-Remesh Complex), leading to a larger remesh interval and much coarser meshes. All tuned parameters are listed in Table 1. Figure 12 gives an overview of the results after three seconds of optimization. Our method produces the best results for all models. While the differences for the first three simple models are small, our results are substantially better for Lucy and Horse. Without being tuned for the two complex models Deer and Smilodon, our method is still able to reconstruct them well. As expected, Adam-Remesh Complex gives better results for the Complex Models than Adam-Remesh, but worse results for the Basic Models.
While the RMS distance gives a first idea about the reconstruction quality, it is important to also investigate the mesh quality. Figure 13 compares the RMS distance, average edge length and face flip ratio of all models and methods. The face flip ratio was calculated by comparing all face normals with the normal of the closest point on the target mesh. Then the area of all flipped triangles was divided by the total mesh area. It is clear to see that our method converges faster and produces much finer meshes with fewer defects for all models. Without being tuned for Deer and Smilodon, it clearly outperforms the tuned Adam-Remesh Complex for these two models. Figure 14 shows some intermediate and final meshes for four of the models, colored by per-vertex distance to the target mesh. Again, it is easy to see that our method has major advantages, mainly for the more complex models. Note that while Adam-Remesh Complex and Ours give similar RMS distance for Smilodon, the reconstruction using Ours has much finer details, see also the edge length plot in Figure 13. Please also watch our supplemental videos of some complete reconstructions.

F I G U R E 13
Comparison of RMS distance, average edge length, and face flip ratio over time. Lower values are better. Our method (green) is not only faster, it also achieves much higher mesh quality.

Fine to coarse
Another advantage of our method is the ability to take a step back in the coarse-to-fine optimization. Normally the reference edge length l_ref is steadily decreased in Equation (22), but in some cases it may increase. Figure 15 shows an example where a part of the mesh is only connected by a thin wire that requires very small triangles to be reconstructed. Once the wire is reconstructed, increasing the edge length again for the missing part greatly speeds up the reconstruction.

Parameter robustness
We analyze the dependency of the overall RMS distance on our parameters ν_ref and G in Figure 16. The Basic Models and corresponding parameters were used for evaluation. The gain G is a quite uncritical parameter; bad values typically just slow down the optimization. The reference velocity ν_ref is much more critical, but a value of ν_ref = 0.3 gave good results in all our experiments.

F I G U R E 14
Intermediate (t = 1 s, 2 s) and final (t = 3 s) meshes from the comparison in Section 6.2. Colors indicate per-vertex distance to the target mesh. Please zoom in to see the triangulation. Our method is faster and produces much finer and more accurate meshes with fewer defects for all models. All baseline methods fail to completely reconstruct the antlers of the Deer. Adam-Remesh Complex is able to reconstruct Smilodon's hind legs, but produces very coarse meshes. Please also watch our supplemental videos of complete reconstructions.

CONCLUSION AND OUTLOOK
We presented a novel method that tackles fundamental problems of mesh optimization. We demonstrated that our method is faster for simple models and able to reconstruct complex models where baseline methods completely fail. The results presented in Section 6 give just a first idea of what might be possible with our method. The ability to freely optimize the triangulation locally, without touching other parts of the mesh, is the prerequisite for reconstructing much larger and more complex meshes than shown in this article. A lot of additional research is needed to evaluate how well our method performs when applied to real-world problems with physically based shading. Reconstructing