Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of MeshGL -> Manifold construction #1138

Open
howardt opened this issue Jan 28, 2025 · 28 comments · May be fixed by #1185
Open

Improve performance of MeshGL -> Manifold construction #1138

howardt opened this issue Jan 28, 2025 · 28 comments · May be fixed by #1185

Comments

@howardt
Copy link
Contributor

howardt commented Jan 28, 2025

It feels like there is room to improve the performance of the MeshGL -> Manifold construction.

When I construct two MeshGL's, each containing a sphere with 1,046,528 triangles, it takes 138.9 ms to construct an Impl from a MeshGL for each of them. The boolean itself (a difference of one sphere offset a bit from the other), and conversion back into MeshGL takes 242 ms. So it is a bit surprising to me that the time to convert 2 inputs into Impl's takes comparable time to the boolean operation itself.

The MeshGL's I construct have no vertex properties beyond the 3 for the coordinates. I set faceID, but don't populate runIndex and runOriginalID. There are no mergeFrom/mergeTo items populated. I put some timers into the Impl.h file around various phases of the contruction, and it printed this for doing one sphere:

Impl constructor: copy vert properties took 1.07067 ms
Impl constructor: set up triRef took 42 ns
Impl constructor: set up triVerts took 2.70829 ms
Impl constructor: CreateHalfEdges took 15.3821 ms
Impl constructor: IsManifold took 0.430167 ms
Impl constructor: CalculateBBox took 0.248125 ms
Impl constructor: SplitPinchedVerts took 22.2563 ms
Impl constructor: CalculateNormals took 3.83696 ms
Impl constructor: InitializeOriginal took 0.627458 ms
Impl constructor: CreateFaces took 22.3918 ms
Impl constructor: SimplifyTopology took 36.091 ms
Impl constructor: RemoveUnreferencedVerts took 0.460083 ms
Impl constructor: Finish took 32.6864 ms
Impl constructor from MeshGLP, total took 138.872 ms

These times are all on my Macbook Pro M3 Max : 16 cores (12 performance, 4 efficiency), 64GB memory.

The high running phases are
SimplifyTopology (36 ms)
Finish (33 ms)
CreateFaces (22 ms)
SplitPInchedVerts (22 ms)
CreateHalfEdges (15 ms)

Do these numbers look reasonable, or is there room to improve?

@pca006132
Copy link
Collaborator

Do you have MANIFOLD_DEBUG=OFF and MANIFOLD_PAR=ON?

  1. SimplifyTopology is sequential, so parallelization will not help too much. @elalish is planning to modify the algorithm there iirc.
  2. Finish is doing a lot of sorting, maybe there is room for improvement there.
  3. CreateFaces: Maybe we can do some more parallelization?
  4. SplitPinchedVerts is also sequential, and not receiving too much attention.

I think it is probably quite hard to speed it up, because we do a lot of preprocessing. If you can give us the mesh, I can take a look at whether it is not using as many cores as it should, for example.

@pca006132
Copy link
Collaborator

OK one thing I see: We are doing SplitPinchedVerts twice, once here and once in SimplifyTopology (which calls CleanupTopology). Should we do that?

manifold/src/impl.h

Lines 203 to 213 in ac82767

SplitPinchedVerts();
CalculateNormals();
if (meshGL.runOriginalID.empty()) {
InitializeOriginal();
}
CreateFaces();
SimplifyTopology();

@howardt
Copy link
Contributor Author

howardt commented Jan 28, 2025

I checked the both of these don't trigger an error in the code, so I think I have the non-debug, parallel settings.

#ifdef MANIFOLD_DEBUG
blah!
#endif

#if MANIFOLD_PAR != 1
blah2!
#endif

Here is a .obj version of one of the spheres. I notice that our exporter is only writing 6 digits of precision for the coords, which means you may not be getting precisely the same floats as I'm passing through the API. (I don't have MANIFOLD_EXPORT enabled, since I didn't want that dependency in our code.)

@pca006132
Copy link
Collaborator

I managed to improve the timing with parallelization with some of the functions. The issue is that it is specific to meshes that are already quite nice, which doesn't require much simplification. Also, it seems that most of our time is spent on sorting various intermediate data structures (at least after these changes), not sure if there is a way around this.

I can also reproduce the slowness with vertex properties, can look into that later. Code I am testing with:

  const char* name = "test setup";
  FrameMarkStart(name);
  Manifold sphere = Manifold::Sphere(1, (8 << 7) * 4);
  sphere = sphere.CalculateNormals(0);
  auto mesh = sphere.GetMeshGL(0);
  {
    ZoneScopedN("Import");
    Manifold m(mesh);
  }
  FrameMarkEnd(name);

Import took 1.93s, and it takes 1.6s after applying the patch below.

diff --git a/src/edge_op.cpp b/src/edge_op.cpp
index d22577a2..603602f9 100644
--- a/src/edge_op.cpp
+++ b/src/edge_op.cpp
@@ -68,12 +68,20 @@ struct FlagEdge {
     // two original triangles.
     const TriRef ref0 = triRef[edge / 3];
     int current = NextHalfedge(halfedge[edge].pairedHalfedge);
-    const TriRef ref1 = triRef[current / 3];
+    TriRef ref1 = triRef[current / 3];
+    bool ref1Updated = !ref0.SameFace(ref1);
     while (current != edge) {
       current = NextHalfedge(halfedge[current].pairedHalfedge);
       int tri = current / 3;
       const TriRef ref = triRef[tri];
-      if (!ref.SameFace(ref0) && !ref.SameFace(ref1)) return false;
+      if (!ref.SameFace(ref0) && !ref.SameFace(ref1)) {
+        if (!ref1Updated) {
+          ref1 = ref;
+          ref1Updated = true;
+        } else {
+          return false;
+        }
+      }
     }
     return true;
   }
@@ -132,22 +140,56 @@ void Manifold::Impl::CleanupTopology() {
   // verts. They must be removed before edge collapse.
   SplitPinchedVerts();
 
+  Vec<SortEntry> entries;
+  Vec<int> indices;
   while (1) {
     ZoneScopedN("DedupeEdge");
 
     const size_t nbEdges = halfedge_.size();
     size_t numFlagged = 0;
 
-    Vec<SortEntry> entries;
-    entries.reserve(nbEdges / 2);
-    for (size_t i = 0; i < nbEdges; ++i) {
-      if (halfedge_[i].IsForward()) {
-        entries.push_back({halfedge_[i].startVert, halfedge_[i].endVert, i});
+    if (nbEdges > 1e5) {
+      ZoneScopedN("DedupeEdgeInit");
+      auto f = [&](size_t i) { return halfedge_[i].IsForward() ? 1 : 0; };
+      indices.resize(nbEdges, 0);
+      entries.resize(nbEdges / 2);
+      exclusive_scan(TransformIterator(countAt(0), f),
+                     TransformIterator(countAt(nbEdges), f), indices.begin());
+      for_each(ExecutionPolicy::Par, countAt(0), countAt(nbEdges),
+               [&](size_t i) {
+                 if (halfedge_[i].IsForward()) {
+                   entries[indices[i]] = {halfedge_[i].startVert,
+                                          halfedge_[i].endVert, i};
+                 }
+               });
+    } else {
+      entries.clear(false);
+      entries.reserve(nbEdges / 2);
+      for (size_t i = 0; i < nbEdges; ++i) {
+        if (halfedge_[i].IsForward()) {
+          entries.push_back({halfedge_[i].startVert, halfedge_[i].endVert, i});
+        }
       }
     }
 
-    stable_sort(entries.begin(), entries.end());
-    for (size_t i = 0; i < entries.size() - 1; ++i) {
+    {
+      ZoneScopedN("DedupeEdgeSort");
+      stable_sort(entries.begin(), entries.end());
+    }
+    std::vector<size_t> duplicates;
+    std::mutex m;
+    for_each(autoPolicy(entries.size() - 1), countAt(0),
+             countAt(entries.size() - 1), [&](size_t i) {
+               const int h0 = entries[i].index;
+               const int h1 = entries[i + 1].index;
+               if (halfedge_[h0].startVert == halfedge_[h1].startVert &&
+                   halfedge_[h0].endVert == halfedge_[h1].endVert) {
+                 m.lock();
+                 duplicates.push_back(i);
+                 m.unlock();
+               }
+             });
+    for (auto i : duplicates) {
       const int h0 = entries[i].index;
       const int h1 = entries[i + 1].index;
       if (halfedge_[h0].startVert == halfedge_[h1].startVert &&
@@ -156,7 +198,6 @@ void Manifold::Impl::CleanupTopology() {
         numFlagged++;
       }
     }
-
     if (numFlagged == 0) break;
 
 #ifdef MANIFOLD_DEBUG
@@ -207,13 +248,14 @@ void Manifold::Impl::SimplifyTopology() {
     ZoneScopedN("CollapseShortEdge");
     numFlagged = 0;
     ShortEdge se{halfedge_, vertPos_, epsilon_};
-    for_each_n(policy, countAt(0_uz), nbEdges,
-               [&](size_t i) { bFlags[i] = se(i); });
-    for (size_t i = 0; i < nbEdges; ++i) {
-      if (bFlags[i]) {
-        CollapseEdge(i, scratchBuffer);
-        scratchBuffer.resize(0);
-        numFlagged++;
+    if (count_if(policy, countAt(0_uz), countAt(nbEdges),
+                 [&](size_t i) { return (bFlags[i] = se(i)); }) > 0) {
+      for (size_t i = 0; i < nbEdges; ++i) {
+        if (bFlags[i]) {
+          CollapseEdge(i, scratchBuffer);
+          scratchBuffer.resize(0);
+          numFlagged++;
+        }
       }
     }
   }
@@ -229,42 +271,44 @@ void Manifold::Impl::SimplifyTopology() {
     ZoneScopedN("CollapseFlaggedEdge");
     numFlagged = 0;
     FlagEdge se{halfedge_, meshRelation_.triRef};
-    for_each_n(policy, countAt(0_uz), nbEdges,
-               [&](size_t i) { bFlags[i] = se(i); });
-    for (size_t i = 0; i < nbEdges; ++i) {
-      if (bFlags[i]) {
-        CollapseEdge(i, scratchBuffer);
-        scratchBuffer.resize(0);
-        numFlagged++;
+    if (count_if(policy, countAt(0_uz), countAt(nbEdges),
+                 [&](size_t i) { return (bFlags[i] = se(i)); }) > 0) {
+      for (size_t i = 0; i < nbEdges; ++i) {
+        if (bFlags[i]) {
+          CollapseEdge(i, scratchBuffer);
+          scratchBuffer.resize(0);
+          numFlagged++;
+        }
       }
     }
   }
 
-#ifdef MANIFOLD_DEBUG
+// #ifdef MANIFOLD_DEBUG
   if (ManifoldParams().verbose && numFlagged > 0) {
     std::cout << "found " << numFlagged << " colinear edges to collapse"
               << std::endl;
   }
-#endif
+// #endif
 
   {
     ZoneScopedN("RecursiveEdgeSwap");
     numFlagged = 0;
     SwappableEdge se{halfedge_, vertPos_, faceNormal_, tolerance_};
-    for_each_n(policy, countAt(0_uz), nbEdges,
-               [&](size_t i) { bFlags[i] = se(i); });
-    std::vector<int> edgeSwapStack;
-    std::vector<int> visited(halfedge_.size(), -1);
-    int tag = 0;
-    for (size_t i = 0; i < nbEdges; ++i) {
-      if (bFlags[i]) {
-        numFlagged++;
-        tag++;
-        RecursiveEdgeSwap(i, tag, visited, edgeSwapStack, scratchBuffer);
-        while (!edgeSwapStack.empty()) {
-          int last = edgeSwapStack.back();
-          edgeSwapStack.pop_back();
-          RecursiveEdgeSwap(last, tag, visited, edgeSwapStack, scratchBuffer);
+    if (count_if(policy, countAt(0_uz), countAt(nbEdges),
+                 [&](size_t i) { return (bFlags[i] = se(i)); }) > 0) {
+      std::vector<int> edgeSwapStack;
+      std::vector<int> visited(halfedge_.size(), -1);
+      int tag = 0;
+      for (size_t i = 0; i < nbEdges; ++i) {
+        if (bFlags[i]) {
+          numFlagged++;
+          tag++;
+          RecursiveEdgeSwap(i, tag, visited, edgeSwapStack, scratchBuffer);
+          while (!edgeSwapStack.empty()) {
+            int last = edgeSwapStack.back();
+            edgeSwapStack.pop_back();
+            RecursiveEdgeSwap(last, tag, visited, edgeSwapStack, scratchBuffer);
+          }
         }
       }
     }
diff --git a/src/impl.cpp b/src/impl.cpp
index 06345243..08cf22fa 100644
--- a/src/impl.cpp
+++ b/src/impl.cpp
@@ -432,28 +432,55 @@ void Manifold::Impl::CreateHalfedges(const Vec<ivec3>& triVerts) {
   // degenerate situations the triangulator can add the same internal edge in
   // two different faces, causing this edge to not be 2-manifold. These are
   // fixed by duplicating verts in SimplifyTopology.
-  stable_sort(ids.begin(), ids.end(), [&edge](const int& a, const int& b) {
-    return edge[a] < edge[b];
-  });
+  {
+    ZoneScopedN("CreateHalfedgesSort");
+    stable_sort(ids.begin(), ids.end(), [&edge](const int& a, const int& b) {
+      return edge[a] < edge[b];
+    });
+  }
 
   // Mark opposed triangles for removal
   const int numEdge = numHalfedge / 2;
-  for (int i = 0; i < numEdge; ++i) {
-    const int pair0 = ids[i];
-    Halfedge h0 = halfedge_[pair0];
-    int k = i + numEdge;
-    while (1) {
-      const int pair1 = ids[k];
-      Halfedge h1 = halfedge_[pair1];
-      if (h0.startVert != h1.endVert || h0.endVert != h1.startVert) break;
-      if (halfedge_[NextHalfedge(pair0)].endVert ==
-          halfedge_[NextHalfedge(pair1)].endVert) {
-        // Reorder so that remaining edges pair up
-        if (k != i + numEdge) std::swap(ids[i + numEdge], ids[k]);
-        break;
+
+  if (count_if(countAt(0), countAt(numEdge), [&](size_t i) {
+        const int pair0 = ids[i];
+        Halfedge h0 = halfedge_[pair0];
+        int k = i + numEdge;
+        while (1) {
+          const int pair1 = ids[k];
+          Halfedge h1 = halfedge_[pair1];
+          if (h0.startVert != h1.endVert || h0.endVert != h1.startVert) break;
+          if (halfedge_[NextHalfedge(pair0)].endVert ==
+              halfedge_[NextHalfedge(pair1)].endVert) {
+            // Reorder so that remaining edges pair up
+            if (k != i + numEdge) return true;
+            break;
+          }
+          ++k;
+          if (k >= numHalfedge) break;
+        }
+        return false;
+      }) > 0) {
+    // question: can we parallelize this? instead of just relying on count?
+    for (int i = 0; i < numEdge; ++i) {
+      const int pair0 = ids[i];
+      Halfedge h0 = halfedge_[pair0];
+      int k = i + numEdge;
+      while (1) {
+        const int pair1 = ids[k];
+        Halfedge h1 = halfedge_[pair1];
+        if (h0.startVert != h1.endVert || h0.endVert != h1.startVert) break;
+        if (halfedge_[NextHalfedge(pair0)].endVert ==
+            halfedge_[NextHalfedge(pair1)].endVert) {
+          // Reorder so that remaining edges pair up
+          if (k != i + numEdge) {
+            std::swap(ids[i + numEdge], ids[k]);
+          }
+          break;
+        }
+        ++k;
+        if (k >= numHalfedge) break;
       }
-      ++k;
-      if (k >= numHalfedge) break;
     }
   }
 
diff --git a/src/impl.h b/src/impl.h
index 5b7845bb..5f08cc28 100644
--- a/src/impl.h
+++ b/src/impl.h
@@ -204,8 +204,6 @@ struct Manifold::Impl {
     }
     SetEpsilon(-1, std::is_same<Precision, float>::value);
 
-    SplitPinchedVerts();
-
     CalculateNormals();
 
     if (meshGL.runOriginalID.empty()) {

@howardt
Copy link
Contributor Author

howardt commented Jan 28, 2025

Thanks for already making a nice improvement!

I understand that a lot of time is spent looking for and fixing/working around problems with the input. Is there a fast check that such problems exist, so that a faster code path can be used if they don't?

@pca006132
Copy link
Collaborator

That is worth looking into, ideally to skip some sorting. Not sure how simple that will be, things like duplicate check/removal do require sorting/hashmap and is expensive.

@howardt
Copy link
Contributor Author

howardt commented Jan 28, 2025

This is probably leading the API into directions you don't want to go, but it would be possible to let the MeshGL assert certain properties that mean you don't have to do some sorting. Then we could check those properties (and cache them) on our end; or, if we know the mesh came from a builtin primitive that hasn't be edited, we could just assert those properties.

@pca006132
Copy link
Collaborator

Ideally, we still want to see how much we can optimize the mesh simplification process first, because that is also run after each boolean operation. And then we can think about bypassing them in some cases when the user is sure.

@elalish
Copy link
Owner

elalish commented Jan 29, 2025

The sorting time is tricky because that's part of what helps the Boolean go so fast (cache coherence). I was also banking on the idea that sort algorithms are pretty close to no-op on already-sorted data, but that may be wishful thinking. But yes, there are definitely significant improvements to be made here - I'm especially working on the simplification-related ones.

Still, if you have a chain of operations to do, it is definitely best to keep them all inside Manifold; if you're passing every intermediate in and out through MeshGL, then you're definitely not on the fast path.

@howardt
Copy link
Contributor Author

howardt commented Jan 29, 2025

Thanks for your engagement on this problem.

There two use cases that are popular with Blender users that I expect will be improved with a Manifold boolean solver option:

  1. Sculpting. The mesh being sculpted is typically huge (many millions of triangles), and typically manifold. Things like "Trim" operations are implemented using boolean operators on the mesh being sculpted.

  2. "Hard body modeling", where typically one does a lot of the work by starting with primitives and adding more and more boolean operations with other primitives. These meshes are typically not very large, but there may be exceptions to that. There are three sub-ways of doing this methodolgy. (a) interactively, one boolean at a time; (b) build a stack of boolean "modifiers", perhaps with the occasional modifier of a different type in between; (c) make a node network using "geometry nodes", one of which is a "mesh boolean" node.

For case (1), the booleans are done one at a time, and need to feel "interactive". This is the case where performance matters the most -- ideally, the entire boolean operation will happen within several hundred milliseconds, even when the sculpt mesh is huge.

For case (2), the approach you recommend is possible (for cases b and c), and is on my future work list. Probably I would only do it for geometry nodes, as they are the new hotness, and modifiers are on the way out. Within geometry nodes, there may be a way to provide the Manifold conversion of "Geometry" as an alternative to the Mesh form, and as an alternative output of the Boolean geometry node. Then Manifold's deferred calculation of Boolean outputs might automatically give us the optimized evaluation of a whole expression dag of booleans.

P.S., a possible third use case for Manifold is SDF modeling. There is a nascent interest in that modality within the Blender community.

@hrgdavor
Copy link

hrgdavor commented Jan 29, 2025

I am chiming in as I was advocating for something like mentioned 1. Sculpting.

here: #1012

@pca006132
Copy link
Collaborator

I think I managed to speed up edge deduplication pretty significantly by avoiding the sort entry and use a simple 32-bit integer instead, as well as some parallelization. I also have a plan about how to speed up other parts of mesh simplification. Will open a PR for that later when that is done. Basically, just use some thread local buffer to store the edges that require cleanup, and avoid the sequential loop over all edges. Those thread local buffers are already sorted, so we can later iterate over the list of edges in increasing order in a way similar to merge sort but without actually creating a new buffer. The edge collapse is still run sequentially, but when the number of edges requiring collapse is small (the usual case), this can be much faster.

For SplitPinchedVerts, I'm thinking if there is an easy way to speed it up and make it parallel. If we can guarantee that halfedges around a vertex are chained in ascending order (except the last edge is chained back to the first), we can reduce the check to only the last edge: Only check when the next halfedge is smaller, and split the vertex when it was checked before. Because the halfedge update is local and only the last halfedge in the set of halfedges can cause the update, this is parallelizable (except the duplication part, which is quick). Not sure if we have this guarantee. @elalish what do you think?


I think for cases like sculpting, feeding the entire model for boolean is unlikely the optimal approach. It will be much faster if you can somehow split the mesh into voxels and only perform operations locally, and union everything at the end when needed. For organic meshes, this can probably reduce the complexity a lot. @stephomi may have something to say about this, considering he works on a sculpting app.

@pca006132
Copy link
Collaborator

pca006132 commented Jan 29, 2025

For face sorting, maybe we can combine several permute calls into one. Not sure if we already saturated the memory bandwidth there or not. Should worth a try. It is not faster, but I found that we were filling vectors when resizing, avoiding that gives some speedup.

@howardt
Copy link
Contributor Author

howardt commented Jan 29, 2025

Great to hear more progress. The method for dealing with edge cleanup sounds similar to what Blender uses to parallelize the computation of shared edges given a soup of faces (uses thread local storage and hashing to put various subsets into buckets that can be handled in parallel).

@stephomi
Copy link
Contributor

stephomi commented Jan 29, 2025

Not sure if it’s on purpose but cleanupTriangles is used even if MANIFOLD_DEBUG is off
The comment is probably outdated: https://github.com/elalish/manifold/blob/master/include/manifold/common.h#L573

It’s typically an option that you want disabled in my opinion, you use Boolean to keep the topology intact.
When I accidentally released a version with the option ON, artist went crazy “Boolean’s broken”.


My experience is different than @howardt, as my meshes can be “very” not manifold, for example if you use dyntopo (real-time tesseleation).

  • meshes can be opened (I run an hole filling algorithm, filled area/faces are tagged and removed after the Boolean pass)
  • potential non manifold edges (need to split some edges)
  • more rarely coplanar faces, duplicate stuffs, etc (deleting stuffs)
    I run an extensive clean up pass, it’s not 100% foolproof but it does the job most of the time.

It will be much faster if you can somehow split the mesh into voxels and only perform operations locally, and union everything at the end when needed.

Entire point of manifold is to keep the base topology intact.
I do have a voxel remesher that can do Boolean (like most sculpt app, the Boolean part being easy due to voxels) as well but it’s a lossy operation:

  • base topology is lost (no UVs, etc)
  • sharp intersection after the Boolean requires the use of quadric, which is a bit finicky to handle
    For Boolean, if your inputs are polygonal, the main advantage of voxel is you can feed anything and it’s guaranteed to give you a manifold mesh (you still need a hole-filling algorithm, but other than that the constraints are much weaker than the strict manifold)

An often requested feature is non-destructive boolean, but it’s something I need to do on my end.
Zbrush does real time preview through a render-only csg thing (also can fake csg-Boolean with stencil/cull, but it’s a bit of a pain). Blender runs the Boolean algorithm after each edit.
I would probably do it at some point, similarly to blender.


But if I were to make a wish list, it would be:

  1. Boolean cancellation, manifold is fast enough but a few exit point could be nice (outside loops is fine)
  2. handle self-overlaps. In Nomad if you run a manifold on a single mesh, it will split the mesh if some components are already separated or disconnected. But for single patch mesh I’m out of luck, unsurprisingly it seems too be a complex problem (Overlap removal #289)
  3. probably too specific but a quad edge loop cleanup on the intersected would be super handy
  • you fix smooth shading
  • catmull subdivision doesn’t break

(I run a quad remesher on the right, but the only interested parts are the edges)
Image

@howardt
Copy link
Contributor Author

howardt commented Jan 29, 2025

Thanks stephomi for your notes. I am not actively involved in Blender's sculpt code, but a brief conversation with the developers who are left me with the impression that keeping the mesh manifold would be possible if that's not what they already do. I may have misunderstood or been misled. But still, for reasons you state, we need to do a boolean operation in sculpt sometimes on the faces, not the voxels.

I second your wish list of cancellation and dealing with self intersection. I myself wrote an exact-arithmetic boolean solver (which is an option in Blender) that deals properly with self intersection and has fallbacks for non-manifold meshes. Sadly, it can often be too slow, which is why I'm pursing this Manifold option. But I will be unable to dump the exact solver because it handles cases that Manifold does not now (and does a better job of dealing with exact coplanar faces).

For applications like Sculpt, I fantasize about an approach that roughly identifies the "intersection volume" of the boolean, and then grossly simplifies the rest of the argument meshes, does the boolean on that, and then stitches together the results afterwards. It would seem to cut the number of faces involved by a factor n if the original sphere has O(n^3) faces, in cases like two huge dense intersecting spheres. So that is on my wish list too. The Ember solver, which I spent a lot of time implementing, has a recursive space division part of its algorithm that at least avoids processing large parts of the input meshes if they are outside of the intersection volume. (But I abandoned implementing that for a number of reasons, once I discovered Manifold.)

I also think I don't want simplifyTriangles.

@pca006132
Copy link
Collaborator

  1. SimplifyTriangles: I think the purpose is to simplify the degenerate triangles that the boolean operation tends to generate, but maybe we should tweak it to only simplify the newly generated edges. I think this is in our wishlist but it is kind of hard to do.
  2. Cancellation should be doable, maybe with a function in ManifoldOptions that is set globally, or a flag.
  3. The "voxel like" thing I was talking about is like what @howardt is saying, i.e. roughly identify the intersecting region, do boolean on it and then stitch the mesh back. That should still be exact. Not sure how hard that will be though.
  4. @howardt Maybe you can expand a bit on the handling of exact coplanar faces? I think the SimplifyTriangle code should collapse near coplanar faces, but you probably disabled it or something.

@elalish
Copy link
Owner

elalish commented Jan 30, 2025

1 - totally agree. It's a necessary step, but only on the edges the Boolean generated. This feels like low hanging fruit to me, and should increase performance too. Shouldn't be hard since before we re-sort, all the newly cut triangles are at the end of the list.
2 - yeah, cancellation would be cool.
4 - we handle exact axis aligned coplanar faces pretty well, but we only take floating point input, so lots of exactness tends to get lost on the way in. Since the 3D formats all lose exactness anyway I think it's pretty hard to make any practical guarantees when it comes to nonaligned coplanar faces.

@pca006132
Copy link
Collaborator

@elalish what do you think about this?

For SplitPinchedVerts, I'm thinking if there is an easy way to speed it up and make it parallel. If we can guarantee that halfedges around a vertex are chained in ascending order (except the last edge is chained back to the first), we can reduce the check to only the last edge: Only check when the next halfedge is smaller, and split the vertex when it was checked before. Because the halfedge update is local and only the last halfedge in the set of halfedges can cause the update, this is parallelizable (except the duplication part, which is quick). Not sure if we have this guarantee.

@howardt
Copy link
Contributor Author

howardt commented Jan 30, 2025

@howardt Maybe you can expand a bit on the handling of exact coplanar faces? I think the SimplifyTriangle code should collapse near coplanar faces, but you probably disabled it or something.

Hmm, I know I ran across some cases where some really small edges were created, and others where unexpected topology was created, but I should have kept track of those because I can't reproduce them right now. I tried a bunch of simple cases of small cube exactly on top of a larger cube, in various degrees of alignment with the edges of the bottom cube, and Manifold did exactly as I would expect and want in each of those cases. Which makes me feel better about possibly getting rid of my exact arithmetic solver. But to really get rid of it, I think I would need a replacement functionality for dealing with self-intersections. Is that just too hard to fit into the Manifold algorithm?

@pca006132
Copy link
Collaborator

An algorithm that deals with self-intersection is in our roadmap. In fact, we listed it in our potential GSoC project list :)

@elalish
Copy link
Owner

elalish commented Feb 1, 2025

To be clear, our algorithms handle self-intersected meshes - you'll get a manifold result. Often it will even be what you expect, just not always. The triangulator has the most trouble with it, but many classes of self intersected manifolds don't result in ugly self-intersecting polygons, in which case it's fine. And in the other cases, I'd best most booleans would also give odd results.

As for splitting pinched verts - they're very unlikely. Even one is rare, two very rare. So I'm not sure parallelization will help that much. Right now I believe I parallelize searching for them, because I think that's where all the time is spent. Might be better to focus on the few places where they can get introduced and only try to find and fix them at those locations, instead of the whole mesh.

@howardt
Copy link
Contributor Author

howardt commented Feb 2, 2025

Oh, good to hear that the current algorithms may give people what they expect when they have self-intersecting meshes. I hadn't actually tried, but was relying on other comments above. I'll try it and see.

On the idea of switching my code to use Manifold's property propagation and interpolation instead of my own, I spent yesterday writing a standalone test that tried making what Blender calls a "UV sphere" of various sizes and with real UV properties that needed vertex splitting. Then I timed the Manifold constructor from MeshGL. I found the results very anomalous and hard to explain. Here they are:

manifold construction times for sphere
nsegs = number of circular segments in sphere
nuvs = number of uv properties (2 floats per uv property)
nvpr = number of other vertex properties (1 float each)

nsegs nuvs nvpr manifold construction time in ms
  256    0   10   19
  256    1   10   15
 1024    0   10  204
 1024    1   10  187
 2048    0   10  842
 2048    1   10  741

What is going on where having 0 uv properties makes the constructor take so much more time than having 1? And here are the times with 0 extra vertex properties:

nsegs nuvs nvpr manifold construction time in ms
  256    0    0   18
  256    1    0   15
 1024    0    0  181
 1024    1    0  174
 2048    0    0  753
 2048    1    0  735

where you can see that the times are very close to those with the 0-uv property case.

I attach my code so you can try yourself; or perhaps find where I made some silly error. But this does match the one test I tried before where I only added vertex properties, with no splits, and got an unreasonable increase in construction time -- which was why I was avoiding going down that path.

man_props.cpp.txt

@pca006132
Copy link
Collaborator

I tried this with #1139:

  256    0    0   15
  256    0   10   14
  256    1    0   12
  256    1   10   13
 1024    0    0  181
 1024    0   10  199
 1024    1    0  188
 1024    1   10  202
 2048    0    0  878
 2048    0   10  982
 2048    1    0  940
 2048    1   10  970

With current master:

  256    0    0   16
  256    0   10   15
  256    1    0   14
  256    1   10   15
 1024    0    0  223
 1024    0   10  242
 1024    1    0  229
 1024    1   10  246
 2048    0    0 1084
 2048    0   10 1178
 2048    1    0 1104
 2048    1   10 1129

I don't think the time difference is that big on my machine, perhaps some allocator or parallelization heuristic issue? We choose when to parallelize based on a simple threshold, which may cause this kind of increasing workload leading to faster runtime. I don't think it is important enough to deserve looking into...

@howardt
Copy link
Contributor Author

howardt commented Feb 2, 2025

Thanks for checking. My tests were on a Macbook Pro M3 Max, which has 12 performance cores at 4 efficiency cores. Maybe that can cause some anomalies with TBB scheduling.

@pca006132
Copy link
Collaborator

#1148 is further improving the performance for mesh fixup, so CleaupTopology, which SimplifyTopology calls, will be faster.

Looking at the code, I think I also know how to parallelize CreateFaces and remove the sort. The idea is to split the work into thread local computation, and later merge them while keeping the priority (area) in mind. It is non-trivial so it will take some time.

For CreateHalfedges, I think I need to think about what the sequential loop is doing and see how can I parallelize it.

@pca006132
Copy link
Collaborator

Looking at it more closely, CreateFaces cannot be parallelized without changing the behavior. But thinking about it, I am not sure if the current approach is optimal either, I think the current approach of merging faces that are close to coplanar tends to create fish scale like result. Need more time to think about possible ways to parallelize it without causing the decimation to degrade too much.

@elalish
Copy link
Owner

elalish commented Feb 20, 2025

Yeah, it's a greedy algorithm and fish scales are definitely a result of that. I have no problem with changing the behavior - the key is that it has some kind of global guarantee - currently the surface is guaranteed not to shift by more than tolerance. The problem with my previous approach was that the errors could stack indefinitely.

@elalish elalish linked a pull request Mar 9, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants