-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of MeshGL -> Manifold construction #1138
Comments
Do you have
I think it is probably quite hard to speed it up, because we do a lot of preprocessing. If you can give us the mesh, I can take a look at whether it is not using as many cores as it should, for example. |
OK one thing I see: We are doing Lines 203 to 213 in ac82767
|
I checked the both of these don't trigger an error in the code, so I think I have the non-debug, parallel settings. #ifdef MANIFOLD_DEBUG #if MANIFOLD_PAR != 1 Here is a .obj version of one of the spheres. I notice that our exporter is only writing 6 digits of precision for the coords, which means you may not be getting precisely the same floats as I'm passing through the API. (I don't have MANIFOLD_EXPORT enabled, since I didn't want that dependency in our code.) |
I managed to improve the timing with parallelization with some of the functions. The issue is that it is specific to meshes that are already quite nice, which doesn't require much simplification. Also, it seems that most of our time is spent on sorting various intermediate data structures (at least after these changes), not sure if there is a way around this. I can also reproduce the slowness with vertex properties, can look into that later. Code I am testing with: const char* name = "test setup";
FrameMarkStart(name);
Manifold sphere = Manifold::Sphere(1, (8 << 7) * 4);
sphere = sphere.CalculateNormals(0);
auto mesh = sphere.GetMeshGL(0);
{
ZoneScopedN("Import");
Manifold m(mesh);
}
FrameMarkEnd(name); Import took 1.93s, and it takes 1.6s after applying the patch below. diff --git a/src/edge_op.cpp b/src/edge_op.cpp
index d22577a2..603602f9 100644
--- a/src/edge_op.cpp
+++ b/src/edge_op.cpp
@@ -68,12 +68,20 @@ struct FlagEdge {
// two original triangles.
const TriRef ref0 = triRef[edge / 3];
int current = NextHalfedge(halfedge[edge].pairedHalfedge);
- const TriRef ref1 = triRef[current / 3];
+ TriRef ref1 = triRef[current / 3];
+ bool ref1Updated = !ref0.SameFace(ref1);
while (current != edge) {
current = NextHalfedge(halfedge[current].pairedHalfedge);
int tri = current / 3;
const TriRef ref = triRef[tri];
- if (!ref.SameFace(ref0) && !ref.SameFace(ref1)) return false;
+ if (!ref.SameFace(ref0) && !ref.SameFace(ref1)) {
+ if (!ref1Updated) {
+ ref1 = ref;
+ ref1Updated = true;
+ } else {
+ return false;
+ }
+ }
}
return true;
}
@@ -132,22 +140,56 @@ void Manifold::Impl::CleanupTopology() {
// verts. They must be removed before edge collapse.
SplitPinchedVerts();
+ Vec<SortEntry> entries;
+ Vec<int> indices;
while (1) {
ZoneScopedN("DedupeEdge");
const size_t nbEdges = halfedge_.size();
size_t numFlagged = 0;
- Vec<SortEntry> entries;
- entries.reserve(nbEdges / 2);
- for (size_t i = 0; i < nbEdges; ++i) {
- if (halfedge_[i].IsForward()) {
- entries.push_back({halfedge_[i].startVert, halfedge_[i].endVert, i});
+ if (nbEdges > 1e5) {
+ ZoneScopedN("DedupeEdgeInit");
+ auto f = [&](size_t i) { return halfedge_[i].IsForward() ? 1 : 0; };
+ indices.resize(nbEdges, 0);
+ entries.resize(nbEdges / 2);
+ exclusive_scan(TransformIterator(countAt(0), f),
+ TransformIterator(countAt(nbEdges), f), indices.begin());
+ for_each(ExecutionPolicy::Par, countAt(0), countAt(nbEdges),
+ [&](size_t i) {
+ if (halfedge_[i].IsForward()) {
+ entries[indices[i]] = {halfedge_[i].startVert,
+ halfedge_[i].endVert, i};
+ }
+ });
+ } else {
+ entries.clear(false);
+ entries.reserve(nbEdges / 2);
+ for (size_t i = 0; i < nbEdges; ++i) {
+ if (halfedge_[i].IsForward()) {
+ entries.push_back({halfedge_[i].startVert, halfedge_[i].endVert, i});
+ }
}
}
- stable_sort(entries.begin(), entries.end());
- for (size_t i = 0; i < entries.size() - 1; ++i) {
+ {
+ ZoneScopedN("DedupeEdgeSort");
+ stable_sort(entries.begin(), entries.end());
+ }
+ std::vector<size_t> duplicates;
+ std::mutex m;
+ for_each(autoPolicy(entries.size() - 1), countAt(0),
+ countAt(entries.size() - 1), [&](size_t i) {
+ const int h0 = entries[i].index;
+ const int h1 = entries[i + 1].index;
+ if (halfedge_[h0].startVert == halfedge_[h1].startVert &&
+ halfedge_[h0].endVert == halfedge_[h1].endVert) {
+ m.lock();
+ duplicates.push_back(i);
+ m.unlock();
+ }
+ });
+ for (auto i : duplicates) {
const int h0 = entries[i].index;
const int h1 = entries[i + 1].index;
if (halfedge_[h0].startVert == halfedge_[h1].startVert &&
@@ -156,7 +198,6 @@ void Manifold::Impl::CleanupTopology() {
numFlagged++;
}
}
-
if (numFlagged == 0) break;
#ifdef MANIFOLD_DEBUG
@@ -207,13 +248,14 @@ void Manifold::Impl::SimplifyTopology() {
ZoneScopedN("CollapseShortEdge");
numFlagged = 0;
ShortEdge se{halfedge_, vertPos_, epsilon_};
- for_each_n(policy, countAt(0_uz), nbEdges,
- [&](size_t i) { bFlags[i] = se(i); });
- for (size_t i = 0; i < nbEdges; ++i) {
- if (bFlags[i]) {
- CollapseEdge(i, scratchBuffer);
- scratchBuffer.resize(0);
- numFlagged++;
+ if (count_if(policy, countAt(0_uz), countAt(nbEdges),
+ [&](size_t i) { return (bFlags[i] = se(i)); }) > 0) {
+ for (size_t i = 0; i < nbEdges; ++i) {
+ if (bFlags[i]) {
+ CollapseEdge(i, scratchBuffer);
+ scratchBuffer.resize(0);
+ numFlagged++;
+ }
}
}
}
@@ -229,42 +271,44 @@ void Manifold::Impl::SimplifyTopology() {
ZoneScopedN("CollapseFlaggedEdge");
numFlagged = 0;
FlagEdge se{halfedge_, meshRelation_.triRef};
- for_each_n(policy, countAt(0_uz), nbEdges,
- [&](size_t i) { bFlags[i] = se(i); });
- for (size_t i = 0; i < nbEdges; ++i) {
- if (bFlags[i]) {
- CollapseEdge(i, scratchBuffer);
- scratchBuffer.resize(0);
- numFlagged++;
+ if (count_if(policy, countAt(0_uz), countAt(nbEdges),
+ [&](size_t i) { return (bFlags[i] = se(i)); }) > 0) {
+ for (size_t i = 0; i < nbEdges; ++i) {
+ if (bFlags[i]) {
+ CollapseEdge(i, scratchBuffer);
+ scratchBuffer.resize(0);
+ numFlagged++;
+ }
}
}
}
-#ifdef MANIFOLD_DEBUG
+// #ifdef MANIFOLD_DEBUG
if (ManifoldParams().verbose && numFlagged > 0) {
std::cout << "found " << numFlagged << " colinear edges to collapse"
<< std::endl;
}
-#endif
+// #endif
{
ZoneScopedN("RecursiveEdgeSwap");
numFlagged = 0;
SwappableEdge se{halfedge_, vertPos_, faceNormal_, tolerance_};
- for_each_n(policy, countAt(0_uz), nbEdges,
- [&](size_t i) { bFlags[i] = se(i); });
- std::vector<int> edgeSwapStack;
- std::vector<int> visited(halfedge_.size(), -1);
- int tag = 0;
- for (size_t i = 0; i < nbEdges; ++i) {
- if (bFlags[i]) {
- numFlagged++;
- tag++;
- RecursiveEdgeSwap(i, tag, visited, edgeSwapStack, scratchBuffer);
- while (!edgeSwapStack.empty()) {
- int last = edgeSwapStack.back();
- edgeSwapStack.pop_back();
- RecursiveEdgeSwap(last, tag, visited, edgeSwapStack, scratchBuffer);
+ if (count_if(policy, countAt(0_uz), countAt(nbEdges),
+ [&](size_t i) { return (bFlags[i] = se(i)); }) > 0) {
+ std::vector<int> edgeSwapStack;
+ std::vector<int> visited(halfedge_.size(), -1);
+ int tag = 0;
+ for (size_t i = 0; i < nbEdges; ++i) {
+ if (bFlags[i]) {
+ numFlagged++;
+ tag++;
+ RecursiveEdgeSwap(i, tag, visited, edgeSwapStack, scratchBuffer);
+ while (!edgeSwapStack.empty()) {
+ int last = edgeSwapStack.back();
+ edgeSwapStack.pop_back();
+ RecursiveEdgeSwap(last, tag, visited, edgeSwapStack, scratchBuffer);
+ }
}
}
}
diff --git a/src/impl.cpp b/src/impl.cpp
index 06345243..08cf22fa 100644
--- a/src/impl.cpp
+++ b/src/impl.cpp
@@ -432,28 +432,55 @@ void Manifold::Impl::CreateHalfedges(const Vec<ivec3>& triVerts) {
// degenerate situations the triangulator can add the same internal edge in
// two different faces, causing this edge to not be 2-manifold. These are
// fixed by duplicating verts in SimplifyTopology.
- stable_sort(ids.begin(), ids.end(), [&edge](const int& a, const int& b) {
- return edge[a] < edge[b];
- });
+ {
+ ZoneScopedN("CreateHalfedgesSort");
+ stable_sort(ids.begin(), ids.end(), [&edge](const int& a, const int& b) {
+ return edge[a] < edge[b];
+ });
+ }
// Mark opposed triangles for removal
const int numEdge = numHalfedge / 2;
- for (int i = 0; i < numEdge; ++i) {
- const int pair0 = ids[i];
- Halfedge h0 = halfedge_[pair0];
- int k = i + numEdge;
- while (1) {
- const int pair1 = ids[k];
- Halfedge h1 = halfedge_[pair1];
- if (h0.startVert != h1.endVert || h0.endVert != h1.startVert) break;
- if (halfedge_[NextHalfedge(pair0)].endVert ==
- halfedge_[NextHalfedge(pair1)].endVert) {
- // Reorder so that remaining edges pair up
- if (k != i + numEdge) std::swap(ids[i + numEdge], ids[k]);
- break;
+
+ if (count_if(countAt(0), countAt(numEdge), [&](size_t i) {
+ const int pair0 = ids[i];
+ Halfedge h0 = halfedge_[pair0];
+ int k = i + numEdge;
+ while (1) {
+ const int pair1 = ids[k];
+ Halfedge h1 = halfedge_[pair1];
+ if (h0.startVert != h1.endVert || h0.endVert != h1.startVert) break;
+ if (halfedge_[NextHalfedge(pair0)].endVert ==
+ halfedge_[NextHalfedge(pair1)].endVert) {
+ // Reorder so that remaining edges pair up
+ if (k != i + numEdge) return true;
+ break;
+ }
+ ++k;
+ if (k >= numHalfedge) break;
+ }
+ return false;
+ }) > 0) {
+ // question: can we parallelize this? instead of just relying on count?
+ for (int i = 0; i < numEdge; ++i) {
+ const int pair0 = ids[i];
+ Halfedge h0 = halfedge_[pair0];
+ int k = i + numEdge;
+ while (1) {
+ const int pair1 = ids[k];
+ Halfedge h1 = halfedge_[pair1];
+ if (h0.startVert != h1.endVert || h0.endVert != h1.startVert) break;
+ if (halfedge_[NextHalfedge(pair0)].endVert ==
+ halfedge_[NextHalfedge(pair1)].endVert) {
+ // Reorder so that remaining edges pair up
+ if (k != i + numEdge) {
+ std::swap(ids[i + numEdge], ids[k]);
+ }
+ break;
+ }
+ ++k;
+ if (k >= numHalfedge) break;
}
- ++k;
- if (k >= numHalfedge) break;
}
}
diff --git a/src/impl.h b/src/impl.h
index 5b7845bb..5f08cc28 100644
--- a/src/impl.h
+++ b/src/impl.h
@@ -204,8 +204,6 @@ struct Manifold::Impl {
}
SetEpsilon(-1, std::is_same<Precision, float>::value);
- SplitPinchedVerts();
-
CalculateNormals();
if (meshGL.runOriginalID.empty()) {
|
Thanks for already making a nice improvement! I understand that a lot of time is spent looking for and fixing/working around problems with the input. Is there a fast check that such problems exist, so that a faster code path can be used if they don't? |
That is worth looking into, ideally to skip some sorting. Not sure how simple that will be, things like duplicate check/removal do require sorting/hashmap and is expensive. |
This is probably leading the API into directions you don't want to go, but it would be possible to let the MeshGL assert certain properties that mean you don't have to do some sorting. Then we could check those properties (and cache them) on our end; or, if we know the mesh came from a builtin primitive that hasn't be edited, we could just assert those properties. |
Ideally, we still want to see how much we can optimize the mesh simplification process first, because that is also run after each boolean operation. And then we can think about bypassing them in some cases when the user is sure. |
The sorting time is tricky because that's part of what helps the Boolean go so fast (cache coherence). I was also banking on the idea that sort algorithms are pretty close to no-op on already-sorted data, but that may be wishful thinking. But yes, there are definitely significant improvements to be made here - I'm especially working on the simplification-related ones. Still, if you have a chain of operations to do, it is definitely best to keep them all inside Manifold; if you're passing every intermediate in and out through |
Thanks for your engagement on this problem. There two use cases that are popular with Blender users that I expect will be improved with a Manifold boolean solver option:
For case (1), the booleans are done one at a time, and need to feel "interactive". This is the case where performance matters the most -- ideally, the entire boolean operation will happen within several hundred milliseconds, even when the sculpt mesh is huge. For case (2), the approach you recommend is possible (for cases b and c), and is on my future work list. Probably I would only do it for geometry nodes, as they are the new hotness, and modifiers are on the way out. Within geometry nodes, there may be a way to provide the Manifold conversion of "Geometry" as an alternative to the Mesh form, and as an alternative output of the Boolean geometry node. Then Manifold's deferred calculation of Boolean outputs might automatically give us the optimized evaluation of a whole expression dag of booleans. P.S., a possible third use case for Manifold is SDF modeling. There is a nascent interest in that modality within the Blender community. |
I am chiming in as I was advocating for something like mentioned here: #1012 |
I think I managed to speed up edge deduplication pretty significantly by avoiding the sort entry and use a simple 32-bit integer instead, as well as some parallelization. I also have a plan about how to speed up other parts of mesh simplification. Will open a PR for that later when that is done. Basically, just use some thread local buffer to store the edges that require cleanup, and avoid the sequential loop over all edges. Those thread local buffers are already sorted, so we can later iterate over the list of edges in increasing order in a way similar to merge sort but without actually creating a new buffer. The edge collapse is still run sequentially, but when the number of edges requiring collapse is small (the usual case), this can be much faster. For I think for cases like sculpting, feeding the entire model for boolean is unlikely the optimal approach. It will be much faster if you can somehow split the mesh into voxels and only perform operations locally, and union everything at the end when needed. For organic meshes, this can probably reduce the complexity a lot. @stephomi may have something to say about this, considering he works on a sculpting app. |
|
Great to hear more progress. The method for dealing with edge cleanup sounds similar to what Blender uses to parallelize the computation of shared edges given a soup of faces (uses thread local storage and hashing to put various subsets into buckets that can be handled in parallel). |
Not sure if it’s on purpose but cleanupTriangles is used even if MANIFOLD_DEBUG is off It’s typically an option that you want disabled in my opinion, you use Boolean to keep the topology intact. My experience is different than @howardt, as my meshes can be “very” not manifold, for example if you use dyntopo (real-time tesseleation).
Entire point of manifold is to keep the base topology intact.
An often requested feature is non-destructive boolean, but it’s something I need to do on my end. But if I were to make a wish list, it would be:
(I run a quad remesher on the right, but the only interested parts are the edges) |
Thanks stephomi for your notes. I am not actively involved in Blender's sculpt code, but a brief conversation with the developers who are left me with the impression that keeping the mesh manifold would be possible if that's not what they already do. I may have misunderstood or been misled. But still, for reasons you state, we need to do a boolean operation in sculpt sometimes on the faces, not the voxels. I second your wish list of cancellation and dealing with self intersection. I myself wrote an exact-arithmetic boolean solver (which is an option in Blender) that deals properly with self intersection and has fallbacks for non-manifold meshes. Sadly, it can often be too slow, which is why I'm pursing this Manifold option. But I will be unable to dump the exact solver because it handles cases that Manifold does not now (and does a better job of dealing with exact coplanar faces). For applications like Sculpt, I fantasize about an approach that roughly identifies the "intersection volume" of the boolean, and then grossly simplifies the rest of the argument meshes, does the boolean on that, and then stitches together the results afterwards. It would seem to cut the number of faces involved by a factor n if the original sphere has O(n^3) faces, in cases like two huge dense intersecting spheres. So that is on my wish list too. The Ember solver, which I spent a lot of time implementing, has a recursive space division part of its algorithm that at least avoids processing large parts of the input meshes if they are outside of the intersection volume. (But I abandoned implementing that for a number of reasons, once I discovered Manifold.) I also think I don't want simplifyTriangles. |
|
1 - totally agree. It's a necessary step, but only on the edges the Boolean generated. This feels like low hanging fruit to me, and should increase performance too. Shouldn't be hard since before we re-sort, all the newly cut triangles are at the end of the list. |
@elalish what do you think about this?
|
Hmm, I know I ran across some cases where some really small edges were created, and others where unexpected topology was created, but I should have kept track of those because I can't reproduce them right now. I tried a bunch of simple cases of small cube exactly on top of a larger cube, in various degrees of alignment with the edges of the bottom cube, and Manifold did exactly as I would expect and want in each of those cases. Which makes me feel better about possibly getting rid of my exact arithmetic solver. But to really get rid of it, I think I would need a replacement functionality for dealing with self-intersections. Is that just too hard to fit into the Manifold algorithm? |
An algorithm that deals with self-intersection is in our roadmap. In fact, we listed it in our potential GSoC project list :) |
To be clear, our algorithms handle self-intersected meshes - you'll get a manifold result. Often it will even be what you expect, just not always. The triangulator has the most trouble with it, but many classes of self intersected manifolds don't result in ugly self-intersecting polygons, in which case it's fine. And in the other cases, I'd best most booleans would also give odd results. As for splitting pinched verts - they're very unlikely. Even one is rare, two very rare. So I'm not sure parallelization will help that much. Right now I believe I parallelize searching for them, because I think that's where all the time is spent. Might be better to focus on the few places where they can get introduced and only try to find and fix them at those locations, instead of the whole mesh. |
Oh, good to hear that the current algorithms may give people what they expect when they have self-intersecting meshes. I hadn't actually tried, but was relying on other comments above. I'll try it and see. On the idea of switching my code to use Manifold's property propagation and interpolation instead of my own, I spent yesterday writing a standalone test that tried making what Blender calls a "UV sphere" of various sizes and with real UV properties that needed vertex splitting. Then I timed the Manifold constructor from MeshGL. I found the results very anomalous and hard to explain. Here they are: manifold construction times for sphere
What is going on where having 0 uv properties makes the constructor take so much more time than having 1? And here are the times with 0 extra vertex properties:
where you can see that the times are very close to those with the 0-uv property case. I attach my code so you can try yourself; or perhaps find where I made some silly error. But this does match the one test I tried before where I only added vertex properties, with no splits, and got an unreasonable increase in construction time -- which was why I was avoiding going down that path. |
I tried this with #1139:
With current master:
I don't think the time difference is that big on my machine, perhaps some allocator or parallelization heuristic issue? We choose when to parallelize based on a simple threshold, which may cause this kind of increasing workload leading to faster runtime. I don't think it is important enough to deserve looking into... |
Thanks for checking. My tests were on a Macbook Pro M3 Max, which has 12 performance cores at 4 efficiency cores. Maybe that can cause some anomalies with TBB scheduling. |
#1148 is further improving the performance for mesh fixup, so Looking at the code, I think I also know how to parallelize For |
Looking at it more closely, |
Yeah, it's a greedy algorithm and fish scales are definitely a result of that. I have no problem with changing the behavior - the key is that it has some kind of global guarantee - currently the surface is guaranteed not to shift by more than tolerance. The problem with my previous approach was that the errors could stack indefinitely. |
It feels like there is room to improve the performance of the MeshGL -> Manifold construction.
When I construct two MeshGL's, each containing a sphere with 1,046,528 triangles, it takes 138.9 ms to construct an Impl from a MeshGL for each of them. The boolean itself (a difference of one sphere offset a bit from the other), and conversion back into MeshGL takes 242 ms. So it is a bit surprising to me that the time to convert 2 inputs into Impl's takes comparable time to the boolean operation itself.
The MeshGL's I construct have no vertex properties beyond the 3 for the coordinates. I set faceID, but don't populate runIndex and runOriginalID. There are no mergeFrom/mergeTo items populated. I put some timers into the Impl.h file around various phases of the contruction, and it printed this for doing one sphere:
Impl constructor: copy vert properties took 1.07067 ms
Impl constructor: set up triRef took 42 ns
Impl constructor: set up triVerts took 2.70829 ms
Impl constructor: CreateHalfEdges took 15.3821 ms
Impl constructor: IsManifold took 0.430167 ms
Impl constructor: CalculateBBox took 0.248125 ms
Impl constructor: SplitPinchedVerts took 22.2563 ms
Impl constructor: CalculateNormals took 3.83696 ms
Impl constructor: InitializeOriginal took 0.627458 ms
Impl constructor: CreateFaces took 22.3918 ms
Impl constructor: SimplifyTopology took 36.091 ms
Impl constructor: RemoveUnreferencedVerts took 0.460083 ms
Impl constructor: Finish took 32.6864 ms
Impl constructor from MeshGLP, total took 138.872 ms
These times are all on my Macbook Pro M3 Max : 16 cores (12 performance, 4 efficiency), 64GB memory.
The high running phases are
SimplifyTopology (36 ms)
Finish (33 ms)
CreateFaces (22 ms)
SplitPInchedVerts (22 ms)
CreateHalfEdges (15 ms)
Do these numbers look reasonable, or is there room to improve?
The text was updated successfully, but these errors were encountered: