-
-
Notifications
You must be signed in to change notification settings - Fork 22k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TileMap: Added "set_line" and "get_line" to batch update tiles #38324
Conversation
I don't think this really make sense. You say this solution allow you to have less gdscript loops to do, but you still have to fill your Vector in a gdscript loop too, for each line you want to modify. Also, this really look like a weird, too much specific API. Why a line ? Setting a Rect2i would make more sense. But that being said, I am not convinced that things brings a significant performance improvement over calling set_cell(). |
@groud Thx for having a look on my PR. In my usecase and I also assume this will be the same for others, who want to use Tilemap for a proc gen world, I design the world based on several logics (i. e. Simplex noise) and keep them in Arrays (or directly 2D-Arrays) -> reason is bad performance of set_cell and possibility of multi threading (Thread) with Arrays. But at the end I have to load all of this into the Tilemap and so far this is only possible by set_cell (which is already for a small 1000x1000 map noticeable). Regarding Rect2d I'm not sure how this would work in c++. |
Rect2i wouldn't work as here it's allowing to set a line with specific tile ID for each cell, while with Rect2i you could only batch-set a single ID to a region. I agree that Using a 2D array would be better than a line IMO, so it could be something like:
This would set all tiles from (4, 6) to (7, 8) (included) with the tile IDs from the 2D array. Indeed it doesn't make much sense if one needs to construct the arrays manually just for this, but in @zzz-assault's case where the data has already been generated into a 2D array, that makes sense, and otherwise it could also be ready from e.g. JSON files. It's still not perfect as it would still only allow setting a different tile ID but with the same flips and transpose for the whole region. An alternative would be to add a new system to handle (tile_pos, tile_id, flip_x, flip_y, transpose) for a whole region and set it at once. Heck, could use #37070 ;) |
I think we are kind of on the same page. I meant that we could have something like |
Directly applying a 2D array would even be better, if I pass an GD Script 2D array to the c++ method, would that work (and simply doing the double for loop in the c++ method)? I agree that the missing options for flip and transpose might be an issue, but this could be solved either by default values or an additional 2D array per required option. Problem might be, that the variant array of GD Script might contain wrong data and lead to unexpected errors (therefor I've choosen the PoolIntArray for my set_line PR). |
That being said, I am still not super convinced about how useful the PR is. If you already spend a lot of time generating the level I don't really see why it is a problem to spend a little bit more time updating the Tilemap. If you fear the screen might freeze, you could simply use another Tilemap node not in the scene tree, and update it before replacing the one in the tree. This could even me done in another thread for performance reasons. |
@groud I was able to perform all changes / operations with multi threading, so the performance of whole generation process incl. bitmask updating of a 1kx1k map is done in 3 secs ... the set_cell takes another 5 secs ... and 1kx1k is not the target size. But sure perhaps the use case is not as general, as the solution is basically useless for platformers, but in RTS or Eco Sims you need to have the full map persistent (example games = transport tycoon or age of empires). Anyhow thx for contributing and helping, I really appreaciate your ideas 😀 |
Hmm, this is kind of unexpected to be honest. I don't think updating the cells should take that long. Have you tried compare the performance with your PR ? How significant is the speed gain ? |
Haven't compared it yet, but I'm sure the issue is not the "set_cell" call, it's the required double loop (already use while loop for performance), with 1.000.000 iterations (due to 1000x1000 tiles). If I run anything in GD Script with a double loop and 1.000.000 iterations it takes a noticable time. Therfore the idea to move most iterations to the engine source code and c++. |
Well not really. In fact this amount of iteration is not specifically high for today processors. The overhead produced by gdscript is not that significant. Please compare the results, if the improvement is not significant, there no reason to add the feature. |
@groud I've done the test / comparison now. First peek to 100% is calculating the "map logic" based on arrays (with THREAD used). The downfall to 10 - 15% is "set_cell" / "set_line" (no THREAD used) and the second 100% peek is "bitmask" (with THREAD used). The performance difference is clearly visible, but to be honest it's less then expected :( But I wasn't able to use multi threading with set_cell (also if Tilemap was not added to Scene Tree), need to verify if this is still the case and if my "set_line" solution would allow this (then the performance gain would be more significant -> at least in my usecase) |
Just to give a bit of internal background to the rendering side 👍 I'm a big fan of doing such ranged instructions from gdscript (see godotengine/godot-proposals#290), and have been following the thread about the tilemap update slowness. Having done some profiling with @Xrayez project that was doing a lot of updates to visual server, I suspect there is a similar problem with tilemaps. So we have potentially 2 slow areas:
I have brought up an area with @clayjohn and @reduz which is currently a source of inefficiency, which is that the smallest unit sent to the renderer is a command for a single rect (or other primitive). My suggestion was to increase allow variable sizes for commands, because there is a lot of housekeeping for each commands, and things like tilemaps and text will contain a lot of similar rects which can be grouped together. Reduz suggested as a compromise, we add a new command multi_rect, which will be available in addition to the existing single commands. This will be a great fit for tilemaps and text. He also believes it will speed up rending in Vulkan too. This also offers an opportunity for far more efficient updating of these multi_rects to the visual server. At the moment updating individual rects to the visual server is taking a disproportionately large amount of time (each rect is causing a dynamic allocation on update, for example!). This can instead be done as a batch update (or region) for the multi_rect. In turn inside the tilemap, the multi_rects might be used for zones for culling purposes (I haven't looked inside tilemap yet), and we could offer a fast translation of tilemap updates to multi_rects. I think the updating via regions makes a lot of sense, and it even may be possible to keep the 'master data' gdscript side (i.e. just translate this master data to multi_rects in the tilemap), which would be more efficient, but it may be a trade off with safety, this may not be practical. In any case I think once we are using a region or batch update method from script, it may be crucial that the data input matches the data used internally to store the tilemap (which may be chosen to easily convert to multi_rect), to avoid unnecessary and inefficient conversions. Edit: Actually on looking into the tilemap source I think a conversion will be necessary either way, so the emphasis should be on keeping a tilemap internal format that is quick to convert. Any changes to the API here have to be made quite carefully because of reasons of backward compatibility, because if we make the wrong decision (because of a rush to have it better than it is now), we can get complaints if we decide to change it again. With a single update method for tiles (the old version) it is going to be inefficient anyway, but with a batch update method there is an argument to keeping the data format is as compact and matching the internal tilemap format as possible. Also we need to have a way of specifying a null cell on a grid, if there isn't one already (I'm assuming you can already do this by supplying a zero, but I haven't used tilemaps). |
If you mean clearing cells, you pass ID -1 to do so ( |
@zzz-assault sorry to bother you with this, but could you evaluate the time spent into the loops ? Most likely by using something like As @lawnjelly explained, there are two slow areas, the question is whether your bottleneck is inside the "Update call from gdscript / c#" or not. So we have to evaluate how much time the code spends there to see if adding a batch function for set_cell is worth or not. |
Yup looking internally into the tilemap I can already see this structure:
And I was going to suggest something like that, combining the bits for flips etc into e.g. a 32 bit int with the ID. It makes no sense to try and send flips independently, unless for user friendliness (in which case you could maybe use the single update command instead). In your commit you are only sending 1 flip / transpose per line, which isn't an efficient way of updating if you have varying flips throughout a line. User friendlinessThe only gotcha is the user-friendliness to beginners aspect (which I can see user-friendliness guys objecting to). As far as a beginner is concerned, any use of bitflags could potentially be very confusing. But on the other hand, you can't efficiently update a bunch of bools using say a 32 bit bool each (especially on a per frame basis). Of course, we could consider this an advanced function. It would be nice if there was a way of encapsulating the cell structure when exposing it to gdscript, but I don't know if this kind of thing is possible / easy. Bottleneck in generating the data
While I'm not sure that having one aspect take a long time is an excuse for another, the point is good about the time to generate the level. I think that with some improvements to the visual server side, update can be made super fast to the renderer using the discussed type of gdscript functions. However this will move the bottleneck to updating the source data. I'm thinking primarily in terms of worst case scenarios, like cellular automaton type games that want to update / upload / render tilemap areas every frame. c# would probably be quicker at updating the source data, or we may end up with some functions for this kind of thing. |
@groud don't worry … I'm super happy, that so many contributers are trying to help and find a solution (which is probably above my capabilities). data based on print(OS.get_unix_time()):
P.S.: The print statement is exactly before and after the while loop, inside the loops nothing is performed except incrementing y (and x in case of set_cell) and the "set_cell" / "set_line" method. Hint: As you can see in the screenshot, my CPU is not a potato ... Intel I7 8700 @lawnjelly I thought adding also 3 PoolArray's for the Flips and Transpose, as a GDScript user could decide this also while creating the map logic and store it in an Array (same as the Tile Data), but I haven't changed the commit yet, due to the ongoing discussion if this PR is useful or not. |
Ah, indeed in both situation we can see the processing takes several seconds and that your solution provides significant improvements. So I guess you are right, surprisingly the amount of gdscript calls has a significant impact of the performances. However, there is still room for improvement in the tilemap code apparently, I am pretty sure setting the cells could be a lot faster. I would be great if we could find the bottleneck there too. |
Personally I'm not hugely opposed to offering it line by line, it does make the function simpler, but less efficient than the rect version, it will make per frame changes less viable. But take everything I say with a pinch of salt, I'm a performance orientated guy, and am often blind to the 'user experience'. 😄 The rect function can end up being quite large and unwieldy (unless you add some restrictions like insist that the array size matches the tilemap). It's the exact same function that is used in copying arbitrary rects to textures BTW, you have to deal with out of bounds conditions, intersection area etc. I have code to do this and it took a fair bit of testing to get right, there is probably some in Godot codebase already. Mind you, if we have it line by line, it will be left to the user to do all this if they decide to use regions, and they will likely make mistakes in all but the simple cases. This is something we should bring @reduz in on too, he may have opinions on the best way to approach this. |
While I agree that there are certainly rendering issues which can be improved, I think for the large part, this is still a GDScript problem. I'm almost sure if someone could go through all issues/PRs here regarding performance issues, we'd get issues and enhancements originating from the GDScript bottlenecks specifically. For instance, there was a similar PR suggesting adding Similarly, I implemented @zzz-assault unfortunately, according to my discussion experience with Godot maintainers, I wouldn't rely/hope on the engine to solve specific problems. If they do help with solving them to some extent, it takes a significant amount of persistence to discuss this, which oftentimes takes several months to review once the implementation is ready. 😝 @zzz-assault I suggest rewriting your class in C++ and see how this improves it for you, see Custom modules in C++ documentation, that would be also a nice opportunity for you to learn C++. |
That's true, but C# might be enough too, and easier to use. |
@groud I hoped that the performance bottleneck is mainly GD-Script, but you are right, it seems there is also an issue within the "set_cell" method. But at least my solution could improve the computing time by roughly 2-3 times. @lawnjelly It would be great to get a super performant solution, but I'm also in favor of more beginner friendly solutions, so more people can use the batch process. Perfect would be a simple 2DArray, which is loaded into the tilemap completely (but I assume a lot of checks would be needed, to avoid issues with the Variant Data Type of the basic Array in GD-Script). @Xrayez Thx for the advise, but I've got only 2 main bottlenecks directly connected to the Tilemap Node (my logic script isn't a problem so far), which are problematic (1. set_cell / 2. update_bitmask), for both I could write a C++ native script to solve my own issue, but I still think this is a common issue for others trying to create proc gen content in 2D -> so I'm trying to provide at least for "set_cell" a simple step forward with my PR :) |
I still think in this existing code the approach to setting the 2 flips and transpose is potentially problematic. It works for situations where you use no flips etc, but the moment you have flips, it becomes useless. We need to solve this problem I think. |
@lawnjelly Hmm, you might be right, I could use instead of single "bool"'s a vector3 Array with x,y,z as the 3 switches. For performance I would check with if statement first if no Array was provided and if not keep the loop as commited now -> if yes, the downside would be that I need to translate the x,y,z of each Vector 3 into the bool Statements (in each loop cycle - i.e. 0 = false, everything else = true) - not sure about the performance downside of this. Also a check if both Arrays (PoolIntArray and PoolVec3Array) have the same size is required (with if not push out an error?). What do you think, would this be a good approach? |
Bools are flat out inefficient for transferring this kind of info, they tend to only be used when there are small numbers of items involved (e.g. a function call). Trying to send a bool in a Vector3 is even worse. The standard solution for sending a number of bools is bitflags. If you have constant / enums for the flips / transpose, you can store the whole lot in one unsigned int (where | is the 'bitwise or' instruction):
Notice I say unsigned int. The snag is that we are currently using -1 to store blank tiles. This won't work easily, because bitwise, -1 in say an 8 bit value is:
This means the OR instruction to save the flags won't work. So in this situation you'd to use another approach, such as storing the blank status itself as a bitflag, e.g. Given that gdscript doesn't support unsigned ints, the maximum positive bit you can set in 32 bit is 1073741824, so you could use that as the highest value of your bitflag constants. This would allow you say 7 bit for bitflags, and 24 bit range for the tile IDs. This would be the most efficient and compact way of sending the data I think. Whether it would be deemed user friendly or not, is another matter. You could also maybe bandit an Image to send this data, as a wild idea (although I don't know how practical this would be)! 😁 (There may already be image code for copying subregions etc). Realistically though I think we need @reduz in on this discussion, and probably as a proposal, to pin down the most appropriate way of doing this. It will be very easy to hack in to start with (because the backend can be changed later), but it is crucial to get the API right the first time. |
@lawnjelly Thank you for your advise (great to learn for me :) ), but as the "set_line" function itself targets "the usual GD-Script User", I think your solution might be to complicated (lets say -> not user friendly ;) ). Not sure how to proceed now, as said from a user perspective either having a Vector3 Array or 3x Byte Arrays, would be the easiest to understand. I would assume if someone uses flip/transpose, he would use all or two of this features within a line, so I'm more leaning in the direction of the Vector 3 Arrays. Any other ideas? … or stick to bool? (which is basicaly only one toggle for the whole line) |
This is now open for over a year, so I'll close the PR. In addition I think the best way to proceed with this use case, would be a solution via Rect2 and somehow recognize the flip, transpose flags. |
At first, this is my first pull request and I'm not really a C++ dev … sorry for any mistakes :)
This PR is related to the performance enhancement request #31020.
Basicly it's a C++ loop of set_cell for a batch of tiles (based on PoolIntArray).
This gets a huge performance boost in case of big tilemaps / and frequent tilemap updates for the standard GD-Script user. Target was to keep it as flexible as possible (therefore also a start x) and only line wise loop -> the outer loop (columns = y) of the usually needed double-loop can stay in GD-Script without big performance disadvantages.
Example:
In case of an 1000x1000 Tilemap previously the full 1.000.000 loops with "set_cell" had to be executed in GD-Script (double-loop). With this change, only 1000 loops of "set_line" are done in GD-Script, the most part of the other loops (=999.000) will be done in C++.