Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Buildsystem] Improve cache handling #96752

Merged
merged 1 commit into from
Sep 11, 2024

Conversation

AThousandShips
Copy link
Member

@AThousandShips AThousandShips commented Sep 9, 2024

Prevents cache issues by not purging cache before starting a build. Splits cache purge related code from progress code and delays the purge until after final build is done.

I've noticed recently that large rebuilds of some Linux/BSD targets create a broken cache state where even minor changes in a different PR takes a very long time or the full time to rebuild (i.e. 20+ minutes even on a single line change), this is fixed by deleting the cache and re-running the master build but it isn't very efficient. And after some testing and looking around my theory for what causes this is as follows:

  1. Large rebuilds fill the cache completely, and due to where the cache limit is currently enforced this may result in files existing over the limit as the current final purge is run before the final steps and some output is cached after that.
  2. The cache is purged before compiling anything, which wouldn't necessarily be a problem if not for
  3. The pruning relies on last modification time, which seems (haven't confirmed, but likely) to be mangled by the cache-restore action, making the files that are pruned essentially random

So this removes the initial purging, which IMO is unnecessary, it only enforces the cache size to the limit if it is over the limit at the start, which is unlikely especially on CI builds where the pruning is most important. If it prunes to the limit it will exceed the limit during the build anyway unless it does nothing. It also takes up some time (haven't measured) needlessly.

But worse it risks erasing actually valid files, and is almost guaranteed to do so if the cache is full after the CI finishes, as it doesn't prune correctly.

There might be further improvements possible here, but this does, in my testing, avoid this failure state.

I also cleaned up some of the code (removed unused arguments and split the functionality up properly, addressing that TODO for this)

Will look into further improving this later by adding more sorting conditions and looking at other details, but this should resolve the more urgent issues I've seen.

Further improvements I'm looking into for the future:

  • Use more file time info to decide what to prune
  • Don't run the prune in dry builds
  • Leverage SCons features like CacheDir to handle this more efficiently

@AThousandShips AThousandShips added bug enhancement topic:buildsystem cherrypick:4.1 cherrypick:4.2 Considered for cherry-picking into a future 4.2.x release cherrypick:4.3 Considered for cherry-picking into a future 4.3.x release labels Sep 9, 2024
@AThousandShips AThousandShips added this to the 4.4 milestone Sep 9, 2024
@AThousandShips AThousandShips requested a review from a team as a code owner September 9, 2024 11:46

progress_finish_command = Command("progress_finish", [], progress_finish)
AlwaysBuild(progress_finish_command)
atexit.register(cache_finally)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a better approach than atexit but AlwaysBuild runs before any final target is built and haven't found a SCons way to do this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're already using atexit in the buildsystem, so it's fine as-is. If there is an SCons equivalent, it's not something we have setup anywhere & is likely outside the scope of this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, there is a AddPostAction but that seems to require a target in various ways so haven't tested it, but will follow-up this PR with more research into SCons for these purposes

Prevents cache issues by not purging cache before starting a build.
Splits cache purge related code from progress code and delays the purge
until after final build is done.
@akien-mga akien-mga self-requested a review September 9, 2024 15:47
@AThousandShips
Copy link
Member Author

AThousandShips commented Sep 10, 2024

For an example of this occurring, see this build, where a simple one line addition to a test took just under 30 minutes to compile as the cache was over the limit at the start:
https://github.com/godotengine/godot/actions/runs/10792735660/job/29933084146#step:10:69

Thinking further about this I suspect that part of why the cache overshoots the limit is that the current code has the pruning as a plain action, which means it runs in parallel with the other steps, meaning that it might run and finish before other compile steps

Copy link
Member

@akien-mga akien-mga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall, I have a hard time wrapping my head around how the caching system works, so I'm happy to trust your judgement :D

@AThousandShips
Copy link
Member Author

AThousandShips commented Sep 11, 2024

The only real change here is that we:

  • Don't prune the cache before starting
  • Ensure the cache is pruned after all compilation is done

The rest is just splitting the two functionalities up, using the same methodology

Will look at other improvements later probably

@akien-mga akien-mga merged commit 41828e6 into godotengine:master Sep 11, 2024
20 checks passed
@akien-mga
Copy link
Member

Thanks!

@AThousandShips AThousandShips deleted the cache_clean_improve branch September 11, 2024 16:39
@AThousandShips
Copy link
Member Author

Thank you!

@akien-mga
Copy link
Member

Cherry-picked for 4.3.1.

@akien-mga akien-mga removed the cherrypick:4.3 Considered for cherry-picking into a future 4.3.x release label Sep 16, 2024
@akien-mga akien-mga removed the cherrypick:4.2 Considered for cherry-picking into a future 4.2.x release label Mar 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants