Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix libtheora optimizations causing errors in calling function for x86_64 Windows #103176

Merged
merged 1 commit into from
Feb 23, 2025

Conversation

0xcafeb33f
Copy link
Contributor

@0xcafeb33f 0xcafeb33f commented Feb 22, 2025

libtheora clobbers registers xmm6/xmm7 when OC_X86_ASM (set in Godot with x86_libtheora_opt_gcc or x86_libtheora_opt_vc) or OC_X86_64_ASM is set. This matches the calling convention in all systems except 64-bit Windows. This PR disables Theora optimizations on x86_64 Windows GCC. VC already had this behavior:

if env["arch"] == "x86_32":
env["x86_libtheora_opt_vc"] = True

When testing reproducibility, please run scons --clean before each build: LTO is dependent on already-generated object files.

@0xcafeb33f
Copy link
Contributor Author

CC @berarma

@0xcafeb33f
Copy link
Contributor Author

0xcafeb33f commented Feb 22, 2025

@AThousandShips
Copy link
Member

This can't supersede a PR merged two days ago, is this still relevant?

@0xcafeb33f
Copy link
Contributor Author

0xcafeb33f commented Feb 22, 2025

This can't supersede a PR merged two days ago, is this still relevant?

Sorry if I used the wrong terminology -- the previous PR was a workaround that disabled extra optimizations on Windows. It should probably be reverted if others can confirm that this PR fixes the bugs with the other PR reverted.

@berarma
Copy link
Contributor

berarma commented Feb 22, 2025

Haven't you thought that I might be already working on this? ☹️

@0xcafeb33f
Copy link
Contributor Author

0xcafeb33f commented Feb 22, 2025

Haven't you thought that I might be already working on this? ☹️

Edit: To answer your question directly, no. I mentioned that I was working on a PR before you commented on the issue, and you did not say that you were preparing a PR.

Sorry. If you have a better fix, feel free to submit a separate PR and I will close this one.

@AThousandShips
Copy link
Member

It's important to communicate about working on fixes, if someone else is working on an issue already you should ask them if they're working on something before making a PR of your own

@0xcafeb33f
Copy link
Contributor Author

It was my understanding that we were working together on a fix in #103106, and that at the end, we both identified the error as having the same source. I did not mean any harm by submitting this PR, and like I said, @berarma I will close this PR if you prefer.

@AThousandShips
Copy link
Member

The point is to communicate so these kinds of situations happen, not about assigning any blame, so people don't waste time doing work assuming they were working alone on something

@hpvb
Copy link
Member

hpvb commented Feb 22, 2025

This can't supersede a PR merged two days ago, is this still relevant?

Sorry if I used the wrong terminology -- the previous PR was a workaround that disabled extra optimizations on Windows. It should probably be reverted if others can confirm that this PR fixes the bugs with the other PR reverted.

I don't think that the LTO settings change should be reverted. The symptoms of that one are weird and do not seem to be directly related to the assembly in libtheora. Even if with this change the LTO ICE no longer happens, I would assume that's just luck of the draw.

Copy link
Member

@hpvb hpvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@akien-mga
Copy link
Member

As discussed on #buildsystem, should we disable those (potential incorrect for that ABI) x86 optimizations for x86_64 on macOS and Linux too?

diff --git a/platform/macos/detect.py b/platform/macos/detect.py
index 227738877d..f8c017db75 100644
--- a/platform/macos/detect.py
+++ b/platform/macos/detect.py
@@ -180,11 +180,6 @@ def configure(env: "SConsEnvironment"):
         env.Append(CCFLAGS=["-ftest-coverage", "-fprofile-arcs"])
         env.Append(LINKFLAGS=["-ftest-coverage", "-fprofile-arcs"])
 
-    ## Dependencies
-
-    if env["builtin_libtheora"] and env["arch"] == "x86_64":
-        env["x86_libtheora_opt_gcc"] = True
-
     ## Flags
 
     env.Prepend(CPPPATH=["#platform/macos"])
diff --git a/platform/windows/detect.py b/platform/windows/detect.py
index 6f57b88044..bbdcb9e79a 100644
--- a/platform/windows/detect.py
+++ b/platform/windows/detect.py
@@ -732,7 +732,7 @@ def configure_mingw(env: "SConsEnvironment"):
         if env["use_static_cpp"]:
             env.Append(LINKFLAGS=["-static"])
 
-    if env["arch"] in ["x86_32", "x86_64"]:
+    if env["arch"] == "x86_32":
         env["x86_libtheora_opt_gcc"] = True
 
     env.Append(CCFLAGS=["-ffp-contract=off"])

As @berarma pointed out, libtheora actually seems to have an OC_X86_64_ASM flag which we don't define. From a quick look at the code it seems to be conditional to OC_X86_ASM being defined, so for x86_64 we might want to define both and see if that reintroduces the Windows issue or not.

I wouldn't add OC_X86_64_ASM in this PR though as we need a bugfix for 4.4-stable, and adding OC_X86_64_ASM would be more enabling a new feature, so something to look at for 4.5.

I'd also suggest adding @berarma as co-author since you both came up to this solution together, so the huge work you both did is well recognized.

@berarma
Copy link
Contributor

berarma commented Feb 22, 2025

I wouldn't add OC_X86_64_ASM in this PR though as we need a bugfix for 4.4-stable, and adding OC_X86_64_ASM would be more enabling a new feature, so something to look at for 4.5.

Yes, that's the best solution at this point. I didn't go deep enough into the library to know everything about this inline assembly parts.

I'd also suggest adding @berarma as co-author since you both came up to this solution together, so the huge work you both did is well recognized.

This is fine. I'd like to make it clear that I recognize the help from @0xcafeb33f, I said so before and I'm saying it again. It just feels bad that after spending many hours debugging this hard to follow bug, with the assistance from @0xcafeb33f for testing and confirmation of my results as the original reporter, I felt that my work was being taken over with little consideration for the effort I've put into it. Without counting the many hours I've spent to bring Theora support to a better place. I hope @0xcafeb33f thinks about this for future collaborations.

Co-authored-by: Bernat Arlandis <berarma@hotmail.com>
@0xcafeb33f
Copy link
Contributor Author

0xcafeb33f commented Feb 23, 2025

@hpvb I agree that the LTO ICE going away does not singlehandedly show that the underlying compiler issue is fixed. Are there other tests I could run to see if this commit fixed it?

@akien-mga I'm not certain these changes are needed for MacOS/Linux. For Linux, xmm6/xmm7 do not need to be preserved across function calls. I couldn't find a reference for MacOS (edit for posterity: macOS x86_64 follows Linux x86_64 ABI). Regarding OC_X86_64_ASM, as I mentioned at the very top of this PR, the inlined assembly from both that flag and OC_X86_ASM clobber xmm6/xmm7, so we cannot simply switch to OC_X86_64_ASM to fix this.

I'm relatively new to open source, and I didn't know that co-authors could be added to commits -- I've added @berarma, and if I had known, I certainly would have added them in the initial commit. I would like to apologize to @berarma as I should have specifically asked before making a PR since we were both working on this so closely. I was excited to help get these regressions patched before the stable version of 4.4, and I let that excitement carry me away. @berarma, I would like to make clear that I have done more than test your patches and confirm your results, and I did not jump into development at the last moment. I had started working on a PR to fix this issue almost two days ago, added crucial details about the cause of the bug, and even installed a Linux VM to try to reproduce this more broadly. That said, I would like to say thank you @berarma for your huge help on this issue. You have done at least as much work as me on finding the root cause of this issue. You deserve credit for both that and your initial patches to improve libtheora integration with Godot!

Copy link
Member

@akien-mga akien-mga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!

@berarma
Copy link
Contributor

berarma commented Feb 23, 2025

@berarma, I would like to make clear that I have done more than test your patches and confirm your results, and I did not jump into development at the last moment. I had started working on a PR to fix this issue almost two days ago, added crucial details about the cause of the bug, and even installed a Linux VM to try to reproduce this more broadly.

I don't think I've undervalued your help at all and anyone can see it in the issue itself, I won't talk more about this. And the proposed PR that you mention as prove that you were working on a fix before me consisted in removing one line that didn't really fix the issue and wasn't even remotely close to the solution. I think we should stop worrying about this and be glad that it's been resolved.

@eviltrout
Copy link
Contributor

This bug was causing stuttered video playback for my project on Windows in RC1 and I can confirm this patch fixes the bug. Thanks so much to @berarma and @0xcafeb33f for working this solution out.

@akien-mga akien-mga merged commit 58e4e34 into godotengine:master Feb 23, 2025
20 checks passed
@akien-mga
Copy link
Member

Thanks @0xcafeb33f and @berarma for the great work testing, debugging and solving this long-running issue!
I'm aware that it's not the funniest task when a correct bugfix PR seems to trigger a regression that's actually a pre-existing bug made more prominent, so I'm thankful you both gave it full attention so we can release 4.4 without regressions (and keeping the great bug fixes from @berarma).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment