Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Libcalls for Windows without SSE should return i128 on the stack #127723

Open
tgross35 opened this issue Feb 19, 2025 · 9 comments
Open

Libcalls for Windows without SSE should return i128 on the stack #127723

tgross35 opened this issue Feb 19, 2025 · 9 comments
Labels
ABI Application Binary Interface backend:X86 platform:windows

Comments

@tgross35
Copy link
Contributor

tgross35 commented Feb 19, 2025

The following:

attributes #0 = { "target-features"="-mmx,-sse,+soft-float" }

define void @do_div(ptr %xptr, ptr %yptr, ptr %rptr) #0 {
  %x = load i128, ptr %xptr
  %y = load i128, ptr %yptr
  %res = udiv i128 %x, %y
  store i128 %res, ptr %rptr
  ret void
}

On x86_64-pc-windows-msvc generates:

do_div:                                 # @do_div
        push    rsi
        sub     rsp, 64
        mov     rsi, r8
        mov     rax, qword ptr [rcx]
        mov     rcx, qword ptr [rcx + 8]
        mov     r8, qword ptr [rdx]
        mov     rdx, qword ptr [rdx + 8]
        mov     qword ptr [rsp + 56], rcx
        mov     qword ptr [rsp + 48], rax
        mov     qword ptr [rsp + 32], r8
        mov     qword ptr [rsp + 40], rdx
        lea     rcx, [rsp + 48]
        lea     rdx, [rsp + 32]
        call    __udivti3
        mov     qword ptr [rsi], rax
        mov     qword ptr [rsi + 8], rdx
        add     rsp, 64
        pop     rsi
        ret

Link: https://llvm.godbolt.org/z/bcao4bdj1

In do_div, LLVM seems to expect __udivti3 to takes its arguments indirectly but return in a register pair (rax, rdx). This doesn't seem correct; nothing in the Windows calling convention suggests that values larger than 64 bits are ever returned in registers. Instead, i128 should be returned in a stack slot if sse is not available.

i128 is usually returned in xmm0, and the default is to both pass and return i128 in registers, so this seems like it may just be fallback behavior.

There is more at rust-lang/compiler-builtins#758, specifically this comment rust-lang/compiler-builtins#758 (comment).

@tgross35
Copy link
Contributor Author

Clang also passes and returns __int128 in registers with -mno-sse, which is the fallback behavior of <2 x i64> https://clang.godbolt.org/z/bEe3scbcr.

@llvmbot
Copy link
Member

llvmbot commented Feb 19, 2025

@llvm/issue-subscribers-backend-x86

Author: Trevor Gross (tgross35)

The following:
attributes #<!-- -->0 = { "target-features"="-mmx,-sse,+soft-float" }

define void @<!-- -->do_div(ptr %xptr, ptr %yptr, ptr %rptr) #<!-- -->0 {
  %x = load i128, ptr %xptr
  %y = load i128, ptr %yptr
  %res = udiv i128 %x, %y
  store i128 %res, ptr %rptr
  ret void
}

Generates:

do_div:                                 # @<!-- -->do_div
        push    rsi
        sub     rsp, 64
        mov     rsi, r8
        mov     rax, qword ptr [rcx]
        mov     rcx, qword ptr [rcx + 8]
        mov     r8, qword ptr [rdx]
        mov     rdx, qword ptr [rdx + 8]
        mov     qword ptr [rsp + 56], rcx
        mov     qword ptr [rsp + 48], rax
        mov     qword ptr [rsp + 32], r8
        mov     qword ptr [rsp + 40], rdx
        lea     rcx, [rsp + 48]
        lea     rdx, [rsp + 32]
        call    __udivti3
        mov     qword ptr [rsi], rax
        mov     qword ptr [rsi + 8], rdx
        add     rsp, 64
        pop     rsi
        ret

Link: https://llvm.godbolt.org/z/bcao4bdj1

In do_div, LLVM seems to expect __udivti3 to takes its arguments indirectly but return in a register pair (rax, rdx). This doesn't seem correct; nothing in the Windows calling convention suggests that values larger than 64 bits are ever returned in registers. Instead, i128 should be returned in a stack slot if sse is not available.

i128 is usually returned in xmm0, and the default is to both pass and return i128 in registers, so this seems like it may just be fallback behavior.

There is more at rust-lang/compiler-builtins#758, specifically this comment rust-lang/compiler-builtins#758 (comment).

@RalfJung
Copy link
Contributor

RalfJung commented Feb 19, 2025

I think some of the code generated for x86_64-pc-windows-msvc by default (i.e., with SSE) also makes little sense. I am particularly talking about https://llvm.godbolt.org/z/drah938T1. I would expect that __udivti3 can be implemented by a function that takes two i128 and returns an i128. But it seems the ABI LLVM expects __udivti3 to have is different from the ABI that LLVM generates for such a function! That doesn't make a lot of sense, does it? It may be hard to determine what the MSVC ABI says about a type that doesn't exist in MSVC when disabling a target feature that is usually always present on x86_64, but LLVM should at least agree with itself on the ABI.

@RalfJung
Copy link
Contributor

RalfJung commented Feb 19, 2025

@tgross35

The following: [...] Generates:

This is for x86_64-pc-windows-msvc, right? Your godbolt link shows the output for 3 targets but the other two are not relevant here, or are they?

@tgross35
Copy link
Contributor Author

tgross35 commented Feb 19, 2025

I think some of the code generated for x86_64-pc-windows-msvc by default (i.e., with SSE) also makes little sense. I am particularly talking about https://llvm.godbolt.org/z/drah938T1. I would expect that __udivti3 can be implemented by a function that takes two i128 and returns an i128. But it seems the ABI LLVM expects __udivti3 to have is different from the ABI that LLVM generates for such a function! That doesn't make a lot of sense, does it? It may be hard to determine what the MSVC ABI says about a type that doesn't exist in MSVC when disabling a target feature that is usually always present on x86_64, but LLVM should at least agree with itself on the ABI.

I think that LLVM doesn't guarantee much about the ABI that is used, instead leaving this to be the frontend's responsibility. So the per-arch default is something close to SysV, but it doesn't do any adjustment based on the OS - meaning we (rustc) need to manually make this indirect.

Libcalls are slightly different in that they need to have a concrete calling convention, and it is up to whoever provides them to make sure it matches what is expected (which is unfortunately tricky). Usually this is the system's default C ABI, but I can think of a few counterexamples (Apple f16, aeabi, AVR). So the problem here is that LLVM is effectively / by default using a non-system ABI for libcalls but there doesn't seem to be a good reason for this, so I think that makes sense to change.

It would be nice if using i128 on Windows just followed the regular MSVC rules (basing this on a combination of what MinGW GCC and Clang do and what the Windows ABI says):

  • Pass indirectly
  • Return in xmm0 if possible
  • Return indirectly otherwise

But I think that is a bigger design decision outside of what LLVM wants to handle at the current time. (Of course I don't know LLVM super well and could be completely off base here).

I know different discussion about improving this has come up before, it looks like @nikic has been doing some work recently https://discourse.llvm.org/t/rfc-an-abi-lowering-library-for-llvm/84495 / https://discourse.llvm.org/t/llvm-introduce-an-abi-lowering-library/84554.

(I'm not sure how parts of the ABI that don't really have an IR representation are handled, e.g. the ... in varargs, red zones, etc.)

@tgross35
Copy link
Contributor Author

This is for x86_64-pc-windows-msvc, right? Your godbolt link shows the output for 3 targets but the other two are not relevant here, or are they?

That's correct, I updated the issue.

@RalfJung
Copy link
Contributor

RalfJung commented Feb 20, 2025

So the problem here is that LLVM is effectively / by default using a non-system ABI for libcalls but there doesn't seem to be a good reason for this, so I think that makes sense to change.

I would say the problem is that LLVM is using an ABI for libcalls that is not otherwise exposed at all. If the libcall ABI can be different from the default C ABI, fine, but clearly LLVM already has the info to compute the libcall ABI, so I should be able to do something like define libcall_abi i128 @my__udivti3(...) and reuse the same logic (knowing how i128 is returned from libcalls), rather than having to reverse engineer this based on the assembly LLVM generates.

@efriedma-quic
Copy link
Collaborator

I suspect what happened was that nobody was willing to touch the LLVM backend behavior for fear of unexpected interactions, so we just chipped around the edges with minimal fixes. That might not have been the best approach, but that's where we are.

I think the x64-with-sse ABI works the way it does for compatibility with MinGW gcc. See 0897caa .

x64-without-SSE is not an ABI that's actually properly defined by anyone; probably nobody has considered the interaction here.

@RalfJung
Copy link
Contributor

I think the x64-with-sse ABI works the way it does for compatibility with MinGW gcc. See 0897caa .

So whatever part of the backend emits libcalls is aware of the target (MSVC vs GNU) and can adjust the ABI expected of __udivti3 accordingly, but when defining that function myself I have to do ABI handling myself? That still seems odd...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ABI Application Binary Interface backend:X86 platform:windows
Projects
None yet
Development

No branches or pull requests

5 participants