
Is Changing the Floating-point environment for intrinsic/assembly code UB? #471

Open
chorman0773 opened this issue Oct 18, 2023 · 43 comments
Labels
A-floats Topic: concerns floating point operations/representations S-pending-documentation Status: The issue is "resolved," but this resolution needs documentation

Comments

@chorman0773
Contributor

Disclaimer: This is not attempting to solve the general fe_setenv issue and allow changing the floating-point environment for floating-point code.

Based on rust-lang/rust#72252, it seems the following code is currently UB:

pub unsafe fn div_3_1() -> f32 {
    use core::arch::x86_64::*;
    let x = _mm_set_ss(1.0);
    let y = _mm_set_ss(3.0);
    _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);
    let z = _mm_div_ss(x, y);
    _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST);
    let mut k = 0.0f32;
    _mm_store_ss(&mut k, z);
    k
}

Likewise, the following code is also considered UB:

pub unsafe fn div_3_1() -> f32 {
    use core::arch::asm;
    use core::arch::x86_64::*;
    let x = _mm_set_ss(1.0);
    let y = _mm_set_ss(3.0);
    _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);
    let z;
    asm!("divss {}, {}", inlateout(xmm_reg) x => z, in(xmm_reg) y);
    _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST);
    let mut k = 0.0f32;
    _mm_store_ss(&mut k, z);
    k
}

Both of these are surprising, as no Rust-level floating-point operations are performed that would be affected by the rounding mode - only platform intrinsics or inline assembly.

These are limited examples, but a much more general case is a library that implements floating-point operations (including those in non-default floating-point environments from, e.g., C or C++) using a combination of software emulation, inline assembly, and platform intrinsics.

Assuming the LLVM issue mentioned in 72252 is fixed (and llvm's code generation for the llvm.x86.sse.div.ss intrinsic is fixed), can we call these examples defined behaviour, or is this code simply UB for some other reason?
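For contrast, here is an editor's sketch of a version that should sidestep the question entirely: the mode change, the divide, and the restore all live inside a single asm! block, so no Rust-level floating-point operation can be scheduled while MXCSR is in a non-default state. The function name and the MXCSR constant 0x7F80 (default 0x1F80 with RC=11, round-toward-zero) are illustrative assumptions, not something discussed in this thread.

```rust
use std::arch::asm;

// Editor's sketch (x86_64 only): because the rounding-mode change and the FP
// operation are both inside ONE asm! block, and MXCSR is restored before the
// block exits, the Rust AM never observes a non-default environment.
#[cfg(target_arch = "x86_64")]
fn div_1_3_toward_zero() -> f32 {
    let mut x = 1.0f32;
    let y = 3.0f32;
    let mut scratch: u32 = 0;
    unsafe {
        asm!(
            "stmxcsr [{p}]",               // save current MXCSR
            "mov {s:e}, [{p}]",
            "mov dword ptr [{p}], 0x7F80", // default 0x1F80 | RC=11 (toward zero)
            "ldmxcsr [{p}]",
            "divss {x}, {y}",              // the divide under the temporary mode
            "mov [{p}], {s:e}",
            "ldmxcsr [{p}]",               // restore MXCSR before leaving the block
            p = in(reg) &mut scratch,
            s = out(reg) _,
            x = inout(xmm_reg) x,
            y = in(xmm_reg) y,
        );
    }
    x
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        let z = div_1_3_toward_zero();
        // Round-toward-zero truncates the repeating mantissa to 0x3eaaaaaa,
        // one ULP below the round-to-nearest result 0x3eaaaaab.
        println!("{:#010x}", z.to_bits());
    }
}
```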

@chorman0773
Contributor Author

@rustbot label +A-floats

@rustbot rustbot added the A-floats Topic: concerns floating point operations/representations label Oct 19, 2023
@RalfJung
Member

RalfJung commented Oct 19, 2023

Both of these are surprising, as no Rust-level floating-point operations are performed that would be affected by the rounding mode - only platform intrinsics or inline assembly.

You can't know that no Rust-level floating-point operations are affected. The compiler is allowed to move floating-point operations from elsewhere to in between the SET_ROUNDING_MODE calls. This is not a bug: floating-point operations are pure operations in the Rust AM and can be reordered arbitrarily. rustc therefore has to implement them in a way that their behavior does not depend on any environmental flags, and it does that by having "the rounding mode is round-to-nearest" in its representation invariant that relates the low-level target state to the high-level AM state.
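A concrete way to see the "pure operations" point (an editor's illustration): the compiler may evaluate a float expression at translation time, always under round-to-nearest, no matter what MXCSR holds when the function actually runs.

```rust
fn main() {
    // rustc/LLVM are free to fold this division at compile time under
    // round-to-nearest, so the runtime rounding mode must not be allowed
    // to change its value.
    let z = 1.0f32 / 3.0f32;
    // Round-to-nearest gives 0x3eaaaaab; round-toward-zero hardware would
    // produce 0x3eaaaaaa, one ULP lower - hence the representation invariant.
    println!("{:#010x}", z.to_bits());
}
```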

So yes, this is UB.

@chorman0773
Contributor Author

chorman0773 commented Oct 19, 2023

My next question is whether it should be UB.

If even being in (any) rust code with an incorrect floating-point environment is UB, then that implicates quite a few things:

  • The aforementioned floating-point implementation library, which must support modifications to the floating-point environment to comply with ISO 9899 and platform ABI specifications.
  • This also affects any compiler runtime support library that can be called from non-Rust code. libatomic and libunwind implementations are the most suspect IMO (I've 100% written a C++ RAII wrapper that sets the floating-point environment in a constructor, then resets it in the destructor, and thrown out of that function). I would not be surprised if compiler_builtins were linked into a mixed C/Rust project and called by the compiler from inside a function that manipulates the floating-point environment, thus causing UB by virtue of merely existing in the project tree.
  • This also means that Rust code can never be entered from a signal handler, as the floating-point environment is unspecified upon entry, and modifying it from a signal handler is undefined behaviour.

@RalfJung
Member

RalfJung commented Oct 19, 2023

FWIW, C code compiled with LLVM (and likely most other compilers) has the same UB. This is not just a Rust problem. I don't know anything about any of your examples, but if a platform ABI requires FP env mutation (as you claim for point 1), then that's a highly problematic ABI at best. Which ABI requires FP env mutation? I also don't buy your second example: if that runtime support is implemented in C, then it already implicitly assumes a default FP environment. The C ABI (on pretty much any target) requires the FP env to be in the default state.

Allowing modification of the FP environment without compromising optimizations on the 99.9% of functions that are happy with the default FP settings is not easy. We'd have to add some notion of "scope within which the compiler does not assume that the FP environment is in the default state". This would have to interact properly with inlining and things like that. It's not impossible, but it requires a bunch of design work.

@chorman0773
Contributor Author

chorman0773 commented Oct 19, 2023

FWIW C code compiled with LLVM (and likely most other compilers) has the same UB

clang supports the STDC FENV_ACCESS pragma per the requirements of ISO 9899; it uses LLVM's constrained floating-point intrinsics. gcc also supports this pragma (again, as required by the standard). C also does not consider it undefined behaviour to modify the floating-point environment without the pragma, though floating-point operations issued in a non-default FP env may yield results that are incorrect with respect to the current FP env - e.g. by constant-folding ops under the default rounding mode (whether it does so is unspecified).
Any C compiler that considers it UB to do so is not compliant with the standard, and I am under no obligation to write code that supports it, nor to have my own implementation follow such broken compilers. This is also true of C++, though C++ does not require support for the pragma (all C++ compilers I'm aware of, even MSVC, do support it).

Which ABI requires FP env mutation

More precisely, the ABI specifies where the FP env is stored, so I cannot simply emulate the FP env in the library. Practically every ABI I'm aware of specifies the platform effects of the fesetenv and fegetenv C functions.

I don't know anything about any of your examples

The main example is libsfp, a runtime support library used by lccc to implement floating-point operations (of various sizes, among others), including those in a function marked #[fpenv(dynamic)]. To comply with ISO 9899, it must respect the floating-point environment in such functions, and must not cause UB period.

The C ABI (on pretty much any target) requires the FP env to be in default state.

This is not the case for x86_64 SysV or the MS ABI. Both require that the floating-point environment is initialized to the default (and specify that default), and then mark the control bits of the relevant registers (mxcsr and the x87 fcw) as callee-saved, with the exception of functions that intentionally modify the floating-point environment (fesetenv is one). This is to support the aforementioned function, which is required by ISO 9899. Any C ABI that makes fesetenv immediate undefined behaviour is not a compliant ABI.

@chorman0773
Contributor Author

It is also not considered undefined behaviour by C to modify the floating-point environment without the pragma, though floating-point operations issued in a non-default FP Env may yield incorrect results according to the current FP Env

If that were the case, then being in an async-signal handler would be immediate UB (as I noted is the case for Rust), which is most definitely not the case.

@RalfJung
Member

RalfJung commented Oct 19, 2023

As far as I know, C/C++ code compiled with clang without special markers behaves exactly like Rust code wrt float operations, and hence has the same UB. I assume basically all C/C++ compilers will treat float operations as pure (unless some special marker is set to indicate that a scope has a non-default FP env) and hence move them around (including out of loops and out of potentially dead code). This makes mutating the FP environment UB in the real C that compilers implement, whether or not the standard agrees.
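For instance, a pure FP operation can be hoisted out of a loop by loop-invariant code motion, so it executes under whatever rounding mode is active at the hoisted location rather than where it appears in the source. An editor's sketch (names are illustrative):

```rust
// `a / b` is loop-invariant; the optimizer may compute it once, before the
// loop - and, by the same logic, before or after any surrounding code that
// toggles the FP environment. Treating FP ops as pure therefore requires
// their results to be independent of the dynamic rounding mode.
fn add_ratio(xs: &mut [f32], a: f32, b: f32) {
    for x in xs.iter_mut() {
        *x += a / b;
    }
}

fn main() {
    let mut xs = [0.0f32; 3];
    add_ratio(&mut xs, 1.0, 2.0);
    println!("{:?}", xs); // [0.5, 0.5, 0.5]
}
```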

I have no interest in specifying Rust in a way that is disconnected from reality, so we should call this UB as that's what it is. Maybe it's UB because compilers are not compliant, but how's that helpful? It isn't, unless you have a proposal for how the standard can be implemented in a reasonable way.

This is not the case for x86_64 SysV or the MS ABI. Both require that the floating-point environment is initialized to the default (and specify that default), and then mark the control bits of the relevant registers (mxcsr and the x87 fcw) as callee-saved, with the exception of functions that intentionally modify the floating-point environment (fesetenv is one). This is to support the aforementioned function, which is required by ISO 9899. Any C ABI that makes fesetenv immediate undefined behaviour is not a compliant ABI.

That can't be true. When I write an extern "C" function, I must be able to rely on the FP env being in the default state. If that weren't the case, then every single externally callable function that might use FP operations would have to start by setting the rounding mode. If I set the FP env to a non-default state and then call some library and it misbehaves, I don't get to complain. There is no general expectation that libraries are resilient against non-default FP envs. (If any of this is wrong please let me know; I certainly haven't seen any evidence to the contrary.)

All of this shows that the rounding mode is part of the de-facto ABI. If the documentation disagrees then the documentation doesn't reflect the contract used in real-world software.

If this was the case, then being in an async-signal-handler is immediate UB (as I noted is the case for Rust), which is most definitely not the case.

If the async signal handler uses any float operation, then it's most definitely UB. There's also nothing in the LLVM LangRef that would forbid the compiler from introducing a new float operation into code that doesn't use a float operation. This can be as subtle as

if !in_signal { some_float_op(); }

and hoisting the operation out of the if (which is obviously legal for a pure operation).
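Spelled out as compilable Rust (an editor's sketch; `in_signal` and the operation are stand-ins, following the snippet above):

```rust
// If the multiply is treated as a pure operation, the compiler may compute it
// unconditionally and merely select the result under the branch - moving an
// FP op into a context (the signal-handler path) that never mentioned floats.
fn maybe_scale(in_signal: bool, x: f32) -> f32 {
    if !in_signal {
        x * 0.5 // candidate for speculative hoisting above the branch
    } else {
        0.0
    }
}

fn main() {
    println!("{}", maybe_scale(false, 2.0)); // 1
    println!("{}", maybe_scale(true, 2.0)); // 0
}
```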

Sounds like async signal handlers need to be compiled with such a "FP env might be in non-default state" kind of a scope, otherwise there's no way they can be sound.

Also sounds like nobody really thought this entire FP env thing through to the end and different parts of the ecosystem made mutually incompatible choices, and now it's all busted. 🤷 It doesn't get better by pretending that it's not UB, though.

@chorman0773
Contributor Author

As far as I am aware, gcc (and msvc) implement the behaviour as prescribed. If clang does not, I would consider that a bug in clang, and certainly not any behaviour I would desire to emulate in lccc.

That can't be true. When I write an extern "C" function, I must be able to rely on the FP env being in default state.

This is either an extra constraint imposed by Rust that does not reflect any actual C ABI, or an incorrect reliance. As far as I am aware, no C ABI is non-compliant with the relevant sections of ISO 9899 in this regard. Knowing the precise behaviour of the Clever-ISA ABI, I can quote the relevant text, though x86_64 SysV is similar (albeit less formal).

The fpcrw register is set to 0 upon start up. The fpcrw.EXCEPT bits are caller-saved. No function should modify the value of the fpcrw register unless indicated by the behaviour of the function (in particular, the fesetenv function sets the value as per the "floating-point env" section of this document). Otherwise, the register shall be treated as reserved (must be restored to value at entry prior to calling another function or returning if modified).

@chorman0773
Contributor Author

(if you'd like, I can find the relevant portions of the x86_64 sys-v spec and msvc abi)

@RalfJung
Member

RalfJung commented Oct 20, 2023

As far as I am aware gcc (and msvc) implement the behaviour as prescribed. If clang does not, I would consider that a bug in clang

GCC says "Without any explicit options, GCC assumes round to nearest or even". It's unclear what that means, but it's far from obvious that it means "GCC guarantees that the code will work correctly under all FP environments".

LLVM is very clear: "The default LLVM floating-point environment assumes that traps are disabled and status flags are not observable. Therefore, floating-point math operations do not have side effects and may be speculated freely. Results assume the round-to-nearest rounding mode, and subnormals are assumed to be preserved." You might want to bring this up with the LLVM people if you think that's an issue.

Usually @comex is very good at getting compilers to apply the right optimizations in the right order to demonstrate an end-to-end miscompilation, maybe they can do it here, too? :)

This also means that rust code can never be entered from a signal handler, as the floating-point environment is unspecified upon entry, and modifying it from a signal handler is undefined behaviour.

What is this claim based on? Is it some standard that says so, or are there really targets and OSes where the kernel doesn't save and restore the FP environment when switching from a thread to its signal handler and back?

(if you'd like, I can find the relevant portions of the x86_64 sys-v spec and msvc abi)

Do you have any evidence that every single library with a C ABI is actually expected to be working correctly under arbitrary FP environments, and that library authors consider it a bug when their library misbehaves under non-default FP environments?

As I said before, I care not only about what it says on some piece of paper but also about what is actually done in the real world. When the standard and reality disagree, it's not automatically reality that's wrong. Sometimes the standard is just making completely unrealistic prescriptions that everybody ignores, and the standard should be fixed.

It's also unclear to me which alternative you are suggesting. Could you make a constructive proposal? Here are some options, and you can already immediately see why many people won't like them:

  • FP operations are specified to be using a non-deterministically chosen rounding mode. Or:
  • The compiler may not move FP operations to different parts of the code unless it can prove both use the same FP rounding mode, and the compiler may also not synthesize new FP operations. Since the compiler cannot know which functions are "documented" to change the FP status register, it has to conservatively assume that any function that is called changes the FP rounding mode, and never reorder an FP operation across a function call.

You are asking everyone to pay for a feature that hardly anyone needs. Is that your position, or do you see a better way out here?

If you further want to claim that even other aspects besides the rounding mode may be changed, such as making sNaNs trigger a trap, then either passing an sNaN to an FP operation is UB, or FP operations cannot be reordered at all with anything any more (e.g., reordering an FP operation and a store becomes illegal since the trap makes it fully observable when exactly an operation happens).

@Muon

Muon commented Oct 20, 2023

Linux definitely restores the FP environment when it enters a signal handler. There was a big kerfuffle about it back in 2002 when SSE2 arrived (https://yarchive.net/comp/linux/fp_state_save.html). I think FreeBSD might be a target that actually does not restore the FP environment when entering a signal handler, but I am unsure (https://reviews.freebsd.org/D33599). In any case, glibc says that fesetenv is async-signal-safe (https://www.gnu.org/software/libc/manual/html_node/Control-Functions.html), so fixing this shouldn't be a problem.

The SYSV ABI (https://gitlab.com/x86-psABIs/x86-64-ABI) stipulates that the FP control bits are callee-saved, meaning that the callee needs to restore them if it changes them. (Presumably an exception is intended for fesetenv and fesetround, but it seems to have been forgotten.) This doesn't mean that publicly-accessible library functions using FP instructions have to be built defensively to be correct, just that they have an undocumented assumption (that the FP environment is default).

The main consideration for Rust is that it is ultimately bound by LLVM's quirks. LLVM has made (is still making?) progress towards letting Clang support #pragma STDC FENV_ACCESS ON, so something similar would be good to implement in Rust eventually. My preference would be an attribute applicable to blocks and functions that describes how floating-point arithmetic behaves within them, similar to #pragma STDC FENV_ROUND direction.

Additionally, the C23 standard (and possibly earlier revisions) specifies in Section 7.6.1 "The FENV_ACCESS pragma" that it is UB to, under a non-default floating-point environment, execute any code that was compiled with the pragma set to off.

@RalfJung
Member

RalfJung commented Oct 20, 2023

@Muon thanks! So looks like in practice, signal handlers are fine on our tier 1 targets, but other targets are having issues. (Also I heard that some versions of WSL do not save and restore the FP env for signal handlers.)

Presumably an exception is intended for fesetenv and fesetround, but it seems to have been forgotten.)

For this to be useful for compilers, the exception needs to be compiler-readable. Connor quoted above some wording saying that if the function is "documented" to change the FP state then it may do so, but of course that's not very useful.

This is similar to how setjmp needs an attribute so that the compiler can understand that something very weird is going on.

(Though floats are different in that as far as I can see, even with such an attribute there'd be a global cost.)

This doesn't mean that publicly-accessible library functions using FP instructions have to be built defensively to be correct, just that they have an undocumented assumption (that the FP environment is default).

For Rust (and code compiled with clang) this means all functions have such an undocumented assumption.

@RalfJung
Member

FWIW in my opinion this is a case of bad ISA design. ISAs chose to introduce some global mutable state and as usual, global mutable state is causing problems. ISAs should provide opcodes that entirely ignore the FP status register so that languages can implement the desired semantics (floating-point operations that do not depend on global mutable state) properly. But it seems like even RISCV repeats this mistake, so we'll be stuck with hacks and quirks for many decades to come. Languages can choose to either make those ISA features basically inaccessible, to penalize all users for the benefit of the tiny fraction that actually wants a non-default FP status register, or to introduce syntactic quirks that mark where in the code floating-point opcodes behave in non-default ways.

@Muon

Muon commented Oct 20, 2023

Global mutable state includes registers and status flags. While it would have been nice to avoid it in this instance, it's seemingly unavoidable in general and compiler writers deal with it as a matter of course.

I'd say that programming languages have a pretty free hand here. You have to indicate the rounding somehow, anyway. I'm not sure how it's possible to expose alternate rounding modes without syntactic quirks of some variety!

@RalfJung
Member

General registers are a lot less global -- most instructions take a list of input/output registers and only act on them. The FP status register is very different from that. This would be very easy to avoid if FP operations had a bit in the opcode which says "ignore status register, use default instead", but sadly no ISA seems to have such a bit.

There's a reason we're having this problem with the FP status register but not with any of the others. (Yes of course there's a whole bunch of status registers, but the others don't affect very basic operations such as addition of a primitive type, and so far there's been no complaint about Rust requiring them all to be in a particular fixed state that ensures the instructions we need behave the way we need them to behave.)

@Muon

Muon commented Oct 20, 2023

At least on x86, the comparison, overflow, and other flags govern conditional jumps and moves. I'd say these are extremely basic operations. The carry flag makes add-with-carry work. Codegen routinely has to deal with setting and clearing these appropriately.

@bjorn3
Member

bjorn3 commented Oct 20, 2023

Condition flags are caller saved and undefined for the callee, but the float env is callee saved and most code expects the default float env to be used as current float env. This means that nobody sets the float env to the value it expects. And presumably changing the float env is rather expensive unlike changing condition flags.

@Muon

Muon commented Oct 20, 2023

I didn't say it was trivial or cheap, just that this sort of thing is already handled routinely.

@chorman0773
Contributor Author

This would be very easy to avoid if FP operations had a bit in the opcode which says "ignore status register, use default instead", but sadly no ISA seems to have such a bit.

AVX-512 and AVX10 allow you to suppress exception checking (the #XM CPU exception; IIRC it still updates the exception bits) and to set a static rounding control.
TBH, I could probably do that trivially in Clever-ISA - I have the available bits.

@chorman0773
Contributor Author

(Of course, that's AVX512, so... - heh)

@quaternic

Note that there is some more discussion on Zulip: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Is.20it.20UB.20to.20change.20rounding.20modes.20without.20using.20float.20ops (which also led to this issue being created)

General registers are a lot less global -- most instructions take a list of input/output registers and only act on them. The FP status register is very different from that. This would be very easy to avoid if FP operations had a bit in the opcode which says "ignore status register, use default instead", but sadly no ISA seems to have such a bit.

Some instructions do have encodings for explicit rounding modes. On x86, SSE4.1 has https://www.felixcloutier.com/x86/roundss ("round float to an integer value" ), which allows overriding the rounding mode for that instruction, as well as suppressing the inexact-exception. That doesn't mean it ignores the control register though; an SNaN input will still signal invalid operation which may or may not be masked in MXCSR. [And as chorman0773 said above: ] More recently, AVX-512 has some explicit rounding/exception control bits in the instruction prefix, so that you can do the same with most computational operations as well.
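The SSE4.1 instruction is exposed in core::arch, so the per-instruction rounding override can be used from Rust today. An editor's sketch (the helper name and the portable fallback are illustrative):

```rust
// Round down to an integer value using roundss's embedded rounding override,
// without touching MXCSR's rounding-control bits.
fn round_down(x: f32) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("sse4.1") {
            use std::arch::x86_64::*;
            unsafe {
                let v = _mm_set_ss(x);
                // TO_NEG_INF overrides the dynamic rounding mode for this one
                // instruction; NO_EXC suppresses the inexact exception. As noted
                // above, MXCSR's mask bits still govern invalid-operation on SNaN.
                let r = _mm_round_ss::<{ _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC }>(v, v);
                let mut out = 0.0f32;
                _mm_store_ss(&mut out, r);
                return out;
            }
        }
    }
    x.floor() // portable fallback
}

fn main() {
    println!("{} {}", round_down(2.7), round_down(-2.3)); // 2 -3
}
```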

I believe a problem with having instruction encodings that ignore the FP control register is that their presence would effectively cripple the uses of that control register. E.g. the only reason to have SNaNs in your program execution is that you want to trap and handle an exception when the CPU tries to use that value, which would no longer be reliable if some function somewhere has chosen to sidestep that.

@bjorn3
Member

bjorn3 commented Oct 20, 2023

Trapping SNaN is UB in LLVM. Manually checking for NaN at the end is likely faster than disabling the optimizations that depend on float ops never trapping.

@RalfJung
Member

RalfJung commented Oct 20, 2023

I believe a problem with having instruction encodings that ignore the FP control register is that their presence would effectively cripple the uses of that control register. E.g. the only reason to have SNaNs in your program execution is that you want to trap and handle an exception when the cpu tries to use that value, which would no longer be reliable if some function somewhere has chosen to sidestep that.

SNaNs are a failed experiment that many major C compilers (at least GCC and clang) entirely refuse to support [*] since their impact on the ability to optimize language-level code is horrendous. So that feature is already crippled, I don't care about crippling it more.

[*] in the sense of: the float operations that correspond to built-in syntax may or may not actually trap if you set the trap bit, and if they do trap that's UB. You can still have dedicated special float ops that do honor the trap bit, but having it apply to every single float + is impractical and doesn't actually work on today's compilers.

Having opcodes that ignore the trap bit would actually help SNaN support since then it wouldn't be UB to set the trap bit any more!

@quaternic

Trapping SNaN is UB in LLVM. Manually checking for NaN at the end is likely faster than disabling the optimizations that depend on float ops never trapping.

Even with LLVM's constrained floating-point intrinsics? That is, an operation marked "fpexcept.maytrap" or "fpexcept.strict" in a function marked "strictfp"?

[*] in the sense of: the float operations that correspond to built-in syntax may or may not actually trap if you set the trap bit, and if they do trap that's UB. You can still have dedicated special float ops that do honor the trap bit, but having it apply to every single float + is impractical and doesn't actually work on today's compilers.

I am not claiming that running Rust code with a non-default FP environment isn't UB. It is a good thing if it indeed is, because at least that is compatible with adding such support in the future. Yes, that would require some form of opt-in for any code that wants to use it. But supposing f32::add did support non-default contexts, there shouldn't be any reason it can't be optimized exactly as usual when used in default contexts.

@RalfJung
Member

RalfJung commented Oct 20, 2023 via email

@quaternic

This may be somewhat tangential, but from what I gather from the IEEE 754 standard, it requires language-defined means of specifying a constant value of the attributes (rounding mode, and others as available) that applies to the operations within a given block (language-defined syntactic units).

It then recommends also supporting dynamic-mode specification of those attributes, where the point that it is dynamic is similarly specified for a given block.

That is, it seems that some form of scoped implicit modes are expected, and it would not be sufficient to just have a dynamic variable for it without also providing the scoping mechanism.

Moreover, this clarified to me that the intent is not for there to be some global state that all floating point operations must take into account by default. Rather, all floating point operations can have various attributes applied to them syntactically, and choosing to use the dynamic attributes is merely one of those options.

With that in mind, and if the practicalities of the hardware implementation were of no concern, it would indeed seem ideal for an ISA to augment each floating-point instruction with bits to choose each attribute from its constant options or the dynamic state.

@CAD97

CAD97 commented Oct 22, 2023

Specifically for the example in the OP, it would be convenient (to people writing such code) if we could specify that any sequence consisting exclusively of

  • asm!;
  • calls to "arch/platform/vendor intrinsics1";
  • copies of
    • primitive types,
    • #[repr(C)] or #[repr(simd)] types, or
    • #[repr(transparent)] of the above,
    • including through built in "pure" place computation and/or primitive pointer/reference indirection;
  • comparison of primitive types;
  • control flow with if, loop, while, break, and/or continue; and
  • calls to extern "C" fn which themselves meet these requirements (and adhere to the ABI and callee-save/restore FP status if they change it)

could be guaranteed to work as defined for that sequence of operations on the target architecture, not just solely as defined for the Rust Abstract Machine.

Note the deliberate absence of primitive arithmetic from this list. This ideally means that code motion of floating-point operations assuming the default FP environment is only restricted from crossing computation which could potentially mutate the FP environment, namely asm! and vendor intrinsics2 (or from otherwise introducing FP operations into an "FP critical section" where there were none previously and the environment is not otherwise provably default, i.e. from lack of potential mutation since "FP non-critical" code).

But this is all highly aspirational with the semantics of the current LLVM toolchain. Yes, the documented semantics work properly under fenv access if you use the appropriately strict FP intrinsics... everywhere within the compilation/optimization unit. As soon as you have a single nonstrict FP operation, it could potentially undergo code motion and be executed anywhere as a supposedly pure operation. Even outside the "FP critical" section, your FP ops need to be strict in order to keep them from moving into the "FP critical" section.


Also note that a portion of this comes down again to where you draw the line between defining ABI, calling convention, signature, pre/postcondition, etc. If the description of the sysv64 ABI stating that the FP environment status bits are callee saved except for when they aren't is accurate, then that is an unfortunate bit of shared global mutable state enshrined as part of the ABI, or minimally an acceptance of ABI nonconformity "when documented" in order to deal with uncomfortable reality.

The "way out" if the entire software ecosystem is willing is ABI "colors3" within the one ABI; like we have extern "C" and extern "C-unwind" which are identical save for a runtime property of whether it unwinds, introduce another axis for mutating the FP environment — say, extern "C-fenv-access". Ideally even constrain calling with non-default environment to the colored ABI4, instead of leaving it an implicit requirement of most functions. But that's not exclusively Rust you'd need to convince, so good luck trying to champion that. The field seems content with it mostly working in constrained use cases (e.g. kernel with absolutely no hardware float operations allowed) and handwaving UB as "just" indeterminate results.

As a final note, note that standard C was from the beginning meant to be descriptive of what could be generally consistent behavior across implementations. If no represented implementation implements #pragma STDC FENV_ACCESS fully to spec, then it's a spec defect, not a bug in literally every implementation. Standard C standardizes existing practice.

Footnotes

  1. I continue to be mildly sad that the core::arch platform "intrinsics" are regular unsafe extern "Rust" fn, as opposed to core::intrinsics intrinsics, which are unsafe extern "rust-intrinsic" fn. The former act as real functions and can be converted into function pointers, whereas the latter can't. (This is technically observable on stable via transmute, the only stable intrinsic, although there it's explainable as an incorrect ABI string.) They should've at a minimum been extern "C", but even better extern "vendor-intrinsic". Ideally IMHO vendor intrinsics should be essentially just an asm! block with function call syntax and without the "completely opaque to the compiler" clause. 2

  2. And ideally, due to vendor intrinsics being intrinsics1, the compiler has some chance of knowing which ones can/can't mutate FP environment, to avoid unnecessary pessimism around e.g. manual SIMD.

  3. Oh how I dislike the function coloring analogy. I initially thought of it as ABI "flavor" here, but I'll stick to the typical "color" analogy.

  4. Although do note that doing so means neither extern "C" nor extern "C-fenv-access" is a subset of the other. If the ABI coloring is done by #pragma in ISO C, that might be the case already, depending on the exact wording. A question for Connor: does ISO C require that fenv is restored to default before #pragma STDC FENV_ACCESS OFF? Does the "implementation is free to assume default environment" make a non-default environment UB, or just IDB excluding UB? It's also worth noting that cppreference (in the C section) states that "In practice, few current compilers, such as HP aCC, Oracle Studio, and IBM XL, support the #pragma explicitly;" you would think that if clang/gcc supported the pragma anywhere near as well as you assert they do, such a list wouldn't exclude them. I should also note that I think an implementation can in fact be fully ISO C compliant without any actual FENV access support by simply not defining any `FE_*` macros except for those necessary to describe the default environment.

@Muon

Muon commented Oct 22, 2023

As a note, cppreference.com is out of date. Of the major compilers, currently MSVC and Clang both support #pragma STDC FENV_ACCESS ON. No one supports #pragma STDC FENV_ROUND <mode> yet, but that's because it's brand new to C23.

Regarding the ABI issue, you can place #pragma STDC FENV_ACCESS ON at the top level of a file, which would make it apply to declarations, although I'm unsure whether that actually matters.

Also, the C standard long ago moved on from describing existing C implementations to prescribing new behaviors for them (since at least C99).

Also it is indeed the case that a compliant C compiler can simply not define any of the rounding mode macros.

@chorman0773
Contributor Author

Note the deliberate absence of primitive arithmetic from this list

In my case, I'd need integer arithmetic. When I can't easily do the floating-point operations using asm/vendor intrinsics, I fall back on a platform-agnostic software FP implementation.

@chorman0773
Contributor Author

A question for Connor: does ISO C require that fenv is restored to default before #pragma STDC FENV_ACCESS OFF? Does the "implementation is free to assume default environment" make non-default environment UB or just IDB excluding UB?

From that cppreference text, I'm unsure. Indeed, the standard contains several occurrences of "the implementation may assume", some of which do imply undefined behaviour (for example, "The implementation may assume that loops that do not have a constant condition will eventually terminate"). The text in the standard (committee draft N1256) states

If part of a program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma ‘‘off’’, the behavior is undefined.

So it is indeed correct that it is considered blanket UB, and my previous assertion that it was simply "Floating-point ops may behave wrongly" was incorrect. I will concede that point.

It's also worth noting that cppreference (in the C section) states that "In practice, few current compilers, such as HP aCC, Oracle Studio, and IBM XL, support the #pragma explicitly;" you would think that if clang/gcc support the pragma anywhere near as well as you assert that they do that such a list wouldn't exclude them.

clang generates a divss as appropriate for this code: https://godbolt.org/z/jzzjsj3s5 (and inspecting the LLVM IR shows a call to llvm.experimental.constrained.fdiv.f32).
Apparently gcc does in fact ignore the pragma and constant-fold the result.

I should also note that I think an implementation can in fact be fully ISO C compliant without any actual FENV access support by simply not defining any `FE_*` macros except for those necessary to describe the default environment.

It probably could, but GLIBC does define all of the macros (on most platforms I'm aware of), so it does expose this functionality.

@quaternic

The "way out" if the entire software ecosystem is willing is ABI "colors3" within the one ABI; like we have extern "C" and extern "C-unwind" which are identical save for a runtime property of whether it unwinds, introduce another axis for mutating the FP environment — say, extern "C-fenv-access". Ideally even constrain calling with non-default environment to the colored ABI4, instead of leaving it an implicit requirement of most functions.

  1. Although do note that doing so means neither extern "C" nor extern "C-fenv-access" are a subset of the other.

However, an ABI which doesn't require the environment to be the default but still preserves the callee-saved parts, say extern "C-fenv-any", would be a subset of extern "C". That would be suitable for almost all code that doesn't want to assume the default. For executing code with a different fenv, you could just have wrapper functions which call the function with the given fenv and restore the original before returning/unwinding.
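The wrapper idea could be sketched on x86_64 roughly as follows (hypothetical names; note that per the rest of this thread, executing Rust-level FP code under the non-default mode is still formally UB — this only captures the save/restore discipline, including across unwinding):

```rust
// Hypothetical sketch: run `f` under rounding mode `mode`, restoring the
// caller's mode afterwards. The restore happens in Drop, so it runs even
// if `f` unwinds. x86_64-only; names like `with_rounding_mode` are not
// from the thread.
use core::arch::x86_64::{_MM_GET_ROUNDING_MODE, _MM_SET_ROUNDING_MODE};

struct FenvGuard(u32);

impl Drop for FenvGuard {
    fn drop(&mut self) {
        // restore the saved rounding mode on scope exit or unwind
        unsafe { _MM_SET_ROUNDING_MODE(self.0) };
    }
}

/// Safety: `f` must be sound to execute under `mode` — which, per this
/// thread, currently excludes all Rust-level floating-point operations.
pub unsafe fn with_rounding_mode<R>(mode: u32, f: impl FnOnce() -> R) -> R {
    let _restore = FenvGuard(_MM_GET_ROUNDING_MODE());
    _MM_SET_ROUNDING_MODE(mode);
    f()
}
```

The `Drop` impl is what makes this usable as the "C-fenv-any"-style wrapper: the caller's environment is preserved on every exit path.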

@chorman0773
Contributor Author

How would this work for default-ABI functions? Would everything need to be extern "C-fenv-any"/extern "C-fenv-access", period?

@quaternic

quaternic commented Oct 23, 2023

Well, having extern "Rust-fenv-any" that is callable as a normal Rust function would of course be nicer, if that is what you meant. But quite a lot of things would need either that or a corresponding variant for "no fenv at all" to be able to write useful programs, and it doesn't feel like a good solution overall.

Using the trait system could be more promising, if only you could implement traits for function items:

// May be implemented by functions that support arbitrary
// floating-point environments.
// 
// Safety: The function may only call other `FenvAware` functions.
unsafe trait FenvAware {}

Assuming there would be some easy way to safely implement it when the body can be checked, and having it would add the necessary constrained attributes to the LLVM-IR.

clang generates a divss as appropriate for this code: https://godbolt.org/z/jzzjsj3s5 (and inspecting the llvir shows a call to llvm.experimental.constrained.fdiv.f32).
Apparently gcc does in fact ignore the pragma and constant-fold the result.

Based on https://gcc.gnu.org/wiki/FloatingPointMath, you need -frounding-math, and then it will generate the divss for your example, although as the warning notes, it still assumes the rounding mode is not changed across calls.

@comex

comex commented Oct 23, 2023

Usually @comex is very good at getting compilers to apply the right optimizations in the right order to demonstrate an end-to-end miscompilation, maybe they can do it here, too? :)

Sure thing, boss! Here is a program that segfaults in release mode, where the only unsafe code is a call to fesetround (which changes the floating point environment's rounding mode). Tested on Linux and macOS. In short, LLVM removes a bounds check based on assumptions that are valid in the default rounding mode, but we execute the code in a different mode where those assumptions are invalid, and the program ends up indexing out of bounds.

How I made this

This was pretty hard. If you imagine an array indexing operation vals[i] as expanding to something like

if i >= COUNT {
    panic();
}
*vals.get_unchecked(i)

...then my basic goal was to have i be computed in a way that somehow involves floating point rounding, so that i < COUNT is guaranteed to be true in the floating-point mode LLVM assumes is being used, but not in the mode we're actually using. LLVM optimizes away the bounds check and then indexes out of bounds. But for this to work, i has to be in a known range but not be a constant. If it were a constant, LLVM would just replace the argument to get_unchecked with whatever the constant value is, and not execute the floating point computation at runtime at all.

My initial approach was to perform a floating point calculation that's in-a-known-range-but-not-constant, then convert the result to integer, and use that as i. But that doesn't work, because LLVM... doesn't actually track ranges for floating-point values. LLVM does track ranges for integer values (see ConstantRange.cpp), and propagate them through expressions (see SCCPSolver.cpp). But for floating-point values, the most LLVM does is track a few properties like is-nan and is-positive (see FPClassTest in ValueTracking.cpp), and even those properties aren't propagated across float-to-int conversions, let alone across the special saturating float-to-int conversion intrinsics that Rust uses.

In lieu of this (and skipping several other dead ends I ran into), the best option was to find a pass that performed flow-sensitive constant evaluation. It would need to constant-evaluate the result of an IR instruction under some assumptions, so that the evaluation could successfully produce a constant, yet it would not be legal for the pass to just replace the instruction with that constant (which would only be legal if the instruction result were unconditionally constant).

But most passes are entirely flow-insensitive. That includes SCCP, the pass that tracks integer ranges and would be most likely to elide a bounds check. It doesn't even handle sets of values - e.g. it doesn't track facts like 'foo might be 0, 2, or 10', which might prompt it to try evaluating a later instruction on each of the possible values. It only handles ranges.

I also looked at ScalarEvolution, which handles loops and has its own range-tracking system, but that's also mostly flow-insensitive.

...Except for its "brute force" loop evaluation, which goes through each loop iteration one by one and is thus extremely flow-sensitive! I was happy when I found that buried in the middle of the file. And it turns out that ScalarEvolution (well, really its companion IndVarSimplify) is also capable of removing loop "exit" branches if they're redundant with earlier exits (i.e., if the condition were true, the loop would have already been exited by that point). In the proof of concept I eventually came up with, the bounds check counts as an exit branch (since panicking exits the loop), and the analysis decides that it's redundant with the branch corresponding to the while condition itself.

@comex

comex commented Oct 23, 2023

Addendum:

To be fair, that example doesn't limit the operations performed with the non-default rounding mode to intrinsics or inline assembly. The whole thing is performed in a non-default mode. It's rather meant as a demonstration that UB really means UB and not just "FP calculations might give incorrect results".

It shouldn't be hard to change it so that only the problematic FP operation is performed with a non-default mode. And it should also be possible to change it to use an intrinsic. After all, intrinsics are subject to constant folding and other optimizations just like 'regular' floating point operations.

Inline assembly, however, is exempt from most of those optimizations.
If your code is literally just "change rounding mode; run inline assembly; change rounding mode back", then the only way to break it would be if LLVM decided to move some random other floating-point code to between the rounding mode changes. That's certainly allowed in theory, but probably impossible in practice, just because there's no reason for an optimization to see such a move as beneficial. But I wouldn't suggest hanging your program's correctness on such assumptions.
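That pattern can be made more robust by putting the MXCSR save/modify/restore and the FP operation all inside a single asm! block, so there is no window at all for the compiler to schedule Rust-level FP code between the mode changes. An x86_64 sketch (a hedged illustration, assuming the MXCSR rounding-control field is bits 13:14, i.e. the 0x6000 mask, with 11b meaning round-toward-zero):

```rust
use core::arch::asm;

/// Compute `x / y` with round-toward-zero. The rounding-mode change, the
/// division, and the restore all live in one asm! block, so no Rust-level
/// FP operation can ever observe the non-default mode.
unsafe fn div_toward_zero(x: f32, y: f32) -> f32 {
    let mut out = x;
    let mut saved: u32 = 0;
    let mut scratch: u32 = 0;
    asm!(
        "stmxcsr [{saved}]",       // save the current MXCSR
        "mov {t:e}, [{saved}]",
        "or {t:e}, 0x6000",        // RC field (bits 13:14) = 11b: toward zero
        "mov [{scratch}], {t:e}",
        "ldmxcsr [{scratch}]",     // enter round-toward-zero mode
        "divss {out}, {y}",        // the one operation run under that mode
        "ldmxcsr [{saved}]",       // restore the original MXCSR
        saved = in(reg) &mut saved as *mut u32,
        scratch = in(reg) &mut scratch as *mut u32,
        t = out(reg) _,
        out = inlateout(xmm_reg) out,
        y = in(xmm_reg) y,
    );
    out
}
```

Compare with the examples at the top of this issue: here nothing outside the asm! block ever executes under the non-default environment, so there is no boundary for LLVM to move code across.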

@chorman0773
Contributor Author

But I wouldn't suggest hanging your program's correctness on such assumptions.

In my case, at least, there aren't any floating-point operations in sight (beyond stuff in inline-assembly). Inlining would be a thing, but this code is on the other side of a staticlib/dylib and LTO is off (not that the calls that care about fp-env could possibly LTO with llvm-compiled code anyways - this is being called by lccc's codegen). I'd prefer a more well-defined solution, though this is probably good until said solution exists.

@Muon

Muon commented Oct 23, 2023

Sure thing, boss! Here is a program that segfaults in release mode, where the only unsafe code is a call to fesetround (which changes the floating point environment's rounding mode). Tested on Linux and macOS. In short, LLVM removes a bounds check based on assumptions that are valid in the default rounding mode, but we execute the code in a different mode where those assumptions are invalid, and the program ends up indexing out of bounds.

That's delightful. I am surprised to learn that LLVM does not perform any range tracking on floating-point variables. Though I suppose if it did optimize things like that more aggressively perhaps that would expose too many bugs with its x87 handling.

@RalfJung
Member

RalfJung commented Oct 23, 2023

Sure thing, boss! Here is a program that segfaults in release mode, where the only unsafe code is a call to fesetround (which changes the floating point environment's rounding mode). Tested on Linux and macOS. In short, LLVM removes a bounds check based on assumptions that are valid in the default rounding mode, but we execute the code in a different mode where those assumptions are invalid, and the program ends up indexing out of bounds.

That's amazing, thanks a ton. :) If I truly were your boss, you'd get a promotion. :D

If it were a constant, LLVM would just replace the argument to get_unchecked with whatever the constant value is, and not execute the floating point computation at runtime at all.

It would still index the wrong element though? So one could then unsafely assert that we saw the right element and we would reach an unreachable_unchecked that should be unreachable, and that'd still be a miscompilation?

EDIT: Ah no it would of course index the right element, since it'd do the computation with default rounding mode. Yeah that is quite tricky, amazing that you found an example!

@HadrienG2

HadrienG2 commented Dec 12, 2024

Assuming our beloved compiler backends can fix their broken semantics to allow it, I would argue that it makes a lot of sense for Rust to provide opt-in support for FTZ/DAZ mode (ideally in selected code regions, so the rest of the code is not penalized) because...

  1. On Intel CPUs, the performance impact of denormals is huge (a 10~100x slowdown in FMA-heavy code). Though interestingly, AMD somehow managed to cut it down to much less according to my tests. I wish Intel could learn from them here, but in any case we'll need to support Intel's current CPUs for many years to come at my workplace...
  2. The amount of affected code is equally enormous, because anything that looks like a decaying exponential will eventually enter the denormal range if left at rest long enough. And unfortunately, decaying exponentials are everywhere, because they are the analytical solution of time-based differential equations describing real-world systems with negative feedback effects. Which means that you will often find them in the output of common signal processing algorithms (anything that looks like a low-pass filter) and numerical simulations of many phenomena in natural sciences (physics, biology, chemistry...).

@HadrienG2

HadrienG2 commented Dec 13, 2024

I think what I'd love to have is something like this:

fn do_compute() {
   // ... normal rust code ...

   // Denormals-sensitive code path starts here
   #[flush_denormals]
   // At this opening brace, three things happen:
   //
   // 1. A backend optimization barrier akin to an AcqRel atomic op is inserted,
   //    preventing floating-point code and constructs like function calls that
   //    can indirectly lead to the execution of floating-point code from
   //    being reordered after the upcoming change in FP environment
   //    configuration.
   // 2. The CPU floating-point environment is saved if needed (see below) then
   //    modified so that denormals start being flushed to zero.
   // 3. A backend optimization barrier akin to an AcqRel atomic op is inserted,
   //    preventing FP code (as defined above) inside the following block
   //    from being reordered before the floating-point environment change.
   {
       // The code that is generated inside of this code block is annotated at
       // the backend level to disable the backend's assumption that the
       // floating-point environment is in the default configuration. Indeed, if
       // the backend provides the appropriate annotations for that, we can even
       // explicitly tell it that we're using a denormals-are-zero environment
       // to reduce the loss of backend optimizations.
       //
       // Note that escaping this code block's scope via e.g. calls to functions
       // that are built in the default "assume default FP env" compiler backend
       // configuration is UB. We could handle this much like we handle
       // functions with `#[target_features]` annotations in regular code that
       // does not have these annotations (i.e. make all function calls unsafe
       // unless the functions are themselves annotated with some kind of
       // `#[flush_denormals]`-like attribute), but the ergonomics would be very
       // poor as any nontrivial use of `#[flush_denormals]` would be full of
       // unsafe even when we can trivially have the compiler backend Do The
       // Right Thing with e.g. FP arithmetic.
       //
       // Instead, it would be better to have a way to automagically force the
       // compiler backend to generate two copies of every function that is
       // invoked here, one with normal FP semantics and one with
       // `#[flush_denormals]` semantics. In that case, the only thing that
       // would be unsafe would be calling to a function that cannot be
       // transparently duplicated (think `extern`, `#[no_mangle]`...).

   // At this closing brace (or if the scope is exited in any other way like
   // panics etc), the floating point environment is restored using a procedure
   // similar to that used to set it:
   //
   // 1. A backend optimization barrier akin to an AcqRel atomic op is inserted
   //    to prevent FP code inside the previous block from being reordered
   //    after the upcoming floating-point environment change.
   // 2. The CPU floating-point environment is restored using the previously
   //    saved copy, or just reset to the expected default if we fully assume a default
   //    FP environment like LLVM seemingly does.
   // 3. A backend optimization barrier akin to an AcqRel atomic op is inserted
   //    to prevent FP code after the end of the block from being reordered
   //    before the floating-point environment change.
   }

    // ... back to normal rust code ...
}

But if that's too difficult to implement, I can totally live with a function-scoped attribute (#[flush_denormals] fn gotta_go_fast {}).

A global FTZ/DAZ compiler option would be more problematic on the other hand because some numerical algorithms do depend on proper denormals behavior for correctness. Think about e.g. iterative algorithms that run until estimated error gets below a certain threshold: in this case the error estimate computation can easily end up relying on Sterbenz's lemma for correctness, as nicely highlighted by this amazing bug report.

@RalfJung
Member

RalfJung commented Dec 13, 2024

There's quite a big design space here, e.g. one could also imagine specifying the rounding mode and other aspects like denormal handling for each operation. That'd make a lot more sense semantically, and at least some ISAs (RISC-V) I hear are designed in a reasonable (non-stateful) way and support setting such flags on each instruction.
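For reference, the RISC-V F extension does encode the rounding mode as a static per-instruction field (an illustrative sketch, not from this thread):

```
# RISC-V: rounding mode is a field of each FP instruction, no global state
fdiv.s  fa0, fa0, fa1, rtz   # divide, round toward zero
fdiv.s  fa2, fa2, fa3, rne   # divide, round to nearest, ties to even
fdiv.s  fa4, fa4, fa5, dyn   # opt back in to the dynamic mode in fcsr
```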

So, this will require someone or a small group of people proposing a t-lang project and working out some reasonable solutions here. It might require work on the LLVM side, too. t-opsem / UCG can help figure out the spec for concrete proposals, but we don't have the capacity to push for entirely new language extensions like this ourselves.

I don't think this issue is the right place to discuss the solution space here. I think the original question has been answered (yes, this is UB). The thing that's left before closing the issue is making sure this is properly documented. I am not entirely sure where such docs would go though... somewhere in the reference where we explain the assumptions Rust makes about the surrounding execution environment, but I don't think we have such a place yet?

@RalfJung RalfJung added the S-pending-documentation Status: The issue is "resolved," but this resolution needs documentation label Dec 13, 2024
@HadrienG2

Thanks for the feedback anyway. I must admit that I'm a bit lost in the communication channels that the Rust project uses. What do you think is the best place to bring this discussion to see if there are enough other interested people? t-lang on the rust-lang Zulip? internals.rust-lang.org? Somewhere else?

@RalfJung
Member

I'd start by writing up some pre-RFC draft and circulating it on Zulip and/or IRLO.
