Is changing the floating-point environment for intrinsic/assembly code UB? #471
@rustbot label +A-floats
You can't know that no Rust-level floating-point operations are affected. The compiler is allowed to move floating-point operations from elsewhere to in between the SET_ROUNDING_MODE calls. This is not a bug: floating-point operations are pure operations in the Rust AM and can be reordered arbitrarily. rustc thus has to implement them in a way that their behavior does not depend on any environmental flags, and it does that by having "the rounding mode is round-to-nearest" in its representation invariant that relates the low-level target state to the high-level AM state. So yes, this is UB.
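To make the hazard concrete, here is a minimal sketch (my example, not code from this issue; it uses the now-deprecated `_mm_getcsr`/`_mm_setcsr` intrinsics, and the MXCSR bit masks are x86-64 specifics):

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::{_mm_getcsr, _mm_setcsr};

#[cfg(target_arch = "x86_64")]
fn div_rounding_up(x: f32, y: f32) -> f32 {
    unsafe {
        let saved = _mm_getcsr();
        // Set MXCSR rounding control (bits 13-14) to round toward +infinity.
        _mm_setcsr((saved & !0x6000) | 0x4000);
        // Nothing in the Rust AM anchors this division between the two
        // _mm_setcsr calls: the optimizer may freely move it out.
        let q = x / y;
        _mm_setcsr(saved);
        q
    }
}
```

Since `x / y` is a pure operation to the compiler, it may be computed before the first `_mm_setcsr` or after the second, which is exactly why this pattern is UB.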
My next question is whether it should be UB. If even being in (any) Rust code with an incorrect floating-point environment is UB, then that implicates quite a few things: …
FWIW C code compiled with LLVM (and likely most other compilers) has the same UB. This is not just a Rust problem.

I don't know anything about any of your examples, but if a platform ABI requires FP env mutation (as you claim for point 1) then that's already a highly problematic ABI at best. Which ABI requires FP env mutation? Therefore I also don't buy your second example; if that runtime support is implemented in C, then it already implicitly assumes a default FP environment. The C ABI (on pretty much any target) requires the FP env to be in default state.

Allowing modification of the FP environment without compromising optimizations on the 99.9% of functions that are happy with the default FP settings is not easy. We'd have to add some notion of "scope within which the compiler does not assume that the FP environment is in the default state". This would have to interact properly with inlining and things like that. It's not impossible, but it requires a bunch of design work.
clang supports the `#pragma STDC FENV_ACCESS` pragma.
More so where the FP env is stored in an ABI-specified location, so I cannot emulate the FP env in the library. Practically every ABI I'm aware of specifies the platform effects of the `<fenv.h>` functions.
The main example is libsfp, a runtime support library used by lccc to implement floating-point operations (of various sizes, among others), including those in a function marked with `#pragma STDC FENV_ACCESS ON`.
This is not the case for x86_64 Sys-V or MSABI. Both require that the floating-point environment is initialized to default (and specify that default), and then mark the control bits of the relevant registers (MXCSR and the x87 FCW) as callee-saved, with the exception of functions that intentionally modify the floating-point environment (and are documented as doing so).
If this were the case, then being in an async-signal handler would be immediate UB (as I noted is the case for Rust), which is most definitely not the case.
As far as I know, C/C++ code compiled with clang without special markers behaves exactly like Rust code wrt float operations, and hence has the same UB. I assume basically all C/C++ compilers will treat float operations as pure (unless some special marker is set to indicate that a scope has a non-default FP env) and hence move them around (including out of loops and out of potentially dead code). This makes mutating the FP environment UB in the real C that compilers implement, whether or not the standard agrees.

I have no interest in specifying Rust in a way that is disconnected from reality, so we should call this UB as that's what it is. Maybe it's UB because compilers are not compliant, but how's that helpful? It isn't, unless you have a proposal for how the standard can be implemented in a reasonable way.
That can't be true. When I write an … All of this shows that the rounding mode is part of the de facto ABI. If the documentation disagrees, then the documentation doesn't reflect the contract used in real-world software.
If the async signal handler uses any float operation, then it's most definitely UB. There's also nothing in the LLVM LangRef that would forbid the compiler from introducing a new float operation into code that doesn't use a float operation. This can be as subtle as `if !in_signal { some_float_op(); }` and hoisting the operation out of the branch.

Sounds like async signal handlers need to be compiled with such a "FP env might be in non-default state" kind of a scope, otherwise there's no way they can be sound. Also sounds like nobody really thought this entire FP env thing through to the end and different parts of the ecosystem made mutually incompatible choices, and now it's all busted. 🤷 It doesn't get better by pretending that it's not UB, though.
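As a sketch of that hoisting hazard (hypothetical code, not from the issue):

```rust
// What the programmer wrote: no float op executes while in_signal is true.
fn source(in_signal: bool, x: f64) -> f64 {
    if !in_signal { x * 0.1 } else { 0.0 }
}

// What the optimizer may effectively produce, since `x * 0.1` is "pure":
fn after_hoisting(in_signal: bool, x: f64) -> f64 {
    let t = x * 0.1; // now executes even when in_signal is true
    if !in_signal { t } else { 0.0 }
}
```

If the signal handler runs under a non-default FP environment, the hoisted multiply executes under that environment even though the source code never performs a float operation on that path.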
As far as I am aware gcc (and msvc) implement the behaviour as prescribed. If clang does not, I would consider that a bug in clang and certainly not any behaviour I would desire to emulate in lccc.
This is either an extra constraint imposed by Rust that does not reflect any actual C ABI, or an incorrect reliance. As far as I am aware, every C ABI is compliant with the relevant sections of ISO 9899 in this regard. Knowing the precise behaviour of the Clever-ISA ABI, I can quote the relevant text, though x86_64 Sys-V is similar (albeit less formal).
(if you'd like, I can find the relevant portions of the x86_64 Sys-V spec and MSVC ABI)
GCC says "Without any explicit options, GCC assumes round to nearest or even". It's unclear what that means, but it's far from obvious that it means "GCC guarantees that the code will work correctly under all FP environments". LLVM is very clear: "The default LLVM floating-point environment assumes that traps are disabled and status flags are not observable. Therefore, floating-point math operations do not have side effects and may be speculated freely. Results assume the round-to-nearest rounding mode, and subnormals are assumed to be preserved." You might want to bring this up with the LLVM people if you think that's an issue. Usually @comex is very good at getting compilers to apply the right optimizations in the right order to demonstrate an end-to-end miscompilation, maybe they can do it here, too? :)
What is this claim based on? Is it some standard that says so, or are there really targets and OSes where the kernel doesn't save and restore the FP environment when switching from a thread to its signal handler and back?
Do you have any evidence that every single library with a C ABI is actually expected to work correctly under arbitrary FP environments, and that library authors consider it a bug when their library misbehaves under non-default FP environments? As I said before, I care not only about what it says in some piece of paper but also about what is actually done in the real world. When standard and reality disagree, it's not automatically reality that's wrong. Sometimes the standard is just making completely unrealistic prescriptions that everybody ignores, and the standard should be fixed.

It's also unclear to me which alternative you are suggesting. Could you make a constructive proposal? Here are some options, and you can already immediately see why many people won't like them: …
You are asking everyone to pay for a feature that hardly anyone needs. Is that your position, or do you see a better way out here?

If you further want to claim that even other aspects besides the rounding mode may be changed, such as making sNaNs trigger a trap, then either passing an sNaN to an FP operation is UB, or FP operations cannot be reordered at all with anything any more (e.g., reordering an FP operation and a store becomes illegal, since the trap makes it fully observable when exactly an operation happens).
Linux definitely restores the FP environment when it enters a signal handler. There was a big kerfuffle about it back in 2002 when SSE2 arrived (https://yarchive.net/comp/linux/fp_state_save.html). I think FreeBSD might be a target that actually does not restore the FP environment when entering a signal handler, but I am unsure (https://reviews.freebsd.org/D33599). In any case, glibc says that …

The SYSV ABI (https://gitlab.com/x86-psABIs/x86-64-ABI) stipulates that the FP control bits are callee-saved, meaning that the callee needs to restore them if it changes them. (Presumably an exception is intended for the `<fenv.h>` functions themselves.)

The main consideration for Rust is that it is ultimately bound by LLVM's quirks. LLVM has made (is still making?) progress towards letting Clang support `#pragma STDC FENV_ACCESS`.

Additionally, the C23 standard (and possibly earlier revisions) specifies in Section 7.6.1 "The FENV_ACCESS pragma" that it is UB to, under a non-default floating-point environment, execute any code that was compiled with the pragma set to off.
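For reference, the C-side `<fenv.h>` interface mentioned above is callable from Rust; a sketch using the `libc` crate (assuming a Linux target where `fegetround`/`fesetround` and the `FE_*` constants are exposed — and note that running any Rust float op inside `f` is still the UB under discussion):

```rust
fn with_round_upward<R>(f: impl FnOnce() -> R) -> R {
    unsafe {
        let old = libc::fegetround();
        libc::fesetround(libc::FE_UPWARD); // a documented fenv-mutating function
        let r = f(); // UB if this executes any Rust-level float operation
        libc::fesetround(old);
        r
    }
}
```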
@Muon thanks! So looks like in practice, signal handlers are fine on our tier 1 targets, but other targets are having issues. (Also I heard that some versions of WSL do not save and restore the FP env for signal handlers.)
For this to be useful for compilers, the exception needs to be compiler-readable. Connor quoted above some wording saying that if the function is "documented" to change the FP state then it may do so, but of course that's not very useful. This is similar to how `setjmp` needs an attribute so that the compiler can understand that something very weird is going on. (Though floats are different in that, as far as I can see, even with such an attribute there'd be a global cost.)
For Rust (and code compiled with clang) this means all functions have such an undocumented assumption.
FWIW, in my opinion this is a case of bad ISA design. ISAs chose to introduce some global mutable state and, as usual, global mutable state is causing problems. ISAs should provide opcodes that entirely ignore the FP status register so that languages can implement the desired semantics (floating-point operations that do not depend on global mutable state) properly. But it seems like even RISC-V repeats this mistake, so we'll be stuck with hacks and quirks for many decades to come. Languages can choose to either make those ISA features basically inaccessible, to penalize all users for the benefit of the tiny fraction that actually wants a non-default FP status register, or to introduce syntactic quirks that mark where in the code floating-point opcodes behave in non-default ways.
Global mutable state includes registers and status flags. While it would have been nice to avoid it in this instance, it's seemingly unavoidable in general, and compiler writers deal with it as a matter of course. I'd say that programming languages have a pretty free hand here. You have to indicate the rounding somehow, anyway. I'm not sure how it's possible to expose alternate rounding modes without syntactic quirks of some variety!
General registers are a lot less global -- most instructions take a list of input/output registers and only act on them. The FP status register is very different from that. This would be very easy to avoid if FP operations had a bit in the opcode which says "ignore status register, use default instead", but sadly no ISA seems to have such a bit. There's a reason we're having this problem with the FP status register but not with any of the others. (Yes, of course there's a whole bunch of status registers, but the others don't affect very basic operations such as addition of a primitive type, and so far there's been no complaint about Rust requiring them all to be in a particular fixed state that ensures the instructions we need behave the way we need them to behave.)
At least on x86, the comparison, overflow, and other flags govern conditional jumps and moves. I'd say these are extremely basic operations. The carry flag makes add-with-carry work. Codegen routinely has to deal with setting and clearing these appropriately.
Condition flags are caller-saved and undefined on function entry, but the float env is callee-saved, and most code expects the default float env to be the current float env. This means that nobody sets the float env to the value it expects. And presumably changing the float env is rather expensive, unlike changing condition flags.
I didn't say it was trivial or cheap, just that this sort of thing is already handled routinely.
AVX-512 and AVX10 allow you to suppress exception checking (the SAE, "suppress all exceptions", bit in the EVEX encoding) …
(Of course, that's AVX512, so... - heh)
Note that there is some more discussion on Zulip: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Is.20it.20UB.20to.20change.20rounding.20modes.20without.20using.20float.20ops (which also led to this issue being created)
Some instructions do have encodings for explicit rounding modes. On x86, SSE4.1 has https://www.felixcloutier.com/x86/roundss ("round float to an integer value"), which allows overriding the rounding mode for that instruction, as well as suppressing the inexact exception. That doesn't mean it ignores the control register though; an SNaN input will still signal invalid operation, which may or may not be masked in MXCSR. [And as chorman0773 said above:] More recently, AVX-512 has some explicit rounding/exception control bits in the instruction prefix, so that you can do the same with most computational operations as well.

I believe a problem with having instruction encodings that ignore the FP control register is that their presence would effectively cripple the uses of that control register. E.g. the only reason to have SNaNs in your program execution is that you want to trap and handle an exception when the CPU tries to use that value, which would no longer be reliable if some function somewhere has chosen to sidestep that.
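That SSE4.1 instruction is exposed in Rust; a minimal sketch of the per-instruction override (my example, not from the thread):

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::{
    __m128, _mm_cvtss_f32, _mm_round_ss, _mm_set_ss, _MM_FROUND_NO_EXC,
    _MM_FROUND_TO_POS_INF,
};

// Rounds `x` up to an integral value regardless of the MXCSR rounding mode,
// suppressing the inexact exception. Requires SSE4.1 at runtime.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn round_up(x: f32) -> f32 {
    let v: __m128 = _mm_set_ss(x);
    _mm_cvtss_f32(_mm_round_ss::<{ _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC }>(v, v))
}
```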
Trapping SNaN is UB in LLVM. Manually checking for NaN at the end is likely faster than disabling the optimizations that depend on float ops never trapping.
SNaNs are a failed experiment that many major C compilers (at least GCC and clang) entirely refuse to support [*], since their impact on the ability to optimize language-level code is horrendous. So that feature is already crippled; I don't care about crippling it more.

[*] in the sense of: the float operations that correspond to built-in syntax may or may not actually trap if you set the trap bit, and if they do trap that's UB. You can still have dedicated special float ops that do honor the trap bit, but having it apply to every single float operation is the problem.

Having opcodes that ignore the trap bit would actually help SNaN support, since then it wouldn't be UB to set the trap bit any more!
Even with LLVM's constrained floating-point intrinsics? That is, an operation marked `fpexcept.maytrap` or `fpexcept.strict` in a function marked `strictfp`?
I am not claiming that running Rust code with a non-default FP environment isn't UB. It is a good thing if indeed it is, because at least that is compatible with adding that support in the future. Yes, that would require some form of opt-in for any code that wants to use it. But supposing …
Sure, in a non-default context we could treat all float ops as strict float ops.
But if ISAs had "default float mode" ops, we could just have strict float intrinsics and a wrapper type that used them, and this entire business with having to mark the context with an attribute/pragma would be completely unnecessary.
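A hypothetical sketch of such a wrapper type (names like "strict_fadd" are made up; Rust does not currently expose LLVM's constrained intrinsics, so the arithmetic here is only a placeholder):

```rust
use core::ops::Add;

#[derive(Clone, Copy)]
struct EnvF64(f64);

impl Add for EnvF64 {
    type Output = EnvF64;
    fn add(self, rhs: EnvF64) -> EnvF64 {
        // Imagine this as `strict_fadd(self.0, rhs.0)`: an operation the
        // optimizer may not reorder past FP-environment changes and whose
        // result honors the current dynamic rounding mode.
        EnvF64(self.0 + rhs.0) // placeholder; a real impl needs compiler support
    }
}
```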
This may be somewhat tangential, but from what I gather from the IEEE 754 standard, it requires language-defined means of specifying a constant value of the attributes (rounding mode, and others as available) that applies to the operations within a given block (language-defined syntactic units). It then recommends also supporting dynamic-mode specification of those attributes, where the point at which the mode is dynamic is similarly specified for a given block. That is, it seems that some form of scoped implicit modes is expected, and it would not be sufficient to just have a dynamic variable for it without also providing the scoping mechanism.

Moreover, this clarified to me that the intent is not for there to be some global state that all floating point operations must take into account by default. Rather, all floating point operations can have various attributes applied to them syntactically, and choosing to use the dynamic attributes is merely one of those options. With that in mind, and if the practicalities of the hardware implementation were of no concern, it would indeed seem ideal for an ISA to augment each floating-point instruction with bits to choose each attribute from its constant options or the dynamic state.
Speaking specifically to the example in the OP, it would be convenient (to people writing such code) if we could specify it such that any sequence consisting exclusively of …
could be guaranteed to work as defined for that sequence of operations on the target architecture, not just solely as defined for the Rust Abstract Machine. Note the deliberate absence of primitive arithmetic from this list. This ideally means that code motion of floating point operations assuming the default FP environment is only restricted from crossing computation which could potentially mutate the FP environment, namely …

But this is all highly aspirational with the semantics of the current LLVM toolchain. Yes, the documented semantics work properly under fenv access if you use the appropriately strict fp intrinsics... everywhere within the compilation/optimization unit. As soon as you have a single nonstrict fp operation, it could potentially undergo code motion and be executed anywhere as a supposedly pure operation. Even outside the "FP critical" section your FP ops need to be strict in order to keep them from moving into the "FP critical" section.

Also note that a portion of this comes down again to where you draw the line between defining ABI, calling convention, signature, pre/postcondition, etc. If the description of the sysv64 ABI stating that the FP environment status bits are callee-saved except for when they aren't is accurate, then that is an unfortunate bit of shared global mutable state enshrined as part of the ABI, or minimally an acceptance of ABI nonconformity "when documented" in order to deal with uncomfortable reality. The "way out", if the entire software ecosystem is willing, is ABI "colors" within the one ABI, like we have `extern "C"` and `extern "C-unwind"`.

As a final note, note that standard C was from the beginning meant to be descriptive of what could be generally consistent behavior across implementations. If no represented implementation implements …
As a note, cppreference.com is out of date. Of the major compilers, currently MSVC and Clang both support enabling access to the floating-point environment (`#pragma STDC FENV_ACCESS` in Clang, `#pragma fenv_access` in MSVC). Regarding the ABI issue, you can place …

Also, the C standard long ago moved on from describing existing C implementations to prescribing new behaviors for them (since at least C99). Also, it is indeed the case that a compliant C compiler can simply not define any of the rounding mode macros.
In my case, I'd need integer arithmetic. When I can't easily do the floating-point operations using asm/vendor intrinsics, I fall back on a platform-agnostic software FP implementation.
From that cppreference text, I'm unsure. And indeed, in the standard there are several occurrences of "the implementation may assume", meaning either … (for example, "The implementation may assume that loops that do not have a constant condition will eventually terminate" implies undefined behaviour). Apparently the text in the standard (N1256, a Committee Draft for C99) states …
So it is indeed correct that it is considered blanket UB, and my previous assertion that it was simply "Floating-point ops may behave wrongly" was incorrect. I will concede that point.
clang generates a …
It probably could, but GLIBC does define all of the macros (on most platforms I'm aware of), so it does expose this functionality.
However, an ABI which doesn't require the environment to be the default but still preserves the callee-saved parts, say …
How would this work for default-ABI functions? Would everything need to be …
Well, having … Using the trait system could be more promising, if only you could implement traits for function items:

```rust
// May be implemented by functions that support arbitrary
// floating-point environments.
//
// Safety: The function may only call other `FenvAware` functions.
unsafe trait FenvAware {}
```

Assuming there would be some easy way to safely implement it when the body can be checked, having it would add the necessary constrained attributes to the LLVM IR.
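A hypothetical usage sketch (assuming function items could implement the trait, which today's Rust does not allow):

```rust
// A combinator that changes the FP environment around a callee; the bound
// makes the callee promise (via the unsafe trait impl) that it tolerates a
// non-default FP environment.
fn with_custom_fenv<F>(f: F)
where
    F: Fn() + FenvAware,
{
    // ... save the FP env, modify it, call f(), restore it ... (elided)
    f();
}
```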
Based on https://gcc.gnu.org/wiki/FloatingPointMath, you need `-frounding-math`.
Sure thing, boss! Here is a program that segfaults in release mode, where the only unsafe code is a call to `get_unchecked`.

How I made this: This was pretty hard. If you imagine an array indexing operation

```rust
if i >= COUNT {
    panic();
}
*vals.get_unchecked(i)
```

...then my basic goal was to have …

My initial approach was to perform a floating point calculation that's in-a-known-range-but-not-constant, then convert the result to integer, and use that as `i`.

In lieu of this (and skipping several other dead ends I ran into), the best option was to find a pass that performed flow-sensitive constant evaluation. It would need to constant-evaluate the result of an IR instruction under some assumptions, so that the evaluation could successfully produce a constant, yet it would not be legal for the pass to just replace the instruction with that constant (which would only be legal if the instruction result were unconditionally constant). But most passes are entirely flow-insensitive. That includes …

I also looked at … ...except for its "brute force" loop evaluation, which goes through each loop iteration one by one and is thus extremely flow-sensitive! I was happy when I found that buried in the middle of the file. And it turns out that …
Addendum: To be fair, that example doesn't limit the operations performed with the non-default rounding mode to intrinsics or inline assembly. The whole thing is performed in a non-default mode. It's rather meant as a demonstration that UB really means UB, and not just "FP calculations might give incorrect results". It shouldn't be hard to change it so that only the problematic FP operation is performed with a non-default mode. And it should also be possible to change it to use an intrinsic. After all, intrinsics are subject to constant folding and other optimizations just like 'regular' floating point operations. Inline assembly, however, is exempt from most of those optimizations.
In my case, at least, there aren't any floating-point operations in sight (beyond stuff in inline assembly). Inlining would be a thing, but this code is on the other side of a staticlib/dylib and LTO is off (not that the calls that care about fp-env could possibly LTO with llvm-compiled code anyways - this is being called by lccc's codegen). I'd prefer a more well-defined solution, though this is probably good until said solution exists.
That's delightful. I am surprised to learn that LLVM does not perform any range tracking on floating-point variables. Though I suppose if it did optimize things like that more aggressively, perhaps that would expose too many bugs with its x87 handling.
That's amazing, thanks a ton. :) If I truly were your boss, you'd get a promotion. :D
It would still index the wrong element though? So one could then unsafely assert that we saw the right element, and we would reach an `unreachable_unchecked`.

EDIT: Ah no, it would of course index the right element, since it'd do the computation with the default rounding mode. Yeah, that is quite tricky; amazing that you found an example!
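A sketch of that "unsafely assert" idea (hypothetical, not the actual exploit):

```rust
// If the optimizer "proves" i < 4 from a float computation done under the
// wrong rounding-mode assumption, this hint turns a wrong proof into UB.
fn get(vals: &[f64; 4], i: usize) -> f64 {
    if i >= 4 {
        unsafe { core::hint::unreachable_unchecked() } // UB if ever reached
    }
    vals[i]
}
```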
Assuming our beloved compiler backends can fix their broken semantics to allow it, I would argue that it makes a lot of sense for Rust to provide opt-in support for FTZ/DAZ mode (ideally in selected code regions so the rest of the code is not penalized) because...
I think what I'd love to have is something like this:

```rust
fn do_compute() {
    // ... normal rust code ...

    // Denormals-sensitive code path starts here
    #[flush_denormals]
    // At this opening brace, three things happen:
    //
    // 1. A backend optimization barrier akin to an AcqRel atomic op is inserted,
    //    preventing floating-point code and constructs like function calls that
    //    can indirectly lead to the execution of floating-point code from being
    //    reordered after the upcoming change in FP environment configuration.
    // 2. The CPU floating-point environment is saved if needed (see below), then
    //    modified so that denormals start being flushed to zero.
    // 3. A backend optimization barrier akin to an AcqRel atomic op is inserted,
    //    preventing FP code (as defined above) inside the following block from
    //    being reordered before the floating-point environment change.
    {
        // The code that is generated inside of this code block is annotated at
        // the backend level to disable the backend's assumption that the
        // floating-point environment is in the default configuration. Indeed, if
        // the backend provides the appropriate annotations for that, we can even
        // explicitly tell it that we're using a denormals-are-zero environment
        // to reduce the loss of backend optimizations.
        //
        // Note that escaping this code block's scope via e.g. calls to functions
        // that are built in the default "assume default FP env" compiler backend
        // configuration is UB. We could handle this much like we handle
        // functions with `#[target_features]` annotations in regular code that
        // does not have these annotations (i.e. make all function calls unsafe
        // unless the functions are themselves annotated with some kind of
        // `#[flush_denormals]`-like attribute), but the ergonomics would be very
        // poor as any nontrivial use of `#[flush_denormals]` would be full of
        // unsafe even when we can trivially have the compiler backend Do The
        // Right Thing with e.g. FP arithmetic.
        //
        // Instead, it would be better to have a way to automagically force the
        // compiler backend to generate two copies of every function that is
        // invoked here, one with normal FP semantics and one with
        // `#[flush_denormals]` semantics. In that case, the only thing that
        // would be unsafe would be calling a function that cannot be
        // transparently duplicated (think `extern`, `#[no_mangle]`...).

        // At this closing brace (or if the scope is exited in any other way,
        // like panics etc.), the floating point environment is restored using a
        // procedure similar to that used to set it:
        //
        // 1. A backend optimization barrier akin to an AcqRel atomic op is
        //    inserted to prevent FP code inside the previous block from being
        //    reordered after the upcoming floating-point environment change.
        // 2. The CPU floating-point environment is restored using the previously
        //    saved copy, or just reset to the expected default if we fully
        //    assume a default FP environment like LLVM seemingly does.
        // 3. A backend optimization barrier akin to an AcqRel atomic op is
        //    inserted to prevent FP code after the end of the block from being
        //    reordered before the floating-point environment change.
    }

    // ... back to normal rust code ...
}
```

But if that's too difficult to implement, I can totally live with a function-scoped attribute (…).

A global FTZ/DAZ compiler option would be more problematic on the other hand, because some numerical algorithms do depend on proper denormals behavior for correctness. Think about e.g. iterative algorithms that run until the estimated error gets below a certain threshold: in this case the error estimate computation can easily end up relying on Sterbenz's lemma for correctness, as nicely highlighted by this amazing bug report.
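For concreteness, here is a guess at what the entry/exit of such a block might lower to on x86-64 (my sketch, not a proposed implementation; FTZ is MXCSR bit 15, DAZ is bit 6, and these `_mm_setcsr` calls are themselves the very UB under discussion absent backend support):

```rust
#[cfg(target_arch = "x86_64")]
unsafe fn enter_ftz_daz() -> u32 {
    use core::arch::x86_64::{_mm_getcsr, _mm_setcsr};
    let saved = _mm_getcsr();
    _mm_setcsr(saved | (1 << 15) | (1 << 6)); // set FTZ and DAZ
    saved // caller passes this back to `exit_ftz_daz`
}

#[cfg(target_arch = "x86_64")]
unsafe fn exit_ftz_daz(saved: u32) {
    use core::arch::x86_64::_mm_setcsr;
    _mm_setcsr(saved); // restore the saved FP environment
}
```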
There's quite a big design space here; e.g. one could also imagine specifying the rounding mode and other aspects like denormal handling for each operation. That'd make a lot more sense semantically, and at least some ISAs (RISC-V), I hear, are designed in a reasonable (non-stateful) way and support setting such flags on each instruction.

So, this will require someone or a small group of people proposing a t-lang project and working out some reasonable solutions here. It might require work on the LLVM side, too. t-opsem / UCG can help figure out the spec for concrete proposals, but we don't have the capacity to push for entirely new language extensions like this ourselves. I don't think this issue is the right place to discuss the solution space here.

I think the original question has been answered (yes, this is UB). The thing that's left before closing the issue is making sure this is properly documented. I am not entirely sure where such docs would go though... somewhere in the reference where we explain the assumptions Rust makes about the surrounding execution environment, but I don't think we have such a place yet?
Thanks for the feedback anyway. I must admit that I'm a bit lost in the communication channels that the Rust project uses. What do you think is the best place to bring this discussion to see if there are enough other interested people? t-lang on the rust-lang Zulip? internals.rust-lang.org? Somewhere else?
I'd start by writing up some pre-RFC draft and circulating it on Zulip and/or IRLO.
Disclaimer: This is not attempting to solve the general `fe_setenv` issue and allow changing the floating-point environment for floating-point code. Based on rust-lang/rust#72252, it seems the following code is currently UB:
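A sketch of the kind of code in question (my reconstruction, not the original snippet; it assumes x86-64, changes the rounding mode via intrinsics, and performs the division with the `_mm_div_ss` intrinsic rather than any Rust-level float op):

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::{
    _mm_cvtss_f32, _mm_div_ss, _mm_getcsr, _mm_set_ss, _mm_setcsr,
};

#[cfg(target_arch = "x86_64")]
fn div_round_down(x: f32, y: f32) -> f32 {
    unsafe {
        let saved = _mm_getcsr();
        _mm_setcsr((saved & !0x6000) | 0x2000); // MXCSR RC = round toward -inf
        let q = _mm_div_ss(_mm_set_ss(x), _mm_set_ss(y)); // intrinsic, not `/`
        _mm_setcsr(saved);
        _mm_cvtss_f32(q)
    }
}
```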
Likewise, the following code is also considered UB:
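And a sketch of the inline-assembly variant (again my reconstruction; here the mode switch and the division both live inside `asm!`, so no Rust-level or intrinsic float ops are involved at all):

```rust
use core::arch::asm;

#[cfg(target_arch = "x86_64")]
fn div_round_down(x: f32, y: f32) -> f32 {
    let mut q = x;
    let mut saved: u32 = 0;
    unsafe {
        // Save the current MXCSR to a stack slot.
        asm!("stmxcsr [{p}]", p = in(reg) &mut saved as *mut u32);
    }
    let down = (saved & !0x6000) | 0x2000; // RC = round toward -inf
    unsafe {
        asm!(
            "ldmxcsr [{new}]",   // switch rounding mode
            "divss {q}, {y}",    // q = q / y, rounded toward -inf
            "ldmxcsr [{old}]",   // restore the saved mode
            new = in(reg) &down as *const u32,
            old = in(reg) &saved as *const u32,
            q = inout(xmm_reg) q,
            y = in(xmm_reg) y,
        );
    }
    q
}
```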
Both of these are surprising, as no Rust-level floating-point operations are performed that would be affected by the rounding mode - only platform intrinsics or inline assembly.
These are limited examples, but a much more general example of this is a library that implements floating-point operations (including those in non-default floating-point environments, e.g. from C or C++) using a combination of software emulation, inline assembly, and platform intrinsics.
Assuming the LLVM issue mentioned in 72252 is fixed (and LLVM's code generation for the `llvm.x86.sse.div.ss` intrinsic is fixed), can we call these examples defined behaviour, or is this code simply UB for some other reason?