Unbiased floating point number in unit interval #102
Conversation
I don't see anything obviously wrong in this PR, but I can't vouch for the math of it.
It would be nice to include some references to the material that explains why this all works.
Absolutely. @idontgetoutmuch and I are thinking of writing this up more formally somewhere. Downey's draft paper explains the underlying idea, but the exposition can be improved.
@christoph-conrads, since you have created the Rademacher Floating-Point Library, which is a more robust way to generate random floating-point numbers, please take a look at this and give your thoughts on whether the approach given by Downey here is actually robust in terms of—
@peteroupc I can't read template C++ but I think that it is highly likely to be the same as the Downey approach. Also, our implementation now passes the tests in the Rademacher Floating-Point Library but is about 3 times slower than just selecting random bits for the mantissa and fixing the exponent so that the number is in the binade [1, 2). We need to write up the maths behind our / Downey's approach; I don't think comments in the code work very well for this.
Yes, I would like to see that. In addition, there should be something similar for generating any representable random floating-point number in ranges other than [0, 1). In particular, the algorithm used in the Rademacher FPL to achieve this for any range is anything but trivial, and I haven't been able to figure it out yet.
Force-pushed c3f5f47 to 7cab4e6.
The comments refer to Allen Downey's paper Generating Pseudo-random Floating-Point Values and not to the Haskell code.
yes, in [0, 1] (closed interval!)
Yes, the algorithm uses only integer arithmetic.
When we speak of generating uniformly distributed floating-point (FP) numbers, what we want to do is to generate FP numbers as if one drew a real number uniformly at random and then rounded it to a representable FP value.
In Downey's algorithm, you are using round-to-nearest (see Figure 5 in his paper). That is, Downey's algorithm may be drawing real numbers from the half-open interval
Not quite. The Rademacher Floating-Point Library takes a different route: instead of looking at one random bit at a time, RFPL takes the raw pseudo-random number generator output one 64-bit word at a time and counts its leading zeros to determine the exponent.
In integer arithmetic this can be done cheaply because counting leading zeros is a single instruction on many CPUs. The code below is a quickly coded benchmark comparing the two approaches. The RFPL approach is slightly faster on my x86_64 computer with GCC 9.2.0, but the extra implementation effort may not be worth it because you need on average only two bits and the result is probably highly compiler- and machine-dependent. Note the noinline attribute used to keep the compiler from optimizing the benchmark away.
// This code attempts to demonstrate the performance differences between drawing
// one and multiple random bits at a time when computing floating-point values
// in the unit interval.
//
// Compile with (`-DNDEBUG` is essential because it disables the assertions):
// c++ -Wextra -Wall -std=c++11 -pedantic single-vs-block.cpp
// -O2 -march=native -DNDEBUG
//
// Author: Christoph Conrads (https://christoph-conrads.name)
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <limits>
#include <random>
// this code uses the noinline attribute to prevent extreme compiler
// optimizations
// Code based on
// Allen B. Downey: "Generating Pseudo-random Floating-Point Values" (2007).
// URL: http://allendowney.com/research/rand/
typedef union box {
float f;
unsigned u;
} Box;
/* GET_BIT: returns a random bit.
 * For efficiency, bits are generated 64 at a time using std::mt19937_64.
 * (Downey's original code drew 31 bits at a time from the C library
 * function random().)
 */
int get_bit (std::mt19937_64& gen) {
int bit;
static int bits = 0;
static std::uint64_t x;
if (bits == 0) {
x = gen();
bits = 64;
}
bit = x & 1;
x = x >> 1;
bits--;
return bit;
}
/* RANDF: returns a random floating-point number in the closed range [0, 1],
 * including 0.0, subnormals, and 1.0
 */
float downey_rand(std::mt19937_64& gen) __attribute__((noinline));
float downey_rand(std::mt19937_64& gen) {
int mant, exp, high_exp, low_exp;
Box low, high, ans;
low.f = 0.0;
high.f = 1.0;
/* extract the exponent fields from low and high */
low_exp = (low.u >> 23) & 0xFF;
high_exp = (high.u >> 23) & 0xFF;
/* choose random bits and decrement exp until a 1 appears. the reason for
* subtracting one from high_exp is left as an exercise for the reader
*/
for(exp = high_exp-1; exp > low_exp; exp--) {
if (get_bit(gen))
break;
}
/* choose a random 23-bit mantissa */
mant = gen() & 0x7FFFFF;
/* if the mantissa is zero, half the time we should move to the next
* exponent range
*/
if(mant == 0 && get_bit(gen))
exp++;
/* combine the exponent and the mantissa */
ans.u = (exp << 23) | mant;
assert( std::isfinite(ans.f) );
assert( ans.f >= 0 );
assert( ans.f <= 1 );
return ans.f;
}
float rfpl_rand(std::mt19937_64& gen) __attribute__((noinline));
float rfpl_rand(std::mt19937_64& gen) {
Box low, high, ans;
low.f = 0.0;
high.f = 1.0;
/* extract the exponent fields from low and high */
auto low_exp = (low.u >> 23) & 0xFF;
auto high_exp = (high.u >> 23) & 0xFF;
// number of bits generated by the random number generator
auto num_bits = 64u;
auto exp = high_exp + 1;
for(auto exp_max = high_exp; true; exp_max -= num_bits) {
auto exp_min = (exp_max >= num_bits) ? (exp_max - num_bits) : 0u;
auto x = gen();
if(x == 0) {
if(exp_max <= num_bits)
{
exp = low_exp;
break;
}
continue;
}
// the built-ins have undefined behavior for x == 0
assert(x != 0);
auto k = __builtin_clzll(x); // x is 64-bit; __builtin_clz would truncate it
exp = exp_min + num_bits - k - 1;
break;
}
assert(exp >= low_exp);
assert(exp <= high_exp);
/* choose a random 23-bit mantissa */
auto mant = gen() & 0x7FFFFFu;
/* combine the exponent and the mantissa */
ans.u = (exp << 23) | mant;
assert( std::isfinite(ans.f) );
assert( ans.f >= 0 );
assert( ans.f < 1 );
return ans.f;
}
int main() {
auto num_downey = std::uintmax_t{0};
auto dummy_downey = float{0};
auto num_rfpl = std::uintmax_t{0};
auto dummy_rfpl = float{0};
auto gen = std::mt19937_64();
auto c_delta = 5 * CLOCKS_PER_SEC;
auto c0 = std::clock();
auto c1 = c0;
// dummy code to avoid CPU frequency scaling
auto dummy = float{1};
auto eps = std::numeric_limits<float>::epsilon();
for( ; c1 < c0 + c_delta; c1 = std::clock()) {
dummy = (1 + eps) * dummy;
}
std::printf("CPU frequency scaling dummy: %8.2e\n", dummy);
// measurement Downey approach
c0 = std::clock();
c1 = c0;
for( ; c1 < c0 + c_delta; c1 = std::clock()) {
for(auto i = 0u; i < 1000u * 1000u; ++i) {
++num_downey;
dummy_downey += downey_rand(gen);
}
}
auto t_downey_sec = 1.0 * (c1 - c0) / CLOCKS_PER_SEC;
std::printf("Downey: %8.2e per sec\n", 1.0 * num_downey / t_downey_sec);
std::printf("Downey dummy: %8.2e\n", dummy_downey);
// measurement Rademacher Floating-Point Library approach
c0 = std::clock();
c1 = c0;
for( ; c1 < c0 + c_delta; c1 = std::clock()) {
for(auto i = 0u; i < 1000u * 1000u; ++i) {
++num_rfpl;
dummy_rfpl += rfpl_rand(gen);
}
}
auto t_rfpl_sec = 1.0 * (c1 - c0) / CLOCKS_PER_SEC;
std::printf("RFPL: %8.2e per sec\n", 1.0 * num_rfpl / t_rfpl_sec);
std::printf("RPFL dummy: %8.2e\n", dummy_rfpl);
}
Force-pushed 7cab4e6 to cd51766.
@christoph-conrads thanks for your comments!
In Haskell there is System/Random/Internal.hs, line 714 at cd51766.
If I understand it correctly, rademacher-fpl uses logarithms instead of
I'm curious about this one. What's the motivation for / what are the consequences of this choice?
Intuitively it felt right to me that all significands have equal probability once the exponent is known. Now you have rounding toward zero. Consequences:
@curiousleo I think we can now close this?
I guess the approach taken in this PR is neither here nor there: while it does, I believe, give us all representable numbers in the unit interval as a starting point, we're still translating that into the target interval using floating-point operations. Quoting from #113 (comment):
So I suppose this PR as it is isn't going to fly. Either we keep things "simple but subtly wrong" like most libraries out there, or we go all-in on generating arbitrary ranges with full coverage. I intend to spend some time coming up with evil test cases to prod the current implementation and see in what ways it has surprising behaviour. That should help inform our next steps.
Thank you for the measurement.
Out of curiosity, I have proved that a variant of this PR (without the carry bit, that's too much for z3) is indeed "dense" in [0, 1): every float in the interval can be produced.
main :: IO ()
main = do
  proof <- prove $ do
    f <- forall "f"
    w32 <- exists "w32"
    w64a <- exists "w64a"
    w64b <- exists "w64b"
    constrain $ 0 .<= f .&& f .< 1
    return $ floatInUnitIntervalS w32 w64a w64b .== f
  putStrLn $ show proof

floatInUnitIntervalS :: SWord32 -> SWord64 -> SWord64 -> SFloat
floatInUnitIntervalS w32 w64a w64b =
  let maxExp = 126
      mantissaBits = 23
      carryMask = (2 :: Word32) ^ mantissaBits
      mantissaMask = (2 ^ mantissaBits) - 1
      m = w32 .&. mantissaMask
      da = sCountLeadingZeros w64a
      db = sCountLeadingZeros w64b
      d = ite (da .< 64) da (smin (64 + db) maxExp)
      e = sFromIntegral (maxExp - d) :: SWord32
  in
    sWord32AsSFloat $ (e `shiftL` mantissaBits) .|. m
That's not a very strong property, but it's something :)
Closing; see #113 (comment) for context. |
Based on http://allendowney.com/research/rand/downey07randfloat.pdf
Discussion
See #105.
Performance
Results show the time taken for 100000 iterations.
v1.1 from #111
current
interface-to-performance
interface-to-performance (d03e325 specifically) with this PR