
Unbiased floating point number in unit interval #102

Closed

Conversation

@curiousleo (Collaborator) commented Apr 21, 2020

Based on http://allendowney.com/research/rand/downey07randfloat.pdf

Discussion

See #105.

Performance

Results show the time taken for 100000 iterations.

v1.1 from #111

pure/random/Float                        mean 30.63 ms  ( +- 3.753 ms  )
pure/randomR/unbounded/Float             mean 60.17 ms  ( +- 5.086 ms  )

pure/random/Double                       mean 54.18 ms  ( +- 3.741 ms  )
pure/randomR/unbounded/Double            mean 92.29 ms  ( +- 8.487 ms  )

current interface-to-performance (d03e325 specifically)

pure/random/Float                        mean 365.6 μs  ( +- 3.672 μs  )
pure/uniformR/unbounded/Float            mean 357.0 μs  ( +- 24.55 μs  )

pure/random/Double                       mean 454.0 μs  ( +- 4.255 μs  )
pure/uniformR/unbounded/Double           mean 408.9 μs  ( +- 6.579 μs  )

with this PR

pure/random/Float                        mean 420.7 μs  ( +- 4.416 μs  )
pure/uniformR/unbounded/Float            mean 504.7 μs  ( +- 5.863 μs  )

pure/random/Double                       mean 428.2 μs  ( +- 5.526 μs  )
pure/uniformR/unbounded/Double           mean 451.4 μs  ( +- 11.43 μs  )

@lehins (Collaborator) left a comment

I don't see anything obviously wrong in this PR, but I can't vouch for the math of it.
It would be nice to include references to material that explains why this all works.

@curiousleo (Collaborator, Author)

It would be nice to include references to material that explains why this all works.

Absolutely. @idontgetoutmuch and I are thinking of writing this up more formally somewhere. Downey's draft paper explains the underlying idea, but the exposition can be improved.

@peteroupc commented Apr 21, 2020

@christoph-conrads, since you have created the Rademacher Floating-Point Library, which is a more robust way to generate random floating-point numbers, please take a look at this and give your thoughts on whether the approach given by Downey here is actually robust in terms of—

  • generating all representable numbers in [0, 1),
  • avoiding rounding errors, and
  • maintaining a uniform distribution throughout the range.

@idontgetoutmuch (Owner)

@peteroupc I can't read template C++ but I think that it is highly likely to be the same as the Downey approach. Also, our implementation now passes the tests in the Rademacher Floating-Point Library but is about 3 times slower than just selecting random bits for the mantissa and fixing the exponent so that the number is in the binade [1,2). We need to write up the maths behind our / Downey's approach; I don't think comments in the code work very well for this.

@peteroupc

We need to write up the maths behind our / Downey's approach; I don't think comments in the code work very well for this.

Yes, I would like to see that. In addition, there should be something similar for generating any representable random floating-point number in ranges other than [0, 1). In particular, the algorithm used in the Rademacher FPL to achieve this for any range is anything but trivial, and I haven't been able to figure it out yet.

@lehins mentioned this pull request Apr 21, 2020
@curiousleo force-pushed the unbiased-floating-point-rebased branch from c3f5f47 to 7cab4e6 on April 23, 2020 11:59
@christoph-conrads commented Apr 23, 2020

The comments refer to Allen Downey's paper Generating Pseudo-random Floating-Point Values and not to the Haskell code.

@christoph-conrads, since you have created the Rademacher Floating-Point Library, which is a more robust way to generate random floating-point numbers, please take a look at this and give your thoughts on whether the approach given by Downey here is actually robust in terms of—

  • generating all representable numbers in [0, 1),

yes, in [0, 1] (closed interval!)

  • avoiding rounding errors, and

Yes, the algorithm uses only integer arithmetic.

  • maintaining a uniform distribution throughout the range

When we speak of generating uniformly distributed floating-point (FP) numbers, what we want to do is to generate FP numbers as if one

  • draws a truly uniformly distributed random real number and
  • rounds it to a FP number.

In Downey's algorithm, you are using round-to-nearest (see Figure 5 in his paper). That is, Downey's algorithm draws real numbers from the half-open interval [0,1), but the rounded representation lies in the closed interval [0,1].

I can't read template C++ but I think that it is highly likely to be the same as the Downey approach.

Not quite. The Rademacher Floating-Point Library

  • uses round-toward-zero in its mathematical model, and
  • never looks at individual bits.

Instead of looking at one random bit at a time, RFPL takes the raw pseudo-random number generator output x in [0, 2^b) and directly computes the exponent by computing

exponent = floor(log2(x))

In integer arithmetic this can be done cheaply because calculating floor(log2(x)) amounts to counting leading zeros; many CPUs have built-in instructions for this purpose and GCC offers __builtin_clz().

The code below is a quickly written benchmark comparing the two approaches. The RFPL approach is slightly faster on my x86_64 computer with GCC 9.2.0, but the extra implementation effort may not be worth it: you need only two bits on average, and the result is probably highly compiler- and machine-dependent.

Note that rfpl_rand() generates floats in the half-open interval [0,1) whereas downey_rand() computes floats in the closed interval [0,1] because of the branch condition mant == 0 && random_bit in line 76.

// This code attempts to demonstrate the performance differences between drawing
// one and multiple random bits at a time when computing floating-point values
// in the unit interval.
//
// Compile with (`-DNDEBUG` is essential: it disables the assertions):
//   c++ -Wextra -Wall -std=c++11 -pedantic single-vs-block.cpp
//       -O2 -march=native -DNDEBUG
//
// Author: Christoph Conrads (https://christoph-conrads.name)

#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <limits>
#include <random>

// this code uses the noinline attribute to prevent extreme compiler
// optimizations

// Code based on
// Allen B. Downey: "Generating Pseudo-random Floating-Point Values" (2007).
// URL: http://allendowney.com/research/rand/

typedef union box {
	float f;
	unsigned u;
} Box;

/* GET_BIT: returns a random bit.
 * For efficiency, bits are generated 64 at a time with std::mt19937_64
 * and handed out one by one.
 */
int get_bit (std::mt19937_64& gen) {
	int bit;
	static int bits = 0;
	static std::uint64_t x;

	if (bits == 0) {
		x = gen();
		bits = 64;
	}

	bit = x & 1;
	x = x >> 1;
	bits--;
	return bit;
}

/* RANDF: returns a random floating-point number in the closed range [0, 1],
 * including 0.0, subnormals, and 1.0
 */
float downey_rand(std::mt19937_64& gen) __attribute__((noinline));
float downey_rand(std::mt19937_64& gen) {
	int mant, exp, high_exp, low_exp;
	Box low, high, ans;

	low.f = 0.0;
	high.f = 1.0;

	/* extract the exponent fields from low and high */
	low_exp = (low.u >> 23) & 0xFF;
	high_exp = (high.u >> 23) & 0xFF;
	/* choose random bits and decrement exp until a 1 appears. the reason for
	 * subtracting one from high_exp is left as an exercise for the reader
	 */
	for(exp = high_exp-1; exp > low_exp; exp--) {
		if (get_bit(gen))
			break;
	}
	/* choose a random 23-bit mantissa */
	mant = gen() & 0x7FFFFF;
	/* if the mantissa is zero, half the time we should move to the next
	 * exponent range
	 */
	if(mant == 0 && get_bit(gen))
		exp++;

	/* combine the exponent and the mantissa */
	ans.u = (exp << 23) | mant;

	assert( std::isfinite(ans.f) );
	assert( ans.f >= 0 );
	assert( ans.f <= 1 );

	return ans.f;
}



float rfpl_rand(std::mt19937_64& gen) __attribute__((noinline));
float rfpl_rand(std::mt19937_64& gen) {
	Box low, high, ans;

	low.f = 0.0;
	high.f = 1.0;

	/* extract the exponent fields from low and high */
	auto low_exp = (low.u >> 23) & 0xFF;
	auto high_exp = (high.u >> 23) & 0xFF;
	// number of bits generated by the random number generator
	auto num_bits = 64u;
	auto exp = high_exp + 1;

	for(auto exp_max = high_exp; true; exp_max -= num_bits) {
		auto exp_min = (exp_max >= num_bits) ? (exp_max - num_bits) : 0u;
		auto x = gen();

		if(x == 0) {
			if(exp_max <= num_bits)
			{
				exp = low_exp;
				break;
			}

			continue;
		}

		// the built-ins have undefined behavior for x == 0
		assert(x != 0);

		auto k = __builtin_clzll(x);
		exp = exp_min + num_bits - k - 1;
		break;
	}

	assert(exp >= low_exp);
	assert(exp <= high_exp);

	/* choose a random 23-bit mantissa */
	auto mant = gen() & 0x7FFFFFu;

	/* combine the exponent and the mantissa */
	ans.u = (exp << 23) | mant;

	assert( std::isfinite(ans.f) );
	assert( ans.f >= 0 );
	assert( ans.f < 1 );

	return ans.f;
}


int main() {
	auto num_downey = std::uintmax_t{0};
	auto dummy_downey = float{0};
	auto num_rfpl = std::uintmax_t{0};
	auto dummy_rfpl = float{0};

	auto gen = std::mt19937_64();

	auto c_delta = 5 * CLOCKS_PER_SEC;
	auto c0 = std::clock();
	auto c1 = c0;

	// dummy code to avoid CPU frequency scaling
	auto dummy = float{1};
	auto eps = std::numeric_limits<float>::epsilon();

	for( ; c1 < c0 + c_delta; c1 = std::clock()) {
		dummy = (1 + eps) * dummy;
	}

	std::printf("CPU frequency scaling dummy: %8.2e\n", dummy);

	// measurement Downey approach
	c0 = std::clock();
	c1 = c0;

	for( ; c1 < c0 + c_delta; c1 = std::clock()) {
		for(auto i = 0u; i < 1000u * 1000u; ++i) {
			++num_downey;
			dummy_downey += downey_rand(gen);
		}
	}

	auto t_downey_sec = 1.0 * (c1 - c0) / CLOCKS_PER_SEC;

	std::printf("Downey: %8.2e per sec\n", 1.0 * num_downey / t_downey_sec);
	std::printf("Downey dummy: %8.2e\n", dummy_downey);

	// measurement Rademacher Floating-Point Library approach
	c0 = std::clock();
	c1 = c0;

	for( ; c1 < c0 + c_delta; c1 = std::clock()) {
		for(auto i = 0u; i < 1000u * 1000u; ++i) {
			++num_rfpl;
			dummy_rfpl += rfpl_rand(gen);
		}
	}

	auto t_rfpl_sec = 1.0 * (c1 - c0) / CLOCKS_PER_SEC;

	std::printf("RFPL: %8.2e per sec\n", 1.0 * num_rfpl / t_rfpl_sec);
	std::printf("RFPL dummy: %8.2e\n", dummy_rfpl);
}

@curiousleo force-pushed the unbiased-floating-point-rebased branch from 7cab4e6 to cd51766 on April 24, 2020 10:58
@curiousleo (Collaborator, Author) commented Apr 24, 2020

@christoph-conrads thanks for your comments!

In integer arithmetic this can be done cheaply because calculating floor(log2(x)) amounts to counting leading zeros; many CPUs have built-in instructions for this purpose and GCC offers __builtin_clz().

In Haskell there is countLeadingZeros, which is used in this implementation directly:

then return $ acc + countLeadingZeros w

If I understand it correctly, rademacher-fpl uses logarithms instead of clz because it aims to support non-power-of-2 PRNGs directly. The assumption in this implementation is that the underlying source of randomness produces random 32-bit or 64-bit words, which allows us to use countLeadingZeros to compute the required geometric distribution over the exponents.

uses round-toward-zero in its mathematical model

I'm curious about this one. What's the motivation for / what are the consequences of this choice?

@christoph-conrads

If I understand it correctly, rademacher-fpl uses logarithms instead of clz because it aims to support non-power-of-2 PRNGs directly.

std::log2p1() (from the C++20 draft, renamed to std::bit_width() in the meantime) is implemented using clz, see rademacher-fpl/bit.hpp. In rademacher-fpl/impl/uniform-int-distribution.hpp, line 40, RFPL provides its own std::uniform_int_distribution clone because it

  • benefits performance-wise if the underlying range is [0, 2^b) for some b.
  • works around GCC bug 80977.

uses round-toward-zero in its mathematical model

I'm curious about this one. What's the motivation for / what are the consequences of this choice?

Intuitively, it felt right to me that, given the exponent, all significands should have equal probability. That choice corresponds to rounding toward zero.

Consequences:

  • Given the exponent, all significands are equally likely.
  • Positive and negative floats behave perfectly symmetrically (they would not with, e.g., rounding toward positive infinity).
  • You must treat +0 and -0 as different numbers (Yes, RFPL can draw from [-0,+0]).
  • Given real numbers a < b that can be represented exactly in floating-point arithmetic, the generator returns FP values as if it had randomly drawn and rounded real values from
    • [a,b) if a ≥ 0,
    • (a',b] if b ≤ 0, where a' = nextafter(a, -∞) is the next smaller float after a,
    • (a', b) if a < 0, b > 0.

@idontgetoutmuch (Owner)

@christoph-conrads FYI

[nix-shell:~/random]$ ./a.out
CPU frequency scaling dummy: 2.13e+00
Downey: 6.49e+07 per sec
Downey dummy: 1.68e+07
RFPL: 7.19e+07 per sec
RFPL dummy: 1.68e+07

@idontgetoutmuch (Owner)

@curiousleo I think we can now close this?

@curiousleo (Collaborator, Author)

@curiousleo I think we can now close this?

I guess the approach taken in this PR is neither here nor there: while it does, I believe, give us all representable numbers in the unit interval as a starting point, we're still translating that into the target interval using floating-point operations.

Quoting from #113 (comment):

a + x * (b-a) does not work in floating-point (FP) arithmetic. Period. [...]
If you insist on this approach, you do not need to discuss intricacies of interval boundaries because the rounding error makes the discussion moot.

So I suppose this PR as it is isn't going to fly. Either we keep things "simple but subtly wrong" like most libraries out there, or we go all-in on generating arbitrary ranges with full coverage.

I intend to spend some time coming up with evil test cases to prod the current implementation and see in what ways it behaves surprisingly. That should help inform our next steps.

@christoph-conrads

@christoph-conrads FYI

[nix-shell:~/random]$ ./a.out
CPU frequency scaling dummy: 2.13e+00
Downey: 6.49e+07 per sec
Downey dummy: 1.68e+07
RFPL: 7.19e+07 per sec
RFPL dummy: 1.68e+07

Thank you for the measurement.
Are you using GCC 8 or 9 on x86-64? The performance might be the other way around with, say, GCC 7 on ARM.

@curiousleo (Collaborator, Author)

Out of curiosity, I have proved with sbv that a variant of this PR (without the carry bit; that is too much for z3) is indeed "dense" in [0,1). Running the following prints "Q.E.D.":

import Data.SBV

main :: IO ()
main = do
  proof <- prove $ do
    f <- forall "f"
    w32 <- exists "w32"
    w64a <- exists "w64a"
    w64b <- exists "w64b"
    constrain $ 0 .<= f .&& f .< 1
    return $ floatInUnitIntervalS w32 w64a w64b .== f
  putStrLn $ show proof

floatInUnitIntervalS :: SWord32 -> SWord64 -> SWord64 -> SFloat
floatInUnitIntervalS w32 w64a w64b =
  let maxExp = 126
      mantissaBits = 23
      carryMask = (2 :: Word32) ^ mantissaBits  -- unused: this variant omits the carry bit
      mantissaMask = (2 ^ mantissaBits) - 1

      m = w32 .&. mantissaMask
      da = sCountLeadingZeros w64a
      db = sCountLeadingZeros w64b
      d = ite (da .< 64) da (smin (64 + db) maxExp)
      e = sFromIntegral (maxExp - d) :: SWord32
  in
      sWord32AsSFloat $ (e `shiftL` mantissaBits) .|. m

That's not a very strong property, but it's something :)

@curiousleo (Collaborator, Author)

Closing; see #113 (comment) for context.
