Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sample() and select() functions are missing #13

Closed
vitiral opened this issue Nov 20, 2017 · 52 comments
Closed

sample() and select() functions are missing #13

vitiral opened this issue Nov 20, 2017 · 52 comments

Comments

@vitiral
Copy link

vitiral commented Nov 20, 2017

My use case is a graph, where all nodes are randomly created, and then I want to select a subset for each node to connect to when I create the actual graph.

However, there doesn't seem to be either a sample() or select() api with which to do this.

@Centril
Copy link
Collaborator

Centril commented Nov 20, 2017

What would be the preferred signatures of sample() and select()?

So for simplicity let's assume that a Vec stands in for a true set.
Then perhaps a Strategy which generates a sub-Vec of the Vec could be useful?
A simple way to do this is: go through all elements of the list, flip a coin, if yes, add the element to the set, else leave it out. We can accomplish this by generating a Vec<bool> (of equal length to the nodeset Vec) which is then zipped with the nodeset Vec and then filter based on the value of the bool.
This may be sub-optimal, but it should work.

@vitiral
Copy link
Author

vitiral commented Nov 20, 2017

I think I would prefer #14 over this, since #14 is more general use case and can be used in any way that the user needs, including using any container they wish.

I think it would be a mistake to not use Rng under the hood for this. sample takes a specific number of samples, so probabilistically "flipping a coin" is not enough to guarantee how many you will get. Trying to select N values from M where every value in M has an equal chance of being selected is actually fairly non-trivial if I recall correctly.

A really simple case would be to shuffle the data and select the first N, but this will take O(M) time for N values (where N < M), so is certainly not ideal.

@vitiral
Copy link
Author

vitiral commented Nov 20, 2017

I'm looking at this more and questioning whether #14 is actually the way to go... once we open up that can of worms it may be difficult to walk it back.

Intead, I think the right way to go about it is for the container-strategies (i.e. HashMapStrategy, etc) to all implement a ContainerStrategy trait, this would have methods like this:

impl HashSetStrategy<T> {
    /// Return a randomized sample of the hashset as a new hashset.
    ///
    /// Panics if the set's length is `< num`
    pub fn prop_sample(num: USizeStrategy) -> HashSetStrategy<T>;

    /// Select an element at random. 
    ///
    /// Panics if the set is empty.
    fn prop_select() -> Just<T>;
}

It would be almost exactly this way for the others as well. I would think you could make a ContainerStrategy<T>: IntoIter<T> who's methods would be auto-implemented for containers that can be converted into iterators.

@Centril
Copy link
Collaborator

Centril commented Nov 20, 2017

Could you clarify what sample and select do exactly, possibly including signatures of these methods?
My comment was specific to your particular use case and not sample or select in general since I'm not sure what these entail.

@vitiral
Copy link
Author

vitiral commented Nov 20, 2017

@Centril lol literally just doing that :). I edited it slightly for more clarity. Sorry if the types aren't quite right, but I hope the intent got across.

@Centril
Copy link
Collaborator

Centril commented Nov 20, 2017

I don't think that description is sufficient, at least not for me to implement this independently - I'm still not sure what these operations actually do...

From what I understand, VecStrategy::prop_sample(num: usize) simply generates a vec where .len() == num. This is simply: proptest::collection::vec(strategy_for_elem, num .. num).

So what is .select()? does it select a subset of elements from a generated Vec ? A longer description would be instructive here.

@vitiral
Copy link
Author

vitiral commented Nov 20, 2017

what is strategy_for_elem in your case? Is that a filter?

select() just selects a single element, sample(num: USizeStrategy) selects a random number of them (but like you said, it selects exactly that number, so if you had a physical num (not a strategy) then it could be represented as num .. num).

Edit: if you can think a way to do this right now that would be enlightening, as I couldn't wrap my head around how to do it with what exists. I think prop_recurse might be able to accomplish it since you can specify a depth, but I'm not sure.

@vitiral
Copy link
Author

vitiral commented Nov 20, 2017

The goal would be that the api for prop_sample should "feel" like rand::sample:

pub fn sample<T, I, R>(rng: &mut R, iterable: I, amount: usize) -> Vec<T> where
    I: IntoIterator<Item = T>,
    R: Rng, 

I guess this generates up to amount, so it actually doesn't fail if amount > len. I would preserve that API.

@vitiral
Copy link
Author

vitiral commented Nov 20, 2017

the code is actually pretty basic, I might be able to actually implement this as a workaround.

Edit: or wait, I need a way to keep getting random numbers, which I don't think is really possible using proptest.

@Centril
Copy link
Collaborator

Centril commented Nov 20, 2017

Here strategy_for_elem is a strategy for generating one element of type requested type, so for example: proptest::collection::vec(proptest::num::u8::ANY, 100..100) is a strategy that generates a vector of exactly 100 elements of any u8.

If you want to randomize the number of elements too you can use prop_flat_map:

use proptest::num::{u16, u8};
use proptest::collection::vec;

// A strategy that generates (0 ..= std::u16::MAX) elements of u8:s.
let strategy = u16::ANY.prop_flat_map(|len| vec(u8::ANY,  len..len));

@Centril
Copy link
Collaborator

Centril commented Nov 20, 2017

A smarter way to put this is (approximately):

use proptest::num::u8;
use proptest::collection::vec;
let strategy = vec(u8::ANY,  0 .. std::u16::MAX);

@vitiral
Copy link
Author

vitiral commented Nov 20, 2017

but that selects from all u8's, it doesn't select from a precreated list.

What I would like to do is something like:

use proptest::num::u8;
use proptest::collection::vec;

/// get a vector and a random sample of elements from that vector
let strategy = vec(u8::ANY,  1 .. std::u16::MAX)
    .prop_map(|v| (v, v.prop_sample(1 .. v.len()));

This could return the following examples:

([1, 2, 3, 4, 5], [1, 5])      # randomly selected two elements
([1], [1])                     # second list is determined since minlen==1
([42, 33, 1, 128], [33, 128])  # randomly selected two elements
([1,2,3], [1,2,3])             # randomly selected all elements
([1,2,1,1], [1,1])             # can only repeat if they are repeated in first list
([], [])                       # only way second list can be empty is if first list is

It would never return these examples:

([1,2,3], [3,6])    # 6 was not in the first list
([42], [45])        # 45 was not in the first list
([1,2,3], [])       # requested length of at least 1, empty not allowed
([1,2,3,4], [1,2,2] # 2 cannot be repeated if it is not repeated in first list

@Centril
Copy link
Collaborator

Centril commented Nov 20, 2017

Now I know what you want =)

Doing this as a trait is probably reasonable.

Is the following valid?

([1, 2, 3, 4], [2, 1]) <-- note that the ordering has changed.

What are the semantics of (un)shrinking?
A proposal:

  • The ValueTree should be generated with an initial sub-Vec.
  • For simplify() the ValueTree should remove one tail element from the last-returned sub-Vec if it hasn't reached the minimum length. This removed element is added to another Vec stored in the ValueTree
  • For complicate(), the last removed element should be re-added.

The type is thus:

struct SubVecValueTree<T: Clone> {
    minimum: usize,
    current: Vec<T>,
    removed: Vec<T>,
}

A version on this is to alternate between removing the first / last element. It depends on if you want to bias towards the first element being the simplest, or if you want the middle element to be the simplest (in which case you alternate).

I think .select is unnecessary since we already have Union, so you can do:

Union::new_weighted(my_vec.into_iter().map(|elt| (1, Just(elt))))

Tho for performance a more specialized solution may be right.

@vitiral
Copy link
Author

vitiral commented Nov 20, 2017

Is the following valid?

yes, they can be out of order (just like they can be in rand::sample). As rand::sample says:

The order of elements in the sample is not random.

Note that this does not mean it is in the same order... just that the array is not shuffled as well (the implementation fills the array and then randomly replaces items, so the first N elements will always be in their original position).

What are the semantics of (un)shrinking?

Since the returned vec is random anyway, I would do it this way:

  • simplify() removes a random element from current and adds it to the removed
  • complicate() takes a random element from removed and adds it to current.

Agree that sample is more important than select. Heck, you could even sample one element as a vector and then take it's 0th index.

@vitiral
Copy link
Author

vitiral commented Nov 20, 2017

Also, sorry I wasn't more clear. I'm still trying to wrap my head fully around this library. It is SO awesome though!

@Centril
Copy link
Collaborator

Centril commented Nov 21, 2017

That's OK =)

The operations of ValueTree should be deterministic. Randomness comes from the generation of the ValueTree itself. The only way to be random inside the ValueTree is to clone the XorShiftRng which means that any mutation made to it inside the ValueTree is not persisted across new_value runs.
This bit is instructive: https://docs.rs/proptest/0.3.1/src/proptest/collection.rs.html#344-350

I guess you could modify TestRunner by adding a function ValueTree::yield_rng(self) -> Option<XorShiftRng> { None } whereby the ValueTree may clone the TestRunner's XorShiftRng during construction and give back the modified version which the TestRunner may set to itself before the next new_value call. However, this mechanism does not compose well.

Given that the elements can be out of order, this is not a sub-list operation, but rather: sublist(shufle(vec), min..max) - this is natural for sets, since there is no real ordering - however, for Vec, I think this is a bit weird to allow the elements to be out of order. It might be useful to provide a shuffle operation for Vec which does not shrink - as a different operation. Then you can compose the operations.

@vitiral
Copy link
Author

vitiral commented Nov 21, 2017

I'm a bit confused -- are not ALL operations, even "random" ones deterministic if you use a seed?

If a non-random ordering is what you want that should be easy to implement -- just preserve the index and then sort by it -- but this will have quite a large impact on performance for large selections.

Since the shuffle isn't suitably random anyway, I agree that shuffle should be a separate operation anyway.

I'm not sure if preserving order should be a major design choice or not for sample. If that were desired I would call it sample_stable since it reminds me of stable-sorting.

@vitiral
Copy link
Author

vitiral commented Nov 21, 2017

Ok, I think I see -- you would have to clone the Rng in the VecStrategy in order to later use it... ya that isn't ideal, but I don't think it's TOO bad.

Your strategy isn't bad either. It could be adjusted by decrementing the index that is removed from right to left or something, moving values back and forth from removed as you go and keeping elements that caused the test to fail. Obviously this is slow, but it's fine to be slow while your shrinking right?

@vitiral
Copy link
Author

vitiral commented Nov 21, 2017

Also, I would make removed a queue. Imagine the following case where having 4 anywhere fails:

Edit: I took another stab, this still isn't quite right

1:
  cur=[1,2,3,4,5] # fail, simplify()
  rem=[]
  fail -> simplify()
2:
   cur=[1,2,3,4]
   rem=[5]
   fail -> simplify
3: 
    cur=[1,2,3]
    rem=[5,4]
    pass -> complicate
4:
    cur=[1,2,3,5]
    rem=[4]
    pass -> complicate
5:
    cur=[1,2,3,5,4]
    rem=[]
    fail -> complicate
6:
   cur=[1,2,3,5]
   rem=[4]
   pass -> complicate + somehow know that we need to move one index back?
    agh, maybe this doesn't work.

Ya, going from both ends is probably a better option

@AltSysrq
Copy link
Collaborator

Backing away from the theory here for a moment to give my opinions on the original request:

select()

As @Centril mentioned above, this is already possible with the current API, it's just a bit of a mouthful. It'd be fairly easy to add a convenience for this and definitely worth it.

sample()

Also useful and not easily implemented from the user perspective.

As far as implementation goes, most of it could be done by reusing the logic we already have for BitSet.

  • Add a new strategy for BitSetLike types which does an n-choose-k algorithm to choose the initial bits. The current BitSetValueTree shrinks to all bits being zero; local filtering is sufficient to keep the set size to a minimum.

  • Selecting elements is done by prop_map()ing over the bitset strategy and choosing elements whose bit is set.

  • Add a convenience to put these pieces together.

Besides being less code, it also doesn't require constantly moving larger pieces of data around.

@Centril
Copy link
Collaborator

Centril commented Nov 21, 2017

I guess the better idea is to just store the entire sub-vector and then store a slice to the vector as well in the style of owning_ref. Then you can modify the slice without any allocation in simplify() or complicate().

@vitiral
Copy link
Author

vitiral commented Nov 21, 2017

wait, I think I've confused myself... is this code even valid?

let strategy = vec(u8::ANY,  1 .. std::u16::MAX)
    .prop_map(|v| (v, v.prop_sample(1 .. v.len()));

v is a real vector is it not? Can you call prop_sample on a Vec?

Edit, at the very least the return type would be (Just(v), v.prop_sample(...)), but is there something else I'm missing?

Edit2: it looks like I should be using flat_map right? I'm still confused on how you could call prop_sample on a "real" vector. I don't see impl VecStrategy for Vec anywhere.

@vitiral
Copy link
Author

vitiral commented Nov 21, 2017

hmm... based on my above realizations I'd say sample needs to be a function, not a method. So it would be used in this way:

let vec_strategy = vec(u8::ANY,  1 .. std::u16::MAX)
    .prop_map(|v| (Just(v), prop::sample(v, 1 .. v.len()));

We have to be able to create a strategy out of a real vector, so it works kind of like Just. Am I making any sense?

@Centril
Copy link
Collaborator

Centril commented Nov 21, 2017

@AltSysrq How do you feel about:

  1. How do you feel about shuffle (with no shrinking)
  2. having a trait for "sample" (I'm going to call it part_of henceforth)?
    I think it might be nice to have since some things may be optimized for non-Vec, and they may also have different semantics wrt. ordering...
  3. speaking of... should Vec::part_of preserve ordering of input Vec in the output Vec?
  4. it might be nice to have a part_of_shuffle which acts like part_of but shuffles first.

So how about:

pub trait ContainerStrategy /* <-- bikeshed name */ {
    fn prop_part_of(self /* perhaps &self ? */, bounds: Range<usize>)
        // in reality concrete types for now since impl trait is not stable.
        // use some associated type instead.
        -> impl Strategy<Value = impl ValueTree<Value = Self>>;
}

// Trait is separate because some types have no ordering, notably `HashSet`.
pub trait ShuffleStrategy /* <-- bikeshed */ {  
    fn prop_shuffle(self)
        -> impl Strategy<Value = impl ValueTree<Value = Self>>;
}

@vitiral You have to prop_flat_map instead if you are returning a Strategy, so:

vec(u8::ANY,  1 .. std::u16::MAX)
    .prop_flat_map(|v| (Just(v), v.prop_part_of(1 .. v.len())));

@AltSysrq
Copy link
Collaborator

The constructor would probably be something like VecStrategy::sample_from(collection, size) with a companion proptest::vec::sample_from free function. prop_flat_map would come in if the vector itself were produced by a strategy, so something like

let strategy = vec(u8::ANY, 1 .. std::u16::MAX)
    .prop_flat_map(|v| {
        // Appease the borrow checker
        let len = v.len();
        vec::sample_from(v, len)
    });

@vitiral
Copy link
Author

vitiral commented Nov 21, 2017

@AltSysrq that makes sense and is in-line with what I was saying. Do you think len should be either an int-strategy OR a hard integer (I'm not sure if that is already the case, does usize implement IntStrategy?).

I can see a use case for wanting to sample a random number of elements (but within a certain range)... mainly that is my use case :)

Also, I assume v just needs to be a type implementing IntoIter right? It doesn't have to be a vector.

@AltSysrq
Copy link
Collaborator

@Centril

  1. Positively, it's just a matter of generating a RNG for the value so the shuffling is consistent. It might make sense to shrink to a non-shuffle.

  2. I'm not sure what that trait would look like, especially since there's no existing trait for collections.

  3. I think it should. The BitSet implementation I suggested does this and it makes the behaviour more deterministic. It also makes it a proper subsequence operation.

  4. If we implement both building blocks, it's not much cost to add this too.

I don't care for the name part_of; it sounds more like slicing. It would be better to stick to an established name, either from a major library (thus sample) or a mathematical operation (e.g., subsequence or choose_k).

@vitiral
Copy link
Author

vitiral commented Nov 21, 2017

I guess that is easily composable, sorry I'm still getting used to how to do this.

let vec_strategy = vec(u8::ANY,  1 .. std::u16::MAX)
    .prop_flat_map(|v| (v, 1 .. v.len())
    .prop_flat_map(|(v, num)| (Just(v), prop::sample(v, num));

@AltSysrq
Copy link
Collaborator

@vitiral

Do you think len should be either an integer-strategy OR a hard integer (I'm not sure if that is already the case, does usize implement IntStrategy?).

The current strategies for creating collections require the size to be a range, and this new API would be the same. Ideally we'd accept any range or a fixed integer, but I don't think all the functionality to support that is stable in std yet. Note that it's not actually an arbitrary strategy; because it knows the size range directly, it can avoid all the disadvantages that would come from using prop_flat_map.

Also, I assume v just needs to be a type implementing IntoIter right? It doesn't have to be a vector.

It would likely be possible to use just IntoIter, but in practise it'll need to run over that list many times to just select a (possibly tiny) subset, so it's probably best for both efficiency and simplicity to just buffer into a vector (though the API could certainly take an IntoIter anyway.)

@Centril
Copy link
Collaborator

Centril commented Nov 21, 2017

@AltSysrq

  1. How would shrinking to a non-shuffle work?
pub trait ContainerStrategy {
    type Output;
    fn prop_part_of(self, bounds: Range<usize>) -> Self::Output;
}

then we can have:

// More deref types when specialization lands, for example:  AsRef<[T]>.

impl<T: Clone> ContainerStrategy for Vec<T> {
    type Output = Vec<T>;
    fn prop_part_of(self, bounds: Range<usize>) -> Self::Output { ... }
}

impl<T: Clone> ContainerStrategy for HashSet<T> {
    type Output = HashSet<T>;
    fn prop_part_of(self, bounds: Range<usize>) -> Self::Output { ... }
}

3 + 4. great =)
5. Where's the name sample from? subsequence sounds right for Vec, but not for HashSet. choose_k is probably too hard to find...?

@AltSysrq
Copy link
Collaborator

@Centril

1. simplify() => current_value() just passes the original collection through unchanged; complicate() => current_value() goes back to doing the original shuffle.

2. I see, that looks good. Would probably make sense to put an equivalent of the current free functions there as well.

5. rand::sample(https://docs.rs/rand/0.3.18/rand/fn.sample.html). I get your point about subsequence; conversely, there's also "subset" or "combination" that works for HashSet/HashMap but not Vec. I think I'd definitely lean to sample at this point.

@vitiral
Copy link
Author

vitiral commented Nov 21, 2017

@AltSysrq

It would likely be possible to use just IntoIter, but in practise it'll need to run over that list many times to just select a (possibly tiny) subset

The implementation in rand has to run throught the list exactly once, so hopefully that performance would be possible here.

Edit: I guess they don't have the requirement of being deterministic-ish, which is nice to have here. So I understand either choice.

@AltSysrq I'm confused by the variable name bounds, are you saying it defines a range of valid indexes, or is num a better name?

Also nitpick, you forgot to add them in your impls

@Centril
Copy link
Collaborator

Centril commented Nov 21, 2017

@AltSysrq

  1. Oh ;) I was envisaging different degrees of chaos you could simplify to ^.^ If not, then simplify() / complicate() are essentially just a mem::swap. Should the ValueTree have its own independent XorShiftRng?

  2. Agreed, free functions make sense.

  3. Original Haskell QC calls the most similar operation subListOf , but that has the same problems as subsequence. I guess prop_sample is the least bad - The name part_of was conceived in trying to find the common denominator of subsequence and subset.

@vitiral I think you meant to highlight me at the end there?
In prop_sample(self, bounds: Range<usize>) -> Self::Output the argument bounds refers to the minimum amount of elements, and and the maximum amount of elements in Self::Output. If self.len() < bounds.high then self.len() is used. If self.len() < bounds.low you get a panic (or perhaps not? discuss...).

@AltSysrq
Copy link
Collaborator

The implementation in rand has to run throught the list exactly once, so hopefully that performance would be possible here.

Yes. I was thinking about the case of a single, constant array where every test input would be derived from the same available selection, which feels like it might be a more common use-case.

the variable name bounds

Sorry, I meant that that's the bounds on the size of the output.

@AltSysrq
Copy link
Collaborator

Should the ValueTree have its own independent XorShiftRng?

I don't think you have much of a choice since you won't have access to the one on TestRunner at that point. Presumably you'll want to start from the same seed (for a particular ValueTree) each time too.

@Centril
Copy link
Collaborator

Centril commented Nov 21, 2017

@AltSysrq I was thinking that SuffleStrategy::new_value(...) shuffles the input and then yields:

pub struct ShuffleValueTree<T> {
    current: T,
    other: T,
    is_simple: bool,
}

impl<T: Clone> ValueTree for ShuffleValueTree<T> {
    type Value = T;
    fn current(&self) -> Self::Value { self.current.clone() }
    fn simplify(&mut self) -> bool {
        if !self.is_shuffled { return false; }
        mem::swap(&mut self.current, &mut self.other);
        self.is_shuffled = false;
        true
    }

    fn complicate(&mut self) -> bool {
        if self.is_shuffled { return false; }
        mem::swap(&mut self.current, &mut self.other);
        self.is_shuffled = true;
        true
    }
}

But perhaps this was not the idea at all?

@AltSysrq
Copy link
Collaborator

Oh, that works too and is a lot simpler. 👍

@Centril
Copy link
Collaborator

Centril commented Nov 21, 2017

@AltSysrq Tho there might be degrees of shuffled:ness? (which this doesn't capture)

PS: Let's ask a physicist about chaos ;)

@AltSysrq
Copy link
Collaborator

I had been thinking of something like

// Generics simplified for conciseness
struct ShuffleValueTree<V : ValueTree> {
    inner: V,
    rng: XorShiftRng,
    shuffle: bool,
}

impl<V : ValueTree> ValueTree for ShuffleValueTree<V> {
    type Value = V::Value;
    
    fn current(&self) -> V::Value {
        let mut collection = self.inner.current();
        if shuffle {
            // We start from the same RNG every time, so every call shuffles the same way
            self.rng.clone().shuffle(&mut collection);
        }
        collection
    }

    fn simplify(&mut self) -> bool {
        // TODO: If already simplified, call `inner.simplify()`
        let r = self.shuffle;
        self.shuffle = false;
        r
    }

    fn complicate(&mut self) -> bool {
        let r = self.shuffle;
        self.shuffle = true;
        !r
    }
}

so no "degrees of shuffledness".

Actually, now that I wrote that, I'm not sure the swap approach works, since current() yields an owned value, and it doesn't have an obvious way to continue simplifying the source ValueTree without having separate logic to then re-shuffle it.

@Centril
Copy link
Collaborator

Centril commented Nov 21, 2017

@AltSysrq Well, there's a .clone() there, so it should work - but it T was a ValueTree::Value, not a ValueTree - but that's quite boring I guess. Your solution is more general 👍.

I'll investigate if "degrees of shuffledness" is a doable thing.

@Centril
Copy link
Collaborator

Centril commented Nov 21, 2017

@AltSysrq For a "degrees of shuffledness" we could simply modify the Fisher-Yates algorithm like so (distribution may not be optimal...):

extern crate rand;

use rand::{thread_rng, Rng};

fn shuffle<T, R: Rng>(rng: &mut R, values: &mut [T], degree: usize) {
    if degree == 0 { return; }
    let len = values.len();
    // magic computation, come up with something better...
    let prob_to_swap = ((len - degree) as f64).sqrt().ceil() as usize;

    let mut i = len;
    while i >= 2 {
        // invariant: elements with index >= i have been locked in place.
        i -= 1;
        // 1 / prob_to_swap chance to swap.
        if rng.gen_range(0, prob_to_swap) == 0 {
            // lock element i in place.
            values.swap(i, rng.gen_range(0, i + 1));
        }
    }
}

fn main() {
    let mut rng = thread_rng();

    for degree in 0..20 {
        let mut vec = (0..20).collect::<Vec<u8>>();
        shuffle(&mut rng, &mut vec, degree);
        println!("{:?}", vec);
    }
}

@AltSysrq
Copy link
Collaborator

@Centril That might be a bit too chaotic since it perturbs the RNG as it shrinks (both due to not calling gen_range() the same number of times before each element, and I assume gen_range() uses rejection sampling so changing the range would perturb it that way too).

Maybe something like this? (Not 100% confident of the shrinking logic)

// Generics simplified for conciseness
struct ShuffleValueTree<V : ValueTree> {
    inner: V,
    rng: XorShiftRng,
    // max_shuffle is the maximum distance any element will be perturbed in a
    // single step; as it is reduced to zero, the shuffle makes fewer and fewer
    // changes to the output.
    // Initialised to the size of the output.
    max_shuffle: usize,
    // The maximum value that `complicate()` can increase `max_shuffle` to.
    max_max_shuffle: usize,
    // The minimum value that `simplify()` can decrease `min_shuffle` to.
    // Starts at 0.
    min_max_shuffle: usize,
}

impl<V : ValueTree> ValueTree for ShuffleValueTree<V> {
    type Value = V::Value;

    fn current(&self) -> V::Value {
        let mut collection = self.inner.current();
        let mut rng = self.rng.clone();
        for i in 0..collection.size() - 1 {
            // By generating the index outside any conditional, we produce the
            // same sequence of candidate swaps regardless of `max_shuffle`.
            let other = rng.gen_range(i + 1, collection.size());
            if other - i < self.max_shuffle {
                collection.swap(i, other);
            }
        }
        collection
    }

    fn simplify(&mut self) -> bool {
        self.max_max_shuffle = self.max_shuffle;
        if self.max_shuffle > self.min_max_shuffle {
            self.max_shuffle -= 1;
            true
        } else {
            self.inner.simplify()
        }
    }

    fn complicate(&mut self) -> bool {
        if self.max_shuffle < self.max_max_shuffle {
            self.max_shuffle += 1;
            self.min_max_shuffle = self.max_shuffle;
            true
        } else {
            self.inner.complicate()
        }
    }
}

@vitiral
Copy link
Author

vitiral commented Nov 21, 2017

@Centril yes I did mean to tag you, sorry

In prop_sample(self, bounds: Range) -> Self::Output the argument bounds refers to the minimum amount of elements, and and the maximum amount of elements in Self::Output.

personally I dislike the name bounds since it makes me think of a min -> max index, and since we are giving ranges like 4 .. 10 this could easily be confused. rand::sample uses amount which I think is a reasonable name.

If self.len() < bounds.high then self.len() is used. If self.len() < bounds.low you get a panic (or perhaps not? discuss...).

this is not how rand::sample works ("Randomly sample up to amount elements from a finite iterator.") -- it cannot fail and if it happens to have self.len() < num_wanted then it just returns self.len() elements.

I'm not sure I like this, and I find it very anti-rust personally. IMO it should return an Option which is None if there aren't enough elements. Obviously this would be a breaking change so rust will never change it though (at least... not until a new epoch or whatever).

For a testing libray I think panicing is probably appropriate. It would be very confusing if you did prop_sample(v, 100..200) and then had a vector with only 2 elements (or something). Silent errors like that remind me of javascript and the bubbling undefined.

Edit: rust-random/rand#194

@Centril
Copy link
Collaborator

Centril commented Nov 22, 2017

@AltSysrq That shrinking logic seems like a better idea with a better notion of chaos (locality / delta).

@vitiral bounds refers to (min/max) - an amount is not a Range<usize> - but what a parameter is named isn't that important. What is important is giving a clear documentation about the behaviors of the method. I think fail-fast behavior in response to what is clearly a programmer-error is appropriate - the standard library does use panics for that in certain places - it is even more appropriate for a testing lib.

To be clear, subsequence effectively would first randomize amount in [bounds.start, bounds.end] and then use sample with that.

Centril added a commit to Centril/proptest that referenced this issue Nov 22, 2017
@Centril
Copy link
Collaborator

Centril commented Nov 22, 2017

Done initial work on subsequence - todo: tests and more CollectionStrategy impls. Work in #8 .

@Centril
Copy link
Collaborator

Centril commented Nov 25, 2017

@AltSysrq So, looking at it again... is there a reason it isn't just:

    fn simplify(&mut self) -> bool {
        if self.max_shuffle > self.min_max_shuffle {
            self.max_shuffle -= 1;
            true
        } else {
            self.inner.simplify()
        }
    }

    fn complicate(&mut self) -> bool {
        if self.max_shuffle < self.max_max_shuffle {
            self.max_shuffle += 1;
            true
        } else {
            self.inner.complicate()
        }
    }

would this cause an infinite loop perhaps? so you are trying to ensure that it's only possible to complicate once once fully simplified?

The logic of .current() looks good.

@AltSysrq
Copy link
Collaborator

@Centril

is there a reason it isn't just:

It needs to ensure that calls to complicate() after a simplify() don't increase the complexity beyond where it was immediately before the call to simplify().

@Centril
Copy link
Collaborator

Centril commented Nov 26, 2017

@AltSysrq Right - then it looks good =) I guess you can add the shuffle feature?

@vitiral
Copy link
Author

vitiral commented Dec 8, 2017

@Centril is the shuffle feature necessary for this PR?

@AltSysrq
Copy link
Collaborator

AltSysrq commented Dec 8, 2017

I have shuffle added to the 0.3.2-changes branch. In the process I discovered a bug in VecStrategy; I'm going to be adding some tests for that specific problem to all the non-trivial strategies, but that's the only thing left blocking the release of 0.3.2 at this point.

@Centril
Copy link
Collaborator

Centril commented Dec 8, 2017

What is necessary or not is ofc entirely up to @AltSysrq =)

@AltSysrq
Copy link
Collaborator

AltSysrq commented Dec 9, 2017

sample::select() and sample::subsequence() are added in 0.3.2, as well as the discussed prop_shuffle combinator.

@AltSysrq AltSysrq closed this as completed Dec 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants