sample() and select() functions are missing #13
Comments
What would be the preferred signatures of So for simplicity let's assume that a |
I think I would prefer #14 over this, since #14 is more general use case and can be used in any way that the user needs, including using any container they wish. I think it would be a mistake to not use A really simple case would be to shuffle the data and select the first N, but this will take |
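For readers following along, the "shuffle and take the first N" idea can be made concrete with a partial Fisher-Yates shuffle, which only performs N swaps instead of shuffling everything. This is a minimal sketch, not proptest code: `XorShift64` is a tiny stand-in RNG and `select_first_n` is a hypothetical helper name, both introduced here purely for illustration.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// It is a stand-in for illustration, not the RNG proptest uses.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `0..bound` (slightly biased by the modulo; fine for a sketch).
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Hypothetical helper: pick `n` elements by shuffling only a prefix.
fn select_first_n<T>(mut items: Vec<T>, n: usize, rng: &mut XorShift64) -> Vec<T> {
    let n = n.min(items.len());
    for i in 0..n {
        // Swap a random element of the not-yet-chosen suffix into slot i.
        let j = i + rng.below(items.len() - i);
        items.swap(i, j);
    }
    // Only the first n slots were randomized; drop the rest.
    items.truncate(n);
    items
}
```

Asking for more elements than exist simply returns everything (in shuffled order), which mirrors the "up to amount" behaviour discussed later in the thread.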
I'm looking at this more and questioning whether #14 is actually the way to go... once we open up that can of worms it may be difficult to walk it back. Instead, I think the right way to go about it is for the container-strategies (i.e. HashMapStrategy, etc) to all implement a
It would be almost exactly this way for the others as well. I would think you could make a |
Could you clarify what |
@Centril lol literally just doing that :). I edited it slightly for more clarity. Sorry if the types aren't quite right, but I hope the intent got across. |
I don't think that description is sufficient, at least not for me to implement this independently - I'm still not sure what these operations actually do... From what I understand, So what is |
what is
Edit: if you can think a way to do this right now that would be enlightening, as I couldn't wrap my head around how to do it with what exists. I think |
The goal would be that the api for
I guess this generates up to amount, so it actually doesn't fail if |
the code is actually pretty basic, I might be able to actually implement this as a workaround. Edit: or wait, I need a way to keep getting random numbers, which I don't think is really possible using proptest. |
Here If you want to randomize the number of elements too you can use use proptest::num::{u16, u8};
use proptest::collection::vec;
// A strategy that generates (0 ..= std::u16::MAX) elements of u8:s.
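// `len..len` would be an empty range, so use `n..n + 1` to get exactly `n` elements.
let strategy = u16::ANY.prop_flat_map(|len| { let n = len as usize; vec(u8::ANY, n..n + 1) }); |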
A smarter way to put this is (approximately): use proptest::num::u8;
use proptest::collection::vec;
let strategy = vec(u8::ANY, 0 .. std::u16::MAX as usize); |
but that selects from all What I would like to do is something like:
This could return the following examples:
It would never return these examples:
|
Now I know what you want =) Doing this as a trait is probably reasonable. Is the following valid?
What are the semantics of (un)shrinking?
The type is thus: struct SubVecValueTree<T: Clone> {
minimum: usize,
current: Vec<T>,
removed: Vec<T>,
} A variation on this is to alternate between removing the first and last elements. It depends on whether you want to bias towards the first element being the simplest, or want the middle element to be the simplest (in which case you alternate). I think Union::new_weighted(my_vec.into_iter().map(|elt| (1, Just(elt)))) Though for performance a more specialized solution may be right. |
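To make the shrinking semantics of a `minimum` / `current` / `removed` tree easier to picture, here is a minimal standalone sketch. It assumes the simplest policy (always drop the last element) and defines a local `ValueTree` trait as a stand-in for proptest's, so the names and the exact locking rule in `complicate` are assumptions for illustration rather than the implementation proposed in this thread.

```rust
/// Local stand-in for proptest's `ValueTree`, so the sketch compiles alone.
trait ValueTree {
    type Value;
    fn current(&self) -> Self::Value;
    fn simplify(&mut self) -> bool;
    fn complicate(&mut self) -> bool;
}

struct SubVecValueTree<T: Clone> {
    /// Never shrink below this many elements.
    minimum: usize,
    /// Elements currently part of the generated value.
    current: Vec<T>,
    /// Elements removed by `simplify`, most recent last.
    removed: Vec<T>,
}

impl<T: Clone> ValueTree for SubVecValueTree<T> {
    type Value = Vec<T>;

    fn current(&self) -> Vec<T> {
        self.current.clone()
    }

    fn simplify(&mut self) -> bool {
        if self.current.len() <= self.minimum {
            return false;
        }
        // Assumed policy: the last element is the one to drop.
        match self.current.pop() {
            Some(elt) => {
                self.removed.push(elt);
                true
            }
            None => false,
        }
    }

    fn complicate(&mut self) -> bool {
        match self.removed.pop() {
            Some(elt) => {
                self.current.push(elt);
                // Assumed rule: a restored element was needed, so lock it in
                // to stop `simplify` from removing it again and looping.
                self.minimum = self.current.len();
                true
            }
            None => false,
        }
    }
}
```

Alternating between the first and last element would only change which end `simplify` pops from; the bookkeeping stays the same.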
yes, they can be out of order (just like they can be in
Note that this does not mean it is in the same order... just that the array is not shuffled as well (the implementation fills the array and then randomly replaces items, so the first N elements will always be in their original position).
Since the returned vec is random anyway, I would do it this way:
Agree that |
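The "fill the array, then randomly replace items" behaviour described above is classic reservoir sampling. The sketch below is an illustration of that technique under stated assumptions, not the source of `rand::sample`: `XorShift64` is a small stand-in RNG and the `sample` signature here is simplified for the example.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `0..bound` (slightly biased by the modulo; fine for a sketch).
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Reservoir sampling: returns up to `amount` items from `iter`.
fn sample<T, I: Iterator<Item = T>>(rng: &mut XorShift64, iter: I, amount: usize) -> Vec<T> {
    let mut reservoir: Vec<T> = Vec::with_capacity(amount);
    for (i, item) in iter.enumerate() {
        if i < amount {
            // Fill phase: the first `amount` items keep their positions.
            reservoir.push(item);
        } else {
            // Replace phase: item i survives with probability amount / (i + 1).
            let j = rng.below(i + 1);
            if j < amount {
                reservoir[j] = item;
            }
        }
    }
    reservoir
}
```

This shows why the result is not a true shuffle: any of the first `amount` inputs that survives is still sitting at its original index, and an input shorter than `amount` comes back unchanged.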
Also, sorry I wasn't more clear. I'm still trying to wrap my head fully around this library. It is SO awesome though! |
That's OK =) The operations of I guess you could modify Given that the elements can be out of order, this is not a sub-list operation, but rather: |
I'm a bit confused -- are not ALL operations, even "random" ones deterministic if you use a seed? If a non-random ordering is what you want that should be easy to implement -- just preserve the index and then sort by it -- but this will have quite a large impact on performance for large selections. Since the shuffle isn't suitably random anyway, I agree that I'm not sure if preserving order should be a major design choice or not for |
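The "preserve the index and then sort by it" idea can be sketched as follows: pick random indices first, sort them, and only then read the elements, so the output keeps the original relative order. `XorShift64` and `ordered_subsequence` are hypothetical names used only for this illustration.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `0..bound` (slightly biased by the modulo; fine for a sketch).
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Hypothetical helper: choose `n` elements at random, keeping input order.
fn ordered_subsequence<T: Clone>(items: &[T], n: usize, rng: &mut XorShift64) -> Vec<T> {
    let n = n.min(items.len());
    let mut indices: Vec<usize> = (0..items.len()).collect();
    // Partial Fisher-Yates over the indices: the first n slots become random picks.
    for i in 0..n {
        let j = i + rng.below(indices.len() - i);
        indices.swap(i, j);
    }
    indices.truncate(n);
    // Sorting the chosen indices restores the original relative order.
    indices.sort_unstable();
    indices.into_iter().map(|i| items[i].clone()).collect()
}
```

The extra sort over the chosen indices is where the performance cost for large selections comes from.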
Ok, I think I see -- you would have to clone the Your strategy isn't bad either. It could be adjusted by decrementing the index that is removed from right to left or something, moving values back and forth from |
Also, I would make Edit: I took another stab, this still isn't quite right
Ya, going from both ends is probably a better option |
Backing away from the theory here for a moment to give my opinions on the original request:
As @Centril mentioned above, this is already possible with the current API, it's just a bit of a mouthful. It'd be fairly easy to add a convenience for this and definitely worth it.
Also useful and not easily implemented from the user perspective. As far as implementation goes, most of it could be done by reusing the logic we already have for
Besides being less code, it also doesn't require constantly moving larger pieces of data around. |
I guess the better idea is to just store the entire sub-vector and then store a slice to the vector as well in the style of owning_ref. Then you can modify the slice without any allocation in |
wait, I think I've confused myself... is this code even valid?
Edit, at the very least the return type would be Edit2: it looks like I should be using |
hmm... based on my above realizations I'd say
We have to be able to create a strategy out of a real vector, so it works kind of like |
@AltSysrq How do you feel about:
So how about: pub trait ContainerStrategy /* <-- bikeshed name */ {
fn prop_part_of(self /* perhaps &self ? */, bounds: Range<usize>)
// in reality concrete types for now since impl trait is not stable.
// use some associated type instead.
-> impl Strategy<Value = impl ValueTree<Value = Self>>;
}
// Trait is separate because some types have no ordering, notably `HashSet`.
pub trait ShuffleStrategy /* <-- bikeshed */ {
fn prop_shuffle(self)
-> impl Strategy<Value = impl ValueTree<Value = Self>>;
} @vitiral You have to vec(u8::ANY, 1 .. std::u16::MAX as usize)
.prop_flat_map(|v| { let len = v.len(); (Just(v.clone()), v.prop_part_of(1 .. len)) }); |
The constructor would probably be something like let strategy = vec(u8::ANY, 1 .. std::u16::MAX as usize)
.prop_flat_map(|v| {
// Appease the borrow checker
let len = v.len();
vec::sample_from(v, len)
}); |
@AltSysrq that makes sense and is in-line with what I was saying. Do you think I can see a use case for wanting to sample a random number of elements (but within a certain range)... mainly that is my use case :) Also, I assume |
I don't care for the name |
I guess that is easily composable, sorry I'm still getting used to how to do this.
|
The current strategies for creating collections require the size to be a range, and this new API would be the same. Ideally we'd accept any range or a fixed integer, but I don't think all the functionality to support that is stable in std yet. Note that it's not actually an arbitrary strategy; because it knows the size range directly, it can avoid all the disadvantages that would come from using
It would likely be possible to use just |
pub trait ContainerStrategy {
type Output;
fn prop_part_of(self, bounds: Range<usize>) -> Self::Output;
} then we can have: // More deref types when specialization lands, for example: AsRef<[T]>.
impl<T: Clone> ContainerStrategy for Vec<T> {
type Output = Vec<T>;
fn prop_part_of(self, bounds: Range<usize>) -> Self::Output { ... }
}
impl<T: Clone> ContainerStrategy for HashSet<T> {
type Output = HashSet<T>;
fn prop_part_of(self, bounds: Range<usize>) -> Self::Output { ... }
} 3 + 4. great =) |
1. 2. I see, that looks good. Would probably make sense to put an equivalent of the current free functions there as well. 5. |
The implementation in Edit: I guess they don't have the requirement of being deterministic-ish, which is nice to have here. So I understand either choice. @AltSysrq I'm confused by the variable name Also nitpick, you forgot to add them in your |
@vitiral I think you meant to highlight me at the end there? |
Yes. I was thinking about the case of a single, constant array where every test input would be derived from the same available selection, which feels like it might be a more common use-case.
Sorry, I meant that that's the bounds on the size of the output. |
I don't think you have much of a choice since you won't have access to the one on |
@AltSysrq I was thinking that pub struct ShuffleValueTree<T> {
current: T,
other: T,
is_shuffled: bool,
}
impl<T: Clone> ValueTree for ShuffleValueTree<T> {
type Value = T;
fn current(&self) -> Self::Value { self.current.clone() }
fn simplify(&mut self) -> bool {
if !self.is_shuffled { return false; }
mem::swap(&mut self.current, &mut self.other);
self.is_shuffled = false;
true
}
fn complicate(&mut self) -> bool {
if self.is_shuffled { return false; }
mem::swap(&mut self.current, &mut self.other);
self.is_shuffled = true;
true
}
} But perhaps this was not the idea at all? |
Oh, that works too and is a lot simpler. 👍 |
@AltSysrq Though there might be degrees of shuffledness? (which this doesn't capture) PS: Let's ask a physicist about chaos ;) |
I had been thinking of something like // Generics simplified for conciseness
struct ShuffleValueTree<V : ValueTree> {
inner: V,
rng: XorShiftRng,
shuffle: bool,
}
impl<V : ValueTree> ValueTree for ShuffleValueTree<V> {
type Value = V::Value;
fn current(&self) -> V::Value {
let mut collection = self.inner.current();
if self.shuffle {
// We start from the same RNG every time, so every call shuffles the same way
self.rng.clone().shuffle(&mut collection);
}
collection
}
fn simplify(&mut self) -> bool {
// TODO: If already simplified, call `inner.simplify()`
let r = self.shuffle;
self.shuffle = false;
r
}
fn complicate(&mut self) -> bool {
let r = self.shuffle;
self.shuffle = true;
!r
}
} so no "degrees of shuffledness". Actually, now that I wrote that, I'm not sure the swap approach works, since |
@AltSysrq Well, there's a I'll investigate if "degrees of shuffledness" is a doable thing. |
@AltSysrq For a "degrees of shuffledness" we could simply modify the Fisher-Yates algorithm like so (distribution may not be optimal...): extern crate rand;
use rand::{thread_rng, Rng};
fn shuffle<T, R: Rng>(rng: &mut R, values: &mut [T], degree: usize) {
if degree == 0 { return; }
let len = values.len();
// magic computation, come up with something better...
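// saturating_sub and max(1.0) avoid underflow and an empty gen_range when degree >= len.
let prob_to_swap = (len.saturating_sub(degree) as f64).sqrt().ceil().max(1.0) as usize;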
let mut i = len;
while i >= 2 {
// invariant: elements with index >= i have been locked in place.
i -= 1;
// 1 / prob_to_swap chance to swap.
if rng.gen_range(0, prob_to_swap) == 0 {
// lock element i in place.
values.swap(i, rng.gen_range(0, i + 1));
}
}
}
fn main() {
let mut rng = thread_rng();
for degree in 0..20 {
let mut vec = (0..20).collect::<Vec<u8>>();
shuffle(&mut rng, &mut vec, degree);
println!("{:?}", vec);
}
} |
@Centril That might be a bit too chaotic since it perturbs the RNG as it shrinks (both due to not calling Maybe something like this? (Not 100% confident of the shrinking logic) // Generics simplified for conciseness
struct ShuffleValueTree<V : ValueTree> {
inner: V,
rng: XorShiftRng,
// max_shuffle is the maximum distance any element will be perturbed in a
// single step; as it is reduced to zero, the shuffle makes fewer and fewer
// changes to the output.
// Initialised to the size of the output.
max_shuffle: usize,
// The maximum value that `complicate()` can increase `max_shuffle` to.
max_max_shuffle: usize,
// The minimum value that `simplify()` can decrease `max_shuffle` to.
// Starts at 0.
min_max_shuffle: usize,
}
impl<V : ValueTree> ValueTree for ShuffleValueTree<V> {
type Value = V::Value;
fn current(&self) -> V::Value {
let mut collection = self.inner.current();
let mut rng = self.rng.clone();
for i in 0..collection.size() - 1 {
// By generating the index outside any conditional, we produce the
// same sequence of candidate swaps regardless of `max_shuffle`.
let other = rng.gen_range(i + 1, collection.size());
if other - i < self.max_shuffle {
collection.swap(i, other);
}
}
collection
}
fn simplify(&mut self) -> bool {
self.max_max_shuffle = self.max_shuffle;
if self.max_shuffle > self.min_max_shuffle {
self.max_shuffle -= 1;
true
} else {
self.inner.simplify()
}
}
fn complicate(&mut self) -> bool {
if self.max_shuffle < self.max_max_shuffle {
self.max_shuffle += 1;
self.min_max_shuffle = self.max_shuffle;
true
} else {
self.inner.complicate()
}
}
} |
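To see the distance-bounded shuffle in isolation, here is a standalone sketch of just the `current()` loop over a slice. It assumes a fixed seed in place of the cloned `XorShift64Rng` and uses a small hand-rolled generator; `bounded_shuffle` is a hypothetical name, and the empty-slice guard is an addition not present in the code above.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `0..bound` (slightly biased by the modulo; fine for a sketch).
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Shuffle `values`, but only apply swaps that move an element
/// fewer than `max_shuffle` positions.
fn bounded_shuffle<T>(values: &mut [T], max_shuffle: usize, seed: u64) {
    // Re-seeding on every call plays the role of `self.rng.clone()`.
    let mut rng = XorShift64::new(seed);
    let len = values.len();
    if len < 2 {
        return; // nothing to swap, and avoids `len - 1` underflow
    }
    for i in 0..len - 1 {
        // Always draw, so the candidate swaps are identical for every
        // value of `max_shuffle`; only the filter below changes.
        let other = i + 1 + rng.below(len - i - 1);
        if other - i < max_shuffle {
            values.swap(i, other);
        }
    }
}
```

Because every candidate swap has distance at least 1, `max_shuffle` values of 0 and 1 both leave the input untouched, and lowering `max_shuffle` step by step removes swaps without changing which candidates were drawn.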
@Centril yes I did mean to tag you, sorry
personally I dislike the name
This is not how rand::sample works ("Randomly sample up to amount elements from a finite iterator.") -- it cannot fail and if it happens to have I'm not sure I like this, and I find it very anti-rust personally. IMO it should return an For a testing library I think panicking is probably appropriate. It would be very confusing if you did Edit: rust-random/rand#194 |
@AltSysrq That shrinking logic seems like a better idea with a better notion of chaos (locality / delta). @vitiral To be clear, |
…ategy, subsequence}
Done initial work on |
@AltSysrq So, looking at it again... is there a reason it isn't just: fn simplify(&mut self) -> bool {
if self.max_shuffle > self.min_max_shuffle {
self.max_shuffle -= 1;
true
} else {
self.inner.simplify()
}
}
fn complicate(&mut self) -> bool {
if self.max_shuffle < self.max_max_shuffle {
self.max_shuffle += 1;
true
} else {
self.inner.complicate()
}
} Would this cause an infinite loop, perhaps? So you are trying to ensure that it's only possible to complicate once, once fully simplified? The logic of |
It needs to ensure that calls to |
@AltSysrq Right - then it looks good =) I guess you can add the shuffle feature? |
@Centril is the shuffle feature necessary for this PR? |
I have shuffle added to the |
What is necessary or not is ofc entirely up to @AltSysrq =) |
|
My use case is a graph, where all nodes are randomly created, and then I want to select a subset for each node to connect to when I create the actual graph.
However, there doesn't seem to be either a sample() or select() API with which to do this.