MultiDistribution #18

benjamin-lieser · 2025-03-02T09:26:26Z

Added a CHANGELOG.md entry

Summary

Some code related to #16

benjamin-lieser · 2025-03-02T11:38:55Z

I put a &self receiver, like in Distribution @dhardy was there a specific reason why you proposed &mut self?

I am not sure if I want a different name, because it clashed with the one in Distribution and I would like structs to implement both if applicable. Making these things unambiguous feels often too cumbersome in Rust. And it can break existing code when doing use rand_distr::* or similar.

dhardy · 2025-03-02T12:13:31Z

I put a &self receiver, like in Distribution @dhardy was there a specific reason why you proposed &mut self?

Sorry, it should be &self.

I am not sure if I want a different name, because it clashed with the one in Distribution and I would like structs to implement both if applicable. Making these things unambiguous feels often too cumbersome in Rust. And it can break existing code when doing use rand_distr::* or similar.

I'm not really sure of the answer to that. Any of these can work (from the POV of a dependency):

use rand_distr::MultiDistribution; — Do we want the multi module to be pub? Likely yes, in which case supporting this path is redundant.
use rand_distr::multi::Distribution; — Usable but potential for confusion and can make usage of both Distribution traits annoying
use rand_distr::multi::MultiDistribution; — Redundant naming, but avoids the above issues so probably the best choice

benjamin-lieser · 2025-03-02T15:33:15Z

I was talking about the naming of sample because its the same as in distribution and makes the method ambiguous when both traits are in scope.

I guess I also prefer the last of the 3 options for the name of the trait.

benjamin-lieser · 2025-03-03T14:30:28Z

Also, should we do this in a v0.6 branch? I guess there will be breaking changes with respect to the Dirichlet at the end and possible some v0.5.* releases.

dhardy · 2025-03-03T15:20:00Z

I was talking about the naming of sample because its the same as in distribution and makes the method ambiguous when both traits are in scope.

This shouldn't be an issue unless a type implements both Distribution and MultiDistribution — but I doubt we'd want that.

Except possibly a symmetric distribution like Dirichlet which could be sampled in one or multiple dimensions? Or if we wanted to transparently support uncorrelated multi-dimensional sampling of 1D distributions like Normal? No, both these are likely bad ideas.

benjamin-lieser · 2025-03-03T18:15:13Z

I was talking about the naming of sample because its the same as in distribution and makes the method ambiguous when both traits are in scope.

This shouldn't be an issue unless a type implements both Distribution and MultiDistribution — but I doubt we'd want that.

Except possibly a symmetric distribution like Dirichlet which could be sampled in one or multiple dimensions? Or if we wanted to transparently support uncorrelated multi-dimensional sampling of 1D distributions like Normal? No, both these are likely bad ideas.

I was thinking about both having MultiDistribution to sample into a buffer and still Distribution which returns a Vec. If someone would anyway allocate for each sample or does not care about allocations, the latter is more convenient.
But I could understand if this would lead to more confusion than it benefits people.

dhardy · 2025-03-04T07:04:23Z

The point of using const generics was to avoid needing to allocate. If allocation is necessary anyway, having both sample_to_buf and sample(...) -> Vec<_> doesn't add much. Moreover, if we are going to support both styles of method, it should be done under the same trait in my opinion — we can implement sample automatically, provided we know the expected sample length.

Which leads to another point: we may want a sample_len or just len method.

So:

pub trait MultiDistribution<T> {
    fn sample_len(&self) -> usize;
    fn sample_to_buf(&self, buf: &mut [T]);
    fn sample(&self) -> Vec<T> where T: Default {
        let mut buf = Vec::new();
        buf.resize_with(self.sample_len(), T::default());
        self.sample_to_buf(&mut buf);
    }
}

That requires T: Default to support sample, which I think is reasonable.

benjamin-lieser · 2025-03-04T09:14:46Z

We should decide if we want to keep the possibility to have multidim sampling without allocations or not. If we do not need it, we can ditch the const generics and have less code and then your proposed Trait would make a lot of sense.

I am still leaning toward keeping it, because I had a usecase where Multinomial samples where extremely time critical and it helps to save the allocations, especially in multithread where there is synchronization with malloc. But this might also be a niche usecase. (I would only sample once per Multinomial)

Edit: Your Trait is also still implementable for a const generic version and you can avoid all allocations. So I would go with this approach.

dhardy · 2025-03-05T08:07:59Z

I thought we did decide to drop the const-generics approach for rand_distr?

Your use-case sounds fairly specific. Maybe there are further optimisations available when sampling only once.

MortenLohne · 2025-03-05T15:31:26Z

Would it be possible to have a non-const generic Dirichlet that can still be used without allocating? Without discussing Distribution traits for now, imagine the following API:

impl<F: Float> Dirichlet<F> {
    pub fn new(alpha: Vec<F>) -> Result<Dirichlet<F>, Error>;
    pub fn sample<R: Rng>(&self, rng: &mut R, output: &mut Vec<F>); // Users can re-use this vector to avoid allocations
    pub fn into_vec(self) -> Vec<F>; // Recovers the memory passed in with `new()`
}

This still requires the alloc feature, but so does the current (const generic) implementation, and afaict no one has presented such a use case yet.

For sample(), we could also generalize the output type to iter::Extend, if we want to allow collecting into other data structures.

benjamin-lieser · 2025-03-05T16:26:40Z

Would it be possible to have a non-const generic Dirichlet that can still be used without allocating? Without discussing Distribution traits for now, imagine the following API:
impl<F: Float> Dirichlet<F> {
    pub fn new(alpha: Vec<F>) -> Result<Dirichlet<F>, Error>;
    pub fn sample<R: Rng>(&self, rng: &mut R, output: &mut Vec<F>); // Users can re-use this vector to avoid allocations
    pub fn into_vec(self) -> Vec<F>; // Recovers the memory passed in with `new()`
}
This still requires the alloc feature, but so does the current (const generic) implementation, and afaict no one has presented such a use case yet.

For sample(), we could also generalize the output type to iter::Extend, if we want to allow collecting into other data structures.

In the case of Dirichlet it needs an array of other distributions (Beta or Gamma), so this would not work. I would also say its a bit to complex of an API.

dhardy

Some comments on comment style below.

More significantly, the sample method now looks identical to Distribution::sample, hence your idea to use that trait likely makes more sense: we can automatically impl Distribution<Vec<T>> where T: Default.

This would also allow an explicit impl of Distribution<[T; N]> where appropriate (e.g. your mentioned const-generic Multinomial).

But, at this point, do we still want the MultiDistribution trait at all?

dhardy · 2025-03-07T06:48:09Z

src/multi/mod.rs

@@ -0,0 +1,30 @@
+//! Contains Multi-dimensional distributions.


Drop the word 'contains' and remove the full-stop since this isn't a sentence.

dhardy · 2025-03-07T06:52:34Z

src/multi/mod.rs

+//! We provide a trait `MultiDistribution` which allows to sample from a multi-dimensional distribution without extra allocations.
+//! All multi-dimensional distributions implement `MultiDistribution` instead of the `Distribution` trait.


We generally wrap comments at 80 chars width (sometimes up to 100 if the line already has a large indent).

The wording could be a little better, e.g.

The MultiDistribution trait allows sampling a multi-dimensional distribution to a pre-allocated buffer or to a new [Vec].

dhardy · 2025-03-07T06:53:27Z

src/multi/mod.rs

+/// This trait allows to sample from a multi-dimensional distribution without extra allocations.
+/// For convenience it also provides a `sample` method which returns the result as a `Vec`.
+pub trait MultiDistribution<T> {


Items have a short one-line description, with additional details in new paragraphs.

dhardy · 2025-03-07T06:54:12Z

src/multi/mod.rs

+    /// returns the length of one sample (dimension of the distribution)
+    fn sample_len(&self) -> usize;


Capitalise the first letter of 'returns'

benjamin-lieser added 4 commits March 2, 2025 10:19

Multidistribution trait

4d7387d

documentation for MultiDistr

0048240

add to lib.rs and more doc

5c96617

better doc

6a5bd80

benjamin-lieser added 3 commits March 2, 2025 16:46

remove pub use of MultiDistribution

c9df79c

remove test impl of MultiDistribution again

e61da69

fmt

4a78dbc

benjamin-lieser added 2 commits March 3, 2025 22:25

move dirichlet

cb4c824

MultiDistribution in Dirichlet, still const generics

7572ce3

benjamin-lieser added 7 commits March 6, 2025 17:18

new MultiDistribution still const gen Dirichlet

b1e663d

fmt

b53aeda

doc

2435d3f

dirichlet usage

c4acb6d

fmt

860897c

doctest

3219cd6

typo

e74b4b4

dhardy reviewed Mar 7, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MultiDistribution #18

MultiDistribution #18

benjamin-lieser commented Mar 2, 2025

benjamin-lieser commented Mar 2, 2025 •

edited

Loading

dhardy commented Mar 2, 2025

benjamin-lieser commented Mar 2, 2025

benjamin-lieser commented Mar 3, 2025

dhardy commented Mar 3, 2025

benjamin-lieser commented Mar 3, 2025 •

edited

Loading

dhardy commented Mar 4, 2025 •

edited

Loading

benjamin-lieser commented Mar 4, 2025 •

edited

Loading

dhardy commented Mar 5, 2025

MortenLohne commented Mar 5, 2025

benjamin-lieser commented Mar 5, 2025

dhardy left a comment

dhardy Mar 7, 2025

dhardy Mar 7, 2025

dhardy Mar 7, 2025

dhardy Mar 7, 2025

		@@ -0,0 +1,30 @@
		//! Contains Multi-dimensional distributions.

		//! We provide a trait `MultiDistribution` which allows to sample from a multi-dimensional distribution without extra allocations.
		//! All multi-dimensional distributions implement `MultiDistribution` instead of the `Distribution` trait.

		/// returns the length of one sample (dimension of the distribution)
		fn sample_len(&self) -> usize;

MultiDistribution #18

Are you sure you want to change the base?

MultiDistribution #18

Conversation

benjamin-lieser commented Mar 2, 2025

Summary

benjamin-lieser commented Mar 2, 2025 • edited Loading

dhardy commented Mar 2, 2025

benjamin-lieser commented Mar 2, 2025

benjamin-lieser commented Mar 3, 2025

dhardy commented Mar 3, 2025

benjamin-lieser commented Mar 3, 2025 • edited Loading

dhardy commented Mar 4, 2025 • edited Loading

benjamin-lieser commented Mar 4, 2025 • edited Loading

dhardy commented Mar 5, 2025

MortenLohne commented Mar 5, 2025

benjamin-lieser commented Mar 5, 2025

dhardy left a comment

Choose a reason for hiding this comment

dhardy Mar 7, 2025

Choose a reason for hiding this comment

dhardy Mar 7, 2025

Choose a reason for hiding this comment

dhardy Mar 7, 2025

Choose a reason for hiding this comment

dhardy Mar 7, 2025

Choose a reason for hiding this comment

benjamin-lieser commented Mar 2, 2025 •

edited

Loading

benjamin-lieser commented Mar 3, 2025 •

edited

Loading

dhardy commented Mar 4, 2025 •

edited

Loading

benjamin-lieser commented Mar 4, 2025 •

edited

Loading