-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added representation of pointer types. #51
Conversation
On thinking about it, now that the non-zero and alignment requirements are punted to validity, we're just left with size and alignment. Everything else can be part of the notes. |
Do we need to add text about the ABI? |
#[repr(C)] | ||
struct DynObject { | ||
data: &u8, | ||
vtable: &usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this vtable: &usize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To emphasize the fact that it's non-zero and word-aligned. Not part of the representation, but I believe we're going to be requiring this when we come to validity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a very confusing way to make that point, and since it's not even really part of what this write-up is supposed to be about at this point, nor particularly useful, I'd really prefer not doing that. Just *const ()
is good enough for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made them both &u8
.
For slices, we could say something like the representation of #[repr(C)]
struct Slice {
ptr: &[T;N],
len: usize,
} and then when we come to validity, say that |
I think I dealt with a ll the outstanding comments. Can we squash and merge? |
|
||
### Representation | ||
|
||
The alignment of reference and raw pointer types is the word size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized this might not actually be true for custom DSTs (e.g., consider a custom DST with u64
metadata on a 32 bit target). Since the rest of this text is already careful to be forward compatible with custom DSTs, it would be a shame if this part wasn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, worth thinking about. On 32-bit targets, does u64
have 64-bit alignment? Are there architectures with alignment larger than the word size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, u64 is not a good example, because on most (all?) 32 bit targets it has alignment 4. But there are definitely tons of types with super-word alignment -- anyone can define them with #[repr(align(N))]
, and vector types such as __m128
usually require natural alignment (i.e., size = align).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the question is do we want to tie our hands and say that metadata of custom DSTs is at most word-aligned? The trade-offs don't seem obvious to me, I'm trying to come up with an example and the best I can do is something like:
let a: Box<T> = Box::new(T::new());
let b: Box<(*T, usize, usize)> = Box::new(Box::Into_raw(a), 0, 0);
let c: &&U = unsafe { mem::transmute(Box::into_raw(b)) };
The question is can this be insta-UB if U
s metadata is super-word aligned?
This is the simplest example I can think of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allowing metadata with higher alignment comes naturally, as pointers to custom DSTs are just (<untyped data pointer>, T::Metadata)
. Is there any reason to artifically prohibit it? All of the code I know of that can be generic over all DSTs doesn't have to care about alignment or can trivially be made compatible with higher alignments by strategically using mem::align_of::<&T>
(and T
has to be known anyway if you're manipulating pointers, because it dictates the size of the pointer too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My two cents is that we shouldn't say anything about the size and alignment of references here.
We should just say that the layout of references (&T
and &mut T
) and pointers (*const T
and *mut T
) is the same as the layout of the following type:
struct Layout<Metadata> {
ptr: *mut (),
metadata: Metadata,
}
We can then say that:
note:
Metadata === X
is to be read asMetadata
has the same layout asX
- if
T: Sized
, thenMetadata === ()
, - if
T
is a slice, thenMetadata === usize
, - if
T
is a trait, thenMetadata === *const ()
, - if
T
is a multi-trait object, thenMetadata === <implementation-defined>
, - if
T
is a custom DST, thenMetadata === T::Metadata
, - etc.
and the layout of pointers and references just follows from the struct
representation rules, e.g., that they are at least one "word" wide follows from Layout
always being at least one word wide.
Since I've used raw pointers, we have to say something about the size and alignment of raw pointers here, but we already said that their size is the same as usize
, and for alignment we can just say that the raw pointer alignment is an implementation-defined multiple of it's the same as their size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we have custom DSTs in the language, then all pointers are ... { ptr: *mut (), metadata: T::Metadata }
, as custom DSTs generalize slices, trait objects, and sized types (we can still spell this out for clarity but there's no need to have two things called Metadata
or to introduce this ===
relation). But we don't have custom DSTs yet, so it feels very weird to define layout in terms of a concept the language doesn't have yet, even if we can technically avoid any reference to a specific non-existent DynamicallySized
trait.
Since I've used raw pointers, we have to say something about the size and alignment of raw pointers here, but we already said that their size is the same as usize, and for alignment we can just say that the raw pointer alignment is an implementation-defined multiple of their size.
Wait, what? If the alignment is not described yet, it should be described if we can at all get consensus for it. As far as I know, that alignment should be exactly the size. But even if that is controversial, it's the other way around, size has to be a multiple of the alignment (because element i of an array is at byte offset i * sizeof(T)
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we don't have custom DSTs yet, so it feels very weird to define layout in terms of a concept the language doesn't have yet, even if we can technically avoid any reference to a specific non-existent DynamicallySized trait.
Even without the concept of custom DSTs, we do have the concept of generic structs, and their layout, and those can be used to specify the layout of slices and trait objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but unifying these two layouts into one generic struct, and the naming of the type parameter, gestures heavily at the concept of custom DSTs (would anyone make this suggestion if they didn't know of the generalization that custom DSTs bring?), without really buying us anything right now. If anything, it makes the presentation slightly more complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I reworded this to say that reference types have at least word alignment, and have exactly word alignment in the case of references to sized types, &[T]
, &str
or &dyn Trait
.
Co-Authored-By: asajeffrey <ajeffrey@mozilla.com>
OK, try again... are we okay with merging now? |
```rust | ||
#[repr(C)] | ||
struct Slice<T> { | ||
ptr: &T, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: if the length is zero, then what is ptr
? Perhaps *const T
is a better choice -- certainly it will fit the "validity invariants" better (eg, I don't think there is any requirement that it be "dereferenceable" in the event that length is zero). However, from_raw_parts
does state:
data must be non-null and aligned, even for zero-length slices. One reason for this is that enum layout optimizations may rely on references (including slices of any length) being aligned and non-null to distinguish them from other data. You can obtain a pointer that is usable as data for zero-length slices using
NonNull::dangling()
.
It seems worth linking to slice::from_raw_parts
and reproducing some of that text (although arguably those are things best left for the next discussion?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I don't like the use of references here much [0] the guarantee here is that "the representation of &[T]
is the same as of Slice<T>
", - that is, the Slice<T>
struct only applies to the representation / layout of &[T]
and not to its validity.
Ideally, we would use the same examples to show validity and representation, but we haven't settled the validity of these yet (it might well be that we need different "examples" for the &[T]
and the *[T]
cases). I think it is worth it to wait until the validity of these is settled before trying to "unify" all the examples.
[0] Personally, I don't like the use of &T
here much, it has too many connotations, e.g., w.r.t. validity and safety, and we don't really care about them when just talking about the representation. It might be worth it to just say:
The representation of &[T]
is the same as that of:
struct Slice {
ptr: *const (),
len: usize,
}
AFAICT there is no need to make Slice<T>
generic since T: Sized
, but maybe there is something else going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like I said above, we could also use
struct Slice<T> {
ptr: &[T;N],
len: usize,
}
and in practice, N
and len
are the same.
All these defns are equivalent, it's just a case of which is the more human-readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment that *T
means *const T
or *mut T
when the mutability is unimportant, and used *T
rather than &T
. On zulip, @gnzlbg was okay with this, have we finally closed the last issue?
OK, asking yet again, can we squash and merge? |
No objections from my side. |
@avadacatavra I think you can squash and merge. |
Fixes #16