-
Notifications
You must be signed in to change notification settings - Fork 47.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fizz] deterministic text separators #24637
Conversation
This change addresses two related flaws in the current approach to text separators in Fizz. First text separators can be emitted more often than necessary. This is because Segments don't know whether they will be followed by text and so if they end in text they assume they are and insert a separators Second, becuase of the flaw above, Fizz output is not deterministic. If you wait for everything to be ready before flushing you could still end up with different output depending on whether certain segments are created or rendered inline based on the particular behavior of Suspendable components for any given run. If a Segment emits an unecessary text separator in one pass and in another the Segment is never created because that Component did not Suspend the string output would differ for otherwise identical content This change fixes both issues by decorating Segments with additional metadata about what kinds of emitted content is at it's bounds. These are called Edges and can be NODE, TEXT, or a Segment. NODE and TEXT are type identifiers that refer generically to these mutually exclusive concepts. if an edge is a Segment however it is the actual reference to the Segment object, which allows us to make inferrences at flushing time as to whether additional text separators are needed. There are 2 tracked edges per Segment currentEdge - a pointer to type of the prior thing emitted for a Segment. If the segment has not yet been worked on this will be the Edge just before the Segment start. If the Segment is currently being rendered it will be the type or reference of the last thing this Segment emitted. If the Segment is complete it definitionally be the trailing edge of the Segment representing the type emitted at the boundary. followingEdge - this edge is set by later Segments when the currentEdge of that segment is a Segment. If the currentEdge is a Segment and a new Edge is identified it is saved to the followingEdge of the currentEdge segment. Finally, when flushing 2 things happen If the just written Segment ends with text (currentEdge is TEXT_NODE) and if the followingEdge is also text, then a separator is written in between. If the followingEdge is instead another Completed Segment this segment is empty and we walk to that Segment's followingEdge and continue looking for Text.
Comparing: a276638...e68ac36 Critical size changesIncludes critical production bundles, as well as any change greater than 2%:
Significant size changesIncludes any change greater than 0.2%: Expand to show
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do the object reference thing. It'll be tricky to do with true parallelism since the objects will span threads, and for reuse of segments across caches.
I don't think you should need that though because during flush you'll always have access to what you've just written and it's sequential at that point so there are no parallelization issues.
You can keep track of if the flushSegment just flushed a text edge in flushSubtree and if so insert a separator.
// A SegmentEdge is a descriptor for the kind of object on either side of a segments boundary (leading and trailing). | ||
// In some cases the Edge type isn't known because the edge is a child segment itself. For these cases we use | ||
// the child segment to identify as the edge unless and until a known edge type is found | ||
type SegmentEdge = 0 | 1 | Segment; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't mix number and pointer types. This will mess up the JIT. Always use types as you would in Rust. (This means that we should really not mix null
and numbers neither.)
Storing the pointers is not great for memory management purposes when we port this. In general I try to avoid cross-linking between objects unless absolutely necessary since it breaks the simplicity of ownership. It also makes GC less predictable.
It also makes it harder to do tricks like moving segments around or reusing cached ones if the structure assumes some kind of graph.
I couldn't figure out how to do this. The challenge I see is segment.chunks are opaque so I need to stash the emitted types at the edges (between children) on the segments that interleave them. I was able to de-link the segment objects but I now am tracking both sides of a segments front and back edges so I can flush leading and/or trailing separators in the flushSegment. I pushed a rough version but there are some things I am not loving, for instance setting postEdge on the most recent segment if no new chunks have been written. This feels over-complicated and you seem to have an idea of how it could be done simply but I'm just not seeing it |
This pull request has been automatically marked as stale. If this pull request is still relevant, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize reviewing it yet. Your contribution is very much appreciated. |
Closing this pull request after a prolonged period of inactivity. If this issue is still present in the latest release, please ask for this pull request to be reopened. Thank you! |
This change addresses two related flaws in the current approach to text separators in Fizz.
First text separators can be emitted more often than necessary. This is because Segments don't know whether they will be followed by text and so if they end in text they assume they are and insert a separators
Second, because of the flaw above, Fizz output is not deterministic. If you wait for everything to be ready before flushing you could still end up with different output depending on whether certain segments are created or rendered inline based on the particular behavior of Suspendable components for any given run. If a Segment emits an unnecessary text separator in one pass and in another the Segment is never created because that Component did not Suspend the string output would differ for otherwise identical content
This change fixes both issues by decorating Segments with additional metadata about what kinds of emitted content is at it's bounds. These are called Edges and can be NODE, TEXT, or a Segment. NODE and TEXT are type identifiers that refer generically to these mutually exclusive concepts. if an edge is a Segment however it is the actual reference to the Segment object, which allows us to make inferences at flushing time as to whether additional text separators are needed.
There are 2 tracked edges per Segment
currentEdge - a pointer to type of the prior thing emitted for a Segment. If the segment has not yet been worked on this will be the Edge just before the Segment start. If the Segment is currently being rendered it will be the type or reference of the last thing this Segment emitted. If the Segment is complete it definitionally be the trailing edge of the Segment representing the type emitted at the boundary.
followingEdge - this edge is set by later Segments when the currentEdge of that segment is a Segment. If the currentEdge is a Segment and a new Edge is identified it is saved to the followingEdge of the currentEdge segment.
Finally, when flushing 2 things happen
If the just written Segment ends with text (currentEdge is TEXT_NODE) and if the followingEdge is also text, then a separator is written in between. If the followingEdge is instead another Completed Segment this segment is empty and we walk to that Segment's followingEdge and continue looking for Text.