-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set runtime schema definitions in the topology #16732
Labels
type: feature
A value-adding code addition that introduce new functionality.
Comments
24 tasks
github-merge-queue bot
pushed a commit
that referenced
this issue
Jun 29, 2023
closes #16732 In order for sinks to use semantic meaning, they need a mapping of meanings to fields. This is included in the schema definition of events, but the exact definition that needs to be used depends on the path the event took to get to the sink. The schema definition of an event is tracked at runtime so this can be determined. A `parent_id` was added to event metadata to track the previous component that an event came from, which lets the topology select the correct schema definition to attach to events. For sources, there is only one definition that can be attached (for each port). This is automatically attached in the topology layer (after an event is emitted by a source), so there is no additional work in each source to support this. For transforms, it's slightly more complicated. The schema definition depends on both the output port _and_ the component the event came from. A map is generated at Vector startup, and the correct definition is obtained from that at runtime. This also happens in the topology layer so transforms don't need to worry about this. Previously the `remap` transform had custom code to support runtime schema definitions (for the VRL meaning functions). This was removed since it's now handled automatically. The `reduce` and `lua` transforms are special cases since there is no clear "path" that an event takes through the topology, since multiple events can be merged (from different inputs) in `reduce`. For `lua`, output events may not be related to input events at all. In these cases the schema definition map will have the same value for all inputs (they are all merged). The topology will then arbitrarily pick one (since they are all the same). --------- Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com> Co-authored-by: Stephen Wakely <fungus.humungus@gmail.com>
Nithiya2021
pushed a commit
to Nithiya2021/vector
that referenced
this issue
Jan 19, 2024
closes vectordotdev/vector#16732 In order for sinks to use semantic meaning, they need a mapping of meanings to fields. This is included in the schema definition of events, but the exact definition that needs to be used depends on the path the event took to get to the sink. The schema definition of an event is tracked at runtime so this can be determined. A `parent_id` was added to event metadata to track the previous component that an event came from, which lets the topology select the correct schema definition to attach to events. For sources, there is only one definition that can be attached (for each port). This is automatically attached in the topology layer (after an event is emitted by a source), so there is no additional work in each source to support this. For transforms, it's slightly more complicated. The schema definition depends on both the output port _and_ the component the event came from. A map is generated at Vector startup, and the correct definition is obtained from that at runtime. This also happens in the topology layer so transforms don't need to worry about this. Previously the `remap` transform had custom code to support runtime schema definitions (for the VRL meaning functions). This was removed since it's now handled automatically. The `reduce` and `lua` transforms are special cases since there is no clear "path" that an event takes through the topology, since multiple events can be merged (from different inputs) in `reduce`. For `lua`, output events may not be related to input events at all. In these cases the schema definition map will have the same value for all inputs (they are all merged). The topology will then arbitrarily pick one (since they are all the same). --------- Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com> Co-authored-by: Stephen Wakely <fungus.humungus@gmail.com>
Nithiya2021
pushed a commit
to Nithiya2021/vector
that referenced
this issue
Jan 19, 2024
closes vectordotdev/vector#16732 In order for sinks to use semantic meaning, they need a mapping of meanings to fields. This is included in the schema definition of events, but the exact definition that needs to be used depends on the path the event took to get to the sink. The schema definition of an event is tracked at runtime so this can be determined. A `parent_id` was added to event metadata to track the previous component that an event came from, which lets the topology select the correct schema definition to attach to events. For sources, there is only one definition that can be attached (for each port). This is automatically attached in the topology layer (after an event is emitted by a source), so there is no additional work in each source to support this. For transforms, it's slightly more complicated. The schema definition depends on both the output port _and_ the component the event came from. A map is generated at Vector startup, and the correct definition is obtained from that at runtime. This also happens in the topology layer so transforms don't need to worry about this. Previously the `remap` transform had custom code to support runtime schema definitions (for the VRL meaning functions). This was removed since it's now handled automatically. The `reduce` and `lua` transforms are special cases since there is no clear "path" that an event takes through the topology, since multiple events can be merged (from different inputs) in `reduce`. For `lua`, output events may not be related to input events at all. In these cases the schema definition map will have the same value for all inputs (they are all merged). The topology will then arbitrarily pick one (since they are all the same). --------- Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com> Co-authored-by: Stephen Wakely <fungus.humungus@gmail.com>
Nithiya2021
pushed a commit
to Nithiya2021/vector
that referenced
this issue
Jan 19, 2024
closes vectordotdev/vector#16732 In order for sinks to use semantic meaning, they need a mapping of meanings to fields. This is included in the schema definition of events, but the exact definition that needs to be used depends on the path the event took to get to the sink. The schema definition of an event is tracked at runtime so this can be determined. A `parent_id` was added to event metadata to track the previous component that an event came from, which lets the topology select the correct schema definition to attach to events. For sources, there is only one definition that can be attached (for each port). This is automatically attached in the topology layer (after an event is emitted by a source), so there is no additional work in each source to support this. For transforms, it's slightly more complicated. The schema definition depends on both the output port _and_ the component the event came from. A map is generated at Vector startup, and the correct definition is obtained from that at runtime. This also happens in the topology layer so transforms don't need to worry about this. Previously the `remap` transform had custom code to support runtime schema definitions (for the VRL meaning functions). This was removed since it's now handled automatically. The `reduce` and `lua` transforms are special cases since there is no clear "path" that an event takes through the topology, since multiple events can be merged (from different inputs) in `reduce`. For `lua`, output events may not be related to input events at all. In these cases the schema definition map will have the same value for all inputs (they are all merged). The topology will then arbitrarily pick one (since they are all the same). --------- Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com> Co-authored-by: Stephen Wakely <fungus.humungus@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
A note for the community
Use Cases
depends on #16730
Since semantic meanings are now used in sinks (when log namespacing is enabled) that means that the schema definition needs to be accessible on each event at runtime so that the meanings can be looked up. This already exists in
EventMetadata
, but it is not currently being set. Once the issue linked above is done, it should be fairly straightforward to keep a definition for each source, and a map of definitions (OutputId -> Definition) for each transform, so that the schema definition can be set on each event after it's generated from a component. The definitions should be stored in anArc
so they are cheap to clone into event metadata for each event.The text was updated successfully, but these errors were encountered: