Hi Vitor,
I'm trying to use your idea for schema evolution in my use case, but I can't get it to work; maybe you have an idea for me.
Basically, we read data from Kafka (Avro format) using Schema Registry.
This data is written directly to a Delta table; we work in Databricks.
Using your code I can validate that the input data really is transformed with the new schema at each version, BUT nothing is reflected in the table sink.
I suspect that once Spark has built the query plan it can't change it on the fly; only restarting the process will pick up the new version of the schema.
However, you made it work with Kafka as a sink because you're able to transform the data back to Avro.
Do you think this is possible? The guys from the ABRiS project claim it's impossible, as mentioned here.
Thanks for any help.
Hey @richiesgr, it is indeed not possible to support schema evolution in streaming mode. If you look at SchemaEvolution.scala#L29, you'll see I'm mimicking a batch job by using Trigger.Once with the streaming API; I'm running 3 different jobs in the example.
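For reference, here is a minimal sketch of that pattern (simplified, not the exact example code; the topic, registry URL, broker, and paths are placeholders, and the ABRiS builder calls and the Delta `mergeSchema` option reflect my assumptions about your setup): because each submission builds a fresh query plan, it picks up whatever reader schema the registry reports at that moment, processes the available data, and stops.

```scala
// Simplified Trigger.Once job: the reader schema is resolved from the Schema Registry
// when the query plan is built, so every new submission sees the latest schema.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.streaming.Trigger
import za.co.absa.abris.avro.functions.from_avro
import za.co.absa.abris.config.AbrisConfig

object TriggerOnceJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("schema-evolution-run").getOrCreate()

    // Placeholder topic/registry; the reader schema is fetched once, at plan-build time.
    val abrisConfig = AbrisConfig
      .fromConfluentAvro
      .downloadReaderSchemaByLatestVersion
      .andTopicNameStrategy("events")
      .usingSchemaRegistry("http://schema-registry:8081")

    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .select(from_avro(col("value"), abrisConfig).as("data"))
      .select("data.*")
      .writeStream
      .format("delta")
      .option("checkpointLocation", "/mnt/checkpoints/events")
      .option("mergeSchema", "true") // assumption: you want Delta to add new columns as the schema grows
      .trigger(Trigger.Once())       // process what is available, then stop, like a batch job
      .start("/mnt/delta/events")
      .awaitTermination()
  }
}
```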
The issue is not related to the sink but to the way Spark streaming works. The query execution plan is generated before the stream starts running, which means it only knows the schema fields that were fetched from the schema registry at plan-generation time; any later updates are ignored because the plan stays the same. There is no way of doing this seamlessly unless you build something that listens to schema updates and manages your jobs accordingly, for example along the lines sketched below.
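That "something" could be as simple as a driver loop that polls the registry for the subject's latest version and re-submits the Trigger.Once job whenever it changes, so a fresh plan gets built against the new schema. A rough sketch (the `TriggerOnceJob` object from the snippet above is my own placeholder, the registry URL and subject are placeholders, and in practice you would submit a new cluster job and also run the job on a schedule for new data, rather than calling `main` in-process):

```scala
// Polls the Schema Registry and re-runs the Trigger.Once job whenever a new
// schema version appears for the subject, forcing Spark to rebuild the query plan.
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient

object SchemaWatcher {
  def main(args: Array[String]): Unit = {
    val registryUrl = "http://schema-registry:8081" // placeholder
    val subject     = "events-value"                // placeholder subject (TopicNameStrategy)
    val client      = new CachedSchemaRegistryClient(registryUrl, 128)

    var lastVersion = -1
    while (true) {
      val latestVersion = client.getLatestSchemaMetadata(subject).getVersion
      if (latestVersion != lastVersion) {
        // New (or first) schema version: run the job so the plan is rebuilt
        // against the latest reader schema fetched from the registry.
        TriggerOnceJob.main(Array.empty)
        lastVersion = latestVersion
      }
      Thread.sleep(60 * 1000L) // poll once a minute
    }
  }
}
```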