
Support for schema evolution in Delta table sink #1

Closed
richiesgr opened this issue Mar 21, 2023 · 2 comments


@richiesgr

Hi Vitor

I'm trying to use your idea for schema evolution in my use case, but I can't get it to work; maybe you have an idea for me.
Basically, we read data from Kafka (Avro format) using Schema Registry.
This data is written directly to a Delta table; we work in Databricks.
After using your code I can validate that the input data really is transformed using the new schema at each version, BUT nothing is reflected in the table sink.
I suspect that once Spark has built the query plan/model it's not able to change it on the fly, and only restarting the process will pick up the new version of the schema.

However, you did it in the case of Kafka as a sink, because there you're able to transform the data back to Avro.

Do you think it's possible to do this? The guys from the ABRiS project claim that it's impossible at all, as mentioned here.

Thanks for any help

@Orpheuz
Owner

Orpheuz commented Mar 27, 2023

Hey @richiesgr, it is indeed not possible to support schema evolution in streaming mode. If you look at SchemaEvolution.scala#L29, I'm mimicking a batch job by using Trigger.Once with the streaming API; I'm running 3 different jobs in the example.
The issue is not related to the sink but to the way Spark streaming works. The query execution plan is generated before the stream runs, which means it only knows the schema fields that were fetched from the schema registry at plan generation; any later updates will be ignored because the plan stays the same. There is no way of doing it seamlessly unless you build something that listens for schema updates and manages your jobs accordingly.
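The "listens to schema updates and manages your jobs" idea above could be sketched as a small supervisor loop: poll the registry for the latest schema version and, whenever it changes, launch a fresh batch-like run (e.g. a Trigger.Once Spark job), so each run builds its plan against the current schema. This is a minimal illustration, not code from this repository; `fetchLatestVersion` and `runBatchLikeJob` are hypothetical placeholders for a Schema Registry lookup and a job launcher.

```scala
// Sketch of the restart-on-schema-change pattern: the streaming plan is
// fixed when the query starts, so an external supervisor must detect a new
// Schema Registry version and relaunch the job. All names here
// (fetchLatestVersion, runBatchLikeJob) are hypothetical placeholders.
object SchemaSupervisor {
  // Placeholder for a Schema Registry lookup (e.g. via its REST API).
  type VersionFetcher = () => Int

  /** Polls the registry `polls` times and starts a new run whenever the
    * fetched schema version differs from the one the last run used.
    * Returns the versions that triggered a run, in order. */
  def superviseRuns(fetchLatestVersion: VersionFetcher,
                    polls: Int,
                    runBatchLikeJob: Int => Unit): List[Int] = {
    var started = List.empty[Int]
    var current = -1 // no run yet
    for (_ <- 1 to polls) {
      val latest = fetchLatestVersion()
      if (latest != current) {   // new schema published since the last run
        runBatchLikeJob(latest)  // e.g. submit a Trigger.Once Spark job
        current = latest
        started = started :+ latest
      }
    }
    started
  }
}
```

With a simulated registry that reports versions 1, 1, 2, 2, 3 across five polls, the supervisor launches exactly three runs, one per distinct version. In practice the polling loop would be a long-running service (or a registry webhook/listener), and each launched run would re-fetch the schema itself at plan time.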

@richiesgr
Author

Thanks for your help and great job
