How to handle deserialization issues in from_avro? #182

nsanglar · 2021-02-08T14:45:55Z

Hello!

I am currently facing the following issue:

We read Avro records from a topic with Spark Streaming (2.4.x).
One of the Avro records contains a malformed byte array (the type is bytes with logical type decimal).
This makes deserialization fail, and the job aborts before it can commit the processed offset.
Upon restart, the job re-reads the faulty record and cannot make progress.

I would like to be able to ignore such cases where deserialization fails, but am struggling to find a nice solution.
Would you have any idea?

cerveada · 2021-02-10T07:39:44Z

Hello, sorry, right now there is no option in ABRiS to handle that. I created a ticket for it: #183. For now, the only option is to detect and replace or fix that row before ABRiS is called.
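As one way to picture the "detect before ABRiS is called" step for the decimal case, here is a minimal sketch. It assumes "malformed" means the decimal bytes are empty or decode to more digits than the schema's precision allows, and that the raw field bytes have already been extracted from the record (not shown). `DecimalPrecheck` and `decodeDecimal` are hypothetical names, not part of ABRiS or Avro.

```scala
import java.math.{BigDecimal, BigInteger}
import scala.util.Try

object DecimalPrecheck {
  // Hypothetical helper, not part of ABRiS or Avro.
  // An Avro decimal is stored as the two's-complement, big-endian
  // unscaled integer; precision and scale come from the schema.
  // Returns None when the bytes cannot be decoded (e.g. empty array)
  // or when the value has more digits than the declared precision.
  def decodeDecimal(bytes: Array[Byte], precision: Int, scale: Int): Option[BigDecimal] =
    Try(new BigDecimal(new BigInteger(bytes), scale)).toOption
      .filter(d => d.precision() <= precision)
}
```

Rows for which this returns `None` could then be dropped or replaced before the payload column is handed to `from_avro`. Whether this catches the actual failure depends on what is wrong with the bytes, which the thread does not specify.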

moyphilip · 2021-03-05T20:30:32Z

@nsanglar hey, did you find a solution to your problem? I ran into a similar issue.

nsanglar · 2021-03-15T08:11:55Z

@moyphilip I currently have a fork of the project in which I apply a different logic here:

ABRiS/src/main/scala/za/co/absa/abris/avro/sql/AvroDataToCatalyst.scala

Line 82 in 6520a91

    
    case NonFatal(e) => throw new SparkException("Malformed records are detected in record parsing.", e)

Instead of throwing an exception, I return an empty row and log an error.
I guess this is quite specific to my use case, so I am not sure it would be appropriate to incorporate it upstream.
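The fork itself is not shown in this thread, so the following is only a sketch of the general pattern described above: catch non-fatal errors, log them, and return an empty result instead of rethrowing. `LenientDecode`, `decodeOrNone`, and the `decode` parameter are hypothetical; `decode` stands in for the real Avro-to-row conversion.

```scala
import scala.util.control.NonFatal

object LenientDecode {
  // Hypothetical wrapper, not the actual fork of AvroDataToCatalyst.
  // Runs `decode` on the payload; on any non-fatal error it logs to
  // stderr and returns None instead of rethrowing.
  def decodeOrNone[A](payload: Array[Byte])(decode: Array[Byte] => A): Option[A] =
    try Some(decode(payload))
    catch {
      case NonFatal(e) =>
        System.err.println(s"Skipping malformed record (${payload.length} bytes): ${e.getMessage}")
        None
    }
}
```

Inside a Spark expression the equivalent of `None` would presumably be a null or empty row, so downstream code would need to filter those rows out. Swallowing errors this way also hides data loss, which may be why it was considered too use-case-specific for upstream.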

ScaddingJ · 2022-01-21T14:13:15Z

@nsanglar how did you manage to return an empty row in your use case? I have a reason to do the same in mine.

cerveada added the "question" label (Further information is requested) Feb 10, 2021

cerveada closed this as completed Feb 22, 2021
