Using the WriteToBigQuery transform for a batch load with the write disposition set to truncate does not work as intended: instead of truncating all target tables, it truncates only the first one.
This happens only when the table IDs are identical within a single batch job but the tables are located in different BQ datasets.
To Reproduce
Steps to reproduce the behavior:
Prepare a JSON file with several inputs destined for identically named BQ tables in different dataset locations
Initiate a BQ load job through the WriteToBigQuery transform
Set the write disposition to BigQueryDisposition.WRITE_TRUNCATE
Run it several times
Observe that only the first table is truncated correctly, and none of the others.
E.g.:
from apache_beam import Pipeline
from apache_beam.io import ReadFromText, WriteToBigQuery
from apache_beam.io.gcp.bigquery import BigQueryDisposition

with Pipeline(options=pipeline_options) as pipeline:
    data = (pipeline | "ReadAll" >> ReadFromText(user_options.source_path))
    (data | "Load data into BQ" >> WriteToBigQuery(..., write_disposition=BigQueryDisposition.WRITE_TRUNCATE))
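To hit the clashing-destination case, the two identically named tables can be reached through a table callable; a minimal sketch, where the project, dataset, and field names are hypothetical (not taken from the original report):

```python
# Hypothetical routing: two datasets that both contain an "events" table.
# A callable like this can be passed as WriteToBigQuery's `table` argument
# to fan records out to both destinations within one batch job.
def destination_for(record):
    # `record` is assumed to carry a "region" field (made-up input schema).
    dataset = "dataset_eu" if record["region"] == "eu" else "dataset_us"
    return f"my-project:{dataset}.events"
```

Both returned references share the table ID `events`, which is exactly the condition that triggers the bug.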
Expected behavior
All identically named tables within different datasets must be truncated properly.
Actual behavior
Only the first table is truncated (whatever "first" means in a heavily distributed system).
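The symptom is consistent with the truncation targets being deduplicated by table ID alone rather than by the full project/dataset/table reference; a toy illustration of that collapse (not the actual SDK code):

```python
# Two distinct tables that differ only in their dataset ID.
tables = [("my-project", "dataset_eu", "events"),
          ("my-project", "dataset_us", "events")]

# Keying on the table ID alone collapses them into a single entry,
# so only one table would ever be truncated.
by_table_id = {table_id for _, _, table_id in tables}

# Keying on the full reference keeps both targets distinct.
by_full_ref = {(project, dataset, table_id) for project, dataset, table_id in tables}

# len(by_table_id) == 1, len(by_full_ref) == 2
```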
Environment (tested on)
Apache Beam version: 2.63.0
Runner: DirectRunner, DataflowRunner
OS: macOS 15.3.1 (build 24D70)
Python version: Python 3.11.9
Additional context
I already have a solution; I still need to add a test, if that is even possible, which I have not yet validated.
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
Component: Python SDK
Component: Java SDK
Component: Go SDK
Component: Typescript SDK
Component: IO connector
Component: Beam YAML
Component: Beam examples
Component: Beam playground
Component: Beam katas
Component: Website
Component: Infrastructure
Component: Spark Runner
Component: Flink Runner
Component: Samza Runner
Component: Twister2 Runner
Component: Hazelcast Jet Runner
Component: Google Cloud Dataflow Runner
…th WRITE_TRUNCATE write disposition (apache#34247)
* It only truncated the first table, because the original code did not take care of identical table IDs belonging to different dataset IDs.