
[Bug]: WriteToBigQuery doesn't do WRITE_TRUNCATE properly with identical table names but in different datasets #34247

Open
portikCoder opened this issue Mar 11, 2025 · 1 comment

Comments

@portikCoder

portikCoder commented Mar 11, 2025

What happened?

Describe the bug

Using the WriteToBigQuery transform for a batch load with the write disposition set to truncate does not work as intended: instead of truncating all target tables, it truncates only the first one.
This happens only when the table IDs are identical within a single batch job but the tables are located in different BQ datasets.

To Reproduce

Steps to reproduce the behavior:

  1. Prepare a JSON file with several records targeting identically named BQ tables located in different datasets
  2. Initiate a BQ load job through the WriteToBigQuery transform
  3. Set the write disposition to BigQueryDisposition.WRITE_TRUNCATE
  4. Run it several times
  5. Observe that only the first table is truncated correctly; none of the others are.

E.g.:

with Pipeline(options=pipeline_options) as pipeline:
    data = pipeline | "ReadAll" >> ReadFromText(user_options.source_path)
    data | "Load data into BQ" >> WriteToBigQuery(
        ..., write_disposition=BigQueryDisposition.WRITE_TRUNCATE)
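For step 1, the newline-delimited JSON input might look like the following (dataset, table, and field names are purely illustrative; each record carries the dataset it should land in, while the table name is the same in both):

```json
{"dataset": "dataset_a", "table": "events", "user_id": 1}
{"dataset": "dataset_b", "table": "events", "user_id": 2}
```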

Expected behavior

All of the identically named tables across the different datasets must be truncated properly.

Actual behavior

Only the first table is truncated (whatever "first" means in a heavily distributed system).

Environment (tested on)

  • Apache Beam version: 2.63.0
  • Runner: DirectRunner, DataflowRunner
  • OS: MacOS 15.3.1; build: 24D70
  • Python version: Python 3.11.9

Additional context

I already have a solution; I just need to add a test, if that is even possible. I haven't validated that yet.
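As context for why this class of bug can occur, here is a minimal, hypothetical illustration in plain Python (this is not the actual Beam internals): if destinations pending truncation were deduplicated by the bare table ID rather than the full project:dataset.table reference, the two destinations below would collide and one truncation would be silently dropped.

```python
# Hypothetical illustration of the reported symptom: two destinations that
# differ only in dataset ID collapse into one entry when keyed by table ID alone.
# All names below are made up for the example.
destinations = [
    "my-project:dataset_a.events",
    "my-project:dataset_b.events",
]

def table_id_only(spec):
    # Keep only the part after the last dot, i.e. the bare table name.
    return spec.rsplit(".", 1)[-1]

# Buggy keying: both destinations map to the same key "events",
# so only one of them would ever be scheduled for truncation.
buggy = {table_id_only(d): d for d in destinations}
print(len(buggy))   # prints 1 -> one truncation is lost

# Correct keying: use the full project:dataset.table reference.
fixed = {d: d for d in destinations}
print(len(fixed))   # prints 2 -> both tables are truncated
```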

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@portikCoder
Author

portikCoder commented Mar 11, 2025

.take-issue
.add-labels P2,dataflow,python

portikCoder added a commit to portikCoder/beam that referenced this issue Mar 11, 2025
…th WRITE_TRUNCATE write disposition (apache#34247)

* It doesn't take care of identical table-ids but from different dataset-id.
portikCoder added a commit to portikCoder/beam that referenced this issue Mar 11, 2025
…th WRITE_TRUNCATE write disposition (apache#34247)

* It only truncates the first table, but originally didn't take care of identical table-ids but from different dataset-id.