[Bug]: javaJar provider does not work with --yaml_pipeline flag: TypeError: a bytes-like object is required, not 'str' #34343
Comments
@robertwb or @derrickaw can you check please?
BTW @jonathaningram, is it possible for you to also try running from a machine where Beam is directly installed in a virtual environment [1] [2] instead of running from a Docker container? As mentioned elsewhere, the published "beam_python3.11_sdk" container [3] is intended to be our worker container that is used internally by Beam runners. Submitting jobs from that container is not something we test/support currently. [1] https://beam.apache.org/documentation/sdks/yaml/#prerequisites
@chamikaramj yep I can look at that. Is that mostly just about my local version being done “the right way” for future issues/support? Or is there info for this ticket you’re hoping to gain that I can provide after doing that?
I think it should be useful for this ticket. It could be that running from SDK harness containers is just broken in a strange way since that's something we don't test/support officially.
The containers such as beam_python3.11_sdk_with_java:2.63.0 are not meant for constructing Beam pipelines; we build them to provide to workers for executing the pipeline as a distributed system (I filed #34350). Granted, they install a lot of the same bits, but they're certainly not tested as being a full working dev environment.
That being said, this does look like a bug and a PR with your suggested patch would be appreciated. (Still looking into why we get a bytes object here to begin with.)
Thanks for catching this! Alternative fix at #34351 (turns out the bytes object was coming from |
@robertwb awesome, thank you. I'm glad you did the fix; it would have taken me some time to understand if it was the right fix. Do you know what the release timing for ending up in GCP looks like for this? Just want to set my own expectations for when I could try again.
Yeah, that behavior is pretty surprising. The timing is actually pretty good: the Beam release will be cut shortly and out within ~weeks. GCP should pick things up shortly after that. Until then you can use your patch of this.
@robertwb thanks. Excuse my ignorance, but how do I apply this patch for running a job in GCP Dataflow?
What happened?
Beam version: at least v2.63.0.
The --yaml_pipeline flag contains a string-like version of the pipeline. The --yaml_pipeline_file flag contains a path to the file.
We can successfully use the --yaml_pipeline_file flag locally to run our YAML pipeline. As soon as we switch to --yaml_pipeline, it fails with an error. We tried both the --yaml-pipeline and --yaml-pipeline-file flags from gcloud dataflow yaml run, and both seem to have the same issue.
Note: We haven't been able to run any YAML pipeline with a Java provider successfully in Dataflow, so we're interested in the possibility of a patch being applied to Dataflow, or if there's a workaround that would be great.
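For reference, the two local invocations differ only in how the pipeline is handed over. A rough sketch of how the two argument lists could be built (the helper name yaml_main_argv is hypothetical, and it assumes the pipeline is launched through the apache_beam.yaml.main module as described in the Beam YAML docs; this is not taken from the repro):

```python
import sys
from pathlib import Path


def yaml_main_argv(pipeline_path, inline):
    """Build an argument list for launching a Beam YAML pipeline.

    Hypothetical helper for illustration only.
    inline=False passes the path via --yaml_pipeline_file.
    inline=True reads the file and passes its text via --yaml_pipeline.
    """
    base = [sys.executable, "-m", "apache_beam.yaml.main"]
    if inline:
        # The whole YAML document travels as a single string argument.
        spec = Path(pipeline_path).read_text(encoding="utf-8")
        return base + ["--yaml_pipeline=" + spec]
    # Only the path travels; Beam reads the file itself.
    return base + ["--yaml_pipeline_file=" + str(pipeline_path)]
```

Either list could then be handed to subprocess.run; the expectation in this ticket is that both forms behave the same, which is what currently fails.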
Stack trace
I've made a repro here: https://github.com/jonathaningram/beam-starter-java-provider-repro which contains much of the same info as I've put in this ticket.
The issue seems to be an encoding one.
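To illustrate the class of error (this is not Beam's code, only a minimal sketch of a str/bytes mismatch; the names as_text and split_on_comma are hypothetical):

```python
def as_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it arrived as bytes."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return value


def split_on_comma(value):
    # The separator is a str, so this only works when value is a str too.
    return value.split(",")


raw = b"a,b"  # stand-in for a value that unexpectedly arrives as bytes

try:
    split_on_comma(raw)
    error_message = None
except TypeError as exc:
    # Same wording as the TypeError in the title of this issue.
    error_message = str(exc)

# Normalising to str before use avoids the mismatch.
fixed = split_on_comma(as_text(raw))
```

Where the decode belongs in Beam (at the point the value is produced or where it is consumed) is a separate question that this sketch does not answer.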
A possible patch that works locally, but I haven't verified how suitable the fix is, so I've not proposed a PR.
Inside the beam repo:
You can mount the beam source code in the container in my repro and observe that it now works:
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components