Fix potential workspace deadlocks (ocrd_network) #1142
@kba, @joschrew let me know your opinion regarding this. To reproduce, just remove one processor from the processing server config file but keep that processor in the workflow script.
I am not sure whether I understand the alternatives. I think the behavior as of 6b88ad2 (returning 404 and |
Yes, but this is only correct for the processing endpoint. However, from the viewpoint of the workflow endpoint, it is different. It will be a bad response when the user expects some |
I would choose whatever is easier to implement. I am not sure, but I think having the processing endpoint "always" return a job_id might be easier to achieve.
Well, that is more convenient with the workflow endpoint, but unfortunate if the user wants to run a single job through the processing endpoint. The response for processing job submission would always be a
That's true.
Yes, when everything seems fine with the request before it is forwarded either to a processing worker or processor server, the processing server returns 200 with a job_id to the user. However, when the job execution by either of the agents fails, that's when the job status changes to FAILED.
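A minimal sketch of the job lifecycle described above, not the actual ocrd_network code. The names `JobState`, `Job`, `JOBS`, `submit_job` and `report_result` are hypothetical, as are the `QUEUED` and `SUCCESS` states; only the 200 response with a job_id and the `FAILED` status come from the discussion.

```python
from dataclasses import dataclass
from enum import Enum
from uuid import uuid4


class JobState(str, Enum):
    # Only FAILED is named in the discussion; the other states are assumptions.
    QUEUED = "QUEUED"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


@dataclass
class Job:
    job_id: str
    state: JobState


# Hypothetical in-memory stand-in for the job database.
JOBS = {}


def submit_job(request_is_valid):
    """Accept a job before it is forwarded to a worker or processor server.

    Returns (200, job_id) once the request itself looks fine. Execution
    has not happened yet, so the job starts out as QUEUED.
    """
    if not request_is_valid:
        raise ValueError("request rejected before forwarding")
    job = Job(job_id=str(uuid4()), state=JobState.QUEUED)
    JOBS[job.job_id] = job
    return 200, job.job_id


def report_result(job_id, succeeded):
    """Called when the agent finishes; failures only show up here."""
    JOBS[job_id].state = JobState.SUCCESS if succeeded else JobState.FAILED
```

The point of the split is that a 200 on submission says nothing about the outcome of the run; the caller has to look at the job status afterwards.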
That's a good idea.
Yes, a mixture is the right way. Unfortunately, the processor existence check still needs to happen in the processing endpoint as well.
This is now implemented and was the most basic way to resolve the issue.
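The existence check discussed here could look roughly like the following. This is a sketch under assumptions, not the merged implementation: `find_undeployed` and `run_workflow` are hypothetical helpers, and the lists of processor names stand in for the workflow script and the processing server config.

```python
def find_undeployed(workflow_processors, deployed_processors):
    """Return workflow processors with no deployed agent, in first-seen order."""
    deployed = set(deployed_processors)
    missing = []
    for name in workflow_processors:
        if name not in deployed and name not in missing:
            missing.append(name)
    return missing


def run_workflow(workflow_processors, deployed_processors):
    """Refuse to start (and to lock anything) if any agent is missing."""
    missing = find_undeployed(workflow_processors, deployed_processors)
    if missing:
        # Failing here, before any job is submitted, means no workspace
        # has been locked yet.
        raise LookupError("Processing agents not deployed: " + ", ".join(missing))
    # Placeholder for the real submission of each step.
    return ["submitted:" + name for name in workflow_processors]
```

Running the same check in both the workflow endpoint and the processing endpoint covers users who call either endpoint directly.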
LGTM. I have checked what has changed codewise and run some tests with docker-compose and the changes. For me it works as expected.
I noticed 2 things which could optionally be changed additionally (GitHub does not make it possible for me to add suggestions for these):
- processing_server.py line 533: the "TODO" comment could be removed, as I think the mentioned changes have probably been made. But maybe more of the TODO notes should be revised in another pass.
- server_utils.py line 94: the response message when input validation of the processor job fails should be clearer, in my opinion. At the very least, the same message the logger writes could be sent back to the client.
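One way to keep the logged message and the client response in sync is to build the message once and use it for both. This is only an illustration of the suggestion; `build_validation_error`, the logger name and the response shape are assumptions, not the code in server_utils.py.

```python
import logging

# Hypothetical logger name for this sketch.
logger = logging.getLogger("job_validation_sketch")


def build_validation_error(processor_name, errors):
    """Log a validation failure and return the same text for the client."""
    message = (
        "Invalid job input for processor '" + processor_name + "': "
        + "; ".join(errors)
    )
    logger.error(message)
    # The response body carries exactly what was logged.
    return {"detail": message}
```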
True, the duplications were fixed in the current PR.
Added.
It makes sense to catch undeployed agents before executing a workflow and if users are using the endpoints directly, they can be expected to poll the job. So LGTM, thanks, merging.
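For users who call the processing endpoint directly, polling could be sketched as below. `poll_job` is a hypothetical client-side helper: `get_state` stands in for whatever call fetches the job status, and `SUCCESS` is an assumed name for the successful terminal state (only `FAILED` is mentioned in this thread).

```python
import time


def poll_job(get_state, job_id, interval=1.0, max_attempts=10, sleep=time.sleep):
    """Ask for the job state until it is terminal or attempts run out.

    get_state: callable taking a job_id and returning a state string.
    sleep: injectable so the wait can be skipped in tests.
    """
    for _ in range(max_attempts):
        state = get_state(job_id)
        if state in ("SUCCESS", "FAILED"):
            return state
        sleep(interval)
    raise TimeoutError("job " + job_id + " did not finish in time")
```

Because submission always returns a job_id, a `FAILED` result from the poll is the only place a direct caller learns that execution went wrong.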
Fix for #1125:
[...] (processing worker or processor server) is not deployed. As a result, the workspace will no longer be locked for cases where the processing agent is missing.
There are still issues related to the workflow endpoint. The workflow endpoint assumes that the processing endpoint would always return a job_id. However, this is not the case, hence, either: [...] job_id [...]
The first choice was implemented.