This repository was archived by the owner on Apr 8, 2024. It is now read-only.

Workers do not reconnect after TRANSIENT_FAILURE #152

Closed
jwulf opened this issue Apr 6, 2020 · 3 comments
@jwulf
Member

jwulf commented Apr 6, 2020

It looks like workers do not reconnect after a TRANSIENT_FAILURE with the pure JS gRPC implementation.

My hypothesis is that the channel goes to TRANSIENT_FAILURE, and the channel state does not switch again until another gRPC call is made.

In order to deal with this, I am changing the GrpcClient failure logic to wait for 5 seconds on a TRANSIENT_FAILURE, then resolve to reconnected and unblock further operations.
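A minimal sketch of that wait-then-unblock idea, for illustration only. This is not the actual GrpcClient code: `ReconnectGate`, `onChannelState`, `whenReady`, and the injectable `Scheduler` are hypothetical names, and the sketch assumes something else reports channel states to the gate.

```typescript
// Hypothetical sketch: block operations on TRANSIENT_FAILURE, then
// optimistically unblock after a fixed delay (5 seconds by default).
type ChannelState = 'IDLE' | 'CONNECTING' | 'READY' | 'TRANSIENT_FAILURE' | 'SHUTDOWN';

// Injectable timer (an assumption of this sketch) so the delay can be faked.
type Scheduler = (fn: () => void, ms: number) => void;

class ReconnectGate {
  private blocked = false;
  private waiters: Array<() => void> = [];
  private delayMs: number;
  private schedule: Scheduler;

  constructor(delayMs: number = 5000, schedule?: Scheduler) {
    this.delayMs = delayMs;
    this.schedule = schedule ?? ((fn, ms) => { setTimeout(fn, ms); });
  }

  // Call with each observed channel state.
  onChannelState(state: ChannelState): void {
    // Only a TRANSIENT_FAILURE starts a wait; ignore it if already waiting.
    if (state !== 'TRANSIENT_FAILURE' || this.blocked) return;
    this.blocked = true;
    this.schedule(() => this.release(), this.delayMs);
  }

  isBlocked(): boolean {
    return this.blocked;
  }

  // Resolves immediately when not blocked, otherwise once the delay elapses.
  whenReady(): Promise<void> {
    if (!this.blocked) return Promise.resolve();
    return new Promise<void>(resolve => { this.waiters.push(resolve); });
  }

  // Treat the channel as reconnected and wake everything that was waiting.
  private release(): void {
    this.blocked = false;
    const pending = this.waiters;
    this.waiters = [];
    pending.forEach(resolve => resolve());
  }
}
```

With something like this, a worker would `await gate.whenReady()` before issuing its next call, so the next gRPC call is the one that prompts the channel to leave TRANSIENT_FAILURE.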

@jwulf jwulf self-assigned this Apr 6, 2020
@s3than s3than closed this as completed in 5bc2e12 Apr 6, 2020
@jwulf jwulf reopened this Apr 7, 2020
@jwulf
Member Author

jwulf commented Apr 7, 2020

Yeah, this is still an issue. It looks like this with DEBUG level logging:

00:06:50.898 | zeebe |  [cloud-test-task] DEBUG: Long poll loop. this.longPoll: 30s
00:06:50.900 | zeebe |  [cloud-test-task] INFO: Stalled on Grpc Error
00:06:50.900 | zeebe |  [cloud-test-task] INFO: Grpc Error: 14 UNAVAILABLE: Connection dropped
00:06:50.900 | zeebe |  [cloud-test-task] DEBUG: Start watching Grpc channel...
00:06:50.900 | zeebe |  [cloud-test-task] INFO: Grpc Channel State: IDLE
00:06:50.901 | zeebe |  [cloud-test-task] DEBUG: {"code":14,"details":"Connection dropped","metadata":{"internalRepr":{},"options":{}}}
00:06:50.901 | zeebe |  [cloud-test-task] DEBUG: Stream ended after 27.449 seconds
00:06:50.901 | zeebe |  [cloud-test-task] DEBUG: Activating Jobs...
00:06:50.901 | zeebe |  [cloud-test-task] DEBUG: Grpc Channel state: IDLE
00:06:50.901 | zeebe |  [cloud-test-task] DEBUG: Long poll loop. this.longPoll: 30s
00:07:22.152 | zeebe |  [cloud-test-task] DEBUG: {"code":0,"details":"","metadata":{"internalRepr":{},"options":{}}}
00:07:22.152 | zeebe |  [cloud-test-task] DEBUG: Stream ended after 31.254 seconds
00:07:22.152 | zeebe |  [cloud-test-task] DEBUG: Activating Jobs...
00:07:22.153 | zeebe |  [cloud-test-task] DEBUG: Long poll loop. this.longPoll: 30s

@gizmo84

gizmo84 commented Apr 24, 2020

Same here. I try to create a WorkflowInstance while the Zeebe broker is down:

11:57:03.917 | zeebe |  INFO: [topology]: 14 UNAVAILABLE: No connection established
11:57:03.920 | zeebe |  INFO: [createWorkflowInstance]: 14 UNAVAILABLE: No connection established
11:57:04.923 | zeebe |  INFO: [topology]: Attempt 2 (max: 50).
11:57:04.926 | zeebe |  INFO: [topology]: 14 UNAVAILABLE: No connection established
11:57:04.933 | zeebe |  INFO: [createWorkflowInstance]: Attempt 2 (max: 50).
11:57:04.937 | zeebe |  INFO: [createWorkflowInstance]: 14 UNAVAILABLE: No connection established
...

Then I start the broker, but zeebe-node never reconnects.

jwulf added a commit to jwulf/zeebe-client-node-js that referenced this issue May 6, 2020
@jwulf
Member Author

jwulf commented May 6, 2020

Fixed in 0.23.0

@jwulf jwulf closed this as completed May 6, 2020
@jwulf jwulf mentioned this issue May 6, 2020