Workers do not reconnect after TRANSIENT_FAILURE #152

jwulf · 2020-04-06T02:42:52Z

It looks like workers do not reconnect after a TRANSIENT_FAILURE when using the pure JS gRPC implementation.

My hypothesis is that the channel goes to TRANSIENT_FAILURE, and it takes another gRPC call before the channel state switches.

To deal with this, I am changing the GrpcClient failure logic to wait 5 seconds on a TRANSIENT_FAILURE, then resolve as reconnected and unblock further operations.
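For illustration only, here is a minimal sketch of that idea. It is not the actual GrpcClient change: `ConnectionGate`, `ChannelLike`, `onGrpcError`, `whenReady` and the injectable `schedule` parameter are hypothetical names, and the channel is modeled as a small interface loosely based on the `getConnectivityState(tryToConnect)` method of a grpc-js channel, with states as plain strings rather than the library's own enum.

```typescript
// Hypothetical sketch: block operations on TRANSIENT_FAILURE, then
// optimistically unblock after a fixed grace period (5 s by default).
type ChannelState = 'IDLE' | 'CONNECTING' | 'READY' | 'TRANSIENT_FAILURE' | 'SHUTDOWN'

// Assumption: a minimal stand-in for a gRPC channel, not the real type.
interface ChannelLike {
  getConnectivityState(tryToConnect: boolean): ChannelState
}

type Scheduler = (fn: () => void, ms: number) => void

class ConnectionGate {
  private channel: ChannelLike
  private graceMs: number
  private schedule: Scheduler
  private blocked = false
  private waiters: Array<() => void> = []

  constructor(channel: ChannelLike, graceMs = 5000, schedule?: Scheduler) {
    this.channel = channel
    this.graceMs = graceMs
    // Default to a real timer; a test can inject a fake scheduler instead.
    this.schedule = schedule ?? ((fn, ms) => { setTimeout(fn, ms) })
  }

  get isBlocked(): boolean {
    return this.blocked
  }

  // Call this from the error handler of a failed gRPC call or stream.
  onGrpcError(): void {
    // Passing true asks the channel to start connecting if it is idle.
    const state = this.channel.getConnectivityState(true)
    if (state !== 'TRANSIENT_FAILURE' || this.blocked) {
      return
    }
    this.blocked = true
    // The state may not change until another call is made, so rather than
    // waiting for READY, release the gate after the grace period.
    this.schedule(() => this.release(), this.graceMs)
  }

  // Operations await this before issuing their next gRPC call.
  whenReady(): Promise<void> {
    if (!this.blocked) {
      return Promise.resolve()
    }
    return new Promise<void>(resolve => this.waiters.push(resolve))
  }

  private release(): void {
    this.blocked = false
    const pending = this.waiters
    this.waiters = []
    pending.forEach(resolve => resolve())
  }
}
```

In this sketch a worker would `await gate.whenReady()` before each job activation call, so the next call after the grace period is what gives the channel a chance to move out of TRANSIENT_FAILURE.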

jwulf · 2020-04-07T13:45:58Z

Yeah, this is still an issue. It looks like this with DEBUG level logging:

00:06:50.898 | zeebe |  [cloud-test-task] DEBUG: Long poll loop. this.longPoll: 30s
00:06:50.900 | zeebe |  [cloud-test-task] INFO: Stalled on Grpc Error
00:06:50.900 | zeebe |  [cloud-test-task] INFO: Grpc Error: 14 UNAVAILABLE: Connection dropped
00:06:50.900 | zeebe |  [cloud-test-task] DEBUG: Start watching Grpc channel...
00:06:50.900 | zeebe |  [cloud-test-task] INFO: Grpc Channel State: IDLE
00:06:50.901 | zeebe |  [cloud-test-task] DEBUG: {"code":14,"details":"Connection dropped","metadata":{"internalRepr":{},"options":{}}}
00:06:50.901 | zeebe |  [cloud-test-task] DEBUG: Stream ended after 27.449 seconds
00:06:50.901 | zeebe |  [cloud-test-task] DEBUG: Activating Jobs...
00:06:50.901 | zeebe |  [cloud-test-task] DEBUG: Grpc Channel state: IDLE
00:06:50.901 | zeebe |  [cloud-test-task] DEBUG: Long poll loop. this.longPoll: 30s
00:07:22.152 | zeebe |  [cloud-test-task] DEBUG: {"code":0,"details":"","metadata":{"internalRepr":{},"options":{}}}
00:07:22.152 | zeebe |  [cloud-test-task] DEBUG: Stream ended after 31.254 seconds
00:07:22.152 | zeebe |  [cloud-test-task] DEBUG: Activating Jobs...
00:07:22.153 | zeebe |  [cloud-test-task] DEBUG: Long poll loop. this.longPoll: 30s

gizmo84 · 2020-04-24T09:59:43Z

Same here. I try to create a WorkflowInstance while the Zeebe broker is down:

11:57:03.917 | zeebe |  INFO: [topology]: 14 UNAVAILABLE: No connection established
11:57:03.920 | zeebe |  INFO: [createWorkflowInstance]: 14 UNAVAILABLE: No connection established
11:57:04.923 | zeebe |  INFO: [topology]: Attempt 2 (max: 50).
11:57:04.926 | zeebe |  INFO: [topology]: 14 UNAVAILABLE: No connection established
11:57:04.933 | zeebe |  INFO: [createWorkflowInstance]: Attempt 2 (max: 50).
11:57:04.937 | zeebe |  INFO: [createWorkflowInstance]: 14 UNAVAILABLE: No connection established
...

Then I start the broker, but zeebe-node never reconnects.

jwulf · 2020-05-06T15:46:58Z

Fixed in 0.23.0

jwulf self-assigned this Apr 6, 2020

s3than closed this as completed in 5bc2e12 Apr 6, 2020

jwulf reopened this Apr 7, 2020

jwulf added a commit to jwulf/zeebe-client-node-js that referenced this issue May 6, 2020

Fixes camunda-community-hub#158 Fixes camunda-community-hub#152 Fixes c…

c2ee6d5

…amunda-community-hub#99

jwulf closed this as completed May 6, 2020

jwulf mentioned this issue May 6, 2020

0.23.0 #162

Merged
