Back off on gRPC error 16 UNAUTHENTICATED #366

Closed
jwulf opened this issue Jan 31, 2025 · 2 comments

jwulf commented Jan 31, 2025

At the moment, worker polling backs off on 14 UNAVAILABLE and 8 RESOURCE_EXHAUSTED (backpressure).

Other errors, such as 16 UNAUTHENTICATED, should also cause a backoff. Immediately retrying an unauthenticated request amounts to a DoS on the gateway.

This should be implemented here: https://github.com/camunda/camunda-8-js-sdk/blob/main/src/zeebe/lib/ZBWorkerBase.ts#L511

The question is whether all error conditions should trigger a backoff, or if 16 should just be added to the list.

Intermittent network disruption and waiting for a gateway to come up are examples where backing off reduces the recovery time of the system, but without protecting the gateway from DoS.

I think in the first instance, I'll add 16 UNAUTHENTICATED to the backoff conditions.
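
The "add 16 to the list" option can be sketched as a predicate over gRPC status codes. This is only an illustration, not the code in ZBWorkerBase.ts: the names `GrpcStatus`, `BACKOFF_STATUS_CODES` and `shouldBackOff` are hypothetical. The numeric values are the standard gRPC status codes.

```typescript
// Standard gRPC status codes mentioned in this issue.
const GrpcStatus = {
  RESOURCE_EXHAUSTED: 8, // backpressure
  UNAVAILABLE: 14,
  UNAUTHENTICATED: 16,
} as const

// Hypothetical allow-list: only these codes trigger a backoff.
const BACKOFF_STATUS_CODES: ReadonlySet<number> = new Set<number>([
  GrpcStatus.RESOURCE_EXHAUSTED,
  GrpcStatus.UNAVAILABLE,
  GrpcStatus.UNAUTHENTICATED, // the addition proposed in this issue
])

// Hypothetical helper: decide whether a failed poll should back off
// instead of retrying immediately.
function shouldBackOff(grpcCode: number): boolean {
  return BACKOFF_STATUS_CODES.has(grpcCode)
}
```

The alternative (back off on every error) would replace the allow-list with a function that always returns `true`.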

Related: SUPPORT-25519

jwulf self-assigned this Jan 31, 2025
jwulf changed the title from "Back off on errors other than GRPC 8 (Backpressure)" to "Back off on gRPC error 16 UNAUTHENTICATED" Feb 2, 2025
jwulf added the support (An issue related to a support request) and enhancement (New feature or request) labels Feb 2, 2025

jwulf commented Feb 3, 2025

At the moment, client operations retry once on unauthenticated, and then fail:

if (isAuthError && authFailures === 0) {

So, making the worker back off progressively is the way to go.

I'm not sure that there is a scenario where an unauthenticated worker can become authenticated without restarting the application.

But the log messages from a failing worker may help a developer or a production team understand what is happening. So continuing to emit them, but at a decreasing rate, is probably useful (and easier to implement than any other behaviour, because we already have backoff).
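
As a rough sketch of what progressive backoff means for the log rate: if the worker logs once per failed attempt and the delay between attempts doubles, the log rate halves each time until the cap is reached. The helper name `nextBackoffMs`, the doubling factor and the 1000 ms initial delay are assumptions for illustration, not the SDK's actual values.

```typescript
// Hypothetical helper: delay before the next poll after `consecutiveFailures`
// failed attempts. The delay doubles per failure and is capped at `maxBackoffMs`.
function nextBackoffMs(
  consecutiveFailures: number,
  maxBackoffMs: number,
  initialMs = 1000 // assumed starting delay
): number {
  const exponent = Math.max(0, consecutiveFailures - 1)
  return Math.min(initialMs * 2 ** exponent, maxBackoffMs)
}

// One log line per failed attempt, so the gaps between log lines grow
// with the delay until the cap is hit.
const exampleDelays = [1, 2, 3, 4].map((n) => nextBackoffMs(n, 5000))
// exampleDelays: [1000, 2000, 4000, 5000]
```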

jwulf closed this as completed in 56c3c78 Feb 4, 2025
github-actions bot pushed a commit that referenced this issue Feb 4, 2025
## [8.6.22](v8.6.21...v8.6.22) (2025-02-04)

### Features

* **zeebe:** implement backoff on UNAUTHENTICATED error for workers ([56c3c78](56c3c78)), closes [#366](#366)

jwulf commented Feb 4, 2025

By default, the job worker will back off exponentially to a maximum of 16s (16000 ms). You can tune that limit by setting CAMUNDA_JOB_WORKER_MAX_BACKOFF_MS via environment variable or the Camunda 8 client constructor parameters.

The Job Worker also emits streamError and backoff events.

The backoff event lets you know what the backoff period is.
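
To make the default and the override concrete, here is a sketch of how such a limit could be resolved from the environment. It is not the SDK's own configuration code: `resolveMaxBackoffMs` is a hypothetical helper, and falling back to the default on invalid input is an assumption. Only the variable name and the 16000 ms default come from the comment above.

```typescript
// Default maximum backoff stated above: 16s.
const DEFAULT_MAX_BACKOFF_MS = 16000

// Hypothetical helper: read CAMUNDA_JOB_WORKER_MAX_BACKOFF_MS from an
// environment-like object, falling back to the default when the value is
// missing or not a positive number (the fallback rule is an assumption).
function resolveMaxBackoffMs(env: Record<string, string | undefined>): number {
  const raw = env.CAMUNDA_JOB_WORKER_MAX_BACKOFF_MS
  const parsed = raw === undefined ? NaN : Number(raw)
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_BACKOFF_MS
}

// Unset: the 16s default applies.
const defaultLimit = resolveMaxBackoffMs({})
// Overridden: workers would back off to at most 30s.
const tunedLimit = resolveMaxBackoffMs({ CAMUNDA_JOB_WORKER_MAX_BACKOFF_MS: '30000' })
```

In a Node.js application the same object would typically be `process.env`.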

petar-slavov pushed a commit to petar-slavov/camunda-8-js-sdk that referenced this issue Mar 14, 2025
## [8.6.22](camunda/camunda-8-js-sdk@v8.6.21...v8.6.22) (2025-02-04)

### Features

* **zeebe:** implement backoff on UNAUTHENTICATED error for workers ([56c3c78](camunda@56c3c78)), closes [camunda#366](camunda#366)