Intermittent token authorisation failure #125
Comments
Perfect example, it just failed in CI:
@jwulf I'm unsure about what I'm chasing.
Can you point me to some failed workflows that have the auth error you're talking about?
I re-run the failed workflows, and they pass - so there are no historical examples. The tests are the integration tests. None of the unit tests trigger it that I know of. To run the test suite that fails locally, put Camunda SaaS credentials in the environment and run:
@pepopowitz pointed out that the token is cached on the server, so proactively expiring the token doesn't work. We'll retry once on a 401: UNAUTHORISED with a new token request.
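The retry-once-on-401 idea could be sketched roughly as follows. This is only an illustration under assumptions, not the SDK's code: `HttpErrorLike`, `isHttpError`, `GetToken` and `callWithOneRetryOn401` are hypothetical names, and the token provider and API call are injected placeholders.

```typescript
// Hypothetical error shape: the thread only states that HTTPError carries a status code.
interface HttpErrorLike extends Error {
  statusCode: number;
}

function isHttpError(e: unknown): e is HttpErrorLike {
  return e instanceof Error && typeof (e as Partial<HttpErrorLike>).statusCode === 'number';
}

// Placeholder for a token provider; forceRefresh = true means "skip the client-side cache".
type GetToken = (forceRefresh: boolean) => Promise<string>;

async function callWithOneRetryOn401<T>(
  getToken: GetToken,
  call: (token: string) => Promise<T>
): Promise<T> {
  const token = await getToken(false); // may be a cached token
  try {
    return await call(token);
  } catch (e) {
    if (isHttpError(e) && e.statusCode === 401) {
      // Assume a token-expiry edge case: request a new token and retry exactly once.
      const freshToken = await getToken(true);
      return call(freshToken);
    }
    throw e; // any other failure is not retried
  }
}
```

A second 401 on the retry propagates to the caller, so a genuinely invalid credential still fails fast instead of looping.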
If a REST call fails with 401, we retry once in case it is a token expiry edge-case fixes #125
According to the Identity team, the server-cached token is removed 30s before it expires, so if we hardcode a 10s window we should get a new token. That invalidates our original hypothesis: the previous (configurable) window of 10s should work.
* feat(repo): add status code to HTTPError type
  Errors of type HTTPError now have a status code field
* fix(oauth): retry once on 401 to get new token
  If a REST call fails with 401, we retry once in case it is a token expiry edge-case
  fixes #125
* test(operate): document the delay and eventual consistency measure in test
* fix(oauth): decode jwt to get expiry time
  fixes #125
The expiry time of the token was calculated in the SDK from the retrieval time plus the validity period. Since a call for a token can return a cached token, such a token will not be valid for the entire validity duration. I've changed it to decode the token and use its expiry time.
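Reading the expiry from the token itself might look something like the sketch below. It assumes the access token is a standard three-segment JWT with a numeric `exp` claim in seconds since the epoch; the function names are hypothetical, and the signature is not verified because only the expiry is needed for caching.

```typescript
// Decode the payload (second segment) of a JWT without verifying the signature.
function decodeJwtPayload(token: string): Record<string, unknown> {
  const segments = token.split('.');
  if (segments.length !== 3) {
    throw new Error('Expected a JWT with three dot-separated segments');
  }
  // Convert base64url to standard base64 and restore padding before decoding.
  const base64 = segments[1].replace(/-/g, '+').replace(/_/g, '/');
  const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4);
  return JSON.parse(atob(padded)) as Record<string, unknown>;
}

// Expiry in epoch milliseconds, taken from the token's own `exp` claim.
function getTokenExpiryMs(token: string): number {
  const exp = decodeJwtPayload(token).exp;
  if (typeof exp !== 'number') {
    throw new Error('JWT payload has no numeric exp claim');
  }
  return exp * 1000;
}
```

Compared with adding the validity period to the retrieval time, this gives the correct answer even when the endpoint hands back a token that was issued some time ago.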
* feat(repo): add stack traces to async REST errors (#131)
* feat(repo): add stack traces to async REST errors
* test(operate): disable OAuth in unit test
* test(operate): disable OAuth for Operate client in test
* test(operate): rename integration test file
* chore(repo): run unit test in pre-commit hook
* test(operate): change process filename casing
* chore(release): 8.5.1-alpha.1 [skip ci]
  ## [8.5.1-alpha.1](v8.5.0...v8.5.1-alpha.1) (2024-04-09)
  ### Features
  * **repo:** add stack traces to async REST errors ([#131](#131)) ([ef8d9c6](ef8d9c6))
* chore(release): 8.5.1-alpha.1 [skip ci]
  ## [8.5.1-alpha.1](v8.5.0...v8.5.1-alpha.1) (2024-04-09)
  ### Features
  * **repo:** add stack traces to async REST errors ([#131](#131)) ([ef8d9c6](ef8d9c6))
* docs(repo): add type of RESTError to JSDoc
* docs(repo): update README.md
* docs(repo): update README.md
  Fix Wrong Zeebe client method in readme file
* chore(repo): ignore .idea folder in git
  To prevent IDE-specific settings from interfering with the project setup, the .idea folder, which is created by JetBrains IDEs, is now added to .gitignore. This ensures that developer-specific configurations do not pollute the project repository.
* feat(repo): add status code to HTTPError type (#135)
* feat(repo): add status code to HTTPError type
  Errors of type HTTPError now have a status code field
* fix(oauth): retry once on 401 to get new token
  If a REST call fails with 401, we retry once in case it is a token expiry edge-case
  fixes #125
* test(operate): document the delay and eventual consistency measure in test
* fix(oauth): decode jwt to get expiry time
  fixes #125
* chore(release): 8.5.1-alpha.2 [skip ci]
  ## [8.5.1-alpha.2](v8.5.1-alpha.1...v8.5.1-alpha.2) (2024-04-20)
  ### Features
  * **repo:** add status code to HTTPError type ([#135](#135)) ([cfea141](cfea141)), closes [#125](#125) [#125](#125)
* chore(release): 8.5.1-alpha.2 [skip ci]
  ## [8.5.1-alpha.2](v8.5.1-alpha.1...v8.5.1-alpha.2) (2024-04-20)
  ### Features
  * **repo:** add status code to HTTPError type ([#135](#135)) ([cfea141](cfea141)), closes [#125](#125) [#125](#125)
* ci(repo): update docker login command
* ci(repo): update docker login command
* ci: update docker login
* build(repo): correct docker username
* refactor(oauth): use got in place of node-fetch (#138)
* refactor(oauth): use got in place of node-fetch
* test(oauth): update cache eviction test
* fix(tasklist): correct default value of includeVariables parameter in tasklist variables search (#136)
* chore(release): 8.5.1-alpha.3 [skip ci]
  ## [8.5.1-alpha.3](v8.5.1-alpha.2...v8.5.1-alpha.3) (2024-04-29)
  ### Bug Fixes
  * **tasklist:** correct default value of includeVariables parameter in tasklist variables search ([#136](#136)) ([23af921](23af921))
* chore(release): 8.5.1-alpha.3 [skip ci]
  ## [8.5.1-alpha.3](v8.5.1-alpha.2...v8.5.1-alpha.3) (2024-04-29)
  ### Bug Fixes
  * **tasklist:** correct default value of includeVariables parameter in tasklist variables search ([#136](#136)) ([23af921](23af921))
* feat(repo): load system certs when custom cert specified
* feat(repo): load system certs when custom cert specified
  fixes #139
* fix(oauth): throw if cacheDir not writeable on Windows
* test(oauth): make cache dir read-only on Windows
* test(oauth): make cache dir deleteable on Windows
* test(oauth): make token cache dir read-only on Windows
* test(oauth): hack around Windows test not throwing
* refactor(repo): test self-signed certificate support
  fixes #139 fixes #141
* test(repo): isolate unit tests and do not run in integration envs
* test(repo): run unit tests on Windows runner in CI
* feat(zeebe): update gRPC package dep version
* ci(repo): run unit tests on Windows for PRs
* ci(repo): use cross-env to support Windows runner in CI
* test(oauth): use execSync for Windows commands
* refactor(repo): use win-ca for Windows system certificates
* refactor(repo): refactor code to use win-ca on Windows
* refactor(oauth): debug log in CI
* revert(oauth): remove certificate debug statement
* chore(release): 8.5.1-alpha.4 [skip ci]
  ## [8.5.1-alpha.4](v8.5.1-alpha.3...v8.5.1-alpha.4) (2024-05-03)
  ### Features
  * **repo:** load system certs when custom cert specified ([afce0a7](afce0a7)), closes [#139](#139) [#139](#139) [#141](#141)
* chore(release): 8.5.1-alpha.4 [skip ci]
  ## [8.5.1-alpha.4](v8.5.1-alpha.3...v8.5.1-alpha.4) (2024-05-03)
  ### Features
  * **repo:** load system certs when custom cert specified ([afce0a7](afce0a7)), closes [#139](#139) [#139](#139) [#141](#141)

---------
Co-authored-by: semantic-release-bot <semantic-release-bot@martynus.net>
Co-authored-by: Hasan Alnatour <hassanalnator@gmail.com>
Co-authored-by: Roman Shamborovskyy <shamborovskyy@gmail.com>
## [8.5.1](v8.5.0...v8.5.1) (2024-05-05) ### Features * **repo:** load system certs when custom cert specified ([#144](#144)) ([8a47d5e](8a47d5e)), closes [#131](#131) [#131](#131) [#131](#131) [#135](#135) [#125](#125) [#125](#125) [#125](#125) [#125](#125) [#125](#125) [#125](#125) [#138](#138) [#136](#136) [#136](#136) [#136](#136) [#139](#139) [#139](#139) [#141](#141) [#139](#139) [#139](#139) [#141](#141) [#139](#139) [#139](#139) [#141](#141)
Integration tests are still intermittently failing on authorisation.
Periodically a test will fail with 401: UNAUTHORISED.
If I "re-run failed jobs", it will reliably pass on the second run.
My hypothesis is that the API client involved in the test is attempting to make its calls with an expired token.
Token expiry is handled in the SDK in the OAuth component. This component encapsulates retrieving tokens from the token endpoint, caching them in memory and on disk, and providing a token to an API client when the API client wants to make an API call.
The OAuth client should check whether it has a token in memory or on disk (the on-disk cache is for when applications are restarted), then check whether that token is expired or likely to expire soon (there is a threshold setting that represents "this might expire before the call makes the roundtrip"). It should then either pass on the cached token or request a new token from the endpoint and pass that on.
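The intended decision can be written down as a small pure function, which also makes the timing logic easy to test in isolation. This is a sketch under assumptions rather than the SDK's implementation: `CachedToken`, `TokenDecision`, `isUsable` and `decideToken` are hypothetical names, and the current time is passed in so the boundary cases are deterministic.

```typescript
interface CachedToken {
  accessToken: string;
  expiresAtMs: number; // epoch milliseconds
}

type TokenDecision =
  | { action: 'use-cached'; source: 'memory' | 'disk'; token: CachedToken }
  | { action: 'request-new' };

// A token is usable only if it stays valid for longer than the safety threshold.
function isUsable(
  token: CachedToken | undefined,
  nowMs: number,
  thresholdMs: number
): token is CachedToken {
  return token !== undefined && token.expiresAtMs - nowMs > thresholdMs;
}

// Prefer the in-memory token, then the on-disk one, otherwise ask the endpoint.
function decideToken(
  memory: CachedToken | undefined,
  disk: CachedToken | undefined,
  nowMs: number,
  thresholdMs: number
): TokenDecision {
  if (isUsable(memory, nowMs, thresholdMs)) {
    return { action: 'use-cached', source: 'memory', token: memory };
  }
  if (isUsable(disk, nowMs, thresholdMs)) {
    return { action: 'use-cached', source: 'disk', token: disk };
  }
  return { action: 'request-new' };
}
```

With a 10s threshold, a token that expires in 5s is rejected and a new one is requested, while a token with 60s left is passed on unchanged.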
Some tests - notably the Tasklist ones - will fail multiple calls when they do fail.
Otherwise, I've noticed that it happens later in the test suite, which leads me to think that it happens when a token with 300-second validity expires and some race condition or logic error means it is not correctly refreshed before being passed to the API client.
This is difficult to reproduce reliably.
Maybe it needs some specific test of the token refresh timing logic? Or maybe there is something obvious in the code that I am missing.