-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix/538 transform lock expiration fix #576
Bugfix/538 transform lock expiration fix #576
Conversation
…ge size and transform lock Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
1c353a0
to
f6d29ce
Compare
src/main/kotlin/org/opensearch/indexmanagement/opensearchapi/OpenSearchExtensions.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/opensearchapi/OpenSearchExtensions.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TransformRunner.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/util/TransformLockManager.kt
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/TransformRunner.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/opensearchapi/OpenSearchExtensions.kt
Outdated
Show resolved
Hide resolved
do { | ||
try { | ||
return block(backoff) | ||
} catch (e: OpenSearchException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Do we want to Retry with BackOff for Timeout Exception ? It makes sense for 502, 503 and 504 but trying to understand if the same condition applies for Timeout ?
- Additionally, should we also retry on 408 Error code? See https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/rest/RestStatus.java#L352
- ' 500 error code with Time exceeded' : I can't find the below code block throwing
INTERNAL_SERVER_ERROR
error. - For my understanding, the back-off here is exponential or Linear ?
- Let us log the Retry Attempt count on line 220 and clarify/re-phrase the Operation part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- That's a good question. I am not sure if I have the right answer now. Current backoff period set up for the transform search is 1000 millis and that's used for internal searches with 5 number of retries.
- Hm. Makes sense. It will be tricky to test - but I guess I can always sleep the thread for some time.
- Yes it's not blocking the throw of INTERNAL_SERVER_ERROR with time exceeded message. Should I cover this case also? If you are referring to the comment - that's because the initial version had this check. Now that part of the comment will be removed. Tnx
- back-off here is always that is being sent as part of backoff policy. In the case of the transform search requests it's always 1000 millis and count number is 5 (https://github.com/opensearch-project/index-management/blob/main/src/main/kotlin/org/opensearch/indexmanagement/transform/TransformSearchService.kt#L73)
src/main/kotlin/org/opensearch/indexmanagement/opensearchapi/OpenSearchExtensions.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/opensearchapi/OpenSearchExtensions.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/opensearchapi/OpenSearchExtensions.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/util/TransformLockManager.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/util/TransformLockManager.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/util/TransformLockManager.kt
Outdated
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/util/TransformLockManager.kt
Show resolved
Hide resolved
src/main/kotlin/org/opensearch/indexmanagement/transform/util/TransformLockManager.kt
Show resolved
Hide resolved
b9ae73d
to
2f65e99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making the changes @stevanbz !
I have left one comment around linking TimeoutTaskCancellationUtility code. Additionally, it would be good to add unit test for the code added, we can later discuss how to add Integration test for this scenario.
import kotlin.coroutines.CoroutineContext | ||
import kotlin.coroutines.resume | ||
import kotlin.coroutines.resumeWithException | ||
import kotlin.coroutines.suspendCoroutine | ||
|
||
const val OPENDISTRO_SECURITY_PROTECTED_INDICES_CONF_REQUEST = "_opendistro_security_protected_indices_conf_request" | ||
|
||
// Timeout pattern used for checking the timeout message which is in unique format if the transform search timeout was set programmatically |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly suggest KDoc linking to TimeoutTaskCancellationUtility code here, instead of putting a verbal explanation.
@@ -192,6 +241,14 @@ fun OpenSearchException.isRetryable(): Boolean { | |||
return (status() in listOf(RestStatus.BAD_GATEWAY, RestStatus.SERVICE_UNAVAILABLE, RestStatus.GATEWAY_TIMEOUT)) | |||
} | |||
|
|||
/** | |||
* Retries on TaskCancelledException which means that the timeout occurred. In that case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thank You for explaining.
Also documenting here that cancel_after_time_interval
has not been back-ported to 1.0 and thus, we cannot use this feature for 1.0 version support.
…xt of the current transform job execution Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
2f65e99
to
92ac22f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Let's add the Unit Tests and Integration tests as part of a separate PR, would recommend opening an issue tracking the test addition now.
Additionally, please add the manual testing details in the PR Description for documentation purpose and for future reference.
Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
7f28da3
to
67aa9cc
Compare
…576) (#580) * 538: Adding timeout and retry to Transform '_search' API calls. This code-fix also addresses the transform lock expiration. Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> (cherry picked from commit 85cb1a5) Co-authored-by: Stevan Buzejic <30922513+stevanbz@users.noreply.github.com>
…pensearch-project#576) (opensearch-project#580) * 538: Adding timeout and retry to Transform '_search' API calls. This code-fix also addresses the transform lock expiration. Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> (cherry picked from commit 85cb1a5) Co-authored-by: Stevan Buzejic <30922513+stevanbz@users.noreply.github.com>
…pensearch-project#576) (opensearch-project#580) * 538: Adding timeout and retry to Transform '_search' API calls. This code-fix also addresses the transform lock expiration. Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com> (cherry picked from commit 85cb1a5) Co-authored-by: Stevan Buzejic <30922513+stevanbz@users.noreply.github.com> Signed-off-by: Ronnak Saxena <ronsax@amazon.com>
Issue #, if available: #538
In order to see if this bugfix is working and if the request is being retried if the timeout period of transform search request is breached I had to do couple of changes:
Description of changes:
Key changes:
CheckList:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.