Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/538 transform lock expiration fix #576

Conversation

stevanbz
Copy link
Contributor

@stevanbz stevanbz commented Oct 24, 2022

Issue #, if available: #538
In order to see if this bugfix is working and if the request is being retried if the timeout period of transform search request is breached I had to do couple of changes:

Description of changes:
Key changes:

  • TransformLockManager -> class responsible for variety of lock operations during transform execution; called from outside to obtain the lock or to renew it depending of the state;
  • TransformContext -> class that contains transformLockManager and page size that will be used when fetching the buckets or when executing the bool query. Provides a method for calculating the transform time out request based on the lock expiration date
  • BackoffPolicy.retry -> extension function on a backoff that does a retry of the provided block. If timeout exception is being raised, lock renew will be requested

CheckList:

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@stevanbz stevanbz requested a review from a team October 24, 2022 14:34
…ge size and transform lock

Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
@stevanbz stevanbz force-pushed the bugfix/538-transform-lock-expiration-fix branch from 1c353a0 to f6d29ce Compare October 24, 2022 15:47
do {
try {
return block(backoff)
} catch (e: OpenSearchException) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Do we want to Retry with BackOff for Timeout Exception ? It makes sense for 502, 503 and 504 but trying to understand if the same condition applies for Timeout ?
  2. Additionally, should we also retry on 408 Error code? See https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/rest/RestStatus.java#L352
  3. ' 500 error code with Time exceeded' : I can't find the below code block throwing INTERNAL_SERVER_ERROR error.
  4. For my understanding, the back-off here is exponential or Linear ?
  5. Let us log the Retry Attempt count on line 220 and clarify/re-phrase the Operation part.

Copy link
Contributor Author

@stevanbz stevanbz Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. That's a good question. I am not sure if I have the right answer now. Current backoff period set up for the transform search is 1000 millis and that's used for internal searches with 5 number of retries.
  2. Hm. Makes sense. It will be tricky to test - but I guess I can always sleep the thread for some time.
  3. Yes it's not blocking the throw of INTERNAL_SERVER_ERROR with time exceeded message. Should I cover this case also? If you are referring to the comment - that's because the initial version had this check. Now that part of the comment will be removed. Tnx
  4. back-off here is always that is being sent as part of backoff policy. In the case of the transform search requests it's always 1000 millis and count number is 5 (https://github.com/opensearch-project/index-management/blob/main/src/main/kotlin/org/opensearch/indexmanagement/transform/TransformSearchService.kt#L73)

@stevanbz stevanbz force-pushed the bugfix/538-transform-lock-expiration-fix branch 4 times, most recently from b9ae73d to 2f65e99 Compare October 27, 2022 17:38
Copy link
Contributor

@khushbr khushbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making the changes @stevanbz !

I have left one comment around linking TimeoutTaskCancellationUtility code. Additionally, it would be good to add unit test for the code added, we can later discuss how to add Integration test for this scenario.

import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlin.coroutines.suspendCoroutine

const val OPENDISTRO_SECURITY_PROTECTED_INDICES_CONF_REQUEST = "_opendistro_security_protected_indices_conf_request"

// Timeout pattern used for checking the timeout message which is in unique format if the transform search timeout was set programmatically
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly suggest KDoc linking to TimeoutTaskCancellationUtility code here, instead of putting a verbal explanation.

@@ -192,6 +241,14 @@ fun OpenSearchException.isRetryable(): Boolean {
return (status() in listOf(RestStatus.BAD_GATEWAY, RestStatus.SERVICE_UNAVAILABLE, RestStatus.GATEWAY_TIMEOUT))
}

/**
* Retries on TaskCancelledException which means that the timeout occurred. In that case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thank You for explaining.

Also documenting here that cancel_after_time_interval has not been back-ported to 1.0 and thus, we cannot use this feature for 1.0 version support.

@bowenlan-amzn bowenlan-amzn added backport 2.x v2.4.0 'Issues and PRs related to version v2.4.0' labels Oct 27, 2022
…xt of the current transform job execution

Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
@stevanbz stevanbz force-pushed the bugfix/538-transform-lock-expiration-fix branch from 2f65e99 to 92ac22f Compare October 28, 2022 16:48
Copy link
Contributor

@khushbr khushbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Let's add the Unit Tests and Integration tests as part of a separate PR, would recommend opening an issue tracking the test addition now.

Additionally, please add the manual testing details in the PR Description for documentation purpose and for future reference.

Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
@stevanbz stevanbz force-pushed the bugfix/538-transform-lock-expiration-fix branch from 7f28da3 to 67aa9cc Compare October 28, 2022 19:48
@khushbr khushbr merged commit 85cb1a5 into opensearch-project:main Oct 28, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 28, 2022
…576)

* 538: Adding timeout and retry to Transform '_search' API calls.
This code-fix also addresses the transform lock expiration.

Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>

Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
(cherry picked from commit 85cb1a5)
Angie-Zhang pushed a commit that referenced this pull request Oct 29, 2022
…576) (#580)

* 538: Adding timeout and retry to Transform '_search' API calls.
This code-fix also addresses the transform lock expiration.

Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>

Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
(cherry picked from commit 85cb1a5)

Co-authored-by: Stevan Buzejic <30922513+stevanbz@users.noreply.github.com>
wuychn pushed a commit to ochprince/index-management that referenced this pull request Mar 16, 2023
…pensearch-project#576) (opensearch-project#580)

* 538: Adding timeout and retry to Transform '_search' API calls.
This code-fix also addresses the transform lock expiration.

Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>

Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
(cherry picked from commit 85cb1a5)

Co-authored-by: Stevan Buzejic <30922513+stevanbz@users.noreply.github.com>
ronnaksaxena pushed a commit to ronnaksaxena/index-management that referenced this pull request Jul 19, 2023
…pensearch-project#576) (opensearch-project#580)

* 538: Adding timeout and retry to Transform '_search' API calls.
This code-fix also addresses the transform lock expiration.

Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>

Signed-off-by: Stevan Buzejic <buzejic.stevan@gmail.com>
(cherry picked from commit 85cb1a5)

Co-authored-by: Stevan Buzejic <30922513+stevanbz@users.noreply.github.com>
Signed-off-by: Ronnak Saxena <ronsax@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x v2.4.0 'Issues and PRs related to version v2.4.0'
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants