Remove lock in HelixStateTransitionHandler #1681
Question: will it be safe to exclude invoke() from the locking? Two messages could be invoked at the same time for the same state model: one regular ST and one cancel message. I am not quite sure whether that could cause a problem.
Thanks for the offline discussion! For When
Question: could you explain more about the necessity of the change? In detail, the main thread creates the taskRunner thread, which updates the state at the end, and the main thread continues to run after the taskRunner thread is created. In this case, even if the main thread holds the lock, the taskRunner can still execute, and the runner thread can wait until the main thread finishes the state update. I remember you mentioned this when we talked offline, but please document the detailed reason in this ticket too.
Thanks for the question. However, when handling 'RUNNING' -> 'STOPPED', the state transition handler creates a taskRunner, waits for the state transition to finish, and then continues with the rest. So if we include the
Changing this PR to remove the redundant lock, since it is not logically complete.
This PR is ready to be merged. Approved by @dasahcc
Remove synchronization on _stateModel in HelixStateTransitionHandler
In current message handling design,
* reducing lock scope in HelixStateTransitionHandler
* remove lock
Issues
#1675 Remove requested state update in task framework.
This is the first PR of the issue.
This PR covers the lock scope reduction only, because the lock scope change needs more review and has independent logic of its own.
Description
*High level design of the lock scope:
We can prevent overwrites by checking the in-memory state _stateModel and updating ZK conditionally during message handling. The in-memory _stateModel update (in both the task runner thread and the state transition handling thread), the comparison, and the ZK update all need to be protected by a lock.
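As a rough illustration of the design above, the following sketch shows a compare-then-update step guarded by a single lock. It is not Helix code: the class GuardedStateUpdater, its fields, and the map standing in for ZK are all hypothetical, and the COMPLETED state name is used only for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of a lock-protected conditional state update; not part of Helix. */
class GuardedStateUpdater {
  private final Object lock = new Object();
  // Stands in for the current state held by _stateModel (assumption).
  private String inMemoryState;
  // Stands in for the CurrentState record persisted in ZK (assumption).
  private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();
  private final String key;

  GuardedStateUpdater(String key, String initialState) {
    this.key = key;
    this.inMemoryState = initialState;
    store.put(key, initialState);
  }

  /**
   * Moves the state to {@code to} only if the in-memory state still equals
   * {@code expectedFrom}. The comparison, the in-memory update, and the
   * persisted update all happen under the same lock, so a stale caller
   * cannot overwrite a newer state.
   */
  boolean transition(String expectedFrom, String to) {
    synchronized (lock) {
      if (!inMemoryState.equals(expectedFrom)) {
        return false; // another thread already changed the state; skip the write
      }
      inMemoryState = to;
      store.put(key, to);
      return true;
    }
  }

  String persistedState() {
    return store.get(key);
  }
}
```

With this shape, whichever of the two threads (task runner or state transition handler) reaches transition() second sees that the state has already moved and leaves the persisted value alone.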
*Is it safe to do so?
In the current message handling design, HelixTaskExecutor (a message listener that creates and schedules all message handlers) guarantees that there is only one ongoing state transition per partition/task at a time, so removing this synchronization shouldn't cause any problem.
When onMessage receives a STATE_TRANSITION_CANCELLATION message and an ST message for the same entity, a HelixStateTransitionCancellationHandler is created and calls cancel(). Changing the lock scope has no influence on HelixStateTransitionCancellationHandler.
*Why do we need to remove the lock in HelixStateTransitionHandler?
In the design to remove requested state, the task runner thread updates CurrentState when a task finishes or ends in an error state.
However, the state transition handler may be trying to update CurrentState at the same time, for an INIT→RUNNING, RUNNING→CANCEL, or any other state transition message, so we will run into a write conflict.
This PR removes this redundant synchronization. A smaller-scoped lock will be added in the PR that removes requested state.
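To make the idea of a smaller-scoped lock more concrete, here is a minimal sketch under stated assumptions. The class NarrowLockDemo and its methods are hypothetical and do not mirror the actual HelixStateTransitionHandler; the point is only that the lock covers the compare-and-update step, while the handler thread waits for the runner thread without holding it.

```java
/** Hypothetical sketch of a narrowly scoped state lock; not part of Helix. */
class NarrowLockDemo {
  private final Object stateLock = new Object();
  private String state = "RUNNING";

  /** Compare and update under the lock; this is the only critical section. */
  private boolean compareAndSet(String expected, String next) {
    synchronized (stateLock) {
      if (!state.equals(expected)) {
        return false;
      }
      state = next;
      return true;
    }
  }

  /**
   * Simulates a handler that starts a runner thread, waits for it, and then
   * attempts its own conditional update. The wait happens outside the lock,
   * so the runner can always acquire stateLock and finish.
   */
  String handleStop() {
    Thread runner = new Thread(() -> compareAndSet("RUNNING", "STOPPED"));
    runner.start();
    try {
      runner.join(); // no lock is held here, so the runner is never blocked by the handler
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for runner", e);
    }
    // The handler's own update is conditional: it does nothing if the runner already moved the state.
    compareAndSet("RUNNING", "STOPPED");
    synchronized (stateLock) {
      return state;
    }
  }
}
```

If the handler instead held stateLock for the whole of handleStop(), the runner would block on the lock while the handler blocked on join(), which is the kind of interaction a handler-wide lock risks.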
Tests
The following tests are written for this issue:
Our current Task framework tests cover all state transition types and should be able to cover this change.