Test that ShardOwnershipLostErrors are never retried #3625
I simplified this a lot. I think I was wrong about the race conditions because
Why are we unit testing at the controller level? Wouldn't it be much simpler to test at the shard context level directly? We just need to set up appropriate mocks for the shard context and verify that acquireShard() retries all errors except ShardOwnershipLostError.
I'm following the principle of only testing public interfaces: https://testing.googleblog.com/2008/07/tott-testing-against-interfaces.html I'll change it to a context test, though.
case <-closed:
	timer.Stop()
case <-timer.C:
	s.Fail("shard should have been closed")
Is this because the retry policy has a max attempt count of 5 and the mock returns an error 6 times? Can we add a test case to verify that acquireShard eventually succeeds with retries after a few errors?
This is because we call the operation before checking the attempt count constraint here: https://github.com/temporalio/temporal/blob/2f84879686db552a9ca1bd6ea60fc099fa2557fa/common/backoff/retry.go#LL143
It looks like a bug AFAICT.
I added the other test case.
s.mockShardManager.EXPECT().UpdateShard(gomock.Any(), gomock.Any()).
	Return(&persistence.ShardOwnershipLostError{}).Times(1)

s.mockShard.acquireShard()
Where do you assert the test results?
The assertions are within the expectations. In this case, we are asserting that UpdateShard is called only once. In other cases, we assert that it is tried several times.
s.mockShardManager.EXPECT().UpdateShard(gomock.Any(), gomock.Any()).
Return(&persistence.ShardOwnershipLostError{}).Times(1)
I just added a line after this that verifies that the context's state is "stopping". Thanks for pointing this out.
s.mockShardManager.EXPECT().UpdateShard(gomock.Any(), gomock.Any()).
	Return(fmt.Errorf("temp error")).Times(6)
What happens after the retry exhausts all attempts? Does mock.UpdateShard() succeed?
How do you assert the test results?
I added a statement after this that verifies that the context's state is stopping.
	Return(nil).Times(1)
s.mockHistoryEngine.EXPECT().NotifyNewTasks(gomock.Any(), gomock.Any()).MinTimes(1)

s.mockShard.acquireShard()
Can we add an assertion to verify that the shard state is acquired?
I added a statement after this that verifies that the context's state is acquired.
if policy == nil {
	policy = backoff.NewExponentialRetryPolicy(1 * time.Second).WithExpirationInterval(5 * time.Minute)
}
Can we set this default value in the newContext() method?
What changed?
Adds a test that verifies a ShardOwnershipLostError is not retried, but a regular error is.
Why?
These are the only types of errors that acquireShard should not retry.
How did you test it?
This is a test.
Potential risks
Just a test.
Is hotfix candidate?
No.