Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test that ShardOwnershipLostErrors are never retried #3625

Merged
merged 1 commit into from
Dec 6, 2022

Conversation

MichaelSnowden
Copy link
Contributor

What changed?
Adds a test that verifies a ShardOwnershipLostError is not retried, but a regular error is.

Why?
These are the only types of errors that acquireShard should not retry.

How did you test it?
This is a test.

Potential risks
Just a test.

Is hotfix candidate?
No.

@MichaelSnowden MichaelSnowden requested a review from a team as a code owner November 19, 2022 00:25
@MichaelSnowden MichaelSnowden force-pushed the acquire-shard/unit-test-1 branch 2 times, most recently from 26b0de7 to 0104057 Compare November 19, 2022 01:19
@MichaelSnowden MichaelSnowden force-pushed the acquire-shard/unit-test-1 branch from 0104057 to 6ce49a8 Compare November 19, 2022 02:13
@MichaelSnowden
Copy link
Contributor Author

I simplified this a lot. I think I was wrong about the race conditions because go test ./service/history/shard/. -run 'TestShardControllerSuite/TestAcquireShard.*' -race showed no race conditions.

@MichaelSnowden MichaelSnowden force-pushed the acquire-shard/unit-test-1 branch 5 times, most recently from 42c79d2 to 16eab2a Compare November 24, 2022 23:10
Copy link
Member

@yiminc yiminc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we doing unit test at controller level? Wouldn't it be much simpler if we test at shard context level directly? We just need to setup appropriate mocks for shard context, and verify acquireShard() is retrying all errors except shardOwnershipLost.

@MichaelSnowden
Copy link
Contributor Author

why are we doing unit test at controller level? Wouldn't it be much simpler if we test at shard context level directly? We just need to setup appropriate mocks for shard context, and verify acquireShard() is retrying all errors except shardOwnershipLost.

I'm following the principle of only testing public interfaces: https://testing.googleblog.com/2008/07/tott-testing-against-interfaces.html

I'll change it to a context test though

@MichaelSnowden MichaelSnowden force-pushed the acquire-shard/unit-test-1 branch 2 times, most recently from a716a79 to b865cf7 Compare November 28, 2022 20:00
@MichaelSnowden MichaelSnowden force-pushed the acquire-shard/unit-test-1 branch 2 times, most recently from a37a7ce to 396821d Compare November 29, 2022 22:48
@MichaelSnowden MichaelSnowden force-pushed the acquire-shard/unit-test-1 branch 3 times, most recently from 4f594a9 to 90e0145 Compare November 30, 2022 18:56
case <-closed:
timer.Stop()
case <-timer.C:
s.Fail("shard should have been closed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this because the retry policy has max attempt of 5 and mock return error 6 times? Can we add one test case to verify that acquire shard succeed after a few errors and evenually succeed with retry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because we call the operation before checking the attempt count constraint here: https://github.com/temporalio/temporal/blob/2f84879686db552a9ca1bd6ea60fc099fa2557fa/common/backoff/retry.go#LL143

It looks like a bug AFAICT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the other test case

@MichaelSnowden MichaelSnowden force-pushed the acquire-shard/unit-test-1 branch 2 times, most recently from a52551e to de29452 Compare December 1, 2022 07:58
@MichaelSnowden MichaelSnowden enabled auto-merge (squash) December 2, 2022 23:04
s.mockShardManager.EXPECT().UpdateShard(gomock.Any(), gomock.Any()).
Return(&persistence.ShardOwnershipLostError{}).Times(1)

s.mockShard.acquireShard()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do you assert test results?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertions are within the expectations. In this case, we are asserting that update shard is only called once. In other cases, we assert that it is tried several times.

	s.mockShardManager.EXPECT().UpdateShard(gomock.Any(), gomock.Any()).
		Return(&persistence.ShardOwnershipLostError{}).Times(1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added a line after this that verifies that the context's state is "stopping". Thanks for pointing this out.

Comment on lines +386 to +389
s.mockShardManager.EXPECT().UpdateShard(gomock.Any(), gomock.Any()).
Return(fmt.Errorf("temp error")).Times(6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happen after retry exhausted all attempts? does the mock.UpdateShard() succeed?
How do you assert test results?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a statement after this that verifies that the context's state is stopping.

Return(nil).Times(1)
s.mockHistoryEngine.EXPECT().NotifyNewTasks(gomock.Any(), gomock.Any()).MinTimes(1)

s.mockShard.acquireShard()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add assert to verify the shard state is acquired?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a statement after this that verifies that the context's state is acquired.

@MichaelSnowden MichaelSnowden force-pushed the acquire-shard/unit-test-1 branch from de29452 to 8d10fa1 Compare December 5, 2022 20:01
@MichaelSnowden MichaelSnowden force-pushed the acquire-shard/unit-test-1 branch from 8d10fa1 to 2baeb32 Compare December 5, 2022 22:15
Comment on lines +1896 to +1898
if policy == nil {
policy = backoff.NewExponentialRetryPolicy(1 * time.Second).WithExpirationInterval(5 * time.Minute)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set this default value in newContext() method?

@MichaelSnowden MichaelSnowden merged commit 34e256a into master Dec 6, 2022
@MichaelSnowden MichaelSnowden deleted the acquire-shard/unit-test-1 branch December 6, 2022 01:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants