Issue 870: ScanAndCompareGarbageCollector: harden against LM bugs #876
@jvrao You might want to glance at how I handled the double check.
latch.countDown();
});
latch.await();
if (metaRC.get() != BKException.Code.NoBookieAvailableException) {
Wrong return code here? Shouldn't it be NoSuchLedgerExistsException?
I am not sure how the test cases below pass if the return code is wrong.
Yeah, sorry. I needed to adapt that bit somewhat from our branch and evidently introduced a typo. Let me figure out why the tests worked and correct it.
Ah, it passed because no "normal" GC test (one where neither the iterator nor the ledger manager is corrupted) ran with that option enabled. Enabling it for testGcLedgersWithLedgersInSameLedgerRange catches this bug.
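For readers following the thread, the double check under discussion can be sketched roughly as below. This is a minimal illustration, not the code from the patch: `MetadataReader`, `GcDoubleCheck`, and the three return-code constants are hypothetical stand-ins for `LedgerManager.readLedgerMetadata` and the `BKException.Code` values. The point it shows is that only a positive "no such ledger" answer should permit deletion, which is why comparing against the wrong code matters.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/** Hypothetical stand-in for the asynchronous metadata read path. */
interface MetadataReader {
    void readLedgerMetadata(long ledgerId, IntConsumer callback);
}

final class GcDoubleCheck {
    // Illustrative return codes only; these are NOT the real BKException.Code values.
    static final int OK = 0;
    static final int NO_SUCH_LEDGER = -1;
    static final int OTHER_ERROR = -2;

    /** Returns true only if the metadata read positively reports the ledger as missing. */
    static boolean confirmedDeleted(MetadataReader reader, long ledgerId) {
        // Default to an error code so a callback that never sets it cannot allow deletion.
        AtomicInteger rc = new AtomicInteger(OTHER_ERROR);
        CountDownLatch latch = new CountDownLatch(1);
        reader.readLedgerMetadata(ledgerId, code -> {
            rc.set(code);
            latch.countDown();
        });
        try {
            latch.await(); // turn the asynchronous read into a synchronous answer
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false; // be conservative: keep the ledger
        }
        return rc.get() == NO_SUCH_LEDGER;
    }

    public static void main(String[] args) {
        MetadataReader reader = (id, cb) -> cb.accept(id == 7L ? NO_SUCH_LEDGER : OK);
        System.out.println(confirmedDeleted(reader, 7L));
        System.out.println(confirmedDeleted(reader, 8L));
    }
}
```

In this sketch any result other than `NO_SUCH_LEDGER`, including an unrelated error, keeps the ledger, so a mistaken comparison would show up as soon as an uncorrupted GC test runs with the option enabled.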
@@ -431,4 +657,59 @@ public boolean waitForLastAddConfirmedUpdate(long ledgerId,
return false;
}
}

class LedgerManagerWrapper implements LedgerManager {
I think you can use CleanupLedgerManager for this. You don't need to recreate a similar class here.
ok
 * NoSuchLedgerExistsException.
 *
 */
@Test(timeout = 60000)
Remove the timeout value. We currently use a global timeout setting to control this, so if this test case doesn't need a special timeout, it can be dropped here.
ok
aea2a23 to a820ab4
@sijie Updated.
@athanatos LGTM +1
jdk9 passed; jdk8 failed in TestBenchmark, but that is unrelated (it is a known flaky test).
}

/**
 * Set whether to use transactional compaction and using a separate log for compaction or not.
the description is not correct yet
nice catch @ArvinDevel !
Thanks, will update.
a820ab4 to 348cf3f
if (LOG.isDebugEnabled()) {
    LOG.debug("Active in metadata {}, Active in bookie {}", ledgersInMetadata, subBkActiveLedgers);
}
for (Long bkLid : subBkActiveLedgers) {
    if (!ledgersInMetadata.contains(bkLid)) {
        if (conf.getVerifyMetadataOnGC()) {
nit: why does this config variable have to be read every single time? A variable would probably be sufficient.
The price of the config read seems to me to be trivial compared to the cost of querying zk. I'm inclined to keep it simple.
I'm just saying that, instead of reading the config every time, the value can be assigned to a final variable and used here. That config value is not going to change anyway, just like the rest of the configs.
Ok, I'll switch it.
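The suggestion above amounts to something like the following sketch, which reads the flag once into a final field and reuses it inside the loop. Only `getVerifyMetadataOnGC()` comes from the diff; `GcConfig`, `CachedFlagGc`, and `extraChecksNeeded` are hypothetical names invented for this illustration.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the bookie's configuration object. */
interface GcConfig {
    boolean getVerifyMetadataOnGC();
}

final class CachedFlagGc {
    // Read once at construction; the setting is not expected to change at runtime.
    private final boolean verifyMetadataOnGc;

    CachedFlagGc(GcConfig conf) {
        this.verifyMetadataOnGc = conf.getVerifyMetadataOnGC();
    }

    /** Counts the ledgers that would need the extra metadata read in one GC pass. */
    int extraChecksNeeded(Set<Long> ledgersInMetadata, List<Long> bkActiveLedgers) {
        int checks = 0;
        for (Long bkLid : bkActiveLedgers) {
            // The cached field is consulted here instead of calling into the config again.
            if (!ledgersInMetadata.contains(bkLid) && verifyMetadataOnGc) {
                checks++;
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        CachedFlagGc gc = new CachedFlagGc(() -> true);
        System.out.println(gc.extraChecksNeeded(Set.of(1L), List.of(1L, 2L, 3L)));
    }
}
```

As noted in the thread, the saving is small next to a ZooKeeper query; the main benefit is that the code states plainly that the value is fixed for the collector's lifetime.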
});
latch.await();
if (metaRC.get() != BKException.Code.NoSuchLedgerExistsException) {
    LOG.info(
nit: warn or error?
Yeah, good point, will fix.
348cf3f to 65a83f5
The idea behind this patch is to make it more likely that more than one LedgerManager bug would be required to erroneously believe a ledger to be deleted.

1) Remove the special handling for the completely empty LM. It's not really a common case nor is it much of an optimization even when it happens.
2) Add a config variable to cause the bookie to also check the LedgerManager.readLedgerMetadata path before actually removing a ledger. The implementations of readLedgerMetadata and the iterator are sufficiently different that it seems worth it to have the option of checking both to guard somewhat against a bug in either.

(@bug W-4292747@)
(@bug W-3027938@)

Signed-off-by: Samuel Just <sjust@salesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
65a83f5 to bee06be
+1 LGTM
The idea behind this patch is to make it more likely that more than one LedgerManager bug would be required to erroneously believe a ledger to be deleted.

1) Remove the special handling for the completely empty LM. It's not really a common case nor is it much of an optimization even when it happens.
2) Add a config variable to cause the bookie to also check the LedgerManager.readLedgerMetadata path before actually removing a ledger. The implementations of readLedgerMetadata and the iterator are sufficiently different that it seems worth it to have the option of checking both to guard somewhat against a bug in either.

(bug W-4292747)
(bug W-3027938)

Signed-off-by: Samuel Just <sjust@salesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>

Master Issue: apache#870
Author: Samuel Just <sjust@salesforce.com>
Reviewers: Ivan Kelly <ivank@apache.org>, Arvin <None>, Sijie Guo <sijie@apache.org>

This closes apache#876 from athanatos/forupstream/issue-870, closes apache#870
The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.

1) Remove the special handling for the completely empty LM. It's not
   really a common case nor is it much of an optimization even when it
   happens.
2) Add a config variable to cause the bookie to also check the
   LedgerManager.readLedgerMetadata path before actually removing a ledger.
   The implementations of readLedgerMetadata and the iterator are
   sufficiently different that it seems worth it to have the option of
   checking both to guard somewhat against a bug in either.

(@bug W-4292747@)
(@bug W-3027938@)

Signed-off-by: Samuel Just <sjust@salesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>

Master Issue: #870
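
To make the intent of the description concrete, here is a rough sketch of a scan-and-compare pass with the optional second check. It is an illustration under assumptions, not the patch itself: `ScanAndCompareSketch`, `ledgersToDelete`, and the `metadataSaysMissing` predicate are hypothetical names, with the predicate standing in for a direct `readLedgerMetadata` lookup that returns true only on a "no such ledger" result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.LongPredicate;

final class ScanAndCompareSketch {
    /**
     * @param ledgersInMetadata   ledger ids reported by the ledger manager's iterator
     * @param bkActiveLedgers     ledger ids the bookie currently stores
     * @param verifyMetadataOnGc  whether to consult the second, independent path
     * @param metadataSaysMissing hypothetical second path: true only when the direct
     *                            metadata read reports that the ledger does not exist
     * @return ids that are safe to delete, in ascending order
     */
    static List<Long> ledgersToDelete(Set<Long> ledgersInMetadata,
                                      Set<Long> bkActiveLedgers,
                                      boolean verifyMetadataOnGc,
                                      LongPredicate metadataSaysMissing) {
        List<Long> toDelete = new ArrayList<>();
        for (long id : new TreeSet<>(bkActiveLedgers)) {
            if (ledgersInMetadata.contains(id)) {
                continue; // the iterator still lists it: keep
            }
            if (verifyMetadataOnGc && !metadataSaysMissing.test(id)) {
                continue; // the second path disagrees with the iterator: keep
            }
            toDelete.add(id);
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // A buggy iterator that reports no ledgers at all.
        Set<Long> fromIterator = Set.of();
        Set<Long> onBookie = Set.of(1L, 2L, 3L);
        System.out.println(ledgersToDelete(fromIterator, onBookie, true, id -> id == 2L));
    }
}
```

With the flag enabled, an iterator that wrongly reports an empty ledger set no longer causes every ledger to be removed; only the ledgers that the direct read also reports as missing are deleted. There is no special case for an empty iterator result, which matches point 1 of the description.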