Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 870: ScanAndCompareGarbageCollector: harden against LM bugs #876

Closed
wants to merge 1 commit into from

Conversation

athanatos
Copy link

@athanatos athanatos commented Dec 18, 2017

The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.

  1. Remove the special handling for the completely empty LM. It's not
    really a common case nor is it much of an optimization even when it
    happens.
  2. Add a config variable to cause the bookie to also check the
    LedgerManager.readLedgerMetadata path before actually removing a ledger.
    The implementations of readLedgerMetadata and the iterator are
    sufficiently different that it seems worth it to have the option of
    checking both to guard somewhat against a bug in either.

(@bug W-4292747@)
(@bug W-3027938@)
Signed-off-by: Samuel Just sjust@salesforce.com
Signed-off-by: Charan Reddy Guttapalem cguttapalem@salesforce.com

Master Issue: #870

@athanatos
Copy link
Author

@jvrao You might want to glance at how I handled the double check.

latch.countDown();
});
latch.await();
if (metaRC.get() != BKException.Code.NoBookieAvailableException) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong return code here? it should be NoLedgerExistsException?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how the below test cases work, if the return code is wrong.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry. I needed to adapt that bit somewhat from our branch and evidently introduced a typo. Let me figure out why the tests worked and correct it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it passed because there were no "normal" gc tests where either the iterator or the ledger manager weren't corrupted with that option enabled. Enabling it for testGcLedgersWithLedgersInSameLedgerRange catches this bug.

@@ -431,4 +657,59 @@ public boolean waitForLastAddConfirmedUpdate(long ledgerId,
return false;
}
}

class LedgerManagerWrapper implements LedgerManager {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use CleanupLedgerManager for this. You don't need to recreate a similar class here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

* NoSuchLedgerExistsException.
*
*/
@Test(timeout = 60000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove timeout value. currently we use a global timeout setting to control. If this test case doesn't require special timeout value, we can remove it here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@athanatos athanatos force-pushed the forupstream/issue-870 branch from aea2a23 to a820ab4 Compare December 19, 2017 18:59
@athanatos
Copy link
Author

@sijie Updated.

Copy link
Member

@sijie sijie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@athanatos LGTM +1

@sijie
Copy link
Member

sijie commented Dec 19, 2017

jdk9 passed, jdk8 failed with TestBenchmark (but it is unrelated, known to be a flaky test)

}

/**
* Set whether to use transactional compaction and using a separate log for compaction or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the description is not correct yet

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch @ArvinDevel !

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will update.

@athanatos athanatos force-pushed the forupstream/issue-870 branch from a820ab4 to 348cf3f Compare December 20, 2017 18:21
if (LOG.isDebugEnabled()) {
LOG.debug("Active in metadata {}, Active in bookie {}", ledgersInMetadata, subBkActiveLedgers);
}
for (Long bkLid : subBkActiveLedgers) {
if (!ledgersInMetadata.contains(bkLid)) {
if (conf.getVerifyMetadataOnGC()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why does this config variable has to be read every single time..probably a variable would be sufficient

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The price of the config read seems to me to be trivial compared to the cost of querying zk. I'm inclined to keep it simple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just saying instead of reading config everytime that config value can be assigned to a final variable and use it here..anyhow that config value is not going to change just like rest of the configs

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll switch it.

});
latch.await();
if (metaRC.get() != BKException.Code.NoSuchLedgerExistsException) {
LOG.info(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: warn or error?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point, will fix.

@athanatos athanatos force-pushed the forupstream/issue-870 branch from 348cf3f to 65a83f5 Compare December 20, 2017 19:13
The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.
1) Remove the special handling for the completely empty LM.  It's not
really a common case nor is it much of an optimization even when it
happens.
2) Add a config variable to cause the bookie to also check the
LedgerManager.readLedgerMetadata path before actually removing a ledger.
The implementations of readLedgerMetadata and the iterator are
sufficiently different that it seems worth it to have the option of
checking both to guard somewhat against a bug in either.

(@bug W-4292747@)
(@bug W-3027938@)
Signed-off-by: Samuel Just <sjust@salesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
@athanatos athanatos force-pushed the forupstream/issue-870 branch from 65a83f5 to bee06be Compare December 20, 2017 19:42
@ArvinDevel
Copy link
Contributor

+1 LGTM

@sijie sijie added this to the 4.7.0 milestone Dec 27, 2017
@sijie sijie closed this in 1f8b26d Dec 27, 2017
revans2 pushed a commit to revans2/bookkeeper that referenced this pull request Feb 7, 2018
The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.
1) Remove the special handling for the completely empty LM.  It's not
really a common case nor is it much of an optimization even when it
happens.
2) Add a config variable to cause the bookie to also check the
LedgerManager.readLedgerMetadata path before actually removing a ledger.
The implementations of readLedgerMetadata and the iterator are
sufficiently different that it seems worth it to have the option of
checking both to guard somewhat against a bug in either.

(bug W-4292747)
(bug W-3027938)
Signed-off-by: Samuel Just <sjustsalesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalemsalesforce.com>

Master Issue: apache#870

Author: Samuel Just <sjust@salesforce.com>

Reviewers: Ivan Kelly <ivank@apache.org>, Arvin <None>, Sijie Guo <sijie@apache.org>

This closes apache#876 from athanatos/forupstream/issue-870, closes apache#870
revans2 pushed a commit to revans2/bookkeeper that referenced this pull request Feb 7, 2018
The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.
1) Remove the special handling for the completely empty LM.  It's not
really a common case nor is it much of an optimization even when it
happens.
2) Add a config variable to cause the bookie to also check the
LedgerManager.readLedgerMetadata path before actually removing a ledger.
The implementations of readLedgerMetadata and the iterator are
sufficiently different that it seems worth it to have the option of
checking both to guard somewhat against a bug in either.

(bug W-4292747)
(bug W-3027938)
Signed-off-by: Samuel Just <sjustsalesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalemsalesforce.com>

Master Issue: apache#870

Author: Samuel Just <sjust@salesforce.com>

Reviewers: Ivan Kelly <ivank@apache.org>, Arvin <None>, Sijie Guo <sijie@apache.org>

This closes apache#876 from athanatos/forupstream/issue-870, closes apache#870
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants