Significant performance degradation for aggregate method in jblst #66
Test v0.1.0-RELEASE:
Test v0.3.3-1:
Test BLS Mikuli old implementation:
Test BLS Mikuli recent implementation:
Here is the `aggregate` method from blst.hpp:

```cpp
void aggregate(const P1_Affine& in)
{   if (blst_p1_affine_in_g1(in))
        blst_p1_add_or_double_affine(&point, &point, in);
    else
        throw BLST_POINT_NOT_IN_GROUP;
}
```

I bet the group check (`blst_p1_affine_in_g1`) is what's causing the slowdown.
Correct bet. The G1 group check is ~70 times more expensive than the addition; in G2 the ratio is ~30. Either way, it appears that the report is misplaced: it's jblst that chooses and makes the calls, hence the report is something for jblst to address. Keep in mind that points coming from the network are supposed to be group-checked, and it's not impossible to imagine that jblst simply moved the group check from elsewhere. That would mean the benchmark is not actually representative; one should benchmark the complete stack, not just point additions.
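As a rough sanity check against the numbers in the report (back-of-the-envelope reasoning, not a measurement): if a plain addition costs 1 unit, a check-plus-add costs roughly 71 units in G1 or 31 in G2, so a loop that previously did only additions should slow down by somewhere between ~30x and ~70x. The reported drop from ~150 MB/sec to ~3.5 MB/sec is a factor of ~43, which lands in exactly that range once any fixed per-iteration overhead is accounted for.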
@dot-asm see the code in the test:
The aggregate is called with 1000 signatures, and the test repeats the operation in a loop while collecting the time delta. Where is my mistake?
Can anyone create a test using the C++ binding's aggregate call and run it in a loop, for comparison with the previous implementation? I used 1000 messages of 350 bytes each for one cycle. Then I signed them, collected all the signatures in a list, and called BLS.aggregate(s). Collected the time delta, and so on.
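Not a drop-in test, but a minimal C++ sketch of such a loop might look like the following. It assumes the blst C++ binding (blst.hpp) is available, that `sigs` is filled with the 1000 deserialized signatures beforehand, and that `P1`'s default constructor yields the point at infinity; like the Java test, it times only the aggregation, not the whole stack.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>
#include "blst.hpp"

// Times one batch of aggregate() calls; `sigs` is assumed to hold the
// 1000 signatures produced by the signing half of the benchmark.
double time_aggregate(const std::vector<blst::P1_Affine>& sigs)
{
    blst::P1 acc;                       // assumed to start at infinity
    auto start = std::chrono::steady_clock::now();

    for (const auto& sig : sigs)
        acc.aggregate(sig);             // group check + point addition per call

    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main()
{
    std::vector<blst::P1_Affine> sigs;  // fill with real signatures before timing
    // ... load/deserialize 1000 signatures here ...
    std::printf("aggregate(): %.4f s per 1000-signature batch\n",
                time_aggregate(sigs));
}
```

A variant of the same loop without the group check would be the other half of an apples-to-apples comparison with v0.1.0.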
What I mean is that one should ask how many signatures the whole application can process per unit of time, and do so securely, not time an arbitrarily chosen loop. Again, the issue is something for jblst to resolve.
I don't agree: the final performance depends on the total payload size, which can vary a lot. So the benchmark should be expressed in MB/sec, which reflects the actual bandwidth of the BLS library.
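For reference, with the batch described above the conversion is simple: 1000 messages × 350 bytes ≈ 0.35 MB per cycle, so MB/sec is just 0.35 divided by the measured seconds per cycle. On those terms the quoted ~150 MB/sec corresponds to roughly 2.3 ms per cycle and ~3.5 MB/sec to roughly 100 ms (assuming the payload size is the only thing counted as "bandwidth").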
In real life, an application wouldn't sign messages just to verify the signatures itself. Signatures would be passed to somebody else, and that somebody else has to perform a group check on each individual signature prior to aggregating them. Since the operation will be dominated by the group checks, benchmarking point additions in isolation is not representative.
This is why I split out the aggregateVerify and aggregate calls: to determine where the slowdown comes from. Also, jblst confirmed a fix which should be applied on the BLS native part. See the comment.
As @benjaminion mentions in the jblst issue, the point is where the check goes. It needs to be somewhere, and how that impacts a synthetic benchmark is not of real concern, as @dot-asm mentioned. @vikulin, where do you propose the group check should go in the application? Do you have a different application than Teku?
Yes, I have a different app which requires the aggregate call to be used separately. If I understand correctly, aggregate prepares the aggregated signature, which can then be group-checked once (aggregateVerify), but that check can be executed somewhere else after the aggregate is done.
If you are taking signatures off the wire, then group checking them would be appropriate prior to adding them to an aggregated point. If you generate the signatures all yourself and trust their validity, then you can get away with not group checking prior to aggregation.

The blst C++ binding provides mechanisms to perform the group checks and additions independently. Or, if one chooses to do both at the same time, then a call to aggregate may be made. In blst the aggregate() member function would look exactly like the add() member function if the group check was not in there. Taking the check out of aggregate() would probably mean just getting rid of the function itself.

Therefore I do not see any changes required within blst at this time. As @dot-asm mentioned, this is an issue that needs to be resolved within jblst or your application.
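To make the two options concrete, here is a rough C++ sketch of the paths being described. It relies only on the aggregate() quoted earlier in the thread plus an add() member taking a P1_Affine, which the comment above describes as the check-free counterpart; the exact signature of add() is an assumption here, not a quote from blst.hpp.

```cpp
#include <vector>
#include "blst.hpp"

// Untrusted input (signatures off the wire): aggregate() group-checks each
// point (blst_p1_affine_in_g1) before adding it, and throws
// BLST_POINT_NOT_IN_GROUP if a point fails the check.
void aggregate_untrusted(blst::P1& acc, const std::vector<blst::P1_Affine>& sigs)
{
    for (const auto& sig : sigs)
        acc.aggregate(sig);
}

// Trusted input (signatures you generated yourself, or that were already
// group-checked when they entered the application): one plain addition per
// signature, i.e. what aggregate() would be without the check.
void aggregate_trusted(blst::P1& acc, const std::vector<blst::P1_Affine>& sigs)
{
    for (const auto& sig : sigs)
        acc.add(sig);                   // assumed check-free add, see above
}
```

Which of the two an application calls is exactly the policy question in this thread: the check has to happen somewhere, and the only choice is whether it sits inside the aggregation loop or at the point where each signature enters the application.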
@sean-sn what if I choose to do both at the same time? The performance still went down. I'm sorry, but I still don't see any good arguments why the aggregate call should not be fixed. As I mentioned, the comparison is clear: the aggregate call slowed down significantly.
@vikulin Forgive me for adding to the number of people saying that this is not Blst's problem. You are using Teku's implementation, which uses Jblst, which uses Blst. Nothing changed on the Blst or Jblst side; we just moved some things around in Teku. The "fix" is easy: if you want to aggregate quickly without the group membership check (and you are confident that it is safe), use …
@benjaminion, thanks. But I'm curious: which part of the code has been changed to make the fix? Has it been done?
There is no good explanation of why the issue was closed. The performance has fallen compared with v0.1.0. @dot-asm could not explain why this happened.
I reckon that sufficient information was provided in the course of the discussion. The fact that the last question remained unanswered is not @supranational/blst's fault. As already said, the report is misplaced: it's not about the blst implementation, but about the choice between two methods, aggregate() and add().
@dot-asm the add method has never been used, since it's not a public method in blst.
So following your logic, blst.hpp has no add() either…

Just in case, the ellipsis at the end of the previous paragraph is not an invitation for further discussion. In fact, I plan to abstain from further discussion, because it's getting circular. Get your logic straight! (But don't expect somebody else to straighten it out for you :-)
@dot-asm you could have closed the issue right after it was created if you had mentioned this, and you would not have spent so much time on the discussion. Now it's clear to me.
Significant performance degradation for aggregate method in jblst.
This was noticed while running benchmark tests for the BLS aggregation method in jblst. The library at v0.1.0-RELEASE showed ~150 MB/sec on my 6-core Ryzen 5. I decided to upgrade to the latest jblst, v0.3.3-1. After this, the same test showed ~3.5 MB/sec. The benchmark test is published in a GitHub repo here: https://github.com/RiV-chain/riv-benchmark/blob/main/src/main/java/org/riv/SigningTest.java#L29
I found the relevant part of the Java code and compared the difference in the native method invocations:
v0.1.0-RELEASE
v0.3.3-1