fip | title | author | discussions-to | status | type | created | requires |
---|---|---|---|---|---|---|---|
0089 |
A Finality Calculator for Filecoin |
Guy Goren (@guy-goren), Jorge M. Soares (@jsoares) |
Draft |
FRC |
2024-01-03 |
FRC-0051 |
Filecoin's Expected Consensus (EC) comes with probabilistic finality and a 900-epoch soft finality threshold. While network participants (e.g. exchanges, L2 operators, application developers) use different confirmation thresholds, they pessimistically wait between 100-900 epochs before considering a transaction final, leading to delays in the order of hours. Instead of naively counting the number of epochs, we propose a finality calculator that considers what takes place during those epochs and, under expected operating conditions, can attain the same level of certainty in fewer epochs.
We perform an analysis of Filecoin's finality, i.e., the probabilistic guarantees that a given tipset will always be in the canonical chain. The analysis involves distributed computing and stochastic arguments and provides upper bounds for the error probabilities, i.e., the probability that a reorg would override a given tipset. We use a dynamic evaluation that considers the chain's specific observed history. Unlike a static analysis that considers the worst-case conditions, we consider the worst-case conditions in the (unknown) future but do not consider the worst-case past. Instead, we use the observed chain to reason about the (known) past.
For example, consider two cases: in case A, the chain grew by 5 blocks per epoch over the last 50 epochs; in case B, the chain grew by a mere 2 blocks per epoch over the same 50 epochs. The probability of a chain reconfiguration longer than 50 epochs is much smaller in case A than in case B, allowing us to make a faster finality determination.
The result is an algorithm that provides an upper bound on the error probabilities given a specific chain history. Several key applications of Filecoin currently use a 900-epoch (7.5 hours) settlement period to provide an intended finality guarantee of
The motivation for this proposal is twofold:
- To provide theoretical finality guarantees grounded on clear formal arguments.
- To reduce the delays experienced by Filecoin users.
F3 will also address these points (with even faster finality). Nevertheless, this work does not conflict with F3 and independently provides several benefits:
- It requires no change to client code and, therefore, no consensus. Instead, it is a simple algorithm that any node can run separately from the "mining process". The algorithm takes the observed history and outputs a measure of safety.
- It does not change the incentive considerations of Filecoin, which remains a longest-chain blockchain, similar to Bitcoin and Ethereum. The algorithm being executed (or not) does not affect the safety of Filecoin.
- It is an algorithm and not a protocol. It executes locally and does not depend on other nodes executing the algorithm. For the same reason, it requires no additional communication and is infinitely scalable.
- In case F3 halts and the network falls back on EC finality, it can reduce the degradation experienced by users.
Notably, (1) implies that our proposal can provide an interim solution for Filecoin until F3 goes live on the network.
The analysis detailing how we bound the finality of a given input can be found in the companion resources. In the interest of brevity and of minimising duplication, we will skip the derivations here and instead present the resulting mechanism with a focus on implementers.
In summary, we consider an observed addition of
- Distant past: potential adversarial lead at epoch
$s$ , with respect to the local heaviest chain (lh-chain) - Recent past: blocks produced by the adversary between epoch
$s$ and the current epoch$c$ - Future: potential lead created by the adversary after current epoch
$c$ , with respect to blocks produced by honest validators slowed by the adversary
We use the data from the three stages to determine the probability that the adversary will overtake the observed chain. The calculator assumes the use of consistent broadcast, specified in FRC-0051 and currently deployed on most network nodes.
The random variable
To account for the distribution of
# Initialize an array to store Pr(L=k)
pr_L = [0] * (max_k_L + 1)
# Calculate Pr(L=k) for each value of k
for k in range(0, max_k_L + 1):
sum_expected_adversarial_blocks_i = 0
sum_chain_blocks_i = 0
# Calculate Pr(L_i = k_i) for each epoch i, starting from epoch `s` under evaluation
# and walking backwards to the last final tipset
for i in range(target_epoch, current_epoch - 900, -1):
sum_expected_adversarial_blocks_i += rate_malicious_blocks
sum_chain_blocks_i += chain[i - 1]
# Poisson(k=k, lambda=sum(f*e))
pr_L_i = ss.poisson.pmf(k + sum_chain_blocks_i, sum_expected_adversarial_blocks_i)
# Take Pr(L=k) as the maximum over all i
pr_L[k] = max(pr_L[k], pr_L_i)
# As the adversarial lead is never negative, the missing probability is added to k=0
pr_L[0] += 1 - sum(pr_L)
The random variable
# Initialize an array to store Pr(B=k)
pr_B = [0] * (max_k_B + 1)
# Calculate Pr(B=k) for each value of k
for k in range(0, max_k_B + 1):
# Poisson(k=k, lambda=sum(f*e))
pr_B[k] = ss.poisson.pmf(k, (current_epoch - (target_epoch + 1)) * rate_malicious_blocks)
The random variable
# Calculate the probability Pr(H>0)
# Poisson (k=0, lambda=h*e)
Pr_H_gt_0 = 1 - ss.poisson.pmf(0, rate_honest_blocks)
# Calculate E[Z]
exp_Z = 0.0
for k in range(0, (int) (4 * blocks_per_epoch)): # Range stems from the distribution's moments
# Poisson(k=k, lambda=f*e)
pmf = ss.poisson.pmf(k, rate_malicious_blocks)
exp_Z += ((rate_honest_blocks + k) / (2 ** k)) * pmf
# Lower bound on the growth rate of the public chain
rate_public_chain = Pr_H_gt_0 * exp_Z
# Initialize an array to store Pr(M=k)
pr_M = [0] * (max_k_M + 1)
# Calculate Pr(M = k) for each value of k
for k in range(0, max_k_M + 1):
# Calculate Pr(M_i = k) for each i and find the maximum
for i in range(max_i_M, 0, -1):
# Skellam(k=k, mu1=n*e*f, mu2=n*E[Z])
prob_M_i = ss.skellam.pmf(k, i * rate_malicious_blocks, i * rate_public_chain)
# Take Pr(M=k) as the maximum over all i
pr_M[k] = max(pr_M[k], prob_M_i)
# pr_M[0] collects the probability of the adversary never catching up in the future.
pr_M[0] += 1 - sum(pr_M)
For an observed good addition
$L \geq k$ -
$L < k$ but$L+B \geq k$ -
$L + B < k$ but$L+B+M \geq k$
The overall error probability is bounded by
# Calculate cumulative sums for L, B, and M
cumsum_L = np.cumsum(pr_L)
cumsum_B = np.cumsum(pr_B)
cumsum_M = np.cumsum(pr_M)
# The observed chain has added weight equal to number of blocks since added
k = sum(chain[target_epoch:current_epoch])
# Calculate pr_error[k] for the observed added weight
# Performs a convolution over the step probability vectors
sum_L_ge_k = cumsum_L[-1]
if k > 0:
sum_L_ge_k -= cumsum_L[min(k - 1, max_k_L)]
double_sum = 0.0
for l in range(0, k):
sum_B_ge_k_min_l = cumsum_B[-1]
if k - l - 1 > 0:
sum_B_ge_k_min_l -= cumsum_B[min(k - l - 1, max_k_B)]
double_sum += pr_L[min(l, max_k_L)] * sum_B_ge_k_min_l
for b in range(0, k - l):
sum_M_ge_k_min_l_min_b = cumsum_M[-1]
if k - l - b - 1 > 0:
sum_M_ge_k_min_l_min_b -= cumsum_M[min(k - l - b - 1, max_k_M)]
double_sum += pr_L[min(l, max_k_L)] * pr_B[min(b, max_k_B)] * sum_M_ge_k_min_l_min_b
pr_error = sum_L_ge_k + double_sum
# Get the probability of the adversary overtaking the observed weight
# The conservative upper may exceed 1 in limit cases, so we cap the output.
return min(pr_error, 1.0)
There are no backward compatibility issues, as this FRC provides a recommendation to clients/applications with no change to the existing consensus mechanism or protocol.
Given the nature of the proposal, we replace test cases with an evaluation of the algorithm on past chain data from the Filecoin mainnet. The periods analysed are heights 3,349,680 to 3,433,200, a typical healthy state, and heights 2,684,400 to 2,773,680, at the tail end of which there is a particularly unhealthy period. The figures below show the results of quantifying the finality of tipsets after a 30-epoch delay. The potential adversarial power we considered is 30%. One may choose other questions to analyse, such as when a specific tipset reaches a given finality threshold.
The figure below shows the results of applying the analysis to a typical healthy chain period, when tipsets are almost full (5 blocks) on average. On the
The figure below shows the results of applying the analysis on a period when the Filecoin blockchain was less healthy (around the end of March 2023). Starting around epoch 2,750,000, tipsets show, on average, fewer blocks than expected --- often falling to below 4 blocks per tipset when averaged over 30 epochs. As expected, the computed finality is worse than in the typical "good case"; that is, the error probabilities are higher by several orders of magnitude at times.
- From the network perspective, the proposed algorithm provides more transparency on the security guarantees Filecoin provides without impacting them.
- From the client/user perspective, a transaction can now be considered final once a desired security guarantee is met. While, on the surface, it may seem riskier to finalise transactions 30x faster, this is merely the result of replacing a loose worst-case assumption by a more accurate determination based on actual network conditions.
- It is possible to misuse the algorithm and accept an error probability that is too high while attempting to over-optimise for latency. This is akin to picking an insufficient number of epochs in the current paradigm, with the key difference that the error probability can be easily interpreted, whereas the certainty provided by a fixed number of epochs is variable and opaque.
There are no substantive incentive considerations.
Applications that choose to use the mechanism proposed in the FRC no longer need to always wait for a large fixed number of epochs and can benefit from faster finalization times, further tuned to their requirements. This enables applications that were previously impractical on Filecoin (e.g. HFT), but can also greatly improve the user experience for L2 networks (e.g. IPC), exchanges (e.g. faster deposits), and many other applications. Nevertheless, the EC Finality Calculator falls short of the functionality set to be provided by F3, e.g. in terms of finality time and verifiable bridging.
We provide a Python prototype implementation that translates the theory as directly as possible. For the sake of readability, no effort was made to optimise performance, save for adding early termination conditions that can greatly speed up execution with negligible impact on the output. As an application-side optimization with no interoperability requirement, implementers in different clients and languages may freely diverge from the prototype. While applications can implement the mechanism without specific node support, adoption would be helped by Filecoin implementations providing an out-of-the-box RPC method to query the error probability for a past epoch.
N/A
Copyright and related rights waived via CC0.