Substrate transaction pool implementation.

License: GPL-3.0-or-later WITH Classpath-exception-2.0

# Problem Statement

The transaction pool is responsible for maintaining a set of transactions that
are possible for block authors to include in upcoming blocks. Transactions are
received either from the network (gossiped by other peers) or via RPC (submitted
locally).

The main task of the pool is to prepare an ordered list of transactions for the
block authorship module. The same list is useful for gossiping to other peers,
but note that it's not a hard requirement for the gossiped transactions to be
exactly the same (see implementation notes below).

It is in the block author's interest to have the transactions stored and ordered
in such a way as to:

1. Maximize the block author's profits (value of the produced block)
2. Minimize the block author's amount of work (time to produce a block)

In the case of FRAME the first property simply means making sure that the fee
per weight unit is the highest (high `tip` values), the second is about avoiding
feeding transactions that cannot be part of the next block (they are invalid,
obsolete, etc).

From the transaction pool's point of view, transactions are simply opaque blobs
of bytes; it's required to query the runtime (via the `TaggedTransactionQueue`
Runtime API) to verify a transaction's basic correctness and extract any
information about how the transaction relates to other transactions in the pool
and the current on-chain state. Only valid transactions should be stored in the
pool.

Each imported block can affect the validity of transactions already in the pool.
Block authors expect the pool to provide the most up-to-date information about
transactions that can be included in the block they are going to build on top of
the just-imported one. The process of ensuring this property is called *pruning*.
During pruning the pool should remove transactions which are considered invalid
by the runtime (queried at the current best imported block).

Since the blockchain is not always linear, forks need to be correctly handled by
the transaction pool as well. In case of a fork, some blocks are *retracted*
from the canonical chain, and some other blocks get *enacted* on top of some
common ancestor. The transactions from retracted blocks could simply be
discarded, but it's desirable to make sure they are still considered for
inclusion in case they are deemed valid by the runtime state at the best,
recently enacted block (the fork the chain re-organized to).

The transaction pool should also offer a way of tracking a transaction's
lifecycle in the pool: its broadcast status, block inclusion, finality, etc.

## Transaction Validity details

Information retrieved from the runtime is encapsulated in the
`TransactionValidity` type.

```rust
pub type TransactionValidity = Result<ValidTransaction, TransactionValidityError>;

pub struct ValidTransaction {
    pub requires: Vec<TransactionTag>,
    pub provides: Vec<TransactionTag>,
    pub priority: TransactionPriority,
    pub longevity: TransactionLongevity,
    pub propagate: bool,
}

pub enum TransactionValidityError {
    Invalid(/* details */),
    Unknown(/* details */),
}
```

We will now go through each of these fields to understand the requirements they
create for transaction ordering.

The runtime is expected to return these values in a deterministic fashion.
Calling the API multiple times given exactly the same state must return the same
results. Field-specific rules are described below.

### `requires` / `provides`

These two fields contain sets of `TransactionTag`s (opaque blobs) associated with
a given transaction. Looking at these fields we can find dependencies between
transactions and their readiness for block inclusion.

The `provides` set contains properties that will be *satisfied* in case the
transaction is successfully added to a block. `requires` contains properties
that must be satisfied **before** the transaction can be included in a block.

Note that a transaction with an empty `requires` set can be added to a block
immediately; there are no other transactions that it expects to be included
before it.

For a given series of transactions the `provides` and `requires` fields will
create a (simple) directed acyclic graph. The *sources* in such a graph, if they
don't have any extra `requires` tags (i.e. all their dependencies are
*satisfied*), should be considered for block inclusion first. Multiple
transactions that are ready for block inclusion should be ordered by `priority`
(see below).

Note that the process of including transactions in a block is basically building
the graph, then selecting "the best" source vertex (transaction) with all tags
satisfied and removing it from that graph.

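The selection loop above can be sketched as follows. This is an illustrative,
simplified model (the `Tx` struct, string tags and the `select_for_block` helper
are assumptions made for the example; the real pool uses opaque `Vec<u8>` tag
blobs):

```rust
use std::collections::HashSet;

// Hypothetical simplified transaction; real tags are opaque byte blobs.
struct Tx {
    id: &'static str,
    requires: Vec<&'static str>,
    provides: Vec<&'static str>,
    priority: u64,
}

// Repeatedly pick the highest-priority transaction whose `requires` tags are
// all satisfied, then mark its `provides` tags as satisfied and continue.
fn select_for_block(mut pool: Vec<Tx>, on_chain: &[&'static str]) -> Vec<&'static str> {
    let mut satisfied: HashSet<&str> = on_chain.iter().copied().collect();
    let mut included = Vec::new();
    loop {
        // Among the "source" transactions (all dependencies satisfied),
        // choose the one with the highest priority.
        let best = pool
            .iter()
            .enumerate()
            .filter(|(_, tx)| tx.requires.iter().all(|t| satisfied.contains(*t)))
            .max_by_key(|(_, tx)| tx.priority)
            .map(|(i, _)| i);
        match best {
            Some(i) => {
                let tx = pool.remove(i);
                satisfied.extend(tx.provides.iter().copied());
                included.push(tx.id);
            }
            None => break, // remaining transactions are "future" (unreachable)
        }
    }
    included
}
```

Note how a low-priority transaction can still be included before a high-priority
one when the latter depends on it: readiness is checked first, priority second.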
#### Examples

- A transaction in a Bitcoin-like chain will `provide` the generated UTXOs and
  will `require` the UTXOs it is still awaiting (note that this is not
  necessarily all of its inputs, since some of them might already be spendable,
  i.e. the UTXO is in state).

- A transaction in an account-based chain will `provide` a
  `(sender, transaction_index/nonce)` (as one tag), and will `require`
  `(sender, nonce - 1)` in case `on_chain_nonce < nonce - 1`.

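The account-based case can be sketched as follows (the `nonce_tags` helper and
the byte encoding of the tag are illustrative assumptions, not the actual FRAME
implementation):

```rust
// Illustrative tag derivation for an account-based chain. A tag is just an
// opaque byte blob; here we encode it as the sender bytes followed by the nonce.
fn nonce_tags(
    sender: [u8; 32],
    nonce: u64,
    on_chain_nonce: u64,
) -> (Vec<Vec<u8>>, Vec<Vec<u8>>) {
    let tag = |n: u64| {
        let mut t = sender.to_vec();
        t.extend_from_slice(&n.to_le_bytes());
        t
    };
    // Require the predecessor tag only if there is a nonce gap that some other
    // pool transaction has to fill first.
    let requires = if on_chain_nonce < nonce.saturating_sub(1) {
        vec![tag(nonce - 1)]
    } else {
        vec![]
    };
    // `provides` must never be empty.
    let provides = vec![tag(nonce)];
    (requires, provides)
}
```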
#### Rules & caveats

- `provides` must not be empty
- transactions with an overlap in `provides` tags are mutually exclusive
- checking the validity of a transaction that `requires` tag `A` after including
  a transaction that provides that tag must not return `A` in `requires` again
- runtime developers should avoid re-using a `provides` tag (i.e. it should be
  unique)
- there should be no cycles in transaction dependencies
- caveat: on-chain state conditions may render a transaction invalid despite no
  `requires` tags
- caveat: on-chain state conditions may render a transaction valid despite some
  `requires` tags
- caveat: including transactions in a chain might make them valid again right
  away (for instance a UTXO transaction gets in, but since we don't store spent
  outputs it will be valid again, awaiting the same inputs/tags to be satisfied)

### `priority`

Transaction priority describes the importance of the transaction relative to
other transactions in the pool. Block authors can expect to benefit from
including such transactions before others.

Note that we can't simply order transactions in the pool by `priority`, because
first we need to make sure that all of the transaction's requirements are
satisfied (see the `requires`/`provides` section). However, if we consider a set
of transactions which all have their requirements (tags) satisfied, the block
author should choose the ones with the highest priority to include in the next
block first.

`priority` can be any number between `0` (lowest inclusion priority) and
`u64::MAX` (highest inclusion priority).

#### Rules & caveats

- `priority` of a transaction may change over time
- on-chain conditions may affect `priority`
- Given two transactions with overlapping `provides` tags, the one with the
  higher `priority` should be preferred. However, we can also look at the total
  priority of a subtree rooted at that transaction and compare that instead
  (i.e. even though the transaction itself has lower `priority`, it "unlocks"
  other high-priority transactions).

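The subtree comparison from the last caveat can be sketched as follows (the
`subtree_priority` helper and the adjacency-list representation are illustrative
assumptions):

```rust
// Cumulative priority of the subtree rooted at `root`: the transaction's own
// priority plus (recursively) everything it unlocks. `deps[i]` lists the
// indices of transactions that directly depend on transaction `i`.
fn subtree_priority(root: usize, priority: &[u64], deps: &[Vec<usize>]) -> u64 {
    let mut total = 0u64;
    let mut stack = vec![root];
    let mut seen = vec![false; priority.len()];
    while let Some(i) = stack.pop() {
        if seen[i] {
            continue; // a transaction may be reachable via multiple paths
        }
        seen[i] = true;
        total = total.saturating_add(priority[i]);
        stack.extend(deps[i].iter().copied());
    }
    total
}
```

With this measure, a low-priority transaction whose subtree contains valuable
dependents can win the comparison against a standalone high-priority one.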
### `longevity`

Longevity describes how long (in blocks) the transaction is expected to be
valid. This parameter only gives a hint to the transaction pool as to how long
the current transaction may still be valid. Note that it does not guarantee
the transaction is valid all that time though.

#### Rules & caveats

- `longevity` of a transaction may change over time
- on-chain conditions may affect `longevity`
- after `longevity` lapses the transaction may still be valid

### `propagate`

This parameter instructs the pool whether to propagate/gossip a transaction to
node peers. By default this should be `true`, however in some cases it might be
undesirable to propagate transactions further. Examples might include heavy
transactions produced by block authors in offchain workers (DoS), or the risk of
being front-run by someone else after finding some non-trivial solution or
equivocation, etc.

### `TransactionSource`

To make it possible for the runtime to distinguish whether the transaction being
validated was received over the network, submitted using local RPC, or is simply
part of a block that is being imported, the transaction pool should pass an
additional `TransactionSource` parameter to the validity function runtime call.

This can be used by runtime developers to quickly reject transactions that, for
instance, are not expected to be gossiped in the network.

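A sketch of what this looks like in practice (the enum mirrors
`sp_runtime::transaction_validity::TransactionSource` at the time of writing,
but check the current sources for the exact definition; `accept_unsigned` is a
made-up example policy):

```rust
// Sketch based on `sp_runtime::transaction_validity::TransactionSource`.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum TransactionSource {
    /// Transaction is already included in a block being imported.
    InBlock,
    /// Transaction was submitted locally (e.g. over RPC).
    Local,
    /// Transaction was received externally (e.g. gossiped over the network).
    External,
}

// Example runtime-side policy: reject a class of transactions when they arrive
// over the network, while still allowing them locally and in imported blocks.
fn accept_unsigned(source: TransactionSource) -> bool {
    matches!(source, TransactionSource::Local | TransactionSource::InBlock)
}
```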
### `Invalid` transaction

In case the runtime returns an `Invalid` error it means the transaction cannot
be added to a block at all. Extracting the actual reason for invalidity gives
more details about the source. For instance a `Stale` transaction just indicates
the transaction was already included in a block, while `BadProof` signifies an
invalid signature.
Invalidity might also be temporary. In the case of `ExhaustsResources` the
transaction does not fit in the current block, but it might be okay for the next
one.

### `Unknown` transaction

In the case of `Unknown` validity, the runtime cannot determine if the
transaction is valid or not in the current block. However, this situation might
be temporary, so the transaction is expected to be retried in the future.

# Implementation

An ideal transaction pool should only store transactions that are considered
valid by the runtime at the current best imported block.
After every block is imported, the pool should:

1. Revalidate all transactions in the pool and remove the invalid ones.
1. Construct the transaction inclusion graph based on `provides`/`requires`
   tags. Some transactions might not be reachable (have unsatisfied
   dependencies); they should simply be left in the pool.
1. On a block author's request, the graph should be copied and transactions
   should be removed one-by-one from the graph, starting from the one with the
   highest priority and all conditions satisfied.

With the current gossip protocol, networking should propagate transactions in
the same order as the block author would include them. Most likely it's fine if
we propagate transactions with cumulative weight not exceeding the upcoming `N`
blocks (choosing `N` is subject to networking conditions and block times).

Note that it's not a strict requirement though to propagate exactly the same
transactions that are prepared for block inclusion. Propagation is best
effort, especially for block authors, and is not directly incentivised.
However, the networking protocol might penalise peers that send invalid or
useless transactions, so we should be nice to others. Also see below a proposal
to have other peers request the transactions they are interested in, instead of
gossiping everything.

Since the pool is expected to store more transactions than can fit in a single
block, validating the entire pool on every block might not be feasible, so the
actual implementation might need to take some shortcuts.

## Suggestions & caveats

1. The validity of a transaction should not change significantly from block to
   block. I.e. changes in validity should happen predictably, e.g. `longevity`
   decrements by 1, `priority` stays the same, `requires` changes if a
   transaction that provided a tag was included in a block, `provides` does not
   change, etc.

1. That means we don't have to revalidate every transaction after every block
   import, but we need to take care of removing potentially stale transactions.

1. Transactions with exactly the same bytes are most likely going to give the
   same validity results. We can essentially treat them as identical.

1. Watch out for re-organisations and re-importing transactions from retracted
   blocks.

1. In the past many issues were found when running small networks with a lot of
   re-orgs. Make sure that transactions are never lost.

1. The UTXO model is quite challenging. A transaction becomes valid right after
   it's included in a block, however it is waiting for exactly the same inputs
   to be spent, so it will never really be included again.

1. Note that in a non-ideal implementation the state of the pool will most
   likely always be a bit off, i.e. some transactions might still be in the
   pool, but they are invalid. The hard decision is about the trade-offs you
   take.

1. Note that import notifications are not reliable - you might not receive a
   notification about every imported block.

## Potential implementation ideas

1. Block authors remove transactions from the pool when they author a block. We
   still store them around to re-import in case the block does not end up
   canonical. This only works if the node is actively authoring blocks (also
   see below).

1. We don't prune, but rather remove a fixed amount of transactions from the
   front of the pool (the number based on average/max transactions per block
   from the past) and re-validate them, re-importing the ones that are still
   valid.

1. We periodically validate all transactions in the pool in batches.

1. To minimize runtime calls, we introduce a batch-verify call. Note it should
   reset the state (overlay) after every verification.

1. Consider leveraging finality. Maybe we could verify against the latest
   finalised block instead. With this the pool in different nodes can be more
   similar, which might help with gossiping (see set reconciliation). Note
   though that finality is not a strict requirement for a Substrate chain to
   have.

1. Perhaps we could avoid maintaining ready/future queues as currently, but
   rather, if a transaction doesn't have all requirements satisfied by existing
   transactions, we attempt to re-import it in the future.

1. Instead of maintaining a full pool with total ordering, we attempt to
   maintain a set of the next (couple of) blocks. We could introduce a
   batch-validate runtime api method that pretty much attempts to simulate
   actual block inclusion of a set of such transactions (without necessarily
   fully running/dispatching them). Importing a transaction would consist of
   figuring out which next block this transaction has a chance to be included
   in, and then attempting to either push it back or replace some existing
   transactions.

1. Perhaps we could use some immutable graph structure to easily add/remove
   transactions. We need some traversal method that takes priority and
   reachability into account.

1. It was discussed in the past to use set reconciliation strategies instead of
   simply broadcasting all/some transactions to all/selected peers. Ethereum's
   [EIP-2464](https://github.com/ethereum/EIPs/blob/5b9685bb9c7ba0f5f921e4d3f23504f7ef08d5b1/EIPS/eip-2464.md)
   might be a good first approach to reduce transaction gossip.

# Current implementation

The current implementation of the pool is a result of experience with Ethereum's
pool implementation, but it also has some warts coming from the learning process
of Substrate's generic nature and light client support.

The pool consists of basically two independent parts:

1. The transaction pool itself.
2. The maintenance background task.

The pool is split into a `ready` pool and a `future` pool. The latter contains
transactions that don't have their requirements satisfied, and the former holds
transactions that can be used to build a graph of dependencies. Note that the
graph is built ad hoc during the traversal process (getting the `ready`
iterator). This makes the import process cheaper (we don't need to find the
exact position in the queue or graph), but the traversal process slower
(logarithmic). However, most of the time we will only need the beginning of the
total ordering of transactions for block inclusion or network propagation, hence
the decision.

The maintenance task is responsible for:

1. Periodically revalidating the pool's transactions (revalidation queue).
1. Handling block import notifications and doing pruning + re-importing of
   transactions from retracted blocks.
1. Handling finality notifications and relaying them to transaction-specific
   listeners.

Additionally we maintain a list of recently included/rejected transactions
(`PoolRotator`) to quickly reject transactions that are unlikely to be valid,
to limit the number of runtime verification calls.

Each time a transaction is imported, we first verify its validity and then
check whether the tags it `requires` can be satisfied by transactions already in
the `ready` pool. In case the transaction is imported into the `ready` pool we
additionally *promote* transactions from the `future` pool if the transaction
happened to fulfill their requirements.
Note we need to cater for cases where a transaction might replace an already
existing transaction in the pool. In such a case we check the entire sub-tree of
transactions that we are about to replace and compare their cumulative priority
to determine which subtree to keep.

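The promotion step can be sketched as follows (the `promote_ready` helper and
the simplified transaction shape are illustrative assumptions, not the actual
pool data structures):

```rust
use std::collections::HashSet;

// Hypothetical simplified view of a future-pool entry.
struct FutureTx {
    id: &'static str,
    requires: Vec<&'static str>,
    provides: Vec<&'static str>,
}

// Move transactions out of `future` as long as newly satisfied tags keep
// unlocking more of them (one promotion may trigger another).
fn promote_ready(
    satisfied: &mut HashSet<&'static str>,
    future: &mut Vec<FutureTx>,
) -> Vec<&'static str> {
    let mut promoted = Vec::new();
    loop {
        let idx = future
            .iter()
            .position(|tx| tx.requires.iter().all(|t| satisfied.contains(t)));
        match idx {
            Some(i) => {
                let tx = future.remove(i);
                satisfied.extend(tx.provides.iter().copied());
                promoted.push(tx.id);
            }
            None => break,
        }
    }
    promoted
}
```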
After a block is imported we kick off the pruning procedure. We first attempt to
figure out what tags were satisfied by the transactions in that block. For each
block transaction we either call into the runtime to get its `ValidTransaction`
object, or we check the pool to see if that transaction is already known, to
spare the runtime call. From this we gather the full set of `provides` tags and
perform pruning of the `ready` pool based on that. We also promote all
transactions from `future` that have their tags satisfied.

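Pruning by the gathered tags can be sketched as follows (`prune_by_tags` and the
simplified pool entry are illustrative; the key observation is that overlapping
`provides` tags are mutually exclusive, so anything sharing a tag with a block
transaction must go):

```rust
use std::collections::HashSet;

// Hypothetical simplified pool entry: just an id and its `provides` tags.
struct PoolTx {
    id: &'static str,
    provides: Vec<&'static str>,
}

// Drop every pool transaction whose `provides` overlaps with the tags already
// provided by transactions in the freshly imported block.
fn prune_by_tags(
    pool: &mut Vec<PoolTx>,
    block_tags: &HashSet<&'static str>,
) -> Vec<&'static str> {
    let mut pruned = Vec::new();
    pool.retain(|tx| {
        let overlaps = tx.provides.iter().any(|t| block_tags.contains(t));
        if overlaps {
            pruned.push(tx.id); // pruned transactions may go to revalidation
        }
        !overlaps
    });
    pruned
}
```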
In case we remove transactions that we are unsure were already included in the
current block or some block in the past, they are added to the revalidation
queue and the background task attempts to re-import them in the future.

Runtime calls to verify transactions are performed from a separate (limited)
thread pool to avoid interfering too much with other subsystems of the node. We
definitely don't want to have all cores validating network transactions, because
all of these transactions need to be considered untrusted (potentially DoS).