Header Synchronization Improvement #625
I suggest we do not immediately deactivate peers on communication failure, but instead allow them some slack within a certain time window. The motivation is to stay connected to very useful peers. I would also consider using the existing `issue_warning` mechanism.
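A minimal sketch of the "slack window" idea, assuming failures are timestamped in seconds and a peer is only deactivated once `MaxFailures` failures fall inside a `WindowS`-second window. The `peer_slack` module and all of its functions are hypothetical: they are not part of `ar_peers` and do not use the existing `issue_warning` mechanism.

```erlang
-module(peer_slack).
-export([new/0, record_failure/3, record_success/1, should_deactivate/4]).

%% Hypothetical per-peer state: a list of failure timestamps (seconds),
%% most recent first.
new() -> [].

%% Record a failure seen at Now and drop failures older than the window.
record_failure(Failures, Now, WindowS) ->
    [Now | [T || T <- Failures, Now - T < WindowS]].

%% Any successful exchange clears the failure history.
record_success(_Failures) -> [].

%% Deactivate only if at least MaxFailures failures happened within
%% the last WindowS seconds.
should_deactivate(Failures, Now, WindowS, MaxFailures) ->
    Recent = [T || T <- Failures, Now - T < WindowS],
    length(Recent) >= MaxFailures.
```

In a node, `Now` could come from `erlang:system_time(second)`, and the failure list would be kept in the peer's state instead of being threaded through calls by hand.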
apps/arweave/src/ar_http.erl
@@ -179,6 +181,7 @@ handle_info({gun_error, PID, Reason},
prometheus_gauge:dec(outbound_connections),
ok
end,
ar_peers:inactivate_peer(Peer),
We should be getting here on a regular connection shutdown (occurs after a minute) too.
If we get here, the connection is down and the node sees the peer as inactive. If the connection is cut, the server is no longer connected.
If I understand the code correctly, this is also reached when a peer is rejected. I will add that case as well.
After a conversation with @JamesPiechota yesterday, I think this PR may be confusing, so I will try to give a quick summary of the problem(s) we encounter with the current
Here are the strategies I have deployed/tested on my test server:
What about the success/failure ratio? My test server is currently running a patched version of Arweave that collects this information, using the first strategy and 400 jobs. With only the active peers, after more than 12 hours of data collection, we have 159741 failures and only 28920 successes. That is about 18 successes per 100 failures, so a job currently has roughly a 15% chance of finding a block in the network, and this value decreases over time.

What about the discussion we had with @JamesPiechota? The current implementation does not bet on performance/speed but rather on distribution, but the fact that we keep dead peers can lead to problems. The next update to this PR should include:
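Spelling out the arithmetic behind these counts (taken from the measurement above): the ~18% figure is the number of successes per failure, while the share of all attempts that succeeded is about 15%.

```math
\frac{28920}{28920 + 159741} = \frac{28920}{188661} \approx 0.153,
\qquad
\frac{28920}{159741} \approx 0.181
```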
I will probably rename things to avoid confusion.
In the end, this PR and the related issues led me to a way to improve our knowledge of the network. Currently, the data collected to rank each peer does not give enough information about its overall state. My next idea is to implement an `ar_tags` module:
% main way to tag with ar_tags module
ar_tags:set_tag(ar_peers, Peer, {connection, active}, true).
ar_tags:set_tag(ar_peers, Peer, {connection, active}, false).
% a frontend to ar_tags
ar_peers:set_tag(Peer, {connection, active}, true).
ar_peers:set_tag(Peer, {connection, active}, false).
The peers we have established connections with are peers we attempted to make requests to in the first place. We are not establishing connections with every peer we know (in which case filtering by active connections would be useful); we are simply reusing connections we have already established, for efficiency. A node being temporarily down or timing out excessively, on the other hand, might be a useful metric to track. Consequently, we should not purge peers we are not making requests to; we should purge peers we cannot connect to for a certain amount of time.
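One way to express "purge peers we cannot connect to for a certain amount of time" is to remember when the current failure streak started and reset it on any success. The sketch below assumes a map from peer to the timestamp (in seconds) of the first failure since the last success; `peer_purge` and its functions are hypothetical names, not existing Arweave code.

```erlang
-module(peer_purge).
-export([new/0, on_success/2, on_failure/3, should_purge/4]).

%% Hypothetical state: #{Peer => FirstFailureSinceLastSuccess}.
%% Peers absent from the map are considered reachable.
new() -> #{}.

%% A successful connection ends the failure streak.
on_success(State, Peer) ->
    maps:remove(Peer, State).

%% Only the first failure of a streak is remembered.
on_failure(State, Peer, Now) ->
    case maps:is_key(Peer, State) of
        true -> State;
        false -> maps:put(Peer, Now, State)
    end.

%% Purge once the peer has been unreachable for at least ThresholdS seconds.
should_purge(State, Peer, Now, ThresholdS) ->
    case maps:find(Peer, State) of
        {ok, Since} -> Now - Since >= ThresholdS;
        error -> false
    end.
```

Unlike counting failures, this only depends on how long the peer has been continuously unreachable, so a peer that is rarely contacted is not penalized for being idle.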
I've added a
Note: the following text is only meant to give some examples of what happens in other projects, based on academic publications. In 2019, a paper called Exploring the Monero Peer-to-Peer Network was published. It explores the Monero network to extract information about peers and their activity. The mechanism used by Monero is described in section 2.2:
On the Bitcoin side, a description of the protocol and how it manages peers can be found in Information Propagation in the Bitcoin Network, section 3.
For IPFS, the protocol is described in Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web, sections 2 and 3.
It seems peers are purged in all of these cases, so keeping peers ad vitam aeternam is perhaps not the right solution. The only way to know whether we can do that is to collect information about active/inactive peers in the network. At this time, I don't think we have this information, and I'm not sure we have tools for collecting it. Because Arweave's goal is different from Bitcoin's or Monero's, it could be good to keep those nodes longer than usual and collect information about them. In fact, I was thinking of completely refactoring
We do collect this information: active peers end up higher on the list, whereas inactive ones end up lower. We can introduce RocksDB any time, of course; I would not do it earlier than we really need it, and so far the data volume is small.
Added a way to know if a peer is up or down in `ar_peers` by calling the `ar_peers:connected_peer/1` and `ar_peers:disconnected_peer/1` functions. To know if a peer is active, one can call the `ar_peers:is_connected_peer/1` function. All connected peers are now tagged with a timestamp when a connection is successful. This timestamp can be retrieved using the `ar_peers:get_connection_timestamp/1` function.

The lifetime peers list did not change, but the current peers list now only displays the peers active during the last 30 days by default. "Tags" are used as simple keys in the `ar_peers` ETS table, prefixed with the `ar_tags` atom. They are saved in the `peers` file maintained with the `ar_storage` functions.

Modified the `ar_http_iface_client:get_block_shadow/2` function to add more parameters, in particular the value used to pick a peer from the peer list. Modified the `ar_header_sync` module to only select active peers when synchronizing headers. Fixed a typo in `bin/console`.
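To illustrate the tag layout described above, here is a sketch of how connection tags might be stored in an ETS table under keys prefixed with the `ar_tags` atom. The key shape `{ar_tags, Peer, Tag}`, the `connection_timestamp` tag name and the `peer_tags_sketch` module are assumptions, not the real `ar_peers` schema; unlike the real `ar_peers` functions (arity 1), these take the table explicitly so the example is self-contained.

```erlang
-module(peer_tags_sketch).
-export([new/0, connected_peer/2, disconnected_peer/2,
         is_connected_peer/2, get_connection_timestamp/2]).

%% Hypothetical table holding tags under {ar_tags, Peer, Tag} keys.
new() ->
    ets:new(peer_tags_sketch, [set, public]).

%% Mark the peer as active and record when the connection succeeded.
connected_peer(Tab, Peer) ->
    Now = erlang:system_time(second),
    true = ets:insert(Tab, [{{ar_tags, Peer, {connection, active}}, true},
                            {{ar_tags, Peer, connection_timestamp}, Now}]),
    ok.

%% Mark the peer as inactive; the last connection timestamp is kept.
disconnected_peer(Tab, Peer) ->
    true = ets:insert(Tab, {{ar_tags, Peer, {connection, active}}, false}),
    ok.

%% Peers that were never tagged are treated as not connected.
is_connected_peer(Tab, Peer) ->
    case ets:lookup(Tab, {ar_tags, Peer, {connection, active}}) of
        [{_, Active}] -> Active;
        [] -> false
    end.

get_connection_timestamp(Tab, Peer) ->
    case ets:lookup(Tab, {ar_tags, Peer, connection_timestamp}) of
        [{_, Ts}] -> {ok, Ts};
        [] -> not_found
    end.
```

Keeping tags under a common `ar_tags` prefix makes them easy to tell apart from the other entries stored in the same table.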
LGTM!
Added a way to know if a peer is up or down in `ar_peers` by calling the `ar_peers:connected_peer/1` and `ar_peers:disconnected_peer/1` functions. To know if a peer is active, one can call the `ar_peers:is_connected_peer/1` function. The full list of active peers (present in the ranking) can be returned by calling the `ar_peers:get_peers/1` function.

All connected peers are now tagged with a timestamp when a connection is successful. This timestamp can be retrieved using the `ar_peers:get_connection_timestamp/1` function. To filter peers based on their timestamps, `ar_peers:get_peers/1` can be used like this:

Arweave can be started with a new option controlling the nodes returned by `/peers`. At the time of writing, only the `current` value can be filtered, and only a positive number of days can be configured. The default peers filtering is set to 30 days: `{current, 30}`.

"Tags" are used as simple keys in the `ar_peers` ETS table, prefixed with the `ar_tags` atom. They are saved in the `peers` file maintained with the `ar_storage` functions.

Modified the `ar_http_iface_client:get_block_shadow/2` function to add more parameters, in particular to modify the default timeout and the value used to pick a peer from the peer list.

Modified the `ar_header_sync` module to only select active peers when synchronizing headers.

Fixed a typo in `bin/console`.
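A sketch of the filtering behind the `{current, 30}` default, assuming each peer comes with the Unix timestamp (in seconds) of its last successful connection. `peer_filter:recent_peers/3` is a hypothetical helper, not the `ar_peers:get_peers/1` API.

```erlang
-module(peer_filter).
-export([recent_peers/3]).

-define(SECONDS_PER_DAY, 86400).

%% Keep only the peers whose last successful connection happened within
%% the last Days days. Input is a list of {Peer, TimestampSeconds};
%% Days must be a positive integer, mirroring the option's constraint.
recent_peers(PeersWithTs, Days, Now) when is_integer(Days), Days > 0 ->
    Cutoff = Now - Days * ?SECONDS_PER_DAY,
    [Peer || {Peer, Ts} <- PeersWithTs, Ts >= Cutoff].
```

With `Days = 30`, a peer last seen exactly 30 days ago is still returned, while one last seen 31 days ago is filtered out.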