The gossip connection locks the connection to preserve FIFO order
when waiting for room in the output buffer to be available.
A gossip connection has a dedicated goroutine for reading from its
input buffer and another for writing from its output buffer.
When a connection is ordered to send a message, it tries to enqueue
it into the output buffer, and if there is no room - it discards
the message to not block the application layer.
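
The enqueue-or-discard behavior described above can be sketched in Go
as follows. This is only an illustration: the conn type, the outBuff
field and the trySend method are hypothetical names, not the actual
gossip code.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for a gossip connection.
type conn struct {
	outBuff chan string // bounded output buffer
}

// trySend enqueues msg if the output buffer has room. Otherwise it
// drops the message and reports false, so the caller never blocks.
func (c *conn) trySend(msg string) bool {
	select {
	case c.outBuff <- msg:
		return true
	default:
		return false
	}
}

func main() {
	c := &conn{outBuff: make(chan string, 1)}
	fmt.Println(c.trySend("a")) // true: the buffer had room
	fmt.Println(c.trySend("b")) // false: the buffer is full, "b" is discarded
}
```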
However, private data messages shouldn't be discarded, and the goroutine
sending them should wait for room to become available in the buffer.
The problem is that this wait is done while holding a lock, and thus
the following might occur:
peer p sends to peer q a private data message and q sends one to p.
If p's and q's output buffers are full, the goroutine that send()s
holds the lock, and waits for the output buffer to have room.
However, to drain the receive buffer on the other side, both
peers also need to obtain a lock on the connection.
This results in a distributed deadlock.
The most sensible fix, in my opinion, is to simply not lock the
connection when waiting on the output buffer.
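
A minimal sketch of the idea, assuming a connection guarded by a
sync.RWMutex and a buffered channel as the output buffer. All names
here (connection, send, drain, errFull, errClosed) are hypothetical
and are not taken from the actual change. The lock is held only to
check the connection state and is released before the blocking send:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	errClosed = errors.New("connection closed")
	errFull   = errors.New("output buffer full")
)

// connection is a hypothetical, simplified gossip connection.
type connection struct {
	sync.RWMutex
	closed  bool
	outBuff chan string
}

// send enqueues msg. With shouldBlock set (e.g. private data) it waits
// for room in the buffer, but only after releasing the lock.
func (c *connection) send(msg string, shouldBlock bool) error {
	c.RLock()
	closed := c.closed
	c.RUnlock() // released before any potentially blocking operation
	if closed {
		return errClosed
	}
	if shouldBlock {
		c.outBuff <- msg // may block, but no lock is held here
		return nil
	}
	select {
	case c.outBuff <- msg:
		return nil
	default:
		return errFull
	}
}

// drain takes the lock and removes one message if one is buffered.
func (c *connection) drain() (string, bool) {
	c.Lock()
	defer c.Unlock()
	select {
	case m := <-c.outBuff:
		return m, true
	default:
		return "", false
	}
}

func main() {
	c := &connection{outBuff: make(chan string, 1)}
	_ = c.send("m1", false) // fills the buffer
	done := make(chan error, 1)
	go func() { done <- c.send("m2", true) }() // waits for room
	m, _ := c.drain() // can take the lock even if the sender is blocked
	fmt.Println(m, <-done)
}
```

In this sketch the lock no longer serializes senders that are blocked
on a full buffer, so any ordering the lock used to provide between
them is not enforced by it anymore.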
Change-Id: I63a64e9cf08364d2023d99f2bedb1e382765e6a8
Signed-off-by: yacovm <yacovm@il.ibm.com>