It should be possible to (on average) receive frames with two system calls per frame (size and msg), no further copies, and no allocations.
In package network, file tcp.go, Receive and receiveRaw could be improved to (re)use a []byte which is stored in the TCPConn. This would reduce garbage created per frame.
When the length is known, the []byte should be resliced to that length if size <= cap, or reallocated otherwise.
The read loop should be reading directly into the per-connection []byte. It should try to minimise the number of calls to Read (it already does a good job of this).
The []byte returned by receiveRaw would be a slice on the per-connection buffer not including the size. Then Unmarshal should stop using bytes.Buffer, since it doesn't need it. Instead, it can copy 16 bytes into a UUID, and then pass a slice onwards to protobuf (which is already careful to work on the buffer it was given efficiently).
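The Unmarshal change could look roughly like the sketch below. The `UUID` type, `splitFrame` and `errShortFrame` are hypothetical names, and it assumes the 16-byte UUID sits at the start of the frame. The protobuf call itself is left out: the returned payload slice is what would be passed to it.

```go
package main

import "errors"

// UUID is a hypothetical fixed-size identifier type.
type UUID [16]byte

var errShortFrame = errors.New("frame shorter than UUID header")

// splitFrame copies the leading 16 bytes into a UUID and returns the rest
// of the frame as a sub-slice. The payload is not copied: it aliases frame.
func splitFrame(frame []byte) (id UUID, payload []byte, err error) {
	if len(frame) < len(id) {
		return id, nil, errShortFrame
	}
	copy(id[:], frame)
	return id, frame[len(id):], nil
}
```

The only copy is the 16-byte UUID. No bytes.Buffer is involved, so the payload is decoded from the same memory the socket read filled.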
The per-connection []byte can be checked just before Receive returns and, if it is too big, released so the garbage collector can reclaim it. (Letting rarely used connections hold a lot of RAM is not good for scalability.)
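One possible shape for that check is sketched here. The `conn` type, `releaseIfOversized` and the 64 KiB `maxRetained` threshold are all assumptions for illustration; the right threshold would depend on typical frame sizes.

```go
package main

// maxRetained is an assumed cap on how much buffer an idle connection keeps.
const maxRetained = 64 << 10

// conn is a hypothetical stand-in for TCPConn.
type conn struct {
	buf []byte
}

// releaseIfOversized drops the reference to the per-connection buffer when
// its capacity exceeds maxRetained, so the garbage collector can reclaim it
// once the caller is done with the current frame.
func (c *conn) releaseIfOversized() {
	if cap(c.buf) > maxRetained {
		c.buf = nil
	}
}
```

Dropping the reference is safe even though the caller may still hold a slice of the old buffer: the memory stays alive until that slice is no longer reachable, and the next large frame simply allocates again.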
Of course, like all performance changes, this needs to be measured before/after. It should be pretty easy to measure that before allocs > 1 and after allocs == 1, which is already a huge win because it reduces GC pressure.
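As a rough illustration of how the before/after allocation count could be measured, the sketch below uses `testing.AllocsPerRun` from the standard library. The functions `fresh`, `reused` and `measure` are hypothetical and only model the buffer handling, not the real Receive path. The package-level `sink` is there to force the buffers onto the heap so the compiler cannot optimise the allocation away.

```go
package main

import "testing"

// sink keeps buffers reachable so that make() must allocate on the heap.
var sink []byte

// fresh models the current behaviour: a new buffer for every frame.
func fresh(n int) {
	sink = make([]byte, n)
}

// conn is a hypothetical stand-in for TCPConn.
type conn struct {
	buf []byte
}

// reused models the proposed behaviour: reslice when the frame fits.
func (c *conn) reused(n int) {
	if n <= cap(c.buf) {
		c.buf = c.buf[:n]
	} else {
		c.buf = make([]byte, n)
	}
	sink = c.buf
}

// measure returns the average allocations per call for both strategies.
// AllocsPerRun performs one warm-up call before it starts counting, so the
// initial growth of the reused buffer is not included.
func measure() (freshAllocs, reusedAllocs float64) {
	c := &conn{}
	freshAllocs = testing.AllocsPerRun(100, func() { fresh(1024) })
	reusedAllocs = testing.AllocsPerRun(100, func() { c.reused(1024) })
	return freshAllocs, reusedAllocs
}
```

For the real code, a Benchmark function that calls `b.ReportAllocs()` and drives Receive over a loopback connection would give allocs/op figures for the whole path, including Unmarshal.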
At a bare minimum, it is unclear why receiveRaw is copying bytes into a bytes.Buffer. The code change in #532 (comment) removes this. It needs more testing, but it should be checked in as a first step.