Bitswap should send multiple blocks per message #4378
Comments
while we're on the topic of making bitswap faster: …
Just going to dump perf thoughts/observations here: …
Good approach might be to deprecate the top level …
I would like to work on improving the performance of bitswap. I did a simple directory transfer between 2 nodes in a private network and found rsync (7 seconds) did much better (more than a factor of 2x) than ipfs (16 seconds) #4838. @whyrusleeping do you think the performance improvement suggestions still hold? Any low-hanging fruit that I could start working on right away?
@kvm2116 most of these still hold, yes. I would first try making sure you use the new badgerds datastore on your ipfs nodes when benchmarking performance; it will likely change your results a bit. You can set this up by …

If you do start hacking on some of these perf issues, I would ask that you please try to make your changes as small, clean, and incremental as possible. It would also help if you write up your approach to making a given change in an issue for feedback before submitting a PR. We get a lot of PRs that are really big, hard to review, and contain changes that weren't discussed by the team beforehand, and that makes the whole process go really slowly. I need to write this into the contributing guidelines, really.

As for low-hanging fruit...
@whyrusleeping Question:

type peerRequestTask struct {
Entries []*wantlist.Entry
Priority int
Target peer.ID
// A callback to signal that this task has been completed
Done func()
// created marks the time that the task was added to the queue
created time.Time
index int // book-keeping field used by the pq container
}
// Envelope contains a message for a Peer
type Envelope struct {
// Peer is the intended recipient
Peer peer.ID
// Message is the payload
Message bsmsg.BitSwapMessage
// A callback to notify the decision queue that the task is complete
Sent func()
}
func (e *Engine) nextEnvelope(ctx context.Context) (*Envelope, error) {
//etc.....
for {
nextTask := e.peerRequestQueue.Pop()
// with a task in hand, we're ready to prepare the envelope...
msg := bsmsg.New(true)
for _, entry := range nextTask.Entries {
block, err := e.bs.Get(entry.Cid)
if err != nil {
log.Errorf("tried to execute a task and errored fetching block: %s", err)
continue
}
msg.AddBlock(block)
}
return &Envelope{
Peer: nextTask.Target,
Message: msg,
Sent: func() {
nextTask.Done()
select {
case e.workSignal <- struct{}{}:
// work completing may mean that our queue will provide new
// work to be done.
default:
}
},
}, nil
}
}
// Push currently adds a new peerRequestTask to the end of the list
func (tl *prq) Push(entries []*wantlist.Entry, to peer.ID) {
//etc.....
task := &peerRequestTask{
Entries: newEntries,
Target: to,
created: time.Now(),
Done: func() {
tl.lock.Lock()
for _, entry := range newEntries {
partner.TaskDone(entry.Cid)
}
tl.pQueue.Update(partner.Index())
tl.lock.Unlock()
},
}
partner.taskQueue.Push(task)
tl.taskMap[task.Key()] = task
partner.requests++
tl.pQueue.Update(partner.Index())
}
func (bs *Bitswap) taskWorker(ctx context.Context, id int) {
idmap := logging.LoggableMap{"ID": id}
defer log.Debug("bitswap task worker shutting down...")
for {
log.Event(ctx, "Bitswap.TaskWorker.Loop", idmap)
select {
case nextEnvelope := <-bs.engine.Outbox():
select {
case envelope, ok := <-nextEnvelope:
if !ok {
continue
}
// update the BS ledger to reflect sent message
// TODO: Should only track *useful* messages in ledger
outgoing := bsmsg.New(false)
for _, block := range envelope.Message.Blocks() {
log.Event(ctx, "Bitswap.TaskWorker.Work", logging.LoggableF(func() map[string]interface{} {
return logging.LoggableMap{
"ID": id,
"Target": envelope.Peer.Pretty(),
"Block": block.Cid().String(),
}
}))
outgoing.AddBlock(block)
}
bs.engine.MessageSent(envelope.Peer, outgoing)
bs.wm.SendBlocks(ctx, envelope)
bs.counterLk.Lock()
for _, block := range envelope.Message.Blocks() {
bs.counters.blocksSent++
bs.counters.dataSent += uint64(len(block.RawData()))
}
bs.counterLk.Unlock()
case <-ctx.Done():
return
}
case <-ctx.Done():
return
}
}
}
@taylormike That does seem like a pretty reasonable approach; I think that's roughly what I was envisioning. I'm curious how you're going to select multiple blocks for the envelope (not in a doubting way): we need some sort of metric for how many blocks to put in a given message. That could just be 'up to 500kb per message' or something.
This is actually what we are doing in js-ipfs-bitswap. We aggregate blocks until we hit 512 * 1024 bytes, and then send the message out.
@whyrusleeping Solution: (see SendBlocks below)

const maxMessageSize = 512 * 1024
// etc...
func (pm *WantManager) SendBlocks(ctx context.Context, env *engine.Envelope) {
// Blocks need to be sent synchronously to maintain proper backpressure
// throughout the network stack
defer env.Sent()
blockSize := 0
msgSize := 0
msg := bsmsg.New(false)
for _, block := range env.Message.Blocks() {
blockSize = len(block.RawData())
// Split into messages of max 512 * 1024 bytes.
if msgSize + blockSize > maxMessageSize {
err := pm.network.SendMessage(ctx, env.Peer, msg)
if err != nil {
log.Infof("sendblock error: %s", err)
}
// Reset message
msgSize = 0
msg = bsmsg.New(false)
}
msgSize += blockSize
pm.sentHistogram.Observe(float64(blockSize))
msg.AddBlock(block)
log.Infof("Sending block %s to %s", block, env.Peer)
}
// Send any remaining blocks; skip the network call if the envelope was empty
if len(msg.Blocks()) > 0 {
	err := pm.network.SendMessage(ctx, env.Peer, msg)
	if err != nil {
		log.Infof("sendblock error: %s", err)
	}
}
}
func (e *Engine) nextEnvelope(ctx context.Context) (*Envelope, error) {
//etc.....
for {
nextTask := e.peerRequestQueue.Pop()
// with a task in hand, we're ready to prepare the envelope...
msg := bsmsg.New(true)
for _, entry := range nextTask.Entries {
block, err := e.bs.Get(entry.Cid)
if err != nil {
log.Errorf("tried to execute a task and errored fetching block: %s", err)
continue
}
msg.AddBlock(block)
}
return &Envelope{
Peer: nextTask.Target,
Message: msg,
Sent: func() {
nextTask.Done()
select {
case e.workSignal <- struct{}{}:
// work completing may mean that our queue will provide new
// work to be done.
default:
}
},
}, nil
}
}
@taylormike sorry for the delay, took some time off. I think this is close, but not quite the right approach. This approach means that a peer requesting a large amount of data would end up taking up an entire worker to send them all the blocks they requested. Each task unit should be roughly the same size in order for the task scheduler to work correctly. Your nextEnvelope function looks right; we should either modify the places where we call …
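A minimal sketch of the splitting idea raised here (illustrative only, not go-ipfs code: splitIntoTasks and maxEntriesPerTask are made-up names, and the import path is assumed from the go-ipfs tree of that era). Block sizes are not available at this point, so this version just caps the number of entries per task; the Blockstore change discussed below would allow capping by bytes instead:

package bitswap

import wl "github.com/ipfs/go-ipfs/exchange/bitswap/wantlist" // assumed import path

// maxEntriesPerTask is a hypothetical knob for how much work one task may carry.
const maxEntriesPerTask = 16

// splitIntoTasks chunks a peer's wanted entries so that each peerRequestTask
// stays roughly the same size and no single peer monopolizes a task worker.
func splitIntoTasks(entries []*wl.Entry) [][]*wl.Entry {
	var chunks [][]*wl.Entry
	for len(entries) > maxEntriesPerTask {
		chunks = append(chunks, entries[:maxEntriesPerTask])
		entries = entries[maxEntriesPerTask:]
	}
	if len(entries) > 0 {
		chunks = append(chunks, entries)
	}
	return chunks
}

MessageReceived could then call e.peerRequestQueue.Push once per chunk instead of once per message, which keeps the scheduler free to interleave tasks from different peers.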
@whyrusleeping Solution:

e.peerRequestQueue.Push(activeEntries, p) //TODO: Coalesce/ Split 'activeEntries'

// MessageReceived performs book-keeping. Returns error if passed invalid
// arguments.
func (e *Engine) MessageReceived(p peer.ID, m bsmsg.BitSwapMessage) error {
if len(m.Wantlist()) == 0 && len(m.Blocks()) == 0 {
log.Debugf("received empty message from %s", p)
}
newWorkExists := false
defer func() {
if newWorkExists {
e.signalNewWork()
}
}()
l := e.findOrCreate(p)
l.lk.Lock()
defer l.lk.Unlock()
if m.Full() {
l.wantList = wl.New()
}
var activeEntries []*wl.Entry
for _, entry := range m.Wantlist() { //TODO: Coalesce/ Split
if entry.Cancel {
log.Debugf("%s cancel %s", p, entry.Cid)
l.CancelWant(entry.Cid)
e.peerRequestQueue.Remove(entry.Cid, p)
} else {
log.Debugf("wants %s - %d", entry.Cid, entry.Priority)
l.Wants(entry.Cid, entry.Priority)
if exists, err := e.bs.Has(entry.Cid); err == nil && exists {
activeEntries = append(activeEntries, entry.Entry)
newWorkExists = true
}
}
}
if len(activeEntries) > 0 {
e.peerRequestQueue.Push(activeEntries, p) //TODO: Coalesce/ Split 'activeEntries'
}
for _, block := range m.Blocks() {
log.Debugf("got block %s %d bytes", block, len(block.RawData()))
l.ReceivedBytes(len(block.RawData()))
}
return nil
}
@whyrusleeping Instead... Solution/Question:

type Blockstore interface {
DeleteBlock(*cid.Cid) error
Has(*cid.Cid) (bool, error)
Get(*cid.Cid) (blocks.Block, error)
// Stat returns the block size for the given CID <-- proposed interface change
Stat(*cid.Cid) (uint, error)
// Put puts a given block to the underlying datastore
Put(blocks.Block) error
// PutMany puts a slice of blocks at the same time using batching
// capabilities of the underlying datastore whenever possible.
PutMany([]blocks.Block) error
// AllKeysChan returns a channel from which
// the CIDs in the Blockstore can be read. It should respect
// the given context, closing the channel if it becomes Done.
AllKeysChan(ctx context.Context) (<-chan *cid.Cid, error)
// HashOnRead specifies if every read block should be
// rehashed to make sure it matches its CID.
HashOnRead(enabled bool)
}
@taylormike Yeah, that seems like a reasonable addition to the blockstore interface. I think we've wanted this before for one reason or another. cc @kevina @magik6k
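A sketch of how the proposed size lookup could drive the 'Coalesce/ Split' TODO above (assumptions: coalesceBySize, sizedBlockstore, and maxTaskSize are illustrative names, the Stat signature is the one proposed in the previous comment, and import paths reflect the go-ipfs tree of that era):

package bitswap

import (
	cid "github.com/ipfs/go-cid"
	wl "github.com/ipfs/go-ipfs/exchange/bitswap/wantlist"
)

// Mirror the 512 KiB message budget discussed earlier.
const maxTaskSize = 512 * 1024

// sizedBlockstore is the slice of the Blockstore interface this sketch needs.
type sizedBlockstore interface {
	Stat(*cid.Cid) (uint, error)
}

// coalesceBySize packs entries into chunks whose known block sizes sum to at
// most maxTaskSize, so every task unit is roughly the same size on the wire.
func coalesceBySize(bs sizedBlockstore, entries []*wl.Entry) [][]*wl.Entry {
	var chunks [][]*wl.Entry
	var cur []*wl.Entry
	curSize := uint(0)
	for _, e := range entries {
		size, err := bs.Stat(e.Cid)
		if err != nil {
			size = maxTaskSize // unknown size: treat it as a full task
		}
		if len(cur) > 0 && curSize+size > maxTaskSize {
			chunks = append(chunks, cur)
			cur, curSize = nil, 0
		}
		cur = append(cur, e)
		curSize += size
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

MessageReceived could push each chunk as its own task, which addresses both the TODO above and the earlier concern that one peer's large request ties up a whole worker.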
If I'm fetching a large dataset from some other peer, they will currently only send me one block per message. Even with concurrent request pipelining this is a bit inefficient. We should set things up to be able to pack multiple blocks into a given message, especially if they are smaller indirect blocks.
This will be even more important when we move towards using selectors to fetch things in bitswap.