
feat: Memory stats #2162

Merged 6 commits into main from memory-stats on Nov 13, 2023
Conversation

chakaz (Contributor) commented Nov 12, 2023

Add a MEMORY STATS command that prints useful memory-related information, including RSS, db memory size, connection count & buffer size, replication count & buffer size, and serialization buffer size.
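As a rough illustration of the reply shape (a hedged sketch, not the actual Dragonfly implementation; `BuildMemoryStats` and its parameters are hypothetical), the command can be thought of as emitting a flat list of alternating name/value strings:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: MEMORY STATS as a flat list of alternating
// name/value strings, matching the output shown later in this thread.
std::vector<std::string> BuildMemoryStats(uint64_t rss_bytes, uint64_t data_bytes,
                                          uint64_t replication_connections) {
  std::vector<std::string> stats;
  auto add = [&stats](const char* name, uint64_t value) {
    stats.emplace_back(name);
    stats.push_back(std::to_string(value));
  };
  add("rss_bytes", rss_bytes);
  add("data_bytes", data_bytes);
  add("replication.connections", replication_connections);
  return stats;
}
```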

chakaz (Contributor, Author) commented Nov 12, 2023

It looks like tracking the IoBufs is not enough.

Here are some numbers from the start and the middle of an active (full sync) replication, after a debug populate 1000000 askldjh 1000 RAND:

rss_bytes: 1098117120
data_bytes: 963174424
replication.connections: 2
replication.total_bytes: 512
replication.consumed_bytes: 0
replication.pending_input_bytes: 0
replication.pending_output_bytes: 512
serialization.total_bytes: 36864
serialization.consumed_bytes: 0
serialization.pending_input_bytes: 0
serialization.pending_output_bytes: 36864

127.0.0.1:6379> memory stats
rss_bytes: 1100730368  (2mb diff)
data_bytes: 963174424  (same)
replication.connections: 2   (same)
replication.total_bytes: 512   (same)
replication.consumed_bytes: 0   (same)
replication.pending_input_bytes: 0   (same)
replication.pending_output_bytes: 512   (same)
serialization.total_bytes: 36864   (same)
serialization.consumed_bytes: 0   (same)
serialization.pending_input_bytes: 0   (same)
serialization.pending_output_bytes: 36864   (same)

As you can see, there is a ~2 MB diff related to replication which is not accounted for in the IoBufs, and I'm not sure where to find that data. It could be ephemeral (like data being allocated temporarily in flows etc.), but I wonder if 2 MB is too much for that?
Note that I'm using single-threaded replication.

romange (Collaborator) commented Nov 12, 2023

I would not worry about a 2 MB difference. I would start worrying about 200 MB.
What if we artificially stop the replica from pulling data? Obviously everything would stop progressing, but Dragonfly would still be responsive to commands. And what if you replicate items that are not strings but hashmaps/sets/lists that are very big by themselves (each one 100 MB)? Would it still be 2 MB?

chakaz (Contributor, Author) commented Nov 13, 2023

As discussed offline, I also added tracking for channels, which is a major factor in growth and shrinking of serialization operations.

@chakaz chakaz marked this pull request as ready for review November 13, 2023 07:33
@chakaz chakaz requested a review from romange November 13, 2023 07:33
chakaz (Contributor, Author) commented Nov 13, 2023

I'd like to merge this now, and work on some stress tests to expose potential misses in our memory coverage. For any gaps found, I'll send additional PRs.


#pragma once

#include "core/fibers.h"
Collaborator:

I used core fibers because I migrated from Boost fibers. Let's just reference the correct include file directly.

Contributor (Author):

That would mean using util::fb2::SimpleChannel instead of dfly; I thought that was the purpose of that file?


// RSS
stats.push_back("rss_bytes");
stats.push_back(absl::StrCat(rss_mem_current.load()));
Collaborator:

I prefer that we pass memory_order_relaxed.
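A minimal sketch of the suggestion (variable and function names assumed for illustration): a bare load() defaults to memory_order_seq_cst, while a relaxed load is sufficient for a monitoring counter:

```cpp
#include <atomic>
#include <cstdint>

std::atomic<uint64_t> rss_mem_current{0};

// load() with no argument is seq_cst; for a stats read, a relaxed load is
// enough and avoids needless ordering constraints on weakly ordered hardware.
uint64_t ReadRssRelaxed() {
  return rss_mem_current.load(std::memory_order_relaxed);
}
```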

stats.push_back("serialization");
stats.push_back(absl::StrCat(serialization_memory));

return (*cntx_)->SendSimpleStrArr(stats);
Collaborator:

Nice that you used formatted output. Nit: you can specify this as a MAP to get an even better representation for RESP3-enabled clients.

@chakaz chakaz requested a review from romange November 13, 2023 08:23
Comment on lines 21 to 26
// Here and below, we must accept a T instead of building it from variadic args, as we need to
// know its size in case it is added.
void Push(T t) noexcept {
size_ += t.size();
queue_.Push(std::move(t));
}
dranikpg (Contributor) commented Nov 13, 2023

  1. For RDB serialization we use multiple snapshots that write into a single channel, so it's not safe to access a non-atomic size.
  2. If you add this, please also update the surrounding code so it doesn't become convoluted, with multiple ways of doing the same thing. For example, RdbSaver::Impl::GetTotalBuffersSize uses stats for approximating the high bound of this very channel capacity that you added.
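Point (1) can be sketched like this (an assumed, simplified shape; a std::vector stands in for the real channel type, and the names are illustrative):

```cpp
#include <atomic>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// With multiple snapshot producers pushing into one channel, the byte
// counter must be atomic; a plain size_t member would be a data race.
class SizeTrackingChannel {
 public:
  void Push(std::string t) noexcept {
    size_.fetch_add(t.size(), std::memory_order_relaxed);
    queue_.push_back(std::move(t));
  }

  size_t GetSize() const { return size_.load(std::memory_order_relaxed); }

 private:
  std::atomic<size_t> size_{0};
  std::vector<std::string> queue_;  // stand-in for the real queue type
};
```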

Contributor:

Also about (2): it will now be the case that, because we have two separate memory-tracking systems, INFO MEMORY and MEMORY STATS will return different results 🤷🏻‍♂️

Contributor (Author):

Re/ 1: good comment, will fix, thanks!
Re/ 2: I'm not sure what you mean. If I update GetTotalBufferSize() to use the to-be-added GetTotalChannelCapacity() method, why would MEMORY STATS have different results? Or did you mean something else?

Contributor:

No, I just meant that if you don't unify the tracking approaches eventually, they'll not only have duplicated code, but will also diverge in values 🙂

Contributor (Author):

Any reason not to modify GetTotalBufferSize() to use GetTotalChannelCapacity()? The current implementation returns the total bytes ever inserted, without decreasing that number, so it might be hugely overestimating.

Contributor:

No, there is no reason not to modify it.

PS: It does decrease the number; pulled_bytes grows, so it shouldn't reach zero at the very end 🤔

size_t total_bytes = pushed_bytes.load(memory_order_relaxed) + serializer_bytes.load(memory_order_relaxed) - pulled_bytes;
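Plugging numbers into that expression (purely illustrative values and surrounding scaffolding): outstanding bytes are everything pushed plus serializer-held bytes, minus what was already pulled, so the estimate shrinks as pulled_bytes grows:

```cpp
#include <atomic>
#include <cstdint>

std::atomic<uint64_t> pushed_bytes{0};
std::atomic<uint64_t> serializer_bytes{0};
uint64_t pulled_bytes = 0;

// Mirrors the accounting in the quoted line: total buffered bytes
// decrease over time as pulled_bytes grows.
uint64_t TotalBytes() {
  return pushed_bytes.load(std::memory_order_relaxed) +
         serializer_bytes.load(std::memory_order_relaxed) - pulled_bytes;
}
```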

Contributor (Author):

Right, I missed that. Anyway, I unified the logic; thanks for pointing that out.

dranikpg (Contributor) previously approved these changes Nov 13, 2023 and left a comment:

👍🏻

Comment on lines +1098 to 1101
auto cb = [this, &channel_bytes, &serializer_bytes](ShardId sid) {
auto& snapshot = shard_snapshots_[sid];
pushed_bytes.fetch_add(snapshot->pushed_bytes(), memory_order_relaxed);
channel_bytes.fetch_add(snapshot->GetTotalChannelCapacity(), memory_order_relaxed);
serializer_bytes.store(snapshot->GetTotalBufferCapacity(), memory_order_relaxed);
Contributor:

Nit: I think we can do it without hops now (if the state doesn't change in between) 🤔 But I think it doesn't matter for a monitoring command.

@chakaz chakaz merged commit 5ca2be1 into main Nov 13, 2023
@chakaz chakaz deleted the memory-stats branch November 13, 2023 11:58
BorysTheDev pushed a commit that referenced this pull request Nov 13, 2023