
feat: Memory stats #2162

Merged 6 commits into main from memory-stats on Nov 13, 2023
Conversation

chakaz (Contributor) commented Nov 12, 2023

Add a MEMORY STATS command that prints useful memory-related information, including RSS, db memory size, connection count & buffer size, replication count & buffer size, and serialization buffer size.
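As a rough illustration of the reply shape (a hedged sketch, not the actual Dragonfly implementation; `BuildMemoryStats` and its parameters are hypothetical), the command can be thought of as emitting a flat list of alternating name/value strings:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: MEMORY STATS as a flat list of alternating
// name/value strings, matching the output shown later in this thread.
std::vector<std::string> BuildMemoryStats(uint64_t rss_bytes, uint64_t data_bytes,
                                          uint64_t replication_connections) {
  std::vector<std::string> stats;
  auto add = [&stats](const char* name, uint64_t value) {
    stats.emplace_back(name);
    stats.push_back(std::to_string(value));
  };
  add("rss_bytes", rss_bytes);
  add("data_bytes", data_bytes);
  add("replication.connections", replication_connections);
  return stats;
}
```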

chakaz (Contributor, Author) commented Nov 12, 2023

It looks like tracking the IoBufs is not enough.

Here are some numbers from the start and the middle of an active (full sync) replication, after a debug populate 1000000 askldjh 1000 RAND:

rss_bytes: 1098117120
data_bytes: 963174424
replication.connections: 2
replication.total_bytes: 512
replication.consumed_bytes: 0
replication.pending_input_bytes: 0
replication.pending_output_bytes: 512
serialization.total_bytes: 36864
serialization.consumed_bytes: 0
serialization.pending_input_bytes: 0
serialization.pending_output_bytes: 36864

127.0.0.1:6379> memory stats
rss_bytes: 1100730368  (2mb diff)
data_bytes: 963174424  (same)
replication.connections: 2   (same)
replication.total_bytes: 512   (same)
replication.consumed_bytes: 0   (same)
replication.pending_input_bytes: 0   (same)
replication.pending_output_bytes: 512   (same)
serialization.total_bytes: 36864   (same)
serialization.consumed_bytes: 0   (same)
serialization.pending_input_bytes: 0   (same)
serialization.pending_output_bytes: 36864   (same)

As you can see, there is a ~2 MB diff related to replication which is not accounted for in the IoBufs, and I'm not sure where to find that data. It could be ephemeral (like data being allocated temporarily in flows etc.), but I wonder if 2 MB is too much for that?
Note that I'm using single-threaded replication.

romange (Collaborator) commented Nov 12, 2023

I would not worry about a 2 MB difference. I would start worrying about 200 MB.
What if we artificially stop the replica from pulling data? Obviously everything would stop progressing, but Dragonfly would still be responsive to commands. And what if you replicate items that are not strings but hashmaps/sets/lists that are very big by themselves (each one 100 MB)? Would it still be 2 MB?

chakaz (Contributor, Author) commented Nov 13, 2023

As discussed offline, I also added tracking for channels, which is a major factor in growth and shrinking of serialization operations.

@chakaz chakaz marked this pull request as ready for review November 13, 2023 07:33
@chakaz chakaz requested a review from romange November 13, 2023 07:33
chakaz (Contributor, Author) commented Nov 13, 2023

I'd like to merge this now, and work on some stress tests to expose potential misses in our memory coverage. For any gaps found, I'll send additional PRs.


#pragma once

#include "core/fibers.h"
Collaborator:

I used core fibers because I migrated from Boost fibers. Let's just reference the correct include file directly.

Contributor (Author):

That would mean using util::fb2::SimpleChannel instead of dfly; I thought that was the purpose of that file?


// RSS
stats.push_back("rss_bytes");
stats.push_back(absl::StrCat(rss_mem_current.load()));
Collaborator:

I prefer that we pass memory_order_relaxed.
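A minimal sketch of the suggestion (variable and function names assumed for illustration): a bare load() defaults to memory_order_seq_cst, while a relaxed load is sufficient for a monitoring counter:

```cpp
#include <atomic>
#include <cstdint>

std::atomic<uint64_t> rss_mem_current{0};

// load() with no argument is seq_cst; for a stats read, a relaxed load is
// enough and avoids needless ordering constraints on weakly ordered hardware.
uint64_t ReadRssRelaxed() {
  return rss_mem_current.load(std::memory_order_relaxed);
}
```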

stats.push_back("serialization");
stats.push_back(absl::StrCat(serialization_memory));

return (*cntx_)->SendSimpleStrArr(stats);
Collaborator:

Nice that you used formatted output. Nit: you can specify this as a MAP to get an even better representation for RESP3-enabled clients.

@chakaz chakaz requested a review from romange November 13, 2023 08:23
Comment on lines 21 to 26
// Here and below, we must accept a T instead of building it from variadic args, as we need to
// know its size in case it is added.
void Push(T t) noexcept {
size_ += t.size();
queue_.Push(std::move(t));
}
dranikpg (Contributor) commented Nov 13, 2023

  1. For RDB serialization we use multiple snapshots that write into a single channel, so it's not safe to access a non-atomic size.
  2. If you add this, please also update the surrounding code so it doesn't become convoluted, with multiple ways of doing the same thing. For example, RdbSaver::Impl::GetTotalBuffersSize uses stats for approximating the high bound of this very channel capacity that you added.
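Point (1) can be sketched like this (an assumed, simplified shape; a std::vector stands in for the real channel type, and the names are illustrative):

```cpp
#include <atomic>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// With multiple snapshot producers pushing into one channel, the byte
// counter must be atomic; a plain size_t member would be a data race.
class SizeTrackingChannel {
 public:
  void Push(std::string t) noexcept {
    size_.fetch_add(t.size(), std::memory_order_relaxed);
    queue_.push_back(std::move(t));
  }

  size_t GetSize() const { return size_.load(std::memory_order_relaxed); }

 private:
  std::atomic<size_t> size_{0};
  std::vector<std::string> queue_;  // stand-in for the real queue type
};
```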

Contributor:

Also about (2): it will now be the case that, because we have two separate memory-tracking systems, INFO MEMORY and MEMORY STATS will return different results 🤷🏻‍♂️

Contributor (Author):

Re/ 1: good comment, will fix, thanks!
Re/ 2: I'm not sure what you mean. If I update GetTotalBufferSize() to use the to-be-added GetTotalChannelCapacity() method, why would MEMORY STATS have different results? Or did you mean something else?

Contributor:

No, I just meant that if you don't unify the tracking approaches eventually, they'll not only have duplicated code, but will also diverge in values 🙂

Contributor (Author):

Any reason not to modify GetTotalBufferSize() to use GetTotalChannelCapacity()? The current implementation returns the total bytes ever inserted, without decreasing that number, so it might be hugely overestimating.

Contributor:

No, there is no reason not to modify it.

PS: It does decrease the number; pulled_bytes grows, so it shouldn't reach zero at the very end 🤔

size_t total_bytes = pushed_bytes.load(memory_order_relaxed) + serializer_bytes.load(memory_order_relaxed) - pulled_bytes;
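Plugging numbers into that expression (purely illustrative values and surrounding scaffolding): outstanding bytes are everything pushed plus serializer-held bytes, minus what was already pulled, so the estimate shrinks as pulled_bytes grows:

```cpp
#include <atomic>
#include <cstdint>

std::atomic<uint64_t> pushed_bytes{0};
std::atomic<uint64_t> serializer_bytes{0};
uint64_t pulled_bytes = 0;

// Mirrors the accounting in the quoted line: total buffered bytes
// decrease over time as pulled_bytes grows.
uint64_t TotalBytes() {
  return pushed_bytes.load(std::memory_order_relaxed) +
         serializer_bytes.load(std::memory_order_relaxed) - pulled_bytes;
}
```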

Contributor (Author):

Right, I missed that. Anyway, I unified the logic; thanks for pointing that out.

dranikpg (Contributor) previously approved these changes Nov 13, 2023 and left a comment:

👍🏻

Comment on lines +1098 to 1101
auto cb = [this, &channel_bytes, &serializer_bytes](ShardId sid) {
auto& snapshot = shard_snapshots_[sid];
pushed_bytes.fetch_add(snapshot->pushed_bytes(), memory_order_relaxed);
channel_bytes.fetch_add(snapshot->GetTotalChannelCapacity(), memory_order_relaxed);
serializer_bytes.store(snapshot->GetTotalBufferCapacity(), memory_order_relaxed);
Contributor:

Nit: I think we can do it without hops now (if the state doesn't change in between) 🤔 But I think it doesn't matter for a monitoring command.

@chakaz chakaz merged commit 5ca2be1 into main Nov 13, 2023
@chakaz chakaz deleted the memory-stats branch November 13, 2023 11:58
BorysTheDev pushed a commit that referenced this pull request Nov 13, 2023