
Commit 940125e

committed
more questions
Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
1 parent 415c3c4 commit 940125e

5 files changed, +26 -7 lines changed

mini-lsm-book/src/week1-04-sst.md

+1 -1
@@ -99,7 +99,7 @@ We do not provide reference answers to the questions, and feel free to discuss a
9999

100100
## Bonus Tasks
101101

102-
* **Explore different SST encoding and layout.** For example, in the [Lethe](https://disc-projects.bu.edu/lethe/) paper, the author adds secondary key support to SST.
102+
* **Explore different SST encoding and layout.** For example, in the [Lethe: Enabling Efficient Deletes in LSMs](https://disc-projects.bu.edu/lethe/) paper, the author adds secondary key support to SST.
103103
* Or you can use B+ Tree as the SST format instead of sorted blocks.
104104
* **Index Blocks.** Split block indexes and block metadata into index blocks, and load them on-demand.
105105
* **Index Cache.** Use a separate cache for indexes apart from the data block cache.

mini-lsm-book/src/week2-01-compaction.md

+1 -1
@@ -111,7 +111,7 @@ scan 2000 2333
111111
* What are the definitions of read/write/space amplifications? (This is covered in the overview chapter)
112112
* What are the ways to accurately compute the read/write/space amplifications, and what are the ways to estimate them?
113113
* Is it correct that a key will take some storage space even if a user requests to delete it?
114-
* Given that compaction takes a lot of write bandwidth and read bandwidth and may interfere with foreground operations, it is a good idea to postpone compaction when there are large write flow. It is even beneficial to stop/pause existing compaction tasks in this situation. What do you think of this idea? (Read the [Silk](https://www.usenix.org/conference/atc19/presentation/balmau) paper!)
114+
* Given that compaction takes a lot of write bandwidth and read bandwidth and may interfere with foreground operations, it is a good idea to postpone compaction when there are large write flow. It is even beneficial to stop/pause existing compaction tasks in this situation. What do you think of this idea? (Read the [SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores](https://www.usenix.org/conference/atc19/presentation/balmau) paper!)
115115
* Is it a good idea to use/fill the block cache for compactions? Or is it better to fully bypass the block cache when compaction?
116116
* Does it make sense to have a `struct ConcatIterator<I: StorageIterator>` in the system?
117117
* Some researchers/engineers propose to offload compaction to a remote server or a serverless lambda function. What are the benefits, and what might be the potential challenges and performance impacts of doing remote compaction? (Think of the point when a compaction completes and what happens to the block cache on the next read request...)

mini-lsm-book/src/week2-02-simple.md

+3 -1
@@ -7,7 +7,7 @@ In this chapter, you will:
77
* Implement a simple leveled compaction strategy and simulate it on the compaction simulator.
88
* Start compaction as a background task and implement a compaction trigger in the system.
99

10-
## Task 1: Simple Leveled Compaction + Compaction Simulation
10+
## Task 1: Simple Leveled Compaction
1111

1212
In this chapter, we are going to implement our first compaction strategy -- simple leveled compaction. In this task, you will need to modify:
1313

@@ -152,6 +152,8 @@ You may print something, for example, the compaction task information, when the
152152

153153
## Test Your Understanding
154154

155+
* What is the estimated write amplification of leveled compaction?
156+
* What is the estimated read amplification of leveled compaction?
155157
* Is it correct that a key will only be purged from the LSM tree if the user requests to delete it and it has been compacted in the bottom-most level?
156158
* Is it a good strategy to periodically do a full compaction on the LSM tree? Why or why not?
157159
* Actively choosing some old files/levels to compact even if they do not violate the level amplifier would be a good choice, is it true? (Look at the [Lethe](https://disc-projects.bu.edu/lethe/) paper!)

mini-lsm-book/src/week2-03-tiered.md

+10 -3
@@ -109,13 +109,20 @@ src/lsm_storage.rs
109109

110110
As tiered compaction does not use the L0 level of the LSM state, you should directly flush your memtables to a new tier instead of as an L0 SST. You can use `self.compaction_controller.flush_to_l0()` to know whether to flush to L0. You may use the first output SST id as the level/tier id for your new sorted run. You will also need to modify your compaction process to construct merge iterators for tiered compaction jobs.
111111

112+
## Related Readings
113+
114+
[Universal Compaction - RocksDB Wiki](https://github.com/facebook/rocksdb/wiki/Universal-Compaction)
115+
112116
## Test Your Understanding
113117

118+
* What is the estimated write amplification of tiered (universal) compaction? (Okay, this is hard to estimate... But what about without the last *reduce sorted run* trigger?)
119+
* What is the estimated read amplification of tiered (universal) compaction?
114120
* What are the pros/cons of universal compaction compared with simple leveled/tiered compaction?
115-
* How much storage space is it required (compared with user data size) to run universal compaction without using up the storage device space?
121+
* How much storage space is required (compared with user data size) to run universal compaction?
116122
* Can we merge two tiers that are not adjacent in the LSM state?
117-
* What happens if compaction cannot keep up with the SST flushes?
118-
* The log-on-log problem.
123+
* What happens if compaction speed cannot keep up with the SST flushes?
124+
* What might need to be considered if the system schedules multiple compaction tasks in parallel?
125+
* SSDs also write their own logs (an SSD is basically a log-structured storage device). If the SSD has a write amplification of 2x, what is the end-to-end write amplification of the whole system? Related: [ZNS: Avoiding the Block Interface Tax for Flash-based SSDs](https://www.usenix.org/conference/atc21/presentation/bjorling).
119126

120127
We do not provide reference answers to the questions, and feel free to discuss about them in the Discord community.
121128
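
The flush path described in the `week2-03-tiered.md` hunk above (flush memtables directly to a new tier when `self.compaction_controller.flush_to_l0()` returns false, and reuse the first output SST id as the tier id) might be sketched roughly as follows. This is only an illustration under stated assumptions: the `State` struct, its field layout, the newest-first ordering, and the `install_flushed_sst` helper are hypothetical stand-ins, not the actual code in `src/lsm_storage.rs`.

```rust
/// Hypothetical, simplified LSM state holding only what this sketch needs.
#[derive(Debug, Default)]
struct State {
    /// L0 SST ids, newest first (assumed ordering).
    l0_sstables: Vec<usize>,
    /// Sorted runs as (tier id, SST ids), newest first (assumed ordering).
    levels: Vec<(usize, Vec<usize>)>,
}

/// Install a freshly flushed SST into the state.
/// `flush_to_l0` stands in for the value returned by
/// `self.compaction_controller.flush_to_l0()`.
fn install_flushed_sst(state: &mut State, sst_id: usize, flush_to_l0: bool) {
    if flush_to_l0 {
        // Strategies that use L0: the new SST becomes the newest L0 SST.
        state.l0_sstables.insert(0, sst_id);
    } else {
        // Tiered compaction: the flushed SST forms a new sorted run on its own.
        // The run contains a single SST, so its id doubles as the tier id.
        state.levels.insert(0, (sst_id, vec![sst_id]));
    }
}
```

With `flush_to_l0` set to false, two flushes with ids 1 and 2 leave `l0_sstables` empty and produce two single-SST tiers, with tier 2 in front of tier 1.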

mini-lsm-book/src/week2-04-leveled.md

+11 -1
@@ -144,11 +144,21 @@ src/lsm_storage.rs
144144

145145
The implementation should be similar to simple leveled compaction. Remember to change both get/scan read path and the compaction iterators.
146146

147+
## Related Readings
148+
149+
[Leveled Compaction - RocksDB Wiki](https://github.com/facebook/rocksdb/wiki/Leveled-Compaction)
150+
147151
## Test Your Understanding
148152

149-
* Finding a good key split point for compaction may potentially reduce the write amplification, or it does not matter at all?
153+
* What is the estimated write amplification of leveled compaction?
154+
* What is the estimated read amplification of leveled compaction?
155+
* Finding a good key split point for compaction may potentially reduce the write amplification, or it does not matter at all? (Consider the case where the user writes keys beginning with some prefixes, `00` and `01`. The numbers of keys under these two prefixes are different, and their write patterns are different. If we can always split `00` and `01` into different SSTs...)
150156
* Imagine that a user was using tiered (universal) compaction before and wants to migrate to leveled compaction. What might be the challenges of this migration? And how to do the migration?
151157
* What if the user wants to migrate from leveled compaction to tiered compaction?
158+
* What happens if compaction speed cannot keep up with the SST flushes?
159+
* What might need to be considered if the system schedules multiple compaction tasks in parallel?
160+
* What is the peak storage usage for leveled compaction, compared with universal compaction?
161+
* Is it true that with a lower `level_size_multiplier`, you can always get a lower write amplification?
152162
* What needs to be done if a user not using compaction at all decides to migrate to leveled compaction?
153163

154164
We do not provide reference answers to the questions, and feel free to discuss about them in the Discord community.
