RUST-1384 Move all `Ctx` actions to a dedicated worker thread #9
Conversation
```diff
@@ -21,11 +31,9 @@ impl HasStatus for CtxBuilder {
 }

 impl CtxBuilder {
     /// Takes ownership of the given pointer, and will destroy it on drop.
     pub(crate) fn steal(inner: *mut sys::mongocrypt_ctx_t) -> Self {
```
Because `Ctx` is on the outside of the thread boundary, this can no longer take ownership of a raw pointer and produce a `Ctx`; instead, it borrows the pointer (invisible change to the user) and produces a `BuiltCtx`, which is just a token that one of the `build` methods has been called.
```diff
-        if !sys::mongocrypt_ctx_setopt_key_id(*self.inner.borrow(), *bin.native()) {
-            return Err(self.status().as_error());
+        if !sys::mongocrypt_ctx_setopt_key_id(self.inner, *bin.native()) {
+            return Err(self.error());
```
The tweak from `.status().as_error()` to `.error()` happened in an earlier cycle of refactoring, but it seemed a nice minor change worth preserving.
```diff
@@ -352,24 +363,111 @@ impl Algorithm {
 }

 pub struct Ctx {
-    inner: OwnedPtr<sys::mongocrypt_ctx_t>,
+    worker: Mutex<Sender<CtxAction>>,
```
In usage, it turned out to be convenient to make `Ctx` implement `Sync`; this probably isn't entirely necessary, but the overhead of the `Mutex` here should be very minor.
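As a rough illustration of why the `Mutex` does the job (field shape from the diff above; `CtxAction`'s definition is an assumption): `Mutex<T>` is `Sync` whenever `T` is `Send`, regardless of whether the wrapped sender is itself `Sync`.

```rust
use std::sync::{mpsc::Sender, Mutex};

// Assumed shape for the actions sent to the worker thread.
type CtxAction = Box<dyn FnOnce() + Send>;

pub struct Ctx {
    worker: Mutex<Sender<CtxAction>>,
}

// Compile-time check: Sender<CtxAction> is Send, so Mutex<Sender<CtxAction>>
// is Sync, and therefore &Ctx can be shared across threads.
fn _assert_sync<T: Sync>() {}
fn _check() {
    _assert_sync::<Ctx>();
}
```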
```rust
fn spawn(f: impl FnOnce() + Send + 'static) {
```
Spawning the worker thread is complicated by wanting to play nicely in an async context. In async contexts, the blocking threadpool exists for exactly this sort of need: it amortizes the cost of thread spawning and allows the user to put bounds on resource usage, so it's pretty important to use it.
For purely sync code, it would probably be better to use a threadpool to get some of those benefits as well, but since there's precisely one anticipated user of this code and it's async, that didn't seem worth the time or extra dependencies.
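A minimal sketch of what that dispatch might look like, assuming tokio as the async runtime (the PR's actual logic may differ):

```rust
fn spawn(f: impl FnOnce() + Send + 'static) {
    // Inside a tokio runtime, use its blocking threadpool, which amortizes
    // thread spawning and lets the user bound resource usage.
    if let Ok(handle) = tokio::runtime::Handle::try_current() {
        handle.spawn_blocking(f);
        return;
    }
    // Purely sync context: fall back to a plain OS thread.
    std::thread::spawn(f);
}
```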
```rust
struct AssertSendPtr<T>(*mut T);
```
Raw pointers are by default not `Send`, so for the various types used as arguments/returns from `Ctx` functions we need to pinky-swear to the compiler that they're okay.
The `get()` method is there to work around disjoint captures: normally they're helpful, but in this case if the call site uses `.0`, Rust will helpfully capture only the inner field, and then less-helpfully complain that the inner field isn't `Send`.
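A sketch of how this wrapper plausibly looks (the `unsafe impl` and `get()` follow the description above; the exact code is an assumption):

```rust
struct AssertSendPtr<T>(*mut T);

// SAFETY: the pinky-swear. The worker protocol guarantees the pointee is
// only ever accessed by one thread at a time.
unsafe impl<T> Send for AssertSendPtr<T> {}

impl<T> AssertSendPtr<T> {
    // Calling a method forces a `move` closure to capture the whole struct
    // (which is Send) rather than disjointly capturing the inner raw
    // pointer field (which is not).
    fn get(&self) -> *mut T {
        self.0
    }
}
```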
```rust
/// This is a sync version of `tokio::sync::oneshot::channel`; sending will
/// never block (and so can be used in async code), receiving will
/// synchronously block until a value is sent.
fn oneshot<T>() -> (OneshotSender<T>, OneshotReceiver<T>) {
```
I was a little surprised this didn't already exist. I ended up not needing to use this to send values from async code, but the comment is still accurate so I kept it.
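For reference, a hedged sketch of such a channel built on `std::sync` primitives (the PR's implementation may differ in details like disconnect handling):

```rust
use std::sync::{Arc, Condvar, Mutex};

pub struct OneshotSender<T>(Arc<(Mutex<Option<T>>, Condvar)>);
pub struct OneshotReceiver<T>(Arc<(Mutex<Option<T>>, Condvar)>);

pub fn oneshot<T>() -> (OneshotSender<T>, OneshotReceiver<T>) {
    let shared = Arc::new((Mutex::new(None), Condvar::new()));
    (OneshotSender(Arc::clone(&shared)), OneshotReceiver(shared))
}

impl<T> OneshotSender<T> {
    /// Never blocks, so this is safe to call from async code.
    pub fn send(self, value: T) {
        let (slot, cvar) = &*self.0;
        *slot.lock().unwrap() = Some(value);
        cvar.notify_one();
    }
}

impl<T> OneshotReceiver<T> {
    /// Blocks the current thread until a value is sent. (This sketch
    /// ignores the sender being dropped without sending; a real
    /// implementation would track disconnection.)
    pub fn recv(self) -> T {
        let (slot, cvar) = &*self.0;
        let mut guard = slot.lock().unwrap();
        loop {
            if let Some(value) = guard.take() {
                return value;
            }
            guard = cvar.wait(guard).unwrap();
        }
    }
}
```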
So I've looked through this briefly and it all looks really good, but it adds a decent amount of complexity and also overhead (a thread per request), so I went to revisit the documentation one last time to confirm this was definitely necessary. The documentation states that the context "must be owned by a single thread", which seems pretty clear-cut, but it made me wonder whether this was actually referring to concurrency safety rather than thread affinity, the latter of which was our initial interpretation. If this was intended to read "must be owned by a single thread at a time", then it would be clear that this was not about thread affinity, and we could therefore implement `Send` (but not `Sync` or `Clone`) for `Ctx`.
I asked Kevin for confirmation, and it does seem to be more about concurrency safety:

> Kevin: I do not think `mongocrypt_ctx_t` has any thread local data. I expect the requirement really is "must be owned by a single thread at a time".

> Kevin: This is an interesting use-case though. And not one that was considered. I suggest filing a MONGOCRYPT ticket to request a documentation change. That may help to prevent breaking that assumption in the future.
To further confirm whether it was okay for us to implement `Send` for `Ctx`, I checked the Rustonomicon, which had the following to say:

> Something can safely be Send unless it shares mutable state with something else without enforcing exclusive access to it
So an example of this would be `Rc<T>`, since if it were `Send`, it could be cloned and sent to another thread, possibly allowing two threads to non-atomically increment the reference counter at the same time. In our case, we're storing a pointer to a `mongocrypt_ctx_t`, so as long as we don't implement `Clone` we should be good on this criterion.
I also found this blog post, which has a section that focuses specifically on implementing `Send` and/or `Sync` on types that wrap C pointers. In it, they provide some specific-to-wrapping-C criteria:

> Assuming you don't allow outside access to the pointer somehow (via accessor methods or by marking the struct member pub) then you are probably safe to do the following if you can make these assertions:
>
> - You can mark your struct Send if the C code dereferencing the pointer never uses thread-local storage or thread-local locking. This happens to be true for many libraries.
Per Kevin's comment above, thread-local storage isn't a problem. libmongocrypt does make use of `pthread_mutex`, which does thread-local locking, but `mongocrypt_ctx_t` doesn't make use of it. Furthermore, libmongocrypt always acquires and releases its locks within its function calls, so we'd never be able to get into a situation where we acquire the lock in one thread, send the `Ctx` to another, and then try to release it.
So in summary, I think we should be able to implement `Send` for `Ctx`, which would allow us to avoid having to do most of the work involved in this PR. Sorry to bring this all up after it was all written; I just hadn't looked into it this deeply until the review.
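A minimal sketch of that alternative, reusing the crate's own `OwnedPtr` and `sys` types from the diff above (the exact shape is an assumption):

```rust
pub struct Ctx {
    inner: OwnedPtr<sys::mongocrypt_ctx_t>,
}

// SAFETY: per the discussion above, mongocrypt_ctx_t holds no thread-local
// state, and libmongocrypt never holds a lock across its function calls, so
// moving exclusive ownership between threads is sound. Ctx deliberately
// implements neither Sync nor Clone, preserving exclusive access.
unsafe impl Send for Ctx {}
```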
Seeing this review did make me realize that the idea of using threads to do the `Ctx` work is still a good one, though. I imagine (but do not have numbers to prove) that the encryption and decryption work done in the `Ctx` during feed is CPU-intensive, especially for large documents. Doing a lot of CPU-intensive work can use up the threads the async runtime uses to schedule other work, so moving it off to a thread pool may be beneficial to performance regardless of the `Send` issues. That said, we'd probably want to move that logic into the driver, and rather than implement `Ctx` as a thread-based actor, instead use a pool of fewer threads to do all our `Ctx` work. In this PR, we're currently using `tokio::task::spawn_blocking` for this, but I think we may actually want a dedicated thread pool like `rayon`, since tokio has a really high default blocking thread pool size and actually recommends using `rayon` if that's a concern (see the prior link). Given that we may need to use the pool for every FLE-enabled request, limiting the pool size seems like a good idea for us; a sketch of that idea follows.
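A rough sketch of the bounded-pool idea with `rayon` (the pool size and plumbing here are illustrative assumptions, not proposed code):

```rust
use std::sync::mpsc::channel;

fn main() {
    // Keep the pool small so FLE work can't starve the async runtime.
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(4)
        .build()
        .expect("failed to build thread pool");

    let (tx, rx) = channel();
    pool.spawn(move || {
        // CPU-intensive Ctx work (e.g. encrypting a large document during
        // feed) would run here, off the async scheduler threads.
        let _ = tx.send("done");
    });
    assert_eq!(rx.recv().unwrap(), "done");
}
```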
Anyways, let me know if this makes sense or I'm missing anything, and again, sorry for bringing all this up after all the code was already written!
RUST-1384

The libmongocrypt documentation specifies that functions on `mongocrypt_ctx_t` are not threadsafe and must be owned by a single thread. This is easily expressible in the Rust type system - just make `Ctx` not `Send` - but that turns out to be a major issue for using it in async code, as any future that holds a `Ctx` across an `.await` point itself becomes non-`Send`, which means it basically can't be invoked (this is a very handwavy oversimplification, but it is the practical outcome).

This refactor creates a dedicated worker loop bound to the lifetime of the Rust `Ctx` value; actions are performed by sending it closures to execute that operate directly on the raw pointer. This keeps the Rust lifetime annotations on the "outside" of the thread boundary, which allows values produced to be shared rather than copied. The major downsides here are: