RUST-1373 Update unified test format runner to support SDAM integration tests #712

patrickfreed · 2022-07-25T20:14:20Z

This PR updates the test runner to preemptively implement the changes proposed in mongodb/specifications#1274. It also makes some general test runner changes and improvements.

This commit also updates internal SDAM handling to use `Error` instead of `String` everywhere; storing a `String` was essentially a historical quirk. It also updates `ExpectedError::verify_result` to return a `Result` instead of panicking.
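
As a rough illustration of the `verify_result` change, here is a minimal sketch of a verifier that reports a mismatch through a `Result` rather than panicking, so the caller decides how to surface it. The struct layout, the `contains` field, and the `String` error type are assumptions for the example, not the runner's actual definitions.

```rust
// Hypothetical, simplified expectation: checks that an error occurred and,
// optionally, that its message contains a given substring.
struct ExpectedError {
    contains: Option<String>,
}

impl ExpectedError {
    // Returns Err with a description of the mismatch instead of panicking.
    fn verify_result(&self, actual: &Result<(), String>) -> Result<(), String> {
        let err = match actual {
            Ok(()) => {
                return Err("expected an error but the operation succeeded".to_string());
            }
            Err(e) => e,
        };
        match &self.contains {
            Some(s) if !err.contains(s.as_str()) => Err(format!(
                "expected error containing {:?}, got {:?}",
                s, err
            )),
            _ => Ok(()),
        }
    }
}
```

A caller can then attach context such as the operation name before failing the test, which a panic inside the verifier would not allow.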

patrickfreed · 2022-07-25T20:17:37Z

Cargo.toml

@@ -7,7 +7,7 @@ authors = [
    "Kaitlin Mahar <kaitlin.mahar@mongodb.com>",
 ]
 description = "The official MongoDB driver for Rust"
-edition = "2018"
+edition = "2021"


The disjoint capture in closures feature of the 2021 edition made using lifetimes in the runner a lot easier, so I bumped the edition here. Now that the MSRV meets the minimum for this edition, this shouldn't be a problem.

patrickfreed · 2022-07-25T20:23:37Z

src/sdam/description/server.rs

+    pub(crate) reply: Result<Option<HelloReply>>,
+}
+
+impl Serialize for ServerDescription {


This manual implementation here just replicates what the old generated one was before the error change.

could you use a custom serialization function via serialize_with on just the reply field here rather than doing custom serialization for the whole struct?

good idea, done

patrickfreed · 2022-07-25T20:24:18Z

src/sdam/description/server.rs

@@ -106,7 +107,36 @@ pub(crate) struct ServerDescription {
    // allows us to ensure that only valid states are possible (e.g. preventing that both an error
    // and a reply are present) while still making it easy to define helper methods on
    // ServerDescription for information we need from the hello reply by propagating with `?`.
-    pub(crate) reply: Result<Option<HelloReply>, String>,
+    pub(crate) reply: Result<Option<HelloReply>>,


The fact that we store String here instead of the actual error is kind of a historical quirk at this point. In order to implement the full error matching for the test runner, I needed to make this a regular Result. IMO this is an improvement anyways.

why were we storing it to begin with, and why do we need to preserve the string behavior in the serialize implementation?

This was originally converted to a String to preserve the Clone implementation for ServerDescription back when we were considering removing Clone from Error. We ended up keeping Error Clone, so we didn't actually need to make this change, but it was kept in since the code was already written and as "a mild perf improvement".

See #301 for the full context.

As far as preserving the serialize implementation, I just kept it as-is to avoid breaking any existing tests that might be relying on it. I think workload executor uses it, but I'm not sure. cc @isabelatkinson

at least for the workload executor it doesn't really matter what the format is. all of the serialized stuff just gets dumped into JSON logs
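
To make the point about the `reply` field concrete, here is a minimal sketch of how a `Result<Option<...>>` field lets helper methods propagate a stored error with `?`. The `Error` and `HelloReply` types below are simplified stand-ins and `max_wire_version` is a hypothetical helper; only the shape of the field comes from the diff.

```rust
// Hypothetical stand-ins for the driver's `Error` and `HelloReply` types.
#[derive(Clone, Debug, PartialEq)]
struct Error(String);

#[derive(Clone, Debug)]
struct HelloReply {
    max_wire_version: Option<i32>,
}

#[derive(Clone, Debug)]
struct ServerDescription {
    // Err: the last check failed. Ok(None): no reply yet. Ok(Some): latest reply.
    // An error and a reply can never be present at the same time.
    reply: Result<Option<HelloReply>, Error>,
}

impl ServerDescription {
    // Propagates a stored error with `?`; otherwise reads the value from the
    // reply if there is one.
    fn max_wire_version(&self) -> Result<Option<i32>, Error> {
        let reply = self.reply.as_ref().map_err(|e| e.clone())?;
        Ok(reply.as_ref().and_then(|r| r.max_wire_version))
    }
}
```

Because the error type is `Clone`, the description itself can stay `Clone` without flattening the error into a string.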

patrickfreed · 2022-07-25T20:25:23Z

src/test/atlas_planned_maintenance_testing/mod.rs

@@ -120,7 +121,20 @@ fn write_json(test_runner: &mut TestRunner, mut errors: Vec<Bson>) {
    // The events key is expected to be present regardless of whether storeEventsAsEntities was
    // defined.
    write!(&mut writer, ",\"events\":[").unwrap();
-    test_runner.write_events_list_to_file("events", &mut writer);
+    let event_list_entity = match entities.get("events") {


I just moved the logic from the function that was previously called here inline.

patrickfreed · 2022-07-25T20:26:05Z

src/test/spec/mod.rs

-
-    // Printing the name of the test file makes it easier to debug deserialization errors.
-    println!("Running tests from {}", path.display());
+    let json: Value = serde_json::from_reader(File::open(path.as_path()).unwrap())


since we print the path when deserialization fails, I removed the extra print here since it clashed with the new logging changes I made in the runner itself.

patrickfreed · 2022-07-25T20:31:59Z

src/test/spec/unified_runner/mod.rs

-static MAX_SPEC_VERSION: Version = Version::new(1, 7, 0);
+static MAX_SPEC_VERSION: Version = Version::new(1, 10, 0);
+
+fn file_level_log(message: impl AsRef<str>) {


This function is used to log stuff in a way that separates it from other files. I took the format from some stuff I wrote in the Swift runner a long time ago, let me know if you think it looks good.

Here's a sample:

    cargo test sessions::run_unified
    Compiling mongodb v2.4.0 (/home/patrick/mongo-rust-driver)
    Finished test [unoptimized + debuginfo] target(s) in 53.69s
    Running unittests (target/debug/deps/mongodb-dc9092005b6c81b2)
    running 1 test
    ------------
    Running tests from /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/snapshot-sessions-unsupported-ops.json
    Executing "Server returns an error on insertOne with snapshot"
    Executing "Server returns an error on insertMany with snapshot"
    Executing "Server returns an error on deleteOne with snapshot"
    Executing "Server returns an error on updateOne with snapshot"
    Executing "Server returns an error on findOneAndUpdate with snapshot"
    Executing "Server returns an error on listDatabases with snapshot"
    Executing "Server returns an error on listCollections with snapshot"
    Executing "Server returns an error on listIndexes with snapshot"
    Executing "Server returns an error on runCommand with snapshot"
    ------------
    Skipping file /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/snapshot-sessions-not-supported-client-error.json: client topology not compatible with test
    ------------
    Running tests from /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/driver-sessions-dirty-session-errors.json
    Executing "Dirty explicit session is discarded (insert)"
    Executing "Dirty explicit session is discarded (findAndModify)"
    Executing "Dirty implicit session is discarded (insert)"
    Executing "Dirty implicit session is discarded (findAndModify)"
    Executing "Dirty implicit session is discarded (read returning cursor)"
    Executing "Dirty implicit session is discarded (read not returning cursor)"
    ------------
    Skipping file /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/snapshot-sessions-not-supported-server-error.json: client topology not compatible with test
    ------------
    Running tests from /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/driver-sessions-server-support.json
    Executing "Server supports explicit sessions"
    Executing "Server supports implicit sessions"
    ------------
    Running tests from /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/snapshot-sessions.json
    Executing "Find operation with snapshot"
    Executing "Distinct operation with snapshot"
    Executing "Aggregate operation with snapshot"
    Executing "countDocuments operation with snapshot"
    Executing "Mixed operation with snapshot"
    Executing "Write commands with snapshot session do not affect snapshot reads"
    Executing "First snapshot read does not send atClusterTime"
    Executing "StartTransaction fails in snapshot session"
    test test::spec::sessions::run_unified ... ok
    test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 391 filtered out; finished in 10.90s

patrickfreed · 2022-07-25T20:33:18Z

src/test/spec/unified_runner/observer.rs

+/// Observer used to cache all the seen events for a given client in a unified test.
+/// Used to implement assertEventCount and waitForEvent operations.
+#[derive(Debug)]
+pub(crate) struct EventObserver {


The existing EventHandler "consumed" events as they were observed, which doesn't work for the functionality we need in the unified runner. Down the road, we may want to convert all the usages of EventHandler to use EventObserver so we can ditch the other type, but I figured that was out of scope for this work.

I'm having trouble following the distinction between this and EventHandler - can you explain more about what was needed that couldn't be done using that?

The main thing that EventHandler couldn't do very well is implement the waitForEvent operation. In order for that, it would need to check all events already seen and then listen on a channel for new events if it hadn't already been seen. The problem with the way that EventHandler is implemented is that the events go to the cache and the channel separately, so it's always possible to accidentally read them twice. EventObserver is channel-first: the events are only added to the cache after they've been consumed by the channel. That way, in waitForEvent we can always check the cache first and then start listening on the channel, no race conditions involved.

Now I imagine we could implement the existing EventHandler using EventObserver, but I figured that might be best for another PR.

Got it, thank you! Can you leave a TODO in the code pointing to a new ticket so we don't lose track of this?

Filed RUST-1425, added comment.
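
For readers following the channel-first design described in this thread, here is a minimal sketch of the idea: events enter the cache only after they have been read off the channel, so a wait can check the cache first and then listen without seeing an event twice. It uses a blocking `std::sync::mpsc` channel in place of the async channel the runner presumably uses, and the `Observer` type and its method names are hypothetical.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

// Hypothetical observer: events only enter `seen` after being consumed
// from the channel, so the cache and the channel never overlap.
struct Observer<E> {
    receiver: Receiver<E>,
    seen: Vec<E>,
}

impl<E> Observer<E> {
    fn new(receiver: Receiver<E>) -> Self {
        Observer { receiver, seen: Vec::new() }
    }

    // Returns true once a matching event has been observed, false if the
    // timeout elapses or all senders are dropped first.
    fn wait_for_event(&mut self, timeout: Duration, matches: impl Fn(&E) -> bool) -> bool {
        // 1. Check everything already drained from the channel.
        if self.seen.iter().any(|e| matches(e)) {
            return true;
        }
        // 2. Listen on the channel, caching each event as it is consumed.
        let deadline = Instant::now() + timeout;
        loop {
            let remaining = deadline.saturating_duration_since(Instant::now());
            match self.receiver.recv_timeout(remaining) {
                Ok(event) => {
                    let hit = matches(&event);
                    self.seen.push(event);
                    if hit {
                        return true;
                    }
                }
                // Timed out or disconnected: no matching event arrived.
                Err(_) => return false,
            }
        }
    }
}
```

An assertEventCount-style check could then be a simple count over `seen` after draining the channel.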

patrickfreed · 2022-07-25T20:33:52Z

src/test/spec/unified_runner/operation.rs

@@ -93,16 +103,104 @@ pub trait TestOperation: Debug + Send + Sync {
    }
 }

+macro_rules! with_mut_session {


to facilitate working with sessions through the lock, this macro pops it out of the entity map, "passes" it to the provided block, and then returns it to the entity map. It does it this way so that we can continue to borrow the entity map in other ways even when we're using a session, which we'd have to borrow mutably from the map.

nice solution. might be worth a comment explaining why we need it?

done (just added this GH comment actually lol)
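
The pop-then-return pattern described above can be sketched roughly as follows. This is not the macro from the PR: it works on a plain `HashMap` rather than the locked entity map, and `Session`, the macro's argument shape, and `run_operation` are assumptions made for the example.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a session entity.
#[derive(Debug, PartialEq)]
struct Session {
    operations_run: u32,
}

// Removes the session from the map, runs `$body` with `$session` bound to a
// mutable reference to it, then puts the session back under the same key.
macro_rules! with_mut_session {
    ($map:expr, $id:expr, |$session:ident| $body:expr) => {{
        let mut owned: Session = $map.remove($id).expect("no such session");
        let result = {
            let $session = &mut owned;
            $body
        };
        $map.insert($id.to_string(), owned);
        result
    }};
}

// Because the session is out of the map while the body runs, the body may
// still borrow the map, which holding a `get_mut` reference would not allow.
fn run_operation(entities: &mut HashMap<String, Session>, id: &str) -> usize {
    with_mut_session!(entities, id, |session| {
        session.operations_run += 1;
        entities.len() // the remaining entities are still reachable here
    })
}
```

One trade-off of this shape is that the session is temporarily absent from the map, so the body must not look it up by id while it runs.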

patrickfreed · 2022-07-25T20:54:08Z

src/test/spec/unified_runner/operation.rs

+}
+
+impl Operation {
+    pub(crate) async fn execute<'a>(&self, test_runner: TestRunner, description: &str) {


this was just copy/pasted here so that we could call operation.execute from test runner threads. Previously this was all inline in the runner.

patrickfreed · 2022-07-25T20:55:30Z

src/test/spec/unified_runner/test_runner.rs

+#[derive(Clone)]
+pub(crate) struct TestRunner {
+    pub(crate) internal_client: TestClient,
+    pub(crate) entities: Arc<RwLock<EntityMap>>,


the entity map needed to be arc'd + locked so that it could be accessed from all the different test runner tasks. ditto for the failpoint guards.
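
A minimal sketch of why the map is wrapped this way: cloning the runner only clones the `Arc`, so every worker sees the same map and takes the lock for each access. The sketch uses `std::thread` and `std::sync::RwLock` instead of async tasks, and the `EntityMap` alias and `run_workers` method are hypothetical simplifications.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical: the real entity map holds much richer values.
type EntityMap = HashMap<String, i64>;

#[derive(Clone)]
struct TestRunner {
    entities: Arc<RwLock<EntityMap>>,
}

impl TestRunner {
    fn new() -> Self {
        TestRunner { entities: Arc::new(RwLock::new(HashMap::new())) }
    }

    // Spawns `n` workers, each holding a cheap clone of the runner, and
    // returns how many entities exist once all of them have finished.
    fn run_workers(&self, n: usize) -> usize {
        let handles: Vec<_> = (0..n)
            .map(|i| {
                let runner = self.clone();
                thread::spawn(move || {
                    // The write guard is dropped at the end of this statement.
                    runner
                        .entities
                        .write()
                        .unwrap()
                        .insert(format!("thread{}", i), i as i64);
                })
            })
            .collect();
        for handle in handles {
            handle.join().unwrap();
        }
        self.entities.read().unwrap().len()
    }
}
```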

abr-egn

Overall LG!

abr-egn · 2022-07-26T14:47:40Z

Cargo.toml

@@ -160,6 +160,7 @@ function_name = "0.2.1"
 futures = "0.3"
 home = "0.5"
 pretty_assertions = "1.1.0"
+serde = { version = "*", features = ["rc"] }


I'm a little surprised a bare * is allowed; this means "whatever version is used by the main dependencies block, but also with the rc feature"?

Yep, or more like the dev dependency doesn't place any additional bounds on the version.

got this trick from: https://stackoverflow.com/questions/27872009/how-do-i-use-a-feature-of-a-dependency-only-for-testing

abr-egn · 2022-07-26T15:21:02Z

src/test/spec/unified_runner/observer.rs

+/// Observer used to cache all the seen events for a given client in a unified test.
+/// Used to implement assertEventCount and waitForEvent operations.
+#[derive(Debug)]
+pub(crate) struct EventObserver {


I'm having trouble following the distinction between this and EventHandler - can you explain more about what was needed that couldn't be done using that?

abr-egn · 2022-07-26T15:24:33Z

src/test/spec/unified_runner/test_file.rs

    #[serde(deserialize_with = "deserialize_schema_version")]
-    pub schema_version: Version,
-    pub run_on_requirements: Option<Vec<RunOnRequirement>>,
-    pub allow_multiple_mongoses: Option<bool>,


Was this dropped intentionally?

good catch, yeah it was. There actually isn't an allowMultipleMongoses field at the top level of the test file. There's a useMultipleMongoses, but that's within the properties of a client entity.

abr-egn · 2022-07-26T15:28:12Z

src/test/spec/unified_runner/test_runner.rs


        for test_case in test_file.tests {
            if let Some(skip_reason) = test_case.skip_reason {
                log_uncaptured(format!(
-                    "Skipping test case {}: {}",
+                    "Skipping test case \"{}\": {}",


Nit: when logging strings that need to be quoted, I tend to use {:?} to catch special characters as well.

updated this and other similar ones in this file
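
The difference behind this nit can be shown with two small helpers, both hypothetical names for the example: hand-written quotes pass special characters through raw, while `{:?}` adds the quotes itself and escapes them.

```rust
// Hand-escaped quotes: embedded quotes and control characters are printed raw.
fn skip_message_display(description: &str, reason: &str) -> String {
    format!("Skipping test case \"{}\": {}", description, reason)
}

// `{:?}` adds the surrounding quotes and escapes special characters.
fn skip_message_debug(description: &str, reason: &str) -> String {
    format!("Skipping test case {:?}: {}", description, reason)
}
```

For a plain description such as `find` the two produce identical output; they differ only when the description contains quotes, newlines, or similar characters.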

kmahar

looks good! a few minor comments and questions

kmahar · 2022-07-27T16:33:04Z

src/sdam/description/server.rs

@@ -106,7 +107,36 @@ pub(crate) struct ServerDescription {
    // allows us to ensure that only valid states are possible (e.g. preventing that both an error
    // and a reply are present) while still making it easy to define helper methods on
    // ServerDescription for information we need from the hello reply by propagating with `?`.
-    pub(crate) reply: Result<Option<HelloReply>, String>,
+    pub(crate) reply: Result<Option<HelloReply>>,


why were we storing it to begin with, and why do we need to preserve the string behavior in the serialize implementation?

kmahar · 2022-07-27T20:09:04Z

src/test/spec/unified_runner/test_runner.rs

        }
    }

-    pub async fn run_test(&mut self, test_file: TestFile, pred: impl Fn(&TestCase) -> bool) {
+    pub(crate) async fn run_test(
+        &self,


it seems like we are always creating two test runners now? run_unified_format_test_filtered and the workload executor both create them and call this method on them, but then this method creates its own

Oh whoops, good catch. I think my search/replace for moving the operation execution code was a bit overzealous. Removed the extra test runner creation and also some redundant clones.

kmahar · 2022-07-27T20:33:19Z

src/test/spec/unified_runner/operation.rs

@@ -93,16 +103,104 @@ pub trait TestOperation: Debug + Send + Sync {
    }
 }

+macro_rules! with_mut_session {


nice solution. might be worth a comment explaining why we need it?

kmahar · 2022-07-27T20:50:58Z

src/test/spec/unified_runner/test_runner.rs

+                                    op.execute(runner.clone(), d.as_str()).await;
+                                }
+                                ThreadMessage::Stop(sender) => {
+                                    sender.send(Ok(())).unwrap_or_else(|_| {


idk if I follow the error case that we're panicking on here... if I understand correctly, we only receive ThreadMessage::Stop here once wait() has been called on a ThreadEntity, which looks to only occur via a waitForThread operation being executed.

from the docs it looks like oneshot::Sender::send only fails if the receiver has already been dropped. maybe this can happen if we for some reason kick off but don't finish awaiting the waitForThread operation, so ThreadEntity.wait() doesn't actually finish executing and the receiver is dropped? or how were you imagining we end up in this state?

I had the logic for this backwards, this panic should be in the waitForThread operation, not here in the thread itself. And yeah, this case can only happen if waitForThread times out, in which case we can just ignore it and let waitForThread handle the error. The timeout was a code review addition in the specs PR, updated this implementation to include it as well as moving this panic logic into waitForThread.
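
The resolution above can be sketched as follows: the waiting side owns the timeout and reports the failure, while the worker ignores a failed acknowledgement send. This is a sketch under assumptions, using `std::sync::mpsc` and OS threads in place of the oneshot channel and async tasks from the PR; `wait_for_thread`, `spawn_worker`, and the `delay` parameter are hypothetical.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;
use std::time::Duration;

enum ThreadMessage {
    // Carries the channel used to acknowledge that the worker has stopped.
    Stop(Sender<()>),
}

// Hypothetical waitForThread: asks the worker to stop and waits for the
// acknowledgement, returning an error if it does not arrive in time.
fn wait_for_thread(worker: &Sender<ThreadMessage>, timeout: Duration) -> Result<(), String> {
    let (ack_tx, ack_rx) = mpsc::channel();
    worker
        .send(ThreadMessage::Stop(ack_tx))
        .map_err(|_| "worker already exited".to_string())?;
    ack_rx
        .recv_timeout(timeout)
        .map_err(|e| format!("did not receive stop acknowledgement: {}", e))
}

// Hypothetical worker: `delay` simulates an operation that is still running
// when the stop request arrives.
fn spawn_worker(delay: Duration) -> Sender<ThreadMessage> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        if let Ok(ThreadMessage::Stop(ack)) = rx.recv() {
            thread::sleep(delay);
            // If the waiter timed out, its receiver is gone and `send` fails.
            // The worker ignores that; the waiter has already reported it.
            let _ = ack.send(());
        }
    });
    tx
}
```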

patrickfreed · 2022-07-28T17:13:16Z

In order to preserve the PR comment history, I've started using merge commits, hopefully the diff isn't too bad. Will still squash it all back to a single one at the end as usual though.

abr-egn

LGTM! (modulo others' comments)

patrickfreed

After the spec PR went through review, we removed the requirement to make assertions on the error that a ServerDescription contains when verifying SDAM events, since none of the existing, non-unified SDAM integration tests need to do that. This means the changes related to having ServerDescription store a mongodb::Result instead of std::Result<T, String> aren't necessary, though I think given that they're already written we should keep them in. They'll make implementing RUST-1422 easier.

patrickfreed · 2022-08-02T21:14:51Z

src/sdam/public.rs

@@ -100,6 +101,11 @@ impl<'a> ServerInfo<'a> {
    pub fn tags(&self) -> Option<&TagSet> {
        self.command_response_getter(|r| r.tags.as_ref())
    }
+
+    /// Gets the error this server encountered, if any.
+    pub fn error(&self) -> Option<&Error> {


Filed RUST-1432 for this change.

kmahar

lgtm mod a nit but don't need to re-review

kmahar · 2022-08-04T00:15:58Z

src/sdam/public.rs

@@ -100,6 +101,11 @@ impl<'a> ServerInfo<'a> {
    pub fn tags(&self) -> Option<&TagSet> {
        self.command_response_getter(|r| r.tags.as_ref())
    }
+
+    /// Gets the error this server encountered, if any.


nit: could we make this a little more descriptive to clarify it's whatever error caused the server state to change and not just any error?

patrickfreed added 3 commits July 25, 2022 14:52

update unified runner to support sdam integration tests

079de3b

improve test logging

83dae89

properly support matching SDAM events

1f1a55b

this commit also updates internal SDAM handling to use `Error` instead of `String` everywhere, which was essentially a historical quirk. This also updated `ExpectedError::verify_result` to return a `Result` instead of panicking.

patrickfreed commented Jul 25, 2022

patrickfreed marked this pull request as ready for review July 25, 2022 20:56

patrickfreed requested review from abr-egn, isabelatkinson and kmahar July 25, 2022 20:56

abr-egn reviewed Jul 26, 2022

patrickfreed added 2 commits July 26, 2022 11:48

sync invalid tests

4740214

only assert on server type in server descriptions

5751d18

kmahar reviewed Jul 27, 2022

patrickfreed added 9 commits July 28, 2022 11:08

use #[serde(with = ...)] instead of manual impl

86f13bb

improve names of observer methods

f06c578

document with_mut_session!

778a596

drop duplicate runner, eliminate some clones

8387618

use debug instead of escaped quotes

e5d9bc4

fix waitForThread logic

5c3b40c

fix clippy

362785d

Merge branch 'main' into DRIVERS-2366/update-runner-unified

c0243c3

fix clippy

06e7997

abr-egn approved these changes Jul 28, 2022

add todo

c565cca

isabelatkinson approved these changes Aug 2, 2022

patrickfreed commented Aug 2, 2022

patrickfreed requested a review from kmahar August 2, 2022 21:21

kmahar approved these changes Aug 4, 2022

clarify error fn

1b9c9f0

patrickfreed merged commit 321cc34 into mongodb:main Aug 4, 2022

kmahar mentioned this pull request Oct 27, 2022

RUST-1510 Implement connection pool tracing messages #766

Merged
