Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUST-1373 Update unified test format runner to support SDAM integration tests #712

Merged

Conversation

patrickfreed
Copy link
Contributor

RUST-1373

This PR updates the test runner to preemptively implement the changes proposed in mongodb/specifications#1274. It also makes some general test runner changes and improvements.

this commit also updates internal SDAM handing to use `Error` instead
of `String` everywhere, which was essentially a historical quirk. This
also updated `ExpectedError::verify_result` to return a `Result`
instead of panicking.
@@ -7,7 +7,7 @@ authors = [
"Kaitlin Mahar <kaitlin.mahar@mongodb.com>",
]
description = "The official MongoDB driver for Rust"
edition = "2018"
edition = "2021"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Disjoint capture in closures feature of the 2021 edition helped make using lifetimes in the runner a lot easier, so I bumped it here. Now that the MSRV is updated to the minimum for this edition, this shouldn't be a problem.

pub(crate) reply: Result<Option<HelloReply>>,
}

impl Serialize for ServerDescription {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This manual implementation here just replicates what the old generated one was before the error change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you use a custom serialization function via serialize_with on just the reply field here rather than doing custom serialization for the whole struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, done

@@ -106,7 +107,36 @@ pub(crate) struct ServerDescription {
// allows us to ensure that only valid states are possible (e.g. preventing that both an error
// and a reply are present) while still making it easy to define helper methods on
// ServerDescription for information we need from the hello reply by propagating with `?`.
pub(crate) reply: Result<Option<HelloReply>, String>,
pub(crate) reply: Result<Option<HelloReply>>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that we store String here instead of the actual error is kind of a historical quirk at this point. In order to implement the full error matching for the test runner, I needed to make this a regular Result. IMO this is an improvement anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why were we storing it to begin with, and why do we need to preserve the string behavior in the serialize implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was originally converted to being String to preserve the Clone implementation for ServerDescription back when we were pondering removing Clone from Error. We ended up keeping Error Clone, so we didn't actually need to make this change, but it was kept in since the code was already written and as "a mild perf improvement".

See #301 for the full context.

As far as preserving the serialize implementation, I just kept it as-is to avoid breaking any existing tests that might be relying on it. I think workload executor uses it, but I'm not sure. cc @isabelatkinson

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least for the workload executor it doesn't really matter what the format is. all of the serialized stuff just gets dumped into JSON logs

@@ -120,7 +121,20 @@ fn write_json(test_runner: &mut TestRunner, mut errors: Vec<Bson>) {
// The events key is expected to be present regardless of whether storeEventsAsEntities was
// defined.
write!(&mut writer, ",\"events\":[").unwrap();
test_runner.write_events_list_to_file("events", &mut writer);
let event_list_entity = match entities.get("events") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved the logic from the function that was previously called here inline.


// Printing the name of the test file makes it easier to debug deserialization errors.
println!("Running tests from {}", path.display());
let json: Value = serde_json::from_reader(File::open(path.as_path()).unwrap())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we print the path when deserialization fails, I removed the extra print here since it clashed with the new logging changes I made in the runner itself.

static MAX_SPEC_VERSION: Version = Version::new(1, 7, 0);
static MAX_SPEC_VERSION: Version = Version::new(1, 10, 0);

fn file_level_log(message: impl AsRef<str>) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used to log stuff in a way that separates it from other files. I took the format from some stuff I wrote in the Swift runner a long time ago, let me know if you think it looks good.

Here's a sample:

cargo test sessions::run_unified
   Compiling mongodb v2.4.0 (/home/patrick/mongo-rust-driver)
    Finished test [unoptimized + debuginfo] target(s) in 53.69s
     Running unittests (target/debug/deps/mongodb-dc9092005b6c81b2)

running 1 test

------------
Running tests from /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/snapshot-sessions-unsupported-ops.json

Executing "Server returns an error on insertOne with snapshot"
Executing "Server returns an error on insertMany with snapshot"
Executing "Server returns an error on deleteOne with snapshot"
Executing "Server returns an error on updateOne with snapshot"
Executing "Server returns an error on findOneAndUpdate with snapshot"
Executing "Server returns an error on listDatabases with snapshot"
Executing "Server returns an error on listCollections with snapshot"
Executing "Server returns an error on listIndexes with snapshot"
Executing "Server returns an error on runCommand with snapshot"

------------
Skipping file /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/snapshot-sessions-not-supported-client-error.json: client topology not compatible with test


------------
Running tests from /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/driver-sessions-dirty-session-errors.json

Executing "Dirty explicit session is discarded (insert)"
Executing "Dirty explicit session is discarded (findAndModify)"
Executing "Dirty implicit session is discarded (insert)"
Executing "Dirty implicit session is discarded (findAndModify)"
Executing "Dirty implicit session is discarded (read returning cursor)"
Executing "Dirty implicit session is discarded (read not returning cursor)"

------------
Skipping file /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/snapshot-sessions-not-supported-server-error.json: client topology not compatible with test


------------
Running tests from /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/driver-sessions-server-support.json

Executing "Server supports explicit sessions"
Executing "Server supports implicit sessions"

------------
Running tests from /home/patrick/mongo-rust-driver/src/test/spec/json/sessions/snapshot-sessions.json

Executing "Find operation with snapshot"
Executing "Distinct operation with snapshot"
Executing "Aggregate operation with snapshot"
Executing "countDocuments operation with snapshot"
Executing "Mixed operation with snapshot"
Executing "Write commands with snapshot session do not affect snapshot reads"
Executing "First snapshot read does not send atClusterTime"
Executing "StartTransaction fails in snapshot session"
test test::spec::sessions::run_unified ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 391 filtered out; finished in 10.90s

/// Observer used to cache all the seen events for a given client in a unified test.
/// Used to implement assertEventCount and waitForEvent operations.
#[derive(Debug)]
pub(crate) struct EventObserver {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing EventHandler "consumed" events as they were observed, which doesn't work for the functionality we need in the unified runner. Down the road, we may want to convert all the usages of EventHandler to use EventObserver so we can ditch the other type, but I figured that was out of scope for this work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble following the distinction between this and EventHandler - can you explain more about what was needed that couldn't be done using that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing that EventHandler couldn't do very well is implement the waitForEvent operation. In order for that, it would need to check all events already seen and then listen on a channel for new events if it hadn't already been seen. The problem with the way that EventHandler is implemented is that the events go to the cache and the channel separately, so it's always possible to accidentally read them twice. EventObserver is channel-first: the events are only added to the cache after they've been consumed by the channel. That way, in waitForEvent we can always check the cache first and then start listening on the channel, no race conditions involved.

Now I imagine we could implement the existing EventHandler using EventObserver, but I figured that might be best for another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thank you! Can you leave a TODO in the code pointing to a new ticket so we don't lose track of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed RUST-1425, added comment.

@@ -93,16 +103,104 @@ pub trait TestOperation: Debug + Send + Sync {
}
}

macro_rules! with_mut_session {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to facilitate working with sessions through the lock, this macro pops it out of the entity map, "passes" it to the provided block, and then returns it to the entity map. It does it this way so that we can continue to borrow the entity map in other ways even when we're using a session, which we'd have to borrow mutably from the map.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice solution. might be worth a comment explaining why we need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done (just added this GH comment actually lol)

}

impl Operation {
pub(crate) async fn execute<'a>(&self, test_runner: TestRunner, description: &str) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was just copy/pasted here so that we could call operation.execute from test runner threads. Previously this was all inline in the runner.

#[derive(Clone)]
pub(crate) struct TestRunner {
pub(crate) internal_client: TestClient,
pub(crate) entities: Arc<RwLock<EntityMap>>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the entity map needed to be arc'd + locked so that it could be accessed from all the different test runner tasks. ditto for the failpoint guards.

@patrickfreed patrickfreed marked this pull request as ready for review July 25, 2022 20:56
Copy link
Contributor

@abr-egn abr-egn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LG!

@@ -160,6 +160,7 @@ function_name = "0.2.1"
futures = "0.3"
home = "0.5"
pretty_assertions = "1.1.0"
serde = { version = "*", features = ["rc"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised a bare * is allowed; this means "whatever version is used by the main dependencies block, but also with the rc feature"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, or more like the dev dependency doesn't place any additional bounds on the version.

got this trick from: https://stackoverflow.com/questions/27872009/how-do-i-use-a-feature-of-a-dependency-only-for-testing

/// Observer used to cache all the seen events for a given client in a unified test.
/// Used to implement assertEventCount and waitForEvent operations.
#[derive(Debug)]
pub(crate) struct EventObserver {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble following the distinction between this and EventHandler - can you explain more about what was needed that couldn't be done using that?

#[serde(deserialize_with = "deserialize_schema_version")]
pub schema_version: Version,
pub run_on_requirements: Option<Vec<RunOnRequirement>>,
pub allow_multiple_mongoses: Option<bool>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this dropped intentionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, yeah it was. There actually isn't an allowMultipleMongoses field at the top level of the test file. There's a useMultipleMongoses, but that's within the properties of a client entity.


for test_case in test_file.tests {
if let Some(skip_reason) = test_case.skip_reason {
log_uncaptured(format!(
"Skipping test case {}: {}",
"Skipping test case \"{}\": {}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: when logging strings that need to be quoted, I tend to use {:?} to catch special characters as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated this and other similar ones in this file

Copy link
Contributor

@kmahar kmahar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good! a few minor comments and questions

@@ -106,7 +107,36 @@ pub(crate) struct ServerDescription {
// allows us to ensure that only valid states are possible (e.g. preventing that both an error
// and a reply are present) while still making it easy to define helper methods on
// ServerDescription for information we need from the hello reply by propagating with `?`.
pub(crate) reply: Result<Option<HelloReply>, String>,
pub(crate) reply: Result<Option<HelloReply>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why were we storing it to begin with, and why do we need to preserve the string behavior in the serialize implementation?

}
}

pub async fn run_test(&mut self, test_file: TestFile, pred: impl Fn(&TestCase) -> bool) {
pub(crate) async fn run_test(
&self,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like we are always creating two test runners now? run_unified_format_test_filtered and the workload executor both create them and call this method on them, but then this method creates its own

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh whoops, good catch. I think my search/replace for moving the operation execution code was a bit overzealous. Removed the extra test runner creation and also some redundant clones.

@@ -93,16 +103,104 @@ pub trait TestOperation: Debug + Send + Sync {
}
}

macro_rules! with_mut_session {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice solution. might be worth a comment explaining why we need it?

op.execute(runner.clone(), d.as_str()).await;
}
ThreadMessage::Stop(sender) => {
sender.send(Ok(())).unwrap_or_else(|_| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk if I follow the error case here that we're panicking on here... if I understand correctly, we only receive ThreadMessage::Stop here once wait() has been called on a ThreadEntity, which looks to only occur via a waitForThread operation being executed.

from the docs it looks oneshot::Sender::send only fails if the receiver has already been dropped. maybe this can happen if we for some reason kick off but don't finish awaiting the waitForThread operation, so ThreadEntity.wait() doesn't actually finish executing and the receiver is dropped? or how were you imagining we end up in this state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the logic for this backwards, this panic should be in the waitForThread operation, not here in the thread itself. And yeah, this case can only happen if waitForThread times out, in which case we can just ignore it and let waitForThread handle the error. The timeout was a code review addition in the specs PR, updated this implementation to include it as well as moving this panic logic into waitForThread.

@patrickfreed
Copy link
Contributor Author

In order to preserve the PR comment history, I've started using merge commits, hopefully the diff isn't too bad. Will still squash it all back to a single one at the end as usual though.

Copy link
Contributor

@abr-egn abr-egn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (modulo others' comments)

Copy link
Contributor Author

@patrickfreed patrickfreed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the spec PR went through review, we removed the requirement to make assertions on the error that a ServerDescription contains when verifying SDAM events, since none of the existing, non-unified SDAM integrations tests need to do that. This means the changes related to having ServerDescription store a mongodb::Result instead of std::Result<T, String> aren't necessary, though I think given that they're already written we should keep them in. They'll make implementing RUST-1422 easier.

@@ -100,6 +101,11 @@ impl<'a> ServerInfo<'a> {
pub fn tags(&self) -> Option<&TagSet> {
self.command_response_getter(|r| r.tags.as_ref())
}

/// Gets the error this server encountered, if any.
pub fn error(&self) -> Option<&Error> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed RUST-1432 for this change.

@patrickfreed patrickfreed requested a review from kmahar August 2, 2022 21:21
Copy link
Contributor

@kmahar kmahar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm mod a nit but don't need to re-review

@@ -100,6 +101,11 @@ impl<'a> ServerInfo<'a> {
pub fn tags(&self) -> Option<&TagSet> {
self.command_response_getter(|r| r.tags.as_ref())
}

/// Gets the error this server encountered, if any.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we make this a little more descriptive to clarify it's whatever error caused the server state to change and not just any error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@patrickfreed patrickfreed merged commit 321cc34 into mongodb:main Aug 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants