Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run post-breach phase in separate thread #758

Merged
merged 3 commits into from
Aug 11, 2020

Conversation

shreyamalviya
Copy link
Contributor

Fixes #696

@codecov
Copy link

codecov bot commented Aug 3, 2020

Codecov Report

Merging #758 into develop will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop     #758   +/-   ##
========================================
  Coverage    60.31%   60.31%           
========================================
  Files          161      161           
  Lines         4899     4899           
========================================
  Hits          2955     2955           
  Misses        1944     1944           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f0428d0...444c2cb. Read the comment docs.

@ShayNehmad
Copy link
Contributor

While this is a good start, we'd like all PBAs to run simultaneously, not only have the PBAs run async. So we want to open a thread for each PBA (maybe up to a limit of 4 threads running at the same time)

@shreyamalviya shreyamalviya force-pushed the pba-threading branch 2 times, most recently from 6c51ac0 to d9cc851 Compare August 7, 2020 08:19
@ShayNehmad
Copy link
Contributor

Cool. Can you do some time measurements and see how much time this saves (and if you increase the thread max limit)?

@shreyamalviya
Copy link
Contributor Author

Running with no threading amongst the PBAs gives 246374421 nanoseconds, and with threading, 5 threads gives the best performance which is 241448650 nanoseconds.

@ShayNehmad
Copy link
Contributor

Hmm... Unfortunately seems like this makes almost no difference :(
Well, seems like this will shave 2 seconds off the Monkey's runtime in total, since the PBAs themselves are executed async. This is not a huge improvement, but it's something :)

If you're done with testing I think we can merge.

We do learn a very important lesson from this: We should profile the Monkey execution and see how long each part takes, so we can understand where we might be able to parallelize and make it quicker. Please create an issue for that.

@shreyamalviya
Copy link
Contributor Author

Yes, I'll create an issue. I'm also done testing so I'm going to go ahead and merge this in a while.

@shreyamalviya shreyamalviya marked this pull request as ready for review August 10, 2020 06:06
Comment on lines +28 to +29
pool = Pool(5)
pool.map(self.run_pba, self.pba_list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried this with multithreading? Performance was worse? Processes take longer to spawn, but are truly "parallel". If I/O is long, maybe we don't really need parallel execution, maybe we just need to init all I/O asap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm importing Pool from multiprocessing.dummy.

multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see!

@VakarisZ
Copy link
Contributor

I agree with Shay. Be wary of such issues, because we should parallelize code only to increase performance and if we need to increase performance, let's increase it where it matters. Parallelizing just because we can is not the best way to go, as it introduces a lot of potential problems with a questionable return.

Comment on lines +41 to +44
LOG.debug("Executing PBA: '{}'".format(pba.name))
pba.run()
except Exception as e:
LOG.error("PBA {} failed. Error info: {}".format(pba.name, e))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging doesn't get messed up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No? Just that all PBA logs are not logged together now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm concerned with is the possibility of log showing:

Executing PBA X
Output of PBA Y
Errors of PBA Y

This might be a bit misleading, but I think we'll manage.

@VakarisZ VakarisZ merged commit 62c4eeb into guardicore:develop Aug 11, 2020
@shreyamalviya shreyamalviya deleted the pba-threading branch September 2, 2020 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improving efficiency of PBA system using multithreading
3 participants