
CUDAEnsemble CLI output shows progress incorrectly #901

Closed
zeyus opened this issue Aug 11, 2022 · 2 comments · Fixed by #903

Comments

@zeyus

zeyus commented Aug 11, 2022

Describe the bug
This is an extremely minor issue. When CUDAEnsemble outputs its progress to the console, it shows how many runs have completed, e.g. 4/10. The problem is that the number on the left seems to be just the index of the first missing log (or the result from one thread, I'm not sure). When I was doing runs, some exit early if the agents die, so with 4 threads, if, say, threads 3 and 4 finish while log files 0 and 1 are still missing, it will quickly report 2/10 complete and then go to 0/10.

To Reproduce
Steps to reproduce the behaviour:

Make a CUDAEnsemble with multiple threads, and set some later threads to terminate early.

Expected behaviour
The number should be the total number of simulation runs completed.

Configuration (please complete the following information):

  • OS: Win 11
  • CUDA version: 11.7
  • GPU: GTX 1070
  • GPU Driver Version: 516.59
@Robadob
Member

Robadob commented Aug 11, 2022

Yeah, I'm aware of this; I thought there was already an issue for it, guess not.

AFAIK, the progress counter shows the index of the last job to finish, so if jobs finish out of the order in which they were started, the counter can go backwards.

@Robadob
Member

Robadob commented Aug 11, 2022

This is the offending line of code:

fprintf(stdout, "\rCUDAEnsemble progress: %u/%u", run_id + 1, static_cast<unsigned int>(plans.size()));

The solution might be to replace run_id + 1 with next_run.get() - <number of runners> + 1.

The total number of runners would need to be passed to the SimRunner at construction though.

Robadob added a commit that referenced this issue Aug 11, 2022
Also removed printing 0/N, as this conflicts with a logging warning, looks a bit dodgy.

Closes #901
mondus pushed a commit that referenced this issue Aug 17, 2022
Also removed printing 0/N, as this conflicts with a logging warning, looks a bit dodgy.

Closes #901