Commit a39ce01

v2.4 --scale --stdev
1 parent ac6fd5b commit a39ce01

5 files changed, +67 -27 lines changed

Changes

+3
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,8 @@
11
Revision history for Benchmark-DKbench
22

3+
2.4 2023-09-29
4+
- Added --scale and --stdev options.
5+
36
2.3 2023-09-25
47
- Show Perl threads/multi config.
58
- Minimum module version for passing benchmarks.

README.md

+21-8
@@ -99,11 +99,13 @@ options to control number of threads, iterations, which benchmarks to run etc:
9999
--multi, -m : Multi-threaded using all your CPU cores/threads.
100100
--max_threads <i> : Override the cpu detection to specify max cpu threads.
101101
--iter <i>, -i <i> : Number of suite iterations (with min/max/avg at the end).
102+
--stdev : Show relative standard deviation (for iter > 1).
102103
--include <regex> : Run only benchmarks that match regex.
103104
--exclude <regex> : Do not run benchmarks that match regex.
104105
--time, -t : Report time (sec) instead of score.
105106
--quick, -q : Quick benchmark run (implies -t).
106107
--no_mce : Do not run under MCE::Loop (implies -j 1).
108+
--scale <i>, -s <i> : Scale the bench workload by x times.
107109
--skip_bio : Skip BioPerl benchmarks.
108110
--skip_prove : Skip Moose prove benchmark.
109111
--time_piece : Run optional Time::Piece benchmark (see benchmark details).
@@ -121,18 +123,26 @@ multi vs single threaded scalability.
121123
The scores are calibrated such that a reference CPU (Intel Xeon Platinum 8481C -
122124
Sapphire Rapids) would achieve a score of 1000 in a single-core benchmark run using
123125
the default software configuration (Linux/Perl 5.36.0 built with multiplicity and
124-
threads, with reference CPAN module versions).
126+
threads, with reference CPAN module versions). Perl built without thread support and
127+
multi(plicity) will be a bit faster (usually in the order of ~3-4%), while older Perl
128+
versions will most likely be slower. Different CPAN module versions will also impact
129+
scores; using `setup_dkbench` is a way to ensure a reference environment for more
130+
meaningful hardware comparisons.
125131

126-
The multi-thread scalability should approach 100% if each thread runs on a full core
127-
(i.e. no SMT), and the core can maintain the clock speed it had on the single-thread
128-
runs. Note that the overall scalability is an average of the benchmarks that drops
129-
non-scaling outliers (over 2\*stdev less than the mean).
132+
The multi-thread scalability calculated by the suite should approach 100% if each
133+
thread runs on a full core (i.e. no SMT), and the core can maintain the clock speed
134+
it had on the single-thread runs. Note that the overall scalability is an average
135+
of the benchmarks that drops non-scaling outliers (over 2\*stdev less than the mean).
136+
137+
If you want to reduce the effects of thermal throttling, which will lower the speed
138+
of (mainly multi-threaded) benchmarks as the CPU temperature increases, the `sleep`
139+
option can help by adding cooldown time between each benchmark.
130140

131141
The suite will report a Pass/Fail per benchmark. A failure may be caused if you have
132142
different CPAN module version installed - this is normal, and you will be warned.
133143

134-
The suite uses [MCE::Loop](https://metacpan.org/pod/MCE%3A%3ALoop) to run on the desired number of parallel threads, although
135-
there is an option to disable it, which forces a single-thread run.
144+
[MCE::Loop](https://metacpan.org/pod/MCE%3A%3ALoop) is used to run on the desired number of parallel threads, with minimal
145+
overhead. There is an option to disable it, which forces a single-thread run.
136146

137147
## `setup_dkbench`
138148

@@ -238,6 +248,9 @@ Prints out software/hardware configuration and returns then number of cores dete
238248
Runs the benchmark suite given the `%options` and prints results. Returns a hash
239249
with run stats.
240250

251+
The options accepted are the same as the `dkbench` script (in their long form),
252+
except `help`, `setup` and `max_threads` which are command-line only.
253+
241254
## `calc_scalability`
242255

243256
calc_scalability(\%options, \%stat_single, \%stat_multi);
@@ -263,7 +276,7 @@ actual workload.
263276
## SCORES
264277

265278
Some sample DKbench score results from various systems for comparison (all on
266-
reference setup with Perl 5.36.0):
279+
reference setup with Perl 5.36.0 thread-multi):
267280

268281
CPU Cores/HT Single Multi Scalability
269282
Intel i7-4750HQ @ 2.0 (MacOS) 4/8 612 2332 46.9%
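
The README text in this diff describes the overall scalability as an average of the per-benchmark values that drops non-scaling outliers (over 2\*stdev less than the mean). A minimal sketch of that rule follows. The helper name `avg_without_low_outliers` and the sample values are hypothetical, and the use of a population standard deviation is an assumption; the module's own `calc_scalability` is not shown in this commit and may differ.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper, not part of Benchmark::DKbench: mean of per-benchmark
# scalability percentages after dropping values more than 2*stdev below the mean.
sub avg_without_low_outliers {
    my @vals = @_;
    return 0 unless @vals;
    my $mean = sum(@vals) / @vals;
    # Population standard deviation (an assumption; the suite may use another form).
    my $stdev = sqrt(sum(map { ($_ - $mean)**2 } @vals) / @vals);
    # Keep every value that is not more than 2*stdev below the mean.
    my @kept = grep { $_ >= $mean - 2 * $stdev } @vals;
    return sum(@kept) / @kept;
}

# Made-up scalability percentages; the last benchmark does not scale.
my @scal = (98, 97, 99, 96, 98, 97, 99, 20);
printf "%.1f%%\n", avg_without_low_outliers(@scal);
```

With these sample values the plain mean is 88, while the filtered mean ignores the single non-scaling benchmark, which is the effect the README describes.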

dkbench

+7-4
@@ -21,11 +21,13 @@ See POD on the main module for info:
2121
--multi, -m : Multi-threaded using all your CPU cores/threads.
2222
--max_threads <i> : Override the cpu detection to specify max cpu threads.
2323
--iter <i>, -i <i> : Number of suite iterations (with min/max/avg at the end).
24+
--stdev : Show relative standard deviation (for iter > 1).
2425
--include <regex> : Run only benchmarks that match regex.
2526
--exclude <regex> : Do not run benchmarks that match regex.
2627
--time, -t : Report time (sec) instead of score.
2728
--quick, -q : Quick benchmark run (implies -t).
2829
--no_mce : Do not run under MCE::Loop (implies -j 1).
30+
--scale <i>, -s <i> : Scale the bench workload by x times.
2931
--skip_bio : Skip BioPerl benchmarks.
3032
--skip_prove : Skip Moose prove benchmark.
3133
--time_piece : Run optional Time::Piece benchmark (see benchmark details).
@@ -70,9 +72,10 @@ GetOptions (
7072
'max_threads=i',
7173
'include=s',
7274
'exclude=s',
73-
'repeat|r=i',
75+
'scale|s=i',
7476
'no_mce|n',
7577
'sleep=i',
78+
'stdev',
7679
'ver=s',
7780
'setup',
7881
'datapath=s',
@@ -81,9 +84,9 @@ GetOptions (
8184

8285
pod2usage({ -verbose => 1, -output => \*STDOUT, -noperldoc => 1}) if $opt{help};
8386

84-
$opt{iter} ||= 1;
85-
$opt{repeat} ||= 1;
86-
$opt{time} = 1 if $opt{quick} || $opt{repeat} > 1;
87+
$opt{iter} ||= 1;
88+
$opt{scale} ||= 1;
89+
$opt{time} = 1 if $opt{quick};
8790

8891
Benchmark::DKbench::Setup::fetch_genbank($opt{datapath}) if $opt{setup};
8992
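
The option handling changed in this file can be sketched in isolation as below. This is an illustration only: `parse_args` is a hypothetical wrapper (the script itself calls `GetOptions` on `@ARGV`), and the `iter|i=i`, `quick|q` and `time|t` specs are assumed from the usage text because they fall outside the hunk. The point it shows is that `--scale` now defaults to 1 and, unlike the old `--repeat`, no longer forces time reporting.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper for illustration; the real script parses @ARGV directly.
sub parse_args {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(
        \@args, \%opt,
        'iter|i=i',     # assumed spec, taken from the usage text
        'scale|s=i',    # added in this commit (replaces repeat|r=i)
        'stdev',        # added in this commit
        'quick|q',      # assumed spec, taken from the usage text
        'time|t',       # assumed spec, taken from the usage text
    ) or die "Bad options\n";
    $opt{iter}  ||= 1;
    $opt{scale} ||= 1;
    # Only --quick implies time reporting after this commit.
    $opt{time} = 1 if $opt{quick};
    return \%opt;
}

my $opt = parse_args(qw(-s 3 --stdev -i 5));
print join(', ', map { "$_=$opt->{$_}" } sort keys %$opt), "\n";
```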

lib/Benchmark/DKbench.pm

+33-13
@@ -40,7 +40,7 @@ our @EXPORT = qw(system_identity suite_run calc_scalability);
4040
our $datadir = dist_dir("Benchmark-DKbench");
4141
my $mono_clock = $^O !~ /win/i || $Time::HiRes::VERSION >= 1.9764;
4242

43-
our $VERSION = '2.3';
43+
our $VERSION = '2.4';
4444

4545
=head1 NAME
4646
@@ -143,11 +143,13 @@ options to control number of threads, iterations, which benchmarks to run etc:
143143
--multi, -m : Multi-threaded using all your CPU cores/threads.
144144
--max_threads <i> : Override the cpu detection to specify max cpu threads.
145145
--iter <i>, -i <i> : Number of suite iterations (with min/max/avg at the end).
146+
--stdev : Show relative standard deviation (for iter > 1).
146147
--include <regex> : Run only benchmarks that match regex.
147148
--exclude <regex> : Do not run benchmarks that match regex.
148149
--time, -t : Report time (sec) instead of score.
149150
--quick, -q : Quick benchmark run (implies -t).
150151
--no_mce : Do not run under MCE::Loop (implies -j 1).
152+
--scale <i>, -s <i> : Scale the bench workload by x times.
151153
--skip_bio : Skip BioPerl benchmarks.
152154
--skip_prove : Skip Moose prove benchmark.
153155
--time_piece : Run optional Time::Piece benchmark (see benchmark details).
@@ -165,18 +167,26 @@ multi vs single threaded scalability.
165167
The scores are calibrated such that a reference CPU (Intel Xeon Platinum 8481C -
166168
Sapphire Rapids) would achieve a score of 1000 in a single-core benchmark run using
167169
the default software configuration (Linux/Perl 5.36.0 built with multiplicity and
168-
threads, with reference CPAN module versions).
170+
threads, with reference CPAN module versions). Perl built without thread support and
171+
multi(plicity) will be a bit faster (usually in the order of ~3-4%), while older Perl
172+
versions will most likely be slower. Different CPAN module versions will also impact
173+
scores; using C<setup_dkbench> is a way to ensure a reference environment for more
174+
meaningful hardware comparisons.
169175
170-
The multi-thread scalability should approach 100% if each thread runs on a full core
171-
(i.e. no SMT), and the core can maintain the clock speed it had on the single-thread
172-
runs. Note that the overall scalability is an average of the benchmarks that drops
173-
non-scaling outliers (over 2*stdev less than the mean).
176+
The multi-thread scalability calculated by the suite should approach 100% if each
177+
thread runs on a full core (i.e. no SMT), and the core can maintain the clock speed
178+
it had on the single-thread runs. Note that the overall scalability is an average
179+
of the benchmarks that drops non-scaling outliers (over 2*stdev less than the mean).
180+
181+
If you want to reduce the effects of thermal throttling, which will lower the speed
182+
of (mainly multi-threaded) benchmarks as the CPU temperature increases, the C<sleep>
183+
option can help by adding cooldown time between each benchmark.
174184
175185
The suite will report a Pass/Fail per benchmark. A failure may be caused if you have
176186
different CPAN module version installed - this is normal, and you will be warned.
177187
178-
The suite uses L<MCE::Loop> to run on the desired number of parallel threads, although
179-
there is an option to disable it, which forces a single-thread run.
188+
L<MCE::Loop> is used to run on the desired number of parallel threads, with minimal
189+
overhead. There is an option to disable it, which forces a single-thread run.
180190
181191
=head2 C<setup_dkbench>
182192
@@ -306,6 +316,9 @@ Prints out software/hardware configuration and returns then number of cores dete
306316
Runs the benchmark suite given the C<%options> and prints results. Returns a hash
307317
with run stats.
308318
319+
The options accepted are the same as the C<dkbench> script (in their long form),
320+
except C<help>, C<setup> and C<max_threads> which are command-line only.
321+
309322
=head2 C<calc_scalability>
310323
311324
calc_scalability(\%options, \%stat_single, \%stat_multi);
@@ -331,7 +344,7 @@ actual workload.
331344
=head2 SCORES
332345
333346
Some sample DKbench score results from various systems for comparison (all on
334-
reference setup with Perl 5.36.0):
347+
reference setup with Perl 5.36.0 thread-multi):
335348
336349
CPU Cores/HT Single Multi Scalability
337350
Intel i7-4750HQ @ 2.0 (MacOS) 4/8 612 2332 46.9%
@@ -423,7 +436,7 @@ sub suite_run {
423436
my $opt = shift;
424437
$datadir = $opt->{datapath} if $opt->{datapath};
425438
$opt->{threads} //= 1;
426-
$opt->{repeat} //= 1;
439+
$opt->{scale} //= 1;
427440
$opt->{f} = $opt->{time} ? '%.3f' : '%5.0f';
428441
my %stats = (threads => $opt->{threads});
429442

@@ -529,15 +542,15 @@ sub mce_bench_run {
529542
MCE->gather([$time, $res]);
530543
}
531544
}
532-
(1 .. $opt->{threads} * $opt->{repeat});
545+
(1 .. $opt->{threads} * $opt->{scale});
533546

534547
my ($res, $time) = ('Pass', 0);
535548
foreach (@stats) {
536549
$time += $_->[0];
537550
$res = $_->[1] if $_->[1] ne 'Pass';
538551
}
539552

540-
return $time/($opt->{threads}*$opt->{repeat} || 1), $res;
553+
return $time/($opt->{threads}*$opt->{scale} || 1), $res;
541554
}
542555

543556
sub bench_run {
@@ -1033,6 +1046,7 @@ sub total_stats {
10331046
my $display = $opt->{time} ? 'times' : 'scores';
10341047
my $title = $opt->{time} ? 'Time (sec)' : 'Score';
10351048
print "Aggregates:\n".pad_to("Benchmark",24).pad_to("Avg $title").pad_to("Min $title").pad_to("Max $title");
1049+
print pad_to("stdev %") if $opt->{stdev};
10361050
print pad_to("Pass %") unless $opt->{time};
10371051
print "\n";
10381052
foreach my $bench (sort keys %$benchmarks) {
@@ -1053,7 +1067,13 @@ sub calc_stats {
10531067
my $arr = shift;
10541068
my $pad = shift;
10551069
my ($min, $max, $avg) = min_max_avg($arr);
1056-
return $avg, join '', map {pad_to(sprintf($opt->{f}, $_), $pad)} ($avg,$min,$max);
1070+
my $str = join '', map {pad_to(sprintf($opt->{f}, $_), $pad)} ($avg,$min,$max);
1071+
if ($opt->{stdev} && $avg) {
1072+
my $stdev = avg_stdev($arr);
1073+
$stdev *= 100/$avg;
1074+
$str .= pad_to(sprintf("%0.2f%%", $stdev), $pad);
1075+
}
1076+
return $avg, $str;
10571077
}
10581078

10591079
sub min_max_avg {
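
The new `stdev %` column in `calc_stats` is the standard deviation expressed as a percentage of the average, and it is skipped when the average is zero. A self-contained sketch of that calculation is shown below. `rel_stdev_pct` is a hypothetical stand-in, since the body of the module's `avg_stdev` is not part of this diff, and the population form of the standard deviation is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical stand-in for the module's avg_stdev-based calculation.
# Returns the relative standard deviation in percent, or undef when the
# input is empty or the average is zero (the diff skips the column then).
sub rel_stdev_pct {
    my @vals = @{ $_[0] };
    return undef unless @vals;
    my $avg = sum(@vals) / @vals;
    return undef unless $avg;
    # Population standard deviation (an assumption, see the note above).
    my $stdev = sqrt(sum(map { ($_ - $avg)**2 } @vals) / @vals);
    return $stdev * 100 / $avg;
}

# Made-up scores from three suite iterations, formatted like the diff does.
printf "%0.2f%%\n", rel_stdev_pct([ 980, 1000, 1020 ]);    # prints 1.63%
```

A relative figure makes runs comparable whether the suite reports scores or times, which is presumably why the column is a percentage rather than an absolute value.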

t/simple.t

+3-2
@@ -42,6 +42,7 @@ diag $std[0];
4242
skip_prove => 1,
4343
bio_codons => 1,
4444
iter => 2,
45+
stdev => 1,
4546
no_mce => 1,
4647
include => 'Matrix'
4748
}
@@ -60,7 +61,7 @@ calc_scalability({}, \%stats1, \%stats2);
6061
time => 1,
6162
quick => 1,
6263
iter => 2,
63-
repeat => 1,
64+
scale => 1,
6465
no_mce => 1,
6566
include => 'DCT',
6667
}
@@ -76,7 +77,7 @@ my $datadir = dist_dir("Benchmark-DKbench");
7677
threads => 1,
7778
time => 1,
7879
iter => 1,
79-
repeat => 1,
80+
scale => 1,
8081
no_mce => 1,
8182
include => 'prove',
8283
}
