# [Performance Tips](@id man-performance-tips)
In the following sections, we briefly go through a few techniques that can help make your Julia
code run as fast as possible.
## [Table of contents](@id man-performance-tips-toc)
```@contents
Pages = ["performance-tips.md"]
Depth = 3
```
## General advice
### Performance critical code should be inside a function
Any code that is performance critical should be inside a function. Code inside functions tends to run much faster than top level code, due to how Julia's compiler works.
The use of functions is not only important for performance: functions are more reusable and testable, and they clarify what steps are being done and what their inputs and outputs are. [Write functions, not just scripts](@ref) is also a recommendation of Julia's style guide.
Functions should take arguments instead of operating directly on global variables; see the next point.
### Avoid untyped global variables
The value of an untyped global variable might change at any point, possibly leading to a change of its type. This makes
it difficult for the compiler to optimize code using global variables. This also applies to type-valued variables,
i.e. type aliases on the global level. Variables should be local, or passed as arguments to functions, whenever possible.
We find that global names are frequently constants, and declaring them as such greatly improves
performance:
```julia
const DEFAULT_VAL = 0
```
If a non-constant global is known to always be of the same type, [the type should be annotated](@ref man-typed-globals); `const` globals need not be annotated because their type is inferred from their initialization value.
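For example, a minimal sketch of a typed global (a feature available since Julia 1.8; the name `STATE` is hypothetical):

```julia
# Later assignments to STATE must be (convertible to) a Vector{Float64},
# so code that reads it can rely on the type.
STATE::Vector{Float64} = Float64[]
```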
Uses of untyped globals can be optimized by annotating their types at the point of use:
```julia
global x = rand(1000)

function loop_over_global()
    s = 0.0
    for i in x::Vector{Float64}
        s += i
    end
    return s
end
```
Passing arguments to functions is better style. It leads to more reusable code and clarifies what the inputs and outputs are.
!!! note
    All code in the REPL is evaluated in global scope, so a variable defined and assigned
    at top level will be a **global** variable. Variables defined at top level scope inside
    modules are also global.

    In the following REPL session:

    ```julia-repl
    julia> x = 1.0
    ```

    is equivalent to:

    ```julia-repl
    julia> global x = 1.0
    ```

    so all the performance issues discussed previously apply.
### Measure performance with [`@time`](@ref) and pay attention to memory allocation
A useful tool for measuring performance is the [`@time`](@ref) macro. Here we repeat the example
with the global variable above, but this time with the type annotation removed:
```jldoctest; setup = :(using Random; Random.seed!(1234)), filter = r"[0-9\.]+ seconds \(.*?\)"
julia> x = rand(1000);

julia> function sum_global()
           s = 0.0
           for i in x
               s += i
           end
           return s
       end;

julia> @time sum_global()
  0.011539 seconds (9.08 k allocations: 373.386 KiB, 98.69% compilation time)
523.0007221951678

julia> @time sum_global()
  0.000091 seconds (3.49 k allocations: 70.156 KiB)
523.0007221951678
```
On the first call (`@time sum_global()`) the function gets compiled. (If you've not yet used [`@time`](@ref)
in this session, it will also compile functions needed for timing.) You should not take the results
of this run seriously. For the second run, note that in addition to reporting the time, it also
indicated that a significant amount of memory was allocated. Here we are just computing a sum over all elements in
a vector of 64-bit floats, so there should be no need to allocate (heap) memory.
We should clarify that what `@time` reports is specifically *heap* allocations, which are typically needed for either
mutable objects or for creating/growing variable-sized containers (such as `Array` or `Dict`, strings, or "type-unstable"
objects whose type is only known at runtime). Allocating (or deallocating) such blocks of memory may require an expensive function
call to libc (e.g. via `malloc` in C), and they must be tracked for garbage collection. In contrast, immutable values like
numbers (except bignums), tuples, and immutable `struct`s can be stored much more cheaply, e.g. in stack or CPU-register
memory, so one doesn’t typically worry about the performance cost of "allocating" them.
Unexpected memory allocation is almost always a sign of some problem with your code, usually a
problem with type-stability or creating many small temporary arrays.
Consequently, in addition to the allocation itself, it's very likely
that the code generated for your function is far from optimal. Take such indications seriously
and follow the advice below.
In this particular case, the memory allocation is due to the usage of a type-unstable global variable `x`. If we instead pass `x` as an argument to the function, it no longer allocates memory
(the remaining allocation reported below is due to running the `@time` macro in global scope)
and is significantly faster after the first call:
```jldoctest sumarg; setup = :(using Random; Random.seed!(1234)), filter = r"[0-9\.]+ seconds \(.*?\)"
julia> x = rand(1000);

julia> function sum_arg(x)
           s = 0.0
           for i in x
               s += i
           end
           return s
       end;

julia> @time sum_arg(x)
  0.007551 seconds (3.98 k allocations: 200.548 KiB, 99.77% compilation time)
523.0007221951678

julia> @time sum_arg(x)
  0.000006 seconds (1 allocation: 16 bytes)
523.0007221951678
```
The 1 allocation seen is from running the `@time` macro itself in global scope. If we instead run
the timing in a function, we can see that indeed no allocations are performed:
```jldoctest sumarg; filter = r"[0-9\.]+ seconds"
julia> time_sum(x) = @time sum_arg(x);

julia> time_sum(x)
  0.000002 seconds
523.0007221951678
```
In some situations, your function may need to allocate memory as part of its operation, and this
can complicate the simple picture above. In such cases, consider using one of the [tools](@ref tools)
below to diagnose problems, or write a version of your function that separates allocation from
its algorithmic aspects (see [Pre-allocating outputs](@ref)).
!!! note
    For more serious benchmarking, consider the [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl)
    package which among other things evaluates the function multiple times in order to reduce noise.
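    For instance, a quick sketch, assuming the package is installed:

    ```julia
    using BenchmarkTools

    x = rand(1000)
    @btime sum_arg($x)  # interpolate globals with $ so their lookup isn't timed
    ```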
### Break functions into multiple definitions
Writing a function as many small definitions allows the compiler to directly call the most applicable
code, or even inline it.
Here is an example of a "compound function" that should really be written as multiple definitions:
```julia
using LinearAlgebra
function mynorm(A)
    if isa(A, Vector)
        return sqrt(real(dot(A,A)))
    elseif isa(A, Matrix)
        return maximum(svdvals(A))
    else
        error("mynorm: invalid argument")
    end
end
```
This can be written more concisely and efficiently as:
```julia
mynorm(x::Vector) = sqrt(real(dot(x, x)))
mynorm(A::Matrix) = maximum(svdvals(A))
```
Note, however, that the compiler is quite efficient at optimizing away dead branches in code
written like the `mynorm` example.
### [Tools](@id tools)
Julia and its package ecosystem include tools that may help you diagnose problems and improve
the performance of your code:
  * [Profiling](@ref) allows you to measure the performance of your running code and identify lines
    that serve as bottlenecks. For complex projects, the [ProfileView](https://github.com/timholy/ProfileView.jl)
    package can help you visualize your profiling results.
  * The [JET](https://github.com/aviatesk/JET.jl) package can help you find common performance problems in your code.
  * Unexpectedly-large memory allocations--as reported by [`@time`](@ref), [`@allocated`](@ref), or
    the profiler (through calls to the garbage-collection routines)--hint that there might be issues
    with your code. If you don't see another reason for the allocations, suspect a type problem.
    You can also start Julia with the `--track-allocation=user` option and examine the resulting
    `*.mem` files to see information about where those allocations occur. See [Memory allocation analysis](@ref).
  * `@code_warntype` generates a representation of your code that can be helpful in finding expressions
    that result in type uncertainty. See [`@code_warntype`](@ref) below.
## Type inference
In many languages with optional type declarations, adding declarations is the principal way to
make code run faster. This is *not* the case in Julia. In Julia, the compiler generally knows
the types of all function arguments, local variables, and expressions. However, there are a few
specific instances where declarations are helpful.
### [Avoid containers with abstract type parameters](@id man-performance-abstract-container)
When working with parameterized types, including arrays, it is best to avoid parameterizing with
abstract types where possible.
Consider the following:
```jldoctest
julia> a = Real[]
Real[]

julia> push!(a, 1); push!(a, 2.0); push!(a, π)
3-element Vector{Real}:
 1
 2.0
 π = 3.1415926535897...
```
Because `a` is an array of abstract type [`Real`](@ref), it must be able to hold any
`Real` value. Since `Real` objects can be of arbitrary size and structure, `a` must be
represented as an array of pointers to individually allocated `Real` objects. However, if we instead
only allow numbers of the same type, e.g. [`Float64`](@ref), to be stored in `a` these can be stored more
efficiently:
```jldoctest
julia> a = Float64[]
Float64[]

julia> push!(a, 1); push!(a, 2.0); push!(a, π)
3-element Vector{Float64}:
 1.0
 2.0
 3.141592653589793
```
Assigning numbers into `a` will now convert them to `Float64` and `a` will be stored as
a contiguous block of 64-bit floating-point values that can be manipulated efficiently.
If you cannot avoid containers with abstract value types, it is sometimes better to
parametrize with `Any` to avoid runtime type checking. E.g. `IdDict{Any, Any}` performs
better than `IdDict{Type, Vector}`.
See also the discussion under [Parametric Types](@ref).
### Avoid fields with abstract type
Types can be declared without specifying the types of their fields:
```jldoctest myambig
julia> struct MyAmbiguousType
           a
       end
```
This allows `a` to be of any type. This can often be useful, but it does have a downside: for
objects of type `MyAmbiguousType`, the compiler will not be able to generate high-performance
code. The reason is that the compiler uses the types of objects, not their values, to determine
how to build code. Unfortunately, very little can be inferred about an object of type `MyAmbiguousType`:
```jldoctest myambig
julia> b = MyAmbiguousType("Hello")
MyAmbiguousType("Hello")
julia> c = MyAmbiguousType(17)
MyAmbiguousType(17)
julia> typeof(b)
MyAmbiguousType
julia> typeof(c)
MyAmbiguousType
```
The values of `b` and `c` have the same type, yet their underlying representation of data in memory is very
different. Even if you stored just numeric values in field `a`, the fact that the memory representation
of a [`UInt8`](@ref) differs from a [`Float64`](@ref) also means that the CPU needs to handle
them using two different kinds of instructions. Since the required information is not available
in the type, such decisions have to be made at run-time. This slows performance.
You can do better by declaring the type of `a`. Here, we are focused on the case where `a` might
be any one of several types, in which case the natural solution is to use parameters. For example:
```jldoctest myambig2
julia> mutable struct MyType{T<:AbstractFloat}
           a::T
       end
```
This is a better choice than
```jldoctest myambig2
julia> mutable struct MyStillAmbiguousType
           a::AbstractFloat
       end
```
because the first version specifies the type of `a` from the type of the wrapper object. For
example:
```jldoctest myambig2
julia> m = MyType(3.2)
MyType{Float64}(3.2)

julia> t = MyStillAmbiguousType(3.2)
MyStillAmbiguousType(3.2)

julia> typeof(m)
MyType{Float64}

julia> typeof(t)
MyStillAmbiguousType
```
The type of field `a` can be readily determined from the type of `m`, but not from the type of
`t`. Indeed, in `t` it's possible to change the type of the field `a`:
```jldoctest myambig2
julia> typeof(t.a)
Float64

julia> t.a = 4.5f0
4.5f0

julia> typeof(t.a)
Float32
```
In contrast, once `m` is constructed, the type of `m.a` cannot change:
```jldoctest myambig2
julia> m.a = 4.5f0
4.5f0

julia> typeof(m.a)
Float64
```
The fact that the type of `m.a` is known from `m`'s type—coupled with the fact that its type
cannot change mid-function—allows the compiler to generate highly-optimized code for objects
like `m` but not for objects like `t`.
Of course, all of this is true only if we construct `m` with a concrete type. We can break this
by explicitly constructing it with an abstract type:
```jldoctest myambig2
julia> m = MyType{AbstractFloat}(3.2)
MyType{AbstractFloat}(3.2)

julia> typeof(m.a)
Float64

julia> m.a = 4.5f0
4.5f0

julia> typeof(m.a)
Float32
```
For all practical purposes, such objects behave identically to those of `MyStillAmbiguousType`.
It's quite instructive to compare the sheer amount of code generated for a simple function
```julia
func(m::MyType) = m.a+1
```
using
```julia
code_llvm(func, Tuple{MyType{Float64}})
code_llvm(func, Tuple{MyType{AbstractFloat}})
```
For reasons of length the results are not shown here, but you may wish to try this yourself. Because
the type is fully-specified in the first case, the compiler doesn't need to generate any code
to resolve the type at run-time. This results in shorter and faster code.
One should also keep in mind that not-fully-parameterized types behave like abstract types. For example, even though a fully specified `Array{T,n}` is concrete, `Array` itself with no parameters given is not concrete:
```jldoctest myambig3
julia> !isconcretetype(Array), !isabstracttype(Array), isstructtype(Array), !isconcretetype(Array{Int}), isconcretetype(Array{Int,1})
(true, true, true, true, true)
```
In this case, it would be better to avoid declaring `MyType` with a field `a::Array` and instead declare the field as `a::Array{T,N}` or as `a::A`, where `{T,N}` or `A` are parameters of `MyType`.
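For instance, a minimal sketch of both options (the names `MyTypeConcrete` and `MyTypeGeneric` are hypothetical):

```julia
# Fully parameterized: the field type Array{T,N} is concrete once T and N are known.
struct MyTypeConcrete{T,N}
    a::Array{T,N}
end

# More general: any AbstractArray subtype is allowed, and the field is still fully inferrable.
struct MyTypeGeneric{A<:AbstractArray}
    a::A
end
```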
The previous advice is especially useful when the fields of a struct are meant to be functions, or more generally callable objects.
It is very tempting to define a struct as follows:
```julia
struct MyCallableWrapper
    f::Function
end
```
But since `Function` is an abstract type, every call to `wrapper.f` will require dynamic dispatch, due to the type instability of accessing the field `f`.
Instead, you should write something like:
```julia
struct MyCallableWrapper{F}
    f::F
end
```
which has nearly identical behavior but will be much faster (because the type instability is eliminated).
Note that we do not impose `F<:Function`: this means callable objects which do not subtype `Function` are also allowed for the field `f`.
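To illustrate, a small sketch with a hypothetical callable struct `Adder` that does not subtype `Function`:

```julia
struct Adder  # callable, but not a subtype of Function
    k::Int
end
(a::Adder)(x) = x + a.k

w1 = MyCallableWrapper(sin)       # MyCallableWrapper{typeof(sin)}
w2 = MyCallableWrapper(Adder(1))  # MyCallableWrapper{Adder}
w2.f(10)                          # returns 11; the field access is type-stable
```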
### Avoid fields with abstract containers
The same best practices also work for container types:
```jldoctest containers
julia> struct MySimpleContainer{A<:AbstractVector}
           a::A
       end

julia> struct MyAmbiguousContainer{T}
           a::AbstractVector{T}
       end

julia> struct MyAlsoAmbiguousContainer
           a::Array
       end
```
For example:
```jldoctest containers
julia> c = MySimpleContainer(1:3);

julia> typeof(c)
MySimpleContainer{UnitRange{Int64}}

julia> c = MySimpleContainer([1:3;]);

julia> typeof(c)
MySimpleContainer{Vector{Int64}}

julia> b = MyAmbiguousContainer(1:3);

julia> typeof(b)
MyAmbiguousContainer{Int64}

julia> b = MyAmbiguousContainer([1:3;]);

julia> typeof(b)
MyAmbiguousContainer{Int64}

julia> d = MyAlsoAmbiguousContainer(1:3);

julia> typeof(d), typeof(d.a)
(MyAlsoAmbiguousContainer, Vector{Int64})

julia> d = MyAlsoAmbiguousContainer(1:1.0:3);

julia> typeof(d), typeof(d.a)
(MyAlsoAmbiguousContainer, Vector{Float64})
```
For `MySimpleContainer`, the object is fully-specified by its type and parameters, so the compiler
can generate optimized functions. In most instances, this will probably suffice.
While the compiler can now do its job perfectly well, there are cases where *you* might wish that
your code could do different things depending on the *element type* of `a`. Usually the best
way to achieve this is to wrap your specific operation (here, `foo`) in a separate function:
```jldoctest containers
julia> function sumfoo(c::MySimpleContainer)
           s = 0
           for x in c.a
               s += foo(x)
           end
           s
       end
sumfoo (generic function with 1 method)

julia> foo(x::Integer) = x
foo (generic function with 1 method)

julia> foo(x::AbstractFloat) = round(x)
foo (generic function with 2 methods)
```
This keeps things simple, while allowing the compiler to generate optimized code in all cases.
However, there are cases where you may need to declare different versions of the outer function
for different element types or types of the `AbstractVector` of the field `a` in `MySimpleContainer`.
You could do it like this:
```jldoctest containers
julia> function myfunc(c::MySimpleContainer{<:AbstractArray{<:Integer}})
           return c.a[1]+1
       end
myfunc (generic function with 1 method)

julia> function myfunc(c::MySimpleContainer{<:AbstractArray{<:AbstractFloat}})
           return c.a[1]+2
       end
myfunc (generic function with 2 methods)

julia> function myfunc(c::MySimpleContainer{Vector{T}}) where T <: Integer
           return c.a[1]+3
       end
myfunc (generic function with 3 methods)
```
```jldoctest containers
julia> myfunc(MySimpleContainer(1:3))
2

julia> myfunc(MySimpleContainer(1.0:3))
3.0

julia> myfunc(MySimpleContainer([1:3;]))
4
```
### Annotate values taken from untyped locations
It is often convenient to work with data structures that may contain values of any type (arrays
of type `Array{Any}`). But, if you're using one of these structures and happen to know the type
of an element, it helps to share this knowledge with the compiler:
```julia
function foo(a::Array{Any,1})
    x = a[1]::Int32
    b = x+1
    ...
end
```
Here, we happened to know that the first element of `a` would be an [`Int32`](@ref). Making
an annotation like this has the added benefit that it will raise a run-time error if the
value is not of the expected type, potentially catching certain bugs earlier.
In the case that the type of `a[1]` is not known precisely, `x` can be declared via
`x = convert(Int32, a[1])::Int32`. The use of the [`convert`](@ref) function allows `a[1]`
to be any object convertible to an `Int32` (such as `UInt8`), thus increasing the genericity
of the code by loosening the type requirement. Notice that `convert` itself needs a type
annotation in this context in order to achieve type stability. This is because the compiler
cannot deduce the type of the return value of a function, even `convert`, unless the types of
all the function's arguments are known.
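A hedged sketch of this pattern (the function name `foo_convert` is hypothetical):

```julia
function foo_convert(a::Array{Any,1})
    x = convert(Int32, a[1])::Int32  # accepts anything convertible to Int32
    b = x + 1                        # b is inferred as Int32
    return b
end
```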
Type annotation will not enhance (and can actually hinder) performance if the type is abstract,
or constructed at run-time. This is because the compiler cannot use the annotation to specialize
the subsequent code, and the type-check itself takes time. For example, in the code:
```julia
function nr(a, prec)
    ctype = prec == 32 ? Float32 : Float64
    b = Complex{ctype}(a)
    c = (b + 1.0f0)::Complex{ctype}
    abs(c)
end
```
the annotation of `c` harms performance. To write performant code involving types constructed at
run-time, use the [function-barrier technique](@ref kernel-functions) discussed below, and ensure
that the constructed type appears among the argument types of the kernel function so that the kernel
operations are properly specialized by the compiler. For example, in the above snippet, as soon as
`b` is constructed, it can be passed to another function `k`, the kernel. If, for example, function
`k` declares `b` as an argument of type `Complex{T}`, where `T` is a type parameter, then a type annotation
appearing in an assignment statement within `k` of the form:
```julia
c = (b + 1.0f0)::Complex{T}
```
does not hinder performance (but does not help either) since the compiler can determine the type of `c`
at the time `k` is compiled.
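For concreteness, a sketch of `nr` rewritten with such a kernel (the names `nr2` and `k` are illustrative):

```julia
function nr2(a, prec)
    ctype = prec == 32 ? Float32 : Float64
    b = Complex{ctype}(a)
    return k(b)  # dispatching on the concrete type of b specializes the kernel
end

function k(b::Complex{T}) where T
    c = (b + 1.0f0)::Complex{T}  # harmless: T is known when k is compiled
    return abs(c)
end
```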
### Be aware of when Julia avoids specializing
As a heuristic, Julia avoids automatically [specializing](@ref man-method-specializations) on argument type parameters in three
specific cases: `Type`, `Function`, and `Vararg`. Julia will always specialize when the argument is
used within the method, but not if the argument is just passed through to another function. This
usually has no performance impact at runtime and
[improves compiler performance](@ref compiler-efficiency-issues). If you find it does have a
performance impact at runtime in your case, you can trigger specialization by adding a type
parameter to the method declaration. Here are some examples:
This will not specialize:
```julia
function f_type(t)  # or t::Type
    x = ones(t, 10)
    return sum(map(sin, x))
end
```
but this will:
```julia
function g_type(t::Type{T}) where T
    x = ones(T, 10)
    return sum(map(sin, x))
end
```
These will not specialize:
```julia
f_func(f, num) = ntuple(f, div(num, 2))
g_func(g::Function, num) = ntuple(g, div(num, 2))
```
but this will:
```julia
h_func(h::H, num) where {H} = ntuple(h, div(num, 2))
```
This will not specialize:
```julia
f_vararg(x::Int...) = tuple(x...)
```
but this will:
```julia
g_vararg(x::Vararg{Int, N}) where {N} = tuple(x...)
```
One only needs to introduce a single type parameter to force specialization, even if the other types are unconstrained. For example, this will also specialize, and is useful when the arguments are not all of the same type:
```julia
h_vararg(x::Vararg{Any, N}) where {N} = tuple(x...)
```
Note that [`@code_typed`](@ref) and friends will always show you specialized code, even if Julia
would not normally specialize that method call. You need to check the
[method internals](@ref ast-lowered-method) if you want to see whether specializations are generated
when argument types are changed, i.e., if `Base.specializations(@which f(...))` contains specializations
for the argument in question.
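For example, one way to perform such a check (a sketch; the exact list depends on what has been called in your session):

```julia
h_func(h::H, num) where {H} = ntuple(h, div(num, 2))

h_func(abs, 4)     # compiles a specialization for typeof(abs)
h_func(string, 4)  # compiles another for typeof(string)

# One specialization per function-argument type should now be listed:
Base.specializations(@which h_func(abs, 4))
```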
### Write "type-stable" functions
When possible, it helps to ensure that a function always returns a value of the same type. Consider
the following definition:
```julia
pos(x) = x < 0 ? 0 : x
```
Although this seems innocent enough, the problem is that `0` is an integer (of type `Int`) and
`x` might be of any type. Thus, depending on the value of `x`, this function might return a value
of either of two types. This behavior is allowed, and may be desirable in some cases. But it can
easily be fixed as follows:
```julia
pos(x) = x < 0 ? zero(x) : x
```
There is also a [`oneunit`](@ref) function, and a more general [`oftype(x, y)`](@ref) function, which
returns `y` converted to the type of `x`.
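To see the difference concretely, a quick REPL check (the function names are illustrative; `Int64` assumes a 64-bit system):

```julia-repl
julia> pos_unstable(x) = x < 0 ? 0 : x;

julia> pos_stable(x) = x < 0 ? zero(x) : x;

julia> typeof(pos_unstable(-2.5)), typeof(pos_unstable(2.5))
(Int64, Float64)

julia> typeof(pos_stable(-2.5)), typeof(pos_stable(2.5))
(Float64, Float64)
```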
### Avoid changing the type of a variable
An analogous "type-stability" problem exists for variables used repeatedly within a function:
```julia
function foo()
    x = 1
    for i = 1:10
        x /= rand()
    end
    return x
end
```
Local variable `x` starts as an integer, and after one loop iteration becomes a floating-point
number (the result of the [`/`](@ref) operator). This makes it more difficult for the compiler to
optimize the body of the loop. There are several possible fixes:
* Initialize `x` with `x = 1.0` (as sketched below)
* Declare the type of `x` explicitly as `x::Float64 = 1`
* Use an explicit conversion by `x = oneunit(Float64)`
* Initialize with the first loop iteration, to `x = 1 / rand()`, then loop `for i = 2:10`
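For example, applying the first fix (a minimal sketch):

```julia
function foo()
    x = 1.0  # matches the Float64 the loop produces, so x never changes type
    for i = 1:10
        x /= rand()
    end
    return x
end
```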
### [Separate kernel functions (aka, function barriers)](@id kernel-functions)
Many functions follow a pattern of performing some set-up work, and then running many iterations
to perform a core computation. Where possible, it is a good idea to put these core computations
in separate functions. For example, the following contrived function returns an array of a randomly-chosen
type:
```jldoctest; setup = :(using Random; Random.seed!(1234))
julia> function strange_twos(n)
           a = Vector{rand(Bool) ? Int64 : Float64}(undef, n)
           for i = 1:n
               a[i] = 2
           end
           return a
       end;

julia> strange_twos(3)
3-element Vector{Int64}:
 2
 2
 2
```
This should be written as:
```jldoctest; setup = :(using Random; Random.seed!(1234))
julia> function fill_twos!(a)
           for i = eachindex(a)
               a[i] = 2
           end
       end;

julia> function strange_twos(n)
           a = Vector{rand(Bool) ? Int64 : Float64}(undef, n)
           fill_twos!(a)
           return a
       end;

julia> strange_twos(3)
3-element Vector{Int64}:
 2
 2
 2
```
Julia's compiler specializes code for argument types at function boundaries, so in the original
implementation it does not know the type of `a` during the loop (since it is chosen randomly).
Therefore the second version is generally faster since the inner loop can be recompiled as part
of `fill_twos!` for different types of `a`.
The second form is also often better style and can lead to more code reuse.
This pattern is used in several places in Julia Base. For example, see `vcat` and `hcat`
in [`abstractarray.jl`](https://github.com/JuliaLang/julia/blob/40fe264f4ffaa29b749bcf42239a89abdcbba846/base/abstractarray.jl#L1205-L1206),
or the [`fill!`](@ref) function, which we could have used instead of writing our own `fill_twos!`.
Functions like `strange_twos` occur when dealing with data of uncertain type, for example data
loaded from an input file that might contain either integers, floats, strings, or something else.
### [[`@code_warntype`](@ref)](@id man-code-warntype)
The macro [`@code_warntype`](@ref) (or its function variant [`code_warntype`](@ref)) can sometimes
be helpful in diagnosing type-related problems. Here's an example:
```julia-repl
julia> @noinline pos(x) = x < 0 ? 0 : x;

julia> function f(x)
           y = pos(x)
           return sin(y*x + 1)
       end;

julia> @code_warntype f(3.2)
MethodInstance for f(::Float64)
  from f(x) @ Main REPL[9]:1
Arguments
  #self#::Core.Const(f)
  x::Float64
Locals
  y::UNION{FLOAT64, INT64}
Body::Float64
1 ─      (y = Main.pos(x))
│   %2 = (y * x)::Float64
│   %3 = (%2 + 1)::Float64
│   %4 = Main.sin(%3)::Float64
└──      return %4
```
Interpreting the output of [`@code_warntype`](@ref), like that of its cousins [`@code_lowered`](@ref),
[`@code_typed`](@ref), [`@code_llvm`](@ref), and [`@code_native`](@ref), takes a little practice.
Your code is being presented in a form that has been heavily digested on its way to generating
compiled machine code. Most of the expressions are annotated by a type, indicated by the `::T`
(where `T` might be [`Float64`](@ref), for example). The most important characteristic of [`@code_warntype`](@ref)
is that non-concrete types are displayed in red; since this document is written in Markdown, which has no color,
red text is denoted here by uppercase.
At the top, the inferred return type of the function is shown as `Body::Float64`.
The next lines represent the body of `f` in Julia's SSA IR form.
The numbered boxes are labels and represent targets for jumps (via `goto`) in your code.
Looking at the body, you can see that the first thing that happens is that `pos` is called and the
return value has been inferred as the `Union` type `Union{Float64, Int64}`, shown in uppercase in the
output since it is a non-concrete type. This means that we cannot know the exact return type of `pos` based on the
input types. However, the result of `y*x` is a `Float64` no matter whether `y` is a `Float64` or an `Int64`.
The net result is that `f(x::Float64)` will not be type-unstable
in its output, even if some of the intermediate computations are type-unstable.
How you use this information is up to you. Obviously, it would be far and away best to fix `pos`
to be type-stable: if you did so, all of the variables in `f` would be concrete, and its performance
would be optimal. However, there are circumstances where this kind of *ephemeral* type instability
might not matter too much: for example, if `pos` is never used in isolation, the fact that `f`'s
output is type-stable (for [`Float64`](@ref) inputs) will shield later code from the propagating
effects of type instability. This is particularly relevant in cases where fixing the type instability
is difficult or impossible. In such cases, the tips above (e.g., adding type annotations and/or
breaking up functions) are your best tools to contain the "damage" from type instability.
Also, note that even Julia Base has functions that are type unstable.
For example, the function [`findfirst`](@ref) returns the index into an array where a key is found,
or `nothing` if it is not found, a clear type instability. In order to make it easier to find the
type instabilities that are likely to be important, `Union`s containing either `missing` or `nothing`
are color highlighted in yellow, instead of red.
The following examples may help you interpret expressions marked as containing non-concrete types:

  * Function body starting with `Body::Union{T1,T2}`

      * Interpretation: function with unstable return type
      * Suggestion: make the return value type-stable, even if you have to annotate it

  * `invoke Main.g(%%x::Int64)::Union{Float64, Int64}`

      * Interpretation: call to a type-unstable function `g`.
      * Suggestion: fix the function, or if necessary annotate the return value

  * `invoke Base.getindex(%%x::Array{Any,1}, 1::Int64)::Any`

      * Interpretation: accessing elements of poorly-typed arrays
      * Suggestion: use arrays with better-defined types, or if necessary annotate the type of individual
        element accesses

  * `Base.getfield(%%x, :(:data))::Array{Float64,N} where N`

      * Interpretation: getting a field that is of non-concrete type. In this case, the type of `x`, say `ArrayContainer`, had a
        field `data::Array{T}`. But `Array` needs the dimension `N`, too, to be a concrete type.
      * Suggestion: use concrete types like `Array{T,3}` or `Array{T,N}`, where `N` is now a parameter
        of `ArrayContainer`, as in the sketch below.
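For the last case, a minimal sketch of the suggested fix (the name `ArrayContainer` follows the example above):

```julia
struct ArrayContainer{T,N}
    data::Array{T,N}  # concrete once T and N are known from the wrapper's type
end
```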
### [Performance of captured variable](@id man-performance-captured)
Consider the following example that defines an inner function:
```julia
function abmult(r::Int)
    if r < 0
        r = -r
    end
    f = x -> x * r
    return f
end
```
Function `abmult` returns a function `f` that multiplies its argument by
the absolute value of `r`. The inner function assigned to `f` is called a
"closure". Inner functions are also used by the
language for `do`-blocks and for generator expressions.
This style of code presents performance challenges for the language.
The parser, when translating it into lower-level instructions,
substantially reorganizes the above code by extracting the
inner function to a separate code block. "Captured" variables such as `r`
that are shared by inner functions and their enclosing scope are
also extracted into a heap-allocated "box" accessible to both inner and
outer functions because the language specifies that `r` in the
inner scope must be identical to `r` in the outer scope even after the
outer scope (or another inner function) modifies `r`.
The discussion in the preceding paragraph referred to the "parser", that is, the phase
of compilation that takes place when the module containing `abmult` is first loaded,
as opposed to the later phase when it is first invoked. The parser does not "know" that
`Int` is a fixed type, or that the statement `r = -r` transforms an `Int` to another `Int`.
The magic of type inference takes place in the later phase of compilation.
Thus, the parser does not know that `r` has a fixed type (`Int`), nor that `r` does not change
value once the inner function is created (so that the box is unneeded). Therefore, the parser
emits code for a box that holds an object of an abstract type such as `Any`, which
requires run-time type dispatch for each occurrence of `r`. This can be
verified by applying `@code_warntype` to the above function. Both the boxing
and the run-time type dispatch can cause loss of performance.
If captured variables are used in a performance-critical section of the code,
then the following tips help ensure that their use is performant. First, if
it is known that a captured variable does not change its type, then this can
be declared explicitly with a type annotation (on the variable, not the
right-hand side):
```julia
function abmult2(r0::Int)
    r::Int = r0
    if r < 0
        r = -r
    end
    f = x -> x * r
    return f
end
```
The type annotation partially recovers lost performance due to capturing because
the parser can associate a concrete type to the object in the box.
Going further, if the captured variable does not need to be boxed at all (because it
will not be reassigned after the closure is created), this can be indicated
with `let` blocks as follows.
```julia
function abmult3(r::Int)
    if r < 0
        r = -r
    end
    f = let r = r
        x -> x * r
    end
    return f
end
```
The `let` block creates a new variable `r` whose scope is only the
inner function. The second technique recovers full language performance
in the presence of captured variables. Note that this is a rapidly
evolving aspect of the compiler, and it is likely that future releases
will not require this degree of programmer annotation to attain performance.
In the meantime, some user-contributed packages like
[FastClosures](https://github.com/c42f/FastClosures.jl) automate the
insertion of `let` statements as in `abmult3`.
### [Types with values-as-parameters](@id man-performance-value-type)
Let's say you want to create an `N`-dimensional array that has size 3 along each axis. Such arrays
can be created like this:
```jldoctest
julia> A = fill(5.0, (3, 3))
3×3 Matrix{Float64}:
 5.0  5.0  5.0
 5.0  5.0  5.0
 5.0  5.0  5.0
```
This approach works very well: the compiler can figure out that `A` is an `Array{Float64,2}` because
it knows the type of the fill value (`5.0::Float64`) and the dimensionality (`(3, 3)::NTuple{2,Int}`).
This implies that the compiler can generate very efficient code for any future usage of `A` in
the same function.
But now let's say you want to write a function that creates a 3×3×... array in arbitrary dimensions;
you might be tempted to write a function
```jldoctest
julia> function array3(fillval, N)
           fill(fillval, ntuple(d->3, N))
       end
array3 (generic function with 1 method)

julia> array3(5.0, 2)
3×3 Matrix{Float64}:
 5.0  5.0  5.0
 5.0  5.0  5.0
 5.0  5.0  5.0
```
This works, but (as you can verify for yourself using `@code_warntype array3(5.0, 2)`) the problem
is that the output type cannot be inferred: the argument `N` is a *value* of type `Int`, and type-inference
does not (and cannot) predict its value in advance. This means that code using the output of this
function has to be conservative, checking the type on each access of `A`; such code will be very
slow.
Now, one very good way to solve such problems is by using the [function-barrier technique](@ref kernel-functions).
However, in some cases you might want to eliminate the type-instability altogether. In such cases,
one approach is to pass the dimensionality as a parameter, for example through `Val{T}()` (see
["Value types"](@ref)):
```jldoctest
julia> function array3(fillval, ::Val{N}) where N
           fill(fillval, ntuple(d->3, Val(N)))
       end
array3 (generic function with 1 method)

julia> array3(5.0, Val(2))
3×3 Matrix{Float64}:
 5.0  5.0  5.0
 5.0  5.0  5.0
 5.0  5.0  5.0
```
Julia has a specialized version of `ntuple` that accepts a `Val{::Int}` instance as the second
parameter; by passing `N` as a type-parameter, you make its "value" known to the compiler.
Consequently, this version of `array3` allows the compiler to predict the return type.
However, making use of such techniques can be surprisingly subtle. For example, it would be of
no help if you called `array3` from a function like this:
```julia
function call_array3(fillval, n)
    A = array3(fillval, Val(n))
end
```
Here, you've created the same problem all over again: the compiler can't guess what `n` is,
so it doesn't know the *type* of `Val(n)`. Attempting to use `Val`, but doing so incorrectly, can
easily make performance *worse* in many situations. (Only in situations where you're effectively
combining `Val` with the function-barrier trick, to make the kernel function more efficient, should
code like the above be used.)
An example of correct usage of `Val` would be:
```julia
function filter3(A::AbstractArray{T,N}) where {T,N}
    kernel = array3(1, Val(N))