@@ -28,7 +28,7 @@ Device compiler is further split into the following major components:
28
28
29
29
- ** Front-end** - parses input source, outlines "device part" of the code,
30
30
applies additional restrictions on the device code (e.g. no exceptions or
31
- virtual calls), generated LLVM IR for the device code only and "integration
31
+ virtual calls), generates LLVM IR for the device code only and "integration
32
32
header" which provides information like kernel name, parameters order and data
33
33
type for the runtime library.
34
34
- ** Middle-end** - transforms the initial LLVM IR* to get consumed by the
@@ -42,6 +42,106 @@ back-end. Today middle-end transformations include just a couple of passes:
42
42
- ** Back-end** - produces native "device" code in ahead-of-time compilation
43
43
mode.
44
44
45
+ ### SYCL support in Clang front-end
46
+
47
+ SYCL support in Clang front-end can be split into the following components:
48
+
49
+ - Device code outlining. This component is responsible for identifying and
50
+ outlining "device code" in the single source.
51
+
52
+ - SYCL kernel function object (functor or lambda) lowering. This component
53
+ creates an OpenCL kernel function interface for SYCL kernels.
54
+
55
+ - Device code diagnostics. This component enforces language restrictions on
56
+ device code.
57
+
58
+ - Integration header generation. This component emits information required for
59
+ binding host and device parts of the SYCL code via OpenCL API.
60
+
61
+ #### Device code outlining
62
+
63
+ Here is a code example of a SYCL program that demonstrates compiler outlining
64
+ work:
65
+
66
+ ``` C++
67
+ int foo (int x) { return ++x; }
68
+ int bar(int x) { throw std::exception{"CPU code only!"}; }
69
+ ...
70
+ using namespace cl::sycl;
71
+ queue Q;
72
+ buffer<int, 1> a{range<1>{1024}};
73
+ Q.submit([ &] (handler& cgh) {
74
+ auto A = a.get_access< access::mode::write > (cgh);
75
+ cgh.parallel_for<init_a>(range<1>{1024}, [ =] (id<1> index) {
76
+ A[ index] = index[ 0] * 2 + index[ 1] + foo(42);
77
+ });
78
+ }
79
+ ...
80
+ ```
81
+
82
+ In this example, the SYCL compiler needs to compile the lambda expression passed
83
+ to the `cl::sycl::handler::parallel_for` method, as well as the function `foo`
84
+ called from the lambda expression for the device.
85
+ The compiler must also ignore the `bar` function when we compile the
86
+ "device" part of the single source code, as it's unused inside the device
87
+ portion of the source code (the contents of the lambda expression passed to the
88
+ `cl::sycl::handler::parallel_for` and any function called from this lambda
89
+ expression).
90
+
91
+ The current approach is to use the SYCL kernel attribute in the SYCL runtime to
92
+ mark code passed to `cl::sycl::handler::parallel_for` as "kernel functions".
93
+ The SYCL runtime library can't mark foo as "device" code - this is a compiler
94
+ job: to traverse all symbols accessible from kernel functions and add them to
95
+ the "device part" of the code marking them with the new SYCL device attribute.
96
+
97
+ #### Lowering of lambda function objects and named function objects
98
+
99
+ All SYCL memory objects shared between host and device (buffers/images,
100
+ these objects map to OpenCL buffers and images) must be accessed through special
101
+ `accessor` classes. The "device" side implementation of these classes contain
102
+ pointers to the device memory. As there is no way in OpenCL to pass structures
103
+ with pointers inside as kernel arguments all memory objects shared between host
104
+ and device must be passed to the kernel as raw pointers.
105
+ SYCL also has a special mechanism for passing kernel arguments from host to
106
+ the device. In OpenCL you need to call `clSetKernelArg`, in SYCL all the
107
+ kernel arguments are captures/fields of lambda/functor SYCL functions for
108
+ invoking kernels (such as `parallel_for`). For example, in the previous code
109
+ snippet above `accessor` `A` is one such captured kernel argument.
110
+
111
+ To facilitate the mapping of the captures/fields of lambdas/functors to OpenCL
112
+ kernel and overcome OpenCL limitations we added the generation of a "kernel
113
+ wrapper" function inside the compiler. A "kernel wrapper" function contains the
114
+ body of the SYCL kernel function, receives OpenCL like parameters and
115
+ additionally does some manipulation to initialize captured lambda/functor fields
116
+ with these parameters. In some pseudo code the "kernel wrapper" for the previous
117
+ code snippet above looks like this:
118
+
119
+ ```C++
120
+
121
+ // Let the lambda expression passed to the parallel_for declare unnamed
122
+ // function object with "Lambda" type.
123
+
124
+ // SYCL kernel is defined in SYCL headers:
125
+ __attribute__((sycl_kernel)) someSYCLKernel(Lambda lambda) {
126
+ lambda();
127
+ }
128
+
129
+ // Kernel wrapper
130
+ __kernel wrapper(global int* a) {
131
+ Lambda lambda; // Actually lambda declaration doesn't have a name in AST
132
+ // Let the lambda have one captured field - accessor A. We need to init it
133
+ // with global pointer from arguments:
134
+ lambda.A.__init(a);
135
+ // Body of SYCL kernel from SYCL headers:
136
+ {
137
+ lambda();
138
+ }
139
+ }
140
+
141
+ ```
142
+
143
+ "Kernel wrapper" is generated by the compiler inside the Sema using AST nodes.
144
+
45
145
### SYCL support in the driver
46
146
47
147
SYCL offload support in the driver is based on the clang driver concepts and
0 commit comments