[SYCL] Add SYCL clang frontend doc

Fznamznon · bader · commit 1614b8f91cce · 2019-06-09T21:35:03.000+03:00
Signed-off-by: Mariya Podchishchaeva &lt;mariya.podchishchaeva@intel.com&gt;
diff --git a/sycl/doc/SYCL_compiler_and_runtime_design.md b/sycl/doc/SYCL_compiler_and_runtime_design.md
@@ -28,7 +28,7 @@ Device compiler is further split into the following major components:
 
 - **Front-end** - parses input source, outlines "device part" of the code,
 applies additional restrictions on the device code (e.g. no exceptions or
-virtual calls), generated LLVM IR for the device code only and "integration
+virtual calls), generates LLVM IR for the device code only and "integration
 header" which provides information like kernel name, parameters order and data
 type for the runtime library.
 - **Middle-end** - transforms the initial LLVM IR* to get consumed by the
@@ -42,6 +42,106 @@ back-end. Today middle-end transformations include just a couple of passes:
 - **Back-end** - produces native "device" code in ahead-of-time compilation
 mode.
 
+### SYCL support in Clang front-end
+
+SYCL support in Clang front-end can be split into the following components:
+
+- Device code outlining. This component is responsible for identifying and
+outlining "device code" in the single source.
+
+- SYCL kernel function object (functor or lambda) lowering. This component
+creates an OpenCL kernel function interface for SYCL kernels.
+
+- Device code diagnostics. This component enforces language restrictions on
+device code.
+
+- Integration header generation. This component emits information required for
+binding host and device parts of the SYCL code via OpenCL API.
+
+#### Device code outlining
+
+Here is a code example of a SYCL program that demonstrates compiler outlining
+work:
+
+```C++
+int foo(int x) { return ++x; }
+int bar(int x) { throw std::exception{"CPU code only!"}; }
+...
+using namespace cl::sycl;
+queue Q;
+buffer<int, 1> a{range<1>{1024}};
+Q.submit([&](handler& cgh) {
+      auto A = a.get_access<access::mode::write>(cgh);
+      cgh.parallel_for<init_a>(range<1>{1024}, [=](id<1> index) {
+        A[index] = index[0] * 2 + index[1] + foo(42);
+      });
+    }
+...
+```
+
+In this example, the SYCL compiler needs to compile the lambda expression passed
+to the `cl::sycl::handler::parallel_for` method, as well as the function `foo`
+called from the lambda expression for the device.
+The compiler must also ignore the `bar` function when we compile the
+"device" part of the single source code, as it's unused inside the device
+portion of the source code (the contents of the lambda expression passed to the
+`cl::sycl::handler::parallel_for` and any function called from this lambda
+expression).
+
+The current approach is to use the SYCL kernel attribute in the SYCL runtime to
+mark code passed to `cl::sycl::handler::parallel_for` as "kernel functions".
+The SYCL runtime library can't mark foo as "device" code - this is a compiler
+job: to traverse all symbols accessible from kernel functions and add them to
+the "device part" of the code marking them with the new SYCL device attribute.
+
+#### Lowering of lambda function objects and named function objects
+
+All SYCL memory objects shared between host and device (buffers/images,
+these objects map to OpenCL buffers and images) must be accessed through special
+`accessor` classes. The "device" side implementation of these classes contain
+pointers to the device memory. As there is no way in OpenCL to pass structures
+with pointers inside as kernel arguments all memory objects shared between host
+and device must be passed to the kernel as raw pointers.
+SYCL also has a special mechanism for passing kernel arguments from host to
+the device. In OpenCL you need to call `clSetKernelArg`, in SYCL all the
+kernel arguments are captures/fields of lambda/functor SYCL functions for
+invoking kernels (such as `parallel_for`). For example, in the previous code
+snippet above `accessor` `A` is one such captured kernel argument.
+
+To facilitate the mapping of the captures/fields of lambdas/functors to OpenCL
+kernel and overcome OpenCL limitations we added the generation of a "kernel
+wrapper" function inside the compiler. A "kernel wrapper" function contains the
+body of the SYCL kernel function, receives OpenCL like parameters and
+additionally does some manipulation to initialize captured lambda/functor fields
+with these parameters. In some pseudo code the "kernel wrapper" for the previous
+code snippet above looks like this:
+
+```C++
+
+// Let the lambda expression passed to the parallel_for declare unnamed
+// function object with "Lambda" type.
+
+// SYCL kernel is defined in SYCL headers:
+__attribute__((sycl_kernel)) someSYCLKernel(Lambda lambda) {
+  lambda();
+}
+
+// Kernel wrapper
+__kernel wrapper(global int* a) {
+  Lambda lambda; // Actually lambda declaration doesn't have a name in AST
+  // Let the lambda have one captured field - accessor A. We need to init it
+  // with global pointer from arguments:
+  lambda.A.__init(a);
+  // Body of SYCL kernel from SYCL headers:
+  {
+    lambda();
+  }
+}
+
+```
+
+"Kernel wrapper" is generated by the compiler inside the Sema using AST nodes.
+
 ### SYCL support in the driver
 
 SYCL offload support in the driver is based on the clang driver concepts and