Add docker based package builder and switch to system compiler #51

Merged
merged 10 commits into from
Nov 13, 2024

Collaborator

@xinyazhang xinyazhang commented Oct 29, 2024

Major Changes:

  • Add a dockerfile/build.sh script that builds AOTriton from the official AlmaLinux 8 Docker image.
  • Deprecate hipcc, since GCC >= 9 / Clang >= 10 is capable of compiling the C++ part.
    • hipcc is not compatible with SCL. scl enable gcc-toolset-13 "/opt/rocm/bin/hipcc -v" is supposed to show "Found candidate GCC installation", but nothing is displayed, and this eventually triggers "Missing cmath/libcxx in ROCm 5.3.0" (ROCm#1843) during compilation.

Minor Changes:

  • In pyaotriton, register hipError_t locally to avoid the "hipError_t is already registered" bug.

python -m pip install cmake ninja wheel
RUN mkdir /root/build

ARG amdgpu_installer="https://repo.radeon.com/amdgpu-install/6.2.2/el/8.10/amdgpu-install-6.2.60202-1.el8.noarch.rpm"
Contributor

This is a maintenance headache. We need to make this more generic so that it is usable for any ROCm release.

Contributor

Discussed this offline: this is the default value. The intention is for CI jobs or users to provide a link to the amdgpu installer for their desired ROCm version.
We could use logic similar to this file:
https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/files/install_kdb_files_for_pytorch_wheels.sh
to keep the default argument always up-to-date.
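
One way to keep the default in step with a chosen ROCm release would be to derive the installer URL from the version string. The sketch below is only an illustration: the function name amdgpu_installer_url is hypothetical, and the URL pattern is inferred solely from the two installer links that appear in this thread (6.2.2 and 6.2.4 for EL 8.10). Whether other releases follow the same layout is an assumption, and this is not the logic of the script linked above.

```python
def amdgpu_installer_url(rocm_version: str, el_release: str = "8.10") -> str:
    """Build an amdgpu-install RPM URL for a ROCm release such as "6.2.4".

    Assumption: the package build number is major*10000 + minor*100 + patch,
    which matches the 6.2.2 -> 60202 and 6.2.4 -> 60204 links in this thread.
    Releases that use a different layout are not handled.
    """
    major, minor, patch = (int(part) for part in rocm_version.split("."))
    build = major * 10000 + minor * 100 + patch
    el_major = el_release.split(".")[0]
    return (
        f"https://repo.radeon.com/amdgpu-install/{rocm_version}/el/{el_release}/"
        f"amdgpu-install-{major}.{minor}.{build}-1.el{el_major}.noarch.rpm"
    )
```

The resulting string could then be supplied in place of the default, for example through the AMDGPU_INSTALLER environment variable used in the build commands in this thread.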

cd /root/build
scl enable gcc-toolset-13 "cd aotriton/third_party/triton/python; python setup.py develop --user"
scl enable gcc-toolset-13 "mkdir -p aotriton/build; cd aotriton/build; cmake .. -DCMAKE_PREFIX_PATH=/opt/rocm -DPYTHON_EXECUTABLE=/usr/bin/python3.11 -DCMAKE_INSTALL_PREFIX=installed_dir/aotriton -DCMAKE_BUILD_TYPE=Release -DAOTRITON_GPU_BUILD_TIMEOUT=0 \"-DTARGET_GPUS=${AOTRITON_TARGET_GPUS}\" -DAOTRITON_NO_PYTHON=ON -G Ninja && ninja install"
rocmver=$(scl enable gcc-toolset-13 "cpp -I/opt/rocm/include /input/print_rocm_version.h"|tail -n 1|sed 's/ //g')
Contributor

@naromero77amd This is another interesting and even more compact way of getting the ROCM_VERSION* defines. Something to consider if we want to make our LoadHIP.cmake logic even more compact.
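
For readers unfamiliar with the pipeline on the rocmver line: the preprocessor output is cut down to its last line, and every space on that line is removed. A minimal Python sketch of that post-processing step follows. The sample preprocessor output is made up for illustration, since the contents of print_rocm_version.h are not shown in this thread, and last_line_without_spaces is a hypothetical name.

```python
def last_line_without_spaces(preprocessor_output: str) -> str:
    """Mimic `tail -n 1 | sed 's/ //g'` on the text printed by cpp."""
    lines = preprocessor_output.splitlines()
    if not lines:
        return ""
    # sed 's/ //g' deletes space characters only, so tabs are left alone.
    return lines[-1].replace(" ", "")


# Made-up cpp output: two linemarkers followed by expanded version tokens.
sample = '# 1 "/input/print_rocm_version.h"\n# 1 "<built-in>"\n6 . 2\n'
print(last_line_without_spaces(sample))  # prints 6.2
```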

if(AOTRITON_BUILD_FOR_TUNING)
target_compile_definitions(aotriton_v2 PRIVATE -DAOTRITON_BUILD_FOR_TUNING=1)
else(AOTRITON_BUILD_FOR_TUNING)
target_compile_definitions(aotriton_v2 PRIVATE -DAOTRITON_BUILD_FOR_TUNING=0)
endif(AOTRITON_BUILD_FOR_TUNING)
-target_link_libraries(aotriton_v2 lzma_interface)
+target_link_libraries(aotriton_v2 PRIVATE lzma_interface)
+target_link_libraries(aotriton_v2 PUBLIC hip::host hip::amdhip64)
Contributor

@jithunnair-amd jithunnair-amd Oct 30, 2024

We need to introduce an explicit dependency on the HIP targets because we are not using the hipcc compiler as the default compiler anymore: https://github.com/ROCm/aotriton/pull/51/files#diff-1e7de1ae2d059d21e1dd75d5812d5a34b0222cef273b7c3a2af62eb747f9d20aL35

@@ -31,9 +33,6 @@ if(NOT AOTRITON_NO_PYTHON)
add_subdirectory(third_party/pybind11)
endif()

# Must be after pybind11
set(CMAKE_CXX_COMPILER hipcc)
Contributor

We do not need to use hipcc since we are not compiling any kernels, so a host compiler should be good enough.

@xinyazhang xinyazhang force-pushed the xinyazhang/manylinux_2_28-dockerfile branch from 529395c to 2147593 Compare October 30, 2024 18:13
@@ -168,13 +168,14 @@ target_include_directories(aotriton_v2 PUBLIC ${CMAKE_CURRENT_LIST_DIR}/../inclu
target_include_directories(aotriton_v2 PUBLIC ${CMAKE_BINARY_DIR}/include) # for <build>/include/aotriton/config.h. Code should use <aotriton/config.h>
target_include_directories(aotriton_v2 PRIVATE ${CMAKE_CURRENT_BINARY_DIR})
target_include_directories(aotriton_v2 PRIVATE ${CMAKE_CURRENT_LIST_DIR}/../third_party/incbin)
target_compile_options(aotriton_v2 PRIVATE -fPIC --no-offload-arch=all)
Contributor

Removed since we're not using hipcc anymore

.cu_seqlens_q = &cu_seqlens_q,
.cu_seqlens_k = &cu_seqlens_k,
.num_seqlens = num_seqlens,
.max_seqlen_q = max_seqlen_q,
.max_seqlen_k = max_seqlen_k,
.head_dim = head_size,
Contributor

@jithunnair-amd jithunnair-amd Oct 30, 2024

Required because we're switching to gcc13, which is stricter about the order of struct fields in designated initializers.

@xinyazhang xinyazhang changed the title [Queued PR] Add docker based package builder [Queued PR] Add docker based package builder and switch to system compiler Oct 30, 2024
@@ -8,7 +8,7 @@ namespace py = pybind11;
namespace pyaotriton {
#define EV(name) value(#name, name)
void def_hipruntime(py::module_& m) {
-py::enum_<hipError_t>(m, "hipError_t")
+py::enum_<hipError_t>(m, "hipError_t", py::module_local())
Contributor

If this is not due to the compiler switch or any other changes in this PR, then it would be better to separate this into its own PR with reasoning.

@xinyazhang
Collaborator Author

Dockerfile tested locally and can build 0.7.1b.
Compiler switch tested locally and can run the test pytest ../test/test_backward.py -k 1.2 -v -x

@jithunnair-amd
Contributor

@ethanwee1 Can you please validate this PR by trying to build aotriton 0.7b and 0.7.1b using the Dockerfile in this PR? Please make sure that it generates a tarball with the following files:

$ tar tvf output/aotriton-0.7.1b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz 
drwxr-xr-x root/root         0 2024-10-29 09:11 aotriton/
drwxr-xr-x root/root         0 2024-10-29 09:11 aotriton/lib/
-rw-r--r-- root/root 382452768 2024-10-29 09:11 aotriton/lib/libaotriton_v2.so
drwxr-xr-x root/root         0 2024-10-29 09:11 aotriton/include/
drwxr-xr-x root/root         0 2024-10-29 09:11 aotriton/include/aotriton/
drwxr-xr-x root/root         0 2024-10-29 09:11 aotriton/include/aotriton/_internal/
-rw-r--r-- root/root      1490 2024-10-29 08:48 aotriton/include/aotriton/_internal/triton_kernel.h
-rw-r--r-- root/root       592 2024-10-29 08:48 aotriton/include/aotriton/_internal/util.h
-rw-r--r-- root/root       566 2024-10-29 08:48 aotriton/include/aotriton/cpp_tune.h
-rw-r--r-- root/root       422 2024-10-29 08:48 aotriton/include/aotriton/dtypes.h
-rw-r--r-- root/root      5435 2024-10-29 08:48 aotriton/include/aotriton/flash.h
-rw-r--r-- root/root       695 2024-10-29 08:48 aotriton/include/aotriton/runtime.h
-rw-r--r-- root/root      3316 2024-10-29 08:48 aotriton/include/aotriton/util.h
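
The check requested here could also be scripted. A minimal sketch, assuming the expected member list is exactly the file entries in the listing above; missing_members and EXPECTED_FILES are names chosen for illustration and are not part of this PR. The demo at the end runs on an in-memory archive, so no real build output is needed.

```python
import io
import os
import tarfile

# File members copied from the `tar tvf` listing above (directories omitted).
EXPECTED_FILES = frozenset({
    "aotriton/lib/libaotriton_v2.so",
    "aotriton/include/aotriton/_internal/triton_kernel.h",
    "aotriton/include/aotriton/_internal/util.h",
    "aotriton/include/aotriton/cpp_tune.h",
    "aotriton/include/aotriton/dtypes.h",
    "aotriton/include/aotriton/flash.h",
    "aotriton/include/aotriton/runtime.h",
    "aotriton/include/aotriton/util.h",
})


def missing_members(tarball, expected=EXPECTED_FILES):
    """Return the expected file paths that are absent from the tarball, sorted.

    `tarball` is a filesystem path or a seekable binary file object.
    """
    if isinstance(tarball, (str, os.PathLike)):
        archive = tarfile.open(tarball, mode="r:*")
    else:
        archive = tarfile.open(fileobj=tarball, mode="r:*")
    with archive:
        present = {m.name for m in archive.getmembers() if m.isfile()}
    return sorted(set(expected) - present)


# Demo: an in-memory archive that deliberately lacks flash.h.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
    for name in sorted(EXPECTED_FILES - {"aotriton/include/aotriton/flash.h"}):
        tf.addfile(tarfile.TarInfo(name), io.BytesIO())
buf.seek(0)
print(missing_members(buf))  # prints ['aotriton/include/aotriton/flash.h']
```

An empty list means every expected file is present. Extra members in the archive are ignored by this check.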

@xinyazhang xinyazhang force-pushed the xinyazhang/storagev2 branch from 43f5f33 to 1ff178e Compare October 30, 2024 18:47
@xinyazhang xinyazhang force-pushed the xinyazhang/manylinux_2_28-dockerfile branch from 2147593 to 7ca74d9 Compare October 30, 2024 18:48
@xinyazhang xinyazhang force-pushed the xinyazhang/storagev2 branch from 1ff178e to 801842e Compare October 30, 2024 23:13
@xinyazhang xinyazhang force-pushed the xinyazhang/manylinux_2_28-dockerfile branch from 7ca74d9 to 63fe1cc Compare October 30, 2024 23:13
@xinyazhang xinyazhang force-pushed the xinyazhang/storagev2 branch from 801842e to 93299bd Compare October 30, 2024 23:27
@xinyazhang xinyazhang force-pushed the xinyazhang/manylinux_2_28-dockerfile branch from 63fe1cc to ca18036 Compare October 30, 2024 23:27
@ethanwee1

ethanwee1 commented Oct 31, 2024

Validated 0.7.1b and 0.7b after following build.sh

aotriton-0.7.1b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz 
aotriton-0.7b-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz

@xinyazhang xinyazhang force-pushed the xinyazhang/storagev2 branch from 93299bd to 0dc5705 Compare November 7, 2024 18:08
@xinyazhang xinyazhang force-pushed the xinyazhang/manylinux_2_28-dockerfile branch from ca18036 to 4357504 Compare November 7, 2024 20:24
@jithunnair-amd jithunnair-amd marked this pull request as ready for review November 7, 2024 21:48
@jithunnair-amd
Contributor

@xinyazhang Taking this out of Draft mode as it is ready to merge (even if queued) based on Ethan's validation

@xinyazhang xinyazhang marked this pull request as draft November 7, 2024 22:18
@xinyazhang
Collaborator Author

@jithunnair-amd Nope, a queued PR should not be taken out of draft, because it is based on previous work (for this PR specifically, it's #50).

@xinyazhang xinyazhang force-pushed the xinyazhang/manylinux_2_28-dockerfile branch 3 times, most recently from 3d075ee to e278d4a Compare November 13, 2024 17:35
@ethanwee1

ethanwee1 commented Nov 13, 2024

Tested with these commands to build
aotriton-e278d4a853170c7a9063cfe847419414cb7b62b6-manylinux_2_28_x86_64-rocm6.2-shared.tar.gz

Commands:

git clone https://github.com/ROCm/aotriton.git
cd aotriton/
git checkout xinyazhang/manylinux_2_28-dockerfile
cd dockerfile/
export AMDGPU_INSTALLER=https://repo.radeon.com/amdgpu-install/6.2.4/el/8.10/amdgpu-install-6.2.60204-1.el8.noarch.rpm
mkdir -p output
TRITON_LLVM_HASH="b5cc222d" bash build.sh input tmpfs output e278d4a853170c7a9063cfe847419414cb7b62b6 "MI300X;MI200" 2>&1 | tee buildlog2.log
tar tvf output/*.tar*

Output:
Size: 107MB
aotriton-e278d4a853170c7a9063cfe847419414cb7b62b6-manylinux_2_28_x86_64-rocm6.2-shared.txt
buildlog2.log

@jithunnair-amd jithunnair-amd mentioned this pull request Nov 13, 2024
@xinyazhang xinyazhang changed the base branch from xinyazhang/storagev2 to main November 13, 2024 19:29
@xinyazhang xinyazhang force-pushed the xinyazhang/manylinux_2_28-dockerfile branch from e278d4a to 8338d4c Compare November 13, 2024 19:30
@jithunnair-amd
Contributor

jithunnair-amd commented Nov 13, 2024

Some notable lines in the log:

  • -rw-r--r-- root/root 13064056 2024-11-13 18:01 aotriton/lib/libaotriton_v2.so: libaotriton_v2.so reduced to 13MB
  • aks2 files are generated for each GFX arch, e.g. aotriton/lib/aotriton.images/amd-gfx90a/flash/attn_fwd/FONLY__^bf16@16,False,128,False,False,False,0___MI200.aks2

@xinyazhang
Collaborator Author

13 MiB is also significant considering its functionality. I believe most of the size comes from the generated dispatching code.
I've added this to the keepbook, but there is no concrete plan to implement it.

@xinyazhang xinyazhang changed the title [Queued PR] Add docker based package builder and switch to system compiler Add docker based package builder and switch to system compiler Nov 13, 2024
@xinyazhang xinyazhang marked this pull request as ready for review November 13, 2024 19:50
@xinyazhang xinyazhang merged commit 3fe6ca6 into main Nov 13, 2024