Switch build_test_all_bazel to new dockerfile and runners. #18533

Merged
merged 11 commits into iree-org:main from the infra-bazel-docker branch
Sep 19, 2024

Conversation

ScottTodd
Member

@ScottTodd ScottTodd commented Sep 16, 2024

Progress on #15332 and #18238. Fixes #16915.

This switches the build_test_all_bazel CI job from the gcr.io/iree-oss/base-bleeding-edge Dockerfile, which used GCP for remote cache storage, to the ghcr.io/iree-org/cpubuilder_ubuntu_jammy_x86_64 Dockerfile, which has no remote cache.

With no cache, this job takes between 18 and 25 minutes. Early testing also showed times as long as 60 minutes when the Docker command and the runner were not configured well for Bazel (e.g. not using a RAM disk).
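
As a rough sketch of the RAM disk point: one way to keep Bazel's output tree off slower storage inside a container is to mount a tmpfs and pass it as Bazel's output root. The image name comes from this PR. The helper name, mount path, size, and target pattern are assumptions for illustration, not the configuration this workflow actually uses. The printed command also assumes bazel is on the PATH and the source tree is available inside the container, neither of which this sketch sets up.

```shell
# Hypothetical helper: print a docker command that gives Bazel a RAM-backed
# output directory. The mount path and size defaults are illustrative only.
bazel_ramdisk_cmd() {
  image="$1"
  ramdisk="${2:-/mnt/bazel_ramdisk}"
  size="${3:-32g}"
  # --tmpfs mounts an in-memory filesystem; "exec" lets Bazel run tools from it.
  # --output_user_root is a Bazel startup option, so it goes before "build".
  printf '%s\n' "docker run --rm --tmpfs ${ramdisk}:rw,exec,size=${size} ${image} bazel --output_user_root=${ramdisk} build //..."
}

bazel_ramdisk_cmd ghcr.io/iree-org/cpubuilder_ubuntu_jammy_x86_64
```

The tmpfs size has to fit the whole output tree, so whether this helps depends on how much memory the runner has to spare.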

The job is also moved from running on every commit to running on a nightly schedule while we evaluate how frequently it breaks and how long it takes to run. If we set up a new remote cache (https://bazel.build/remote/caching), we can move it back to running more regularly.
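
For context on what a new remote cache would involve on the Bazel side, here is a minimal sketch of the flags a job could pass once a cache server exists. `--remote_cache` and `--remote_upload_local_results` are standard Bazel options. The helper name, the `ro`/`rw` mode argument, and the URL are hypothetical placeholders, not anything set up by this PR.

```shell
# Hypothetical helper: print Bazel remote cache flags for a given cache URL.
# Mode "rw" lets the job upload results; any other mode only reads the cache.
remote_cache_flags() {
  cache_url="$1"
  mode="${2:-ro}"
  if [ "$mode" = "rw" ]; then
    upload=true
  else
    upload=false
  fi
  printf '%s %s\n' "--remote_cache=${cache_url}" "--remote_upload_local_results=${upload}"
}

# Example with a placeholder URL (not a real server):
remote_cache_flags https://cache.example.com/bazel ro
```

A common split is to let trusted post-merge jobs write to the cache while pull request jobs only read from it, which is what the mode argument models here.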

@ScottTodd ScottTodd added the infrastructure label (Relating to build systems, CI, or testing) Sep 16, 2024
@ScottTodd
Member Author

Ooooooof #16915 is hurting compile time a lot here. Shouldn't need to build all the LLVM targets.

@ScottTodd
Member Author

Got a successful build... only took 55 minutes on a 96 core machine 🤦‍♂️. Could try updating Bazel and/or using a ramdisk. This might be the old issue where Bazel performance is pessimized on systems with high core counts. My local build (similar high power machine) finished in closer to 20-30 minutes.

We can also prune build deps, or try getting a new remote cache spun up.

OR just drop Bazel support / move it down a support tier to nightly builds. Ehhhh, no good options, just different compromises and they all take time.

@ScottTodd ScottTodd marked this pull request as ready for review September 18, 2024 22:55
@ScottTodd ScottTodd requested a review from saienduri September 18, 2024 22:55
env:
  IREE_CUDA_DEPS_DIR: /usr/local/iree_cuda_deps
run: |
  ./build_tools/bazel/install_bazelisk.sh 1.21.0
Member Author

I considered installing Bazel or Bazelisk in the cpubuilder dockerfile in iree-org/base-docker-images#9. Decided against it for now, to keep the dockerfile simpler.

Comment on lines +50 to +51
cp ./build_tools/docker/context/fetch_cuda_deps.sh /usr/local/bin
/usr/local/bin/fetch_cuda_deps.sh ${IREE_CUDA_DEPS_DIR}
Member Author

These CUDA deps were part of the base-bleeding-edge dockerfile:

######## IREE CUDA DEPS ########
ENV IREE_CUDA_DEPS_DIR="/usr/local/iree_cuda_deps"
COPY build_tools/docker/context/fetch_cuda_deps.sh /usr/local/bin
RUN /usr/local/bin/fetch_cuda_deps.sh "${IREE_CUDA_DEPS_DIR}"
##############

See the notes in build_tools/bazel/workspace.bzl too.

I don't care enough right now to refactor how the Bazel build handles CUDA, so I'm choosing the path of least resistance and putting more logic in the workflow.

@ScottTodd ScottTodd requested review from marbre and removed request for antiagainst and qedawkins September 18, 2024 23:00
Comment on lines +9 to +16
on:
  pull_request:
    paths:
      - ".github/workflows/ci_linux_x64_bazel.yml"
  schedule:
    # Weekday mornings at 09:15 UTC = 01:15 PST (UTC - 8).
    - cron: "15 9 * * 1-5"
  workflow_dispatch:
Member Author

We discussed how frequently this job should run on Discord.

We settled on a compromise for now:

  • Run nightly instead of on every commit
  • Look at setting up a remote build cache on the same nginx server that we use for CMake's ccache/sccache storage (using WebDAV)
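
The time conversion in the cron comment on this workflow ("15 9 * * 1-5", 09:15 UTC on weekdays) can be checked with a small sketch. It assumes GNU date for the -d option. The function name and the sample date are chosen for illustration, and TZ=PST8 is a POSIX time zone string with a fixed UTC-8 offset and no daylight saving rule.

```shell
# Convert the cron trigger time (09:15 UTC) on a given date to fixed-offset PST.
# Requires GNU date; LC_ALL=C keeps the weekday abbreviation in English.
cron_time_in_pst() {
  TZ=PST8 LC_ALL=C date -d "$1 09:15 UTC" '+%a %H:%M %Z'
}

cron_time_in_pst 2024-09-16   # a Monday; prints "Mon 01:15 PST"
```

GitHub Actions evaluates cron schedules in UTC, so while daylight saving time is in effect the same trigger lands at 02:15 PDT on the local clock.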

Collaborator

@saienduri saienduri left a comment

LGTM

@ScottTodd ScottTodd merged commit 0636abd into iree-org:main Sep 19, 2024
34 checks passed
@ScottTodd ScottTodd deleted the infra-bazel-docker branch September 19, 2024 01:58
ScottTodd added a commit that referenced this pull request Sep 19, 2024
Follow-up to #18533. Build configuration errors are too easy to make without this running on presubmit. The long runtime of this job is unfortunate (20-25 minutes when it could be 2-5 minutes), so we're looking at setting up a new remote build cache server: #18557.

ci-exactly: linux_x64_bazel
Labels
infrastructure Relating to build systems, CI, or testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bazel build includes unused LLVM targets
2 participants