Improve error handling, s3 mounting, distributed tests for axlearn #488

Triggered via pull request March 14, 2025 10:45
Status Cancelled
Total duration 12m 21s
Artifacts 1

nccl-k8s.yaml

on: pull_request
nccl-tests  /  ...  /  build-mpi-operator-compatible-base
2m 55s
Matrix: nccl-tests / nccl-test

Annotations

7 errors
nccl-tests / nccl-test (reduce_scatter_perf_mpi)
Canceling since a higher priority waiting request for 'NCCL on Kubernetes-sbosisio/axlearn_improvements' exists
nccl-tests / nccl-test (all_reduce_perf_mpi)
Canceling since a higher priority waiting request for 'NCCL on Kubernetes-sbosisio/axlearn_improvements' exists
nccl-tests / nccl-test (all_reduce_perf_mpi)
The operation was canceled.
nccl-tests / nccl-test (all_gather_perf_mpi)
Canceling since a higher priority waiting request for 'NCCL on Kubernetes-sbosisio/axlearn_improvements' exists
nccl-tests / nccl-test (all_gather_perf_mpi)
The operation was canceled.
nccl-tests / nccl-test (broadcast_perf_mpi)
Canceling since a higher priority waiting request for 'NCCL on Kubernetes-sbosisio/axlearn_improvements' exists
nccl-tests / nccl-test (broadcast_perf_mpi)
The operation was canceled.

Artifacts

Produced during runtime
Name: artifact-mpi-operator-compatible-base-build-amd64
Size: 638 Bytes