Improve error handling, s3 mounting, distributed tests for axlearn
#488
Annotations
7 errors
nccl-tests / nccl-test (reduce_scatter_perf_mpi)
Canceling since a higher priority waiting request for 'NCCL on Kubernetes-sbosisio/axlearn_improvements' exists

nccl-tests / nccl-test (all_reduce_perf_mpi)
Canceling since a higher priority waiting request for 'NCCL on Kubernetes-sbosisio/axlearn_improvements' exists

nccl-tests / nccl-test (all_reduce_perf_mpi)
The operation was canceled.

nccl-tests / nccl-test (all_gather_perf_mpi)
Canceling since a higher priority waiting request for 'NCCL on Kubernetes-sbosisio/axlearn_improvements' exists

nccl-tests / nccl-test (all_gather_perf_mpi)
The operation was canceled.

nccl-tests / nccl-test (broadcast_perf_mpi)
Canceling since a higher priority waiting request for 'NCCL on Kubernetes-sbosisio/axlearn_improvements' exists

nccl-tests / nccl-test (broadcast_perf_mpi)
The operation was canceled.

Artifacts
Produced during runtime
Name | Size
---|---
artifact-mpi-operator-compatible-base-build-amd64 | 638 Bytes