Add Apple GPU runners and run Metal tests again #18817

Open
ScottTodd opened this issue Oct 17, 2024 · 0 comments
Labels
hal/metal (Runtime Apple Metal HAL backend), infrastructure (Relating to build systems, CI, or testing), platform/macos (🍎 MacOS-specific build, execution, benchmarking, and deployment)

Comments

@ScottTodd
Member

Similar to #18814.

We used to have a Mac mini running macOS builds/tests, including tests of the Metal HAL. Right now we use standard GitHub-hosted runners for macOS, which include x86-64 (macos-12 and macos-13) and arm64 M1 (macos-14) machines. The GitHub-hosted runners do not expose the GPU (at least on arm64?):

Nested-virtualization and Metal Performance Shaders (MPS) are not supported due to the limitation of Apple's Virtualization Framework.

We could start with a single self-hosted runner handling nightly jobs with unit tests and larger test suites, but we should aim for presubmit testing of at least the Metal unit tests.
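The rollout described above could be sketched as a small selection function. This is only an illustration: the trigger names, the suite names, and the `has_apple_gpu` flag are hypothetical and are not taken from the IREE CI configuration.

```python
# Hypothetical sketch: trigger and suite names are illustrative only.
NIGHTLY = "nightly"
PRESUBMIT = "presubmit"


def metal_suites_for(trigger: str, has_apple_gpu: bool) -> list:
    """Return the Metal test suites to run for a given CI trigger."""
    if not has_apple_gpu:
        # A runner that does not expose the GPU cannot run Metal tests.
        return []
    if trigger == NIGHTLY:
        # Starting point: nightly unit tests plus the larger test suites.
        return ["metal_unit_tests", "metal_large_test_suites"]
    if trigger == PRESUBMIT:
        # Goal: at least the Metal unit tests on every pull request.
        return ["metal_unit_tests"]
    raise ValueError(f"unknown trigger: {trigger!r}")
```

Keeping the GPU check separate from the trigger means the same function covers both GitHub-hosted runners (no Metal tests) and a self-hosted runner.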

ScottTodd added the hal/metal, infrastructure, and platform/macos labels Oct 17, 2024
ScottTodd added a commit that referenced this issue Jan 31, 2025
…P. (#19524)

This switches the ONNX model compile->run correctness tests from running
only on CPU to also running on GPU using the Vulkan and HIP APIs. We
could also run on CUDA with #18814 and Metal with #18817.

These new tests will help guard against regressions to full models, at
least when using default flags. I'm planning on adding models coming
from other frameworks (such as [LiteRT
Models](https://github.com/iree-org/iree-test-suites/tree/main/litert_models))
in future PRs.

As these tests will run on every pull request and commit, I'm starting
the test list with all tests that are passing on our current set of
runners, with no (strict _or_ loose) XFAILs. The full set of tests will
be run nightly in https://github.com/iree-org/iree-test-suites using
nightly IREE releases... once we have runners with GPUs available in
that repository.

See also iree-org/iree-test-suites#65 and
iree-org/iree-test-suites#6.

## Sample logs

I have not done much triage on the test failures, but it does seem like
Vulkan pass rates are substantially lower than CPU and ROCm. Test
reports, including logs for all failures, are currently published as
artifacts on actions runs in iree-test-suites, such as
https://github.com/iree-org/iree-test-suites/actions/runs/12794322266.
We could also archive test reports somewhere like
https://github.com/nod-ai/e2eshark-reports and/or host the test reports
on a website like
https://nod-ai.github.io/shark-ai/llm/sglang/index.html?sort=result.

### CPU


https://github.com/iree-org/iree/actions/runs/12797886622/job/35681117085?pr=19524#step:8:395

```
============================== slowest durations ===============================
39.46s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[vgg/model/vgg19-7.onnx]
13.39s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
13.25s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
12.48s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
11.93s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
11.49s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
11.28s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]
11.26s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
9.14s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]
7.73s call     tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]
7.61s call     tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/gender_googlenet.onnx]
7.57s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]
7.27s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
4.86s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
4.61s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]
4.58s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]
3.08s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[squeezenet/model/squeezenet1.0-9.onnx]
2.02s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
1.90s call     tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]
================== 19 passed, 18 skipped in 184.96s (0:03:04) ==================
```

### ROCm


https://github.com/iree-org/iree/actions/runs/12797886622/job/35681117629?pr=19524#step:8:344

```
============================== slowest durations ===============================
9.40s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]
9.15s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
9.05s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
8.73s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
7.95s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]
7.94s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
7.81s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
7.13s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
6.95s call     tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]
5.15s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]
4.52s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/googlenet/model/googlenet-12.onnx]
3.55s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
3.12s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]
2.57s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
2.48s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]
2.21s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx]
1.36s call     tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]
0.95s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
============ 17 passed, 19 skipped, 1 xfailed in 100.10s (0:01:40) =============
```

### Vulkan


https://github.com/iree-org/iree/actions/runs/12797886622/job/35681118044?pr=19524#step:8:216

```
============================== slowest durations ===============================
13.10s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
12.97s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
12.40s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
12.22s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
9.07s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
8.09s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
6.04s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
2.93s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx]
1.86s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
0.90s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
============= 9 passed, 27 skipped, 1 xfailed in 79.62s (0:01:19) ==============
```
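To compare the three backends at a glance, the pytest summary lines above can be reduced to pass fractions. This is a minimal sketch, not part of the CI. The helper names are made up for illustration, and it assumes that every collected test (passed, skipped, xfailed, or failed) counts toward the denominator, which is only one possible definition of "pass rate".

```python
import re

# Summary lines copied from the CPU, ROCm, and Vulkan logs above.
SUMMARIES = {
    "CPU": "19 passed, 18 skipped in 184.96s (0:03:04)",
    "ROCm": "17 passed, 19 skipped, 1 xfailed in 100.10s (0:01:40)",
    "Vulkan": "9 passed, 27 skipped, 1 xfailed in 79.62s (0:01:19)",
}

OUTCOME_RE = re.compile(r"(\d+) (passed|skipped|xfailed|failed)")


def parse_counts(summary: str) -> dict:
    """Extract '<count> <outcome>' pairs from a pytest summary line."""
    return {outcome: int(count) for count, outcome in OUTCOME_RE.findall(summary)}


def pass_fraction(summary: str) -> float:
    """Fraction of collected tests that passed (assumed definition)."""
    counts = parse_counts(summary)
    total = sum(counts.values())
    return counts.get("passed", 0) / total


if __name__ == "__main__":
    for backend, summary in SUMMARIES.items():
        print(f"{backend}: {pass_fraction(summary):.1%}")
```

Each run collected 37 tests, so under this definition the fractions are 19/37, 17/37, and 9/37 for CPU, ROCm, and Vulkan respectively.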

ci-exactly: build_packages, test_onnx
ita9naiwa pushed a commit to ita9naiwa/iree that referenced this issue Feb 4, 2025
…P. (iree-org#19524)

Signed-off-by: Hyunsung Lee <ita9naiwa@gmail.com>