Add Apple GPU runners and run Metal tests again #18817
Labels
hal/metal
Runtime Apple Metal HAL backend
infrastructure
Relating to build systems, CI, or testing
platform/macos 🍎
MacOS-specific build, execution, benchmarking, and deployment
Comments
ScottTodd added a commit that referenced this issue Jan 31, 2025
…P. (#19524)

This switches from running ONNX model compile->run correctness tests on only CPU to now run on GPU using the Vulkan and HIP APIs. We could also run on CUDA with #18814 and Metal with #18817.

These new tests will help guard against regressions to full models, at least when using default flags. I'm planning on adding models coming from other frameworks (such as [LiteRT Models](https://github.com/iree-org/iree-test-suites/tree/main/litert_models)) in future PRs.

As these tests will run on every pull request and commit, I'm starting the test list with all tests that are passing on our current set of runners, with no (strict _or_ loose) XFAILs. The full set of tests will be run nightly in https://github.com/iree-org/iree-test-suites using nightly IREE releases... once we have runners with GPUs available in that repository. See also iree-org/iree-test-suites#65 and iree-org/iree-test-suites#6.

## Sample logs

I have not done much triage on the test failures, but it does seem like Vulkan pass rates are substantially lower than CPU and ROCm. Test reports, including logs for all failures, are currently published as artifacts on actions runs in iree-test-suites, such as https://github.com/iree-org/iree-test-suites/actions/runs/12794322266. We could also archive test reports somewhere like https://github.com/nod-ai/e2eshark-reports and/or host the test reports on a website like https://nod-ai.github.io/shark-ai/llm/sglang/index.html?sort=result.

### CPU

https://github.com/iree-org/iree/actions/runs/12797886622/job/35681117085?pr=19524#step:8:395

```
============================== slowest durations ===============================
39.46s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[vgg/model/vgg19-7.onnx]
13.39s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
13.25s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
12.48s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
11.93s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
11.49s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
11.28s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]
11.26s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
9.14s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]
7.73s call tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]
7.61s call tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/gender_googlenet.onnx]
7.57s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]
7.27s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
4.86s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
4.61s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]
4.58s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]
3.08s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[squeezenet/model/squeezenet1.0-9.onnx]
2.02s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
1.90s call tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]
================== 19 passed, 18 skipped in 184.96s (0:03:04) ==================
```

### ROCm

https://github.com/iree-org/iree/actions/runs/12797886622/job/35681117629?pr=19524#step:8:344

```
============================== slowest durations ===============================
9.40s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]
9.15s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
9.05s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
8.73s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
7.95s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]
7.94s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
7.81s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
7.13s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
6.95s call tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]
5.15s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]
4.52s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/googlenet/model/googlenet-12.onnx]
3.55s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
3.12s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]
2.57s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
2.48s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]
2.21s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx]
1.36s call tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]
0.95s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
============ 17 passed, 19 skipped, 1 xfailed in 100.10s (0:01:40) =============
```

### Vulkan

https://github.com/iree-org/iree/actions/runs/12797886622/job/35681118044?pr=19524#step:8:216

```
============================== slowest durations ===============================
13.10s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
12.97s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
12.40s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
12.22s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
9.07s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
8.09s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
6.04s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
2.93s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx]
1.86s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
0.90s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
============= 9 passed, 27 skipped, 1 xfailed in 79.62s (0:01:19) ==============
```

ci-exactly: build_packages, test_onnx
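The pass-rate comparison mentioned under "Sample logs" can be reproduced from the three pytest summary lines. The following is a rough sketch only: the helper names `parse_summary` and `pass_rate` are made up for illustration, and defining "pass rate" as passed tests over all collected tests (skipped and xfailed included) is an assumption, not something the commit message specifies.

```python
import re

# Summary lines copied from the CPU, ROCm, and Vulkan logs in this commit message.
SUMMARIES = {
    "CPU": "19 passed, 18 skipped in 184.96s (0:03:04)",
    "ROCm": "17 passed, 19 skipped, 1 xfailed in 100.10s (0:01:40)",
    "Vulkan": "9 passed, 27 skipped, 1 xfailed in 79.62s (0:01:19)",
}


def parse_summary(line: str) -> dict[str, int]:
    """Extract outcome counts, e.g. {'passed': 19, 'skipped': 18}."""
    pairs = re.findall(r"(\d+) (passed|skipped|xfailed|failed)\b", line)
    return {outcome: int(count) for count, outcome in pairs}


def pass_rate(counts: dict[str, int]) -> float:
    """Passed tests divided by all collected tests (assumed definition)."""
    total = sum(counts.values())
    return counts.get("passed", 0) / total if total else 0.0


if __name__ == "__main__":
    for backend, line in SUMMARIES.items():
        counts = parse_summary(line)
        passed = counts.get("passed", 0)
        total = sum(counts.values())
        print(f"{backend}: {passed}/{total} passed ({pass_rate(counts):.1%})")
```

Under that definition each backend collected 37 tests, giving roughly 51.4% for CPU, 45.9% for ROCm, and 24.3% for Vulkan. Skipped tests are not necessarily failures, so these numbers only show how many models are currently enabled and passing per backend.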
ita9naiwa pushed a commit to ita9naiwa/iree that referenced this issue Feb 4, 2025
…P. (iree-org#19524) Same commit message and sample logs as the Jan 31, 2025 commit above, with references written as iree-org#18814 and iree-org#18817. Signed-off-by: Hyunsung Lee <ita9naiwa@gmail.com>
Similar to #18814.
We used to have a Mac mini running macOS builds/tests, including tests of the Metal HAL. Right now we use standard GitHub-hosted runners for macOS, which include x86_64 (`macos-12` and `macos-13`) and M1 (`macos-14`) machines. The GitHub-hosted runners do not expose the GPU (at least on arm64?).

We could start with a single self-hosted runner handling nightly jobs with unit tests and larger test suites, but we should aim for presubmit testing of at least the Metal unit tests.