Parameterize ONNX model tests. #65
Conversation
I'm looking for some early feedback, @zjgarvey or others. Early testing shows that running these tests on HIP would have caught some recent regressions. The specific mechanics used for setting flags, choosing which tests to run, and checking or reporting which stages passed/failed can be implemented in multiple ways.
```python
parser.addoption(
    "--test-config-file",
    type=Path,
    default=default_config_file,
    help="Config JSON file used to parameterize test cases",
)
```
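For context, the option registered above would typically be read back in a conftest fixture via `request.config.getoption`. A minimal sketch, with an illustrative fixture name and JSON loading that is not necessarily the PR's actual code:

```python
import json
from pathlib import Path

import pytest


@pytest.fixture(scope="session")
def test_config(request):
    # Hypothetical fixture: load the JSON config that parameterizes the test run.
    config_path: Path = request.config.getoption("--test-config-file")
    with open(config_path, "rt") as f:
        return json.load(f)
```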
- https://github.com/nod-ai/SHARK-TestSuite/blob/main/alt_e2eshark/run.py uses `--device=`, `--backend=`, `--target-chip=`, and `--test-filter=` arguments. Arbitrary flags are not supported, and test expectations are also not supported, so there is no way to directly signal if tests are unexpectedly passing or failing.
Here are some ideas, adding complexity in the conftest file but making it more flexible:
- Add more options here to use instead of `--test-config-file`, like `--iree-compile-flags`
- Make every option a flag, then use a flagfile pattern like https://stackoverflow.com/a/27434050 so "config.json" is just a collection of regular flags
  - Not sure how that would work with the lists of tests with expectations... I like having the full list of tests that will run be sorted and not split between groups like in the onnx op tests
- Could load a .py file that has the list of tests and statuses, or even just implements the `pytest_collection_modifyitems` hook (a minimal sketch follows this list)
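For that last idea, here is a rough sketch of what such a `pytest_collection_modifyitems` hook could look like. The `EXPECTED_OUTCOMES` mapping and its single entry are hypothetical placeholders, not names from this PR:

```python
import pytest

# Hypothetical mapping of test node IDs to expected outcomes.
EXPECTED_OUTCOMES = {
    "tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]": "pass",
}


def pytest_collection_modifyitems(config, items):
    for item in items:
        outcome = EXPECTED_OUTCOMES.get(item.nodeid, "skip")
        if outcome == "skip":
            item.add_marker(pytest.mark.skip(reason="Not in the configured test list"))
        elif outcome != "pass":
            item.add_marker(pytest.mark.xfail(reason=f"Expected to {outcome}"))
```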
Using https://github.com/gsnedders/pytest-expect or https://github.com/projectcaluma/pytest-xfaillist is also an option. Both support updating the xfail files via an `--update-xfail` or `--generate-xfaillist` option. Neither supports xfail reasons, but that's not critical.

https://github.com/projectcaluma/pytest-xfaillist seems to only support a hardcoded `xfails.list` file name next to the config root (source here), which wouldn't work with separate lists of xfails depending on the configuration (e.g. backend choice). https://github.com/gsnedders/pytest-expect does allow specifying a file with `--xfail-file`.
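Going by the flags mentioned above, pytest-expect usage would presumably look something like this (the xfail file name is illustrative):

```
# Record the current pass/fail state into an xfail file.
pytest --update-xfail --xfail-file=xfails_gpu_rocm.txt

# Later runs consume that file so known failures report as xfail.
pytest --xfail-file=xfails_gpu_rocm.txt
```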
What I have right now in this "config JSON file" groups these three items:

1. The flags to use when compiling and running, allowing you to choose between CPU or GPU, for example
2. The list of tests to run
3. The list of tests that are expected to fail (and how)

We'll always want a custom implementation for (1). For (2), https://community.lambdatest.com/t/how-to-run-pytest-tests-from-a-list-of-test-paths/31682/2 has some answers but I don't see an existing plugin or standard convention. For (3), we could use one of those projects.
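For concreteness, a config file grouping those three items might look roughly like this. The flag-list keys and flag values are placeholders I'm making up here; only `tests_and_expected_outcomes` and the outcome strings match what's discussed elsewhere in this PR:

```json
{
  "iree_compile_flags": ["--iree-hal-target-backends=rocm"],
  "iree_run_module_flags": ["--device=hip"],
  "tests_and_expected_outcomes": {
    "tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]": "pass",
    "tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v1/model/inception-v1-12.onnx]": "fail-compile"
  }
}
```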
Moving the expected outcomes to a json is a good idea for making it customizable to the different configs for each test.
I think it will probably be helpful to have the ability to specify extra test-specific options for an individual test through the json file. E.g. for large models, it might be useful to pass additional importer and runtime flags for externalizing params. Although I'm not sure if models that large are going to be in the scope of this testing suite right now.
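If per-test options do get added, one possibility (sketched here only; the `expected_outcome`, `extra_compile_flags`, and `extra_run_flags` keys and the flag values are hypothetical, not part of the PR) is letting a test entry be an object instead of a bare outcome string:

```json
{
  "tests_and_expected_outcomes": {
    "tests/model_zoo/validated/some_large_model_test.py::test_models[some-large-model.onnx]": {
      "expected_outcome": "pass",
      "extra_compile_flags": ["--some-externalize-parameters-flag"],
      "extra_run_flags": ["--some-parameters-flag=model_params.bin"]
    }
  }
}
```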
One thing I think will be rather helpful is to have a way to take a list of URLs and automatically generate the test functions. For example:

```python
from typing import List

test_urls = [
    "https://github.com/onnx/models/raw/main/validated/vision/object_detection_segmentation/faster-rcnn/model/FasterRCNN-12.onnx",
    "https://github.com/onnx/models/raw/main/validated/vision/object_detection_segmentation/fcn/model/fcn-resnet50-12.onnx",
]


def generate_name(url: str):
    # e.g. ".../faster-rcnn/model/FasterRCNN-12.onnx" -> "test_faster-rcnn_FasterRCNN-12"
    split = url.split("/")
    return f'test_{split[-3]}_{split[-1].removesuffix(".onnx")}'


def make_function(url: str):
    def func(compare_between_iree_and_onnxruntime):
        compare_between_iree_and_onnxruntime(
            model_url=url,
            artifacts_subdir=artifacts_subdir,
        )

    return func


def define_functions(url_list: List[str]):
    for url in url_list:
        globals()[generate_name(url)] = make_function(url)


define_functions(test_urls)
```
Yes! However, I'm on the fence about specifically where to support extra options. We can add options in all these places:
I don't want to confuse developers with too many choices, but flexibility can be helpful for a variety of situations.
For example here, that sounds like those options should apply regardless of the backend configuration, so the flags could go in the test cases.
I think they should be, as we can choose which tests to run on what schedule. How I have the PR right now allows for the list of tests to be fully opt-in, using the default "skip" behavior. You can also filter with pytest:

```
# run only tests matching a string
-k resnet

# skip all large tests
-m "not size_large"
```

We can also use something like https://pypi.org/project/pytest-shard/ to run across multiple machines.
Yeah! I like how easy it looks to add test cases when they are in files like https://github.com/nod-ai/SHARK-TestSuite/blob/main/alt_e2eshark/onnx_tests/models/external_lists/onnx_model_zoo_computer_vision_1.txt. One of the tradeoffs I'm considering here is that it isn't obvious at all from a single test case name what the test is actually doing. I'd like for test suites to be forkable into user code, not be their own world of metaprogramming. The new tests that Rob added in https://github.com/iree-org/iree-test-suites/blob/main/sharktank_models/llama3.1/test_llama.py are on the other side of that spectrum:

What's in this test suite right now is closer to alt_e2eshark in that there is code for each test case... but that code is nearly all boilerplate:

```python
def test_age_gender_gender_googlenet(compare_between_iree_and_onnxruntime):
    compare_between_iree_and_onnxruntime(
        model_url="https://github.com/onnx/models/raw/main/validated/vision/body_analysis/age_gender/models/gender_googlenet.onnx",
        artifacts_subdir=artifacts_subdir,
    )
```

I can iterate on further parameterization as you suggest here... maybe using ... The ergonomics question is partially solved by ...
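One possible middle ground, sketched here rather than something this PR does: keep the model list in a plain text file like the alt_e2eshark external lists, but expand it into a single parametrized test with readable IDs so each case still shows up individually in reports. The file name and the `artifacts_subdir` value below are illustrative:

```python
from pathlib import Path

import pytest

# Hypothetical text file with one model URL (or relative model path) per line.
MODEL_LIST = Path(__file__).parent / "model_list.txt"
MODELS = [line.strip() for line in MODEL_LIST.read_text().splitlines() if line.strip()]


@pytest.mark.parametrize("model", MODELS, ids=lambda m: Path(m).stem)
def test_models(model, compare_between_iree_and_onnxruntime):
    compare_between_iree_and_onnxruntime(model_url=model, artifacts_subdir="model_zoo")
```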
Thanks again for the review comments @zjgarvey. I'm planning to pick this back up soon.
```python
@pytest.mark.parametrize(
    "model",
    [
        # fmt: off
        pytest.param("duc/model/ResNet101-DUC-12.onnx", marks=pytest.mark.size_large),
        pytest.param("faster-rcnn/model/FasterRCNN-12.onnx"),
        pytest.param("fcn/model/fcn-resnet50-12.onnx"),
        pytest.param("mask-rcnn/model/MaskRCNN-12.onnx"),
        pytest.param("retinanet/model/retinanet-9.onnx"),
        pytest.param("ssd/model/ssd-12.onnx"),
        pytest.param("ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx", marks=pytest.mark.xfail(raises=NotImplementedError)),
```
> One thing I think will be rather helpful is to have a way to take a list of urls and automatically generate the test functions.

How's this @zjgarvey?
Some details:
- If no marks are needed (all models in the list are supported by the test suite, no "large" or other special model tags needed), then the `pytest.param()` wrappers could be removed, for just:

  ```python
  @pytest.mark.parametrize(
      "model",
      [
          "duc/model/ResNet101-DUC-12.onnx",
          "faster-rcnn/model/FasterRCNN-12.onnx",
          "fcn/model/fcn-resnet50-12.onnx",
          "mask-rcnn/model/MaskRCNN-12.onnx",
  ```

- Using `fmt: off` to prevent the formatter from wrapping lines, so each test case gets its own line, even if it becomes very long
- Test function names could be customized here: https://stackoverflow.com/questions/37575690/override-a-pytest-parameterized-functions-name (see the sketch after this list). Along with that, I could change how the setup code in conftest.py decides which test cases to modify, using regex match or some shorthand, instead of the explicit

  ```json
  "tests_and_expected_outcomes": {
    "tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v1/model/inception-v1-12.onnx]": "fail-compile",
  ```

  The explicit format is copy-paste friendly but not typing friendly :P
- I figure for any test cases that need extra arguments, they could be their own groups that call `compare_between_iree_and_onnxruntime` or another test helper function, instead of packing more options into this `@pytest.mark.parametrize`
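For reference, the test-name customization from that StackOverflow link boils down to the `ids=` argument of `@pytest.mark.parametrize`. A small sketch (the shorthand format here is mine, not the PR's):

```python
import pytest


def model_id(model_path: str) -> str:
    # e.g. "faster-rcnn/model/FasterRCNN-12.onnx" -> "FasterRCNN-12"
    return model_path.split("/")[-1].removesuffix(".onnx")


@pytest.mark.parametrize(
    "model",
    [
        "faster-rcnn/model/FasterRCNN-12.onnx",
        "fcn/model/fcn-resnet50-12.onnx",
    ],
    ids=model_id,
)
def test_models(model):
    ...
```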
Yeah, that makes sense to me.
I'll go through and take another look at this PR when I get the chance. If I don't get to it for a while and you'd like a review, please feel free to message/ping me.
Actually, I was able to look through this pretty quickly now, and it looks like a good change to me. What else needs to be done to undraft this PR?
I'm not totally content with a few design points yet, but this could be good enough to merge and start using. I'll iterate a bit on the PR description and docs so they reflect the current status, then mark as ready for review. Thanks for taking a look!
```python
# Download the model as needed.
# TODO(scotttodd): move to fixture with cache / download on demand
# TODO(scotttodd): overwrite if already existing? check SHA?
# TODO(scotttodd): redownload if file is corrupted (e.g. partial download)
onnx_path = test_artifacts_dir / f"{model_name}.onnx"
if not onnx_path.exists():
    urllib.request.urlretrieve(model_url, onnx_path)
```
Follow-up tasks
- Adjust file downloading / caching behavior to avoid redownloading and using significant bandwidth when used together with persistent self-hosted runners or github actions caches
For a sense of scale, the onnx_models/artifacts/ directory is around 34GB on my Windows system right now, including .mlir and .vmfb files. I don't want CI runs to redownload 10GB+ from GitHub each job run, since I think that cuts into the Git LFS bandwidth quota for https://github.com/onnx/models. The docs at https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage say that quota comes from the repository owner, not the user, so I want to be a good citizen here.
I may take ideas from #59 (comment) and build some caching layer that can be shared across the test suites here.
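As one possible shape for that caching layer (a sketch only, with a hypothetical helper name and no ties to the PR's fixtures): download into the artifacts directory, reuse an existing file when possible, and optionally verify a checksum so corrupted or partial downloads get refetched.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch_model(url: str, dest: Path, expected_sha256: str | None = None) -> Path:
    # Hypothetical helper: reuse a previously downloaded file when possible.
    if dest.exists():
        if expected_sha256 is None:
            return dest
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        if digest == expected_sha256:
            return dest
        dest.unlink()  # Corrupted or partial download, fetch again.
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest
```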
…P. (#19524)

This switches from running ONNX model compile->run correctness tests on only CPU to now run on GPU using the Vulkan and HIP APIs. We could also run on CUDA with #18814 and Metal with #18817. These new tests will help guard against regressions to full models, at least when using default flags. I'm planning on adding models coming from other frameworks (such as [LiteRT Models](https://github.com/iree-org/iree-test-suites/tree/main/litert_models)) in future PRs.

As these tests will run on every pull request and commit, I'm starting the test list with all tests that are passing on our current set of runners, with no (strict _or_ loose) XFAILs. The full set of tests will be run nightly in https://github.com/iree-org/iree-test-suites using nightly IREE releases... once we have runners with GPUs available in that repository.

See also iree-org/iree-test-suites#65 and iree-org/iree-test-suites#6.

## Sample logs

I have not done much triage on the test failures, but it does seem like Vulkan pass rates are substantially lower than CPU and ROCm. Test reports, including logs for all failures, are currently published as artifacts on actions runs in iree-test-suites, such as https://github.com/iree-org/iree-test-suites/actions/runs/12794322266. We could also archive test reports somewhere like https://github.com/nod-ai/e2eshark-reports and/or host the test reports on a website like https://nod-ai.github.io/shark-ai/llm/sglang/index.html?sort=result.

### CPU

https://github.com/iree-org/iree/actions/runs/12797886622/job/35681117085?pr=19524#step:8:395

```
============================== slowest durations ===============================
39.46s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[vgg/model/vgg19-7.onnx]
13.39s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
13.25s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
12.48s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
11.93s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
11.49s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
11.28s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]
11.26s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
9.14s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]
7.73s call tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]
7.61s call tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/gender_googlenet.onnx]
7.57s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]
7.27s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
4.86s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
4.61s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]
4.58s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]
3.08s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[squeezenet/model/squeezenet1.0-9.onnx]
2.02s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
1.90s call tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]
================== 19 passed, 18 skipped in 184.96s (0:03:04) ==================
```

### ROCm

https://github.com/iree-org/iree/actions/runs/12797886622/job/35681117629?pr=19524#step:8:344

```
============================== slowest durations ===============================
9.40s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]
9.15s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
9.05s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
8.73s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
7.95s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]
7.94s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
7.81s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
7.13s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
6.95s call tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]
5.15s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]
4.52s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/googlenet/model/googlenet-12.onnx]
3.55s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
3.12s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]
2.57s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
2.48s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]
2.21s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx]
1.36s call tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]
0.95s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
============ 17 passed, 19 skipped, 1 xfailed in 100.10s (0:01:40) =============
```

### Vulkan

https://github.com/iree-org/iree/actions/runs/12797886622/job/35681118044?pr=19524#step:8:216

```
============================== slowest durations ===============================
13.10s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
12.97s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
12.40s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
12.22s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
9.07s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
8.09s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
6.04s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
2.93s call tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx]
1.86s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
0.90s call tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
============= 9 passed, 27 skipped, 1 xfailed in 79.62s (0:01:19) ==============
```

ci-exactly: build_packages, test_onnx
Progress on #6. See how this is used downstream in iree-org/iree#19524.
Overview
This replaces hardcoded flags like ... and inlined marks like ... with a JSON config file passed to the test runner via the `--test-config-file` option or the `IREE_TEST_CONFIG_FILE` environment variable.

During test case collection, each test case name is looked up in the config file to determine what the expected outcome is, from `["skip" (special option), "pass", "fail-import", "fail-compile", "fail-run"]`. By default, all tests are skipped. This design allows for out-of-tree testing to be performed using explicit test lists (encoded in a file, unlike the `-k` option), custom flags, and custom test expectations.
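Invocation then looks something like this (the config file name is illustrative):

```
# Explicit flag:
pytest --test-config-file=configs/onnx_models_gpu_rocm.json

# Or via the environment variable:
export IREE_TEST_CONFIG_FILE=configs/onnx_models_gpu_rocm.json
pytest
```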
Design details

Compare this implementation with these others:

- Uses `skip_compile_tests`, `skip_run_tests`, `expected_compile_failures`, and `expected_run_failures`. All tests are run by default.
- https://github.com/nod-ai/SHARK-TestSuite/blob/main/alt_e2eshark/run.py uses `--device=`, `--backend=`, `--target-chip=`, and `--test-filter=` arguments. Arbitrary flags are not supported, and test expectations are also not supported, so there is no way to directly signal if tests are unexpectedly passing or failing. A utility script can be used to diff the results of two test reports: https://github.com/nod-ai/SHARK-TestSuite/blob/main/alt_e2eshark/utils/check_regressions.py.
- Uses `@pytest.fixture(params=[...])` with `pytest.mark.target_hip` and other custom marks. This is more standard pytest and supports fluent ways to express other test configurations, but it makes annotating large numbers of tests pretty verbose and doesn't allow for out-of-tree configuration.

I'm imagining a few usage styles:
- ... `tests_and_expected_outcomes`, we could just limit testing to only models that are passing.

Follow-up tasks

- ... (`--update-xfail`?)