
Parameterize ONNX model tests. #65

Merged (7 commits) on Jan 15, 2025

Conversation

@ScottTodd (Member) commented Dec 18, 2024

Progress on #6. See how this is used downstream in iree-org/iree#19524.

Overview

This replaces hardcoded flags like

iree_compile_flags = [
    "--iree-hal-target-backends=llvm-cpu",
    "--iree-llvmcpu-target-cpu=host",
]
iree_run_module_flags = [
    "--device=local-task",
]

and inlined marks like

@pytest.mark.xfail(raises=IreeCompileException)
def test_foo():

with a JSON config file passed to the test runner via the --test-config-file option or the IREE_TEST_CONFIG_FILE environment variable.

During test case collection, each test case name is looked up in the config file to determine its expected outcome, one of ["skip" (special option), "pass", "fail-import", "fail-compile", "fail-run"]. By default, all tests are skipped. This design allows for out-of-tree testing to be performed using explicit test lists (encoded in a file, unlike the -k option), custom flags, and custom test expectations.
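
As a rough illustration (the key names here, aside from tests_and_expected_outcomes, are placeholders rather than necessarily what conftest.py ends up reading), a config file for the CPU configuration could look something like:

{
  "iree_compile_flags": [
    "--iree-hal-target-backends=llvm-cpu",
    "--iree-llvmcpu-target-cpu=host"
  ],
  "iree_run_module_flags": [
    "--device=local-task"
  ],
  "tests_and_expected_outcomes": {
    "tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]": "pass",
    "tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v1/model/inception-v1-12.onnx]": "fail-compile"
  }
}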

Design details

Compare this implementation with these others:

I'm imagining a few usage styles:

  • Nightly testing in this repository, running all test cases and tracking the current test results in a checked-in config file.
  • Presubmit testing in https://github.com/iree-org/iree, running a subset of test cases that pass, ensuring that they do not start failing. We could also run with XFAIL to get early signal for tests that start to pass.
    • If we don't run with XFAIL, then we don't need the generalized tests_and_expected_outcomes; we could just limit testing to only models that are passing.
  • Developer testing with arbitrary flags.

Follow-up tasks

  • Add job matrix to workflow (needs runners in this repo with GPUs)
  • Add an easy way to update the list of XFAILs (maybe switch to https://github.com/gsnedders/pytest-expect and use its --update-xfail?)
  • Triage some of the failures (e.g. can adjust tolerances on Vulkan)
  • Adjust file downloading / caching behavior to avoid redownloading and using significant bandwidth when used together with persistent self-hosted runners or GitHub Actions caches

@ScottTodd (Member Author):

I'm looking for some early feedback, @zjgarvey or others. Early testing shows that running these tests on HIP would have caught some recent regressions. The specific mechanics used for setting flags, choosing which tests to run, and checking or reporting which stages passed/failed can be implemented in multiple ways.

Comment on lines +66 to +71
parser.addoption(
    "--test-config-file",
    type=Path,
    default=default_config_file,
    help="Config JSON file used to parameterize test cases",
)
@ScottTodd (Member Author):

Here are some ideas, adding complexity in the conftest file but making it more flexible:

  • Add more options here to use instead of --test-config-file, like --iree-compile-flags
  • Make every option a flag then use a flagfile pattern like https://stackoverflow.com/a/27434050 so "config.json" is just a collection of regular flags
    • I'm not sure how that would work with the lists of tests and their expectations... I like having the full list of tests that will run be sorted in one place, not split between groups like in the onnx op tests
  • Could load a .py file that has the list of tests and statuses, or even just implements the pytest_collection_modifyitems hook
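
For example, that last option could be a small hook roughly like the following (a minimal sketch with placeholder names and config keys; not necessarily what this PR's conftest.py does):

import json

import pytest

def pytest_collection_modifyitems(config, items):
    # Placeholder sketch: read the config file and apply the expected outcomes.
    with open(config.getoption("--test-config-file")) as f:
        test_config = json.load(f)
    expected_outcomes = test_config.get("tests_and_expected_outcomes", {})

    for item in items:
        outcome = expected_outcomes.get(item.nodeid, "skip")
        if outcome == "skip":
            item.add_marker(pytest.mark.skip(reason="not listed in the test config"))
        elif outcome in ("fail-import", "fail-compile", "fail-run"):
            item.add_marker(pytest.mark.xfail(reason=outcome))
        # "pass" falls through: run the test and expect it to succeed.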

@ScottTodd (Member Author):

Using https://github.com/gsnedders/pytest-expect or https://github.com/projectcaluma/pytest-xfaillist is also an option.

Both support updating the xfail files via a --update-xfail or --generate-xfaillist option. Neither supports xfail reasons, but that's not critical.

https://github.com/projectcaluma/pytest-xfaillist seems to only support a hardcoded xfails.list file name next to the config root (source here), which wouldn't work with separate lists of xfails depending on the configuration (e.g. backend choice).

https://github.com/gsnedders/pytest-expect does allow specifying a file with --xfail-file


What I have right now in this "config JSON file" groups these three items:

  1. The flags to use when compiling and running, allowing you to choose between CPU and GPU, for example
  2. The list of tests to run
  3. The list of tests that are expected to fail (and how)

We'll always want a custom implementation for (1). For (2), https://community.lambdatest.com/t/how-to-run-pytest-tests-from-a-list-of-test-paths/31682/2 has some answers but I don't see an existing plugin or standard convention. For (3), we could use one of those projects.

@zjgarvey (Collaborator) left a comment:

Moving the expected outcomes to a JSON file is a good idea for making it customizable to the different configs for each test.

I think it will probably be helpful to have the ability to specify extra test-specific options for an individual test through the json file. E.g. for large models, it might be useful to pass additional importer and runtime flags for externalizing params. Although I'm not sure if models that large are going to be in the scope of this testing suite right now.

One thing I think will be rather helpful is to have a way to take a list of urls and automatically generate the test functions. For example:

from typing import List

# artifacts_subdir is assumed to be defined at module scope, as in the existing test files.
test_urls = [
    "https://github.com/onnx/models/raw/main/validated/vision/object_detection_segmentation/faster-rcnn/model/FasterRCNN-12.onnx",
    "https://github.com/onnx/models/raw/main/validated/vision/object_detection_segmentation/fcn/model/fcn-resnet50-12.onnx",
]

def generate_name(url: str):
    split = url.split("/")
    return f'test_{split[-3]}_{split[-1].removesuffix(".onnx")}'

def make_function(url: str):
    def func(compare_between_iree_and_onnxruntime):
        compare_between_iree_and_onnxruntime(
            model_url=url,
            artifacts_subdir=artifacts_subdir,
        )
    return func

def define_functions(url_list: List[str]):
    for url in url_list:
        globals()[generate_name(url)] = make_function(url)

# Called at import time so pytest collects the generated test functions.
define_functions(test_urls)

@ScottTodd (Member Author):

I think it will probably be helpful to have the ability to specify extra test-specific options for an individual test through the json file.

Yes! However, I'm on the fence about specifically where to support extra options. We can add options in all these places:

  • In the test runner (conftest.py)
  • In individual test cases within files (test_foo.py)
  • In configuration files (config.json)
  • As parameters on environment variables (e.g. IREE_HIP_TEST_TARGET_CHIP)

I don't want to confuse developers with too many choices, but flexibility can be helpful for a variety of situations.

E.g. for large models, it might be useful to pass additional importer and runtime flags for externalizing params.

For example here, that sounds like those options should apply regardless of the backend configuration, so the flags could go in the test cases.

Although I'm not sure if models that large are going to be in the scope of this testing suite right now.

I think they should be, as we can choose which tests to run on what schedule. The way I have the PR right now, the list of tests is fully opt-in via the default "skip" behavior. You can also filter with pytest:

# run only tests matching a string
-k resnet

# skip all large tests
-m "not size_large"

We can also use something like https://pypi.org/project/pytest-shard/ to run across multiple machines.
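
If I'm remembering its options right, sharding would look roughly like:

# run shard 0 of 4 (pytest-shard)
--num-shards 4 --shard-id 0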


One thing I think will be rather helpful is to have a way to take a list of urls and automatically generate the test functions.

Yeah! I like how easy it looks to add test cases when they are in files like https://github.com/nod-ai/SHARK-TestSuite/blob/main/alt_e2eshark/onnx_tests/models/external_lists/onnx_model_zoo_computer_vision_1.txt. One of the tradeoffs I'm considering here is how it isn't obvious at all from a single test case name what the test is actually doing. I'd like for test suites to be forkable into user code, not be their own world of meta programming. The new tests that Rob added in https://github.com/iree-org/iree-test-suites/blob/main/sharktank_models/llama3.1/test_llama.py are on the other side of that spectrum:

Resemble user code                                               Templated to run lots of tests
<--------------------------------------------------------------------------------------------->

sharktank_models/                                          onnx_models/           alt_e2eshark/

What's in this test suite right now is closer to alt_e2eshark in that there is code for each test case... but that code is nearly all boilerplate:

def test_age_gender_gender_googlenet(compare_between_iree_and_onnxruntime):
    compare_between_iree_and_onnxruntime(
        model_url="https://github.com/onnx/models/raw/main/validated/vision/body_analysis/age_gender/models/gender_googlenet.onnx",
        artifacts_subdir=artifacts_subdir,
    )

I can iterate on further parameterization as you suggest here... maybe using pytest.mark.parametrize (docs here) or one of the more powerful options.

The ergonomics question is partially solved by

  1. keeping the test runner code organized and documented
  2. having the tests log what they are doing (iree-import-onnx -> iree-compile -> iree-run-module), since then users can run the tests and copy the log output for their own uses

@ScottTodd (Member Author):

Thanks again for the review comments @zjgarvey . I'm planning to pick this back up soon.

@ScottTodd force-pushed the onnx-models-parameterize branch from a3f7e28 to 99c3df6 on January 10, 2025 at 22:46
Comment on lines +17 to +27
@pytest.mark.parametrize(
    "model",
    [
        # fmt: off
        pytest.param("duc/model/ResNet101-DUC-12.onnx", marks=pytest.mark.size_large),
        pytest.param("faster-rcnn/model/FasterRCNN-12.onnx"),
        pytest.param("fcn/model/fcn-resnet50-12.onnx"),
        pytest.param("mask-rcnn/model/MaskRCNN-12.onnx"),
        pytest.param("retinanet/model/retinanet-9.onnx"),
        pytest.param("ssd/model/ssd-12.onnx"),
        pytest.param("ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx", marks=pytest.mark.xfail(raises=NotImplementedError)),
@ScottTodd (Member Author):

One thing I think will be rather helpful is to have a way to take a list of urls and automatically generate the test functions.

How's this @zjgarvey ?

Some details:

  • If no marks are needed (all models in the list are supported by the test suite, no "large" or other special model tags needed), then the pytest.param() wrappers could be removed, for just:

    @pytest.mark.parametrize(
        "model",
        [
            "duc/model/ResNet101-DUC-12.onnx",
            "faster-rcnn/model/FasterRCNN-12.onnx",
            "fcn/model/fcn-resnet50-12.onnx",
            "mask-rcnn/model/MaskRCNN-12.onnx",
  • Using fmt: off to prevent the formatter from wrapping lines, so each test case gets its own line, even if it becomes very long

  • Test function names could be customized here: https://stackoverflow.com/questions/37575690/override-a-pytest-parameterized-functions-name (see the sketch after this list). Along with that, I could change how the setup code in conftest.py decides which test cases to modify, using a regex match or some shorthand, instead of the explicit

      "tests_and_expected_outcomes": {
        "tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v1/model/inception-v1-12.onnx]": "fail-compile",

    The explicit format is copy-paste friendly but not typing friendly :P

  • I figure for any test cases that need extra arguments, they could be their own groups that call compare_between_iree_and_onnxruntime or another test helper function, instead of packing more options into this @pytest.mark.parametrize
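
Here is roughly what the custom-id idea could look like (just a sketch; the model_id helper is a placeholder, and the model list is abbreviated):

import pytest

def model_id(model_path: str) -> str:
    # Hypothetical helper: "duc/model/ResNet101-DUC-12.onnx" -> "ResNet101-DUC-12"
    return model_path.split("/")[-1].removesuffix(".onnx")

@pytest.mark.parametrize(
    "model",
    [
        "duc/model/ResNet101-DUC-12.onnx",
        "faster-rcnn/model/FasterRCNN-12.onnx",
    ],
    ids=model_id,
)
def test_models(model, compare_between_iree_and_onnxruntime):
    ...

With ids=model_id, the node id becomes test_models[ResNet101-DUC-12], which is much shorter to type into tests_and_expected_outcomes than the full model path.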

@zjgarvey (Collaborator):

Yeah, that makes sense to me.

I'll go through and take another look at this PR when I get the chance. If I don't get to it for a while and you'd like a review, please feel free to message/ping me.

@ScottTodd requested a review from zjgarvey on January 10, 2025 at 23:28
@zjgarvey (Collaborator) left a comment:

Actually, I was able to look through this pretty quickly now, and it looks like a good change to me. What else needs to be done to undraft this PR?

@ScottTodd (Member Author):

Actually, I was able to look through this pretty quickly now, and it looks like a good change to me. What else needs to be done to undraft this PR?

I'm not totally content with a few design points yet, but this could be good enough to merge and start using. I'll iterate a bit on the PR description and docs so they reflect the current status then mark as ready for review. Thanks for taking a look!

@ScottTodd marked this pull request as ready for review on January 15, 2025 at 17:44
Comment on lines +276 to +282
# Download the model as needed.
# TODO(scotttodd): move to fixture with cache / download on demand
# TODO(scotttodd): overwrite if already existing? check SHA?
# TODO(scotttodd): redownload if file is corrupted (e.g. partial download)
onnx_path = test_artifacts_dir / f"{model_name}.onnx"
if not onnx_path.exists():
    urllib.request.urlretrieve(model_url, onnx_path)
@ScottTodd (Member Author):

Follow-up tasks

  • Adjust file downloading / caching behavior to avoid redownloading and using significant bandwidth when used together with persistent self-hosted runners or GitHub Actions caches

For a sense of scale, the onnx_models/artifacts/ directory is around 34GB on my Windows system right now, including .mlir and .vmfb files. I don't want CI runs to redownload 10GB+ from GitHub each job run, since I think that cuts into the Git LFS bandwidth quota for https://github.com/onnx/models . The docs at https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage say that quota comes from the repository owner, not the user, so I want to be a good citizen here.

I may take ideas from #59 (comment) and build some caching layer that can be shared across the test suites here.
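
One shape that could take (purely a sketch with a hypothetical fetch_model helper, not what this PR implements) is downloading to a temporary file and validating it before moving it into place:

import shutil
import urllib.request
from pathlib import Path

def fetch_model(model_url: str, onnx_path: Path) -> Path:
    # Reuse an existing non-empty file instead of spending Git LFS bandwidth.
    if onnx_path.exists() and onnx_path.stat().st_size > 0:
        return onnx_path
    tmp_path = onnx_path.with_name(onnx_path.name + ".partial")
    with urllib.request.urlopen(model_url) as response, open(tmp_path, "wb") as f:
        shutil.copyfileobj(response, f)
        expected_size = response.headers.get("Content-Length")
    # Treat a size mismatch as a partial download and fail loudly.
    if expected_size is not None and tmp_path.stat().st_size != int(expected_size):
        tmp_path.unlink()
        raise RuntimeError(f"Partial download of {model_url}")
    tmp_path.replace(onnx_path)
    return onnx_path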

@ScottTodd merged commit d7db851 into iree-org:main on Jan 15, 2025
2 checks passed
@ScottTodd deleted the onnx-models-parameterize branch on January 15, 2025 at 18:05
ScottTodd added a commit to iree-org/iree that referenced this pull request Jan 31, 2025
…P. (#19524)

This switches from running ONNX model compile->run correctness tests on
only CPU to now run on GPU using the Vulkan and HIP APIs. We could also
run on CUDA with #18814 and Metal
with #18817.

These new tests will help guard against regressions to full models, at
least when using default flags. I'm planning on adding models coming
from other frameworks (such as [LiteRT
Models](https://github.com/iree-org/iree-test-suites/tree/main/litert_models))
in future PRs.

As these tests will run on every pull request and commit, I'm starting
the test list with all tests that are passing on our current set of
runners, with no (strict _or_ loose) XFAILs. The full set of tests will
be run nightly in https://github.com/iree-org/iree-test-suites using
nightly IREE releases... once we have runners with GPUs available in
that repository.

See also iree-org/iree-test-suites#65 and
iree-org/iree-test-suites#6.

## Sample logs

I have not done much triage on the test failures, but it does seem like
Vulkan pass rates are substantially lower than CPU and ROCm. Test
reports, including logs for all failures, are currently published as
artifacts on actions runs in iree-test-suites, such as
https://github.com/iree-org/iree-test-suites/actions/runs/12794322266.
We could also archive test reports somewhere like
https://github.com/nod-ai/e2eshark-reports and/or host the test reports
on a website like
https://nod-ai.github.io/shark-ai/llm/sglang/index.html?sort=result.

### CPU


https://github.com/iree-org/iree/actions/runs/12797886622/job/35681117085?pr=19524#step:8:395

```
============================== slowest durations ===============================
39.46s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[vgg/model/vgg19-7.onnx]
13.39s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
13.25s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
12.48s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
11.93s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
11.49s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
11.28s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]
11.26s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
9.14s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]
7.73s call     tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]
7.61s call     tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/gender_googlenet.onnx]
7.57s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]
7.27s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
4.86s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
4.61s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]
4.58s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]
3.08s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[squeezenet/model/squeezenet1.0-9.onnx]
2.02s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
1.90s call     tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]
================== 19 passed, 18 skipped in 184.96s (0:03:04) ==================
```

### ROCm


https://github.com/iree-org/iree/actions/runs/12797886622/job/35681117629?pr=19524#step:8:344

```
============================== slowest durations ===============================
9.40s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]
9.15s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
9.05s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
8.73s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
7.95s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]
7.94s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
7.81s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
7.13s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
6.95s call     tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]
5.15s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]
4.52s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/googlenet/model/googlenet-12.onnx]
3.55s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
3.12s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]
2.57s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
2.48s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]
2.21s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx]
1.36s call     tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]
0.95s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
============ 17 passed, 19 skipped, 1 xfailed in 100.10s (0:01:40) =============
```

### Vulkan


https://github.com/iree-org/iree/actions/runs/12797886622/job/35681118044?pr=19524#step:8:216

```
============================== slowest durations ===============================
13.10s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
12.97s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
12.40s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
12.22s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
9.07s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
8.09s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
6.04s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
2.93s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx]
1.86s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
0.90s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
============= 9 passed, 27 skipped, 1 xfailed in 79.62s (0:01:19) ==============
```

ci-exactly: build_packages, test_onnx
ita9naiwa pushed a commit to ita9naiwa/iree that referenced this pull request Feb 4, 2025