
Add tests for ONNX models #6

Open
ScottTodd opened this issue Aug 9, 2024 · 6 comments
Labels: enhancement (New feature or request)
ScottTodd commented Aug 9, 2024

See nod-ai/SHARK-TestSuite#275

Some work has gone into importing models already at https://github.com/nod-ai/SHARK-TestSuite/tree/main/e2eshark/onnx/models

ScottTodd self-assigned this Sep 6, 2024
ScottTodd added a commit that referenced this issue Sep 19, 2024
Progress on #6.

A sample test report HTML file is available here:
https://scotttodd.github.io/iree-test-suites/onnx_models/report_2024_09_17.html

These new tests (a rough sketch of this flow follows the list):

* Download models from https://github.com/onnx/models
* Extract metadata from the models to determine which functions to call
with random data
* Run the models through [ONNX Runtime](https://onnxruntime.ai/) as a
reference implementation
* Import the models using `iree-import-onnx` (until we have a better
API: iree-org/iree#18289)
* Compile the models using `iree-compile` (currently just for `llvm-cpu`
but this could be parameterized later)
* Run the models using `iree-run-module`, checking outputs using
`--expected_output` and the reference data
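
A minimal sketch of that per-test flow, assuming hypothetical helper names, file layout, and hardcoded flags (the real suite wraps this in pytest fixtures):

```python
# Minimal sketch of the per-test flow above. Helper names, file layout, and
# flags are illustrative assumptions, not the suite's actual API.
import subprocess
import urllib.request
from pathlib import Path

def run_onnx_model_test(model_url: str, work_dir: Path) -> None:
    onnx_path = work_dir / "model.onnx"
    urllib.request.urlretrieve(model_url, onnx_path)

    # (Reference inputs/outputs from ONNX Runtime would be generated here and
    # written to input_0.bin / output_0.bin in the working directory.)

    mlir_path = onnx_path.with_suffix(".mlir")
    vmfb_path = onnx_path.with_suffix(".vmfb")
    subprocess.run(
        ["iree-import-onnx", str(onnx_path), "-o", str(mlir_path)], check=True
    )
    subprocess.run(
        ["iree-compile", str(mlir_path),
         "--iree-hal-target-backends=llvm-cpu", "-o", str(vmfb_path)],
        check=True,
    )
    # (--function=... may also be needed; the suite derives the function to
    # call from the model's metadata. Shapes/dtypes below are illustrative.)
    subprocess.run(
        ["iree-run-module", f"--module={vmfb_path}", "--device=local-task",
         "--input=1x3x224x224xf32=@input_0.bin",
         "--expected_output=1x1000xf32=@output_0.bin"],
        cwd=work_dir,
        check=True,
    )
```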

Tests are written in Python using a set of pytest helper functions. As
the tests run, they can log details about what commands they are
running. When run locally, the `artifacts/` directory will contain all
the relevant files. More can be done in follow-up PRs to improve the
ergonomics there (like generating flagfiles).
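
For instance, a hypothetical ergonomics helper along these lines could persist the flags a test used (assuming the `--flagfile=` option accepted by IREE tools):

```python
# Hypothetical helper: persist the flags a test used so a developer can
# replay the run by hand, e.g. `iree-run-module --flagfile=artifacts/foo/run.flagfile`.
from pathlib import Path

def write_flagfile(flags: list[str], path: Path) -> None:
    path.write_text("\n".join(flags) + "\n")  # one flag per line
```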

Each test case can use XFAIL like
`@pytest.mark.xfail(raises=IreeRunException)`. As we test across
multiple backends or want to configure the test suite from another repo
(e.g. [iree-org/iree](https://github.com/iree-org/iree)), we can explore
more expressive marks.
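
For concreteness, a marked test case looks roughly like this (the import location of `IreeRunException` is an assumption):

```python
import pytest

from conftest import IreeRunException  # hypothetical import path for the helper exception

# Known-failing test: it still runs, and is reported as XFAIL as long as
# running the module raises the expected exception.
@pytest.mark.xfail(raises=IreeRunException)
def test_mobilenet():
    ...  # download, import, compile, and run the model as usual
```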

Note that unlike the ONNX _operator_ tests, these tests use
`onnxruntime` and `iree-import-onnx` at test time. The operator tests
handle that as an infrequently run offline step. We could do something
similar here, but the test inputs and outputs can be rather large for
real models and that gets into Git LFS or cloud storage territory.

If this test authoring model works well enough, we can do something
similar for other ML frameworks like TFLite
(#5).
ScottTodd added a commit to iree-org/iree that referenced this issue Oct 17, 2024
Progress on iree-org/iree-test-suites#6.

Current tests included and their statuses:
```
PASSED onnx_models/tests/vision/classification_models_test.py::test_alexnet
PASSED onnx_models/tests/vision/classification_models_test.py::test_caffenet
PASSED onnx_models/tests/vision/classification_models_test.py::test_densenet_121
PASSED onnx_models/tests/vision/classification_models_test.py::test_googlenet
PASSED onnx_models/tests/vision/classification_models_test.py::test_inception_v2
PASSED onnx_models/tests/vision/classification_models_test.py::test_mnist
PASSED onnx_models/tests/vision/classification_models_test.py::test_resnet50_v1
PASSED onnx_models/tests/vision/classification_models_test.py::test_resnet50_v2
PASSED onnx_models/tests/vision/classification_models_test.py::test_shufflenet
PASSED onnx_models/tests/vision/classification_models_test.py::test_shufflenet_v2
PASSED onnx_models/tests/vision/classification_models_test.py::test_squeezenet
PASSED onnx_models/tests/vision/classification_models_test.py::test_vgg19
XFAIL onnx_models/tests/vision/classification_models_test.py::test_efficientnet_lite4
XFAIL onnx_models/tests/vision/classification_models_test.py::test_inception_v1
XFAIL onnx_models/tests/vision/classification_models_test.py::test_mobilenet
XFAIL onnx_models/tests/vision/classification_models_test.py::test_rcnn_ilsvrc13
XFAIL onnx_models/tests/vision/classification_models_test.py::test_zfnet_512
```

* CPU only for now. We haven't yet parameterized those tests to allow
for other backends or flags.
* Starting with `--override-ini=xfail_strict=false` so newly _passing_
tests won't fail the job; newly _failing_ tests will still fail the job.
We can add an external config file to customize which tests are expected
to fail, like the onnx op tests use, if we want to track which tests are
passing/failing in this repository instead of in the test suite repo.

Sample logs:
https://github.com/iree-org/iree/actions/runs/11371239238/job/31633406729?pr=18795

ci-exactly: build_packages, test_onnx
ScottTodd commented Oct 17, 2024

An initial set of ONNX model tests has landed in https://github.com/iree-org/iree-test-suites/tree/main/onnx_models and is now running on every IREE PR / commit. Sample logs from a test run in IREE: https://github.com/iree-org/iree/actions/runs/11390747229/job/31693710855#step:8:19.

Test summary:

```
=========================== short test summary info ============================
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_alexnet
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_caffenet
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_densenet_121
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_googlenet
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_inception_v2
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_mnist
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_resnet50_v1
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_resnet50_v2
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_shufflenet
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_shufflenet_v2
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_squeezenet
PASSED iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_vgg19
XFAIL iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_efficientnet_lite4
XFAIL iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_inception_v1
XFAIL iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_mobilenet
XFAIL iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_rcnn_ilsvrc13
XFAIL iree-test-suites/onnx_models/tests/vision/classification_models_test.py::test_zfnet_512
================== 12 passed, 5 xfailed in 177.35s (0:02:57) ===================
```

Open tasks now:

@ScottTodd (Member, Author)

I'll try to import more tests and parameterize across backends today.

ScottTodd added a commit that referenced this issue Jan 15, 2025
Progress on #6. See
how this is used downstream in
iree-org/iree#19524.

## Overview

This replaces hardcoded flags like
```python
iree_compile_flags = [
    "--iree-hal-target-backends=llvm-cpu",
    "--iree-llvmcpu-target-cpu=host",
]
iree_run_module_flags = [
    "--device=local-task",
]
```
and inlined marks like
```python
@pytest.mark.xfail(raises=IreeCompileException)
def test_foo():
```
with a JSON config file passed to the test runner via the
`--test-config-file` option or the `IREE_TEST_CONFIG_FILE` environment
variable.

During test case collection, each test case name is looked up in the
config file to determine what the expected outcome is, from `["skip"
(special option), "pass", "fail-import", "fail-compile", "fail-run"]`.
By default, all tests are skipped. This design allows out-of-tree
testing to be performed using explicit test lists (encoded in a file,
unlike the [`-k`
option](https://docs.pytest.org/en/latest/example/markers.html#using-k-expr-to-select-tests-based-on-their-name)),
custom flags, and custom test expectations.
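
For illustration, such a config file could look like this; the flag lists mirror the Python variables above, and the `tests_and_expected_outcomes` key is discussed below (exact field names are assumptions):

```json
{
  "iree_compile_flags": [
    "--iree-hal-target-backends=llvm-cpu",
    "--iree-llvmcpu-target-cpu=host"
  ],
  "iree_run_module_flags": ["--device=local-task"],
  "tests_and_expected_outcomes": {
    "tests/vision/classification_models_test.py::test_alexnet": "pass",
    "tests/vision/classification_models_test.py::test_mobilenet": "fail-compile",
    "tests/vision/classification_models_test.py::test_zfnet_512": "skip"
  }
}
```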

## Design details

Compare this implementation with these others:

* https://github.com/iree-org/iree-test-suites/tree/main/onnx_ops also
uses config files, but with separate lists for `skip_compile_tests`,
`skip_run_tests`, `expected_compile_failures`, and
`expected_run_failures`. All tests are run by default.
* https://github.com/nod-ai/SHARK-TestSuite/blob/main/alt_e2eshark/run.py
uses `--device=`, `--backend=`, `--target-chip=`, and `--test-filter=`
arguments. Arbitrary flags are not supported, and test expectations are
also not supported, so there is no way to directly signal if tests are
unexpectedly passing or failing. A utility script can be used to diff
the results of two test reports:
https://github.com/nod-ai/SHARK-TestSuite/blob/main/alt_e2eshark/utils/check_regressions.py.
* https://github.com/iree-org/iree-test-suites/blob/main/sharktank_models/llama3.1/test_llama.py
parameterizes test cases using `@pytest.fixture(params=[...])` with
`pytest.mark.target_hip` and other custom marks. This is more standard
pytest and supports fluent ways to express other test configurations,
but it makes annotating large numbers of tests pretty verbose and
doesn't allow for out-of-tree configuration.

I'm imagining a few usage styles:

* Nightly testing in this repository, running all test cases and
tracking the current test results in a checked in config file.
* We could also go with an approach like
https://github.com/nod-ai/SHARK-TestSuite/blob/main/alt_e2eshark/utils/check_regressions.py
to diff test results but this encodes the test results in the config
files rather than in external reports. I see pros and cons to both
approaches.
* Presubmit testing in https://github.com/iree-org/iree, running a
subset of test cases that pass, ensuring that they do not start failing.
We could also run with XFAIL to get early signal for tests that start to
pass.
* If we don't run with XFAIL then we don't need the generalized
`tests_and_expected_outcomes`, we could just limit testing to only
models that are passing.
* Developer testing with arbitrary flags.

## Follow-up tasks

- [ ] Add job matrix to workflow (needs runners in this repo with GPUs)
- [ ] Add an easy way to update the list of XFAILs (maybe switch to
https://github.com/gsnedders/pytest-expect and use its
`--update-xfail`?)
- [ ] Triage some of the failures (e.g. can adjust tolerances on Vulkan)
- [ ] Adjust file downloading / caching behavior to avoid redownloading
and using significant bandwidth when used together with persistent
self-hosted runners or github actions caches
@ScottTodd
Copy link
Member Author

cc @vinayakdsci @pdhirajkumarprasad @zjgarvey

Tests are now parameterized across compiler flags (e.g. target backend choice) and runtime flags (e.g. HAL driver/device choice).

### Inputs/outputs

Looking over nod-ai/SHARK-TestSuite#393 and the https://github.com/onnx/models repository more closely, I now see that the validated model tests include reference inputs and outputs (hidden inside .tar.gz files), so we don't actually need to construct random inputs and get reference outputs from onnxruntime:

```python
# Create a numpy tensor with some random data for the input.
input_data = generate_numpy_input_for_ort_node_arg(input)
input_data_path = onnx_path.with_name(onnx_path.stem + f"_input_{idx}.bin")
write_ndarray_to_binary_file(input_data, input_data_path)
inputs.append(
    IreeModelParameterMetadata(
        name=input.name,
        type=iree_type,
        data_file=input_data_path,
    )
)
onnx_inputs[input.name] = input_data
```
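
With the reference data available, that random-input path could instead load the protobuf tensors directly, assuming the conventional `test_data_set_0/` layout used inside onnx/models archives:

```python
# Sketch: load the reference inputs/outputs bundled with validated onnx/models
# archives instead of generating random data. Assumes the conventional
# test_data_set_0/{input,output}_N.pb layout after extracting the .tar.gz.
from pathlib import Path

import onnx
from onnx import numpy_helper

def load_reference_tensor(path: Path):
    tensor = onnx.TensorProto()
    tensor.ParseFromString(path.read_bytes())
    return numpy_helper.to_array(tensor)

data_dir = Path("test_data_set_0")
input_0 = load_reference_tensor(data_dir / "input_0.pb")
expected_output_0 = load_reference_tensor(data_dir / "output_0.pb")
```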

### Input file caching

I've been thinking about caching a bit too. Right now the download code is quite naive:

```python
# Download the model as needed.
# TODO(scotttodd): move to fixture with cache / download on demand
# TODO(scotttodd): overwrite if already existing? check SHA?
# TODO(scotttodd): redownload if file is corrupted (e.g. partial download)
onnx_path = test_artifacts_dir / f"{model_name}.onnx"
if not onnx_path.exists():
    urllib.request.urlretrieve(model_url, onnx_path)
```

I want something similar to huggingface's cache (which is backed by git lfs), and for these files we could actually use git (lfs) directly too. Something like this (a rough `subprocess` sketch follows the steps):

  1. Clone the onnx/models repository into a cache directory (default `~/.iree-test-suites-cache`?)
  2. Within that directory, run `git lfs pull --include="[path to model].onnx" --exclude=""` to fetch individual files, letting git handle checking whether files are already downloaded
  3. Symlink from the repository to a test working directory
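
A rough sketch of those steps (the cache location, model path, and working directory are illustrative assumptions):

```python
# Rough sketch of the steps above; error handling is omitted.
import os
import subprocess
from pathlib import Path

cache_dir = Path.home() / ".iree-test-suites-cache"
repo_dir = cache_dir / "onnx-models"
work_dir = Path("artifacts")
work_dir.mkdir(parents=True, exist_ok=True)

# 1. Clone without fetching LFS content (GIT_LFS_SKIP_SMUDGE leaves pointers).
if not repo_dir.exists():
    subprocess.run(
        ["git", "clone", "https://github.com/onnx/models.git", str(repo_dir)],
        env={**os.environ, "GIT_LFS_SKIP_SMUDGE": "1"},
        check=True,
    )

# 2. Fetch one model file; git lfs skips files that were already downloaded.
model_path = "vision/classification/mnist/model/mnist-12.onnx"  # illustrative
subprocess.run(
    ["git", "lfs", "pull", f"--include={model_path}", "--exclude="],
    cwd=repo_dir,
    check=True,
)

# 3. Symlink from the repository clone into the test working directory.
symlink_path = work_dir / Path(model_path).name
if not symlink_path.exists():
    symlink_path.symlink_to(repo_dir / model_path)
```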

Then some extensions:

@zjgarvey (Collaborator)

I haven't used git lfs directly much. Is there a way to pull files with git lfs on demand through a Python script? It could be frustrating for someone who needs to test 10 different specific failures if the only options are downloading all models or downloading each one individually.

ScottTodd commented Jan 17, 2025

Sorta: https://stackoverflow.com/questions/74272833/how-to-git-clone-a-lfs-repo-through-python. We can try https://github.com/liberapay/git-lfs-fetch.py, but otherwise I was going to use subprocess.run().

Right now I have each test download the files it needs as the test runs. That lets you run a subset of tests and only download the files that those tests need.

In the old SHARK-TestSuite/iree_tests code I had that download_remote_files.py script that would download everything. That's useful for setting up the cache on a test runner without needing to run all of the tests or run a CI job that might otherwise hit timeouts (e.g. first run with empty cache: 1.5 hours, second run with warm cache: 10 minutes).

There are currently fewer than 100 tests, using less than 50 GB of files. That's easy enough to manage with downloads within each test case. I would like to support all the models, like in https://github.com/nod-ai/SHARK-TestSuite/blob/main/alt_e2eshark/onnx_tests/models/external_lists/onnx_model_zoo_computer_vision_1.txt and related files, and that would pretty quickly hit some of the harder scaling challenges.

ScottTodd added a commit that referenced this issue Jan 21, 2025
Progress on #6.

This adds a caching layer which allows developers and persistent CI
runners to avoid needing to redownload source `.onnx` files.

## Details

* The cache location defaults to `${IREE_TEST_FILES}/iree-test-suites`
if `IREE_TEST_FILES` is set, or `~/.cache/iree-test-suites/` otherwise
(this fallback is sketched in code after this list). It can be
overridden with the custom `--cache-dir=/path/to/cache` pytest option.
Several of our persistent CI machines already set the `IREE_TEST_FILES`
environment variable.
* The cache is implemented as a local git clone of the
https://github.com/onnx/models repository, which uses [Git Large File
Storage (LFS)](https://git-lfs.com/) to store large files. When a file
is requested by a test, the cache layer runs `git lfs pull` in the local
clone to fetch the latest version of the file and then it creates a
symlink from the cache directory to the test working directory. This
usage should be pretty similar to what huggingface_hub provides:
https://huggingface.co/docs/huggingface_hub/guides/manage-cache.
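
The cache-location fallback, sketched (not the suite's exact code):

```python
# Sketch of the documented cache directory fallback order.
import os
from pathlib import Path

def default_cache_dir() -> Path:
    iree_test_files = os.environ.get("IREE_TEST_FILES")
    if iree_test_files:
        return Path(iree_test_files) / "iree-test-suites"
    return Path.home() / ".cache" / "iree-test-suites"
```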

## Testing

Tested in iree-org/iree on persistent runners here:

* Cold cache:
https://github.com/iree-org/iree/actions/runs/12838451019/job/35804050925#step:8:22

    ```
    ---------------------------- live log sessionstart -----------------------------
    INFO onnx_models.conftest:conftest.py:96 Using cache directory: '/home/esaimana/iree_tests_cache/iree-test-suites'
    INFO onnx_models.cache:cache.py:115 Setting up GitHub repository 'onnx/models'
    INFO onnx_models.cache:cache.py:117 Checking for working 'git lfs' (https://git-lfs.com/)
    INFO onnx_models.cache:cache.py:136 Cloning https://github.com/onnx/models.git into '/home/esaimana/iree_tests_cache/iree-test-suites/onnx_models'
    Cloning into '/home/esaimana/iree_tests_cache/iree-test-suites/onnx_models'...
    ```

* Warm cache:
https://github.com/iree-org/iree/actions/runs/12838451019/job/35804127583#step:8:22

    ```
    ---------------------------- live log sessionstart -----------------------------
    INFO onnx_models.conftest:conftest.py:96 Using cache directory: '/home/esaimana/iree_tests_cache/iree-test-suites'
    INFO onnx_models.cache:cache.py:115 Setting up GitHub repository 'onnx/models'
    INFO onnx_models.cache:cache.py:117 Checking for working 'git lfs' (https://git-lfs.com/)
    INFO onnx_models.cache:cache.py:122 Directory '/home/esaimana/iree_tests_cache/iree-test-suites/onnx_models' already exists
    ```

(The rest of the logs are currently the same)

---------

Co-authored-by: zjgarvey <47986913+zjgarvey@users.noreply.github.com>
ScottTodd added a commit to iree-org/iree that referenced this issue Jan 31, 2025
…P. (#19524)

This switches from running ONNX model compile->run correctness tests on
only CPU to now run on GPU using the Vulkan and HIP APIs. We could also
run on CUDA with #18814 and Metal
with #18817.

These new tests will help guard against regressions to full models, at
least when using default flags. I'm planning on adding models coming
from other frameworks (such as [LiteRT
Models](https://github.com/iree-org/iree-test-suites/tree/main/litert_models))
in future PRs.

As these tests will run on every pull request and commit, I'm starting
the test list with all tests that are passing on our current set of
runners, with no (strict _or_ loose) XFAILs. The full set of tests will
be run nightly in https://github.com/iree-org/iree-test-suites using
nightly IREE releases... once we have runners with GPUs available in
that repository.

See also iree-org/iree-test-suites#65 and
iree-org/iree-test-suites#6.

## Sample logs

I have not done much triage on the test failures, but it does seem like
Vulkan pass rates are substantially lower than CPU and ROCm. Test
reports, including logs for all failures, are currently published as
artifacts on actions runs in iree-test-suites, such as
https://github.com/iree-org/iree-test-suites/actions/runs/12794322266.
We could also archive test reports somewhere like
https://github.com/nod-ai/e2eshark-reports and/or host the test reports
on a website like
https://nod-ai.github.io/shark-ai/llm/sglang/index.html?sort=result.

### CPU


https://github.com/iree-org/iree/actions/runs/12797886622/job/35681117085?pr=19524#step:8:395

```
============================== slowest durations ===============================
39.46s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[vgg/model/vgg19-7.onnx]
13.39s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
13.25s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
12.48s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
11.93s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
11.49s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
11.28s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]
11.26s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
9.14s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]
7.73s call     tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]
7.61s call     tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/gender_googlenet.onnx]
7.57s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]
7.27s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
4.86s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
4.61s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]
4.58s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]
3.08s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[squeezenet/model/squeezenet1.0-9.onnx]
2.02s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
1.90s call     tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]
================== 19 passed, 18 skipped in 184.96s (0:03:04) ==================
```

### ROCm


https://github.com/iree-org/iree/actions/runs/12797886622/job/35681117629?pr=19524#step:8:344

```
============================== slowest durations ===============================
9.40s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[densenet-121/model/densenet-12.onnx]
9.15s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
9.05s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
8.73s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
7.95s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/inception_v2/model/inception-v2-9.onnx]
7.94s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
7.81s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
7.13s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
6.95s call     tests/model_zoo/validated/vision/body_analysis_models_test.py::test_models[age_gender/models/age_googlenet.onnx]
5.15s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[efficientnet-lite4/model/efficientnet-lite4-11.onnx]
4.52s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[inception_and_googlenet/googlenet/model/googlenet-12.onnx]
3.55s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
3.12s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-v2-12.onnx]
2.57s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
2.48s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[shufflenet/model/shufflenet-9.onnx]
2.21s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx]
1.36s call     tests/model_zoo/validated/vision/super_resolution_models_test.py::test_models[sub_pixel_cnn_2016/model/super-resolution-10.onnx]
0.95s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
============ 17 passed, 19 skipped, 1 xfailed in 100.10s (0:01:40) =============
```

### Vulkan


https://github.com/iree-org/iree/actions/runs/12797886622/job/35681118044?pr=19524#step:8:216

```
============================== slowest durations ===============================
13.10s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[alexnet/model/bvlcalexnet-12.onnx]
12.97s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[caffenet/model/caffenet-12.onnx]
12.40s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[rcnn_ilsvrc13/model/rcnn-ilsvrc13-9.onnx]
12.22s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[yolov2-coco/model/yolov2-coco-9.onnx]
9.07s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v1-12.onnx]
8.09s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[resnet/model/resnet50-v2-7.onnx]
6.04s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[tiny-yolov2/model/tinyyolov2-8.onnx]
2.93s call     tests/model_zoo/validated/vision/object_detection_segmentation_models_test.py::test_models[ssd-mobilenetv1/model/ssd_mobilenet_v1_12.onnx]
1.86s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mobilenet/model/mobilenetv2-12.onnx]
0.90s call     tests/model_zoo/validated/vision/classification_models_test.py::test_models[mnist/model/mnist-12.onnx]
============= 9 passed, 27 skipped, 1 xfailed in 79.62s (0:01:19) ==============
```

ci-exactly: build_packages, test_onnx
ita9naiwa pushed a commit to ita9naiwa/iree that referenced this issue Feb 4, 2025
…P. (iree-org#19524)

Signed-off-by: Hyunsung Lee <ita9naiwa@gmail.com>