Replacing use of iree-hal-target-backends in most tests. #20295

benvanik · 2025-03-18T16:16:21Z

Test infra will need to be its own thing; the goal here is to have all examples, samples, and tests in-tree use the modern device flags. This required fixing some layering issues (JitGlobals relying on hardcoded strings) and (mostly) fixing default option handling for local backends as well as fixing --iree-hal-local-* flags via the API (which was forcing a lot of legacy goo to hang around device names in the TargetRegistry).

Future changes will correct the misnamed plural --iree-hal-local-target-device-backends= flag (which is a list but not comma-delimited, so should not be plural), but with these changes that correction will be a minimal find/replace in the tests touched.

If a user was relying on the workaround for the legacy --iree-hal-target-backends= flag, where --iree-hal-target-device= accepted the same names, they will need to either switch to --iree-hal-target-backends= (and eventually fix that when it's removed) or, for CPU, use --iree-hal-target-device=local --iree-hal-local-target-device-backends=llvm-cpu. Hyrum's law in action.
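For scripts that build compiler command lines programmatically, the CPU migration described above could be sketched roughly as follows. This is only an illustration: `migrate_cpu_flags` is a hypothetical helper, not something in IREE, and it assumes the local backend flag is spelled `--iree-hal-local-target-device-backends=` as named earlier in this description. It also assumes that the legacy `--iree-hal-target-backends=llvm-cpu` spelling maps to the same pair of flags as the workaround spelling.

```python
# Hypothetical helper for illustration; not part of IREE or its test suites.
LEGACY_FLAG = "--iree-hal-target-backends="
DEVICE_FLAG = "--iree-hal-target-device="
# Assumption: flag spelling taken from the PR description above.
LOCAL_BACKENDS_FLAG = "--iree-hal-local-target-device-backends="


def migrate_cpu_flags(flags):
    """Return a copy of `flags` with CPU selection spelled as device + backend.

    Only the llvm-cpu case is rewritten; every other flag passes through
    unchanged, including non-CPU uses of the legacy flag.
    """
    migrated = []
    for flag in flags:
        if flag in (LEGACY_FLAG + "llvm-cpu", DEVICE_FLAG + "llvm-cpu"):
            # One legacy flag becomes two: the runtime device and the
            # compiler backend used to generate code for it.
            migrated.append(DEVICE_FLAG + "local")
            migrated.append(LOCAL_BACKENDS_FLAG + "llvm-cpu")
        else:
            migrated.append(flag)
    return migrated
```

Splitting the single flag into two mirrors the point of the change: the device and the backend are chosen separately rather than being assumed to be the same name.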

benvanik · 2025-03-19T15:04:50Z

Sharktank CPU tests are failing because they use the legacy support hack; filed an issue: #1119.

benvanik · 2025-03-19T16:10:07Z

Sharktank updated in nod-ai/shark-ai#1122 - it needs the fix from this PR to work so it will need to land after this.

ScottTodd

This introduces more typing for a common configuration but it does better prepare us for multi-device, right? I wonder if we could have the low level APIs / tools continue to accept some shorthand notation, or if we want that to always be up to the hosting application / API to hide behind an abstraction layer / syntactic sugar.

Structural changes LGTM, modulo some docs that reference specific lines in code blocks that need some adjustment now.

docs/website/docs/guides/deployment-configurations/cpu.md

tests/external/iree-test-suites/onnx_models/onnx_models_gpu_rocm_rdna3.json

docs/website/docs/developers/general/developer-tips.md

docs/website/docs/community/blog/posts/linalg-tutorial.md

benvanik · 2025-03-20T00:55:17Z

I'm not too concerned with extra typing - if you see what one has to do to use offloading in clang we're practically automatic ;P
I think the big decider here is that people keep building infra around the assumption that a runtime device and a compiler backend are always 1:1 - I'd rather not give them the chance to keep building stuff like that. Given that people regularly check in scripts with dozens of command line flags, an extra one doesn't really feel bad, and other targets like HIP always require both the device + the ISA, so we're already in "you need multiple flags to specify things" territory. Maybe once we deprecate iree-hal-target-backends (and everyone has stopped assuming 1:1 device:backend) we can add back some helpful aliases.

benvanik · 2025-03-20T17:07:02Z

I believe the only failures are now the sharktank ones fixed by #1122 - I'll land this and then we can bump with nod-ai/shark-ai#1126

benvanik · 2025-03-20T17:09:02Z

(the rdna3 failure is the topk flake tracked in #20327)

ScottTodd · 2025-03-20T21:22:20Z

https://github.com/iree-org/iree-test-suites/blob/fb8ebeea324dccce51af8e725008689cab745600/sharktank_models/llama3.1/test_llama.py#L112-L114 is going to need updates to fix these failures: https://github.com/iree-org/iree/actions/runs/13973252247/job/39121782465?pr=20295

I think a change to iree-test-suites can land ahead of this PR? See also iree-org/iree-test-suites#86 (comment)

ScottTodd · 2025-03-20T21:23:47Z

> I think a change to iree-test-suites can land ahead of this PR?

1. Make the change in that repo
2. Update the commit hash here (find-replace to update other pinned locations would be ideal):

iree/.github/workflows/pkgci_test_sharktank.yml, lines 61 to 67 in 43753b3:

    - name: Checkout test suites repository
      uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      with:
        repository: iree-org/iree-test-suites
        ref: fb8ebeea324dccce51af8e725008689cab745600
        path: iree-test-suites
        lfs: true
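The find-replace suggested above could be scripted along these lines. This is a minimal sketch, assuming the pin appears as a literal commit hash in YAML files under a single directory; `replace_pin` is a hypothetical helper and not an existing script in the repository.

```python
from pathlib import Path


def replace_pin(root, old_ref, new_ref, pattern="*.yml"):
    """Replace `old_ref` with `new_ref` in matching files under `root`.

    Returns the list of files that were rewritten so the caller can
    review the change before committing.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        if old_ref in text:
            path.write_text(text.replace(old_ref, new_ref), encoding="utf-8")
            changed.append(path)
    return changed
```

It might be called as `replace_pin(".github/workflows", "fb8ebeea324dccce51af8e725008689cab745600", new_hash)`, where `new_hash` is the full hash of whichever iree-test-suites commit should be pinned next.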

benvanik · 2025-03-20T21:27:02Z

I'm not loving that we have 2 other repos to update when changes are made in upstream :(
(iree-test-suites has sharktank_models, we should pick one we care about and let the other update on integrate - is iree-test-suites the one we care about?)

benvanik · 2025-03-20T21:29:02Z

I'm also not sure how to manage this - can you dumb it down? Do I break iree-test-suites while this waits to land? Do I break shark-ai? Do I break both?

ScottTodd · 2025-03-20T21:34:49Z

This repository only pulls test cases from https://github.com/iree-org/iree-test-suites. I think the sharktank CPU tests that you saw failing are those in https://github.com/iree-org/iree-test-suites/tree/main/sharktank_models. Other test suites in that repo are set up such that this repository provides all flags. Those tests aren't that flexible yet.

If we need to break one, I'd prefer to break iree-test-suites (the whole purpose of that repo is to have tests, so accommodating API changes and such is fine).

benvanik · 2025-03-20T21:41:44Z

Still not clear to me, but meh, I'm going to start force merging in a bit here. If we had runner capacity and fewer flakes, rolling incremental changes would be appealing, but we just can't handle that today. When we have cycles in the repositories someone is going to be broken. I'd prefer to break shark instead of IREE: we can't expect random contributors to go update shark but can expect them to keep IREE projects working.

benvanik · 2025-03-20T21:45:31Z

iree-test-suites updated and commit hashes changed. PR in shark (nod-ai/shark-ai#1122) is sitting there for someone to land.

ScottTodd · 2025-03-20T21:47:06Z

You shouldn't need the shark-ai change here. What you have now should be enough.

benvanik force-pushed the users/benvanik/backends-switch branch 6 times, most recently from 511ef68 to b98a628 Compare March 19, 2025 06:38

benvanik force-pushed the users/benvanik/backends-switch branch 2 times, most recently from c2dfc10 to 09f69df Compare March 19, 2025 15:57

benvanik mentioned this pull request Mar 19, 2025

Correcting usage of --iree-hal-target-device=llvm-cpu. nod-ai/shark-ai#1122

Open

benvanik force-pushed the users/benvanik/backends-switch branch from 09f69df to 3bb4346 Compare March 19, 2025 16:15

benvanik changed the title ~~[WIP] Replacing use of iree-hal-target-backends in most tests.~~ Replacing use of iree-hal-target-backends in most tests. Mar 19, 2025

benvanik requested review from stellaraccident, MaheshRavishankar and ScottTodd March 19, 2025 16:42

benvanik marked this pull request as ready for review March 19, 2025 16:58

benvanik requested review from kuhar and hanhanW as code owners March 19, 2025 16:58

benvanik force-pushed the users/benvanik/backends-switch branch from 3bb4346 to f9e5cd7 Compare March 19, 2025 20:04

ScottTodd reviewed Mar 19, 2025

ScottTodd mentioned this pull request Mar 20, 2025

Bump IREE requirement pins to 3.3.0rc20250320 nod-ai/shark-ai#1126

Open

benvanik force-pushed the users/benvanik/backends-switch branch from f9e5cd7 to 8c40839 Compare March 20, 2025 15:28

benvanik requested a review from ScottTodd March 20, 2025 17:07

benvanik added 4 commits March 20, 2025 14:30

Replacing use of iree-hal-target-backends in most tests.

16f9378

Test infra will need to be its own thing; the goal here is to have all examples, samples, and tests in-tree use the modern device flags.

Moving local device registration to an always-enabled plugin.

2314cbf

This is required to make local option binding function: currently the `--iree-hal-local-` flags are only available via the global command line accessors.

Decoupling JitGlobals from hard-coded legacy backend selection.

e526cc4

This required moving SupportedTypes to TargetBackends to allow them to indicate for a given configuration which high-level types they support.

Renaming ONNX test suites from ROCM to HIP (the HAL running the test).

ac58078

benvanik added a commit to iree-org/iree-test-suites that referenced this pull request Mar 20, 2025

Correcting misuse of --iree-hal-target-device=llvm-cpu.

c08c53f

For iree-org/iree#20295.

benvanik mentioned this pull request Mar 20, 2025

Correcting misuse of --iree-hal-target-device=llvm-cpu. iree-org/iree-test-suites#90

Merged

benvanik added a commit to iree-org/iree-test-suites that referenced this pull request Mar 20, 2025

Correcting misuse of --iree-hal-target-device=llvm-cpu. (#90)

0c8c3ac

For iree-org/iree#20295.

Updating iree-test-suites with new flags.

2443238

benvanik force-pushed the users/benvanik/backends-switch branch from f9b616c to 2443238 Compare March 20, 2025 21:43

ScottTodd approved these changes Mar 20, 2025

benvanik merged commit c846333 into main Mar 21, 2025
45 of 47 checks passed

benvanik deleted the users/benvanik/backends-switch branch March 21, 2025 00:28
