Replacing use of iree-hal-target-backends in most tests. #20295
Sharktank CPU tests are failing because they use the legacy support hack; filed an issue: #1119.
Sharktank updated in nod-ai/shark-ai#1122 - it needs the fix from this PR to work so it will need to land after this.
This introduces more typing for a common configuration but it does better prepare us for multi-device, right? I wonder if we could have the low level APIs / tools continue to accept some shorthand notation, or if we want that to always be up to the hosting application / API to hide behind an abstraction layer / syntactic sugar.
Structural changes LGTM, modulo some docs that reference specific lines in code blocks that need some adjustment now.
tests/external/iree-test-suites/onnx_models/onnx_models_gpu_rocm_rdna3.json
I'm not too concerned with extra typing - if you see what one has to do to use offloading in clang we're practically automatic ;P
I believe the only failures are now the sharktank ones fixed by #1122 - I'll land this and then we can bump with nod-ai/shark-ai#1126
(the rdna3 failure is the topk flake tracked in #20327)
https://github.com/iree-org/iree-test-suites/blob/fb8ebeea324dccce51af8e725008689cab745600/sharktank_models/llama3.1/test_llama.py#L112-L114 is going to need updates to fix these failures: https://github.com/iree-org/iree/actions/runs/13973252247/job/39121782465?pr=20295 I think a change to iree-test-suites can land ahead of this PR? See also iree-org/iree-test-suites#86 (comment)
I'm not loving that we have 2 other repos to update when changes are made in upstream :(
I'm also not sure how to manage this - can you dumb it down? Do I break iree-test-suites while this waits to land? Do I break shark-ai? Do I break both?
Test infra will need to be its own thing; the goal here is to have all examples, samples, and tests in-tree use the modern device flags.
This is required to make local option binding function: currently the `--iree-hal-local-` flags are only available via the global command line accessors.
This required moving SupportedTypes to TargetBackends to allow them to indicate for a given configuration which high-level types they support.
This repository only pulls test cases from https://github.com/iree-org/iree-test-suites. I think the sharktank CPU tests that you saw failing are those in https://github.com/iree-org/iree-test-suites/tree/main/sharktank_models. Other test suites in that repo are set up such that this repository provides all flags. Those tests aren't that flexible yet. If we need to break one, I'd prefer to break iree-test-suites (the whole purpose of that repo is to have tests, so accommodating API changes and such is fine).
Still not clear to me, but meh, I'm going to start force merging in a bit here. If we had runner capacity and fewer flakes, rolling incremental changes would be appealing, but we just can't handle that today. When we have cycles in the repositories someone is going to be broken - I'd prefer to break shark instead of IREE - we can't expect random contributors to go update shark but can expect them to keep IREE projects working.
iree-test-suites updated and commit hashes changed. PR in shark (nod-ai/shark-ai#1122) is sitting there for someone to land.
You shouldn't need the shark-ai change here. What you have now should be enough.
Test infra will need to be its own thing; the goal here is to have all examples, samples, and tests in-tree use the modern device flags. This required fixing some layering issues (JitGlobals relying on hardcoded strings) and (mostly) fixing default option handling for `local` backends as well as fixing `--iree-hal-local-*` flags via the API (which was forcing a lot of legacy goo to hang around device names in the TargetRegistry).
Future changes will correct the misnamed plural `--iree-hal-local-target-device-backends=` flag (which is a list but not comma-delimited, so should not be plural), but with these changes that correction will be a minimal find/replace in the tests touched.
If a user was relying on the workaround for the legacy `--iree-hal-target-backends=` flag, where `--iree-hal-target-device=` supported the same names, they will need to change to either using `--iree-hal-target-backends=` (and eventually fixing it when that's removed) or, for CPU, `--iree-hal-target-device=local --iree-hal-local-target-device-backends=llvm-cpu`. Hyrum's law in action.
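For anyone with scripts that still pass the legacy CPU spelling, the migration described above can be sketched as a small argument rewriter. This is only an illustration, not part of the PR: `modernize_flags` and the two constants are hypothetical names, and it assumes the CPU backend is selected with the `--iree-hal-local-target-device-backends=` flag named in the description. It only covers the `llvm-cpu` case; other backends would need their own mapping.

```python
# Hypothetical helper (not part of IREE): rewrite legacy CPU target flags
# into the modern device + local-backend pair described in this PR.

# Legacy spellings: the old backends flag, plus the workaround where the
# device flag accepted a backend name directly.
LEGACY_CPU_FLAGS = {
    "--iree-hal-target-backends=llvm-cpu",
    "--iree-hal-target-device=llvm-cpu",
}

# Modern spelling for CPU, assuming the flag names quoted in the description.
MODERN_CPU_FLAGS = [
    "--iree-hal-target-device=local",
    "--iree-hal-local-target-device-backends=llvm-cpu",
]


def modernize_flags(args):
    """Return a copy of args with legacy CPU flags replaced in place."""
    out = []
    for arg in args:
        if arg in LEGACY_CPU_FLAGS:
            out.extend(MODERN_CPU_FLAGS)
        else:
            out.append(arg)
    return out


if __name__ == "__main__":
    old = ["iree-compile", "--iree-hal-target-backends=llvm-cpu", "model.mlir"]
    print(" ".join(modernize_flags(old)))
```

Flags that are already in the modern form pass through unchanged, so running the rewriter twice is harmless.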