Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacing use of iree-hal-target-backends in most tests. #20295

Merged
merged 5 commits into from
Mar 21, 2025

Conversation

benvanik
Copy link
Collaborator

@benvanik benvanik commented Mar 18, 2025

Test infra will need to be its own thing; the goal here is to have all examples, samples, and tests in-tree use the modern device flags. This required fixing some layering issues (JitGlobals relying on hardcoded strings) and (mostly) fixing default option handling for local backends as well as fixing --iree-hal-local-* flags via the API (which was forcing a lot of legacy goo to hang around device names in the TargetRegistry).

Future changes will correct the misnamed plural --iree-hal-local-target-device-backends= flag (which is a list but not comma-delimited, so should not be plural), but with these changes that correction will be a minimal find/replace in the tests touched.

If a user was relying on the workaround for the legacy --iree-hal-target-backends= flag where --iree-hal-target-device= supported the same names they will need to change to either using --iree-hal-target-backends= (and eventually fixing it when that's removed) or for CPU --iree-hal-target-device=local --iree-hal-local-target-device=llvm-cpu. Hyrum's law in action.

@benvanik benvanik force-pushed the users/benvanik/backends-switch branch 6 times, most recently from 511ef68 to b98a628 Compare March 19, 2025 06:38
@benvanik
Copy link
Collaborator Author

benvanik commented Mar 19, 2025

Sharktank CPU tests are failing due to it using the legacy support hack, filed an issue: #1119.

@benvanik benvanik force-pushed the users/benvanik/backends-switch branch 2 times, most recently from c2dfc10 to 09f69df Compare March 19, 2025 15:57
@benvanik
Copy link
Collaborator Author

Sharktank updated in nod-ai/shark-ai#1122 - it needs the fix from this PR to work so it will need to land after this.

@benvanik benvanik force-pushed the users/benvanik/backends-switch branch from 09f69df to 3bb4346 Compare March 19, 2025 16:15
@benvanik benvanik changed the title [WIP] Replacing use of iree-hal-target-backends in most tests. Replacing use of iree-hal-target-backends in most tests. Mar 19, 2025
@benvanik benvanik marked this pull request as ready for review March 19, 2025 16:58
@benvanik benvanik requested review from kuhar and hanhanW as code owners March 19, 2025 16:58
@benvanik benvanik force-pushed the users/benvanik/backends-switch branch from 3bb4346 to f9e5cd7 Compare March 19, 2025 20:04
Copy link
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces more typing for a common configuration but it does better prepare us for multi-device, right? I wonder if we could have the low level APIs / tools continue to accept some shorthand notation, or if we want that to always be up to the hosting application / API to hide behind an abstraction layer / syntactic sugar.

Structural changes LGTM, modulo some docs that reference specific lines in code blocks that need some adjustment now.

@benvanik
Copy link
Collaborator Author

I'm not too concerned with extra typing - if you see what one has to do to use offloading in clang we're practically automatic ;P
I think the big decider here is that people keep building infra around the assumption that a runtime device and a compiler backend are always 1:1 - I'd rather not give them the chance to keep building stuff like that. Given that people regularly check in scripts with dozens of command line flags an extra one doesn't really feel bad, and other targets like HIP always require both the device + the ISA anyway so we're already in "you need multiple flags to specify things" territory anyway. Maybe once we deprecate iree-hal-target-backends (everyone is not assuming 1:1 device:backend) we can add back some helpful aliases.

@benvanik
Copy link
Collaborator Author

I believe the only failures are now the sharktank ones fixed by #1122 - I'll land this and then we can bump with nod-ai/shark-ai#1126

@benvanik benvanik requested a review from ScottTodd March 20, 2025 17:07
@benvanik
Copy link
Collaborator Author

(the rdna3 failure is the topk flake tracked in #20327)

@ScottTodd
Copy link
Member

@ScottTodd
Copy link
Member

I think a change to iree-test-suites can land ahead of this PR?

  1. Make the change in that repo
  2. Update the commit hash here (find-replace to update other pinned locations would be ideal):
    - name: Checkout test suites repository
    uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
    with:
    repository: iree-org/iree-test-suites
    ref: fb8ebeea324dccce51af8e725008689cab745600
    path: iree-test-suites
    lfs: true

@benvanik
Copy link
Collaborator Author

benvanik commented Mar 20, 2025

I'm not loving that we have 2 other repos to update when changes are made in upstream :(
(iree-test-suites has sharktank_models, we should pick one we care about and let the other update on integrate - is iree-test-suites the one we care about?)

@benvanik
Copy link
Collaborator Author

I'm also not sure how to manage this - can you dumb it down? Do I break iree-test-suites while this waits to land? Do I break shark-ai? Do I break both?

Test infra will need to be its own thing; the goal here is to have
all examples, samples, and tests in-tree use the modern device flags.
This is required to make local option binding function: currently the
`--iree-hal-local-` flags are only available via the global command line
accessors.
This required moving SupportedTypes to TargetBackends to allow them to
indicate for a given configuration which high-level types they support.
@ScottTodd
Copy link
Member

This repository only pulls test cases from https://github.com/iree-org/iree-test-suites. I think the sharktank CPU tests that you saw failing are those in https://github.com/iree-org/iree-test-suites/tree/main/sharktank_models. Other test suites in that repo are set up such that this repository provides all flags. Those tests aren't that flexible yet.

If we need to break one, I'd prefer to break iree-test-suites (the whole purpose of that repo is to have tests, so accommodating API changes and such is fine).

@benvanik
Copy link
Collaborator Author

Still not clear to me, but meh, I'm going to start force merging in a bit here. If we had runner capacity and fewer flakes rolling incremental changes would be appealing but we just can't handle that today. When we have cycles in the repositories someone is going to be broken - I'd prefer to break shark instead of IREE - we can't expect random contributors to go update shark but can expect them to keep IREE projects working.

benvanik added a commit to iree-org/iree-test-suites that referenced this pull request Mar 20, 2025
@benvanik benvanik force-pushed the users/benvanik/backends-switch branch from f9b616c to 2443238 Compare March 20, 2025 21:43
@benvanik
Copy link
Collaborator Author

iree-test-suites updated and commit hashes changed. PR in shark (nod-ai/shark-ai#1122) is sitting there for someone to land.

@ScottTodd
Copy link
Member

You shouldn't need the shark-ai change here. What you have now should be enough.

@benvanik benvanik merged commit c846333 into main Mar 21, 2025
45 of 47 checks passed
@benvanik benvanik deleted the users/benvanik/backends-switch branch March 21, 2025 00:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants