Update branch #161 (Merged)
Add initial structure for gemma through e2eshark
Models for which the ONNX → Torch → IREE CPU run passes
Passing tests
After upstream [torch-mlir#3009](llvm/torch-mlir#3009) lands, DynamicQuantizeLinear will fail because uint8 is unsupported in unpackbytearray (an issue will be filed). DynamicQuantizeLinearCast will also fail unless [torch-mlir#3018](llvm/torch-mlir#3018) lands as well.
Adds support for unpacking uint8 types.
These let you use options from config files but ignore xfail and skip run settings, to easily get logs with all test failures, like these:

* [pytest_compile_only_2024_03_21.txt](https://gist.github.com/ScottTodd/ecc9c57c01bfc5e996a15cdd38df6a9c)
* [pytest_all_failures_2024_03_21.txt](https://gist.github.com/ScottTodd/1a02531cc76a3b8566428207e39d1870)

Both of those are using `IREE compiler version 20240321.838 @ 5f2743baaa79160a3883854e60e5188822dceeb1`

(This was previously possible by editing the config .json file, but now it's easier.)

So minimal repro instructions for the latest nightly IREE ONNX test suite failures:

```
git clone https://github.com/nod-ai/SHARK-TestSuite.git
cd SHARK-TestSuite
python -m venv .venv
source .venv/bin/activate
python -m pip install -r iree_tests/requirements.txt
python -m pip install --find-links https://iree.dev/pip-release-links.html iree-compiler iree-runtime --upgrade
pytest iree_tests/onnx -n auto --ignore-xfails --config-files ./iree_tests/configs/config_cpu_llvm_sync.json
```

See the README for other common scenarios (like using a source build of IREE).
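For readers unfamiliar with how pytest options like these are typically wired up, here is a minimal sketch of a `conftest.py` that registers `--config-files` and `--ignore-xfails` and applies expected-failure lists from the config JSON. This is an illustrative assumption, not the test suite's actual conftest; only the option names and the `expected_compile_failures` key come from this page.

```python
# Minimal sketch of a conftest.py registering the two options used above.
# Not the test suite's actual implementation; the logic here is an assumption.
import json
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--config-files",
        action="append",
        default=[],
        help="JSON config files listing expected compile/run failures",
    )
    parser.addoption(
        "--ignore-xfails",
        action="store_true",
        default=False,
        help="Run every test, ignoring expected-failure lists from config files",
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--ignore-xfails"):
        return  # leave all tests runnable so logs show every real failure
    expected = set()
    for path in config.getoption("--config-files"):
        with open(path) as f:
            expected.update(json.load(f).get("expected_compile_failures", []))
    for item in items:
        if item.name in expected:
            item.add_marker(pytest.mark.xfail(strict=True, reason="listed in config"))
```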
This commit plugs the turbine tank exported MLIR into the IREE testing framework for the llama + SD models. I couldn't add a real-weight test for llama: the real weight file is 20GB and crashes the runner when I try. On that note, it looks like the splat test cases aren't running at the moment? Maybe we can enable the real-weight test after we get the cluster. Also, in case it's useful for anyone adding models in the future, for the exported model MLIR I had to update the real weight flag to `--parameters=model=real_weights.irpa` to get it working.
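To make the flag usage concrete, here is a minimal sketch of invoking `iree-run-module` with that parameters flag from Python. The module and flagfile names are taken from the sample commands later on this page; the rest is an illustrative assumption, not the test harness itself.

```python
# Minimal sketch: run a compiled module with real weights supplied as an .irpa
# parameter archive. Paths are placeholders; only the flag format
# `--parameters=model=real_weights.irpa` comes from the comment above.
import subprocess

cmd = [
    "iree-run-module",
    "--module=model_cpu_llvm_sync_test.vmfb",  # compiled module, as in the sample logs
    "--device=local-sync",
    "--parameters=model=real_weights.irpa",    # real-weight parameter archive
    "--flagfile=test_data_flags.txt",          # input/output flags, as in the sample logs
]
subprocess.run(cmd, check=True)
```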
I added a test case for SiLU at Elliott's request.
The model.onnx comes from running the QuantizedMLP_basic test in torch-mlir with the onnx config. With the changes in [torch-mlir PR#3089](llvm/torch-mlir#3089), this test still fails with the --torchtolinalg flag due to the mixed signedness of the quantized matmul arguments. See [torch-mlir#3090](llvm/torch-mlir#3090).
Signed-off-by: Gaurav Shukla <gashukla@amd.com>
Also simplify the GPU config, since it is just a reference file.
Follow-up to #126 (comment). All I needed to do to generate this PR was pull changes (with git lfs installed). To avoid this in the future, I think others working with these files should also install git lfs.
…om az storage. Add template model.py for tests
If a test unexpectedly passes the compilation phase, this now keeps going and tests the runtime. Depending on the outcome, a specific message is logged, allowing for easier updating of config files. Updating is still a manual process, but it now requires less moving of test cases back and forth between XFAIL lists.

Sample logs:

```
________ IREE compile and run: test_averagepool_2d_precomputed_strides::cpu_llvm_sync_test ________
[gw20] win32 -- Python 3.11.2 D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Scripts\python.exe
[XPASS(strict)] Expected compile to fail (remove from 'expected_compile_failures')
```

```
___ IREE compile and run: test_dynamicquantizelinear_min_adjusted_expanded::cpu_llvm_sync_test ____
[gw50] win32 -- Python 3.11.2 D:\dev\projects\SHARK-TestSuite\iree_tests\nightly_pip.venv\Scripts\python.exe
Expected compile failure but run failed (move to 'expected_run_failures'):
Error invoking iree-run-module
Error code: 1
Stderr diagnostics:
Stdout diagnostics:
EXEC @test_dynamicquantizelinear_min_adjusted_expanded
[FAILED] result[0]: metadata is 3x4xi8; expected that the view matches 3x4xui8; expected that the view is equal to contents of a view of 3x4xui8
  expected: 3x4xui8=[64 134 83 159][213 255 96 166][249 255 191 149]
  actual:   3x4xi8=[64 -122 83 -97][-43 -1 96 -90][-7 -1 -65 -107]
[FAILED] result[2]: metadata is i8; expected that the view matches ui8; expected that the view is equal to contents of a view of ui8
  expected: ui8=0
  actual:   i8=0

Compiled with:
  cd D:\dev\projects\SHARK-TestSuite\iree_tests\onnx\node\generated\test_dynamicquantizelinear_min_adjusted_expanded && iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model_cpu_llvm_sync_test.vmfb

Run with:
  cd D:\dev\projects\SHARK-TestSuite\iree_tests\onnx\node\generated\test_dynamicquantizelinear_min_adjusted_expanded && iree-run-module --module=model_cpu_llvm_sync_test.vmfb --device=local-sync --flagfile=test_data_flags.txt
```
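As a rough illustration of the control flow described above, here is a sketch of the compile-then-run decision logic. The callables, config keys, and structure are assumptions rather than the test suite's actual code; only the log messages mirror the samples.

```python
# Rough sketch of the decision flow described above, not the test suite's
# actual implementation. compile_fn/run_fn stand in for hypothetical helpers
# that wrap iree-compile and iree-run-module and return True on success.
import pytest


def compile_and_run(test_name, compile_fn, run_fn, config):
    expect_compile_failure = test_name in config.get("expected_compile_failures", [])
    expect_run_failure = test_name in config.get("expected_run_failures", [])

    if not compile_fn(test_name):
        if expect_compile_failure:
            pytest.xfail("Expected compile failure")
        pytest.fail("Unexpected compile failure (add to 'expected_compile_failures')")

    # Compilation succeeded; even if it was expected to fail, keep going and
    # exercise the runtime so the log says exactly how to update the config.
    ran_ok = run_fn(test_name)
    if expect_compile_failure:
        if ran_ok:
            pytest.fail("Expected compile to fail (remove from 'expected_compile_failures')")
        pytest.fail("Expected compile failure but run failed (move to 'expected_run_failures')")
    if not ran_ok:
        if expect_run_failure:
            pytest.xfail("Expected run failure")
        pytest.fail("Unexpected run failure (add to 'expected_run_failures')")
```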
…erating instructions
This commit adds support for testing all sdxl submodels
- RAFT_vaiq_int8
- RDN_pytorch_vaiq_int8
- pytorch-3dunet_vaiq_int8
- MobileNetV3_small_vaiq_int8

Generated issues: #598, #599
1. opt-125m is not in Azure storage yet. I'll see if I can figure out how to add it.
2. Using past_sequence_length != 0 results in an ONNX Runtime error that looks virtually identical to this issue: [onnx#6130](microsoft/onnxruntime#6130). I'm not sure how the onnx file was generated, but perhaps there is an option mentioned in that thread which can fix the issue.
3. Using past_sequence_length = 0 will successfully run in ONNX, but fails in torch-mlir due to shape issues present in the torch-onnx-mlir IR (possibly an importer issue).
This commit adds the V0 support for testing all sdxl submodels. This PR is not a long-term solution for benchmarking, as Scott and I discussed here: #152. It is the result of a request from our team to get sdxl benchmarking in ASAP. Given the high priority of adding this to the sdxl testing as our team lands patches in IREE, this is simply to get the implementation in and working. Scott and I discussed some more intensive and well-structured ways to add benchmarking, which either of us may implement in the future. Also, this PR depends on #152 in terms of landing (hence the CI failure).

Notes for the future if we decide that we need a stronger implementation:

1. Maybe something like iree-org/iree#16965, which will feed into https://perf.iree.dev/.
2. This is the benchmarking framework we already have: https://github.com/openxla/iree-comparative-benchmark and https://github.com/openxla/iree/tree/main/build_tools/benchmarks
3. Some questions Scott raised to keep in mind for a future implementation:
   * What metrics/artifacts do we want from benchmarking?
     * Each model in isolation? Full pipeline latency? Just dispatch time?
   * What do we want done with benchmark results/artifacts?
     * The in-tree benchmarks in IREE submit results to a dashboard (that should use a queryable database...), upload Tracy files to cloud storage, and comment on pending pull requests with results summaries.
   * Where do we want benchmarks to run?
     * Right after tests, on presubmit to IREE?
     * In a separate job, on separate runners?

If we decide benchmarking needs changes, we will address all of these and come up with a more structured, methodical implementation that either creates a new benchmarking flow here or plugs into the IREE benchmarking setup.
- Adds support for downloading and running the e2e tests for onnx models (a download sketch follows this list)
- Currently supported models:
  - onnx/models/DarkNet53_vaiq_int8
  - onnx/models/DenseNet201_vaiq_int8
  - onnx/models/EfficientNet_v2_s_vaiq_int8
  - onnx/models/GoogLeNet_vaiq_int8
  - onnx/models/Inception_v4_vaiq_int8
  - onnx/models/LRASPP_vaiq_int8
  - onnx/models/MNASNet_1_3_vaiq_int8
  - onnx/models/MobileNetV3_small_vaiq_int8
  - onnx/models/RDN_pytorch_vaiq_int8
  - onnx/models/RegNet_y_8gf_vaiq_int8
  - onnx/models/ResNet152_vaiq_int8
  - onnx/models/retinanet_resnet50_fpn_vaiq_int8
  - onnx/models/RRDB_ESRGAN_vaiq_int8
  - onnx/models/ShuffleNet_v2_x2_0_vaiq_int8
  - onnx/models/SqueezeNet_1_1_vaiq_int8
  - onnx/models/VGG11_bn_vaiq_int8
  - onnx/models/VGG19_vaiq_int8
  - onnx/models/VideoResNet_vaiq_int8
  - onnx/models/WideResNet_50_2_vaiq_int8
  - onnx/models/YoloNetV3_vaiq_int8
  - onnx/models/yolov8n_vaiq_int8
  - onnx/models/u-net_brain_mri_vaiq_int8

More on the way :)
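For context on what the download step involves, here is a minimal sketch of fetching a model artifact from Azure blob storage with the azure-storage-blob package. The account URL, container, and blob path are placeholders and not the test suite's actual storage layout or download logic.

```python
# Minimal sketch of fetching one model artifact from Azure blob storage.
# Account URL, container, and blob path are placeholders (assumptions),
# not the test suite's real storage layout.
from pathlib import Path
from azure.storage.blob import BlobClient

ACCOUNT_URL = "https://example.blob.core.windows.net"  # placeholder account
CONTAINER = "onnx-models"                              # placeholder container
BLOB_NAME = "DarkNet53_vaiq_int8/model.onnx"           # placeholder blob path


def download_model(dest_dir: str) -> Path:
    dest = Path(dest_dir) / Path(BLOB_NAME).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    blob = BlobClient(
        account_url=ACCOUNT_URL, container_name=CONTAINER, blob_name=BLOB_NAME
    )
    with open(dest, "wb") as f:
        f.write(blob.download_blob().readall())  # anonymous read; pass credential= if needed
    return dest


if __name__ == "__main__":
    print(download_model("./downloads"))
```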
Pass all tests with llvm/torch-mlir#3134

```
| tests                              | model-run | onnx-import | torch-mlir | iree-compile | inference |
|:-----------------------------------|:----------|:------------|:-----------|:-------------|:----------|
| onnx/operators/ReduceProdKeepdims1 | passed    | passed      | passed     | passed       | passed    |
| onnx/operators/ReduceProdKeepdims0 | passed    | passed      | passed     | passed       | passed    |
```
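For reference on what these two operator tests exercise: ONNX ReduceProd with keepdims=1 keeps the reduced axes as size-1 dimensions, while keepdims=0 drops them, matching numpy's prod. A small illustration (plain numpy, not code from the test suite):

```python
# Illustration of the ReduceProd keepdims semantics the two tests above exercise.
import numpy as np

x = np.arange(1, 7, dtype=np.float32).reshape(2, 3)

keepdims1 = np.prod(x, axis=1, keepdims=True)   # shape (2, 1), like ReduceProdKeepdims1
keepdims0 = np.prod(x, axis=1, keepdims=False)  # shape (2,),   like ReduceProdKeepdims0

print(keepdims1)  # [[  6.] [120.]]
print(keepdims0)  # [  6. 120.]
```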
This is to test some work I'm doing on quantized batch matmul in torch-mlir to get opt-125m to lower to linalg. For now, I'm focusing on the one case I'm seeing in opt-125m, which is rank3 x rank2 -> rank3 MatMulInteger ops.
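To make that shape pattern concrete, here is a small ONNX graph containing a rank-3 x rank-2 -> rank-3 MatMulInteger (batched activations times a 2-D weight). The shapes, names, and zero-point values are illustrative assumptions, not taken from the opt-125m model.

```python
# Build a tiny ONNX model with the rank3 x rank2 -> rank3 MatMulInteger case
# described above. Shapes and zero points are illustrative placeholders.
import onnx
from onnx import TensorProto, helper

A = helper.make_tensor_value_info("A", TensorProto.UINT8, [2, 4, 8])   # rank-3 activations
B = helper.make_tensor_value_info("B", TensorProto.INT8, [8, 16])      # rank-2 weights
Y = helper.make_tensor_value_info("Y", TensorProto.INT32, [2, 4, 16])  # rank-3 result

node = helper.make_node(
    "MatMulInteger",
    inputs=["A", "B", "a_zero_point", "b_zero_point"],
    outputs=["Y"],
)

# Scalar zero points whose types match their corresponding inputs.
a_zp = helper.make_tensor("a_zero_point", TensorProto.UINT8, [], [128])
b_zp = helper.make_tensor("b_zero_point", TensorProto.INT8, [], [0])

graph = helper.make_graph(
    [node], "matmulinteger_rank3_rank2", [A, B], [Y], initializer=[a_zp, b_zp]
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])
onnx.checker.check_model(model)
onnx.save(model, "model.onnx")
```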