Ukernel lowering for data-tiled multi_mma with mfma_i32_16x16x32_i8 #19522
Conversation
Just thought of a problem: while it takes an int unroll_k parameter, the only value that works is 2, because it uses a fixed vector type with that size. I'll fix that tomorrow.
Drive-by comment: aren't unroll_k etc. implementation details of the compiler? Do they need to cross the ukernel API boundary? I would expect the ukernel to only worry about the problem size/architecture and not these details.
These: iree/compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/GPUTileSwizzleUtils.cpp, lines 150 to 155 in e553425
// Preserve the lowering_config attribute for GPULowerToUKernelsPass.
constexpr char loweringConfigAttrName[] = "lowering_config";
if (mmaOp->hasAttr(loweringConfigAttrName)) {
  newMmaOp->setAttr(loweringConfigAttrName,
                    mmaOp->getAttr(loweringConfigAttrName));
}
nit: use kConfigAttrName.
constexpr StringLiteral kConfigAttrName = "lowering_config";
Question: do we preserve all the discardable attributes (i.e., additional attributes that are not defined by the op itself)? If so, you can do something like
newMmaOp->setDiscardableAttrs(mmaOp->getDiscardableAttrDictionary());
Thanks, that worked! Earlier I had tried something with setAttrs / getAttrs and that caused other tests to fail. I didn't know about setDiscardableAttrs.
auto newKind = mmaOp.getKind();
if (auto dataTiledMma = dyn_cast<DataTiledMMAAttr>(newKind)) {
  newKind = DataTiledMMAAttr::get(
      context, dataTiledMma.getIntrinsic(), dataTiledMma.getUnrollM(),
      /*subgroups_m=*/1, dataTiledMma.getUnrollN(),
      /*subgroups_n=*/1, dataTiledMma.getUnrollK());
}
I did not pay much attention to the changes to DataTiledMMAAttr. Why do we drop the newKind here? Does it impact the codegen path, or is it handled by the attribute interface implementation? Thanks in advance for the explanation; perhaps we can put this information in the PR description.
We are now transporting the old kind, unchanged. The code being deleted here was creating a new kind that only preserved the unroll_* parameters but had the subgroups_* parameters set to 1. I thought those parameters were inherently not needed after thread distribution. That changed with ukernels. While it remains true that the subgroups_* parameters should not be needed after thread distribution, in order to avoid needing them, codegen makes use of all the stride information for all the expanded dimensions. We could pass all these strides to the ukernels, but that would be cumbersome, particularly as the number of dimensions varies as unit dimensions are omitted. So in this case, passing the original DataTiledMMAAttr parameters and letting the ukernel infer the strides results in much simpler code. The drawback is an interaction-at-a-distance in the layouts implied by these parameters, but I think that's OK.
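To illustrate the stride-inference point, here is a minimal sketch, not the actual ukernel code: the per-lane element counts follow from the 16x16x32 intrinsic shape and a 64-lane subgroup, while the layout order (k-unrolled tiles innermost) and the helper names are assumptions for illustration only.

// Hypothetical sketch: with a 64-lane subgroup, each lane owns 8 i8 elements
// of A, 8 of B, and 4 i32 of C per 16x16x32 intrinsic tile. Assuming the
// k-unrolled tiles are innermost per lane, the strides between unrolled tiles
// follow directly from the DataTiledMMAAttr parameters, so they do not need
// to cross the ukernel ABI.
enum { A_ELEMS_PER_LANE = 8, B_ELEMS_PER_LANE = 8, C_ELEMS_PER_LANE = 4 };

static inline int a_unroll_m_stride(int unroll_k) {
  return A_ELEMS_PER_LANE * unroll_k;  // i8 elements between unroll_m steps of A
}

static inline int c_unroll_m_stride(int unroll_n) {
  return C_ELEMS_PER_LANE * unroll_n;  // i32 elements between unroll_m steps of C
}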
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
e4fa8e7 to e23f5a2
Resolved.
This finishes implementing an initial ukernel for multi_mma for DataTiledMMAAttr with kind = mfma_i32_16x16x32_i8.

The ukernel takes unroll and subgroup parameters as function parameters. The idea is that once inlining works as intended, these function parameters will be constants and the optimized code will be the same as if we had hardcoded specific values. This inlining isn't happening at the moment, but that is a bug that we should fix first. It is happening in LLVMCPU, so that's probably something missing in LLVMGPU.
The ukernel file has a comment with a few TODOs to get from this initial naive ukernel to something faster. The first step is to fix the above-mentioned inlining problem, then get shared memory, then get better instruction scheduling.
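As a reference for the parameter-passing design above, here is a minimal sketch with a hypothetical name and signature (not the code in iree_uk_amdgpu_multi_mma_mfma_i32_16x16x32_i8.c): the unroll factors are plain function parameters, and once the call is inlined with constant arguments the loop bounds become compile-time constants, so the optimized code can match a hardcoded variant. The subgroup parameters would be passed the same way; they are omitted here to keep the sketch short.

// Hypothetical sketch only: the unroll factors arrive as ordinary int
// parameters. After inlining with constant arguments, the loop trip counts
// are compile-time constants and the optimizer can specialize the code as if
// the values had been hardcoded.
void multi_mma_mfma_i32_16x16x32_i8_sketch(const char* a, const char* b,
                                           int* acc, int unroll_m,
                                           int unroll_n, int unroll_k) {
  for (int m = 0; m < unroll_m; ++m) {
    for (int n = 0; n < unroll_n; ++n) {
      for (int k = 0; k < unroll_k; ++k) {
        // One mfma_i32_16x16x32_i8 intrinsic per (m, n, k) tile would be
        // issued here; omitted in this sketch.
        (void)a; (void)b; (void)acc;
      }
    }
  }
}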