
[BUG] matmul does not support int32 tensors. #538

Closed
ZJUGuoShuai opened this issue Dec 16, 2023 · 4 comments · Fixed by #540

Comments

@ZJUGuoShuai
Contributor

ZJUGuoShuai commented Dec 16, 2023

Describe the bug
It seems like the cuBLASLt-backed GEMM (matx::matmul) does not support int types, and it just gives a wrong result without complaining about the unsupported type.

To Reproduce
Steps to reproduce the behavior:

  1. Write a simple GEMM example that multiplies two int matrices:
#include "matx.h"
#include <cassert>
#include <cstdio>

using namespace matx;

int main() {
  MATX_ENTER_HANDLER();

  index_t M = 2;
  index_t N = 3;

  auto m = make_tensor<int>({M, N});
  auto v = make_tensor<int>({N, 1});

  m.SetVals({{1, 2, 3},
             {4, 5, 6}});
  v.SetVals({{1, 2, 3}});

  auto out = make_tensor<int>({M, 1});

  (out = matmul(m, v)).run();

  cudaStreamSynchronize(0);

  printf("m:\n");
  print(m);
  printf("v:\n");
  print(v);
  printf("out:\n");
  print(out);

  CUDA_CHECK_LAST_ERROR();
  MATX_EXIT_HANDLER();
}
  2. Build and run the code; you will get this output with an all-zero result (cuBLASLt is the default backend):
m:
Tensor{int32_t} Rank: 2, Sizes:[2, 3], Strides:[3,1]
000000: 1 2 3 
000001: 4 5 6 
v:
Tensor{int32_t} Rank: 2, Sizes:[3, 1], Strides:[1,1]
000000: 1 
000001: 2 
000002: 3 
out:
Tensor{int32_t} Rank: 2, Sizes:[2, 1], Strides:[1,1]
000000: 0 
000001: 0

Expected behavior
The result should be [14, 32], as with the float matmul:

m:
Tensor{float} Rank: 2, Sizes:[2, 3], Strides:[3,1]
000000: 1.0000e+00 2.0000e+00 3.0000e+00 
000001: 4.0000e+00 5.0000e+00 6.0000e+00 
v:
Tensor{float} Rank: 2, Sizes:[3, 1], Strides:[1,1]
000000: 1.0000e+00 
000001: 2.0000e+00 
000002: 3.0000e+00 
out:
Tensor{float} Rank: 2, Sizes:[2, 1], Strides:[1,1]
000000: 1.4000e+01 
000001: 3.2000e+01

Code snippets
Listed above.

System details (please complete the following information):

  • OS: Ubuntu 22.04
  • CUDA version: 11.8.89
  • g++ version: 11.4.0

Additional context
I also tried turning on CUTLASS (-DMATX_EN_CUTLASS=ON), and it supports int tensors after some minor fixes.

Currently in MatX, matmul_impl() uses cuBLASLt as the default GEMM provider and doesn't have an easy way to switch to CUTLASS, so I changed the template default parameter to PROVIDER_TYPE_CUTLASS:

template <typename TensorTypeC, typename TensorTypeA, typename TensorTypeB, 
-         MatXMatMulProvider_t PROV = PROVIDER_TYPE_CUBLASLT>
+         MatXMatMulProvider_t PROV = PROVIDER_TYPE_CUTLASS>
void matmul_impl(TensorTypeC C, const TensorTypeA A,
            const TensorTypeB B, cudaStream_t stream = 0,
            float alpha = 1.0, float beta = 0.0)
{
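
For reference, the provider can also be selected per call by passing the template argument explicitly. Whether matmul_impl() and the provider enum are reachable from user code is my assumption here; matmul_impl() is an internal MatX function, so this is only an illustration:

// Hypothetical call-site provider selection, reusing the reproducer's tensors.
// Reachability of matmul_impl() from user code is an assumption.
matmul_impl<decltype(out), decltype(m), decltype(v), PROVIDER_TYPE_CUTLASS>(
    out, m, v, /*stream=*/0);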

Then I changed this in matxMatMulHandle_t::MatMulLaunch to suppress the compiler's float-to-int conversion error:

typename CutlassGemm::Arguments args(
            {static_cast<int>(params_.m), static_cast<int>(params_.n),
             static_cast<int>(params_.k)}, // Gemm Problem dimensions
            {a.Data(),
             static_cast<int>(params_.lda)}, // Tensor-ref for source matrix A
            {b.Data(),
             static_cast<int>(params_.ldb)}, // Tensor-ref for source matrix B
            {c.Data(),
             static_cast<int>(params_.ldc)}, // Tensor-ref for source matrix C
            {c.Data(),
             static_cast<int>(
                 params_.ldc)}, // Tensor-ref for destination matrix D (may be
                                // different memory than source C matrix)
-           {alpha, beta});     // Scalars used in the Epilogue
+           {static_cast<T1>(alpha), static_cast<T1>(beta)});     // Scalars used in the Epilogue

Now the output is normal:

m:
Tensor{int32_t} Rank: 2, Sizes:[2, 3], Strides:[3,1]
000000: 1 2 3 
000001: 4 5 6 
v:
Tensor{int32_t} Rank: 2, Sizes:[3, 1], Strides:[1,1]
000000: 1 
000001: 2 
000002: 3 
out:
Tensor{int32_t} Rank: 2, Sizes:[2, 1], Strides:[1,1]
000000: 14 
000001: 32

My suggestion
If it is not necessary to support int32 GEMM, we can throw an error telling the user that it is not supported and suggest either using the CUTLASS backend or falling back to float32. A sketch of such a guard is below.
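
As an illustration of what such a guard could look like, here is a compile-time check at the top of matmul_impl(). This is only a sketch: the value_type alias used to get the element type is an assumption about MatX's tensor interface, and a runtime throw would work just as well.

#include <type_traits>

template <typename TensorTypeC, typename TensorTypeA, typename TensorTypeB,
          MatXMatMulProvider_t PROV = PROVIDER_TYPE_CUBLASLT>
void matmul_impl(TensorTypeC C, const TensorTypeA A,
                 const TensorTypeB B, cudaStream_t stream = 0,
                 float alpha = 1.0, float beta = 0.0)
{
  // Reject integer element types on the cuBLASLt path at compile time instead
  // of silently returning zeros. TensorTypeA::value_type is an assumption
  // about how the element type is exposed.
  static_assert(!(PROV == PROVIDER_TYPE_CUBLASLT &&
                  std::is_integral_v<typename TensorTypeA::value_type>),
                "integer matmul is not supported by the cuBLASLt provider; "
                "build with -DMATX_EN_CUTLASS=ON or cast the inputs to float32");

  // ... existing implementation ...
}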

@cliffburdick
Collaborator

Hi @AtomicVar, thanks for the report. You're right that cuBLAS doesn't support it. The reason we disabled CUTLASS is that its compile times in the unit tests were extremely long: even though the functionality and speed were both adequate, the builds were taking over 40 minutes. We can add a message for this saying it's an unsupported type.

@cliffburdick
Collaborator

@AtomicVar I've submitted MR #540 to resolve this.

@cliffburdick
Collaborator

cliffburdick commented Dec 18, 2023

@AtomicVar According to the cuBLAS team, we're not meeting the requirements here: https://docs.nvidia.com/cuda/cublas/index.html#cublasltmatmul

Namely, A must be transposed.

If that still works for you, we can add support for it when those requirements are met.

@ZJUGuoShuai
Contributor Author

ZJUGuoShuai commented Dec 18, 2023

@cliffburdick Actually, I can use a float matmul instead of an int matmul to accomplish the same task, so it won't be a problem if int is not supported. I just think the unsupported cases should be documented, and a clear error message should be thrown or printed. A sketch of the float workaround I use is below.
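
For anyone hitting the same limitation, the float workaround can look roughly like this, reusing the tensors from the reproducer above. It is a sketch: as_type<>() is what I believe MatX's cast operator is called, so please check it against the documentation.

// Cast the int32 inputs to float, run the cuBLASLt-backed matmul, then cast
// the result back to int32. The as_type<>() operator name is an assumption to
// verify against the MatX docs.
auto mf   = make_tensor<float>({M, N});
auto vf   = make_tensor<float>({N, 1});
auto outf = make_tensor<float>({M, 1});

(mf = as_type<float>(m)).run();
(vf = as_type<float>(v)).run();
(outf = matmul(mf, vf)).run();
(out  = as_type<int>(outf)).run();

cudaStreamSynchronize(0);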
