
[BUG] matmul does not support int32 tensors. #538

Closed
ZJUGuoShuai opened this issue Dec 16, 2023 · 4 comments · Fixed by #540

Comments

@ZJUGuoShuai
Contributor

ZJUGuoShuai commented Dec 16, 2023

Describe the bug
It seems like the cuBLASLt-backed GEMM (matx::matmul) does not support int types, and it just gives a wrong result without complaining about the unsupported type.

To Reproduce
Steps to reproduce the behavior:

  1. Write a simple GEMM example that multiplies two int matrices:
#include "matx.h"
#include <cassert>
#include <cstdio>

using namespace matx;

int main() {
  MATX_ENTER_HANDLER();

  index_t M = 2;
  index_t N = 3;

  auto m = make_tensor<int>({M, N});
  auto v = make_tensor<int>({N, 1});

  m.SetVals({{1, 2, 3},
             {4, 5, 6}});
  v.SetVals({{1, 2, 3}});

  auto out = make_tensor<int>({M, 1});

  (out = matmul(m, v)).run();

  cudaStreamSynchronize(0);

  printf("m:\n");
  print(m);
  printf("v:\n");
  print(v);
  printf("out:\n");
  print(out);

  CUDA_CHECK_LAST_ERROR();
  MATX_EXIT_HANDLER();
}
  2. Build and run the code; you will get this output with an all-zero result (cuBLASLt is the default backend):
m:
Tensor{int32_t} Rank: 2, Sizes:[2, 3], Strides:[3,1]
000000: 1 2 3 
000001: 4 5 6 
v:
Tensor{int32_t} Rank: 2, Sizes:[3, 1], Strides:[1,1]
000000: 1 
000001: 2 
000002: 3 
out:
Tensor{int32_t} Rank: 2, Sizes:[2, 1], Strides:[1,1]
000000: 0 
000001: 0

Expected behavior
The result should be [14, 32], as with the float matmul:

m:
Tensor{float} Rank: 2, Sizes:[2, 3], Strides:[3,1]
000000: 1.0000e+00 2.0000e+00 3.0000e+00 
000001: 4.0000e+00 5.0000e+00 6.0000e+00 
v:
Tensor{float} Rank: 2, Sizes:[3, 1], Strides:[1,1]
000000: 1.0000e+00 
000001: 2.0000e+00 
000002: 3.0000e+00 
out:
Tensor{float} Rank: 2, Sizes:[2, 1], Strides:[1,1]
000000: 1.4000e+01 
000001: 3.2000e+01

Code snippets
Listed above.

System details (please complete the following information):

  • OS: Ubuntu 22.04
  • CUDA version: 11.8.89
  • g++ version: 11.4.0

Additional context
I also tried turning on CUTLASS (-DMATX_EN_CUTLASS=ON), and it supports int tensors after some minor fixes.

Currently in MatX, matmul_impl() uses cuBLASLt as the default GEMM provider and doesn't have an easy way to switch to CUTLASS, so I changed the template default parameter to PROVIDER_TYPE_CUTLASS:

template <typename TensorTypeC, typename TensorTypeA, typename TensorTypeB, 
-         MatXMatMulProvider_t PROV = PROVIDER_TYPE_CUBLASLT>
+         MatXMatMulProvider_t PROV = PROVIDER_TYPE_CUTLASS>
void matmul_impl(TensorTypeC C, const TensorTypeA A,
            const TensorTypeB B, cudaStream_t stream = 0,
            float alpha = 1.0, float beta = 0.0)
{
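
For reference, the provider can also be selected per call by passing the template argument explicitly. Whether matmul_impl() and the provider enum are reachable from user code is my assumption here; matmul_impl() is an internal MatX function, so this is only an illustration:

// Hypothetical call-site provider selection, reusing the reproducer's tensors.
// Reachability of matmul_impl() from user code is an assumption.
matmul_impl<decltype(out), decltype(m), decltype(v), PROVIDER_TYPE_CUTLASS>(
    out, m, v, /*stream=*/0);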

Then I changed this in matxMatMulHandle_t::MatMulLaunch to suppress the compiler's float-to-int conversion error:

typename CutlassGemm::Arguments args(
            {static_cast<int>(params_.m), static_cast<int>(params_.n),
             static_cast<int>(params_.k)}, // Gemm Problem dimensions
            {a.Data(),
             static_cast<int>(params_.lda)}, // Tensor-ref for source matrix A
            {b.Data(),
             static_cast<int>(params_.ldb)}, // Tensor-ref for source matrix B
            {c.Data(),
             static_cast<int>(params_.ldc)}, // Tensor-ref for source matrix C
            {c.Data(),
             static_cast<int>(
                 params_.ldc)}, // Tensor-ref for destination matrix D (may be
                                // different memory than source C matrix)
-           {alpha, beta});     // Scalars used in the Epilogue
+           {static_cast<T1>(alpha), static_cast<T1>(beta)});     // Scalars used in the Epilogue

Now the output is normal:

m:
Tensor{int32_t} Rank: 2, Sizes:[2, 3], Strides:[3,1]
000000: 1 2 3 
000001: 4 5 6 
v:
Tensor{int32_t} Rank: 2, Sizes:[3, 1], Strides:[1,1]
000000: 1 
000001: 2 
000002: 3 
out:
Tensor{int32_t} Rank: 2, Sizes:[2, 1], Strides:[1,1]
000000: 14 
000001: 32

My suggestion
If it is not necessary to support int32 GEMM, we can throw an error telling the user that it is not supported and suggest either using the CUTLASS backend or falling back to float32. A sketch of such a guard is below.
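
As an illustration of what such a guard could look like, here is a compile-time check at the top of matmul_impl(). This is only a sketch: the value_type alias used to get the element type is an assumption about MatX's tensor interface, and a runtime throw would work just as well.

#include <type_traits>

template <typename TensorTypeC, typename TensorTypeA, typename TensorTypeB,
          MatXMatMulProvider_t PROV = PROVIDER_TYPE_CUBLASLT>
void matmul_impl(TensorTypeC C, const TensorTypeA A,
                 const TensorTypeB B, cudaStream_t stream = 0,
                 float alpha = 1.0, float beta = 0.0)
{
  // Reject integer element types on the cuBLASLt path at compile time instead
  // of silently returning zeros. TensorTypeA::value_type is an assumption
  // about how the element type is exposed.
  static_assert(!(PROV == PROVIDER_TYPE_CUBLASLT &&
                  std::is_integral_v<typename TensorTypeA::value_type>),
                "integer matmul is not supported by the cuBLASLt provider; "
                "build with -DMATX_EN_CUTLASS=ON or cast the inputs to float32");

  // ... existing implementation ...
}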

@cliffburdick
Collaborator

Hi @AtomicVar, thanks for the report. You're right that cuBLAS doesn't support it. The reason we disabled CUTLASS is that its compile times in the unit tests were extremely long: even though the functionality and speed were both adequate, the builds were taking over 40 minutes. We can add a message for this saying it's an unsupported type.

@cliffburdick
Collaborator

@AtomicVar I've submitted MR #540 to resolve this.

@cliffburdick
Collaborator

cliffburdick commented Dec 18, 2023

@AtomicVar According to the cuBLAS team, we're not meeting the requirements here: https://docs.nvidia.com/cuda/cublas/index.html#cublasltmatmul

Namely, A must be transposed.

If that still works for you, we can add support for it when those requirements are met.

@ZJUGuoShuai
Contributor Author

ZJUGuoShuai commented Dec 18, 2023

@cliffburdick Actually, I can use a float matmul instead of an int matmul to accomplish the same task, so it won't be a problem if int is not supported. I just think the unsupported cases should be documented, and a clear error message should be thrown or printed. A sketch of the float workaround I use is below.
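
For anyone hitting the same limitation, the float workaround can look roughly like this, reusing the tensors from the reproducer above. It is a sketch: as_type<>() is what I believe MatX's cast operator is called, so please check it against the documentation.

// Cast the int32 inputs to float, run the cuBLASLt-backed matmul, then cast
// the result back to int32. The as_type<>() operator name is an assumption to
// verify against the MatX docs.
auto mf   = make_tensor<float>({M, N});
auto vf   = make_tensor<float>({N, 1});
auto outf = make_tensor<float>({M, 1});

(mf = as_type<float>(m)).run();
(vf = as_type<float>(v)).run();
(outf = matmul(mf, vf)).run();
(out  = as_type<int>(outf)).run();

cudaStreamSynchronize(0);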
