[Paddle Inference] Add bias input of mmha and simplify mmha. #56411
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
@@ -3987,6 +3987,7 @@ void WeightOnlyMatmulInferMeta(const MetaTensor& x,
The full quantization requirement is:
input: int32/float16/float32
output: int8/float16/float32
I see that the int8-output case is handled here, but the int32-input case doesn't seem to be covered, right?
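For illustration, here is a minimal Python sketch (assumed logic, not Paddle's actual InferMeta code) of the dtype matrix described above: inputs may be int32, float16, or float32, and outputs may be int8, float16, or float32.

```python
def infer_out_dtype(x_dtype: str, compute_dtype: str, quantize_out: bool) -> str:
    """Illustrative only: pick the output dtype for the quantized matmul."""
    # An int32 input is a quantized GEMM result, so its real compute dtype
    # must come from compute_dtype rather than from x_dtype itself.
    base = compute_dtype if x_dtype == "int32" else x_dtype
    assert base in ("float16", "float32"), f"unsupported compute dtype {base}"
    # An int8 output corresponds to the quantized-output case handled in the diff.
    return "int8" if quantize_out else base

assert infer_out_dtype("float16", "default", quantize_out=True) == "int8"
assert infer_out_dtype("int32", "float16", quantize_out=False) == "float16"
```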
#include "paddle/phi/core/kernel_registry.h" | ||
#include "paddle/phi/kernels/funcs/aligned_vector.h" | ||
#include "paddle/phi/kernels/fusion/gpu/mmha_util.cu.h" | ||
|
||
namespace phi { | ||
namespace fusion { | ||
|
Why were these header files removed?
There is no need to include redundant headers; removing them prevents them from being pulled in by other callers.
@@ -43,6 +45,7 @@ def masked_multihead_attention(
    Args:
        x (Tensor): The input tensor could be 2-D tensor. Its shape is [batch_size, 3 * num_head * head_dim].
        cache_kvs (list(Tensor)|tuple(Tensor)): The cache structure tensors for the generation model. Its shape is [2, batch_size, num_head, max_seq_len, head_dim].
        bias (Tensor, optional): The bias tensor. Its shape is [3, num_head, head_dim].
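For context, a minimal usage sketch of the new bias argument, based on the shapes in this docstring (the tensor sizes, dtype, and exact parameter names are assumptions; the released Paddle signature may use cache_kv rather than cache_kvs):

```python
import paddle
import paddle.incubate.nn.functional as F

batch_size, num_head, head_dim, max_seq_len = 2, 8, 64, 128

# Packed QKV activations: [batch_size, 3 * num_head * head_dim]
x = paddle.randn([batch_size, 3 * num_head * head_dim], dtype="float16")
# K/V cache: [2, batch_size, num_head, max_seq_len, head_dim]
cache_kv = paddle.zeros(
    [2, batch_size, num_head, max_seq_len, head_dim], dtype="float16"
)
# Added by this PR: the QKV bias, shape [3, num_head, head_dim]
bias = paddle.zeros([3, num_head, head_dim], dtype="float16")

out = F.masked_multihead_attention(x, cache_kv, bias=bias)
```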
Please also add a parameter description for compute_dtype~
Done~
@@ -77,7 +80,7 @@ def setUp(self):
        self.seq_len = 1
        self.rotary_emb_dims = 0
        self.use_neox_rotary_style = False

        self.compute_dtype = "default"
Should the unit tests also add a case where the input is int32?
The unit tests already cover the int32 case.
@@ -53,6 +56,7 @@ def masked_multihead_attention(
        seq_len (int, optional): The seq_len, used to get input length. Default 1.
        rotary_emb_dims (int, optional): The rotary_emb_dims. Default 1.
        use_neox_rotary_style (bool, optional): A flag indicating whether neox_rotary_style is needed or not. Default False.
        compute_dtype (string): A compute dtype, used to represent the input data type.
Why can't compute_dtype be inferred from the input tensor's dtype?
In the PTQ case, the input x may be int32. If we inferred the dtype from cache_kv instead, we would have to change this again when cache_kv quantization is supported later.
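To make the trade-off concrete, a small sketch (assumed logic, not Paddle's code) of why inferring the dtype from either x or cache_kv is fragile under quantization:

```python
def pick_compute_dtype(x_dtype: str, cache_kv_dtype: str,
                       compute_dtype: str = "default") -> str:
    """Illustrative only: resolve the kernel's compute dtype."""
    if compute_dtype != "default":
        return compute_dtype  # explicit value survives any quantization change
    if x_dtype != "int32":
        return x_dtype        # non-PTQ path: x still carries the dtype
    # PTQ path: x is int32, so fall back to the cache dtype. This works today,
    # but breaks once cache_kv itself is quantized to int8, which is exactly
    # the follow-up concern raised in the reply above.
    return cache_kv_dtype

assert pick_compute_dtype("float16", "float16") == "float16"
assert pick_compute_dtype("int32", "float16") == "float16"
assert pick_compute_dtype("int32", "int8", compute_dtype="float16") == "float16"
```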
LGTM for API change
LGTM for new args
PR types
Others
PR changes
Others
Description
Add bias input of mmha and simplify mmha.
Related PR: #55344
Pcard-71502