
Fix paddle.mode and paddle.bincount API #62995

Closed

Conversation

Contributor

@xingmingyyj xingmingyyj commented Mar 25, 2024

PR Category

Others

PR Types

Others

Description

The paddle.mode and paddle.bincount APIs produce incorrect results when built into and executed as a static graph. Analysis shows the cause is the same as the problem encountered in #62801; this PR fixes the output dtypes to follow the data types actually used in the kernels.
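The dtype dispatch in the bincount kernel (quoted in full further down this thread) amounts to a simple three-way rule. The sketch below is illustrative pseudologic only, not Paddle code; the function name and string dtype tags are assumptions made for the example:

```python
# Illustrative sketch (not Paddle API) of the output-dtype rule the
# bincount CUDA kernel implements, which this PR mirrors in InferMeta.
def bincount_out_dtype(weights_dtype):
    """weights_dtype is None when no weights tensor is passed."""
    if weights_dtype is None:
        return "int64"    # kernel allocates int64 when has_weights is false
    if weights_dtype == "float32":
        return "float32"  # float32 weights -> float32 output
    return "float64"      # any other weights dtype -> double output

print(bincount_out_dtype(None))       # int64
print(bincount_out_dtype("float32"))  # float32
```

InferMeta must declare the same dtype the kernel later allocates, otherwise the executor retrieves the tensor with the wrong type at runtime.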


paddle-bot bot commented Mar 25, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Mar 25, 2024
-  out->set_dtype(weights.dtype());
+  if (weights.dtype() == DataType::FLOAT32) {
+    out->set_dtype(DataType::FLOAT32);
+  } else {
+    out->set_dtype(DataType::FLOAT64);
+  }
Contributor

Is there any difference between this and out->set_dtype(weights.dtype());? The original version actually seems more concise.

Contributor Author

Is there any difference between this and out->set_dtype(weights.dtype());? The original version actually seems more concise.

This change follows this logic in the kernel:

  if (!has_weights) {
    int64_t* output_data = dev_ctx.template Alloc<int64_t>(output);
    phi::funcs::SetConstant<Context, int64_t>()(
        dev_ctx, output, static_cast<int64_t>(0));

    KernelBincount<T, InputT, int64_t>
        <<<GET_BLOCKS(input_numel), PADDLE_CUDA_NUM_THREADS, 0, stream>>>(
            input_data, input_numel, has_weights, weights_data, output_data);
  } else {
    if (weights->dtype() == DataType::FLOAT32) {
      float* output_data = dev_ctx.template Alloc<float>(output);
      phi::funcs::SetConstant<Context, float>()(
          dev_ctx, output, static_cast<float>(0));

      KernelBincount<T, InputT, float>
          <<<GET_BLOCKS(input_numel), PADDLE_CUDA_NUM_THREADS, 0, stream>>>(
              input_data, input_numel, has_weights, weights_data, output_data);
    } else {
      double* output_data = dev_ctx.template Alloc<double>(output);
      phi::funcs::SetConstant<Context, double>()(
          dev_ctx, output, static_cast<double>(0));
      KernelBincount<T, InputT, double>
          <<<GET_BLOCKS(input_numel), PADDLE_CUDA_NUM_THREADS, 0, stream>>>(
              input_data, input_numel, has_weights, weights_data, output_data);
    }
  }
}

The logic here does not match out->set_dtype(weights.dtype());.
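To make the difference concrete, here is a hypothetical side-by-side of the two rules in plain Python (not Paddle code): out->set_dtype(weights.dtype()) propagates whatever dtype weights has, while the kernel only ever produces float32 or double when weights are present, so the two rules disagree for any non-float weights dtype such as int32:

```python
def set_from_weights(weights_dtype):
    # what out->set_dtype(weights.dtype()) would declare
    return weights_dtype

def kernel_allocates(weights_dtype):
    # what the kernel actually allocates when weights are present
    return "float32" if weights_dtype == "float32" else "float64"

for wd in ("float32", "float64", "int32"):
    agree = set_from_weights(wd) == kernel_allocates(wd)
    print(wd, agree)  # float32 True, float64 True, int32 False
```

If weights were guaranteed to be float32 or float64 the two forms would be equivalent, which is presumably why the original one-liner looked sufficient; the explicit branch only matters for other weights dtypes.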

Contributor

Please also add what the dtype of weights is now.

@xingmingyyj xingmingyyj requested a review from kangguangli March 27, 2024 09:08

paddle-ci-bot bot commented Apr 2, 2024

Sorry to inform you that commit 2564443's CIs have been passing for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@xingmingyyj
Contributor Author

Additional notes on the bincount error:
When the following dynamic-to-static code is executed:

......
paddle.seed(33)
obj = naive_func
dy_out = obj(in_tensor, in_params, func)

paddle.seed(33)
jit_obj = paddle.jit.to_static(obj)
st_out = jit_obj(in_tensor, in_params, func)
print("dy_out is: ", dy_out)
print("st_out is: ", st_out)

paddle.jit.save(jit_obj, path="bincount")
print("jit.save is successfully !!!")

paddle.seed(33)
jit = paddle.jit.load("bincount")
print("jit.load is successfully !!!")

paddle.seed(33)
inputs_key = sorted(in_tensor.keys())
inputs_value = []
for k in inputs_key:
    inputs_value.append(in_tensor[k])
# print('inputs_value is: ', inputs_value)
res = jit(*inputs_value)
print('jit.load res: ', res)

compare(dy_out, res, delta=1e-5, rtol=1e-6)

The error is as follows:

Traceback (most recent call last):
  File "/home/aistudio/fix_op/Paddle/tools/fix_bitcount.py", line 106, in <module>
    res = jit(*inputs_value)
  File "/home/aistudio/fix_op/Paddle/build/python/paddle/nn/layer/layers.py", line 1429, in __call__
    return self.forward(*inputs, **kwargs)
  File "/home/aistudio/fix_op/Paddle/build/python/paddle/jit/translated_layer.py", line 1475, in __i_m_p_l__
    return _run_dygraph(self, input, program_holder)
  File "/home/aistudio/fix_op/Paddle/build/python/paddle/jit/translated_layer.py", line 1002, in _run_dygraph
    _legacy_C_ops.run_program(
ValueError: In user code:


    InvalidArgumentError: The type of data we are trying to retrieve (int32) does not match the type of data (int64) currently contained in the container.
      [Hint: Expected dtype() == phi::CppTypeToDataType<T>::Type(), but received dtype():9 != phi::CppTypeToDataType<T>::Type():7.] (at /home/aistudio/fix_op/Paddle/paddle/phi/core/dense_tensor.cc:161)
      [operator < pd_kernel.phi_kernel > error]  [operator < run_program > error]

Here we can see that in the scale operator, the tensor's actual data type does not match the currently expected data type.
The computation graph run by the executor is as follows:

{
    (%0) = "data(phi_kernel)" () {dtype:(pd_op.DataType)bool,is_persistable:[false],kernel_key:<backend:GPU|layout:Undefined(AnyLayout)|dtype:int32>,kernel_name:"data",name:"_jst.0.a.0",op_name:"pd_op.data",place:(pd_op.Place)Place(gpu:0),shape:(pd_op.IntArray)[],stop_gradient:[false]} : () -> gpu_tensor<10xi32>
    (%1) = "full(phi_kernel)" () {dtype:(pd_op.DataType)int32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:int32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)0} : () -> cpu_tensor<1xi32>
    (%2) = "bincount(phi_kernel)" (%0, <<NULL VALUE>>, %1) {is_persistable:[false],kernel_key:<backend:GPU|layout:NCHW|dtype:int32>,kernel_name:"bincount",op_name:"pd_op.bincount",stop_gradient:[false]} : (gpu_tensor<10xi32>, <<NULL TYPE>>, cpu_tensor<1xi32>) -> gpu_tensor<-1xi32>
    (%3) = "full(phi_kernel)" () {dtype:(pd_op.DataType)float32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)1} : () -> cpu_tensor<1xf32>
    (%4) = "scale(phi_kernel)" (%2, %3) {bias:(Float)0,bias_after_scale:true,is_persistable:[false],kernel_key:<backend:GPU|layout:NCHW|dtype:int32>,kernel_name:"scale",op_name:"pd_op.scale",stop_gradient:[false]} : (gpu_tensor<-1xi32>, cpu_tensor<1xf32>) -> gpu_tensor<-1xi32>
    () = "builtin.shadow_output" (%4) {output_name:"translated_layer/scale_0.tmp_0"} : (gpu_tensor<-1xi32>) -> 
}

The guess is that this is caused by the dtype setting in InferMeta. Here weights is empty and x.dtype is int32, so the output dtype was set to int32, which does not match the following kernel logic.

  if (!has_weights) {
    int64_t* output_data = dev_ctx.template Alloc<int64_t>(output);
    phi::funcs::SetConstant<Context, int64_t>()(
        dev_ctx, output, static_cast<int64_t>(0));

    KernelBincount<T, InputT, int64_t>
        <<<GET_BLOCKS(input_numel), PADDLE_CUDA_NUM_THREADS, 0, stream>>>(
            input_data, input_numel, has_weights, weights_data, output_data);
  }
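The mismatch can be reconstructed in miniature: the graph declares the bincount output as i32 (propagated from x.dtype), while the kernel's no-weights branch allocates int64, which is exactly the int32-vs-int64 complaint in the InvalidArgumentError above. The toy check below is illustrative only; the "old InferMeta rule" is an assumption inferred from the printed graph, not Paddle source:

```python
# Toy reconstruction of the failure (not Paddle code).
def old_infermeta_dtype(x_dtype, weights_dtype=None):
    # presumed old rule: with no weights, the output dtype follows x
    return x_dtype if weights_dtype is None else weights_dtype

def kernel_dtype(weights_dtype=None):
    # the kernel allocates int64 whenever has_weights is false
    if weights_dtype is None:
        return "int64"
    return "float32" if weights_dtype == "float32" else "float64"

declared = old_infermeta_dtype("int32")   # what the graph records: i32
actual = kernel_dtype()                   # what the tensor really holds: int64
print(declared, actual, declared == actual)  # int32 int64 False
```

Aligning InferMeta with kernel_dtype removes the disagreement, so the downstream scale op retrieves the tensor with the type it actually contains.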


paddle-ci-bot bot commented Apr 12, 2024

Sorry to inform you that commit e9d0862's CIs have been passing for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
