Add eager string tensor #41039

Merged: 13 commits, Apr 15, 2022

Contributor

@joey12300 joey12300 commented Mar 28, 2022

PR types

New features

PR changes

Others

Describe

Interface description

Adds the core.eager.StringTensor interface. It currently provides six initialization methods and a numpy method, structured as follows:

class StringTensor:
   def __init__(): 
       # case 1
       pass
   def __init__(dims: vector<int>, name: std::string):
       # case 2
       pass
   def __init__(value: ndarray, name: std::string):  # place and zero_copy parameters removed
       # case 3
       pass
   def __init__(value: ndarray):
       # case 4
       pass
   def __init__(tensor: Tensor):
       # case 5
       pass
   def __init__(tensor: Tensor, name: std::string):  # place parameter removed
       # case 6
       pass
   def numpy():
       # returns a numpy array of Unicode type
       pass
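The six call shapes listed above can be mimicked in plain Python to see how one constructor might tell them apart. This is only a sketch: `MockStringTensor`, its helpers, and the auto-generated name scheme are hypothetical and are not part of Paddle, and nested lists of `str` stand in for an `ndarray`.

```python
from math import prod


def _shape_of(nested):
    """Derive a shape from a regular nested list, e.g. [["a"], ["b"]] -> [2, 1]."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return shape


def _flatten(nested):
    """Flatten a nested list of str into a flat list in row-major order."""
    if isinstance(nested, str):
        return [nested]
    return [s for item in nested for s in _flatten(item)]


class MockStringTensor:
    """Hypothetical stand-in that only illustrates the six call shapes."""

    _auto_names = 0  # counter for auto-generated names (made-up scheme)

    def __init__(self, *args):
        if len(args) > 2:
            raise TypeError("expected at most two positional arguments")
        source = args[0] if args else None
        if len(args) == 2:
            name = args[1]
        else:
            name = "mock_string_tensor_{}".format(MockStringTensor._auto_names)
            MockStringTensor._auto_names += 1
        if not isinstance(name, str):
            raise TypeError("name must be a str")
        self.name = name

        if source is None:  # case 1: empty tensor
            self.shape, self.data = [], []
        elif isinstance(source, MockStringTensor):  # cases 5 and 6: copy
            self.shape, self.data = list(source.shape), list(source.data)
        elif len(args) == 2 and all(isinstance(d, int) for d in source):
            # case 2: dims plus name, filled with empty strings
            self.shape = list(source)
            self.data = [""] * prod(source)
        else:  # cases 3 and 4: value with or without a name
            self.shape, self.data = _shape_of(source), _flatten(source)


t1 = MockStringTensor()                 # case 1
t2 = MockStringTensor([2, 3], "ST2")    # case 2
t4 = MockStringTensor([["a"], ["B"]])   # case 4
t6 = MockStringTensor(t4, "ST6")        # case 6
```

The mock copies on construction in cases 5 and 6; whether the real binding copies or shares storage is not stated in this PR description.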

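The initialization interface of core.eager.StringTensor is modeled on that of core.eager.Tensor, with the three parameters dtype, persistable, and stop_gradient removed to suit StringTensor's needs. StringTensor currently exposes only CPU functionality, so the place parameter is also removed from the interface. As for zero_copy, the memory layout of StringTensor differs from that of a numpy string array, so a numpy string array cannot be used directly (without copying) to initialize a StringTensor; zero_copy is therefore removed. core.eager.StringTensor does not support initialization from a FrameworkTensor.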
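To make the layout mismatch concrete, the sketch below packs the same strings two ways using only the standard library. The fixed-width packing mirrors how a NumPy Unicode (`<U`) array stores its elements: four bytes per code point, each element null-padded to the longest string. The variable-length packing is a hypothetical stand-in for a per-element string representation; the PR description only says the layouts differ, so this is not a description of Paddle's actual storage.

```python
def pack_fixed_width_utf32(strings):
    """Pack like a NumPy '<U' array: UTF-32, every element padded to one width."""
    width = max((len(s) for s in strings), default=0)
    itemsize = 4 * width  # bytes per element
    buffer = b"".join(s.ljust(width, "\0").encode("utf-32-le") for s in strings)
    return buffer, itemsize


def pack_variable_length_utf8(strings):
    """Hypothetical variable-length layout: UTF-8 payloads plus byte offsets."""
    payload = bytearray()
    offsets = [0]
    for s in strings:
        payload += s.encode("utf-8")
        offsets.append(len(payload))
    return bytes(payload), offsets


samples = ["15.4寸笔记本", "One of the very best Three Stooges shorts ever."]
fixed_buffer, itemsize = pack_fixed_width_utf32(samples)
packed_buffer, offsets = pack_variable_length_utf8(samples)

# Recover the second string from each layout.
second_fixed = fixed_buffer[itemsize:2 * itemsize].decode("utf-32-le").rstrip("\0")
second_packed = packed_buffer[offsets[1]:offsets[2]].decode("utf-8")
```

Because the two buffers differ in both encoding and element boundaries, one cannot simply be reinterpreted as the other; a conversion pass over every element is needed, which is consistent with dropping a zero_copy option.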

Usage example

import numpy as np
import paddle
from paddle.fluid import core

A_arr = np.array([
  ["15.4寸笔记本的键盘确实爽,基本跟台式机差不多了,蛮喜欢数字小键盘"],  # From ChnSentiCorp
  ["One of the very best Three Stooges shorts ever."]])  # From IMDB

ST1 = core.eager.StringTensor()  # case 1
ST2 = core.eager.StringTensor([2, 3], "ST2")  # case 2
ST3 = core.eager.StringTensor(A_arr, "ST3")  # case 3
ST4 = core.eager.StringTensor(A_arr)  # case 4
ST5 = core.eager.StringTensor(ST4)  # case 5

string_tensors = [ST1, ST2, ST3, ST4, ST5]
for i, tensor in enumerate(string_tensors, start=1):  # start at 1 so the printed index matches the case numbers
    lower_tensor = core.eager.ops.final_state_strings_lower(tensor, True)
    upper_tensor = core.eager.ops.final_state_strings_upper(tensor, True)
    print("case {}: Tensor={}, name={}, shape={}, dtype={}, place={}, numpy={}".format(
      i, tensor, tensor.name, tensor.shape, tensor.dtype, tensor.place, tensor.numpy()))
    print("case {}: LowerTensor={}, name={}, shape={}, dtype={}, place={}, numpy={}".format(
      i, lower_tensor, lower_tensor.name, lower_tensor.shape, lower_tensor.dtype, lower_tensor.place, lower_tensor.numpy()))
    print("case {}: UpperTensor={}, name={}, shape={}, dtype={}, place={}, numpy={}".format(
      i, upper_tensor, upper_tensor.name, upper_tensor.shape, upper_tensor.dtype, upper_tensor.place, upper_tensor.numpy()))
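When checking the printed results for ST3 and ST4, it can help to have a plain-Python reference for the element-wise lowercase and uppercase conversions. The sketch below applies `str.lower` and `str.upper` to a nested list holding the same two sentences. The helper `map_strings` is hypothetical, the meaning of the second argument (`True`) to the Paddle ops is not stated in this PR description, and Python's case mapping may differ from the Paddle kernels on edge cases.

```python
def map_strings(nested, fn):
    """Apply fn to every str inside a (possibly nested) list, keeping the shape."""
    if isinstance(nested, str):
        return fn(nested)
    return [map_strings(item, fn) for item in nested]


rows = [
    ["15.4寸笔记本的键盘确实爽,基本跟台式机差不多了,蛮喜欢数字小键盘"],
    ["One of the very best Three Stooges shorts ever."],
]

expected_lower = map_strings(rows, str.lower)
expected_upper = map_strings(rows, str.upper)

for original, lower, upper in zip(rows, expected_lower, expected_upper):
    print(original[0], "|", lower[0], "|", upper[0])
```

The Chinese sentence contains no cased characters, so it passes through both conversions unchanged; only the English sentence is affected.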

@paddle-bot-old

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@joey12300 joey12300 force-pushed the string_tensor_creator branch from 5055381 to c48cf72 Compare March 30, 2022 08:03
@joey12300 joey12300 marked this pull request as ready for review March 30, 2022 08:04
@joey12300 joey12300 force-pushed the string_tensor_creator branch from c48cf72 to 9d55fe3 Compare March 30, 2022 08:19
@joey12300 joey12300 force-pushed the string_tensor_creator branch from af51e82 to d49f6de Compare April 4, 2022 12:44
@joey12300 joey12300 force-pushed the string_tensor_creator branch from 2249eef to 1909f78 Compare April 4, 2022 14:20
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Apr 8, 2022
@PaddlePaddle PaddlePaddle unlocked this conversation Apr 8, 2022
@joey12300 joey12300 requested a review from JiabinYang April 10, 2022 07:52
Contributor

@JiabinYang JiabinYang left a comment


LGTM

@joey12300 joey12300 merged commit a22b68b into PaddlePaddle:develop Apr 15, 2022
joey12300 added a commit to joey12300/Paddle that referenced this pull request Apr 15, 2022
* Add core.eager.StringTensor __init__ which pyarray args can be passed

* Add the numpy method of core.eager.StringTensor

* revert tensor.to_string modification

* Add ToPyObject for core.eager.StringTensor

* Add debug string for core.eager.StringTensor

* Remove place args of core.eager.StringTensor temporarily

* Fix check string_tensor error

* remove dtype of core.eager.StringTensor

* add core.eager.StringTensor unittest

* remove pstring from VarDesc

* Add InitStringTensorWithStringTensor

* Remove to_string modification

* Remove zero_copy arg from StringTensor creator
phlrain pushed a commit that referenced this pull request Apr 18, 2022