Support for LargeUtf8 column type #1024

Closed
chitralverma opened this issue Dec 17, 2022 · 1 comment · Fixed by #1044
Labels
bug Something isn't working

Comments

@chitralverma
Contributor

chitralverma commented Dec 17, 2022

Environment

Delta-rs version: latest

Binding: Python

Environment:

  • Cloud provider: all
  • OS: macOS Ventura (Apple M1)
  • Other:

Bug

What happened:
While writing a PyArrow table to a Delta table location, the writer fails with the following error:

```
deltalake.PyDeltaTableError: Schema error: Invalid data type for Delta Lake: LargeUtf8
```

Not sure whether this is a write-only issue or also affects reads.

What you expected to happen:
PyArrow large types such as LargeUtf8 (`pa.large_string()`, also called LargeString) should be supported.

How to reproduce it:
You can use the following minimal code to reproduce this.

```python
import pyarrow as pa
from deltalake.writer import write_deltalake

# Create a pyarrow table with LargeString column 'name'
pylist = [{'name': 'Joey', 'age': 14}, {'name': 'Ivan', 'age': 32}]
schema = pa.schema([pa.field('name', pa.large_string()), pa.field('age', pa.int64())])
t = pa.Table.from_pylist(pylist, schema=schema)

# Fails with: Schema error: Invalid data type for Delta Lake: LargeUtf8
write_deltalake("/tmp/test-delta", data=t)
```


@houqp
Member

houqp commented Dec 30, 2022

Delta Lake itself doesn't distinguish between large and regular string/binary types: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#primitive-types. I think we can just represent them using the primitive `string` and `binary` types.

wjones127 added a commit that referenced this issue Mar 30, 2023
# Description
Added the missing mappings from the following Arrow types to Delta types:
- `LargeUtf8` (LargeString) -> `string`
- `LargeBinary` -> `binary`
- `FixedSizeBinary(_)` -> `binary`
- `LargeList(_)` -> `array`
- `UInt8` -> `byte`
- `UInt16` -> `short`
- `UInt32` -> `int`
- `UInt64` -> `long`
- `Date64` -> `date`

# Related Issue(s)
closes #1024

---------

Signed-off-by: Chitral Verma <chitralverma@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Marijn Valk <marijncv@hotmail.com>
Co-authored-by: Marko Grujic <markoog@gmail.com>
Co-authored-by: Robert Pack <42610831+roeap@users.noreply.github.com>
Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>
Co-authored-by: QP Hou <dave2008713@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ilia Moshkov <is.moshkov@mail.ru>
Co-authored-by: Ilya Moshkov <ilya.moshkov@exosfinancial.com>
Co-authored-by: byteink <jonahgaox@gmail.com>
Co-authored-by: John Batty <johnbatty@microsoft.com>
Co-authored-by: Ian Alexander Joiner <14581281+iajoiner@users.noreply.github.com>
Co-authored-by: Marijn Valk <marijncv@hotmail.com>
Co-authored-by: David Blajda <db@davidblajda.com>
Co-authored-by: Tommy Guy <richardtguy84@gmail.com>
Co-authored-by: Tommy Guy <riguy@microsoft.com>
Co-authored-by: bold <bernhard@specht.net>
Co-authored-by: xudong.w <wxd963996380@gmail.com>
Co-authored-by: Rachel Bushrian <rbushri@gmail.com>
Co-authored-by: rbushrian <rbushrian@akamai.com>
Co-authored-by: Matthew Powers <matthewkevinpowers@gmail.com>
2 participants