MINOR: [R] better documentation for col_types argument in open_delim_dataset #45719

Open

atsyplenkov

Rationale for this change

Hi, can you please consider this tiny update to the docs? The current documentation is misleading about how to specify col_types when a delimited file is scanned with open_csv_dataset, open_delim_dataset, etc. As written, it suggests that column types can be declared with the compact string representation that readr uses:

\item{col_types}{A compact string representation of the column types,
an Arrow \link{Schema}, or \code{NULL} (the default) to infer types from the data.}

But this doesn't work; see the reprex below:

library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
tf <- tempfile()
dir.create(tf)
df <- data.frame(x = c("1", "2", "NULL"))

file_path <- file.path(tf, "file1.txt")
write.table(df, file_path, sep = ",", row.names = FALSE)

open_csv_dataset(file_path, na = c("", "NA", "NULL"), col_types = "c")
#> Error:
#> ! Unsupported `col_types` specification.
#> ℹ `col_types` must be NULL, or a <Schema>.

unlink(tf)
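
For comparison, here is a minimal sketch of the form the error message asks for, with an Arrow Schema passed as col_types instead of a compact string. It reuses the file from the reprex above. The choice of int32() for column x is an assumption made for illustration, and no output is shown because this call was not part of the original reprex.

```r
library(arrow)

tf <- tempfile()
dir.create(tf)
df <- data.frame(x = c("1", "2", "NULL"))

file_path <- file.path(tf, "file1.txt")
write.table(df, file_path, sep = ",", row.names = FALSE)

# Assumption: int32() is a suitable type for column x. With the `na`
# setting below, the string "NULL" should be treated as a missing value.
ds <- open_csv_dataset(
  file_path,
  na = c("", "NA", "NULL"),
  col_types = schema(x = int32())
)

# Inspect the schema the dataset was opened with
ds$schema

# tf is a directory, so it has to be removed recursively
unlink(tf, recursive = TRUE)
```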

What changes are included in this PR?

This PR gives a clearer explanation of what should be passed to the col_types argument, along with a basic example for open_csv_dataset().

Are these changes tested?

Not needed, as only the R documentation has been updated.

Are there any user-facing changes?

No, only the R documentation has been updated.


github-actions bot commented Mar 9, 2025

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}


atsyplenkov changed the title from "docs: better documentation for col_types argument in R's open_delim_dataset" to "MINOR: [R] better documentation for col_types argument in open_delim_dataset" on Mar 9, 2025