MINOR: [R] better documentation for col_types argument in open_delim_dataset #45719

atsyplenkov · 2025-03-09T22:13:21Z

Rationale for this change

Hi, can you please consider this tiny update to the docs? The current documentation is misleading about how to specify col_types when a delimited file is scanned using open_csv_dataset, open_delim_dataset, etc. As written, it suggests that column types can be declared with the compact string representation that readr uses.

arrow/r/man/open_delim_dataset.Rd

Lines 164 to 165 in 3c8fe09

    
    \item{col_types}{A compact string representation of the column types,
    an Arrow \link{Schema}, or \code{NULL} (the default) to infer types from the data.}

But a compact string doesn't work. See the reprex below:

library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
tf <- tempfile()
dir.create(tf)
df <- data.frame(x = c("1", "2", "NULL"))

file_path <- file.path(tf, "file1.txt")
write.table(df, file_path, sep = ",", row.names = FALSE)

open_csv_dataset(file_path, na = c("", "NA", "NULL"), col_types = "c")
#> Error:
#> ! Unsupported `col_types` specification.
#> ℹ `col_types` must be NULL, or a <Schema>.

unlink(tf)

What changes are included in this PR?

This PR explains more clearly what should be passed to the col_types argument and adds a basic example for open_csv_dataset().
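For illustration, here is a minimal sketch of the kind of call the error message points to: passing an Arrow Schema rather than a compact string. It is not necessarily the exact example added to the docs. It assumes that a Schema covering the file's columns is accepted by col_types while the header row is still read from the file, and it writes the file with quote = FALSE (a change from the reprex above) so the raw values are unquoted.

```r
library(arrow)

# Hypothetical example file; quote = FALSE keeps the values unquoted (1, 2, NULL)
tf <- tempfile()
dir.create(tf)
file_path <- file.path(tf, "file1.txt")
write.table(
  data.frame(x = c("1", "2", "NULL")),
  file_path,
  sep = ",", row.names = FALSE, quote = FALSE
)

# Pass an Arrow Schema instead of a readr-style compact string such as "c"
ds <- open_csv_dataset(
  file_path,
  na = c("", "NA", "NULL"),
  col_types = schema(x = int32())
)

# Printing the dataset shows the column types it will use
print(ds)

unlink(tf, recursive = TRUE)
```

Under these assumptions, the column x should be reported with the type declared in the Schema rather than an inferred one.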

Are these changes tested?

Not needed, as only the R documentation has been updated.

Are there any user-facing changes?

Only the R documentation has been updated.

github-actions · 2025-03-09T22:13:48Z

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}
