[save] Stream Worksheet XML to Disk #1255

Merged 11 commits on Mar 3, 2025
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,5 +1,9 @@
 # openxlsx2 (development version)

+## New features
+
+* A new experimental `flush` argument has been introduced to `wb_save()`, allowing a custom XML streaming function for worksheets to help prevent memory spikes. This feature has only been tested within `openxlsx2` and not extensively with spreadsheet software. Since it bypasses certain failsafe mechanisms, including XML validity checks, it should only be used as a last-resort solution. [1255](https://github.com/JanMarvin/openxlsx2/pull/1255)
+
 ## Fixes

 * Input validation has been added to `fmt_txt()`, similar to how it has been added to the `create_*()` family a while ago.
4 changes: 4 additions & 0 deletions R/RcppExports.R
@@ -340,6 +340,10 @@ set_sst <- function(sharedStrings) {
     .Call(`_openxlsx2_set_sst`, sharedStrings)
 }

+write_worksheet_slim <- function(sheet_data, prior, post, fl) {
+    invisible(.Call(`_openxlsx2_write_worksheet_slim`, sheet_data, prior, post, fl))
+}
+
 write_worksheet <- function(prior, post, sheet_data) {
     .Call(`_openxlsx2_write_worksheet`, prior, post, sheet_data)
 }
18 changes: 16 additions & 2 deletions R/class-workbook-wrappers.R
@@ -67,10 +67,24 @@ wb_workbook <- function(

 #' Save a workbook to file
 #'
+#' @details When saving a `wbWorkbook` to a file, memory usage may spike
+#' depending on the worksheet size. This happens because the entire XML
+#' structure is created in memory before writing to disk. The memory
+#' required depends on worksheet size, as XML files consist of character
+#' data and include additional overhead for validity checks.
+#'
+#' The `flush` argument streams worksheet XML data directly to disk,
+#' avoiding the need to build the full XML tree in memory. This reduces
+#' memory usage but skips some XML validity checks. It also bypasses
+#' the `pugixml` functions that `openxlsx2` uses, omitting certain
+#' preliminary sanity checks before writing. As the name suggests,
+#' the output is simply flushed to disk.
+#'
 #' @param wb A `wbWorkbook` object to write to file
 #' @param file A path to save the workbook to
 #' @param overwrite If `FALSE`, will not overwrite when `file` already exists.
 #' @param path Deprecated argument. Please use `file` in new code.
+#' @param flush Experimental, streams the worksheet file to disk
 #'
 #' @export
 #' @family workbook wrappers
@@ -86,9 +100,9 @@ wb_workbook <- function(
 #' \donttest{
 #' wb_save(wb, file = temp_xlsx(), overwrite = TRUE)
 #' }
-wb_save <- function(wb, file = NULL, overwrite = TRUE, path = NULL) {
+wb_save <- function(wb, file = NULL, overwrite = TRUE, path = NULL, flush = FALSE) {
   assert_workbook(wb)
-  wb$clone()$save(file = file, overwrite = overwrite, path = path)
+  wb$clone()$save(file = file, overwrite = overwrite, path = path, flush = flush)
 }

 # add data ----------------------------------------------------------------
90 changes: 59 additions & 31 deletions R/class-workbook.R
@@ -2910,8 +2910,9 @@ wbWorkbook <- R6::R6Class(
     #' @param file The path to save the workbook to
     #' @param overwrite If `FALSE`, will not overwrite when `path` exists
     #' @param path Deprecated argument previously used for file. Please use file in new code.
+    #' @param flush Experimental, streams the worksheet file to disk
     #' @return The `wbWorkbook` object invisibly
-    save = function(file = self$path, overwrite = TRUE, path = NULL) {
+    save = function(file = self$path, overwrite = TRUE, path = NULL, flush = FALSE) {

       if (!is.null(path)) {
         .Deprecated(old = "wb_save(path)", new = "wb_save(file)", package = "openxlsx2")
@@ -2920,6 +2921,7 @@ wbWorkbook <- R6::R6Class(

       assert_class(file, "character")
       assert_class(overwrite, "logical")
+      assert_class(flush, "logical")

       if (file.exists(file) & !overwrite) {
         stop("File already exists!")
@@ -3435,7 +3437,8 @@ wbWorkbook <- R6::R6Class(
         xlchartsDir,
         xlchartsRelsDir,
         xlworksheetsDir,
-        xlworksheetsRelsDir
+        xlworksheetsRelsDir,
+        use_pugixml_export = isFALSE(flush)
       )

       ## write sharedStrings.xml
@@ -9643,7 +9646,8 @@ wbWorkbook <- R6::R6Class(
       xlchartsDir,
       xlchartsRelsDir,
       xlworksheetsDir,
-      xlworksheetsRelsDir
+      xlworksheetsRelsDir,
+      use_pugixml_export
     ) {

       ## write charts
@@ -9778,44 +9782,68 @@ wbWorkbook <- R6::R6Class(
           }
         } else {
           ## Write worksheets
-          ws <- self$worksheets[[i]]
-          hasHL <- length(ws$hyperlinks) > 0
+          # ws <- self$worksheets[[i]]
+          hasHL <- length(self$worksheets[[i]]$hyperlinks) > 0

-          prior <- ws$get_prior_sheet_data()
-          post <- ws$get_post_sheet_data()
+          prior <- self$worksheets[[i]]$get_prior_sheet_data()
+          post <- self$worksheets[[i]]$get_post_sheet_data()

-          if (!is.null(ws$sheet_data$cc)) {
+          if (use_pugixml_export) {
+            # failsaves. check that all rows and cells
+            # are available and in the correct order
+            if (!is.null(self$worksheets[[i]]$sheet_data$cc)) {

-            cc <- ws$sheet_data$cc
-            cc$r <- stringi::stri_join(cc$c_r, cc$row_r)
-            # prepare data for output
+              self$worksheets[[i]]$sheet_data$cc$r <- with(
+                self$worksheets[[i]]$sheet_data$cc,
+                stringi::stri_join(c_r, row_r)
+              )
+              cc <- self$worksheets[[i]]$sheet_data$cc
+              # prepare data for output

-            # there can be files, where row_attr is incomplete because a row
-            # is lacking any attributes (presumably was added before saving)
-            # still row_attr is what we want!
+              # there can be files, where row_attr is incomplete because a row
+              # is lacking any attributes (presumably was added before saving)
+              # still row_attr is what we want!

-            rows_attr <- ws$sheet_data$row_attr
-            ws$sheet_data$row_attr <- rows_attr[order(as.numeric(rows_attr[, "r"])), ]
+              rows_attr <- self$worksheets[[i]]$sheet_data$row_attr
+              self$worksheets[[i]]$sheet_data$row_attr <- rows_attr[order(as.numeric(rows_attr[, "r"])), ]

-            cc_rows <- ws$sheet_data$row_attr$r
-            # c("row_r", "c_r", "r", "v", "c_t", "c_s", "c_cm", "c_ph", "c_vm", "f", "f_attr", "is")
-            cc <- cc[cc$row_r %in% cc_rows, ]
+              cc_rows <- self$worksheets[[i]]$sheet_data$row_attr$r
+              # c("row_r", "c_r", "r", "v", "c_t", "c_s", "c_cm", "c_ph", "c_vm", "f", "f_attr", "is")
+              cc <- cc[cc$row_r %in% cc_rows, ]

-            ws$sheet_data$cc <- cc[order(as.integer(cc[, "row_r"]), col2int(cc[, "c_r"])), ]
-          } else {
-            ws$sheet_data$row_attr <- NULL
-            ws$sheet_data$cc <- NULL
+              self$worksheets[[i]]$sheet_data$cc <- cc[order(as.integer(cc[, "row_r"]), col2int(cc[, "c_r"])), ]
+              rm(cc)
+            } else {
+              self$worksheets[[i]]$sheet_data$row_attr <- NULL
+              self$worksheets[[i]]$sheet_data$cc <- NULL
+            }
           }

-          # create entire sheet prior to writing it
-          sheet_xml <- write_worksheet(
-            prior = prior,
-            post = post,
-            sheet_data = ws$sheet_data
-          )
           ws_file <- file.path(xlworksheetsDir, sprintf("sheet%s.xml", i))
-          write_xmlPtr(doc = sheet_xml, fl = ws_file)
-          rm(sheet_xml)
+
+          if (use_pugixml_export) {
+
+            # create entire sheet prior to writing it
+            sheet_xml <- write_worksheet(
+              prior = prior,
+              post = post,
+              sheet_data = self$worksheets[[i]]$sheet_data
+            )
+            write_xmlPtr(doc = sheet_xml, fl = ws_file)
+
+          } else {
+
+            if (grepl("</worksheet>", prior))
+              prior <- substr(prior, 1, nchar(prior) - 13) # remove " </worksheet>"
+
+            write_worksheet_slim(
+              sheet_data = self$worksheets[[i]]$sheet_data,
+              prior = prior,
+              post = post,
+              fl = ws_file
+            )
+
+          }

           ## write worksheet rels
           if (length(self$worksheets_rels[[i]])) {
11 changes: 9 additions & 2 deletions R/write_xlsx.R
@@ -58,7 +58,8 @@ write_xlsx <- function(x, file, as_table = FALSE, ...) {
     "table_name", "with_filter", "first_active_row", "first_active_col",
     "first_row", "first_col", "col_widths", "na.strings",
     "overwrite", "title", "subject", "category",
-    "font_size", "font_color", "font_name"
+    "font_size", "font_color", "font_name",
+    "flush"
   )

   params <- list(...)
@@ -296,6 +297,12 @@ write_xlsx <- function(x, file, as_table = FALSE, ...) {
     font_args$font_name <- params$font_name
   }

+  # Flush stream file to disk
+  flush <- FALSE
+  if ("flush" %in% names(params)) {
+    flush <- params$flush
+  }
+

   ## create new Workbook object
   wb <- wb_workbook(creator = creator, title = title, subject = subject, category = category)
@@ -479,7 +486,7 @@ write_xlsx <- function(x, file, as_table = FALSE, ...) {
   }

   if (!missing(file))
-    wb_save(wb, file = file, overwrite = overwrite)
+    wb_save(wb, file = file, overwrite = overwrite, flush = flush)

   invisible(wb)
 }
1 change: 0 additions & 1 deletion inst/WORDLIST
@@ -407,7 +407,6 @@ veryHidden
 vm
 vml
 wb
-wbColor
 wbWorkbook
 wedgeEllipseCallout
 wedgeRectCallout
4 changes: 3 additions & 1 deletion man/wbWorkbook.Rd

18 changes: 17 additions & 1 deletion man/wb_save.Rd

14 changes: 14 additions & 0 deletions src/RcppExports.cpp
@@ -853,6 +853,19 @@ BEGIN_RCPP
     return rcpp_result_gen;
 END_RCPP
 }
+// write_worksheet_slim
+void write_worksheet_slim(Rcpp::Environment sheet_data, std::string prior, std::string post, std::string fl);
+RcppExport SEXP _openxlsx2_write_worksheet_slim(SEXP sheet_dataSEXP, SEXP priorSEXP, SEXP postSEXP, SEXP flSEXP) {
+BEGIN_RCPP
+    Rcpp::RNGScope rcpp_rngScope_gen;
+    Rcpp::traits::input_parameter< Rcpp::Environment >::type sheet_data(sheet_dataSEXP);
+    Rcpp::traits::input_parameter< std::string >::type prior(priorSEXP);
+    Rcpp::traits::input_parameter< std::string >::type post(postSEXP);
+    Rcpp::traits::input_parameter< std::string >::type fl(flSEXP);
+    write_worksheet_slim(sheet_data, prior, post, fl);
+    return R_NilValue;
+END_RCPP
+}
 // write_worksheet
 XPtrXML write_worksheet(std::string prior, std::string post, Rcpp::Environment sheet_data);
 RcppExport SEXP _openxlsx2_write_worksheet(SEXP priorSEXP, SEXP postSEXP, SEXP sheet_dataSEXP) {
@@ -1039,6 +1052,7 @@ static const R_CallMethodDef CallEntries[] = {
     {"_openxlsx2_read_colors", (DL_FUNC) &_openxlsx2_read_colors, 1},
     {"_openxlsx2_write_colors", (DL_FUNC) &_openxlsx2_write_colors, 1},
     {"_openxlsx2_set_sst", (DL_FUNC) &_openxlsx2_set_sst, 1},
+    {"_openxlsx2_write_worksheet_slim", (DL_FUNC) &_openxlsx2_write_worksheet_slim, 4},
     {"_openxlsx2_write_worksheet", (DL_FUNC) &_openxlsx2_write_worksheet, 3},
     {"_openxlsx2_write_xmlPtr", (DL_FUNC) &_openxlsx2_write_xmlPtr, 2},
     {"_openxlsx2_styles_bin", (DL_FUNC) &_openxlsx2_styles_bin, 3},