[save] Stream Worksheet XML to Disk #1255

Merged: 11 commits merged on Mar 3, 2025

@JanMarvin (Owner) commented Jan 29, 2025

This is an experimental branch that uses a custom C++ function to stream the XML file to disk. It works, but it has not been extensively tested and might still lack features. It has been shown to reduce memory consumption significantly.

@JanMarvin (Owner, Author)

Cherry pick 15daf92

@JanMarvin (Owner, Author) commented Jan 29, 2025

Using this is a game changer memory-wise. Writing a 500,000 x 100 rnorm data frame previously required ~28 GB, with the optimizations from this branch. Writing the output file with pugixml using a custom XML writer reduces the required memory by 12-14 GB.
The changes in the branch skip a few guards that check that the cells and rows are in the correct order.

options("openxlsx2.export_with_pugi" = FALSE)
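
For readers wondering what an ordering guard of the kind mentioned above could look like, here is a minimal C++ sketch. It is not the code from this branch: `CellPos` and `cells_in_order` are hypothetical names, and the rule it checks (rows ascending, columns strictly ascending within a row) is an assumption about what "correct order" means here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cell position: 1-based row and column indices.
struct CellPos {
    std::size_t row;
    std::size_t col;
};

// Returns true when cells are sorted by row and, within a row, by strictly
// increasing column (so duplicates are rejected as well).
// Assumption: this is the kind of check a guard would perform.
bool cells_in_order(const std::vector<CellPos>& cells) {
    for (std::size_t i = 1; i < cells.size(); ++i) {
        const CellPos& prev = cells[i - 1];
        const CellPos& cur = cells[i];
        if (cur.row < prev.row) return false;                            // row went backwards
        if (cur.row == prev.row && cur.col <= prev.col) return false;    // column out of order
    }
    return true;
}
```

A writer that skips such a check trusts its input to be ordered already, which saves one pass over the cell data.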

This branch skips building the XML file in memory before it is written to disk. The custom XML writer simply flushes what is available into a text file, without any checks for correct XML etc.

@JanMarvin JanMarvin changed the title [save] No copy sheet data [save] Stream Worksheet XML to disk Jan 29, 2025
@JanMarvin JanMarvin changed the title [save] Stream Worksheet XML to disk [save] Stream Worksheet XML to Disk Jan 29, 2025
@JanMarvin JanMarvin added the enhancement 😀, help wanted 🙏, and options ☑️ labels Jan 29, 2025
@JanMarvin JanMarvin force-pushed the no_copy_sheet_data branch from 6e430fd to 306a58c Compare March 3, 2025 12:12
@JanMarvin (Owner, Author)

wb_save(flush = TRUE) is now possible

@JanMarvin JanMarvin force-pushed the no_copy_sheet_data branch from b097ae0 to 0b8633e Compare March 3, 2025 13:07
@JanMarvin JanMarvin merged commit 8cfef19 into main Mar 3, 2025
9 checks passed
@JanMarvin JanMarvin deleted the no_copy_sheet_data branch March 3, 2025 13:13