Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge function results in unnecessarily huge output #1366

Closed
HassanMasoomi opened this issue Dec 7, 2023 · 2 comments
Closed

merge function results in unnecessarily huge output #1366

HassanMasoomi opened this issue Dec 7, 2023 · 2 comments

Comments

@HassanMasoomi
Copy link

Hi,

I have several tiles and I need to merge them; but using terra's merge function will result in a ridiculously huge output file while using raster's merge function behaves very well and the output makes sense. In my real case, the output should be around 7GB but when I use terra's merge function, it results in an output file of ~120GB!!!

I managed to provide a reproducible example here. In this example, I have 80 tiles of ~6.5MB (in total ~520MB). Using terra, the merged file is ~4GB (!!!) while using raster results in a merged file of ~850MB (which is what expected).

Interestingly, if I just rewrite the result from terra, it will be ~850MB which totally would makes sense.

library(terra)
# terra 1.7.55

# creating 80 tiles of 1x1 out of 121 tiles for the region
n <- 80
allComb <- expand.grid(lon = seq(-100, -90, by = 1), lat = seq(40, 50, by = 1))
allComb <- allComb[sample.int(nrow(allComb), n),]
for (i in 1:n) {
  writeRaster(rast(xmin=allComb[i, "lon"], xmax=allComb[i, "lon"]+1, 
                   ymin=allComb[i, "lat"], ymax=allComb[i, "lat"]+1, res=0.0001, vals=i), 
              paste0("part_", i, ".tif"), overwrite = TRUE)
}

# merge tiles using terra
allParts <- list.files(pattern = "^part_.+\\.tif", full.names = TRUE)
all_terra <- merge(sprc(allParts), first = TRUE, na.rm = TRUE, 
                   filename = "all_terra.tif", overwrite = TRUE)

# merge tiles using raster
allParts <- list.files(pattern = "^part_.+\\.tif", full.names = TRUE)
allParts <- lapply(allParts, raster::raster)
allParts$filename <- "all_raster.tif"
all_raster <- do.call(raster::merge, allParts)

# rewrite the terra output
writeRaster(all_terra, filename = "all_terra_rewrite.tif", overwrite = TRUE)
@rhijmans
Copy link
Member

rhijmans commented Dec 8, 2023

I see

file.info("all_terra.tif")$size / 1024^2
#[1] 901.186
file.info("all_terra_rewrite.tif")$size / 1024^2
#[1] 261.0031

describe("all_terra.tif")[c(37, 45)]
#[1] "  COMPRESSION=LZW"                                 
#[2] "Band 1 Block=1100x1 Type=Float32, ColorInterp=Gray"
describe("all_terra_rewrite.tif")[c(37, 45)]
#[1] "  COMPRESSION=LZW"                                 
#[2] "Band 1 Block=1100x1 Type=Float32, ColorInterp=Gray"

It seems that the compression is not working very well when using merge the way that terra implements it.

@rhijmans
Copy link
Member

I have added two alternative methods ("algo=2" and "algo=3") that give much better compression, but in my tests, that comes at the expense of speed.

If the merged raster is not a final output, another option is to use "algo=3" with no output filename. That returns a virtually merged raster (no new file) that can be used in subsequent computation.

library(terra)
x <- rast(xmin=-110, xmax=-80, ymin=40, ymax=70, res=1, vals=1)
y <- rast(xmin=-85, xmax=-55, ymax=60, ymin=30, res=1, vals=2)
z <- rast(xmin=-60, xmax=-30, ymax=50, ymin=20, res=1, vals=3)
s <- sprc(x, y, z)

b1 <- merge(s, algo=1, filename="merge1.tif", overwrite=TRUE)
b2 <- merge(s, algo=2, filename="merge2.tif", overwrite=TRUE)
b3 <- merge(s, algo=3, filename="merge3.tif", overwrite=TRUE)
panel(c(b1, b2, b3))

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Feb 8, 2025
# Version 1.8-15

## Bug Fixes

- `Readrds` Failed For Rasters With Timestep="Seconds"
  [#1711](Https://github.com/Rspatial/terra/issues/1711) by Pascal
  Oettli

- `divide<SpatVector>` always returned NULL
  [#1724](rspatial/terra#1724) by Márcia
  Barbosa

- `erase` failed in some cases
  [#1710](rspatial/terra#1710) by
  erkent-carb

## enhancements

- `bestMatch` now has argument "fun" to allow the use of different
  distance measures, and a <matrix> method

- `wrap` (and `writeRDS`) now captures varnames/longnames
  [#1719](rspatial/terra#1719) by Andrew
  Gene Brown

- improved raster metadata writing
  [#1714](rspatial/terra#1714) by Andrew Gene
  Brown

- `vect` and `writeVector` now properly read and write date and
  datetime
  data. [#1718](rspatial/terra#1718) by
  Andrew Gene Brown

- improved estimate of available memory on linux systems
  [#1506](rspatial/terra#1506) by Cedric
  Rossi

# version 1.8-10

Released 2025-01-13

## bug fixes

- `expanse<SpatRaster>(transform=TRUE)` crashed R when the crs was
  "local". [#1671](rspatial/terra#1671) by
  Michael Chirico

- `patches(values=TRUE)` wrapped around the edges
  [#1675](rspatial/terra#1675) by Michael
  Chirico

- `spin` now correctly handles spherical coordinates
  [#1576](rspatial/terra#1576) by jeanlobry

- `mosaic` sometimes crashed R
  [#1524](rspatial/terra#1524) by John
  Baums, Dave Klinges, and Hugh Graham.

- `spatSample` ignored argument "exp" when taking a random sample with
  na.rm=TRUE on a large raster
  [#1437](rspatial/terra#1437) by Babak
  Naimi

- `split<SpatVector,SpatVector>` did not work properly
  [#1619](rspatial/terra#1619) by Michael
  Sumner

- `autocor` improved handling of NA cells for global Moran computation
  [#1992](rspatial/terra#1592) by Nicholas
  Berryman

- `shade` is more
  memory-safe. [#1452](rspatial/terra#1452)
  by Francis van Oordt and Chris English

- fixed bug in `rasterize` revealed when using `crop(mask=TRUE)`
  [#1686](rspatial/terra#1686) by edixon1

- fixed `to_id = NA` bug in `nearest`
  [#1471](rspatial/terra#1471) by Mats
  Blomqvist

- better handling of date/unit
  [#1684](rspatial/terra#1684) and
  [#1688](rspatial/terra#1688) by Andrew
  Gene Brown

- `spatSample(method="regular")` on a raster with one column returned
  too many samples
  [#1362](rspatial/terra#1362) by Daniel R
  Schlaepfer


## enhancements

- `plot<SpatVector>` now uses the same default viridis color palette
  as `plot<SpatRaster>`
  [#1670](rspatial/terra#1670) by Márcia
  Barbosa

- `relate` now accepts relation="equals"
  [#1672](rspatial/terra#1672) by Krzysztof
  Dyba

- `init` now accepts additional arguments for function "fun"

- better handling of the 32 connections limitation set by the HDF4
  library [#1481](rspatial/terra#1481) by
  Dimitri Falk

- When using RStudio a once per session warning is given when using
  draw, sel or click
  [#1063](rspatial/terra#1063) by Sergei
  Kharchenko

- `distance<SpatRaster>` from lon and lat lines/polygons computes
  distance to the edges instead of the nodes
  [#1462](rspatial/terra#1462) by Derek
  Friend

- `distance<SpatVector,SpatVector>` now works for lon/lat data
  [#1615](rspatial/terra#1615) by Wencheng
  Lau-Medrano

- using overviews for faster plotting of COGs over http
  [#1353](rspatial/terra#1353) by Michael
  Sumner and [#1412](rspatial/terra#1412);
  and argument `plot(x, overview=)` to change the default behavior.

- `extract` with points is now faster for rasters accessed over http
  [#1504](rspatial/terra#1504) by Krzysztof
  Dyba

- `extract` with many points on very large rasters was slower in
  compared to doing the same with "raster" (which uses terra for
  that!) [#1584](rspatial/terra#1584) by
  Hassan Masoomi

- `merge` now has three alternative algorithms
  [1366](rspatial/terra#1366) by Hassan
  Masoomi and [#1650](rspatial/terra#1650)
  by Agustin Lobo


## new

- `$<SpatRaster>` can now be used to get a categorical SpatRaster with
  a different active category

- `scale_linear<SpatRaster>` method for linear scaling of cell values
  between a minimum and maximum value such as 0 and 1

- `distance` and related methods get argument "method" to choose the
  distance algorithm for lon/lat data
  [#1677](rspatial/terra#1677) by Márcia
  Barbosa

- `divide<SpatRaster>` and `divide<SpatVector>` methods

- `nseg` counts the number of segments in a SpatVector
  [#1647](rspatial/terra#1674) by Michael
  Chirico

- `extract` argument "search_radius" to extract values from the
  nearest raster cell that is not `NA`
  [#873](rspatial/terra#873) by
  matthewseanmarcus

- `combineLevels` to combine the levels of all layers
  [link](https://stackoverflow.com/questions/79340152/how-to-set-factor-levels-in-a-rast-stack-using-terra-when-different-levels-exi)
  on SO by Sam
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants