longnames and varnames aren't overwritten #141

Open
njtierney opened this issue Jan 16, 2025 · 12 comments

@njtierney
Owner

See

geotargets::geotargets_option_set(
    gdal_raster_creation_options =
        c("COMPRESS=DEFLATE", "TFW=YES")
)
targets::tar_script({
    elev_scale <- function(z = 1, projection = "EPSG:4326") {
        rast_elev_scale <- terra::project(
            terra::rast(
                system.file(
                    "ex",
                    "elev.tif",
                    package = "terra"
                )
            ) * z,
            projection
        )
        terra::units(rast_elev_scale) <- "m"
        terra::varnames(rast_elev_scale) <- "new-varnames"
        terra::longnames(rast_elev_scale) <- "really-long-new-name"
        terra::time(rast_elev_scale) <- as.Date("2025-01-15")
        rast_elev_scale
    }
    list(
        geotargets::tar_terra_sprc(
            raster_elevs,
            # two rasters, one unaltered, one scaled by factor of 2 and
            # reprojected to Interrupted Goode Homolosine
            command = terra::sprc(list(
                elev_scale(1),
                elev_scale(2, "+proj=igh")
            ))
        )
    )
})
targets::tar_make()
#> ▶ dispatched target raster_elevs
#> ● completed target raster_elevs [0.073 seconds, 36.611 kilobytes]
#> ▶ ended pipeline [0.133 seconds]
x <- targets::tar_read(raster_elevs)
x[1]
#> class       : SpatRaster 
#> dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
#> resolution  : 0.008333333, 0.008333333  (x, y)
#> extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : raster_elevs 
#> name        : elevation 
#> min value   :       141 
#> max value   :       547 
#> unit        :         m 
#> time (days) : 2025-01-15
# works
terra::units(x[1])
#> [1] "m"
# doesn't work
terra::varnames(x[1])
#> [1] "raster_elevs"
# doesn't work
terra::longnames(x[1])
#> [1] ""
# works
terra::time(x[1])
#> [1] "2025-01-15"

Created on 2025-01-16 with reprex v2.1.1

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.2 (2024-10-31)
#>  os       macOS Sequoia 15.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Melbourne
#>  date     2025-01-16
#>  pandoc   3.2.1 @ /opt/homebrew/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  backports     1.5.0      2024-05-23 [1] CRAN (R 4.4.0)
#>  base64url     1.4        2018-05-14 [1] CRAN (R 4.4.0)
#>  callr         3.7.6      2024-03-25 [1] CRAN (R 4.4.0)
#>  cli           3.6.3      2024-06-21 [1] CRAN (R 4.4.0)
#>  codetools     0.2-20     2024-03-31 [2] CRAN (R 4.4.2)
#>  data.table    1.16.4     2024-12-06 [1] CRAN (R 4.4.1)
#>  digest        0.6.37     2024-08-19 [1] CRAN (R 4.4.1)
#>  evaluate      1.0.1      2024-10-10 [1] CRAN (R 4.4.1)
#>  fastmap       1.2.0      2024-05-15 [1] CRAN (R 4.4.0)
#>  fs            1.6.5      2024-10-30 [1] CRAN (R 4.4.1)
#>  geotargets    0.1.0.9000 2024-11-20 [1] Github (njtierney/geotargets@ddf163b)
#>  glue          1.8.0      2024-09-30 [1] CRAN (R 4.4.1)
#>  htmltools     0.5.8.1    2024-04-04 [1] CRAN (R 4.4.0)
#>  igraph        2.1.3      2025-01-07 [1] CRAN (R 4.4.2)
#>  knitr         1.49       2024-11-08 [1] CRAN (R 4.4.1)
#>  lifecycle     1.0.4      2023-11-07 [1] CRAN (R 4.4.0)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.4.0)
#>  pillar        1.10.1     2025-01-07 [1] CRAN (R 4.4.1)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.4.0)
#>  processx      3.8.5      2025-01-08 [1] CRAN (R 4.4.1)
#>  ps            1.8.1      2024-10-28 [1] CRAN (R 4.4.1)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.4.0)
#>  Rcpp          1.0.13-1   2024-11-02 [1] CRAN (R 4.4.1)
#>  reprex        2.1.1      2024-07-06 [1] CRAN (R 4.4.0)
#>  rlang         1.1.4      2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown     2.29       2024-11-04 [1] CRAN (R 4.4.1)
#>  rstudioapi    0.17.1     2024-10-22 [1] CRAN (R 4.4.1)
#>  secretbase    1.0.3      2024-10-02 [1] CRAN (R 4.4.1)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.4.0)
#>  targets       1.9.1      2024-12-04 [1] CRAN (R 4.4.1)
#>  terra         1.8-5      2024-12-12 [1] CRAN (R 4.4.1)
#>  tibble        3.2.1      2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyselect    1.2.1      2024-03-11 [1] CRAN (R 4.4.0)
#>  vctrs         0.6.5      2023-12-01 [1] CRAN (R 4.4.0)
#>  withr         3.0.2      2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun          0.50.5     2025-01-15 [1] Github (yihui/xfun@116d689)
#>  yaml          2.3.10     2024-07-26 [1] CRAN (R 4.4.0)
#> 
#>  [1] /Users/nick/Library/R/arm64/4.4/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@Aariq
Collaborator

Aariq commented Jan 16, 2025

I think this is part of the known issue in terra that varnames and longnames aren't saved when a raster is written and read back in.
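
A minimal sketch for checking this outside of a targets pipeline, assuming only that terra is installed: it writes a SpatRaster to a temporary GeoTIFF, reads it back, and compares the metadata before and after. The file path and object names are illustrative, and no output is shown because the result depends on the terra and GDAL versions in use.

```r
library(terra)

# Start from the example raster that ships with terra
x <- rast(system.file("ex", "elev.tif", package = "terra"))
terra::units(x) <- "m"
terra::varnames(x) <- "new-varnames"
terra::longnames(x) <- "really-long-new-name"
terra::time(x) <- as.Date("2025-01-15")

# Round-trip through a temporary GeoTIFF
tmp <- tempfile(fileext = ".tif")
writeRaster(x, tmp, overwrite = TRUE)
y <- rast(tmp)

# Compare each piece of metadata before and after the round trip
round_trip <- data.frame(
  field  = c("units", "varnames", "longnames", "time"),
  before = c(terra::units(x), terra::varnames(x), terra::longnames(x),
             as.character(terra::time(x))),
  after  = c(terra::units(y), terra::varnames(y), terra::longnames(y),
             as.character(terra::time(y)))
)
round_trip$preserved <- round_trip$before == round_trip$after
print(round_trip)
```

A `preserved` value of `FALSE` or `NA` marks a field that did not survive the write and read.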

@brownag
Contributor

brownag commented Jan 16, 2025

This is something I have looked at a little bit. I believe these labels are intended for specific file types such as netCDF and HDF5, but their handling is not consistent in terra. Note that even wrap() does not handle them, let alone writeRaster()/rast():

library(terra)
#> terra 1.8.11

x <- rast(system.file("ex", "elev.tif", package="terra"))

varnames(x) <- "new-varnames"
longnames(x) <- "really-long-new-name"

varnames(x)
#> [1] "new-varnames"
longnames(x)
#> [1] "really-long-new-name"

y <- unwrap(wrap(x))

varnames(y)
#> [1] ""
longnames(y)
#> [1] ""

I am working on some PRs for terra to improve metadata writing/round-tripping for more drivers (rspatial/terra#1696 (comment))

I think that varnames and longnames could be stored in metadata tags or "user tags" for most drivers--either internally within the GDAL data file itself (as units and time are for GeoTIFF), or in a PAM (.aux.xml) file. If I get my changes implemented I will keep var/longnames on the radar and will try to make it so these work the same as units, time and custom user tags.

Thinking about the concept of GDAL subdatasets with drivers other than GTiff and SpatRasterCollection, I have a couple other ideas to perhaps improve tar_terra_sprc()... but I'll address that in a different issue/PR

@Aariq
Collaborator

Aariq commented Jan 16, 2025

Thanks for chiming in @brownag. I'll close this since there's nothing to do about it in geotargets.

@Aariq Aariq closed this as completed Jan 16, 2025
@njtierney
Owner Author

I think we can leave it open as something to fix once this gets resolved in terra

@Aariq
Collaborator

Aariq commented Jan 17, 2025

Hopefully there will be nothing for us to do if these metadata end up getting written with the same mechanisms as layer names or units.

@brownag
Contributor

brownag commented Jan 17, 2025

https://search.r-project.org/CRAN/refmans/terra/html/varnames.html

I was reading the documentation for varnames and longnames--and it seems there is some discussion of this issue:

Each SpatRaster data source can also have a variable name and a long variable name. They are set when reading a file with possibly multiple sub-datasets (e.g. netcdf or hdf5 format) into a single SpatRaster. Each sub-datset is a seperate "data-source" in the SpatRaster. Note that newly created or derived SpatRasters always have a single variable (data source), and therefore the variable names are lost when processing a multi-variable SpatRaster. Thus the variable names are mostly useful to understand a SpatRaster created from some files and for managing SpatRasterDatasets.

It seems "expected" that these names are only read and not written, and are targeted to specific data source types.

However, I do think it is a bit confusing that there are exported setters varnames<- and longnames<- if common ops like wrap() and writeRaster() will wipe them out. I believe there is a strong argument for trying to get them into the PackedSpatRaster and regular metadata so that things like writing to temp files and serialization work properly, not only for our use cases here in geotargets. It may be that they can only be supported in a standard way for a subset of drivers, as I do not have band-level user tags working for all drivers. I am still investigating what the best possible implementation is. Worst case in my view, if we can't do it in a standard way via .aux.xml, it would be a few more elements that could be added to the terra-specific aux.json file.

@amart90

amart90 commented Feb 27, 2025

Could some of these attributes (longnames, varnames, time, etc.) that are not preserved when written be written to R objects within the zip file in tar_rast_write()? Then when read in with tar_rast_read() they could be re-assigned to the SpatRaster? Here is some pseudocode (it's real code, I guess, but I haven't tested it - it's more a demonstration of the concept):

tar_rast_write <- function(filetype, gdal, preserve_metadata) {
  switch(preserve_metadata,
         zip = function(object, path) {
           # write the raster in a fresh local tempdir() that disappears when
           # function is done
           tmp <- withr::local_tempdir()
           raster_tmp_file <- file.path(tmp, basename(path))
           attr_tmp_file <- file.path(tmp, "attrs.RDS")
           rast_attrs <- list(
             time = terra::time(object),
             longnames = terra::longnames(object),
             varnames = terra::varnames(object)
           )
           zip_tmp_file <- file.path(tmp, "object.zip")
           terra::writeRaster(
             object,
             filename = raster_tmp_file,
             filetype = filetype,
             overwrite = TRUE,
             gdal = gdal
           )
           saveRDS(rast_attrs, attr_tmp_file, compress = FALSE)
           
           # package files into a zip file using `zip::zip()`
           raster_files <- list.files(path = tmp, full.names = TRUE)
           zip::zip(
             zipfile = zip_tmp_file,
             files = raster_files,
             compression_level = 1,
             mode = "cherry-pick",
             root = tmp
           )
           # move the zip file to the expected place
           file.copy(zip_tmp_file, path)
         }
      # ...
  )
}

tar_rast_read <- function(preserve_metadata) {
  switch(preserve_metadata,
         zip = function(path) {
           tmp <- tempdir()
           # NOTE: cannot use withr::local_tempdir() because the unzipped files need
           # to persist so that the resulting `SpatRaster` object doesn't have a
           # broken file pointer
           zip::unzip(zipfile = path, exdir = tmp)
           # attrs.RDS was unzipped into `tmp`, so read it from there
           rast_attrs <- readRDS(file.path(tmp, "attrs.RDS"))
           rast_out <- terra::rast(file.path(tmp, basename(path)))
           terra::longnames(rast_out) <- rast_attrs[["longnames"]]
           terra::varnames(rast_out) <- rast_attrs[["varnames"]]
           terra::time(rast_out) <- rast_attrs[["time"]]
           return(rast_out)
         },
         drop = function(path) terra::rast(path)
  )
}

If this interests you at all, I am happy to put together a PR.

@Aariq
Collaborator

Aariq commented Feb 27, 2025

I'm still not sure we should be trying to do anything that terra doesn't already do when writing and reading SpatRasters.

@brownag
Contributor

brownag commented Feb 27, 2025

I requested this be fixed for wrap() and unwrap() in rspatial/terra#1719, which should resolve any issues where varnames/longnames would be wiped out in the marshalling process.

Likewise, I am not sure we should be coming up with our own approach for this. The ability to store these attributes is driver-specific. Now in terra metadata are handled by the GDAL driver (after rspatial/terra#1714), so no terra-specific JSON metadata files anymore. Either attributes are stored internal to the file, or they are put in a Persistent Auxiliary Metadata file (PAM).

Some GDAL drivers support user-defined metadata tags--see terra::metags(). If preserving longnames and varnames in a more agnostic way is a desire for us (personally I never use them) it possibly should be done through the metadata user tag mechanism prior to writing a SpatRaster target, and then use an appropriate driver (e.g. GeoTIFF).
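
One way the tag-based idea above could look, as an untested sketch rather than anything geotargets or terra does: copy varnames and longnames into user tags with terra::metags() before writing, then copy them back after reading. The tag names `varname` and `longname` and all three helper functions are hypothetical. The shape of the value returned by metags() appears to differ between terra versions (a named character vector in older releases, a data.frame in newer ones), so the lookup below assumes either form, and whether the tags survive writing depends on the driver and GDAL version.

```r
library(terra)

# Hypothetical helper: copy the names into user tags before writing.
# Assumes a single data source, so each name is a length-one string.
names_to_tags <- function(x) {
  terra::metags(x) <- c(
    paste0("varname=", terra::varnames(x)[1]),
    paste0("longname=", terra::longnames(x)[1])
  )
  x
}

# Hypothetical helper: look up one tag value, or "" if it is absent.
get_tag <- function(x, name) {
  tags <- terra::metags(x)
  if (is.data.frame(tags)) {
    # assumed columns `name` and `value` (newer terra versions)
    value <- tags$value[tags$name == name]
  } else {
    # assumed named character vector (older terra versions)
    value <- unname(tags[names(tags) == name])
  }
  if (length(value) == 0 || is.na(value[1])) "" else as.character(value[1])
}

# Hypothetical helper: restore the names from user tags after reading.
tags_to_names <- function(x) {
  terra::varnames(x) <- get_tag(x, "varname")
  terra::longnames(x) <- get_tag(x, "longname")
  x
}

x <- rast(system.file("ex", "elev.tif", package = "terra"))
terra::varnames(x) <- "new-varnames"
terra::longnames(x) <- "really-long-new-name"

# Write with the tags attached, then read back and restore the names
tmp <- tempfile(fileext = ".tif")
writeRaster(names_to_tags(x), tmp, overwrite = TRUE)
y <- tags_to_names(rast(tmp))

terra::varnames(y)
terra::longnames(y)
```

If the tags are not preserved by the driver, `tags_to_names()` as written sets both names to an empty string, so a real implementation would probably want to leave the names untouched in that case.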

It might be reasonable to request storing varnames and longnames in user tags as a feature of terra, but I have not had a chance to investigate all the details of the current implementation and how it would interact with e.g. HDF and netCDF drivers that these attributes were originally intended for.

@Aariq
Collaborator

Aariq commented Feb 27, 2025

The unfortunate thing about this being supported by wrap() and unwrap() but not by writeRaster() and the aux.json file is that there may be inconsistent behavior depending on whether an upstream target is skipped or run, whether memory = "transient" or memory = "persistent" and whether a crew controller is used or not. It would still be best to not use longnames and varnames in a targets pipeline.

@Aariq
Collaborator

Aariq commented Feb 27, 2025

Hold up—I just read rspatial/terra#1714. Does this mean if we require this version of terra as minimum we wouldn't need the preserve_metadata argument to tar_terra_rast() anymore? (sorry, this is a bit off topic)

@brownag
Contributor

brownag commented Feb 27, 2025

The unfortunate thing about this being supported by wrap() and unwrap() but not by writeRaster() and the aux.json file is that there may be inconsistent behavior depending on whether an upstream target is skipped or run, whether memory = "transient" or memory = "persistent" and whether a crew controller is used or not. It would still be best to not use longnames and varnames in a targets pipeline.

This, I think, is an argument for using something like terra::wrapCache(), as in #106. Alternatively, perhaps we could issue messages that at least indicate these specific elements will not be preserved, and automatically null them out when running a target.
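
The message-and-clear suggestion could be sketched as a small check run before a raster target is written. This is only an illustration: `drop_unsaved_names()` is a hypothetical helper, not part of geotargets or terra, and it treats any non-empty varname or longname as user-set. That is an oversimplification, because a raster read from a file reports a varname derived from its source (as in the `"raster_elevs"` output above).

```r
library(terra)

# Hypothetical helper: tell the user which names may be lost, then clear them
drop_unsaved_names <- function(x) {
  vn <- terra::varnames(x)
  ln <- terra::longnames(x)
  if (any(nzchar(vn)) || any(nzchar(ln))) {
    message(
      "varnames/longnames may not be preserved when this target is stored; ",
      "clearing them. varnames: ", toString(vn[nzchar(vn)]),
      "; longnames: ", toString(ln[nzchar(ln)])
    )
    # One empty string per data source
    terra::varnames(x) <- rep("", length(vn))
    terra::longnames(x) <- rep("", length(ln))
  }
  x
}

x <- rast(system.file("ex", "elev.tif", package = "terra"))
terra::longnames(x) <- "really-long-new-name"
x <- drop_unsaved_names(x)
```

Clearing the names up front would at least make the result the same whether the object is passed along in memory or read back from the store.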

I am not sure it is a good general assumption that all data required to reproduce an R object can be stored in a regular spatial format--even with (auxiliary) metadata support--without also making some efforts to serialize the actual (Packed)SpatRaster/Vector object that was created.

Hold up—I just read rspatial/terra#1714. Does this mean if we require this version of terra as minimum we wouldn't need the preserve_metadata argument to tar_terra_rast() anymore? (sorry, this is a bit off topic)

Depends on the driver and version of GDAL. For GeoTIFF at least it all can be stored internally, as well as for several other drivers.


4 participants