Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize raster metadata #1714

Merged
merged 3 commits into from
Jan 21, 2025
Merged

Conversation

brownag
Copy link
Contributor

@brownag brownag commented Jan 17, 2025

Following up on #1696 (comment) I did some analysis on removing driver restrictions for reading and writing raster metadata.

It seems that with removing a few conditional statements we can write either .aux.xml or internal metadata for time and units for many more drivers than just GTiff.

I wrote a routine (see this Gist) to attempt to write and read a raster with metadata using all available raster drivers. The .aux.json is deleted prior to read, so any metadata that are read are those written using GDAL SetMetadataItem(), not the terra-specific .aux.json function.

The following is from testing with GDAL 3.10.0 and config option GDAL_PAM_ENABLED=YES (default).

Of 80 raster drivers with apparent write capability, I found 2 that I would omit from the test (MEM and VRT). 10 additional drivers failed to write--some of these are expected, like OpenFileGDB, which does not support write for raster, the others need more specific configuration to make copies of existing data sources or I don't have a good way to test them in this context.

I found 28 drivers can read or write all of the current metadata elements including unit, time, dataset, band user tags. One driver (GPKG) can write all but the band-level user tags. An additional 13 drivers read and write units and time only. This gives a total of 42 drivers that can read and write units and time without significant changes to current code. 32 drivers can read and write dataset user tags, and 31 can do band-level user tags. 55 drivers write some sort of .aux.xml file, of which 50 contain both units and time, and 53 only time. 33 write dataset and band tags to .aux.xml files. So, there appear to be some cases where tags are written but not read.

Is there anywhere I have missed where there may be some logic limiting read/write for these metadata items? I have tried playing with the logic in read_gdal.cpp but so far have only made the above results the same or worse.

terra/src/read_gdal.cpp

Lines 1261 to 1290 in 2a3aa0b

if ((gdrv=="netCDF") || (gdrv == "HDF5")) {
char **m = poDataset->GetMetadata();
if (m) {
while (*m != nullptr) {
metadata.push_back(*m++);
}
}
s.set_names_time_ncdf(metadata, bandmeta, msg);
if (s.srs.is_empty()) {
bool lat = false;
bool lon = false;
for (size_t i=0; i<metadata.size(); i++) {
if (!lat) lat = metadata[i].find("long_name=latitude");
if (!lon) lon = metadata[i].find("long_name=longitude");
}
if (lon && lat) {
if (s.srs.set("+proj=longlat", msg)) {
s.parameters_changed = true;
}
}
}
} else if (gdrv == "GRIB") {
s.set_names_time_grib(bandmeta, msg);
} else if (gdrv == "GTiff") {
// needs to get its own generic one
// s.set_names_time_tif(bandmeta, msg);
}

In general it seems that DATE_TIME is better supported than UNIT, and TIMESTAMP/TIMEUNIT is slightly better supported than DATE_TIME. Possibly of interest to highlight: 5 drivers ("COG", "GPKG", "GTiff", "netCDF", "PCIDSK") support read/write of internal metadata for units and time.

While I think this is an improvement over only writing GDAL metadata with GTiff, it is not as consistent as I would hope. It does seem that the .aux.json file provides some consistency at the cost of being external to GDAL.

I am not sure yet why some drivers don't write .aux.xml or internal metadata, there is probably not one reason, I guess the SetMetadataItem()/GetMetadata() calls etc. are just ignored/not defined in some drivers. It is documented that some drivers do not support them, but it seems like a few of the problem drivers should work better than they do.

I looked into my Zarr anecdote from #1696 (comment) and found that writeRaster()/SetMetadataItem() appear to not write to the internal .zmetadata file, but I could write custom metadata for Zarr with the -mo flag to gdal_translate. While I could write e.g. UNIT metadata manually, it was still not read with rast(). I may investigate a bit more how gdal_translate is getting/setting metadata to try and figure that out.

@rhijmans
Copy link
Member

That is excellent. Thank you for being so thorough. Let's go ahead with this.

@brownag brownag marked this pull request as ready for review January 20, 2025 20:47
@brownag
Copy link
Contributor Author

brownag commented Jan 20, 2025

That is excellent. Thank you for being so thorough. Let's go ahead with this.

OK, I've marked it as ready to review. If you want me to clean anything up let me know, but figured for now I would just comment out the limiting logic

@rhijmans rhijmans merged commit fd73eee into rspatial:master Jan 21, 2025
4 checks passed
@brownag brownag mentioned this pull request Jan 23, 2025
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Feb 8, 2025
# Version 1.8-15

## Bug Fixes

- `Readrds` Failed For Rasters With Timestep="Seconds"
  [#1711](Https://github.com/Rspatial/terra/issues/1711) by Pascal
  Oettli

- `divide<SpatVector>` always returned NULL
  [#1724](rspatial/terra#1724) by Márcia
  Barbosa

- `erase` failed in some cases
  [#1710](rspatial/terra#1710) by
  erkent-carb

## enhancements

- `bestMatch` now has argument "fun" to allow the use of different
  distance measures, and a <matrix> method

- `wrap` (and `writeRDS`) now captures varnames/longnames
  [#1719](rspatial/terra#1719) by Andrew
  Gene Brown

- improved raster metadata writing
  [#1714](rspatial/terra#1714) by Andrew Gene
  Brown

- `vect` and `writeVector` now properly read and write date and
  datetime
  data. [#1718](rspatial/terra#1718) by
  Andrew Gene Brown

- improved estimate of available memory on linux systems
  [#1506](rspatial/terra#1506) by Cedric
  Rossi

# version 1.8-10

Released 2025-01-13

## bug fixes

- `expanse<SpatRaster>(transform=TRUE)` crashed R when the crs was
  "local". [#1671](rspatial/terra#1671) by
  Michael Chirico

- `patches(values=TRUE)` wrapped around the edges
  [#1675](rspatial/terra#1675) by Michael
  Chirico

- `spin` now correctly handles spherical coordinates
  [#1576](rspatial/terra#1576) by jeanlobry

- `mosaic` sometimes crashed R
  [#1524](rspatial/terra#1524) by John
  Baums, Dave Klinges, and Hugh Graham.

- `spatSample` ignored argument "exp" when taking a random sample with
  na.rm=TRUE on a large raster
  [#1437](rspatial/terra#1437) by Babak
  Naimi

- `split<SpatVector,SpatVector>` did not work properly
  [#1619](rspatial/terra#1619) by Michael
  Sumner

- `autocor` improved handling of NA cells for global Moran computation
  [#1992](rspatial/terra#1592) by Nicholas
  Berryman

- `shade` is more
  memory-safe. [#1452](rspatial/terra#1452)
  by Francis van Oordt and Chris English

- fixed bug in `rasterize` revealed when using `crop(mask=TRUE)`
  [#1686](rspatial/terra#1686) by edixon1

- fixed `to_id = NA` bug in `nearest`
  [#1471](rspatial/terra#1471) by Mats
  Blomqvist

- better handling of date/unit
  [#1684](rspatial/terra#1684) and
  [#1688](rspatial/terra#1688) by Andrew
  Gene Brown

- `spatSample(method="regular")` on a raster with one column returned
  too many samples
  [#1362](rspatial/terra#1362) by Daniel R
  Schlaepfer


## enhancements

- `plot<SpatVector>` now uses the same default viridis color palette
  as `plot<SpatRaster>`
  [#1670](rspatial/terra#1670) by Márcia
  Barbosa

- `relate` now accepts relation="equals"
  [#1672](rspatial/terra#1672) by Krzysztof
  Dyba

- `init` now accepts additional arguments for function "fun"

- better handling of the 32 connections limitation set by the HDF4
  library [#1481](rspatial/terra#1481) by
  Dimitri Falk

- When using RStudio a once per session warning is given when using
  draw, sel or click
  [#1063](rspatial/terra#1063) by Sergei
  Kharchenko

- `distance<SpatRaster>` from lon and lat lines/polygons computes
  distance to the edges instead of the nodes
  [#1462](rspatial/terra#1462) by Derek
  Friend

- `distance<SpatVector,SpatVector>` now works for lon/lat data
  [#1615](rspatial/terra#1615) by Wencheng
  Lau-Medrano

- using overviews for faster plotting of COGs over http
  [#1353](rspatial/terra#1353) by Michael
  Sumner and [#1412](rspatial/terra#1412);
  and argument `plot(x, overview=)` to change the default behavior.

- `extract` with points is now faster for rasters accessed over http
  [#1504](rspatial/terra#1504) by Krzysztof
  Dyba

- `extract` with many points on very large rasters was slower in
  compared to doing the same with "raster" (which uses terra for
  that!) [#1584](rspatial/terra#1584) by
  Hassan Masoomi

- `merge` now has three alternative algorithms
  [1366](rspatial/terra#1366) by Hassan
  Masoomi and [#1650](rspatial/terra#1650)
  by Agustin Lobo


## new

- `$<SpatRaster>` can now be used to get a categorical SpatRaster with
  a different active category

- `scale_linear<SpatRaster>` method for linear scaling of cell values
  between a minimum and maximum value such as 0 and 1

- `distance` and related methods get argument "method" to choose the
  distance algorithm for lon/lat data
  [#1677](rspatial/terra#1677) by Márcia
  Barbosa

- `divide<SpatRaster>` and `divide<SpatVector>` methods

- `nseg` counts the number of segments in a SpatVector
  [#1647](rspatial/terra#1674) by Michael
  Chirico

- `extract` argument "search_radius" to extract values from the
  nearest raster cell that is not `NA`
  [#873](rspatial/terra#873) by
  matthewseanmarcus

- `combineLevels` to combine the levels of all layers
  [link](https://stackoverflow.com/questions/79340152/how-to-set-factor-levels-in-a-rast-stack-using-terra-when-different-levels-exi)
  on SO by Sam
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants