Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading multiple MODIS LST files (MOD11A2 product) in HDF format with rast() fails, file does not exist #1481

Closed
dimfalk opened this issue Apr 13, 2024 · 3 comments

Comments

@dimfalk
Copy link

dimfalk commented Apr 13, 2024

Messing around with some MODIS LST data (MOD11A2 acquired via USGS EarthExpolorer to be precise) in order to test tapp(), I spend some time now understandig why my workflow fails and it seems like rast() would not accept more than two HDF-EOS files before something breaks:

library(terra)
#> terra 1.7.71

# MOD11A2, tile h18v03, year 2023 (=46 files)
fnames <- list.files(pattern = "hdf$")
head(fnames, 3)
#> [1] "MOD11A2.A2023001.h18v03.061.2023015004633.hdf" 
#> [2] "MOD11A2.A2023009.h18v03.061.2023020002349.hdf" 
#> [3] "MOD11A2.A2023017.h18v03.061.2023027004006.hdf"

# 1 file = 12 layers, looking ok
r1 <- rast(fnames[1])
r1 * 1
#> class       : SpatRaster 
#> dimensions  : 1200, 1200, 12  (nrow, ncol, nlyr)
#> resolution  : 926.6254, 926.6254  (x, y)
#> extent      : 0, 1111951, 5559753, 6671703  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs 
#> source(s)   : memory
#> varname     : MOD11A2.A2023001.h18v03.061.2023015004633 
#> names       : LST_Day_1km, QC_Day, Day_v~_time, Day_v~_angl, LST_N~t_1km, QC_Night, ... 
#> min values  :      242.40,      2,         9.7,         -65,      247.56,        2, ... 
#> max values  :      284.94,    177,        12.6,          65,      282.54,      177, ... 

# 2 files = 24 layers, still ok
r2 <- rast(fnames[1:2])
r2 * 1
#> class       : SpatRaster 
#> dimensions  : 1200, 1200, 24  (nrow, ncol, nlyr)
#> resolution  : 926.6254, 926.6254  (x, y)
#> extent      : 0, 1111951, 5559753, 6671703  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs 
#> source(s)   : memory
#> names       : LST_Day_1km, QC_Day, Day_v~_time, Day_v~_angl, LST_N~t_1km, QC_Night, ... 
#> min values  :      242.40,      2,         9.7,         -65,      247.56,        2, ... 
#> max values  :      284.94,    177,        12.6,          65,      282.54,      177, ... 

# all of a sudden, file not found?
r3 <- rast(fnames[1:3])
r3 * 1
#> Error: [*] file does not exist: HDF4_EOS:EOS_GRID:"MOD11A2.A2023017.h18v03.061.2023027004006.hdf":MODIS_Grid_8Day_1km_LST:Emis_31

# this worked before, now it does not
r1 * 1
#> Error: [*] file does not exist: HDF4_EOS:EOS_GRID:"MOD11A2.A2023001.h18v03.061.2023015004633.hdf":MODIS_Grid_8Day_1km_LST:LST_Day_1km

Obviously, you would need this data to reproduce the issue, I guess... But I also don't want to violate any guidelines by just uploading this here.

Do you have access to Earth Explorer (any other tile would work too, probably)? Or is there any other way I can hand over these files?

sessionInfo()
#> R version 4.3.3 (2024-02-29 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
@dimfalk
Copy link
Author

dimfalk commented Dec 5, 2024

Just wanted to check on this - and the issue still exists using R 4.4.2 and terra 1.7.83.

I realize that his is most likely some file- or format-specific issue, so I decided to provide some files for testing here.

After failing at r3 * 1, rast(fnames[1]) - this worked before, see above - additionally provides the following output:

Error: [rast] cannot open this file as a SpatRaster: 
MOD11A2.A2023001.h18v03.061.2023015004633.hdf
Additionally: Warning:
Failed to open HDF-EOS file "MOD11A2.A2023001.h18v03.061.2023015004633.hdf" for swath reading. (GDAL error 4) 

@rhijmans
Copy link
Member

rhijmans commented Dec 23, 2024

Sorry for taking so long to respond to this. When I looked at it the first time I could not figure it out, but now it is clear that this is because the HDF4 library has a very restrictive limitation of a maximum number of 32 connections (and each sub-dataset is one). That is documented here. It can be changed if you compile yourself.

Please note, that NCSA HDF library compiled with several defaults which is defined in hlimits.h file. For example, hlimits.h defines the maximum number of opened files:

#   define MAX_FILE   32

If you need open more HDF4 files simultaneously you should change this value and rebuild HDF4 library (and relink GDAL if using static HDF libraries).

I improved the handling of this situation as illustrated below. Here using the luna package to get your example files (remotes::install_github("rspatial/luna")); you need an account username/password.

uname <- "*****"
pword <- "*****"
library(terra)
f <- luna::getNASA("MOD11A2", "2023-01-01", "2023-01-18", download=TRUE, 
                          aoi=c(10,11,55,56), path=".", username=uname, password=pword)
f
#[1] "./MOD11A2.A2023001.h18v03.061.2023015004633.hdf" "./MOD11A2.A2023009.h18v03.061.2023020002349.hdf"
#[3] "./MOD11A2.A2023017.h18v03.061.2023027004006.hdf"

The below still fails, but now with a better error message:

x <- rast(f)
### each layer is a separate data source
nlyr(x)
#[1] 36
length(sources(x))
#[1] 36

y <- x * 1
#Error: [*] cannot read from HDF4_EOS:EOS_GRID:"MOD11A2.A2023017.h18v03.061.2023027004006.hdf":MODIS_Grid_8Day_1km_LST:Emis_31
#(Only 32 open datasets allowed with HDF4)

Also, the 32 connections that were opened are now closed when this error occurs, so you can continue, instead of having to restart a new R session).

One approach would be to select the sub-dataset of interest such that you can at least use 32 of these.

x <- rast(f, "LST_Day_1km")
x * 1
#class       : SpatRaster 
#dimensions  : 1200, 1200, 3  (nrow, ncol, nlyr)
#resolution  : 926.6254, 926.6254  (x, y)
#extent      : 0, 1111951, 5559753, 6671703  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs 
#source(s)   : memory
#names       : LST_Day_1km, LST_Day_1km, LST_Day_1km 
#min values  :      242.40,      241.82,      238.54 
#max values  :      284.94,      284.36,      279.02 

@dimfalk
Copy link
Author

dimfalk commented Dec 25, 2024

Thanks for the clarification, Robert - and your insights explaining this behaviour as well as adding a comprehensible error message!

Also, thank you for pointing out this possible workaround 👍

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Feb 8, 2025
# Version 1.8-15

## Bug Fixes

- `Readrds` Failed For Rasters With Timestep="Seconds"
  [#1711](Https://github.com/Rspatial/terra/issues/1711) by Pascal
  Oettli

- `divide<SpatVector>` always returned NULL
  [#1724](rspatial/terra#1724) by Márcia
  Barbosa

- `erase` failed in some cases
  [#1710](rspatial/terra#1710) by
  erkent-carb

## enhancements

- `bestMatch` now has argument "fun" to allow the use of different
  distance measures, and a <matrix> method

- `wrap` (and `writeRDS`) now captures varnames/longnames
  [#1719](rspatial/terra#1719) by Andrew
  Gene Brown

- improved raster metadata writing
  [#1714](rspatial/terra#1714) by Andrew Gene
  Brown

- `vect` and `writeVector` now properly read and write date and
  datetime
  data. [#1718](rspatial/terra#1718) by
  Andrew Gene Brown

- improved estimate of available memory on linux systems
  [#1506](rspatial/terra#1506) by Cedric
  Rossi

# version 1.8-10

Released 2025-01-13

## bug fixes

- `expanse<SpatRaster>(transform=TRUE)` crashed R when the crs was
  "local". [#1671](rspatial/terra#1671) by
  Michael Chirico

- `patches(values=TRUE)` wrapped around the edges
  [#1675](rspatial/terra#1675) by Michael
  Chirico

- `spin` now correctly handles spherical coordinates
  [#1576](rspatial/terra#1576) by jeanlobry

- `mosaic` sometimes crashed R
  [#1524](rspatial/terra#1524) by John
  Baums, Dave Klinges, and Hugh Graham.

- `spatSample` ignored argument "exp" when taking a random sample with
  na.rm=TRUE on a large raster
  [#1437](rspatial/terra#1437) by Babak
  Naimi

- `split<SpatVector,SpatVector>` did not work properly
  [#1619](rspatial/terra#1619) by Michael
  Sumner

- `autocor` improved handling of NA cells for global Moran computation
  [#1992](rspatial/terra#1592) by Nicholas
  Berryman

- `shade` is more
  memory-safe. [#1452](rspatial/terra#1452)
  by Francis van Oordt and Chris English

- fixed bug in `rasterize` revealed when using `crop(mask=TRUE)`
  [#1686](rspatial/terra#1686) by edixon1

- fixed `to_id = NA` bug in `nearest`
  [#1471](rspatial/terra#1471) by Mats
  Blomqvist

- better handling of date/unit
  [#1684](rspatial/terra#1684) and
  [#1688](rspatial/terra#1688) by Andrew
  Gene Brown

- `spatSample(method="regular")` on a raster with one column returned
  too many samples
  [#1362](rspatial/terra#1362) by Daniel R
  Schlaepfer


## enhancements

- `plot<SpatVector>` now uses the same default viridis color palette
  as `plot<SpatRaster>`
  [#1670](rspatial/terra#1670) by Márcia
  Barbosa

- `relate` now accepts relation="equals"
  [#1672](rspatial/terra#1672) by Krzysztof
  Dyba

- `init` now accepts additional arguments for function "fun"

- better handling of the 32 connections limitation set by the HDF4
  library [#1481](rspatial/terra#1481) by
  Dimitri Falk

- When using RStudio a once per session warning is given when using
  draw, sel or click
  [#1063](rspatial/terra#1063) by Sergei
  Kharchenko

- `distance<SpatRaster>` from lon and lat lines/polygons computes
  distance to the edges instead of the nodes
  [#1462](rspatial/terra#1462) by Derek
  Friend

- `distance<SpatVector,SpatVector>` now works for lon/lat data
  [#1615](rspatial/terra#1615) by Wencheng
  Lau-Medrano

- using overviews for faster plotting of COGs over http
  [#1353](rspatial/terra#1353) by Michael
  Sumner and [#1412](rspatial/terra#1412);
  and argument `plot(x, overview=)` to change the default behavior.

- `extract` with points is now faster for rasters accessed over http
  [#1504](rspatial/terra#1504) by Krzysztof
  Dyba

- `extract` with many points on very large rasters was slower in
  compared to doing the same with "raster" (which uses terra for
  that!) [#1584](rspatial/terra#1584) by
  Hassan Masoomi

- `merge` now has three alternative algorithms
  [1366](rspatial/terra#1366) by Hassan
  Masoomi and [#1650](rspatial/terra#1650)
  by Agustin Lobo


## new

- `$<SpatRaster>` can now be used to get a categorical SpatRaster with
  a different active category

- `scale_linear<SpatRaster>` method for linear scaling of cell values
  between a minimum and maximum value such as 0 and 1

- `distance` and related methods get argument "method" to choose the
  distance algorithm for lon/lat data
  [#1677](rspatial/terra#1677) by Márcia
  Barbosa

- `divide<SpatRaster>` and `divide<SpatVector>` methods

- `nseg` counts the number of segments in a SpatVector
  [#1647](rspatial/terra#1674) by Michael
  Chirico

- `extract` argument "search_radius" to extract values from the
  nearest raster cell that is not `NA`
  [#873](rspatial/terra#873) by
  matthewseanmarcus

- `combineLevels` to combine the levels of all layers
  [link](https://stackoverflow.com/questions/79340152/how-to-set-factor-levels-in-a-rast-stack-using-terra-when-different-levels-exi)
  on SO by Sam
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants