Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nodata when geometry has empty rasterization #243

Open
nathanjmcdougall opened this issue May 5, 2021 · 0 comments
Open

nodata when geometry has empty rasterization #243

nathanjmcdougall opened this issue May 5, 2021 · 0 comments

Comments

@nathanjmcdougall
Copy link

nathanjmcdougall commented May 5, 2021

Bug Description
When computing the 'nodata' count as one of the statistics for zonal_stats, if a geometry's rasterization does not include any pixels at all then:

  • a UserWarning will be raised by numpy and
  • the 'nodata' value will be NaN.

Of course, it is fine to interpret NaN as zero, but for consistency this should probably be:

  • None, which is the value returned for 'max', 'mean', etc., or
  • 0, which is the value returned for 'count', and probably makes conceptual sense in this case because 'nodata' functions as a count of pixels similarly to 'count'.

Example

I have created a small .tif and .shp file illustrating this:
example.zip

In this example, the polygon in the shapefile is too thin to fully encompass any pixels' centroids, using the default centroid rasterization strategy leads to an "empty" rasterization and the following:

import rasterstats
rasterstats.zonal_stats("example.shp", "small_raster.tif", stats='nodata')
C:\Users\nathan\Anaconda3\lib\site-packages\rasterstats\main.py:260: UserWarning: Warning: converting a masked element to nan.
  feature_stats['nodata'] = float((featmasked == fsrc.nodata).sum())
>>> [{'nodata': nan}]

Explanation for what is happening

The issue lies in the masking performed here:

featmasked = np.ma.MaskedArray(fsrc.array, mask=(~rv_array))
if 'nodata' in stats:
feature_stats['nodata'] = float((featmasked == fsrc.nodata).sum())

When trying to take .sum() over a fully masked array, numpy Masked Arrays will return numpy.ma.masked instead of zero.
If rv_array is all False (i.e. the rasterization doesn't include any pixels at all), then featmasked will be entirely masked, and so (featmasked == 0).sum() will be equal to numpy.ma.masked. Then, trying to convert to a float using float(numpy.ma.masked) raises the warning and returns np.nan, instead of zero.

Examples:

import numpy as np
# Partially masked (usual case):
rv_array = np.array([[False,True,False,False],[False,False,False,False]])
fsrc_array = np.array([[0,1,2,3],[4,5,6,7]])
featmasked = np.ma.MaskedArray(fsrc_array, mask=(~rv_array))
float((featmasked == 0).sum())
>>> 0.0
# Fully masked:
rv_array = np.array([[False,False,False,False],[False,False,False,False]])
fsrc_array = np.array([[0,1,2,3],[4,5,6,7]])
featmasked = np.ma.MaskedArray(fsrc_array, mask=(~rv_array))
x = (featmasked == 0).sum()
np.ma.is_masked(x)
>>> True
float(x)
UserWarning: Warning: converting a masked element to nan.
>>> nan

Install info
I'm using rasterstats 0.14.0 with numpy 1.19.2 on Anaconda.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant