Poor performance for boundless reads #227

Open
AsgerPetersen opened this issue Oct 19, 2020 · 3 comments


@AsgerPetersen

AsgerPetersen commented Oct 19, 2020

I would like to propose exposing a way to toggle boundless reading for two reasons:

  1. There are use cases where a feature falling outside the raster extent is an error. For example, in my job I am provided with countrywide rasters, and I collect statistics from them for buildings and roads. If a feature is outside the raster extent, something is wrong with either the feature or the raster.

  2. Enabling boundless reading in rasterio seriously degrades performance in some cases. It looks like the dataset is opened for each feature and the block cache is effectively disabled when using boundless reading. This gist https://gist.github.com/AsgerPetersen/6f9c8120b85e462ccbc26191a2117b3a demonstrates a performance improvement of about 50x when boundless reading is disabled. On my real-world data the improvement is on the order of 200x.
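To illustrate the first point, here is a minimal sketch of the kind of check that becomes possible when boundless reading is off: compare the feature's bounding box with the raster's and raise rather than pad. The names `Bounds`, `is_within` and `require_within` are hypothetical and are not part of rasterstats or rasterio; the sketch also assumes both extents are in the same CRS.

```python
from typing import NamedTuple


class Bounds(NamedTuple):
    """Axis-aligned extent as (left, bottom, right, top), same CRS assumed."""
    left: float
    bottom: float
    right: float
    top: float


def is_within(feature: Bounds, raster: Bounds) -> bool:
    """Return True when the feature extent lies entirely inside the raster extent."""
    return (
        feature.left >= raster.left
        and feature.bottom >= raster.bottom
        and feature.right <= raster.right
        and feature.top <= raster.top
    )


def require_within(feature: Bounds, raster: Bounds) -> None:
    """Raise instead of silently padding when a feature leaves the raster extent."""
    if not is_within(feature, raster):
        raise ValueError(
            f"feature extent {tuple(feature)} is outside raster extent {tuple(raster)}"
        )
```

With a check like this, a building that falls partly outside the countrywide raster fails loudly instead of producing statistics computed on padded nodata cells.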

I implemented it for my own usage here: AsgerPetersen@c375094.

@perrygeo
Owner

@AsgerPetersen 👍 looks like a good option for both providing more flexible edge handling and potentially a performance boost. Can you submit a PR for this? Looks ready to go. I can get this in the next release after some testing.

I'm curious about the performance degradation with boundless reads. I'll look into that as well.

Thanks!

@AsgerPetersen
Author

Sure. PR in #228
I wonder whether it would make sense to implement boundless reading from rasterio datasets the same way as for numpy arrays: https://github.com/perrygeo/python-rasterstats/blob/master/src/rasterstats/io.py#L165. It seems that the strategy rasterio uses for boundless reading isn't very performance-friendly in the case of repeated reads.
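A rough sketch of that idea, not the actual code in io.py: clip the requested window to the data that exists, read only the overlap, and copy it into a nodata-filled output of the requested size. To stay dependency-free it uses nested lists where the real code works on numpy arrays, and the function name `read_window_boundless` and its parameters are made up for illustration.

```python
def read_window_boundless(grid, row_off, col_off, height, width, nodata):
    """Return a height x width window from `grid`, padding cells outside it with `nodata`.

    `grid` is a list of equal-length rows; offsets may be negative or extend
    past the grid, which is what makes the read "boundless".
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if n_rows else 0

    # Start from a window filled with nodata, then copy in the overlapping part.
    out = [[nodata] * width for _ in range(height)]

    # Overlap between the requested window and the grid, in grid coordinates.
    r0, r1 = max(row_off, 0), min(row_off + height, n_rows)
    c0, c1 = max(col_off, 0), min(col_off + width, n_cols)

    # No overlap at all: the window is entirely nodata.
    if r0 >= r1 or c0 >= c1:
        return out

    for r in range(r0, r1):
        # Copy one overlapping row segment into its position in the output.
        out[r - row_off][c0 - col_off:c1 - col_off] = grid[r][c0:c1]
    return out
```

For a rasterio dataset the same clipping would presumably let the overlap be read with an ordinary (bounded) windowed read, so the dataset and its block cache could be reused across features; that part is an assumption and is not shown here.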

@perrygeo changed the title from "Allow toggling boundless reading" to "Poor performance for boundless reads" on Nov 24, 2020
@perrygeo
Owner

The boundless=False option is now in master. I'm going to repurpose this issue as a place to discuss the rasterio boundless read performance. As you suggested, there might be some workarounds to implement on the rasterstats side.
