chunked performance #16
Maybe this comes from the proxy image in Nibabel: https://nipy.org/nibabel/images_and_memory.html In that case, the initial load defers the actual read of the voxel data from disk. This, in turn, I suppose, brings up what "fair" means to Nibabel. Nibabel, like Xibabel / Xarray + Dask, will try to defer reads from disk - it's just that it will do it in a cruder and less configurable way.
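For reference, a minimal sketch of how Nibabel's proxy images defer the read (the file path here is hypothetical):

```python
import nibabel as nib

img = nib.load("func.nii.gz")  # hypothetical path; no voxel data read yet
proxy = img.dataobj            # ArrayProxy: the read is still deferred

# Slicing the proxy reads only the needed portion from disk
first_vol = proxy[..., 0]

# get_fdata() reads (and caches) the whole array in one go
data = img.get_fdata()
```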
For the time chunking - yes - that makes sense - I suppose the matrix multiply may need whole voxel time-course rows to work with, so it needs to load across time, whereas the other dimensions will all allow smaller blocks of voxels. We need to think about some optional chunking strategy that may depend on the operation. For example, time slicing will likely benefit tasks like motion correction, for which you need to pull out individual volumes. I guess we could simulate this kind of processing with:

```python
for tp in ximg['time']:
    arr = np.array(ximg.sel(time=tp))
```

Maybe we can define processing steps with their associated chunking strategies.
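One way that idea could look, as a sketch only: keep a mapping from processing step to its preferred chunking, and re-chunk before each step. `ximg` is assumed to be a Dask-backed xarray DataArray, and the step names are made up for illustration:

```python
# Hypothetical step -> chunking table; -1 means "one chunk along this dim"
CHUNKS_BY_STEP = {
    "glm": {"time": -1},               # needs whole voxel time courses
    "motion_correction": {"time": 1},  # pulls out one volume at a time
}

def run_step(ximg, step, func):
    """Re-chunk to the step's preferred layout, then apply the step."""
    chunks = CHUNKS_BY_STEP.get(step)
    if chunks is not None:
        ximg = ximg.chunk(chunks)
    return func(ximg)
```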
continued exploration and observations, from notes that began in #12
default automatic chunking not deterministic (and not performant)
One of the things we wanted to understand was how the default auto-chunking performs. In the figures in #12 I was passing in a dictionary that composed with whatever chunking was already set up automatically, and unfortunately what I figured out is that the default auto-chunking is not deterministic. It would always automatically chunk the last ("time") dimension, but the chunk size varied between 61 and 257, usually going down by 10-20 on consecutive runs. Also, using just this chunk strategy, without any additionally specified chunking, typically yielded a slower run than `nibabel`. For example, whereas the complete `nibabel` approach took 2.6 seconds using about 8 GiB, the default auto-chunking approach took 3.1 seconds and still used about 8 GiB.
splitting up "reading" and "compute" for the original nibabel code
Another thing I worried about was that in the earlier plots I wasn't being fair to the `nibabel` code, as it was reading the file, reshaping it, and then running the computation. So I separated these steps for the `nibabel` case, and was surprised to find that just "reading" was already using up most of the time and the entire 8 GiB! I wrongly assumed that the memory growth was coming from matrix multiplication, but it turns out that half of the memory usage comes from just the reshape. Here are 9 runs where I perform the reshape and save all of the data, and one run where I save all of the same data but without performing the reshape:

new "compute" only nibabel figure
Doing the reshaping as part of the "reading" code leaves us with 1 second to read the reshaped data from disk and do the linear algebra. This can now be used to make an apples-to-apples comparison with the chunked "compute"-only xibabel code.

can we get anywhere with chunking?
In this new regime, some of the chunking strategies we used initially no longer outperform the original `nibabel` computation in terms of time, though they still improve on its memory usage. The two best chunking approaches we previously had ended up using only 2 GiB of memory, instead of the 4 GiB `nibabel` uses, but they take a bit over a second, so they actually end up slower than `nibabel`.
To get around the non-determinism of automatic chunking, I ended up specifying a chunk size of -1 -- which says "do not chunk in this dimension" -- for all but one dimension, and then ran some experiments with small chunk sizes in just that one dimension. Chunking only in the "time" dimension did not perform well. All of the performant strategies below have `{"time": -1}`, which I omitted in the plot. I've also kept in the resident memory trace for the `nibabel` runs, for continuity with the previous figure, but only report the total duration and maximum memory usage for the various chunking strategies.
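For concreteness, a sketch of this kind of explicit chunk specification, assuming `ximg` is a Dask-backed xarray DataArray with these dimension names, and with a generic reduction standing in for the actual computation:

```python
# -1 pins a dimension to a single chunk, so only "i" is actually split;
# the experiments then vary the chunk size along that one dimension.
ximg_chunked = ximg.chunk({"time": -1, "i": 8, "j": -1, "k": -1})

# Dask streams over the 8-wide slabs along "i", keeping peak resident
# memory well below a full in-memory load of the array.
result = ximg_chunked.mean(dim="time").compute()
```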