Load each Well to get path to first image for each #119

Merged: 2 commits, Sep 26, 2021

src/ome.ts: 19 changes (16 additions, 3 deletions)
@@ -2,7 +2,14 @@ import { ZarrPixelSource } from '@hms-dbmi/viv';
 import pMap from 'p-map';
 import { Group as ZarrGroup, HTTPStore, openGroup, ZarrArray } from 'zarr';
 import type { ImageLayerConfig, SourceData } from './state';
-import { join, loadMultiscales, guessTileSize, range, parseMatrix } from './utils';
+import {
+  getAttrsOnly,
+  guessTileSize,
+  join,
+  loadMultiscales,
+  parseMatrix,
+  range
+} from './utils';
 
 export async function loadWell(config: ImageLayerConfig, grp: ZarrGroup, wellAttrs: Ome.Well): Promise<SourceData> {
   // Can filter Well fields by URL query ?acquisition=ID
@@ -119,7 +126,7 @@ export async function loadPlate(config: ImageLayerConfig, grp: ZarrGroup, plateA
   const wellPaths = plateAttrs.wells.map((well) => well.path);
 
   // Use first image as proxy for others.
-  const wellAttrs = (await grp.getItem(wellPaths[0]).then((g) => g.attrs.asObject())) as Ome.Attrs;
+  const wellAttrs = await getAttrsOnly<{ well: Ome.Well }>(grp, wellPaths[0]);
   if (!('well' in wellAttrs)) {
     throw Error('Path for image is not valid, not a well.');
   }
@@ -133,10 +140,16 @@ export async function loadPlate(config: ImageLayerConfig, grp: ZarrGroup, plateA
   const { datasets } = imgAttrs.multiscales[0];
   const resolution = datasets[datasets.length - 1].path;
 
+  async function getImgPath(wellPath:string) {
+    const wellAttrs = await getAttrsOnly<{ well: Ome.Well }>(grp, wellPath);
+    return join(wellPath, wellAttrs.well.images[0].path);
+  }
+  const wellImagePaths = await Promise.all(wellPaths.map(getImgPath));

@manzt (Member), Sep 10, 2021:

Since we don't need the group node other than for the attrs, I think we could make a util to handle the specific use case:

// src/utils.ts
const decoder = new TextDecoder();
export function getAttrsOnly<T = unknown>(grp: ZarrGroup, path: string) {
  return (grp.store as AsyncStore<ArrayBuffer>)
    .getItem(join(grp.path, path, ".zattrs"))
    .then((b) => decoder.decode(b))
    .then((text) => JSON.parse(text) as T);
}
  async function getImgPath(wellPath: string) {
    // Fetches only .zattrs for each well; grp.getItem(wellPath) would also request .zarray (404) and .zgroup
    const wellAttrs = await getAttrsOnly<{ well: Ome.Well }>(grp, wellPath);
    return join(wellPath, wellAttrs.well.images[0].path);
  }
  const wellImagePaths = await Promise.all(wellPaths.map(getImgPath));
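
A self-contained sketch of why the suggested `getAttrsOnly` util saves requests: it reads exactly one `.zattrs` key per well, whereas opening the node with `grp.getItem` also probes `.zarray` and `.zgroup`. The store below is an in-memory stand-in that records every key it is asked for. `MinimalAsyncStore`, `MinimalGroup`, `RecordingStore` and `joinPath` are hypothetical names for illustration, not zarr.js or vizarr APIs.

```typescript
// Hypothetical minimal store interface: just the async getItem used below.
interface MinimalAsyncStore {
  getItem(key: string): Promise<ArrayBuffer>;
}

// Hypothetical minimal group: only the fields getAttrsOnly reads.
interface MinimalGroup {
  store: MinimalAsyncStore;
  path: string;
}

// Join path segments with '/', skipping empty ones (illustrative helper).
function joinPath(...parts: string[]): string {
  return parts.filter((part) => part.length > 0).join('/');
}

const decoder = new TextDecoder();

// Fetch and parse only `<group path>/<path>/.zattrs`; no .zarray/.zgroup probes.
async function getAttrsOnly<T = unknown>(grp: MinimalGroup, path: string): Promise<T> {
  const bytes = await grp.store.getItem(joinPath(grp.path, path, '.zattrs'));
  return JSON.parse(decoder.decode(bytes)) as T;
}

// In-memory store that logs every requested key and rejects unknown keys like a 404.
class RecordingStore implements MinimalAsyncStore {
  readonly requested: string[] = [];
  readonly files: Map<string, string>;

  constructor(files: Map<string, string>) {
    this.files = files;
  }

  async getItem(key: string): Promise<ArrayBuffer> {
    this.requested.push(key);
    const text = this.files.get(key);
    if (text === undefined) {
      throw new Error(`404: ${key}`);
    }
    // slice() copies into a fresh, exactly sized ArrayBuffer.
    return new TextEncoder().encode(text).slice().buffer as ArrayBuffer;
  }
}

// Usage: one well whose first image lives at "0".
type WellAttrs = { well: { images: { path: string }[] } };
const store = new RecordingStore(
  new Map([['plate.zarr/A/1/.zattrs', JSON.stringify({ well: { images: [{ path: '0' }] } })]]),
);
getAttrsOnly<WellAttrs>({ store, path: 'plate.zarr' }, 'A/1').then((attrs) => {
  console.log(attrs.well.images[0].path, store.requested);
});
```

Running it logs the image path `0` together with a single requested key, `plate.zarr/A/1/.zattrs`.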

Reply (Member):

@will-moore any thoughts on this? I am happy to push this PR through and open a follow-up PR for this performance enhancement. My main concern is that I don't have many HCS datasets to experiment with, and the IDR links have been somewhat unstable, so it's difficult to test locally.

Reply (Collaborator, PR author):

I started to look at this. I know IDR links have been unstable, but https://hms-dbmi.github.io/vizarr/v0.1?source=https://s3.embassy.ebi.ac.uk/idr/zarr/v0.1/plates/5966.zarr was working just now (Firefox).
For that big plate it's already quite slow (over a minute), so this might be a killer.
Just trying now with this PR built locally, and it's taking a while (home internet isn't the fastest)!
Nearly 3 mins before it even starts to load chunks!
OK, so it finally loaded after 7 minutes (4618 requests). With vizarr v0.1 it was 3297 requests and less than 2 mins. But YMMV.
In either case, most users would probably give up since there's no sign of progress.
This plate is probably a bit too ambitious and maybe shouldn't be a blocker if this PR is a critical fix for @camFoltz.
I won't have time to dig any deeper before next week.

Would that performance enhancement help even before this PR?
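
On the "no sign of progress" point, one low-cost option is to count settled metadata requests and report them as they arrive. The sketch below wraps `Promise.all` with an `onProgress(done, total)` callback; `allWithProgress` is a hypothetical helper, not part of vizarr, and how the counts would reach a UI indicator is left open.

```typescript
// Hypothetical helper: behaves like Promise.all, but calls onProgress(done, total)
// each time one of the input promises fulfills. Result order matches input order.
function allWithProgress<T>(
  promises: readonly Promise<T>[],
  onProgress: (done: number, total: number) => void,
): Promise<T[]> {
  const total = promises.length;
  let done = 0;
  return Promise.all(
    promises.map((promise) =>
      promise.then((value) => {
        done += 1;
        onProgress(done, total);
        return value;
      }),
    ),
  );
}

// Usage with stand-in requests; in loadPlate the input would be wellPaths.map(getImgPath).
const fakeRequests = ['A/1', 'A/2', 'B/1'].map((path) => Promise.resolve(`${path}/0`));
allWithProgress(fakeRequests, (done, total) => console.log(`${done}/${total} wells resolved`)).then(
  (paths) => console.log(paths),
);
```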

@camFoltz, Sep 17, 2021:

I think if the viewer is to be flexible, then it should also be flexible in parsing the structure. Perhaps there's a way for the loader to infer whether the underlying positions/arrays/resolutions follow the same naming scheme (e.g. after looking at the first 2-3 positions and noticing they're all the same). That way we can have the best of both worlds for now. It is not the most elegant fix, but it could help here.

In my case, I would not have any two groups below the column level with the same name, and I am happy to test the performance locally as the datasets scale up. I can generate pretty large arbitrary HCS datasets at this point (now that I have a writer in place here) so I can give this a go.

In the far future we do plan on hosting data on the IDR, so I agree that the performance should be optimized.
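
A rough sketch of the sampling heuristic described in this comment: fetch the well attrs for the first few wells, and if they all name the same first image, assume every well follows that scheme; otherwise fall back to one request per well. `resolveWellImagePaths`, `GetWellAttrs`, `joinPath` and the default sample size of 3 are assumptions for illustration, not vizarr code.

```typescript
// Hypothetical shape of the well metadata this sketch reads.
type WellAttrs = { well: { images: { path: string }[] } };
// Hypothetical fetcher, e.g. backed by a getAttrsOnly-style .zattrs request.
type GetWellAttrs = (wellPath: string) => Promise<WellAttrs>;

// Illustrative path join with '/'.
const joinPath = (a: string, b: string): string => `${a}/${b}`;

async function firstImagePath(getWellAttrs: GetWellAttrs, wellPath: string): Promise<string> {
  const attrs = await getWellAttrs(wellPath);
  return attrs.well.images[0].path;
}

// Resolve `<well>/<first image>` for every well, sampling the first `sampleSize`
// wells to decide whether the remaining per-well requests can be skipped.
async function resolveWellImagePaths(
  wellPaths: readonly string[],
  getWellAttrs: GetWellAttrs,
  sampleSize = 3,
): Promise<string[]> {
  const sampled = await Promise.all(
    wellPaths.slice(0, sampleSize).map((p) => firstImagePath(getWellAttrs, p)),
  );
  const shared = sampled.length > 0 && sampled.every((img) => img === sampled[0]);
  if (shared) {
    // Assumption: every remaining well uses the same first-image name.
    return wellPaths.map((p) => joinPath(p, sampled[0]));
  }
  // Naming differs: fetch the remaining wells individually, reusing the sampled results.
  const rest = await Promise.all(
    wellPaths.slice(sampled.length).map((p) => firstImagePath(getWellAttrs, p)),
  );
  return [...sampled, ...rest].map((img, i) => joinPath(wellPaths[i], img));
}
```

For a plate where every well stores its first image at `0`, this makes three metadata requests instead of one per well. The trade-off is the one noted above: if a well after the sample breaks the pattern, it gets a wrong path.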

Also happy to share or generate these datasets at will for development purposes.

   // Create loader for every Well. Some loaders may be undefined if Wells are missing.
   const mapper = ([key, path]: string[]) => grp.getItem(path).then((arr) => [key, arr]) as Promise<[string, ZarrArray]>;
   const promises = await pMap(
-    wellPaths.map((p) => [p, join(p, imgPath, resolution)]),
+    wellImagePaths.map((p) => [p, join(p, resolution)]),
     mapper,
     { concurrency: 10 }
   );
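
The two loops in this hunk fan out differently: `Promise.all(wellPaths.map(getImgPath))` starts one `.zattrs` request per well immediately, while `pMap(..., { concurrency: 10 })` keeps at most 10 `getItem` calls pending at a time. The sketch below shows that bounded-concurrency idea in plain TypeScript. `mapWithConcurrency` is a hypothetical helper, not p-map's implementation, and whether bounding the well-metadata requests would help with the large-plate timings reported in the review is untested.

```typescript
// Hypothetical helper: map `items` through `fn` with at most `concurrency`
// calls pending at once, keeping results in input order. Similar in spirit to
// p-map's `concurrency` option, but not its actual implementation.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  fn: (item: T, index: number) => Promise<R>,
  concurrency: number,
): Promise<R[]> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError('concurrency must be a positive integer');
  }
  const results = new Array<R>(items.length);
  let nextIndex = 0;

  // Each worker claims the next unprocessed index until none remain.
  // JavaScript runs this on one thread, so `nextIndex++` cannot race.
  async function worker(): Promise<void> {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Hypothetical usage inside loadPlate, capping .zattrs requests at 10 in flight:
// const wellImagePaths = await mapWithConcurrency(wellPaths, getImgPath, 10);
```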

src/utils.ts: 8 changes (8 additions, 0 deletions)
@@ -54,6 +54,14 @@ export async function open(source: string | Store) {
   });
 }
 
+const decoder = new TextDecoder();
+export function getAttrsOnly<T = unknown>(grp: ZarrGroup, path: string) {
+  return (grp.store as AsyncStore<ArrayBuffer>)
+    .getItem(join(grp.path, path, ".zattrs"))
+    .then((b) => decoder.decode(b))
+    .then((text) => JSON.parse(text) as T);
+}
+
 export async function loadMultiscales(grp: ZarrGroup, multiscales: Ome.Multiscale[]) {
   const { datasets } = multiscales[0] || [{ path: '0' }];
   const nodes = await Promise.all(datasets.map(({ path }) => grp.getItem(path)));