
New image buffer #570

Open · wants to merge 46 commits into main

Conversation

@henrypinkard (Member) commented Feb 22, 2025

The V2 buffer provides thread-safe, generic data storage with improved performance and cleaner abstractions.

Before merging

  • Decide on default metadata handling and API for turning it on/off
  • Modify the pymmcore SWIG wrapper to expose the new pointer-based image handling
  • Bump Core and MMCoreJ versions appropriately

Design

Two core components:

  • DataBuffer: Thread-safe generic storage replacing CircularBuffer
  • BufferManager: Unified interface managing both legacy and new implementations

Key features of the new buffer system:

  • Thread-safe read/write access
  • Support for generic data types beyond just images
  • Support for various data types simultaneously (e.g. images of different sizes, pixel types, etc.)
  • Zero-copy writing into the buffer by giving devices pointers into it
  • Zero-copy management at the application layer through pointer manipulation

It can be enabled with:

core.enableV2Buffer(true)

Performance

As a drop-in replacement for the circular buffer (i.e. copying the data the same number of times, but allowing arbitrary sizes and data types), the new buffer gives equal or better performance:

In sequence acquisitions:
[benchmark image]

In continuous sequence acquisitions (live mode):

[benchmark image]

It's significantly faster to allocate:

[benchmark image]

Additionally, it has two key features that will enable much higher performance code:

  1. The application layer (e.g. Java, Python) can get access to data/metadata via pointers, avoiding direct copies. (In a quick test this seems to give a 2x speed improvement for reading out 2048x2048 images.)
  2. Device adapters can avoid an extra copy by requesting a slot in the buffer to write data into.

Testing

I've written and validated the new buffer and circular buffer against many new tests here. (FYI these live in mmpycorex so they can easily test both MMCoreJ and pymmcore)

It also passes all the pycromanager acquisition tests, which exercise the various functionalities of the acquisition engine.

Metadata

In conjunction with these changes, it made sense to standardize the metadata added to images. This was previously split among several places, making it hard to keep track of and maintain, including the SWIG wrapper, the Core, the CoreCallback, and the device code. Some of it was generated at the time of image acquisition, and some at the time of image retrieval.

It has now all been consolidated into void CMMCore::addCameraMetadata, and the same metadata is added to all images whether snapped or passed through a buffer (with the small exception of some multi-camera device adapter-specific tags).
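
To make the discussion below concrete, here is a rough sketch of the shape of the consolidated function (illustrative only -- the real addCameraMetadata takes additional parameters, handles more tag groups, and may differ in detail; the include/exclude flags are the ones shown later in this thread):

void CMMCore::addCameraMetadata(std::shared_ptr<CameraInstance> pCam,
                                Metadata& md, unsigned width, unsigned height)
{
   // Minimal set required by AcqEngJ -- always added
   md.PutImageTag("Width", std::to_string(width));
   md.PutImageTag("Height", std::to_string(height));
   md.PutImageTag("Camera", pCam->GetLabel());
   // PixelType, bit depth, etc. would follow here

   // Optional groups, each behind its own include/exclude toggle
   if (imageMDIncludeCameraTags_)
   {
      // ... copy pCam->GetTags() into md
   }
   if (imageMDIncludeSystemStateCache_)
   {
      // ... copy the system state cache into md
   }
}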

Testing reveals there's a substantial performance cost to adding so much metadata to all images:

[benchmark image]

Previously, much of this cost was incurred when reading images back out of the buffer. With the new changes, it is incurred at insertion time. However, I think it makes much more sense for this metadata to be added at insertion time, because that's when it's most likely to be in sync with the actual state of the hardware.

Since this consolidation takes place outside the BufferManager, it also affects the circular buffer and will change behavior even if the v2 buffer is disabled. We need to figure out what should be enabled here. It's unclear (to me) what higher level code depends on what tags, but including the union of all of them by default will substantially hurt performance. I also have just a temporary function in the core API for controlling which metadata to add, which should perhaps be replaced with something more permanent.

Multi-camera

While it is possible to use the v2 buffer with multi-camera devices, its flexibility makes it a more general solution (e.g. it supports different image sizes, types, etc.) than the multi-camera device adapter. In my opinion that adapter should be deprecated and application code that relies on it updated to use the v2 buffer.

One addition here is getLastTaggedImageFromDevicePointer("cameraLabel"), which lets you get the last image from a specific camera rather than having to search backwards through the most recent images and read their metadata.

A step towards a single route for all data

The pointer-based API gives a good opportunity to start moving towards a single route for all data, rather than separate routes for snaps and sequences. I don't think it's possible to fully do this without changing how cameras handle data for snaps, but in the meantime the GetImagePointer function now copies the snap buffer in camera adapters into the v2 buffer, returning a pointer to it. This should be faster than copying into the application memory space because it can be multithreaded, and it still allows pointer-based handling of the data from the application layer.

Pointer based image handling

You get these through methods like getLastImagePointer(), which return a TaggedImagePointer object. This is a wrapper around the TaggedImage object, but it will not load the pixels until you call getPixels(). If you never need the pixels you can call release(), or just use the metadata without them:

TaggedImagePointer tip = core.getLastImagePointer();
// tip.tags works just like a regular JSONObject, but it won't load
// the metadata until needed
tip.tags.get("Width");
// Done with the slot without ever copying the pixels
tip.release();

@henrypinkard (Member, Author)

Sounds good, thanks! Also happy to discuss over zoom if that's easier

@tlambert03 (Contributor) left a comment

Thanks for all your hard work on this @henrypinkard, it's very promising!

I took a look over and compiled it for testing pymmcore-plus. I had to make a few changes to the C++ code in order to get pymmcore to compile, and I've tried to mark all of those places inline below.

A few pymmcore-plus tests failed with the metadata changes that are included by default, specifically the timing metadata, which probably can't be moved to a non-default level without a deprecation.

As long as core.enableV2Buffer is not True, everything else works fine. If I do enable the v2 buffer, I have one test failing, and it relates to using hardware sequencing with multi-cam. I haven't quite nailed it down yet, so I don't have much useful info at the moment... but for some reason, the buffer seems to return more images than expected (i.e. getRemainingImageCount keeps being non-zero beyond what I would have expected it to, and beyond what it does without core.enableV2Buffer enabled). Will let you know when I learn more

thanks again!

MMCore/MMCore.h (outdated), comment on lines 457 to 460:
/** \name v2 buffer control. */
///@{
void enableV2Buffer(bool enable) throw (CMMError);
bool usesV2Buffer() const { return bufferManager_->IsUsingV2Buffer(); }
Contributor:

feels quite similar to feature flags... Could we use enableFeature/isFeatureEnabled instead of adding two new public methods?

Member Author:

fine by me. Do you agree @marktsuchida ?

Member:

Yes. And it would be good to take into account the guidelines in CoreFeatures.cpp: for example, make sure that the newly added methods that require the V2 buffer throw an error if invoked without enabling the feature (if they don't already).

I'd maybe also suggest using a name like NewSequenceBuffer or SequenceBuffer2025 rather than "V2", since we probably don't want to appear to have a versioning scheme separate from MMCore's. It could also be a name indicating its properties, such as FastSequenceBuffer.

Member Author:

I don't see a way to do this with the feature feature, because enabling or disabling the v2 buffer requires additional actions in the core:

int ret = bufferManager_->EnableV2Buffer(enable);

// Default include when using circular buffer, exclude new buffer
imageMDIncludeLegacyCalibration_ = !enable;
imageMDIncludeSystemStateCache_ = !enable;
imageMDIncludeCameraTags_ = !enable;

and CoreFeatures seems to just be boolean flags without a pointer to the Core itself. @marktsuchida can you clarify what the preferred behavior is here?
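
For illustration, here is a purely hypothetical sketch (none of this code is in the PR, and it is not how CoreFeatures.cpp currently works) of how the side effects could be attached to a feature flag if the flag mechanism were given a hook back into the Core instance:

// Hypothetical: a per-Core hook invoked when a feature flag changes
void CMMCore::onFeatureChanged(const std::string& name, bool enable)
{
   if (name == "NewDataBuffer")
   {
      int ret = bufferManager_->EnableV2Buffer(enable);
      if (ret != DEVICE_OK)
         throw CMMError("Failed to switch sequence buffer implementation");

      // Default include when using circular buffer, exclude with new buffer
      imageMDIncludeLegacyCalibration_ = !enable;
      imageMDIncludeSystemStateCache_ = !enable;
      imageMDIncludeCameraTags_ = !enable;
   }
}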

* The caller should have locked the camera device, or be calling from a thread
* in the camera (e.g. CoreCallback::InsertImage)
*/
void CMMCore::addCameraMetadata(std::shared_ptr<CameraInstance> pCam, Metadata& md, unsigned width, unsigned height,
Contributor:

      mmCoreAndDevices/MMCore/MMCore.cpp:2706:15: error: exception specification in declaration does not match previous declaration
       2706 | void CMMCore::addCameraMetadata(std::shared_ptr<CameraInstance> pCam, Metadata& md, unsigned width, unsigned height,
            |               ^

Member Author:

not sure about this. still seeing it with latest code?

@henrypinkard (Member, Author)

Thanks for the review @tlambert03!

Are you on Mac/Linux? (I've only tested on Windows)

Will address some of those specific bugs later.

This function and the general codifying of metadata (its structure, and how and when it is fetched) are probably my main concern in this PR.

I agree that the metadata needs to be discussed and figured out before merging

Metadata

Kinda brings in some of the Java-side patterns that I have to admit I was kind of happy we didn't have baked into the C++ code

I favor limiting default metadata to minimize performance impact. Breaking dependencies will become apparent and can be fixed, whereas including everything causes performance hits without clear attribution. We also may need a more nuanced approach, like enabling certain metadata in MMCoreJ or pymmcore.

I think we first need to pool knowledge about the various pieces of Metadata, their original purpose, and the present reasons for their use. I have only limited knowledge of this. @nicost @marktsuchida @jondaniels @go2scope please chime in:

Specific metadata

(See the new consolidated function)

Width, Height, PixelType, Camera
I'd say these are required and should be added to every image. This represents the minimal set of Metadata required by AcqEngJ

Bit Depth
I know this is used by the Java UI, but it is not essential because the UI can fall back on the byte depth.

ROI/Binning
?

Timestamps/image number
?

Camera-specific Tags
Don't know -- but there can't be anything so general that relies on this because the tags aren't standardized across cameras. If this is just for certain cameras, I'd say these tags should probably be handled by the device adapter itself

Pixel-Size/affine transform
I believe this is used by the UI to make scalebars. But I'm not sure these would even be correct with the v2 buffer, where multiple cameras operating simultaneously can have different sizes/transforms, which argues for excluding it when using the v2 buffer at the very least.

FrameIndex, SliceIndex, etc
I think these have to do with the Clojure engine, and perhaps the UI uses them? Seems like they shouldn't be here at all, but I don't know what relies on them

System state cache
Not sure why it was included. Probably should be off by default given the cost. I know @nicost recently added a flag to conditionally disable.

@henrypinkard (Member, Author)

As long as core.enableV2Buffer is not True, everything else works fine. If I do enable the v2 buffer, I have one test failing, and it relates to using hardware sequencing with multi-cam. I haven't quite nailed it down yet, so I don't have much useful info at the moment... but for some reason, the buffer seems to return more images than expected (i.e. getRemainingImageCount keeps being non-zero beyond what I would have expected it to, and beyond what it does without core.enableV2Buffer enabled). Will let you know when I learn more

Okay sounds good, let me know.

If it's only breaking multi-cam with the v2 buffer, I'd say it's less critical, because the v2 buffer makes the workaround provided by the multi-camera adapter unnecessary. But it couldn't hurt to fix if you're able to nail it down.

Not urgent, but if you have a test suite for core functionality, you might consider in the future moving it to https://github.com/micro-manager/mmpycorex, so that it could also cover functionality via MMCoreJ

@marktsuchida (Member)

For the metadata, I think it is fine to default to a minimal set (or even empty) when using V2, given that you are providing control over including/excluding each group (using flags, as proposed above). MMStudio and other legacy code (Clojure engine) will need to start by enabling all of the flags, but the flags provide a nice way to migrate.

Some comments on the individual groups:

Bit Depth I know this is used by the Java UI, but is not essential because it can fallback on the byte depth.

This is the actual sample bit depth (e.g., 12 or 14 bit), not the sample format (8 or 16 bit).

Camera-specific Tags Don't know -- but there can't be anything so general that relies on this because the tags aren't standardized across cameras. If this is just for certain cameras, I'd say these tags should probably be handled by the device adapter itself

It looks like this is pCam->GetTags(), which returns the set of fields set on the camera device object. This is separate from the fields that cameras add in their code before calling InsertImage() (which can have a combination of standard and nonstandard keys). This AddTag()/GetTags() mechanism was presumably added to support Multi Camera but it is also used by ArduinoCounter and TeensyPulseGenerator.

Potentially important if allowing Multi Camera et al. to continue to work (which I think is important because migration away from it will necessarily be gradual -- even after we introduce a new way, we may not want to break people's config files without warning).

As for the actual camera-specific tags (which cameras send via InsertImage()), they are in need of a cleanup: we need to figure out which fields are de facto standard and document them. There may be cases where MMCore overwrites such fields, or adds a field only when the camera did not provide a value; those behaviors might be implicitly relied upon by users of particular cameras. But it looks like your intent is to not change this behavior, so that task can be decoupled from this PR, I think.

Pixel-Size/affine transform I believe this is used by the UI to make scalebars. But I'm not sure these would even be correct with the v2 buffer, where multiple cameras operating simultaneously can have different sizes/transforms, which argues for excluding it when using the v2 buffer at the very least.

Excluding by default seems reasonable. Although I do think we need a solution for pixel sizes when using multiple cameras; that can be a separate discussion, including whether the mapping should be done by MMCore or by the application.

FrameIndex, SliceIndex, etc I think these have to do with the Clojure engine, and perhaps the UI uses them? Seems like they shouldn't be here at all, but I don't know what relies on them

I agree. "legacy metadata" is a good name for these. Or maybe "additional legacy metadata" because some of the others feel a bit legacy, too.

Both the Clojure engine and MMStudio's use of AcqEngJ (in combination with the Datastores receiving the images from them) could potentially be relying on these fields being present under certain conditions, so it's good that they can be switched on until confirmed to be unnecessary.

@nicost (Member) commented Feb 25, 2025

Are we talking here about per-image data (I believe so, but good to get this clear)?

ROI/Binning

For area-sensor cameras, this is critical information; data consumers need access to it. It may even change within a stream of data (several camera APIs have the option to send separate ROIs, all from the same exposure).

Timestamps/image number

Ideally, each camera would add its own timestamp (and sequential image number). Knowing the exact time the exposure happens is critical for many applications. The problem has been that many camera device adapters do not provide this information, hence the workaround of recording the time the data arrived in the Core (which also allows untangling camera time from computer time). I am not sure where this should live, but there has to be a way to record this data.

Pixel-Size/affine transform

There needs to be a way to get this information from the image data. This may be stored just in summary metadata.

System state cache

In the ideal world, it would always be possible to reconstruct the complete system state at the moment the image was taken. I believe that was the reason to include the complete system state cache. The cost was relatively high, limiting data rates to one or a few kHz, which was very detrimental for high-speed acquisition of many small images. In reality, changes in the system state during fast acquisitions are often not even recorded in the system state cache. It seems most useful to me to store the system state in the summary metadata, and then add known deviations from it to the image metadata. For hardware-synchronized acquisitions, it may be problematic that only the acquisition engine has complete knowledge of the (desired) state of the system. In that sense, adding the recipe for the desired changes to the summary metadata could suffice.

@marktsuchida (Member)

I noticed that a few new functions (AcquireImageWriteSlot et al.) are added to CoreCallback but not to its base (interface) class MM::Core. @henrypinkard Would I be right in thinking that these are there to show how the transfer of images from the camera to MMCore can be made more efficient, but are kept hidden from device adapters for the time being?

I would agree with a cautious approach here because it would be bad if we have cameras that only work with V1 or V2. It would also be bad if every camera that supports WriteSlot had to add conditional branches to check if V2 is enabled. Probably the best way is to say that cameras must use either InsertImage or WriteSlot, but not both, and then make WriteSlot just work with V1 as well, just without the benefit of eliminating a copy. (As far as I'm concerned, it's fine if this PR doesn't yet expose WriteSlot to devices.)

@henrypinkard (Member, Author) commented Feb 26, 2025

Answers to questions:

@marktsuchida

For the metadata, I think it is fine to default to a minimal set (or even empty) when using V2, given that you are providing control over including/excluding each group (using flags, as proposed above). MMStudio and other legacy code (Clojure engine) will need to start by enabling all of the flags, but the flags provide a nice way to migrate.

Makes sense.

Bit Depth I know this is used by the Java UI, but is not essential because it can fallback on the byte depth.

This is the actual sample bit depth (e.g., 12 or 14 bit), not the sample format (8 or 16 bit).

Right, I just meant you can assume 12, 14 etc map to 16. But maybe the performance hit is small enough that it can just be included.

It looks like this is pCam->GetTags(), which returns the set of fields set on the camera device object. This is separate from the fields that cameras add in their code before calling InsertImage() (which can have a combination of standard and nonstandard keys). This AddTag()/GetTags() mechanism was presumably added to support Multi Camera but it is also used by ArduinoCounter and TeensyPulseGenerator.

Hmmm I guess I am unfamiliar with this mechanism. But another one that's not a significant performance hit

Pixel-Size/affine transform I believe this is used by the UI to make scalebars. But I'm not sure these would even be correct with the v2 buffer, where multiple cameras operating simultaneously can have different sizes/transforms, which argues for excluding it when using the v2 buffer at the very least.

Excluding by default seems reasonable. Although I do think we need a solution for pixel sizes when using multiple cameras; that can be a separate discussion, including whether the mapping should be done by MMCore or by the application.

Agree

I noticed that a few new functions (AcquireImageWriteSlot et al.) are added to CoreCallback but not to its base (interface) class MM::Core. @henrypinkard Would I be right in thinking that these are there to show how the transfer of images from the camera to MMCore can be made more efficient, but are kept hidden from device adapters for the time being?
I would agree with a cautious approach here because it would be bad if we have cameras that only work with V1 or V2. It would also be bad if every camera that supports WriteSlot had to add conditional branches to check if V2 is enabled. Probably the best way is to say that cameras must use either InsertImage or WriteSlot, but not both, and then make WriteSlot just work with V1 as well, just without the benefit of eliminating a copy. (As far as I'm concerned, it's fine if this PR doesn't yet expose WriteSlot to devices.)

This might have been an accidental omission -- but I think perhaps the best strategy is to add them to ensure they work with the design, but keep them commented out for now. Thoughts?

Agree that we don't want cameras to have to be aware of which buffer they write into. It is already the case that InsertImage is compatible with v1 or v2. If called with v2 it simply internally acquires a write slot, copies, and releases the slot.
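
For clarity, a sketch of that compatibility path (the signatures here are simplified and hypothetical; the actual slot API in the PR may differ):

// Sketch: when the new buffer is active, InsertImage degrades to
// acquire-copy-release, so adapters that know nothing about slots
// keep working, at the cost of the one copy. (memcpy needs <cstring>.)
int BufferManager::InsertImage(const unsigned char* pixels, size_t numBytes,
                               Metadata& md)
{
   if (useNewDataBuffer_.load())
   {
      void* slot = nullptr;
      int ret = newDataBuffer_->AcquireWriteSlot(numBytes, &slot);
      if (ret != DEVICE_OK)
         return ret;
      std::memcpy(slot, pixels, numBytes);        // the one remaining copy
      return newDataBuffer_->ReleaseWriteSlot(slot, md);
   }
   return circBuffer_->InsertImage(pixels, numBytes, md);  // legacy path
}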

make WriteSlot just work with V1 as well, just without the benefit of eliminating a copy.

But I'm not sure how this could work without potential memory safety issues.

Seems like it would be a shame to not allow cameras to write directly into the buffer just because v1 doesn't support it. Are there other ways to handle this?

@nicost

Are we talking here about per-image data (I believe so, but good to get this clear)?

Correct. Summary metadata is generated by the acquisition engine, not the core.

ROI/Binning

For area sensor like cameras, this is critical information. Data consumers need to have access to this. This may even change in a stream of data (several camera APIs have the option send separate ROIs, all from the same exposure).

Okay so default include I suppose

Pixel-Size/affine transform

There needs to be a way to get this information from the image data. This may be stored just in summary metadata.

Already is for AcqEngJ, so that argues to default exclude this one

System state cache

In the ideal world, it will always be possible to reconstruct the complete system state at the moment the image was taken. I believe that was the reason to include the complete system state cache.

In the existing implementation, this isn't exactly what is happening, since the system state cache is added at the time of image readout, not image acquisition

The cost was relatively high, limiting data rates to on or a few kHZ, which was very detrimental for high speed acquisition of many small images. In reality, changes in the system state during fast acquisitions are often not even recorded in the system state cache. It seems most useful to me to store the system state in the summary metadata, and then add known deviations from the system state to the image metadata.

I'm not sure storing the changes would solve the performance problem, because (if I'm not mistaken) you'd still have to iterate through the full system state cache on each image to look for changes. One way to do this might be to have a thread in the core listen to callbacks from devices and update what should be added to image metadata when it gets notified of a change. But this might require a good amount of work and should probably be a separate PR.
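
A minimal sketch of that idea (hypothetical class and names; nothing like this exists in the PR): a flat, pre-stringified mirror of the state cache, updated from device-change callbacks, so insertion only pays for copying tags rather than walking the full cache on every image.

#include <map>
#include <mutex>
#include <string>
// Metadata comes from MMDevice (ImageMetadata.h)

class StateCacheMirror
{
public:
   // Called from the device-change callback thread
   void OnPropertyChanged(const std::string& device, const std::string& prop,
                          const std::string& value)
   {
      std::lock_guard<std::mutex> g(mutex_);
      tags_[device + "-" + prop] = value;   // matches "Device-Property" keys
   }

   // Called at image insertion time
   void AppendTo(Metadata& md) const
   {
      std::lock_guard<std::mutex> g(mutex_);
      for (const auto& kv : tags_)
         md.PutImageTag(kv.first, kv.second);
   }

private:
   mutable std::mutex mutex_;
   std::map<std::string, std::string> tags_;
};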

For hardware synchronized acquisitions, it may be problematic that only the acquisition engine has complete knowledge of the (desired) state of the system. In that sense, adding the recipe for the desired changes to the summary metadata could suffice.

Adding the recipe for changes essentially already happens in AcqEngJ by converting acquisition events to image metadata.

I think given all these considerations, I would propose:

  • System state cache is on by default with old buffer for backwards compat
  • It's off by default for the v2 buffer
  • Make AcqEngJ add it to the summary metadata by default

@henrypinkard (Member, Author)

Potential metadata plan. Are we in agreement on this?

Always include

Width, Height, PixelType, Camera

Default include

Bit Depth
ROI/Binning
Timestamps/image number

Default include v1, default exclude v2. (+ add to summary metadata)

Camera-specific Tags
System state cache
Pixel-Size/affine transform

Default exclude

Legacy metadata: FrameIndex, SliceIndex, etc

@nicost (Member) commented Feb 26, 2025

Sounds good. As for System State Cache, there currently is a mechanism to switch that off if desired (MMCoreJ.i::includeSystemStateCache_). I am all for cleaning that up, but we will need a way to continue switching that off also when using V1 when needed (hoping to start V2 soon on the Java side, but expecting that will take some time).

@henrypinkard (Member, Author)

Sounds good. As for System State Cache, there currently is a mechanism to switch that off if desired (MMCoreJ.i::includeSystemStateCache_). I am all for cleaning that up, but we will need a way to continue switching that off also when using V1 when needed (hoping to start V2 soon on the Java side, but expecting that will take some time).

I moved that method to the core and out of the SWIG layer, but will keep it indefinitely for backwards compatibility

@henrypinkard (Member, Author)

Added system state cache to summary metadata: micro-manager/AcqEngJ#127

@henrypinkard (Member, Author)

Okay I think I've addressed everything except for the two remaining unresolved comments above.

I think what to do about acquiring write slots can be addressed in a future PR, but it would be good to figure out what the eventual strategy will be.

I noticed that a few new functions (AcquireImageWriteSlot et al.) are added to CoreCallback but not to its base (interface) class MM::Core. @henrypinkard Would I be right in thinking that these are there to show how the transfer of images from the camera to MMCore can be made more efficient, but are kept hidden from device adapters for the time being?

This was indeed an accidental omission. I've added them to the interface, but this is commented out for now

I would agree with a cautious approach here because it would be bad if we have cameras that only work with V1 or V2. It would also be bad if every camera that supports WriteSlot had to add conditional branches to check if V2 is enabled. Probably the best way is to say that cameras must use either InsertImage or WriteSlot, but not both, and then make WriteSlot just work with V1 as well, just without the benefit of eliminating a copy. (As far as I'm concerned, it's fine if this PR doesn't yet expose WriteSlot to devices.)

NewDataBuffer (formerly V2) maintains backward compatibility with InsertImage, so existing camera adapters will continue to function. Users who prefer to avoid pointer management can still use InsertImage to copy data into the NewDataBuffer.
However, AcquireWriteSlot is incompatible with CircularBuffer for several reasons:

  • The CircularBuffer isn't truly circular - it fully empties when filled in continuous sequence mode, which would invalidate pointers held by device code.
  • CircularBuffer lacks slot locking mechanisms. Adding this functionality would essentially transform it into something similar to NewDataBuffer while risking new bugs.

I considered creating an intermediate compatibility buffer that would temporarily store data when a camera acquires a write slot with CircularBuffer enabled. However, this approach would be overly complex and effectively just place a NewDataBuffer instance between the CircularBuffer and device adapters.

I think the path forward is to enable write-slot acquisition in a future PR after additional testing, with the requirement that camera device adapters using the acquire/release slot feature must use the new buffer. Since the performance testing shows NewDataBuffer is more performant than CircularBuffer while covering all the same use cases, it seems to me the application should migrate to it as the default option as soon as we are confident in its robustness.

@marktsuchida (Member)

I'm finally getting around to looking at the details of the image retrieval API for the V2 buffer. Please correct me if I'm misunderstanding anything below.

Using the new mechanism, the app (let's look at Java for now) calls popNextDataPointer(), which, after the MMCoreJ wrapping into popNextTaggedImagePointer(), returns TaggedImagePointer. That Java class stores a (wrapped) BufferDataPointer -- so the app can indefinitely hold on to the pointer. When the app finally calls TaggedImagePointer.release(), this goes through BufferDataPointer::release(), which calls the current DataBuffer's ReleaseDataReadPointer() via the BufferManager. But in the meantime, the DataBuffer might have been deleted and replaced, due to the buffer size changing (or due to switching V2 off and on again). In such cases, the release() will crash or corrupt memory. (Also, accessing the image data after the buffer has been reallocated will crash or return incorrect data.)

On the other hand, if the app obtains a TaggedImagePointer and then lets go of it without calling release(), the slots pointed to remain allocated indefinitely and there is no way to reclaim the memory (other than resizing the buffer). This is perhaps less critical because there is no correct way to use this without explicitly calling release().

It's also generally hard to see what all the problems are that could arise from the lifetime management (or lack thereof) of buffer slots -- a problem in itself. I think the only safe way to deal with this is to explicitly share ownership of the buffer slots (including the memory backing them) between MMCore and the app. This could be done by managing the slots with std::shared_ptr (which performs automatic reference counting); the Java side could hold onto a copy of the shared_ptr until the app explicitly release()s the slot, so that the image data is available even if the MMCore buffer goes away.

After having written the above, I realized that you don't have separate buffers for each slot (like the V1 buffer) but rather one big, contiguous buffer. You can still have shared_ptrs that point to each block of that buffer while sharing ownership of the whole buffer (ask me how if not clear), though that would mean that the large buffer will be kept alive until every single pointer retained by the app is let go of.
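
(For reference, the standard mechanism for this is shared_ptr's aliasing constructor. A minimal self-contained sketch, with illustrative sizes and offsets:)

#include <cstdint>
#include <memory>

int main()
{
   // Stand-in for the big contiguous buffer (in the PR it is memory-mapped)
   std::shared_ptr<uint8_t> whole(new uint8_t[1 << 20],
                                  std::default_delete<uint8_t[]>());

   // A "slot" at byte offset 4096: points into the buffer but shares
   // ownership of the whole allocation via the aliasing constructor
   std::shared_ptr<uint8_t> slot(whole, whole.get() + 4096);

   whole.reset();  // MMCore lets go of the buffer (e.g. on reallocation)
   *slot = 42;     // still valid: the outstanding slot keeps it alive
   return 0;
}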

(It is not clear to me what the advantage of the contiguous buffer is. You end up using freeRegions_, which is complicated and also has the danger of becoming fragmented (this could be seen as reinventing a simplistic memory allocator). Also, the single contiguous buffer strategy will never work on 32-bit systems, although we maybe don't care about that.)

As for the Java API for things that need to be explicitly "released" by user code, it should conform to the AutoCloseable interface so that it can be used with the try-with-resources statement (rough equivalent to Python's with statement).

(Ideally we also automatically release the shared_ptr when the Java object gets garbage collected, which would require java.lang.ref.Cleaner, which would require us to update to Java 9+ first, but that we can do. (Please don't use finalize().) But this could be added later, I think. Having this means that people won't need to restart the program after running incorrect code that fails to release the buffer slots, but it should not be necessary for correct code.)

I'm afraid I cannot recommend merging this until these buffer lifetime issues are addressed. If possible, it might be productive to split this PR into two: one that cleans up the metadata handling without introducing the V2 buffer, and one that purely introduces the V2 buffer. That would speed up reviewing the changes to the metadata handling (which I still need to take another look at -- I'm mostly happy with it but it's easier to make 100% sure there are no unknown changes in behavior than to later troubleshoot the existing (sometimes hacky) application code that might depend on exact behavior).

@henrypinkard (Member, Author) commented Mar 3, 2025

I'm finally getting around to looking at the details of the image retrieval API for the V2 buffer. Please correct me if I'm misunderstanding anything below.

Thanks for taking a look. Your understanding is mostly correct -- but in a couple places I think you've misunderstood, and in fact the behavior you're advocating for is already implemented.

Using the new mechanism, the app (let's look at Java for now) calls popNextDataPointer(), which, after the MMCoreJ wrapping into popNextTaggedImagePointer(), returns TaggedImagePointer. That Java class stores a (wrapped) BufferDataPointer -- so the app can indefinitely hold on to the pointer. When the app finally calls TaggedImagePointer.release(), this goes through BufferDataPointer::release()

Correct

which calls the current DataBuffer's ReleaseDataReadPointer() via the BufferManager. But in the meantime, the DataBuffer might have been deleted and replaced, due to the buffer size changing (or due to switching V2 off and on again). In such cases, the release() will crash or corrupt memory. (Also, accessing the image data after the buffer has been reallocated will crash or return incorrect data.)

The clearing/deletion of the v2 buffer is handled differently than the v1 circular buffer for exactly this reason. The v1 buffer is cleared/reallocated every time before starting a sequence acquisition; the v2 buffer is not. For v2, we've now split this into two separate operations: clearing and resetting. Trying to clear while application code holds outstanding slots will throw an error. Reset is the more dangerous operation with the problems you mention, which is why it's not simply slotted in everywhere the old circular buffer used to be cleared. For example, for the case of changing the buffer size, we have:

void BufferManager::ReallocateBuffer(unsigned int memorySizeMB) {
   if (useNewDataBuffer_.load()) {
      int numOutstanding = newDataBuffer_->NumOutstandingSlots();   
      if (numOutstanding > 0) {
         throw CMMError("Cannot reallocate NewDataBuffer: " + std::to_string(numOutstanding) + " outstanding active slot(s) detected.");
      }
      delete newDataBuffer_;
      newDataBuffer_ = new DataBuffer(memorySizeMB);
   } else {
      delete circBuffer_;
      circBuffer_ = new CircularBuffer(memorySizeMB);
   }
}

On the other hand, if the app obtains a TaggedImagePointer and then lets go of it without calling release(), the slots pointed to remain allocated indefinitely and there is no way to reclaim the memory (other than resizing the buffer). This is perhaps less critical because there is no correct way to use this without explicitly calling release().

You'd have to call reset(). Resizing the buffer would fail in this situation. But yes, there is no correct way to handle pointers without explicit calls to release.

It's also generally hard to see what all the problems are that could arise from the lifetime management (or lack thereof) of buffer slots -- a problem in itself. I think the only safe way to deal with this is to explicitly share ownership of the buffer slots (including the memory backing them) between MMCore and the app. This could be done by managing the slots with std::shared_ptr (which performs automatic reference counting); the Java side could hold onto a copy of the shared_ptr until the app explicitly release()s the slot, so that the image data is available even if the MMCore buffer goes away.

This is essentially what already happens (though not with shared_ptrs)

(It is not clear to me what the advantage of the contiguous buffer is. You end up using freeRegions_, which is complicated and also has the danger of becoming fragmented (this could be seen as reinventing a simplistic memory allocator).

Empirical testing indicated that the current mechanism of memory-mapping a large buffer had the best performance.

True about the fragmentation, though I don't think it is likely to happen in practice (I can provide more detail if needed). In any case, this is an internal implementation detail that can be changed in a future PR without breaking backwards compatibility.

Also, the single contiguous buffer strategy will never work on 32-bit systems, although we maybe don't care about that.)

In my opinion, 32 bit support should be retired

As for the Java API for things that need to be explicitly "released" by user code, it should conform to the AutoCloseable interface so that it can be used with the try-with-resources statement (rough equivalent to Python's with statement).

(Ideally we also automatically release the shared_ptr when the Java object gets garbage collected, which would require java.lang.ref.Cleaner, which would require us to update to Java 9+ first, but that we can do. (Please don't use finalize().) But this could be added later, I think. Having this means that people won't need to restart the program after running incorrect code that fails to release the buffer slots, but it should not be necessary for correct code.)

I went back and forth on how forgiving the design should be about forgetting to call release(), but when it comes down to it, that really is the only correct way to use this. I think a lot of higher-level code may pass images around, so putting it in a single try block may not always be feasible. But I can certainly add support for AutoCloseable.

I would say it's better not to upgrade to Java 9 first, in case that brings other unforeseen issues.

If possible, it might be productive to split this PR into two: one that cleans up the metadata handling without introducing the V2 buffer, and one that purely introduces the V2 buffer. That would speed up reviewing the changes to the metadata handling (which I still need to take another look at -- I'm mostly happy with it but it's easier to make 100% sure there are no unknown changes in behavior than to later troubleshoot the existing (sometimes hacky) application code that might depend on exact behavior).

I understand the motivation for this, but unfortunately I think it would be very challenging. The metadata generation was tangled up in many other functions. I tried to do this very carefully to avoid unexpected changes in higher-level application code. It will at least be straightforward to implement fixes once identified, since it is all centralized now. We could consider including legacy metadata by default (even though it would give a performance hit) on the v1 buffer so that unexpected things don't break.

Update: I split out changes to the circularbuffer behavior into #588

@henrypinkard (Member, Author)

@marktsuchida I've made changes based on our discussion yesterday:

  • Clarified and added more checks for calling clear() on the new buffer (which throws if there are unreleased pointers) and forceReset(), which is the more dangerous option and is documented as such
  • Added the AutoCloseable interface to TaggedImagePointer

Also some further explanation on this:

(It is not clear to me what the advantage of the contiguous buffer is. You end up using freeRegions_, which is complicated and also has the danger of becoming fragmented (this could be seen as reinventing a simplistic memory allocator).

Note that the contiguous buffer is memory-mapped, so it's not the same as a regular contiguous allocation (which I tried first and was very slow).

The combination of a contiguous memory mapping and a slot-management system (free regions, etc.) is efficient because you never have to create new heap objects. The circular buffer is much slower to initialize (see graph above), especially for small image sizes, because it pre-allocates many frameBuffers to hold its images. I'm not sure how the new buffer could stay flexible to different image sizes yet avoid this pre-allocation penalty without the current strategy of allocating one big block.
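
To illustrate the kind of bookkeeping involved, here is a self-contained sketch of first-fit free-region allocation (hypothetical names and policy; the PR's freeRegions_ implementation may differ, and freeing/coalescing of regions is omitted):

#include <cstddef>
#include <map>
#include <stdexcept>

class FreeRegions
{
public:
   explicit FreeRegions(size_t totalBytes) { regions_[0] = totalBytes; }

   // Reserve numBytes out of the mapping; returns the slot's byte offset.
   // No per-image heap allocation is needed -- only this bookkeeping changes.
   size_t Allocate(size_t numBytes)
   {
      for (auto it = regions_.begin(); it != regions_.end(); ++it)
      {
         if (it->second >= numBytes)   // first fit
         {
            size_t offset = it->first;
            size_t leftover = it->second - numBytes;
            regions_.erase(it);
            if (leftover > 0)
               regions_[offset + numBytes] = leftover;  // shrink the region
            return offset;
         }
      }
      throw std::runtime_error("buffer full or too fragmented");
   }

private:
   std::map<size_t, size_t> regions_;  // start offset -> region size
};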
