Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address possible mis-behavior due to pointer-to-exchange members being null when they did not use to be #21632

Closed
bzbarsky-apple opened this issue Aug 4, 2022 · 9 comments · Fixed by #21846

Comments

@bzbarsky-apple
Copy link
Contributor

bzbarsky-apple commented Aug 4, 2022

Problem

After #21081 and #20237 the semantics of the exchange context member of various interaction model objects changed in a subtle but important way.

Before that change, the member could only become null in the following cases:

  • We tried to send a message, it failed, we close the exchange and null it out.
  • We tried to send the last message in the transaction, it succeeded, the exchange self-closed and we nulled it out.
  • We were waiting for a response, it timed out (possibly due to the session going away), the exchange self-closed and we nulled it out.
  • We decided we were done (e.g. due to error of some sort) and closed the exchange.

Importantly: while we are processing an incoming message, the member could not become null until we tried to send a response. That means any logic that used the member could rely on it being there while processing an incoming message.

After #21081 and #20237 that changes: now if the session goes away (as one possible trigger for the behavior) the member becomes null immediately. This means that any methods/getters that use the exchange context can start mis-behaving while we are trying to process an incoming message, including things like not being able to determine the accessing fabric (or it changing in mid-processing!). At least the following things might be affected:

  • CommandHandler::ProcessGroupCommandDataIB
  • CommandHandler::GetAccessingFabricIndex
  • CommandHandler::GetSubjectDescriptor
  • Various places in WriteHandler where I think depending on how the stack reacts to a write the mExchangeCtx could end up null where it couldn't have before.

Proposed Solution

We need to think a little bit about whether this is a problem in practice or not. Maybe something guarantees that the session can't be evicted while we are processing an incoming message on it, but I don't see obvious such guarantees.

Maybe an attempt to close an exchange while it's processing an incoming message (including due to session eviction) should simply set a flag (which would prevent any more sending of non-ack messages; need to think about what happens here if the closure is due to session going away) and do the actual closing + notification of the delegate once we unwind the stack? But note that some of the state we are interested in from exchanges actually comes from the session, so it might have mis-behaved before anyway if the session went away but the exchange remained non-null...

Another option would be for the IM objects to snapshot anything they need from the exchange except message-sending services before they start calling out to consumers for an incoming message. But that would require those objects to get bigger, which might be an issue.

Any other bright ideas @mrjerryjohns?

@mrjerryjohns
Copy link
Contributor

#21586 is a manifestation of this seemingly? (albeit, not quite, since it isn't happening in the synchronous call-path of a message processing call chain...)

@mrjerryjohns
Copy link
Contributor

But note that some of the state we are interested in from exchanges actually comes from the session, so it might have mis-behaved before anyway if the session went away but the exchange remained non-null...

Yeah - specifically, GetSessionHandle would have aborted, as part of getting the subject descriptor for example...(which is called in CommandHandler::ProcessCommandDataIB as part of this bit of logic:

    mpCallback->DispatchCommand(*this, concretePath, commandDataReader);
    MatterPostCommandReceivedCallback(concretePath, GetSubjectDescriptor());

@mrjerryjohns
Copy link
Contributor

To clarify, in the pre-exchange-holder world, there was an expectation that exchange delegates held on to their exchange for the duration of the OnMessageReceived call, even though, their exchange closed on them mid-way through that. Importantly, the expectation was that they could still interact with the exchange to attempt to do things, and expect that to fail.

@mrjerryjohns
Copy link
Contributor

Is that the expectation we want going forward?

@mrjerryjohns
Copy link
Contributor

that they could still interact with the exchange to attempt to do things, and expect that to fail.

Exploring this a bit, what about implementors of OnExchangeClosing? Was there ever an expectation that that could get called mid-way through an OnMessageReceived processing flow?

Looking at some existing examples, TimedHandler::OnExchangeClosing seemingly destroys itself, which would cause big problems since it's OnMessageReceived method expects the object to still be around for the entirety of the call...

@mrjerryjohns
Copy link
Contributor

Same goes for DefaultOTARequestor - it sets its internal mExchangeCtx to nullptr, which would cause problems when it accessed it in OnMessageReceived at the end of that method...

@mrjerryjohns
Copy link
Contributor

mrjerryjohns commented Aug 6, 2022

Maybe an attempt to close an exchange while it's processing an incoming message (including due to session eviction) should simply set a flag (which would prevent any more sending of non-ack messages; need to think about what happens here if the closure is due to session going away) and do the actual closing + notification of the delegate once we unwind the stack?

Yes, I think this is the most sensible option. Anything else just feels like it's yanking the rug from under-neath the callee, for the above stated examples.

@mrjerryjohns
Copy link
Contributor

So, I think there are two details worth noting here:

  1. The problem of exchanges closing mid-way through the message processing call-flow, and whether we should deal with that. This is a pre-existing problem in the case of that triggered by session closure, albeit, made worse by the ExchangeHolder model. However, our existing logic is not built to deal with the session-closure-triggered bit.
  2. The pattern of ExchangeHolder clearing its internal reference to the exchange context when OnExchangeClosing called, which can be unexpected for consumers who rely on it to be stable through a message processing call-flow. This is fixable I think in ExchangeHolder itself - it can avoid clearing its internal mpExchangeCtx in OnExchangeClosing if it's able to tell that the exchange is still live due to the ref being still held by the consumer (and to later in its destructor, call Release).

@bzbarsky-apple
Copy link
Contributor Author

bzbarsky-apple commented Aug 8, 2022

#21586 is a manifestation of this seemingly?

Yes, in terms of the exchange pointer becoming null.... But the session could have gone away in the past too.

Is that the expectation we want going forward?

That is the big question. I'm not sure what the best options are here, honestly, but I think we need to make the expectations, whatever they are, clear and then make sure that all the consumer code follows the expectations.

what about implementors of OnExchangeClosing? Was there ever an expectation that that could get called mid-way through an OnMessageReceived processing flow?

There weren't really any, so it was not very defined....

But yes, OnExchangeClosing could get called during OnMessageReceived if the message recipient called Close/Abort, or if the received message evicted the exchange's session.

TimedHandler::OnExchangeClosing seemingly destroys itself

Yes.

which would cause big problems since it's OnMessageReceived method expects the object to still be around for the entirety of the call.

Does it? The only places under its OnMessageReceived where it can close the exchange are the error cases when we StatusResponse::Send and none of those use the object after making the Send call, right?

Same goes for DefaultOTARequestor - it sets its internal mExchangeCtx to nullptr, which would cause problems when it accessed it in OnMessageReceived at the end of that method...

Where does it touch mExchangeCtx in OnMessageReceived? It touches ec, which is guaranteed to be alive (though possibly closed).

Yes, I think this is the most sensible option. Anything else just feels like it's yanking the rug from under-neath the callee.

I agree that this seems like the simplest thing for consumers to deal with. We just need to make sure we set the right flags.... and we still have a problem if someone tries to touch state that would reach into the session. That's assuming that a session can get evicted during processing of a message on one of its exchanges, which I'm not sure it can right now, though we have nothing specific preventing it. It just happens that nothing on those codepaths evicts that specific session....

So, I think there are two details worth noting here:

That seems like a good summary. I am less clear on how the proposed solution inside ExchangeHolder itself for item 2 works....

mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this issue Aug 12, 2022
This makes some key fixes to the management of the ExchangeContext,
specifically around how the WillSendMessage flag is set/cleared, as well
as dealing with clean-up of the exchange in OnSessionReleased(). It also
fixes some key bugs in ExchangeHolder around failure to release the ref
on the exchange in certain scenarios as outlined in project-chip#21544 and project-chip#21632.

Notably, the following fixes have been made to ExchangeContext:
- Setting the WillSendMessage flag immediately in SendMessage and only
  clearing the WillSendMessage flag on actually having successfully
  sent a message in SendMessage. Otherwise, the holder is not able to
  infer that the ownership still resides with it on a SendMessage
  failure.

- Not clearing the WillSendMessage flag in OnSessionReleased. This
  ensures the holder again can infer that it still has the ref and needs
  to give it up when OnExchangeClosing is called there-after.

- Avoiding a double call of DoClose by checking if it is already closed.

- Cleaning up OnSessionReleased logic to now correctly call Abort() in
  both the case where we're waiting for a response AND we're in the
  middle of received message dispatch (i.e we're in the middle of
  OnMessageReceived). This is a pre-existing bug where the closure of
  the exchange resulted in a leak due to MessageHandled() not quite
  cleaning up the EC later when the call unwinds.

Tests:

TestExchangeHolder has been buffed up to now exhaustively test
permutations of send failure, session closure (both before and after
message transmission) and calling WillSendMessage at all points along a
3 message transmission.
mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this issue Aug 12, 2022
This makes some key fixes to the management of the ExchangeContext,
specifically around how the WillSendMessage flag is set/cleared, as well
as dealing with clean-up of the exchange in OnSessionReleased(). It also
fixes some key bugs in ExchangeHolder around failure to release the ref
on the exchange in certain scenarios as outlined in project-chip#21544 and project-chip#21632.

Notably, the following fixes have been made to ExchangeContext:
- Setting the WillSendMessage flag immediately in SendMessage and only
  clearing the WillSendMessage flag on actually having successfully
  sent a message in SendMessage. Otherwise, the holder is not able to
  infer that the ownership still resides with it on a SendMessage
  failure.

- Not clearing the WillSendMessage flag in OnSessionReleased. This
  ensures the holder again can infer that it still has the ref and needs
  to give it up when OnExchangeClosing is called there-after.

- Avoiding a double call of DoClose by checking if it is already closed.

- Cleaning up OnSessionReleased logic to now correctly call Abort() in
  both the case where we're waiting for a response AND we're in the
  middle of received message dispatch (i.e we're in the middle of
  OnMessageReceived). This is a pre-existing bug where the closure of
  the exchange resulted in a leak due to MessageHandled() not quite
  cleaning up the EC later when the call unwinds.

Tests:

TestExchangeHolder has been buffed up to now exhaustively test
permutations of send failure, session closure (both before and after
message transmission) and calling WillSendMessage at all points along a
3 message transmission.
mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this issue Aug 12, 2022
This makes some key fixes to the management of the ExchangeContext,
specifically around how the WillSendMessage flag is set/cleared, as well
as dealing with clean-up of the exchange in OnSessionReleased(). It also
fixes some key bugs in ExchangeHolder around failure to release the ref
on the exchange in certain scenarios as outlined in project-chip#21544 and project-chip#21632.

Notably, the following fixes have been made to ExchangeContext:
- Setting the WillSendMessage flag immediately in SendMessage and only
  clearing the WillSendMessage flag on actually having successfully
  sent a message in SendMessage. Otherwise, the holder is not able to
  infer that the ownership still resides with it on a SendMessage
  failure.

- Not clearing the WillSendMessage flag in OnSessionReleased. This
  ensures the holder again can infer that it still has the ref and needs
  to give it up when OnExchangeClosing is called there-after.

- Avoiding a double call of DoClose by checking if it is already closed.

- Cleaning up OnSessionReleased logic to now correctly call Abort() in
  both the case where we're waiting for a response AND we're in the
  middle of received message dispatch (i.e we're in the middle of
  OnMessageReceived). This is a pre-existing bug where the closure of
  the exchange resulted in a leak due to MessageHandled() not quite
  cleaning up the EC later when the call unwinds.

Tests:

TestExchangeHolder has been buffed up to now exhaustively test
permutations of send failure, session closure (both before and after
message transmission) and calling WillSendMessage at all points along a
3 message transmission.
mrjerryjohns added a commit to mrjerryjohns/connectedhomeip that referenced this issue Aug 12, 2022
This makes some key fixes to the management of the ExchangeContext,
specifically around how the WillSendMessage flag is set/cleared, as well
as dealing with clean-up of the exchange in OnSessionReleased(). It also
fixes some key bugs in ExchangeHolder around failure to release the ref
on the exchange in certain scenarios as outlined in project-chip#21544 and project-chip#21632.

Notably, the following fixes have been made to ExchangeContext:
- Setting the WillSendMessage flag immediately in SendMessage and only
  clearing the WillSendMessage flag on actually having successfully
  sent a message in SendMessage. Otherwise, the holder is not able to
  infer that the ownership still resides with it on a SendMessage
  failure.

- Not clearing the WillSendMessage flag in OnSessionReleased. This
  ensures the holder again can infer that it still has the ref and needs
  to give it up when OnExchangeClosing is called there-after.

- Avoiding a double call of DoClose by checking if it is already closed.

- Cleaning up OnSessionReleased logic to now correctly call Abort() in
  both the case where we're waiting for a response AND we're in the
  middle of received message dispatch (i.e we're in the middle of
  OnMessageReceived). This is a pre-existing bug where the closure of
  the exchange resulted in a leak due to MessageHandled() not quite
  cleaning up the EC later when the call unwinds.

Tests:

TestExchangeHolder has been buffed up to now exhaustively test
permutations of send failure, session closure (both before and after
message transmission) and calling WillSendMessage at all points along a
3 message transmission.
@mrjerryjohns mrjerryjohns linked a pull request Aug 12, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants