Proposal for tweaks to IChatClient contract #5998
PR Overview
This PR refactors the chat client abstractions so that the IChatClient implementations now automatically augment the chat history with generated responses rather than requiring consumers to update the history manually. Key changes include updating the FunctionInvokingChatClient to streamline function call handling, removing the KeepFunctionCallingContent property, and reworking ChatResponse so that Message now returns the last message in its internal list.
Reviewed Changes
| File | Description |
|---|---|
| src/Libraries/Microsoft.Extensions.AI/ChatCompletion/FunctionInvokingChatClient.cs | Refactors function call processing and history update logic. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/ChatResponse.cs | Updates the ChatResponse API to work with mutable messages and to return the final message. |
| src/Libraries/Microsoft.Extensions.AI/ChatCompletion/CachingChatClient.cs | Adjusts cache read logic to integrate history augmentation. |
| src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIChatClient.cs | Ensures responses from OpenAI are added into chat history. |
| Others | Various supporting files and documentation adjustments to reflect the new design. |
Copilot reviewed 36 out of 36 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (3)
src/Libraries/Microsoft.Extensions.AI/ChatCompletion/CachingChatClient.cs:57
- The condition for checking a cached result appears inverted compared to the previous behavior. Verify that when a cached response exists it is correctly added to chatMessages and then returned, to avoid unintentionally bypassing the cache.
  `if (await ReadCacheAsync(cacheKey, cancellationToken).ConfigureAwait(false) is { } result)`

src/Libraries/Microsoft.Extensions.AI.Abstractions/ChatCompletion/ChatResponse.cs:83
- The getter for ChatResponse.Message now returns the last message instead of the first. This change may affect consumers relying on the previous behavior; ensure documentation and migration guidance are updated so that users are aware of the new semantics.
  `if (Messages.Count == 0) { Messages.Add(new ChatMessage(ChatRole.Assistant, [])); }`

src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIChatClient.cs:114
- After constructing the ChatResponse from the OpenAI call, the response message is added to chatMessages to automate history augmentation. Confirm that this behavior is consistent across scenarios (e.g. caching or stateful clients) and does not lead to duplicate entries in the conversation history.
  `chatMessages.Add(chatResponse.Message);`
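As an illustration of the new semantics called out above (a sketch, not code from the PR; the exact constructor shape is an assumption):

```csharp
// Message is now the *last* message in the response's list, not the first.
ChatResponse response = new(new ChatMessage(ChatRole.Assistant, "first"));
response.Messages.Add(new ChatMessage(ChatRole.Assistant, "final answer"));
Console.WriteLine(response.Message.Text); // prints "final answer"
```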
Approving for the "fundamentals" side.
The proposed changes sound reasonable. I don't see significant impact from the standpoint of MEAI.Evaluation, since the conversation history that is passed to the evaluation APIs is only read (i.e., it is embedded as part of the prompts that perform evaluation) and is not mutated.

The evaluation itself is also performed using single-shot prompts at the moment (at least for the built-in evaluators that ship as part of the MEAI.Evaluation.Quality assembly today). So mutating the conversation history for evaluation also shouldn't have any significant impact, and in more advanced future cases where multi-turn conversation may be needed for evaluation, the proposed changes should prove just as beneficial as they would for the general case.
Actually, thinking a bit more, this may have an impact on …

The assumption so far was that this would be convenient for callers since they would normally have the "history minus the response" (i.e., …). Should we get rid of the …?

We also have a couple of convenience extension methods for cases where the caller wants to supply a question/answer pair, or just the model response alone (without any conversation history). Those seem fine and should not need any changes.
Thanks for pushing this forward, @stephentoub! I think this is a big improvement to the clarity of the model. It feels like this is explainable and neatly creates equivalences across streaming/non-streaming and stateless/assistant modes. The two smaller design points I'd like to resolve are: …
Overall I appreciate the changes; removing Choices clarifies the design and simplifies most of the scenarios. I still think it would be interesting to add break-glass examples showing how the 1% would handle multiple choices.
@SteveSandersonMS, @RogerBarreto, please take another look. @shyamnamboodiripad, please confirm the eval changes are appropriate. Thanks.
LGTM with respect to M.E.AI.Eval
One more peek @SteveSandersonMS before this merges?
@stephentoub Just wanted to make sure you saw this comment that I had posted above. This could be something that we change after this PR is merged; however, if we are planning to include this PR in the snap for the next release of packages, I think this may be something that we should resolve before release. cc @peterwald
Sorry, @shyamnamboodiripad, I missed that.

Either that, or we could do what …

Regarding the last message: is that always the right answer for any evaluator? Would there not be some evaluator that would want to understand the full exchange, e.g. to evaluate what functions were called on the way to producing a final answer? Maybe it should just take the ChatResponse, assuming that's what's being evaluated, and it's up to the evaluator what to evaluate?
Latest iteration looks great - thanks!
@SteveSandersonMS, if you're good with this, we can merge it.
Excellent! The additional clarity with this is really nice.
I added a couple of small questions, but neither is intended to block you from merging here.
This has a bunch of benefits:
- It reduces what a caller needs to do. They don't need to add the response content to the history, because that's handled for them by the client they're using. This is particularly helpful for streaming responses.
- It keeps stateless usage (where the caller provides the full list of messages) and stateful usage (where the caller provides a thread id and then only additional messages) more similar, as in the stateful case the service is already tracking all the content and the caller shouldn't be adding anything to history.
- It fixes the ordering of messages in automatic function invocation, where it can now fully manage the history list, even when streaming, because the caller is not responsible for adding the streaming content into the history, and thus there's no concern about ordering between what the consumer adds (nothing) and what the implementation adds.
- It ensures in automatic function invocation that all content is sent back to the inner client, because it's the inner client that added the content in the first place.
- It enables all content to be yielded from the function-invoking client, including content it creates that has a different role (tool) from the other streaming content (assistant), which then enables consumers to use that knowledge for things like keeping UIs up to date (see the sketch below).
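As a rough illustration of that last point (a sketch, not code from this PR; it assumes an `IChatClient` named `client` configured with function invocation, a `history` list, and the `GetStreamingResponseAsync` method):

```csharp
// With clients augmenting the history themselves, the function-invoking
// client can yield every update it sees or creates, including tool-role
// content, letting a UI surface function-call progress as it happens.
await foreach (var update in client.GetStreamingResponseAsync(history))
{
    if (update.Role == ChatRole.Tool)
    {
        Console.WriteLine($"[tool] {update}"); // e.g. show function results in the UI
    }
    else
    {
        Console.Write(update); // assistant text streams as usual
    }
}
```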
Multiple messages could be generated in a single turn as part of server-side tool use, as part of agent scenarios, as part of automatic function calling, etc. This content is now added into the history, but in some cases the caller doesn't have a history object (e.g. they just provided a string prompt and a temporary history was created), in some stateful cases the history may not be updated because it's all being tracked by the service, in some cases middleware could have manipulated the history in a way that makes it challenging to know what the additional messages are, etc. Further, this makes the non-streaming and streaming cases more alike, as in the streaming case all chat response updates from all messages are being "returned".
The .NET-side code for the `ScenarioRunResult` was recently changed (#5998) to include `ChatResponse` (which can contain multiple `ChatMessage`s) in place of a single `ChatMessage`. Unfortunately, we missed updating the TypeScript reporting code to account for this. This change fixes the problem by updating the deserialization code in TypeScript to match what the .NET code serializes. (#6061)
As we march towards marking Microsoft.Extensions.AI.Abstractions stable in the next few months, I've been re-evaluating a few fundamental aspects of the design. This is based on usage, feedback, and the code developers write to use the abstractions, as well as on some open issues that highlight design limitations. While we're trying to lock things down and avoid breaking changes, this PR proposes some tweaks to hopefully address all of the above and set the abstractions up for even more success.
The core of the change is that, right now, it's the developer's responsibility to update the chat history with the response message. For example, a typical getting-started chat bot loop looks like:
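The original snippet isn't preserved in this capture; the following is a minimal sketch of such a loop, assuming an `IChatClient` named `client` and the `GetResponseAsync`/`ChatResponse.Message` APIs discussed in this thread:

```csharp
List<ChatMessage> history = [];
while (true)
{
    Console.Write("Q: ");
    history.Add(new ChatMessage(ChatRole.User, Console.ReadLine()));

    var response = await client.GetResponseAsync(history);
    Console.WriteLine(response.Text);

    // Today, the caller must copy the response back into the history itself.
    history.Add(response.Message);
}
```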
The developer here has to transfer the response message back into the history list. (At least when dealing with a stateless service. When dealing with a stateful service, that's not necessary or desirable.) The same challenge exists for streaming, where it's even harder. We've provided the ToChatResponse helper to try to make this easier, but the developer still needs to track all updates in order to combine them back into the history:
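Again the original snippet isn't preserved; here's a sketch of the streaming version, using the `ToChatResponse` helper mentioned above (same assumptions as before):

```csharp
List<ChatResponseUpdate> updates = [];
await foreach (var update in client.GetStreamingResponseAsync(history))
{
    Console.Write(update);
    updates.Add(update); // every update has to be tracked...
}

// ...so the updates can be combined and manually added back into the history.
history.Add(updates.ToChatResponse().Message);
```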
We expect orders of magnitude more consumers of these abstractions than implementers of these abstractions, so it'd be nice if this extra effort could be pushed into the implementations. If it were the responsibility of the IChatClient to augment the history, the developer wouldn't need to care about it, e.g.
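A sketch of the proposed consumption pattern (same assumptions as the earlier snippets):

```csharp
List<ChatMessage> history = [];
while (true)
{
    Console.Write("Q: ");
    history.Add(new ChatMessage(ChatRole.User, Console.ReadLine()));

    // The client augments `history` with the response message(s) itself.
    Console.WriteLine(await client.GetResponseAsync(history));
}
```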
and
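for streaming (again a sketch rather than the PR's original snippet):

```csharp
await foreach (var update in client.GetStreamingResponseAsync(history))
{
    Console.Write(update); // no manual history update needed afterwards
}
```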
The history is already declared to be mutable, but we're currently in a weird state where only some of the messages are automatically added into the history. Consider FunctionInvokingChatClient. It adds to the history all responses and generated messages incurred as part of its back and forth with the underlying IChatClient, except for the last message, which it returns. But that's not the case for streaming, where it necessarily needs to stream back all response content (if it were to instead buffer the streaming content until it knew whether there was a function call, it would no longer be streaming). In such a case, however, the streaming response might span multiple actual response messages, which means any resulting individual message that's created lacks fidelity with the original messages. It also means the final history likely includes out-of-order messages, because the FunctionInvokingChatClient is updating the history with the function-calling content before the caller has the full streaming results and can update the history with them. That then causes possible problems for subsequent use where a service gets a history containing out-of-order messages; some services are strict about message ordering.
Local function invocation is not the only case where there might be multiple response messages. Service-side use of tools like code interpretation and web search can also logically result in multiple response messages, as can agent-based scenarios where a single input message triggers a conversation to produce a single output.
To address all of this and more, I'm proposing the following:
… the `List<ChatResponseUpdate>` as I did in my earlier code snippet, and again adds to the history the result of converting those into a `ChatMessage` (via the `ToChatMessage` helper). For most middleware that's not producing its own messages, no changes are needed. For middleware like FunctionInvokingChatClient, it no longer needs to augment the history with the (streaming) response messages it gets back from its inner client, because it's that inner client's responsibility to do so; the only thing the FunctionInvokingChatClient needs to add is the function result content to the history, which it's already doing (a sketch of this middleware pattern follows below).

As with other breaking changes we've made in preview, this will entail reaction in both implementations of the abstraction and consumption of it. On the consumption side, the largest impact of not reacting is that there might be some duplicate messages in the chat history, which is generally not problematic. But we'll still want to help consumers move to the new, simpler consumption patterns.
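To make the middleware point concrete, here's a hypothetical delegating client sketched against the signatures used in this discussion (the `LoggingChatClient` name is invented, and the exact method signature and the `ToChatMessage` helper are assumptions based on the text above):

```csharp
using System.Runtime.CompilerServices;

// Hypothetical middleware that wants the complete streamed message, e.g. for
// logging: it collects the updates it forwards and converts them into a
// ChatMessage via the ToChatMessage helper. It does not touch the history;
// under this proposal, augmenting the history is the inner client's job.
public sealed class LoggingChatClient(IChatClient innerClient) : DelegatingChatClient(innerClient)
{
    public override async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        List<ChatResponseUpdate> updates = [];
        await foreach (var update in base.GetStreamingResponseAsync(chatMessages, options, cancellationToken))
        {
            updates.Add(update);
            yield return update; // pass every update through unchanged
        }

        Console.WriteLine($"Streamed message: {updates.ToChatMessage().Text}");
    }
}
```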
This also implicitly addresses #5909 and #5675.
cc: @SteveSandersonMS, @eiriktsarpalis, @luisquintanilla, @matthewbolanos, @RogerBarreto, @davidfowl, @DamianEdwards, @peterwald, @shyamnamboodiripad, @MackinnonBuck, @halter73, @colombod, @tghamm