
Proposal for tweaks to IChatClient contract #5998

Merged: 9 commits merged into dotnet:main from stephentoub:ichatclientchanges on Mar 6, 2025

Conversation

@stephentoub (Member) commented Feb 28, 2025

  • Reverts to an immutable model for chat history
  • Updates ChatResponse to contain multiple response messages
  • Removes Choices and ChoiceIndex

OLD:

As we march towards marking Microsoft.Extensions.AI.Abstractions stable in the next few months, I've been re-evaluating a few fundamental aspects of the design. This is based on usage, feedback, the code developers are writing against the abstractions, and some open issues that highlight design limitations. While we're trying to lock things down and avoid breaking changes, this PR proposes some tweaks to hopefully address all of the above and set the abstractions up for even more success.

The core of the change is that, right now, it's the developer's responsibility to update the chat history with the response message. For example, a typical getting-started chat bot loop looks like:

List<ChatMessage> history = [];
while (true)
{
    Console.Write("Q: ");
    history.Add(new ChatMessage(ChatRole.User, Console.ReadLine()));

    var response = await client.GetResponseAsync(history);
    Console.WriteLine(response);

    history.Add(response.Message);
}

The developer here has to transfer the response message back into the history list. (At least when dealing with a stateless service. When dealing with a stateful service, that's not necessary or desirable.) The same challenge exists for streaming, where it's even harder. We've provided the ToChatResponse helper to try to make this easier, but the developer still needs to track all updates in order to combine them back into the history:

List<ChatMessage> history = [];
List<ChatResponseUpdate> updates = [];
while (true)
{
    Console.Write("Q: ");
    history.Add(new ChatMessage(ChatRole.User, Console.ReadLine()));

    updates.Clear();
    await foreach (var update in client.GetStreamingResponseAsync(history))
    {
        updates.Add(update);
        Console.Write(update);
    }
    Console.WriteLine();

    history.Add(updates.ToChatResponse().Message);
}

We expect orders of magnitude more consumers of these abstractions than implementers of these abstractions, so it'd be nice if this extra effort could be pushed into the implementations. If it were the responsibility of the IChatClient to augment the history, the developer wouldn't need to care about it, e.g.

List<ChatMessage> history = [];
while (true)
{
    Console.Write("Q: ");
    history.Add(new ChatMessage(ChatRole.User, Console.ReadLine()));

    Console.WriteLine(await client.GetResponseAsync(history));
}

and

List<ChatMessage> history = [];
while (true)
{
    Console.Write("Q: ");
    history.Add(new ChatMessage(ChatRole.User, Console.ReadLine()));

    await foreach (var update in client.GetStreamingResponseAsync(history))
    {
        Console.Write(update);
    }
    Console.WriteLine();
}

The history is already declared to be mutable, but we're currently in a weird state where only some of the messages are automatically added into the history. Consider FunctionInvokingChatClient. It adds to the history all responses and generated messages incurred as part of its back and forth with the underlying IChatClient, except for the last message, which it returns. But that's not the case for streaming, where it necessarily needs to stream back all response content (if it were to instead buffer the streaming content until it knew whether there was a function call, it would no longer be streaming). In such a case, the streaming response might span multiple actual response messages, which means any resulting individual message that's created lacks fidelity with the original messages. It also means the final history likely includes out-of-order messages, because the FunctionInvokingChatClient is updating the history with the function-calling content before the caller has the full streaming results and can update the history with them. That in turn causes possible problems for subsequent use, where a service receives a history containing out-of-order messages; some services are strict about message ordering and can reject or mishandle such histories.
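To make the ordering hazard concrete, here's a sketch of the old streaming pattern with a function-invoking client (names are illustrative, following the snippets above):

// OLD contract, streaming with automatic function invocation (illustrative;
// functionInvokingClient is a FunctionInvokingChatClient wrapping an inner client).
List<ChatMessage> history = [new(ChatRole.User, "What's the weather in Paris?")];

List<ChatResponseUpdate> updates = [];
await foreach (var update in functionInvokingClient.GetStreamingResponseAsync(history))
{
    updates.Add(update); // must be yielded immediately; buffering would defeat streaming
}

// During the loop, FunctionInvokingChatClient already appended its function-call
// and function-result messages to `history`. Only now can the caller append the
// combined assistant output:
history.Add(updates.ToChatResponse().Message);

// Net effect: content that logically preceded or interleaved with the function
// calls lands after them in the history, and a response that spanned multiple
// underlying messages has been collapsed into a single ChatMessage.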

Local function invocation is not the only case where there might be multiple response messages. Service-side use of tools like code interpretation and web search can also logically result in multiple response messages, as can agent-based scenarios where a single input message triggers a conversation to produce a single output.

To address all of this and more, I'm proposing the following:

  • An IChatClient.Get{Streaming}ResponseAsync implementation has the responsibility to add any messages it generates into the chat history. For typical leaf clients, that means GetResponseAsync just adds to the history any message it's also going to return, and GetStreamingResponseAsync maintains a List<ChatResponseUpdate> as in my earlier code snippet, again adding to the history the result of converting those updates into a ChatMessage (via the ToChatMessage helper); a sketch of such a leaf client follows this list. For most middleware that's not producing its own messages, no changes are needed. For middleware like FunctionInvokingChatClient, it no longer needs to augment the history with the (streaming) response messages it gets back from its inner client, because it's that inner client's responsibility to do so; the only thing the FunctionInvokingChatClient needs to do is add the function result content to the history, which it's already doing.
  • ChatResponse.Choices is removed. Of the many IChatClient implementations that now exist, literally zero support multiple choices in a response. The industry appears to have moved on from the concept, and it's not something we need to represent in the abstraction; having it just adds complication (and makes the rest of this proposal challenging, because the implementations would need to choose just one choice as the result). The main argument for services supporting choices over submitting the same request multiple times was as a cost-cutting measure for re-processing the same input multiple times, but that's largely been addressed via services supporting caching, making it all the more unlikely this will re-emerge as a need. But if it ever does, it can be added back in a niche manner, e.g. just as today the only way to ask a service to provide multiple responses is via AdditionalProperties, AdditionalProperties on the response could also be used to bring back alternatives, at least for the non-streaming case.
  • Add the ability for ChatResponse to carry multiple messages, not just one. GetResponseAsync implementations then return all messages generated as part of the request rather than just the last. This is useful in cases where the implementation isn't maintaining a history at all (e.g. it's stateful in the service), where the caller isn't maintaining a history (e.g. they're just passing in a string prompt that results in a temporary history list being created), or where middleware mucks with the history in a way that makes it harder for the consumer to know what's new... but even without that, providing back all of the messages just eases the burden on the consumer writing code. ChatResponse.Message still behaves the same, returning the last message from the stored list of messages.
  • Get rid of FunctionInvokingChatClient.KeepFunctionCallingMessages. It defaulted to true (another indication of this being the right direction), and it's now just how the whole system works, anyway.
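Here is the minimal leaf-client sketch referenced in the first bullet. SampleChatClient and its transport stubs are hypothetical, the signatures assume the current preview shape of IChatClient (IList<ChatMessage> plus optional ChatOptions), and it uses ToChatResponse(...).Message for the update-to-message conversion:

// Assumes .NET 8+ implicit usings.
using System.Runtime.CompilerServices;
using Microsoft.Extensions.AI;

public sealed class SampleChatClient : IChatClient
{
    public async Task<ChatResponse> GetResponseAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        ChatMessage reply = await CallServiceAsync(chatMessages, options, cancellationToken);
        chatMessages.Add(reply); // proposed contract: the client augments the history itself
        return new ChatResponse(reply);
    }

    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        List<ChatResponseUpdate> updates = [];
        await foreach (var update in StreamServiceAsync(chatMessages, options, cancellationToken))
        {
            updates.Add(update); // track each update while still streaming it to the caller
            yield return update;
        }

        // Once the stream completes, fold the updates into a single message and
        // record it in the history on the caller's behalf.
        chatMessages.Add(updates.ToChatResponse().Message);
    }

    public object? GetService(Type serviceType, object? serviceKey = null) =>
        serviceType.IsInstanceOfType(this) ? this : null;

    public void Dispose() { }

    // Hypothetical stubs standing in for real service transport.
    private Task<ChatMessage> CallServiceAsync(
        IList<ChatMessage> messages, ChatOptions? options, CancellationToken ct) =>
        throw new NotImplementedException();

    private IAsyncEnumerable<ChatResponseUpdate> StreamServiceAsync(
        IList<ChatMessage> messages, ChatOptions? options, CancellationToken ct) =>
        throw new NotImplementedException();
}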

As with other breaking changes we've made in preview, this will entail reaction in both implementations of the abstraction and consumption of it. On the consumption side, the largest impact to not reacting is that there might be some duplicate messages in the chat history, which is generally not problematic. But we'll still want to help consumers move to the new, simpler consumption patterns.
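For example, under the proposed shape a consumer can observe every message generated for a request, not just the last. A sketch, assuming the string-prompt convenience overload and the new Messages property:

// All messages produced for this request (assistant text, function calls and
// results, server-side tool output, etc.) come back on the response itself.
// The string-prompt overload creates a temporary history internally.
ChatResponse response = await client.GetResponseAsync("What's the weather in Paris?");
foreach (ChatMessage message in response.Messages)
{
    Console.WriteLine($"{message.Role}: {message.Text}");
}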

This also implicitly addresses #5909 and #5675.

cc: @SteveSandersonMS, @eiriktsarpalis, @luisquintanilla, @matthewbolanos, @RogerBarreto, @davidfowl, @DamianEdwards, @peterwald, @shyamnamboodiripad, @MackinnonBuck, @halter73, @colombod, @tghamm



PR Overview

This PR refactors the chat client abstractions so that the IChatClient implementations now automatically augment the chat history with generated responses rather than requiring consumers to update the history manually. Key changes include updating the FunctionInvokingChatClient to streamline function call handling, removing the KeepFunctionCallingContent property, and reworking ChatResponse so that Message now returns the last message in its internal list.

Reviewed Changes

src/Libraries/Microsoft.Extensions.AI/ChatCompletion/FunctionInvokingChatClient.cs: Refactors function call processing and history update logic.
src/Libraries/Microsoft.Extensions.AI.Abstractions/ChatResponse.cs: Updates the ChatResponse API to work with mutable messages and to return the final message.
src/Libraries/Microsoft.Extensions.AI/ChatCompletion/CachingChatClient.cs: Adjusts cache read logic to integrate history augmentation.
src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIChatClient.cs: Ensures responses from OpenAI are added into the chat history.
Others: Various supporting files and documentation adjustments to reflect the new design.

Copilot reviewed 36 out of 36 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (3)

src/Libraries/Microsoft.Extensions.AI/ChatCompletion/CachingChatClient.cs:57

  • The condition for checking a cached result appears inverted compared to previous behavior. Verify that when a cached response exists it is correctly added to chatMessages and then returned, to avoid unintentionally bypassing the cache.
if (await ReadCacheAsync(cacheKey, cancellationToken).ConfigureAwait(false) is { } result)

src/Libraries/Microsoft.Extensions.AI.Abstractions/ChatCompletion/ChatResponse.cs:83

  • The getter for ChatResponse.Message now returns the last message instead of the first. This change may affect consumers relying on the previous behavior; ensure documentation and migration guidance are updated so that users are aware of the new semantics.
if (Messages.Count == 0) { Messages.Add(new ChatMessage(ChatRole.Assistant, [])); }

src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIChatClient.cs:114

  • After constructing the ChatResponse from the OpenAI call, the response message is added to chatMessages to automate history augmentation. Confirm that this behavior is consistent across scenarios (e.g. caching or stateful clients) and does not lead to duplicate entries in the conversation history.
chatMessages.Add(chatResponse.Message);


@stephentoub stephentoub marked this pull request as ready for review March 2, 2025 14:28
@stephentoub stephentoub requested review from a team as code owners March 2, 2025 14:28
@RussKie (Member) left a comment


Approving for the "fundamentals" side.

@shyamnamboodiripad (Contributor) left a comment


The proposed changes sound reasonable. I don't see significant impact from the standpoint of MEAI.Evaluation, since the conversation history that is passed to the evaluation APIs is only read (i.e., it is embedded as part of the prompts that perform evaluation) and is not mutated.

The evaluation itself is also performed using single-shot prompts at the moment (at least for the built-in evaluators that ship as part of the MEAI.Evaluation.Quality assembly today). So mutating the conversation history for evaluation shouldn't have any significant impact either; and in more advanced future cases where multi-turn conversation may be needed for evaluation, the proposed changes should prove just as beneficial as they would in the general case.

@shyamnamboodiripad (Contributor)

> The proposed changes sound reasonable. I don't see significant impact from the standpoint of MEAI.Evaluation, since the conversation history that is passed to the evaluation APIs is only read (i.e., it is embedded as part of the prompts that perform evaluation) and is not mutated.

Actually, thinking a bit more, this may have an impact on IEvaluator.EvaluateAsync which has the following signature -

ValueTask<EvaluationResult> EvaluateAsync(
        IEnumerable<ChatMessage> messages,
        ChatMessage modelResponse,
        ChatConfiguration? chatConfiguration = null,
        IEnumerable<EvaluationContext>? additionalContext = null,
        CancellationToken cancellationToken = default);

The assumption so far was that this would be convenient for callers since they would normally have the 'history minus the response' (i.e., messages) and 'the response' (i.e., modelResponse) available separately. However, if the common case going forward will be one where the caller always has 'history including the response', then calling IEvaluator.EvaluateAsync above would become cumbersome.
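For illustration, a caller holding one combined history would have to re-split it to fit the current signature (a sketch; `history` and `chatConfiguration` are assumed to be in scope):

// `history` already contains the model's response as its last message, so the
// caller must peel it back off to satisfy the current EvaluateAsync shape:
ChatMessage modelResponse = history[^1];
EvaluationResult result = await evaluator.EvaluateAsync(
    history.SkipLast(1), modelResponse, chatConfiguration);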

Should we get rid of the modelResponse parameter and update the xml comments to clarify that the last message in messages will be treated as the modelResponse to be evaluated?

We also have a couple of convenience extension methods for cases where the caller wants to supply a question/answer pair, or just the model response alone (without any conversation history). Those seem fine and should not need any changes.

@SteveSandersonMS (Member)

Thanks for pushing this forwards, @stephentoub!

I think this is a big improvement to the clarity of the model. It feels like this is explainable and neatly creates equivalences across streaming/nonstreaming and stateless/assistant modes.

The two smaller design points I'd like to resolve are spacing and caching; more on both below.

@RogerBarreto left a comment


Overall I appreciate the changes; removing Choices cleans up the design and simplifies most of the scenarios. I still think it would be interesting to add break-glass examples showing how the 1% would handle multiple choices.

@stephentoub stephentoub requested a review from a team as a code owner March 4, 2025 05:38
@stephentoub (Member, Author)

@SteveSandersonMS, @RogerBarreto, please take another look.

@shyamnamboodiripad, please confirm the eval changes are appropriate.

Thanks.

@SteveSandersonMS (Member) left a comment


Great! This is a pretty big and important change - thanks for making it happen.

Two remaining points from me:

  • Spacing - happy to discuss, would be ideal to resolve within this PR
  • Caching - might be a separate matter for a different PR, but interested to know what you think

@peterwald (Member) left a comment


LGTM with respect to M.E.AI.Eval

@stephentoub (Member, Author)

One more peek @SteveSandersonMS before this merges?

@shyamnamboodiripad (Contributor)

@stephentoub Just wanted to make sure you saw this comment that I had posted above. This could be something that we change after this PR is merged - however, if we are planning to include this PR in the snap for the next release of packages, I think this may be something that we should resolve before release.

cc @peterwald

@stephentoub (Member, Author)

Sorry, @shyamnamboodiripad, I missed that.

> Should we get rid of the modelResponse parameter and update the xml comments to clarify that the last message in messages will be treated as the modelResponse to be evaluated?

Either that, or we could do what ChatResponse<T> does and use response.Text, which is the concatenation of all TextContent in the response's messages. But that probably doesn't make as much sense for arbitrary evaluators, which could reasonably want to evaluate content besides text (images, audio, etc.)

Regarding the last message, is that always the right answer for any evaluator? Would there not be some evaluator that would want to understand the full exchange, e.g. evaluate what functions were called on the way to producing a final answer?

Maybe it should just take the ChatResponse, assuming that's what's being evaluated, and it's up to the evaluator what to evaluate?
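That would give the method roughly this shape (a sketch of the suggestion, not a final API):

ValueTask<EvaluationResult> EvaluateAsync(
    IEnumerable<ChatMessage> messages,       // the preceding conversation
    ChatResponse modelResponse,              // the full response under evaluation
    ChatConfiguration? chatConfiguration = null,
    IEnumerable<EvaluationContext>? additionalContext = null,
    CancellationToken cancellationToken = default);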

@shyamnamboodiripad (Contributor) left a comment


Latest iteration looks great - thanks! :shipit:

@stephentoub (Member, Author)

@SteveSandersonMS, if you're good with this, we can merge it.

@SteveSandersonMS (Member) left a comment


Excellent! The additional clarity with this is really nice.

I added a couple of small questions but neither are intended to block you from merging here.

This has a bunch of benefits:
- It reduces what a caller needs to do. They don't need to add the response content to the history, because that's handled for them by the client they're using. This is particularly helpful for streaming responses.
- It keeps stateless usage (where the caller provides the full list of messages) and stateful usage (where the caller provides a thread id and then only additional messages) more similar, as in the stateful case the service is already tracking all the content and the caller shouldn't be adding anything to the history.
- It fixes the ordering of messages in automatic function invocation, which can now fully manage the history list, even when streaming, because the caller is not responsible for adding the streaming content into the history and thus there's no concern about ordering between what the consumer adds (nothing) and what the implementation adds.
- It ensures in automatic function invocation that all content is sent back to the inner client, because it's the inner client that added the content in the first place.
- It enables all content to be yielded from the function-invoking client, including content it creates that has a different role (tool) than the other streaming content (assistant), which enables consumers to use that knowledge for things like keeping UIs up to date.

Multiple messages can be generated in a single turn as part of server-side tool use, agent scenarios, automatic function calling, etc. This is now added into the history, but in some cases the caller doesn't have a history object (e.g. they just provided a string prompt and a temporary history was created); in some stateful cases the history may not be updated because it's all being tracked by the service; and in some cases middleware could have manipulated the history in a way that makes it challenging to know what the additional messages are. Further, this makes the non-streaming and streaming cases more alike, as in the streaming case all chat response updates from all messages are being "returned".
@stephentoub stephentoub enabled auto-merge (squash) March 6, 2025 15:15
@stephentoub stephentoub merged commit c221abe into dotnet:main Mar 6, 2025
6 checks passed
@stephentoub stephentoub deleted the ichatclientchanges branch March 6, 2025 15:57
@jeffhandley jeffhandley added the area-ai Microsoft.Extensions.AI libraries label Mar 7, 2025
shyamnamboodiripad added a commit to shyamnamboodiripad/extensions that referenced this pull request Mar 10, 2025

The .NET side code for the `ScenarioRunResult` was recently changed (dotnet#5998) to include `ChatResponse` (which can contain multiple `ChatMessage`s) in place of a single `ChatMessage`. Unfortunately, we missed updating the TypeScript reporting code to account for this.

This change fixes the problem by updating the deserialization code in TypeScript to match what .NET code serializes.
shyamnamboodiripad added a commit that referenced this pull request Mar 10, 2025

The .NET side code for the `ScenarioRunResult` was recently changed (#5998) to include `ChatResponse` (which can contain multiple `ChatMessage`s) in place of a single `ChatMessage`. Unfortunately, we missed updating the TypeScript reporting code to account for this. (#6061)

This change fixes the problem by updating the deserialization code in TypeScript to match what .NET code serializes.
RussKie added a commit that referenced this pull request Mar 10, 2025
* The .NET side code for the `ScenarioRunResult` was recently changed (#5998) to include `ChatResponse` (which can contain multiple `ChatMessage`s) in place of a single `ChatMessage`. Unfortunately, we missed updating the TypeScript reporting code to account for this. (#6061)

This change fixes the problem by updating the deserialization code in TypeScript to match what .NET code serializes.

* Automatically add 'untriaged' label to new issues without milestones (#6060)

* Address M.E.VectorData feedback for IEmbeddingGenerator (#6058)

* Move GetService down to a non-generic IEmbeddingGenerator interface

* Separate UriContent from DataContent

* Address feedback

---------

Co-authored-by: Shyam N <shyamnamboodiripad@users.noreply.github.com>
Co-authored-by: Jeff Handley <jeffhandley@users.noreply.github.com>
Co-authored-by: Stephen Toub <stoub@microsoft.com>
joperezr pushed a commit to joperezr/extensions that referenced this pull request Mar 11, 2025
The .NET side code for the `ScenarioRunResult` was recently changed (dotnet#5998) to include `ChatResponse` (which can contain multiple `ChatMessage`s) in place of a single `ChatMessage`. Unfortunately, we missed updating the TypeScript reporting code to account for this.

This change fixes the problem by updating the deserialization code in TypeScript to match what .NET code serializes.

Cherry-picked from commit `41bbedd0` (dotnet#6061)