DataContent fails with valid data URI #6055
Comments
I'm not following. Are you saying you expect providers to treat a DataContent with a text/* media type the same as TextContent?
In the case of AzureAIInferenceChatClient.cs, I would expect text to be handled similarly to image and audio. The code I shared passes what should be valid content, but the AI Inference implementation rejects it because, for text, it assumes I'm sending TextContent. In that particular implementation, as an end-user without access to the source, when the code throws with message:
It's not obvious to me that what I should have done was use TextContent. More generally though, I don't think text should be treated differently.
Right, so you're saying you'd want implementations to handle both TextContent and DataContent with "text/*" as text content.
To be honest, it did not occur to me that anyone would try to pass in a DataContent for handling text. How did you end up there?
Yeah, although in most cases I think this would be up to the implementer. I'm not sure what, if anything, we could do there to make that pattern easier. In the Azure AI Inference case, since that implementation is under the extensions repo, it seems like the right thing to do.
I was tinkering and trying to understand how custom MIME types might work (the bottom portion of the notebook). When that threw an error, to rule out user error on my part, I tested the text scenario, which I expected to work.
Yes, but we need to provide guidance / best practices on how these implementations handle things. If we'd consider it a bug in an implementation that it didn't handle DataContent.StartsWithMediaType("text/") specially, then we're saying that any conforming implementation does so. And it's not ideal when there are two different ways the same information can be specified. It's also not trivial to handle "text/"... the MIME type can include an optional charset, which dictates the encoding with which the bytes should be converted back to text, so it's possibly quite complicated. Here's what HttpClient does: If this is really something we think needs to be handled, then we should be providing some kind of helper to make that easier. I'm not convinced it's worth the complication, though; do we have real use cases where we'd expect this to occur?
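To make the charset point concrete, here is a minimal sketch in Python (not the .NET API under discussion) of what decoding a "text/*" data URI involves. The function name `try_get_text` is hypothetical, and falling back to US-ASCII when no charset parameter is present is an assumption based on the default given in RFC 2397.

```python
import base64
from urllib.parse import unquote_to_bytes


def try_get_text(data_uri: str):
    """Return the decoded text of a text/* data URI, or None if it cannot be read as text."""
    if not data_uri.startswith("data:"):
        return None
    # A data URI has the shape: data:[<media type>][;param=value]*[;base64],<payload>
    header, sep, payload = data_uri[len("data:"):].partition(",")
    if not sep:
        return None
    parts = [p.strip() for p in header.split(";")]
    media_type = parts[0].lower() or "text/plain"  # omitted media type means text/plain
    if not media_type.startswith("text/"):
        return None

    is_base64 = False
    charset = "us-ascii"  # assumption: default charset when none is specified
    for param in parts[1:]:
        lowered = param.lower()
        if lowered == "base64":
            is_base64 = True
        elif lowered.startswith("charset="):
            charset = param.split("=", 1)[1].strip('"')

    try:
        raw = base64.b64decode(payload) if is_base64 else unquote_to_bytes(payload)
        return raw.decode(charset)
    except (LookupError, ValueError):
        # Unknown charset name, malformed base64, or bytes invalid for the charset.
        return None
```

The same bytes decode differently, or not at all, depending on the declared charset, which is the part a provider cannot skip if it accepts text through DataContent.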
Just looking at the content types in Abstractions, text seems to be the only one that is treated specially and has its own class. Maybe we just make everything DataContent and rely on MIME types? I also think this starts to get into the last part of the issue, which may be worth having a separate thread on.
At least when it comes to handling MIME types and parsing them, looks like we have the extensions/src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/DataUriParser.cs Lines 68 to 118 in d325bcb
In the case of text, params like
Also, for this constructor. Would we ever expect the Uri to be a non-data URI? If not, is the extensions/src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/DataContent.cs Line 60 in d325bcb
Text represents the 99% case today when dealing with LLMs. I think it needs to be simple. To your point, we could make it work with just DataContent, but that feels like it's adding a lot of complication and expense in the name of consistency/consolidation. We'd end up wanting special factory methods for creating DataContent from text, and then we'd need special methods for extracting text if the media type was "text/*", plus the performance overhead. If we believe the DataContent case is real, we can add the relevant code to handle it. We'd need to do so not only in all of the abstraction implementations, but also in the object model types, such as ChatResponse, which have properties like Text that today enumerate all contents looking for TextContent... those would also need to look for DataContent with "text/*". And as mentioned, we'd likely need to add some kind of TryGetText(out string text) helper for DataContent that would parse out the encoding and use it to convert the bytes into text.
Yes, some services let you specify URLs to public locations and they'll download the image rather than you having to send the base64-encoded data. This is why DataContent.Data is a
@SteveSandersonMS, any opinions on all this?
I don't think of Since we leave it up to @luisquintanilla Can you suggest scenarios where app developers would want or need to use A significant drawback of us choosing to make
I agree that the ergonomics need to be simple for a developer as well as for implementers. At minimum I think it should be consistent for implementers. If we want to make it easy and reduce overhead by providing a type to represent a set of commonly used MIME types (TextContent in the case of
Thanks for clarifying.
This is mainly why I was asking. Making it optional, though, seems like a good way to handle it for now.
As an end-user, if you give me a type that covers a large percentage of cases, which I think TextContent should for text scenarios, I'll use that for cases where the input is a prompt.
Would we need to do this in the abstractions, or is this something that the implementers would handle? In the AI Inference package we are the implementers.
Why does it feel weird? Text is the 99.9% case. It is special. Note that it's special in .NET in general. There's char, even though you could represent it as just multiple bytes. And there's string, even though you could represent it as a char[] or a byte[]. TextContent then benefits from being able to represent the data exactly as it needs to be consumed throughout .NET, as a string. That is not the case for ImageContent/AudioContent: those types add no additional representation, the derived types had no additional functionality or value... all they provided was a way to check for the kind of data represented, which is handled by
Both.
Because there are other forms of data the models support.
Description
When using the Microsoft.Extensions.AI.AzureAIInference IChatClient, sending DataContent with a valid data URI fails.
Repro:
https://gist.github.com/luisquintanilla/48a3dae23ca160a505b29ac8b87d38a4
For text content, rather than checking that the media type is text, it assumes that the AIContent is TextContent.
extensions/src/Libraries/Microsoft.Extensions.AI.AzureAIInference/AzureAIInferenceChatClient.cs
Lines 489 to 491 in d325bcb
I would expect the switch statement to treat everything as DataContent and for text to also use the media type.
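The expected behavior described above can be sketched as follows. This is an illustration in Python, not the actual AzureAIInferenceChatClient code: `TextContent`, `DataContent`, and `to_request_part` are hypothetical stand-ins for the .NET types, the returned dictionaries are an invented request shape, and decoding text bytes as UTF-8 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TextContent:
    """Hypothetical stand-in for the .NET TextContent type."""
    text: str


@dataclass
class DataContent:
    """Hypothetical stand-in for the .NET DataContent type."""
    data: bytes
    media_type: str


def to_request_part(content):
    """Map a content item to a provider request part, dispatching on media type."""
    if isinstance(content, TextContent):
        return {"type": "text", "text": content.text}
    if isinstance(content, DataContent):
        # Dispatch on the top-level media type instead of the class alone.
        top_level = content.media_type.split("/", 1)[0].strip().lower()
        if top_level == "text":
            # Assumption: bytes are UTF-8; a real implementation would honor the charset.
            return {"type": "text", "text": content.data.decode("utf-8")}
        if top_level in ("image", "audio"):
            return {"type": top_level, "media_type": content.media_type, "data": content.data}
    # Anything else, including custom media types, is rejected with a descriptive error.
    raise ValueError(f"Unsupported content: {content!r}")
```

With this shape, `DataContent(b"hi", "text/plain")` and `TextContent("hi")` produce the same request part, while an unknown media type such as `application/x-custom` fails with a message that names the offending content.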
This also raises another question. For custom media / MIME types, what is the mechanism for handling those? Would I have to implement my own DelegateChatClient?
Reproduction Steps
https://gist.github.com/luisquintanilla/48a3dae23ca160a505b29ac8b87d38a4
Expected behavior
I can get a response from the model provider.
Actual behavior
Regression?
No response
Known Workarounds
No response
Configuration
No response
Other information
No response