DataContent fails with valid data URI #6055

luisquintanilla · 2025-03-07T20:28:51Z

Description

When using Microsoft.Extensions.AI.AzureAIInference IChatClient, sending DataContent with a valid data URI fails.

Repro:

https://gist.github.com/luisquintanilla/48a3dae23ca160a505b29ac8b87d38a4

For text content, rather than checking that the media type is text, it assumes that the AIContent is TextContent.

extensions/src/Libraries/Microsoft.Extensions.AI.AzureAIInference/AzureAIInferenceChatClient.cs

Lines 489 to 491 in d325bcb

    
           case TextContent textContent: 
        
               parts.Add(new ChatMessageTextContentItem(textContent.Text)); 
        
               break;

I would expect the switch statement to treat everything as DataContent and for text to also use the media type.

This also raises another question. For custom media / MIME types, what is the mechanism for handling those? Would I have to implement my own DelegateChatClient?

Reproduction Steps

https://gist.github.com/luisquintanilla/48a3dae23ca160a505b29ac8b87d38a4

Expected behavior

I can get a response from the model provider.

Actual behavior

Error: System.ArgumentException: Value cannot be an empty collection. (Parameter 'content')
   at Azure.AI.Inference.Argument.AssertNotNullOrEmpty[T](IEnumerable`1 value, String name)
   at Azure.AI.Inference.ChatRequestUserMessage..ctor(IEnumerable`1 content)
   at Microsoft.Extensions.AI.AzureAIInferenceChatClient.ToAzureAIInferenceChatMessages(IList`1 inputs)+MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Azure.AI.Inference.ChatCompletionsOptions..ctor(IEnumerable`1 messages)
   at Microsoft.Extensions.AI.AzureAIInferenceChatClient.ToAzureAIOptions(IList`1 chatContents, ChatOptions options)
   at Microsoft.Extensions.AI.AzureAIInferenceChatClient.GetResponseAsync(IList`1 chatMessages, ChatOptions options, CancellationToken cancellationToken)
   at Submission#8.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

The text was updated successfully, but these errors were encountered:

stephentoub · 2025-03-07T20:35:36Z

I would expect the switch statement to treat everything as DataContent and for text to also use the media type.

I'm not following. Are you saying you expect providers to treat a DataContent with a text/* media type the same as TextContent?

luisquintanilla · 2025-03-07T20:42:29Z

In the case of AzureAIInferenceChatClient.cs, I would expect text to be handled similar to image and audio.

https://github.com/dotnet/extensions/blob/d325bcb7c81eb59d1df2fd93d836571ed97d57d9/src/Libraries/Microsoft.Extensions.AI.AzureAIInference/AzureAIInferenceChatClient.cs#L493C17-L493C93

In the code I shared, that should be valid content I'm passing but the AI Inference implementation doesn't think so because for text it assumes that I'm sending TextContent.

In that particular implementation, as an end-user without access to the source, when the code throws with message:

System.ArgumentException: Value cannot be an empty collection. (Parameter 'content')

It's not obvious to me that what I should have done was use TextContent.

More generally though, I don't think text should to be treated differently.

stephentoub · 2025-03-07T21:15:53Z

More generally though, I don't think text should to be treated differently.

Right, so you're saying you'd want implementations to handle both TextContent and DataContent with "text/*" as text content.

It's not obvious to me that what I should have done was use TextContent.

To be honest, it did not occur to me that anyone would try to pass in a DataContent for handling text. How did you end up there?

cc: @MackinnonBuck, @SteveSandersonMS

luisquintanilla · 2025-03-07T21:23:14Z

Right, so you're saying you'd want implementations to handle both TextContent and DataContent with "text/*" as text content.

Yeah. Although I think in most cases, I think this would be up to the implementer. Not sure what, if anything, we could do there to make that pattern easier. In the Azure AI Inference case since that implementation is under the extensions repo, it seems like the right thing to do.

To be honest, it did not occur to me that anyone would try to pass in a DataContent for handling text. How did you end up there?

I was tinkering and trying to understand how custom mime types might work (the bottom portion of the notebook). When that threw an error, to rule out user error on my part, I tested the text scenario which I expected to work.

stephentoub · 2025-03-07T21:31:33Z

Although I think in most cases, I think this would be up to the implementer.

Yes, but we need to provide guidance / best practices on how these implementations handle things. If we'd consider it a bug in an implementation that it didn't handle DataContent.StartsWithMediaType("text/") specially, then we're saying that any conforming implementation does so. And it's not ideal when there are two different ways the same information can be specified.

It's also not trivial to handle "text/"... the MIME type can include an optional charset, which dictates an encoding with which the bytes should be converted back to text, so it's possibly quite complicated. Here's what HttpClient does:
https://github.com/dotnet/runtime/blob/81dff0439effb8dabb62421904cdcea8f26c8f0f/src/libraries/System.Net.Http/src/System/Net/Http/HttpContent.cs#L121-L148

If this is really something we think needs to be handled, then we should be providing some kind of helper to make that easier.

I'm not convinced it's worth the complication, though; do we have real uses cases where we'd expect this to occur?

luisquintanilla · 2025-03-07T22:06:15Z

Just looking at the content types in Abstractions, text seems to be the only one that is treated special and has its own class.

Maybe we just make everything DataContent and rely on mime types?

I also think this starts to get into the last part of the issue which may be worth having a separate thread on.

For custom media / MIME types, what is the mechanism for handling those? Would I have to implement my own DelegateChatClient?

At least when it comes to handling MIME types and parsing them, looks like we have the DataUriParser, handles the most common cases. Anything else, we fall back to behavior similar to HttpClient.

extensions/src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/DataUriParser.cs

Lines 68 to 118 in d325bcb

    
           public static bool IsValidMediaType(ReadOnlySpan<char> mediaTypeSpan, ref string? mediaType) 
        
           { 
        
               Debug.Assert( 
        
                   mediaType is null || mediaTypeSpan.Equals(mediaType.AsSpan(), StringComparison.Ordinal), 
        
                   "mediaType string should either be null or the same as the span"); 
        
               // If the media type is empty or all whitespace, normalize it to null. 
        
               if (mediaTypeSpan.IsWhiteSpace()) 
        
               { 
        
                   mediaType = null; 
        
                   return true; 
        
               } 
        
               // For common media types, we can avoid both allocating a string for the span and avoid parsing overheads. 
        
               string? knownType = mediaTypeSpan switch 
        
               { 
        
                   "application/json" => "application/json", 
        
                   "application/octet-stream" => "application/octet-stream", 
        
                   "application/pdf" => "application/pdf", 
        
                   "application/xml" => "application/xml", 
        
                   "audio/mpeg" => "audio/mpeg", 
        
                   "audio/ogg" => "audio/ogg", 
        
                   "audio/wav" => "audio/wav", 
        
                   "image/apng" => "image/apng", 
        
                   "image/avif" => "image/avif", 
        
                   "image/bmp" => "image/bmp", 
        
                   "image/gif" => "image/gif", 
        
                   "image/jpeg" => "image/jpeg", 
        
                   "image/png" => "image/png", 
        
                   "image/svg+xml" => "image/svg+xml", 
        
                   "image/tiff" => "image/tiff", 
        
                   "image/webp" => "image/webp", 
        
                   "text/css" => "text/css", 
        
                   "text/csv" => "text/csv", 
        
                   "text/html" => "text/html", 
        
                   "text/javascript" => "text/javascript", 
        
                   "text/plain" => "text/plain", 
        
                   "text/plain;charset=UTF-8" => "text/plain;charset=UTF-8", 
        
                   "text/xml" => "text/xml", 
        
                   _ => null, 
        
               }; 
        
               if (knownType is not null) 
        
               { 
        
                   mediaType ??= knownType; 
        
                   return true; 
        
               } 
        
               // Otherwise, do the full validation using the same logic as HttpClient. 
        
               mediaType ??= mediaTypeSpan.ToString(); 
        
               return MediaTypeHeaderValue.TryParse(mediaType, out _); 
        
           }

In the case of text, params like charset are well-known. For code, you might also have information like the programming language. Regardless of whether they are well-known or custom params, seems like it'd be good to have a way to handle or expose to users.

luisquintanilla · 2025-03-07T22:27:54Z

Also, for this constructor. Would we ever expect the Uri to be a non-data URI? If not, is the mediaType needed?

extensions/src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/DataContent.cs

Line 60 in d325bcb

    
           public DataContent([StringSyntax(StringSyntaxAttribute.Uri)] string uri, string? mediaType = null)

stephentoub · 2025-03-08T02:28:32Z

Just looking at the content types in Abstractions, text seems to be the only one that is treated special and has its own class.
Maybe we just make everything DataContent and rely on mime types?

Text represents the 99% case today when dealing with LLMs. I think it needs to be simple. To your point, we could make it work with just DataContent, but that feels like it's adding a lot of complication and expense in the name of consistency/consolidation. We'd end up wanting special factory methods for creating DataContent from text, and then we'd need special methods for extracting text if the media type was "text/*", plus the performance overhead.

If we believe the DataContent case is real, we can add the relevant code to handle it. We'd need to do so not only in all of the abstraction implementations, but also in the object model types, such as ChatResponse, which have properties like Text that today enumerate all contents looking for TextContent... those would also need to look for DataContent with "text/*". And as mentioned, we'd likely need to add some kind of TryGetText(out string text) helper for DataContent that would parse out the encoding and use it to convert the bytes into text.

Also, for this constructor. Would we ever expect the Uri to be a non-data URI?

Yes, some services let you specify urls to public locations and they'll download the image rather than you having to send the base64-encoded data. This is why DataContent.Data is a ReadOnlyMemory<byte>? rather than a ReadOnlyMemory<byte>, because it may not contain any data. We do suffer a bit from the fact that in these cases you really need the media type to be specified, at least the prefix, e.g. "image/*", as otherwise implementations won't know what kind of data is at the other end of that url. Ideally there'd be a way to force the caller to provide a media type in such cases, but unless we make the parameter required always, we lack a good way to do that, and making it required would be duplicative of most data uris that include the media type in them. We could come up with different factory methods, but it's not clear to me we have anything significantly better. Open to ideas.

@SteveSandersonMS, any opinions on all this?

SteveSandersonMS · 2025-03-10T11:57:11Z

I don't think of TextContent and "DataContent with a text/* media type" as being equivalent at all. In my mental model, the two are different in the same way that typing text into an email and adding a .txt file as an attachment are different, even though both are ways of supplying text.

Since we leave it up to IChatClient implementers to decide how to map the MEAI object model (which, today, differentiates those two representations) to the backend's API, each implementation is free to choose which content types it supports and how to handle it in each case. As such it can choose whether to support text/*, and within that, which text encodings or other subformats to handle, and beyond that whether to map it to be equivalent to TextContent or differentiate it in some way.

@luisquintanilla Can you suggest scenarios where app developers would want or need to use DataContent with a text/* media type instead of TextContent? I personally don't think this will occur to most app developers, or if they do it, I wouldn't necessarily assume they intend it to be handled as equivalent to TextContent.

A significant drawback of us choosing to make DataContent+text/* equivalent to TextContent is the perf overhead that @stephentoub mentions, both in having to check media types on all content objects whenever processing text, but also possibly needing to convert between different text representations and losing the representation that was originally used. So I'd only want to add all this extra sophistication if there were really compelling reasons.

luisquintanilla · 2025-03-11T02:11:58Z

I think it needs to be simple.

I agree that the ergonomics need to be simple for a developer as well as for implementers. At minimum I think it should be consistent for implementers. If we want to make it easy and reduce overhead by providing a type to represent a set of commonly used MIME types (TextContent in the case of text/), should ImageContent and AudioContent be brought back to make it feel consistent? It feels weird that Text is the only one that gets treated differently, especially since we expect to extend support for other image and audio scenarios.

Yes, some services let you specify urls to public locations and they'll download the image rather than you having to send the base64-encoded data.

Thanks for clarifying.

making it required would be duplicative of most data uris that include the media type in them

This is mainly why I was asking. Making it optional though seems like a good way to handle for now.

Can you suggest scenarios where app developers would want or need to use DataContent with a text/* media type instead of TextContent?

As an end-user, if you give me a type that covers a large percentage of cases, which I think TextContent should for text scenarios, I'll use that for cases where the input is a prompt.

both in having to check media types on all content objects whenever processing text, but also possibly needing to convert between different text representations and losing the representation that was originally used.

Would we need to do this in the abstractions or is this something that the implementers would handle? In the AI Inference package we are the implementers.

stephentoub · 2025-03-11T02:22:08Z

It feels weird that Text is the only one that gets treated differently

Why does it feel weird? Text is the 99.9% case. It is special.

Note that it's special in .NET in general. There's char, even though you could represent it as just multiple bytes. And there's string, even though you could represent it as a char[] or a byte[].

TextContent then benefits from being able to represent the data exactly as it needs to be consumed throughout .NET, as a string. That is not the case for ImageContent/AudioContent: those types add no additional representation, the derived types had no additional functionality or value... all they provided was a way to check for the kind of data represented, which is handled by dataContent.HasTopLevelMediaType("image") just as dataContent is ImageContent.

Would we need to do this in the abstractions or is this something that the implementers would handle?

Both.

luisquintanilla · 2025-03-11T14:00:04Z

Why does it feel weird?

Because there are other forms of data the models support.

luisquintanilla added bug This issue describes a behavior which is not expected - a bug. untriaged labels Mar 7, 2025

github-actions bot added the area-ai Microsoft.Extensions.AI libraries label Mar 7, 2025

luisquintanilla changed the title ~~DataContent fails for valid data URI.~~ DataContent fails with valid data URI. Mar 7, 2025

luisquintanilla changed the title ~~DataContent fails with valid data URI.~~ DataContent fails with valid data URI Mar 7, 2025

stephentoub mentioned this issue Mar 11, 2025

Fix handling of empty messages in AzureAIInferenceChatClient #6072

Merged

dotnet-policy-service bot added the work in progress 🚧 label Mar 11, 2025

stephentoub closed this as completed in #6072 Mar 11, 2025

dotnet-policy-service bot added work in progress 🚧 and removed work in progress 🚧 untriaged labels Mar 11, 2025

jeffhandley removed the work in progress 🚧 label Mar 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DataContent fails with valid data URI #6055

DataContent fails with valid data URI #6055

luisquintanilla commented Mar 7, 2025 •

edited

Loading

stephentoub commented Mar 7, 2025

luisquintanilla commented Mar 7, 2025 •

edited

Loading

stephentoub commented Mar 7, 2025 •

edited

Loading

luisquintanilla commented Mar 7, 2025 •

edited

Loading

stephentoub commented Mar 7, 2025 •

edited

Loading

luisquintanilla commented Mar 7, 2025 •

edited

Loading

luisquintanilla commented Mar 7, 2025 •

edited

Loading

stephentoub commented Mar 8, 2025

SteveSandersonMS commented Mar 10, 2025

luisquintanilla commented Mar 11, 2025

stephentoub commented Mar 11, 2025

luisquintanilla commented Mar 11, 2025

DataContent fails with valid data URI #6055

DataContent fails with valid data URI #6055

Comments

luisquintanilla commented Mar 7, 2025 • edited Loading

Description

Reproduction Steps

Expected behavior

Actual behavior

Regression?

Known Workarounds

Configuration

Other information

stephentoub commented Mar 7, 2025

luisquintanilla commented Mar 7, 2025 • edited Loading

stephentoub commented Mar 7, 2025 • edited Loading

luisquintanilla commented Mar 7, 2025 • edited Loading

stephentoub commented Mar 7, 2025 • edited Loading

luisquintanilla commented Mar 7, 2025 • edited Loading

luisquintanilla commented Mar 7, 2025 • edited Loading

stephentoub commented Mar 8, 2025

SteveSandersonMS commented Mar 10, 2025

luisquintanilla commented Mar 11, 2025

stephentoub commented Mar 11, 2025

luisquintanilla commented Mar 11, 2025

luisquintanilla commented Mar 7, 2025 •

edited

Loading

luisquintanilla commented Mar 7, 2025 •

edited

Loading

stephentoub commented Mar 7, 2025 •

edited

Loading

luisquintanilla commented Mar 7, 2025 •

edited

Loading

stephentoub commented Mar 7, 2025 •

edited

Loading

luisquintanilla commented Mar 7, 2025 •

edited

Loading

luisquintanilla commented Mar 7, 2025 •

edited

Loading