Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Net MEVD: Decide and align error behavior when upserts fail #10640

Open
roji opened this issue Feb 22, 2025 · 4 comments
Open

.Net MEVD: Decide and align error behavior when upserts fail #10640

roji opened this issue Feb 22, 2025 · 4 comments
Assignees
Labels
Build Features planned for next Build conference memory connector memory msft.ext.vectordata Related to Microsoft.Extensions.VectorData .NET Issue or Pull requests regarding .NET code

Comments

@roji
Copy link
Member

roji commented Feb 22, 2025

Currently, in the Weaviate provider, if any errors occur when upserting records, those upserts silently fail and no exception is thrown (link). Since non-batch Upsert delegates to UpsertBatch, the same behavior applies for regular Upsert - errors are silently swallowed.

We should decide on a proper error handling behavior and apply it across all implementations. Specifically, in some databases, some upserts may succeed while others fail; in others, batches should be atomic and an error should cause the entire batch to fail. We should in any case throw an exception if any upsert in the batch failed, to make sure the user is aware of the failure.

@roji roji added .NET Issue or Pull requests regarding .NET code memory connector memory msft.ext.vectordata Related to Microsoft.Extensions.VectorData labels Feb 22, 2025
@dmytrostruk
Copy link
Member

We should in any case throw an exception if any upsert in the batch failed, to make sure the user is aware of the failure

This is tricky, because even if we throw an exception if at least one record upsert failed, the others will be upserted successfully. User should know which record IDs were stored successfully in order to either delete them and try to re-upsert the entire batch again or re-upsert only failed ones. We can pass this info to the user through some exception property.

But it's already possible to identify such a case today even without an exception. User knows the number of records to upsert, and user gets a collection of upserted IDs in response. By checking records.Count == ids.Count you can understand if all records where upserted and it's possible to identify which failed, although without additional error details.

On one side, the exception will provide user awareness and could hold more details about an error for each record, but on the other side, it will be always required to wrap upsert method in try/catch block to handle it properly. Taking into account that UpsertBatchAsync returns IAsyncEnumerable, it becomes harder to use that method, since you can't yield in try/catch block and so on.

It's probably worth to evaluate this API and its cases from user perspective to see which solution will be more user-friendly.

@markwallace-microsoft markwallace-microsoft moved this to Backlog: Planned in Semantic Kernel Feb 25, 2025
@markwallace-microsoft markwallace-microsoft added the Build Features planned for next Build conference label Feb 25, 2025
@roji
Copy link
Member Author

roji commented Feb 26, 2025

Yeah, this is indeed tricky, and what's supported across databases differs.

But it's already possible to identify such a case today even without an exception. User knows the number of records to upsert, and user gets a collection of upserted IDs in response. By checking records.Count == ids.Count you can understand if all records where upserted and it's possible to identify which failed, although without additional error details.

I don't think requiring users to explicitly check the count after every upsert is a great story, or in line with standard .NET behavior... For one thing, it's very undiscoverable - no user would ever expect to need to do this. I myself lost quite a bit of time while writing the tests, since the upsert seemed to complete just fine (no exception) but then data was missing.

[...] on the other side, it will be always required to wrap upsert method in try/catch block to handle it properly.

I'm not sure about this argument - APIs shouldn't generally avoid throwing exceptions because users may want to use try/catch blocks to catch them (and what you wrote isn't specific to this API in any way). Exceptions are the standard mechanism in .NET to indicate that an operation couldn't be completed successfully; based on their needs, users can catch the exception, let it bubble up and be caught somewhere else, or possibly even not catch at all and allow the program to be terminated.

I don't think that the partial success here changes any of that fundamentally: the user indicated that they want to upsert all the records; either that entire operation succeds, or an exception should be thrown IMHO. Given the diversity in what databases support, I'd propose the following:

  • Ideally, if a database supports atomic transactions (all relational databases), then the whole batch should be wrapped in a transaction, which gets rolled back on any failure. Thus, if an exception is thrown, nothing got upserted.
  • If there's no atomicity support, the connector can track errors internally as the IAsyncEnumerable is consumed, and throw an exception at the end, saying "Some records failed to upsert". We could have a special exception type that would expose the IDs of failed records, though database-generated keys make this difficult, and it may be sufficient to just let the user check the database themselves to see what got inserted and what not.
  • Some databases may not even allow us to know which upserts succeeded and which did not. In those cases we can only throw a "some records failed to upsert" error.

Taking into account that UpsertBatchAsync returns IAsyncEnumerable, it becomes harder to use that method, since you can't yield in try/catch block and so on.

I'm not aware of this - AFAIK there's no issue with using yield in try/catch blocks (see sample code just below). May you have a different scenario in mind?

try/catch with yield
public class Program
{
    public static async Task Foo()
    {
        try
        {
            await foreach (var i in Bar())
            {
                Console.WriteLine(i);
            }
        }
        catch (Exception e)
        {
            Console.WriteLine("Exception: " + e);
        }
    }

    public static async IAsyncEnumerable<int> Bar()
    {
        yield return 1;
        yield return 2;
        await Task.Delay(1);
    }
}

(on an unrelated note, I've openee #10692 to discuss the returning of IAsyncEnumerable from UpsertBatchAsync)

@dmytrostruk
Copy link
Member

I'm not aware of this - AFAIK there's no issue with using yield in try/catch blocks (see sample code just below). May you have a different scenario in mind?

In your example, you do Console.WriteLine(i) inside try block, but let's say I need to get each record, modify it somehow and then yield return it again:

public static async IAsyncEnumerable<int> Foo()
{
    try
    {
        await foreach (var i in Bar())
        {
            yield return i; // CS1626: Cannot yield a value in the body of a try block with a catch clause
        }
    }
    catch (Exception e)
    {
        Console.WriteLine("Exception: " + e);
    }
}

So, in order to be able to catch an exception for each record that you process, you need to write something like this:

public static async IAsyncEnumerable<int> Foo()
{
    var enumerator = Bar().GetAsyncEnumerator();

    await using (enumerator)
    {
        while (true)
        {
            try
            {
                if (!await enumerator.MoveNextAsync())
                {
                    break;
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("Exception: " + e);
            }

            yield return enumerator.Current;
        }
    }
}

Of course, you can create extension methods/helper methods for such scenarios to re-use them in multiple places. But I think this is how you deal with exceptions in this case, unless you know about easier approach :)

@roji
Copy link
Member Author

roji commented Feb 27, 2025

@dmytrostruk ah I see, right. Note that there seems to be a consensus in the language/compiler team that allowing yield in try/catch is fine (see discussion), and there's an active language proposal for this (link). So while this limitation is unfortunate, it's quite likely to go away soon.

In any case, I would change a fundamental behavioral aspect of the API (throw exception when upsert failed) because of something like this... At the end of the day, users need to know that their batch upsert didn't (completely) succeed, and IMHO this needs to happen via an exception just like everywhere else in .NET...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Features planned for next Build conference memory connector memory msft.ext.vectordata Related to Microsoft.Extensions.VectorData .NET Issue or Pull requests regarding .NET code
Projects
Status: Backlog: Planned
Development

No branches or pull requests

3 participants