-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve our SQL Server client-generated GUIDs #33579
Comments
@JackTrapper would you be interested in submitting a PR for this? EF's client-side GUID generator for SQL Server is in https://github.com/dotnet/efcore/blob/main/src/EFCore/ValueGeneration/SequentialGuidValueGenerator.cs. |
dotnet/runtime#103658 was completed recently and .NET 9 will contain |
@cremor I'm not sure, this will require some careful investigation. |
I searched a bit more and it looks like UUID v7 (which is used by the new See: |
I'm certain it will not help for SQL Server because it does not tailor itself to SQL Server's sort order. Which is appropriate; it is correct for C# to be agnostic to SQL Server, Oracle, DB2, Postgres. No, this feature needs to be a database client feature. And it's not like one-size-fits all, because different databases use different sort rules. No, what you need is:
I don't believe any other database engines have a So there is no such thing as a one-size-fits-all solution. You have to know you are generating a UUID to be fed to Microsoft SQL Sever. I don't know where/if in EF Core you tell it the flavor of database engine you're using-so that way it can adjust the bits of a standard type-1 uuid to make is collation compatible with the database back-end engine. No matter what you want to start with UuidCreateSequential (or the local OS's equivalent) becuase there are rules about how to handle two UUIDs being generated within the same 100 ns tick. And it's not an issue you can ignore; modern CPUs are fast enough you can generate 200,000 UUIDs in the same 100 ns tick. The solution is the system-wide collision counter documented in the RFC, and already implemented on Windows by UuidCreateSequential. Once you have your type-1 uuid created by UuidCreateSequential, you then have to move the bytes around so that the 64-bit (8-byte) timestamp is sorted first. So, no, any implemenation of Guid.CreateVersion7() will not be useful here, since it was not designed with my specific favorite database engine in mind. |
Uuidnext has a variant of uuid v8 that is sortable in ms sql server : https://github.com/mareek/UUIDNext/blob/main/Src/UUIDNext/Generator/UuidV8SqlServerGenerator.cs |
I ran your code for sortable guids in (https://github.com/dotnet/efcore/issues/30753#issuecomment-2027429203)[his comment] on #30753. I found that it does not significantly differ from the earlier mentioned GUID generation strategies. The GUIDs generated used an average page space of 67%, and fragmentation was 41%. I have upgraded hardware at this point, so there could be some skew in there. I'm wondering if maybe there's something wrong with my testing methodology now, though, as this seems really bad for something that seems so well documented and thought out. Is there any chance that you can explain why I got such a high fragmentation %? |
I'd have to say:
|
does this have similar concurrency issues though |
At this point I'm not sure if there's any point in keeping this issue open. The problem that everyone is going to run in to is that entity framework prioritizes knowing your Guid ID in advance over having the DB generate one serially. This behavior has been in EF for every major version (1.0 - 9.0 of .NET). If they were to try to change something about this it would probably break far more peoples' code than it would help. Likely the only thing that needs to be changed is the documentation. This blog post says that:
This probably isn't the same verbiage as it was back in 2.1 when I started working with EF core, but I remember thinking that from the docs it seemed like the value was being generated SQL side. IMO spelling out the behavior is important, and that's why I originally wrote that blog post. There's nothing that outright says "hey, your guids are being generated by the application by default" and goes in to the implications of that default behavior. You have to specifically go hunting for how to opt out of the behavior, and by that time you've probably already gone so far that its too late. |
@ConnerPhillis I'll note that just below the paragraph you cited above, you can find this one:
That's very explicit about the value generation happening client-side; I do agree that someone only reading the Primary key section could miss this, and so that these docs could be slightly improved, though.
How is it too late? Changing GUID value generation from client-side to server-side is trivial, and can be done at any point - the fact that some GUIDs have already been generated client-side shouldn't prevent anyone from switching to server-generated. |
Hey @roji , I'll clarify - I think ideally, the behavior should be noted for the various primary key types in a table, e.g.
I'd argue what you found is explicit, but its hard to notice. The first part feels like it suggests guids are generated server side, and that's all I read. Yes that's on me for not reading the whole doc, but I imagine that others are doing the same as well. Might be less of a problem now that ChatGPT is a thing, tho. To my second part about it being "to late", consider the guids that are generated by NEWSEQUENTIALID in SQL Server
vs the guids that are generated by SequentailGuidValueGenerator
Now - this may be me speaking out of turn here as I am not a db expert, but I don't think that SQL would know how to optimize this index if we were to swap between strategies. Say that you started with server generated ids and wanted to swap to db generated because of consistency issues. If SQL is using the first component of the GUIDs to order here, then that means that every new GUID would have to be inserted somewhere between GUIDs that are far more random. Ordering the insert would look something like:
Again, I'm not a DB expert, but I would think that this would be really bad for index and page fragmentation. I could run a benchmark on this but that would probably have to wait until the weekend. My thinking (and please correct me if I'm wrong) is that if you were to swap from client to server side (or vise versa) the index fragmentation would likely be just as bad. Essentially if you go to production using a particular guid generation strategy, you're locked in because trying to swap to the other means you'll end up with the same problem at a different severity. |
I don't think that's quite true:
The text does says "values generated" without specifying where or who generates them. I do agree this is a bit technical/pedantic, and we could make things more immediately clear though. One of the issues here is that different providers/databases can and do do different things: it's perfectly fine for one provider to generate GUIDs in the database, and another on the client. For a concrete example, relational providers usually default to generating int keys in the database ("identity"), but Cosmos does not support this. All this to say that this is a very general part of the docs that docs not attempt to say what happens on each different database/providers; this is why it's ambiguous on exactly where the generation occurs.
SequentailGuidValueGenerator basically employes the same GUID generation as the SQL Server NEWSEQUENTIALID function -that's basically the reason it exists; so in principle, switching between the two shouldn't be disruptive and generate GUIDs which can be indexed well together. |
I think I should settle on agreeing with you and realizing that things are less standardized than I initially thought within ef
I don't think that I agree with this. The guids that I generated earlier in my example came from SQL server vs sequential value generator. But I wanted to run benchmarks to see if what I was saying was accurate before I said something wrong. in the case that you go from NEWSEQUENTIALID to EF generated guids I ran this benchmark by first inserting 100,000 guids from SQL server Inserting the GUIDs w/ This leads me to believe that index fragmentation is kept very low at the cost of insert performance. SQL server is likely having a hard time coping with the two different formats for GUIDs. I think it's running in to issues balancing and having to split several pages for every insert in the case that you go from EF generated to NEWSEQUENTIALID
edit: this was wrong, I forgot to clear tracked changes so EF was evaluating too much. Issue was not SQL server (it was me :) ) Anyways, once I was done inserting everything, fragmentation was at .54%, in line with what I got in my first go round. I altered the table to do the GUID generation in the DB instead and went to generating. This time, the generation took ~9 seconds, and fragmentation remained at .61%. So, you are correct and I am wrong. Swapping from SQL to EF core GUIDs is not a risk to database performance. I am going to keep benchmarking though. It is alarming to me that the EF inserts are behaving so poorly, I want to figure out what is at the root cause of this. |
@roji almost my entire previous comment is wrong and can be explained by me not clearing out the change tracker on a long-lived db context object. In summary, your assumption is correct, I am incorrect. |
@ConnerPhillis no worries, this stuff is hard to check and you're far from the first person to get change tracking interfering in benchmarking. |
There's a report that our current client-side sequential GUIDs generator for SQL Server doesn't work well for concurrent environments; #30753 was opened to consider switching to server-generated GUIDs instead, as a way of mitigating that.
However, @JackTrapper posted an analysis in #30753 (comment), according to which the issue could be mitigated by improving the client-side generation logic instead. If this indeed makes our client-generated GUIDs as indexing-efficient as server-side ones generated via NEWSEQUENTIALID, that would be a good improvement that should also maintain EF backwards compatibility.
The text was updated successfully, but these errors were encountered: