Add normalize-index background job #5558
☔ The latest upstream changes (presumably #5579) made this pull request unmergeable. Please resolve the merge conflicts.
@arlosi I took the liberty of rebasing your branch since I introduced a few merge conflicts 😅
src/admin/normalize_index.rs
// Add an additional commit after the squash commit that normalizes the index.
println!("committing normalization");
let msg = "Normalize index format\n\n\
    More information can be found at https://github.com/rust-lang/crates.io/pull/5066";
We may want to update this link to point to this PR, or perhaps an internals post?
@Turbo87 The rebase looks good to me. Let me know if there's anything else you need. I understand that this is a large change to the index repo and probably needs some time for discussion within the team.
I've thought a bit about this PR, and there are a few things in it where I'm not sure yet whether it's the best solution or not:
- Unless I'm misreading the code, it sorts the versions by semver, while previously they were sorted by insertion order. I understand that semver sorting might seem cleanest, but a) the database can't sort them that way directly, b) it causes a big diff for most crates, and c) when we run the index normalize command, new versions added afterwards will still be sorted by insertion order. In other words: I think it might be easier to keep the current sort order and for #5066 change it to sort by
- I was initially a bit worried by that, but since you've already shown in #5066 (comment) that this shouldn't be an issue, I guess this is fine :)
- Unfortunately, with #5066 (comment) I might be partially responsible for the design of making this an admin command, but I wonder if it would be better to implement it as a background job. Since the background worker has a lock on the index repo, it would allow us to run this migration without having to go into read-only mode. The dry-run functionality would have to look slightly different, potentially pushing the update to a dedicated branch instead of
- While I agree that a timely squash would be useful, I'm not convinced that this part should be done automatically. It's easy enough to trigger manually after the result of the normalization has been confirmed to be correct.

Sorry for the late feedback. It took me a while to understand my gut feeling hesitation on merging this as-is 😅
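To make the sort-order concern concrete, here is a minimal sketch (not code from this PR) of how semver order can differ from insertion order once a backport is published after a newer major release. The helpers `parse_version` and `sorted_by_semver` are hypothetical names, and the parser only handles plain `major.minor.patch` strings, ignoring pre-release and build metadata.

```rust
/// Parses a plain `major.minor.patch` string into a sortable tuple.
/// Hypothetical helper: pre-release and build metadata are not supported.
fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    // Reject strings with more than three components.
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Returns the versions sorted by their numeric components.
/// Unparseable versions sort first because `None < Some(_)`.
fn sorted_by_semver(versions: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = versions.iter().map(|v| v.to_string()).collect();
    out.sort_by_key(|v| parse_version(v));
    out
}

fn main() {
    // Insertion order: the 1.0.1 backport is published after 2.0.0.
    let insertion_order = ["1.0.0", "2.0.0", "1.0.1"];
    let semver_order = sorted_by_semver(&insertion_order);

    println!("insertion order: {:?}", insertion_order);
    println!("semver order:    {:?}", semver_order);
}
```

Rewriting every file into the second order is what would produce a diff for any crate with out-of-order releases, while new publishes would keep appending in the first order.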
I took the liberty of implementing the changes I mentioned above and tested it on the staging index. The result is rust-lang/staging.crates.io-index@1c98a7d, which looks reasonable to me. One thing I noticed is that some dependencies don't seem to have a Now that I think about it, I'm wondering for what reason we have the dev dependencies in the index at all. They should not be used for dependency resolution, so I'm wondering if we could remove those as well. That is unrelated to the current cleanup though.
☔ The latest upstream changes (presumably #5771) made this pull request unmergeable. Please resolve the merge conflicts.
Re-generates the git index by reading existing files, normalizing them and writing them back out again. Does not use the database.
Let's keep the current insertion time order for now.
You brought up a bunch of good points, but I think there's an underlying question required to make decisions here. Is normalizing the index a one-time thing, or are we willing to do it multiple times? If we're willing to do it multiple times, then a lot of these questions become less important because we can address them at the next normalization. The cargo team is attempting to get a more nuanced understanding of how "optional" most fields really are. Once that is done I would not be surprised if we could remove Of course the normalization and its diff with no semantic meaning will cause a hassle for a lot of people. It would not be unreasonable for us to decide we will only do this once. In which case all of these questions need to be decided before we do the one and only normalization.
My understanding was that we can do multiple normalizations without major issues, so I wouldn't want to put all of these steps into a single PR. Since we would couple the normalization with a timely squash, most people probably won't notice it anyway.
Your changes look good to me. Thanks for taking the initiative here. Let me know if there's more you need help with.
We talked about this PR in our weekly team meeting today and the conclusion was that this should be good to go. I'll merge this, deploy it, and then do a dry run on the production index. Unless there are any surprises in that dry run, we will probably run the proper normalization some time next week.
I may be missing something, but I don't see a sort of Line 214 in 8283354
The It was changed from a HashMap in this comment: arlosi@a26e319, so the normalization should end up sorting them by reading and writing back out.
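As a rough illustration of why no explicit sort call is needed: the comment above does not name the replacement type as extracted here, so this sketch assumes an ordered map such as `std::collections::BTreeMap`. The `features` map and the `render_features` helper are hypothetical and only stand in for whatever serializer the real code uses.

```rust
use std::collections::BTreeMap;

/// Renders a map as a compact JSON-like object (hypothetical helper).
/// `BTreeMap::iter` yields entries in ascending key order, so the output
/// is sorted by key regardless of insertion order.
fn render_features(features: &BTreeMap<String, Vec<String>>) -> String {
    let entries: Vec<String> = features
        .iter()
        .map(|(name, deps)| {
            let deps: Vec<String> = deps.iter().map(|d| format!("\"{d}\"")).collect();
            format!("\"{name}\":[{}]", deps.join(","))
        })
        .collect();
    format!("{{{}}}", entries.join(","))
}

fn main() {
    // Keys are inserted out of order on purpose.
    let mut features = BTreeMap::new();
    features.insert("std".to_string(), vec!["alloc".to_string()]);
    features.insert("default".to_string(), vec!["std".to_string()]);
    features.insert("alloc".to_string(), vec![]);

    // Prints {"alloc":[],"default":["std"],"std":["alloc"]}
    println!("{}", render_features(&features));
}
```

With a `HashMap`, the iteration order is unspecified, so a read-then-write round trip would not give a stable key order.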
Adds a background job that regenerates the git index by reading existing files, normalizing them and writing them back out again. Does not use the database.
- deps field
- features field
- "links": null (since it's optional)
- dep "kind": null with "kind": Normal (appears to be an early implementation error)
- dep "features": [""] with "features": [] (appears to be an early implementation error)

r? @Turbo87
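The two dependency-level replacements in the list above could look roughly like the following. This is a sketch under assumptions, not the PR's implementation: the `Dep` and `DepKind` types and the `normalize_deps` function are hypothetical, and sorting dependencies by name is an assumed ordering, since the list as shown here does not state what is done to the `deps` field.

```rust
/// Hypothetical model of a dependency kind in an index entry.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DepKind {
    Normal,
    Dev,
    Build,
}

/// Hypothetical model of one dependency; `kind: None` stands for `"kind": null`.
#[derive(Debug, Clone, PartialEq)]
struct Dep {
    name: String,
    kind: Option<DepKind>,
    features: Vec<String>,
}

/// Applies the two dep-level fixes, then sorts by name (assumed ordering).
fn normalize_deps(mut deps: Vec<Dep>) -> Vec<Dep> {
    for dep in &mut deps {
        // "kind": null becomes "kind": Normal.
        if dep.kind.is_none() {
            dep.kind = Some(DepKind::Normal);
        }
        // "features": [""] becomes "features": [].
        if dep.features.len() == 1 && dep.features[0].is_empty() {
            dep.features.clear();
        }
    }
    deps.sort_by(|a, b| a.name.cmp(&b.name));
    deps
}

fn main() {
    let deps = vec![
        Dep { name: "serde".into(), kind: None, features: vec!["".into()] },
        Dep { name: "libc".into(), kind: Some(DepKind::Build), features: vec!["std".into()] },
    ];
    for dep in normalize_deps(deps) {
        println!("{:?}", dep);
    }
}
```

The `"links": null` removal is not modeled here; with an `Option` field it would typically be handled at serialization time by omitting the key when the value is `None`.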