Add fuzzy string matching to quick open search #98278

a-johnston · 2024-10-17T20:20:07Z

This is a continuation of #82200 to implement godotengine/godot-proposals#7771. Rebasing the first PR against #56772 was pretty awkward, so I ended up flattening the earlier changes with @samsface as a co-author. A bit more info and the pre-flattened commits are available in this PR against sam's fork: samsface#1.

Functionally this adds optional fuzzy searching and search highlighting in the results:

The highlighting on the grid items feels a bit hacky due to the centered text; I'd appreciate any tips to do it more cleanly. This is also my first time working on a nontrivial C++ codebase, so I'd appreciate any general tips on being more idiomatic, or a heads-up if I'm using anything incorrectly. Also a few semi-related changes:

Fixes a bug where, if adaptive isn't used, on first launch neither result container will be visible. Now it uses project metadata to track the last mode, defaulting to list.
Grid items are slightly wider, showing 6 per row with the default window size, in order to show more of the filename.

samsface · 2024-10-18T09:43:46Z

@a-johnston How does this new algorithm work? There seems to be a lot more to it than the first two approaches, but it works pretty well. I opened a small PR to fix some C++ gotchas, but the whole thing looks good to me (though I didn't look at any of the Dialog code).

Also, is there a way to not match "res://" ?

Is it correct that we rank HAT.wav first here?

a-johnston · 2024-10-18T17:20:34Z

@samsface thanks for testing it out! The matching process is similar to the earlier ones, although there are definitely more helpers etc cluttering it up haha. It essentially is doing:

Match each token starting from offset 0
If a match is found, test whether it overlaps any previously matched sections and reject it if so
Otherwise, score the new match and potentially update the top-scoring match
In either case of a match, move the offset to start + 1 (one past where that match began) and try again. If there is no match, quit
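
The loop described above can be sketched roughly as follows. This is a minimal standalone illustration using the standard library rather than Godot's own String and Vector types. `Interval`, `TokenMatch`, `find_subsequence`, `placeholder_score`, and `best_token_match` are hypothetical names, and the scoring here is a stand-in that only prefers tighter matches; it is not the scoring the PR uses.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not the PR's actual classes.
struct Interval {
    std::size_t start; // index of the first matched character
    std::size_t end;   // one past the last matched character
};

struct TokenMatch {
    Interval span;
    int score;
};

static bool same_char(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) == std::tolower(static_cast<unsigned char>(b));
}

// Greedy, case-insensitive subsequence search for `token` in `target`,
// considering only characters at or after `offset`.
std::optional<Interval> find_subsequence(const std::string &target, const std::string &token, std::size_t offset) {
    std::size_t matched = 0;
    std::size_t start = 0;
    for (std::size_t i = offset; i < target.size() && matched < token.size(); i++) {
        if (!same_char(target[i], token[matched])) {
            continue;
        }
        if (matched == 0) {
            start = i;
        }
        matched++;
        if (matched == token.size()) {
            return Interval{ start, i + 1 };
        }
    }
    return std::nullopt;
}

static bool overlaps_any(const Interval &m, const std::vector<Interval> &taken) {
    for (const Interval &t : taken) {
        if (m.start < t.end && t.start < m.end) {
            return true;
        }
    }
    return false;
}

// Stand-in scoring (assumption): 0 for a gap-free match, lower as the span widens.
static int placeholder_score(const Interval &m, const std::string &token) {
    return static_cast<int>(token.size()) - static_cast<int>(m.end - m.start);
}

std::optional<TokenMatch> best_token_match(const std::string &target, const std::string &token, const std::vector<Interval> &taken) {
    std::optional<TokenMatch> best;
    std::size_t offset = 0;
    while (true) {
        std::optional<Interval> found = find_subsequence(target, token, offset);
        if (!found) {
            break; // No match from this offset: quit.
        }
        if (!overlaps_any(*found, taken)) {
            int score = placeholder_score(*found, token);
            if (!best || score > best->score) {
                best = TokenMatch{ *found, score };
            }
        }
        offset = found->start + 1; // Accepted or rejected, retry just past this match's start.
    }
    return best;
}
```

For the target `spawn_spawner.tscn` and the token `spawner`, the first greedy match begins at index 0 and has to stretch across the underscore to find its `er`. The retry from the next offset finds the unbroken `spawner` at index 6, which wins under the stand-in score. This is why the loop keeps retrying from start + 1 instead of stopping at the first hit.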

Scoring then prioritizes (in rough order of importance):

Exact token matches without breaks
Longer matched sections
Fewer missed query characters
Filename match
Word boundary match
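
One way to picture that ordering is as a weighted sum. The squared match length and the 100-point bonus for exact token matches are both mentioned later in this thread; every other weight below is invented purely for illustration, and `MatchInfo` and `sketch_score` are hypothetical names rather than anything from the PR.

```cpp
// Hypothetical summary of one token match; not the PR's actual data structure.
struct MatchInfo {
    int matched_length;    // number of query characters that were matched
    int missed_chars;      // number of query characters left unmatched
    bool exact;            // token matched as one unbroken run
    bool in_filename;      // match falls in the file name, not the directory part
    bool at_word_boundary; // match starts at a word boundary
};

int sketch_score(const MatchInfo &m) {
    // Longer matched sections: squared length, as described in the thread.
    int score = m.matched_length * m.matched_length;
    if (m.exact) {
        score += 100; // Exact token match without breaks (bonus value from the thread).
    }
    score -= 10 * m.missed_chars; // Assumed weight: fewer missed query characters is better.
    if (m.in_filename) {
        score += 5; // Assumed weight.
    }
    if (m.at_word_boundary) {
        score += 2; // Assumed weight.
    }
    return score;
}
```

With these made-up weights, an exact seven-character match scores 156, while a five-character match that misses two query characters scores 12. The gap shows how a nonlinear length term plus a flat exact-match bonus can separate otherwise similar results by a wide margin.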

> Is it correct that we rank HAT.wav first here?

Results with equal scores are tie-broken first on length and then on their alphanumeric order, which is why HAT.wav is ordered before hat.wav -- I'll add a slight score deduction if the match relies on being case insensitive to break that in favor of the exact match.
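
The tie-break can be sketched as a comparator. This is an illustration under assumptions: `RankedResult`, `ranks_before`, and `rank` are hypothetical names, "length" is taken to mean that the shorter name sorts first, and alphanumeric order is approximated with plain byte-wise string comparison.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical result record for illustration only.
struct RankedResult {
    std::string name;
    int score;
};

// Higher score first; ties go to the shorter name (assumed direction),
// then to byte-wise string order, where 'H' sorts before 'h'.
bool ranks_before(const RankedResult &a, const RankedResult &b) {
    if (a.score != b.score) {
        return a.score > b.score;
    }
    if (a.name.size() != b.name.size()) {
        return a.name.size() < b.name.size();
    }
    return a.name < b.name;
}

std::vector<RankedResult> rank(std::vector<RankedResult> results) {
    std::sort(results.begin(), results.end(), ranks_before);
    return results;
}
```

With equal scores and equal lengths, `HAT.wav` lands ahead of `hat.wav` because uppercase letters compare lower than lowercase ones. Taking even a single point off the result that relied on case-insensitive matching makes the score comparison decide first, so `hat.wav` wins for a lowercase query.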

> Also, is there a way to not match "res://" ?

I hadn't even noticed that! Yeah I can add a way to the fuzzy search code to skip a given prefix, or we can change that part of the updated quick open popup to not include it to begin with.

Also thanks for mentioning the PR; I didn't get an email or notification about it so I would've missed it otherwise. TIL!

a-johnston · 2024-10-18T18:31:10Z

I just updated it to 1) add a slight tie-breaker penalty if it relies on case insensitive matching and 2) ignore res:// by setting a new starting offset value. Ironically, similar to last time, I'll be out of town this weekend so slow updates from me for a bit.
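
The starting-offset idea can be shown in isolation. In this sketch a plain substring search stands in for the fuzzy matcher, and `search_start_offset` and `matches_ignoring_prefix` are hypothetical helpers, not functions from the PR.

```cpp
#include <cstddef>
#include <string>

// Returns the index where searching should begin: just past `prefix` when
// `path` starts with it, otherwise 0.
std::size_t search_start_offset(const std::string &path, const std::string &prefix = "res://") {
    return path.compare(0, prefix.size(), prefix) == 0 ? prefix.size() : 0;
}

// Plain substring search standing in for the fuzzy matcher (assumption).
bool matches_ignoring_prefix(const std::string &path, const std::string &query) {
    return path.find(query, search_start_offset(path)) != std::string::npos;
}
```

With this, the query `res` no longer matches `res://player.tscn`, but still matches `res://resources/icon.png` through the directory name. Passing an offset leaves the stored path untouched, which keeps any highlight positions relative to the full string.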

KoBeWi · 2024-10-18T20:52:01Z

I wonder if FuzzySearch can be used for autocompletion 🤔 We used to have something similar, but it was reverted.
Although for that, it can't be coupled to QuickOpenDialog.

samsface

Awesome stuff @a-johnston. The algorithm works super well.

There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%: a cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding and then removing results that had, like, a score of 3 while the max score is 100.

a-johnston · 2024-10-25T16:45:22Z

> There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%: a cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding and then removing results that had, like, a score of 3 while the max score is 100.

Ah yeah, I was thinking about something similar, but instead considering the future potential score, a la an Advent of Code problem from last(?) year. If you have the branch ready, feel free to PR against https://github.com/a-johnston/godot/tree/fuzzy-search-rebase

samsface · 2024-10-25T18:58:27Z

> > There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%: a cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding and then removing results that had, like, a score of 3 while the max score is 100.

> Ah yeah, I was thinking about something similar, but instead considering the future potential score, a la an Advent of Code problem from last(?) year. If you have the branch ready, feel free to PR against https://github.com/a-johnston/godot/tree/fuzzy-search-rebase

It's already fast enough (imo) so wouldn't want to delay the PR. Let me know if you need help with any coding work to get the PR merged.

a-johnston · 2024-10-25T19:35:55Z

> It's already fast enough (imo) so wouldn't want to delay the PR. Let me know if you need help with any coding work to get the PR merged.

Haha well you piqued my curiosity so I wanted to try it before nuking the benchmark. I settled on a pretty tame early cull criterion: a result is culled only if its score is 1) negative (so, bad and missing query characters) and 2) at least 50 below the current max score. I kept it tame because, although the top results were of course preserved, the combination of match length being squared and a 100 bonus being given to exact token matches meant that a query such as spawner produced a huge delta between spawn.tscn and spawner.tscn, even though to me the former still seems very relevant as a secondary result.

From logging the early/late cull counts, this seems to have the most pronounced effect early when typing the query, when many more results are "valid" but negative; later in the query, most targets are filtered out as invalid matches. In the benchmark with this early cull criterion I saw a ~5% speedup for a synthetic 100k case and a ~3% speedup for a 1k case. In absolute rather than relative terms, the 100k case had an average delta of ~3ms and the 1k case never had a delta greater than a few percent of a ms. That said, the benchmark is pretty biased due to the way I pruned down your tree to 1k lines.
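
The two-part criterion is small enough to sketch directly. `should_cull_early` and `collect_scores` are hypothetical names, and the running-maximum bookkeeping is an assumption about how such a check might be wired into a results loop, not the code that was benchmarked.

```cpp
#include <algorithm>
#include <vector>

// Cull only when the score is negative AND at least `margin` below the
// best score seen so far.
bool should_cull_early(int score, int current_max, int margin = 50) {
    return score < 0 && current_max - score >= margin;
}

// Keeps the scores that survive the early cull, tracking the running
// maximum as it goes (assumed wiring, for illustration).
std::vector<int> collect_scores(const std::vector<int> &scores) {
    std::vector<int> kept;
    bool has_max = false;
    int current_max = 0;
    for (int score : scores) {
        if (has_max && should_cull_early(score, current_max)) {
            continue;
        }
        kept.push_back(score);
        current_max = has_max ? std::max(current_max, score) : score;
        has_max = true;
    }
    return kept;
}
```

Under this rule a weak but positive result, such as a score of 3 against a max of 100, is still kept. Only negative scores far below the leader are dropped early, which is what makes the criterion tame.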

Although the changes are pretty small, I won't include them here. A couple of notes for a future PR that addresses this:

Keep in mind the nonlinear score bonuses and secondary results, not just the top result matching the benchmark's expected result
I've been changing the list item's name to name->set_text(p_candidate.file_path.get_file() + vformat(" (%d)", p_candidate.result->score)); to get a better intuition for the relative scores of displayed results

bruvzg

Label changes look good, string matching seems to work as expected.

a-johnston · 2024-10-28T17:57:49Z

Should I rebase these changes down to fewer total commits or is it fine as is?

AThousandShips · 2024-10-28T18:20:25Z

Yes please squash your commits into one, with a clear commit message with the same title as the PR

Co-authored-by: sam <samsface@gmail.com>

a-johnston · 2024-10-29T01:01:45Z

Not planning on making any more changes unless things are requested here, but just to add visibility:

There is an alternative RichTextLabel implementation in an earlier comment on this PR (#98278), which avoids the Label changes
It would be good (and pretty easy) to add additional editor options for what colors to use in the highlight border/fill. This would especially be useful for a11y / custom theme stuff. If this wasn't already reviewed I probably would add it in here.

akien-mga

Code looks good to me from a cursory review.

I trust that stakeholders have tested the algorithm well, I know there are Opinions™ regarding how fuzzy search should work, and every single change we've done so far brought its wave of discontents. Let's hope this time's the One Algorithm that Satisfies Them All.

Repiteo · 2024-11-10T18:20:58Z

Thanks! Congratulations on your first contribution! 🎉

a-johnston · 2024-11-10T20:18:57Z

Thank you!

> every single change we've done so far brought its wave of discontents. Let's hope this time's the One Algorithm that Satisfies Them All.

Hopefully having added some relevant editor options helps with that :P

a-johnston requested review from a team as code owners October 17, 2024 20:20

KoBeWi added enhancement topic:editor usability labels Oct 17, 2024

KoBeWi removed request for a team October 17, 2024 20:27

KoBeWi added this to the 4.4 milestone Oct 17, 2024

KoBeWi requested a review from a team October 17, 2024 20:28

KoBeWi reviewed Oct 17, 2024
