Add fuzzy string matching to quick open search #98278

Merged
merged 1 commit into from
Nov 10, 2024

@a-johnston a-johnston commented Oct 17, 2024

This is a continuation of #82200 to implement godotengine/godot-proposals#7771. Rebasing the first PR against #56772 was pretty awkward so I ended up flattening the earlier changes with @samsface as a co-author. A bit more info and pre-flattened commits are available in this pr against sam's fork samsface#1.

Functionally this adds optional fuzzy searching and search highlighting in the results:

[Three screenshots]

The highlighting on the grid items feels a bit hacky due to the centered text; I'd appreciate any tips to do it more cleanly. This is also my first time working on a nontrivial C++ codebase, so I'd appreciate any general tips on being more idiomatic, or pointers to anything I'm using incorrectly. Also a few semi-related changes:

  • Fixes a bug where, if adaptive isn't used, on first launch neither result container will be visible. Now it uses project metadata to track the last mode, defaulting to list.
  • Grid items are slightly wider, showing 6 per row with the default window size, in order to show more of the filename.

@a-johnston a-johnston requested review from a team as code owners October 17, 2024 20:20
@KoBeWi KoBeWi removed request for a team October 17, 2024 20:27
@KoBeWi KoBeWi added this to the 4.4 milestone Oct 17, 2024
@KoBeWi KoBeWi requested a review from a team October 17, 2024 20:28
@samsface
Contributor

samsface commented Oct 18, 2024

@a-johnston How does this new algorithm work? There seems to be a lot more to it than the first two approaches. But it works pretty well. Opened a small PR to fix some C++ gotchas but whole thing looks good to me (though didn't look at any of the Dialog code).

Also, is there a way to not match "res://" ?
image

Is it correct that we rank HAT.wav first here?
image

@a-johnston
Contributor Author

@samsface thanks for testing it out! The matching process is similar to the earlier ones, although there are definitely more helpers etc cluttering it up haha. It essentially is doing:

  • Match each token against the target, starting the search at offset 0
  • If a match is found, check whether it overlaps any previously matched section, and reject it if so
  • Otherwise, score the new match and potentially update the top-scoring match
  • Whether the match was accepted or rejected, move the offset from 0 to the match's start + 1 and try again. If there is no match, quit
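
For readers who want to see the loop spelled out, here is a minimal C++17 sketch of the steps above. It is not the code from this PR: `Span`, `match_from`, `overlaps`, and `best_match` are hypothetical names, the matcher is a plain greedy subsequence search, case handling is ignored, and the score (minus the number of gap characters) is a stand-in for the real scoring.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical type for illustration; not the type used in the PR.
struct Span {
    int start; // index of the first matched character
    int end;   // index of the last matched character (inclusive)
    int score; // placeholder score: 0 for a contiguous match, negative with gaps
};

// Greedy subsequence match of `token` in `target`, searching from `offset`.
std::optional<Span> match_from(const std::string &target, const std::string &token, int offset) {
    if (token.empty()) {
        return std::nullopt;
    }
    int start = -1;
    std::size_t pos = static_cast<std::size_t>(offset);
    for (char c : token) {
        std::size_t found = target.find(c, pos);
        if (found == std::string::npos) {
            return std::nullopt;
        }
        if (start < 0) {
            start = static_cast<int>(found);
        }
        pos = found + 1;
    }
    int end = static_cast<int>(pos) - 1;
    int gaps = (end - start + 1) - static_cast<int>(token.size());
    return Span{ start, end, -gaps };
}

// True if the two inclusive index ranges share at least one character.
bool overlaps(const Span &a, const Span &b) {
    return a.start <= b.end && b.start <= a.end;
}

// Tries every candidate match for one token and keeps the best one that
// does not overlap a span already claimed by an earlier token.
std::optional<Span> best_match(const std::string &target, const std::string &token,
        const std::vector<Span> &taken) {
    std::optional<Span> best;
    int offset = 0;
    while (true) {
        std::optional<Span> m = match_from(target, token, offset);
        if (!m) {
            break; // no match: quit
        }
        bool clash = false;
        for (const Span &t : taken) {
            clash = clash || overlaps(*m, t);
        }
        if (!clash && (!best || m->score > best->score)) {
            best = m;
        }
        offset = m->start + 1; // accepted or rejected, retry just past this start
    }
    return best;
}
```

With the target `s_p_a spa` and the token `spa`, the first candidate spans indices 0 to 4 with two gaps, and the retry from offset 1 finds the contiguous match at 6 to 8, which wins on score.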

Scoring then prioritizes (in rough order of importance):

  • Exact token matches without breaks
  • Longer matched sections
  • Fewer missed query characters
  • Filename match
  • Word boundary match
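
As a rough illustration of how such priorities could combine into one number, here is a hedged C++ sketch. The squared match length and the 100 bonus for exact token matches are mentioned later in this thread; every other weight, and the names `MatchInfo` and `score_match`, are assumptions made up for this example rather than values from the PR.

```cpp
// Hypothetical summary of one token match; field names are illustrative only.
struct MatchInfo {
    bool exact;            // token matched contiguously, without breaks
    int matched_length;    // length of the matched section
    int missed_chars;      // query characters that found no match
    bool in_filename;      // match lies in the file name, not the directory
    bool at_word_boundary; // match starts at a word boundary
};

int score_match(const MatchInfo &m) {
    // Squared length and the 100 exact-match bonus follow a later comment
    // in this thread; the remaining weights are assumed for illustration.
    int score = m.matched_length * m.matched_length;
    if (m.exact) {
        score += 100;
    }
    score -= 10 * m.missed_chars; // assumed penalty per missed character
    if (m.in_filename) {
        score += 5; // assumed bonus
    }
    if (m.at_word_boundary) {
        score += 2; // assumed bonus
    }
    return score;
}
```

Because the length term is squared and the exact bonus is large, the two smaller bonuses mostly act as tie-breakers between otherwise similar matches.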

Is it correct that we rank HAT.wav first here?

Results with equal scores are tie-broken first on length and then on their alphanumeric order, which is why HAT.wav is ordered before hat.wav -- I'll add a slight score deduction if the match relies on being case insensitive to break that in favor of the exact match.
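
A small C++ sketch of that ordering, under two assumptions: "length" means the shorter string wins, and "alphanumeric order" is plain byte-wise comparison (where `H` sorts before `h`). `Result`, `ranks_before`, and `rank` are hypothetical names, not the PR's.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical result record for illustration.
struct Result {
    std::string path;
    int score;
};

// Higher score first; ties go to the shorter path (assumed direction),
// then to byte-wise string order.
bool ranks_before(const Result &a, const Result &b) {
    if (a.score != b.score) {
        return a.score > b.score;
    }
    if (a.path.size() != b.path.size()) {
        return a.path.size() < b.path.size();
    }
    return a.path < b.path;
}

// Returns the paths in ranked order.
std::vector<std::string> rank(std::vector<Result> results) {
    std::sort(results.begin(), results.end(), ranks_before);
    std::vector<std::string> ordered;
    for (const Result &r : results) {
        ordered.push_back(r.path);
    }
    return ordered;
}
```

With equal scores, `HAT.wav` ranks ahead of `hat.wav` because uppercase letters compare lower. Taking even one point off the case-insensitive match flips the order, which is the effect the proposed deduction is after.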

Also, is there a way to not match "res://" ?

I hadn't even noticed that! Yeah I can add a way to the fuzzy search code to skip a given prefix, or we can change that part of the updated quick open popup to not include it to begin with.

Also thanks for mentioning the PR; I didn't get an email or notification about it so I would've missed it otherwise. TIL!

@a-johnston
Contributor Author

I just updated it to 1) add a slight tie-breaker penalty if it relies on case insensitive matching and 2) ignore res:// by setting a new starting offset value. Ironically, similar to last time, I'll be out of town this weekend so slow updates from me for a bit.
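
To illustrate the second point, here is a minimal C++ sketch of skipping a known prefix by starting the search at a later offset. `start_offset_for` and `contains_from` are hypothetical helpers, and a plain substring search stands in for the fuzzy matcher.

```cpp
#include <string>

// Returns the index at which matching should begin: just past `prefix`
// when the path starts with it, otherwise 0.
int start_offset_for(const std::string &path, const std::string &prefix = "res://") {
    bool has_prefix = path.compare(0, prefix.size(), prefix) == 0;
    return has_prefix ? static_cast<int>(prefix.size()) : 0;
}

// Stand-in for the real matcher: substring search beginning at `offset`.
bool contains_from(const std::string &path, const std::string &query, int offset) {
    return path.find(query, static_cast<std::size_t>(offset)) != std::string::npos;
}
```

Searching `res://hat.wav` for `res` from offset 0 hits the scheme prefix, while searching from the offset returned by `start_offset_for` does not; a path such as `res://scenes/res.tscn` still matches on its file name.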

@KoBeWi
Member

KoBeWi commented Oct 18, 2024

I wonder if FuzzySearch can be used for autocompletion 🤔 We used to have something similar, but it was reverted.
Although for that, it can't be coupled to QuickOpenDialog.

Contributor

@samsface samsface left a comment

Awesome stuff @a-johnston. The algorithm works super well.

There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%. A cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding then removing results that had like, a score of 3 while the max score is 100.

@a-johnston
Contributor Author

There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%. A cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding then removing results that had like, a score of 3 while the max score is 100.

Ah yeah I was thinking about something similar but instead considering the future potential score a la an advent of code problem from last(?) year. If you have the branch ready, feel free to pr against https://github.com/a-johnston/godot/tree/fuzzy-search-rebase

@samsface
Contributor

samsface commented Oct 25, 2024

There's one more optimization I'd like to apply (but in another PR) that in my testing cut most of the benchmarks by 50%. A cull during the search that discards results early if their score is considerably less than the current max. I noticed we spend a ton of time adding then removing results that had like, a score of 3 while the max score is 100.

Ah yeah I was thinking about something similar but instead considering the future potential score a la an advent of code problem from last(?) year. If you have the branch ready, feel free to pr against https://github.com/a-johnston/godot/tree/fuzzy-search-rebase

It's already fast enough (imo) so wouldn't want to delay the PR. Let me know if you need help with any coding work to get the PR merged.

@a-johnston
Contributor Author

It's already fast enough (imo) so wouldn't want to delay the PR. Let me know if you need help with any coding work to get the PR merged.

Haha well you piqued my curiosity so I wanted to try it before nuking the benchmark. I settled on a pretty tame early cull criterion: a result is culled only if its score is 1) negative (so, bad and missing query characters) and 2) at least 50 away from the current max score. I kept it tame because I noticed that, although the top results were of course preserved, the combination of match length being squared and a 100 bonus being given to exact token matches resulted in a query of e.g. spawner having a huge delta between spawn.tscn and spawner.tscn, although to me the former still seems very relevant as a secondary result.

From logging the early/late cull counts, this seems to have the most pronounced effect early when typing the query, when many more results are "valid" but negative; later in the query, most targets filter out with invalid matches.

In the benchmark with this early cull criterion I saw a ~5% speedup for a synthetic 100k case and a ~3% speedup for a 1k case. In terms of the delta rather than the ratio, the 100k case had an average delta of ~3ms and the 1k case never had a delta greater than a few percent of a ms. That said, the benchmark is pretty biased due to the way I pruned down your tree to 1k lines.
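
The cull rule described here fits in a few lines. The C++ sketch below is only an illustration of that rule: `should_cull_early` and `keep_scores` are hypothetical names, and results are reduced to bare integer scores.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// A result is culled early only if its score is negative AND it sits
// at least 50 below the best score seen so far.
bool should_cull_early(int score, int current_max) {
    // Widen to long long so the subtraction cannot overflow.
    return score < 0 && static_cast<long long>(current_max) - score >= 50;
}

// Streams over scores in arrival order, tracking the running maximum and
// dropping results that meet the early-cull rule at the time they arrive.
std::vector<int> keep_scores(const std::vector<int> &scores) {
    std::vector<int> kept;
    int current_max = std::numeric_limits<int>::min();
    for (int s : scores) {
        if (should_cull_early(s, current_max)) {
            continue;
        }
        kept.push_back(s);
        current_max = std::max(current_max, s);
    }
    return kept;
}
```

Note that the outcome depends on arrival order: a weak result seen before any strong one is kept at this stage and would have to be removed by a later pass, which is presumably what the early and late cull counts distinguish.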

Although the changes are pretty small, I won't include them here. But for a future PR that addresses this:

  • Keep in mind the nonlinear score bonuses and secondary results, not just the top result matching the benchmark's expected result
  • I've been changing the list item's name to name->set_text(p_candidate.file_path.get_file() + vformat(" (%d)", p_candidate.result->score)); to get a better intuition for the relative scores of displayed results

Member

@bruvzg bruvzg left a comment

Label changes look good, string matching seems to work as expected.

@a-johnston a-johnston force-pushed the fuzzy-search-rebase branch 2 times, most recently from 77f72ad to 0532c88 Compare October 28, 2024 08:45
@a-johnston
Contributor Author

Should I rebase these changes down to fewer total commits or is it fine as is?

@AThousandShips
Member

Yes please squash your commits into one, with a clear commit message with the same title as the PR

Co-authored-by: sam <samsface@gmail.com>
@a-johnston
Contributor Author

Not planning on making any more changes unless things are requested here but just to add visibility

  • There is an alternative RichTextLabel implementation linked from an earlier comment on this PR (#98278 (comment)), which avoids the Label changes
  • It would be good (and pretty easy) to add additional editor options for what colors to use in the highlight border/fill. This would especially be useful for a11y / custom theme stuff. If this wasn't already reviewed I probably would add it in here.

Member

@akien-mga akien-mga left a comment

Code looks good to me from a cursory review.

I trust that stakeholders have tested the algorithm well, I know there are Opinions™ regarding how fuzzy search should work, and every single change we've done so far brought its wave of discontents. Let's hope this time's the One Algorithm that Satisfies Them All.

@Repiteo Repiteo merged commit 63838c9 into godotengine:master Nov 10, 2024
20 checks passed
@Repiteo
Contributor

Repiteo commented Nov 10, 2024

Thanks! Congratulations on your first contribution! 🎉

@a-johnston
Contributor Author

Thank you!

every single change we've done so far brought its wave of discontents. Let's hope this time's the One Algorithm that Satisfies Them All.

Hopefully having added some relevant editor options helps with that :P
