Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize String.similarity by avoiding allocation for bigrams. #100041

Merged
merged 1 commit into from
Dec 10, 2024

Conversation

Ivorforce
Copy link
Contributor

@Ivorforce Ivorforce commented Dec 5, 2024

The logic is the same; there's just a lot less copying than before.

Relevant similarity benchmark goes down from 530 to 80 (6.6x increase in speed).
newplot1

I also tested longer strings. In this example, the benchmark goes down from 48019us to 2417us (20x increase in speed):

	String s1 = "Lorem ipsum anyone?";
	String s2 = "Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur?";

	auto t0 = std::chrono::high_resolution_clock::now();
	for (int i = 0; i < 1000; i ++) {
		// Test
		int x = s1.similarity(s2);
	}
	auto t1 = std::chrono::high_resolution_clock::now();
	std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";

@Ivorforce Ivorforce requested a review from a team as a code owner December 5, 2024 12:26
@Calinou Calinou added this to the 4.x milestone Dec 5, 2024
Copy link
Contributor

@aaronp64 aaronp64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Speed difference makes sense, and did some basic tests to confirm same results from old and new code:

func test_similarity():
	print("test1".similarity("test2"))
	print("test12".similarity("test23"))
	print("abc12".similarity("def12"))
	print("12abc".similarity("12def"))
	print("abc".similarity("def"))

Copy link
Contributor

@kiroxas kiroxas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely better than all the allocations in bigrams !

@Ivorforce Ivorforce force-pushed the optimize-string-similarity branch from d694720 to 875b483 Compare December 8, 2024 12:28
Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Faster, easier to read, avoids allocations. All very nice :)

@clayjohn clayjohn modified the milestones: 4.x, 4.4 Dec 9, 2024
@Repiteo Repiteo merged commit ba66c47 into godotengine:master Dec 10, 2024
20 checks passed
@Repiteo
Copy link
Contributor

Repiteo commented Dec 10, 2024

Thanks!

@Ivorforce Ivorforce deleted the optimize-string-similarity branch December 10, 2024 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants