Improve BlurHashDecoder performance #4515
Fairphone 4 (Android 13)
😲 🚀
The current Tusky code is like the nocache variant, but a bit better since it doesn't allocate cache memory when no cache is used. For a fair comparison, you should copy the Tusky class into the Benchmark project. Also, I just realized I can add an extra optimization, so don't merge this quite yet.
Okay, I made it a bit faster (and with one fewer allocation) for square images with the same number of colors in both dimensions.
Amazing work
By Christophe Beyls in tuskyapp/Tusky#4515. Their commit notes:

Improve the performance of `BlurHashDecoder` while also reducing memory allocations.

- Precompute cosines tables before composing the image so each cosine value is only computed once.
- Compute cosines tables once if both are identical (for square images with the same number of colors in both dimensions).
- Store colors in a one-dimension array instead of a two-dimension array to reduce memory allocations.
- Use a simple `String.indexOf()` to find the index of a Base83 char, which is both faster and needs less memory than a `HashMap` thanks to better locality and no boxing of chars.
- No cache is used, so computations may be performed in parallel on background threads without the need for synchronization which limits throughput.
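The cosine-table and flat-array points in the notes above can be illustrated with a minimal sketch. This is not the code from the pull request: the names `cosineTable` and `composeChannel` are hypothetical, and the example is simplified to a single color channel, skipping Base83 decoding and the sRGB conversion.

```kotlin
import kotlin.math.PI
import kotlin.math.cos

// Hypothetical helper: table[p * numComp + k] = cos(PI * p * k / size),
// so each cosine value is computed exactly once per decode.
fun cosineTable(size: Int, numComp: Int): FloatArray {
    val table = FloatArray(size * numComp)
    for (p in 0 until size) {
        for (k in 0 until numComp) {
            table[p * numComp + k] = cos(PI * p * k / size).toFloat()
        }
    }
    return table
}

// Hypothetical, single-channel composition step.
// `colors` is one-dimensional: component (i, j) is stored at j * numCompX + i.
fun composeChannel(
    width: Int,
    height: Int,
    numCompX: Int,
    numCompY: Int,
    colors: FloatArray
): FloatArray {
    val cosX = cosineTable(width, numCompX)
    // Square image with the same component count in both dimensions:
    // the two tables are identical, so the first one is reused.
    val cosY = if (width == height && numCompX == numCompY) {
        cosX
    } else {
        cosineTable(height, numCompY)
    }
    val out = FloatArray(width * height)
    for (y in 0 until height) {
        for (x in 0 until width) {
            var sum = 0f
            for (j in 0 until numCompY) {
                val cy = cosY[y * numCompY + j]
                for (i in 0 until numCompX) {
                    sum += colors[j * numCompX + i] * cosX[x * numCompX + i] * cy
                }
            }
            out[y * width + x] = sum
        }
    }
    return out
}
```

Without the tables, the inner loop would call `cos()` for every pixel and every component pair; with them, the work drops to `width * numCompX + height * numCompY` cosine evaluations, or half that in the square case.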
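The Base83 lookup described in the commit notes could look roughly like the sketch below. The alphabet is the one defined by the BlurHash format; the function name `decodeBase83` and its signature are assumptions made for illustration, not necessarily what the pull request uses.

```kotlin
// Base83 alphabet of the BlurHash format. The position of a character
// in this string is its digit value.
const val BASE83_CHARS =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#\$%*+,-.:;=?@[]^_{|}~"

// Hypothetical helper: decodes the Base83 digits in str[from, to) into an Int.
fun decodeBase83(str: String, from: Int = 0, to: Int = str.length): Int {
    var result = 0
    for (i in from until to) {
        // Linear scan over an 83-character string: no boxed Char keys,
        // no hash computation, and the whole alphabet is contiguous in memory.
        val index = BASE83_CHARS.indexOf(str[i])
        require(index >= 0) { "Invalid Base83 character: ${str[i]}" }
        result = result * 83 + index
    }
    return result
}
```

For example, `decodeBase83("A")` yields 10 and `decodeBase83("10")` yields 83. A `HashMap<Char, Int>` would give the same results, but each lookup would box the `Char` and chase a pointer into the map's table.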
This pull request aims to dramatically improve the performance of `BlurHashDecoder` while also reducing its memory allocations. Among other changes, it uses a simple `String.indexOf()` to find the index of a Base83 char, which is both faster and needs less memory than a `HashMap` thanks to better locality and no boxing of chars.
Benchmarks
Simple: 4x4 colors, 32x32 pixels output. (This is what Mastodon and Tusky currently use.)
Complex: 9x9 colors, 256x256 pixels output.
Pixel 7 (Android 14)
Nexus 5 (Android 6)
Conclusion: The new implementation is 3 times faster than the old one for the current usage and up to 9 times faster if we decide to increase the BlurHash quality in the future.
The source code of the benchmark comparing the original untouched Kotlin implementation to the new one can be found here.