Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiment with ImageSearchAll with tolerance to color variation #36

Merged

Conversation

wind0204
Copy link
Contributor

@wind0204 wind0204 commented Dec 14, 2023

I didn't clean the comments, nor build 32-bit binaries, Please take a look into this pull request.

Best regards.

@iseahound
Copy link
Owner

I've only read through the code and it seems like it should work. I'll have to do some testing later. Could you change the line endings for ImagePut back? They got switched in your latest commit

@wind0204
Copy link
Contributor Author

wind0204 commented Dec 14, 2023

I've only read through the code and it seems like it should work. I'll have to do some testing later. Could you change the line endings for ImagePut back? They got switched in your latest commit

Aha! That was why the diff read so crazy on Github.

@iseahound
Copy link
Owner

Can you explain the algorithm for finding the focused pixel? I'm glad you figured out why this is important, for a sprite the first few pixels are invisible, so the imagesearch would run really slowly.

Should this be s + w * h?
unsigned int * last_pixel = s + w + h * w;

I'll move the algorithm for finding the focused pixel out of the ImageSearch (because there can be many ways to find a focused pixel), and will be sure to credit you once I figure out what yours does.

@wind0204
Copy link
Contributor Author

wind0204 commented Dec 14, 2023

Can you explain the algorithm for finding the focused pixel? I'm glad you figured out why this is important, for a sprite the first few pixels are invisible, so the imagesearch would run really slowly.

The algorithm came from no good reasoning, just thought it would be nice to locate a pixel a little bit far from the top-left corner that is not transparent, and use it as a quick reference.

Should this be s + w * h? unsigned int * last_pixel = s + w + h * w;

That looks simple and readable, but then the loop will stop right before the bottommost line.

@iseahound
Copy link
Owner

Okay, I'll merge but expect major changes to your code.

@iseahound iseahound merged commit 2fbbe63 into iseahound:master Dec 14, 2023
@iseahound
Copy link
Owner

iseahound commented Dec 14, 2023

I'm not sure if your code even runs, but the logic looks like it's there. ImageSearch doesn't seem to work, nor does ImageSearchAll.

Reasoning behind some of the changes:

  • The code to find a focused pixel was moved because it is a different problem than ImageSearch
  • I'm not sure why the focused pixel was being stored in the result array
  • Not sure if there's a difference between storing x | y << 16 or doing y = offset / width and x = offset % width
    Finally, it's faster to have separate C code for the 0 variation case. Each case has to be individually vectorized anyways for a 2x or more speedup.

@iseahound
Copy link
Owner

Benchmarks!

image

Line 1 - Original ImageSearch (No variation)
Line 2 - Your code (No variation) https://godbolt.org/z/qGexdGqMn
Line 3 - Your code (variation = 1)

@wind0204
Copy link
Contributor Author

wind0204 commented Dec 15, 2023

I'm not sure if your code even runs, but the logic looks like it's there. ImageSearch doesn't seem to work, nor does ImageSearchAll.

It ran on my wife's Windows system, with the game macro I made...

* The code to find a focused pixel was moved because it is a different problem than ImageSearch

I guess that's better, it adds the maintainability and organized look to the code base.

* I'm not sure why the focused pixel was being stored in the result array

I did it to prevent another search for the focused pixel in a repeatedly used sprite, just in case the sprite is big and bizarre especially with a lot of transparent part in the middle.

* Not sure if there's a difference between storing `x | y << 16` or doing `y = offset / width` and `x = offset % width`
  Finally, it's faster to have separate C code for the 0 variation case. Each case has to be individually vectorized anyways for a 2x or more speedup.

Awesome! Have you managed to vectorize it without using inline assembly? which SIMD instruction did it use?

@iseahound
Copy link
Owner

iseahound commented Dec 15, 2023

Oh I forgot to give context for the benchmarks. Line 2 doesn't matter, it will always default to the dedicated ImageSearch for no variation. It's running at 30% of the "max speed" which seems good to me. Thanks!

@iseahound
Copy link
Owner

iseahound commented Dec 15, 2023

Awesome! Have you managed to vectorize it without using inline assembly?

Could you expand on your cmov comment? I saw it pop up a few times with or without div, I'm not too familiar with assembly here.

which SIMD instruction did it use?

Ah not yet! I want to get the specific optimizations in place without using SIMD. Vectorized code is pretty brittle, meaning it's hard to modify after it's written.

@wind0204
Copy link
Contributor Author

wind0204 commented Dec 15, 2023

Oh I forgot to give context for the benchmarks. Line 2 doesn't matter, it will always default to the dedicated ImageSearch for no variation. It's running at 30% of the "max speed" which seems good to me.

Is the dedicated ImageSearch coming from the code I wrote? The number was horribly low. 14.27 / 137.23 = 10.4%

Awesome! Have you managed to vectorize it without using inline assembly?

Could you expand on your cmov comment? I saw it pop up a few times with or without div, I'm not too familiar with assembly here.

According to godbolt.org, it seems that GCC 11.4 with at minimum -O1 uses CMOV with my code and your code uses DIV, and I read on the other day that DIV is not good at wide parallel execution on stackoverflow.com. I actually don't have any idea if CMP + CMOV can outperform DIV or not. I have to admit that I just believed without knowing.

@wind0204
Copy link
Contributor Author

wind0204 commented Dec 15, 2023

* Not sure if there's a difference between storing `x | y << 16` or doing `y = offset / width` and `x = offset % width`

You must be talking about the line with "x << 16 | y" to give the result of search for focus pixel back to the function caller. the code is basically compressing the coordinates into 4 bytes, I could've used 8 bytes (which is the size of a pointer in a modern system) though.

@iseahound
Copy link
Owner

Feel free to continue the conversation at: #35

Repository owner locked as resolved and limited conversation to collaborators Dec 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants