Masks are broken. #146
Comments
No, it is not that the masks are broken. There is just no such thing as a "mask with all zeros".
The mask with all zeros is just an illustrative example to show the issue with the mask application logic. You can try with other kinds of masks and verify that the results are not as expected.
There are tests that make sure the masks work as expected. Many models wouldn't work if the masks were broken.
There are no tests that check this directly. Note that it is quite possible for the models to still produce plausible-looking output even if the masks are applied incorrectly.
Again, you'll need to provide an example to make this discussion concrete. The supported models are tested against HuggingFace Transformers with the validation code in the example folder, and we do observe that missing masks have a huge impact on the numerical outputs.
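For a standalone sense of why a correctly applied mask should change the numbers so much, here is a minimal sketch in plain Julia. It is not the package's internal code path; the score matrix, the `keep` pattern, and the `softmax` helper are all made up for illustration. A proper mask drives the attention weights of masked positions to exactly zero, whereas merely adding a 0/1 Boolean to the scores does not:

```julia
# Minimal illustration (plain Julia, no package code): compare proper -Inf masking
# against naively adding a 0/1 Boolean mask to the attention scores.
function softmax(x; dims=1)
    e = exp.(x .- maximum(x; dims=dims))
    return e ./ sum(e; dims=dims)
end

scores = randn(Float32, 4, 4)              # hypothetical raw attention scores (key × query)
keep   = [i <= j for i in 1:4, j in 1:4]   # positions we want to attend to (causal-style pattern)

unmasked = softmax(scores; dims=1)
masked   = softmax(ifelse.(keep, scores, -Inf32); dims=1)  # proper masking: -Inf before softmax
additive = softmax(scores .+ keep; dims=1)                 # just adding the Boolean mask to the scores

sum(masked[.!keep])    # == 0: masked positions get exactly zero attention weight
sum(additive[.!keep])  # > 0: "masked" positions still receive substantial weight
```

The `-Inf`-before-softmax form is the standard reference behaviour for an attention mask; the sketch just makes the contrast with a plain `+` explicit.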
This problem could either be in this repo or NeuralAttentionlib, but I'm posting it here since the example using Transformers.jl is easier to run and more informative.
Basically, the implementation of all attention masks is broken, at least when using the default setup. For example, if one makes two masks, one with all `true` and another with all `false`, one gets exactly the same output! You can verify that the output of the last two lines here is the same.

My hypothesis is that this is due to the default `GenericMaskOp` being `+`. Either way, it makes the package unusable.
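The snippet linked as "here" is not included in this thread, so the following is only a guessed reconstruction of the comparison being described. It assumes NeuralAttentionlib's `GenericAttenMask` wrapper and `naive_qkv_attention` accept a Boolean mask and inputs of the shapes used below:

```julia
using NeuralAttentionlib  # the masks are implemented here, not in Transformers.jl itself

# Hypothetical shapes: feature dim 64, sequence length 10, no batch dimension.
q = randn(Float32, 64, 10)
k = randn(Float32, 64, 10)
v = randn(Float32, 64, 10)

all_true  = NeuralAttentionlib.GenericAttenMask(trues(10, 10))   # keep everything
all_false = NeuralAttentionlib.GenericAttenMask(falses(10, 10))  # mask everything (degenerate, but illustrative)

out_true  = NeuralAttentionlib.naive_qkv_attention(q, k, v, all_true)
out_false = NeuralAttentionlib.naive_qkv_attention(q, k, v, all_false)

out_true ≈ out_false  # reported to be true, even though the two masks are opposites
```

As noted in the replies above, the all-`false` mask is a degenerate case; comparing, say, `CausalMask()` against no mask at all is a less ambiguous way to check whether the mask actually changes the output.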