Sorry for the late reply; I missed your comment somehow. But yes, this issue did come up often. It was a collection of science articles, which quite often use a less-than sign followed by a letter as a variable. I ultimately worked around it by regex-replacing the angle brackets of HTML tags, escaping any remaining angle brackets, then undoing the replacement. Not ideal (and it would probably produce errors in general), but it worked for my specific situation.
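For anyone wanting to try the same workaround, here is a minimal sketch of the idea using only the standard library. It is not the code I actually ran: the `ALLOWED_TAGS` list, the placeholder format, and the function name `escape_stray_brackets` are all assumptions made up for illustration, and it stashes whole tags rather than just their brackets.

```python
import re

# Hypothetical allow-list; adjust to whatever tags your documents really use.
ALLOWED_TAGS = ("a", "b", "i", "em", "strong", "p")

# An opening or closing tag for an allowed name, with optional attributes.
TAG_RE = re.compile(
    r"</?(?:%s)(?:\s[^<>]*)?>" % "|".join(ALLOWED_TAGS),
    re.IGNORECASE,
)

# Placeholder left behind while a real tag is stashed away.
PLACEHOLDER_RE = re.compile(r"\x00(\d+)\x00")


def escape_stray_brackets(text):
    """Escape angle brackets that are not part of an allowed tag."""
    saved = []

    def stash(match):
        # Step 1: swap each real tag for a numbered placeholder.
        saved.append(match.group(0))
        return "\x00%d\x00" % (len(saved) - 1)

    protected = TAG_RE.sub(stash, text)
    # Step 2: whatever brackets remain are plain text, so escape them.
    escaped = protected.replace("<", "&lt;").replace(">", "&gt;")
    # Step 3: undo the replacement and put the real tags back.
    return PLACEHOLDER_RE.sub(lambda m: saved[int(m.group(1))], escaped)


print(escape_stray_brackets("if x <t <a>link</a> then"))
```

The result can then be passed to the sanitizer as usual. The sketch assumes the input contains no NUL bytes and that attribute values never contain angle brackets, which is part of why this approach would probably fail in general.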
This is a really complicated case. I don't know if we're ever going to get this right.
<t <a></a> <a></a>
^
I think the parser moves forward until it hits an invalid-character-in-attribute-name at ^. Then it switches from parsing a tag to parsing character data and slurps up <t <a> as character data:
{'type': Characters, 'data': '<t <a>'}
Bleach will then escape that. Then it slurps up the </a> and I think it drops it. Maybe because it's a closing tag with no matching opening tag. Then it moves on as you would expect.
I'll have to think about how Bleach can figure out that <t followed by an open tag should be treated as Characters and then a StartTag such that the internal state is correct.
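To make that idea concrete, here is a rough sketch of one possible heuristic, written as a standalone pre-processing pass rather than as a change to the tokenizer. It is not how Bleach works today and not a proposed patch; the function name `escape_unclosed_lt` and the rule itself (a `<` that runs into another `<` or the end of input before any `>` is treated as character data) are assumptions for illustration only.

```python
import re

# A "<" that reaches another "<" (or the end of the input) before any ">"
# cannot be the start of a complete tag, so treat it as character data.
UNCLOSED_LT_RE = re.compile(r"<(?=[^<>]*(?:<|$))")


def escape_unclosed_lt(text):
    """Escape each "<" that never closes before the next "<" or end of input."""
    return UNCLOSED_LT_RE.sub("&lt;", text)


print(escape_unclosed_lt("<t <a></a> <a></a>"))
```

On the example above this leaves both `<a></a>` pairs intact and only escapes the leading `<t`. The heuristic breaks when an attribute value legitimately contains `<`, so it only illustrates the Characters-then-StartTag split and is not a general fix.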
I'll keep this open in case someone else wants to tinker with it and/or I find some more free time, but I don't think I'm going to get further with it today.
Describe the bug
An unclosed left angle bracket (used in a "less than" context) causes the tag immediately following it to be sanitized.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Additional context
The above error does not occur for non-alpha characters: