Incorrect match found when capturing groups are not used #93

Closed
EricEdens opened this issue Aug 14, 2019 · 0 comments · Fixed by #107
This test passes when using java.util.regex, but fails with re2j:

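  // Pattern and Matcher are com.google.re2j here; with java.util.regex the test passes.
  @Test
  public void test() {
    // Same alternation, with and without a capturing group around the first branch.
    Pattern p1 = Pattern.compile("(a.*?c)|a.*?b");
    Pattern p2 = Pattern.compile("a.*?c|a.*?b");

    Matcher m1 = p1.matcher("abc");
    assertTrue(m1.find());
    Matcher m2 = p2.matcher("abc");
    assertTrue(m2.find());

    // Both should return "abc"; re2j returns "ab" for p2.
    assertEquals(m1.group(), m2.group());
  }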

Both expressions should match abc. The second one only matches ab.

The same bug was fixed in Go: golang/go#13812

sjamesr self-assigned this Jun 2, 2020
sjamesr added a commit to sjamesr/re2j that referenced this issue Jun 2, 2020
In the past, `a.*?c|a.*?b` was factored to `a.*?[bc]`. Thus, given
"abc" as its input string, the automaton would consume "ab" and
then stop (when unanchored) whereas it should consume all of "abc"
as per leftmost semantics.

This fix is ported from Go's regex impl,
https://go-review.googlesource.com/c/go/+/18357/.

Fixes google#93.
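
To illustrate why that factoring changes the result, here is a minimal sketch that runs both forms through java.util.regex, used only as a reference for the expected leftmost-first behavior. The class name FactoringDemo and the helper firstMatch are made up for this example and are not part of re2j.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FactoringDemo {
  // Returns the text of the first match of regex in input, or null if there is none.
  static String firstMatch(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    // Original alternation: the first branch a.*?c is preferred and can reach "abc".
    System.out.println(firstMatch("a.*?c|a.*?b", "abc")); // prints "abc"

    // Factored form: the lazy .*? stops as soon as [bc] can match, which is at 'b'.
    System.out.println(firstMatch("a.*?[bc]", "abc")); // prints "ab"
  }
}
```

The two patterns accept the same set of strings, but they pick different matches inside "abc", so the rewrite is not safe when branch priority matters.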
sjamesr added a commit to sjamesr/re2j that referenced this issue Jun 3, 2020
sjamesr added a commit that referenced this issue Jun 3, 2020
In the past, `a.*?c|a.*?b` was factored to `a.*?[bc]`. Thus, given
"abc" as its input string, the automaton would consume "ab" and
then stop (when unanchored) whereas it should consume all of "abc"
as per leftmost semantics.

This fix is ported from Go's regex impl,
https://go-review.googlesource.com/c/go/+/18357/.

Fixes #93.