Incorrect behavior for Pattern::split #131
Comments
Interesting, RE2J and java.util.regex evidently differ on how to handle leading zero-length matches. I guess we should bring RE2J into line. I'd appreciate a pull request if it isn't too much trouble for you. Thanks for reporting this.
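For readers who want to see the java.util.regex side of this difference, here is a minimal sketch. It only exercises the JDK (8 or later), not RE2J, and the class and method names are made up for illustration. It shows that a zero-length match at the very start of the input does not produce a leading empty string, while a non-empty match at the start still does.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; it only calls java.util.regex, not RE2J.
class LeadingEmptyMatchDemo {
    // Thin wrapper around the JDK's Pattern.split(CharSequence).
    static String[] splitJdk(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        // The empty pattern matches with zero length at index 0;
        // since JDK 8 that match does not yield a leading "".
        System.out.println(Arrays.toString(splitJdk("", "abc")));

        // A non-empty match at index 0 still yields a leading "".
        System.out.println(Arrays.toString(splitJdk(",", ",a,b")));
    }
}
```

Running the same inputs through com.google.re2j.Pattern on an affected version should make any discrepancy visible.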
Fixes google#131.

This change modifies Pattern.split to omit a leading empty match. This behavior was specified in JDK8 and brings RE2/J split into line with more recent JDK implementations.

Furthermore, the split function no longer needs to determine the number of matches before assembling the result. The upshot is that the number of find() calls is halved in many cases. The benchmark in the previous change shows a significant improvement.

Reference impl (JDK):
BenchmarkSplit.benchmarkSplit JDK avgt 5 14.217 ± 0.410 us/op

RE2J (before):
BenchmarkSplit.benchmarkSplit RE2J avgt 5 95.807 ± 6.737 us/op

RE2J (after):
BenchmarkSplit.benchmarkSplit RE2J avgt 5 49.092 ± 0.717 us/op
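The single-pass idea described in this commit message can be sketched roughly as follows. This is not RE2/J's actual code: it uses java.util.regex.Matcher as a stand-in, the class and method names are hypothetical, and the limit parameter is left out (it behaves like limit 0, so trailing empty strings are dropped). Each find() result is appended as it is found, so no separate counting pass is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a one-pass split; not the RE2/J implementation.
class SinglePassSplit {
    static String[] split(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        List<String> parts = new ArrayList<>();
        int last = 0; // end of the previous match that was consumed

        while (m.find()) {
            if (last == 0 && m.end() == 0) {
                // Zero-length match at the start of the input:
                // skip it so that no leading "" is emitted.
                continue;
            }
            parts.add(input.substring(last, m.start()));
            last = m.end();
        }

        // No usable match at all: the result is the whole input.
        if (parts.isEmpty()) {
            return new String[] {input};
        }

        // Remaining tail after the final match.
        parts.add(input.substring(last));

        // Drop trailing empty strings (limit == 0 semantics).
        int n = parts.size();
        while (n > 0 && parts.get(n - 1).isEmpty()) {
            n--;
        }
        return parts.subList(0, n).toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] result = split(Pattern.compile(""), "abc");
        System.out.println(String.join("|", result));
    }
}
```

Collecting into a list trades a little allocation for not having to run the matcher twice, which is consistent with the halved find() count mentioned above.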
@adonovan would you be able to take a look at #138? Unfortunately this change causes RE2J's Pattern.split method to be textually quite different from Go's. Also, it changes the behavior of split to omit empty matches, which brings it into line with
How to reproduce
Versions
java 11.0.10
re2j 1.5
Should I try for a PR?