@@ -185,37 +185,36 @@ A regular expression program is essentially a sequence of opcodes produced by
the compiler plus various facts about the regular expression (such as whether
it is anchored, its capture names, etc.).

- ### The regex! macro (or why `regex::internal` exists)
-
- The `regex!` macro is defined in the `regex_macros` crate as a compiler plugin,
- which is maintained in this repository. The `regex!` macro compiles a regular
- expression at compile time into specialized Rust code.
-
- The `regex!` macro was written when this library was first conceived and
- unfortunately hasn't changed much since then. In particular, it encodes the
- entire Pike VM into stack allocated space (no heap allocation is done). When
- `regex!` was first written, this provided a substantial speed boost over
- so-called "dynamic" regexes compiled at runtime, and in particular had much
- lower overhead per match. This was because the only matching engine at the
- time was the Pike VM. The addition of other matching engines has inverted
- the relationship; the `regex!` macro is almost never faster than the dynamic
- variant. (In fact, it is typically substantially slower.)
-
- In order to build the `regex!` macro this way, it must have access to some
- internals of the regex library, which is in a distinct crate. (Compiler plugins
- must be part of a distinct crate.) Namely, it must be able to compile a regular
- expression and access its opcodes. The necessary internals are exported as part
- of the top-level `internal` module in the regex library, but is hidden from
- public documentation. In order to present a uniform API between programs build
- by the `regex!` macro and their dynamic analoges, the `Regex` type is an enum
- whose variants are hidden from public documentation.
-
- In the future, the `regex!` macro should probably work more like Ragel, but
- it's not clear how hard this is. In particular, the `regex!` macro should be
- able to support all the features of dynamic regexes, which may be hard to do
- with a Ragel-style implementation approach. (Which somewhat suggests that the
- `regex!` macro may also need to grow conditional execution logic like the
- dynamic variants, which seems rather grotesque.)
+ ### The regex! macro
+
+ The `regex!` macro no longer exists. It was developed in a bygone era as a
+ compiler plugin during the infancy of the regex crate. Back then, the only
+ matching engine in the crate was the Pike VM. The `regex!` macro was, itself,
+ also a Pike VM. The only advantages it offered over the dynamic Pike VM that
+ was built at runtime were the following:
+
+ 1. Syntax checking was done at compile time. Your Rust program wouldn't
+    compile if your regex didn't compile.
+ 2. Reduction of overhead that was proportional to the size of the regex.
+    For the most part, this overhead consisted of heap allocation, which
+    was nearly eliminated in the compiler plugin.
+
+ The main takeaway here is that the compiler plugin was a marginally faster
+ version of a slow regex engine. As the regex crate evolved, it grew other regex
+ engines (DFA, bounded backtracker) and sophisticated literal optimizations.
+ The regex macro didn't keep pace, and it therefore became (dramatically) slower
+ than the dynamic engines. The only reason left to use it was for the compile
+ time guarantee that your regex is correct. Fortunately, Clippy (the Rust lint
+ tool) has a lint that checks your regular expression validity, which mostly
+ replaces that use case.
+
+ Additionally, the regex compiler plugin stopped receiving maintenance. Nobody
+ complained. At that point, it seemed prudent to just remove it.
+
+ Will a compiler plugin be brought back? The future is murky, but there is
+ definitely an opportunity there to build something that is faster than the
+ dynamic engines in some cases. But it will be challenging! As of now, there
+ are no plans to work on this.

## Testing
@@ -236,7 +235,6 @@ the AT&T test suite) and code generate tests for each matching engine. The
approach we use in this library is to create a Cargo.toml entry point for each
matching engine we want to test. The entry points are:

- * `tests/test_plugin.rs` - tests the `regex!` macro
* `tests/test_default.rs` - tests `Regex::new`
* `tests/test_default_bytes.rs` - tests `bytes::Regex::new`
* `tests/test_nfa.rs` - tests `Regex::new`, forced to use the NFA
@@ -261,10 +259,6 @@ entry points, it can take a while to compile everything. To reduce compile
times slightly, try using `cargo test --test default`, which will only use the
`tests/test_default.rs` entry point.

- N.B. To run tests for the `regex!` macro, use:
-
-     cargo test --manifest-path regex_macros/Cargo.toml
-


## Benchmarking

@@ -284,7 +278,6 @@ separately from the main regex crate.
Benchmarking follows a similarly wonky setup as tests. There are multiple entry
points:

- * `bench_rust_plugin.rs` - benchmarks the `regex!` macro
* `bench_rust.rs` - benchmarks `Regex::new`
* `bench_rust_bytes.rs` - benchmarks `bytes::Regex::new`
* `bench_pcre.rs` - benchmarks PCRE