From 1d38d39dd7d62e4075e44bd94ff09a78ffce3ae0 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Mon, 23 Sep 2024 12:56:47 -0400 Subject: [PATCH 1/4] Add idenfitiers to mbe and proc-macro --- src/macros-by-example.md | 90 ++++++++++++++++++++++++++++++++++++---- src/procedural-macros.md | 46 ++++++++++++++++++++ 2 files changed, 127 insertions(+), 9 deletions(-) diff --git a/src/macros-by-example.md b/src/macros-by-example.md index e95cd2e64..93320f905 100644 --- a/src/macros-by-example.md +++ b/src/macros-by-example.md @@ -1,5 +1,8 @@ # Macros By Example +r[macro.decl] + +r[macro.decl.syntax] > **Syntax**\ > _MacroRulesDefinition_ :\ >    `macro_rules` `!` [IDENTIFIER] _MacroRulesDef_ @@ -39,6 +42,7 @@ > _MacroTranscriber_ :\ >    [_DelimTokenTree_] +r[macro.decl.intro] `macro_rules` allows users to define syntax extension in a declarative way. We call such extensions "macros by example" or simply "macros". @@ -51,10 +55,15 @@ items), types, or patterns. ## Transcribing +r[macro.decl.transcription] + +r[macro.decl.transcription.intro] When a macro is invoked, the macro expander looks up macro invocations by name, and tries each macro rule in turn. It transcribes the first successful match; if -this results in an error, then future matches are not tried. When matching, no -lookahead is performed; if the compiler cannot unambiguously determine how to +this results in an error, then future matches are not tried. + +r[macro.decl.transcription.lookahead] +When matching, no lookahead is performed; if the compiler cannot unambiguously determine how to parse the macro invocation one token at a time, then it is an error. In the following example, the compiler does not look ahead past the identifier to see if the following token is a `)`, even though that would allow it to parse the @@ -68,6 +77,7 @@ macro_rules! ambiguity { ambiguity!(error); // Error: local ambiguity ``` +r[macro.decl.transcription.syntax] In both the matcher and the transcriber, the `$` token is used to invoke special behaviours from the macro engine (described below in [Metavariables] and [Repetitions]). Tokens that aren't part of such an invocation are matched and @@ -78,6 +88,8 @@ instance, the matcher `(())` will match `{()}` but not `{{}}`. The character ### Forwarding a matched fragment +r[macro.decl.transcription.fragment] + When forwarding a matched fragment to another macro-by-example, matchers in the second macro will see an opaque AST of the fragment type. The second macro can't use literal tokens to match the fragments in the matcher, only a @@ -116,9 +128,14 @@ foo!(3); ## Metavariables +r[macro.decl.meta] + +r[macro.decl.meta.intro] In the matcher, `$` _name_ `:` _fragment-specifier_ matches a Rust syntax -fragment of the kind specified and binds it to the metavariable `$`_name_. Valid -fragment specifiers are: +fragment of the kind specified and binds it to the metavariable `$`_name_. + +r[macro.decl.meta.specifier] +Valid fragment specifiers are: * `item`: an [_Item_] * `block`: a [_BlockExpression_] @@ -136,18 +153,23 @@ fragment specifiers are: * `vis`: a possibly empty [_Visibility_] qualifier * `literal`: matches `-`?[_LiteralExpression_] +r[macro.decl.meta.transcription] In the transcriber, metavariables are referred to simply by `$`_name_, since the fragment kind is specified in the matcher. Metavariables are replaced with -the syntax element that matched them. The keyword metavariable `$crate` can be -used to refer to the current crate; see [Hygiene] below. Metavariables can be +the syntax element that matched them. + +r[macro.decl.meta.dollar-crate] +The keyword metavariable `$crate` can be used to refer to the current crate; see [Hygiene] below. Metavariables can be transcribed more than once or not at all. +r[macro.decl.meta.expr-underscore] For reasons of backwards compatibility, though `_` [is also an expression][_UnderscoreExpression_], a standalone underscore is not matched by the `expr` fragment specifier. However, `_` is matched by the `expr` fragment specifier when it appears as a subexpression. For the same reason, a standalone [const block] is not matched but it is matched when appearing as a subexpression. +r[macro.decl.meta.edition2021] > **Edition differences**: Starting with the 2021 edition, `pat` fragment-specifiers match top-level or-patterns (that is, they accept [_Pattern_]). > > Before the 2021 edition, they match exactly the same fragments as `pat_param` (that is, they accept [_PatternNoTopAlt_]). @@ -156,22 +178,31 @@ For the same reason, a standalone [const block] is not matched but it is matched ## Repetitions +r[macro.decl.repetition] + +r[macro.decl.repetition.intro] In both the matcher and transcriber, repetitions are indicated by placing the tokens to be repeated inside `$(`…`)`, followed by a repetition operator, -optionally with a separator token between. The separator token can be any token +optionally with a separator token between. + +r[macro.decl.repetition.separator] +The separator token can be any token other than a delimiter or one of the repetition operators, but `;` and `,` are the most common. For instance, `$( $i:ident ),*` represents any number of identifiers separated by commas. Nested repetitions are permitted. +r[macro.decl.repetition.operators] The repetition operators are: - `*` --- indicates any number of repetitions. - `+` --- indicates any number but at least one. - `?` --- indicates an optional fragment with zero or one occurrence. +r[macro.decl.repetition.optional-restriction] Since `?` represents at most one occurrence, it cannot be used with a separator. +r[macro.decl.repetition.fragment] The repeated fragment both matches and transcribes to the specified number of the fragment, separated by the separator token. Metavariables are matched to every repetition of their corresponding fragment. For instance, the `$( $i:ident @@ -198,6 +229,9 @@ compiler knows how to expand them properly: ## Scoping, Exporting, and Importing +r[macro.decl.scope] + +r[macro.decl.scope.intro] For historical reasons, the scoping of macros by example does not work entirely like items. Macros have two forms of scope: textual scope, and path-based scope. Textual scope is based on the order that things appear in source files, or even @@ -205,6 +239,7 @@ across multiple files, and is the default scoping. It is explained further below Path-based scope works exactly the same way that item scoping does. The scoping, exporting, and importing of macros is controlled largely by attributes. +r[macro.decl.scope.unqualified] When a macro is invoked by an unqualified identifier (not part of a multi-part path), it is first looked up in textual scoping. If this does not yield any results, then it is looked up in path-based scoping. If the macro's name is @@ -224,6 +259,9 @@ self::lazy_static!{} // Path-based lookup ignores our macro, finds imported one. ### Textual Scope +r[macro.decl.scope.textual] + +r[macro.decl.scope.textual.intro] Textual scope is based largely on the order that things appear in source files, and works similarly to the scope of local variables declared with `let` except it also applies at the module level. When `macro_rules!` is used to define a @@ -253,6 +291,7 @@ mod has_macro { m!{} // OK: appears after declaration of m in src/lib.rs ``` +r[macro.decl.scope.textual.shadow] It is not an error to define a macro multiple times; the most recent declaration will shadow the previous one unless it has gone out of scope. @@ -299,6 +338,9 @@ fn foo() { ### The `macro_use` attribute +r[macro.decl.scope.macro_use] + +r[macro.decl.scope.macro_use.mod-decl] The *`macro_use` attribute* has two purposes. First, it can be used to make a module's macro scope not end when the module is closed, by applying it to a module: @@ -314,6 +356,7 @@ mod inner { m!(); ``` +r[macro.decl.scope.macro_use.prelude] Second, it can be used to import macros from another crate, by attaching it to an `extern crate` declaration appearing in the crate's root module. Macros imported this way are imported into the [`macro_use` prelude], not textually, @@ -332,11 +375,15 @@ lazy_static!{} // self::lazy_static!{} // Error: lazy_static is not defined in `self` ``` +r[macro.decl.scope.macro_use.export] Macros to be imported with `#[macro_use]` must be exported with `#[macro_export]`, which is described below. ### Path-Based Scope +r[macro.decl.scope.path] + +r[macro.decl.scope.path.intro] By default, a macro has no path-based scope. However, if it has the `#[macro_export]` attribute, then it is declared in the crate root scope and can be referred to normally as such: @@ -358,11 +405,15 @@ mod mac { } ``` +r[macro.decl.scope.path.export] Macros labeled with `#[macro_export]` are always `pub` and can be referred to by other crates, either by path or by `#[macro_use]` as described above. ## Hygiene +r[macro.decl.hygiene] + +r[macreo.decl.hygiene.intro] By default, all identifiers referred to in a macro are expanded as-is, and are looked up at the macro's invocation site. This can lead to issues if a macro refers to an item or macro which isn't in scope at the invocation site. To @@ -406,6 +457,7 @@ pub mod inner { } ``` +r[macro.decl.hygiene.vis] Additionally, even though `$crate` allows a macro to refer to items within its own crate when expanding, its use has no effect on visibility. An item or macro referred to must still be visible from the invocation site. In the following @@ -429,6 +481,7 @@ fn foo() {} > modified to use `$crate` or `local_inner_macros` to work well with path-based > imports. +r[macro.decl.hygeine.local_inner_macros] When a macro is exported, the `#[macro_export]` attribute can have the `local_inner_macros` keyword added to automatically prefix all contained macro invocations with `$crate::`. This is intended primarily as a tool to migrate @@ -449,9 +502,15 @@ macro_rules! helper { ## Follow-set Ambiguity Restrictions +r[macro.decl.follow-set] + +r[macro.decl.follow-set.intro] The parser used by the macro system is reasonably powerful, but it is limited in -order to prevent ambiguity in current or future versions of the language. In -particular, in addition to the rule about ambiguous expansions, a nonterminal +order to prevent ambiguity in current or future versions of the language. + + +r[macro.decl.follow-set.token-restriction] +In particular, in addition to the rule about ambiguous expansions, a nonterminal matched by a metavariable must be followed by a token which has been decided can be safely used after that kind of match. @@ -464,19 +523,32 @@ matcher would become ambiguous or would misparse, breaking working code. Matchers like `$i:expr,` or `$i:expr;` would be legal, however, because `,` and `;` are legal expression separators. The specific rules are: +r[macro.decl.follow-set.token-expr-stmt] * `expr` and `stmt` may only be followed by one of: `=>`, `,`, or `;`. + +r[macro.decl.follow-set.token-pat_param] * `pat_param` may only be followed by one of: `=>`, `,`, `=`, `|`, `if`, or `in`. + +r[macro.decl.follow-set.token-pat] * `pat` may only be followed by one of: `=>`, `,`, `=`, `if`, or `in`. + +r[macro.decl.follow-set.token-path-ty] * `path` and `ty` may only be followed by one of: `=>`, `,`, `=`, `|`, `;`, `:`, `>`, `>>`, `[`, `{`, `as`, `where`, or a macro variable of `block` fragment specifier. + +r[macro.decl.follow-set.token-vis] * `vis` may only be followed by one of: `,`, an identifier other than a non-raw `priv`, any token that can begin a type, or a metavariable with a `ident`, `ty`, or `path` fragment specifier. + +r[macro.decl.follow-set.token-other] * All other fragment specifiers have no restrictions. +r[macro.decl.follow-set.edition2021] > **Edition differences**: Before the 2021 edition, `pat` may also be followed by `|`. +r[macro.decl.follow-set.repetition] When repetitions are involved, then the rules apply to every possible number of expansions, taking separators into account. This means: diff --git a/src/procedural-macros.md b/src/procedural-macros.md index a97755f7f..0421fc786 100644 --- a/src/procedural-macros.md +++ b/src/procedural-macros.md @@ -1,5 +1,8 @@ ## Procedural Macros +r[macro.proc] + +r[macro.proc.intro] *Procedural macros* allow creating syntax extensions as execution of a function. Procedural macros come in one of three flavors: @@ -11,6 +14,7 @@ Procedural macros allow you to run code at compile time that operates over Rust syntax, both consuming and producing Rust syntax. You can sort of think of procedural macros as functions from an AST to another AST. +r[macro.proc.def] Procedural macros must be defined in the root of a crate with the [crate type] of `proc-macro`. The macros may not be used from the crate where they are defined, and can only be used when imported in another crate. @@ -23,6 +27,7 @@ The macros may not be used from the crate where they are defined, and can only b > proc-macro = true > ``` +r[macro.proc.result] As functions, they must either return syntax, panic, or loop endlessly. Returned syntax either replaces or adds the syntax depending on the kind of procedural macro. Panics are caught by the compiler and are turned into a compiler error. @@ -34,15 +39,20 @@ that the compiler has access to. Similarly, file access is the same. Because of this, procedural macros have the same security concerns that [Cargo's build scripts] have. +r[macro.proc.error] Procedural macros have two ways of reporting errors. The first is to panic. The second is to emit a [`compile_error`] macro invocation. ### The `proc_macro` crate +r[macro.proc.proc_macro] + +r[macro.proc.proc_macro.intro] Procedural macro crates almost always will link to the compiler-provided [`proc_macro` crate]. The `proc_macro` crate provides types required for writing procedural macros and facilities to make it easier. +r[macro.proc.proc_macro.token-stream] This crate primarily contains a [`TokenStream`] type. Procedural macros operate over *token streams* instead of AST nodes, which is a far more stable interface over time for both the compiler and for procedural macros to target. A @@ -51,6 +61,7 @@ can roughly be thought of as lexical token. For example `foo` is an `Ident` token, `.` is a `Punct` token, and `1.2` is a `Literal` token. The `TokenStream` type, unlike `Vec`, is cheap to clone. +r[macro.proc.proc_macro.span] All tokens have an associated `Span`. A `Span` is an opaque value that cannot be modified but can be manufactured. `Span`s represent an extent of source code within a program and are primarily used for error reporting. While you @@ -59,6 +70,8 @@ with any token, such as through getting a `Span` from another token. ### Procedural macro hygiene +r[macro.proc.hygiene] + Procedural macros are *unhygienic*. This means they behave as if the output token stream was simply written inline to the code it's next to. This means that it's affected by external items and also affects external imports. @@ -71,13 +84,19 @@ other functions (like `__internal_foo` instead of `foo`). ### Function-like procedural macros +r[macro.proc.function] + +r[macro.proc.function.intro] *Function-like procedural macros* are procedural macros that are invoked using the macro invocation operator (`!`). +r[macro.proc.function.def] These macros are defined by a [public] [function] with the `proc_macro` [attribute] and a signature of `(TokenStream) -> TokenStream`. The input [`TokenStream`] is what is inside the delimiters of the macro invocation and the output [`TokenStream`] replaces the entire macro invocation. + +r[macro.proc.function.namespace] The `proc_macro` attribute defines the macro in the [macro namespace] in the root of the crate. For example, the following macro definition ignores its input and outputs a @@ -109,6 +128,7 @@ fn main() { } ``` +r[macro.proc.function.invocation] Function-like procedural macros may be invoked in any macro invocation position, which includes [statements], [expressions], [patterns], [type expressions], [item] positions, including items in [`extern` blocks], inherent @@ -116,14 +136,21 @@ and trait [implementations], and [trait definitions]. ### Derive macros +r[macro.proc.derive] + +r[macro.proc.derive.intro] *Derive macros* define new inputs for the [`derive` attribute]. These macros can create new [items] given the token stream of a [struct], [enum], or [union]. They can also define [derive macro helper attributes]. +r[macro.proc.derive.def] Custom derive macros are defined by a [public] [function] with the `proc_macro_derive` attribute and a signature of `(TokenStream) -> TokenStream`. + +r[macro.proc.derive.namespace] The `proc_macro_derive` attribute defines the custom derive in the [macro namespace] in the root of the crate. +r[macro.proc.derive.output] The input [`TokenStream`] is the token stream of the item that has the `derive` attribute on it. The output [`TokenStream`] must be a set of items that are then appended to the [module] or [block] that the item from the input @@ -161,11 +188,15 @@ fn main() { #### Derive macro helper attributes +r[macro.proc.derive.attributes] + +r[macro.proc.derive.attributes.intro] Derive macros can add additional [attributes] into the scope of the [item] they are on. Said attributes are called *derive macro helper attributes*. These attributes are [inert], and their only purpose is to be fed into the derive macro that defined them. That said, they can be seen by all macros. +r[macro.proc.derive.attributes.def] The way to define helper attributes is to put an `attributes` key in the `proc_macro_derive` macro with a comma separated list of identifiers that are the names of the helper attributes. @@ -197,10 +228,14 @@ struct Struct { ### Attribute macros +r[macro.proc.attribute] + +r[macro.proc.attribute.intro] *Attribute macros* define new [outer attributes][attributes] which can be attached to [items], including items in [`extern` blocks], inherent and trait [implementations], and [trait definitions]. +r[macro.proc.attribute.def] Attribute macros are defined by a [public] [function] with the `proc_macro_attribute` [attribute] that has a signature of `(TokenStream, TokenStream) -> TokenStream`. The first [`TokenStream`] is the delimited token @@ -209,6 +244,8 @@ the attribute is written as a bare attribute name, the attribute [`TokenStream`] is empty. The second [`TokenStream`] is the rest of the [item] including other [attributes] on the [item]. The returned [`TokenStream`] replaces the [item] with an arbitrary number of [items]. + +r[macro.proc.attribute.namespace] The `proc_macro_attribute` attribute defines the attribute in the [macro namespace] in the root of the crate. For example, this attribute macro takes the input stream and returns it as is, @@ -278,9 +315,13 @@ fn invoke4() {} ### Declarative macro tokens and procedural macro tokens +r[macro.proc.token] + +r[macro.proc.token.intro] Declarative `macro_rules` macros and procedural macros use similar, but different definitions for tokens (or rather [`TokenTree`s].) +r[macro.proc.token.macro_rules] Token trees in `macro_rules` (corresponding to `tt` matchers) are defined as - Delimited groups (`(...)`, `{...}`, etc) - All operators supported by the language, both single-character and @@ -296,6 +337,7 @@ Token trees in `macro_rules` (corresponding to `tt` matchers) are defined as expansion, which will be considered a single token tree regardless of the passed expression) +r[macro.proc.token.tree] Token trees in procedural macros are defined as - Delimited groups (`(...)`, `{...}`, etc) - All punctuation characters used in operators supported by the language (`+`, @@ -306,11 +348,13 @@ Token trees in procedural macros are defined as and floating point literals. - Identifiers, including keywords (`ident`, `r#ident`, `fn`) +r[macro.proc.token.converstion-intro] Mismatches between these two definitions are accounted for when token streams are passed to and from procedural macros. \ Note that the conversions below may happen lazily, so they might not happen if the tokens are not actually inspected. +r[macro.proc.token.conversion] When passed to a proc-macro - All multi-character operators are broken into single characters. - Lifetimes are broken into a `'` character and an identifier. @@ -322,6 +366,7 @@ When passed to a proc-macro - `tt` and `ident` substitutions are never wrapped into such groups and always represented as their underlying token trees. +r[macro.proc.token.emission] When emitted from a proc macro - Punctuation characters are glued into multi-character operators when applicable. @@ -330,6 +375,7 @@ When emitted from a proc macro possibly wrapped into a delimited group ([`Group`]) with implicit delimiters ([`Delimiter::None`]) when it's necessary for preserving parsing priorities. +r[macro.proc.token.doc-comment] Note that neither declarative nor procedural macros support doc comment tokens (e.g. `/// Doc`), so they are always converted to token streams representing their equivalent `#[doc = r"str"]` attributes when passed to macros. From 6c6dd244b81da7aa5e1b13c0d2d485c9cc6965a1 Mon Sep 17 00:00:00 2001 From: Connor Horman Date: Mon, 23 Sep 2024 12:57:20 -0400 Subject: [PATCH 2/4] Remove double line breaks --- src/macros-by-example.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/macros-by-example.md b/src/macros-by-example.md index 93320f905..6009eedd3 100644 --- a/src/macros-by-example.md +++ b/src/macros-by-example.md @@ -332,7 +332,6 @@ fn foo() { m!(); } - // m!(); // Error: m is not in scope. ``` @@ -508,7 +507,6 @@ r[macro.decl.follow-set.intro] The parser used by the macro system is reasonably powerful, but it is limited in order to prevent ambiguity in current or future versions of the language. - r[macro.decl.follow-set.token-restriction] In particular, in addition to the rule about ambiguous expansions, a nonterminal matched by a metavariable must be followed by a token which has been decided can @@ -562,7 +560,6 @@ expansions, taking separators into account. This means: * If the repetition can match zero times (`*` or `?`), then whatever comes after must be able to follow whatever comes before. - For more detail, see the [formal specification]. [const block]: expressions/block-expr.md#const-blocks From e6480e4e943462310816a996af9bf6987b261c56 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Wed, 2 Oct 2024 10:57:20 -0700 Subject: [PATCH 3/4] Fix misspellings --- src/macros-by-example.md | 4 ++-- src/procedural-macros.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/macros-by-example.md b/src/macros-by-example.md index 6009eedd3..17e738897 100644 --- a/src/macros-by-example.md +++ b/src/macros-by-example.md @@ -412,7 +412,7 @@ by other crates, either by path or by `#[macro_use]` as described above. r[macro.decl.hygiene] -r[macreo.decl.hygiene.intro] +r[macro.decl.hygiene.intro] By default, all identifiers referred to in a macro are expanded as-is, and are looked up at the macro's invocation site. This can lead to issues if a macro refers to an item or macro which isn't in scope at the invocation site. To @@ -480,7 +480,7 @@ fn foo() {} > modified to use `$crate` or `local_inner_macros` to work well with path-based > imports. -r[macro.decl.hygeine.local_inner_macros] +r[macro.decl.hygiene.local_inner_macros] When a macro is exported, the `#[macro_export]` attribute can have the `local_inner_macros` keyword added to automatically prefix all contained macro invocations with `$crate::`. This is intended primarily as a tool to migrate diff --git a/src/procedural-macros.md b/src/procedural-macros.md index 0421fc786..a9b24e1bb 100644 --- a/src/procedural-macros.md +++ b/src/procedural-macros.md @@ -348,7 +348,7 @@ Token trees in procedural macros are defined as and floating point literals. - Identifiers, including keywords (`ident`, `r#ident`, `fn`) -r[macro.proc.token.converstion-intro] +r[macro.proc.token.conversion-intro] Mismatches between these two definitions are accounted for when token streams are passed to and from procedural macros. \ Note that the conversions below may happen lazily, so they might not happen if From 1f10a46dff970181c7b2fc93dcdd41c4d04863be Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Wed, 2 Oct 2024 11:12:35 -0700 Subject: [PATCH 4/4] Some minor rule identifier tweaks Shooting for some consistency with these. --- src/procedural-macros.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/procedural-macros.md b/src/procedural-macros.md index a9b24e1bb..0ae6e26d5 100644 --- a/src/procedural-macros.md +++ b/src/procedural-macros.md @@ -348,13 +348,13 @@ Token trees in procedural macros are defined as and floating point literals. - Identifiers, including keywords (`ident`, `r#ident`, `fn`) -r[macro.proc.token.conversion-intro] +r[macro.proc.token.conversion.intro] Mismatches between these two definitions are accounted for when token streams are passed to and from procedural macros. \ Note that the conversions below may happen lazily, so they might not happen if the tokens are not actually inspected. -r[macro.proc.token.conversion] +r[macro.proc.token.conversion.to-proc_macro] When passed to a proc-macro - All multi-character operators are broken into single characters. - Lifetimes are broken into a `'` character and an identifier. @@ -366,7 +366,7 @@ When passed to a proc-macro - `tt` and `ident` substitutions are never wrapped into such groups and always represented as their underlying token trees. -r[macro.proc.token.emission] +r[macro.proc.token.conversion.from-proc_macro] When emitted from a proc macro - Punctuation characters are glued into multi-character operators when applicable.