A highly customizable language front-end and parser generator. Designed for rapid prototyping of domain-specific language implementations. Applicable also for general-purpose languages.
XFormula is a language front-end tool that enables developers to define language syntax and semantics using the object-oriented paradigm, achieving exceptional modularity and flexibility. It offers a set of built-in, commonly used features for general-purpose languages that can be omitted or extended as needed for rapid prototyping.
While XFormula itself does not provide any compilation or evaluation capabilities, it allows you to define a highly flexible and customizable AST structure in a modular way. Additionally, it includes a parser that generates an AST based on a given input string.
At its core, XFormula is a parser generator that leverages the powerful Lark Parser Toolkit under the hood. Lark supports LALR(1), Earley, and CYK parsing algorithms, and XFormula's default features are designed to be compatible with the LALR(1) algorithm, which is renowned for its speed and efficiency in both time (CPU) and space (memory).
If you are already familiar with the terminology, the best way to understand how to use XFormula is by examining the following modules:
- xformula.syntax.ast
- xformula.syntax.core.features
- xformula.syntax.core.operations.default_operator_precedences
The final EBNF grammar, generated automatically by XFormula using the default features, is output to the out/Grammar.lark file.
To illustrate the development process, let’s define a few simple syntax
features and their corresponding AST nodes for the none
and bool
types.
The first step is to specify how the tokens should be parsed.
from xformula.runtime.core.context.abc import RuntimeContext
from xformula.syntax.grammar.ebnf import non_terminal
from xformula.syntax.grammar.terminals.abc import Terminal
from xformula.syntax.lexer.tokens.abc import Token
class NONE(
# Note: The `Terminal` class is generic and requires the type of the
# transformed token as a type argument.
Terminal[None],
):
class Meta:
# The priority of the terminal in the lexer rules.
priority = 2000
tags = {
# The priority of this terminal in the `None` non-terminal grammar
# rules group.
non_terminal("None"): 0,
}
# The grammar definition of the terminal,
# that will be used by the lexer.
def build_grammar(self) -> str:
define = self.ebnf.define
regex = self.ebnf.regex
bound = self.regex.bound
word = self.regex.word
return define(regex(bound(word("none"))))
# The runtime transformation of the token.
def transform_token(
self,
runtime_context: RuntimeContext,
token: Token,
) -> None:
# The token is already of type `None`.
return None
Next, define the bool
type:
from xformula.runtime.core.context.abc import RuntimeContext
from xformula.syntax.grammar.ebnf import non_terminal
from xformula.syntax.grammar.terminals.abc import Terminal
from xformula.syntax.lexer.tokens.abc import Token
class BOOL(
Terminal[bool],
):
class Meta:
priority = 2000
tags = {
non_terminal("Bool"): 1000,
}
def build_grammar(self) -> str:
define = self.ebnf.define
regex = self.ebnf.regex
any_of = self.regex.any_of
bound = self.regex.bound
word = self.regex.word
return define(
regex(
any_of(
bound(word("false")),
bound(word("true")),
),
),
)
def transform_token(
self,
runtime_context: RuntimeContext,
token: Token,
) -> bool:
# Transform the `false` and `true` tokens into False and True,
# respectively.
return token.value.lower() == "true"
Defining the AST nodes for the none
and bool
types is straightforward,
thanks to the comprehensive xformula.syntax.ast
module.
For the none
type:
import dataclasses
from xformula.syntax.ast.nodes import Literal
@dataclasses.dataclass()
class None_(
# Note: The `Literal` class is generic and requires the type of the
# transformed token as a type argument.
Literal[None],
):
# The `value` field is implemented with the `None` type.
value: None = dataclasses.field(
kw_only=True,
init=False,
default=None,
)
And for the bool
type:
import dataclasses
from xformula.syntax.ast.nodes import Literal
@dataclasses.dataclass()
class Bool(
Literal[bool],
):
# The `value` field is implemented with the `bool` type.
value: bool = dataclasses.field(
kw_only=True,
default=bool(),
)
Once the syntax features and AST nodes are defined, the next step is to specify how tokens should be transformed into AST nodes.
Define the None
non-terminal as follows (note that None
is reserved in
Python, so we use None_
instead):
from xformula.runtime.core.context.abc import RuntimeContext
from xformula.syntax.core.features.literals.ast.nodes import None_ as NoneNode
from xformula.syntax.grammar.ebnf import non_terminal
from xformula.syntax.grammar.non_terminals.abc import NonTerminal
from xformula.syntax.parser.trees.abc import ParseTree
# `None` is reserved in Python, so we use `None_` instead.
class None_(
NonTerminal[NoneNode],
):
class Meta:
# This will replace the default definition name `None_`.
definition_name = "None"
# Mark this non-terminal as atomic.
# This means that we want to use our custom transformation logic
# for the parse tree of this non-terminal.
# See the `transform_parse_tree` method below.
atomic = True
tags = {
non_terminal("Literal"): -1000,
}
# The grammar definition of the non-terminal.
def build_grammar(self) -> str:
# Since we tagged the `NONE` terminal with the `None` non-terminal,
# we can automatically reference it here.
# And, if another feature is added that tags the `None` non-terminal,
# the related terminals/non-terminals will be included here as well.
# Respecting the priority levels of the tags.
return self.ebnf.define_tagged_alternation()
# The runtime transformation of the parse tree.
# This is where we transform the parse tree into an AST node.
def transform_parse_tree(
self,
runtime_context: RuntimeContext,
tree: ParseTree,
) -> NoneNode:
return NoneNode()
Similarly, define the Bool
non-terminal:
from typing import cast
from xformula.runtime.core.context.abc import RuntimeContext
from xformula.syntax.core.features.literals.ast.nodes import Bool as BoolNode
from xformula.syntax.grammar.ebnf import non_terminal
from xformula.syntax.grammar.non_terminals.abc import NonTerminal
from xformula.syntax.parser.trees.abc import ParseTree
class Bool(
NonTerminal[BoolNode],
):
class Meta:
atomic = True
tags = {
non_terminal("Literal"): -2000,
}
def build_grammar(self) -> str:
return self.ebnf.define_tagged_alternation()
def transform_parse_tree(
self,
runtime_context: RuntimeContext,
tree: ParseTree[bool],
) -> BoolNode:
# The `Bool` non-terminal has only one child, the transformed value
# from the `BOOL` terminal.
value = cast(bool, tree.children[0])
# Return the `bool` node with that transformed value.
return BoolNode(
value=value,
)
Lastly, define the non-terminal for Literal
:
from typing import TypeVar, cast
from xformula.runtime.core.context.abc import RuntimeContext
from xformula.syntax.ast.nodes.abc import Literal as LiteralNode
from xformula.syntax.grammar.ebnf import non_terminal
from xformula.syntax.grammar.non_terminals.abc import NonTerminal
from xformula.syntax.parser.trees.abc import ParseTree
T = TypeVar("T")
class Literal(
NonTerminal[
LiteralNode[T],
],
):
class Meta:
tags = {
# Mark this non-terminal as the start rule of the grammar.
non_terminal("Start"): -1,
}
def build_grammar(self) -> str:
return self.ebnf.define_tagged_alternation()
# Default transformation for non-atomic non-terminals.
def transform_parse_tree(
self,
runtime_context: RuntimeContext,
tree: ParseTree[T],
) -> LiteralNode[T]:
return cast(LiteralNode, tree.children[0])
To use these features, define a feature class that modifies the syntax context during setup:
from xformula.syntax.core.features.abc import Feature
class LiteralFeature(Feature):
def setup(self) -> None:
self.non_terminal_types.extend(
[
None_,
Bool,
Literal,
],
)
self.terminal_types.extend(
[
NONE,
BOOL,
],
)
Finally, initialize the parser:
from xformula.syntax.core.context import SyntaxContext
# Note: We tagged our `Literal` non-terminal with the `Start` non-terminal.
# If the `Start` non-terminal is not defined, you can either define it or use
# the `PolyfillFeature` to handle it automatically.
from xformula.syntax.core.features.polyfill import PolyfillFeature
from xformula.syntax.parser import Parser
syntax_context = SyntaxContext(
feature_types=[
LiteralFeature,
# `PolyfillFeature` automatically defines any missing tags as
# non-atomic non-terminals during setup.
PolyfillFeature,
],
)
parser = Parser(
syntax_context=syntax_context,
# Optionally, you can also pass a runtime context here to customize or
# override the default.
)
# Parse an input string.
ast = parser.parse("true")
# Outputs: Bool(value=True)
print(ast)
# Outputs: True
print(ast.value)
# Outputs: None_(value=None)
print(parser.parse("none"))
The generated grammar is accessible via the ebnf_document
attribute of the
parser. In this example, it might look like:
?start : literal
?literal : bool
| none
bool : BOOL
none : NONE
BOOL.2000 : /\bfalse\b|\btrue\b/
NONE.2000 : /\bnone\b/
As you can see, the start
rule is prefixed with a ?
character. This is
because the PolyfillFeature
automatically defines the Start
non-terminal as
non-atomic. Since it does not have specific transformation logic and is used
only for tagging purposes, the parser automatically replaces the Start
non-terminal's parse tree with that of the Literal
non-terminal. Likewise,
because Literal
is also non-atomic, its parse tree is further replaced by
that of the Bool
or None
non-terminal based on the input. This approach
helps avoid deep nesting of AST nodes while leveraging the inheritance and
polymorphism features of OOP.
To observe the default polymorphism, you can inspect the MRO of an AST node class, similar to the following:
>>> parser.parse("none").__class__.__mro__
(
<class 'xformula.syntax.core.features.literals.ast.nodes.none_.None_'>,
<class 'xformula.syntax.ast.nodes.abc.literal.Literal'>,
<class 'xformula.syntax.ast.nodes.abc.term.Term'>,
<class 'xformula.syntax.ast.nodes.abc.primary.Primary'>,
<class 'xformula.syntax.ast.nodes.abc.operand.Operand'>,
<class 'xformula.syntax.ast.nodes.abc.simple_expression.SimpleExpression'>,
<class 'xformula.syntax.ast.nodes.abc.expression.Expression'>,
<class 'xformula.syntax.ast.nodes.abc.has_value.HasValue'>,
<class 'typing.Generic'>,
<class 'xformula.syntax.ast.nodes.abc.node.Node'>,
<class 'xformula.arch.meta.configurable.Configurable'>,
<class 'abc.ABC'>,
<class 'object'>
)
As demonstrated above, syntax features are defined in a modular way. The dynamic syntax concept further enhances this modularity by allowing you to plug in or remove features without modifying the core syntax definition.
For more low-level details, refer to the following classes:
- xformula.syntax.EBNFExpressionBuilderProtocol
- xformula.syntax.SyntaxContext
- xformula.syntax.TaggedDefinitionIterator
Since Lark is available for various programming languages, the grammars generated by XFormula can be used in those languages out of the box. To achieve the same dynamic transformation capabilities as XFormula's generated parser, it is necessary to align with the NonTerminalOperationClassBuilder.transform_parse_tree function, which automatically resolves operator associativity and precedence.
For a list of available Lark implementations, see the extra features section in the Lark documentation.
For an overview of the default features and an example runtime implementation for the language, check out the django-xformula project. This Django application transforms formulas into SQL queries using Django's powerful ORM capabilities.
This project is licensed under the MIT License.
See the LICENSE file for more information.