Tolk v0.9: nullable types `T?`, null safety, control flow, smart casts
#1545
Merged
In FunC (and in Tolk before), the evaluation order of the assignment
> lhs = rhs
(at IR level) was "rhs first, lhs second". In practice, this did not matter, because lhs could only be a primitive:
> (v1, v2) = getValue()
The left side of such an assignment actually has no "evaluation". Since Tolk implemented indexed access, there could be
> getTensor().0 = getValue()
or (in the future)
> getObject().field = getValue()
where evaluation order becomes significant.
Now the evaluation order is "lhs first, rhs second" (more expected from the user's point of view), which will become significant when building the control flow graph.
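TypeScript happens to evaluate the target of an assignment before its right-hand side, so it can illustrate the "lhs first, rhs second" order described above. This is only an analogy: `getObject`, `getValue`, and `Box` are hypothetical names borrowed from the commit message, not Tolk or compiler APIs.

```typescript
// Records the order in which the two sides of an assignment are evaluated.
const order: string[] = [];

interface Box {
  field: number;
}

const box: Box = { field: 0 };

// Hypothetical helper standing in for the left-hand side expression.
function getObject(): Box {
  order.push("lhs");
  return box;
}

// Hypothetical helper standing in for the right-hand side expression.
function getValue(): number {
  order.push("rhs");
  return 42;
}

// The target object is evaluated first, then the value being assigned.
getObject().field = getValue();
```

With side effects on both sides, the recorded order is what a reader of the source line would expect, which is the motivation the commit message gives for the change.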
This commit introduces nullable types `T?` that are distinct from non-nullable `T`. Example: `int?` (int or null) and `int` are different now. Previously, `null` could be assigned to any primitive type. Now, it can be assigned only to `T?`.
A non-null assertion operator `!` was also introduced, similar to `!` in TypeScript and `!!` in Kotlin.
While `int?` still occupies 1 stack slot, `(int,int)?` and other nullable tensors occupy N+1 slots, the last one for "null presence". `v == null` actually compares that slot. Assigning `(int,int)` to `(int,int)?` implicitly creates a null presence slot. Assigning `null` to `(int,int)?` widens this null value to 3 slots. This is called "type transitioning".
All stdlib function prototypes have been updated to reflect whether they return/accept a nullable or a strict value.
This commit also contains refactoring from `const FunctionData*` to `FunctionPtr` and similar.
With the introduction of nullable types, we want the compiler to be smart in cases like
> if (x == null) return;
> // x is int now
or
> if (x == null) x = 0;
> // x is int now
These are called smart casts: the type of a variable at a particular usage might differ from its declared type.
Implementing smart casts is very challenging. They are based on building a control-flow graph and handling every AST vertex with care.
Actually, I represent the CFG not as a "graph with edges". Instead, it's a "structured DFS" for the AST:
1) at every point of inferring, we have "current flow facts"
2) when we see an `if (...)`, we create two derived contexts
3) after `if`, finalize them at the end and unify
4) if we detect unreachable code, we mark that context
In other words, we get the effect of a CFG but in a more direct approach. That's enough for AST-level data-flow.
Smart casts work for local variables and tensor/tuple indices.
Compilation errors have been reworked and are now more friendly. There are also compilation warnings for always true/false conditions inside if, assert, etc.
In FunC (and in Tolk before), throwing an exception is just calling a built-in function:
> throw 123; // actually, __throw(123)
Since it's a regular function, the compiler was not aware that execution would stop and that all following code is unreachable. For instance, `throw` at the end of a function needed to be followed by a `return` statement.
Now, `throw` interrupts control flow, and all statements after it are considered unreachable. At IR level, code Ops are also not produced.
This works because the built-in `__throw()` now has the `never` type. It can also be applied to custom functions:
> fun alwaysThrow(): never { throw 123; }
The code after an `alwaysThrow()` call will also be unreachable.
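TypeScript has a `never` type with comparable control-flow consequences, so the behavior in this commit message can be sketched there. This is an analogy, not Tolk code: `requirePositive` is a hypothetical function, and `alwaysThrow` only mirrors the name used above.

```typescript
// Declared to never return normally, like `fun alwaysThrow(): never` above.
function alwaysThrow(): never {
  throw new Error("always throws");
}

function requirePositive(x: number | null): number {
  if (x === null) {
    alwaysThrow();
    // Unreachable: the call above has type `never`.
  }
  // The null branch cannot reach this point, so x is `number` here.
  if (x > 0) {
    return x;
  }
  // No `return` is needed after `throw`: the function cannot fall through.
  throw new Error("x must be positive");
}
```

Because the compiler knows that both `throw` and a call to a `never` function end the current path, no placeholder `return` is needed and the null check narrows `x` for the rest of the function.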
This was referenced Mar 4, 2025
EmelyanenkoK approved these changes Mar 5, 2025
This is a major update that impacts both end users and compiler internals.
In FunC, `null` was implicitly assignable to any primitive type, which was too permissive. A variable declared as `int` could actually hold `null` at runtime, causing TVM exceptions if used incorrectly. Similarly, `loadMaybeRef()` returned `cell`, but that cell could be null, which was far from obvious from its prototype.
Now, we introduce nullable types: `int?`, `cell?`, and `T?` in general (even for tensors). Non-nullable types, such as `int` and `cell`, can never hold null values.
The compiler now enforces null safety: you cannot use nullable types without first checking for null. Fortunately, thanks to smart casts, these checks integrate smoothly and organically into the code. Smart casts are purely a compile-time feature: they do not consume gas or extra stack space, ensuring zero runtime overhead.
Notable changes in Tolk v0.9
- Nullable types `int?`, `cell?`, etc.; null safety
- Operator `!` (non-null assertion)
- Code after `throw` is treated unreachable
- The `never` type
Now, let's cover every bullet in detail, and see how it's implemented.
Nullable types `int?`, `cell?`, etc. Null safety
This update introduces nullable types `T?`, allowing values to be explicitly marked as nullable while preventing their use without a null check. This strictness aligns with TypeScript and Kotlin (where TypeScript uses `T | null`, we use `T?`) and ensures that null cannot be accidentally passed into `storeInt()` or used in operations without proper handling.
Key Features
- Any type can be nullable: `cell?`, `[int, slice]?`, `(int, cell)?`, `(int?, cell?)?`, and so on, each guaranteeing safe usage.
- Previously, `int` could implicitly hold null, leading to runtime errors. Now, non-nullable types always contain a value.
- `int?` and `cell?` occupy only one stack slot, storing either a value or TVM NULL at runtime. This is identical to how `int` and `cell` worked in FunC, ensuring zero additional overhead.
Handling nullable tensors is more complex. In FunC, null was incompatible with tensors: it could only be assigned to atomic types. However, this update extends nullability to tensors while ensuring memory and stack correctness. (See implementation details below)
When the return type of a function is not specified, it is inferred automatically from its return statements, which can contain nulls:
Remember that when a variable's type is not specified, it is inferred from the assignment and never changes:
Previously, code like this worked:
Now, you must explicitly declare the variable as nullable:
From a type system perspective, the literal `null` has a special `null` type. This applies everywhere, including global variables and tensor indices:
This is a major step forward for type safety and reliability. Nullable types help eliminate null-related runtime errors by enforcing correct handling of optional values. The implementation is lightweight, with zero additional gas or stack usage.
Updates in stdlib to reflect nullability
Now it's obvious whether an assembler function accepts or returns a null value:
Smart casts, implemented via control flow graph
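With the introduction of nullable types, we want to allow intuitive handling of nullability. For example, after `if (x == null) return;` or `if (x == null) x = 0;`, the variable `x` is treated as `int` in the code that follows.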
This behavior is known as smart casts, a feature found in TypeScript and Kotlin. Implementing it in a stack-based language like ours was challenging, requiring deep Control Flow Graph (CFG) integration.
Smart casts make working with nullable types more natural. Once a variable is checked, the compiler automatically understands that it is non-null, allowing operations without redundant type assertions.
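TypeScript narrows `number | null` in a comparable way, so the two patterns (early return and default assignment) can be sketched there as a rough illustration. The function names are hypothetical, and `number | null` merely plays the role of Tolk's `int?`.

```typescript
// Early return: past the check, x is narrowed to `number`.
function incrementOrZero(x: number | null): number {
  if (x === null) {
    return 0;
  }
  return x + 1;
}

// Default assignment: both paths leave a number in x.
function doubleWithDefault(x: number | null): number {
  if (x === null) {
    x = 0;
  }
  // No further null check is required here.
  return x * 2;
}
```

In both functions the declared type stays nullable; only the type seen at each usage changes, which is the sense in which a smart cast "differs from the declaration".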
Smart casts work for local variables and tensor/tuple indices. In the future, struct fields will also support smart casts.
However, smart casts don't work for global variables. We do not encourage reading a global variable multiple times (it costs gas). Instead, assign it to a local first:
Smart casts also work for initial values. Even if a variable is declared as `int?` but initialized with a number, it's a safe non-null integer:
When a variable is 100% null, its type is `null`, meaning it can be safely passed to any nullable type:
Indexing `var.0` is also not allowed without a prior null check:
Why is implementing smart casts challenging?
Smart casts require deep integration into the Control Flow Graph (CFG). The compiler must correctly track variable states across branches, loops, assignments, and conditions. Some examples of tricky cases:
Another complexity arises because variables can be modified at any time. Assignments inside expressions, function arguments, and conditions must be tracked correctly:
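One such case, a variable that is narrowed by a loop condition and then reassigned inside the loop body, can be sketched in TypeScript, which tracks it in a comparable way. `sumUntilNull` is a hypothetical function written for this illustration.

```typescript
// Sums the leading non-null values and stops at the first null (or at the end).
function sumUntilNull(values: (number | null)[]): number {
  let total = 0;
  let i = 0;
  let current: number | null = values[0] ?? null;
  while (current !== null) {
    // Inside the loop, the condition has narrowed `current` to `number`.
    total += current;
    i += 1;
    // The reassignment widens `current` back to `number | null`,
    // so the loop condition has to be checked again.
    current = values[i] ?? null;
  }
  return total;
}
```

The analysis has to treat the top of the loop as a join point: `current` may arrive either from the initial assignment or from the reassignment at the end of the previous iteration.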
Moreover, since indices (`t.0`, `t.1.2`, etc.) are tracked, they must be reset when their parent tensor/tuple (`t` or `t.1`) is reassigned or mutated.
Once structures are implemented, smart casts for objects and fields will work automatically, just like tensors:
All in all, implementing smart casts in a stack-based, low-level environment was far from trivial. It required:
Now, null safety is smooth, intuitive, and enforced at compile time — no runtime cost, no extra gas, just safer code.
Operator `!` (non-null assertion)
The `!` operator works like TypeScript's `!` and Kotlin's `!!`, but with an important distinction:
Without `!`, the compiler would complain that `lastCell` is `cell?`, even though external conditions guarantee it is non-null.
Of course, thanks to smart casts, this would also work:
However, this relies on a runtime check and consumes gas.
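The trade-off can be sketched with TypeScript's `!`, which is likewise a compile-time assertion with no runtime check. The lookup table and `findBalance` are hypothetical and stand in for any function that returns a nullable value.

```typescript
// Hypothetical lookup table, used only for this illustration.
const balances: Record<string, number> = { alice: 10, bob: 25 };

// Returns null when the owner is unknown.
function findBalance(owner: string): number | null {
  return balances[owner] ?? null;
}

// The caller knows from external logic that "alice" is always present.
// `!` removes `null` from the type without emitting a runtime check.
const aliceBalance: number = findBalance("alice")! + 1;
```

If the external guarantee is ever wrong, nothing catches it at the point of the assertion, which is why an explicit null check remains the safer default.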
The `!` operator is useful when you have guarantees outside of the code itself.
Why use `!` in practice?
Basically, when you call a function returning `T?`, but you are for whatever reason sure that it's not null:
Low-level functions working with dictionaries from `@stdlib/tvm-dicts` cannot be fully expressed using the type system. For example, `iDictGet()` returns either `(slice, true)` or `(null, false)`, which makes its null safety dependent on an additional boolean flag.
Because dictionary APIs operate at a very low level, they cannot be expressed using standard null safety rules. In the future, Tolk will introduce a high-level `map<K, V>` that eliminates the need for `!`. Until then, working with dictionaries will often require `!`.
Also, unlike locals, global variables cannot be smart-cast. The `!` operator is the only way to drop nullability from globals:
So, the `!` operator is powerful but should be used carefully. It's a tool for cases where you know something the compiler does not, whether it's guarantees from external logic or low-level TVM operations.
Code after `throw` is treated unreachable
The statement `throw X` is sugar for `__throw(X)`, a regular built-in function. Prior to this release, it was not handled specially:
Another example:
It works because the built-in function returns `never`:
The `never` type
The `never` type represents code paths that are unreachable. It allows functions that always throw or never return to be safely typed:
Functions with a `never` return type must not return normally. This is not commonly used in practice, but it makes code more explicit and behaves exactly as expected.
Implicit `never` in unreachable conditions
The `never` type also appears implicitly when a condition is impossible:
If you encounter `never` in a compilation error, it usually means that a preceding condition is invalid or contains a warning. Checking for compiler warnings will often reveal the root cause.
Implementation details: non-trivial nullable types
As mentioned above, atomics like `int?` / `cell?` / etc. are still atomics: at runtime, they hold either TVM NULL or a value. Checking `v == null` for them is expressed as Fift `ISNULL`.
But what about nullable tensors? `(int, int)?`, `(int?, int?)?`, or even `()?`? How should they be stored on a stack? How should null equality work?
The rule is the following: if `T` cannot hold TVM NULL instead of itself, a special "value presence" stack slot is implicitly added. It holds 0 if the value is null, and non-zero (currently, -1) if it is not null:
Checking `v == null` for such types is expressed as `0 EQINT` for the last slot.
Smart casting (and the `!` operator) works by just cutting off the last slot:
It also means that `null` can actually be N nulls on a stack. Similarly, `(1,2)` passed to a nullable implicitly adds "-1":
This process of appending or dropping slots is called "transition to target (runtime) type". In the example above, the original type is `null` / `(int, int)`, and target_type is `(int, int)?`.
By the way, not every nullable tensor requires adding a special slot. Some tricky examples where it is not added:
But :) If we change it a little bit, then we need an extra slot again:
When structures are implemented, they will work like tensors, and all the algorithms above will smoothly apply to them:
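The slot layout described in this section can be modeled with a small TypeScript simulation. It is a sketch under stated assumptions: it covers only `(int, int)?`, the helper names are hypothetical, and filling the value slots with nulls for a null tensor is inferred from the "N nulls" remark rather than taken from the compiler.

```typescript
// One stack slot: an integer, or TVM NULL modeled as null.
type Slot = number | null;

// Transition `(int, int)` or `null` to the runtime layout of `(int, int)?`:
// two value slots followed by a presence slot (0 = null, -1 = present).
function toNullablePair(value: [number, number] | null): Slot[] {
  if (value === null) {
    // Assumption: the value slots of a null tensor are filled with nulls.
    return [null, null, 0];
  }
  return [value[0], value[1], -1];
}

// `v == null` inspects only the last (presence) slot.
function isNullPair(slots: Slot[]): boolean {
  return slots[slots.length - 1] === 0;
}

// A smart cast (or `!`) drops the presence slot, leaving the plain tensor.
function dropPresenceSlot(slots: Slot[]): Slot[] {
  return slots.slice(0, -1);
}
```

For instance, `toNullablePair([1, 2])` yields three slots ending in -1, and `dropPresenceSlot` turns them back into the two-slot layout of `(int, int)`.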
Implementation details: smart casts and control flow graph
The file `smart-casts-cfg.cpp` represents internals of AST-level control flow and data flow analysis.
Data flow is mostly used for smart casts and is calculated AT THE TIME of type inferring. Not before, not after, but simultaneously with type inferring, because any local variable can be smart cast, which affects other expressions/variables types, generics instantiation, return auto-infer, etc.
Control flow is represented NOT as a "graph with edges". Instead, it's a "structured DFS" for the AST:
1) at every point of inferring, we have "current flow facts" (`FlowContext`)
2) when we see an `if (...)`, we create two derived contexts (by cloning current)
3) after `if`, finalize them at the end and unify
4) if we detect unreachable code, we mark that context
In other words, we get the effect of a CFG but in a more direct approach. That's enough for AST-level data-flow.
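The clone-and-unify scheme above can be sketched in TypeScript. This is a simplified, hypothetical model: the class shares only its name with the compiler's `FlowContext`, types are plain lists of alternatives, and `unify` is an invented helper rather than the actual C++ implementation.

```typescript
// A type is a list of alternatives, e.g. ["int", "null"] for `int?`.
type Facts = Record<string, string[]>;

class FlowContext {
  facts: Facts;
  unreachable: boolean;

  constructor(facts: Facts = {}, unreachable: boolean = false) {
    this.facts = facts;
    this.unreachable = unreachable;
  }

  // Derive a context for one branch by cloning, so branches cannot affect each other.
  clone(): FlowContext {
    const copy: Facts = {};
    for (const name of Object.keys(this.facts)) {
      copy[name] = (this.facts[name] ?? []).slice();
    }
    return new FlowContext(copy, this.unreachable);
  }
}

// Unify two branch contexts where they rejoin; an unreachable branch contributes nothing.
function unify(a: FlowContext, b: FlowContext): FlowContext {
  if (a.unreachable) return b.clone();
  if (b.unreachable) return a.clone();
  const merged = a.clone();
  for (const name of Object.keys(b.facts)) {
    const known = merged.facts[name] ?? [];
    const extra = (b.facts[name] ?? []).filter((t) => known.indexOf(t) < 0);
    merged.facts[name] = known.concat(extra);
  }
  return merged;
}

// if (x == null) { x = 0 }: the then-branch assigns, the else-branch knows x != null.
const before = new FlowContext({ x: ["int", "null"] });
const thenFlow = before.clone();
thenFlow.facts["x"] = ["int"];
const elseFlow = before.clone();
elseFlow.facts["x"] = ["int"];
const afterIf = unify(thenFlow, elseFlow); // x: ["int"]

// if (x == null) { return; }: the then-branch ends, so only the else-branch survives.
const returned = before.clone();
returned.unreachable = true;
const afterReturn = unify(returned, elseFlow); // x: ["int"]
```

Unifying a branch where `x` is `["int"]` with one where it is `["null"]` gives `["int", "null"]`, which corresponds to the `int+null = int?` rule.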
`FlowContext` contains "data-flow facts that are definitely known": variable types (original or refined), sign state (definitely positive, definitely zero, etc.), boolean state (definitely true, definitely false). Each local variable is contained there, and possibly sub-fields of tensors/objects if definitely known:
When branches rejoin, facts are merged back (`int+null = int?` and so on; here they would be equal to what they were before the `if`).
Another example:
Every expression analysis result (performed along with type inferring) returns `ExprFlow`:
- `out_flow`: facts after evaluating the whole expression, no matter how it evaluates (true or false)
- `true_flow`: the environment if the expression is definitely true
- `false_flow`: the environment if the expression is definitely false
An important highlight about the internal structure of tensors / tuples / objects and `t.1` sink expressions. When a tensor/object is assigned, its fields are NOT tracked individually.
For better understanding, I'll give some examples in TypeScript (having the same behavior):
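A minimal sketch of this rule, assuming a hypothetical `Pair` type: assigning the whole object does not narrow its fields, assigning a single field does, and reassigning the parent resets what was tracked beneath it.

```typescript
interface Pair {
  first: number | null;
  second: number | null;
}

function sinkDemo(): number {
  // Assigning the whole object does not narrow its fields:
  // `pair.first` is still `number | null`, even though 1 was just stored.
  let pair: Pair = { first: 1, second: null };
  let result = pair.first === null ? -1 : pair.first;

  // Assigning a single field does narrow that field.
  pair.second = 2;
  result += pair.second; // accepted: `pair.second` is `number` at this point

  // Reassigning the parent resets everything tracked beneath it.
  pair = { first: null, second: null };
  result += pair.second ?? 0; // a null check is needed again
  return result;
}
```

Removing the null handling on the first or last use of a field makes the TypeScript compiler reject the function, while the middle use compiles without any check.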
The same example, but with a nullable tensor in Tolk:
In the future, not only smart casts, but other data-flow analysis can be implemented.
- `if (x > 0) { ... if (x < 0)` to warn always false
- `if (x) { return; } ... if (!x)` to warn always true
These potential improvements are `SignState` and `BoolState`. Now they are NOT IMPLEMENTED, though declared. Their purpose is to show that data flow is not only about smart casts, but eventually about other facts as well (though it's not obvious whether they should be analyzed at AST level or at IR level, like constants now).
Related pull requests