String interpolation for Idris2 #555
Comments
👍 One thing I have not seen answered in this proposal: The proper implementation of nested comments ignores closing tokens for E.g. this is a valid single multi-line comment:
At the moment we can parse nested comments by only looking for With your proposal allowing users to embed arbitrary code inside these What about only allowing identifiers in these splices? Would forcing users
I think arbitrary expressions should be allowed. As for regular string literals inside a block comment, we generate a single token for the comment with a simple string, thus when we find a The only additional problem, with interpolated strings and not regular string literals, is nested strings (as in If these changes to the lexer result in performance hits, the issue pertains to the lexing phase, which, even if our implementation is not optimized, should still be asymptotically linear.
Note that for nested comments the opening & closing tokens are distinct.
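To make the token-counting approach to nested comments concrete, here is a minimal sketch in Python. It is not the Idris2 lexer: the function name `skip_block_comment` is hypothetical, and the sketch assumes the Idris block comment tokens `{-` and `-}`. It only tracks a depth counter over the two distinct tokens and does not interpret anything else inside the comment.

```python
def skip_block_comment(src: str, start: int) -> int:
    """Return the index just past the block comment that opens at `start`.

    Hypothetical helper for illustration. "{-" raises the nesting depth and
    "-}" lowers it; every other character is skipped without interpretation.
    """
    if src[start:start + 2] != "{-":
        raise ValueError("no block comment starts here")
    depth = 0
    i = start
    while i < len(src):
        pair = src[i:i + 2]
        if pair == "{-":
            depth += 1
            i += 2
        elif pair == "-}":
            depth -= 1
            i += 2
            if depth == 0:
                # The outermost comment has just been closed.
                return i
        else:
            i += 1
    raise ValueError("unterminated block comment")
```

Because the scan is a single left-to-right pass, it stays linear in the length of the input, which is the property the comment above relies on.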
Nicely worked out proposal!
@andrevidela Did you figure out how to lex the
I was going to work on that today actually. The plan is to look at { when parsing strings and then count the number of opening braces and closing braces (in the same way we count opening comment tokens). When all the braces are closed (and the count drops to 0), it means we are done with the interpolated part and we are starting to parse You can give it a go since I won't be able to take a look until this evening
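The brace-counting plan described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the lexer that ended up in Idris2: the function name `split_interpolated` and the `("lit", ...)` / `("expr", ...)` part representation are made up for the example, and the sketch deliberately ignores string literals and comments inside a splice, which a real lexer would have to handle.

```python
def split_interpolated(body: str) -> list:
    """Split the body of an interpolated string into literal and splice parts.

    Returns a list of ("lit", text) and ("expr", text) pairs. A splice starts
    at "{" and ends when the count of open braces drops back to 0, so braces
    nested inside the splice stay part of it.
    """
    parts = []
    literal = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "{":
            literal.append(ch)
            i += 1
            continue
        # Flush the literal text collected so far.
        if literal:
            parts.append(("lit", "".join(literal)))
            literal = []
        depth = 1
        j = i + 1
        while j < len(body) and depth > 0:
            if body[j] == "{":
                depth += 1
            elif body[j] == "}":
                depth -= 1
            j += 1
        if depth != 0:
            raise ValueError("unclosed '{' in interpolated string")
        # body[i] is the opening brace and body[j - 1] the matching close.
        parts.append(("expr", body[i + 1:j - 1]))
        i = j
    if literal:
        parts.append(("lit", "".join(literal)))
    return parts
```

For instance, the body `idris {file} -o {name}` splits into the literal `idris `, the splice `file`, the literal ` -o ` and the splice `name`.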
Probably needs a little bit more care, since the spliced code might contain nested comments, and those might contain
I've written a draft for the modal lexer (gist). I've made it compile but not tested yet. However, it's too late here so I've to go to bed now. Hope it helps your hacking tonight @andrevidela.
Hi, thanks for this nice feature. I found the following surprising behaviour:
The error says that an 'if' would be expected, but at the same time it reports the 'if' as an error. Edit:
Coming to Idris from Haskell, I’m very excited for this proposal! The current state of affairs requires that the argument to an interpolation be of type One of my personal favourite interpolation libraries in Haskell is called I think a similar mechanism could work using Idris’ interfaces. However, I strongly recommend the creation of a new interface, for example
I'd like to see a new interface (not
@bi-functor @IFcoltransG I contributed to the interpolation string for Idris and I also use Rust daily, so I'd like to share my point of view on the
Current progress
`show` inference
Implement runtime control using multiplicity annotations (see updated linearity section)
Summary
String interpolation (also known as variable interpolation, variable expansion, or variable substitution) is a common feature of modern programming languages. It allows the programmer to write a string literal containing inline executable code, whose result the compiler automatically inserts into the string.
Motivation
String manipulation is ubiquitous in programming and Idris is no exception. Currently we use string concatenation for interleaving string literals with runtime strings, and while this practice has served us well, it has some obvious limitations in terms of readability. The following section will highlight examples where string concatenation falls short and how string interpolation improves the situation.
Proposed solution
We propose introducing this new syntax for string literals
Expressions in curly braces must return a type that implements `Show`.
Expressions
Expressions within brackets can be arbitrarily complex as long as they return a type that implements `Show`:
Strings
Strings aren't "shown" with `show`
2. Technical details
String interpolation has 2 goals:
While the first one can only be subjectively evaluated, the second one has a more concrete consequence for the compiler. Take this example
In it, the compiler can see that the variable `name` comes from a string literal. From this, the compiler can perform the substitution at compile time rather than at runtime, outputting the code (given some inlining)
instead of
In other words, the compiler can avoid runtime concatenation operations and perform those text replacements at compile time.
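As a rough illustration of the substitution step described above, the following Python sketch folds splices whose value is already known into the surrounding literal text. Everything in it is an assumption made for the example: the function name `fold_known_parts`, the `("lit", ...)` / `("expr", ...)` part representation, and the `known` table standing in for whatever the compiler can see. It only shows the folding itself; whether the type system can license such a rewrite is a separate question.

```python
def fold_known_parts(parts, known):
    """Merge splices whose value is known at compile time into literal text.

    `parts` is a list of ("lit", text) or ("expr", name) pairs; `known` maps
    names to string values that are visible statically. Anything not in
    `known` stays an "expr" part and would be concatenated at runtime.
    """
    folded = []
    for kind, text in parts:
        if kind == "expr" and text in known:
            kind, text = "lit", known[text]
        if kind == "lit" and folded and folded[-1][0] == "lit":
            # Adjacent literals collapse into one: no runtime concatenation.
            folded[-1] = ("lit", folded[-1][1] + text)
        else:
            folded.append((kind, text))
    return folded
```

When every splice is known, the result is a single literal part; when none is, the parts come back unchanged and all concatenation is left to runtime.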
Edit: It turns out this does not work: the `0` linearity does not allow sharing a compile-time value with a runtime value. When linearity `0` is used, the value is always removed from the runtime, which means it cannot typecheck with `++` even if it's inlined. Of course this restriction could be lifted, but it simply means the semantics of `0` are wrong when used with string interpolation. The real solution is to introduce a new grade/multiplicity that tracks compile-time availability (or staging) and informs both the compiler and the user that a value is available for inlining, partial evaluation and sometimes erasure.
A taste of linearity
Interestingly enough, since this substitution is performed at compile time, it is semantically equivalent to a function call with linearity `0`, that is, the following should compile:
This observation leads us to believe that string interpolation could make use of linearity (and linearity inference) to decide how to generate the correct output code. In addition, this allows the programmer to predict when the compiler will be able to perform such an optimisation, and when it cannot. In the next example we can be assured the string concatenation will happen because there is no way to annotate `name` with linearity `0`
In conclusion, erased terms are concatenated at compile time, while linear and unrestricted terms are compiled into classical runtime concatenation.
Edit: Turns out this is not possible with the current semiring 0, 1, ω. The reason is that we cannot share a multiplicity `0` value between runtime and compile-time values. We could hack this in, but it would plainly be a violation of the guarantee of the type system that values with multiplicity `0` are always erased.
%stringlit
Idris2 has a feature allowing string literals to be overloaded with additional meaning provided by the `FromString` interface.
This feature is particularly useful when designing an embedded DSL. For example
The `path` is equivalent to `fromString "hello" / fromString "world" / END`, or `Component "hello" / Component "world" / END`, both of which are much noisier.
The rule for `fromString` is simply that every string literal is replaced by a call to `fromString` using the literal as argument. However, this rule breaks down if we try to apply it to interpolated strings. Take this example:
Should it be converted to
(In this example we assume that string interpolation is extended to work with types which have a `FromString` instance and a `Monoid` instance.)
or
Additionally, the linearity intuition breaks down too: though both translations are valid, the first one assumes a non-erased linearity and the second one makes use of the erased linearity, and there is no way to predict which one it will be.
For those reasons, making `stringlit` compatible with string interpolation is a non-goal.
Alternatives considered
This section is going to look like a FAQ in order to showcase the result of our research.
Why can’t you just use string concatenation?
We can, and we’ve been doing so for years, but string manipulation is a cornerstone of software engineering. Humans communicate with text, display text, encode data in text and even replace type systems with text (see Stringly typed programs). The ubiquity of strings manifests itself in all sorts of ways. Take the following program that calls a shell script
or this `show` implementation
They work, but since strings are so ubiquitous we think they should be:
String concatenation fails in those three aspects. As an example, the proposed syntax changes those two examples into:
system s"idris {file} -o {name}"
We already have dependent types, why not use printf like we should?
Type-Driven Development with Idris has a wonderful example of implementing a type-safe printf function leveraging dependent types. It has the following type:
and is used like this:
with `printf "%s %s %s"` having the type `String -> String -> String -> String`.
While this is a great exercise in dependently typed golfing, it also fails our 3 metrics, though not as badly as concatenation:
`%s` with the correct corresponding argument (how is it supposed to know which element it is in the argument list? One would need to count to figure it out)
We already have Data.String.Interpolation in contrib
Again, let's reuse both our examples and our 3 criteria:
As we can see, the examples are shorter and more readable. This is definitely an improvement over string concatenation, however:
`++` by `,` which is welcome, but not a sufficient improvement compared to string interpolation
`String` instead of `Show a => a`
While some of those aspects could be improved (notably the last one), the best this solution could ever achieve is:
which is still an unacceptable amount of noise required in order to separate terms by single spaces.
I don’t like your choice, what about *insert other language’s* string interpolation syntax
Let’s explore a few of them.
Scala
`s` prefix
`$name`
C#
`$` prefix
Swift
Javascript
Python
`f` prefix
Conclusion
We needed a prefix in order to avoid clashing semantics between raw strings and interpolated strings. Given the existing examples, our choices are either `s`, `f` or `$`. The `f` prefix is unsuitable since it refers to “format”, but our string interpolation proposal does not include any formatting feature (Scala has a special prefix `f` to distinguish between formatted strings and interpolated ones). The `$` would be fine; however, `$` already has meaning in Idris as the function application operator. If we were to use it, those two functions would do different things despite looking very similar
Which leaves us with the `s` prefix. It would stand for “string”, which isn’t too outlandish.
As for bracketing the expressions, almost all languages use curly braces. Some of them prefix the curly braces with a `$`. We would rather avoid it, for the same reason we do not use it as a prefix and because it makes the expression harder to type (an additional shift+4 on US keyboards). We feel that curly braces strike the right balance between being unlikely to appear in strings displayed to the user and being easy to type. Indeed, parentheses are ubiquitous, and square brackets are often used for list bounds. Curly braces are used for JSON values and object-oriented records, but those rarely appear in typical Idris code, and when they do, the string is machine generated rather than typed by a human.
It is worth noting that curly braces are already used in Idris for records and implicit function parameters. However, those three uses are unlikely to clash in an unreadable manner. Even the worst case scenario:
is hard to read not because of the interpolated string but because of the record syntax.
Thanks to @ShinKage who co-authored this proposal, we’re ready to answer your questions.