-
-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Idea: number suffixes as annotations #510
Comments
I think it's an incredible idea, but as a CSS author, I'm a little biased. ^_^ Seriously, tho, I think it really is a very good idea. It's often the case that you don't need to annotate your numbers, but there are plenty of cases where numbers are heavily annotated, and this would be a huge boon to those. I think the wide-spread basic familiarity with CSS makes this a pretty natural thing to read and understand (and useful for CSS interop), and while you have to make the leap to "the unit is the tag", that's also really the only logical interpretation for it.
This is true of literally any change that's not strictly tightening the parsing; it's perfectly fine unless we have reason to believe that's a common error. KDL's lack of error recovery makes this more or less a non-issue, imo. |
I assume this wouldn't apply to hex numbers? I don't think |
I assume so, yeah. Just decimals. |
@tabatkins i was hoping you would chime in! I had a feeling it would be either "perfect, no notes" or " beware! Run! Here be dragons!" I'm still curious whether you think there's any gotchas about this based on your experience, but it doesn't sound like it. And yeah: originally I was gonna say these should parse as strings, but having them be numbers-with-special-type-syntax just clicked the moment I thought of it. I also thought of whether we might want this kind of thing for any other types and I think the only other types that can even parse this way would be quoted/raw strings. I am less interested in this, but enabling this for them would enable e.g. C++ style string suffix syntax, like |
@bgotink look good point. I hadn't thought that far |
Counterpoint: having |
For disks-rs provisioning having storage units ( |
I think this would be very useful. I would suggest allowing a One small problem I see is ambiguity with hex digits a-f and string suffix. It might be easiest to disallow an unquoted suffix containing a-f for hex numbers (Im assuming suffixes containing 0-9 need to be quoted too [edit: although allowing ( |
Hm, I suppose so. It's just such a footgun, especially if your hex number happens to only contain digits.
That would already be allowed implicitly; Maybe, in fact, this is required for the non-decimal numeric syntaxes - you must have a
Yeah, we'd disallow |
The specification is not entirely clear on this, it only ever uses the phrasing "which may be separated by _", which should maybe be made more explicit to allow or disallow them. Edit: the formal grammar does allow a trailing, but not an initial I think requiring a |
The grammar, at least, is clear:
This allows
Already taken care of the by the grammar, you have to start with an actual digit.
Yes, that is one of the footgun fears I have about allowing it unreservedly. It is better to have the kdl document fail to parse than unexpectedly misparse, when possible. So yeah, I'm leaning towards the _ separator. And since we already have to have an (implicit) restriction that the unit can't start with a I suppose in theory we only need it for hex numbers, while binary/octal could just take suffixes directly, but I think the advantages of drawing a consistent boundary around "the prefixed numeric syntaxes need a _ before a suffix" is worth it. |
I'm less sanguine about this, especially with the general string syntax for the suffix. It means In fact, I'm a bit loathe to allow |
Ok so to summarize:
Did I miss anything? p.s. the only reason Rust allows these suffixes for binders is because the number of symbols is limited and well defined. There is no way for any of the number suffixes in Rust to conflict (for example, Rust does not have literal hex float syntax, hence the |
I think it'd be in the spirit of allowing |
The point of the I believe the value of allowing |
Overall, I think these changes would be really great! But a few things that come to mind:
I don't think any of these are showstoppers, but maybe someone's got a clever solution that resolves them? Also, what's the use case for allowing this syntax on strings? I understand the number case because units are really common, but is something like |
|
It's probably best to ban suffixes from starting with What about |
Ooof, yeah, all those cases (from @jaxter184, @mwh, @tjol) concern me. We could add more restrictions to the syntax of valid suffixes, but that's making the feature more complex than I think is warranted. I think, as @mwh says, this is a reasonable argument towards using a different separator, with
I think, in general, this looks okay. It's an easy character to type (no Shift required), already indicates a minor separatation semantically, and is low-impact visually but still distinct. Playing with some of the motivating examples from the OP's links:
That looks okay? Not quite as good as examples without the Just messing around a bit more, we already have an existing generic metacharacter in
I'm a little less happy with that, but it's not terrible, and it's visually consistent with several other things in the language. I don't think this sort of consistency is necessary, but it is nice. Hm. |
The combination on a |
It's unfortunate that Would it be reasonable to set out:
"Require a separator" could also be "can only be written as prefix This is still a slightly uncomfortable special case, but much less liable to slip through than hex digits are, and the distinguishing element is always immediately at the beginning of the type tag, analogous to the existing prohibition on
Using |
That's already an issue for anyone using KDL, as without easy access to
Yeah, if the separator isn't required more or less at all times, I'd prefer to not have it at all and just require a normal tag.
Assuming we don't use a separator, this boils down to:
If we then allow a separator, ideally For the vast majority of cases you'll encounter, this means you can just drop a suffix in and it'll be fine. I think this is a pretty good set of rules. |
I don't really see how that follows? I'd imagine that |
Their rule 2 prevents it from being valid, however. (I rephrased it a bit to make that more explicit in my version.) |
I appreciate all the noodling y'all are doing around getting this to make sense. I appreciate that this is valuable that it might be worth a "bit" of complexity. I will point out that even if we have complexity around certain very unlikely corner cases, as long as we find good ways to make sure those corner cases can become errors when you "hold it wrong", we can get away with a bit of complexity around the rules. As long as we retain a boundary of "strongly avoid accidental bad data", and that "intuitive" cases just work, I think a few caveats are ok. Another point to make: this is, as far as I'm concerned, a strongly DSL-oriented feature. Most "just data" KDL documents are unlikely to use this feature. And DSLs that decide to use it will most likely have a very limited set of them, which they will have further opportunity to verify works well with the intended data those suffixes will be attached to. Not just that, but since their suffixes are likely to be limited, "nonsensical" type annotations derived from them will have a further opportunity to be caught as errors (even if we DID allow |
So to be more precise, my proposal for suffixes is now:
Valid examples:
Invalid examples:
|
This seems good. 2b raises some questions for me about whether there's other "high conflict likelihood" syntax other than |
Ah, now that you mention that, obviously I need to restrict an initial |
So it sounds like we've walked through the main concerns. Would anyone like to PR this? I might shop this around a bit in the meantime and see if anyone else spots a fatal flaw. I'm a bit sad that the feature is a little wrinkly: I do like the elegance of things fitting in really really nicely and cleanly, but also the sheer value of having the "happy path" for this feature makes the slight messiness of the rules around it feel ok? Idk what do y'all think? |
@tabatkins wouldn't Earlier you also mentioned that I think it's worth noting that KDL forbids keywords without # as identifiers, so this syntax automatically forbids @zkat I think this feature makes a lot of sense. |
lol, indeed, that was just bad on my part. I'll fix.
I feel like these are least troublesome cases; they'd still be allowed under my rules and I think that's okay. They're clearly invalid if thinking of binary/octal/hex numbers, and if you're not thinking of those syntaxes, it's clearly a But, hm, a unit like If we wanted to prevent this misparse, I think the rule would have to be:
So this would declare That keeps the happy path as wide as possible. Thoughts on if this is worthwhile or not?
Yup, a somewhat accidental but nice coincidence. |
We could also slightly widen the restriction, to go along with the The Edit: yeah, i'm liking this slightly wider restriction better, I don't like technically-allowed values that cause parse errors in common cases. So:
|
Could bare ( The main drawback being that type tags on binary/octal/hex values would be limited to the existing parenthetical prefixes (until separated suffixes are added). Also, most of the planning/restrictions/complications pertain to bare suffixes, and discussing separated suffixes has been helpful in figuring out how the bare suffixes should work, so maybe it's just easier for everyone involved (implementers, users, etc) to reduce the number of KDL versions floating around and bundle the changes together.
Just to clarify, does the term "hexadecimal digit" also include
Isn't |
Something that comes to mind in favor of allowing these identifiers:
|
It should, to make sure
No, a dot at the start of an identifier cannot be followed by a number. Lines 966 to 967 in a88c450
|
I'm just gonna toss this one over the fence: We could require that this feature prefix decimals such That is: And we make it so the numbers are ALWAYS non-exponential decimals. It's a special case, but it needs fewer protections because we know we're switching modes. Like, I don't think we need to protect against things like That is, while this might be a little bit surprising at first, we could TOTALLY allow this: I know some folks have worried before about overuse of
|
I think it reads better than Is there any possibility of using this for non-decimal numbers? I imagine I'd really like to try and navigate the |
Okay, here's my final attempt at charting a happy path that's not too overcomplicated. If this still isn't good enough, then I agree, going with a
This just widens the restrictions a bit to let us collapse the special cases as much as possible - all ascii alphas are restricted, and they all restrict all ascii digits. Unfortunately I still need to call out the This still allows all single-character suffixes (so |
@tabatkins i think I'm willing to live with this tbh. I think we can shop it around for a bit and see what the community thinks. Could put it being a feature flag in kdl-rs in the meantime. |
Tragic: can't say stuff like Edit: oh wait no, you can. Ohohohoho |
That should be Could we also make it EDIT: and I think we should make it |
ah yeah, sorry, my brain was still sitting in case-insensitive mode even tho I addressed casing in the character classes. |
I've published a pre-release of kdl.parse('node 10xp', {as: 'node'}).entries[0].value ![]() |
There was a discussion over at YaLTeR/niri#1142 (comment) when trying to come up with enhancements to Niri's DSL regarding numerical stuff. The tl;dr is that there's interest in being able to type things like
100px
and50%
, and not having to quote it.I thought about this a bit, and I would like to propose the following change to the number grammar to facilitate this. This change would be forward compatible: existing v2 parsers would error (instead of returning incorrect data).
The change is as follows:
100px
would be equivalent to(px)100
.(px)100%
One catch: a very narrow range of invalid v2 documents will now become valid (since
foo 100px
will no longer be an error)Thoughts?
The text was updated successfully, but these errors were encountered: