Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add contains? helper function to boot.janet #1017

Merged
merged 10 commits into from
Jun 1, 2023

Conversation

Techcable
Copy link
Contributor

@Techcable Techcable commented Aug 26, 2022

Before adding this, the simplest way to do this was (not (nil? (index-of val collection))) or (truthy? (index-of val collection)).

In my opinion, this is significantly less clear than the new API (contains? collection val).

Also adds four new related functions:

  • collection? - Tests for collection types (intentionally excluding strings)
  • contains-key? - Checks if a collection contains a key
    • For arrays, this checks the index
  • contains-value? - Checks if a collection contains a value

I would say this use case for contains? is pretty important.

Most major programming languages offer some sort of contains function (Python, Java, Rust).
In Python it's even a keyword (in).

Some of the other functions are more debatable.

I intentionally excluded strings for the reasons described in the contains? comment.

TODO

  • Informal smoke tests (on my laptop)
  • Add proper unit tests
  • Review & criticism
  • Final docs cleanup
  • Rebase into something coherent (

This is significantly clearer than using (not (nil? (index-of col val)))

Most major programming languages offer some sort of contains function (Python, Java, C, Rust).
The only exception I know of is C.
@Techcable Techcable changed the title Add contains? helper function to boot.janet Add contains? helper function to boot.janet Aug 26, 2022
@bakpakin
Copy link
Member

bakpakin commented Aug 26, 2022

You are like the 4th person trying to add a contains? function to core. Also contains-value? is identical to contains? (you can use index-of on dictionaries). contains-key? is the same as just (get ds key) with the optional check for nil.

So to reiterate:

(defn contains? [ds value] (not= nil (index-of value ds)))
(defn contains-key? [ds key] (not= nil (get ds key)))

is what you want. Anything else is just overhead, your "efficient" implementation is no more efficient than this.

EDIT: the not= nil check is the only value add here besides the new names, since collections deliberately do not allow nil keys, but do allow false as a key,

EDIT 2:
The above definitions also work correctly on byte sequences, unlike your own implementation which decides to raise a type error (unlike anything else in core).

@bakpakin
Copy link
Member

I guess your "collection?" classification could be defined as recursive associative arrays, which is useful.

@bakpakin
Copy link
Member

Looking again at your implementation, you decided to make contains? check keys for tables and structs and values for arrays and and tuples. I will not merge that.

Note this actually changes behavior from a thin wrapper over `index-of`.
This is because `(index-of 3 3)` throws "error: expected iterable type, got 3"
This is just documentation of existing behavior, it does not change anything.

The reason index-of throws a type error on non-iterable types is because `next` does.
This is hardcoded into the JOP_NEXT opcode (see src/core/value.c:janet_next_impl).

Unfortunately, there is currently no corresponding `iterable?` check.
Remove `contains-value?` because it is now redundant.

Clarify in the documentation that it checks dictionary values.
No longer used to guard the type tests.
@Techcable
Copy link
Contributor Author

I have removed contains-value?, removed the type error (it now returns false) and changed to support all iterable types.

Note that the builtin next function throws a type error if the value is not "iterable".
This function is implemented as JOP_NEXT and delegates to value.c:janet_next_impl (which throws an error here

This causes (index-of 3 3) to throw type errors.

Because contains? is a test function, I figure it should never throw errors (as you pointed out).
Instead, it just returns false. I do this by wrapping it in a try.

So to reiterate:

(defn contains? [ds value] (not= nil (index-of value ds)))
(defn contains-key? [ds key] (not= nil (get ds key)))

is what you want. Anything else is just overhead, your "efficient" implementation is no more efficient than this.

This is mostly the new implementation. The only change is that index-of must be wrapped in a try block to catch type errors from next and index-of (as mentioned above).

The above definitions also work correctly on byte sequences, unlike your own implementation which decides to raise a type error (unlike anything else in core).

I have changed the function to remove type tests. It should now support all iterable values, including byte sequences, generators (fibers) and abstract types.

It actually now suppresses the type errors (as discussed above).

Looking again at your implementation, you decided to make contains? check keys for tables and structs and values for arrays and and tuples. I will not merge that.

Yes. I agree this is an inconsistency. This has been corrected. The contains? functions is now equivalent to the old contains-value?.

Remaining issues (will add to TODO):

  • Unit tests (I'm not sure which "suite" should I add this too??)
  • Check over docs (again)
  • Consider adding string/contains? for finding substrings
    • This is only a convenience/naming thing, semantically it would just be equivalent to (not (nil? (string/find needle haystack)))

@bakpakin
Copy link
Member

bakpakin commented Aug 27, 2022

Remove the try. In dynamic languages, the usual idea is garbage in, garbage out. You misunderstood my point about the type error. “Test” functions are not special in that regard.

@bakpakin
Copy link
Member

bakpakin commented Aug 27, 2022

There already is string/find which does the same as string/contains? You don't even need the nil check since string/find cannot return false.

@Techcable
Copy link
Contributor Author

Techcable commented Aug 27, 2022

Remove the try. In dynamic languages, the usual idea is garbage in, garbage out. You misunderstood my point about the type error. “Test” functions are not special in that regard.

Okay, I will remove the documentation I added for the type error in index-of too.
If this is a "garbage out" situation, the type error is not really part of the function's contract (in contrast to Python, where a TypeError is usually part of the function's contract).
We may want the flexibility to change this behavior in the future (maybe change it to return nil).

Documenting the type error for index-of would also create an inconsistency with next, where the type error is not documented.

There already is string/find which does the same as string/contains? You don't even need the nil check since string/find cannot return false.

Understood. I agree. Using (when (string/find ...) ...) or (truthy? (string/find ....)) is good enough.

> Remove the try. In dynamic languages, the usual idea is garbage in, garbage out. You misunderstood my point about the type error. “Test” functions are not special in that regard.
> - @bakpakin
Three reasons:
1. This same behavior is not documented on the `next` function
2. This function does not throw the error directly,
   it only throws an error because `next` does.
3. Following the same idea as the previous commit, this behavior is
   more or less implementation-defined for nonsensical types
  > In dynamic languages, the usual idea is garbage in, garbage out.

Various other documentation cleanup.
@Techcable
Copy link
Contributor Author

Unit tests added, documentation cleaned up & corrected, try wrapper removed.

Should be very close to final submission (or dismissal).

Remaining tasks:

  • Final check over documentation
  • Final rebase?

@bakpakin
Copy link
Member

The docstrings don't match the style of other docstrings, including references to null (not a thing in Janet), and in general are very verbose for such simple functions. The documentation is embedded into the Janet binary, please keep it to a couple of sentences, not an essay.

More of a nit, but (not= nil ...) will actually be more efficient than (not (nil? ...)), so prefer that.

@bakpakin
Copy link
Member

Doing some more thinking and comparing to Clojure, which defines contains? as the contains-key? functions above. To avoid this confusion and be very explicit, I think names like has-key? and has-value? are a bit better, especially as associative arrays are the most general Janet data structure abstraction (rather than the seq as in Clojure), so I think that maps better.

@Techcable
Copy link
Contributor Author

Agreed. I definitely need to clean up (and reduce the size of) the docstrings. The reference to null was just a sloppy mistake on my part.

names like has-key? and has-value? are a bit better

Thank you! This is a much better name :)

@bakpakin bakpakin marked this pull request as ready for review September 19, 2022 22:00
Copy link
Member

@pepe pepe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to take a fresh look at this after it was hanging here for a while. Frankly, I see little value in having such functions in boot.janet. In all the library codes I have written, I have not missed such a functionality.

Copy link
Contributor

@primo-ppcg primo-ppcg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contains-key? seems to be completely superfluous. I would, however, like to point out that there is some potential for confusion in the current Core API. One might wrongly assume, as I initially did, that in is intended to be used when it is unknown if the key exists, and get when it is known to exist. Of course, it is precisely the opposite. To someone unfamiliar with the language, the following looks like correct code, and would likely pass review:

(if (in items key)
  ...

when of course, it is not what the author intended. If the language were still pre-release, I would strongly recommend swapping their definitions. That said, it is a very minor hurdle, and not one that justifies a convenience method, in my opinion. And with names like get and in, how much more concise could you possibly want it to be?

One thing I that will agree with, is that index-of should not be used if you don't actually mean index-of. It's a great example of bad practice. As a reviewer, if I encounter code like the following, in any language:

(if (not (index-of val items))
  ...

the first thing I need to do is stop and check the definition of index-of. "What does it return if the value isn't found? Is it possible to return a falsey value even if the value is found?" And even if I think I know the answer to these questions, I'm still going to double check to make sure, and would likely request a change either way. Whereas, if I encounter something like this:

(if (contains? items val)
  ...

I can read straight over it. The intent is clear, and the implementation is obviously a correct expression of that intent.

I'm not suggesting that this function must necessary be called contains? or that portions of this PR should be merged. I am merely submitting my options for consideration. Personally, I would much rather see something like:

(if (some (partial = val) items)
  ...

even if I know it to be less efficient.

@zevv
Copy link
Contributor

zevv commented May 30, 2023

One thing I that will agree with, is that index-of should not be used if you don't actually mean index-of.

> (if (not (index-of val items))

vs

> (if (contains? items val))

Yes, very much this. I strongly feel that code should reflect the intentions of the programmer. Reading the first line makes me stop and wonder what is happening there, while reading the second shows without doubt what the author is doing there.

Of course every language has its idioms (I read C code like return !!(a & (1<<0x08)) without blinking), but there is some value to using 'well known' names for common operations.

I feel it might be justified to have a function called contains?, returning a boolean, with well defined semantics over tuples, array, structs and objects, even if it adds yet another identifier to the global name space.

@sogaiu
Copy link
Contributor

sogaiu commented May 30, 2023

I appreciate:

reflect the intentions of the programmer

but I don't think it's always cost-free (efficiency-wise but also from the perspective of compatibility, growing code base, and maintenance).

Having said that, I don't know whether those parenthetical concerns apply to this case.

@primo-ppcg
Copy link
Contributor

The three major selling points of Janet for me:

  1. it's small
  2. it's fast
  3. it's expressive

code should reflect the intentions of the programmer

Does the language allow this? Does it encourage this, by design? That is expressiveness.

@sogaiu
Copy link
Contributor

sogaiu commented Jun 1, 2023

it's small
it's fast
it's expressive

Yes, nice!

Hopefully for many cases (including the one this PR is about) it's not a "choose 2 out of 3" situation :)

Does the language allow this? Does it encourage this, by design? That is expressiveness.

I can understand including "allow" in such a definition. I can appreciate the value of "encourage", but am not so familiar with including that in "expressiveness" [1]. (I like avoiding lists of 4 things though (^^;)

I would submit though that sometimes what is "encouraged" shields us from other possibilities that may occasionally be useful or better.


[1] Thanks for the food for thought -- I have something to mull over.

@primo-ppcg
Copy link
Contributor

I can appreciate the value of "encourage", but am not so familiar with including that in "expressiveness"

What I mean specifically, is that if it is more convenient to write code that clearly expresses your intent to other developers (and your furture self) than not, then the language has encouraged you to do this, by design.

@sogaiu
Copy link
Contributor

sogaiu commented Jun 1, 2023

Ah, thank you for the elaboration. That makes sense.

@bakpakin
Copy link
Member

bakpakin commented Jun 1, 2023

Since most of the work has already been done and test cases added, and there is some positive feedback, I will merge this. Still there is some work needed here to make doc strings match the other docstrings, and I will rename the pair of functions to has-key? and has-value?.

@bakpakin bakpakin merged commit 0fcbda2 into janet-lang:master Jun 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants