Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script to generate the stub file #44

Merged
merged 6 commits into from
Nov 28, 2024

Conversation

gahjelle
Copy link
Contributor

@gahjelle gahjelle commented Sep 1, 2024

This is a draft for a script that can generate the spherely.pyi stub file.

During the EuroSciPy 2024 sprint, I worked with @jorisvandenbossche on #43 . We realized that we need several different _VFunc_* types to cover the different overloaded arguments for functions going forward. We tried a few different approaches to avoid too much boilerplate/copy-paste code, including ways of creating dynamic types but didn't find anything that was well supported by the type checkers.

Instead, the following script can automatically generate code for such classes. It works by looking a # /// Begin types marker and lines defining which types should be added. For example, to create a type that takes 2 geography inputs, as well as an optional radius parameter of type float, we can add the specification line:

#     - n_in=2, radius=float

Running the script will then update the code in-place between the # /// Begin types and the corresponding # /// End types marker. Any existing edits between those markers will be overwritten, but the rest of the file is left alone.

@gahjelle
Copy link
Contributor Author

gahjelle commented Sep 1, 2024

I'm not sure what's the best location for the script. I ended up pulling it outside the src/ directory. But we could still move it in there? Or, we could run the script as part of CI, and maybe move it to a workflow directory instead? Currently, the output of the script isn't perfectly "blackened" code. We could probably add trailing commas to make black always keep the arguments on separate lines? Or run Black as part of the script/workflow to format def on one line where there's enough room.

@benbovy
Copy link
Owner

benbovy commented Sep 3, 2024

Thanks @gahjelle that is a good idea!

We tried a few different approaches to avoid too much boilerplate/copy-paste code, including ways of creating dynamic types but didn't find anything that was well supported by the type checkers.

Out of curiosity, did you try or consider using TypedDict and Unpack (PEP 692)? Mypy seems to support it but I'm not sure about the other type checkers (I'm not an expert in Python static typing). I think one limitation is relying on **kwargs instead of a direct / explicit signature for the optional parameters, but probably not a big deal since the signatures in the docstrings are generated by pybind11 and not from the type stubs.

I'm fine with either solution. If using a script, I'd slightly prefer using Python code for the specs instead of parsing comments (e.g., in Xarray generate_ops.py. The script can be part of the code ; I think that for now it is good enough to run it manually when needed (we don't need to re-generate the type stubs unless we change the specs) and then let black format the generated file via pre-commit.

@gahjelle
Copy link
Contributor Author

gahjelle commented Sep 6, 2024

Thanks @benbovy

No, we didn't think about or explore using a TypedDict for doing the typing. I guess that could work. The types would probably be slightly less specific as, for example, radius would be an allowed optional parameter on all (Nin2) types. But maybe that's acceptable to avoid having to generate the stub file?

I've done an update to the generator, so that we're listing the specs explicitly in Python code instead of parsing the comments in the .pyi file.

But I'm fine with abandoning this script and working with the TypedDict and **kwargs instead.

When you say "The script can be part of the code", do you mean that we should move it into the src/ folder? Or that it should be integrated into some other code file?

Thanks again for your feedback.

@benbovy
Copy link
Owner

benbovy commented Sep 6, 2024

The types would probably be slightly less specific as, for example, radius would be an allowed optional parameter on all (Nin2) types.

Maybe we could define optional parameter types as generic, e.g., something like this:

_NameType = TypeVar("_NameType", bound=str)
_ScalarReturnType = TypeVar("_ScalarReturnType", bound=Any)
_ArrayReturnDType = TypeVar("_ArrayReturnDType", bound=Any)
_KwargsType = TypeVar("_KwargsType", bound=TypedDict)

class _VFunc_Nin2_Nout1(Generic[_NameType, _ScalarReturnType, _ArrayReturnDType, _KwargsType]):
    ...
    @overload
    def __call__(
        self,
        a: Geography,
        b: Geography,
        **kwargs: Unpack[_KwargsType],
    ) -> _ScalarReturnType: ...
    ...

class DistanceKwargs(TypedDict):
    radius: float

distance: _VFunc_Nin2_Nout1[Literal["distance"], float, float, DistanceKwargs]

When you say "The script can be part of the code", do you mean that we should move it into the src/ folder?

Yes I think it is fine moving it there.

@benbovy
Copy link
Owner

benbovy commented Nov 6, 2024

I gave a try at implementing a solution based on TypedDict + Unpack (see #65) but didn't really succeed unfortunately. It looks like both Python typing and static type checkers are not ready for supporting generic keyword arguments this way (see python/typing#1399).

I also haven't found any alternative solution based on typing that looks reasonably clean.

This PR is likely be the best way to handle keyword arguments of "universal functions" until there is better support for generic TypedDict + Unpack.

Copy link
Owner

@benbovy benbovy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR will be nice to have in as we are discussing about potentially adding more keyword arguments to various functions in #70.

@gahjelle I just added one minor comment. Perhaps the generate script could also be moved into the src folder close to spherely.pyi so it doesn't clutter the root project directory?

I keep thinking about the best / safest way to generate the vfunc types. One way could be to run the generate script during the cmake configure or build step... But it might be just fine running the generate script only when needed then commit the updated spherely.pyi file. The advantage of the latter is that re-formatting is done via pre-commit, and I guess mypy will raise some errors if we forget running the generate script anyway.

Comment on lines 75 to 77
_VFunc_Nin1_Nout1={"n_in": 1},
_VFunc_Nin2_Nout1={"n_in": 2},
# _VFunc_Nin2radius_Nout1={"n_in": 2, "radius": "float"},
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move this into a dictionary defined near the top of this file? It will be easier to find, especially for new contributors.

@gahjelle
Copy link
Contributor Author

Thanks @benbovy for having a look at this and for the update. I'll take a quick stab at updating this one to be easier to use according to your suggestions.

src/spherely.pyi Outdated
@@ -127,12 +125,16 @@ class _VFunc_Nin1optradius_Nout1(
@property
def __name__(self) -> _NameType: ...
@overload
def __call__(self, a: Geography, radius: float = ...) -> _ScalarReturnType: ...
def __call__(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benbovy One quick question about parameter name:

In the original types, the Geography type was called geography when there's only one, but a and b when there are two of them.

In the "new" optradius type, the Geography types are named with a also when there's only one such geography. The generator currently uses geography for all types when n_in == 1. Should I change it so that it only uses geography when n_in == 1 and there are no optional args?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leave it as-is and uniformize the argument names in a follow-up PR (Currently implemented functions with a unique geography argument use either geography or a in their docstrings).

Copy link
Contributor Author

@gahjelle gahjelle Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @benbovy

Just to make sure I understand, should we leave this script as it is now or should I update the script to match the existing _VFunc_Nin1optradius_Nout1 and leave that as it is? It should be a trivial change. :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using "a" even with n_in == 1 is slightly better for now since this is the name currently used in the docstrings of the (I/O and construction) functions with one vectorized argument: https://spherely.readthedocs.io/en/latest/api.html.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(sorry for the confusion: I had issues with posting inline comments this morning so I mixed comments from other PRs, now deleted)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benbovy No worries :) I've updated the script. Now the output matches the existing types.

The minor diff is because the ordering of the overloads is different for n_in == 2. I'm just using whatever itertools gives, and it's probably not worth matching the manual ordering for this.

@benbovy benbovy merged commit 3271bd0 into benbovy:main Nov 28, 2024
15 checks passed
@benbovy
Copy link
Owner

benbovy commented Nov 28, 2024

Thank you @gahjelle !

@gahjelle gahjelle deleted the generate-stub-file branch November 28, 2024 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants