-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add walk to epath.Path #525
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for starting this! Mostly minor comments
etils/epath/abstract_path.py
Outdated
follow_symlinks: bool = False, | ||
max_depth: Optional[int] = None, | ||
**kwargs, | ||
) -> Iterator[_T, Iterable[_T], Iterable[_T]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the official pathlib.pyi signature and correct type annotations from: https://github.com/python/typeshed/blob/main/stdlib/pathlib.pyi
@abstractmethod
def walk(
self,
top_down: bool = True,
on_error: Callable[[OSError], object] | None = None,
follow_symlinks: bool = False,
) -> Iterator[tuple[Self, list[str], list[str]]]:
max_depth
is never used, as well as **kwargs
, so those should be removed everywhere.
Currently calling with a typo path.walk(top_dow=False)
will silently bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fsspec backend accepts a string as argument for on_error
which is not supported by the official pathlib.pyi signature. The fsspec backend also introduces the kwargs
and max_depth
arguments. We can cover these cases as follow
@abstractmethod
def walk(
self,
top_down: bool = True,
on_error: Callable[[OSError], object] | str | None = None,
follow_symlinks: bool = False,
max_depth: int | None = None,
**kwargs: Any,
) -> Iterator[tuple[Self, list[str], list[str]]]:
To prevent the typos actually we can replace kwargs
by an optional dict argument called fsspec_kwargs
and use this dict only in the fsspec backend or we can simply remove kwargs
but it will means to loose the ability to control how fsspec list files during the walk. what do you think about it ?
Notes:
- I modify the
maxdepth
args from fsspec tomax_depth
to make it easier to remember for the users. - fsspec.walk doc https://filesystem-spec.readthedocs.io/en/latest/api.html?highlight=walk#fsspec.spec.AbstractFileSystem.walk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
epath.Path
is supposed to abstract away the backends and filesystem that the path use. The fact we use fsspec
/tensorflow
or other is an internal detail that might change in the future.
So I don't think the public API should expose any of max_depth
and kwargs
. Otherwise the same code would behave inconsistently across gcs
/ locally depending on which backend is used.
The fact not all options from all backend are exposed is not a problem. If max_depth
is really required/requested/useful, it should be added first in pathlib.Path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
etils/epath/backend.py
Outdated
on_error: Optional[Callable] = None, | ||
follow_symlinks: bool = False, | ||
max_depth: Optional[int]=None, | ||
) -> Iterator[str, Iterable[str], Iterable[str]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type annotations should be Iterator[tuple[Self, list[str], list[str]]]:
and Callable[[OSError], object] | None
Same everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
etils/epath/backend.py
Outdated
top_down: bool = True, | ||
on_error: Optional[Callable] = None, | ||
follow_symlinks: bool = False, | ||
max_depth: Optional[int]=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove non-pathlib kwargs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
etils/epath/gpath.py
Outdated
follow_symlinks: bool = False, | ||
max_depth: Optional[int] = None, | ||
**kwargs, | ||
) -> Iterator[_P, Iterable[_P], Iterable[_P]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
etils/epath/gpath.py
Outdated
max_depth=max_depth, | ||
**kwargs, | ||
): | ||
yield self._new(root), tuple(self._new(dir) for dir in dirs), tuple( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Behavior should match pathlib.Path.walk()
which returns tuple[Self, list[str], list[str]
Only root
is a pathlib-like, dir
and files
are list[str]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
etils/epath/backend_test.py
Outdated
|
||
last_seen = None | ||
|
||
for path, dirs, files in backend.walk(tmp_path, top_down=False, follow_symlinks=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than many conditions, How about testing
assert list(backend.walk(tmp_path)) == [
(tmp_path, ['abc'], []),
...,
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestions ! I updated the tests
@@ -19,7 +19,7 @@ | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also update the CHANGELOG.md
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@Conchylicultor I have one last question. The argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update. I imported this internally and am waiting for internal review to merge this.
Hi,
I also needed the walk method for a project and took the liberty to fix #520
Best,
Hicham