Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add walk to epath.Path #525

Merged
merged 12 commits into from
Jan 11, 2024
Merged

Conversation

chicham
Copy link
Contributor

@chicham chicham commented Dec 26, 2023

Hi,

I also needed the walk method for a project and took the liberty to fix #520

Best,
Hicham

Copy link

google-cla bot commented Dec 26, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Copy link
Member

@Conchylicultor Conchylicultor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for starting this! Mostly minor comments

follow_symlinks: bool = False,
max_depth: Optional[int] = None,
**kwargs,
) -> Iterator[_T, Iterable[_T], Iterable[_T]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the official pathlib.pyi signature and correct type annotations from: https://github.com/python/typeshed/blob/main/stdlib/pathlib.pyi

  @abstractmethod
  def walk(
      self,
      top_down: bool = True,
      on_error: Callable[[OSError], object] | None = None,
      follow_symlinks: bool = False,
  ) -> Iterator[tuple[Self, list[str], list[str]]]:

max_depth is never used, as well as **kwargs, so those should be removed everywhere.

Currently calling with a typo path.walk(top_dow=False) will silently bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fsspec backend accepts a string as argument for on_error which is not supported by the official pathlib.pyi signature. The fsspec backend also introduces the kwargs and max_depth arguments. We can cover these cases as follow

  @abstractmethod
  def walk(
      self,
      top_down: bool = True,
      on_error: Callable[[OSError], object] | str | None = None,
      follow_symlinks: bool = False,
      max_depth: int | None = None,
      **kwargs: Any,
  ) -> Iterator[tuple[Self, list[str], list[str]]]:

To prevent the typos actually we can replace kwargs by an optional dict argument called fsspec_kwargs and use this dict only in the fsspec backend or we can simply remove kwargs but it will means to loose the ability to control how fsspec list files during the walk. what do you think about it ?

Notes:

Copy link
Member

@Conchylicultor Conchylicultor Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

epath.Path is supposed to abstract away the backends and filesystem that the path use. The fact we use fsspec/tensorflow or other is an internal detail that might change in the future.

So I don't think the public API should expose any of max_depth and kwargs. Otherwise the same code would behave inconsistently across gcs / locally depending on which backend is used.

The fact not all options from all backend are exposed is not a problem. If max_depth is really required/requested/useful, it should be added first in pathlib.Path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

on_error: Optional[Callable] = None,
follow_symlinks: bool = False,
max_depth: Optional[int]=None,
) -> Iterator[str, Iterable[str], Iterable[str]]:
Copy link
Member

@Conchylicultor Conchylicultor Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type annotations should be Iterator[tuple[Self, list[str], list[str]]]: and Callable[[OSError], object] | None

Same everywhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

top_down: bool = True,
on_error: Optional[Callable] = None,
follow_symlinks: bool = False,
max_depth: Optional[int]=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove non-pathlib kwargs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

follow_symlinks: bool = False,
max_depth: Optional[int] = None,
**kwargs,
) -> Iterator[_P, Iterable[_P], Iterable[_P]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

max_depth=max_depth,
**kwargs,
):
yield self._new(root), tuple(self._new(dir) for dir in dirs), tuple(
Copy link
Member

@Conchylicultor Conchylicultor Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior should match pathlib.Path.walk() which returns tuple[Self, list[str], list[str]

Only root is a pathlib-like, dir and files are list[str]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


last_seen = None

for path, dirs, files in backend.walk(tmp_path, top_down=False, follow_symlinks=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than many conditions, How about testing

assert list(backend.walk(tmp_path)) == [
    (tmp_path, ['abc'], []),
    ...,
]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestions ! I updated the tests

@@ -19,7 +19,7 @@
import os
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also update the CHANGELOG.md file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@chicham
Copy link
Contributor Author

chicham commented Jan 8, 2024

@Conchylicultor I have one last question. The argument follow_symlinks of walk is only used by the os.path backend. Either fsspec and tf.io.gfile seem to ignore it.
Do you think we should raise a warning if this argument is set to True for the backends which do not support it ?

@chicham chicham requested a review from Conchylicultor January 8, 2024 15:37
Copy link
Member

@Conchylicultor Conchylicultor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update. I imported this internally and am waiting for internal review to merge this.

@copybara-service copybara-service bot merged commit 0a3ea8e into google:main Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support walk in epath?
2 participants