Track progress of syncing #343

Open
tomprince opened this issue May 21, 2021 · 2 comments
Labels
Nice To Have A feature that is not required but may be desirable

Comments

@tomprince
Contributor

At some point, we want to expose some sort of progress for syncing. This is a placeholder issue to collect possible implementation methods.

@tomprince
Contributor Author

From @crwood:

I remember trying to do exactly this at some point years ago (i.e., attempting to generate progress indicators for individual files/operations by correlating their file sizes against the storage index shown in the operations page) but ran into various issues at the time (that, in retrospect, I wish I had written down...). Having different files with the same file size was the obvious big one, but I believe also that the operations page only keeps a very limited number of entries (such that an operation that takes longer can get bumped off by smaller entries that complete more quickly, but I'd have to test and confirm that). Is it possible to get/know the storage index both earlier and more programmatically, somehow (i.e., immediately after that PUT)?
Alternatively, could we attach an "operation handle" to magic-folder operations, as described in the web API docs (https://tahoe-lafs.readthedocs.io/en/latest/frontends/webapi.html#slow-operations-progress-and-cancelling)? Perhaps that would enable sufficient monitoring of individual upload/download operations.
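To illustrate why correlating by file size breaks down, here is a minimal Python sketch. The operation records, their storage-index values, and the `total-size` key are hypothetical stand-ins for entries on the operations page, not a confirmed Tahoe schema.

```python
from typing import Any


def candidates_by_size(operations: list[dict[str, Any]], file_size: int) -> list[dict[str, Any]]:
    """Return every operation whose reported size matches file_size.

    "total-size" is an assumed field name used for illustration only.
    """
    return [op for op in operations if op.get("total-size") == file_size]


# Hypothetical records: two different files that happen to be the same size.
ops = [
    {"storage-index-string": "aaaa", "total-size": 4096},
    {"storage-index-string": "bbbb", "total-size": 4096},
    {"storage-index-string": "cccc", "total-size": 1234},
]
matches = candidates_by_size(ops, 4096)
print(len(matches))  # 2: ambiguous, no way to tell which one is ours
```

A size match only narrows the candidates; whenever two uploads share a size, the result is ambiguous, which is the core problem described above.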

@meejah
Collaborator

meejah commented May 21, 2021

  • The Tahoe "long operations" are currently used only for deep-check / deep-stats operations.

  • The "active and recent operations" are indexed by "storage-index-string". This is derived from the capability.

  • The WebUI endpoint used to upload unattached immutables is a "PUT" operation. It accepts data more rapidly than it pushes it upstream but doesn't return a response until the data is uploaded into the Grid. So we don't have the capability until the upload has completed (and thus can't figure out which "active operation" is ours).

So essentially there is no good method from Tahoe to get progress information. You can see what operations Tahoe is doing (with nice progress indicators), but correlating those to particular files needs the capability.

I think the most straightforward change we could make to Tahoe would be to have the "PUT" operation accept an optional ID that is reflected in the "recent operations" JSON. This would allow us to pick out the correct "active operation" from the "/status?t=json" endpoint. I do not have a good handle on exactly how complex the Tahoe changes would be here (although I believe "fairly shallow"), so this needs to be investigated and weighed against the following suggestion.
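A rough Python sketch of how the client side of this proposal might look. Everything here is hypothetical: the `operation-id` query parameter and the echoed `operation-id` field are the proposed change, not something Tahoe accepts today, and the assumed shape of the `/status?t=json` response (top-level `active` and `recent` lists) should be checked against the real endpoint.

```python
import json
import uuid
from typing import Any, Optional
from urllib.parse import urlencode


def upload_path(operation_id: str) -> str:
    """Build the path for an unattached-immutable PUT carrying a client-chosen ID.

    The "operation-id" query parameter is hypothetical (the proposed change).
    """
    return "/uri?" + urlencode({"operation-id": operation_id})


def find_operation(status_json: str, operation_id: str) -> Optional[dict[str, Any]]:
    """Find our operation in a /status?t=json response by its reflected ID.

    Assumes "active" and "recent" lists of operation objects, each echoing
    back the hypothetical "operation-id" field.
    """
    status = json.loads(status_json)
    for section in ("active", "recent"):
        for op in status.get(section, []):
            if op.get("operation-id") == operation_id:
                return op
    return None


# The client picks a unique ID before starting the upload...
op_id = uuid.uuid4().hex
print(upload_path(op_id))

# ...and later looks for it in the status JSON (sample data, not real output).
sample = json.dumps({
    "active": [{"operation-id": "abc123", "type": "upload"}],
    "recent": [],
})
print(find_operation(sample, "abc123"))  # {'operation-id': 'abc123', 'type': 'upload'}
```

Because the client chooses the ID before the upload starts, it no longer needs the capability (and hence the storage index) to find its own entry.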

Another change to Tahoe that would allow proper progress reporting would be to have the PUT operation exert proper backpressure (that is, accept data approximately as rapidly as it pushes data up to storage servers). Then we could track how many bytes have been pushed and report progress that way. This would also make the "tahoe put" operation "more correct" for other users. However, I do not have a good idea how complex this would be inside Tahoe (I believe "somewhat complex").
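If PUT did exert backpressure, the client could measure progress simply by counting bytes as they are accepted. Below is a minimal Python sketch under that assumption. `chunks_with_progress` is a hypothetical helper (not part of Tahoe or magic-folder) meant to feed some pull-based uploader, which is left unspecified.

```python
import io
from typing import BinaryIO, Callable, Iterator


def chunks_with_progress(
    src: BinaryIO,
    total: int,
    report: Callable[[float], None],
    chunk_size: int = 64 * 1024,
) -> Iterator[bytes]:
    """Yield src in chunks, reporting the fraction handed off so far.

    Progress is recorded only when the consumer pulls the next chunk. With a
    pull-based uploader and a server that exerts backpressure, that pull happens
    roughly when the previous chunk has actually been pushed upstream.
    """
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        yield chunk
        # Resumed: the consumer has asked for more, so count this chunk as sent.
        sent += len(chunk)
        report(sent / total if total else 1.0)


progress: list[float] = []
data = b"x" * 150_000
sent_chunks = list(chunks_with_progress(io.BytesIO(data), len(data), progress.append))
print([round(p, 2) for p in progress])  # [0.44, 0.87, 1.0]
```

Without backpressure the same counter would race ahead to 100% as soon as Tahoe buffered the data, which is why this approach depends on the PUT change rather than working today.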

@meejah meejah added the Nice To Have A feature that is not required but may be desirable label Jun 18, 2021