Traffic needed for 'list' and 'prune' operations on a remote repository increases linearly with time #167
In addition, upgrading attic from 0.13 to 0.14 on both the target and the remote repo machines does not make any difference.
Well, I figured out that all the slowness and high traffic is coming from calling `list`. As a workaround, I changed my backup task and I am now caching the last backup time locally, to avoid the `list` call. @jborg, if I am not missing some important information here, I think that `list` should not need to fetch that much data from the remote repository.
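The caching workaround described above can be sketched as follows. This is only an illustration, assuming the backup task records a timestamp after each successful `attic create`; the cache file path and format are assumptions, not what the reporter actually used.

```python
# Sketch of the caching workaround: store the time of the last successful
# backup in a local file, so the scheduled task can read it back instead of
# running the expensive "attic list" against the remote repository.
import os
import time

CACHE_FILE = "/var/cache/backup/last_backup_time"  # hypothetical location

def save_last_backup_time(path=CACHE_FILE, when=None):
    """Record the completion time of a successful 'attic create'."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(str(when if when is not None else time.time()))

def load_last_backup_time(path=CACHE_FILE):
    """Return the cached timestamp, or None if no backup has been recorded yet."""
    try:
        with open(path) as f:
            return float(f.read().strip())
    except (OSError, ValueError):
        return None
```

The first run falls back to `None` (no cached time), so the task only needs one slow `list` ever, instead of one per invocation.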
@Ernest0x some thoughts: You say it surprises you that backup is so fast. Well, if you do it every 5 minutes, not much has changed since the last backup. Attic has a local cache with file metadata from the last backup, so it can quickly skip over unchanged files and will only examine changed ones more deeply. Even for the changed ones, it will only transfer chunks to the backup repo that are new and not already stored there (it has a local chunk id cache to decide that quickly). About list and prune: I think the root cause of the high traffic you observed is that the "items" list of an archive is contained in its main metadata dictionary, and that dictionary needs to be loaded even for small pieces of information like name and timestamp. But unlike name and timestamp, the items list can get rather big: it is a list of chunk ids for all the chunks that store the items' metadata. So "list" only needs a little information from there, but it must load the complete data structure.
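The explanation above can be illustrated with a toy model. This is not attic's actual on-disk format, just a sketch of why bundling the "items" list into the main metadata dictionary makes `list` expensive, and how a split layout would avoid that:

```python
# Illustrative model (NOT attic's real format) of an archive's main metadata
# dictionary: the small fields that "list" needs (name, time) are bundled
# with the potentially huge list of item-metadata chunk ids.
archive_metadata = {
    "name": "backup-2015-01-01T00:00",
    "time": "2015-01-01T00:00:00",
    "items": ["chunk-id-%06d" % i for i in range(100_000)],  # grows with archive size
}

def list_archive(metadata):
    """'list' only needs name and time, but the whole dict travels over the wire."""
    return metadata["name"], metadata["time"]

# A split layout would let "list" fetch only a small, fixed-size header:
archive_header = {"name": archive_metadata["name"], "time": archive_metadata["time"]}
archive_items = archive_metadata["items"]  # fetched only when item data is needed
```

Under such a split, `list` and the archive-selection phase of `prune` would transfer a few hundred bytes per archive instead of the full items list, which is where the observed linear traffic growth would come from in this model.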
@ThomasWaldmann I may have misstated that: it does not surprise me how fast the backup is, but how slow the list/prune commands are while the backup is that fast. As for your suggestion, it's not clear to me what exactly you propose as a change. It sounds like it would require changes to the repository format, right? Maybe a diagram showing a 'prune' operation and how it accesses the remote repository's (proposed) data structures would be helpful.
I have a pretty frequent backup task which is scheduled to run every 5 minutes and uses a remote attic repository. The task basically includes an attic `list` operation in order to get the time of the last backup, a `create` operation to actually take a new backup, and a `prune` operation to delete older archives.

The problem is that the traffic that this task generates is increasing linearly with time. I am not saying linearly with the number of archives, because the prune command has already started to prune older backups, so at each execution one new archive is created and one old archive is deleted. What surprises me is that the traffic generated by the `create` command is negligible in comparison to the traffic generated by the `list` and `prune` commands. The `create` command takes only 2-3 seconds, while the `list` and `prune` commands take far longer (~2-3 minutes combined) and generate a lot of traffic. The direction of that traffic is:

remote repo -> target host

It looks like the attic `list` and `prune` commands need to fetch a lot of data from the remote repository in order to do what they do, which does not make sense to me. The text of the listing of all archives (measured by piping the output of `attic list` through `wc -c`) is only ~125KB.

Here are some stats from a 'create' command:
Any thoughts?
If there is any extra information that would be helpful, let me know.
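For reference, the scheduled task described above can be sketched as the three attic invocations it runs every 5 minutes. The repository URL, source path, archive naming scheme, and prune retention flag are assumptions for illustration, not the reporter's exact configuration:

```python
# Sketch of the 5-minute backup task: list (to get the last backup time),
# create (new archive), prune (delete older archives). Commands are built
# as argument lists; repo path and retention policy are hypothetical.
import time

REPO = "user@backuphost:/path/to/repo.attic"  # hypothetical remote repository

def build_backup_commands(now=None):
    """Return the three attic invocations the task runs on each execution."""
    stamp = time.strftime("%Y-%m-%dT%H:%M", time.localtime(now))
    return [
        ["attic", "list", REPO],                       # slow: fetches per-archive metadata
        ["attic", "create",                            # fast: only new chunks travel
         "%s::backup-%s" % (REPO, stamp), "/home"],
        ["attic", "prune", REPO, "--keep-hourly", "24"],  # slow for the same reason as list
    ]
```

Each returned list could be passed to `subprocess.run`; they are kept as data here so the shape of the task is visible without needing attic installed.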