Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

borg crashes when backing up to gvfs volume #8120

Closed
steelman opened this issue Feb 23, 2024 · 19 comments
Closed

borg crashes when backing up to gvfs volume #8120

steelman opened this issue Feb 23, 2024 · 19 comments
Assignees
Labels

Comments

@steelman
Copy link
Contributor

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

BUG

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.2.4

Operating system (distribution) and version.

Debian 12.5

Hardware / network configuration, and filesystems used.

Source: local ext4

Destination: CIFS mounted via FUSE (gvfs) with gio mount smb://….

How much data is handled by borg?

(The question isn't clear)

Archive name: 2024-W08-1
Archive fingerprint: 39956c1531fc491e35c5a20c2622a7335a0a53b463145bf0200722cb1662ac89
Comment: 
Hostname: darkstar
Username: steelman
Time (start): Mon, 2024-02-19 10:13:04
Time (end): Mon, 2024-02-19 10:13:36
Duration: 31.38 seconds
Number of files: 77911
Command line: /usr/bin/borg create -e Maildir/.notmuch -C zlib,9 -p -v -s ::2024-W08-1 ./Mail ./Maildir ./.notmuch-config ./.authinfo.gpg ./.config
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                6.58 GB              4.13 GB             46.77 MB
All archives:               51.44 GB             32.20 GB              3.86 GB

                       Unique chunks         Total chunks
Chunk index:                   82452               630062

Full borg commandline that lead to the problem (leave away excludes and passwords)

borg check ::2024-W08-1

Describe the problem you're observing.

borg crashes

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yes, it happens every time.

Include any warning/errors/backtraces from the system logs

Failed to securely erase old repository config file (hardlinks not supported). Old repokey data, if any, might persist on physical storage.                                                                      
Failed to securely erase old repository config file (hardlinks not supported). Old repokey data, if any, might persist on physical storage.                                                                      
Exception ignored in: <function Repository.__del__ at 0x7f2948f3e660>                                                                                                                                            
Traceback (most recent call last):                                                                                                                                                                               
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 189, in __del__                                                                                                                                 
    self.close()                                                                                                                                                                                                 
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 478, in close                                                                                                                                   
    self.lock.release()                                                                                                                                                                                          
  File "/usr/lib/python3/dist-packages/borg/locking.py", line 417, in release                                                                                                                                    
    self._roster.modify(EXCLUSIVE, REMOVE)                                                                                                                                                                       
  File "/usr/lib/python3/dist-packages/borg/locking.py", line 316, in modify                                                                                                                                     
    elements.remove(self.id)                                                                                                                                                                                     
KeyError: ('darkstar@220292345999402', 1536376, 0)                                                                                                                                
Local Exception                                                                                                                                                                                                  
OSError: [Errno 95] Operation not supported                                                                                                                                                                      
                                                                                                                                                                                                                 
During handling of the above exception, another exception occurred:                                                                                                                                              
                                                                                                                                                                                                                 
Traceback (most recent call last):                                                                                                                                                                               
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 5213, in main                                                                                                                                     
    exit_code = archiver.run(args)                                                                                                                                                                               
                ^^^^^^^^^^^^^^^^^^                                                                                                                                                                               
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 5144, in run
    return set_ec(func(args))
                  ^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 168, in wrapper
    with repository:
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 214, in __exit__
    self.close()
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 478, in close
    self.lock.release()
  File "/usr/lib/python3/dist-packages/borg/locking.py", line 417, in release
    self._roster.modify(EXCLUSIVE, REMOVE)
  File "/usr/lib/python3/dist-packages/borg/locking.py", line 320, in modify
    self.save(roster)                                                                                                                                                                                            
  File "/usr/lib/python3/dist-packages/borg/locking.py", line 291, in save
    with open(self.path, "w") as f:
OSError: [Errno 95] Operation not supported
                                                                                                                                                                                                                 
Platform: Linux darkstar 6.5.0-0.deb12.4-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.10-1~bpo12+1 (2023-11-23) x86_64
Linux: Unknown Linux
Borg: 1.2.4  Python: CPython 3.11.2 msgpack: 1.0.3 fuse: pyfuse3 3.2.1 [pyfuse3,llfuse]
PID: 1536376  CWD: /home/steelman
sys.argv: ['/usr/bin/borg', 'check', '::2024-W08-1']
SSH_ORIGINAL_COMMAND: None
@ThomasWaldmann
Copy link
Member

Your filesystem doesn't work correctly.

borg does a file "open" and the fs responds with:

OSError: [Errno 95] Operation not supported

@ThomasWaldmann
Copy link
Member

#7937 also had this issue.

@steelman
Copy link
Contributor Author

Indeed it looks very similar. However, it needs to be noted that:

a) it happens only with borg check and not with borg create
b) I can open() files for writing on this file system (especially inside the borg repository) without any problems.

@ThomasWaldmann
Copy link
Member

You can try to reproduce this on a local filesystem to see whether it only happens on cifs.

"not supported" as error response to "open()" is weird.

@ThomasWaldmann
Copy link
Member

Oh, and try without that "gvfs" and just do a normal cifs mount and try with that also.

@steelman
Copy link
Contributor Author

steelman commented Feb 28, 2024

On kernel-based cifs volume borg check works fine. On gvfs with strace(1) attached to gvfs-fuse serving /run/user/2137/gvfs borg works fine. Without strace(1) it fails. From my experience this looks like some race condition in gvfs. But it is borg that is able to trigger it and that is interesting.

EDIT: borg check running under strace(1) also does not crash.

EVEN MORE EDIT: Even weirder is that the exception happens on the second attempt to open /run/user/2137/gvfs/smb-share:server=[snip],share=homefolder/steelman/Backup/Mail/lock.roster after asking for the password. The first successful attempt takes place right after startup.

@steelman
Copy link
Contributor Author

steelman commented Mar 11, 2024

I played with touch(1) and strace(1). The former randomly fails to create files in /run/user/2137/gvfs/smb-share:server=[snip],share=homefolder/steelman/Backup/Mail/ (with EOPNOTSUPP) so it seems there is little chance this isn't a fault of gvfs. I will keep investigating.

EDIT: No, it's not at random. openat(2) sets errno to EOPNOTSUPP when the file to be created exists. The first call succeeds:

openat(AT_FDCWD, "/run/user/2137/gvfs/smb-share:server=[snip],share=homefolder/steelman/Backup/Mail/__test__", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3

consequent fail:

openat(AT_FDCWD, "/run/user/2137/gvfs/smb-share:server=[snip],share=homefolder/steelman/Backup/Mail/__test__", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 EOPNOTSUPP (Operation not supported)

@ThomasWaldmann
Copy link
Member

OK, so guess this is a gvfs bug? Or is borg doing something wrong or unusual?

@steelman
Copy link
Contributor Author

It looks like a gvfs bug to me, but this is just like, my opinion. I will find out.

@ThomasWaldmann
Copy link
Member

Guess this is a gvfs bug.

@steelman
Copy link
Contributor Author

steelman commented Aug 4, 2024

Indeed, I still haven't found out why it occurs but at least borg stopped triggering for me.

@hadess
Copy link

hadess commented Oct 2, 2024

Even if it was a bug in gvfs, the debug information that borg gave it made it near impossible to root cause.
I honestly expected borg to throw a nice error that pointed exactly what the problem was, rather than throwing exceptions during exception handling. This was even more complicated in my case as I was using a sandboxed Pika Bkacup to run borg, so Flatpak sandbox + front-end on top made debugging impossible for me at that point.

Thanks to @steelman for the debug info, I could reproduce locally and file a bug with a reproducer:
https://gitlab.gnome.org/GNOME/gvfs/-/issues/766

@ThomasWaldmann
Copy link
Member

@hadess Thanks for analysing this and filing the bug with reproducer at gvfs!

The problem here is that borg just does an "open()", everything more detailed happens inside CPython, so that "OSError: [Errno 95] Operation not supported" is all we have.

For such unexpected errors, we WANT to see the traceback (in case borg is the problem, so we can see what led to the exception). So the goal here (in general) is not a "nice" error msg, but an error msg with all the information we have.

I agree that it is sometimes a bit hard to analyse, especially if there are exceptions within the exception handler or within context managers trying to clean up, but I don't think we can easily improve this.

@hadess
Copy link

hadess commented Oct 2, 2024

The problem here is that borg just does an "open()", everything more detailed happens inside CPython, so that "OSError: [Errno 95] Operation not supported" is all we have.

For me a "nice" error message is something that can be acted upon. Like the value of self.path when open(self.path, "w") fails is pretty damn important, and as far as I know, you can't get that information without either:

  1. modifying the upstream code, which was impossible in my case without rebuilding a Python app and its Rust front-end,
  2. or borg outputting debug before attempting to open the file,
  3. or borg catching the exception to show both the path, and the exception traceback.

I'd fine with either 2 or 3.

@ThomasWaldmann
Copy link
Member

True, but even that could be seen as a bug in Python: it could include the file path or file fd when raising that exception rather than having os function users having to add debug logging before or wrappers around all calls.

@hadess
Copy link

hadess commented Oct 2, 2024

True, but even that could be seen as a bug in Python: it could include the file path or file fd when raising that exception rather than having os function users having to add debug logging before or wrappers around all calls.

Can you please take care of filing that bug against Python then?

There are already a number of work-arounds that exist for Python (I found https://pypi.org/project/friendly-traceback/ after 5 minutes of searching), but I don't know whether this sort of changes would ever get merged into Python upstream, whether it is at all suitable to Python upstream, or even whether the timeframe for it being fixed is anywhere near "short-term" enough for a work-around not to be implemented now.

Is there anything wrong with wrapping that call with a try/except and printing an understandable error message in the exception path followed by raising the error again?

@ThomasWaldmann
Copy link
Member

We have a lot of such calls, not only that one - and I don't feel like "decorating" them all. In the end, I guess you had to use strace anyway to find the precise syscalls?

IIRC, there were some improvements to tracebacks in recent python versions, but not sure if any of these would have helped here.

friendly-traceback is rather to help python beginners with their first coding steps and I don't think it is suitable here.

Yes, I can check the python issue tracker and file a bug if they don't have one already.

@ThomasWaldmann
Copy link
Member

python/cpython#124906

@hadess
Copy link

hadess commented Oct 2, 2024

We have a lot of such calls, not only that one - and I don't feel like "decorating" them all. In the end, I guess you had to use strace anyway to find the precise syscalls?

No, I didn't manage to root cause anything a year ago, because of the hurdles I mentioned above (using a front-end that wasn't giving me much more info, and sandboxed at that).

I was just lucky today, as I revisited the problem, to find that the OP had root-caused it. So I took the opportunity to file an upstream bug, and I'll probably try and fix it too, time permitting.

I would just want the next person to not spend days of time trying to root cause the problem when the problem is hit next, whether it's on another FUSE implementation, or something else, when we could "just" have better debug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants