Handle corrupt FS by offering to reformat #117
Comments
Workaround for now to zero out SPI flash on Express boards: Install https://github.com/adafruit/Adafruit_SPIFlash as a library in the Arduino IDE. From File->Examples, compile, load, and run the example |
I have too many times unplugged or hard-reset a board and trashed the filesystem. Verifying the integrity of the filesystem is good but there may be errors not so easily detected. I was thinking about other convenient ways to force a reset of the filesystem:
|
Just curious... What OS are you using, @dhalbert? What is running on the board before the unplug or hard reset: the REPL or a script? Are you using anything in particular, such as I2C or SPI? The filesystem seems to become corrupted more often than I would expect from just unplugging. |
@willingc: This is Windows. The corruption I see is due to delayed writes. There's a lot of gory detail and some red herrings in bug #111. See https://superuser.com/questions/1197897/windows-delays-writing-fat-table-on-usb-drive-despite-quick-removal for a summary. Feel free to write to me at the email in my github profile. |
Hmm... I can see how delayed writes would be a pain to work with. I suspect that for corruption to occur there has to be an active, in-progress write when the reset/unplug happens, rather than a write that is delayed but not yet started. I'll do some thinking on this while I'm traveling. I'm a bit curious about what state the MP/CP code leaves things in when facing an unexpected loss of power during a write. |
The problem I am seeing in Windows is specifically that the File Allocation Table (FAT) entries for a file are updated 20-90 seconds after the directory info for the file and the file data itself are written. These entries mark the first block used for a file and then contain a chain of pointers to subsequent blocks. So if you pull the plug or press the reset button at any point during those 20-90 seconds, the file system will be inconsistent. It does not have to be literally during a write. I do an "Eject" every time after I update the filesystem and before I run anything. I also turn off auto-reset. I am not sure exactly what might cause Windows to detect that the filesystem is corrupt. |
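To make the inconsistency described above concrete, here is a minimal toy model of a FAT chain walk. It is only a sketch: the table layout, the `END_OF_CHAIN` and `FREE` values, and the `follow_chain` helper are simplified assumptions for illustration, not the FatFS or Windows implementation (real FAT12 treats a range of values as end-of-chain, for example).

```python
# Toy model: fat[i] holds the next cluster of the file that owns cluster i.
END_OF_CHAIN = 0xFFF  # simplified end-of-chain marker (assumption)
FREE = 0x000          # entry not allocated to any file


def follow_chain(fat, first_cluster):
    """Return the clusters of a file, or raise ValueError if the chain is broken."""
    chain = []
    seen = set()
    cluster = first_cluster
    while True:
        # Clusters 0 and 1 are reserved; anything out of range or repeated is invalid.
        if cluster < 2 or cluster >= len(fat) or cluster in seen:
            raise ValueError("invalid cluster %d in chain" % cluster)
        entry = fat[cluster]
        if entry == FREE:
            # The directory entry points here, but the FAT was never updated.
            raise ValueError("cluster %d is used by a file but marked free" % cluster)
        seen.add(cluster)
        chain.append(cluster)
        if entry == END_OF_CHAIN:
            return chain
        cluster = entry


# FAT after the delayed write has completed: file occupies clusters 2 -> 3 -> 4.
flushed_fat = [0xFF8, 0xFFF, 3, 4, END_OF_CHAIN, FREE, FREE, FREE]
# FAT if the board is reset before the delayed write: directory entry and data
# are on disk, but the chain entries are still marked free.
stale_fat = [0xFF8, 0xFFF, FREE, FREE, FREE, FREE, FREE, FREE]
```

With `flushed_fat`, walking from cluster 2 yields the whole file; with `stale_fat`, the same directory entry leads to a cluster that the table says is free, which is the inconsistent state a reset inside the delay window leaves behind.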
So to rephrase: "If the encapsulated write process (from the initial write of the directory info to the update of the final pointer in the chain of pointers to storage blocks) is active, any interruption (loss of power/hard reset) or standard "Eject" can leave the filesystem in an incomplete/corrupt state." Is there a particular error message that you get when reconnecting the CP board to the Windows machine? An interesting test would be to see if the CP board could be read by Linux or macOS after Windows reports corruption. |
Thank you for all of the good thinking on this! I'd love to have better ways of recovering from this. There are two potential failure modes I know of:
autoreset does make 2 a bit confusing because CircuitPython will attempt to read the FS in this intermediate state. This usually results in a spurious syntax error. It shouldn't cause any corruption by reading it though. @dhalbert have you tried using windows to reformat the drive? I think it should work because CircuitPython is just being a dumb block device. Thanks! |
1 is an age old problem ;-) As for 2, just to clarify, is Windows reporting corruption after the "Safe Ejection"? That would be a Windows bug for not flushing the cache. I won't have a Windows machine until Tuesday but I will try out some options when I have access. |
No, Safe Ejection prevents the corruption. The problem is a hard reset or disconnect before the cached writes happen, by not waiting long enough or by not doing an Eject. A good way to get corruption is to write several files and then press the reset button a few seconds later, without doing an Eject. I always do an Eject after I copy files, to force the writes and avoid this. Also, as I mentioned in the superuser.com posting, USB flash drives are by default set to "Quick Removal", so the writes should not be delayed. But a few of them (the FAT table ones) still are. I do think this is some kind of bug, but it's very long-standing. And even if it's fixed, that fix may not get propagated to many older systems. I am trying to get the attention of some knowledgeable person at Microsoft, but that's difficult.
I think I tried this, but it maybe didn't work. FatFS is very minimal. For instance, I think typically FAT16 has two copies of the FAT table for safety, and FatFS creates just one when asked to create a filesystem. I'll try again soon to verify or not. |
Yeah, that's true about FatFS being minimal. I think it actually creates FAT12. I'm definitely open to ideas on how to recover from problems like this. @dhalbert do you use the reset button frequently? The goal is to have auto-reset and soft reset work the majority of the time. |
I don't usually use the reset button (I know better now), but Windows just added serial support in "Windows Subsystem for Linux" (the bash shell environment they have now), and I was (unsuccessfully) trying it out. So there was a lot of plugging/unplugging/resetting and I was writing infinite loops to send characters. The reset button is very tempting for the average user. If there is a programmatic way to force the delayed writes to complete or do an Eject, maybe we could add code to do that to some recommended editor for Windows, like Mu (though I have had other troubles with Mu). That would be a less manual fix for Windows. I am not wedded to Windows by any means but many customers will be using Windows and I want to give CircuitPython a workout on it. |
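As one possible shape for the "programmatic way to force the delayed writes" idea above, here is a minimal host-side sketch that copies a file and then asks the OS to flush it. The `copy_and_flush` helper is hypothetical. `os.fsync` and `os.sync` are standard Python calls, but it is an untested assumption that they would also push out the delayed FAT updates on Windows; nothing in this thread confirms that, and doing an Eject remains the known-good approach.

```python
import os
import shutil


def copy_and_flush(src_path, dst_path):
    """Copy src_path to dst_path, then ask the OS to flush the data to the device.

    Hypothetical helper: whether this also flushes the delayed FAT-table
    writes on Windows is an assumption that would need to be tested.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()             # flush Python's own buffer
        os.fsync(dst.fileno())  # ask the OS to commit this file's buffers
    if hasattr(os, "sync"):
        os.sync()               # Unix only: flush all pending filesystem writes
    return os.path.getsize(dst_path)
```

An editor could call something like `copy_and_flush("code.py", "E:/code.py")` when saving, where the drive path is just an example.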
Ok, yeah. I do hard resets often to reload CircuitPython itself too. I'm not sure what options we have on the host side for forcing writes. I know that CircuitPython can't do anything. We could switch to MTP which relies on the device maintaining the file system but it has its own problems (like no Mac OSX support.) |
Followup: I tried reformatting the filesystem to FAT via Windows this morning. It did not complain when formatting CIRCUITPY, and I could copy files in, but the FatFS could not read them. |
Ok, thanks for the follow-up, Dan! I'll keep brainstorming on allowing for reformatting from CircuitPython's REPL. I need to add a mechanism to swap who can write to the disk anyway, for audio recording.
…On Thu, Apr 20, 2017 at 6:40 AM Dan Halbert ***@***.***> wrote:
Followup: I tried reformatting the filesystem to FAT via Windows this
morning. It did not complain when formatting CIRCUITPY, and I could copy
files in, but the FatFS could not read them. uos.listdir() showed nothing
and uos.mkdir() raised an OSError, so it appears FatFS cannot deal with
the filesystem Windows created.
|
To help out a user, I built a flash_erase .uf2 for Metro M0 Express that doesn't prompt the user. It just erases the flash and then blinks to indicate success or failure. Similar versions could be built for Feather M0 Express and CP Express. (Not sure if this will work for all.) See https://forums.adafruit.com/viewtopic.php?f=60&t=118427&p=594953#p594953. Code is just the original flash_erase, pruned to almost nothing. This is just hacked up for now. I should submit a proper pull request to the flash library.
|
I don't think we should offer to reformat from within CircuitPython. There is now a troubleshooting post about recovering from this issue: https://circuitpython.readthedocs.io/en/latest/docs/troubleshooting.html#file-system-issues |
Erasing the storage is now supported within CircuitPython via the storage module: https://circuitpython.readthedocs.io/en/4.x/shared-bindings/storage/__init__.html#storage.erase_filesystem (Note added in case someone else besides me arrives via search) |
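For anyone arriving via search, a minimal sketch of how the `storage.erase_filesystem()` call linked above might be wrapped with an explicit confirmation. The `erase_circuitpy` helper and its `storage_module` parameter are hypothetical and exist only so the logic can be exercised off-device; on a board you would simply `import storage` and call `storage.erase_filesystem()` from the REPL. This erases everything on CIRCUITPY, and on a real board the call is expected to reset the board rather than return.

```python
def erase_circuitpy(confirm, storage_module=None):
    """Erase and re-create CIRCUITPY, but only if confirm is True.

    storage_module is a hypothetical injection point for testing; when it is
    None, the CircuitPython-only `storage` module is imported.
    """
    if not confirm:
        return False
    if storage_module is None:
        import storage  # available only when running under CircuitPython
        storage_module = storage
    # Destroys all files on CIRCUITPY. On a real board this is expected to
    # reset the board, so the return below is normally not reached.
    storage_module.erase_filesystem()
    return True
```

Requiring an explicit `confirm=True` follows the concern in the original report that a reformat should not happen automatically, because data may be lost.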
Unplugging without ejecting can corrupt the FS. On start we should validate the FS and offer to fix/reformat it on error. It shouldn't be automatically done because then data may be lost.
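As a rough illustration of what "validate the FS" could mean at its simplest, here is a sketch that sanity-checks a FAT boot sector. The field offsets are the standard FAT BIOS Parameter Block layout; the `check_boot_sector` helper itself is hypothetical and is not how CircuitPython or FatFS validates anything. It would not catch the stale-FAT corruption discussed in this thread, only a grossly damaged or erased filesystem.

```python
import struct

VALID_SECTOR_SIZES = (512, 1024, 2048, 4096)


def check_boot_sector(sector):
    """Return a list of problems found in a FAT boot sector (empty if none)."""
    if len(sector) < 512:
        return ["boot sector is shorter than 512 bytes"]
    problems = []
    # The last two bytes of the first 512 bytes must be 0x55, 0xAA.
    if sector[510:512] != b"\x55\xaa":
        problems.append("missing 0x55AA boot signature")
    # Offsets 11..16: bytes/sector, sectors/cluster, reserved sectors, FAT count.
    bytes_per_sector, sectors_per_cluster, reserved, num_fats = struct.unpack_from(
        "<HBHB", sector, 11
    )
    if bytes_per_sector not in VALID_SECTOR_SIZES:
        problems.append("invalid bytes per sector: %d" % bytes_per_sector)
    if sectors_per_cluster == 0 or sectors_per_cluster & (sectors_per_cluster - 1):
        problems.append("sectors per cluster is not a power of two")
    if reserved == 0:
        problems.append("reserved sector count is zero")
    if num_fats == 0:
        problems.append("number of FATs is zero")
    return problems
```

A startup check along these lines could report the problems and leave the decision to reformat to the user, rather than erasing automatically.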