Delayed writes and auto-reset problems when writing from Windows #111
Thanks for the detailed write-up @dhalbert! I'm not sure why it varies between Notepad and Notepad++, but I do have some general ideas. First, a little background. With mass storage devices, the host OS is responsible for maintaining the on-disk file system metadata. This is super common and well supported across OSes. The alternative is the Media Transfer Protocol (MTP), which is more complex and relies on the device maintaining the underlying file system. Android phones do this so that they can read and write files at the same time as things are changed over USB. However, macOS in particular doesn't have built-in support for MTP. So, CircuitPython is a mass storage device instead. Relying on the OS has its downsides, though. I suspect what you are running into is caching that's done on the OS side. There's no requirement that the OS write to the disk immediately upon save (though it's common to). All it has to do is give the appearance, from within Windows, that the data has been written. "Safely removing" is Windows' way of saying "Force me to flush the cache." So, try doing that next time it's inconsistent. You are right that 512 bytes is important. That's the size of each filesystem block. So, I think Windows is writing the first block on save but waiting on the second. If that gap is large enough, then the auto-reset will trigger and show an error. The OS can also do this for the metadata parts of the file system, which may have led to your OSError. Neither of these things should be fatal to the filesystem, though. You should be able to resave or "safely remove" to cause Windows to flush its cache and make the actual stored file system consistent. MicroPython has had similar reports because they allow writing to the file system from Python code even while the OS is writing to it as well, which can definitely lead to corruption. They chose to allow this for user flexibility. In CircuitPython the file system is never writeable from a user's Python code, which should reduce the chance of FS corruption.
(It's still possible to corrupt it by disconnecting the device before the OS's cache has been written.) We hope to make this toggleable (writeable over USB or from CircuitPython, but not both) but haven't yet. It'll probably be done when we finish the SD card support. So, next time it happens, try doing a "safe remove" on the device and see if that fixes the file for CircuitPython. If that works, then it's the best we can do on the CircuitPython side. |
Thanks for the background, Scott. The USB drive is set to write changes immediately. It does not deliberately cache writes. This is typical for USB flash drives on Windows. Here's one of its property windows: However, despite the setting above, there's something odd about Notepad++ or the Windows API it uses. I looked at its code and it straightforwardly does a I tried a few other things. I lengthened I might want to try instrumenting |
Huh, I never knew Windows could do that. I'm a Mac user now and used to do Linux. For For crashes I use GDB on the new prototype Metro. Thanks for your help! |
I spent some more time on this today. I tested several other editors, including WordPad, Mu, Atom, and EMACS. All have delayed-write problems as above except EMACS and NOTEPAD.EXE. I also found the source code for NOTEPAD.EXE. NOTEPAD.EXE uses WIN32 I/O calls; Notepad++ uses the stdio-style wrapper provided by Microsoft ( NOTEPAD and EMACS go to some effort to open the file as existing if possible. Notepad++ and the other editors don't bother to do this, but there's nothing wrong with how they write files. And even given the differences above, it's not clear to me that NOTEPAD or EMACS will avoid delayed writes with larger files that span more than one or two blocks. The real problem is that despite the USB drive being marked for Quick Removal (also known as But unless this can be fixed somehow, it seems to me that enabling auto-reset for CircuitPython on Windows is problematic. Auto-reset can trigger the reading of an incomplete file. Then, after tens of seconds, the write will complete, CircuitPython will auto-reset again, and the code will mysteriously work. Ejecting the drive manually will force the writes, but there will still be an error reported at the first incomplete write. I see this as potentially a big support issue for anyone using CPy on Windows. I don't know if CPy can detect whether it's connected to Windows or not, and disable auto-reset if so. Alternatively, there could be a no-auto-reset version of the firmware, but that defeats the purpose of a CPy board being pre-loaded and usable right out of the box. Am I the only person you know of testing CPy with auto-reset on Windows? |
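To make the difference between the two editor families concrete, here is a minimal Python sketch of both save strategies, assuming a host-side script writing to the CIRCUITPY drive. `save_truncating()` mimics a plain open-for-write with truncation. `save_in_place()` mimics opening the file as existing, overwriting it, and trimming the tail. Both function names are illustrative and are not taken from any editor's source.

```python
import os


def save_truncating(path, data):
    # Open for writing with truncation, as a stdio-style "w" open does:
    # the file drops to zero length and then grows again, so the FAT
    # chain is released and reallocated.
    with open(path, "wb") as f:
        f.write(data)


def save_in_place(path, data):
    # Open the existing file without truncating (the NOTEPAD.EXE/EMACS
    # approach described above), overwrite from the start, then trim
    # whatever is left of the old contents.
    if not os.path.exists(path):
        save_truncating(path, data)
        return
    with open(path, "r+b") as f:
        f.write(data)
        f.truncate()  # cut the file at the current position (end of new data)
```

The in-place variant is not a cure. If the new contents need a different number of blocks, the FAT still has to change.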
Thanks for all of the investigation @dhalbert ! No, we can't know if the computer we're talking to is Windows or not AFAIK. There is an API for turning off autoreset here that can be used to turn it off in boot.py: https://circuitpython.readthedocs.io/en/latest/atmel-samd/bindings/samd/__init__.html Another solution could be better error messaging on the CircuitPython side. We could likely detect SyntaxErrors on the block boundary and suggest ejecting the device. Or we could simply give it as a suggestion on all SyntaxErrors. How does that sound? |
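For reference, a boot.py along the lines described in this thread might look like the sketch below. It turns auto-reset off and prints a reminder to Eject. The module and function names are assumptions that vary by CircuitPython version. `samd.disable_autoreset()` is assumed to be the early atmel-samd API linked above, and `supervisor.runtime.autoreload` is assumed to be the attribute used by newer releases. Check the docs for your firmware before relying on either.

```python
# boot.py: turn off auto-reset and remind the user to Eject.
# The APIs tried below are assumptions that depend on the firmware version.


def disable_autoreset():
    """Try known ways to disable auto-reset; return the one that worked, or None."""
    try:
        import supervisor  # assumed: newer CircuitPython releases

        supervisor.runtime.autoreload = False
        return "supervisor"
    except (ImportError, AttributeError):
        pass
    try:
        import samd  # assumed: the early atmel-samd API linked above

        samd.disable_autoreset()
        return "samd"
    except (ImportError, AttributeError):
        return None


if disable_autoreset():
    print("Auto-reset is off. Eject CIRCUITPY after saving, then soft-reset (Ctrl-D in the REPL).")
```

Trying the newer API first and falling back keeps a single boot.py usable across firmware versions, at the cost of silently doing nothing if neither name exists.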
Two replies coming up: 1/2 A note on exactly which write is being delayed: I was able to look at the actual USB messages using Wireshark and USBPcap. The traces show that the entire file is being written out immediately; it's the metadata write, in this case the update of the FAT (File Allocation Table), that's being delayed. I think it shows how filesystem corruption could be possible, because a filesystem transaction has not been finished. If the file is truncated and then written, the FAT will be changed when the file is truncated (to indicate the file is 1 sector long), and then changed again when the file is written (because it is more than 1 sector long). Opening the file as existing only works because I don't happen to be changing the number of sectors needed for the file. If I made it smaller or larger by at least a sector, then the FAT would have to be updated. (I still don't know why writes are being delayed. I am amazed no one else has reported something similar. By the way, I also did some USB traces of a conventional FAT USB drive, and saw the same delay. I also tried a FAT32 drive: that was much better and finished all writes within a couple of seconds.) My comments preceded by
|
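The cluster arithmetic behind this is simple enough to model. The sketch below assumes 512-byte clusters (one sector per cluster, which the thread's block-size discussion suggests but does not confirm for CIRCUITPY). It counts, in a deliberately simplified way, how many times one save rewrites the file's FAT chain.

```python
BLOCK_SIZE = 512  # assumed: one 512-byte sector per cluster


def clusters_needed(size, cluster_size=BLOCK_SIZE):
    """Clusters occupied by a file of `size` bytes (an empty file uses none)."""
    return -(-size // cluster_size)  # ceiling division


def fat_chain_changes(old_size, new_size, truncate_first, cluster_size=BLOCK_SIZE):
    """Simplified count of FAT-chain rewrites caused by one save."""
    old = clusters_needed(old_size, cluster_size)
    new = clusters_needed(new_size, cluster_size)
    if truncate_first:
        # Truncation frees the old chain; the write then allocates a new one.
        return int(old > 0) + int(new > 0)
    # Overwriting in place only touches the FAT if the cluster count changes.
    return 0 if old == new else 1
```

For the 556-byte main.py from the original report, `clusters_needed(556)` is 2. A truncating save therefore touches the FAT twice. An in-place save that keeps the file between 513 and 1024 bytes leaves the chain alone.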
2/2: Thanks - I didn't know about disabling auto-reset via the samd module. I put this in boot.py, and it worked nicely.
I added a message since the firmware says auto-reset is on, and I need to contradict that. The user still needs to be reminded to Eject the CIRCUITPY drive to force the write or else a soft-reset could generate an error. So the steps are:
(Note that the "Eject" does not actually eject the drive and make it disappear, at least on Windows 10. It's still there, but after the Eject a notification appears that it's "safe to remove".) Heuristic error messages might be confusing, since syntax errors are very common anyway. If you can detect the Eject, you could detect a file write without a following Eject, and remind the user to Eject. And if you do detect the Eject, you could then safely do an auto-reset after the Eject is done. So the auto-reset would actually happen on Eject, not on file write. What do you think of that strategy? To see whether you could detect an Eject, I logged the USB events that show up during an Eject. Here's a link to the complete log with more detail, and I've included a summary below. I haven't seen these events elsewhere during FAT I/O.
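The Eject-gated strategy proposed above can be sketched as a tiny state machine. This is hypothetical. The "write" and "eject" event names are placeholders, and recognizing an Eject in the real USB/SCSI traffic is exactly the part still being investigated.

```python
class EjectGatedReload:
    """Hypothetical policy: remember host writes, reload only after an Eject."""

    def __init__(self):
        self.pending_write = False

    def on_event(self, event):
        """Feed one host event; return True when it is safe to reload main.py."""
        if event == "write":
            self.pending_write = True
            return False
        if event == "eject" and self.pending_write:
            self.pending_write = False
            return True
        return False

    def needs_eject_reminder(self):
        # A write with no Eject after it: the FAT may not be flushed yet.
        return self.pending_write
```

`needs_eject_reminder()` covers the other half of the idea by flagging a write that was never followed by an Eject.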
|
I don't want to limit the autoreset to eject-only because it reduces the usefulness. I'm OK with autoresets that error. I think it's just a matter of teaching people how to work around it if they see the issue. I wouldn't hide the SyntaxError in favor of something else. I would just add an additional tip or hint along with it that says that incomplete flushing could be the cause. It's a good point about the auto-reset messaging. I'll make that conditional. Filed #112 for it. |
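Here is a rough sketch of the "additional tip" idea. It assumes the firmware has the source bytes and the error's line number available. The SyntaxError text is kept as-is, and a hint is appended only when the failing line lies within a small window of a 512-byte block boundary. The 64-byte window and all function names are illustrative assumptions, and CPython is used here for convenience.

```python
BLOCK_SIZE = 512


def line_span(source, lineno):
    """(start, end) byte offsets of 1-based line `lineno` in `source` (bytes)."""
    start = 0
    for _ in range(lineno - 1):
        newline = source.find(b"\n", start)
        if newline < 0:
            break
        start = newline + 1
    end = source.find(b"\n", start)
    return start, (len(source) if end < 0 else end)


def near_block_boundary(source, lineno, block_size=BLOCK_SIZE, window=64):
    """True if the line lies within `window` bytes of a boundary (512, 1024, ...)."""
    start, end = line_span(source, lineno)
    lo = max(start - window, 1)
    boundary = -(-lo // block_size) * block_size  # first multiple of block_size >= lo
    return boundary <= end + window


def syntax_error_message(err, source):
    """Keep the original SyntaxError text; append a hint when it looks flush-related."""
    message = "SyntaxError: {}".format(err.msg)
    if err.lineno and near_block_boundary(source, err.lineno):
        message += ("\nHint: if you just saved this file, the write may not have"
                    " finished; try ejecting the CIRCUITPY drive.")
    return message
```

Restricting the hint to lines near a boundary keeps it from appearing on every ordinary typo, which addresses the concern that heuristic messages could confuse people.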
Just some more info: I formatted a 512MB flash drive multiple times with various sized partitions. At 15MB and below, writes of the FAT table are delayed. At 16MB and up, FAT writes happen promptly. I compared USB traces from the 15MB and 16MB cases. They are very similar until the delayed writes at the end. But in the 16MB case, Windows sends SCSI Prevent/Allow Medium Removal commands before writes, asking that the device not be removed. The USB stick actually returns failure on these requests (because it can't guarantee no removal), but Windows tries anyway. The 15MB trace shows no Prevent/Allow Medium Removal commands. So apparently at some threshold Windows decides to do writes carefully, both in terms of requesting no removal, and doing them promptly. I've submitted this as a problem report via the "Windows Insider Feedback Hub". |
Thank you for all of your investigation! I'm curious to see how long it takes to get fixed. (Email reply to Dan Halbert's comment of Tue, Apr 25, 2017.) |
No solution yet, but I may have found the actual Windows driver code that's causing the issue. 16MB is the breakpoint between FAT12 and FAT16; below 16MB, you get FAT12. [EDIT: breakpoint is actually at the number of clusters: 4085. Sectors are almost always 512B, but there can be multiple sectors per cluster: this is specified in the FAT filesystem header block] MS happens to include the FAT filesystem driver in a package of sample driver code! There are several places in that driver where, if the filesystem is FAT12, the driver will not bother to set the dirty bit. https://github.com/Microsoft/Windows-driver-samples/blob/master/filesys/fastfat/verfysup.c#L774 In the last link, in
I've added this info to the report I sent to MS. |
@dhalbert I suspect that you have found the root cause. 👍 |
wow nice investigative analysis!
waiting for MS to fix it could be very time consuming. we could get lucky and it takes only a few weeks but it took a few years to get USB CDC serial devices to get automatically installed. @tannewt - it's your call! |
Thanks. The idea of forcing FAT16 sounds good, but it's not clear to me it's going to work. Here is MS' chatty and informative spec for the various FAT filesystems. On page 16 or so it says:
I saw this point mentioned in other places. But looking through the fastfat driver, I'm not sure how it enforces this. I may have time to take a harder look tonight. Not sure if we can force FatFs to do FAT16 instead of FAT12 -- it's not in the API, so we'd need to change the code. Besides the Feedback Hub report, I also contacted one of the UF2 guys, and he forwarded the problem on. I will follow up with him for closure. |
I'd go with @ladyada's second option. It shouldn't be too hard to only autoreset after writes to a specific block or two. I don't think I'd have it on by default, though. I'd just have it as a setting you can set in boot.py, like turning off autoreset. |
When It's a little tedious to do an Eject from the taskbar or from an Explorer window. I just found a little piece of freeware to make it a double-click (or a hotkey). I'll try it out -- it might be handy. |
Uwe Sieber has some nice command-line friendly tools that we can integrate if necessary |
For posterity: I asked about the bug in https://superuser.com/questions/1197897/windows-delays-writing-fat-table-on-small-usb-drive-despite-quick-removal/, and ended up answering my own query. |
Sorry I haven't done the second option even though I said I would. I don't think it's a good option, actually, because it doesn't actually prevent corruption. It only reduces spurious bad errors. @dhalbert have you confirmed that FAT16 causes Windows to write faster? That could work. |
@tannewt i guess the Q would be: does Win ever edit a file without updating the FAT? we can look at USB traces if ya like? |
@tannewt FAT16 does cause windows to write out the metadata faster (within a few seconds). But as I found out, FAT12 is used by definition when the filesystem is a certain size, so unfortunately we can't force Windows to use FAT16 instead of FAT12 on tiny drives. People haven't used such tiny drives en masse since floppies and very small camera flash cards. This is a darn nuisance. I can see a few ameliorations:
Do you know of any other microcontroller packages that provide a filesystem? Seems like Micro:Bit etc all just have one-file upload. |
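The FAT12/FAT16 decision mentioned above can be written out directly from the formulas in Microsoft's FAT specification. First compute the number of data clusters from the boot-sector (BPB) fields, then compare it against the fixed thresholds 4085 and 65525. The parameter names below are shortened versions of the spec's BPB_* fields.

```python
def count_of_clusters(tot_sec, bytes_per_sec, sec_per_clus, rsvd_sec_cnt,
                      num_fats, fat_sz, root_ent_cnt):
    """Number of data clusters, following the FAT specification's formulas."""
    # Root directory size in sectors (32 bytes per entry, rounded up).
    root_dir_sectors = (root_ent_cnt * 32 + bytes_per_sec - 1) // bytes_per_sec
    # Sectors left for file data after the reserved area, FATs and root directory.
    data_sec = tot_sec - (rsvd_sec_cnt + num_fats * fat_sz + root_dir_sectors)
    return data_sec // sec_per_clus


def fat_type(clusters):
    """The FAT type is determined by the cluster count alone."""
    if clusters < 4085:
        return "FAT12"
    if clusters < 65525:
        return "FAT16"
    return "FAT32"
```

Take a hypothetical 2 MB volume: 4096 sectors of 512 bytes, one sector per cluster, one reserved sector, one 12-sector FAT, and 512 root entries. It has 4051 data clusters, so it is FAT12 no matter what the formatter would prefer.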
@ladyada Not sure this answers your question, but the very delayed FAT update happens when the number of blocks needed for a file changes. Some editors make a point of always opening a file for write with truncation, which causes the file to go to zero length and then grow, so they always hit this problem (e.g. Notepad++). Some editors don't truncate on opening (e.g. NOTEPAD.EXE), so if the file doesn't change size enough to change the number of blocks, the delayed FAT write doesn't happen. But that's an accident of file size (and it's what confused me so much when I first encountered this problem). |
@dhalbert ohh that makes sense - i always use xemacs so that could be part of why i've never seen it. would there ever be a time where a file changes but the FAT doesn't get updated? i think you always at least have an update for the modification time? |
@ladyada There are two sets of metadata that get updated on file write (and sometimes on open): the directory info (like file modification times), and the actual FAT (File Allocation Table). The FAT does not contain directory info: it just contains a chained list of blocks (aka "clusters") that store the data in the file. So for instance, a file might be stored in blocks (clusters) 10, 11, and 15. The directory entry points to slot 10 in the FAT. Slot 10 contains the number 11, slot 11 contains the number 15, and slot 15 contains a special marker indicating there are no more blocks. If the editor doesn't truncate the file when writing (NOTEPAD and Emacs don't), and the number of clusters needed doesn't change, then the actual FAT doesn't need to be updated. So when you're editing, and your file didn't get smaller or bigger, everything is fine. And maybe when you did grow the file, it didn't work at first, and then it did, and you didn't think much of it. Maybe xemacs even does some kind of programmatic eject, though I'd think not. Windows writes the directory info promptly. But the driver, fastfat.sys, doesn't flush the FAT entry changes promptly. I think I even found the statement where it checks whether it should do a flush, and it skips the check on FAT12. FAT12 doesn't have a "dirty" bit, which indicates whether a filesystem transaction is in progress. FAT16 and up do. (FAT16 and up also have duplicate FAT tables, for safety, and other enhancements.) FAT12 was originally for floppies, so maybe this all has to do with what was good for floppies. Or maybe it's just a bug. |
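The chain walk in this example (directory entry to cluster 10, then 11, then 15, then end) looks like this in code. For readability, this sketch represents the FAT as a plain dict. A real FAT12 table packs two 12-bit entries into every three bytes. Values of 0xFF8 and above are FAT12 end-of-chain markers, and 0xFF7 marks a bad cluster.

```python
FAT12_END_OF_CHAIN = 0xFF8  # FAT12 entries >= 0xFF8 mark the file's last cluster
FAT12_BAD_CLUSTER = 0xFF7


def follow_chain(fat, first_cluster, end_of_chain=FAT12_END_OF_CHAIN):
    """List the clusters of a file, starting from the cluster in its directory entry."""
    chain = []
    cluster = first_cluster
    while cluster < end_of_chain:
        if cluster < 2 or cluster == FAT12_BAD_CLUSTER:
            raise ValueError("invalid cluster %#x in chain" % cluster)
        if cluster in chain:
            raise ValueError("loop in cluster chain at %d" % cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # each FAT slot holds the next cluster number
    return chain
```

Growing this file past three clusters means rewriting slot 15 (and allocating a new slot), which is exactly the FAT update that Windows delays on FAT12.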
right! i forgot that the directory management is not in the FAT just the clusters :) yeah sounds like some of these editors can be confusing... we may want to add a FAQ to the circuitpython pages to indicate editors that we don't suggest because of this. (ironically, i wrote a fat fs handler for PIC in 2004 and have clearly erased all of that knowledge from my brainstem :) |
For Windows, it probably makes sense to suggest a few free (and easily installable) editors such as Visual Studio Code and Atom. |
I've done some more research on forcing Eject after write. I found a number of code examples. The Uwe Sieber EjectMedia.exe that @ladyada mentioned looks good (#111 (comment)), and he also has source code with detailed explanations available. I've looked at Notepad++, Visual Studio Code, Atom, and Mu, and it looks like it would be relatively easy to write plugins for any of them (or for Mu, add integral code) that would invoke something like EjectMedia immediately after a write, either by running an external executable or by invoking the Win32 code directly. |
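An editor plugin's job then reduces to "write, then eject". Below is a minimal Python sketch of that flow. The tool name EjectMedia.exe comes from the comment above, but passing the drive letter as its only argument is an assumption; check Uwe Sieber's documentation for the exact syntax. The `run` parameter defaults to `subprocess.run` and is injectable so the logic can be exercised off Windows.

```python
import subprocess


def save_then_eject(path, data, drive, run=subprocess.run, tool="EjectMedia.exe"):
    """Write the file, then ask an external tool to eject the drive so that
    Windows flushes its pending writes.

    `tool` and its argument format (just the drive letter) are assumptions.
    """
    with open(path, "wb") as f:
        f.write(data)
    # check=True makes a failed eject raise instead of passing silently.
    return run([tool, drive], check=True)
```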
Random wacky idea: This is probably not worth the trouble, but I thought it was worth writing down. I mentioned above that the determination of FAT12 vs FAT16 is based on the number of clusters. So below a certain number, the filesystem must be FAT12. This would seem to preclude formatting, say, a 2MB filesystem as FAT16. However, I realized it may be possible to fake this. The FAT table has a special value to mark bad clusters that should not be used. For FAT16 it is 0xFFF7. So one could format a tiny filesystem as FAT16 by pretending it was larger, but marking all the clusters that are out of range as bad. So actually most of the FAT would be filled with bad cluster markers. The remainder of the FAT would work fine and be treated as FAT16 by Windows, etc. The FatFs code would probably have to be modified to do this. |
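Here is a sketch of this untested idea, assuming a table with 16-bit little-endian entries. It declares at least 4085 clusters so the volume qualifies as FAT16, leaves the physically present clusters free, and marks the rest with the FAT16 bad-cluster value 0xFFF7. The function name and the 0xF8 media byte (fixed disk) are assumptions. The boot sector would also have to claim the inflated size, which is not shown.

```python
import struct

FAT16_MIN_CLUSTERS = 4085  # fewer data clusters than this means FAT12 by definition
FAT16_FREE = 0x0000
FAT16_BAD = 0xFFF7


def padded_fat16(real_clusters, media=0xF8):
    """Build a FAT16 table that declares at least FAT16_MIN_CLUSTERS clusters,
    marking every cluster beyond `real_clusters` as bad so it is never used."""
    total = max(real_clusters, FAT16_MIN_CLUSTERS)
    # FAT[0] holds the media byte; FAT[1] is an end-of-chain value (both reserved).
    entries = [0xFF00 | media, 0xFFFF]
    for n in range(total):
        entries.append(FAT16_FREE if n < real_clusters else FAT16_BAD)
    return struct.pack("<%dH" % len(entries), *entries)
```

The arithmetic shows the cost: (4085 + 2) entries of 2 bytes is 8174 bytes, or 16 sectors per FAT copy, on a drive where every block counts.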
we have many options - i think right now i want to wait and see if we get this happening to other people (after we put in sufficient warnings not to reset w/o eject) - as it's a bit hacky :) |
@dhalbert and I have reported it in the Windows Insider Feedback Hub, and if it gets more votes it will get fixed faster. |
Hi guys, this is Kwabena from OpenMV. We've been suffering from this issue with our micropython system and I stumbled across your thread. I think you've hit the exact problem. Thank you for your good notes on this. I'm going to modify OpenMV IDE to not truncate the file on saving to the disk. Normally, folks use the system with an SD card. But, sometimes with the internal flash. |
I found this thread on how to unmount a disk using Windows: The issue seems to be solved with OpenMV IDE using this to reset the OpenMV Cam. And on Linux the syncfs() function can be used. |
@kwagyeman Glad this was helpful. The exact issue in Windows is summarized in the superuser.com link above. On Linux and Mac, writing the FAT on a FAT12 filesystem does not seem to be delayed, so you may not need the unmount there. It seems to happen within a couple of seconds. |
I'm going to close this issue for now, since we've documented the problem thoroughly in the Learn Guides and described and implemented mitigations such as editor plugins that force immediate writes. If/when there's some movement on the Windows side or we figure out an alternate filesystem that works on all the platforms, I'll reopen or create a new issue. |
In a classroom environment with CPX boards I've found this to be problematic. I'm seeing it take roughly 10-20 seconds for the board to notice that main.py has changed after an update via the cp command on Linux (Raspberry Pi). We've also seen corruption, where the CPX was being bounced back and forth between a Windows 10 laptop and a Raspberry Pi, probably without ejects. |
On Linux, run |
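Whatever the exact command was, a copy script on Linux can request the write-back itself. This sketch assumes Python 3.3+ on a Unix host. `os.fsync()` flushes the copied file, and `os.sync()` asks the kernel to write back all dirty buffers, similar to the sync command. `copy_and_sync()` is a hypothetical helper name.

```python
import os
import shutil


def copy_and_sync(src, dst):
    """Copy a file onto the board's drive and force it out of the page cache."""
    shutil.copyfile(src, dst)
    with open(dst, "rb+") as f:
        os.fsync(f.fileno())  # flush this file's data to the device
    if hasattr(os, "sync"):  # os.sync() exists on Unix, Python 3.3+
        os.sync()  # ask the kernel to write back all dirty buffers
```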
Is there a bug/RFE ticket from Adafruit with Microsoft? Given that this was noted in 2017, shouldn't it have been fixed by now? |
I tried very hard to get Microsoft's attention on this, including using Feedback Hub, and contacting several people who work there, asking them to pass it on. I didn't succeed, as far as I know. The behavior may have to do with not wanting to wear out floppy disks, or something like that. |
I'd say floppy disks were even more at risk because almost all drives had a physical eject, albeit with the churning noise and LED to help users trained in the art of waiting. Only those fancy Apple users had their software-controlled eject, plus the hole for the paper clip for when it all went wrong! |
I noticed some MSFT folk talking about PyCon/Adafruit stuff and started this discussion (@qubitron and @zooba) to try and get a bit of momentum to get this bug fixed: https://twitter.com/kevinjwalters/status/1122527653960007680 The Microsoft feedback id is: 4257403a-0bc4-4d5d-8f36-9ba682d53a45 |
Hi, thank you for your perfect analysis. I have the same problem in a different context, and I spent days trying to find a solution. I'm not sure I'll find one now, but I know what to try. |
@oldav There is some news about this: we are now in contact with MS, and some people there interested in CircuitPython are trying to pursue it. |
There's a survey about Python use on microcontrollers mentioned on: https://twitter.com/nnja/status/1140807884474732544 . I've mentioned the FAT12 bug on there. I'm not sure if it'll have any effect but I'll try any angle on offer to try to get MSFT to fix this tedious problem.
|
@kevinjwalters No need to keep bugging Microsoft about the FAT12 bug. They've gotten the message and the wheels are turning. Just need to have patience now. |
@tannewt I didn't expect anything to actually happen here but it turns out it might be fixed: https://twitter.com/zooba/status/1188954487924260864 |
I have had problems writing to the onboard filesystem from Windows. I sometimes see complete filesystem corruption, and sometimes just problems with one file. See https://forums.adafruit.com/viewtopic.php?f=60&t=109687 for background.
Specific scenario, showing a file that CPy has trouble reading:
The file in question is here: main.py.txt. (Renamed from main.py to main.py.txt so that GitHub will take it as an attachment.) I wrote this Python code while doing some CPy I/O testing, and the specific code probably doesn't have anything to do with this problem. However, this file is 556 bytes long, so it's more than one 512-byte block, which does seem to be important.
If I write this file to D:\CIRCUITPY\main.py using NOTEPAD.EXE, it runs just fine. It prints "5" and the button-reading loop works.
If I write this exact same file using Notepad++ (a very common lightweight editor used on Windows) it does not work. The serial port shows:
Line 26 is right around the 512-byte boundary in the file.
In the REPL, I can read the file I wrote with NOTEPAD.EXE:
But if I write the same file again with Notepad++, I get an OSError when trying to read it in the REPL:
I can TYPE either file from CMD.EXE, and if I use od (from GnuWin32) to look at the characters in the file, they are identical.
I have repeated the cycle of writing with NOTEPAD.EXE and then Notepad++, and I consistently have the error only with Notepad++. Once or twice TYPE complained it did not have access to the bad version of the file, but I cannot reproduce that problem consistently. If I look at the file properties in Windows Explorer, the two versions have identical properties and sizes.
I've looked at the Notepad++ source code where it writes files. It looks innocuous: it uses ::fwrite(). It does support UTF-8, but the file in question is all ASCII, and I set up Notepad++ to write it as ANSI.
This appears to be some oddity or corruption about how Windows is writing to the CPy filesystem. I've inquired on the Notepad++ forum about whether it does anything unusual when writing files, and will report back if I hear anything.
The workaround is not to use Notepad++, but it seems important to figure out what's going wrong so other users will not have the same problem. I did some web searching and haven't turned up any similar reports about Notepad++, FatFs, or MicroPython.