-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDC: Hung once rx fifo queue is sufficient full #1572
Comments
The USB device (if its buffers are full, or the device application is not ready for data), should NAK the BULK OUT packets, an only ACK them once it is ready for the additional data. The host's hardware should keep retrying the BULK OUT packets every millisecond. The host may eventually get upset and reset the device, or perform other actions to try to reset the pipe like There are some issues with TinyUSB where it's improperly handling clearing the endpoint halt (#1537), but I don't think that's what's going on here (unless you have a packet log to prove it?) On the other hand, the host-dependent behavior suggests something special is going on with Linux... as if when it times out, it does try to reset the pipe or something else. |
From my (quite recently acquired) understanding of USB, I'd like to suggest a solution. I think the problem is that the OUT endpoint is not being "armed" in step 4 above. Even though the rx fifo queue is full, the hardware buffer should be setup to receive more OUT packets. Once the host tries to send another OUT packet, it should be replied with a NAK until there is room in the fifo queue to accept more data. Unless the hardware think the buffer is already full, it would reply with an ACK. Therefor, the OUT packet should be requested without clearing the FULL flag of the buffer register (I assume that something similar could be done with other hardware than the RP2040). The hardware layer would need to be aware if the various interface layers are ready to accept more data, in order to know if the OUT data should be accepted to postponed. |
@pigrew thank for the feedback. I don't think there are any NAK at all in this case, which is the cause of the problem. I don't have a hardware usb analyzer so I cannot very that though. I'll have a look as well if there are any reset etc. |
I have a RP2040 somewhere, but I'm not familiar with its USB peripheral. I've also not used the FIFO transfers in TinyUSB.... (I only have experience with the block transfer modes). It would be trickier programming, but the device code could let the HW buffer fill up while the SW FIFO is full. Once the HW buffer is full, the peripheral should automatically NAK future transfers. If it never NAKs when one or both buffers are full, than that certainly is a problem. You may be able to enable TinyUSB logging (see the There may also be 'zero length packet' bugs happening, if one side is expecting a ZLP to terminate a transfer but it is never received... At this point, the bug could be anything. I'm also going to need to study your code... figure out what the cdc_flush does, and the other functions... |
fyi - The RP2040 has a hardware PHY and dedicated DPSRAM, available to both the device/host controller hardware and the CPU cores. More information on this can be found in section 4.1 in the RP2040 datasheet https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf. From testing, I can tell that once an OUT packet is received and an interrupt is triggered, the hardware buffer for the endpoint is marked as being full and exclusive to the CPU core (controller can no longer access it). Since the USB hardware is prohibited from accessing the buffer control register at this point, I am guessing that further OUT packets would be fully ignored without any response, unless the control register is handed back to the device controller again with new configuration. In this case, it never is (see my repro steps above). @kilograham, I am hoping you might help me here. Could you assist in getting the RP2040 datasheet updated with info regarding the device processing of OUT packets (section 4.1.2.6.3)?
|
@mjan-jansson, inspired by your last comment, I created raspberrypi/pico-feedback#265 . Also, it made me wonder what the behavior is for data sequence errors, as that's also not well specified. |
Interesting. If I read the source code correctly, double buffering is disabled in device mode in TinyUSB for RP2040 (see roughly line 180 in rp2040_usb.c). I'm not sure I understand the rational for that. I am assuming it does not affect my use case other than performance. It would nevertheless be most interesting to learn more about this. |
In case anyone is still looking into this; I've updated the comment section above about tests with another board as the host (a H5 based board from FriendlyElec). No problems there either. Given other problems with the USB stack on Raspberry Pi (e.g. https://forums.raspberrypi.com/viewtopic.php?t=245931) I'm writing this off as a hardware specific RPI issue, and I do not have the tools to investigate any further. |
Thanks for the update, yes... I think a USB-analyzer is needed for for efficient debug. I did run the code on RP2040 with a Windows 10 host, and with a small amount of testing, could not reproduce. |
ping @P33M , in case he has any interest in investigating this further. |
When are NAK sent? Each time the device controller receives an IN or an OUT packet from the host, it will read the buffer control DPSRAM. If the available bit (and the full bit 1 for an IN TO HOST transfer, or the full bit 0 for an OUT FROM HOST transfer) is set then the data is transferred. If not a NAK is sent. This is the same for both directions. |
Many thanks for the info! |
Operating System
RaspberryPi OS
Board
RaspberyPi Pico
Firmware
Pico is running Pico SDK 1.4.0
The RPI4 is running Linux dev1 5.10.103-v7l+
What happened ?
The board runs as a USB device. It has a loop which write 20 times more data than it reads, over the CDC serial port. When using some USB hosts (but not all), the board will eventually hang and not be able to send any data.
I believe that this is what is happening when running the loop mentioned above:
At this point, the board will hang, since it is unable to write all the data.
Workaround
If tud_cdc_n_read is called (possibly several minutes after the host has marked the CDC port as defunct), the port will continue to work again without any loss of read or written data.
Comments
Note that this depends on the behavior of the USB host. I have tested this on a SBC (NanoPi NEO Plus2), Windows 11 machine and a RaspberryPi 4, and I am not seeing the hang with neither Windows nor the SBC as the host. I am seeing the hang 100% on RPI4, and I can verify the status of the two hardware buffers in question. The device is ignoring the OUT packets and the host is not polling for the IN packets. The SBC is running Linux 4.14 using ohci. My RPI4 is running Linux 5.10 using xhci_hcd (in case it matters).
I am seeing other issues as well with RPI4, with a more complex software, where CDC driver in TinyUSB is unable to create further IN packets, but there are no pending OUT packets from the host.
One could argue that neither RPi4 nor the Pico is at fault here. I am reasoning that the RP2040 MCU should silently ignore the OUT packets until it is ready to accept more data. By not responding to the OUT packets, the host should treat this as a failed transmission (data error), so host should retry but perhaps not forever? After some retries, the devices would be declared broken. On the other hand, not being able to accept data for long periods of time could be an acceptable scenario for certain devices, so maybe the USB stack on the host is at fault?
Regardless of the correct behavior, I would still like to fix this, but I am not sure what the correct behavior should be.
Related
I can find other reports which basically boils down to this problem. I am not seeing any fixes in tinyusb for this though, and I don't want to revive old issues. For example, #1273 . Looks like it was fixed in tinypython micropython/micropython#8520.
How to reproduce ?
Here is an example code that illustrates this problem. I'm using a 256 bytes size for all three (RX, TX, EP) buffers.
To trigger the test, run this on RPI4 :
If you break into the debugger once you get the hung, you can tell that the hardware buffer register corresponding to the IN and OUT endpoints are in a state where IN is active (hung) and OUT is inactive (not accepting any packets).
Debug Log as txt file
I can produce usbmon logs if it would help (I think not).
Screenshots
No response
The text was updated successfully, but these errors were encountered: