I have recently been facing a fairly obscure issue for which I could not find any documentation or solutions, so I figured I would post my findings here in case anyone else runs into it.
The company I work for recently reached out to STM directly and their engineers confirmed the issue, and pointed to an errata for the STM32L4, commenting that this errata in particular applies to all chips with the OTG_FS and OTG_HS cores - not just the L4 series.
In my case, the issue presented as an endpoint becoming "stuck": without application intervention, the core begins to send partial transfers whenever it is serviced by the host. These transfers are invalid, so the host discards them, but the transfer never "completes" as far as ChibiOS is concerned, so the thread calling usbTransmit blocks forever.
A solution for my case was to set STM32_USB_OTGFIFO_FILL_BASEPRI in mcuconf.h:
#define STM32_USB_OTGFIFO_FILL_BASEPRI 1
See the errata, Section 2.9.4, "Data FIFO gets corrupted if the write sequence to the Transmit FIFO is interleaved with other OTGFS register access".
Things to note about this errata, based on answers directly from STM engineers:
- This applies to all parts with OTG_FS or OTG_HS cores
- This issue applies to both OTG_FS and OTG_HS cores, not just OTG_FS as mentioned in the errata
- On chips with both the OTG_FS and OTG_HS, they are completely independent. Preempting OTG_FS TxFIFO filling to access registers on the OTG_HS will not trigger the issue
It is also important to note that I don't believe it is normally feasible to trigger this issue with ChibiOS unless you enable USB_USE_WAIT and have multiple threads blocking on USB transfers. If you have a single thread using blocking transfers, or use the non-blocking APIs from a single thread, you should not trigger this issue. However, for us this issue is very random, so it is hard to say for sure what set of conditions will or will not trigger it.
It basically boils down to the usb_lld_pump thread being preempted while running the otg_fifo_write_from_buffer routine, and something else in the system then reading or writing any register that belongs to the same OTG core whose TxFIFO the pump thread was filling. Actually spotting a condition in your application which could trigger this is not simple. Proving your application could never trigger this condition is even more difficult. In our case, it depended heavily on the load of the system as a whole, the amount of USB traffic, and the direction of the wind on Mars approximately 30 seconds in the future.
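To make the triggering condition concrete, here is a toy host-side model of it. This is not ChibiOS driver code and not real hardware behaviour: model_core_t, model_reg_access and model_fifo_fill are hypothetical names, and the only thing modelled is the rule from the errata (a register access on the same core in the middle of a TxFIFO write sequence corrupts the FIFO).
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_FIFO_WORDS 16

/* Toy model of one OTG core (hypothetical, for illustration only). */
typedef struct {
  uint32_t fifo[MODEL_FIFO_WORDS];
  size_t   count;      /* words pushed so far                        */
  bool     filling;    /* a TxFIFO write sequence is in progress     */
  bool     corrupted;  /* set when the errata condition has been hit */
} model_core_t;

/* Any read or write of some other register on the given core. */
static void model_reg_access(model_core_t *core) {
  if (core->filling) {
    core->corrupted = true;   /* interleaved with a TxFIFO write sequence */
  }
}

/*
 * Fills the TxFIFO of 'core' from 'buf'. 'touched' is the core whose
 * registers a second, preempting thread wants to access (NULL for none).
 * Unlocked: the second thread runs in the middle of the write sequence.
 * Locked:   the second thread only runs after the sequence has finished.
 */
static void model_fifo_fill(model_core_t *core, const uint32_t *buf, size_t n,
                            bool locked, model_core_t *touched) {
  core->filling = true;
  for (size_t i = 0; i < n && core->count < MODEL_FIFO_WORDS; i++) {
    core->fifo[core->count++] = buf[i];
    if (!locked && touched != NULL && i == 0) {
      model_reg_access(touched);      /* preemption mid-sequence */
    }
  }
  core->filling = false;
  if (locked && touched != NULL) {
    model_reg_access(touched);        /* deferred until after the fill */
  }
}
```
In this model, an unlocked fill that is preempted by an access to the same core ends up corrupted, a locked fill does not, and an unlocked fill preempted by an access to a different core is also unaffected, which matches the points in the list above.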
Having said that, I can't think of a scenario where you would use the blocking APIs for USB and not have multiple threads issuing USB transfers (assuming your device has multiple endpoints).
My understanding is that the USB driver in ChibiOS is implemented this way for performance reasons, but I believe that, given the known errata of the STM OTG cores, this makes the driver potentially unstable in its default configuration.
I propose the default behavior be changed to a more conservative, less potentially dangerous state. The TxFIFO should be filled while the system is in a locked state - effectively STM32_USB_OTGFIFO_FILL_BASEPRI defined to 1, rather than 0.
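Until the default changes, a project that relies on this setting could guard against it silently reverting to 0. The check below is only a sketch and makes some assumptions: the fallback #define exists solely so the snippet compiles on its own, in a real project the value would come from mcuconf.h, and the check would have to sit somewhere the macro is already visible.
```c
/* Fallback for this standalone snippet only; normally set in mcuconf.h. */
#ifndef STM32_USB_OTGFIFO_FILL_BASEPRI
#define STM32_USB_OTGFIFO_FILL_BASEPRI 1
#endif

/* Refuse to build if the TxFIFO fill would run without protection. */
#if STM32_USB_OTGFIFO_FILL_BASEPRI == 0
#error "STM32_USB_OTGFIFO_FILL_BASEPRI is 0: TxFIFO fill can be preempted"
#endif
```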