While testing a flight controller based on a STM32F765IIK6 we hit a problem in flight where SPI transactions from a MS5611 barometer went crazy. As this caused a crash we tried to reproduce the problem on the ground.
I found that the SPI peripheral (SPI4) can get into a bad state due to a momentary electrical glitch on the clock line. When this happens the FIFO appears as non-empty and all incoming transfers are corrupted (both with polled and DMA transfers).
I thought we could fix it by just draining the FIFO, but it doesn't help. So instead I have used the non-empty FIFO to trigger a reset using the RCC. This works nicely, and when I deliberately trigger the issue by shorting the SCK line to another pin the peripheral is automatically reset and the flight controller recovers.
Patch attached.
[patch] Added reset of SPIv2 on FIFO error
-
- Posts: 141
- Joined: Mon Sep 25, 2017 8:27 am
- Location: Canberra, Australia
- Has thanked: 10 times
- Been thanked: 20 times
- Contact:
[patch] Added reset of SPIv2 on FIFO error
- Attachments
-
- 0001-STM32-check-for-FIFO-de-sync-in-SPIv2-and-reset-peri.zip
- (1.48 KiB) Downloaded 177 times
- Giovanni
- Site Admin
- Posts: 14455
- Joined: Wed May 27, 2009 8:48 am
- Location: Salerno, Italy
- Has thanked: 1076 times
- Been thanked: 922 times
- Contact:
Re: [patch] Added reset of SPIv2 on FIFO error
Question,
My question is, what causes the glitch?
CLK is an output line, should we put a SW fix for what appears to be an electrical problem?
I am asking because the fix appears to be heavy and would worsen SPI latency significantly.
Giovanni
My question is, what causes the glitch?
CLK is an output line, should we put a SW fix for what appears to be an electrical problem?
I am asking because the fix appears to be heavy and would worsen SPI latency significantly.
Giovanni
-
- Posts: 141
- Joined: Mon Sep 25, 2017 8:27 am
- Location: Canberra, Australia
- Has thanked: 10 times
- Been thanked: 20 times
- Contact:
Re: [patch] Added reset of SPIv2 on FIFO error
Giovanni wrote:My question is, what causes the glitch?
we don't know. The issue happened once in flight. All we knew from that flight was that we started getting crazy values back from SPI reads on the MS5611. A reboot fixed it.
After the flight we tried various was to trigger the same symptoms. The one we found that could trigger it was to deliberately short the CLK line with another pin (I've been using the MOSI pin, but it may well happen with other pins).
It could be that SPIv2 or STM32F7xx is particularly vulnerable to this, or it could be that it could happen on any SPI peripheral and this is the first time it has been noticed.
CLK is an output line, should we put a SW fix for what appears to be an electrical problem?
For use in an autopilot the answer is most definitely yes. The key is that a temporary electrical issue causes a permanent outage of a sensor. In an autopilot you want to be as robust as it is possible to be. It is not unheard of for external temporary RF pulses (eg. fly in front of a powerful RF transmitter such as a ground radar) to cause glitches like this.
I am asking because the fix appears to be heavy and would worsen SPI latency significantly.
In normal operation it adds a single read of the status register to each SPI read. That doesn't seem heavy to me.
If you would prefer we could move this out of the low level ChibiOS code and put it up inside the ArduPilot code, perhaps exposing a spiReset() method in the hal and spi_lld_reset() method in the driver. I think though that this sort of robustness improvement is worthwhile for many uses of ChibiOS. Autopilots are not the only RTOS examples where robustness is so important.
Also note that the reference manual does mention using the RCC reset, in the section on "Procedure for disabling the SPI". It first suggests checking for leftover bytes in the FIFO and draining those. I tried that and it didn't fix the issue in this case (though perhaps we should do it anyway). It then suggests using a RCC reset, which is what did work.
Cheers, Tridge
- Giovanni
- Site Admin
- Posts: 14455
- Joined: Wed May 27, 2009 8:48 am
- Location: Salerno, Italy
- Has thanked: 1076 times
- Been thanked: 922 times
- Contact:
Re: [patch] Added reset of SPIv2 on FIFO error
I missed the single condition check in spi_lld_check_reset(), yes, the latency overhead would not be much.
Anyway, I am still not sure about this fix, I don't like this kind of very specific workaround to be done in the driver. It is addressing something very specific (and not understood) occurring in your HW but the workaround overhead would be for everybody.
Another point, the patch does not consider the DMA channels state, those could need resetting too since the FIFO appears to be non-empty.
What I suggest is:
1) Add the reset to spi_lld_start() and make it a general rule for all drivers. Some already do that, see I2C which is historically glitchy and can need restarting. In general, stop then start should always return the driver to a fresh situation so the reset IS needed.
2) Add an assertion on RXNE where you placed your check, this would help detecting the issue for those affected.
The workaround would go outside the driver, you should detect the issue (the RXNE check) then do a spiStop() followed by a spiStart() before operations. The extreme workaround would be to do spiStart, operations, spiStop each time, without any check, blind dumb safety.
Giovanni
Anyway, I am still not sure about this fix, I don't like this kind of very specific workaround to be done in the driver. It is addressing something very specific (and not understood) occurring in your HW but the workaround overhead would be for everybody.
Another point, the patch does not consider the DMA channels state, those could need resetting too since the FIFO appears to be non-empty.
What I suggest is:
1) Add the reset to spi_lld_start() and make it a general rule for all drivers. Some already do that, see I2C which is historically glitchy and can need restarting. In general, stop then start should always return the driver to a fresh situation so the reset IS needed.
2) Add an assertion on RXNE where you placed your check, this would help detecting the issue for those affected.
The workaround would go outside the driver, you should detect the issue (the RXNE check) then do a spiStop() followed by a spiStart() before operations. The extreme workaround would be to do spiStart, operations, spiStop each time, without any check, blind dumb safety.
Giovanni
-
- Posts: 141
- Joined: Mon Sep 25, 2017 8:27 am
- Location: Canberra, Australia
- Has thanked: 10 times
- Been thanked: 20 times
- Contact:
Re: [patch] Added reset of SPIv2 on FIFO error
Giovanni wrote:1) Add the reset to spi_lld_start() and make it a general rule for all drivers. Some already do that, see I2C which is historically glitchy and can need restarting. In general, stop then start should always return the driver to a fresh situation so the reset IS needed.
I hadn't actually noticed that I2C does a RCC reset on i2c_lld_start(). Doing the same on SPI would seem to make sense.
Would you be happy for this to be always enabled, or would you like a STM32_SPI_RESET_ON_START define to enable it?
Or would you prefer a spiReset() call and let the user call if it they want to (when driver is in stopped state) ?
The extreme workaround would be to do spiStart, operations, spiStop each time, without any check, blind dumb safety.
we actually do that for most SPI operations already, as we usually have more than one device on each SPI bus, and they tend to have different required clocks and SPI mode. Plus we do a lot of DMA sharing with other peripherals. So spiStart/spiStop on each transaction is normal for us.
Cheers, Tridge
- Giovanni
- Site Admin
- Posts: 14455
- Joined: Wed May 27, 2009 8:48 am
- Location: Salerno, Italy
- Has thanked: 1076 times
- Been thanked: 922 times
- Contact:
Re: [patch] Added reset of SPIv2 on FIFO error
I will proceed with adding the reset operation to spi_lld_start(), no need to add a switch, the idea is to do this across the whole STM32 HAL.
Giovanni
Giovanni
Re: [patch] Added reset of SPIv2 on FIFO error
Giovanni wrote: Some already do that, see I2C which is historically glitchy and can need restarting. In general, stop then start should always return the driver to a fresh situation so the reset IS needed.
Giovanni
Only half-true; it only does the reset if the I2C is already in STOP mode. Means you can't unconditionally recover from an unknown state. I found that with I2CV2 I had to do start, stop, start to recover properly. (Can't remember the exact reasons, but when debugging its quite easy to end up with the I2C hardware in a different state to the software).
Definitely support the general principle
-
- Posts: 141
- Joined: Mon Sep 25, 2017 8:27 am
- Location: Canberra, Australia
- Has thanked: 10 times
- Been thanked: 20 times
- Contact:
Re: [patch] Added reset of SPIv2 on FIFO error
Giovanni wrote:I will proceed with adding the reset operation to spi_lld_start(), no need to add a switch, the idea is to do this across the whole STM32 HAL.
ok, sounds good. I've implemented it for our branch here:
https://github.com/ArduPilot/ChibiOS/co ... ff51d138d3
I've tested it on F765, F427 and H743 with no issues
Cheers, Tridge
Return to “Small Change Requests”
Who is online
Users browsing this forum: No registered users and 8 guests