[patch] Added reset of SPIv2 on FIFO error

Use this forum for requesting small changes in ChibiOS. Large changes should be discussed in the development forum. This forum is NOT for support.
tridge
Posts: 101
Joined: Mon Sep 25, 2017 8:27 am
Has thanked: 8 times
Been thanked: 12 times
Contact:

[patch] Added reset of SPIv2 on FIFO error

Postby tridge » Sat Jul 27, 2019 12:08 pm

While testing a flight controller based on a STM32F765IIK6 we hit a problem in flight where SPI transactions from a MS5611 barometer went crazy. As this caused a crash we tried to reproduce the problem on the ground.
I found that the SPI peripheral (SPI4) can get into a bad state due to a momentary electrical glitch on the clock line. When this happens the FIFO appears as non-empty and all incoming transfers are corrupted (both with polled and DMA transfers).
I thought we could fix it by just draining the FIFO, but it doesn't help. So instead I have used the non-empty FIFO to trigger a reset using the RCC. This works nicely, and when I deliberately trigger the issue by shorting the SCK line to another pin the peripheral is automatically reset and the flight controller recovers.
Patch attached.
Attachments
0001-STM32-check-for-FIFO-de-sync-in-SPIv2-and-reset-peri.zip
(1.48 KiB) Downloaded 8 times

User avatar
Giovanni
Site Admin
Posts: 12177
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 570 times
Been thanked: 487 times
Contact:

Re: [patch] Added reset of SPIv2 on FIFO error

Postby Giovanni » Sat Jul 27, 2019 6:43 pm

Question,

My question is, what causes the glitch?

CLK is an output line, should we put a SW fix for what appears to be an electrical problem?

I am asking because the fix appears to be heavy and would worsen SPI latency significantly.

Giovanni

tridge
Posts: 101
Joined: Mon Sep 25, 2017 8:27 am
Has thanked: 8 times
Been thanked: 12 times
Contact:

Re: [patch] Added reset of SPIv2 on FIFO error

Postby tridge » Sat Jul 27, 2019 11:31 pm

Giovanni wrote:My question is, what causes the glitch?

we don't know. The issue happened once in flight. All we knew from that flight was that we started getting crazy values back from SPI reads on the MS5611. A reboot fixed it.
After the flight we tried various was to trigger the same symptoms. The one we found that could trigger it was to deliberately short the CLK line with another pin (I've been using the MOSI pin, but it may well happen with other pins).
It could be that SPIv2 or STM32F7xx is particularly vulnerable to this, or it could be that it could happen on any SPI peripheral and this is the first time it has been noticed.
CLK is an output line, should we put a SW fix for what appears to be an electrical problem?

For use in an autopilot the answer is most definitely yes. The key is that a temporary electrical issue causes a permanent outage of a sensor. In an autopilot you want to be as robust as it is possible to be. It is not unheard of for external temporary RF pulses (eg. fly in front of a powerful RF transmitter such as a ground radar) to cause glitches like this.
I am asking because the fix appears to be heavy and would worsen SPI latency significantly.

In normal operation it adds a single read of the status register to each SPI read. That doesn't seem heavy to me.
If you would prefer we could move this out of the low level ChibiOS code and put it up inside the ArduPilot code, perhaps exposing a spiReset() method in the hal and spi_lld_reset() method in the driver. I think though that this sort of robustness improvement is worthwhile for many uses of ChibiOS. Autopilots are not the only RTOS examples where robustness is so important.
Also note that the reference manual does mention using the RCC reset, in the section on "Procedure for disabling the SPI". It first suggests checking for leftover bytes in the FIFO and draining those. I tried that and it didn't fix the issue in this case (though perhaps we should do it anyway). It then suggests using a RCC reset, which is what did work.
Cheers, Tridge

User avatar
Giovanni
Site Admin
Posts: 12177
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 570 times
Been thanked: 487 times
Contact:

Re: [patch] Added reset of SPIv2 on FIFO error

Postby Giovanni » Sun Jul 28, 2019 7:12 am

I missed the single condition check in spi_lld_check_reset(), yes, the latency overhead would not be much.

Anyway, I am still not sure about this fix, I don't like this kind of very specific workaround to be done in the driver. It is addressing something very specific (and not understood) occurring in your HW but the workaround overhead would be for everybody.

Another point, the patch does not consider the DMA channels state, those could need resetting too since the FIFO appears to be non-empty.

What I suggest is:
1) Add the reset to spi_lld_start() and make it a general rule for all drivers. Some already do that, see I2C which is historically glitchy and can need restarting. In general, stop then start should always return the driver to a fresh situation so the reset IS needed.
2) Add an assertion on RXNE where you placed your check, this would help detecting the issue for those affected.

The workaround would go outside the driver, you should detect the issue (the RXNE check) then do a spiStop() followed by a spiStart() before operations. The extreme workaround would be to do spiStart, operations, spiStop each time, without any check, blind dumb safety.

Giovanni

tridge
Posts: 101
Joined: Mon Sep 25, 2017 8:27 am
Has thanked: 8 times
Been thanked: 12 times
Contact:

Re: [patch] Added reset of SPIv2 on FIFO error

Postby tridge » Sun Jul 28, 2019 9:54 am

Giovanni wrote:1) Add the reset to spi_lld_start() and make it a general rule for all drivers. Some already do that, see I2C which is historically glitchy and can need restarting. In general, stop then start should always return the driver to a fresh situation so the reset IS needed.

I hadn't actually noticed that I2C does a RCC reset on i2c_lld_start(). Doing the same on SPI would seem to make sense.
Would you be happy for this to be always enabled, or would you like a STM32_SPI_RESET_ON_START define to enable it?
Or would you prefer a spiReset() call and let the user call if it they want to (when driver is in stopped state) ?
The extreme workaround would be to do spiStart, operations, spiStop each time, without any check, blind dumb safety.

we actually do that for most SPI operations already, as we usually have more than one device on each SPI bus, and they tend to have different required clocks and SPI mode. Plus we do a lot of DMA sharing with other peripherals. So spiStart/spiStop on each transaction is normal for us.
Cheers, Tridge

User avatar
Giovanni
Site Admin
Posts: 12177
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 570 times
Been thanked: 487 times
Contact:

Re: [patch] Added reset of SPIv2 on FIFO error

Postby Giovanni » Sun Jul 28, 2019 12:07 pm

I will proceed with adding the reset operation to spi_lld_start(), no need to add a switch, the idea is to do this across the whole STM32 HAL.

Giovanni

steved
Posts: 618
Joined: Fri Nov 09, 2012 2:22 pm
Has thanked: 5 times
Been thanked: 84 times

Re: [patch] Added reset of SPIv2 on FIFO error

Postby steved » Sun Jul 28, 2019 1:02 pm

Giovanni wrote: Some already do that, see I2C which is historically glitchy and can need restarting. In general, stop then start should always return the driver to a fresh situation so the reset IS needed.
Giovanni

Only half-true; it only does the reset if the I2C is already in STOP mode. Means you can't unconditionally recover from an unknown state. I found that with I2CV2 I had to do start, stop, start to recover properly. (Can't remember the exact reasons, but when debugging its quite easy to end up with the I2C hardware in a different state to the software).


Definitely support the general principle

tridge
Posts: 101
Joined: Mon Sep 25, 2017 8:27 am
Has thanked: 8 times
Been thanked: 12 times
Contact:

Re: [patch] Added reset of SPIv2 on FIFO error

Postby tridge » Sun Jul 28, 2019 11:37 pm

Giovanni wrote:I will proceed with adding the reset operation to spi_lld_start(), no need to add a switch, the idea is to do this across the whole STM32 HAL.

ok, sounds good. I've implemented it for our branch here:
https://github.com/ArduPilot/ChibiOS/co ... ff51d138d3
I've tested it on F765, F427 and H743 with no issues
Cheers, Tridge


Return to “Small Change Requests”

Who is online

Users browsing this forum: No registered users and 2 guests