Where to start STM32H7 support

ChibiOS public support forum for topics related to the STMicroelectronics STM32 family of micro-controllers.

Moderators: RoccoMarco, barthess

tridge
Posts: 141
Joined: Mon Sep 25, 2017 8:27 am
Location: Canberra, Australia
Has thanked: 10 times
Been thanked: 20 times
Contact:

reading flash with ECC errors

Postby tridge » Sat Feb 09, 2019 1:00 am

I hope nobody minds me posting so much here ...
I've finally got my bootloader working correctly on the H743. I ran across a really interesting issue that I thought I'd share in case it bites anyone else.
The H743 flash has ECC, and also has a restriction that you can only program on 32 byte boundaries with exactly 32 bytes at a time. The ref manual discusses a way to get around that, using the FLASH_CR_FW bit to force a write of a partial line, although it gives some (vague) information about why this isn't a good idea. I discovered that it *really* is a bad idea.
My (USB based) bootloader keeps back the first word of the firmware when flashing, then writes the first word once the CRC passes. This is used to prevent trying to boot a partially flashed fw, in case the user unplugs USB while flashing. I used the CR_FW bit to implement this partial line write.
The problem is that you can end up with the 32 byte line having an ECC error, and that ECC error persists across power cycles. Even worse, when you try to read from that line you get a "double ECC error" and a hard fault.
So when this went bad the board went into a hard fault on startup, as at startup it reads the first words of the fw. Even if you re-flash you get a hard fault as the bootloader tries to read the flash words to check if it can skip a sector erase (if all words in the sector are 0xffffffff).
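For readers following along, the erase-skip check described above can be sketched roughly as below. This is a minimal illustration and not the code from the linked flash driver; the function name `flash_range_is_erased` is made up here. On the H743, each of these reads can itself raise the double ECC error if the line is damaged, which is exactly the failure described in this post.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if every 32-bit word in the range
 * reads back as the erased value 0xFFFFFFFF, so an erase can be skipped.
 * Note: on real flash each read may fault if the line has an ECC error. */
bool flash_range_is_erased(const volatile uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (words[i] != 0xFFFFFFFFu) {
            return false;
        }
    }
    return true;
}
```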
What I'd really like to do is have a function which probes a word in flash and checks if it would give an ECC error if you read it, or asks if a particular line has an ECC error. I haven't worked out a way to do that yet (although I suspect there is a way).
For now I've re-jigged the bootloader to only ever do 32 byte aligned writes, using a different strategy for checking for partial flash. It's working nicely, but this cost me a lot more time to understand than I would have liked.
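As an illustration of the "only ever do 32 byte aligned writes" rule, here is a rough sketch of splitting a buffer into whole flash lines and padding the tail with 0xFF. It is not taken from the linked driver: `flash_write_lines`, `flash_line_writer_t`, and the RAM-backed `fake_flash` / `fake_line_writer` stand-ins are all hypothetical names used so the logic can be tried on a host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLASH_LINE_SIZE 32u /* H743 flash programming granularity, in bytes */

/* Hypothetical low-level hook: programs exactly one line at an aligned address. */
typedef bool (*flash_line_writer_t)(uint32_t addr, const uint8_t line[FLASH_LINE_SIZE]);

/* Writes len bytes starting at a 32-byte aligned address, one full line at a
 * time. A short final line is padded with 0xFF so no partial-line write
 * (and so no FLASH_CR_FW forced write) is ever needed. */
bool flash_write_lines(uint32_t addr, const uint8_t *data, size_t len,
                       flash_line_writer_t write_line)
{
    if ((addr % FLASH_LINE_SIZE) != 0u) {
        return false; /* refuse unaligned starts instead of forcing a write */
    }
    while (len > 0u) {
        uint8_t line[FLASH_LINE_SIZE];
        size_t n = (len < FLASH_LINE_SIZE) ? len : FLASH_LINE_SIZE;
        memset(line, 0xFF, sizeof line);
        memcpy(line, data, n);
        if (!write_line(addr, line)) {
            return false;
        }
        addr += FLASH_LINE_SIZE;
        data += n;
        len -= n;
    }
    return true;
}

/* Host-side stand-in for the flash array (assumption, for testing only). */
uint8_t fake_flash[128];

bool fake_line_writer(uint32_t addr, const uint8_t line[FLASH_LINE_SIZE])
{
    if ((addr % FLASH_LINE_SIZE) != 0u || addr + FLASH_LINE_SIZE > sizeof fake_flash) {
        return false;
    }
    memcpy(&fake_flash[addr], line, FLASH_LINE_SIZE);
    return true;
}
```

One consequence of padding is that the padded line is fully programmed, so it cannot be topped up later without erasing the sector.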
Here is our (rather ugly, sorry) flash driver in case anyone is interested:
https://github.com/tridge/ardupilot/blo ... on/flash.c
and the bootloader is here:
https://github.com/tridge/ardupilot/tre ... Bootloader

Giovanni
Site Admin
Posts: 14444
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 1074 times
Been thanked: 921 times
Contact:

Re: Where to start STM32H7 support

Postby Giovanni » Sat Feb 09, 2019 4:23 am

On a different platform I had to implement a safe_memcpy() function which is supposed to resist accessing locations with ECC errors and just return an error flag in case of failure.

Globally:
- Define a global pointer exc_return setting it to NULL.

I did the following in safe_memcpy:
- Set the pointer to an exit handler in case of exception.
- Perform the operation.
- Memory data barrier, needed to make sure the exception happens before the next step.
- Set the pointer to NULL again.
- Return false (no error).

In the exit handler:
- Return true (error).

In the exception handler:
- Check if the pointer is NULL, if so do the normal exception handling (stop).
- Change the return address of the handler to the location pointed by the pointer.
- Set the pointer to NULL.
- Return from exception on the exit handler.

You could use a similar approach.
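The control flow of the steps above can be modelled on a host roughly as follows. This is only a sketch under loud assumptions: `setjmp`/`longjmp` stands in for rewriting the return address in the exception frame, and `fault_handler`, `faulting_read`, and `poisoned_addr` are hypothetical stand-ins for the real exception handler and for a flash line with an ECC error. On the target, the handler would patch the stacked PC instead, and a data barrier would sit where the comment indicates.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Global recovery pointer; NULL means no protected operation is in progress. */
jmp_buf *volatile exc_return = NULL;

/* Hypothetical: address that "faults" when read, simulating an ECC error. */
const uint8_t *poisoned_addr = NULL;

/* Model of the exception handler. */
void fault_handler(void)
{
    if (exc_return == NULL) {
        abort();                 /* normal exception handling: stop */
    }
    jmp_buf *target = exc_return;
    exc_return = NULL;           /* disarm before leaving the handler */
    longjmp(*target, 1);         /* "return from exception" onto the exit handler */
}

/* Model of a read that may raise the exception. */
uint8_t faulting_read(const uint8_t *p)
{
    if (p == poisoned_addr) {
        fault_handler();
    }
    return *p;
}

static void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = faulting_read(&src[i]);
    }
}

/* Returns false on success, true if the copy hit a faulting location. */
bool safe_memcpy(void *dst, const void *src, size_t n)
{
    jmp_buf env;
    if (setjmp(env) != 0) {
        return true;             /* exit handler: report the error */
    }
    exc_return = &env;           /* arm the recovery pointer */
    copy_bytes(dst, src, n);     /* perform the operation */
    /* On the target, a data memory barrier would go here so that any
     * fault is taken before the pointer is cleared. */
    exc_return = NULL;
    return false;                /* no error */
}
```

Bytes copied before the fault remain in the destination, so a caller should treat the buffer contents as undefined when the function returns true.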

Giovanni

Re: Where to start STM32H7 support

Postby tridge » Sat Feb 09, 2019 5:56 am

Giovanni wrote:You could use a similar approach

Thanks, that is a good suggestion, although it's rather more complex than I hoped it would be!
Regarding my earlier report of spiExchange() hanging, it seems to be related to the DCache, at least for the code on my own board.
I had assumed that DTCM would be the right memory to use for DMA. ArduPilot uses DTCM for all DMA operations on the F7, and I just used the same strategy on the H7. That turns out to be a mistake due to the different bus domains. DMA to/from AXI SRAM does work, but you need to do DCache invalidation and flush. I remember you pointed this out in one of your earliest posts on the H7, but I had assumed that DTCM would be OK. Bad assumption.
For now I've just disabled the DCache until I rework the bounce buffer logic in ArduPilot to cope with the new restrictions.
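For context on the invalidation and flush requirement: the Cortex-M7 data cache works in 32-byte lines, so cache maintenance on a DMA buffer effectively covers whole lines. The sketch below shows one way to round a buffer to line boundaries before handing it to the CMSIS calls `SCB_CleanDCache_by_Addr()` (before a DMA transmit) or `SCB_InvalidateDCache_by_Addr()` (after a DMA receive). The helper `dcache_align_range` and the `cache_range_t` type are hypothetical names, not ChibiOS or ArduPilot APIs.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 data cache line size, in bytes */

typedef struct {
    uintptr_t start; /* address rounded down to a cache line boundary */
    size_t    size;  /* length rounded up to cover whole cache lines   */
} cache_range_t;

/* Hypothetical helper: expands [addr, addr + len) to whole cache lines. */
cache_range_t dcache_align_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t start = addr & ~mask;
    uintptr_t end = (addr + len + mask) & ~mask;
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}
```

If the expanded range is larger than the buffer, an invalidate can discard unrelated data that shares the first or last line, which is one reason DMA buffers are usually allocated line-aligned and padded to a multiple of 32 bytes.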

Re: Where to start STM32H7 support

Postby Giovanni » Sat Feb 09, 2019 7:50 am

H7 is very complex, you should look at the internal domains diagram to understand which masters can access the various RAM areas. I would use F7 unless the extra performance is really really required, much simpler.

Giovanni

Re: Where to start STM32H7 support

Postby tridge » Tue Feb 12, 2019 2:54 am

Giovanni wrote:H7 is very complex, you should look at the internal domains diagram to understand which masters can access the various RAM areas. I would use F7 unless the extra performance is really really required, much simpler.

I'm finding it really fun working on the H7. It is certainly a challenge, but an enjoyable one.
I have DMA working with DCache enabled now, with appropriate DMA bounce buffers and invalidate/flush operations. I'm only using AXI SRAM for now to keep life simple. I'll need to add arguments to the bouncebuffer code to specify the domain once I enable other memory regions.
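A bounce-buffer decision along these lines might look like the sketch below. It is not the ArduPilot bouncebuffer code: `dma_needs_bounce` is a hypothetical name, and the policy (buffer must lie entirely in AXI SRAM and be 32-byte aligned in both address and length) is an assumption chosen for illustration. The AXI SRAM location on the H743 is 512 KiB at 0x24000000.

```c
#include <stdbool.h>
#include <stdint.h>

#define AXI_SRAM_BASE 0x24000000u        /* H743 AXI SRAM start address */
#define AXI_SRAM_SIZE (512u * 1024u)     /* H743 AXI SRAM size in bytes */
#define DMA_ALIGN     32u                /* Cortex-M7 cache line size   */

/* Hypothetical policy: returns true when the caller's buffer cannot be
 * used for DMA directly and the transfer should go through a bounce
 * buffer that is known to be DMA-safe. */
bool dma_needs_bounce(uint32_t addr, uint32_t len)
{
    bool in_axi = addr >= AXI_SRAM_BASE &&
                  len <= AXI_SRAM_SIZE &&
                  (addr - AXI_SRAM_BASE) <= (AXI_SRAM_SIZE - len);
    bool aligned = (addr % DMA_ALIGN) == 0u && (len % DMA_ALIGN) == 0u;
    return !(in_axi && aligned);
}
```

Supporting other memory regions would mean replacing the single AXI SRAM range with a per-domain table, which matches the extra argument mentioned above.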

I think I spotted a bug though. In the stm32_clock_init() for H7 the workaround for the AXI SRAM corruption bug uses this:

*((volatile uint32_t *)0x51000000 + 0x1108 + 0x7000) = 0x00000001U;

but following the reference manual section 2.2.4 for READ_ISS_OVERRIDE, I think it should be this:

*((volatile uint32_t *)0x51000000 + 0x1008 + 0x7000) = 0x00000001U;

Note the change from 0x1108 to 0x1008. Is that a bug, or is there a typo in the reference manual? The reason I'm looking at this is that I'm getting some occasional memory corruption that I can't explain at the moment, so I'm looking for all possible causes. I'm getting the corruption at a fairly consistent address, but I can't use a watchpoint as I think the DCache hides the change in the memory from the debugger I'm using (a Black Magic Probe). Do you happen to know how to make SWD do watchpoints to catch memory changes with DCache enabled?

Cheers, Tridge

Re: Where to start STM32H7 support

Postby tridge » Tue Feb 12, 2019 8:03 am

tridge wrote: *((volatile uint32_t *)0x51000000 + 0x1108 + 0x7000) = 0x00000001U;

Sorry, it is 0x1108 for AXI_TARGx_FN_MOD. I was looking at AXI_TARGx_FN_MOD_ISS_BM.

Re: Where to start STM32H7 support

Postby Giovanni » Tue Feb 12, 2019 8:33 am

Sorry, never tried that. Have you tried on the ST forum?

Giovanni

reproducible memory corruption on SPI transfers

Postby tridge » Wed Feb 13, 2019 6:55 am

I've created a small test program that demonstrates memory corruption when doing SPI transfers on a Nucleo H743. It also has USB active for convenience of printing results, although it's quite possible it has an impact on the result. The test program is a patch on top of today's ChibiOS trunk.

The program does SPI transfers at 1kHz, each of 154 bytes. It does the appropriate DCache invalidates and flushes.
It allocates 500k of memory at startup, and fills it with a known pattern of values. It then checks the values every 10ms to see if any of them have become corrupt. What happens is that MCU reads from the allocated memory sometimes return the wrong value. Later reads of the same address get the right value again, which implies that it is probably a cache read issue, not a wild DMA.
This test program is a greatly reduced version of what I reproduced with ArduPilot. It runs on a Nucleo-H743. It doesn't need any peripherals attached. You do need to be using a lot of memory to see the issue. The address in memory that becomes corrupted seems to be random (or at least I haven't spotted a pattern yet).
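The fill-and-verify idea can be sketched as follows. This is not the attached test program: the chunk size, the pattern function, and the names `pattern_value`, `fill_chunk`, and `check_chunk` are all hypothetical, chosen only so that every word has a predictable value derived from its chunk number and index.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_WORDS 64u /* hypothetical chunk size, in 32-bit words */

/* Hypothetical pattern: a value derived from the chunk number and word
 * index, so the expected content can be recomputed at check time. */
uint32_t pattern_value(uint32_t chunk, uint32_t index)
{
    return (chunk * 0x9E3779B1u) ^ (index * 0x85EBCA6Bu) ^ 0xA5A5A5A5u;
}

/* Fills one chunk with its expected pattern. */
void fill_chunk(uint32_t *mem, uint32_t chunk)
{
    for (uint32_t i = 0; i < CHUNK_WORDS; i++) {
        mem[i] = pattern_value(chunk, i);
    }
}

/* Re-reads one chunk and returns the number of words that do not match.
 * The volatile qualifier forces an actual load for every word. */
size_t check_chunk(const volatile uint32_t *mem, uint32_t chunk)
{
    size_t bad = 0;
    for (uint32_t i = 0; i < CHUNK_WORDS; i++) {
        if (mem[i] != pattern_value(chunk, i)) {
            bad++;
        }
    }
    return bad;
}
```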

A failure looks like this:

count=21493 CORRUPT=0
count=21622 CORRUPT=0
count=21751 CORRUPT=0
count=21881 CORRUPT=0
Corruption 0x00000000 should be 0x02E351A0 at 238/18
count=22010 CORRUPT=1
count=22140 CORRUPT=1
count=22269 CORRUPT=1

In the above case, a read of chunk 238/18 returned 0 when it should have returned 0x02E351A0, after 21881 SPI transfers. Later reads of the same address give the right value.
It typically reproduces the issue in under a minute, but sometimes takes several minutes.
I'll continue trying to narrow down the issue, in particular checking if there are any particular circumstances that do/don't trigger the bug.
I do hope this isn't just a silly bug on my part!
Cheers, Tridge
Attachments
SPI_USB_corruption.zip
(22.05 KiB) Downloaded 154 times

Re: reproducible memory corruption on SPI transfers

Postby tridge » Wed Feb 13, 2019 8:16 am

tridge wrote:I've created a small test program that demonstrates memory corruption when doing SPI transfers on a Nucleo H743. It also has USB active for convenience of printing results, although it's quite possible it has an impact on the result. The test program is a patch on top of today's ChibiOS trunk.


Looks like the cause was missing parentheses in hal_lld.c

See this godbolt demo of the issue:
https://godbolt.org/z/AW2rFH

patch attached. With that change I can no longer reproduce the corruption with the small test program.
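The patch and the godbolt listing are not reproduced in the thread, so the following is only a host-side sketch of the effect, assuming the missing parentheses are the ones around the address sum in the AXI SRAM workaround line quoted earlier. Without them, the offsets are added to a `uint32_t` pointer and each unit is scaled by `sizeof(uint32_t)`; with them, the sum is computed as a plain integer before the cast, presumably `*((volatile uint32_t *)(0x51000000 + 0x1108 + 0x7000))`. The function names below are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define AXI_BASE      0x51000000u
#define FN_MOD_OFFSET 0x1108u
#define TARG7_OFFSET  0x7000u

/* Address reached by (uint32_t *)BASE + A + B: pointer arithmetic counts
 * in elements, so the byte offset is multiplied by sizeof(uint32_t). */
uint32_t address_without_parentheses(void)
{
    return AXI_BASE + (uint32_t)sizeof(uint32_t) * (FN_MOD_OFFSET + TARG7_OFFSET);
}

/* Address reached by (uint32_t *)(BASE + A + B): integer addition first. */
uint32_t address_with_parentheses(void)
{
    return AXI_BASE + FN_MOD_OFFSET + TARG7_OFFSET;
}

/* Shows the same scaling on a real array: p + 2 + 1 is 3 elements ahead. */
size_t byte_distance_of_pointer_add(void)
{
    static uint32_t words[8];
    uint32_t *p = words;
    return (size_t)((uintptr_t)(p + 2 + 1) - (uintptr_t)p);
}
```

Under that assumption the unparenthesized form writes to 0x51020420 rather than 0x51008108, so the workaround register is never set.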

Cheers, Tridge
Attachments
H74x-AXI-SRAM-fix.zip
(753 Bytes) Downloaded 168 times

Re: Where to start STM32H7 support

Postby Giovanni » Wed Feb 13, 2019 8:55 am

One of those "feel stupid" moments...

I will open a ticket about this.

Giovanni

