QSPI register corruption

ChibiOS public support forum for topics related to the STMicroelectronics STM32 family of micro-controllers.

Moderators: barthess, RoccoMarco

steved
Posts: 733
Joined: Fri Nov 09, 2012 2:22 pm
Has thanked: 10 times
Been thanked: 108 times

QSPI register corruption

Postby steved » Fri Jul 24, 2020 5:42 pm

I have a strange QSPI problem: the QSPI address register (AR) occasionally gets set to zero, sometimes immediately after my code has set it to something else.

Sorry it's a long post, but there's lots of information!

Setup: STM32F767, initially with GCC V7.2.1 and latterly with V9.3.1; ChibiOS 19.1.3 with a selection of updates from SVN, plus the QSPI routines from trunk/V20. There are no non-ChibiOS interrupts.

There's more explanation at the end; the key point is that the corruption seems to be happening in a context switch, usually when a "double switch" (preemption) is required.

Probably the clearest example is shown in QSPI_Trace_60_short.png:
Steps -12 and -4: start of a QSPI transaction (set it going, then wait in the idle thread)
Steps -11 and -3: the QSPI interrupt that occurs on completion of the transaction
Steps -10 and -2: AR and SR early in the ISR
Steps -9 and -1: AR and SR after executing the QSPI "end of transaction" macro
Step 0: an abnormal completion; AR is OK at ISR exit, but zero at the idle->main context switch
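For anyone wanting to capture the same kind of step-by-step AR/SR trace, here is a minimal sketch of a snapshot ring buffer. The names (qspi_trace_log(), qspi_trace_get(), QSPI_TRACE_LEN) are hypothetical and not taken from the attached LLD extract; on the target the ar/sr arguments would be read from the QUADSPI registers at each step.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace record: one AR/SR snapshot per step. */
typedef struct {
  uint16_t step;  /* caller-chosen step/site identifier */
  uint32_t ar;    /* AR value at the time of the snapshot */
  uint32_t sr;    /* SR value at the time of the snapshot */
} qspi_trace_t;

#define QSPI_TRACE_LEN 64u  /* power of two so the index can be masked */

static qspi_trace_t qspi_trace[QSPI_TRACE_LEN];
static uint32_t qspi_trace_count;  /* total records written; older ones get overwritten */

/* Record a snapshot. On the target, call this with interrupts masked
   (e.g. inside chSysLock()/chSysLockFromISR()) so ISR and thread
   writes cannot interleave. */
static void qspi_trace_log(uint16_t step, uint32_t ar, uint32_t sr) {
  qspi_trace_t *t = &qspi_trace[qspi_trace_count & (QSPI_TRACE_LEN - 1u)];
  t->step = step;
  t->ar = ar;
  t->sr = sr;
  qspi_trace_count++;
}

/* Return the record written `back` entries ago (0 = most recent),
   or NULL if it was never written or has been overwritten. */
static const qspi_trace_t *qspi_trace_get(uint32_t back) {
  if (back >= qspi_trace_count || back >= QSPI_TRACE_LEN) {
    return NULL;
  }
  return &qspi_trace[(qspi_trace_count - 1u - back) & (QSPI_TRACE_LEN - 1u)];
}
```

Counting backwards from the most recent record matches the way the trace above numbers its steps (0, -1, -2, ...).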

The code flow from step 0 is as follows:


   OSAL_IRQ_EPILOGUE();                  // Checks AR: non-zero
     _port_irq_epilogue();
       _port_switch_from_isr();
         chSchDoReschedule();
           thread_t *otp = currp;

           /* Picks the first thread from the ready queue and makes it current.*/
           currp = queue_fifo_remove(&ch.rlist.queue);
           currp->state = CH_STATE_CURRENT;

           /* Handling idle-leave hook.*/
           if (otp->prio == IDLEPRIO) {
             CH_CFG_IDLE_LEAVE_HOOK();     // <--- Corruption detected here
           }


I have other examples where the problem arises when two interrupts occur back to back, without an intervening thread switch. Here the corruption occurs between the start of the QSPI interrupt and the chSysLockFromISR() immediately before the next trace write.
In these cases the corruption is consistently picked up in the OSAL_IRQ_EPILOGUE() macro. On occasion it has been picked up in other ISRs, not just the QSPI one!

My code never writes zero to the QSPI address register after initialisation, and as far as I can tell no on-chip mechanism should either.

The puzzling thing is that the corruption appears while ChibiOS code (IRQ epilogue, rescheduling) is in control.

Has anyone else encountered something like this? Or any suggestions on how to debug further? Or am I missing something very obvious?


Further explanation and notes
=============================
The example code is in the startup sequence (after the normal halInit() and chSysInit()), with very little other activity, as can be seen from the trace.
I disable caching on all RAM.

The QSPI address register can only be written while the QSPI is busy, which limits any update to the short period between transfer start and transfer complete. So, according to the logged status register values, it shouldn't be possible for AR to be updated at that point.
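The reasoning above can be written as a small predicate. This is a sketch based on my reading of the STM32F7 reference manual (RM0410): a CPU write to AR is ignored while SR.BUSY is 0, or when CCR.FMODE selects memory-mapped mode. The macro and function names are hypothetical stand-ins for the CMSIS definitions, and the bit positions should be checked against stm32f7xx.h before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (verify against QUADSPI_SR_BUSY / QUADSPI_CCR_FMODE). */
#define QSPI_SR_BUSY        (1u << 5)
#define QSPI_CCR_FMODE_POS  26u
#define QSPI_CCR_FMODE_MASK (3u << QSPI_CCR_FMODE_POS)
#define QSPI_FMODE_MEMMAP   3u

/* Given SR and CCR snapshots, would a CPU write to AR take effect?
   Writes are ignored when BUSY = 0 or in memory-mapped mode. */
static bool qspi_ar_write_effective(uint32_t sr, uint32_t ccr) {
  uint32_t fmode = (ccr & QSPI_CCR_FMODE_MASK) >> QSPI_CCR_FMODE_POS;
  return ((sr & QSPI_SR_BUSY) != 0u) && (fmode != QSPI_FMODE_MEMMAP);
}
```

If every logged SR around the failure shows BUSY clear, this predicate is false for the whole window, which points at the peripheral changing AR itself rather than a stray CPU write.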

I have corruption checks in CH_CFG_IDLE_ENTER_HOOK(), CH_CFG_IDLE_LEAVE_HOOK(), CH_CFG_CONTEXT_SWITCH_HOOK(), CH_CFG_IRQ_PROLOGUE_HOOK() and CH_CFG_IRQ_EPILOGUE_HOOK(), as well as immediately after writing to the register.
In the example, CH_CFG_IDLE_LEAVE_HOOK() was triggered.
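A minimal sketch of this kind of check, assuming the driver routes every AR write through one function that keeps a shadow copy. The register block here is a host-side mock so the logic can be exercised off-target; on the F767 it would be the CMSIS QUADSPI registers. The names qspi_ar_write(), qspi_ar_check() and the shadow/fault variables are hypothetical, not from the attached extract.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for the QUADSPI registers of interest. */
typedef struct {
  volatile uint32_t SR;
  volatile uint32_t AR;
} qspi_regs_t;

static qspi_regs_t qspi_mock;
static qspi_regs_t *const qspi = &qspi_mock;

static volatile uint32_t qspi_ar_shadow;     /* last value the driver wrote */
static const char *qspi_first_fault = NULL;  /* site of the first mismatch */
static uint32_t qspi_fault_count;            /* total mismatches seen */

/* Driver side: every AR write goes through here so the shadow stays in step. */
static void qspi_ar_write(uint32_t addr) {
  qspi_ar_shadow = addr;
  qspi->AR = addr;
}

/* Hook side: cheap enough for the IRQ prologue/epilogue and idle hooks. */
static void qspi_ar_check(const char *site) {
  if (qspi->AR != qspi_ar_shadow) {
    if (qspi_first_fault == NULL) {
      qspi_first_fault = site;  /* put a breakpoint here to catch the first hit */
    }
    qspi_fault_count++;
  }
}

/* Example wiring in chconf.h (hypothetical; merge with any existing hook code):
 *   #define CH_CFG_IDLE_LEAVE_HOOK()   { qspi_ar_check("idle-leave"); }
 *   #define CH_CFG_IRQ_EPILOGUE_HOOK() { qspi_ar_check("irq-epilogue"); }
 */
```

Recording only the first site keeps the evidence from being overwritten by later hooks that see the same stale value.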

All interrupts which might be enabled are from normal Chibi drivers.

The details and frequency of the problem vary as I add and remove code, and also as I swap between -O0 and -Og, but I can usually trigger the problem.

There's plenty of stack space, and all Chibi debug options are enabled.
Statistics enabled (also tried disabled; no change).
FPU disabled.
The "ready list" threads look good (just main, idle)

There are no relevant QSPI errata from ST (there is one for other F7 family devices, but it doesn't change anything).


There is a slight possibility that CAN-related code plays a part: if I strip out all my CAN code but leave the ChibiOS-level drivers enabled, the problem still occurs; if I disable the ChibiOS drivers, the problem goes away.

I've checked the DMA registers, and there's nothing to suggest that DMA is responsible (QSPI is the only active user of DMA).


(I also briefly tried GCC V5.4.1 and GCC V8.3.1: no crashes at the time, but I have changed things a bit since then.)

The tests above were done with STM32_WSPI_QUADSPI1_PRESCALER_VALUE set to 5 (43 MHz, I think).
I have also tried a few runs with prescaler values of 8 and 11; all of them failed in the same way.
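A quick check of the clock arithmetic, assuming HCLK = 216 MHz (the usual F767 maximum) and that the ChibiOS WSPI LLD programs CR.PRESCALER with (STM32_WSPI_QUADSPI1_PRESCALER_VALUE - 1), so the QUADSPI clock is HCLK divided by the configured value itself. Both assumptions should be checked against the actual mcuconf.h and LLD; qspi_clk_hz() is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed AHB clock feeding QUADSPI on the F767. */
#define HCLK_HZ 216000000u

/* QSPI clock for a given STM32_WSPI_QUADSPI1_PRESCALER_VALUE,
   assuming the LLD writes (value - 1) into CR.PRESCALER. */
static uint32_t qspi_clk_hz(uint32_t prescaler_value) {
  return HCLK_HZ / prescaler_value;
}
```

Under these assumptions a value of 5 gives 43.2 MHz, consistent with the 43 MHz estimate, while 8 and 11 give 27 MHz and about 19.6 MHz.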

The same problem occurs on two different sets of hardware (essentially an F767 Nucleo plugged into a carrier board which buffers up all the ports).

The file hal_wspi_lld_extract.c shows the relevant parts of the LLD, including my debug checks.
Attachments
hal_wspi_lld_extract.7z
(1.01 KiB)

Giovanni
Site Admin
Posts: 13086
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 759 times
Been thanked: 638 times

Re: QSPI register corruption

Postby Giovanni » Fri Jul 24, 2020 9:06 pm

I don't see how the RTOS could change the AR register. It is possible that the QSPI itself is clearing it after entering a strange state; I don't think the CPU is really writing it.

If you want to rule out the RTOS, you could try an RTOS-less test.

Giovanni

steved

Re: QSPI register corruption

Postby steved » Sat Jul 25, 2020 11:39 am

Giovanni wrote: I don't see how the RTOS can change the AR register, it is possible that is the QSPI itself clearing it after entering a strange state, the CPU is not really writing it I think.

I agree with the premise, especially as another scenario I have often seen involves the QSPI starting another transaction on its own, so that it's already busy when my code next tries to start a transfer. That is probably also what happens here.
However, as always with these strange things, I wonder why no one else appears to have seen this; people have definitely been using QSPI. I'm basically using standard ChibiOS in this area (apart from the added debug code). I am using a different flash chip, so a slightly different driver modelled on the one in ChibiOS, and that appears solid (not that much to change, just enough to be annoying).
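One way to catch the "already busy" case at the moment it happens is a guard just before each transfer start. This is a sketch only: the register mock, qspi_guard_start() and the counter are hypothetical, and the bit positions (CR.ABORT = bit 1, SR.BUSY = bit 5) are my reading of RM0410 and should be checked against stm32f7xx.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (verify against QUADSPI_CR_ABORT / QUADSPI_SR_BUSY). */
#define QSPI_CR_ABORT (1u << 1)
#define QSPI_SR_BUSY  (1u << 5)

/* Host-side stand-in for the QUADSPI control/status registers. */
typedef struct {
  volatile uint32_t CR;
  volatile uint32_t SR;
} qspi_ctl_t;

static uint32_t qspi_unexpected_busy;  /* times the peripheral was found busy */

/* Call immediately before starting a transfer. Returns true if it is safe
   to proceed; otherwise counts the event and requests an abort. */
static bool qspi_guard_start(qspi_ctl_t *q) {
  if ((q->SR & QSPI_SR_BUSY) == 0u) {
    return true;
  }
  qspi_unexpected_busy++;  /* breakpoint or trace point for diagnosis */
  q->CR |= QSPI_CR_ABORT;  /* on hardware, wait for ABORT to self-clear */
  return false;
}
```

Counting rather than asserting lets the system keep running, so the count can be compared against how often AR is found at zero.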

Giovanni

Re: QSPI register corruption

Postby Giovanni » Sat Jul 25, 2020 3:54 pm

QSPI is a pure master; I don't see how the flash chip type can affect its operation.

It could be something electrical in nature causing glitches somehow. Have you tried lowering the QSPI clock frequency?

Giovanni

steved

Re: QSPI register corruption

Postby steved » Sat Jul 25, 2020 4:16 pm

Giovanni wrote: QSPI is a pure master, I don't see how the flash chip type can affect its operations.
I agree.
Giovanni wrote: It could be something electrical in nature causing glitches somehow, have you tried lowering QSPI clock frequency?
I've tried several lower clock frequencies, two sets of hardware, and a bench PSU instead of the "normal" DC-DC converter.
I suppose it might not like being connected to my PC via USB (embedded ST-LINK) and serial, but without those it's difficult to see what's going on!

