Debugging unhandled exception.

Postby **rew** » Wed Mar 18, 2015 9:32 am

[update]
Examining more "p_state" variables pointed to a different thread as the culprit: I had a null pointer dereference in there. Oops. Now let's see if things keep running with that fixed.

[original:]

Hi,
my board is stopping:
Code: Select all

(gdb) where
#0  _unhandled_exception ()
    at ../chibios-git/os/ports/GCC/ARMCMx/STM32F0xx/vectors.c:152
#1  <signal handler called>
#2  0x55555554 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

How can I find the offending code?

As far as I understand things, the backtrace indicates that a return or task switch used an unused position on the stack as the return address. As long as I'm not doing something like "__asm__ ("...subtract something from SP")", I cannot trigger that from C code.
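A note on the 0x55555554 in frame #2: it looks like ChibiOS's stack fill pattern with the Thumb bit cleared. The sketch below is a host-side illustration of that check, plus a simple scan for untouched fill bytes. It assumes the fill byte is 0x55 (the value used when CH_DBG_FILL_THREADS is enabled, as far as I know); the function names are hypothetical and not part of ChibiOS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: fill byte written into unused stack space by the kernel
   when stack filling is enabled in the debug options. */
#define STACK_FILL_BYTE 0x55u

/* True if addr equals the 32-bit fill pattern, ignoring bit 0
   (bit 0 is the Thumb bit and may be cleared in a reported PC). */
static bool looks_like_stack_fill(uint32_t addr) {
    const uint32_t pattern = 0x01010101u * STACK_FILL_BYTE; /* 0x55555555 */
    return (addr | 1u) == (pattern | 1u);
}

/* Counts how many fill bytes are still intact, starting at the lowest
   stack address (e.g. p_stklimit) and walking upwards. A result of 0
   suggests the stack reached, or ran past, its limit. */
static size_t untouched_stack_bytes(const uint8_t *low, size_t size) {
    assert(low != NULL);
    size_t n = 0;
    while (n < size && low[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

If the reported PC matches the fill pattern, the CPU most likely loaded a return address from stack memory that was never written, which points at a corrupted or wrong stack pointer rather than at a specific C statement.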

So that would mean that a part of ChibiOS is using the wrong part of the stack. Right?

I can print the thread structure in the working area, but how do I figure out whether this thread is sleeping or active?

Code: Select all

(gdb)  print *(Thread *) waPingThread
$1 = {p_next = 0x20001500 <_idle_thread_wa>, p_prev = 0x20001a04 <rlist>, 
  p_prio = 64, p_ctx = {r13 = 0x20002604 <waPingThread+340>}, 
  p_newer = 0x20002eb8 <waLoggerThread>, p_older = 0x20003bb0 <waThread1>, 
  p_name = 0x800c6c8 "pinger", p_stklimit = 0x200024fc <waPingThread+76>, 
  p_state = 6 '\006', p_flags = 0 '\000', p_refs = 1 '\001', 
  p_preempt = 20 '\024', p_time = 0, p_u = {rdymsg = -1, exitcode = -1, 
    wtobjp = 0xffffffff, ewmask = 4294967295}, p_waiting = {
    p_next = 0x200024dc <waPingThread+44>}, p_msgqueue = {
    p_next = 0x200024e0 <waPingThread+48>, 
    p_prev = 0x200024e0 <waPingThread+48>}, p_msg = -1, p_epending = 0, 
  p_mtxlist = 0x0, p_realprio = 64, p_mpool = 0xffffffff}

Now the "pinger" is quite simple:

Code: Select all

  chRegSetThreadName("pinger");
  while (TRUE) {
    chThdSleepMilliseconds(15000);
    chMBPost (&log_mbox, MSG_PING, 0);
  }

So the chance of finding this thread active is very small.

Here is the thread structure of the thread that just reported on my serial port: "I'm going to do something".

Code: Select all

(gdb)  print *(Thread *) waLoggerThread
$2 = {p_next = 0x20001500 <_idle_thread_wa>, p_prev = 0x20001a04 <rlist>, 
  p_prio = 64, p_ctx = {r13 = 0x20003394 <waLoggerThread+1244>}, 
  p_newer = 0x20002258 <waMonitorThread>, p_older = 0x200024b0 <waPingThread>, 
  p_name = 0x800c8f8 "logger_thread", 
  p_stklimit = 0x20002f04 <waLoggerThread+76>, p_state = 6 '\006', 
  p_flags = 0 '\000', p_refs = 1 '\001', p_preempt = 20 '\024', p_time = 6872, 
  p_u = {rdymsg = -1, exitcode = -1, wtobjp = 0xffffffff, 
    ewmask = 4294967295}, p_waiting = {
    p_next = 0x20002ee4 <waLoggerThread+44>}, p_msgqueue = {
    p_next = 0x20002ee8 <waLoggerThread+48>, 
    p_prev = 0x20002ee8 <waLoggerThread+48>}, p_msg = -1, p_epending = 0, 
  p_mtxlist = 0x0, p_realprio = 64, p_mpool = 0xffffffff}

Some research shows that "p_state = 6" means the thread is sleeping. So what happened to cause this crash?
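To avoid looking up each p_state value by hand, a small lookup table can help. The sketch below lists the state names as I recall them from the THD_STATE_* definitions in ChibiOS 2.x; only "6 = sleeping" is confirmed in this thread, so please check the rest against chthreads.h in your tree. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: order of the THD_STATE_* values in ChibiOS 2.x.
   Verify against chthreads.h for the version you are using. */
static const char *const thd_state_names[] = {
    "READY", "CURRENT", "SUSPENDED", "WTSEM", "WTMTX",
    "WTCOND", "SLEEPING", "WTEXIT", "WTOREVT", "WTANDEVT",
    "SNDMSGQ", "SNDMSG", "WTMSG", "WTQUEUE", "FINAL"
};

/* Maps a p_state value, as printed by gdb, to a readable name. */
static const char *thd_state_name(uint8_t p_state) {
    const size_t count = sizeof thd_state_names / sizeof thd_state_names[0];
    return p_state < count ? thd_state_names[p_state] : "UNKNOWN";
}
```

With this table, both dumps above (p_state = 6) decode to SLEEPING, and a thread showing 1 would be the one that was running when the exception hit.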

Postby **Giovanni** » Wed Mar 18, 2015 10:21 am

Hi,

The debugger is not able to backtrace through exceptions because those are executed on a separate stack.

You have two options:
1) Inspect the NVIC registers; the info about the current exception is there (BTW, it would be nice to have an Eclipse plugin doing this).
2) Define your own handler functions; all symbols are weak, so you can provide a function for each vector.

Giovanni
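As an illustration of option 1: the number of the active exception sits in the VECTACTIVE field of SCB->ICSR (address 0xE000ED04) and in the low bits of xPSR, so from gdb something like "x/wx 0xE000ED04" should show it. The host-side sketch below decodes such a value; the function names are hypothetical, and the sample value in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* VECTACTIVE occupies bits [8:0] of SCB->ICSR on Cortex-M3/M4
   (Cortex-M0 implements fewer bits of the same field). */
#define ICSR_VECTACTIVE_MASK 0x1FFu

/* Extracts the active exception number from a raw ICSR value. */
static uint32_t active_exception(uint32_t icsr) {
    return icsr & ICSR_VECTACTIVE_MASK;
}

/* Names for the architecturally defined exception numbers.
   4, 5, 6 and 12 do not exist on Cortex-M0. */
static const char *exception_name(uint32_t exc) {
    switch (exc) {
    case 0:  return "Thread mode";
    case 1:  return "Reset";
    case 2:  return "NMI";
    case 3:  return "HardFault";
    case 4:  return "MemManage";
    case 5:  return "BusFault";
    case 6:  return "UsageFault";
    case 11: return "SVCall";
    case 12: return "DebugMonitor";
    case 14: return "PendSV";
    case 15: return "SysTick";
    default: return exc >= 16u ? "external IRQ" : "reserved";
    }
}

/* External interrupts start at exception 16; returns -1 otherwise. */
static int32_t external_irq_number(uint32_t exc) {
    return exc >= 16u ? (int32_t)(exc - 16u) : -1;
}
```

For example, a raw ICSR value of 0x00000803 decodes to exception 3 (HardFault), while exception 43 would be external IRQ 27, which can then be matched against the vector table of the specific MCU.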

Postby **rew** » Wed Mar 18, 2015 10:42 am

Question:
Can my threads dereference NULL? Would that throw an exception?

Postby **Giovanni** » Wed Mar 18, 2015 11:18 am

Not necessarily; location zero is mapped to Flash and is accessible.

Giovanni

Postby **rew** » Wed Mar 18, 2015 11:40 am

So then things crash when I access an unimplemented module in IO space, right?

I wrote an "unimplemented instruction" handler for a PDP11-03 to handle the MUL and DIV instructions that the compiler for the PDP11-60 issued. (We had to use the '60 to compile things for the '03.) There you'd have enough information on the stack to determine what went wrong and where. Is there a way to manually dump the stack to figure this out?

Postby **Giovanni** » Wed Mar 18, 2015 12:02 pm

The ch.rlist.r_current pointer points to the current thread; the PSP register points to the thread's current stack frame. You can inspect the port_extctx structure there (it contains the program counter etc.).

This reinforces my idea that a Cortex-specific debug plugin would make a lot of sense; all of this is not strictly related to ChibiOS but to the CPU architecture.

Giovanni
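To make the stack frame at PSP concrete: on exception entry the core pushes eight words in a fixed order (r0, r1, r2, r3, r12, lr, pc, xpsr), which is the layout port_extctx describes. The sketch below is a host-side mirror of that layout, not the ChibiOS definition itself; the type and function names are hypothetical, and the sample words in the usage note are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side mirror of the hardware-stacked frame.
   Field order follows the Cortex-M exception entry sequence. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr_thd; /* LR of the interrupted code */
    uint32_t pc;     /* address of the interrupted instruction */
    uint32_t xpsr;
} stacked_frame_t;

/* Copies the eight stacked words starting at the given stack address
   (the value of PSP when the fault hit thread code). */
static stacked_frame_t read_stacked_frame(const uint32_t *psp) {
    assert(psp != NULL);
    stacked_frame_t frame;
    memcpy(&frame, psp, sizeof frame);
    return frame;
}
```

In gdb the same eight words can usually be dumped with "x/8wx $psp" (assuming the debug probe exposes psp as a register); the seventh word is the PC to look up with "info symbol" or "list *<address>".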

Postby **steved** » Wed Mar 18, 2015 2:11 pm

Giovanni wrote: This reinforces my idea that a Cortex-specific debug plugin would make a lot of sense; all of this is not strictly related to ChibiOS but to the CPU architecture.
Giovanni

+1
Much easier than trying to follow a trail through several processor-specific files etc.

Postby **ulikoehler** » Sat Mar 28, 2015 9:40 pm

Giovanni, I'm not sure what exactly you mean by plugin - do you mean an Eclipse plugin or a ChibiOS module?

Here's my suggestion for improving the out-of-the-box debuggability of *faults in ChibiOS. I do not think it is possible to integrate it into 3.0 so close to the release.

I defined this macro somewhere in my code (public domain):

Code: Select all

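/**
 * Executes the BKPT instruction that causes the debugger to stop.
 * Caution: with no debugger attached, BKPT is not reliably ignored;
 * on Cortex-M it can escalate to a HardFault (or lock up the core
 * if executed inside a fault handler).
 */
#define bkpt() __asm volatile("BKPT #0\n")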

So far I used code like this in my hardfault handler: http://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html
However, your tip about port_extctx is actually quite nice and removes ugly ASM code I don't fully understand.

I'd like to see this behaviour:
- Add some volatile variables to local scope so that gdb "i loc" shows me PC and LR (and maybe some other registers). This should be possible using port_extctx.
- GDB breaks, but the breakpoint is ignored if no debugger is attached (i.e. bkpt(), see above).
- Configurable: infinite loop, NVIC_SystemReset() (what I currently use) or something custom.
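The decision logic in this wish list can be sketched as below. To my understanding, BKPT is only harmless when a debugger has enabled halting debug, which on Cortex-M3/M4 is visible to software as the C_DEBUGEN bit (bit 0) of CoreDebug->DHCSR; on Cortex-M0 that register is not readable by the core, so the check would not apply there. The enum and function names are hypothetical, and the register value is passed in as a plain integer so the logic can be tried on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of CoreDebug->DHCSR: set while halting debug is enabled. */
#define DHCSR_C_DEBUGEN (1u << 0)

/* What the application wants to happen after a fault (hypothetical). */
typedef enum { POLICY_LOOP, POLICY_RESET, POLICY_CUSTOM } fault_policy_t;

/* What the fault handler should do first (hypothetical). */
typedef enum {
    ACTION_BREAK,  /* execute BKPT so gdb stops in the handler */
    ACTION_LOOP,   /* spin forever */
    ACTION_RESET,  /* e.g. NVIC_SystemReset() */
    ACTION_CUSTOM  /* call an application hook */
} fault_action_t;

static bool debugger_attached(uint32_t dhcsr) {
    return (dhcsr & DHCSR_C_DEBUGEN) != 0u;
}

/* Break only when a debugger is attached; otherwise follow the policy. */
static fault_action_t first_fault_action(uint32_t dhcsr, fault_policy_t policy) {
    if (debugger_attached(dhcsr)) {
        return ACTION_BREAK;
    }
    switch (policy) {
    case POLICY_RESET:  return ACTION_RESET;
    case POLICY_CUSTOM: return ACTION_CUSTOM;
    case POLICY_LOOP:
    default:            return ACTION_LOOP;
    }
}
```

On target, the argument would be CoreDebug->DHCSR read inside the handler, and the check could be as short as "if (CoreDebug->DHCSR & CoreDebug_DHCSR_C_DEBUGEN_Msk) bkpt();" using the CMSIS names.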

I implemented that in the hardfault handler, but it would probably be best to have a Cortex-M-specific exceptionvectors.c (what I call it) that has separate handlers for all (or at least the most common) exceptions.
At least for me, the type of exception is quite important (e.g. HardFault vs. BusFault which I encountered when using DMA on CCM memory) in some cases.

If someone has a use for it, I'll contribute my exceptionvectors.c as public domain when I get around to testing the port_extctx solution.

I also described a method of stopping the debugger on chDbgAssert() failures here: http://techoverflow.net/blog/2014/09/28/enforcing-debugger-breakpoints-in-chibios-chdbgassert-/ (not updated for 3.0 yet)

best regards

Postby **ulikoehler** » Sun Mar 29, 2015 4:14 am

Actually got a real hardfault tonight, so I've rewritten my exception vectors with the port_extctx. Now HardFault, BusFault & MemManageFault attempt to provide the maximum amount of information possible with a single GDB "i loc". They do not use assembler besides bkpt(). I only tested with hard faults, not with UsageFaults and MemManageFaults.

Test code:

Code: Select all

volatile int x = *((int*)0x13371337);

GDB session example:

Code: Select all

(gdb) c
Continuing.
During symbol reading, incomplete CFI data; unspecified registers (e.g., r0) at 0x8003232.

Program received signal SIGTRAP, Trace/breakpoint trap.
warning: Source file is more recent than executable.
HardFault_Handler () at /home/uli/dev/MOM/MOMFirmware/src/exceptionvectors.c:65
(gdb) i loc
ctx = {
  r0 = 0x10002d28 <wsSendMailbox.lto_priv.145+16>, 
  r1 = 0x0, 
  r2 = 0x0, 
  r3 = 0x13371337, 
  r12 = 0x200041e8 <ram_heap+220>, 
  lr_thd = 0x8028493 <chSemObjectInit+34>, 
  pc = 0x8007640 <testHardFault+64>, 
  xpsr = 0x61000000
}
faultType = HardFault
faultAddress = 0x13371337
isFaultPrecise = 0x1
isFaultImprecise = 0x0
isFaultOnUnstacking = 0x0
isFaultOnStacking = 0x0
isFaultAddressValid = 0x1

Here's the code from exceptionvectors.c:

Code: Select all

/*
 * If a serious error occurs, one of the fault
 * exception vectors in this file will be called.
 *
 * This file attempts to aid the unfortunate debugger
 * to blame someone for the crashing code
 *
 *  Created on: 12.06.2013
 *      Author: uli
 *
 * Released under the CC0 1.0 Universal (public domain)
 */
#include <stdint.h>
#include <ch.h>
#include <string.h>

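/**
 * Executes the BKPT instruction that causes the debugger to stop.
 * Caution: with no debugger attached, BKPT is not reliably ignored;
 * on Cortex-M it can escalate to a HardFault (or lock up the core
 * if executed inside a fault handler).
 */
#define bkpt() __asm volatile("BKPT #0\n")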

void NMI_Handler(void) {
    //TODO
    while(1);
}

//See http://infocenter.arm.com/help/topic/com.arm.doc.dui0552a/BABBGBEC.html
typedef enum  {
    Reset = 1,
    NMI = 2,
    HardFault = 3,
    MemManage = 4,
    BusFault = 5,
    UsageFault = 6,
} FaultType;

void HardFault_Handler(void) {
    //Copy to local variables (not pointers) to allow GDB "i loc" to directly show the info
    //Get thread context. Contains main registers including PC and LR
    struct port_extctx ctx;
    memcpy(&ctx, (void*)__get_PSP(), sizeof(struct port_extctx));
    (void)ctx;
    //Interrupt status register: Which interrupt have we encountered, e.g. HardFault?
    FaultType faultType = (FaultType)__get_IPSR();
    (void)faultType;
    //For HardFault/BusFault this is the address that was accessed causing the error
    //(only meaningful if isFaultAddressValid below is set)
    uint32_t faultAddress = SCB->BFAR;
    (void)faultAddress;
    //Flags about hardfault / busfault
    //See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/Cihdjcfc.html for reference
    bool isFaultPrecise = ((SCB->CFSR >> SCB_CFSR_BUSFAULTSR_Pos) & (1 << 1) ? true : false);
    bool isFaultImprecise = ((SCB->CFSR >> SCB_CFSR_BUSFAULTSR_Pos) & (1 << 2) ? true : false);
    bool isFaultOnUnstacking = ((SCB->CFSR >> SCB_CFSR_BUSFAULTSR_Pos) & (1 << 3) ? true : false);
    bool isFaultOnStacking = ((SCB->CFSR >> SCB_CFSR_BUSFAULTSR_Pos) & (1 << 4) ? true : false);
    bool isFaultAddressValid = ((SCB->CFSR >> SCB_CFSR_BUSFAULTSR_Pos) & (1 << 7) ? true : false);
    (void)isFaultPrecise;
    (void)isFaultImprecise;
    (void)isFaultOnUnstacking;
    (void)isFaultOnStacking;
    (void)isFaultAddressValid;
    //Cause debugger to stop. Ignored if no debugger is attached
    bkpt();
    NVIC_SystemReset();
}

void BusFault_Handler(void) __attribute__((alias("HardFault_Handler")));

void UsageFault_Handler(void) {
    //Copy to local variables (not pointers) to allow GDB "i loc" to directly show the info
    //Get thread context. Contains main registers including PC and LR
    struct port_extctx ctx;
    memcpy(&ctx, (void*)__get_PSP(), sizeof(struct port_extctx));
    (void)ctx;
    //Interrupt status register: Which interrupt have we encountered, e.g. HardFault?
    FaultType faultType = (FaultType)__get_IPSR();
    (void)faultType;
    //Flags about usage faults
    //See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/Cihdjcfc.html for reference
    bool isUndefinedInstructionFault = ((SCB->CFSR >> SCB_CFSR_USGFAULTSR_Pos) & (1 << 0) ? true : false);
    bool isEPSRUsageFault = ((SCB->CFSR >> SCB_CFSR_USGFAULTSR_Pos) & (1 << 1) ? true : false);
    bool isInvalidPCFault = ((SCB->CFSR >> SCB_CFSR_USGFAULTSR_Pos) & (1 << 2) ? true : false);
    bool isNoCoprocessorFault = ((SCB->CFSR >> SCB_CFSR_USGFAULTSR_Pos) & (1 << 3) ? true : false);
    bool isUnalignedAccessFault = ((SCB->CFSR >> SCB_CFSR_USGFAULTSR_Pos) & (1 << 8) ? true : false);
    bool isDivideByZeroFault = ((SCB->CFSR >> SCB_CFSR_USGFAULTSR_Pos) & (1 << 9) ? true : false);
    (void)isUndefinedInstructionFault;
    (void)isEPSRUsageFault;
    (void)isInvalidPCFault;
    (void)isNoCoprocessorFault;
    (void)isUnalignedAccessFault;
    (void)isDivideByZeroFault;
    bkpt();
    NVIC_SystemReset();
}

void MemManage_Handler(void) {
    //Copy to local variables (not pointers) to allow GDB "i loc" to directly show the info
    //Get thread context. Contains main registers including PC and LR
    struct port_extctx ctx;
    memcpy(&ctx, (void*)__get_PSP(), sizeof(struct port_extctx));
    (void)ctx;
    //Interrupt status register: Which interrupt have we encountered, e.g. HardFault?
    FaultType faultType = (FaultType)__get_IPSR();
    (void)faultType;
    //For MemManage faults this is the address that was accessed causing the error
    //(only meaningful if isFaultAddressValid below is set)
    uint32_t faultAddress = SCB->MMFAR;
    (void)faultAddress;
    //Flags about MemManage faults
    //See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/Cihdjcfc.html for reference
    bool isInstructionAccessViolation = ((SCB->CFSR >> SCB_CFSR_MEMFAULTSR_Pos) & (1 << 0) ? true : false);
    bool isDataAccessViolation = ((SCB->CFSR >> SCB_CFSR_MEMFAULTSR_Pos) & (1 << 1) ? true : false);
    bool isExceptionUnstackingFault = ((SCB->CFSR >> SCB_CFSR_MEMFAULTSR_Pos) & (1 << 3) ? true : false);
    bool isExceptionStackingFault = ((SCB->CFSR >> SCB_CFSR_MEMFAULTSR_Pos) & (1 << 4) ? true : false);
    bool isFaultAddressValid = ((SCB->CFSR >> SCB_CFSR_MEMFAULTSR_Pos) & (1 << 7) ? true : false);
    (void)isInstructionAccessViolation;
    (void)isDataAccessViolation;
    (void)isExceptionUnstackingFault;
    (void)isExceptionStackingFault;
    (void)isFaultAddressValid;
    bkpt();
    NVIC_SystemReset();
}

Best regards
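The three handlers above repeat the same shift-and-mask on SCB->CFSR. One possible way to factor that out is to split the register into its three status fields once and test named bits afterwards. The sketch below runs on a host with a plain integer standing in for SCB->CFSR; the type, enum and function names are hypothetical, while the bit positions are the ones used in the handlers above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The three status fields packed into the 32-bit CFSR. */
typedef struct {
    uint8_t  mmfsr; /* bits [7:0]:   MemManage fault status */
    uint8_t  bfsr;  /* bits [15:8]:  BusFault status */
    uint16_t ufsr;  /* bits [31:16]: UsageFault status */
} cfsr_fields_t;

/* Bit positions within each field. */
enum { MMFSR_IACCVIOL = 0, MMFSR_DACCVIOL = 1, MMFSR_MUNSTKERR = 3,
       MMFSR_MSTKERR = 4, MMFSR_MMARVALID = 7 };
enum { BFSR_PRECISERR = 1, BFSR_IMPRECISERR = 2, BFSR_UNSTKERR = 3,
       BFSR_STKERR = 4, BFSR_BFARVALID = 7 };
enum { UFSR_UNDEFINSTR = 0, UFSR_INVSTATE = 1, UFSR_INVPC = 2,
       UFSR_NOCP = 3, UFSR_UNALIGNED = 8, UFSR_DIVBYZERO = 9 };

/* Splits a raw CFSR value into its three fields. */
static cfsr_fields_t split_cfsr(uint32_t cfsr) {
    cfsr_fields_t f;
    f.mmfsr = (uint8_t)(cfsr & 0xFFu);
    f.bfsr  = (uint8_t)((cfsr >> 8) & 0xFFu);
    f.ufsr  = (uint16_t)((cfsr >> 16) & 0xFFFFu);
    return f;
}

/* True if the given bit of a status field is set. */
static bool flag_set(uint32_t field, unsigned bit) {
    return ((field >> bit) & 1u) != 0u;
}
```

As a usage example, a made-up CFSR value of 0x00008200 splits into bfsr = 0x82, i.e. a precise bus fault with a valid BFAR, which is the same combination of flags the GDB session above reports.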

Postby **gnif** » Wed Aug 24, 2016 3:21 pm

BRILLIANT! Thank you very much! This should be in ChibiOS by default.
