Threads are dying, I cannot find the issue :(

ChibiOS public support forum for topics related to the STMicroelectronics STM32 family of micro-controllers.

Moderators: RoccoMarco, barthess

russian
Posts: 364
Joined: Mon Oct 29, 2012 3:17 am
Location: Jersey City, USA
Has thanked: 16 times
Been thanked: 14 times


Postby russian » Fri Feb 27, 2015 4:59 am

Sorry to bother you with a boring issue that I have been fighting for the last five days. I've created a stress test for my application that reliably kills it in about 20 minutes, and I cannot figure out why.

Symptoms: all of my "while(true) {doJob(); chThdSleepMilliseconds(X);}" threads are getting lost. My only explicit VirtualTimer appears to be OK, and my interrupts appear to be processed fine.

Some details:

Code:

__main_stack_size__     = 0x1000;
__process_stack_size__  = 0x0600;

#define PORT_IDLE_THREAD_STACK_SIZE     1024

#define PORT_INT_REQUIRED_STACK         32

#define CH_DBG_ENABLE_STACK_CHECK       TRUE
#define CH_DBG_ENABLE_CHECKS            TRUE
#define CH_DBG_ENABLE_ASSERTS           TRUE
#define CH_DBG_SYSTEM_STATE_CHECK       TRUE


I know that stack overflow is the main suspect, so I have a LOT of assert(getRemainingStack() > XX); statements all over my code, and none of them trigger. I also know that the main stack (total size 4 KB) never uses even 1 KB.
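A side note on the stack assertions: a check like assert(getRemainingStack() > XX) only sees the stack depth at the moment it runs, so a deeper excursion between two checks can go unnoticed. A common complement is to paint the working area with a fill byte and later count how much of it was never overwritten. Below is a minimal, host-runnable sketch of that idea. The names stack_paint and stack_never_used and the 0x55 fill value are hypothetical, chosen for illustration and not ChibiOS API, and the sketch assumes a descending stack as on Cortex-M. ChibiOS 2.x also has a CH_DBG_FILL_THREADS option in chconf.h which, as far as I know, fills thread working areas in a similar way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fill byte; any value unlikely to dominate real stack data works. */
#define STACK_FILL 0x55u

/* Paint a working area before the thread that owns it starts running. */
static void stack_paint(uint8_t *area, size_t size) {
    assert(area != NULL);
    memset(area, STACK_FILL, size);
}

/* Count the bytes at the low end of the area that still hold the fill value.
 * Assumes a descending stack, so the low addresses are the last to be used.
 * A result of 0 means the stack reached the bottom of the area (or went
 * past it) at some point since painting. */
static size_t stack_never_used(const uint8_t *area, size_t size) {
    size_t n = 0;
    while (n < size && area[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

The result is only an estimate: if the thread happens to store the fill value itself, the count can come out slightly too high. Its advantage over point-in-time assertions is that it reports the deepest use since the area was painted, whenever it is read.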

Here's the typical state of chSysTimerHandlerI while things are OK, note all my threads in the list:
Image

And then in 20 minutes I get
Image

I am currently using ChibiOS 2.6.7; I believe I had the same issue with 2.6.6.

If this were a stack overflow, I would expect a more randomly damaged vtlist, but I am seeing a valid list; it just does not contain my sleeping threads.

Could it be anything but a stack overflow? I really want to catch the issue so that I can fix it knowingly. I have so many stack assertions, which have saved me before, that I believe I would have caught a stack overflow by now.

What kind of additional state validation or troubleshooting technique can I try?

Giovanni
Site Admin
Posts: 14457
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 1076 times
Been thanked: 922 times

Re: Threads are dying, I cannot find the issue :(

Postby Giovanni » Fri Feb 27, 2015 9:04 am

Hi,

You could try the Eclipse debug plugin to inspect the state of the threads and the trace buffer.

Alternatively, prepare a minimal application that triggers the problem.

Giovanni


Postby russian » Fri Feb 27, 2015 2:24 pm

http://www.chibios.org/dokuwiki/doku.ph ... bug_plugin says:
"Starting from versions 2.2.7 stable and 2.3.3 unstable the ChibiOS/RT distribution includes a Debug Plugin for eclipse enhancing it with RTOS awareness."

but I only see the plugin inside ChibiOS_2.6.0.zip. I have installed that version, but it does not show anything :(

I have just added

Code:

void assertVtList(void) {
   /* do nothing until main_loop_started is set */
   if (!main_loop_started)
      return;
   VirtualTimer *first = vtlist.vt_next;
   VirtualTimer *cur = first->vt_next;
   int c = 0;
   /* walk the circular list, giving up after 20 steps */
   while (c++ < 20 && cur != first) {
      cur = cur->vt_next;
   }
   /* a very short walk means the sleeping threads' timers are gone */
   efiAssertVoid(c > 3, "VT list?");
}

into chSysTimerHandlerI. I believe it gives me the exact moment when the threads disappear; here is the trace:
Image

I will now prepare a package which reproduces the problem.
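The check above only counts nodes. A slightly stricter variant could also verify that every forward link has a matching back link, which tends to flag a half-finished insert or remove rather than just a short list. Here is a minimal sketch of that idea. It uses a hypothetical node_t in place of the real VirtualTimer structure; only the next/prev ring layout is taken from the post, and list_check and max_nodes are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a circular, doubly linked timer list. */
typedef struct node {
    struct node *next;
    struct node *prev;
} node_t;

/* Walk the ring starting at the list header.
 * Returns the number of entries (header excluded), or -1 if a link is
 * NULL, a back link does not match its forward link, or the walk does
 * not return to the header within max_nodes entries. */
static int list_check(const node_t *head, int max_nodes) {
    assert(head != NULL);
    const node_t *cur = head;
    int count = 0;
    do {
        if (cur->next == NULL || cur->next->prev != cur) {
            return -1;
        }
        cur = cur->next;
        if (cur != head && ++count > max_nodes) {
            return -1;
        }
    } while (cur != head);
    return count;
}
```

On real hardware a wild pointer in the list could still fault before the check returns, so this kind of walk is a tripwire for inconsistent links, not a full validator.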


Postby Giovanni » Fri Feb 27, 2015 2:38 pm

Hi,

The plugin is part of ChibiStudio now.

Giovanni


Postby russian » Fri Feb 27, 2015 4:17 pm

SVN: https://svn.code.sf.net/p/rusefi/code/branches/20150227_fatal_issue/
Same stuff as one zip: https://svn.code.sf.net/p/rusefi/code/branches/20150227_fatal_issue.zip

It contains the firmware, a Makefile, and Eclipse and IAR projects. By the way, I still need to check whether I can reproduce this issue with IAR.

Once the firmware starts on the STM32F4, it opens a virtual serial port over USB. The bundle also contains a Java testing utility, rusefi_console.jar.

Code:

java -cp rusefi_console.jar com.rusefi.EnduranceTest COM41

where COM41 is the serial port name.

The blue LED blinks to show that the code is alive; once it stops blinking, the code is dead :( The red LED means a fatal error; it now turns on because of assertVtList in chSysTimerHandlerI.

This time it took 62 minutes and 456 cycles of test to get to the error :(

Fri Feb 27 09:02:14 EST 2015<EOT>: Starting COM41
Fri Feb 27 09:02:14 EST 2015<EOT>: SerialConnector: connecting
Fri Feb 27 09:02:14 EST 2015<EOT>: Sending command [set_engine_type 3]
Fri Feb 27 09:02:14 EST 2015<EOT>: postMessage CommandQueue: SerialIO started
Fri Feb 27 09:02:14 EST 2015<EOT>: Opening COM41 @ 115200
Fri Feb 27 09:02:27 EST 2015<EOT>: Starting COM41
Fri Feb 27 09:02:27 EST 2015<EOT>: SerialConnector: connecting
...
...
...
...
...

Fri Feb 27 10:06:41 EST 2015<EOT>: postMessage EngineState: setting fan No
Fri Feb 27 10:06:41 EST 2015<EOT>: postMessage EngineState: setting pump No
Fri Feb 27 10:06:41 EST 2015<EOT>: postMessage EngineState: setting fan No
Fri Feb 27 10:06:41 EST 2015<EOT>: ++++++++++++++++++++++++++++++++++++ 456 +++++++++++++++
Fri Feb 27 10:06:41 EST 2015<EOT>: Sending command [set_engine_type 3]
Fri Feb 27 10:06:41 EST 2015<EOT>: Sending [sec!17!set_engine_type 3]
Fri Feb 27 10:06:41 EST 2015<EOT>: postMessage PortHolder: Sending [sec!17!set_engine_type 3]
Fri Feb 27 10:06:41 EST 2015<EOT>: EngineState: unexpected header: sec!17!set_engine_type 3 while looking for line:
Fri Feb 27 10:06:42 EST 2015<EOT>: msg,setting pump No,msg,setting fan No,msg,setting pump No,msg,setting fan No,msg,setting pump No,msg,setting fan No,msg,confirmation_set_engine_type 3:17,msg,applyNonPersistentConfiguration(),msg,initializeTriggerShape(),msg, !!!!!!!!!!!!!!!!!!!! BE SURE NOT WRITE WITH IGNITION ON !!!!!!!!!!!!!!!!!!!!,msg,flash compatible with 6667,msg,Reseting flash: size=15172,msg,Flashing with CRC=208,msg,Flash programmed in (ms): 65,msg,Flashing result: 0,msg,Template Aspire/3 trigger TT_FORD_ASPIRE/LM_PLAIN_MAF,msg,configurationVersion=928,msg,RPM bin: 800.00 1213.32 1626.65 2040.00 2453.32 2866.65 3280.00 3693.32 4106.65 4520.00 4933.33 5346.65 5760.00 6173.33 6586.65 7000.00 ,msg,Y bin: 1.19 1.40 1.62 1.83 2.04 2.25 2.48 2.69 2.89 3.11 3.32 3.53 3.75 3.97 4.17 4.40 ,msg,CLT: 1.50 1.50 1.41 1.36 1.27 1.19 1.12 1.10 1.05 1.05 1.02 1.00 1.00 1.00 1.00 1.00 ,msg,CLT bins: -40.00 -30.00 -20.00 -10.00 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 80.00 90.00 100.00 110.00 ,msg,IAT: 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 ,msg,IAT bins: -40.00 -30.00 -20.00 -10.00 0.00 10.00 20.00 30.00 40.00 50.00 60.00 70.00 80.00 90.00 100.00 110.00 ,msg,vBatt: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ,


This is where I am stuck. I am fairly sure it could be my own bug, but I need advice on how to catch it as it happens, in case I am corrupting a ChibiOS memory region.
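One generic way to narrow down a suspected memory corruption is to fence suspicious buffers with guard words and check them from a frequently executed place such as the tick handler. The sketch below is only an illustration of that technique under my own assumptions: guarded_buf_t, guarded_init, guarded_ok and the magic value are hypothetical and not part of ChibiOS or of the firmware in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary marker value, chosen only because it is unlikely to appear by accident. */
#define GUARD_MAGIC 0xDEADBEEFu

/* Hypothetical wrapper: a buffer suspected of being overrun, fenced by guards. */
typedef struct {
    uint32_t guard_lo;   /* overwritten by writes that run below data[] */
    uint8_t  data[32];
    uint32_t guard_hi;   /* overwritten by writes that run past data[]  */
} guarded_buf_t;

/* Arm both guards and clear the payload. */
static void guarded_init(guarded_buf_t *b) {
    assert(b != NULL);
    b->guard_lo = GUARD_MAGIC;
    memset(b->data, 0, sizeof b->data);
    b->guard_hi = GUARD_MAGIC;
}

/* True while both guards are intact; meant to be cheap enough for periodic checks. */
static bool guarded_ok(const guarded_buf_t *b) {
    return b->guard_lo == GUARD_MAGIC && b->guard_hi == GUARD_MAGIC;
}
```

Checking guarded_ok() periodically does not identify the offending write, but it bounds the time window in which the damage happened, which can then be matched against the trace.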


Postby Giovanni » Fri Feb 27, 2015 4:35 pm

Hi,

The images you posted are not a list of threads but a list of virtual timers. What do you mean by "threads dying"? In ChibiOS, threads are static and cannot disappear.

Giovanni


Postby russian » Fri Feb 27, 2015 5:49 pm

I have one virtual timer which I control explicitly:

Code:

   chVTSetAny(&periodicTimer, period * TICKS_IN_MS, (vtfunc_t) &periodicCallback, engine);


and I have about 15 threads, each of which follows the same pattern:

Code:

static void blinkingThread(void *arg) {
   while (true) {
      int delay = isConsoleReady() ? 3 * blinkingPeriod : blinkingPeriod;
      chThdSleepMilliseconds(delay);
   }
}

In my understanding, chThdSleepMilliseconds is implemented via a "VirtualTimer vt;" allocated on the stack of the sleeping thread. With 15 threads like that, I expect to always see a long list of virtual timers in the vtlist. That holds for about an hour, and then suddenly my explicit periodicTimer is still there, while the implicit "VirtualTimer vt;" entries for all my utility threads are gone from the vtlist.
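To make that mental model concrete, here is a simplified, host-runnable model of a delta-ordered timer list, where each entry stores the number of ticks after the previous entry. It is a sketch under assumptions and not the ChibiOS source: vt_t, vt_list_init, vt_insert and vt_remove are hypothetical names. In this picture a sleeping thread would declare one vt_t on its own stack, insert it, and block until the entry expires, so an entry that gets unlinked by mistake leaves its thread asleep forever.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical timer entry in a circular, doubly linked delta list. */
typedef struct vt {
    struct vt *next;
    struct vt *prev;
    unsigned   delta;   /* ticks after the previous entry expires */
} vt_t;

/* Empty list: the header points at itself and acts as a sentinel. */
static void vt_list_init(vt_t *head) {
    head->next = head->prev = head;
    head->delta = UINT_MAX;   /* larger than any real delay, ends the search */
}

/* Insert t so that it expires 'delay' ticks from now. */
static void vt_insert(vt_t *head, vt_t *t, unsigned delay) {
    assert(delay < UINT_MAX);
    vt_t *p = head->next;
    while (p->delta < delay) {   /* skip entries that expire earlier */
        delay -= p->delta;
        p = p->next;
    }
    t->next = p;
    t->prev = p->prev;
    t->prev->next = t;
    p->prev = t;
    t->delta = delay;
    if (p != head) {
        p->delta -= delay;       /* keep the follower's absolute time */
    }
}

/* Unlink t, handing its remaining delta to the entry that follows it. */
static void vt_remove(vt_t *head, vt_t *t) {
    if (t->next != head) {
        t->next->delta += t->delta;
    }
    t->prev->next = t->next;
    t->next->prev = t->prev;
}
```

Because the entries live on thread stacks, a stack overflow or a stray write in one thread can damage a neighbour's list links, and an insert or remove that races with the tick handler can drop several entries at once while leaving a list that still looks valid.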

I will now try ChibiStudio. Should I create a ticket for the wiki update? It looks like the current page at http://www.chibios.org/dokuwiki/doku.php?id=chibios:guides:debug_guide#debug_plugin is not up to date.


Postby Giovanni » Fri Feb 27, 2015 6:15 pm

Always about one hour or is it random?

Giovanni


Postby russian » Fri Feb 27, 2015 6:34 pm

Giovanni wrote:Always about one hour or is it random?

Random. Sometimes it's 15 minutes, sometimes I get a good 3-hour run. I am trying to isolate the issue to a particular layer of my code: I can conditionally compile layers of functionality in or out, the same idea as in halconf.h.
With most of my functional modules off, I have a copy that has been running for 13 hours and counting.
Last edited by russian on Fri Feb 27, 2015 6:59 pm, edited 1 time in total.


Postby Giovanni » Fri Feb 27, 2015 6:46 pm

What is chVTSetAny() ?

Giovanni

