I am using the STM32F746G-Discovery board with the latest ChibiOS. The default linker script (STM32F746xG.ld) uses the ram3 section for BSS_RAM, and whenever I switch to the STM32F746xG_MAX.ld linker script (which places BSS_RAM into ram0) my threads start crashing or malfunctioning.
I managed to replicate this issue in one of your demo projects (RT-STM32-LWIP-FATFS-USB): just switch the linker script to STM32F746xG_MAX.ld in /make/stm32f746_discovery.make and FATFS won't mount!
I am running very low on memory, and the 64k of RAM in ram3 isn't enough to run all my threads. I need to switch BSS_RAM to ram0, but whenever I do, I run into many problems.
I would appreciate any help on this one. I am aware that by default you use ram3 for all threads to eliminate cache coherence issues.
threads crashes with STM32F746 when using STM32F746xG_MAX.ld link script Topic is solved
- Abusous2000
- Posts: 15
- Joined: Fri Jul 05, 2019 1:26 am
- Has thanked: 7 times
- Been thanked: 3 times
- Giovanni
- Site Admin
- Posts: 14444
- Joined: Wed May 27, 2009 8:48 am
- Location: Salerno, Italy
- Has thanked: 1074 times
- Been thanked: 921 times
- Contact:
Re: threads crashes with STM32F746 when using STM32F746xG_MAX.ld link script
Hi,
The default script puts the BSS into uncached memory. If you put it in other RAM areas, then you need to handle cache consistency yourself for all drivers using DMA.
Giovanni
- Abusous2000
Re: threads crashes with STM32F746 when using STM32F746xG_MAX.ld link script
Thanks for the quick reply, Giovanni. YES, I am aware of the DMA and cache coherence issues with STM32F7xx boards.
However, how can I make the demo project RT-STM32-LWIP-FATFS-USB work with the STM32F746xG_MAX.ld linker script?
When I use STM32F746xG_MAX.ld, FATFS won't even mount. But when I switch back to the default linker script (STM32F746xG.ld), it works right away.
Any suggestions?
Many thx in advance
- Giovanni
Re: threads crashes with STM32F746 when using STM32F746xG_MAX.ld link script
Hi,
It is not designed to work with that linker script; it relies on BSS being non-cacheable.
Giovanni
- Abusous2000
Re: threads crashes with STM32F746 when using STM32F746xG_MAX.ld link script
I hear you, Giovanni. I have an application that uses FATFS, but there are several other threads all crammed into the ram3 section, which is limited to only 64k. This renders the board much less useful. On the other hand, with the STM32F769I, I have more leeway.
If you have any idea how to make it work, I would appreciate it.
Thx
- Giovanni
Re: threads crashes with STM32F746 when using STM32F746xG_MAX.ld link script
You need to make a custom script and make sure your DMA buffers go in TCM; the rest can go in the other RAMs.
Giovanni
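To illustrate the suggestion above, here is a rough sketch of pinning a DMA buffer to TCM from the C side. It assumes the custom script keeps a ".ram3" input section mapped to the ram3 (DTCM) region, as the stock scripts appear to do; the macro and buffer names are made up for the example, so check the section name against your own script.

```c
#include <stdint.h>

/*
 * ASSUMPTION: ".ram3" is the section name that the linker script maps to
 * the ram3 (DTCM) memory region. Adjust it to match your custom script.
 * On a non-ARM host build only the alignment is applied, so the file
 * still compiles for experiments.
 */
#if defined(__arm__)
#define DMA_BUFFER __attribute__((section(".ram3"), aligned(32)))
#else
#define DMA_BUFFER __attribute__((aligned(32)))
#endif

/* Hypothetical scratch buffer handed to a DMA-capable driver (one sector).
   The 32-byte alignment is not needed in TCM, but keeps the buffer safe
   if it is ever moved to cached RAM. */
static uint8_t dma_scratch[512] DMA_BUFFER;

/* Accessors so other modules never place their own copy in cached RAM. */
static uint8_t *dma_scratch_get(void) { return dma_scratch; }
static uint32_t dma_scratch_size(void) { return (uint32_t)sizeof dma_scratch; }
```

With this approach BSS, heap and thread working areas can live in the larger cached RAM, while only the buffers that DMA touches stay in the 64k of TCM.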
Re: threads crashes with STM32F746 when using STM32F746xG_MAX.ld link script
There are actually two additional options for addressing this issue, each with its own advantages and disadvantages.
Option 1: Use the MPU to make the SRAM uncached
To do this, you need to make sure the MPU support is included. I think the simplest way to do that is to turn on the stack guard page feature. You can do this by adding the following to your halconf.h:
Code:
    #define PORT_ENABLE_GUARD_PAGES TRUE
Then you need to use mpuConfigureRegion() to activate a region and set its properties. You should add the following code to your main() at a point before you start using your DMA-capable controller devices:
Code:
    mpuConfigureRegion(MPU_REGION_1, 0x20000000,
                       MPU_RASR_ATTR_AP_RW_RW | MPU_RASR_ATTR_SHARED_DEVICE |
                       MPU_RASR_SIZE_512K | MPU_RASR_ENABLE);
0x20000000 is the base address of the internal RAM.
The MPU supports 8 regions. When stack guard pages are enabled, ChibiOS uses region 7, so I used region 1 here. According to the Cortex-M7 manual, when two MPU regions overlap, the one with the highest region number takes precedence, so the stack guard page feature should still work even with this extra region enabled.
The "SHARED_DEVICE" attribute tells the CPU that this region will be accessed by both CPU and another bus master peripheral, which disables caching.
Note that if you use the FSMC controller and the external SDRAM that's available on the STM32F746-DISCO board, I think the memory is uncached by default. You can use the MPU to make it cached. I did it like this:
Code:
    mpuConfigureRegion(MPU_REGION_3, FSMC_Bank5_MAP_BASE,
                       MPU_RASR_ATTR_AP_RW_RW | MPU_RASR_ATTR_CACHEABLE_WB_WA |
                       MPU_RASR_SIZE_8M | MPU_RASR_ENABLE);
Note that support for the FSMC controller is provided in the ChibiOS-Contrib repo (see the community subdirectory in the ChibiOS official releases).
Caveats:
The MPU doesn't allow you to set arbitrary sizes: you have to use a power of 2 selector.
Also, the MPU has alignment constraints on its base address value: the address must also be aligned on the same boundary as the size. So if you want to define a 64K region, its base address must be on a 64K boundary.
Here I used 512KB as the size, because the TCM RAM starts at 0x20000000 and ends at 0x2000FFFF (64K size), and the 256KB of SRAM starts at 0x20010000 and ends at 0x2004FFFF. The 256KB of SRAM is not aligned on a 256KB boundary, and the only power of 2 size up from 256K is 512K, so I used that. It's not entirely correct, but it's harmless. If you want you can use a collection of smaller regions to map just the actual SRAM and avoid the overflow.
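The size and alignment rules above can be checked with a little arithmetic. The following is only a sketch: the type and function names are hypothetical helpers, not ChibiOS APIs, and the code computes addresses and sizes without touching the MPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type, not part of ChibiOS. */
typedef struct {
  uint32_t base;  /* region base address */
  uint32_t size;  /* region size in bytes, a power of 2 */
} mpu_span_t;

/* True if size is a power of 2 of at least 32 bytes and base is aligned
   on the same boundary as size. */
static bool mpu_span_is_valid(uint32_t base, uint32_t size) {
  return size >= 32u && (size & (size - 1u)) == 0u &&
         (base & (size - 1u)) == 0u;
}

/* Smallest single valid region covering [start, start + len), len > 0.
   A returned size of 0 means only the full 4 GiB space would cover it. */
static mpu_span_t mpu_span_cover(uint32_t start, uint32_t len) {
  uint32_t last = start + len - 1u;
  uint32_t size = 32u;
  mpu_span_t r;

  /* Double the size until first and last byte fall in the same block. */
  while ((start & ~(size - 1u)) != (last & ~(size - 1u))) {
    size <<= 1;
  }
  r.base = start & ~(size - 1u);
  r.size = size;
  return r;
}

/* Split [start, start + len) into exact valid regions, greedily taking the
   largest aligned power of 2 that still fits. Returns the region count, or
   -1 if the range is not 32-byte granular or needs more than max regions. */
static int mpu_span_split(uint32_t start, uint32_t len,
                          mpu_span_t *out, int max) {
  int n = 0;

  if ((start & 31u) != 0u || (len & 31u) != 0u) {
    return -1;
  }
  while (len > 0u) {
    uint32_t size = 32u;
    while ((size << 1) != 0u && (size << 1) <= len &&
           (start & ((size << 1) - 1u)) == 0u) {
      size <<= 1;
    }
    if (n == max) {
      return -1;
    }
    out[n].base = start;
    out[n].size = size;
    n++;
    start += size;
    len -= size;
  }
  return n;
}
```

For the 256KB of SRAM at 0x20010000, mpu_span_cover() gives the 512K region at 0x20000000 used above, while mpu_span_split() gives three exact regions (64K, 128K, 64K). Each of those would still need its own MPU region number and the matching size selector.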
Option 2: Sync the cache manually with the cacheBufferFlush() and cacheBufferInvalidate() routine when doing DMA transfers
This requires modifying either your application code or the drivers to perform the flush and invalidate operations as applicable. Before you do a memory to peripheral transfer, you must flush the source buffer. After you complete a peripheral to memory transfer, you must invalidate the destination buffer.
Caveats:
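Flush/invalidate granularity is the size of a cache line, which for this CPU is 32 bytes. Special care must be taken to avoid having your source and destination buffers share a cache line with another buffer; otherwise a flush or invalidate of the buffer can also write back or discard the neighbouring data.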
Also, you need to be reasonably familiar with the driver or application code in order to make the right code changes.
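To make the cache line caveat concrete, here is a small sketch of the alignment arithmetic. The helper names and the example buffer are hypothetical; only cacheBufferFlush() and cacheBufferInvalidate() are the routines named above, and they appear only in a comment because the exact driver calls depend on your code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32  /* data cache line size in bytes, as stated above */

/* Number of bytes actually affected when a buffer at addr of length len is
   flushed or invalidated: the span is widened to whole cache lines. */
static size_t cache_span_size(uintptr_t addr, size_t len) {
  uintptr_t mask  = (uintptr_t)(CACHE_LINE_SIZE - 1);
  uintptr_t first = addr & ~mask;
  uintptr_t end   = (addr + len + mask) & ~mask;
  return (size_t)(end - first);
}

/* True if the buffer starts on a cache line and fills whole lines, so it
   cannot share a line with any neighbouring variable. */
static bool cache_buffer_is_exclusive(uintptr_t addr, size_t len) {
  return len > 0u && cache_span_size(addr, len) == len;
}

/* Hypothetical receive buffer: line-aligned, size a multiple of 32. */
static uint8_t dma_rx_buf[64] __attribute__((aligned(CACHE_LINE_SIZE)));

/*
 * On the target, a transfer would then be bracketed roughly like this
 * (the transfer itself is a placeholder for whatever the driver does):
 *
 *   cacheBufferFlush(tx_buf, sizeof tx_buf);              before memory to peripheral
 *   ... start DMA, wait for completion ...
 *   cacheBufferInvalidate(dma_rx_buf, sizeof dma_rx_buf); after peripheral to memory
 */
```

A 40-byte buffer starting 4 bytes into a line touches two full lines (64 bytes), which is exactly the situation where an invalidate can throw away a neighbour's unwritten data.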
The advantage to option 1 is that it requires very little in the way of code changes. The disadvantage is that by marking all of the SRAM uncached, you sacrifice all of the performance that the data cache is supposed to give you. It's up to you as the designer to decide if that matters or not for your application.
The advantage to option 2 is that it gives you reasonably fine-grained control over cache synchronization (buffers which are not used for DMA can remain cached and you will retain the data cache performance gains). The disadvantage is that it takes a bit more knowledge and experience to make the necessary code changes.
-Bill
Last edited by wpaul on Wed May 27, 2020 8:59 pm, edited 1 time in total.
- Giovanni
Re: threads crashes with STM32F746 when using STM32F746xG_MAX.ld link script
Hi,
Just a few notes:
PORT_ENABLE_GUARD_PAGES TRUE
This should go in chconf.h; there is a dedicated section at the bottom of the file. You also need to enable stack checking in there. If you place it in halconf.h, not all modules will see it.
In the latest versions the region used is now 7. It makes no difference if you use region 1, but 0 is available too.
I am adding a port section to the ChibiOS book with this kind of low-level detail.
About cache handling, I would avoid that if possible; it is complex, especially if you are inexperienced with the HAL. The simplest and safest approach is to use the MPU to make a memory region non-cacheable, as suggested. TCM is never cached, so it is convenient for DMA buffers.
Giovanni
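For reference, the two chconf.h settings described in the notes above might look like the fragment below. This assumes that "stack checking" refers to the CH_DBG_ENABLE_STACK_CHECK option; that option normally already exists in the debug section of chconf.h, so change its value there rather than defining it a second time.

```c
/* chconf.h: debug options section (existing setting, changed to TRUE).
   ASSUMPTION: this is the "stack checking" option meant above. */
#define CH_DBG_ENABLE_STACK_CHECK           TRUE

/* chconf.h: port-specific settings section at the bottom of the file. */
#define PORT_ENABLE_GUARD_PAGES             TRUE
```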