There are actually two additional options for addressing this issue, but each has its own advantages and disadvantages.
Option 1: Use the MPU to make the SRAM uncached
To do this, you need to make sure the MPU support is included. I think the simplest way to do that is to turn on the stack guard page feature. You can do this by adding the following to your halconf.h:
Code: Select all
#define PORT_ENABLE_GUARD_PAGES TRUE
Then you need to use mpuConfigureRegion() to activate a region and set its properties. You should add the following code to your main() at a point before you start using your DMA-capable controller devices:
Code: Select all
mpuConfigureRegion (MPU_REGION_1, 0x20000000,
MPU_RASR_ATTR_AP_RW_RW | MPU_RASR_ATTR_SHARED_DEVICE |
MPU_RASR_SIZE_512K | MPU_RASR_ENABLE);
0x20000000 is the base address of the internal RAM.
The MPU supports 8 regions. When stack guard pages are enabled, ChibiOS uses region 7, so I used region 1 here. According to the Cortex-M7 manual, when two MPU regions overlap, the one with the highest region number takes precedence, so the stack guard page feature should still work even with this extra region enabled.
The "SHARED_DEVICE" attribute tells the CPU that this region will be accessed by both the CPU and another bus master peripheral, which disables caching.
Note that if you use the FSMC controller and the external SDRAM that's available on the STM32F746-DISCO board, I think the memory is uncached by default. You can use the MPU to make it cached. I did it like this:
Code: Select all
mpuConfigureRegion (MPU_REGION_3, FSMC_Bank5_MAP_BASE,
MPU_RASR_ATTR_AP_RW_RW | MPU_RASR_ATTR_CACHEABLE_WB_WA |
MPU_RASR_SIZE_8M | MPU_RASR_ENABLE);
Note that support for the FSMC controller is provided in the ChibiOS-Contrib repo (see the community subdirectory in the ChibiOS official releases).
Caveats:
The MPU doesn't allow you to set arbitrary sizes: you have to use one of the power-of-two size selectors.
Also, the MPU has alignment constraints on its base address value: the base address must be aligned on the same boundary as the region size. So if you want to define a 64K region, its base address must be on a 64K boundary.
Here I used 512KB as the size, because the TCM RAM starts at 0x20000000 and ends at 0x2000FFFF (64K size), and the 256KB of SRAM starts at 0x20010000 and ends at 0x2004FFFF. The 256KB of SRAM is not aligned on a 256KB boundary, and the only power of 2 size up from 256K is 512K, so I used that. It's not entirely correct, but it's harmless. If you want, you can use a collection of smaller regions to map just the actual SRAM and avoid the region extending past it.
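For example, a sketch of the smaller-regions approach: cover just the 256KB of SRAM with four 64KB regions, so every base address is naturally aligned to its region size. This assumes regions 1 through 4 are otherwise unused in your project (region 7 is still left free for the stack guard page feature):

```c
/* Map only the 256KB SRAM (0x20010000 - 0x2004FFFF) as uncached,
   using four 64KB regions so each base is aligned to its size.
   This consumes MPU regions 1 through 4. */
uint32_t attrs = MPU_RASR_ATTR_AP_RW_RW | MPU_RASR_ATTR_SHARED_DEVICE |
                 MPU_RASR_SIZE_64K | MPU_RASR_ENABLE;
mpuConfigureRegion(MPU_REGION_1, 0x20010000, attrs);
mpuConfigureRegion(MPU_REGION_2, 0x20020000, attrs);
mpuConfigureRegion(MPU_REGION_3, 0x20030000, attrs);
mpuConfigureRegion(MPU_REGION_4, 0x20040000, attrs);
```

The tradeoff is that you burn four of the eight available regions instead of one.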
Option 2: Sync the cache manually with the cacheBufferFlush() and cacheBufferInvalidate() routines when doing DMA transfers
This requires modifying either your application code or the drivers to perform the flush and invalidate operations as applicable. Before you do a memory to peripheral transfer, you must flush the source buffer. After you complete a peripheral to memory transfer, you must invalidate the destination buffer.
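In ChibiOS terms, the pattern looks something like this. It's only a sketch: start_dma_tx() and wait_dma_rx_complete() stand in for whatever your actual driver calls are, while cacheBufferFlush() and cacheBufferInvalidate() are the real ChibiOS macros, taking a buffer address and a size:

```c
/* Before a memory-to-peripheral transfer: flush the source buffer so
   the data the CPU wrote actually reaches SRAM before DMA reads it. */
cacheBufferFlush(txbuf, sizeof txbuf);
start_dma_tx(txbuf, sizeof txbuf);         /* hypothetical driver call */

/* After a peripheral-to-memory transfer completes: invalidate the
   destination buffer so the CPU re-reads fresh data from SRAM rather
   than stale cache lines. */
wait_dma_rx_complete(rxbuf, sizeof rxbuf); /* hypothetical driver call */
cacheBufferInvalidate(rxbuf, sizeof rxbuf);
```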
Caveats:
Flush/invalidate granularity is the size of a cache line, which for this CPU is 32 bytes. Special care must be taken to avoid having your source and destination buffers share a cache line with another buffer.
Also, you need to be reasonably familiar with the driver or application code in order to make the right code changes.
The advantage to option 1 is that it requires very little in the way of code changes. The disadvantage is that by marking all of the SRAM uncached, you sacrifice all of the performance that the data cache is supposed to give you. It's up to you as the designer to decide if that matters or not for your application.
The advantage to option 2 is that it gives you reasonably fine-grained control over cache synchronization (buffers which are not used for DMA can remain cached and you will retain the data cache performance gains). The disadvantage is that it takes a bit more knowledge and experience to make the necessary code changes.
-Bill