Giovanni wrote: 3) It is not true that RAM is accessed with no wait states, strange because on the F4 that is true.
Isn't there a single wait state on most of the RAM, except for the CCM?
[TALK] Problems with Cortex-M7
Re: [TALK] Problems with Cortex-M7
- Giovanni
- Site Admin
- Posts: 14455
- Joined: Wed May 27, 2009 8:48 am
- Location: Salerno, Italy
- Has thanked: 1076 times
- Been thanked: 922 times
- Contact:
Re: [TALK] Problems with Cortex-M7
On the F4 all RAM can be accessed with no wait states; on the F7 only ITCM and DTCM can.
Giovanni
- alex31
- Posts: 379
- Joined: Fri May 25, 2012 10:23 am
- Location: toulouse, france
- Has thanked: 38 times
- Been thanked: 62 times
- Contact:
Re: [TALK] Problems with Cortex-M7
Hello, I have ported nearly all my code to ChibiOS 3, and I have now begun porting from the F4 to the F7.
Reading the wiki document "cortexm7_dma_guide", I see that:
"The STM32F7xx has a 16KB area optimized for DMA accesses that could be used for this (RAM2)."
Reading the STM32F746xG.ld file, it seems that the size is 64 KB (instead of 16 KB) and the area is RAM3 (instead of RAM2).
So, if I understand correctly, with STM32F746xG.ld DMA transfers should be transparent for any memory inside ram3:
° any static buffer (BSS)
° a buffer allocated on a thread's stack (since the stack is inside a working area, which is basically a static buffer in BSS)
and without any alignment constraint.
I ask because I am seeing inconsistent data when reading files via FatFs (writes are OK) on the F7. The same program on the F4
works flawlessly, so I suspect a cache coherency problem.
Alexandre
- Giovanni
Re: [TALK] Problems with Cortex-M7
Hi,
By default all the BSS variables and stacks are not cached. Heap and DATA are cached and must be managed.
Giovanni
- alex31
Re: [TALK] Problems with Cortex-M7
OK, I found my bug.
In the FatFs library, the FATFS and FIL objects each contain a 512-byte buffer.
In my code, the FIL object was allocated on the stack of a shell thread, which is launched dynamically with chThdCreateFromHeap ...
Since there are no cache maintenance calls in fatfs_diskio.c, I got my problem.
In my case the code is not reentrant, so it is fine to declare the FIL object static to fix the bug,
but if I wanted a stack-allocated FIL object, I guess the only way is to modify fatfs_diskio.c?
I tried adding dmaBufferInvalidate after sdcRead and it works, but the FIL object has to be declared in a weird way
to ensure that fil.buf is 32-byte aligned ...
Alexandre
- Giovanni
Re: [TALK] Problems with Cortex-M7
In those cases the cache has to be handled by the application; there is no way to abstract that behavior.
You could use the MPU to disable caching in a specific area (the stack), but then you would take a performance hit because that RAM has 1 wait state.
Giovanni
-
- Posts: 26
- Joined: Wed Feb 25, 2015 9:45 pm
- Has thanked: 1 time
- Been thanked: 2 times
Re: [TALK] Problems with Cortex-M7
Hi All. I would just like to summarise what I understand by these posts, please let me know if I am wrong...
1) Cache handling is NOT completely transparent and the user will require some knowledge of the memory segments before setting aside buffers to be used by the DMA.
2) Cache invalidation requires a buffer to be 32 byte aligned.
3) If the DMA buffer falls within the .bss memory segment, then no cache invalidation is required as this memory segment is not cacheable.
3.1) as a result of this, the buffer does NOT need to be 32 byte aligned as the invalidation is not applicable.
4) If the DMA buffer falls within the .data memory segment, then cache invalidation is required as the buffer in RAM (on the DMA) might not reflect what's in the cache.
4.1) as a result of this, the buffer does need to be 32 byte aligned as the cache invalidation might invalidate other data in the cache.
5) The cache invalidation happens within all drivers that use the DMA and therefore the user shouldn't have to call the dmaBufferInvalidate routine, ever.
6) As far as the user is concerned, all they really need to do is worry about the memory alignment if the DMA buffer is within the .data memory segment or if it has been allocated on the stack and stays in scope during the DMA transfer.
There have been quite a few discussions about this cache issue with the M7 and the different methods which can be used to solve it, but as far as I know, there is not much in the way of concrete instructions to follow if a user has no knowledge of, or a naive interpretation of, the different memory segments.
I am one of those people who have a naive interpretation of the memory segments (so please correct me if I am wrong), but I would expect something like this:
1) If you have a global or static variable that is initialised at instantiation (this includes buffers), then the linker sorts this into the .data memory segment. In this case, the variable or buffer can be cached by the processor and the user needs to ensure that this data is 32 byte aligned (and the user must invalidate the cache after a DMA callback, I think... not too sure about this point in the brackets).
2) If you have a global variable that is uninitialised at instantiation (this includes buffers), then the linker sorts this into the .bss memory segment. In this case the variable will never be cached and the user does not have to ensure 32 byte alignment, nor do they need to invalidate the cache.
3) If a variable or buffer is allocated on the stack, then I have no idea, but I would think that this could be cached and the user needs to ensure 32 byte alignment and cache invalidation after DMA transfers are complete.
Anyway, I could be completely wrong here, so please do constructively criticize.
Re: [TALK] Problems with Cortex-M7
My experience with the F7 is that uninitialised variables didn't automatically end up in CCM (which is the non-cacheable RAM). So I used some simple macros which work with the standard scatter file:
Code:
/* 32-byte aligned, placed in section .ram0 */
#define CACHE_ALIGN __attribute__((aligned (32))) __attribute__((section(".ram0")))
/* 32-byte aligned, placed in section .bss */
#define CACHE_ALIGN_ZERO __attribute__((aligned (32))) __attribute__((section(".bss")))
/* placed in section .ram3, no alignment forced */
#define CACHE_NONE __attribute__((section(".ram3")))
Note that it's also possible to program the MPU to disable caching on other memory regions. I had to do this with the FMC area, since I actually had I/O devices connected rather than memory.
On point 5, I think cache invalidation is 100% down to the application; none is done within the standard drivers.
As alex31 highlighted, third-party libraries can be a problem unless caching has been considered.
- Giovanni
Re: [TALK] Problems with Cortex-M7
Steved is correct, all true except point 5. I recommend reading the map file to make sure where variables are.
Using the default .ld script, cache handling should not be an issue except for initialized variables. I find it strange that uninitialized variables don't go into the BSS (uncached), could you provide an example?
Giovanni