The original text is as follows:
In what situations might I need to insert memory barrier instructions?
Applies to: ARM Architecture and Instruction Sets, ARMv6 Architecture, ARMv7 Architecture, DS-5, RealView Development Suite (RVDS)
Answer
This FAQ aims to help users understand when, why and how to use memory barrier instructions.
Classic ARM processors, such as the ARM7TDMI, execute instructions and complete data accesses in program order. The latest ARM processors can optimize the order of instruction execution and data accesses. For example, an ARM architecture v6 or v7 processor could optimize the following sequence of instructions:
    LDR     r0, [r1]    ; Load from Normal/Cacheable memory leads to a cache miss
    STR     r2, [r3]    ; Store to Normal/Non-cacheable memory
If the first load misses in the cache, it causes a cache linefill, which typically takes many cycles to complete. Classic (cached) ARM processors, for example the ARM926EJ-S, would wait for the load to complete before executing the store instruction. ARM architecture v6/v7 based processors can recognize that the next instruction does not depend on the result of the load (in register r0) and can execute the store instruction before the load instruction completes.
In some circumstances, processor optimizations such as speculative reads or out-of-order execution (as in the example above) are undesirable and can lead to unintended program behaviour. In such situations, barrier instructions must be inserted into the code wherever stricter, 'Classic ARM processor-like' behaviour is required. There are three types of barrier instruction. For simplicity, the descriptions below assume a uni-processor environment:
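- A Data Synchronization Barrier (DSB) completes only when all explicit memory accesses before it have completed, together with any cache, TLB and branch predictor maintenance operations issued before it. No instruction after the DSB executes until the DSB has completed.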
- A Data Memory Barrier (DMB) ensures that all explicit memory accesses before the DMB instruction complete before any explicit memory accesses after the DMB instruction start.
- An Instruction Synchronization Barrier (ISB) flushes the pipeline in the processor, so that all instructions following the ISB are fetched from cache or memory, after the ISB has been completed.
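In C, the ordering that a DMB provides is normally requested through C11 `<stdatomic.h>` fences rather than hand-written assembly. The sketch below is a minimal message-passing example, assuming a C11 compiler and POSIX threads; `payload`, `ready` and `run_message_passing` are hypothetical names. Although the descriptions above assume a uni-processor, this multi-threaded case is where the ordering matters most. Compilers targeting ARMv7 typically implement the release and acquire fences with a DMB, but the exact instruction chosen is compiler-dependent.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared state for illustration. */
static int payload;       /* ordinary (non-atomic) data */
static atomic_int ready;  /* flag published after the data */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                           /* 1. write the data */
    atomic_thread_fence(memory_order_release);              /* 2. data before flag (DMB-like) */
    atomic_store_explicit(&ready, 1, memory_order_relaxed); /* 3. publish the flag */
    return NULL;
}

static void *consumer(void *arg) {
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;                                                   /* spin until the flag is seen */
    atomic_thread_fence(memory_order_acquire);              /* flag read before data read */
    *(int *)arg = payload;                                  /* must observe 42 */
    return NULL;
}

/* Runs one producer/consumer round and returns what the consumer read. */
int run_message_passing(void) {
    int seen = 0;
    pthread_t p, c;
    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Without the two fences, the consumer could observe `ready == 1` and still read the old value of `payload`, because either side may reorder its accesses in the way described for the LDR/STR example.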
Note that the CP15 equivalent barrier instructions available in ARMv6 are deprecated in ARMv7. Therefore, if possible, it is recommended that any code that uses these instructions is migrated to use the new barrier instructions described above instead.
Mutexes
It is architecturally defined that software must perform a Data Memory Barrier (DMB) operation:
- between acquiring a resource, for example, through locking a mutex (MUTual EXclusion) or decrementing a semaphore, and making any access to that resource
- before making a resource available, for example, through unlocking a mutex or incrementing a semaphore.
The following examples show an implementation of a simple blocking mutex:
- lock_mutex acquires a mutex, blocking indefinitely until it acquires it. If blocked, it waits for an event (WFE) before retrying.
- unlock_mutex releases a mutex, and sends an event (SEV) to notify waiting processes or processors of the change.
LOCKED   EQU 1
UNLOCKED EQU 0

lock_mutex
    ; Is mutex locked?
    LDREX   r1, [r0]        ; Check if locked
    CMP     r1, #LOCKED     ; Compare with "locked"
    WFEEQ                   ; Mutex is locked, go into standby
    BEQ     lock_mutex      ; On waking, re-check the mutex
    ; Attempt to lock mutex
    MOV     r1, #LOCKED
    STREX   r2, r1, [r0]    ; Attempt to lock mutex
    CMP     r2, #0x0        ; STREX writes 0 to r2 if the store succeeded
    BNE     lock_mutex      ; If store failed, try again
    DMB                     ; Required before accessing protected resource
    BX      lr

unlock_mutex
    DMB                     ; Ensure accesses to protected resource have completed
    MOV     r1, #UNLOCKED   ; Write "unlocked" into lock field
    STR     r1, [r0]
    DSB                     ; Ensure update of the mutex occurs before other CPUs wake
    SEV                     ; Send event to other CPUs; wakes any CPU waiting using WFE
    BX      lr
The DSB ensures that the update to the synchronization variable is visible to all processors before SEV is executed.
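A rough C11 equivalent of lock_mutex and unlock_mutex is sketched below, assuming C11 atomics and POSIX threads. The acquire ordering on the successful compare-and-exchange stands in for the DMB after the STREX, and the release ordering on the unlocking store stands in for the DMB before the STR. On ARMv7, compilers typically generate LDREX/STREX loops and DMB instructions for these operations, but the exact code is compiler-dependent. The WFE/SEV wake-up is omitted, so this is a plain spinlock; `worker` and `run_counter_demo` are hypothetical test scaffolding.

```c
#include <pthread.h>
#include <stdatomic.h>

#define UNLOCKED 0
#define LOCKED   1

/* Acquire: acquire ordering plays the role of the DMB after a successful STREX. */
static void lock_mutex(atomic_int *m) {
    int expected = UNLOCKED;
    while (!atomic_compare_exchange_weak_explicit(m, &expected, LOCKED,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = UNLOCKED;  /* a failed CAS overwrote it with the current value */
        /* A real implementation might wait here instead of spinning (e.g. WFE). */
    }
}

/* Release: release ordering plays the role of the DMB before the unlocking STR. */
static void unlock_mutex(atomic_int *m) {
    atomic_store_explicit(m, UNLOCKED, memory_order_release);
}

/* Hypothetical demo: two threads increment a plain counter under the lock. */
static atomic_int demo_lock = UNLOCKED;
static long counter;

static void *worker(void *arg) {
    int iterations = *(const int *)arg;
    for (int i = 0; i < iterations; i++) {
        lock_mutex(&demo_lock);
        counter++;            /* the protected resource */
        unlock_mutex(&demo_lock);
    }
    return NULL;
}

long run_counter_demo(int iterations) {
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, worker, &iterations);
    pthread_create(&b, NULL, worker, &iterations);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

If either ordering were relaxed, increments made inside one critical section would not be guaranteed visible to the next owner of the lock, and the final count could come up short.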
Memory Remapping
Consider a situation where your reset handler/boot code lives in Flash memory (ROM), which is aliased to address 0x0 to ensure that your program boots correctly from the vector table, which normally resides at the bottom of memory (see left-hand-side memory map).
After you have initialized your system, you may wish to turn off the Flash memory alias so that you can use the bottom portion of memory for RAM (see right-hand-side memory map). The following code (running from the permanent Flash memory region) disables the Flash alias, before calling a memory block copying routine (e.g., memcpy) to copy some data into the bottom portion of memory (RAM).
    MOV     r0, #0
    LDR     r1, =REMAP_REG      ; Load the remap register address (an arbitrary address may not fit a MOV immediate)
    STR     r0, [r1]            ; Disable Flash alias
    DMB                         ; Ensure completion with Data Memory Barrier
    BL      block_copy_routine  ; Block copy code into RAM
    DSB                         ; Ensure block copy is completed with Data Synchronization Barrier
    ISB                         ; Ensure pipeline flush with Instruction Synchronization Barrier
    BL      copied_routine      ; Execute copied routine (now in RAM)
Without the DMB between the store (STR) and the branch with link (BL) instructions, there is no guarantee that the store out to memory will complete before the block copying routine starts writing to the bottom portion of memory, because the block copying routine can execute while the data is still draining through the write buffer. The DSB ensures that all memory accesses made by the block copying routine have completed before any later instruction executes. The ISB then flushes the pipeline, so that the instructions of the copied routine are fetched from RAM only after the copy has completed.
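The remapping sequence above can be sketched in C as follows, assuming GCC or Clang extended inline assembly. `remap_copy_and_run`, its parameters and the `DMB()`/`DSB()`/`ISB()` macros are hypothetical; CMSIS-based projects would normally use the `__DMB()`, `__DSB()` and `__ISB()` intrinsics instead. On non-ARM hosts the macros fall back to a C11 full fence only so that the sketch compiles and can be tested; that fallback is not equivalent to an ISB. The `"memory"` clobber also stops the compiler itself from moving memory accesses across each barrier.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Barrier helpers: real instructions on ARM, a full C11 fence elsewhere
   (the fallback does NOT flush any pipeline; it only lets this compile). */
#if defined(__aarch64__)
#define DMB() __asm__ volatile("dmb sy" ::: "memory")
#define DSB() __asm__ volatile("dsb sy" ::: "memory")
#define ISB() __asm__ volatile("isb" ::: "memory")
#elif defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
#define DMB() __asm__ volatile("dmb" ::: "memory")
#define DSB() __asm__ volatile("dsb" ::: "memory")
#define ISB() __asm__ volatile("isb" ::: "memory")
#else
#define DMB() atomic_thread_fence(memory_order_seq_cst)
#define DSB() atomic_thread_fence(memory_order_seq_cst)
#define ISB() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Hypothetical helper: remap_reg is the remap control register, ram is the
   now un-aliased bottom of memory, image/len is the code to copy and entry
   is the copied routine (NULL skips the call, e.g. for a host-side test). */
void remap_copy_and_run(volatile uint32_t *remap_reg,
                        void *ram, const void *image, size_t len,
                        void (*entry)(void)) {
    *remap_reg = 0;           /* STR: disable the Flash alias */
    DMB();                    /* alias change completes before the copy writes RAM */
    memcpy(ram, image, len);  /* block copy into RAM */
    DSB();                    /* the copy has completed */
    ISB();                    /* discard any instructions fetched before the copy */
    if (entry != NULL)
        entry();              /* execute the copied routine (now in RAM) */
}
```

On a cached system the data and instruction caches would also need maintenance before `entry` is called, as described for self-modifying code.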
Self-modifying code
After code has been modified, an ISB must be executed before the modified code runs, because the prefetch unit pipeline and the core pipeline may still contain out-of-date instructions.
The example below shows a block of code being copied from ROM to RAM, and then branched to and executed.
Overlay_manager
    ; ...
    BL      block_copy          ; Copy new routine from ROM to RAM
    B       relocated_code      ; Branch to new routine
Due to speculative prefetching, the processor might attempt to fetch instructions from the relocated region before the block copy has completed. To prevent this, insert a DSB followed by an ISB before any newly relocated code begins executing. The DSB ensures that the block copy has completed, and the ISB flushes the prefetch buffer before the processor continues fetching instructions:
Overlay_manager
    ; ...
    BL      block_copy          ; Copy new routine from ROM to RAM
    DSB                         ; Ensure block copy has completed
    ISB                         ; Ensure processor fetches new instructions
    B       relocated_code      ; Branch to new routine
If the memory that the block copying routine writes to is marked as Cacheable, the instruction cache must be invalidated so that the processor does not execute stale cached code. For write-back regions, the data cache must also be cleaned before the instruction cache is invalidated.
Overlay_manager
    ; ...
    BL      block_copy                  ; Copy new routine from ROM to RAM
    data_cache_clean                    ; Clean the data cache so that the new routine is written out to memory
    DSB                                 ; Ensure data cache clean has completed
    icache_and_pb_invalidate            ; Invalidate the instruction cache and branch predictor so that the old routine is no longer cached
    DSB                                 ; Ensure the invalidation has completed
    ISB                                 ; Ensure processor fetches new instructions
    B       relocated_code              ; Branch to new routine
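When code is generated or copied at run time from C, as in the overlay, JIT and loader cases, GCC and Clang provide `__builtin___clear_cache(begin, end)` to make the written range safe to execute. On ARM it expands to, or calls, the target's sequence for cleaning the data cache and invalidating the instruction cache (on Linux this may go through a system call); on targets that do not need instruction cache flushes, such as x86, it has no effect. The exact operations, including any branch predictor invalidation and barriers, depend on the compiler, target and operating system, so the sketch below, with its hypothetical `install_code` helper, is an outline rather than a replacement for the explicit sequence above on bare-metal systems.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a routine's code to dst and make the range safe
   to execute. The destination must already be mapped executable before the
   copied code is called; this helper does not change memory permissions. */
void install_code(void *dst, const void *src, size_t len) {
    memcpy(dst, src, len);  /* block copy, as in Overlay_manager */
    /* Compiler builtin: performs whatever cache maintenance the target needs
       for instruction fetch to see the new bytes (nothing on x86). */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
}
```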
Similar situations where you may require a barrier are:
- A Just-In-Time (JIT) compiler, for example, converting Jazelle bytecode into native ARM code
- A post linker/loader which relocates code into memory at run-time.
Further Reading:
There are many more examples of where barriers are needed. For more information on the use of barriers see:
- ARMv7-A/R Architecture Reference Manual, Appendix G - Barrier Litmus Tests
- Application Note 321 - ARM Cortex™-M Programming Guide to Memory Barrier Instructions