ARM A9 MMU and cache questions


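I am working on a bare-metal project with an A9MP processor (NXP/Freescale iMX6Q) and am setting up the MMU. The project will use 2 (of 4) cores. Core 0 reads data from a common shareable data area in OCRAM and shows it on an LCD display. Core 1 collects the data and inserts it into the common area. Reads and writes of the common area are protected by an LDREX/STREX mutex. The common data is set to STRONGLY_ORDERED, no execute (I think this is correct). I have a few questions: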

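  1. While reading on the ARM forum, I saw a recommendation that the SMP bit should always be set when more than one core is used, even if the two working cores do not interact. Is this correct? If the SMP bit is set, should the FW bit be set as well, and is there any downside to setting it?
  2. What are the pros and cons of setting the L1 Dcache prefetch bit?
  3. What are the pros and cons of setting the Alloc in one way bit? Core 0 will be copying one frame buffer to another, so there will be a lot of huge (1.5MB) copies. I am not sure yet whether to use memory-to-memory DMA or a NEON copy; any suggestions? Should the bit be enabled only during the copy operations or all the time?
  4. In one place I noticed that, for this processor, a user set up STRONGLY_ORDERED memory with the shareable and RW access bits set. I thought all STRONGLY_ORDERED memory was (by default) shareable with RW access. Which is correct?
  5. I plan to use write-back (rather than write-through) for DRAM memory (code memory RO, data memory RW nX). Is this the best choice? We found out early on that frame buffer memory runs faster when it is not cached.
  6. In the SDK for this processor I noticed that the RAM (data) memory MMU entry is WBWA (TEX: 001, C: 1, B: 1), which is not shown among the "Memory type" options in Table 9-3 of the ARM Cortex-A Series Programmer's Guide (V4.0). What other options are there?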

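I am having some memory problems in which code memory gets corrupted. At the moment my code memory type is RW. I plan to change that, but wanted to see first whether I could get answers to the questions above.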

caching mmu
1 answer
SMP bit and FW bit:
    The SMP bit (ACTLR bit 6) makes a core take part in coherency. If the cores share any cacheable memory, set it on every participating core before that core enables its MMU and data cache, and enable the SCU, so that the hardware keeps the L1 data caches coherent.
    The FW bit (ACTLR bit 0) is not a forwarding bit: it enables broadcasting of cache and TLB maintenance operations to the other cores in the cluster. It should generally be set together with the SMP bit, so that a maintenance operation issued on one core also takes effect on the others.
    The cost of setting the FW bit is a small amount of extra coherency traffic and power; in most multi-core applications the benefit outweighs it.
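
The register update described above can be sketched as follows. This is a minimal sketch, not the SDK's code: `TARGET_CORTEX_A9` is a hypothetical build macro that guards the CP15 access so the file also compiles off-target, and `actlr_with_smp_fw` and `core_join_coherency` are illustrative names. It assumes the core runs in a privileged Secure mode (the usual bare-metal case) and that the SCU is enabled elsewhere.

```c
#include <stdint.h>

/* Cortex-A9 Auxiliary Control Register (ACTLR) bit positions. */
#define ACTLR_FW   (1u << 0)  /* broadcast cache and TLB maintenance operations */
#define ACTLR_SMP  (1u << 6)  /* this core takes part in coherency */

/* Pure helper: returns the given ACTLR value with SMP and FW set. */
static inline uint32_t actlr_with_smp_fw(uint32_t actlr)
{
    return actlr | ACTLR_SMP | ACTLR_FW;
}

#if defined(TARGET_CORTEX_A9)  /* hypothetical macro: define it for the real target build */
/* Read-modify-write of ACTLR (CP15 c1, c0, opc2 = 1).
 * Assumption: called on each core before it enables its MMU and data cache. */
static inline void core_join_coherency(void)
{
    uint32_t actlr;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 1" : "=r"(actlr));
    actlr = actlr_with_smp_fw(actlr);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 1" : : "r"(actlr) : "memory");
    __asm__ volatile("isb" : : : "memory");
}
#endif
```

Keeping the bit arithmetic in a pure helper makes it checkable on a host machine; only the guarded function touches the coprocessor.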

L1 Dcache prefetch bit:
    Enabling L1 data-side prefetch (ACTLR bit 2) can improve performance on sequential access patterns by fetching cache lines before they are needed, which reduces cache misses.
    The downside is extra memory traffic and power, and possible cache pollution when the prefetched data is never used.
    The prefetcher only helps accesses to cacheable memory. For your large copies it may help if the buffers are cached, but it does nothing for frame buffers mapped as uncached, so measure it on your actual workload.

Alloc in one way bit:
    This bit (ACTLR bit 8) is intended for memory copy operations and other streaming accesses over large data sets.
    It restricts cache line allocation to a single way of the L1 data cache, so the streamed data evicts only that way instead of flushing the rest of your working set out of the cache.
    For your 1.5MB frame buffer copies it could reduce cache pollution, but only if the buffers are mapped as cacheable; it has no effect on uncached frame buffers.
    Because it shrinks the part of the cache available for new allocations, the usual approach is to enable it only around the copy and restore the previous setting afterwards. Leaving it on all the time penalizes everything else that runs on that core.
    For large copies, DMA may be more efficient than NEON or plain CPU copies, because it offloads the work from the CPU and can run concurrently with it. Measure both before deciding.
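
Enabling the bit only for the duration of a copy could look like the sketch below. The names `copy_frame`, `actlr_read` and `actlr_write` are illustrative, `TARGET_CORTEX_A9` is a hypothetical build macro, and `memcpy` stands in for whichever copy routine (NEON or otherwise) is finally chosen. Off-target, a plain variable replaces the real register so the logic can be exercised on a host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACTLR_ALLOC_ONE_WAY (1u << 8)  /* Cortex-A9 ACTLR: allocate in one cache way only */

#if defined(TARGET_CORTEX_A9)  /* hypothetical macro: define it for the real target build */
static inline uint32_t actlr_read(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 1" : "=r"(v));
    return v;
}
static inline void actlr_write(uint32_t v)
{
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 1" : : "r"(v) : "memory");
    __asm__ volatile("isb" : : : "memory");
}
#else
/* Host stand-in for the register, so the sketch runs off-target. */
static uint32_t fake_actlr;
static inline uint32_t actlr_read(void)    { return fake_actlr; }
static inline void actlr_write(uint32_t v) { fake_actlr = v; }
#endif

/* Copy len bytes with one-way allocation enabled, then restore ACTLR. */
static void copy_frame(void *dst, const void *src, size_t len)
{
    uint32_t saved = actlr_read();
    actlr_write(saved | ACTLR_ALLOC_ONE_WAY);
    memcpy(dst, src, len);
    actlr_write(saved);  /* restore whatever setting was active before the copy */
}
```

Saving and restoring the whole register value, rather than clearing the bit unconditionally, keeps the helper correct even if the caller already had the bit set.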

Strongly-ordered memory and access permissions:
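    You are partly correct. Strongly-ordered memory is always treated as shareable, so the S bit in the descriptor is ignored for this memory type; setting it is redundant but harmless.
    Access permissions are a separate matter. They always come from the AP bits (together with the domain setting), whatever the memory type. There is no implicit RW access, so the AP bits must be set explicitly.
    Note also that the architecture only guarantees LDREX/STREX on Normal memory. Whether exclusive accesses work on Strongly-ordered or Device memory is implementation defined, so it is safer to place the mutex itself in Normal memory.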
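
To make the separation between memory type and permissions concrete, here is a minimal sketch of a first-level section descriptor in the ARMv7-A short-descriptor format. The macro and function names are illustrative, the address used in the usage note is only an example, and the sketch assumes TEX remap is disabled and domain 0 is configured as a client domain so that the AP bits are actually checked.

```c
#include <stdint.h>

/* ARMv7-A short-descriptor, first-level section entry fields. */
#define SECTION_TYPE       0x2u                           /* bits[1:0] = 0b10: section */
#define SECTION_B          (1u << 2)                      /* bufferable */
#define SECTION_C          (1u << 3)                      /* cacheable */
#define SECTION_XN         (1u << 4)                      /* execute-never */
#define SECTION_DOMAIN(d)  (((uint32_t)(d) & 0xFu) << 5)  /* domain number */
#define SECTION_AP_RW      (0x3u << 10)                   /* AP[2:0] = 0b011: full access */
#define SECTION_TEX(t)     (((uint32_t)(t) & 0x7u) << 12) /* type extension */
#define SECTION_S          (1u << 16)                     /* shareable */

/* Strongly-ordered (TEX=000, C=0, B=0), read/write, execute-never, domain 0.
 * The S bit is left clear because it is ignored for this memory type;
 * the AP bits still have to be set explicitly. */
static inline uint32_t section_strongly_ordered_rw(uint32_t phys_addr)
{
    return (phys_addr & 0xFFF00000u)
         | SECTION_TEX(0)
         | SECTION_AP_RW
         | SECTION_DOMAIN(0)
         | SECTION_XN
         | SECTION_TYPE;
}
```

For a 1MB section at the example address 0x00900000 this yields 0x00900C12. With the AP field left at zero the same entry would be 0x00900012, which denies all access under a client domain.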

Write-back vs. write-through for DRAM memory:
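    Write-back caching is generally preferred for DRAM: it reduces the number of writes that reach memory and usually performs better.
    Write-through keeps memory up to date on every write, which can help for buffers read by a bus master that does not snoop the CPU caches (a display controller or DMA engine, for example). It is not the right choice for memory-mapped peripherals, which should be mapped as Device memory.
    Your plan to use write-back for DRAM and to leave the frame buffers uncached is reasonable, provided that any cached buffer handed to DMA or to the display controller is cleaned or invalidated first.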

WBWA memory type:
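    WBWA stands for "Write-Back, Write-Allocate" and is a valid memory type. With TEX remap disabled, TEX=001, C=1, B=1 encodes Normal memory that is Outer and Inner Write-Back, Write-Allocate.
    A write miss allocates a line in the cache, the write is held there, and the line is written back to memory when it is evicted or explicitly cleaned.
    This type is commonly used for normal DRAM regions because it performs well for both reads and writes. The other encodings are Strongly-ordered, Shareable Device, Non-shareable Device, Normal Non-cacheable, Write-Through without Write-Allocate, and Write-Back without Write-Allocate.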
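
The TEX, C and B encodings for the short-descriptor format can be tabulated as in the sketch below. It assumes TEX remap is disabled (SCTLR.TRE = 0) and omits the TEX = 1BB forms that select inner and outer cache policies separately. The type `mem_type_t`, the table `mem_types` and the function `section_mem_type_bits` are illustrative names, not part of any SDK.

```c
#include <stdint.h>

/* One row per memory type: TEX[2:0], C, B and a description. */
typedef struct {
    uint8_t tex;
    uint8_t c;
    uint8_t b;
    const char *type;
} mem_type_t;

static const mem_type_t mem_types[] = {
    { 0x0, 0, 0, "Strongly-ordered" },
    { 0x0, 0, 1, "Shareable Device" },
    { 0x0, 1, 0, "Normal, Write-Through, no Write-Allocate" },
    { 0x0, 1, 1, "Normal, Write-Back, no Write-Allocate" },
    { 0x1, 0, 0, "Normal, Non-cacheable" },
    { 0x1, 1, 1, "Normal, Write-Back, Write-Allocate (WBWA)" },
    { 0x2, 0, 0, "Non-shareable Device" },
};

/* Place TEX, C and B at their positions in a section descriptor:
 * TEX in bits[14:12], C in bit 3, B in bit 2. */
static inline uint32_t section_mem_type_bits(const mem_type_t *m)
{
    return ((uint32_t)m->tex << 12) | ((uint32_t)m->c << 3) | ((uint32_t)m->b << 2);
}
```

For the WBWA row this gives 0x100C, which is ORed with the base address, AP, domain, XN and type fields to form the full section entry.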

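As for the code memory corruption: mapping code memory as read-only (while keeping it executable) and data memory as RW with execute-never (XN) is a good step, because a stray write to code then raises a permission fault instead of silently modifying it. In addition, if the code memory is shared between cores, check for cache maintenance or coherency problems.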
