Blackaddr
After months of struggling to get DMA SPI memory working with the T4 OCRAM, I think I've finally found the problem, and I think it's in AudioStream.h. Basically, I'm getting audio buffer corruption when the audio buffer pool is used by both I2S and SPI, and both of them use DMA.
The memory pool for the audio buffers is defined as an array of audio_block_t in AudioStream.h. The L1 cache line size is 32 bytes. The ARM manual states that DMA buffers must be 32-byte aligned (and their size should really be a multiple of 32 bytes as well); otherwise, indeterminate behaviour can result. The reason becomes clear when you consider what happens if two different buffers share a cache line.
Let's say two adjacent 48-byte buffers in memory have their boundary in the middle of a cache line. Cache lines start at 32-byte aligned addresses.
Code:
// OCRAM (DMAMEM) starts at 0x2020 0000
// Assume cache line index 0 covers 0x2020 0000 to 0x2020 001F, and cache line index 1 covers 0x2020 0020 to 0x2020 003F
uint8_t bufferA[48]; // starts at 0x2020 0000 and ends at 0x2020 002F, halfway through cache line index 1
uint8_t bufferB[48]; // starts at 0x2020 0030, in the middle of cache line index 1
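To make the overlap concrete, here is a small host-side sketch in plain C that computes which 32-byte cache lines a buffer touches, using the offsets from the example above. The helper names `line_of` and `buffers_share_line` are hypothetical and exist only for this illustration. The base address is assumed to be 32-byte aligned, and its exact value does not matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u /* Cortex-M7 L1 data cache line size */

/* Address of the 32-byte cache line that contains addr. */
static uint32_t line_of(uint32_t addr) {
    return addr & ~(CACHE_LINE_SIZE - 1u);
}

/* True if the non-empty buffers [a, a + a_len) and [b, b + b_len)
 * touch at least one common cache line, even when the bytes
 * themselves do not overlap. */
static bool buffers_share_line(uint32_t a, uint32_t a_len,
                               uint32_t b, uint32_t b_len) {
    uint32_t a_first = line_of(a), a_last = line_of(a + a_len - 1u);
    uint32_t b_first = line_of(b), b_last = line_of(b + b_len - 1u);
    return a_first <= b_last && b_first <= a_last;
}

/* The two adjacent 48-byte buffers from the example. */
#define BASE  0x20200000u      /* assumed 32-byte aligned base */
#define BUF_A (BASE + 0x00u)   /* touches lines BASE+0x00 and BASE+0x20 */
#define BUF_B (BASE + 0x30u)   /* touches lines BASE+0x20 and BASE+0x40 */
```

With these definitions, `buffers_share_line(BUF_A, 48u, BUF_B, 48u)` is true because both buffers touch the line at `BASE + 0x20`. Two 64-byte buffers placed back to back on aligned addresses do not share a line.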
If two different contexts are using those buffers, and each is using DMA to move data in and out of OCRAM, I think we can run into cache coherency issues. At any moment, either context may have a DMA transfer writing to OCRAM (so it must invalidate the cache before the CPU reads the buffer), or it may have CPU writes sitting dirty in the cache (so it must flush the cache before DMA reads the buffer).
My thinking at the moment is that the memory barriers in the cache maintenance functions are not sufficient to maintain coherency in this case. A DMA transfer started earlier by one context can be running asynchronously while the CPU performs a cache maintenance operation for the other context. In other words, even if the two contexts use their buffers independently, cache maintenance works on whole 32-byte lines and has no way of knowing whether a DMA transfer between a peripheral and OCRAM is currently underway. Race conditions therefore occur in the overlapping cache line. For example, flushing bufferA writes the entire shared line back to OCRAM, including stale cached bytes of bufferB that a DMA transfer may have just updated. Likewise, invalidating the line for bufferB discards any dirty bytes of bufferA in that line.
The result is corruption of the portion of each buffer that lies in the shared cache line.
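The following is a deliberately simplified host-side model of one shared cache line. It illustrates the "flush overwrites DMA data" case described above and does not model the real Cortex-M7 cache or the i.MX RT eDMA. Line bytes 0..15 hold the tail of bufferA, and bytes 16..31 hold the head of bufferB, as in cache line index 1 of the 48-byte example. All names (`model_line`, `cache_fill`, `cpu_write`, `dma_write`, `clean_line`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE 32

/* One cache line plus the 32 bytes of OCRAM behind it. */
typedef struct {
    uint8_t ocram[LINE]; /* what the DMA engine sees */
    uint8_t cache[LINE]; /* what the CPU sees */
    int dirty;           /* set once the CPU has written the line */
} model_line;

/* CPU read miss: copy the line from OCRAM into the cache. */
static void cache_fill(model_line *m) {
    memcpy(m->cache, m->ocram, LINE);
    m->dirty = 0;
}

/* CPU store under a write-back policy: only the cache changes. */
static void cpu_write(model_line *m, int off, uint8_t v, int len) {
    memset(m->cache + off, v, (size_t)len);
    m->dirty = 1;
}

/* DMA store: goes straight to OCRAM and bypasses the cache. */
static void dma_write(model_line *m, int off, uint8_t v, int len) {
    memset(m->ocram + off, v, (size_t)len);
}

/* Clean (flush): the WHOLE line is written back, not just the bytes
 * the CPU actually changed. */
static void clean_line(model_line *m) {
    if (m->dirty) memcpy(m->ocram, m->cache, LINE);
    m->dirty = 0;
}
```

The model runs as follows. The CPU fills the line, writes bufferA's bytes 0..15, and a DMA transfer then writes bufferB's bytes 16..31 in OCRAM. After that, `clean_line` for bufferA puts the stale cached copy of bytes 16..31 back over the DMA data. The mirror case is an invalidate issued for bufferB while bufferA's bytes are still dirty in that line, which discards the CPU's pending writes.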
In my case, when I use I2S without SPI DMA, I get clean audio. When I stress-test a SPI memory via DMA using test patterns, I get no data errors. When I use SPI memory DMA on audio buffers that have been allocated with receiveWritable(), I get corruption. If I disable caching in startup.c, the audio corruption goes away.
I seem to be able to fix the problem by restructuring the audio buffer pool to guarantee two conditions:
i) every data buffer in the audio blocks starts on a 32-byte aligned address
ii) no two data buffers share the same cache line.
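The two conditions can be checked mechanically. Below is a host-side sketch using a hypothetical helper `pool_is_dma_safe`. It assumes the blocks are laid out back to back with a fixed stride, and that each data buffer fits inside its block. The values in the usage correspond to the current 260-byte block and to the 288-byte block proposed below, both for AUDIO_BLOCK_SAMPLES of 128.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u

/* A pool of `count` blocks starts at `base` (assumed address), one
 * block every `stride` bytes; each block holds a `data_len`-byte data
 * buffer at offset `data_off`. Assumes data_off + data_len <= stride.
 * Returns true only if
 *   i)  every data buffer starts on a 32-byte aligned address, and
 *   ii) no two data buffers touch the same cache line. */
static bool pool_is_dma_safe(uintptr_t base, size_t stride,
                             size_t data_off, size_t data_len,
                             size_t count) {
    const uintptr_t mask = ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
    uintptr_t prev_last_line = 0;
    for (size_t i = 0; i < count; i++) {
        uintptr_t start = base + i * stride + data_off;
        uintptr_t first_line = start & mask;
        uintptr_t last_line = (start + data_len - 1u) & mask;
        if (start % CACHE_LINE_SIZE != 0)
            return false;                       /* condition i fails */
        if (i > 0 && first_line <= prev_last_line)
            return false;                       /* condition ii fails */
        prev_last_line = last_line;
    }
    return true;
}
```

Moving `data` to the front alone (stride still 260) is not enough. The second block's data would start at offset 260, which is 4 bytes past a line boundary, so the padding matters as well.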
The problem is that the current audio_block_t structure puts the data buffer on unaligned boundaries. With the default AUDIO_BLOCK_SAMPLES of 128, each block is 4 + 256 = 260 bytes, which is not a multiple of 32, so the data buffers of adjacent audio_block_t elements in the array share a cache line.
Code:
// first audio block will start at 0x2020 0000 in OCRAM
typedef struct audio_block_struct {
    uint8_t  ref_count;                  // 0x00
    uint8_t  reserved1;                  // 0x01
    uint16_t memory_pool_index;          // 0x02
    int16_t  data[AUDIO_BLOCK_SAMPLES];  // 0x04 (not on a 32-byte aligned address)
} audio_block_t;
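For the current layout, a quick host-side check with `offsetof` and `sizeof` confirms the arithmetic. This sketch assumes the default AUDIO_BLOCK_SAMPLES of 128 and a compiler that adds no padding to this struct, which holds for GCC on common targets. `data_offset` and `misalignment` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AUDIO_BLOCK_SAMPLES 128 /* library default, assumed here */

/* Current layout, as quoted above. */
typedef struct audio_block_struct {
    uint8_t  ref_count;
    uint8_t  reserved1;
    uint16_t memory_pool_index;
    int16_t  data[AUDIO_BLOCK_SAMPLES];
} audio_block_t;

#define DATA_BYTES (AUDIO_BLOCK_SAMPLES * sizeof(int16_t))

/* Byte offset of block n's data buffer from the start of the pool. */
static size_t data_offset(size_t n) {
    return n * sizeof(audio_block_t) + offsetof(audio_block_t, data);
}

/* How far block n's data buffer is from a 32-byte boundary. */
static size_t misalignment(size_t n) {
    return data_offset(n) % 32u;
}
```

Because 260 mod 32 = 4, each block's data buffer sits 4 bytes further from alignment than the previous one, so only one block in eight (n = 7, 15, ...) has an aligned data buffer. Block 0's data ends at byte 259 and block 1's data starts at byte 264, and both fall in the line covering bytes 256..287.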
The following code change to AudioStream.h fixes the cache corruption:
Code:
typedef struct audio_block_struct {
    int16_t  data[AUDIO_BLOCK_SAMPLES]; // relocate the data buffer to the start of the struct
    uint8_t  ref_count;
    uint8_t  reserved1;
    uint16_t memory_pool_index;
    uint8_t  padding[28]; // pad the audio block from 260 to 288 bytes (assumes AUDIO_BLOCK_SAMPLES == 128) so every data buffer in an audio_block_t array is 32-byte aligned
} audio_block_t;
// don't forget to put the aligned(32) attribute on the array itself to ensure it starts at the correct address.
#define AudioMemory(num) ({ \
    static DMAMEM audio_block_t data[num] __attribute__ ((aligned(32))); \
    AudioStream::initialize_memory(data, num); \
})
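A possible variation, not what the post proposes, is to put GCC's `aligned(32)` attribute on the struct type itself instead of hard-coding `padding[28]`, which is only correct when AUDIO_BLOCK_SAMPLES is 128. GCC then rounds `sizeof(audio_block_t)` up to a multiple of 32, so every element of an array of blocks starts on a cache line boundary. The `static_assert`s catch a layout change at compile time. This is a host-runnable sketch; on the Teensy, the pool would still be declared `DMAMEM` inside `AudioMemory()`. `pool` here is only a stand-in for that array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef AUDIO_BLOCK_SAMPLES
#define AUDIO_BLOCK_SAMPLES 128 /* library default, assumed here */
#endif

#define CACHE_LINE_SIZE 32

/* Data first; the type's 32-byte alignment makes the compiler pad
 * sizeof up to the next multiple of 32 (260 -> 288 for 128 samples). */
typedef struct __attribute__((aligned(CACHE_LINE_SIZE))) audio_block_struct {
    int16_t  data[AUDIO_BLOCK_SAMPLES];
    uint8_t  ref_count;
    uint8_t  reserved1;
    uint16_t memory_pool_index;
} audio_block_t;

static_assert(offsetof(audio_block_t, data) == 0,
              "data buffer must be the first member");
static_assert(sizeof(audio_block_t) % CACHE_LINE_SIZE == 0,
              "block size must be a multiple of the cache line size");

/* Stand-in for the pool AudioMemory() creates (no DMAMEM on a host).
 * The type's alignment also aligns the array, so no separate
 * aligned(32) on the array is needed. */
static audio_block_t pool[4];
```

The trade-off is that the padding becomes implicit. The static assertions document the intent that the explicit `padding[28]` member expresses in the post's version.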
Note that this requires a change to effect_freeverb.cpp, which has an odd way of creating a zeroed audio_block_t instance and no longer compiles with the changed struct. You can fix this compile error by replacing the #ifdefs with this:
Code:
static const audio_block_t zeroblock = { 0 };
In summary, every buffer in the Teensy libraries that is used with DMA must start on a 32-byte aligned boundary and have its size rounded up to a multiple of 32 bytes to avoid corrupting adjacent memory.
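For a standalone DMA buffer, the summary rule comes down to two things: align the start to 32 bytes and round the size up to a multiple of 32. The sketch below assumes GCC or Clang. The macro names `ROUND_UP_TO_LINE` and `CACHE_ALIGNED` are hypothetical and not part of the Teensy core. On the Teensy, such buffers would normally also carry `DMAMEM`, which is omitted here so the code runs on a host.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u

/* Round n up to the next multiple of the cache line size. */
#define ROUND_UP_TO_LINE(n) \
    ((((n) + CACHE_LINE_SIZE - 1u) / CACHE_LINE_SIZE) * CACHE_LINE_SIZE)

/* Start an object on a cache line boundary (GCC/Clang attribute). */
#define CACHE_ALIGNED __attribute__((aligned(CACHE_LINE_SIZE)))

/* The two 48-byte buffers from the earlier example, each aligned and
 * padded to 64 bytes so that no other object can share their lines. */
static uint8_t bufferA[ROUND_UP_TO_LINE(48u)] CACHE_ALIGNED;
static uint8_t bufferB[ROUND_UP_TO_LINE(48u)] CACHE_ALIGNED;
```

Both parts are needed. Alignment alone keeps a buffer's first line private, but only the rounded-up size guarantees that whatever the linker places after the buffer starts on a fresh line.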
...or I could be totally wrong...