Large DMA transfers

Don't see why not - since you're loading the entire TCD from a memory image each time, that image can have arbitrary contents including looping back to the start.
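To illustrate the looping idea, here's a host-side sketch. It is not Teensy code. `MockTcd` and `runMockEngine` are hypothetical stand-ins: a real eDMA TCD has many more fields, and on hardware the link lives in DLASTSGA with ESG set. Each step copies a whole descriptor image from memory, does its transfer, then loads the image its link points to. A chain whose last link points back at the first one therefore never runs out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for an eDMA transfer control descriptor.
// Only the fields needed to show "load the whole image, then follow the link".
struct MockTcd {
    const uint8_t *src;      // where this segment reads from
    std::size_t    nbytes;   // bytes moved by this descriptor
    const MockTcd *next;     // like DLASTSGA with ESG set: descriptor to load next
};

// Runs the mock engine for `segments` descriptor loads, appending every byte
// it "transfers" to `sink`.
inline void runMockEngine(const MockTcd *first, std::size_t segments,
                          std::vector<uint8_t> &sink) {
    MockTcd active = *first;                 // load the whole image from memory
    for (std::size_t s = 0; s < segments; ++s) {
        for (std::size_t i = 0; i < active.nbytes; ++i)
            sink.push_back(active.src[i]);   // the transfer itself
        assert(active.next != nullptr);      // a ring never hits a null link
        active = *active.next;               // scatter/gather: reload next image
    }
}
```

With two descriptors linked to each other, the output just keeps alternating between their source buffers for as many segments as you care to run.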

That does sound like you'll be hogging the DMA hardware permanently, though, which may give rise to all sorts of other problems. @jmarsh is your real expert here...
Yes, those channels will be "hogging" at a rate of 500 kHz or 1 MHz with the 4 MHz or 8 MHz clock, respectively (one byte per 8 clocks).
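The rates follow from the bit clock, assuming 8-bit frames so that each channel sees one DMA request per byte. A small sketch of the arithmetic (the function names are made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Assumes 8-bit SPI frames, so one DMA request per channel per byte.
constexpr uint32_t kBitsPerFrame = 8;

// Bytes per second each DMA channel must move at a given SPI bit clock.
constexpr uint32_t dmaByteRateHz(uint32_t spiClockHz) {
    return spiClockHz / kBitsPerFrame;
}

// Time between DMA requests, in nanoseconds.
constexpr uint32_t dmaRequestPeriodNs(uint32_t spiClockHz) {
    return 1000000000u / dmaByteRateHz(spiClockHz);
}

static_assert(dmaByteRateHz(4000000) == 500000, "4 MHz -> 500 kHz");
static_assert(dmaByteRateHz(8000000) == 1000000, "8 MHz -> 1 MHz");
```

So each channel gets a request every 2 us at 4 MHz and every 1 us at 8 MHz.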
The whole reason to stop it (the clock stops too) and dump the buffer is to save the stream state (the bits), and optionally load another one to replay.
The stream is a self-clocking loop, so starting and stopping is not harmful, even if different data is swapped in, since the bits signal where the actual start is.
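A rough sketch of why the restart point doesn't matter: if the stream carries something that marks its real start, that marker can be found from any position in the ring, including across the wrap-around. This assumes a hypothetical byte-aligned marker; the real signal's framing isn't described here and might need a bit-level search.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index in the circular buffer `ring` at which `marker` begins,
// searching across the wrap-around point, or -1 if it is not present.
// Hypothetical: assumes a known, byte-aligned start pattern.
inline long findStreamStart(const std::vector<uint8_t> &ring,
                            const std::vector<uint8_t> &marker) {
    const std::size_t n = ring.size();
    if (marker.empty() || marker.size() > n) return -1;
    for (std::size_t start = 0; start < n; ++start) {
        std::size_t k = 0;
        while (k < marker.size() && ring[(start + k) % n] == marker[k]) ++k;
        if (k == marker.size()) return static_cast<long>(start);
    }
    return -1;
}
```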
None of the bits are processed by the T4.1, that's all external hardware and multiplexing elsewhere.

Again, see the analogies above, but I will add another...

Picture an analog echo box with a 200 ms delay, where the output loops back to the input after some changes. In the analog case the changes are in amplitude; in this case, bits are changed.
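Putting numbers on the echo-box analogy, under the idealised assumption of back-to-back 8-bit frames with no gaps: a 200 ms loop at 8 MHz needs 200,000 bytes, which is the buffer size that comes up in this thread, while at 4 MHz the same buffer spans 400 ms. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Buffer bytes needed for a loop of `delayMs` at `spiClockHz`, assuming
// 8-bit frames streamed back-to-back with no gaps (an idealisation).
constexpr uint64_t loopBufferBytes(uint64_t spiClockHz, uint64_t delayMs) {
    return spiClockHz / 8 * delayMs / 1000;
}

// Loop length in milliseconds for a given buffer size and bit clock.
constexpr uint64_t loopDelayMs(uint64_t bufferBytes, uint64_t spiClockHz) {
    return bufferBytes * 1000 / (spiClockHz / 8);
}
```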
 
The problem I'm having here is how do I write an example that demonstrates repeatedly copying a 200,000-byte buffer in a loop without any CPU interaction? Loop the data through SPI and just break at random intervals to compare sent vs. received data?
Randomly, yeah (as in not in sync), and that's a non-issue.
It's not actually random; there's obviously a reason it is stopped: to save, and optionally replace, the current tracking with a new copy.
During the stop event, the clock is stopped, !CS goes high, and anything in SPI in either direction that is in-flight is thrown away. This is expected behavior.
The external hardware signals when it is OK to do that, and which operation to do (save or switch-to), and I get several milliseconds (~50 ms IIRC) to do that copy or swap. That is plenty of time, since the other hardware is busy preparing for the changed stream of bits.
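The stop-event handling could be modelled on a PC roughly like this. `StreamSlots`, `StopAction` and `onStop` are hypothetical names; the sketch only shows the save/switch-to copy done while the clock and DMA are stopped, not how the external hardware signals it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model: the DMA loop plays out of `active`; on a stop, the
// external hardware asks either to save the current stream or to switch
// to a previously saved one.
enum class StopAction { Save, SwitchTo };

struct StreamSlots {
    std::vector<uint8_t> active;               // buffer the DMA ring loops over
    std::vector<std::vector<uint8_t>> saved;   // stored copies

    // Must only be called while the clock and DMA are stopped.
    void onStop(StopAction action, std::size_t slot) {
        assert(slot < saved.size());
        assert(saved[slot].size() == active.size());
        if (action == StopAction::Save)
            std::memcpy(saved[slot].data(), active.data(), active.size());
        else
            std::memcpy(active.data(), saved[slot].data(), active.size());
    }
};
```

On the Teensy itself, after the CPU writes into a buffer the DMA will read, the data cache would also need flushing before restarting (the startup code later in the thread calls arm_dcache_flush_delete() on the sample buffer).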
The external device is NOT SPI; I'm just capturing its signal and replaying it back at a later time. I'm using SPI as a way to oversample a digital signal, so that it lines up as closely as possible with the trigger events that come out, +/-70 ns, but if 4 MHz turns out to be "good enough", it has a larger window of 140 ns. I need something to test against to determine whether this will even work.
 
...and here's the basic ASCII-art block flow chart of what I am doing, so you can get a better idea of the goal.

Screenshot_20250617_201004.png

Hopefully this helps. Yeah, it lacks some of the details, but this is the general concept.
 
Go look at the sketch I posted, it has details for the pins used in the simulation.
You can use any other random pin with a button to signal "stop".
Just remember to stop the clock, and that in-flight data doesn't count.

You can fill the buffer with some kind of pattern, it doesn't matter at this point.
If you want to "verify" that there's no corruption, you could do that while stopped.
The restart condition will clean out the SPI shift registers and toggle !CS.
Restarting DMA may begin at the beginning, or continue after cleaning out the SPI, it doesn't have to continue at the paused byte.
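For a loopback test (MOSI wired to MISO, which is an assumption for testing only, not part of the real setup), a check done while stopped might look like this. The pattern, `fillPattern`, `checkLoopback` and the `maxLag` bound are all hypothetical. The idea is that RX lags TX by a few bytes that were still in flight, so the check looks for the offset that lines the two buffers up and counts mismatches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills the TX ring with an easy-to-check pattern (any pattern will do).
inline void fillPattern(std::vector<uint8_t> &buf) {
    for (std::size_t i = 0; i < buf.size(); ++i)
        buf[i] = static_cast<uint8_t>((i * 7u + (i >> 8)) & 0xFF);
}

struct LoopbackCheck { std::size_t lag; std::size_t mismatches; };

// With MOSI wired to MISO, rx[i] should equal tx[(i - lag) mod n] for some
// small lag. Tries lags 0..maxLag and reports the one with fewest mismatches.
inline LoopbackCheck checkLoopback(const std::vector<uint8_t> &tx,
                                   const std::vector<uint8_t> &rx,
                                   std::size_t maxLag) {
    const std::size_t n = tx.size();
    assert(rx.size() == n && n > 0);
    LoopbackCheck best{0, n + 1};
    for (std::size_t lag = 0; lag <= maxLag && lag < n; ++lag) {
        std::size_t bad = 0;
        for (std::size_t i = 0; i < n; ++i)
            if (rx[i] != tx[(i + n - lag) % n]) ++bad;
        if (bad < best.mismatches) best = {lag, bad};
    }
    return best;
}
```

Bytes that were in flight at the stop may show up as a handful of mismatches next to the restart point; per the above, those don't count.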
 
Got it to go!
But there's a caveat... I had to do this:

Code:
// because the built-in one wants the parent class "DMABaseClass"...
void replaceSettingsOnCompletion(DMASetting *which, DMASetting *next) {
        which->TCD->DLASTSGA = (int32_t)next->TCD;
        which->TCD->CSR &= ~DMA_TCD_CSR_DONE;
        which->TCD->CSR |= DMA_TCD_CSR_ESG;
}

Bug?
 
Did you? DMASetting is based on DMABaseClass, and thus inherits all its public methods, so you ought to be able to do
Code:
DMASetting mySetting, nextSetting;

mySetting.replaceSettingsOnCompletion(nextSetting);
Certainly works in my code...
 
The only case I know where DMASetting doesn't work correctly as DMABaseClass is when you assign one DMASetting to another, the compiler will do a default member-by-member copy instead of using the DMABaseClass-specific assignment operator...
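Here's a simplified mock of that pitfall, using hypothetical `MockBase`/`MockSetting` classes rather than the real DMAChannel.h. A derived class that only declares `operator=(const MockBase &)` still gets an implicitly-declared copy assignment operator, and for a `MockSetting` right-hand side that one is the better match. It copies members one by one, including the descriptor pointer, so the target ends up pointing at the source's descriptor.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Simplified mocks, NOT the real DMAChannel.h classes.
struct MockTcd { uint32_t words[8]; };

class MockBase {
public:
    MockTcd *TCD = nullptr;           // points at the descriptor in use
protected:
    MockBase() = default;
};

class MockSetting : public MockBase {
public:
    MockSetting() { TCD = &tcddata; }
    // Intended assignment: copy descriptor contents, keep our own TCD pointer.
    MockSetting &operator=(const MockBase &rhs) {
        std::memcpy(TCD, rhs.TCD, sizeof(MockTcd));
        return *this;
    }
    MockTcd tcddata{};
    // Not declared here: operator=(const MockSetting &). The compiler supplies
    // one that copies TCD (the pointer) and tcddata member by member.
};
```

Casting the right-hand side to `const MockBase &` selects the intended overload; declaring an `operator=(const MockSetting &)` that forwards to it would be another way to avoid the trap.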

Yeah, assignment from the pointer in the array kicks up the error when setting up TCDs for scatter/gather.
What I have is working brilliantly, but it might be a consideration to add a method that accepts the DMASetting class too.
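Such a convenience overload might look something like this, again on hypothetical mock classes rather than the real DMABaseClass/DMASetting. It just forwards a pointer to the existing reference version, so both `&array[i]` and `array[i]` compile. The real method also clears DMA_TCD_CSR_DONE, as in the workaround above; the mock only records the link and an ESG flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mock of the relevant part of the API, not DMAChannel.h.
struct MockTcd { uintptr_t DLASTSGA = 0; bool esg = false; };

class MockBase {
public:
    MockTcd *TCD = nullptr;
    void replaceSettingsOnCompletion(const MockBase &settings) {
        TCD->DLASTSGA = reinterpret_cast<uintptr_t>(settings.TCD);
        TCD->esg = true;              // stands in for setting DMA_TCD_CSR_ESG
    }
    // Suggested overload: accept a pointer and forward to the reference version.
    void replaceSettingsOnCompletion(const MockBase *settings) {
        assert(settings != nullptr);
        replaceSettingsOnCompletion(*settings);
    }
};

class MockSetting : public MockBase {
public:
    MockSetting() { TCD = &tcddata; }
    MockTcd tcddata;
};
```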
 
Try it with an array of them, like so:
Code:
DMASetting DMASettings_0_RX8[8];
DMASettings_0_RX8[8].replaceSettingsOnCompletion(&DMASettings_0_RX8[0]);

You'll see the complaint.
Code:
error: no matching function for call to 'DMASetting::replaceSettingsOnCompletion(DMASetting*)'
note: candidate: void DMABaseClass::replaceSettingsOnCompletion(const DMABaseClass&)
  void replaceSettingsOnCompletion(const DMABaseClass &settings) {
       ^
 
Yes - replaceSettingsOnCompletion() needs a reference to a DMABaseClass as its argument, not the address of a DMASetting, which is what it's complaining about. Try:
C++:
DMASetting DMASettings_0_RX8[8];
DMASettings_0_RX8[8].replaceSettingsOnCompletion(DMASettings_0_RX8[0]);
 
Yeah, that actually compiled...
...And the code still works too!
Thanks!
 
DMASettings_0_RX8[8] would be out of bounds in that case, though. Does that not give a warning?
Oddly, no. Yeah, that was a typo...
I don't set up that way regardless. I use a loop instead because of all the TCDs I need.
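Setting the TCDs up in a loop also sidesteps the out-of-bounds slip. A sketch that links N settings into a ring with `(i + 1) % N`, using a hypothetical `MockSetting` in place of DMASetting and a `next` pointer in place of DLASTSGA:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct MockTcd { const MockTcd *next = nullptr; };   // stands in for DLASTSGA

struct MockSetting {                                // hypothetical, not DMASetting
    MockTcd tcddata;
    MockTcd *TCD = &tcddata;
    void replaceSettingsOnCompletion(const MockSetting &s) { TCD->next = s.TCD; }
};

// Links settings[0] -> settings[1] -> ... -> settings[N-1] -> settings[0].
// Indexing with (i + 1) % N keeps every access in bounds.
template <std::size_t N>
void linkRing(MockSetting (&settings)[N]) {
    for (std::size_t i = 0; i < N; ++i)
        settings[i].replaceSettingsOnCompletion(settings[(i + 1) % N]);
}
```

Taking the array by reference lets the template deduce N, so the loop bound can't drift out of sync with the array size.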
 
Just finished a 24-hour test run on the DMA code, with no corruption.

Now for the assessment and notes.
Some hurdles that I needed to clear were:

It turns out that you need to enable DMA requests for TX and RX separately, and use compiler memory barriers (asm volatile("" : : : "memory")) to keep those writes in the correct order.
This is needed so that there's no bus contention, and so that TX gets to load ahead before the clock is turned on. This is a known issue with some code, where GCC optimizes by rearranging it. In some places I also use the DSB instruction, which adds a helpful delay for the DMA to do its thing; forcing outstanding writes to complete is an OK side effect, and better than using a NOP.

The SPI TX buffer watermarks can be at the lowest setting, which is zero, at 4 MHz. At 8 MHz, both need to be set to 1. To me this indicates a latency of more than 62.5 ns during contention. Not doing this causes corruption: data is lost.

Having the TCDs in the same memory section also causes more contention latency and increases data loss.
 
Here's the basic chunk of code that starts everything up.

C++:
        arm_dcache_flush_delete(sample_buffer0, MAX_BUFFER_SIZE);
        if(!started) {
                spi_0_regs->CR |= LPSPI_CR_MEN; //Enable SPI Module!
                rx.enable();
                tx.enable();
                asm volatile("" : : : "memory");
                spi_0_regs->DER = LPSPI_DER_TDDE; // TX DMA Request Enable
                asm volatile("" : : : "memory");
                asm("dsb"); // helpful delay
                spi_0_regs->DER = LPSPI_DER_RDDE | LPSPI_DER_TDDE; //RX/TX DMA Request Enable
                asm volatile("" : : : "memory");
                asm("dsb"); // helpful delay
                started = true;
                digitalWriteFast(SPIS_CSO, LOW); // should reset input and I hope output...
                asm volatile("" : : : "memory");
                asm("dsb"); // helpful delay
                analogWrite(CLOCK_OUT, 4); // 3 or 4, close enough
        }
 