Optimizing GPIO Input Speed

LaserSam

Hi:

I would like to input from a GPIO location (specifically GPIO6_PSR, 0x42004008) as quickly as possible and transfer the words read to memory (or cache) uninterrupted. Up to 10,000 words or more may be required. Think of data acquisition in a digital storage oscilloscope, though that is not what this is for.

This is the assembly code loop currently being used:

"nextdata: \n\t" // loop entry point (target of the branch below)
"ldr r3, [r8] \n\t" // load value of GPIO6_PSR into r3
"str r3, [r9], #4 \n\t" // store value into gpioDataArray, then advance the pointer by 4 bytes
"cmp r9, r10 \n\t" // compare the write pointer against the end-of-buffer limit
"ble nextdata \n\t" // loop if limit not reached


This works great, but the performance is pathetic given the 600 MHz CPU - around 15 ns for each data point. That's roughly 9 clock cycles.
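
As a quick sanity check on the cycle count, the conversion from time per sample to CPU cycles can be written out. This is only a sketch of the arithmetic; `ns_to_cycles` is an illustrative name, and 600 MHz is the clock quoted above.

```cpp
#include <cstdint>

// Convert a duration in nanoseconds to CPU clock cycles, rounded to nearest.
// cpu_hz is the core clock in Hz (600 MHz is the figure quoted in the post).
constexpr std::uint64_t ns_to_cycles(std::uint64_t ns, std::uint64_t cpu_hz) {
    return (ns * cpu_hz + 500000000ULL) / 1000000000ULL;
}

// 15 ns per sample at 600 MHz is 9 cycles per ldr/str/cmp/branch iteration.
static_assert(ns_to_cycles(15, 600000000ULL) == 9, "15 ns at 600 MHz");
```

By the same arithmetic, 10 or 11 cycles would correspond to roughly 17 to 18 ns per sample.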

Unwinding the loop and just creating a block of repeating (ldr/str)s to eliminate the test and branch only improves performance by 10 or 15 percent.

Is there any better way?

Thanks,

--- sam
 
Maybe I2S / SAI could be abused for this purpose? But you could get at most 4 pins because SAI1 supports up to 4 inputs. In theory it should work up to 25 MHz sample rate, or maybe faster if you're ok with overclocking SAI above its rated specs.

The PDM code in the audio library might be close. It reads the raw bits and puts them into a huge FIR filter.
 
Hi again. Sorry for the long delay. We decided to go with DMA, thanks to KurtE for the suggestion.

The sketch below is a stripped-down version that does everything, except that the transfers never take place! For this test, it is supposed to DMA 32 words from GPIO6 to the first half of a 64-word array. It hangs at the dma.enable(); command. If that is commented out, it goes on to print the destBuffer contents (which were previously filled with 0xDEADFEEDs).

Code:
/*********** Program to test high speed DMA from GPIO to memory *************/

#include <Arduino.h>
#include "DMAChannel.h"
 
#define BUFFER_SIZE 64     // Total buffer size for samples
#define TRANSFER_SIZE 32   // Transfer size - do half for this test
#define GPIO6 0x42004008   // Address of GPIO PSR

// DMA buffer
DMAMEM static uint32_t destBuffer[BUFFER_SIZE] __attribute__((aligned(32)));

// Completion flag
volatile bool transferComplete = false;
int GPIO_Print = GPIO6; // Only for the print info

// DMA channel
DMAChannel dma;

// ISR for DMA completion
void dma_complete_isr(void) {
  transferComplete = true;
  dma.clearInterrupt();
}

uint32_t Start_Time_Stamp = 0; // Measure time spent in DMA
uint32_t End_Time_Stamp = 0;

void setup() {
  // Initialize serial
  Serial.begin(115200);
  delay(3000);
  Serial.println("\n******** Direct DMA GPIO Test *******\n");

  // Print info
  Serial.print("GPIO6 Address: 0X");
  Serial.println(GPIO_Print, HEX);
  Serial.print("Transfer Size: ");
  Serial.print(TRANSFER_SIZE);
  Serial.println(" Words");

  // Run DMA capture
  runCapture("MEAS1");
}

// Function to run a DMA capture
void runCapture(const char *state) {
  // Clear buffer with known pattern
  for (int i = 0; i < BUFFER_SIZE; i++) {
    destBuffer[i] = 0xDEADFEED;
  }

  // Initialize DMA
  dma.begin(true);
 
  // DMA source
  dma.TCD->SADDR = (void*)GPIO6;      // GPIO6_PSR
  dma.TCD->SOFF = 0;                  // Don't increment source address
  dma.TCD->ATTR_SRC = 2;              // 32-bit source size
  dma.TCD->NBYTES = 4;                // 4 bytes per transfer
  dma.TCD->SLAST = 0;                 // Don't adjust source address at end of major loop

  dma.transferSize(TRANSFER_SIZE);
  dma.destinationBuffer(destBuffer, BUFFER_SIZE);
  dma.disableOnCompletion();
  dma.attachInterrupt(dma_complete_isr);
  dma.interruptAtCompletion();

  // Setup trigger
  dma.triggerContinuously();

  // Flush cache
  arm_dcache_flush_delete(destBuffer, sizeof(destBuffer));

  // Start DMA
  Serial.print("\nStarting DMA capture\n");

  transferComplete = false;

  dma.enable();
  Start_Time_Stamp = micros();
  while (!transferComplete) {}; // Wait for DMA to complete
  End_Time_Stamp = micros();

  Serial.print("Elapsed Time: ");
  Serial.print(End_Time_Stamp - Start_Time_Stamp);
  Serial.print(" Microseconds");

  // Invalidate cache
  arm_dcache_flush_delete(destBuffer, sizeof(destBuffer));

  // Print results
  Serial.println("\n\nResults:");
  Serial.println();

  for (int i = 0; i < BUFFER_SIZE; i++) {
    Serial.print(i);
    Serial.print(": 0x");
    Serial.println(destBuffer[i], HEX);
  }
}

void loop() {
  // Do nothing
  delay(1000);
}

So what bit isn't getting set correctly? Any assistance appreciated.

Thanks,

--- sam
 
It looks like you've used transferSize() when you've intended to use transferCount(). You've also got the second parameter wrong for destinationBuffer(); it should be in bytes, not elements.
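
To spell out the units involved: per the reply above, transferCount() takes a number of transfers, while the length passed to destinationBuffer() is in bytes. The sketch below only shows the conversion, using the sizes from the posted sketch; `words_to_bytes` is an illustrative helper, not part of DMAChannel.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBufferWords = 64;    // BUFFER_SIZE in the posted sketch
constexpr std::size_t kTransferWords = 32;  // TRANSFER_SIZE in the posted sketch

// Length in bytes of a run of 32-bit samples.
constexpr std::size_t words_to_bytes(std::size_t words) {
    return words * sizeof(std::uint32_t);
}

// The 64-word buffer is 256 bytes long; the 32-word test transfer covers 128 of them.
static_assert(words_to_bytes(kBufferWords) == 256, "whole buffer in bytes");
static_assert(words_to_bytes(kTransferWords) == 128, "half buffer in bytes");
```

In the posted sketch that would presumably mean passing TRANSFER_SIZE to transferCount() and a byte length such as sizeof(destBuffer) to destinationBuffer(); this has not been tested here.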
 
Hi:

I just replaced:

Code:
  dma.transferSize(TRANSFER_SIZE);
  dma.destinationBuffer(destBuffer, BUFFER_SIZE);

with
 
  dma.transferCount(TRANSFER_SIZE);
  dma.destinationBuffer(destBuffer, BUFFER_SIZE * 4);

Was that what you meant?

It prints:

******** Direct DMA GPIO Test *******

GPIO6 Address: 0X42004008
Transfer Size: 32 Words

Starting DMA capture

and hangs there.

Thanks for your support!

--- sam
 
Hi, sorry for the delay. Here is a followup. We have DMA functional but it is very slow, almost a digitally precise 100 ns per 32-bit word.

Here is a stripped down version of the code that includes a memory-memory copy of sequential numbers and a GPIO2 to memory copy, which will be whatever is on the GPIO2_PSR (Pin Status Register). The full copy is 10,000 32 bit words; as set up, fewer are actually printed to the Serial Monitor:

Code:
//           Teensy 4.0 Program to test high speed DMA from memory and GPIO2 to memory.          //

#include <Arduino.h>
#include "DMAChannel.h"

#define Print_Verbose 1           // If 1, print verbose raw buffer data
#define VERBOSE_SIZE 1024         // Number of values to print: first 32, then 1 every 32nd sample

#define BUFFER_SIZE 10000         // Total buffer size in 32 bit words for samples
#define TRANSFER_SIZE 10000       // Transfer size in 32 bit words

float Start_Time = 0;             // Measure time spent in transfer (us)
float End_Time = 0;
float Elapsed_Time_Float = 0;     // End Time - Start Time (us)
float Time_per_Copy_Float = 0;    // (ns)
float Transfer_Size_Float = 0;    // Points

// DMA buffer
static uint32_t srcBuffer[BUFFER_SIZE] __attribute__((section(".dtcm"), aligned(32)));
static uint32_t destBuffer[BUFFER_SIZE] __attribute__((section(".dtcm"), aligned(32)));

// Completion flag
volatile bool transferComplete = false;

// DMA channel
DMAChannel dma;

void setup() {

  pinMode(LED_BUILTIN, OUTPUT);

  // Initialize serial
  Serial.begin(115200);

  // Print banner
  Serial.print("\n\n\n\n\n\n\n\n******** Direct DMA Tests from Memory and GPIO to Memory *******\n");

  /***** Transfer of sequential numbers from memory to memory. *****/
  Serial.printf("\n  Transfer of %d 32 bit words from srcBuffer to destBuffer.", TRANSFER_SIZE);
  Serial.println(" Source: srcBuffer array index; Destination prefilled with 0xDEADFEEDs");

  // Initialize buffers: sequential values in srcBuffer, constant 0xDEADFEEDs in destBuffer
  for (int i = 0; i < BUFFER_SIZE; i++) {
    srcBuffer[i] = i;
    destBuffer[i] = 0xDEADFEED;
  }

  // Initialize DMA memory to memory transfer
  dma.begin(true);
  dma.sourceBuffer(srcBuffer, BUFFER_SIZE);  // BUFFER_SIZE doesn't appear to make a difference
    
  // Flush source cache - may be needed to not keep copying old stuff
  arm_dcache_flush_delete(srcBuffer, sizeof(srcBuffer));
  Capture_and_Analyze();

  /***** Transfer of GPIO2_PSR to memory. *****/

  Serial.printf("\n  Transfer of %d 32 bit words from GPIO2_PSR to destBuffer:", TRANSFER_SIZE);
  Serial.println(" Source: GPIO2_PSR; Destination prefilled with 0xDEADFEEDs");

  // Refill destBuffer with 0xDEADFEEDs (srcBuffer is not used in this test)
  for (int i = 0; i < BUFFER_SIZE; i++) {
    destBuffer[i] = 0xDEADFEED;
  }

  // Initialize DMA for GPIO2 transfer.  PSR = "Pin State Register".  Do NOT use GPIO_DR as it is really for output.
  dma.begin(true);
  dma.sourceBuffer(&GPIO2_PSR, BUFFER_SIZE);  // BUFFER_SIZE doesn't appear to make a difference
  dma.TCD->SOFF = 0;                          // Don't increment source address for GPIO

  // Do it!
  Capture_and_Analyze(); // Analysis has been stripped out to save space. ;-)

/***** End *****/
  Serial.print("\nTests Complete.\n\n");
}

// ISR for DMA completion
void dma_complete_isr(void) {
  transferComplete = true;
  dma.clearInterrupt();
}

#define DMA_CHANNEL 0  // Choose DMA Channel 0

// Capture and printout of some values, no analysis.
void Capture_and_Analyze() {
  arm_dcache_flush_delete(destBuffer, sizeof(destBuffer)); // Flush destination cache - needed to force copy
  dma.destinationBuffer(destBuffer, TRANSFER_SIZE * 4);    // Must be 4x? 
  dma.transferSize(4);                                     // 4 bytes (32-bit)
  dma.disableOnCompletion();
  dma.attachInterrupt(dma_complete_isr);                   // ISR sets the completion flag
  dma.interruptAtCompletion();
  dma.triggerContinuously();

  Serial.print("\n    Starting DMA transfer. ");

  transferComplete = false;
  Start_Time = micros();  // Wait for completion or timeout
  dma.enable();  // Do it!

  while (!transferComplete) {
    if ((micros() - Start_Time) > 10000) {
      Serial.printf(">>>>>>>> DMA TRANSFER TIMED OUT <<<<<<<<\n");
      return;
    }
  }

  End_Time = micros();
 
  // Display transfer time, time/copy, and copy frequency
  Serial.printf("Elapsed Time: ");
  Elapsed_Time_Float = End_Time - Start_Time;
  Serial.print(Elapsed_Time_Float);
  Serial.printf(" Microseconds; Time per Copy: ");
  Transfer_Size_Float = TRANSFER_SIZE;
  Serial.print(Time_per_Copy_Float = (1000 * Elapsed_Time_Float) / Transfer_Size_Float);
  Serial.printf(" ns; Transfer Rate: ");
  Serial.print(1000 / Time_per_Copy_Float);
  Serial.printf(" MW/s.\n");

#if Print_Verbose == 1
  // Display the raw captured data in a readable format
  Serial.printf("\n   Sample    Value\n  --------------------\n");

  // Print each sample with pin data bit values
  for (uint32_t i = 0; i < VERBOSE_SIZE; i++) {
    // Only print first 32 samples, then every 32nd sample to avoid flooding serial
    if (i < 32 || i % 32 == 0) {
      uint32_t value = destBuffer[i];
      Serial.printf("    %4d   0x%08X\n", i, value);  // Print sample number and raw hex value     
    }

    // Print a summary message if we're skipping a lot of data
    if (i == 32) {
      Serial.println("  ... (printing every 32nd sample) ...");
    }
  }
#endif
}

// Show activity using LED_BUILTIN indicating that code hasn't crashed totally
void loop() {
      digitalWrite(LED_BUILTIN, !digitalRead(LED_BUILTIN));
      delay(100);
}

This should run on any Teensy 4.0. (Note that one reason there was a problem with the previous code was that GPIO6-9 do not work with DMA.)

But it seems that the trigger for the DMA is stuck at 10 MHz.

ChatGPT (!!) thought it had a great solution using the Periodic Interrupt Timer (PIT), but it turned out that it would not even compile and was apparently incomplete. I'll put a slightly enhanced compilable and runnable version here for reference:

Code:
#include "DMAChannel.h"
#include "imxrt.h"               // Register definitions for Teensy 4.0
#define DMAMUX_SOURCE_PIT0   68  // PIT Timer 0 trigger

DMAChannel dma;

#define BUFFER_SIZE 1024
volatile uint32_t srcBuffer[BUFFER_SIZE] __attribute__((section(".dtcm"), aligned(32)));
volatile uint32_t destBuffer[BUFFER_SIZE] __attribute__((section(".dtcm"), aligned(32)));

volatile bool transferComplete = false;  // volatile: written in the ISR, polled in setup()
int Start_Time = 0;

void setup() {
    Serial.begin(115200);
    while (!Serial);

    pinMode(LED_BUILTIN, OUTPUT);
    digitalWrite(LED_BUILTIN, 0);

    Serial.print("\n\n\n***** Teensy 4.0 DMA Test of 1024 32 bit Words using PIT0 Timer for Triggering *****\n");

    // Fill source buffer with test data value = buffer index
    for (int i = 0; i < BUFFER_SIZE; i++) {
        srcBuffer[i] = i;
    }

    setupPIT0();                                  // Initialize PIT0 Timer
    setupDMA();                                   // Initialize DMA
    Start_Time = micros();
    digitalWrite(LED_BUILTIN, 1);                 // Turns on when DMA starts
    while (!transferComplete) {
      if ((micros() - Start_Time) > 10000) {
        Serial.printf("\n  >>>>>>>> DMA TRANSFER TIMED OUT <<<<<<<<\n\n");
        return;
      }
    }
    Serial.printf("\n  DMA Complete.  Elapsed Time = %d µs.\n\n", micros() - Start_Time);
}

void setupPIT0() {
    CCM_CCGR1 |= CCM_CCGR1_PIT(CCM_CCGR_ON);  // Enable PIT clock
    PIT_MCR = 0x00;              // Enable PIT module (disable freeze in debug mode)
    PIT_LDVAL0 = 100;            // Reload value (adjust for speed)
    PIT_TCTRL0 = PIT_TCTRL_TEN;  // Enable PIT0 (no interrupts needed)
    Serial.print("\n  PIT0 Timer Setup Done and PIT0 DMA Trigger Set Up.");
}

void setupDMA() {
    dma.begin();
    dma.sourceBuffer(srcBuffer, sizeof(srcBuffer));         // Source buffer
    dma.destinationBuffer(destBuffer, sizeof(destBuffer));  // Destination buffer
    dma.transferSize(4);                                    // 4 bytes per transfer (32-bit)
    // dma.priority(0);                                       // Highest priority - dma.priority is not defined in the default libraries
    dma.disableOnCompletion();                              // Stop DMA when the transfer completes

    // dma.triggerContinuously() works; dma.triggerAtHardwareEvent(DMAMUX_SOURCE_PIT0) does not
    dma.triggerAtHardwareEvent(DMAMUX_SOURCE_PIT0);         // Use PIT0 as DMA trigger
//    dma.triggerContinuously();                              // Trigger continuously at around 10 MHz

    dma.attachInterrupt(dma_complete_isr);                  // ISR sets the completion flag
    dma.interruptAtCompletion();
    Serial.println("  Starting DMA Transfer.");
    transferComplete = false;
    dma.enable();                                           // Enable DMA
}

// ISR for DMA completion
void dma_complete_isr(void) {
  transferComplete = true;
  digitalWrite(LED_BUILTIN, 0);
  dma.clearInterrupt();
}
void loop() {
  delay(100);  // Hang out here but don't hog all the cycles. ;-)
}

The specific ChatGPT errors are (1) that "DMAMUX_SOURCE_PIT0" was not defined but was found elsewhere (although it is not known if it is correct) and (2) that "dma.priority(0)" was also not defined but doesn't appear to be needed (famous last words!). But when compiled with those fixes it did nothing.

This is virtually identical to my code except for the triggering. In fact, just uncomment the line with "dma.triggerContinuously();" and it runs with the pokey 10 MHz transfer rate. So the PIT trigger is not being set up correctly.

It's probably a handful of multiplexer bits (or just 1!) that need to be set.

Thanks for your support!

--- sam
 
Using a PIT isn't going to be faster than continuous triggering; it's meant for situations where you want to limit the transfer frequency below the maximum.

You're using micros() for timing here, which means the minimum measured time difference will be 0.000001s - that means the highest frequency you can accurately measure is only 1MHz. I'm not sure how you're getting 10MHz (possibly it's not transferring as much data as you intended) but you're going to need to use the ARM cycle counter if you want better timing measurements.
 
Hi:

The micros()-Start_Time is for the entire transfer of 1024 32-bit words in this case, just over 100 microseconds, ~100 ns per 32-bit word.
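
The per-word figure comes from dividing the total elapsed time by the number of words moved. A minimal sketch of that calculation follows; the function names are illustrative, and 102.4 µs is an assumed round figure standing in for "just over 100 microseconds".

```cpp
// Average time per word in nanoseconds, from a total elapsed time in microseconds.
constexpr double ns_per_word(double elapsed_us, unsigned words) {
    return elapsed_us * 1000.0 / words;
}

// Transfer rate in millions of words per second (i.e. words per microsecond).
constexpr double mwords_per_second(double elapsed_us, unsigned words) {
    return words / elapsed_us;
}

// 1024 words in an assumed 102.4 us is about 100 ns per word, or about 10 MW/s.
static_assert(ns_per_word(102.4, 1024) > 99.9 && ns_per_word(102.4, 1024) < 100.1,
              "about 100 ns per word");
static_assert(mwords_per_second(102.4, 1024) > 9.9 && mwords_per_second(102.4, 1024) < 10.1,
              "about 10 MW/s");
```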

As to the PIT, someone else suggested the same usage, and thus it is not applicable, by itself at least.

So what is the solution? One of the primary benefits of using DMA is supposed to be fast behind-the-scenes transfers, yet no one (including ChatGPT) seems to have a solution! ;-(

One would think that the continuous triggering would do it but it doesn't.

Thanks,


--- sam
 
Just to comment on my comment. It seems utterly amazing that in over 5 years (?), there is no easily accessible example of how to code fast Teensy 4.0 DMA for memory-to-memory copy or GPIO-memory copy!

Does anyone care to comment on my comment on my comment? Anyone???? :)

--- sam
 
Have you tried the example programs from the thread below? Searching with google usually works a lot better than the forum search feature.


EDIT: I modified Paul's example to copy 1024 x 32-bit words and added code to display the execution time, and I get 102.6 us, which matches your result. Is there some reason you think it should be faster?

Code:
#include <Arduino.h>
#include <DMAChannel.h>

DMAChannel m2m(false);

// !! NOTE 32-byte alignment for dcache !!
int32_t buffer1[128 * 9] __attribute__ ((used, aligned(32)));
int32_t buffer2[128 * 8] __attribute__ ((used, aligned(32)));

void setup() {
  Serial.begin(9600);
  while(!Serial);
  Serial.println("Go..");

  for (int j = 0; j < 128 * 9; j++) {
    buffer1[j] = j;
  }
  // !! NOTE !!
  arm_dcache_flush(buffer1, sizeof(buffer1));
 
  Serial.print("buffer1[10] = ");
  Serial.println(buffer1[10]);

  m2m.begin();
  //m2m.sourceCircular(buffer1, sizeof(buffer1));
  m2m.sourceBuffer(buffer1,sizeof(buffer1));
  m2m.destinationBuffer(buffer2, sizeof(buffer2));
  //m2m.transferCount(sizeof(buffer2));
  m2m.enable();
  m2m.disableOnCompletion();
  m2m.triggerContinuously();

  uint32_t Start = ARM_DWT_CYCCNT;
  while (!m2m.complete()) {
    // wait
  }
  uint32_t Cycles = ARM_DWT_CYCCNT - Start;
  Serial.printf( "%1.3lf us\n", Cycles*(1E6/F_CPU_ACTUAL) );
 
  // !! NOTE !!
  arm_dcache_delete(buffer2, sizeof(buffer2));
 
  Serial.print("buffer2[10] = ");
  Serial.println(buffer2[10]);
}

void loop() {
}
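
The cycle-counter timing in this example relies on two pieces of arithmetic: an unsigned subtraction of two counter readings, and a scale factor from cycles to microseconds. A host-side sketch of both is below. The function names are illustrative and the 600 MHz clock is an assumption; on the Teensy, the example above reads ARM_DWT_CYCCNT and scales by F_CPU_ACTUAL.

```cpp
#include <cstdint>

// Elapsed cycles between two readings of a free-running 32-bit counter.
// Unsigned subtraction is modulo 2^32, so one wrap between readings is handled.
constexpr std::uint32_t elapsed_cycles(std::uint32_t start, std::uint32_t end) {
    return static_cast<std::uint32_t>(end - start);
}

// Convert a cycle count to microseconds for a given core clock in Hz.
constexpr double cycles_to_us(std::uint32_t cycles, double cpu_hz) {
    return cycles * (1e6 / cpu_hz);
}

// The counter wrapped between the two readings: 0x10 cycles before the wrap
// plus 0x10 after it gives 0x20 elapsed cycles.
static_assert(elapsed_cycles(0xFFFFFFF0u, 0x00000010u) == 0x20u, "wrap handled");

// 61,560 cycles at an assumed 600 MHz is about 102.6 us.
static_assert(cycles_to_us(61560u, 600e6) > 102.59 && cycles_to_us(61560u, 600e6) < 102.61,
              "about 102.6 us");
```

A 32-bit counter at 600 MHz wraps about every 7.2 seconds, so this approach only works for intervals shorter than that.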
 
Hi:

At least someone is taking this seriously enough to hack code. :)

One of the example programs is for audio with a sampling rate of only 0.1764 MHz so that isn't directly relevant unless also modified.

Another utilizes one of the Timers (TMR4). Unfortunately, all the TMRs are tied in my application, though it may be instructive to see if that can be set up to transfer faster.

As far as why I think it should be faster? A 1970s minicomputer could do faster DMA! A Teensy 4.0 running a memory-to-memory copy loop in C will do almost 600 M 32-bit words/s.

Also of note is that the Teensy 4.0 DMA speed isn't much affected by whether it is using "dma.triggerContinuously()" and left to do its thing or "dma.triggerManual()" repeated in a tight loop. Thus it seems there is some throttle elsewhere.

Thanks,

--- sam
 
You realize the DMA engine runs on a much lower clock (IPG speed = ~150MHz) than the cpu? And the same applies to the low-speed GPIO registers...
I'm not sure why you're focusing on memory-to-memory copies; the thread started out asking how to read continuously from GPIOs.
 
Yes, I'm aware of that. 150 MHz is still 15 times the DMA speed!
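
The factor of 15 is the ratio of the quoted bus clock to the observed transfer rate, that is, the number of bus clock cycles that pass per word moved. A sketch of that arithmetic follows; the function name is illustrative, and 150 MHz and 10 MW/s are the approximate figures from this thread.

```cpp
// Bus clock cycles that elapse per word transferred.
constexpr double bus_cycles_per_word(double bus_hz, double words_per_second) {
    return bus_hz / words_per_second;
}

// About 150 MHz bus clock and about 10 million words per second: 15 cycles per word.
static_assert(bus_cycles_per_word(150e6, 10e6) > 14.9 && bus_cycles_per_word(150e6, 10e6) < 15.1,
              "about 15 bus cycles per word");
```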

Just for info. here are my benchmarks for memory-memory and GPIO-memory transfers:

******** Teensy 4.0 Memory and GPIO Copy 10,000 Word Speed Test *******

Memory to memory copy using c code:
Elapsed Time: 18 Microseconds. Time per Copy: 1.80 ns. Transfer Rate: 555.56 MW/s.

GPIO2_DR to memory copy using c code:
Elapsed Time: 468 Microseconds. Time per Copy: 46.80 ns. Transfer Rate: 21.37 MW/s.

GPIO7_DR to memory copy using c code:
Elapsed Time: 151 Microseconds. Time per Copy: 15.10 ns. Transfer Rate: 66.23 MW/s.

Memory to memory copy using assembly code:
Elapsed Time: 33 Microseconds. Time per Copy: 3.30 ns. Transfer Rate: 303.03 MW/s.

GPIO2_DR to memory copy using assembly code:
Elapsed Time: 468 Microseconds. Time per Copy: 46.80 ns. Transfer Rate: 21.37 MW/s.

GPIO7_DR to memory copy using assembly code:
Elapsed Time: 151 Microseconds. Time per Copy: 15.10 ns. Transfer Rate: 66.23 MW/s.

Thanks,

--- sam
 
If you're using micros() to do your timing, try using ARM_DWT_CYCCNT as shown in my example. You'll get an exact cycle count, and I suspect your C-code words/sec will be closer to 600M, and assembly will be 300M.
 
The CPU has the advantage of either using its data cache or directly accessing the tightly coupled memory (one-cycle access), the DMA engine has to perform bus accesses for every transfer. You're comparing apples to oranges here.

(".dtcm" is not a valid section name. I assume that's more AI generated slop.)
 
As a hardware designer (which I used to be in a former life), I would be absolutely ashamed if I was involved in developing a system where DMA was only capable of 10 MHz on a system with even a 150 MHz clock speed. Further, we know (see the benchmarks above) that transfers from GPIO to memory take place at around 66 MHz and they have to go through its internal bus. This is not an Atmega Nano! I suppose anything is possible, but to accept that the super sophisticated DMA subsystem is throttled to 10 MHz max seems absurd. Perhaps it takes ganging together multiple DMA channels. I will happily stand corrected if there is an authoritative answer on this.

As to section(".dtcm"), perhaps it is ChatGPT slop (and Gad knows there's plenty of that) but there is the Data Tightly Coupled Memory and the term dtcm appears dozens of times in the i.MX RT1060 Processor Reference Manual. So I am not quite sure what you are saying.

P.S. RE: Timing, I will post updated timing with ARM_DWT_CYCCNT shortly.

--- sam
 
Well if you're a hardware designer, go read the IMXRT1060 reference manual. Then you can program the DMA TCDs directly to use a minor loop of 32 bytes, reading from memory using 4x 64bit accesses. There's no requirement to only process one word per minor loop.
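
To illustrate what a larger minor loop changes: with one 4-byte word per minor loop, every word needs its own service request, whereas a 32-byte minor loop moves eight words per request. The sketch below only computes the resulting loop counts on the host; it does not program the hardware. The struct and function names are hypothetical, NBYTES and CITER/BITER are the TCD fields the values would go into, and whether this actually raises the transfer rate is not established in this thread.

```cpp
#include <cstdint>

// Loop counts for an eDMA-style transfer split into minor and major loops.
struct LoopPlan {
    std::uint32_t minor_loop_bytes;  // bytes moved per service request (NBYTES)
    std::uint32_t major_iterations;  // minor loops per major loop (CITER/BITER)
};

// Assumes total_bytes is an exact multiple of minor_loop_bytes.
constexpr LoopPlan plan_transfer(std::uint32_t total_bytes, std::uint32_t minor_loop_bytes) {
    return LoopPlan{minor_loop_bytes, total_bytes / minor_loop_bytes};
}

constexpr std::uint32_t kTotalBytes = 1024 * sizeof(std::uint32_t);  // 1024 32-bit words

// One word per minor loop needs 1024 requests; eight words per minor loop needs 128.
static_assert(plan_transfer(kTotalBytes, 4).major_iterations == 1024, "4-byte minor loop");
static_assert(plan_transfer(kTotalBytes, 32).major_iterations == 128, "32-byte minor loop");
```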
 
It sounds like you are knowledgeable about this. So how about some actual help instead of simply RTFM. :) I've tried. You know what attempting to decipher that manual is like. I was hoping someone would have actually implemented this. My apologies if I'm wasting your time. I will go away now.

Cheers,

--- sam
 
Updated benchmarks:

Code:
******** Teensy 4.0 Memory and GPIO Copy 10,000 (32 bit) Word Speed Test *******

  Memory to memory copy using c code:
    Using micros(): Elapsed Time: 18 Microseconds.  Time per Copy: 1.80 ns.  Transfer Rate: 555.56 MW/s.
    Using F_CPU_ACTUAL: Elapsed Time: 17.782 Microseconds.  Time per Copy: 1.78 ns.  Transfer Rate: 562.38 MW/s.

  GPIO2_PSR to memory copy using c code:
    Using micros(): Elapsed Time: 468 Microseconds.  Time per Copy: 46.80 ns.  Transfer Rate: 21.37 MW/s.
    Using F_CPU_ACTUAL: Elapsed Time: 468.065 Microseconds.  Time per Copy: 46.81 ns.  Transfer Rate: 21.36 MW/s.

  GPIO7_PSR to memory copy using c code:
    Using micros(): Elapsed Time: 152 Microseconds.  Time per Copy: 15.20 ns.  Transfer Rate: 65.79 MW/s.
    Using F_CPU_ACTUAL: Elapsed Time: 151.462 Microseconds.  Time per Copy: 15.15 ns.  Transfer Rate: 66.02 MW/s.

  Memory to memory copy using assembly code:
    Using micros(): Elapsed Time: 34 Microseconds.  Time per Copy: 3.40 ns.  Transfer Rate: 294.12 MW/s.
    Using F_CPU_ACTUAL: Elapsed Time: 33.425 Microseconds.  Time per Copy: 3.34 ns.  Transfer Rate: 299.18 MW/s.

  GPIO2_PSR to memory copy using assembly code:
    Using micros(): Elapsed Time: 468 Microseconds.  Time per Copy: 46.80 ns.  Transfer Rate: 21.37 MW/s.
    Using F_CPU_ACTUAL: Elapsed Time: 468.172 Microseconds.  Time per Copy: 46.82 ns.  Transfer Rate: 21.36 MW/s.

  GPIO7_PSR to memory copy using assembly code:
    Using micros(): Elapsed Time: 152 Microseconds.  Time per Copy: 15.20 ns.  Transfer Rate: 65.79 MW/s.
    Using F_CPU_ACTUAL: Elapsed Time: 151.512 Microseconds.  Time per Copy: 15.15 ns.  Transfer Rate: 66.00 MW/s.

Tests complete.

There isn't a huge difference probably because of the large transfers.
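
One way to see why the two timing methods agree: micros() resolves 1 µs, so its worst-case error relative to the measured interval shrinks as the interval grows. A sketch of that bound follows; the function name is illustrative, and the elapsed times are taken from the benchmark output above.

```cpp
// Worst-case timer quantization error as a percentage of the measured interval.
constexpr double worst_case_error_percent(double resolution_us, double elapsed_us) {
    return 100.0 * resolution_us / elapsed_us;
}

// Shortest test above (17.782 us): a 1 us step is at most about 5.6 % of the interval.
static_assert(worst_case_error_percent(1.0, 17.782) > 5.5 &&
              worst_case_error_percent(1.0, 17.782) < 5.7, "about 5.6 %");

// Longest test above (468.065 us): the same step is at most about 0.21 %.
static_assert(worst_case_error_percent(1.0, 468.065) > 0.20 &&
              worst_case_error_percent(1.0, 468.065) < 0.22, "about 0.21 %");
```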

--- sam
 
I have tried to help you, but it seems like you just want someone to write all the code for you and I'm not going to do that.
 
No, I don't expect you to write the code, but I did expect more than RTFM. Actually what I was expecting is that someone had already solved it. Have a good day.

--- sam
 