Fast Data Logger

slash2

I have a project that requires streaming data to an SD card. The data consists of 8 bit-wide channels of 10-bit data clocked in at 23.4 MHz. One frame is 60 clocks. I did some tests (stock code, no data capture) with a fast SD card using the 'bench' example in the SdFat library and I get around 22 MB/sec. So it doesn't look good at first glance, especially since I need to add data capture, buffering, etc. Thinking about it, I see there's a fair amount of unnecessary stuff in the data stream, and since port reads are 32 bits wide, I can reconfigure the data stream to be 32 bits wide with 10 bits per frame. This only requires a clock rate of 3.9 MHz. Of course I'll need to reconfigure the data outside the Teensy, as doing it on the Teensy would take too many cycles. I also realize that the 4.1 does not have a contiguous 32-bit port brought out, so I'll need to roll my own board. This is OK, as this thing has to be as small as possible.
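For reference, the numbers in this post work out as follows. This is a minimal sketch of the arithmetic only. It assumes that "10 bits per frame" means ten 32-bit port reads per frame, and the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Figures taken from the post; names are hypothetical.
constexpr uint32_t kInputClockHz   = 23400000; // original data clock, 23.4 MHz
constexpr uint32_t kClocksPerFrame = 60;       // clocks per frame in the original stream
constexpr uint32_t kWordsPerFrame  = 10;       // ASSUMPTION: ten 32-bit reads per frame after repacking
constexpr uint32_t kBytesPerWord   = 4;        // one 32-bit port read

// Frame rate is fixed by the original stream: 23.4 MHz / 60.
constexpr uint32_t frameRateHz() { return kInputClockHz / kClocksPerFrame; }

// Sample clock needed once each frame is repacked into 32-bit words.
constexpr uint32_t wordClockHz() { return frameRateHz() * kWordsPerFrame; }

// Sustained byte rate the SD card has to absorb.
constexpr uint32_t bytesPerSecond() { return wordClockHz() * kBytesPerWord; }
```

Under that assumption the repacked clock is 3.9 MHz and the sustained rate is 15.6 MB/sec, which has to fit under the roughly 22 MB/sec measured with the bench example.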
To give the SD card code as much time as possible, I figure I need to acquire data via interrupt. I have seen in several places on the forum that the stock interrupt code is slow. Then I found this thread:
which looked promising. So I basically duct-taped that to the bench example to see how it would work using a big shared buffer. I can just get to 4 MHz with this, so I am optimistic. I need to dig in and see where I can speed things up.
So I have a few questions:
1. I'm wondering if DMA would be better? I need to do the homework on that, but it would be helpful to know if it's worth looking at in detail. Can one move port data into memory with an external clock using DMA?
2. The incoming data clock is fixed at 23.4 MHz. I'm thinking I could save an oscillator if my board runs the Teensy at 23.4 MHz instead of 24 MHz. I figure this will throw off timers and baud rate generators, which is OK I think. But I assume it might mess up the USB?
3. I didn't see any other threads that were really similar to what I'm trying to do, but if I missed something, please let me know.

And I want to thank everyone who contributes to this forum and the libraries. They let me make a lot of progress quickly. I was also worried about how making my own T4.1 would work. Then I found the page on PJRC where Paul sells the boot chip and has great advice on how to lay out the board and run tracks. Wow, simply amazing.
 
Oops- not so fast.
I forgot that when I ran the test, I had changed the code to save the data count to the input buffer rather than the actual port value. When I use the version that reads the port, the program doesn't run right. I toggle a pin in the interrupt routine and when I read the port, I never see the pin change, even at low sample clock speeds (100 kHz). The firmware runs a bit, then the chip reboots. So I made a version of the code without the SD card parts and that does seem to function (code below). It takes about 45 nsec from clock change to marker high, then 20 nsec to marker low when reading a value and 38 nsec when reading the port. This was with the 'fastest' compile option, but the normal 'faster' is only a nsec or so slower.

C-like:
// see how fast we can read data from GPIO to buffer

// Size of read/write.
const size_t BUF_SIZE = 4096; //512;

// Insure 4-byte alignment.
uint32_t buf32[(BUF_SIZE + 3) / 4];
uint8_t* buf = (uint8_t*)buf32;
volatile uint32_t* inBufPtr = (uint32_t*)buf32;
uint8_t* bufEndPtr = (uint8_t*) (buf + BUF_SIZE);
volatile uint32_t* inEndPtr = (uint32_t*) (inBufPtr + BUF_SIZE/4);
volatile uint32_t* inPtr = inBufPtr;
volatile uint32_t inCnt = 0;

const int framePin = 34;    // B1-13
const int ExtClockPin = 37; // B1-03  External clock input
const int clkOutPin = 36;   // B1-02

#define IMXRT_GPIO6_DIRECT  (*(volatile uint32_t *)0x42000000) // port access to GPIO6 (ADB0,ADB1)
#define IMXRT_GPIO7_DIRECT  (*(volatile uint32_t *)0x42004000) // port access to GPIO7 (B0,B1)
#define IMXRT_GPIO8_DIRECT  (*(volatile uint32_t *)0x42008000) // port access to GPIO8 (EMC32+)
#define IMXRT_GPIO9_DIRECT  (*(volatile uint32_t *)0x4200C000) // port access to GPIO9 (EMC0-31)

volatile uint32_t& portStatusReg = (digitalPinToPortReg(ExtClockPin))[6]; // precalc status reg and mask
uint32_t mask = digitalPinToBitMask(ExtClockPin);
//------------------------------------------------------------------------------
void ClkInterrupt()
{
   digitalWriteFast(framePin, HIGH); // see how long this takes...
   portStatusReg = mask;  // we worked around the Teensyduino handler, so we need to reset the status flag ourself
   *inPtr = IMXRT_GPIO6_DIRECT; // inCnt; // get the current data
   if( inPtr++ >= inEndPtr ) inPtr = inBufPtr; // reset buffer pointer if needed
   inCnt += 4;                    // we added 4 more bytes to the buffer (SD card routines are byte-oriented)
   digitalWriteFast(framePin, LOW);  // ...this long   
   asm volatile("dsb"); // avoid double calls due to possible bus sync issues
}
//------------------------------------------------------------------------------
void setup()
{
  pinMode( framePin, OUTPUT);
  digitalWrite( framePin, LOW);
  
  Serial.begin(9600);

  // Wait for USB Serial
  while (!Serial);

  analogWriteFrequency(clkOutPin, 4000000); //
  analogWrite(clkOutPin, 128); // 128 / 256 = 50% duty cycle

    //attachInterrupt(strobePin, clockDataInt, RISING);
  attachInterrupt(ExtClockPin, nullptr, RISING); //CHANGE); // let Teensyduino do the setup work
  attachInterruptVector(IRQ_GPIO6789,ClkInterrupt); // override Teensyduino handler and invoke the callback directly
  NVIC_ENABLE_IRQ(IRQ_GPIO6789);
  NVIC_SET_PRIORITY(IRQ_GPIO6789, 0);  // highest prio, might be good to reduce a bit 
}
//------------------------------------------------------------------------------
void loop()
{
}
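
As a rough sanity check on the timings reported above, the sketch below estimates what fraction of each sample period the interrupt occupies. It assumes the 45 ns entry latency and the 38 ns port-read body simply add, and it ignores exception exit time, so treat the result as a lower bound. The function name is hypothetical.

```cpp
#include <cassert>

// Fraction of each sample period spent in the ISR, given the entry latency
// and body time in nanoseconds and the sample clock in Hz.
// ASSUMPTION: exception exit overhead is not included, so this is a lower bound.
inline double isrLoadFraction(double entryNs, double bodyNs, double sampleHz) {
    assert(sampleHz > 0.0);
    const double periodNs = 1e9 / sampleHz;  // 3.9 MHz gives about 256 ns
    return (entryNs + bodyNs) / periodNs;
}
```

With the figures from the post, `isrLoadFraction(45.0, 38.0, 3.9e6)` comes out at roughly 0.32, so about a third of the CPU time goes to the interrupt before exit overhead is counted.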
 
I minimized the code that I am having problems with. Quick overview: it reads 32-bit data from GPIO6 and stores it in a buffer. The SD card routine accesses this buffer to send the data to the SD card. My buffer is very (too?) simple. I keep a running count of bytes written and read. When the count of bytes in exceeds the count of bytes out by at least one 512-byte data block, a block is written to the file. The sample clock is generated by the Teensy on pin 36 and is jumpered to pin 37 to provide the interrupt for data sampling. In the interrupt routine I have two lines of code (one commented out) to read data. One is the one I want: it reads from the port. The other just reads the data count value. The latter works fine. The port one does not. It starts and I see interrupts for about 400 µs or so, then they stop happening, and soon after the sample clock suddenly changes to about 809 kHz. After several seconds the board resets itself. Now, as noted above, the interrupt routine does take a bit longer with port reads, but if that were the problem, reducing the sample rate should help, and it does not. It seems like something is writing where it should not.
See the code below. It is based on the SD card bench example.
Requires: Pin 36 connected to Pin 37 and a fast SD Card.
C:
/*
 * This program tests high speed port reads and storage to SD Card
 * need to jumper pin 36 (clk out) to pin 37 (sample clock in)
 */
#include "FreeStack.h"
#include "SdFat.h"
#include "sdios.h"

// SD_FAT_TYPE = 0 for SdFat/File as defined in SdFatConfig.h,
// 1 for FAT16/FAT32, 2 for exFAT, 3 for FAT16/FAT32 and exFAT.
#define SD_FAT_TYPE 2

const int framePin = 34;    // B1-13
const int ExtClockPin = 37; // B1-03  External clock input - jumper to clkOutPin
const int clkOutPin = 36;   // B1-02

#define IMXRT_GPIO6_DIRECT  (*(volatile uint32_t *)0x42000000) // port access to GPIO6 (ADB0,ADB1)
#define IMXRT_GPIO7_DIRECT  (*(volatile uint32_t *)0x42004000) // port access to GPIO7 (B0,B1)
#define IMXRT_GPIO8_DIRECT  (*(volatile uint32_t *)0x42008000) // port access to GPIO8 (EMC32+)
#define IMXRT_GPIO9_DIRECT  (*(volatile uint32_t *)0x4200C000) // port access to GPIO9 (EMC0-31)

volatile uint32_t& portStatusReg = (digitalPinToPortReg(ExtClockPin))[6]; // precalc status reg and mask
uint32_t mask = digitalPinToBitMask(ExtClockPin);

// SDCARD_SS_PIN is defined for the built-in SD on some boards.
#ifndef SDCARD_SS_PIN
const uint8_t SD_CS_PIN = SS;
#else   // SDCARD_SS_PIN
// Assume built-in SD is used.
const uint8_t SD_CS_PIN = SDCARD_SS_PIN;
#endif  // SDCARD_SS_PIN

// Try max SPI clock for an SD. Reduce SPI_CLOCK if errors occur.
#define SPI_CLOCK SD_SCK_MHZ(50)

// Try to select the best SD card configuration.
#if HAS_SDIO_CLASS
#define SD_CONFIG SdioConfig(FIFO_SDIO)
#elif ENABLE_DEDICATED_SPI
#define SD_CONFIG SdSpiConfig(SD_CS_PIN, DEDICATED_SPI, SPI_CLOCK)
#else  // HAS_SDIO_CLASS
#define SD_CONFIG SdSpiConfig(SD_CS_PIN, SHARED_SPI, SPI_CLOCK)
#endif  // HAS_SDIO_CLASS

// Set PRE_ALLOCATE true to pre-allocate file clusters.
const bool PRE_ALLOCATE = true;

// Set SKIP_FIRST_LATENCY true if the first read/write to the SD can
// be avoided by writing a file header or reading the first record.
const bool SKIP_FIRST_LATENCY = true;

// Size of read/write.
const size_t BUF_SIZE = 8192; //4096; //512;
const size_t WR_SIZE = 512;

// File size in MB where MB = 1,000,000 bytes.
const uint32_t FILE_SIZE_MB = 25;
//const uint32_t DATABUF_SIZE = 65536;

// Write pass count.
const uint8_t WRITE_COUNT = 1;

// Read pass count.
const uint8_t READ_COUNT = 1;
//==============================================================================
// End of configuration constants.
//------------------------------------------------------------------------------
// File size in bytes.
const uint32_t FILE_SIZE = 1000000UL * FILE_SIZE_MB;

// Insure 4-byte alignment.
uint32_t buf32[(BUF_SIZE + 3) / 4];
uint8_t* buf = (uint8_t*)buf32;
volatile uint32_t* inBufPtr = (uint32_t*)buf32;
uint8_t* bufEndPtr = (uint8_t*) (buf + BUF_SIZE);
volatile uint32_t* inEndPtr = (uint32_t*) (inBufPtr + BUF_SIZE/4);
volatile uint32_t* inPtr = inBufPtr;
uint8_t* outPtr = buf;

volatile uint32_t inCnt = 0;
uint32_t wrCnt = 0;

#if SD_FAT_TYPE == 0
SdFat sd;
File file;
#elif SD_FAT_TYPE == 1
SdFat32 sd;
File32 file;
#elif SD_FAT_TYPE == 2
SdExFat sd;
ExFile file;
#elif SD_FAT_TYPE == 3
SdFs sd;
FsFile file;
#else  // SD_FAT_TYPE
#error Invalid SD_FAT_TYPE
#endif  // SD_FAT_TYPE

// Serial output stream
ArduinoOutStream cout(Serial);
//------------------------------------------------------------------------------
// Store error strings in flash to save RAM.
#define error(s) sd.errorHalt(&Serial, F(s))
//------------------------------------------------------------------------------
void cidDmp() {
  cid_t cid;
  if (!sd.card()->readCID(&cid)) {
    error("readCID failed");
  }
  cout << F("\nManufacturer ID: ");
  cout << uppercase << showbase << hex << int(cid.mid) << dec << endl;
  cout << F("OEM ID: ") << cid.oid[0] << cid.oid[1] << endl;
  cout << F("Product: ");
  for (uint8_t i = 0; i < 5; i++) {
    cout << cid.pnm[i];
  }
  cout << F("\nRevision: ") << cid.prvN() << '.' << cid.prvM() << endl;
  cout << F("Serial number: ") << hex << cid.psn() << dec << endl;
  cout << F("Manufacturing date: ");
  cout << cid.mdtMonth() << '/' << cid.mdtYear() << endl;
  cout << endl;
}
//------------------------------------------------------------------------------
void clearSerialInput() {
  uint32_t m = micros();
  do {
    if (Serial.read() >= 0) {
      m = micros();
    }
  } while (micros() - m < 10000);
}
//------------------------------------------------------------------------------
void ClkInterrupt()
{
   digitalWriteFast(framePin, HIGH); // see how long this takes...
   portStatusReg = mask;  // we worked around the Teensyduino handler, so we need to reset the status flag ourself
   *inPtr = IMXRT_GPIO6_DIRECT; //  THIS DOESN'T WORK
   //*inPtr = inCnt;                // THIS WORKS
   if( inPtr++ >= inEndPtr ) inPtr = inBufPtr; // reset buffer pointer if needed
   inCnt += 4;                    // we added 4 more bytes to the buffer (SD card routines are byte-oriented)
   digitalWriteFast(framePin, LOW);  // ...this long
   asm volatile("dsb"); // avoid double calls due to possible bus sync issues
}

//------------------------------------------------------------------------------
void setup()
{
  pinMode( framePin, OUTPUT);
  digitalWrite( framePin, HIGH);
 
  Serial.begin(9600);
  while (!Serial);

  cout << F("\nUse a freshly formatted SD for best performance.\n");
  if (!ENABLE_DEDICATED_SPI)
  {
    cout << F(
        "\nSet ENABLE_DEDICATED_SPI nonzero in\n"
        "SdFatConfig.h for best SPI performance.\n");
  }
  cout << uppercase << showbase << endl; // use uppercase in hex and use 0X base prefix

  analogWriteFrequency(clkOutPin, 4000000); // set up sampling clock
  analogWrite(clkOutPin, 128); // 128 / 256 = 50% duty cycle  
}
//------------------------------------------------------------------------------
void loop()
{
  float s;
  uint32_t t;
  uint32_t maxLatency;
  uint32_t minLatency;
  uint32_t totalLatency;
  bool skipLatency;

  // Discard any input.
  clearSerialInput();

  // F() stores strings in flash to save RAM
  cout << F("Type any character to start\n");
  while (!Serial.available()) {
    yield();
  }
#if HAS_UNUSED_STACK
  cout << F("FreeStack: ") << FreeStack() << endl;
#endif  // HAS_UNUSED_STACK

  if (!sd.begin(SD_CONFIG)) {
    sd.initErrorHalt(&Serial);
  }
  if (sd.fatType() == FAT_TYPE_EXFAT) {
    cout << F("Type is exFAT") << endl;
  } else {
    cout << F("Type is FAT") << int(sd.fatType()) << endl;
  }

  cout << F("Card size: ") << sd.card()->sectorCount() * 512E-9;
  cout << F(" GB (GB = 1E9 bytes)") << endl;

  cidDmp();

  // open or create file - truncate existing file.
  if (!file.open("bench.dat", O_RDWR | O_CREAT | O_TRUNC)) error("open failed");

  cout << F("FILE_SIZE_MB = ") << FILE_SIZE_MB << endl;
  cout << F("BUF_SIZE = ") << BUF_SIZE << F(" bytes\n");
  cout << F("Starting write test, please wait.") << endl << endl;

  // do write test
  uint32_t n = FILE_SIZE / BUF_SIZE;
  cout << F("speed KB/sec") << endl;

  file.truncate(0);
  if (PRE_ALLOCATE)
  {
    if (!file.preAllocate(FILE_SIZE))  error("preAllocate failed");
  }
  t = millis(); // start time
  // start up data read interrupt after everything is ready to go
  attachInterrupt(ExtClockPin, nullptr, RISING); //CHANGE); // let Teensyduino do the setup work
  attachInterruptVector(IRQ_GPIO6789,ClkInterrupt); // override Teensyduino handler and invoke the callback directly
  NVIC_ENABLE_IRQ(IRQ_GPIO6789);
  NVIC_SET_PRIORITY(IRQ_GPIO6789, 0);  // highest priority, might be good to reduce a bit
   
  while( wrCnt < FILE_SIZE )
  {
    while( inCnt < (wrCnt + (uint32_t)512) ); // wait until we have at least 512 bytes of new data
       
    if (file.write(outPtr, WR_SIZE) != WR_SIZE ) //BUF_SIZE) != BUF_SIZE) {
        error("write failed");
   
    if( (outPtr += WR_SIZE) >= bufEndPtr ) outPtr = buf;
    wrCnt += WR_SIZE;  
  }
  file.sync();
  t = millis() - t;
  s = file.fileSize();
  cout << s / t << ',' << t << ',' << s; //maxLatency << ',' << minLatency;
 
  file.close();
  sd.end();
}
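
The running-count handoff described in the overview above can be isolated into a few small helpers. This is a sketch only, with hypothetical function names. It uses the 512-byte block and 8192-byte buffer sizes from the posted code, and it adds an overrun check that the posted code does not have.

```cpp
#include <cstdint>

constexpr uint32_t kBlockSize = 512;   // bytes per SD write (WR_SIZE in the post)
constexpr uint32_t kBufSize   = 8192;  // ring buffer size in bytes (BUF_SIZE in the post)

// Bytes captured by the ISR but not yet written to the card.
// Unsigned subtraction stays correct even after the counters wrap past 2^32.
inline uint32_t pendingBytes(uint32_t inCnt, uint32_t wrCnt) {
    return inCnt - wrCnt;
}

// True once a full block is waiting to be written.
inline bool blockReady(uint32_t inCnt, uint32_t wrCnt) {
    return pendingBytes(inCnt, wrCnt) >= kBlockSize;
}

// True if the ISR has lapped the writer and unwritten data was overwritten.
inline bool overrun(uint32_t inCnt, uint32_t wrCnt) {
    return pendingBytes(inCnt, wrCnt) > kBufSize;
}
```

Checking `overrun()` in the write loop would at least flag the case where a slow SD write lets the interrupt overwrite data that has not been saved yet.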
 
C:
// Insure 4-byte alignment.
uint32_t buf32[(BUF_SIZE + 3) / 4];
uint8_t* buf = (uint8_t*)buf32;
volatile uint32_t* inBufPtr = (uint32_t*)buf32;
uint8_t* bufEndPtr = (uint8_t*) (buf + BUF_SIZE);
volatile uint32_t* inEndPtr = (uint32_t*) (inBufPtr + BUF_SIZE/4);
volatile uint32_t* inPtr = inBufPtr;
uint8_t* outPtr = buf;
There is no guaranteed order for how these statements will be executed so some may end up with invalid values. You should only initialize values from constants.
 
Oh - didn't know about that, thanks! Here it is.

CrashReport:
A problem occurred at (system time) 13:21:16
Code was executing from address 0x4AAA
CFSR: 82
(DACCVIOL) Data Access Violation
(MMARVALID) Accessed Address: 0x20004A60 (Stack problem)
Check for stack overflows, array bounds, etc.
Temperature inside the chip was 41.51 °C
Startup CPU clock speed is 600MHz
Reboot was caused by auto reboot after fault or bad interrupt detected

If I look at the list file, I think this is where the report is pointing (the 0x4AAA line is marked; the rest is for context).

I'll see if I can make sense of this. But to me it seems like I'm overwriting something. Seems so strange that it only happens with the 'small' change of reading from a port rather than memory...

C:
static bool isBusyFifoWrite() { return !(SDHC_PRSSTAT & SDHC_PRSSTAT_BWEN); }
    4a52:    6a63          ldr    r3, [r4, #36]    ; 0x24
  while (fcn()) {
    4a54:    f413 6980     ands.w    r9, r3, #1024    ; 0x400
    4a58:    d0f6          beq.n    4a48 <SdioCard::writeData(unsigned char const*)+0x2c>
    4a5a:    f107 0240     add.w    r2, r7, #64    ; 0x40
    while (0 == (SDHC_PRSSTAT & SDHC_PRSSTAT_BWEN)) {
    4a5e:    4b43          ldr    r3, [pc, #268]    ; (4b6c <SdioCard::writeData(unsigned char const*)+0x150>)
    4a60:    f507 7710     add.w    r7, r7, #576    ; 0x240
    4a64:    6a58          ldr    r0, [r3, #36]    ; 0x24
    4a66:    0541          lsls    r1, r0, #21
    4a68:    d5fc          bpl.n    4a64 <SdioCard::writeData(unsigned char const*)+0x48>
      SDHC_DATPORT = p32[i];
    4a6a:    f852 1c40     ldr.w    r1, [r2, #-64]
  for (uint32_t iw = 0; iw < 512 / (4 * FIFO_WML); iw++) {
    4a6e:    3240          adds    r2, #64    ; 0x40
      SDHC_DATPORT = p32[i];
    4a70:    6219          str    r1, [r3, #32]
  for (uint32_t iw = 0; iw < 512 / (4 * FIFO_WML); iw++) {
    4a72:    4297          cmp    r7, r2
      SDHC_DATPORT = p32[i];
    4a74:    f852 1c7c     ldr.w    r1, [r2, #-124]
    4a78:    6219          str    r1, [r3, #32]
    4a7a:    f852 1c78     ldr.w    r1, [r2, #-120]
    4a7e:    6219          str    r1, [r3, #32]
    4a80:    f852 1c74     ldr.w    r1, [r2, #-116]
    4a84:    6219          str    r1, [r3, #32]
    4a86:    f852 1c70     ldr.w    r1, [r2, #-112]
    4a8a:    6219          str    r1, [r3, #32]
    4a8c:    f852 1c6c     ldr.w    r1, [r2, #-108]
    4a90:    6219          str    r1, [r3, #32]
    4a92:    f852 1c68     ldr.w    r1, [r2, #-104]
    4a96:    6219          str    r1, [r3, #32]
    4a98:    f852 1c64     ldr.w    r1, [r2, #-100]
    4a9c:    6219          str    r1, [r3, #32]
    4a9e:    f852 1c60     ldr.w    r1, [r2, #-96]
    4aa2:    6219          str    r1, [r3, #32]
    4aa4:    f852 1c5c     ldr.w    r1, [r2, #-92]
    4aa8:    6219          str    r1, [r3, #32]
    4aaa:    f852 1c58     ldr.w    r1, [r2, #-88]    ; <-- 0x4AAA, the address from the CrashReport
    4aae:    6219          str    r1, [r3, #32]
    4ab0:    f852 1c54     ldr.w    r1, [r2, #-84]
    4ab4:    6219          str    r1, [r3, #32]
    4ab6:    f852 1c50     ldr.w    r1, [r2, #-80]
    4aba:    6219          str    r1, [r3, #32]
    4abc:    f852 1c4c     ldr.w    r1, [r2, #-76]
    4ac0:    6219          str    r1, [r3, #32]
    4ac2:    f852 1c48     ldr.w    r1, [r2, #-72]
    4ac6:    6219          str    r1, [r3, #32]
    4ac8:    f852 1c44     ldr.w    r1, [r2, #-68]
    4acc:    6219          str    r1, [r3, #32]
  for (uint32_t iw = 0; iw < 512 / (4 * FIFO_WML); iw++) {
    4ace:    d1c9          bne.n    4a64 <SdioCard::writeData(unsigned char const*)+0x48>
  m_transferActive = true;
    4ad0:    2301          movs    r3, #1
    4ad2:    4618          mov    r0, r3
    4ad4:    702b          strb    r3, [r5, #0]
}
 
You can also increase data rates by increasing the SDIO clock speed from 25/50 MHz to 99 MHz on fast cards.

@beermat has had good results in doing this
 
That may be needed once I get the basics working. It's close right now (I need 16 MB/sec and the raw bench is 22 MB/sec).

I thought of something else. Do I need to explicitly save and restore registers in the interrupt routine, since I'm not using the straight Arduino method?
 
SD writes can take randomly long times, so you must have sufficient buffer space. How long? Very long. See the maximum FAT write time in the SD specification, for example (750 ms!). I tested this once long ago... Longer buffers (a power of 2 times 512 bytes) help, so use as much as will fit in RAM.

I first dabbled in faster data loggers with the SparkFun Logomatic. My last used the Teensy 3.6.

No, the compiler handles register saves. Actually the ARM system/compiler is pretty nice about this, automatically saving registers that a function could clobber. (Clobber being a technical compiler term.) So other than ending with a return from interrupt instruction, your ISR looks like any other function.
 
Thanks - I'll be mindful of that.

Just to remove one variable - that the interrupt routine takes a bit longer when using the port read - I added some more framePin toggles to the working 'read variable' version interrupt routine to take up enough time to equal the 'port read' version. It still works. So, right now, with this card, and how I'm using it, it does not appear to be a lack of CPU cycles or buffer space. This particular problem seems like it has something to do with reading from a port instead of a variable.
 
OK, so it's been a long time since I delved into assembly, and I have not done any on this processor.
So, there are differences between the interrupt routines depending on whether it's reading the port or the variable. The port one pushes just r4, though r0, r1, r2, r3, and r4 are used in the routine. At the end there is no pop but, instead, an ldr.w r4, [sp], #4, which I'm guessing does the same thing? The variable one pushes r4 and r5 but uses r0-r5, and at the end it does do a pop {r4, r5}. Otherwise they look pretty similar.
I think tomorrow I'll slow down the clock and see if the Arduino interrupts work for both versions. That might tell me if it's interrupt related.
Update: just tested it and it fails with the standard interrupt as well.
 
It has been a while so I looked up the notes I made when adding interrupts to my version of Forth.

"The ARM stacks up a bunch of registers: R0-R3,R12,LR,PC,PSR"

This is handy as r0-r3 can be clobbered so aren't saved by a function. Any additional registers used must be saved and restored. Which is why an ISR can use r0-r5 while explicitly saving only r4 and r5. The interrupt hardware handles the rest. Oh, some bit of magic triggers the return from interrupt action. No special return from interrupt instruction required.

In some circumstances the stack frame could be larger, requiring more time to save and restore.

As for the assembly, for a RISC processor the assembly syntax can be very complicated. So I wouldn't hazard a guess as to the equivalence of a pop and that instruction.
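
The register list in the quoted note corresponds to the basic exception frame that the Cortex-M hardware pushes on interrupt entry. The struct below is only an illustrative layout (lowest stack address first). The type name is invented here, and it assumes no floating-point state is stacked. When the FPU context is saved as well, the frame grows, which is one case of the larger frame mentioned above.

```cpp
#include <cstddef>
#include <cstdint>

// Basic exception frame as stacked by hardware on interrupt entry,
// listed from the lowest address (where SP points) upward.
// ASSUMPTION: no floating-point context is included.
struct BasicExceptionFrame {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    // link register of the interrupted code
    uint32_t pc;    // address to resume at after the interrupt
    uint32_t xpsr;  // program status register
};

static_assert(sizeof(BasicExceptionFrame) == 8 * sizeof(uint32_t),
              "eight 32-bit words are stacked by hardware");
```

Because r0-r3 and r12 are already in this frame, the compiler only has to save the extra registers it uses (r4 and r5 in the listings discussed above).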
 
The crash happens when it's trying to write the SD card file. It looks like outPtr has been incremented beyond the end of the array and has collided with the stack canary (a small area of inaccessible memory).
 
Very close!
I need to remember: "It's the simple things that get you"
I think the root cause was inPtr incrementing past the end. I had:
C:
if( inPtr++ >= inEndPtr ) inPtr = inBufPtr; // reset buffer pointer if needed
when it should have been a pre-increment:
C:
if( ++inPtr >= inEndPtr ) inPtr = inBufPtr; // reset buffer pointer if needed

Ugh - K&R are shaking their heads...

What threw me for a loop was that it only occurred when reading from the port. Must have been some subtle compiler memory alignment difference.
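
To make the off-by-one concrete, here is a small host-side sketch that replays both wrap tests using array indices in place of the pointers. It is not the Teensy code; the function name and parameters are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Simulate `steps` ISR calls on a ring of `n` words and return the highest
// index that gets written. `idx` plays the role of inPtr and `n` the role
// of inEndPtr - inBufPtr. Valid indices are 0 .. n-1.
inline std::size_t highestIndexWritten(std::size_t n, std::size_t steps, bool preIncrement) {
    assert(n > 0);
    std::size_t idx = 0;
    std::size_t highest = 0;
    for (std::size_t i = 0; i < steps; ++i) {
        if (idx > highest) highest = idx;  // stands in for "*inPtr = ..."
        if (preIncrement) {
            if (++idx >= n) idx = 0;       // corrected test: wraps before index n is used
        } else {
            if (idx++ >= n) idx = 0;       // original test: compares the old value
        }
    }
    return highest;
}
```

With `n = 4`, the post-increment version returns 4, meaning one word past the end of the buffer is written on every lap, while the pre-increment version returns 3.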

Sorry it was a bit of a wild goose chase, but I learned (and hopefully others will too) several good things:
- CrashReport - wow - this is so useful
- This processor has good memory checks I didn't know about
- How this processor handles the stack in an interrupt
- I need to guard against long SD write delays
- be careful of using variables within a variable declaration
- I may be able to increase my SD card write clocking

Thanks all! This is exciting as I am pretty sure now that the T4.1 can handle this job. Now the fun begins.
 
- I need to guard against long SD write delays

The message linked below shows how to use SdFat's isBusy() to avoid blocking during SD writes that take a long time. It also shows the use of preAllocate(). I use a Sandisk Extreme 32GB card, and the longest time I've seen to complete a write is 40 ms, so when possible I size the buffer for 50 ms of data. The vast majority of writes complete in ~5 us, but occasionally there is a very long one. I haven't fiddled with the SDIO clock, so let us know if you do that and what your results are.
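
The sizing rule in this reply (buffer enough data to ride out the longest write) is simple arithmetic. The sketch below assumes the 15.6 MB/sec rate implied earlier in the thread (3.9 MHz times 4 bytes) and rounds up to whole 512-byte sectors. The function name is hypothetical.

```cpp
#include <cstdint>

constexpr uint64_t kSectorBytes = 512;

// Bytes that arrive while one SD write stalls for `stallMs` milliseconds,
// rounded up to a whole number of 512-byte sectors.
inline uint64_t bufferBytesFor(uint64_t bytesPerSecond, uint64_t stallMs) {
    const uint64_t raw = (bytesPerSecond * stallMs + 999) / 1000;          // round up to whole bytes
    return (raw + kSectorBytes - 1) / kSectorBytes * kSectorBytes;         // round up to whole sectors
}
```

For example, `bufferBytesFor(15600000, 50)` gives 780288 bytes, so covering a 50 ms stall at this data rate takes a buffer of roughly 760 KiB.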

 
Thanks, good to know. I am pre-allocating and I will have the card erased before use, which I think saves time by not needing to erase each sector before each write?
I'll definitely update if it all works - well if it fails, I'll update too as failure is a great teacher.

This is an interesting project. About 20 years ago I built a custom 32 Msample/sec data logger. It took two striped SCSI drives, two FPGAs, a SCSI controller, and an embedded PC running embedded NT. Here I am doing basically the same thing but really small.
 
In my experience, even with a freshly-erased card, and a preAllocated file, there will still be occasional long writes. I can't say I've investigated these as thoroughly as possible, so let us know what you learn. I think you'll need to leverage higher-speed cards to reach your desired data rate.

FYI, the example I linked also uses SdFat's RingBuf, which has interrupt-safe writes, so I always use it. By default it only supports statically-allocated buffers, but I have tested simple modifications to allow the buffer to be on the heap (RAM2) or in EXTMEM (PSRAM). With a 1MB buffer in PSRAM, and PSRAM set for a higher clock speed, I was able to get 20 MB/sec. I don't think I tried to go any higher, and that was without trying higher SDIO clock. IIRC, the PSRAM wasn't a whole lot faster than that anyway.
 
FORTH! Wow! Back in the day I did an oceanographic unit with Forth on a Z80 as I recall. Maybe my favorite programming language.
 
In my experience, even with a freshly-erased card, and a preAllocated file, there will still be occasional long writes. I can't say I've investigated these as thoroughly as possible, so let us know what you learn.
It has been a while, but I have done testing where I just tested the card. No file system involved at all. Just my code writing sequential blocks. My usual method was to have a timer running, and I would write the current count into the buffer before starting the write. Then when I read back the data I could work out the timing, producing graphs like:
[Attached image: single.gif]
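
The measurement method described above (stamp each block with a timer count before writing it, then read the stamps back) can be post-processed along these lines. This is a sketch with a made-up function name. It assumes one stamp per block and a free-running 32-bit timer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// stamps[i] is the timer count stored in block i just before its write began.
// The gap between consecutive stamps approximates how long write i took,
// plus any loop overhead. Unsigned subtraction tolerates a timer wrap
// between two neighbouring stamps.
inline std::vector<uint32_t> writeDurations(const std::vector<uint32_t>& stamps) {
    std::vector<uint32_t> durations;
    for (std::size_t i = 1; i < stamps.size(); ++i) {
        durations.push_back(stamps[i] - stamps[i - 1]);
    }
    return durations;
}
```

Plotting the returned values against their index gives a chart of write time per block write.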

As for freshly erased, that is going to depend a lot on the details inside the card. I remember once reading about a Sandisk system where data in the card was arranged in groups. Each group had the expected number of blocks plus some more which were erased. When data was written, it went to one of the erased blocks which then replaced the current block at the write location. The old one would then get shuffled off to be erased and returned to the erase pool.

Something I did on my Teensy 3.6 logger (no file system) is to turn on the LED when an SD write started and turn it off when it completed. The LED then provides some feedback. I see it get brighter (higher duty cycle) periodically, indicating longer than usual write times.
 
Thanks, @UhClem. What is the X axis on your chart? Can you tell us what it might mean in terms of bytes or sectors or something like that?
 
A count of writes. In this case it is the same as the number of blocks. (512 bytes)
That's interesting. The baseline is over 1 ms, with spikes to 9 and 17 ms about every 160-170 sectors. I've done a lot of benchmarking with a relatively modern SD card (Sandisk Extreme 32 GB), with most 512-byte writes taking ~5 us, and very occasional spikes as high as 40 ms, but much less frequent than the spikes in your chart.
 