Real low latency logging for Teensy 3.5/3.6 SDIO SD

Status
Not open for further replies.

tni

Well-known member
Here is a sample sketch for Teensy 3.5 / 3.6 logging using the SDIO SD card slot:
https://github.com/tni/teensy-samples/blob/master/SdFatSDIO_low_latency_logger.ino

It uses SdFat beta. Data is acquired in an ISR (ensuring stable capture timing) and passed to the main loop using an interrupt-safe queue.
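For readers unfamiliar with the ISR-to-main-loop handoff, here is a minimal sketch of a single-producer / single-consumer ring buffer. It is not the queue from the linked sketch: the `SpscQueue` name, the `push()` / `pop()` interface, and the four-field `LogEntry` layout are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record layout (four 32-bit fields), used only for this example.
struct LogEntry {
    uint32_t counter;
    uint32_t record_offset;
    uint32_t time;
    uint32_t dummy;
};

// Single-producer / single-consumer ring buffer.
// push() is meant to be called only from the ISR, pop() only from the main loop.
template <typename T, size_t N>
class SpscQueue {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
public:
    // Returns false (the record is dropped) if the queue is full.
    bool push(const T& item) {
        const size_t head = head_.load(std::memory_order_relaxed);
        const size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;              // full
        slots_[head & (N - 1)] = item;
        head_.store(head + 1, std::memory_order_release); // publish the slot
        return true;
    }

    // Returns false if the queue is empty.
    bool pop(T& out) {
        const size_t tail = tail_.load(std::memory_order_relaxed);
        const size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;                   // empty
        out = slots_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release); // free the slot
        return true;
    }

private:
    T slots_[N];
    std::atomic<size_t> head_{0};  // written by the producer only
    std::atomic<size_t> tail_{0};  // written by the consumer only
};
```

Because each index is written by only one side, neither side has to disable interrupts. When `push()` returns false, the ISR has to drop the record or count an overrun; which of the two is appropriate depends on the application.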

The main loop is responsible for writing to SD. A pre-allocated, pre-erased file is used. This drastically cuts worst-case SD write latency from potentially >850ms to 40ms (so a much smaller write buffer can be used). It also eliminates file system corruption when the Teensy is rebooted / power cycled during logging, since no FAT updates are performed while logging.

If the logging is interrupted, the log file will contain the logged records and pre-erased content (typically all bytes set to 0 - depends on SD card). If the log is properly closed, it will get truncated to the amount of data that was logged.

Records are written back-to-back. There is no need for using a special block with padding. (A write buffer is used for partial sectors.)
 
With a worst-case latency of 40ms and 200 kB worth of buffers, 5MB/s is possible.

(IME, these latency events are isolated and don't occur clustered for the linear writes to a pre-erased area.)
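The buffer figure follows from multiplying the worst-case stall by the data rate. A small worked check (the function name `requiredBufferBytes` is made up for this example, and 1 MB is taken as 1,000,000 bytes):

```cpp
#include <cassert>
#include <cstdint>

// Bytes that pile up while the SD card stalls for `stall_ms` milliseconds
// at a sustained data rate of `bytes_per_second`.
constexpr uint64_t requiredBufferBytes(uint64_t stall_ms, uint64_t bytes_per_second) {
    return stall_ms * bytes_per_second / 1000;
}

// 40 ms stall at 5 MB/s: 200,000 bytes of buffer.
static_assert(requiredBufferBytes(40, 5000000) == 200000, "40 ms at 5 MB/s");

// For comparison, an 850 ms stall at the same rate would need 4,250,000 bytes.
static_assert(requiredBufferBytes(850, 5000000) == 4250000, "850 ms at 5 MB/s");
```

This is only the minimum implied by a single isolated stall; it leaves no headroom for back-to-back latency events.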
 
Sorry to keep asking questions, how much overhead does this require on the MCU?
Writing at 4.8MB/s, the ISR running at 20us tick time, 40% CPU usage (Teensy 3.6 @ 180MHz). (Dummy work performed in yield().)

Close to 0 overhead would be possible by using exact 32kb buffers written using DMA (DMA is quite slow for small blocks). But the DMA version of SdFat beta hasn't been reliable for me. (The chip has a few SDIO / DMA hardware bugs.)
 
Writing at 5MB/s, the ISR running at 20us tick time, 40% CPU usage (Teensy 3.6 @ 180MHz). (Dummy work performed in yield().)

Close to 0 overhead would be possible by using exact 32kb buffers written using DMA (DMA is quite slow for small blocks). But the DMA version of SdFat beta hasn't been reliable for me. (The chip has a few SDIO / DMA hardware bugs.)

Am I correct that most of the overhead is inside the write functions?

Or, in other words, would logging at a 5us tick time and writing only 1.25 MB/s (changing both by a factor of 4) use less CPU?


Alain
 
Interrupts have a fair amount of base overhead. CPU usage of log function:

24 bytes @ 5us (4.8 MB/s): 46% (54% idle)
96 bytes @ 20us (4.8 MB/s): 40% (60% idle)
6 bytes @ 5us (1.2 MB/s): 22% (78% idle)
24 bytes @ 20us (1.2 MB/s): 12% (88% idle)
 
Interrupts have a fair amount of base overhead. CPU usage of log function:

24 bytes @ 5us (4.8 MB/s): 40% (60% idle)
96 bytes @ 20us (4.8 MB/s): 46% (54% idle)
6 bytes @ 5us (1.2 MB/s): 22% (78% idle)
24 bytes @ 20us (1.2 MB/s): 12% (88% idle)

Thanks

The first two measurements are a bit odd. I would expect four times as many interrupts to give a higher usage.
 
Hello,

First, I want to thank tni for this example.
Second, in the code, I can't see where it takes care of overrun conditions. I mean that when the max log file size is reached, it looks like the Teensy continues to write to the SD card. But in that case there is no pre-allocated space left for that.
Am I wrong?

Best,
Manu
 
I believe one can use an IntervalTimer to trigger create() and pre-create another file before the start of a new day, at a time when the app is not writing log data to the SD card. Then another IntervalTimer can trigger close() on the existing file and switch to the new file.
 
Second, in the code, I can't see where it takes care of overrun conditions. I mean that when the max log file size is reached, it looks like the Teensy continues to write to the SD card. But in that case there is no pre-allocated space left for that.
When the log file is full, logging is stopped and 'loop()' blinks the LED. Note the return from 'setup()':
https://github.com/tni/teensy-samples/blob/master/SdFatSDIO_low_latency_logger.ino#L335

Code:
...
            if(!file.write(log_buffer->rawData(), log_buffer->rawSize())) {
                if(file.getLastError() == file.E_eof) {
                    file.close();
                    serial.println("Log file full. Logging finished.");
                    return;
                }
                // ignore other write errors, hopefully subsequent writes will succeed
...
 
Hi,

I've been using your logging program with great success; however, I am now trying to use it for another application and am stuck on a few things (due to limited programming knowledge). My current device is unique in that I need a huge buffer for about 1-2 seconds, and then this buffer gets smaller and smaller. For instance, I need about 4096 bytes for a single reading, but near the end of collection I only need 40-50 bytes. I have another low latency logger working that writes as many 512-byte sectors as are needed, and it works great; however, my files are far larger than they need to be, since I am padding these small 40-50 byte sections with zeros to make a full 512-byte sector.
My first stumbling block is how to add an array to a struct and then place this dynamically sized array into the buffer:
Code:
struct LogEntry {
    uint32_t counter;
    uint32_t record_offset;
    uint32_t time;
   // uint32_t dummy;
    uint16_t Hz[1000];
};
void captureData() {
    uint32_t time = micros();
    static uint32_t counter = 0;
    static uint32_t record_offset = 0;
    static uint32_t dummy = 0;
    static uint16_t Hz[1000];//this array would be of dynamic size from 1-1000 samples
    
    logEntry( { counter, record_offset, time, ?? } );//how do I put an array into logEntry??
    counter++;
    record_offset += sizeof(LogEntry);
}
I guess the next major question is: your program currently writes less than 512 bytes per data capture, but my system needs to write more than 512 bytes at times. Does your program account for this? I have tried to follow the code to see whether a partial sector is put in a buffer, but I can't really determine what happens if the buffer is larger than 512 bytes.
I just thought I would ask, not only to make my files smaller, but also to increase my programming knowledge. I have tried to search as much as I can, but I don't think I am using the right keywords, since I end up finding quite different uses of arrays in a struct. Thanks in advance.
 
Thanks for your answer. It's funny (or not) that I found the "stop" function soon after I wrote my post. I *should* have read the code a bit more before asking.
Best,
Manu
 
My current device is unique in that I need a huge buffer for about 1-2 seconds, and then this buffer gets smaller and smaller. For instance, I need about 4096 bytes for a single reading, but near the end of collection I only need 40-50 bytes. I have another low latency logger working that writes as many 512-byte sectors as are needed, and it works great; however, my files are far larger than they need to be, since I am padding these small 40-50 byte sections with zeros to make a full 512-byte sector.
My first stumbling block is how to add an array to a struct and then place this dynamically sized array into the buffer:
Using dynamic sizes will be messy. Structs normally have a fixed size. C99 has a feature called a flexible array member, which GCC supports as a non-standard extension in C++ mode. However, there are a lot of restrictions on its usage (you can't really use such structs in an array).

You could use std::vector as part of LogEntry for your variable size part. However, dynamic allocation (malloc / new / free / delete) is not safe to use inside an ISR, you may get heap corruption. std::vector uses new / delete under the hood. You also need new write logic to serialize everything, since you won't have contiguous memory anymore.

A possible workaround is to combine multiple LogEntries:
Code:
struct LogEntryBasic {
    uint32_t counter;
    uint32_t record_offset;
    uint32_t time;
    uint32_t dummy;
};

struct LogEntryExt {
    char data[sizeof(LogEntryBasic)];
};

struct LogEntry {
    bool is_ext;
    union {
        LogEntryBasic entry_basic;
        LogEntryExt entry_ext;
    };
};

If you need additional data, you would use additional LogEntryExt records. This way, you can keep the existing buffering and retain the static buffer allocation.
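A rough sketch of how the extension records might be filled from a variable-length sample array. The helpers `extRecordsNeeded()` and `logVariable()` and the `emit` callback (standing in for the sketch's logEntry()) are hypothetical, and the caller is assumed to store the payload length in one of the basic fields (for example `dummy`) so that a reader knows how many extension records follow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct LogEntryBasic {
    uint32_t counter;
    uint32_t record_offset;
    uint32_t time;
    uint32_t dummy;
};

struct LogEntryExt {
    char data[sizeof(LogEntryBasic)];
};

struct LogEntry {
    bool is_ext;
    union {
        LogEntryBasic entry_basic;
        LogEntryExt entry_ext;
    };
};

// Number of extension records needed to hold payload_bytes.
constexpr size_t extRecordsNeeded(size_t payload_bytes) {
    return (payload_bytes + sizeof(LogEntryExt) - 1) / sizeof(LogEntryExt);
}

// Emits one basic record followed by as many extension records as the payload
// needs. Returns the total number of records emitted. No dynamic allocation.
template <typename Emit>
size_t logVariable(const LogEntryBasic& basic, const void* payload,
                   size_t payload_bytes, Emit emit) {
    LogEntry header{};
    header.is_ext = false;
    header.entry_basic = basic;
    emit(header);

    const char* src = static_cast<const char*>(payload);
    size_t remaining = payload_bytes;
    size_t emitted = 1;
    while (remaining > 0) {
        LogEntry ext{};
        ext.is_ext = true;
        ext.entry_ext = LogEntryExt{};  // zero-filled, so the last chunk is zero-padded
        const size_t n = std::min(remaining, sizeof(ext.entry_ext.data));
        std::memcpy(ext.entry_ext.data, src, n);
        emit(ext);
        src += n;
        remaining -= n;
        ++emitted;
    }
    return emitted;
}
```

With 16-byte extension records, a 40-byte payload takes three of them, so the padding waste is at most 15 bytes per capture instead of most of a 512-byte sector. Note that `is_ext` plus alignment padding makes each `LogEntry` larger than the 16 bytes of payload it carries.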

I guess the next major question is: your program currently writes less than 512 bytes per data capture, but my system needs to write more than 512 bytes at times. Does your program account for this? I have tried to follow the code to see whether a partial sector is put in a buffer, but I can't really determine what happens if the buffer is larger than 512 bytes.
Using more than 512 bytes for LogEntry and LogBuffer size works.
 
Hi tni,
I am working on a project where I would like to use your sketch for logging of accelerometer data. I do not have a lot of experience with this kind of stuff so this may sound dumb, but if I were to use your sketch is it as easy as changing this section in your example:
Code:
void captureData() {
    uint32_t time = micros();
    static uint32_t counter = 0;
    static uint32_t record_offset = 0;
    
    logEntry( { counter, record_offset, time, 0x42424242 } );
    counter++;
    record_offset += sizeof(LogEntry);
}
To something like this (with the accel variable declared at the start):
Code:
void captureData() {
    serial.begin(115200);
    uint32_t accel = analogRead(23); 
    static uint32_t counter = 0;
    static uint32_t record_offset = 0;
    
    logEntry( { counter, record_offset, accel, 0x42424242 } );
    counter++;
    record_offset += sizeof(LogEntry);
}

I tried doing this twice and both times I got a 2,097,152 KB log.bin file with data that didn't seem to make any sense, but maybe I am missing something obvious. Any help would be greatly appreciated, thanks in advance.
 
This logging program works, but how can I use the data? Can it be converted to a format that is readable?

The raw data can be converted into any format you wish. Just insert the SD card in your PC or Mac and write a program which reads the file from the card and reads/interprets/converts it for your convenience.
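As an illustration, here is a minimal PC-side converter sketch. It assumes the record layout from the captureData() example quoted earlier in the thread (four little-endian 32-bit fields: counter, record_offset, time from micros(), and the 0x42424242 marker). The names `convertToCsv` and `formatCsvLine` are made up for this example; a log with different fields needs a matching layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Assumed record layout: four 32-bit fields, 16 bytes per record.
struct LogEntry {
    uint32_t counter;
    uint32_t record_offset;
    uint32_t time;
    uint32_t dummy;
};

constexpr size_t kRecordSize = 16;
constexpr uint32_t kMarker = 0x42424242;  // constant written by the example captureData()

// The Teensy stores integers in little-endian byte order.
static uint32_t readLe32(const unsigned char* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

std::string formatCsvLine(const LogEntry& e) {
    std::ostringstream line;
    line << e.counter << ',' << e.record_offset << ',' << e.time << '\n';
    return line.str();
}

// Reads binary records from `in` and writes CSV to `out`.
// Stops at the first record that does not look like logged data, for example
// the pre-erased tail of a log file that was never closed and truncated.
// Returns the number of records converted.
size_t convertToCsv(std::istream& in, std::ostream& out) {
    out << "counter,record_offset,time_us\n";
    unsigned char raw[kRecordSize];
    size_t count = 0;
    while (in.read(reinterpret_cast<char*>(raw), kRecordSize)) {
        const LogEntry e{readLe32(raw), readLe32(raw + 4),
                         readLe32(raw + 8), readLe32(raw + 12)};
        if (e.record_offset != count * kRecordSize || e.dummy != kMarker) break;
        out << formatCsvLine(e);
        ++count;
    }
    return count;
}
```

To use it on a real card, open the log with `std::ifstream` in `std::ios::binary` mode and pass it in together with an output file stream. Records are processed one at a time, so even a fully pre-allocated file does not have to fit in memory.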
 
I've been trying to use this program to log some data on a Teensy 3.6, and it mostly works, except that it can't seem to handle floating point data. Does anyone know why this would be? Can the code be modified to accept floating point data? Thank you.
 
What size float data is being written? The Teensy can use a 32-bit float or a 64-bit double. If it is read back in the same size/type/format as written, then it should work.

The native FPU on the T_3.5 and T_3.6 handles 32-bit floats. 64-bit doubles work too, but they are handled in software.

What is the point of failure? Can the Teensy read back what is written - or is the problem reading it from the SD card on another device?

If the device reading can't see the 32 bit data - perhaps assign it to a 64 bit double value and write that. It should maintain accuracy in that conversion and then be readable on the other end if it expects a 64 bit floating point storage format.
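To illustrate the "same size/type/format" point, here is a small host-side sketch of what a binary logger effectively does with a 32-bit float, and how the same four bytes look when they are read back as the wrong type. The helper names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this example expects a 32-bit float");

// Copies the float's in-memory representation into 4 raw bytes,
// which is what ends up in the log file when a struct is written as-is.
void floatToBytes(float value, unsigned char out[4]) {
    std::memcpy(out, &value, 4);
}

// Correct way to read the field back: same size, same type.
float bytesToFloat(const unsigned char in[4]) {
    float value;
    std::memcpy(&value, in, 4);
    return value;
}

// Reading the same 4 bytes as an integer yields the IEEE-754 bit pattern,
// which looks like junk if a number such as 1.5 was expected.
uint32_t bytesToUint32(const unsigned char in[4]) {
    uint32_t value;
    std::memcpy(&value, in, 4);
    return value;
}
```

For instance, 1.5f read back as a 32-bit integer gives 0x3FC00000 (1069547520) on an IEEE-754 platform. If a parser produces values like that, it is probably reading a float field with an integer type (or the wrong width), not looking at corrupt data.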
 
The issue is with parsing the file on a PC using Matlab. I have changed the type of all the data to "float". As long as none of the variables being written have a decimal point, I can parse the file with no difficulty. As soon as any one of the variables has a decimal point, when I parse the file I only get junk.
 
You cannot send a float over serial without some conversion; what looks like garbage to you may actually be correct output.
 