Teensy 4.1 Threading w/SD logging

TeensyDigital

I've got a significant code base and program that runs on an ESP32 that I am porting to Teensy 4.1. On the ESP32, I am taking advantage of the two cores, running a bunch of sensor sampling on one core and logging to the SD on the other core in the same program. The sensors are two accelerometers and a barometer, all on I2C, running at 500 Hz (samples per second). On the other core, I am logging to the SD card at varying speeds from 20 Hz (safely with open/close) to 150 Hz (keeping files open).

I have been really impressed with the T4.1. In a comparison of instructions per second the T4.1 is 5000x faster, so it seems like I've got infinite CPU cycles, but when it comes to splitting these two processes into two threads, I'm running into "single-threading" conflicts. I've created a thread for the I2C sampling that runs at 500 Hz without issue. I've also created a thread (or kept it in the main loop) that writes to the SD card at 100 Hz just fine. But when I run them together, my sample rate on the sensors drops down to 40 Hz and my SD logging drops way down. I have stacked both loops with a lot of threads.yield() statements, but I suspect the SD card open/close functions are locking the CPU and starving the sampling thread. The two threads do share global variables, but the SD thread only reads them, while the sensor side updates them.

Any suggestions on how to get these two I/O-intensive processes to behave more like the dual cores on the ESP32 would be appreciated.

Thanks,

Mike
 
Disregard....

Disregard the OP. It turns out I had an instrumentation error in my measurement when I combined the two threads. I can confirm that I can run the two threads, with I2C sampling on one thread and SD logging at high rates on the other, with very little (<3%) degradation. I did find that when I use threads.setMicroTimer(5) I get slightly better performance when both threads are running hot. I am only using standard Teensy libraries and things look good!
 
An important thing to remember is that the SD write routines in the latest version of SD (which is essentially SdFat 2.0B under the hood) take only a handful of microseconds to write to the SD card. They block interrupts for only fractions of a microsecond and do most of their work using DMA and the hardware SDIO.

If you run your data collection in an ISR triggered by an interval timer, buffer the data, and write it out in the foreground loop when buffers become full, you should have no problem collecting data at 500Hz and writing to the SD. Along with others in these forums, I've found that you can log data at rates near 1MHz with well-managed buffers.

There's really no need for a threaded RTOS for this collection scenario. Collect at ISR time and write out in the loop() function and you can handle this collection without threads or any other complications.
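A minimal sketch of the collect-then-write pattern described above, written as portable C++ so it can be tried on a desktop. The Sample layout, the PingPongBuffer class and its method names are assumptions made up for illustration; none of them come from the thread or from a Teensy library. On real hardware push() would be called from an IntervalTimer ISR and the full block would be written to the SD file from loop(), with whatever interrupt guarding the target needs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sample layout; the thread does not specify one.
struct Sample {
    std::uint32_t micros;
    std::int16_t accel[3];
    std::int16_t baro;
};

// Two fixed blocks: the sampler fills one while the writer drains the other.
template <std::size_t N>
class PingPongBuffer {
public:
    using Block = std::array<Sample, N>;

    // Sampling side (an ISR on real hardware). Returns false when the sample
    // had to be dropped because the writer has not released the other block.
    bool push(const Sample& s) {
        if (count_ == N) {              // active block is full, try to swap
            if (ready_ >= 0) {
                ++dropped_;
                return false;
            }
            ready_ = active_;           // hand the full block to the writer
            active_ ^= 1;
            count_ = 0;
        }
        blocks_[static_cast<std::size_t>(active_)][count_++] = s;
        return true;
    }

    // Writer side (loop() on real hardware): a full block, or nullptr.
    const Block* full() const {
        const int r = ready_;
        return r >= 0 ? &blocks_[static_cast<std::size_t>(r)] : nullptr;
    }

    // Writer calls this after the block has been written out.
    void release() {
        assert(ready_ >= 0);
        ready_ = -1;
    }

    std::uint32_t dropped() const { return dropped_; }

private:
    std::array<Block, 2> blocks_{};
    std::size_t count_ = 0;      // samples in the active block
    int active_ = 0;             // index of the block being filled
    volatile int ready_ = -1;    // index of the block awaiting write, or -1
    std::uint32_t dropped_ = 0;
};
```

The dropped() counter is there so an overrun shows up in the log instead of passing silently. If it ever goes above zero, the blocks are too small or the writer is too slow for the chosen sample rate.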
 
Yeah, so far I am very impressed. I have a LOT going on on my board. It is a flight computer for a rocket, so it is sampling a dozen sensors, talking serial to a GPS, doing serial in/out to high-power radios, checking a dozen pins for pyro and continuity, and logging everything it can to ten different SD files. I've spent a few years optimizing it for the ESP32, and am now porting it over to Teensy. In my case, I am sampling at higher rates than I am logging, as I am integrating and averaging samples as I go and only occasionally log to SD. Depending on the stage of flight I sample more and log faster (e.g., during launch I log to SD at 100 Hz, but on descent I'm fine with 10 Hz). On the ESP32, to get to 100 Hz SD logging I had to keep the files open, presenting a risk to the data if the vehicle cato'd off the pad. With the T4.1 I seem to be able to go up to about 300 Hz on the SD with the more protective open/close on the data files. That is a real bonus.

I am sure I will be back here with more optimization questions later, but so far so good. Thanks!
 
On the ESP32 to get to 100hz SD logging I had to keep the files open, presenting a risk to the data if the vehicle cato'd off the pad. With the T4.1 I seem to be able to go up to about 300hz on the SD with more protective open/close on the data files. That is a real bonus.

Rather than opening and closing the files, you could simply call the flush() function. That function writes currently buffered data to the SD and updates the length in the directory. IIRC, it is significantly faster than the close()/open() sequence. If you have multiple files open at once, it might be more efficient if you pre-allocate a generous number of sectors to each file as it is opened. That will extend the time to open the file, but will speed up writes as there is then no need to step through the FAT and update it each time a new cluster is required.
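A rough sketch of the open-once, write-and-flush pattern, in portable C++ so it runs off-target. FakeFile is a made-up stand-in that only mimics the write()/flush() shape of an Arduino-style file object, and FlushingLogger and its flushEvery parameter are likewise illustrative names, not part of any SD library. On a Teensy the template parameter would be the SD library's file type, assuming its write() and flush() have this shape. Pre-allocation is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an open SD file, so the sketch runs on a desktop.
struct FakeFile {
    std::vector<std::uint8_t> pending;    // written but not yet flushed
    std::vector<std::uint8_t> committed;  // what would survive a power loss
    std::size_t flushCalls = 0;

    std::size_t write(const std::uint8_t* buf, std::size_t len) {
        pending.insert(pending.end(), buf, buf + len);
        return len;
    }
    void flush() {
        committed.insert(committed.end(), pending.begin(), pending.end());
        pending.clear();
        ++flushCalls;
    }
};

// The file is opened once elsewhere; log() writes a record and flushes
// after every `flushEvery` records (1 = flush after each write).
template <typename FileT>
class FlushingLogger {
public:
    FlushingLogger(FileT& file, std::size_t flushEvery)
        : file_(file), flushEvery_(flushEvery) {
        assert(flushEvery > 0);
    }

    void log(const std::uint8_t* rec, std::size_t len) {
        file_.write(rec, len);
        if (++sinceFlush_ >= flushEvery_) {
            file_.flush();
            sinceFlush_ = 0;
        }
    }

    // Records written since the last flush, i.e. what a sudden failure
    // could cost at this moment.
    std::size_t atRisk() const { return sinceFlush_; }

private:
    FileT& file_;
    std::size_t flushEvery_;
    std::size_t sinceFlush_ = 0;
};
```

Setting flushEvery to 1 matches flushing after every record. A larger value trades a bounded number of at-risk records for fewer directory updates.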

If cato failures are probable, the standard optimizations centered around accumulating large buffers and writing to SD with the more efficient multi-sector writes are probably not feasible. I ran into that issue on an aerospace project where there was about a 1% probability of failure in the first 10 seconds after launch, but most of the loggers I developed were designed to log for months with minimum battery power consumption.
 
Rather than opening and closing the files, you could simply call the flush() function.

Thank you! That is a very good suggestion. I benchmarked flush() and it was 19% faster than using open/close, and it gave me the same write protection. Using a dedicated benchmark sketch, I can write a 64-byte string 336 times in one second using open/close. If I keep the file open, it logs a screaming 7,751 times per second. With flush(), where I open once, flush() after each write, and close later, I get 400 writes/sec. That is a meaningful improvement. For my needs, I don't usually need to go above 100 Hz, but the flush() approach will give me more flexibility.
Catos don't happen often (although more than your 1%), but when they do, having high-resolution data up to the millisecond is invaluable for the post mortem.
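For anyone wanting to turn those writes-per-second figures into per-write time, here is a small helper sketch. The function names are made up for illustration; the only inputs are the rates quoted in the benchmark above.

```cpp
#include <cassert>

// Time spent per write, in microseconds, for a measured writes-per-second rate.
double perWriteMicros(double writesPerSecond) {
    assert(writesPerSecond > 0.0);
    return 1.0e6 / writesPerSecond;
}

// How much faster `fast` is than `slow`, as a percentage of `slow`.
double percentFaster(double fast, double slow) {
    assert(slow > 0.0);
    return (fast / slow - 1.0) * 100.0;
}
```

With the quoted numbers, 336 writes/sec is roughly 2.98 ms per open/write/close cycle, 400 writes/sec is 2.5 ms per write plus flush(), and 7,751 writes/sec is about 0.13 ms per write with the file kept open. 400 versus 336 is the 19% improvement mentioned. At 100 Hz logging the budget is 10 ms per record, so all three fit.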
 
The open once, write and flush many times, close once process is more efficient with larger writes to SD. When you do small writes, like your 64 bytes, it is not so efficient, as the system has to merge and write a sector 8 times and update the directory sector 8 times before it moves on to the next data sector on the SD card.
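To make the 8-times figure concrete: an SD sector is 512 bytes, so eight 64-byte records share one sector, and flushing after every record rewrites that sector eight times. The sketch below packs records into a sector-sized buffer so each sector is handed to the card once. SectorPacker and flushesPerSector are illustrative names only, and the trade-off is the one discussed in this thread: up to seven records sit in RAM and are lost if the logger dies before the sector fills.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kSectorBytes = 512;  // standard SD sector size

// How many times a sector gets rewritten if every record is flushed alone.
constexpr std::size_t flushesPerSector(std::size_t recordBytes) {
    return kSectorBytes / recordBytes;
}

// Accumulates small records until exactly one sector is ready to write.
class SectorPacker {
public:
    // Appends one record; returns true when the sector is full and
    // sector() should be written out, followed by reset().
    bool add(const std::uint8_t* rec, std::size_t len) {
        assert(len <= kSectorBytes - used_);
        std::memcpy(buf_.data() + used_, rec, len);
        used_ += len;
        return used_ == kSectorBytes;
    }

    const std::uint8_t* sector() const { return buf_.data(); }
    std::size_t used() const { return used_; }
    void reset() { used_ = 0; }

private:
    std::array<std::uint8_t, kSectorBytes> buf_{};
    std::size_t used_ = 0;
};
```

This assumes the record size divides 512 evenly, as 64 does. Other sizes would need records split across sectors, which is not handled here.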

<TINS> Thankfully, SD cards are pretty rugged things. We recovered data from an SD card in an oceanographic instrument that leaked and shorted out its lithium batteries--resulting in a nasty chemical mess inside the pressure case. We were able to remove the SD card and examine the data to determine when the flooding occurred. Unfortunately that was soon after the mooring was deployed at about 40 meters depth in the equatorial Pacific and we didn't recover it until 6 months later.

NOTE: TINS is the standard naval acronym that precedes sea stories. It stands for This Is No Shit. When you see that, you need to recognize that the story is a recitation of events from many years ago and which may have been passed along from sailor to sailor to the point where the original report is lost in history.
 
I'm still using a Teensy 3.6 in my rocket controller unit. It logs 512-bit records at 200 Hz, all while sampling sensors/GPS/continuity, running EKFs, postprocessing nav data and deployment logic, running an OLED screen, camera control, data radio output, and running PID loops for the control system. As mborgerson stated, I don't find that I need anything approaching multi-threading, as there is ample speed for the MCU to "do stuff and wait". I can probably run at 400 Hz, but there's no point. The quality of the navigation solution does not improve (especially using commercial MEMS sensors), nor does the control response, as its bandwidth is an order of magnitude different. In fact I've found 100-200 Hz to capture all relevant flight dynamics unless you're studying structural or propulsion resonances in the rocket. Higher frequencies require tuning and careful monitoring of the system error, as sensor noise adversely affects the measurements.

Note that I log to a binary file, which does improve performance. I don't flush() the data every loop but rather at key events where I expect nothing important to happen, as I drop 1-10 loops when I do that. As you've noticed, this greatly improves transfer rates, but you run the risk of losing data should something go wrong. Once on the ground, the Teensy translates the binary data into a CSV file for easy processing. Note that, with the exception of logic bugs of my own that prevented flush() from running, I've never lost data through outside effects such as lawn darts, flat spins, CATOs and a landing in a drainage ditch. SD cards are tougher than the rocket.
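A minimal sketch of the log-binary-then-convert idea, in portable C++. The LogRecord fields are purely hypothetical (the post only says the records are 512 bits), and appendBinary and toCsv are illustrative names. A byte vector stands in for the binary file on the SD card.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical 64-byte (512-bit) record; the real field layout is not given.
struct LogRecord {
    std::uint32_t millis;
    float accel[3];
    float altitude;
    std::uint8_t pad[44];  // reserved space up to 64 bytes
};
static_assert(sizeof(LogRecord) == 64, "record should be 512 bits");

// In flight: append the raw bytes of one record, with no text formatting.
void appendBinary(std::vector<std::uint8_t>& log, const LogRecord& r) {
    const std::uint8_t* p = reinterpret_cast<const std::uint8_t*>(&r);
    log.insert(log.end(), p, p + sizeof r);
}

// On the ground: walk the binary log and emit one CSV line per record.
std::string toCsv(const std::vector<std::uint8_t>& log) {
    assert(log.size() % sizeof(LogRecord) == 0);
    std::string out = "millis,ax,ay,az,alt\n";
    for (std::size_t off = 0; off + sizeof(LogRecord) <= log.size();
         off += sizeof(LogRecord)) {
        LogRecord r;
        std::memcpy(&r, log.data() + off, sizeof r);
        char line[96];
        std::snprintf(line, sizeof line, "%lu,%.3f,%.3f,%.3f,%.2f\n",
                      static_cast<unsigned long>(r.millis),
                      r.accel[0], r.accel[1], r.accel[2], r.altitude);
        out += line;
    }
    return out;
}
```

The point of the split is that all the float-to-text formatting happens after landing, so the in-flight loop only copies 64 bytes per record.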

It's amazing how much power these 32-bit MCUs have. I haven't tried the 4.x Teensies yet, as the pinouts were too different and frankly I don't really need the processing power for running loosely coupled nav filters. I'd like to make a tightly coupled filter one day, though, as the GPS dropouts at high G are quite annoying. Maybe then multi-threading will help, but then again if PJRC comes out with a 1 GHz MCU it'll be back to "do stuff and wait".
 