commas for thousands separator in printf

bobpellegrino

Active member
Should this be working? I tried the following, but the single quote between the % and the d is ignored.
It prints
myfunc took 537296668
instead of
myfunc took 537,296,668

Code:
void test_read() {
    setlocale(LC_NUMERIC, "");

    elapsedMillis t1;
    myfunc(read_array);
    printf("myfunc took %'d \n", t1);
}
 
I think you're unlikely to find support for a thousands separator in a printf() designed for embedded systems. I've been using printf() for a long time and had never even heard of that feature. According to my Google search, it's required by POSIX 2008. Some printf() implementations are more complete than others, so if you search you may be able to find an alternative library that has it, but if you do, it will almost certainly take more code space than what is in Teensy now.
 
If all you want is the specific case of an unsigned 32-bit integer in decimal, then it's not too tricky to create your own print routine with the custom formatting you want.

This is based roughly on known working code, but with some completely untested changes; I think something like this will do what you want. Pass it the value to output and a char[] large enough to hold the result. It puts the number, with added commas, in the char[] as a C string.

Code:
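unsigned int u32ToTxt(uint32_t value, char* destination) {
  char* start = destination;
  uint32_t tmp = value;
  uint32_t divisor = 1;   // power of ten that selects the leading digit
  int digitsLeft = 0;     // digits still to come after the current one
  while (tmp >= 10) {
    digitsLeft++;
    divisor *= 10;
    tmp /= 10;
  }
  while (digitsLeft) {
    *destination++ = (value / divisor) % 10 + '0';
    if (digitsLeft % 3 == 0) {   // a full group of three follows
      *destination++ = ',';
    }
    divisor /= 10;
    digitsLeft--;
  }
  *destination++ = value % 10 + '0';   // last digit
  *destination = 0;                    // terminate the string
  return destination - start;          // length, not counting the terminator
}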
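
For comparison, here is a rough sketch of the same idea built from the right-hand end, which avoids having to find the leading power of ten first. The names u32ToCommaText and kU32CommaBufSize are made up for this example and are not from any library. The buffer size comes from the worst case "4,294,967,295": 10 digits, 3 commas and the terminator.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Worst case is "4,294,967,295": 10 digits + 3 commas + terminator.
constexpr size_t kU32CommaBufSize = 14;

// Hypothetical helper: writes value with thousands separators into out
// (out must hold kU32CommaBufSize bytes) and returns the text length.
size_t u32ToCommaText(uint32_t value, char* out) {
  char tmp[kU32CommaBufSize];
  size_t pos = sizeof(tmp);   // fill tmp from the end towards the front
  unsigned digits = 0;        // digits emitted so far

  tmp[--pos] = '\0';
  do {
    if (digits != 0 && digits % 3 == 0) {
      tmp[--pos] = ',';       // a comma after every complete group of three
    }
    tmp[--pos] = static_cast<char>('0' + value % 10);
    value /= 10;
    digits++;
  } while (value != 0);

  size_t len = sizeof(tmp) - 1 - pos;
  memcpy(out, &tmp[pos], len + 1);   // copy the text plus its terminator
  return len;
}
```

Called with 537296668 and a 14-byte buffer, this should leave "537,296,668" in the buffer, which can then go to Serial.print() or a plain "%s" in printf().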

This isn't the most efficient way to do it. I have a library that does simple integer printouts with basic formatting (min widths, leading zeros, etc.) that is more efficient in terms of mathematical operations, but adding the commas would mess up the width requirement calculations it does.

As for why I have that code: printf is sloooooow. I had some logging code that was calling printf a LOT when formatting the data to output to the log file. There is an overhead every time you call printf; for one or two calls it doesn't matter, but if you call it in a loop it adds up rapidly. By replacing printf with custom number-to-text code I reduced the time required for the output formatting to around 3% of the original.
 
Which reinforces the point that, when logging more than a few values, save the data in binary format. You’ll save processor time and storage space. If the data later needs more processing, you’ll save time and avoid loss of precision as well.

When it comes time to present the data to a human being, your processor should have no problem converting to text and either displaying or printing it as fast as a person can handle.
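
To make the binary option concrete, here is a minimal sketch of a fixed-size record that is copied into a byte buffer as-is and read back later. LogRecord, packRecord and unpackRecord are hypothetical names, and the 12-channel layout is only an assumption for illustration. It also assumes the logger and the reader agree on byte order, which has to be checked for each setup.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

constexpr size_t kChannels = 12;   // assumed channel count

// Hypothetical fixed-size log record: timestamp plus raw ADC counts.
struct LogRecord {
  uint32_t time_us;
  uint16_t adc[kChannels];
};

// 4 + 12 * 2 = 28 bytes; fails to compile if the compiler adds padding.
static_assert(sizeof(LogRecord) == 4 + 2 * kChannels, "unexpected padding");

// Copy one record into a byte buffer; returns the number of bytes written.
size_t packRecord(const LogRecord& rec, uint8_t* out) {
  memcpy(out, &rec, sizeof(rec));
  return sizeof(rec);
}

// Rebuild a record from the bytes produced by packRecord().
LogRecord unpackRecord(const uint8_t* in) {
  LogRecord rec;
  memcpy(&rec, in, sizeof(rec));
  return rec;
}
```

Each record is 28 bytes with no formatting work at all, and the values come back bit for bit, so nothing is lost to rounding when the data is processed later.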
 
I agree, I always default to binary for data logging.

The issue was the two product requirements: 1) the log must be in a specific industry-standard format, and 2) data (other than the last 1/2 second or so) should not be lost if the user pulls the power or memory card without warning.
Requirement 1) called for a text log (believe me, I tried to sell the "log binary, run a converter program on the PC later" approach). Requirement 2) means we can't log binary and then convert once they stop logging.
 
Can't dispute that binary logging is faster, but I've always been able to log to text (csv) files, and it's so nice for files to be readable and not to need another program on the host (PC) to read and translate a fixed binary format. The sketch below will reliably log 12 channels of data at 50 kHz to a CSV file. That works out to about 3.2 MB/sec, or about 200 MB/minute. It uses one call to sprintf() to format the data, and it uses the SdFat RingBuf to buffer data between the IntervalTimer and loop(). Data is always written to file in chunks of 512 bytes (1 sector), and it uses file.isBusy() to avoid blocking when SD writes take a long time. For this data rate, the RingBuf size is 210 KB, and it typically only gets about half full at max.

EDIT: files are preAllocated, which helps minimize long delays in SD file writes
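
For anyone checking the sizing, the arithmetic behind the sketch's configuration macros can be written out as compile-time constants. The k-prefixed names are made up for this example; the values mirror the macros in the sketch. Note that these are worst-case sizes (every record at its maximum length), so they sit above the roughly 3.2 MB/sec quoted above, presumably because most records are shorter than the maximum.

```cpp
#include <stdint.h>

// Hypothetical constants mirroring the sketch's configuration macros.
constexpr uint32_t kLogFreqHz  = 50000;   // samples per second
constexpr uint32_t kChannels   = 12;      // values per record
constexpr uint32_t kLogTimeS   = 60;      // logging time in seconds

// Worst case per record: 12 bytes for the timestamp field plus
// 6 bytes (comma + up to 5 digits) per channel.
constexpr uint32_t kRecordSize    = 12 + kChannels * 6;
constexpr uint32_t kBytesPerSec   = kRecordSize * kLogFreqHz;
constexpr uint32_t kFileSize      = kBytesPerSec * kLogTimeS;
constexpr uint32_t kRingBufSize   = kBytesPerSec / 20;      // 50 ms of data
constexpr uint32_t kTimerPeriodUs = 1000000 / kLogFreqHz;

static_assert(kRecordSize == 84, "bytes per record");
static_assert(kBytesPerSec == 4200000, "4.2 MB/s upper bound");
static_assert(kFileSize == 252000000, "pre-allocated file size");
static_assert(kRingBufSize == 210000, "the 210 KB RingBuf");
static_assert(kTimerPeriodUs == 20, "20 us between interrupts");
```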


Code:
// Teensy 4.1 SDIO logger with SdFat RingBuf

#include <SdFat.h>
#include <RingBuf.h>

//******************************************************************************
// configuration
//******************************************************************************
#define SD_CONFIG       (SdioConfig(FIFO_SDIO))         // use Teensy SDIO
#define LOG_TIME_S      (60)                            // s
#define LOG_FREQ_HZ     (50000)                         // Hz
#define ADC_N_CHAN      (12)                            // number of channels
#define RECORD_SIZE     (12 + ADC_N_CHAN*6)             // bytes/record
#define BYTES_PER_S     (RECORD_SIZE * LOG_FREQ_HZ)     // bytes/sec
#define LOG_FILE_SIZE   (BYTES_PER_S * LOG_TIME_S)      // total bytes
// RingBuf size = 50 ms of data for BYTES_PER_S data rate (minimum 1024)
#define BYTES_PER_50MS  (BYTES_PER_S/20)                // 50-ms buffer size
#define RB_MIN          (1024)                          // minimum buffer_size
#define RINGBUF_SIZE    (BYTES_PER_50MS < RB_MIN ? RB_MIN : BYTES_PER_50MS)

//******************************************************************************
// SdFat RingBuf template class <file type, buffer size>
//******************************************************************************
RingBuf<FsFile,RINGBUF_SIZE> rb;       // ISR --> RingBuf --> loop --> SD file

//******************************************************************************
// global variables
//******************************************************************************
IntervalTimer timer;                    // IntervalTimer for ISR-level writes
SdFs     sd;                            // SdFat type
FsFile   file;                          // SdFat file type
size_t   rbMaxUsed = 0;                 // RingBuf max bytes (useful diagnostic)
uint32_t error;                         // RingBuf/file error code
uint32_t timer_count=0;                 // timer interrupt executions
uint32_t timer_bytes=0;                 // timer interrupt bytes written
char     format[128];                   // sprintf format string

//******************************************************************************
// IntervalTimer handler -- write simulated ADC data to RingBuf
//******************************************************************************
void timer_handler( void )
{
  static char s[RECORD_SIZE];
  static uint32_t start_us;
  uint32_t now_us = micros();
  if (timer_count == 0)
    start_us = now_us;

  // write up to 16 values to str (as determined by format)
  uint16_t value = now_us % 4096;
  int n = sprintf( s, format, (now_us - start_us)/1E6,
  value, value, value, value, value, value, value, value,
  value, value, value, value, value, value, value, value
  );

  // write n bytes from string s to RingBuf
  if (rb.memcpyIn( s, n ) < n)          // if write NOT complete
    error = 1;                          //   set global error variable
  timer_count++;                        // increment count
  timer_bytes += n;                     // update bytes
}

//******************************************************************************
// setup()
//******************************************************************************
void setup()
{
  Serial.begin(9600);
  while (!Serial && millis() < 3000) {}

  Serial.printf( "Teensyduino version %1lu\n", TEENSYDUINO );
  Serial.printf( "SdFat version %s\n", SD_FAT_VERSION_STR );

  // Initialize the SD, print message and stop if error
  if (!sd.begin(SD_CONFIG)) {
    sd.initErrorHalt(&Serial);
  }

  // create format string
  int n = sprintf( format, "%s", "%1.6lf" );
  for (int i=0; i<ADC_N_CHAN; i++) {
    n += sprintf( &format[n], "%s", ",%1hu" );
  }
  n += sprintf( &format[n], "%s", "\n" );
  Serial.print( format );
}

//******************************************************************************
// open file, preAllocate, init RingBuf, start timer, log data, print results
//******************************************************************************
void loop()
{
  // clear input buffer
  while (Serial.available()) { Serial.read(); }
 
  // List files in SD root.
  Serial.println( "Files in root directory:" );
  sd.ls(LS_DATE | LS_SIZE);
 
  Serial.println( "Type any character to begin" );
  while (!Serial.available()) {}
 
  Serial.printf( "Log %1lu channels for %1lu seconds at %1lu Hz\n",
                        ADC_N_CHAN, LOG_TIME_S, LOG_FREQ_HZ );
  Serial.printf( "Pre-allocate file %1lu bytes\n", LOG_FILE_SIZE );
  Serial.printf( "RingBuf %1lu bytes\n", RINGBUF_SIZE );

  // Open or create file - truncate existing file.
  if (!file.open( "logfile.txt", O_RDWR | O_CREAT | O_TRUNC )) {
    Serial.println( "open failed\n" );
  }
  // File must be pre-allocated to avoid huge delays searching for free clusters.
  else if (!file.preAllocate( LOG_FILE_SIZE )) {
     Serial.println( "preAllocate failed\n" );
     file.close();
  }
  // init the file and RingBuf
  else {
    rb.begin(&file);
  }
 
  uint32_t timer_period_us = 1E6 / LOG_FREQ_HZ;
  Serial.printf( "Start timer (period = %1lu us)\n", timer_period_us );
  timer_count = 0;
  timer_bytes = 0;
  timer.begin( timer_handler, timer_period_us );

  elapsedMillis ms = 0;
  while (ms/1000 < LOG_TIME_S) {
     
    // number of bytes in RingBuf
    size_t n = rb.bytesUsed();
    if (n > rbMaxUsed) {
      rbMaxUsed = n;
    }

    // if RingBuf >= 512 (one sector), write 512 bytes to file
    if (n >= 512 && !file.isBusy()) {
      if (512 != rb.writeOut(512)) {
        error = 1; // write error
      }
    }
  }
 
  // stop timer, write any remaining RingBuf data to file, truncate file
  timer.end();
  rb.sync();
  file.truncate();

  Serial.printf("fileSize: ");
  Serial.println((uint32_t)file.fileSize());
 
  Serial.printf( "Runtime %1.3lf s\n", ms/1E3 );
  Serial.printf( "%1lu timer interrupts (%1lu bytes)\n", timer_count, timer_bytes );
  Serial.printf( "RingBuf max %1lu bytes\n", (uint32_t)rbMaxUsed );
             
  switch (error) {
    case 0:   Serial.printf( "No error\n" );                    break;
    case 1:   Serial.printf( "Not enough space in RingBuf\n" ); break;
    case 2:   Serial.printf( "File is full\n" );                break;
    case 3:   Serial.printf( "Write from RingBuf failed\n" );   break;
    default:  Serial.printf( "Undefined error %1lu\n", error ); break;
  }
 
  file.rewind();
  // Print first 10 lines of file
  for (uint8_t n = 0; n < 10 && file.available();) {
    int c = file.read();
    if (c < 0) {
      break;
    }
    Serial.write(c);
    if (c == '\n') n++;
  }

  // close file
  file.close();
}
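
If the producer/consumer hand-off in that sketch is unclear, here is a stripped-down byte ring buffer showing the general idea: the interrupt side only advances the head, the loop() side only advances the tail. This is NOT SdFat's RingBuf, just a hypothetical ByteRing class to illustrate the pattern, and it ignores the memory-ordering details a real ISR-safe implementation has to consider.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical single-producer, single-consumer byte ring buffer.
// One slot is always left empty, so it holds at most N - 1 bytes.
template <size_t N>
class ByteRing {
 public:
  size_t bytesUsed() const { return (head_ + N - tail_) % N; }

  // Producer side (e.g. timer interrupt): returns the bytes accepted,
  // which is less than len when the buffer fills up.
  size_t put(const uint8_t* src, size_t len) {
    size_t n = 0;
    size_t head = head_;
    while (n < len && (head + 1) % N != tail_) {
      buf_[head] = src[n++];
      head = (head + 1) % N;
    }
    head_ = head;   // publish the new data to the consumer
    return n;
  }

  // Consumer side (e.g. loop()): returns the bytes delivered.
  size_t get(uint8_t* dst, size_t len) {
    size_t n = 0;
    size_t tail = tail_;
    while (n < len && tail != head_) {
      dst[n++] = buf_[tail];
      tail = (tail + 1) % N;
    }
    tail_ = tail;   // hand the freed space back to the producer
    return n;
  }

 private:
  uint8_t buf_[N];
  volatile size_t head_ = 0;   // written only by the producer
  volatile size_t tail_ = 0;   // written only by the consumer
};
```

A short return from put() plays the same role as the "write NOT complete" check in timer_handler(): the buffer was too full, so data was dropped and an error should be flagged.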
 
The issue wasn't the amount of data. It was already outputting the formatted text data in blocks equal to the sector size with huge buffers to allow for SD card stalls.
The issue was purely that printf is slow; specifically, there is a reasonable overhead each time it's called, no matter how simple the output string is. The data was CAN-FD messages, which means each packet was of variable size, up to 64 bytes. The simple solution to this was along the lines of:
Code:
for (i = 0; i < dataLen; i++) outLen += sprintf(output + outLen, " %02X", data[i]);
There were then around 3-4 more printfs generating the rest of the output line, so 67-68 calls to printf per line for a full-size packet. It ended up that for each packet the calls to printf were taking around 110 us.

Load two CAN-FD buses at 2 Mbaud up to 95% and suddenly half your processor time is spent in printf.

Replacing the code in that for loop with a lookup table (for this data value, copy these bytes to the output) reduced the time for each packet from 110 us to around 15 us. Custom print routines in that loop gave a similar speed-up, with slightly more complex code but a lower memory footprint.
Completely removing all the printf calls and using custom print routines dropped the time to 3 us.
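
A minimal sketch of the lookup-table idea, not the actual product code: a 256-entry table holds the three characters " XX" for every possible byte value, so formatting a byte becomes a single 3-byte copy. HexTable, kHexTable and bytesToHex are hypothetical names for this example. The table costs 768 bytes, which is the memory trade-off against a custom print routine.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// 256 entries of three characters each (" 00" .. " FF"), filled in once
// during static initialization, before setup()/main() runs.
struct HexTable {
  char text[256][3];
  HexTable() {
    static const char digits[] = "0123456789ABCDEF";
    for (int v = 0; v < 256; v++) {
      text[v][0] = ' ';
      text[v][1] = digits[v >> 4];     // high nibble
      text[v][2] = digits[v & 0x0F];   // low nibble
    }
  }
};
static const HexTable kHexTable;

// Hypothetical helper: appends " XX" for each data byte and terminates
// the string. out must hold 3 * len + 1 bytes. Returns the text length.
size_t bytesToHex(const uint8_t* data, size_t len, char* out) {
  size_t n = 0;
  for (size_t i = 0; i < len; i++) {
    memcpy(out + n, kHexTable.text[data[i]], 3);   // one table copy per byte
    n += 3;
  }
  out[n] = '\0';
  return n;
}
```

For a 64-byte CAN-FD payload the output buffer needs 3 * 64 + 1 = 193 bytes, and the whole payload is formatted without a single printf call.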
 
50K lines per second is pretty impressive for a CSV recorder. What do you use to examine the file after you've recorded a minute of data? By that time you've got about three times the row limit for Excel!

I was stuck in the CSV format for many years when I had to log GPS data in NMEA format. Now that we have affordable U-Blox GNSS systems with binary output and libraries like TinyGPS that will convert the NMEA format to binary, I'm able to stick with binary logging. I process all my data with MATLAB, where the number of samples is limited only by your system memory. In my university days, MATLAB was provided to all faculty and students through a campus-wide license. Now that I'm retired, I use the MATLAB Home version, which costs $149. I usually upgrade every few years so I can keep up with improvements.
 
The timer handler runs every 20 us, and I think it's taking about 10 us on each execution. Binary would be a lot faster, but I just wanted to show that if you have a reason to use CSV, it can be done. This is a much higher data rate than most will ever need, so if you have a more modest data rate, CSV is not a limiter. There is just one call to sprintf() per interrupt. I have a CSV file parser and charting program that I wrote myself, and it's quite fast. I know lots of engineers who swear by MATLAB, but I never had access to the academic pricing, so I just found other ways.
 
You know what else is slow? Most serial monitors. Using Serial.print and sending it to the serial monitor built in to Visual Studio is like running full speed into a pond of molasses. Surprisingly, the serial monitor built into the Arduino IDE is much faster. I have code that generates ~600 data frames (200 values each) per second, but your average Windows serial monitor can only display them at ~300 frames per second. If I try to read the data with a Python script, I can only get ~24 frames per second. Depending on the Windows routines to read ASCII data over USB is never going to be speedy. I wonder if my old VT52 would keep up? She was beautiful...
 