Read SD card file values (char string) into an Int array?


martianredskies

I've looked through a lot of forums over the last 24 hours but can't find a solution.

Suppose you have a .txt file (on SD), and in that file you have values stored like this:

<1.2.3.4.5.6.7.8.9.100.200.400.800.1600.32000.<128.128.128.128.128.128.128.128.>

Quick question(s):

If you wanted to read the ASCII from the beginning (1) through 32000 and load it into an integer array, is there an easy way in the current format?

*I don't think padding is an option.

Thanks!
 
Getting closer. I'm now treating the "." as the end of a variable, so I can count the number of variables in a file.

I've thought this through; the largest number I need to represent is 7 characters long (i.e. 9999999). So maybe, in the code below, I read each character into an array (i.e.
Code:
char buf[8];
), but reset the counter if the index equals 8 or a '.' character appears.

I'm not exactly sure what's next, but if I had a character array containing { 9,9,9,9,9,9,9 }, would I then use parseInt to turn that into a long 9999999?

Can anyone say if I'm approaching this the right way?

Code:
void parseFile(){
  myFileIn = SD.open(secondWord);
  if (myFileIn) {
    char buf[8];
    int tmp;  // SD read() returns an int, not a char *
    int i = 0;
    uint32_t fileSize = myFileIn.size();
    Serial.print("filesize is: ");
    Serial.print(fileSize);
    Serial.println(" bytes");
    while (myFileIn.available()){ // read from the file until there's nothing else in it
      tmp = myFileIn.read();
      if (tmp == '.'){
        parseIndex = i;
        Serial.println(parseIndex);
        i++;
        if (i == metadataPointer){
          Serial.println("end of metadata");
          myFileIn.close(); // close before leaving the function
          return;
        }
      }
    }
    myFileIn.close(); // close the file
  }
  else {
    Serial.println("error opening file for read"); // if the file didn't open, print an error
  }
}
 
It seems a valid start to parsing; you understand the data format. Whatever the data was before saving should ideally be re-created or recovered by undoing the process that saved it.

If the '.' or ',' ends a value, I'm not sure why 9,9,9 would become 999.

Also, that snippet doesn't show what would be done with the parsed data after it is recovered.
 
Yeah, the idea is to keep the SD data files intact, i.e. <1.2.3.4.5.6.7.8.9.100.200.400.800.1600.32000.<128.128.128.128.128.128.128.128.> ...

... and convert these character arrays to integers stored in memory only when the file is loaded.

I set up a data file like <9876543.1.2.3.4.5.6.7.8.9 etc. The code below is only set up to capture the first variable, but it seems to work.

It gives me a character [] array with the value 9,8,7,6,5,4,3.

Ideally, with this method I can stop the read at any point and the last string will be in a character array that I can quickly convert to an integer.
*edit: posted correct code

Code:
void parseFile(){
  myFileIn = SD.open(secondWord);
  if (myFileIn) {
    const uint8_t bufSize = 7;
    char buf[bufSize + 1] = "0000000"; // +1 for the null terminator
    int tmp;  // read() returns an int
    int i = 0;
    int j = 0;

    uint32_t fileSize = myFileIn.size();
    Serial.print("filesize is: ");
    Serial.print(fileSize);
    Serial.println(" bytes");

    while (myFileIn.available()){ // read from the file until there's nothing else in it
      tmp = myFileIn.read();

      if (tmp == '.'){
        parseIndex = i;
        Serial.println(buf);
        j = 0;
        Serial.println("end of metadata");
        myFileIn.close(); // close before leaving the function
        return;
      }

      if (tmp != '<' && j < bufSize){ // bounds check prevents overflowing buf
        buf[j] = tmp;
        j++;
      }
    }
    myFileIn.close(); // close the file
  }
  else {
    Serial.println("error opening file for read"); // if the file didn't open, print an error
  }
}
 
I've got it all working. Attaching my function if it may be of help to others. Consider a .txt file with the following character sequence;

<9876543.1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20>

The code will skip '<', '>' and '.' and write everything else to a small char buffer (7 bytes, the length of the largest number expected). When it hits the next '.', it will clear this buffer and reset the write index to 0. At the same time it will output a long-integer conversion of the (up to) 7-digit number. It keeps this conversion going until it reaches the end marker '>'.

Output of the above would be (as integers):

9876543
2
3
4
5
6
7
8
9
10
etc.

Because it always has the last (up to) 7-digit "number" stored when it receives the '.', you can easily modify the code to use parseIndex to stop on any variable (e.g. the 7th variable in).

Serially sending or storing files as ASCII characters may take 2x to 4x the space (vs. int), but it makes serial transmission and receiving a breeze. No more MSB/LSB nonsense; sending a 1 is the same as sending 9,999,999. That said, part of the need for the function below was to avoid padding numbers so that every one fits the same format. With a string, sending a 1 can be done in 2 bytes (char + marker), but sending or storing a 1 as 0000001 is 8 bytes.

Code:
void parseFile(){
  myFileIn = SD.open("whatever.txt"); // set to whatever filename you're reading
  if (myFileIn) {
    uint32_t parseIndex = 0;
    const uint8_t bufSize = 7;
    char buf[bufSize + 1] = {0}; // +1 so the 7-digit string stays null-terminated for atoi
    int tmp;  // read() returns an int
    int i = 0;
    int j = 0;
    int intResult = 0;

    uint32_t fileSize = myFileIn.size();
    Serial.print("filesize is: ");
    Serial.print(fileSize);
    Serial.println(" bytes");

    while (myFileIn.available()){ // read from the file until there's nothing else in it
      tmp = myFileIn.read();

      if (tmp == '.'){
        parseIndex = i; // keep track of '.' elapsed
        i++;
        intResult = atoi(buf);
        Serial.println(intResult); // print full buffer upon receiving '.'
        memset(buf, 0, sizeof(buf)); // wipe the character buffer
        j = 0;
      }

      if (tmp != '<' && tmp != '>' && tmp != '.' && j < bufSize){ // ignore markers, guard buffer
        buf[j] = tmp;
        j++;
      }

      if (tmp == '>'){ // exit upon receiving the '>' end marker
        myFileIn.close();
        return;
      }
    }
    myFileIn.close(); // close the file
  }
  else {
    Serial.println("error opening file for read"); // if the file didn't open, print an error
  }
}
 
Three observations:

1. If the median value in your data is < 999 and all the values are positive, your character storage format may actually take less space than storing 4-byte long integers.
2. If you have to handle negative numbers, the storage efficiency will go down depending on the number of negative values, as you have to account for the '-' character.
3. If you want your file processing to run MUCH faster, read sections of the file into a character array of at least 4K bytes and process the array. It will take a bit of extra code to handle multi-digit numbers that cross buffer boundaries if the buffer is smaller than the file.
 
1. Wouldn't 999. = 4 bytes? The byte count would always be one more than the number of digits, no?

2. Luckily, no negative numbers.

3. The function runs pretty fast; it parses 130k variables (though mostly 8-bit data, i.e. 3 digits) in 2 seconds on my T3.6.
 
You are correct on #1. I slipped in an extra '9'--and even that wasn't really correct. In reality, it should have been: if the median value in the data is less than 100, the storage will take less space than if you used a 4-byte binary integer format. This is one of those situations where a good analysis of the expected data might pay significant dividends. It's always important to remember that in today's world, storage is cheap, CPU cycles are cheap, and both are getting cheaper. Operator time is expensive and getting more so all the time.

On #3, 130KB in 2 seconds seems OK for files that size. I'm more used to working with files of 2 to 100 MBytes, and parsing those with single-character file reads would try my patience. OTOH, you probably don't have to worry about files that size if you are going to read the data and use it on a T3.6. Working with large data sets from long-term loggers does tend to make one concentrate on efficiency to a greater extent than when working with only a few hundred KB.

Deciding whether to spend a few hours optimizing code can be a struggle. If your code gets run 10,000 times and forces the user to wait 2 seconds each time, that's 20,000 seconds or 5.55 hours. If you could spend 2 hours optimizing the code so that each run took only 0.2 seconds to parse the data, that would reduce the wait time to about a half an hour. The net savings in operator hours is about 5 hours. Is that savings worth 2 hours of your time? The answer can be complex and involves the cost of the operator hours versus the programmer hours as well as a lot of other factors. Saving 2 seconds isn't a big deal---but saving 200 seconds each run might be important.

I do appreciate that it is important to get it right first, then make it run faster if necessary. If you try to make it run quickly before you know your parsing works properly, a bug could cost you all the operator time you thought you had saved.
 
I agree with all your points. I also tend to code in a way "that just works" first, then polish it as you look at it more and realize you can do things more efficiently. The same data file that took 2 seconds to read takes 25 seconds to transmit over serial. There is always room for improvement.

 
Cleaned up and refined a bit. Used the "isDigit" function instead of searching for specific characters. If you're just dealing with numbers, you can use "isDigit" to filter out everything besides digits being saved to the character array.

Code:
void parseFile(){
  myFileIn = SD.open("whatever.txt");
  if (myFileIn) {
    const uint8_t bufSize = 7;
    char buf[bufSize + 1] = {0}; // +1 so atoi always sees a null-terminated string
    int tmp;  // read() returns an int
    int i = 0;
    int j = 0;
    uint32_t intResult = 0;

    while (myFileIn.available()){ // read from the file until there's nothing else in it
      tmp = myFileIn.read();

      if (tmp == '.'){
        parseIndex = i; // (global) keep track of '.' elapsed
        i++;
        intResult = atoi(buf); // convert char array to integer; this is your value at the current parseIndex
        Serial.println(intResult); // print full buffer upon receiving '.'
        memset(buf, 0, sizeof(buf)); // wipe the character buffer
        j = 0;
      }

      if (isdigit(tmp) && j < bufSize){ // keep only digits 0-9; guard the buffer
      //if (tmp != '<' && tmp != '>' && tmp != '.'){ // use this instead if expecting digits + character data
        buf[j] = tmp;
        j++;
      }

      if (tmp == '>'){ // exit upon receiving the '>' end marker
        myFileIn.close();
        return;
      }
    }
    myFileIn.close(); // close the file
  }
  else {
    Serial.println("error opening file for read"); // if the file didn't open, print an error
  }
}
 
How big is a zip'd copy of the '2 seconds to parse' data file?

Using 'else if' on two of the tests in the right order might allow the compiler to make the loop more efficient.

Doing a "char someBuf[512]" read of 512-byte blocks when that much is left, then stepping through indices {0-511} between reads, might result in improved throughput.
 
Thanks! I will duplicate the function and experiment with modifications like this. I implement something like this when doing the serial transfer: grouping packets into a single word does speed things up vs. sending character by character. I'll post my results.

 
Having lots of time for projects as a result of the Governor of Oregon's Stay-at-home order, I worked out an optimized version of the ASCII to Long Integer parsing algorithm. I also added a primitive user interface and code to allow writing small and large test files and showing the file directory. Now parsing a 2-digit number (and its separator) takes about 0.4 microseconds.

Hopefully, the comments in the code will make it easy to understand and adapt.

Code:
/**************************************************************************
   Optimizing an ASCII number file parsing algorithm
   MJB  3/26/20

   You can enter the following commands in the serial monitor:

   's'   Create a small file of 160 random numbers  in ASCII format
   'l'   Create a large file of 1 million numbers in ASCII format
   'r'   Read the ASCII file you have created
   'd'   Show a disk file directory

   The program uses the EXFAT file system, which is part of Bill Greiman's
   SDFAT  Beta library.   I use it because it can handle SD cards > 32GB.

   When you read or write a file, the first 160 numbers are displayed, so you can
   verify that the parsed numbers match the original output.

   The write algorithm produces random numbers between 0 and 99, with an
   occasional very large number thrown in--as shown in the original poster's
   sample data.  File writing is done number by number and takes about 10 times
   longer than reading the data in blocks.

   The read function works with a 16KB buffer to minimize file operations.
   Instead of calling atoi(), the input is converted directly to the
   number--avoiding copies to buffers, etc.

   On a T3.6 at 180MHz, the parser can convert 2.9MB of text to 1 million numbers
   in 393 milliseconds.  That means that the average time to read and parse a number
   is about 0.4 microseconds!   Of course, sending that number out through a serial
   port is going to slow things down a bit ;-)
 *********************************************************************************/


#include "SdFat.h"
#include "sdios.h"
#include "FreeStack.h"
#include "ExFatlib/ExFatLib.h"
#include <time.h>
#include <TimeLib.h>
SdExFat sd;
ExFile asciiFile;


#define SD_CONFIG SdioConfig(FIFO_SDIO)

// SDCARD_SS_PIN is defined for the built-in SD on some boards.
#ifndef SDCARD_SS_PIN
const uint8_t SD_CS_PIN = SS;
#else  // SDCARD_SS_PIN
// Assume built-in SD is used.
const uint8_t SD_CS_PIN = SDCARD_SS_PIN;
#endif  // SDCARD_SS_PIN


#define BUFFSIZE 16384
char cbuffer[BUFFSIZE];
char filename[] = "whatever.txt";

//  Change this constant to define file size
//  when it is less than 500 debug output is printed to Serial
#define NUMTODISPLAY 160
/*****************************************************************************
   Read the Teensy RTC and return a time_t (Unix Seconds) value

 ******************************************************************************/
time_t getTeensy3Time() {
  return Teensy3Clock.get();
}


void setup() {
  // put your setup code here, to run once:
  while (!Serial) {}
  Serial.begin(9600);
  Serial.println("\nASCII Parsing test");
  if (!sd.begin(SD_CONFIG)) {
    Serial.println("\nSDIO Card initialization failed.\n");
  } else  Serial.println("SDIO initialization done.");

  if (sd.fatType() == FAT_TYPE_EXFAT) {
    Serial.println("Type is exFAT");
  } else {
    Serial.printf("Type is FAT%d\n", int16_t(sd.fatType()));
  }
  // set date time callback function  so file gets a good date
  SdFile::dateTimeCallback(dateTime);
  setSyncProvider(getTeensy3Time);
}

void loop() {
  // put your main code here, to run repeatedly:
  char ch;
  if (Serial.available()) {
    ch = Serial.read();
    if (ch == 's')  WriteAsciiFile(160);
    if (ch == 'l')  WriteAsciiFile(1000000);
    if (ch == 'r')  ParseAsciiFile();
    if (ch == 'd')  sd.ls(LS_SIZE | LS_DATE | LS_R);
  }
}

// Write random number to ascii file.  Most numbers will be in range 0 to 99, but occasional values
// will be very large.  Write one at a time to file, since efficiency isn't an issue in generating the file.
void WriteAsciiFile(uint32_t fnums) {
  uint32_t i, num;
  Serial.printf("\n\nWriting ASCII File of %lu positive integers\n", fnums);
  // Open the file
  if (!asciiFile.open(filename,  O_RDWR | O_CREAT | O_TRUNC)) {
    Serial.printf("Unable to open <%s> for writing.", filename);
    return;
  }
  // start with '<'
  asciiFile.print('<');

  //  Write the numbers as ascii with '.' separator
  for (i = 0; i < fnums; i++) {
    num = random(99);
    if (num == 50) num = random(1311) * 1117;
    asciiFile.printf("%u.", num);
    if (i <= NUMTODISPLAY) { // show verification output first values
      Serial.print(num); Serial.print(".");
      if ((i % 16) == 15) Serial.println();
    }
  }
  // end with '>'
  asciiFile.print('>');
  asciiFile.close();
  Serial.println("Write Finished");
}

void ParseAsciiFile(void) {
  uint16_t idx, numread, numdigits;
  uint32_t totalchars, totalnums, tempnum;
  uint32_t startmilli, endmilli;
  char ch;
  if (!asciiFile.open(filename, O_READ)) {
    Serial.printf("\nCould not open <%s> for reading.", filename);
    return;
  }
  startmilli = millis();  // save starting time
  Serial.printf("\n\nParsing ASCII File of positive integers\n");
  idx = 0;
  totalnums = 0;
  totalchars = 0;
  tempnum = 0;
  numdigits = 0;
  do {
    numread = asciiFile.read(&cbuffer[0], BUFFSIZE);
    totalchars += numread;
    // numread tells us how many were read.  When it is less than BUFFSIZE, we are done
    for (idx = 0; idx < numread; idx++) {
      ch = cbuffer[idx];
      if (isdigit(ch)) { // add to tempnum
        tempnum = tempnum * 10 + (ch & 0x0F); // make up the binary number
        numdigits++;
      } else { // we are at separator, '<' or '>'
        if (numdigits > 0) {

          if (totalnums <= NUMTODISPLAY) {
            SendNumber(tempnum);  // send out resulting number
            if ((totalnums % 16) == 15) Serial.println();
          }
          totalnums++;
        }
        tempnum = 0;
        numdigits = 0;
        // if there is the possibility of a '>' before the end of the file
        // you can add code for early exit here
      }
    }

  } while (numread == BUFFSIZE);

  endmilli = millis();
  asciiFile.close();
  Serial.println();
  Serial.printf("Parsing %lu characters  into %lu numbers took %lu milliseconds\n",
                totalchars, totalnums, endmilli - startmilli);
}


void SendNumber(uint32_t num) {
  Serial.printf("%lu ", num);
}

//------------------------------------------------------------------------------
/*
   User provided date time callback function.
   See SdFile::dateTimeCallback() for usage.
*/
void dateTime(uint16_t* date, uint16_t* time) {
  // use the year(), month() day() etc. functions from timelib

  // return date using FAT_DATE macro to format fields
  *date = FAT_DATE(year(), month(), day());

  // return time using FAT_TIME macro to format fields
  *time = FAT_TIME(hour(), minute(), second());
}
 
I followed up on my parsing optimization with some tests that changed the file buffer size. It wasn't until the buffer went below 512 bytes that the read parsing time for one million values went up over 400 milliseconds. I think that means that an improved parsing algorithm had more effect than reading larger file buffers. It probably also means that the EXFAT version of SDFAT is very efficient and does a good job with small file reads. I suspect that a FAT32 SD card would be somewhat slower because of the larger number of FAT accesses for the smaller cluster sizes.

I'm somewhat abashed that throwing lots of RAM at the problem wasn't the key factor in reducing read and parsing time. It just goes to show that a better algorithm usually smooths the path to success.
 
Nice work! I'm excited to try it. I read that when you read an SD file, it automatically loads a 512-byte block, and subsequent reads are actually served from memory (until the next 512-byte block is pulled). Maybe that explains the magic number (512) for your buffer.

 
The media itself stores data in 512-byte blocks, and the reads buffer on that; post #11 used someBuf[512] for that reason. But indexing through a local buffer filled by a single read transfer eliminates the call overhead: incrementing a pointer 511 times is much more efficient than making 511 additional calls to fetch the next byte, ideally from the library's own buffer.
 
Even with SdFat beta, I can't get exFAT to work... so I'm trying to get this code working in FAT32 mode.

I changed;

Code:
SdExFat sd;
ExFile asciiFile;

to

Code:
SdFs sd;
FsFile asciiFile;

...which is the only way I can get it to write. FsFile is for FAT16, however, as File32 doesn't seem to work. With FsFile, the "small" test (160 chars) reads and writes fine, but not the longer version, which, though it produces a larger file, doesn't seem to read or write past 160 characters.

Any ideas?

I want to convert over from my SD.h code to use sdFat before i go any further, so your code is actually a great primer. Thanks.
 
Actually, it all works with the settings below. FsFile is for FAT32. I opened the .txt on my computer and can see all the data is there; I had to read through your code, and I see you limit the number of printouts.


 
I've been spending today porting over everything I did using SD.h to SdFat beta, and I'm so glad I have.

I was able to test the parsing code with my same 132kb file, and speed has been dramatically improved, going from around 2.5 sec to 74 ms.

"Parsing 495127 characters into 131195 numbers took 74 milliseconds"

That seems to suggest the SD interface on the T3.6 is quite fast indeed, even with a low-grade Verbatim SD card.

 
Unless you are using a micro-sd card >= 32GB, Windows and the standard SDFAT formatter will not format the card as EXFat. You can use the EXFat formatter that is part of the SDFat beta to format any size card as EXFat. I did that long ago for my 16GB card and totally forgot to specify that the card had to be formatted that way for my example code to work.
 
No worries, luckily it wasn't a major change.

I've noticed that serially printing each value is the main bottleneck, and it skews the results accordingly. That said, using your algorithm I was able to parse and Serial.print 128k variables in 1.1 seconds.

I see this being very useful as an alternative to SRAM-based arrays [] for larger data files. There are many situations where the speed of RAM is overkill versus the hit your sketch takes.

 
Cool - the sample code works !

Formatted SD to ExFAT and on a T_4.0 I got ::
Parsing 2942089 characters into 1000000 numbers took 177 milliseconds
2020-03-29 01:06 2942089 whatever.txt

5.9 times more characters for 7.6 times more numbers in 2.4 times more time.

Adding:: if (ch == 'L') WriteAsciiFile(131195);

Gives :: Parsing 385859 characters into 131195 numbers took 24 milliseconds
2020-03-29 01:05 385859 whatever.txt

And with T_4.0's FASTER 480 Mbps USB ( limited by PC ) - printing all 1 million numbers { #define NUMTODISPLAY 1000000 }:
Code:
Parsing 2941984 characters  into 1000000 numbers took 2039 milliseconds

And for a 128K case with "L":
Code:
Parsing 385493 characters  into 131072 numbers took 269 milliseconds

<edit>: Changing to >> #define BUFFSIZE 512 //16384
With printing all::
Code:
Parsing 2941984 characters  into 1000000 numbers took 2043 milliseconds
Printing only #160::
Code:
Parsing 2941984 characters  into 1000000 numbers took 179 milliseconds

So a read buffer larger than 512 bytes doesn't help by more than a ms or two?

Quick Hack to just read one buffer and re-use it - taking the SD reads out of the timing:
Code:
Parsing 2942976 characters  into 1000151 numbers took 43 milliseconds

So most of the processing time, about 140 ms, is waiting for the data to be read, with the SD card reading at about 21 MB/sec for the 2.9 MB file.
 