Generic data logger object

@mborgerson - Looks good - with edits noted in p#24
Overnight SD card filled - datalogger lib stops near end of Disk - again using 15.4MB PSRAM:
Code:
Warning, all data on the card will be erased.
Enter 'Y' to continue: 
Card size:   29.723  GiB (GiB = 2^30 bytes)
Card will be formated exFAT
Writing FAT .................................
Writing upcase table
Writing root
Format done

Card Type is exFAT
File system initialization done.
RTC has set the system time

Starting Logger.

File name is:  <LOG_0313.bin>
Pre-allocating 1GB file space 
Allocation succeeded.

Opened  File LOG_0313.bin
 
   4.345,   Records:   1086250   Overflows:    0
   9.480,   Records:   2370000   Overflows:    0
  14.220,   Records:   3555000   Overflows:    0
…
7963.990,   Records:1990997500   Overflows:    0
7969.520,   Records:1992380000   Overflows:    0
Halting logging.  Getting near end of SD Card.

Stopping Logger.
Logger Status:

MBytes Written: 30410.35   SdCard free space:     9.78 MBytes
Collection time: 7972 seconds
Max Collection delay: 1 microseconds
Average Write Time: 285.946 milliseconds
Maximum Write Time: 809.916 milliseconds

Verifying last file

Recnum: 1000000
Recnum: 2000000
…
Recnum: 915000000
Recnum: 916000000
…
Recnum: 1013000000
Recnum: 1014000000
…  
Recnum: 1989000000
Recnum: 1990000000
Recnum: 1991000000
Recnum: 1992000000
End of file playback.

[B]Verification errors:  0
Verification complete
[/B]
 
DataLogger.h file capitalization seems to matter, on Linux at least

Just getting started trying out this interesting project.
My first issue: the provided examples (e.g. Fast_LoggerT4_1) fail to compile because the #include <Datalogger.h> file is not found.

Reason: capitalization. The actual file is named DataLogger.h, with a capital "L".
I'm running this on Ubuntu; maybe Windows ignores capitalization in filenames?
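
In case it helps the next person, the one-line fix in each example is just matching the case of the actual header file:
Code:
// fails on a case-sensitive filesystem such as Linux:
//#include <Datalogger.h>

// matches the actual file name in the library, so it compiles everywhere:
#include <DataLogger.h>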

Next step: I realized Fast_LoggerT4.1 may assume the additional 8MB PSRAM chip, which I don't have, so I tried Analog4_LoggerTest.
This seems to work, but the status reports some overflows and a nearly 2-second write time. Is that expected? I am using a T4.1 with an empty SanDisk 32GB card.

Code:
Starting Logger.
File name is:  <LOG_1526.csv>
Pre-allocating 1GB file space 
Allocation succeeded.
Opened  File LOG_1526.csv
 
1589729194,   4960, 4063, 4068, 4067, 4068
1589729198,   9030, 4066, 4069, 4067, 4072
1589729203,  14059, 4064, 4067, 4066, 4065
1589729208,  19088, 4070, 4062, 4068, 4067
1589729213,  24717, 4060, 4068, 4066, 4064
1589729218,  29747, 4066, 4070, 4071, 4067
1589729224,  34776, 4068, 4069, 4074, 4071
Stopping Logger.

Logger Status:
Collection time: 40 seconds  MBytes Written:     4.35
Max chunks ready: 15 Overflows: 14 Max Collect delay: 85 uSecs
Avg Write Time: 18.149 mSec   Max Write Time: 1800.530 mSec

OK, tried again, this time first doing the in-program format instead of the pre-existing format done by the SD Card Association formatting program, which I ran on Win10.
This time the max write time looks a whole lot better:

Code:
Starting Logger.
File name is:  <LOG_1533.csv>
Pre-allocating 1GB file space 
Allocation succeeded.
Opened  File LOG_1533.csv
 
1589729643,   4960, 4067, 4068, 4068, 4066
1589729648,   9904, 4066, 4069, 4064, 4068
1589729653,  14922, 4063, 4067, 4062, 4066
1589729658,  19962, 4065, 4066, 4065, 4067
1589729663,  25002, 4067, 4064, 4066, 4065
1589729668,  30013, 4065, 4067, 4071, 4070
1589729673,  34933, 4066, 4069, 4066, 4063
1589729678,  39999, 4067, 4068, 4069, 4063
1589729683,  44998, 4063, 4069, 4069, 4066
1589729688,  49998, 4068, 4072, 4071, 4069
Stopping Logger.

Logger Status:
Collection time: 53 seconds  MBytes Written:     6.71
Max chunks ready: 3 Overflows: 0 Max Collect delay: 85 uSecs
Avg Write Time: 10.584 mSec   Max Write Time: 26.510 mSec
 
> Yes, WINDOWS cares not for CASE - unlike LINUX. To be usable, the include needs to be consistent with the file name.
> Indeed, Fast_LoggerT4.1 is special-cased to assume PSRAM.
-- > I just extended that for a 33-byte packed struct and found there is some integrity issue I need to find after 15M records :( :: Teensy-4-1-Beta-Test
-- > As noted there, the packed struct is more abusive, and the added fields can be checked as KNOWN in Verify for integrity … they now fail on ReadBack and I need to see why ????
> Something very UGLY/SLOW with your SD card :: Max Write Time: 1800.530 mSec - this may be causing collection to bump into the SD write and cause the Overflows.
-- > My SD writes are in excess of 1MB each, and the "Maximum Write Time: 809.916 milliseconds" above is the highest I have seen.
-- > Do a PC reformat of the card perhaps - or it may be counterfeit or dying and not running up to spec? Changing the setup to allow a longer buffer time is what I'm using to help FILL the 16 MB of PSRAM here :: #define BUFFERMSEC 3600

Last run shows:
Code:
Logger Status:

MBytes Written:   791.26   SdCard free space: 29517.43 MBytes
Collection time: 205 seconds
Max Collection delay: 9 microseconds
Average Write Time: 261.248 milliseconds
Maximum Write Time: 612.549 milliseconds
 
Thanks for the notes.

1. I'll definitely check for proper capitalization of file names. Windows HAS made me a bit lazy in that respect.

2. A 33-byte packed structure??? I've always avoided structures with odd lengths, since back in the early ARM days accessing unaligned data had a speed penalty at best and caused outright failure at worst. I suspect the problem pops up on replay, as the data records are read back for playback one 33-byte record at a time. While recording, the records are probably grouped into a chunk that is, at worst, a multiple of four bytes. That possibility is something for me to check out, as there are still some applications out there with odd structure lengths. (A quick size check is sketched at the end of this post.)

3. One thing to note on the in-program format option: I think it is set up to format cards of all sizes as exFAT. You can do that with the SdFat 2.0 formatter. exFAT is great for large data-logger files, but it wastes a lot of disk space with large cluster sizes if you are recording a lot of small files. In some recent tests, I noted very long write times when getting into the last 5% of the free space on a card. That's another thing I need to investigate. On a 2GB card formatted as FAT32, I did see write times over 600mSec on a regular basis, even when not near the end of the card.


4. There is a hidden sanity check in the buffer initialization code that limits the buffering time to 4095mSec. I will make that a #define MAXMSECLEN 4096 and group all those sanity-check defines in one area for easier maintenance.
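
Regarding item 2, this is the kind of compile-time size check I have in mind - generic field names with the same 33-byte payload, just an illustration and not code from the library:
Code:
#include <stdint.h>

// On the 32-bit Teensy a pointer is 4 bytes, so the payload below is 33 bytes.
struct rec_padded {
  uint32_t a, b, c;
  uint8_t  *p;
  uint8_t  aByte;
  uint32_t d, e, f, g;
};                                   // compiler pads to a multiple of 4

struct rec_packed {
  uint32_t a, b, c;
  uint8_t  *p;
  uint8_t  aByte;
  uint32_t d, e, f, g;
} __attribute__((packed));           // keeps the odd length

static_assert(sizeof(rec_padded) == 36, "padded to a multiple of 4 bytes");
static_assert(sizeof(rec_packed) == 33, "packed keeps the 33-byte length");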
 
Yes, the 33-byte packed struct is on purpose, to see if it would FAULT the processor or fail - it does not. It does seem to take extra time in the early pre-library test, but that is okay; I just needed to know it worked. I did the same in the early PSRAM beta test - started the PSRAM pointer at 0x???001 and then did either BYTE or DWORD write-and-read compares of the rest of the chip with no problem.
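
A minimal sketch of that kind of odd-offset check, assuming the PSRAM is memory-mapped at 0x70000000 as on the T4.1 - the start offset and test length here are placeholders, not the values from that beta test:
Code:
// Byte-wise write/read compare starting at an odd PSRAM address.
// Note: EXTMEM is cached, so a same-core read-back mostly exercises the
// cache unless the test area is flushed first.
const uint32_t PSRAM_BASE = 0x70000000u;
const uint32_t TEST_START = 0x001;         // deliberately unaligned start
const uint32_t TEST_LEN   = 1024u * 1024u; // check 1MB of the chip

bool psramByteCheck() {
  volatile uint8_t *p = (volatile uint8_t *)(PSRAM_BASE + TEST_START);
  for (uint32_t i = 0; i < TEST_LEN; i++) p[i] = (uint8_t)(i ^ 0x5A);
  for (uint32_t i = 0; i < TEST_LEN; i++) {
    if (p[i] != (uint8_t)(i ^ 0x5A)) {
      Serial.printf("Mismatch at 0x%08lX\n", (unsigned long)(PSRAM_BASE + TEST_START + i));
      return false;
    }
  }
  return true;
}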

It fails similarly when packed is removed and the 33-byte struct gets padded. The weird thing is the FAIL can happen after recnum 4,000,000 or much higher - and all tests pass up to there:
Code:
//unpacked
Recnum: 33000000
Record number 2 error at 33233123 [*>0x70DC57EC]  File: 33665123  local: 33233123
Code:
//unpacked
Recnum: 28000000
End of file playback.
Verification errors:  0
Code:
//packed
Recnum: 15000000
Record number error at 15857784  File: 15857784  local: 15425784

The in-code Formatter is handy - not having to remove the SD to resume testing.

> It would be handy for the 'Dir List' to show Free Space.

...
 
My long-term plan is to have the logger become an MTP responder, so that you can upload the files from the Teensy as if it were attached as an MTP USB disk. I've gotten the basic MTP test programs to work on the T3.6, but not the T4. That means that full MTP capability awaits my education and, probably, a lot of work by other programmers on the MTP functionality.

That is what I have done for one of my applications (as you are/were in the field: a datalogger on a Slocum glider - sorry, it's not a Seaglider).
One issue I faced:
MTP keeps a picture of the disk in memory, so newly logged data is not automatically made visible on the PC.

I could not get notification events working, so I settled on the following solution:
the user disconnects MTP before starting data logging and connects again after stopping data logging;
when a session is opened, the MTP responder clears and resets the local disk structure.
 
Another thought just occurred to me: How does the PSRAM handle storage of 33-byte structures? Is there some mechanism unknown to me that doesn't cope well with odd-sized writes?
 
Two possibly worthwhile changes I made to the basic logger. In the setup() part:
Code:
  pinMode(A0, INPUT_DISABLE);  // 4 ADC inputs without digital "keeper" drive on pin
  pinMode(A1, INPUT_DISABLE);
  pinMode(A2, INPUT_DISABLE);
  pinMode(A3, INPUT_DISABLE);
This made a BIG difference to sensor accuracy in my particular test, with C-bypassed but fairly high DC-impedance inputs nominally centered around the Vdd/2 midpoint. I was really confused about it until I remembered the input pin keeper situation - even though I think I was posting about exactly this in another thread a few days ago.

In the GetStatus() function, I think it's fun to see the current system time, just to confirm it's correct.
Code:
void GetStatus(void) {
  TLoggerStat *tsp;
  float mbytes;
  char stime[25];       // string for YYYY-MM-DD HH:MM:SS system time
  time_t t;             // current C time, seconds since epoch
  tsp =  mydl.GetStatus();

  mbytes = tsp->byteswritten / (1024.0 * 1024.0);
  Serial.println("\nLogger Status:");
  t = now();
  sprintf(stime, "%04d-%02d-%02d %02d:%02d:%02d", year(t),month(t),day(t),hour(t),minute(t),second(t) );
  Serial.printf("Current time is: %s\n", stime);
 ...

Code:
Logger Status:
Current time is: 2020-05-18 12:40:03
Collection time: 554 seconds  MBytes Written:     2.27
Max chunks ready: 1 Overflows: 0 Max Collect delay: 332 uSecs
Avg Write Time:  1.778 mSec   Max Write Time: 91.201 mSec
 
Another thought just occurred to me: How does the PSRAM handle storage of 33-byte structures? Is there some mechanism unknown to me that doesn't cope well with odd-sized writes?

PSRAM I/O is all handled by the 1062 processor. It maps into the memory space just like RAM for direct memory access. Odd boundary offsets and lengths seem to test fine.

I used this:
Code:
struct datrec {
  uint32_t microtime;    //  time in microseconds
  uint32_t millitime;   // millis() since collection start at time collection occurred
  uint32_t numrecords;  // number of structures collected
  uint8_t *cptr;        // pointer to last storage location to check out-of-bounds writes
  uint8_t aByte;
  uint32_t microtime2;    //  time in microseconds
  uint32_t millitime2;   // millis() since collection start at time collection occurred
  uint32_t numrecords2;  // number of structures collected
  uint32_t microtime3;    //  time in microseconds
};
//} __attribute__((packed));

Then in myCollector() the values were just duplicated into the #2 and #3 items to compare against the primary stored values.
Code:
void myCollector(  void* vdp) {
  volatile struct datrec *dp;
  // Note: logger status keeps track of max collection time
  dp = (volatile struct datrec *)vdp;
  dp->millitime = filemillis; // save mSec we've been collecting
  dp->microtime = filemicros;

  dp->aByte = dp->microtime&0xff;
  dp->microtime2 = dp->microtime;
  dp->millitime2 = dp->millitime;
  dp->numrecords2 = numrecs;
  dp->microtime3 = dp->microtime;

  dp->numrecords = numrecs;
  dp->cptr = (uint8_t *)dp; // save the address at which data is being stored
  numrecs++;
}

Then during Verify I could assert that they read back the same from SD as they were written.
Code:
  if (dp->aByte != (dp->microtime & 0xff)) {
  if (dp->numrecords2 != pbrecnum) {
  if ( pbmillitime != dp->millitime2  ) {
  if ( dp->microtime != dp->microtime2  ) {
  if ( dp->microtime != dp->microtime3 ) {

And it works for many millions of records ... then triggers an error. For this test I need to see if these same checks pass in the buffer about to be written to SD, to find out whether the data corrupts in PSRAM or on the write/read of the SD.

And as noted, it fails with or without being packed.

I started printing the PSRAM address stored in dp->cptr - and it is not always the same address. I tested on a T_4.1 with 8 and 16 MB of PSRAM and both showed similar failures :( ????

Need to make time to get back into the code. I could just take a CRC of the preceding bytes and check that - but this quickly showed that checking multiple copies of the same changing number catches the failure.
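
If I go the CRC route, even a simple byte sum over the rest of the record would do; recordSum() below is only an illustration of the idea (here it repurposes the last field), not library code:
Code:
#include <stddef.h>

// Hypothetical: store a checksum of the record in its last field instead of
// keeping duplicate copies of the other fields.
uint32_t recordSum(const struct datrec *dp) {
  const uint8_t *b = (const uint8_t *)dp;
  uint32_t sum = 0;
  // sum every byte ahead of the final field (microtime3 holds the sum here)
  for (size_t i = 0; i < offsetof(struct datrec, microtime3); i++) sum += b[i];
  return sum;
}

// In myCollector():      dp->microtime3 = recordSum((const struct datrec *)vdp);
// In the verify checks:  if (dp->microtime3 != recordSum(dp)) verifyerrors++;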
 
Errors that pop up just a few times in millions of records are hardest to troubleshoot.

I can think of several reasons that things like this might happen:

* Something strange in interrupt land that makes the timer irq handler take more than 4uSec (if you are still collecting at 250KSamples/second).
* Read or write errors in either PSRAM or the SD card. Those might be resolved by a test program that uses onboard static RAM to see if the same errors occur (a sketch of that swap follows this list).
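
For the static-RAM variant, something like this at the top of the test sketch should leave everything else alone while taking the PSRAM out of the path - the sizes are guesses chosen so 100mSec of 36-byte records fits in RAM2, not tested values:
Code:
// Swap the logger buffer from PSRAM to on-chip RAM2 for an A/B test.
// DMAMEM places the array in RAM2 (OCRAM), so BUFFERMSEC and MAXBUFFER
// must shrink to fit; 12000 records x 36 bytes = 432000 bytes here.
#define SAMPLERATE 120000
#define BUFFERMSEC 100            // much shorter than the 3600mSec PSRAM setup
#define MAXBUFFER  440000         // must fit in free RAM2
DMAMEM static uint8_t mybuffer[MAXBUFFER];  // was: uint8_t *mybuffer = (uint8_t *)0x70000000;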

Of course, there is the old standby: different Teensy hardware and new SD card. I always buy at least two of a new board when I intend to push things near the limits. (Most often the problem is letting out the magic smoke by clumsy probing with the O'scope or voltmeter).

I got the T4.1 high-speed logger to go through a 16GB card without errors when I stopped at 95% of card capacity.


I like the idea of adding free space available to the SD card directory output. I'll get that into the next version of the library - along with properly capitalized file names in the examples. The analog logger example will also benefit from the "pinMode(A3, INPUT_DISABLE);" addition.
 
Hmm, if the issue is something happening at interrupt time, then increasing the IntervalTimer priority might have an effect. I think I will also investigate the code that reads the elapsedMillis() and elapsedMicros() timers inside the timer handler.
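
If the collection timer is the usual IntervalTimer, bumping its priority is a one-liner; the object and handler names below are illustrative, not the ones inside the library:
Code:
#define SAMPLERATE 250000           // collection rate under test

IntervalTimer collectionTimer;      // illustrative name
void CollectionISR() { /* collect one record - handled by the library */ }

void startCollection() {
  collectionTimer.priority(16);     // 0-255, lower = higher priority, default is 128
  collectionTimer.begin(CollectionISR, 1000000.0f / SAMPLERATE);  // period in microseconds
}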

I think I'll also add a command in the program to do a write and verify in sequence so that I can let it run overnight and check it in the morning.
 
I have two T_4.1's, each with its own SD card - one with 8MB of PSRAM and the other with 16MB.

Just put VERIFY on the write during logging, and there are no errors with the first copy of the code.

Then I quit logging and the PlayBack Verify finds errors. Here is the end of logging, where " V@ " shows when the code below is run on the struct and the checks confirm the PSRAM buffer data is good. Then stopping the logger and doing Verify triggers errors as shown:
Code:
 V@  V@  V@  V@  V@  V@  V@  V@  V@  V@  V@  V@  V@  V@  V@  304.474,   Records:  38059200   Overflows:    0
 V@  V@  V@  V@  V@  V@  V@  V@  V@  V@  V@  V@  V@  V@  309.312,   Records:  38664000   Overflows:    0
 V@  V@  V@  V@  V@  V@ Stopping Logger.

Verifying last file

Recnum: 1000000
Recnum: 2000000
// GOOD
Recnum: 15000000
Record number error at 15529856  File: 15529856  local: 15097856
millitime number error at 15529856  File: 124239  local: 120783
millitime2 2 number error at 15529856 [*>0x70E12400]  File: 124239  local: 120783
millitime number error at 15529857  File: 124239  local: 120783
millitime2 2 number error at 15529857 [*>0x70E12424]  File: 124239  local: 120783
millitime number error at 15529858  File: 124239  local: 120783
millitime2 2 number error at 15529858 [*>0x70E12448]  File: 124239  local: 120783
millitime number error at 15529859  File: 124239  local: 120783
millitime2 2 number error at 15529859 [*>0x70E1246C]  File: 124239  local: 120783
millitime number error at 15529860  File: 124239  local: 120783
millitime2 2 number error at 15529860 [*>0x70E12490]  File: 124239  local: 120783
millitime number error at 15529861  File: 124239  local: 120783
millitime2 2 number error at 15529861 [*>0x70E124B4]  File: 124239  local: 120783
millitime number error at 15529862  File: 124239  local: 120783
millitime2 2 number error at 15529862 [*>0x70E124D8]  File: 124239  local: 120783
millitime number error at 15529863  File: 124239  local: 120783
millitime2 2 number error at 15529863 [*>0x70E124FC]  File: 124239  local: 120783
millitime number error at 15529864  File: 124239  local: 120783
millitime2 2 number error at 15529864 [*>0x70E12520]  File: 124239  local: 120783
millitime number error at 15529865  File: 124239  local: 120783
millitime2 2 number error at 15529865 [*>0x70E12544]  File: 124239  local: 120783
millitime number error at 15529866  File: 124239  local: 120783
millitime2 2 number error at 15529866 [*>0x70E12568]  File: 124239  local: 120783
millitime number error at 15529867  File: 124239  local: 120783
millitime2 2 number error at 15529867 [*>0x70E1258C]  File: 124239  local: 120783
millitime number error at 15529868  File: 124239  local: 120783
millitime2 2 number error at 15529868 [*>0x70E125B0]  File: 124239  local: 120783
millitime number error at 15529869  File: 124239  local: 120783
millitime2 2 number error at 15529869 [*>0x70E125D4]  File: 124239  local: 120783
millitime number error at 15529870  File: 124239  local: 120783
millitime2 2 number error at 15529870 [*>0x70E125F8]  File: 124239  local: 120783
End of file playback.

Verification errors:  31
Verification complete

Code:
[B]struct datrec {
  uint32_t microtime;    //  time in microseconds
  uint32_t millitime;   // millis() since collection start at time collection occurred
  uint32_t numrecords;  // number of structures collected
  uint8_t *cptr;        // pointer to last storage location to check out-of-bounds writes
  uint8_t aByte;
  uint32_t microtime2;    //  time in microseconds
  uint32_t millitime2;   // millis() since collection start at time collection occurred
  uint32_t numrecords2;  // number of structures collected
  uint32_t microtime3;    //  time in microseconds
};
//} __attribute__((packed));[/B]

// write data chunks as binary data  writechunk has index
// of first ready-to-write chunk
[COLOR="#FF0000"][B]void DataLogger::WriteBinaryChunks(uint16_t numready) {[/B][/COLOR]
  uint16_t i;
  uint32_t etime;
  float wravg;

[B]  struct datrec *dp;
  uint32_t verifyerrors=0;
  Serial.printf(" V@ ");[/B]

  LEDON
  for (i = 0; i < numready; i++) {
    etime = micros();
    writePtr = chunkPtrs[writechunk];

[B]  if (verifyerrors < 30) {
  dp = (struct datrec *)writePtr;
    if (dp->aByte != (dp->microtime & 0xff)) {
    Serial.printf("aByte error at %lu [*>0x%lX]", dp->numrecords, dp->cptr);
    Serial.printf("  File: %lu  local: %lu\n", dp->aByte, dp->microtime & 0xff);
    verifyerrors++;
  }
  if (dp->numrecords2 != dp->numrecords) {
    Serial.printf("Record number 2 error at %lu [*>0x%lX]", dp->numrecords, dp->cptr);
    Serial.printf("  File: %lu  local: %lu\n", dp->numrecords2, dp->numrecords);
    verifyerrors++;
  }
  if ( dp->millitime != dp->millitime2  ) {
    Serial.printf("millitime2 2 number error at %lu [*>0x%lX]", dp->numrecords, dp->cptr);
    Serial.printf("  File: %lu  local: %lu\n", dp->millitime2, dp->millitime);
    verifyerrors++;
  }
  if ( dp->microtime != dp->microtime2  ) {
    Serial.printf("microtime2 2 error at %lu [*>0x%lX]", dp->numrecords, dp->cptr);
    Serial.printf("  File: %lu  local: %lu\n", dp->microtime2, dp->millitime);
    verifyerrors++;
  }

  if ( dp->microtime != dp->microtime3 ) {
    Serial.printf("microtime3 3 error at %lu [*>0x%lX]", dp->numrecords, dp->cptr);
    Serial.printf("  File: %lu  local: %lu\n", dp->microtime3, dp->millitime);
    verifyerrors++;
  }
  }[/B]

    dataFile.write((uint8_t *)writePtr, chunksize); // write the data to file

Just did a second VERIFY after the first and the data coming off the SD shows the same corruption detail.

SKETCH with 16 MB PSRAM using:
Code:
#define SAMPLERATE 120000
#define BUFFERMSEC 3600
#define MAXBUFFER 15900000
uint8_t *mybuffer = (uint8_t *)0x70000000;
 
The above suggests the PSRAM is not the issue. The _isr() rate dropped to 120K/sec when the struct size went from 16 to 36 bytes in this case.

Perhaps the _isr() interrupting the SD card writes is picking up corrupt data from that?

Going to start and let it run to fill the rest of the SD card - 20 GB - with packed struct to confirm.

<edit>
Something odd with that code when PACKED - going to look ...:
Code:
 V@ microtime2 2 error at 0 [*>0x70000000]  File: 139874045  local: 0
microtime3 3 error at 0 [*>0x70000000]  File: 134217728  local: 0
 V@ Record number 2 error at 43200 [*>0x7017BB00]  File: 3221225473  local: 43200
millitime2 2 number error at 43200 [*>0x7017BB00]  File: 1509950790  local: 346
microtime2 2 error at 43200 [*>0x7017BB00]  File: 139340334  local: 346
microtime3 3 error at 43200 [*>0x7017BB00]  File: 134217896  local: 346
 V@ Record number 2 error at 86400 [*>0x702F7600]  File: 2147483650  local: 86400
millitime2 2 number error at 86400 [*>0x702F7600]  File: 3003124364  local: 691
microtime2 2 error at 86400 [*>0x702F7600]  File: 145583788  local: 691
microtime3 3 error at 86400 [*>0x702F7600]  File: 134218065  local: 691
 V@ Record number 2 error at 129600 [*>0x70473100]  File: 1073741828  local: 129600
millitime2 2 number error at 129600 [*>0x70473100]  File: 218107858  local: 1037
microtime2 2 error at 129600 [*>0x70473100]  File: 135957162  local: 1037
microtime3 3 error at 129600 [*>0x70473100]  File: 134218234  local: 1037
 V@ Record number 2 error at 172800 [*>0x705EEC00]  File: 5  local: 172800
millitime2 2 number error at 172800 [*>0x705EEC00]  File: 1711281432  local: 1382
...
 
I note that all the errors shown in post #37 above occur when the buffer pointer is ~0x70E12400. I think this is in the memory range occupied by the 2nd PSRAM, which is soldered onto the pads normally occupied by an SPI flash chip. The schematic shows that both chips use the same data and clock lines. Does the T4.1 driver initialize that segment differently than the 0x70000000 block?

It also seems significant that all the errors in #37 occurred within a 512-byte block from 0x70E12400 to 0x70E125FF. That may indicate an issue with SD card storage or playback.
 
Post #37 verifies the items before the dataFile.write() with the same checks done later in the playback Verify.

Indeed, showing and seeing those addresses made me wonder - that is why I put the verify code there in the library.
 
The problem in p#38 above was that I only changed the struct{} to PACKED in one of the two locations, since the lib needed a private copy.

That proves the check does fail when things are wrong - except now I see it is only checking the first record in each CHUNK/buffer.
Need to add a run through all the elements to call it verified.

 
The updated code running PACKED is not finding errors when testing all the "PACKED" structs in the buffer to be written - with the added FOR() loop.

The added test still completes in time to prevent overflow. Leaving it to fill the SD ... now at record #38+ MILLION.
Code:
  dp = (struct datrec *)writePtr;
[B]for ( int jj=0; jj< chunksize / sizeof(datrec); jj++) {[/B]
  if (verifyerrors < 30) {
    if (dp->aByte != (dp->microtime & 0xff)) {
    Serial.printf("aByte error at %lu [*>0x%lX]", dp->numrecords, dp->cptr);
    Serial.printf("  File: %lu  local: %lu\n", dp->aByte, dp->microtime & 0xff);
    verifyerrors++;
  }
  if (dp->numrecords2 != dp->numrecords) {
    Serial.printf("Record number 2 error at %lu [*>0x%lX]", dp->numrecords, dp->cptr);
    Serial.printf("  File: %lu  local: %lu\n", dp->numrecords2, dp->numrecords);
    verifyerrors++;
  }
  if ( dp->millitime != dp->millitime2  ) {
    Serial.printf("millitime2 2 number error at %lu [*>0x%lX]", dp->numrecords, dp->cptr);
    Serial.printf("  File: %lu  local: %lu\n", dp->millitime2, dp->millitime);
    verifyerrors++;
  }
  if ( dp->microtime != dp->microtime2  ) {
    Serial.printf("microtime2 2 error at %lu [*>0x%lX]", dp->numrecords, dp->cptr);
    Serial.printf("  File: %lu  local: %lu\n", dp->microtime2, dp->microtime);
    verifyerrors++;
  }

  if ( dp->microtime != dp->microtime3 ) {
    Serial.printf("microtime3 3 error at %lu [*>0x%lX]", dp->numrecords, dp->cptr);
    Serial.printf("  File: %lu  local: %lu\n", dp->microtime3, dp->microtime);
    verifyerrors++;
  }
  }
    dp++;
}
if (verifyerrors!=0)
  Serial.printf("| V@ \n");
 
As noted in p#37 - that FIRST copy only validated the first record.

The PSRAM data read for the write is corrupted.

So the SD write can be removed to make sure there isn't overflow data loss from the verify.

Will take out the SD write and try again unpacked and see if anything else is useful to print ...
 
@mborgerson: THIS IS ODD::
Code:
 V@  V@  V@ Record 2nd number error at 419920454 [*>0x70084906]  PSRAM: 420352454  local: 420352454
| V@

The PSRAM and LOCAL values MATCH at '420352454' - but do not agree with the record number in the error, '419920454'.

When in fact the code prints dp->numrecords twice - once as the 'error at' value and once as 'local' - and they came out different, while the two values actually being compared print as equal! ????

Code:
  if ([B]dp->numrecords2[/B] != [B]dp->numrecords[/B]) {
    Serial.printf("Record 2nd number error at %lu [*>0x%lX]", [B]dp->numrecords[/B], dp->cptr);
    Serial.printf("  PSRAM: %lu  local: %lu\n", [U]dp->numrecords2[/U], [B]dp->numrecords[/B]);
    verifyerrors++;
  }

The same pattern appears in all the other errors found - and it is always the write just before the end of the group write >> "| V@ "

It seems the value of numready in :: for (i = 0; i < numready; i++) {
is including a buffer that is still being changed?

That might be a problem with the __STREXW code getting interrupted and not properly handling the _isr()'s changing count of readychunks :( ???
>> Will try with the restored noInterrupts() code
 
Ran the first time with no SD writes - it stopped when the write count peaked, after 138,110,400 records unpacked without issue?

Formatted the drive and restarted; could also stop the byteswritten update. Avg write time with Verify is 79.1 ms - so I could up the sample rate and reduce BUFFERMSEC to cause more havoc.
Will let this run unpacked for some time - then redo the packed test with no SD writes. Now at 78M records.

Still using the __STREXW for now. Will add a print of the helper vars to see when it fires.
 
Still chasing ... no errors detected on the data to write, packed or unpacked ... unpacked recorded about 25 GB ... checked but not written. None of the p#44 ghost issues appeared?

Turned the write back on and the write-time verify passed, but after stopping logging, Verify found issues on playback?
Code:
 V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ 472.781,   Records:  59097600   Overflows:    0
 V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_ V@ _V#100_Stopping Logger.
 V@ _V#100_Verifying last file

Recnum: 1000000
// All Good
Recnum: 38000000
Record number 2 error at 38415018 [*>0x70C8EBEA]  File: 38847018  local: 38415018
millitime2 2 number error at 38415018 [*>0x70C8EBEA]  File: 310649  local: 307321
microtime3 3 error at 38415018 [*>0x70C8EBEA]  File: 310776153  local: 307321
Record number error at 38847019  File: 38847019  local: 38415019
millitime number error at 38847019  File: 310777  local: 307321
millitime2 2 number error at 38847019 [*>0x70C8EC0B]  File: 310777  local: 307321
millitime number error at 38847020  File: 310777  local: 307321
millitime2 2 number error at 38847020 [*>0x70C8EC2C]  File: 310777  local: 307321
millitime number error at 38847021  File: 310777  local: 307321
millitime2 2 number error at 38847021 [*>0x70C8EC4D]  File: 310777  local: 307321
millitime number error at 38847022  File: 310777  local: 307321
millitime2 2 number error at 38847022 [*>0x70C8EC6E]  File: 310777  local: 307321
millitime number error at 38847023  File: 310777  local: 307321
millitime2 2 number error at 38847023 [*>0x70C8EC8F]  File: 310777  local: 307321
millitime number error at 38847024  File: 310777  local: 307321
millitime2 2 number error at 38847024 [*>0x70C8ECB0]  File: 310777  local: 307321
millitime number error at 38847025  File: 310777  local: 307321
millitime2 2 number error at 38847025 [*>0x70C8ECD1]  File: 310777  local: 307321
millitime number error at 38847026  File: 310777  local: 307321
millitime2 2 number error at 38847026 [*>0x70C8ECF2]  File: 310777  local: 307321
millitime number error at 38847027  File: 310777  local: 307321
millitime2 2 number error at 38847027 [*>0x70C8ED13]  File: 310777  local: 307321
millitime number error at 38847028  File: 310777  local: 307321
millitime2 2 number error at 38847028 [*>0x70C8ED34]  File: 310777  local: 307321
millitime number error at 38847029  File: 310777  local: 307321
millitime2 2 number error at 38847029 [*>0x70C8ED55]  File: 310777  local: 307321
millitime number error at 38847030  File: 310777  local: 307321
millitime2 2 number error at 38847030 [*>0x70C8ED76]  File: 310777  local: 307321
millitime number error at 38847031  File: 310777  local: 307321
millitime2 2 number error at 38847031 [*>0x70C8ED97]  File: 310777  local: 307321
millitime number error at 38847032  File: 310777  local: 307321
millitime2 2 number error at 38847032 [*>0x70C8EDB8]  File: 310777  local: 307321
End of file playback.

Verification errors:  32
Verification complete
 
Left it alone for 4066 seconds to fill the SD with 507,254,400 records, and no errors were detected by the verify at time of write ... after adding this to myCollector() :: arm_dcache_flush( vdp, sizeof(datrec) );
Then on starting the playback Verify, it finds errors reading from the SD???
Code:
[U]Recnum: 247000000[/U]
Record number 2 error at 247260392 [*>[B]0x704EBFE8[/B]]  File: 247692392  local: 247260392
microtime3 3 error at 247260392 [*>0x704EBFE8]  File: 1981539145  local: 1978083
Record number error at 247692393  File: 247692393  local: 247260393
millitime number error at 247692393  File: 1981539  local: 1978083
millitime2 2 number error at 247692393 [*>0x704EC009]  File: 1981539  local: 1978083
millitime number error at 247692394  File: 1981539  local: 1978083
millitime2 2 number error at 247692394 [*>0x704EC02A]  File: 1981539  local: 1978083
millitime number error at 247692395  File: 1981539  local: 1978083
millitime2 2 number error at 247692395 [*>0x704EC04B]  File: 19815

That is about 5MB into PSRAM [5,161,035], so not near the boundary. Of course, that value is read from a file that just showed it is corrupted.

And Recnum: 247,000,000 is farther into the logging process than normal - just under half of the nearly 9GB file.

Just formatted the 32GB SD and restarted with arm_dcache_flush_delete( vdp, sizeof(datrec) ); - the _delete really slows down the process (LED never off)! Now getting OVERFLOW :( It shouldn't help anyway, as the 32K cache should easily turn over while it gathers 1.5MB between writes?

So no quick, sure repro yet - just recurring 1-in-a-million things.
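
For reference, the flush mentioned above just went at the end of myCollector(), after the record is filled in - a sketch of the placement, not the whole collector:
Code:
void myCollector(void *vdp) {
  volatile struct datrec *dp = (volatile struct datrec *)vdp;

  // ... fill in microtime/millitime, the duplicate fields, numrecords, cptr ...

  // push the freshly written record out of the data cache so the copy
  // in PSRAM matches what was just stored
  arm_dcache_flush(vdp, sizeof(struct datrec));
}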
 
Looking at this off and on - it seems the trouble here is from the __STREXW code :(

Have filled SD cards a couple of times on two T_4.1's using the prior noInterrupts() code:
Code:
    noInterrupts();
    numready = readychunks;
    readychunks = 0;  // clear number ready
    mystatus.bufferoverflows = bufferoverflows;
    mystatus.maxcdelay = maxcdelay;
    interrupts();

instead of:
Code:
  do { // account for interrupts that may alter vars during read.
    __LDREXW(&logger_safe_read);
    readyHold = numready;
    numready += readychunks;
    readychunks -= (numready - readyHold);  // account for number found
    mystatus.bufferoverflows = bufferoverflows;
    mystatus.maxcdelay = maxcdelay;
    readyHoldLast++;
  } while ( __STREXW(1, &logger_safe_read));

Tried to monitor how many times the do{}while() repeats and saw no trend or trigger.
Often the PSRAM would verify fine before the write, but corruption would be found in the file on playback; sometimes Verify would fail, but the printed values that triggered the failure show as correct???
A REPRO test takes an hour+ and millions of records - and the results are not repeatable - even after adding the PSRAM cache_flush and cache_flush_delete.
 
Tried to compile the default_loggertest example - and got a couple of errors.

One of them: C:\Users\ACER\AppData\Local\Temp\arduino_build_199688\sketch\Datalogger.h:86:2: error: 'SdFs' does not name a type

SdFs sdx; // for fat32/ExFat SD Card

I use SdFat (greiman). Do I need something more?
 