Errors in serial data logged to SD card.

AndyA

Well-known member
I have a basic serial logger application that uses addMemoryForRead() to add a lot of additional buffering to the serial port. The main loop then checks for a fixed size block of data being available, reads a block from the Serial port and writes it to the SD card. The large buffer is because this is part of a larger application, I want to allow for long periods where the background loop could stall.

If I set the additional buffer to be 64k or larger I get 1 or two bytes of incorrect data logged approximately at multiples of the total serial buffer size.

e.g. I set the buffer to 64k which adding the standard 64 byte buffer gives a total buffer size of 0x010040. I then send repeating binary count.
In the resulting log file the data at a location 0x010030 onwards is: 0x30 0x31 0x32 0x00 0x34 0x35....
0x0200A0 onwards I get 0xA0 0xA1 0xA2 0x00 0x34 0xA5 0xA6...
0x030110 onwards I get 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17 0x18 0x00 0x1A 0x1B...

No other errors are found, all other logged data matches the input exactly. If I increase the buffer size to 128k the errors become twice as far apart.
Setting the buffer size to 32k or 48k I don't see any errors in the output log.

In all cases the resulting log file is the correct size, no bytes are lost, the issues are always corruption rather than missing data.

Any ideas why this would be happening?

Minimal code to reproduce the issue:
C++:
#include <SD.h>

#define _BIG_BUFFER_SIZE_ (64*1024)
#define _SD_BLOCK_ 512
uint8_t *serialBuffer;
uint8_t cardWriteBuffer[_SD_BLOCK_];
uint32_t writeTime = 0;
File logfile;

void setup() {
  Serial.begin(115200);

  Serial.println("Looking for SD card");
  while (!(SD.begin(BUILTIN_SDCARD))) {
    Serial.println("Still looking for SD card");
    delay(500);
  }
  Serial.println("Card found");

  serialBuffer = (uint8_t*)malloc(_BIG_BUFFER_SIZE_);
  if (!serialBuffer) {
    Serial.println("Failed to allocate buffer");
    while (true) {

    }
  }

  Serial3.begin(460800);
  Serial3.addMemoryForRead(serialBuffer, _BIG_BUFFER_SIZE_);
  logfile = SD.open("log.bin", FILE_WRITE);
  if (!logfile) {
    Serial.println("Failed to open output file");
    while (true) {

    }
  }
Serial.println("Running");
}

void loop() {
  uint32_t serialWaiting = Serial3.available();
  if (serialWaiting >= _SD_BLOCK_) {  // a full block worth waiting
    if (serialWaiting >= (_BIG_BUFFER_SIZE_ - _SD_BLOCK_)) {
      Serial.println("Overflow!!");
    }
    Serial3.readBytes(cardWriteBuffer, _SD_BLOCK_);
    if (logfile) {
      logfile.write(cardWriteBuffer, _SD_BLOCK_);
      logfile.flush();
    }
    writeTime = millis();
  }

  if (writeTime && ((millis()-writeTime) > 2000) && logfile) {
    logfile.close();
    logfile = NULL;
    Serial.println("Stopping");
  }
}


-- edit
It looks to be related to the Serial3.avalible() and/or reading all (or almost all) of the data waiting.
If I change
Code:
if (serialWaiting >= _SD_BLOCK_) {
to
Code:
if (serialWaiting >= _SD_BLOCK_ + 64) {
so that it never tries to completely empty the buffer then the correct data is recorded.
 
Last edited:
Well, readBytes() is blocking, and I suppose it could have an issue in the logic related to the buffer created via addMemoryForRead(). It would be interesting to test for (_SD_BLOCK_ + 1), because if there is a bug related to > versus >=, that should be as good as (_SD_BLOCK_ + 64).
 
Well, readBytes() is blocking, and I suppose it could have an issue in the logic related to the buffer created via addMemoryForRead(). It would be interesting to test for (_SD_BLOCK_ + 1), because if there is a bug related to > versus >=, that should be as good as (_SD_BLOCK_ + 64).

The hardware is currently being used so I can't test +1 right now but will when I get a chance.
But if that was the case I'd expect the errors to be the last byte in the block, not a few bytes off the end.

For now we are simply using a 32k buffer since that works. Realistically that's plenty so this is an easy thing to work around. But it should work with a bigger buffer, I don't like it when things don't work without a good reason. It makes me worry that the issue may show up again in some situation that is harder to work around.
 
available() and read() are a lot more complex than you might guess, but it seems unlikely that a bug in that code has gone unnoticed for this long. A test you might do is to run with the configuration where you see the problem and add logic to test the return value from readBytes(). It can return fewer bytes than requested. Also, you shouldn't need a buffer anywhere near 64K, so it would also be interesting to know if you see the problem with smaller buffers.
 
I added an extra check to the code, readBytes always returns the correct number so it thinks it's reading things correctly.

Changing it to +1 works for fixing the issue, as long as you don't try reading the last byte waiting the data is correct.

It looks like avalible() is giving the bytes in the buffer and then adding the uart hardware fifo count so it's the total number of bytes not just the ones in the buffer. ReadBytes() ends up calling read() which checks if the firmware buffer is empty and if so attempts to read from the hardware fifo.
From memory the normal interrupt handling doesn't move data from the hardware fifo to the firmware buffer until there are 2 bytes waiting or after a timeout. None of my code is running in an interrupt context and there is nothing higher priority than the uart irq running so by adding the +1 we ensure that the data being output is always read from the firmware buffer. I don't know if that matters or not, that the byte that's wrong isn't the last byte in either the buffer or the read block seems odd.

As mentioned in the initial post 32k and 48k both work fine, it only seems to be buffers of 64k or over that cause issues. Using a smaller buffer is a perfectly acceptable work around for now, 64k is complete overkill, I only used it because I had the memory free so why not. Adding a +1 there is also fine for this application and a perfectly reasonable workaround, the 1 byte latency it adds doesn't matter since this is purely for logging.

But it should work.

I really don't care if there is an upper limit on the buffer size (although the library should probably enforce it if there is). What matters is having an explanation as to why there is an upper limit. Why does it work with smaller buffers but not a large one? If it doesn't work and there is no clear reason why using that size would cause it to stop working then how can I be sure it will always work reliably with a smaller buffer?
 
Last edited:
That would explain it.
I thought I'd checked for that and everything was either of type uint32_t or size_t but I must have missed something.

Any particular reason why on a 4.1 with a 32 bit processor you are using 16 bit integers for those values or is it simply legacy code from older teensy parts?

As I said before, it's not really an issue, it ended up being more about understanding the cause to ensure it wasn't something that would come back and bite me in the future.

edit-
Would it be worth noting in the documentation that addMemoryForRead() adds additional buffering on top of the default provided by the hardware platform rather than replacing it. This means the maximum additional memory that can be added is slightly less than the 64k maximum total buffer. The current wording does imply this but explicitly stating it would make things clearer. At least in my head it would :)
 
Last edited:
Because serial is a slow device and buffer sizes should be roughly proportional to the square root of data rate (in OS theory) if I recall - its abount balancing latency, interrupt frequency and memory resources. A 64k buffer for 115200baud serial is about 5.5 seconds. What's the maximum latency you see with the SD card ? A few 100 ms perhaps? If your background loop blocks the processor for several seconds that's going to have other issues surely - its not how microcontrollers are normally programmed.
 
Because serial is a slow device and buffer sizes should be roughly proportional to the square root of data rate (in OS theory) if I recall - its abount balancing latency, interrupt frequency and memory resources. A 64k buffer for 115200baud serial is about 5.5 seconds. What's the maximum latency you see with the SD card ? A few 100 ms perhaps? If your background loop blocks the processor for several seconds that's going to have other issues surely - its not how microcontrollers are normally programmed.
And that's a reason why it's ok for code to fail with a larger buffer when it works with a smaller one? It isn't that the buffer size is necessarily needed, it's that code that looked like it should work didn't work.
Now Paul has provided an explanation as to why it didn't work it's not an issue, I'll update my application to ensure the user can't set a configuration that exceeds the maximum. My issue was with the unknown reason for a limit rather than the limit itself.

As to why you may want that size buffer, ports go a lot faster than 115200. We use 460800 fairly frequently, in once case we've run two UARTS at 2Mbits/s each. Log 4 serial ports at that rate (2 per connection so you can log both sides of a link for a couple of connections) all writing their own log file and it starts to add up. If you happen to hit the perfect storm of data arriving and sectors on the SD card you could end up getting a pause of almost a second before getting back to a port. And that's before you allow for logging CAN-FD data. Suddenly 64k of buffering starts to sound like a good idea to give you a bit of a safety margin.
Yes, it's a real corner case unlikely scenario but this is a generic logger tool, I'd rather it coped with those unlikely worst case situations.
 
In HardwareSerial.h, you could try changing the variables below if you want to use a buffer > 64K. As an alternative, though, check out the TeensySdioLogger example in SdFat. It shows how to use SdFat's Ringbuf class and isBusy() to entirely avoid blocking on SD writes.

Code:
    volatile uint16_t     tx_buffer_head_ = 0;
    volatile uint16_t     tx_buffer_tail_ = 0;
    volatile uint16_t     rx_buffer_head_ = 0;
    volatile uint16_t     rx_buffer_tail_ = 0;
 
Back
Top