Attempting non-blocking concurrent SD read/write access

graydetroit

I was discussing the topic of reading/writing project data structures from the SD card while at the same time playing audio files which are buffered from the SD card in this thread.

I am able to do non-blocking reads of other project data from the SD card while the SD-buffered audio is playing. I do this by keeping some variables outside the main loop() which track the total data read from the SD on each loop iteration, whether the read is complete, and so on. The code reads data from files in chunks of 40 kB.

Here is some pseudo code of how it works:
C++:
#define ASYNC_IO_BUFFER_SIZE (500 * 80) // 40000-byte read chunk

typedef struct {
    char bigBuffer[512 * 800];
} STUFF;

EXTMEM STUFF stuff;

File asyncReadFile;
bool shouldLoad = false;
bool asyncFileReadComplete = false;
uint32_t asyncReadFileTotalRead = 0;

SdFs sd;

bool ready = false;

void setup() {
    SPI.setMOSI(SDCARD_MOSI_PIN);
    SPI.setSCK(SDCARD_SCK_PIN);

    if (!(SD.begin(SDCARD_CS_PIN)))
    {
        return; // setup() returns void, so just bail out if the SD fails to start
    }

    // Initialize SdFat
    if (!sd.begin(SD_CONFIG)) {
        return;
    }

    ready = true;
}

bool sdBusy() { return ready ? sd.card()->isBusy() : false; }

bool readFileBufferedAsync(std::string filename, void *buf, size_t size)
{
    if(sdBusy()) {
        Serial.println("SD CARD BUSY, CANNOT ASYNC READ!");
        return true;
    }

    Serial.printf("expected size: %d for file: %s", size, filename.c_str());
  
    asyncReadFile = SD.open(filename.c_str(), FILE_READ);
    if (!asyncReadFile || !asyncReadFile.available())
    {
        Serial.printf("Failed to open file for reading: %s\n", filename.c_str());
        return false;
    }

    Serial.printf(" asyncReadFile available: %d", asyncReadFile.available());

    size_t bufferSize = ASYNC_IO_BUFFER_SIZE;
    int8_t *index = (int8_t *)buf + asyncReadFileTotalRead; // resume where the previous chunk left off
    asyncReadFile.seek(asyncReadFileTotalRead);             // the file was reopened, so seek back to our position
    uint32_t chunkSize = min(bufferSize, size - asyncReadFileTotalRead);
    asyncReadFileTotalRead += asyncReadFile.readBytes((char *)index, chunkSize);

    Serial.printf(" asyncReadFileTotalRead: %d, chunkSize: %d\n", asyncReadFileTotalRead, chunkSize);

    if (asyncReadFileTotalRead == size) {
        asyncFileReadComplete = true;
        asyncReadFileTotalRead = 0;
        asyncReadFile.close();

        Serial.printf("done reading file %s!\n", filename.c_str());

        return true;
    }

    asyncReadFile.close();

    return true;
}


// then in the main loop...

void update() {

    if (shouldLoad && asyncFileReadComplete == false) {
        readFileBufferedAsync("/big_buffer.bin", (byte *)&stuff, sizeof(stuff));
    } else if (shouldLoad && asyncFileReadComplete) {
        shouldLoad = false;
        asyncFileReadComplete = false;

        Serial.println("done loading!");
    }
}

So this works fine, but if I try a similar approach for _writing_ data to the SD while SD audio playback is happening, the SD card access seems to freeze: serial output slows to a crawl and the program behaves very strangely until I reboot. Here is similar pseudo code, but for writing (which doesn't work).
C++:
#define ASYNC_IO_BUFFER_SIZE (500 * 80) // 40000-byte write chunk

typedef struct {
    char bigBuffer[512 * 800];
} STUFF;

EXTMEM STUFF stuff;

byte writeBuffer[ASYNC_IO_BUFFER_SIZE];

File asyncWriteFile;
bool shouldSave = false;
bool asyncFileWriteComplete = false;
uint32_t remaining = 0;
uint32_t offset = 0;

SdFs sd;
bool ready = false;

void setup() {
    SPI.setMOSI(SDCARD_MOSI_PIN);
    SPI.setSCK(SDCARD_SCK_PIN);

    if (!(SD.begin(SDCARD_CS_PIN)))
    {
        return; // setup() returns void
    }

    // Initialize SdFat
    if (!sd.begin(SD_CONFIG)) {
        return;
    }

    ready = true;
    remaining = sizeof(stuff);
}

bool sdBusy() { return ready ? sd.card()->isBusy() : false; }

bool writeFileBufferedAsync(std::string filename, const byte *data)
{
    if(sdBusy()) {
        Serial.println("SD CARD BUSY, CANNOT ASYNC READ!");
        return true;
    }
  
    asyncWriteFile = SD.open(filename.c_str(), FILE_WRITE);

    if (remaining > 0)
    {
        size_t chunkSize = min(ASYNC_IO_BUFFER_SIZE, remaining);
      
        memcpy(writeBuffer, data + offset, chunkSize);

        size_t bytesWritten = asyncWriteFile.write(writeBuffer, chunkSize);
      
        offset += chunkSize;
        remaining -= bytesWritten;

        Serial.printf("remaining: %d\n", remaining);
    }

    if (remaining == 0) {
        asyncFileWriteComplete = true;
    }

    asyncWriteFile.close();

    return true;
}


// then in the main loop...

void update() {
    if (shouldSave && asyncFileWriteComplete == false) {

        writeFileBufferedAsync("/big_buffer.bin", (byte *)&stuff);

    } else if (shouldSave && asyncFileWriteComplete) {
        shouldSave = false;
        asyncFileWriteComplete = false;
        remaining = sizeof(stuff); // re-arm for the next save
        offset = 0;

        Serial.println("done saving!");
    }
}


I'm thinking something in writeFileBufferedAsync is wrong, but I'm not sure where. The code could just be completely wrong; I'm not used to writing structs to a file in chunks like this, so any help here is appreciated. If it's merely a limitation of write speeds being lower than read speeds, I will have to find another workaround, because concurrent SD playback and read/write access is what I intended for my project.

Thank you!
 
Haven't time at the moment for a full response, but one thing to avoid is opening and closing the file on every call to writeFileBufferedAsync(). That's likely to be very time consuming just on its own, and probably means that any tuning you try to do by changing ASYNC_IO_BUFFER_SIZE has very little effect.

As a creature of habit, I'd probably do the whole thing as a state machine, with an enum something like {idle, needsSaving, fileWriting, saveComplete}. Call writeFileBufferedAsync() on every update(), but with a valid filename and pointer to stuff only if it's idle and you want to save; return the current state so update() can track progress. Hide everything you reasonably can in static variables inside writeFileBufferedAsync().

You probably need to pass sizeof stuff as a parameter for full generality.
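Very roughly, something like this is what I mean (an untested sketch; the pendingName buffer and the signature are just placeholders, and it leans on the sdBusy() helper and ASYNC_IO_BUFFER_SIZE from your code):
C++:
enum SaveState {idle, needsSaving, fileWriting, saveComplete};

// Call this every update(); pass filename/data/size only to start a save from idle.
// Returns the current state so update() can track progress.
SaveState writeFileBufferedAsync(const char *filename = nullptr,
                                 const byte *data = nullptr, size_t size = 0)
{
    static SaveState state = idle;
    static File f;
    static const byte *src = nullptr;
    static char pendingName[64];
    static size_t offset = 0, total = 0;

    switch (state) {
        case idle:
            if (filename && data && size) {                   // a save has been requested
                strncpy(pendingName, filename, sizeof(pendingName) - 1);
                pendingName[sizeof(pendingName) - 1] = '\0';
                src = data; offset = 0; total = size;
                state = needsSaving;
            }
            break;
        case needsSaving:
            if (!sdBusy()) {
                f = SD.open(pendingName, FILE_WRITE_BEGIN);   // open once, keep open until done
                state = f ? fileWriting : idle;
            }
            break;
        case fileWriting:
            if (!sdBusy()) {
                size_t chunk = min((size_t)ASYNC_IO_BUFFER_SIZE, total - offset);
                offset += f.write(src + offset, chunk);
                if (offset >= total) { f.close(); state = saveComplete; }
            }
            break;
        case saveComplete:
            state = idle;                                     // ready for the next save
            break;
    }

    return state;
}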

SD "concurrent" read and write is definitely possible, there are discussions around about writing looper sketches which require read and write at audio streaming speeds - you can even be playing a file from the start while recording it at the end, though there are a few wrinkles to that (unrelated to speed).
 
Hm I thought I should only keep the file open briefly to write one chunk at a time because the SD card can only have a single file open at a time, and I want the audio playback objects to not be stalled trying to open files.
 
Nope, you can have many files open at once. That’s how the multi-file playback works, the file is open between play() and stop().
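For instance, this is perfectly fine (filenames purely for illustration):
C++:
File kick  = SD.open("/kick.wav",  FILE_READ);
File snare = SD.open("/snare.wav", FILE_READ);
// both stay open at once; each playback object holds its own File until stop()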
 
Hmmm ... just got round to trying it, and it's not working for me, either. Gotta get my investigating feet on...
 
Gah. Something deep inside the SD library is calling yield(), and then failing to deal with the fallout of the fact that it's a weakly-defined function and could do anything. To quote Paul: "What yield() will actually do isn't a fixed known quantity".
 
Could you try this fix to the SdFat library? Install it in your libraries folder as usual; it should mask the one provided by Teensyduino.

I need to take a look at other filesystems - I think it's any SD access, but the author embedded a yield() call so deep into his library I really can't figure out the ramifications.
 
It works swimmingly so far! Writes seem to do well at 4 kB chunks. It took ~77 ms to write ~360 kB of data to a file.

Thank you yet again for all your help and diligence on this!
 
Just a note: I'm sometimes hearing bits of glitching on the streamed samples now when the concurrent reading/writing starts, but it may be the buffer and chunk sizes I need to play with.
 
OK, that’s good news. Let me know how it goes tweaking the settings, and if you can’t find a sweet spot please do post another example. I’ve got it hooked up to the ’scope for now, so looking at delays and timings is fairly easy.
 
I ended up using a state machine approach, as well as queues. I think eventually it might make sense to parallelize the reads/writes, instead of doing them on a sequential queue basis, but for now it seems fast enough and doesn't cause too much of a performance problem.

Here's how I basically have the file I/O code:

C++:
#ifndef AsyncIO_h
#define AsyncIO_h

#include <Arduino.h>
#include <string>
#include <queue>
#include <functional>
#include <memory>

#define ASYNC_IO_R_BUFFER_SIZE (512 * 80) // 40 kB read chunk
#define ASYNC_IO_W_BUFFER_SIZE (512 * 8)  // 4 kB write chunk

namespace AsyncIO
{
    enum FILE_IO_STATE
    {
        IDLE = 0,
        START,
        BUSY,
        COMPLETE,
        ERROR
    };

    enum FILE_IO_TYPE
    {
        READ = 0,
        WRITE
    };

    enum FILE_TYPE {
        PROJECT_CONFIG = 0,
        PROJECT_DATA,
    };

    typedef struct
    {
        FILE_TYPE fileType;
        FILE_IO_TYPE ioType;
        std::string filename;
        uint32_t size;
    } IO_CONTEXT;

    typedef struct {
        uint32_t total;
        int8_t *index;
    } READ_IO;

    typedef struct {
        uint32_t offset;
        uint32_t remaining;
        byte buffer[ASYNC_IO_W_BUFFER_SIZE];
    } WRITE_IO;

    void update();
    void addItem(IO_CONTEXT ctx);
    void processNextItem();
    void setReadCallback(std::function<void(const IO_CONTEXT&)> cb);
}

#endif /* AsyncIO_h */

and
C++:
#include "AsyncIO.h"

namespace AsyncIO
{
    FILE_IO_STATE _state = IDLE;
    File _file;
  
    std::queue<IO_CONTEXT> _ioQueue;
    std::function<void(const IO_CONTEXT&)> _readCallback;
  
    READ_IO _readIO;
    WRITE_IO _writeIO;

    void openNextItem();

    bool openForRead(IO_CONTEXT& ctx);
    bool openForWrite(IO_CONTEXT& ctx);
    bool doneReading(IO_CONTEXT& ctx, void *buf);
    bool doneWriting(IO_CONTEXT& ctx, const byte *buf);

    void *getReadObject(FILE_TYPE fileType);
    byte *getWriteObject(FILE_TYPE fileType);

    void update()
    {
        switch (_state)
        {
            case IDLE:
                if (!_ioQueue.empty()) { // there are items to process
                    _state = START;
                }
                break;
            case START:
                openNextItem(); // open the next item's file in the queue
                break;
            case BUSY:
                processNextItem();
                break;
            case COMPLETE:
                _state = IDLE;
                break;
            case ERROR:
                Serial.println("ASYNC IO ERROR!");
                _file.close();
                _state = IDLE;
                break;
        }
    }

    void addItem(const IO_CONTEXT ctx)
    {
        _ioQueue.push(ctx);
    }

    void openNextItem()
    {
        if (_ioQueue.empty()) return;

        if (sdBusy()) {
            Serial.println("SD CARD BUSY! CANNOT OPEN!");
          
            return;
        }

        IO_CONTEXT& ctx = _ioQueue.front();

        switch (ctx.ioType)
        {
            case READ:
                _readIO.index = (int8_t*)getReadObject(ctx.fileType);
                _readIO.total = 0;

                Serial.printf("Opening file: %s, at index: %d\n", ctx.filename.c_str(), _readIO.index);

                if (!openForRead(ctx)) {
                    Serial.println("failed to open for read!");

                    if (_readCallback) {
                        Serial.println("calling callback for failed read!");

                        _readCallback(ctx);
                    }

                    _ioQueue.pop();
                    _state = START; // goto next file, assume failed read shouldn't block the queue
                } else {
                    _state = BUSY; // begin processing read
                }

                break;
            case WRITE:
                _writeIO.offset = 0;
                _writeIO.remaining = ctx.size;

                if (!openForWrite(ctx)) {
                    Serial.println("failed to open for write, retrying!");

                    _state = START; // retry if failed to open for write
                } else {
                    _state = BUSY; // begin processing write
                }

                break;
        }
    }

    void processNextItem()
    {
        if (_ioQueue.empty()) {
            Serial.println("ASYNC IO QUEUE EMPTY!");

            _state = COMPLETE; // all items processed, mark complete
            _readCallback = nullptr;

            return;
        }

        if (sdBusy()) {
            Serial.println("SD CARD BUSY!");
          
            return;
        }

        IO_CONTEXT& ctx = _ioQueue.front();

        switch (ctx.ioType)
        {
            case READ:
                if (!doneReading(ctx, getReadObject(ctx.fileType))) {
                    _state = BUSY; // continue reading
                } else {
                    _state = START; // so we open the next file!
                    _ioQueue.pop();
                }

                break;
            case WRITE:
                if (!doneWriting(ctx, getWriteObject(ctx.fileType))) {
                    _state = BUSY; // continue writing
                } else {
                    _state = START; // so we open the next file!
                    _ioQueue.pop();
                }

                break;
        }
    }

    bool openForRead(IO_CONTEXT& ctx)
    {
        _file = SD.open(ctx.filename.c_str(), FILE_READ);
        yield();
        if (!_file || !_file.available())
        {
            Serial.printf("Failed to open file for reading: %s\n", ctx.filename.c_str());
            return false;
        }

        Serial.printf("opened file for reading: %s\n", ctx.filename.c_str());

        return true;
    }

    bool openForWrite(IO_CONTEXT& ctx)
    {
        _file = SD.open(ctx.filename.c_str(), FILE_WRITE_BEGIN);
        yield();
        if (!_file)
        {
            Serial.printf("Failed to open file for writing: %s\n", ctx.filename.c_str());
            return false;
        }

        Serial.printf("opened file for writing: %s\n", ctx.filename.c_str());

        return true;
    }
  
    bool doneReading(IO_CONTEXT& ctx, void *buf)
    {
        Serial.printf("Reading file: %s ", ctx.filename.c_str());

        uint32_t chunkSize = min(ASYNC_IO_R_BUFFER_SIZE, ctx.size - _readIO.total);

        _readIO.total += _file.readBytes((char *)_readIO.index, chunkSize);
        _readIO.index += chunkSize;

        //Serial.printf("ctx.size: %d, _readIO.readTotal: %d, chunkSize: %d, _readIO.index: %d\n", ctx.size, _readIO.total, chunkSize, _readIO.index);

        if (_readIO.total >= ctx.size) {
            _file.close();

            Serial.printf("done reading file %s!\n", ctx.filename.c_str());

            return true;
        }

        return false;
    }

    bool doneWriting(IO_CONTEXT& ctx, const byte *buf)
    {
        Serial.printf("Writing file: %s ", ctx.filename.c_str());

        if (_writeIO.remaining > 0) {
            uint32_t chunkSize = min(ASYNC_IO_W_BUFFER_SIZE, _writeIO.remaining);

            memcpy(_writeIO.buffer, (byte *)buf + _writeIO.offset, chunkSize);
            uint32_t bytesWritten = _file.write(_writeIO.buffer, chunkSize);
            yield();

            _writeIO.offset += chunkSize;
            _writeIO.remaining -= bytesWritten;

            Serial.printf("REMAINING <=0, remaining: %d\n", _writeIO.remaining);

            if (_writeIO.remaining <= 0) {
                _file.close();

                Serial.printf("done writing file %s!\n", ctx.filename.c_str());

                _writeIO.remaining = 0;
                _writeIO.offset = 0;

                return true;
            }
        }

        return false;
    }

    void setReadCallback(std::function<void(const IO_CONTEXT&)> cb) {
        _readCallback = cb;
    }

    void *getReadObject(FILE_TYPE fileType)
    {
        switch (fileType)
        {
            case PROJECT_CONFIG:
                return (byte *)&projectConfigForRead;
            case PROJECT_DATA:
                return (byte *)&projectDataForRead;
        }

        return nullptr; // unknown file type
    }

    byte *getWriteObject(FILE_TYPE fileType)
    {
        switch (fileType)
        {
            case PROJECT_CONFIG:
                return (byte *)&projectConfigForWrite;
            case PROJECT_DATA:
                return (byte *)&projectDataForWrite;
        }

        return nullptr; // unknown file type
    }
}

and you would add an item to the queue like so:

C++:
AsyncIO::addItem({
    AsyncIO::FILE_TYPE::PROJECT_CONFIG,
    AsyncIO::FILE_IO_TYPE::READ,
    MySD::getProjectConfigFilename(),
    sizeof(projectConfig),
});

and finally just call AsyncIO::update(); in the main loop
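For example, the hookup is just this (the rest of the loop body is a placeholder):
C++:
void loop() {
    AsyncIO::update(); // advance the file I/O state machine one step per pass

    // ...sequencer, UI, audio triggering, etc...
}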
 
Sounds good, glad it's working OK for you. Yes, if you turn the above approach into a class then you should be able to instantiate multiple copies, each of which can be in a different state / stage of its progress. If it's worth the effort...
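For instance, something along these lines (a rough, untested sketch that just reuses the types already in AsyncIO.h):
C++:
#include "AsyncIO.h"

// Hypothetical class wrapper, so several transfers can be in flight independently
class AsyncFileIO {
public:
    void addItem(const AsyncIO::IO_CONTEXT &ctx) { _ioQueue.push(ctx); }
    void update(); // same state machine as AsyncIO::update(), but per instance
private:
    AsyncIO::FILE_IO_STATE _state = AsyncIO::IDLE;
    File _file;
    std::queue<AsyncIO::IO_CONTEXT> _ioQueue;
    AsyncIO::READ_IO  _readIO;
    AsyncIO::WRITE_IO _writeIO;
};

// e.g. one instance per concern:
AsyncFileIO projectIO;
AsyncFileIO sampleIO;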

I'm working up a couple of concepts to manage all the yield() clash problems. I'm extremely pessimistic that any changes to cores will ever be considered, so there's intended to be options to:
  • either entirely block EventResponder servicing from yield() while SD access is going on in the library: this is a sledgehammer to crack a nut, as I only want to block servicing if the filesystem calls yield(). However, it's only the same as switching all interrupts off in a critical section, and everyone does that...
  • ...or add an option for the user (that's you) to specify in their code when the triggered event responses are executed. In the simplest case this will consist of adding one line in setup(), and another at the end of loop(); if for some reason loop() has to take a very long time, and you can't afford the memory to increase the buffers, then extra copies of that line would be needed
As of right now it's not working, but I'm fairly sure that's buggy code rather than a fundamental flaw in the concept(s).
 
I'm curious about this application. Can your SD requirements be reduced to simultaneously meeting these two objectives?
  • continuous read from file A at X bytes/sec
  • every Y seconds, write Z bytes to file B
If so, can you specify X, Y, and Z?
 
Hm, for my specific application, I'm honestly not sure it could be simplified that way.

Right now, my application only checks if I need to write every 1 second (Y=1). I don't automatically write every 1 second, I wait until there's a signal telling my application that there are changes that need to be written.

Similarly, the reads are not necessarily continuous, they are on-demand, being triggered by a sequencer emitting trigger events. Assuming you're asking about the audio sample file reads, not project file reads.

If you're referring to the project files as laid out in the example code from post #12 above, the read chunk size (max) is 40960 bytes every loop cycle.

Also, I'm only ever writing in chunks of 4096 bytes (max) every loop cycle.
 
I'm kind of looking for a test case, and yes, when you're reading from an audio file, what is the data rate required for smooth playback? Maybe that's not strictly a constant, but can you provide a figure of merit? If so, that data rate would be X. Since you don't need to write every second, then typically Y > 1 and worst case Y=1, and what would be a typical value of Z, the number of bytes you might write when you do write?
 
> when you're reading from an audio file, what is the data rate required for smooth playback?
Maybe @h4yn0nnym0u5e can chime in here, but I have his SD audio buffering code right now configured at 7 buffers of 512 samples per playback object. My project currently uses a max of 16 playback objects at any one time, but I'm not sure what that equates / averages to in terms of data rate. I probably should know that.

When writing, it's a max of 4096 bytes per loop cycle. Per post #9 above, I noted that at 4 kB chunks it was taking ~77 ms to write ~360 kB of data to a file. 360 kB is roughly the most I might write when saving project data during playback, so ~4.68 MB per second if I were writing constantly for a whole second.
 
For a mono 16-bit WAV file at 44.1kHz sampling rate, you need 88200 bytes/s sustained rate. My vague rule of thumb for a decent SD card reading multiple files in fairly large chunks (say 8kB) is 10MB/s, so you can get quite a lot playing at once ... in theory. You need to allow a big enough buffer for when the SD card suddenly decides to do housekeeping and take a long time to respond, but that 8kB will last about 93ms so a 16kB buffer per playback object is a reasonable starting point. Reading in 512-sample chunks (the way Teensy Variable Playback works) isn't fantastically efficient, but with 16 objects playing it should still work fine.

Obviously stereo is twice that, all the way up to 8-channel being 706kB/s. Writing to SD is a bit slower, but I've certainly managed simultaneous recording of three files of 8, 6 and 4 channels, i.e. nearly 1.6MB/s, with no issues.

I haven't really tested the limits of simultaneous audio playback and recording, together with "project" files. A lot will depend on what else the application needs to do - displays tend to take quite a while to update, so plenty of buffer is needed, as well as CPU time. Hence PSRAM being a good idea...
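For reference, the arithmetic behind those figures (just the numbers above written out):
C++:
// mono, 16-bit, 44.1 kHz
constexpr uint32_t MONO_RATE     = 44100 * 2;                      // 88200 bytes/s sustained
constexpr uint32_t EIGHT_CH_RATE = MONO_RATE * 8;                  // ~706 kB/s for 8 channels
constexpr float    CHUNK_MS      = 8192.0f * 1000.0f / MONO_RATE;  // an 8 kB chunk lasts ~93 ms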
 
Thanks for this good info. In the other thread, @graydetroit said

I was able to get my project file _reads_ to happen concurrently with buffered SD playback using a non-blocking main loop approach, but as far as writes go, even in 512 byte sized chunks, it seems it pretty much halts the SD card and the SD buffered audio stops. Here is the separate thread.

I'm curious whether he was checking for sd.card()->isBusy() before each write when he was using 512-byte writes. Physical writes to SD are ultimately done by sector (512-byte chunks), and a 40+ ms delay can occur on any sector, so I think non-blocking read/write requires three things:
  • avoid blocking by checking sd.card()->isBusy() before every read or write
  • always write in chunks of 512 bytes, which should never take more than 5-6 us
  • always have 50+ ms of data in read buffer before writing
The last one guarantees playback can continue if the subsequent write triggers an SD delay.
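In code, the kind of check I have in mind is roughly this (just a sketch; sd is the SdFs instance and the rest of the names are placeholders):
C++:
// Write at most one 512-byte sector per loop() pass, and only if the card isn't busy
bool tryWriteSector(File &f, const uint8_t *src, size_t &offset, size_t total)
{
    if (sd.card()->isBusy()) return false;   // card busy: skip this pass rather than block

    size_t chunk = min((size_t)512, total - offset);
    offset += f.write(src + offset, chunk);

    return offset >= total;                  // true once everything has been written
}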

Does this sound right to you?
 
I'm curious whether he was checking for sd.card()->isBusy() before each write when he was using 512-byte writes.
Yes, I was (and am) currently doing that check before reading or writing. The issue at the time was with the buffered audio which was using EventResponder while the underlying SD library was calling yield(). The issue only seemed to occur when I was trying to write to the SD card during audio playback.
 
Unfortunately it's a pretty knotty problem, exacerbated by my use of EventResponder to trigger SD reads. You've participated in The Other Thread on this matter :)

I'm not 100% convinced that checking for SD not being busy will be effective, quite apart from the fact that other filesystems might be in use. It only takes one unexpected yield() to spoil your whole day. If you access SD, and it yields, and that calls EventResponder, which accesses SD ... game over.

I'm fairly close to a significant overhaul of my branch of the Teensy Variable Playback library, which is what @graydetroit is interested in, and thus likely to test :giggle:. I have two schemes to choose from; well, three:
  • never access the filesystem during playback (not great, but it does work!)
  • use a specific function call to service buffer reload requests; it's sort of derived from EventResponder, though in practice it's an almost complete re-write because EventResponder as written doesn't provide enough access to its internal workings. Just call AudioEventResponder::runPolled() every loop(), or after something time-consuming has happened, e.g. a display update
  • leave the yield-based processing as-is, but wrap your filesystem accesses in AudioEventResponder::disableResponse() and AudioEventResponder::enableResponse() calls. This is like masking interrupts for a critical section. It's a bit brutal, because it masks all yield responses, but no finer-grained option is available to me
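In application code the last two options would look roughly like this (illustrative only; AudioEventResponder and these calls are from my in-progress branch, so treat the names as provisional):
C++:
// Option 2: service buffer refill requests explicitly, once per loop()
// (or again after anything time-consuming, e.g. a display update)
void loop() {
    AudioEventResponder::runPolled();
    // ...rest of the application...
}

// Option 3: mask yield-driven responses around your own filesystem access,
// much like a critical section
void saveChunk(File &f, const uint8_t *buf, size_t len) {
    AudioEventResponder::disableResponse();
    f.write(buf, len);
    AudioEventResponder::enableResponse();
}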
 
Yes, I was (and am) currently doing that check before reading or writing. The issue at the time was with the buffered audio which was using EventResponder while the underlying SD library was calling yield(). The issue only seemed to occur when I was trying to write to the SD card during audio playback.
Are you also still using my modified SdFat library? I think if you stop using it, even if you check for the card being busy, there's a risk of a crash.
 
Are you also still using my modified SdFat library? I think if you stop using it, even if you check for the card being busy, there's a risk of a crash.
Yes, I am now. Before using the modified SdFat library, my program would still hang even when checking for the card being busy.
 