Periodic click when saving ADC input to SD card

I applied the click patch.

I altered below code, at input_adc.cpp, before IIR High-Pass Filter:
Changed this:
C++:
int16_t *s = capture_buffer + recycle_samples;
to this:
C++:
bool recycle_offset = (*capture_buffer) > (int16_t)(0.024f * 32768);
int16_t *s = capture_buffer + (recycle_offset ? 0 : recycle_samples);

This unpleasant and error-prone patch basically checks if samples recycled from previous update are balanced (their DC offset is removed) or not, by checking the first recycled value against a threshold. If they're not balanced, it makes IIR filter, which performs DC offset removal, start from the beginning of the capture buffer.

I'll continue to look for a real solution. Since this issue, at least for me, may not be about SD card draws power and pulls 3.3 v line down.

While looking for an adequate place for the patch, I noticed a strange pattern. First, here is the data flow:

ADC --DMA--> ADC buffer --> capture buffer w/recycling --> IIR HP (balancing) --> FIR LP w/subsampling

By writing samples to file at different stages, I observed below patterns (without patching):

Before IIR:
1767806087126.png


After IIR & Before FIR:
1767806130830.png


After FIR:
1767806165399.png


When DC offset (unbalanced) samples are leaked via recycling, and after IIR is applied to newly added samples, they become "remained above" samples. These recycled samples' count is about the length (taps) of the FIR filter, by input_adc.cpp design. And after FIR (with gain) is applied, they become sudden saturated samples that unwind quickly, forming the clicks.

Why recycling occasionally carries unbalanced samples? I don't know yet but it's probably related to SD card's actual write delays (~52 ms for me). Teensy audio system's update() call period is about 2.9 ms, which means about 18 update() calls to be missed during SD actual write operation. An update() call doesn't interrupt an executing update(), so this issue may be about DMA is wrapping ADC buffer several times. Discontinuities at before IIR image probably relates to this.

I'll try to increase ADC buffer size (and maybe capture buffer size as well) to mitigate this issue correctly. A quick try emerged consecutive restarts of Teensy. Obviously this alteration must be made by considering its effects comprehensively.
 
I tried with 128 and 2 GB SD cards, different classes, and observed different clicks. Their actual-write delays differ between 52 and 10 ms, respectively, and have different periods. Click periods alter accordingly.

P.S.: I'm still trying to use bigger adc & capture buffers, but Teensy keeps restarting continually. Maybe because of some memory addressing issue. I'll report back here when I managed to do that.
 
@YasarYY:

If you haven't already done so, adding the following to your sketch may be beneficial:

If your code (or any included libraries, etc.) encounters an exception (e.g. NULL pointer reference, etc.), it will automatically reboot, and you can get the cause of the reboot by including the following in your setup() function:

Code:
if (CrashReport) {
   Serial.print(CrashReport);
}

Note that, if a crash occurs, the crash information will be printed in the SerialMonitor. The report will include an address where the crash was recorded. Paul has created a CrashReport() <webpage> with useful details. You can also check the entry in the unofficial Teensy wiki <here> for links to descriptions of where to find the addr2line utility for both the old (1.8.x) & new (2.3.x) Arduino IDE, as well as detailed descriptions of how to use it.

Good luck . . .

Mark J Culross
KD5RXT
 
As someone suggested, I tried a 47uF cap at the power input to the SD hat, which made a barely discernible difference. I have not added anything to the Teensy 4.0
 
I also just read that
Code:
pinMode(<pin>, INPUT_DISABLE);
should be added for analog inputs. I have not tested yet, but will.
I am assuming it is the case for ADC0... ?
Also here
The digital inputs on the Teensy stay active even when the pin is being used for analog reads.
When the voltage passes the logic threshold, the input buffer state changes and the input impedance changes slightly.
If your analog signal has a high output impedance (7.7k in my case) this change in current into the GPIO will cause a significant shift in the reading.
This is a major gotcha when using the ADC on the teensy IMO. I'm not sure if the same is true on other arduinos like the UNO.
In any case, the solution is to simply add pinMode(<pin>, INPUT_DISABLE) for each analog pin in your setup function.
 
should be added for analog inputs
Yep, I'm already doing it for its little help on reducing power consumption:
C++:
// set disabled mode for 55 pins of Teensy 4.1 (https://www.pjrc.com/store/teensy41.html)
for (int i = 0; i < 55; i++) pinMode(i, INPUT_DISABLE);
And after this, changing modes of the pins which are going to be used as needed.
 
It turned out that consecutive restarts are caused by DMA config params CITER and BITER are 15 bits, hence DMA can be programmed for a max of 32767 samples [1].

To wrap up; I sample at 705600 Hz (later subsampling to 352800 Hz), 16 times faster than Audio library default 44100 Hz. Likewise, at every update(), 128*16 = 2048 samples are processed.

Max observed SD card write delay is 52 ms. Divided by update() period 52/2.9 = ~18 update() calls may be missed. So I tried to have ADC buffer sized to the data for 18+ update() calls.

2048 * 18 = 36864 samples cannot be programmed at DMA config because this sample count doesn't fit to 15 bits. DMA channel linking can be used for larger buffers, which I avoided because of its complexity at this time. So I used max allowable sample count, 32767.

ADC buffer of 32767 samples can store 16 update() data. (32767 / 2048 =~ 16)

C++:
const size_t adc_buffer_samples = 32767;

Capture buffer has no DMA limit, so it's set large enough to store ADC buffer samples and recycle samples, and have extra samples as filter length (for in-place FIR filtering).

I removed error-prone patch I had implemented earlier, and adapted processing code in update() accordingly (which was heavily modified earlier, so is out of scope of this issue). Now, without any hardware addition, recordings are quite clean even when I observe 52 ms delays. I deem this as a proper solution for my project. 👌

Using 32767 samples of ADC buffer (instead of 36864 to store ~18 update() data), doesn't create clicks. It may create some discontinuities, which if one observes and wants to solve, should implement DMA channel linking to use multiple DMA channels to cover a larger ADC buffer.

[1]: IMXRT1060RM_rev3_annotations.pdf, page 163
 
BITER/CITER are the number of loops, if you increase the number of bytes per loop (NBYTES) you can increase the total amount transferred.
 
I removed error-prone patch I had implemented earlier, and adapted processing code in update() accordingly (which was heavily modified earlier, so is out of scope of this issue). Now, without any hardware addition, recordings are quite clean even when I observe 52 ms delays. I deem this as a proper solution for my project. 👌

Using 32767 samples of ADC buffer (instead of 36864 to store ~18 update() data), doesn't create clicks. It may create some discontinuities, which if one observes and wants to solve, should implement DMA channel linking to use multiple DMA channels to cover a larger ADC buffer.
What is your current code, if I may ask?
 
BITER/CITER are the number of loops, if you increase the number of bytes per loop (NBYTES) you can increase the total amount transferred.
That's right, that way DMA writes to a wider buffer span. Currently DMA reads 12 bit results of ADC and then writes 16 bit items by doing 4 bit zero padding.
But, if I'm not missing something, making DMA item size 32 bit will still convey one ADC measurement by doing 20 bit zero padding.
 
What is your current code, if I may ask?
Unfortunately, I cannot post the code here, sorry about that. But it is basically processing all buffered data since the last update(), similar to what is happening at input_adc.cpp update() method. Then it writes all the results to the file.

But if needed, I may post a pseudo-code for my current update() method.
 
Unfortunately, I cannot post the code here, sorry about that. But it is basically processing all buffered data since the last update(), similar to what is happening at input_adc.cpp update() method. Then it writes all the results to the file.

But if needed, I may post a pseudo-code for my current update() method.
pseudo-code still might help others

I am (currently) just writing to uSD at 553ksmp/s, and, also getting the every-512 byte glitch. You can see it is also railing the ADC occasionally:
Figure_1.png
 
Google AI mode just told me this which I'm testing:

To use the ADC library (by Pedvide) with manual DMA control, you utilize the library to configure the ADC hardware while using DMAChannel.h to manually set the TCD registers.

Manual ADC-to-Memory Configuration
This setup captures 128 samples from the ADC and stores them in a buffer, triggered automatically by each ADC conversion.

Code:
//cpp

#include <ADC.h>
#include <DMAChannel.h>

ADC *adc = new ADC();
DMAChannel dma;

const int buffer_size = 128;
uint16_t adc_buffer[buffer_size] __attribute__((aligned(32))); // Align for T4.1 cache

void setup() {
// 1. Configure ADC Hardware via Library
adc->adc0->setResolution(12);
adc->adc0->setSamplingSpeed(ADC_SAMPLING_SPEED::HIGH_SPEED);
adc->adc0->enableDMA(); // Crucial: Tells ADC to send DMA requests

// 2. Point DMA to ADC Result Register
// For Teensy 4.x, ADC0 results are at &ADC1_R0
dma.source((volatile uint16_t &)ADC1_R0);
dma.destinationBuffer(adc_buffer, sizeof(adc_buffer));

// 3. Manual TCD Tuning
dma.TCD->NBYTES = 2;       // Minor Loop: Transfer 2 bytes (16-bit ADC result)
dma.TCD->BITER = buffer_size; // Major Loop: Run 128 times
dma.TCD->CITER = buffer_size;

// Ensure Source Address does NOT increment (always read same ADC register)
dma.TCD->SOFF = 0;
// Destination increments by 2 bytes per sample
dma.TCD->DOFF = 2;

// 4. Trigger & Enable
dma.triggerAtHardwareEvent(DMAMUX_SOURCE_ADC1);
dma.enable();

// Start a continuous conversion to keep the DMA busy
adc->adc0->startContinuous(A0);
}


Use code with caution.

Key Register Roles in ADC Capture
  • NBYTES (2): Every time the ADC finishes a conversion, it sends a hardware request. DMA responds by moving exactly 2 bytes (one 12-bit sample in a 16-bit container).
  • BITER/CITER (128): This defines your "block size." Once 128 samples are moved, the DMA engine can trigger an interrupt or automatically reload to start over.
  • SLAST: (Source Last Address Adjustment) Set this to 0 because you are always reading from the fixed ADC result register.
  • DLAST: (Destination Last Address Adjustment) Set this to -sizeof(adc_buffer) to automatically wrap the destination pointer back to the start of your array for a circular buffer effect.

Pro-Tip: Cache Consistency
On Teensy 4.x, the CPU may read "stale" data from its cache instead of the new data the DMA just wrote to RAM. After the DMA finishes, use arm_dcache_delete(adc_buffer, sizeof(adc_buffer)) to force the CPU to look at the actual RAM.
 
BITER/CITER are the number of loops, if you increase the number of bytes per loop (NBYTES) you can increase the total amount transferred.
BITER/CITER are the major loop count; NBYTES is the minor loop count, and for some peripherals can be used to do more transfers without reloading the TCD. When messing with SPI transfers I found NBYTES was limited to the FIFO depth, i.e. 16 words; as far as I can see the ADC has no FIFO, so a higher value for NBYTES may not be useful. Note that the minor loop "is an indivisible operation and cannot be halted" (see e.g. paragraph 6.5.5.24.4), so without a FIFO it will just repeatedly read the same value from the register.

@jmarsh, I know you're aware of all this :)

But, if I'm not missing something, making DMA item size 32 bit will still convey one ADC measurement by doing 20 bit zero padding.
NBYTES and the item size have different meanings; the latter are set in TCDn_ATTR for source and destination. So if you set those for a 16-bit transfer size, and set NBYTES to 4, two 16-bit transfers from source to destination will occur for each count of BITER/CITER.
 
In my case, I'm happy to say that switching from the Adafruit SDIO SD board to the soldered-on simple holder on the Teensy 4.0 made all the difference. Same 10 year old Sandisk 16GB HC card, no blips at 256 byte intervals.
Photodiode card on pin14/ADC0, 586ksmp/s:
Figure_1.png

However, I do see what appear to be missed blocks (of 256)
Figure_1b.png
I seem to recall an example which tracked write interrupts, hits and misses?
 
Last edited:
pseudo-code still might help others
I stripped irrelevant code (for other functionality) and kept relevant code. Resulting pseudo-code is below. The rest is pretty much similar to input_adc.cpp.

At header, add to private variables:
C++:
static uint32_t recycle_samples_begin, recycle_samples_end; // offsets w.r.t. capture_buffer

At cpp:
C++:
// sampling coefficient, to be used to parametrically define samples count per update()
uint16_t samplingCoefficient = 16; // 705600 / 44100 = 16
// set adc_buffer_samples to 32767, being DMA system's max supported sample count, which can hold samples of time span of 16 update() periods
const size_t adc_buffer_samples = (size_t)(AUDIO_BLOCK_SAMPLES * samplingCoefficient * 16) - 1;
// set capture_buffer_samples large enough to store adc_buffer_samples and recycle_samples and maxFilterLen (for in-place FIR filtering)
const size_t capture_buffer_samples = (size_t)(AUDIO_BLOCK_SAMPLES * samplingCoefficient * 18);
// allocate capture and ADC buffers
static __attribute__((aligned(32))) uint16_t adc_buffer[adc_buffer_samples];
static int16_t capture_buffer[capture_buffer_samples];

uint32_t AudioInputAnalog::recycle_samples_begin = 0;
uint32_t AudioInputAnalog::recycle_samples_end = 0;

void AudioInputAnalog::update(void) {
    // one oversampled block samples count, processes are performed in multiples of this
    const uint32_t over_block_samples = AUDIO_BLOCK_SAMPLES * samplingCoefficient;
    const uint32_t needed_samples = over_block_samples + currentFilterLen;
    static uint16_t *prior_p = adc_buffer;
    uint16_t *p = (uint16_t *)dma.TCD->DADDR;

    // DADDR points to next place to write, and we need to back up 1 location to avoid reusing stale oldest data
    if (--p < adc_buffer) p = adc_buffer + adc_buffer_samples - 1;

    // copy raw samples from adc_buffer to capture_buffer
    uint32_t new_samples, recycle_samples;
    recycle_samples = recycle_samples_end - recycle_samples_begin;
    if (p >= prior_p) {
        // new raw ADC samples are contiguous within adc_buffer
        new_samples = p - prior_p;
        // if we don't have enough samples, skip this update() to have more samples next time. normally this should happen only when buffer is new.
        if (recycle_samples + new_samples < needed_samples) {
            return;
        }
        // start filling capture_buffer with the recycle_samples from its end (recycle those samples from previous update)
        if (recycle_samples != 0) memmove(capture_buffer, capture_buffer + recycle_samples_begin, recycle_samples * 2);
        // copy new_samples from adc_buffer to capture_buffer right after recycle_samples
        memcpy(capture_buffer + recycle_samples, prior_p, new_samples * 2);
    } else {
        // new raw ADC samples wrap around from end to start of adc_buffer[]
        uint32_t new_samples1 = (adc_buffer + adc_buffer_samples) - prior_p;
        uint32_t new_samples2 = p - adc_buffer;
        new_samples = new_samples1 + new_samples2;
        // if we don't have enough samples, skip this update() to have more samples next time. normally this should happen only when buffer is new.
        if (recycle_samples + new_samples < needed_samples) {
            return;
        }
        // start filling capture_buffer with the recycle_samples from its end (recycle those samples from previous update)
        if (recycle_samples != 0) memmove(capture_buffer, capture_buffer + recycle_samples_begin, recycle_samples * 2);
        // copy new_samples from adc_buffer to capture_buffer right after recycle_samples
        memcpy(capture_buffer + recycle_samples, prior_p, new_samples1 * 2);
        memcpy(capture_buffer + recycle_samples + new_samples1, adc_buffer, new_samples2 * 2);
    }

    // undo the previously made "back up 1 location"
    if (++p >= adc_buffer + adc_buffer_samples) p = adc_buffer;
    // save current p to prior_p
    prior_p = p;

    // remove DC offset from newly added samples by applying an IIR High-Pass Filter:
    // https://forum.pjrc.com/index.php?threads/dc-offset-removal-noise-distortion-in-input_adc-cpp-input_adcs-cpp.45770/
    // coefficient a = 1048300: 2 Hz cutoff @ 44100 Hz sampling & 32 Hz cutoff @ 705600 Hz sampling
    int16_t *s = capture_buffer + recycle_samples;
    const int16_t *end = s + new_samples;
    while (s < end) {
        int32_t tmp = *s;
        tmp = (((int32_t)tmp) << 14);
        int32_t acc = hpf_y1 - hpf_x1;
        acc += tmp;
        hpf_y1 = FRACMUL_SHL(acc, 1048300 << 10, 1);
        hpf_x1 = tmp;
        int n = signed_saturate_rshift(hpf_y1, 16, 14);
        *s++ = n;
    }

    // determine how many oversampled blocks are there in capture buffer excluding filter length
    const uint32_t block_count = (recycle_samples + new_samples - currentFilterLen) / over_block_samples;
    // apply FIR filter and subsample by 2
    // filter an exact multiple of oversampled block, leave rest to be recycled
    for (int i = 0; i < AUDIO_BLOCK_SAMPLES * samplingCoefficient / 2 * block_count; i++) {
        *(capture_buffer + i) = fir(capture_buffer + i * 2, recordFilter, recordFilterLen);
    }
    // record samples to the file
    // * 2 is to convert sample to bytes
    if (frec != NULL) frec->write(capture_buffer, AUDIO_BLOCK_SAMPLES * samplingCoefficient / 2 * block_count * 2);

    // update recycle_samples offsets
    recycle_samples_begin = block_count * over_block_samples;
    recycle_samples_end = recycle_samples + new_samples;

    // try to allocate an audio block
    audio_block_t *output = NULL;
    output = allocate();
    if (output == NULL) {
        Serial.println("Audio block cannot be allocated!");
        return;
    }

    // transmit a block-sized portion
    // * 2 is to convert sample to bytes
    memcpy(output->data, capture_buffer, AUDIO_BLOCK_SAMPLES * 2);

    // transmit and release the output
    transmit(output);
    release(output);
}
 
However, I do see what appear to be missed blocks (of 256)
It may be caused by, at an SD card write/sync delay, DMA has wrapped around the buffer of 128 bytes three times (instead of one), so 256 samples are overwritten and lost.

Try increasing buffer size to e.g. 512 or more.
 
I am testing a modified TeensyDmaAdcLogger.ino example (attached) with a ping-pong buffer of 2 by 512
Code:
// Preallocate .5GiB file, multiplying 8 by 2^26, resulting in a value of 536,870,912 bytes, or 512 megabytes (MB)
const uint64_t PRE_ALLOCATE_SIZE = 8ULL << 26;

// Ping-pong DMA buffer.
DMAMEM static uint16_t __attribute__((aligned(32))) dmaBuf[2][512]; 

// 400 sector RingBuf - could be larger on Teensy 4.1.
const size_t RING_BUF_SIZE = 400 * 512;
// RingBuf for 512 byte sectors.
RingBuf<FsFile, RING_BUF_SIZE> rb;
I recall that multiples of 512 block writes are also efficient and do not block (?) so I'll see if that makes any difference...

I don't know if I really need to ping-pong, and ultimately will not write live data to uSD (this is just for data validation).
I'll have to peruse your method and see how/if it fits.
 

Attachments

  • TeensyDmaAdcLogger.ino
    9.4 KB · Views: 6
I recall that multiples of 512 block writes are also efficient and do not block (?) so I'll see if that makes any difference...

The SD card has an on-board, 1-sector (512 byte) buffer, so writes of more than 512 bytes result in loops of (a) transfer of 512 bytes to SD buffer, and (b) trigger actual SD write. Any of those 1-sector writes to SD can result in the card becoming busy for a relatively long period (up to 40 ms in my testing), so the only way to avoid blocking in Sdfat write() is to do the following:

- always test for SD.is_busy()
- write only if SD is NOT BUSY
- always write 512 bytes

If SD is not busy and 512 bytes are written, SD.write() will always return in ~5 us.
 
I recall that multiples of 512 block writes are also efficient and do not block
The basic SD write command is for a 512 byte sector. There is also a multi-sector write command. (With an optional command to warn the card how many sectors to expect.)
Best performance comes when using the multi-sector write with RU sized (varies with card but currently really big) chunks.

The way I did it was to use 4K buffers. As many as SRAM supported. When the buffer was filled it got shipped off to be written. You have to have plenty of buffer space as this write can be either really fast, basically the time it takes to transfer the data, or much longer. Depending on what the card is up to. And this is without a library like SdFAT in the middle.
 
The SD card has an on-board, 1-sector (512 byte) buffer, so writes of more than 512 bytes result in loops of (a) transfer of 512 bytes to SD buffer, and (b) trigger actual SD write. Any of those 1-sector writes to SD can result in the card becoming busy for a relatively long period (up to 40 ms in my testing), so the only way to avoid blocking in Sdfat write() is to do the following:

- always test for SD.is_busy()
- write only if SD is NOT BUSY
- always write 512 bytes

If SD is not busy and 512 bytes are written, SD.write() will always return in ~5 us.
This uses Bill Greiman's RingBuf.h, and I can't follow it very well, yet, and don't see an *.is_busy().

However VS Code Copilot seems to have found and fixed the main issue "adjusted buffer allocation and pointer math so BUF_BLOCK_SIZE is treated as bytes (fixes cache invalidation/offset math)."
It now runs ADC at 1.1Msamp/s without missed samples
 

Attachments

  • main.cpp
    10.3 KB · Views: 6
Last edited:
See SdFat example TeensySdioLogger for an example with RingBuf, is_busy(), and 512-byte writes.
Good non-ISR examples using `writeOut()`, but currently working on the ISR version from RingBuf.h and TeensyDmaAdcLogger.ino
Code:
* This ring buffer may be used in ISRs. Use beginISR(), endISR(), write()
 * and print() in the ISR and use writeOut() in non-interrupt code
 * to write data to a file.

I can't get my head around what write() is actually doing, and how to enforce 512 & is_busy() as used by the ISR function in main.cpp:
Code:
static void isr() {
  if (!overrun) {
    // Calculate pointer to the half that was filled by DMA.
    size_t halfSamples = BUF_BLOCK_SIZE / sizeof(uint16_t);
    uint16_t* ptr = dmaBuf + (dmaCount & 1) * halfSamples;
    // Clear cache for buffer filled by DMA to insure read from DMA memory.
    arm_dcache_delete((void*)ptr, BUF_BLOCK_SIZE);
    // Enable RingBuf functions to be called in ISR.
    rb.beginISR();
    if (rb.write(ptr, BUF_BLOCK_SIZE) == BUF_BLOCK_SIZE) {
      dmaCount++;
      if (rb.bytesUsed() > maxBytesUsed) {
        maxBytesUsed = rb.bytesUsed();
      }
    } else {
      overrun = true;
    }
    // End use of RingBuf functions in ISR.
    rb.endISR();
  }

main.cpp is running error-free for now, but I could redesign to not write in the ISR as per the TeensySdioLogger I suppose if needed.
 
Last edited:
Back
Top