Fastest DAC speed possible for Teensy 3.1 using Arduino?

Status
Not open for further replies.

Windfreak

Member
Teensy 3.1 Noob here.
For one of my projects I need as much speed out of the DAC as possible.
What is the maximum sample rate, both normally clocked and overclocked to 96 MHz, using Arduino?
Thanks!
 
For one of my projects I need as much speed out of the DAC as possible.

The main limit to the DAC speed involves the capacitance its analog hardware must drive. Software or DMA can write to the DAC much faster than the analog hardware is capable of settling, so the analog circuitry is the main limiting factor, not the CPU speed.

The DAC has 2 modes: fast and low power. Make sure you use it in fast mode, which draws about 0.5 mA more than low power mode.

What is the maximum sample rate, both normally clocked and overclocked to 96 MHz, using Arduino?

Freescale's datasheet says the typical settling time is 15 us (worst case 30 us) for settling all the way to 1 LSB accuracy.

But that spec is with the worst case 100 pF load. They don't give any other official specs, but the datasheet does say:

2. A small load capacitance (47 pF) can improve the bandwidth performance of the DAC

Since I'm also curious about this, I tried a quick test with 44.1 kHz sample rate, using this code:

Code:
#include <Audio.h>
#include <Wire.h>
#include <SD.h>
#include <SPI.h>

AudioSynthWaveform  osc;
AudioOutputAnalog   audioOutput;
AudioConnection c1(osc, audioOutput);

void setup() {
  AudioMemory(6);
  osc.begin(0.95, 1000, TONE_TYPE_SINE);  
  audioOutput.analogReference(INTERNAL);
}

void loop() {
}

Here's the DAC output with only a 10X (approx 9pF capacitance) scope probe connected to the PCB pin.

dac_noload.png

This is with the scope's horizontal zoom turned on, so you're seeing the same waveform twice, where the bottom is zoomed in to 10us per division.

Then I connected a 100 pF capacitor. The waveform looks a little noisy, probably because I was holding it in place with my finger, so this not-very-well-conducted test also featured me capacitively coupling to the circuit, and whatever capacitance/impedance/noise-pickup my hand might add...

dac_100pf.png

Obviously the DAC's output speed becomes a little slower, but nothing anywhere near the 15 us spec from Freescale's datasheet. Maybe their spec is extremely conservative? Or maybe slew rate and bandwidth become a bigger issue for a full-scale step? Admittedly, this very quick measurement only makes a change that's about 5% of the full scale range.

To test larger step performance, I edited the code for a 10 kHz waveform, so each output at 44.1 kHz will be approx 25% of the full scale range.

dac_10kHz_100pf.png

So even with large steps and a 100 pF load, the DAC seems capable of settling in just a few us. Of course, that might not be to full 12 bit resolution, since we can't see nearly that high a res on this simple scope measurement.


Hopefully these quick-and-dirty scope measurements give you a rough idea of the DAC's actual performance... better than just reading Freescale's very conservative datasheet specs.

On the software side, DMA is the best way to sustain high throughput to the DAC. Of course, it depends on having buffers filled with samples ahead of time. The audio library already has working code designed for 44.1 kHz output. If you want faster, maybe that can at least be a good starting point.

If you do use the DAC much faster, I hope you'll share your results and code?
 
Hi Paul. Thanks for that nice response. It looks very good. It may take me a few weeks to post some code and results. Ideally I will get the sample rate up much higher, hopefully to at least a couple hundred kHz.
 
Depending on what the DAC will be driving, you might want to consider using a high speed opamp to buffer the signal, just to be on the safe side.
 
The main limit to the DAC speed involves the capacitance its analog hardware must drive. Software or DMA can write to the DAC much faster than the analog hardware is capable of settling, so the analog circuitry is the main limiting factor, not the CPU speed.

If you are using CPU-intensive functions like floating point operations with sin(...) for creating the signal, then the CPU does indeed seem to be too slow.

By precomputing the integer values of the sine function, I got this 40 kHz signal:

scope.jpg

No chance doing that with the standard sin(...) function.

Here is the code:

Code:
IntervalTimer timer0;
void setup() {
  analogWriteResolution(10);
  timer0.begin(timer0_callback, 1.25); // every 1.25 us: 20 samples x 1.25 us = 25 us period = 40 kHz
  pinMode(A14,OUTPUT);
  analogWrite(A14,0);
}

volatile int16_t t = 0;

//http://www.wolframalpha.com/input/?i=table+round%28100%2B412*%28sin%282*pi*t%2F20%29%2B1%29%29+from+0+to+19
int16_t sine_data[20] = {512, 639, 754, 845, 904, 924, 904, 845, 754, 639, 512, 385, 270, 179, 120, 100, 120, 179, 270, 385};

void timer0_callback() {  
  analogWrite(A14,sine_data[t]); 
  t=t+1;
  if (t >= 20) {
   t=0; 
  }
}

void loop() {
  
}
 
If you are using CPU-intensive functions like floating point operations with sin(...) for creating the signal, then the CPU does indeed seem to be too slow.

That isn't terribly surprising, given the hardware has no support for floating point, and all floating point operations have to be emulated. If you use the float data type and the float versions of the math functions (e.g. sinf, cosf), it should hopefully be a bit faster. Paul has dropped hints that the next Teensy 3.x will have hardware support for float, but not double, but since that hasn't been announced yet, it doesn't help you right now.
 
That isn't terribly surprising, given the hardware has no support for floating point, and all floating point operations have to be emulated. If you use the float data type and the float versions of the math functions (e.g. sinf, cosf), it should hopefully be a bit faster. Paul has dropped hints that the next Teensy 3.x will have hardware support for float, but not double, but since that hasn't been announced yet, it doesn't help you right now.

Cortex M4 (Teensy 3.x) has a FPU. So maybe this is a compiler optimization problem?
 
Looks like 40 kHz to me.

But an interrupt every 1.25 us is a lot of CPU overhead. DMA is far more efficient for fast sample rates.
 
If you do use the DAC much faster, I hope you'll share your results and code?

K66 DAC settle time (data sheet says 15 us)
Just for the record, with only the scope probe connected to A21 on T3.6, running a simple square wave
Code:
  while (1) {
    analogWrite(A21, 0);
    delayMicroseconds(1);
    analogWrite(A21, 255);
    delayMicroseconds(1);
  }
I get the following
settle.png
Rise time of 540 ns. (By comparison, digitalWriteFast() toggle of digital pin has < 40 ns rise time)

If I run the sine sketch from earlier in this thread on a T3.6 @ 180 MHz with float and sinf() and no delay, I get a 1.3 kHz sine wave (315 sample points, so 2.44 us per sample). If I run a DAC/DMA/PDB sine table (128 sample points) with the PDB @ 1 MHz, on the scope I get a nice sine wave @ 7.81 kHz (1 us per sample).

Code:
// DMA output to DAC clocked by PDB
// https://forum.pjrc.com/threads/28101-Using-the-DAC-with-DMA-on-Teensy-3-1

#include <DMAChannel.h>

// PDB ticks at F_BUS
#define PDB_PERIOD (60-1)
#define PDB_CONFIG (PDB_SC_TRGSEL(15) | PDB_SC_PDBEN | PDB_SC_CONT | PDB_SC_PDBIE | PDB_SC_DMAEN)

DMAChannel dma(false);

static volatile uint16_t sinetable[] = {
  2047,    2147,    2248,    2348,    2447,    2545,    2642,    2737,
  2831,    2923,    3012,    3100,    3185,    3267,    3346,    3422,
  3495,    3564,    3630,    3692,    3750,    3804,    3853,    3898,
  3939,    3975,    4007,    4034,    4056,    4073,    4085,    4093,
  4095,    4093,    4085,    4073,    4056,    4034,    4007,    3975,
  3939,    3898,    3853,    3804,    3750,    3692,    3630,    3564,
  3495,    3422,    3346,    3267,    3185,    3100,    3012,    2923,
  2831,    2737,    2642,    2545,    2447,    2348,    2248,    2147,
  2047,    1948,    1847,    1747,    1648,    1550,    1453,    1358,
  1264,    1172,    1083,     995,     910,     828,     749,     673,
  600,     531,     465,     403,     345,     291,     242,     197,
  156,     120,      88,      61,      39,      22,      10,       2,
  0,       2,      10,      22,      39,      61,      88,     120,
  156,     197,     242,     291,     345,     403,     465,     531,
  600,     673,     749,     828,     910,     995,    1083,    1172,
  1264,    1358,    1453,    1550,    1648,    1747,    1847,    1948,
};

void setup() {
  dma.begin(true); // allocate the DMA channel first

  SIM_SCGC2 |= SIM_SCGC2_DAC0; // enable DAC clock
  DAC0_C0 = DAC_C0_DACEN | DAC_C0_DACRFS; // enable the DAC module, 3.3V reference
  // slowly ramp up to DC voltage, approx 1/4 second
  for (int16_t i = 0; i < 2048; i += 8) {
    *(int16_t *)&(DAC0_DAT0L) = i;
    delay(1);
  }

  // set the programmable delay block to trigger DMA requests
  SIM_SCGC6 |= SIM_SCGC6_PDB; // enable PDB clock
  PDB0_IDLY = 0; // interrupt delay register
  PDB0_MOD = PDB_PERIOD; // modulus register, sets period
  PDB0_SC = PDB_CONFIG | PDB_SC_LDOK; // load registers from buffers
  PDB0_SC = PDB_CONFIG | PDB_SC_SWTRIG; // reset and restart
  PDB0_CH0C1 = 0x0101; // channel n control register?

  dma.sourceBuffer(sinetable, sizeof(sinetable));
  dma.destination(*(volatile uint16_t *) & (DAC0_DAT0L));
  dma.triggerAtHardwareEvent(DMAMUX_SOURCE_PDB);
  dma.enable();
}

void loop() {}

Anecdotal square-wave settle times for various DACs
https://github.com/manitou48/DUEZoo/blob/master/dac.txt
 
I know this is very old, but if you are trying to squeeze a few more cycles out of the Teensy, I reduced the time taken to set a DAC value on a Teensy 3.6 by moving the lines that enable the DAC clock and set up the voltage reference into setup(). Careful: this relies on nothing else in your code changing the DAC voltage reference.

I copied this from the teensy core code, then moved the commented lines to setup.

Code:
inline void writeDAC1(int val) { // modified from PJRC "analogwriteDAC#" in the Teensy core's analog.c
  //SIM_SCGC2 |= SIM_SCGC2_DAC1;//moved to setup, seems ok so far. saved about 50 cycles!
  //DAC1_C0 = DAC_C0_DACEN | DAC_C0_DACRFS; // 3.3V VDDA is DACREF_2
  __asm__ ("usat    %[value], #12, %[value]\n\t" : [value] "+r" (val));  // 0 <= val <= 4095
  *(int16_t *)&(DAC1_DAT0L) = val;
}

inline void writeDAC0(int val) {
  //  SIM_SCGC2 |= SIM_SCGC2_DAC0;
  //  DAC0_C0 = DAC_C0_DACEN | DAC_C0_DACRFS; // 3.3V VDDA is DACREF_2
  __asm__ ("usat    %[value], #12, %[value]\n\t" : [value] "+r" (val));  // 0 <= val <= 4095
  *(int16_t *)&(DAC0_DAT0L) = val;
}
 