Fast streaming USB data through Teensy DAC

Dizzixx

Member
I should probably preface this by stating that I am a relative amateur when it comes to microcontrollers and low-level programming, so bear with me.

I am trying to stream data from the computer to some kind of DAC, and I would like to be able to do 4 channels at 1 Msps for 8 seconds. This is to reproduce data that was recorded at 10 kHz. I would like to be able to oversample as much as possible. 400 ksps would probably be plenty.

I have considered a more complicated setup that adds external memory over SPI or something similar, but I would need a fair bit of it and was unsure that I would be successful.

Instead I have arrived at trying to split the data between two computer USB ports and stream to two Teensys simultaneously. This should provide roughly 24 Mbit/s of usable bandwidth, which, sending only the 12 bits needed to run the DAC with minimal additional communication overhead, should make ~500 ksps/channel possible.
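
The arithmetic behind that estimate can be sketched as follows. This is a rough check only: it takes the 24 Mbit/s figure at face value and assumes zero protocol overhead, and the function names are made up for illustration.

```cpp
#include <cstdint>

// Samples per second available to each channel when every sample costs
// bits_per_sample bits on the wire and the link is shared by `channels`.
// Assumes the stated bandwidth is fully usable (no packet overhead).
constexpr uint64_t samples_per_sec_per_channel(uint64_t usable_bits_per_sec,
                                               uint32_t bits_per_sample,
                                               uint32_t channels) {
    return usable_bits_per_sec / bits_per_sample / channels;
}

// Total payload in bytes for a playback of `seconds` seconds.
constexpr uint64_t payload_bytes(uint64_t samples_per_sec, uint32_t channels,
                                 uint32_t bits_per_sample, uint32_t seconds) {
    return samples_per_sec * channels * bits_per_sample * seconds / 8;
}
```

With 24,000,000 bit/s, 12 bits per sample and 4 channels this gives 500,000 samples per second per channel. Eight seconds at that rate is 24,000,000 bytes of packed data, and the original 1 Msps target would need twice that.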

I have a rough solution which makes use of elapsedMicros but I can see the time jitter on my scope while the serial is active. Once the serial is done and the loop settles to just running through the buffer it becomes very stable. But the frequency I am reading on the scope versus the projected one is consistently off. The measured frequency is lower. I can adjust timing on either end and get close.
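
One plausible contributor to the lower-than-projected frequency (an assumption, not something measured in this thread) is that resetting the timer to zero after each update throws away any overshoot, so each period becomes the nominal period plus however late the check ran. The host-side simulation below illustrates the effect with a hypothetical fixed cost per pass through loop(). The function name and numbers are invented for the example.

```cpp
#include <cstdint>

// Counts simulated DAC updates over total_us microseconds.
// loop_us is a hypothetical fixed cost of one pass through loop().
// carry_remainder == false mimics "usec = 0" after each update;
// carry_remainder == true mimics subtracting the period instead.
constexpr uint32_t count_updates(uint32_t period_us, uint32_t loop_us,
                                 uint32_t total_us, bool carry_remainder) {
    uint32_t elapsed = 0;
    uint32_t updates = 0;
    for (uint32_t now = loop_us; now <= total_us; now += loop_us) {
        elapsed += loop_us;                 // time passed during this pass
        if (elapsed >= period_us) {
            ++updates;                      // the DAC would be written here
            elapsed = carry_remainder ? elapsed - period_us : 0;
        }
    }
    return updates;
}
```

With a 5 us period and a 3 us loop pass over 3000 us, zeroing the timer yields 500 updates (an effective 6 us period) while carrying the remainder yields the expected 600. On the Teensy the equivalent change would be something like `usec -= numMicros;` in place of `usec = 0;`, though this only fixes the average rate and does nothing for the jitter.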

I was wondering if someone could point me in the direction of an example making use of DMA and PDB to stream serial data to the DAC. I imagine the audio library might have something close to this but I am unsure where to look.


Code:
//PYTHON CODE COMPRESSED=TRUE

//microsecond elapsed time timer
elapsedMicros usec = 0;
//number of microseconds between DAC update
const unsigned int numMicros=5;

//for processing of incoming serial data
unsigned int commandBytes=0;//the command bytes value read from the serial buffer 
unsigned int command_data = 0xFFFF;//Data command
//masks to split 3x bytes into 2x 12 bit values
byte splitByteMask_1 = 0xF0; //upper mask of split byte
byte splitByteMask_2 = 0x0F; //lower mask of split byte

const unsigned int num_buffered_values = 24000;
const unsigned int DAC_watermark = 12000;

//RAM buffers
unsigned int buffer_DAC_1[num_buffered_values];
unsigned int buffer_DAC_2[num_buffered_values];
unsigned int *DAC_buffers[2] = {buffer_DAC_1,buffer_DAC_2};
//Array of  IDX's
unsigned int buffer_DAC_write_idxs[2] = {0,0};
unsigned int buffer_DAC_read_idxs[2] = {0,0};
//Flag to turn DAC output on
bool DAC_OUTPUT_FLAG = false;

//temporary containers while reading the serial buffer
unsigned int BYTE_1=0;
unsigned int BYTE_2=0;    
unsigned int BYTE_3=0;
unsigned int tmpVal_1=0;
unsigned int tmpVal_2=0;
unsigned int tmpVals[2];

void splitBytes(byte BYTE1, byte BYTE2, byte BYTE3){
  //Convert from 3 bytes to 2x 12 bit values done to conserve data transfer overhead
  //12-bit Value 1
  tmpVals[0] = BYTE1;
  tmpVals[0] <<= 4;
  tmpVals[0] |=  ((BYTE2 & splitByteMask_1)>>4); //mask to get the upper nibble of BYTE2 then shift down to LSB and combine to find Val1
  //12-bit Value 2
  tmpVals[1] = (BYTE2 & splitByteMask_2);
  tmpVals[1] <<= 8;
  tmpVals[1] |= BYTE3;
}

void setup() {
  analogWriteResolution(12);
  // put your setup code here, to run once:
  Serial.begin(9600); //start serial listening (Note that teensy ignores Baud and always runs at High speed 12Mbit/s)

  // Initially fill the memory buffers with all 0's
  for(int idx=0; idx<num_buffered_values;idx++){
    buffer_DAC_1[idx]=0;
    buffer_DAC_2[idx]=0;
  }
}

void loop() {
  // put your main code here, to run repeatedly:
  if(Serial.available()>=50){//A serial packet is 50 bytes: 2 command bytes followed by 48 data bytes (32 12-bit values). Using >= so a backlog of more than one packet is still processed
    //Read in the first two bytes as the commandBytes
    commandBytes = Serial.read();//read in the first 8 most significant bits
    commandBytes = commandBytes<<8; //shift 8 bits left
    commandBytes |= Serial.read();//fill in the 8 least significant bits
    //if this is data
    if(commandBytes==command_data){
      //Read 24 bytes of data for each DAC channel to memory
      for(int DAC_channel=0;DAC_channel<2;DAC_channel++){
        //for this channel perform 8x 3 byte reads (24 bytes total)
        for(int idx=0;idx<8;idx++){
          //Read 3 bytes containing 2x 12-bit values for DAC
          BYTE_1 = Serial.read();
          BYTE_2 = Serial.read();
          BYTE_3 = Serial.read();         
          splitBytes(BYTE_1, BYTE_2, BYTE_3);    
          //Write to the memory buffer
          DAC_buffers[DAC_channel][buffer_DAC_write_idxs[DAC_channel]]=tmpVals[0];
          buffer_DAC_write_idxs[DAC_channel]+=1;
          DAC_buffers[DAC_channel][buffer_DAC_write_idxs[DAC_channel]]=tmpVals[1];
          buffer_DAC_write_idxs[DAC_channel]+=1;
        }
      }
      //If the RAM buffer is full then restart at zero index
      if(buffer_DAC_write_idxs[0]==num_buffered_values){
        buffer_DAC_write_idxs[0]=0;
      }
      if(buffer_DAC_write_idxs[1]==num_buffered_values){
        buffer_DAC_write_idxs[1]=0;
      }   
    }
    //Otherwise clear these bytes
    else{
      BYTE_1 = Serial.read();
      BYTE_2 = Serial.read();
      BYTE_3 = Serial.read();       
      splitBytes(BYTE_1, BYTE_2, BYTE_3);
      Serial.println("As read:");
      Serial.println(BYTE_1);
      Serial.println(BYTE_2);
      Serial.println(BYTE_3);
      Serial.println("De-compressed:");
      Serial.println(tmpVals[0]);
      Serial.println(tmpVals[1]);
      for(int idx=0;idx<45;idx++){
        Serial.read();
      }
      Serial.println("!!SERIAL CLEARED!!");
    }
  }

  //If the DAC is not already running
  if(!DAC_OUTPUT_FLAG){
    //Once reaching the watermark in RAM buffers then start the DAC
    if(buffer_DAC_write_idxs[0]==DAC_watermark && buffer_DAC_write_idxs[1]==DAC_watermark){
      DAC_OUTPUT_FLAG=true;
      usec = 0;
      analogWrite(A22, 0);
      analogWrite(A21, 0);
    }    
  }
  //If one time step has elapsed then update the DAC
  if(usec >= numMicros){
      analogWrite(A22, DAC_buffers[0][buffer_DAC_read_idxs[0]]);
      analogWrite(A21, DAC_buffers[1][buffer_DAC_read_idxs[1]]);
      usec = 0;
      buffer_DAC_read_idxs[0]+=1;
      buffer_DAC_read_idxs[1]+=1;
      //If at the end of the RAM buffer then wrap around to zero idx
      if(buffer_DAC_read_idxs[0]==num_buffered_values){
        buffer_DAC_read_idxs[0]=0;
      }
      if(buffer_DAC_read_idxs[1]==num_buffered_values){
        buffer_DAC_read_idxs[1]=0;
      }
          
  }
 
}
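
For reference, the 3-byte packing that splitBytes() undoes can be written as a matched pair of functions and checked on the PC side. This is a minimal sketch: the struct and function names are made up here, and the byte layout is simply the one implied by the masks and shifts in splitBytes() above.

```cpp
#include <cstdint>

struct Packed3 { uint8_t b0, b1, b2; };        // three bytes on the wire
struct Pair12  { uint16_t first, second; };    // two 12-bit DAC values

// Pack two 12-bit values into three bytes:
// b0 = first[11:4], b1 = first[3:0] : second[11:8], b2 = second[7:0]
constexpr Packed3 pack12(uint16_t first, uint16_t second) {
    return Packed3{
        static_cast<uint8_t>((first & 0x0FFF) >> 4),
        static_cast<uint8_t>(((first & 0x000F) << 4) | ((second & 0x0FFF) >> 8)),
        static_cast<uint8_t>(second & 0x00FF)};
}

// Inverse of pack12, equivalent to what splitBytes() computes.
constexpr Pair12 unpack12(Packed3 p) {
    return Pair12{
        static_cast<uint16_t>((p.b0 << 4) | (p.b1 >> 4)),
        static_cast<uint16_t>(((p.b1 & 0x0F) << 8) | p.b2)};
}
```

For example, the values 0xABC and 0x123 pack to the bytes 0xAB, 0xC1, 0x23 and unpack back to the same two values.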
 
Thanks

Thanks Paul. I know you have done a lot of work to make things as friendly, bulletproof, and usable as possible. I looked at the audio library and I think I get the gist, but there is still a bunch of stuff where I don't know exactly what it is doing. Additionally, I do not see anything referring to the second channel, DAC1.

I have been messing around with the basic example at https://forum.pjrc.com/threads/28101-Using-the-DAC-with-DMA-on-Teensy-3-1 and have been able to get it to work. I then extended this to the second channel on DAC1 but was surprised to find that DAC1 seems to want to update at 2x the frequency of DAC0. Any idea why that might be? It is probably obvious to the initiated but I have no idea.

Code:
#include <DMAChannel.h>
#include "kinetis.h"

#define PDB_PERIOD (47) // Teensy 3.6 with 60 MHz F_Bus, (F_CPU = 180 MHz ) / 3
#define PDB_CONFIG (PDB_SC_TRGSEL(15) | PDB_SC_PDBEN | PDB_SC_CONT | PDB_SC_PDBIE | PDB_SC_DMAEN)

DMAChannel dma1(false);
DMAChannel dma2(false);


volatile uint16_t sinetable1[] = {
   2047,    2147,    2248,    2348,    2447,    2545,    2642,    2737,
   2831,    2923,    3012,    3100,    3185,    3267,    3346,    3422,
   3495,    3564,    3630,    3692,    3750,    3804,    3853,    3898,
   3939,    3975,    4007,    4034,    4056,    4073,    4085,    4093,
   4095,    4093,    4085,    4073,    4056,    4034,    4007,    3975,
   3939,    3898,    3853,    3804,    3750,    3692,    3630,    3564,
   3495,    3422,    3346,    3267,    3185,    3100,    3012,    2923,
   2831,    2737,    2642,    2545,    2447,    2348,    2248,    2147,
   2047,    1948,    1847,    1747,    1648,    1550,    1453,    1358,
   1264,    1172,    1083,     995,     910,     828,     749,     673,
    600,     531,     465,     403,     345,     291,     242,     197,
    156,     120,      88,      61,      39,      22,      10,       2,
      0,       2,      10,      22,      39,      61,      88,     120,
    156,     197,     242,     291,     345,     403,     465,     531,
    600,     673,     749,     828,     910,     995,    1083,    1172,
   1264,    1358,    1453,    1550,    1648,    1747,    1847,    1948,
};

volatile uint16_t sinetable2[] = {
   2047,    2147,    2248,    2348,    2447,    2545,    2642,    2737,
   2831,    2923,    3012,    3100,    3185,    3267,    3346,    3422,
   3495,    3564,    3630,    3692,    3750,    3804,    3853,    3898,
   3939,    3975,    4007,    4034,    4056,    4073,    4085,    4093,
   4095,    4093,    4085,    4073,    4056,    4034,    4007,    3975,
   3939,    3898,    3853,    3804,    3750,    3692,    3630,    3564,
   3495,    3422,    3346,    3267,    3185,    3100,    3012,    2923,
   2831,    2737,    2642,    2545,    2447,    2348,    2248,    2147,
   2047,    1948,    1847,    1747,    1648,    1550,    1453,    1358,
   1264,    1172,    1083,     995,     910,     828,     749,     673,
    600,     531,     465,     403,     345,     291,     242,     197,
    156,     120,      88,      61,      39,      22,      10,       2,
      0,       2,      10,      22,      39,      61,      88,     120,
    156,     197,     242,     291,     345,     403,     465,     531,
    600,     673,     749,     828,     910,     995,    1083,    1172,
   1264,    1358,    1453,    1550,    1648,    1747,    1847,    1948,
};

void setup() {
  dma1.begin(true); // allocate the DMA channel first
  dma2.begin(true); // allocate the DMA channel first
  
  SIM_SCGC2 |= SIM_SCGC2_DAC0; // enable DAC clock
  SIM_SCGC2 |= SIM_SCGC2_DAC1; // enable DAC clock
  
  DAC0_C0 = DAC_C0_DACEN | DAC_C0_DACRFS; // enable the DAC module, 3.3V reference
  DAC1_C0 = DAC_C0_DACEN | DAC_C0_DACRFS; // enable the DAC module, 3.3V reference
  
  // slowly ramp up to DC voltage, approx 1/4 second
  for (int16_t i=0; i<2048; i+=8) {
    *(int16_t *)&(DAC0_DAT0L) = i;
    *(int16_t *)&(DAC1_DAT0L) = i;
    delay(1);
  }
  
  // set the programmable delay block to trigger DMA requests
  SIM_SCGC6 |= SIM_SCGC6_PDB; // enable PDB clock
  PDB0_IDLY = 0; // interrupt delay register
  PDB0_MOD = PDB_PERIOD; // modulus register, sets period
  PDB0_SC = PDB_CONFIG | PDB_SC_LDOK; // load registers from buffers
  PDB0_SC = PDB_CONFIG | PDB_SC_SWTRIG; // reset and restart
  PDB0_CH0C1 = 0x0101; // channel n control register?
  
  dma1.sourceBuffer(sinetable1, sizeof(sinetable1));
  dma1.destination(*(volatile uint16_t *)&(DAC0_DAT0L));
  dma1.triggerAtHardwareEvent(DMAMUX_SOURCE_PDB);
  dma1.enable();

  dma2.sourceBuffer(sinetable2, sizeof(sinetable2));
  dma2.destination(*(volatile uint16_t *)&(DAC1_DAT0L));
  dma2.triggerAtHardwareEvent(DMAMUX_SOURCE_PDB);
  dma2.enable();
  
}

void loop() {
  // do nothing here
}

Anything stand out to anyone?

I would ultimately like to replicate the well explained example at https://hackaday.io/project/12543-s.../41575-dac-with-dma-and-buffer-on-a-teensy-32 (well explained but still difficult for a newb to swallow) for 2 channels where the buffer is filled by serial data.

Thanks again for taking the time to reply.
 
With two DAC, does the sound have correct frequencies?
Did you try to use only DAC1 (no DAC0)? PDB speed should be normal.
My theory is that having two DAC driven by the same PDB you have to double the PDB trigger rate to achieve the desired DAC sampling rate.
 
My theory is that having two DAC driven by the same PDB you have to double the PDB trigger rate to achieve the desired DAC sampling rate.
No. DMA triggering doesn't work that way.

You can only use a DMA trigger once. When you use it more than once, unpredictable things happen. There is an easy to miss warning in the Kinetis manuals. "20.3.1 Channel Configuration register (DMAMUX_CHCFGn):"
Setting multiple CHCFG registers with the same Source value will result in unpredictable behavior.

You could either use linked DMA channels (dma.triggerAtTransfersOf()) or use a minor loop, like the audio library stereo output:
https://github.com/PaulStoffregen/Audio/blob/master/output_dacs.cpp

However, doing a DMA transfer like that is not the best way to go. As long as the DAC output is the only DMA transfer happening, it will have very low jitter (a few CPU cycles). However, if there are other DMA transfers they can block the DAC transfer (potentially dozens of microseconds).

The DACs have input FIFOs and loading a new output value can be triggered at precise intervals by the PDB. The FIFO can generate DMA requests.
 
With two DAC, does the sound have correct frequencies?
Did you try to use only DAC1 (no DAC0)? PDB speed should be normal.
My theory is that having two DAC driven by the same PDB you have to double the PDB trigger rate to achieve the desired DAC sampling rate.

As uploaded, the PDB is set to update the DAC value every 47 ticks of F_BUS (which for my Teensy 3.6 at 180 MHz is, I believe, 60 MHz). For 128 samples per output sine-wave cycle this should produce (60 M ticks/second) / (47 ticks/sample * 128 samples/cycle) = 9973.404 cycles/sec ~ 10 kHz.

On my scope (which is a cheap kit build) I read 19.54 kHz on pin A22 (DAC1) and 9.77 kHz on pin A21 (DAC0) pretty much exactly double the frequency. This relationship where DAC1 is double the rate of DAC0 holds true regardless of PDB_PERIOD.

For example doing the same thing for a PDB_PERIOD of 128,000 I get 7.5 and 15 Hz which again is exactly double (ignore for a moment that we would expect the output to be 3.66 Hz...)

This relationship holds true regardless of what I set PDB_PERIOD to and would lead me to believe that for some reason DAC1 is being updated when DAC0 is updated and then again on its own, but I do not understand what would cause this, any ideas?

Code:
#include <DMAChannel.h>
#include "kinetis.h"

#define PDB_PERIOD (128000) // Teensy 3.6 with 60 MHz F_Bus, (F_CPU = 180 MHz ) / 3
#define PDB_CONFIG (PDB_SC_TRGSEL(15) | PDB_SC_PDBEN | PDB_SC_CONT | PDB_SC_PDBIE | PDB_SC_DMAEN)

DMAChannel dma2(false);

volatile uint16_t sinetable2[] = {
   2047,    2147,    2248,    2348,    2447,    2545,    2642,    2737,
   2831,    2923,    3012,    3100,    3185,    3267,    3346,    3422,
   3495,    3564,    3630,    3692,    3750,    3804,    3853,    3898,
   3939,    3975,    4007,    4034,    4056,    4073,    4085,    4093,
   4095,    4093,    4085,    4073,    4056,    4034,    4007,    3975,
   3939,    3898,    3853,    3804,    3750,    3692,    3630,    3564,
   3495,    3422,    3346,    3267,    3185,    3100,    3012,    2923,
   2831,    2737,    2642,    2545,    2447,    2348,    2248,    2147,
   2047,    1948,    1847,    1747,    1648,    1550,    1453,    1358,
   1264,    1172,    1083,     995,     910,     828,     749,     673,
    600,     531,     465,     403,     345,     291,     242,     197,
    156,     120,      88,      61,      39,      22,      10,       2,
      0,       2,      10,      22,      39,      61,      88,     120,
    156,     197,     242,     291,     345,     403,     465,     531,
    600,     673,     749,     828,     910,     995,    1083,    1172,
   1264,    1358,    1453,    1550,    1648,    1747,    1847,    1948,
};

void setup() {
  dma2.begin(true); // allocate the DMA channel first
  
  SIM_SCGC2 |= SIM_SCGC2_DAC1; // enable DAC clock
  
  DAC1_C0 = DAC_C0_DACEN | DAC_C0_DACRFS; // enable the DAC module, 3.3V reference
  
  // slowly ramp up to DC voltage, approx 1/4 second
  for (int16_t i=0; i<2048; i+=8) {
    *(int16_t *)&(DAC1_DAT0L) = i;
    delay(1);
  }
  
  // set the programmable delay block to trigger DMA requests
  SIM_SCGC6 |= SIM_SCGC6_PDB; // enable PDB clock
  PDB0_IDLY = 0; // interrupt delay register
  PDB0_MOD = PDB_PERIOD; // modulus register, sets period
  PDB0_SC = PDB_CONFIG | PDB_SC_LDOK; // load registers from buffers
  PDB0_SC = PDB_CONFIG | PDB_SC_SWTRIG; // reset and restart
  PDB0_CH0C1 = 0x0101; // channel n control register?
  
  dma2.sourceBuffer(sinetable2, sizeof(sinetable2));
  dma2.destination(*(volatile uint16_t *)&(DAC1_DAT0L));
  dma2.triggerAtHardwareEvent(DMAMUX_SOURCE_PDB);
  dma2.enable();
  
}

void loop() {
  // do nothing here
}

If I drive just DAC0 or DAC1 as above with a period of 128,000 ticks, I get 7.5 Hz regardless of which DAC I use. The same holds for a period of 47 ticks: I get 9.77 kHz on each DAC independently.

If someone can explain how to use the proven audio library at these rates and do two channels I would be happy to try that.
 
Just as a note about the low frequencies being off: the output appears to be spot on (error around 0.01%) between 7.19 and 78.125 Hz (PDB_PERIOD values of 65,188 and 6,000 ticks respectively), but frequencies below this are off wildly.

I believe there is an overflow error. And sure enough, if I go one tick past 65535 to 65536 I get crazy numbers. I think for some reason that PDB_PERIOD is defined as a 16 bit value (I am not very familiar with C so I don't know how easy it is to redefine it to be 32 bits) while PDB0_MOD is a 32 bit register. I don't actually need frequencies this low, but I mention it in case anyone else runs into this.
 
No. DMA triggering doesn't work that way.

You can only use a DMA trigger once. When you use it more than once, unpredictable things happen. There is an easy to miss warning in the Kinetis manuals. "20.3.1 Channel Configuration register (DMAMUX_CHCFGn):"
Setting multiple CHCFG registers with the same Source value will result in unpredictable behavior.

OK, my theory is not a valid one. In fact, now that you say it, I recall that I was reading this note.
 
Just as a note about the low frequencies being off. It appears to be spot on (error around .01%) between 7.19 and 78.125 Hz (PDB_PERIOD's of 65,188 and 6,000 ticks each) but frequencies below this are off wildly.

I believe there is an overflow error. And sure enough if I go one tick past 65535 to 65536 I get crazy numbers. I think for some reason that PDB_PERIOD is defined as a 16 bit value (I am not very famillar with C so I dont know how easy it is to re-define it to be 32 bits) while PDB0_MOD is a 32 bit register.
Read the manual, "35.3.2 Modulus Register (PDBx_MOD)". The PDB mod is 16 bits.

PDBx_MOD field descriptions
31–16 This field is reserved.
15–0 PDB Modulus
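
Combining the 16-bit field width with the formula quoted further down in this thread (out frequency == F_BUS / PRESCALER / (PDB0_MOD + 1)) seems to account for the scope readings reported above. The helper below is a host-side sketch with an invented name. It assumes the upper 16 bits of the written value are simply dropped, which matches the field description but has not been verified on hardware here.

```cpp
#include <cstdint>

// Waveform frequency in Hz for a lookup table of samples_per_cycle entries,
// assuming only the low 16 bits of the requested modulus take effect and
// that one sample is output every (MOD + 1) prescaled bus ticks.
constexpr double pdb_waveform_hz(double f_bus_hz, uint32_t prescaler,
                                 uint32_t requested_mod,
                                 uint32_t samples_per_cycle) {
    return f_bus_hz / prescaler / ((requested_mod & 0xFFFFu) + 1u)
           / samples_per_cycle;
}
```

With F_BUS = 60 MHz, prescaler 1 and the 128-entry sine table, a modulus of 47 gives 9765.625 Hz, in line with the 9.77 kHz measurement rather than the 9973 Hz computed from 47 ticks. A requested value of 128000 truncates to 62464 and gives about 7.50 Hz, in line with the 7.5 Hz reading.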
 
The DACs have input FIFOs and loading a new output value can be triggered at precise intervals by the PDB. The FIFO can generate DMA requests.

I've personally never managed to get the DAC FIFO stuff to work properly.

You could either use linked DMA channels (dma.triggerAtTransfersOf()) or use a minor loop, like the audio library stereo output:
https://github.com/PaulStoffregen/Audio/blob/master/output_dacs.cpp

Going with the minor loop writing directly to both DACs was the simplest way.

I must admit, after getting that working I really wasn't very motivated to ever go back and figure out how to get the FIFO to work.

However, doing a DMA transfer like that is not the best way to go. As long as the DAC output is the only DMA transfer happening, it will have very low jitter (a few CPU cycles). However, if there are other DMA transfers they can block the DAC transfer (potentially dozens of microseconds).

Yes, DMA latency can cause jitter this way. But it's very unlikely to be on the scale of dozens of microseconds.

Still, using the FIFOs would be good someday. Also on the nice-to-have to very low priority list would be the 4X oversampling that's in PT8211 and maybe even 4X oversampling & filtering on the ADC inputs.
 
I should probably also point out the DAC has a certain limited analog bandwidth, no matter how fast you write to it from the digital side.

Freescale only gives a very conservative worst case spec with maximum capacitive loading. It seems to be quite a lot faster under "typical" usage. But there's no official specs. There's also good reason to suspect the output buffer of the DAC is quite sensitive to capacitive load.

If you're really needing a fast output, you might first do some tests with a fast opamp buffer and measure the actual analog rise time or toggle it very fast and try to measure the bandwidth, or just roughly guess by looking at whether the output still looks like a stairstep. Might be good to know the analog performance before you put a lot of work into optimizing the digital side.
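
If a rise time is measured this way, a common rule of thumb converts it to an approximate -3 dB bandwidth: BW ≈ 0.35 / t_r for the 10-90% rise time. This holds for a roughly single-pole response, which is an assumption here and not a documented property of the Teensy DAC output. The function name is made up for the sketch.

```cpp
// Approximate -3 dB bandwidth in Hz from a 10-90% rise time in seconds,
// using the single-pole rule of thumb BW ~= 0.35 / t_r.
constexpr double bandwidth_from_rise_time_hz(double rise_time_s) {
    return 0.35 / rise_time_s;
}
```

For instance, a measured rise time of 1 us would suggest a bandwidth of roughly 350 kHz, which gives a quick feel for whether a 1 Msps update rate is worth pursuing on the digital side.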
 
Yes, DMA latency can cause jitter this way. But it's very unlikely to be on the scale of dozens of microseconds.
It depends on how much you transfer per DMA trigger. E.g. with 16 bytes per minor loop, you may already be looking at 0.5us, a large DMA memcpy (or continuous transfer to a GPIO port) can be hundreds of us.
 
FYI: Jitter-free DAC output using the FIFO and DMA.

There are various hardware bugs with DMA and the PDB, but this actually works. The code from post #3 with the trigger changed to 'triggerAtTransfersOf()' should work.

I am trying out the jitter-free example. I do not understand where the actual sample frequency is coming from.

Code:
    SIM_SCGC6 |= SIM_SCGC6_PDB;         // enable PDB clock
    PDB0_SC |= PDB_SC_PDBEN;            // enable PDB
    PDB0_SC |= PDB_SC_TRGSEL(15);       // SW trigger
    PDB0_SC |= PDB_SC_CONT;             // run continuously

    PDB0_SC |= PDB_SC_PRESCALER(0b111); // prescaler 128

    // Prescaler multipliers other than 1x don't work correctly.
    PDB0_SC |= PDB_SC_MULT(0b0);

    // Adjust for output frequency.
    // out frequency == F_BUS / PRESCALER / (PDB0_MOD + 1)
    PDB0_MOD = 0xffff;

    // The hardware doesn't do what the manual claims. The DAC trigger counter is
    // not reset on PDB counter overflow. Things work correctly, if the PDB mod
    // is used.
    PDB0_DACINT0 = PDB0_MOD;           // trigger DAC once per PDB cycle
    PDB0_DACINTC0 = PDB_DACINTC_TOE;   // enable DAC interval trigger

    PDB0_SC |= PDB_SC_LDOK;            // sync buffered PDB registers
    PDB0_SC |= PDB_SC_SWTRIG;          // start PDB

I see where it states that it should be "out frequency == F_BUS / PRESCALER / (PDB0_MOD + 1)", but I have set everything to 1s and get nowhere near 60 Msps out. I am getting closer to 1 ksps.

I will keep poking at it and see where I get. Also I note this is very similar to code at https://hackaday.io/project/12543-s.../41575-dac-with-dma-and-buffer-on-a-teensy-32
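
Going the other way, the quoted formula can be inverted to pick a modulus for a target sample rate. The sketch below assumes the formula holds as written and that the modulus must fit in 16 bits. Both function names are hypothetical, and integer division means the achieved rate is only exact when the target divides F_BUS / PRESCALER evenly.

```cpp
#include <cstdint>

// Modulus value for a target sample rate, from
// rate == F_BUS / PRESCALER / (MOD + 1)  =>  MOD == F_BUS / PRESCALER / rate - 1.
// The caller must ensure sample_rate_hz <= f_bus_hz / prescaler.
constexpr uint32_t pdb_mod_for_rate(uint32_t f_bus_hz, uint32_t prescaler,
                                    uint32_t sample_rate_hz) {
    return f_bus_hz / prescaler / sample_rate_hz - 1u;
}

// True when the computed modulus fits the 16-bit register field.
constexpr bool pdb_mod_fits(uint32_t mod) {
    return mod <= 0xFFFFu;
}
```

At F_BUS = 60 MHz with prescaler 1, a 1 Msps target gives a modulus of 59. A 10 sps target gives 5,999,999, which does not fit in 16 bits, so a larger prescaler would be needed for rates that low.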
 
PDB0_MOD = 0xffff;
...
PDB0_DACINT0 = PDB0_MOD; // trigger DAC once per PDB cycle

It see where it states that it should be "out frequency == F_BUS / PRESCALER / (PDB0_MOD + 1)" but I have set everything to 1s and get no-where near 60Msps out. I am getting closer to 1Ksps....
The problem is, the mod register isn't synced at that point, so the wrong value is read. Make that:
uint16_t mod = ...;
PDB0_MOD = mod;
PDB0_DACINT0 = mod;

Yes, it looks like he is doing things pretty much the same way, only using 16 FIFO entries instead of 8. He should mention the FIFO, so that it turns up in a search. If I had seen his code, I wouldn't have bothered with the example...
 
Thank you tni, that solved it. I will be looking more closely at the differences between his code and yours and I will let everyone know if I see anything interesting.

For now, here is a working example of a 2-channel DAC using DMA and the FIFO. I added sine and saw wave LUTs for debugging at higher update rates, as the 1/4 full-scale step is of course too fast for the DAC at 1 Msps. I will be looking into getting a better idea of the performance per Paul's suggestions. Then I will look at streaming undersampled data via hardware serial from a PC and interpolating to the full output rate.

PS - If I am doing some terrible things in the code don't hate me I'm not a programmer and am a newb :)

Code:
/*
 * Distributed under the 2-clause BSD licence (See LICENCE.TXT file at the
 * repository root).
 *
 * Copyright (c) 2017 Tilo Nitzsche.  All rights reserved.
 *
 * https://github.com/tni/teensy-samples
 */


// Modified version of:
// -----------------------------------------------------------------------------
// 
// Example for Teensy 3.x DAC output using the FIFO.
// Precise, jitter-free timing is achieved, using the PDB to trigger
// loading of new values from the FIFO. The FIFO is filled using DMA.
//
// The FIFO is not really a FIFO (there is no support for a push or pop operation),
// but rather a ringbuffer. There is an index to the currently used entry.
//
// DMA transfers can be triggered by the FIFO index being at 0 and
// the FIFO watermark. This code uses these triggers to transfer 4 entries
// at a time. The lower half of the FIFO (entries 0 - 3) is updated when
// the FIFO index reaches 4 (the watermark), the upper half (entries 4 - 7)
// is updated when the FIFO index wraps to 0.
//
// The FIFO DMA destination is set up using the DMA modulo feature,
// wrapping the DMA destination pointer at 8 FIFO entries.
//
// While the FIFO theoretically has 16 entries, they can't be utilized since
// the watermark pointer is limited. So only 8 FIFO entries are used.
//
// -----------------------------------------------------------------------------
// 
//  Changes - 07/07/2017, Dizzixx:
//
//  - Added second DMAChannel object to drive DAC1
//  - Added helper functions for SAW and SINE waveforms for debugging
//  - Repackaged setup of DMA and PDB into helper functions for modularity
//  - Changed timing to be: output sampling rate = F_BUS/PDB_MODIFIER
//  - Moved status reporting to helper functions called from main loop
//
// -----------------------------------------------------------------------------

#include <array>
#include <DMAChannel.h>
#include "kinetis.h"

//#define FASTRUN __attribute__ ((section(".fastrun"), noinline, noclone ))


// For some reason these would not pull from kinetis.h,
// so they are re-defined below
#define DAC_SR_DACBFWMF     0x04          // Buffer Watermark Flag
#define DAC_SR_DACBFRTF     0x02          // Pointer Top Position Flag
#define DAC_SR_DACBFRBF     0x01          // Pointer Bottom Position Flag
#define PDB_DACINTC_TOE     0x01          // Interval Trigger Enable


using aliased_uint16 = uint16_t __attribute__((__may_alias__));
using aliased_uint16_vptr = volatile aliased_uint16*;

const uint16_t PDB_MODIFIER = 47;         // output sampling rate = F_BUS/PDB_MODIFIER
const uint16_t num_buffered_cycles = 1;   // Determines how many "cycles" of a waveform to store in memory

//DMAChannel objects to set DAC DMA
DMAChannel dma0;                          //dma for DAC channel 0
DMAChannel dma1;                          //dma for DAC channel 1

// 128 value 12-bit  sine LUT
uint16_t sinetable[] = {
   2047,    2147,    2248,    2348,    2447,    2545,    2642,    2737,
   2831,    2923,    3012,    3100,    3185,    3267,    3346,    3422,
   3495,    3564,    3630,    3692,    3750,    3804,    3853,    3898,
   3939,    3975,    4007,    4034,    4056,    4073,    4085,    4093,
   4095,    4093,    4085,    4073,    4056,    4034,    4007,    3975,
   3939,    3898,    3853,    3804,    3750,    3692,    3630,    3564,
   3495,    3422,    3346,    3267,    3185,    3100,    3012,    2923,
   2831,    2737,    2642,    2545,    2447,    2348,    2248,    2147,
   2047,    1948,    1847,    1747,    1648,    1550,    1453,    1358,
   1264,    1172,    1083,     995,     910,     828,     749,     673,
    600,     531,     465,     403,     345,     291,     242,     197,
    156,     120,      88,      61,      39,      22,      10,       2,
      0,       2,      10,      22,      39,      61,      88,     120,
    156,     197,     242,     291,     345,     403,     465,     531,
    600,     673,     749,     828,     910,     995,    1083,    1172,
   1264,    1358,    1453,    1550,    1648,    1747,    1847,    1948,
};

// 128 value 12-bit sawtooth LUT
uint16_t sawtable[] = {
       0,    64,   128,   192,   256,   320,   384,   448,
     512,   576,   640,   704,   768,   832,   896,   960,
    1024,  1088,  1152,  1216,  1280,  1344,  1408,  1472,
    1536,  1600,  1664,  1728,  1792,  1856,  1920,  1984,
    2048,  2112,  2176,  2240,  2304,  2368,  2432,  2496,
    2560,  2624,  2688,  2752,  2816,  2880,  2944,  3008,
    3072,  3136,  3200,  3264,  3328,  3392,  3456,  3520,
    3584,  3648,  3712,  3776,  3840,  3904,  3968,  4032,
    4032,  3968,  3904,  3840,  3776,  3712,  3648,  3584,
    3520,  3456,  3392,  3328,  3264,  3200,  3136,  3072,
    3008,  2944,  2880,  2816,  2752,  2688,  2624,  2560,
    2496,  2432,  2368,  2304,  2240,  2176,  2112,  2048,
    1984,  1920,  1856,  1792,  1728,  1664,  1600,  1536,
    1472,  1408,  1344,  1280,  1216,  1152,  1088,  1024,
     960,   896,   832,   768,   704,   640,   576,   512,
     448,   384,   320,   256,   192,   128,    64,     0,

};

const uint16_t len_tables = sizeof(sinetable)/sizeof(sinetable[0]); // number of table entries (128)
const uint16_t len_buffer = len_tables*num_buffered_cycles;   // buffer size, set dynamically to integrate with serial data later

//Channel 0 Buffer array
std::array<uint16_t,len_buffer> buffer0;
constexpr size_t buffer0_byte_count = sizeof(buffer0);
static_assert(buffer0.size() % 8 == 0, "Buffer0 size must be multiple of DAC buffer.");

//Channel 1 Buffer array
std::array<uint16_t,len_buffer> buffer1;
constexpr size_t buffer1_byte_count = sizeof(buffer1);
static_assert(buffer1.size() % 8 == 0, "Buffer1 size must be multiple of DAC buffer.");

void initBuffer_stairStep(int arg) {
  for(size_t i = 0; i < len_buffer; i += 4) {
      if(arg==0 || arg==2){
        //Channel 0
        buffer0[i+0] = 0;
        buffer0[i+1] = 1000;
        buffer0[i+2] = 2000;
        buffer0[i+3] = 3000;            
      }
      if(arg==1 || arg==2){
        //Channel 1
        buffer1[i+0] = 0;
        buffer1[i+1] = 1000;
        buffer1[i+2] = 2000;
        buffer1[i+3] = 3000;        
      }    
  }
}

void initBuffer_sine(int arg){
  for(int itr=0; itr<num_buffered_cycles; itr+=1){
    for(int sample=0; sample<len_tables; sample+=1){
      //Channel 0
      if(arg==0 || arg==2)buffer0[itr*len_tables+sample]=sinetable[sample];
      //Channel 1
      if(arg==1 || arg==2)buffer1[itr*len_tables+sample]=sinetable[sample];
    }
  }
}

void initBuffer_saw(int arg){
  for(int itr=0; itr<num_buffered_cycles; itr+=1){
    for(int sample=0; sample<len_tables; sample+=1){
      //Channel 0
      if(arg==0 || arg==2)buffer0[itr*len_tables+sample]=sawtable[sample];
      //Channel 1
      if(arg==1 || arg==2)buffer1[itr*len_tables+sample]=sawtable[sample];
    }
  }
}

void setup_DAC0(){
  //Enable DAC0
  SIM_SCGC2 |= SIM_SCGC2_DAC0;  // enable DAC clock
  DAC0_C0 |= DAC_C0_DACEN;      // enable DAC
  DAC0_C0 |= DAC_C0_DACRFS;     // use 3.3V VDDA as reference voltage
  DAC0_C0 |= DAC_C0_DACBWIEN;   // enable DMA trigger at watermark
  DAC0_C0 |= DAC_C0_DACBTIEN;   // enable DMA trigger at 0
  DAC0_C1 = DAC_C1_DACBFWM(2) | // watermark for DMA trigger
                                // --> DMA triggered when DAC buffer index is 4
            DAC_C1_DMAEN      | // enable DMA
            DAC_C1_DACBFEN;     // enable DAC buffer
  DAC0_C2 = DAC_C2_DACBFUP(7);  // set buffer size to 8
  DAC0_SR &= ~(DAC_SR_DACBFWMF); // clear watermark flag
  DAC0_SR &= ~(DAC_SR_DACBFRTF); // clear top pos flag
  DAC0_SR &= ~(DAC_SR_DACBFRBF); // clear bottom pos flag
  
  // Init DAC FIFO with the last 8 buffer elements. This makes setting up the circular
  // DMA transfer easier.
  for(size_t i = 0; i < 8; i++) {
      ((aliased_uint16_vptr) &DAC0_DAT0L)[i] = buffer0[buffer0.size() - 8 + i];
  }  
  // The modulo feature of the DMA controller is used. The destination
  // pointer wraps at +16 bytes.
  dma0.destinationCircular(((volatile uint16_t*) &DAC0_DAT0L), 16);
  
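  dma0.TCD->SADDR = buffer0.data();         // source data buffer
  dma0.TCD->SOFF = 2;                       // advance by 2 bytes (16 bits) per read
  dma0.TCD->ATTR_SRC = 1;                   // 16-bit source transfer size, no source modulo
  dma0.TCD->NBYTES = 8;                     // 8 bytes (4 samples) per DMA request
  dma0.TCD->SLAST = -buffer0_byte_count;    // rewind to the start of the buffer after the last request
  dma0.TCD->BITER = buffer0_byte_count / 8; // number of 8-byte requests per pass through the buffer
  dma0.TCD->CITER = buffer0_byte_count / 8; // current count starts equal to BITER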
  
  dma0.triggerAtHardwareEvent(DMAMUX_SOURCE_DAC0);
  dma0.enable();    
}

void setup_DAC1(){
  //Enable DAC1
  SIM_SCGC2 |= SIM_SCGC2_DAC1;  // enable DAC clock
  DAC1_C0 |= DAC_C0_DACEN;      // enable DAC
  DAC1_C0 |= DAC_C0_DACRFS;     // use 3.3V VDDA as reference voltage
  DAC1_C0 |= DAC_C0_DACBWIEN;   // enable DMA trigger at watermark
  DAC1_C0 |= DAC_C0_DACBTIEN;   // enable DMA trigger at 0
  DAC1_C1 = DAC_C1_DACBFWM(2) | // watermark for DMA trigger
                                // --> DMA triggered when DAC buffer index is 4
            DAC_C1_DMAEN      | // enable DMA
            DAC_C1_DACBFEN;     // enable DAC buffer
  DAC1_C2 = DAC_C2_DACBFUP(7);  // set buffer size to 8
  DAC1_SR &= ~(DAC_SR_DACBFWMF); // clear watermark flag
  DAC1_SR &= ~(DAC_SR_DACBFRTF); // clear top pos flag
  DAC1_SR &= ~(DAC_SR_DACBFRBF); // clear bottom pos flag
  
  // Init DAC FIFO with the last 8 buffer elements. This makes setting up the circular
  // DMA transfer easier.
  for(size_t i = 0; i < 8; i++) {
      ((aliased_uint16_vptr) &DAC1_DAT0L)[i] = buffer1[buffer1.size() - 8 + i];
  }  
  // The modulo feature of the DMA controller is used. The destination
  // pointer wraps at +16 bytes.
  dma1.destinationCircular(((volatile uint16_t*) &DAC1_DAT0L), 16);
  
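  dma1.TCD->SADDR = buffer1.data();         // source data buffer
  dma1.TCD->SOFF = 2;                       // advance by 2 bytes (16 bits) per read
  dma1.TCD->ATTR_SRC = 1;                   // 16-bit source transfer size, no source modulo
  dma1.TCD->NBYTES = 8;                     // 8 bytes (4 samples) per DMA request
  dma1.TCD->SLAST = -buffer1_byte_count;    // rewind to the start of the buffer after the last request
  dma1.TCD->BITER = buffer1_byte_count / 8; // number of 8-byte requests per pass through the buffer
  dma1.TCD->CITER = buffer1_byte_count / 8; // current count starts equal to BITER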
  
  dma1.triggerAtHardwareEvent(DMAMUX_SOURCE_DAC1);
  dma1.enable();    
}

void setup_and_run_PDB(){
  SIM_SCGC6 |= SIM_SCGC6_PDB;           // PDB Clock Gate Control, enable PDB clock
  PDB0_SC |= PDB_SC_PDBEN;              // PDB Enable
  PDB0_SC |= PDB_SC_TRGSEL(15);         // Trigger Input Source Select, SW trigger
  PDB0_SC |= PDB_SC_CONT;               // run continuously

  // Pre-scalers not used as high PDB rate is desired
  //PDB0_SC |= PDB_SC_PRESCALER(0b111); // prescaler (from 0-7..?)    
  //PDB0_SC |= PDB_SC_MULT(0b0);        // Prescaler multipliers other than 1x don't work correctly.
  PDB0_MOD = PDB_MODIFIER;              // Adjust for output frequency.
                                        // out frequency == F_BUS / PRESCALER / (PDB0_MOD + 1)
  
  // The hardware doesn't do what the manual claims. The DAC trigger counter is
  // not reset on PDB counter overflow. 
  // Things work correctly, ONLY if the PDB mod is used.
  // DAC Channel 0
  PDB0_DACINT0 = PDB_MODIFIER;          // trigger DAC once per PDB cycle
  PDB0_DACINTC0 = PDB_DACINTC_TOE;      // enable DAC interval trigger
  // DAC Channel 1
  PDB0_DACINT1 = PDB_MODIFIER;          // trigger DAC once per PDB cycle
  PDB0_DACINTC1 = PDB_DACINTC_TOE;      // enable DAC interval trigger

  // Sync the PDB registers with written values and start the PDB timer
  PDB0_SC |= PDB_SC_LDOK;               // sync buffered PDB registers
  PDB0_SC |= PDB_SC_SWTRIG;             // start PDB  
}

unsigned src0_idx_prev = 0;
unsigned dac0_out_idx_prev = 0;
unsigned src1_idx_prev = 0;
unsigned dac1_out_idx_prev = 0;

void reportStatus_DAC0(){
  noInterrupts();
  unsigned src_idx_curr = ((uintptr_t) dma0.sourceAddress() - (uintptr_t) buffer0.data()) / 2;
  unsigned dest_idx = ((uintptr_t) dma0.destinationAddress() - (uintptr_t) &DAC0_DAT0L) / 2;
  unsigned dac_out_idx_curr = DAC0_C2 >> 4;
  unsigned dac_val = ((aliased_uint16_vptr) &DAC0_DAT0L)[dac_out_idx_curr];
  interrupts();
  if(src0_idx_prev != src_idx_curr && dac0_out_idx_prev != dac_out_idx_curr){
    src0_idx_prev = src_idx_curr;
    dac0_out_idx_prev = dac_out_idx_curr;
    Serial.printf("Channel 0 - DMA src idx: %4u   DMA dest idx: %4u   DAC out idx: %4u   DAC value: %4u\n",
                  src_idx_curr, dest_idx, dac_out_idx_curr, dac_val);     
  } 
}

void reportStatus_DAC1(){
  noInterrupts();
  unsigned src_idx_curr = ((uintptr_t) dma1.sourceAddress() - (uintptr_t) buffer1.data()) / 2;
  unsigned dest_idx = ((uintptr_t) dma1.destinationAddress() - (uintptr_t) &DAC1_DAT0L) / 2;
  unsigned dac_out_idx_curr = DAC1_C2 >> 4;
  unsigned dac_val = ((aliased_uint16_vptr) &DAC1_DAT0L)[dac_out_idx_curr];
  interrupts();
  if(src1_idx_prev != src_idx_curr && dac1_out_idx_prev != dac_out_idx_curr){
    src1_idx_prev = src_idx_curr;
    dac1_out_idx_prev = dac_out_idx_curr;
    Serial.printf("Channel 1 - DMA src idx: %4u   DMA dest idx: %4u   DAC out idx: %4u   DAC value: %4u\n",
                  src_idx_curr, dest_idx, dac_out_idx_curr, dac_val);     
  }    
}

void setup() {
  //Start serial
  Serial.begin(9600);
  delay(2000);
  Serial.println("PDB DAC sample. Starting...");
  
  //fill outer buffer with values
  //Channel 0
  initBuffer_sine(0);
  //Channel 1
  initBuffer_saw(1);
  
  //Configure the DAC's to use PDB timing and FIFO buffer
  //setup DAC0
  setup_DAC0();    
  //setup DAC1
  setup_DAC1();
  
  //setup and start the PDB timer
  setup_and_run_PDB();
      
}

void loop(){
  reportStatus_DAC0();
  reportStatus_DAC1();
}
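
The rate formula in the PDB0_MOD comment (out frequency == F_BUS / PRESCALER / (PDB0_MOD + 1)) can be turned into a small helper for picking PDB_MODIFIER. This is only a sketch that runs on a PC, not on the Teensy: the function names are made up for illustration, and the 60 MHz bus clock used in the usage note is an assumption, so substitute the F_BUS of your own build.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the sketch above): choose the PDB modulus
// for a requested sample rate, following
//   out frequency == F_BUS / PRESCALER / (PDB0_MOD + 1)
uint16_t pdb_mod_for_rate(uint32_t f_bus_hz, uint32_t prescaler, uint32_t target_hz) {
  assert(prescaler > 0 && target_hz > 0);
  const uint32_t ticks_per_s = f_bus_hz / prescaler;
  // Round to the nearest whole number of bus ticks per sample.
  const uint32_t counts = (ticks_per_s + target_hz / 2) / target_hz;
  assert(counts >= 1 && counts <= 65536);  // PDB0_MOD is a 16-bit register
  return static_cast<uint16_t>(counts - 1);
}

// Rate actually produced by a given modulus; it differs from the request
// whenever the bus clock is not an exact multiple of the target rate.
double pdb_rate_hz(uint32_t f_bus_hz, uint32_t prescaler, uint16_t mod) {
  return static_cast<double>(f_bus_hz) / prescaler / (mod + 1.0);
}
```

With an assumed 60 MHz bus clock and no prescaler, pdb_mod_for_rate(60000000, 1, 1000000) gives 59, i.e. 60 bus ticks per sample. Because the modulus must be a whole number, checking pdb_rate_hz() for the chosen value shows how far the real output rate is from the requested one.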
 
Problem with the Hackaday code on a teensy 3.6

The problem is, the mod register isn't synced at that point, so the wrong value is read. Make that:
uint16_t mod = ...;
PDB0_MOD = mod;
PDB0_DACINT0 = mod;


Yes, it looks like he is doing things pretty much the same way, only using 16 FIFO entries instead of 8. He should mention the FIFO, so that it turns up in a search. If I had seen his code, I wouldn't have bothered with the example...

Very interesting discussion. I tried that code originally written for a teensy 3.2 on the Hackaday.io site on a teensy 3.6 and it didn't function properly. In my comment on the site I stated, "It looks like the first word out of the DAC is the last word written into the 16-word DAC buffer. That is, the values written into the DAC buffer are (val1, val2, val3, ..., val15, val16). The output of the DAC is (val16, val1, val2, val3, ..., val15)." If you slow the code down you will see what I am talking about. Is this a hardware error, or is there a workaround? If I could figure out how to post a scope image on this reply I would do it. By the way, I believe he is using the full dac buffer of 16 words, but he is performing two DMA's of 8 words each.
Thanks!
W
 
By your statement you're implying that the read pointer isn't pointing to where the DMA is loading the data. Based on the sequence I am observing, that implies the data conversion does not begin until after the buffer is filled. The read pointer must be pointing to the last word in the buffer if that is the first word that is converted (followed by the first fifteen words in the buffer). So, given that observation, one should assume the read pointer does not begin to increment until the last buffer entry is written? That's special... No wonder it doesn't work. Thanks!
 
More precisely, Mr. Theremingenieur, it entirely depends on the initial condition of the read pointer and which flags are used to drive the DMA. In the Teensy 3.2 example on the Hackaday.io site, I was able to make it function correctly on the Teensy 3.6 by changing the DMA interrupts from the read pointer bottom and watermark interrupt flags to the read pointer top and watermark flags, and by setting the initial condition of the read pointer to 12 after starting the PDB. What follows is the last several lines of the original void setup() routine with the modifications outlined above. I included the before and after results below.

// DAC0_C0 |= DAC_C0_DACBBIEN | DAC_C0_DACBWIEN; // enable read pointer bottom and watermark interrupt
DAC0_C0 |= DAC_C0_DACBTIEN | DAC_C0_DACBWIEN; // enable read pointer top and watermark interrupt --- Modified ---
DAC0_C1 |= DAC_C1_DMAEN | DAC_C1_DACBFEN | DAC_C1_DACBFWM(3); // enable dma and buffer
DAC0_C2 |= DAC_C2_DACBFRP(0);
// init the PDB for DAC interval generation
SIM_SCGC6 |= SIM_SCGC6_PDB; // turn on the PDB clock
PDB0_SC |= PDB_SC_PDBEN; // enable the PDB
PDB0_SC |= PDB_SC_TRGSEL(15); // trigger the PDB on software start (SWTRIG)
PDB0_SC |= PDB_SC_CONT; // run in continuous mode
PDB0_MOD = 20-1; // modulus time for the PDB
PDB0_DACINT0 = (uint16_t)(20-1); // we won't subdivide the clock...
PDB0_DACINTC0 |= 0x01; // enable the DAC interval trigger
PDB0_SC |= PDB_SC_LDOK; // update pdb registers
PDB0_SC |= PDB_SC_SWTRIG; // ...and start the PDB
DAC0_C2 |= DAC_C2_DACBFRP(12); //Initial condition required for the DAC buffer read pointer --- Added ---
}

[Attached images: scope 1.png, scope 2.png]
 
Although the reference manual made my head hurt, I finally figured out how to get the DAC FIFO to work a couple of days ago (see another post on this page). Using the PDB, I'm running the DAC update interval at 0.133 microseconds by setting DACINT (Section 44.4.9) to 8 (8/[bus frequency]). The third harmonic at higher frequencies using that update rate is about -40 dB. Using a buffer of 8192 points at 915 Hz, the third harmonic is down to -60 dB. I'm able to generate some pretty decent sine waves out to about 100 - 200 kHz with that performance, and I'm not observing any appreciable jitter on my scope.
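
The numbers in this post can be cross-checked with a few lines of arithmetic. The sketch below takes the post's "8/[bus frequency]" at face value and assumes a 60 MHz bus clock, which the post does not state; the function names are hypothetical. It runs on a PC and touches no Teensy registers.

```cpp
#include <cassert>
#include <cstddef>

// Time between DAC updates when the interval is `dacint_counts` bus ticks.
// Assumption: the interval is exactly counts / F_BUS, as the post states.
double dac_update_interval_s(double f_bus_hz, unsigned dacint_counts) {
  assert(f_bus_hz > 0.0 && dacint_counts > 0);
  return dacint_counts / f_bus_hz;
}

// Frequency of the waveform when one full cycle is stored in `buffer_len`
// samples and the buffer is replayed in a loop.
double looped_buffer_freq_hz(double f_bus_hz, unsigned dacint_counts, std::size_t buffer_len) {
  assert(buffer_len > 0);
  return 1.0 / (dac_update_interval_s(f_bus_hz, dacint_counts) * buffer_len);
}
```

Under those assumptions, dac_update_interval_s(60e6, 8) is about 0.133 microseconds and looped_buffer_freq_hz(60e6, 8, 8192) is about 915.5 Hz, which lines up with the 0.133 microsecond interval and the 915 Hz figure quoted for the 8192-point buffer.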
 
I thought I'd use your sketch to revisit DAC settle time on Teensy 3. See the earlier thread https://forum.pjrc.com/threads/26036-Fastest-DAC-speed-possible-for-Teensy-3-1-using-Arduino

Changing your sinetable to a square wave (0 4095 0 4095 ....), with only a scope probe attached to the DAC pin of a T3.2 @ 120 MHz, I see the following on the scope when running the PDB at 1 us ticks (MOD (60-1)). The rise from 0 to 3.3 V takes about 780 ns (fall 860 ns).
dacsettle.png
So conservatively, settle time is about 1 us (data sheet suggests 15 us).

With DAC update interval at 0.133 us, the sawtooth has a period of 300 ns and Vpp is 560 mV
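
As a rough consistency check on these two measurements, the swing reachable in one update interval can be estimated from the measured rise time. This is only a back-of-envelope sketch: it assumes the output slews linearly (real DAC settling is not exactly linear) and the function name is made up for illustration.

```cpp
#include <cassert>

// Estimated output swing within one update interval, assuming the DAC output
// slews linearly at full_scale_v / rise_time_s (a simplifying assumption).
double slew_limited_swing_v(double full_scale_v, double rise_time_s, double update_interval_s) {
  assert(full_scale_v > 0.0 && rise_time_s > 0.0 && update_interval_s >= 0.0);
  const double swing = full_scale_v / rise_time_s * update_interval_s;
  // The output cannot move further than full scale.
  return swing < full_scale_v ? swing : full_scale_v;
}
```

With the 780 ns rise time for 3.3 V and a 0.133 us update interval, slew_limited_swing_v(3.3, 780e-9, 133e-9) comes out at roughly 0.56 V, which is in the same range as the 560 mV Vpp seen on the scope.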

FYI, anecdotal square wave settle times for various DACs
https://github.com/manitou48/DUEZoo/blob/master/dac.txt
 