Limits of Fast, Precise Timing for Teensy 3.6 External ADC Data Acquisition

Status
Not open for further replies.

StanfordEE

Well-known member
Continuing my prior work, I want to show some ongoing difficulties with jitter using a few simple examples. For these tests, I used a 1 GHz bandwidth, 5 Gs/s oscilloscope with 700 MHz bandwidth passive probes.

Here is the simplest example of all: write pulses to a pin and look at the jitter, with the Teensy overclocked to 240 MHz.

//Teensy 3.6 Test of bit banging jitter.

int startPulsePin = 14;

void setup()
{
pinMode(startPulsePin,OUTPUT); //Our test start pulse for a hypothetical ADC
noInterrupts(); //Turn off interrupts to minimize jitter
}

FASTRUN void loop()
{
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
}

So what does one get? In a nutshell, a very asymmetrical signal: a low-going pulse of about 31 ns repeating at 1.3637 MHz. That is a surprisingly slow repetition rate, and it is due to the time consumed by the loop itself.

LoopedPulseSeries.png

With persistence mode turned on to capture any possible timing jitter, there is none to be found.

LoopedPulse.png


As noted in other threads, the loop timing has a lot to do with the overall timing, so here is a series of ten pulses repeating via in-line code, then the loop executing.

//Teensy 3.6 Test of bit banging jitter.

int startPulsePin = 14;

void setup()
{
pinMode(startPulsePin,OUTPUT); //Our test start pulse for a hypothetical ADC
noInterrupts(); //Turn off interrupts to minimize jitter
}

FASTRUN void loop()
{
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
}

One sees the same pulse width, with a symmetrical duty cycle within the bursts. The frequency of the burst square wave is 17.24 MHz and the loop frequency is 1.26 MHz. This clearly illustrates the trade-off between loops and in-line coding for fast pulse generation.

BasicPulseBurstsInline.png



The pulses are nicely symmetrical.

SymmetricalPulses.png


Now let's try to build a slightly more realistic bit of code, which in my case pulses the clock pin of an ADC and reads the resulting data. As noted previously, there are lots of issues with port reads. At present there is no way to read more than 12 bits in one go with the bits already in the right order and position for their binary weights, and even 12 bits is not possible while maintaining library compatibility with most x-y LCD or OLED display modules. Even though that sucks, we can still see what is possible before going in and editing/forking the libraries for those displays.

To verify that there is no jitter, create a separate trigger pulse on a different pin, matching the duration of the burst. It looks solid, with no observable jitter.


Separate_Trigger.png


Despite the difficulty in using port reads, let's see what we can do with port C, which is the only remotely usable one.

//Teensy 3.6 port reads speed test, overclocked 240 MHz
//G. Kovacs, 1/27/19
//CORRECTED PIN C MAPPING!!!

int temp;
int startPulsePin = 39;
byte pinAtable[] = {33,24,3,4}; //PortA, 4 bits
byte pinBtable[] = {16,17,19,18,0,1,32,25}; //PortB, 8 bits NOTE SCRAMBLED pins!!!
byte pinCtable[] = {15,22,23,9,10,13,11,12,35,36,37,38}; //PortC -> 12 bits, but LED is on pin 13, pins all in binary order!!!!
byte pinDtable[] = {2,14,7,8,6,20,21,5}; //PortD, 8 bits in binary order.
byte pinEtable[] = {31,26}; //PortE, 2 bits, in binary order.

void setup() {
for (int i=0; i<12; i++)
{ pinMode(pinCtable[i],INPUT_PULLUP); }
pinMode(startPulsePin, OUTPUT);
noInterrupts();
}

FASTRUN void loop() {
digitalWriteFast(startPulsePin,LOW); //Make pulse to clock the ADC
digitalWriteFast(startPulsePin,HIGH);
temp = GPIOC_PDIR; //Read the ADC
}

Just as with the bit-banged pulse output alone, but a tiny bit slower due to the port read, we see a low-going pulse approximately 31 ns wide at a frequency of about 16 MHz (15.87 MHz) and an overall repetition rate of 1.25 MHz, with no discernible jitter.

LoopedPortReads.png
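
One detail worth spelling out about the port read: GPIOC_PDIR is a 32-bit register, and only bits 0..11 correspond to the twelve port C pins listed in pinCtable, so the raw value has to be masked before it is treated as a sample. Below is a minimal host-side sketch of that masking in plain C++ (it is not Teensy code). The names `portCSample` and `kPortC12BitMask` are made up for illustration; on the Teensy the argument would be the value read from GPIOC_PDIR.

```cpp
#include <cstdint>

// Hypothetical constant: selects port C bits 0..11 (PTC0..PTC11).
constexpr std::uint32_t kPortC12BitMask = 0x0FFFu;

// Hypothetical helper: reduce a raw 32-bit port word to a 12-bit sample.
// Bits above bit 11 belong to other port C pins and are discarded.
constexpr std::uint16_t portCSample(std::uint32_t pdirValue) {
    return static_cast<std::uint16_t>(pdirValue & kPortC12BitMask);
}
```

The mask can be applied after the burst rather than inside it, so it need not cost any time in the acquisition loop.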


As before, add a separate pin to make a trigger so the oscilloscope can see if the bursts of "acquisition" are jittering.

//Teensy 3.6 port reads speed test, overclocked 240 MHz
//G. Kovacs, 1/27/19
//CORRECTED PIN C MAPPING!!!

int temp;
int startPulsePin = 39;
int trigPulsePin = 16;
byte pinAtable[] = {33,24,3,4}; //PortA, 4 bits
byte pinBtable[] = {16,17,19,18,0,1,32,25}; //PortB, 8 bits NOTE SCRAMBLED pins!!!
byte pinCtable[] = {15,22,23,9,10,13,11,12,35,36,37,38}; //PortC -> 12 bits, but LED is on pin 13, pins all in binary order!!!!
byte pinDtable[] = {2,14,7,8,6,20,21,5}; //PortD, 8 bits in binary order.
byte pinEtable[] = {31,26}; //PortE, 2 bits, in binary order.

void setup() {
for (int i=0; i<12; i++)
{ pinMode(pinCtable[i],INPUT_PULLUP); }
pinMode(startPulsePin, OUTPUT);
pinMode(trigPulsePin, OUTPUT);
noInterrupts();
}

FASTRUN void loop() {
digitalWriteFast(trigPulsePin, LOW);

digitalWriteFast(startPulsePin,LOW); //Make pulse to clock the ADC
digitalWriteFast(startPulsePin,HIGH);
temp = GPIOC_PDIR; //Read the ADC

digitalWriteFast(startPulsePin,LOW); //Make pulse to clock the ADC
digitalWriteFast(startPulsePin,HIGH);
temp = GPIOC_PDIR; //Read the ADC

digitalWriteFast(startPulsePin,LOW); //Make pulse to clock the ADC
digitalWriteFast(startPulsePin,HIGH);
temp = GPIOC_PDIR; //Read the ADC

digitalWriteFast(startPulsePin,LOW); //Make pulse to clock the ADC
digitalWriteFast(startPulsePin,HIGH);
temp = GPIOC_PDIR; //Read the ADC

digitalWriteFast(startPulsePin,LOW); //Make pulse to clock the ADC
digitalWriteFast(startPulsePin,HIGH);
temp = GPIOC_PDIR; //Read the ADC

digitalWriteFast(startPulsePin,LOW); //Make pulse to clock the ADC
digitalWriteFast(startPulsePin,HIGH);
temp = GPIOC_PDIR; //Read the ADC

digitalWriteFast(startPulsePin,LOW); //Make pulse to clock the ADC
digitalWriteFast(startPulsePin,HIGH);
temp = GPIOC_PDIR; //Read the ADC

digitalWriteFast(startPulsePin,LOW); //Make pulse to clock the ADC
digitalWriteFast(startPulsePin,HIGH);
temp = GPIOC_PDIR; //Read the ADC

digitalWriteFast(startPulsePin,LOW); //Make pulse to clock the ADC
digitalWriteFast(startPulsePin,HIGH);
temp = GPIOC_PDIR; //Read the ADC

digitalWriteFast(startPulsePin,LOW); //Make pulse to clock the ADC
digitalWriteFast(startPulsePin,HIGH);
temp = GPIOC_PDIR; //Read the ADC

digitalWriteFast(trigPulsePin,HIGH);
}


The result is a nice, jitter-free acquisition at about 16 MHz with a loop repetition rate of 1.13 MHz.


InLinePortReadBursts.png


If one is willing to give up using the standard libraries for most LCD and OLED displays, it is possible to use the port C reads, in-line, to capture 1024 samples for subsequent analysis. Port C is the only realistic port for this, and it can accommodate a maximum ADC resolution of 12 bits. Only the beginning and end of the code are included below…

//Teensy 3.6 port reads speed test, overclocked 240 MHz
//G. Kovacs, 1/27/19
//CORRECTED PIN C MAPPING!!!

int temp;
int startPulsePin = 39;
int trigPulsePin = 16;
byte pinAtable[] = {33,24,3,4}; //PortA, 4 bits
byte pinBtable[] = {16,17,19,18,0,1,32,25}; //PortB, 8 bits NOTE SCRAMBLED pins!!!
byte pinCtable[] = {15,22,23,9,10,13,11,12,35,36,37,38}; //PortC -> 12 bits, but LED is on pin 13, pins all in binary order!!!!
byte pinDtable[] = {2,14,7,8,6,20,21,5}; //PortD, 8 bits in binary order.
byte pinEtable[] = {31,26}; //PortE, 2 bits, in binary order.
double vReal[1024];

void setup() {
for (int i=0; i<12; i++)
{ pinMode(pinCtable[i],INPUT_PULLUP); }
pinMode(startPulsePin, OUTPUT);
pinMode(trigPulsePin, OUTPUT);
noInterrupts();
}

FASTRUN void loop() {
digitalWriteFast(trigPulsePin, LOW);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[0] = GPIOC_PDIR; //Read the ADC
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[1] = GPIOC_PDIR; //Read the ADC
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[2] = GPIOC_PDIR; //Read the ADC

*
*
*
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[1022] = GPIOC_PDIR; //Read the ADC
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[1023] = GPIOC_PDIR; //Read the ADC

digitalWriteFast(trigPulsePin,HIGH);
}

One obtains a sample period of 288 ns for a sample rate of 3.4722 Ms/s, which isn't really all that fast.


InlinePortCReadsArray.png



Next, as previously reported, try using in-line code to capture 1024 samples from a hypothetical ADC using pins 24..39 for up to 16 bits of parallel ADC data, by individually reading each bit, shifting it to the correct place in binary, and assembling a sample. Here the code is adjusted for a 10-bit input, and only the beginning and end of the sketch are included below…

//Teensy 3.6 port reads speed test, overclocked 240 MHz
//Brute force bit reads for 10-bit ADC, but scalable with same pins upward to 16 bits.
//G. Kovacs, 1/27/19

int startPulsePin = 17;
int trigPulsePin = 16;
double vReal[1024]; //This needs to be double for the FFT routine in use.

void setup() {
pinMode(startPulsePin,OUTPUT); //Our test start pulse for a hypothetical ADC
pinMode(trigPulsePin,OUTPUT); //Pin for triggering scope at start of each burst
digitalWriteFast(startPulsePin,HIGH);
for (int i = 0; i<16; i++) //use pins 24..39 in this test
{
pinMode(i+24,INPUT_PULLUP); //inputs for ADC using pinmode, LSB = 0
}
noInterrupts();
}

FASTRUN void loop() {
digitalWriteFast(trigPulsePin, LOW);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[0] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7)+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[1] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7)+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);

*
*
*

vReal[1021] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7)+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[1022] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7)+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[1023] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7)+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(trigPulsePin,HIGH);
}


After taking several minutes to compile (!), this yields a sample rate within the burst of about 2 Ms/s, which is about as fast as this approach could run. Measured as accurately as possible (averaging, careful cursor placement), the sample period is 470 ns, for a sample rate of 2.1277 Ms/s. That figure could be measured more accurately and used to "calibrate" the subsequent acquisitions, rather than trying to hit a precise rate of, say, 2.000000 Ms/s (which would of course be nice).

Inline10BitReadsArray.png
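
For reference, the per-sample expression in the 10-bit sketch above (pin 24 shifted left by 9 down to pin 33 unshifted) is an MSB-first assembly of ten bits. The following host-side C++ sketch reproduces that arithmetic so it can be checked on a PC. It uses a loop purely for readability, whereas the Teensy sketch is deliberately unrolled for timing; the name `assemble10Bit` is hypothetical.

```cpp
#include <array>
#include <cstdint>

// bits[0] is the MSB (pin 24 in the sketch), bits[9] is the LSB (pin 33).
// Hypothetical helper that mirrors the shift-and-add expression off-target.
inline std::uint16_t assemble10Bit(const std::array<bool, 10>& bits) {
    std::uint16_t value = 0;
    for (bool b : bits) {
        // Shift the bits gathered so far up by one, then append the next bit.
        value = static_cast<std::uint16_t>((value << 1) | (b ? 1u : 0u));
    }
    return value;
}
```

All ten inputs high gives 1023, and only the pin 24 input high gives 512, the same as the in-line expression.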



Working Conclusions:

1) In-line coding for burst acquisition is much faster than looping, and the maximum interrupt-driven speed using IntervalTimer is 1 Ms/s. However, in-line coding will not deliver typical, neat and tidy sampling rates such as 1, 2, 5, or 10 Ms/s. This is something that can be taken into account mathematically later (e.g., by scaling an FFT).

2) The achievable sample rates, given all of the Arduino ecosystem constraints (unless I'm missing something) and without having to fork libraries for common LCD and OLED displays that need some of the port C pins, are on the order of low Ms/s, specifically about 2.13 Ms/s for 10-bit ADC inputs. This would enable FFTs with about 1 MHz maximum bandwidth on a Teensy 3.6, far beyond the capabilities of the internal ADCs.

3) Port reads, if one is willing to edit libraries and then either keep updating them in sync with their authors or drift away, are practically limited to 12 contiguous bits (port C), and the maximum sample rate is not that much better at 3.47 Ms/s. It does not seem worth the hassle at all. No way.

4) There is no jitter seen, although with lower bandwidth oscilloscopes, trigger jitter can falsely show some. Beware of crappy oscilloscopes. You get what you pay for.

5) Directly reading the ADC by either of these methods is going to be faster than SPI for higher-resolution ADCs, and either way it beats, for example, a 10-bit ADC on 15 MHz SPI. When one gets up to higher resolution, parallel will win. In general, really fast ADCs do not come with serial outputs for this reason.

6) A relatively simple hardware data acquisition circuit controlled by the Teensy will be the best option if it can be done at reasonable cost. SRAM fed by the ADC, clocked from a crystal oscillator, and addressed using fast counters (e.g., 74HC590 or 74AS867, topping out around 35 Ms/s and 50 Ms/s, respectively, at 5V power) could reach the highest frequencies possible. A counter such as the 74F269 could reach 100 Ms/s, but then SRAM could get a bit pricey, and most affordable parallel (flash) ADCs top out around 40 Ms/s.
 

Attachments

  • PortReadBitBang.png
Nice work. I have often wondered how fast we could get parallel accesses using the Teensy.

On #4, ADC manufacturers often suggest a bandpass filter on the clock, which cleans up a lot of the jitter but leaves you with a sine wave clock instead of a square one. This isn't as much of an issue as you would think, though. Combining a very low phase noise RF generator (sine wave output) with a good filter can yield sub-100 fs jitter. See the ADS5483 datasheet. I don't know of a scope that can measure jitter this low, but it can be estimated from the SNR vs. input frequency plots.
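
As a rough illustration of estimating jitter from SNR, the usual jitter-limited relation is SNR = -20·log10(2π·f_in·t_j), where t_j is the RMS clock jitter. The host-side sketch below implements that relation and its inverse. It assumes jitter is the only noise source (quantization and thermal noise are ignored), the function names are hypothetical, and the example numbers are illustrative rather than taken from the ADS5483 datasheet.

```cpp
#include <cmath>

// Jitter-limited SNR in dB for an input at inputHz with RMS clock jitter
// jitterSecondsRms. Assumes jitter is the only noise source.
inline double jitterLimitedSnrDb(double inputHz, double jitterSecondsRms) {
    const double kTwoPi = 6.283185307179586;
    return -20.0 * std::log10(kTwoPi * inputHz * jitterSecondsRms);
}

// Inverse relation: RMS jitter in seconds implied by a jitter-limited SNR.
inline double jitterFromSnr(double inputHz, double snrDb) {
    const double kTwoPi = 6.283185307179586;
    return std::pow(10.0, -snrDb / 20.0) / (kTwoPi * inputHz);
}
```

For example, 100 fs of jitter at a 100 MHz input corresponds to a jitter-limited SNR of about 84 dB under these assumptions.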

On #6, you can get a FIFO in an all-in-one package like this 133 MHz FIFO if you can live with 4096-sample chunks of data. You can couple that to a parallel output ADC and read 4K of data out at your leisure.

In reality, though, this is why ADC manufacturers sell an FPGA based capture solution matched to the ADCs they want to sell. Sure, they are a lot more expensive than a Teensy, but they do work at full speed.

David
 
These results are similar to what I found with my TeensyLogicAnalyzer. It uses 8 bits on port B. Using general C code (record_low_speed_data.ino), it goes up to 5 Ms/s. With some tricks and fewer features, it works at 10 Ms/s (record_high_speed_data_8_channels.cpp). Then I go to assembly language for 30, 48, and 80 Ms/s (record_asm.ino). Even though it is assembly language, it still compiles like a standard sketch.

At the 80 Ms/s rate it requires some special care. I run the recording twice: the first time is to load the code into cache, then the second time it runs at full speed. Also, it works best to store data in upper RAM. In lower RAM it took extra clock cycles. The 3.6 has 180k+ of upper RAM; it just requires some skill to place your buffer there.
 
I was running your examples and was able to duplicate the timing numbers. With a few small changes as shown below, I was able to get your next-to-last example to be about 10 times faster. [I had to slow my T6 down to 120 MHz so that my logic analyzer could keep up]

One change is to make the pin numbers constants. Another change is to set a fast slew rate on the outputs. This changed the first example, at 120 MHz, from 62 to 16 nsec (at 240 MHz, that would be 31 to 8 nsec).

For the next to last example, changing the data array from double to int (the conversion can be done later), along with the changes above, changed the fastest sample period from 576 to 40 nsec (at 240 MHz, would be 288 to 20 nsec).

Code:
//Teensy 3.6 port reads speed test, overclocked 240 MHz
//G. Kovacs, 1/27/19
//CORRECTED PIN C MAPPING!!!

int temp;
const int startPulsePin = 39;
const int trigPulsePin = 16;
int vReal[1024];

void setup() {
pinMode(startPulsePin, OUTPUT);
CORE_PIN39_CONFIG = PORT_PCR_MUX(1); // no slew rate limit
pinMode(trigPulsePin, OUTPUT);
CORE_PIN16_CONFIG = PORT_PCR_MUX(1); // no slew rate limit
noInterrupts();
}

FASTRUN void loop() {
digitalWriteFast(trigPulsePin, LOW);

digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[0] = GPIOC_PDIR; //Read the ADC
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[1] = GPIOC_PDIR; //Read the ADC
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[2] = GPIOC_PDIR; //Read the ADC
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[3] = GPIOC_PDIR; //Read the ADC
digitalWriteFast(startPulsePin,LOW);
digitalWriteFast(startPulsePin,HIGH);
vReal[4] = GPIOC_PDIR; //Read the ADC

digitalWriteFast(trigPulsePin,HIGH);
}

So 50 Ms/s could be feasible.
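
The deferred conversion mentioned above (capture as int during the burst, convert to double afterwards for the FFT) might look roughly like this. It is a host-side C++ sketch using std::vector for brevity; on the Teensy one would more likely use fixed arrays. The name `toFftInput` and the default 12-bit mask are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert raw captured port words to doubles after the timing-critical burst.
// mask selects the data bits; 0x0FFF assumes a 12-bit sample on port C bits 0..11.
inline std::vector<double> toFftInput(const std::vector<std::uint32_t>& raw,
                                      std::uint32_t mask = 0x0FFFu) {
    std::vector<double> out;
    out.reserve(raw.size());
    for (std::uint32_t word : raw) {
        out.push_back(static_cast<double>(word & mask));
    }
    return out;
}
```

Doing the masking and the int-to-double conversion here keeps both out of the acquisition loop, which is where the timing matters.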

I would like to experiment with an inexpensive ADC that I can use on a breadboard or an ADC shield. Any suggestions?
 
Thank you very much. In a few weeks, I should be able to send you a fast ADC with DIP adapter. The best bet is a 40Ms/s 10-bit unit I've tested. Awesome.

Right now I'm bogged down here...

Currently designing an external circuit. There are more issues popping up. For example, when you write to an array (of anything), the index, whether fixed or variable, causes a different delay if it is larger than one byte, such as 256 vs. 255. Timing is a mess. What the Teensy 3.6 has is raw power. What it doesn't have is a comprehensive user manual and detailed testing of features "at speed." I'm working on that testing. The code I'm trying to fix is below. Due to conflicts with pins "used by the masses" in common libraries, ports are off the table, so it is reading bits and shifting.

//Teensy 3.6 port reads speed test, overclocked 240 MHz
//Brute force bit reads for 10-bit ADC, but scalable with same pins upward to 16 bits.
//G. Kovacs, 1/27/19

int ADCclockPin = 16;
int trigPulsePin = 17;
//int vReal[1024]; //This needs to be double for the FFT routine in use.
int d0,d1,d2,d3,d4,d5,d6,d7,d8,d9; //Created variables to avoid using an array. Not sure if it will scale to 512 though...

void setup() {
pinMode(ADCclockPin,OUTPUT); //Our test start pulse for a hypothetical ADC
pinMode(trigPulsePin,OUTPUT); //Pin for triggering scope at start of each burst
digitalWriteFast(ADCclockPin,HIGH);
for (int i = 0; i<16; i++) //use pins 24..39 in this test
{
pinMode(i+24,INPUT_PULLUP); //inputs for ADC using pinmode, LSB = 0
}
noInterrupts();
}

FASTRUN void loop() {
digitalWriteFast(ADCclockPin,LOW);
d0 = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7)+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4);
digitalWriteFast(ADCclockPin,HIGH);
d0 = d0+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
d1 = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7)+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4);
digitalWriteFast(ADCclockPin,HIGH);
d1 = d1+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
d2 = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7)+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4);
digitalWriteFast(ADCclockPin,HIGH);
d2 = d2+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
d3 = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7)+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4);
digitalWriteFast(ADCclockPin,HIGH);
d3 = d3+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
d4 = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7)+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4);
digitalWriteFast(ADCclockPin,HIGH);
d4 = d4+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
d5 = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
d5 = d5+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
d6 = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
d6 = d6+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
d7 = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
d7 = d7+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
d8 = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
d8 = d8+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
d9 = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
d9 = d9+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);


}





Splitting the reads and shifts was an attempt to balance the HIGH/LOW duty cycle. Writing to array variables, as below, resulted in all of the aforementioned timing glitches. The code *above* was an effort to test this with individual variables, not an array. No go. Need perfectly symmetrical, no-jitter reads each time for an aggressively-designed flash ADC. The example below will show on a scope that the size (> or < one byte) of the array index changes the timing. Bummer, and probably some guru programmer can fix it... :)






//Teensy 3.6 port reads speed test, overclocked 240 MHz
//Brute force bit reads for 10-bit ADC, but scalable with same pins upward to 16 bits.
//G. Kovacs, 1/27/19

int ADCclockPin = 16;
int trigPulsePin = 17;
int vReal[1024]; //This needs to be double for the FFT routine in use.

void setup() {
pinMode(ADCclockPin,OUTPUT); //Our test start pulse for a hypothetical ADC
pinMode(trigPulsePin,OUTPUT); //Pin for triggering scope at start of each burst
digitalWriteFast(ADCclockPin,HIGH);
for (int i = 0; i<16; i++) //use pins 24..39 in this test
{
pinMode(i+24,INPUT_PULLUP); //inputs for ADC using pinmode, LSB = 0
}
noInterrupts();
}

FASTRUN void loop() {
digitalWriteFast(ADCclockPin,LOW);
vReal[0x000] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
vReal[0x000] = vReal[0x000]+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
vReal[0x001] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
vReal[0x001] = vReal[0x001]+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
vReal[0x002] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
vReal[0x002] = vReal[0x002]+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
vReal[0x003] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
vReal[0x003] = vReal[0x003]+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
vReal[0x004] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
vReal[0x004] = vReal[0x004]+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
vReal[0x005] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
vReal[0x005] = vReal[0x005]+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
vReal[0x006] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
vReal[0x006] = vReal[0x006]+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
vReal[0x1FD] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
vReal[0x1FD] = vReal[0x1FD]+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
vReal[0x1FE] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
vReal[0x1FE] = vReal[0x1FE]+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);

digitalWriteFast(ADCclockPin,LOW);
vReal[0x1FF] = (digitalReadFast(24)<<9)+(digitalReadFast(25)<<8)+(digitalReadFast(26)<<7);
digitalWriteFast(ADCclockPin,HIGH);
vReal[0x1FF] = vReal[0x1FF]+(digitalReadFast(27)<<6)+(digitalReadFast(28)<<5)+(digitalReadFast(29)<<4)+(digitalReadFast(30)<<3)+(digitalReadFast(31)<<2)+(digitalReadFast(32)<<1)+digitalReadFast(33);


}




So there are issues within issues, and I'm not familiar with the nuance of assembly on this chip. I am, however, handy with the circuits and signal processing code, and have a pretty nice spectrum analyzer running at speed.
 
Oh yes, there is another posting I'd like to make on the weekend on the details. The data from the ADC with the code in my previous (few days ago) posting using the kludgy bit-read-and-shift approach was crap, because the ADC needed about 50% duty cycle on its clock. Digging in deeper revealed the inconsistent timing when writing to an array, even when the index is a fixed int, not a variable. Just a huge hassle, and a bit of jitter is a real buzz-kill...
 
One thought I have is to use a hardware PWM signal with 50% duty cycle to drive the clock and synchronize the software to read the ADC at the right time, perhaps via interrupt or polling.

Here is an example that creates a 20 MHz clock on pin 3:

Code:
  analogWriteFrequency (3, 20000000);
  analogWrite (3, 128);  // 50% duty cycle
 
I've been experimenting with using a PWM pin for the clock and polling on it. It works consistently at 10 MHz using the code below. If all the pins are on a single GPIO port, I can read it and store the data in almost the same amount of time as setting the trigger pulse in the example.

Just to whet your appetite, with some optimization tricks I almost have it working at 30 MHz. I also have an assembly language version that goes to 40 MHz. But I'll keep it simple for now.

Code:
const int trigPulsePin = 35;
const int pwmClockPin = 37;

const int pwmFrequency = 10000000;

void setup() {
  pinMode(trigPulsePin, OUTPUT);

  analogWriteFrequency (pwmClockPin, pwmFrequency);
  analogWrite (pwmClockPin, 128);

  noInterrupts();
}

FASTRUN void loop() {
  while (1) {
    // wait for rising edge (loop while low)
    while (!digitalReadFast(pwmClockPin));

    // wait for falling edge (loop while high)
    while (digitalReadFast(pwmClockPin));

    digitalWriteFast(trigPulsePin,HIGH);
    digitalWriteFast(trigPulsePin,LOW); 
  }
}

In order to get good timing data with my logic analyzer, I slow it down to 24 MHz, but the relative timing should be the same. Here's a picture of what it looks like. There is one timing pulse between each pair of falling clock edges.

PWM_polling.jpg
 