Gurus for Hire? Need *really* fast Teensy 3.6 I/O and will share work product freely.

Status
Not open for further replies.

StanfordEE

Folks,
We need someone who has perhaps read the entire datasheet (joke) and can write some form of extension (I don't really care how it is implemented) that will acquire a burst of 16-bit data (e.g., 512 or 1024 words) from user-specified, likely contiguous (but potentially arbitrary) pins of a Teensy 3.6 (say 25 = MSB, 39 = LSB) at a precise rate specified in MHz (e.g., 1, 2, 10). It should also provide a digital strobe on each sample (HIGH->LOW->HIGH) to allow maximum acquisition speed from parallel ADCs. We have budget to pay someone, and will freely share the work product with the community.

If you care to look, I have a thread on what I could accomplish with the available (and known to me) methods:

https://forum.pjrc.com/threads/54877-Fast-Digital-IO-on-Teensy-3-6-Ports-or-Bits?p=195518#post195518

We need to get much faster. The chip should be able to do it, but it may require some fancy programming outside of the Arduino IDE :).

It will be important to make it shareable broadly and accessible to users of moderate skill (no editing of libraries or soldering on the Teensy board please).

Any positive, constructive discussion would be highly appreciated. This will enable good-performance, open-source and commercial spectrum analyzers, oscilloscopes, and potentially other instruments.

Thanks,
Greg
 
if the pins are contiguous, at least 2x 8-bit together, you could use a 65536-byte lookup-table and a DMA channel with 3x scatter-gather (1: reset strobe, 2: read data (2b: read data from 2nd port?), 3: set strobe), triggered by a timer. It'll need a buffer where it stores the data.
A normal Teensy 3.6 can do this. Or just wait for the Teensy 4 in summer: 600 MHz without overclock (much more when overclocked), faster I/O and way more RAM. You'll have more CPU cycles to work with the data, too.
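
A minimal sketch of the lookup-table part of this idea, written as plain C++ so it can be checked on a desktop before going anywhere near a Teensy. It assumes the 16 ADC bits arrive in one raw 16-bit read but in scrambled bit positions. The bit map `kBitMap` is hypothetical and is not the real Teensy 3.6 pin-to-port mapping, and the names `UnscrambleLut`, `buildLut` and `unscrambleFast` are made up for illustration. The DMA and timer setup is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// HYPOTHETICAL wiring: kBitMap[i] is the bit position (0..15) in the raw
// port read where logical ADC bit i arrives. Not the real Teensy 3.6 map.
constexpr std::array<uint8_t, 16> kBitMap = {
    3, 0, 7, 12, 1, 15, 9, 4, 10, 2, 14, 6, 11, 5, 13, 8};

// Reference version: move one bit at a time (slow, but easy to verify).
uint16_t unscrambleSlow(uint16_t raw) {
    uint16_t out = 0;
    for (int i = 0; i < 16; ++i) {
        assert(kBitMap[i] < 16);
        if (raw & (1u << kBitMap[i])) out |= static_cast<uint16_t>(1u << i);
    }
    return out;
}

// Two 256-entry tables, one per raw byte, instead of one 65536-entry table.
struct UnscrambleLut {
    std::array<uint16_t, 256> lo{};  // contribution of raw bits 0..7
    std::array<uint16_t, 256> hi{};  // contribution of raw bits 8..15
};

// Fill both tables once at startup using the slow reference version.
UnscrambleLut buildLut() {
    UnscrambleLut lut;
    for (unsigned v = 0; v < 256; ++v) {
        lut.lo[v] = unscrambleSlow(static_cast<uint16_t>(v));
        lut.hi[v] = unscrambleSlow(static_cast<uint16_t>(v << 8));
    }
    return lut;
}

// Per-sample cost: two table reads and one OR.
inline uint16_t unscrambleFast(const UnscrambleLut& lut, uint16_t raw) {
    return static_cast<uint16_t>(lut.lo[raw & 0xFF] | lut.hi[raw >> 8]);
}
```

Splitting the table works because each output bit depends on exactly one input bit, so the two halves can be combined with OR. The two small tables take 1 KiB in total, while a single 65536-entry table of 16-bit words would take 128 KiB.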
 
Thank you for your response!

The pins could be contiguous - avoiding those that are "blocked" by legacy Arduino stuff the audience out there would need. I have used pins all in a row in my previous "how fast can it go" thread.

I have not personally used DMA with the Teensy and was hoping to keep it all simple so as to reach a broader audience. It may be that it's too much to ask to go above about 2 Ms/s, which I've been able to approach with nothing more than tight, simple, non-DMA code. I'm not even sure I could get a precise timer on a 100 ns basis to hit 10 Ms/s.

Nope, I'm not waiting for yet another device with its long debug and library-fix cycle. Teensy 3.6 is solid, accessible and available. I'm trying to make something for the masses to learn from, so I'll just push it as fast as I can, and stop there.

Still happy to hire a guru to help make something happen here that would be shareable, understandable and reliable.
 
You should be able to get the timer working - IIRC, using the Teensy 3.6 clock and counting cycles will get you about 4 ns precision at 240 MHz. I've thought about creating a nanos() library, similar in functionality to micros() and millis(), but haven't gotten around to it yet. With that as a starting point, you could then create an elapsedNanos() library for easy implementation of timing, again similar to elapsedMicros and elapsedMillis.
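
To make the nanos()/elapsedNanos() idea concrete, here is a rough sketch in plain C++. No such library exists in this thread, so `cyclesToNanos` and `ElapsedNanos` are hypothetical names. The sketch assumes a free-running 32-bit up-counting cycle counter whose value is passed in by the caller. On a Teensy 3.x that would presumably be the ARM DWT cycle counter, which is an assumption here and would need to be enabled first.

```cpp
#include <cassert>
#include <cstdint>

// Convert a cycle count to whole nanoseconds using 64-bit integer math
// (truncates, so 1 cycle at 240 MHz reports 4 ns rather than 4.17 ns).
uint64_t cyclesToNanos(uint32_t cycles, uint32_t cpuHz) {
    assert(cpuHz > 0);
    return (static_cast<uint64_t>(cycles) * 1000000000ull) / cpuHz;
}

// Hypothetical elapsedNanos-style helper. The caller supplies the current
// value of a free-running 32-bit up-counter clocked at cpuHz.
class ElapsedNanos {
public:
    ElapsedNanos(uint32_t nowCycles, uint32_t cpuHz)
        : start_(nowCycles), cpuHz_(cpuHz) {
        assert(cpuHz > 0);
    }

    // Restart the interval from the given counter value.
    void reset(uint32_t nowCycles) { start_ = nowCycles; }

    // Unsigned subtraction stays correct across one counter wrap, so this
    // is valid as long as fewer than 2^32 cycles have passed since start.
    uint64_t nanos(uint32_t nowCycles) const {
        return cyclesToNanos(nowCycles - start_, cpuHz_);
    }

private:
    uint32_t start_;
    uint32_t cpuHz_;
};
```

At 240 MHz a 32-bit cycle counter wraps after roughly 17.9 seconds, so a helper like this only suits short intervals unless wraps are counted separately.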
 
oh i'd like to help, but I'm busy with T4 :)
a hardware-timer is for sure better than micros() or millis() or counting cycles.. this can all run "in hardware". never use software for exact timing.
 
Man, that would be so awesome! Any guidance as to where to look for the details would be appreciated. Of course, the ISR would have to be fast enough, and as it stands, with my clunky bit-banging to get around port-scrambling, I can barely hit 1.5 Ms/s at 16 bits just grabbing the bits, and maybe 200 ks/s shoving them into an array of floats and updating the tracking synthesizer. Nonetheless, if you think about it, nearly 100 dB theoretical SNR at 100 ks/s with a tracking generator (50 kHz bandwidth) isn't too bad for $100.

My sense though is that unless something like that gets into the main PJRC universe, it will not be broadly used. I am focusing on the hardware side and the "explain quantization noise and FFTs to newbies" side, and I think that's long overdue. I'll get ridiculed for my coding and long comments, but I don't care. You develop a thick skin in education.

Thanks!
Greg
 
Thanks. I really appreciate it and understand the overload that happens.

Yup, I know about "in hardware" (my area) but software "NOP" tuning can work in tight loops so I don't see it as a no-fly zone. Once had to build a 300 MIPS thingy using ECL. Ha, ha! At least then the manuals weren't 2700 pages and full of typos... :)

It just pains me to see such potent hardware so cheap with unintended barriers to wringing out the full performance (an opinion).
 
So, theoretically speaking, if we run Teensy 3.6 at 240 MHz, we can compute that a single tick would be 4.1667 ns. If we used the newest overclocking scheme, which can get us to 256 MHz, that drops to 3.9 ns per tick.
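
The tick arithmetic above also gives the cycle budget per sample at the rates asked for at the start of the thread. A small sketch in plain C++ follows. The function names are made up for illustration, and it only considers the CPU clock. Hardware timers are often clocked from a slower bus clock, so the real divisors on a Teensy may differ.

```cpp
#include <cassert>
#include <cstdint>

// Length of one CPU clock tick in nanoseconds.
double tickNanoseconds(double cpuHz) {
    assert(cpuHz > 0.0);
    return 1e9 / cpuHz;
}

// Whole CPU cycles available between consecutive samples.
uint32_t cyclesPerSample(uint32_t cpuHz, uint32_t sampleHz) {
    assert(sampleHz > 0);
    return cpuHz / sampleHz;
}

// True if the sample period is an exact whole number of CPU cycles,
// i.e. the rate can be hit without fractional-cycle jitter.
bool rateIsExact(uint32_t cpuHz, uint32_t sampleHz) {
    assert(sampleHz > 0);
    return cpuHz % sampleHz == 0;
}
```

At 240 MHz this works out to 240, 120 and 24 cycles per sample for 1, 2 and 10 MHz. At 256 MHz a 10 MHz rate would need 25.6 cycles per sample, so the faster clock is not automatically the better choice for an exact sample rate.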

The register you're looking for is SYST_CVR (the SysTick current value register, not an interrupt), and you can find it used in pins_teensy.c. Below is a short example where I use it to count the ticks in 1 us and then compute the length of a tick. Note that it counts down, which is why the subtraction looks like it's in the wrong order. Running this at 240 MHz gives me about 4.15 ns - pretty spot on with the theoretical result.

Code:
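void setup()
{
  Serial.begin(115200);
  while (!Serial) {}                  // wait for the USB serial port to open

  uint32_t ticks1 = SYST_CVR;         // SysTick current value (counts DOWN)
  delayMicroseconds(1);
  uint32_t ticks2 = SYST_CVR;

  // Down-counter, so elapsed = first - second. If SysTick reloaded between
  // the two reads (it reloads once per millisecond), add one full period back.
  uint32_t elapsed = ticks1 - ticks2;
  if (ticks2 > ticks1) elapsed += SYST_RVR + 1;

  Serial.println(elapsed);            // ticks counted during the ~1 us delay
  Serial.println(1000.0f / elapsed);  // approximate ns per tick
}

void loop()
{
}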
 
Your requirement to acquire 16-bit data at 1, 2, or 10 MHz seems to exceed the maximum sampling rate of the on-board analog input as listed in the data sheet.

See https://www.nxp.com/docs/en/data-sheet/K66P144M180SF5V2.pdf

The last entry of Table 31 (16-bit ADC operating conditions) states that the maximum ADC conversion rate is 461.467 kS/s.

That said, with regard to getting 16 bits and low noise, you might be interested in this: https://forum.pjrc.com/threads/53173-USB-DAQ-with-the-Teensy
 
Thank you and yes, it certainly does. The datasheet from Kinetis is typical of "non-analog" folks. It talks about bits, but not ENOB, which is way, way less than 16. There is no way a chip like this could provide 16-bit ENOB at 461 ks/s, or people would be using it as an ADC since it would be cheaper than equivalent ADC chips. My bogo-meter is reading pretty high.

A friend who founded SRS (big scientific and test equipment company) swears by differential measurements on that type of chip to push up the ENOB.

I have mapped (via FFT :) the operating speed ranges of the internal ADC's versus ENOB and can reproducibly get on the order of 100 ks/s at 10 or 12 bits. The whole reason for my inquiry was about using ports (a big mess) or crude bit-banging input from parallel (generally flash or semi-flash) external ADC's. I'd like to go Ms/s with 10+ bit true ENOB.

Thanks - interesting thread you provided. Not what I need, but nice work!
 
Thanks. That's very cool. I'm not sure looking at your code how to derive a periodic interrupt from that, and for now simply use the Teensy IntervalTimer routine. Internal ADC's cannot keep up except to maybe 100-ish ks/s, so with those I don't need to go faster. For the external ADC stuff though, it could be great.

I'm wondering how hard it would be to write an interrupt handler and fast-bit-banger in assembly. I've never tried with a Teensy.
 
We're getting 16 bits at 300 ks/s. The data that was listed in our thread is showing noise at less than 1 LSB at 200 ks/s.
 
Impressive noise level. My 100 ks/s-ish statement was without mention that I'm also using the internal DAC to generate a sine using DDS as a tracking generator for an FFT.
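
For readers who have not met DDS before, here is a rough sketch of the phase-accumulator technique in plain C++. It is not the code used in this setup. The table size, the names `makeSineTable`, `tuningWord` and `Dds`, and the scaling to 12-bit DAC codes are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kTableBits = 8;                // 256-entry sine table (assumed)
constexpr int kTableSize = 1 << kTableBits;

// One cycle of a sine scaled to 12-bit unsigned DAC codes (0..4095).
std::array<uint16_t, kTableSize> makeSineTable() {
    std::array<uint16_t, kTableSize> t{};
    const double twoPi = 6.283185307179586;
    for (int i = 0; i < kTableSize; ++i) {
        double s = std::sin(twoPi * i / kTableSize);
        t[i] = static_cast<uint16_t>(std::lround(2047.5 + 2047.5 * s));
    }
    return t;
}

// Phase increment per sample: f_out / f_sample * 2^32.
uint32_t tuningWord(double outHz, double sampleHz) {
    assert(sampleHz > 0.0 && outHz >= 0.0 && outHz < sampleHz / 2.0);
    return static_cast<uint32_t>(std::llround(outHz / sampleHz * 4294967296.0));
}

struct Dds {
    uint32_t phase = 0;  // 32-bit phase accumulator, wraps naturally
    uint32_t step = 0;   // tuning word

    // Return the DAC code for the current phase, then advance one sample.
    uint16_t next(const std::array<uint16_t, kTableSize>& table) {
        uint16_t code = table[phase >> (32 - kTableBits)];
        phase += step;
        return code;
    }
};
```

In a sketch like this, `next()` would be called once per sample from the same timer that paces the ADC, so the generated tone stays locked to the sample clock and lands on a predictable FFT bin.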
 
Oh yes, and with no (zero) additional hardware. Adding proper filtering helps a ton, but I'm starting with the minimalist setup to demonstrate quantization noise, aliasing, etc.
 
Yep, the DAC is 12 bits. We run it through a nice driver circuit, which you can see in the lower left part of the board, and indeed 12 bits is what we see.
 
So let me please ask a question on the ADC. Assuming you are using the internal 1.2V reference, 16-bit ENOB would require 18 microvolt LSB being above noise. I'm still scratching my chin that a Teensy 3.6 could ever achieve that, let alone at 400+ ksps.

I'll try an experiment of simply streaming grounded-input (mid-rail, not zero volts) data and seeing how it looks.

Standard method is to use an ultra-pure sine source at a low frequency (e.g., 1 kHz), at full-scale, and do an FFT (double precision). I have the code... just need the data. :)
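
The numbers in this post can be reproduced with a few lines of plain C++. This sketch uses the standard ideal-quantizer relation SNR = 6.02 N + 1.76 dB for a full-scale sine. The function names are made up for illustration, and the SINAD value would come from the FFT measurement described above, which is not shown.

```cpp
#include <cassert>
#include <cmath>

// Size of one LSB in volts for a given reference voltage and resolution.
double lsbVolts(double vrefVolts, int bits) {
    assert(bits > 0 && bits < 32);
    return vrefVolts / std::ldexp(1.0, bits);  // vref / 2^bits
}

// Ideal quantization-limited SNR in dB for a full-scale sine input.
double idealSnrDb(int bits) {
    return 6.02 * bits + 1.76;
}

// Effective number of bits from a measured SINAD in dB.
double enobFromSinadDb(double sinadDb) {
    return (sinadDb - 1.76) / 6.02;
}
```

With a 1.2 V reference, `lsbVolts(1.2, 16)` is about 18.3 microvolts, and with a 3.3 V full scale it is about 50.4 microvolts. `idealSnrDb(16)` is about 98.1 dB, and a measured SINAD of 74 dB corresponds to 12 effective bits.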
 
I'll have to go back and see what we did, but as I recall, on the Teensy 3.1, the default vref is a filtered version of the VOUT33. I am pretty sure that 3.3V is our full scale at the input to the Teensy. We set up other ranges, including +/-, in the front end. We have a well-filtered reference on the board as well.

I think SMT, along with the parts choices that gives you, plus some good layout and routing, is part of it too.

P.S. - just noticed your proposed test. Don't forget to make sure your arrangement for the grounded input is at least as quiet as what you hope to measure. It seems challenging with a hobby board and jumper wires.
 
Would it meet your requirements to have a similar sort of board that has a mount for the teensy, and with fast high precision ADC's on it? I need to check the interfaces available on the teensy 3.6 of course. Otherwise, it would be similar to boards that we built previously using another processor family.
 
Sure - that would be great. Right now, I'm pushing for more speed. I am about to post about what can be done in terms of sample rates with external flash ADC's (not that much without a fairly simple external circuit). My goal was minimal hardware, cheap, open everything, for teaching.

I'm not sure how to send direct emails - will check.
 