Thread: Teensy 4.0 Bitbang - FAST.

  1. #1

    Teensy 4.0 Bitbang - FAST.

    I couldn't help but test the new boards when they arrived. Super simple code, just to see how fast a pin could swing. Answer: 150 MHz.

    This was done without a load, with no understanding (as yet) of whether slew-rate limiting applies (as it does on the Teensy 3.6), and without the best grounding (hence the negative ground bounce). Still, freakin' cool.

    Is anyone working on port mapping for parallel IO? It will be insanely fast, and there are chips that can handle it for making and sampling signals (my focus).

    [Image: Teensy4BitBang.png]

    Just FYI, the setup: Agilent MSO-X 4104A with an N2795A 1 GHz probe. Hopelessly, awfully wrong grounding. This is RF. I couldn't resist though.

    May do it again later with a 50 GHz scope just for fun.

    [Image: IMG_1966.jpg]

    Thanks for reading.

    Greg

  2. #2
    PaulStoffregen (Senior Member)
    Would it be ok to post this on social networking? Are you on Twitter, so I can give you credit?

  3. #3

    Improved measurements...

    Paul,
    I don't use social media, but please feel free to post or use anything I put up here. My aim is to help the community. Thank you and your co-developers SO MUCH for the ARM-based Teensy versions. They have been complete game-changers for research, education and, as I see from the internet, animated Chewbacca costumes and such.

    I re-did the measurements with a 100 Ohm load and used D2 as the output for closer grounding. The load limited the swing to 3V but that's ok.

    Confirmed clean, jitter-free 150MHz bit banging is now possible. Woo-hoo!

    With port I/O, one could move rivers of bits at speed. For now, it has to be done bit-wise.
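A minimal host-side sketch of what a parallel port write would involve, assuming a GPIO port that exposes separate set and clear registers and eight outputs wired to contiguous bits of that port. The names `PortMasks` and `byteToPortMasks` and the bit offset are hypothetical; the actual Teensy 4.0 pin-to-port mapping is not covered here.

```cpp
#include <cstdint>

// Hypothetical helper: masks to write into a port's "set" and "clear"
// registers so that 8 contiguous port bits take on the value of `data`.
struct PortMasks {
    uint32_t setMask;    // bits to drive high
    uint32_t clearMask;  // bits to drive low
};

// `shift` is the (assumed) position of the least significant data bit
// within the 32-bit port; it must be 24 or less for all 8 bits to fit.
inline PortMasks byteToPortMasks(uint8_t data, unsigned shift) {
    const uint32_t field = 0xFFu << shift;           // the 8 bits we own
    const uint32_t value = uint32_t(data) << shift;  // data moved into place
    return PortMasks{ value, field & ~value };
}
```

Under those assumptions, two register writes would update all eight pins, instead of eight separate digitalWriteFast calls.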

    Much more coming as I put the Teensy 4.0 through its paces.

    Thanks,
    Greg

    Code:
    //First test of Teensy 4.0 bit-bang speed 8/11/19
    //Note - not yet clear if there are slew-rate control options or not-yet-implemented speed-up techniques.
    //Compiler setting: "Fastest"

    void setup() {
      pinMode(2, OUTPUT);
      // CORE_PIN10_CONFIG = PORT_PCR_MUX(1); // no slew rate limit DOES NOT WORK on TEENSY 4

      //See about 150MHz output rate.
    }

    void yield () {} //Get rid of the hidden function that checks for serial input and such.

    FASTRUN void loop() {
      noInterrupts();

      while (1)
      {
        digitalWriteFast(2, HIGH);
        digitalWriteFast(2, LOW);
      }
    }


    Setup:

    [Image: Setup_For_BitBang.jpg]

    Waveforms (horizontal scale 2ns/div for first, 500 ps/div for rising and falling edges):

    [Image: 100_Ohm_Load_Bit_Bang.png]

    [Image: RisingEdge.png]

    [Image: FallingEdge.png]

  4. #4
    Incidentally, adding __asm__ __volatile__ ("nop\n\t"); for delays does not (yet) work, although this is my main "tried and true" delay mechanism on the Teensy 3.6.

    Is there something coming for precise code delays?

    Simply doubling up the HIGH and LOW writes produces exactly half of the frequency seen above, going from 150.0 MHz to 75.0 MHz.

    Code:
    void yield () {} //Get rid of the hidden function that checks for serial input and such.

    FASTRUN void loop() {
      noInterrupts();

      while (1)
      {
        digitalWriteFast(2, HIGH);
        __asm__ __volatile__ ("nop\n\t"); //This has no effect.
        digitalWriteFast(2, HIGH);
        digitalWriteFast(2, LOW);
        __asm__ __volatile__ ("nop\n\t"); //This has no effect.
        digitalWriteFast(2, LOW);
      }
    }

  5. #5
    defragster (Senior Member+)
    The nop may be executing in parallel? Either directly, from the dual-issue execution, or indirectly, while the bus clock is coming around for the change?

    Look into: static inline void delayNanoseconds(uint32_t nsec)
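For a sense of the resolution such a delay can offer: assuming a 600 MHz core clock, one CPU cycle is about 1.67 ns. The conversion below is only an illustrative host-side sketch; `nsToCycles` is a hypothetical helper and is not necessarily how the core's delayNanoseconds is implemented.

```cpp
#include <cstdint>

// Hypothetical helper: whole CPU cycles contained in `nsec` nanoseconds
// at a clock of `cpuHz`, rounded down. The 64-bit intermediate avoids
// overflow; the result fits in 32 bits as long as cpuHz is at most 1 GHz.
inline uint32_t nsToCycles(uint32_t nsec, uint32_t cpuHz) {
    return uint32_t((uint64_t(nsec) * cpuHz) / 1000000000ull);
}
```

At the assumed 600 MHz, a 10 ns request corresponds to 6 cycles, and anything under about 1.67 ns rounds down to zero cycles.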

    Inside the tight while(1) in use here, the yield() doesn't apply; it is only called on return from loop() before re-entry.

    It seems 'port' based I/O is not supported on T4?

    Early in beta there seemed to be a difference in the timing of transitions? That was based on ARM_DWT_CYCCNT; I wonder what the scope shows for these:

    Code:
    while (1)
    {
      digitalWriteFast(2, HIGH);
      digitalWriteFast(2, LOW);
    }
    Code:
    while (1)
    {
      digitalWriteFast(2, LOW);
      digitalWriteFast(2, HIGH);
    }
    Code:
    while (1)
    {
      digitalWriteFast(2, !digitalReadFast(2));
    }

  6. #6
    Here you go, scope photos in order of your examples.

    First two: exactly the same, 150 MHz, symmetrical.

    Last one: 23.077 MHz, symmetrical. 6.5X slowdown.
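As a rough cross-check, assuming the core runs at 600 MHz (an assumption for these tests, since the clock setting is not stated with them), the measured frequencies convert to whole core cycles per output period. `cyclesPerPeriod` is a hypothetical helper for illustration only.

```cpp
#include <cmath>

// Hypothetical helper: CPU cycles per output period, rounded to nearest.
inline long cyclesPerPeriod(double cpuHz, double outHz) {
    return std::lround(cpuHz / outHz);
}
```

Under that assumption, 150 MHz works out to 4 cycles per period (2 per write), the 75 MHz seen with doubled writes to 8 cycles, and 23.077 MHz to 26 cycles, or 13 cycles per read-and-write toggle.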

    [Image: Defrag1.png]

    [Image: Defrag2.png]

    [Image: Defrag3.png]

  7. #7
    I'm trying basic direct digital synthesis (DDS) that works perfectly on the 3.6, and I'm getting a strange "wobble" in frequency. It seems random and is about 10% around the desired frequency. The basic idea is to make a look-up table (LUT) of the waveform to be synthesized and "walk through" it using the top bits of a 32-bit accumulator. The accumulator is repeatedly incremented by a "fraction" set by the ratio of the desired output frequency to the effective sample rate of the loop, which is determined empirically for the actual loop code. The LUT output is written to a set of pins, which are connected to a D/A converter.

    Any ideas? Are there background processes that might be taking variable amounts of time and not suppressed by interrupt priority (assuming that even works with 4.0 as I have used it)?

    Code:
    //  Port-Write DDS Experiments, Teensy 4.0, 600MHz clock
    //  G. Kovacs, 8/12/19
    //  Calculate an 8-bit (straight binary) sinewave look-up-table, computed as uint16_t and shifted down to 8 bits.
    //  LS bit = D0, MS-bit = D7.
    
    #define arraySize 256  //Must be a power of two, and usually the same number of points as the number of DAC steps, here 256.
    
    const int numbitsLUT = __builtin_ctz(arraySize);  //Number of bits needed to address the DDS LUT (log2 of arraySize via a GCC builtin, exact for a power of two, with no floating-point truncation). Used below to work out how far to shift the DDS accumulator.
    const uint16_t DDSshift = 32 - numbitsLUT;    //use const for speed
    
    uint8_t  waveform[arraySize];
    uint16_t synthTemp;
    
    int pinPoint;
    uint8_t pointOut;
    uint16_t pointer = 0;
    uint32_t sum = 0;     //This is the DDS "accumulator" that rolls over at a frequency determined by "pointerIncrement," thus defining the output frequency.
    float freqOut = 100.000E3;                //Desired output frequency
    const float measuredSampleRate = 10.1E6; //Effective sample rate, determined by trial and error for a given code version.
    
    uint32_t pointerIncrement = 0;
    
    void setup() {   
        for (int i = 0; i < 8; i++) {pinMode(i, OUTPUT);}
    
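        pointerIncrement = uint32_t((double(freqOut)/measuredSampleRate)*4294967296.0+0.5);  //Tuning word = (f_out/f_sample)*2^32, rounded. Unsigned cast so that ratios of 0.5 and above do not overflow a signed int.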
        
        for (uint16_t i = 0; i < arraySize; i++)
        {
            // Data is scaled to 0..255, unsigned, 8-bit binary for DAC.
            // Dividing by arraySize (rather than arraySize-1) makes the table hold exactly one period, so the end point is not repeated at wrap-around.
            synthTemp = (32767.5+(32767.5*sin(2*3.141592654*(float(i)/arraySize))));  //Here use a sinewave, but could be anything desired.

            waveform[i] = synthTemp>>8; // shift data here for 8-bit external DAC. Adjust as necessary.
        }
          //See: https://forum.pjrc.com/threads/27690-IntervalTimer-is-not-precise
          //Technique to reduce intervalTimer jitter.
          SCB_SHPR3 = 0x20200000;  // Systick = priority 32 (defaults to zero, or highest priority)
    }
    
    void yield () {} //Get rid of the hidden function that checks for serial input and such.
    
    FASTRUN void loop() 
    // Use FASTRUN to force code to run in RAM.
    {   
      noInterrupts();
      while (1) //Loop inside void loop () avoids the overhead of the main loop.
    
     {
         pointer = sum >> DDSshift; //Keep only the top bits of the accumulator so they index the waveform look-up table. Change as needed.
         pointOut = waveform[pointer];
         
         digitalWriteFast(7, (0x80 & pointOut)); //MSbit
         digitalWriteFast(6, (0x40 & pointOut));
         digitalWriteFast(5, (0x20 & pointOut));  
         digitalWriteFast(4, (0x10 & pointOut)); 
         digitalWriteFast(3, (0x08 & pointOut)); 
         digitalWriteFast(2, (0x04 & pointOut)); 
         digitalWriteFast(1, (0x02 & pointOut)); 
         digitalWriteFast(0, (0x01 & pointOut)); //LSbit   
    
    //    The value added to "sum" determines the output frequency: f_out = pointerIncrement * sampleRate / 2^32, so larger values translate to higher frequencies. That's DDS!
    //     sum = sum + 0x80000000;  //Test value: steps half the table per sample, i.e. the Nyquist rate, or 1/2 of the effective sample rate (maybe 0x7FFFFFFF?).
         sum = sum + pointerIncrement;  //Advance the phase accumulator by the tuning word.
       }
    }

    With a simple R2R D/A it works except for the "wobble" - the spikes are because the output is not yet lowpass filtered. On the Teensy 3.6, I can do >8 Megasamples/second at 16 bits out using port writes.
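The phase-accumulator arithmetic can be checked on a host before it goes into the sketch. The block below is a minimal sketch, not the code used for the measurements; `tuningWord`, `actualFrequency`, and `lutIndex` are hypothetical helper names, and the 10.1 MHz sample rate is the empirically determined figure from the code above.

```cpp
#include <cstdint>

// Tuning word for a 32-bit phase accumulator: round(fOut / fSample * 2^32).
// Intended for 0 <= fOut <= fSample / 2 (the Nyquist limit).
inline uint32_t tuningWord(double fOut, double fSample) {
    return uint32_t(fOut / fSample * 4294967296.0 + 0.5);
}

// Frequency actually produced by a given tuning word.
inline double actualFrequency(uint32_t word, double fSample) {
    return double(word) * fSample / 4294967296.0;
}

// The top `lutBits` bits of the accumulator select the table entry.
// `lutBits` must be in the range 1..32.
inline uint32_t lutIndex(uint32_t accumulator, unsigned lutBits) {
    return accumulator >> (32u - lutBits);
}
```

With a 10.1 MHz sample rate the frequency step is 10.1e6 / 2^32, or about 0.0024 Hz, so a steady tuning word on its own would not account for a wobble of around 10%.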

    [Image: DDSWobble.png]

  8. #8
    defragster (Senior Member+)
    Originally posted by StanfordEE:
    Here you go, scope photos in order of your examples.

    First two: exactly the same, 150 MHz, symmetrical.

    Last one: 23.077 MHz, symmetrical. 6.5X slowdown.

    ...
    Interesting - it really squares off better with more time, i.e. when zoomed out of the transitions by 6.5X.

    Should have added a 4th query to drop Read() time:
    Code:
    bool tgl=true;
    while (1)
    {
      digitalWriteFast(2,tgl );
      tgl=!tgl;
    }
    I wrote a sketch that measures with FreqCount: it shows 30 MHz for that, and the Write( !Read ) version gives a frequency of 23 MHz. FreqCount doesn't measure well above that.

    Also check out this post :: https://forum.pjrc.com/threads/54711...l=1#post212280

    for sample:
    Code:
    static inline void delayCycles(uint32_t) __attribute__((always_inline, unused));
    static inline void delayCycles(uint32_t cycles)
    { // Minimum return is in 7 cycles; near 20 cycles the wait is within +/- 2 cycles (including sketch timing overhead)
    	uint32_t begin = ARM_DWT_CYCCNT-12; // Overhead Factor for execution
    	while (ARM_DWT_CYCCNT - begin < cycles) ; // wait 
    }
    Not sure if that has reason to go into the Teensy Core Code?
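One property worth noting in the delayCycles loop above: because the cycle counter and `begin` are both unsigned 32-bit values, the subtraction stays correct even when the counter wraps around between the two reads. A small host-side sketch of just that arithmetic follows (`elapsedCycles` is a hypothetical name; on the Teensy the counter value would come from ARM_DWT_CYCCNT).

```cpp
#include <cstdint>

// Cycles elapsed between two reads of a free-running 32-bit counter.
// Unsigned subtraction is modulo 2^32, so a single wrap-around between
// the two reads is handled without any special case.
inline uint32_t elapsedCycles(uint32_t begin, uint32_t now) {
    return now - begin;
}
```

Assuming a 600 MHz clock, the counter wraps roughly every 7.16 seconds, so this matters for any long-running sketch that uses such delays.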