Soft SPI on T4.x

Status
Not open for further replies.

oric_dan

1. First off, I've tried to find a Soft SPI library for Teensy 4.x, but found little. Does someone have one? You'll know why I ask in a bit.

So, I downloaded and installed Teensyduino v1.53 on Arduino IDE v1.8.12 a couple of days ago.

2. I tried using 2 instantiations of the ILI9341_t3 class to operate two standard Lcd modules from SPI0 (using different CS pins of course), but it wouldn't work for me. I think other people do it, but I didn't spend a lot of time debugging. (I need to try this further).

3. What I did do was try using Soft SPI with the ILI9341 displays and the T4.x, but that constructor no longer exists in the ILI9341_t3 library. So I grabbed the Adafruit_ILI9341 library circa 2016 from my old Teensyduino install on IDE v1.6.13, and at first it wouldn't compile, but I hacked on it a bit, and got it working.

4. So the results using ILI9341_t3 and the old Soft SPI code with the standard graphicstest program are shown below. Of interest, amazingly the Soft SPI test runs on average "only" 1.42X slower. I thought it might be "much much slower".

5. Now to the point: it occurs to me that Soft SPI is so fast on the T4.x that I can use multiple instantiations on different sets of pins, and not worry about issues with the Hard SPI ports. (I don't need blazing SPI speeds). Hence my interest in a Soft SPI library, per item 1 above.

6. Now the question mark: looking at the old code for Soft SPI ops, I notice there are no delays in it, so I am wondering how the darn thing was throttled back far enough on the T4.x (running at 1.008 GHz, BTW) that it ran fine on the Lcd, which I believe is limited to a 30-MHz SPI rate. Below is some of the old code; you'll notice it bit-bangs SPI, but there are no delays in there.

7. I've noticed on other threads people doing I/O bit-banging at upwards of 150-MHz on the T4.x, so why isn't this code blasting the Lcd all to heck?

Code:
void Adafruit_ILI9341::spiwrite(uint8_t c) {

  //Serial.print("0x"); Serial.print(c, HEX); Serial.print(", ");

  if (hwSPI) {
#if defined (__AVR__)
    uint8_t backupSPCR = SPCR;
    SPCR = mySPCR;
    SPDR = c;
    while(!(SPSR & _BV(SPIF)));
    SPCR = backupSPCR;
#elif defined(TEENSYDUINO)
    SPI.transfer(c);
#elif defined (__arm__)
    SPI.setClockDivider(11); // 8-ish MHz (full! speed!)
    SPI.setBitOrder(MSBFIRST);
    SPI.setDataMode(SPI_MODE0);
    SPI.transfer(c);
#endif
  } else {
    // Fast SPI bitbang swiped from LPD8806 library
    for(uint8_t bit = 0x80; bit; bit >>= 1) {
      if(c & bit) {
        //digitalWrite(_mosi, HIGH);
        *mosiport |=  mosipinmask;
      } else {
        //digitalWrite(_mosi, LOW);
        *mosiport &= ~mosipinmask;
      }
      //digitalWrite(_sclk, HIGH);
      *clkport |=  clkpinmask;
      //digitalWrite(_sclk, LOW);
      *clkport &= ~clkpinmask;
    }
  }
}


(04/26/21) T4.1 Hard SPI
------------------------
ILI9341 Test...
Benchmark Time (microseconds)
Screen fill 205264
Text 8385
Lines 64513
Horiz/Vert Lines 17419
Rectangles (outline) 11135
Rectangles (filled) 421431
Circles (filled) 64281
Circles (outline) 43112
Triangles (outline) 15276
Triangles (filled) 144514
Rounded rects (outline) 16782
Rounded rects (filled) 466422
Done!


////////////////////////////////////////////

(04/26/21) T4.1 Soft SPI
------------------------
ILI9341 Test...
Benchmark Time (microseconds)
Screen fill 264125 (1.28X slower)
Text 13874 (1.66X)
Lines 133240 (2.07X)
Horiz/Vert Lines 21778 (1.25X)
Rectangles (outline) 13841 (1.24X)
Rectangles (filled) 533664 (1.26X)
Circles (filled) 63560 (0.99X)
Circles (outline) 58224 (1.35X)
Triangles (outline) 30302 (1.99X)
Triangles (filled) 181690 (1.25X)
Rounded rects (outline) 27332 (1.63X)
Rounded rects (filled) 544964 (1.17X)
Done!
overall average (1.42X slower)
 
Now, I have another question. I put that bit-bang code into a for-loop and ran it 1,000,000 times. It took 322-msec, so that's about 3.1-MBytes/sec, or equivalent to a 31-MHz SPI bit clock.

So, that's fine for the Lcd, but now I wonder why the code executes so slowly on a chip running at 1.008 GHz?
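
A measurement like this one can be sketched as a small timing harness. This is only a desktop-compilable illustration, not the code actually used for the numbers above: `timeLoopMicros` is a made-up name, and on a Teensy one would more likely read micros() before and after the loop and call the real spiwrite() inside it.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: calls fn(byte) `iterations` times and returns the
// elapsed wall-clock time in microseconds.
template <typename Fn>
std::int64_t timeLoopMicros(Fn&& fn, std::uint32_t iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < iterations; ++i) {
        // Vary the byte so the loop body is not working on a constant.
        fn(static_cast<std::uint8_t>(i));
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Dividing the iteration count by the elapsed time gives bytes per second, and multiplying by 8 gives the equivalent SPI bit clock.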
 
Sorry, I am not sure what all is going on here, and there are several other things I am playing with right now, so I won't be much help.

But if it were me, I think I would spend a few more minutes debugging why you could not get the two displays to work on SPI...

The most likely cases I have run into include:
a) Using the same pin for Reset on both displays and passing it in on both constructors, so when each one starts up it wipes out the other as well.
b) State of CS pins at startup. That is, you want both to be pulled High before calling the begin functions. You can do it with external Pull Up resistors on each CS, OR you can have your code do something like:
pinMode(CS1, OUTPUT); digitalWrite(CS1, HIGH); pinMode(CS2, OUTPUT); digitalWrite(CS2, HIGH); tft1.begin(); tft2.begin();
...
 
@KurtE. Quick follow-up. Regarding running 2 ILI9341 Lcds both from SPI0 on the T4.0, I got that working simply by moving the 2nd Lcd's Reset to a separate I/O pin. So, thanks for the help, :).

Each CS already had its own I/O pin, but, without specifically checking, I had figured Reset would only be pulsed once during tft.begin(). Apparently not. So, looks good, 2 Lcds off SPI0. The board h.w. is getting normalized now.
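
To make the failure mode concrete, here is a toy desktop model of the shared-Reset problem. It is not real driver code: `Panel`, `Board`, and `bothInitialized` are made-up names, and the model assumes only what was described above, namely that each begin() pulses that display's Reset pin and then initializes that one display.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: a panel forgets its initialization whenever its Reset line is pulsed.
struct Panel {
    int resetPin;
    bool initialized;
    explicit Panel(int pin) : resetPin(pin), initialized(false) {}
};

struct Board {
    std::vector<Panel> panels;

    // Pulsing a Reset pin hits every panel wired to that pin.
    void pulseReset(int pin) {
        for (Panel& p : panels) {
            if (p.resetPin == pin) p.initialized = false;
        }
    }

    // Models a driver's begin(): pulse this panel's Reset pin, then send the
    // init sequence to this panel only (selected by its own CS).
    void begin(std::size_t index) {
        pulseReset(panels[index].resetPin);
        panels[index].initialized = true;
    }
};

// Returns true if both panels are still initialized after begin() on each.
bool bothInitialized(int rst1, int rst2) {
    Board board;
    board.panels.push_back(Panel(rst1));
    board.panels.push_back(Panel(rst2));
    board.begin(0);
    board.begin(1);  // with a shared Reset pin this wipes out panel 0
    return board.panels[0].initialized && board.panels[1].initialized;
}
```

In this model, bothInitialized(8, 8) comes back false and bothInitialized(8, 7) comes back true, which matches what happened once the 2nd Lcd's Reset moved to its own pin. The pin numbers are arbitrary.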

Also, I did a bit more testing on the Soft SPI spiwrite() function, listed above. Changing from those port pointers (eg, *mosiport) to digitalWrite() did slow down execution from the equivalent of a 31-MHz SPI clock to 8.6-MHz.

I am still investigating why it's still so slow at 31-MHz, given the 1.008-GHz processor speed. I'm assuming the pointer usage as defined is still not all that efficient.

EDIT: actually, more like 24.8-MHz rather than 31-MHz as stated previously.
 
A little more testing on the spiwrite() function from the original post. Finally got around to using digitalWriteFast() and it executes 1,000,000 loops in 108-msec. Equivalent to about a 128-MHz SPI clock using "Soft" SPI. That's in line with what StanfordEE is getting on the other thread.

https://forum.pjrc.com/threads/57185-Teensy-4-0-Bitbang-FAST

So, who needs Hard SPI with the T4.x? Just kidding. Now I can have many peripherals on different pins, but it's necessary to build in delays to slow it down for most peripherals. So, now I have most of the issues from my original post solved.

Hooray for Paul and the Teensy 4.x modules !!!
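
For anyone following along, here is a minimal sketch of a throttled bit-bang write in the spirit of the spiwrite() loop from the original post. It is not the Adafruit or ILI9341_t3 code. The `Pins` policy type, `RecordingPins`, `softSpiWrite`, and `sampledByte` are names made up for illustration. On a Teensy 4.x the policy would presumably forward the pin writes to digitalWriteFast() and the half-bit delay to something like delayNanoseconds(); treat the exact delay call and its value as assumptions. The recording policy here just lets the bit order and clock edges be checked on a desktop compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pin policy for host testing: records MOSI at each rising
// clock edge (SPI mode 0 sampling point) and counts half-bit delays.
struct RecordingPins {
    std::vector<int> mosiAtRisingClock;
    int mosi = 0;
    int clk = 0;
    unsigned delays = 0;

    void writeMosi(int v) { mosi = v; }
    void writeClk(int v) {
        if (v && !clk) mosiAtRisingClock.push_back(mosi);
        clk = v;
    }
    void halfBitDelay() { ++delays; }  // on hardware: a short busy-wait
};

// Shifts one byte out MSB first, mode 0: data is set up while the clock is
// low, then the clock is pulsed high and low with a delay before each edge.
template <typename Pins>
void softSpiWrite(Pins& pins, std::uint8_t c) {
    for (std::uint8_t bit = 0x80; bit; bit >>= 1) {
        pins.writeMosi((c & bit) ? 1 : 0);
        pins.halfBitDelay();
        pins.writeClk(1);
        pins.halfBitDelay();
        pins.writeClk(0);
    }
}

// Rebuilds the byte a mode-0 peripheral would have sampled.
std::uint8_t sampledByte(const RecordingPins& p) {
    assert(p.mosiAtRisingClock.size() == 8);
    std::uint8_t value = 0;
    for (int bit : p.mosiAtRisingClock) {
        value = static_cast<std::uint8_t>((value << 1) | bit);
    }
    return value;
}
```

Making the pin access a template parameter keeps the loop itself identical for every set of pins, so several Soft SPI "ports" only differ in the policy object passed in.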
 
Dang, my calculator keeps throwing an operator error. Must need new batteries. 1,000,000 loops in 0.108 sec corresponds to 9.26-MBytes/sec or 74-Mbps, not 128-Mbps.
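
For the record, the arithmetic can be written out as below. This is just a check of the numbers quoted in this thread, assuming each spiwrite() call shifts out 8 bits and the CPU runs at 1.008 GHz; the function names are mine.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Equivalent SPI bit clock in MHz for a loop that shifts out `bytes` bytes
// (8 bits each) in `seconds`.
double equivalentClockMHz(std::uint64_t bytes, double seconds) {
    assert(seconds > 0.0);
    const double bitsPerSecond = static_cast<double>(bytes) * 8.0 / seconds;
    return bitsPerSecond / 1e6;
}

// Average CPU cycles spent per SPI bit at a given CPU clock.
double cyclesPerBit(double cpuMHz, double spiMHz) {
    assert(spiMHz > 0.0);
    return cpuMHz / spiMHz;
}
```

With the thread's figures, 322 msec per 1,000,000 bytes works out to roughly 24.8 MHz (about 40 CPU cycles per bit), and 108 msec to roughly 74 MHz (about 14 cycles per bit).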

I reran the graphicstest demo again using the Soft SPI spiwrite() with digitalWriteFast(). The demo now runs 1.25X "faster" than using standard Hard SPI, whereas the original older Adafruit Soft SPI demo (described above) ran 1.42X slower.

A picture of this project is on the other thread:
https://forum.pjrc.com/threads/67075-My-Latest-2-GHz-Dual-Core-Teensy-Project?p=278248#post278248
 