Teensy 4.0 First Beta Test

Status
Not open for further replies.
Softwareserial:
Code:
"C:\\Arduino\\hardware\\teensy/../tools/arm8/bin/arm-none-eabi-g++" -c -O2 -g -Wall -ffunction-sections -fdata-sections -nostdlib -MMD -std=gnu++14 -fno-exceptions -fno-rtti -fno-threadsafe-statics -felide-constructors -Wno-error=narrowing -mthumb -mcpu=cortex-m7 -mfloat-abi=hard -mfpu=fpv5-d16 -D__IMXRT1052__ -DTEENSYDUINO=146 -DARDUINO=10808 -DF_CPU=396000000 -DUSB_SERIAL -DLAYOUT_US_ENGLISH "-Ic:\\temp\\arduino_build_958717/pch" "-IC:\\Arduino\\hardware\\teensy\\avr\\cores\\teensy4" "-IC:\\Arduino\\hardware\\teensy\\avr\\libraries\\SoftwareSerial" "C:\\Arduino\\hardware\\teensy\\avr\\libraries\\SoftwareSerial\\SoftwareSerial.cpp" -o "c:\\temp\\arduino_build_958717\\libraries\\SoftwareSerial\\SoftwareSerial.cpp.o"
C:\Arduino\hardware\teensy\avr\libraries\SoftwareSerial\SoftwareSerial.cpp:260:2: error: #error This version of SoftwareSerial supports only 20, 16 and 8MHz processors
 #error This version of SoftwareSerial supports only 20, 16 and 8MHz processors
  ^~~~~
 
oh... took me quite some time to figure out that _pimxrt_spi is = 0... why?? Kurt? pls help :) looks like i'm doing something wrong :)
nothing works..


Edit: so many defines... #ifdefs... that code is complicated. How about splitting it into different files?
Edit: one more warning:
Code:
C:\Users\Frank\Documents\Arduino\libraries\ILI9341_t3n-T4_WIP\ILI9341_t3n.cpp: In member function 'uint16_t ILI9341_t3n::readPixel(int16_t, int16_t)':
C:\Users\Frank\Documents\Arduino\libraries\ILI9341_t3n-T4_WIP\ILI9341_t3n.cpp:1347:9: warning: 'colors' is used uninitialized in this function [-Wuninitialized]
  return colors;
         ^~~~~~
Edit: one important hint:

so... i'd not read that register. instead use a shadow-copy in a variable(?)
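
One way to read that hint, as a minimal sketch: keep the last value written to the register in an ordinary variable and modify that copy, so the hardware register is only ever written. The post does not name the register, so `fake_hw_reg`, `ShadowedReg`, and the bit values are hypothetical stand-ins, not code from the library.

```cpp
#include <cstdint>

// Hypothetical stand-in for a memory-mapped register that should not be read back.
volatile uint32_t fake_hw_reg = 0;

// Keeps a RAM copy of the last value written, so read-modify-write
// operations never touch the hardware register for the "read" part.
class ShadowedReg {
public:
  ShadowedReg(volatile uint32_t &reg, uint32_t initial)
      : reg_(reg), shadow_(initial) {
    reg_ = shadow_;  // bring hardware and shadow copy into agreement
  }
  void setBits(uint32_t mask)   { shadow_ |= mask;  reg_ = shadow_; }
  void clearBits(uint32_t mask) { shadow_ &= ~mask; reg_ = shadow_; }
  uint32_t value() const { return shadow_; }  // served from RAM, not from hardware

private:
  volatile uint32_t &reg_;
  uint32_t shadow_;
};
```

The cost is that the shadow copy goes stale if anything else (DMA, another driver) writes the register behind its back, so this only fits registers with a single owner.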

Hi Frank - Lots of complications in the code. Note that this code (and SPIN) originated when I was doing testing for the T3.6.

One of the main things I wanted was the ability to use any of the SPI Objects, not hard coded to SPI. So depending on which processor... Then with T3.6 SPI1 only had one hardware CS pin (unless you used ones on SD Card...) So worked out support for only using one CS pin... Then ...

So I created the SPIN objects to have a class object where I had hardware definitions for all of the SPI objects, and had wrappers to call through to the native SPI objects, which at that time were not derived from a common class... Later most of the SPIN stuff was migrated into the SPI library, including a pointer to the SPI hardware register structure... But in the SPI library this was changed to be the member "port", and to handle issues about constructors, in SPI it was changed to use:
port().<register name>

At some point I would like to remove the SPIN library and use SPI, but I added some of the original ili9341_t3 functions to SPIN to manage knowing when the queue was full. Before then I will probably convert all of the variables like _pimxrt_spi, _pkinetisk_spi, and _pkinetisl_spi to port...

Issues about getting anything to work:
As was mentioned in some other threads: This library will only work if the DC pin is on a hardware CS pin. Only pin 10 is defined. Paul pulled in the code that enabled this in the SPI library. setCS(10)

Update on the testing, I think I am having some better luck converting the DMA usage to NOT use: LPSPI_TCR_RXMSK but instead use the
_pimxrt_spi->CFGR1 |= LPSPI_CFGR1_NOSTALL

But still trying to figure out what the best way to exit out of this.

Thanks for fixing the typo earlier.

The unused color variable is because I still have not implemented the appropriate stuff to do the read operations, so code is #ifdefed out.
 
Kurt I've just used the sketch that was attached to your post, (yes I swapped the wires for CS and DC on my breadboard) and the latest SPIN library.

RXMSK worked better for me than NOSTALL
Ok, I'm sure you'll get it running. Perhaps try checking the IDLE bit to see if the last transfer is done.
 
Tim:
played a few minutes with ldrex/strex:
Code:
//…
  __LDREXW(&dummy);
//…
 } while ( __STREXW(0, &dummy));
//…
It just detects interrupts :) so.. if an interrupt is detected, it repeats the loop.
simple.

Frank - So the &dummy used doesn't really need to be changed in the _isr - it will trigger on any interrupt? I found this to match my running results.

I played a few hours with this and have a working sample - on github. I moved a simulation all into a sketch so I didn't have to recompile the core, and I could run _isr from an IntervalTimer.

IntervalTimer works perfectly with the code from the pull request [except the _isr corruption this fixes] to generate usable us values at 1000/sec. Without 'protection', the test in loop() catches invalid micros() returns when the _isr hits. With the __LDREXW [or my simple test value] it is reliable!

BUT, I hoped to run it faster. The F_CPU&us math doesn't work, so I just faked the return value with count++. I still had to test the values for an _isr change, so I read them a second time with the same process; if both reads agree [even with an _isr during the first calc] and show a new us, an increased count is returned. This code is ~4 times slower [adding the protection code actually runs FASTER than the old checks in the normal case], but it allows using the IntervalTimer with 100us updates instead of 1000us: a double test running 10X faster and spending longer in the code, reading the values twice, so it catches an _isr more often. If the _isr hits on the second read, the values won't compare; I accept that as a sign the first read worked and the _isr was detected on the second, so that is a good sign.

A test pass of 100,000,000 'fake micros' only takes about 31 secs with an _isr hitting every 100 us. One test that caused trouble [slipped a 1us tick only in the first loop() pass] was a cold start of the Teensy with code already loaded. It seems I resolved that, as it no longer happens with a 1ms or 0.1ms _isr.

If somebody wants to give it a look - I'll update my Pull Request to use the protect instructions. In the github sketch are three versions of micros() [comment the #define at top to change]: the simple value, the one with Frank's referred sample, and the FastFake version at 10X speed. Below that are setup() and loop(), where each loop runs a pass of checked return values. As uploaded I still have the actual protect checked value changed in the _isr - which above is indicated as not needed (except to run my version). Though when I pointed that &dummy at the real _isr data values it failed?
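
For anyone following along, here is a minimal host-side sketch of the 'simple test value' idea, assuming the ISR bumps a counter on every entry and the reader retries whenever that counter changed during its read. All names (`isr_generation`, `fake_tick_isr`, `read_tick_snapshot`) are hypothetical and the ISR is simulated by a plain function call. This is not the code from the pull request, and it does not use __LDREXW/__STREXW.

```cpp
#include <cstdint>

// Hypothetical stand-ins for state that a 1 ms tick ISR would update.
volatile uint32_t isr_generation = 0;  // bumped on every ISR entry
volatile uint32_t tick_millis = 0;     // millisecond count
volatile uint32_t tick_cycles = 0;     // cycle-counter snapshot taken in the ISR

// Simulated ISR body: on real hardware this would run asynchronously.
void fake_tick_isr(uint32_t cycles_now) {
  isr_generation = isr_generation + 1;
  tick_millis = tick_millis + 1;
  tick_cycles = cycles_now;
}

struct TickSnapshot {
  uint32_t millis;
  uint32_t cycles;
};

// Reads both values and repeats the read if the ISR ran in between,
// so the returned pair always comes from the same tick.
TickSnapshot read_tick_snapshot() {
  TickSnapshot s;
  uint32_t gen;
  do {
    gen = isr_generation;
    s.millis = tick_millis;
    s.cycles = tick_cycles;
  } while (gen != isr_generation);
  return s;
}
```

The LDREX/STREX variant reaches the same retry decision without the extra counter, since the exclusive store fails after any interrupt.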
 
Yes, it triggers on any interrupt.
Tim,
maybe you don't need to use a dummy and you can use one of your variables instead - that would save a store.

Decided to work on my ILI9341_DMA library again, since I need it for emulators and more and need LPSPI4-CS for audio out. Works now, with any CS/DC pins, some functions are still missing (but that's just copying..) and i have an idea how to optimize it a little bit more...
Code:
ILI9341 Test!
Benchmark                Time (microseconds)
Screen fill              1280
Text                     370
Lines                    1350
Horiz/Vert Lines         340
Rectangles (outline)     210
Rectangles (filled)      2730
Circles (filled)         1320
Circles (outline)        600
Triangles (outline)      350
Triangles (filled)       1530
Rounded rects (outline)  370
Rounded rects (filled)   6390
Done!
 
@KurtE

SPI error and Radiohead Library

Morning. Just tried to compile the Radiohead Library and it returned an undefined reference to SPIClass::end(). Resolved it by adding this to the RT10x2 section of SPI.cpp:

Code:
void SPIClass::end() {}

Could you add that to your next SPI update. If not I can do it.
 
….
Decided to work on my ILI9341_DMA library again, since I need it for emulators and more and need LPSPI4-CS for audio out. Works now, with any CS/DC pins, some functions are still missing (but that's just copying..) and i have an idea how to optimize it a little bit more...
Code:
…...
Rounded rects (filled)   6390
Done!

@Frank - that's quite a performance boost over previous comparisons!

:)
 
Kurt I've just used the sketch that was attached to your post, (yes I swapped the wires for CS and DC on my breadboard) and the latest SPIN library.

RXMSK worked better for me than NOSTALL
Ok I'm sure you get it running. Perhaps try to check the IDLE Bit to see if the last transfer is done.

Thanks, I will go back to trying the RXMSK... Earlier I thought I checked for idle but will double check. Yesterday I had it semi working with NOSTALL, but at the end it would push out several dummy words, but at least not hang...

@KurtE

SPI error and Radiohead Library
Code:
void SPIClass::end() {}

Could you add that to your next SPI update. If not I can do it.
Will do. Or you can if you wish as I have no open PR requests...

Frank B said:
Decided to work on my ILI9341_DMA library again, since I need it for emulators and more and need LPSPI4-CS for audio out. Works now, with any CS/DC pins, some functions are still missing (but that's just copying..) and i have an idea how to optimize it a little bit more...

Sounds great. Hopefully will have mine fully functioning as well very soon now... Comparisons of timings are always an interesting discussion. They are sort of apples versus oranges. For example, if we look at the test for fill screens:
Code:
unsigned long testFillScreen() {
  unsigned long start = micros();
  tft.fillScreen(ILI9341_BLACK);
  tft.fillScreen(ILI9341_RED);
  tft.fillScreen(ILI9341_GREEN);
  tft.fillScreen(ILI9341_BLUE);
  tft.fillScreen(ILI9341_BLACK);
  return micros() - start;
}
With all of the other setups, you will (or at least should) see the screen fully fill with Black, then Red, Green, Blue, Black. And that is what your timing is based on. With the DMA library, the timing is probably just how long it takes to set the frame buffer's memory to each of these colors. What do you see on the screen? Maybe partial fills of the colors, depending on where the screen refresh is at the points of the color changes...

Likewise with test like Rounded Rectangles - Are you mainly after seeing how long it takes to update the screen to the final output, or do you wish to see the progressions? Where the rounded rectangles start at the exterior of screen and work their way in toward the center...

@Frank (and others) - As you mentioned in an earlier post, maybe it makes sense to break up my library. But I am not sure which way I would go, and also not sure what direction the main line ili9341_t3 library should take. A lot depends on usage patterns. Currently mine has maybe 4 ways of working, all with pluses and minuses:

a) like adafruit library or ili9341_t3 library - Each time you call a graphic primitive, the screen is updated then. Requires no frame buffer or the like. This is however where you get the biggest speed gains when you use the hardware CS pin as DC as you can fully keep the SPI bus busy... This is what is shown in current graphic test numbers.

b) Use Frame Buffer, and call updateScreen when you wish for the screen to be updated. This is great for doing things like updating a text field: simply fill the area with the background color, then draw the new text... Then update the screen and not get any screen flashing... Code can be optimized to only update those portions of the screen that were changed.... Requires no DMA. updateScreen is a version of the primitive: writeRect... Note: this could be made almost as fast without using a hardware CS pin for DC, as you only need to do a few updates of DC for the whole screen...

c) Use frame buffer and use DMA to update the screen once per call (which I am debugging now) - like b), but when you call its version of update, updateScreenAsync, the code starts up a DMA operation, and you don't have to wait until it completes before your code can do something else. There are support functions that allow you to know if the update is still active or to wait until it completes. This is great if you wish to update some stuff on the screen, go do work, and then if you want to update the screen again, you can wait until the previous update is completed before touching the frame buffer...
Note: this code is currently setup to always update whole screen.

d) Use Frame buffer and do continuous DMA updates of the screen, like Frank's code. Here using hardware CS for DC gives you almost nothing, as you only change them to initialize the first screen update, and then everything runs through DMA... (part of the debugging of c)

Great for things like video. But if you do things like clear an area (or whole screen) to draw new stuff, you may or may not get screen flashing and the like, so you need to understand how your updates may show up on screen. Example updating text field on screen, would probably work like a) and use opaque text output, such that the pixels are only updated once...
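
To make the "only update those portions of the screen that were changed" idea from b) concrete, here is a minimal host-side sketch that tracks a single dirty bounding box. `DirtyFrameBuffer` and its methods are hypothetical and are not the ILI9341_t3n implementation; `updateScreen()` here only reports how many pixels a real driver would have to send.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame buffer that remembers the bounding box of changed pixels.
class DirtyFrameBuffer {
public:
  DirtyFrameBuffer(int16_t w, int16_t h)
      : w_(w), h_(h), pixels_(static_cast<std::size_t>(w) * h, 0) {
    clearDirty();
  }

  void fillRect(int16_t x, int16_t y, int16_t w, int16_t h, uint16_t color) {
    assert(w > 0 && h > 0 && x >= 0 && y >= 0 && x + w <= w_ && y + h <= h_);
    for (int16_t row = y; row < y + h; ++row) {
      std::ptrdiff_t offset = static_cast<std::ptrdiff_t>(row) * w_ + x;
      std::fill_n(pixels_.begin() + offset, w, color);
    }
    // Grow the dirty box so it covers this rectangle too.
    x0_ = std::min(x0_, x);
    y0_ = std::min(y0_, y);
    x1_ = std::max(x1_, static_cast<int16_t>(x + w - 1));
    y1_ = std::max(y1_, static_cast<int16_t>(y + h - 1));
  }

  bool dirty() const { return x1_ >= x0_; }

  // Returns the pixel count of the dirty box, then marks the buffer clean.
  uint32_t updateScreen() {
    if (!dirty()) return 0;
    uint32_t count = static_cast<uint32_t>(x1_ - x0_ + 1) *
                     static_cast<uint32_t>(y1_ - y0_ + 1);
    clearDirty();
    return count;
  }

private:
  void clearDirty() { x0_ = w_; y0_ = h_; x1_ = -1; y1_ = -1; }

  int16_t w_, h_;
  std::vector<uint16_t> pixels_;
  int16_t x0_, y0_, x1_, y1_;
};
```

A single bounding box is the simplest choice; two small changes in opposite corners still cost nearly a full-screen update, which is where a list of rectangles would do better.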


Again I guess the real question I am asking here is how we should try to make ili9341_t3 library work? Does it need to be compatible with how it works on a T3.x? Should it require hardware CS pin? Should it use FB? DMA? ...
 
Yes, the comparison is apples vs oranges.
If you want to see the screen-updates - yes, then, the method without buffer is better. and in some cases this makes sense, if you know that you will update only a small area over a long time.

If you just want to see the result -the buffer is better :) - almost all other displays (your computer, tv) work this way.
(And a second buffer (later, with 1062) will allow double-buffering or amazing effects..)

Flicker: As you see on the videodemo, or the emulators (not only c64 - there are more from other guys) - there is (almost) no flickering. Not more than on old TV-sets. It's just too fast to flicker - but this requires fast refresh-rates and fast SPI. I'm glad it works up to 66MHz on T4 - so we can have (at least old PAL countries) video refresh-rates. Maybe with some tricks NTSC rates, too, if we add a small border around the screen... an idea that still waits for realization :) And the second buffer will allow 100% flicker-less output.
 
Thanks Frank,

Yes - double buffering and like will be fun! And it will be fun to see what usage patterns people come up with.

I thought I should do a quick sanity check for timings, so I reran my ili9341_t3n library test to get a general idea of how long it actually takes to draw a full screen. So I instrumented the fill screen test to toggle an IO pin like:
Code:
unsigned long testFillScreen() {
  unsigned long start = micros();
  digitalWriteFast(DEBUG_PIN, HIGH);
  tft.fillScreen(ILI9341_BLACK);
  digitalWriteFast(DEBUG_PIN, LOW);
  tft.fillScreen(ILI9341_RED);
  digitalWriteFast(DEBUG_PIN, HIGH);
  tft.fillScreen(ILI9341_GREEN);
  digitalWriteFast(DEBUG_PIN, LOW);
  tft.fillScreen(ILI9341_BLUE);
  digitalWriteFast(DEBUG_PIN, HIGH);
  tft.fillScreen(ILI9341_BLACK);
  digitalWriteFast(DEBUG_PIN, LOW);
  return micros() - start;
}
Then ran the test... Actually I could have done this without using the debug pin and simply measure CS pin, but this makes it easier to find in trace.
So here is a sort of top level output showing this test:
screenshot.jpg
If you look closer at one of these screen updates, there is not much wasted time...
screenshot.jpg
You can see where the DC pin changes for the commands: 2a, 2b and 2c and then there is no real gap between words (pixels).

Note there is some wasted time, in that, like the ili9341_t3 library, after every 2 lines I do an endTransaction/beginTransaction with the thought that it might allow other devices to talk on SPI... Not sure if that is needed.

Again you can see that here:
screenshot.jpg

FYI - I have tried disabling the end/begin and it dropped the time for a full screen update to: 49.04ms...

And yes currently running at 25mhz...
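
As a rough cross-check of that figure, here is a small sketch of the arithmetic, assuming a 320x240 panel, 16 bits per pixel, and an SCK of exactly 25 MHz. The function name and parameters are illustrative only and not part of any library.

```cpp
#include <cstdint>

// Time in microseconds to clock out one full frame of pixel data, ignoring
// command bytes, DC toggles, and any gaps between transfers.
constexpr uint64_t frameTransferMicros(uint32_t width, uint32_t height,
                                       uint32_t bitsPerPixel, uint32_t sckHz) {
  return (static_cast<uint64_t>(width) * height * bitsPerPixel * 1000000ULL) / sckHz;
}

// 320 * 240 * 16 = 1,228,800 bits; at 25,000,000 bits/s that is 49,152 us.
constexpr uint64_t kFrameMicrosAt25MHz = frameTransferMicros(320, 240, 16, 25000000u);
```

Under those assumptions the pixel data alone takes about 49.15 ms per frame, so the measured 49.04 ms is essentially wire speed. The small difference suggests the real SCK is not exactly 25 MHz.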
 
Oh, I see, I need a faster logic analyzer... I have a logic 16(clone) that in theory can do 100MHz, but it switches back to 80MHz all the time and complains about USB Speed.. so for fast SPI not usable if I want to measure timings...
Does anyone know a good alternative that is affordable? Should do 200MHz min.
 
I have the Logic Pro 8, which I am using here... I purchased it before the price increases... It is not exactly affordable, but it can do 500mhz over USB 3... I purchased it when I found my SPI outputs were not showing up correctly, and later it helped with USB2 stuff...
 
Logic Pro 8: EUR 737,-- oooops...
Let's see if we can play a bit with the clock of 1062..maybe we can do 200 MHz or a bit more.. that would help - it has faster GPIOs.
 
Re: systick, micros, GPT

Just one more thought on systick and micros. You could replace the core systick timing with GPT2 timer. GPTx has COMPARE register so can do the 1 ms tick interrupt, and the GPT clock is running at 24 mhz, so you get full resolution micros() and a clock independent of CPU clock.

Actually, I would like to see systick used at CPU clock speed, and systick reconfigured if user changes CPU clock speed ...
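
A small sketch of the arithmetic behind the GPT idea, assuming a free-running 32-bit counter clocked at the 24 MHz mentioned above. The names are hypothetical and this does not touch any real GPT registers.

```cpp
#include <cstdint>

constexpr uint32_t kGptHz = 24000000u;                   // from the post: GPT runs at 24 MHz
constexpr uint32_t kTicksPerMicro = kGptHz / 1000000u;   // 24 ticks per microsecond
constexpr uint32_t kTicksPerMilli = kGptHz / 1000u;      // compare step for a 1 ms tick

// Microseconds elapsed between two readings of a free-running 32-bit counter.
// Unsigned subtraction keeps the result correct across one counter wrap.
constexpr uint32_t elapsedMicros(uint32_t startTicks, uint32_t nowTicks) {
  return (nowTicks - startTicks) / kTicksPerMicro;
}
```

At 24 MHz a 32-bit counter wraps roughly every 179 seconds, so a micros() built this way would still need the 1 ms tick (or a wrap count) to extend the range.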
 
@Kurt, or others..

can you measure if this can do 144MHz:
Code:
  CCM_CCGR1 &= ~CCM_CCGR1_LPSPI4(CCM_CCGR_ON); //Clock Gate off  
/*
  CCM_CBCMR_LPSPI_CLK_SEL :
    00 derive clock from PLL3 PFD1 clk   664.62 MHz
    01 derive clock from PLL3 PFD0       720 MHz
    10 derive clock from PLL2            528 MHz
    11 derive clock from PLL2 PFD2       396 MHz
*/
/*  
  CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_LPSPI_PODF_MASK | CCM_CBCMR_LPSPI_CLK_SEL_MASK)) |
      CCM_CBCMR_LPSPI_PODF(2) | CCM_CBCMR_LPSPI_CLK_SEL(2); // pg 714
*/
  CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_LPSPI_PODF_MASK | CCM_CBCMR_LPSPI_CLK_SEL_MASK)) |
      CCM_CBCMR_LPSPI_PODF(0) | CCM_CBCMR_LPSPI_CLK_SEL(1); // pg 714

  CCM_CCGR1 |= CCM_CCGR1_LPSPI4(CCM_CCGR_ON); //Clock Gate on
And can you test with an ILI9341 library if your display works?

Mine still works with this setting, but i'm not sure if it is really 144MHz..

Edit: Remember, you need to do this after SPI.begin(). SPI.begin() would overwrite it.
I don't have any devices to measure that.
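
For reference, a sketch of what the two settings in that snippet work out to, taking the source frequencies from its comment block and assuming the root clock is the selected source divided by (PODF + 1). `kLpspiSourceHz` and `lpspiRootHz` are illustrative names, not part of the core.

```cpp
#include <cstdint>

// Source frequencies in Hz, indexed by the LPSPI_CLK_SEL value, as listed in
// the comment block of the snippet above.
constexpr uint32_t kLpspiSourceHz[4] = {
  664620000u,  // 0: PLL3 PFD1
  720000000u,  // 1: PLL3 PFD0
  528000000u,  // 2: PLL2
  396000000u,  // 3: PLL2 PFD2
};

// Assumed relation: root clock = selected source / (PODF + 1).
constexpr uint32_t lpspiRootHz(uint32_t clkSel, uint32_t podf) {
  return kLpspiSourceHz[clkSel & 0x03u] / ((podf & 0x07u) + 1u);
}
```

With these assumptions the commented-out setting (CLK_SEL 2, PODF 2) gives 176 MHz and the active one (CLK_SEL 1, PODF 0) gives 720 MHz. Getting 144 MHz on SCK would then need a further divide by 5 inside the LPSPI.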
 
I'm really sorry to interrupt this thread and doubly sorry if this has already been mentioned, but will T4 be pin compatible with T3.6?

p.s. how exciting!!
 
Yes exciting, but no, this board has far fewer pins than the T3.6 and is closer to size of the T3.2.

The current pin assignments for the beta board are shown on the first page #3
 
Thanks very much for the info. I don't really need the pins, but wondering if any of my current designs which plug the T3.6 in will work. Don't suppose any of those beta boards are available to less active members like myself if we promise to give good feedback? ;)
 
Hi Frank, I probably tried it wrong, but I tried making those changes directly within SPI.begin
Also tried to update the SPISettings:
Code:
		//uint32_t clkhz = 528000000u / (((CCM_CBCMR >> 26 ) & 0x07 ) + 1);  // LPSPI peripheral clock
		uint32_t clkhz = 1440000000u / (((CCM_CBCMR >> 26 ) & 0x07 ) + 1);  // LPSPI peripheral clock
Again not sure if that is correct.

I then also changed my ILI9341_t3n to ask for 150mhz...

I updated test graphic test to output some value here.
Code:
  Serial.printf("clkhz(%x) %lu\n",CCM_CBCMR, 1440000000u / (((CCM_CBCMR >> 26 ) & 0x07 ) + 1));  // LPSPI peripheral clock
And it output: clkhz(21ae8314) 1440000000
Note I used windows calculator in programmer mode to take 21ae8314 >> 26, and it equaled 8; masked with 0x07 that is 0, so the divider is 0 + 1 = 1.
So the end calculation of clkhz = 144mhz is correct.

In the SPISettings, it then computes div=0...

And if I look at the CCR page for SCKDIV (page 2660), it looks like you divide by the value+2.
So I would expect that: I would maybe get 72mhz.

But looking at LA output, I am getting closer to 35.71mhz So I am probably still setting something wrong.
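
The two steps above can be sketched as plain arithmetic. This assumes the PODF field sits where the clkhz expression reads it and that SCK is the peripheral clock divided by (SCKDIV + 2), as described in the post. The function names are hypothetical.

```cpp
#include <cstdint>

// PODF field decoded the same way as in the clkhz expression above.
constexpr uint32_t podfDivider(uint32_t cbcmr) {
  return ((cbcmr >> 26) & 0x07u) + 1u;
}

// SCK as described in the post: peripheral clock divided by (SCKDIV + 2).
constexpr uint32_t sckHz(uint32_t lpspiClockHz, uint32_t sckdiv) {
  return lpspiClockHz / (sckdiv + 2u);
}
```

With the printed register value 0x21ae8314 the PODF divider comes out as 1, and a 144 MHz peripheral clock with SCKDIV = 0 would give 72 MHz. Note that the literal in the posted code, 1440000000, is 1.44 GHz rather than 144 MHz, which may be part of the mismatch.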
 
@Kurt, it should be
Code:
uint32_t clkhz = [B]720[/B]0000000u / (((CCM_CBCMR >> 26 ) & 0x07 ) + 1);  // LPSPI peripheral clock
and 144MHz in SPISettings.

Can you measure that?
 
Hi Frank,

I ran into issues with this.

The value 7,200,000,000 in hex is: 1AD274800
Which does not fit in 32 bits... So computations were screwed up...

When I actually tried it, the SPI actually ran at 17.86mhz... Will see if I can compute a hack...
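
A quick sketch of why that literal goes wrong, assuming the result ends up in a uint32_t as in the posted clkhz line. The constant names and `clkhzFor` are illustrative. Presumably the intended value was 720 MHz, which written out is 720000000 (six zeros after the 720) and fits in 32 bits.

```cpp
#include <cstdint>

constexpr uint64_t kLiteralAsPosted = 7200000000ULL;  // 720 followed by seven zeros
// What is left once the value is forced into 32 bits (0x1AD274800 -> 0xAD274800).
constexpr uint32_t kAfterTruncation = static_cast<uint32_t>(kLiteralAsPosted);
constexpr uint32_t kPfd0Hz = 720000000u;              // 720 MHz, fits in 32 bits

// Same divider expression as in the thread, with the PODF field passed in directly.
constexpr uint32_t clkhzFor(uint32_t sourceHz, uint32_t podf) {
  return sourceHz / ((podf & 0x07u) + 1u);
}
```

The truncated value is 2,905,032,704, so any divider computed from it is based on a clock of about 2.9 GHz instead of 720 MHz.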
 