Teensy 4.0 First Beta Test

Status
Not open for further replies.
Yeah - with the T4 redefine to linker attribute the old - make it NULL on ARM's will show up. I must have missed it before so not a recent change.
 
Kurt/Tim
I just downloaded the latest version of ILI9341/GFX as a quick test and put it into the core libraries. I don't remember getting that message when I did the original testing on the adafruit ili9341 library. Anyway, I ran two of the graphics test sketches, one generic and the other identified graphicstest_featherwing:
1. Using the featherwing version, I didn't get the "progmem" message and no error, but they do have this defined for Teensyduino:
Code:
#ifdef TEENSYDUINO
   #define TFT_DC   10
   #define TFT_CS   4
   #define STMPE_CS 3
   #define SD_CS    8
#endif
2. For the normal graphicstest example: if I use the constructor
Code:
//Adafruit_ILI9341 tft = Adafruit_ILI9341(TFT_CS, TFT_DC);
I will get the PROGMEM error. However, if I use,
Code:
Adafruit_ILI9341 tft = Adafruit_ILI9341(TFT_CS, TFT_DC, 11, 13, 1, 12);
you don't get that error.
3. Ran SSD1306 example as well and got the same error in glcdfont.c.

EDIT: This also resolves other issues as well
The fix is rather simple in that file. Put this as a replacement for the pgmspace define:
Code:
#ifdef __AVR__
 #include <avr/io.h>
 #include <avr/pgmspace.h>
#elif defined(ESP8266)
 #include <pgmspace.h>
#elif defined(TEENSYDUINO)
 #include <avr/pgmspace.h>
#else
 #define PROGMEM
#endif

EDIT1: Looks like our messages crossed. This looks like it would be more appropriate:
Code:
#if defined(__AVR__) || defined(TEENSYDUINO)
 #include <avr/io.h>
 #include <avr/pgmspace.h>
#elif defined(ESP8266)
 #include <pgmspace.h>
#else
 #define PROGMEM
#endif
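
A minimal sketch of how the guard above behaves off-target, for anyone who wants to check it without a board. On a host compiler none of the platform macros are defined, so PROGMEM expands to nothing. The pgm_read_byte fallback, the demo_font table and glyph_checksum() are assumptions added for illustration; they are not part of glcdfont.c.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__AVR__) || defined(TEENSYDUINO)
 #include <avr/io.h>
 #include <avr/pgmspace.h>
#elif defined(ESP8266)
 #include <pgmspace.h>
#else
 #define PROGMEM
#endif

// Fallback accessor (assumption): where no pgmspace.h supplied the macro,
// flash data is read through an ordinary pointer.
#ifndef pgm_read_byte
 #define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

// Hypothetical 5-byte glyph standing in for one entry of the real font table.
static const uint8_t demo_font[] PROGMEM = {0x3E, 0x5B, 0x4F, 0x5B, 0x3E};

// Sum the column bytes of a glyph, reading each one through pgm_read_byte.
uint32_t glyph_checksum(const uint8_t *glyph, size_t len) {
  assert(glyph != nullptr);
  uint32_t sum = 0;
  for (size_t i = 0; i < len; i++) sum += pgm_read_byte(glyph + i);
  return sum;
}
```

The point of the #else branch is that the same table declaration compiles everywhere; only the accessor differs per platform.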

EDIT2: The interesting thing is that test compiles for the T3.x show it's happy with the original defines. The exception is that I get this warning:
Code:
F:\arduino-1.8.8-t4\hardware\teensy\avr\cores\teensy3\Stream.cpp: In member function 'bool Stream::findUntil(const char*, const char*)':
 
@Paul.
Got it. Would it be possible for you to maintain a copy of GFX in your repositories? The reason I am asking is that I just went over the Adafruit issue page, and there doesn't appear to be much support going on right now: a ton of open issues and PRs.
 
Would it be possible for you to maintain a copy of GFX in your repositories?

Yes. In fact, one of the *many* things I want to do later this year or maybe in 2020 (everything takes longer than anticipated) is better display libs with a common API having more features.

Doing this before T4 is released obviously wouldn't be feasible. Just to be realistic, right after T4, I'm planning to put some time into long-planned Arduino IDE improvements - with a goal of being able to demo at the San Mateo Maker Faire in May (the next time I'll meet in person with Massimo, Fabio, Luca, Sandeep and the other Arduino folks).

My next priority after IDE features is a long list of audio library improvements. I've kind-of neglected the audio lib for these last couple years while focusing on Teensy 3.6 and USB host. So much work is needed there...

I'd also really like to talk, ideally in person, with Limor, Phil T, Scott, Phil B, Kevin, John and others at Adafruit before doing something like this. Not sure if I'm going to make a visit to the east coast though (though Scott & Phil B are here on the west coast).

I'll update the Adafruit libs sooner. GFX should be simple. ST7735 is a mess and really needs to be broken off to a "_t3" version like we have for ILI9341. Long-term, I want to migrate to not having Adafruit's libs included with the Teensyduino installer. The ones we have in there now are sort of a legacy from the days before Arduino had their library manager for easy install of libs, and back when Adafruit's approach to libraries was very 8 bit AVR focused. So much has changed in the last ~4 years.
 
Yes. In fact, one of the *many* things I want to do later this year or maybe in 2020 (everything takes longer than anticipated) is better display libs with a common API having more features.
...

This ILI9488 will be a good place to start :) - I just ordered a pair from your AliExp link to be delivered in hopefully under 42 days … maybe longer …

View attachment 15819
Got this thing to play with when audio is working..

Hope the T4 USB Host gets done in good time and not a drag.

Really promising seeing the FlexIO do 2nd SPI (and running the small display test about as quickly at 100 MHz as 600 MHz leaves lots of headroom) - and also hearing the Audio board - so much good.

Using three windows of TyComm for sermon (and uploads) working well - even swapped USB cable between T4 and a T_3.1 and it came up running with the other T_3.1 active.

And I saw my pull request to fix blink rate on fault went in - it was half rate with the Speed drop in beta8
 
I don't know much about OpenGL, but it is kind of a standard, and maybe supporting at least a small subset of its api might be an idea.. There seems to be a smaller version OpenGL ES which is aimed at embedded platforms.
 
Tim, at least some SPI Raspberry Displays are with ILI9488 (or 9486 - what is the difference?) - They just have a different connector. ( I also have one of these and I hope I can make it work, too) Waveshare has one on their site and they claim it supports 125MHz SPI - Not sure if it is ILI9486 or 9488.
 
D cache tests

FWIW, I did some tests with D cache on and off (I think), reading data from stack (DTCM), OCRAM (malloc), or PROGMEM.
https://github.com/manitou48/teensy4/tree/master/cachetst
1000 reps of inner product of 2 1000-element float vectors (should be data cacheable).

Code:
SCB_CCR 0x70200
SCB_ID_CCSIDR 0xF01FE019
SCB_ID_CSSELR 0x0
SCB_ID_CLIDR 0x9000003
N 1000  REPS 1000
memory              addr        us    mflops  sum
stack/DTCM      0x2001E088     11676   171.3 38217888
malloc/OCRAM    0x60002204     11733   170.5 38217888
PROGMEM/flash   0x20200008     11676   171.3 38217888
D cache off
stack/DTCM      0x2001E088     11676   171.3 38217888
malloc/OCRAM    0x60002204   1132899     1.8 38217888
PROGMEM/flash   0x20200008     80016    25.0 38217888
SCB_CCR 0x60200

D cache on  10000-element vectors, 100 reps, vectors bigger than 32KB Dcache
N 10000  REPS 100
memory              addr        us    mflops  sum
stack/DTCM      0x2000C748     11668   171.4 0
malloc/OCRAM    0x6000AEA4    213298     9.4 0
PROGMEM/flash   0x20200008     18950   105.5 0
The disableDCache() was derived from SDK SCB_DisableDCache() in SDK core_cm7.h, BUT I could not get the erase-cache section (DCCISW) to work on T4 (nothing printed), so I commented out the DCCISW line. I'm not sure this provides any insights... why is OCRAM slower than PROGMEM? Many SDK examples use SCB_DisableDCache() or use memory sections marked as non-cacheable.
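
For reading the table: the mflops column is consistent with counting two floating-point operations (one multiply, one add) per vector element. A small sketch of that arithmetic follows; the function name is mine, and the two-flops-per-element count is an inference from the numbers, not something taken from the cachetst source.

```cpp
#include <cassert>
#include <cstdint>

// One inner product of two n-element vectors costs n multiplies and n adds.
// Flops per microsecond is numerically the same as MFLOPS.
double inner_product_mflops(uint32_t n, uint32_t reps, uint32_t elapsed_us) {
  assert(elapsed_us > 0);
  return 2.0 * n * reps / elapsed_us;
}
```

With N 1000 and REPS 1000, 11676 us works out to about 171.3 and 1132899 us to about 1.8, which matches the DTCM and cache-off OCRAM rows.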

It doesn't appear that I can write to PROGMEM, is that correct?


EDIT: fixed typo and now DCCISW cache clear works. github updated.
 
Many SDK examples use SCB_DisableDCache() or use memory sections marked as non-cacheable.

Probably written by folks who cared nothing for performance!

It doesn't appear that I can write to PROGMEM, is that correct?

Confirmed, no write access. Any attempt to write should memfault, since that region is configured as READONLY in startup.c.

Code:
        SCB_MPU_RBAR = 0x60000000 | REGION(5); // QSPI Flash
        SCB_MPU_RASR = MEM_CACHE_WBWA | READONLY | SIZE_16M;

If you change this, or use NXP's SDK examples, I have no idea what it will try to do if you write. But I'm very confident it won't actually work.

At some point I'm going to dive into implementing the EEPROM library... and I'm really not looking forward to it. One of the longest and most painful experiences I've had with this chip, back before the SDK existed, was experimenting with FlexSPI by writing little programs in RAM over SWD. FlexSPI is incredibly powerful and configurable, but incredibly complex. Some parts (especially on the AHB side) I still don't fully understand.

To make EEPROM emulation work, code running from RAM is going to have to take control of the FlexSPI, back up some of its state and some of the LUTs to RAM, then reconfigure some LUTs to query the state of the flash chip, so that too can be restored. Then more LUTs are used to get it into a state for writing, then do the operation and wait for the chip to not be busy. Then the tricky part will be getting the flash chip back to the exact same mode, restoring all the LUTs and other FlexSPI state, deleting caches, and crossing fingers and hoping the next bus cycle accessing the 0x600xxxxx memory range doesn't crash or lock up.

There's also a thorny issue of what to do if the chip reboots, like by a watchdog timer. I have found several ways the bootloader can leave the flash chip configured which aren't accessible by NXP's ROM.

Also a possible issue, but one I'm less worried about, is if the user presses the program button. I put a *lot* of work into the bootloader to be able to initialize the flash chip regardless of what state it's currently configured. The bootloader will correctly cause an in-progress write or erase to abort (but leaving the targeted memory unknown), and it knows how to navigate (hopefully) all of the flash chip's modes. NXP's ROM definitely does not.


Edit: maybe I should also mention all the beta boards have unlocked flash chips. They all have a "restore image" in the top 4096 bytes, which is writable. The non-volatile bits in the 3 status registers are all writable. In the final release, all these will be permanently locked, so you can't destroy the restore image or permanently reconfigure your flash. But in the beta boards you can... if you can figure out the complexities of the FlexSPI and its LUTs (which took me a couple rather frustrating months.... before the SDK was published)


Edit again: Regarding this question:

why is OCRAM slower than PROGMEM?

This involves some guesswork, because it's the AHB side of FlexSPI which I don't understand so well. I've only used the IPS bus side (everything the bootloader does is over the IPS commands, or by bitbanging over JTAG boundary scan). It also involves settings configured by NXP's ROM which aren't well documented and I've been happy to "leave well enough alone".

Anyway, my best guess is the AHB RX buffer is acting as a small 1K cache, even if the ARM data cache is turned off.

buffer.png

Still, I'm surprised it would be faster than the OCRAM....
 
Maybe it just uses wrong reset/default values, like with the PFDs.
NXP mentioned a BSP (Board Support Package), which configures some things (I think), but I found nothing.
I've mentioned this earlier in this thread.
 
FYI - I just pushed up a different implementation of the SPIFlex(buf, retbuf, cnt), that may work differently. At least it is not hanging on my test program. ...
Edit2: Looks like I need to do a little more testing/updates, that is my dual display SSD1306 is hanging when I do the Flex display update along with the SPI doing Async update... Will try some more hacks here...
Kurt,
my transfer(txbuf,rxbuf,cnt) isn't hanging any more. I still get lots of errors asking for 40mhz (with 160mhz flexio clock) and there is interframe gap. No gaps asking for 30mhz and ALMOST NO ERRORS. I'm actually getting one error on ALL transfers! txbuf[0] is being over-written ?????? I reset at the start of each iteration.
Code:
  // err check jumper 2 to 3  MOSI to MISO
  for (int i = 0; i < sizeof(txbuf); i++) txbuf[i] = i;
  memset(rxbuf, 0, sizeof(rxbuf));
  assert_cs();
  SPIFLEX.transfer(txbuf, rxbuf, sizeof(txbuf));
  release_cs();

rxbuf sits in front of txbuf, so I'm assuming rxbuf is overflowing, getting one more byte than it should.
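
To illustrate why that layout would give exactly one error on every transfer, here is a host-side simulation. It is only a sketch under assumptions: the two buffers are modelled as halves of one array so the extra store stays in bounds, and buggy_loopback_transfer() is a hypothetical stand-in, not the FlexSPI code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kLen = 1024;

// One contiguous region: rxbuf is the first kLen bytes, txbuf the next kLen.
static uint8_t mem[2 * kLen];
static uint8_t *const rxbuf = mem;
static uint8_t *const txbuf = mem + kLen;

// Hypothetical buggy loopback: echoes cnt bytes, then stores one extra byte.
void buggy_loopback_transfer(const uint8_t *tx, uint8_t *rx, size_t cnt) {
  assert(cnt <= kLen);
  for (size_t i = 0; i < cnt; i++) rx[i] = tx[i];
  rx[cnt] = 0xEE;  // the off-by-one store; lands on txbuf[0] in this layout
}

// Same check as the snippet above: fill, transfer, count mismatches.
int run_loopback_check() {
  for (size_t i = 0; i < kLen; i++) txbuf[i] = static_cast<uint8_t>(i);
  std::memset(rxbuf, 0, kLen);
  buggy_loopback_transfer(txbuf, rxbuf, kLen);
  int errs = 0;
  for (size_t i = 0; i < kLen; i++) {
    if (txbuf[i] != rxbuf[i]) errs++;
  }
  return errs;
}
```

All received bytes are correct here; the single mismatch comes from txbuf[0] being clobbered after it was already sent, so the compare sees a changed txbuf[0] against a good rxbuf[0].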
 
Good question Frank. I really don't know on this, especially about the NIC-301 chapter. I did try looking through the SDK but didn't see anything about it. Maybe Manitou has? He's probably spent the most time studying the SDK. I've tried to do as little as possible with the SDK...

On the OCRAM, the FLEXRAM_TCM_CTRL register is where the wait states are configured. As nearly as I can tell, that's only for ITCM & DTCM access, and the default is supposed to be single cycle. I haven't tried fiddling with it.

So far, I've also not put any significant work into reverse engineering NXP's ROM. Maybe someday....
 
there is debug led code in input i2s - please delete that before making a new installer

AttachInterrupt: Is :: memory sufficient? I'd think a dmb or dsb may be needed, too.

We can move the vector table to ITCM - manitou tested this and it was the same speed. This would free some RAM for variables/stack.
 
Kurt,
my transfer(txbuf,rxbuf,cnt) isn't hanging any more. I still get lots of errors asking for 40mhz (with 160mhz flexio clock) and there is interframe gap. No gaps asking for 30mhz and ALMOST NO ERRORS. I'm actually getting one error on ALL transfers! txbuf[0] is being over-written ?????? I reset at the start of each iteration.
Code:
  // err check jumper 2 to 3  MOSI to MISO
  for (int i = 0; i < sizeof(txbuf); i++) txbuf[i] = i;
  memset(rxbuf, 0, sizeof(rxbuf));
  assert_cs();
  SPIFLEX.transfer(txbuf, rxbuf, sizeof(txbuf));
  release_cs();

rxbuf sits in front of txbuf, so I'm assuming rxbuf is overflowing, getting one more byte than it should.

Sounds like I need to try out your version of test and see what is going on. Obviously the overwrite could be a bug... Will test again.

To make reliable, may almost have to do something like if the speed > X and rxbuf
while (cnt--) *rxbuf++ = transfer(*txbuf++);

Which will put gaps in... Or maybe I need to update the transfer gap time... (i.e. slow it down) ...
Do you have a version of the test you would like me to use. Or I can obviously hack one up...
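
A rough sketch of that fallback idea, written so it can be tried on a host. LoopbackBus is a hypothetical stand-in for the FlexSPI object, and guardedTransfer() with its maxBlockRxHz threshold (the "X" above) is an assumed name and parameter, not existing library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in bus (assumption): loops MOSI back to MISO and counts how many
// single-byte transfers were issued.
struct LoopbackBus {
  size_t byteCalls = 0;
  uint8_t transfer(uint8_t b) { byteCalls++; return b; }
  void transfer(const uint8_t *tx, uint8_t *rx, size_t cnt) {
    for (size_t i = 0; i < cnt; i++) {
      if (rx) rx[i] = tx[i];
    }
  }
};

// Use the block transfer unless the caller wants data back at a speed above
// the assumed reliable limit; then fall back to one byte at a time.
template <typename Bus>
void guardedTransfer(Bus &bus, const uint8_t *txbuf, uint8_t *rxbuf, size_t cnt,
                     uint32_t speedHz, uint32_t maxBlockRxHz) {
  assert(txbuf != nullptr);
  if (rxbuf && speedHz > maxBlockRxHz) {
    while (cnt--) *rxbuf++ = bus.transfer(*txbuf++);
  } else {
    bus.transfer(txbuf, rxbuf, cnt);
  }
}
```

Write-only transfers (rxbuf == NULL) keep the fast path at any speed, so only read-back at high clock rates pays for the gaps.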
 
there is debug led code in input i2s - please delete that before making a new installer

Done.
https://github.com/PaulStoffregen/Audio/commit/bd7aea87e9856e3d8215d8830882452bae548d90


AttachInterrupt: Is :: memory sufficient? I'd think a dmb or dsb may be needed, too.

Just spent the last couple hours looking at generated assembly. It was purely a compiler memory access reordering problem.

Since the vector table is in DTCM, shouldn't need those.
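
For readers following along, a minimal sketch of what a compiler-only barrier looks like, assuming the ":: memory" in the question refers to an empty asm statement with a "memory" clobber. The vector_table and irq_enable names are hypothetical stand-ins for the real table and enable register, and this is not the attachInterrupt code from the core.

```cpp
#include <cassert>
#include <cstdint>

typedef void (*isr_t)(void);

static isr_t vector_table[4];         // hypothetical stand-in for the RAM vector table
static volatile uint32_t irq_enable;  // hypothetical stand-in for an enable register

// Install a handler, then enable the interrupt. The empty asm statement emits
// no instruction; the "memory" clobber only stops the compiler from moving the
// table store past the enable write.
void attach_handler(unsigned irq, isr_t fn) {
  assert(irq < 4);
  vector_table[irq] = fn;
  asm volatile("" ::: "memory");  // compiler barrier (GCC/Clang syntax)
  irq_enable = irq_enable | (1u << irq);
}

static int fired = 0;
static void demo_isr(void) { fired++; }

// Simulated dispatch: call the handler only if its enable bit is set.
void simulate_irq(unsigned irq) {
  if (irq_enable & (1u << irq)) vector_table[irq]();
}
```

A dmb or dsb would additionally order the accesses in hardware; the sketch deliberately leaves those out, in line with the reasoning above about the table living in DTCM.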


We can move the vector table to ITCM - manitou tested this and it was the same speed. This would free some RAM for variables/stack.

Let's talk of this for beta10. I'm on my 3rd rebuild of beta9. Only going to do a 4th build if something is very wrong.
 
AudioShield to T4 Connections

Ok - getting myself a little confused on pins to the audio
SPI(MOSI, MISO, SCLK, SDCS) => 11, 12, 13, 10. This I got.
VOL => 15
MCLK => 23 (SAI1)
LRCLK => 21
BCLK => 20
RX => 7
TX => 6

Just need a sanity check here. The reason I am asking is that I am putting my own breakout board together that will mate with some of the shields that I have (need a break from coding for a bit). I have a 5V regulator on board going to the 5V pin, but I didn't notice a trace to cut between USB and the 5V pin?
 
SPI, FlexSPI and SPISettings

Currently I can not use the SPISettings that are part of the SPIClass code in the SPIFlex code as the code internal to it is very specific to the SPI object itself.

However I could use it, if I added members to the SPISettings, that allows me to query the information like speed, MSB/LSB, ...

The version I have for SPI Flex only holds the data and allows me to retrieve it. Wondering about adding that to SPI as well so at minimum could pass in SPISettings and get the info. But SPISettings would probably still generate data not needed...

I have a hacked up version of the SPISettings that does this... I have not tried using it yet in the Flex code, will later.

Code:
class SPISettings {
public:
	SPISettings(uint32_t clock, uint8_t bitOrder, uint8_t dataMode) :
		_clock(clock), _bitOrder(bitOrder), _dataMode(dataMode)  // added
	{
		if (__builtin_constant_p(clock)) {
			init_AlwaysInline(clock, bitOrder, dataMode);
		} else {
			init_MightInline(clock, bitOrder, dataMode);
		}
	}
	SPISettings() {
		init_AlwaysInline(4000000, MSBFIRST, SPI_MODE0);
	}
	uint32_t inline clock() {return _clock;}        // added
	uint8_t inline bitOrder() {return _bitOrder;}   // added
	uint8_t inline dataMode() {return _dataMode;}   // added
private:
	void init_MightInline(uint32_t clock, uint8_t bitOrder, uint8_t dataMode) {
		init_AlwaysInline(clock, bitOrder, dataMode);
	}
	void init_AlwaysInline(uint32_t clock, uint8_t bitOrder, uint8_t dataMode)
	  __attribute__((__always_inline__)) {
		// TODO: Need to check timings as related to chip selects?

		const uint32_t clk_sel[4] = {664615384,  // PLL3 PFD1
					     720000000,  // PLL3 PFD0
					     528000000,  // PLL2
					     396000000}; // PLL2 PFD2
		uint32_t cbcmr = CCM_CBCMR;
		uint32_t clkhz = clk_sel[(cbcmr >> 4) & 0x03] / (((cbcmr >> 26 ) & 0x07 ) + 1);  // LPSPI peripheral clock

		uint32_t d, div;
		if (clock == 0) clock =1;
		d= clkhz/clock;
		if (d && clkhz/d > clock) d++;
		if (d > 257) d= 257;  // max div
		if (d > 2) {
			div = d-2;
		} else {
			div =0;
		}
		ccr = LPSPI_CCR_SCKDIV(div) | LPSPI_CCR_DBT(div/2);
		tcr = LPSPI_TCR_FRAMESZ(7);    // TCR has polarity and bit order too

		// handle LSB setup
		if (bitOrder == LSBFIRST) tcr |= LPSPI_TCR_LSBF;

		// Handle Data Mode
		if (dataMode & 0x08) tcr |= LPSPI_TCR_CPOL;

		// Note: On T3.2 when we set CPHA it also updated the timing.  It moved the
		// PCS to SCK Delay Prescaler into the After SCK Delay Prescaler
		if (dataMode & 0x04) tcr |= LPSPI_TCR_CPHA;
	}
	uint32_t ccr; // clock config, pg 2660 (RT1050 ref, rev 2)
	uint32_t tcr; // transmit command, pg 2664 (RT1050 ref, rev 2)
	uint32_t _clock;     // added
	uint8_t _bitOrder;   // added
	uint8_t _dataMode;   // added
	friend class SPIClass;
};
The parts I added (shown in RED in the original post) are marked with "// added" comments.

But in addition to this, wondering if we could just go ahead and move the smarts of the init_AlwaysInline... into the SPIClass::beginTransaction?

Reason I ask this, is I believe with this current code, with the looking at contents of CCM_CBCMR it will either a) always have to compute all of this data or maybe be wrong...

That is suppose your code has a static one that is used... like:
SPISettings mySettings(8000000, MSBFIRST, SPI_MODE0);

What values will it get for CCM_CBCMR?

And I suspect that if I do something like:
SPI.beginTransaction(SPISettings(8000000, MSBFIRST, SPI_MODE0));

Even though all of the inputs are constants, it will not be able to compile this down to ccr and tcr being precomputed values. That is, it will always run this code...

So again, wondering if in T4 we simply move all of this compute code into SPI.beginTransaction and use a really simple version of SPISettings (more or less just the RED stuff)?

Thoughts?
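
To make the proposal concrete, a rough sketch of the split: a data-only settings object, with the divider math done later against whatever the peripheral clock is at that moment. SimpleSPISettings, sckDivFor(), divForTransaction() and actualSckHz() are illustrative names, not existing SPI library API, and the SCK = clkhz / (SCKDIV + 2) relation is my reading of the LPSPI clock configuration and should be checked against the reference manual.

```cpp
#include <cassert>
#include <cstdint>

// Data-only settings (assumption: a separate name so this is not mistaken
// for the real SPISettings class).
struct SimpleSPISettings {
  uint32_t clock;
  uint8_t bitOrder;
  uint8_t dataMode;
};

// Same divider arithmetic as init_AlwaysInline above, but taking the LPSPI
// peripheral clock as a parameter so it is evaluated at transaction time.
uint32_t sckDivFor(uint32_t clkhz, uint32_t clock) {
  assert(clkhz > 0);
  if (clock == 0) clock = 1;
  uint32_t d = clkhz / clock;
  if (d && clkhz / d > clock) d++;  // round up so SCK does not exceed the request
  if (d > 257) d = 257;             // max div
  return (d > 2) ? d - 2 : 0;
}

// What a beginTransaction-style function would compute from stored settings.
uint32_t divForTransaction(uint32_t clkhz, const SimpleSPISettings &s) {
  return sckDivFor(clkhz, s.clock);
}

// Resulting SCK, assuming SCK = clkhz / (SCKDIV + 2).
uint32_t actualSckHz(uint32_t clkhz, uint32_t div) { return clkhz / (div + 2); }
```

As a worked case with a made-up 132 MHz peripheral clock: an 8 MHz request gives div 15 (SCK about 7.76 MHz), and a 30 MHz request gives div 3 (SCK 26.4 MHz). A static SPISettings object then stays valid even if the clock source in CCM_CBCMR changes after it was constructed.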
 
AudioShield to T4 Connections

Do you have one of the breakout boards, with the switch for I2S1 vs I2S2?

If you remove the T4 and audio shield, and power it up with external 3.3V, you can check the signal paths with an ohm-meter. The switch controls five 74LVC1G3157 analog switches, which should measure approx 6 ohms.

For I2S2, MCLK is pin 33 on the bottom side.


EDIT: for anyone who doesn't have the breakout board and wants try audio or USB host, email me directly. I have more parts coming Friday, so we should have 8 more of those available next week.
 
Yes I do have one of the breakout boards but I didn't know that would be how to ring out the connections. Thanks will do that.
 
Sounds like I need to try out your version of test and see what is going on. Obviously the overwrite could be a bug... Will test again.

my test sketch
Code:
#include <FlexIO_t4.h>
#include <FlexSPI.h>

#define SPIHZ 30000000

//#define HARDWARE_CS
#ifdef HARDWARE_CS
FlexSPI SPIFLEX(2, 3, 4, 5); // Setup on (int mosiPin, int misoPin, int sckPin, int csPin=-1) :
#define assert_cs()
#define release_cs()
#else
FlexSPI SPIFLEX(2, 3, 4, -1); // Setup on (int mosiPin, int misoPin, int sckPin, int csPin=-1) :
#define assert_cs() digitalWriteFast(5, LOW)
#define release_cs() digitalWriteFast(5, HIGH)
#endif

void setup() {
  pinMode(13, OUTPUT);
  while (!Serial && millis() < 4000);
  Serial.begin(115200);
  delay(500);

#ifndef HARDWARE_CS
  pinMode(5, OUTPUT);
  release_cs();
#endif
  SPIFLEX.begin();

  // See if we can update the speed...
  SPIFLEX.flexIOHandler()->setClockSettings(3, 2, 0);	// clksel(0-3PLL4, Pll3 PFD2 PLL5, *PLL3_sw)

  Serial.printf("Updated Flex IO speed: %u\n", SPIFLEX.flexIOHandler()->computeClockRate());
  Serial.printf("SPIHZ %d\n", SPIHZ);
}
uint8_t  txbuf[1024], rxbuf[1024];

void loop() {
  SPIFLEX.beginTransaction(FlexSPISettings(SPIHZ, MSBFIRST, SPI_MODE0));

#if 1
  assert_cs();
  uint32_t t = micros();
  SPIFLEX.transfer(txbuf, NULL, sizeof(txbuf));
  t = micros() - t;
  Serial.printf("%d us %.1f mbs\n", t, 8.*sizeof(txbuf) / t);
  release_cs();
#endif

  // err check jumper 2 to 3  MOSI to MISO
  for (int i = 0; i < sizeof(txbuf); i++) txbuf[i] = i;
  memset(rxbuf, 0, sizeof(rxbuf));
  assert_cs();
  SPIFLEX.transfer(txbuf, rxbuf, sizeof(txbuf));
  release_cs();
  SPIFLEX.endTransaction();

  int errs = 0;
  for (int i = 0; i < sizeof(txbuf); i++) if (txbuf[i] != rxbuf[i]) errs++;
  Serial.printf("errs %d [3] %d\n", errs, rxbuf[3]);
  if (errs) {
    for (int i = 0; i <= 4; i++) Serial.printf("%d  %d %d\n", i, txbuf[i], rxbuf[i]);
    for (int i = 500; i <= 504; i++) Serial.printf("%d  %d %d\n", i, txbuf[i], rxbuf[i]);
    for (int i = 1020; i <= 1023; i++) Serial.printf("%d  %d %d\n", i, txbuf[i],  rxbuf[i]);
  }
  delay(500);
}
 
Kurt, let me know when you have updated FlexSPI for the TeensyView/SSD1306 and your 64-pixel-tall display, tested to work with the Beta9 release, so I can run both SPIs as you did.

Does this look like your SPI 1306 unit? ssd1306SPI.jpg If so can you confirm wiring { SDA==MOSI and SCL==CLK ? }

I wondered if the bus design might result in PROGMEM speed oddity:
surprised it would be faster than the OCRAM....
… OCRAM slower than PROGMEM …
Dcache on
11676 us stack
11732 us malloc
11676 us progmem

D cache off
11676 us stack
1132899 us malloc
80016 us progmem
 
USBHOST_T36

I just downloaded the latest core and the usbhost_t36 libraries. I hooked up my PS4 joystick and it correctly identifies the joystick, the manufacturer etc. A couple things since I never used it before:
The buttons on the right with the symbols do trip and I can see the values, the bumble/rumble works for both left and right as well as the buttons above the bumble levers.

What I can't really tell is working or not, but probably is, are the arrow buttons and joysticks, only because I see a whole lot of data flying across the screen, and even if set to show only changed values it still keeps streaming. That, together with me not knowing the functions or what is printed, makes it confusing. I'm going to go through it later and figure it out. Is there a reference page somewhere?
 