Teensy 4.0 First Beta Test

Status
Not open for further replies.
Yep - I thought I would try changing the #if 0's to 1s... To see what it all needed... As you mentioned it requires usbmem.h/c, but again have not looked any deeper!
 
Jumping in to say that if you've been following this lengthy thread of the awesome beta testing work, the first post has been updated to include some additional users invited to beta test. :)
 
Well, I think I just ran out of RAM on my Teensy 3.2. Is this new machine ready for some monkey bashing?

I'm working on a litlOS (Multiple sketch swapping) and building a cellphone as a demo. I'd like to try swapping this in for the headroom.

-jim lee
 
@jim_lee - the T_3.5 and 3.6 have 256 KB of RAM, versus the 64 KB on T_3.2, and have been out for 2+ years. The T_3.6 runs up to 256MHz as well and has some other MCU design optimizations that should make it a great next step. The T4 is in limited Beta release as noted in p#1 of this thread … it is coming along but still getting its core in shape for general use and release.
 
Yep - I thought I would try changing the #if 0's to 1s... To see what it all needed... As you mentioned it requires usbmem.h/c, but again have not looked any deeper!

Hi Kurt
Just got back from doing my taxes. At least I get money back :)

Back to the T4. Besides usb_mem, usb_dev needs to be updated to include some other functions. Was looking at it but usb_irq was getting me, at least I only saw it before I left.

EDIT: Forgot - I got my breakout board so have to ring that out soon :) It is a big board. Going to have to shrink it down for later revisions! Pins in good position except the Arduino pins - have to move them about 0.1" down.
 
@manitou, @Paul:

I found a very interesting article:
https://arxiv.org/pdf/1703.08228.pdf

It's about the Cortex-M3 mainly, but in general useful for us, too.
(especially the used benchmarks might be interesting for you)
https://github.com/mageec/beebs
http://beebs.eu/

Tonight I played a little (not much) with the proposed flags, and found that this gives 110 more coremarkish points:
Code:
teensy4b.menu.opt.o3std.build.flags.optimize=-O3 -fno-tree-loop-if-convert -fno-sched-interblock -fno-tree-copyrename -fno-ipa-sra -fgcse-las -fno-schedule-insns -fno-tree-loop-distribute-patterns -fno-caller-saves -fno-optimize-strlen -fno-inline-functions-called-once -fno-tree-slsr -fno-tree-scev-cprop -funroll-all-loops -fno-sched-dep-count-heuristic -fno-tree-ccp -fno-predictive-commoning -fno-ipa-pure-const -fno-merge-constants
(removed some options)

Most likely there are even more efficient flags. Did not "play" too much with it and did not try the benchmarks (coremarkish only)

Edit: Tried that with GCC8.2.1 - it seems to produce slower code than our old version...
Edit: removing -fno-peephole2 is better. Removed that from the list above.
edit: rmv -fno-tree-pta
 
@Defragster:
I had to do the following changes to make LTO work:
Code:
// Stack frame
//  xPSR
//  ReturnAddress
//  LR (R14) - typically FFFFFFF9 for IRQ or Exception
//  R12
//  R3
//  R2
//  R1
//  R0
// Code from :: https://community.nxp.com/thread/389002
void HardFault_HandlerC(unsigned int *hardfault_args) __attribute__((used)); // changed for LTO

__attribute__((naked))
void unused_interrupt_vector(void)
{
  __asm( ".syntax unified\n"
         "MOVS R0, #4 \n"
         "MOV R1, LR \n"
         "TST R0, R1 \n"
         "BEQ _MSP \n"
         "MRS R0, PSP \n"
         "B HardFault_HandlerC \n"
         "_MSP: \n"
         "MRS R0, MSP \n"
         "B HardFault_HandlerC \n"
         ".syntax divided\n") ;
}

__attribute__((weak))
void HardFault_HandlerC(unsigned int *hardfault_args) {
    volatile unsigned int nn ;
#if defined(PRINT_DEBUG_STUFF) // changed for LTO
  volatile unsigned int stacked_r0 ;
  volatile unsigned int stacked_r1 ;
  volatile unsigned int stacked_r2 ;
  volatile unsigned int stacked_r3 ;
  volatile unsigned int stacked_r12 ;
  volatile unsigned int stacked_lr ;
  volatile unsigned int stacked_pc ;
  volatile unsigned int stacked_psr ;
  volatile unsigned int _CFSR ;
  volatile unsigned int _HFSR ;
  volatile unsigned int _DFSR ;
  volatile unsigned int _AFSR ;
  volatile unsigned int _BFAR ;
  volatile unsigned int _MMAR ;
  volatile unsigned int addr ;

  stacked_r0 = ((unsigned int)hardfault_args[0]) ;
  stacked_r1 = ((unsigned int)hardfault_args[1]) ;
  stacked_r2 = ((unsigned int)hardfault_args[2]) ;
  stacked_r3 = ((unsigned int)hardfault_args[3]) ;
  stacked_r12 = ((unsigned int)hardfault_args[4]) ;
  stacked_lr = ((unsigned int)hardfault_args[5]) ;
  stacked_pc = ((unsigned int)hardfault_args[6]) ;
  stacked_psr = ((unsigned int)hardfault_args[7]) ;
  // Configurable Fault Status Register
  // Consists of MMSR, BFSR and UFSR
  _CFSR = (*((volatile unsigned int *)(0xE000ED28))) ;
  // Hard Fault Status Register
  _HFSR = (*((volatile unsigned int *)(0xE000ED2C))) ;
  // Debug Fault Status Register
  _DFSR = (*((volatile unsigned int *)(0xE000ED30))) ;
  // Auxiliary Fault Status Register
  _AFSR = (*((volatile unsigned int *)(0xE000ED3C))) ;
  // Read the Fault Address Registers. These may not contain valid values.
  // Check BFARVALID/MMARVALID to see if they are valid values
  // MemManage Fault Address Register
  _MMAR = (*((volatile unsigned int *)(0xE000ED34))) ;
  // Bus Fault Address Register
  _BFAR = (*((volatile unsigned int *)(0xE000ED38))) ;
  //__asm("BKPT #0\n") ; // Break into the debugger // NO Debugger here.

  asm volatile("mrs %0, ipsr\n" : "=r" (addr)::);

  printf_debug("\nFault irq %d\n", addr & 0x1FF);
  printf_debug(" stacked_r0 ::  %x\n", stacked_r0);
  printf_debug(" stacked_r1 ::  %x\n", stacked_r1);
  printf_debug(" stacked_r2 ::  %x\n", stacked_r2);
  printf_debug(" stacked_r3 ::  %x\n", stacked_r3);
  printf_debug(" stacked_r12 ::  %x\n", stacked_r12);
  printf_debug(" stacked_lr ::  %x\n", stacked_lr);
  printf_debug(" stacked_pc ::  %x\n", stacked_pc);
  printf_debug(" stacked_psr ::  %x\n", stacked_psr);
  printf_debug(" _CFSR ::  %x\n", _CFSR);
  printf_debug(" _HFSR ::  %x\n", _HFSR);
  printf_debug(" _DFSR ::  %x\n", _DFSR);
  printf_debug(" _AFSR ::  %x\n", _AFSR);
  printf_debug(" _BFAR ::  %x\n", _BFAR);
  printf_debug(" _MMAR ::  %x\n", _MMAR);
#endif
  IOMUXC_SW_MUX_CTL_PAD_GPIO_B0_03 = 5; // pin 13
  IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_03 = IOMUXC_PAD_DSE(7);
  GPIO2_GDIR |= (1 << 3);
  GPIO2_DR_SET = (1 << 3);
  GPIO2_DR_CLEAR = (1 << 3); //digitalWrite(13, LOW);

  if ( F_CPU_ACTUAL >= 600000000 )
    set_arm_clock(300000000);

  while (1)
  {
    GPIO2_DR_SET = (1 << 3); //digitalWrite(13, HIGH);
    // digitalWrite(13, HIGH);
    for (nn = 0; nn < 2000000/2; nn++) ;
    GPIO2_DR_CLEAR = (1 << 3); //digitalWrite(13, LOW);
    // digitalWrite(13, LOW);
    for (nn = 0; nn < 18000000/2; nn++) ;
  }
}
Can you review that, and (maybe) do a pullrequest for Paul? Thanks :)
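
For anyone reading the dump this handler prints, here is a minimal host-side sketch of two checks the code and comments refer to: picking MSP or PSP from bit 2 of the EXC_RETURN value in LR (what the assembly stub does), and testing MMARVALID/BFARVALID in CFSR before trusting MMAR/BFAR. The bit positions are the ARMv7-M ones; the helper names are made up for illustration and are not part of the Teensy core.

```cpp
#include <cstdint>

// Hypothetical helper names; bit positions per the ARMv7-M architecture.
constexpr uint32_t EXC_RETURN_SPSEL = 1u << 2;   // 0 = frame on MSP, 1 = frame on PSP
constexpr uint32_t CFSR_MMARVALID   = 1u << 7;   // MMAR holds a valid address
constexpr uint32_t CFSR_BFARVALID   = 1u << 15;  // BFAR holds a valid address

// Mirrors the "TST R0, R1 / BEQ _MSP" test in the assembly stub.
constexpr bool frameOnPsp(uint32_t excReturn) {
  return (excReturn & EXC_RETURN_SPSEL) != 0;
}

constexpr bool mmarValid(uint32_t cfsr) { return (cfsr & CFSR_MMARVALID) != 0; }
constexpr bool bfarValid(uint32_t cfsr) { return (cfsr & CFSR_BFARVALID) != 0; }

// Split CFSR into its three sub-registers (MMFSR, BFSR, UFSR).
constexpr uint8_t mmfsr(uint32_t cfsr) { return static_cast<uint8_t>(cfsr & 0xFFu); }
constexpr uint8_t bfsr(uint32_t cfsr) { return static_cast<uint8_t>((cfsr >> 8) & 0xFFu); }
constexpr uint16_t ufsr(uint32_t cfsr) { return static_cast<uint16_t>((cfsr >> 16) & 0xFFFFu); }
```

With the typical LR value of FFFFFFF9 noted in the stack frame comment, frameOnPsp() is false, so the stub takes the _MSP branch.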
 
Will do Frank - at a glance it looks good - that is to keep LTO from discarding the code?

Should those pieces be made to .startup or PROGMEM so they stay safe in FLASH? I'll make sure that works.

In the end - if like Teensy3 code - the whole HandlerC() will get "#if 0" swapped with a void non blink function. Also Paul suggested he had larger designs for something DE-buggy.
 
@manitou, @Paul:

I found a very interesting article:
https://arxiv.org/pdf/1703.08228.pdf

haha, lol.. the only option that *really* helps is -funroll-all-loops
so, forget that paper :) (for T4)
teensy4b.menu.opt.o3std.build.flags.optimize=-O3 -funroll-all-loops

coremarkish : 2665
I think the main reason is that it does run from fast RAM, not slow FLASH(? perhaps code-size plays a role?) should test this with 3.6, too...
(GCC 8.2.1: 2505 <- wow.. so much slower?)
GCC 7.3.1: 2629
GCC 6.3.1: 2659
hm. regressions. constantly. No reason to update... what the hell is going on?

I remember *many* years ago, it was the same for AVR...
 
Suppose USB coming along as best it can - haven't seen Paul note it lately …

Sorry. I had an outdoor event to attend last week which seemed fine at the time, but then relapsed on this damn cold! Did the "lots of rest and liquids" thing for a few days. A couple PJRC tech issues came up, like needing to fix / update our test fixtures. Feeling better these last few days, built up more beta test hardware... so we can expand the list.

Maybe I'll extend the quick-and-dirty USB serial to implement available() and read(). That should give us something to at least use for testing, until I can really work on performance optimization.
 
Don't be sorry - if it's like the one I had, it's the cold that just doesn't want to go away.

I was just beginning to steal code from the T3 core to do that, but you doing a quick and dirty version just to give us something to work with would probably be a lot better - especially since I never touched that stuff before :)
 
Yep - I thought I would try changing the #if 0's to 1s... To see what it all needed... As you mentioned it requires usbmem.h/c, but again have not looked any deeper!

Just as a BTW I tried updating to get it working but ran into a lot of "unknowns", on my part mostly, about what some of the functions were doing. Some I could understand but others that were needed just got more complicated - pretty much would have to do a lot of homework to get it going.

@Paul - have new appreciation for what you put together to get the Teensies operational :)
 
...
Maybe I'll extend the quick-and-dirty USB serial to implement available() and read(). That should give us something to at least use for testing, until I can really work on performance optimization.

Good you are feeling better. Nice there was enough interest to have to send out more beta units.

A quick update to USB to allow normal input would be nice for general usage to stop halting on input. If you see the way … it would be handy to keep the loop()/code running and ignoring Serial output when !Serial - it blocks now.
 
I'm guessing you're looking for something different than the huge list on pages 987-1000 of the RT1050 ref manual, rev2 ?

Yes, a list which shows which library or module uses which clock (and at which speed, if it changes it) actually. Perhaps it could show which other resources are used, too.
Such a list would make it way easier to write compatible libraries, without guessing "does any lib change the settings of 'my' clock?" Or, "can I change it without breaking other things?"
For example the audio-lib *will* change the freq. of PLL4 and does not use the defaults. Anything else should be aware of that.
And if we want to use higher speeds for SPI, we need to adjust some things, too..
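
One way to picture such a list is a tiny run-time table where each library records which clock it needs and at what frequency, so a conflict shows up as a failed claim instead of a silent change. This is only a sketch of the idea; the clock ids, names and frequencies below are all invented for illustration and nothing like this exists in the core.

```cpp
#include <cstdint>

// Illustrative clock ids only; not a list of real T4 clock sources.
enum ClockId { CLK_PLL4, CLK_SPI, CLK_FLEXIO, CLK_COUNT };

struct ClockClaim {
  const char *owner;  // nullptr means nobody has claimed this clock yet
  uint32_t hz;        // frequency the owner configured
};

static ClockClaim g_claims[CLK_COUNT] = {};

// Record that `owner` needs clock `id` running at `hz`.
// Succeeds if the clock is unclaimed, or already claimed at the same
// frequency; fails if an earlier claim needs a different frequency.
inline bool claimClock(ClockId id, const char *owner, uint32_t hz) {
  ClockClaim &c = g_claims[id];
  if (c.owner == nullptr) {
    c.owner = owner;
    c.hz = hz;
    return true;
  }
  return c.hz == hz;
}

// Who claimed the clock first (nullptr if unclaimed).
inline const char *clockOwner(ClockId id) { return g_claims[id].owner; }
```

A library that gets false back from claimClock() would know up front that "its" clock is already set to something else, which is the guessing the list is meant to remove.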
 
Is Serial#.peek() not implemented? This code works on "Serial" with T_3.6 { without the two delay(100)'s }
Also this code as FASTER w/LTO hangs T4 but works FAST w/LTO:
Code:
#define SER_IN Serial1
void setup() {
	// put your setup code here, to run once:
	SER_IN.begin(1843200);
	while (!SER_IN && millis() < 4000 );
	SER_IN.println("\n" __FILE__ " " __DATE__ " " __TIME__);
}

float val;

void loop() {
	SER_IN.print("Enter a number - float if you please = ");
	while (SER_IN.available() == 0) { delay(100);}
	delay(100);
	val = SER_IN.parseFloat();

	SER_IN.print("You Entered=");
	SER_IN.println(val);
	if ( 10 == SER_IN.peek() ) { SER_IN.read(); SER_IN.println("Ate the NewLine!"); }
}

These public HSerial items are present:
Code:
	virtual int available(void);
	virtual int peek(void);

I'm doing this to test w/LTO compile and FrankB's (used) - it seemed a simple sketch :( :confused: - when I fault at the end of setup() { …; GPT1_CNT = 5; // FAULT } it gives the DUMP and blinks but acts really ODD, confusing TyComm to a Windows Hourglass, not responding - even with debug_tt. I've done a lot of faulting - mostly this one on T4 - and not seen this before?
 
Yes HardwareSerial::peek() is implemented, which is very similar to the read function:
Code:
int HardwareSerial::peek(void)
{
	uint32_t head, tail;

	head = rx_buffer_head_;
	tail = rx_buffer_tail_;
	if (head == tail) return -1;
	if (++tail >= rx_buffer_total_size_) tail = 0;
	if (tail < rx_buffer_size_) {
		return rx_buffer_[tail];
	} else {
		return rx_buffer_storage_[tail-rx_buffer_size_];
	}
}

int HardwareSerial::read(void)
{
	uint32_t head, tail;
	int c;

	head = rx_buffer_head_;
	tail = rx_buffer_tail_;
	if (head == tail) return -1;
	if (++tail >= rx_buffer_total_size_) tail = 0;
	if (tail < rx_buffer_size_) {
		c = rx_buffer_[tail];
	} else {
		c = rx_buffer_storage_[tail-rx_buffer_size_];
	}
	rx_buffer_tail_ = tail;
	if (rts_pin_baseReg_) {
		uint32_t avail;
		if (head >= tail) avail = head - tail;
		else avail = rx_buffer_total_size_ + head - tail;

		if (avail <= rts_low_watermark_) rts_assert();
	}
	return c;
}
Could make it more similar and assign to local variable like c and return it at end... Would/should maybe remove the last else clause and always return what was in this section at the end...
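
Roughly what that could look like, as a host-side sketch on a simplified two-part ring (a primary array plus extra storage, indexed as one ring like the code above). The class and member names are hypothetical, RTS handling is left out, and this is not the core's code; it only shows peek() and read() sharing one lookup and one return at the end.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the HardwareSerial RX buffer. Names are made up.
class TwoPartRing {
public:
  TwoPartRing(uint8_t *primary, size_t primarySize,
              uint8_t *extra, size_t extraSize)
    : buf_(primary), bufSize_(primarySize),
      extra_(extra), totalSize_(primarySize + extraSize) {}

  // Store one byte; returns false when the ring is full.
  bool push(uint8_t b) {
    size_t head = head_ + 1;
    if (head >= totalSize_) head = 0;
    if (head == tail_) return false;
    slot(head) = b;
    head_ = head;
    return true;
  }

  // Same lookup as read(), but the tail index is left untouched.
  int peek() const {
    size_t tail;
    if (!nextTail(tail)) return -1;
    int c = slot(tail);
    return c;
  }

  int read() {
    size_t tail;
    if (!nextTail(tail)) return -1;
    int c = slot(tail);
    tail_ = tail;  // consuming the byte is the only difference from peek()
    return c;
  }

private:
  // Compute the index of the next unread byte; false if the ring is empty.
  bool nextTail(size_t &tail) const {
    if (head_ == tail_) return false;
    tail = tail_ + 1;
    if (tail >= totalSize_) tail = 0;
    return true;
  }

  // Map a ring index onto the primary array or the extra storage.
  uint8_t &slot(size_t i) const {
    return (i < bufSize_) ? buf_[i] : extra_[i - bufSize_];
  }

  uint8_t *buf_;
  size_t bufSize_;
  uint8_t *extra_;
  size_t totalSize_;
  size_t head_ = 0;
  size_t tail_ = 0;
};
```

As in the original, one slot is always left unused, so a ring with 2 + 2 bytes of storage holds at most 3 characters.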

The problem is either the test program or the configuration of your serial monitor that is talking to it ;)

That is if I modify your program and run it on T4, and use Serial monitor... Note I changed the baud rate to a standard 115200...
Code:
#define SER_IN Serial1
void setup() {
  // put your setup code here, to run once:
  SER_IN.begin(115200);
  while (!SER_IN && millis() < 4000 );
  SER_IN.println("\n" __FILE__ " " __DATE__ " " __TIME__);
}

float val;

void loop() {
  SER_IN.print("Enter a number - float if you please = ");
  while (SER_IN.available() == 0) { delay(100);}
  delay(100);
  val = SER_IN.parseFloat();

  SER_IN.print("You Entered=");
  SER_IN.println(val);
  int ch;
  while ((ch = SER_IN.read()) != -1) SER_IN.printf("%02x ", ch);
  SER_IN.println();
  if ( 10 == SER_IN.peek() ) { SER_IN.read(); SER_IN.println("Ate the NewLine!"); }
}
And if I run it, I see:
Code:
C:\Users\kurte\Documents\Arduino\zzz\zzz.ino Feb 15 2019 05:52:06
Enter a number - float if you please = You Entered=1.23
0d 0a 
Enter a number - float if you please =
So the code was looking for the character that ended the parse to be a LF, but in my case I output both CR and LF...
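
If the sketch should not care which line ending the monitor sends, something along these lines could replace the peek() test. It is written as a template so it can be tried on a PC with a fake stream; it assumes only Arduino-style peek()/read() that return -1 when nothing is waiting. The names eatLineEnding and FakeStream are invented for this example.

```cpp
#include <cstddef>
#include <string>

// Consume a pending CR, LF, or CR+LF from any stream-like object that
// offers Arduino-style peek()/read() returning -1 when empty.
// Returns the number of characters consumed (0, 1 or 2).
template <typename StreamT>
int eatLineEnding(StreamT &in) {
  int eaten = 0;
  if (in.peek() == '\r') { in.read(); ++eaten; }
  if (in.peek() == '\n') { in.read(); ++eaten; }
  return eaten;
}

// Host-side stand-in for a serial port, only for trying the helper out.
struct FakeStream {
  std::string data;
  size_t pos = 0;
  int peek() const {
    return pos < data.size() ? static_cast<unsigned char>(data[pos]) : -1;
  }
  int read() {
    return pos < data.size() ? static_cast<unsigned char>(data[pos++]) : -1;
  }
};
```

On real hardware the second character may not have arrived yet when peek() is called, so this is only reliable once the whole line is sitting in the receive buffer.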
 
Yes, a list which shows which library or module uses which clock (and at which speed, if it changes it) actually. Perhaps it could show which other resources are used, too.
Such a list would make it way easier to write compatible libraries, without guessing "does any lib change the settings of 'my' clock?" Or, "can I change it without breaking other things?"
For example the audio-lib *will* change the freq. of PLL4 and does not use the defaults. Anything else should be aware of that.
And if we want to use higher speeds for SPI, we need to adjust some things, too..

As you mentioned, it would be nice to know who uses what... But my guess is that it would be difficult to keep a reasonably accurate list. Especially if libraries are setup to allow programs to set it... Example with the FlexIO code, I added a member to update the appropriate registers (depending on which FlexIO object the user is using)... But I have not yet updated all of the code to recover from it. That is I did this for the SPI testing, but have not yet updated the Serial object code to figure out the needed settings to achieve the baud rate you requested, depending on the current settings for the timer...

Also interesting, is yes you can make a library more robust by having it understand and use the current clock settings. But it may come at a cost...

Example with the SPI library. Simply adding the code to check what clock is used in the SPISettings, now causes many programs that use SPI to get larger and slower.

As I suspected and verified earlier, take an application that does something like: SPI.beginTransaction(SPISettings(30000000, MSBFIRST, SPI_MODE0));
Note: just typed in above line so may be slightly off.

But on T3.x code, this would reduce down to simply assigning the appropriate values to hardware registers as all of the calculations were done at compile time and needed no run time code to compute these values. Now with the T4 code that checks which clock setting to use, all of these computations are now done at run time...
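
A small sketch of the difference, not taken from the SPI library: the same divider calculation folds to a constant when the source clock is known at compile time, but has to be executed on each call when the source clock is only known at run time. The helper names and the 60 MHz source clock are assumptions made up for the example, and a plain variable stands in for the clock register.

```cpp
#include <cstdint>

// Smallest divider that keeps the output at or below the requested rate
// (ceiling division). Hypothetical helper, not the SPI library's code.
constexpr uint32_t spiDivider(uint32_t sourceHz, uint32_t targetHz) {
  return targetHz == 0 ? 0xFFFFFFFFu : (sourceHz + targetHz - 1) / targetHz;
}

// T3.x-style: the source clock is a compile-time constant, so the whole
// computation is done by the compiler and only a literal remains.
constexpr uint32_t kFixedSourceHz = 60000000;  // assumed value
constexpr uint32_t kDivAt30MHz = spiDivider(kFixedSourceHz, 30000000);
static_assert(kDivAt30MHz == 2, "evaluated by the compiler");

// T4-style: the source clock depends on a setting read at run time
// (modelled here by a volatile variable), so the division is executed
// every time the settings are built.
volatile uint32_t currentSourceHz = 60000000;  // stand-in for a clock register
inline uint32_t spiDividerNow(uint32_t targetHz) {
  return spiDivider(currentSourceHz, targetHz);
}
```

The second form is the more robust one if another library changes the clock, and it is also where the extra code size and run time come from.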
 
@KurtE
RE: USB Serial.available/read

I just can't help myself when I get a "bee in my bonnet" as they say. Maybe I was way off in what I was looking at. For the imxrt, the USB serial seems to be controlled in usb_serial.h/.c and usb_inst.h, and usb_dev.h links to the functions in those files for the imxrt.h.

Doing some more reading and digging, found a couple of things that I am putting here just so I don't lose them:
Teensy 3.1 bare metal: Writing a USB driver - real nice blog post on what's going on
USB 2.0 UTMI Specification - since I was curious about the receiver state machine that the 1052 manual references.
Chapter 55 in the 1052 Reference Manual

A lot of info to absorb if you want to get into it.
 
@mjs513 and @Paul - Yes there is a lot to try to understand as to get USB stuff to work. I have had to browse a lot of it over the last couple of years trying to help out on the usbhost_t36 code base. Will take a look at some of the links you mentioned...

I know at some point (probably several times), Paul has posted links to where to get the full USB 2 documentation and which sections. There are areas that talk about all of the basics on like the different packets...

Then once you get those basics working you start to get into different specifics, like: HID documentation, for Mice, keyboards, joysticks....

Then when you get to things like Bluetooth, there are again more documents on the basics and then the different types of messages, different sub-systems...

Got to run, Annie wants her breakfast!
 
@KurtE
Saw the USB Spec ref that @Paul posted, the interesting stuff starts in chapter 5 (enumerators, pipes, etc.), Chapter 4 is the basics. Cannot find the link that @Paul posted but I did find this link https://www.usb.org/document-library/usb-20-specification, the zip contains the USB 2.0 spec and a lot of other docs. I found it from an Intel link about USB 3.0 and USB 2.0 specifications. Just in case anyone is interested.

Just as a BTW my little one is still sleeping.
 
A lot of info to absorb if you want to get into it.

Yes, indeed USB is very complicated. The controllers in this new chip are basically the same as the 2nd port on Teensy 3.6 (where we've only supported host mode, never device mode). It's completely different than the main port on all Teensy 3.x boards. The hardware is far more powerful, but also more complex, and that's all on top of the tremendous complexity of USB in general.

I'm planning to make some changes from the overall USB stack design from Teensy 3.x, mainly with how memory is allocated. Most of those design choices were made in 2011-2012 for Teensy 3.0 with only 16K RAM. Those choices also revolve around that controller's hardware operating at the USB token level. Now that we have far more RAM, 40X the bandwidth (480 Mbit vs 12 Mbit), and a controller scheme designed around the USB transfer level, I'm planning to do things differently. The main difference will involve the memory allocation. I want to get away from a shared pool of buffers at the device level. The general idea for 4.x is memory allocated by the interface level code.

So the code for Teensy 3.x isn't a good road map for how I want this to work on Teensy 4.
 
This is the link for a copy of just the USB 2.0 spec (hosted on PJRC's server) without all that other stuff.

https://www.pjrc.com/teensy/beta/usb20.pdf

I've never understood why the USB-IF website doesn't support the concept of permalinks to these important documents.
Thanks Paul - I normally had to hunt for it...

Also again for other topic (bluetooth and the like), I also have to refer to the Bluetooth_Core document... I think that was part of some zip file... But I have a larger document and one where I extracted the main HCI stuff and I think it was another manual that had the L2CAP definitions...
 