Teensy 4.0 First Beta Test

Pretty sure the config setups are right. Works for any pin combination that does not include pin 0. Really must be missing something obvious. Been through the RM more times than I care to count :)

I think you noted core_pins.h had wrong PIN0_PADCONFIG
Code:
#define CORE_PIN0_PADCONFIG	IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B0_02
#define CORE_PIN1_PADCONFIG	IOMUXC_SW_PAD_CTL_PAD_GPIO_AD_B0_03
I presume you are testing with corrected version (github is still wrong). (Failing that, maybe try it with the wrong config???)
 

Yep - saw that and fixed it a while ago. Even if they are switched the pad config is the same anyway so it shouldn't matter.
 
Have you tried the following to reduce temp and power consumption instead?
Code:
while(1) __asm__ volatile ("wfi");
That will put the CPU core to sleep but still let it wake up for anything important, which will only be interrupts.

Post #1111 I noted : "I already did a test of 'wfi' - sleepCPU() - and it wasn't clear ... first the line before it printed about 20 times - then a small edit and I don't know that it worked so will revisit it. "

My goal was to be able to find some useful postmortem info and return it - I should probably find something to read on that; so far I'm just poking road kill with a stick.

Paul already does a reg dump, but I was hoping to link to a user func() to allow user to present their personal data and structures - this worked on T_3.x. So far making a call() can just die hard - but my 'micros blink without delay' has run these past hours though all serial output ceased before it got to that code.

Indications are interrupts are halted in the fault state - systick_isr is not counting. Serial# output stalls until pushed? And the 'wfi' feedback was odd - but that would be useful to put system in minimal state until it was attended to.

One goal would be to allow Software Auto Upload without the button - that was working on T_3.x, but only because PJRC had found the magic 'keep serial moving' actuators.
 
Yep, here is the example for using pin 0 and pin 1 (just the setup part):
Code:
#include "Enc_Layer.h"
#include "defines.h"

enc_config_t mEncConfigStruct;
uint32_t mCurPosValue;

void setup()
{
  while(!Serial && millis() < 4000);
  delay(2000);

  CCM_CCGR2 |= CCM_CCGR2_XBAR1(CCM_CCGR_ON);   //turn clock on for xbara1

  CORE_PIN0_CONFIG = 0x01;  //Select mux mode: ALT1 (0x01) for pins 0 or 1,
                            //ALT3 (0x03) for pins 2/3
  CORE_PIN2_CONFIG = 0x03;  //bit 4 (SION) clear: input path is determined by functionality,
                            //use 0x13 to set SION and force the input path
  
  CORE_PIN0_PADCONFIG = 0x10B0;  //pin pad configuration
  CORE_PIN2_PADCONFIG = 0x10B0;
  
  //set as input
  IOMUXC_XBAR1_IN17_SELECT_INPUT = 0x00;  //set both pins as input
  IOMUXC_XBAR1_IN06_SELECT_INPUT = 0x00;

  //==========================================================================
  /* XBARA_SetSignalsConnection(XBARA1, kXBARA1_InputIomuxXbarIn21, kXBARA1_OutputEnc1PhaseAInput);
   * XBARA_SetSignalsConnection(XBARA1, kXBARA1_InputIomuxXbarIn22, kXBARA1_OutputEnc1PhaseBInput);
   * XBARA_SetSignalsConnection(XBARA1, kXBARA1_InputIomuxXbarIn23, kXBARA1_OutputEnc1Index);
   * These are the SDK settings
   * kXBARA1_OutputEnc1PhaseAInput   = 66|0x100U,   // XBARA1_OUT66 output assigned to ENC1_PHASE_A_INPUT
   * kXBARA1_OutputEnc1PhaseBInput   = 67|0x100U,   // XBARA1_OUT67 output assigned to ENC1_PHASE_B_INPUT
   * kXBARA1_OutputEnc1Index         = 68|0x100U,   // XBARA1_OUT68 output assigned to ENC1_INDEX 
   * kXBARA1_OutputEnc1Home          = 69|0x100U,   // XBARA1_OUT69 output assigned to ENC1_HOME
   * kXBARA1_OutputEnc1Trigger       = 70|0x100U,   // XBARA1_OUT70 output assigned to ENC1_TRIGGER 
   *
   * kXBARA1_InputIomuxXbarInout06   = 6|0x100U,    // IOMUX_XBAR_INOUT06 output assigned to XBARA1_IN6 input.
   * kXBARA1_InputIomuxXbarInout07   = 7|0x100U,    // IOMUX_XBAR_INOUT07 output assigned to XBARA1_IN7 input.
   * kXBARA1_InputIomuxXbarInout08   = 8|0x100U,    // IOMUX_XBAR_INOUT08 output assigned to XBARA1_IN8 input.
   */
  
  xbar_connect(17, 66);
  xbar_connect(6, 67);


  Serial.print("IOMUXC_GPR_GPR6: "); Serial.println(IOMUXC_GPR_GPR6, BIN);

  delay(5000);
  //========================================================================
  //Phase A => pin 0 (XBAR1 input 17 -> output 66)
  //Phase B => pin 2 (XBAR1 input 6 -> output 67)
    /* Initialize the ENC module. */
    ENC_GetDefaultConfig(&mEncConfigStruct);
    ENC_Init(&mEncConfigStruct);
    ENC_DoSoftwareLoadInitialPositionValue(); /* Update the position counter with initial value. */
}

uint32_t old_position = 0;

void loop(){
  
  /* This read operation would capture all the position counter to responding hold registers. */
  mCurPosValue = ENC_GetPositionValue();

  if(mCurPosValue != old_position){
    /* Read the position values. */
    Serial.printf("Current position value: %ld\r\n", mCurPosValue);
    Serial.printf("Position differential value: %d\r\n", (int16_t)ENC_GetHoldPositionDifferenceValue());
    Serial.printf("Position revolution value: %d\r\n", ENC_GetHoldRevolutionValue());
    Serial.println();
  }

  old_position = mCurPosValue;
}

Pretty sure the config setups are right. Works for any pin combination that does not include pin 0. Really must be missing something obvious. Been through the RM more times than I care to count :)

Again in this case for XBAR 17... The line:
Code:
  IOMUXC_XBAR1_IN17_SELECT_INPUT = 0x00;  //set both pins as input
Should be:
Code:
  IOMUXC_XBAR1_IN17_SELECT_INPUT = 0x01;  //set both pins as input
Page 910 PDF
 
FLEXIO - Uart Transmit update
Again not sure if anyone is interested (yet).

But now have some more of the stuff built into the underlying class. Things like mapping the IO pin to the FlexIO mode and setting the correct clock gate.

Also started playing with some of the baud rate generation. Right now the code is assuming that the clock passed in is something like 480M/16,
which is sort of the default:
That is, the FlexIO1 clock can be sourced from PLL4, PLL3_PFD2, PLL5, or PLL3_SW_CLK (default PLL3_SW_CLK),
then divided by the value of CDCDR[FLEXIO1_CLK_PRED] (default 2) and by CDCDR[FLEXIO1_CLK_PODF] (default 8).

Not sure if I should try setting any of these values and/or look up their current values if the user, for example, asks for a specific baud rate...
If I am looking at the clock setups on about page 986:
PLL4 = 786.43
PLL3_PFD2 = 508.24
PLL5=649.52
PLL3= 480

Not sure if this is correct?
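
To make the divider arithmetic above concrete, here is a minimal sketch in plain C++ (it runs on a PC, no Teensy needed). It assumes the PLL3_SW_CLK source at 480 MHz with the default dividers of 2 and 8, and the usual FlexIO dual 8-bit baud counter relation, baud = clock / (2 * (divider + 1)), as I read it in the RM. The function names are made up for illustration and are not part of the core.

```cpp
#include <cstdint>

// Hypothetical helpers, not from the Teensy core.

// FlexIO clock after the two CCM dividers (actual divide values, not raw register fields).
constexpr uint32_t flexio_clock_hz(uint32_t source_hz, uint32_t pred_div, uint32_t podf_div) {
  return source_hz / pred_div / podf_div;
}

// Low byte of TIMCMP: baud = clock / (2 * (div + 1)), rounded to the nearest divider.
constexpr uint32_t baud_divider(uint32_t clock_hz, uint32_t baud) {
  return (clock_hz + baud) / (2 * baud) - 1;
}

// TIMCMP for one frame: high byte = bits * 2 - 1, low byte = baud divider.
constexpr uint32_t timcmp_value(uint32_t clock_hz, uint32_t baud, uint32_t bits) {
  return ((bits * 2 - 1) << 8) | (baud_divider(clock_hz, baud) & 0xFF);
}

// Baud rate actually produced by a given divider.
constexpr uint32_t actual_baud(uint32_t clock_hz, uint32_t divider) {
  return clock_hz / (2 * (divider + 1));
}
```

Under those assumptions, 480 MHz / 2 / 8 gives 30 MHz, and asking for 115200 baud with 8 data bits gives a divider of 129 and a TIMCMP of 0xF81, for an actual rate of roughly 115384 baud. If the real source clock differs from 480 MHz, the result shifts accordingly.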
 

Attachments: flex_uart_test-190122a.zip (3.4 KB)
I have to copy the dmasettings twice:

dmatx = dmasettings[0];
dmatx = dmasettings[0];

Nothing else helps: not asm("dmb"), not asm("dsb"), not asm("isb"); all three combined do not help either. I even tried a delay.
dmatx is in normal RAM, which is not cached.
I found this only by accident, after hours of trying to work out why restarting the autorefresh always started with the last chunk of data.
As soon as I accidentally had this line twice, it suddenly worked.

Anybody got an idea? :)
Yes, me !
https://github.com/PaulStoffregen/cores/pull/341/commits/6da04a6e324d8513c361cc1aa3f1a413465d6e8d
 
Kurt, Paul merged my pull request that fixes at least the PFDs. The PLL3 / PFD settings are correct (on GitHub). The others - don't know.
 

Hey Kurt
That did the trick. Just checked the reference and you are absolutely correct. I think it's the only pin I tested that needs 1 for the XBAR1 select input. All the other pins were 0. Tested it and it worked no problem.

Told you it was going to be something obvious :). Just needed another pair of eyes I guess. Have to figure out a way to put this all into some sort of structure so I don't screw it up again.
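
One possible shape for such a structure, as a rough sketch only: a small table that keeps the mux mode, XBAR input number and select-input (daisy) value together per pin, so they cannot drift apart. The struct and function names are hypothetical, and only the two pins whose values were confirmed in this thread (pin 0 and pin 2) are filled in; any other row would have to be checked against the RM.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical table layout, not part of the core or of Enc_Layer.h.
struct EncPinInfo {
  uint8_t pin;          // Teensy pin number
  uint8_t mux_mode;     // value for CORE_PINx_CONFIG (ALT mode that selects XBAR)
  uint8_t xbar_input;   // XBAR1 input index passed to xbar_connect()
  uint8_t select_input; // value for IOMUXC_XBAR1_INxx_SELECT_INPUT
};

// Only rows confirmed in this thread; extend after checking the RM.
constexpr EncPinInfo kEncPins[] = {
  {0, 0x01, 17, 0x01},  // pin 0: the one pin found so far that needs select input = 1
  {2, 0x03,  6, 0x00},  // pin 2
};

// Returns the table entry for a pin, or nullptr if the pin is not listed.
inline const EncPinInfo *find_enc_pin(uint8_t pin) {
  for (const EncPinInfo &p : kEncPins) {
    if (p.pin == pin) return &p;
  }
  return nullptr;
}
```

A setup routine could then look up both encoder pins and refuse to continue when `find_enc_pin()` returns nullptr, instead of relying on hand-copied constants.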

On another note - Paul incorporated the pad change I mentioned earlier
 

Just gave this a try but I'm confused again. Looks like you are transmitting on pin 2? But I'm not sure what baud rate you are transmitting at. Saw a post that said 7575757. I attempted to tie pin 2 on the T4 to pin 0 on the T3.2 with USB to serial, but nothing was showing in the monitor. Suggestions?
 
Current code should now be near 115200; there is a #define for the desired baud. But it may depend on how close the system is to the PLL3 setting above.
 
I know I posted some clock values... with Frank's changes to the PFDs:
Code:
System Clock: 600000000
IPG Clock: 150000000
Semc Clock: 200000000
RTC Clock: 32768
USB1pll Clock: 480000000
Peripheral Clock: 24000000
Osc Clock: 24000000
Arm Clock: 1200000000
Usb1PllPfd0 Clock: 720000000
Usb1PllPfd1 Clock: 664615368
Usb1PllPfd2 Clock: 508235292
Usb1PllPfd3 Clock: 454736826
Usb2Pll Clock: 24000000
SysPll Clock: 528000000
SysPllPfd0 Clock: 351999990
SysPllPfd1 Clock: 594000000
SysPllPfd2 Clock: 396000000
SysPllPfd3 Clock: 297000000
 
Thanks guys - If you have tried the code I posted and nothing is coming out... My guess is you have not synced up to the core project, as I needed to update some of the macros in imxrt.h for setting some of the fields. They only updated 16-bit values, not the full 32-bit values, so some of the registers were not properly updated.

Example debug output to Serial:
Code:
Offset to SHIFTBUFNIS 780
Enable flexio clock
pin 2 maps to: 20000aa0, port: 401ac000 pin 4
timer index: 0 shifter index: 0 mask: 1
Before configure flexio
CCM_CDCDR: 33f71f92
VERID:2100404 PARAM:1 CTRL:1 PIN: 10
SHIFTSTAT:1 SHIFTERR=0 TIMSTAT=0
SHIFTSIEN:0 SHIFTEIEN=0 TIMIEN=0
SHIFTSDEN:0 SHIFTSTATE=0
SHIFTCTL:30402 0 0 0
SHIFTCFG:32 0 0 0
TIMCTL:1c00401 0 0 0
TIMCFG:2222 0 0 0
TIMCMP:f81 0 0 0
End Setup
The value like TIMCTL 1c00401 was not correct so nothing came out.
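
For anyone wondering how a 16-bit field macro can break a register like TIMCTL, here is a small host-side sketch. I don't know exactly how the old macros in imxrt.h were written, so the narrowing helper below is only a guess at the kind of problem described. The field positions are as I read them for the FlexIO TIMCTL register and should be treated as assumptions; all names are made up for the example.

```cpp
#include <cstdint>

// Assumed TIMCTL field positions (check against the RM before relying on them).
constexpr uint32_t timctl_trgsel(uint32_t n) { return (n & 0x3F) << 24; }
constexpr uint32_t kTimctlTrgpol = 1u << 23;
constexpr uint32_t kTimctlTrgsrc = 1u << 22;
constexpr uint32_t timctl_pinsel(uint32_t n) { return (n & 0x1F) << 8; }
constexpr uint32_t timctl_timod(uint32_t n)  { return n & 0x3; }

// Hypothetical bug: a field value gets narrowed to 16 bits,
// so anything above bit 15 silently becomes zero.
constexpr uint32_t narrow16(uint32_t v) { return static_cast<uint16_t>(v); }

// All fields kept as full 32-bit values.
constexpr uint32_t timctl_full() {
  return timctl_trgsel(1) | kTimctlTrgpol | kTimctlTrgsrc |
         timctl_pinsel(4) | timctl_timod(1);
}

// Same fields, each one passed through the 16-bit narrowing.
constexpr uint32_t timctl_truncated() {
  return narrow16(timctl_trgsel(1)) | narrow16(kTimctlTrgpol) |
         narrow16(kTimctlTrgsrc) | narrow16(timctl_pinsel(4)) |
         narrow16(timctl_timod(1));
}
```

With these field values the full version comes out as 0x01C00401, the same bit pattern as the 1c00401 in the dump, while the truncated version keeps only 0x00000401: the trigger fields in the upper half are gone, which would be enough to stop a timer from ever being triggered.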

Currently updating the code to hopefully support ISRs to do work. Trying to do a quick and dirty way for multiple objects to ask to be notified if an ISR happens....
As soon as this is semi functional, will then take some of the top level sketch code and put into a high level object based off stream...

Then start adding in the RX part

Edit: Thanks Mike for printing out the different actual clocks... Now just need to figure out how these names translate to names like PLL3 or PLL4 ;)
 

Yeah - that is pretty much what I was seeing when I ran it.

For the clock frequencies I copied a lot of what was in clock.c into a sketch just as a test case for me to check things out with. It was actually the first thing I did with the T4 as a test, so the sketch is a mess. But it's called Blink_T4 in my WIP GitHub repository - maybe it will help sort out the clock names. I never went back to do that and moved on after that. The link is: https://github.com/mjs513/WIP/tree/master/T4 helpers
 
Paul, do you plan any important changes to audio stream?

Depends on what you mean by "important". ;)

My plan is to keep the API identical. However, some places in the library use ugly shortcuts, like assuming the offset from block to buffer is 8 bytes. Will probably break those places and need to update them with something better.


Would be really great if we could use the GPU on a future T4.2 .. would make everything graphics-related so much better..

The 1052 chip you have now has the 2D GPU. Or it's supposed to have it. So far I have not had any time to try using it.

The GPU offers a fairly small feature set, mostly just image copy with optional alpha blend & colorspace transform, but it can do all those at 1 pixel per clock (maybe F_BUS?)
 
Would be perfect if there were some early hooks - before and after USB init. Setup() is too late - all objects get initialized before Setup().

I believe we have 2 hooks in Teensy 3.x startup.

void startup_early_hook(void)
void startup_late_hook(void)

The first is very early before the PLLs or static C variables initialized, mainly for taking control of the watchdog. The second is after C++ constructors but before setup().

Are these 2 good enough? Do we need a 3rd, maybe before constructors but after PLLs running and static variables initialized? Before USB, after USB, etc? Do we need 4, 5 or even more?!

Whatever ends up being done, I'd like to have the same names and roughly similar places in the startup process on Teensy 3 & 4.
 
My goal was to be able to find some useful postmortem info and return it

I'm planning to do something along these lines later in 2019, using a combination of code on the iMXRT and special support in the bootloader.

Sorry, can't talk about these plans right now - other than to let you know I do have something in mind for later this year, so all this debug printing stuff currently in the fault handler should be considered only temporary.
 
@manitou - Any chance you could adapt coremarkish to use buffers in OCRAM? But leave the stack in DTCM. Either malloc() or DMAMEM allocate in OCRAM. Really looking to benchmark the performance impact of the different cache settings in the MPU.
 

I'll take a look. I haven't looked that closely at the coremark code. i'll look at linpack benchmark too, big float/double arrays.
 
@manitou - think there is a setting you can change in core_portme.h:
Code:
#ifndef MEM_LOCATION
#define MEM_LOCATION "STACK"
#endif
Just don't know what the other settings are or what would need to be updated for T4.
 

coremark has a configurable MEM_METHOD (static, heap, stack). Not a big difference in iterations/second with T4@600mhz
Code:
         STACK   MALLOC
Faster   2295.2  2290.6
Fastest  2426.7  2426.2

linpack 100x100 float  (mflops)
Faster    161.5   152.6
Fastest   166.7   157.1
This is with stock beta7.

On EVKB@600mhz (-O3), coremark gets 2439.2 iterations/sec, adding SCB_DisableDCache() drops to 2437.9.
 

I figured the current was just a placeholder - given the printed values aren't shown with names :) Interesting there are capabilities/plans to add further support.

I looked a bit and I found a snippet on Community.NXP that looks to express the same dumped values - only with labels. I'll see if I can't clean the following up enough to offer the named replacement for a Pull Request - maybe with a couple of bit checks and notes on the fault regs. The T4 seems less usable for calling user code than the T3 after a fault - at least the one I picked. Faults don't seem to be a plague - so I'll try to do this and then get back to the other stuff I was doing on T3 for debug.
User_uiv:: irq
stacked_r0 :: 2000073C
stacked_r1 :: 2000073C
stacked_r2 :: 00000061
stacked_r3 :: 00C00000
stacked_r12 :: FFFFFFFF
stacked_lr :: 600011CB
stacked_pc :: 6000127E
stacked_psr :: 21010000
_CFSR :: 00000400
_HFSR :: 40000000
_DFSR :: 00000000
_AFSR :: 00000000
_BFAR :: 00000000
_MMAR :: 00000000

PJRC > irq 3
00000000
00000000
00000000
00000000
40000000
00000004
00000000
21010000
6000127E
600011CB
FFFFFFFF
00C00000
00000061
2000073C
2000073C
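
As a starting point for the "couple of bit checks", here is a minimal sketch that turns CFSR and HFSR values into flag names. It is plain C++ that runs on a PC, the function name is made up, and it only covers a handful of the standard ARMv7-M status bits rather than every one.

```cpp
#include <cstdint>
#include <string>

// Decode a few CFSR/HFSR bits (standard ARMv7-M positions) into readable names.
// Hypothetical helper, not part of the core's fault handler.
inline std::string decode_fault(uint32_t cfsr, uint32_t hfsr) {
  std::string out;
  auto add = [&out](bool set, const char *name) {
    if (set) {
      if (!out.empty()) out += ' ';
      out += name;
    }
  };
  // HardFault status register
  add(hfsr & (1u << 1),  "VECTTBL");
  add(hfsr & (1u << 30), "FORCED");
  // MemManage part of CFSR (bits 0..7)
  add(cfsr & (1u << 0),  "IACCVIOL");
  add(cfsr & (1u << 1),  "DACCVIOL");
  // BusFault part of CFSR (bits 8..15)
  add(cfsr & (1u << 8),  "IBUSERR");
  add(cfsr & (1u << 9),  "PRECISERR");
  add(cfsr & (1u << 10), "IMPRECISERR");
  add(cfsr & (1u << 15), "BFARVALID");
  // UsageFault part of CFSR (bits 16..31)
  add(cfsr & (1u << 16), "UNDEFINSTR");
  add(cfsr & (1u << 17), "INVSTATE");
  add(cfsr & (1u << 24), "UNALIGNED");
  add(cfsr & (1u << 25), "DIVBYZERO");
  return out;
}
```

Fed with the values from the labelled dump above (_CFSR 00000400, _HFSR 40000000), it reports FORCED and IMPRECISERR, i.e. an imprecise bus fault that was escalated to a hard fault.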
 
I believe we have 2 hooks in Teensy 3.x startup.

void startup_early_hook(void)
void startup_late_hook(void)
yes i know.

Do we need 4, 5 or even more?!
? what did you eat? I want the same..

One additional after all PLLs and USB are initialized.
Code:
    while (millis() < 20) ; // wait at least 20ms before starting USB
    usb_init();    
                      <- here
    analog_init();
    pwm_init();
Or none, if you decide to add the code in question to the core.
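
To illustrate where such a hook would sit, here is a host-side simulation only: the real startup code lives in the core and is not reproduced here. The two existing hook names are taken from the post above; `startup_after_usb_hook` is a hypothetical name for the requested third hook. The order of the logged steps is simplified and assumed, and the default hooks log their name just to make the order visible (real defaults would be empty).

```cpp
#include <string>

// Log of simulated startup steps, in the order they ran.
static std::string g_startup_log;

// Weak defaults: a sketch could override any of these by defining a
// non-weak function with the same name in another source file.
extern "C" {
__attribute__((weak)) void startup_early_hook(void) { g_startup_log += "early_hook,"; }
__attribute__((weak)) void startup_late_hook(void)  { g_startup_log += "late_hook,"; }
// Hypothetical third hook, called right after USB init.
__attribute__((weak)) void startup_after_usb_hook(void) { g_startup_log += "after_usb_hook,"; }
}

// Stand-in for the reset sequence; each hardware step is just logged.
static const std::string &simulate_startup() {
  g_startup_log.clear();
  startup_early_hook();              // very early, before the PLLs are set up
  g_startup_log += "pll,";
  g_startup_log += "usb_init,";
  startup_after_usb_hook();          // proposed position
  g_startup_log += "analog_init,pwm_init,";
  g_startup_log += "constructors,";
  startup_late_hook();               // after C++ constructors, before setup()
  g_startup_log += "setup";
  return g_startup_log;
}
```

The weak attribute is a GCC/Clang extension; it is what lets the core ship an empty default while a sketch supplies its own version without touching the core files.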
 
The 1052 chip you have now has the 2D GPU. Or it's supposed to have it. So far I have not had any time to try using it.

The GPU offers a fairly small feature set, mostly just image copy with optional alpha blend & colorspace transform, but it can do all those at 1 pixel per clock (maybe F_BUS?)

Yeah, I saw that. Of course I meant with the corresponding interface to the outside....
I didn't really look either.
Really useful would be a color LUT, a translation table that allows a 256 to 65536 color lookup and output without needing a buffer for the output. Do you have any idea if we can use any existing device for it? A cool DMA trick? Anything? Or can we use the existing GPU for it?
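
For reference, this is the software version of the idea, as a minimal sketch: 8-bit palette indices are expanded to RGB565 one line at a time, so only a line buffer is needed rather than a full 16-bit frame. It says nothing about whether the GPU or a DMA trick can do the same in hardware, and the function names are made up for the example.

```cpp
#include <cstdint>
#include <cstddef>

// Pack 8-bit R, G, B components into one RGB565 pixel.
constexpr uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b) {
  return static_cast<uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// Expand one line of 8-bit palette indices into RGB565 pixels.
// 'palette' is the 256-entry lookup table; 'out' must hold 'count' pixels.
inline void expand_line(const uint8_t *indices, size_t count,
                        const uint16_t (&palette)[256], uint16_t *out) {
  for (size_t i = 0; i < count; ++i) {
    out[i] = palette[indices[i]];
  }
}
```

The frame then costs one byte per pixel plus 512 bytes for the table, and each expanded line can be handed to the display before the next one is converted.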
 