Teensyduino 1.55 Beta #1

@Defragster - Does the temperature increase scale with CPU frequency, or is it a fairly fixed 6 degrees?

Recorded temps are just incidental from getting other testing done. Did not try any freq but 600 MHz so far. And looking at the mA the other day it showed perhaps 1 mA diff? Can that make that much more heat? I don't have the best DMM ...

Newest current Temps below - #210 with pins suggests the variability (versus #960 w/pins) is just board to board difference of build or MCU temp sensor? But #970 to #800 is significant.
> I should put a TEMP KEY in #960 so it uses encrypted code

Finally got the Code4Code writing 4,000 functions that call #1, #2, ..., #3999, #0:
Memory Usage on Teensy 4.0:
FLASH: code:1002180, data:326432, headers:8728 free for files:694276
RAM1: variables:90976, code:38616, padding:26920 free for local variables:367776
RAM2: variables:24736 free for malloc/new:499552


Just labeled the four cables to the T4s. Here are the current results, with temps that may now be stable. Each board is identified by the last 3 digits of its serial number:
Code:
[B]#970 LOCKED Beta on TallDog:[/B]
8.9346, 51.16
Completed CASCADE Count 4000	Cascading took 5138517 us [3063856073 piCycles] : net 32091
Direct calls took 5108869 us [3064095776 piCycles] : net 2043

[B]#960 Unlocked Beta with Pins:[/B]
9.1948, 52.33
Completed CASCADE Count 4000	Cascading took 5134575 us [3063203261 piCycles] : net 29237
Direct calls took 5108866 us [3064094767 piCycles] : net 2042

[B]#800 Production on TallDog:[/B]
34.3894, 46.06
Completed CASCADE Count 4000	Cascading took 5134584 us [3063204246 piCycles] : net 29244
Direct calls took 5108867 us [3064094784 piCycles] : net 2043

[B]#210 Production with Pins:[/B]
14.9686, 52.02
Completed CASCADE Count 4000	Cascading took 5134601 us [3063201678 piCycles] : net 29265
Direct calls took 5108867 us [3064094892 piCycles] : net 2043

As for what those results say about the test as designed: {this Code4Code on github}
> 1 MB of code in FLASH - most of it runs safely from encrypted FLASH (on #970)
-> each of the 4,000 func#()'s sums the 60 digits and compares to a known value
> 326 KB of DATA in FLASH - almost all of it is read to compare against running results.
-> each of the 4,000 func#()'s puts digits in string[60] to compare against the known copy in FLASH, and against a 2nd FLASH copy unique to that func#()
> When Cascading, code runs from FLASH outside the cache and data is also read outside the cache, which takes about 25 ms longer out of 5108 ms
-> and, if the numbers are right, a bit longer again when decrypting from FLASH: about 4 ms more when Cascading (5138 ms vs. 5134 ms)
-> subtracting the cpu_cycles spent in the local loop getting Pi digits gives the net microseconds spent in calls and compares
 
On this unit from above:
Code:
[B]#960 Unlocked Beta with Pins:[/B]
9.1948, 52.33
Completed CASCADE Count 4000	Cascading took 5134575 us [3063203261 piCycles] : net 29237
Direct calls took 5108866 us [3064094767 piCycles] : net 2042

Now it has a pem.key stored (same as locked #970):
Code:
[B]#960 KEYED but Unlocked Beta with Pins:[/B]
3.9469, 53.00 
Completed CASCADE Count 4000	Cascading took 5138462 us [3063839029 piCycles] : net 32064
Direct calls took 5108869 us [3064095859 piCycles] : net 2043
> The added time is showing - same as Locked #970 - Temp still adjusting 53.00 to 53.67 - up slightly.
> This code runs once for 5 secs and then does 5 second delay - so maybe not at MAX heat? But all are the same code.

Pushed same pem.key to 2nd Beta for first time - All Good:
Code:
Writing public key hash
public key hash is good :-)
Writing secret key
key written, setting config

Testing Bus Encryption Engine
Success: ciphertext decryption test passed :-)

Notice: JTAG is still enabled

Notice: Secure mode is not yet set, unencrypted
        programs are still allowed to run
Verify secure code is running properly

Pass: Bus Encryption Engine is active
Pass: Encryption region starts at proper address
Pass: Program data is entirely within encrypted region
Pass: title_function() is within encrypted region

All Tests Passed.  :-)

isEncrypt() shows this starting Code4Code:
Code:
T:\tCode\T4Encrypt\Code4Code\Code4Code.ino Aug 26 2021 19:47:40
Verify secure code is running properly

Pass: Bus Encryption Engine is active
Pass: Encryption region starts at proper address
Pass: Program data is entirely within encrypted region
Pass: title_function() is within encrypted region
Pass: title_text[] is within encrypted region
Pass: csf is PJRC
NOTE: hab_version == 0x40307
NOTE: hab_status == 0xF0
Secure mode NOT SET :: Fuses == 0x8B018
 
@Defragster - Does the temperature increase scale with CPU frequency, or is it a fairly fixed 6 degrees?

I added a section about power consumption on the code security documentation page.

https://www.pjrc.com/teensy/td_code_security.html

So far it just mentions the 6 degrees...

@Paul : did you want one or more other speeds than 600 MHz? 528? Other?

I just rewrote loop to have no delays, but run three sets of the two ~5 sec test, so printing the temp each ~30 secs. Will let that run on all four for some time.

So far all temps look to be down 1-2 deg C from prior with no delay() in the loop.

Need to find Ser# code to personalize the output - or put a string in EEPROM ...

After some run time at 600 MHz - NO DELAY between Pi looping: {prior 'temp'} p#302
Code:
[B]#970 LOCKED Beta on TallDog:[/B]
57.4089, 49.04 {prior 51.16}
Completed CASCADE Count 4000	Cascading took 5139308 us [3063869471 piCycles] : net 32859
Direct calls took 5108568 us [3063919512 piCycles] : net 2036

[B]#960 Unlocked [U]Encrypted [/U]Beta with Pins:[/B]
57.9214, 51.67 {prior 52.33}
Completed CASCADE Count 4000	Cascading took 5139290 us [3063864775 piCycles] : net 32849
Direct calls took 5108568 us [3063919436 piCycles] : net 2036

[B]#800 Production on TallDog:[/B]
57.8957, 44.20 {prior 46.06}
Completed CASCADE Count 4000	Cascading took 5134759 us [3063228808 piCycles] : net 29378
Direct calls took 5108566 us [3063918648 piCycles] : net 2035

[B]#210 Production with Pins:[/B]
57.3834, 49.56 {prior 52.02}
Completed CASCADE Count 4000	Cascading took 5134779 us [3063226278 piCycles] : net 29402
Direct calls took 5108566 us [3063918613 piCycles] : net 2035

* switching to 528 MHz
** Seems programming takes longer with encoded (encrypted) code?

Looks like over 30 minutes running at 528 MHz:
Code:
[B]#970 LOCKED Beta on TallDog:[/B]
37.8521, 44.80
Completed CASCADE Count 4000	Cascading took 5836292 us [3063878105 piCycles] : net 729829
Direct calls took 5805232 us [3063941894 piCycles] : net 698663

[B]#960 Unlocked [U]Encrypted [/U]Beta with Pins:[/B]
37.8521, 47.67
Completed CASCADE Count 4000	Cascading took 5836313 us [3063880542 piCycles] : net 729846
Direct calls took 5805232 us [3063941987 piCycles] : net 698663

[B]#800 Production on TallDog:[/B]
38.9998, 40.49
Completed CASCADE Count 4000	Cascading took 5831327 us [3063242290 piCycles] : net 725924
Direct calls took 5805230 us [3063941081 piCycles] : net 698662

[B]#210 Production with Pins:[/B]
39.5822, 47.11
Completed CASCADE Count 4000	Cascading took 5831329 us [3063239022 piCycles] : net 725931
Direct calls took 5805231 us [3063941485 piCycles] : net 698662
 
Compiling Code4Code a couple times - with 4 other sketches open I just got this a second time from the IDE:
Code:
cleaning build path: remove C:\Users\Tim\AppData\Local\Temp\arduino_build_38043\Code4Code.ino.elf: The process cannot access the file because it is being used by another process.
Error compiling for board Teensy 4.0.

Edited Sink, Kitchen, and Code4Code to use the T4 core's SerialNumber code, then decorated the output with the state of ENCryption and Secure Mode:
10379970 ENC SM: > is both of those << Beta TallDog
10379960 ENC ns: > is running ENCoded, but 2nd beta board not secure locked << Beta Pinned
6683800 nor ns: > is Neither, but 'nor'mal and not secure << Prod TallDog
8710210 nor ns: > is Neither, but 'nor'mal and not secure << Prod Pinned
Updated : Defragster/T4LockBeta

Here is TEMP and Perf noted for 150 MHz:: The trend in Temp persists, one Prod cooler and one much warmer.
Code:
36.4930, 39.14  // TallDog board
[B]10379970 ENC SM:[/B] Completed CASCADE Count 4000	Cascading took 20268124 us [3058350277 piCycles] : net 15170874
Direct calls took 20270812 us [3063725219 piCycles] : net 15164604

36.4930, 41.00  // bare with pins
[B]10379960 ENC ns:[/B] Completed CASCADE Count 4000	Cascading took 20268110 us [3058352692 piCycles] : net 15170856
Direct calls took 20270812 us [3063725222 piCycles] : net 15164604

38.5123, 34.91  // TallDog board { First run T_4.0?  No PCB maker hash mark silkscreen }
[B]6683800 nor ns:[/B] Completed CASCADE Count 4000	Cascading took 20259703 us [3058036145 piCycles] : net 15162977
Direct calls took 20270806 us [3063724190 piCycles] : net 15164600

34.4584, 40.96  // bare with pins
[B]8710210 nor ns:[/B] Completed CASCADE Count 4000	Cascading took 20259635 us [3058031366 piCycles] : net 15162917
Direct calls took 20270807 us [3063724264 piCycles] : net 15164600

* At this slow speed, the two betas running encrypted code from FLASH
* are only under 9 ms slower overall in 20 secs of runtime with Cascading calls - and take the same 20.27 secs running direct cached calls.
* All four show proportionate perf: ~4X slower at one quarter the CPU speed
 
@Paul - @defragster
Got curious again about the temps while reading Tim's analysis, so I redid the temperature runs. This time the loop continuously calculates the prime numbers between 2 and 100,000, and I time each prime-number cycle and print it with the associated temp:
Code:
void loop() {
  int lower = lower_init;  
  time_now = millis();
  while ( lower < higher)
  {
    flag = 0;
    for ( int x = 2; x <= lower/2; ++x)
    {
      if ( lower % x == 0)
      {
        flag = 1;
        break;
      }
    }
    //if ( flag == 0){
    //  Serial.printf("%d ", lower);
    //}
    ++lower;

  }
  //Serial.println();
  Serial.print((millis()-time_now)*0.00001667,4);Serial.print(", "); 
  Serial.println(tempmonGetTemp(),2);
}
The breakout board was my T4 breakout board and I used a Production T4 and the Locked T4 I have been experimenting with.

At 450 MHz there is about a 3.5 deg difference, with the Locked T4 running hotter than the Production T4. At 600 MHz I am only seeing about a 2.5 deg difference between the 2 boards, while at 912 MHz there is essentially no difference in temperatures. Since I like graphs (they tell me more) I plotted both boards:
[Attached graphs: 450Mhz.PNG, 600Mhz.PNG, 912Mhz.PNG]

EDIT: Ref also early temp graphs with just temp in loop and 500ms delay. https://forum.pjrc.com/threads/67989-Teensyduino-1-55-Beta-1?p=286405&viewfull=1#post286405
 
Could you measure at a lower frequency, too? i.e. 450 MHz

Hi Frank - just updated post #306 with the 450 MHz chart. The Locked Teensy is running about 3-3.5 deg hotter than the Production board. Interesting correlation: the slower the speed, the larger the delta between the 2 boards :)

Now is the time for some zzzz's.
 
Quite possible - the decryption is not clocked with cpu freq, but the flash bus freq (or similar), which is constant.
 
Morning all... Wondering are we talking about same hardware? Is one a REV B of the IMXRT and other a REV A? And if there is chance that might somehow change things?
 
Good point. All my T4's are Rev A's. Don't have any Rev B's
 
Note: Quick update... I am running @mjs513's sketch on two T4s...

One is an older OshPark T4... The other is a new production T4. Both have no pins on them and are hanging off of my desk by USB cables. So air on all sides... Both started up pretty close to same time...

Note: the New one I have run the Fuse Write sketch on it... Which went part way through
... but won't load an .ehex currently.

screenshot.jpg

Edit: been running maybe 10 minutes longer and one at about 49.8 and the other around 54.6
Note: I turned off the OSHPark one for about 10 seconds to confirm it is the one running cooler
 
I've filled in all the code security documentation. Hopefully no big surprises.

https://www.pjrc.com/teensy/td_code_security.html

Well, except for a photo of how we will mark the lockable boards. Still waiting on a proper stamp to replace my hand-drawn with sharpie marks.

Took a quick read through and yep, no surprises. Don't know if you want to mention what happens when you try a 15 s button press, or just that it's not supported on a Locked T4.x.
 
Looks good. A couple of things...

Not sure yet about temp... As I mentioned in previous post, the new T4 running hotter than the older OSHPark T4... Also I have locked one running as well, although not a straight comparison as it has pins soldered in and piece of red tape...
screenshot.jpg

And in the photo below the screens left to right: OSH, T4 prod, T4 locked:
screenshot2.jpg

Was unclear in your write-up if you can load .ehex files on newer (since June) T4s or T4.1s... Currently with Beta 1 my new T4 does not run with .ehex file... As mentioned earlier.

It is also unclear from this passage:
Standard Teensy 4.0 & 4.1 manufactured before June 2021, and all MicroMod Teensy can not use the key in fuse memory for code decryption. If you run the fuse write sketch, the encryption test will fail when the Bus Encryption Engine (BEE) is turned on.
whether you are saying that MM will never be able to run encrypted code, or that it will require a new bootloader...
 
RE: td_code_security.html
> Perhaps Open with 'properly short' description of the two paths on 1062 Teensy: Unlocked Secure Mode with Encryption and Lockable Encryption. Then to answer questions like KurtE's, it might have a clear note that some 1062's will never be able to use some/all features discussed - or delineate which will or won't with pending bootloader update - based on manufacture date, etc as possible. Defining and setting expectations up front might help. Then introducing the " Lockable Teensy", and the reader will know what parts that follow may apply - without drooling all over it to have the balloon burst at the end.

Rewrote the isEncrypt() in this sketch : T4LockBeta/tree/main/PrimeEncrypt

Missing from isEncrypt() is ?:: Can the Teensy be Locked and Can the Teensy have Encryption key set/used for Secure Mode

isEncrypt() :: gets 1062 Teensy Serial number, appends the ENC and SecureMode indicators - and then the MHz in global String ::
> 10379970 ENC SM @600
> 6052840 nor ns: @600

Played with the Prime loop (it was missing some var declarations) - it just runs for about 10-11 seconds looking for primes from 2 to 6000000 { oops, skipped and didn't include 2 }.

But it is just busy work. Sending 'USB' will toggle the SPEW of the primes it finds in rows. With no SPEW, the "Minutes, Temp" line is displayed 5-6 times a minute at 600 MHz.
 
Sorry Tim, I didn't post the whole sketch, only the loop code. Here is what I am using. I am not printing the primes, so I'm not getting the SPEW you are talking about. Also, Temp is printed at about 3.75 second increments if you use 2-100000 as the range:
Code:
// https://forum.pjrc.com/threads/33443-How-to-display-free-ram
#include <MemoryHexDump.h>  // https://github.com/KurtE/MemoryHexDump
long time_now;

uint32_t *ptrFreeITCM;  // Set to Usable ITCM free RAM
uint32_t  sizeofFreeITCM; // sizeof free RAM in uint32_t units.
extern unsigned long _stextload;
extern char _stext[], _etext[], _sbss[], _ebss[], _sdata[], _edata[],
       _estack[], _heap_start[], _heap_end[], _itcm_block_count[], *__brkval;
float myTemp;

//Prime number calculator: https://www.educba.com/prime-number-in-c-plus-plus/
int flag;
int lower_init = 2;
int higher = 100000;

void setup()
{
  // put your setup code here, to run once:
  Serial.begin(115200);
  while (!Serial);
  if ( CrashReport) Serial.print( CrashReport);
  Serial.println("\n" __FILE__ " " __DATE__ " " __TIME__);
  myTemp = tempmonGetTemp();
  Serial.printf( "\n\tdeg  C=%f\t F_CPU=%u\n" , myTemp, F_CPU_ACTUAL );
  memInfo();
  getFreeITCM();

  extern const uint32_t hab_csf[768]; // placeholder for HAB signature
  Serial.println();
  //dumpRam(Serial, 0x60000000 + ptrFreeITCM - 1024, 1024);
  //  Serial.println((uint32_t)&_stextload + (uint32_t)&_etext, HEX);
  MemoryHexDump(Serial, hab_csf , 128, true, "---\thab_csf\n");
  MemoryHexDump(Serial, ptrFreeITCM - 1024, 128, true, "---\tITCM used\n");
  MemoryHexDump(Serial, ptrFreeITCM, sizeofFreeITCM * sizeof(uint32_t), true, "---\tITCM filler to DTCM\t test 3 \n");
  MemoryHexDump(Serial, (uint8_t *)0, 128, true, " ITCM Start: \n");

  isEncrypt();

}

void loop() {
  int lower = lower_init;  
  time_now = millis();
  while ( lower < higher)
  {
    flag = 0;
    for ( int x = 2; x <= lower/2; ++x)
    {
      if ( lower % x == 0)
      {
        flag = 1;
        break;
      }
    }
    //if ( flag == 0){
    //  Serial.printf("%d ", lower);
    //}
    ++lower;

  }
  //Serial.println();
  Serial.print((millis()-time_now)*0.00001667,4);Serial.print(", "); 
  Serial.println(tempmonGetTemp(),2);
}


#define printf Serial.printf

void memInfo () {
  constexpr auto RAM_BASE   = 0x2020'0000;
  constexpr auto RAM_SIZE   = 512 << 10;
  constexpr auto FLASH_BASE = 0x6000'0000;
#if ARDUINO_TEENSY40
  constexpr auto FLASH_SIZE = 2 << 20;
#elif ARDUINO_TEENSY41
  constexpr auto FLASH_SIZE = 8 << 20;
#else
  constexpr auto FLASH_SIZE = 16 << 20;
#endif

  // note: these values are defined by the linker, they are not valid memory
  // locations in all cases - by defining them as arrays, the C++ compiler
  // will use the address of these definitions - it's a big hack, but there's
  // really no clean way to get at linker-defined symbols from the .ld file

  auto sp = (char*) __builtin_frame_address(0);

  printf("_stext        %08x\n",      _stext);
  printf("_etext        %08x +%db\n", _etext, _etext - _stext);
  printf("_sdata        %08x\n",      _sdata);
  printf("_edata        %08x +%db\n", _edata, _edata - _sdata);
  printf("_sbss         %08x\n",      _sbss);
  printf("_ebss         %08x +%db\n", _ebss, _ebss - _sbss);
  printf("curr stack    %08x +%db\n", sp, sp - _ebss);
  printf("_estack       %08x +%db\n", _estack, _estack - sp);
  printf("_heap_start   %08x\n",      _heap_start);
  printf("__brkval      %08x +%db\n", __brkval, __brkval - _heap_start);
  printf("_heap_end     %08x +%db\n", _heap_end, _heap_end - __brkval);
#if ARDUINO_TEENSY41
  extern char _extram_start[], _extram_end[], *__brkval;
  printf("_extram_start %08x\n",      _extram_start);
  printf("_extram_end   %08x +%db\n", _extram_end,
         _extram_end - _extram_start);
#endif
  printf("\n");

  printf("<ITCM>  %08x .. %08x\n",
         _stext, _stext + ((int) _itcm_block_count << 15) - 1);
  printf("<DTCM>  %08x .. %08x\n",
         _sdata, _estack - 1);
  printf("<RAM>   %08x .. %08x\n",
         RAM_BASE, RAM_BASE + RAM_SIZE - 1);
  printf("<FLASH> %08x .. %08x\n",
         FLASH_BASE, FLASH_BASE + FLASH_SIZE - 1);
#if ARDUINO_TEENSY41
  extern uint8_t external_psram_size;
  if (external_psram_size > 0)
    printf("<PSRAM> %08x .. %08x\n",
           _extram_start, _extram_start + (external_psram_size << 20) - 1);
#endif
  printf("\n");

  auto stack = sp - _ebss;
  printf("avail STACK % 8d b % 5d kb\t<<RAM1\n", stack, stack >> 10);

  auto heap = _heap_end - __brkval;
  printf("avail HEAP  % 8d b % 5d kb\t<<RAM2\n", heap, heap >> 10);

#if ARDUINO_TEENSY41
  auto psram = _extram_start + (external_psram_size << 20) - _extram_end;
  printf("avail PSRAM % 8d b % 5d kb\n", psram, psram >> 10);
#endif
}


uint32_t  SizeLeft_etext;
FLASHMEM
void   getFreeITCM() { // end of CODE ITCM, skip full 32 bits
  Serial.println("\n\n++++++++++++++++++++++");
  SizeLeft_etext = (32 * 1024) - (((uint32_t)&_etext - (uint32_t)&_stext) % (32 * 1024));
  sizeofFreeITCM = SizeLeft_etext - 4;
  sizeofFreeITCM /= sizeof(ptrFreeITCM[0]);
  ptrFreeITCM = (uint32_t *) ( (uint32_t)&_stext + (uint32_t)&_etext + 4 );
  printf( "Size of Free ITCM in Bytes = % u\n", sizeofFreeITCM * sizeof(ptrFreeITCM[0]) );
  printf( "Start of Free ITCM = % u [ % X] \n", ptrFreeITCM, ptrFreeITCM);
  printf( "End of Free ITCM = % u [ % X] \n", ptrFreeITCM + sizeofFreeITCM, ptrFreeITCM + sizeofFreeITCM);
  for ( uint ii = 0; ii < sizeofFreeITCM; ii++) ptrFreeITCM[ii] = 1;
  uint jj = 0;
  for ( uint ii = 0; ii < sizeofFreeITCM; ii++) jj += ptrFreeITCM[ii];
  printf( "ITCM DWORD cnt = % u [#bytes=%u] \n", jj, jj * 4);
}

// https://forum.pjrc.com/threads/33443-How-to-display-free-ram?p=275013&viewfull=1#post275013
extern char *__brkval;

int freeram() {
  return _heap_end - __brkval;
}

PROGMEM char title_text[] = "Verify secure code is running properly";

FLASHMEM void title_function() {
  Serial.println( title_text );
  Serial.println();
}

//extern "C" uint32_t _sdata, _edata, _sdataload; /* special linker symbols */
extern "C" uint32_t _sdataload; /* special linker symbols */
extern const uint32_t hab_csf[768]; // placeholder for HAB signature

int isEncrypt() {
  int ok=0;
  title_function();
 
  if ((IOMUXC_GPR_GPR11 & 0x100) == 0x100) {
    Serial.println("Pass: Bus Encryption Engine is active");
  } else {
    Serial.println("Fail: Bus Encryption Engine is not active");
    ok--;
  }

  uint32_t begin_address = IOMUXC_GPR_GPR18 & ~0x3FF;
  if (begin_address == 0x60001400) {
    Serial.println("Pass: Encryption region starts at proper address");
  } else {
    Serial.println("Fail: Encryption region starts at wrong address");
    ok--;
  }

  uint32_t end_address = IOMUXC_GPR_GPR19 & ~0x3FF;
  uint32_t data_end = (uint32_t)&_sdataload + (uint32_t)&_edata - (uint32_t)&_sdata;
  if (data_end <= end_address) {
    Serial.println("Pass: Program data is entirely within encrypted region");
  } else {
    Serial.println("Fail: Program data is not within encrypted region");
    ok--;
  }

  uint32_t title_address = ((uint32_t)&title_function) & ~1;
  if (title_address >= begin_address && title_address < end_address) {
    Serial.println("Pass: title_function() is within encrypted region");
  } else {
    Serial.println("Fail: title_function() is not in encrypted region");
    ok--;
  }

  if ((uint32_t)title_text >= begin_address && (uint32_t)title_text < end_address) {
    Serial.println("Pass: title_text[] is within encrypted region");
  } else {
    Serial.println("Fail: title_text[] is not in encrypted region");
    ok--;
  }
   uint jj = 0;
  for ( uint ii = 0; ii < sizeof(hab_csf) / sizeof(hab_csf[0]); ii++ ) jj += hab_csf[ii];
  if ( jj ) {
    Serial.println("Pass: csf not Zero");
  } else {
    Serial.println("Fail: csf is Zero");
    ok--;
  }
 // TODO: check HAB version and HAB logfile status

  Serial.println();
  if (0==ok) Serial.println("All Tests Passed.  :-)");
  else printf(" %d Tests failed.  :-(", -ok);
  return ok;
}
Haven't updated to your latest changes to isEncrypt yet.

PS> Tried running your test sketch on my single Locked T4 but was having problems with it, since I couldn't figure out how to configure it for 1 T4. The one run I got in at 600 MHz on the Locked T4 showed the same max temps after 30 minutes - so the busy work matches between sketches:
Code:
[B]31.3874, 50.51
[/B]10379980 ENC SM: Completed CASCADE Count 4000	Cascading took 6829049 us [3058319086 piCycles] : net 1731851
Direct calls took 6810657 us [3063555503 piCycles] : net 1704732

10379980 ENC SM: Completed CASCADE Count 4000	Cascading took 6829099 us [3058329879 piCycles] : net 1731883
Direct calls took 6810655 us [3063554710 piCycles] : net 1704731

10379980 ENC SM: Completed CASCADE Count 4000	Cascading took 6829058 us [3058322757 piCycles] : net 1731854
Direct calls took 6810657 us [3063555674 piCycles] : net 1704731

[B]32.0695, 50.51
[/B]10379980 ENC SM: Completed CASCADE Count 4000	Cascading took 6829083 us [3058331271 piCycles] : net 1731865
Direct calls took 6810655 us [3063554673 piCycles] : net 1704731

10379980 ENC SM: Completed CASCADE Count 4000	Cascading took 6829050 us [3058326897 piCycles] : net 1731839
Direct calls took 6810655 us [3063554672 piCycles] : net 1704731

10379980 ENC SM: Completed CASCADE Count 4000	Cascading took 6829085 us [3058337353 piCycles] : net 1731857
 
Still not sure TEMP is an issue from the Fuse change - the Production boards seem to have variability of their own ... a few good current (mA) samples would show if more power is used. It was odd, some posts back, that delay() increased the temp over just keeping busy in loop.

Okay for posting 'Incomplete Source Code' :) Saw a post the other day where I noted changes and the reply was to put on github rather than cluttering posts with broken code - so started doing that more.

That runs - put the new isEncrypt in and posted so you can see/edit as : PrimeTemp

Has that starting bunch of info from prior samples ... they start as such simple examples - then creep adding more simple stuff and get ugly ...

Decided it was fun to see the primes so added ON/Off SPEW in the PrimeEncrypt version. Some tweaks make 100K fly by.


Not sure what the 'problems' with Code4Code were? I put in an #ifdef for Dual Serial, so building with plain 'Serial' silently skips MakeCode( 4000 );, which writes C source out USB1. With 'Dual Serial', if the generated source differs from the current "CodeMade.ino" { e.g. after changing the number in MakeCode( 4000 ); }, that USB1 output replaces the text in that file, and the next build then runs the new code. So it is automated self-generating code, with that one manual step since it has to be compiled. What is on github is 3MB for CodeMade with 4,000 func()'s ... no wonder it takes some time to compile!

That code/data from Flash tests decryption - only thing I didn't test was read diff size blocks (decrypt is in 16 byte hunks) ... and reading backwards to confuse the cache that may 'look ahead'. But that wouldn't need to be in that sample as 'code writing code' isn't fun as done in MakeCode.ino.
 
@Paul:
Just did an upload of new code - to be in next post. {two more uploads after worked fine}

Saw the RED LED flash slightly out of sync with the ORANGE LED_BUILTIN during the upload? Seen that once before ...

It stopped flashing but did not upload the new code.
Same code had just uploaded to another Teensy Ser#...960 on COM8 - then COM9 for Ser#..970.

Here is the code setup() reporting after Button and successful upload:
Code:
T:\tCode\T4Encrypt\Code4Code\Code4Code.ino Aug 28 2021 01:23:03
Verify secure code is running properly

Pass: Bus Encryption Engine is active
Pass: Encryption region starts at proper address
Pass: Program data is entirely within encrypted region
Pass: title_function() is within encrypted region
Pass: title_text[] is within encrypted region
Pass: csf is PJRC
NOTE: hab_version == 0x40307
NOTE: hab_status == 0xF0
Secure mode IS set :: Fuses == 0x4C8B01A

All Tests Passed.  :-)

Here is the Verbose from the FLASHING part ... just after end of prior upload:
Code:
01:24:01.144 (ports 2): usb_add: usb:0/140000/0/6/3  COM8 (Teensy 4.0) Serial
01:24:01.144 (ports 2): WM_DEVICECHANGE DBT_DEVNODES_CHANGED
01:24:01.145 (ports 2): nothing new, skipping HID & Ports enum
01:24:01.901 (loader): redraw, image 9
01:24:33.652 (ports 2): WM_DEVICECHANGE DBT_DEVICEREMOVECOMPLETE
01:24:33.654 (ports 2): remove: loc=usb:0/140000/0/6/1/1
01:24:33.654 (ports 2): usb_remove: usb:0/140000/0/6/1/1
01:24:33.654 (ports 2): nothing new, skipping HID & Ports enum
01:24:33.831 (ports 2): WM_DEVICECHANGE DBT_DEVICEARRIVAL
01:24:33.834 (ports 2): nothing new, skipping HID & Ports enum
01:24:33.873 (ports 2): WM_DEVICECHANGE DBT_DEVICEREMOVECOMPLETE
01:24:33.873 (ports 2): WM_DEVICECHANGE DBT_DEVNODES_CHANGED
01:24:33.875 (ports 2): nothing new, skipping HID & Ports enum
01:24:33.953 (loader): handle 938
01:24:33.953 (loader): HID/win32: HidD_GetPreparsedData ok
01:24:33.953 (loader):  security: 03  12 34 34 12
01:24:33.953 (loader):  response: 04  12 8A 8A 12
01:24:33.953 (loader): nxp_write_register32 success, 12343412 128a8a12
01:24:33.953 (loader): Device came online, code_size = 100
01:24:33.953 (loader): Board is: NXP IMXRT1062 ROM
01:24:33.953 (loader): begin operation
01:24:34.120 (loader): File "C:\Users\Tim\AppData\Local\Temp\arduino_build_502465\Code4Code.ino.hex", 1335296 bytes
01:24:34.271 (loader): File "C:\Users\Tim\AppData\Local\Temp\arduino_build_502465\Code4Code.ino.ehex", 1335296 bytes, 4960 extra
01:24:34.271 (loader): ehex is valid, key hash: 85A79CC1 B7C7F866 7F5BB3ED 0F9C9BA5 B4149EE2 72846D35 86B63863 B0699942
01:24:34.271 (loader): set background IMG_ONLINE
01:24:34.280 (loader): state STATE_NXP_BEGIN
01:24:34.280 (loader):  security: 03  12 34 34 12
01:24:34.280 (loader):  response: 04  12 8A 8A 12
01:24:34.280 (loader): HAB locked secure mode
01:24:34.285 (loader): state STATE_NXP_CLOSED
01:24:34.288 (loader): sending ehex access, 4960 bytes
01:24:34.292 (loader):  security: 03  12 34 34 12
01:24:34.292 (loader):  response: 04  88 88 88 88
01:24:34.292 (loader): run it..
01:24:34.292 (loader):  security: 03  12 34 34 12
01:24:34.301 (loader): end operation, total time = 0.348 seconds
01:24:34.311 (loader): redraw timer set, image 80 to show for 2000 ms
01:24:34.314 (ports 2): WM_DEVICECHANGE DBT_DEVICEREMOVECOMPLETE
01:24:34.317 (ports 2): nothing new, skipping HID & Ports enum
01:24:34.321 (ports 2): WM_DEVICECHANGE DBT_DEVNODES_CHANGED
01:24:34.322 (ports 2): nothing new, skipping HID & Ports enum
01:24:34.463 (loader): HID/win32:  vid:046D pid:C52B ver:1211
01:24:34.463 (loader): HID/win32:  vid:046D pid:C52B ver:1211
01:24:34.463 (loader): HID/win32:  vid:046D pid:C52B ver:1211
01:24:34.463 (loader): HID/win32:  vid:046D pid:C52B ver:1211
01:24:34.463 (loader): HID/win32:  vid:046D pid:C52B ver:1211
01:24:34.463 (loader): HID/win32:  vid:046D pid:C52B ver:1211
01:24:34.463 (loader): HID/win32:  vid:046D pid:C52B ver:1211
01:24:34.463 (loader): HID/win32:  vid:046D pid:C52B ver:1211
01:24:34.534 (ports 2): WM_DEVICECHANGE DBT_DEVNODES_CHANGED
01:24:34.535 (ports 2): nothing new, skipping HID & Ports enum
[B]01:24:35.293 (ports 2): purge, name=COM9 (Teensy 4.0) Serial, loc=usb:0/140000/0/6/1/1, age=1.639 sec[/B]
01:24:36.331 (loader): redraw, image 9
01:24:59.055 (ports 2): WM_DEVICECHANGE DBT_DEVICEARRIVAL
01:24:59.062 (ports 2): found_usb_device, id=\\?\usb#vid_16c0&pid_0483#10379970#{a5dcbf10-6530-11d2-901f-00c04fb951ed}
01:24:59.062 (ports 2): found_usb_device, loc=usb:0/140000/0/6/1/1    Port_#0001.Hub_#0010
01:24:59.062 (ports 2): found_usb_device, hwid=USB\VID_16C0&PID_0483&REV_0279
01:24:59.062 (ports 2): found_usb_device, devinst=00000009
01:24:59.062 (ports 2): add: loc=usb:0/140000/0/6/1/1, class=USB, vid=16C0, pid=0483, ver=0279, serial=10379970, dev=\\?\usb#vid_16c0&pid_0483#10379970#{a5dcbf10-6530-11d2-901f-00c04fb951ed}
01:24:59.063 (ports 2):   comport_from_devinst_list attempt
01:24:59.063 (ports 2):   found Ports in classguid_list at index=1
01:24:59.063 (ports 2):   port COM9 found from devnode
01:24:59.063 (ports 2): found_usb_device complete
01:24:59.065 (ports 2): usb_add: usb:0/140000/0/6/1/1  COM9 (Teensy 4.0) Serial
01:24:59.195 (ports 2): WM_DEVICECHANGE DBT_DEVNODES_CHANGED
01:24:59.196 (ports 2): nothing new, skipping HID & Ports enum
01:24:59.293 (ports 2): WM_DEVICECHANGE DBT_DEVICEARRIVAL
01:24:59.295 (ports 2): nothing new, skipping HID & Ports enum
01:24:59.398 (ports 2): WM_DEVICECHANGE DBT_DEVNODES_CHANGED
01:24:59.399 (ports 2): nothing new, skipping HID & Ports enum
01:28:11.796 (loader): Verbose Info event
 
Bigger and better : Code4Code

>> No Errors observed!

The prior code is unchanged (4,000 cascading and direct calls computing Pi), but this adds something @Frank B hinted at once and that was brought up here regarding decode size.

Four long ALPHA[365] strings were created in PROGMEM { generated in makecode.INO by buildAlpha(), placed in CodeMade.ino }.
> Test code :: testAlpha() is called from an IntervalTimer every 100 us and toggles LED_BUILTIN
-> a1[] (upper case A-Z,...) and a2[] (lower case) are each byte compared forward
--> the compare starts at [0] against a point 130 bytes further into the same string, with length 1 growing to length 104
-> z1[] (upper case Z-A,...) and z2[] (lower case) are each byte compared backward
--> the compare starts at [208] against [312] in the same string, with length 1 growing to length 104
>>-- running the length up to 104 crosses all boundaries and sizes, forwards and backwards
>>-- then it adds (subtracts) 1 to (from) the start points and repeats, shifting 0-104
Code:
// IntervalTimer EXEC - this tests for problems (after the fact) from loop() - LED will pulse with delay(10) on error
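void testAlpha() {
  static int ii = 0, kk = 1; // ii: start shift 0-104, kk: compare length 1-104, both kept across calls
  int nn, mm = 0;
  while ( mm < kk ) {
    nn = mm + ii;
    // forward: bytes 130 apart in the same string must match
    if ( a1[nn] != a1[130 + nn] ) errAlpha( " a1 #1 fail ", ii, kk );
    if ( a2[nn] != a2[130 + nn] ) errAlpha( " a2 #1 fail ", ii, kk );
    // backward: bytes 104 apart in the same string must match
    if ( z1[208 - nn] != z1[312 - nn] ) errAlpha( " z1 #1 fail ", ii, kk );
    if ( z2[208 - nn] != z2[312 - nn] ) errAlpha( " z2 #1 fail ", ii, kk );
    mm++;
  }
  if ( ++kk > 104 ) { // length exhausted: restart at 1 and shift the start points
    kk = 1;
    if ( ++ii > 104 ) ii = 0;
  }
  digitalToggleFast( LED_BUILTIN ); // shows the timer is alive
}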

Each call runs the 4 compares over the current length; 104*105 = 10,920 calls cover every length and shift before it restarts, with static state carrying the indexing to the next call.
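As a sanity check on the indexing, the walk can be replayed on a PC. This is only a sketch under assumptions: makeAlpha() is a hypothetical stand-in for buildAlpha() (not shown in the post) that simply repeats the 26-letter alphabet, and simulateWalk() is a made-up name that mirrors the ii/kk stepping of testAlpha() without the timer or LED.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for buildAlpha(): fill `len` bytes by repeating a
// 26-letter alphabet starting at `first` ('A' or 'a'), optionally reversed (Z-A).
std::string makeAlpha(std::size_t len, char first, bool reversed) {
    std::string s(len, ' ');
    for (std::size_t i = 0; i < len; ++i) {
        const int k = static_cast<int>(i % 26);
        s[i] = static_cast<char>(first + (reversed ? 25 - k : k));
    }
    return s;
}

struct WalkStats {
    long calls = 0;          // testAlpha()-style calls in one full cycle
    long mismatches = 0;     // compares that would have reached errAlpha()
    int maxIndex = 0;        // highest index read by the forward compare
    int minBackIndex = 208;  // lowest index read by the backward compare
};

// Replays the ii/kk stepping of testAlpha() for one forward string `a` and
// one backward string `z` until the static state wraps back to its start.
WalkStats simulateWalk(const std::string& a, const std::string& z) {
    assert(a.size() > 337 && z.size() > 312);  // largest indexes the walk reads
    WalkStats st;
    int ii = 0, kk = 1;
    do {
        for (int mm = 0; mm < kk; ++mm) {
            const int nn = mm + ii;
            if (a[nn] != a[130 + nn]) ++st.mismatches;
            if (z[208 - nn] != z[312 - nn]) ++st.mismatches;
            if (130 + nn > st.maxIndex) st.maxIndex = 130 + nn;
            if (208 - nn < st.minBackIndex) st.minBackIndex = 208 - nn;
        }
        ++st.calls;
        if (++kk > 104) {
            kk = 1;
            if (++ii > 104) ii = 0;
        }
    } while (ii != 0 || kk != 1);
    return st;
}
```

With 365-byte strings this reports 10,920 calls, a highest forward index of 337 (inside the 365 bytes) and a lowest backward index of 1, with no mismatches as long as the strings really do repeat every 26 bytes.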

It is called 10,000 times per second from the IntervalTimer (the LED toggle shows it is alive), while loop() runs the cascading and direct test calls in its for() and checks that errAlpha() saw no errors.

The original code thrashes at least 4,000*61 bytes of FLASH, so at 244,000 bytes the cache doesn't hold up long, and on top of that the _isr() is pinging another 8 to 836 bytes in turn, cycling about each second.
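The arithmetic behind that can be written out as a small PC-side sketch. The function count, bytes per function and timer period come from the post; the 32 KB cache size is my assumption for the T4.0's i.MX RT1062, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kFuncCount     = 4000;       // generated functions (from the post)
constexpr std::uint32_t kBytesPerFunc  = 61;         // "at least" this much FLASH each (from the post)
constexpr std::uint32_t kCacheBytes    = 32 * 1024;  // assumption: 32 KB per cache on the i.MX RT1062
constexpr std::uint32_t kTimerPeriodUs = 100;        // IntervalTimer period (from the post)
constexpr std::uint32_t kCallsPerCycle = 104 * 105;  // lengths 1-104 times shifts 0-104

// Minimum FLASH touched by one pass over all generated functions.
constexpr std::uint32_t flashFootprintBytes() { return kFuncCount * kBytesPerFunc; }

// Time for the string test to walk every length and shift once.
constexpr std::uint32_t cycleTimeUs() { return kCallsPerCycle * kTimerPeriodUs; }

// How many whole times the working set exceeds a cache of `cacheBytes`.
std::uint32_t cacheOversubscription(std::uint32_t workingSetBytes, std::uint32_t cacheBytes) {
    assert(cacheBytes > 0);
    return workingSetBytes / cacheBytes;
}
```

Under those assumptions the 244,000 bytes is more than 7 times the cache, and one full cycle of the string test takes 1,092,000 us, which fits "about each second".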

This should impact prior posted results - but I didn't look yet, and I added a 5th T4.0 with double long pins.
Code:
0.0002, 50.33
[B]10379960 ENC ns:[/B] Completed CASCADE Count 4000	Cascading took 5280602 us [3148908740 piCycles] : net 32421
Direct calls took 5255400 us [3151946592 piCycles] : net 2156
...
33.7241, 51.67
10379960 ENC ns: Completed CASCADE Count 4000	Cascading took 5280560 us [3148848097 piCycles] : net 32480
Direct calls took 5256055 us [3152359606 piCycles] : net 2123

0.0002, 46.92
[B]10379970 ENC SM:[/B] Completed CASCADE Count 4000	Cascading took 5280455 us [3148794963 piCycles] : net 32464
Direct calls took 5255529 us [3152022322 piCycles] : net 2159
...
36.3582, 48.33
10379970 ENC SM: Completed CASCADE Count 4000	Cascading took 5280320 us [3148697554 piCycles] : net 32491
Direct calls took 5256115 us [3152374947 piCycles] : net 2157

0.0001, 40.63
[B]6052840 nor ns:[/B] Completed CASCADE Count 4000	Cascading took 5276712 us [3148249621 piCycles] : net 29630
Direct calls took 5256093 us [3152364758 piCycles] : net 2152
...
34.2377, 42.67
6052840 nor ns: Completed CASCADE Count 4000	Cascading took 5276430 us [3148135787 piCycles] : net 29538
Direct calls took 5255854 us [3152225697 piCycles] : net 2145

0.0001, 42.96
[B]6683800 nor ns:[/B] Completed CASCADE Count 4000	Cascading took 5276708 us [3148297828 piCycles] : net 29545
Direct calls took 5255537 us [3152051626 piCycles] : net 2118
...
34.2378, 43.58
6683800 nor ns: Completed CASCADE Count 4000	Cascading took 5276896 us [3148347686 piCycles] : net 29650
Direct calls took 5256236 us [3152467378 piCycles] : net 2124

0.0001, 48.33
[B]8710210 nor ns:[/B] Completed CASCADE Count 4000	Cascading took 5276753 us [3148288538 piCycles] : net 29606
Direct calls took 5256258 us [3152461816 piCycles] : net 2155
...
34.2391, 50.18
8710210 nor ns: Completed CASCADE Count 4000	Cascading took 5276774 us [3148326331 piCycles] : net 29564
Direct calls took 5255600 us [3152080493 piCycles] : net 2133

@koromix - Thanks again for TyCommander! - Great SerMon on 5 Teensys at once!
 
And now for something completely different :D

If my sketch crashes and I have
Code:
  Serial.print(CrashReport);
The crash report (as well as other output early in the next run) shows up in the Serial Monitor window when I use the USB type MTP Disk Serial (Experimental), which is not in the build.

But none of this output shows up with the USB type MTP Disk (Experimental).
I do finally get some output after the sketch has completed setup:

Turned out I was going through a NULL pointer...

The setup for this is the start of a new round of MTP/MSC work that @mjs513 and I have begun, including adding file dates and times...
Easy setup :D - Note: I will be starting a new thread on this as a follow-up to my earlier TLA thread.

Current setup: Arduino 1.8.15 and this beta:
Update cores: https://github.com/KurtE/cores/tree/FS_DATES (added dates/times; will have branch FS_Integration)
Update MTP: https://github.com/KurtE/MTP_t4/tree/FS_Integration
Update MSC: https://github.com/KurtE/UsbMscFat/tree/FS_Integration
Update LittleFS: https://github.com/Mjs513/LittleFS/tree/FS_Integration

Update SD: (this is what I forgot) https://github.com/KurtE/SD/tree/FS_Integration

Run the sketch: https://github.com/KurtE/MTP_t4/blob/FS_Integration/examples/SD_MTP-logger/SD_MTP-logger.ino
Then on the PC try to expand the Teensy/SD directory... It called through a null ptr for the modify date in the un-updated SD and crashed...

Easy ;)

Now back to playing
 
RE: p321 overnight ... No problems showing in interrupted Code4Code with added FLASH string testing
> didn't verify (like PJRC's encrypt region check)
> could put _isr code in FLASHMEM - where it should survive in cache as the other code cycles through some part of 1MB there
Code:
Memory Usage on Teensy 4.0:
  FLASH: [B]code:1001600, data:325752[/B], headers:8964   free for files:695300
   RAM1: variables:86752, code:37752, padding:27784   free for local variables:372000
   RAM2: variables:12384  free for malloc/new:511904

>> This just shows Flash decode integrity, including forward/backward reads across byte boundaries at sizes 1-104, and that the cache can be used when it works (the _isr() strings ideally never leave the cache, so they run faster than other code)
> expected rise in runtime per report - adds about 0.15 secs to each timed pass (e.g. Cascading on #970 went from 5139308 us to 5280591 us) and the Pi code cycle count goes up from the interruption - but most of the time is still in the main code and the 4K func#() calls reading in the FLASH code. That shows in the 'Direct' versus 'Cascading' calls "net" changes.
> Temps match prior on bare pinned board - both TallDog mounted dropped 1+ degrees C with added code

Versus p#302 - showing times (and temps) with 10K Hz _isr() in Code4Code - new data bold:
After some run time at 600 MHz - NO DELAY between Pi looping: {prior 'temp'} p#302
Code:
[B]#970 LOCKED Beta on TallDog:[/B]
57.4089, 49.04 {prior 51.16}
Completed CASCADE Count 4000	Cascading took 5139308 us [3063869471 piCycles] : net 32859
Direct calls took 5108568 us [3063919512 piCycles] : net 2036

[B]444.2000, 47.63
10379970 ENC SM: Completed CASCADE Count 4000	Cascading took 5280591 us [3148891620 piCycles] : net 32439
Direct calls took 5255842 us [3152219032 piCycles] : net 2144[/B]


Code:
[B]#960 Unlocked [U]Encrypted [/U]Beta with Pins:[/B]
57.9214, 51.67 {prior 52.33}
Completed CASCADE Count 4000	Cascading took 5139290 us [3063864775 piCycles] : net 32849
Direct calls took 5108568 us [3063919436 piCycles] : net 2036

[B]441.5728, 51.67
10379960 ENC ns: Completed CASCADE Count 4000	Cascading took 5280494 us [3148810850 piCycles] : net 32476
Direct calls took 5255706 us [3152139114 piCycles] : net 2141[/B]


Code:
[B]#800 Production on TallDog:[/B] [U]early production (?) - no 'PCB Silkscreen' coding at USB[/U]
57.8957, 44.20 {prior 46.06}
Completed CASCADE Count 4000	Cascading took 5134759 us [3063228808 piCycles] : net 29378
Direct calls took 5108566 us [3063918648 piCycles] : net 2035

[B]441.9288, 42.96
6683800 nor ns: Completed CASCADE Count 4000	Cascading took 5276534 us [3148110373 piCycles] : net 29684
Direct calls took 5256682 us [3152713464 piCycles] : net 2160[/B]


Code:
[B]#210 Production with Pins:[/B]
57.3834, 49.56 {prior 52.02}
Completed CASCADE Count 4000	Cascading took 5134779 us [3063226278 piCycles] : net 29402
Direct calls took 5108566 us [3063918613 piCycles] : net 2035

[B]441.4177, 49.56
8710210 nor ns: Completed CASCADE Count 4000	Cascading took 5276755 us [3148237728 piCycles] : net 29693
Direct calls took 5255883 us [3152244106 piCycles] : net 2143
[/B]

NEW 5th T_4.0 that was on the desk - pinned with long pins, with as much pin above the board as below - and 3 end pins above
This is the COOLEST RUNNING ONE - also an early production (?) run - no 'PCB Silkscreen' coding at USB, like #6683800
Code:
[B]445.0889, 41.99
6052840 nor ns: Completed CASCADE Count 4000	Cascading took 5276707 us [3148250809 piCycles] : net 29623
Direct calls took 5255769 us [3152179701 piCycles] : net 2137[/B]
 
@Defragster: Haven't looked at your code... but if you want it to run from flash (I assume so), you need to disable the caches. Best is to disable both the instruction and the data cache.
Without that, you measure the cache - with far fewer or even no decrypts happening.
On the other hand, if what Paul said is the case - that the current is constant - you don't need to measure anything, because there is no additional information you can get.
If you just want to heat it up, use as many peripherals as possible, perhaps with high speed I/O and resistors to GND - and perhaps lots of "doubles".
But I don't think this is your intention.

Edit: Paul's "Coremark" for example runs at exactly the same speed whether from ITCM or from FLASH - because it fits entirely into the instruction cache... let's assume the cache is not encrypted ;)
 
@Defragster: Haven't looked at your code... but if you want it to run from flash (I assume so), you need to disable the caches. Best is to disable both, instruction and data.
Without that, you measure the cache - with far fewer or even no decrypts happening.

As noted in p#323 - the 'strings' referenced by the _isr() are used so often that they likely survive in the cache.
I wasn't worried about speed - just that the read/decode into the cache worked, where the cache will be filled on first use by the ISR.
> now that I know it works - I could call arm_dcache_flush_delete() on the string spaces a1, a2, z1, z2 at _isr() entry or exit? They are used just once per _isr() - and are big enough that I hope read-ahead wouldn't help - especially the backwards strings.
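For a rough idea of how much a flush/delete of those strings would touch: the Cortex-M7 data cache works in 32-byte lines, so any address range gets rounded out to whole lines. The helper below is a PC-side sketch with a made-up name (cacheLinesCovering), and the example addresses in the note are arbitrary, not the real locations of a1/a2/z1/z2.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kCacheLineBytes = 32;  // Cortex-M7 cache line size

// Number of cache lines that overlap the byte range [addr, addr + size).
std::uint32_t cacheLinesCovering(std::uint32_t addr, std::uint32_t size) {
    if (size == 0) return 0;
    assert(addr <= UINT32_MAX - (size - 1));  // range must not wrap the address space
    const std::uint32_t firstLine = addr / kCacheLineBytes;
    const std::uint32_t lastLine  = (addr + size - 1) / kCacheLineBytes;
    return lastLine - firstLine + 1;
}
```

A 365-byte string covers 12 lines when it starts on a line boundary (e.g. 0x60000000) and 13 when it starts at the last byte of a line (e.g. 0x6000001F), so all four strings together would be roughly 48 to 52 lines per pass.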

The base Code4Code cycles through some big part of 1MB of FLASH code - so the cache will always be flushed/changed each 5 seconds in loop(). Only the 'Direct' calls expect the code to be cached and it runs 'net' faster excluding the 'busy work' code making 60 digits of Pi that is most of the time spent.

Similar for 240K+** Flash data used in the Pi test portion, passing that much data will require the cache always be flushed/filling before it gets re-used much.
** Some of the other FLASH space used is each Func#() having "__FUNCTION__" embedded in a .print() - in case it failed you could tell where. And this was tested by manually 'breaking/editing' one of the function specific : szMyPi2[] copies of Pi to 60 digits.
 