Teensy 4? testing mbed NXP MXRT1050-EVKB (600 MHz M7)

manitou

Senior Member+
sticky post: I will update this first post as new test results become available.
nxp1050.jpg

i.MX RT1050 == Teensy 4 ? Paul says initial Teensy 4 beta units will be 1052s; production units will be 1060s.

Before the T3.5/T3.6 beta units were available, I did some testing of the mbed K64F. In that spirit, mbed supports an NXP MXRT1050-EVKB evaluation board. I purchased the EVK board in November 2017, but for many months mbed did not support the board. The board did run its preloaded accelerometer/LED program, so presumably NXP's SDK worked, even though there was no support in the on-line mbed compiler/API. A month or so ago, mbed support appeared, but I could not get a program to upload to my EVK board. Then I noticed there was a revised EVKB board that fixed some silicon problems with the 2017 board. In November 2018 I purchased the revised board from Digikey, and mbed compile/uploads worked on the new board. The stencil on the MCU says MIMXRT1052 DVL6B.

The mbed evaluation boards include various ARM processors (M0, M3, M4, and now M7) from various manufacturers. The mbed API, based on CMSIS/HAL/RTOS, is a lowest common denominator to support the varying peripheral architectures, and as such is not optimized. The on-line compiler uses ARM CC (v5), typically with -O3.


MCU benchmarks:
As best I can tell from the board schematic, the clocks are derived from a 24 MHz crystal (BOM 30 ppm). I measured the crystal drift to be about -7 ppm. Just spinning in a loop, the board consumes 128 mA (measured via a hacked USB cable). The board has lots of devices (LEDs, SDRAM, codec, Ethernet PHY, microSD, accelerometer/magnetometer, ...) that consume power.

Here are some results from a coremark-like test and Dhrystone 2.1:
Code:
               coremarkish   Dhrystone 2.1
              iterations/sec  DMIPS
    M7@600mhz      2438        2033   ARM CC -O3
      @528mhz      2146
      @132mhz       536
       @24mhz        97
  T3.6@256mhz       659        1120   Fastest+pure+LTO
  T3.6@180mhz       434         287   Faster
  T3.6@120mhz       289         191   Faster
  T3.5@120mhz       261         138   Faster
  T3.2@120mhz       254         106   Faster
Other anecdotal low-level benchmark data for various MCUs is available at
https://github.com/manitou48/DUEZoo/blob/master/perf.txt
and also see Teensy 3* coremark plots

The M7 has hardware floating point (FPv5) supporting both single and double precision. Here is floating-point performance (megaflops) for the linear-algebra Linpack benchmark, and, from Teensy propshield sketches, the update time for a float-intensive Kalman filter and the ARM Cortex DSP benchmark.
Code:
Linpack 100x100 mflops
               double    float
    M7@600mhz   66.97    125.5   ARM CC -O3
  T3.6@180mhz    2.13     28.4   Faster          
  T3.6@256mhz    2.85     41.1   Fastest
  T3.2@120mhz    0.65      1.0   Faster  no FPU


Kalman filter update time
                 microseconds
    M7@600mhz         99
  T3.6@256mhz        176
  T3.6@180mhz        248
  T3.5@120mhz        351
  T3.2@120mhz       3396
  mega2560@16mhz   30272


DSP FFT benchmark  1024  radix4 REVERSEBITS 0  (microseconds)
                q15     q31      f32
  M7@600mhz     74.5    126.9     95.6    ARM GCC -O3   arm_math.h v1.5.1
T3.6@180mhz    463.1   1215.2    703.7    Faster    v1.1.0
T3.6@180mhz    414.7   1010.7    598.2    Faster    v1.5.3
T3.5@120mhz    784.7   1947.9   1079.8    Faster    v1.1.0
T3.5@120mhz    658.5   1577.9    919.5    Faster    v1.5.3
K64F@120mhz    635.7   1273.8    827.2    ARM GCC -O3   arm_math.h v1.4.5
T3.2@120mhz    869.8   2498.5  18182.5    Faster   v1.1.0
adaM4F@120mhz  701.3   1756.1    781.0    Faster   v1.1.0   SAMD51
STM32L4@80mhz  917.3   1953.8   1150.4    Faster   v1.4.5

Here are some data rates (megabits/sec) for memcpy(). Compilers often provide ARM-optimized versions of memcpy() (see the slower memcpy thread). Memory-to-memory DMA can often be faster.
Code:
memcpy() speed (16-byte aligned, 4096 bytes)
              megabits/sec
    M7@600mhz     2731
  T3.6@180mhz     1214
  T3.5@120mhz      800
  T3.2@120mhz      780
     LC@48mhz       42

Using an example (trng_random.c) from the SDK, I was able to exercise the M7's hardware random number generator (TRNG). I think the unit can generate 512 bits of entropy. Requesting 512 bits takes only 4 microseconds, but if you request 1024 bits, it takes 211 milliseconds. Typically you might request 512 bits of entropy and then use a PRNG to generate additional random bits. I collected 1 MB of random data (15 hours) and ran the NIST STS-2.1 statistical tests. Randomness was OK, though there were a few too many 0 p-values for my taste.

The board has an FXOS8700Q accelerometer/magnetometer (I2C 0x3E), the same sensor as on the propshield and the mbed K64F. I cloned my K64F program that reads the accel/mag and temperature, changed the program to use A4/A5 for I2C and 0x3E for the I2C ID (from i2cscan), and replaced the mbed lib with mbed-os. The program tracks board motion.
Code:
rate 50 hz  1983284 us
22 C
ACC: X=000192d Y=-00137d Z=004273d 	 MAG: X=-00319d Y=-00100d Z=000593d
ACC: X=000180d Y=-00137d Z=004253d 	 MAG: X=-00317d Y=-00099d Z=000599d
ACC: X=000191d Y=-00138d Z=004258d 	 MAG: X=-00308d Y=-00099d Z=000599d
ACC: X=0.0439f Y=-0.0356f Z=1.0430f 	 MAG: X=-31.7f Y=-9.3f Z=60.0f
ACC: X=0.0449f Y=-0.0337f Z=1.0415f 	 MAG: X=-31.4f Y=-9.9f Z=59.1f
ACC: X=0.0447f Y=-0.0356f Z=1.0435f 	 MAG: X=-31.8f Y=-9.9f Z=59.1f

Ported the SDK dcp.c to mbed. The M7's DCP is hardware that accelerates crypto functions. Tests passed:
Code:
AES ECB Test pass
AES CBC Test pass
SHA-1 Test pass
SHA-256 Test pass
CRC-32 Test pass

sha256 16384 bytes 270 us 485.451852 mbs
crc32 16384 bytes 183 us 716.240437 mbs
aescbc 64 bytes 26 us 19.692308 mbs
The latest SDK uses the DCP to accelerate AES and SHA in both mbedtls and wolfssl.
Code:
   mbedtls benchmark GCC v7.3   -O3
                         w/ DCP
SHA-256      19197 KB/s  59125.32 KB/s
AES-CBC-128  15567 KB/s  57921.99 KB/s

The Teensy 3* has a programmable hardware CRC unit; the M7 only has hardware CRC32 in its DCP. Here are some results for FastCRC on the M7:
Code:
CRC Benchmark 16384 bytes
Maxim (iButton) FastCRC: Value:0xf6, Time: 247 us 530.656 mbs
Maxim (iButton) builtin: Value:0xf6, Time: 960 us 136.533 mbs
MODBUS FastCRC: Value:0x7029, Time: 145 us 903.945 mbs
MODBUS builtin: Value:0x7029, Time: 959 us 136.676 mbs
XMODEM FastCRC: Value:0x98d9, Time: 143 us 916.587 mbs
XMODEM builtin: Value:0x98d9, Time: 1506 us 87.033 mbs
MCRF4XX FastCRC: Value:0x4a29, Time: 149 us 879.678 mbs
MCRF4XX builtin: Value:0x4a29, Time: 276 us 474.899 mbs
KERMIT FastCRC: Value:0xb259, Time: 141 us 929.589 mbs
Ethernet FastCRC: Value:0x1271457f, Time: 166 us 789.590 mbs
Ethernet bitbang: Value:0x1271457f, Time: 2056 us 63.751 mbs
For the M7, the "FastCRC" results are table-driven CRCs and "builtin" is a bit-bang CRC (ref the Teensy FastCRC thread).

From the SDK, I ported an EDMA memory-to-memory example (edma_memory_to_memory.c). The buffers are in non-cacheable memory, and I have to cycle USB power to get the sketch to run. DMA of 256 32-bit words took 72 us.

Using the mbed API, I verified the M7 RTC is ticking, with drift of -52 ppm (BOM 20 ppm). There is a 32 kHz crystal on the schematic. The compiler flag is -DDEVICE_RTC=1, and CCM_ANALOG->MISC0 = 0x24008080 indicates the RTC crystal is present. The mbed API reads the SRTC (LPSRTCMR, LPSRTCLR).
Code:
     crystal drift (ppm)
          EVK  EVKB   specs   1060
24MHz     -14    -6    30       -8
32KHz     -54   -52    20       10

I did some unconnected SPI tests and data rates were quite SLOW. The mbed SPI API uses LPSPI1 to do byte transfers. I used a scope to watch SPI CLK. At a requested SPI frequency of 1 MHz, the scope showed 961 kHz; by calculation with the prescaler (LPSPI1->CCR) I would expect 32727271.5/33 = 991735 Hz. Requesting 32 MHz, the scope shows 16.4 MHz; the calculation is 32727271.5/2 = 16363636 Hz (max SPI speed). The trouble is the interbyte delay of 11.7 us, so a 1024-byte transfer takes 12.7 ms (0.65 mbs) -- pretty slow. Presumably DMA could speed things up, and there are other SPI drivers available on the MCU.

I used the SDK examples to test LPSPI3 with FIFO. Data rates are getting much closer to the SPI clock speed. From the scope, the max SPI clock seems to be 16 MHz (LPSPI is clocked at 32.7 MHz, so max SPI CLK is 32.7/2). The CMSIS SPI DMA example is faster than FIFO after adjusting the DBT value in the SDK DMA sketch. DMA buffers are in non-cacheable memory.
Code:
    LPSPI3 data rate (megabits/s) 1024 bytes
SPI CLK  scope    FIFO        DMA
4MHz    3.6mhz   2.7 mbs     3.1 mbs
8MHz    6.5mhz   4.5 mbs     6.1
10MHz   8.2mhz   5.5 mbs     7.5 mbs
16MHz  10.8mhz   6.8 mbs     9.4
20MHz  16.1mhz   9.4 mbs    13.1 mbs
32MHz  26.4mhz              17.3 mbs
60MHZ  53.2mhz              27.3 mbs  distorted LPSPI CLK 105.6
For comparison, see SPI performance on other MCUs.

I couldn't get the Ethernet SDK example (enet) to work with the mbed libs/startup, but I was able to compile and run the raw enet broadcast example using the MCUXpresso IDE. The example broadcast 20 packets and logged any broadcasts it heard. No ether pins on T4, but probably on T4.1, ref. The mbed schematic shows that about 12 MCU pins (ENET_*) are connected to the Ethernet PHY. The mbed lib and SDK use lwIP and polling (mbed uses the RTOS) to provide the TCP/UDP/IP services. Using the IDE I ran the SDK udpecho example, confirming that lwIP is working along with ARP, ping/ICMP, and UDP. The latency for an 8-byte UDP echo was 104 us. I also ran the SDK lwIP TCP echo example and the iPerf example (TCP rate: 71 megabits/sec). With the Ethernet/PHY running, the board consumes an additional 100 mA.
Code:
                    NXP1062 (SDK lwIP) 
UDP latency(us)       104    8-byte UDP RTT
UDP send (mbs)         97    20 1000-byte packets
UDP recv (mbs)         95    
UDP send 1000x8        137453 pps 

TCP send (mbs)         87    100 1000-byte 
TCP recv (mbs)         71    

PING RTT(us)          108
For comparative performance see K66 ether testing or
https://github.com/manitou48/DUEZoo/blob/master/wizperf.txt

Using a scope, I measured the ISR latency (attachInterrupt) using a PWM source and a pin toggle in the ISR. The fastest PWM rate was 333 kHz (period 3 us). This is much slower than Teensy 3, as seen in the following table. ISR overhead for ARM is about 50 cycles, so we might expect the 600 MHz M7 to have a latency of 83 ns (12 MHz).
Code:
   ISR latency  (attachInterrupt)  PWM frequency and period
T3.2@120mhz   1.2 MHz  835 ns    FASTISR  2.6 MHz  384 ns
T3.6@180mhz   1.7      582                3.4      300
M7@600mhz     0.33    3000
I ran a low-level ISR latency test with the SDK gpt_timer.c, toggling a D7 GPIO in the ISR. With the scope on D7 and no __DSB in the ISR, latency was 197 ns (3.8 MHz). With __DSB enabled, latency was 260 ns (1.9 MHz). Here is the errata comment in the ISR:
Code:
    /* Add for ARM errata 838869, affects Cortex-M4, Cortex-M4F, Cortex-M7, Cortex-M7F Store immediate overlapping
      exception return operation might vector to incorrect interrupt */
#if defined __CORTEX_M && (__CORTEX_M == 4U || __CORTEX_M == 7U)
    __DSB();
#endif

To test power consumption at different frequencies, I ran the SDK power_mode_switch app, described here, with a meter and hacked USB cable. At 600 MHz the app/board consumed 158.9 mA, at 528 MHz 150.3 mA, at 132 MHz 116.9 mA, and at 24 MHz 93.9 mA. The specs say the MCU should consume 0.11 mA/MHz.
Code:
Various clocks with MCU @600MHz
     CPU:             600000000 Hz
     AHB:             600000000 Hz
     SEMC:             75000000 Hz
     IPG:             150000000 Hz
     OSC:              24000000 Hz
     RTC:                 32768 Hz
     ARMPLL:         1200000000 Hz
with MCU @24MHz
     CPU:             24000000 Hz
     AHB:             24000000 Hz
     SEMC:            24000000 Hz
     IPG:             12000000 Hz
     OSC:             24000000 Hz
     RTC:                32768 Hz
     ARMPLL:          24000000 Hz
With a hacked USB cable I measured blink power at 110 mA to 138 mA, hello_world at 170 mA (138 mA in the initial debug pause), coremark at 184 mA, the wolfssl benchmark at 246 mA, and the mbedtls benchmark at 200 to 281 mA, mostly around 265 mA.

The eval board has a microSD slot, and the SDK uses FatFs with 4-bit SDIO. The SDK example read a directory and wrote and read a file. Here are some data rates for reading various buffer sizes (bytes):
Code:
buffer  time(us)  rate(mbs)
 512     308       13.3    
1024     333       24.6
2048     380       43.1
4096     485       67.6
For reference, some older T3.6 SD tests.

Some settings of MCU control registers:
Code:
       ArmPllClk 1200000000  AhbClk 600000000  RtcClk 32768  Usb1PllPfd0Clk 261818172  IpgClk 150000000 SemClk 163862064 
       Usb1PllClk 480000000  OscClk 24000000
       XTALOSC24M->LOWPWR_CTRL 0x74f01
       XTALOSC24M->OSC_CONFIG0 0x93033a73
       XTALOSC24M->OSC_CONFIG1 0x2db002dc
       XTALOSC24M->OSC_CONFIG2 0x800102d7
       SCB->ICIALLU 0x0
       SCB->CCR 0x70200
       SCB->CSSELR 0x0
       SCB->CCSIDR 0xf01fe019
       SCB->DCISW 0x0
       SCB->CCR 0x70200
       CCM->CBCDR 0x18340
       SNVS->LPCR 0x20
       SNVS->LPSRTCMR 0x0
       SNVS->LPSRTCLR 0x0
       CCM_ANALOG->PLL_ARM 0x80002064
       CCM_ANALOG->PFD_480 0xf1a2321
       CCM_ANALOG->PFD_528 0x405d4040
       CCM_ANALOG->MISC0 0x24008080
       SysTick->CTRL 0x7
       SysTick->LOAD 0x927bf
       CCM->CCGR0 0xffffffff
       CCM->CMEOR 0x7ffffff

Observations:
  • pins 10-13 not connected to "arduino header"
  • printf sometimes prints garbage with the mbed lib; it works better with the mbed-os lib (mbed OS 5?)
  • PWM gets an IO error (not supported) with the mbed lib, but with the mbed-os lib PWM works at 50 Hz. mbed PWM uses the eFlexPWM (ch 28, 150 MHz clock), and the API limits you to a max PWM frequency of 1 MHz. With a scope, I hacked the PWM registers, testing at 10 MHz and 25 MHz. At 30 and 38 MHz the signal is a sawtooth with reduced Vpp (1.36 V).
  • with mbed lib, some mbedtls functions hang. mbed-os has mbedtls builtin, and tls functions seem to work
  • GPS PPS works, crystal drift -7 ppm
  • measured internal GPIO pullup at 47Kohm
  • builtin LED is pulled HIGH, so write LOW to turn on
  • I2C on A4,A5 OK at 100 kHz, 400 kHz, and 1 MHz; I2C scan with scope showed 96.1 kHz, 357 kHz, and 757 kHz; two devices found
  • ran SDK app examples (ecompass.bin bubble.bin) to exercise accelerometer/magnetometer
  • exercised M7 timers with SDK examples (pit, qtmr, gpt, SysTick)
  • There are 4 6-bit DACs associated with the analog comparator (ACMP, ch 13). The DAC is used as a reference voltage for the CMP, but the voltage is brought out on 4 pins (GPIO_AD_B1_12 - 15). On the eval board those pins are used for the WM8960 codec (SAI), so I can't test DAC voltages (maybe use XBAR?). The SDK has an ACMP example (cmp_polling.c) that uses the DAC with the CMP. AD_B1_14,15 are Teensy 4 pins 26,27 (Paul's pin list).
  • The mbed-os lib sits atop a thread base and the default stack size is 4KB, so one may encounter stack overflows.
  • Pin toggle (in mbed-speak, mypin = !mypin) takes about 70 cycles, measured with SysTick->VAL. The GPIO hardware toggle, GPIO1->DR_TOGGLE = 1<<9;, takes only a few cycles. There is a synchronization delay between the MCU and GPIO, as exhibited by the following 8 toggles of pin 7 (169 cycles).
    m7toggle.png
  • The PIT and GPT timers share the same base clock, PERCLK. One can configure PERCLK for OSCCLK (24 MHz) or IPGCLK (150 MHz), ref Fig 18-2. If your program uses both PIT and GPT, you need to ensure both are configured with the same PERCLK clock.
  • ran the SDK sai demo, which exercised the codec. I could hear a sine wave through earphones; nothing from the record/playback test. The demo does an FFT on the sine wave to calculate its frequency.
  • hooked up the 272x480 RK043FN02H-CT LCD ribbon cables (reference) and ran a few SDK examples (424 mA).
  • ran SDK and mbed ADC tests, ADC supports calibration, averaging, and various speeds and resolutions
  • GNU GCC options of interest: -fomit-frame-pointer -mcpu=cortex-m7 -mthumb -mfpu=fpv5-d16 -mfloat-abi=softfp
  • Updated to SDK 2.5.0; ran the SDK wdog01 example, WDOG OK. Also ran the SDK temperaturemonitor example, OK
  • Using MCUXpresso to erase the flash of the 2017 EVK board, I was able to upload SDK examples. reference
  • 2019 some 1060 testing MIMXRT1062 DVL6A
  • Cortex M7 is superscalar

TODO:
  • test EEPROM?
  • ARM CC compiler bugs: :confused: A couple of mbed sketches that work on other mbed evaluation boards hang on the M7. If I export the program for building with GNU GCC 7.3.1, and build and load the GNU version on the M7, the program works. It also works if I use the mbed-os lib rather than the mbed lib.

References:
Code:
    SDK examples
     boards/evkbimxrt1050/driver_examples
     adc/      cmp/   elcdif/  flexcan/  gpio/   lpspi/   pxp/     semc/  wdog/
     adc_etc/  csi/   enc/     flexio/   gpt/    lpuart/  qtmr/    snvs/
     bee/      dcp/   enet/    flexram/  kpp/    pit/     rtwdog/  src/
     cache/    edma/  ewm/     flexspi/  lpi2c/  pwm/     sai/     trng/

     boards/evkbimxrt1050/demo_apps
     bubble/    hello_world/		     led_blinky/	 sai/	   shell/
     ecompass/  hello_world_virtual_com/  power_mode_switch/  sd_jpeg/
 
In comparing mbed compiler performance with Teensy ARM processors, I usually see the ARM CC based mbed compiler giving better performance than the Teensy GNU GCC compiler. I noticed the following function in the mbed M7 core library initialization code:
Code:
#if defined(TOOLCHAIN_GCC_ARM)
extern uint32_t __ram_function_flash_start[];
#define __RAM_FUNCTION_FLASH_START __ram_function_flash_start
extern uint32_t __ram_function_ram_start[];
#define __RAM_FUNCTION_RAM_START __ram_function_ram_start
extern uint32_t __ram_function_size[];
#define __RAM_FUNCTION_SIZE __ram_function_size
void Board_CopyToRam()
{
    unsigned char *source;
    unsigned char *destiny;
    unsigned int size;

    source = (unsigned char *)(__RAM_FUNCTION_FLASH_START);
    destiny = (unsigned char *)(__RAM_FUNCTION_RAM_START);
    size = (unsigned long)(__RAM_FUNCTION_SIZE);

    while (size--)
    {
        *destiny++ = *source++;
    }
}
#endif
So the startup code seems to copy code from flash into RAM, presumably giving a performance boost like Teensy's FASTRUN.

Example: Dhrystone 2.1 on the M7 with mbed ARM CC -O3 gets 2033 DMIPS; exported and built with GNU GCC 7.3.1 -O3, it yields 1457 DMIPS.
 
@manitou - Thanks for the post; I was looking at picking up one of these boards just to play around with but wasn't sure how to get started. I just went through the NXP getting started page, and it does a nice job with videos showing how to install the MCUXpresso IDE. But I got a little confused with one of the options: which toolchain to use, or just ignore it and use the default?

Any help would be appreciated. Really only used the Arduino IDE with the established cores before.

thanks
Mike
 
I've only used the mbed on-line (web-based) compiler. I have not used any of the mbed/NXP IDEs or the NXP SDK. There is also an mbed CLI that allows you to choose your toolchain (GNU GCC, ARM CC, or IAR); the latter two require licenses and money, methinks. I have exported programs from the on-line compiler, which lets you do a make with another toolchain on your personal computer from the exported zip file.

I just noticed on the board page that MBED has a TODO list as well:
Code:
Important Notes

Support for the following features of MIMXRT1050-EVK are currently in development in Mbed OS. Schedule for release is TBD.

    Ethernet
    PWM
    TRNG
    USB 

Signal Connection Information:

    The I2C signals connected to D14 & D15 on the Arduino header are not capable of I2C Master mode. To use the I2C master mode with an Arduino-compatible shield slave such as a memory shield, connect D14 (SDA) to A4 and D15 (SCL) to A5.
    The SPI signals are not connected on the Arduino header by default. To connect the SPI signals, add a 0 ohm resistor to R278, R279, R280 , R281 on the back side of the REVB board.
 
Thanks for the added info. Didn't see the on-line compiler anywhere; will have to check. I saw those notes when I was looking at those pages. Just as info, I ordered the board from Digikey to try some things out on it :)

Thanks again
Mike
 
mbed may be trying to phase out the on-line compiler. Go to
https://www.mbed.com/en/
and click on Compiler. You'll need to create an account. The compiler allows you to add boards to your profile, and many of the descriptor pages for the mbed boards have a sidebar with examples that you can import into your compiler session.
 
Thanks manitou. Just remembered in the other thread they also have 1060 and 1064 dev boards. Not sure what the final chip will be for the first T4.
 
Not sure what the final chip will be for the first T4

I'm not sure yet either!

But I do know we'll begin beta testing with the 1052 chip, probably next month. We already have a couple hundred 1052 chips for betas sitting here in a box. Before we do much with betas, I need to make another board rev (this will be the 3rd rev) to deal with another way the board could potentially become bricked. This chip has a lot of special security features, which translate into a *really* difficult task for me to make an unbrickable product!

On the choice of chip, it's probably going to come down to a matter of cost. We're *still* waiting on a quote.
 
Would have been nice to have FD support; currently working on a T3.x library for the 2517FD similar to ifct.
 
Thanks for the update on the T4.

But I do know we'll begin beta testing with the 1052 chip, probably next month.
Guess that will be folks' Christmas present :)

Really appreciate all that you do with the Teensies. Since I got into them I haven't worked with Arduinos, just the IDE. Don't think I ever realized before now the amount of work you all put into a new design. Exciting, but a lot of effort goes into it.
 
Will be interesting to see how it works out. A complex chip with multiboot options, security, and protection; interesting that it reaches all the way out to the PCB design to minimize bricking options.

With the MCU being new, there are hardware revs, then later-arriving, evolving family members … and supply ... " *still* waiting on a quote. "

Indeed @mjs513 - lots of design effort up front, and follow-up, for such broad support and ease of use.

@manitou bought one 1050 EvalKit, then bought again for the update to get it working … and the 1060 has its own, and now the 1064. Also wondering whether perf tests on those eval boards are affected by upgraded components like flash, though I suppose they run from RAM or cache?
 
Indeed this chip has a lot of tough challenges. With the Kinetis parts, even when the chip is held in reset, the MDM is accessible by the SWD protocol. The ARM core and stuff on the PPB are also accessible, but all access to memory & peripherals is blocked because the crossbar switch is affected by the hardware reset. But because the DWT is accessed via the PPB, it's possible to get control of the chip before any code executes. And if the chip is locked, the MDM is still accessible during reset, so we can do a full flash erase very easily.

The new IMXRT chips block SWD & JTAG during reset and during startup while much of their ROM-based secure startup runs. The JTAG won't even respond to accessing its IR register (which like all JTAG shifts out a 1 in its LSB and 0s in all other bits while you shift in whatever instruction you want). Even JTAG boundary scan is unavailable. So there's absolutely no way to start up with a "clean slate" on this chip where no code has yet run. NXP's ROM does all sorts of things before any JTAG access is allowed.

This is a huge problem for reliably implementing some Teensy features, particularly holding the Program button during powerup to prevent any previously loaded "bad" code from running. The next PCB rev will bring the BOOT_MODE0 pin (aka GPIO_AD_B0_04 or BGA ball F11) to a pin on the MKL02 chip. By driving that pin high, NXP's ROM will continue running its flash loader rather than your program. Their loader isn't very useful (only access to RAM, and some of its documentation seems to be just dead wrong), but the point is just to be able to gain access to control the hardware before any unpredictable code can run.

In fact, the point isn't even to gain access to the debug features... I've been working on code which can access the QSPI flash using bitbanging over JTAG boundary scan. Even if the chip's security is locked, the boundary scan feature is always supposed to be available. Well, unless the JTAG_HEO fuse has been set, but I'm planning to deal with that possibility by setting the BOOT_CFG_LOCK fuse as our last step during testing, so those critical fuses will be forever locked to settings that always allow recovery.

In the first betas, that permanent setting will (probably) be the unsecured mode. My hope is to later transition to using secure mode and ideally be able to support encryption of the flash image. Quite a good number of people use Teensy to start a commercial product design, where they want their code protected from easy copying. Eventually I want to support using the chip's encryption engine, which is why I've been putting so much work into how we can recover with boundary scan while the chip is locked. A component of this which I have not yet made work is a provision for the encrypted image to request unlocking security. NXP's documentation only mentions that it exists, but not how to actually do it. The method seems to be available only through their closed-source utility, which writes output in an undocumented binary file format.

If you're wondering how encrypting the flash chip could work, well, I must confess there are still some complexities about authentication I don't yet fully understand. But the basic idea is you encrypt the flash image (or Teensy Loader might do it for you automatically) using your AES128 key. Your AES128 key must also be written into the permanent one-time-programmable fuse memory inside the IMXRT chip. The encrypted data in the QSPI flash chip can then only be run on a processor which has the correct key written to its permanent fuse memory, so copying the external QSPI flash chip becomes useless. There are 2 special lock fuses for the key. One prevents any further changes. The other prevents *reading* the key. But the chip's encryption engine has a dedicated path to use the key. So once the key is set and locked, even if you fully erase the chip and get it to run unencrypted code, even if JTAG debug is enabled, you can't read out the previously written AES128 key (but of course debug lets you read the plaintext while the code is running - so debug needs to be shut off while running locked code to prevent copying). The key read lock gives very strong protection from anyone capturing your secret key. The authentication stuff also gives you a way to prevent other people from creating images that can run on your chip. One consequence of all this, which is true for pretty much any encryption, is that if you lose your secret key, you lose the ability to ever create new encrypted images compatible with hardware that already has your key locked.

The chip also has some dangerous anti-tamper modes which we'll probably never support (and will intentionally lock out via the fuse settings). For example, it's possible to store your AES128 key in the VBAT-powered RAM. Then you can wire up tamper-detection hardware like limit switches or light sensors, which trigger the chip to erase that RAM. It also has a secure RTC which never allows the time to be set backwards to a previous date (and detection of RTC tampering can destroy your AES128 key). Likewise if the power fails and the coin cell is dead: the chip becomes permanently bricked, unable to ever run code again, because the AES128 key is forever lost and can't be written again.

At least initially, we'll probably only support use in the unlocked, unencrypted mode, where anyone could read the code by JTAG or just copy the external flash chip. But due to this chip's many security features, even doing just that in a way where we can always recover and fully erase the flash is quite a challenge. Hopefully this lengthy message at least gives you some idea of the stuff I'm doing at this moment.
 
Personally, I look forward to a teensy loader that limits all dirty tricks and allows application programmers to enjoy the power of a T4 (I understand now that this is not easy)
Sure, there will be some time in the future, when people read the manuals and try to implement 'crazy' ideas.
 
I've tested a few more features (TRNG, I2C, PWM, memcpy(), Kalman filter). See post #1 for details.
 
Should be getting the board today. Then will start the learning curve. Any tips or gotchas I should worry about especially with i2c?
 
Should be getting the board today. Then will start the learning curve. Any tips or gotchas I should worry about especially with i2c?

Import the blinky example from the sidebar on the board page. That example uses the mbed-os lib, which is what you need for your testing. Use A4,A5 for I2C; an I2C scan shows 0x34 and 0x3E, the latter being the FXOS8700Q accelerometer/magnetometer. I already had an mbed K64F program that exercised that motion sensor and just ported it to the M7. Start with
https://os.mbed.com/teams/NXP/code/Hello_FXOS8700Q/

mbed api docs: https://os.mbed.com/docs/v5.10/apis/index.html

Board comes with SDK's bubble.bin running. When you tilt board, user LED goes on or off.

When you compile, the browser drops a .bin into your download area. When the board is plugged in, you should have a new USB drive. Drag and drop the .bin onto the USB drive, then push reset on the board. You need a terminal program to manage /dev/ttyACM0 (Linux-speak).
 
Hi @manitou

Got the board up and running. Loaded blinky using the on-line compiler, which I think is probably easier to use than the IDE - but that's next. Next was to load up the https://os.mbed.com/teams/NXP/code/Hello_FXOS8700Q/ example. Had to do one thing besides changing the sda/scl pins: add the baud rate to the serial pc setup. For the test I was using a FRDM-STBC-AGM01 board from Freescale that I got in 2015, so it looks like it's working. Have to play around with what the calls are doing - a little strange, the accel and mag values don't change. Tomorrow will play more.

Thanks
Mike
 
For the test was using a FRDM-STBC-AGM01 board

The eval board has a builtin FXOS8700Q, yes? So you wouldn't need the FRDM-STBC-AGM01. In the program, you need to change the I2C slave ID to 0x3E for the onboard FXOS8700Q. If values aren't changing, I2C probably isn't working. What does "Who am I" report in the program? It should say "C7" for the FXOS8700Q. I2C won't work with the mbed lib; you need to build with the mbed-os lib. So in the FXOS8700Q compiler window, right-click on "mbed" and select delete, then go to the blinky compiler window and right-click on "mbed-os" and select "copy", then back in the FXOS8700Q window right-click on white space and select paste.

I also tested M7's DCP (crypto hardware). Results added to post #1.
 
Just figured it out, but you got back to me before it could post. Something dumb on my part: had to change the address of the FXOS8700Q. Was very late last night when I tried it.

Yes, I know there is the onboard version, but I had the AGM01 handy just to test the Arduino interface with I2C. Want to see if I can get sensor fusion and read the gyro from the board. These are the results I got:
ACC: X=000047d Y=-00192d Z=004207d MAG: X=000935d Y=000346d Z=000411d
ACC: X=000037d Y=-00191d Z=004208d MAG: X=000935d Y=000346d Z=000411d
ACC: X=000037d Y=-00191d Z=004208d MAG: X=000940d Y=000346d Z=000411d
ACC: X=0.0090f Y=-0.0466f Z=1.0273f MAG: X=94.0f Y=34.5f Z=40.7f
ACC: X=0.0132f Y=-0.0466f Z=1.0286f MAG: X=94.0f Y=34.5f Z=40.7f
ACC: X=0.0132f Y=-0.0466f Z=1.0286f MAG: X=93.5f Y=34.5f Z=40.7f

next step gyro.
 
OK. Spent the day on and off trying to port a gyro library over for the FXAS21002C to use with the AGM01 board. Unfortunately could not get it working to save my life. I probably loused up the I2C reads, but tried variations using the accel lib as an example. Probably should port over the scanner; guess that's next.

Before you ask: I get no data printed, and I think it gives an assertion error with I2C; not sure what that means. If you want the source to try it, let me know.
 
Couldn't edit the previous post. Got it working (something dumb on my part), so it returns results, but I still can't get the gyro to be recognized. Will continue playing.
 
So the startup code seems to copy the flash into RAM, presumably giving a performance boost like Teensy FASTRUN.

Wouldn't it work without copying? The chip has a cache, and when running from RAM, a cache seems useless (?!)
 
Wouldn't it work without copying? The chip has a cache, and when running from RAM, a cache seems useless (?!)

I can't explain it ... There are 32KB D and 32KB I caches, and presumably the caches are enabled. I haven't deciphered the core's cache-control registers.

Maybe it saves cycles on initial cache load ?

FWIW, a lot of the SDK examples are sprinkled with SCB_DisableDCache();, and the memory-to-memory EDMA example uses noncacheable memory for the source and destination buffers:
AT_NONCACHEABLE_SECTION_INIT(uint32_t srcAddr[BUFF_LENGTH])
The SDK Ethernet example also uses noncacheable buffers for the ring buffers.
 