Teensy 4? testing mbed NXP MXRT1050-EVKB (600 MHz M7)

I did some more SPI tests with the SDK's LPSPI3, with FIFO and with DMA. The initial mbed-os lib SPI tests were yielding less than one megabit per second. With the FIFO, SPI data rates approach the SPI CLK speed (see table in post #1). The CMSIS DMA example is slower than the FIFO test because of a 960 ns gap between bytes. With a scope, the fastest SPI CLK observed was 16 MHz. DMA buffers are in non-cacheable memory. For the FIFO test, hardware drives the SPI CS pin.
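For reference, here is a minimal sketch (my own illustration, not the SDK example code) of the FIFO-driven transmit loop; the register and bitfield names are from the NXP device header, and 16 is the RT1050 LPSPI TX FIFO depth:

Code:
#include <stdint.h>
#include "MIMXRT1052.h"   /* NXP device header: LPSPI3 and bitfield macros */

/* Keep the 16-entry TX FIFO topped up instead of waiting per byte. */
static void spi_fifo_send(const uint8_t *buf, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++) {
        /* spin until there is room in the TX FIFO */
        while (((LPSPI3->FSR & LPSPI_FSR_TXCOUNT_MASK)
                >> LPSPI_FSR_TXCOUNT_SHIFT) >= 16)
            ;
        LPSPI3->TDR = buf[i];   /* hardware drives the CS pin per transfer */
    }
}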

LPSPI is being clocked at 32.7 MHz, so the max SPI CLK will be 32.7/2 = 16.35 MHz. The data sheet says the absolute max for LPSPI CLK is 30 MHz.
 
That was on my to-do list. Which example did you work from?
I am using the sdcard example that's in the driver examples folder. For the test, I changed the buffer size to 1024, which is defined pretty much at the top of the example.
 
@manitou - sorry, wrong example: I am using the fatfs_sdcard example in the fatfs_examples folder. Too many examples spinning in my head.
 
...and it is great.


Oh, good to know! Does always_inline override that?

Just in case it isn't clear, GCC can only switch the options used at the function boundary level. So if you have:

Code:
__attribute__((__optimize__("O3,unroll-loops")))
void fast_inner (void)
{
    /* speed-critical work, compiled with -O3 and loop unrolling */
}

void slow_outer (int test)
{
    // ...
    if (test) {
        fast_inner ();
    }
    // ...
}

There is no support in the compiler to change the optimization level or target options at the if statement. You can only change things at the function level. That means with __always_inline__, you lose the optimization attributes.

I should mention my motivation for adding it was more from the hosted-compiler perspective than the embedded perspective. At AMD and now at IBM, you have generations of processors. Typically most code is compiled for the least common denominator, but for performance-critical code, you might want one function to be compiled with special options.

For the hosted folks, who have full shared-library support, there is now the target_clones attribute:

Code:
/* Power9 (aka, ISA 3.0) has a MODSD instruction to do modulus, while Power8
   (aka, ISA 2.07) has to do modulus with divide and multiply.  Make sure
   both clone functions are generated.

   Restrict ourselves to Linux, since IFUNC might not be supported in other
   operating systems.  */

__attribute__((target_clones("cpu=power9,default")))
long mod_func (long a, long b)
{
  return a % b;
}

long mod_func_or (long a, long b, long c)
{
  return mod_func (a, b) | c;
}

Now this example is silly, because calling a function through the PLT (one that can vary at initialization time, or that lives in a shared library) takes a lot longer than a normal call, but it illustrates the usage.

I also did think about the embedded use (such as Teensy) and knew that there were often reasons you might want particular functions to be compiled differently. In the embedded world, you are typically only compiling for a single processor, so having different target options doesn't give you much, but per-function optimization options still matter:

  • Oftentimes you have very little code space, and compiling most functions with -Os will allow you to fit into a fixed-size flash memory region;
  • There are times when an embedded processor is doing something speed critical and you need every cycle you can get, but most of the program is not speed critical (see the sketch after this list);
  • Unfortunately there are bugs in complex software like GCC, and using an attribute to turn off a specific optimization that results in buggy code might be needed;
  • In environments like Arduino, it is not always easy to modify which compilation options are used.
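As a concrete illustration of the first two points, a made-up sketch: compile the whole program with -Os for size, but mark just the hot function (the checksum function here is only a stand-in):

Code:
#include <stdint.h>
#include <stddef.h>

/* Everything else is built with -Os; only this hot loop gets -O3 plus unrolling. */
__attribute__((__optimize__("O3,unroll-loops")))
uint32_t checksum (const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}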
 
@MichaelMeissner - thanks for the detailed explanation in your post, especially the last paragraph. Really explains things for a neophyte like me :)
 
LPSPI is being clocked at 32.7 MHz, so the max SPI CLK will be 32.7/2 = 16.35 MHz. The data sheet says the absolute max for LPSPI CLK is 30 MHz.
Oh. This limits the possibilities with fast SPI displays quite a bit. Too bad that the built-in display interface device wants so many pins... maybe an 8-bit interface would be possible, but I don't know if we have all the needed pins at the moment.
 
Oh. This limits the possibilities with fast SPI displays quite a bit. Too bad that the built-in display interface device wants so many pins... maybe an 8-bit interface would be possible, but I don't know if we have all the needed pins at the moment.

In the LPSPI example, I increased the peripheral clock to 65 MHz and requested an SPI speed of 32 MHz. The scope measured about 22 MHz. However, there is another SPI (FlexSPI), and the data sheet says its max SPI CLK is either 60 MHz or 133 MHz. It seems FlexSPI is aimed at flash controllers (QSPI?), and I am not sure if it is a general-purpose SPI. The SDK examples are for NOR and NAND flash testing. One would have to see which Teensy 4 pins are available.

https://forum.pjrc.com/threads/5276...in-assignments?p=190141&viewfull=1#post190141
I think the SPI pins in Paul's T4 pin list are for LPSPIn.
 
OK, worst case would be switching to VGA (i.e., port the VGA lib) and using $40 7'' VGA displays... should be possible, and a larger display is nice... costs a bit more (for my emulation project).
 
I found this on the NXP forum: https://community.nxp.com/thread/477739, i.MX RT1050 LPSPI. Think it just about confirms what @manitou is saying about LPSPI max speed.

EDIT1: OK, my first edit is way wrong. Just found this after I rambled in the first edit; it might be part of the problem:
Setting i2c/SPI baud rate on rt1050?

EDIT: Getting myself a little confused here. If I look in Chapter 18, it looks like LPSPI_CLK_ROOT is automatically divided by 4 to get the SPI max frequency. So even if you change the clock source, it would automatically be divided by 4, and any additional divisor you put on the selected clock would have to be adjusted as well to account for the automatic /4. Looking at the data sheet, the LPSPI_CLK_ROOT speed defaults to 120 MHz, giving you the 30 MHz max as shown in the reference link. To get faster, would you have to change LPSPI_CLK_ROOT to something like 240 MHz to get an LPSPI clock of 60 MHz? Don't know - thinking out loud here.
 
The fastest I've been able to get from LPSPI and have it actually work for reading the JEDEC ID bytes from a serial flash is 44 MHz.


This test was done with LPSPI_CLK_ROOT at 528 MHz divided by 6, for 88 MHz. LPSPI SCKDIV was 0 (meaning divide by 2). Indeed LPSPI in master mode is able to generate SCK up to half the LPSPI_CLK_ROOT frequency.
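In SDK terms, that clock setup would look roughly like the sketch below (my illustration; CLOCK_SetMux/CLOCK_SetDiv are the RT1050 SDK clock helpers, and the mux value 2 for PLL2 is my reading of the reference manual's LPSPI_CLK_SEL table):

Code:
#include "fsl_clock.h"   /* NXP SDK clock driver */

static void lpspi_clock_88mhz(void)
{
    /* Select PLL2 (528 MHz) as the LPSPI clock source, then divide by 6
       for an 88 MHz LPSPI_CLK_ROOT (a divider value of n means /(n+1)). */
    CLOCK_SetMux(kCLOCK_LpspiMux, 2);   /* assumed: 2 = PLL2 main clock */
    CLOCK_SetDiv(kCLOCK_LpspiDiv, 5);   /* 528 / (5+1) = 88 MHz */
}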

I can get it to output faster SCK, but reading from the chip fails with wrong data. My test does have about 4 inches of wire and lots of other not-so-ideal setup for high-speed signals. My main focus is just getting the software tested. But hopefully this quick check can help with understanding the max speed we can anticipate.

Edit: just to add a bit more info, this test had the LPSPI4 pins configured in their PAD registers with IOMUXC_PAD_SRE | IOMUXC_PAD_DSE(2) | IOMUXC_PAD_SPEED(3), which is fast slew rate, 75 ohm drive strength, and 200 MHz bandwidth.
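In code, that pad setup is a single register write along these lines (sketch only; GPIO_B0_03, the Teensy 4 LPSPI4 SCK pad, is my assumed example pin, and the macros are from the Teensy 4 core's imxrt.h):

Code:
/* Fast slew, 75 ohm drive strength, 200 MHz bandwidth on one LPSPI4 pad. */
IOMUXC_SW_PAD_CTL_PAD_GPIO_B0_03 =
      IOMUXC_PAD_SRE        /* fast slew rate */
    | IOMUXC_PAD_DSE(2)     /* 75 ohm drive strength */
    | IOMUXC_PAD_SPEED(3);  /* 200 MHz bandwidth */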
 
I found this on the NXP forum: https://community.nxp.com/thread/477739, i.MX RT1050 LPSPI. Think it just about confirms what @manitou is saying about LPSPI max speed.

EDIT1: OK, my first edit is way wrong. Just found this after I rambled in the first edit; it might be part of the problem:
Setting i2c/SPI baud rate on rt1050?

EDIT: Getting myself a little confused here. If I look in Chapter 18, it looks like LPSPI_CLK_ROOT is automatically divided by 4 to get the SPI max frequency. So even if you change the clock source, it would automatically be divided by 4, and any additional divisor you put on the selected clock would have to be adjusted as well to account for the automatic /4. Looking at the data sheet, the LPSPI_CLK_ROOT speed defaults to 120 MHz, giving you the 30 MHz max as shown in the reference link. To get faster, would you have to change LPSPI_CLK_ROOT to something like 240 MHz to get an LPSPI clock of 60 MHz? Don't know - thinking out loud here.


In the SDK example, when I print CLOCK_GetFreq(kCLOCK_Usb1PllPfd0Clk), I get 261818172 Hz. Your 2nd link says clk: 36000000.
And per the reference manual, the CCM would say the LPSPI clock when selecting "1" should be 720/4 MHz. Not sure how the PFD0 selection relates to kCLOCK_Usb1PllPfd0Clk??
I can't say I understand that, but adjusting the divisor did allow me to get to 21 MHz on SPI CLK.
 
Oh, please allow me to revise this a bit....

Turns out setting the SAMPLE bit in the LPSPI CFGR1 register allows correct reading from the flash chip at SCK of 53 MHz. :)


This test was with LPSPI_CLK_ROOT at 528 / 5 = 105.6 MHz, and the LPSPI at fastest div-by-2 setting.
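In register terms the change is just one bit (sketch using the NXP SDK bitfield macros; per the reference manual, CFGR1 should only be written while the module is disabled):

Code:
LPSPI4->CR    &= ~LPSPI_CR_MEN_MASK;        /* disable the module first */
LPSPI4->CFGR1 |=  LPSPI_CFGR1_SAMPLE_MASK;  /* sample input on delayed SCK edge */
LPSPI4->CR    |=  LPSPI_CR_MEN_MASK;        /* re-enable the module */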
 
LPSPI DMA update: I reported in post #101 that LPSPI DMA was slower than FIFO because of the large delay between bytes (DBT). I hacked the DBT field in the LPSPI3->CCR register from 31 down to 4, resulting in a 144 ns gap. DMA data rates are now faster than FIFO. The SPI table in post #1 is updated.
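For reference, the tweak looks roughly like this (my sketch using the NXP SDK bitfield macros; note CCR is only writable while the module is disabled):

Code:
LPSPI3->CR &= ~LPSPI_CR_MEN_MASK;   /* disable module; CCR is locked while enabled */
uint32_t ccr = LPSPI3->CCR;
ccr = (ccr & ~LPSPI_CCR_DBT_MASK) | LPSPI_CCR_DBT(4);  /* DBT: 31 -> 4 */
LPSPI3->CCR = ccr;
LPSPI3->CR |= LPSPI_CR_MEN_MASK;    /* re-enable module */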
 
Don't think it's really a hack. Was rummaging through the LPSPI driver source, and there seem to be two functions you can use to adjust the delays:
Code:
/*!
 *
 * Note that the LPSPI module must be configured for master mode before configuring this. And note that
 * the delayTime = LPSPI_clockSource / (PRESCALE * Delay_scaler).
 *
 * @param base LPSPI peripheral address.
 * @param delayTimeInNanoSec The desired delay value in nano-seconds.
 * @param whichDelay The desired delay to configuration, which must be of type lpspi_delay_type_t.
 * @param srcClock_Hz  Module source input clock in Hertz.
 * @return actual Calculated delay value in nano-seconds.
 */
uint32_t LPSPI_MasterSetDelayTimes(LPSPI_Type *base,
                                   uint32_t delayTimeInNanoSec,
                                   lpspi_delay_type_t whichDelay,
                                   uint32_t srcClock_Hz);

and

Code:
/*!
 * Note that the LPSPI module must first be disabled before configuring this.
 * Note that the LPSPI module must be configured for master mode before configuring this.
 *
 * @param base LPSPI peripheral address.
 * @param scaler The 8-bit delay value 0x00 to 0xFF (255).
 * @param whichDelay The desired delay to configure, must be of type lpspi_delay_type_t.
 */
void LPSPI_MasterSetDelayScaler(LPSPI_Type *base, uint32_t scaler, lpspi_delay_type_t whichDelay);
and just for reference:
Code:
typedef enum _lpspi_delay_type
{
    kLPSPI_PcsToSck = 1U,  /*!< PCS-to-SCK delay. */
    kLPSPI_LastSckToPcs,   /*!< Last SCK edge to PCS delay. */
    kLPSPI_BetweenTransfer /*!< Delay between transfers. */
} lpspi_delay_type_t;
Not sure if you noticed these functions or not. Think they do what you did with the DBT register adjustment. Might be easier to use for now.
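For example, the DBT tweak via the first helper would look roughly like this (my sketch; the 105600000U source clock matches the 105.6 MHz LPSPI_CLK_ROOT discussed above, and the 150 ns request is an arbitrary target):

Code:
uint32_t actual_ns = LPSPI_MasterSetDelayTimes(LPSPI3,
                                               150U,                   /* requested gap in ns */
                                               kLPSPI_BetweenTransfer, /* i.e. the DBT field */
                                               105600000U);            /* source clock in Hz */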
 
Not sure if you noticed these functions or not. Think they do what you did with the DBT register adjustment. Might be easier to use for now.
The raw registers are ugly, but the knowledge carries over to the Teensy 4. Trying to figure out NXP SDK/CMSIS APIs is not high on my list. :D
 
The raw registers are ugly, but the knowledge carries over to the Teensy 4.
Agreed. I try to look at the associated function calls; a lot of the time it will shed light on the modifications to the registers necessary to make the associated changes. That was the only reason I was pointing the functions out.
 
Trying to figure out NXP SDK/CMSIS APIs is not high on my list. :D

Good, 'cause we're not going to be using any of that stuff on Teensy!

A few times I've had to dig through their code for answers to undocumented stuff. For example, I *still* don't understand what the SION bit in the pin mux registers really does. The 1050 manual describes it as a loopback feature (page 1669). Usually it seems to have no effect, but some peripherals don't work unless it's set. So far I haven't found any way to tell when it's needed, other than experimenting or looking at whether NXP used it.

I'm always amazed at the sheer volume of extra stuff to read just to figure out what anything really does. I suppose programmers who write code in that style feel it's good practice. Or maybe NXP has corporate requirements documents & standards which require all code to be written & formatted a certain way? I'm sure it's all done with the best of intentions, but the end result is an excessive amount of verbiage to sift through, just to figure out what anything actually does.

I really don't like that highly verbose style. My preference could be summed up as "less is more".
 
This test was with LPSPI_CLK_ROOT at 528 / 5 = 105.6 MHz, and the LPSPI at fastest div-by-2 setting.

FWIW, I modified the SDK LPSPI3 DMA master program to NOT mess with the LPSPI_CLK_ROOT settings. Reading the MUX and DIV settings, the LPSPI clock was running at 105.6 MHz. Requesting 32 MHz for SPI CLK, the scope showed 27 MHz. If I requested 40 MHz, the CLK signal was distorted, Vpp was reduced (2.28 V), and the scope showed 35 MHz. I presume Paul's pin settings (post #113) might improve the CLK signal.
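(Reading those settings back is just the complementary SDK calls; a quick sketch:)

Code:
uint32_t mux = CLOCK_GetMux(kCLOCK_LpspiMux);   /* which PLL/PFD feeds LPSPI */
uint32_t div = CLOCK_GetDiv(kCLOCK_LpspiDiv);   /* post divider: n means /(n+1) */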

EDIT: I changed the GPIO pin settings to match Paul's; that helps the waveform some. Could still be some scope/cable problems, but here is the scope view for an SPI CLK request of 60 MHz.
spi3.png
The scope measures SPI CLK at 53.2 MHz, but due to the inter-byte gap, the data rate is only 27 Mbit/s.
 
I too saw terrible waveforms before setting the PAD register. Or at least what I believe to have been terrible... it takes a little imagination and guesswork with an oscilloscope having "only" 200 MHz bandwidth, and a casual effort connecting the probes with ground clips and wires & traces several inches long.
 
Paul noted his idea to support AI on the T_4 ... Wasn't sure where to post this for ref. News from Hackster.io ... Learn how to do machine learning on an Arm Cortex-M microcontroller
DevDay: Explore AI on Arm
Learn about some of the key building blocks needed to develop intelligent machines. This one-day, online interactive event will give you experience with machine learning, voice recognition and computer vision.

Hands-on participation requires a kit including: MATRIX Creator, Raspberry Pi, SD card, and power supply
>> Raspberry Pi 3 Model B
>> MATRIX Creator: fully-featured IoT development board based on an Arm Cortex-M3 CPU, which also includes a Xilinx Spartan FPGA, sensors, and wireless capabilities.
 