Future Teensy features & pinout

Maybe consider an MCU module + carrier board pair to accommodate more options and possibly ease the routing? Or rather, MCU module + interface boards.

The 1170 on a module board with multiple HD connectors, with roughly the "set of v4 or v4.1 pins" routed to one or more of the connectors, and a cheap breadboard-friendly interface board to provide some backward compatibility. That allows a higher volume of MCU modules than a split production of short/long boards such as the v4/v4.1 split, if that was even considered.

Depending on the layout and the signals routed to the connectors, it may even allow the use of a pair of "breadboard-friendly interface boards" side by side, so again more volume of the same stuff.
 
Indeed @MM, that second processor is going to be a chore. With symmetric cores, or even one core multitasking with an RTOS, it is all one 'sketch' in the same device/memory. The 1170 has two different cores, each with its own subset of the same device.
 
IIRC, each has its own main memory, so I tend to view it as similar to two independent chips talking via serial lines. I.e., I would imagine the main processor will just have a larger flash memory, and it will load up the secondary memory. Each processor will have its own library, compiled for that chip set; the libraries won't be shared. But presumably there are faster mailboxes, etc. that allow the two to communicate instead of using serial UARTs.
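To make the "mailbox instead of UART" idea a little more concrete, here is a minimal C++ sketch of a single-producer/single-consumer mailbox in shared memory. It is a plain software ring buffer, not NXP's Messaging Unit; the `Mailbox` class is hypothetical, and it assumes a RAM region that both cores can see and keep coherent (e.g. non-cacheable). On a PC you can exercise it with two `std::thread`s standing in for the two cores.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer mailbox. Assumes it lives in
// RAM that both cores can see and that is kept coherent (e.g. non-cacheable).
// Only one core may call post(), and only the other core may call fetch().
template <std::size_t N>
class Mailbox {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
  uint32_t buf_[N] = {};
  std::atomic<uint32_t> head_{0};  // advanced only by the producer
  std::atomic<uint32_t> tail_{0};  // advanced only by the consumer

 public:
  // Producer side: returns false if the mailbox is full.
  bool post(uint32_t msg) {
    uint32_t h = head_.load(std::memory_order_relaxed);
    if (h - tail_.load(std::memory_order_acquire) == N) return false;
    buf_[h & (N - 1)] = msg;
    head_.store(h + 1, std::memory_order_release);  // publish the slot
    return true;
  }

  // Consumer side: returns false if there is nothing to read.
  bool fetch(uint32_t &msg) {
    uint32_t t = tail_.load(std::memory_order_relaxed);
    if (t == head_.load(std::memory_order_acquire)) return false;
    msg = buf_[t & (N - 1)];
    tail_.store(t + 1, std::memory_order_release);  // hand the slot back
    return true;
  }
};
```

The release/acquire pairs are what make a message's contents visible to the other side before it sees the updated index.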
 
Indeed - noted in nxp.com/docs/en/fact-sheet/IMXRTPORTFS.pdf
Code:
• Up to 2 MB of SRAM
• 512 KB of TCM with ECC for Cortex-M7
• 256 KB of TCM with ECC for Cortex-M4
Presumably that leaves the remaining 2048 - 512 - 256 = 1280 KB as shared, slower memory?

A single off-chip code flash for the PCB. Not sure how the boot process starts the seemingly independent M7 & M4? That 'Fact Sheet' doesn't give much in the way of details on interconnection or control.

No new relevant docs posted lately - and no public release of the EVK board ...
 
The 1170 boots up with only the M7 core running. The M4 core stays in a low power sleep state until explicitly woken up.

Pretty much everything about 1170 is still under NDA, so I can't say much right now. But I really don't believe knowing which core boots is much of a secret. I am going to refrain from mentioning any memory map details until NXP publishes the reference manual.
 
It may be that it is too hard of a problem to add parallelism to the Arduino framework, but it may make sense to at least provide the tools to allow separation.

Arduino is finally shipping their Portenta H7 board, which has an M7 (480 MHz) and an M4 (240 MHz). I just recently ordered one, but haven't yet plugged it in and actually used it. But looking at their code (only very briefly), it seems they're taking the general approach where you upload 2 separate programs, as if it were 2 different boards.
 
That seemed the straightforward approach given the different cores. With the NXP 1170 powering on with the M7 in control ...

Seems the M4 on the 1170 would be an i.MX RT rather than a Kinetis architecture, but still not share common code with the M7? So it would need twin compiles? Certainly each core has its own view of the pins and hardware, beyond the TCM memory space it uses.
 
That makes sense. I can imagine that in normal development you would likely be modifying either the M7 board or the M4 board. And it certainly simplifies things in terms of the toolchain. I was thinking that if you had to embed the M4 code within the M7 code, it would make things complicated.

And given the M4 doesn't come up until the M7 initializes it, it may be that it is mostly unused potential (at least at first). But that is fine, it can be used as needed.
 
Not sure if the H7 specs/notes have changed; it's still Pre-Order here: arduino.cc/usa/portenta-h7, though Mouser has them for $99.90.

Two 12-bit DACs, and from the ST summary (shorter than the two links above):
Code:
The Arduino Portenta H7 leverages the outstanding performance, flexibility, of the dual-core STM32H747, which can simultaneously run high level code along with real time tasks. Both cores share all the STM32 peripherals and can run:
Arduino sketches (arduino.cc says 'Arduino sketches on top of the Arm® Mbed™ OS')
Native mbed applications
Micropython / Javascript via an interpreter
TensorFlow Lite
The Arduino Portenta H7 board follows the MKR form factor, but is enhanced by the addition of two 80 pin high density connectors at the bottom of the board. This ensures scalability for a wide range of applications by simply upgrading your Portenta H7 board to the one suiting your needs. 

This board is fully compatible with the Arduino IoT Cloud and includes the following key features :
STM32H747 dual core Cortex M7@480 MHz + Cortex M4@240MHz 
8 MByte SDRAM
16 MByte NOR Flash
100 MBit Ethernet Phy (1170 has twin gigabit)
USB HS
WiFi/BT combo
DisplayPort over USB-C 
High density expansion port

Would be cool if NXP now has a radio chip to put onboard?

Mouser note: 'The two cores communicate via a Remote Procedure Call mechanism'
Time and the manual will tell, but it seems the ST MCUs may have a different type of interconnect (RPCs, and all peripherals shared), whereas NXP's 'Fact Sheet' notes suggest it is just two MCUs that share non-TCM memory??? And does the H7 even share TCM?

The ST chip has caches half the size of the NXP 1062's and half the RAM of the 1170, though it has some fast on-chip flash. So its memory space (for twin MCUs) is not as fast/clean as the 1062's?:
Code:
Core
32-bit Arm® Cortex®-M7 core with double-precision FPU and L1 cache: 16 Kbytes of data and 16 Kbytes of instruction cache allowing one cache line to be filled in a single access from the 256-bit embedded Flash memory; frequency up to 480MHz, MPU, 856 DMIPS/2.14 DMIPS/MHz (Dhrystone 2.1), and DSP instructions
Memories
Up to 2 Mbytes of Flash memory with read-while-write support
1Mbyte of RAM: 192Kbytes of TCM RAM (inc. 64Kbytes of ITCM RAM + 128Kbytes of DTCM RAM for time critical routines), 864Kbytes of user SRAM, and 4Kbytes of SRAM in Backup domain
Dual mode Quad-SPI memory interface
 
@defragster - you beat me to the punch on the Portenta. But a couple things caught my eye:

  • If you need more memory, Portenta H7 can host up to 64 MByte of SDRAM, and 128 MByte of QSPI Flash
  • The carrier board: https://www.arduino.cc/pro/hardware/product/portenta-carrier which provides support for all the extra stuff not exposed on the Portenta itself, including gigabit support, display and camera
  • Both processors share all the in-chip peripherals ....

The carrier board is starting to look like a Pi; wonder if it can run Linux :) I know the i.MX RT can run the Emcraft Linux BSP: https://www.cnx-software.com/2017/10/27/emcraft-releases-linux-bsp-for-nxp-i-mx-rt-cortex-m7-soc/

One thing that concerns me, because I don't know enough: if the 2 cores can share in-chip peripherals, how do you handle race conditions if both the M7 and M4 want to use one? Probably the user's responsibility.
 
Yeah, lots more specs; felt like I was writing an ad to buy the H7 :confused: But it does give a feature list for reference. And it does indicate they have "Arm® Mbed™ OS" even under Arduino coding.

Did not follow the link to the carrier board. That would be a cool plan for T_4.5??? for selective add-ons, but really there's no way that thing is going to have castellated direct mount, just those dedicated connectors, which would solve that need.

Interesting that it claims a gigabit connector, which is odd since the MCU offers only a "10/100 Ethernet Phy". The 1170 MCU does have twin Gigabit, plus a 10/100.

Also highlighted in p#209 was the 'Both cores share all the STM32 peripherals'. Maybe the 1170 has one gigabit connection on each core? A lot will be defined when NXP reads us in on the M4 vs. M7 elements/peripherals.

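The 253-page H7 MCU data sheet says 1027 DMIPS for the H7 (with 300 for the M4?), whereas the summary above says 856, not 1027, DMIPS. The 1060 spec says 1284 DMIPS, so performance per MHz seems equivalent (about 2.14 DMIPS/MHz: 1027/480 and 1284/600). The 1170 will just have 1,400 MHz combined (1 GHz M7 + 400 MHz M4) versus 720 MHz (480 + 240) for the H7.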
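The per-MHz comparison is just arithmetic on the figures in the post. This sketch assumes the 1060 rating is at 600 MHz and the 1170 runs its M7 at 1 GHz and its M4 at 400 MHz; the helper names are made up for illustration. For what it's worth, 856 DMIPS at 2.14 DMIPS/MHz would correspond to 400 MHz.

```cpp
#include <cassert>
#include <cmath>

// Dhrystone MIPS per MHz, from the numbers quoted in the thread.
constexpr double dmips_per_mhz(double dmips, double mhz) { return dmips / mhz; }

// Combined clock of an M7 + M4 dual-core part, in MHz.
constexpr double combined_mhz(double m7_mhz, double m4_mhz) { return m7_mhz + m4_mhz; }

// Loose comparison, since published DMIPS figures are rounded.
inline bool roughly(double a, double b, double tol = 0.01) {
  return std::fabs(a - b) <= tol;
}

static_assert(combined_mhz(480, 240) == 720, "H7: 480 MHz M7 + 240 MHz M4");
static_assert(combined_mhz(1000, 400) == 1400, "1170: 1 GHz M7 + 400 MHz M4 (assumed)");
```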

Indeed, coordinating and using two cores will be fun. ESP32 uses an RTOS with the radios on core #0 and Arduino on #1, but Arduino code can run on #0 too; the cores are symmetric, so it doesn't matter. The nRF52840 is one core but uses an RTOS under Arduino (w/ Adafruit) to schedule in the radio.
 
For those of you who are interested in the 1170 EVK, I found this on the NXP community website (don't know if it's been posted already):
As far as I know, the RT1170 EVK will be available in October.

EDIT: Decided I'm going to hold off on getting a Portenta to play with and wait for the 1170 EVK to invest in. :)
 
if the 2 cores can share in-chip peripherals how do you handle race conditions if both the M7 and M4 want to use it?

The main usage model is each peripheral should be controlled by only 1 of the CPU cores.

NXP will have hardware-based semaphores and a "messaging unit" meant for implementing multithreading features. Those are mentioned on the public block diagram. Not shown on the diagram is a peripheral sort of like the ARM memory protection unit (MPU) which can configure which core gets access to each peripheral, with optional semaphores. Some of the peripherals will also have registers which control which core gets access.

While a semaphore can be used to arbitrate access, generally you won't want to have both cores trying to use the same peripheral.

One thing we will not get is ARM LDREX & STREX instructions which synchronize across both cores. The semaphore peripherals are the only way to get that.
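As a rough illustration of how a hardware semaphore gate is used, here is a host-side C++ model: a core claims the gate by tagging it with its own ID, and only the owner can release it. The `Gate` class, the core numbering (0 for the M7, 1 for the M4) and the "ID + 1" encoding are assumptions for the sketch, not the RT1170's register interface; `compare_exchange_strong` stands in for the atomicity that the semaphore hardware provides.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of one hardware semaphore "gate". 0 means unlocked;
// otherwise the gate holds (core_id + 1) of the owning core.
class Gate {
  std::atomic<uint8_t> owner_{0};

 public:
  // Try once to claim the gate for core_id; true if this core now owns it.
  bool try_lock(uint8_t core_id) {
    uint8_t expected = 0;
    return owner_.compare_exchange_strong(
        expected, static_cast<uint8_t>(core_id + 1), std::memory_order_acquire);
  }

  // Spin until the gate is ours.
  void lock(uint8_t core_id) {
    while (!try_lock(core_id)) {
    }
  }

  // Release the gate; fails (returns false) if core_id is not the owner.
  bool unlock(uint8_t core_id) {
    uint8_t expected = static_cast<uint8_t>(core_id + 1);
    return owner_.compare_exchange_strong(expected, uint8_t{0},
                                          std::memory_order_release);
  }

  // 0 if free, otherwise owner core_id + 1.
  uint8_t owner() const { return owner_.load(); }
};
```

In keeping with the advice above, a gate like this would guard occasional hand-offs, not routine sharing of one peripheral by both cores.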
 
I missed those blocks. It's a different processor, but recent: NXP has this March 2020 doc, "RT600 Dual-Core Communication and Debugging":
With the dual-core running mode, the Cortex-M33 and the DSP need to
communicate with each other. The RT600 provides two simple means,
Message Unit (MU) and Semaphore Block, to achieve this task

Maybe it will be similar?

When putting ARM LDREX & STREX into the T_4 micros(), all the reading said they are not for multi-core use. They only seem to work on a single core, and only as a crude 'did ANY interrupt happen during this block' check, not specific to a change at any particular memory location.
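For comparison, here is a host-runnable sketch of the same "retry if something changed underneath us" idea. The Teensy 4 micros() relies on the LDREX/STREX exclusive monitor; this swaps in a different technique, a sequence counter (seqlock), because it can be written in portable C++. `TickState`, `TickSnapshot` and their members are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct TickSnapshot {
  uint32_t millis;
  uint32_t cycles;
};

// Seqlock: the counter is odd while the writer is updating the pair.
class TickState {
  std::atomic<uint32_t> seq_{0};
  std::atomic<uint32_t> millis_{0};
  std::atomic<uint32_t> cycles_{0};

 public:
  // Writer side, e.g. the systick interrupt.
  void update(uint32_t ms, uint32_t cyc) {
    seq_.fetch_add(1, std::memory_order_acq_rel);  // now odd
    millis_.store(ms, std::memory_order_relaxed);
    cycles_.store(cyc, std::memory_order_relaxed);
    seq_.fetch_add(1, std::memory_order_release);  // even again
  }

  // Reader side: retry until both values come from the same update.
  TickSnapshot read() const {
    for (;;) {
      uint32_t before = seq_.load(std::memory_order_acquire);
      if (before & 1u) continue;  // update in progress
      TickSnapshot s{millis_.load(std::memory_order_relaxed),
                     cycles_.load(std::memory_order_relaxed)};
      std::atomic_thread_fence(std::memory_order_acquire);
      if (seq_.load(std::memory_order_relaxed) == before) return s;
    }
  }
};
```

Like the LDREX/STREX version, this only guarantees a consistent pair of values; it is not a cross-core lock. One caveat of the sketch: a reader that preempts the writer mid-update on the same core would spin forever, so the writer must run at higher priority, as the systick ISR does relative to loop() code.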
 
Thanks for the explanation. Sounds like programming will generally be a nightmare, having to deal with 2 potential cores in the support and other libraries.

The main usage model is each peripheral should be controlled by only 1 of the CPU cores.
So basically, if I want to use SPI devices (not necessarily the same one) from both cores, I would have to deal with semaphores, or else put the devices on one core and have the other core send it the data to be transferred to the SPI device. This is giving me a headache already :)

@defragster - thanks for the link.

Anyway, I have a feeling it's going to be a while before we see a Teensy with an 1170, so there's plenty of time to experiment with the EVK if/when it comes out.
 
This is giving me a headache already :)

Me too... thinking about possibly implementing SPI.beginTransaction() to use a semaphore!

The recommended way will involve using 2 SPI ports. Connect all the SPI devices you want the M7 to access to one port, and all the others the M4 would use to the other port.
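The one-port-per-core rule can be checked at compile time. In this hypothetical sketch each SPI device is listed with the core that drives it, and a `static_assert` fails if the same bus shows up under both cores. The `Core` enum, the bus names `SPI0`/`SPI1` and the table are illustrative only (C++14 or later, for the constexpr loops).

```cpp
#include <cassert>
#include <cstddef>

enum class Core { M7, M4 };

// One SPI device and the core that drives it (hypothetical board description).
struct Assignment {
  const char *bus;
  Core owner;
};

// Compile-time comparison of two C strings.
constexpr bool same_bus(const char *a, const char *b) {
  while (*a != '\0' && *a == *b) {
    ++a;
    ++b;
  }
  return *a == *b;
}

// True if no bus is listed under two different cores.
template <std::size_t N>
constexpr bool ownership_is_exclusive(const Assignment (&table)[N]) {
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = i + 1; j < N; ++j)
      if (same_bus(table[i].bus, table[j].bus) &&
          table[i].owner != table[j].owner)
        return false;
  return true;
}

// All M7 devices on one port, all M4 devices on the other.
constexpr Assignment kBoard[] = {
    {"SPI0", Core::M7},  // e.g. a display
    {"SPI0", Core::M7},  // e.g. an SD card
    {"SPI1", Core::M4},  // e.g. a sensor
};
static_assert(ownership_is_exclusive(kBoard), "an SPI bus is shared by both cores");
```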
 
Yep, that was one of the things giving me a headache.

Figured that the only real way to do it was going to be as you described. But there is time to figure all that out - still working with T4x :)
 
I see that the block diagram for the i.MX1170 says "1 Gbps ENET with AVB". Is that the same AVB ("Audio Video Bridge") that is used by companies such as MOTU to transfer audio over Ethernet, and if so, is that likely to be workable on a Teensy?
 
As far as I can tell, that does seem to be the case. I plan to look into it, just to see if it's something that I can support in my Ethernet libraries.
 
Yes, you are right: AVB stands for Audio Video Bridging https://en.wikipedia.org/wiki/Audio_Video_Bridging. The name of the development group was changed to TSN (Time-Sensitive Networking) https://en.wikipedia.org/wiki/Time-Sensitive_Networking, and they added many standards which are most needed for 5G 'real time' networks.
The big thing about the i.MX1170 is having two of these 1 Gbps ENETs, so you can build a simple switch/bridge to forward data.
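A two-port bridge like that is typically a learning bridge: it remembers which port each source MAC address arrived on, and only forwards a frame when the destination isn't known to be on the same side. This hypothetical `Bridge` sketch shows just that forwarding decision; it has nothing to do with NXP's ENET driver, and a real bridge would also age out old table entries.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>

using Mac = std::array<uint8_t, 6>;

// Forwarding decision of a hypothetical two-port learning bridge.
class Bridge {
  std::map<Mac, int> table_;  // MAC address -> port (0 or 1) it was seen on

 public:
  // Returns the port to send the frame out of, or -1 to drop (filter) it.
  int on_frame(int in_port, const Mac &src, const Mac &dst) {
    table_[src] = in_port;                  // learn where the sender lives
    bool group = (dst[0] & 0x01) != 0;      // multicast/broadcast bit
    if (!group) {
      auto it = table_.find(dst);
      if (it != table_.end() && it->second == in_port) return -1;  // same side
    }
    return 1 - in_port;  // unknown, other side, or group address: forward
  }
};
```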
 
Echoing the others in this thread who are calling for USB-C. Micro-USB cables are annoying and failure-prone.

I would further request that power delivery is supported, to as many watts as possible, so that we can power larger LED arrays, servos, steppers, small heaters, etc., without having to use separate power bricks. These days, you can get gallium nitride USB-C supplies that are very small and deliver 100 watts for $40. If the application's power demands can be supported by the USB-C port on whatever laptop or workstation you're working on, then you have an all-in-one solution that can be used to program, debug, and run the device.

Some method of determining power delivery status would also be ideal. For example, if I'm developing on a big array, and my workstation only delivers 20 watts, I may want to clamp the max brightness, but then kick it up to maximum when more wattage is available.
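A minimal sketch of that brightness clamp, assuming the negotiated wattage is already known (reading it from the PD controller is a separate problem) and assuming about 60 mA per WS2812B at full white on 5 V, a commonly quoted worst case. `brightness_for_budget()` and its 1 W reserve are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical: brightness scale (0-255) that keeps a full-white LED array
// within a power budget. Assumes ~60 mA per LED at full white on 5 V.
uint8_t brightness_for_budget(float budget_watts, uint32_t num_leds,
                              float reserve_watts = 1.0f) {
  if (num_leds == 0) return 255;                      // nothing to limit
  const float watts_per_led = 5.0f * 0.060f;          // ~0.3 W per LED (assumed)
  const float full_power = watts_per_led * static_cast<float>(num_leds);
  const float usable = budget_watts - reserve_watts;  // keep some for the MCU etc.
  if (usable <= 0.0f) return 0;
  const float scale = std::min(1.0f, usable / full_power);
  return static_cast<uint8_t>(scale * 255.0f);
}
```

For example, 100 LEDs at full white would need about 30 W, so a 20 W port would give a scale of roughly 160 out of 255, while a 100 W supply would allow full brightness.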

If power delivery has to be through a secondary board, that would be fine with me, but if it can be reasonably integrated into a Teensy without driving the cost through the roof it would be perfect.

The current 4.0 and 4.1 form factors are both ideal for various use cases, so I would keep both.
 
This is sort of off topic for the main thread, but in terms of USB-C PD, I just put in an order for my first USB-C PD gear (wall charger, battery, and a PD trigger to get access to the power).

I have this project to measure how much power things take. This was sparked by learning how to power my Olympus and Panasonic cameras using dummy batteries, and I wanted to figure out the maximum power the camera used at any one time. In addition, I like to do costume props with WS2812B LEDs (i.e., NeoPixels), and it is always handy to figure out how much power a display takes (though I'm on the low end of the power scale, since I generally don't use more than 100 or so LEDs). Normal amp/volt meters will tell you what power is being used at the current time, but they don't give you the max power.

I do have two meters that do a little better than just telling me the current value, and they can draw graphs on my cell phone via Bluetooth.

One of the meters has an option to save the values as a CSV file, which I can import to my computer. But that meter is limited to 5V/USB input (the cameras run on 2-cell Li-Po batteries, so I need 7.2V and up, with the cameras running fine on 9V). And even with a CSV file, I have to manually coordinate doing the thing I want to measure (for example, how much power it takes to record a 4K video) and then figure out where the spike for that action was.

I recently got another meter that is more flexible in that it can measure micro USB-B, USB-C, or 5.5mm x 2.1mm inputs/outputs. It also does a Bluetooth display on the phone, but the graph is not all that detailed. I can't make the graph larger or have lines that tell me the various mAmps and mWatts, so I have to guess. Also, there appears to be no way to save a file to the PC.

So I have a Teensy 4.1 in a feather shield using the Adafruit INA219 featherwing and an Adafruit OLED featherwing display that I've been working on. I display both the current volts, amps, and watts, as well as the maximum values. I have a button to clear the maximum values, so I can clear the state before doing the action I want to measure.
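The current-plus-maximum display with a clear button boils down to a small peak tracker. This sketch leaves out the sensor and the display; feed it volts and amps from whatever INA219 driver you use. `PowerReading` and `PeakTracker` are hypothetical names.

```cpp
#include <cassert>

// One sample from the power sensor.
struct PowerReading {
  float volts = 0, amps = 0;
  float watts() const { return volts * amps; }
};

// Keeps the latest reading plus the maximum of each quantity since reset().
class PeakTracker {
  PowerReading last_;
  float max_volts_ = 0, max_amps_ = 0, max_watts_ = 0;

 public:
  void add(const PowerReading &r) {
    last_ = r;
    if (r.volts > max_volts_) max_volts_ = r.volts;
    if (r.amps > max_amps_) max_amps_ = r.amps;
    if (r.watts() > max_watts_) max_watts_ = r.watts();  // peak of V*I per sample
  }
  // Wire this to the "clear" button before starting a measurement.
  void reset() { *this = PeakTracker(); }

  const PowerReading &last() const { return last_; }
  float max_volts() const { return max_volts_; }
  float max_amps() const { return max_amps_; }
  float max_watts() const { return max_watts_; }
};
```

Peak watts is tracked per sample rather than as max volts times max amps, since the voltage peak and the current peak usually don't happen at the same moment.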

I have an AC-to-DC power converter where I can dial in specific voltages, batteries that can produce 9V directly, and various USB micro-B boost cables that may or may not produce enough watts. Part of the reason for doing the measurement is that some of the USB boost circuits don't provide enough power, and I want to avoid them in the future once I know how much power is needed.

While I don't own them, two of the latest Olympus cameras (E-M1 Mark III and E-M1X) can be powered by USB-C PD, and I wanted to borrow them and eventually measure them as well. Since they don't use the same battery as my current cameras, I couldn't just use the dummy batteries I have.

I also upgraded my cell phone to one that uses USB C, and I wanted to upgrade to a USB battery that can do quick charges via USB-PD.

In my looking around today, I didn't see any programmatic methods of determining which of the USB-C PD voltages are available. There may be chips out there, but I didn't know the search terms. So I bought one of the cheap devices that have a LED and a button to display/set the voltage.

While I would likely buy such a device, I suspect it won't go into the Teensy proper. But it may be useful as a stand-alone device that feeds the Teensy and talks I2C/SPI/uart about the USB-C PD state.

<edit>
While I'm grousing, I should mention, none of the INA219 designs that I glanced at seem to have given much thought to the ways people might use them. In particular, it would be nice if the device had both power/ground for each of the input power from the power source and power/ground to the device that uses the power. The two grounds should be connected on the PCB, and connected to the ground of the I2C device. It would be nice if the device had standard connectors, such as 5.5mm x 2.1mm male/female power ports. And whatever connection should have thick enough wires to meet the actual voltage/amps that is specified.
 
There has to be a way to write normal threaded code in a single program, rather than having two distinct programs.

If the first core can upload code to the second core's flash, or to its RAM (and if the second core can run code from RAM rather than flash), it should theoretically be possible. It would be far better this way, more akin to writing threaded code in a language like Java. I am imagining the workflow of writing dual-core code as separate programs, and it seems like it would be really awkward. In Java, you just tell it "I want to run this function in another thread" and it takes off from there.

Why they did not design this chip with shared flash is beyond me. It seems like the most obvious solution.
 
Not much is known about the chip, except to Paul under NDA. Not even whether the two cores can run the same compiled code, with one an M4 and the other an M7? Or if they take two sets of compile/link>>Hex?

I didn't see a note that each would have unique flash - though sharing flash would complicate and slow code/data loading as it would be off chip controlled by one core? Notes above show the H7 has on chip flash?

It seems the fast M7 powers up the MCU and then can control if/when the M4 core starts. On the 1062 T_4.x's it does its own code load to RAM ...
 
Not much is known about the chip - except to Paul under NDA.

NXP's public block diagram reveals quite a lot about the chip, and at least so far the documentation under NDA leaves a lot to be desired (or guessed). Here's that block diagram.

[Image: i.MX RT1170 block diagram (i.MX-RT1170-BD.jpg)]


Not even if the two cores can run the same compiled code with one an M4 and the other an M7 ?
Or if they take two sets of compile/link>>Hex?

This sort of info comes from ARM's documentation. With all the chips we use now, you don't get details from NXP about which instructions their chips execute. You get that from ARM.

The short answer is floating point is different between M7 & M4. M7 does 64- & 32-bit, whereas M4 has only 32-bit float. The FPU registers are different too. If you want to run the same binary on both, you need to be careful not to use floats or doubles. Integer math, DSP extensions and everything else about the instruction set is identical between M4 & M7. Obviously M7 uses fewer clock cycles, especially for predicted branches, but the actual non-FPU instruction opcodes are identical.
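A common way to trip over the single-precision limit is an unsuffixed literal: `0.5` is a `double`, so `x * 0.5` promotes a `float` expression to `double`, which an FPU without double support has to emulate in software. The sketch below shows the difference with compile-time type checks; GCC's `-Wdouble-promotion` warning can flag these cases. The function names are illustrative.

```cpp
#include <cassert>
#include <type_traits>

// Same result, different cost on a single-precision-only FPU.
inline float half_slow(float x) { return x * 0.5; }   // computed as double
inline float half_fast(float x) { return x * 0.5f; }  // stays float

static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "float times an unsuffixed literal is double");
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "float times an f-suffixed literal stays float");
```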

As you can see in NXP's block diagram, each core has its own TCM. So far it's not clear to me whether either core can access the other's TCM at all (I suspect it's possible, but details are still unclear). But even if they can, for good performance you generally want each core using its own TCM as much as possible.

The practical reality of most code is memory allocated as static/global variables, local variables on a stack, and sometimes dynamic memory on the heap. Many functions take C pointers or C++ references to be able to manipulate data regardless of which of these 3 places it's actually located.

So even if you arrange to have local copies of functions in the ITCM of each core, for the sake of performance you generally will want each core only accessing its own DTCM. Maybe you could find a way to arrange that, but I'm having a hard time imagining how general purpose code shared between the 2 cores would keep itself to only the ITCM & DTCM on each.

Only 768K of the RAM is TCM. The other 1280K is ordinary memory on the AXI bus, just like we currently have 512K of slower RAM in Teensy 4.0 & 4.1.
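Those numbers line up with the fact sheet figures quoted earlier in the thread (sizes in KB, treating "Up to 2 MB of SRAM" as 2048 KB):

```cpp
#include <cassert>

// RAM budget from NXP's RT1170 fact sheet figures, in KB.
constexpr unsigned kTotalSramKB = 2048;
constexpr unsigned kM7TcmKB = 512;
constexpr unsigned kM4TcmKB = 256;
constexpr unsigned kTcmKB = kM7TcmKB + kM4TcmKB;  // tightly coupled memory
constexpr unsigned kAxiKB = kTotalSramKB - kTcmKB;  // ordinary RAM on the AXI bus

static_assert(kTcmKB == 768, "TCM total");
static_assert(kAxiKB == 1280, "ordinary (AXI) RAM");
```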


I didn't see a note that each would have unique flash - though sharing flash would complicate and slow code/data loading as it would be off chip controlled by one core? Notes above show the H7 has on chip flash?

Flash will be very similar to what we have now. Both cores will access the same flash via the slower AXI bus. Both have their own L1 instruction caches, which allow most types of code running from flash to run almost as fast as from ITCM. Of course it's much slower for cache misses, but the norm isn't even I-cache hits; it's running from ITCM.


There has to be a way to write normal threaded code in a single program, rather than having two distinct programs.
....
It would be far better this way, more akin to writing threaded code in a language like Java.

This hardware is quite different from Java's virtual machine, or even PC hardware where the general model is Symmetric Multiprocessing (SMP). It does look a lot more like SMP if you never use the FPU, ITCM & DTCM, but those are the most compelling hardware features for code to run fast.

Even then, which way is "better" involves a lot of subjective opinion. Even if the non-symmetric TCM & DSP issues are worked out, for someone who focuses on publishing libraries for novices, the race conditions and deadlocks common with fully multithreaded programming are the stuff of nightmares.

But I'm not completely closing the door. Maybe we'll make 3 board defs: program just the M7 (with the ability to communicate with the M4 code), just the M4 (which can communicate with the M7), or both (also able to communicate with itself using the same mailbox/pipe/semaphore APIs). Exactly how all this will really work still remains to be figured out. Even though I do have some info from NXP which is under NDA, the reality is the early documentation is riddled with errors & omissions. It's mostly copypasta from their other chips, which raises a lot more questions than it answers. Really, that block diagram is looking pretty good.

At least initially, we're probably going to follow Arduino's program-each-core-separately lead. Yes, it closes off (or makes much harder) all sorts of powerful multithreaded programming techniques. But it also tends to eliminate (or makes you work much harder for) tough race conditions.
 