Since NXP has not yet published a 1170 reference manual (at least not without NDA), I'm intentionally waiting to comment on many of the technical points. I have only quickly looked at ST's documentation for the H7 chip, but I'm assuming it's similar to what NXP will do.
Yup, you'll end up using ldrex/strex around every access.
From ARM's documentation, you can see these instructions only apply to normal memory. They do not work with "strongly ordered memory", so they're not effective for use with peripherals.
Whether NXP actually implements this correctly between the 2 cores remains to be seen. I recall seeing info (errata) on other non-NXP chips that basically said these instructions did not work properly between the CPU cores. I really do hope they get this right, but I would not be surprised if each CPU core ends up having its own separate lock mechanism, making this worthless for its intended purpose between the cores.
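To make the ldrex/strex idea concrete, here's a minimal sketch using C++ `std::atomic`. On ARMv7-M, GCC and Clang normally compile these read-modify-write operations to an LDREX/STREX retry loop, so you rarely need to write the instructions by hand. The names `atomic_add`, `try_lock` and `unlock` are made up for this illustration. It assumes the variable lives in normal memory (not a peripheral register), and it says nothing about whether the exclusive monitor will actually work between the two cores on the 1170.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomically add `delta` to `v` and return the previous value.
// On ARMv7-M the compare-exchange loop typically becomes
// LDREX / STREX, retrying whenever the STREX reports failure.
uint32_t atomic_add(std::atomic<uint32_t>& v, uint32_t delta) {
    uint32_t old = v.load(std::memory_order_relaxed);
    while (!v.compare_exchange_weak(old, old + delta,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
        // On failure `old` has been reloaded with the current value; retry.
    }
    return old;
}

// Non-blocking lock attempt: 0 = free, 1 = held.
// Returns true only if this caller changed the flag from 0 to 1.
bool try_lock(std::atomic<uint32_t>& lock) {
    uint32_t expected = 0;
    return lock.compare_exchange_strong(expected, 1,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed);
}

// Release a lock previously taken with try_lock().
void unlock(std::atomic<uint32_t>& lock) {
    assert(lock.load(std::memory_order_relaxed) == 1);  // must be held
    lock.store(0, std::memory_order_release);
}
```

Typical use would be `if (try_lock(flag)) { /* touch shared data */ unlock(flag); }` on each side. If the two cores turn out to have separate exclusive monitors, this only protects against interrupts and threads on the same core.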
Or, more simply, restrict this and use only one core for it.
Seems unlikely for both cores to share a peripheral. But I do believe having one core use some peripherals and the other core use the rest should work, so you could (probably) have the M7 use Serial1, Serial4, Serial5 and have the M4 use Serial2, Serial3, Serial6. Of course, that depends on both configuring the clock tree the same way, but that issue exists today on Teensy 4.0 (eg, all the serial ports run from "uart_clk_root") with only a single CPU core.
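One way to keep that split honest is a single ownership table compiled into both cores' firmware. This is only a sketch of the idea. `Core`, `kSerialOwner` and `owns_serial` are hypothetical names, not part of any Teensy or NXP API, and the table just mirrors the example assignment above.

```cpp
#include <cassert>
#include <cstdint>

enum class Core : uint8_t { M7, M4 };

// Hypothetical ownership table: index 0 is Serial1, index 5 is Serial6.
// M7 gets Serial1, Serial4, Serial5; M4 gets Serial2, Serial3, Serial6.
constexpr Core kSerialOwner[6] = {
    Core::M7, Core::M4, Core::M4, Core::M7, Core::M7, Core::M4,
};

// True if `core` is the assigned owner of Serial<port> (port is 1..6).
// Out-of-range port numbers are owned by nobody.
constexpr bool owns_serial(Core core, int port) {
    return port >= 1 && port <= 6 && kSerialOwner[port - 1] == core;
}

// Compile-time sanity checks on the table.
static_assert(owns_serial(Core::M7, 1), "Serial1 should belong to the M7");
static_assert(!owns_serial(Core::M4, 1), "Serial1 must not belong to the M4");
```

Each core's startup code could then check `owns_serial(...)` before initializing a port. This does nothing about the shared clock tree, which both sides would still need to configure identically.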
Edit: Maybe the MPU can be useful?
The MPU is not shared. Each CPU core has its own MPU. Here's the block diagram from NXP's public "fact sheet".
Whether the MPU in each core is the same is also a good question. There's a good chance M4's MPU may support fewer regions than M7's.
In NXP's iMX multi-core application processors (which have public documentation), they have included special peripherals meant for synchronizing and messaging between the CPU cores. What exactly we will get with 1170, I cannot say, but I don't believe it takes too much imagination or reveals any secrets to suggest NXP will very likely make this new chip by reusing the peripheral IP they've put in their other iMX products. If you *really* want to think about these low level details, looking at the documentation for those chips might give you some ideas.
We will see what Paul chooses.
So far, I really haven't done much on this. I've been focusing on the 1062 chip for Teensy 4.1. Configuring the ethernet clock has been a 2-week nightmare. I only got it working a couple of days ago. Then the ping packet test program that worked on Teensy 3.6 sprang to life. Getting Teensy 4.1 into production (or fully ready before all the parts arrive) is my top priority right now. I'm planning to look at 1170 only after Teensy 4.1 is done.
I am going to collaborate with Arduino on this. We've already exchanged a few emails. I do believe everyone benefits if we create compatible APIs.
Anyway Portenta is openly declared a professional product (have you visited arduino.cc/pro?) whose cost is high also due to the industrial temperature and the expected longevity of the components.
I do believe there's a strong need for this. It's definitely not where Teensy is focused. Teensy is very much about the commercial temperature range, keeping costs low (no big SDRAM chip), and serving makers.