Gap between the world of microcontrollers and the world of SoCs


Markk

Hi

I'm a newcomer to "embedded" and am evaluating different microcontrollers; so far the Teensy 3.x is at the top of my list.

But actually I'm not really happy.

I might have missed something, but so far there seems to be this inherent "gap" between the world of microcontrollers and the world of SoCs like Raspberry Pi etc. One world stops at ~100kBytes, ~100MHz, the other starts at ~100MB, ~500MHz (typical). And along with the spec, there is the corresponding gap in power consumption, complexity, price, etc.

Why?

Strangely, both worlds now employ 32-bit architectures and even the CPU cores seem to be quite similar. So I can't see an inherent technical reason for having this gap.

Especially the RAM gap is almost unbelievable for a newcomer. Why on earth does a microcontroller like the Teensy's have just 64k of RAM? Not 64M (which would be about OK) but 64k! A full factor of 1000(!) below the smallest of the SoC world.

Why!?

I can see that there are many applications where you don't need much RAM, and I can see how RAM adds many transistors to the chip, bloats the die footprint, increases the rate of defective chips, etc. I can see that there is a market for 1-cent microcontrollers where 10 cents is already too much.

But do those "cheap" applications really need a Cortex-M4 processor? What on earth can you process at 96 MHz and 32 bits when you only have 64k of RAM? This seems to be a grotesque mismatch when you look at the evolution of CPUs. It's like putting Usain Bolt in a phone booth! :)

But isn't there a (big) market for more advanced applications too?

My project should be powered by a small solar panel, so it needs to be efficient. The project involves creating and maintaining histograms. When I store one histogram in full 32-bit, half the RAM is gone! I would like to maintain many of those histograms for different time spans, calculate differences (trends), etc. I may be able to reduce to 16-bit, but then I would have to constantly watch over my shoulder for possible overflows, do scaling and other unnecessary ugly stuff. The same goes for the time-span aggregates. I might wrestle with swapping them in and out of the SD card, but that adds so much complexity, so many opportunities to introduce bugs, and so many reasons for the code to become unmaintainable and inflexible, that what was actually a simple job in the first place becomes a nightmare!
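To put numbers on it, here is roughly what the 16-bit workaround would look like (the ~8k bin count is just what my application needs, more on that below) - exactly the kind of babysitting I'd rather avoid:

// Rough illustration only: 8192 bins of uint32_t = 32 KB, i.e. half of the
// Teensy 3.1's 64 KB of SRAM for a single histogram. Dropping to uint16_t
// halves that, but then every increment has to guard against overflow.
#include <stdint.h>
#include <stddef.h>

const size_t NUM_BINS = 8192;      // illustrative bin count
uint16_t histogram[NUM_BINS];      // 16 KB instead of 32 KB

void addCount(size_t bin) {
  if (histogram[bin] == 0xFFFF) {  // about to overflow?
    for (size_t i = 0; i < NUM_BINS; i++)
      histogram[i] >>= 1;          // rescale the whole histogram -- ugly
  }
  histogram[bin]++;
}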

Or have I missed something? Is there a system (similar to Teensy 3.1 but with 100 times more RAM) that I have overlooked?

All insights (including simple confirmation that there is no solution) are welcome.

-Mark
 
I'd think about it differently. That is, how close to the bare metal of the processor do you want to come? I remember some absolutely brilliant work being wrestled out of Commodore 64 processors, stuff that wasn't supposed to be possible, thanks to folk who were really good at assembly language. These days, few if any programmers solely program in assembly, and for a reason - high-level interpreters and compilers have made it much faster to get working code using a language like C vs. dropping down to bare metal. Your product may not end up being as fast, but you get it done much quicker.

I happen to think that a Teensy 3.1 is amazing in terms of performance, price, and so on. I come from the Arduino, whose RAM, Flash, etc. capabilities are puny by comparison. However, and very importantly, Paul's ceaseless work here has made those new features usable - from making String classes that don't regularly blow up, to creating a more user-friendly interface for the ADC, RTC, etc. Hardware capabilities are nice, but your users have to be able to use them!

I happen to think that the Teensy 3.1 has plenty of RAM. But nothing stops you from adding more - a user here has done so, using SPI-based RAM chips, if memory serves. I am using this chip for power measurements, GPRS communications, etc. and it serves my purposes fine. Could I save some money and shoehorn the code into an 8-bit AVR? Perhaps, but I'm not a code jockey. Instead, I spend a few bucks more and apply a processor that runs circles around my problem, and it takes me 1/10th the time to program.
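I don't recall exactly which RAM chip that user went with, so take this only as an illustration - a 23LC1024-style SPI SRAM, with its 0x02/0x03 write/read opcodes, is my assumption here:

#include <SPI.h>

const int SRAM_CS = 10;                // chip-select pin, pick whatever is free

void sramBegin() {
  pinMode(SRAM_CS, OUTPUT);
  digitalWrite(SRAM_CS, HIGH);
  SPI.begin();
}

void sramWrite(uint32_t addr, uint8_t value) {
  digitalWrite(SRAM_CS, LOW);
  SPI.transfer(0x02);                  // WRITE opcode
  SPI.transfer((addr >> 16) & 0xFF);   // 24-bit address, MSB first
  SPI.transfer((addr >> 8) & 0xFF);
  SPI.transfer(addr & 0xFF);
  SPI.transfer(value);
  digitalWrite(SRAM_CS, HIGH);
}

uint8_t sramRead(uint32_t addr) {
  digitalWrite(SRAM_CS, LOW);
  SPI.transfer(0x03);                  // READ opcode
  SPI.transfer((addr >> 16) & 0xFF);
  SPI.transfer((addr >> 8) & 0xFF);
  SPI.transfer(addr & 0xFF);
  uint8_t v = SPI.transfer(0x00);
  digitalWrite(SRAM_CS, HIGH);
  return v;
}

Not memory-mapped, of course - every byte has to be fetched explicitly - but for shuffling histogram bins around it may be good enough.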

I don't quite understand your aversion to SD cards either - I have been using them without issues and Paul even sells an SD card backpack, IIRC. Seems no more complex than storing the histograms in RAM, with the exception that SD cards do not require power being applied to them to keep their current state. Something to think about when running off a battery.
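To illustrate (the file name, bin count and chip-select pin are placeholders, not a recommendation for your design), dumping a histogram snapshot to the card with the stock SD library is only a few lines:

#include <SD.h>

const int SD_CS = 4;               // chip-select for the SD card / backpack
uint32_t histogram[512];           // whatever your working set happens to be

void logHistogram() {
  File f = SD.open("hist.bin", FILE_WRITE);
  if (f) {
    f.write((const uint8_t *)histogram, sizeof(histogram));  // append one snapshot
    f.close();
  }
}

void setup() {
  SD.begin(SD_CS);
}

void loop() {}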
 
The very short version of what is potentially a very long and complex answer to your question is this: you're thinking of microcontrollers as smaller general-purpose devices similar to SoCs, but they are in fact entirely different things; therefore, there is no "gap" to speak of, as they're on two different scales.

Raspberry Pis, BeagleBones, etc. are simply scaled-down general-purpose computers running general-purpose operating systems, while microcontrollers are purpose-built, dedicated systems - each runs a specific firmware and is devoted to a specific task or set of tasks. Need something with real-time control over a piece of hardware with microsecond (or better) precision? You need a microcontroller. Want to run an ultra-low-power web server and do minimal data collection for home automation? You want an SoC.

Microcontrollers are micro-controllers, not meant for heavy processing of data. Much of the time, their actions are entirely procedural... one might need to perform more of a particular operation per second (justifying the higher clock speeds you question) or perform those operations with greater precision or in larger chunks (justifying the 32-bitness that you questioned) without any additional need to accumulate and process megabytes of data.

Now you're probably saying "yes, yes... I already know the difference. duh!" but it does seem that you've fallen into the trap of trying to build a data-centric piece of software to run on a hardware-centric device (like a Teensy). A microcontroller was never intended to do what you're trying to do all by itself: it will need outside help. You can either try to build that yourself by having it write out to an SD card, SPI-accessible RAM, some external entity via USB or serial, etc., with very direct/explicit control over how all of that will work (nearer the metal, as Constantin puts it), or you can make life a whole lot easier by building software to run on a tiny little Linux box (Pi, BB, etc.) and get it done much more quickly.
 
Teensy 3.1 has 64KB of RAM.
I'm accustomed to histograms being an array of counter values, usually int or short, depending on need.
And histograms I've used are just a few dozen or a few hundred counters (bins). That's a couple of thousand bytes of RAM at most.
Maybe you are doing a zillion independent histograms concurrently?
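For instance (bin count purely illustrative):

#include <stdint.h>

// 512 bins * 4 bytes (uint32_t) = 2048 bytes, roughly 3% of the 64 KB SRAM;
// stored as 16-bit shorts it's only 1 KB.
uint32_t counts[512];   // one histogram, 2 KB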
 
Thank you for your answers.

I'd think about it differently. That is, how close to the bare metal of the processor do you want to come? I remember some absolutely brilliant work being wrestled out of Commodore 64 processors, stuff that wasn't supposed to be possible, thanks to folk who were really good at assembly language.

I won't argue with that. Great things can be built using less than adequate tools. Look at the pyramids! I'm just too lazy for that kind of slavery.

I happen to think that a Teensy 3.1 is amazing in terms of performance, price, and so on. I come from the Arduino, whose RAM, Flash, etc. capabilities are puny by comparison. However, and very importantly, Paul's ceaseless work here has made those new features usable - from making String classes that don't regularly blow up, to creating a more user-friendly interface for the ADC, RTC, etc. Hardware capabilities are nice, but your users have to be able to use them!

I apologize sincerely if my post can be misread as a critique of the Teensy platform or Paul's work in any way! Far from it. My questions targeted the microcontroller industry, not the platform which (like I said) I consider the very best of its kind.

If I read Paul's post about playing music samples correctly, he has the exact same problem: very good ideas, ample CPU power, good hardware, not enough RAM!

I happen to think that the Teensy 3.1 has plenty of RAM. But nothing stops you from adding more - a user here has done so, using SPI-based RAM chips, if memory serves.

Yes, but only memory attached to the FlexBus would actually permit memory mapped access, if I'm right. But that seems too complex for my skills and I don't know if the Teensy platform would allow for that. Otherwise I'd use the SD card, but the swapping is cumbersome, of course.

...Could I save some money and shoehorn the code into an 8-bit AVR? Perhaps, but I'm not a code jockey. Instead, I spend a few bucks more and apply a processor that runs circles around my problem, and it takes me 1/10th the time to program.

Exactly my thoughts! Except that my application's problem happens to be somewhat bigger than yours. And there I am: "mind the gap".

I don't quite understand your aversion to SD cards either - I have been using them without issues and Paul even sells a SD card backpack, IIRC.

No aversion. Extensive logging is a requirement of my application too. The thought-through architecture with the backpack is one of the many reasons I'm favoring the Teensy 3.x platform.

Seems no more complex than storing the histograms in RAM, with the exception that SD cards do not require power being applied to them to keep their current state. Something to think about when running off a battery.

Much more complex for the type of access I need. The extra CPU and SD card controller power I'll need to swap this data in and out constantly will be much more than what some megabytes of SRAM would need.
 
The very short version of what is potentially a very long and complex answer to your question is this: you're thinking of microcontrollers as smaller general-purpose devices similar to SoCs, but they are in fact entirely different things; therefore, there is no "gap" to speak of, as they're on two different scales...

Thank you for the answers.

It seems you are defending the way it is, not giving reasons why it supposedly has to be so.

I'm aware of the differences and qualities of a microcontroller vs. a SoC and I'm actually posting here, because I like these qualities. Having said that, there are clear signs of the distinction becoming blurred. Look at the CPU cores.

Even when drawing a strict line between the two worlds, I see no reason at all why a powerful microcontroller like the Teensy's should not have an adequate supply of RAM (and on the chip). Where is the logic in embracing more CPU power, more bits, faster opcodes, more ADC/DAC resolution, much faster ADC speed, more I/Os, etc., but not the RAM size to match these capabilities?

There is a certain "natural ratio" from CPU power to RAM that one can observe in the history of computing. I believe it's off here.
 
Teensy 3.1 has 64KB of RAM.
I'm accustomed to histograms being an array of counter values, usually int or short, depending on need.
And histograms I've used are just a few dozen or a few hundred counters (bins). That's a couple of thousand bytes of RAM at most.
Maybe you are doing a zillion independent histograms concurrently?

The project is about Gamma-ray spectrometry.

I'd like to record a ~8k-bin raw, i.e. unadjusted, histogram and also log it at intervals. Pulses come in at max. 10k/s, but amassed at certain bin peaks. There are many calculation steps that transform the (high-resolution) raw spectrum into its compensated and calibrated (lower-resolution) form. And while processing, I always want to keep counting and logging in the background, so I need another copy (or two) of the histogram. While some more elaborate stuff will only be done on a remote server, I need to preprocess and aggregate to some extent in the controller itself (I can't send all the data due to bandwidth constraints).
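In code it comes down to something like this (the snapshot scheme is just how I picture it at the moment), and it already shows where the 64k goes:

#include <stdint.h>
#include <string.h>

const size_t NUM_BINS = 8192;          // ~8k raw bins

volatile uint32_t live[NUM_BINS];      // 32 KB, updated by the pulse interrupt
uint32_t snapshot[NUM_BINS];           // another 32 KB -- and the 64 KB is gone

// Called for each detector pulse, up to ~10k/s, clustered around the peaks.
void onPulse(uint16_t bin) {
  live[bin]++;
}

// Freeze a copy so compensation/calibration can work on stable data
// while counting and logging continue in the background.
void takeSnapshot() {
  noInterrupts();
  memcpy(snapshot, (const void *)live, sizeof(snapshot));
  interrupts();
}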

With the RAM constraints being as they are, I'm also thinking about combining a microcontroller with an SoC, the latter being periodically powered up by the controller for processing and comm. But of course this would blow up complexity, size, price, etc.
 
My take on what you're asking is why isn't there one single chip/system that does everything? My guess is economics. There's not enough of a market for a SoC that has lots of GPIO pins, lots of Flash, lots of memory, etc. to make it worthwhile to build in quantities large enough to be financially viable.

The typical implementation for something like what you want is multiple controllers doing different things, e.g. one CPU with gobs of RAM to handle the logging etc. interfacing with one or more MCUs to handle the IO pins etc.

If you consider a desktop computer, you've got the main CPU with all its RAM etc., but it doesn't handle the lower-level details of writing bits to the hard disk or watching your mouse for movement; it has secondary systems for that, and they exchange data over the PCI / USB / HyperTransport etc. buses.

Even in the space-constrained world of a smartphone the main chip doesn't handle everything; the camera, WiFi, Bluetooth, cellular, etc. all have their own controllers which all have their own firmware.
 
My take on what you're asking is why isn't there one single chip/system that does everything? My guess is economics. There's not enough of a market for a SoC that has lots of GPIO pins, lots of Flash, lots of memory, etc. to make it worthwhile to build in quantities large enough to be financially viable.

There isn't an obvious choice right now, but I would imagine it is probably already on drawing boards. Certainly Intel is starting to think in these terms with Galileo and Edison. I personally think the Galileo is rather bloated, expensive, and misses the mark on GPIO pins. I haven't looked at the specs of Edison yet, but it might have a better chance with the smaller form factor. The Arduino team is weighing in with combined systems (Yun/Tre) that also probably aren't there yet either, due to having to program essentially 2 different platforms. DiviX seems to favor the kitchen sink approach. Raspberry Pi probably shows that the market exists (and is likely now a bigger market than Arduino, but I'm just guessing here).
 
My take on what you're asking is why isn't there one single chip/system that does everything? My guess is economics.

Yes, I can see that.

The typical implementation for something like what you want is multiple controllers doing different things, e.g. one CPU with gobs of RAM to handle the logging etc. interfacing with one or more MCUs to handle the IO pins etc.

OK, I understand. So my conclusion to use both, an SoC in combination with a Teensy, is actually not that "stupid"?

I feel kind of like Buridan's ass. :eek:

Thanks
-Mark
 
There isn't an obvious choice right now, but I would imagine it is probably already on drawing boards. ... Raspberry Pi probably shows that the market exists (and is likely now a bigger market than Arduino, but I'm just guessing here).

Thanks. Things are certainly set in motion, I think.
 
there seems to be this inherent "gap" between the world of microcontrollers and the world of SoCs like Raspberry Pi etc. One world stops at ~100kBytes, ~100MHz, the other starts at ~100MB, ~500MHz (typical). And along with the spec, there is the corresponding gap in power consumption, complexity, price, etc.

Why?

The reason "why" has to do with trade-offs in the actual silicon fabrication processes.

On Teensy, the processor, RAM and non-volatile storage are all on the same piece of silicon inside a single chip. The silicon can't be overly optimized for any one thing; instead it strikes a balance that achieves overall performance for all 3.

On Raspberry Pi, the processor, RAM and non-volatile storage are separate chips. Each piece of silicon can be highly optimized for its specific task. The Broadcom BCM2835 processor on the RPi actually has a memory chip stacked on top of it, so the 512M RAM is separate silicon optimized for memory density (and would probably only be able to implement a 1980-era processor). Some other "single chip" SoC boards actually have 2 pieces of silicon mounted inside 1 plastic package. On a RPi, the non-volatile storage in the SD card is probably also 2 pieces of silicon, one optimized for high density flash and a small controller chip.

The key point, the reason why, involves dramatically optimizing the silicon fabrication for a particular purpose, at the expense of other applications. There are people who are truly experts in this silicon fab stuff. I'm not one of them. This is only my general knowledge. Someone who really does this stuff could speak much better about the specific silicon trade-offs (I expect much of this stuff is closely guarded trade secrets of some of the world's most powerful companies). But here are some very general ideas....

Flash memory's dual gate requirement has traditionally been the big speed-limiting issue in silicon fabrication. Normal CMOS fabrication requires only 1 thin oxide layer to separate the gates from the chip's substrate, and it only needs to insulate well enough for the transistor to work at relatively slow clock speeds. In flash memory, 2 thin oxide layers are needed. The non-volatile storage is achieved by trapping electrons on a floating transistor gate between the oxide layers. Each layer needs to insulate extremely well, since those electrons are supposed to remain trapped there for over 100 years at room temperature.

Silicon fabrication is done in layers, usually by growing an oxide layer (pure glass) on top of the wafer plus everything done in the previous steps. Then a photosensitive mask chemical is added and exposed to light through the masks that define where the circuit features will be. The wafer is then exposed to a strong acid that etches away the oxide (glass) where the mask allowed light. Then the wafer is coated with other stuff and baked at very high temperature, causing that stuff to become part of the chip (e.g., "stuff" can be materials that implant into the silicon itself, or grow more layers on top of remaining oxide that may itself be on top of other layers). Then more acid or other chemicals remove the excess stuff, the resist and unneeded oxide. This is repeated many times, building up the many features and layers inside the chip, starting with the N+ and P+ implants that form the source and drain of transistors, then the "polysilicon" layers that form the transistor gates, and finally metal layers for signal routing.

One pretty incredible challenge in this fabrication process is not destroying the work from all the previous layers. My understanding is the general approach involves using progressively lower temperatures for each step. The upper layers tend to have lower resolution and other limitations. The entire process is really a pretty marvelous achievement of modern technology. But it's far from magic. There are a LOT of difficult tradeoffs.

Those tradeoffs can be made in many different ways, which can optimize the process for a particular application, but too much optimization for one thing can cause that silicon fab to be nearly useless for others.

Flash memory's requirement for floating gates apparently imposes a lot of limitations on all the other layers that can be fabricated inside the chip. Again, there are people who really know the details, but I sadly only have general knowledge in this area. I've been told DRAM processes involve multiple layers of polysilicon, which has high resistivity (slow circuit speeds) but can be made with incredibly fine resolution and is made at much higher temperature. For designing logic circuits that run at high speed, you generally want to connect the polysilicon gates to low impedance metal routing as closely as possible, since the gates are capacitive. There are a LOT of trade-offs in these silicon processes.

So that's why. On Teensy and all flash-based microcontrollers, the silicon is optimized for balance to achieve performance. You get logic circuitry, volatile and non-volatile memory all on the same piece of silicon, but it can't be too heavily optimized for any one of those things without sacrificing performance on the others. The on-chip flash memory imposes a lot of constraints on the silicon design. You also tend to get fairly low power, because everything is on the same chip, and also because the optimizations for flash memory tend to be similar to the optimizations for low power.

Even though Raspberry Pi might be called a SoC (System on Chip), in reality it's separate chips for the CPU+GPU and the DRAM, where the Broadcom chip is highly optimized for fast logic circuitry and the DRAM chip on top is optimized for dense volatile memory. Inside the SD card, there's a high density flash chip (or stack of such chips in the larger cards) fabricated on a silicon process that's optimized for flash only. In fact, NAND flash is so optimized for density that a small percentage of the sectors are bad and more defects develop over time, so SD cards have a small controller (fabricated on different silicon) that manages the media defects and performs wear leveling. Extremely optimized NAND flash might not even retain data for 10+ years, since the controller makes heavy use of error detection and correction algorithms and automatically reassigns data to new areas of the media as defects develop. They don't tell you such things when you buy a 32GB card at a retail store, but internally that capacity is made possible by these types of extreme optimizations.

This turned out really long, but hopefully that gives you a better idea of why the market is filled with 2 different classes of products.
 
There isn't an obvious choice right now, but I would imagine it is probably already on drawing boards. Certainly Intel is starting to think in these terms with Galileo and Edison. I personally think the Galileo is rather bloated, expensive, and misses the mark on GPIO pins. I haven't looked at the specs of Edison yet, but it might have a better chance with the smaller form factor. The Arduino team is weighing in with combined systems (Yun/Tre) that also probably aren't there yet either, due to having to program essentially 2 different platforms. DiviX seems to favor the kitchen sink approach. Raspberry Pi probably shows that the market exists (and is likely now a bigger market than Arduino, but I'm just guessing here).
The CPUs (Qualcomm et al.) that go into smartphones have a lot of I/O pins and megabytes of flash/RAM. But they aren't what you want for a near-bare-metal microprocessor, if for no other reason than the details are kept proprietary/licensed, and companies like Qualcomm have a huge marching army of patent attorneys.
 
To answer one of the OP's questions: no, it is not stupid at all to combine a microcontroller and a more powerful computing device. In fact, in the embedded world it is not uncommon at all for the microcontroller to provide the glue logic between more powerful but highly specialized processing devices.
 
The reason "why" has to do with trade-offs in the actual silicon fabrication processes.

...
So that's why. On Teensy and all flash-based microcontrollers, the silicon is optimized for balance to achieve performance. You get logic circuitry, volatile and non-volatile memory all on the same piece of silicon, but it can't be too heavily optimized for any one of those things without sacrificing performance on the others.

... hopefully that gives you a better idea of why the market is filled with 2 different classes of products.

Thank you very much, this really explains a lot.

So separate chips would be the solution, which makes me wonder, @Paul, have you ever looked at the FlexBus memory mapped external RAM option and (in the past or for the future) considered it for the Teensy :p?

I'm still quite sure it would meet a big demand in the market. I.e. wouldn't it resolve some limitations with your audio projects and also allow for dynamically calculated effects such as "echo" etc. :cool: ?

Thanks again for your enlightening reply.

-Mark
 
have you ever looked at the FlexBus memory mapped external RAM option and (in the past or for the future) considered it for the Teensy :p?

Yes indeed, I've looked at FlexBus and *many* other things, and not just on Freescale chips.

I'm still quite sure it would meet a big demand in the market. I.e. wouldn't it resolve some limitations with your audio projects and also allow for dynamically calculated effects such as "echo" etc. :cool: ?

Well, at least on these chips, the interface is only for static RAM, which doesn't give you the large memory sizes of dynamic RAM. It also consumes a LOT of the available pins, which is a pretty bad trade-off for most projects. It also increases the cost, which seems to be a direction the Arduino guys and others are going on many new boards. I'm not eager to follow that trend.

I believe it's still very premature to know what the real limitations are on the audio library. A pretty simple hack to bring Bill's latest SdFat SPI optimizations back into the older SD library cut the CPU usage from 40% to 12%. For mono 16 bit 44.1 kHz, that's 12% on every other update, so just a little trickery in scheduling the reads could allow two mono streams to read with 12% usage, or 4 mono streams at 22.05 kHz...

But yes, delays need a lot of memory. Teensy 3.1 can only manage about half a second of delay. That's pretty useful for a lot of projects, but obviously not huge. At least for now, my main focus is working within these limits to enable as much as possible, while keeping the small size, low power and affordable price.
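For reference, the arithmetic behind that half second (assuming mono 16-bit at 44.1 kHz, as above):

#include <stdint.h>

// 44100 samples/s * 2 bytes/sample * 0.5 s = 44100 bytes, roughly 43 KB of the
// 64 KB SRAM -- which is why about half a second is the ceiling once the rest
// of the audio library has its share.
const int DELAY_SAMPLES = 44100 / 2;   // 0.5 seconds of mono audio
int16_t delayLine[DELAY_SAMPLES];      // ~43 KB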
 