Options for 'bare-metal' development

One related question: why is the vector table always remapped to RAM?

I understand this is needed when you have an OS that wants to change the handler for an interrupt at run time.
But for typical development, having the vectors in ROM is just fine, I think.

Or is there something I am missing? Does attachInterrupt() need this?

Not remapping the vectors simplifies things, so I'd rather not do it if it's not needed.
 
Why is the vector table always remapped to RAM?

Allocation of DMA channels between different libraries was the main motivation.

A secondary concern was low-jitter interrupt response. If the interrupt function is in RAM (defined with FASTRUN), this allows the entire interrupt response to access only single-cycle RAM, without any fetches from the variable timing of slow flash which may or may not be in the tiny flash controller cache (or much larger cache of Teensy 3.6).
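
For anyone who hasn't used it, a minimal sketch of the FASTRUN idea (IntervalTimer and FASTRUN are part of the Teensyduino core; the counter and the 10 us period are just illustrative placeholders):

Code:
// Minimal FASTRUN illustration for Teensy 3.x: the handler is placed in the
// .fastrun section and copied to RAM at startup, so the whole interrupt path
// runs from single-cycle memory instead of flash.
volatile uint32_t count = 0;

FASTRUN void tick_isr() {          // FASTRUN puts this function in RAM
  count++;
}

IntervalTimer tickTimer;

void setup() {
  tickTimer.begin(tick_isr, 10);   // call tick_isr every 10 microseconds
}

void loop() {
}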

Or is there something I am missing?

Yes. Try to figure out which interrupts OctoWS2811 and Audio (several optional input & output objects) and SmartMatrix and DMASPI and others will use. Do this for all Teensy 3.x models, which have 4, 16 or 32 DMA channels. Fixed IRQ allocation works well when you design a fully monolithic application, where you explicitly allocate all hardware resources and edit every piece of low-level code. It falls apart quickly when a large community develops many powerful DMA-based libraries, which users expect to be able to use together in their projects, without editing the low level code.
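
As an aside, the run-time allocation described here is essentially what the DMAChannel class in the Teensyduino core provides. A rough sketch (the buffers, the continuous trigger and the interrupt-on-completion choice are placeholders, not from any particular library):

Code:
// Rough sketch of dynamic DMA channel allocation on Teensy 3.x. Each
// DMAChannel object claims the next free channel at run time, so libraries
// don't collide on fixed channel numbers, and its interrupt vector follows
// whichever channel was assigned.
#include <DMAChannel.h>

DMAChannel dma;                       // constructor grabs a free channel
uint8_t src[256];
volatile uint8_t dst[256];

void dma_done_isr() {
  dma.clearInterrupt();               // acknowledge this channel's IRQ
  // ...handle the completed transfer...
}

void setup() {
  dma.sourceBuffer(src, sizeof(src));
  dma.destinationBuffer(dst, sizeof(dst));
  dma.triggerContinuously();          // or triggerAtHardwareEvent(DMAMUX_SOURCE_xxx)
  dma.interruptAtCompletion();
  dma.attachInterrupt(dma_done_isr);
  dma.enable();
}

void loop() {
}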
 
Perhaps this may have unintended consequences?
What consequences?

It absolutely, positively must be in every program, so why put it into an archive where its inclusion is strictly optional? The only symbols it provides (that I can see) are the standard library stubs (_sbrk, read, etc.), and if they aren't used before the linker looks at it, it will not be linked.

The normal way of linking a C program is to put crt0.o on the command line. I am still trying to figure out why arm-none-eabi-gcc isn't putting that on the linker command line, since it isn't being given the -nostartfiles option.
 
IME, using a static library for the core is completely pointless in the first place and just increases build times. My CMake build system uses the object files directly and works just fine without building a static library. Unused code/data gets eliminated just fine (there is the '--gc-sections' linker flag).
 
Please take this as a serious, not offensive question - I really want to know :) I've asked this several times, and never got an answer!
What's the point of "bare metal" - where is the real difference, and why is it preferable?
 
Unused code/data gets eliminated just fine (there is the '--gc-sections' linker flag).

Perhaps this is true with today's compilers, but throughout Arduino's long history (starting in version alpha-0012 as I recall, around the 2008-2009 time frame) building core.a did indeed make a substantial difference. When Arduino began using this core.a, the decision was based on quite a lot of discussion on their public mailing list, where multiple people posted results showing that .a linking allowed the compiler to do better than merely .o files with --gc-sections. Those results may be AVR specific, and they may apply only to the older compiler used at the time, but it was a real difference. That design has persisted over the years. It's a decision and convention used by Arduino, and we use it too, because even if it doesn't help, the only downside is extra build time (which is skipped by the 1.6.5 and 1.8.1 versions when none of the sources have changed).

In the upcoming 1.8.2 version, they've added code to recycle core.a between different builds, as long as all the sources and settings haven't changed.
 
In my experiments, building with VisualMicro behaves the same as doing it from within the Arduino IDE (I reinstalled VM, Arduino and TeensyDuino with recent versions to rule out anything there).
I have no T3.6, but I'll try to reproduce your results with a T3.5 tonight. (In my experiments I can reproduce the problem with both the MK64FX512 and the MK20DX256.)

I see you use digitalWrite and other functions. These pull in a certain amount of code, which may result in an executable large enough not to show this problem.
You could maybe repeat your test using only direct register access.

When it works, VisualMicro does a good job emulating the IDE build - in this case it seems to be missing something. Even in an IDE build, tools/options like LTO and optimization level can break things.

Saved a few bytes it seems, from 1804 down to "Sketch uses 1772 bytes", using this TOGGLE qBlink:
Code:
void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
}
#define qBlink() {GPIOC_PTOR=32;} // (digitalWriteFast(LED_BUILTIN, !digitalReadFast(LED_BUILTIN) ))
void loop() {
  qBlink();
  delayMicroseconds(100000);
}
void yield() {}

So "Bare metal" is there - just need to learn Teensy by using it. I set up a new machine yesterday - IDE + TeensyDuino - Boom up and running.

T_3.6::
1772 bytes is a blink; turning on a usable USB stack goes to 4364 bytes. The big win is killing the yield() overhead, which otherwise takes the USB build to 15944 bytes.

T_3.0::
1612 bytes is a blink; turning on a usable USB stack goes to 4100 bytes. The big win is killing the yield() overhead, which otherwise takes the USB build to 10224 bytes.

That shows another BIG POINT regarding Bare Metal - I jumped the blink from a T_3.6 to a T_3.0 with just an IDE Tools menu change! PJRC took care of all the 'BARE METAL' fix-ups needed between those two devices! This test sketch doesn't exercise anything but blink - but you can be assured that all other device-specific sub-systems are online and ready to run right from the TeensyDuino install: RAM, FLASH, Digital I/O, Analog I/O, Timers, PWM, Serial, SPI, EEPROM, i2c, USB - and much more is ready to use with provided sources or from RTFM {reading the Freescale manual}.

So before anyone gets OFFENDED at the varied responses to the same old recurring question: "How do I do Bare Metal on Teensy?" - please realize Arduino is the easy-to-set-up system Paul chose to work within. Paul started with a Bare Metal system years back and called it Teensy.

Other note - Put on Visual Studio 2017 last night - going to finally try VisualMicro - that will not be as easy as above:
Arduino: ZIP 162MB + TeensyDuino 1.36b1 57MB :: install size on disk 618 MB
Visual Studio 2017: vs_installerservice.exe downloaded 7.11 GB :: install size on disk 45 GB { many extras but no Visual Micro yet }
 
Indeed you can reduce memory usage, because some of the hardware serial code and USB code is always linked, whether you use it or not.

If you want to take this further, you should edit yield() and also the default fault handler. Those are the 2 places which force the linker to add extra code.

I'm sure people interested in "bare metal" would wish to delete those things. It all depends on what you consider expensive. If human time & attention are free, and bytes of RAM are expensive, then perhaps you'll choose to delete certain code that makes things much easier.

But if you try this, please remember you're running without that fault handler. It keeps the USB responsive in many error conditions. That allows you to conveniently click Upload in Arduino without having to press the button on your Teensy. It also (usually) allows stuff you printed before your program went awry to actually end up appearing on your screen many milliseconds later. My hope is nearly everyone using Teensy can just take these convenience features for granted, never knowing they exist, that things "just work" in most cases even when your program crashes.
 
When I discovered serialEvent() it seemed cool, and I missed it on ESP8266 (supported now?). Now, seeing the overhead, it seems awesome that yield() is "weak", allowing it to be so easily done away with - and if some serialEvent#() is useful, it can be dropped back in. I didn't look at the RAM associated with those builds. Those tests were best case (?): smallest code with the new LTO.

If I understand... yikes - losing fault handling. With serial debug, having mystery faults take the Teensy offline would be even harder to deal with. Would that be this code: fault_isr(void)? Perhaps this relates to the whole interrupt table coming into RAM as noted above, which seems like a win for making the system more robust.
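
For reference, a small sketch of the serialEvent() hook being discussed (my own example, assuming the stock Teensyduino core): with the default weak yield() in place, the core calls serialEvent() whenever Serial data has arrived; overriding yield() with an empty function, as in the qBlink sketch above, removes that polling and its code size along with it.

Code:
// serialEvent() is called by the core's default yield() when Serial data is
// waiting (during delay() and after each pass of loop()). Define it in the
// sketch and it gets linked in; an empty yield() override removes the hook.
String line;

void serialEvent() {
  while (Serial.available()) {
    char c = Serial.read();
    if (c == '\n') {
      Serial.print("got: ");
      Serial.println(line);
      line = "";
    } else {
      line += c;
    }
  }
}

void setup() {
  Serial.begin(115200);
}

void loop() {
  delay(10);
}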
 
Please take this as a serious, not offensive question - I really want to know :) I've asked this several times, and never got an answer!
What's the point of "bare metal" - where is the real difference, and why is it preferable?

I can only give some examples :

1. For a CNC controller I've calculated 6 STEP and 6 DIR signals (1 bit each, using PORTC[0..11]) - I want to output them simultaneously, so I use GPIOC_PDOR = StepAndDir; instead of 12 calls to digitalWrite() (see the sketch at the end of this post).
2. The GCode messages from a computer are terminated with a \n linefeed. The UART driver not only has to buffer the bytes, but has to count the messages as well. The standard serial driver doesn't do that.
3. I want to include fault-tolerant UART communication, using RDT 3.0. This requires an extra state machine in the UART, which is hard to add to the existing driver. It's easier when built in from the start of the design.
4. I need a 32-bit timer controlling the rate of an ISR, so I'd like to use a PIT timer. But those timers are already used by other libraries, e.g. IntervalTimer.
5. In many developments I use HD44780 LCDs. In some cases I use a 16*1 (instead of the more common 16*2) display, and guess what, the (Arduino) library doesn't work properly for that type of display.

So, in summary:

the standard libraries use the peripheral resources in a standard way, but some designs may want to use them in different ways.
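
To make item 1 concrete, here is a rough sketch of the single-register-write idea (the pin muxing, the bit layout and the example pattern are my own assumptions for illustration, not strooom's actual code):

Code:
// 6 STEP + 6 DIR bits packed into PORTC[0..11] and written in one register
// access, so all 12 outputs change together. Register names come from the
// Teensyduino core headers (kinetis.h).
void setupStepDirPins() {
  for (int bit = 0; bit <= 11; bit++) {
    // PORTC_PCRn registers are contiguous: select the GPIO function and
    // high drive strength for PTC0..PTC11
    *(&PORTC_PCR0 + bit) = PORT_PCR_MUX(1) | PORT_PCR_DSE;
  }
  GPIOC_PDDR |= 0x0FFF;                      // PTC0..PTC11 as outputs
}

inline void writeStepAndDir(uint32_t stepBits, uint32_t dirBits) {
  uint32_t stepAndDir = (stepBits & 0x3F) | ((dirBits & 0x3F) << 6);
  // If nothing else on port C is in use, this can simply be
  // GPIOC_PDOR = stepAndDir; as in item 1. The masked form below preserves
  // any higher port C pins.
  GPIOC_PDOR = (GPIOC_PDOR & ~0x0FFFu) | stepAndDir;
}

void setup() {
  setupStepDirPins();
}

void loop() {
  writeStepAndDir(0x2A, 0x15);   // arbitrary example pattern
  delayMicroseconds(10);
}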
 
the standard libraries use the peripheral resources in a standard way

Yes, you can write your own UART, PIT timer and other code. You have complete access to all the hardware registers.

You can use attachInterruptVector, or just write directly into the RAM-based vector table, to commandeer the interrupts that are already used by default by the core library code.
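
A minimal sketch of that approach (taking over PIT channel 3 here as an arbitrary example; register and IRQ names are from the core's kinetis.h, and the 1 kHz rate is just a placeholder - pick a channel IntervalTimer isn't already using in your project):

Code:
// Commandeering an interrupt via attachInterruptVector on Teensy 3.x: set up
// PIT channel 3 by hand and point its IRQ at our own handler in the RAM-based
// vector table.
volatile uint32_t ticks = 0;

void my_pit3_isr(void) {
  PIT_TFLG3 = 1;                               // clear the channel's interrupt flag
  ticks++;
}

void setup() {
  SIM_SCGC6 |= SIM_SCGC6_PIT;                  // enable the PIT module clock
  PIT_MCR = 0;                                 // turn the PIT on
  PIT_LDVAL3 = (F_BUS / 1000) - 1;             // reload value for 1 kHz (bus clock timebase)
  attachInterruptVector(IRQ_PIT_CH3, my_pit3_isr);
  NVIC_ENABLE_IRQ(IRQ_PIT_CH3);
  PIT_TCTRL3 = PIT_TCTRL_TIE | PIT_TCTRL_TEN;  // interrupt enable + timer enable
}

void loop() {
}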
 
For the fellow CNC-ers who expressed interest in this Teensy-based CNC controller: I have reached the milestone of getting stepper motors running, that is:
* Gcode (being received through UART0) is being interpreted and resulting motions are stored in a Motion buffer
* supporting up to 6 axes
* supporting T-profile and S-profile

As mentioned before, I am doing the math in floating point, instead of the traditional integer 'workarounds', so I was curious how long the stepper interrupt routine would need for all the calculations in (HW) float:
* without FASTRUN : 2.7 us
* with FASTRUN : 2.0 us

I am currently running the stepper interrupt at 100 kHz (10 us), so this would give us a max 50 kHz stepping frequency (my target) at a 20% CPU load.
I haven't done any performance tuning yet.

Current code size : 36K ROM, 12K RAM

So far So good :)
 
Please take this as a serious, not offensive question - I really want to know :) I've asked this several times, and never got an answer! What's the point of "bare metal" - where is the real difference, and why is it preferable?

I don't know about anyone else but I dislike the Arduino system, perhaps because I have been writing programs for too long. It is slow to start and slow to compile. Add to that the lack of control over what is going on in the libraries (unless I dig into them and hack them up) and all the baggage that goes with them, and I prefer writing my own code. A simple program to blink an LED should not be 20KB even when you have 1MB of flash.

I have a "bare metal" program that is a start on a high speed data logger. I wrote the SDHC code from scratch. I looked at the Teensy 3.6 SDHC code and much that it did mystified me. Including its ignoring the simple DMA system which I found quite easy to use.

I may continue with that or I might instead use a version of eForth that I ported to the Teensy 3.6. That is written using the GNU assembler only.
 
Re: CNC Controller Progress

Strooom,

Nice to hear you have made such progress on your Teensy-based cnc controller!

I would like to experiment using it to control my cnc machine when you get a reasonably stable release out.

I can help with testing some, and later on I want to see if I can integrate my height sensor with it.
Running on a Teensy 3.6, it reports a number about 1000 times per second representing the measurement (height), so one can scan a workpiece, make a point cloud, and use it to correct for height variations in the workpiece. The anticipated advantage is that it will finish faster than doing one point at a time as is usually done with a CNC probe for the same purpose. Mine would stream as it goes, rather than stopping at each measurement.

Keep a chuggin', can't wait to play around.
 
@Mr Mayhem, dynamic height sensing? Would be an incredible solution for hobby level CNC machines (Shapeoko, Carvey, OXO...........) anything you can share? Perhaps start a new thread?
 
@Mr Mayhem, dynamic height sensing? Would be an incredible solution for hobby level CNC machines (Shapeoko, Carvey, OXO...........) anything you can share? Perhaps start a new thread?

That's my goal anyway. I have the electronics done pretty much, and posted Teensy/Processing stuff related to the linear photodiode array sensor. What's needed next is a way to apply this in a sensor head that goes on the CNC machine. I am looking at triangulation vs. a cantilever probe tip kind of thing, vs. an edge-finder kind of thing. After deciding on a sensor head design, I can do the PCB, etc.

Here's my github:
https://github.com/Mr-Mayhem?tab=repositories

I have a thread going in this forum on this linear-array-focused stuff, but have not gotten into the actual CAD model of a workable use of the electronics yet.
https://forum.pjrc.com/threads/3937...Read-TSL1410R-Optical-Sensor-using-Teensy-3-x

Essentially the idea is to use the precision of the photodiode array as a way to quickly measure points as the workpiece is scanned. A shadow falling on the sensor is one way, and I have this delivering position data at accuracies below one micron on Teensy at 800 sensor frames per second (each frame containing all the pixels, usually 256), at around 8.5 megabits per second via USB serial. The way it would work is that some artifact touches the workpiece and conveys its motion to a moving shadow on the sensor.

One could also add a lens and have the array watch for a laser line on the workpiece rather than looking for shadows cast from an artifact. We could do this many ways, including not necessarily shining right into the camera, but just seeing the laser line on the workpiece from some other angle. I will be exploring this option as well, making some laser scanners that use the sub pixel math like my shadow sensor does, but rehashed for a bright spot on the line of pixels, rather than a shadow.

Looking at linear array sensor pixels from left to right, a shadow signal goes from the nominal illuminated level down steeply, across low and back up steeply. A bright spot (from a laser line image) goes up to a peak and back down, and is shaped more like a positive Gaussian bell curve.

Some atomic force microscopes use a laser bouncing off a tiny mirror on a cantilever; like a phonograph needle riding a vinyl record, it touches the workpiece as it is scanned. I could scale that up for a CNC kind of dial-indicator probe, and have the laser reflect from the cantilever mirror into the sensor through a dark filter, etc.

Back on the speed topic, I could probably go faster, say with SPI from one Teensy to another if that's doable, or maybe even running on the same Teensy board if there is enough processing power remaining after the controller is done taking its share, so to speak. Even if I had only 100 measurements per second, that would likely be a lot faster than probing mechanically. Plus you kinda want to go slow anyway for the sake of the machine's own accuracy shifting around, and other factors.

I know the sensor will also go past 30,000 frames per second in an ideal circuit with a fast ADC. They rate it at a pixel clock rate of 8 million pixel clocks per second. Divide 8 million by 256 pixels and that's 31,250 complete all-pixel frames per second. It would need dedicated logic like a CPLD or FPGA, a fast ADC and careful PCB design, but it's doable if needed. The issue becomes that everything else has to speed up as well to make use of it.

I will be happy if I can simply scan a point cloud, using a mow-the-lawn pattern of cnc movements, and get 10 to 50 microns of repeatable precision.

I will post more when I get a more complete prototype built up.

The connection to this thread is the more Teensy cpu available after the motion controller takes its share, the more likely I could run this sensor on the same board, rather than a separate one hooked up with SPI, etc. I figure it will always be a compromise between the motion controller's top pulse rate and my sensor's top frame rate. The sensor can slow down quite a bit and still be useful, however, at least compared to a touch probe way of measuring heights to a workpiece.
 
I had a quick read through your wiki - I like your plan & will follow/contribute wherever I can.
I think the Teensy 3.5/3.6 is the perfect platform for this; I'd be happy to design a minimalistic CNC-specific PCB (e.g. screw terminals & buffers for step/dir, a few LEDs for EN signals, optoisolated limit switch inputs) for the Teensy to plug into.

Hey macaba, I think I'm ready for the motherboard design.
What schematic/PCB design software did you intend to use?
 
Hey macaba, I think I'm ready for the motherboard design.
What schematic/PCB design software did you intend to use?

Honestly I don't think I'm going to give a satisfactory answer here - as I am a user of proprietary (i.e. not open source) software (Proteus PCB Design). I've tried using KiCad 3 times in the last year and usually feel very disappointed as it hasn't 'clicked' into place for me yet.

I've started the schematic and mechanical mounting layout.
Here is my current line of thinking:

CNC stack.png

Top board
The main board common to every scenario. It has:
- Size of 90x100mm
- Teensy 3.6
- Ethernet jack
- (6) Opto-isolated limit switch inputs
- Buck converter to reduce 12-48V down to 5 (or 3.3)
- TX/RX on screw terminals for pendants/external hardware
- (2) 5A outputs for general use (e.g. 3D printer would be hotend power & SSR drive signal for heated bed. CNC would be an enable/PWM signal for spindle)
- SIL header to bottom board

Bottom board
This is the application specific board.
Variant A: For a 3D printer, it would have all the stepper driver ICs on it.
Variant B: For a CNC mill with external drives, it would have the STEP/DIR output circuitry.
- Size of 100x100mm
- SIL socket for top board

The PCB sizes are specifically aimed at cheap PCB fab sizes (i.e. a max size of 100x100 means 10 boards for $13.50).
 

Ok, I understand you prefer Proteus, as you have climbed the learning curve... On the other hand I'd like the schematics and layouts to be open hardware. What about Eagle?
I have not done any serious PCB design for some years, so I'll have to spend some time learning anyway. I'm doing some tests with Eagle and it looks OK.

About your design: I like most of it, as it aligns very well with what I had in mind. I think the buck converter could go down to 5V, as the Teensy has an on-board regulator down to 3.3V. I am currently thinking LM2596-5. I am also thinking about 8 (instead of 6) opto-isolated inputs, as they come in quads anyway. They could be a mix of limit switches or other buttons.

I need to think about the need for 2 boards, even for a CNC with external steppers.
As this is the 'main' scenario I had in mind, I would like to have that on just 1 board.

So let me think about all that and come back to you later.

Thanks for your contribution!
 
Ok, I understand you prefer Proteus, as you have climbed the learning curve... On the other hand I'd like the schematics and layouts to be open hardware. What about Eagle?
I have not done any serious PCB design for some years, so I'll have to spend some time learning anyway. I'm doing some tests with Eagle and it looks OK.

I would say that if we're both going to make the effort to learn something new, I'd say KiCad is the right option despite my struggle with it. (Just needs a new way of thinking)

I need to think about the need for 2 boards, even for a CNC with external steppers.

This is almost entirely driven by these facts:
- The desire to have all the screw terminals along 1 face (makes this significantly easier to wire into machine cabinets when mounted on a DIN rail).
- The desire to keep the board size under 100x100.
(For example, 100x200 [the same area on 1 board] is $27.60 [so comparable to 2x $13.50] but locks the functionality into a specific format [i.e. CNC mill or 3D printing])
 
Strooom - Do you have a pin list of required pins?
e.g. Dir/step pins on specific timer pins, etc, if applicable.
Obviously most other pins can just be remapped.
 
Just a quick update on the status: I've made some prototype versions of a PCB so I can move my testing onto real machines:
* 6 motors, 6 peripherals, 12 limit switches and/or buttons.
I will make the schematic and board-design available on Github.
2017-05-17 11.56.04.jpg
 
Update

Teensy on Controller PCB, Assembly with (4) Stepper Drivers and (1) Solid State Relay on a 5mm plexiglass baseboard. (400mm x 123mm)
2017-05-24 20.56.05.jpg
I've found a few problems, such as obstruction of the USB and serial connectors by the power supply and drivers, but hey, that's what prototyping is for.
Nothing serious that can't be fixed easily in version 2.
 
I am wondering if there is any more progress with this project, or if it is a dead horse? I am really interested in the idea of being able to use the Teensy for CNC control, as many have noted the lack of expandability in grbl/Arduino.
 