Newbie questions -> Teensy 3.2

Status
Not open for further replies.

clubman

Member
Hello everyone,

I am a new user, new to the Teensy world currently waiting delivery of my Teensy 3.2, but not new to the Arduino world. I have made several projects using Arduinos which I find really handy and low cost plus small and easy to fit dimensions. I am mainly using Nanos which offer the best cost-to-capabilities ratio.

BUT... There are several problems with AVRs I am facing:
1) My serious projects are running out of sketch space. 32KB is really small when 3-4 libraries are used (almost always modified for the better by me). Libraries usually use half of the available flash. Note that I am a senior software engineer and have written code for a living for almost 15 years, so I know the best practices: reusing code, always using the proper variable size, etc. Still, each project has its own requirements, and that's why there are different processors on the market, right? Why not use a Mega? Its speed is 16 MHz and the board is huge for my projects, plus it is more expensive than the macho Teensy! Why not use a Due? Same reason...

2) Speed! 16 MHz is not that bad, but I believe that in 2017 it is kind of slow, especially when the processor is used in a network with other, faster processors and is holding them back.

3) To continue from No. 2, speed is the issue again, not in terms of MHz but because of really, really sucking IDE commands or libraries. I know that sometimes speed is sacrificed for user-friendly commands etc., but most of the commands are horribly slow. Take analogRead() for example, or even digitalRead()!

4) IDE standard libraries! An example that took me several months of frustration to figure out is the Wire library not having a timeout and a reset when one occurs.. Mercy! The Serial library not having a TX ring buffer? When sending, for example, 10 bytes @ 115200bps it was slowing the code down by several msecs!

5) The Arduino forum. The so-called "gurus" are really arrogant and cocky most of the time. I don't know if they are tired of reading the same things, but if you are a newbie, YES, you will ask silly questions! You will do that even if you have read thousands of datasheets, because you have probably understood less than half of them and because you don't have the experience to understand empirically what you read. No one was born an embedded developer or electronics engineer. If you didn't have a question you wouldn't ask in a forum. That's why forums exist. I have been a member of several forums for more than 18 years and I have never experienced such behaviour elsewhere. Forums are supposed to help, not make you want to give up!

So what I would expect from a Teensy are the following (+questions):

1) Are the "native" IDE commands optimized? I mean, does analogRead() for example wait for the result to show up, or does it let the rest of the code execute freely and then collect its result later? Can I do direct digital port reading like I do with the Arduino, or is it a different story? Are those things documented anywhere? I am really afraid of the standard functions..

2) What about the most popular libraries? Is the Serial library optimized? Can it output data without delaying the whole code like a pig?

3) Specifically for the I2C library, are the mechanisms described above included? For the unsupported libraries, how easy is it to modify them to run on Teensy? Is there a library for the MAX31855, for example?

As for the forum I just hope for the best....

Thank you all for letting me share my thoughts with you!
 
First up, Teensyduino is designed to match assumptions in Arduino so existing libraries work, so you won't magically get an old library doing new faster/better things.

Analog read is blocking by default to match the behavior Arduino code expects. See the ADC library
https://forum.pjrc.com/threads/25532-ADC-library-update-now-with-support-for-Teensy-3-1
for methods to use the hardware to its full extent.

Most of the inbuilt libraries have been optimized in various ways, and some of those changes have in fact gone back into the Arduino core where applicable. Serial has a buffer (32 bytes from memory; edit: see here https://github.com/PaulStoffregen/cores/blob/master/teensy3/serial1.c#L40), and working within that you can avoid blocking. Note that if by serial you mean 'the virtual serial port on USB', that is a separate question that hinges more on what the USB interface and driver are up to, and I have hit slowdowns there when trying to stream massive amounts of debug info.

For I2C see here and the links there: https://www.pjrc.com/teensy/td_libs_Wire.html

Several offshoots of Wire exist for varying use cases, caused in large part by full Arduino compatibility and full use of the Teensy on-board hardware (especially multiple I2C ports) not being possible from the same library, due to choices in that original library.

As to the forum, the main thought would be to see the forum rule about posting code, and don't ask people to do your college assignments for you (some maths based on your post above suggests that's not going to be a problem).
 
I agree with GremlinWrangler here,

Most of the default code is set up to work the same way as it does on an Arduino, so that you can quickly move your projects over. This includes things like emulating some of the AVR registers, such that if your code tries to muck with some of them, for example the SPI registers, the system tries to make it work on the ARM processor.

1) As mentioned, analogRead behaves as it does on all Arduino processors and waits for the result. There are other libraries which unlock other capabilities. And in all cases you have access to all of the registers and the like, to fully utilize the processor. Example: the ADC library.

2) Serial works like Serial... Delays like a pig? Again, it depends. The code works like AVR: if you write some number of bytes, it will try to move all of them into a software FIFO queue. If you write more bytes than will fit in the queue, the code will wait until it can put the remainder into the queue. On Teensy, a new member of the HardwareSerial class lets you ask how many bytes are free in the write queue. So you can set your code up to write only as many as will fit and queue the rest in your own code any way you want. Hardware-wise, some of the Serial ports have a larger FIFO queue, which helps reduce the number of interrupts needed to transfer data to and from the software queues.

2a) SPI - Again, the default stuff will work similarly to AVR. However, the SPI subsystem has other capabilities, like a FIFO queue where you can encode extra information into each push, which for example allows you to control several CS pins. This is used in some libraries like the ili9341_t3 library (see the PJRC TFT display), which really speeds it up. There is also the ability to do DMA access to SPI, and again there is another library set up to help with this.

3) I2C - Again, you have the default Wire library, which works very much like the AVR version, so if a library uses the Wire library to interface with some hardware, then most of the time it should just work. In the cases where changes are needed to properly support the Teensy, code is often posted to do so, and Paul (and others) try to get the library owners to take pull requests to fix them for the Teensy, and/or a version of the library is kept and installed by Teensyduino to fix the issues. And again, other libraries have been developed that take better advantage of the underlying hardware, such as i2c_t3.
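
For the Serial point (2) above, "only write as many as will fit" might be sketched as follows. This is a rough illustration, not the library's own code. It assumes the free-space query is called availableForWrite() (the post does not name it) and that the port also accepts write(buffer, length). The names writeWhatFits and FakePort are hypothetical; FakePort is only a desktop stand-in so the helper can be tried without hardware.

```cpp
#include <stddef.h>
#include <stdint.h>

// Queues at most as many bytes as the port's TX queue can take right now,
// so the call never waits. Returns how many bytes were queued; the caller
// keeps the remainder and tries again on a later pass through loop().
// Assumption: Port offers availableForWrite() and write(const uint8_t*, size_t).
template <typename Port>
size_t writeWhatFits(Port& port, const uint8_t* data, size_t len) {
    const int space = port.availableForWrite();
    if (space <= 0 || len == 0) return 0;
    size_t n = static_cast<size_t>(space);
    if (n > len) n = len;
    port.write(data, n);
    return n;
}

// Hypothetical desktop stand-in for a serial port with a small TX queue.
struct FakePort {
    int free_bytes;   // space left in the pretend TX queue
    size_t queued;    // total bytes accepted so far

    explicit FakePort(int space) : free_bytes(space), queued(0) {}

    int availableForWrite() const { return free_bytes; }

    size_t write(const uint8_t*, size_t n) {
        free_bytes -= static_cast<int>(n);
        queued += n;
        return n;
    }
};
```

In a sketch the same helper would presumably be called once per loop() pass, along the lines of sent += writeWhatFits(Serial1, buf + sent, len - sent), with buf, len and sent kept by your own code.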

Again, if you do run into stuff that does not work on the T3.2, follow the rules at the top of every forum page and provide enough information: a description of the hardware (hopefully with a link to some spec page), which library you are trying (again, hopefully with a link to it), and a sample program that illustrates the issue you are having. There are several people here who try to help out.

Hope that helps
 
Thank you both very much for your informative answers! I will have to perform some tests with the hardware once it arrives, and I will let you know if I have any questions..!
 
Hi,

FWIW, re the Arduino forum, I've frequently thought that the word "stupid" should be banned.
My observation of this forum is that questions and answers tend to be on a more sophisticated level. But I've never seen a raw newbie (which is not far from my own state) treated with disrespect.

--Michael
 
Thank you GremlinWrangler.. Will try everything once I transfer to Teensy..

Now, newbie questions continued! I received my 3.2 board yesterday and fired it up 10 minutes ago, and I am amazed that every Arduino-based command I have written in my project compiles! The exceptions are the AVR registers like ADMUX, which I don't care about as I will replace them with the Fast functions, and the I2C TWCR, which will be replaced with some Teensy-based library.

The question is.. I compile a version of my code using Arduino nano and the results are:
Sketch uses 22290 bytes (72%) of program storage space. Maximum is 30720 bytes.
Global variables use 1134 bytes (55%) of dynamic memory, leaving 914 bytes for local variables. Maximum is 2048 bytes.

While on the Teensy are:
Sketch uses 55336 bytes (21%) of program storage space. Maximum is 262144 bytes.
Global variables use 5824 bytes (8%) of dynamic memory, leaving 59712 bytes for local variables. Maximum is 65536 bytes.

Is that normal? I am thrilled to see the huge amount of memory left on the Teensy, but why is it using so much more flash and SRAM? It is roughly 2.5 times the flash and around 5 times the SRAM. For the flash, is it because the Teensy needs much more background code to be Arduino compatible? And the RAM? Is there a minimum amount used by "standard" code?

Also, I noticed the IDE comes with 96 MHz selected by default. Can I use that without problems or should I switch it to 72 MHz?

Thank you!!
 
First, the Teensy core will add a few kB to even an empty sketch. One part of it is code that makes the Teensy compatible with the Arduino world; another part is support for the far more powerful processor and its integrated peripherals, such as multiple UARTs, native I2C and SPI blocks, multiple USB device types, and much more. Second, the Teensys are 32-bit processors, and even if you deal with 8- and 16-bit variables and constants in your code, the compiler will often align these to 32-bit addresses to speed up the code at runtime. This sometimes results in the "waste" of a few bytes, depending on your code and data structures, but it's for a good reason.

Given that you have far more flash and RAM, you really shouldn't worry about it. The more your sketches grow, the less noticeable this small additional memory load will be.
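
As a small illustration of the alignment point, here is a sketch of how member order changes the size of a struct. The struct names are made up for this example, and the byte counts in the comments assume a 32-bit ABI where uint32_t is aligned to 4 bytes (as on ARM); on an 8-bit AVR there would be no padding at all.

```cpp
#include <stddef.h>
#include <stdint.h>

// Members in "as written" order: the compiler inserts padding so that the
// 32-bit member starts on a 4-byte boundary.
struct Loose {
    uint8_t  flag;     // 1 byte, followed by 3 bytes of padding
    uint32_t counter;  // 4 bytes
    uint8_t  mode;     // 1 byte, followed by 3 bytes of tail padding
};

// Same members, largest first: the two small members share one word.
struct Tight {
    uint32_t counter;  // 4 bytes
    uint8_t  flag;     // 1 byte
    uint8_t  mode;     // 1 byte, followed by 2 bytes of tail padding
};

// Holds on any ABI: moving the widest member first never costs space here.
static_assert(sizeof(Tight) <= sizeof(Loose), "reordering should not grow the struct");
```

Printing sizeof(Loose) and sizeof(Tight) from a sketch is a quick way to see what the compiler actually did on a given board.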

96MHz is very fine for the Teensy 3.2 (tested and experienced by the vast majority of users here for years), although it is formally a little outside the data sheet specifications.
 
OK I understand.. It was a question out of curiosity and to understand how it works, it really doesn't matter to me at least at that point..

Regarding what you said about the 32bit addresses, so is it better to use for example floats instead of int and doing the math when needed? My sketch was using around 20 floats in the past which were then converted to ints (along with all the headache that causes when you need to use them) and that happened in order to save sketch size and earn a little more speed on the Arduino..
 
Regarding what you said about the 32bit addresses, so is it better to use for example floats instead of int and doing the math when needed? My sketch was using around 20 floats in the past which were then converted to ints (along with all the headache that causes when you need to use them) and that happened in order to save sketch size and earn a little more speed on the Arduino..

The use of floats will never reach the speed of integers on the Teensy 3.2 since the latter has no FPU (only the T3.5 and 3.6 have one). Thus it is much quicker to do integer math instead of float on the T3.2. Basically, but that's independent of the platform, you should avoid ambiguous variable declarations such as "int" or "word" because they might have different sizes, depending on the MCU and the compiler used. The best practice is to use types like uint8_t, int16_t, and so on, because these make sure that you'll always have the right size.
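
To make the integer-math and fixed-width-type advice concrete, here is a minimal sketch that converts a raw ADC count to millivolts without any floating point. The function name and its parameters (resolution in bits, reference voltage in mV) are illustrative choices, not something from a Teensy library.

```cpp
#include <stdint.h>

// Converts a raw ADC count to millivolts using integer math only.
// bits is the ADC resolution (1..16) and vref_mv the reference voltage in
// millivolts, e.g. 10 and 3300. The widest intermediate, 65535 * 3300,
// still fits comfortably in 32 bits.
inline uint32_t adcToMillivolts(uint16_t raw, uint8_t bits, uint32_t vref_mv) {
    const uint32_t max_count = (static_cast<uint32_t>(1) << bits) - 1;  // 1023 for 10 bits
    return (static_cast<uint32_t>(raw) * vref_mv) / max_count;
}
```

Keeping the value in millivolts (or hundredths of a degree, and so on) and only formatting it as a decimal when printing avoids most of the headache of mixing ints and floats.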
 
The larger initial memory usage comes (mostly) from 3 things.

#1 - The chip requires much more initialization code. AVR boots up pretty much ready to go in 16 MHz run mode, and has only a few alternate low power modes. These Freescale Kinetis chips, and most other modern parts, boot up to a low power state where almost everything is disabled or in its lowest power mode, and the chip is running from an internal RC oscillator. Not even the crystal is running at boot time. A tremendous amount of stuff is configurable. That's great if you want extremely fast startup and low power projects and a lot of flexibility. But it does come at a cost of requiring much more initialization code.

#2 - All Teensy boards have native USB. When you compile a tiny project on boards like Arduino Uno or Mega, remember there's another chip on the board with its own memory doing the USB for you. On Teensy, and on boards like Arduino Leonardo, Zero, Due, all that USB code needs to be built into your program. That gives you much faster USB serial and doesn't hog a hardware serial port, but it does mean more code. You'll find Teensy's DMA-based USB stack performs very well, even compared with Arduino's native USB boards. ;)

#3 - There's a known issue where the code for Serial1, Serial2, Serial3 (and Serial4, Serial5, Serial6 on Teensy 3.5 & 3.6) is built into your program, even if you aren't using them. This problem is very difficult to solve while also supporting serialEvent (which Arduino does) and a fault handler that allows your communication to complete gracefully if your code causes an ARM memory or other fault (no Arduino has this feature). This issue is considered low priority, which realistically means it will probably never be fixed.

Actually, there's one other difference. In Tools > Optimize, you can see different ways to run the compiler. Arduino defaults to smallest code size. We do that for Teensy LC where the memory is smaller, and where optimizations don't make much difference on Cortex-M0+. But on all the larger Teensy boards, optimize for speed is the default. The compiler actually does make your code substantially faster on Cortex-M4 using optimizations. For an apples-to-apples comparison (with the 3 above issues still present), choose the optimize for size option.
 
Software floating point has a bad reputation. Of course it is slower without the FPU (with an FPU, 32-bit float runs at about the same speed as integer math). But it's not as bad (at least on ARM Cortex-M4) as many people believe. Benchmarking has shown about a 30-to-1 difference between Teensy 3.5 and Teensy 3.2 for 32-bit float, when running at the same clock speed.
 
Thank you for your posts.. Paul, I really admire all the work you've done and respect all the effort you have put into Teensy to make it so fast and low cost as well! You have answered my question perfectly, now I know what's going on in that little thingy..

I just did an AnalogRead comparison between an Uno and my Teensy 3.2:
Code:
unsigned long looptime;

void setup() {
  Serial.begin(9600);  
}

void loop() {
  looptime=micros();
  analogRead(5); 
  looptime=micros()-looptime;
  Serial.println(looptime);
}

Results:
Teensy:
Code:
10
9
9
10
10
9
10
9
10
10
9
10

Uno:
Code:
116
116
116
116
116
116
116
116
116
116

:D:D:D

So, I am not being greedy, but is there a way to improve on even the 9-10us? I skimmed the ADC library but for the time being it seems a little bit complicated to me.. Will it gain speed, or is there another command or library that could improve analog read speed?
 
analogRead() is only there for Arduino compatibility reasons, it's simple but slow because it triggers the AD conversion and will wait until the latter is accomplished. Acquiring analog readings at high(er) speeds requires the use of the ADC library which uses interrupts or DMA to do non blocking AD conversions with almost no CPU load.
 
Well yes, I guess you are right.. On my Arduino, when I was using ADCSRA to start the ADC and read non-blocking, it took around 10us, so I guess on Teensy it will be much faster....

Edit: It is awesome.. I opened the analogContinuousRead example and added:
Code:
looptime=micros();
value = (uint16_t)adc->analogReadContinuous(ADC_0); // the unsigned is necessary for 16 bits, otherwise values larger than 3.3/2 V are negative!
getval=value*3.3/adc->getMaxValue(ADC_0);
looptime=micros()-looptime;

Result is 5 or 6us.. 3-4us faster than usual AnalogRead.....
 
And your loop time still includes a float multiplication and a float division (at least 12 CPU cycles!), which is not optimal. Define a global variable float convFactor = 3.3/adc->getMaxValue(ADC_0); outside setup() and loop(), and in your loop do only getval=value * convFactor to eliminate the division.
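
A stripped-down sketch of that idea in plain C++, so it can be tried on a desktop too. The constants here are illustrative stand-ins: in the sketch above, the full-scale count would come from adc->getMaxValue(ADC_0) rather than a hard-coded 65535.

```cpp
#include <stdint.h>

// Illustrative constants: 3.3 V reference and a 16-bit full-scale count.
const float kVref = 3.3f;
const uint32_t kMaxValue = 65535;

// The division happens once, before loop() ever runs.
const float convFactor = kVref / kMaxValue;

// Hot path: a single multiplication, no division.
inline float toVolts(uint16_t raw) {
    return raw * convFactor;
}
```

Nothing in this is specific to one board: a global initialized from an expression is ordinary C++.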

BTW: Using DMA and a ring buffer, you might still increase the sampling rate... ;-)
 
To really do sustained fast ADC input, you need DMA. The ADC library is the best place to start. You might also look at the analog input objects in the Audio library.
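
For readers new to the ring-buffer idea mentioned here, a software-only sketch follows. It is not DMA and not code from the ADC or Audio libraries; it only shows the data structure that lets one side (for example an interrupt handler) deposit samples while loop() drains them later. The class name SampleRing is made up, and the volatile indices are only adequate for a single-core part with one producer and one consumer.

```cpp
#include <stddef.h>
#include <stdint.h>

// Fixed-size single-producer/single-consumer ring buffer for ADC samples.
// N must be a power of two so the index wrap is a cheap bit mask.
// One slot is always left empty, so the usable capacity is N - 1.
template <size_t N>
class SampleRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    SampleRing() : head_(0), tail_(0) {}

    // Producer side (e.g. an ADC interrupt). Returns false if the buffer is full.
    bool push(uint16_t sample) {
        const size_t next = (head_ + 1) & (N - 1);
        if (next == tail_) return false;  // full: drop the sample, do not overwrite
        buf_[head_] = sample;
        head_ = next;
        return true;
    }

    // Consumer side (loop()). Returns false if the buffer is empty.
    bool pop(uint16_t& out) {
        if (tail_ == head_) return false;
        out = buf_[tail_];
        tail_ = (tail_ + 1) & (N - 1);
        return true;
    }

    // Number of samples currently waiting to be read.
    size_t size() const { return (head_ - tail_) & (N - 1); }

private:
    volatile size_t head_;  // next slot to write
    volatile size_t tail_;  // next slot to read
    uint16_t buf_[N];
};
```

With DMA the hardware plays the producer role and writes into the buffer memory directly, but the consumer side in loop() looks much the same.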
 
The variable was there just for testing purposes, but I didn't know you could do such a declaration..! Is it Teensy-specific or can you do it on the Arduino as well?

Now, I did some testing using the analogRead example code, which I include below.. I initially tried it "as is" and I got a reading of 170us! Then I changed to VERY_HIGH_SPEED and it is now around 23us.... Compared to analogReadContinuous that is almost double the time.. I also added a "Classic analogRead time", but I believe it is not correct, as once you add the ADC library it takes over the analogRead command.. If I am right, in the analogReadContinuous example I was only timing getMaxValue and not the actual analog read, hence the difference.. But which of the two is safer to use? And what are the ups and downs of VERY_HIGH_SPEED vs VERY_LOW_SPEED?

Code to run as is:

Code:
/* Example for analogRead
*  You can change the number of averages, bits of resolution and also the comparison value or range.
*/


#include <ADC.h>


const int readPin = A9; // ADC0
const int readPin2 = A2; // ADC1

ADC *adc = new ADC(); // adc object;
float convFactor = 3.3/adc->getMaxValue(ADC_0); 

void setup() {

    pinMode(LED_BUILTIN, OUTPUT);
    pinMode(readPin, INPUT);
    pinMode(readPin2, INPUT);

    pinMode(A10, INPUT); //Diff Channel 0 Positive
    pinMode(A11, INPUT); //Diff Channel 0 Negative
    #if ADC_NUM_ADCS>1
    pinMode(A12, INPUT); //Diff Channel 3 Positive
    pinMode(A13, INPUT); //Diff Channel 3 Negative
    #endif

    Serial.begin(9600);

    Serial.println("Begin setup");

    ///// ADC0 ////
    // reference can be ADC_REFERENCE::REF_3V3, ADC_REFERENCE::REF_1V2 (not for Teensy LC) or ADC_REFERENCE::REF_EXT.
    //adc->setReference(ADC_REFERENCE::REF_1V2, ADC_0); // change all 3.3 to 1.2 if you change the reference to 1V2

    adc->setAveraging(16); // set number of averages
    adc->setResolution(16); // set bits of resolution

    // it can be any of the ADC_CONVERSION_SPEED enum: VERY_LOW_SPEED, LOW_SPEED, MED_SPEED, HIGH_SPEED_16BITS, HIGH_SPEED or VERY_HIGH_SPEED
    // see the documentation for more information
    // additionally the conversion speed can also be ADACK_2_4, ADACK_4_0, ADACK_5_2 and ADACK_6_2,
    // where the numbers are the frequency of the ADC clock in MHz and are independent on the bus speed.
    adc->setConversionSpeed(ADC_CONVERSION_SPEED::VERY_HIGH_SPEED); // change the conversion speed
    // it can be any of the ADC_SAMPLING_SPEED enum: VERY_LOW_SPEED, LOW_SPEED, MED_SPEED, HIGH_SPEED or VERY_HIGH_SPEED
    adc->setSamplingSpeed(ADC_SAMPLING_SPEED::VERY_HIGH_SPEED); // change the sampling speed

    // always call the compare functions after changing the resolution!
    //adc->enableCompare(1.0/3.3*adc->getMaxValue(ADC_0), 0, ADC_0); // measurement will be ready if value < 1.0V
    //adc->enableCompareRange(1.0*adc->getMaxValue(ADC_0)/3.3, 2.0*adc->getMaxValue(ADC_0)/3.3, 0, 1, ADC_0); // ready if value lies out of [1.0,2.0] V

    // If you enable interrupts, notice that the isr will read the result, so that isComplete() will return false (most of the time)
    //adc->enableInterrupts(ADC_0);


    ////// ADC1 /////
    #if ADC_NUM_ADCS>1
    adc->setAveraging(16, ADC_1); // set number of averages
    adc->setResolution(16, ADC_1); // set bits of resolution
    adc->setConversionSpeed(ADC_CONVERSION_SPEED::MED_SPEED, ADC_1); // change the conversion speed
    adc->setSamplingSpeed(ADC_SAMPLING_SPEED::MED_SPEED, ADC_1); // change the sampling speed

    //adc->setReference(ADC_REFERENCE::REF_1V2, ADC_1);

    // always call the compare functions after changing the resolution!
    //adc->enableCompare(1.0/3.3*adc->getMaxValue(ADC_1), 0, ADC_1); // measurement will be ready if value < 1.0V
    //adc->enableCompareRange(1.0*adc->getMaxValue(ADC_1)/3.3, 2.0*adc->getMaxValue(ADC_1)/3.3, 0, 1, ADC_1); // ready if value lies out of [1.0,2.0] V


    // If you enable interrupts, note that the isr will read the result, so that isComplete() will return false (most of the time)
    //adc->enableInterrupts(ADC_1);

    #endif

    Serial.println("End setup");

}

int value;
int value2;
unsigned long looptime;

void loop() {

    // Single reads
    looptime=micros();
    value = adc->analogRead(readPin); // read a new value, will return ADC_ERROR_VALUE if the comparison is false.
    looptime=micros()-looptime;
    Serial.print("Pin: ");
    Serial.print(readPin);
    Serial.print(", value ADC0: ");
    Serial.print(value*3.3/adc->getMaxValue(ADC_0), DEC);
    Serial.print(", Looptime: ");
    Serial.println(looptime);
    looptime=micros();
    value = analogRead(readPin); // plain analogRead() call, timed for comparison
    looptime=micros()-looptime;
    Serial.print("Classic analogRead Looptime: ");
    Serial.println(looptime);

    #if ADC_NUM_ADCS>1
    looptime=micros();
    value2 = adc->analogRead(readPin2, ADC_1);
    looptime=micros()-looptime;

    Serial.print("Pin: ");
    Serial.print(readPin2);
    Serial.print(", value ADC1: ");
    Serial.print(value2*3.3/adc->getMaxValue(ADC_1), DEC);
    Serial.print(", Looptime: ");
    Serial.println(looptime);
    #endif

    // Differential reads

    value = adc->adc0->analogReadDifferential(A10, A11); // read a new value, will return ADC_ERROR_VALUE if the comparison is false.

    Serial.print(" Value A10-A11: ");
    // Divide by the maximum possible value and the PGA level
    Serial.println(value*3.3/adc->getPGA()/adc->getMaxValue(), DEC);

    #if ADC_NUM_ADCS>1 && ADC_DIFF_PAIRS > 1
    value2 = adc->analogReadDifferential(A12, A13, ADC_1);

    Serial.print(" Value A12-A13: ");
    Serial.println(value2*3.3/adc->getPGA(ADC_1)/adc->getMaxValue(ADC_1), DEC);
    #endif

    /* fail_flag contains all possible errors,
        They are defined in  ADC_Module.h as

        ADC_ERROR_OTHER
        ADC_ERROR_CALIB
        ADC_ERROR_WRONG_PIN
        ADC_ERROR_ANALOG_READ
        ADC_ERROR_COMPARISON
        ADC_ERROR_ANALOG_DIFF_READ
        ADC_ERROR_CONT
        ADC_ERROR_CONT_DIFF
        ADC_ERROR_WRONG_ADC
        ADC_ERROR_SYNCH

        You can compare the value of the flag with those masks to know what's the error.
    */

    if(adc->adc0->fail_flag) {
        adc->adc0->printError();
    }
    #if ADC_NUM_ADCS>1
    if(adc->adc1->fail_flag) {
        adc->adc1->printError();
    }
    #endif

    digitalWriteFast(LED_BUILTIN, !digitalReadFast(LED_BUILTIN));

    delay(50);
}

// If you enable interrupts make sure to call readSingle() to clear the interrupt.
void adc0_isr() {
        adc->adc0->readSingle();
}
 
Although the Teensy ADC(s) have a theoretical resolution of 16 bits, the meaningful resolution (due to noise and inaccuracies elsewhere) is only 12 bits in single-ended and 13 bits in differential mode. Setting a higher resolution gains nothing and only slows down the successive-approximation conversion. Second, if you set the averaging to 16 readings (as I saw in your code), the ADC will do 16 conversions before you get a result, which also slows everything down. Initially, you were after the most rapid sampling and you were told to use DMA via the ADC library. But that's not what you seem to be doing...

The different speed ratings can be found in the Teensyduino boards.txt file and they call different well documented and platform independent gcc compiler optimization options.
 
I tried less filtering and resolution but it wouldn't change the looptime.. BUT!

What I just need after analyzing things is a continuous non-blocking analogread with 10bits of resolution. I don't care if it takes even 1ms, I just need it to not block my loop.. So DMA looks great and I will study it, but is not needed for my project I guess..

So I am now trying to do single reads, non-blocking.. Any idea why startSingleRead(readPin,ADC_0); returns the error that startSingleRead is not declared in this scope? I see its declaration in the ADC.h file..

I was planning to do the following:
Code:
startSingleRead(readPin,ADC_0);
      if (isComplete(readPin))
        value=readSingle(readPin);

Well, to answer my own question: it needed the adc-> in front of the functions..
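
The start / check / read pattern discussed above could be wrapped up roughly like this. It is a sketch under assumptions, not the ADC library's documented usage: it assumes the per-module object (the thing reached through adc->adc0 elsewhere in this thread) offers startSingleRead(pin), isComplete() and readSingle() without an ADC-number argument. NonBlockingReader and FakeAdc are hypothetical names; FakeAdc only exists so the logic can be exercised on a desktop.

```cpp
#include <stdint.h>

// Polls one ADC module without blocking: start a conversion, return at once,
// and pick the result up on a later call once the hardware reports it is done.
// Assumption: AdcModule offers startSingleRead(pin), isComplete(), readSingle().
template <typename AdcModule>
class NonBlockingReader {
public:
    NonBlockingReader(AdcModule& adc, uint8_t pin)
        : adc_(adc), pin_(pin), running_(false), value_(0) {}

    // Call once per loop(). Returns true only when a new value was stored.
    bool poll() {
        if (!running_) {
            running_ = adc_.startSingleRead(pin_);  // kick off a conversion
            return false;
        }
        if (!adc_.isComplete()) return false;       // still converting
        value_ = adc_.readSingle();
        running_ = false;                           // next poll() starts a new one
        return true;
    }

    // Most recent completed reading.
    int value() const { return value_; }

private:
    AdcModule& adc_;
    uint8_t pin_;
    bool running_;
    int value_;
};

// Hypothetical desktop stand-in: a "conversion" finishes on the second status check.
struct FakeAdc {
    int checks_left;
    FakeAdc() : checks_left(0) {}
    bool startSingleRead(uint8_t) { checks_left = 2; return true; }
    bool isComplete() { return --checks_left <= 0; }
    int readSingle() { return 512; }
};
```

On a Teensy this would presumably be instantiated as NonBlockingReader<ADC_Module> reader(*adc->adc0, readPin); and reader.poll() called from loop(); that line is untested here and depends on the installed ADC library version.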
 
What I just need after analyzing things is a continuous non-blocking analogread with 10bits of resolution. I don't care if it takes even 1ms, I just need it to not block my loop.. So DMA looks great and I will study it, but is not needed for my project I guess..

The audio library does this. Perhaps you could use the library as-is, with an ADC object connected to a Queue object.

Normally all audio processing stays within the library and flows along the connections you define between objects. But the queues give you access to the raw samples, in blocks of 128. They use the memory pool inside the library to queue up blocks, so you can do things like SD card access or LED updates or other work that takes substantial time, and then fetch the blocks later.

Oh, but you might need to edit the ADC object code within the library, to delete the high pass filtering that removes any DC offset. Then you'd get the raw ADC readings.
 
That's good to know.. Maybe I will try it in the future when I feel more comfortable with the Teensy stuff..

Right now I tested the code above and it takes around 3us to do a SingleRead, non-blocking.. Not bad!
 