Teensyduino 1.35 Released

Status
Not open for further replies.
Is it just me, or has something major changed since 1.31? I am getting slow screen refreshes with exactly the same code after installing the latest Arduino 1.8.1 IDE and Teensyduino 1.35.
 
Are you saying the IDE is slower, or that some of your code is slower?

If your code is slower, it might help to know things like: which libraries are you using? Which Teensy? What compile options?
 
Yes, lots of stuff changed. Except for rare cases where an Arduino release prompts another quick Teensyduino release, many things change on almost every release.

If you scroll down on the Teensyduino download page, there's a list with short summary of every significant change.

Now, if that answers your question, maybe you could give a little more detail about the problem? Your message doesn't even say which operating system you're using! A more detailed description, with specifics about exactly what you're seeing, and whether it happens all the time or only under certain circumstances would be the bare minimum to post if you want to give us any hope of helping.

But if all you wanted to know was if things changed, then yes, things did change. They pretty much always do.
 
The code is running way slower.

It is basically happening with Frank's ILI9341 library, which draws at 4 fps in stream mode.

I have let him know on GitHub as well.
 

Hm. This is mysterious: the transfer is done in hardware, without any running code. After the DMA hardware is set up, no code is executed to transfer the data. Pure DMA.

And all my tests show that the code is running faster.
 
I was wondering that as well. But without throwing too many darts, I'm wondering what 4 fps means. Is it how fast the screen redraws, or how fast it gets the data that is supposed to go to the screen? But again, I can only throw darts.
 
Hm... let me check whether I'm doing a single refresh or a continuous refresh.

EDIT: It seems it's not a refresh issue at all, but an issue with how much load there is on the MCU.

With a Teensy 3.6 on Teensyduino 1.31 and Arduino 1.6.1, I get about 3 million counts per second in the main loop, whereas Teensyduino 1.35 with Arduino 1.8.1 gives me roughly 1.68 million. :s

Code:
static IntervalTimer cyclecounter;
volatile uint32_t count = 0;

void setup() {
  delay(1000);
  Serial.begin(38400);
  cyclecounter.begin(one_s_task,1000000);  // run one_s_task() every 1,000,000 us (1 s)
}
void loop() {
  // put your main code here, to run repeatedly:
  count++;
}

void one_s_task() {
  Serial.print("counts / second: ");
  Serial.println(count);
  count = 0;
}

EDIT 2:

It seems I can get the code to run at the old speed again with this code:

Code:
static IntervalTimer cyclecounter;
volatile uint32_t count = 0;

void setup() {
  delay(1000);
  Serial.begin(38400);
  cyclecounter.begin(one_s_task,1000000);
}
void loop() {
  while (1) {  // never returns, so main() never gets to call yield()
    count++;
  }
}

void one_s_task() {
  Serial.print("counts / second: ");
  Serial.println(count);
  count = 0;
}

All code running at 240 MHz.
 
So what you are saying is that yield() is taking a long time. If you look in main.cpp, you will see:
Code:
extern "C" int main(void)
{
#ifdef USING_MAKEFILE

	// To use Teensy 3.0 without Arduino, simply put your code here.
	// For example:

	pinMode(13, OUTPUT);
	while (1) {
		digitalWriteFast(13, HIGH);
		delay(500);
		digitalWriteFast(13, LOW);
		delay(500);
	}


#else
	// Arduino's main() function just calls setup() and loop()....
	setup();
	while (1) {
		loop();
		yield();
	}
#endif
}
So in your main code, what do you get if you change it to:
Code:
static IntervalTimer cyclecounter;
volatile uint32_t count = 0;

void setup() {
  delay(1000);
  Serial.begin(38400);
  cyclecounter.begin(one_s_task,1000000);
}
void loop() {
  // put your main code here, to run repeatedly:
  count++;
}

void one_s_task() {
  Serial.print("counts / second: ");
  Serial.println(count);
  count = 0;
}
void yield(void) {  // empty replacement for the core's yield()
}
 
I tried running my version on a T3.6 with Arduino 1.8.1 and the current beta (different compiler).
At 240 MHz, compiled Faster with LTO:
Code:
counts / second: 34263183
counts / second: 34263347
counts / second: 34263100
counts / second: 68525940
counts / second: 102788647
counts / second: 34262748
counts / second: 34262837
counts / second: 34262769
counts / second: 34262734
counts / second: 34262875
counts / second: 34262758
counts / second: 34262769
Note: I never normally run at 240 MHz. I typically use 180 MHz, and sometimes try 216 MHz.

With 1.8.0 running at 240 MHz with this build, I again see the speed jumping all over the place...
Code:
counts / second: 34263183
counts / second: 34263347
counts / second: 34263100
counts / second: 68525940
counts / second: 102788647
counts / second: 34262748
counts / second: 34262837
counts / second: 34262769
counts / second: 34262734
counts / second: 34262875
counts / second: 34262758
counts / second: 34262769
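A quick way to read these numbers, as a minimal sketch: divide each reading by a normal one. The helpers below are hypothetical (not from the thread). 68525940 and 102788647 come out at almost exactly 2 and 3 times the usual ~34.26 million, which fits the ISR's count = 0 occasionally being lost, so counts piled up over two or three intervals.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: how many "normal" one-second readings fit in this one.
double multipleOfBaseline(uint32_t reading, uint32_t baseline) {
  assert(baseline != 0);  // a zero baseline would divide by zero
  return static_cast<double>(reading) / static_cast<double>(baseline);
}

// Nearest whole number of intervals the reading seems to span:
// 1 = normal, 2 or 3 = the reset was missed once or twice.
long intervalsAccumulated(uint32_t reading, uint32_t baseline) {
  return std::lround(multipleOfBaseline(reading, baseline));
}
```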
 
Hmm... it seems my code had an issue somewhere; I now have the usual speed back :)

Also, does anyone have an idea why there is a yield() call in the main code?
 
yield() is called at the end of each loop() to process the serialEvent()s.
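As a rough illustration, here is a simplified host-side C++ model of that dispatch. FakeSerial, fakeSerial, modelYield() and runMain() are hypothetical names; the real Teensy core's yield() checks the actual serial ports, and its exact contents vary between versions. Polling ports like this on every pass may be one reason the default yield() lowers the empty-loop count.

```cpp
#include <cassert>

// Hypothetical stand-in for a serial port: only tracks bytes waiting.
// This is NOT the Teensy Serial API.
struct FakeSerial {
  int pending = 0;
  int available() const { return pending; }
};

FakeSerial fakeSerial;     // hypothetical port
int serialEventCalls = 0;  // how often the sketch's handler ran

// The sketch's handler; a real one would read the waiting bytes.
void serialEvent() {
  ++serialEventCalls;
  fakeSerial.pending = 0;  // pretend the handler consumed everything
}

// Simplified model of yield(): call serialEvent() only when data is waiting.
void modelYield() {
  if (fakeSerial.available() > 0) {
    serialEvent();
  }
}

// Model of main(): loop() then yield(), repeated (forever on the real board).
void runMain(void (*loopFn)(), int iterations) {
  for (int i = 0; i < iterations; ++i) {
    loopFn();
    modelYield();
  }
}
```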

VOLATILE doesn't mean ATOMIC. Some fraction of the time, loop() is in the middle of count++ when the interval timer fires.

New compiler optimizations probably affected this, if this was the baseline test. And a faster CPU has 50% more chances of catching the edge.

I added a 10-element array to store the counts, only to see it fail the same way. The problem followed because count was copied and zeroed each second by an interrupt landing in the middle of loop()'s count++, i.e. read count, add 1, write the "new" count value back. That write-back overwrote the ISR's zero, so it never saw count go to ZERO.

Having modified the isr() to only print after 10 seconds, I changed the workflow as follows. Now loop() does more work indexing the array, but look at the consistency. The exception is when the isr() fires in mid-math and the full DWORD goes to ZERO on the 10th element. In the other 9 cases the change to "ii" doesn't affect any work in progress: whether it increments the OLD ii location or the new one by 1, the error is '1 in a million' (or two):
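The interleaving can be shown with a small host-side C++ model, assuming count++ compiles to a separate load, add and store (as it does on Cortex-M). loopCount, isrReset() and the two increment functions are hypothetical names. The first function places the interrupt between the load and the store, so the ISR's zero is overwritten; the second lets the interrupt arrive after the store, so the reset sticks.

```cpp
#include <cassert>
#include <cstdint>

volatile uint32_t loopCount = 0;  // plays the role of count in the sketch

// What the ISR does to the counter (printing omitted).
void isrReset() { loopCount = 0; }

// count++ split into load / add / store, with the interrupt in the middle.
void incrementInterruptedByIsr() {
  uint32_t tmp = loopCount;  // load
  isrReset();                // interrupt fires here and writes 0
  loopCount = tmp + 1;       // store overwrites the ISR's zero
}

// Same increment, but the interrupt arrives after the store.
void incrementThenIsr() {
  uint32_t tmp = loopCount;
  loopCount = tmp + 1;
  isrReset();
}
```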
Code:
static IntervalTimer cyclecounter;

void yield(void) {
}
void setup() {
  delay(1000);
  Serial.begin(38400);
  while ( !Serial ) ;
  Serial.print("setup() -----");
  Serial.print(" F_CPU=");
  Serial.println(F_CPU);
  cyclecounter.begin(one_s_task, 1000000);
}
volatile int ii = 0;
uint32_t Clist[10];
void one_s_task() {
  if ( ++ii >= 10 ) {
    for ( int jj = 0; jj < 10; jj++ ) {
      ii = 0;
      Serial.print("c/s:");
      Serial.println(Clist[jj]);
      Clist[jj] = 0;
    }
    Serial.println(micros());
  }
}

void loop() {
  // put your main code here, to run repeatedly:
  Clist[ii]++;
}

At (normal) 240 MHz I see this, with NO FLYERS in indices 0 to 8:
setup() ----- F_CPU=240000000
c/s:21805559
c/s:21805635
c/s:21805648
c/s:21805675
c/s:21805623
c/s:21805684
c/s:21805623
c/s:21805621
c/s:21805625
c/s:21805678
11400057

But about 10% of the time, the 10th element shows this:
c/s:43611316
191400056

setup() ----- F_CPU=180000000
c/s:16350819
c/s:16350782
c/s:16350748
c/s:16350698
c/s:16350755
c/s:16350742
c/s:16350722
c/s:16350731
c/s:16350725
c/s:16350697
11400062

And at 180 MHz the 10th element on the 10th iteration of 10 seconds:
c/s:32701477
101400062

And then the next three in a row, also at 180 MHz, so the edge is still there:
c/s:49052226
111400062
...
c/s:65402951
121400062
...
c/s:81753690
131400062
 
This change to two elements should be safe, zeroing only the NEW element on the isr() call:
Code:
volatile int ii = 0;
uint32_t Clist[2];
void one_s_task() {
  if ( 1 == ii ) {
    for ( int jj = 0; jj < 2; jj++ ) {
      Serial.print("c/s:");
      Serial.println(Clist[jj]);
    }
    ii = 0;
    Serial.println(micros());
  }
  else
    ii = 1;
  Clist[ii] = 0;
}

The [0] count will be smaller because of time lost to USB printing? And the rare count will still be lost when [1] is printed before loop() writes that increment.
 
<edit>: BTW, is this testing what was intended? There is a huge jump in recorded counts between LTO and non-LTO, and from the initial results above of under a few million to these of 25 and 34 million, and down to 12 million with no LTO????

Hopefully not OT, or posting to myself: the compiler update seems good and LTO shows improvement. You just have to be careful to test what is intended, since seeing something odd may be down to the test itself:

Final answer for reporting loop() counts: treat the count like a running timer that wraps properly on uint32_t. With {setup above}, this results in counts near 34,265,240 at 240 MHz and 25,695,089 at 180 MHz, compiled FASTER with LTO:
Code:
volatile uint32_t count = 0;
static uint32_t Lastcount = 0;
void one_s_task() {
  Serial.print(count - Lastcount);  // unsigned difference stays correct across wrap
  Serial.print("=# @");
  Serial.println(millis());
  Lastcount = count;
}

void loop() {
  count++;
}
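The wrap-around claim can be checked off-target with a minimal sketch. countsSince() is a hypothetical name for the count - Lastcount expression above. Unsigned 32-bit arithmetic is done modulo 2^32, so the difference stays right across the wrap. At ~34 million counts per second, a uint32_t wraps roughly every 125 seconds, so a test run hits this within minutes.

```cpp
#include <cassert>
#include <cstdint>

// Counts completed since the last report. Correct even after `now` wraps
// past 0xFFFFFFFF, as long as fewer than 2^32 increments happen per interval.
uint32_t countsSince(uint32_t now, uint32_t last) {
  return now - last;  // modulo-2^32 subtraction
}
```

Because the ISR only reads count and never writes it, loop() is the only writer, so the lost-update problem from the earlier version does not arise here.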

With the default Arduino PJRC yield(), the 180 MHz count drops to 2,644,868. But this is basically on an empty loop(); even indexing an array for counting drops the count by over a third, as below. Further below, the compile results show that the LTO option is needed to see this extremely high count.

Edit on the prior post: this version shares the USB print time and resolves the interrupted loop() count++ (but then it no longer exemplifies the problem). It gives 21,805,536 at 240 MHz, which shows the array-indexing overhead of loop(){Clist[ii]++;}. This shows a second way to count, and may be an isr()-safe way to swap buffers?:
Code:
uint32_t Clist[2] = {0,0};
void one_s_task() {
  ii^=1;  // flip loop() over to the other slot
  Serial.print(Clist[ii]);
  Serial.print("=# @");
  Serial.println(millis());
  Clist[ii] = 0;
}
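The buffer swap can be traced with a host-side C++ model. PingPong, loopRuns() and isr() are hypothetical stand-ins for loop() and one_s_task(). The model suggests each printed value belongs to the interval before the one that just ended, so reports lag by one interval. In the real sketch, an increment interrupted mid-way appears to land in the slot the ISR did not just zero, which is why this looks isr()-safe.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of the two-slot scheme: loop() increments Clist[ii];
// the ISR flips ii with ii ^= 1, reports Clist[ii], then zeroes it.
struct PingPong {
  uint32_t Clist[2] = {0, 0};
  int ii = 0;

  // Stand-in for loop() running n times between interrupts.
  void loopRuns(uint32_t n) { Clist[ii] += n; }

  // Stand-in for one_s_task(): returns the value it would print.
  uint32_t isr() {
    ii ^= 1;                        // loop() now counts into the other slot
    uint32_t reported = Clist[ii];  // filled two intervals ago
    Clist[ii] = 0;                  // cleared before loop() resumes
    return reported;
  }
};
```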

This shows some alternate compiles. Is LTO the key here??:
setup() ----- F_CPU=240000000 // FASTER with LTO
34265346=# @2400
setup() ----- F_CPU=240000000 // FASTEST with LTO :: Sketch uses 10940 bytes
34265275=# @2400
setup() ----- F_CPU=240000000 // FASTEST with no LTO :: Sketch uses 13076 bytes
12624948=# @2400
setup() ----- F_CPU=240000000 // FAST with no LTO
12624954=# @2400
setup() ----- F_CPU=240000000 // SMALLEST with no LTO
14109973=# @2400
setup() ----- F_CPU=240000000 // SMALLEST with LTO :: Sketch uses 6828 bytes
34264891=# @2400

Doing FASTEST with no LTO and the default yield() gives this:
setup() ----- F_CPU=240000000
1689107=# @2400

Last note: I set the IntervalTimer to 10 seconds and got this. At 240 MHz the count varies by ~400 per 10 seconds, and by ~9,000 at 180 MHz:
setup() ----- F_CPU=240000000 // FAST with LTO :: Sketch uses 8064 bytes and void yield(void) { }
342,651,037=# @11400
setup() ----- F_CPU=180000000 // FAST with LTO :: Sketch uses 8064 bytes and void yield(void) { }
256,948,736=# @11400
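Dividing the clock by the counts gives a rough cycle budget per loop() pass. cyclesPerPass() is a hypothetical helper, and it assumes each count covers exactly the stated number of seconds (ISR and USB time included).

```cpp
#include <cassert>
#include <cstdint>

// Average CPU cycles per loop() pass, assuming `counts` passes happened in
// exactly `seconds` of wall time at clock `fCpu` (interrupt time included).
double cyclesPerPass(uint32_t fCpu, uint64_t counts, double seconds) {
  assert(counts != 0);  // avoid dividing by zero
  return (static_cast<double>(fCpu) * seconds) / static_cast<double>(counts);
}
```

For the figures above this works out to about 7.0 cycles per pass with LTO (at 240 MHz over 1 s, and at both 240 and 180 MHz over 10 s), about 19 cycles for FASTEST without LTO, and about 142 cycles for FASTEST without LTO plus the default yield(). The near-identical ~7 cycles at both clocks suggests the empty loop scales directly with F_CPU.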

Final sketch with both methods: IntervalCounterISR.ino (attached)
 