Teensyduino 1.49 Beta #3



Paul
12-15-2019, 01:59 PM
Here is a third beta test for Teensyduino 1.49.


EDIT: links removed, please use 1.49-beta5 (https://forum.pjrc.com/threads/59030-Teensyduino-1-49-Beta-5)


Changes since Teensyduino 1.49-beta2 (https://forum.pjrc.com/threads/58654-Teensyduino-1-49-Beta-2)

Fix delay() and micros() on Teensy 4.0 at 24 MHz
Fix USB touchscreen
Fix serial receive at very high baud rates on Teensy 4.0
Fix MIDI+Serial on Teensy 4.0
Fix USB Keyboard sending "garbage" in long strings on Teensy 4.0
Fix analogWrite for values >= 2^res on certain Teensy 4.0 pins
Wire only clear FIFO when bus is idle on Teensy 4.0 (KurtE)
Wire use open drain pin config on Teensy 4.0
Update ST7735_t3 library
SPI optimize usingInterrupt() on Teensy 4.0 (FrankB)
SPI allow faster clock speeds on Teensy 4.0 (KurtE)
Fix daylight saving time issue with automatic RTC set on Teensy 4.0
Add Makefile for Teensy 4.0
Add main.cpp for Teensy 4.0 (KurtE)
Fix compiler warnings on Teensy 2.0

mjs513
12-15-2019, 02:21 PM
Downloaded and installed TD1.49 Beta 3 on Windows 10 x64 Home (all updates installed to date) without any issue.

As an initial test, ran @defragster's version of the ClocksT4.ino sketch to test the clock changes. Tests ran at 240, 110, 600, 130, 110, 24, and back to 600 MHz. The only frequency that failed was the 130 MHz clock speed:

F_CPU=129600000 50ms delay:: 1009 us and 1 ms
System Clock: 129600000
IPG Clock: 129600000
Semc Clock: 43200000
RTC Clock: 32768
USB1pll Clock: 480000000
Peripheral Clock: 24000000
Osc Clock: 24000000
Arm Clock: 648000000
.

>>>>>>>>>>>>>>>>>>>> BUGBUG @129 MHz <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
F_CPU=129600000
running your test sketch sure enough showed the clock running backwards

..
4408004 -> 4408000
4409004 -> 4409000
4410004 -> 4410000
4411004 -> 4411000
-----


Then tested the USB keyboard example sketch "simple.ino" with Notepad++, and it printed "Hello World" plus the count to Notepad++. Printing to the serial monitor showed only the count being printed, not "Hello World"?

Will test BNO080 and a couple of others but have to get them set up.

mjs513
12-15-2019, 02:52 PM
Tested changes to the Wire library with three devices:

LidarLite V4LED: worked no issues
BNO055: worked out of the box no issues
BNO080: kind of worked. Tested with three example sketches. Sometimes on initial load I had to change the Wire clock speed from 400 to 100 and back to 400 before it would start working. Possibly it needs additional pull-ups to work properly. But it would work...


EDIT: BNO080 pin configs = see post https://forum.pjrc.com/threads/58654-Teensyduino-1-49-Beta-2?p=223872&viewfull=1#post223872 in the beta #2 thread

wwatson
12-15-2019, 04:57 PM
Now showing correct RTC time after an upload. Not an hour off.
Skipped TD 1.49 Beta 2 so not sure which TD it changed in.

ETMoody3
12-15-2019, 06:05 PM
No issues as with 1.49 b2

PaulStoffregen
12-15-2019, 11:03 PM
running your test sketch sure enough showed the clock running backwards

Ok, I've put this on my list of bugs to fix.

But since the errors are so small and only happen at certain slower CPU speeds, I'm going to work on this later.

defragster
12-15-2019, 11:53 PM
TD 1.49 b3 installed on Win 10 - works.

per @mjs513 post #2 above: ClocksT4.ino sketch much improved - 130 MHz is intermittent? Need to try the Greiman SDFat_Beta - posted on that thread it was not working at 150?

neurofun
12-16-2019, 01:19 AM
T4 @ 600MHz
using this RA8875 lib: https://github.com/mjs513/RA8875 to draw 2 different types of screens.

With TD1.49b2
screen1 takes about 23ms to draw
screen2 takes about 65ms to draw

With TD1.49b3
screen1 takes about 180ms to draw
screen2 takes about 450ms to draw

The screens take about 7-8 times longer to draw with beta3 than with beta2, which is obviously not good.

T4 @ 528MHz

With TD1.49b3
screen1 takes about 125ms to draw
screen2 takes about 320ms to draw

The screens draw faster @528MHz than @600MHz.

neurofun
12-16-2019, 02:52 AM
T4 @ 600MHz
With this example beta 3 is about 10x slower than beta2. 2940ms vs 283ms.

#include <SPI.h>
#include <RA8875.h>

//teensy4 SPI0 RA8875
#define TFT_CS 10
#define TFT_RST 9 // 255 = unused, connect to 3.3V
#define TFT_MOSI 11
#define TFT_SCLK 13
#define TFT_MISO 12

RA8875 tft = RA8875(TFT_CS, TFT_RST, TFT_MOSI, TFT_SCLK, TFT_MISO);

void setup(){
  while (!Serial && millis() < 2500);

  tft.begin(RA8875_800x480);
  tft.fillWindow(RA8875_BLACK);

  uint32_t msec = millis();

  for(int x = 0; x < 800; x += 10){
    for(int y = 0; y < 480; y += 10){
      tft.drawRect(x , y, 8, 8, RA8875_YELLOW);
      tft.fillRect(x+2 , y+2, 4, 4, RA8875_YELLOW);
    }
  }

  msec = millis() - msec;
  Serial.print(msec);
  Serial.println(" ms/page");
}
void loop(){}

WMXZ
12-16-2019, 09:02 AM
T4 @ 600MHz
With this example beta 3 is about 10x slower than beta2. 2940ms vs 283ms.

or millis() was wrong in beta2?

defragster
12-16-2019, 09:16 AM
@Paul and @mjs513:

Paul - thanks for leaving the old "...\teensy\avr\cores\teensy4\clockspeed.c" code in place under COMMENT!

I reverted to that code {based on github link} and my "(embarrassingly oops updated) sketch" shows repeated failures { for 24 and 110 MHz }, so that change did change something for the better!

The sketch as it was truly was detecting the errors - but there was a lurking edge condition yielding the false positive BUGBUG indications after the change. The one line addition to that code is in red updated in this post : Teensy-4-0-Clock-speed-influences-delay-and-SPI (https://forum.pjrc.com/threads/58688-Teensy-4-0-Clock-speed-influences-delay-and-SPI?p=223846&viewfull=1#post223846) {@mjs513 - perhaps you can test w/td1.49b3 and not see any signs of failure}

With the observed correction to the provided test sketch - and the TD 1.49b3 code as shipped - those detected time anomalies are gone!

Before seeing the correction issue in the sketch the 1.49b3 code was giving FALSE intermittent BUGBUG notes. It may be the first call to millis() after

To get to finding this I dug into the micros() code and pulled out the values used to determine that { and recorded them before and after set_arm_clock() - and they all appear SANE somehow across frequency change. They can't be perfect as the reference points like "F_CPU_ACTUAL/1000000" change significantly. But with the clockspeed.c change of setting 'CCM_CBCDR' for beta 3 passes that test code okay, and reverting it to beta 2 state gives the failures - and even in that case the micros() values pulled make sense.

If all else tests well maybe that is the end of this.


The DISPLAY slowdown seen by @neurofun is odd - from what I see of the change that provided the correction it was not a direct change to millis() or micros(). Maybe the change made should only be used for change to lower speed clocks?

@neurofun: is the 10X slowdown obviously visible in drawing - or just as reported by the timing code? As noted above the prior code remains under comments and a quick edit/save/rebuild can restore the Beta 2 code - as long as you start and stay at 600 MHz it isn't expected to change anything:
in :: \hardware\teensy\avr\cores\teensy4\clockspeed.c find these lines


//cbcdr &= ~CCM_CBCDR_PERIPH_CLK_SEL;
//CCM_CBCDR = cbcdr; // why does this not work at 24 MHz?
CCM_CBCDR &= ~CCM_CBCDR_PERIPH_CLK_SEL;
while (CCM_CDHIPR & CCM_CDHIPR_PERIPH_CLK_SEL_BUSY) ; // wait


And swap the comments on the lines to appear like this:


cbcdr &= ~CCM_CBCDR_PERIPH_CLK_SEL;
CCM_CBCDR = cbcdr; // why does this not work at 24 MHz?
// CCM_CBCDR &= ~CCM_CBCDR_PERIPH_CLK_SEL;
while (CCM_CDHIPR & CCM_CDHIPR_PERIPH_CLK_SEL_BUSY) ; // wait

defragster
12-16-2019, 09:27 AM
or millis() was wrong in beta2?

@WMXZ - thread you started : forum.pjrc.com/threads/58053-T4-set_arm_clock-and-micros()

Exhibited the issue with:


#if defined(__IMXRT1062__)
extern "C" uint32_t set_arm_clock(uint32_t frequency);
#endif

void setup() {
  // put your setup code here, to run once:
#if defined(__IMXRT1062__)
  set_arm_clock(24000000); // comment here to get full speed
#endif
}

void loop() {
  // put your main code here, to run repeatedly:
  static uint32_t t0;
  if(millis()>(t0+1000))
  {
    t0=millis();
    uint32_t t1=micros();
    delay(50);
    Serial.println(micros()-t1);
  }
}

With TD 1.49Beta3 can you reproduce that issue - or what led you to see it?

I just tested against beta 3 and no problem, and p#11 reversion to Beta 2 indeed produces the issue.

My abuse of that code {noted in post #11} to a function was showing false positives when I didn't set "t0" on entry. But code as presented in your thread seems fixed just by changing the code as shown in p#11.

WMXZ
12-16-2019, 11:26 AM
@WMXZ - thread you started : forum.pjrc.com/threads/58053-T4-set_arm_clock-and-micros()

I know, that was the reason for suggestion, stated only as question.

mjs513
12-16-2019, 12:16 PM
The sketch as it was truly was detecting the errors - but there was a lurking edge condition yielding the false positive BUGBUG indications after the change. The one line addition to that code is in red updated in this post : Teensy-4-0-Clock-speed-influences-delay-and-SPI {@mjs513 - perhaps you can test w/td1.49b3 and not see any signs of failure}

Unfortunately, at 130mhz the error is still there even with your code change to clocksT4:

F_CPU=129600000 50ms delay:: 1009 us and 1 ms
System Clock: 129600000

IPG Clock: 129600000

Semc Clock: 43200000

RTC Clock: 32768

USB1pll Clock: 480000000

Peripheral Clock: 24000000

Osc Clock: 24000000

Arm Clock: 648000000

Usb1PllPfd0 Clock: 720000000

Usb1PllPfd1 Clock: 664615368

Usb1PllPfd2 Clock: 508235292

Usb1PllPfd3 Clock: 454736826

Usb2Pll Clock: 24000000

SysPll Clock: 528000000

SysPllPfd0 Clock: 351999990

SysPllPfd1 Clock: 594000000

SysPllPfd2 Clock: 396000000

SysPllPfd3 Clock: 297000000

EnetPll0 Clock: 0

EnetPll1 Clock: 0

AudioPll Clock: 786480000

VideoPll Clock: 0


>>>>>>>>>>>>>>>>>>>> BUGBUG @129 MHz <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
F_CPU=129600000

mjs513
12-16-2019, 12:48 PM
T4 @ 600MHz
With this example beta 3 is about 10x slower than beta2. 2940ms vs 283ms.


@neurofun

Just ran your example using an Adafruit 800x480 display with their RA8875 controller. Also ran it at a couple of SPI speeds:

12Mhz SPI
600: 293 ms/page
528: 297 ms/page
396: 307 ms/page

16Mhz SPI
600: 252 ms/page
528: 252 ms/page
396: 266 ms/page

18Mhz SPI
600: 258 ms/page
528: 257 ms/page
396: 446 ms/page

Are you sure about that 2940ms?

PaulStoffregen
12-16-2019, 01:43 PM
or millis() was wrong in beta2?

As far as I know, no bugs have been found with millis().

However, micros() had at least 2 bugs. Neither is new. These have been with us since Teensy 4.0 release.

Functions like delay() which use micros() internally are also affected by these micros() bugs.

Bug #1: (fixed) When configured for 24 MHz speed, we were actually running at 30 MHz, but F_CPU_ACTUAL was set to 24000000. When F_CPU_ACTUAL is wrong, micros() computes incorrect output.

Bug #2: (not fixed) At certain slower CPU speeds, micros() gives wrong results just before millis() increments. For example, if millis() is currently 499 and about to turn to 500, the bug in micros() might result in output like 500001, even though micros() should never report more than 499999 while millis() is still 499.

So far, neither of these bugs has been found while running at 600 or 528 MHz.
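
The invariant described for Bug #2 can be written as a small host-side check. This is only a sketch: the function name is hypothetical, and it assumes the two values were sampled close enough together to be compared, which a real sketch on the Teensy can only approximate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when a (millis, micros) pair is consistent,
// i.e. micros lies in [ms * 1000, ms * 1000 + 999].
// Only meaningful before micros() wraps its 32-bit range (about 71 minutes).
bool microsMatchesMillis(uint32_t ms, uint32_t us) {
    assert(ms <= UINT32_MAX / 1000u);  // past this point micros has wrapped
    return us / 1000u == ms;
}
```

With the numbers from the example above, a pair of 499 ms and 499999 us passes, while 499 ms and 500001 us is flagged.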

neurofun
12-16-2019, 04:22 PM
@neurofun: is the 10X slowdown obviously visible in drawing - or just as reported by the timing code? As noted above the prior code remains under comments and a quick edit/save/rebuild can restore the Beta 2 code - as long as you start and stay at 600 MHz it isn't expected to change anything:
in :: \hardware\teensy\avr\cores\teensy4\clockspeed.c find these lines


//cbcdr &= ~CCM_CBCDR_PERIPH_CLK_SEL;
//CCM_CBCDR = cbcdr; // why does this not work at 24 MHz?
CCM_CBCDR &= ~CCM_CBCDR_PERIPH_CLK_SEL;
while (CCM_CDHIPR & CCM_CDHIPR_PERIPH_CLK_SEL_BUSY) ; // wait


And swap the comments on the lines to appear like this:


cbcdr &= ~CCM_CBCDR_PERIPH_CLK_SEL;
CCM_CBCDR = cbcdr; // why does this not work at 24 MHz?
// CCM_CBCDR &= ~CCM_CBCDR_PERIPH_CLK_SEL;
while (CCM_CDHIPR & CCM_CDHIPR_PERIPH_CLK_SEL_BUSY) ; // wait

The slowdown is clearly visible on the display.
Changing clockspeed.c does not resolve the slowdown.


@neurofun

Just ran your example using an Adafruit 800x480 display with their RA8875 controller. Also ran it at a couple of SPI speeds:

12Mhz SPI
600: 293 ms/page
528: 297 ms/page
396: 307 ms/page

16Mhz SPI
600: 252 ms/page
528: 252 ms/page
396: 266 ms/page

18Mhz SPI
600: 258 ms/page
528: 257 ms/page
396: 446 ms/page

Are you sure about that 2940ms?
Yes, I'm sure about 2940ms. But that was yesterday.
Today the times vary around 2300ms, which I really don't understand.

After investigating a little further I noticed changing USB Type has an influence on the speed.
Serial = +-2000ms
Serial + MIDI = +-2000ms
MIDI = +-2300ms
Raw HID = +-2300ms

mjs513
12-16-2019, 05:13 PM
@neurofun

Those are really strange results you are getting with your RA8875. Watching the display there is no noticeable delay at all. I can't seem to duplicate your problems with my Adafruit display. It's like your SPI settings are way off. To get anywhere near what you are seeing I have to run at a 180 kHz clock vs 18 MHz, then I will see about 2700ms to update the screen.

neurofun
12-16-2019, 05:32 PM
Did some more tests and USB Type does have some influence.
With this new test I get opposite results

#include <SPI.h>
#include <RA8875.h>

//teensy4 SPI0 RA8875
#define TFT_CS 10
#define TFT_RST 9 // 255 = unused, connect to 3.3V
#define TFT_MOSI 11
#define TFT_SCLK 13
#define TFT_MISO 12

RA8875 tft = RA8875(TFT_CS, TFT_RST, TFT_MOSI, TFT_SCLK, TFT_MISO);

void setup(){
  while (!Serial && millis() < 2500);

  tft.begin(RA8875_800x480);
}
void loop(){
  tft.fillWindow(RA8875_BLACK);

  uint32_t msec = millis();

  for(int x = 0; x < 800; x += 10){
    for(int y = 0; y < 480; y += 10){
      // tft.drawRect(x , y, 8, 8, RA8875_YELLOW);
      // tft.fillRect(x+2 , y+2, 4, 4, RA8875_YELLOW);
      tft.setCursor(x, y);
      tft.print("Z");

    }
  }

  msec = millis() - msec;
  Serial.print(msec);
  Serial.println(" ms/page");
}

USB Type : Raw HID
USB Type : MIDI

572 ms/page
591 ms/page
588 ms/page
621 ms/page
604 ms/page
607 ms/page
617 ms/page
604 ms/page
594 ms/page
630 ms/page
612 ms/page
596 ms/page
601 ms/page
598 ms/page
618 ms/page
619 ms/page
617 ms/page
623 ms/page
640 ms/page
649 ms/page
656 ms/page
665 ms/page
657 ms/page
656 ms/page
660 ms/page
639 ms/page
643 ms/page
653 ms/page

USB Type : Serial
USB Type : Serial + MIDI

964 ms/page
959 ms/page
979 ms/page
959 ms/page
997 ms/page
1027 ms/page
1028 ms/page
1045 ms/page
1042 ms/page
1059 ms/page
1067 ms/page
1072 ms/page
1062 ms/page
1135 ms/page
1137 ms/page
1122 ms/page
1129 ms/page
1088 ms/page
1141 ms/page
1211 ms/page
1184 ms/page
1217 ms/page
1185 ms/page
1218 ms/page
1209 ms/page
1247 ms/page
1294 ms/page
1278 ms/page
1248 ms/page
1263 ms/page
1304 ms/page
1307 ms/page
1320 ms/page
1331 ms/page
1333 ms/page
1348 ms/page
1357 ms/page
1345 ms/page
1345 ms/page
1370 ms/page
1420 ms/page
1398 ms/page
1448 ms/page
1412 ms/page
1409 ms/page
1438 ms/page
1409 ms/page
1441 ms/page

Time slowly creeps up.
Re-uploading the code and the previous creep is still there.
Power cycling the T4 for 10sec and the creep starts from zero again.


Reverting back to beta2
USB Type : Serial
gives a solid
80 ms/page
USB Type : Raw HID
USB Type : MIDI
give a solid
86 ms/page

KurtE
12-16-2019, 05:44 PM
@neurofun @mjs513 - If I get a chance I will try to get one of my buydisplay units hooked back up...
Again I am assuming T4. I will probably try to hook up the 800x...

@Paul and @mjs513 - Sorry for cross post... I have already posted this on the ADC thread... But Paul was wondering if it makes sense to update IMXRT.h for the ADC_ETC #defines to add more defines for options and potentially for a structure...

These are the current #defines:

#define IMXRT_ADC_ETC (*(IMXRT_REGISTER32_t *)0x403B0000)
#define ADC_ETC_CTRL (IMXRT_ADC_ETC.offset000)
#define ADC_ETC_DONE0_1_IRQ (IMXRT_ADC_ETC.offset004)
#define ADC_ETC_DONE2_ERR_IRQ (IMXRT_ADC_ETC.offset008)
#define ADC_ETC_DMA_CTRL (IMXRT_ADC_ETC.offset00C)
#define ADC_ETC_TRIG0_CTRL (IMXRT_ADC_ETC.offset010)
#define ADC_ETC_TRIG0_COUNTER (IMXRT_ADC_ETC.offset014)
#define ADC_ETC_TRIG0_CHAIN_1_0 (IMXRT_ADC_ETC.offset018)
#define ADC_ETC_TRIG0_CHAIN_3_2 (IMXRT_ADC_ETC.offset01C)
#define ADC_ETC_TRIG0_CHAIN_5_4 (IMXRT_ADC_ETC.offset020)
#define ADC_ETC_TRIG0_CHAIN_7_6 (IMXRT_ADC_ETC.offset024)
#define ADC_ETC_TRIG0_RESULT_1_0 (IMXRT_ADC_ETC.offset028)
#define ADC_ETC_TRIG0_RESULT_3_2 (IMXRT_ADC_ETC.offset02C)
#define ADC_ETC_TRIG0_RESULT_5_4 (IMXRT_ADC_ETC.offset030)
#define ADC_ETC_TRIG0_RESULT_7_6 (IMXRT_ADC_ETC.offset034)
#define ADC_ETC_TRIG1_CTRL (IMXRT_ADC_ETC.offset038)
#define ADC_ETC_TRIG1_COUNTER (IMXRT_ADC_ETC.offset03C)
#define ADC_ETC_TRIG1_CHAIN_1_0 (IMXRT_ADC_ETC.offset040)
#define ADC_ETC_TRIG1_CHAIN_3_2 (IMXRT_ADC_ETC.offset044)
#define ADC_ETC_TRIG1_CHAIN_5_4 (IMXRT_ADC_ETC.offset048)
#define ADC_ETC_TRIG1_CHAIN_7_6 (IMXRT_ADC_ETC.offset04C)
#define ADC_ETC_TRIG1_RESULT_1_0 (IMXRT_ADC_ETC.offset050)
#define ADC_ETC_TRIG1_RESULT_3_2 (IMXRT_ADC_ETC.offset054)
#define ADC_ETC_TRIG1_RESULT_5_4 (IMXRT_ADC_ETC.offset058)
#define ADC_ETC_TRIG1_RESULT_7_6 (IMXRT_ADC_ETC.offset05C)

Repeats for each trigger... I am just showing the first 2...
But I think it could be (note: just typed in, so need to check whether the compiler likes it):

typedef struct {
  uint32_t CTRL; // offset000
  uint32_t DONE0_1_IRQ; // offset004
  uint32_t DONE2_ERR_IRQ; // offset008
  uint32_t DMA_CTRL; // offset00C
  struct {
    uint32_t CTRL; //offset010
    uint32_t COUNTER; //offset014
    uint32_t CHAIN_1_0; //offset018
    uint32_t CHAIN_3_2; //offset01C
    uint32_t CHAIN_5_4; //offset020
    uint32_t CHAIN_7_6; //offset024
    uint32_t RESULT_1_0; //offset028
    uint32_t RESULT_3_2; //offset02C
    uint32_t RESULT_5_4; //offset030
    uint32_t RESULT_7_6; //offset034
  } TRIG[8]; // TRIG0..TRIG7
} IMXRT_ADC_ETC_t;

Or could maybe be:

typedef struct {
  uint32_t CTRL; // offset000
  uint32_t DONE0_1_IRQ; // offset004
  uint32_t DONE2_ERR_IRQ; // offset008
  uint32_t DMA_CTRL; // offset00C
  struct {
    uint32_t CTRL; //offset010
    uint32_t COUNTER; //offset014
    uint32_t CHAIN_NP1_N[4];
    uint32_t RESULT_NP1_N[4];
  } TRIG[8]; // TRIG0..TRIG7
} IMXRT_ADC_ETC_t;


Also, depending on which structure is used, I assume you would want all the #defines, for example:
#define IMXRT_ADC_ETC (*(IMXRT_ADC_ETC_t *)0x403B0000)

#define ADC_ETC_CTRL (IMXRT_ADC_ETC.CTRL)
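
Whichever struct variant is chosen, its layout can be checked on a host compiler against the offsets in the existing #defines before it goes into IMXRT.h. The sketch below is only an illustration: the type and function names are hypothetical, the CTRL member at offset 000 and the count of 8 triggers (TRIG0 to TRIG7) are assumptions taken from the register and bit names in this post, and `volatile` is added because these are hardware registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-trigger block: 10 words, matching offset010..offset034.
struct AdcEtcTrig {
    volatile uint32_t CTRL;
    volatile uint32_t COUNTER;
    volatile uint32_t CHAIN[4];   // CHAIN_1_0 .. CHAIN_7_6
    volatile uint32_t RESULT[4];  // RESULT_1_0 .. RESULT_7_6
};

// Hypothetical whole-peripheral layout.
struct AdcEtcRegs {
    volatile uint32_t CTRL;           // offset000
    volatile uint32_t DONE0_1_IRQ;    // offset004
    volatile uint32_t DONE2_ERR_IRQ;  // offset008
    volatile uint32_t DMA_CTRL;       // offset00C
    AdcEtcTrig TRIG[8];               // TRIG0 starts at offset010
};

// Compile-time checks against the offsets in the existing #defines.
static_assert(offsetof(AdcEtcRegs, DMA_CTRL) == 0x00C, "DMA_CTRL offset");
static_assert(offsetof(AdcEtcRegs, TRIG) == 0x010, "TRIG0_CTRL offset");
static_assert(sizeof(AdcEtcTrig) == 0x28, "per-trigger block size");

// Byte offset of TRIGn_CTRL from the peripheral base address.
inline std::size_t trigCtrlOffset(std::size_t n) {
    assert(n < 8);
    return offsetof(AdcEtcRegs, TRIG) + n * sizeof(AdcEtcTrig);
}
```

For n = 1 the helper returns 0x038, which agrees with the existing ADC_ETC_TRIG1_CTRL define.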


Also have defines for most of the sub-options, like:

#define ADC_ETC_CTRL_SOFTRST ((uint32_t)1 << 31)
#define ADC_ETC_CTRL_TSC_BYPASS ((uint32_t)(1<<30))
#define ADC_ETC_CTRL_DMA_MODE_SEL ((uint32_t)(1<<29))
#define ADC_ETC_CTRL_PRE_DIVIDER(n) ((uint32_t)(((n) & 0xff) << 16))
#define ADC_ETC_CTRL_EXT1_TRIG_PRIORITY(n) ((uint32_t)(((n) & 0x07) << 13))
#define ADC_ETC_CTRL_EXT1_TRIG_ENABLE ((uint32_t)(1<<12))
#define ADC_ETC_CTRL_EXT0_TRIG_PRIORITY(n) ((uint32_t)(((n) & 0x07) << 9))
#define ADC_ETC_CTRL_EXT0_TRIG_ENABLE ((uint32_t)(1<<8))
#define ADC_ETC_CTRL_TRIG_ENABLE(n) ((uint32_t)(((n) & 0xff) << 0))

#define ADC_ETC_DONE0_1_IRQ_TRIG7_DONE1 ((uint32_t)(1<<23))
#define ADC_ETC_DONE0_1_IRQ_TRIG6_DONE1 ((uint32_t)(1<<22))
#define ADC_ETC_DONE0_1_IRQ_TRIG5_DONE1 ((uint32_t)(1<<21))
#define ADC_ETC_DONE0_1_IRQ_TRIG4_DONE1 ((uint32_t)(1<<20))
#define ADC_ETC_DONE0_1_IRQ_TRIG3_DONE1 ((uint32_t)(1<<19))
#define ADC_ETC_DONE0_1_IRQ_TRIG2_DONE1 ((uint32_t)(1<<18))
#define ADC_ETC_DONE0_1_IRQ_TRIG1_DONE1 ((uint32_t)(1<<17))
#define ADC_ETC_DONE0_1_IRQ_TRIG0_DONE1 ((uint32_t)(1<<16))
#define ADC_ETC_DONE0_1_IRQ_TRIG7_DONE0 ((uint32_t)(1<<7))
#define ADC_ETC_DONE0_1_IRQ_TRIG6_DONE0 ((uint32_t)(1<<6))
#define ADC_ETC_DONE0_1_IRQ_TRIG5_DONE0 ((uint32_t)(1<<5))
#define ADC_ETC_DONE0_1_IRQ_TRIG4_DONE0 ((uint32_t)(1<<4))
#define ADC_ETC_DONE0_1_IRQ_TRIG3_DONE0 ((uint32_t)(1<<3))
#define ADC_ETC_DONE0_1_IRQ_TRIG2_DONE0 ((uint32_t)(1<<2))
#define ADC_ETC_DONE0_1_IRQ_TRIG1_DONE0 ((uint32_t)(1<<1))
#define ADC_ETC_DONE0_1_IRQ_TRIG0_DONE0 ((uint32_t)(1<<0))

#define ADC_ETC_DONE2_ERR_TRIG7_ERR ((uint32_t)(1<<23))
#define ADC_ETC_DONE2_ERR_TRIG6_ERR ((uint32_t)(1<<22))
#define ADC_ETC_DONE2_ERR_TRIG5_ERR ((uint32_t)(1<<21))
#define ADC_ETC_DONE2_ERR_TRIG4_ERR ((uint32_t)(1<<20))
#define ADC_ETC_DONE2_ERR_TRIG3_ERR ((uint32_t)(1<<19))
#define ADC_ETC_DONE2_ERR_TRIG2_ERR ((uint32_t)(1<<18))
#define ADC_ETC_DONE2_ERR_TRIG1_ERR ((uint32_t)(1<<17))
#define ADC_ETC_DONE2_ERR_TRIG0_ERR ((uint32_t)(1<<16))
#define ADC_ETC_DONE2_ERR_TRIG7_DONE2 ((uint32_t)(1<<7))
#define ADC_ETC_DONE2_ERR_TRIG6_DONE2 ((uint32_t)(1<<6))
#define ADC_ETC_DONE2_ERR_TRIG5_DONE2 ((uint32_t)(1<<5))
#define ADC_ETC_DONE2_ERR_TRIG4_DONE2 ((uint32_t)(1<<4))
#define ADC_ETC_DONE2_ERR_TRIG3_DONE2 ((uint32_t)(1<<3))
#define ADC_ETC_DONE2_ERR_TRIG2_DONE2 ((uint32_t)(1<<2))
#define ADC_ETC_DONE2_ERR_TRIG1_DONE2 ((uint32_t)(1<<1))
#define ADC_ETC_DONE2_ERR_TRIG0_DONE2 ((uint32_t)(1<<0))

#define ADC_ETC_DMA_CTRL_TRIG7_REQ ((uint32_t)(1<<23))
#define ADC_ETC_DMA_CTRL_TRIG6_REQ ((uint32_t)(1<<22))
#define ADC_ETC_DMA_CTRL_TRIG5_REQ ((uint32_t)(1<<21))
#define ADC_ETC_DMA_CTRL_TRIG4_REQ ((uint32_t)(1<<20))
#define ADC_ETC_DMA_CTRL_TRIG3_REQ ((uint32_t)(1<<19))
#define ADC_ETC_DMA_CTRL_TRIG2_REQ ((uint32_t)(1<<18))
#define ADC_ETC_DMA_CTRL_TRIG1_REQ ((uint32_t)(1<<17))
#define ADC_ETC_DMA_CTRL_TRIG0_REQ ((uint32_t)(1<<16))
#define ADC_ETC_DMA_CTRL_TRIG7_ENABLE ((uint32_t)(1<<7))
#define ADC_ETC_DMA_CTRL_TRIG6_ENABLE ((uint32_t)(1<<6))
#define ADC_ETC_DMA_CTRL_TRIG5_ENABLE ((uint32_t)(1<<5))
#define ADC_ETC_DMA_CTRL_TRIG4_ENABLE ((uint32_t)(1<<4))
#define ADC_ETC_DMA_CTRL_TRIG3_ENABLE ((uint32_t)(1<<3))
#define ADC_ETC_DMA_CTRL_TRIG2_ENABLE ((uint32_t)(1<<2))
#define ADC_ETC_DMA_CTRL_TRIG1_ENABLE ((uint32_t)(1<<1))
#define ADC_ETC_DMA_CTRL_TRIG0_ENABLE ((uint32_t)(1<<0))

#define ADC_ETC_TRIG_CTRL_SYNC_MODE ((uint32_t)(1<<16))
#define ADC_ETC_TRIG_CTRL_TRIG_PRIORITY(n) ((uint32_t)(((n) & 0x07) << 12))
#define ADC_ETC_TRIG_CTRL_TRIG_CHAIN(n) ((uint32_t)(((n) & 0x07) << 8))
#define ADC_ETC_TRIG_CTRL_TRIG_MODE ((uint32_t)(1<<4))
#define ADC_ETC_TRIG_CTRL_SW_TRIG ((uint32_t)(1<<0))

#define ADC_ETC_TRIG_COUNTER_SAMPLE_INTERVAL(n) ((uint32_t)(((n) & 0xff) << 16))
#define ADC_ETC_TRIG_COUNTER_INIT_DELAY(n) ((uint32_t)(((n) & 0xff) << 0))

#define ADC_ETC_TRIG_CHAIN_IE1(n) ((uint32_t)(((n) & 0x03) << 29))
#define ADC_ETC_TRIG_CHAIN_B2B1 ((uint32_t)(1<<28))
#define ADC_ETC_TRIG_CHAIN_HWTS1(n) ((uint32_t)(((n) & 0xff) << 20))
#define ADC_ETC_TRIG_CHAIN_CSEL1(n) ((uint32_t)(((n) & 0x0f) << 16))
#define ADC_ETC_TRIG_CHAIN_IE0(n) ((uint32_t)(((n) & 0x03) << 13))
#define ADC_ETC_TRIG_CHAIN_B2B0 ((uint32_t)(1<<12))
#define ADC_ETC_TRIG_CHAIN_HWTS0(n) ((uint32_t)(((n) & 0xff) << 4))
#define ADC_ETC_TRIG_CHAIN_CSEL0(n) ((uint32_t)(((n) & 0x0f) << 0))
// DO we copy for chain 3-2 ...

#define ADC_ETC_TRIG_RESULT_DATA1(n) ((uint32_t)(((n) & 0xff) << 16))
#define ADC_ETC_TRIG_RESULT_DATA0(n) ((uint32_t)(((n) & 0xff) << 0))

Note: depending on if/how we create the structure, may need to edit the above, like:

#define ADC_ETC_TRIG_CHAIN_IE1(n) ((uint32_t)(((n) & 0x03) << 29))
#define ADC_ETC_TRIG_CHAIN_B2B1 ((uint32_t)(1<<28))
#define ADC_ETC_TRIG_CHAIN_HWTS1(n) ((uint32_t)(((n) & 0xff) << 20))
#define ADC_ETC_TRIG_CHAIN_CSEL1(n) ((uint32_t)(((n) & 0x0f) << 16))
#define ADC_ETC_TRIG_CHAIN_IE0(n) ((uint32_t)(((n) & 0x03) << 13))
#define ADC_ETC_TRIG_CHAIN_B2B0 ((uint32_t)(1<<12))
#define ADC_ETC_TRIG_CHAIN_HWTS0(n) ((uint32_t)(((n) & 0xff) << 4))
#define ADC_ETC_TRIG_CHAIN_CSEL0(n) ((uint32_t)(((n) & 0x0f) << 0))

To either create duplicates for each of the other CHAIN/RESULT B2B1 would be B4B2 ... for the 2nd one...
Or leave or change N and NP1... or NP1 NP2...

Thoughts?

neurofun
12-16-2019, 05:51 PM
@neurofun

Those are really strange results you are getting with your RA8875. Watching the display there is no noticeable delay at all. I can't seem to duplicate your problems with my Adafruit display. It's like your SPI settings are way off. To get anywhere near what you are seeing I have to run at a 180 kHz clock vs 18 MHz, then I will see about 2700ms to update the screen.

It is indeed very strange. The display i'm using is the 4.3" from buydisplay.
I even wonder if it has anything to do with SPI because in my main application I also use the SPI1 bus to read the contents of 10 shiftregisters(74hc165) @2MHz.
The time to read the shiftregisters is about 33usec both in beta2 & beta3.

@KurtE
yes teensy4 and 800x480 4.3" RA8875 from buydisplay.

mjs513
12-16-2019, 05:52 PM
@neurofun
Just installed b2 over a copy of the 1.8.10 I had already installed. I am still seeing 272 ms/page @600 MHz for the Adafruit display. Unless it's something with the Adafruit display vs the buydisplay RA8875.

EDIT: Oops - cross post.

defragster
12-16-2019, 05:55 PM
The slowdown is clearly visible on the display.
Changing clockspeed.c does not resolve the slowdown.


Yes, I'm sure about 2940ms. But that was yesterday.
Today the times vary around 2300ms, which I really don't understand.

After investigating a little further I noticed changing USB Type has an influence on the speed.
Serial = +-2000ms
Serial + MIDI = +-2000ms
MIDI = +-2300ms
Raw HID = +-2300ms

With that change reverted AFAIK it says any slowdown isn't related to that clockspeed.c change.

And if slowdown clearly visible - it isn't a time measure issue.

manitou
12-16-2019, 06:50 PM
I ran neurofun's sketch on T4@600mhz, but I have nothing attached to SPI (i have no RA8875). With scope hooked to 13 (SPI CLK), here are my results for various SPI CLK settings in tft.begin()

Teensyduino 1.48 T4@600mhz -O2
SPImhz   ms    scopemhz
def      237   18.9
8        437   7.6
13       304   12.5
16       276   15.1
50       173   43.7

Teensyduino 1.49-beta3
SPImhz   ms    scopemhz
def      196   20.8
8        390   8
13       281   12.6
16       239   15.9
30       164   35
50       132   54

this is using mjs513's branch of RA8875 lib (_t4)

neurofun
12-16-2019, 07:11 PM
Located the problem in SPI.cpp around line 1281
changed


CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_LPSPI_PODF_MASK | CCM_CBCMR_LPSPI_CLK_SEL_MASK)) |
CCM_CBCMR_LPSPI_PODF(2) | CCM_CBCMR_LPSPI_CLK_SEL(1); // pg 714
// CCM_CBCMR_LPSPI_PODF(6) | CCM_CBCMR_LPSPI_CLK_SEL(2); // pg 714

back to


CCM_CBCMR = (CCM_CBCMR & ~(CCM_CBCMR_LPSPI_PODF_MASK | CCM_CBCMR_LPSPI_CLK_SEL_MASK)) |
// CCM_CBCMR_LPSPI_PODF(2) | CCM_CBCMR_LPSPI_CLK_SEL(1); // pg 714
CCM_CBCMR_LPSPI_PODF(6) | CCM_CBCMR_LPSPI_CLK_SEL(2); // pg 714

and my display performs the same as in beta2.

mjs513
12-16-2019, 08:36 PM
@neurofun - @KurtE

Curious - made the change to SPI.cpp in post #25 and it is still showing 272 ms/page @18Mhz SPI. Makes it more curious that @manitou showed in post #24 SPI clock is actually faster in beta #3.

BTW: what spi clock are you running at?

EDIT: If I run at 22 MHz in beta #3 it takes 161,280 ms/page. That makes sense, since the 22 MHz setting would equate to something like 24-25 MHz in actuality with beta3. In beta2, 22 MHz would probably be below the speed limit. You are over the speed limit for the RA8875.

@22Mhz - 161.280 ms/page
@20Mhz - 421 ms/page
@18Mhz - 259 ms/page
@16Mhz - 259 ms/page
@14Mhz - 271 ms/page
@12Mhz - 271 ms/page

This is the warning from the usersetting.h file:

After som mail exchange with RAiO I solved the dilemma behind SPI speed limit:
The RA8875 has limitation of 12Mhz SPI but this has been set because not all internal macros
can run over that speed, the library automatically deal with this so I was able to go over 20Mhz!
At that speed you need to short cables as much you can, provide clean supply and good decoupling!
DO NOT Exceed 23Mhz for RA8875! It will result in garbage on screen or run very slow.

So it looks like the problem is with the RA8875, not beta3.

EDIT2: can you run your display at about 18 or 19Mhz with Beta3
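
To see how the same 22 MHz request can land on different real clocks, here is a host-side sketch of a divider calculation. Everything in it is an assumption made for illustration: the function names are hypothetical, the rounding rule (round the divider up, minimum divider of 2, maximum of 257) is chosen because it reproduces the scope readings and CCR values posted in this thread rather than being copied from SPI.cpp, and the LPSPI root clocks of 528/7 MHz and 720/3 MHz are inferred from the PODF values in the lines quoted in post #25.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: pick the smallest divider whose output does not exceed
// the requested SCK frequency.
uint32_t sckDividerFor(uint32_t rootHz, uint32_t requestedHz) {
    assert(rootHz > 0);
    uint32_t d = requestedHz ? rootHz / requestedHz : rootHz;
    if (d && rootHz / d > requestedHz) d++;  // round up, never overshoot
    if (d > 257) d = 257;                    // assumed largest divider
    if (d < 2) d = 2;                        // assumed smallest divider
    return d;
}

// Resulting SCK frequency for a given root clock and request.
uint32_t actualSckHz(uint32_t rootHz, uint32_t requestedHz) {
    return rootHz / sckDividerFor(rootHz, requestedHz);
}
```

Under these assumptions a 22 MHz request gives about 18.9 MHz from the 528/7 MHz root and about 21.8 MHz from the 240 MHz root, so the same setting would sit noticeably closer to the 23 MHz warning with the newer clock source.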

neurofun
12-16-2019, 09:39 PM
@mjs513
beta3 with changes from post #25
running at the default 22MHz showing 284ms
changed to 18MHz and now showing 275ms

unmodified beta3
@17MHz -> 246ms
@18MHz -> 248ms
@19MHz -> 256ms

Problem resolved, thank you very much!!!

It might be a good idea to change the MAXSPISPEED for T4 in RA8875UserSettings.h from 22MHz to 18MHz.

mjs513
12-16-2019, 09:55 PM
@neurofun

Unfortunately the speed limit with the RA8875 didn't register until @manitou posted his measurements and I went in to retest. Going to change to 18mhz

defragster
12-16-2019, 10:41 PM
As far as I know, no bugs have been found with millis().

However, micros() had at least 2 bugs. Neither is new. These have been with us since Teensy 4.0 release.

Functions like delay() which use micros() internally are also affected by these micros() bugs.

Bug #1: (fixed) When configured for 24 MHz speed, we were actually running at 30 MHz, but F_CPU_ACTUAL was set to 24000000. When F_CPU_ACTUAL is wrong, micros() computes incorrect output.

Bug #2: (not fixed) At certain slower CPU speeds, micros() gives wrong results just before millis() increments. For example, if millis() is currently 499 and about to turn to 500, the bug in micros() might result in output like 500001, even though micros() should never report more than 499999 while millis() is still 499.

So far, neither of these bugs has been found while running at 600 or 528 MHz.

Indeed - looking at delay() it relies on micros() - rewrote it to work with millis() and it shows proper function. So the millis systick isn't the trouble.

This line in micros() gets the partial microsecond since the last clock tick using ARM_DWT_CYCCNT expressed in the ccdelta :: usec = 1000*smc + (ccdelta/(F_CPU_ACTUAL/1000000));

Fixed Bug #1 impact from wrong F_CPU_ACTUAL is apparent where CYCCNT is not in sync with cycle counts/second.

Re Bug #2: Looking around not seeing what could be behind this. Anytime the sysTick recorded ARM_DWT_CYCCNT gets out of sync with actual tick or the F_CPU_ACTUAL it can get messy. On set_arm_clock() there will be change in rate, between ticks - but that should reset on the next sysTick.

On T_3.6 the actual ARM_DWT_CYCCNT was under F_CPU based on GPS PPS. Not sure what happens from alternate clock on T4 "At certain slower CPU speeds" - if the cycles run past expected count it could round UP before sysTick resets? But that should be a small window - and only systick_isr() controls the clock values, micros() is read only with code to try to get atomic reads of tick data.
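
As a side note on the quoted micros() line, the integer division F_CPU_ACTUAL/1000000 drops the fractional MHz. The host-side sketch below only explores that arithmetic; the function names are hypothetical and it makes no claim that this is the root cause of Bug #2. It does show that truncation alone could push the sub-millisecond part past 999 at 129.6 MHz, which is the size of the 4 us backward steps in the 4408004 -> 4408000 log earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Same integer arithmetic as the quoted expression:
// ccdelta / (F_CPU_ACTUAL / 1000000)
uint32_t fractionalMicros(uint32_t ccdelta, uint32_t fCpuActual) {
    assert(fCpuActual >= 1000000u);  // avoid dividing by zero
    return ccdelta / (fCpuActual / 1000000u);
}

// Largest result just before the next tick, when ccdelta is one cycle
// short of a full millisecond's worth of CPU cycles.
uint32_t worstCaseMicros(uint32_t fCpuActual) {
    uint32_t cyclesPerMs = fCpuActual / 1000u;
    return fractionalMicros(cyclesPerMs - 1u, fCpuActual);
}
```

At 600, 528 and 24 MHz the divisor is exact and the worst case stays at 999; at 129600000 Hz the divisor truncates to 129 and the worst case becomes 1004.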

manitou
12-16-2019, 10:52 PM
The max SPI clock for Teensyduino 1.48 was 37.7 MHz (528/7/2) with an effective data rate of 30.8 mbs. With the LPSPI clock changed in 1.49-beta3, the max SPI clock is 120 MHz. I ran some 1024-byte SPI transfers with MOSI jumpered to MISO and a scope on SPI CLK (pin 13). Tests would hang with SPI CLK set at 120 MHz, but at 80 MHz and lower the tests ran successfully with no errors betwixt MISO and MOSI. The max effective data rate was about 54 mbs. I printed out the SPI CCR register to confirm the prescale (SCKDIV and DBT).


1.48 results (Arduino 1.8.10):

SPICLK   CCR    datarate(mbs)   CCR MHz
3        C18    2.86            2.9
4        811    3.89            4
5        70E    4.61            4.7
8        408    7.27            7.5
14       204    11.82           12.6
16       103    14.0            15.1
20       102    17.2            18.9
40       0      30.8            37.7

1.49-beta3 results:

SPICLK   CCR    datarate(mbs)   scopeMHz
4        1D3A   3.97            4
8        E1C    7.9             8
24       408    23.1            24
40       204    37.4            40
48       103    42.7            47.6
60       102    54.2            60
80       001    54.2            79.2

spiperf sketch (https://github.com/manitou48/teensy4/blob/master/spiperf.ino)
For 1.49-beta3, you would need to change print to read

Serial.printf("SPICLOCK %d MHz CCR freq %.1f MHz\n", SPICLOCK / 1000000, 720. / 3 / ((0xff & LPSPI4_CCR) + 2));
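
The CCR columns in the tables above can be cross-checked on a host with the same expression as the print statement: the low byte of CCR (SCKDIV) plus 2 divides the LPSPI root clock. The function name below is hypothetical, and the root clocks passed in (720/3 MHz for 1.49-beta3, 528/7 MHz for 1.48) are taken from the figures in this post.

```cpp
#include <cassert>
#include <cstdint>

// SCK frequency in MHz implied by a CCR value: root / (SCKDIV + 2),
// where SCKDIV is the low 8 bits of the register.
double sckMHzFromCcr(double rootMHz, uint32_t ccr) {
    assert(rootMHz > 0.0);
    uint32_t sckdiv = ccr & 0xffu;
    return rootMHz / static_cast<double>(sckdiv + 2u);
}
```

For example, CCR 1D3A with a 240 MHz root gives 240 / (0x3A + 2) = 4 MHz, and CCR 0 with a 528/7 MHz root gives about 37.7 MHz, both matching the tables.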

mjs513
12-16-2019, 11:32 PM
The max SPI clock for Teensyduino 1.48 was 37.5 mhz with an effective data rate of 29.2 mbs. With LPSPI clock changed in 1.49-beta 3, max SPI clock is 120mhz. I ran some 1024-byte SPI transfers with MOSI jumpered to MISO, I ran some tests with scope on SPI CLK (pin 13). Tests would hang with SPI CLK set at 120mhz, but for 80 mhz and lower, tests ran successfully with no errors betwixt MISO and MOSI. The max effective data rate was about 54 mbs. I printed out SPI CCR register to confirm prescale (SCKDIV and DBT)

1.49-beta3 results:

SPICLK CCR datarate(mbs) scopeMHz
4 1D3A 3.97 4
8 E1C 7.9 8
24 408 23.1 24
40 204 37.4 40
48 103 42.7 47.6
60 102 54.2 60
80 001 54.2 79.2

spiperf sketch (https://github.com/manitou48/teensy4/blob/master/spiperf.ino)
For 1.49-beta3, you would need to change print to read

Serial.printf("SPICLOCK %d MHz CCR freq %.1f MHz\n", SPICLOCK / 1000000, 720. / 3 / ((0xff & LPSPI4_CCR) + 2));

@manitou
That's good info to keep handy. Thank you for testing.

manitou
12-17-2019, 12:54 AM
@manitou
That's good info to keep handy. Thank you for testing.
I added 1.48 SPI results to post #30

neroroxxx
12-17-2019, 07:50 AM
Just got to test beta 3: tested the MIDI USB Type with large SysEx dumps and it worked like a charm! The only thing I noticed that I found odd was that for Teensy 4.0 the IDE says it uses 10 times more dynamic memory than on Teensy 3.6.

This is for a blank sketch, just a setup and loop function.

t4.0 blank sketch Serial 600MHz Optimize "Faster"
Sketch uses 12448 bytes (0%) of program storage space. Maximum is 2031616 bytes.
Global variables use 41660 bytes (7%) of dynamic memory, leaving 482628 bytes for local variables. Maximum is 524288 bytes.


t3.6 blank sketch Serial 180MHz Optimize "Faster"
Sketch uses 9944 bytes (0%) of program storage space. Maximum is 1048576 bytes.
Global variables use 3828 bytes (1%) of dynamic memory, leaving 258316 bytes for local variables. Maximum is 262144 bytes.

defragster
12-17-2019, 08:00 AM
The T4 pulls code from FLASH to RAM to execute - the minimum allocation block for that is 32KB. That allows that code to run at zero wait states in sync with the processor. Code can be marked as FLASHMEM to keep it in flash where it will run slower unless it gets into and stays in cache.

With code from flash moved to RAM up to 32KB in your sketch - then program data is allocated after that in the remainder of the full speed RAM.

More details on memory here : https://www.pjrc.com/store/teensy40.html

That should detail that big first allocation and lead to more details on the T4's memory.

PaulStoffregen
12-17-2019, 08:06 AM
the only thing i noticed that i found odd was that for teensy 4.0 the IDE says it uses 10 times more dynamic memory than on teensy 3.6

That's normal, even though Arduino's description "Global variables" doesn't really apply to the much more complex memory structure on this chip.

The larger memory use comes mostly from 2 places.

1: Code is copied to RAM. That's the only way to run at the full 600 MHz speed. It's assigned in 32K chunks, so 32768 of those bytes are code, not actually variables.

2: USB uses much larger buffers to support 480 Mbit speed. On Teensy 3.x, the USB is limited to 12 Mbit speed (except the USB host port on Teensy 3.6). Also on those boards I designed the USB code to use a shared pool of buffers, which conserves memory, but it can also hurt performance in some cases of sustained large bidirectional data flow, especially if done simultaneously on multiple interfaces (a very rare case, but it has come up a few times over the years). It was originally designed for Teensy 3.0 which had 16K of RAM, and now supports Teensy LC with 8K of RAM. On Teensy 4.0, since there's so much more memory, I went with dedicated buffers per endpoint. It's a trade-off which puts performance first, but does cost more memory for buffers. Even though the CPU is ~5.5X faster than Teensy 3.6, the USB is 20X faster, so code optimizations & buffers are crucial to giving you access to the power of 480 Mbit USB.
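To put numbers on point 1, here is a rough sketch of the 32K rounding applied to the blank-sketch figures reported above. The function names are hypothetical, and treating the whole 12448-byte sketch image as code is an assumption that overstates the code size. It does not change the result here, since anything up to 32768 bytes still takes one block.

```cpp
#include <cassert>
#include <cstdint>

// Allocation granularity for code copied to RAM, from the explanation above.
constexpr uint32_t kCodeBlockBytes = 32768;

// RAM reserved for code, rounded up to whole 32K blocks.
constexpr uint32_t codeRamReserved(uint32_t codeBytes) {
  return ((codeBytes + kCodeBlockBytes - 1) / kCodeBlockBytes) * kCodeBlockBytes;
}

// What remains of the IDE's "Global variables" figure once the code blocks
// are subtracted (variables, USB buffers and so on).
constexpr uint32_t nonCodeBytes(uint32_t reportedGlobals, uint32_t codeBytes) {
  return reportedGlobals - codeRamReserved(codeBytes);
}

// Blank sketch on Teensy 4.0: 41660 bytes reported, 12448-byte image.
inline void blankSketchExample() {
  assert(codeRamReserved(12448) == 32768);
  assert(nonCodeBytes(41660, 12448) == 8892);
}
```

So of the 41660 bytes the IDE reports for the blank T4 sketch, about 8892 are not the code block, compared with 3828 bytes reported on Teensy 3.6.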

PaulStoffregen
12-17-2019, 01:28 PM
Re Bug #2: Looking around not seeing what could be behind this.

It's a subtle timing problem. When the elapsed time between handling milliseconds and cycle count is different in the systick interrupt than what the main program does, the result is a small error by the amount of that difference. The interrupt almost always handles these with minimum delay, but the micros() code does not, especially when it suffers being interrupted.

When running at very high CPU speeds, the error is always (or almost always - we simply haven't found ways to stress it enough with other interrupts) a tiny amount less than 1 microsecond, which gets discarded when the division operation rounds down. At slower speeds, the error can be 1 microsecond or more.

We can't fully eliminate this error, especially with interrupts remaining enabled, but the results can at least be forced to always be monotonic by simply checking that the fractional part added to the milliseconds is not more than 1 millisecond. Hopefully that will be good enough.

I pushed a couple commits which hopefully will be a permanent fix.

https://github.com/PaulStoffregen/cores/commit/bd4a9e2cfb75ff4c8872cb69a545cea578c87d23

https://github.com/PaulStoffregen/cores/commit/041bfbe9ed85b13f1a833b6c81fe2cd54d1b1ac6

This was a particularly tough problem. It's been quite a while since I've had to dig so much into the compiler's generated asm to see...
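The clamp described above can be modelled on a PC. This is a simplified sketch under stated assumptions, not the code from the commits: it takes a snapshot of the millisecond count and cycle count as saved at the last tick plus a later cycle counter read, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified host-side model of a clamped micros() calculation.
// msAtTick, cycAtTick: values assumed to be saved by the systick interrupt.
// cycNow: cycle counter as read by the caller some time later.
inline uint32_t microsModel(uint32_t msAtTick, uint32_t cycAtTick,
                            uint32_t cycNow, uint32_t fCpuHz) {
  assert(fCpuHz > 0);
  uint32_t delta = cycNow - cycAtTick;  // unsigned subtraction survives counter wrap
  // Convert cycles to microseconds; the integer division rounds down.
  uint32_t frac = (uint32_t)(((uint64_t)delta * 1000000u) / fCpuHz);
  // Never report more than one full millisecond past the saved tick.
  if (frac > 1000) frac = 1000;
  return msAtTick * 1000u + frac;
}
```

For example, with F_CPU = 129600000 and a read that lands 130119 cycles after the saved tick, the unclamped fraction would be 1004 us. The clamp returns ms*1000 + 1000 instead, so a read taken just after the next tick (ms+1, fraction 0) cannot come out lower.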

neroroxxx
12-17-2019, 02:18 PM
OK, that makes sense. Even though I've had the 4.0 for a month I never really used it because there was no USB MIDI, so I was patiently waiting for it. Last night when I went to bed I checked the forum and saw the announcement, so I rushed back to my desk to download the beta and test MIDI. So far large SysEx dumps back and forth worked like a charm. I'm going to test setting the Teensy 4.0 as a clock master with USB MIDI and see how it performs, but otherwise I'm not sure what other tests need to be done for USB MIDI; all the common uses seem to work like a charm!

Also I forgot to mention USB MIDI is insanely fast!

mjs513
12-17-2019, 03:21 PM
Paul
I reran the ClocksT4 sketch as modified by @defragster and it looks like the fix resolved the issue associated with the clocks as well - no bugbug reports for the speeds tested. For the clock of 129600000:

F_CPU=129600000 50ms delay:: 50001 us and 50 ms
System Clock: 129600000
IPG Clock: 129600000
Semc Clock: 43200000
RTC Clock: 32768
USB1pll Clock: 480000000
Peripheral Clock: 24000000
Osc Clock: 24000000
Arm Clock: 648000000
...

KurtE
12-17-2019, 06:12 PM
@mjs513 (me and ...) - I think we probably need to look at some of these changes...

I thought I would try out an ILI9341_t3 display on the current stuff, so I tried breadboard and not getting any display out of graphictest (either ili9341_t3 or _t3n)...
Maybe display bad, tried different one... Maybe wiring wrong so moved to another T4... Still nothing...

Brought up Arduino 1.8.9 with 1.47 and works...

Argh...

KurtE
12-17-2019, 06:31 PM
Update, I then went back and reran the Beta3 on 1.8.10 and it worked? Not sure what the difference is now? Loose wire?
Logic analyzer is saying that it is trying to run SPI > 31mhz...

Or maybe it likes logic analyzer hooked up?

Maybe false alarm?

Update: although a couple test cases using ili9341_t3n are not working (not sure if that is our SPIN...)

Edit2 ILI9341_t3n - Asked beginTransaction to run at 144mhz so it got over 100... which did not work... So backed it down again to 30mhz...
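Why a 144 MHz request ends up over 100 MHz can be sketched with a divider model. This is reconstructed from the CCR values in manitou's tables earlier in the thread, not taken from the SPI library source, so the real code may differ in details. It assumes a 240 MHz root clock (720/3), an actual clock of root/(SCKDIV+2), and DBT = SCKDIV/2; the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct SpiClockChoice {
  uint32_t ccr;       // SCKDIV in bits 0-7, DBT (assumed SCKDIV/2) in bits 8-15
  uint32_t actualHz;  // resulting SCK frequency
};

// Pick the smallest total divider d >= 2 such that rootHz / d <= requestHz.
inline SpiClockChoice chooseSpiClock(uint32_t rootHz, uint32_t requestHz) {
  assert(requestHz > 0);
  uint32_t d = rootHz / requestHz;            // truncating division
  if (d && rootHz / d > requestHz) d++;       // round up so the request is not exceeded
  if (d < 2) d = 2;                           // SCKDIV = 0 still divides by 2
  if (d > 257) d = 257;                       // SCKDIV is an 8-bit field
  uint32_t sckdiv = d - 2;
  return { sckdiv | ((sckdiv / 2) << 8), rootHz / d };
}
```

Under these assumptions a 144 MHz request cannot be met by any divider below the minimum of 2, so it gets 240/2 = 120 MHz, the speed where manitou saw hangs. A 30 MHz request gets exactly 240/8 = 30 MHz.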

mjs513
12-17-2019, 07:32 PM
@KurtE
Sorry just saw this.

Just tested Buddhabrot, graphicstest and demosauce (had to fix font name); they all seemed to be working at 30 MHz. Didn't do any tests with increased speed. Do want to test ST7735_t3 at some point but keep getting distracted.

mjs513
12-17-2019, 07:44 PM
@KurtE

Just tested ILI9488 = seems busted - nothing running on my display. Tried changing SPI but still not working. Have to go back to make sure I have the right version.

EDIT: False alarm - it worked at 30 and 72Mhz - demosauce and graphicstest

defragster
12-18-2019, 01:54 AM
It's a subtle timing problem. When the elapsed time between handling milliseconds and cycle count is different in the systick interrupt than what the main program does, the result is a small error by the amount of that difference. The interrupt almost always handles these with minimum delay, but the micros() code does not, especially when it suffers being interrupted.

When running at very high CPU speeds, the error is always (or almost always - we simply haven't found ways to stress it enough with other interrupts) a tiny amount less than 1 microsecond, which gets discarded when the division operation rounds down. At slower speeds, the error can be 1 microsecond or more.

We can't fully eliminate this error, especially with interrupts remaining enabled, but the results can at least be forced to always be monotonic by simply checking that the fractional part added to the milliseconds is not more than 1 millisecond. Hopefully that will be good enough.

I pushed a couple commits which hopefully will be a permanent fix.

https://github.com/PaulStoffregen/cores/commit/bd4a9e2cfb75ff4c8872cb69a545cea578c87d23

https://github.com/PaulStoffregen/cores/commit/041bfbe9ed85b13f1a833b6c81fe2cd54d1b1ac6

This was a particularly tough problem. It's been quite a while since I've had to dig so much into the compiler's generated asm to see...

<Good timing was just coming to ask about your test code?>

Yes, when that Frac could jump a us in advance was the only way I saw it generating a problem - but wasn't sure adding code to limit was the answer.

I'll pull those changes and see what I see in a test sketch I've been working with 24, 36, 600 MHz.

With 1.49 Beta 3 code:: In doing 32K samples ( counted while() with 10 inline doing array index increment on each call ) it looks like 39 cycles per iteration average - so having 2 us between calls is to be expected at 24 MHz. Last run shows zero over 1us between calls at 600 and 396 MHz. I'll have to add ZERO us delta to stats to see new code variance.

In my sketch I'm seeing some USB garbage and output order misqueues/missing in output dumping results of 32K successive runs in inline groups of 10 - didn't help boiling it down :( Even added a delay 10us on each newline.

Also finding when the sketch is running that I have to push the button to reprogram when it runs tight {non delay} code

defragster
12-18-2019, 09:17 AM
@Paul: re p#36 / p#43 using the updated github cores in the sketch here at 24 MHz:

Short Summary: You did good with the ASM and rework - micros() is running 3 cycles faster by this measure. And net effect looks like a reduction of 24% of sequential calls to micros() returning 2 us or more later, and 37% more returning in 1 us.
And no indication of resulting side effects.

The data for that using github updated with 'frac':

15529 at 1us // 37% more here
17186 at 2us
9 at 3us
40 at 4us
2 at 5us
1 at 16us

Test cycles per iteration 36 Test Cnt over 1us 17238 of 32767 // 24% fewer here

And testing with 1.49 b3 code:

11334 at 1us
21377 at 2us
1 at 3us
40 at 4us
14 at 5us
1 at 16us

Test cycles per iteration 39 Test Cnt over 1us 21433 of 32767


Same results at 36 MHz are more dramatically better with FRAC:

32198 at 1us
535 at 2us
34 at 3us

Test cycles per iteration 36 Test Cnt over 1us 569 of 32767

td1.49b3 at 36 MHz:

3 at 0us
29653 at 1us
3081 at 2us
30 at 3us

Test cycles per iteration 39 Test Cnt over 1us 3111 of 32767
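The counts above are a histogram of the difference between successive micros() readings, where "over 1us" is every delta of 2 us or more (e.g. 17186 + 9 + 40 + 2 + 1 = 17238). A minimal host-side sketch of that bookkeeping follows; it is not the actual test sketch, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct DeltaStats {
  std::map<uint32_t, uint32_t> histogram;  // delta in us -> number of occurrences
  uint32_t overOneUs = 0;                  // deltas of 2 us or more
};

// Summarize the gaps between successive timestamps (e.g. micros() readings).
inline DeltaStats summarizeDeltas(const std::vector<uint32_t>& stamps) {
  DeltaStats stats;
  for (std::size_t i = 1; i < stamps.size(); ++i) {
    uint32_t delta = stamps[i] - stamps[i - 1];  // unsigned math survives wrap
    ++stats.histogram[delta];
    if (delta > 1) ++stats.overOneUs;
  }
  return stats;
}
```

With 32768 readings this yields 32767 deltas, matching the "of 32767" totals in the results.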

PaulStoffregen
12-18-2019, 09:33 AM
More speedup in micros() might be possible if the 3 state variables are put into a struct. That way the compiler only has to load a single address to access all 3. But that might break other code which has "systick_millis_count", so some sort of alias special linker symbol might be needed if that becomes a problem.

I'm not planning to try this now, but it's an idea for future optimization...
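A rough sketch of what that struct idea could look like, for discussion only. Only systick_millis_count is named above; the struct name and the other two member names are assumptions. The alias shown is a plain C++ reference, which is not the linker-level alias mentioned above. C or asm code that links against the old symbol directly would not be covered by it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical grouping of the three micros() state variables so that
// one base address reaches all of them.
struct SystickState {
  volatile uint32_t millis_count;  // milliseconds since start
  volatile uint32_t cycle_count;   // cycle counter saved at the last tick (assumed)
  volatile uint32_t safe_read;     // flag used to detect an interrupted read (assumed)
};

SystickState systick_state = {0, 0, 0};

// C++-only alias so existing C++ code using the old name still compiles.
volatile uint32_t& systick_millis_count = systick_state.millis_count;

inline void aliasDemo() {
  systick_millis_count = 42;                 // write through the old name
  assert(systick_state.millis_count == 42);  // lands in the struct member
}
```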

defragster
12-18-2019, 10:23 AM
If it helped the two 'new to t4' values are easy - the systick_millis_count would be bad without reference leaving the old behind - looks like reference& is a .cpp feature?

/\ cross posted p#44 edit with 36 MHz showing about 5.4x fewer (3111 vs 569) over 1us as tested.