Teensy 4.0 First Beta Test

Status
Not open for further replies.
push program button after load

I have to push the program button after every load. The first load seems to progress OK (the progress bar does its thing), but afterwards there seems to be no /dev/ttyACM0 (power consumption drops to 49 mA; it is usually about 88 mA). When I push the program button, the progress bar does a quick thing, then the IDE monitor window comes to life and the program runs. Even running blink requires the button push. This happens on two Linux/Ubuntu 32-bit boxes with 1.8.8 and 1.46-beta5 (and earlier), and on an Ubuntu 64-bit box with 1.8.7 and 1.46-beta5. Attached is log.txt from Verbose.

(I haven't tried it on mac or windows)
 

Attachments: log.txt (30.7 KB)
I have to push the program button after every load. ... This happens on two linux/ubuntu 32-bit boxes 1.8.8 1.46-beta5 (and earlier), and on a ubuntu 64-bit box with 1.8.7 and 1.46-beta 5.

It's definitely working for me on Ubuntu 64 bit.

Try Verify in Arduino (so Teensy Loader is ready to go), then from a terminal in the arduino-1.8.8 folder, try running "./hardware/tools/teensy_reboot -v".

Here's what it prints on my system.

Code:
add device: subsys=usb, type=usb_device, location=/sys/devices/pci0000:00/0000:00:14.0/usb4/4-14/4-14.2
  devnode=/dev/bus/usb/004/061, subsystem=usb, ifacenum=-1
add child:  subsys=usb, type=usb_interface, location=/sys/devices/pci0000:00/0000:00:14.0/usb4/4-14/4-14.2/4-14.2:1.0
  parent location=/sys/devices/pci0000:00/0000:00:14.0/usb4/4-14/4-14.2
  model=35 (Teensy 4-Beta)
add child:  subsys=tty, type=(null), location=/sys/devices/pci0000:00/0000:00:14.0/usb4/4-14/4-14.2/4-14.2:1.0/tty/ttyACM0
  parent location=/sys/devices/pci0000:00/0000:00:14.0/usb4/4-14/4-14.2
  devnode=/dev/ttyACM0, subsystem=tty, ifacenum=0
add child:  subsys=usb, type=usb_interface, location=/sys/devices/pci0000:00/0000:00:14.0/usb4/4-14/4-14.2/4-14.2:1.1
  parent location=/sys/devices/pci0000:00/0000:00:14.0/usb4/4-14/4-14.2
found Teensy Loader, version 1.46
Sending command: show:arduino_attempt_reboot
Sending command: comment: Teensyduino 1.46-beta6 - LINUX64 (teensy_reboot)
do_reset (serial) /dev/ttyACM0
status read, retry 0
status read, retry 1
status read, retry 2
status read, retry 3
status read, retry 4
status read, retry 5
Success
 
Code:
hardware/tools/teensy_reboot -v
add device: subsys=usb, type=usb_device, location=/sys/devices/pci0000:00/0000:00:1d.7/usb1/1-5/1-5.4
  devnode=/dev/bus/usb/001/059, subsystem=usb, ifacenum=-1
add child:  subsys=usb, type=usb_interface, location=/sys/devices/pci0000:00/0000:00:1d.7/usb1/1-5/1-5.4/1-5.4:1.0
  parent location=/sys/devices/pci0000:00/0000:00:1d.7/usb1/1-5/1-5.4
  model=35 (Teensy 4-Beta)
add child:  subsys=tty, type=(null), location=/sys/devices/pci0000:00/0000:00:1d.7/usb1/1-5/1-5.4/1-5.4:1.0/tty/ttyACM0
  parent location=/sys/devices/pci0000:00/0000:00:1d.7/usb1/1-5/1-5.4
  devnode=/dev/ttyACM0, subsystem=tty, ifacenum=0
add child:  subsys=usb, type=usb_interface, location=/sys/devices/pci0000:00/0000:00:1d.7/usb1/1-5/1-5.4/1-5.4:1.1
  parent location=/sys/devices/pci0000:00/0000:00:1d.7/usb1/1-5/1-5.4
found Teensy Loader, version 1.46
Sending command: show:arduino_attempt_reboot
Sending command: comment: Teensyduino 1.46-beta5 - LINUX32 (teensy_reboot)
do_reset (serial) /dev/ttyACM0
Unable to open /dev/ttyACM0 for reboot request
status read, retry 0
status read, retry 1
status read, retry 2
status read, retry 3
status read, retry 4
status read, retry 5
status read, retry 6
...
status read, retry 48
status read, retry 49
Teensy did not respond to a USB-based request to enter program mode.
Please press the PROGRAM MODE BUTTON on your Teensy to upload your sketch.
 
Unable to open /dev/ttyACM0 for reboot request

Well, that doesn't look right. What do you get with "ls -l /dev/ttyACM0"?

Should be like this.

Code:
crw-rw-rw- 1 root dialout 166, 0 Jan  2 09:17 /dev/ttyACM0

If not, maybe the udev rule file isn't installed properly?
 
crw-rw-rw- 1 root plugdev 166, 0 Jan 2 12:20 /dev/ttyACM0


I'll check udev, but T3.2 etc. all build OK.
 
Got an I2S input program working

Code:
// I2S input test with DMA
//
// some missing macros/definitions
#define CCM_ANALOG_PLL_AUDIO_POST_DIV_SELECT(n) ((uint32_t)(((n) & 0x03)<<19)) 
#define CCM_ANALOG_PLL_AUDIO_BYPASS ((uint32_t)(1<<16)) 
#define CCM_ANALOG_PLL_AUDIO_BYPASS_CLK_SRC(n) ((uint32_t)(((n) & 0x03)<<14)) 
#define CCM_ANALOG_PLL_AUDIO_ENABLE ((uint32_t)(1<<13)) 
#define CCM_ANALOG_PLL_AUDIO_POWERDOWN ((uint32_t)(1<<12)) 
#define CCM_ANALOG_PLL_AUDIO_DIV_SELECT(n) ((uint32_t)((n) & ((1<<6)-1))) 

#define CCM_ANALOG_MISC2_DIV_MSB (1u<<23)
#define CCM_ANALOG_MISC2_DIV_LSB (1u<<15)

#define CCM_ANALOG_PLL_AUDIO_NUM_MASK (((1<<29)-1))
#define CCM_ANALOG_PLL_AUDIO_DENOM_MASK (((1<<29)-1))

#define CCM_CSCMR1_SAI1_CLK_SEL_MASK (CCM_CSCMR1_SAI1_CLK_SEL(0x03))
#define CCM_CS1CDR_SAI1_CLK_PRED_MASK (CCM_CS1CDR_SAI1_CLK_PRED(0x07))
#define CCM_CS1CDR_SAI1_CLK_PODF_MASK (CCM_CS1CDR_SAI1_CLK_PODF(0x3f))

#define CCM_CSCMR1_SAI2_CLK_SEL_MASK (CCM_CSCMR1_SAI2_CLK_SEL(0x03))
#define CCM_CS2CDR_SAI2_CLK_PRED_MASK (CCM_CS2CDR_SAI2_CLK_PRED(0x07))
#define CCM_CS2CDR_SAI2_CLK_PODF_MASK (CCM_CS2CDR_SAI2_CLK_PODF(0x3f))

#define CCM_CSCMR1_SAI3_CLK_SEL_MASK (CCM_CSCMR1_SAI3_CLK_SEL(0x03))
#define CCM_CS1CDR_SAI3_CLK_PRED_MASK (CCM_CS1CDR_SAI3_CLK_PRED(0x07))
#define CCM_CS1CDR_SAI3_CLK_PODF_MASK (CCM_CS1CDR_SAI3_CLK_PODF(0x3f))
//
//
void set_audioClock(int nfact = 27, int32_t mult=0, uint32_t div=1)
{ 
  CCM_ANALOG_PLL_AUDIO = 0;
  //CCM_ANALOG_PLL_AUDIO |= CCM_ANALOG_PLL_AUDIO_BYPASS;
  CCM_ANALOG_PLL_AUDIO |= CCM_ANALOG_PLL_AUDIO_ENABLE;
  CCM_ANALOG_PLL_AUDIO |= CCM_ANALOG_PLL_AUDIO_POST_DIV_SELECT(2); // 0: 1/4; 1: 1/2; 2: 1/1
  CCM_ANALOG_PLL_AUDIO |= CCM_ANALOG_PLL_AUDIO_DIV_SELECT(nfact);    
  
  CCM_ANALOG_PLL_AUDIO_NUM   = mult &CCM_ANALOG_PLL_AUDIO_NUM_MASK;
  CCM_ANALOG_PLL_AUDIO_DENOM = div &CCM_ANALOG_PLL_AUDIO_DENOM_MASK;
  
  const int div_post_pll = 1; // other values: 2,4
  CCM_ANALOG_MISC2 &= ~(CCM_ANALOG_MISC2_DIV_MSB | CCM_ANALOG_MISC2_DIV_LSB);
  if(div_post_pll>1)
    CCM_ANALOG_MISC2 |= CCM_ANALOG_MISC2_DIV_LSB;
  if(div_post_pll>3)
    CCM_ANALOG_MISC2 |= CCM_ANALOG_MISC2_DIV_MSB;
}
//
void sai1_setClock(int n1, int n2) 
{ 
  CCM_CCGR5 |= CCM_CCGR5_SAI1(CCM_CCGR_ON);
  IOMUXC_GPR_GPR1 |= (IOMUXC_GPR_GPR1_SAI1_MCLK_DIR | IOMUXC_GPR_GPR1_SAI1_MCLK1_SEL(0));

  // clear SAI1_CLK register locations
  CCM_CSCMR1 &= ~(CCM_CSCMR1_SAI1_CLK_SEL_MASK);
  CCM_CS1CDR &= ~(CCM_CS1CDR_SAI1_CLK_PRED_MASK | CCM_CS1CDR_SAI1_CLK_PODF_MASK);
  //
  CCM_CSCMR1 |= CCM_CSCMR1_SAI1_CLK_SEL(2); // &0x03 // (0,1,2): PLL3PFD0, PLL5, PLL4,  
  CCM_CS1CDR |= CCM_CS1CDR_SAI1_CLK_PRED(n1-1); // &0x07
  CCM_CS1CDR |= CCM_CS1CDR_SAI1_CLK_PODF(n2-1); // &0x3f   
}
//
void sai1_configurePorts(int iconf=0)
{
  CORE_PIN23_CONFIG = 3;  //1:MCLK
  CORE_PIN21_CONFIG = 3;  //1:RX_BITCLK
  CORE_PIN20_CONFIG = 3;  //1:RX_FS
  CORE_PIN7_CONFIG  = 3;  //1:RX_DATA0
  CORE_PIN8_CONFIG  = 3;  //1:RX_DATA1 // not used
  CORE_PIN9_CONFIG  = 3;  //1:RX_DATA2 // not used
  CORE_PIN32_CONFIG = 3;  //1:RX_DATA3 // not used
}
//subset of missing definitions
// macro arguments are parenthesized so expressions like NBITS-1 expand safely
#define I2S_RCR1_RFW(n)     ((uint32_t)((n) & 0x1f))    // Receive FIFO watermark
#define I2S_RCR2_DIV(n)     ((uint32_t)((n) & 0xff))    // Bit clock divide by (DIV+1)*2
#define I2S_RCR2_BCD        ((uint32_t)1<<24)   // Bit clock direction
#define I2S_RCR2_MSEL(n)    ((uint32_t)((n) & 3)<<26)   // MCLK select, 0=bus clock, 1=I2S0_MCLK
#define I2S_RCR2_SYNC(n)    ((uint32_t)((n) & 3)<<30)   // 0=async 1=sync with receiver
#define I2S_RCR3_RCE        ((uint32_t)0x10000)   // receive channel enable
#define I2S_RCR4_FSD        ((uint32_t)1)     // Frame Sync Direction
#define I2S_RCR4_MF         ((uint32_t)0x10)    // MSB First
#define I2S_RCR4_SYWD(n)    ((uint32_t)((n) & 0x1f)<<8) // Sync Width
#define I2S_RCR4_FRSZ(n)    ((uint32_t)((n) & 0x0f)<<16)  // Frame Size
#define I2S_RCR5_FBT(n)     ((uint32_t)((n) & 0x1f)<<8) // First Bit Shifted
#define I2S_RCR5_W0W(n)     ((uint32_t)((n) & 0x1f)<<16)  // Word 0 Width
#define I2S_RCR5_WNW(n)     ((uint32_t)((n) & 0x1f)<<24)  // Word N Width

#define I2S_RCSR_RE     ((uint32_t)0x80000000)    // Receiver Enable
#define I2S_RCSR_FR     ((uint32_t)0x02000000)    // FIFO Reset
#define I2S_RCSR_FRDE     ((uint32_t)0x00000001)    // FIFO Request DMA Enable
#define I2S_RCSR_BCE      ((uint32_t)0x10000000)    // Bit Clock Enable

typedef struct
{
  uint32_t CSR;
  uint32_t CR1,CR2,CR3,CR4,CR5;
  uint32_t DR[8];
  uint32_t FR[8];
  uint32_t MR;
} I2S_PORT;

typedef struct
{
  uint32_t VERID;
  uint32_t PARAM;
  I2S_PORT TX;
  uint32_t unused[9];
  I2S_PORT RX;
} I2S_STRUCT;

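// volatile: these point at hardware registers, so accesses must not be optimized away
volatile I2S_STRUCT *I2S1 = ((volatile I2S_STRUCT *)0x40384000);
volatile I2S_STRUCT *I2S2 = ((volatile I2S_STRUCT *)0x40388000);
volatile I2S_STRUCT *I2S3 = ((volatile I2S_STRUCT *)0x4038C000);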

//
#define NCH 2
#define NDAT 128
#define NBITS 32
int32_t rx_buffer[2*NCH*NDAT];
int32_t *rx_data1 = rx_buffer;
int32_t *rx_data2 = rx_buffer+(NCH*NDAT);

void sai_rxConfig(int ndiv)
{
  I2S1->RX.MR = 0;
  I2S1->RX.CR1 = I2S_RCR1_RFW(1); 
  I2S1->RX.CR2 = I2S_RCR2_SYNC(0);// | I2S_RCR2_BCP ; // sync=0; rx is async; 

  I2S1->RX.CR2 |= (I2S_RCR2_BCD | I2S_RCR2_DIV(ndiv-1) | I2S_RCR2_MSEL(1));
  I2S1->RX.CR3 = I2S_RCR3_RCE; // single rx channel

  //
  I2S1->RX.CR4 = I2S_RCR4_FRSZ(NCH-1) 
        | I2S_RCR4_SYWD(NBITS-1) 
        | I2S_RCR4_FSD
        | I2S_RCR4_MF
        ;

  I2S1->RX.CR5 = I2S_RCR5_WNW(NBITS-1) | I2S_RCR5_W0W(NBITS-1) | I2S_RCR5_FBT(NBITS-1);
}

void sai_rx_isr(void); // forward declaration

void sai_setupInput(void * buffer, int ndat)
{ 
  //
  CCM_CCGR5 |= CCM_CCGR5_DMA(CCM_CCGR_ON);
  DMA_CR = DMA_CR_GRP0PRI | DMA_CR_EMLM | DMA_CR_EDBG;

  DMAMUX_CHCFG0 = 0x00000000;
  DMAMUX_CHCFG0 = (DMAMUX_SOURCE_SAI1_RX & 0x7F) | DMAMUX_CHCFG_ENBL;

  int nb = (NBITS/8);
  //
  int ch = 0;
  DMA_CERQ = ch;
  DMA_CERR = ch;
  DMA_CEEI = ch;
  DMA_CINT = ch;

  DMA_TCD0_CSR = 0;
  //
  DMA_TCD0_SADDR = (void *)&I2S1_RDR0;
  DMA_TCD0_SOFF = 0;
  DMA_TCD0_ATTR = DMA_TCD_ATTR_SSIZE(nb/2) | DMA_TCD_ATTR_DSIZE(nb/2);
  DMA_TCD0_NBYTES = nb;
  DMA_TCD0_SLAST = 0;

  DMA_TCD0_DADDR = buffer;
  DMA_TCD0_DOFF = nb;
  DMA_TCD0_CITER = ndat;
  DMA_TCD0_BITER = ndat;
  DMA_TCD0_DLASTSGA = -nb*ndat;

  DMA_TCD0_CSR |= DMA_TCD_CSR_INTHALF | DMA_TCD_CSR_INTMAJOR;

  attachInterruptVector(IRQ_DMA_CH0, sai_rx_isr);
  NVIC_ENABLE_IRQ(IRQ_DMA_CH0);

  NVIC_SET_PRIORITY(IRQ_DMA_CH0, 7*16); // 8 is normal priority
  //
  I2S1->RX.CSR |= I2S_RCSR_FRDE | I2S_RCSR_FR;
  I2S1->RX.CSR |= I2S_RCSR_RE | I2S_RCSR_BCE;

  DMA_SERQ = ch;
  DMA_TCD0_CSR |=  DMA_TCD_CSR_START;
}

#define I2S_PIN 3
void sai_rxProcessing(void * taddr);  // forward declaration

void sai_rx_isr(void)
{ uint32_t daddr, taddr;
  //
  DMA_CINT = 0;
  //
  daddr = (uint32_t) (DMA_TCD0_DADDR);
  if (daddr < (uint32_t)rx_data2) 
  {
    // DMA is receiving to the first half of the buffer
    // need to process data from the second half
    taddr=(uint32_t) rx_data2;
    //
    digitalWrite(I2S_PIN, HIGH);
  } 
  else 
  {
    // DMA is receiving to the second half of the buffer
    // need to process data from the first half
    taddr=(uint32_t) rx_data1;
    //
    digitalWrite(I2S_PIN, LOW);
  }
  //up call
  sai_rxProcessing((void *) taddr);
}

// volatile: written in the DMA ISR, read and reset in loop()
volatile uint32_t rxCount = 0;
volatile uint32_t maxVal = 0;
void sai_rxProcessing(void * taddr)
{
  uint32_t *data = (uint32_t *) taddr;
  // do something useful
  rxCount++;
  for(int ii=0; ii<NCH*NDAT; ii++)
  {
    if(data[ii]>maxVal) maxVal=data[ii];
  }
}

void setup() {
  // put your setup code here, to run once:
  while(!Serial);
  Serial.println("T4_Test");

  sai1_configurePorts();

  int fs = 192000;
  int bit_clk = fs*(NCH*NBITS);
  int nov = 4; // factor of oversampling MCLK/BCLK
  int fs_mclk = nov*bit_clk; // here 49.152 MHz 
  //
  int c0, c1, c2;
  int n1, n2;

  n1 = 4; // to ensure that input to last divisor (i.e. n2) is < 300 MHz
  n2 = 4; // to reduce clock further to become MCLK

  // the PLL runs between 27*24 and 54*24 MHz (before dividers)
  // e.g. 49.152 = 32.768*24 / (n1*n2)
  c0 = 32;  
  c1 = 768;
  c2 = 1000;
  // Note: c0+c1/c2 must be between 27 and 54
  //       here: 32.768*24 MHz = 786.4320 MHz

  set_audioClock(c0,c1,c2);
  sai1_setClock(n1,n2);

  int ndiv = nov/2;   // MCLK -> 2* BitClock
  sai_rxConfig(ndiv);
  sai_setupInput(rx_buffer, (2*NCH*NDAT));

  pinMode(LED_BUILTIN, OUTPUT);
  pinMode(I2S_PIN,OUTPUT);
}

void loop() {
  // put your main code here, to run repeatedly:
  digitalWrite(LED_BUILTIN, HIGH);
  delay(500);
  digitalWrite(LED_BUILTIN, LOW);
  delay(500);
  Serial.printf("%lu %lu\n", (unsigned long)rxCount, (unsigned long)maxVal);
  rxCount=0;
  maxVal=0;

}
I coded only the input, using 'bare metal' DMA, to have better control when chasing errors.
The exercise was useful for understanding the differences between T3 and T4.


The big question is how to handle the different PLLs (PLL4, PLL5).
It seems they should be handled by the application and not by the core.
This would allow precise sampling (but I'm not sure about jitter).
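
The clock arithmetic in the sketch's setup() can be checked off-target. This is only a sketch: the helper names (audioPllHz, saiMclkHz, sampleRateHz) are made up for illustration and are not part of the Teensy core, and it assumes both PLL post dividers are set to 1/1, as they are in set_audioClock() above.

```cpp
#include <cstdint>

// 24 MHz crystal feeding the audio PLL
constexpr uint64_t kOscHz = 24000000ULL;

// PLL output: 24 MHz * (divSelect + num/denom), kept in integer arithmetic.
// Assumes the post dividers are 1/1 (hypothetical helper, not core API).
constexpr uint64_t audioPllHz(uint32_t divSelect, uint32_t num, uint32_t denom)
{
  return kOscHz * (static_cast<uint64_t>(divSelect) * denom + num) / denom;
}

// MCLK after the two SAI1 dividers (PRED = n1, PODF = n2)
constexpr uint64_t saiMclkHz(uint64_t pllHz, uint32_t n1, uint32_t n2)
{
  return pllHz / (static_cast<uint64_t>(n1) * n2);
}

// Frame rate for a given MCLK/BCLK ratio, channel count and word size
constexpr uint64_t sampleRateHz(uint64_t mclkHz, uint32_t nov,
                                uint32_t nch, uint32_t nbits)
{
  return mclkHz / nov / (static_cast<uint64_t>(nch) * nbits);
}

// Values from the sketch: c0=32, c1=768, c2=1000, n1=n2=4, nov=4, 2 x 32 bit
static_assert(audioPllHz(32, 768, 1000) == 786432000ULL, "PLL = 786.432 MHz");
static_assert(saiMclkHz(786432000ULL, 4, 4) == 49152000ULL, "MCLK = 49.152 MHz");
static_assert(sampleRateHz(49152000ULL, 4, 2, 32) == 192000ULL, "fs = 192 kHz");
```

Keeping the PLL settings in the application, as suggested, would mean each sketch picks c0, c1 and c2 for its own sample rate; a check like this catches a wrong divider before it reaches the hardware.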
 
Added analogReadResolution().

https://github.com/PaulStoffregen/co...f03b2437e7263b

Did a little testing with PWMServo (oscilloscope only) and it looks good.
Copy the changed file to the core. Attached a servo on pins 2, 3, 4, 5 and 9. It behaved consistently and the same way it behaves on the T3.5. I wasn't going to attach the scope but couldn't resist seeing the difference: the waveform looks like it does on the T3.5. Frequency and period are steady at 50 Hz and 20 ms. The +duty on the measurement screen is exactly the same as it was on the T3.5. Not sure what that value represents though :)

So another library confirmed to work.
 
crw-rw-rw- 1 root plugdev 166, 0 Jan 2 12:20 /dev/ttyACM0

Very mysterious. Here's the part of the source from teensy_reboot that's failing.

Code:
static int port_fd=-1;
int open_port(const char *path)
{
        int r;

        port_fd = open(path, O_RDWR | O_NONBLOCK);
        if (port_fd < 0) return -1;
        r = fcntl(port_fd, F_SETFL, fcntl(port_fd, F_GETFL) & (~(O_NONBLOCK)));
        if (r < 0) return -1;
        return 0;
}

I guess more code could be added so we could figure out whether the open() call or the fcntl() call is failing.
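
One way that extra diagnostic code could look is sketched here. The name open_port_verbose and the message texts are invented for illustration; this is not the actual teensy_reboot source, only the same open()/fcntl() sequence with errno reported at each step.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

static int port_fd = -1;

// Same sequence as open_port(), but reports which call failed and why.
// (Hypothetical variant for debugging; names and messages are assumptions.)
int open_port_verbose(const char *path)
{
  port_fd = open(path, O_RDWR | O_NONBLOCK);
  if (port_fd < 0) {
    fprintf(stderr, "open(%s) failed: %s\n", path, strerror(errno));
    return -1;
  }
  int flags = fcntl(port_fd, F_GETFL);
  if (flags < 0) {
    fprintf(stderr, "fcntl(F_GETFL) failed: %s\n", strerror(errno));
    close(port_fd);
    port_fd = -1;
    return -1;
  }
  // switch the descriptor back to blocking mode
  if (fcntl(port_fd, F_SETFL, flags & ~O_NONBLOCK) < 0) {
    fprintf(stderr, "fcntl(F_SETFL) failed: %s\n", strerror(errno));
    close(port_fd);
    port_fd = -1;
    return -1;
  }
  return 0;
}
```

An EACCES message would point at permissions or the udev rule, while EBUSY would suggest another process is holding the port.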
 
From top (closest to pin 23) to bottom (closest to pin 2):

GPIO_SD_B0_04_DAT2
GPIO_SD_B0_05_DAT3
GPIO_SD_B0_00_CMD
3.3V
GPIO_SD_B0_01_CLK
GND
GPIO_SD_B0_02_DAT0
GPIO_SD_B0_03_DAT1
 
a) What are the plans re: single-precision constants?
Code from wiring.h:
Code:
#define PI 3.1415926535897932384626433832795
#define HALF_PI 1.5707963267948966192313216916398
#define TWO_PI 6.283185307179586476925286766559
#define DEG_TO_RAD 0.017453292519943295769236907684886
#define RAD_TO_DEG 57.295779513082320876798154814105
Now is the time for us to learn how to use doubles/floats correctly and remove that switch, I'd say... but what do you think?

b) I'd delete the lines
Code:
#define clockCyclesPerMicrosecond() ( F_CPU / 1000000L )
#define clockCyclesToMicroseconds(a) ( (a) / clockCyclesPerMicrosecond() )
#define microsecondsToClockCycles(a) ( (a) * clockCyclesPerMicrosecond() )
from wiring.h ...
 
beta SPI performance notes:

Registers of interest: LPSPI4_CCR (divider) and LPSPI4_TCR (frame size). The SPI peripheral bus clock is 528 MHz / 7 (about 75.4 MHz). The divider is DIV + 2, so with a DIV field of 0 the max SPI CLK is 75.4/2 MHz (37.7 MHz). Paul's current beta SPI has a DIV field of 4 (75.4/6 MHz) and a frame size of 8 bits. Sending 1000 bytes goes at 9.5 Mbit/s, and the scope shows SPI CLK running at 12.5 MHz, with an inter-byte gap of 260 ns.

With a DIV of 0, the SPI CLK Vpp is only 2 V running at 37.7 MHz, and the data rate is 16 Mbit/s. If you use a 32-bit frame size at max speed, the data rate climbs to 28 Mbit/s. Since the waveform isn't great with DIV 0, DIV 1 is probably safer, giving an SPI CLK of 25 MHz and a 32-bit frame data rate of 20.6 Mbit/s (the interframe gap is 296 ns).

You might be able to reduce interframe gap by using the FIFO. We did some experiments on the EVKB eval board
https://forum.pjrc.com/threads/5426...B-(600-Mhz-M7)?p=192387&viewfull=1#post192387
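
The divider arithmetic in these notes can be written out as a small helper. This is a sketch using only the figures quoted above (528 MHz / 7 source clock, divide by DIV + 2); the function names are made up, and the 0..255 range of the DIV field is my assumption, so check it against the reference manual.

```cpp
#include <cstdint>

// SPI source clock from the notes above: 528 MHz / 7, about 75.4 MHz
constexpr double kLpspiClockHz = 528.0e6 / 7.0;

// SPI CLK for a given DIV field value (clock is divided by DIV + 2)
inline double spiSckHz(uint32_t divField)
{
  return kLpspiClockHz / (divField + 2);
}

// Smallest DIV field whose SPI CLK does not exceed maxHz.
// Assumes an 8-bit DIV field (0..255); returns 255 if nothing fits.
inline uint32_t minDivFor(double maxHz)
{
  for (uint32_t div = 0; div < 255; ++div) {
    if (spiSckHz(div) <= maxHz) return div;
  }
  return 255;
}
```

For example, spiSckHz(0) is about 37.7 MHz, spiSckHz(1) about 25.1 MHz and spiSckHz(4) about 12.6 MHz, which lines up with the scope readings reported above.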
 
a) What are the plans re: single-precision constants?

This is kind of 2 questions...

1: Should we keep -fsingle-precision-constant in the compiler flags?

2: Should we add a "f" suffix in lots of places? Or a "ll" one?

I don't know the answers yet. Still looking for input. Two questions I have are:

How much slower is 64 bit double compared with 32 bit float? My assumption has been about half the speed, but really don't know much yet. Kinda hoping Manitou and others will do more benchmarks....

Without -fsingle-precision-constant, how bad is the impact of a forgotten "f" suffix on a typical expression that would otherwise be done entirely with the much faster 32 bit float instructions? Unfortunately benchmarks don't help, because they're usually carefully written. The question is what happens with "typical" code Arduino users write, where almost nobody appends "f" to constants.
 
Maybe this is related to almost all interrupts currently using the default priority? Servo needs a fairly high NVIC priority level. I built this one on top of IntervalTimer, and currently IntervalTimer's priority setting isn't implemented.
...

opps_CommErr :: Try Blink_SerFast

No Servo, no IntervalTimer. USB Serial blocks when the sketch starts with Serial (required in that sketch); then, if you kill SerMon, Blink and the debug output halt until SerMon returns.

Updated github - mix and match of sketches left if (Serial) block on printing so it worked.
 
I don't know if there is a one-size-fits-all answer. Let's see:
  • There are people who use double and want/need all 53 bits of precision, and get surprised when constants only have 24 bits;
  • There is the Arduino legacy code that, due to hysterical raisons, thinks double should be the same as float;
  • There are now Teensy users who know to use float, but don't put suffixes on their constants.

Is the Teensy 4 going to use the chip versions that have both hardware float/double? From the question, it sounds like it will. IIRC, there are different versions, that either have no FP, just single precision, or both precisions of FP.

Fortunately, ARM doesn't support IEEE 128-bit, or else the 'L' suffix hack won't work :)
 
TEMP MONITOR

Fixed an error I had in the sketch. Now if the Panic temp is exceeded the chip automatically restarts, but the sketch will not start up again. We probably need some sort of reset handler in startup for when the PMU_MISC1 panic reset bit was tripped. Not sure I am brave enough to mess with startup.

Update was pushed to github
 
As far as I'm aware, there is not a good suffix to add to specify a double, only 'f' to specify a float. So, it seems to me like constants that are not specified would need to be treated as doubles on Teensy 4.0 rather than floats and the -fsingle-precision-constant flag should be dropped. Otherwise, the only option of specifying double precision constants would be to use constexpr.
 
Now if the Panic temp is exceeded the chip automatically restarts but the sketch will not start up again.
A restart would not be good either, because it would heat the chip again (which we don't want(!?)). What exactly happens? If the CPU stops, all is OK, I'd say :) But these are decisions that belong to Paul..

Edit: i can help with the handling of reset-reasons in startup.c, if required.
 
How much slower is 64 bit double compared with 32 bit float? My assumption has been about half the speed, but really don't know much yet. Kinda hoping Manitou and others will do more benchmarks....

Yep, float is about 2 times faster than double, see linpack in
https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=194187&viewfull=1#post194187

Another test is lat-lon to UTM, with lots of trig functions (1000 reps): float 4230 us, double 7720 us (micros() has a resolution of 10 us on T4).

saxpy 133 mflops, daxpy 77 mflops
sdot 170 mflops, ddot 85 mflops
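
For readers who don't know the names: saxpy/daxpy compute y = a*x + y and sdot/ddot compute a dot product, in single and double precision respectively. The sketch below shows the kernels and how an mflops figure follows from a timing; it is not the benchmark code used for the numbers above, and the function names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// y = a*x + y; T = float gives saxpy, T = double gives daxpy
template <typename T>
void axpy(T a, const std::vector<T>& x, std::vector<T>& y)
{
  for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
    y[i] += a * x[i];
  }
}

// dot product; T = float gives sdot, T = double gives ddot
template <typename T>
T dot(const std::vector<T>& x, const std::vector<T>& y)
{
  T sum = 0;
  for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
    sum += x[i] * y[i];
  }
  return sum;
}

// Both kernels do 2 floating-point operations per element (multiply + add),
// so flops per microsecond is the mflops figure.
inline double mflops(std::size_t n, double elapsedMicros)
{
  return 2.0 * static_cast<double>(n) / elapsedMicros;
}
```

On the Teensy the elapsed time would come from micros() around a loop of repeated calls, long enough that the 10 us resolution mentioned above doesn't dominate.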
 
Paul, I've stopped fiddling with the shiftIn/shiftOut/pulse things. I measured the shift speed and it is well below your maximum. It's just a matter of copying them to the T4 core library. pulse has a variable timeout now in the newer Arduino versions, which requires a more exact micros(); I'd do that later.
 
The really unpleasant problem with double precision constants is they cause almost anything 32 bit float they're used with to be automatically converted to 64 bit double (taking cycles if not const) and then math that would have been fast 32 bit float all becomes slower 64 bit double.

Still, maybe half the speed for not being meticulous with "f" suffixes on all constants is worthwhile for all those extra bits of precision?
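
The promotion described above can be seen directly from the expression types. A minimal sketch, assuming a build without -fsingle-precision-constant (with that flag the unsuffixed literal is treated as float and the first assertion would no longer hold); scaleSlow and scaleFast are made-up names.

```cpp
#include <type_traits>

// float * unsuffixed literal: the literal is a double, so the whole
// expression is evaluated in double precision
static_assert(std::is_same<decltype(1.0f * 3.14), double>::value,
              "float * double literal yields double");

// float * literal with an f suffix stays in single precision
static_assert(std::is_same<decltype(1.0f * 3.14f), float>::value,
              "float * float literal yields float");

// Converts x to double, multiplies in double, converts the result back
inline float scaleSlow(float x) { return x * 3.14; }

// Single float multiply
inline float scaleFast(float x) { return x * 3.14f; }
```

Both functions return a float, so nothing in the source hints at the extra conversions in scaleSlow, which is exactly the "forgotten f" case being discussed.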
 
That's a valid point. However, the T4 is faster by its very nature, and this leads to the question of whether we really need more speed at 600 MHz, or whether more accuracy is perhaps more desirable after all.
OK, Arduino users are not used to higher accuracy, one could say..

Let's ask another question: if Arduino sells a CPU with a double-precision FPU in the future, will they use the single-precision-constant compiler switch?
 