SPI weirdness: transfer() returns before finished

emiel.h

Hi! I was wondering if anybody could help me make sense of the following.

I made a PCB that hosts a Teensy 4.1 and various ICs that communicate over I2C and SPI, including six MAX31865 RTD-to-digital converters.
I'm trying to get the Teensy to communicate with one (any) of the MAX31865s over SPI but am running into some problems.

It works as expected, but when I start to send out high-frequency PWM signals on a pin that is unrelated to the MAX31865, the communication stops working.
I connected a scope to the CS and SCK pins and made videos of both scenarios (with and without the PWM signal running in the background).


As you can see, one loop consists of two 2-byte transfers and one 3-byte transfer, and in between the transfers CS goes high as expected.
With the PWM signals running, CS seems to lose sync with SCK over time. Looking at the code, this seems possible only if SPI.transfer() returns before the actual transfer has finished.
If I put an SPI.begin() call in front of each SPI.beginTransaction(), the effect is somewhat mitigated, but some data still comes in corrupted.

I get that this has something to do with the PWM signals causing interference on the SPI lines. What I don't get is why this causes SPI.transfer() to return before the transfer is completed, and why the effect worsens over time. My guess is that the SPI hardware retries a transfer after it detects that something went wrong (hence the extra SCK bytes showing up sporadically on the scope), but that this retry is not properly communicated to the SPI library.
If anyone has an idea on how to fix this in software or hardware, a reply would be much appreciated.

Code simplified for readability:
Code:
#include <Arduino.h>
#include <SPI.h>

#define MAX31865_CONFIG_REG 0x00
#define MAX31865_CONFIG_FAULTSTAT 0x02
#define MAX31865_RTDMSB_REG 0x01

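#define CSPin 29

// Forward declarations (needed when this is built as a plain .cpp file)
uint16_t readRTD(void);
void writeRegister8(uint8_t reg, uint8_t val);
uint8_t readRegister8(uint8_t reg);
uint16_t readRegister16(uint8_t reg);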

void setup() {
  analogWriteFrequency(15, 10000000);
  analogWrite(15, 127);
  analogWriteFrequency(14, 1200000);
  analogWrite(14, 30);

  pinMode(CSPin, OUTPUT);
  digitalWrite(CSPin, HIGH);

  SPI.begin();
}

void loop() {
  readRTD();
  delay(50);
}

uint16_t readRTD(void) {
  uint8_t t = readRegister8(MAX31865_CONFIG_REG);
  t &= ~0x2C;
  t |= MAX31865_CONFIG_FAULTSTAT;
  writeRegister8(MAX31865_CONFIG_REG, t);

  uint16_t rtd = readRegister16(MAX31865_RTDMSB_REG);

  rtd >>= 1;

  return rtd;
}

void writeRegister8(uint8_t reg, uint8_t val) {
  SPI.beginTransaction(SPISettings(1000000, MSBFIRST, SPI_MODE1));
  digitalWrite(CSPin, LOW);

  SPI.transfer(reg | 0x80);  // the write addresses have the first bit set to 1
  SPI.transfer(val);

  digitalWrite(CSPin, HIGH);
  SPI.endTransaction();
}

uint8_t readRegister8(uint8_t reg) {
  SPI.beginTransaction(SPISettings(1000000, MSBFIRST, SPI_MODE1));
  digitalWrite(CSPin, LOW);

  SPI.transfer(reg);
  uint8_t result = SPI.transfer(0);

  digitalWrite(CSPin, HIGH);
  SPI.endTransaction();

  return result;
}

uint16_t readRegister16(uint8_t reg) {
  SPI.beginTransaction(SPISettings(1000000, MSBFIRST, SPI_MODE1));
  digitalWrite(CSPin, LOW);

  SPI.transfer(reg);
  uint16_t result = (uint16_t)SPI.transfer(0) << 8;
  result |= (uint16_t)SPI.transfer(0);

  digitalWrite(CSPin, HIGH);
  SPI.endTransaction();

  return result;
}
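
The bit arithmetic in readRTD() and writeRegister8() can be checked on a desktop machine. The following is a minimal sketch, not part of the original post: the constant and function names are made up for illustration, and the bit layout is assumed to match the one used by Adafruit's MAX31865 library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; bit positions assumed from Adafruit's MAX31865 library.
constexpr uint8_t CFG_1SHOT       = 0x20;  // D5: one-shot conversion
constexpr uint8_t CFG_FAULT_CYCLE = 0x0C;  // D3:D2: fault detection cycle
constexpr uint8_t CFG_FAULT_CLEAR = 0x02;  // D1: fault status clear

// Same arithmetic as `t &= ~0x2C; t |= 0x02;` in readRTD().
uint8_t withFaultClear(uint8_t config) {
  config &= static_cast<uint8_t>(~(CFG_1SHOT | CFG_FAULT_CYCLE));  // clears 0x2C
  config |= CFG_FAULT_CLEAR;
  return config;
}

// Address byte for a register write: MSB set, as in writeRegister8().
uint8_t writeAddress(uint8_t reg) {
  return static_cast<uint8_t>(reg | 0x80);
}

// readRTD() drops bit 0 of the 16-bit RTD register (assumed here to be the
// fault flag), leaving the 15-bit conversion result.
uint16_t rtdValue(uint16_t raw) {
  return static_cast<uint16_t>(raw >> 1);
}
```

For example, a config byte of 0xB0 becomes 0x92, and register 0x00 is written through address byte 0x80.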
 
Odd. Note that it's normally required to have a short delay after the last clock pulse before CS goes high, according to the datasheet of the chip involved. The T4 is very fast, so explicit delays normally have to be added, even if only dozens of nanoseconds. Slower microcontrollers may get these delays for free, but not a T4.
 
Thanks for your reply. I added delayMicroseconds(1) before digitalWrite(CSPin, HIGH), to no avail. This doesn't seem to be a problem with CS, though: I tried leaving the CS pin HIGH during the transfers by removing the digitalWrite() calls on CSPin, and I see the same behavior on the scope, where the number of clock pulses is inconsistent.

I fixed it for now by implementing a software SPI solution based on Adafruit's BusIO code.
The original code from Adafruit seems to ignore some of the differences between the SPI modes, which I think I've fixed in the code below.

SoftSPI.h:
Code:
#pragma once
#include <Arduino.h>
#include <SPI.h>

typedef enum _BitOrder {
  SPI_BITORDER_MSBFIRST = MSBFIRST,
  SPI_BITORDER_LSBFIRST = LSBFIRST,
} BusIOBitOrder;

class SoftSPI {
 public:
  void init(int8_t sckpin, int8_t misopin, int8_t mosipin, uint32_t freq, BusIOBitOrder dataOrder, uint8_t dataMode);
  bool begin();
  uint8_t transfer(uint8_t send);
  void transfer(uint8_t *buffer, size_t len);

 private:
  uint32_t _freq;
  BusIOBitOrder _dataOrder;
  uint8_t _dataMode;
  int8_t _sck, _mosi, _miso;
  bool _begun;
};

SoftSPI.cpp:
Code:
#include "SoftSPI.h"

/*
SPI Mode 0:
  In this mode, the clock signal (SCK) is idle low (i.e., at logic level 0),
  and data is sampled on the leading edge of the clock signal.
  The data is shifted out on the trailing edge of the clock signal.

SPI Mode 1:
  In this mode, the clock signal is idle low,
  and data is sampled on the trailing edge of the clock signal.
  The data is shifted out on the leading edge of the clock signal.

SPI Mode 2:
  In this mode, the clock signal is idle high (i.e., at logic level 1),
  and data is sampled on the leading edge of the clock signal.
  The data is shifted out on the trailing edge of the clock signal.

SPI Mode 3:
  In this mode, the clock signal is idle high,
  and data is sampled on the trailing edge of the clock signal.
  The data is shifted out on the leading edge of the clock signal.
*/

void SoftSPI::init(int8_t sckpin, int8_t misopin, int8_t mosipin, uint32_t freq, BusIOBitOrder dataOrder, uint8_t dataMode) {
  _sck = sckpin;
  _miso = misopin;
  _mosi = mosipin;
  _freq = freq;
  _dataOrder = dataOrder;
  _dataMode = dataMode;
  _begun = false;
}

bool SoftSPI::begin() {
  // SCK
  pinMode(_sck, OUTPUT);
  if ((_dataMode == SPI_MODE0) || (_dataMode == SPI_MODE1)) {
    // idle low on mode 0 and 1
    digitalWrite(_sck, LOW);
  } else {
    // idle high on mode 2 or 3
    digitalWrite(_sck, HIGH);
  }

  // MOSI
  pinMode(_mosi, OUTPUT);
  digitalWrite(_mosi, HIGH);

  // MISO
  pinMode(_miso, INPUT);

  _begun = true;
  return true;
}

uint8_t SoftSPI::transfer(uint8_t send) {
  uint8_t data = send;
  transfer(&data, 1);
  return data;
}

void SoftSPI::transfer(uint8_t *buffer, size_t len) {
  uint8_t startbit;
  if (_dataOrder == SPI_BITORDER_LSBFIRST) {
    startbit = 0x1;
  } else {
    startbit = 0x80;
  }

  uint32_t bitdelay_ns = (1000000000 / _freq) / 2;  // half a clock period; 32-bit so low frequencies don't overflow
  if (bitdelay_ns > 110)  // remove overhead time
    bitdelay_ns -= 110;
  else
    bitdelay_ns = 0;

  int SCK_idle = _dataMode == SPI_MODE0 || _dataMode == SPI_MODE1 ? 0 : 1;  // idle low on mode 0 and 1

  for (size_t i = 0; i < len; i++) {
    uint8_t reply = 0;
    uint8_t send = buffer[i];

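    for (uint8_t b = startbit; b != 0; b = (_dataOrder == SPI_BITORDER_LSBFIRST) ? b << 1 : b >> 1) {
      if (_dataMode == SPI_MODE1 || _dataMode == SPI_MODE3) {
        // CPHA = 1: the slave shifts out on the leading edge and samples on the trailing edge
        delayNanoseconds(bitdelay_ns);

        digitalWrite(_mosi, send & b);

        digitalWrite(_sck, !SCK_idle);  // leading edge

        delayNanoseconds(bitdelay_ns);

        if (digitalRead(_miso)) {
          reply |= b;
        }

        digitalWrite(_sck, SCK_idle);  // trailing edge

      } else {  // if (_dataMode == SPI_MODE0 || _dataMode == SPI_MODE2)
        // CPHA = 0: MOSI must be valid before the leading edge, where the slave samples it
        digitalWrite(_mosi, send & b);

        delayNanoseconds(bitdelay_ns);

        digitalWrite(_sck, !SCK_idle);  // leading edge

        if (digitalRead(_miso)) {
          reply |= b;
        }

        delayNanoseconds(bitdelay_ns);

        digitalWrite(_sck, SCK_idle);  // trailing edge
      }
    }

    buffer[i] = reply;  // store the received byte once all 8 bits are in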
  }
  return;
}

I tested this with 9 devices connected to the same SPI bus, plus the PWM signals running in the background, and didn't notice any problems. It looks stable on the scope.
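
The half-period arithmetic in SoftSPI::transfer() can also be checked off-target. This is a minimal sketch under assumptions: the function name halfPeriodDelayNs is made up for illustration, and the 110 ns overhead is simply the figure from the code above, not something measured here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the delay arithmetic in SoftSPI::transfer().
// overhead_ns defaults to the 110 ns used in the post.
uint32_t halfPeriodDelayNs(uint32_t freq_hz, uint32_t overhead_ns = 110) {
  assert(freq_hz > 0);                             // avoid dividing by zero
  uint32_t half_ns = (1000000000u / freq_hz) / 2;  // half of one clock period
  return (half_ns > overhead_ns) ? half_ns - overhead_ns : 0;
}
```

At the 1 MHz clock used in this thread this gives 390 ns per half period. At 10 MHz the overhead alone exceeds the half period, so the delay becomes 0. At 1 kHz the result is 499890 ns, which is more than a 16-bit variable can hold (maximum 65535).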
 
This seems to be a bit of a mystery.
The Teensy 4's SPI module uses both transmit and receive FIFOs (queues). The basic transfer() function pushes its argument (the data to be sent) into the transmit queue, waits until the receive queue isn't empty, then pops data from the receive queue and returns it. That's how it gets out of sync: somewhere along the way some extra data lands in the receive FIFO, and transfer() starts returning before the passed argument has actually been transferred. The argument does still get sent, because it still gets queued, which is why you can still see it on the scope.

As for how/where the extra data in the receive FIFO comes from, I'm not sure. But here's a frozen frame from the "bad" video that tells a story:
teensy_spi_bug.png
You can see the first 2-byte transfer, the second 2-byte transfer, the 3-byte transfer... and then another one-byte transfer with CS held high. From that point on, the CS line goes out of sync because the receive FIFO contains more data than expected. I can't see where that extra one-byte transfer could be getting queued.
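
The desynchronization described above can be reproduced with a toy model on a desktop machine. This is only a sketch of the queueing behavior as described in this post, not the real LPSPI hardware or the Teensy SPI library; every name in it is hypothetical, and the stray receive entry is injected by hand because its real origin is unknown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

// Toy model of an SPI peripheral with transmit and receive FIFOs.
struct FifoSpiModel {
  std::queue<uint8_t> tx;        // bytes queued but not yet clocked out
  std::queue<uint8_t> rx;        // bytes received but not yet read
  uint8_t nextSlaveByte = 0xA0;  // reply the simulated slave sends next

  // "Hardware": clock one queued byte out and store the slave's reply.
  void clockOneByte() {
    assert(!tx.empty());
    tx.pop();
    rx.push(nextSlaveByte++);
  }

  // Hardware finishing queued work in the background, e.g. during delay().
  void idle() {
    while (!tx.empty()) clockOneByte();
  }

  // Blocking transfer: queue the byte, wait until the receive FIFO is not
  // empty, then pop one entry and return it.
  uint8_t transfer(uint8_t out) {
    tx.push(out);
    while (rx.empty()) clockOneByte();  // stands in for "wait for hardware"
    uint8_t in = rx.front();
    rx.pop();
    return in;
  }

  // Models the unexplained extra entry in the receive FIFO.
  void injectStrayRxByte(uint8_t value) { rx.push(value); }

  // Bytes still waiting to go out when transfer() has already returned.
  std::size_t pendingTx() const { return tx.size(); }
};
```

In this model a single stray entry is enough: after injectStrayRxByte(), every later transfer() returns immediately with the reply that belongs to the previous byte, and pendingTx() stays at 1, meaning the byte is clocked out only after the caller has already moved on (and raised CS).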
 