Multiple T3.6 Using DMA

Please find below my code for linking two or more Teensy 3.6 boards (2 to n T3.6s) using:
DMA
8-bit parallel data transfer
a round-robin record transfer using records of 4 to 508 bytes
identical code on all processors

This is just a fun program with no practical application as far as I can see, as it uses too many pins and leaves few useful ones (e.g. SPI) for other work.

The code is somewhat crude but I believe it works.
It is not particularly optimised for speed: I see about 577000 bytes per second travelling around the ring for a 508-byte record on a ring of 3 Teensies, with no real work being done on each record.

Any comments would be appreciated.

There is one known bug as follows:
When the inbound DMA is first enabled, it executes one minor loop immediately, even though it is supposed to be hardware triggered by a clock pin (which it does properly ever after). The code includes a one-time bodge to reset the TCD after this spurious minor loop, and it seems to work perfectly thereafter. If anybody can shed light on this I would be grateful.


Code:
// DmaSharexx.ino 

// This is a Fun program with no defined application and probably no practical use.
// NOTE:  This program uses very crude code owing to a lack of brain cells in the author.
//        It may contain crude coding errors!

// Objectives:
// To share data between n teensies where n is any number greater than 1 using
// a round robin technique with a single record of 4-508 bytes passed from one to the next 
// using the same code in all teensies connected in a ring.
// To pass data 8 bits at a time
// To use DMA to the maximum extent for both data input and output
//
// Comments:
// The physical configuration is the same on all teensies
// The same code is used on all teensies
// 18 pins are used:
//    8 data in  - contiguous bits on Port D
//    8 data out - contiguous bits on Port C
//    1 clock pin in  - Pin 34 on Port E
//    1 clock pin out - Pin 33 on Port E
// [10 LED driving debug pins are also in the code but are not essential]
// This does not leave many pins for doing anything else and, more importantly,
// uses up pins normally used for other things eg SPI
// The main loop has to adhere to a specific protocol, with opportunities to do
// useful work within that.  It cannot follow its own routine and call on these
// services whenever it wants to.
// The data record length is fixed at 4 - 508 bytes & must be a multiple of 4.
// The upper limit is fixed by DMA using 9 bits for the iteration count when linking
// (max 511); 508 is the largest multiple of 4 that fits.
// Only one record is travelling around the ring.
// The time for the record to complete one lap is the sum of the read, processing,
// and write times for all the processors on the ring; this sets the data throughput.
// Measured: For 3 processors & Rec Len = 508 
//   577037 bytes/sec = ~4.6Mbits/sec.
// The main loop() code could be simplified for more speed.
// The code uses the DMAChannel library mostly but not entirely. Direct TCD manipulation is also used. (??)

// Basic Mechanism:
// Main Loop:
// If record read
//  process record
//  write the record (with or without changes)

// Detailed Mechanism:
// A DMA read channel is permanently enabled, awaiting a hardware trigger.
// This will wait without blocking the cpu until activated by an incoming clock pulse.
// On completion the DMA read will call an interrupt which will flag the read buffer as FULL.
// The main loop detects the read and processes the record.
// It then writes the record by starting a DMA channel (the data-out channel) to throw 
// the bytes of the record at the port.  
// After each 4-byte word (one minor transfer, i.e. one port write carrying one data byte), this channel stops
// and triggers another channel (the clock channel) which throws a succession of 
// bits at an output pin, generating a clock signal.
// The clock channel triggers itself until complete and then triggers the data-out 
// channel to send the next byte.
// This process continues until the data-out channel has sent the entire record.
// The record write does not check for the following processor being ready to receive,
// it is assumed that it always is.
// The main loop must always write the record out again.
// The main loop can take as long as it likes between reading a record and
// rewriting it, but obviously this delays the record around the ring.

// Other Details:
// All port transfers are 32 bit. The T3.6 has 8 bit port capability but this is not used.
// All data is translated from 8 bit to 32 bit before sending.
// All data is translated from 32 bit to 8 bit after receiving.
// Both data and clock are sent to the ports' toggle registers to ensure that 
// only the bits desired are affected, other bits on the ports are not affected.
// The clock data out consists of a load of no-op bits to the port followed by 
// a single toggle bit, a string of no-ops, a toggle bit, and more no-ops.
// This can be tailored to give a desired clock pulse width.
// The record length is fixed at compile time for all the processors involved.
// The DMA data-out channel is configured to send a final byte which sets the port
// to 0xFF.  This enables the toggling function to calculate toggle bits on the 
// assumption that the port always starts off as 0xFF. The final byte is not clocked
// and so is not read by the following processor.
// Unfortunately it is necessary at this time to specify the serial numbers of all
// the teensies used in the program.
// The teensy with the lowest serial will function as the Master, which is responsible
// for creating the record in the beginning, but has no other function.
// The contents of the record are entirely up to the user, a CRC check of some 
// sort is recommended.

#include <DMAChannel.h>
#include <TeensyID.h>

// Teensy Serial Number
union {
  uint32_t myTeensySerial;
  uint8_t serial[4];
} myId;

uint32_t serialList[3] = {578792,840980,841146}; // Ix = 1,2,3 - 0=error
uint32_t noOfProcessors = sizeof(serialList)/sizeof(serialList[0]); // element count, independent of the element type
uint8_t mySerialIx = 0;

// Master
bool weAreMaster;                 // true = we will generate the starting record
volatile bool initialRecordToBeCreated;  // true = we haven't yet generated the start record

#define myEMPTY false
#define myFULL  true

volatile bool ibPackedBuffStatus  = myEMPTY;
volatile uint32_t dmaGetCnt       = 0;
uint32_t loopCnt      = 0;
uint32_t startMillis  = 0;

// CHANNEL ORDER IS NOT IMPORTANT (to be confirmed)
// The following uses available channels from 0 upwards
// They are usually 0,1,2 but may change if libraries alloc channels first
// The code is not number dependent
// The relevant priority wrt libraries if used should be considered
DMAChannel ibDmaTcdDataChan;  // 0 IB data channel  
DMAChannel obDmaTcdDataChan;  // 1 OB data channel
DMAChannel obDmaTcdClockChan; // 2 OB clock channel
// **==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==*
// BUFFERS
// **==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==*
// CHANGE THIS (AND ONLY THIS) TO SET THE RECORD SIZE
#define DMADATASIZE8 508 // in bytes - min 4 max 508 (9 bits)
// **==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==*
#define DMABUFFSIZEIBPK32 (DMADATASIZE8)
#define DMABUFFSIZEIBPK8  (DMADATASIZE8 * 4)
#define DMABUFFSIZEIBUP32 (DMADATASIZE8/4)
#define DMABUFFSIZEIBUP8  (DMADATASIZE8+1)  // +1 is for the terminal 0xFF flag

#define DMABUFFSIZEOBUP32 (DMADATASIZE8/4)
#define DMABUFFSIZEOBUP8  (DMADATASIZE8+1)  // +1 is for the terminal 0xFF flag
#define DMABUFFSIZEOBPK32 (DMADATASIZE8+1)  // +1 is for the terminal 0xFF flag in a uint32
#define DMABUFFSIZEOBPK8  ((DMADATASIZE8+1) * 4) 

// INBOUND
// Each IB data byte is read in as one uint32_t in ibPackedBuff
//  and then condensed to one uint8_t per data byte in ibUnpackedBuff 
volatile uint32_t padding1[32]; // paddings defined to detect buffer overruns in testing
union {
  volatile uint32_t buff32[DMABUFFSIZEIBPK32]; // 
  volatile uint8_t  buff8 [DMABUFFSIZEIBPK8]; // for presetting ib area to detect ib change, only
} volatile ibPackedBuff;
volatile uint32_t padding2[32];
union {
  volatile uint32_t buff32[DMABUFFSIZEIBUP32]; // for user data manip only
  volatile char str[44];
  volatile uint32_t serial;
  volatile uint8_t  buff8 [DMABUFFSIZEIBUP8]; 
} volatile ibUnpackedBuff;

// OUTBOUND
// Data is stored by the user as bytes in obUnpackedBuff and then expanded to 
// uint32_t per byte in toggle format as the data is written to the port toggle register. 
// This makes it easy to just hit the bottom 8 bits of the port register.
// The T3.6 can do 8 bit port writes but this is not supported by later devices eg T4.x
// so is not used here.
volatile uint32_t padding3[32];
union {
  volatile uint32_t buff32[DMABUFFSIZEOBUP32]; // for user data manip only
  volatile uint8_t  buff8 [DMABUFFSIZEOBUP8];
  volatile uint32_t serial;
} volatile obUnpackedBuff;
volatile uint32_t padding4[32];
union {
  volatile uint32_t buff32[DMABUFFSIZEOBPK32];
  volatile uint8_t  buff8 [DMABUFFSIZEOBPK8]; 
} volatile obPackedBuff;
volatile uint32_t padding5[32];

// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// CLOCK
// Only two toggle bytes transmitted change anything, the rest are 
// just no-ops to fix the position with respect to the data byte and
// to set the width of the clock pulse.
#define OBCLOCKDATALEN8 (2*4) // must be 8 - 508 and a multiple of 4
volatile uint32_t padding6;
union {
  volatile uint32_t buff32[OBCLOCKDATALEN8/4];
  volatile uint8_t  buff8 [OBCLOCKDATALEN8];
} volatile obClock;
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// TEENSY IO REGISTERS for Teensy 3.6
// WARNING - "V" PINS ARE ON THE BACK OF THE BOARD
// WARNING You cannot use the pin values via pinXtable[n] from the tables below 
// without explicitly translating them into the correct registers for the device 
// you are using so explicit pin numbers are used in this program for simplicity.
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*  PortA, 10 bits
//                                               V   V   V
//                   A5,A12,A13,A14,A15,A16,A17,A26,A28,A29
//byte pinAtable[] = { 25,  3,  4, 26, 27, 28, 39, 42, 40, 41};
// none used                      
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortB, 16 bits 
//                                   V   V                           V   V   V   V
//                   B0, B1, B2, B3, B4, B5,B10,B11,B16,B17,B18,B19,B20,B21,B22,B23
//byte pinBtable[] = { 16, 17, 19, 18, 49, 50, 31, 32,  0,  1, 29, 30, 43, 46, 44, 45}; 
// debug               D5  D6  D8  D7          D2  D1  D9 D10  D4  D3
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortC -> 12 bits, but LED is on pin 13, pins all in binary order!!!!
// all on front
//                      C0, C1, C2, C3, C4, C5, C6, C7, C8, C9,C10,C11
//uint8_t pinCtable[] = { 15, 22, 23,  9, 10, 13, 11, 12, 35, 36, 37, 38}; 
//                       *   *   *   *   *   *   *   *   
// using the first 8 bits for data outbound
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortD, 8 bits in binary order.
// all on front
//                      D0, D1, D2, D3, D4, D5, D6, D7
//uint8_t pinDtable[] = {  2, 14,  7,  8,  6, 20, 21,  5};
//                       *   *   *   *   *   *   *   *
// using the first 8 bits for data inbound
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortE, 5 bits, in binary order.
//                   V   V         
//                  E10,E11,E24,E25,E26  NOT CONTIG
//byte pinEtable[] = { 56, 57, 33, 34, 24};
// clock out, in                *   *
// note: E0 - E5 used by built-in SDCARD
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// using 33 & 34 as clock out & clock in
#define CLOCK_OUT_PIN 33  // NOT REFERENCED - the port bit is used instead
#define CLOCK_OUT_PORT_BIT (1<<0) // Pin 33 = bit 24 of Port E = 0th bit of MSByte
#define CLOCK_IN_PIN  34   // NOT REFERENCED - only the port E is accessed!
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// 10 pins used to drive LEDs for debugging
#define DEBUG_PIN_01 32
#define DEBUG_PIN_02 31
#define DEBUG_PIN_03 30
#define DEBUG_PIN_04 29
#define DEBUG_PIN_05 16
#define DEBUG_PIN_06 17
#define DEBUG_PIN_07 18
#define DEBUG_PIN_08 19
#define DEBUG_PIN_09 0
#define DEBUG_PIN_10 1
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// PRINT ROUTINES
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void printIbPackedBuff(int event, int loop) { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  Serial.printf("IBPK %d %d\n",event,loop);
  for (int ii = 0; ii < DMABUFFSIZEIBPK8;ii++ ){  
    Serial.printf("%02X ",ibPackedBuff.buff8[ii]);
    if (ii > 0) {
      if ((ii+1) %  4 == 0) {Serial.printf("  ");}
      if ((ii+1) % 16 == 0) {Serial.printf("\n");}
    }
  }
  Serial.printf("\n");
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void printIbUnpackedBuff(int event, int loop) { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  Serial.printf("IBUP %d %d - \n",event,loop);
  for (int ii = 0; ii < DMABUFFSIZEIBUP8;ii++ ){  
    Serial.printf("%02X ",ibUnpackedBuff.buff8[ii]);
    if (ii > 0) {
      if ((ii+1) %  4 == 0) {Serial.printf("  ");}
      if ((ii+1) % 16 == 0) {Serial.printf("\n");}
    }
  }
  Serial.printf("\n");
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void printObPackedBuff(int event) { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
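  // Only the event is printed: this function has no loop parameter, and the bare
  // name "loop" would refer to the Arduino loop() function.
  Serial.printf("OBPK %d - \n",event);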

  for (int ii = 0; ii < DMABUFFSIZEOBPK8;ii++ ){
    Serial.printf("%02X ",obPackedBuff.buff8[ii]);
    if (ii > 0) {
      if ((ii+1) %  4 == 0) {Serial.printf("  ");}
      if ((ii+1) % 16 == 0) {Serial.printf("\n");}
    }
  }
  if ((DMABUFFSIZEOBPK8) % 16 != 0) {Serial.printf("\n");} // finish a partial last line
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// BUFFER INITIALISATION ROUTINES
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setClockBuff() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Clear whole buffer to no-ops
  for (int ii=0;ii<OBCLOCKDATALEN8/4;ii++){
    obClock.buff32[ii] = 0x00; // address NO pins - this is a NO-OP
  }
  // set a specific pin bit here, that pin will be toggled
  // on the first and last of the minor loops
  // LSB FIRST so bit 24 is in the last byte!
  obClock.buff8[3] = CLOCK_OUT_PORT_BIT;                     // set toggle pin for low
  obClock.buff8[OBCLOCKDATALEN8 - 1] = CLOCK_OUT_PORT_BIT;   // set toggle pin for high
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void preSetIbPackedBuff() { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Used for debugging only to prove that DMA has written to the buffer
  for (int ii=0;ii<DMABUFFSIZEIBPK32-1;ii++){
    ibPackedBuff.buff32[ii] = 0xAABBCCDD;
  }
  ibPackedBuff.buff32[DMABUFFSIZEIBPK32-1] = 0x11223344;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setIbUnpackedBuff() { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  for (int ii=0;ii<DMABUFFSIZEIBUP8;ii++) {
    ibUnpackedBuff.buff8[ii] = 0x00;
  }
  ibUnpackedBuff.buff8[DMABUFFSIZEIBUP8-1] = 0xFF;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setIbUnpackedBuffTesting() { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Used in debugging only
  for (int ii=0;ii<DMABUFFSIZEIBUP8;ii++) {
    ibUnpackedBuff.buff8[ii] = ii + 16;
  }
  ibUnpackedBuff.buff8[DMABUFFSIZEIBUP8-1] = 0xFF;
  return; // NOTE: the alternative fixed pattern below is never reached
  ibUnpackedBuff.buff8[0] = 0x11;
  ibUnpackedBuff.buff8[1] = 0x12;
  ibUnpackedBuff.buff8[2] = 0x13;
  ibUnpackedBuff.buff8[3] = 0x14;
  ibUnpackedBuff.buff8[4] = 0x15;
  ibUnpackedBuff.buff8[5] = 0x16;
  ibUnpackedBuff.buff8[6] = 0x17;
  ibUnpackedBuff.buff8[7] = 0x18;
  ibUnpackedBuff.buff8[8] = 0x19;
  ibUnpackedBuff.buff8[9] = 0x1A;
  ibUnpackedBuff.buff8[10] = 0x1B;
  ibUnpackedBuff.buff8[11] = 0x0C;
  ibUnpackedBuff.buff8[DMABUFFSIZEIBUP8-1] = 0xFF;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// PACKING / UNPACKING ROUTINES
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void unPackIbPkToIbUp() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Unpack IB Data  - from ibPackedBuff to ibUnpackedBuff, freeing ibPackedBuff
  for (int ii = 0; ii < DMABUFFSIZEIBPK32;ii++ ){
    // The GPIO port is 32 bit but we only want 8
    ibUnpackedBuff.buff8[ii] = ibPackedBuff.buff8[(ii*4)+0]; // least significant byte of the 32-bit word (little-endian)
  } 
  ibUnpackedBuff.buff8[DMABUFFSIZEIBUP8-1] = 0xFF;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void moveIbUpToObUp() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  for (int ii=0;ii<DMABUFFSIZEIBUP8;ii++) { // inc terminal 0xFF
    obUnpackedBuff.buff8[ii] = ibUnpackedBuff.buff8[ii]; 
  }
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void packObUpToObPk() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// assumes that the port takes bits 31-0 as 4 bytes 
  // Force the 0xFF in the INPUT data before we begin
  obUnpackedBuff.buff8[DMABUFFSIZEOBUP8-1] = 0xFF; // force 0xFF for toggling
  // convert the data in bytes to int32s
  for (int ii=0;ii<DMABUFFSIZEOBUP8;ii++) {
    obPackedBuff.buff8[(ii*4)+1] = 0x00;
    obPackedBuff.buff8[(ii*4)+2] = 0x00;
    obPackedBuff.buff8[(ii*4)+3] = 0x00;
    if (ii==0) {
      obPackedBuff.buff8[(ii*4)+0] = 0xFF ^ obUnpackedBuff.buff8[ii];
    } else {
      obPackedBuff.buff8[(ii*4)+0] = obUnpackedBuff.buff8[ii-1] ^ obUnpackedBuff.buff8[ii];
    }
  }
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
char transferSizeS[17];
char ssizeS[17];
char dsizeS[17];
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void showTransferSize(int32_t size) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
	switch (size) {
    case 0:  strcpy(transferSizeS,"8 bit"); break;
    case 1:  strcpy(transferSizeS,"16 bit"); break;
    case 2:  strcpy(transferSizeS,"32 bit"); break;
    case 4:  strcpy(transferSizeS,"16-byte burst"); break;
    case 5:  strcpy(transferSizeS,"32-byte burst"); break;
    default: strcpy(transferSizeS,"INVALID");
	}
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void dumpDmaTcd (DMAChannel  &dmabc, uint32_t event ) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// FOR DEBUGGING ONLY
// THIS CODE NEEDS COMPLETE CHECKING
// NOTE using
// void dumpDmaTcd (DMAChannel  dmabc, uint32_t event ) {
// results in the function reporting the TCD data correctly 
// BUT DMA will fail invisibly afterwards!!!!
// Using & is essential !
  // 32 bit registers have the low order byte first?
  Serial.printf("\n*** TCD For Channel %d %s %d\n",dmabc.channel,"for Event",event);
// DMA STUFF
  Serial.printf("DMA\n");
  // DMA_CR
  // These are separate eDMA registers, not fields of DMA_CR
  Serial.printf("  DMA_CR:  %08X\n", DMA_CR);
  Serial.printf("  DMA_ES:  %x\n"  , DMA_ES);
  Serial.printf("  DMA_ERQ: %08X\n", DMA_ERQ);
  Serial.printf("  DMA_EEI: %x \n" , DMA_EEI);
  Serial.printf("  DMA_ERR: %x\n"  , DMA_ERR);
  
  volatile uint32_t *myQ = &DMA_ERQ; // enable-request register (0x4000800C)
  Serial.printf("ERQ set for channels ");
  if (*myQ & 0x000001) {Serial.printf("0 ");}
  if (*myQ & 0x000002) {Serial.printf("1 ");}
  if (*myQ & 0x000004) {Serial.printf("2 ");}
  if (*myQ & 0x000008) {Serial.printf("3 ");}
  Serial.printf( "  DMA_ERQ  %08X",*myQ); Serial.printf("\n");
// ***

  //uint32_t *myX4 = (uint32_t *)(&DMA_CDNE); Serial.printf(" DMA_CDNE %08X",*myX4); is a clear reg!
  uint32_t *myX5 = (uint32_t *)(&DMA_INT);  Serial.printf( "  DMA_INT  %08X",*myX5);
  uint32_t *myX6 = (uint32_t *)(&DMA_ERR);  Serial.printf( "  DMA_ERR  %08X",*myX6);
  uint32_t *myX8 = (uint32_t *)(&DMA_HRS);  Serial.printf( "  DMA_HRS  %08X\n",*myX8);

// TCD STUFF  
  Serial.printf("TCD\n");

	Serial.printf("  SADDR:%08X SOFF:%d",(uint32_t)dmabc.TCD->SADDR,dmabc.TCD->SOFF);
	Serial.printf("  DADDR:%08X DOFF:%d",(uint32_t)dmabc.TCD->DADDR,dmabc.TCD->DOFF);
	Serial.printf("  NBYTES:%d SLAST:%d\n",        dmabc.TCD->NBYTES,dmabc.TCD->SLAST);
	// TCD ATTR
  int32_t dsize = dmabc.TCD->ATTR & 0b0000000000000111;
  int32_t dmod  = dmabc.TCD->ATTR & 0b0000000011111000; dmod >>= 3;
  int32_t ssize = dmabc.TCD->ATTR & 0b0000011100000000; ssize >>= 8;
  int32_t smod  = dmabc.TCD->ATTR & 0b1111100000000000; smod >>= 11;
  showTransferSize(ssize); strcpy(ssizeS,transferSizeS);
  showTransferSize(dsize); strcpy(dsizeS,transferSizeS);
	Serial.printf("  ATTR:%08X",dmabc.TCD->ATTR);
	Serial.printf("    SMOD:%X SSIZE:%X DMOD:%d DSIZE:%d",smod,ssize,dmod,dsize);
	Serial.printf("    SSIZE:%s DSIZE:%s\n",ssizeS,dsizeS);
	  
	  
  uint16_t citer;
  uint16_t citerCh;
  char celnkS[17];
  int16_t citerelnk  = dmabc.TCD->CITER & 0x8000;
	Serial.printf(" CITER:%08X",dmabc.TCD->CITER);
  if (citerelnk != 0) {
    citer = dmabc.TCD->CITER & 0x01FF;
    citerCh = (dmabc.TCD->CITER & 0x3E00)>>9;
    sprintf(celnkS,"YES, CH=%d",citerCh);
  } else {
    strcpy(celnkS,"NO ");
    citer = dmabc.TCD->CITER & 0x7FFF;
  }
  
  uint32_t biter;
  uint16_t biterCh;
  char belnkS[17];
  int16_t biterelnk  = dmabc.TCD->BITER & 0x8000;
	Serial.printf(" BITER:%08X",dmabc.TCD->BITER);
  if (biterelnk != 0) {
    biter = dmabc.TCD->BITER & 0x01FF;
    biterCh = (dmabc.TCD->BITER & 0x3E00)>>9;
    sprintf(belnkS,"YES, CH=%d",biterCh);
  } else {
    strcpy(belnkS,"NO ");
    biter = dmabc.TCD->BITER & 0x7FFF;
  }
  
	Serial.printf(" CITER:%d BITER:%d CELNK:%s BELNK:%s\n", 
	  citer, biter, celnkS, belnkS);
  if (citerelnk != biterelnk) {
	  Serial.printf("CELNK DOES NOT MATCH BELNK\n");
	}
  if (citer != biter) {
	  Serial.printf("CITER DOES NOT MATCH BITER\n");
	}
	
	
	Serial.printf(" DLASTSGA:%08X\n", dmabc.TCD->DLASTSGA);
	// CSR
	Serial.printf(" CSR:%08X\n",       dmabc.TCD->CSR);
	Serial.printf("   BWC=%08X ",      dmabc.TCD->CSR & DMA_TCD_CSR_BWC(3)); // mask both BWC bits; an argument of 0 gives a zero mask
	Serial.printf("   LINKCH=%08X ",   dmabc.TCD->CSR & 0x1F00); // MAJORLINKCH field, assumed to be bits 12-8 on the K66
	Serial.printf("   MAJLINK=%08X\n", dmabc.TCD->CSR & DMA_TCD_CSR_MAJORELINK);
	Serial.printf("   ESG=%08X ",      dmabc.TCD->CSR & DMA_TCD_CSR_ESG);
	Serial.printf("   DREQ=%08X ",     dmabc.TCD->CSR & DMA_TCD_CSR_DREQ);
	Serial.printf("   INTH=%08X ",     dmabc.TCD->CSR & DMA_TCD_CSR_INTHALF);
	Serial.printf("   INTF=%08X ",     dmabc.TCD->CSR & DMA_TCD_CSR_INTMAJOR);
	Serial.printf("   START=%08X\n",   dmabc.TCD->CSR & DMA_TCD_CSR_START);
	Serial.printf("   ACTIVE=%08X ",   dmabc.TCD->CSR & DMA_TCD_CSR_ACTIVE);
	Serial.printf("   DONE=%08X ",     dmabc.TCD->CSR & DMA_TCD_CSR_DONE);
	Serial.printf("\n");
	//Serial.printf("CITER:%0X %d\n",dmabc.TCD->CITER, dmabc.TCD->CITER);
	//showDmaErr(dmabc.channel);
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void showDmaErr(int ch) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
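volatile uint32_t *myErr = &DMA_ES; // DMA Error Status register (0x40008004)
uint32_t worki; // unsigned, so that the bit 31 (VLD) test below shifts down to 1 and not -1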
  //Serial.print("chECP=");
  //Serial.println((int)myECP2,HEX);
  Serial.print("Error Register: ");
  Serial.println(*myErr,HEX);
  
  worki = *myErr & 0x80000000; worki >>= 31;
  if (worki==1){
    worki = *myErr & 0x00000001; worki >>= 0;
    if (worki==1){Serial.printf("DBE error\n");}
    worki = *myErr & 0x00000002; worki >>= 1;
    if (worki==1){Serial.printf("SBE error\n");}
    worki = *myErr & 0x00000004; worki >>= 2;
    if (worki==1){Serial.printf("SGE error\n");}
    worki = *myErr & 0x00000008; worki >>= 3;
    if (worki==1){Serial.printf("NCE error\n");}
    worki = *myErr & 0x00000010; worki >>= 4;
    if (worki==1){Serial.printf("DOE error\n");}
    worki = *myErr & 0x00000020; worki >>= 5;
    if (worki==1){Serial.printf("DAE error\n");}
    worki = *myErr & 0x00000040; worki >>= 6;
    if (worki==1){Serial.printf("SOE error\n");}
    worki = *myErr & 0x00000080; worki >>= 7;
    if (worki==1){Serial.printf("SAE error\n");}
  
    worki = *myErr & 0x00001F00; worki >>= 8;
    Serial.printf("error channel %d\n",worki);
  
    worki = *myErr & 0x00004000; worki >>= 14;
    if (worki==1){Serial.printf("CPE error\n");}
    worki = *myErr & 0x00008000; worki >>= 15;
    if (worki==1){Serial.printf("GPE error\n");}
    worki = *myErr & 0x00010000; worki >>= 16;
    if (worki==1){Serial.printf("ECX error\n");}
  }
}
// TODO - USE portConfigRegister(pin);
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void forceDataPinsHigh() { // used for debugging
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // USE DIGITS DIRECTLY
  digitalWrite(15,HIGH);
  digitalWrite(22,HIGH);
  digitalWrite(23,HIGH); 
  digitalWrite(9, HIGH);
  digitalWrite(10,HIGH);
  digitalWrite(13,HIGH);
  digitalWrite(11,HIGH);
  digitalWrite(12,HIGH); 
  // NOTE The last pin in the pin table gives the highest value bit
  delayMicroseconds(5); // REQUIRED
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
int32_t getCiter(DMAChannel  &dmabc) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  uint16_t citer;
  int16_t citerelnk  = dmabc.TCD->CITER & 0x8000;
  if (citerelnk != 0) {
    citer = dmabc.TCD->CITER & 0x01FF;
  } else {
    citer = dmabc.TCD->CITER & 0x7FFF;
  }
  return citer;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
int32_t getBiter(DMAChannel  &dmabc) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  uint16_t biter;
  int16_t biterelnk  = dmabc.TCD->BITER & 0x8000;
  if (biterelnk != 0) {
    biter = dmabc.TCD->BITER & 0x01FF;
  } else {
    biter = dmabc.TCD->BITER & 0x7FFF;
  }
  return biter;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void forceDataPinsLow() { // used for debugging
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // USE DIGITS DIRECTLY
  digitalWrite(15,LOW);
  digitalWrite(22,LOW);
  digitalWrite(23,LOW); 
  digitalWrite(9, LOW);
  digitalWrite(10,LOW);
  digitalWrite(13,LOW);
  digitalWrite(11,LOW);
  digitalWrite(12,LOW); 
  // NOTE The last pin in the pin table gives the highest value bit
  delayMicroseconds(5); // REQUIRED
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setup() { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// DATA  OUTPUT=PORTC INPUT=PORTD
// CLOCK OUTPUT=PORTE INPUT=PORTE

  // DEBUGGING START
  // 10 pins drive LEDs for debug communication when Serial is not in use
  // ie when all processors are being powered by direct power USB source
  // Setup debug pins
  pinMode(DEBUG_PIN_01, OUTPUT);
  pinMode(DEBUG_PIN_02, OUTPUT);
  pinMode(DEBUG_PIN_03, OUTPUT);
  pinMode(DEBUG_PIN_04, OUTPUT);
  pinMode(DEBUG_PIN_05, OUTPUT);
  pinMode(DEBUG_PIN_06, OUTPUT);
  pinMode(DEBUG_PIN_07, OUTPUT);
  pinMode(DEBUG_PIN_08, OUTPUT);
  pinMode(DEBUG_PIN_09, OUTPUT);
  pinMode(DEBUG_PIN_10, OUTPUT); 
  digitalWrite(DEBUG_PIN_01, LOW);
  digitalWrite(DEBUG_PIN_02, LOW);
  digitalWrite(DEBUG_PIN_03, LOW);
  digitalWrite(DEBUG_PIN_04, LOW);
  digitalWrite(DEBUG_PIN_05, LOW);
  digitalWrite(DEBUG_PIN_06, LOW);
  digitalWrite(DEBUG_PIN_07, LOW);
  digitalWrite(DEBUG_PIN_08, LOW);
  digitalWrite(DEBUG_PIN_09, LOW);
  digitalWrite(DEBUG_PIN_10, LOW);
  // DEBUGGING END
  
  Serial.begin(115200);  
  delay(5000);

  // Handle the serial number of this Teensy
  teensySN(myId.serial);
  myId.myTeensySerial = __builtin_bswap32(myId.myTeensySerial); // IS THIS NECESSARY ? !!!
  mySerialIx = getSerialIx(myId.myTeensySerial);
  Serial.printf("SER=%d\n",myId.myTeensySerial);

  // The processor with the lowest id is the Master
  weAreMaster = true;
  bool weAreInList = false;
  for (uint32_t ii=0;ii<noOfProcessors;ii++){
    if (serialList[ii] <  myId.myTeensySerial) {weAreMaster = false;}
    if (serialList[ii] == myId.myTeensySerial) {weAreInList = true ;}
  }
  if (weAreMaster){
    initialRecordToBeCreated = true;
    Serial.printf("MASTER=YES\n");
  } else {
    initialRecordToBeCreated = false;
    Serial.printf("MASTER=NO\n");
  }
  // If our serial number is not in the list, hard stop
  if (!weAreInList) {
    Serial.printf("OUR SERIAL %d NOT IN LIST\n",myId.myTeensySerial);
    for (uint32_t ii=0;ii<noOfProcessors;ii++){
      Serial.printf("  %d\n",serialList[ii]);
    }
    while(1==1){} // HARD LOOP
  }
  Serial.printf("NOPR  %d\n",noOfProcessors); // for debugging

  // Build the clocking data
  setClockBuff();
  
  // Setup Data Input Pins
  //for (int ii=0;ii<8;ii++) {pinMode(pinDtable[ii], INPUT);} WILL NOT WORK
  pinMode(2,  INPUT);
  pinMode(14, INPUT);
  pinMode(7,  INPUT);
  pinMode(8,  INPUT);
  pinMode(6,  INPUT);
  pinMode(20, INPUT);
  pinMode(21, INPUT);
  pinMode(5,  INPUT);

  // Setup Output Pins
  pinMode(15,OUTPUT); // LSB
  pinMode(22,OUTPUT);
  pinMode(23,OUTPUT); 
  pinMode(9, OUTPUT);
  pinMode(10,OUTPUT);
  pinMode(13,OUTPUT);
  pinMode(11,OUTPUT);
  pinMode(12,OUTPUT); // MSB
  
  // Set up Clock Pins
  pinMode(33, OUTPUT);
  digitalWrite(33, HIGH);
  pinMode(34, INPUT);

  // Put the data pins into a known starting position
  forceDataPinsHigh();

// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// Set up the DMA channels
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
/*
To initialize the eDMA: (ex the manual)
1. Write to the CR if a configuration other than the default is desired.
2. Write the channel priority levels to the DCHPRIn registers if a configuration other than the default is desired.
3. Enable error interrupts in the EEI register if so desired.
4. Write the 32-byte TCD for each channel that may request service.
5. Enable any hardware service requests via the ERQH and ERQL registers.
6. Request channel service via either:
• Software: setting the TCDn_CSR[START]
• Hardware: slave device asserting its eDMA peripheral request signal
*/
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// DMA CLOCK CHANNEL - normally ch 2 (see the DMAChannel declarations above)
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
  Serial.printf("Clock Channel is %d\n",obDmaTcdClockChan.channel);
//1 Write CR
//2 Write Priority
//3 Enable Error Interrupts
//4 Write TCD
  obDmaTcdClockChan.begin();
  obDmaTcdClockChan.destination((volatile uint32_t&)GPIOE_PTOR); // clock pin TOGGLE register
  obDmaTcdClockChan.sourceBuffer((uint32_t *)obClock.buff32,OBCLOCKDATALEN8); // len is in BYTES!!!
  obDmaTcdClockChan.disableOnCompletion();
//5 Enable any hardware service requests via the ERQH and ERQL registers.
// ERQH and ERQL are not defined anywhere!!! The Teensy core provides a single DMA_ERQ register instead.
//6 Request channel service via either:
//  • Software: setting the TCDn_CSR[START]
//  • Hardware: slave device asserting its eDMA peripheral request signal
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// DMA DATA OB CHANNEL
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
  Serial.printf("OB Channel is %d\n",obDmaTcdDataChan.channel);
//1 Write CR
//2 Write Priority
//3 Enable Error Interrupts
//4 Write TCD
  obDmaTcdDataChan.begin();
  obDmaTcdDataChan.destination((volatile uint32_t&)GPIOC_PTOR); // TOGGLE REGISTER 
  obDmaTcdDataChan.sourceBuffer((uint32_t *)obPackedBuff.buff32,(DMABUFFSIZEOBPK8)); // LEN in BYTES!!!
  obDmaTcdDataChan.transferCount(DMABUFFSIZEOBPK32);
  obDmaTcdDataChan.disableOnCompletion();
  // Bandwidth Control. Provides a means of controlling the amount of bus bandwidth the eDMA uses.
  //  00 No stalls–consume 100% bandwidth
  //  01 Reserved
  //  10 eDMA stalls for 4 cycles after each read/write 
  //  11 eDMA stalls for 8 cycles after each read/write
  //obDmaTcdDataChan.TCD->CSR |= 0xC000; // or in the BWC
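  // Linking: each data minor loop starts the clock channel, the clock channel
  // re-triggers itself until its major loop completes, and that completion
  // starts the next data minor loop.
  obDmaTcdClockChan.triggerAtTransfersOf( obDmaTcdDataChan);
  obDmaTcdClockChan.triggerAtTransfersOf( obDmaTcdClockChan);
  obDmaTcdDataChan.triggerAtCompletionOf( obDmaTcdClockChan);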
  // following not used as we send 0xFF to set the port but do not require
  // the receiver to read it (as we could not get it to work!) 
  // not used - obDmaTcdClockChan.triggerAtCompletionOf(obDmaTcdDataChan);
//5 Enable any hardware service requests via the ERQH and ERQL registers.
//6 Request channel service via either:
//  • Software: setting the TCDn_CSR[START]
//  • Hardware: slave device asserting its eDMA peripheral request signal
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// DMA DATA IB CHANNEL
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
  Serial.printf("IB Channel is %d\n",ibDmaTcdDataChan.channel);
//1 Write CR
//2 Write Priority
//3 Enable Error Interrupts
//4 Write TCD
  // Note This channel should never run against the other 2 channels but might
  // encounter Library channels if any are added.
  ibDmaTcdDataChan.begin();
  ibDmaTcdDataChan.source((volatile uint32_t&) GPIOD_PDIR);
  ibDmaTcdDataChan.destinationBuffer(ibPackedBuff.buff32,DMABUFFSIZEIBPK8); // note byte length
  ibDmaTcdDataChan.transferCount(DMABUFFSIZEIBPK32); // ?????????????????????????????
  // This channel is continuously enabled:
  // ibDmaTcdDataChan.disableOnCompletion();
  ibDmaTcdDataChan.attachInterrupt(dmaGetEndedInterrupt);
  ibDmaTcdDataChan.interruptAtCompletion();
  // The following type of "enable request" is needed for hardware triggering
  ibDmaTcdDataChan.enable(); // same as next: // this triggers 1 minor loop
  // DMA_SERQ = ibDmaTcdDataChan.channel; 
  // This channel is left enabled at the end of each major loop
//5 Enable any hardware service requests via the ERQH and ERQL registers.
  //Address: 4002_1000h base + 0h offset + (1d × i), where i=0d to 31d
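  // The DMAMUX CHCFGn registers are 8 bits wide and 1 byte apart, so a byte pointer is needed;
  // a uint32_t pointer would step 4 bytes per channel and overwrite the neighbouring channels.
  volatile uint8_t *myQ = (volatile uint8_t *)(0x40021000)+ibDmaTcdDataChan.channel; 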
  *myQ = 0;
//  *myQ = DMAMUX_ENABLE | (DMAMUX_SOURCE_PORTA & 63) ;
dumpDmaTcd(ibDmaTcdDataChan, 11111);
//  *myQ = DMAMUX_ENABLE | DMAMUX_SOURCE_PORTA ;
  *myQ = DMAMUX_ENABLE | DMAMUX_SOURCE_PORTE ;
dumpDmaTcd(ibDmaTcdDataChan, 22222);
//6 Request channel service via either:
//  • Software: setting the TCDn_CSR[START]
//  • Hardware: slave device asserting its eDMA peripheral request signal
  //NVIC_SET_PRIORITY(IRQ_PORTE, 32); // not tried yet
	PORTE_PCR25 |= PORT_PCR_IRQC(2); // Trigger DMA request on falling edge
// MISC NOTES
  ibDmaTcdDataChan.triggerAtHardwareEvent(DMAMUX_SOURCE_PORTE);
dumpDmaTcd(ibDmaTcdDataChan, 33333);
  
  delay(3000); // allow time for dma to do mischief
dumpDmaTcd(ibDmaTcdDataChan, 44444);
  
  // **** Start Right Royal Bodge
  // No matter how we set up the IB DMA channel, we always get one minor loop executed
  // BEFORE we have received an input clock pin interrupt!
  // This leaves the channel out of phase with the sender.
  // This bodge modifies the TCD to look like new
  // CRUDE RESET
  ibDmaTcdDataChan.TCD->CITER = ibDmaTcdDataChan.TCD->BITER;
  ibDmaTcdDataChan.destinationBuffer(ibPackedBuff.buff32,DMABUFFSIZEIBPK8); // note byte length
  // **** End Right Royal Bodge
  
  if (weAreMaster) {delay(2000);} // ensure others are ready to read
dumpDmaTcd(ibDmaTcdDataChan, 55555);

  Serial.println("SETUP ENDED");  
  startMillis = millis(); // for debugging
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
//bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
//type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)
//These builtins perform an atomic compare and swap. That is, if the current value of *ptr is oldval, then write newval into *ptr.
//The “bool” version returns true if the comparison is successful and newval was written. 
//The “val” version returns the contents of *ptr before the operation. 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
inline bool enqueue( volatile unsigned int *lock ) { // not used
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
	return (__sync_bool_compare_and_swap( lock, 0, 1 ) );
	// returns true  if lock was 0 and was replaced by 1
	// returns false if lock was 1 ie owned by someone else
}
//Prior to initializing the corresponding module, set SCGC5[PORTx] in the SIM module to enable the clock. Before turning off the clock, make sure to disable the module. 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
inline void dequeue( volatile unsigned int *lock ) { // not used
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  asm volatile ( "" ::: "memory" );
  *lock = 0;
  // sets lock to 0, releasing it
  return;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void doDmaGet() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // no longer required as IB DMA now continuously enabled
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void doDmaPut() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
//  DMA_SERQ = 1; // Start OB - NOT needed
  obDmaTcdDataChan.TCD->CSR |= DMA_TCD_CSR_START ; // 0x0001
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void dmaGetEndedInterrupt() { // Called by DMA IB on completion of read
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  //ibDmaTcdDataChan.clearInterrupt(); // same as:
  DMA_CINT = ibDmaTcdDataChan.channel; // clear interrupt
  ibPackedBuffStatus = myFULL;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
uint8_t getSerialIx(uint32_t thisSerial) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
uint8_t ix = 0; // 0=error
  for (uint32_t ii=0;ii<noOfProcessors;ii++) {
    if (thisSerial == serialList[ii]) {ix=ii+1;}
  }
  return ix;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void loop() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  volatile bool myIbPackedBuffStatus = myEMPTY;
  // Caution - ibPackedBuffStatus is also changed in an interrupt
  // We can change it here when the IB DMA channel is not running
  
  loopCnt++;
    
  if (millis()-startMillis > 10000){ // debugging
    printIbUnpackedBuff(dmaGetCnt,loopCnt);
    dumpDmaTcd(obDmaTcdDataChan,  91);
    dumpDmaTcd(obDmaTcdClockChan, 92);
    dumpDmaTcd(ibDmaTcdDataChan,  93);
    showDmaErr(obDmaTcdDataChan.channel);
    showDmaErr(obDmaTcdClockChan.channel);
    showDmaErr(ibDmaTcdDataChan.channel);
    unPackIbPkToIbUp(); 
    while(1==1){}
  }

  // Handle Startup
  if (initialRecordToBeCreated) { 
    initialRecordToBeCreated = false;
    setIbUnpackedBuff();
    moveIbUpToObUp();
    packObUpToObPk(); 
    doDmaPut();
    return;
  }

  // Handle Get-Got Interrupt Flag
  noInterrupts();
  myIbPackedBuffStatus = ibPackedBuffStatus;
  ibPackedBuffStatus = myEMPTY;
  interrupts();

  // Handle Get Got
  if (myIbPackedBuffStatus == myFULL) {
    dmaGetCnt++;
    //printIbPackedBuff(dmaGetCnt,loopCnt);
    unPackIbPkToIbUp(); 
    //printIbUnpackedBuff(dmaGetCnt,loopCnt);    
    // *** START Work - tinker with the record here in ib unpacked buffer
    ibUnpackedBuff.buff32[mySerialIx]++;
    // *** END   Work
    moveIbUpToObUp();
    //printObUnpackedBuff(dmaGetCnt,loopCnt); // function does not exist
    packObUpToObPk(); 
    //printObPackedBuff(dmaGetCnt,loopCnt);
    doDmaPut();
  }   
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setDebugPins(int32_t val) { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// Set the 10 debug pins to show val to the extent that 10 bits allows
    // DEBUG_PIN_01 shows bit 9 (the most significant of the 10 bits), DEBUG_PIN_10 shows bit 0
    const uint8_t debugPins[10] = {DEBUG_PIN_01, DEBUG_PIN_02, DEBUG_PIN_03, DEBUG_PIN_04, DEBUG_PIN_05,
                                   DEBUG_PIN_06, DEBUG_PIN_07, DEBUG_PIN_08, DEBUG_PIN_09, DEBUG_PIN_10};
    for (int ii = 0; ii < 10; ii++) {
      digitalWrite(debugPins[ii], ((val >> (9 - ii)) & 1) ? HIGH : LOW);
    }
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
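
The toggle packing done by packObUpToObPk() can be tried on a desktop compiler without any Teensy attached. The following is a minimal sketch, not the program's own code: packToggleWords() and replayToggleWords() are hypothetical names, and the port is assumed to rest at 0xFF between records, as the program itself assumes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: turn record bytes into toggle words for a port whose
// low 8 bits are assumed to rest at 0xFF between records.
std::vector<uint32_t> packToggleWords(const std::vector<uint8_t> &record) {
  std::vector<uint32_t> words;
  words.reserve(record.size() + 1);
  uint8_t portState = 0xFF;  // assumed resting state of the port
  for (uint8_t wanted : record) {
    // A 1 bit in the toggle word flips that pin, so XOR old and new values
    words.push_back(static_cast<uint32_t>(portState ^ wanted));
    portState = wanted;
  }
  // Final (unclocked) word puts the port back to 0xFF
  words.push_back(static_cast<uint32_t>(portState ^ 0xFF));
  return words;
}

// Hypothetical model of the port: apply each toggle word in turn and collect
// what a receiver would read after every clocked word. The last word is not
// clocked, so it is applied but not collected.
std::vector<uint8_t> replayToggleWords(const std::vector<uint32_t> &words,
                                       uint8_t *finalState) {
  std::vector<uint8_t> seen;
  uint8_t portState = 0xFF;
  for (std::size_t i = 0; i < words.size(); i++) {
    portState ^= static_cast<uint8_t>(words[i] & 0xFF);
    if (i + 1 < words.size()) {
      seen.push_back(portState);
    }
  }
  if (finalState != nullptr) {
    *finalState = portState;
  }
  return seen;
}
```

For a record of 0x12, 0x34 this gives the toggle words 0xED, 0x26 and 0xCB, and replaying them returns the original two bytes with the port back at 0xFF.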
 
Sounds like the clock signal is seen as "active" when the DMA is configured, so it triggers immediately. This might be
due to something expecting active-low when the signal is active-high, or vice versa. So long as the workaround
is reliable, I'd not worry.
 
In the absence of shouts of "that's a waste of time", I have cleaned up the code a bit, taken out lots of debugging stuff, and greatly enhanced the dumpDmaTcd() function, which prints a TCD's details to Serial in a more readable form.
I have now run it in a 4 processor ring and get 0.43 Mbytes/sec.
I originally envisaged "n" records circulating for "n" processors, which would have put up the throughput "n" times (without changing the speed), but it got a bit too complex.
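
As a rough check of the 0.43 Mbytes/sec figure: the 4 processor run counted 8470 records of 508 bytes in 10 seconds (the numbers quoted in the code comments). A minimal sketch of that arithmetic follows; bytesPerSecond() is a hypothetical helper and not part of the program.

```cpp
#include <cstdint>

// Hypothetical helper: average throughput from a record count over a timed window.
// The multiplication is done in 64 bits so large counts cannot overflow.
uint32_t bytesPerSecond(uint32_t records, uint32_t recordLenBytes, uint32_t seconds) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(records) * recordLenBytes) / seconds);
}
```

bytesPerSecond(8470, 508, 10) gives 430276, i.e. about 0.43 Mbytes/sec.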

Code:
// DmaSharexx.ino 

// This is a Fun program with no defined application and probably no practical use.
// NOTE:  This program uses simple crude code. No classy C++ classes.

// Objectives:
// To share data between n 3.6 teensies where n is any number greater than 1 using
// a round robin technique with a single record of 4-508 bytes passed from one to the next 
// using the same code in all teensies connected in a ring.
// To pass data 8 bits at a time
// To use DMA to the maximum extent for both data input and output
//
// Comments:
// It has been developed & tested for Teensy 3.6s only.
// The physical configuration is the same on all the teensies used.
// The same code is used on all the teensies.
// 18 pins are used:
//    8 data in  - contiguous bits on Port D
//    8 data out - contiguous bits on Port C
//    1 clock pin in  - Pin 34 on Port E
//    1 clock pin out - Pin 33 on Port E
// This does not leave many pins for doing anything else and, more importantly,
// uses up pins normally used for other things eg SPI
// The main loop has to adhere to a specific protocol, with opportunities to do
// useful work within that.  It cannot follow its own routine and call on these
// services whenever it wants to.
// The data record length is fixed at 4 - 508 bytes & must be a multiple of 4.
// The upper limit is fixed by DMA using 9 bits for the length when linking.
// Only one record is travelling around the ring.
// The original plan was for 1 record per processor which would have increased
// throughput (but not speed) n times, but it got too complex.
// The time for one circuit of the ring is the sum of the read, processing, and write
// times of all the processors on the ring, and this sets the data throughput.

// Measured: For 4 processors & Rec Len = 508 
// 8470 records * 508 bytes / 10 secs = 430276 bytes/sec

// The code uses the dmaChannel library mostly but not entirely. 
// Direct TCD manipulation is also used. 
// Basic Mechanism:
// Main Loop:
// If record read
//  process record
//  write the record (with or without changes)

// Detailed Mechanism:
// A DMA read (inbound) channel is permanently enabled, awaiting a hardware trigger.
// This will wait without blocking the cpu until activated by an incoming clock pulse.
// On completion the DMA read will raise an interrupt which will flag the read buffer as FULL.
// The main loop detects the read and processes the record.
// It then writes the record by starting a DMA channel (the outbound channel) to throw 
// the bytes of the record at the port.  
// After each 4 bytes (one minor transfer, which is one port write carrying one data
// byte), this channel stops and triggers another channel (the clock channel) which
// throws a succession of bits at an output pin, generating a clock signal.
// The clock channel triggers itself until complete and then triggers the outbound 
// channel to send the next data byte.
// This process continues until the outbound channel has sent the entire record.
// The record write does not check for the following processor being ready to receive,
// it is assumed that it always is.
// The main loop must always write the record out again.
// The main loop can take as long as it likes between reading a record and
// rewriting it, but obviously this delays the record around the ring.

// Other Details:
// All port transfers are 32 bit. The T3.6 has 8 bit port capability but this is not used.
// All data is translated from 8 bit to 32 bit before sending.
// All data is translated from 32 bit to 8 bit after receiving.
// Both data and clock are sent to the ports' toggle registers to ensure that 
// only the pins desired are affected, other pins on the ports are not affected.
// The clock data out consists of a load of no-op bits to the port followed by 
// a single toggle bit, a string of no-ops, a toggle bit, and more no-ops.
// This can be tailored to give the desired clock pulse width and positioning.
// The record length is fixed at compile time for all the processors involved.
// The DMA outbound channel is configured to send a final byte which sets the port
// to 0xFF.  This enables the toggling function to calculate toggle bits on the 
// assumption that the port always starts off as 0xFF. The final byte is not clocked
// and so is not read by the following processor.
// Unfortunately it is necessary at this time to specify the serial numbers of all
// the teensies used in the program, at compile time.
// The teensy with the lowest serial will function as the Master, which is responsible
// for creating the record in the beginning, but has no other function.
// The contents of the record are entirely up to the user, a CRC check of some 
// sort is recommended.

// A dumpDmaTcd() function is also included which describes the TCD and reduces the need
// to refer to the manual when debugging, at the cost of considerable processing time.

#include <DMAChannel.h>
#include <TeensyID.h>

struct iter_t {
  uint16_t  iter;
  uint16_t  iterCh;
  uint16_t  iterelnk;
} myCiter, myBiter;

// Teensy Serial Number
union {
  uint32_t myTeensySerial;
  uint8_t serial[4];
} myId;

uint32_t serialList[] = {578792,840980,841146,841024}; // Ix = 1,2,3,4 - 0=error
uint32_t noOfProcessors = sizeof(serialList)/sizeof(serialList[0]); // number of entries in serialList
uint8_t mySerialIx = 0;

// Master
bool weAreMaster;                         // true = WE will generate the starting record
volatile bool initialRecordToBeCreated;   // true = we haven't yet generated the start record (if we are due to)

#define myEMPTY false
#define myFULL  true

volatile bool ibPackedBuffStatus  = myEMPTY;
volatile uint32_t dmaGetCnt       = 0;
uint32_t loopCnt      = 0;
uint32_t startMillis  = 0;
float result; // used in a calculation to simulate work

// CHANNEL ORDER IS NOT IMPORTANT to this program(to be confirmed)
// The following uses available channels from 0 upwards
// They are usually 0,1,2 but may change if libraries alloc channels first
// The code is not number dependent
// The relevant priority wrt libraries if used should be considered
// The default Round-Robin priorities are assumed but should not affect it.
DMAChannel ibDmaTcdDataChan;  // 0 IB data channel  
DMAChannel obDmaTcdDataChan;  // 1 OB data channel
DMAChannel obDmaTcdClockChan; // 2 OB clock channel
// **==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==*
// BUFFERS
// **==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==*
// CHANGE THIS (AND ONLY THIS) TO SET THE RECORD SIZE
#define DMADATASIZE8 508 // in bytes - min 4 max 508 (9 bits) multiple of 4
// **==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==*
#define DMABUFFSIZEIBPK32 (DMADATASIZE8)
#define DMABUFFSIZEIBPK8  (DMADATASIZE8 * 4)
#define DMABUFFSIZEIBUP32 (DMADATASIZE8/4)
#define DMABUFFSIZEIBUP8  (DMADATASIZE8+1)  // +1 is for the terminal 0xFF flag

#define DMABUFFSIZEOBUP32 (DMADATASIZE8/4)
#define DMABUFFSIZEOBUP8  (DMADATASIZE8+1)  // +1 is for the terminal 0xFF flag
#define DMABUFFSIZEOBPK32 (DMADATASIZE8+1)  // +1 is for the terminal 0xFF flag in a uint32
#define DMABUFFSIZEOBPK8  ((DMADATASIZE8+1) * 4) 
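
// Optional compile-time guard for the limits stated above (a sketch; it assumes a
// C++11 or later compiler, which static_assert requires).
static_assert((DMADATASIZE8 >= 4) && (DMADATASIZE8 <= 508) && (DMADATASIZE8 % 4 == 0),
              "DMADATASIZE8 must be 4-508 and a multiple of 4");
// 511 is the largest count a 9 bit field can hold (the linking limit noted in the header)
static_assert(DMABUFFSIZEOBPK32 <= 511,
              "OB transfer count must fit the 9 bit count used when linking");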

// INBOUND
// IB data is read in as one data byte per uint32_t in ibPackedBuff
//  and then condensed to one data byte per byte in ibUnpackedBuff 
volatile uint32_t padding1[32]; // paddings defined to detect buffer overruns in testing
union {
  volatile uint32_t buff32[DMABUFFSIZEIBPK32]; // 
  volatile uint8_t  buff8 [DMABUFFSIZEIBPK8]; // for presetting ib area to detect ib change, only
} volatile ibPackedBuff;
volatile uint32_t padding2[32];
union {
  volatile uint32_t buff32[DMABUFFSIZEIBUP32]; // for user data manip only
  volatile char str[44];
  volatile uint32_t serial;
  volatile uint8_t  buff8 [DMABUFFSIZEIBUP8]; 
} volatile ibUnpackedBuff;

// OUTBOUND
// Data is stored by the user as bytes in obUnpackedBuff and then expanded to 
// uint32_t per byte in toggle format as the data is written to the port toggle register. 
// This makes it easy to just hit the bottom 8 bits of the port register.
// The T3.6 can do 8 bit port writes but this is not supported by later devices eg T4.x
// so is not used here.
volatile uint32_t padding3[32];
union {
  volatile uint32_t buff32[DMABUFFSIZEOBUP32]; // for user data manip only
  volatile uint8_t  buff8 [DMABUFFSIZEOBUP8];
  volatile uint32_t serial;
} volatile obUnpackedBuff;
volatile uint32_t padding4[32];
union {
  volatile uint32_t buff32[DMABUFFSIZEOBPK32];
  volatile uint8_t  buff8 [DMABUFFSIZEOBPK8]; 
} volatile obPackedBuff;
volatile uint32_t padding5[32];

// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// CLOCK
// Only two toggle bytes transmitted change anything, the rest are 
// just no-ops to fix the position with respect to the data byte 
// AND to set the width of the clock pulse.
#define OBCLOCKDATALEN8 (2*4) // must be 8 - 508 and a multiple of 4
volatile uint32_t padding6;
union {
  volatile uint32_t buff32[OBCLOCKDATALEN8/4];
  volatile uint8_t  buff8 [OBCLOCKDATALEN8];
} volatile obClock;
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// TEENSY IO REGISTERS for Teensy 3.6
// WARNING - "V" PINS ARE ON THE BACK OF THE BOARD
// WARNING You cannot use the pin values via pinXtable[n] from the tables below 
// without explicitly translating them into the correct registers for the device 
// you are using so explicit pin numbers are used in this program for simplicity.
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*  PortA, 10 bits
//                                               V   V   V
//                     A5,A12,A13,A14,A15,A16,A17,A26,A28,A29
//byte pinAtable[] = { 25,  3,  4, 26, 27, 28, 39, 42, 40, 41};
// none used                      
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortB, 16 bits 
//                                      V   V                           V   V   V   V
//                     B0, B1, B2, B3, B4, B5,B10,B11,B16,B17,B18,B19,B20,B21,B22,B23
//byte pinBtable[] = { 16, 17, 19, 18, 49, 50, 31, 32,  0,  1, 29, 30, 43, 46, 44, 45}; 
// none used
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortC -> 12 bits, but LED is on pin 13, pins all in binary order!!!!
// all on front
//                        C0, C1, C2, C3, C4, C5, C6, C7, C8, C9,C10,C11
//uint8_t pinCtable[] = { 15, 22, 23,  9, 10, 13, 11, 12, 35, 36, 37, 38}; 
//                         *   *   *   *   *   *   *   *   
// using the first 8 bits for data outbound
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortD, 8 bits in binary order.
// all on front
//                        D0, D1, D2, D3, D4, D5, D6, D7
//uint8_t pinDtable[] = {  2, 14,  7,  8,  6, 20, 21,  5};
//                         *   *   *   *   *   *   *   *
// using the first 8 bits for data inbound
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortE, 5 bits, in binary order.
//                      V   V         
//                    E10,E11,E24,E25,E26  NOT CONTIG
//byte pinEtable[] = { 56, 57, 33, 34, 24};
// clock out, in                *   *
// note: E0 - E5 used by built-in SDCARD
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// using 33 & 34 as clock out & clock in
#define CLOCK_OUT_PIN 33  // NOT REFERENCED - the port bit is used instead
#define CLOCK_OUT_PORT_BIT (1<<0) // Pin 33 = bit 24 of Port E = 0th bit of MSByte
#define CLOCK_IN_PIN  34   // NOT REFERENCED - only the port interrupt is referenced
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// PRINT ROUTINES
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void printIbPackedBuff(int event, int loop) { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  Serial.printf("IBPK %d %d\n",event,loop);
  for (int ii = 0; ii < DMABUFFSIZEIBPK8;ii++ ){  
    Serial.printf("%02X ",ibPackedBuff.buff8[ii]);
    if (ii > 0) {
      if ((ii+1) %  4 == 0) {Serial.printf("  ");}
      if ((ii+1) % 16 == 0) {Serial.printf("\n");}
    }
  }
  Serial.printf("\n");
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void printIbUnpackedBuff(int event, int loop) { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  Serial.printf("IBUP %d %d - \n",event,loop);
  for (int ii = 0; ii < DMABUFFSIZEIBUP8;ii++ ){  
    Serial.printf("%02X ",ibUnpackedBuff.buff8[ii]);
    if (ii > 0) {
      if ((ii+1) %  4 == 0) {Serial.printf("  ");}
      if ((ii+1) % 16 == 0) {Serial.printf("\n");}
    }
  }
  Serial.printf("\n");
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void printObPackedBuff(int event, int loop) { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  Serial.printf("OBPK %d %d - \n",event,loop);
  for (int ii = 0; ii < DMABUFFSIZEOBPK8;ii++ ){
    Serial.printf("%02X ",obPackedBuff.buff8[ii]);
    if (ii > 0) {
      if ((ii+1) %  4 == 0) {Serial.printf("  ");}
      if ((ii+1) % 16 == 0) {Serial.printf("\n");}
    }
  }
  if ((DMABUFFSIZEOBPK8) % 16 != 0) {Serial.printf("\n");}
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// BUFFER INITIALISATION ROUTINES
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setClockBuff() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Clear whole buffer to no-ops
  for (int ii=0;ii<OBCLOCKDATALEN8/4;ii++){
    obClock.buff32[ii] = 0x00; // address NO pins - this is a NO-OP
  }
  // set a specific pin bit to be toggled
  // once to set the clock line low and once to set it high again
  // on a minor loop each
  // Currently set to first and last minor loops
  // Currently set to zero no-ops ie fastest possible
  // LSB FIRST so bit 24 for PIN 33 is in the last byte!
  obClock.buff8[3] = CLOCK_OUT_PORT_BIT;                     // set toggle pin for low
  obClock.buff8[OBCLOCKDATALEN8 - 1] = CLOCK_OUT_PORT_BIT;   // set toggle pin for high
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void preSetIbPackedBuff() { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Used when debugging only to prove that DMA has written to the buffer
  for (int ii=0;ii<DMABUFFSIZEIBPK32-1;ii++){
    ibPackedBuff.buff32[ii] = 0xAABBCCDD;
  }
  ibPackedBuff.buff32[DMABUFFSIZEIBPK32-1] = 0x11223344;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setIbUnpackedBuff() { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  for (int ii=0;ii<DMABUFFSIZEIBUP8;ii++) {
    ibUnpackedBuff.buff8[ii] = 0x00;
  }
  ibUnpackedBuff.buff8[DMABUFFSIZEIBUP8-1] = 0xFF;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// PACKING / UNPACKING ROUTINES
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void unPackIbPkToIbUp() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Unpack IB Data  - from ibPackedBuff to ibUnpackedBuff, freeing ibPackedBuff
  for (int ii = 0; ii < DMABUFFSIZEIBPK32;ii++ ){
    // The GPIO port is 32 bit but we only want 8
    ibUnpackedBuff.buff8[ii] = ibPackedBuff.buff8[(ii*4)+0]; // least significant byte (little-endian)
  } 
  ibUnpackedBuff.buff8[DMABUFFSIZEIBUP8-1] = 0xFF;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void moveIbUpToObUp() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Move IBUP to OBUP, no conversion necessary
  for (int ii=0;ii<DMABUFFSIZEIBUP8;ii++) { // inc terminal 0xFF
    obUnpackedBuff.buff8[ii] = ibUnpackedBuff.buff8[ii]; 
  }
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void packObUpToObPk() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// Expand the 8 bit data to the 32 bit data required by the port
  // Force the 0xFF in the INPUT data before we begin
  obUnpackedBuff.buff8[DMABUFFSIZEOBUP8-1] = 0xFF; // force 0xFF for toggling
  // convert the data in bytes to int32s
  for (int ii=0;ii<DMABUFFSIZEOBUP8;ii++) {
    obPackedBuff.buff8[(ii*4)+1] = 0x00;
    obPackedBuff.buff8[(ii*4)+2] = 0x00;
    obPackedBuff.buff8[(ii*4)+3] = 0x00;
    if (ii==0) {
      obPackedBuff.buff8[(ii*4)+0] = 0xFF ^ obUnpackedBuff.buff8[ii];
    } else {
      obPackedBuff.buff8[(ii*4)+0] = obUnpackedBuff.buff8[ii-1] ^ obUnpackedBuff.buff8[ii];
    }
  }
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void dumpDmaTcd (DMAChannel  &dmabc, uint32_t event, const char* title) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// title is a char[] to describe the channel.
// event is an arbitrary number to distinguish between separate calls
// for the same channel.
// FOR DEBUGGING ONLY
// Dumps the specified TCD and some related non-TCD DMA registers.
// This routine takes a long time which can affect your processing...
// No guarantee that the values correspond to a single instant of time.
// Only written & tested for T3.6

DMASetting tcdDump; // used to hold a copy of this channel's TCD
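// static const so the table is not rebuilt on the stack on every call.
// Check these names against the DMAMUX source list for your chip in kinetis.h.
static const char * const muxSourceName[64] = {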
"","",
"DMAMUX_SOURCE_UART0_RX",    //2
"DMAMUX_SOURCE_UART0_TX",  //3
"DMAMUX_SOURCE_UART1_RX",  //4
"DMAMUX_SOURCE_UART1_TX",  //5
"DMAMUX_SOURCE_UART2_RX",  //6
"DMAMUX_SOURCE_UART2_TX",  //7
"","",
"DMAMUX_NUM_SOURCE_ALWAYS",  //10
"","","",
"DMAMUX_SOURCE_I2S0_RX",  //14
"DMAMUX_SOURCE_I2S0_TX",  //15
"DMAMUX_SOURCE_SPI0_RX",  //16
"DMAMUX_SOURCE_SPI0_TX",  //17
"DMAMUX_SOURCE_SPI1_RX",  //18
"DMAMUX_SOURCE_SPI1_TX",  //19
"","",
"DMAMUX_SOURCE_I2C0",  //22
"DMAMUX_SOURCE_I2C1",  //23
"DMAMUX_SOURCE_FTM0_CH0",  //24
"DMAMUX_SOURCE_FTM0_CH1",  //25
"DMAMUX_SOURCE_FTM0_CH2",  //26
"DMAMUX_SOURCE_FTM0_CH3",  //27
"DMAMUX_SOURCE_FTM0_CH4",  //28
"DMAMUX_SOURCE_FTM0_CH5",  //29
"DMAMUX_SOURCE_FTM0_CH6",  //30
"DMAMUX_SOURCE_FTM0_CH7",  //31
"DMAMUX_SOURCE_FTM1_CH0",  //32
"DMAMUX_SOURCE_FTM1_CH1",  //33
"DMAMUX_SOURCE_FTM2_CH0",  //34
"DMAMUX_SOURCE_FTM2_CH1",  //35
"","","","",
"DMAMUX_SOURCE_ADC0",  //40
"DMAMUX_SOURCE_ADC1",  //41
"DMAMUX_SOURCE_CMP0",  //42
"DMAMUX_SOURCE_CMP1",  //43
"DMAMUX_SOURCE_CMP2",  //44
"DMAMUX_SOURCE_DAC0",  //45
"",
"DMAMUX_SOURCE_CMT",  //47
"DMAMUX_SOURCE_PDB",  //48
"DMAMUX_SOURCE_PORTA",  //49
"DMAMUX_SOURCE_PORTB",  //50
"DMAMUX_SOURCE_PORTC",  //51
"DMAMUX_SOURCE_PORTD",  //52
"DMAMUX_SOURCE_PORTE",  //53
"DMAMUX_SOURCE_ALWAYS0",  //54
"DMAMUX_SOURCE_ALWAYS1",  //55
"DMAMUX_SOURCE_ALWAYS2",  //56
"DMAMUX_SOURCE_ALWAYS3",  //57
"DMAMUX_SOURCE_ALWAYS4",  //58
"DMAMUX_SOURCE_ALWAYS5",  //59
"DMAMUX_SOURCE_ALWAYS6",  //60
"DMAMUX_SOURCE_ALWAYS7",  //61
"DMAMUX_SOURCE_ALWAYS8",  //62
"DMAMUX_SOURCE_ALWAYS9"  //63
};
  // Take a copy of the whole TCD quickly to minimise inconsistency between the fields
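  // TCD is a pointer in both objects, so copy what it points at, not the pointer itself
  memcpy(tcdDump.TCD, dmabc.TCD, sizeof(*tcdDump.TCD));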
  uint32_t thisChannel = dmabc.channel;

  Serial.printf("============================================\n");
  Serial.printf("*** TCD For Channel %d [%s] %s %d\n",thisChannel,title,"for Event",event);
  Serial.printf("WARNING - values may be changing during reporting\n");
  Serial.printf("\"Completion\" means end of major loop \n");
// DMA STUFF
  Serial.printf("[DMA]\n");
  
  // DMA_CR - Control Register
  Serial.printf("  [DMA_CR]     : 0x%08X Control Register\n", DMA_CR);
  //Serial.printf("    DMA_CR_EEI: %x \n" , DMA_EEI);
  
  // DMA_ES - Error Status Register
  Serial.printf("  [DMA_ES]     : 0x%08X Last Recorded Channel Error Register "  , DMA_ES);
	showDMA_ES();
  
  // DMA_ERQ - Enable Request Register
  Serial.printf("  [DMA_ERQ]    : 0x%08X Enable Request Register\n", DMA_ERQ);
  uint32_t *myX0 = (uint32_t *)(&DMA_ERQ); 
  if (*myX0 != 0) {
    Serial.printf("  [DMA_ERQ]    : Hardware service request enabled for channels ");
    uint32_t erqWork = *myX0;
    for (uint ii=0;ii<32;ii++) {
      if (erqWork & 0x000001) {
        Serial.printf("%d ",ii);
        if (ii == thisChannel) {
          Serial.printf("(=this channel) ");
        }
      }
      erqWork >>= 1;
    }
    Serial.printf("\n");
  } else {Serial.printf("  [DMA_ERQ]    : No hardware service request enabled for any channel\n");}
  
  // DMA_INT - Interrupt Request Register
  uint32_t *myX1 = (uint32_t *)(&DMA_INT);  
  Serial.printf("  [DMA_INT]    : 0x%08X \n",*myX1);
  if (*myX1 != 0) {
    Serial.printf("  [DMA_INT]    : DMA-generated interrupt request outstanding for channels ");
    uint32_t erqWork = *myX1;
    for (uint ii=0;ii<32;ii++) {
      if (erqWork & 0x000001) {
        Serial.printf("%d ",ii);
        if (ii == thisChannel) {
          Serial.printf("(=this channel) ");
        }
      }
      erqWork >>= 1;
    }
    Serial.printf("\n");
  }  else {Serial.printf("  [DMA_INT]    : No DMA-generated interrupt requests outstanding for any channel \n");}

  // DMA_ERR - Error Register
  Serial.printf("  [DMA_ERR]    : 0x%08X \n"  , DMA_ERR);
  uint32_t *myX2 = (uint32_t *)(&DMA_ERR);  
  if (*myX2 != 0) {
    Serial.printf("  [DMA_ERR]    : ** ERROR ** Channel error(s) detected on channel(s) ");
    uint32_t erqWork = *myX2;
    for (uint ii=0;ii<32;ii++) {
      if (erqWork & 0x000001) {
        Serial.printf("%d ",ii);
        if (ii == thisChannel) {
          Serial.printf("(=this channel) ");
        }
      }
      erqWork >>= 1;
    }
    Serial.printf("\n");
  } else {Serial.printf("  [DMA_ERR]    : No channel errors detected on any  channel\n");}

  // DMA_HRS - Hardware Request Status Register
  uint32_t *myX3 = (uint32_t *)(&DMA_HRS);  
  Serial.printf("  [DMA_HRS]    : 0x%08X \n",*myX3);
  if (*myX3 != 0) {
    Serial.printf("  [DMA_HRS]    : Hardware Service Requests exist for channel(s) ");
    uint32_t erqWork = *myX3;
    for (uint ii=0;ii<32;ii++) {
      if (erqWork & 0x000001) {
        Serial.printf("%d ",ii);
        if (ii == thisChannel) {
          Serial.printf("(=this channel) ");
        }        
      }
      erqWork >>= 1;
    }
    Serial.printf("\n");
  } else {Serial.printf("  [DMA_HRS]    : No hardware Service Requests exist for any channel\n");}

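// DMAMUX
  // The CHCFGn registers are 8 bits wide and 1 byte apart, so index them with a byte pointer
  volatile uint8_t *myX4 = (volatile uint8_t *)(&DMAMUX0_CHCFG0) + thisChannel;
  uint8_t chcfg = *myX4;
  uint8_t enbl = chcfg & 0b10000000;
  uint8_t trig = chcfg & 0b01000000;
  uint8_t srce = chcfg & 0b00111111;
  Serial.printf("  [DMAMUX_CHCFGn] : 0x%02X\n", chcfg); // the register value was missing from the argument list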
  if (enbl && trig) {
	  Serial.printf("  [DMAMUX_CHCFGn] : The DMA is in PIT triggering mode. (ENBL & TRIG are on)\n");
  } else {
	  Serial.printf("  [DMAMUX_CHCFGn] : The DMA is NOT in PIT triggering mode. (ENBL & TRIG are not both on)\n");
  }
  if (trig) {
    Serial.printf("  [DMAMUX_CHCFGn] : Triggering is enabled (TRIG is on).\n");
  } else {
    Serial.printf("  [DMAMUX_CHCFGn] : Triggering is NOT enabled (TRIG is off).\n");
  }
  if (enbl) {
    Serial.printf("  [DMAMUX_CHCFGn] : ENBL is on.\n");
  } else {
    Serial.printf("  [DMAMUX_CHCFGn] : ENBL is off.\n");
  }
  if (srce != 0) {
    Serial.printf("  [DMAMUX_CHCFGn] : [SOURCE] is %s\n",muxSourceName[srce]);
  } else {
    Serial.printf("  [DMAMUX_CHCFGn] : no MUX SOURCE defined\n");
  }
  // TCD STUFF  
  Serial.printf("[TCD]\n");
	Serial.printf("  [SADDR]    : 0x%08X :Source Address \n",(uint32_t)tcdDump.TCD->SADDR);
	Serial.printf("  [SOFF]     : %d  :Source address adjustment after each minor loop\n",tcdDump.TCD->SOFF);
	Serial.printf("  [DADDR]    : 0x%08X :Destination Address \n",(uint32_t)tcdDump.TCD->DADDR);
	Serial.printf("  [DOFF]     : %d  :Destination address adjustment after each minor loop\n",tcdDump.TCD->DOFF);
	Serial.printf("  [NBYTES]   : %d  :Minor Loop  Size \n",        tcdDump.TCD->NBYTES);
	Serial.printf("  [SLAST]    : %d :Source Address adjustment on completion\n",tcdDump.TCD->SLAST);
	// TCD ATTR  
  int32_t dsize = tcdDump.TCD->ATTR & 0b0000000000000111;
  int32_t dmod  = tcdDump.TCD->ATTR & 0b0000000011111000; dmod >>= 3;
  int32_t ssize = tcdDump.TCD->ATTR & 0b0000011100000000; ssize >>= 8;
  int32_t smod  = tcdDump.TCD->ATTR & 0b1111100000000000; smod >>= 11;
  char ssizeS[17];
	switch (ssize) {
    case 0:  strcpy(ssizeS,"8 bit"); break;
    case 1:  strcpy(ssizeS,"16 bit"); break;
    case 2:  strcpy(ssizeS,"32 bit"); break;
    case 4:  strcpy(ssizeS,"16-byte burst"); break;
    case 5:  strcpy(ssizeS,"32-byte burst"); break;
    default: strcpy(ssizeS,"INVALID");
	}
  char dsizeS[17];
	switch (dsize) {
    case 0:  strcpy(dsizeS,"8 bit"); break;
    case 1:  strcpy(dsizeS,"16 bit"); break;
    case 2:  strcpy(dsizeS,"32 bit"); break;
    case 4:  strcpy(dsizeS,"16-byte burst"); break;
    case 5:  strcpy(dsizeS,"32-byte burst"); break;
    default: strcpy(dsizeS,"INVALID");
	}
  
	Serial.printf("  [ATTR]     : 0x%08X\n",tcdDump.TCD->ATTR);
	Serial.printf("    [SMOD]     : 0x%02X",smod);
	if (smod != 0) {Serial.printf(" Circular source queue specified - see manual if not desired\n");} 
	else           {Serial.printf(" \n");}	
	Serial.printf("    [SSIZE]    : 0x%02X   %s = Source transfer size\n",ssize,ssizeS);
	
	Serial.printf("    [DMOD]     : 0x%02X ",dmod);
	if (dmod != 0) {Serial.printf(" Circular dest queue specified - see manual if not desired\n");} 
	else           {Serial.printf(" \n");}	
	Serial.printf("    [DSIZE]    : 0x%02X   %s = Destination transfer size\n",dsize,dsizeS);
	
  // Report CITER
  decodeIter((uint16_t)tcdDump.TCD->CITER,myCiter);
  Serial.printf("  [CITER]    : 0x%08X\n",tcdDump.TCD->CITER);
  if (myCiter.iterelnk != 0) {
    Serial.printf("    [CITERELNK]: This TCD will link to channel %d ",myCiter.iterCh);
    if (myCiter.iterCh == thisChannel) {Serial.printf("(=this channel) ");}        
    Serial.printf("at the end of a minor loop (except last)\n");
  } else {
    Serial.printf("    [CITERELNK]: This TCD will NOT link to another channel at the end of a minor loop\n");
  }
  Serial.printf("    [CITERITER]: %d Major loop iterations still to do \n",myCiter.iter);
  // Report BITER
  decodeIter(tcdDump.TCD->BITER,myBiter);
	Serial.printf("  [BITER]    : 0x%08X : (At end of all minor loops CITER reset value)\n",tcdDump.TCD->BITER);
  if (myBiter.iterelnk != 0) {
    Serial.printf("    [BITERELNK]: This TCD reset value will link to channel %d ",myBiter.iterCh);
    if (myBiter.iterCh == thisChannel) {Serial.printf("(=this channel) ");}        
    Serial.printf("at the end of a minor loop (except last)\n");
  } else {
    Serial.printf("    [BITERELNK]: This TCD reset value will NOT link to another channel at the end of a minor loop\n");
  }
    Serial.printf("    [BITERITER]: %d Major loop iterations initial/reset value\n",myBiter.iter);
  // Report CITER/BITER errors
  if (myCiter.iterelnk != myBiter.iterelnk) {
	  Serial.printf("** DMA ERROR ** CELNK DOES NOT MATCH BELNK\n");
	}
	// Report CITER/BITER interest
  if (myCiter.iter != myBiter.iter) {
	  Serial.printf("** NOTE ** CITER %d does not match BITER %d, DMA not yet finished or ** DMA ERROR **\n",myCiter.iter,myBiter.iter);
	}
		
	Serial.printf("  [DLASTSGA] :0x%08X\n", tcdDump.TCD->DLASTSGA);
	if ((tcdDump.TCD->CSR & DMA_TCD_CSR_ESG) == 0) {
  	Serial.printf("    [\"DLAST\"]  :%d Dest Address adjustment on completion\n",tcdDump.TCD->DLASTSGA);
  } else {
  	Serial.printf("    [\"SGA\"]      :Scatter/gather next TCD addr: 0x%08X\n",tcdDump.TCD->DLASTSGA);
  	if ((tcdDump.TCD->DLASTSGA & 0x0000001F) != 0) {
    	Serial.printf("    [\"SGA\"]    :** CONFIGURATION ERROR ** not 32 byte aligned 0x%08X\n", tcdDump.TCD->DLASTSGA);
    }
  }
	
	// CSR
	Serial.printf("  [CSR]      : 0x%08X\n",       tcdDump.TCD->CSR);
	
  // Bandwidth Control. Provides a means of controlling the amount of bus bandwidth the eDMA uses.
  //  00 No stalls–consume 100% bandwidth
  //  01 Reserved
  //  10 eDMA stalls for 4 cycles after each read/write 
  //  11 eDMA stalls for 8 cycles after each read/write
  if ((tcdDump.TCD->CSR & DMA_TCD_CSR_BWC_MASK) == DMA_TCD_CSR_BWC(3)) {
    Serial.printf("    [BWC]      :Band width control - eDMA to stall for 8 cycles after each read/write\n");
  } else if ((tcdDump.TCD->CSR & DMA_TCD_CSR_BWC_MASK) == DMA_TCD_CSR_BWC(2)) {
    Serial.printf("    [BWC]      :Band width control - eDMA to stall for 4 cycles after each read/write\n");
  } else if ((tcdDump.TCD->CSR & DMA_TCD_CSR_BWC_MASK) == DMA_TCD_CSR_BWC(0)) {
    Serial.printf("    [BWC]      :Band width control - eDMA to stall for 0 cycles after each read/write\n");
  } else {
    Serial.printf("    [BWC]      :Band width control - ** INVALID **\n");
  }

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_MAJORELINK) {
  	Serial.printf("    [MAJLINK]  :Channel-to-channel linking on major loop complete: YES\n");
//*** NOTE: the define for DMA_TCD_CSR_MAJORLINKCH_MASK in cores/teensy3/kinetis.h looks WRONG for the T3.6:
//***       it covers 4 bits but should cover 5, so the mask 0x1F00 is used directly below
//  Serial.printf("    [LINKCH]   :This TCD will link to channel %d at the end of a major loop \n",   tcdDump.TCD->CSR & DMA_TCD_CSR_MAJORLINKCH_MASK);
	  Serial.printf("    [LINKCH]   :This TCD will link to channel %d ",(tcdDump.TCD->CSR & 0x1F00)>>8);
    if (((tcdDump.TCD->CSR & 0x1F00)>>8) == thisChannel) {
      Serial.printf("(=this channel) ");
    }        
	  Serial.printf("at the end of a major loop \n");
  } else {
  	Serial.printf("    [MAJLINK]  :Channel-to-channel linking on major loop complete: NO\n");
  }
	
	if (tcdDump.TCD->CSR & DMA_TCD_CSR_ESG) {
  	Serial.printf("    [ESG]      :Scatter/gather specified: YES\n");
  } else {
  	Serial.printf("    [ESG]      :Scatter/gather specified: NO\n");
  }


	if (tcdDump.TCD->CSR & DMA_TCD_CSR_DREQ) {
  	Serial.printf("    [DREQ]     :This channel's ERQ bit to be cleared on completion: YES (ie set channel disabled)\n");
  } else {
  	Serial.printf("    [DREQ]     :This channel's ERQ bit to be cleared on completion: NO (ie channel not disabled)\n");
  }

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_INTHALF) {
  	Serial.printf("    [INTH]     :Interrupt on half completion: YES\n");
  } else {
  	Serial.printf("    [INTH]     :Interrupt on half completion: NO\n");
  }

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_INTMAJOR) {
  	Serial.printf("    [INTF]     :Interrupt on completion: YES\n");
  } else {
  	Serial.printf("    [INTF]     :Interrupt on completion: NO\n");
  }

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_START) {
  	Serial.printf("    [START]    :The channel is requesting service via a software initiated service request and the channel has not begun execution yet.\n");
  } else {
  	Serial.printf("    [START]    :The channel is NOT requesting service via a software initiated service request OR is, but the channel has begun execution.\n");
  } 

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_ACTIVE) {
  	Serial.printf("    [ACTIVE]   :DMA in execution, this flag will be cleared when this minor loop finishes\n");
  } else {
  	Serial.printf("    [ACTIVE]   :DMA not currently executing a minor loop\n");
  } 

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_DONE) {
  	Serial.printf("    [DONE]     :Major loop completed. \n");
  } else {
  	Serial.printf("    [DONE]     :Major loop not completed, or never started, or has been cleared by the software or by the hardware when the channel is activated.\n");
  } 
  
  // TO DO - detect other errors:
/*
  Only 1st 4 channels for PIT module triggering
  multiple ch with same source is error even if disabled
  Do not use continuous link mode with a channel linking to itself 
    if there is only one minor loop iteration per service request. 
    If the channel’s NBYTES value is the same as either the source or 
    destination size, do not use channel linking to itself. The same 
    data transfer profile can be achieved by simply increasing the NBYTES value. 
    A larger NBYTES value provides more efficient, faster processing.
  EMLM no minor links bad universal
    no EMLM then NBYTES different
*/  
  // operation type mem-mem hw-mem etc
  // status not-started, running, paused, completed etc
  // errors detected by dma
  // errors detected by us
  
  // OTHER 
  // some fields eg HRS not expanded
  // can we report on timers, PDB, etc
  // linked channels
  // add channel name/title?

  Serial.printf("============================================\n");

}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void decodeIter(uint16_t tcdIter, iter_t &myIter) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
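  // Bit 15 = ELINK (channel-to-channel linking at the end of each minor loop)
  myIter.iterelnk  = tcdIter & 0x8000;
  if (myIter.iterelnk != 0) {
    // Linking on: bits 13-9 = link channel, bits 8-0 = iteration count (max 511)
    myIter.iter   = tcdIter & 0x01FF;
    myIter.iterCh = (tcdIter & 0x3E00)>>9;
  } else {
    // Linking off: bits 14-0 = iteration count
    myIter.iter = tcdIter & 0x7FFF;
    myIter.iterCh = 0;
  }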
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void showDMA_ES() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
uint worki;

uint32_t *myErr = (uint32_t *)(0x40008004);

  worki = *myErr & 0x80000000; worki >>= 31;
  if (worki!=1){
    Serial.printf("No errors registered\n");
  } else {
    Serial.printf("\n");
    worki = *myErr & 0x00000001; worki >>= 0;
    if (worki==1){Serial.printf("    [DBE] ** ERROR ** The last recorded error was a bus error on a destination write\n");}
    worki = *myErr & 0x00000002; worki >>= 1;
    if (worki==1){Serial.printf("    [SBE] ** ERROR ** The last recorded error was a bus error on a source read\n");}
    worki = *myErr & 0x00000004; worki >>= 2;
    if (worki==1){Serial.printf("    [SGE] ** ERROR **  TCDn_DLASTSGA (scatter/gather TCD) is not on a 32 byte boundary.\n");}
    worki = *myErr & 0x00000008; worki >>= 3;
    if (worki==1){
      Serial.printf("    [NCE] ** ERROR ** The last recorded error was a configuration error detected in the TCDn_NBYTES or TCDn_CITER fields.\n");
      Serial.printf("    - TCDn_NBYTES is not a multiple of TCDn_ATTR[SSIZE] and TCDn_ATTR[DSIZE], or \n");
      Serial.printf("    - TCDn_CITER[CITER] is equal to zero, or \n");
      Serial.printf("    - TCDn_CITER[ELINK] is not equal to TCDn_BITER[ELINK]\n");
    }
    worki = *myErr & 0x00000010; worki >>= 4;
    if (worki==1){Serial.printf("    [DOE] ** ERROR ** The last recorded error was a configuration error detected in the TCDn_DOFF field. TCDn_DOFF is inconsistent with TCDn_ATTR[DSIZE].\n");}
    worki = *myErr & 0x00000020; worki >>= 5;
    if (worki==1){Serial.printf("    [DAE] ** ERROR ** The last recorded error was a configuration error detected in the TCDn_DADDR field. TCDn_DADDR is inconsistent with TCDn_ATTR[DSIZE].\n");}
    worki = *myErr & 0x00000040; worki >>= 6;
    if (worki==1){Serial.printf("    [SOE] ** ERROR ** The last recorded error was a configuration error detected in the TCDn_SOFF field. TCDn_SOFF is inconsistent with TCDn_ATTR[SSIZE].\n");}
    worki = *myErr & 0x00000080; worki >>= 7;
    if (worki==1){Serial.printf("    [SAE] ** ERROR ** The last recorded error was a configuration error detected in the TCDn_SADDR field. TCDn_SADDR is inconsistent with TCDn_ATTR[SSIZE].\n");}
  
    worki = *myErr & 0x00001F00; worki >>= 8;
    Serial.printf("    [ERRCHN] %d = The channel number of the last recorded error, excluding GPE and CPE errors, or last recorded error canceled transfer.\n",worki);
  
    worki = *myErr & 0x00004000; worki >>= 14;
    if (worki==1){Serial.printf("    [CPE] ** ERROR ** The last recorded error was a configuration error in the channel priorities within a group. Channel priorities within a group are not unique.\n");}
    worki = *myErr & 0x00008000; worki >>= 15;
    if (worki==1){Serial.printf("    [GPE] ** ERROR ** The last recorded error was a configuration error among the group priorities. All group priorities are not unique.\n");}
    worki = *myErr & 0x00010000; worki >>= 16;
    if (worki==1){Serial.printf("    [ECX] ** ERROR ** The last recorded entry was a canceled transfer by the error cancel transfer input\n");}
  } 
}
// TODO - USE portConfigRegister(pin);
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setDataPins(bool level) { // used for debugging
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // USE DIGITS DIRECTLY
  digitalWrite(15,level);
  digitalWrite(22,level);
  digitalWrite(23,level); 
  digitalWrite(9, level);
  digitalWrite(10,level);
  digitalWrite(13,level);
  digitalWrite(11,level);
  digitalWrite(12,level); 
  // NOTE The last pin in the pin table gives the highest value bit
  delayMicroseconds(5); // REQUIRED
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setup() { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// DATA  OUTPUT=PORTC INPUT=PORTD
// CLOCK OUTPUT=PORTE INPUT=PORTE
  delay(5000);
  Serial.begin(115200);  

  // Handle the serial number of this Teensy
  teensySN(myId.serial);
  myId.myTeensySerial = __builtin_bswap32(myId.myTeensySerial); // IS THIS NECESSARY ? !!!
  mySerialIx = getSerialIx(myId.myTeensySerial);
  Serial.printf("SER=%d\n",myId.myTeensySerial);

  // The processor with the lowest id is the Master
  weAreMaster = true;
  bool weAreInList = false;
  for (uint32_t ii=0;ii<noOfProcessors;ii++){
    if (serialList[ii] <  myId.myTeensySerial) {weAreMaster = false;}
    if (serialList[ii] == myId.myTeensySerial) {weAreInList = true ;}
  }
  if (weAreMaster){
    initialRecordToBeCreated = true;
    Serial.printf("MASTER=YES\n");
  } else {
    initialRecordToBeCreated = false;
    Serial.printf("MASTER=NO\n");
  }
  // If our serial number is not in the list, hard stop
  if (!weAreInList) {
    Serial.printf("OUR SERIAL %d NOT IN LIST\n",myId.myTeensySerial);
    for (uint32_t ii=0;ii<noOfProcessors;ii++){
      Serial.printf("  %d\n",serialList[ii]);
    }
    while(1==1){} // HARD LOOP
  }
  //Serial.printf("NOPR  %d\n",noOfProcessors); // for debugging

  // Build the clocking data
  setClockBuff();
  
  // Setup Data Input Pins
  //for (int ii=0;ii<8;ii++) {pinMode(pinDtable[ii], INPUT);} WILL NOT WORK
  pinMode(2,  INPUT);
  pinMode(14, INPUT);
  pinMode(7,  INPUT);
  pinMode(8,  INPUT);
  pinMode(6,  INPUT);
  pinMode(20, INPUT);
  pinMode(21, INPUT);
  pinMode(5,  INPUT);

  // Setup Output Pins
  pinMode(15,OUTPUT); // LSB
  pinMode(22,OUTPUT);
  pinMode(23,OUTPUT); 
  pinMode(9, OUTPUT);
  pinMode(10,OUTPUT);
  pinMode(13,OUTPUT);
  pinMode(11,OUTPUT);
  pinMode(12,OUTPUT); // MSB
  
  // Set up Clock Pins
  pinMode(33, OUTPUT);
  digitalWrite(33, HIGH);
  pinMode(34, INPUT);

  // Put the data pins into a known starting position
  setDataPins(HIGH);

// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// Set up the DMA channels
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
/*
To initialize the eDMA: (ex the manual)
1. Write to the CR if a configuration other than the default is desired.
2. Write the channel priority levels to the DCHPRIn registers if a configuration other than the default is desired.
3. Enable error interrupts in the EEI register if so desired.
4. Write the 32-byte TCD for each channel that may request service.
5. Enable any hardware service requests via the ERQH and ERQL registers. {???}
6. Request channel service via either:
• Software: setting the TCDn_CSR[START]
• Hardware: slave device asserting its eDMA peripheral request signal
*/
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// DMA CLOCK CHANNEL - ch 0
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
  Serial.printf("Clock Channel is %d\n",obDmaTcdClockChan.channel);
//1 Write CR
//2 Write Priority
//3 Enable Error Interrupts
//4 Write TCD
  obDmaTcdClockChan.begin();
  obDmaTcdClockChan.destination((volatile uint32_t&)GPIOE_PTOR); // clock pin TOGGLE register
  obDmaTcdClockChan.sourceBuffer((uint32_t *)obClock.buff32,OBCLOCKDATALEN8); // len is in BYTES!!!
  obDmaTcdClockChan.disableOnCompletion();
//5 Enable any hardware service requests via the ERQH and ERQL registers.
// ERQH and ERQL are not defined anywhere!!!
//6 Request channel service via either:
//  • Software: setting the TCDn_CSR[START]
//  • Hardware: slave device asserting its eDMA peripheral request signal
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// DMA DATA OB CHANNEL
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
  Serial.printf("OB Channel is %d\n",obDmaTcdDataChan.channel);
//1 Write CR
//2 Write Priority
//3 Enable Error Interrupts
//4 Write TCD
  obDmaTcdDataChan.begin();
  obDmaTcdDataChan.destination((volatile uint32_t&)GPIOC_PTOR); // TOGGLE REGISTER 
  obDmaTcdDataChan.sourceBuffer((uint32_t *)obPackedBuff.buff32,(DMABUFFSIZEOBPK8)); // LEN in BYTES!!!
  obDmaTcdDataChan.transferCount(DMABUFFSIZEOBPK32);
  obDmaTcdDataChan.disableOnCompletion();
  // Bandwidth Control. Provides a means of controlling the amount of bus bandwidth the eDMA uses.
  //  00 No stalls–consume 100% bandwidth
  //  01 Reserved
  //  10 eDMA stalls for 4 cycles after each read/write 
  //  11 eDMA stalls for 8 cycles after each read/write
  //obDmaTcdDataChan.TCD->CSR |= 0xC000; // or in the BWC
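  // Each minor loop of the data channel triggers the clock channel
  obDmaTcdClockChan.triggerAtTransfersOf( obDmaTcdDataChan);
  // Each minor loop of the clock channel re-triggers the clock channel itself
  obDmaTcdClockChan.triggerAtTransfersOf( obDmaTcdClockChan);
  // Completion of the clock channel's major loop triggers the data channel
  obDmaTcdDataChan.triggerAtCompletionOf( obDmaTcdClockChan);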
  // following not used as we send 0xFF to set the port but do not require
  // the receiver to read it (as we could not get it to work!) 
  // not used - obDmaTcdClockChan.triggerAtCompletionOf(obDmaTcdDataChan);
//5 Enable any hardware service requests via the ERQH and ERQL registers.
//6 Request channel service via either:
//  • Software: setting the TCDn_CSR[START]
//  • Hardware: slave device asserting its eDMA peripheral request signal
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// DMA DATA IB CHANNEL
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
  Serial.printf("IB Channel is %d\n",ibDmaTcdDataChan.channel);
//1 Write CR
//2 Write Priority
//3 Enable Error Interrupts
//4 Write TCD
  // Note This channel should never run against the other 2 channels but might
  // encounter Library channels if any are added.
  ibDmaTcdDataChan.begin();
  ibDmaTcdDataChan.source((volatile uint32_t&) GPIOD_PDIR);
  ibDmaTcdDataChan.destinationBuffer(ibPackedBuff.buff32,DMABUFFSIZEIBPK8); // note byte length
  ibDmaTcdDataChan.transferCount(DMABUFFSIZEIBPK32); // ?????????????????????????????
  // This channel is continuously enabled:
  // ibDmaTcdDataChan.disableOnCompletion();
  ibDmaTcdDataChan.attachInterrupt(dmaGetEndedInterrupt);
  ibDmaTcdDataChan.interruptAtCompletion();
  // The following type of "enable request" is needed for hardware triggering
  ibDmaTcdDataChan.enable(); // same as next: // this triggers 1 minor loop
  // DMA_SERQ = ibDmaTcdDataChan.channel; 
  // This channel is left enabled at the end of each major loop
//5 Enable any hardware service requests via the ERQH and ERQL registers.
  //Address: 4002_1000h base + 0h offset + (1d × i), where i=0d to 31d
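  // The CHCFGn registers are 1 byte each (see the address formula above), so use a byte pointer
  volatile uint8_t *myQ = (volatile uint8_t *)(0x40021000)+ibDmaTcdDataChan.channel; 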
  *myQ = 0;
//  *myQ = DMAMUX_ENABLE | (DMAMUX_SOURCE_PORTA & 63) ;
//  *myQ = DMAMUX_ENABLE | DMAMUX_SOURCE_PORTA ;
  *myQ = DMAMUX_ENABLE | DMAMUX_SOURCE_PORTE ;
//6 Request channel service via either:
//  • Software: setting the TCDn_CSR[START]
//  • Hardware: slave device asserting its eDMA peripheral request signal
  //NVIC_SET_PRIORITY(IRQ_PORTE, 32); // not tried yet
// MISC NOTES
  ibDmaTcdDataChan.triggerAtHardwareEvent(DMAMUX_SOURCE_PORTE);
  PORTE_PCR25 |= PORT_PCR_IRQC(2); // Pin 34 = PTE25: trigger DMA request on falling edge

  dumpDmaTcd(ibDmaTcdDataChan, 11,"IB"); 
  delay(3000); // allow time for dma to do mischief
  dumpDmaTcd(ibDmaTcdDataChan, 22,"IB");
  
  // **** Start Right Royal Bodge
  // No matter how we set up the IB DMA channel, we always get one minor loop executed
  // BEFORE we have received an input clock pin interrupt!
  // This leaves the channel out of phase with the sender.
  // This bodge modifies the TCD to look like new
  // CRUDE RESET
  ibDmaTcdDataChan.TCD->CITER = ibDmaTcdDataChan.TCD->BITER;
  ibDmaTcdDataChan.destinationBuffer(ibPackedBuff.buff32,DMABUFFSIZEIBPK8); // note byte length
  // **** End Right Royal Bodge
  
  if (weAreMaster) {delay(2000);} // ensure others are ready to read

  Serial.println("SETUP ENDED");  
  startMillis = millis(); // for debugging
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
//bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
//type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)
//These builtins perform an atomic compare and swap. That is, if the current value of *ptr is oldval, then write newval into *ptr.
//The “bool” version returns true if the comparison is successful and newval was written. 
//The “val” version returns the contents of *ptr before the operation. 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
inline bool enqueue( volatile unsigned int *lock ) { // not used
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
	return (__sync_bool_compare_and_swap( lock, 0, 1 ) );
	// returns true  if lock was 0 and was replaced by 1
	// returns false if lock was 1 ie owned by someone else
}
//Prior to initializing the corresponding module, set SCGC5[PORTx] in the SIM module to enable the clock. Before turning off the clock, make sure to disable the module. 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
inline void dequeue( volatile unsigned int *lock ) { // not used
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  asm volatile ( "" ::: "memory" );
  *lock = 0;
  // sets lock to 0, releasing it
  return;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void doDmaGet() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // no longer required as IB DMA now continuously enabled
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void doDmaPut() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
//  DMA_SERQ = 1; // Start OB - NOT needed
  obDmaTcdDataChan.TCD->CSR |= DMA_TCD_CSR_START ; // 0x0001
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void dmaGetEndedInterrupt() { // Called by DMA IB on completion of read
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  //ibDmaTcdDataChan.clearInterrupt(); // same as:
  DMA_CINT = ibDmaTcdDataChan.channel; // clear interrupt
  ibPackedBuffStatus = myFULL;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
uint8_t getSerialIx(uint32_t thisSerial) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
uint8_t ix = 0; // 0=error
  for (uint32_t ii=0;ii<noOfProcessors;ii++) {
    if (thisSerial == serialList[ii]) {ix=ii+1;}
  }
  return ix;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void loop() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  volatile bool myIbPackedBuffStatus = myEMPTY;
  // Caution - ibPackedBuffStatus is also changed in an interrupt
  // We can change it here when the IB DMA channel is not running
  
  loopCnt++;
    
  if (millis()-startMillis > 10000){ // debugging
    printIbUnpackedBuff(dmaGetCnt,loopCnt);
    dumpDmaTcd(ibDmaTcdDataChan,  91, "IB");
    dumpDmaTcd(obDmaTcdDataChan,  92, "OB");
    dumpDmaTcd(obDmaTcdClockChan, 93, "CLK");
    unPackIbPkToIbUp(); 
    while(1==1){}
  }

  // Handle Startup
  if (initialRecordToBeCreated) { 
    initialRecordToBeCreated = false;
    setIbUnpackedBuff();
    moveIbUpToObUp();
    packObUpToObPk(); 
    doDmaPut();
    return;
  }

  // Handle Get-Got Interrupt Flag
  noInterrupts();
  myIbPackedBuffStatus = ibPackedBuffStatus;
  ibPackedBuffStatus = myEMPTY;
  interrupts();

  // Handle Get Got
  if (myIbPackedBuffStatus == myFULL) {
    dmaGetCnt++;
    //printIbPackedBuff(dmaGetCnt,loopCnt);
    unPackIbPkToIbUp(); 
    //printIbUnpackedBuff(dmaGetCnt,loopCnt);    
    // *** START Work - tinker with the record here in ib unpacked buffer
    ibUnpackedBuff.buff32[mySerialIx]++; 
    // *** END   Work
    moveIbUpToObUp(); 
    //printObUnpackedBuff(dmaGetCnt,loopCnt); // function does not exist
    packObUpToObPk(); 
    //printObPackedBuff(dmaGetCnt,loopCnt);
    doDmaPut();
  }   
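  // Presumably a dummy workload standing in for other processing in the main loop.
  // NOTE: ii / (ii*ii) is integer division, so it yields 0 for every ii > 1.
  for (int ii=1;ii<100;ii++){
    result = ii / (ii*ii);
  }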
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
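
The enqueue()/dequeue() pair in the listing above is not used by the sketch, but the compare-and-swap idea described in its comments can be tried on its own. What follows is only a minimal sketch, assuming GCC or Clang (the __sync builtins are compiler-specific). The names tryLock, unlock and bumpIfFree are made up for illustration and are not part of the original program.

```cpp
#include <cstdint>

// Hypothetical helper (not in the original sketch): try to take the lock.
// Returns true only if *lock was 0 and has now been set to 1.
inline bool tryLock(volatile unsigned int *lock) {
  return __sync_bool_compare_and_swap(lock, 0u, 1u);
}

// Hypothetical helper: release the lock by writing 0 after a full barrier,
// so that writes made while holding the lock are visible before the release.
inline void unlock(volatile unsigned int *lock) {
  __sync_synchronize();
  *lock = 0;
}

// Hypothetical example: increment a shared counter only if the lock is free.
// Returns false (and leaves the counter alone) if someone else owns the lock.
inline bool bumpIfFree(volatile unsigned int *lock, uint32_t *counter) {
  if (!tryLock(lock)) {
    return false;
  }
  ++(*counter);
  unlock(lock);
  return true;
}
```

The point of the non-blocking form is that a caller such as loop() can simply skip the work and try again on the next pass instead of spinning.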
 
This is a copy of what the dumpDmaTcd() function can produce:

*** TCD For Channel 0 [IB] for Event 22
WARNING - values may be changing during reporting
"Completion" means end of major loop
[DMA]
[DMA_CR] : 0x00000482 Control Register
[DMA_CR_ES] : 0x00000000 Last Recorded Channel Error Register No errors registered
[DMA_CR_ERQ] : 0x00000001 Enable Request Register
[DMA_CR_ERQ] : Hardware service request enabled for channels 0 (=this channel)
[DMA_INT] : 0x00000000
[DMA_INT] : No DMA-generated interrupt requests outstanding for any channel
[DMA_ERR] : 0x00000000
[DMA_ERR] : No channel errors detected on any channel
[DMA_HRS] : 0x00000000
[DMA_HRS] : No hardware Service Requests exist for any channel
[DMAMUX_CHCFGn] : 0x1FFF26EC
[DMAMUX_CHCFGn] : The DMA is NOT in PIT triggering mode. (ENBL & TRIG are not both on)
[DMAMUX_CHCFGn] : Triggering is NOT enabled (TRIG is off).
[DMAMUX_CHCFGn] : ENBL is on.
[DMAMUX_CHCFGn] : [SOURCE] is DMAMUX_SOURCE_PORTE
[TCD]
[SADDR] : 0x400FF0D0 :Source Address
[SOFF] : 0 :Source address adjustment after each minor loop
[DADDR] : 0x1FFF141C :Destination Address
[DOFF] : 4 :Destination address adjustment after each minor loop
[NBYTES] : 4 :Minor Loop Size
[SLAST] : 0 :Source Address adjustment on completion
[ATTR] : 0x00000202
[SMOD] : 0x00
[SSIZE] : 0x02 32 bit = Source transfer size
[DMOD] : 0x00
[DSIZE] : 0x02 32 bit = Destination transfer size
[CITER] : 0x000001EC
[CITERELNK]: This TCD will NOT link to another channel at the end of a minor loop
[CITERITER]: 492 Major loop iterations still to do
[BITER] : 0x000001FC : (At end of all minor loops CITER reset value)
[BITERELNK]: This TCD reset value will NOT link to another channel at the end of a minor loop
[BITERITER]: 508 Major loop iterations initial/reset value
** NOTE ** CITER 492 does not match BITER 508, DMA not yet finished or ** DMA ERROR **
[DLASTSGA] :0xFFFFF810
["DLAST"] :-2032 Dest Address adjustment on completion
[CSR] : 0x00000002
[BWC] :Band width control - eDMA to stall for 0 cycles after each read/write
[MAJLINK] :Channel-to-channel linking on major loop complete: NO
[ESG] :Scatter/gather specified: NO
[DREQ] :This channel's ERQ bit to be cleared on completion: NO (ie channel not disabled)
[INTH] :Interrupt on half completion: NO
[INTF] :Interrupt on completion: YES
[START] :The channel is NOT requesting service via a software initiated service request OR is, but the channel has begun execution.
[ACTIVE] :DMA not currently executing a minor loop
[DONE] :Major loop not completed, or never started, or has been cleared by the software or by the hardware when the channel is activated.
============================================
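
As a cross-check, the raw values in this dump can be decoded by hand. The following is a minimal stand-alone sketch and not part of the original program. IterFields, decodeCiter, attrSsize, attrDsize, majorLoopBytes and dlastAsSigned are names made up for illustration. The bit positions are the ones decodeIter() and dumpDmaTcd() use in the listing above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the sketch's iter_t (its real definition is not shown here).
struct IterFields {
  bool     elink;   // bit 15: link to another channel after each minor loop
  uint8_t  linkCh;  // bits 13-9: link channel (only meaningful when elink is set)
  uint16_t iter;    // iteration count: 9 bits when linking, 15 bits otherwise
};

// Decode a raw CITER or BITER value using the same masks as decodeIter().
inline IterFields decodeCiter(uint16_t raw) {
  IterFields f;
  f.elink = (raw & 0x8000) != 0;
  if (f.elink) {
    f.iter   = raw & 0x01FF;
    f.linkCh = static_cast<uint8_t>((raw & 0x3E00) >> 9);
  } else {
    f.iter   = raw & 0x7FFF;
    f.linkCh = 0;
  }
  return f;
}

// ATTR: SSIZE is bits 10-8, DSIZE is bits 2-0 (0 = 8 bit, 1 = 16 bit, 2 = 32 bit).
inline uint8_t attrSsize(uint16_t attr) { return (attr >> 8) & 0x7; }
inline uint8_t attrDsize(uint16_t attr) { return attr & 0x7; }

// Bytes moved by one complete major loop: iterations times minor loop size.
inline uint32_t majorLoopBytes(uint16_t biterRaw, uint32_t nbytes) {
  return static_cast<uint32_t>(decodeCiter(biterRaw).iter) * nbytes;
}

// DLASTSGA is a signed adjustment when scatter/gather is off.
inline int32_t dlastAsSigned(uint32_t raw) { return static_cast<int32_t>(raw); }
```

With the values from the dump, BITER 0x01FC decodes to 508 iterations of 4 bytes, i.e. 2032 bytes per major loop, which matches the DLAST adjustment of -2032 (0xFFFFF810). CITER 0x01EC decodes to 492, so 16 minor loops had already run when the dump was taken.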
 
And here is Version 2
The aim: discover if it is possible to do SPI with so few pins free
Target SPI device: Adafruit 2.8" TFT
Changed the 8 bit parallel output on PORT C to use bits 11-4 instead of 7-0
Data transfer now requires one extra shift instruction per byte sent.
Used alt SPI pins:
MOSI 28
MISO 39
CLK 27
TFT CS 9
TFT DC 15
NOTES:
Seems to work. Displays a running total of records read, until stopped.
MISO not tested.
Throughput: 7358 records of 508 bytes each in 10 secs
= 0.373786 MBytes/sec = 2.990291 MBits/sec

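
The throughput figure above is simple arithmetic on the record count. For anyone wanting to repeat the sum with their own numbers, here is a minimal sketch. The function names bytesPerSecond and megabitsPerSecond are made up for illustration and are not part of the program, and "MBits" is taken to mean 1,000,000 bits.

```cpp
#include <cstdint>

// Hypothetical helper: bytes per second for a run of fixed-length records.
inline double bytesPerSecond(uint32_t records, uint32_t recordLen, double seconds) {
  return (static_cast<double>(records) * recordLen) / seconds;
}

// Hypothetical helper: convert bytes/sec to megabits/sec (1 Mbit = 1e6 bits).
inline double megabitsPerSecond(double bytesPerSec) {
  return bytesPerSec * 8.0 / 1.0e6;
}
```

Calling bytesPerSecond(7358, 508, 10.0) gives 373786.4 bytes/sec, which is the 0.373786 MBytes/sec quoted above, and megabitsPerSecond() of that value gives about 2.990291.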
Code:
// DmaAndSpi02.ino 
// Version 2
// Discover if it is possible to do SPI with so few pins free
// Target SPI device: Adafruit 2.8" TFT 
//  Changed the 8bit parallel output on PORT C to use bits 11-4 instead of 7-0
//  Data transfer now requires one extra shift instruction per byte sent.
//  Used alt SPI pins:
//    MOSI   28
//    MISO   39
//    CLK    27
//    TFT CS  9
//    TFT DC 15
// NOTES:
//  Seems to work. Displays a running total of records read, until stopped.
//  MISO not tested.
//  Throughput: 7358 records of 508bytes each in 10 secs
//    = 0.373786MBytes/sec = 2.990291MBits/sec
// 

// *** TEENSY SPI
// Note little library support for the extra SPI busses
// SPI BUS -
// All built-in, does not use any lines (except a CS?)
// Notation x,y = alternative lines, x/y = alternative names for 1 line
// Some lines are only pads on the back of the teensy?
// SPI BUS 0
// MOSI     11(07,28)
// MISO     12(08,39)
// SCK      13(14,27)
// CS       8,9,15/A1,20/46,21/A7
// SPI BUS 1
// MOSI     00(05)
// MISO     01(21,A7)
// SCK      32/A13(20/A6)
// CS       31/A12 (Warning - only 1 special CS line available)
// *** END

// This is a Fun program with no defined application and probably no practical use.
// NOTE:  This program uses simple crude code. No classy C++ classes.

// Objectives:
// To share data between n 3.6 teensies where n is any number greater than 1 using
// a round robin technique with a single record of 4-508 bytes passed from one to the next 
// using the same code in all teensies connected in a ring.
// To pass data 8 bits at a time
// To use DMA to the maximum extent for both data input and output
//
// Comments:
// It has been developed & tested for Teensy 3.6s only.
// The physical configuration is the same on all the teensies used.
// The same code is used on all the teensies.
// 18 pins are used:
//    8 data in  - contiguous bits on Port D
//    8 data out - contiguous bits on Port C
//    1 clock pin in  - Pin 34 on Port E
//    1 clock pin out - Pin 33 on Port E
// This does not leave many pins for doing anything else and, more importantly,
// uses up pins normally used for other things eg SPI
// The main loop has to adhere to a specific protocol, with opportunities to do
// useful work within that.  It cannot follow its own routine and call on these
// services whenever it wants to.
// The data record length is fixed at 4 - 508 bytes & must be a multiple of 4.
// The upper limit is fixed by DMA using 9 bits for the length when linking.
// Only one record is travelling around the ring.
// The original plan was for 1 record per processor which would have increased
// throughput (but not speed) n times, but it got too complex.
// The data throughput is the sum of the read, processing, and write times for 
// all the processors on the ring.

// Measured: For 4 processors & Rec Len = 508 
// 8470 records * 508 bytes for 10 secs = 430276 bytes/sec

// The code uses the DMAChannel library mostly but not entirely. 
// Direct TCD manipulation is also used. 
// Basic Mechanism:
// Main Loop:
// If record read
//  process record
//  write the record (with or without changes)

// Detailed Mechanism:
// A DMA read (inbound) channel is permanently enabled, awaiting a hardware trigger.
// This will wait without blocking the cpu until activated by an incoming clock pulse.
// On completion the DMA read will call an interrupt which will flag the read buffer as FULL.
// The main loop detects the read and processes the record.
// It then writes the record by starting a DMA channel (the outbound channel) to throw 
// the bytes of the record at the port.  
// After each 4 bytes (one minor transfer which is one port-write), this channel stops
// and triggers another channel (the clock channel) which throws a succession of 
// bits at an output pin, generating a clock signal.
// The clock channel triggers itself until complete and then triggers the outbound 
// channel to send the next byte.
// This process continues until the outbound channel has sent the entire record.
// The record write does not check for the following processor being ready to receive,
// it is assumed that it always is.
// The main loop must always write the record out again.
// The main loop can take as long as it likes between reading a record and
// rewriting it, but obviously this delays the record around the ring.

// Other Details:
// All port transfers are 32 bit. The T3.6 has 8 bit port capability but this is not used.
// All data is translated from 8 bit to 32 bit before sending.
// All data is translated from 32 bit to 8 bit after receiving.
// Both data and clock are sent to the ports' toggle registers to ensure that 
// only the pins desired are affected, other pins on the ports are not affected.
// The clock data out consists of a load of no-op bits to the port followed by 
// a single toggle bit, a string of no-ops, a toggle bit, and more no-ops.
// This can be tailored to give the desired clock pulse width and positioning.
// The record length is fixed at compile time for all the processors involved.
// The DMA outbound channel is configured to send a final byte which sets the port
// to 0xFF.  This enables the toggling function to calculate toggle bits on the 
// assumption that the port always starts off as 0xFF. The final byte is not clocked
// and so is not read by the following processor.
// Unfortunately it is necessary at this time to specify the serial numbers of all
// the teensies used in the program, at compile time.
// The teensy with the lowest serial will function as the Master, which is responsible
// for creating the record in the beginning, but has no other function.
// The contents of the record are entirely up to the user, a CRC check of some 
// sort is recommended.

// A dumpDmaTcd() function is also included which describes the TCD and reduces the need
// to refer to the manual when debugging, at the cost of considerable processing time.

#include <DMAChannel.h>
#include <TeensyID.h>
uint32_t lastDmaGetCnt; // used to reduce flicker on TFT

struct iter_t {
  uint16_t  iter;
  uint16_t  iterCh;
  uint16_t  iterelnk;
} myCiter, myBiter;

// Teensy Serial Number
union {
  uint32_t myTeensySerial;
  uint8_t serial[4];
} myId;

uint32_t serialList[] = {578792,840980,841146,841024}; // Ix = 1,2,3 - 0=error
uint32_t noOfProcessors = sizeof(serialList) / sizeof(serialList[0]); // number of entries in serialList
uint8_t mySerialIx = 0;

// Master
bool weAreMaster;                         // true = WE will generate the starting record
volatile bool initialRecordToBeCreated;   // true = we haven't yet generated the start record (if we are due to)

#define myEMPTY false
#define myFULL  true

volatile bool ibPackedBuffStatus  = myEMPTY;
volatile uint32_t dmaGetCnt       = 0;
uint32_t loopCnt, loop2Cnt = 0;
uint32_t startMillis  = 0;
float result; // used in a calculation to simulate work

// CHANNEL ORDER IS NOT IMPORTANT to this program(to be confirmed)
// The following uses available channels from 0 upwards
// They are usually 0,1,2 but may change if libraries alloc channels first
// The code is not number dependent
// The relevant priority wrt libraries if used should be considered
// The default Round-Robin priorities are assumed but should not affect it.
DMAChannel ibDmaTcdDataChan;  // 0 IB data channel  
DMAChannel obDmaTcdDataChan;  // 1 OB data channel
DMAChannel obDmaTcdClockChan; // 2 OB clock channel
// **==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==*
// BUFFERS
// **==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==*
// CHANGE THIS (AND ONLY THIS) TO SET THE RECORD SIZE
#define DMADATASIZE8 508 // in bytes - min 4 max 508 (9 bits) multiple of 4
// **==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==**==*
#define DMABUFFSIZEIBPK32 (DMADATASIZE8)
#define DMABUFFSIZEIBPK8  (DMADATASIZE8 * 4)
#define DMABUFFSIZEIBUP32 (DMADATASIZE8/4)
#define DMABUFFSIZEIBUP8  (DMADATASIZE8+1)  // +1 is for the terminal 0xFF flag

#define DMABUFFSIZEOBUP32 (DMADATASIZE8/4)
#define DMABUFFSIZEOBUP8  (DMADATASIZE8+1)  // +1 is for the terminal 0xFF flag
#define DMABUFFSIZEOBPK32 (DMADATASIZE8+1)  // +1 is for the terminal 0xFF flag in a uint32
#define DMABUFFSIZEOBPK8  ((DMADATASIZE8+1) * 4) 

// INBOUND
// Each IB data byte is read in as one uint32_t in ibPackedBuff
//  and then condensed to one byte per data byte in ibUnpackedBuff 
volatile uint32_t padding1[32]; // paddings defined to detect buffer overruns in testing
union {
  volatile uint32_t buff32[DMABUFFSIZEIBPK32]; // 
  volatile uint8_t  buff8 [DMABUFFSIZEIBPK8]; // for presetting ib area to detect ib change, only
} volatile ibPackedBuff;
volatile uint32_t padding2[32];
union {
  volatile uint32_t buff32[DMABUFFSIZEIBUP32]; // for user data manip only
  volatile char str[44];
  volatile uint32_t serial;
  volatile uint8_t  buff8 [DMABUFFSIZEIBUP8]; 
} volatile ibUnpackedBuff;

// OUTBOUND
// Data is stored by the user as bytes in obUnpackedBuff and then expanded to 
// uint32_t per byte in toggle format as the data is written to the port toggle register. 
// This makes it easy to just hit the bottom 8 bits of the port register.
// The T3.6 can do 8 bit port writes but this is not supported by later devices eg T4.x
// so is not used here.
volatile uint32_t padding3[32];
union {
  volatile uint32_t buff32[DMABUFFSIZEOBUP32]; // for user data manip only
  volatile uint8_t  buff8 [DMABUFFSIZEOBUP8];
  volatile uint32_t serial;
} volatile obUnpackedBuff;
volatile uint32_t padding4[32];
union {
  volatile uint32_t buff32[DMABUFFSIZEOBPK32];
  volatile uint8_t  buff8 [DMABUFFSIZEOBPK8]; 
} volatile obPackedBuff;
volatile uint32_t padding5[32];

// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// CLOCK
// Only two toggle bytes transmitted change anything, the rest are 
// just no-ops to fix the position with respect to the data byte 
// AND to set the width of the clock pulse.
#define OBCLOCKDATALEN8 (2*4) // must be 8 - 508 and a multiple of 4
volatile uint32_t padding6;
union {
  volatile uint32_t buff32[OBCLOCKDATALEN8/4];
  volatile uint8_t  buff8 [OBCLOCKDATALEN8];
} volatile obClock;
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// TEENSY IO REGISTERS for Teensy 3.6
// WARNING - "V" PINS ARE ON THE BACK OF THE BOARD
// WARNING You cannot use the pin values via pinXtable[n] from the tables below 
// without explicitly translating them into the correct registers for the device 
// you are using so explicit pin numbers are used in this program for simplicity.
// MOSI     11(07,28) use 28
// MISO     12(08,39) use 39
// SCK      13(14,27) use 27
// CS       8,9,15/A1,20/A6,21/A7 use 15 needs shift
#define ALL_MOSI_PIN       28
#define ALL_MISO_PIN       39
#define ALL_SCK_PIN        27
#define TFT1_CS_PIN         9  // SPI CS
#define TFT1_DC_PIN        15  // SPI DC (data/command)
#define TFT1_RST          255  // Reset line for TFT (or connect to +3.3V)

#include <Adafruit_HX8357.h>
#include <Adafruit_GFX.h>    // Core graphics library
#include "ILI9341_t3.h"
//#define TFT1_CDCS_PIN        99	 // not used but must be held high
Adafruit_HX8357  tft1 = Adafruit_HX8357(TFT1_CS_PIN, TFT1_DC_PIN, TFT1_RST);

int  tft1H = 480;
int  tft1W = 320;
int  tft1Col_BLACK      = HX8357_BLACK;
int  tft1Col_WHITE      = HX8357_WHITE;
int  tft1Col_RED        = HX8357_RED;
int  tft1Col_BLUE       = HX8357_BLUE;
int  tft1Col_YELLOW     = HX8357_YELLOW;
int  tft1Col_GREEN      = HX8357_GREEN;
#define HX8357_DARKGREEN   0x03E0      
int  tft1Col_DARKGREEN  = HX8357_DARKGREEN;
int  fillColour = tft1Col_WHITE;

// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*  PortA, 10 bits
//                                                 V   V   V
//                     A5,A12,A13,A14,A15,A16,A17,A26,A28,A29
//byte pinAtable[] = { 25,  3,  4, 26, 27, 28, 39, 42, 40, 41};
// none used                            ?   ?   ?
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortB, 16 bits 
//                                      V   V                           V   V   V   V
//                     B0, B1, B2, B3, B4, B5,B10,B11,B16,B17,B18,B19,B20,B21,B22,B23
//byte pinBtable[] = { 16, 17, 19, 18, 49, 50, 31, 32,  0,  1, 29, 30, 43, 46, 44, 45}; 
// none used
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortC -> 12 bits, but LED is on pin 13, pins all in binary order!!!!
// all on front
//                        C0, C1, C2, C3, C4, C5, C6, C7, C8, C9,C10,C11
//uint8_t pinCtable[] = { 15, 22, 23,  9, 10, 13, 11, 12, 35, 36, 37, 38}; 
//                         *   *   *   *   *   *   *   *   
//                                         $   $   $   $   $   $   $   $
// using bits 4-11 ($) for data outbound (bits 0-7 (*) in the earlier version)
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortD, 8 bits in binary order.
// all on front
//                        D0, D1, D2, D3, D4, D5, D6, D7
//uint8_t pinDtable[] = {  2, 14,  7,  8,  6, 20, 21,  5};
//                         *   *   *   *   *   *   *   *
// using all 8 bits for data inbound
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=* PortE, 5 bits, in binary order.
//                      V   V         
//                    E10,E11,E24,E25,E26  NOT CONTIG
//byte pinEtable[] = { 56, 57, 33, 34, 24};
// clock out, in                *   *
// note: E0 - E5 used by built-in SDCARD
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// using 33 & 34 as clock out & clock in
#define CLOCK_OUT_PIN 33  // NOT REFERENCED - the port bit is used instead
#define CLOCK_OUT_PORT_BIT (1<<0) // Pin 33 = bit 24 of Port E = 0th bit of MSByte
#define CLOCK_IN_PIN  34   // NOT REFERENCED - only the port interrupt is referenced
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// PRINT ROUTINES
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void printIbPackedBuff(int event, int loop) { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  Serial.printf("IBPK %d %d\n",event,loop);
  for (int ii = 0; ii < DMABUFFSIZEIBPK8;ii++ ){  
    Serial.printf("%02X ",ibPackedBuff.buff8[ii]);
    if (ii > 0) {
      if ((ii+1) %  4 == 0) {Serial.printf("  ");}
      if ((ii+1) % 16 == 0) {Serial.printf("\n");}
    }
  }
  Serial.printf("\n");
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void printIbUnpackedBuff(int event, int loop) { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  Serial.printf("IBUP %d %d - \n",event,loop);
  for (int ii = 0; ii < DMABUFFSIZEIBUP8;ii++ ){  
    Serial.printf("%02X ",ibUnpackedBuff.buff8[ii]);
    if (ii > 0) {
      if ((ii+1) %  4 == 0) {Serial.printf("  ");}
      if ((ii+1) % 16 == 0) {Serial.printf("\n");}
    }
  }
  Serial.printf("\n");
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void printObPackedBuff(int event, int loop) { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  Serial.printf("OBPK %d %d - \n",event,loop);
  for (int ii = 0; ii < DMABUFFSIZEOBPK8;ii++ ){
    Serial.printf("%02X ",obPackedBuff.buff8[ii]);
    if (ii > 0) {
      if ((ii+1) %  4 == 0) {Serial.printf("  ");}
      if ((ii+1) % 16 == 0) {Serial.printf("\n");}
    }
  }
  if ((DMABUFFSIZEOBPK8) % 16 != 0) {Serial.printf("\n");} // finish a partial last row
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// BUFFER INITIALISATION ROUTINES
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setClockBuff() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Clear whole buffer to no-ops
  for (int ii=0;ii<OBCLOCKDATALEN8/4;ii++){
    obClock.buff32[ii] = 0x00; // address NO pins - this is a NO-OP
  }
  // set a specific pin bit to be toggled
  // once to set the clock line low and once to set it high again
  // on a minor loop each
  // Currently set to first and last minor loops
  // Currently set to zero no-ops ie fastest possible
  // LSB FIRST so bit 24 for PIN 33 is in the last byte!
  obClock.buff8[3] = CLOCK_OUT_PORT_BIT;                     // set toggle pin for low
  obClock.buff8[OBCLOCKDATALEN8 - 1] = CLOCK_OUT_PORT_BIT;   // set toggle pin for high
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void preSetIbPackedBuff() { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Used when debugging only to prove that DMA has written to the buffer
  for (int ii=0;ii<DMABUFFSIZEIBPK32-1;ii++){
    ibPackedBuff.buff32[ii] = 0xAABBCCDD;
  }
  ibPackedBuff.buff32[DMABUFFSIZEIBPK32-1] = 0x11223344;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setIbUnpackedBuff() { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  for (int ii=0;ii<DMABUFFSIZEIBUP8;ii++) {
    ibUnpackedBuff.buff8[ii] = 0x00;
  }
  ibUnpackedBuff.buff8[DMABUFFSIZEIBUP8-1] = 0xFF;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// PACKING / UNPACKING ROUTINES
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void unPackIbPkToIbUp() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Unpack IB Data  - from ibPackedBuff to ibUnpackedBuff, freeing ibPackedBuff
  for (int ii = 0; ii < DMABUFFSIZEIBPK32;ii++ ){
    // The GPIO port is 32 bit but we only want 8
    ibUnpackedBuff.buff8[ii] = ibPackedBuff.buff8[(ii*4)+0]; // lowest byte in memory = port bits 7-0 (little-endian)
  } 
  ibUnpackedBuff.buff8[DMABUFFSIZEIBUP8-1] = 0xFF;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void moveIbUpToObUp() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // Move IBUP to OBUP, no conversion necessary
  for (int ii=0;ii<DMABUFFSIZEIBUP8;ii++) { // inc terminal 0xFF
    obUnpackedBuff.buff8[ii] = ibUnpackedBuff.buff8[ii]; 
  }
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void packObUpToObPk() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// Expand the 8 bit data to the 32 bit data required by the port
  // Force the 0xFF in the INPUT data before we begin
  obUnpackedBuff.buff8[DMABUFFSIZEOBUP8-1] = 0xFF; // force 0xFF for toggling
  // convert the data in bytes to int32s
  for (int ii=0;ii<DMABUFFSIZEOBUP8;ii++) {
    obPackedBuff.buff8[(ii*4)+1] = 0x00;
    obPackedBuff.buff8[(ii*4)+2] = 0x00;
    obPackedBuff.buff8[(ii*4)+3] = 0x00;
    if (ii==0) {
      obPackedBuff.buff8[(ii*4)+0] = 0xFF ^ obUnpackedBuff.buff8[ii];
    } else {
      obPackedBuff.buff8[(ii*4)+0] = obUnpackedBuff.buff8[ii-1] ^ obUnpackedBuff.buff8[ii];
    }
    obPackedBuff.buff32[ii] <<= 4; // using PORT C bits 4 - 11
  }
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void dumpDmaTcd (DMAChannel  &dmabc, uint32_t event, const char* title) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// title is a char[] to describe the channel.
// event is an arbitrary number to distinguish between separate calls
// for the same channel.
// FOR DEBUGGING ONLY
// Dumps the specified TCD and some related non-TCD DMA registers.
// This routine takes a long time which can affect your processing...
// No guarantee that the values correspond to a single instant of time.
// Only written & tested for T3.6

DMASetting tcdDump; // used to hold a copy of this channel's TCD
char muxSourceName [64] [32] = {
"","",
"DMAMUX_SOURCE_UART0_RX",    //2
"DMAMUX_SOURCE_UART0_TX",  //3
"DMAMUX_SOURCE_UART1_RX",  //4
"DMAMUX_SOURCE_UART1_TX",  //5
"DMAMUX_SOURCE_UART2_RX",  //6
"DMAMUX_SOURCE_UART2_TX",  //7
"","",
"DMAMUX_NUM_SOURCE_ALWAYS",  //10
"","","",
"DMAMUX_SOURCE_I2S0_RX",  //14
"DMAMUX_SOURCE_I2S0_TX",  //15
"DMAMUX_SOURCE_SPI0_RX",  //16
"DMAMUX_SOURCE_SPI0_TX",  //17
"DMAMUX_SOURCE_SPI1_RX",  //18
"DMAMUX_SOURCE_SPI1_TX",  //19
"","",
"DMAMUX_SOURCE_I2C0",  //22
"DMAMUX_SOURCE_I2C1",  //23
"DMAMUX_SOURCE_FTM0_CH0",  //24
"DMAMUX_SOURCE_FTM0_CH1",  //25
"DMAMUX_SOURCE_FTM0_CH2",  //26
"DMAMUX_SOURCE_FTM0_CH3",  //27
"DMAMUX_SOURCE_FTM0_CH4",  //28
"DMAMUX_SOURCE_FTM0_CH5",  //29
"DMAMUX_SOURCE_FTM0_CH6",  //30
"DMAMUX_SOURCE_FTM0_CH7",  //31
"DMAMUX_SOURCE_FTM1_CH0",  //32
"DMAMUX_SOURCE_FTM1_CH1",  //33
"DMAMUX_SOURCE_FTM2_CH0",  //34
"DMAMUX_SOURCE_FTM2_CH1",  //35
"","","","",
"DMAMUX_SOURCE_ADC0",  //40
"DMAMUX_SOURCE_ADC1",  //41
"DMAMUX_SOURCE_CMP0",  //42
"DMAMUX_SOURCE_CMP1",  //43
"DMAMUX_SOURCE_CMP2",  //44
"DMAMUX_SOURCE_DAC0",  //45
"",
"DMAMUX_SOURCE_CMT",  //47
"DMAMUX_SOURCE_PDB",  //48
"DMAMUX_SOURCE_PORTA",  //49
"DMAMUX_SOURCE_PORTB",  //50
"DMAMUX_SOURCE_PORTC",  //51
"DMAMUX_SOURCE_PORTD",  //52
"DMAMUX_SOURCE_PORTE",  //53
"DMAMUX_SOURCE_ALWAYS0",  //54
"DMAMUX_SOURCE_ALWAYS1",  //55
"DMAMUX_SOURCE_ALWAYS2",  //56
"DMAMUX_SOURCE_ALWAYS3",  //57
"DMAMUX_SOURCE_ALWAYS4",  //58
"DMAMUX_SOURCE_ALWAYS5",  //59
"DMAMUX_SOURCE_ALWAYS6",  //60
"DMAMUX_SOURCE_ALWAYS7",  //61
"DMAMUX_SOURCE_ALWAYS8",  //62
"DMAMUX_SOURCE_ALWAYS9"  //63
};
  // Take a copy of the whole TCD quickly to minimise inconsistency between the fields.
  // TCD is a pointer in both objects, so copy what it points at, not the pointer itself.
  memcpy(tcdDump.TCD, dmabc.TCD, 32);
  uint32_t thisChannel = dmabc.channel;

  Serial.printf("============================================\n");
  Serial.printf("*** TCD For Channel %d [%s] %s %d\n",thisChannel,title,"for Event",event);
  Serial.printf("WARNING - values may be changing during reporting\n");
  Serial.printf("\"Completion\" means end of major loop \n");
// DMA STUFF
  Serial.printf("[DMA]\n");
  
  // DMA_CR - Control Register
  Serial.printf("  [DMA_CR]     : 0x%08X Control Register\n", DMA_CR);
  //Serial.printf("    DMA_CR_EEI: %x \n" , DMA_EEI);
  
  // DMA_ES - Error Status Register
  Serial.printf("  [DMA_CR_ES]  : 0x%08x Last Recorded Channel Error Register "  , DMA_ES);
	showDMA_ES();
  
  // DMA_ERQ - Enable Request Register
  Serial.printf("  [DMA_CR_ERQ] : 0x%08X Enable Request Register\n", DMA_ERQ);
  uint32_t *myX0 = (uint32_t *)(0x4000800C); 
  if (*myX0 != 0) {
    Serial.printf("  [DMA_CR_ERQ] : Hardware service request enabled for channels ");
    uint32_t erqWork = *myX0;
    for (uint ii=0;ii<32;ii++) {
      if (erqWork & 0x000001) {
        Serial.printf("%d ",ii);
        if (ii == thisChannel) {
          Serial.printf("(=this channel) ");
        }
      }
      erqWork >>= 1;
    }
    Serial.printf("\n");
  } else {Serial.printf("  [DMA_CR_ERQ] : No hardware service request enabled for any channel\n");}
  
  // DMA_INT - Interrupt Request Register
  uint32_t *myX1 = (uint32_t *)(&DMA_INT);  
  Serial.printf("  [DMA_INT]    : 0x%08X \n",*myX1);
  if (*myX1 != 0) {
    Serial.printf("\n  [DMA_INT] : DMA-generated interrupt request outstanding for channels ");
    uint32_t erqWork = *myX1;
    for (uint ii=0;ii<32;ii++) {
      if (erqWork & 0x000001) {
        Serial.printf("%d ",ii);
        if (ii == thisChannel) {
          Serial.printf("(=this channel) ");
        }
      }
      erqWork >>= 1;
    }
    Serial.printf("\n");
  }  else {Serial.printf("  [DMA_INT]    : No DMA-generated interrupt requests outstanding for any channel \n");}

  // DMA_ERR - Error Register
  Serial.printf("  [DMA_ERR]    : 0x%08X \n"  , DMA_ERR);
  uint32_t *myX2 = (uint32_t *)(&DMA_ERR);  
  if (*myX2 != 0) {
    Serial.printf("  [DMA_ERR]    : ** ERROR ** Channel error(s) detected on channel(s) ");
    uint32_t erqWork = *myX2;
    for (uint ii=0;ii<32;ii++) {
      if (erqWork & 0x000001) {
        Serial.printf("%d ",ii);
        if (ii == thisChannel) {
          Serial.printf("(=this channel) ");
        }
      }
      erqWork >>= 1;
    }
    Serial.printf("\n");
  } else {Serial.printf("  [DMA_ERR]    : No channel errors detected on any  channel\n");}

  // DMA_HRS - Hardware Request Status Register
  uint32_t *myX3 = (uint32_t *)(&DMA_HRS);  
  Serial.printf("  [DMA_HRS]    : 0x%08X \n",*myX3);
  if (*myX3 != 0) {
    Serial.printf("  [DMA_HRS]    : Hardware Service Requests exist for channel(s) ");
    uint32_t erqWork = *myX3;
    for (uint ii=0;ii<32;ii++) {
      if (erqWork & 0x000001) {
        Serial.printf("%d ",ii);
        if (ii == thisChannel) {
          Serial.printf("(=this channel) ");
        }        
      }
      erqWork >>= 1;
    }
    Serial.printf("\n");
  } else {Serial.printf("  [DMA_HRS]    : No hardware Service Requests exist for any channel\n");}

// DMAMUX
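  // The DMAMUX channel configuration registers are 8 bits wide and byte-adjacent,
  // so step through them with a byte pointer.
  volatile uint8_t *myX4 = (volatile uint8_t *)(&DMAMUX0_CHCFG0) + thisChannel;
  uint8_t chcfg = *myX4;
  uint8_t enbl = chcfg & 0b10000000;
  uint8_t trig = chcfg & 0b01000000;
  uint8_t srce = chcfg & 0b00111111;
  Serial.printf("  [DMAMUX_CHCFGn] : 0x%02X\n", chcfg);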
  if (enbl && trig) {
	  Serial.printf("  [DMAMUX_CHCFGn] : The DMA is in PIT triggering mode. (ENBL & TRIG are on)\n");
  } else {
	  Serial.printf("  [DMAMUX_CHCFGn] : The DMA is NOT in PIT triggering mode. (ENBL & TRIG are not both on)\n");
  }
  if (trig) {
    Serial.printf("  [DMAMUX_CHCFGn] : Triggering is enabled (TRIG is on).\n");
  } else {
    Serial.printf("  [DMAMUX_CHCFGn] : Triggering is NOT enabled (TRIG is off).\n");
  }
  if (enbl) {
    Serial.printf("  [DMAMUX_CHCFGn] : ENBL is on.\n");
  } else {
    Serial.printf("  [DMAMUX_CHCFGn] : ENBL is off.\n");
  }
  if (srce != 0) {
    Serial.printf("  [DMAMUX_CHCFGn] : [SOURCE] is %s\n",muxSourceName[srce]);
  } else {
    Serial.printf("  [DMAMUX_CHCFGn] : no MUX SOURCE defined\n");
  }
  // TCD STUFF  
  Serial.printf("[TCD]\n");
	Serial.printf("  [SADDR]    : 0x%08X :Source Address \n",(uint32_t)tcdDump.TCD->SADDR);
	Serial.printf("  [SOFF]     : %d  :Source address adjustment after each minor loop\n",tcdDump.TCD->SOFF);
	Serial.printf("  [DADDR]    : 0x%08X :Destination Address \n",(uint32_t)tcdDump.TCD->DADDR);
	Serial.printf("  [DOFF]     : %d  :Destination address adjustment after each minor loop\n",tcdDump.TCD->DOFF);
	Serial.printf("  [NBYTES]   : %d  :Minor Loop  Size \n",        tcdDump.TCD->NBYTES);
	Serial.printf("  [SLAST]    : %d :Source Address adjustment on completion\n",tcdDump.TCD->SLAST);
	// TCD ATTR  
  int32_t dsize = tcdDump.TCD->ATTR & 0b0000000000000111;
  int32_t dmod  = tcdDump.TCD->ATTR & 0b0000000011111000; dmod >>= 3;
  int32_t ssize = tcdDump.TCD->ATTR & 0b0000011100000000; ssize >>= 8;
  int32_t smod  = tcdDump.TCD->ATTR & 0b1111100000000000; smod >>= 11;
  char ssizeS[17];
	switch (ssize) {
    case 0:  strcpy(ssizeS,"8 bit"); break;
    case 1:  strcpy(ssizeS,"16 bit"); break;
    case 2:  strcpy(ssizeS,"32 bit"); break;
    case 4:  strcpy(ssizeS,"16-byte burst"); break;
    case 5:  strcpy(ssizeS,"32-byte burst"); break;
    default: strcpy(ssizeS,"INVALID");
	}
  char dsizeS[17];
	switch (dsize) {
    case 0:  strcpy(dsizeS,"8 bit"); break;
    case 1:  strcpy(dsizeS,"16 bit"); break;
    case 2:  strcpy(dsizeS,"32 bit"); break;
    case 4:  strcpy(dsizeS,"16-byte burst"); break;
    case 5:  strcpy(dsizeS,"32-byte burst"); break;
    default: strcpy(dsizeS,"INVALID");
	}
  
	Serial.printf("  [ATTR]     : 0x%08X\n",tcdDump.TCD->ATTR);
	Serial.printf("    [SMOD]     : 0x%02X",smod);
	if (smod != 0) {Serial.printf(" Circular source queue specified - see manual if not desired\n");} 
	else           {Serial.printf(" \n");}	
	Serial.printf("    [SSIZE]    : 0x%02X   %s = Source transfer size\n",ssize,ssizeS);
	
	Serial.printf("    [DMOD]     : 0x%02X ",dmod);
	if (dmod != 0) {Serial.printf(" Circular dest queue specified - see manual if not desired\n");} 
	else           {Serial.printf(" \n");}	
	Serial.printf("    [DSIZE]    : 0x%02X   %s = Destination transfer size\n",dsize,dsizeS);
	
  // Report CITER
  decodeIter((uint16_t)tcdDump.TCD->CITER,myCiter);
  Serial.printf("  [CITER]    : 0x%08X\n",tcdDump.TCD->CITER);
  if (myCiter.iterelnk != 0) {
    Serial.printf("    [CITERELNK]: This TCD will link to channel %d ",myCiter.iterCh);
    if (myCiter.iterCh == thisChannel) {Serial.printf("(=this channel) ");}        
    Serial.printf("at the end of a minor loop (except last)\n");
  } else {
    Serial.printf("    [CITERELNK]: This TCD will NOT link to another channel at the end of a minor loop\n");
  }
  Serial.printf("    [CITERITER]: %d Major loop iterations still to do \n",myCiter.iter);
  // Report BITER
  decodeIter(tcdDump.TCD->BITER,myBiter);
	Serial.printf("  [BITER]    : 0x%08X : (At end of all minor loops CITER reset value)\n",tcdDump.TCD->BITER);
  if (myBiter.iterelnk != 0) {
    Serial.printf("    [BITERELNK]: This TCD reset value will link to channel %d ",myBiter.iterCh);
    if (myBiter.iterCh == thisChannel) {Serial.printf("(=this channel) ");}        
    Serial.printf("at the end of a minor loop (except last)\n");
  } else {
    Serial.printf("    [BITERELNK]: This TCD reset value will NOT link to another channel at the end of a minor loop\n");
  }
    Serial.printf("    [BITERITER]: %d Major loop iterations initial/reset value\n",myBiter.iter);
  // Report CITER/BITER errors
  if (myCiter.iterelnk != myBiter.iterelnk) {
	  Serial.printf("** DMA ERROR ** CELNK DOES NOT MATCH BELNK\n");
	}
	// Report CITER/BITER interest
  if (myCiter.iter != myBiter.iter) {
	  Serial.printf("** NOTE ** CITER %d does not match BITER %d, DMA not yet finished or ** DMA ERROR **\n",myCiter.iter,myBiter.iter);
	}
		
	Serial.printf("  [DLASTSGA] :0x%08X\n", tcdDump.TCD->DLASTSGA);
	if ((tcdDump.TCD->CSR & DMA_TCD_CSR_ESG) == 0) {
  	Serial.printf("    [\"DLAST\"]  :%d Dest Address adjustment on completion\n",tcdDump.TCD->DLASTSGA);
  } else {
  	Serial.printf("    [\"SGA\"]      :Scatter/gather next TCD addr: 0x%08X\n",tcdDump.TCD->DLASTSGA);
  	if ((tcdDump.TCD->DLASTSGA & 0x0000001F) != 0) {
    	Serial.printf("    [\"SGA\"]    :** CONFIGURATION ERROR ** not 32 byte aligned 0x%08X\n", tcdDump.TCD->DLASTSGA);
    }
  }
	
	// CSR
	Serial.printf("  [CSR]      : 0x%08X\n",       tcdDump.TCD->CSR);
	
  // Bandwidth Control. Provides a means of controlling the amount of bus bandwidth the eDMA uses.
  //  00 No stalls–consume 100% bandwidth
  //  01 Reserved
  //  10 eDMA stalls for 4 cycles after each read/write 
  //  11 eDMA stalls for 8 cycles after each read/write
  if ((tcdDump.TCD->CSR & DMA_TCD_CSR_BWC_MASK) == DMA_TCD_CSR_BWC(3)) {	
	  Serial.printf("    [BWC]      :Band width control - eDMA to stall for 8 cycles after each read/write\n");
	} else{
    if ((tcdDump.TCD->CSR & DMA_TCD_CSR_BWC_MASK) == DMA_TCD_CSR_BWC(2)) {	
      Serial.printf("    [BWC]      :Band width control - eDMA to stall for 4 cycles after each read/write\n");
    } else {
      if ((tcdDump.TCD->CSR & DMA_TCD_CSR_BWC_MASK) == DMA_TCD_CSR_BWC(0)) {	
        Serial.printf("    [BWC]      :Band width control - eDMA to stall for 0 cycles after each read/write\n");
      } else {
        Serial.printf("    [BWC]      :Band width control - ** INVALID **\n");
      }
    }
	}

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_MAJORELINK) {
  	Serial.printf("    [MAJLINK]  :Channel-to-channel linking on major loop complete: YES\n");
//*** NOTE the define in cores/teensy3/kinetis.h for DMA_TCD_CSR_MAJORLINKCH_MASK is WRONG as 4 bits should be 5
//  Serial.printf("    [LINKCH]   :This TCD will link to channel %d at the end of a major loop \n",   tcdDump.TCD->CSR & DMA_TCD_CSR_MAJORLINKCH_MASK);
	  Serial.printf("    [LINKCH]   :This TCD will link to channel %d ",(tcdDump.TCD->CSR & 0x1F00)>>8);
    if (((tcdDump.TCD->CSR & 0x1F00)>>8) == thisChannel) {
      Serial.printf("(=this channel) ");
    }        
	  Serial.printf("at the end of a major loop \n");
  } else {
  	Serial.printf("    [MAJLINK]  :Channel-to-channel linking on major loop complete: NO\n");
  }
	
	if (tcdDump.TCD->CSR & DMA_TCD_CSR_ESG) {
  	Serial.printf("    [ESG]      :Scatter/gather specified: YES\n");
  } else {
  	Serial.printf("    [ESG]      :Scatter/gather specified: NO\n");
  }


	if (tcdDump.TCD->CSR & DMA_TCD_CSR_DREQ) {
  	Serial.printf("    [DREQ]     :This channel's ERQ bit to be cleared on completion: YES (ie set channel disabled)\n");
  } else {
  	Serial.printf("    [DREQ]     :This channel's ERQ bit to be cleared on completion: NO (ie channel not disabled)\n");
  }

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_INTHALF) {
  	Serial.printf("    [INTH]     :Interrupt on half completion: YES\n");
  } else {
  	Serial.printf("    [INTH]     :Interrupt on half completion: NO\n");
  }

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_INTMAJOR) {
  	Serial.printf("    [INTF]     :Interrupt on completion: YES\n");
  } else {
  	Serial.printf("    [INTF]     :Interrupt on completion: NO\n");
  }

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_START) {
  	Serial.printf("    [START]    :The channel is requesting service via a software initiated service request and the channel has not begun execution yet.\n");
  } else {
  	Serial.printf("    [START]    :The channel is NOT requesting service via a software initiated service request OR is, but the channel has begun execution.\n");
  } 

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_ACTIVE) {
  	Serial.printf("    [ACTIVE]   :DMA in execution, this flag will be cleared when this minor loop finishes\n");
  } else {
  	Serial.printf("    [ACTIVE]   :DMA not currently executing a minor loop\n");
  } 

	if (tcdDump.TCD->CSR & DMA_TCD_CSR_DONE) {
  	Serial.printf("    [DONE]     :Major loop completed. \n");
  } else {
  	Serial.printf("    [DONE]     :Major loop not completed, or never started, or has been cleared by the software or by the hardware when the channel is activated.\n");
  } 
  
  // TO DO - detect other errors:
/*
  Only 1st 4 channels for PIT module triggering
  multiple ch with same source is error even if disabled
  Do not use continuous link mode with a channel linking to itself 
    if there is only one minor loop iteration per service request. 
    If the channel’s NBYTES value is the same as either the source or 
    destination size, do not use channel linking to itself. The same 
    data transfer profile can be achieved by simply increasing the NBYTES value. 
    A larger NBYTES value provides more efficient, faster processing.
  EMLM no minor links bad universal
    no EMLM then NBYTES different
*/  
  // operation type mem-mem hw-mem etc
  // status not-started, running, paused, completed etc
  // errors detected by dma
  // errors detected by us
  
  // OTHER 
  // some fields eg HRS not expanded
  // can we report on timers, PDB, etc
  // linked channels
  // add channel name/title?

  Serial.printf("============================================\n");

}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void decodeIter(uint16_t tcdIter, iter_t &myIter) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  myIter.iterelnk  = tcdIter & 0x8000;
  if (myIter.iterelnk != 0) {
    myIter.iter   = tcdIter & 0x01FF;
    myIter.iterCh = (tcdIter & 0x3E00)>>9;
  } else {
    myIter.iter = tcdIter & 0x7FFF;
    myIter.iterCh = 0;
  }
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void showDMA_ES() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
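unsigned int worki;

// DMA_ES (error status) is a hardware register, so read it through a volatile pointer
volatile uint32_t *myErr = (volatile uint32_t *)(0x40008004);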

  worki = *myErr & 0x80000000; worki >>= 31;
  if (worki!=1){
    Serial.printf("No errors registered\n");
  } else {
    Serial.printf("\n");
    worki = *myErr & 0x00000001; worki >>= 0;
    if (worki==1){Serial.printf("    [DBE] ** ERROR ** The last recorded error was a bus error on a destination write\n");}
    worki = *myErr & 0x00000002; worki >>= 1;
    if (worki==1){Serial.printf("    [SBE] ** ERROR ** The last recorded error was a bus error on a source read\n");}
    worki = *myErr & 0x00000004; worki >>= 2;
    if (worki==1){Serial.printf("    [SGE] ** ERROR **  TCDn_DLASTSGA (scatter/gather TCD) is not on a 32 byte boundary.\n");}
    worki = *myErr & 0x00000008; worki >>= 3;
    if (worki==1){
      Serial.printf("    [NCE] ** ERROR ** The last recorded error was a configuration error detected in the TCDn_NBYTES or TCDn_CITER fields.\n");
      Serial.printf("    - TCDn_NBYTES is not a multiple of TCDn_ATTR[SSIZE] and TCDn_ATTR[DSIZE], or \n");
      Serial.printf("    - TCDn_CITER[CITER] is equal to zero, or \n");
      Serial.printf("    - TCDn_CITER[ELINK] is not equal to TCDn_BITER[ELINK]\n");
    }
    worki = *myErr & 0x00000010; worki >>= 4;
    if (worki==1){Serial.printf("    [DOE] ** ERROR ** The last recorded error was a configuration error detected in the TCDn_DOFF field. TCDn_DOFF is inconsistent with TCDn_ATTR[DSIZE].\n");}
    worki = *myErr & 0x00000020; worki >>= 5;
    if (worki==1){Serial.printf("    [DAE] ** ERROR ** The last recorded error was a configuration error detected in the TCDn_DADDR field. TCDn_DADDR is inconsistent with TCDn_ATTR[DSIZE].\n");}
    worki = *myErr & 0x00000040; worki >>= 6;
    if (worki==1){Serial.printf("    [SOE] ** ERROR ** The last recorded error was a configuration error detected in the TCDn_SOFF field. TCDn_SOFF is inconsistent with TCDn_ATTR[SSIZE].\n");}
    worki = *myErr & 0x00000080; worki >>= 7;
    if (worki==1){Serial.printf("    [SAE] ** ERROR ** The last recorded error was a configuration error detected in the TCDn_SADDR field. TCDn_SADDR is inconsistent with TCDn_ATTR[SSIZE].\n");}
  
    worki = *myErr & 0x00001F00; worki >>= 8;
    Serial.printf("    [ERRCHN] %d = The channel number of the last recorded error, excluding GPE and CPE errors, or last recorded error canceled transfer.\n",worki);
  
    worki = *myErr & 0x00004000; worki >>= 14;
    if (worki==1){Serial.printf("    [CPE] ** ERROR ** The last recorded error was a configuration error in the channel priorities within a group. Channel priorities within a group are not unique.\n");}
    worki = *myErr & 0x00008000; worki >>= 15;
    if (worki==1){Serial.printf("    [GPE] ** ERROR ** The last recorded error was a configuration error among the group priorities. All group priorities are not unique.\n");}
    worki = *myErr & 0x00010000; worki >>= 16;
    if (worki==1){Serial.printf("    [ECX] ** ERROR ** The last recorded entry was a canceled transfer by the error cancel transfer input\n");}
  } 
}
// TODO - USE portConfigRegister(pin);
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setDataPins(bool level) { // used for debugging
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // USE DIGITS DIRECTLY
  //digitalWrite(15,level); // DROPPED
  //digitalWrite(22,level);
  //digitalWrite(23,level); 
  //digitalWrite(9, level);
  digitalWrite(10,level);
  digitalWrite(13,level);
  digitalWrite(11,level);
  digitalWrite(12,level); 
  digitalWrite(35,level); // ADDED
  digitalWrite(36,level); // ADDED
  digitalWrite(37,level); // ADDED
  digitalWrite(38,level); // ADDED
  // NOTE The last pin in the pin table gives the highest value bit
  delayMicroseconds(5); // REQUIRED
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void setup() { 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// DATA  OUTPUT=PORTC INPUT=PORTD
// CLOCK OUTPUT=PORTE INPUT=PORTE
  delay(5000);
  Serial.begin(115200);  

  // Handle the serial number of this Teensy
  teensySN(myId.serial);
  myId.myTeensySerial = __builtin_bswap32(myId.myTeensySerial); // IS THIS NECESSARY ? !!!
  mySerialIx = getSerialIx(myId.myTeensySerial);
  Serial.printf("SER=%d\n",myId.myTeensySerial);

  // The processor with the lowest id is the Master
  weAreMaster = true;
  bool weAreInList = false;
  for (uint32_t ii=0;ii<noOfProcessors;ii++){
    if (serialList[ii] <  myId.myTeensySerial) {weAreMaster = false;}
    if (serialList[ii] == myId.myTeensySerial) {weAreInList = true ;}
  }
  if (weAreMaster){
    initialRecordToBeCreated = true;
    Serial.printf("MASTER=YES\n");
  } else {
    initialRecordToBeCreated = false;
    Serial.printf("MASTER=NO\n");
  }
  // If our serial number is not in the list, hard stop
  if (!weAreInList) {
    Serial.printf("OUR SERIAL %d NOT IN LIST\n",myId.myTeensySerial);
    for (uint32_t ii=0;ii<noOfProcessors;ii++){
      Serial.printf("  %d\n",serialList[ii]);
    }
    while(1==1){} // HARD LOOP
  }
  //Serial.printf("NOPR  %d\n",noOfProcessors); // for debugging

  // Build the clocking data
  setClockBuff();
  
  // SPI
  if (mySerialIx == 4) {
    SPI.setMOSI(ALL_MOSI_PIN); // 28
    SPI.setMISO(ALL_MISO_PIN); // 39
    SPI.setSCK(ALL_SCK_PIN);   // 27
    // CS 
    // DC
    tft1.begin();
    tft1.setRotation(3);               // Set to landscape mode
    tft1.fillScreen(tft1Col_WHITE);
    tft1.setTextColor(tft1Col_BLACK);
    tft1.setTextSize(8);
    
  }
  
  // Setup Data Input Pins
  //for (int ii=0;ii<8;ii++) {pinMode(pinDtable[ii], INPUT);} WILL NOT WORK
  
  pinMode(2,  INPUT);
  pinMode(14, INPUT);
  pinMode(7,  INPUT);
  pinMode(8,  INPUT);
  pinMode(6,  INPUT);
  pinMode(20, INPUT);
  pinMode(21, INPUT);
  pinMode(5,  INPUT);

  // Setup Output Pins
  // 15 => 35
  //pinMode(15,OUTPUT); // LSB // DROPPED
  //pinMode(22,OUTPUT);
  //pinMode(23,OUTPUT); 
  //pinMode(9, OUTPUT);
  pinMode(10,OUTPUT); // LSB (lowest Port C bit now in use)
  pinMode(13,OUTPUT);
  pinMode(11,OUTPUT);
  pinMode(12,OUTPUT);
  pinMode(35,OUTPUT); // ADDED
  pinMode(36,OUTPUT); // ADDED
  pinMode(37,OUTPUT); // ADDED
  pinMode(38,OUTPUT); // MSB // ADDED
  
  // Set up Clock Pins
  pinMode(33, OUTPUT);
  digitalWrite(33, HIGH);
  pinMode(34, INPUT);

  // Put the data pins into a known starting position
  setDataPins(HIGH);

// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// Set up the DMA channels
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
/*
To initialize the eDMA: (ex the manual)
1. Write to the CR if a configuration other than the default is desired.
2. Write the channel priority levels to the DCHPRIn registers if a configuration other than the default is desired.
3. Enable error interrupts in the EEI register if so desired.
4. Write the 32-byte TCD for each channel that may request service.
5. Enable any hardware service requests via the ERQH and ERQL registers. {???}
6. Request channel service via either:
• Software: setting the TCDn_CSR[START]
• Hardware: slave device asserting its eDMA peripheral request signal
*/
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// DMA CLOCK CHANNEL - ch 0
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
  Serial.printf("Clock Channel is %d\n",obDmaTcdClockChan.channel);
//1 Write CR
//2 Write Priority
//3 Enable Error Interrupts
//4 Write TCD
  obDmaTcdClockChan.begin();
  obDmaTcdClockChan.destination((volatile uint32_t&)GPIOE_PTOR); // clock pin TOGGLE register
  obDmaTcdClockChan.sourceBuffer((uint32_t *)obClock.buff32,OBCLOCKDATALEN8); // len is in BYTES!!!
  obDmaTcdClockChan.disableOnCompletion();
//5 Enable any hardware service requests via the ERQH and ERQL registers.
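// ERQH and ERQL are not defined anywhere; the Teensy core exposes a single
// DMA_ERQ register, with DMA_SERQ / DMA_CERQ to set or clear one channel's bit.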
//6 Request channel service via either:
//  • Software: setting the TCDn_CSR[START]
//  • Hardware: slave device asserting its eDMA peripheral request signal
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// DMA DATA OB CHANNEL
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
  Serial.printf("OB Channel is %d\n",obDmaTcdDataChan.channel);
//1 Write CR
//2 Write Priority
//3 Enable Error Interrupts
//4 Write TCD
  obDmaTcdDataChan.begin();
  obDmaTcdDataChan.destination((volatile uint32_t&)GPIOC_PTOR); // TOGGLE REGISTER 
  obDmaTcdDataChan.sourceBuffer((uint32_t *)obPackedBuff.buff32,(DMABUFFSIZEOBPK8)); // LEN in BYTES!!!
  obDmaTcdDataChan.transferCount(DMABUFFSIZEOBPK32);
  obDmaTcdDataChan.disableOnCompletion();
  // Bandwidth Control. Provides a means of controlling the amount of bus bandwidth the eDMA uses.
  //  00 No stalls–consume 100% bandwidth
  //  01 Reserved
  //  10 eDMA stalls for 4 cycles after each read/write 
  //  11 eDMA stalls for 8 cycles after each read/write
  //obDmaTcdDataChan.TCD->CSR |= 0xC000; // or in the BWC
  obDmaTcdClockChan.triggerAtTransfersOf( obDmaTcdDataChan);
  obDmaTcdClockChan.triggerAtTransfersOf( obDmaTcdClockChan);
  obDmaTcdDataChan.triggerAtCompletionOf( obDmaTcdClockChan);
  // following not used as we send 0xFF to set the port but do not require
  // the receiver to read it (as we could not get it to work!) 
  // not used - obDmaTcdClockChan.triggerAtCompletionOf(obDmaTcdDataChan);
//5 Enable any hardware service requests via the ERQH and ERQL registers.
//6 Request channel service via either:
//  • Software: setting the TCDn_CSR[START]
//  • Hardware: slave device asserting its eDMA peripheral request signal
// *==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*==*
// DMA DATA IB CHANNEL
// = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
  Serial.printf("IB Channel is %d\n",ibDmaTcdDataChan.channel);
//1 Write CR
//2 Write Priority
//3 Enable Error Interrupts
//4 Write TCD
  // Note This channel should never run against the other 2 channels but might
  // encounter Library channels if any are added.
  ibDmaTcdDataChan.begin();
  ibDmaTcdDataChan.source((volatile uint32_t&) GPIOD_PDIR);
  ibDmaTcdDataChan.destinationBuffer(ibPackedBuff.buff32,DMABUFFSIZEIBPK8); // note byte length
  ibDmaTcdDataChan.transferCount(DMABUFFSIZEIBPK32); // ?????????????????????????????
  // This channel is continuously enabled:
  // ibDmaTcdDataChan.disableOnCompletion();
  ibDmaTcdDataChan.attachInterrupt(dmaGetEndedInterrupt);
  ibDmaTcdDataChan.interruptAtCompletion();
  // The following type of "enable request" is needed for hardware triggering
  ibDmaTcdDataChan.enable(); // same as next: // this triggers 1 minor loop
  // DMA_SERQ = ibDmaTcdDataChan.channel; 
  // This channel is left enabled at the end of each major loop
//5 Enable any hardware service requests via the ERQH and ERQL registers.
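  // DMAMUX0_CHCFGn address: 4002_1000h base + 0h offset + (1d × i), where i=0d to 31d
  // These registers are 8 bits wide and 1 byte apart, so a byte pointer is needed;
  // a uint32_t pointer would step 4 bytes per channel and write the wrong registers.
  volatile uint8_t *myQ = (volatile uint8_t *)(0x40021000) + ibDmaTcdDataChan.channel;
  *myQ = 0;
//  *myQ = DMAMUX_ENABLE | (DMAMUX_SOURCE_PORTA & 63) ;
//  *myQ = DMAMUX_ENABLE | DMAMUX_SOURCE_PORTA ;
  *myQ = DMAMUX_ENABLE | DMAMUX_SOURCE_PORTE ;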
//6 Request channel service via either:
//  • Software: setting the TCDn_CSR[START]
//  • Hardware: slave device asserting its eDMA peripheral request signal
  //NVIC_SET_PRIORITY(IRQ_PORTE, 32); // not tried yet
// MISC NOTES
  ibDmaTcdDataChan.triggerAtHardwareEvent(DMAMUX_SOURCE_PORTE);
	PORTE_PCR25 |= PORT_PCR_IRQC(2); // Trigger DMA request on falling edge

dumpDmaTcd(ibDmaTcdDataChan, 11,"IB"); 
  delay(3000); // allow time for dma to do mischief
dumpDmaTcd(ibDmaTcdDataChan, 22,"IB");
  
  // **** Start Right Royal Bodge
  // No matter how we set up the IB DMA channel, we always get one minor loop executed
  // BEFORE we have received an input clock pin interrupt!
  // This leaves the channel out of phase with the sender.
  // This bodge modifies the TCD to look like new
  // CRUDE RESET
  ibDmaTcdDataChan.TCD->CITER = ibDmaTcdDataChan.TCD->BITER;
  ibDmaTcdDataChan.destinationBuffer(ibPackedBuff.buff32,DMABUFFSIZEIBPK8); // note byte length
  // **** End Right Royal Bodge
  
  if (weAreMaster) {delay(2000);} // ensure others are ready to read

  Serial.println("SETUP ENDED");  
  startMillis = millis(); // for debugging
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
//bool __sync_bool_compare_and_swap (type *ptr, type oldval type newval, ...)
//type __sync_val_compare_and_swap (type *ptr, type oldval type newval, ...)
//These builtins perform an atomic compare and swap. That is, if the current value of *ptr is oldval, then write newval into *ptr.
//The “bool” version returns true if the comparison is successful and newval was written. 
//The “val” version returns the contents of *ptr before the operation. 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
inline bool enqueue( volatile unsigned int *lock ) { // not used
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
	return (__sync_bool_compare_and_swap( lock, 0, 1 ) );
	// returns true  if lock was 0 and was replaced by 1
	// returns false if lock was 1 ie owned by someone else
}
//Prior to initializing the corresponding module, set SCGC5[PORTx] in the SIM module to enable the clock. Before turning off the clock, make sure to disable the module. 
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
inline void dequeue( volatile unsigned int *lock ) { // not used
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  asm volatile ( "" ::: "memory" );
  *lock = 0;
  // sets lock to 0, releasing it
  return;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void doDmaGet() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  // no longer required as IB DMA now continuously enabled
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void doDmaPut() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
//  DMA_SERQ = 1; // Start OB - NOT needed
  obDmaTcdDataChan.TCD->CSR |= DMA_TCD_CSR_START ; // 0x0001
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void dmaGetEndedInterrupt() { // Called by DMA IB on completion of read
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  //ibDmaTcdDataChan.clearInterrupt(); // same as:
  DMA_CINT = ibDmaTcdDataChan.channel; // clear interrupt
  ibPackedBuffStatus = myFULL;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
uint8_t getSerialIx(uint32_t thisSerial) {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
uint8_t ix = 0; // 0=error
  for (uint32_t ii=0;ii<noOfProcessors;ii++) {
    if (thisSerial == serialList[ii]) {ix=ii+1;}
  }
  return ix;
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void loop() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  loopCnt++;
  
  loop2();
  if (mySerialIx == 4) { // only 1 T3.6 fitted with TFT
    if (loopCnt % 500000 == 0) { // reduce reporting
      if (dmaGetCnt != lastDmaGetCnt) { // Only display changes
        tft1.setCursor(0, 0);  
        loop2();
        tft1.setTextColor(tft1Col_WHITE);
        loop2();
        tft1.println(lastDmaGetCnt); // undo last print
        loop2();
        tft1.setCursor(0, 0);  
        loop2();
        tft1.setTextColor(tft1Col_BLACK);
        loop2();
        tft1.println(dmaGetCnt);
        lastDmaGetCnt = dmaGetCnt;
        loop2();
      }  
    }
  }
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
void loop2() {
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
  volatile bool myIbPackedBuffStatus = myEMPTY;
  // Caution - ibPackedBuffStatus is also changed in an interrupt
  // We can change it here when the IB DMA channel is not running
  // or with interrupts disabled
  
  loop2Cnt++;
    
  if (millis()-startMillis > 10000){ // debugging
    printIbUnpackedBuff(dmaGetCnt,loopCnt);
    //printObPackedBuff(dmaGetCnt,loopCnt);
    dumpDmaTcd(ibDmaTcdDataChan,  91, "IB");
    dumpDmaTcd(obDmaTcdDataChan,  92, "OB");
    dumpDmaTcd(obDmaTcdClockChan, 93, "CLK");
    delay(5);
    while(1==1){}
  }

  // Handle Startup
  if (initialRecordToBeCreated) { 
    initialRecordToBeCreated = false;
    setIbUnpackedBuff();
    moveIbUpToObUp();
    packObUpToObPk(); 
    doDmaPut();
    return;
  }

  // Handle Get-Got Interrupt Flag
  noInterrupts();
  myIbPackedBuffStatus = ibPackedBuffStatus;
  ibPackedBuffStatus = myEMPTY;
  interrupts();

  // Handle Get Got
  if (myIbPackedBuffStatus == myFULL) {
    dmaGetCnt++;
    //printIbPackedBuff(dmaGetCnt,loopCnt);
    unPackIbPkToIbUp(); 
    //printIbUnpackedBuff(dmaGetCnt,loopCnt);    
    // *** START Work - tinker with the record here in ib unpacked buffer
    ibUnpackedBuff.buff32[mySerialIx]++; 
    // *** END   Work
    moveIbUpToObUp(); 
    //printObUnpackedBuff(dmaGetCnt,loopCnt); // function does not exist
    packObUpToObPk(); 
    //printObPackedBuff(dmaGetCnt,loopCnt);
    doDmaPut();
        
  }   
  // Artificial work for debugging
  //for (int ii=1.0;ii<100.0;ii++){
  //  result = ii / (ii*ii);
  //}
  
}
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
// *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
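
For anyone who wants to try the (currently unused) enqueue()/dequeue() idea away from the hardware, here is a minimal sketch of the same compare-and-swap lock that should build with GCC or Clang on a desktop. The names tryLock and unlock are illustrative only and are not part of the sketch above. The release uses __sync_synchronize(), a full barrier, which is a deliberate substitution for the compiler-only barrier in dequeue() and may be more than a single-core Teensy needs.

```cpp
// Illustrative stand-ins for enqueue()/dequeue(); names are hypothetical.

// Try to take the lock.
// Returns true  if *lock was 0 and has now been set to 1 (we own it).
// Returns false if *lock was already 1 (owned by someone else).
static inline bool tryLock(volatile unsigned int *lock) {
  return __sync_bool_compare_and_swap(lock, 0u, 1u);
}

// Release the lock.
// The barrier makes sure every write done while holding the lock
// is complete before the lock is seen as free.
static inline void unlock(volatile unsigned int *lock) {
  __sync_synchronize();
  *lock = 0;
}
```

Typical use would be to call tryLock() before touching a shared buffer, skip the work if it returns false, and call unlock() once finished. Unlike the noInterrupts()/interrupts() pair in loop2(), this does not hold off the DMA interrupt, so it only suits cases where the caller can simply try again later.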
 