Teensy 3.5 SPI insight

Status
Not open for further replies.

serT

Member
Hello guys.

In my current application i use SPI and saw during analyzing the output with a oscilloscope "gaps" between each byte. I use a shortened form of the DMASpi example which can be found here: https://github.com/crteensy/DmaSpi
My code looks like this


Code:
//#include <WProgram.h>

#include <SPI.h>
#include <DmaSpi.h>

/** Important
  The sketch waits for user input through Serial (USB) before it starts the setup.
  After any key was pressed, it should begin to work. It will ask for a second keypress later.
  
  This example cannot simply be modified to use SPI1 or SPI2 instead of SPI0 because ActiveLowChipSelect is
  hardcoded to use SPI (SPI0).
  If you want to use SPI1 or SPI2, you need to create a new chip select class to reflect this.
**/

/** Hardware setup:
 Teensy 3.1, 3.2: DOUT (11) connected to DIN (12)
 Teensy LC: DOUT (11) connected to DIN (12)
 Teensy 3.5, 3.6: DOUT (11) connected to DIN (12)
**/

/** buffers to send from and to receive to **/
#define DMASIZE 1000
uint8_t src[DMASIZE];
volatile uint8_t dest[DMASIZE];
volatile uint8_t dest1[DMASIZE];

/** Wait for and consume a keypress over USB **/
void waitForKeyPress()
{
  Serial.println("\nPress a key to continue\n");
  while(!Serial.available());
  while(Serial.available())
  {
    Serial.read();
  }
}

//** View received data
void dumpBuffer(const volatile uint8_t* buf, const char* prefix)
{
  Serial.print(prefix);
  for (size_t i = 0; i < DMASIZE; i++)
  {
    Serial.printf("0x%02x ", buf[i]);
  }
  Serial.print('\n');
}

/** Compare the buffers and print the destination contents if there's a mismatch **/
void compareBuffers(const uint8_t* src_, const uint8_t* dest_)
{
  int n = memcmp((const void*)src_, (const void*)dest_, DMASIZE);
  if (n == 0)
  {
    Serial.println("src and dest match");
  }
  else
  {
    Serial.println("src and dest don't match");
    dumpBuffer(src_, " src: " );
    dumpBuffer(dest_, "dest: ");
  }
}

//** Fill sender buffer with 0..99
void setSrc()
{
  for (size_t i = 0; i < DMASIZE; i++)
  {
    src[i] = i;
  }
 /* 
  src[0] = 166;
  src[1] = 154;
  src[2] = 165;
  src[3] = 166;
  src[4] = 101;
  src[5] = 170;
  src[6] = 101;
  src[7] = 166;*/
}

//** Clear destination buffer - 100 Zeros
void clrDest(uint8_t* dest_)
{
  memset((void*)dest_, 0x00, DMASIZE);
}

void setup()
{
  Serial.begin(9600);
  waitForKeyPress();
  Serial.println("Hi!");

  /** Prepare source and destination **/
  setSrc();
  clrDest((uint8_t*)dest);
  Serial.println("Buffers are prepared");Serial.flush();  //what does serial.flush do?

  /** set up SPI **/
  //SPISettings(5000000, MSBFIRST, SPI_MODE0);
  SPISettings spiSettings;
  SPI.begin();

  // transmit 10 bytes and measure time to get a feel of how long that takes
  SPI.beginTransaction(SPISettings(10000000, MSBFIRST, SPI_MODE0));
  elapsedMicros us;
  for (size_t i = 0; i < DMASIZE; i++)
  {
    dest[i] = SPI.transfer(src[i]);
  }
  uint32_t t = us;
  Serial.print("Time for non-DMA transfer: ");Serial.print(t);Serial.println("us");
  SPI.endTransaction();
  compareBuffers(src, (const uint8_t*)dest);
  
// ** ------------------------------------------------------------------------------- ** //

  waitForKeyPress();

  SPI.setClockDivider(2);
  DMASPI0.begin();
  DMASPI0.start();


  DmaSpi::Transfer trx(nullptr, 0, nullptr);

  Serial.println("Testing src -> dest, single transfer");
  Serial.println("--------------------------------------------------");
  trx = DmaSpi::Transfer(src, DMASIZE, dest);
  clrDest((uint8_t*)dest);
  DMASPI0.registerTransfer(trx);
  Serial.println("Finished DMA transfer");

  if (DMASPI0.stopped())
  {
    Serial.println("DMA SPI stopped.");
  }
  else
  {
    Serial.println("DMA SPI is still running");
  }
  DMASPI0.stop();
  if (DMASPI0.stopped())
  {
    Serial.println("DMA SPI stopped.");
  }
  else
  {
    Serial.println("DMA SPI is still running");
  }

  DMASPI0.end();
  SPI.end();
  pinMode(LED_BUILTIN, OUTPUT);
}

void loop()
{
  digitalWriteFast(LED_BUILTIN, true);
  delay(500);
  digitalWriteFast(LED_BUILTIN, false);
  delay(500);
}

So far its working well but as can be seen in the pictures below, in the green marked areas, there is a gap between every sent byte. Left is withOUT DMA and right is WITH DMA. So its almost the same behavior except that the gap gets smaller with DMA usage.

Where does it come from and can i get rid of it?

Teensy without DMA.pngTeensy w DMA.png

Thanks.
 
There was another recent thread talking about delays trying to use two SPI busses (https://forum.pjrc.com/threads/4105...are-SPI-busses?p=128428&viewfull=1#post128428), where I talked about some of the issues.

I will leave it to others who use DMA to talk more on that subject.

In your case you are using the main SPI buss, which has a read and write FIFO queue of 4 entries. The problem is you are using code that transfers one byte at a time:
Code:
  for (size_t i = 0; i < DMASIZE; i++)
  {
    dest[i] = SPI.transfer(src[i]);
  }
This more or less short circuits the transfer queues and has to wait for each byte to be transferred, grabs the byte stores it away before it can start the transfer of the next byte.

There is another form of the transfer which should work faster, where you call SPI.transfer(buf, cnt);
and it sends the bytes out from the buf and also updates buf with the result.

However current SPI library code, does not help here as it internally does what your loop does.

However there is an outstanding Pull request(https://github.com/PaulStoffregen/SPI/pull/23), that makes use of the queues to speeds things up.
It also cheats and packs up two bytes at a time into the queue. This was talked about in the thread: https://forum.pjrc.com/threads/4062...ng-as-expected?p=126584&viewfull=1#post126584

Note: there will still be a gap between groups, but a whole lot less of a gap.

Kurt
 
Thanks Kurt for the explanation. I tried also SPI.transfer(buf, cnt); which indeed does the same as my loop before and therefore produces the same gaps.
But why they get smaller when DMA is used is still not clear. I hoped to be able to use SPI with DMA control.

In the mean time there came up some other issues in my application. I will switch to the "Project Guidance" to make sure that teensy is able to do what i need.

Btw: the application works on a ATMEL system, but i want to port it on a much smaller sized board.
 
For the fun of it you might try getting the SPI library from my fork/branch: https://github.com/KurtE/SPI/tree/Faster_transfer_buf

And try using the transfer of a buffer.

PAUL: I put another update into my PR: https://github.com/PaulStoffregen/SPI/pull/23

With this update, I have the transfer act like the ILI9341_t3 library and use the SPI_PUSHR_CONT flag for all of the transfers, except for the last word, which does not use it.

Code:
-		KINETISK_SPI0.PUSHR = w | SPI_PUSHR_CTAS(1);
+		if (count == 2)
+			KINETISK_SPI0.PUSHR = w | SPI_PUSHR_CTAS(1);
+		else	
+			KINETISK_SPI0.PUSHR = w | SPI_PUSHR_CONT | SPI_PUSHR_CTAS(1);

With this it looks like the gaps are pretty much removed:
screenshot.jpg
 
This is great, however I think it will only work if the number of bytes to transfer is a multiple of 2? I have the situation where I need to send 24 bit words to the SPI. I currently do this using an 8 bit and an 16 bit transfer, with a SPI_PUSHR_CONT on the first transfer. I am still getting a slight delay between bytes. Any thoughts on how to close up this gap?

Note, I'm using an arbitrary CS pin as I have other steps necessary after the CS is asserted before I can actually send SPI data.

Code:
void send24(uint8_t cs_pin, uint32_t value) {
                // clear any data in RX/TX FIFOs, and be certain we are in master mode.
                SPI0_MCR = SPI_MCR_MSTR | SPI_MCR_CLR_RXF | SPI_MCR_CLR_TXF | SPI_MCR_PCSIS(0x1F);
                SPI0_SR = SPI_SR_TCF;
                AssertCS(cs_pin,0);
                SPI0_PUSHR = SPI_PUSHR_CONT | SPI_PUSHR_CTAS(0) | ((value >>16) & 0xFF) ;
                SPI0_PUSHR = SPI_PUSHR_EOQ | SPI_PUSHR_CTAS(1) | (uint16_t) value;

                while((SPI0_SR & SPI_SR_EOQF) == 0);
                SPI0_SR = SPI_SR_EOQF;
                AssertCS(cs_pin,1);
        }
}
 
The transfer buf handles any number of bytes. Currently I did not put the CONT on the first byte for odd byte transfers, but will. But it should do more or less the same as your send24
 
Sounds good, I pushed up version that checked for odd bytes and if count> 1 output the PUSHR_CONT

Note: with your above send24, if you are calling this multiple times, there will be gaps as your code first clears queue, then pushes two items, then waits for the last item to be completed before returning...
 
I do call this function multiple times and so yes, there are gaps between repeated calls. I'm not sure how to reduce this though as I don't want to call the function again and clear the queue prior to the last byte being sent. Is there a better way?
 
You could either simply pack them into a byte array and then call my version of the SPI library. Hopefully it will make it into next beta for next release...

Or you could setup a set of functions or parameters, to do this for you. But of course some of this may depend on things like do you need, the CS to assert/deassert around each 3 bytes or can it stay asserted through group.

Maybe something like:
Code:
// Out of ili9341_t3 library
	void waitFifoNotFull(void) {
		uint32_t sr;
		uint32_t tmp __attribute__((unused));
		do {
			sr = KINETISK_SPI0.SR;
			if (sr & 0xF0) tmp = KINETISK_SPI0.POPR;  // drain RX FIFO
		} while ((sr & (15 << 12)) > (3 << 12));
	}

void send24(uint8_t cs_pin, uint32_t value, uint8_t  options) {
                // clear any data in RX/TX FIFOs, and be certain we are in master mode.
                if (options & 1) {
                        // Not sure if you need to do MCR and SR like this...
                        SPI0_MCR = SPI_MCR_MSTR | SPI_MCR_CLR_RXF | SPI_MCR_CLR_TXF | SPI_MCR_PCSIS(0x1F);
                        SPI0_SR = SPI_SR_TCF;
                        AssertCS(cs_pin,0);
                }
                SPI0_PUSHR = SPI_PUSHR_CONT | SPI_PUSHR_CTAS(0) | ((value >>16) & 0xFF) ;
                waitFifoNotFull();
                if (options & 2) {
                        SPI0_PUSHR = SPI_PUSHR_EOQ | SPI_PUSHR_CTAS(1) | (uint16_t) value;
		        uint32_t tmp __attribute__((unused));

                        while((SPI0_SR & SPI_SR_EOQF) == 0) {
                                // Maybe should test SR & 0xF0 to call popr, but ILI9341_t3 waitTransmitComplete does not...
                                tmp = KINETISK_SPI0.POPR;  // drain RX FIFO
                        }
                        SPI0_SR = SPI_SR_EOQF;
                        AssertCS(cs_pin,1);
                } else {
                        SPI0_PUSHR = SPI_PUSHR_CONT | SPI_PUSHR_CTAS(1) | (uint16_t) value;
                        waitFifoNotFull();
                }
        }
}
Warning typed in changes on the fly so may be issues...
But the idea is you pass in flag=1 for the first entry to setup CS. Optionally could split this part out as separate call... Then flag=2 if this is the last item, which then waits for the last stuff to transfer before changing CS pin

Hope this makes sense
 
Thanks a lot, KurtE, I'll take a look at this. I do need the CS to deassert itself every write, but I'll have to confirm this. I'm interfacing with some hardware that fills a 1024 event (24 bits event) FIFO and empties it via SPI. I'll give this a try regardless and see how it compares. I think I understand it enough to fill in any missing bits. Thanks for the help.
 
Another Note: if you are going to this extent to use SPI registers and you choose a CS which is one of the CS pins, I would be tempted to try actually using the CS pin as part of the pushr. Especially TRUE if you need the CS pin to change for each 3 byte write...

Again I don't know your setup and how many other SPI outputs you are doing here.

Again if it were me, I would use the SPI.beginTransaction()... to initialize everything and SPI.endTranaction(), just because.
Then somewhere in your init code you would do something like: pcs = SPI.setCS(cs_pin);

Then I think the above function could be something like:
Code:
void send24(uint8_t cs_pin, uint32_t value, uint8_t pcs, uint8_t  last) {
                SPI0_PUSHR = SPI_PUSHR_CONT | SPI_PUSHR_CTAS(0) | (pcs<< 16) | ((value >>16) & 0xFF) ;
                waitFifoNotFull();
                if (options & 2) {
                        SPI0_PUSHR = SPI_PUSHR_EOQ | SPI_PUSHR_CTAS(1) |  | (pcs << 16) | (uint16_t) value;
		        uint32_t tmp __attribute__((unused));

                        while((SPI0_SR & SPI_SR_EOQF) == 0) {
                                // Maybe should test SR & 0xF0 to call popr, but ILI9341_t3 waitTransmitComplete does not...
                                tmp = KINETISK_SPI0.POPR;  // drain RX FIFO
                        }
                        SPI0_SR = SPI_SR_EOQF;
                } else {
                        SPI0_PUSHR = SPI_PUSHR_CTAS(1) | (pcs <<16) | (uint16_t) value;
                        waitFifoNotFull();
                }
        }
}
Note: on the 16 byte ones I don't pass the CONT as you wish for it to deassert the CS pin. Which will give some gap between the groups, as per how the configuration of the master register controls how much of a gap... But what it will do is allow you to keep calling it for each of the items, and have the SPI control turning on and off CS, so you don't have to drain the queue for each one...
 
Thanks, this should be fine when all I need to do it write out data. However, at other points in my code (not shown here), I have to write out blank data, but read back the MISO data. Again, your suggestion would work, however, the problem with using the SPI CS is that I need to check the status of the MISO pin before I actually read it. That is, the device will signal an error on the MISO pin prior to the first CLK out. So I have to read the status of the MISO and decide whether to continue the write/read. And, I can only do that AFTER the CS is asserted. Using the SPI CS would assume sending the full 24 bit frame right after the statement. This would normally be a good thing and I agree, would reduce gaps, both I don't see how I can read the MISO status when asserting CS but not actually writing out the frame.

I realize that I am probably getting nit picky reducing the gaps between the 24 bit frame when I have other gaps between frames that are more significant. However, I am trying to read upwards of several hundred thousand frames per second so every little bit matters. Thanks again for all your help, I'll see if these suggestions will help.
 
Status
Not open for further replies.
Back
Top