How best to manage multiple SPI busses?

Status
Not open for further replies.
Recently I have been posting on the Arduino Developers email list, hoping to come up with a proposal for minor enhancements to the SPI interface that would let users more easily get more throughput on the SPI bus... As always this has been interesting.

I have also experimented with SPI on a few different platforms to again get a better idea.

More on that later.

Right now I am playing with SPI on the Teensy LC. I was curious whether it makes sense to have the buffer transfers use 16-bit writes when possible, like I did for the 3.x boards. And it might. With 8-bit writes per entry I am getting something like a 0.24us gap between bytes (with the double buffering working). With 16-bit writes I still get around a 0.24us gap between words, and something in the range of 0.16-0.2us between the bytes of a word. So it gains a little.

With a quick and dirty test that output 128 bytes I was seeing:
If I do simple one-byte outputs, SPI.transfer(buffer[i]): it took about 300us to output the buffer
If I use SPI.transfer16((buffer[i] << 8) | buffer[i + 1]): 270us
If I use the current SPI.transfer(buffer, 128): 200us
SPI.transfer(buffer, 128) using 16-bit writes: 188us
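
To compare these timings with the SPI clock rate, it can help to convert them to an effective bit rate. A minimal sketch (the helper name is made up for illustration): with the figures above it works out to roughly 3.4, 3.8, 5.1 and 5.4 Mbit/s for the four cases.

```cpp
#include <cstdint>

// Effective throughput for `bytes` bytes sent in `micros` microseconds.
// Bits per microsecond is numerically the same as Mbit/s.
double effective_mbit_per_s(uint32_t bytes, double micros) {
    return (bytes * 8.0) / micros;
}
```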

What I am trying to see next: does it make sense to try to use the (maybe 64-bit) FIFO on SPI on the LC? Has anyone tried this? Does this chip support it?


Trying it in test program, like:
Code:
void transfer_lc_fifo(const void * buf, void * retbuf, uint32_t count) {
  if (count == 0) return;
  const uint8_t *p = (const uint8_t *)buf;
  uint8_t *pret = (uint8_t *)retbuf;
  uint8_t in;
  uint32_t count_in = count; 

  Serial.println("Try fifo");
  Serial.printf("%x %x %x \n", SPI0_S, SPI0_C1, SPI0_C2); Serial.flush();
  Serial.printf("%x %x\n", SPI0_C3, SPI0_CI); Serial.flush();
  KINETISL_SPI0.C3 |= SPI_C3_FIFOMODE;
  Serial.println("fifo turned on"); Serial.flush(); 
  KINETISL_SPI0.S;       // Read in status

  while (count_in) {
    if (count && !(KINETISL_SPI0.S & SPI_S_TXFULLF)) { 
      // we have more characters to output and fifo not full
      KINETISL_SPI0.DL = p? *p++ : 0;   
      count--;
    }
    while (!(KINETISL_SPI0.S & SPI_S_RFIFOEF)) {
      // RX FIFO is not empty, so there is data available to extract
      in = KINETISL_SPI0.DL; 
      if (pret) *pret++ = in;
      count_in--;  
    }
  }
  KINETISL_SPI0.C3 = 0;  // turn off FIFO
  KINETISL_SPI0.S;       // Read in status
}
And when I call this function, the "Try fifo" message shows up in the debug terminal. The printing of S, C1 and C2 works (20, 50, 0).
The printing of C3 and CI does not work. Before I added these register prints, the attempt to turn on FIFO mode (setting C3) appeared to die, since the print after it never showed up.
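
A side note on flag tests in polling loops like the one above: `!` binds tighter than `&`, so `!reg & FLAG` is parsed as `(!reg) & FLAG`. The sketch below shows the difference on plain integers; the flag value is an assumption for illustration (check kinetis.h for the real definition), and both function names are made up.

```cpp
#include <cstdint>

// Assumed bit position of the "RX FIFO empty" flag; illustration only.
constexpr uint8_t RX_FIFO_EMPTY_FLAG = 0x01;

// What `!status & FLAG` actually computes: (!status) & FLAG.
// This is only true when the whole status byte is zero.
bool rx_has_data_unparenthesized(uint8_t status) {
    return ((!status) & RX_FIFO_EMPTY_FLAG) != 0;
}

// Intended test: the "empty" flag is clear, so data is waiting.
bool rx_has_data(uint8_t status) {
    return !(status & RX_FIFO_EMPTY_FLAG);
}
```

With a status of 0x20 (some other flag set, empty flag clear) the parenthesized form reports data waiting while the unparenthesized form does not, so a drain loop written the second way would never read anything.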

So again wonder if anyone has used the FIFO? I am mainly looking at fifo if I attempt to add an async version of transfer as suggested on the email list:
SPI.transfer(txbuf, rxbuf, cnt, callback)
In these cases it would be nice to be able to load up the queues and minimize interrupts.

Kurt
 
Thanks, That is what I wondered. Where in the manual(s)? I was looking but for some reason I missed it... Will look again.

Thanks again!

Update: Still did not find it in documents. But experimented using SPI1 and have the fifo queue working, with the test case.

It was also good for testing some more issues with the main point of this thread, which is to have a version of the SPI library in which all of the objects are of the class SPIClass. I ran into an issue where SPI1 was not using the right br value out of the settings, which I have now fixed.
 
Thanks,
I somehow missed that section in the manual :eek: ... (Again)

I mainly looked through chapter 37 and then the other TLC manual about electrical... What was interesting was that the memory map in section 37.3 showed addresses for SPI0_CI and SPI0_C3; however, if you touch those addresses the CPU faults...

Thanks again.

So the question will be: if we add async SPI support to the main library, as suggested by those who are working on the STM32 SPI library, should I first attempt to do it using FIFO interrupts, or should it be set up to use DMA under the hood?

Should there be maximum sizes allowed for these? For example, when doing the DMA stuff for the ILI9341 code (some of which I borrowed from Frank's code), I think each DMASetting was set up to do a maximum of a 64K transfer; not sure if this is a hardware limit or not...

Back to playing and thanks again!
 
So I thought I would now try adding the ASYNC support. First thought is to try to get DMA to work. I probably should start with the T3.x boards as I know them a bit better, but I started with T-LC

So I wrote a quick and dirty prototype of function, and I tried specifically for SPI1. And when my code tries to initialize the DMA stuff it faults/hangs very early on...

Code:
//=========================================================================
// Try Transfer using DMA.
//=========================================================================
DMAChannel   *_dmaTX = NULL;
DMAChannel    *_dmaRX = NULL;
uint8_t      _dma_state = 0;
uint8_t      _dma_dummy_tx = 0;
uint8_t      _dma_dummy_rx;
void (*_dma_callback)();

void _dma_rxISR(void) {
  Serial.println("_dma_rxISR");
  _dmaRX->clearInterrupt();
  KINETISL_SPI1.C2 = 0;
  _dmaTX->clearComplete();
  _dmaRX->clearComplete();

  _dma_state = 1;   // set back to 1 in case our call wants to start up dma again
  if (_dma_callback)
    (*_dma_callback)();

}

bool transfer_lc_dma(const void * buf, void * retbuf, uint32_t count, void(*callback)(void)) {

  if (!_dma_state) {
    Serial.println("First dma call");

    _dmaTX = new DMAChannel();
    _dmaTX->destination(KINETISL_SPI1.DL);
//    _dmaTX->destination(_dma_dummy_tx);
    Serial.println("TAD"); Serial.flush();
    _dmaTX->triggerAtHardwareEvent(DMAMUX_SOURCE_SPI1_TX);
    Serial.println("TAT"); Serial.flush();
    _dmaTX->disableOnCompletion();
    Serial.println("TDOC"); Serial.flush();

    _dmaTX->disable();
    Serial.println("TDEST"); Serial.flush();

    _dmaRX = new DMAChannel();
    _dmaRX->disable();
    Serial.println("RDIS"); Serial.flush();
    _dmaRX->source(KINETISL_SPI1.DL);
    Serial.println("RDEST"); Serial.flush();
    _dmaRX->disableOnCompletion();
    _dmaRX->triggerAtHardwareEvent(DMAMUX_SOURCE_SPI1_RX);
    _dmaRX->attachInterrupt(_dma_rxISR);
    _dmaRX->interruptAtCompletion();

    _dma_state = 1;  // Should be first thing set!
    Serial.println("end First dma call");
  }
  if (_dma_state == 2)
    return false; // already active
  // Now handle NULL pointers.
  if (buf) {
    _dmaTX->sourceBuffer((uint8_t*)buf, count);
  } else {
    _dmaTX->source(_dma_dummy_tx);   // maybe have setable value
    _dmaTX->transferCount(count);
  }

  if (retbuf) {
    _dmaRX->destinationBuffer((uint8_t*)retbuf, count);
  } else {
    _dmaRX->destination(_dma_dummy_rx);    // NULL ?
    _dmaRX->transferCount(count);
  }
  _dma_callback = callback;

  // Now try to start it?
  // Setup DMA main object
  //Serial.println("Setup _dmatx");
  Serial.println("Before DMA C1");
  KINETISL_SPI1.C1 &= ~(SPI_C1_SPE);
  Serial.println("Before DMA C2");
  KINETISL_SPI1.C2 |= SPI_C2_TXDMAE | SPI_C2_RXDMAE;
  Serial.println("Before RX enable");
  _dmaRX->enable();
  Serial.println("Before TX enable");
  _dmaTX->enable();
  _dma_state = 2;
  Serial.println("DMA end of call");
  return true;
}
The code dies in the call: _dmaTX->destination(KINETISL_SPI1.DL);

My looking through the DMAChannel code looks like it should properly set access to these parts of memory.

So I hacked up the DMA channel code and tried to verify where it was dying. I also rearranged the code to see if I could at least touch the CFG...
Currently looks like:
Code:
void destination(volatile signed char &p) { destination(*(volatile uint8_t *)&p); }
	void destination(volatile unsigned char &p) {
		Serial.printf("D: %x %x\n", (uint32_t)&p, (uint32_t)CFG); Serial.flush();
		Serial.printf("%x %x %x %x\n\r", (uint32_t)CFG->SAR, (uint32_t)CFG->DAR, CFG->DSR_BCR, CFG->DCR); Serial.flush();
		CFG->DCR = (CFG->DCR & 0xF0F0F0FF) | DMA_DCR_DSIZE(1);
		Serial.println("set DCR"); Serial.flush();
		CFG->DAR = (void*)&p;
		Serial.println("set DAR"); Serial.flush();
And my output in debug terminal:
Code:
First dma call
D: 40077006 40008100
0 0 0 20000000

set DCR
So it is dying on the line: CFG->DAR = (void*)&p;
The addresses look valid to me: 40077006 is the address of the DL register and 40008100 is the address of the first DMA channel.

Suggestions?

I then tried running the DMASPI library example program for SPI0 as well as a copy of it where I tried on SPI1. They also both appear to die at the same place. Potentially it might be my updated SPI library? I did have to slightly modify the Dmaspi library as both SPI and SPI1 use the same class.
I did verify that the Example would run on a T3.6...

Question: Has anyone tried DMA SPI on T-LC lately? I probably should pull out a secondary dev machine and see if I can get it to work with currently released 1.8.2 Arduino with released SPI and current Teensyduino.

But again, I would really welcome suggestions... I will probably adapt this test to the T3.x boards while hopefully getting some hints.

Thanks!
 
Update: Still curious about T-LC in previous post, but right now playing with a 3.x version and making progress.

It is actually going through and outputting and calling my callback. Which is good. Will then add it to my main SPI implementation for Async support.

There are a couple of issues I am trying to figure out, what makes sense:

T3.x
a) If your last SPI transfer used 16-bit mode, your transfer will continue in 16-bit mode. That is because you are only touching the low word of the PUSHR register, so it keeps what was previously in the high word, which has things like the CS settings, the CONT flag, and the setting for 8-bit or 16-bit transfers. This is not unique to my test code; for example, it also shows up in the DMASPI test program:
Code:
  DMASPI0.begin();
  DMASPI0.start();

  // Wonder what happens if I do a SPI.transfer16()
  SPI.transfer16(0xffff);


  DmaSpi::Transfer trx(nullptr, 0, nullptr);

  Serial.println("Testing src -> dest, single transfer");
  Serial.println("--------------------------------------------------");
  trx = DmaSpi::Transfer(src, DMASIZE, dest);
  clrDest((uint8_t*)dest);
  DMASPI0.registerTransfer(trx);
  while(trx.busy())
  {
  }
I added the transfer16 and all of the writes after this point were 16 bits (a zero byte was added to each transfer)

How to fix?
1) Punt - Maybe ok in some cases, but ... I would not like to in generic SPI library.
2) Push/pop the first byte without DMA, such that it sets up the proper state. I did this with my DMA code for ili9341_t3n. Works OK in that specific case, but lots of stuff to figure out.
3) Maybe create a DMASettings chain on the TX where the first item is set up to write 32 bits to PUSHR with the proper first byte plus settings, and then chains to a 2nd item that does the rest. Several cases to think about - But...

b) T3.5 - Can not use DMA support on T3.5 for SPI1/SPI2... Could do using interrupts, we have fifo, but only one item deep, so won't be much of a win.
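
To make issue (a) concrete, here is a small host-side model of the behavior as described above: a narrow write replaces only the low half of PUSHR and leaves the command bits in the high half alone. The field encodings are what I understand kinetis.h to define, so treat them as assumptions; `PushrModel` is a made-up name and this is not register-accurate.

```cpp
#include <cstdint>

// Field encodings assumed from kinetis.h; verify before relying on them.
constexpr uint32_t PUSHR_CONT = 0x80000000u;
constexpr uint32_t pushr_ctas(uint32_t n) { return (n & 7u) << 28; }

// Model of the described behavior, not of the real peripheral.
struct PushrModel {
    uint32_t value = 0;

    // Full 32-bit write: command bits and data are both replaced.
    void write32(uint32_t v) { value = v; }

    // Narrow write (what a byte-wide DMA does): only the data half changes.
    void write_low(uint16_t data) { value = (value & 0xFFFF0000u) | data; }

    // Which CTAR (frame size setting) the next frame would use.
    uint32_t frame_select() const { return (value >> 28) & 7u; }
};
```

In this model, a `write32(pushr_ctas(1) | 0xFFFF)` followed by `write_low(0x12)` still selects CTAR1, which is why options 2 and 3 both start with one full 32-bit write.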

Well now back to playing around

Again suggestions hoped for.
 
Thanks tni,

Thanks that is helping on the LC.

I am still trying to figure some of the stuff out... It is outputting something, but not fully the right thing, and as such it is not getting to the end-of-transfer stuff.
In particular, when I look at the output using a logic analyzer (LA), I see it output bytes like:
00 80 01 81 02 82... when the array was initialized to 00 01 02 ... 7f. So I am guessing that some of my/DMAChannel stuff is not setting up sizes properly.
 
Hopefully new version of SPI library - Async support

Thought I would mention that I updated my version of the SPI library (https://github.com/KurtE/SPI/tree/SPI-Multi-one-class)

With this update I believe I now have some DMA async support added for the LC, plus a few fixes for the 3.x version of the DMA async support.

There are still a few more issues I need to iron out. But a summary of what is in this version of the library.

All of the SPI objects are derived from SPIClass (like many of the other implementations). I did it similarly to what Paul did for Wire and have a hardware table created for the differences between objects.

So hopefully in the future we can have more libraries updated to handle multiple SPI busses without having to do a bunch of work.
To help with this I added members that work with the same table we would use in, let's say, setMISO.
I added another function, pinIsMISO, so you can verify pins for all of the different busses and boards.


The Transfer methods have been enhanced: Still have:
transfer(b)
transfer16(w)
transfer(buf, cnt) - It overwrites your buff.

New stuff:
transfer(txbuf, rxbuf, cnt) // separate rx and tx either can be null.

(Added member transferFillTXChar to set the TX char if txbuf is null; default 0.)

Async support - as suggested on email list
bool transfer(const void *txBuffer, void *rxBuffer, size_t count, void(*callback)(void));
void flush(void);
bool done(void);

Currently the Async is implemented using DMA. Have version for LC and version for 3.x
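
As a rough illustration of how this API is meant to be used, here is a host-side mock (not the library class; every name in it is hypothetical). It mirrors two behaviors from the prototype earlier in the thread: a transfer request returns false while another one is active, and the busy state is cleared before the callback runs so the callback can start the next transfer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an async-capable SPI object.
class MockAsyncSPI {
public:
    using Callback = void (*)();

    // Starts a transfer; returns false if one is already in flight.
    bool transfer(const void* tx, void* rx, size_t count, Callback cb) {
        if (busy_) return false;
        (void)tx; (void)rx; (void)count;  // a real driver would hand these to DMA
        callback_ = cb;
        busy_ = true;
        return true;
    }

    bool done() const { return !busy_; }

    // Simulates the DMA-complete interrupt.
    void simulate_completion() {
        if (!busy_) return;
        busy_ = false;               // clear first so the callback may restart
        if (callback_) callback_();
    }

private:
    Callback callback_ = nullptr;
    bool busy_ = false;
};

static int g_completed = 0;
static void on_done() { ++g_completed; }
```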

Issues to address:
On 3.x - Currently assuming 1 byte transfers. Issue if last PUSHR set special things like 16 bit mode or CS pins or... Need to address. I think I may try some/all of:
a) Detect I am in 16 bit mode and maybe detect that I am doing an even count output, then continue in 16 bit mode.
b) Switch to 8-bit mode. Maybe have a two-item DMASettings chain where the first byte is output with a 4-byte write, which fully sets the PUSHR register, and then chain it to a second item that does the rest.
c) Have the Async output first char sync if necessary...

Teensy 3.5 - SPI1/SPI2 - Don't have unique DMA TX/RX setup. Not sure if they work in the Read only or Write only case. Things to try.
a) Punt - have it not work on these
b) See if it works for only Read or only Write?
c) Implement instead using FIFO interrupts - But again these queues are only 1 entry in size, so would need to do interrupt for each input or output.

Another thing I want to decide on, is I had a suggested implementation for the Teensy LC for the Transfer(txbuf, rxbuf, cnt), that broke up the code into three separate sections, READ, WRITE, TRANSFER, as to avoid an if or two in the loop. They also special cased the instances where count = 1 or count = 2 and did special code for these cases.

That is the current code looks like:
Code:
void SPIClass::transfer(const void * buf, void * retbuf, uint32_t count) {
	if (count == 0) return;
	const uint8_t *p = (const uint8_t *)buf;
	uint8_t *pret = (uint8_t *)retbuf;
	uint8_t in;

	while (!(port.S & SPI_S_SPTEF)) ; // wait
	uint8_t out = p ? *p++ : _transferFillTXChar;
	port.DL = out;
	while (--count > 0) {
		if (p) {	// first "if" in question (red in the original post)
			out = *p++;
		}
		while (!(port.S & SPI_S_SPTEF)) ; // wait
		__disable_irq();
		port.DL = out;
		while (!(port.S & SPI_S_SPRF)) ; // wait
		in = port.DL;
		__enable_irq();
		if (pret) *pret++ = in;	// second "if" in question (red in the original post)
	}
	while (!(port.S & SPI_S_SPRF)) ; // wait
	in = port.DL;
	if (pret)*pret = in;
}
(Side note to myself - Does it really need to disable interrupts here?)

What I am wondering is does it make enough sense to have three copies of the above code, to remove those two if statements in red for the three different versions?

But warning: On Teensy LC, you need the patch to get DMA to work with current release
 
Another thing I want to decide on, is I had a suggested implementation for the Teensy LC for the Transfer(txbuf, rxbuf, cnt), that broke up the code into three separate sections, READ, WRITE, TRANSFER, as to avoid an if or two in the loop.
This is worth it. It does make a significant performance difference.

They also special cased the instances where count = 1 or count = 2 and did special code for these cases.
Probably a very rare use case.

That is the current code looks like:
Code:
void SPIClass::transfer(const void * buf, void * retbuf, uint32_t count) {
	if (count == 0) return;
	const uint8_t *p = (const uint8_t *)buf;
	uint8_t *pret = (uint8_t *)retbuf;
	uint8_t in;

	while (!(port.S & SPI_S_SPTEF)) ; // wait
	uint8_t out = p ? *p++ : _transferFillTXChar;
	port.DL = out;
	while (--count > 0) {
		if (p) {	// first "if" in question (red in the original post)
			out = *p++;
		}
		while (!(port.S & SPI_S_SPTEF)) ; // wait
		__disable_irq();
		port.DL = out;
		while (!(port.S & SPI_S_SPRF)) ; // wait
		in = port.DL;
		__enable_irq();
		if (pret) *pret++ = in;	// second "if" in question (red in the original post)
	}
	while (!(port.S & SPI_S_SPRF)) ; // wait
	in = port.DL;
	if (pret)*pret = in;
}
(Side note to myself - Does it really need to disable interrupts here?)
You do. You have 2 transmit bytes scheduled, one in the output shift register and one in port.DL. If you get interrupted, you lose a received byte (only one can be stored). transfer() hangs when a byte is lost.
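
A toy model of that failure, assuming a receiver with a single data register that drops a newly arrived byte when the previous one has not been read yet (an assumption for illustration, not a register-accurate description of the LC's SPI; the struct name is made up):

```cpp
#include <cstdint>

// Toy receiver with room for exactly one unread byte.
struct SingleByteRx {
    bool full = false;
    uint8_t data = 0;
    unsigned lost = 0;

    // A byte finishes shifting in. If the previous byte was never read,
    // this model drops the new one and counts it as lost.
    void byte_arrives(uint8_t b) {
        if (full) { ++lost; return; }
        data = b;
        full = true;
    }

    // Reading the data register frees it for the next byte.
    uint8_t read() { full = false; return data; }
};
```

If two bytes arrive between reads, only one "byte ready" event is ever seen for them, so a loop that waits for one received byte per transmitted byte ends up waiting forever for the missing one.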

What I am wondering is does it make enough sense to have three copies of the above code, to remove those two if statements in red for the three different versions?
Yes. You should also remove the entire
Code:
		while (!(port.S & SPI_S_SPRF)) ; // wait
		in = port.DL;
		__enable_irq();
		if (pret)*pret++ = in;
for the transmit-only case - no need to pick up the received bytes. Just empty port.DL at the very end.
 
Thanks TNI,

Makes sense for the disabling interrupts.

As for the write only case, yes it could remove the stuff you mentioned. I assume also the disable_irq...
Logically Something in the nature of:
Code:
void SPIClass::write(const void * buf, uint32_t count) {
	if (count == 0) return;  // the while loop alone would handle 0, but the final wait for the last character to transmit would hang
	const uint8_t *p = (const uint8_t *)buf;
	while (count-- > 0) {
		while (!(port.S & SPI_S_SPTEF)) ; // wait
		port.DL = *p++;
	}
	// Now need to somehow wait until the last character was output and then read in DL to clear.
}
But then need some way to synchronize the return from this function until the last bits have been output. The version that was suggested still had code in the main loop waiting for each character to be received. It also did not handle the interrupt case you mentioned.

So for now I will probably keep it as one version. But may play more later.

Thanks again!
 
Here is a version that is very fast without code duplication. It does 10.60 Mbit/s. (An optimal transmit-only version can do 11.3 Mbit/s.)

Code:
template<bool send, bool receive> __attribute__((always_inline)) 
void SPIClass::transfer_(const uint8_t* send_ptr, uint8_t* recv_ptr, uint32_t count) {
    const uint8_t _transferFillTXChar = 0;
    auto& __restrict port = this->port;
    uint8_t dummy;

    while(!(port.S & SPI_S_SPTEF)) ; // wait
    port.DL = send ? *send_ptr++ : _transferFillTXChar;

    while(--count > 0) {
        while(!(port.S & SPI_S_SPTEF)) ; // wait
        __disable_irq();
        port.DL = send ? *send_ptr++ : _transferFillTXChar;
        while(!(port.S & SPI_S_SPRF)) ; // wait
        *(receive ? recv_ptr++ : &dummy) = port.DL;
        __enable_irq();
    }
    
    while(!(port.S & SPI_S_SPRF)) ; // wait
    *(receive ? recv_ptr++ : &dummy) = port.DL;
}

void SPIClass::transfer(const void* buf, void* retbuf, uint32_t count) {
    if(count == 0) return;
    const uint8_t* send_ptr = (const uint8_t*) buf;
    uint8_t* recv_ptr = (uint8_t*) retbuf;
    if(send_ptr){
        if(recv_ptr) transfer_<true, true>(send_ptr, recv_ptr, count);
        else transfer_<true, false>(send_ptr, recv_ptr, count);
    } else {
        if(recv_ptr) transfer_<false, true>(send_ptr, recv_ptr, count);
        else transfer_<false, false>(send_ptr, recv_ptr, count);
    }
}
 
Quick update:

Thought maybe better to continue in this thread instead of the Teensy 3.5 DMA SPI thread as not all specific to 3.5...
As I mentioned in the other thread, I went around in circles getting the DMA to work on the three 3.x boards, as each one behaved slightly (or majorly) differently. For example, on the T3.5, SPI1 and SPI2 each have only one DMA channel...

Today I thought I would play with the Teensyview displays, to see if I can get multiple of them to work both SYNC and ASYNC.

As I mentioned in the other thread, I wondered if it made sense to allow the Callback function to maybe have an optional parameter like maybe void* that can be passed to it, that gets passed back. I thought I would also ask on the Arduino mail list, and received a few responses that were against it....

So I did a quick and dirty version of the Teensyview display update using async. Right now I have it set up so that there are three static callback functions in the class (one for each possible SPI). The first time the object wishes to do an async update, it figures out which one to use and saves away its this pointer in a static data member, so that when the callback is called it uses the this pointer to call the appropriate function for that class object.

This has several holes... Like what if I want two of these displays on the same SPI bus and wish to do async on both? Several solutions could be done. For example, when the last DMA output completes and the callback detects the display has completed, it could clear the this pointer, and the other instance could spin waiting for this to happen and then claim the this pointer for that SPI...

Or could build a simple queue of outputs: (this, output buffer pointer, count, maybe state change info: start transaction, assert DC, unassert DC, end transaction). The code would need to be such that you do not intermix the transactions...
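
A minimal sketch of what such a queue entry and a fixed-size queue might look like. All names are hypothetical and this is not part of the SPI library; on real hardware, a queue pushed from loop() and popped from a DMA callback would also need interrupts masked around the updates.

```cpp
#include <cstddef>
#include <cstdint>

// One queued output: who asked, what to send, and what to do around it.
struct QueuedTransfer {
    void* owner;          // the display object ("this") that queued it
    const uint8_t* buf;   // bytes to send
    size_t count;         // number of bytes
    uint8_t flags;        // state changes to apply around the transfer
};

// Hypothetical state-change flags matching the list above.
enum : uint8_t { QT_BEGIN = 1, QT_DC_LOW = 2, QT_DC_HIGH = 4, QT_END = 8 };

// Fixed-capacity FIFO; entries come out in the order they went in,
// which keeps the pieces of one transaction together.
template <size_t N>
class TransferQueue {
public:
    bool push(const QueuedTransfer& t) {
        if (size_ == N) return false;            // full
        items_[(head_ + size_) % N] = t;
        ++size_;
        return true;
    }
    bool pop(QueuedTransfer& out) {
        if (size_ == 0) return false;            // empty
        out = items_[head_];
        head_ = (head_ + 1) % N;
        --size_;
        return true;
    }
    size_t size() const { return size_; }

private:
    QueuedTransfer items_[N] = {};
    size_t head_ = 0;
    size_t size_ = 0;
};
```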

Anyway, currently I have three displays hanging off my T3.6 beta board (2 TV, 1 128x64), and I have a 3rd TV that I can also connect up. The test runs through the TeensyView Screen demo program, doing each of the main sections one Teensyview at a time: it first does shapes on display0, then shapes on display1, and then shapes on display2...

This works on all three displays, each on a different bus. So the multiple busses stuff is working here.

I then added a couple of draw-screen functions modeled on the ili9341 tests (the draw rectangles and draw circles pages) and called them for all three displays without delays. Actually I alternated between the two 5 times, with a small delay before it changed all three screens again, and they all updated pretty quickly.

Then I made the two functions take a parameter to optionally do the updates using the Async code. So far it appears like it worked. Again the updates are so quick, that may be hard to see if there are any visual differences. That is with the standard way, how much of a gap in time after I display the first screen before the next screen updates...

Tomorrow I may hack in a way to handle multiple displays async on the same buss... And Try the 4th display on the processor. Also may move the code to simpler test app, that includes more of the ili9341 graphic test like tests. Just not sure how many of the tests make sense on a monochrome display.

If anyone is interested could also upload test app and hacked up Teensyview. As I mentioned on other thread my one branch of my SPI fork was updated yesterday.
 
As I mentioned in the other thread, I wondered if it made sense to allow the Callback function to maybe have an optional parameter like maybe void* that can be passed to it, that gets passed back.
That is a really good idea. The current Arduino situation with stateless callbacks really sucks. E.g. look at the mess of the Encoder library (Encoder.h) and the hoops it jumps through to get 'attach_interrupt()' to emulate a state parameter.

If you can't attach state to a callback, you can't have clean object oriented design that supports multiple instances of the same class (e.g. a Display class that uses SPI). If you have a 'void*' parameter, you can easily use that to dispatch to the correct instance.
 
Building consensus on the mail list is almost impossible. Even when things do go well, it can take a year or more of sustained effort to get any sort of substantial API adopted.
 
As I mentioned in the other thread, I wondered if it made sense to allow the Callback function to maybe have an optional parameter like maybe void* that can be passed to it, that gets passed back. I thought I would also ask on the Arduino mail list, and received a few responses that were against it....
I saw only one complaint (from Thomas Roell) in the archives and the reasoning behind that complaint is wrong. I would have posted a reply, but the developer mailing list is broken.

Thomas Roell claimed that 3 data entries would be needed for storing the callback (callback function pointer, this, data). A C-style function pointer + 'void*' context is quite enough for dispatching to an object. E.g.:

Code:
class S {
public:
    void callback();
};

void registerCallback(void (*callback_fn)(void*), void* context) {
    // ...
}

void test(S* s) {
    // ...
    registerCallback([](void* ctx){ ((S*) ctx)->callback();  }, s);
}

For dispatching to an object member with parameter(s), the context could be a struct containing the necessary data.

Glancing at the STM32L4 core, the PendSV dispatching that Thomas mentioned does support a 'void*' parameter for callbacks.
 
That would be very useful indeed. I'm currently storing a pointer to "s" in a global variable to get the same effect but that is kind of ugly. With the proposed default value of nullptr it should not break any existing code.
 
With a sufficiently smart compiler and sufficient template magic, you should be able to create a wrapper that "transparently" allows you to use either SPI object with whatever library, and not pay any indirection overhead.
Of course, the drawback being that the library needs to be updated to support this.
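
A minimal sketch of that idea, with two mock bus types standing in for SPI objects of different classes (all names here are hypothetical). Because the bus type is a template parameter, the call to transfer() is resolved at compile time, with no virtual dispatch.

```cpp
#include <cstddef>
#include <cstdint>

// Two unrelated mock bus types with the same transfer() shape.
struct MockBusA {
    size_t bytes = 0;
    uint8_t transfer(uint8_t b) { ++bytes; return b; }
};
struct MockBusB {
    size_t bytes = 0;
    uint8_t transfer(uint8_t b) { ++bytes; return static_cast<uint8_t>(~b); }
};

// Hypothetical driver: works with any type that has transfer(uint8_t).
template <typename Bus>
class DisplayDriver {
public:
    explicit DisplayDriver(Bus& bus) : bus_(bus) {}

    // Sends n bytes over whichever bus this instance was built with.
    void send(const uint8_t* data, size_t n) {
        for (size_t i = 0; i < n; ++i) bus_.transfer(data[i]);
    }

private:
    Bus& bus_;
};
```

The cost is the one mentioned above: the library itself has to be written as a template (or provide one), so existing libraries would need updating.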
 
Thanks guys,

@tni - have you tried email to the address mentioned toward the end of the arduino forum posting you mentioned? Wonder if it worked?

@jwatte - Maybe at some point I should play more with templates... They are something that I have never used as it was not part of C back when I first learned (ice age) ;)


I meant to respond yesterday, but was busy outside, and am now playing with my Teensyview code to test out the async SPI code, trying to figure out why things are not responding the way I would expect... I need to test it in a simple test case and see what happens, in particular setting up a new async transfer within the callback, and also handling multiple SPI transfers at the same time.

Something like:
Code:
uint8_t buffer1_1[6];
uint8_t buffer1_2[512];
volatile uint8_t state1 = 0;

uint8_t buffer2_1[6];
uint8_t buffer2_2[512];
volatile uint8_t state2 = 0;

uint8_t buffer3_1[6];
uint8_t buffer3_2[512];
volatile uint8_t state3 = 0;

void callback1() {
    if (state1 == 1) {
        SPI.transfer(buffer1_2, NULL, sizeof(buffer1_2), &callback1);
        state1 = 2;
    } else {
        state1 = 0;
    }
}
// ... Same for 2 and 3, but using SPI1 and SPI2...

void loop() {
    while (state1 != 0)   ;
    state1 = 1;
    SPI.transfer(buffer1_1, NULL, sizeof(buffer1_1), &callback1);
 
    while (state2 != 0) ;
    state2 = 1;
    SPI1.transfer(buffer2_1, NULL, sizeof(buffer2_1), &callback2);

    while (state3 != 0) ;
    state3 = 1;
    SPI2.transfer(buffer3_1, NULL, sizeof(buffer3_1), &callback3);
}

So basically testing out having the callback be called and issue a new request, plus multiple DMA requests going on at the same time.

I thought I had it working earlier yesterday, until I figured out that Teensyview was doing screen updates in a more difficult way than it needed to.
That is, it was doing something like:
[header for page 0 (4 bytes)][data for page 0 (128 bytes)][header for page 1][data for page 1]....[header for page 3][data for page 3] (or page 7 for 128x64)

But looking at Adafruit SSD1306, I found you could turn on horizontal memory mode, which when you filled page 0, would automatically advance to page 1... So once I did that the logical update is simply:
[header 6 bytes][data 512 bytes] or for 128x64 - 1024 bytes...
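
For reference, a sketch of how that 6-byte header can be built for a 128-pixel-wide panel. As I understand the SSD1306 command set, 0x21 sets the column address range and 0x22 sets the page address range; the function names are made up, and the display is assumed to already be in the horizontal mode described above.

```cpp
#include <array>
#include <cstdint>

// Address-window header covering the whole screen:
// 0x21, first column, last column, 0x22, first page, last page.
std::array<uint8_t, 6> ssd1306_full_window(uint8_t height_px) {
    // Each page is 8 pixel rows, so a 32-row panel has pages 0..3.
    const uint8_t last_page = static_cast<uint8_t>(height_px / 8 - 1);
    std::array<uint8_t, 6> header = {{0x21, 0x00, 0x7F, 0x22, 0x00, last_page}};
    return header;
}

// Bytes of pixel data for one full frame (1 bit per pixel, 128 columns).
constexpr uint16_t ssd1306_frame_bytes(uint8_t height_px) {
    return static_cast<uint16_t>(128u * height_px / 8u);
}
```

For a 128x32 panel this gives a last page of 3 and 512 data bytes; for 128x64, a last page of 7 and 1024 data bytes.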

Hopefully today will figure it out... But it is sunny
 
Another quick update: I am probably running into a timing issue. I broke out the draw function on its own and tried with just one of them, and it hung after the first update... So I then ripped out all of the Teensyview code and just left in the logical draw functions, just going to use SPI... And with only one it was running... Then I enabled the second one and it hangs...

Actually it appears to have issues if the count of bytes is >= 512. It works OK with 511...

In case anyone wishes to play along. Again this uses my other branch/fork of SPI.

Code:
#include <SPI.h>
//////////////////////////////////
// TeensyView Object Declaration //
///////////////////////////////////
#define USE_SPI1
//#define USE_SPI2
// Kurt's setup
#define PIN_RESET 15
#define PIN_SCK   13
#define PIN_MOSI  11
#define PIN_DC    21
#define PIN_CS    20
// Setup 2nd one SPI1
#define PIN_RESET1 16
#define PIN_SCK1  32
#define PIN_MOSI1 0
#define PIN_DC1   31
#define PIN_CS1  30

// Pins on connector on Beta T3.6 board (3.3, GND)(48, 47)(57 56) (51 52) (53 55)
#define PIN_RESET2 48
//#define PIN_MISO2 51
#define PIN_MOSI2 52
#define PIN_SCK2  53
#define PIN_DC2   55
#define PIN_CS2  56

// This is real C..p, but a simple test to see how multiple DMA
// transfers work when called from a callback
uint8_t header1[] = {0x21, 0, 0x7f, 0x22, 00, 03};
uint8_t header2[] = {0x21, 0, 0x7f, 0x22, 00, 07};  // larger display.
uint8_t header3[] = {0x21, 0, 0x7f, 0x22, 00, 03};
#define BUFFER1_SIZE 511
#define BUFFER2_SIZE 250
#define BUFFER3_SIZE 128
uint8_t buffer1[BUFFER1_SIZE];
uint16_t buffer1_size = sizeof(buffer1);
volatile uint8_t state1 = 0;
uint8_t buffer2[BUFFER2_SIZE];
uint16_t buffer2_size = sizeof(buffer2);
volatile uint8_t state2 = 0;
uint8_t buffer3[BUFFER3_SIZE];
uint16_t buffer3_size = sizeof(buffer3);
volatile uint8_t state3 = 0;



void setup()
{
  while (!Serial && millis() < 3000); 
  Serial.begin(38400);
  SPI.begin();
  pinMode(PIN_CS, OUTPUT);
  pinMode(PIN_DC, OUTPUT);
  digitalWrite(PIN_CS, HIGH);
  digitalWrite(PIN_DC, HIGH);
#ifdef USE_SPI1
  SPI1.begin();
  pinMode(PIN_CS1, OUTPUT);
  pinMode(PIN_DC1, OUTPUT);
  digitalWrite(PIN_CS1, HIGH);
  digitalWrite(PIN_DC1, HIGH);
#else
  pinMode(PIN_CS1, OUTPUT);  // use to debug
  digitalWrite(PIN_CS1, HIGH);
#endif

#ifdef USE_SPI2
  SPI2.begin();
  pinMode(PIN_CS2, OUTPUT);
  pinMode(PIN_DC2, OUTPUT);
#endif

  delay(1000);     // Delay 1000 ms
}

void callback1() {
  if (state1 == 1) {
      state1 = 2;
      digitalWriteFast(PIN_DC, HIGH);      
      if (!SPI.transfer(buffer1, NULL, buffer1_size, &callback1))
      {
        Serial.println("SPI Transfer failed");
      }
  } else {
      state1 = 0;
      digitalWriteFast(PIN_CS, HIGH);
      SPI.endTransaction();
  }
}
#ifdef USE_SPI1

void callback2() {
  if (state2 == 1) {
      state2 = 2;
      digitalWriteFast(PIN_DC1, HIGH);      
      SPI1.transfer(buffer2, NULL, buffer2_size, &callback2);
  } else {
      state2 = 0;
      digitalWriteFast(PIN_CS1, HIGH);
      SPI1.endTransaction();
  }
}
#endif
#ifdef USE_SPI2
void callback3() {
  if (state3 == 1) {
      state3 = 2;
      digitalWriteFast(PIN_DC2, HIGH);      
      SPI2.transfer(buffer3, NULL, buffer3_size, &callback3);
  } else {
      state3 = 0;
      digitalWriteFast(PIN_CS2, HIGH);
      SPI2.endTransaction();
  }
}
#endif
uint8_t loop_counter = 0;
void loop()
{
    elapsedMillis timer;
    loop_counter++;
#ifndef USE_SPI1
    digitalWrite(PIN_CS1, !digitalRead(PIN_CS1));
#endif

    timer = 0;
    while (state1 != 0)  {
      if (timer > 10) {
        Serial.printf("Timeout SPI: %d %x %x\n", state1, header1, buffer1);
        Serial.printf("  TX: C:%d, err: %d, S: %x, D: %x\n", SPI._dmaTX->complete(), SPI._dmaTX->error(), SPI._dmaTX->sourceAddress(), SPI._dmaTX->destinationAddress());
        Serial.printf("  RX: C:%d, err: %d, S: %x, D: %x\n", SPI._dmaRX->complete(), SPI._dmaRX->error(), SPI._dmaRX->sourceAddress(), SPI._dmaRX->destinationAddress());
        Serial.println("Hit any key to continue");
        while (Serial.read() == -1) ;
        while (Serial.read() != -1) ;
       break;
      }

    }
    Serial.println("Start 1");
    state1 = 1;
    memset(buffer1, (loop_counter & 1)? 0xff : 0, buffer1_size);
    SPI.beginTransaction(SPISettings(8000000, MSBFIRST, SPI_MODE0));
    digitalWriteFast(PIN_CS, LOW);
    digitalWriteFast(PIN_DC, LOW);
    SPI.transfer(header1, NULL, sizeof(header1), &callback1);

#ifdef USE_SPI1
    timer = 0;
    while (state2 != 0)  {
      if (timer > 10) {
        Serial.printf("Timeout SPI1: %d %x %x\n", state2, header2, buffer2);
        Serial.printf("  TX: C:%d, err: %d, S: %x, D: %x\n", SPI1._dmaTX->complete(), SPI1._dmaTX->error(), SPI1._dmaTX->sourceAddress(), SPI1._dmaTX->destinationAddress());
        Serial.printf("  RX: C:%d, err: %d, S: %x, D: %x\n", SPI1._dmaRX->complete(), SPI1._dmaRX->error(), SPI1._dmaRX->sourceAddress(), SPI1._dmaRX->destinationAddress());
        Serial.println("Hit any key to continue");
        while (Serial.read() == -1) ;
        while (Serial.read() != -1) ;
       break;
      }
    }
    Serial.println("Start 2");
    state2 = 1;
    memset(buffer2, (loop_counter & 1)? 0x0 : 0xff, buffer2_size);
    SPI1.beginTransaction(SPISettings(8000000, MSBFIRST, SPI_MODE0));
    digitalWriteFast(PIN_CS1, LOW);
    digitalWriteFast(PIN_DC1, LOW);
    SPI1.transfer(header2, NULL, sizeof(header2), &callback2);
#endif
#ifdef USE_SPI2
    while (state3 != 0)   ;
    Serial.println("Start 3");
    state3 = 1;
    memset(buffer3, (loop_counter & 1)? 0xff : 0, buffer3_size);
    SPI2.beginTransaction(SPISettings(8000000, MSBFIRST, SPI_MODE0));
    digitalWriteFast(PIN_CS2, LOW);
    digitalWriteFast(PIN_DC2, LOW);
    SPI2.transfer(header3, NULL, sizeof(header3), &callback3);
#endif
}

Which is interesting, as at least a little while ago my ili9341_t3n code used DMA to output a full screen, and there I use 3 DMASettings objects:
320*240/3 = 25600 words output each, which is > 512... So I need to figure out the difference...
 
I think I figured it out: :D Issue with DMAChannel.h

I have code that looks like:
Code:
		_dmaRX->destination((uint8_t&)bit_bucket);
		_dmaRX->transferCount(count);
		rser =  SPI_RSER_RFDF_RE | SPI_RSER_RFDF_DIRS | SPI_RSER_TFFF_RE | SPI_RSER_TFFF_DIRS;
Suppose I am doing 512-byte writes for the screen memory. I output my 6-byte header, then my 512-byte buffer, and then when I repeat I output my 6-byte header again.

Now if we look at transferCount method:
Code:
	void transferCount(unsigned int len) {
		if (len > 32767) return;
		if (len >= 512) {
			TCD->BITER = len;
			TCD->CITER = len;
		} else {
			TCD->BITER = (TCD->BITER & 0xFE00) | len;
			TCD->CITER = (TCD->CITER & 0xFE00) | len;
		}
	}
I first call it with 6, and assume BITER = 0 before this:
BITER = (0 & 0xFE00) | 0x6 = 0x6

Then I call it with 0x200 (512). Since len >= 512 the first branch runs:
BITER = 0x200

Then I call it again with 0x6:
BITER = (0x200 & 0xFE00) | 0x6 = 0x206, which is wrong...

The code should maybe be something like:
Code:
	void transferCount(unsigned int len) {
		if (len > 32767) return;
		if (!(TCD->BITER & DMA_TCD_BITER_ELINK)) {
			// No channel linking: the count uses the full 15 bits
			TCD->BITER = len;
			TCD->CITER = len;
		} else {
			// Linking enabled: only the low 9 bits hold the count
			TCD->BITER = (TCD->BITER & 0xFE00) | len;
			TCD->CITER = (TCD->CITER & 0xFE00) | len;
		}
	}

That is, if the LINK bit is not set, then use the full 15 bits...

EDIT: That appears to make my app happy :D
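
The 6 / 512 / 6 sequence can be replayed on a host with a mock of the two TCD fields. This is a sketch only: `MockTCD` and the function names are made up, and the ELINK bit position (bit 15) is assumed from kinetis.h.

```cpp
#include <cstdint>

// Assumed: ELINK is bit 15 of BITER/CITER.
constexpr uint16_t BITER_ELINK = 0x8000;

// Just the two fields that transferCount touches.
struct MockTCD {
    uint16_t BITER = 0;
    uint16_t CITER = 0;
};

// Original logic: picks the 15-bit or 9-bit form based on len.
void transfer_count_original(MockTCD& tcd, unsigned len) {
    if (len > 32767) return;
    if (len >= 512) {
        tcd.BITER = static_cast<uint16_t>(len);
        tcd.CITER = static_cast<uint16_t>(len);
    } else {
        // Keeps bits 9..15, which may hold part of an earlier large count.
        tcd.BITER = static_cast<uint16_t>((tcd.BITER & 0xFE00) | len);
        tcd.CITER = static_cast<uint16_t>((tcd.CITER & 0xFE00) | len);
    }
}

// Proposed logic: picks the form based on whether linking is enabled.
void transfer_count_proposed(MockTCD& tcd, unsigned len) {
    if (len > 32767) return;
    if (!(tcd.BITER & BITER_ELINK)) {
        tcd.BITER = static_cast<uint16_t>(len);
        tcd.CITER = static_cast<uint16_t>(len);
    } else {
        tcd.BITER = static_cast<uint16_t>((tcd.BITER & 0xFE00) | len);
        tcd.CITER = static_cast<uint16_t>((tcd.CITER & 0xFE00) | len);
    }
}
```

Calling the original version with 6, 512, 6 leaves BITER at 0x206; the proposed version leaves it at 6.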
 