Forum Rule: Always post complete source code & details to reproduce any issue!
Page 2 of 2 FirstFirst 1 2
Results 26 to 39 of 39

Thread: How best to manage multiple SPI busses?

  1. #26
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    14,082
    Quote Originally Posted by defragster View Post
    A new scheme for Ghost posting? Grab one and use part of it to seem like it fits?
    Yup, spammers.

    (I deleted the message, in case anyone's wondering what this was about...)

  2. #27
    Senior Member KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    2,212
    Recently I have been been posting up on the Arduino Developers Email list, trying to hopefully come up with a proposal for minor enhancements to the SPI interface that allows users to more easily get more throughput on the SPI buss... As always this has been interesting.

    I have also experimented with SPI on a few different platforms to again get a better idea.

    More on that later.

    Right now playing with SPI on the Teensy LC. Was curious if it makes sense to have the buffer transfers try using 16 bit writes when possible like I did for the 3.x boards. And again it might? That is with using 8 byte writes per entry. I am getting something like .24us gap between bytes (with the double buffering working). With the 16 bit writes, I am still getting around the .24us gap between words, and a in the nature of .16-.2us between bytes of a word. So gains a little.

    With a quick and dirty test that output 128 bytes I was seeing:
    If I do simply one byte outputs SPI.transfer(buffer[i]) : it took about 300us to output the buffer
    If I use SPI.transfer16((buffer[i] << 8) | buffer[i + 1]) : it took 270
    If I used current SPI.transfer(buffer, 128): 200
    SPI.transfer(buffer, 128) using 16 bit writes: 188

    What I am trying to see next does it make sense to try to utilize the maybe 64 bit FIFO on SPI on LC
    ? Has anyone tried this? Does this chip support it?


    Trying it in test program, like:
    Code:
    void transfer_lc_fifo(const void * buf, void * retbuf, uint32_t count) {
      if (count == 0) return;
      const uint8_t *p = (const uint8_t *)buf;
      uint8_t *pret = (uint8_t *)retbuf;
      uint8_t in;
      uint32_t count_in = count; 
    
      Serial.println("Try fifo");
      Serial.printf("%x %x %x \n", SPI0_S, SPI0_C1, SPI0_C2); Serial.flush();
      Serial.printf("%x %x\n", SPI0_C3, SPI0_CI); Serial.flush();
      KINETISL_SPI0.C3 |= SPI_C3_FIFOMODE;
      Serial.println("fifo turned on"); Serial.flush(); 
      KINETISL_SPI0.S;       // Read in status
    
      while (count_in) {
        if (count && !(KINETISL_SPI0.S & SPI_S_TXFULLF)) { 
          // we have more characters to output and fifo not full
          KINETISL_SPI0.DL = p? *p++ : 0;   
          count--;
        }
        while (!KINETISL_SPI0.S & SPI_S_RFIFOEF) {
          // There is data available in fifo queue so extract
          in = KINETISL_SPI0.DL; 
          if (pret) *pret++ = in;
          count_in--;  
        }
      }
      KINETISL_SPI0.C3 = 0;  // turn off FIFO
      KINETISL_SPI0.S;       // Read in status
    }
    And when I call this function, The "Try fifo" message shows up in debug terminal. The printing of S, C1 and C2 works. (20, 50, 0)
    The printing of CI or C3 does not work. Before I added these prints, of registers. The attempt to turn on FIFO mode (set C3 appeared to die) as again the print after it fails.

    So again wonder if anyone has used the FIFO? I am mainly looking at fifo if I attempt to add an async version of transfer as suggested on the email list:
    SPI.transfer(txbuf, rxbuf, cnt, call back)
    In these cases it would be nice to be able to load up the queues and minimize interrupts.

    Kurt

  3. #28
    Senior Member
    Join Date
    Jan 2013
    Posts
    567
    I don't think SPI0 on KL26 has a FIFO. The manual only mentions a FIFO for SPI1.

  4. #29
    Senior Member KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    2,212
    Thanks, That is what I wondered. Where in the manual(s)? I was looking but for some reason I missed it... Will look again.

    Thanks again!

    Update: Still did not find it in documents. But experimented using SPI1 and have the fifo queue working, with the test case.

    It was also good for testing some more issues with the main thing of this thread which is to have a version of SPI library that all of the objects are of the class SPIClass. Ran into issue that SPI1 was not using the right br value out of settings, which I now fixed.
    Last edited by KurtE; 04-21-2017 at 03:05 AM.

  5. #30
    Senior Member
    Join Date
    Jan 2013
    Posts
    567
    KL26 Sub-Family Reference Manual.

    3.9.2.1 SPI instantiation information
    The number of SPI modules on this device is: two
    The supported data length is: 16-bit
    SPI1 includes a 4-deep FIFO of 16-bit entries.
    ...

  6. #31
    Senior Member KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    2,212
    Thanks,
    I somehow missed that section in the manual ... (Again)

    I mainly looked through chapter 37 and then the other TLC manual about electrical... What was interesting was that section 37.3 memory map for it showed addresses for SPI0_CL and SPI0_C3, however if you touch the addresses the CPU faults...

    Thanks again.

    So the Question will be: IF we add async SPI support into main library? as suggested by those who are working on the STM32 SPI library, Should I first attempt to do it with using FIFO interrupts or should it setup to use DMA under the hood...

    Should there be maximum sizes allowed for these? For example when doing the DMA stuff for ILI9341 code (which I borrowed some of Frank's )code I think each DMASetting was setup to do a maximum of 64K transfer, not sure if this is a hardware limit or not...

    Back to playing and thanks again!

  7. #32
    Senior Member KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    2,212
    So I thought I would now try adding the ASYNC support. First thought is to try to get DMA to work. I probably should start with the T3.x boards as I know them a bit better, but I started with T-LC

    So I wrote a quick and dirty prototype of function, and I tried specifically for SPI1. And when my code tries to initialize the DMA stuff it faults/hangs very early on...

    Code:
    //=========================================================================
    // Try Transfer using DMA.
    //=========================================================================
    DMAChannel   *_dmaTX = NULL;
    DMAChannel    *_dmaRX = NULL;
    uint8_t      _dma_state = 0;
    uint8_t      _dma_dummy_tx = 0;
    uint8_t      _dma_dummy_rx;
    void (*_dma_callback)();
    
    void _dma_rxISR(void) {
      Serial.println("_dma_rxISR");
      _dmaRX->clearInterrupt();
      KINETISL_SPI1.C2 = 0;
      _dmaTX->clearComplete();
      _dmaRX->clearComplete();
    
      _dma_state = 1;   // set back to 1 in case our call wants to start up dma again
      if (_dma_callback)
        (*_dma_callback)();
    
    }
    
    bool transfer_lc_dma(const void * buf, void * retbuf, uint32_t count, void(*callback)(void)) {
    
      if (!_dma_state) {
        Serial.println("First dma call");
    
        _dmaTX = new DMAChannel();
        _dmaTX->destination(KINETISL_SPI1.DL);
    //    _dmaTX->destination(_dma_dummy_tx);
        Serial.println("TAD"); Serial.flush();
        _dmaTX->triggerAtHardwareEvent(DMAMUX_SOURCE_SPI1_TX);
        Serial.println("TAT"); Serial.flush();
        _dmaTX->disableOnCompletion();
        Serial.println("TDOC"); Serial.flush();
    
        _dmaTX->disable();
        Serial.println("TDEST"); Serial.flush();
    
        _dmaRX = new DMAChannel();
        _dmaRX->disable();
        Serial.println("RDIS"); Serial.flush();
        _dmaRX->source(KINETISL_SPI1.DL);
        Serial.println("RDEST"); Serial.flush();
        _dmaRX->disableOnCompletion();
        _dmaRX->triggerAtHardwareEvent(DMAMUX_SOURCE_SPI1_RX);
        _dmaRX->attachInterrupt(_dma_rxISR);
        _dmaRX->interruptAtCompletion();
    
        _dma_state = 1;  // Should be first thing set!
        Serial.println("end First dma call");
      }
      if (_dma_state == 2)
        return false; // already active
      // Now handle NULL pointers.
      if (buf) {
        _dmaTX->sourceBuffer((uint8_t*)buf, count);
      } else {
        _dmaTX->source(_dma_dummy_tx);   // maybe have setable value
        _dmaTX->transferCount(count);
      }
    
      if (retbuf) {
        _dmaRX->destinationBuffer((uint8_t*)retbuf, count);
      } else {
        _dmaRX->destination(_dma_dummy_rx);    // NULL ?
        _dmaRX->transferCount(count);
      }
      _dma_callback = callback;
    
      // Now try to start it?
        // Setup DMA main object
        //Serial.println("Setup _dmatx");
      Serial.println("Before DMA C1");
        KINETISL_SPI1.C1 &= ~(SPI_C1_SPE);
      Serial.println("Before DMA C2");
        KINETISL_SPI1.C2 |= SPI_C2_TXDMAE | SPI_C2_RXDMAE;
      Serial.println("Before RX enable");
        _dmaRX->enable();
      Serial.println("Before TX enable");
        _dmaTX->enable();
        _dma_state = 2;
      Serial.println("DMA end of call");
        return true;
     }
    The code dies in the call: _dmaTX->destination(KINETISL_SPI1.DL);

    My looking through the DMAChannel code looks like it should properly set access to these parts of memory.

    So I hacked up the dma channel code and Tried to verify where it was dying, I also rearranged the code to see if I could at least tough the CFG...
    Currently looks like:
    Code:
    void destination(volatile signed char &p) { destination(*(volatile uint8_t *)&p); }
    	void destination(volatile unsigned char &p) {
    		Serial.printf("D: %x %x\n", (uint32_t)&p, (uint32_t)CFG); Serial.flush();
    		Serial.printf("%x %x %x %x\n\r", (uint32_t)CFG->SAR, (uint32_t)CFG->DAR, CFG->DSR_BCR, CFG->DCR); Serial.flush();
    		CFG->DCR = (CFG->DCR & 0xF0F0F0FF) | DMA_DCR_DSIZE(1);
    		Serial.println("set DCR"); Serial.flush();
    		CFG->DAR = (void*)&p;
    		Serial.println("set DAR"); Serial.flush();
    And my output in debug terminal:
    Code:
    First dma call
    D: 40077006 40008100
    0 0 0 20000000
    
    set DCR
    So it is dying on the line: CFG->DAR = (void*)&p;
    The address look valid to me: that is 40077006 is address of the DL register and 40008100 is address of first DMA channel.

    Suggestions?

    I then tried running the DMASPI library example program for SPI0 as well as a copy of it where I tried on SPI1. They also both appear to die at the same place. Potentially it might be my updated SPI library? I did have to slightly modify the Dmaspi library as both SPI and SPI1 use the same class.
    I did verify that the Example would run on a T3.6...

    Question: Has anyone tried DMA SPI on T-LC lately? I probably should pull out a secondary dev machine and see if I can get it to work with currently released 1.8.2 Arduino with released SPI and current Teensyduino.

    But again would really welcome suggestions... Will probably adapt this test to the T3.x boards while hopefully get some hints.

    Thanks!

  8. #33
    Senior Member KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    2,212
    Update: Still curious about T-LC in previous post, but right now playing with a 3.x version and making progress.

    It is actually going through and outputting and calling my callback. Which is good. Will then add it to my main SPI implementation for Async support.

    There are a couple of issues I am trying to figure out, what makes sense:

    T3.x
    a) If your last SPI transfer was using 16 bit mode, your transfer will continue in 16 bit mode. That is since you are only touching the low word of the PUSHR register it keeps what was previously in the High word which has things like CS settings CONT flag plus setting for 8 bit or 16 bit transfer. This is not unique to my test code. For example DMASPI test program.
    Code:
      DMASPI0.begin();
      DMASPI0.start();
    
      // Wonder what happens if I do a SPI.transfer16()
      SPI.transfer16(0xffff);
    
    
      DmaSpi::Transfer trx(nullptr, 0, nullptr);
    
      Serial.println("Testing src -> dest, single transfer");
      Serial.println("--------------------------------------------------");
      trx = DmaSpi::Transfer(src, DMASIZE, dest);
      clrDest((uint8_t*)dest);
      DMASPI0.registerTransfer(trx);
      while(trx.busy())
      {
      }
    I added the transfer16 and all of the writes after this point were 16 bits (a zero byte was added to each transfer)

    How to fix?
    1) Punt - Maybe ok in some cases, but ... I would not like to in generic SPI library.
    2) push/pop without DMA the first byte, such that it sets up the proper state. I did this with my DMA code for ili9341_t3n. Works OK in specific case, but lots stuff to figure out.
    3) Maybe create a DMASettings chain on the TX where the first item is setup to PUSHR 32 bits with proper first byte plus settings and then chains to 2nd item that does the rest.. Several cases to think about - But...

    b) T3.5 - Can not use DMA support on T3.5 for SPI1/SPI2... Could do using interrupts, we have fifo, but only one item deep, so won't be much of a win.

    Well now back to playing around

    Again suggestions hoped for.

  9. #34
    Senior Member
    Join Date
    Jan 2013
    Posts
    567
    The problem is, DMAChannel is broken on Teensy LC. In some cases, GCC generates byte accesses to 'CFG->DAR', which will cause the CPU to hang.

    https://github.com/PaulStoffregen/cores/pull/241

  10. #35
    Senior Member KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    2,212
    Thanks tni,

    Thanks that is helping on the LC.

    I am still trying to figure some of the stuff out... But it is outputting something. But not fully the right, and as such not getting to the end transfer stuff.
    In particular when I look at the output using LA I see, it output bytes like:
    00 80 01 81 02 82... When the array was init 00 01 02 ... 7f. So I am guessing that some of my/dmachannel stuff is not setting up sizes properly.

  11. #36
    Senior Member KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    2,212

    Hopefully new version of SPI library - Async support

    Thought I would mention that I updated my version of the SPI library (https://github.com/KurtE/SPI/tree/SPI-Multi-one-class)

    This update I believe I now have some DMA Async support added for the LC, plus a a few fixes for the 3.x version of the DMA Async support.

    There are still a few more issues I need to iron out. But a summary of what is in this version of the library.

    All of the SPI object are derived from SPIClass (like many of the other implementations). I did it similar to what Paul did for Wire and have a hardware Table created for the differences between objects.

    So hopefully in future we can have more libraries updated that can handle multiple SPI busses without having to do a bunch of work.
    To help with this I added members that work with same table as we would use in lets say: setMiso
    I added another function: pinIsMISO so you can verify for all of the different busses and boards.


    The Transfer methods have been enhanced: Still have:
    transfer(b)
    transfer16(w)
    transfer(buf, cnt) - It overwrites your buff.

    New stuff:
    transfer(txbuf, rxbuf, cnt) // separate rx and tx either can be null.

    (added member transferFillTXCharto set the TX char if txbuf is null: default 0 -

    Async support - as suggested on email list
    bool transfer(const void *txBuffer, void *rxBuffer, size_t count, void(*callback)(void));
    void flush(void);
    bool done(void);

    Currently the Async is implemented using DMA. Have version for LC and version for 3.x

    Issues to address:
    On 3.x - Currently assuming 1 byte transfers. Issue if last PUSHR set special things like 16 bit mode or CS pins or... Need to address. I think I may try some/all of:
    a) Detect I am in 16 bit mode and maybe detect that I am doing an even count output, then continue in 16 bit mode.
    b) Switch to 8 bit mode. Maybe have two SPISetting chain where first byte is output doing a 4 byte write, which sets fully PUSHR register and then chain it to second item that does the rest.
    c) Have the Async output first char sync if necessary...

    Teensy 3.5 - SPI1/SPI2 - Don't have unique DMA TX/RX setup. Not sure if they work in the Read only or Write only case. Things to try.
    a) Punt - have it not work on these
    b) See if it works for only Read or only Write?
    c) Implement instead using FIFO interrupts - But again these queues are only 1 entry in size, so would need to do interrupt for each input or output.

    Another thing I want to decide on, is I had a suggested implementation for the Teensy LC for the Transfer(txbuf, rxbuf, cnt), that broke up the code into three separate sections, READ, WRITE, TRANSFER, as to avoid an if or two in the loop. They also special cased the instances where count = 1 or count = 2 and did special code for these cases.

    That is the current code looks like:
    Code:
    void SPIClass::transfer(const void * buf, void * retbuf, uint32_t count) {
    	if (count == 0) return;
    	const uint8_t *p = (const uint8_t *)buf;
    	uint8_t *pret = (uint8_t *)retbuf;
    	uint8_t in;
    
    	while (!(port.S & SPI_S_SPTEF)) ; // wait
    	uint8_t out = p ? *p++ : _transferFillTXChar;
    	port.DL = out;
    	while (--count > 0) {
    		if (p) {
    			out = *p++;
    		}
    		while (!(port.S & SPI_S_SPTEF)) ; // wait
    		__disable_irq();
    		port.DL = out;
    		while (!(port.S & SPI_S_SPRF)) ; // wait
    		in = port.DL;
    		__enable_irq();
    		if (pret)*pret++ = in;
    	}
    	while (!(port.S & SPI_S_SPRF)) ; // wait
    	in = port.DL;
    	if (pret)*pret = in;
    }
    (Side note to myself - Does it really need to disable interrupts here?)

    What I am wondering is does it make enough sense to have three copies of the above code, to remove those two if statements in red for the three different versions?

    But warning: On Teensy LC, you need the patch to get DMA to work with current release

  12. #37
    Senior Member
    Join Date
    Jan 2013
    Posts
    567
    Quote Originally Posted by KurtE View Post
    Another thing I want to decide on, is I had a suggested implementation for the Teensy LC for the Transfer(txbuf, rxbuf, cnt), that broke up the code into three separate sections, READ, WRITE, TRANSFER, as to avoid an if or two in the loop.
    This is worth it. It does make a significant performance difference.

    They also special cased the instances where count = 1 or count = 2 and did special code for these cases.
    Probably a very rare use case.

    That is the current code looks like:
    Code:
    void SPIClass::transfer(const void * buf, void * retbuf, uint32_t count) {
    	if (count == 0) return;
    	const uint8_t *p = (const uint8_t *)buf;
    	uint8_t *pret = (uint8_t *)retbuf;
    	uint8_t in;
    
    	while (!(port.S & SPI_S_SPTEF)) ; // wait
    	uint8_t out = p ? *p++ : _transferFillTXChar;
    	port.DL = out;
    	while (--count > 0) {
    		if (p) {
    			out = *p++;
    		}
    		while (!(port.S & SPI_S_SPTEF)) ; // wait
    		__disable_irq();
    		port.DL = out;
    		while (!(port.S & SPI_S_SPRF)) ; // wait
    		in = port.DL;
    		__enable_irq();
    		if (pret)*pret++ = in;
    	}
    	while (!(port.S & SPI_S_SPRF)) ; // wait
    	in = port.DL;
    	if (pret)*pret = in;
    }
    (Side note to myself - Does it really need to disable interrupts here?)
    You do. You have 2 transmit bytes scheduled, one in the output shift register and one in port.DL. If you get interrupted, you loose a received byte (only one can be stored). transfer() hangs when a byte is lost.

    What I am wondering is does it make enough sense to have three copies of the above code, to remove those two if statements in red for the three different versions?
    Yes. You should also remove the entire
    Code:
    		while (!(port.S & SPI_S_SPRF)) ; // wait
    		in = port.DL;
    		__enable_irq();
    		if (pret)*pret++ = in;
    for the transmit-only case - no need to pick up the received bytes. Just empty port.DL at the very end.

  13. #38
    Senior Member KurtE's Avatar
    Join Date
    Jan 2014
    Posts
    2,212
    Thanks TNI,

    Makes sense for the disabling interrupts.

    As for the write only case, yes it could remove the stuff you mentioned. I assume also the disable_irq...
    Logically Something in the nature of:
    Code:
    void SPIClass::write(const void * buf, , uint32_t count) {
    	if (count == 0) return;  //while would work here without this but what for character(s) to transmit would hang
    	while (count-- > 0) {
    		while (!(port.S & SPI_S_SPTEF)) ; // wait
    		port.DL = *p++;
    	}
            // Now need to somehow wait until the last character was output and then read in DL to clear.
    }
    But then need some way to synchronize the return from this function until the last bits have been output. The version that was suggested still had code in the main loop waiting for each character to be received. It also did not handle the interrupt case you mentioned.

    So for now I will probably keep it as one version. But may play more later.

    Thanks again!

  14. #39
    Senior Member
    Join Date
    Jan 2013
    Posts
    567
    Here is a version that is very fast without code duplication. It does 10.60Mbit. (An optimal transmit-only version can do 11.3Mbit.)

    Code:
    template<bool send, bool receive> __attribute__((always_inline)) 
    void SPIClass::transfer_(const uint8_t* send_ptr, uint8_t* recv_ptr, uint32_t count) {
        const uint8_t _transferFillTXChar = 0;
        auto& __restrict port = this->port;
        uint8_t dummy;
    
        while(!(port.S & SPI_S_SPTEF)) ; // wait
        port.DL = send ? *send_ptr++ : _transferFillTXChar;
    
        while(--count > 0) {
            while(!(port.S & SPI_S_SPTEF)) ; // wait
            __disable_irq();
            port.DL = send ? *send_ptr++ : _transferFillTXChar;
            while(!(port.S & SPI_S_SPRF)) ; // wait
            *(receive ? recv_ptr++ : &dummy) = port.DL;
            __enable_irq();
        }
        
        while(!(port.S & SPI_S_SPRF)) ; // wait
        *(receive ? recv_ptr++ : &dummy) = port.DL;
    }
    
    void SPIClass::transfer(const void* buf, void* retbuf, uint32_t count) {
        if(count == 0) return;
        const uint8_t* send_ptr = (const uint8_t*) buf;
        uint8_t* recv_ptr = (uint8_t*) retbuf;
        if(send_ptr){
            if(recv_ptr) transfer_<true, true>(send_ptr, recv_ptr, count);
            else transfer_<true, false>(send_ptr, recv_ptr, count);
        } else {
            if(recv_ptr) transfer_<false, true>(send_ptr, recv_ptr, count);
            else transfer_<false, false>(send_ptr, recv_ptr, count);
        }
    }
    Last edited by tni; 04-26-2017 at 04:01 PM. Reason: Fixed. Posted wrong version.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •