Teensy 3.5 DMASPI: Differences between T3.5 and T3.6?

Status
Not open for further replies.

christoph

Well-known member
I'm having a hard time getting the DMASPI library to work on the Teensy 3.5.

First of all, I don't have the hardware, but I thought it should be possible to get at least one or the other feature of the library to work as intended. All tests in the library example fail on the Teensy 3.5, so I'm looking for differences in the DMA and the SPI peripherals between the Teensy 3.5 and Teensy 3.6 (because things do work on the 3.6). I can't tell if I'm not using the SPI correctly, or if it's the DMA system.

Can anyone try to point me in the right direction? Am I even using the correct defines in the library? Currently I'm using

Code:
#if defined(KINETISK)

for DMASPI0 and

Code:
#if defined(__MK64FX512__) || defined(__MK66FX1M0__)

for DMASPI1, assuming that I can use identical code for Teensy 3.2, 3.5 and 3.6.
 
I am not a DMA expert, but I was playing around earlier on the T3.5 beta board and found that DMASPI may have some real issues on the K64. There may be earlier postings on the K66 beta thread, but I know I posted some stuff on the T3.5 beta thread:
#59
 
OK, so at least I found where in the datasheet the DMA sources are listed, and the SPI1/SPI2 DMA RX and TX request sources are indeed the same, per peripheral: source 16 is SPI1 Transmit or Receive and source 17 is SPI2 Transmit or Receive (K64 Sub-Family Reference Manual, Rev 2, Table 3-27: "DMA Request Sources - MUX 0").

But that doesn't help me with SPI0, which should work, after all. It has separate TX and RX channels and 4 word FIFO depth.
 
Again, I'm not sure what you are hitting. I ran it earlier on my T3.5 beta board.

My copy of DMASPI is in sync with master. I just built DMASpi_example1 again, downloaded it and ran it.
Code:
Hi!
Buffers are prepared
Time for non-DMA transfer: 233us
src and dest match

Press a key to continue

Testing src -> dest, single transfer
--------------------------------------------------
Finished DMA transfer
src and dest match
==================================================


Testing src -> discard, single transfer
--------------------------------------------------
Finished DMA transfer
last discarded value is 0x63
That appears to be correct
==================================================


Testing 0xFF dummy data -> dest, single transfer
--------------------------------------------------
Finished DMA transfer
src and dest match
==================================================


Testing multiple queued transfers
--------------------------------------------------
Finished DMA transfer
Finished DMA transfer1
src and dest match
src and dest match
==================================================


Testing pause and restart
--------------------------------------------------
Time until stopped: 220 us
Finished DMA transfer
DMA SPI appears to have stopped (this is good)
restarting
Finished DMA transfer1
src and dest match
src and dest match
==================================================


Testing src -> dest, with chip select object
--------------------------------------------------
Finished DMA transfer
src and dest match
==================================================


DMA SPI is still running
DMA SPI stopped.
This was at 120 MHz.
 
Well, Frank B also volunteered to test it on his T3.5 and none of the tests passed, the library only ever received 0x63 on MISO. But now as I'm sorting things in my head I realize that the test sketch I gave him was for T3.5 SPI1, which doesn't work for known reasons.

So you can obviously confirm that DMASPI works for SPI0 on the T3.5.
 
Yep, I modified the test earlier to run on SPI1 and SPI2 and ran it on the T3.6. I tried to do the same with the T3.5 and failed. That is when I found that the DMA channels were different...
 
Now let's head over to SPI1 on T3.5:
Some discussion took place in a different thread, but it definitely belongs here. The post where the relevant part starts is: https://forum.pjrc.com/threads/3576...-multiple-SPIs?p=142439&viewfull=1#post142439

I've also created an issue for the library on github: https://github.com/crteensy/DmaSpi/issues/22

The background:
  • Teensy 3.0/3.2/3.6 SPIs and Teensy 3.5's SPI0 have two DMA sources: TX FIFO not full and RX FIFO not empty. The DMASPI library was written with these in mind.
  • Teensy 3.5's SPI1 and SPI2 only have one DMA source, which is combined for TX and RX (one per SPI, though).
  • Teensy 3.5's SPI1/SPI2 have individual DMA request enable bits, just like the other Teensys' SPIs mentioned before (SPIx_RSER:TFFF_RE and SPIx_RSER:RFDF_RE).
  • Blindly using the single DMA request for both directions is probably a very bad idea, since we don't want an RX FIFO drain request to trigger a write to the TX FIFO. The same goes for a TX FIFO fill request and reading from the RX FIFO.
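The distinction in the list above could be made explicit at compile time. The following is only a sketch: `DmaRequestStyle` and `spi1_request_style()` are hypothetical names that do not exist in DMASPI. Only the two chip macros are the ones already used for DMASPI1, and the mapping simply restates the bullets above.

```cpp
// Hypothetical helper, not part of DMASPI: classifies how SPI1 raises DMA requests.
enum class DmaRequestStyle {
    SeparateTxRx,  // TX FIFO fill and RX FIFO drain each have their own DMA source
    SharedTxRx,    // one DMA source is shared by both directions
    NotAvailable   // no SPI1 on this chip, or the chip is unknown
};

constexpr DmaRequestStyle spi1_request_style() {
#if defined(__MK66FX1M0__)      // Teensy 3.6
    return DmaRequestStyle::SeparateTxRx;
#elif defined(__MK64FX512__)    // Teensy 3.5
    return DmaRequestStyle::SharedTxRx;
#else                           // anything else, including a host build
    return DmaRequestStyle::NotAvailable;
#endif
}
```

A class for Teensy 3.5's SPI1 could then be selected with `static_assert` or template specialization on this value instead of scattering `#if` blocks through the library.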
One possible solution:
  1. Configure the SPI to generate RX FIFO drain requests only
  2. Configure a DMA channel to write to the TX FIFO as soon as a word was read by the RX channel. On bare metal, this is done by using the minor loop link feature which triggers a DMA channel (TX in this case) on minor loop iterations of another channel (RX in this case). Teensyduino supports this with the DMAChannel::triggerAtTransfersOf(otherChannel) method. Sounds good.
  3. Initiate the whole thing by writing the first word to the TX FIFO manually in code. The DMA channel would have to write the remaining words when triggered.
  4. When all bytes have been read by the RX channel, the RX complete ISR is called (that's what's happening already for all other SPIs as well).
The obvious drawback of this is that the FIFO depth is not used in either direction, since writing and reading are done on a 1-to-1 basis. Consequently, speed might be a bit lower, with small gaps between words. Depending on the overall number of words to be transmitted, it might be possible to go for a depth of at least two words by simply writing a second word in step 3.
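The four steps can be tried out on a PC before touching hardware. The sketch below is a host-side model only, not Teensy code: it assumes a loopback "SPI" in which every word shifted out comes straight back in, and `run_linked_transfer`, `LinkedTransferResult` and `primed_words` are made-up names. On real hardware, the link in step 2 would be set up with `DMAChannel::triggerAtTransfersOf()` as described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct LinkedTransferResult {
    std::vector<uint8_t> received;  // words collected by the modeled RX channel
    std::size_t max_tx_fifo_fill;   // highest TX FIFO occupancy seen
};

// Host-side model (assumption: loopback, one word shifted per loop iteration).
LinkedTransferResult run_linked_transfer(const std::vector<uint8_t>& src,
                                         std::size_t primed_words = 1) {
    std::deque<uint8_t> tx_fifo, rx_fifo;
    LinkedTransferResult result{{}, 0};
    std::size_t next_tx = 0;

    // Step 3: prime the TX FIFO by hand.
    while (next_tx < src.size() && next_tx < primed_words) {
        tx_fifo.push_back(src[next_tx++]);
    }
    result.max_tx_fifo_fill = tx_fifo.size();

    while (result.received.size() < src.size()) {
        // Nothing queued and nothing received: the transfer has stalled.
        if (tx_fifo.empty() && rx_fifo.empty()) break;

        // The SPI shifts one word out; loopback puts it into the RX FIFO.
        if (!tx_fifo.empty()) {
            rx_fifo.push_back(tx_fifo.front());
            tx_fifo.pop_front();
        }
        // Step 1: RX FIFO drain request, the RX channel reads one word.
        if (!rx_fifo.empty()) {
            result.received.push_back(rx_fifo.front());
            rx_fifo.pop_front();
            // Step 2: RX minor loop done, the link triggers the TX channel.
            if (next_tx < src.size()) {
                tx_fifo.push_back(src[next_tx++]);
                if (tx_fifo.size() > result.max_tx_fifo_fill) {
                    result.max_tx_fifo_fill = tx_fifo.size();
                }
            }
        }
    }
    // Step 4: on hardware, the RX complete ISR would fire here.
    return result;
}
```

In this model, priming with one word keeps the TX FIFO fill at one, priming with two keeps it at two, and priming with zero words stalls immediately, which is why the manual write in step 3 is needed.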

Some safety measures will need to be installed for this to work:
  • This probably needs to be implemented for all SPIs on all Teensys anyway: wait for the SPI to be ready in the first place, which means that both the TX and RX FIFOs are drained.
  • In contrast to the other DMASPI implementations, the TX channel must be enabled before the RX channel, since it (the TX channel) is triggered by the RX channel's minor loop completion. Otherwise it might lose trigger events because the RX channel is too quick or other interrupts get in the way.
  • I'm not sure (and again this goes for all DMASPI implementations) if small transfer counts (zero or 1) mess things up. They probably will, and I don't know how to work around that. Rejecting the transfer in the first place might be appropriate.
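The first and last points could be folded into one check that runs before a transfer is started. This is a rough sketch under assumptions: `StartCheck` and `check_before_start` are hypothetical names, the fill counts are passed in as plain numbers (on hardware they would presumably come from the TXCTR and RXCTR fields of SPIx_SR), and a transfer is rejected whenever it would leave nothing for the TX channel to send.

```cpp
#include <cstddef>

// Hypothetical result type, not part of DMASPI.
enum class StartCheck { Ok, SpiBusy, TooShort };

// tx_fifo_fill / rx_fifo_fill: current FIFO fill counts of the SPI.
// word_count: total words in the transfer.
// primed_words: words written to the TX FIFO by hand before DMA takes over.
StartCheck check_before_start(std::size_t tx_fifo_fill,
                              std::size_t rx_fifo_fill,
                              std::size_t word_count,
                              std::size_t primed_words = 1) {
    // The SPI is only ready once both FIFOs are drained.
    if (tx_fifo_fill != 0 || rx_fifo_fill != 0) {
        return StartCheck::SpiBusy;
    }
    // Assumption: if priming already covers the whole transfer, the TX
    // channel would get a count of zero, so the transfer is rejected.
    if (word_count <= primed_words) {
        return StartCheck::TooShort;
    }
    return StartCheck::Ok;
}
```

A caller could fall back to a plain blocking transfer for the `TooShort` case and only then, after an `Ok`, enable the TX channel followed by the RX channel.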

Since DMASPI is already written for a number of different SPI flavors, making new classes for Teensy 3.5's SPI1 and SPI2 shouldn't be such a big deal. However, I'll need a volunteer to actually try things since I don't have a Teensy 3.5.
 
The K64 SPI1 / SPI2 RX FIFO appears to have 2 entries (the manual says 1), so having 2 transfers in flight (one in the FIFO, one in the shift register) should actually work.

This works on Teensy 3.5:
Code:
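    // Queue two words back to back without reading anything in between.
    for(size_t i = 0; i < 2; i++) {
        SPI1_PUSHR = src_buffer[i] | SPI_PUSHR_CTAS(1);
    }
    // Wait long enough for both words to be shifted out (and in).
    delay(10);
    // Both received words can still be read afterwards.
    for(size_t i = 0; i < 2; i++) {
        dest_buffer[i] = SPI1_POPR;
    }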

The equivalent code on Teensy LC would overflow the RX FIFO (it also has an RX FIFO size of 1 on paper).
 
I thought I would mention that I currently have my SPI library code (hopefully) working for async transfers using DMA on T3.5 SPI1/2, using the technique that @tni suggested of chaining the DMA channels. That was nice, because before that I was using an SPI ISR that tried to handle reading the incoming data, and there were cases where the ISR could not keep up. I was about to change both the Read case and the Transfer case to have TX on the ISR and RX on DMA before I tried this chaining of channels.

A disadvantage of using the chained channels is that you are limited to transfers of 511 bytes/words. Yesterday I updated my SPI library transfer functions so that if you request a transfer larger than one DMA request can handle (511 with chaining, 32767 without), the ISR for RX completion automatically sets up and issues a new request for the next portion of the transfer.
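The reloading idea can be sketched roughly as follows. This is a host-side illustration, not the code from the SPI fork: `split_transfer` and the constant names are made up, and the chunks are computed up front here, whereas the real code would work out the next portion inside the RX completion ISR. The limits are the two numbers mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Largest count a single DMA request can carry. With channel linking
// enabled the count field is 9 bits wide, otherwise 15 bits.
constexpr std::size_t kMaxCountLinked   = 511;
constexpr std::size_t kMaxCountUnlinked = 32767;

// Hypothetical helper: split a transfer of `total` words into portions
// that each fit into one DMA request.
std::vector<std::size_t> split_transfer(std::size_t total,
                                        std::size_t max_per_request) {
    std::vector<std::size_t> chunks;
    if (max_per_request == 0) {
        return chunks;  // nothing sensible to do with a zero limit
    }
    while (total > 0) {
        const std::size_t n = (total < max_per_request) ? total : max_per_request;
        chunks.push_back(n);  // one DMA request of n words
        total -= n;           // the ISR would issue the next one from here
    }
    return chunks;
}
```

For example, a 1200-word transfer over chained channels would be issued as three requests of 511, 511 and 178 words.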

If you're interested, the code is up in my SPI fork/branch (https://github.com/KurtE/SPI/tree/SPI-Multi-one-class)
 
The 511 byte limitation is nasty but won't be a show stopper. Maybe I can even get rid of the 32767 byte limitation with the same "reloading pattern" in my transfer descriptors...

I've ordered a Teensy 3.5 now and will look into your code and the datasheet. If your code gets it right I'm sure I can do the same in DMASPI, maybe with some hand-holding. It seems that DMASPI for Teensy LC needs some work too.
 