Teensy 4.0 SPI Chip Select pins

Status
Not open for further replies.

bicycleguy

Well-known member
I don't understand how the CS pins work on the T4. On T3 there was one SPI bus but many CS pins to choose from for each peripheral.

On T4 there are three SPI buses but only one CS pin for each.

How do I achieve multiple peripherals on common MOSI/MISO pins?
 
Isn't it possible to use any GPIO pin as a chip select? That's assuming you bit-bang it with separate code, which is slower, but if you are sharing the bus presumably you don't need the 2nd-Nth device on the bus to have maximum speed, or you would have used a separate SPI bus.
 
I believe the special CS pins are only needed if you use device drivers that use DMA, such as several of the display drivers (ST7735, ST7789, etc.).

There appears to be only one CS pin for DMA per SPI bus. If you have normal SPI drivers it should work, providing you follow the other rules for SPI sharing:

  • https://www.pjrc.com/better-spi-bus-design-in-3-steps/
  • You use pull-up resistors on the CS pins;
  • You use devices that properly use tri-state logic on MISO;
  • You protect the bus with SPI.beginTransaction and SPI.endTransaction; (and)
  • You use a SPI bus speed that the device can tolerate.
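
For what it's worth, the rules above with an ordinary GPIO pin as chip select could be sketched roughly like this. It is only a sketch: the readRegister helper, the 4 MHz clock and mode 0 are assumptions for illustration, and the #else branch is a host-only stand-in (not the real SPI library) that just records the call order so the sequence can be checked off-target.

```cpp
#include <cstdint>
#include <string>

#ifdef ARDUINO
#include <Arduino.h>
#include <SPI.h>
#else
// Host-only stand-ins (NOT the real library): they only record call order.
static std::string trace;
static const int LOW = 0, HIGH = 1, MSBFIRST = 1, SPI_MODE0 = 0;
struct SPISettings { SPISettings(uint32_t, int, int) {} };
struct FakeSPI {
  void beginTransaction(const SPISettings&) { trace += "B"; }
  uint8_t transfer(uint8_t v) { trace += "T"; return v; }
  void endTransaction() { trace += "E"; }
};
static FakeSPI SPI;
static void digitalWrite(int pin, int level) {
  trace += (level == LOW ? "L" : "H");
  trace += std::to_string(pin);
}
#endif

// Hypothetical helper: read one register from a device whose CS is on any GPIO pin.
// Clock speed and SPI mode are assumed values; use what your device tolerates.
uint8_t readRegister(uint8_t csPin, uint8_t reg) {
  SPI.beginTransaction(SPISettings(4000000, MSBFIRST, SPI_MODE0));
  digitalWrite(csPin, LOW);          // select only this device
  SPI.transfer(reg);                 // send the register address
  uint8_t value = SPI.transfer(0);   // clock out a dummy byte to read the reply
  digitalWrite(csPin, HIGH);         // release the device before ending the transaction
  SPI.endTransaction();
  return value;
}
```

On real hardware you would also call SPI.begin() once, and set every CS pin to OUTPUT and HIGH in setup() before talking to any device on the bus.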
 
Thanks guys, that's a relief.

I never realized that any pin would work; that would have made some of my 3.2 efforts much easier. I was doing as your link describes, with buffers etc., with great success, but always with the alternate CS pins.

So the ILI9341 requires 2 buses, or is that someone's DMA-optimized version?
 
In the T3.x case, you needed two pins, one for CS and one for D/C for the devices that do DMA buffering. I believe in the 4.0 case, you don't need a special pin for D/C. But I really haven't delved into the internals.
 
Unfortunately most of the displays like the ILI9341, ILI9488 and ST7735, as well as the RA8875 displays, require both a CS and a DC pin. There are unusual displays that don't have a DC pin, but those require special handling.
 
The SPI system for T4 is very different than it is on the T3.x boards.

With the T3.x, the SPI bus has the PUSHR register, which is 32 bits wide. The top 16 bits let you encode other information, such as which CTAR to use and whether one or more of the SPI CS pins should be asserted. On these boards, specific libraries like ILI9341_t3 map the CS and DC signals onto these hardware CS pins, so with each word we can tell the system whether we want them asserted or not...
I could go into a lot of detail, but originally (and maybe still) the library is set up so that you must have two hardware CS pins, and they cannot logically be the same CS channel or bit or...

Having the DC pin on a hardware CS pin especially sped things up. Without it, you have to wait until everything has been fully output before you can change the state of DC and output the next data, which leaves gaps in the data. It is not as important to have CS on a hardware CS pin, as you typically only change it once per transaction.
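
To make the PUSHR idea a bit more concrete, here is a rough sketch of packing one word for the T3.x. The field positions are as I read them from the Kinetis SPI_PUSHR description (CONT in bit 31, CTAS in bits 30-28, PCS in bits 21-16, data in bits 15-0), so check them against the reference manual. All names, and the choice of which PCS bits carry CS and DC, are hypothetical and not taken from ILI9341_t3.

```cpp
#include <cstdint>

// Assumed field layout of the 32-bit PUSHR word (verify against the manual).
constexpr uint32_t PUSHR_CONT = 1u << 31;                          // keep PCS asserted after this word
constexpr uint32_t pushrCtas(uint32_t n) { return (n & 0x7u) << 28; }      // which CTAR to use
constexpr uint32_t pushrPcs(uint32_t mask) { return (mask & 0x3Fu) << 16; } // which CS lines to assert

// Hypothetical wiring: display CS on PCS bit 0, display DC on PCS bit 1.
constexpr uint32_t kCsBit = 0x01;
constexpr uint32_t kDcBit = 0x02;

constexpr uint32_t makePushr(uint16_t data, uint32_t pcsMask, uint32_t ctar, bool keepAsserted) {
  return (keepAsserted ? PUSHR_CONT : 0u) | pushrCtas(ctar) | pushrPcs(pcsMask) | data;
}

// A command word asserts only CS; a data word asserts CS and DC together.
// How "asserted" maps to the DC level depends on how the PCS idle state is configured.
constexpr uint32_t commandWord(uint8_t c) { return makePushr(c, kCsBit, 0, true); }
constexpr uint32_t dataWord(uint8_t d) { return makePushr(d, kCsBit | kDcBit, 0, true); }
```

Because the DC state travels in the same FIFO entry as the data, the hardware can switch DC between words without the code waiting for the FIFO to drain, which is where the speedup comes from.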

Now with the T4 everything is different. It still has an output FIFO queue, which you write to using the TDR (Transmit Data Register). Note: with T4 we can now output 32 bits at a time. In addition, the chip has another register, TCR (Transmit Command Register), which allows you to change things like word size and speed. It also lets you choose whether or not to assert ONE SPI CS pin. If I remember correctly there can be up to maybe 5 CS pins, but only one is brought out to an actual pin.

Note: I have played around some with some of the libraries and do get SOME speedup using DC on that one pin, but I don't require it.

If I remember correctly the one case, where you need to use the Hardware CS pin is if you implement an SPI Client setup, which I have not tried.

For T4, there is now a version of ILI9341_t3 which tries to do some optimizations; mainly it tries to fill the FIFO, and I believe it will speed up some if you put DC on the one CS pin. There is also my ili9341_t3n library, which has been updated for T4 and can define a frame buffer and then use DMA to update the display from it.
 
Using the T4 with the standard SPI pins (MOSI = 11, MISO = 12, SCK = 13, and CS = 10), will the SPI library drive the CS pin, or do I have to control the CS pin in my code?
 
(1) yes
(2) no

Does that mean?

(1) With standard SPI pins (CS == 10) you do not have to set and reset the CS line (assuming you use SPI.beginTransaction and SPI.endTransaction)
(2) With non-standard SPI Pins (any other free pin on the T4) you need to do it for yourself.

Thanks.
 
When you use SPI library, you tell the code which is the CS (IIRC with begin(CS)) and then the SW handles asserting CS for you
you never do it by yourself
So:
(1 or first question): answer "yes"
(2 or second question): answer "no"
 
I think I am sort of confused here?

As far as I know the SPI library only has one begin function and it does not take parameters? void begin();

Now almost every library that uses SPI, for example the ili9341_t3 library, handles the CS pin for you. Some libraries may require CS to be a hardware CS pin, but most just logically do:
digitalWrite(_csPin, LOW); or HIGH ...

Again, I am assuming we are SPI master... Note: the SPI hardware does have the ability to control the CS pin automatically. There are lots more details in the IMXRT PDF file (chapter 47). This is mainly controlled by the TCR register (again different from T3.x, where it is controlled by PUSHR). You can set up the PCS field of this register to have CS asserted when you do a transfer. The CONT bit tells the system whether to keep the CS pin asserted after the one-word transfer (the word size is defined in this register as well). There is also the CONTC bit... Note: you need to be careful how you leave the CONT bit; if you leave it set but don't have any more data, you can leave the SPI hung...
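
As a rough illustration of the TCR fields mentioned here, the sketch below packs PCS, CONT, CONTC and the frame size into one value. The bit positions are my reading of the LPSPI chapter (PCS in bits 25-24, CONT in bit 21, CONTC in bit 20, frame size minus one in bits 11-0) and should be verified against the manual. The names are made up for this example and are not the core's macros. A real TCR value also carries clock polarity, phase and prescale bits, which this sketch leaves at zero.

```cpp
#include <cassert>
#include <cstdint>

// Assumed LPSPI TCR field layout (verify against the reference manual).
constexpr uint32_t TCR_CONT  = 1u << 21;   // keep CS asserted between frames
constexpr uint32_t TCR_CONTC = 1u << 20;   // this command continues an ongoing transfer
constexpr uint32_t tcrPcs(uint32_t n) { return (n & 0x3u) << 24; }                  // which hardware CS
constexpr uint32_t tcrFrameSize(uint32_t bits) { return (bits - 1u) & 0xFFFu; }     // stored as bits-1

// Hypothetical helper: build the chip-select related part of a TCR value.
inline uint32_t makeTcr(uint32_t pcs, uint32_t bitsPerFrame, bool cont, bool contc) {
  assert(bitsPerFrame >= 1 && bitsPerFrame <= 4096);
  return tcrPcs(pcs) | (cont ? TCR_CONT : 0u) | (contc ? TCR_CONTC : 0u) |
         tcrFrameSize(bitsPerFrame);
}
```

The hang described above corresponds to writing a value with CONT set and then never queuing a final frame with CONT cleared, so the hardware keeps waiting for more data.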

If you are interested in trying to make use of the hardware CS pin, you might look at my ili9341_t3n library, when DC signal is using the hardware CS pin.
 
@KurtE You're pretty knowledgeable about SPI and DMA. I think you also worked on the T4 beta? What do you think are the odds of us getting SPI DMA support built right in the teensyduino libraries (as far as I know it's custom built into specific library for displays, etc.)

User crteensy created such a library for the T3.x, which is found on GitHub here. I needed DMA SPI for accessing SPI RAM memories and had to modify his library to make it work for RAMs, but I think he abandoned the project, as my pull request was never merged. I eventually had to fork a copy into my BALibrary in order to have DMA audio transfers to external memory chips.
 
@Blackaddr - yes you might say I was a bit involved in the beta...

I guess the question is, it depends on what you are asking?

The SPI library already has support in it for DMA with the call: bool transfer(const void *txBuffer, void *rxBuffer, size_t count, EventResponderRef event_responder);
And T4 support for this was implemented as part of T4 beta.
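
A minimal sketch of calling that transfer() overload might look like the following. The helper and variable names are made up, and I am assuming EventResponder's attachImmediate() for the completion callback. The #else branch is a host-only stand-in (not the real Teensy classes) that loops TX back to RX and fires the event immediately, just so the flow can be exercised off-target.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#ifdef ARDUINO
#include <Arduino.h>
#include <SPI.h>
#include <EventResponder.h>
#else
// Host-only stand-ins (NOT the real Teensy classes).
class EventResponder;
typedef EventResponder& EventResponderRef;
class EventResponder {
 public:
  void attachImmediate(void (*fn)(EventResponderRef)) { fn_ = fn; }
  void triggerEvent() { if (fn_) fn_(*this); }
 private:
  void (*fn_)(EventResponderRef) = nullptr;
};
struct FakeSPI {
  bool transfer(const void* tx, void* rx, std::size_t n, EventResponderRef ev) {
    std::memcpy(rx, tx, n);   // pretend MISO is wired to MOSI
    ev.triggerEvent();        // pretend the DMA finished instantly
    return true;
  }
};
static FakeSPI SPI;
#endif

static volatile bool transferDone = false;   // set by the completion callback
static EventResponder spiEvent;

static void onTransferDone(EventResponderRef) { transferDone = true; }

// Hypothetical helper: start an asynchronous transfer and return whether it was accepted.
bool startAsyncTransfer(const uint8_t* tx, uint8_t* rx, std::size_t count) {
  transferDone = false;
  spiEvent.attachImmediate(onTransferDone);
  return SPI.transfer(tx, rx, count, spiEvent);
}
```

On real hardware the call returns before the data has gone out, so I would expect the caller to keep CS asserted and the buffers alive until transferDone becomes true.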

The core files, in this case in hardware\teensy\avr\cores\teensy4, include DMAChannel.h/.cpp to help with setting up DMA.

Also during this time frame, we have also integrated DMA for the T4 into a few display drivers, including my own ili9341_t3n, more recently the ST7735_t3, ILI9488_t3, ...

There are some interesting complications with the T4 and DMA, depending on what type of memory you are using, more details up on the thread: https://forum.pjrc.com/threads/57225-T4-DMA-and-Memory-DMAMEM-and-malloc-new

Put simply: if you do something like malloc a TX and an RX buffer, initialize the TX buffer, set up your TX and RX DMAChannels to point to these two buffers, run a DMA operation, and then look at the returned data in the RX buffer, you may find that what you put into the TX buffer was not what was sent over the MOSI pin, and that the data returned on the MISO pin is not what your program sees when it reads the RX buffer.

Why? The upper 512KB of memory (OCRAM) is cached. Normal processor instructions work through the cache, while DMA works directly on the actual contents of memory. If the cache has not yet flushed its updated values to memory, or has not re-fetched values that have changed in memory, the two views will differ...

There are ways to force these two cases: TX case: arm_dcache_flush(write_data, count); RX case: arm_dcache_delete(retbuf, count)
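
To show why both calls are needed, here is a toy model of a write-back cache sitting in front of RAM. It is purely illustrative and host-side: ToyCachedRam and its methods are invented for this example, and flush() and invalidate() only play the roles that arm_dcache_flush and arm_dcache_delete play on the real chip.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model only: the CPU goes through "cache", the DMA engine touches "ram" directly.
struct ToyCachedRam {
  static constexpr std::size_t N = 8;
  std::array<uint8_t, N> ram{};     // what DMA reads and writes
  std::array<uint8_t, N> cache{};   // what the CPU sees while an entry is valid
  std::array<bool, N> valid{};
  std::array<bool, N> dirty{};

  void cpuWrite(std::size_t i, uint8_t v) { cache[i] = v; valid[i] = true; dirty[i] = true; }
  uint8_t cpuRead(std::size_t i) {
    if (!valid[i]) { cache[i] = ram[i]; valid[i] = true; dirty[i] = false; }
    return cache[i];
  }
  void dmaWrite(std::size_t i, uint8_t v) { ram[i] = v; }
  uint8_t dmaRead(std::size_t i) const { return ram[i]; }

  // TX case: push dirty cached values out to RAM so DMA sends the new data.
  void flush() {
    for (std::size_t i = 0; i < N; ++i)
      if (valid[i] && dirty[i]) { ram[i] = cache[i]; dirty[i] = false; }
  }
  // RX case: drop cached copies so the next CPU read fetches what DMA wrote.
  void invalidate() {
    for (std::size_t i = 0; i < N; ++i) { valid[i] = false; dirty[i] = false; }
  }
};
```

In this model a cpuWrite followed by dmaRead returns the old value until flush() is called, and a dmaWrite followed by cpuRead returns the stale cached value until invalidate() is called, which matches the two symptoms described above.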

But I have had a few times I pulled out a bit of hair!
 
Thanks KurtE, I was not aware of that particular transfer() function; I'll go look into it. As for the cache issue, should the pointers to the buffers not be volatile? In fact, shouldn't this be enforced by the transfer signature?
Code:
// Since DMA works directly on system memory, not on the cache contents, the DMA buffers must be declared volatile.
bool transfer(const volatile void *txBuffer, volatile void *rxBuffer, size_t count, EventResponderRef event_responder);

When I look at the DmaSpi.h implementation I see that the signature is partially correct. The RX buffer is volatile; the source buffer is const, but not volatile. As you rightly point out, if you set up a DMA channel to use a particular TX buffer address and then modify the buffer contents before triggering the DMA write, the buffer contents might still be in cache and not yet flushed out to memory.
Code:
Transfer(const uint8_t* pSource = nullptr, // this should be volatile as well
                  const uint16_t& transferCount = 0,
                  volatile uint8_t* pDest = nullptr, // this is correctly specified as volatile
                  const uint8_t& fill = 0,
                  AbstractChipSelect* cs = nullptr
      )
 
Again, the issue with DMA to the upper part of memory is particular to how the OCRAM and the memory cache work...

In particular, see the definition set up in startup.c in the function configure_cache...
Code:
	SCB_MPU_RBAR = 0x20200000 | REGION(3); // RAM (AXI bus)
	SCB_MPU_RASR = MEM_CACHE_WBWA | READWRITE | NOEXEC | SIZE_1M;
This was talked about several times during the beta, for example: https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test?p=196223&viewfull=1#post196223

And in some of those posts, DMA could be fixed by disabling some of the cache (changing WBWA to WT)...

But this disables it for everything... i.e. throw out the baby with the bathwater.

Hopefully some more of the lessons we learned during the beta will make it into some form of Wiki or some other easy-to-find location.

Things like, how to keep your ISR from being called twice...
Which is why you will see code that does: asm("dsb");
 
Well after weeks messing with the DMA I'm ready to pull my hair out. I was able to get it to work with T4 only by declaring the buffers as DMAMEM instead of allocating off the heap. I tried using the cache flush and delete commands and that did not seem to help with reads. I got write DMA working fine but reads are not working with heap.

Can you confirm that in order to get DMA working for reads that you must use DMAMEM? Cache flush commands don't seem sufficient to get DMA working with heap memory?
 
Actually, now that I look closer, it's not the DMAMEM attribute that seems to make it work. Putting them in static memory works too. It's just the heap memory that doesn't work. So I guess the question is: during the beta testing, did you guys ever get DMA reads working with malloc() heap memory using only cache flush commands, without resorting to disabling the cache in startup.c?
 
Welcome to the DMA He...

Take a look at SPI.cpp for where I try to do SPI

And on RX if I detect the memory is up high, I try:
if ((uint32_t)retbuf >= 0x20200000u) arm_dcache_delete(retbuf, count);

I do this at start of the transfer, but might need it at the end of the transfer...

Edit: FYI DMAMEM and heap memory (malloc, new) come from the same memory section
 