DMASPI library needs some (probably breaking) changes to really support multiple SPIs

Status
Not open for further replies.

christoph

The ongoing discussion in the K66 beta test thread reveals a serious shortcoming of the DMASPI library's design. It was designed with just one SPI in mind and evolved from "somewhat working" to "usable for others" to what it is now - but still only for one SPI. The problems revolve around the following procedure:

  1. A transfer object is created, which describes the desired transfer. It has an optional chip select object which can be used to set up the SPI bus (SPISettings) when the transfer is started and ended, and to correctly assert and deassert a chip select signal.
  2. The transfer object is registered with the desired DMASPI object.
  3. Eventually the DMASPI object starts the transfer and calls transfer::select(), which has no arguments.
  4. transfer::select() may call SPI.beginTransaction to set up the bus.
  5. The DMASPI transfer is executed.
  6. transfer::deselect() is called, and if a transaction was started in select(), it must be ended now.

The DMASPI library provides the class ActiveLowChipSelect, which is hardcoded to use SPI, without any way to sneak in SPI1 or SPI2. The only way to change the SPI object is to write the correct one in ActiveLowChipSelect or another class that derives from AbstractChipSelect, or to somehow change the interface to allow passing in a reference to an SPI object. However - and I pointed this out when I first worked on DMASPI - SPI, SPI1 and SPI2 don't derive from a common base class. They cannot be passed as the same reference type.

My suggestions for workarounds or changes to the interface:
  • Create an independent chip select class that uses SPI1. Tedious, and error-prone when code is moved from one SPI to the other
  • Modify ActiveLowChipSelect to accept a template parameter to choose the correct SPI. As error-prone as the previous suggestion
  • Add beginTransaction() and endTransaction() methods to DMASPI. Add an AbstractDmaSpi reference parameter to AbstractChipSelect::select and ::deselect, so that the chip select object can use the abstract interface provided by DMASPI to control SPI transactions. The AbstractDmaSpi reference would need to be passed to the transfer object, and on to the chip select object. Con: breaking change to existing code, because a new argument is introduced. Pro: the chip select object can't work on the wrong SPI object
  • Instead of passing an AbstractDmaSpi reference to AbstractChipSelect::select() and ::deselect(), set a pointer in registerTransfer(). Feels awkward, but would probably work as well.
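
To make the third suggestion a bit more concrete, here is a rough host-side sketch (it compiles on a PC, not against the real library). Everything in it is an assumption for illustration: AbstractDmaSpi with beginTransaction()/endTransaction() is the proposed interface, not something DmaSpi has today, and MockDmaSpi, AbstractChipSelectV2 and ActiveLowChipSelectV2 are made-up names. The bus settings are reduced to a plain clock value instead of SPISettings.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical abstract bus interface; DmaSpi does not provide this today.
class AbstractDmaSpi {
  public:
    virtual ~AbstractDmaSpi() = default;
    virtual void beginTransaction(long clockHz) = 0;
    virtual void endTransaction() = 0;
};

// Chip select interface with the extra bus argument from the third suggestion.
class AbstractChipSelectV2 {
  public:
    virtual ~AbstractChipSelectV2() = default;
    virtual void select(AbstractDmaSpi& bus) = 0;
    virtual void deselect(AbstractDmaSpi& bus) = 0;
};

// Host-side stand-in for one DMASPI instance; it only records what was called.
class MockDmaSpi : public AbstractDmaSpi {
  public:
    explicit MockDmaSpi(std::string name) : name_(std::move(name)) {}
    void beginTransaction(long clockHz) override {
      log.push_back(name_ + ":begin@" + std::to_string(clockHz));
    }
    void endTransaction() override { log.push_back(name_ + ":end"); }
    // What the DMASPI object would do for one queued transfer.
    void runTransfer(AbstractChipSelectV2& cs) {
      cs.select(*this);
      log.push_back(name_ + ":dma");
      cs.deselect(*this);
    }
    std::vector<std::string> log;
  private:
    std::string name_;
};

// Chip select that never names SPI, SPI1 or SPI2 itself.
class ActiveLowChipSelectV2 : public AbstractChipSelectV2 {
  public:
    ActiveLowChipSelectV2(int pin, long clockHz) : pin_(pin), clockHz_(clockHz) {
      assert(pin >= 0);
    }
    void select(AbstractDmaSpi& bus) override {
      bus.beginTransaction(clockHz_);
      level_ = 0;  // digitalWriteFast(pin_, 0) on real hardware
    }
    void deselect(AbstractDmaSpi& bus) override {
      level_ = 1;  // digitalWriteFast(pin_, 1) on real hardware
      bus.endTransaction();
    }
    int pin() const { return pin_; }
    int level() const { return level_; }
  private:
    int pin_;
    long clockHz_;
    int level_ = 1;  // idle high
};
```

The same chip select object can be handed to whichever DMASPI instance runs the transfer, so it cannot end up talking to the wrong SPI. The price is the changed select()/deselect() signature, which is exactly the breaking change mentioned above.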

Any more suggestions or wishes? What would you do? If breaking changes are necessary, I'd rewrite the chip select interface and call it DMASPI_v2 or something like that, to clearly indicate that it's a different library.

Regards

Christoph
 
Another suggestion: your CS object allows you to pass in SPISettings, so for example you can choose what speed the bus should run at.

I wonder if there should also be a way to set a default SPISettings for your DMASPI object.

Maybe as an optional argument to begin or start?
 
Have a look at https://github.com/crteensy/DmaSpi/blob/master/ChipSelect.h - there is the abstract interface which doesn't do anything, and also doesn't take any constructor parameters, but if you scroll down to ActiveLowChipSelect you can see that it accepts a pin number and a const reference to SPISettings.

A default SPISettings object is used if no AbstractChipSelect is passed to the transfer object, in which case DMASPI will take care of starting and ending a transaction by itself. See https://github.com/crteensy/DmaSpi/blob/master/DmaSpi.h#L487
 
Hi Christoph.

Sorry - I did not type in my question very well. I was trying to say that I appreciate that the ActiveLowChipSelect allows you to choose the transfer speed...

What I was wondering is whether, if I don't use ActiveLowChipSelect, there is no other way to set up a different SPISettings.

Also not sure about the usage of DebugChipSelect or DummyChipSelect, as I believe if you choose either of these in your transfer object, then the underlying code will call it and NOT call SPI.beginTransaction/SPI.endTransaction (or, in the cases I was hoping to get working, SPI1.begin...
 
The point behind all these classes is mainly that: you can create your own chip select class which does what you want - be it debugging output, setting up the SPI, or correctly handling the actual hardware on the SPI bus. As the DMASPI handles a potentially large number of queued transfers, they must convey information about the bus settings, and that's what ActiveLowChipSelect does (settings and CS pin are handled).

Now that I'm writing this - are you suggesting adding an SPISettings object to the transfer descriptor, in case there's no chip select object? That would work, but would it make sense from a practical point of view? I can't imagine a use case for that right now.
 
Just to clarify: you'd like to have something like this?

Code:
class OnlySettingsChipSelect : public AbstractChipSelect
{
  public:
    OnlySettingsChipSelect(const SPISettings& settings)
      : settings_(settings)
    {
    }

    void select() override
    {
      SPI.beginTransaction(settings_);
    }

    void deselect() override
    {
      SPI.endTransaction();
    }
  private:
    const SPISettings settings_;

};
 
Again sorry, I am just trying to figure out the intent on how some of this is supposed to work. Again I have not done anything using the DMA stuff on teensy, other than this testing.

What I am seeing is the ChipSelect object appears to be responsible for two things.
1) Do any chip select stuff you need to do
2) Begin/End SPI transactions.

Maybe soon I might try playing with this, like transferring Servo information between Teensy and Odroid (or RPI or Up board). If so my guess is I will need to use your Active low chip select class as I don't see any other place where I can control the chip select.

That is, I don't believe you give the ability to set other additional options in the SPIx_PUSHR command, like the PCS fields. Again not sure how/if those work with DMA or not.

I was also trying to ask whether it was intentional that DebugChipSelect does not begin/end transactions, so it will use whatever was last set in the SPIx_CTAR0 register...

Again I am just trying to understand...
 
Quick FYIs

a) Did post a bit more T3.6 specific stuff up on other thread. Was able to get SPI2 version to work. Currently SPI.h is missing the external ... SPI2, which is why it would not compile. I put in pull request for this.

b) I also tested SPI1 stuff on Teensy LC. Used my own version of the CS object which does the begin/end Transactions to SPI1.

Probably done playing for now, unless there is something specific you would like tested.
 
It seems to be looking good so far, and I'm considering yet another way of making ActiveLowChipSelect a bit more flexible. The current template accepts neither an SPI class name nor a reference to an instance of some SPI, each of which would be needed to make it use the correct SPI. So my idea is something like this:

Instead of having

Code:
class ActiveLowChipSelect : public AbstractChipSelect

I'd rather use

Code:
template<typename SPI_CLASS = SPIClass, SPI_CLASS& m_SPI = SPI>
class ActiveLowChipSelect : public AbstractChipSelect
{
  ...
};

so one can specify which SPI class and SPI instance to use, as in

Code:
ActiveLowChipSelect<> myChipSelect_spi0(1, SPISettings(12000000, MSBFIRST, SPI_MODE2));
ActiveLowChipSelect<SPI1Class, SPI1> myChipSelect_spi1(1, SPISettings(12000000, MSBFIRST, SPI_MODE2));

But that still doesn't allow for any kind of automatic SPI selection, which would be desirable - one class or instance of DMASPI works on only one SPI, after all.
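
For anyone who wants to try the template idea without hardware, here is a small host-side sketch. MockSpiClass, MockSpi1Class, mockSpi and mockSpi1 are stand-ins I made up for the real SPI classes and instances (they only count transactions), and the settings are reduced to a clock value. The point is only to show that a class type plus a reference to an instance of it can both be template parameters, with defaults that pick the first SPI.

```cpp
#include <cassert>

// Stand-ins for the real SPI classes: same methods, no common base class.
struct MockSpiClass {
  int transactions = 0;
  void beginTransaction(long /*clockHz*/) { ++transactions; }
  void endTransaction() {}
};
struct MockSpi1Class {
  int transactions = 0;
  void beginTransaction(long /*clockHz*/) { ++transactions; }
  void endTransaction() {}
};

MockSpiClass mockSpi;    // plays the role of SPI
MockSpi1Class mockSpi1;  // plays the role of SPI1

class AbstractChipSelect {
  public:
    virtual ~AbstractChipSelect() = default;
    virtual void select() = 0;
    virtual void deselect() = 0;
};

// The SPI class and the SPI instance are both fixed at compile time.
template<typename SPI_CLASS = MockSpiClass, SPI_CLASS& m_SPI = mockSpi>
class ActiveLowChipSelect : public AbstractChipSelect {
  public:
    ActiveLowChipSelect(int pin, long clockHz) : pin_(pin), clockHz_(clockHz) {
      assert(pin >= 0);
    }
    void select() override {
      m_SPI.beginTransaction(clockHz_);
      level_ = 0;  // digitalWriteFast(pin_, 0) on real hardware
    }
    void deselect() override {
      level_ = 1;  // digitalWriteFast(pin_, 1) on real hardware
      m_SPI.endTransaction();
    }
    int pin() const { return pin_; }
    int level() const { return level_; }
  private:
    int pin_;
    long clockHz_;
    int level_ = 1;  // idle high
};
```

Naming only the class but not the instance, as in ActiveLowChipSelect<MockSpi1Class>, fails to compile because the default instance has the wrong type. That catches one kind of mistake, but nothing stops you from registering a chip select built for one SPI with a DMASPI object that drives another.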
 
Yep lots of ways to do this.

For example, Activate/deactivate could be changed to return something like a bool that says "do the default for me". There would need to be some way to tell the main DMASPI object which SPISettings to use...

Could change the AbstractChipSelect class to not be abstract, and have the base implementation know which SPI object it is dealing with and do the right thing. Then your subclasses can choose if/when to call the base class...

Could do it like MRAA-style code, which often has ways to register a function that overrides the default functionality and/or ways to register a pre function and a post function...

...
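
A rough host-side sketch of the pre/post function idea, just to show the shape of it. HookedDmaSpi and its method names are invented for this example and are not part of DmaSpi; the "default" behaviour is only logged as a string where the real object would begin and end an SPI transaction.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical DMASPI-like object with optional pre/post hooks.
class HookedDmaSpi {
  public:
    using Hook = std::function<void()>;

    void setPreTransfer(Hook hook) { pre_ = std::move(hook); }
    void setPostTransfer(Hook hook) { post_ = std::move(hook); }

    // Runs one transfer; a registered hook replaces the default behaviour.
    void runTransfer() {
      assert(!running_);  // hooks must not start another transfer
      running_ = true;
      if (pre_) {
        pre_();
      } else {
        log.push_back("default begin");  // e.g. begin a transaction with default settings
      }
      log.push_back("dma");
      if (post_) {
        post_();
      } else {
        log.push_back("default end");  // e.g. end the transaction
      }
      running_ = false;
    }

    std::vector<std::string> log;

  private:
    Hook pre_;
    Hook post_;
    bool running_ = false;
  };
```

On a microcontroller plain function pointers with a void* context argument would probably be a lighter choice than std::function; it is used here only to keep the sketch short.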
 
Or simply say ActiveLowChipSelect is deprecated because it only handles SPI, and not SPI1 or SPI2, and write a new class altogether. AbstractChipSelect should remain abstract and as generic as possible to have a flexible interface. The original idea was to have a chip select class that works independently of DMASPI, and could be used with "classic" SPI as well.
 
Regarding a slave part in DMASPI: I've thought about this on and off. There are several issues involved, making this a hard nut to crack. There are almost as many protocols as there are slave chips, and coming up with a library that - within certain limits - covers all of those use cases is probably impossible.

What can be done, though? The constraints are quite simple. When a chip select line is asserted, the slave must be virtually immediately ready to accept some kind of command over SPI. So to a certain extent things must be set up in advance. Converting this to interrupts and dma transfers, the following could happen: the negative edge on the chip select line enables a DMA transfer from the SPI's FIFO to some memory, until that memory is full. Any extra bytes following the completed DMA transfer end up in the SPI FIFO until they are read/discarded/whatever'ed. Later the chip select line is deasserted again, which is handled by the pin ISR by resetting the slave interface.

Based on the concept of chained transfers already used by the master implementation, a number of different base cases can be constructed:
  1. the dumb buffer slave: A transfer points to some array, which can be processed after CS is deasserted.
  2. command[1] - parameters[m] - data[n] slave: A chain of up to three transfers handles a command, parameters, and data. If several commands are accepted, they all have the same number of parameter and data bytes anyway, at least in terms of prepared transfers. If no data is transferred, or even no parameters, the already chained up transfers simply don't fire, but CS is caught anyway. Then command[parameters[data]] is processed.
  3. similar scenarios, but all in terms of chains of transfers.
This still lacks any MISO data. Responding in a complex fashion is really hard, because there's probably not much time to react. Simply returning a response byte as many ADCs do it (send something, get a status back) is certainly doable, but anything beyond that would require some sort of processing in between.

Another (rather simple to handle) aspect is that the master might send more data than required, or more than fits into the buffer. So a slave implementation needs some kind of /dev/null transfer that happily sinks away any data from the SPI FIFO that is not read by any previous transfer.
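
To illustrate the chain-plus-sink idea, here is a host-side model of what the chained transfers would do with the bytes that arrive while CS is asserted. Nothing here touches DMA or the SPI FIFO; SlaveTransfer and feedChain are names made up for this sketch, and a transfer with a null destination plays the role of the /dev/null sink.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One link in a chain of receive transfers (hypothetical, not part of DmaSpi).
struct SlaveTransfer {
  SlaveTransfer(std::uint8_t* d, std::size_t c) : dest(d), capacity(c) {}
  std::uint8_t* dest;        // nullptr marks a sink: bytes are counted, not stored
  std::size_t capacity;      // bytes this link accepts; ignored for a sink
  std::size_t received = 0;  // bytes this link has taken so far
};

// Distributes the bytes of one CS-asserted period over the chain, in order.
// Returns how many bytes no link accepted (always 0 if the chain ends in a sink).
inline std::size_t feedChain(std::vector<SlaveTransfer>& chain,
                             const std::uint8_t* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  std::size_t pos = 0;
  for (auto& link : chain) {
    if (link.dest == nullptr) {  // sink swallows everything that is left
      link.received += n - pos;
      pos = n;
      break;
    }
    while (pos < n && link.received < link.capacity) {
      link.dest[link.received++] = data[pos++];
    }
    if (pos == n) {
      break;  // later links simply never fire
    }
  }
  return n - pos;
}
```

With a chain of command[1], parameters[2], data[4] and a sink, a 9-byte message fills the three buffers and leaves 2 bytes in the sink; a 2-byte message leaves the data link untouched, which matches the "transfers simply don't fire" case.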

If anyone really needs a DMASPI slave, I'll try to help setting one up, just to get a feeling of what a real use case might look like.
 
Achieving the speed benefits of Dma+Spi

Applying the DmaSpi library to my project required a bit of trial and error. Hope what I found is helpful to others:

The example sketches for the DmaSpi library do not show how the functions would be used to take advantage of DMA. Each example includes the wait function (while (trx.busy()) {}) AFTER initiating the SPI transfer. This effectively makes the transfer equivalent to a non-DMA transfer, because the sketch waits until the DMA transfer is done. The process in the examples is:

1. Set up DMA transfer
2. Initiate DMA transfer
3. Wait until completed
4. Continue

To take advantage of DMA, step #3 should be removed:

1. Set up DMA transfer
2. Initiate DMA transfer
3. Continue (without waiting for transfer completion)

To avoid collisions between a new transfer request and the previous transfer, add the busy test BEFORE each transfer request.

1. Wait until the PREVIOUS transfer is completed
2. Latch high (latch PREVIOUS data into both controllers)
3. Latch low (Prepare for current transfer)
4. Set up DMA transfer
5. Initiate DMA transfer
6. Continue (without waiting for transfer completion)

In this process step #1 avoids collisions between the previous transfer and the current transfer request. This works with 'internal' or 'external' latching for a single transfer string. Note that latching the PREVIOUS transfer is done after the previous transfer is confirmed to be completed.
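
Here is a small host-side model of the difference between the trailing and the leading wait, counting "ticks" instead of microseconds. MockDma and the two loop functions are invented for this sketch and have nothing to do with the real DmaSpi classes; the model only assumes that a transfer keeps running while the CPU does other work.

```cpp
#include <cassert>

// Host-side stand-in for a DMA transfer that takes a fixed number of ticks.
class MockDma {
  public:
    bool busy() const { return remaining_ > 0; }
    void start(unsigned ticks) {
      assert(!busy());  // starting over a running transfer would be the collision
      remaining_ = ticks;
    }
    // Lets time pass; the transfer progresses whether or not the CPU is waiting.
    void advance(unsigned ticks) {
      remaining_ = remaining_ > ticks ? remaining_ - ticks : 0;
    }
  private:
    unsigned remaining_ = 0;
};

// Trailing wait: start, spin until done, then do the other work.
// Returns the ticks one pass through the loop takes.
inline unsigned loopTrailingWait(MockDma& dma, unsigned transferTicks, unsigned workTicks) {
  unsigned waited = 0;
  dma.start(transferTicks);
  while (dma.busy()) {
    dma.advance(1);
    ++waited;
  }
  dma.advance(workTicks);
  return waited + workTicks;
}

// Leading wait: spin only for what is left of the PREVIOUS transfer.
inline unsigned loopLeadingWait(MockDma& dma, unsigned transferTicks, unsigned workTicks) {
  unsigned waited = 0;
  while (dma.busy()) {
    dma.advance(1);
    ++waited;
  }
  dma.start(transferTicks);
  dma.advance(workTicks);  // the work overlaps with the running transfer
  return waited + workTicks;
}
```

In this model, if the other work takes at least as long as the transfer, the leading wait never spins at all; if the work is shorter, the loop is limited by the transfer time rather than by transfer time plus work time.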

In my application I want to send data to an RGB LED array with one byte for anode control, and a string of bytes for cathode control. Latching needs to occur only once after both data strings are sent to avoid flicker. In this case internal latching does not work, since the latch signal is lowered and raised twice – once for each string transferred. For a process including 2 back-to-back transfers, the DMA wait function is needed between the two transfers to avoid collisions. The two transfers could be combined by combining the two data strings, but this takes more time than two separate transfers.

1. Wait until the PREVIOUS transfer is completed
2. Latch high (latch PREVIOUS data into both controllers)
3. Latch low (Prepare for current transfer)
4. Set up DMA transfer (anode)
5. Initiate DMA transfer #1 (the shorter data string)
6. Wait until the 1st transfer is completed
7. Set up DMA transfer (cathode)
8. Initiate DMA transfer #2 (the longer data string)
9. Continue (without waiting for transfer completion)

The following sketch shows the process for a 1-byte anode transfer followed immediately by a 288-byte cathode transfer. On the Teensy 3.6 the advantage is apparent for a 288 + 1 byte transfer:

Non-DMA SPI transfer = 651 microseconds
DMA with trailing wait test = 610 microseconds
DMA with leading wait test = 19 microseconds. A speed increase of 32x, giving 590 microseconds back to MCU processing

Code:
//test the flow and latching of the DmaSpi library for test_DMAspi_lib_teensy

/*-------------my logic ---------------------------
* Requires 2 transfers per interrupt (anode and cathode data) with one latch afterwards.
*/

/** Hardware setup:
 Teensy 3.1, 3.2, LC, 3.5, 3.6:
 DOUT (11)
 DIN  (12)
 CLK  (13)
 SS   (10)
**/

#include <Arduino.h>
#include <SPI.h>
#include <DmaSpi.h>

uint8_t latchPin = 4;	// 10;

/** buffers to send from and to receive to **/
#define DMASIZE 289		//number of bytes
static uint8_t src[DMASIZE];
uint8_t testSize1 = 1;
uint16_t testSize2 = 288;

DmaSpi::Transfer trx(nullptr, 0, nullptr);	//initialize trx

void setup()
{
	Serial.begin(115200);
	delay(3);
	Serial.println("start");
	delay(3);

	/** set up SPI and DMA **/
	SPI.begin();
	DMASPI0.begin();
	DMASPI0.start();

	pinMode(latchPin, OUTPUT);
	digitalWrite(latchPin, HIGH);
	fillZero();
}

void loop(){
	static bool count = 0;

	while (trx.busy()) {}			//DO NOT latch until previous data sent - here too soon?
	digitalWrite(latchPin, HIGH);	//latch PREVIOUSLY sent data to channels
	digitalWrite(latchPin, LOW);	//accept new data

	trx = DmaSpi::Transfer(src, testSize1, nullptr);
	DMASPI0.registerTransfer(trx);		//send anode byte(s) - short string
	while (trx.busy()) {}			//first transfer must finish before trx is reused
	trx = DmaSpi::Transfer(src, testSize2+1, nullptr);
	DMASPI0.registerTransfer(trx);	//send cathode bytes - long string
	//don't wait for send complete - go off and do other things
	if(count){		//differentiate send sequences
		fill255();
	}
	else {
		fillZero();
	}
	count = !count;
	delay(50);
}

void fillZero(void){
	for(int i = 0; i <DMASIZE; i++){
		src[i] = 0;
	}
}

void fill255(void){
	for(int i = 0; i <DMASIZE; i++){
		src[i] = 255;
	}
}
 
First of all: sorry for the delay.

Regarding your remark about busy waiting: Indeed, the examples always check until a transfer is finished. This has one simple reason: I wrote the examples mainly to test each of the features, and to demonstrate how they are used. The outcome of each operation can only be checked for correctness once it is finished, so while(trx.busy()) is in there each and every time.

Regarding your application: You can queue up multiple transfers, you don't need to wait for the first to be finished before registering the second. See here: https://github.com/crteensy/DmaSpi/blob/master/examples/DMASpi_example1/DMASpi_example1.ino#L157
Alternatively you can just send the whole buffer instead of splitting it into one transfer with one byte and another with 288 bytes, by creating the combined transfer like this:
Code:
trx = DmaSpi::Transfer(src, DMASIZE, nullptr);
If you need two back-to-back transfers without deasserting the chip select line in between, you can create custom chip select classes that allow for this: One only asserts CS, one only deasserts CS. If you want to go that route, here's rough, untested, example code:

Code:
class AssertCS : public AbstractChipSelect
{
  public:
    AssertCS(const unsigned int& pin, const SPISettings& settings)
      : pin_(pin),
      settings_(settings)
    {
      pinMode(pin, OUTPUT);
      digitalWriteFast(pin, 1);
    }

    void select() override
    {
      SPI.beginTransaction(settings_);
      digitalWriteFast(pin_, 0);
    }

    void deselect() override
    {
    }
  private:
    const unsigned int pin_;
    const SPISettings settings_;

};

class DeassertCS : public AbstractChipSelect
{
  public:
    DeassertCS(const unsigned int& pin)
      : pin_(pin)
    {
    }

    void select() override
    {
      // nothing to do: AssertCS already began the transaction and asserted CS
    }

    void deselect() override
    {
      digitalWriteFast(pin_, 1);
      SPI.endTransaction();
    }
  private:
    const unsigned int pin_;
};

AssertCS assertCs(somePin, SPISettings());	// not named "assert", which collides with the assert() macro
DeassertCS deassertCs(somePin);

DmaSpi::Transfer trx1(src1, SIZE1, nullptr, 0, &assertCs);
DmaSpi::Transfer trx2(src2, SIZE2, nullptr, 0, &deassertCs);

SPI.begin();
DMASPI0.begin();
DMASPI0.start();

DMASPI0.registerTransfer(trx1);
// make sure no other transfer is registered between these two!
DMASPI0.registerTransfer(trx2);

I hope this clarifies things and helps. I think putting the CS logic into those CS classes is a better concept, but in the end it's up to you.

Also please yell if I got that latching concept wrong. Can you show us a timing diagram of that?
 
Does this library support the use of a Teensy 3.6 as an SPI slave to say a Raspberry Pi 3 B or some other SPI master device?

I am especially interested in multi-Teensy configurations like the following (sorry for my vain attempt at ASCII artistry, but it gets my point across):

Raspberry Pi 3 B or other device (SPI Master)
^
|
SPI
|
v
#===> Teensy 3.6 (SPI Slave #1)
^
|
v
#===> Teensy 3.6 (SPI Slave #2)
^
|
v
#===> Teensy 3.6 (SPI Slave #3)
^
|
v
...
^
|
v
#===> Teensy 3.6 (SPI Slave #N)
 
No, slave operation is not yet supported and it would probably end up in a separate library. However, this request has come up a couple of times recently and I'm thinking about different ways of supporting slave operation.

Do you have a specific protocol in mind? Do you want to emulate a certain device or just have a way of sending a large array from your pi to the teensy?
 
Just want "as fast as possible" data transfer from the Teensys, which collect the data, to the Pi (or other master device that captures telemetry data coming from multiple sources). This is a fairly common use case.
 
If you want "as fast as possible", it is better to make the Teensy the master.
Or is the Pi not fast enough?
 
If you declare the Teensy as master, how does the many-Teensys-to-one-Pi setup work out (see the picture in post #17, above)?
 
Even if that seems to be a simple case, a library should provide features to handle at least these cases gracefully:
  • The master reads too little data
  • The master reads exactly the right amount of data
  • The master tries to read too much data
  • The master tries to read data before a new set of data is ready
  • The master doesn't read data, but new data is available
Also, how does the master know when it can read data? How does the master know if it has skipped data?
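
One possible (and by no means the only) answer to the "skipped data" question is a small header in front of every data set. The sketch below is master-side bookkeeping only and runs on a PC; FrameHeader, SequenceTracker and copyLength are assumptions made up for this example, not an existing protocol or library feature. It does not address how the master learns that new data is ready in the first place; that would need something like a separate data-ready line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical header the slave would put in front of every data set.
struct FrameHeader {
  std::uint16_t sequence;  // incremented by the slave for each new data set
  std::uint16_t length;    // number of valid payload bytes in this frame
};

enum class ReadResult { NewData, Repeated, Skipped };

// Master-side bookkeeping: classifies each frame by its sequence number.
class SequenceTracker {
  public:
    ReadResult update(const FrameHeader& header) {
      if (!hasLast_) {
        hasLast_ = true;
        last_ = header.sequence;
        return ReadResult::NewData;
      }
      // Modulo-65536 difference, so wrap-around from 65535 to 0 counts as +1.
      const std::uint16_t delta = static_cast<std::uint16_t>(header.sequence - last_);
      last_ = header.sequence;
      if (delta == 0) {
        return ReadResult::Repeated;  // read again before new data was ready
      }
      missed_ += static_cast<unsigned>(delta - 1);
      return delta == 1 ? ReadResult::NewData : ReadResult::Skipped;
    }
    unsigned missed() const { return missed_; }
  private:
    bool hasLast_ = false;
    std::uint16_t last_ = 0;
    unsigned missed_ = 0;
};

// How many payload bytes the master may copy out of what it clocked in.
inline std::uint16_t copyLength(const FrameHeader& header, std::uint16_t bytesRead) {
  assert(bytesRead > 0);
  return header.length < bytesRead ? header.length : bytesRead;
}
```

The length field covers the "too little" and "too much" cases on the master side: it copies at most what the slave declared valid, and at most what it actually clocked in.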
 
Running Teensy SPI as slave (in particular with DMA) is not difficult, and I guess Christoph will come up with a solution.
Note however that running the Teensy as slave reduces the speed by a factor of two compared to running it as master.
Your schematic is not clear to me. What is it exactly?
A)
PI <-> T3.6 <-> T3.6 <-> T3.6
or B)
PI <-> T3.6
   <-> T3.6
   <-> T3.6

In other words, are the Teensys in series or in parallel?
In case A, each Teensy would communicate with its left neighbour as slave and with its right neighbour as master (this would work with TLC and T3.5/6, as they have two or more SPI ports).
In case B, each Teensy is a slave, and the Pi uses multiple chip selects to address the Teensys. Only one Teensy can be serviced at a time, but begin/endTransaction handle this.
 
Sorry if this is a stupid question, but can I use this to read a stream of information from a SPI Flash IC in DMA mode? All I see is DMA transmission, not receiving.
 