How best to manage multiple SPI busses?

Status
Not open for further replies.

KurtE

Senior Member+
Now that several of the Teensy boards now support multiple SPI busses, it would be great if we could standardize a way to allow different drivers such as display drivers to support devices on the different SPI busses.

Currently with the Teensy we do support the different SPI busses, however each of the SPI busses are currently supported by their own class.

That is the SPI object is a member of the SPIClass
On the T-LC as well as the T3.5/6 the SPI1 object is a member of the SPI1Class
And also on T3.5/6 the SPI2 object is a member of the SPI2Class.

So for example if you wished to make a display driver like the ili9341_t3 library support a display on any of these busses, you have to do something like hard code code everywhere that does something like:
Code:
if (spi_buss == 0)
    SPI.transfer(x);
#if defined(__MK64FX512__) || defined(__MK66FX1M0__) || defined(KINETISL)
else if (spi_buss == 1)
    SPI1.transfer(x);
#endif
#if defined(__MK64FX512__) || defined(__MK66FX1M0__) 
else if (spi_buss ==2) 
    SPI2.transfer(x);
#endif

Which is not great...

To work around this I developed my own library SPIN (https://github.com/KurtE/SPIN), which created a base class for all three and then subclasses of my base class for each of the SPI busses.

The library is pretty basic, the basic functions, simply create virtual functions for each of the main SPClass functions and then have subclasses which implement this by simply calling off to that busses version of the buss. I also added a few helper functions that I thought was appropriate like being able to ask if a given pin is valid as a MISO pin or a MOSI pin or a SCLK pin... And likewise some helper functions to manage the FIFO queues on the T3.x processors...

For the most part this has worked out for me to adapt some of these other driver libraries to work on the different busses.

BUT, I wonder if this is the approach to take long term. Or should we do for SPI what Paul has recently done for Wire and update the SPI library to be closer to what some of the other Arduino platforms are doing.

That is for example if you look at current implementation for boards like the Arduino Zero, you will see that all of the sPI busses are instances of the main SPI class (SPIClass)

There constructor, takes some parameters to create the instances. In their case it takes a pointer to a SERCOM object, which does all of the real work, but I would imagine we could do it in a similar way to what Paul did for the Wire objects...
Example their implemention for the simple transfer method looks like:
Code:
byte SPIClass::transfer(uint8_t data)
{
  return _p_sercom->transferDataSPI(data);
}
This obviously works. Sort of like having the main SPIClass object hold a pointer to my SPIN object and calling it...

What I wonder is, does it make sense to migrate toward the Arduino Zero like solutiion, such that hopefully over a short period of time, we could update a lot of the device libraries to support multiple busses and hopefully have it supported on most of the platforms that have multiple busses?

Does this make sense? Suggestions?
 
I'd love to see a common base class for the different SPIs, and I asked for one loooong ago. Back then, Paul didn't think it was necessary to have such a base class. It might be a good idea to think about this again.
 
Performance is a big concern with SPI and the added overhead of virtual functions. There are a lot of cases to consider, so just one quick test will not make a compelling argument.

I am willing to consider alternatives, but their performance impact has to be well tested and understood. So far, I've done pretty much zero work investigating this, and honestly it'd not a priority at this time. Audio library features and website updates are currently near the top of my todo list.
 
Thanks Paul,

Yes - I totally understand you wanting to spend time with the Audio stuff. And I totally understand and hope for website updates hopefully including a WIKI...

If there is any interest in it, I am willing to do the main part of the work, obviously it would be great if others were willing to help.

The purpose for this would be more for compatibility and allowing libraries like displays and the like to be adapted to allow use on different spi busses. If this could be done in a standard way, than maybe places like Adafruit, sparkfun, ... who have libraries will either change or accept changes to allow their devices to easier work on the different busses. For example I think Adafruit might be interested as their Feather boards support this, and they talk about how to create the objects in: https://learn.adafruit.com/using-atsamd21-sercom-to-add-more-spi-i2c-serial-ports

Performance: I believe that a virtual calls overhead is negligible. Using my SPIN library I saw very little differences. Also if I were to do it to main library, not sure how much would need to be virtual. For example if the object is holding onto a pointer to the appropriate SPI register set like lets call it pspi,

Then transfer changes from:
Code:
	inline static uint8_t transfer(uint8_t data) {
		SPI0_SR = SPI_SR_TCF;
		SPI0_PUSHR = data;
		while (!(SPI0_SR & SPI_SR_TCF)) ; // wait
		return SPI0_POPR;
	}
To:
Code:
	inline static uint8_t transfer(uint8_t data) {
		pspi->SR = SPI_SR_TCF;
		pspi->PUSHR = data;
		while (!(pspi0>SR & SPI_SR_TCF)) ; // wait
		return pspi->POPR;
	}

And personally the current class definition sucks for performance. The biggest performance issue is there is not valid way to keep the SPI queue (or double buffer) going at anywhere near full speed. It is always in a loop to push, wait, pop.... Even if you don't care at all about what comes back on the pop.. So queues and double buffering is neutered.

With this last build, I was able to speed up the multi-byte transfer by quite a bit, by being able to manage the queue, and I would probably use it in places like the SSD1306 driver, except it destroys the buffer (it puts the results out into the same buffer). If a new member was added, to allow a transfer like: transfer(source_buf, result_buf, count), I think it would allow people to easily speed up libraries.

Example the Sparkfun_TeensyView code. for updating the SSD1306 display, where for each page, you output a N byte header, change the DC pin and then output 128 bytes of data. For the 128 bytes of data, I have my code that manages the queues and/or double buffer,
But this could easily be done by: transfer(screenmemory[page*128], NULL, 128);
And get almost as much speed...

I could do it with the current api, by maybe first doing a memcpy from the screen memory to temporary array, and pass array to transfer... But I just lost the majority of speedup... Also adds delays....

But if there is not much interest, I do have other projects I will probably spend my time on. Probably get back to more of my Robotics stuff.
 
Indeed the SPI library API was very poorly design. I remember when it happened... David Mellis announced it with little notice before a release, and (as usual for David) rejected all feedback.

This sort of SPI API stuff needs to be adopted upstream with Arduino. I agree, it's sorely needed now that so many boards have multiple SPI ports.

For the last couple years, the Arduino.cc devs have been rather distracted by their legal troubles with Musto & Arduino.org. They're still working things out with a rather complex settlement process. I'm starting to see signs things are likely to improve soon. I really can't speak about this right now, other than to say I believe we'll likely see things improve later this year.
 
KurtE, I signed up for the E-paper shield (https://www.crowdsupply.com/soniktech/e-paper-shield-kit) after the initial crowd funding campaign. They claim the second wave of boards will start shipping at the end of April.

I looked into the code, and it appears this board can't be optimized, since they use non-standard CS/DC pins. Though I'm not sure it is important to optimize an e-paper display.

As far as I can tell from the code, the pinout looks like:
  • Pin 11, MOSI
  • Pin 13, SCLK
  • Pin 14/A0, BS (appears to be input to signal when the board is ready)
  • Pin 15/A1, RS (reset I assume)
  • Pin 16/A2, DC (I assume SPI D/C pin)
  • Pin 17/A3, CS (chip select)
  • Pin 18/A4, i2c SDA
  • Pin 19/A5, i2c SCL

It looks like there is a call to SPI.beginTransaction that is commented out, and no call to SPI.endTransaction.
 
Thanks Paul,

I am looking at seeing how hard it would be to make it work like the Wire library. The one area I am running into issues on doing the same conversion is that the SPI objects use the SPCR SPCR1 SPCR2 objects which don't have common base class or anything.
These are mainly used for things like setMiso... enable_pins.

The obvious simple fix would be to move ore replicate these into the SPI object.
 
I don't have much to add other than to say that I would love to see these features. I have a few libraries right now which support SPI communication to sensors; I use the define statements to switch between SPI buses. It works, but leads to many more lines of code and more time spent on maintenance. I'd be willing to help test any proposed solutions using my sensor libraries - I currently have at least one of each Teensy 3.x and LC device available for testing.

Brian
 
I am making some progress with making one class to handle all of the SPI objects. Actually two. One does T3.x and one for TLC.

So far things are compiling. I have been able to run my Teensyview updated code with this library and it appears to be still running on default SPI pins... For the most part this test is bypassing using transfer methods... Will now build a version of this library that does the same without SPIN library installed... And see how well it works on different SPI busses.

If anyone wishes to look at the WIP I pushed up a version of Wire up to my fork:
https:://www.github.com/kurte/Wire In a new branch SPI-Multi-one-class
 
I am making some progress with making one class to handle all of the SPI objects. Actually two. One does T3.x and one for TLC.

So far things are compiling. I have been able to run my Teensyview updated code with this library and it appears to be still running on default SPI pins... For the most part this test is bypassing using transfer methods... Will now build a version of this library that does the same without SPIN library installed... And see how well it works on different SPI busses.

If anyone wishes to look at the WIP I pushed up a version of Wire up to my fork:
https:://www.github.com/kurte/Wire In a new branch SPI-Multi-one-class

Hi Kurt,

I think you meant this repo, instead of wire: https://github.com/KurtE/SPI/tree/SPI-Multi-one-class

Also, I tested your SPIN library with a sensor library WIP that I'm working on (VectorNav VN-100 and VN-200). It works well and I'm not getting much of a performance hit; on the order of about 3-4 us.
 
I am making some progress with making one class to handle all of the SPI objects.

I know this is painful, but have you reached out to Arduino devs and other board makers via Arduino's developer mail list? Yes, I know that usually just ends up with a ton of bikeshedding, but it really is important for this to get adopted upstream.
 
Thanks Paul,

I think trying to break this up into multiple semi independent things. Some I know would need to probably reach out to the mail list. Thought about it yesterday until I saw email saying it will be down for 24 hours...

---------------------------------------------------------

1) Having all of the SPI busses supported by one object SPIClass using stuff passed into constructor. This part I don't believe needs to be discussed up there much as this is already done with the Arduino M0 (probably M4) versions of SPI library, and likewise for the Adafruit versions of ARM boards.

---------------------------------------------------------

2) Maybe add some more query functions to better support multiple busses. They already have: setMISO, setMOSI, setSCK for setting this pins.

Teensy already has functions for checking/using CS pins: setCS, pinIsChipSelect

I like the pinIsChipSelect functionality and add similar: pinIsMISO, pinIsMOSI, pinIsSCK - As to allow apps/libraries be developed that can do validation without resorting to putting a bunch of if (miso == x or miso == y) with a bunch of #ifdef by is it a T3.2 or a 3.6....
This part I would talk about in email list.

I would also like to add some #define like you did in Wire: #define SPI_IMPLEMENT_SPI SPI_IMPLEMENT_SPI1 SPI_IMPLEMENT_SPI2

With this some libraries could support multiple SPI busses without the calling app having to do anything. For example the Sparkfun_Teensyview library has parameters for MOSi, SCK, and CS, and DC. The underlying library could then do a set of queries to find out which buss these pins belong to.

2a) I also have a couple of more teensy specific query functions, that I currently have that are needed to easily support the actual hardware, which is processor specific. For T3.x I have:
inline KINETISK_SPI_t& SPIRegisters() {return port;}
inline uint8_t queueSize() {return hardware.queue_size;}
For TLC I have:
inline KINETISL_SPI_t& SPIRegisters() {return port;}
inline uint8_t queueSize() {return 0;}
These allow me to more easily have a library that touches the direct hardware to get the information it needs. Could be done external, by knowing which SPI object it has and then deduce this information.

---------------------------------------------------------

3) Performance (if done right) adding a member transfer, maybe like: void transfer(void *txbuf, void *rxbuf, size_t count)
Either pointer can be NULL, on txbuf a NULL implies send a 0 for each transfer. A null for rxbuf says I don't care about the return values.
I will implement the current one that take one pointer by simply calling the new one with the same pointer twice (inline)

Will talk about on email list, but trying to decide when. My thought is to do it first and worst case the two pointer version becomes private. But I believe I need ammunition to have it accepted. So my thoughts are, almost need three versions of a usage of this, example Teensyview.
a) Straight SPI - one byte transfers... Show the speed.

b) Version using hardware queue/double buffer. - Will have the best speed, but has lots of nitty gritty of library having to know about the libraries, muck with registers.

c) A version using new API. The code in this library should be pretty simple,
For each page: Assert DC, output header stuff (maybe 4 bytes), unassert DC, output 128 byte data, repeat...
Hopefully the performance should get pretty close to b)

c2) maybe a version that uses current api, having an extra 128 byte buffer we copy buffer into first.... See if any better than a)

Again I agree I should discuss this up on email list. Question is when is the best time to do so?
 
Quick update:
Did some more testing. Ran the Teensyview code on all three busses on T3.6. Then ran on both buses on TLC. FIxed TLC issue.

Decided to not do the SPI_IMPLEMENTED_SPI like defines I mentioned above, but instead defined how many SPI busses there are, like:
Code:
#ifndef SPI_INTERFACES_COUNT
#if defined(__MK20DX128__) || defined(__MK20DX256__)
#define SPI_INTERFACES_COUNT 1
#elif defined(__MK64FX512__) || defined(__MK66FX1M0__)
#define SPI_INTERFACES_COUNT 3
#elif defined(KINETISL)
#define SPI_INTERFACES_COUNT 2
#endif
#endif
As that appears to be the way it is done in some of Arduino ARM versions.

I also updated my test version of Teensyview library to automatically guess which SPI buss to use, like:
Code:
	if (!_spi->pinIsMOSI(mosiPin) || !_spi->pinIsSCK(sckPin)) {
		#if SPI_INTERFACES_COUNT > 1
		if (SPI1.pinIsMOSI(mosiPin) && SPI1.pinIsSCK(sckPin)) {
			_spi = &SPI1;
		}
		#if SPI_INTERFACES_COUNT > 2
		else if (SPI2.pinIsMOSI(mosiPin) && SPI2.pinIsSCK(sckPin)) {
			_spi = &SPI2;
		}
		#endif  // > 2
		else
		#endif
		{
			Serial.print("Not valid SPI pins");
		}

	}

	_spi->setMOSI(mosiPin);	// set the mosi pin. 
	_spi->setSCK(sckPin);
The mentioned branch of my fork has been updated with current stuff.

I may also push up the update teensyview stuff in it's on branch.

Next up see how about adding the transfer with separate TX and RX pointers: 3) I in previous posting
 
Again quick update: I updated the current multi-byte transfer code to handle two pointers TX, RX buffer pointers and count. I then had the existing API call it inline with the same pointer to both. Added code to check for NULL pointers and either send 0 or not do anything with data received. Which is now building.

I then hacked up the Teensyview update code, to use this new function.
The function I updated looked like:
Code:
void TeensyView::display(void) {
	uint8_t i, j;
	uint8_t *pscreen_data = screenmemory; 
	uint8_t count_pages = (_height/8);
	beginSPITransaction();
	for (i=0; i<count_pages; i++) {
		setPageAddress(i);
		setColumnAddress(0);
		for (j=0;j<LCDWIDTH;j++) {
			if ((j == (LCDWIDTH-1)) && (!_pcs_command || (i == (count_pages-1))))
				data(*pscreen_data, true);
			else
				data(*pscreen_data, false);
			pscreen_data++;
		}
	}
	endSPITransaction();
}

Which then the Set Page/Column output 4 command bytes, which I have specialty code to try to keep the Queue full on T3.x and double buffered on TLC... Then calls data which outputs bytes and again tries to do all of the specialty stuff to keep queue full, then if hardware DC/CS handles the pushr/popr... stuff.

The newer code is pretty simple, with a little cheat:
Code:
void TeensyView::display(void) {

	uint8_t set_page_column_address[] = {0xb0, 0x21, 0, 0x7f};  // set to page 0 column 0
	uint8_t i;
	uint8_t *pscreen_data = screenmemory; 
	uint8_t count_pages = (_height/8);
	beginSPITransaction();
	for (i=0; i<count_pages; i++) {
		// Output the header
		*_dcport  &= ~_dcpinmask; // assert DC
		set_page_column_address[0] = 0xb0 + i;
		_spi->transfer(set_page_column_address, NULL, sizeof(set_page_column_address));

		// Now go into data mode
		*_dcport  |= _dcpinmask;
		_spi->transfer(pscreen_data, NULL, LCDWIDTH);
		pscreen_data += LCDWIDTH;
	}
	endSPITransaction();
}
The cheat is I have simple array for output page and column address. The beginSPI/endSPI functions are there to handle differences of knowing if I can have the CS setup as part of the pushr stuff or if I manually have to do it. Again if I were to go this way for this library,
I would simply remove these and simply do spi->beginTransaction(...), _csport &= ...

The good news is for this display the code is more or less as fast as doing all of the custom PUSHR (3.x) or trying to double buffer on LC. Actually the LC sped up a lot this way...

Speed for 128x32 on T3.2 96mhz (built using Stino) is now about 547us so I am happy with it!
 
...
Speed for 128x32 on T3.2 96mhz (built using Stino) is now about 547us so I am happy with it!

Nice work Kurt. Did you run the SparkFun FFT sample with the update time on screen that was showing 9-11ms in their vid? I got my two but as always soldering them for 'best' use slows me down - so I've kept myself busy with other new 'toys' rather than facing that quandary :)
 
More background information,

Trying to figure out a general idea of some of the capabilities of different implementations to see what to propose.

Arduino AVR:
Single SPI, supports transfer, transfer16, transfer(buf, cnt).

Arduino Due: Should our version still have some support for this? Only makes sense if they install in sketch/library folder...
Arduino version: Looks like only single SPI
Transfer functions, look like they added duplicates of each of them with extra parameter to say is the the SPI_LAST or SPI_CONTINUE. Probably has to do with what to do with CS pin....

Intel Edison: i686
No Transaction support
Only SPI object
No setting of mosi/miso/sck
No transfer16
multi byte transfer with Read/Write pointers: void transferBuffer(const uint8_t *, uint8_t *, uint32_t);

Intel Galileo: i586 Same as Edison

ESP32 - Looks like it supports multiple, has transactions
They have transfer, transfer16, transfer32.
They also multibyte transfers:
void transferBytes(uint8_t * data, uint8_t * out, uint32_t size);
void transferBits(uint32_t data, uint32_t * out, uint8_t bits);

Will continue to look. But so far it looks like many platforms have the ide of two pointers, but no standard name. Even Arduino DUE api is different than their AVR version.
 
Nice work Kurt. Did you run the SparkFun FFT sample with the update time on screen that was showing 9-11ms in their vid? I got my two but as always soldering them for 'best' use slows me down - so I've kept myself busy with other new 'toys' rather than facing that quandary :)
This morning I tried hooking up audio adapter... There code is setup to take line in values which I don't have setup on it... But have the microphone plugged in...

I commented out the attempt to change SPI object in Serial flash code so I could build.

Currently running on my updated version, that is simply using my new transfer(out, in, cnt) version, which is not making any use of hardware CS/DC and it looks like the display is updating every 2-3ms CPU=99 mem=18... The display sort of flickers... as it updates this fast...

Might try again in a bit with turning on the other more complex code that tries to use the hardware SPI. However probably won't be much different, as the SPI.transfer(buf... ) code works pretty well and uses the 16 bit writes and the like to keep the buss as full as possible. But has to wait until queue is empty before changing DC (so probably 8 times per screen update)
 
I suppose the audio lib runs as fast as it can and doesn't count display update time? CPU use on SFUN vid was 77% now up to 99% when loop was 8-11ms. { crossed thread here }

Interesting you are just seeing the flicker doing the bar updates when at loop=2-3ms. I'll look to make time to hook one up to the PJRC T_3.2 Audio demo protoboard in place of the TFT and use the HDW SPI lines. I have a bottom pinned audio board I made for the K66 beta - have to check I can do that and get out the other SPI's.
 
The last test was done with a not very ideal setup for SPI speed.
L1000409.jpg

But as you can see Audio adapter plugged into top of Teensy, plus then used jumpers to test setup...
 
Thanks - if that works with those wires I should be able to match it. I'll just pin one TeensyView like yours into a baby breadboard.
 
Tried on the T_3.2 audio tutorial board - failed to connect somehow, conflict with analog or dome other chosen pin - just got garbage of a sort, sometimes real stuff - moved to a lone T_3.2 same pins to confirm I didn't break the display and it runs.

Just noticed the VIDEO is done with a T_3.6 on the 8-11ms updates at FFT usage of 77% - as noted on SparkFun 99% is the twin FFT's on a T_3.1. So maintaining the FFT's and faster refresh is impressive. Though bumping up SPI clock does a lot to the samples I ran.

Looks like the audio board sample does FFT1024 and only shows the 40 bins that fit on screen. FFT256 would be a better fit.

Some of the screen flicker might go away if "oled.display();" was only done when an FFT was actually available on that loop()? Then refresh would only be ~86/sec - or up to twice that if the FFT's are not in sync - and not re-writing the same data on the many extra loops.

I'll move to other audio board and try on T_3.2,3.6 and see what I see, otherwise this post mostly should be on the TView thread.
 
Thought it might be useful, probably part of email to email group talking about different options on SPI.transferish functions and possible SPI speed. So I thought I would extract out the update screen function and do it a few different ways.

One that simply uses SPI.transfer(byte) to output the whole thing. One that uses the current buffer output function, that uses the same buffer for both reading and writing so I have to copy the data out to a buffer then do the transfer(buf, cnt) call. This one uses 128 byte buffer on stack.


And then one that uses the new function I added: transfer(write buff, rd buf, cnt).
Code is self contained, other than it uses SPI.

Code:
// Quick and dirty extract of parts of Teensyview to compare speed of different SPI methods to compare
// speed.
#include <SPI.h>
// #include <SPIKinetis.h>
#define LCDWIDTH 128
#define LCDHEIGHT 32
#define PIN_RESET 15
#define PIN_SCK 13
#define PIN_MOSI 11
#define PIN_DC 9
#define PIN_CS 10

uint8_t pcs_data = 0;
uint8_t pcs_command = 0;

uint8_t screenmemory[512] =
{
	0
};

// uint8_t screenmemory[(LCDHEIGHT * LCDWIDTH) /8] = {0};
uint8_t _height = 32;
volatile uint8_t * _csport, * _dcport;
uint8_t _cspinmask, _dcpinmask;
uint32_t clockRateSetting = 8000000;
void setup()
{
	pinMode(PIN_CS, OUTPUT);
	_csport = portOutputRegister(digitalPinToPort(PIN_CS));
	_cspinmask = digitalPinToBitMask(PIN_CS);
	* _csport |= _cspinmask;
	pinMode(PIN_DC, OUTPUT);
	_dcport = portOutputRegister(digitalPinToPort(PIN_DC));
	_dcpinmask = digitalPinToBitMask(PIN_DC);
	SPI.begin();
}

void loop()
{
	uint32_t start_time = micros();
	display_transfer_byte();
	Serial.print("Simple one byte output time: ");
	Serial.println(micros() - start_time, DEC);
	delay(5);
	start_time = micros();
	display_transfer_buf();
	Serial.print("Transfer using new buffer transfer time: ");
	Serial.println(micros() - start_time, DEC);
	delay(5);
	start_time = micros();
	display_transfer_buf_old();
	Serial.print("Transfer using old buffer transfer time: ");
	Serial.println(micros() - start_time, DEC);
#ifdef KINETISK
	delay(5);
	pcs_data = 0;
	pcs_command = pcs_data | SPI.setCS(PIN_DC);

	start_time = micros();
	display_transfer_hardware();
	Serial.print("Transfer using pushr time: ");
	Serial.println(micros() - start_time, DEC);

	// Make DC an output pin again 
	pinMode(PIN_DC, OUTPUT);

#endif	
	Serial.println();


	delay(1000);
}

void display_transfer_buf(void)
{
	uint8_t set_page_column_address[] =
	{
		0xb0, 0x21, 0, 0x7f
	}; // set to page 0 column 0
	uint8_t i;
	uint8_t * pscreen_data = screenmemory;
	uint8_t count_pages = (_height/8);
	SPI.beginTransaction(SPISettings(clockRateSetting, MSBFIRST, SPI_MODE0));
	* _csport &= ~_cspinmask;
	for (i = 0; i < count_pages; i++)
	{
		// Output the header
		* _dcport &= ~_dcpinmask; // assert DC
		set_page_column_address[0] = 0xb0 + i;
		SPI.transfer(set_page_column_address, NULL, sizeof(set_page_column_address));
		// Now go into data mode
		* _dcport |= _dcpinmask;
		SPI.transfer(pscreen_data, NULL, LCDWIDTH);
		pscreen_data += LCDWIDTH;
	}
	* _csport |= _cspinmask;
	SPI.endTransaction();
}

void display_transfer_buf_old(void)
{
	uint8_t set_page_column_address[] =
	{
		0xb0, 0x21, 0, 0x7f
	}; // set to page 0 column 0
	uint8_t i;
	uint8_t * pscreen_data = screenmemory;
	uint8_t count_pages = (_height/8);
	uint8_t tmp_buff[LCDWIDTH];
	SPI.beginTransaction(SPISettings(clockRateSetting, MSBFIRST, SPI_MODE0));
	* _csport &= ~_cspinmask;
	for (i = 0; i < count_pages; i++)
	{
		// Output the header
		* _dcport &= ~_dcpinmask; // assert DC
		set_page_column_address[0] = 0xb0 + i;
		memcpy(tmp_buff, set_page_column_address, sizeof(set_page_column_address));
		SPI.transfer(tmp_buff, sizeof(set_page_column_address));
		// Now go into data mode
		* _dcport |= _dcpinmask;
		memcpy(tmp_buff, pscreen_data, LCDWIDTH);
		SPI.transfer(tmp_buff, LCDWIDTH);
		pscreen_data += LCDWIDTH;
	}
	* _csport |= _cspinmask;
	SPI.endTransaction();
}

void display_transfer_byte(void)
{
	uint8_t i;
	uint8_t * pscreen_data = screenmemory;
	uint8_t count_pages = (_height/8);
	SPI.beginTransaction(SPISettings(clockRateSetting, MSBFIRST, SPI_MODE0));
	* _csport &= ~_cspinmask;
	for (i = 0; i < count_pages; i++)
	{
		// Output the header
		* _dcport &= ~_dcpinmask; // assert DC
		SPI.transfer(0xb0 + i);
		SPI.transfer(0x21);
		SPI.transfer(0);
		SPI.transfer(0x7f);
		// Now go into data mode
		* _dcport |= _dcpinmask;
		for (uint8_t j = 0; j < LCDWIDTH; j++)
		{
			SPI.transfer(* pscreen_data++);
		}
	}
	* _csport |= _cspinmask;
	SPI.endTransaction();
}

// Try using Hardware buffers. 
// Note for now I will only do this for DC...


#ifdef KINETISK
	//void waitFifoNotFull(void) __attribute__((always_inline)) {
	void waitFifoNotFull(void) {
		uint32_t sr;
		uint32_t tmp __attribute__((unused));
		do {
			sr = KINETISK_SPI0.SR;
			if (sr & 0xF0) tmp = KINETISK_SPI0.POPR;  // drain RX FIFO
		} while ((sr & (15 << 12)) > (3 << 12));
	}
	void waitFifoEmpty(void) {
		uint32_t sr;
		uint32_t tmp __attribute__((unused));
		do {
			sr = KINETISK_SPI0.SR;
			if (sr & 0xF0) tmp = KINETISK_SPI0.POPR;  // drain RX FIFO
		} while ((sr & 0xF0F0) > 0);             // wait both RX & TX empty
	}
	inline void waitTransmitComplete(void)  {
		uint32_t tmp __attribute__((unused));
		while (!(KINETISK_SPI0.SR & SPI_SR_TCF)) ; // wait until final output done
		tmp = KINETISK_SPI0.POPR;                  // drain the final RX FIFO word
	}
	inline void waitTransmitComplete(uint32_t mcr)  {
		uint32_t tmp __attribute__((unused));
		while (1) {
			uint32_t sr = KINETISK_SPI0.SR;
			if (sr & SPI_SR_EOQF) break;  // wait for last transmit
			if (sr &  0xF0) tmp = KINETISK_SPI0.POPR;
		}
		KINETISK_SPI0.SR = SPI_SR_EOQF;
		SPI0_MCR = mcr;
		while (KINETISK_SPI0.SR & 0xF0) {
			tmp = KINETISK_SPI0.POPR;
		}
	}
	inline void writecommand_cont(uint8_t c)  {
		KINETISK_SPI0.PUSHR = c | (pcs_command << 16) | SPI_PUSHR_CTAS(0) | SPI_PUSHR_CONT;
		waitFifoNotFull();
	}
	inline void writedata8_cont(uint8_t c)  {
		KINETISK_SPI0.PUSHR = c | (pcs_data << 16) | SPI_PUSHR_CTAS(0) | SPI_PUSHR_CONT;
		waitFifoNotFull();
	}
	inline void writedata16_cont(uint16_t d)  {
		KINETISK_SPI0.PUSHR = d | (pcs_data << 16) | SPI_PUSHR_CTAS(1) | SPI_PUSHR_CONT;
		waitFifoNotFull();
	}
	inline void writecommand_last(uint8_t c)  {
		uint32_t mcr = SPI0_MCR;
		KINETISK_SPI0.PUSHR = c | (pcs_command << 16) | SPI_PUSHR_CTAS(0) | SPI_PUSHR_EOQ;
		waitTransmitComplete(mcr);
	}
	inline void writedata8_last(uint8_t c)  {
		uint32_t mcr = SPI0_MCR;
		KINETISK_SPI0.PUSHR = c | (pcs_data << 16) | SPI_PUSHR_CTAS(0) | SPI_PUSHR_EOQ;
		waitTransmitComplete(mcr);
	}
	inline void writedata16_last(uint16_t d)  {
		uint32_t mcr = SPI0_MCR;
		KINETISK_SPI0.PUSHR = d | (pcs_data << 16) | SPI_PUSHR_CTAS(1) | SPI_PUSHR_EOQ;
		waitTransmitComplete(mcr);
	}

void display_transfer_hardware(void)
{
	uint8_t i;
	uint8_t * pscreen_data = screenmemory;
	uint8_t count_pages = (_height/8);
    uint16_t w = 0; 
	SPI.beginTransaction(SPISettings(clockRateSetting, MSBFIRST, SPI_MODE0));
	* _csport &= ~_cspinmask;
	for (i = 0; i < count_pages; i++)
	{
		// Output the header
		* _dcport &= ~_dcpinmask; // assert DC
		writecommand_cont(0xb0 + i);
		writecommand_cont(0x21);
		writecommand_cont(0);
		writecommand_cont(0x7f);
		// Now go into data mode
		* _dcport |= _dcpinmask;
		// Faster if we write 16 bits out at a time instead of 8
		for (uint8_t j = 0; j < LCDWIDTH; j+=2)
		{
	    	w = (*pscreen_data++) << 8;
			w |= *pscreen_data++;
			if (pscreen_data >= &screenmemory[(LCDWIDTH*LCDHEIGHT)/8])
				writedata16_last(w);
			else
				writedata16_cont(w);
		}
	}
	* _csport |= _cspinmask;
	SPI.endTransaction();
}
#endif

The output showed values like:
Code:
Simple one byte output time: 681
Transfer using new buffer transfer time: 553
Transfer using old buffer transfer time: 563
So it is an obvious win to use the multibyte transfer, A little better with new function, also required need for buffer. I may also put in version that uses PUSHR/POPR, for comparisons...

Ran same test on LC
Code:
Simple one byte output time: 1217
Transfer using new buffer transfer time: 857
Transfer using old buffer transfer time: 957
Show more of a win
Update - Added code using pushr/popr
I updated the code above to copy in some of the helper functions from ili8341_t3 code base to manage queues and then added version of code that used this. Actually two attempts. The first attempt only pushed the data one byte at a time, and was actually slower than simply using the new Transfer with two buffer pointers version. So I then added code to output words instead. and now the time for this version is:

Code:
Simple one byte output time: 680
Transfer using new buffer transfer time: 554
Transfer using old buffer transfer time: 563
Transfer using pushr time: 542
So again it does improve the speed.
 
Last edited:
(Paul and others) Will soon email the Arduino email group. But thought I should have more information, like the performance gains help some of the other higher end platforms. Example some of the other Arduino using m0... So yesterday received a Sparkfun Samd breakout board... Forgot I already had a couple of Adafruit Feather M0 boards...

So I updated my test program again.. to work on Sparkfun board - Argh!
USB serial is not Serial but is SerialUSB
SPI is not on the pins they show in their documentation, but instead on pins 10-13
Switched to use digitalWrite instead of registers as I was lazy and have not looked up #define to change the values to 32 bit...
Their SPI implementation su... (is less than optimal!)
Using the output 1 byte at a time took something like 1800us
Using the transfer(buf, cnt) took something like: 1335us

I looked at their implementation for transfer and they are not using anything to double buffer... I did a first attempt implementation:
Code:
void SERCOM::transferDataSPI(const uint8_t *buf, uint8_t *rxbuf, uint32_t count) 
{
  if ( count == 0 )
    return;

  const uint8_t *p = buf;
  uint8_t *pret = rxbuf;
  uint8_t in;

  // Probably don't need here .
  sercom->SPI.DATA.bit.DATA = p ? *p++ : 0; // Writing data into Data register
  while (--count > 0) {
    while( sercom->SPI.INTFLAG.bit.DRE == 0 )
    {
      // Waiting for TX Buffer empty
    }
    sercom->SPI.DATA.bit.DATA = p ? *p++ : 0; // Writing data into Data register


    while( sercom->SPI.INTFLAG.bit.RXC == 0 )
    {
      // Waiting Complete Reception
    }
    in =  sercom->SPI.DATA.bit.DATA;
    if (pret)*pret++ = in;
  }
  while( sercom->SPI.INTFLAG.bit.RXC == 0 )
  {
    // Waiting Complete Reception
  }
  in =  sercom->SPI.DATA.bit.DATA;
  if (pret)*pret++ = in;

}
Which is hanging some of the time... Will rework by looking at stuff installed by AvrStudio... But when it is working,
The output (txbuf, NULL for rxbuf, count) is given reasonable times like about: 595ms


Will play a little more, but at least it looks doable. Also interesting with the Arduino board manager, installing the SAMD stuff, for Arduino, Sparkfun and Adafruit, that they don't share any of the same cores/libraries... They all have their own copies...
But hopefully soon back to Teensy stuff!
 
Status
Not open for further replies.
Back
Top