Teensy 4.0 DMA SPI?

Status
Not open for further replies.
Thanks, I did not notice that pixCount = 1 was inside the if condition when I added the return.

Just tested: that seems to have fixed the random noise I was getting after this update, and it now also tolerates delayMicroseconds(100); without problems.

How could I make sure that txBuffer[12512] __attribute__((aligned(32))); sits within a single 32 KB memory block, and would that be useful?

Forgot the flipByte(),

Code:
byte flipByte(byte c)
     {
       c = ((c>>1)&0b01010101)|((c<<1)&0b10101010);
       c = ((c>>2)&0b00110011)|((c<<2)&0b11001100);
       c = (c>>4) | (c<<4) ;

       return c;
     }

Vcom was already defined in the code I provided.
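
For anyone who wants to check what flipByte() does: it reverses the bit order of a byte. Below is a minimal sketch for a PC compiler, with uint8_t standing in for Arduino's byte. The names flipByteSlow and checkFlipByte are made up for this illustration; they compare the three-step swap against a slow one-bit-at-a-time version.

```cpp
#include <cassert>
#include <cstdint>

// Same three swap steps as flipByte() above.
uint8_t flipByte(uint8_t c) {
  c = ((c >> 1) & 0b01010101) | ((c << 1) & 0b10101010); // swap neighbouring bits
  c = ((c >> 2) & 0b00110011) | ((c << 2) & 0b11001100); // swap bit pairs
  c = (c >> 4) | (c << 4);                               // swap nibbles
  return c;
}

// Reference version (hypothetical helper): moves one bit at a time.
uint8_t flipByteSlow(uint8_t c) {
  uint8_t r = 0;
  for (int i = 0; i < 8; i++) {
    r = (r << 1) | ((c >> i) & 1); // bit i of c ends up as bit 7-i of r
  }
  return r;
}

// Both versions must agree for all 256 possible inputs.
void checkFlipByte() {
  for (int v = 0; v < 256; v++) {
    uint8_t b = static_cast<uint8_t>(v);
    assert(flipByte(b) == flipByteSlow(b));
  }
}
```

For example, 0x01 becomes 0x80 and 0xF0 becomes 0x0F.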
 
I noticed this in the build output:

Using previously compiled file: /var/folders/j0/czwf_k9507340824y6129t3h0000gp/T/arduino_build_438271/libraries/SPI/SPI.cpp.o

Could that object file also be holding the DMA-related memory allocations, so that they go stale if it is not rebuilt when new global variables are defined?

But I cannot find that file, because this is a Mac...
 
As I mentioned, I did not look very hard. Was just trying to do a quick check to confirm what I suspected.

"Using previously compiled file..." is normal. It says that no source files changed that would require spi to recompile.

As for seeing the sources:

I am not a big Mac user, but I have one as a secondary setup.

You can see the files by going, in Finder, to where your Teensyduino app is and opening its context menu (in my case a sort of right click: click with two fingers on the trackpad), then choosing Show Package Contents.

That shows a Contents folder. Click on it and it shows several folders; click on the java folder.

That shows lots of stuff. Click on the hardware folder, then teensy, avr, libraries, spi.
 
"It says that no source files changed that would require spi to recompile. "

Yes, but the question is whether it does that analysis correctly. Is there a simple way to force a recompile of everything?
 
"Is there a simple way to force a recompile of everything?"

If using Arduino, just change Tools > Boards to a different board. Click Verify (this might not be necessary...) Then change back. It will recompile everything if you've changed to a different board.

Arduino should also recompile everything if you've changed one of the settings like Tools > CPU Speed.
 
I think that did the recompilation trick, as I now get "Compiling library "SPI"", but it did not help.

I still need to do this to get it working correctly:

Code:
uint8_t    txBuffer[12512] __attribute__((aligned(32))); 
uint8_t  DMAMEM  screenData[50][240];//50 holds 8 x 50 = 400 B/W pixels
uint8_t    screenDataOld[50][240];

screenData[50][240] is just storage for the image data, and txBuffer[12512] is built from it before the DMA SPI transfer, so I do not see how running it from slower memory would help anything. I have this in loop() now anyway, with a 100 us delay after building txBuffer[12512] and before sending (the delays are only for testing and help nothing; it works the same without them).

Code:
  if(dmaBusy == false ) {
    
    delayMicroseconds(100);
    Pixels = drawScreen();
    delayMicroseconds(100);
    if(Pixels > 1) sendScreen(txBuffer, Pixels);

So more likely DMAMEM screenData[50][240] just leaves a more suitable memory space for txBuffer[12512] or for some other SPI DMA memory use, but I do not really understand why that would be either, unless it has something to do with the 32 KB memory blocks.

Is there a way to specify the memory location of txBuffer[12512] exactly, and what would be a good location?
 
I tried this

Code:
uint8_t    txBuffer2[10000];
uint8_t    txBuffer[12512] __attribute__((aligned(32))); 
uint8_t   DMAMEM screenData[50][240];//50 holds 8 x 50 = 400 B/W pixels
uint8_t    screenDataOld[50][240];

i.e. trying to use txBuffer2[10000] to shift where txBuffer gets located.

In setup():

Code:
void setup() {
     txBuffer2[1]=1;
 delay(1000);
  Serial.printf("txBuffer:   %x\n", (uint32_t)txBuffer);
  Serial.printf("screenData: %x\n", (uint32_t)screenData);

The size or existence of txBuffer2[10000] does not seem to affect the locations of these two:

txBuffer: 20003180
screenData: 20200000

If I remove DMAMEM from screenData[50][240], I have seen the txBuffer location change before, but this time it stayed at the same location. The screen goes empty, though; it does not work.

txBuffer: 20003180
screenData: 20009604

0x9604 - 0x3180 = 0x6484, i.e. 25732 bytes between the start addresses, and txBuffer is only 12512 bytes long, so they do not overlap. So what is going on?
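
The two address questions in this post (do the buffers overlap, and does a buffer stay inside one 32 KB block) can be checked with a little arithmetic. A minimal sketch, with made-up helper names overlaps and inOne32kBlock; whether a 32 KB boundary matters at all for this DMA transfer is not established in this thread, so treat the second check purely as a diagnostic.

```cpp
#include <cassert>
#include <cstdint>

// True if [aStart, aStart+aSize) and [bStart, bStart+bSize) share any byte.
bool overlaps(uint32_t aStart, uint32_t aSize, uint32_t bStart, uint32_t bSize) {
  return aStart < bStart + bSize && bStart < aStart + aSize;
}

// True if the buffer starts and ends inside the same 32 KB (0x8000 byte) block.
bool inOne32kBlock(uint32_t start, uint32_t size) {
  assert(size > 0);
  return (start / 0x8000u) == ((start + size - 1) / 0x8000u);
}
```

With the values printed above, overlaps(0x20003180, 12512, 0x20009604, 12000) is false, and inOne32kBlock(0x20003180, 12512) is true, since the buffer ends at 0x2000625F, below the next boundary at 0x20008000. In a sketch the arguments would be (uint32_t)txBuffer and sizeof(txBuffer).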


Strangely enough, this also works:

Code:
uint8_t    txBuffer[12512] __attribute__((aligned(32))); 
uint8_t    screenData[52][242];//50 holds 8 x 50 = 400 B/W pixels
uint8_t    screenDataOld[50][240];

i.e. no DMAMEM, but [52][242] instead of [50][240] for screenData. That works, so the problem could be writes outside the boundaries of screenData. I found a suspicious instance where I was not careful, screenData[x+xpos][y+ypos], and am now verifying.

I was already worried about how stupid I had been, and made a quick fix to ensure the indices do not go over the limits:

Code:
 Xpos = x+xpos;
      Ypos = y+ypos;
      if(Xpos > 49)  Xpos = 49;
      if(Ypos > 239) Ypos = 239;
    
    screenData[Xpos][Ypos]

But it seems that was not the problem. Then I found another place:

Code:
if(x > 400) x = 400;
  if(y > 240) y = 240;
  
  xByte   = screenData[x/8][y]; //the byte to manipulate

And damn, I was stupid. Sorry for wasting everyone's time, but I hope that overall this helps others get DMA working with the Sharp memory display.

It is a bit early to say this fixed it completely, but it seems so. I wonder why this was not a problem before moving to DMA SPI; I guess it was pure luck.
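
To spell out the off-by-one: with screenData[50][240] the last valid indices are 49 and 239, but limits of 400 and 240 still allow x/8 = 50 and y = 240, each one past the end. A minimal sketch of a clamped accessor follows. The names SCREEN_W, SCREEN_H, clampInt and pixelByte are made up for this example and are not from the code above.

```cpp
#include <cassert>
#include <cstdint>

const int SCREEN_W = 400;               // pixels per line
const int SCREEN_H = 240;               // lines
const int BYTES_PER_LINE = SCREEN_W / 8;

uint8_t screenData[BYTES_PER_LINE][SCREEN_H];

// Clamp v into lo..hi (both inclusive).
int clampInt(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

// Returns the byte holding pixel (x, y). Out-of-range coordinates are pulled
// back to the nearest valid pixel instead of indexing past the array.
uint8_t &pixelByte(int x, int y) {
  x = clampInt(x, 0, SCREEN_W - 1);     // 0..399, so x / 8 is 0..49
  y = clampInt(y, 0, SCREEN_H - 1);     // 0..239
  assert(x / 8 < BYTES_PER_LINE && y < SCREEN_H);
  return screenData[x / 8][y];
}
```

Funnelling every access through one such function means the limits only have to be right in one place.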
 
Here is tested and working code for screen updates over SPI DMA on a Teensy 4.0 with the Sharp 400x240 memory display. The connections are described here: https://forum.pjrc.com/threads/23852-Adafruit-sharp-module/page2?highlight=Sharp

Code:
//No rights given, no responsibilities taken, but the code is here.

  #include <SPI.h>
  
  uint8_t         txBuffer[12512] __attribute__((aligned(32))); 
  uint8_t         screenData[50][240];//50 holds 8 x 50 = 400 B/W pixels
  uint8_t         screenDataOld[50][240];
  uint8_t         Vcom;
  volatile  bool  dmaBusy = false;
  EventResponder  callbackHandler;
  //note display max 2000000 use 8000000 at your own risk
  SPISettings     memSettings(8000000, MSBFIRST, SPI_MODE0);




void setup() {
  pinMode (10, OUTPUT);
  SPI.begin(); 
  // Setup the SPI DMA callback
  callbackHandler.attachImmediate(&callback);
  clearDisplay();  

}

void loop() {

  

      //Random noise to the screen, starting from line 100 
      //note a byte is 8 pixels on x direction, i.e 50x8 = 400 pixels
      for (int y = 100; y < 240; y++)  { for (int x = 0; x < 50; x++) {
      screenData[x][y] = random(255);
      }}

  if(dmaBusy == false )  {
    
    //screen refresh, alternates Vcom between 0x00 and 0x40
    //note: make sure this is sent occasionally even when there are no other updates
    Vcom = Vcom ? 0x00 : 0x40;
    
    drawScreen();
    }

}

uint16_t drawScreen() {
  
  uint16_t pixCount = 0;
  uint8_t  updatedLines[240];
  uint8_t  updates = 0;
  
   //update only lines that have updates.
  for (int y = 0; y < 240; y++)  {
    updatedLines[y] = 0; //clear the flag once per line, so a later unchanged byte cannot reset it
    for (int x = 0; x < 50; x++) {
      if( screenData[x][y] != screenDataOld[x][y]) {
        updatedLines[y] = 1;  updates = 1; 
        screenDataOld[x][y] = screenData[x][y]; 
      }
    }
  } 

  //build txBuffer, could be combined with above for less for loops
  if( updates == 1 ){ 
  txBuffer[0] = 0x80 | Vcom;
  pixCount = 1;
  
  for (int y = 0; y < 240; y++)  { 
    
  if( updatedLines[y] == 1 )   {  
  for (int x = 0; x < 52; x++) {
    if( x == 0 )          txBuffer[pixCount] = flipByte(y);       //line number
    if( x > 0 && x < 51 ) txBuffer[pixCount] = screenData[x-1][y];//line pixels
    if( x == 51 )         txBuffer[pixCount] = 0x00;              //Trailer for line
    pixCount++;
   } 
   }
   txBuffer[pixCount] = 0x00;//Trailer for screen
  
  }

  //Send txBuffer to display using SPI DMA
  dmaBusy = true;
  digitalWriteFast(10,HIGH); 
  SPI.beginTransaction(memSettings);
  SPI.transfer((void *)txBuffer, nullptr, pixCount, callbackHandler); 
  }

  return pixCount; //0 if nothing changed
}

void callback(EventResponderRef eventResponder)
{
  //end screen update
  SPI.endTransaction();
  digitalWriteFast(10,LOW); 
  dmaBusy = false;  
}


void clearDisplay() {
     digitalWriteFast(10,HIGH);
     SPI.beginTransaction(SPISettings(2000000, MSBFIRST, SPI_MODE0));
     SPI.transfer(0x20 | Vcom);
     SPI.transfer(0x00);
     SPI.endTransaction();     //pair every beginTransaction with an endTransaction
     digitalWriteFast(10,LOW); //end screen update
}

byte flipByte(byte c)
     {
       c = ((c>>1)&0b01010101)|((c<<1)&0b10101010);
       c = ((c>>2)&0b00110011)|((c<<2)&0b11001100);
       c = (c>>4) | (c<<4) ;

       return c;
     }


Screenshot 2021-05-20 at 11.45.36.jpg

Lessons learned: I should have got SPI DMA working with this simple code first, without all the clutter of the main code; as it was, there were too many sources of confusion.
 
Note: SPI.transfer((void *)txBuffer, nullptr, pixCount, callbackHandler); should probably be given pixCount + 1, so that the final trailer byte is sent as well.
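
The byte count can be checked on a PC. The sketch below rebuilds the packet layout used by drawScreen() (mode byte, then address + 50 data bytes + trailer per line, then one final trailer) into a std::vector. buildFrame is a hypothetical helper, not part of the code above, and the data bytes are zero placeholders.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds an update packet for the given line addresses.
std::vector<uint8_t> buildFrame(const std::vector<uint8_t> &lineAddresses, uint8_t vcom) {
  assert(vcom == 0x00 || vcom == 0x40);     // the two Vcom values used above
  std::vector<uint8_t> frame;
  frame.push_back(0x80 | vcom);             // write command plus Vcom bit
  for (uint8_t addr : lineAddresses) {
    frame.push_back(addr);                  // line address
    frame.resize(frame.size() + 50, 0x00);  // 50 data bytes (placeholder zeros)
    frame.push_back(0x00);                  // per-line trailer
  }
  frame.push_back(0x00);                    // final trailer for the whole packet
  return frame;
}
```

For n updated lines the packet is 52 * n + 2 bytes, so 54 bytes for one line and 12482 for all 240, which fits in txBuffer[12512]. In drawScreen() pixCount ends at 52 * n + 1, the index of the final trailer, hence the + 1.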
 
How does this event handler stuff work? It seems this works as well, so what was the 'EventResponderRef eventResponder' for?

Code:
//void callback( EventResponderRef eventResponder ){
void callback(  ){
     SPI.endTransaction();
     digitalWriteFast(10,LOW); 
     dmaBusy = false;  
     }

As I understand it, this defines the name 'callbackHandler' for the EventResponder:

EventResponder callbackHandler;

This

callbackHandler.attachImmediate(&callback);

defines that it will call the function named callback. What is the & for?

This tells

SPI.transfer((void *)txBuffer, nullptr, pixCount, callbackHandler);

the SPI DMA to use callbackHandler, which calls the function callback when the DMA transfer is done.

Correct?

Is there something else to know about it? I have been trying to find some instructions, but there is not much information about it (unless I again spend a day reading forum posts).

I changed it now to

Code:
EventResponder  SPI_DMA_Handler;

...

SPI_DMA_Handler.attachImmediate(&SPI_DMA_ready);

...

SPI.transfer((void *)txBuffer, nullptr, pixCount, SPI_DMA_Handler); 
...

void SPI_DMA_ready( EventResponderRef eventResponder  ){
     SPI.endTransaction();
     digitalWriteFast(10,LOW); 
     dmaBusy = false;  
     }

And it seems to be working fine. I am still wondering what (void *), nullptr, ( EventResponderRef eventResponder ) and & do. They seem to do nothing, and if so they are just causing confusion.

I mean, this seems to work as well, so what is the difference, and why are those things, which I do not understand, there?

Code:
EventResponder  SPI_DMA_Handler;

...

SPI_DMA_Handler.attachImmediate(SPI_DMA_ready);

...

SPI.transfer(txBuffer, 0, pixCount, SPI_DMA_Handler); 
...

void SPI_DMA_ready(   ){
     SPI.endTransaction();
     digitalWriteFast(10,LOW); 
     dmaBusy = false;  
     }
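
On the question of what the callback parameter is for: as far as I can tell from the Teensy core, the callback is handed a reference to the responder that fired, which lets one function serve several responders. With a single display and a global dmaBusy flag the parameter goes unused, which is probably why dropping it made no visible difference. The sketch below uses a hypothetical stand-in class, MiniResponder, so it can be run on a PC; it is not the real EventResponder, and Display and transferDone are likewise invented for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for EventResponder, reduced to what matters here.
class MiniResponder;
typedef MiniResponder &MiniResponderRef;
typedef void (*MiniResponderFunction)(MiniResponderRef);

class MiniResponder {
public:
  void attachImmediate(MiniResponderFunction f) { function_ = f; }
  void setContext(void *c) { context_ = c; }
  void *getContext() const { return context_; }
  // Called by the "driver" when its work is done.
  void triggerEvent() { if (function_) function_(*this); }
private:
  MiniResponderFunction function_ = nullptr;
  void *context_ = nullptr;
};

struct Display { bool busy = false; };

// One callback shared by two responders: the parameter says which one fired.
void transferDone(MiniResponderRef responder) {
  Display *d = static_cast<Display *>(responder.getContext());
  assert(d != nullptr);
  d->busy = false;
}
```

Keeping the parameter in the signature, even when unused, matches the function type that attachImmediate() is declared to take.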
 
Yes, there is very little documentation on EventResponder...

Paul was defining and creating it at the time I was adding asynchronous transfer support to SPI. My original code simply took a callback function, like this (using your names from above):
SPI.transfer(txbuffer, 0, pixCount, &SPI_DMA_ready);
and it would simply always call your callback (if not zero) when the transfer completes.

But we converted it over to using the Event Responder, which does give you other options, for when that callback actually happens. Immediate, on Yield, on timer...

I know that over time he would like to convert more of the callback-like functions over to use the events.

As for nullptr, NULL, 0: I mentioned this earlier. Again, try a Google search on nullptr.

It is simply a more explicit way to say "I am passing nothing to your pointer", which is the same as 0. It makes things a lot clearer. Suppose you saw
this function called like: SPI.transfer(txbuffer, 0, 1, event);
Would you realize that, although the 0 and 1 look similar, that first 0 is passed to a pointer and as such is special?
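
One place where 0 and nullptr really behave differently is when a function has both a number version and a pointer version. A minimal sketch with a made-up function describe (nothing to do with the SPI library): a plain 0 is an integer first, while nullptr can only ever be a pointer.

```cpp
#include <cassert>

// Two overloads with the same name: one takes a number, one takes a pointer.
int describe(int)          { return 1; } // chosen for plain integers
int describe(const char *) { return 2; } // chosen for pointers

void checkOverloads() {
  assert(describe(0) == 1);        // 0 is an int, so the number version wins
  assert(describe(nullptr) == 2);  // nullptr can only match the pointer version
}
```

When the parameter can only be a pointer, as with the receive buffer here, 0, NULL and nullptr all end up as the same null pointer.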
 
The problem I have is something like this

"The nullptr keyword can be used anywhere a handle, native pointer, or function argument can be used."

Just does not compute for me.

So in this context I assume nullptr = 0 and it has no other meaning, and then it is just clearer to remove it.

What is the purpose of &, as you used in &SPI_DMA_ready?
 
As I believe I mentioned you will see code that uses: 0 or NULL, or nullptr.

I believe at times, just passing 0 may have created warning messages about conversions...

So for example if the first argument passed in the SPI.transfer was defined as uint8_t * instead of void *
you might at times see code where the person is very specific and passes in: (uint8_t*)0
Or you might see them pass in (void*)0 as this is a pointer to more or less anything... But you can not really use it, until you cast it to some other pointer to something...

Sorry, I am not a C/C++ language instructor, so my explanations may not be very concise or complete.

And I hope I properly mentioned in my previous post that the &SPI_DMA_ready was for an earlier version of this code that was never released.

In C++ there are at least three different ways to pass things to a function: by value, by pointer, or by reference:

a) By value: for example, in the transfer function the count of items to transfer is passed by value. The value you pass in to the call is put into a local variable. The called function can use and change this value, and the function that called it has no idea whether the called function modified it or not. Sorry, I am skipping over things like passing in an object, which creates a copy of that object...

b) By pointer: for example your txBuffer, and more appropriately the return buffer, which you are passing 0 to. What is actually passed to the function is the address of the place where your data is. The called function can use and modify this data, and the calling code will see these modifications (example: data returned by SPI). So when passing by pointer, the & says "get the address of what follows".
So &SPI_DMA_ready says: pass in the address of this function.
In practice the original code may have been defined like:

Code:
void SPI_DMA_ready() {
}
...
SPI.transfer(txbuffer, 0, pixCount, &SPI_DMA_ready);
...
int transfer(void *buffer, void *rxbuffer, int count, void (*callback)()) {
    ....
    if (callback) (*callback)();
}
Perfectly clear ;)
Often people don't check that callback is a valid pointer; for example, you pass 1, the function dereferences it, and the program faults.

c) More recently, by reference, which sort of tries to make passing data easier and helps avoid the faults.
You specify in a function header that a parameter is passed by reference with the &.
What does that mean? For one, there is no pointer, so you don't have the concept of not passing in a parameter. And the compiler can type-check that what you pass in is valid.

Note that passing by reference is not restricted to objects; it also works for simple types like int.
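
A small stand-alone sketch of the callback pattern from b), with invented names (onDone, runThenNotify, callbackDemo). It also shows why attachImmediate(&callback) and attachImmediate(callback) both work: a function name converts to a pointer to that function, so the & is optional.

```cpp
#include <cassert>

int callCount = 0;

void onDone() { callCount++; }

// Takes a pointer to a function with no parameters and no return value.
// The null check means "no callback" can be passed safely.
void runThenNotify(void (*callback)()) {
  // ... the actual work would happen here ...
  if (callback) callback();   // same as (*callback)();
}

void callbackDemo() {
  runThenNotify(&onDone);     // explicit address-of
  runThenNotify(onDone);      // the function name converts to the same pointer
  runThenNotify(nullptr);     // no callback: nothing is called
  assert(callCount == 2);
}
```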

Example difference:

Code:
void my_function(int val) {
    val++;
}
void loop() {
    int local_val = 0;
    my_function(local_val );
    Serial.println(local_val );
}

Will print 0's

But:
Code:
void my_function(int &val) {
    val++;
}
void loop() {
    int local_val = 0;
    my_function(local_val );
    Serial.println(local_val );
}
Will print 1's as the incremented val in the function updates local_val...
Which is more or less like using pointers:
Code:
void my_function(int *val) {
    (*val)++;  // parentheses needed: *val++ would advance the pointer instead
}
void loop() {
    int local_val = 0;
    my_function(&local_val );
    Serial.println(local_val );
}

Again sorry, probably clear as mud...
 
Yeah, the pointers and stuff: I really try to stay away from them, but it gets more and more difficult. To me it seems a concept to use if you are asking for memory leaks and the like, and enjoy troubleshooting them. That of course means that I have no understanding of pointers and stuff, but I feel not many do. I would prefer not to learn them, but maybe I have to, just to get things done.
 
Probably not DMA related but...

I finally got ADC reading fast and well filtered, and screen updates at a solid 140+ FPS for a moving and rotating arrow like the one below, and it is super smooth.

Code:
uint32_t ArrowBig[41] = { 0b00000000000000011000000000000000,
                          0b00000000000000011000000000000000,
                          0b00000000000000111100000000000000,
                          0b00000000000000111100000000000000,
                          0b00000000000001111110000000000000,
                          0b00000000000001111110000000000000,
                          0b00000000000011111111000000000000,
                          0b00000000000011111111000000000000,
                          0b00000000000111111111100000000000,
                          0b00000000001111111111110000000000,
                          0b00000000001111111111110000000000,
                          0b00000000011111111111111000000000,
                          0b00000000011111111111111000000000,
                          0b00000000111111111111111100000000,
                          0b00000000111111111111111100000000,
                          0b00000001111111111111111110000000,
                          0b00000001111111111111111110000000,
                          0b00000011111111111111111111000000,
                          0b00000011111111111111111111000000,
                          0b00000111111111111111111111100000,
                          0b00000111111111111111111111100000,
                          0b00001111111111111111111111110000,
                          0b00001111111111111111111111110000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000 };

Having the bitmaps as globals at the start of the code is somewhat inconvenient, though, so I moved the bitmap into a function:

Code:
uint32_t ArrowBig(uint16_t c)
     {

uint32_t ArrowBig[41] = { 0b00000000000000011000000000000000,
                          0b00000000000000011000000000000000,
                          0b00000000000000111100000000000000,
                          0b00000000000000111100000000000000,
                          0b00000000000001111110000000000000,
                          0b00000000000001111110000000000000,
                          0b00000000000011111111000000000000,
                          0b00000000000011111111000000000000,
                          0b00000000000111111111100000000000,
                          0b00000000001111111111110000000000,
                          0b00000000001111111111110000000000,
                          0b00000000011111111111111000000000,
                          0b00000000011111111111111000000000,
                          0b00000000111111111111111100000000,
                          0b00000000111111111111111100000000,
                          0b00000001111111111111111110000000,
                          0b00000001111111111111111110000000,
                          0b00000011111111111111111111000000,
                          0b00000011111111111111111111000000,
                          0b00000111111111111111111111100000,
                          0b00000111111111111111111111100000,
                          0b00001111111111111111111111110000,
                          0b00001111111111111111111111110000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000 }; 
      uint32_t BIT32 = ArrowBig[c];
                            
      return BIT32;
      }

The FPS counter still shows 140+ FPS, but it is not as consistent and the updates do not look as smooth. Maybe I am imagining it, but why would it be slower to read the bitmap from a function?

ArrowBig() is accessed twice per frame update for each bit, so 367360 times per second (41 rows x 32 bits x 2 x 140 FPS). I will now see how that could be optimised, but should there be a difference in speed between accessing the bitmap as a global variable and via the function?
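
One obvious way to cut the number of calls is to fetch each 32-bit row once and test its bits locally. A minimal sketch under that assumption, with a shortened four-row table and invented names (arrowRows, arrowRow, countSetPixels); a real drawing routine would set a pixel where this one just counts.

```cpp
#include <cassert>
#include <cstdint>

const int ROWS = 4;

// First four rows of the arrow, enough for the illustration.
const uint32_t arrowRows[ROWS] = {
  0b00000000000000011000000000000000,
  0b00000000000000011000000000000000,
  0b00000000000000111100000000000000,
  0b00000000000000111100000000000000,
};

uint32_t arrowRow(int r) {
  assert(r >= 0 && r < ROWS);
  return arrowRows[r];
}

// Fetch the row once, then test its 32 bits locally (bit 31 = leftmost pixel).
int countSetPixels(int r) {
  const uint32_t bits = arrowRow(r);          // one access per row, not per bit
  int count = 0;
  for (int x = 0; x < 32; x++) {
    if (bits & (0x80000000u >> x)) count++;   // a real sketch would draw here
  }
  return count;
}
```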
 
I changed it so that all 32 bits are read with one access to the function, reducing the function calls 32x. That got the arrow to draw nicely, but there were still problems with the fonts, which can be positioned in 1 pixel steps. For now I ended up making temporary globals that hold the bitmaps. That works great but wastes memory, especially as functions cannot store local variables in FLASHMEM.

Code:
for (int i = 0; i< 41;   i++)  {ArrowBigT[i]  = ArrowBig(i);  }
for (int i = 0; i< 2280; i++)  {LVZfont16T[i] = LVZfont16(i);}

BTW, editing fonts, no font editor needed :)

Code:
0b0000000000000000, 
0b0000000000000000, 
0b0000000000000000, 

//char(54) = 6  start: 528
0b0000000000000000, 
0b0000000000000000, 
0b0000000000000000, 
0b0000000011110000, 
0b0000001111110000, 
0b0000011100000000, 
0b0000111000000000, 
0b0001110000000000, 
0b0001100000000000, 
0b0001101111000000, 
0b0001111111110000, 
0b0001110000110000, 
0b0001100000011000, 
0b0001100000011000, 
0b0001100000011000, 
0b0000110000111000, 
0b0000111111110000, 
0b0000001111100000, 
0b0000000000000000, 
0b0000000000000000, 
0b0000000000000000, 
0b0000000000000000, 
0b0000000000000000, 
0b0000000000000000, 

//char(55) = 7  start: 552
0b0000000000000000, 
0b0000000000000000, 
0b0000000000000000, 
0b0001111111111000, 
0b0001111111111000,
 
A couple of quick notes. When you do something like this:
Code:
uint32_t ArrowBig(uint16_t c)
     {

        uint32_t ArrowBig[41] = { 
                         0b00000000000000011000000000000000,
                          0b00000000000000011000000000000000,
                          0b00000000000000111100000000000000,
                          0b00000000000000111100000000000000,
                          0b00000000000001111110000000000000,
                          0b00000000000001111110000000000000,
                          0b00000000000011111111000000000000,
                          0b00000000000011111111000000000000,
                          0b00000000000111111111100000000000,
                          0b00000000001111111111110000000000,
                          0b00000000001111111111110000000000,
                          0b00000000011111111111111000000000,
                          0b00000000011111111111111000000000,
                          0b00000000111111111111111100000000,
                          0b00000000111111111111111100000000,
                          0b00000001111111111111111110000000,
                          0b00000001111111111111111110000000,
                          0b00000011111111111111111111000000,
                          0b00000011111111111111111111000000,
                          0b00000111111111111111111111100000,
                          0b00000111111111111111111111100000,
                          0b00001111111111111111111111110000,
                          0b00001111111111111111111111110000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000,
                          0b00000000000111111111100000000000 }; 
      uint32_t BIT32 = ArrowBig[c];
                            
      return BIT32;
      }
You should be aware that there is code at the start of this function that copies those 41 words into your array ArrowBig
every time the function is called. So it is probably eating up double or triple the space.

If, however, you declare the array static, that is only done once. If you declare it const, the system knows you are not going to update it. On T3.x that would leave the
data up in flash, but on T4.x it will by default still copy the data down into RAM (DTCM). If you then also specify PROGMEM, it will leave the data in flash and not eat into your program and data space.

That is: static const uint32_t ArrowBig[41] PROGMEM = {
The compiler will generate the 41 words, which will always reside in flash, and no code is needed to initialize them.
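
Put together, the function version could look roughly like the sketch below, shortened to four rows. The names arrowBigRow and rows are invented, and PROGMEM is defined as empty here only so the example also builds on a PC; on a Teensy the core supplies it. On AVR boards PROGMEM data has to be read with pgm_read_dword() rather than directly, so this direct read assumes a board where flash is readable like normal memory.

```cpp
#include <cassert>
#include <cstdint>

#ifndef PROGMEM
#define PROGMEM   // provided by the Arduino/Teensy core; empty for a PC build
#endif

uint32_t arrowBigRow(uint16_t c) {
  // static const: the table exists once and is never copied on each call.
  static const uint32_t rows[4] PROGMEM = {
    0b00000000000000011000000000000000,
    0b00000000000000011000000000000000,
    0b00000000000000111100000000000000,
    0b00000000000000111100000000000000,
  };
  assert(c < 4);   // guard against reading past the table
  return rows[c];
}
```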
 
Thanks, very useful info.

Now that they are defined static const, accessing them directly via the function works perfectly.
 