
Thread: T4.x DMAMEM and RA8876 and SPI - (Paul?) - Large image does not display correct...

  1. #1
    KurtE, Senior Member+ (Join Date: Jan 2014, Posts: 7,912)

    I thought about just adding this to the RA8876 thread, BUT there is an interesting issue that (I think) has to do with SPI DMA transfers from DMAMEM not getting the right bits...

    Note for those with an RA8876: this is a version of the embedded picture sketch that I have been playing with, but I made it work with the current MASTER branch, without my pending PRs.

    That is, it uses: tft.putPicture(start_x, start_y, image_width, image_height, (const unsigned char*)image);
    So it might be interesting to see whether you get similar results.

    The whole sketch is included.

    Some bits and pieces to explain what is happening.

    The main image is converted to an array that is part of the sketch:
    Code:
    // Generated by   : ImageConverter 565 Online
    // Generated from : T4.1-Cardlike.gif
    // Time generated : Sat, 08 Aug 20 02:16:56 +0200  (Server timezone: CET)
    // Image Size     : 575x424 pixels
    // Memory usage   : 487600 bytes
    
    
    const unsigned short teensy41_Cardlike[243800] PROGMEM={
    0xCE79, 0xEF7D, 0xDEFB, 0xEF7D, 0xDEFB, 0xEF7D, 0xDEFB, 0xEF7D, 0xDEFB, 0xEF7D, 0xDEFB, 0xCE79, 0xEF7D, 0xDEFB, 0xEF7D, 0xDEFB,   // 0x0010 (16) pixels
    0xEF7D, 0xDEFB, 0xEF7D, 0xDEFB, 0xEF7D, 0xDEFB, 0xEF7D, 0xDEFB, 0xEF7D, 0xDEFB, 0xEF7D, 0xCE79, 0xDEFB, 0xDF7D, 0xEEFD, 0xDF7D,   // 0x0020 (32) pixels
    0xEEFD, 0xDF7D, 0xEEFD, 0xDF7D, 0xEEFD, 0xDF7D, 0xEEFD, 0xDF7D, 0xEEFD, 0xDF7D, 0xEEFD, 0xDF7D, 0xEEFD, 0xDF7D, 0xEEFD, 0xDF7D,   // 0x0030 (48) pixels
    ...
    I also have a function that clears the screen, centers the image, and fills the rest of the screen with a color. It also displays text showing how long the call (in this case, to putPicture) took, and a second number showing how long it took including the output of the first number. I did this because the RA8876 code internally calls the asynchronous SPI.transfer(buffer, nullptr, size, event), so the transfer is still in progress when the first call returns; any other call then waits for the current transfer to complete before doing its own output. When I output the image stored in PROGMEM, it looks like:
    [Image: IMG_1209.jpg]

    But if I copy this image from PROGMEM to DMAMEM, not all of it gets through. To make that visible, I first fill the whole (very large) array with RED. The main loop code:

    Code:
    void loop(void) {
      tft.setFont(ComicSansMS_24);
      tft.fillScreen(RED);
    
      // Let's put something into the DMAMEM that is different...
      for (uint32_t i = 0; i < sizeof(teensy41_Cardlike_dmamem)/sizeof(teensy41_Cardlike_dmamem[0]); i++)
        teensy41_Cardlike_dmamem[i] = RED;
        
      Serial.print("Display T4.1 Extended card ");
      drawImage(575, 424, (uint16_t*)teensy41_Cardlike, BLUE);
      if (DelayOrStep()) return;
      Serial.print("DMAMEM Display T4.1 Extended card ");
      // Lets make a DMAMEM version of the card to see if it likes it or not...
      memcpy((void *)teensy41_Cardlike_dmamem, (const void *)teensy41_Cardlike, sizeof(teensy41_Cardlike));
      drawImage(575, 424, (uint16_t*)teensy41_Cardlike_dmamem, GREEN);
      if (DelayOrStep()) return;
      // 
      Serial.print("DMAMEM 2nd time Display T4.1 Extended card ");
      drawImage(575, 424, (uint16_t*)teensy41_Cardlike_dmamem, RED);
      if (DelayOrStep()) return;
    
    }
    The image drawn from DMAMEM does not always look the same, but it often looks like this:
    [Image: IMG_1211.jpg]
    Note the red streaks.

    Now if we unwind the putPicture code:
    Code:
    void RA8876_t3::putPicture(ru16 x, ru16 y, ru16 w, ru16 h, const unsigned char *data) {
    	//The putPicture_16bppData8 function in the base class is not ideal - it damages the activeWindow setting
    	//It also is harder to make it DMA.
    	//Ra8876_Lite::putPicture_16bppData8(x, y, w, h, data);
    
    	//Using the BTE function is faster and will use DMA if available
        bteMpuWriteWithROPData8(currentPage, width(), x, y,  //Source 1 is ignored for ROP 12
                                  currentPage, width(), x, y, w, h,     //destination address, pagewidth, x/y, width/height
                                  RA8876_BTE_ROP_CODE_12,
                                  data);
    }
    Which calls:
    Code:
    void RA8876_t3::bteMpuWriteWithROPData8(ru32 s1_addr,ru16 s1_image_width,ru16 s1_x,ru16 s1_y,ru32 des_addr,ru16 des_image_width,
    ru16 des_x,ru16 des_y,ru16 width,ru16 height,ru8 rop_code,const unsigned char *data)
    {
      bteMpuWriteWithROP(s1_addr, s1_image_width, s1_x, s1_y, des_addr, des_image_width, des_x, des_y, width, height, rop_code);
      
      startSend();
      _pspi->transfer(RA8876_SPI_DATAWRITE);
    
    #ifdef SPI_HAS_TRANSFER_ASYNC
      activeDMA = true;
      _pspi->transfer(data, NULL, width*height*2, finishedDMAEvent);
    #else
      //If you try _pspi->transfer(data, length) then this tries to write received data into the data buffer
      //but if we were given a PROGMEM (unwriteable) data pointer then _pspi->transfer will lock up totally.
      //So we explicitly tell it we don't care about any return data.
      _pspi->transfer(data, NULL, width*height*2);
      endSend(true);
    #endif
    }
    So after configuring the BTE operation, it calls the SPI transfer.

    And note that the SPI transfer function in this case has:
    Code:
    	if (buf) {
    		_dmaTX->sourceBuffer((uint8_t*)write_data, count);  
    		_dmaTX->TCD->SLAST = 0;	// Finish with it pointing to next location
    		if ((uint32_t)write_data >= 0x20200000u)  arm_dcache_flush(write_data, count);
    This is meant to flush all 243800 * 2 = 487600 bytes from the cache.

    Suggestions?

    EDIT: Looks like I should have rotated the first picture 180 degrees, but...
    Attached Files

  2. #2
    KurtE, Senior Member+ (Join Date: Jan 2014, Posts: 7,912)
    Notes on SPI DMA, and things I have tried.

    The SPI DMA code does not chain DMASetting objects, so any one DMA operation can transfer at most about 32767 bytes, and that is what the released code does.
    It sets up the DMA operation for that maximum and interrupts on completion; at that point we decrement the count of bytes still to transfer, start the DMA again, and repeat until the remaining count reaches 0.

    So this 487600-byte transfer actually takes 15 separate DMA operations.
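
    A minimal host-side sketch of the restart approach described above, assuming nothing beyond what the post says: plan_restarts is a hypothetical helper (not part of the SPI library) that only computes the sequence of DMA operations the completion ISR would start. For 487600 bytes with a 32767-byte limit it yields 14 operations of 32767 bytes plus a final one of 28862 bytes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side model of the restart approach: no descriptor
// chaining; start one DMA operation of at most max_chunk bytes, and let the
// completion ISR subtract what was sent and start the next one.
// Returns the size of each DMA operation that would be issued, in order.
std::vector<std::size_t> plan_restarts(std::size_t total, std::size_t max_chunk) {
  std::vector<std::size_t> chunks;
  if (max_chunk == 0) return chunks;  // avoid looping forever
  std::size_t remaining = total;
  while (remaining > 0) {
    std::size_t this_chunk = remaining < max_chunk ? remaining : max_chunk;
    chunks.push_back(this_chunk);  // "start DMA for this_chunk bytes"
    remaining -= this_chunk;       // what the completion ISR would do
  }
  return chunks;
}
```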

    I thought the 32767-byte transfers might be a problem because the follow-on transfers do not start on a 32-byte boundary, so I tried changing the maximum transfer size in the class
    from 32767 to 32736 (a multiple of 32). It did not appear to make a difference.
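
    To illustrate the alignment idea, here is a small sketch assuming the buffer itself starts on a 32-byte boundary (the Cortex-M7 data cache uses 32-byte lines). misaligned_starts is a hypothetical helper that lists the offsets where follow-on transfers begin off a line boundary: with a 32767-byte maximum all 14 follow-on starts are misaligned, while with 32736 (1023 * 32) none are.

```cpp
#include <cstddef>
#include <vector>

// Data-cache line size of the Cortex-M7 used on the Teensy 4.x.
constexpr std::size_t kCacheLine = 32;

// Assuming the buffer starts on a 32-byte boundary, return the byte offsets
// at which each follow-on DMA operation would start that do NOT fall on a
// cache-line boundary.
std::vector<std::size_t> misaligned_starts(std::size_t total, std::size_t max_chunk) {
  std::vector<std::size_t> bad;
  if (max_chunk == 0) return bad;  // avoid looping forever
  for (std::size_t off = max_chunk; off < total; off += max_chunk)
    if (off % kCacheLine != 0) bad.push_back(off);
  return bad;
}
```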

    I also tried adding code to flush the cache in the ISR just before starting the next transfer. That did not appear to make a difference either.

    You might notice that the sketch outputs the copy in DMAMEM twice, to see whether that made a difference. It did not appear to. Note that some iterations through the image may draw correctly.

    Next experiment: change my RED fill to go from the end of memory to the start and see if that changes anything (for example, which parts are actually cached).

  3. #3
    KurtE, Senior Member+ (Join Date: Jan 2014, Posts: 7,912)
    I was curious what it would do on a T4.1 with PSRAM.
    So I updated the sketch: when it is built for a T4.1 and detects at run time that PSRAM is present, it also copies the image to an equivalent PSRAM array. As with DMAMEM, it first fills that array with garbage (all BLACK), then does the copy, and then calls the output routine.

    And the image appears to draw correctly from PSRAM.

    Here are the changes to the above sketch:

    In the global area I added:
    Code:
    #if defined(ARDUINO_TEENSY41)
    unsigned short teensy41_Cardlike_extmem[243800] EXTMEM;
    extern "C"
    {
      extern uint8_t external_psram_size;
    }
    #endif
    The loop function is currently:
    Code:
    void loop(void) {
      tft.setFont(ComicSansMS_24);
      tft.fillScreen(RED);
    
      // Let's put something into the DMAMEM that is different...
      //  for (uint32_t i = 0; i < sizeof(teensy41_Cardlike_dmamem)/sizeof(teensy41_Cardlike_dmamem[0]); i++)
      for (int i = sizeof(teensy41_Cardlike_dmamem) / sizeof(teensy41_Cardlike_dmamem[0]) - 1; i >= 0; i--)
        teensy41_Cardlike_dmamem[i] = BLACK;
    #if defined(ARDUINO_TEENSY41)
      if (external_psram_size > 0) {
        // Pre-fill the PSRAM copy with BLACK too, so any stale data would show up
        for (uint32_t i = 0; i < sizeof(teensy41_Cardlike_extmem) / sizeof(teensy41_Cardlike_extmem[0]); i++)
          teensy41_Cardlike_extmem[i] = BLACK;
      }
    #endif
      Serial.print("Display T4.1 Extended card ");
      drawImage(575, 424, (uint16_t*)teensy41_Cardlike, BLUE);
      if (DelayOrStep()) return;
      Serial.print("DMAMEM Display T4.1 Extended card ");
      // Lets make a DMAMEM version of the card to see if it likes it or not...
      memcpy((void *)teensy41_Cardlike_dmamem, (const void *)teensy41_Cardlike, sizeof(teensy41_Cardlike));
      drawImage(575, 424, (uint16_t*)teensy41_Cardlike_dmamem, GREEN);
      if (DelayOrStep()) return;
      //
      Serial.print("DMAMEM 2nd time Display T4.1 Extended card ");
      drawImage(575, 424, (uint16_t*)teensy41_Cardlike_dmamem, RED);
      if (DelayOrStep()) return;
    #if defined(ARDUINO_TEENSY41)
      if (external_psram_size > 0) {
        memcpy((void *)teensy41_Cardlike_extmem, (const void *)teensy41_Cardlike, sizeof(teensy41_Cardlike));
        Serial.print("EXTMEM Display T4.1 Extended card ");
        drawImage(575, 424, (uint16_t*)teensy41_Cardlike_extmem, CRIMSON);
        if (DelayOrStep()) return;
      }
    #endif
    
    }
    Note: it is significantly slower to output that much data from PSRAM than from either FLASH or DMAMEM:


    Code:
    Start RA8876 picture embed testScreen Width:1024 Height: 600
    entering an 's' char will toggle on/off step mode
    Display T4.1 Extended card Image: 7 Total: 115
    Press any key to continue
    DMAMEM Display T4.1 Extended card Image: 7 Total: 115
    Press any key to continue
    DMAMEM 2nd time Display T4.1 Extended card Image: 7 Total: 115
    Press any key to continue
    EXTMEM Display T4.1 Extended card Image: 7 Total: 278
    Press any key to continue
    Again, in all cases the function returned in about 7 ms, since it does not wait for the SPI operation to complete.
    The run from FLASH and the two runs from DMAMEM all took about 115 ms in total, while drawing from PSRAM took 278 ms.
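
    As rough arithmetic on the logged numbers (a sketch, treating each Total as time spent sending the 575x424 RGB565 image; since Total also covers drawing the timing text, these are lower bounds on the effective rate): about 34 Mbit/s from FLASH or DMAMEM versus about 14 Mbit/s from PSRAM, roughly 2.4 times slower.

```cpp
// Image payload: 575 x 424 pixels at 2 bytes per pixel (RGB565).
constexpr double kImageBytes = 575.0 * 424.0 * 2.0;  // 487600 bytes

// Effective rate in Mbit/s for `bytes` sent in `ms` milliseconds.
double effective_mbit_per_s(double bytes, double ms) {
  return bytes * 8.0 / (ms * 1000.0);
}
```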

  4. #4
    mjs513, Senior Member+ (Join Date: Jul 2014, Location: New York, Posts: 5,893)
    @KurtE
    I just pulled out my display and ran your test sketch, and I am seeing the same thing you described. Interestingly, it seems to occur in the lower half of the image, and the position changes as you cycle through the image updates.

  5. #5
    KurtE, Senior Member+ (Join Date: Jan 2014, Posts: 7,912)
    @mjs513 - Thanks,

    It has been strange. I first saw this when I tried it with a malloc-created image, in a version of the code that rearranges the bytes so that a single output works in different orientations.

    Earlier I ran this and captured it with a logic analyzer (LA), and you could see the pieces where the output was black, like we are seeing.

    I'm wondering whether it would make sense to rewrite how the SPI code works here and see if it makes a difference.
    As I mentioned, it currently does not set up any chain of DMASetting objects; instead it interrupts, resets the count, and tells it to go again, which has worked so far.

    I probably don't want to set up the maximum possible chain: 512 KB / 32 KB would be a chain of about 17 of them.
    But I could do a chain of 2 of them, though I'm not sure whether that would make any difference.
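
    A toy model of the chain-of-2 idea, purely to illustrate the ping-pong pattern: Descriptor and ping_pong are hypothetical stand-ins, not the DMAChannel/DMASetting API, and nothing here touches hardware. Two descriptors stay queued; when one completes, the ISR re-arms that same descriptor with the next chunk while the hardware continues with the other, so the transfer never has to be restarted from scratch.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Stand-in for a DMA descriptor: which slice of the buffer it sends.
struct Descriptor {
  std::size_t offset = 0;
  std::size_t length = 0;
};

// Returns the descriptors in the order the hardware would complete them.
std::vector<Descriptor> ping_pong(std::size_t total, std::size_t max_chunk) {
  std::vector<Descriptor> completed;
  if (max_chunk == 0) return completed;  // avoid looping forever
  std::array<Descriptor, 2> desc{};
  std::size_t next = 0;  // first byte not yet assigned to a descriptor
  int queued = 0;        // descriptors armed and waiting to run
  auto refill = [&](Descriptor& d) {
    d.offset = next;
    d.length = std::min(max_chunk, total - next);
    next += d.length;
    ++queued;
  };
  for (auto& d : desc)
    if (next < total) refill(d);  // arm both before starting
  int active = 0;
  while (queued > 0) {
    completed.push_back(desc[active]);       // hardware finishes desc[active]
    --queued;
    if (next < total) refill(desc[active]);  // ISR re-arms it for later
    active ^= 1;                             // hardware moves to the other one
  }
  return completed;
}
```

    With 16384-byte chunks, 487600 bytes takes 30 descriptor completions, the last one 12464 bytes long.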

    I suppose I could try it and see...

  6. #6
    mjs513, Senior Member+ (Join Date: Jul 2014, Location: New York, Posts: 5,893)
    @KurtE
    Maybe chain 2 DMA transfers of half of 32K each? I wonder if that would work.

  7. #7
    KurtE, Senior Member+ (Join Date: Jan 2014, Posts: 7,912)
    Quote (originally posted by mjs513):
    @KurtE
    Maybe chain 2 DMA transfers of half of 32K each? I wonder if that would work.
    Something like that is what I am going to try. I will probably set up each DMASetting object to do transfers of up to 32K - 32 bytes, so that if the buffer passed in was 32-byte aligned, all of the entries will stay aligned. Actually, I'm not sure that makes a difference, but...

    I also don't have high hopes that it will work any differently than simply restarting SPI N times to finish, but maybe I'll get lucky.

  8. #8
    KurtE, Senior Member+ (Join Date: Jan 2014, Posts: 7,912)
    Just about to start hacking on SPI.

    Before I started, I thought I would show a couple of screen captures with the Logic Analyzer showing the issues and differences in speed.

    Here is a quick run of the sketch I mentioned with the edits to also use PSRAM...

    [Image: screenshot.jpg]

    Note that all 4 groups of screen updates use the same output code; the only difference is the memory location used.
    The first is the FLASH memory image, and the next two are from DMAMEM. At the start of each pass I zero out the memory (writing BLACK), and then right before output I memcpy the image from flash memory to DMAMEM.

    The last one is like the DMAMEM version, except that it uses external PSRAM. You can see how much slower it is: same code, but slowed down, presumably because reading that much data from PSRAM takes longer.

    If I zoom in on one of the outputs from DMAMEM, you can see gaps in the output (all zeros) where the data is not taken from the actual image but from the data I wrote earlier. In other versions I wrote RED first, and RED showed up in these gaps. Note also that the gaps don't always appear, and when they do, they are not always in the same locations.
    [Image: screenshot2.jpg]

    Now off to code hacking

    Edit: I thought I would mention again that I'm using the Saleae Logic beta builds (they hope to release a version soon). One thing I miss from version 1 is that there are no longer commands to save images to the clipboard or to a file.
    But I find I don't miss it much anymore with Windows 10's Snip & Sketch.
    Press Windows+Shift+S, select a portion of the screen to put on the clipboard, and if you click the notification that appears at the bottom, it opens an app that lets you save the capture to a file.

  9. #9
    mjs513, Senior Member+ (Join Date: Jul 2014, Location: New York, Posts: 5,893)
    @KurtE
    My suggestion was based on what you did previously for one of the display drivers, but it seemed like it would fit.

    Those LA screenshots show the issue really clearly, with the gaps in the data. I wonder if it's more a problem with the RA8876?

  10. #10
    KurtE, Senior Member+ (Join Date: Jan 2014, Posts: 7,912)
    Yep, I was trying a quick-and-dirty test using just SPI to verify that it has nothing to do with the display.

    So I was trying to figure out a quick-and-dirty way to define a large PROGMEM array with an initializer. So far this has not worked:

    Code:
    const unsigned short teensy41_Cardlike[243800] PROGMEM = {[0 ... 243799] = 0xf800};
    The compiler does not accept it: range designators like [0 ... 243799] are a GNU C extension that GNU C++ (which compiles the sketch) does not implement. I will do it with a large explicit initializer instead.
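
    Since the sketch is compiled as C++, one possible alternative to the range designator is to let a constexpr constructor do the fill. This is a sketch under assumptions, untested on a Teensy: FilledArray, kSmallRed and all_white are hypothetical names, it needs C++14 or later, and whether a 243800-element fill is folded at compile time depends on the compiler's constexpr limits. Before adding PROGMEM to such an object, it would be worth declaring it constexpr so the compiler is forced to evaluate it at compile time (or report an error) rather than silently filling it at startup.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a wrapper whose constexpr constructor fills every
// element with Value. Loops in constexpr constructors need C++14 or later.
template <std::size_t N, std::uint16_t Value>
struct FilledArray {
  std::uint16_t data[N];
  constexpr FilledArray() : data{} {
    for (std::size_t i = 0; i < N; ++i) data[i] = Value;
  }
  constexpr std::uint16_t operator[](std::size_t i) const { return data[i]; }
  static constexpr std::size_t size() { return N; }
};

// Small instance, evaluated and checked entirely at compile time.
constexpr FilledArray<16, 0xF800> kSmallRed{};
static_assert(kSmallRed[0] == 0xF800 && kSmallRed[15] == 0xF800, "fill failed");

// Full-size stand-in for the 575x424 image: 243800 pixels of 0xFFFF.
// Declared const (not constexpr) so a compiler whose constexpr limits are
// too small falls back to initializing it at startup instead of failing.
const FilledArray<243800, 0xFFFF> all_white{};
```

    The buffer would then be passed to SPI as all_white.data, with sizeof(all_white.data) bytes.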

  11. #11
    KurtE, Senior Member+ (Join Date: Jan 2014, Posts: 7,912)
    @mjs513 @Paul...

    I'm not fully concentrating today, so I may not get this finished today.
    I did hack up a version of the code that just calls SPI, and ran it without the display. My guess is there are some issues with the SPI restart code, so I will rework it.

    I hacked up a sketch that at first output all RED, but I decided to just output 0xFFFF instead.

    I could include the file all_red.h, whose data is now WHITE (all 0xFFFF).

    Code:
    #include <SPI.h>
    #include <EventResponder.h>
    #include "All_Red.h"
    
    //const unsigned short all_red[243800] PROGMEM={
    
    const int ARRAY_SIZE = sizeof(all_red)/sizeof(all_red[0]);
    unsigned short all_red_dmamem[ARRAY_SIZE] DMAMEM;
    
    EventResponder event;
    volatile bool event_happened;
    
    #if defined(ARDUINO_TEENSY41)
    unsigned short all_red_extmem[ARRAY_SIZE] EXTMEM;
    extern "C"
    {
      extern uint8_t external_psram_size;
    }
    #endif
    
    void ev_function (EventResponderRef ev) {
      event_happened = true;
    }
    
    void setup() {
      while (!Serial && millis() < 4000) ;
      Serial.begin(115200);
      pinMode(10, OUTPUT);
      digitalWriteFast(10, HIGH);
      SPI.begin();
      event.attachImmediate(ev_function);
      Serial.println(ARRAY_SIZE, DEC);
    }
    
    void test(const unsigned short *image, uint32_t size_image) {
      event.clearEvent();
      event_happened = false;
      digitalWriteFast(10, LOW);
      SPI.beginTransaction(SPISettings(60000000, MSBFIRST, SPI_MODE0));
      SPI.transfer(image, nullptr, size_image, event);
      while (!event_happened) ;
      SPI.endTransaction();
      digitalWriteFast(10, HIGH);
    }
    
    void loop() {
      Serial.println("Press any key to continue");
      while (Serial.read() == -1) ;
      while (Serial.read() != -1);
      // Let's put something into the DMAMEM that is different...
      //  for (uint32_t i = 0; i < sizeof(all_red_dmamem)/sizeof(all_red_dmamem[0]); i++)
      for (int i = ARRAY_SIZE - 1; i >= 0; i--)
        all_red_dmamem[i] = 0;
    #if defined(ARDUINO_TEENSY41)
      if (external_psram_size > 0) {
        // Pre-fill the PSRAM copy with zeros too
        for (int i = 0; i < ARRAY_SIZE; i++)
          all_red_extmem[i] = 0;
      }
    #endif
      test(all_red, sizeof(all_red));
      delay(10);
      memcpy((void*)all_red_dmamem, (const void *)all_red, sizeof(all_red));
      test(all_red_dmamem, sizeof(all_red_dmamem));
    #if defined(ARDUINO_TEENSY41)
      if (external_psram_size > 0) {
        delay(10);
        memcpy((void*)all_red_extmem, (const void *)all_red, sizeof(all_red));
        test(all_red_extmem, sizeof(all_red_extmem));
      }
    #endif
    }
    I included the same file as before, but did a search and replace on all of the hex numbers to set them to FFFF.

    [Image: screenshot.jpg]

    But you actually see some noise on all three. The one from FLASH shows some trailing 0s.

    The other two show more junk, as you can see, and also some activity on the MISO pin.
    I had nothing connected to it, and I don't remember whether we initialize it with a pull-up (PU) or pull-down (PD).

  12. #12
    KurtE, Senior Member+ (Join Date: Jan 2014, Posts: 7,912)
    @mjs513 @Paul ... You might try my quick-and-dirty change to SPI.

    It turns out I was not calling arm_dcache_flush on the whole count of bytes being output, only on the maximum I could do in one DMA operation.
    I updated it to flush the whole buffer, and things look cleaner!
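
    Why flushing only part of the buffer produces gaps can be shown with a toy model, not Teensy code: ToyMemory and stale_bytes are hypothetical, and the model assumes DMAMEM is cached write-back, as the arm_dcache_flush call for addresses at or above 0x20200000 in the SPI code implies. The memcpy leaves dirty lines in the cache, the DMA reads RAM directly, and only the first chunk gets flushed. In the model every line stays dirty, so everything past the first 32768 bytes comes out stale. The real data cache is only 32 KB, so during a 487600-byte memcpy most lines are presumably evicted (and written back) on their own, which fits the scattered, shifting gaps seen on the logic analyzer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model of a write-back data cache in front of RAM. CPU stores land in
// the cache and mark 32-byte lines dirty; DMA reads RAM directly, so it sees
// stale bytes until those lines are flushed.
constexpr std::size_t kLine = 32;

struct ToyMemory {
  std::vector<std::uint8_t> ram, cache;
  std::vector<bool> dirty;  // one flag per 32-byte line

  explicit ToyMemory(std::size_t n)
      : ram(n, 0), cache(n, 0), dirty((n + kLine - 1) / kLine, false) {}

  // CPU store path, e.g. the memcpy from flash into the DMAMEM buffer.
  void cpu_write(std::size_t off, const std::uint8_t* src, std::size_t n) {
    if (n == 0) return;
    std::memcpy(&cache[off], src, n);
    for (std::size_t l = off / kLine; l <= (off + n - 1) / kLine; ++l) dirty[l] = true;
  }

  // Write back every dirty line overlapping [off, off + n) to RAM.
  void flush(std::size_t off, std::size_t n) {
    if (n == 0) return;
    for (std::size_t l = off / kLine; l <= (off + n - 1) / kLine; ++l) {
      if (!dirty[l]) continue;
      std::size_t start = l * kLine;
      std::memcpy(&ram[start], &cache[start], std::min(kLine, ram.size() - start));
      dirty[l] = false;
    }
  }
};

// Copy an all-0xFF "image" into the buffer, flush either only the first DMA
// chunk (the bug) or the whole buffer (the fix), then count the bytes a DMA
// read of RAM would get wrong.
std::size_t stale_bytes(std::size_t total, std::size_t max_chunk, bool flush_whole) {
  ToyMemory mem(total);
  std::vector<std::uint8_t> image(total, 0xFF);
  mem.cpu_write(0, image.data(), total);
  mem.flush(0, flush_whole ? total : std::min(total, max_chunk));
  std::size_t bad = 0;
  for (std::size_t i = 0; i < total; ++i)
    if (mem.ram[i] != image[i]) ++bad;
  return bad;
}
```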

    Will do PR after more testing:
    https://github.com/KurtE/SPI/tree/T4...h_Whole_buffer

  13. #13
    mjs513, Senior Member+ (Join Date: Jul 2014, Location: New York, Posts: 5,893)
    Quote (originally posted by KurtE):
    @mjs513 @Paul ... You might try my quick-and-dirty change to SPI.

    It turns out I was not calling arm_dcache_flush on the whole count of bytes being output, only on the maximum I could do in one DMA operation.
    I updated it to flush the whole buffer, and things look cleaner!

    Will do PR after more testing:
    https://github.com/KurtE/SPI/tree/T4...h_Whole_buffer
    I see you got ambitious this afternoon. Anyway, I downloaded the PR you referenced and reran your original test sketch, and it appears to have fixed the issue with the streaks in the images. I ran through the sequence several times, as fast as I could hit the Enter key, and it all seemed to work with no issues.

    I got the following times on the first pass:
    Code:
    Start RA8876 picture embed testScreen Width:1024 Height: 600
    entering an 's' char will toggle on/off step mode
    
    Display T4.1 Extended card Image: 7 Total: 115
    Press any key to continue
    
    DMAMEM Display T4.1 Extended card Image: 7 Total: 115
    Press any key to continue
    
    DMAMEM 2nd time Display T4.1 Extended card Image: 7 Total: 115
    Press any key to continue
    
    EXTMEM Display T4.1 Extended card Image: 8 Total: 279
    Press any key to continue
    
    Display T4.1 Extended card Image: 8 Total: 116
    Press any key to continue
    I didn't notice any timing differences.

  14. #14
    KurtE, Senior Member+ (Join Date: Jan 2014, Posts: 7,912)
    Thanks @mjs513. I removed the lines I had commented out and then retried with that test.

    I also switched back to my PR branch for the RA8876; the test sketch has gone through all 4 rotations, and I am no longer seeing parts of the image drawn incorrectly.

    So I issued a pull request to SPI: https://github.com/PaulStoffregen/SPI/pull/61

    I will probably put the RA8876 back on the shelf now.

  15. #15
    mjs513, Senior Member+ (Join Date: Jul 2014, Location: New York, Posts: 5,893)
    @KurtE
    Very cool that you got it resolved. I guess I can put mine away now as well.

  16. #16
    KurtE, Senior Member+ (Join Date: Jan 2014, Posts: 7,912)
    Yep, unless I get that burr under my saddle again to add something else. I will steal back that T4.1 with the extra memory for other projects.
