DMA to 9-bit display

quiver

I've been using a 256x64 display in one of my designs for some years now and it's been great, but I'm limited to 20MHz SPI and that makes a full screen write a touch over 4ms for 9-bit.

It's 9-bit because I don't use a D/C pin, I run the display in 9-bit mode with the MSB low for C, high for D (I'm in SPI_MODE3, so that might be topsy-turvy for folks using MODE1-MODE2).
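
To make the framing concrete, here is a minimal sketch of how one 9-bit frame is put together under the convention described above (bit 8 low for a command, high for data). The helper name makeFrame9 is made up for illustration and is not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Build one 9-bit frame for a display running without a D/C pin.
// Bit 8 is the D/C flag (0 = command, 1 = data), bits 7..0 are the payload.
// makeFrame9 is a hypothetical helper, not an SPI library function.
inline uint16_t makeFrame9(uint8_t payload, bool isData) {
    const uint16_t frame =
        static_cast<uint16_t>(payload | (isData ? 0x100u : 0x000u));
    assert(frame <= 0x1FFu);  // must fit in 9 bits
    return frame;
}
```

So a command byte 0xAF goes out as 0x0AF, and a data byte 0x12 goes out as 0x112.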

Anyway, I want to transition to DMA SPI display writes to not have code blocking for 4ms every time I call a display write (50Hz / 20ms intervals, so 20% of the time my code is blocked by these display writes!).
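
For anyone checking the numbers, a rough back-of-envelope sketch follows. It is plain C++ with nothing Teensy-specific, the function names are just for illustration, and it only counts bits on the wire, so a real transfer with gaps between frames will take somewhat longer than the ideal figure.

```cpp
#include <cassert>
#include <cstdint>

// Ideal wire time in microseconds for `frames` SPI frames of `bitsPerFrame`
// bits at `clockHz`. Ignores inter-frame gaps and any setup overhead.
inline double wireTimeUs(uint32_t frames, uint32_t bitsPerFrame, double clockHz) {
    assert(clockHz > 0.0);
    return static_cast<double>(frames) * bitsPerFrame / clockHz * 1e6;
}

// Fraction of each refresh period spent blocked in a transfer.
inline double blockedShare(double transferUs, double periodUs) {
    assert(periodUs > 0.0);
    return transferUs / periodUs;
}
```

A 256x64 screen at 4 bpp is 8192 bytes, so wireTimeUs(8192, 9, 20e6) comes to about 3686 us before overhead, and blockedShare(4000.0, 20000.0) is 0.2, which is the 20% figure.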

I have two roadblocks in this quest.

The first is that with the following code, even running the unmodified SPI.transfer function for DMA hangs:
Code:
EventResponder SPI_DMA_Handler;
volatile bool dmaBusy = false;
SPISettings dmaSettings(20000000, MSBFIRST, SPI_MODE3);

...

void SPI_DMA_ready() {
  //end screen update
  SPI.endTransaction();
  digitalWriteFast(oledCS,HIGH);
  dmaBusy = false; 
}

...

void setup() {
    SPI_DMA_Handler.attachImmediate(&SPI_DMA_ready);
}

...

ptr = buff + rowBegin * 128;

balance = (rowEnd + 1 - rowBegin) * 128;

// Send txBuffer to display using SPI DMA
dmaBusy = true;
digitalWriteFast(oledCS,LOW);
SPI.beginTransaction(dmaSettings);
SPI.transfer((void *)ptr, 0, balance, SPI_DMA_Handler);

For reference, reverting the last line to my custom SPI.transfer9((void *)ptr, 0, balance, false); works fine. I know that the normal 8-bit transfer function won't actually work on my display, but I'm trying it anyway as a sanity check that DMA is working before I try to port DMA to a 9-bit function. At the moment this hangs and the Teensy resets after 10 secs or so.

So the 2nd roadblock is making a 9-bit DMA mode. I need to clarify that I don't need to read 9-bit from memory, I just need to write 9-bit to the device, and that ninth bit does not need to change during the write. I never have a need to send a display command via DMA. So the way I've done 9-bit in my SPI.transfer9 function is thus:
Code:
uint32_t tcr = port().TCR;
port().TCR = (tcr & 0xfffff000) | LPSPI_TCR_FRAMESZ(8);  // turn on 9 bit mode

...

while (count > 0) {
uint16_t tempBig = p_write? *p_write++ : _transferWriteFill;
if (!cmd) tempBig += 0x100;
port().TDR = tempBig;
count--;

But my digging through that 3500-page manual suggests DMA doesn't have a native way of doing anything other than 8/16/32-bit transfers. So my thinking is to bit-bang one bit before sending each 8-bit frame, which might require adjustments to the DMA pacing. I'm getting thoroughly out of my depth in the SPI.cpp and DMAChannel.h files. I started looking at bit-banging methods, but it seemed ridiculous to insert those into the SPI and DMA libraries when most of the code is setting up pins and timers, both of which should already be exposed in these libraries (_ccr appears to be the private var in SPI.cpp responsible for timing?).

Does anyone have any clues on getting ordinary DMA SPI happening, and then how I might make bit-banging happen within DMA SPI?
 
quiver said: "At the moment this hangs and the Teensy resets after 10 secs or so."

Why does it reset? That would indicate it is not hanging, it is encountering an error. Use CrashReport to see what it is.

You really need to include a complete sketch if you want anybody else to investigate the issue, things like the location/alignment of buff are important for DMA.
 
Sure, here's the report:

CrashReport:
A problem occurred at (system time) 7:12:34
Code was executing from address 0xB28A
CFSR: 82
(DACCVIOL) Data Access Violation
(MMARVALID) Accessed Address: 0x27
Temperature inside the chip was 36.47 °C
Startup CPU clock speed is 600MHz
Reboot was caused by auto reboot after fault or bad interrupt detected

I use Adafruit_GFX and the buffer is initialised within that. I have added functions to the Adafruit libraries to expose the buffer pointer to my program and I manage screen writes within my main program code.

The buffer is created during init in the Adafruit_GrayOLED.cpp file:
Code:
bool Adafruit_GrayOLED::_init(uint8_t addr, bool reset) {

  // attempt to malloc the bitmap framebuffer
  if ((!buffer) &&
      !(buffer = (uint8_t *)malloc(_bpp * WIDTH * ((HEIGHT + 7) / 8)))) {
    return false;
  }
This display is 4 bpp so that's 8192 bytes.
 
I've tried creating a local buffer in my main program code so I control its setup, and using memcpy to copy the framebuffer into it before pointing SPI.transfer at it (I confirmed the copy is correct by sending it with my SPI.transfer9 fn first). Neither adding DMAMEM nor aligning the buffer to 32 bytes (__attribute__((aligned(32)))) in its declaration appears to change anything.

Using the breadcrumbs component of CrashReport, I can see the program is getting past the SPI.transfer line. If I measure how long dmaBusy stays set, it's 4090us, which is what I'd expect the transfer to take, and SPI_DMA_ready() fires once. The program then arrives at SPI.transfer the second time, and that's where the crash occurs.

You can see I was starting to butt my head against this two years ago. But again, please help me get this working in 8-bit before tackling 9-bit!
 
Further investigation: this is the line in SPI.cpp where it faults the second time:
Code:
_dmaRX->TCD->ATTR_SRC = 0;
 
OK. Sorted roadblock A. I was inadvertently overflowing the buffer, which caused a string of other issues.
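
For anyone hitting the same thing, a guard along these lines before starting a transfer is cheap insurance. This is only a sketch: it assumes 128 bytes per row and an 8192-byte framebuffer as in the snippet in the first post, and checkedTransferLength is a made-up name, not something from the SPI or Adafruit libraries.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBytesPerRow = 128;   // 256 px at 4 bpp (assumed)
constexpr std::size_t kBufferBytes = 8192;  // 64 rows * 128 bytes (assumed)

// Number of bytes to send for rows [rowBegin, rowEnd] inclusive.
// Returns 0 if the range is empty or would run past the end of the
// framebuffer, so the caller can skip the transfer instead of overflowing.
inline std::size_t checkedTransferLength(std::size_t rowBegin, std::size_t rowEnd) {
    const std::size_t rows = kBufferBytes / kBytesPerRow;
    if (rowBegin > rowEnd || rowEnd >= rows) return 0;
    const std::size_t length = (rowEnd + 1 - rowBegin) * kBytesPerRow;
    assert(rowBegin * kBytesPerRow + length <= kBufferBytes);
    return length;
}
```

The idea is to compute balance through a check like this and only call SPI.transfer when the result is non-zero.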

So now I'm just trying to do 9-bit DMA SPI writes (from an 8-bit array). Bit-banging a single high bit between writes seems to be the most logical approach.
 
Just busy talking to myself over here, but I realised two things. Firstly, bit-banging would be even more difficult as I'd have to do it on the clock line too. Secondly, as SPI data has no start and stop bits like serial, I could just have a silly big for loop that inserts those ninth 'data' bits into an array that is 1/8th bigger. So I put an array in DMAMEM. The loop takes about 350us to do the copy, but with DMA screen writes I'm still ahead by 3740us!
Code:
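    // Repack 8192 source bytes into 9216 bytes of back-to-back 9-bit frames
    // (data flag bit high, then the 8 data bits): 8192 * 9 / 8 = 9216.
    uint8_t bitIdx = 8;      // flag goes at bit (bitIdx - 1); 0 means no flag in this byte
    uint16_t sourceIdx = 0;
    for (uint16_t i = 0; i < 9216; i++) {
        dmaBuffer[i] = 0;
        if (bitIdx < 8) dmaBuffer[i] |= (buff[sourceIdx - 1] << bitIdx);   // tail of the previous byte
        if (bitIdx > 1) dmaBuffer[i] |= (buff[sourceIdx] >> (9 - bitIdx)); // head of the current byte
        if (bitIdx > 0) {
            dmaBuffer[i] |= (0b1 << (bitIdx - 1)); // Insert the data flag bit
            sourceIdx++;
        }
        bitIdx = (bitIdx == 0) ? 8 : bitIdx - 1;
    }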
I'm sure someone could optimise that further, but with DMA happy and that speed increase, I'm very happy with this result.
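
One possible alternative, as a sketch only: the same repacking done with a small bit accumulator, plus a helper that reads frames back out so the result can be checked on a PC. The names pack9 and frame9At are made up for illustration, the source length is assumed to be a multiple of 8, and this has not been benchmarked on a Teensy, so whether it beats the loop above is untested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack n source bytes into back-to-back 9-bit frames (flag bit set, then the
// 8 data bits, MSB first). n must be a multiple of 8 so the output ends on a
// byte boundary; dst must hold n * 9 / 8 bytes. Returns bytes written.
inline std::size_t pack9(const uint8_t* src, std::size_t n, uint8_t* dst) {
    assert(n % 8 == 0);
    uint32_t acc = 0;    // bit accumulator, newest bits at the bottom
    unsigned bits = 0;   // number of not-yet-written bits in acc
    std::size_t out = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc = (acc << 9) | 0x100u | src[i];  // append data flag + data byte
        bits += 9;
        while (bits >= 8) {                  // flush every complete byte
            bits -= 8;
            dst[out++] = static_cast<uint8_t>(acc >> bits);
        }
    }
    return out;
}

// Read back the k-th 9-bit frame from a packed stream (for checking only).
inline uint16_t frame9At(const uint8_t* packed, std::size_t k) {
    uint16_t v = 0;
    for (std::size_t b = 0; b < 9; ++b) {
        const std::size_t bit = k * 9 + b;
        v = static_cast<uint16_t>((v << 1) | ((packed[bit / 8] >> (7 - bit % 8)) & 1u));
    }
    return v;
}
```

Calling pack9(buff, 8192, dmaBuffer) should fill the same 9216 bytes as the loop above, and frame9At(dmaBuffer, k) should come back as 0x100 | buff[k] for every k.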

I think a few folks have said it already on other pages, but it would be good to have a concise guide for DMA SPI at some point.
 