Teensy 3.6 8pin, 16pin, and 32pin simultaneous writes

Status
Not open for further replies.

NoahK

New member
Hello PJRC! I am a new member here and am really excited to get to know you all.

I am looking to make a 3-dimensional POV display using APA102 LEDs. Specifically, I plan on making a 20x20 matrix (400 LEDs), and am debating how to handle pushing data to the LEDs. I need to refresh the LEDs 3600 times per second in order to achieve 120 subdivisions per revolution at 30 revolutions per second. Each LED requires 32 bits (4 bytes) of data, so 400 LEDs need 1600 bytes per subdivision, which comes to 192,000 bytes per frame (one revolution). With 30 frames per second, this means that I have to push out 5,760,000 bytes of data per second via SPI to the LEDs.

On a single strand, this would require an SPI clock of roughly 50 MHz (about 46 Mbit/s of payload), but if I were to split up the LEDs into many strands and drive those simultaneously, the speed needed would be much slower. I figured, looking at the APA102 datasheet, that I could bit-bang the clock and data lines for 7 strands en masse, using two bytes per SPI clock cycle. Transmitting the 7 BGR values 0xFF0000, 0x00FF00, 0x0000FF, 0xFFFF00, 0x00FFFF, 0xFF00FF, and 0xFFFFFF simultaneously, with the clock on the leftmost bit, would output the following bytes in this order on an 8-pin port:
Code:
01001011  \  \__ Two lines (bytes) make up one clock cycle
11001011  |  /
01001011  |
11001011  |
01001011  |
11001011  |
01001011   \__ 16 lines make up one byte of data
11001011   /    Blue data for the 7 pixels, clock is on leftmost bit
01001011  |
11001011  |
01001011  |
11001011  |
01001011  |
11001011  |
01001011  |
11001011 _/
00101101  \
10101101  |
00101101  |
10101101  |
00101101  |
10101101  |
00101101   \__ Green data for LEDs
10101101   /
00101101  |
10101101  |
00101101  |
10101101  |
00101101  |
10101101  |
00101101  |
10101101 _/
00010111  \
10010111  |
00010111  |
10010111  |
00010111  |
10010111  |
00010111  |
10010111  |
00010111   \___ Red Data for LEDs
10010111   /
00010111  |
10010111  |
00010111  |
10010111  |
00010111  |
10010111 _/
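
The table above can be generated in code. Here is a minimal sketch, assuming the layout shown (clock on bit 7, strands 0 to 6 on bits 6 down to 0, most significant color bit first). The names `encodeChannel` and `encodePixels` are made up for illustration, and only the three color bytes from the table are covered; the fourth byte of each 32-bit LED frame would be expanded the same way.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed layout, taken from the table: bit 7 of every port byte is the
// shared clock, bits 6..0 carry the data lines of strands 0..6.
constexpr int kStrands = 7;
constexpr std::uint8_t kClockBit = 0x80;

// Expand one 8-bit color channel of 7 pixels into 16 port bytes:
// for each bit (MSB first) one byte with clock low, one with clock high.
void encodeChannel(const std::array<std::uint8_t, kStrands>& channel,
                   std::uint8_t* out) {
    assert(out != nullptr);
    for (int bit = 7; bit >= 0; --bit) {
        std::uint8_t data = 0;
        for (int s = 0; s < kStrands; ++s) {
            if (channel[s] & (1u << bit)) {
                data |= static_cast<std::uint8_t>(1u << (6 - s));  // strand 0 -> bit 6
            }
        }
        *out++ = data;                                         // clock low
        *out++ = static_cast<std::uint8_t>(data | kClockBit);  // clock high
    }
}

// Encode blue, green, then red for 7 pixels given as 0xBBGGRR values.
std::array<std::uint8_t, 48> encodePixels(
        const std::array<std::uint32_t, kStrands>& bgr) {
    std::array<std::uint8_t, 48> out{};
    for (int c = 0; c < 3; ++c) {
        std::array<std::uint8_t, kStrands> channel{};
        for (int s = 0; s < kStrands; ++s) {
            channel[s] = static_cast<std::uint8_t>(bgr[s] >> (16 - 8 * c));
        }
        encodeChannel(channel, out.data() + 16 * c);
    }
    return out;
}
```

Feeding it the seven example values should reproduce the 48 lines listed above, starting with 01001011 (0x4B) and 11001011 (0xCB).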

This basically doubles the memory that frames take up, but it slows down the necessary clock speed by a factor of 7. And considering how fast port manipulation is, this should be feasible. But to make the most use of the Teensy and free up as much time as possible, I started to wonder if 16-pin ports exist on the Teensy 3.6. In this forum post, there is mention of 16-pin ports that are addressed using 32-bit writes. Is it possible to write to 16 or even 32 pins simultaneously? What kind of limitations or complications would I run into?
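
For reference, the data-rate arithmetic can be written out as compile-time constants. This is only a sketch of the numbers in this post: it ignores the APA102 start and end frames, and it assumes the 400 LEDs are split as evenly as possible, so the longest of the 7 strands has 58 LEDs. All constant names are made up for illustration.

```cpp
#include <cstdint>

constexpr std::uint32_t kLeds          = 20 * 20;  // 400 LEDs
constexpr std::uint32_t kBytesPerLed   = 4;        // 32 bits per LED
constexpr std::uint32_t kSlicesPerRev  = 120;      // subdivisions per revolution
constexpr std::uint32_t kRevsPerSecond = 30;
constexpr std::uint32_t kStrands       = 7;

constexpr std::uint32_t kBytesPerSlice  = kLeds * kBytesPerLed;            // 1600
constexpr std::uint32_t kBytesPerRev    = kBytesPerSlice * kSlicesPerRev;  // 192000
constexpr std::uint32_t kBytesPerSecond = kBytesPerRev * kRevsPerSecond;   // 5760000

// One strand carrying everything: 8 clock cycles per byte.
constexpr std::uint32_t kSingleStrandHz = kBytesPerSecond * 8;             // 46080000

// Seven parallel strands: the longest strand sets the clock rate.
constexpr std::uint32_t kLedsPerStrand = (kLeds + kStrands - 1) / kStrands;  // 58
constexpr std::uint32_t kPerStrandHz =
    kLedsPerStrand * kBytesPerLed * 8 * kSlicesPerRev * kRevsPerSecond;      // 6681600

// Bit-banging needs two port writes (clock low, clock high) per clock cycle.
constexpr std::uint32_t kPortWritesPerSecond = 2 * kPerStrandHz;             // 13363200
```

So the single-strand case needs about 46 MHz, while 7 parallel strands need roughly 6.7 MHz of clock, or about 13.4 million port writes per second.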
 
These pins are accessed using five 32-bit registers. If you write directly to these registers, you can simultaneously update all the pins within one of the 5 groups. All the PTA pins are on port A, all the PTB pins are another group, and so on to PTE.

You can see which pins are in which groups on the schematic.

http://www.pjrc.com/teensy/schematic.html
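
As a rough sketch of what such a direct write can look like: the Teensy core headers name the port output registers `GPIOx_PDOR` (for example `GPIOD_PDOR` for port D). The choice of port D and the mask below are assumptions for illustration only, so check the schematic for the pins you actually use. On a desktop build a plain variable stands in for the register, so the masking logic can be tried without hardware.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__MK66FX1M0__)        // building for a Teensy 3.6
  #include <Arduino.h>
  #define PORT_OUT GPIOD_PDOR     // port D data output register
#else                             // desktop stand-in, not real hardware
  static volatile std::uint32_t fakePortD = 0;
  #define PORT_OUT fakePortD
#endif

// Hypothetical wiring: only the low 8 bits of the port drive LED strands.
constexpr std::uint32_t kStrandMask = 0x000000FFu;

// Update all masked pins with a single register write and leave the
// other pins of the same port unchanged.
inline void writeStrands(std::uint32_t value) {
    PORT_OUT = (PORT_OUT & ~kStrandMask) | (value & kStrandMask);
}
```

Note that this is a read-modify-write, so an interrupt that changes the same port between the read and the write could have its change overwritten. If every pin of the port belongs to the display, assigning the register directly avoids that.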
 
Ah, yes, I see. It's just that at most 15 or 16 pins from each of the ports are broken out to the headers.
At what rate could I successfully read 32 bits of data from flash or ram and write that to the port? Thinking of something like this, that runs per interrupt:
Code:
:At a timer interrupt
:  for counter in range 0 to 63
:    read 32 bits from flash in location(counter) to ram
:    write those 32 bits to a port
:  end for
:end interrupt

How fast could this code be executed? What are some tips and tricks to execute this code even faster?
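
In C++ the loop above might look roughly like the following. This is a sketch, not measured code: `pushBurst` and `kWordsPerBurst` are made-up names, and `port` stands in for whichever GPIO output register is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kWordsPerBurst = 64;

// Copy one burst of precomputed 32-bit words to a port register.
// Both pointers are plain locals, which gives the compiler the chance
// to keep the addresses in CPU registers for the whole loop.
inline void pushBurst(const std::uint32_t* src, volatile std::uint32_t* port) {
    assert(src != nullptr && port != nullptr);
    const std::uint32_t* end = src + kWordsPerBurst;
    while (src != end) {
        *port = *src++;  // one load and one store per iteration
    }
}
```

On a Teensy this could be called from an `IntervalTimer` callback with something like `pushBurst(frame, &GPIOD_PDOR)`, where `frame` is a hypothetical buffer. Keeping that buffer in RAM rather than flash avoids the cache-miss case.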
 
Well, it's your job to write the code :) It's hard to say how fast it can run ...
But it's easy to test as soon as the code is there. So, write it :)
 
At what rate could I successfully read 32 bits of data from flash or ram and write that to the port?

Reading (or writing) RAM takes 2 cycles, or only 1 cycle if the previous instruction did a similar access.

Flash can take 1 to 5 cycles, depending on whether the location you want is already in the cache memory.

Writing to the port register takes (I think) 2 cycles. Maybe, not really sure. It might involve another cycle of latency after the write, while the CPU is already executing the next instruction, because there's a bus bridge inside the chip between the fast switched bus matrix and the slower peripheral bus. This is a finer detail I've never really dug into. The internal details are complex, the documentation isn't wonderful (a lot of very complex material that isn't necessarily specific to this chip), and the difference is a matter of dozens of nanoseconds, which is a challenge to measure.

Of course, these execution times assume you already have the address of the memory or GPIO location (or a base address covering many locations) stored in one of the 8 or 12 or 13 available registers. On ARM, there aren't any direct addressing modes, so to access anything in the 4 GB address space, the address has to be put into one of the ARM's registers.
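
To put rough numbers on this, here is a back-of-the-envelope estimate for a burst of 64 words. It assumes the Teensy 3.6 default 180 MHz clock and the cycle counts quoted above, including the uncertain 2 cycles for the port write, and it leaves out loop overhead and interrupt entry and exit. Treat it as a lower bound to check against a real measurement, not as a result.

```cpp
#include <cstdint>

constexpr std::uint32_t kCpuHz = 180000000;  // Teensy 3.6 default clock
constexpr std::uint32_t kWords = 64;         // words per burst

// Cycle counts as quoted above; the port write figure is a guess.
constexpr std::uint32_t kRamReadBest    = 1;
constexpr std::uint32_t kRamReadWorst   = 2;
constexpr std::uint32_t kFlashReadWorst = 5;
constexpr std::uint32_t kPortWrite      = 2;

// Microseconds for one burst, counting only the load and the store.
constexpr double burstMicros(std::uint32_t readCycles) {
    return 1e6 * kWords * (readCycles + kPortWrite) / kCpuHz;
}

constexpr double kBestCaseUs   = burstMicros(kRamReadBest);     // about 1.07 us
constexpr double kRamWorstUs   = burstMicros(kRamReadWorst);    // about 1.42 us
constexpr double kFlashWorstUs = burstMicros(kFlashReadWorst);  // about 2.49 us
```

Even the pessimistic flash case is only a few microseconds per 64 words under these assumptions, so the loop and interrupt overhead left out here may well dominate in practice.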
 