Fastest port manipulation ?

Status
Not open for further replies.
[I moved this post to a new thread at the instigation of Robin]

Howdy Paul / forum users,

I am working with the Teensy3.6 to see if it can be used for a video project.

Briefly, I have a camera which is PAL / NTSC and which has a BT.656 port on its bottom. This amounts to an 8-bit port clocked at 27 MHz, which over the course of 40 msec (for PAL) sends out the two fields of an interlaced frame.

I would like to capture that, and store it to a UHS-I microSD card (good for 90MB/sec writes).
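
As a rough check of the bandwidth involved, the arithmetic can be sketched as below. It assumes the standard BT.656 byte clock of 27 MHz with one byte per clock (blanking and sync codes included), and takes the 90 MB/s card figure at face value. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bytes arriving per second on an 8-bit parallel port: one byte per clock.
// Assumption: every clock carries a byte (blanking and sync codes included).
inline uint64_t portBytesPerSecond(uint64_t clockHz) {
    assert(clockHz > 0);
    return clockHz;
}

// Bytes delivered during one frame period given in milliseconds.
inline uint64_t bytesPerFrame(uint64_t clockHz, uint64_t frameMs) {
    return portBytesPerSecond(clockHz) * frameMs / 1000;
}

// Number of 512-byte SD sectors needed to hold that many bytes (rounded up).
inline uint64_t sectorsNeeded(uint64_t bytes) {
    return (bytes + 511) / 512;
}
```

With a 27 MHz clock and a 40 ms PAL frame, bytesPerFrame() gives 1,080,000 bytes, or 2,110 sectors per frame. The raw stream is therefore about 27 MB/s, comfortably below the quoted card speed, provided the µC can keep up on both the port side and the SD side.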

I think that I need to write to the card in 512-byte chunks (minimum). I also want to store the image locally (in RAM) to compare with previous frames. If a frame is repeated (due to integration) then I would like to not write the second (or third etc) frames.
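
One way the repeat check might look is sketched below. Two caveats: a raw 40 ms frame at roughly 27 MB/s is about 1.08 MB, which is more than the 256 KB of RAM on the Teensy 3.6, so the buffers here would have to hold a cropped or subsampled copy; and real video is rarely byte-identical, so the sketch uses a mean-absolute-difference threshold rather than an exact match. The function name and the threshold are assumptions, not something from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns true when `cur` differs from `prev` by no more than
// `maxMeanDiff` grey levels per byte on average, i.e. the frame is
// treated as a repeat and need not be written to the card.
// Both buffers must hold `len` bytes; the threshold is a tuning guess.
inline bool isRepeatFrame(const uint8_t* prev, const uint8_t* cur,
                          size_t len, uint32_t maxMeanDiff) {
    assert(prev != nullptr && cur != nullptr && len > 0);
    uint64_t sumAbsDiff = 0;
    for (size_t i = 0; i < len; ++i) {
        sumAbsDiff += static_cast<uint64_t>(std::abs(int(cur[i]) - int(prev[i])));
    }
    // Compare sums instead of dividing, to stay in integer arithmetic.
    return sumAbsDiff <= static_cast<uint64_t>(maxMeanDiff) * len;
}
```

A threshold of 0 demands an exact match; a small non-zero value tolerates sensor noise between otherwise identical integrated frames.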

Now I think I need a µC that can do port reading / SD writing at 30 MHz. The Teensy might be out of its depth here - right now I can get 12.19 MHz out of the tightest loop available, just toggling a pin, when compiled for Fastest + PureCode + LTO + overclock @240MHz. But there might be ways to make it run faster that I have not yet explored.

Code:
const int led = 13;

void setup() {
  pinMode(led, OUTPUT);
}

void loop() {
  // Tightest toggle available: one write high, one write low per pass.
  digitalWrite(led, HIGH);
  digitalWrite(led, LOW);
}
If anyone has any thoughts, please feel free to advise.

Regards, Tony Barry
Sydney, AUS

[Two posters offered thoughts on the topic, which is good - responses below]

@ https://forum.pjrc.com/members/36745-Theremingenieur
@ https://forum.pjrc.com/members/37133-Frank-B
Thank you for your thoughts. As it turned out, I tried the usual (AVR) DDRx / PORTx / PINx registers and they were actually slower than digitalWrite (!). If there is a Teensy alternative to these commands I have not yet found it. And while the data sheet will be of good use, the Arduino environment is the arbiter of commands and syntax, and I have not seen a way to delve into what that environment will accept. But all suggestions are gratefully received.

Regards, Tony Barry
Sydney, AUS
 
Howdy Paul,

That digitalWriteFast looks like something I could use !

Is there a list of such additions to the Arduino command syntax anywhere ?

Regards, Tony Barry
Sydney, AUS
 
Using AVR register conventions will usually give slow compiled code, since the 32-bit ARM CPUs simply do not have identical registers; if you use them anyway, you pull the handbrake through emulation. Examples of “classic” and native embedded programming of the Teensy in C/C++, without the Arduino simplifications and restrictions, can be found in almost all Teensyduino core files. There is some background information available in this document: https://www.pjrc.com/teensy/K66P144M180SF5RMV2.pdf
 
Thanks to all who have replied.

The digitalWriteFast command appears to have serious legs ... and some limitations (of course).

The attached image shows a screenshot of my scope (100MHz, 10:1 500MHz passive probe 10MΩ 13pF), executing a great number of digitalWriteFast instructions:-
Code:
void loop() {

  while (1 == 1) {
    digitalWriteFast(led, HIGH);
    digitalWriteFast(led, LOW);
    digitalWriteFast(led, HIGH);
    digitalWriteFast(led, LOW);
    digitalWriteFast(led, HIGH);
    digitalWriteFast(led, LOW);

   ...

   [repeat for 128 times]

  }
}

The probe loads the circuit way too much, but the basic facts are evident - the Teensy 3.6 can do at least 112MHz of I/O in bursts limited by the length of the inline instructions.

Loop boundaries take time from the burst (obviously).

When "void loop" ends, there is a chunk of stuff going on which takes around 500 nsec (other threads have suggested this is checking events for serial ports, and that it can be eliminated).

This is very heartening, and makes my project look a lot more doable. I may have averted the necessity for an FPGA ...

Regards, Tony Barry
Sydney, AUS

The following thread has many good points on high speed I/O.
https://forum.pjrc.com/threads/4156...l-transfers-from-a-10MSPS-ADC-on-a-Teensy-3-6
 

Attachments

  • teensy112MHz.png (30.2 KB)
The probe loads the circuit way too much

If you didn't go to the trouble of directly configuring the pin, you might also be seeing the effect of the slew rate limiting which pinMode() enables by default. To configure the pins in fast mode without this limiting, you have to write directly to the config registers. Be aware the pins can create quite a lot of high frequency energy and large current spikes on the power lines when you rapidly toggle a pin in high current mode without slew rate limiting!
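
For reference, the configuration word involved can be sketched as follows. The bit positions mirror the K66 PORTx_PCRn layout as I read the reference manual (SRE = bit 2, DSE = bit 6, MUX = bits 10..8). The constant and function names are local stand-ins so the snippet compiles off-target; they are not Teensyduino identifiers. On the Teensy itself the equivalent write would be along the lines of `CORE_PIN13_CONFIG = PORT_PCR_DSE | PORT_PCR_MUX(1);` issued after pinMode(), which is an assumption based on the core files rather than something stated in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Local mirrors of the PORTx_PCRn bit fields (assumed from the K66 manual).
constexpr uint32_t kPcrSre = 1u << 2;   // slew rate enable (1 = slow edges)
constexpr uint32_t kPcrDse = 1u << 6;   // drive strength enable (1 = high drive)

// MUX field, bits 10..8; value 1 selects the plain GPIO function.
inline uint32_t pcrMux(uint32_t alt) {
    assert(alt <= 7);
    return alt << 8;
}

// What pinMode(pin, OUTPUT) is understood to write: GPIO, high drive, slow slew.
inline uint32_t defaultOutputConfig() { return kPcrSre | kPcrDse | pcrMux(1); }

// Same configuration with slew rate limiting switched off (fast edges).
inline uint32_t fastOutputConfig() { return defaultOutputConfig() & ~kPcrSre; }
```

The only difference between the two words is the SRE bit (0x144 versus 0x140), which is why the change has to be made after pinMode() has run, not before.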
 

Hi Paul,

Thank you for this excellent advice. Is there a place where I might see how this is done in Arduino syntax ? Any keyword clues so I can search the forum ?

Regards,
Tony Barry
Sydney, AUS
 
As it has been stated several times before, Arduino syntax is very limited. You'll have to use native C/C++ and learn register organisation, names, and their structure from the K66 reference manual or by studying the Teensyduino core files source code.

You'll now have to leave the comfortable "copy and paste by Google" zone and move over to system architecture study and true embedded software development...
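
As a small taste of what "native" access looks like: each Kinetis GPIO port exposes a 32-bit input register (GPIOx_PDIR) that returns all pins of the port in a single read. The sketch below assumes the eight data lines are wired to port D bits 0..7 (on Teensy 3.x boards these are commonly listed as pins 2, 14, 7, 8, 6, 20, 21, 5, but check the schematic). The helper names are hypothetical, and the register read itself appears only in a comment so the code also compiles off-target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Keep the low 8 bits of a 32-bit port snapshot.
// On the Teensy the snapshot would come from: uint32_t raw = GPIOD_PDIR;
inline uint8_t lowByteOfPort(uint32_t raw) {
    return static_cast<uint8_t>(raw & 0xFFu);
}

// Copy `count` port snapshots into a byte buffer, one byte per sample.
// `samples` stands in for successive register reads.
inline void unpackSamples(const uint32_t* samples, uint8_t* out, size_t count) {
    assert(samples != nullptr && out != nullptr);
    for (size_t i = 0; i < count; ++i) {
        out[i] = lowByteOfPort(samples[i]);
    }
}
```

Reading the whole port at once is what makes an 8-bit parallel capture feasible at all; eight separate digitalRead() calls would sample the bits at different instants.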
 
Hi Theremingenieur,

Thank you for your reply.

I agree that Arduino syntax is limited. However Teensy appears to use the Arduino IDE as its programming tool of choice. Hence my question.

I have downloaded the reference manual, but at 2500 pages, this remains a reference and not a tutorial ... I would hope for some conciseness in a tutorial on these things.

I have come across the following gem buried in the forum which explains Teensy direct port mapping, and the difficulties in finding out this information.

https://forum.pjrc.com/threads/1753...-GPIO_PDIR-_PDOR?highlight=slew+rate+limiting

If you have time, would you be interested in distilling your considerable knowledge into such a tutorial ? That would go much further in spreading the light.

Regards, Tony Barry
Sydney, AUS
 
Fast port IO is possible, but there are a few issues that limit the maximum practical speeds that can be achieved.

The Teensy 3.6 can toggle an IO pin at up to one half of the CPU frequency. Without overclocking, this rate is 90 MHz. But this rate may not be practical in a real application. The K66 datasheet specifies a GPIO slew (rise/fall) time of 7 ns maximum. This exceeds the single instruction cycle period of 5.56 ns at 180 MHz. Although a specific Teensy may slew faster than this, you can't guarantee that every K66 will slew faster than 7 ns.
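
The numbers in the paragraph above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: it assumes one pin write per `cyclesPerWrite` CPU cycles and treats the 7 ns figure as a worst-case edge time; the function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Duration of one CPU clock cycle in nanoseconds.
inline double cyclePeriodNs(uint32_t fCpuHz) {
    assert(fCpuHz > 0);
    return 1e9 / static_cast<double>(fCpuHz);
}

// Square-wave frequency (Hz) when the pin is written once every
// `cyclesPerWrite` cycles; one full period takes two writes.
inline double toggleFrequencyHz(uint32_t fCpuHz, uint32_t cyclesPerWrite) {
    assert(cyclesPerWrite > 0);
    return static_cast<double>(fCpuHz) / (2.0 * cyclesPerWrite);
}

// True when a worst-case edge of `edgeNs` completes within one pin state.
inline bool edgeFitsInHalfPeriod(uint32_t fCpuHz, uint32_t cyclesPerWrite,
                                 double edgeNs) {
    return edgeNs <= cyclePeriodNs(fCpuHz) * cyclesPerWrite;
}
```

At 180 MHz one cycle is about 5.56 ns, so a 7 ns worst-case edge does not fit into a single-cycle pin state. Spending two cycles per write (a 45 MHz square wave) gives about 11.1 ns per state, which does.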

Another issue that I have encountered is a combination of compiler optimization and what I believe to be issues related to the pipeline architecture of the K66.

If you issue a series of back to back digitalWriteFast() functions to toggle a specific GPIO bit, you would expect to see a 90 MHz clock as a result. But interestingly enough, this seems not to be the case outside of a loop structure. Unrolling a loop and explicitly repeating any number of digitalWriteFast() functions to execute a specific number of clock cycles does not result in a pulse train of matched high and low periods as one would expect.

I have experimented with different methods of creating a shift clock to read data from a 24 bit A/D converter. And although writing a series of 48 digitalWriteFast() functions to clock 24 bits of data should give a consistent clock, it does not. The resulting high and low periods vary from what one would expect. Some are as fast as they should be, but many of them are stretched by multiples of the F_CPU clock period.

However, placing these function calls within a loop structure yields different results. The first iteration through the loop behaves as I have described: inconsistent, with some clock periods stretched for no apparent reason. The stretching is not consistent between high and low states. But all stretching is clearly a multiple of the F_CPU clock period. Extra CPU cycles are taking place for some reason.

What is clear is that once the loop completes the first iteration, all subsequent iterations are timed precisely as expected. So back to back digitalWriteFast() calls to toggle a single bit will yield a 90 MHz clock. The only way to capture the odd behavior of the first iteration is to trigger on the first rising edge of the I/O pin used as a clock.

In the case of my project, I had to enclose each toggle of the I/O line and the read of a separate I/O line of the serial data within a loop that executes 24 times. Some additional instructions for de-serialization were added. The resulting clock rate is non-symmetric, but nearly perfectly matches the clock phase widths and rates that work with my A/D converter.
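
A loop of the kind described above might be shaped like the sketch below. This is not the poster's code: the MSB-first bit order, the sampling point (data read while the clock is high) and the function name are all assumptions. The pin accesses are passed in as callables so the logic can be exercised off-target; on a Teensy they might wrap digitalWriteFast() and digitalReadFast().

```cpp
#include <cassert>
#include <cstdint>

// Clock in `bits` bits, MSB first, from a serial ADC.
// setClock(bool) drives the shift clock; readData() samples the data line.
template <typename SetClock, typename ReadData>
uint32_t shiftIn(SetClock setClock, ReadData readData, unsigned bits) {
    assert(bits >= 1 && bits <= 32);
    uint32_t value = 0;
    for (unsigned i = 0; i < bits; ++i) {
        setClock(true);                                 // raise the shift clock
        value = (value << 1) | (readData() ? 1u : 0u);  // sample while clock is high (assumed)
        setClock(false);                                // lower the clock to end the pulse
    }
    return value;
}
```

Because the read and the shift sit between the two clock writes, the high phase comes out longer than the low phase, which matches the non-symmetric clock described in the post.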

Since this sort of bit-banging is about the only solution for high end IO implementation, I would strongly advise that you capture the first iteration case to see how the clock phases measure up. And please keep in mind that the 7 ns maximum slew time has to be taken into account at these speeds. You may not see it with your particular Teensy, but without screening devices, you can't guarantee that the timing you observe with your specific device can be replicated on all Teensies you might use for your application.
 