Problem trying to read OV7670 camera under IRQ Teensy 4.0

Hi Cyrille,

I just did a few quick tests generating high-frequency square waves on the T4.0. What I find on my scope is that the basic PWM tick frequency is indeed 150MHz, so whatever frequency you choose is approximated by a whole number of ticks of about 6.67ns each. For instance, if you select 12MHz with an analog value of 127, the best approximation you actually get is 6 ticks (40ns) high and 7 ticks (~47ns) low, i.e. 11.54MHz with a high duty cycle of 46%.

Now, the OV7670 datasheet specifies that XCLK must be between 10MHz and 48MHz, with a duty cycle between 45% and 55%. Depending on your frequency selection and the tick limitation above, your XCLK duty cycle might fall below 45% or rise above 55%. This could be the reason for your issues when increasing the XCLK frequency.

The best you can get in the range that the OV7670 should accept is 37.5MHz that would be generated from exactly two ticks high, two ticks low: NewFile27.png
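
To make the tick arithmetic above easier to reproduce, here is a small host-side sketch (plain C++, no Teensy needed). It is only a simplified model, not the Teensy core's code: it assumes the period is F_BUS divided by the requested frequency, rounded to the nearest tick, and that the high time is value/256 of the period, truncated. The names `PwmApprox` and `approximatePwm` are made up for this illustration.

```cpp
#include <cstdint>

// Simplified model of how a 150 MHz PWM timer approximates a requested frequency.
// Assumptions (not taken from the Teensy core source):
//   period = F_BUS / f, rounded to the nearest whole tick
//   high   = value * period / 256, truncated (8-bit analogWrite resolution)
struct PwmApprox {
    uint32_t periodTicks;  // ticks per full cycle
    uint32_t highTicks;    // ticks spent high
    double   actualHz;     // frequency really produced
    double   dutyPercent;  // high time as a percentage of the period
};

inline PwmApprox approximatePwm(double requestedHz, uint32_t value,
                                double busHz = 150e6) {
    PwmApprox r;
    r.periodTicks = static_cast<uint32_t>(busHz / requestedHz + 0.5);
    r.highTicks   = (value * r.periodTicks) >> 8;
    r.actualHz    = busHz / r.periodTicks;
    r.dutyPercent = 100.0 * r.highTicks / r.periodTicks;
    return r;
}
```

With this model, `approximatePwm(12e6, 127)` gives 13 ticks per period with 6 ticks high, which matches the 11.54MHz / 46% seen on the scope, and `approximatePwm(37.5e6, 128)` gives 4 ticks with 2 high.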

Kind regards,
Sebastian
 
Hi Cyrille,

are you still following? I must say I find your project quite interesting.

I just managed to generate a 40MHz XCLK signal, although only on pin 41 of the Teensy 4.1, using:
Code:
  CCM_CSCDR3 = CCM_CSCDR3_CSI_PODF(3-1) | CCM_CSCDR3_CSI_CLK_SEL(0b10); // Clock from PLL3_120M, divide by 3 -> 40MHz
  CCM_CCGR2 |= CCM_CCGR2_CSI(CCM_CCGR_ON); // CSI_CLK_ENABLE. TODO: Is this required?
  CORE_PIN41_CONFIG = 4; // IOMUXC_SW_MUX_CTL_PAD_GPIO_AD_B1_05 -> CSI_MCLK, CMOS Sensor Master Clock (directly from CCM)

In fact, from what I read the Teensy 4.1 should have all hardware CSI interface signals required for 8 bit cameras like the OV7670 routed to external pads. So one could also consider making use of the hardware CSI feature of the i.MX ...
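
As a rough illustration of the divider arithmetic in the snippet above: the PODF field holds the divider minus one, which is why the code writes (3-1). The sketch below lists what other CSI_MCLK frequencies the same 120MHz source could give. It assumes the PODF field is 3 bits wide (dividers 1 to 8), which should be checked against the reference manual; the helper names are hypothetical.

```cpp
#include <cstdint>

// PLL3_120M is the source selected by CSI_CLK_SEL = 0b10 in the post above.
constexpr double kPll3_120M_Hz = 120e6;

// The PODF field stores (divider - 1).
// Assumption: the field is 3 bits wide, so dividers 1..8 are possible.
constexpr uint32_t podfFieldForDivider(uint32_t divider) {
    return divider - 1;
}

// Resulting CSI_MCLK frequency for a given PODF field value.
constexpr double csiMclkHz(uint32_t podfField) {
    return kPll3_120M_Hz / (podfField + 1);
}

// OV7670 XCLK range quoted earlier in the thread: 10 MHz to 48 MHz.
constexpr bool inOv7670Range(double hz) {
    return hz >= 10e6 && hz <= 48e6;
}
```

Under that assumption, dividers 3 to 8 give 40, 30, 24, 20, about 17.1 and 15MHz, all inside the OV7670 range, while dividers 1 and 2 (120 and 60MHz) are too fast.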

Kind regards,
Sebastian
 
I don't think all of the pins are available on the T4.1.
For example, CSI_DATA00 is on GPIO_B1_10 (ALT2),
which we do not have on the T4.x. We have GPIO_AD_B1_10, which is a different pad...

What I don't know is whether one could still use this with a subset of the IO pins?
 
Hello,

Yes, I am, it's just that, well, it's the workweek and I was... strangely enough... working! :)

Anyhow, I have purchased a cheap logic analyser... BUT it came without SW/drivers... so I can not yet use it. Waiting on the seller to send the thing :-(

My project definitely does work with the clock set up at 12MHz AND at 8MHz...
My next step was to look at what switching to a faster clock using the OV7670's PLL would do.

One thing that I have not done is describe what the project actually is supposed to be!
The aim is to create a telescope "aiming" system.
A camera looks at the sky and tells you where, exactly (to within around 1°) you are pointing in the sky.
It shows this to you on the integrated sky map. And since it does know the sky, you will be able to say: I want to point at "star X", and it will tell you go right, go left... that type of thing.
I already have the sky chart part going well, and I am working on getting the cam working at a first level.
The next part is to get the cam tuned to be able to see stars (dim objects), which means that I will need to change the camera lens from the built-in 25° one to something like a 30mm aperture/50mm focal length. The aperture will give me more light, and the focal length the zoom that I need.
However, I need to find a way to attach said lens to the OV board, including a focusing system!
The last part is the solving. From a (low quality) OV picture, can I figure out, using the 100K stars that I have in the catalog, where said camera is pointing? This part is why I need the Teensy 4. It does have the computing power (600MHz + FPU) to do the work.

But, this means that I do NOT need a high frame rate. I need 2 to 5 fps, with good quality, but without any need for high frame rate.
But, I need to be able to, in parallel, allow the user to interact with the screen, and redraw it AND do the solving.
The best solution would be of course to have 2 threads... but this is not yet available :)


So, here you have the whole story. I will spend the upcoming week, when possible, trying to get the analyser working to get a better idea of the signals and their timings, depending on frequency. In parallel, I will try to move the reading under IRQ.

Cyrille
 
For example, CSI_DATA00 is on GPIO_B1_10 (ALT2),
which we do not have on the T4.x. We have GPIO_AD_B1_10, which is a different pad...

I understand that for an 8 bit camera you would require CSI_DATA02 until CSI_DATA09, and these are available as GPIO_AD_B1_15 until GPIO_AD_B1_08, which are mapped on T4.1 to Pins 27, 26, 39, 38, 21, 20, 23, 22. On top of this you would need CSI_HSYNC on GPIO_AD_B1_07 (Pin 16), CSI_VSYNC on GPIO_B1_13 (Pin 34) or GPIO_AD_B1_06 (Pin 17), CSI_PIXCLK on GPIO_AD_B1_04 (Pin 40) and finally CSI_MCLK on GPIO_AD_B1_05 (Pin 41).

But maybe I'm wrong. I've also ordered three of these OV7670 cameras for 10€ total incl. shipment, they should arrive on Thursday. Let's see if I have time to try ...

Kind regards,
Sebastian
 
The aim is to create a telescope "aiming" system.
A camera looks at the sky and tells you where, exactly (to within around 1°) you are pointing in the sky.
It shows this to you on the integrated sky map. And since it does know the sky, you will be able to see: I want to point on "star X", and it will tell you go-right, go left... type thing.

Cool. A weekend ago I borrowed the telescope of my neighbour, a Skywatcher 8" 1200mm Dobson. Jupiter, Saturn and Mars were easy, but then I tried to find M103 and not only was it tricky to get at, what I saw was also quite difficult to identify. In the end I decided that I DID see M103. I also took pictures with my S8 and D90, and a few videos using RasPI 5MP camera and AutoStakkert.

But pushing a Dobson is no fun, and with a 6mm eyepiece objects cross field of view in about half a minute, so now I'm considering getting myself either a spotting scope or a GoTo telescope. Maybe the OV7670 can then serve a purpose ...

Kind regards,
Sebastian
 
Hello,

6mm on a 1200mm focal length is 200x magnification. That is definitely high, and it can be frustrating with a small-field-of-view eyepiece.
I personally do not like motorized scopes. You end up doing astronomy looking at your controller and not at the sky.
I prefer "the hunt", as I call it.
An equatorial platform does help with tracking, though. And my aiming system could be used to track with 2 stepper motors (4 more pins!).
I normally use a red dot as a finder and not a spotting scope. I find them much easier to use.

Anyhow, if/when I make it work, I will definitely publish the whole thing.
Cyrille
 
Hello,

Thanks for that one...
Too bad I am on a Teensy4.0 and not 4.1.

The issue being that the 4.0 does not have the appropriate pins exposed.
I originally thought about getting a 4.1, but it costs twice the price of the 4.0 in France... so I went for the 4.0, not having done enough reading and not knowing enough to realize that it was a bad decision...
If/when I get this first prototype working, I might move on to the 4.1 for the next version!

Cyrille
 
If you want to speed up your pixel processing, you can get your pixels directly from the GPIO6 data register and do some bit manipulation. The direct read version in the attached code runs about 7 times faster than individual input bit tests. Using this algorithm will require some rewiring of the input bits so that 6 of the 8 are in ascending bit order. The faster processing might allow you to use a faster XCLK signal. For instance, you could use

analogWriteFrequency(clkpin, 30000000); analogWrite(clkpin, 128); to get a 30 MHz clock signal. Since the PWM counter runs at 150MHz, you can get any frequency that divides evenly into 150MHz. However, the output pin may not drive very well at higher frequencies. The signals start to look like sine waves on my 60MHz oscilloscope, so I can't tell whether the distortion is in the pin output or in my oscilloscope.

Code:
//  Pixel Read optimization test.  Direct port read goes
//  about 7 times faster, at 14nanoSeconds/pixel,  but will require rewiring input bits.
//
//  mborgerson  9/23/2020
/*******************************************
   as defined in original code */
#define PinCamD0    14 // AD_B1_02
#define PinCamD1    15 // AD_B1_03
#define PinCamD3    16 // AD_B1_07
#define PinCamD2    17 // AD_B1_06
#define PinCamD6    20 // AD_B1_10
#define PinCamD7    21 // AD_B1_11
#define PinCamD4    22 // AD_B1_08
#define PinCamD5    23 // AD_B1_09
/***************************************/
const uint16_t originalbits[8] = {14, 15, 16, 17, 20, 21, 22, 23};
/******************************************************************
    bits in port order
  #define PinCamD0    14 // AD_B1_02
  #define PinCamD1    15 // AD_B1_03
  #define PinCamD2    17 // AD_B1_06
  #define PinCamD3    16 // AD_B1_07
  #define PinCamD4    22 // AD_B1_08
  #define PinCamD5    23 // AD_B1_09
  #define PinCamD6    20 // AD_B1_10
  #define PinCamD7    21 // AD_B1_11
  *****************************************/

const int ledpin    = 13;

#define  LEDON digitalWriteFast(ledpin, HIGH); 
#define  LEDOFF digitalWriteFast(ledpin, LOW);



elapsedMicros em;
// Set pinMode for pixel bits.  Pins are set up with pull-down resistors
// active so I can test pins with a jumper to 3.3V
void SetPinMode(const uint16_t inbits[]) {
  uint16_t i;
  for (i = 0; i < 8; i++) pinMode(inbits[i], INPUT_PULLDOWN);
}

#define NUMSAMPLES 1000000 // one million iterations



// read a pixel byte with original algorithm
static inline uint8_t bitpixel(void) {
  uint8_t b = 0;
  if (digitalReadFast(PinCamD7)) b |= 0x80;
  if (digitalReadFast(PinCamD6)) b |= 0x40;
  if (digitalReadFast(PinCamD5)) b |= 0x20;
  if (digitalReadFast(PinCamD4)) b |= 0x10;
  if (digitalReadFast(PinCamD3)) b |= 0x08;
  if (digitalReadFast(PinCamD2)) b |= 0x04;
  if (digitalReadFast(PinCamD1)) b |= 0x02;
  if (digitalReadFast(PinCamD0)) b |= 0x01;
  return b;
}

// Read a million pixels for timing purposes
uint32_t ReadOriginal(void) {
  uint32_t samples;
  uint8_t b;
  samples = NUMSAMPLES;
  em = 0;
  while (samples) {
    b = bitpixel();
    samples--;
  }
  return em;
}

// read a pixel with direct port read algorithm
// The AD_B1 bits are in the high word of  GPIO6_DR
static inline uint8_t portpixel() {
  uint16_t pword;
  uint8_t b;
  // compiler will optimize to a word read from GPIO6_DR
  pword = GPIO6_DR >> 16;  // get the port bits
  b = pword >> 4; // move the high 6 bits into place
  // now we have to get the proper lower 2 bits into place
  b &= 0xFC;  // clear the lower 2 bits
  if (pword & 0x04) b |= 0x01; // bit zero comes from original bit 2
  if (pword & 0x08) b |= 0x02; // bit one comes from original bit 3
  return b;
}


// this version optimizes by reading the data register
// Collect 1 million samples to determine average time
uint32_t ReadPort1(void) {
  uint32_t samples;
  uint8_t b;
  samples = NUMSAMPLES;
  em = 0;
  while (samples) {
    b = portpixel();
    samples--;
  }
  return em;
}



void setup() {
  uint32_t etime;
  float ftime;
  Serial.begin(9600);
  delay(200);
  Serial.println("Pixel reading optimization test");
  delay(100);
  SetPinMode(originalbits);
  pinMode(ledpin, OUTPUT);
  LEDON
  etime = ReadOriginal();
  LEDOFF
  // calculate time in nanoseconds averaged from one million samples
  ftime = (float)etime/ 1000.0;
  Serial.printf("Original algorithm took %6.3f nanoseconds\n", ftime);
  delay(100);
  LEDON
  etime = ReadPort1();
  LEDOFF

  // calculate time in nanoseconds averaged from one million samples
  ftime = (float)etime / 1000.0;
  Serial.printf("Port Read Optimization took %6.3f nanoseconds\n", ftime);

}

// Now just display the byte returned with the port read algorithm
// for bit testing with jumper to 3.3V
void loop() {
  uint8_t b;
  uint16_t pword;
  LEDON
  b = portpixel();
  LEDOFF

   Serial.printf("PixelByte = 0x%02X\n", b);
  delay(1000);
}
 
Hello,

Yes, I had always planned to do this.

My version is below. Assuming that the inlining works well, it should be quick, a couple of instructions at most. As a matter of fact, the latency of the GPIO read itself is most likely to be the worst offender...

static inline uint32_t cameraReadPixel()
{
uint32_t pword= GPIO6_DR >> 18; // get the port bits
return pword= (pword&3) | ((pword&0x3f0)>>2);
}
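
For anyone who wants to check the bit shuffling without a camera attached, here is a host-side sketch of both variants from this thread, taking the register value as a parameter instead of reading GPIO6_DR. It assumes the "port order" wiring (D0, D1 on bits 18, 19 and D2..D7 on bits 22..27). The function names and the `pinsFromPixel` helper are made up for the test.

```cpp
#include <cstdint>

// Host-side model: 'dr' stands in for the value read from GPIO6_DR.
// Assumed wiring (port order): D0,D1 on bits 18,19 and D2..D7 on bits 22..27.

// Shift-and-mask version, same arithmetic as cameraReadPixel() above.
inline uint8_t pixelFromShiftMask(uint32_t dr) {
    uint32_t pword = dr >> 18;
    return static_cast<uint8_t>((pword & 0x3) | ((pword & 0x3F0) >> 2));
}

// Version with two conditional bit tests, as in the earlier benchmark sketch.
inline uint8_t pixelFromBitTests(uint32_t dr) {
    uint16_t pword = static_cast<uint16_t>(dr >> 16);  // AD_B1 bits are in the high word
    uint8_t b = static_cast<uint8_t>(pword >> 4);      // D2..D7 land on bits 2..7
    b &= 0xFC;                                         // clear the lower 2 bits
    if (pword & 0x04) b |= 0x01;                       // D0 from AD_B1_02
    if (pword & 0x08) b |= 0x02;                       // D1 from AD_B1_03
    return b;
}

// Test-only inverse (hypothetical): place a pixel byte on the assumed pins.
inline uint32_t pinsFromPixel(uint8_t pixel) {
    return (static_cast<uint32_t>(pixel & 0x03) << 18) |
           (static_cast<uint32_t>(pixel & 0xFC) << 20);
}
```

Feeding all 256 byte values through `pinsFromPixel` and back should return the original byte for both variants, and the unrelated bits of the register (for example bits 20 and 21) should have no effect on the result.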


BUT, this is only of interest IF I can do the read under interrupt with 1 interrupt per pixel.
Else, I end up waiting for the next pixel so there is no point in "running to the airport gate to wait for the plane"...

The question is: assuming that pixels arrive every 200 cycles or so, am I better off waiting in the IRQ for the next pixel and reading a whole line at once, or should I allow for 1 interrupt per pixel?
Does anyone know how long an "empty interrupt" takes to execute?

Cyrille
 
If you want to speed up your pixel processing you can get your pixels directly from the GPIO6 data register and do some bit manipulation. The direct read version in the attached code runs about 7 times faster than individual input bit tests. Using this algorithm will require some rewiring of the input bits so that 6 of the 8 are in ascending bit order. The faster processing might allow you to use a faster XCLK signal.

Cyrille, note that means that a full horizontal line is transferred much faster as well, for the case of 1 interrupt per horizontal line.

For instance, you could use analogWriteFrequency(clkpin, 30000000); analogWrite(clkpin, 128); to get a 30 MHz clock signal. Since the PWM counter runs at 150MHz, you can get any frequency that divides evenly into 150MHz.

That is correct but will not suit the OV7670. As stated before the F_BUS clock driving the PWM is ticking at 150MHz, so you CAN get a 30MHz PWM signal but NOT one with a 50% duty cycle. What happens when you use the above is two ticks on, three (!) ticks off, i.e. a 40% on 60% off signal at 30MHz, that is outside of the OV7670 spec of 45%-55% duty cycle. I just tried exactly that:
NewFile28.png
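
To see which whole-tick periods stay inside the OV7670 limits quoted in this thread (10 to 48MHz, 45% to 55% duty), one can simply enumerate them. This is a host-side sketch under the assumption that the high time is half the period rounded down (what analogWrite(pin, 128) would give in the simplified tick model); `XclkOption` and `inSpecXclkOptions` are illustrative names.

```cpp
#include <cstdint>
#include <vector>

struct XclkOption {
    uint32_t periodTicks;  // ticks per XCLK cycle
    uint32_t highTicks;    // ticks spent high
    double   hz;           // resulting XCLK frequency
    double   dutyPercent;  // resulting duty cycle
};

// Enumerate whole-tick periods and keep those that meet the limits quoted
// in this thread: 10..48 MHz and 45..55 % duty.
// Assumption: high time is floor(period / 2), i.e. analogWrite(pin, 128).
inline std::vector<XclkOption> inSpecXclkOptions(double busHz = 150e6) {
    std::vector<XclkOption> out;
    for (uint32_t period = 2; period <= 15; ++period) {
        uint32_t high = period / 2;
        double hz = busHz / period;
        // Integer comparisons avoid rounding trouble right at 45 % and 55 %.
        bool dutyOk = 100 * high >= 45 * period && 100 * high <= 55 * period;
        bool freqOk = hz >= 10e6 && hz <= 48e6;
        if (dutyOk && freqOk) {
            out.push_back({period, high, hz, 100.0 * high / period});
        }
    }
    return out;
}
```

In this model the fastest in-spec choice is 4 ticks (37.5MHz at 50%), followed by 6 ticks (25MHz) and 8 ticks (18.75MHz). The 5, 7 and 9 tick periods are rejected because their duty cycles (40%, about 43% and about 44%) fall below 45%.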

Today I received my three OV7670, so over the weekend I might have some time to give it a try as well with T4.0/T4.1.

Kind regards,
Sebastian
 
I guess that means that to meet the OV7670 waveform specs you would need to limit the clock to 25MHz (3 cycles on , 3 cycles off). If you set up the pixel clock that way, the time between pixels is 40 nanoseconds. That might be a little tight for the pixel storage loop, given that you have to store the byte, increment a counter and do some other math. Perhaps a 12.5MHz clock (80 nanoseconds) would be better.

For a line of 160 pixels, you need to read 320 bytes. At 80nSec/byte, that is 25.6 microseconds during which interrupts have to be masked. That's probably not going to interfere with other operations too much.
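
The arithmetic above, as a small sketch that can be re-run for other line widths or pixel clocks. It assumes RGB565 output (two bytes per pixel, one byte per PCLK cycle); the function names are illustrative.

```cpp
#include <cstdint>

// RGB565 needs two bytes per pixel, one byte per PCLK cycle.
constexpr uint32_t bytesPerLine(uint32_t pixelsPerLine) {
    return pixelsPerLine * 2;
}

// Nanoseconds between bytes for a given pixel clock in MHz.
constexpr double nsPerByte(double pclkMHz) {
    return 1000.0 / pclkMHz;
}

// Time spent polling one line with interrupts masked, in microseconds.
constexpr double lineReadMicros(uint32_t pixelsPerLine, double pclkMHz) {
    return bytesPerLine(pixelsPerLine) * nsPerByte(pclkMHz) / 1000.0;
}
```

For example, `lineReadMicros(160, 12.5)` gives the 25.6 microseconds mentioned above, and the same line at a 25MHz pixel clock would take half of that.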

It seems the best way to handle this is to set up an interrupt on the rising edge of the HREF signal. The handler for that interrupt masks interrupts and reads the 320 bytes by polling, then re-enables interrupts and exits. This reads one horizontal line. You do this for the 120 lines, then disable the HREF interrupt. You can now do your calculations and output. When you are ready for the next frame, wait for VSYNC and do the next frame. Note that just because the frames are now arriving much faster, you don't have to read every frame.

Getting the faster pixel clock will mean adjusting the OV7670 registers so that the pixel clock is the same as XCLK.
 
...
The question is: assuming that pixels arrive every 200 cycles or so, am I better off waiting in the IRQ for the next pixel and reading a whole line at once, or should I allow for 1 interrupt per pixel?
Does anyone know how long an "empty interrupt" takes to execute?

...

<EDIT BELOW> It is closer to 100 cycles - had something under test months back - but not yet found it ...

The _isr() will have some 10 cycles delay on trigger and jitter from other 'stuff'.

Perhaps a simple test could be set up with a pin _isr() that increments a counter, with another controllable 8-12 MHz PWM pin feeding it, to find the limits.
 
If a 45-55% duty cycle is needed, what about 37.5MHz?

Example sketch:
Code:
void setup() {
  analogWriteFrequency(0, 30000000);
  analogWrite(0, 128);
  analogWriteFrequency(1,37500000);
  analogWrite(1, 128);
   
}
void loop() {
  
}
screenshot.jpg
150000000/37500000 = 4
 
Wrote a sample as suggested, using the pin _isr(). A T4.0 can trigger and run minimal _isr() code up to about 5.3 MHz.
 
I think I've measured the time required to enter and exit an empty ISR. That time is 103 clock cycles. I arrived at this number as follows:

In a tight loop, read the ARM clock counter in two adjacent lines of code. The difference in the counter is normally 3 clock cycles. However, if an empty interrupt routine is running asynchronously to the loop, the interrupt will occasionally extend the time between the two reads of the clock counter. On those extended times, the difference is 106 clock cycles. I accumulated the data in the form of a histogram output and the results look like this:
Code:
Collecting
Timing Histogram Data in ARM Cycle Counts
    3 8998994
    4     1
    5     2
  106  1001
Loop count = 8999998
Collecting
Timing Histogram Data in ARM Cycle Counts
    3 8998995
    4     2
  106  1001
Loop count = 8999998
Collecting
Timing Histogram Data in ARM Cycle Counts
    3 8998995
    4     1
   33     1
  106  1001
Loop count = 8999998
Collecting

Here is the test code:

Code:
//  IRQ Timing test  mborgerson 9/24/2020

const int clockoutpin = 14;
const int pclkpin = 33;

const int ledpin = 13;
#define TMHISTOMAX 4096
uint32_t tm_histobuffer[TMHISTOMAX];  // for timing counts up to 4096

void setup() {
  delay(500);
  Serial.begin(115200);
  delay(1000);
  pinMode(ledpin, OUTPUT);
  pinMode(pclkpin, INPUT);
  analogWriteFrequency(clockoutpin,1500000);  // 1.5 MHz interrupt rate
  analogWrite(clockoutpin,128);
  // activate ARM cycle counter
  ARM_DEMCR |= ARM_DEMCR_TRCENA; // Assure Cycle Counter active
  ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;
  attachInterrupt(digitalPinToInterrupt(pclkpin), pixelservice, RISING);
}

volatile uint32_t pixcount;

// service the pixel clock interrupt
void pixelservice(void){
 // digitalToggleFast(ledpin);
}

uint32_t startcycles,lastcycles, tmdiff;
elapsedMicros loopmicros;
void loop() {
  uint32_t loopcount;
  // put your main code here, to run repeatedly:

  memset(tm_histobuffer, 0, sizeof(tm_histobuffer));  // clear  timing histogram counts
  Serial.println("Collecting");
  delay(10);  // let the USB output finish
  loopcount = 0;
  loopmicros = 0;  // reset the timer
  while(loopmicros < 1000000){ // run for 1 second
      startcycles =  ARM_DWT_CYCCNT; // measure time between two instructions
      lastcycles = ARM_DWT_CYCCNT;   // sometimes this gets extended by interrupt
      tmdiff = lastcycles-startcycles;
      if (tmdiff >= TMHISTOMAX) tmdiff = TMHISTOMAX - 1;
      tm_histobuffer[tmdiff]++;
      loopcount++;
  }// wait until 1 second has passed

  ShowTmHisto();
    Serial.printf("Loop count = %lu\n", loopcount);
  delay(1000); // allow time for user to read results  

}

void ShowTmHisto(void) {
  uint32_t i;
  Serial.println("Timing Histogram Data in ARM Cycle Counts");
  for (i = 0; i < TMHISTOMAX; i++) {
    if (tm_histobuffer[i] > 0) {
      Serial.printf("%5lu %5lu\n", i, tm_histobuffer[i]);
    }
  }
  Serial.println();
}

It is not surprising to see that it takes 103 cycles to get into and out of an interrupt service routine. After all, the processor has to stack all the ARM registers as well as the FPU registers and pull them at the end of the ISR. That's about 26 registers each way.

The hard limit, which would allow no time outside the interrupt, would seem to be below 6MHz (600MHz/103 ≈ 5.8MHz).

When you add in the necessity to actually read pixels inside the interrupt routine, I don't think it would be possible except at pixel clock rates well below 1MHz.
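
To put numbers on this, here is a minimal sketch of the overhead budget, using the 103-cycle figure measured above and the 600MHz core clock. It only models the entry/exit cost of an empty ISR, not any work done inside it; the names are illustrative.

```cpp
constexpr double kCoreHz = 600e6;        // Teensy 4.x core clock
constexpr double kEmptyIsrCycles = 103;  // entry + exit cost measured above

// Interrupt rate at which an empty ISR would consume the whole CPU.
constexpr double maxIsrRateHz(double isrCycles = kEmptyIsrCycles) {
    return kCoreHz / isrCycles;
}

// Fraction of CPU time (0..1) spent just entering and leaving the ISR.
constexpr double isrCpuFraction(double irqRateHz,
                                double isrCycles = kEmptyIsrCycles) {
    return irqRateHz * isrCycles / kCoreHz;
}
```

`maxIsrRateHz()` comes out at about 5.8MHz, and `isrCpuFraction(1.5e6)` at about 0.26, which is roughly in line with the ~26% loop() slowdown reported further down the thread for a 1.5 MHz interrupt rate.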
 
Hello,

By my timing, with the clock at 8MHz and the original settings, the pixel clock period is around 290 cycles.
So, using a pixel-wise IRQ, the CPU would be spending around 50% of its time in the IRQ and 50% in normal processing while getting a frame.
Apparently, in the QQVGA mode that I use, the lines come out at a 25% duty ratio, so it means that 10 times per second (10fps), the CPU would "slow down" by a factor of 1/2 for 25% of the time!

i.e. 12.5% of the CPU time would be taken by this IRQ.
However, should the IRQ be blocked for more than 300 cycles, I would lose pixels. And this is "bad"...

Today, I will be trying to move the line read under IRQ and see what it does.
This will also allow me to drop frames when not needed as the camera is able to generate frames faster than I need them.

Cyrille
 
@mborgerson came up with the result of ~100 cycles I recalled, by different means. That agrees with today's alternate test showing the _isr() functional up to 5.4 MHz, which with a minimal loop to interrupt suggests 111 cycles (600MHz/5.4MHz).

How often are the pin interrupts?

As suggested, with PWM from pin 1 triggering interrupts on pin 0 and minimal code, the _isr() works well enough at 5 MHz. A free-running loop() hits ~16M iterations per second using CYCCNT to measure the second (w/ void yield()), about double what it reaches using elapsedMicros().

Enabling the PWM to trigger the _isr() takes it to about 2M loop()s/sec at 5 MHz, and it drops under 1M at 5.4 MHz.

At 5.5 MHz the _isr() calls swamp loop(), and it takes an IntervalTimer to detect this and stop the PWM so that loop() ever runs again.

Modifying the PWM to 1.5 MHz, the 16,665,815 loop()s per second drop to 12,299,385, a drop of 26%, and the _isr() does count 1.5M. Here the _isr() is:
Code:
void test_isr() {
	i_cnt++;
	digitalToggleFast( LED_BUILTIN );
}
 
Hello,

Well, I moved the code under IRQ.
At this point in time, I am using IRQ for each frame and each line.
With this timing, I get around 16 frame IRQs per second, and since I limit myself to 5fps, I have to handle around 600 line IRQs per second (at 110µs per IRQ = 66ms, i.e. about 6.6% of CPU time). I drop the rest of the frames, as I cannot handle more than 5fps anyway.
Of course, the other line IRQs (around another 1000 per second) will still occur, but they will not result in a line read and only take around 100 cycles each, as shown in the other tests in this thread. This should only take about 166µs per second, or roughly 0.017% of the CPU time.
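
The budget above can be written out as a short host-side calculation. Plugging in the figures from this post (600 lines read per second at 110µs each, about 1000 further IRQs that return at once after roughly 100 cycles) gives roughly 66ms of line reading and under 0.2ms of idle IRQs per second. The struct and function names are made up for the illustration.

```cpp
#include <cstdint>

struct IrqBudget {
    double busyMicrosPerSecond;  // time spent reading the wanted lines
    double idleMicrosPerSecond;  // time spent in IRQs that return at once
    double cpuPercent;           // total share of each second
};

// All inputs are per-second figures taken from the post; 600 MHz core assumed.
inline IrqBudget lineIrqBudget(uint32_t linesRead, double microsPerLineRead,
                               uint32_t linesSkipped, double cyclesPerIdleIrq,
                               double cpuHz = 600e6) {
    IrqBudget b;
    b.busyMicrosPerSecond = linesRead * microsPerLineRead;
    b.idleMicrosPerSecond = linesSkipped * cyclesPerIdleIrq / cpuHz * 1e6;
    b.cpuPercent =
        (b.busyMicrosPerSecond + b.idleMicrosPerSecond) / 1e6 * 100.0;
    return b;
}
```

`lineIrqBudget(600, 110.0, 1000, 100.0)` comes out at a little over 6.6% of the CPU in total, almost all of it from the lines that are actually read.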

Although this might not be the best that can be achieved, it is good enough for me. The frame capture does not stop or delay the rest of my processing, and the screen update is working fine now, with real-time screen movement happening as needed for my UI.

Cyrille

Code:
// Note: it is important for the pins to be #defined here to allow the fast reads to work properly, as they need to know the pins at compile time
#define PinCamVsync 2  // EMC_04
#define PinCamHref  3  // EMC_05
#define PinCamPpclk 4  // EMC_06
#define PinCamXclk  5  // EMC_08
#define PinCamD0    14 // AD_B1_02
#define PinCamD1    15 // AD_B1_03
#define PinCamD2    17 // AD_B1_06
#define PinCamD3    16 // AD_B1_07
#define PinCamD4    22 // AD_B1_08
#define PinCamD5    23 // AD_B1_09
#define PinCamD6    20 // AD_B1_10
#define PinCamD7    21 // AD_B1_11
#include "Camera.h"

#include <Wire.h>
static int const cameraAddress= 0x21; // i2c address of camera's sccb device
static struct { uint8_t reg, val; } const OV7670Registers[] = {
  {0x12, 0x80}, // Reset to default values
  {0x11, 0x80}, // Set some reserved bit to 1!
  {0x3B, 0x0A}, // Banding filter value uses BD50ST 0x9D as value. some reserved stuff + exposure timing can be less than limit on strong light
  {0x3A, 0x04}, // output sequence selection. Doc too limited to be sure
  {0x3A, 0x04},
  {0x12, 0x04}, // Output format: rgb
  {0x8C, 0x00}, // Disable RGB444
  {0x40, 0xD0}, // Set RGB565
  {0x17, 0x16}, // Y window start msb (3-11) I think
  {0x18, 0x04}, // Y window end msb (3-11)
  {0x32, 0x24}, // Y window lsb end= 100b start= 100b
  {0x19, 0x02}, // X window start msb (2-10) I think
  {0x1A, 0x7A}, // X window end msb (2-10) I think
  {0x03, 0x0A}, // X window lsb (10 and 10)
  {0x15, 0x02}, // VSync negative
  {0x0C, 0x04}, // DCW enable? 
  {0x3E, 0x1A}, // Divide by 4
  {0x1E, 0x27}, // mirror image black sun enabled and more reserved!
  {0x72, 0x22}, // Downsample by 4
  {0x73, 0xF2}, // Divide by 4
  {0x4F, 0x80}, // matrix coef
  {0x50, 0x80},
  {0x51, 0x00},
  {0x52, 0x22},
  {0x53, 0x5E},
  {0x54, 0x80},
  {0x56, 0x40}, // contrast
  {0x58, 0x9E}, // matrix
  {0x59, 0x88}, // AWB
  {0x5A, 0x88},
  {0x5B, 0x44},
  {0x5C, 0x67},
  {0x5D, 0x49},
  {0x5E, 0x0E},
  {0x69, 0x00}, // gain per channel
  {0x6A, 0x40}, // more gain
  {0x6B, 0x0A}, // pll reserved stuff!
  {0x6C, 0x0A}, // AWB
  {0x6D, 0x55},
  {0x6E, 0x11},
  {0x6F, 0x9F},
  {0xB0, 0x84}, // reserved!
  {0xFF, 0xFF}  // End marker
};

// Read a single uint8_t from address and return it as a uint8_t
static uint8_t cameraReadRegister(uint8_t address) 
{
  Wire.beginTransmission(cameraAddress);
  Wire.write(address);
  Wire.endTransmission();

  Wire.requestFrom(cameraAddress, 1);
  uint8_t data = Wire.read();
  Wire.endTransmission();
  return data;
}

// Writes a single uint8_t (data) into address
static int cameraWriteRegister(uint8_t address, uint8_t data) 
{
  Wire.beginTransmission(cameraAddress);
  Wire.write(address); Wire.write(data);
  return Wire.endTransmission();
}

// Reset all OV7670 registers to their default values
static void cameraReset() 
{
  cameraWriteRegister(0x12, 0x80); delay(10);
  cameraWriteRegister(0x12, 0); delay(10);
}

// Read and display all of the OV7670 registers
static void cameraReadI2C() 
{
  for (int i = 0; i <= 0xC9; i ++) 
  {
    Serial.print("Register: "); Serial.print(i, HEX); Serial.print(" Value: "); uint8_t reg = cameraReadRegister(i); Serial.println(reg, HEX);
  }
}

// Slow the frame rate enough for camera code to run with 8 MHz XCLK. Approximately 1.5 frames/second
static void cameraSlowFrameRate() 
{ // Clock = external-clock * pll_mull(0,4,6 or 8 on bits 6 and 7) / (2*divider_5_lsb+1)
  // Here: 
  // CLKRC register: Prescaler divide by 31
  uint8_t reg = cameraReadRegister(0x11); cameraWriteRegister(0x11, (reg & 0b1000000) | 0b00011110);
  // DBLV register: PLL = 6
  reg = cameraReadRegister(0x6B); cameraWriteRegister(0x6B, (reg & 0b00111111) | 0b10000000);
}
static void cameraClock(int mhz) // Assumes an 8MHz external clock (XCLK)
{ // Clock = external-clock * pll_mull(0,4,6 or 8 on bits 6 and 7) / (2*divider_5_lsb+1)
  float ratio= mhz/8.0f; // this is what we want to get to... find closest value...
  float const pll[3]= { 4.0f, 6.0f, 8.0f };
  int bestpll= 0, bestdiv=0; float bestr= 1000.0;
  for (int p= 0; p<3; p++)
    for (int d=0; d<32; d++)
    {
      float r= pll[p]/(2*(d+1)); if (abs(r-ratio)<abs(ratio-bestr)) { bestr= r; bestpll= p+1; bestdiv= d; }
    }
   
  // CLKRC register: set the prescaler bits to the best divider found
  uint8_t reg = cameraReadRegister(0x11); cameraWriteRegister(0x11, (reg & 0b1000000) | bestdiv);
  // DBLV register: set the PLL multiplier bits (7:6) to the best multiplier found
  reg = cameraReadRegister(0x6B); cameraWriteRegister(0x6B, (reg & 0b00111111) | (bestpll<<6));
  char t[100]; sprintf(t, "camera clock %f with pll=%d and div=%d", 8.0f*bestr, bestpll, bestdiv); Serial.println(t);
}

static void cameraFrameStartIrq();
static void cameraReadLineIrq();
static inline bool getCamVsync() { return (GPIO4_DR & (1<<4))!=0; } // 2  // EMC_04
static inline bool getCamHref() { return (GPIO4_DR & (1<<5))!=0; } // 3  // EMC_05
static void inline cameraStopLineIrq() { GPIO4_IMR&= ~(1<<5); } // disable irq
static void inline cameraStartLineIrq() { GPIO4_ISR= 1<<5; GPIO4_IMR|= 1<<5; } // clear any pending IRQ (ISR bits are write-1-to-clear) and re-enable
static inline bool getCamPpclk() { return (GPIO4_DR & (1<<6))!=0; } // 4  // EMC_06

void cameraBegin()
{
    // Setup all GPIO pins as inputs
    pinMode(PinCamVsync, INPUT); pinMode(PinCamHref,  INPUT); pinMode(PinCamPpclk,  INPUT);
    pinMode(PinCamD0,    INPUT); pinMode(PinCamD1,    INPUT); pinMode(PinCamD2,    INPUT); pinMode(PinCamD3,    INPUT); pinMode(PinCamD4,    INPUT); pinMode(PinCamD5,    INPUT); pinMode(PinCamD6,    INPUT); pinMode(PinCamD7,    INPUT);
    analogWriteFrequency(PinCamXclk, 8000000); analogWrite(PinCamXclk, 127); delay(100); // 9mhz works, but try to reduce to debug timings with logic analyzer
    // Configure the camera for operation
    Wire.begin();
    int i= 0; while (OV7670Registers[i].reg!=0xff) { cameraWriteRegister(OV7670Registers[i].reg, OV7670Registers[i].val), i++; }
    cameraWriteRegister(0x3a, 0x14);cameraWriteRegister(0x67, 0x80);cameraWriteRegister(0x68, 0x80); // B&W mode...
    //cameraClock(12);
    //cameraWriteRegister(0x11, 0x07); cameraWriteRegister(0x3b, 0x0a);// Night mode1
    //cameraWriteRegister(0x11, 0x03); cameraWriteRegister(0x3b, 0x0a);// Night mode2
    delay(100);
    attachInterrupt(digitalPinToInterrupt(PinCamVsync), cameraFrameStartIrq, RISING);
    attachInterrupt(digitalPinToInterrupt(PinCamHref), cameraReadLineIrq, RISING); //cameraStopLineIrq();
}

static inline uint32_t cameraReadPixel() 
{
  uint32_t pword= GPIO6_DR >> 18;  // get the port bits. We want bits 18, 19 and 22 to 27
  return (pword&3) | ((pword&0x3f0)>>2);
}

static uint16_t cameraImageTemp[160 * 120]; // QQVGA image buffer in RGB565 format. This is where the IRQ stores the incoming picture
uint16_t cameraImage[160 * 120];            // When an image is complete, it is copied here (note: swap buffers to avoid the mem copy?)
static volatile bool cameraNewFrame= false; // true when cameraImage contains a new image that was never read (shared with the IRQ, hence volatile)
static uint32_t cameraPixelPos= 0, cameraLineCount= 0; // Used to know where to store the next incoming pixel and where we are in the frame being received...
static uint32_t nextFrameGatherTime= 0;     // when next to grab a frame? This is used to limit the framerate
static volatile bool cameraframeError= false; // true if an error was detected
static volatile bool cameraframeDone= true;   // true when a frame was fully received. no further reception will be done while true

// Function called on line sync rising IRQ. Reads an incoming line and stores it in cameraImageTemp
static void cameraReadLineIrq()
{
  if (cameraframeDone) return;           // not actively receiving. Ignore the frame.
//  uint32_t linet1= ARM_DWT_CYCCNT;
  for (int k=cameraWidth(); --k>=0;)     // get a full line
  {
    if (!getCamHref() || !getCamVsync()) { cameraframeError= true; return; } // if premature end detected. note it.
    while (!getCamPpclk()); // Wait for clock to go high
    uint32_t hByte = cameraReadPixel();
    while (getCamPpclk());   // Wait for clock to go back low
    while (!getCamPpclk()); // Wait for clock to go high
    uint32_t lByte = cameraReadPixel();
    cameraImageTemp[cameraPixelPos++] = (hByte << 8) | lByte; // save data.
    while (getCamPpclk());   // Wait for clock to go back low
  }
  if (--cameraLineCount>0) return;      // frame not done? ready for next line!
  cameraframeDone= true;                // stop further reading until requested
  cameraNewFrame= true;                 // note that the frame is complete for the caller
  memcpy(cameraImage, cameraImageTemp, sizeof(cameraImageTemp)); // Copy it to main buffer
}

// Function called on frame sync rising IRQ. Schedules a read of the next frame.
static void cameraFrameStartIrq()
{
  // limit frame reading to 5/s
  uint32_t now= millis();
  if ((int32_t)(now-nextFrameGatherTime)<0) return; // too early (signed difference survives millis() wrap-around)
  nextFrameGatherTime= now+200;        // schedule next read in 200ms
  // ready for next frame...
  cameraPixelPos= 0; cameraLineCount= cameraHeight(); cameraframeError= false; cameraframeDone= false;
}

// return true if a new frame is available in the cameraImage buffer.
bool cameraRead()
{
  bool r= cameraNewFrame; cameraNewFrame= false; return r;
}
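
The comment on `cameraImage` above asks whether switching buffers could avoid the `memcpy`. A minimal sketch of that idea follows, written as plain C++ so it can be tried off-target. `FrameBuffers`, `publish` and `acquire` are made-up names for illustration, not part of the code above, and the sketch assumes the main loop finishes with a frame before the IRQ completes the next one.

```cpp
#include <cstddef>
#include <cstdint>

// QQVGA in RGB565, same size as cameraImage in the code above.
constexpr std::size_t kFramePixels = 160 * 120;

// Hypothetical ping-pong frame store: the IRQ fills one buffer while the
// main loop reads the other, so no copy is needed when a frame completes.
struct FrameBuffers {
  uint16_t buf[2][kFramePixels];
  volatile uint32_t writeIdx = 0;   // index of the buffer the IRQ is filling
  volatile bool newFrame = false;   // set when a completed frame is waiting

  // Buffer the line IRQ should store incoming pixels into.
  uint16_t *writeBuffer() { return buf[writeIdx]; }

  // Call from the line IRQ after the last line: swap roles, flag the frame.
  void publish() {
    writeIdx = writeIdx ^ 1u;
    newFrame = true;
  }

  // Call from the main loop: returns the completed frame, or nullptr if
  // nothing new has arrived since the last call.
  const uint16_t *acquire() {
    if (!newFrame) return nullptr;
    newFrame = false;
    return buf[writeIdx ^ 1u];
  }
};
```

With the 200ms frame throttle used above, the reader has plenty of time before the buffer it holds is written again. Without that throttle, a third buffer or a "busy" flag would be needed.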
 
Does anyone know how many cycles are required from an external pin rising until the *first instruction* of the serving ISR? I'm running the example at 25MHz XCLK, i.e. 6.25MHz PCLK. It takes 76ns from HREF L->H until PCLK L->H and then 80ns to PCLK H->L, and I have the feeling that I'm systematically missing the first PCLK transition in the ISR ...

Kind regards,
Sebastian
 
According to the app note at https://www.nxp.com/docs/en/application-note/AN12078.pdf, it should only take about 12 clock cycles. However, I followed the methodology in the app note and found it took about 90 cycles. I think this is because when you attach an interrupt to a pin, it goes through a single vector and handler for the whole GPIO port, and some software in the handler has to sort through the pin registers to find which pin caused the interrupt, then vector you to the routine you've attached. You can find the code in ..\cores\teensy4\interrupt.c.
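
To put those cycle counts next to the camera timing quoted earlier in the thread, here is a small host-side sketch of the arithmetic. It assumes the Teensy 4.0 is running at its default 600MHz core clock; the helper names are made up for illustration.

```cpp
#include <cstdint>

// Assumption: Teensy 4.0 at its default 600 MHz core clock.
constexpr double kCpuHz = 600e6;

// Convert a CPU cycle count to nanoseconds (hypothetical helper).
constexpr double cyclesToNs(uint32_t cycles, double cpuHz = kCpuHz) {
  return cycles * 1e9 / cpuHz;
}

// Convert a time in nanoseconds to CPU cycles (hypothetical helper).
constexpr double nsToCycles(double ns, double cpuHz = kCpuHz) {
  return ns * cpuHz / 1e9;
}

// True if an ISR whose first instruction runs after `latencyCycles` starts
// before an edge that arrives `edgeDelayNs` after the interrupt trigger.
constexpr bool isrBeatsEdge(uint32_t latencyCycles, double edgeDelayNs,
                            double cpuHz = kCpuHz) {
  return cyclesToNs(latencyCycles, cpuHz) < edgeDelayNs;
}
```

Under that assumption, 12 cycles is 20ns and 90 cycles is 150ns, while the 76ns from HREF rising to the first PCLK rising edge is only about 45.6 cycles. So `isrBeatsEdge(12, 76.0)` is true and `isrBeatsEdge(90, 76.0)` is false: with a 90-cycle entry the ISR starts only a few ns before PCLK falls again at 156ns, which fits the suspicion that the first PCLK transition gets missed.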

There are 8 pins on GPIO1 that have a unique vector for a rising edge on each pin. That should give much faster response. However, I can't figure out if any of those pins are available on the T4.x. I'm also a bit confused about GPIO port numbers. The reference manual says that the chip has 5 GPIO ports, but the core files refer to GPIO6..GPIO8 and these ports appear at memory addresses that are not shown in the reference manual. There must be some magic mirroring of addresses happening somewhere.
 
At startup, the 1062's IO is moved from GPIO 1,2,3 to GPIO 6,7,8 in order to facilitate FAST IO mode.

Once in FAST IO mode as noted all pin interrupts arrive at a single code entry point to be parsed and forwarded to 'attached' _isr() as determined by the detected change/rise/fall state.
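
A simplified host-side model of that single entry point may make the extra cost easier to see. This is only a sketch of the general pattern (mask the pending flags, then walk the set bits and call each attached handler); `PortModel` and `dispatchPort` are made-up names, and the real code in interrupt.c differs in detail.

```cpp
#include <cstdint>

using IsrFn = void (*)();

// Simplified model of one GPIO port: status and mask registers plus a
// table of attached per-pin handlers. All names here are illustrative.
struct PortModel {
  uint32_t isr = 0;       // pending-interrupt flags, one bit per pin
  uint32_t imr = 0;       // interrupt enable mask, one bit per pin
  IsrFn table[32] = {};   // handler attached to each pin, or nullptr
};

// Shared entry point: find every pending, enabled pin and call its handler.
// Returns the number of handlers invoked.
inline int dispatchPort(PortModel &port) {
  uint32_t status = port.isr & port.imr;
  port.isr &= ~status;                  // acknowledge the flags we service
  int calls = 0;
  while (status) {
    int pin = __builtin_ctz(status);    // lowest set bit (GCC/Clang builtin)
    if (port.table[pin]) {
      port.table[pin]();
      ++calls;
    }
    status &= status - 1;               // clear that bit and continue
  }
  return calls;
}
```

Every attached handler pays for the register reads, the bit scan and the indirect call before its own first instruction runs, which is the overhead a dedicated per-pin vector avoids.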

For 1:1 pin interrupts, a port could be moved back to SLOW IO mode, or, if only SPECIFIC pins are needed, the interrupt.c code could perhaps be replaced to ignore the checks on unused parts.

It seems that replacement is what I was playing with when I saw a typical response of about 100 cycles - and that went down by some tens of cycles IIRC before I moved on.

In SLOW IO mode the output is delayed some ( waiting for a slower bus ? ) - not sure how that would affect the input detection for attachInterrupt. That was seen in T_4.0 beta before the FAST IO change.
 
I dug up some code from earlier forum posts and figured out how to switch pin 0 (GPIO1_3) as pixel clock to a slow-mode port pin. It does give a faster interrupt response--the time from clock rising edge to setting a bit in the ISR decreased from 84nSec to 44nSec. That is even better than in the app note about IRQ latency--which may have something to do with the scope output bit being in fast mode. However there was a side effect in that the ISR seemed to run more slowly. I suspect that it is a side effect of switching something in the port setup when I changed GPIO1_3 to slow mode.

I also found that I was wrong to attribute some of the slow IRQ response to stacking the FPU registers. By default the CPU is set up to use "lazy stacking" where the FPU registers are not saved unless the FPU is used in the ISR.

Here is the new test code, which uses a #define to switch between the slow and fast IRQ response. Note that only pins 0 and 1 can be used for the pixel clock input as they are the only two that go to low-order bits in GPIO1.

Code:
//  IRQ Timing test  mborgerson  9/25/2020

const int clockoutpin = 14;
const int pclkpin = 0;  // assigned to GPIO1_3
// This bit is set and cleared in the IRQ handler for oscilloscope timing
const int scopepin = 10;

const int ledpin = 13;
#define TMHISTOMAX 4096
uint32_t tm_histobuffer[TMHISTOMAX];  // histogram bins, one per ARM cycle count (0..4095)
uint8_t pixbuffer[65536];  // simulated frame buffer
volatile uint32_t pixidx = 0;
volatile uint32_t pixcount;
volatile uint8_t pixbyte;

//#define SLOWMODE
void setup() {
  delay(500);
  Serial.begin(115200);
  delay(1000);

  pinMode(ledpin, OUTPUT);
  pinMode(scopepin, OUTPUT);  // scope marker

  analogWriteFrequency(clockoutpin, 3000000); // 3 MHz interrupt rate
  analogWrite(clockoutpin, 128);
  // ARM cycle counter activated in startup code

  pinMode(pclkpin, INPUT); 

#ifdef SLOWMODE
  // Set up pin 0 as GPIO1  for faster vectoring
  // pin 0 maps to GPIO1_3
  CCM_CCGR1 |= CCM_CCGR1_GPIO1(CCM_CCGR_ON);  // enable GPIO1
  IOMUXC_GPR_GPR26 = 0xFFFFFFF7; // GPIO1_3 to slow mode
  attachInterruptVector(IRQ_GPIO1_INT3, &pixelservice);
  GPIO1_IMR |= 0x08;  // enable bit 3 interrupt
  GPIO1_ICR1 |= 0x80;  // enable rising edge interrupt
  // GPIO1_EDGE_SEL starts at 0x0000,  which is OK
  VerifySetup();
  NVIC_ENABLE_IRQ(IRQ_GPIO1_INT3);

#else 
  // use standard fast I/O ports, but slower vectoring
  attachInterrupt(digitalPinToInterrupt(pclkpin), pixelservice, RISING);
#endif
}

void VerifySetup(void){
  Serial.printf("GPIO1_DR        0x%08X\n", GPIO1_DR);
  Serial.printf("GPIO1_GDIR      0x%08X\n", GPIO1_GDIR);
  Serial.printf("GPIO1_ICR1      0x%08X\n", GPIO1_ICR1);
  Serial.printf("GPIO1_ICR2      0x%08X\n", GPIO1_ICR2);
  Serial.printf("GPIO1_IMR       0x%08X\n", GPIO1_IMR);
  Serial.printf("GPIO1_EDGE_SEL  0x%08X\n", GPIO1_EDGE_SEL);  
  delay(2000);
  
}

// service the pixel clock interrupt
void  pixelservice(void) {
  uint16_t pword;
  uint8_t b;
  digitalWriteFast(scopepin, HIGH);  // set scope marker pin
#ifdef SLOWMODE
  // have to clear the ISR bit ourselves
  GPIO1_ISR = 0x08; // clear ISR flag (write 1 to clear; |= would also clear any other pending flags)
#endif
  pword = GPIO6_DR >> 16;  // get the port bits
  b = pword >> 4; // move the high 6 bits into place
  // now we have to get the proper lower 2 bits into place
  b &= 0xFC;  // clear the lower 2 bits
  if (pword & 0x04) b |= 0x01; // bit zero comes from original bit 2
  if (pword & 0x08) b |= 0x02; // bit one comes from original bit 3
  pixbuffer[pixidx++] = b;
  if (pixidx > 65535) pixidx = 0; //wrap the pointer to beginning
  digitalWriteFast(scopepin, LOW);  // clear marker pin
}

uint32_t startcycles, lastcycles, tmdiff;
elapsedMicros loopmicros;
void loop() {
  uint32_t loopcount;
  // put your main code here, to run repeatedly:

  memset(tm_histobuffer, 0, sizeof(tm_histobuffer));  // clear  timing histogram counts
  Serial.println("Collecting");
  delay(10);  // let the USB output finish
  loopcount = 0;
  loopmicros = 0;  // reset the timer
  while (loopmicros < 1000000) { // run for 1 second
    startcycles =  ARM_DWT_CYCCNT; // measure time between two instructions
    lastcycles = ARM_DWT_CYCCNT;   // sometimes this gets extended by interrupt
    tmdiff = lastcycles - startcycles;
    if (tmdiff >= TMHISTOMAX) tmdiff = TMHISTOMAX - 1;
    tm_histobuffer[tmdiff]++;
    delayMicroseconds(2);
    loopcount++;
  }// wait until 1 second has passed

  ShowTmHisto();
  Serial.printf("Loop count = %lu  ", loopcount);
  Serial.println();
  delay(1000); // allow time for user to read results
}

void ShowTmHisto(void) {
  uint32_t i;
  Serial.println("Timing Histogram Data in ARM Cycle Counts");
  for (i = 0; i < TMHISTOMAX; i++) {
    if (tm_histobuffer[i] > 0) {
      Serial.printf("%5lu %5lu\n", i, tm_histobuffer[i]);
    }
  }
  Serial.println();
}
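
The magic numbers in the SLOWMODE setup above (`0x80` for GPIO1_ICR1 and `0xFFFFFFF7` for IOMUXC_GPR_GPR26) can be derived from the pin number. The sketch below shows that derivation as plain C++ that compiles off-target; the helper names are made up, and the 2-bit field encoding is taken from the GPIO chapter of the i.MX RT1060 reference manual.

```cpp
#include <cstdint>

// Interrupt configuration codes for one pin's 2-bit ICR field.
enum class IcrMode : uint32_t { LowLevel = 0, HighLevel = 1, Rising = 2, Falling = 3 };

// ICR1 covers pins 0..15 with two bits per pin (hypothetical helpers).
constexpr uint32_t icr1Mask(unsigned pin) { return 3u << (2 * pin); }
constexpr uint32_t icr1Value(unsigned pin, IcrMode m) {
  return static_cast<uint32_t>(m) << (2 * pin);
}

// Read-modify-write that clears the old field before setting the new mode.
constexpr uint32_t icr1Set(uint32_t reg, unsigned pin, IcrMode m) {
  return (reg & ~icr1Mask(pin)) | icr1Value(pin, m);
}

// GPR26 image that keeps every GPIO1 pin in fast mode except `pin`.
constexpr uint32_t gpr26SlowOnly(unsigned pin) { return ~(1u << pin); }

// Compile-time checks against the constants used in the test code above.
static_assert(icr1Value(3, IcrMode::Rising) == 0x80u, "GPIO1_ICR1 |= 0x80");
static_assert(gpr26SlowOnly(3) == 0xFFFFFFF7u, "IOMUXC_GPR_GPR26 value");
```

The plain `|= 0x80` only gives a rising-edge setting when the other bit of the field is still 0. Going through something like `icr1Set` would also be correct if the pin had previously been configured for a different mode.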
 
Now that my two cheap cameras have arrived, do you have some sample code to play with? :D

Looks like Arduino and Adafruit have versions of libraries that are geared directly to some specific hardware.
 