Problem trying to read OV7670 camera under IRQ Teensy 4.0


cyrille

Hello,

I am trying to read the OV7670 camera frames under IRQ.
The reason why I want to do it under IRQ is because frames can come as slow as 3fps, and I do not want to have to put a 1/3 of a second delay in my main loop (where I have other things to do, such as handle pen events)...
And of course, the OV7670 sends data when IT wants, not when you want it...

Anyhow, I first made a "normal", non IRQ version to make sure that I had everything OK, and then tried to move it under IRQ.
Using the "normal" version, I checked that reading a frame takes around 500 microseconds (half a millisecond), so it should be perfectly OK under IRQ. I also have a limited number of IRQs per second (one per frame, topping out at around 15fps).
So, the "under IRQ" load should be less than 0.5*15=7.5ms/s ... No problems... In theory...
BUT when I moved the code under IRQ, suddenly everything seemed to slow down to a crawl... and the code tells me that it takes 100ms to read a frame!!!!

Note: the camera is configured at 160*120 pixels

Any clue as to what might be going wrong?

In parallel to this, I have a screen display, using the teensy tft library with the screen updated asynchronously using a framebuffer...
I have no clue if it makes any difference... The picture shows the system with the LCD displaying a starfield and the camera on the top right. The camera itself on the left and the teensy at the top...

Thanks,
Cyrille
IMG_1838.jpg

Here is how I init the interrupt:
Code:
    attachInterrupt(digitalPinToInterrupt(PinCamVsync), cameraSubRead, RISING);



static uint32_t readtime= 0; // accumulates the time spent reading the camera...

// Here is the camera frame reading code, which works both under IRQ or not... But reports around 0.5ms execution time
// when not in IRQ and 100ms under irq!!!
// Read a uint8_t of the pixel data
static inline uint8_t cameraReadPixel() 
{
  uint8_t b = 0;
  if (digitalReadFast(PinCamD7)) b |= 0x80;
  if (digitalReadFast(PinCamD6)) b |= 0x40;
  if (digitalReadFast(PinCamD5)) b |= 0x20;
  if (digitalReadFast(PinCamD4)) b |= 0x10;
  if (digitalReadFast(PinCamD3)) b |= 0x08;
  if (digitalReadFast(PinCamD2)) b |= 0x04;
  if (digitalReadFast(PinCamD1)) b |= 0x02;
  if (digitalReadFast(PinCamD0)) b |= 0x01;
  return b;
}

static void cameraSubRead()
{
  int i= 0; 
  #ifdef dointerrupt
  camCount++;
  #endif
  uint32_t t1= micros();
  while (digitalReadFast(PinCamVsync)) 
  {
    while (digitalReadFast(PinCamVsync) && !digitalReadFast(PinCamHref)); // Wait for a line to start
    if (!digitalReadFast(PinCamVsync)) break;  // Line didn't start, but frame ended
    while (digitalReadFast(PinCamHref))     // Wait for a line to end
    {
      while (!digitalReadFast(PinCamPpclk)); // Wait for clock to go high
      uint8_t hByte = cameraReadPixel();
      while (digitalReadFast(PinCamPpclk));   // Wait for clock to go back low
      while (!digitalReadFast(PinCamPpclk)); // Wait for clock to go high
      cameraImage[i++] = (hByte << 8) | cameraReadPixel();
      while (digitalReadFast(PinCamPpclk));   // Wait for clock to go back low
    }
  }
  readtime+= micros()-t1;
}

// And my "gather data loop", which has both an IRQ and non IRQ version. it displays the stats....
int cameraReadCalls= 0;
// Acquire and display RGB565 image from the camera. Gets called by the main loop
bool cameraRead()
{
  #ifndef dointerrupt
  uint32_t t1= millis();
  while (digitalReadFast(PinCamVsync));    // Wait for the old frame to end
  while (! digitalReadFast(PinCamVsync));  // Wait for a new frame to start
  uint32_t t2= millis();
  noInterrupts();
  cameraSubRead();
  interrupts();
  uint32_t t3= millis();
  Serial.print(t2-t1); Serial.println(" camera wait time");
  Serial.print(t3-t2); Serial.println(" camera read time");
  #else
  uint32_t t3= millis();
  #endif

  Serial.print(camIrqCount); Serial.println(" camera irq");
  Serial.print(camCount); Serial.println(" frame count");
  Serial.print(camCount/(float(t3)/1000)); Serial.println(" frame /s");
  Serial.print(readtime/camCount); Serial.println(" micros / frame");
  bool r= cameraNewFrame; cameraNewFrame= false; return r;
}
 
I believe on the Teensy micros() does not work properly from within an interrupt.

Also, I suggest that you add some timeouts to your protocol handling loops.

You might also want to consider connecting the camera's data pins to exactly one of the T4 ports, such that you could read all 8 data lines in one go.

Kind regards,
Sebastian
 
Hello,

>I believe on the Teensy micros() does not work properly from within an interrupt.
Do you have any idea what equivalent function I could use? I need accuracy of around 1/10 of a ms...

>Also, I suggest that you add some timeouts to your protocol handling loops.
Hard to do without some type of way to measure time :-(
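For what it is worth, a timeout does not strictly need a clock: a wait loop can simply give up after a fixed number of polls. Below is a minimal sketch of that idea. `spinWhile` and `maxSpins` are made-up names, and the right poll budget is an assumption that would have to be calibrated on the real board.

```cpp
#include <cassert>
#include <cstdint>

// Spin while condition() stays true, but give up after maxSpins polls.
// Returns true if the condition cleared in time, false on timeout.
// maxSpins is a hypothetical budget: calibrate it on the real hardware.
template <typename Condition>
bool spinWhile(Condition condition, uint32_t maxSpins)
{
  assert(maxSpins > 0);
  for (uint32_t spins = 0; spins < maxSpins; ++spins) {
    if (!condition()) return true;   // the awaited edge arrived
  }
  return false;                      // timed out: caller can abort the frame
}
```

In the frame reader, each wait could then be wrapped like `if (!spinWhile([] { return !digitalReadFast(PinCamPpclk); }, maxSpins)) return;`, so that a missing clock edge aborts the frame instead of hanging the handler.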

>You might also want to consider connecting the camera's data pins to exactly one of the T4 ports, such that you could read all 8 data lines in one go.

Yes, this is planned. In fact, the pins are set up so that they are all on the same port, and in the right order, but I have not yet figured out how to access the register directly... I am lacking the appropriate doc...
Anyhow, that would NOT speed up the system as the camera is dictating when to do the read. Speeding this up would just mean spending more time waiting for the next data to be available :-(

Cyrille
 
I have had no problems using micros() inside an interrupt handler---as long as the handler executes in less than a millisecond.

The code in post #1 shuts off interrupts for the duration of cameraSubRead(). There are a bunch of wait loops in that function, which probably means that it takes longer than a millisecond. If the function calls micros() while interrupts are blocked, the value returned might not advance correctly, because the systick interrupt is blocked.

As I understand the micros() source, it uses the millis() value and adds a fraction of a millisecond computed from the difference between the current ARM cycle counter and the cycle counter value at the last systick interrupt. If the systick() interrupt is blocked, the milliseconds value doesn't change and the calculation of micros() returns an improper value.

Some basic rules I follow to use interrupts successfully:

1. Don't have an interrupt handler that takes longer than a few tens of microseconds to execute on a T4 at 600MHz.
2. Don't have an interrupt handler wait on an external event. If you do that, it means that you don't really know how long the handler will take to execute.
3. Don't mask interrupts for more than a few hundred instructions.

On another topic, where did you get your code to set up the OV7670 camera? I worked with an OV76XX about 6 years ago and found that there was little to no documentation of the setup values for the camera registers. You had to be a major purchaser and sign an NDA to get the real documentation. Has that changed? I still have the camera and might want to work with it again.
 
The micros() code on T_4.x runs in under 40 cycles, and makes no external calls - and should be fine in an _isr(). It uses the last known millis value and adds the amount of time since that arrived based on the Cycle Counter then and now.

It does not disable interrupts ( or enable them ) - but if any interrupt happens to occur during the read of the two values current_millis and Cycle count at last millis update - it re-reads those two global values to make sure they are from the same time if the interrupt happened to be the systick interrupt.
 
> The micros() code on T_4.x runs in under 40 cycles, and makes no external calls - and should be fine in an _isr(). It uses the last known millis value and adds the amount of time since that arrived based on the Cycle Counter then and now.

That is the basis of my question about the validity of micros() if interrupts have been disabled for more than a millisecond. If a systick() interrupt is missed, by disabling interrupts, will micros() be correct because the fractional part calculated from the difference in cycle counts is greater than one? This should be easily verified with a test program--which I may get around to tomorrow.
 
> ... question about the validity of micros() if interrupts have been disabled for more than a millisecond. If a systick() interrupt is missed, by disabling interrupts, will micros() be correct because the fractional part calculated from the difference in cycle counts is greater than one? This should be easily verified with a test program--which I may get around to tomorrow.

When interrupts are off and a systick is missed - both are then wrong.

When I wrote that code it was expected it would round up and add into the next ms - of course then when millis ticked it would go backwards **. However there was an anomaly found when the F_CPU_ACTUAL was dropped - that went away when Paul edited to test and add only up to the ms, never over. He added a test/limit and some _asm and it actually runs ~2 cycles faster - between _asm and perhaps feeding the dual execute core better.

** Writing that now - perhaps that delayed systick round up is where the anomaly came from. Systick hits 'later' and updates tick and the CYCCNT at that time - the next micros() would then report based on that info that could repeat or precede the prior returned value.
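To make the failure mode discussed above concrete, here is a simplified host-side model of a clock built as "last millis value plus a cycle-counter fraction, clamped to one millisecond". It is only a sketch based on the descriptions in this thread, not the actual core source, and every name in it (`MicrosModel`, `tick`, `millisAtTick`, `cyclesAtTick`) is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a "millis + cycle-counter fraction" clock.
// Illustrative only; this is not the Teensy core implementation.
struct MicrosModel {
  uint32_t cyclesPerMicro;      // e.g. 600 at 600 MHz
  uint32_t millisAtTick = 0;    // value the tick interrupt last published
  uint32_t cyclesAtTick = 0;    // cycle counter captured at that tick

  explicit MicrosModel(uint32_t cpm) : cyclesPerMicro(cpm) { assert(cpm > 0); }

  // What the periodic tick interrupt would do, once per millisecond.
  void tick(uint32_t cycleCounterNow) {
    ++millisAtTick;
    cyclesAtTick = cycleCounterNow;
  }

  // Microseconds = whole milliseconds + fraction since the last tick,
  // with the fraction clamped so it never exceeds one millisecond.
  uint32_t micros(uint32_t cycleCounterNow) const {
    uint32_t frac = (cycleCounterNow - cyclesAtTick) / cyclesPerMicro;
    if (frac > 1000) frac = 1000;
    return millisAtTick * 1000 + frac;
  }
};
```

With 600 cycles per microsecond, a reading taken 0.5 ms after a tick comes out right. If `tick()` is never called for 97 ms, though, the model reports only 1 ms elapsed since the last tick, because the fraction saturates at 1000.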
 
> I believe on the Teensy micros() does not work properly from within an interrupt.

Folks, I looked up the issues I had with micros() a while ago. It was NOT on Teensy 4, it was on Teensy 3, and it was when I had globally disabled interrupts: micros() on Teensy 3 disables and then re-enables interrupts without checking whether they were already disabled upon entry. Sorry for the confusion.

Note micros() on AVR does it more properly, reinstating the previous SREG state after disabling interrupts and fetching the volatile data ...

Kind regards,
Sebastian
 
Code:
while (digitalReadFast(PinCamVsync)) 
  {
    while (digitalReadFast(PinCamVsync) && !digitalReadFast(PinCamHref)); // Wait for a line to start
    if (!digitalReadFast(PinCamVsync)) break;  // Line didn't start, but frame ended
    while (digitalReadFast(PinCamHref))     // Wait for a line to end
    {
      while (!digitalReadFast(PinCamPpclk)); // Wait for clock to go high
      uint8_t hByte = cameraReadPixel();
      while (digitalReadFast(PinCamPpclk));   // Wait for clock to go back low
      while (!digitalReadFast(PinCamPpclk)); // Wait for clock to go high
      cameraImage[i++] = (hByte << 8) | cameraReadPixel();
      while (digitalReadFast(PinCamPpclk));   // Wait for clock to go back low
    }
  }

Hi Cyrille,

just a speculation: Maybe your code is too fast now? When it was not yet in the ISR maybe some stuff was going on in the background, slowing down your code? For instance, if PinCamHref goes low just a tiny bit of time after PinCamPpclk at the end of a line, then your code might try to fetch one more line that is in fact not available ...

Kind regards,
Sebastian
 
Makes sense - True - it does leave with interrupts enabled- I alluded to that issue in post #5 :: does not disable interrupts ( or enable them )
> T_3.x micros() checks the systick tick count proximity to finalize the micros() - and it does OFF/ON of interrupts.


I've noted before that ... I should see about writing the T_4.x code into T_3.x - won't work on T_LC - but CYCCNT and the interrupt detect code seemed to be supported in the 3.x family from the test I did in T_4.0 beta timeframe ...

But on T_4.0 the cycle counter is started before setup and has no other history. On T_3.x folks may have code in place that for some reason zeroes it rather than letting it run if enabled in the sketch. So it may bring issues ... and take some testing ...
 
Hello,
Well, quite a lot here...

First, the need for a fast and accurate timer.

In another i.MX-based project I worked on, I set up a timer/clock to run at 100 kHz and was able to get the time (in units of 10 µs) by just reading a register. No problems with IRQs, no reading of ms or anything like that... Fast, accurate and worry-free. Of course, in this case, such a time would NOT sync with the system millis... but that would not be a problem.
Any pointers on how to implement something similar here?
- I heard about a "Cycle Counter". Is that something that I can access directly?
- What else can I access directly? I have not found the source where the i.MX registers and structure are described. could someone point me to them?



> About the wait loops under interrupt which "probably means that it takes longer than a milliecond."
As far as I can tell, by measuring the function in the non Irq mode, it takes around 500 microseconds...
The explanations on how micros works might explain why I am getting bad readings in IRQ mode. but I would like to double check/verify this...

I understand the warnings about "long interrupts"... But I do not see much of a solution here...
When a frame is ready, the OV7670 tells you by pulling a pin high, and then it sends the data immediately... no way to pause it, or delay it.
The data sending takes 1/2 of a ms.
So, 10 times per second, (at 10fps), the camera tells you "here is the data", take it or leave it...
so, I can poll, and spend most of my time waiting for the camera to have data to send (and here, I have to stop the IRQ, else I miss bytes now and then!)...
Or I can do it under IRQ, and break some rules :-(

The i.MX chip does have, as far as I understand, HW assist for this... But the pins are not available on the Teensy 4.0 :-( so I have to do it by hand...



>On another topic, where did you get your code to set up the OV7670 camera?
I got the code by scrounging the internet and putting a bunch of things together until it started working...
Here is the code that I am using. I hope that it helps you. It is configured at 160*120




Code:
#include <Arduino.h>
void cameraBegin();
bool cameraRead();
extern uint16_t cameraImage[160 * 120]; // QQVGA image buffer in RGB565 format
static int cameraWidth() { return 160; }
static int cameraHeight() { return 120; }

// Note: it is important for the pins to be #defined here to allow the fast reads to work properly, as they need to know the pins at compile time
#define PinCamVsync 2  // EMC_04
#define PinCamHref  3  // EMC_05
#define PinCamPpclk 4  // EMC_06
#define PinCamXclk  5  // EMC_08
#define PinCamD0    14 // AD_B1_02
#define PinCamD1    15 // AD_B1_03
#define PinCamD3    16 // AD_B1_07
#define PinCamD2    17 // AD_B1_06
#define PinCamD6    20 // AD_B1_10
#define PinCamD7    21 // AD_B1_11
#define PinCamD4    22 // AD_B1_08
#define PinCamD5    23 // AD_B1_09
#include "Camera.h"

#include <Wire.h>
static int const cameraAddress= 0x21; // i2c address of camera's sccb device
static struct { uint8_t reg, val; } const OV7670Registers[] = {
  {0x12, 0x80}, // Reset to default values
  {0x11, 0x80}, // Set some reserved bit to 1!
  {0x3B, 0x0A}, // Banding filter value uses BD50ST 0x9D as value. some reserved stuff + exposure timing can be less than limit on strong light
  {0x3A, 0x04}, // output sequence selection. Doc too limited to be sure
  {0x3A, 0x04},
  {0x12, 0x04}, // Output format: rgb
  {0x8C, 0x00}, // Disable RGB444
  {0x40, 0xD0}, // Set RGB565
  {0x17, 0x16}, // Y window start msb (3-11) I think
  {0x18, 0x04}, // Y window end msb (3-11)
  {0x32, 0x24}, // Y window lsb end= 100b start= 100b
  {0x19, 0x02}, // X window start msb (2-10) I think
  {0x1A, 0x7A}, // X window end msb (2-10) I think
  {0x03, 0x0A}, // X window lsb (10 and 10)
  {0x15, 0x02}, // VSync negative
  {0x0C, 0x04}, // DCW enable? 
  {0x3E, 0x1A}, // Divide by 4
  {0x1E, 0x27}, // mirror image black sun enabled and more reserved!
  {0x72, 0x22}, // Downsample by 4
  {0x73, 0xF2}, // Divide by 4
  {0x4F, 0x80}, // matrix coef
  {0x50, 0x80},
  {0x51, 0x00},
  {0x52, 0x22},
  {0x53, 0x5E},
  {0x54, 0x80},
  {0x56, 0x40}, // contrast
  {0x58, 0x9E}, // matrix
  {0x59, 0x88}, // AWB
  {0x5A, 0x88},
  {0x5B, 0x44},
  {0x5C, 0x67},
  {0x5D, 0x49},
  {0x5E, 0x0E},
  {0x69, 0x00}, // gain per channel
  {0x6A, 0x40}, // more gain
  {0x6B, 0x0A}, // pll reserved stuff!
  {0x6C, 0x0A}, // AWB
  {0x6D, 0x55},
  {0x6E, 0x11},
  {0x6F, 0x9F},
  {0xB0, 0x84}, // reserved!
  {0xFF, 0xFF}  // End marker
};

// Read a single uint8_t from address and return it as a uint8_t
static uint8_t cameraReadRegister(uint8_t address) 
{
  Wire.beginTransmission(cameraAddress);
  Wire.write(address);
  Wire.endTransmission();

  Wire.requestFrom(cameraAddress, 1);
  uint8_t data = Wire.read();
  Wire.endTransmission();
  return data;
}

// Writes a single uint8_t (data) into address
static int cameraWriteRegister(uint8_t address, uint8_t data) 
{
  Wire.beginTransmission(cameraAddress);
  Wire.write(address); Wire.write(data);
  return Wire.endTransmission();
}

// Reset all OV7670 registers to their default values
static void cameraReset() 
{
  cameraWriteRegister(0x12, 0x80); delay(10);
  cameraWriteRegister(0x12, 0); delay(10);
}

// Read and display all of the OV7670 registers
static void cameraReadI2C() 
{
  for (int i = 0; i <= 0xC9; i ++) 
  {
    Serial.print("Register: "); Serial.print(i, HEX); Serial.print(" Value: "); uint8_t reg = cameraReadRegister(i); Serial.println(reg, HEX);
  }
}

// Slow the frame rate enough for camera code to run with 8 MHz XCLK. Approximately 1.5 frames/second
static void cameraSlowFrameRate() 
{
  // CLKRC register: Prescaler divide by 31
  uint8_t reg = cameraReadRegister(0x11); cameraWriteRegister(0x11, (reg & 0b10000000) | 0b00011110); // keep bit 7 (the mask was missing a digit), set the prescaler bits
  // DBLV register: PLL = 6
  reg = cameraReadRegister(0x6B); cameraWriteRegister(0x6B, (reg & 0b00111111) | 0b10000000);
}

uint16_t cameraImage[160 * 120]; // QQVGA image buffer in RGB565 format

//#define dointerrupt
static void cameraSubRead();
static int camCount= 0;
static int camIrqCount= 0;
static void cameraCount() { camIrqCount++; }

void cameraBegin()
{
    // Setup all GPIO pins as inputs
    pinMode(PinCamVsync, INPUT); pinMode(PinCamHref,  INPUT); pinMode(PinCamPpclk,  INPUT);
    pinMode(PinCamD0,    INPUT); pinMode(PinCamD1,    INPUT); pinMode(PinCamD2,    INPUT); pinMode(PinCamD3,    INPUT); pinMode(PinCamD4,    INPUT); pinMode(PinCamD5,    INPUT); pinMode(PinCamD6,    INPUT); pinMode(PinCamD7,    INPUT);
    analogWriteFrequency(PinCamXclk, 8000000); analogWrite(PinCamXclk, 127); delay(100);
    // Configure the camera for operation
    Wire.begin();
    Serial.println("init camera");
    for (int i= 0; OV7670Registers[i].reg!=0xff; i++) cameraWriteRegister(OV7670Registers[i].reg, OV7670Registers[i].val);
    cameraWriteRegister(0x3a, 0x14); cameraWriteRegister(0x67, 0x80); cameraWriteRegister(0x68, 0x80); // B&W mode... (0x67 was written as 0xc80, which does not fit in a uint8_t and truncates to 0x80)
    delay(100);
    #ifdef dointerrupt
    attachInterrupt(digitalPinToInterrupt(PinCamVsync), cameraSubRead, RISING);
    #else
    attachInterrupt(digitalPinToInterrupt(PinCamVsync), cameraCount, RISING);
    #endif
}

// Read a uint8_t of the pixel data
static inline uint8_t cameraReadPixel() 
{
  uint8_t b = 0;
  if (digitalReadFast(PinCamD7)) b |= 0x80;
  if (digitalReadFast(PinCamD6)) b |= 0x40;
  if (digitalReadFast(PinCamD5)) b |= 0x20;
  if (digitalReadFast(PinCamD4)) b |= 0x10;
  if (digitalReadFast(PinCamD3)) b |= 0x08;
  if (digitalReadFast(PinCamD2)) b |= 0x04;
  if (digitalReadFast(PinCamD1)) b |= 0x02;
  if (digitalReadFast(PinCamD0)) b |= 0x01;
  return b;
}

static bool cameraNewFrame= false;
static uint32_t readtime= 0, readtime1= 0;
static void cameraSubRead()
{
  int i= 0; 
  #ifdef dointerrupt
  camIrqCount++;
  #endif
  camCount++;
  uint32_t t1= micros();
  if (!digitalReadFast(PinCamVsync)) return;
  for (int j=cameraHeight(); --j>=0;) //while (digitalReadFast(PinCamVsync)) 
  {
    while (digitalReadFast(PinCamVsync) && !digitalReadFast(PinCamHref)); // Wait for a line to start
    if (!digitalReadFast(PinCamVsync)) break;  // Line didn't start, but frame ended
    for (int k=cameraWidth(); --k>=0;) //while (digitalReadFast(PinCamHref))     // Wait for a line to end
    {
      while (!digitalReadFast(PinCamPpclk)); // Wait for clock to go high
      uint8_t hByte = cameraReadPixel();
//      int r= hByte>>3*76;
      while (digitalReadFast(PinCamPpclk));   // Wait for clock to go back low
      while (!digitalReadFast(PinCamPpclk)); // Wait for clock to go high
      uint8_t lByte = cameraReadPixel();
// 0.3R + 0.59G + 0.11B = 76R+151G+28B // msb is red
//      r+= (lByte&0x1f)*28 + ((((hByte<<8)|lByte)>>5)&0x3f)*75;
//      r>>= 8;
//      cameraImage[i++]= (r<<11)+(r<<5)+(r);
      cameraImage[i++] = (hByte << 8) | lByte;
      while (digitalReadFast(PinCamPpclk));   // Wait for clock to go back low
    }
  }
  cameraNewFrame= true;
  readtime+= micros()-t1;
}

// Acquire and display RGB565 image from the camera
bool cameraRead()
{
  #ifndef dointerrupt
  uint32_t t1= millis();
  while (digitalReadFast(PinCamVsync));    // Wait for the old frame to end
  while (! digitalReadFast(PinCamVsync));  // Wait for a new frame to start
  uint32_t t2= micros();
  noInterrupts();
  cameraSubRead();
  interrupts();
  readtime1+= micros()-t2;
  uint32_t t3= millis();
  Serial.print(readtime1/camCount); Serial.println(" micros / frame");
  #else
  uint32_t t3= millis();
  Serial.print(readtime/camCount); Serial.println(" micros / frame");
  #endif

  //Serial.print(camIrqCount); Serial.println(" camera irq");
  //Serial.print(camCount); Serial.println(" frame count");
  //Serial.print(camCount/(float(t3)/1000)); Serial.println(" frame /s");
  bool r= cameraNewFrame; cameraNewFrame= false; return r;
}
 
Yes, the cycle counter is easy and available for use - already running on T_4.x - at the default 600 MHz - and it takes a couple of ticks to read. It reads like a register - so the same as what you did before, but with far faster clocking (600 MHz against your 100 kHz).

Here is a sample from forum:
Code:
	long unsigned stime;

	stime = ARM_DWT_CYCCNT;
	for (register int i = 0; i < 1000000; ++i ) {
		test3();
	}
	Serial.printf("t3 result = %f %X\n", 1000000.0 * (ARM_DWT_CYCCNT - stime) / F_CPU_ACTUAL, test3());

Micros code here : github.com/PaulStoffregen/cores/blob/master/teensy4/delay.c#L67

is installed on your machine with TeensyDuino Installer

If there is something odd when working in interrupts, please post. It may be from the interrupts Off/On - but that shouldn't slip that much, as it looked to go off only a short time per line?

But ARM_DWT_CYCCNT won't have any issue, and it is 10 times faster than micros() with 600 times the resolution.
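As a rough sketch of how two cycle-counter samples can be turned into a time, the helper below converts a cycle difference into microseconds. `elapsedMicros` is a hypothetical name; on the Teensy the two samples would come from `ARM_DWT_CYCCNT` and the clock from `F_CPU_ACTUAL`, as in the sample above. Here they are plain parameters so the arithmetic can be checked on any machine.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed microseconds between two samples of a free-running 32-bit cycle
// counter. Unsigned subtraction stays correct across one counter wrap
// (about 7.1 s at 600 MHz), but not across longer intervals.
inline uint32_t elapsedMicros(uint32_t startCycles, uint32_t endCycles,
                              uint32_t cpuHz)
{
  assert(cpuHz >= 1000000);
  uint32_t delta = endCycles - startCycles;            // wraps modulo 2^32
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(delta) * 1000000u) / cpuHz);
}
```

Because the counter keeps running while interrupts are disabled, a measurement like this is not affected by a missed systick, as long as the interval stays under one wrap of the counter.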
 
Sorry, I have sort of been following this thread but not responding...

Again I wonder about things like: assumptions about VSYNC... and the like:
If I look at the Datasheet: https://e-radionica.com/productdata/OV7670.pdf

It looks like VSync is a pulse and not held high? I also wonder whether it might change state and create multiple interrupts.

Is your interrupt code actually getting a valid frame back?

If you are using the ILI9341_t3n in frame mode, hopefully it is a newer version: the older version had lots of interrupts to copy data from the frame buffer to lower memory to do the DMA, so every N bytes it would get an interrupt. With the newer code you get one or two interrupts per frame, and it does not copy memory. (It does flush the memory cache if the buffer is up in the DMAMEM area, including malloc.) If you are doing this, you may need to set the display's IRQ NVIC priority to a higher priority (lower value)...

As for needing to do it on Interrupt... Maybe, or maybe your main loop continuously check the status of the devices and read in information when needed.

Then again you may need/want to figure out how to handle the multiple devices at the same time...

That is, if you have a pen event and your display, and the user starts interacting with the screen just as a frame starts to read, the code will be hung up reading the display and you won't be able to process any touch events or the like until the frame completes. Which may be fine, or you might need to have your code be able to interleave the processing of both inputs... Hope that makes sense.
 
Hello,

OK, I will make 2 posts for 2 different things...

I did not understand that millis was generated solely by incrementing a variable on an irq.
I usually implement my "now" methods by reading an absolute timer, (RTC, PWM...) hence my surprise here that cutting the IRQ messes up the time.

Cyrille



/////////////////////
// Old version of the post prior to edit


First, about timing...

the gist of this post is that using micros and CPU cycles, I get some VERY different values...
and I am wondering why this is the case, and what to believe!
Is it possible that the interrupts needed for ms to tick are lost because I stop them for too long? This might explain...
It truly depends on how millis is handled... but that would explain everything... and cause some concern for my design :-(

I ran the following code:

Code:
  if (cpuCycleCounterRatio==0) // Init stuff to make sure that things are OK
  {
// This code gets the time in µs and in Cpu cycle, waits 100ms according to millis, get the times again and display them.
// the result is as expected:
// 599331 cycles per ms
// 99889 micro second per 100 ms
    uint32_t t1= millis()+100;
    uint32_t t2= micros();
    cpuCycleCounterRatio= ARM_DWT_CYCCNT;
    while (t1>millis());
    cpuCycleCounterRatio= ARM_DWT_CYCCNT-cpuCycleCounterRatio;
    uint32_t t3= micros();
    Serial.print(cpuCycleCounterRatio/100); Serial.println(" cycles per ms");
    Serial.print(t3-t2); Serial.println(" micro second per 100 ms");
  }

// Then this code will read the camera frame and display how long it took.
// it does this in 3 ways: 1 using the micros call, it does a cumulative time count
// using micros, it does a frame by frame time count, 
// and finally, using the CPU counter, it does a count using the cpu cycle counts...
// But, I get very different results with the 3 methods
// The first one, average gets to around 500µs after 20 frames or so
// The second one varies widely from 300 to 900 µs
// And the CPU count one is stable at 97ms! which is a VERY large differential, does NOT match in any way, shape or form the other values...
// BUT it matches the values I was getting when doing the read under interrupt earlier!

//928 micros / frame
//97925 micros / frame cpu count
//928 micros / frame non average
//636 micros / frame
//97925 micros / frame cpu count
//344 micros / frame non average
//676 micros / frame
//97925 micros / frame cpu count
//758 micros / frame non average
//550 micros / frame
//97925 micros / frame cpu count
//173 micros / frame non average
//558 micros / frame
  uint32_t t2= micros();
  noInterrupts();
  uint32_t ct1= ARM_DWT_CYCCNT;
  cameraSubRead();
  uint32_t ct2= ARM_DWT_CYCCNT;
  interrupts();
  uint32_t t4= micros()-t2;
  readtime1+= t4;
  Serial.print(readtime1/camCount); Serial.println(" micros / frame");
  Serial.print((ct2-ct1)/600); Serial.println(" micros / frame cpu count");
  Serial.print(t4); Serial.println(" micros / frame non average");
 
> Sorry I have sort have been following this thread but not responding...
No worries, but THANKS a LOT for your comments, AND the work on the screen library!

>Again I wonder about things like: assumptions about VSYNC... and the like:
>If I look at the Datasheet: https://e-radionica.com/productdata/OV7670.pdf
>It looks like VSync is a pulse and not held high? Also wonder if it might change state if it might create multiple interrupts.

Nope, that one I checked, it is high during the whole frame. the OV Datasheets are... as the latest part of the word implies... shit (sorry about that stinky pun:) English is NOT my mother language and to me the sheet and shit sounds exactly the same... which took me a while to sort out when I started speaking english!)

>Does your interrupt code actually getting a valid frame back?
Yes, it does. But it seems to mess up the screen code.

>[about ILI9341_t3n library ] in frame mode
Thanks for the info. I do think that I have a recent version. Any way to verify this?

>As for needing to do it on Interrupt... Maybe, or maybe your main loop continuously check the status of the devices and read in information when needed.
The problem is that my main loop does have some slow calls in it. Among other things, the display of all the stars in the starfield (with 3D calculations) and the "plate solving", which is code that, based on the camera image, is supposed to figure out where in the sky the picture was taken (i.e., recognize stars in said sky).
Going down into all the low-level loops in all this code and these algorithms to add code to read the camera data is not practicable...

My aim now is to characterize the times/delays in the camera module a whole lot better, in order to see if/when I can use an IRQ on a per-line or per-pixel basis...

Does anyone have any clue on the cycle count to handle 1 IRQ?
I.e., if I was to have 1 IRQ per camera pixel, the pixel rate being 500K/s, this gives 1200 cycles per pixel.
If it only takes 200 cycles to handle 1 interrupt (assuming an empty IRQ function), then the reading would take 1/6 of the CPU time... which might be acceptable...
but if it takes 800 cycles.. then it is not acceptable anymore...
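The budget above can be written down as a small compile-time calculation. This is only a sketch of the arithmetic: the 600 MHz clock is the Teensy 4.0 default mentioned elsewhere in the thread, the 200 and 800 cycle handler costs are the two hypothetical cases from the post (not measurements), and the helper names are made up.

```cpp
#include <cstdint>

constexpr uint32_t kCpuHz = 600000000;          // assumed 600 MHz core clock
constexpr uint32_t kPixelsPerSecond = 500000;   // pixel rate quoted in the post

// Cycles available between two consecutive pixels.
constexpr uint32_t cyclesPerPixel(uint32_t cpuHz, uint32_t pixelsPerSecond) {
  return cpuHz / pixelsPerSecond;
}

// CPU load in percent (rounded down) while pixels are streaming, if every
// pixel costs isrCycles of interrupt handling out of budgetCycles.
constexpr uint32_t isrLoadPercent(uint32_t isrCycles, uint32_t budgetCycles) {
  return (100u * isrCycles) / budgetCycles;
}

static_assert(cyclesPerPixel(kCpuHz, kPixelsPerSecond) == 1200, "cycle budget");
static_assert(isrLoadPercent(200, 1200) == 16, "about 1/6 of the CPU");
static_assert(isrLoadPercent(800, 1200) == 66, "about 2/3 of the CPU");
```

Note that these percentages only apply while the camera is actually sending pixels, not to the gaps between lines or frames.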

Cyrille

 
I am sort of curious about these so ordered a cheap (< $10) pair from Amazon, that should get here end of week...

Not sure how much I will do or not, but at worst they end up in my Sensors box ;)

I see you are using pins (14-17 and 20-23) for the data pins. I would probably reorder such that they are in actual GPIO1 pin order
Probably like (14, 15, 17, 16, 22, 23, 20, 21) which get two continuous pieces of the data in register... So

Then wonder if one could do one read of GPIO6.DR to get all 8 bit along with a couple of shifts...

Then wonder if one could do it with DMA? Would need to switch those 8 bits back to using GPIO1 (slower IO).

What I have not played with is if you can use a GPIO pin to control the DMA timing? Again I have not tried this at all. Did play a little with DMA with GPIO to know you have to use the GPIO1-5 and not the 6-9...

But have not played with DMA being controlled by GPIO?

If I get to it, my guess might be to try using XBAR? Wondering about:
Code:
#define DMAMUX_SOURCE_XBAR1_0			30
#define DMAMUX_SOURCE_XBAR1_1			31
Not sure yet what/if there are mappings of some IO (XBAR) pins to something that triggers DMA? If so it might make it fun :D
 
Hello,

About the pin order...
I thought that I did indeed order them according to the port order.
Here is the info that I had and why I chose them that way. But I might have misunderstood things..:
Teensy pin, Cpu_pin, camera_D_pin
14 AD_B1_02 ovd 0
15 AD_B1_03 ovd 1
17 AD_B1_06 ovd 2
16 AD_B1_07 ovd 3
22 AD_B1_08 ovd 4
23 AD_B1_09 ovd 5
20 AD_B1_10 ovd 6
21 AD_B1_11 ovd 7
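With that wiring, the "one port read plus a couple of shifts" idea could look like the sketch below. It rests on an assumption that should be checked against the core's pin tables: that pad AD_B1_nn sits at bit 16+nn of the port register, so D0..D1 land on bits 18..19 and D2..D7 on bits 22..27. `cameraByteFromPort` is a hypothetical helper; it takes the raw 32-bit port value as a parameter so it can be tested without hardware.

```cpp
#include <cstdint>

// Assumed bit layout (verify before use): AD_B1_nn -> port bit (16 + nn).
constexpr uint32_t kLowShift  = 18;   // D0 (AD_B1_02) assumed at bit 18
constexpr uint32_t kHighShift = 20;   // D2 (AD_B1_06) at bit 22 lands on bit 2

// Assemble the 8-bit camera value from one 32-bit port read.
constexpr uint8_t cameraByteFromPort(uint32_t portValue) {
  return static_cast<uint8_t>(((portValue >> kLowShift) & 0x03u) |    // D0..D1
                              ((portValue >> kHighShift) & 0xFCu));   // D2..D7
}

static_assert(cameraByteFromPort(0x0FCC0000u) == 0xFF, "all data pins high");
static_assert(cameraByteFromPort(1u << 20) == 0x00, "unrelated pin ignored");
```

On the board, the argument would be a single read of the port register discussed above (GPIO6, if the fast GPIO mapping is in use), replacing the eight `digitalReadFast` calls in `cameraReadPixel()`.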


Can the DMA be programmed to read data on an external pin activation? If yes, that would be great. Even if extra data is read and has to be post-processed, it would still be MUCH better than trying to do it in SW...
Of course, as stated above, the Teensy 4.1 does, I believe, expose the pins used by the HW assist for camera acquisition... But alas, I only have a Teensy 4.0 :-(

Cyrille
 
OK, I have done some more testing...
I have instrumented the cameraSubRead function to measure the delay between 2 pixels being read, the time to read a line and the time to wait between a line and the next one...

Here is what I have (in CPU CYCLES!)
Time to read a whole frame = 40 200 000 cycles (67ms)
which is composed of 120 lines with:
- 264 500 cycles delay from line n to line n+1 (440µs)
- 320 byte reads (2 bytes per pixel, 160 pixels) at 204 cycles per byte (total 65280 cycles = 108µs)

120*(264500+320*204) ~= 40 000 000 cycles, so things seems OK...

What is interesting is that the pixel read (i.e. the read of a single line) is relatively fast! What is slow is moving from line to line: 80% of the time is spent waiting for the next line, 20% reading the pixels...

This means that I should be able to do the line read under IRQ at each line start, which is what I will attempt tomorrow...
This will cause 120*FPS IRQs per second, but each IRQ should take only around 108µs. At 15fps that is 1800 IRQ/s, or about 20% of the CPU time...
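
As a quick check of that estimate, here is a minimal sketch of the load calculation. The function names are hypothetical, and interrupt entry/exit overhead is not counted.

```cpp
// Number of line interrupts per second for a given frame size and rate.
constexpr double irqPerSecond(int linesPerFrame, int fps) {
    return static_cast<double>(linesPerFrame) * fps;
}

// Fraction of CPU time spent inside the interrupt handler.
constexpr double cpuFraction(double irqRate, double irqMicros) {
    return irqRate * irqMicros * 1e-6;
}
```

`cpuFraction(irqPerSecond(120, 15), 108.0)` comes out at roughly 0.19, in line with the 20% figure.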

Cyrille
 
Hi Cyrille,

I'm confused with the timings. You say that you can read a full frame in 500μs, that your pixel rate is 500k/s, that your frame is 160x120 pixels QQVGA, and that you use RGB565 mode. That does not seem to fit. If your pixel rate is 500000/s (and therefore your byte rate 1000000/s as every pixel has two bytes), that is 2μs per pixel, then in the optimum case of all 19200 pixels following one another without gaps this makes 38400μs, way more than the 500μs that you measured initially ...
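
The lower bound used in this argument (every pixel back to back, no gaps between lines) can be written as a one-line helper. The name `minFrameMs` is made up for the example.

```cpp
// Shortest possible frame time in milliseconds if pixels arrive
// back to back with no blanking between lines.
constexpr double minFrameMs(int width, int height, double microsPerPixel) {
    return width * height * microsPerPixel / 1000.0;
}
```

For 160x120 at 2µs per pixel this gives 38.4ms, i.e. the 38400µs quoted above.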

Kind regards,
Sebastian
 
Hello,

This has, in my opinion, everything to do with bad measurements...

I was using millis and micros to measure the times... BUT, since I had to stop the IRQs, these were incorrect...
Once I moved to counting CPU cycles, I got a correct timing: first 100ms, which later changed to 67ms as I increased the camera clock frequency.
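
For reference, a minimal sketch of cycle-based timing that keeps working with interrupts disabled. It assumes a free-running 32-bit cycle counter (on the Teensy 4 core this should be readable as `ARM_DWT_CYCCNT`, which is worth checking against your core version). The two helpers are hypothetical names and take plain integers, so they also run on a PC.

```cpp
#include <cstdint>

// Elapsed cycles between two readings of a free-running 32-bit counter.
// Unsigned subtraction stays correct across a single counter wrap-around.
constexpr uint32_t elapsedCycles(uint32_t start, uint32_t end) {
    return end - start;
}

// Convert a cycle count to microseconds for a given CPU clock in Hz.
constexpr double cyclesToMicros(uint32_t cycles, uint32_t cpuHz) {
    return cycles * 1e6 / cpuHz;
}
```

At 600MHz a 32-bit counter wraps about every 7 seconds, so longer intervals cannot be measured this way without extra bookkeeping.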

Now, my pixel timing is about 0.33µs per byte, 0.67µs per pixel, and my frame rate is one frame in 66ms, or 15fps.
However, 0.666µs*120*160=12.8ms, NOT 66ms. The extra 53ms per frame is in fact taken by the "end of line" delay: the delay between one line being output and the next one...
Hence my next move: one IRQ per line...

Cyrille
 
Ah, I see. I guess you changed the analogWriteFrequency on PinCamXclk to 12MHz, which with the divider of 4 in your register init becomes 3MHz PCLK?

I also see in the datasheet Figure 8 that the QQVGA HREF timing is to emit only every fourth of what would be the VGA timing, which I guess explains the gap between lines ...

So indeed it might make sense to trigger an interrupt on VSYNC RISING to mark the beginning of a new frame, and another one on HREF RISING to pull in the line of 320 bytes (108µs) and after 120 lines to mark the frame as complete and ready to be picked up by whatever postprocessing you have in mind.
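
A minimal sketch of that two-interrupt scheme, written as a plain state machine so it can be tested off-target. `FrameCapture`, `readLine` and `fakeReadLine` are hypothetical names. On the Teensy the two methods would be called from handlers attached with attachInterrupt to the VSYNC and HREF pins, and `readLine` would be the existing pixel read loop.

```cpp
#include <cstdint>

constexpr int kLines        = 120;  // QQVGA lines per frame
constexpr int kBytesPerLine = 320;  // 160 pixels * 2 bytes (RGB565)

struct FrameCapture {
    uint8_t frame[kLines][kBytesPerLine];
    volatile int  line       = 0;      // next line to fill
    volatile bool frameReady = false;  // true once all 120 lines are in

    // Hypothetical hook: fills dst with `count` bytes of one camera line.
    void (*readLine)(uint8_t* dst, int count) = nullptr;

    // Call on VSYNC RISING: a new frame starts, the buffer will be overwritten.
    void onVsyncRising() {
        line = 0;
        frameReady = false;
    }

    // Call on HREF RISING: pull in one line, flag the frame after the last one.
    void onHrefRising() {
        if (line >= kLines || readLine == nullptr) return;  // ignore extra lines
        readLine(frame[line], kBytesPerLine);
        line = line + 1;
        if (line == kLines) frameReady = true;
    }
};

// Stand-in for the real pixel loop so the logic can be exercised on a PC.
inline void fakeReadLine(uint8_t* dst, int count) {
    for (int i = 0; i < count; ++i) dst[i] = static_cast<uint8_t>(i);
}
```

The main loop would poll `frameReady` and copy or process the frame before the next VSYNC. With a single buffer, a slow consumer can see a partly overwritten frame, so double buffering may be needed.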

Kind regards,
Sebastian
 
Hello,

You seem to know/understand a lot about the OV chip...

As I stated above, I have little to no understanding of it. I was able to make it work mostly by copying and pasting various things from here and there, but I do not really understand it...

It would be better if this "line delay" was not there and I could read the whole frame in one go. I mean, at the moment, it takes 67ms, with 80% of the time lost/wasted. If it could be dropped to 13.4ms (67/5), it would be GREAT!

I also find that increasing my clock rate above 12MHz does not work anymore (maybe due to my long strip pins, or some other analog issue?). Removing the divider might also help speed up the data retrieval, which would also be a good thing. At the moment it takes 200 cycles per byte, but I am sure that it would still work at 100 cycles, speeding up the frame retrieval to around 7ms, which would be a great target!

So, can you tell me more about what I might be able to do to get to these types of figures?

Thanks,
Cyrille
 
The order I mentioned was your defines:
Code:
#define PinCamD0    14 // AD_B1_02
#define PinCamD1    15 // AD_B1_03
#define PinCamD3    16 // AD_B1_07
#define PinCamD2    17 // AD_B1_06
#define PinCamD6    20 // AD_B1_10
#define PinCamD7    21 // AD_B1_11
#define PinCamD4    22 // AD_B1_08
#define PinCamD5    23 // AD_B1_09
Never mind, my mind saw your order but did not see your defines were not in numerical order....

As for being able to control DMA from an external pin, that is what I am wondering about. It would be interesting to find out.
 
Hello,

Well, at least your questioning allowed me to verify my defines... Since I have just ordered PCBs from the "internet" with this wiring, I was anxious to be sure that it was correct!

I believe that DMA can be triggered by a pin, at least on the i.MX from the HP Prime (which is a calculator I created). I seem to remember seeing such things... but I am not 100% sure, years have passed since, and one is a Cortex A while the other one is a Cortex M...

Cyrille
 

Hi Cyrille,

No, I don't know the OV7670.

What I know is that you are generating the clock for the OV7670 using the PWM feature of the Teensy 4, and you are driving it to its limits; have a look at the tables at https://www.pjrc.com/teensy/td_pulse.html. The F_BUS clock driving the PWM tick runs at a quarter of the F_CPU speed, so at 150MHz. If you request a 12MHz square wave with analogWriteFrequency(PinCamXclk,12000000); analogWrite(PinCamXclk,127), frankly I don't know how exact the square wave will be.
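
To illustrate why 12MHz is awkward, here is a rough sketch that assumes the PWM period must be a whole number of F_BUS ticks. That is an assumption about the timer, not something checked against the core source, and the actual rounding used by analogWriteFrequency may differ. The function names are made up.

```cpp
#include <cstdint>

constexpr uint32_t kBusHz = 150000000;  // F_BUS as quoted above

// Nearest whole number of bus ticks per output period (assumed rounding).
constexpr uint32_t ticksPerPeriod(uint32_t requestedHz) {
    return (kBusHz + requestedHz / 2) / requestedHz;
}

// Output frequency actually produced with that tick count.
constexpr double actualHz(uint32_t requestedHz) {
    return static_cast<double>(kBusHz) / ticksPerPeriod(requestedHz);
}

// True when the request divides the bus clock evenly.
constexpr bool isExact(uint32_t requestedHz) {
    return kBusHz % requestedHz == 0;
}
```

Under that assumption 150MHz/12MHz = 12.5 ticks is not reachable: 12 ticks gives 12.5MHz and 13 ticks gives about 11.54MHz, and an odd tick count cannot be split into an exact 50% duty cycle. Requests such as 15MHz or 10MHz divide evenly.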

In your cameraSlowFrameRate you already activate both the prescaler and the PLL: you divide the XCLK frequency by 31 and then multiply it again by 8. What you could try is to generate an XCLK of 8MHz as before and then use a prescaler of 3 and a PLL of x8 to reach an internal pixel clock of ~21MHz. Also, it seems as if the OV7670 always processes all internal VGA pixels at that pixel clock. It might perform pixel binning, but that's not clear to me. In any case the output seems always to be only as fast as the CMOS array is read out internally, and then the DCW stuff seems to handle the conversion from VGA to QQVGA and makes PCLK tick at a quarter of the internal pixel clock. But all that is mostly guesswork.
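
The clock chain described above, as a small sketch. It only encodes the relations as guessed in this post (internal clock = XCLK * PLL / prescaler, QQVGA PCLK = a quarter of that). It has not been checked against the datasheet and the function names are hypothetical.

```cpp
// Internal pixel clock in MHz from XCLK, PLL multiplier and prescaler.
constexpr double internalClockMHz(double xclkMHz, int pllMul, int prescaler) {
    return xclkMHz * pllMul / prescaler;
}

// Output PCLK in MHz for QQVGA, assuming it ticks at a quarter of the
// internal pixel clock (a guess, see above).
constexpr double qqvgaPclkMHz(double internalMHz) {
    return internalMHz / 4.0;
}
```

With XCLK = 8MHz, PLL x8 and a prescaler of 3, the internal clock is about 21.3MHz and the guessed QQVGA PCLK about 5.3MHz. With the current divide-by-31 setting the same 8MHz XCLK gives only about 2.1MHz internally.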

Kind regards,
Sebastian

PS: Do you have a reasonably fast logic analyser to check the signal lines in detail?
 