Teensy 3 POV display serial speed problem

Status
Not open for further replies.

Quwat

Active member
Hello all

I am working on a 3D pov music visualizer.

But I have found that the delay between Processing and the running display is too large for my purposes. I'm not sure which part of my code is the bottleneck. Can you please help?

Right now there is about a 500 ms delay between changing the display data at the Processing end and actually seeing it. That will make a crappy music visualizer.


What could be causing the problem


The display driving;
Yes, I am bit-banging the shift registers; my impression is that the pin toggling is better optimised for the hardware than the Arduino emulated stuff. But as I don't even have much experience with Arduino code, I don't have time to go into using native code and adapting the SD card library optimised for Teensy.

I am using a hardware interrupt to stabilize the image. I decided not to use a timer interrupt to power the display, as the timer libraries I looked at were in the millisecond range (not accurate enough?). Plus I don't think the way I have done it is much less efficient.

I use if statements and track the microseconds, and within each of those times I only update the display once; so that's not causing the bottleneck.

Communication;
I have gone through all of the posts about the work Paul has done on improving serial receiving. So I made sure that I followed the suggestions of having a protocol where the Teensy never asks for data, and of trying to group data together within the 64 byte packets used. I am using Serial.readBytes and the latest update of Teensyduino, so I should be taking advantage of the optimizations recently done to it. Meaning I should be able to get, what is it, about a megabyte per second at 50% CPU usage? That is an impressive data rate, so I must be doing something wrong?

What communication is taking place;
I am only sending 5 bytes at this stage; that should be no problem. These five bytes control 10 time sections, each with a bar value of 0-15. So is it



Why it can't be the display code;
The speed of the display has no effect, and I have very large headroom for time resolution; with the current setup I could easily be getting 100+.


Is there a better way? Or is this the reality?

Would doing the FFT on the Teensy give better performance? Or am I going to have to take the analogue approach?


Sorry for the long post, but I have to post all the code as the problem could be anywhere.

////////// TEENSY CODE////////////////

const int data = 3;
const int latch = 2;
const int clock = 4;

const int red = 23;
const int green = 22;
const int blue = 21;


// settings
const int real_timeResolusion = 10;
const int timeResolusion = (10+(2*(( real_timeResolusion ))));
const int layerNum = 1;// starts at one to leave room for the buffer



byte displayData[(real_timeResolusion)/2][layerNum]; // time res /2 because there are two time sections stored per byte
int primaryKey[layerNum];// used to shift the visualization time layers, not the POV time sections


byte incomingByte;

unsigned long period = 100000;
volatile unsigned long periodStart;
volatile boolean interupt = false;
int led=13;
volatile unsigned long oldTime;

void setup() {

  for(int c=0; c<layerNum ;c++)
  {
    primaryKey[c] = c;
  }

  pinMode(5,INPUT);
  pinMode(data,OUTPUT);
  pinMode(latch,OUTPUT);
  pinMode(clock,OUTPUT);
  pinMode(red,OUTPUT);
  pinMode(green,OUTPUT);
  pinMode(blue,OUTPUT);
  pinMode(led,OUTPUT);
  attachInterrupt(5, checkPoint, RISING);
  oldTime = micros();

  // common colour control; I will take advantage of this later, for now this is just so I can see what's going on
  digitalWrite(green,HIGH);

  Serial.begin(12000000);
}



int byteCnt =0;

void checkPoint()
{
  if(interupt==false)
  {
    interupt=true;
  }
  if((micros()-oldTime)>1000) // only had a reed switch to work with, it double taps where it is; this is easier than moving it
  {
    //period = micros()-oldTime; // updates the period every rotation,
    // disabled because the period didn't stay stable; will come back to this later with averaging, or working backwards

    oldTime = micros();
    periodStart = micros();// a reference to find what time section is active
  }
}



int sectionCnt=0;

void loop() {

  noInterrupts();

  if(interupt==true || micros()>periodStart && micros()<(periodStart+(period)))
  {
    int currentSection = (ceil((micros()-periodStart)/(period/timeResolusion)))-2;


    if(currentSection > sectionCnt+1)// only allows a single update per time segment
    {
      digitalWrite(latch,LOW);

      int loopLength =(layerNum-1);// to allow setting to a single layer
      if(loopLength == 0)
      {
        loopLength=1;
      }

      for(int c=0; c<loopLength ;c++)
      {

        for(int b =0; b<2; b++)
        {

          for(int a=0; a<8; a++)// this loops through all 8 bits for each shift register
          {
            digitalWrite(clock,LOW);

            int ledNum = a; // gives the 0-15 led number for the current branch, checked working
            if(b==1)
            {
              ledNum = 8+a;
            }

            int temp = ceil(currentSection/2);// works out what byte the 4 bit value is within

            int val = displayData[temp-1][c];// collects the data

            if((currentSection % 2) == 0)// splitting the byte into its two sections,
            {
              // masks off the top 4 bits, as only the bottom 4 are needed
              val = val & 0x0F;
            }
            else
            {
              //only want the top four bits, push them into the first 4 bits to read as the 0-15 values wanted
              val = val >> 4;
            }


            // writes the state to the shift register,
            if(ledNum >= val)
            {
              digitalWrite(data,LOW);
            }
            else
            {
              digitalWrite(data,HIGH);
            }
            digitalWrite(clock,HIGH);


          }

        }

      }
      digitalWrite(latch,HIGH);

      sectionCnt = currentSection;

      if(sectionCnt <= timeResolusion)
      {
        sectionCnt=0;
      }
    }

    if(interupt==true)// starts the rotation by updating the time reference
    {
      periodStart = micros();
      interupt=false;
    }


  }
  interrupts();

  int numOf_bytes = (real_timeResolusion/2);



  if (Serial.available() >= numOf_bytes)
  {

    while(byteCnt < numOf_bytes)
    {
      int n = Serial.readBytes((char *)displayData[0], numOf_bytes-byteCnt);
      byteCnt += n;
    }

    byteCnt=0;

    int loopLength =(layerNum-1);// to allow setting to a single layer, checked
    if(loopLength == 0)
    {
      loopLength=1;
    }

    for(int i=0; i< layerNum ;i++) // this changes the primary keys to move to the next frame, check boundary, checked
    {
      primaryKey[i]++;

      if(primaryKey[i] >= layerNum)// check
      {
        primaryKey[i] = 0;
      }

    }

  }
}






//////////// Processing code//////////////////


import processing.serial.*;

Serial myPort;

void setup()
{
  size(200, 200);
  myPort = new Serial(this, "COM6", 12000000);
}

int time =50;
void draw() {

  for(int i = 0; i< 15; i++)
  {
    for(int a =0; a<4;a++)
    {
      myPort.write(i);
    }
    delay(time);
  }

  delay(time*2);

  for(int i = 15; i> 0; i--)
  {
    for(int a =0; a<4;a++)
    {
      myPort.write(i);
    }

    delay(time);
  }
}

void keyPressed()
{

  if (keyCode == UP)
  {
    time += 10;
  }

  if (keyCode == DOWN)
  {
    time -= 10;
  }
}

I used the "Copy for Forum" tool.......
 
I think I have found the root of the problem: the display isn't updating only once for each section. This will be having a dramatic effect on performance.

This is backwards; it should be "sectionCnt >= timeResolusion":

if(sectionCnt <= timeResolusion)
{
sectionCnt=0;
}

But it seems that this isn't the whole problem, it was just covering it.
 
I have fixed the display code so it only updates at the start of each time section, but it doesn't appear as if it has done much to improve the speed. I will have to do more testing.
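
For anyone following along, here is a minimal sketch of one way to get "update once per time section" behaviour. It is not the code from the sketch above: sectionIndex and SectionGate are hypothetical names, the time values are passed in as plain numbers instead of calling micros(), and periodUs is assumed to be greater than zero.

```cpp
#include <cstdint>

// Hypothetical helper (not from the posted sketch): maps the time since the
// last reed-switch pulse to a section index in the range [0, sections).
// Assumes periodUs > 0 and sections > 0.
int sectionIndex(uint32_t nowUs, uint32_t periodStartUs, uint32_t periodUs, int sections) {
    // Unsigned subtraction still gives the right elapsed time across a counter rollover.
    uint32_t elapsed = nowUs - periodStartUs;
    // Clamp to the last section if the rotor is running slower than periodUs.
    if (elapsed >= periodUs) elapsed = periodUs - 1;
    return static_cast<int>((static_cast<uint64_t>(elapsed) * sections) / periodUs);
}

// Remembers the last section drawn so each section is shifted out only once.
struct SectionGate {
    int lastSection = -1;

    bool shouldUpdate(int section) {
        if (section == lastSection) return false;  // already drawn this section
        lastSection = section;
        return true;
    }
};
```

In a loop() this would be used as "compute the section, ask the gate, and only then shift the bits out", so the comparison that resets the counter never has to be written by hand.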
 
I wish I could help but I don't know enough. Can you suggest any sites or books to get going on, well, any of this? Like bit-banging and hardware description or communication. I really want to take the time and learn about this but I feel lost and overwhelmed. I don't really know the capabilities. Sorry for the noob questions.
 
Bit-banging: instead of using a library specifically designed for the communication, you take the crude approach and manually toggle (switch) the pins.

That code is a little messy, don't worry that you don't get it.

What do you want to learn about? There are a lot of areas you can go into.

How much do you already know? Are you still at school too?

If you just want to generally expand your knowledge, here and here are some great YouTube channels.

Or if you are absolutely new to the area, this is where I started.

As you go through that content, whenever you don't understand something let yourself branch off into new tabs to work it out (but remember to go back to where you started). If you don't have more than 15 tabs open, then you're not asking enough questions :p
 
Thanks for the links! I am still in school but it's summer, so I have a lot of time when I get home from my internship. I think I want to get into embedded systems and firmware and stuff. I've taken one class in C and I've been using MATLAB at work and for multiple classes. I have an ar
 
@Quwat,

You'd make your code more readable if you replace the "*" with the spaces that should be used instead. Also, without at least a block schematic explaining the basic equipment involved and the electrical and perhaps software interfaces and data flow it will be difficult to comment on your code as it seems a more involved project.
 
Sorry, I didn't think the * would matter that much. Can't you use find and replace with a space?

Communication isn't one of my strong points (ironically could be the problem with the display performance :p)

Hardware setup

I simply have a piece of proto board with two 74HC595 shift registers controlling 16 LEDs; this board is connected to an old fan motor. The shift registers are connected to my home-brew slip rings, which then connect directly to the Teensy. Because the Teensy is not on the spinning structure, it's connected directly to my computer over USB. Then I have a magnet and a reed switch with a pull-down resistor, to sense the display rotating via an interrupt.
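
To make the 16 LED bar idea concrete, here is a rough sketch of turning one 0-15 bar value into the 16 bits for the two shift registers. It follows the comparison in the posted loop (LED n is on when n is below the bar value). barMask, firstRegisterByte and secondRegisterByte are hypothetical names, and which register gets which byte, and in which bit order, is an assumption that depends on the wiring.

```cpp
#include <cstdint>

// One bit per LED: LED n is lit when n < barValue, as in the posted loop.
// Values of 16 or more light every LED.
uint16_t barMask(uint8_t barValue) {
    if (barValue >= 16) return 0xFFFF;
    return static_cast<uint16_t>((1u << barValue) - 1u);
}

// Assumption: LEDs 0-7 hang off one 74HC595 and LEDs 8-15 off the other.
uint8_t firstRegisterByte(uint16_t mask) {
    return static_cast<uint8_t>(mask & 0xFF);
}

uint8_t secondRegisterByte(uint16_t mask) {
    return static_cast<uint8_t>(mask >> 8);
}
```

Precomputing the mask once per section means the inner bit loop only has to test one bit per clock pulse.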

Still think I need to upload a schematic?

Didn't I explain the data flow in the original post?

Thanks headroom, you should be able to see the problem with my headroom easy, considering the user name :p
 
No it does not matter that much ;-) A search and replace would also indiscriminately remove the "*" in this line of code:
Code:
const int timeResolusion = (10+(2*(( real_timeResolusion ))));

So your "display" is essentially a row of LEDs mounted on a fast spinning rotor.
You are sending data from the Processing app on a PC through USB/Serial Port to the Teensy3, and from the Teensy3 you are sending data via bit-banged shift registers to the row(s) of rotating LEDs. In order to stabilize the circular "image", you are synchronizing with a reed contact. Is that correct?

You could also send a picture of your setup.
 
Yes that is correct, got any ideas how to increase the performance?

I won't be able to upload a photo until later today, but the setup is a little messy.
 
No, I don't have an idea yet, but another question: what does the data that you are sending from Processing represent, and how do you want it to look on the display?
In concept your project should not be so different from bicycle wheel lights described in this link on the Arduino Forum
 
The data represents bar values. It takes 4 bits to store 16 values as 0-15, so I fit 2 in each byte.
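
A small sketch of that packing, for illustration only. packBars, lowBar and highBar are hypothetical names, and putting the first value in the low nibble is an assumption; the sender and the Teensy only need to agree on the order.

```cpp
#include <cstdint>

// Packs two 0-15 bar values into one byte: the first in the low nibble,
// the second in the high nibble. Anything above 15 is truncated to 4 bits.
uint8_t packBars(uint8_t first, uint8_t second) {
    return static_cast<uint8_t>((first & 0x0F) | ((second & 0x0F) << 4));
}

// Recovers the value stored in the bottom 4 bits.
uint8_t lowBar(uint8_t packed) {
    return packed & 0x0F;
}

// Recovers the value stored in the top 4 bits.
uint8_t highBar(uint8_t packed) {
    return static_cast<uint8_t>(packed >> 4);
}
```

Masking with 0x0F is used for the low value because shifting left and then right by 4 only clears the top bits when the variable is exactly 8 bits wide.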

So I will make a bar music visualization; you must have seen one before? Here is an LED cube take on what I'm doing.

The only thing similar to that project is that it's a POV; it doesn't have any live communication.
 
The very best thing you can do to get useful help & advice is take some time to post photos, schematics and a copy of the code you're trying.

Edit: post code with the "[CODE]" tags. Preserving the colors is much less important than preserving the syntax, so someone can copy and paste it back into Arduino and click "Verify".
 
OK Paul I will get to fixing those issues.

But in the larger terms do you think the teensy and processing combination plus the overhead of POV, is capable of achieving what I want? (I would like at least 20fps, with 10 bytes per frame real time)

Am I doomed from the start because of a delay between processing and actually sending data to the teensy? Because I don't want to spend ages refining this code if the best result is worse than an alternative.

I could get the visualization data from a hardware approach. But that may come with problems of its own.
 
But in the larger terms do you think the teensy and processing combination plus the overhead of POV, is capable of achieving what I want? (I would like at least 20fps, with 10 bytes per frame real time)

You've posted so little detailed information that I'm left to guesswork here. I want to be helpful, but before I write this, I want to communicate to you as clearly and as strongly as possible that you can not expect good help if you do not put more work into asking good questions, where "good" means you've actually put some work into explaining in detail the circuitry, code, and other details. With that in mind, here's the best I can do with guesswork....

So, my first guess is based on "10 bytes per frame real time". What does that actually mean? I can't tell. 10 bytes is such a small amount of data, so I'm going to guess that you're going to try sending the angular position to Processing and have it respond with 10 byte updates which correspond to the precise position.

If so, I would say no. Such a design is simply not feasible using Windows, Linux or Mac. The limitation is not Teensy. With reasonably good coding, you can easily get very low latency on Teensy. But none of the commercial operating systems can give you reliable, dependable low (or fixed) latency. They do far too much other stuff. You can get accurate timestamps, so there may be very complex ways you could work around the varying latency between position updates and reception of position-dependent data. But that would be pretty tricky, so I would avoid it.

I would design this system where the PC transmits the entire image, without any knowledge of the real-time position of the display. Suppose you have 120 angular regions (3 degrees each) that are each 10 bytes. I would not bother transmitting any angular position to the PC at all. I'd just have the PC transmit all 1200 bytes for the entire image. USB is capable of delivering 1200 bytes in about 2 ms (** but see below). If you're spinning at 2000 RPM (33 frames/sec), which is 12 degrees per millisecond, the LEDs move only a few of those positions during the time the entire update takes to transmit. So you could store the incoming data directly into the array you're using to update the LEDs as they move.
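
The arithmetic in that paragraph can be checked with a few lines of code. This is only a sketch using the example figures from the post (120 regions, 10 bytes each, 2000 RPM, roughly 2 ms per transfer); the function names are made up for illustration, and the integer division truncates for speeds that do not divide evenly.

```cpp
#include <cstdint>

constexpr uint32_t kRegions = 120;                       // angular regions per rotation
constexpr uint32_t kBytesPerRegion = 10;                 // bytes of LED data per region
constexpr uint32_t kDegreesPerRegion = 360 / kRegions;   // 3 degrees each

// Size of one complete image.
constexpr uint32_t frameBytes() {
    return kRegions * kBytesPerRegion;
}

// Degrees swept per millisecond: rpm * 360 degrees / 60000 ms per minute.
constexpr uint32_t degreesPerMs(uint32_t rpm) {
    return rpm * 360u / 60000u;
}

// Regions the LEDs pass while a transfer of the given length is in flight.
constexpr uint32_t regionsPassed(uint32_t rpm, uint32_t transferMs) {
    return degreesPerMs(rpm) * transferMs / kDegreesPerRegion;
}

static_assert(frameBytes() == 1200, "120 regions x 10 bytes");
static_assert(degreesPerMs(2000) == 12, "2000 RPM is 12 degrees per ms");
```

With those figures, regionsPassed(2000, 2) works out to 8 of the 120 regions, which is the "only a few of those positions" in the paragraph above.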

Teensy3 has plenty of memory for two 1200 byte buffers, so you could also implement something like placing all the incoming data to a temporary receive buffer. Then when you know all 1200 bytes are received, you could set a flag to indicate the buffer is complete. When the sensor detects the armature passing, it could copy the receive buffer if that flag is set, and presumably update your speed measurement so the rest of the code can compute what times to update the LEDs. This approach adds some delay from the data reception until it starts displaying, but it might be visually superior if each new frame always begins from the same angular position (0 degrees) instead of some random location? Maybe?
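
A minimal sketch of that two-buffer idea, written as plain C++ so it can be tried off the board. FrameBuffers, onByte and onIndexPulse are hypothetical names, and refusing new bytes while a finished frame is waiting is an assumed policy (on real hardware you would simply stop reading from Serial until the swap, so the data stays queued and frame alignment is kept).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kFrameBytes = 1200;  // 120 regions x 10 bytes, the example figures above

struct FrameBuffers {
    uint8_t receive[kFrameBytes] = {};  // filled as bytes arrive
    uint8_t display[kFrameBytes] = {};  // read by the LED update code
    std::size_t received = 0;
    bool frameComplete = false;

    // Call for each byte read from USB serial.
    // Returns false when the byte was not accepted because a finished
    // frame is still waiting to be shown.
    bool onByte(uint8_t b) {
        if (frameComplete) return false;
        receive[received++] = b;
        if (received == kFrameBytes) frameComplete = true;
        return true;
    }

    // Call when the sensor detects the armature passing 0 degrees.
    // Copies a finished frame into the display buffer; returns true if it did.
    bool onIndexPulse() {
        if (!frameComplete) return false;
        std::memcpy(display, receive, kFrameBytes);
        received = 0;
        frameComplete = false;
        return true;
    }
};
```

If onIndexPulse were called from an interrupt on the Teensy, the shared fields would also need to be volatile or otherwise protected, which this sketch leaves out.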

The key point in this long-winded message is you can NOT get really dependable low latency from user-mode code running on Linux, Windows or Mac OS-X. Well, not easily anyway. It doesn't matter if you use Processing or C. Those operating systems simply have user level scheduling latency that makes a position-response approach infeasible. But you can very easily just blast the entire image to a buffer in Teensy3 in only a small fraction of 1 rotation, where you can have excellent low-latency response to update the LEDs as they move. Just do the angular position tracking entirely within the Teensy3 and have it update the LEDs from a buffer in its memory. Teensy3 should be very capable of implementing such a design.



** Regarding the time 1200 bytes takes to transmit on USB, unless you're using a Mac, it's important to write all 1200 bytes as a single large write to the operating system's driver. Linux and Windows will issue small writes as separate transactions on your motherboard's USB host controller, which makes really inefficient use of the USB bandwidth. Windows also adds extra inefficiency in how it schedules operations on the host controller. You can see the effects in these benchmarks (scroll down to the section where Linux, Mac and Windows are compared).

http://www.pjrc.com/teensy/benchmark_usb_serial_receive.html

The good news is all 3 systems work very well, as long as you write all the data at once. USB and Teensy are tremendously fast compared to any realistic rotational speed you'll be able to achieve with an armature long enough to make a human visible display.
 
The 10 bytes represent 20 values of 0-15 each; these are the bar values of each frequency.

No, I am not using two-way communication that relies on instant responses; I realize how inefficient that is. I am already doing what you suggested, having Processing blindly send the data to a buffer array, then activating it when it's full.

But now that I think of it, my problem may be that I am sending too much data. Could it be that I am sending the data again and again within a single update, and that having to process the same data far more times than needed is reducing performance?
Yet if that were true, wouldn't the display miss certain updates if they are cycled past within a screen update, making it display inconsistent sections of the data I send?

The other thing I am not sure about is sending the data all at once from Processing; there is no function to send a group of data, is there? I was just sending data using the standard serial write.

Thanks Paul, I will repost and re-explain my code after I have gone through and tested a bit more.
 