Teensy 4: Global vs local variables speed of execution

Note that while the T4 doesn't have a pseudo-differential mode, with two converters one can sample the signal and a reference voltage at about the same time, giving a similar common-mode noise/offset reduction benefit.

So connect one ADC to a voltage reference and read the signal on the other?

I have read a bit more about the circuit side of things and have a few more things to try; I think the noise can be reduced further. Right now the only thing that made a noticeable difference was a low-pass filter before the ADC. I also have to try this: http://sim.okawa-denshi.jp/en/OPstool.php

Also I have to try a charge amplifier piezo circuit.

https://www.allaboutcircuits.com/te...sign-charge-amplifiers-piezoelectric-sensors/
 
> connect one ADC to a voltage reference and read the signal on the other

Yes, ideally with the voltage reference located near the signal source and both lines treated exactly the same (i.e., as a differential pair).
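On the Teensy 4 that can be done with a synchronized read on both ADCs. A minimal sketch, assuming the ADC library bundled with Teensyduino (pedvide's); the pins and settings are just placeholders:
Code:
#include <ADC.h>

const int signalPin = A0;  // piezo signal (placeholder pin)
const int refPin    = A1;  // reference line routed alongside the signal

ADC *adc = new ADC();

void setup() {
  Serial.begin(115200);
  adc->adc0->setResolution(10);
  adc->adc0->setAveraging(8);
  adc->adc1->setResolution(10);
  adc->adc1->setAveraging(8);
}

void loop() {
  // Both converters sample at (nearly) the same instant
  ADC::Sync_result r = adc->analogSynchronizedRead(signalPin, refPin);
  int corrected = r.result_adc0 - r.result_adc1;  // common-mode noise/offset largely cancels
  Serial.println(corrected);
  delay(10);
}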
 

Actually it’s a non-issue. With defragster’s unrolled loop, and placing all the other code only where it waits for the ADCs, I can use 8 averages and still get 30 reads per sensor per millisecond, which is probably twice what I need.

Back to the topic of global vs local variables: I’m having a hard time figuring out how to use only local variables for this application.

I’m running many fast loop iterations and I need the program to remember a lot of previous values between iterations. It seems like local variables would only make things messier.

I’ve read that global variables are evil, so I’m trying to figure out how best to approach this and not use so many of them.

Basically I need to do peak tracking on the previous values while the ADC is sampling. The functions would need to return many values (I probably need to break them into smaller ones), so I’m not sure how one would pass the values on and remember them for the next loop.

When is it preferable to use global vs local variables?

I can use static, but how is that different from a global variable?
 
Awesome, the unrolled loop shows promise! Are the 8 averages at 10 or more bits of resolution?

It is global variables that are said to be evil ... because they are the easiest thing to reach for - though they can be made perfectly safe and good depending on the use case - it's really just a debate issue

Generally, having variables 'only available as and where needed' prevents them from getting used or abused where not expected or intended - which makes code easier to understand and maintain. Versus having them wired into multiple places, which prevents any easy change to their use when needed later.

Static/Permanent data areas are needed - just a matter of how they are accessed.

Globals are handy and permanent - but if used in 10 places, that is 10 places that need to change to alter the nature of the global variable - or 10 places where one use could alter the global unexpectedly for the other 9.

Locals on the stack don't survive leaving the function. They can be made static, so the value is kept outside the stack and retained on re-entry to that function, where only that function can see it.
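For example (a trivial made-up sketch just to show the difference):
Code:
int hitCount = 0;             // global: visible (and modifiable) from everywhere

void trackPeak(int value) {
  static int lastValue = 0;   // static local: keeps its value between calls,
                              // but only trackPeak() can see or change it
  if (value > lastValue) {
    hitCount++;               // any other function could also touch hitCount
  }
  lastValue = value;
}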

The fun is finding the middle ground that limits access to the code that needs it and knows what can be done and what the data represents, and that makes it clear, when something changes, exactly where and how the data was used.

Depending on the use case, there are ways to encapsulate the data in a class or other code that provides the needed uniform and limited access.
 

Your idea for the unrolled loop is genius.

I’m getting great results with 10 bits and 8 averages, although it’s possible 12 bits with 4 averages may be even better. There is also the possibility of using fewer averages on the ADC and instead doing some kind of averaging in software to get more bits. I’ve read that a noisy signal opens up the possibility of gaining quite a bit more resolution by averaging it into more bits, but I haven’t explored that. Anyway, this is working great for now as is.

I went over the part about classes a bit too fast when I read up on the C++ syntax, I think. How would a class be used in a scenario with sensors, peak values and so on?

Any simple example I could use as inspiration?

Basically I’m storing the three last rising values before a peak (both the sensor value and the micros() timestamp when each occurred). Then I track the three falling sensor values after the peak. I’m using a window-threshold approach. I plan to average them to get a more accurate/consistent time of occurrence. The reason for all this is to later calculate striking position from the time difference of arrival, but that is a future problem to solve, as is removing signal hot spots when sensors are struck directly.

Right now I have one function that does all that, and everything is global. The only thing passed to the function is the sensor value.
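For what it's worth, here is a rough sketch of how that per-sensor state could live in a class instead of globals (the names are made up, the window-threshold/trigger logic is left out, and it is untested):
Code:
class PeakTracker {
public:
  // feed one new reading; returns true when a peak has just been confirmed
  bool update(int value, uint32_t timeUs) {
    if (value > lastValue) {                     // still rising: keep the last 3 rising points
      riseVal[2] = riseVal[1];   riseVal[1] = riseVal[0];   riseVal[0] = value;
      riseTime[2] = riseTime[1]; riseTime[1] = riseTime[0]; riseTime[0] = timeUs;
      fallCount = 0;
    } else if (value < lastValue) {              // falling: confirm the peak after 3 falling points
      if (++fallCount == 3) { lastValue = value; return true; }
    }
    lastValue = value;
    return false;
  }
  uint32_t peakTimeUs() const {                  // average of the stored rise timestamps
    return (uint32_t)(((uint64_t)riseTime[0] + riseTime[1] + riseTime[2]) / 3);
  }
private:
  int      riseVal[3]  = {0, 0, 0};
  uint32_t riseTime[3] = {0, 0, 0};
  int      lastValue   = 0;
  int      fallCount   = 0;
};

PeakTracker trackers[8];   // one persistent object per sensor, nothing global beyond this
// in the sampling loop:  if (trackers[i].update(adcValue, micros())) { /* handle the hit */ }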
 
Why not try using a sliding window averaging scheme?

The advantage is that you get a new result with every completed analog conversion, but it is simply averaged with the previous raw analog data. You can average as many samples as you want. I use 16 because I can simply right-shift the sum by 4 to get the averaged result.

I just posted some example code for this in another post: https://forum.pjrc.com/threads/61801-How-do-I-correct-an-analog-axis-bouncing

Just make sure the number of averaged samples is a power of two.
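Something along these lines (a minimal sketch of a 16-sample sliding window; not the exact code from the linked post):
Code:
const int WINDOW = 16;                 // must be a power of two
uint16_t samples[WINDOW] = {0};
uint32_t runningSum = 0;
uint8_t  idx = 0;

// call with every new raw ADC reading; returns the current sliding average
uint16_t slidingAverage(uint16_t raw) {
  runningSum -= samples[idx];          // drop the oldest sample from the sum
  samples[idx] = raw;                  // store the new one in its place
  runningSum += raw;
  idx = (idx + 1) & (WINDOW - 1);      // wrap around (works because WINDOW is a power of two)
  return runningSum >> 4;              // divide by 16 with a right shift
}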
 
Yeah, I've thought about that too. I've thought about using 10-bit readings, multiplying them by say 4, and then doing the averages. That should give more resolution (I think), because sometimes a voltage sits between two "steps" and the reading jumps up and down.

For now this works well and is simpler. I'm still very new at coding, so my thinking is to get something that works first and then add improvements and explore better ways to do it later.

Thanks for the example, I will save it for later.
 

Are you talking about performing oversampling and averaging as a means to improve resolution (i.e., improving the SNR in the presence of white noise)?
 
Yeah. Say I have ten sensor readings that alternate between 10 and 11. If I average them, the result will still probably show 10 or 11, but the real value is 10.5. If I multiply them by two, so they read 20 and 22 instead, and then average them, they will show a consistent 21.

I’m sure there are even better ways to do this.

There is supposedly an even better way to do it by applying white noise (dithering) to force the values to jump up and down and then using some algorithm, but I haven’t explored it.
 
It all depends on how the analog values are received and measured in the system whether it actually works out to get a better reading with two more bits ... so only you can test this ... i.e. YMMV

But it seems a resolution of 12 bits with 4 averages would be faster and better - and it allows discarding the lower two bits (/4), leaving what should be a more stable lowest bit.

Not sure how clean the analog reading is when the input being tested changes 5 times in a row - ten times per millisecond? That is, whether it has any effect on perhaps the first reading in each average group. Also, how independent the two ADC units are from each other in that fast update/change environment. Ideally they are great and wonderful - but earlier posts noted that a simple analogRead() was 'at first' giving better results. Again, since these are fast percussive hits, the values are never stable for very long unless un-hit - not like a simple potentiometer or other slowly changing input. And adding averaging on top of the read averaging will certainly smooth the value - but it also might average out true peaks.
 

Read this application note on oversampling. It will tell you how to calculate the number of samples required to gain X bits of resolution. You need a lot of samples, which may make what you want to achieve not doable, but you should be able to answer that after reading it...

https://www.cypress.com/file/236481/download
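The rule of thumb in that kind of app note: to gain n extra bits, accumulate 4^n samples and right-shift the sum by n. A minimal sketch for 2 extra bits on top of 10-bit reads (pin and resolution setup assumed elsewhere):
Code:
// 4^2 = 16 samples gain 2 extra bits: 10-bit reads become a 12-bit result
uint16_t oversampledRead(int pin) {
  uint32_t sum = 0;
  for (int i = 0; i < 16; i++) {
    sum += analogRead(pin);   // assumes analogReadResolution(10)
  }
  return sum >> 2;            // decimate: right-shift by n = 2
}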
 
@defragster
I’m measuring a half wave that is about 0.5-2 ms in length, so not very fast signals. Anything above say 1000 Hz is just noise in this application.

Averaging is good in this case.

I did notice some spillover to other pins on the same ADC. However, I was running everything at very high speed and my circuit had a 4k resistance, so I assume it can be tweaked away by removing the resistance and maybe using medium speed. I will have to test that, but one problem at a time. I will test 12 bits as well.

Reading just one pin vs reading 10 doesn’t seem to affect signal quality. The two ADCs seem completely independent, but as I said, a small spillover to other pins on the same ADC can be seen. Not a total catastrophe, because I’m reading the same source with multiple sensors, but I should be able to reduce it.

Is there any reason not to use division on the Teensy, since it’s so fast? The previous comment said to use bit shifting instead of division/multiplication.
 
The compiler will likely turn any clear /2 or /4 into a shift anyway, and if not, the CPU does an integer division in a handful of cycles AFAIK - so not worth worrying over the difference ... whatever is clearest for your reading and usage without confusion.

Interesting that the signal levels persist for at least 500 us or more - indeed, if reading 10 times per 1 ms, then doing a rolling average of four of them should give a good value.
 

Well, it’s an AC signal: the first peak is a negative half-wave sinusoid that lasts up to 2 ms, but the signal keeps oscillating for maybe 50 milliseconds after the first peak before calming down completely. The first negative peak is the one of interest and gets triggered on. The rest is a matter of making the threshold an envelope that stays just above the remaining 50 ms, so new hits can still be triggered if they rise above the aftershocks.
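One common way to express that kind of retrigger envelope (just a sketch of the idea; the constants are made up and would need tuning):
Code:
// After a confirmed hit, raise the threshold just above the peak and let it
// decay back toward the resting threshold over roughly 50 ms.
const int   restingThreshold = 40;      // made-up value, in ADC counts
const float decayPerSample   = 0.995f;  // tune so the envelope falls back in ~50 ms
float       envelope         = restingThreshold;

bool aboveEnvelope(int rectifiedValue, int peakValue, bool justPeaked) {
  if (justPeaked) {
    envelope = peakValue * 1.1f;        // jump above the expected aftershocks
  } else if (envelope > restingThreshold) {
    envelope *= decayPerSample;         // exponential decay toward the resting level
  }
  return rectifiedValue > envelope;     // only allow a new trigger above the envelope
}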
 

More likely it is the sample and hold circuit inside the A/D.
 

Yeah, as I said, I used the fastest settings for conversion and sampling. I can use medium and see if it goes away. My guess is that the capacitor inside the A/D doesn’t discharge in time for some reason.
 

Best to read the manufacturer's data sheet on the A/D operation or ask tech support.
 
Ok, so I'm back with more stupid questions about speed.

I have working code done in defragster's suggested way. I have 8 sensors connected, with code executing between each sensor pair. It seems this project will work, although I'm still a few months away.

I have a problem.

When I run the code I'm getting about 37 reads per sensor per millisecond, which is great. However, when I send telemetry data to a datalogger using Serial.print commands, I only get 32 reads. I did a test where I timed the Serial.print calls, and they take several microseconds each. That is enough to slow down the analog reads, even though I have about 6 microseconds of time to execute code between every sensor read pair.

Is there a faster way of sending sensor values through USB to the PC?

I'm using a Teensy 4, btw.
 
As with all of the Serial ports: if you try to write something to the USB Serial port that will not fit in the queue, the code will wait there until there is room in the queue before the Serial write completes...

There are ways to avoid this, such as checking first to see how much room is available so you don't get held up in the code.
That is, you can first call Serial.availableForWrite(), which returns a number; as long as you don't write more to the queue than that number, the code will more or less simply copy your data to the software buffer without waiting.

If the data you are sending is in a fixed format, like two bytes per sensor or the like, it is easier to handle: you can ask whether a whole packet will fit, and if not, you keep your own queue of packets to save the overflow in; whenever you have time, you check again to see if there is room to move some of your packets to the USB queue. If your output is less regular, like ASCII-formatted data, this can be slightly more tricky... But again it can be pretty simple.
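For the fixed-format case, that check could look something like this (a sketch only; the packet layout is up to you):
Code:
// Send a fixed-size packet only if it fits in the USB Serial buffer right now,
// so the sampling loop never blocks on Serial.
bool trySendPacket(const uint8_t *packet, size_t len) {
  if ((size_t)Serial.availableForWrite() < len) {
    return false;             // no room yet - caller keeps the packet for later
  }
  Serial.write(packet, len);  // fits entirely, so this only copies into the buffer
  return true;
}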

Recently, for a different but similar issue, I made a version of the USBHost code that I hacked up to save all of the debug output into memory; only when I ask for it does it then get sent to the debug terminal.


It is really pretty simple. You can make a simple subclass of the Print class or Stream class and default most of it. Mainly you need to implement your own version of the write method for outputting one byte...
And, if you use it much, also the multi-byte version, which by default iterates the one-byte version for each byte...
For example, in my simple version I have:

Could write more, but don't want to add confusion here.
 
Aha I understand. So the program stops execution until there is room in the queue?

It is strange, because I'm getting delays of about 8-14 micros to write 15 lines. If that were the case, the delay would be more like 125 micros, because USB polls at that interval. Or am I misunderstanding?

I was thinking that maybe the Serial.print function is a few hundred lines of code, which would make it too slow for what I'm doing.

The data is fixed so I could do your first suggestion.

Your example code did not show up in the post.
 
Basically what I need to send is comma-separated int and float values. Maybe I could just send it all as one string; I'm not sure if the receiving program will accept it.
 
Sorry, I have not totally followed your needs. If you have two programs, one on the PC (or the like) and another on the Teensy, and you have full control of both, then you have lots of options.
Example: if you are writing six 10-bit analog reads, you could simply output the raw data as binary, and the other side converts the values into whatever they mean...

Again, depending on needs, you could simply write out 12 bytes for the 6 values, or, since your values are only 10 bits each, you could pack them into 60 bits, i.e. 8 bytes, but that may be more trouble than it is worth.
Or you could convert your code over to maybe use RawHID: https://www.pjrc.com/teensy/rawhid.html
which allows you to send up to 1000 64-byte packets per second. In that case you set up a 64-byte buffer and tell it to send. You might give it a 0 timeout, which will tell you when all the buffers are full, and then you keep a list of pending packets. Again, this could be really simple stuff: create a structure where each one holds, say, 5 sets of readings (5*12 bytes) plus maybe a sequence number... You could allocate a lot of these in DMAMEM (the upper 512KB of RAM); your acquisition code simply keeps a pointer to the next one to write to, and when it is full you increment to the next one (wrapping around). If some have not been output yet, you check when possible whether you can send another HID packet and try again; if that succeeds, you update the pointer to the next pending one...

Again, the above logic can work for either RawHID or standard Serial. With standard Serial, simply outputting 16-bit values, I would probably start off each output with the value 0xFFFF, since I know that is not a valid 10-bit analog value; it helps the receiver know when a new packet of information starts...
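A sketch of that framing idea over standard Serial (assuming, as an example, eight 16-bit sensor values per frame):
Code:
// Frame: 0xFFFF marker, then 8 sensor values as little-endian 16-bit words.
// For 10- or 12-bit readings the high byte is at most 0x0F, so the 0xFF 0xFF
// marker can never appear inside the payload and the PC side can resync on it.
void sendFrame(const uint16_t values[8]) {
  uint8_t frame[2 + 8 * 2];
  frame[0] = 0xFF;
  frame[1] = 0xFF;
  for (int i = 0; i < 8; i++) {
    frame[2 + 2 * i]     = values[i] & 0xFF;   // low byte
    frame[2 + 2 * i + 1] = values[i] >> 8;     // high byte
  }
  if (Serial.availableForWrite() >= (int)sizeof(frame)) {
    Serial.write(frame, sizeof(frame));        // only sent when it fits, so no blocking
  }
}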

My quick and dirty code for debug output buffering in the USBHost code is:
Code:
class DebugMemoryStream : public Stream {
public: 
	DebugMemoryStream(uint8_t *buffer, uint32_t size_buffer) : _buffer(buffer), _size_buffer(size_buffer) {}

	void setBuffer(uint8_t *buffer, uint32_t size_buffer) {
		_buffer = buffer;
		_size_buffer = size_buffer;
	}

	// From Stream
	virtual int available() {
		if (_head <= _tail) return _tail - _head;
		return (_size_buffer - _head) + _tail; 
	}
	virtual int read() {
		if (_head == _tail) return -1;
		int return_value = _buffer[_head++];
		if (_head == _size_buffer) _head = 0;
		return return_value;
	}

	virtual int peek() {
		if (_head == _tail) return -1;
		return _buffer[_head];
	}

	// From Print will use default except for one byte version
	using Print::write; 
	virtual size_t write(uint8_t b) {
		if (!_enable_writes) return 0; // not writing now
		if (_bypass) return Serial.write(b);
		uint32_t tail = _tail;
		if (++tail == _size_buffer) tail = 0;
		if (tail == _head) {
			if (_stop_writes_on_overflow) return 0; // no room - keep the first received data
			else if (++_head == _size_buffer) _head = 0;
		}
		_buffer[_tail] = b; 
		_tail = tail; // 
		return 1;
	}

	virtual int availableForWrite(void) {
		return _size_buffer - 1 - available(); // one slot stays empty so a full buffer isn't mistaken for an empty one
	}

	// other specific functions
	void enable(bool enable_writes) {_enable_writes = enable_writes;}
	void stopWritesOnOverflow(bool fStop) {_stop_writes_on_overflow = fStop;}
	void clear() {_head = _tail = 0; }
	void byPass(bool bypass=true) {_bypass = bypass;}
	bool bypass() {return _bypass;}
private:
	uint8_t 	*_buffer;  // The buffer
	uint32_t	_size_buffer;
	uint32_t	_tail = 0;	// next place to write to.
	uint32_t	_head = 0;	// next place to read from. 
	bool		_enable_writes = true;	// by default we are enabled
	bool 		_bypass = false;
	bool  		_stop_writes_on_overflow = false; // If the buffer fills, do we keep the first bytes or overwrite them...
};
But again, this is not directly what you would need... You may not need the Stream class, maybe only the Print class part... Doing the Stream code does allow me to do things like:
Code:
void write_buffered_stuff_to_serial() {
    int count = mybufferedSerial.available();
    int count_write = Serial.availableForWrite();
    if (count > count_write) count = count_write;
    while (count--) {
        Serial.write( mybufferedSerial.read());
    }
}
Which is not the most efficient way of doing this, but for my needs it worked. For faster stuff, I would probably build a method into my buffering class that does something similar, but instead of reading and writing one byte at a time, it would do Serial writes directly from my own queue, with a couple of cases. That is, only write contiguous runs: if the data has wrapped around in the queue, only output the first part, up through the end of the buffer... Next time around, the remaining data should then be contiguous. Hope that makes sense. But again, for my debug code I kept it simple.
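For reference, that 'contiguous runs' idea could look roughly like this as an extra method on the DebugMemoryStream class above (hypothetical addition, not in the original code):
Code:
	// Push buffered data to Serial in at most one contiguous chunk per call,
	// never writing more than Serial currently has room for.
	size_t writeChunkToSerial() {
		if (_head == _tail) return 0;                          // nothing buffered
		uint32_t chunk = (_head < _tail) ? (_tail - _head)
		                                 : (_size_buffer - _head);  // stop at the wrap point
		uint32_t room = (uint32_t)Serial.availableForWrite();
		if (chunk > room) chunk = room;                        // never block the caller
		size_t written = Serial.write(&_buffer[_head], chunk);
		_head += written;
		if (_head == _size_buffer) _head = 0;                  // wrapped: next call sends the rest
		return written;
	}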
 

Thanks for the examples.

Well, right now I'm only using the data to see what's happening and to check that my code is doing what it should. I'm only about 30% done with the program. Later, when the program is complete, I only need to output MIDI as fast as possible, meaning with the least amount of delay. I'm hoping that can be done with almost zero latency; as far as I understand it should be possible to send with under 0.125 ms of delay.
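For the MIDI part later, the Teensy USB MIDI calls are already very light; assuming the USB Type is set to include MIDI, the hit-detection path could do something like this (the note/velocity mapping is made up):
Code:
// When a hit is confirmed, send the note immediately and flush the USB buffer
// so it goes out on the next USB frame instead of waiting to be batched.
void sendHit(uint8_t note, int peakValue) {
  uint8_t velocity = constrain(peakValue / 8, 1, 127);  // made-up scaling
  usbMIDI.sendNoteOn(note, velocity, 1);                // channel 1
  usbMIDI.send_now();                                   // push it out right away
  usbMIDI.sendNoteOff(note, 0, 1);                      // matching note-off
}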

The program could maybe have a test mode where more data is sent temporarily while the user adjusts settings, and then switch back to a performance mode when playing.

That is a future problem, now I just want the program to work.
 