Prop Shield Beta Test

MichaelMeissner · Mar 11, 2016

Frank B said:
It will have.. but float.

Even with an FPU for doubles, the "f" functions are better, since all the vars are float; otherwise they are converted to double and back to float, I think (<- is this correct, Michael?)

Your code looks good. What's the license of the code from the Freescale app notes you used?

Yes and no.

The current Teensys do not have hardware FP, but it takes longer to emulate double precision than single precision. Even outside of the cost to emulate a single operation, the sqrt function probably needs to execute an additional round or two of Newton-Raphson to get the precision right.

Code imported from AVR machines, like most Arduino code, sometimes uses doubles, but on AVR the compiler silently treats double like float.

However, it isn't always true that single precision is faster than double precision. There are machines that internally do things in double precision and do single precision by adding a rounding step after doing the calculation in double precision. C was developed on such a machine (the PDP-11, and the PDP-7 before it), which is why C defaults to double precision for constants. The PowerPC that I work on also internally does things in double precision format, and scalar 'single precision' values are stored in the register in the double format. Before the SSE extensions were added to the x86 architecture, floating point was always done in 80-bit floating point on a stack, which made running code that depended on exact IEEE semantics loads of fun.
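To make the float/double conversion point concrete, here is a minimal sketch in plain C++ (not taken from any sketch in this thread; `length3f` and `length3d` are hypothetical names). It assumes a target where double is wider than float, as on the ARM-based Teensy boards. The static_asserts show where the language itself promotes to double, and the two functions show the same vector length computed with and without the round trip through double.

```cpp
#include <math.h>
#include <type_traits>

// A float multiplied by a double literal is computed in double...
static_assert(std::is_same<decltype(1.0f * 2.0), double>::value,
              "float * double literal is promoted to double");
// ...while the f suffix keeps the whole expression in single precision.
static_assert(std::is_same<decltype(1.0f * 2.0f), float>::value,
              "float * float literal stays float");
// sqrtf() takes a float and returns a float.
static_assert(std::is_same<decltype(sqrtf(1.0f)), float>::value,
              "sqrtf returns float");

// Vector length kept entirely in single precision.
float length3f(float x, float y, float z) {
    return sqrtf(x * x + y * y + z * z);
}

// Same computation through double: the float sum is widened for
// sqrt(double) and the double result is narrowed back on return.
float length3d(float x, float y, float z) {
    return static_cast<float>(sqrt(static_cast<double>(x * x + y * y + z * z)));
}
```

Both functions return the same value for inputs like (3, 4, 12); the difference is only in how much work a soft-float target has to do.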

Frank B · Mar 11, 2016

Kris, your sketch says:

pressure is 102695.00 Pa, altitude is 65423.69 m, temperature is 27.88 C

Hm. not really.

johnnyfp · Mar 11, 2016

I think there's a slight output error at line 548 where it should be the following instead

from

Code:

baroin = altimeter_setting_pressure_mb * 0.02953;


		Serial.print("pressure is "); Serial.print(pressure, 2); Serial.print(" Pa, ");  // Print altitude in meters
		Serial.print("altitude is "); Serial.print(altitude, 2);  Serial.print(" m, "); // Print altitude in meters

to

Code:

float alt = altimeter_setting_pressure_mb * 0.02953;


		Serial.print("pressure is "); Serial.print(baroin, 2); Serial.print(" Pa, ");  // Print altitude in meters
		Serial.print("altitude is "); Serial.print(alt, 2);  Serial.print(" m, "); // Print altitude in meters

However, I'm not sure what the point of the altitude mode vs. the pressure mode is, or of reading the barometer twice. Probably something to do with temperature and base compensation?

Also you will need to set your station_elevation_m to your barometer base or start height in the world.

Pensive · Mar 11, 2016

Received mine yesterday, thank you

It's late tonight; I should get to play with it tomorrow.

Ben · Mar 11, 2016

MichaelMeissner said:
Yes and no.
Even outside of the cost to emulate a single operation, the sqrt function probably needs to execute an additional round or two of Newton-Raphson to get the precision right.

Is there a step before N-R when taking the sqrt on a micro? LUT? Given how quickly N-R converges for all real numbers with f(x)=sqrt(x), I somehow thought it would do N-R from start to end. But you mention one or two rounds of N-R, which on its own wouldn't be enough, so there must be something else?
I don't want to bother you for a detailed explanation, but if you happen to have a link that offers some insight, that would be most kind of you.

On Topic: IMHO it's a good idea to use single precision as it will yield the same results as code running on a future T3++ or T4 that uses a hardware FPU.

-Ben

Pensive · Mar 11, 2016

PaulStoffregen said:
called these "wit" and "witout", named after the Philly Cheesesteak ordering protocol.

Thanks to this beta program, I have now discovered I can get Philly cheesesteaks within a 3 hour round trip from my house. They have a co.uk site and mobile vans, with a permanent residence in Spitalfields Market in London.

This is the best thing EVER.

MichaelMeissner · Mar 11, 2016

Ben said:
Is there a step before N-R when taking the sqrt on a micro? LUT? Given how quickly N-R converges for all real numbers with f(x)=sqrt(x), I somehow thought it would do N-R from start to end. But you mention one or two rounds of N-R, which on its own wouldn't be enough, so there must be something else?
I don't want to bother you for a detailed explanation, but if you happen to have a link that offers some insight, that would be most kind of you.
-Ben

I meant you probably would need 1-2 EXTRA rounds of N-R due to the extra precision when calculating sqrt for double compared to float, not that you could get away with 1-2 rounds. Note, I'm not really up on the fine points of numerical analysis.

Ben · Mar 11, 2016

MichaelMeissner said:
I meant you probably would need 1-2 EXTRA rounds of N-R due to the extra precision when calculating sqrt for double compared to float, not that you could get away with 1-2 rounds. Note, I'm not really up on the fine points of numerical analysis.

Ohh now I see, thanks. Yeah 1-2 rounds sounds right for a single->double precision increase.

PaulStoffregen · Mar 11, 2016

Does N-R really converge that quickly? Double precision has a *lot* of bits in its mantissa!

onehorse · Mar 11, 2016

Frank B said:
Kris, your sketch says:

pressure is 102695.00 Pa, altitude is 65423.69 m, temperature is 27.88 C

Hm. not really.

This is likely a result of poor signed integer casting. I had to fix a few of these errors to get reasonable sensor output. I do not claim this is perfectly well-executed, error-free code, just that it is a good place to start from...

OK, I fixed the integer casting for the MPL3115A2 pressure, altitude, and temperature. See if it works better for the lowlands...
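The reported 65423.69 m is consistent with a small negative altitude decoded as unsigned: 65536 - 65423.69 is about 112.3. The sketch below illustrates that kind of casting error; it is not the actual fix from the repository. It assumes the MPL3115A2 altitude comes back as three bytes holding a 20-bit two's-complement value with 4 fractional bits (check the datasheet), and the function names and example bytes are hypothetical.

```cpp
#include <cstdint>

// Buggy decode: the 20-bit value is treated as unsigned, so a reading
// slightly below zero shows up as a value just under 65536 m.
float altitudeUnsigned(uint8_t msb, uint8_t csb, uint8_t lsb) {
    int32_t raw = (int32_t(msb) << 12) | (int32_t(csb) << 4) | (lsb >> 4);
    return static_cast<float>(raw) / 16.0f;
}

// Sign-aware decode: extend the sign from bit 19 before scaling.
// (Assumed layout: 16 integer bits + 4 fractional bits, two's complement.)
float altitudeSigned(uint8_t msb, uint8_t csb, uint8_t lsb) {
    int32_t raw = (int32_t(msb) << 12) | (int32_t(csb) << 4) | (lsb >> 4);
    if (raw & 0x80000) {
        raw -= 0x100000;  // value was negative in 20-bit two's complement
    }
    return static_cast<float>(raw) / 16.0f;
}
```

With the illustrative bytes 0xFF, 0x8F, 0xB0 the unsigned decode gives 65423.6875 m and the signed decode gives -112.3125 m.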

Ben · Mar 11, 2016

PaulStoffregen said:
Does N-R really converge that quickly? Double precision has a *lot* of bits in its mantissa!

Yes, best-case convergence with N-R is quadratic (each iteration doubles the number of "usable" bits in the mantissa). How fast it actually converges depends on the function; with sqrt it's pretty good. There are other functions that don't converge at all with certain initialisations (seeds). But that never occurs with sqrt unless you initialize with x=0, and you wouldn't do that anyway. When taking the sqrt of a binary number m·2^e, one would seed with 2^(e/2) if e is even and 2^((e-1)/2) if e is odd (I'm not completely sure about this, but I think I got it right).
-Ben

Edit: My Shield arrived today, on my birthday. Thank you PJRC, this is a nice birthday gift.
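The convergence argument above can be tried out with a small host-side sketch. This is not how any particular libm implements sqrt; it just runs the textbook Newton-Raphson step y <- (y + x/y)/2 from an exponent-based seed and counts steps. The function name is hypothetical, and the tolerances 1e-7 and 1e-15 are rough stand-ins for float and double precision.

```cpp
#include <cassert>
#include <cmath>

// Counts Newton-Raphson steps needed until the relative error against
// std::sqrt drops below rel_tol. The seed is 2^(e/2), taken from the
// binary exponent of x.
int newtonSqrtSteps(double x, double rel_tol) {
    assert(x > 0.0);
    int e = 0;
    std::frexp(x, &e);                  // x = m * 2^e with 0.5 <= m < 1
    double y = std::ldexp(1.0, e / 2);  // seed 2^(e/2)
    const double reference = std::sqrt(x);
    int steps = 0;
    // The cap of 20 only guards against a tolerance that is too tight.
    while (std::fabs(y - reference) > rel_tol * reference && steps < 20) {
        y = 0.5 * (y + x / y);
        ++steps;
    }
    return steps;
}
```

For x = 2 the seed is 2, and the relative error after successive steps is roughly 6e-2, 2e-3, 2e-6, 1e-12, so the step from float-like to double-like precision costs about one more round.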

onehorse · Mar 12, 2016

OK, I finally figured out what I was doing wrong with the magnetometer. The signed 16-bit (2-byte) magnetometer counts are in units of milliGauss; I'm not sure why I had the conversion factor I did two years ago. Anyway, I added a manual magnetic calibration which just measures the min and max, takes the average and stuffs the result into the mag offset registers. In the initialization I choose auto calibration so the mag will start out reasonably well calibrated (the bias offset is huge on my prop shield) and just get better with subsequent use.

Now all of the sensor data is properly scaled and calibrated for offset bias, and the Madgwick fusion algorithm produces quaternions and Euler angles derived therefrom. Enjoy!

I tried Frank's sqrtf instead of sqrt and found the fusion rate dropped from 660 Hz to 650 Hz, so it is a little less efficient. Is it more accurate to compensate? I don't know.

Here is the sketch repository.
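The min/max calibration described above can be sketched as follows. This is a simplified illustration, not the code from the repository: the struct and function names are hypothetical, samples are assumed to be already collected while the board is rotated through all orientations, and writing the result into the mag offset registers is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One raw magnetometer reading in sensor counts (hypothetical type).
struct MagCounts {
    int16_t x, y, z;
};

// Per-axis bias estimate: the midpoint between the smallest and largest
// reading seen on that axis.
MagCounts hardIronOffset(const MagCounts *samples, std::size_t count) {
    assert(samples != nullptr && count > 0);
    int32_t lo[3] = {INT16_MAX, INT16_MAX, INT16_MAX};
    int32_t hi[3] = {INT16_MIN, INT16_MIN, INT16_MIN};
    for (std::size_t i = 0; i < count; ++i) {
        const int32_t axis[3] = {samples[i].x, samples[i].y, samples[i].z};
        for (int a = 0; a < 3; ++a) {
            if (axis[a] < lo[a]) lo[a] = axis[a];
            if (axis[a] > hi[a]) hi[a] = axis[a];
        }
    }
    MagCounts offset;
    // Sums are formed in 32 bits so min + max cannot overflow int16_t.
    offset.x = static_cast<int16_t>((lo[0] + hi[0]) / 2);
    offset.y = static_cast<int16_t>((lo[1] + hi[1]) / 2);
    offset.z = static_cast<int16_t>((lo[2] + hi[2]) / 2);
    return offset;
}
```

The estimate is only as good as the coverage of the samples: if the board never points an axis both with and against the field, that axis's midpoint will be off.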

PaulStoffregen · Mar 12, 2016

I got the magnetic cal going today on Linux. Will clean it up soon and port to Mac and Windows on Sunday.

defragster · Mar 13, 2016

Odd #1: often when I re-compile, the thing hangs after printing 'Scanning...' and it takes a power cycle to run (not a reset). I suspect this is from the upload interrupting an in-process I2C transaction?
[ I had an active ESP8266 on PWR/GND and sharing pins 8,9,10 and 15,16,17 - none of which appear to be common to the Prop shield? - I unplugged that and the next re-compile/upload stuck at "Scanning..." ]

Odd #2: (sometimes?) sitting 'still' roll/pitch/yaw seem to vary. Moving it some direction 90° seems to take a second (couple of updates) to filter in?

My edited file : View attachment quaternonFilters.ino

I'm running my Prop's T_3.2 at 96 MHz and the fusion rate was under 700 to start then was steadying at ~710 with peaks in the 850+ using the GitHub code.

Looking at the code I wondered about text macros for the q#==q[#-1] refs and recompiled with first edits below in the quaternion.h file - the code and ram use basically unchanged.

With macro edits :: running the same Prop on T_3.2 at 96 MHz and the fusion rate was under 700 to start then was steadying at ~750 with peaks in the 850+.

5% faster??? Not sure why the code size didn't change more - but this saves two sets of four float transfers in and out of the q# stack locals - perhaps the refs to the globals are more efficient, or fit the cache better? All changes to global q[] are after the early return testing.

Attached my file - I didn't do a FORK so I can't do a pull request - there was a second set of edits using pre-calculated 2.0f multipliers where they were duplicated (before q# was changed?). These made it start at or above 700 Hz with a steady state closer to 760 Hz, and the occasional outliers hit 880 Hz.

I'm tempted to do a DEBUG set of SPEW where only data values changing (by x%) are shown so that without gross movement the SPEW would stay quiet - and then on motion only the altered value would change - and only for as long as they took to stabilize. I think this would point out the behavior of the math more easily (like ODD #2). Maybe that would be best done by sending the data stream out SERIAL1 to a second Teensy to do that math in a second monitor window?

Code:

#define q1 (q[0])   // short name macro for readability
#define q2 (q[1])
#define q3 (q[2])
#define q4 (q[3])

        void MadgwickQuaternionUpdate(float ax, float ay, float az, float gx, float gy, float gz, float mx, float my, float mz)
        {
//xx            float q1 = q[0], q2 = q[1], q3 = q[2], q4 = q[3];   // short name local variable for readability

            // ...

            // replaced these ::             q[0] = q1 * norm;
            q1 *= norm;
            q2 *= norm;
            q3 *= norm;
            q4 *= norm;
        }

        void MahonyQuaternionUpdate(float ax, float ay, float az, float gx, float gy, float gz, float mx, float my, float mz)
        {
//xx            float q1 = q[0], q2 = q[1], q3 = q[2], q4 = q[3];   // short name local variable for readability

            // ...

            // replaced these ::             q[0] = q1 * norm;
            q1 *= norm;
            q2 *= norm;
            q3 *= norm;
            q4 *= norm;
        }

Code:

//xx            float _2q1q3 = 2.0f * q1 * q3;
//xx            float _2q3q4 = 2.0f * q3 * q4;
            float _2q1q3 = _2q1 * q3;
            float _2q3q4 = _2q3 * q4;

// and these >> _2q1mx = 2.0f * q1 * mx;
            _2q1mx = _2q1 * mx;
            _2q1my = _2q1 * my;
            _2q1mz = _2q1 * mz;
            _2q2mx = _2q2 * mx;
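The change-only debug output idea from this post (print a value only when it has moved by more than x%) could be sketched like this. The function name and the `minDelta` floor are assumptions added for illustration; the floor keeps values near zero from printing on every tiny change.

```cpp
#include <cmath>

// Returns true, and remembers the new value, when `current` differs from
// the last printed value by more than `percent` of its magnitude
// (or by more than `minDelta`, whichever threshold is larger).
bool changedEnough(float &lastPrinted, float current, float percent,
                   float minDelta) {
    float threshold = std::fabs(lastPrinted) * percent / 100.0f;
    if (threshold < minDelta) {
        threshold = minDelta;
    }
    if (std::fabs(current - lastPrinted) > threshold) {
        lastPrinted = current;
        return true;
    }
    return false;
}
```

In a sketch, each of yaw, pitch, and roll would keep its own `lastPrinted` and only call Serial.print when `changedEnough` returns true, so the output stays quiet while the board sits still.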

onehorse · Mar 13, 2016

Normally when I run this same Madgwick filter on the MPU9250 I get fusion rates near 1600 Hz at Teensy MCU speeds of 72 MHz. That is because I am only reading data once (all data registers at the same time). In this code I am reading from two completely different sensors and there will be some latency. I haven't spent any time trying to make the fusion rate faster other than moving the MPL3115A2 sensor data read into the 2 Hz display loop. There are plenty of ways to speed it up without changing the quaternionsFilter.ino file. For example, accel and mag data can be burst read instead of individually read as currently, the gyro and accelerometer temperatures don't have to be read (or can be moved into the display loop), and data ready interrupts could be better used.

Still, a 700 Hz fusion rate is plenty fast for most applications. The Madgwick (or Mahony) filter is computationally efficient but is bare bones as far as sensor data filtering is concerned. Either will provide a pretty good absolute orientation estimate if the underlying sensors are (1) accurate, (2) stable, and (3) well calibrated. The calibration functions in the GitHub repository are also bare bones; call them good enough to show the method, but they could be improved. The entire code shows the basics and it's up to the user to configure it for their own application.

If you are running with a data sample rate of 100 - 200 Hz, bandwidth of ~25 Hz, and fusion rate at ~1 kHz, you are in the sweet spot. The prop shield with this code is pretty much there. Residual jitter in the orientation estimation is likely a result of the underlying sensor behavior as opposed to the quality of the sensor fusion algorithm. I have shown this through basic testing here and here.

onehorse · Mar 13, 2016

Frank B said:
Your code looks good. What's the license of the code from the Freescale app notes you used?

The code was written entirely by me. I took some register defines and a basic tap detection scheme from a Sparkfun sketch. Nothing came from the Freescale AN except the basic register and function information in the datasheet. I took Madgwick's algorithm for his and Mahony's sensor fusion and modified it to work with an Arduino IDE. The I2C read and write functions were suggested to me by Brian Knox. And that's about it.

There is no license required as far as I am concerned; this is completely open source and available to anyone to do as he or she will with it. Of course, I would appreciate attribution as a common courtesy, etc. And if you make it better or find errors I would like to know about it. Pretty simple...

defragster · Mar 13, 2016

onehorse said:
There are plenty of ways to speed it up without changing the quaternionsFilter.ino file.

Independent of any hardware and data reading: these edits to the quaternionsFilter.ino file are just pre-compile fluff that keeps the readability and functionality (if I read the code right). They are another way of doing the same thing with less run-time processor overhead, amounting to a free 5% throughput improvement. I only looked because you noted lost Hz from using sqrtf(); this gave it back for free, and the same macro work applied perfectly to both functions. The second set added marginally; I just noticed that some explicit pre-calculation efforts were not fully utilized. Reducing spurious (2.0f * xyz) math seemed to be the goal of the intermediate variables, with or without the T_3.2's lack of an FPU, and the second edits just built on those variables already in the code. The first set of improvements shows that while intermediate variables improve readability, they can come at a cost, so creating them without fully using them just adds delay.

onehorse said:
If you are running with a data sample rate of 100 - 200 Hz, bandwidth of ~25 Hz, and fusion rate at ~1 kHz, you are in the sweet spot. The prop shield with this code is pretty much there.

This confuses me - maybe it is because the sensors produce data at different rates and efforts are made to get data fusion as each set is presented? : I understand 'fusion' to be the association of a set of data points to its 'usable value'. Getting 1,000 Hz fusion points from 200 sample points seems counter intuitive? Is it interpolation by design to get optimal use of each data point and factor out the 'crosstalk' caused by gravity?

Looking at the current quaternionsFilter.ino I see this one was not changed to sqrtf()? :: _2bx = sqrt(hx * hx + hy * hy);

Given that I have the ESP8266 on a serial port already - if I were to push this (currently displayed or more) Teensy data to the ESP8266, it could monitor that data and spit it out to a formatted web page rather than a scrolling Serial stream. This is partly for my own understanding of the 3 DOF data components and being able to get 'fusion' of my own understanding of how they relate to Pitch/Yaw/Roll - all directly inline with my intended usage case.

MichaelMeissner · Mar 13, 2016

I decided to try onehorse's code last night. I had to do some mechanical changes to get it to work with the way I installed it, renaming quaternonFilters.ino to quaternonFilters.cpp, and adding a quaternonFilters.h for the global variables. I haven't looked into the various warnings that are emitted:

Code:

Teensy_Prop_Shield.ino: In function 'void FXOS8700CQMagOffset()':
Teensy_Prop_Shield.ino:890:47: warning: narrowing conversion of '32768' from 'int' to 'int16_t {aka short int}' inside { } [-Wnarrowing]
Teensy_Prop_Shield.ino:890:47: warning: narrowing conversion of '32768' from 'int' to 'int16_t {aka short int}' inside { } [-Wnarrowing]
Teensy_Prop_Shield.ino:890:47: warning: narrowing conversion of '32768' from 'int' to 'int16_t {aka short int}' inside { } [-Wnarrowing]
Teensy_Prop_Shield.ino:891:9: warning: variable 'dest1' set but not used [-Wunused-but-set-variable]
Teensy_Prop_Shield.ino:891:31: warning: variable 'dest2' set but not used [-Wunused-but-set-variable]
Teensy_Prop_Shield.ino: In function 'void initFIFOMPL3115A2()':
Teensy_Prop_Shield.ino:1377:11: warning: variable 'temp' set but not used [-Wunused-but-set-variable]
Teensy_Prop_Shield.ino: In function 'void initRealTimeMPL3115A2()':
Teensy_Prop_Shield.ino:1414:11: warning: variable 'temp' set but not used [-Wunused-but-set-variable]

I haven't gotten around to soldering up a Teensy 3.1 to the prop shield, so I used my LC which tightly fits into the shield without soldering. If I just run it with just the LC and the shield, it runs fine.

If I put the shield into one of the tiny 170 tie-point breadboards (http://www.ebay.com/itm/5x-Transpar...048856?hash=item35d719a558:g:nOEAAOSw0vBUdnbL), it sometimes prints unknown error 0x1 in the i2c scan. Sometimes it hangs at that point, sometimes it eventually continues. This is similar to what defragster reported.

Similarly if I put the LC + prop shield into a small protoboard (https://www.tindie.com/products/DrAzzy/1x2-prototyping-board/) it prints the unknown error, and may or may not hang.

Once I've put in some more Serial.println calls (one after the Serial.begin, one before the Wire.begin, and one before the I2Cscan), it now seems to come up all of the time.

So maybe the LC just needs a little more time to settle before accessing the I2C devices.
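If the extra Serial.println calls are only helping by adding delay, the same effect could be had more deliberately by waiting and retrying the first I2C access. The helper below is a hypothetical sketch of that idea, written so it can run off-target: the probe and the wait are passed in as callables. On an Arduino-style board the probe would typically wrap Wire.beginTransmission(address) and Wire.endTransmission(), which returns 0 on success, and the wait would call delay(ms). Whether settling time is really the cause here is unconfirmed.

```cpp
// Calls waitMs(delayMs) and then probe(), up to `attempts` times.
// probe() is expected to return 0 on success (the convention used by
// Wire.endTransmission()). Returns the 1-based attempt that succeeded,
// or 0 if every attempt failed.
template <typename Probe, typename Wait>
int probeWithRetry(Probe probe, Wait waitMs, int attempts, unsigned delayMs) {
    for (int attempt = 1; attempt <= attempts; ++attempt) {
        waitMs(delayMs);  // give the bus and sensors time to settle
        if (probe() == 0) {
            return attempt;
        }
    }
    return 0;
}
```

Logging the returned attempt number would also show whether the first access really fails more often on the breadboard than on the bare shield.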

Ben · Mar 13, 2016

I read through the sqrt page at cppreference.com and it says that sqrt, sqrtf and sqrtl are required to be exact, which means accurate to within 0.5 LSB of the return data format. I think it might be worth investigating which parts of the algorithm can make do with less precision and trying the "fast inverse square root" algorithm in those places instead, especially if you need the inverse anyway, like when normalizing vectors (dividing by sqrt(x^2+y^2+z^2)).
-Ben

manitou · Mar 13, 2016

PaulStoffregen said:
Good question. On 3.3V, not a lot. The sensors are a few mA when fully active. Flash write/erase might add a bit. I haven't really worried about this. Answers in the datasheets.

I was checking 3.3 V power usage with a meter, and all seemed reasonable at first. But now reading the gyro consumes 20 mA?? (The datasheet says 2.7 mA active.) I wonder if I fried something jumpering the meter inline here and there??? The gyro seems to be sending out different values when the unit is moved ....

SPI flash was only taking 2.8 mA doing erase and 1.5 mA doing reads; altimeter 2.5 mA, accel/mag 0.26 mA.

onehorse · Mar 13, 2016

MichaelMeissner said:
the various warnings that are emitted

Since I wrote a lot of this "code" two years ago (a month after I first heard the word "Arduino") there is bound to be a lot of sloppiness wrt use of int16_t and int, etc. I cleaned up quite a bit of nonsense already to do with 2's complement but I would not be surprised if more surgery would be required. Please let me know about any errors or poor programming choices.

defragster said:
This confuses me - maybe it is because the sensors produce data at different rates and efforts are made to get data fusion as each set is presented? : I understand 'fusion' to be the association of a set of data points to its 'usable value'. Getting 1,000 Hz fusion points from 200 sample points seems counter intuitive? Is it interpolation by design to get optimal use of each data point and factor out the 'crosstalk' caused by gravity?

Yes, this confuses a lot of people. Sensor fusion is not analytical, it is iterative. The simplest way to think about sensor fusion is as an iterative error correction algorithm not unlike a Newton-Raphson iterative method. For each data sample, the fusion algorithm needs to iterate four or five times to asymptote to a stable solution. So the right way to "construct" the timing is to choose a sensor data rate that meets the needs of the application, then size the processor to allow fusion rates at 4 or 5 times the data sample rate. So for a 200 Hz sample rate, typical for human activities, you need a processor capable of running the fusion at 1 kHz. That's why I transitioned from the lovely Arduino Pro Mini (fusion rates of ~100 Hz) to the even lovelier Teensy (fusion rates ~1500 Hz).
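A toy illustration of the "several updates per sample" point: the code below is not the Madgwick or Mahony filter, just a scalar stand-in in which each update moves an estimate a fixed fraction of the way toward the value implied by the current sample. The names and the gain of 0.5 are made up for the example.

```cpp
// One correction step: move the estimate a fraction `gain` of the way
// toward the value implied by the current sensor sample.
float correctionStep(float estimate, float target, float gain) {
    return estimate + gain * (target - estimate);
}

// Run `updates` correction steps against the same sample, as happens when
// the fusion loop runs several times per sensor data period.
float runUpdates(float estimate, float target, float gain, int updates) {
    for (int i = 0; i < updates; ++i) {
        estimate = correctionStep(estimate, target, gain);
    }
    return estimate;
}
```

Starting from 0 with a target of 1 and a gain of 0.5, one update leaves a residual of 0.5 while five updates leave 0.03125, which is the sense in which a 1 kHz fusion loop gets more out of each 200 Hz sample.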

defragster · Mar 13, 2016

onehorse said:
Yes, this confuses a lot of people. Sensor fusion is not analytical, it is iterative.

Thanks onehorse - going to the bank and standing in line with a deposit slip 5 days a week when you only get checks one day each week just didn't seem to make sense.

I ignored the compile warnings assuming they were known 'work in progress' when I focused on the simple cleanup notes I made.

stevech · Mar 13, 2016

I've worked in fields where "Sensor Fusion" means improving an estimate of a real world event based on analysis of multiple sensors' data that observe the same event. The sensors can, and normally do, differ in what they measure. More fusion leads to a higher degree of confidence. For me, this comes from working in defense and intelligence systems where the kinds and places of sensors measuring a singular event differ greatly.

Another example: dead reckoning navigation (heading/speed) fused with GPS x,y,z estimate (that may be short-term absent or inaccurate due to satellite geometry, sky occlusion, etc.)

Improving single-sensor estimates, by contrast, isn't fusion, but rather statistical data analysis/reduction.

onehorse · Mar 13, 2016

Agreed, but this describes the what; I was describing the how...

defragster · Mar 13, 2016

stevech said:
I've worked in fields where "Sensor Fusion" means improving an estimate of a real world event based on analysis of multiple sensors' data that observe the same event. The sensors can, and normally do, differ in what they measure. More fusion leads to a higher degree of confidence. For me, this comes from working in defense and intelligence systems where the kinds and places of sensors measuring a singular event differ greatly.

Thanks stevech! That furthers my understanding - in line with what I was thinking when I acknowledged my confusion.
