Simpler way to compute sqrt(ch0^2+ch1^2)? My code example seems way to messy.

Status
Not open for further replies.

Mike5000

Well-known member
Hi. I am trying to make an audio function that takes two channels in a and b.

It computes root = sqrt ( ch0^2 + ch1^2)

Below is my very messy attempt. I will use only Teensy 3.6. No need to have ifdefs for other cpus at the moment.

First of all, what is the data format that audio channels are sent with? Is each sample packed inside a uint32_t? So there is two int16_t (signed), each containing a sample. Or....... is there some Q.xx type format here? I get some strange results it seems.

Can you come up with a better and more elegant way to compute sqrt ( ch0^2 + ch1^2)? I would like to not have to mess with signed_saturate_rshift()..
How do I get the data from each channel into variables the most clean way?

Could I do this in floating point using the floating point support in the CPU since I use Teensy3.6? Can someone provide example code on how to get the samples into floats? And how to store the floats back into samples when the calculations are done?


Code:
// based on the code template by John Michael Reed as described above 
// Code to compute the root of the (square of channel 0 + the square of channel 1)
// out = sqrt ( (ch0^2) + (ch1^2) ) 
 
#include <Arduino.h>
#include "rootofsumofsquares.h"
#include "utility/dspinst.h"

void RootOfSumOfSquares::update(void)
{
	audio_block_t *blocka, *blockb;
	uint32_t *pa, *pb, *end;


    blocka = receiveWritable(0);  //blocka is an object
    blockb = receiveReadOnly(1);  //blockb is an object
    if (!blocka) {
		if (blockb) release(blockb);
		return;
    }
    if (!blockb) {
		release(blocka);
		return;
    }
    pa = (uint32_t *)(blocka->data);  // pa is pointer to a (a is in and out)
    pb = (uint32_t *)(blockb->data);  // pb is pointer to b (b is in and out)
    end = pa + AUDIO_BLOCK_SAMPLES / 2;  // 180 / 2 = 90 

    while (pa < end) {
		
        uint32_t tmp32ka = *pa;                               // read 2 samples from a data 
        uint32_t tmp32kb = *pb;                               // read 2 samples from b data

        // below data is splitted into 16 bit samples but on SAME a channel!
        uint32_t vala1 = multiply_16bx16b(tmp32ka, tmp32ka);   //a sample0 squared. BOTTOM
        vala1= signed_saturate_rshift(vala1, 32, 0);          //prevent 32 bit overflow ... remark 32 bit here
        uint32_t vala2 = multiply_16tx16t(tmp32ka, tmp32ka);  //a sample1 squared. TOP 
        vala2= signed_saturate_rshift(vala2, 32, 0);          //prevent 32 bit overflow ... remark 32 bit here

        // below data is splitted into 16 bit samples but on SAME b channel!
        uint32_t valb1 = multiply_16bx16b(tmp32kb, tmp32kb);  //b sample0 squared. BOTTOM
        valb1= signed_saturate_rshift(valb1, 32, 0);          //prevent 32 bit overflow ... remark 32 bit here
        uint32_t valb2 = multiply_16tx16t(tmp32kb, tmp32kb);  //b sample1 squared. TOP
        valb1= signed_saturate_rshift(valb1, 32, 0);          //prevent 32 bit overflow ... remark 32 bit here
    
        uint32_t sum_a1b1 = vala1 + valb1;                     // 32 uint. can't be negative because the squaring above...
        uint16_t root_1 = sqrt(sum_a1b1);                      // take the square root
        //Serial.println(root_1, BIN);                         // debug
        root_1 = signed_saturate_rshift(root_1, 16, 0);        // this removed the negative swings on overflow
        if (root_1 == 0) root_1 = 1;
       
        //float root_1f = root_1 / 32768.0f;
        //Serial.println(root_1f);                             // debug 
        //uint16_t invroot_1 = (uint16_t) (32767 / root_1);
        //invroot_1 = signed_saturate_rshift(invroot_1, 16, 0);        // this removed the negative swings on overflow
        //Serial.println(invroot_1);                               // debug 
    
    
    
        uint32_t sum_a2b2 = vala2 + valb2;
        uint16_t root_2 =  sqrt(sum_a2b2);                     // now in same format as each sample ... but can only be positive
        root_2 = signed_saturate_rshift(root_2, 16, 0);    // this removed the negative swings on overflow now max 32767 
                                                                                 // the output from the square root can max be 46339 (I think...)
                                                                                 // which is too much for a signed 16 bit ch
        if (root_2 == 0) root_2 = 1;
        float root_2f = root_2 / 32768.0f;                     
    
        uint16_t invroot_2 = (uint16_t) (32767 / root_2);          // inverse of root as a test cae
        //Serial.println(invroot_2);
        //invroot_2 = signed_saturate_rshift(invroot_2, 16, 0);  // this removed the negative swings on overflow

    
        // for sending sqrt to output for testing only
        uint32_t tmp32 = pack_16b_16b(root_2, root_1);         // pack sample 2 on upper 16 and sample 1 on lower 16

        // for sending inverted sqrt to output for testing only
        //uint32_t tmp32 = pack_16b_16b(invroot_2, invroot_1); // pack sample 2 on upper 16 and sample 1 on lower 16
    
    
    
        *pa = (uint32_t) tmp32;                                              // write packed data to channel 

    /* 
        //DIDNT work  ... why??
        uint16_t ka1 = tmp32ka & 0x0000ffff;
        uint16_t ka2 = (tmp32ka & 0xffff0000) >>16;
    
        uint16_t kb1 = tmp32kb & 0x0000ffff;
        uint16_t kb2 = (tmp32kb & 0xffff0000) >>16;
    */
    
     
         pa++;
         pb++;
         }
         transmit(blocka);  // a is in and out (writable)
         release(blocka);
         release(blockb);
}
 
Maybe look at sqrt_integer.h in the audio library. It's actually used by the FFT objects to compute magnitude of a complex number, pretty much the exact same equation. So look at analyze_fft256.cpp to see how it's actually used.

Or you could use 32 bit floats. Would be interesting to see how the speed compares with this optimized integer-only approach.
 
There are specialized functions to do this in the CMSIS DSP lib for fixed and floating point:

http://www.keil.com/pack/doc/CMSIS/DSP/html/group__cmplx__mag.html

But your data would have to be in an interleaved format for the input file.

If you use this for AM demodulation, you could also have a look here (code from line 3482 has several alternative algorithms for AM demodulation):

https://github.com/DD4WH/Teensy-ConvolutionSDR/blob/master/Teensy_Convolution_SDR.ino

I tested several demodulation algorithms proposed in the literature, although only two of them provided high fidelity audio.

https://github.com/DD4WH/Teensy-ConvolutionSDR/wiki/Demodulation-modes
 
Can someone explain to me the audio format used in the Teensy Audio library (Teensy 3.6 and Audio board):

How is the data in the uint32_t variable named data organized?

Code:
uint32_t *pa;
blocka = receiveWritable(0); 
pa = (uint32_t *)(blocka->data);

uint32_t data = *pa;

Are there two 16 bit signed variables contained inside this uint32_t data variable? So the upper one is the first sample, and the lower one is the second sample?
Or what?
 
Last edited:
Mike, your PM and my forum post crossed, but I expect you can manage to take the bits you need for your purpose from the links posted above.

Have fun!

Frank
 
Are there two 16 bit signed variables contained inside this uint32_t data variable? So the upper one is the first sample, and the lower one is the second sample?

Yup, that's how it works.

The audio library has *lots* of optimized code with these sorts of tricks to process or copy 2 samples at once, or sometimes 4, 8 or even 16 samples within 1 iteration of a loop, using 2, 4 or 8 uint32_t variables, which pack 2 samples into a single 32 bit register and transfer them to/from memory as a single 32 bit bus operation. There are special DSP instructions which can access these 16 bit samples from either half of a 32 bit variable. So when you see functions like signed_multiply_32x16t(), it multiplies a 32 bit number by 16 of the bits in the top half of a 32 bit variable. These are all defined in utility/dspinst.h. They allow certain features in the library to run much faster, because the data can be read efficiently over the 32 bit bus and then processed right from those 32 bit registers without unpacking each to a pair of variables. But it does make the code a little confusing, because uint32_t variables are used to hold pairs of samples.

The actual samples are int16_t type.

However, the API presented to Arduino (mostly) uses 32 bit float to represent the audio levels, from +1.0 to -1.0. Most of this is done as inline functions, so when you use float constants as inputs to those functions, they're translated to integers at compile time. Internally, the library uses only integer math.
 
Thanks for clearing that up! Yes its a bit confusing when there is 2 samples in one variable.

Is it correct that the data format from the sgtl5000 is Q1.15 format?
 
Mike, your PM and my forum post crossed, but I expect you can manage to take the bits you need for your purpose from the links posted above.

Have fun!

Frank
Hi @DD4WH. I tested that AM demod written by Pete that you sent a link to and it did exactly what I wanted. Also it was quite easy to understand since it was used 16 bit vars instead of 32.
 
Status
Not open for further replies.
Back
Top