Need Some Help Understanding the Library Code

Status
Not open for further replies.
Hello!

I'm working on a project that includes writing a custom library object and I've hit a bit of a wall in my understanding of how the library works.

The thing that's confusing me at the moment is these lines in the effect_multiply.cpp file:

Code:
end = pa + AUDIO_BLOCK_SAMPLES/2;
	while (pa < end) {

As far as I understand, 'pa' is a pointer to the 'data' variable (array?) of a block. I would understand if 'end' was just set to the length of that array, as then the loop would just move through each value before exiting. But why add AUDIO_BLOCK_SAMPLES? And why divide by 2?

Ultimately, I think my question might stem from some confusion about how audio_block_t works and the way it relates to the update function of the objects. At the moment, I'm seeing it as a buffer that gets called, read-through, modified and output once per update cycle. So any loops within the update cycle would be reading through the individual samples of the buffer. Is this correct? If so, how does it relate to lines I posted above? :confused:

Sorry if this is an obvious question! I'm taking my first steps with 'proper' C++ here, so there's a few bits of knowledge I might be missing.

Thanks a lot x :D
 
With that small snippet of code I can't tell whether the 'units' translation is 8 bit vs. 16 bit or 16 bit vs. 32 bit, but:

'end' is the stopping point for the loop. It is based on 'pa', the start point, and sits a fixed number of 'units' past it, where a unit is whatever type the pointers point to. I can't say what that is, as the code isn't shown.

AUDIO_BLOCK_SAMPLES is divided by 2 because the samples it counts are half the size of the type that pa and end point to. That assumes the part of the 'while' loop not shown does a simple pa++ increment while it does 'something', moving the one pointer toward the other.
 
Thanks for the explanation! Probably gonna have to read it a few times to fully understand it, but in the meantime, here's the full code:

Code:
#include "effect_multiply.h"

void AudioEffectMultiply::update(void)
{
#if defined(KINETISK)
	audio_block_t *blocka, *blockb;
	uint32_t *pa, *pb, *end;
	uint32_t a12, a34; //, a56, a78;
	uint32_t b12, b34; //, b56, b78;

	blocka = receiveWritable(0);
	blockb = receiveReadOnly(1);
	if (!blocka) {
		if (blockb) release(blockb);
		return;
	}
	if (!blockb) {
		release(blocka);
		return;
	}
	pa = (uint32_t *)(blocka->data);
	pb = (uint32_t *)(blockb->data);
	end = pa + AUDIO_BLOCK_SAMPLES/2;
	while (pa < end) {
		a12 = *pa;
		a34 = *(pa+1);
		//a56 = *(pa+2); // 8 samples/loop should work, but crashes.
		//a78 = *(pa+3); // why?!  maybe a compiler bug??
		b12 = *pb++;
		b34 = *pb++;
		//b56 = *pb++;
		//b78 = *pb++;
		a12 = pack_16b_16b(
			signed_saturate_rshift(multiply_16tx16t(a12, b12), 16, 15), 
			signed_saturate_rshift(multiply_16bx16b(a12, b12), 16, 15));
		a34 = pack_16b_16b(
			signed_saturate_rshift(multiply_16tx16t(a34, b34), 16, 15), 
			signed_saturate_rshift(multiply_16bx16b(a34, b34), 16, 15));
		//a56 = pack_16b_16b(
		//	signed_saturate_rshift(multiply_16tx16t(a56, b56), 16, 15), 
		//	signed_saturate_rshift(multiply_16bx16b(a56, b56), 16, 15));
		//a78 = pack_16b_16b(
		//	signed_saturate_rshift(multiply_16tx16t(a78, b78), 16, 15), 
		//	signed_saturate_rshift(multiply_16bx16b(a78, b78), 16, 15));
		*pa++ = a12;
		*pa++ = a34;
		//*pa++ = a56;
		//*pa++ = a78;
	}
	transmit(blocka);
	release(blocka);
	release(blockb);

#elif defined(KINETISL)
	audio_block_t *block;

	block = receiveReadOnly(0);
	if (block) release(block);
	block = receiveReadOnly(1);
	if (block) release(block);
#endif
}
 
Sorry, I should also mention that what I'm trying to achieve is having two inputs for an object, which I thought I might be able to figure out by looking at the effect_multiply.cpp code, but perhaps that wasn't the most straightforward example to choose?
 
Ooooh I think I understand partially now. The 'pa' pointer is like a playhead and 'end' adds on the buffer size to wherever 'pa' is up to to ensure it reads through the whole buffer? I can aaaaalmost understand the dividing by 2 and how it's related to bit depth, but I feel like I'm missing something. Probably need to think about it a bit longer haha
 
The count of samples (128) is for 16 bit samples AFAIK.

And indeed the pointers noted are to 32 bit values - so they are being consumed/processed 2 at a time.
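
To make the divide-by-2 concrete, here is a small desktop sketch (not library code). `kBlockSamples` stands in for AUDIO_BLOCK_SAMPLES and is assumed to be 128 as mentioned above; `Block` and both helper functions are made-up names for illustration. It steps a 32 bit pointer over an array of 16 bit samples, the same way the loop in effect_multiply.cpp does, and counts the steps.

```cpp
#include <cstddef>
#include <cstdint>

constexpr int kBlockSamples = 128;  // stand-in for AUDIO_BLOCK_SAMPLES (assumed 128)

struct Block {
    alignas(4) int16_t data[kBlockSamples];  // 128 samples x 2 bytes = 256 bytes
};

// Hypothetical helper: walk a 32 bit pointer across the 16 bit sample array
// and count how many steps it takes to reach 'end'.
int count_word_steps(Block &block) {
    uint32_t *pa = reinterpret_cast<uint32_t *>(block.data);
    uint32_t *end = pa + kBlockSamples / 2;  // +1 on a uint32_t* moves 4 bytes
    int steps = 0;
    while (pa < end) {
        pa++;      // each step covers two 16 bit samples
        steps++;
    }
    return steps;
}

// Bytes covered by those steps; this should match the size of the sample array.
std::size_t bytes_covered(Block &block) {
    return count_word_steps(block) * sizeof(uint32_t);
}
```

The loop takes 64 steps of 4 bytes each, which is exactly the 256 bytes of the array. Without the /2, 'end' would sit 256 bytes past the end of the block and the loop would run into whatever memory follows.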
 
The audio samples are 16 bits, but this code is reading pairs of them into 32 bit variables. Because the CPU is 32 bit and the memory is 32 bits wide, this reads 2 samples at once for a 2X speedup. The Cortex-M4 processor also has a special speedup where subsequent "similar" memory access is performed in only a single cycle, rather than the usual 2 cycles. That's why this code reads 4 samples. As you can guess from the commented out code, I originally meant for it to read 8 samples at a time. On top of the speed of moving data, doing more work per iteration of the loop means less total time is spent looping.

Those lengthy function names are special inline functions which use the M4's special instructions which operate on 16 bits from a 32 bit register. For example, multiply_16tx16t(a, b) multiplies the top 16 bits from "a" and the top 16 bits from "b" while ignoring whatever is the bottom 16 bits in each of those variables. With more conventional programming, you'd just read a single 16 bit number into "a" and into "b" and then write "a * b". While that executes in the same amount of time, it means you can't have two 16 bit numbers packed into each 32 bit variable.
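
As a rough illustration of what "multiply the top 16 bits" means, here is a plain C++ sketch of the same arithmetic. The function names are hypothetical and this is not how the library implements it (the library uses a single M4 instruction); it only shows which bits take part in each multiply.

```cpp
#include <cstdint>

// Hypothetical plain C++ equivalent of multiplying the top halves of two words.
int32_t multiply_top_halves(uint32_t a, uint32_t b) {
    int16_t a_top = static_cast<int16_t>(a >> 16);   // keep bits 31..16 as a signed sample
    int16_t b_top = static_cast<int16_t>(b >> 16);
    return static_cast<int32_t>(a_top) * b_top;      // bottom halves are ignored
}

// Same idea for the bottom halves (bits 15..0).
int32_t multiply_bottom_halves(uint32_t a, uint32_t b) {
    int16_t a_bot = static_cast<int16_t>(a & 0xFFFF);
    int16_t b_bot = static_cast<int16_t>(b & 0xFFFF);
    return static_cast<int32_t>(a_bot) * b_bot;      // top halves are ignored
}

// Test helper (also hypothetical): build a 32 bit word from two 16 bit samples.
uint32_t make_word(int16_t top, int16_t bottom) {
    return (static_cast<uint32_t>(static_cast<uint16_t>(top)) << 16)
         | static_cast<uint16_t>(bottom);
}
```

With `a = make_word(3, -2)` and `b = make_word(-5, 7)`, the top multiply gives 3 * -5 and the bottom multiply gives -2 * 7, each without disturbing the other half.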

Those signed_saturate_rshift() functions do a right shift, which you'd normally write with code like "a >> 16", and they clamp the variable to a maximum number which you'd normally write with code like "if (a > 32767) a = 32767;". This would probably be much easier to understand if it were written in that normal way. But doing that would require multiple instructions and the conditional "if" can take longer if it needs to branch, because the CPU must refill its pipeline. The signed_saturate_rshift() function compiles to a special 1 cycle instruction which does all that work. It's fast, but the code isn't as easy to see.
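
Written the "normal way", the shift-and-clamp step might look like the sketch below. The name `shift_and_clamp16` is made up, and the reading of the call in effect_multiply.cpp, signed_saturate_rshift(x, 16, 15), as "shift right by 15, then saturate to 16 bits" is an assumption based on how the arguments are used there.

```cpp
#include <cstdint>

// Hypothetical plain C++ version: shift right, then clamp to the int16_t range.
// Right-shifting a negative value is an arithmetic shift on the compilers
// typically used for Teensy; older C++ standards leave it implementation-defined.
int32_t shift_and_clamp16(int32_t value, int shift) {
    int32_t shifted = value >> shift;
    if (shifted > 32767) shifted = 32767;     // upper limit of a 16 bit sample
    if (shifted < -32768) shifted = -32768;   // lower limit of a 16 bit sample
    return shifted;
}
```

The clamp only matters in corner cases. For example, -32768 * -32768 shifted right by 15 is 32768, which does not fit in a 16 bit sample and is limited to 32767.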

Likewise, those pack_16b_16b() instructions take two separate 16 bit results and pack them back into a 32 bit variable. That's done so similar code which loaded 4 samples into registers using only 3 cycles can quickly write the 4 samples back to memory.
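
A plain C++ sketch of the packing step, for illustration only. The names are hypothetical, and putting the first argument in the top half is an assumption drawn from the way effect_multiply.cpp passes the "top x top" result first.

```cpp
#include <cstdint>

// Hypothetical plain C++ version of packing two 16 bit results into one word.
uint32_t pack_two_samples(int32_t top, int32_t bottom) {
    return (static_cast<uint32_t>(top) << 16)          // low 16 bits of 'top' go to bits 31..16
         | (static_cast<uint32_t>(bottom) & 0xFFFF);   // low 16 bits of 'bottom' go to bits 15..0
}

// Helpers to pull the two samples back out again.
int16_t top_sample(uint32_t word) {
    return static_cast<int16_t>(word >> 16);
}

int16_t bottom_sample(uint32_t word) {
    return static_cast<int16_t>(word & 0xFFFF);
}
```

Once two results are packed like this, a single 32 bit store writes both samples back to the block.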

Many parts of the audio library are written this way, because it runs so much faster. For just a single effect like multiply, doing it the simple but slower way would be perfectly fine. Teensy is plenty fast enough. But as part of the library where you might use dozens of effects, this sort of optimization means you can use many more and still stay within Teensy's capability.

If you're writing a new effect or other feature, I recommend doing it first the simple but slow way. Once you get it working, then consider whether to optimize. Using these tricks where pairs of samples are packed into 32 bit variables makes writing and debugging so much harder.
 
Thanks so much for such a detailed explanation!! I had a feeling signed_saturate_rshift() was doing something like that, although pack_16b_16b() was confusing me. Makes a tonne more sense to me now! I'm trying to port a ChucK function I wrote based on the NLC Neuron eurorack module - thought it might be neat to have in the digital realm because a bunch of them connected up could be fun. I've been trying to keep it as simple as I can - so far my object looks like this:

Code:
#include "effect_neuron.h" 
 
void AudioEffectNeuron::update(void){ 
 int32_t *pa, *pb, *end;
 audio_block_t *blocka, *blockb;

 blocka = receiveWritable(0);
 blockb = receiveReadOnly(1);
 if (!blocka) {
     if (blockb) release(blockb);
     return;
 }
 if (!blockb) {
     release(blocka);
     return;
 }

 pa = (uint32_t *)(blocka->data); //pa now points to the buffer/array 'data' which holds the samples we want to manipulate
 pb = (uint32_t *)(blockb->data); //pb now points to the buffer/array 'data' which holds the samples we want to use for manipulation
 end = pa + AUDIO_BLOCK_SAMPLES/2; //end = the number of the sample you're up to, plus the buffer size. Think of pa as your playhead?
                                //Can't decide whether to divide by 2 or not? Think I probs should be
        while (pa < end) { //run through the buffer 
                int32_t Vmix, Vrect, Vcomp, Voutmix, Voutfinal;
                int32_t a = *pa; //a points to the current sample in block a. It oscillates between -32768 and +32768
                int32_t b = *pb; //b points to the current sample in block b. Same range. 
                
                //DSP Here
                //------------------------Mixer
                Vmix = 32768 - (a + b); //might actually be 1-(a+b);

                //------------------------Rectifier
                if(Vmix > rectThresh){Vrect = rectThresh;}
                else if(Vmix < -rectThresh){Vrect = 0 - rectThresh;}
                else{Vrect = Vmix;}

                //------------------------Comparator
                if(Vrect >= compThresh){Vcomp = -32768;}
                else if(Vrect < compThresh){Vcomp = 32768;}
                
                //------------------------Final Mixer
                Voutmix = outputBias - (Vrect + Vcomp);

                //------------------------Adjust for Clipping
                if(Voutmix >= 32768){Voutfinal = 32768;}
                else if(Voutmix <= -32768){Voutfinal = -32768;}
                else{Voutfinal = Voutmix;}
                
                //------------------------Asign changes back to a and then to the audio block
                a = Voutmix;
                *pa = a;
                pa++;
            
        }
        transmit(blocka);
        release(blocka);
        release(blockb);
}

Still trying to work through a few issues in order to get it to compile though. The compiler doesn't seem to like the signed to unsigned conversions, and it's telling me that I'm comparing signed and unsigned integers, although I rewrote the header file and don't think I am anymore. Any thoughts you (or anyone on the forum!) have on this would be really appreciated :)

Header file:

Code:
#ifndef effect_neuron_h_
#define effect_neuron_h_

#include "AudioStream.h"
#include "Arduino.h"

class AudioEffectNeuron : public AudioStream
{
public:
        AudioEffectNeuron() : AudioStream(2, inputQueueArray){
            rect_Thresh(29491);  // default values...
		    comp_Thresh(0);
		    output_Bias(16384);
        }

        void rect_Thresh(uint32_t threshold){
            rectThresh = threshold;
        }

        void comp_Thresh(uint32_t threshold){
            compThresh = threshold;
        }

        void output_Bias(float level){
            outputBias = level;
        }

        virtual void update(void);
private:
        audio_block_t *inputQueueArray[2];

        int32_t rectThresh;
        int32_t compThresh;
        int32_t outputBias;
};

#endif
 
For the sake of getting things working, I highly recommend you do it in a much simpler way without the pointers.

Something like this...

Code:
void AudioEffectNeuron::update(void)
{
  audio_block_t *blocka, *blockb;

  blocka = receiveWritable(0);
  blockb = receiveReadOnly(1);
  if (!blocka) {
    if (blockb) release(blockb);
    return;
  }
  if (!blockb) {
    release(blocka);
    return;
  }

  // simple for loop, integer index rather than pointers to 32 bit packed data
  for (int i = 0; i < AUDIO_BLOCK_SAMPLES; i++) {

    // read 1 sample from each incoming block
    int32_t a = blocka->data[i];
    int32_t b = blockb->data[i];

    // do math here....
    a = (a + b) / 2;

    // write 1 sample back to outgoing block
    blocka->data[i] = a;
  }
  transmit(blocka);
  release(blocka);
  release(blockb);
}
 
Hi Paul! Thanks a tonne for that suggestion! I've managed to get it working with your template and it sounds about as strange as I thought it would. Of course, I still need to run a few more tests to make sure it's working right, but from what I've done so far it seems like it might be. Here's the current .cpp and .h, as well as an Arduino example for anyone that wants to try it out :)

.cpp
Code:
#include "effect_neuron.h" 

void AudioEffectNeuron::update(void)
{
  audio_block_t *blocka, *blockb;

  blocka = receiveWritable(0);
  blockb = receiveReadOnly(1);
  if (!blocka) {
    if (blockb) release(blockb);
    return;
  }
  if (!blockb) {
    release(blocka);
    return;
  }

  // simple for loop, integer index rather than pointers to 32 bit packed data
  for (int i = 0; i < AUDIO_BLOCK_SAMPLES; i++) {
    int32_t Vmix, Vrect, Vcomp, Voutmix, Voutfinal;
    // read 1 sample from each incoming block
    int32_t a = blocka->data[i];
    int32_t b = blockb->data[i];

    //DSP Here
    //------------------------Mixer
    Vmix = 32768 - (a + b); //might actually be 1-(a+b); 32768

    //------------------------Rectifier
    if(Vmix > rectThresh){Vrect = rectThresh;}
    else if(Vmix < -rectThresh){Vrect = 0 - rectThresh;}
    else{Vrect = Vmix;}

    //------------------------Comparator
    if(Vrect >= compThresh){Vcomp = -32768;}
    else{Vcomp = 32768;} // every remaining case has Vrect < compThresh
                
    //------------------------Final Mixer
    Voutmix = outputBias - (Vrect + Vcomp); //outputBias

    //------------------------Adjust for Clipping
    // the block holds 16 bit samples (-32768 to 32767), so clamp the top at 32767
    if(Voutmix > 32767){Voutfinal = 32767;}
    else if(Voutmix < -32768){Voutfinal = -32768;}
    else{Voutfinal = Voutmix;}
                
    //------------------------Assign changes back to a and then to the audio block
    a = Voutfinal;
    // write 1 sample back to outgoing block
    blocka->data[i] = a;
  }
  transmit(blocka);
  release(blocka);
  release(blockb);
}

.h
Code:
#ifndef effect_neuron_h_
#define effect_neuron_h_

#include "AudioStream.h"
#include "Arduino.h"

class AudioEffectNeuron : public AudioStream
{
public:
        AudioEffectNeuron() : AudioStream(2, inputQueueArray){
            rect_Thresh(29491);  // default values...
		    comp_Thresh(0);
		    output_Bias(16384);
        }

        void rect_Thresh(int32_t threshold){
            rectThresh = threshold;
        }

        void comp_Thresh(int32_t threshold){
            compThresh = threshold;
        }

        void output_Bias(int32_t level){
            outputBias = level;
        }

        void begin(int32_t rThresh, int32_t cThresh, int32_t lev){
            rect_Thresh(rThresh);
            comp_Thresh(cThresh);
            output_Bias(lev);
        }

        virtual void update(void);
private:
        audio_block_t *inputQueueArray[2];

        int32_t rectThresh;
        int32_t compThresh;
        int32_t outputBias;
};

#endif

Arduino file
Code:
#include <SPI.h>
#include <SD.h>
#include <SerialFlash.h>
#include <Audio.h>
#include <Wire.h>


// GUItool: begin automatically generated code
AudioInputI2S         input;           //xy=161,80
AudioEffectNeuron     neuron;
AudioEffectNeuron     neuron2;
AudioMixer4           mix;
AudioMixer4           mix2;
AudioSynthWaveform    waveform1;
AudioSynthWaveform    waveform2;
AudioSynthWaveformSineModulated    sine_fm1; 
AudioOutputI2S        masterOut;           //xy=329,47
AudioConnection       patchCord1(waveform2, 0, neuron, 0);
AudioConnection       patchCord3(waveform1, 0, neuron, 1);
AudioConnection       patchCord6(waveform1, 0, neuron2, 1);
AudioConnection       patchCord7(neuron, 0, neuron2, 0); 
AudioConnection       patchCord8(neuron2, 0, mix2, 0);
AudioConnection       patchCord2(mix2, 0, sine_fm1, 0);
AudioConnection       patchCord5(sine_fm1, 0, mix, 0);
AudioConnection       patchCord4(mix, 0, masterOut, 0);
AudioControlSGTL5000  sgtl5000_1;     //xy=198,393
// GUItool: end automatically generated code

void setup() {
  // Audio connections require memory to work.  For more
  // detailed information, see the MemoryAndCpuUsage example
  AudioMemory(50);
  sgtl5000_1.enable();
  sgtl5000_1.volume(0.3);
  neuron.begin(30000, 0, 1);
  neuron2.begin(30000, 0, 1);
  waveform1.begin(0.2, 0.3, WAVEFORM_SINE);
  waveform2.begin(0.2, 0.7, WAVEFORM_SINE);
  sine_fm1.frequency(440);
  mix.gain(0, 0.2);
  mix2.gain(0, 0.5);
  waveform1.amplitude(0.3);
  waveform2.amplitude(0.3);
  sine_fm1.amplitude(0.3);
  Serial.begin(9600);
  Serial.println("Hello");
}

void loop() {
  // Do nothing here.  The Audio flows automatically
}
 
Playing around with different values for the rectification threshold and the output bias can get pretty different sounds but I really like the values I've used above. You might need to alter them based on whatever you're using it to control however, as these values don't seem to work that nicely if I use them to modulate a filter. :confused:

Altering the value that the mixer subtracts from in the .cpp file (the one I've commented might be 1) should (I thiiiink) change the range of the modulation, dropping it lower or higher in a similar way to changing the output bias.

This is just what I think is happening based on how I understand it to be working and some of the tests I've done. If anyone thinks I've got this stuff all wrong, please let me know! I'd love to hear it! ;)
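
One way to check the maths away from the Teensy is to pull the per-sample steps out of update() into a plain function and run it on a desktop compiler. The sketch below does that; `neuron_sample` is a made-up name, and it clamps the result to the int16_t range (-32768 to 32767) because that is what a 16 bit sample can hold.

```cpp
#include <cstdint>

// Hypothetical stand-alone version of the per-sample neuron maths,
// taking the three settings as plain parameters.
int16_t neuron_sample(int32_t a, int32_t b,
                      int32_t rectThresh, int32_t compThresh, int32_t outputBias) {
    // Mixer
    int32_t Vmix = 32768 - (a + b);

    // Rectifier: limit Vmix to +/- rectThresh
    int32_t Vrect;
    if (Vmix > rectThresh) Vrect = rectThresh;
    else if (Vmix < -rectThresh) Vrect = -rectThresh;
    else Vrect = Vmix;

    // Comparator: full-scale negative at or above the threshold, positive below it
    int32_t Vcomp = (Vrect >= compThresh) ? -32768 : 32768;

    // Final mixer
    int32_t Vout = outputBias - (Vrect + Vcomp);

    // Clamp to what a 16 bit sample can hold
    if (Vout > 32767) Vout = 32767;
    if (Vout < -32768) Vout = -32768;
    return static_cast<int16_t>(Vout);
}
```

Feeding it a handful of input pairs with different thresholds and bias values shows how each setting moves the output, without needing to listen to it.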
 
@Paul
The above template is for transmitting 1 single 128 sample audio block. Do you have a simple template for transmitting 2 consecutive 128 sample blocks? I am working on a fft256 system which needs 2 sample blocks.
 
Simple template, no.

Simple guidance, sure. Just add a pointer in your C++ class to retain access to the 2nd block. Make sure it's NULL normally. Then in your update function, check whether it's non-NULL. If so, you've still got the 2nd block you created on the prior update run. Now is the time to transmit it, release it, and set that pointer to NULL so you don't try to do the same again on the following update.

There isn't any way to transmit more than 1 block (per output) per update. If you allocate 2 at once, you must retain the 2nd one and use it on the next update. Yes, that's extra code for you to write, but that sort of code is pretty simple.
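
A minimal sketch of that retained-pointer pattern, written so it runs on a desktop compiler. `Block`, `TwoBlockSender` and the `sent` log are stand-ins invented for this example, and allocate(), transmit() and release() are mocked here rather than being the real AudioStream functions. In a real object, allocate() can return NULL when audio memory runs out, so both results would need checking.

```cpp
#include <cstdint>
#include <vector>

constexpr int kBlockSamples = 128;  // stand-in for AUDIO_BLOCK_SAMPLES

struct Block { int16_t data[kBlockSamples]; };  // mock of audio_block_t

// Hypothetical object that produces 256 samples but sends 128 per update.
class TwoBlockSender {
public:
    std::vector<int16_t> sent;  // first sample of each transmitted block, for checking order

    void update() {
        if (pending != nullptr) {
            // Still holding the 2nd block from the prior update: send it now.
            transmit(pending);
            release(pending);
            pending = nullptr;  // so the next update starts a fresh pair
            return;
        }
        Block *first = allocate();
        Block *second = allocate();
        for (int i = 0; i < kBlockSamples; i++) {
            first->data[i] = static_cast<int16_t>(i);                   // samples 0..127
            second->data[i] = static_cast<int16_t>(i + kBlockSamples);  // samples 128..255
        }
        transmit(first);
        release(first);
        pending = second;  // retain the 2nd block for the following update
    }

    ~TwoBlockSender() { if (pending != nullptr) release(pending); }

private:
    Block *pending = nullptr;  // NULL normally, non-NULL only between the two updates

    // Mocks of the library calls, just enough to run the example.
    Block *allocate() { return new Block(); }
    void transmit(Block *b) { sent.push_back(b->data[0]); }
    void release(Block *b) { delete b; }
};
```

Calling update() three times sends the first half, then the second half, then the first half of a new pair. Setting the pointer to NULL is just the assignment `pending = nullptr;` (or `pending = NULL;`) after the release.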
 
Thank you so much for your guidance. Though I am only a beginner in C++, I shall go ahead and try it, and come back to you if I get stuck.
 
@Paul
I've written a program based on your previous template and your guidance, and it appears to be running OK. However, to avoid any hidden defects, I would like to clarify two more points:

1. I don't know how to set a pointer to NULL, so I just use release() alone. The program seems to be running OK already. Can you advise me how to set a pointer to NULL, if it is necessary?

2. I transmit a 256 sample block as two 128 sample blocks. The first block carries samples 0 to 127, and the second block carries samples 128 to 255 in the following update, as per your guide. However, nothing identifies which block is the first and which is the second. There is a chance that the receiving end receives the second block (samples 128 to 255) first and treats it as the first block, then treats the following block, which is the first block (samples 0 to 127) of the next set of data, as the second, and so on. Then the output would become a mess. Can I assume the AudioStream class automatically looks after this problem (because the program seems to be running OK)?

LoShu
 