How does the audio mixing work? I see an audio mix function that appears to mix up to four channels.
Yup, pretty much just like that, it accepts up to 4 inputs. The streams are simply added together and output.
The output right now is mono, correct?
The WAV file player currently only parses mono WAV files. I'll add stereo parsing soon.
But the library itself supports any number of streams, within CPU and memory limits.
So that would be four mono 44,100 Hz WAV files you can play at once?
Yes, if your SD card (and the SD library) can keep up.
In 1.18-rc2, I added a speedup to the SD library. So far, I haven't retested its performance with the audio stuff. But before the speedup, I was seeing about 40% CPU usage for reading a single mono stream from the SD card, mostly due to slowness in the SD library.
What happens if you attempt to play another file if four are already playing? Does it simply not play?
Each instance of the WAV player is independent. If you create 4 WAV player objects and decide you want to play a 5th file, how you do that is up to you. If your code has only created 4 of those objects, obviously you're going to need to stop one of them when you want to start playing a 5th file. The reason the limit is 4 is only because you created 4 independent objects.
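To make that concrete, here's a sketch of one policy for "play a 5th file" (the player names and the playNewFile() helper are hypothetical; this assumes the SD WAV player class with play(), stop() and isPlaying(), as in the library):

```cpp
// Sketch only: assumes 4 SD WAV player objects already created and
// connected to the mixer elsewhere in the sketch.
AudioPlaySdWav *players[4] = {&playWav1, &playWav2, &playWav3, &playWav4};

void playNewFile(const char *filename) {
  // prefer a player that's sitting idle
  for (int i = 0; i < 4; i++) {
    if (!players[i]->isPlaying()) {
      players[i]->play(filename);
      return;
    }
  }
  // all 4 busy: stop one (which one is your policy choice) and reuse it
  players[0]->stop();
  players[0]->play(filename);
}
```

Which voice to steal when all 4 are busy is entirely up to your code; the library doesn't impose a policy.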
Of course, you could just create 5 WAV player objects. The mixer supports only 4 inputs, but don't let that stop you! Just create 2 mixers and feed the output of one into an input of the other, and now you can mix 7 channels!
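Cascading works like this (a sketch, assuming the mixer and connection objects are named AudioMixer4 and AudioConnection; source1 through source7 stand for any 7 audio-producing objects, like WAV players, declared elsewhere):

```cpp
// Two 4-input mixers cascaded to mix 7 sources.  mixer1's output
// occupies one input of mixer2, leaving 3 free inputs there: 4 + 3 = 7.
AudioMixer4      mixer1;
AudioMixer4      mixer2;
// sources 1-4 into mixer1
AudioConnection  c1(source1, 0, mixer1, 0);
AudioConnection  c2(source2, 0, mixer1, 1);
AudioConnection  c3(source3, 0, mixer1, 2);
AudioConnection  c4(source4, 0, mixer1, 3);
// mixer1's output into mixer2's input 0, sources 5-7 into the rest
AudioConnection  c5(mixer1, 0, mixer2, 0);
AudioConnection  c6(source5, 0, mixer2, 1);
AudioConnection  c7(source6, 0, mixer2, 2);
AudioConnection  c8(source7, 0, mixer2, 3);
// mixer2's output then goes wherever you like, e.g. the output object
```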
I presume for now, the volume of each channel is simply divided by 4 before being output?
Oh, looks like I commented out the gain function. I'll put that back in. Remember, this stuff is very "beta" right now.....
The mixer does support variable gain on each channel. It defaults to a gain of 1 on each channel. If your mix exceeds the maximum level, the output clips, just like real audio gear (well, except this is digital audio, so there's a hard clipping limit rather than a lot of extra headroom like you'd find on analog gear).
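Once the gain function is back in, using it would look something like this (a sketch; assumes a 4-input mixer object and a gain(channel, level) function as described above):

```cpp
// Per-channel gain on the 4-input mixer.  If the sum of all gains is
// kept at or below 1.0, the mixed output can never clip.
mixer1.gain(0, 1.0);   // channel 0 at unity
mixer1.gain(1, 0.5);   // channel 1 at half level
mixer1.gain(2, 0.25);
mixer1.gain(3, 0.25);
```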
It would be nice eventually to have automatic gain control. Compression would be nice too.
Pull requests are welcome!
In a week or two, I'll write up a how-to guide about creating your own audio effects. But here's a start.....
Start by finding a similar object in Audio.h and copy its definition, and of course change the name. All objects must inherit from AudioStream. If your object has inputs, you must have "audio_block_t *inputQueueArray[num]" in the private section, where "num" is the number of inputs. Your constructor must initialize AudioStream with the number of inputs and that array. You also must have the virtual update() function in your public section. Everything else is optional... just add private variables as needed for whatever you want to do, and add public functions for whatever stuff you want to be accessible from the Arduino sketch. The simplest way to get this "boilerplate" stuff right is to copy an object definition that already works. Maybe I ought to publish a template example?
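Here's what that boilerplate looks like for a hypothetical 1-input, 1-output effect (the class name, setting name, and setSomething() function are all invented for illustration):

```cpp
class AudioEffectExample : public AudioStream
{
public:
  // constructor must pass the input count and queue array to AudioStream
  AudioEffectExample() : AudioStream(1, inputQueueArray) {}
  // the one function you must implement
  virtual void update(void);
  // public functions for whatever the Arduino sketch should control
  void setSomething(float value) { something = value; }
private:
  audio_block_t *inputQueueArray[1];  // one entry per input
  float something;                    // private state, as needed
};
```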
In your actual code, the only function you must implement is update(). If you don't, the compiler will give a rather unhelpful vtable error.
If your object gets input, in update() you'll call receiveReadOnly(channel) or receiveWritable(channel) to acquire any incoming audio. The library allows shared memory, so receiveReadOnly() is more efficient if the same audio is fed to other objects. Each input can only source 1 block, so you only need to call either of these once for each input channel. As the name suggests, you choose which function depending on whether you will modify the contents of the audio data. The mixer object, for example, calls receiveWritable() for the first block it acquires, and then receiveReadOnly() for any others. These functions can and do return NULL if there is no input, so you must check for NULL and treat the input as silent.
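A minimal update() for a hypothetical 1-input, 1-output effect, showing the NULL check, might look like:

```cpp
void AudioEffectExample::update(void)
{
  audio_block_t *block = receiveWritable(0);  // we intend to modify it
  if (block == NULL) return;                  // no input: treat as silence
  // ... modify block->data[] here ...
  transmit(block, 0);   // send it to output 0 (we still own the block)
  release(block);       // every acquired block must be released
}
```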
If you need to create audio, or you just need another block to fill, call allocate(). It too can return NULL, so you must check.
When you obtain audio from allocate(), receiveReadOnly() or receiveWritable(), you own it. You must call release() to free the memory. If you need to buffer audio, you can store block pointers in your object's private variables and use them on future update() runs, but this consumes the precious audio memory, so only keep blocks allocated if you really must.
The pointers you get, of type "audio_block_t *", point to a struct with 1 member you're meant to access: "data", which is an array of 128 int16_t's with the actual audio data. The struct has a couple other members which you should not touch. If you obtained the block with receiveReadOnly(), do not write to the data[] array. But you still must call release() when you're done with that block.
Once you've got your input blocks, plus any new blocks you need, do whatever your object will do. If you're modifying audio, you'll probably obtain the inputs as writeable blocks and change their data[] arrays directly. If you're synthesizing something, you'll probably get new blocks with allocate() and fill them up. Of course, you can do anything you want, within the limits of your programming skill and the available memory and CPU power.
If your object has outputs, call transmit(block, channel) to send audio out. It's ok to transmit the same block to multiple outputs. You still own that block, even after transmit. You must always call release() for every block you obtain with receiveReadOnly(), receiveWritable() or allocate().
For example, if you wanted to create automatic gain control, you might have 1 input and 1 output. You'd call receiveWritable(), then perhaps compute an average by just summing up the samples (inverting the negative ones, of course). Or maybe you'd square them and add them all up? Then you might adjust your gain slightly if the average is above or below the target, which of course would be a private member variable, and multiply all 128 samples by the gain setting. Then just transmit and release the block.
Every object gets simple CPU usage tracking (but I need to publish an example), so you can test how much CPU you're using. The Cortex-M4 DSP optimizations can really help. In this AGC example, to compute the average, you could fetch the input samples 2 at a time using a 32 bit pointer. The M4 optimizes multiple reads in a row, where a normal 32 bit read takes 2 cycles, but subsequent back-to-back reads take only 1 more cycle. So you could fetch 8 inputs in only 5 cycles. The DSP instructions feature a dual 16x16 multiply-and-accumulate (allowing inputs from separate halves of 32 bit registers), so you could square and sum each pair of inputs in just 1 cycle. The average could be computed in just 144 cycles without looping overhead. To keep things in perspective, ALL the update functions for all audio objects must complete in under 278528 cycles. The CPU usage functions can tell you what fraction of that total you're using up.
Likewise, the AudioMemory() function at the beginning of setup() creates the pool of memory which provides all audio_block_t data. There are functions to query the current and worst case usage, so you can tell if you've got memory issues. But if you receive or allocate audio blocks and fail to release them, you'll quickly run out of memory and the entire system will go silent, so memory leaks are pretty obvious.
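For example (a sketch; the AudioMemory() function is as described above, and the usage-query function names here are what the library currently uses):

```cpp
void setup() {
  AudioMemory(12);   // pool of 12 audio blocks; tune for your design
  Serial.begin(9600);
}

void loop() {
  // watch current and worst-case block usage while developing
  Serial.print("blocks in use: ");
  Serial.print(AudioMemoryUsage());
  Serial.print(", worst case: ");
  Serial.println(AudioMemoryUsageMax());
  delay(1000);
}
```

If the worst-case number keeps climbing toward your pool size, you're probably leaking blocks somewhere.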
Objects that actually move data on or off the chip have some other requirements, which are what cause every update() function to actually run every 2.9 ms. But you don't need to worry about those if your object runs entirely on-chip, using only receiveReadOnly(), receiveWritable(), allocate(), transmit() and release().