Limits of delay effect in audio library

Let's see... 256 bytes per audio block, write + read, 16 MHz SPI (for CPU @ 96 MHz = 48 MHz F_BUS) = (8 bits * 256 * 2) / 16E6 = 0.256 ms for one channel, or 2 x 0.256 ms for stereo, as the theoretical absolute minimum transfer time (correct?).
Less with a faster F_BUS (0.2048 ms) and perhaps a little bit more without the FIFO.

Not too much :)
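
For anyone who wants to redo that arithmetic with other clock rates, here is a minimal C++ sketch. It counts only the data bits of one 256-byte block written plus one block read back, and ignores command and address bytes and any gaps between transfers. The function and constant names are made up for illustration; they are not taken from the audio library.

```cpp
#include <cstdint>

// Minimum time, in milliseconds, to clock `bytes` over SPI at `spiHz`.
// Assumption: data bits only, no command/address overhead, no idle gaps.
constexpr double spiTransferMs(uint32_t bytes, double spiHz) {
    return (8.0 * bytes) / spiHz * 1000.0;
}

// One audio block is 128 samples of 16 bits = 256 bytes.
constexpr uint32_t kBlockBytes = 256;

// Write one block and read one block back (one channel, single tap).
constexpr double writePlusReadMs(double spiHz) {
    return spiTransferMs(2 * kBlockBytes, spiHz);
}
```

Under these assumptions `writePlusReadMs(16e6)` gives 0.256 ms, and the 0.2048 ms figure corresponds to a 20 MHz SPI clock.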
 
I agree on the timing calculations, but we must both write the original signal and read back the delayed version.

Theoretical minimum is nice, but a scope image showing how much time is actually spent in the delay code, for me, that is what an old theoretically focused scientist wants for proof these days.

Even more good news: I have received some chips, so I can start testing. For people with more money than brains, the following chip may be of interest: http://www.aliexpress.com/item/FM25V40-G-FM25V40-RAMTRON-SOP8-Free-shipping/32324159650.html. 20 dollars is a lot, but it's 4 Mbit :) and nonvolatile.
 
It is probably being phased out and end-of-lifed in favor of a new chip, the CY15B104Q, but you can find it at http://www.cypress.com/file/46161/download, a document updated in November 2014 for the 4 Mbit chip, and the FM25V40 is the only 4 Mbit chip referenced in the document. So it cannot be very obsolete.
 
Theoretical minimum is nice, but a scope image showing how much time is actually spent in the delay code, for me, that is what an old theoretically focused scientist wants for proof these days.

Will you settle for Serial.println(AudioProcessorUsageMax())? I think I've done enough oscilloscope posting for one day!

For storing 1 channel to the 23LC1024 and reading a single tap:

Code:
CPU: 14.76
CPU: 14.83
CPU: 14.73
CPU: 14.75
CPU: 14.86
CPU: 14.83
CPU: 14.75
CPU: 14.83
CPU: 14.75

Running this code:

Code:
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>
#include <SerialFlash.h>

AudioSynthWaveformSine     sine;
AudioEffectEnvelope        env;
AudioEffectDelayExternal   dly;
AudioOutputI2S             headphones;
AudioConnection            patchCord1(sine, env);
AudioConnection            patchCord2(env, dly);
AudioConnection            patchCord6(env, 0, headphones, 0);
AudioConnection            patchCord7(dly, 0, headphones, 1);
AudioControlSGTL5000 audioShield;

void setup() {
        AudioMemory(10);
        audioShield.enable();
        audioShield.volume(0.7);
        sine.amplitude(0.9);
        sine.frequency(800);
        dly.delay(0, 1500);
}

void loop() {
        env.noteOn();
        delay(80);
        env.noteOff();
        delay(4000);
        Serial.print("CPU: ");
        Serial.println(AudioProcessorUsageMax());
        AudioProcessorUsageMaxReset();
}
 
With 2 taps reading different locations within the same 23LC1024 chip:

Code:
CPU: 21.52
CPU: 21.35
CPU: 21.36
CPU: 21.37
CPU: 21.44
CPU: 21.35
CPU: 21.36
CPU: 21.50

Running this test code:

Code:
#include <Audio.h>
#include <Wire.h>
#include <SPI.h>
#include <SD.h>
#include <SerialFlash.h>

AudioSynthWaveformSine     sine;
AudioEffectEnvelope        env;
AudioEffectDelayExternal   dly;
AudioMixer4                mix;
AudioOutputI2S             headphones;

AudioConnection            patchCord1(sine, env);
AudioConnection            patchCord2(env, dly);
AudioConnection            patchCord3(dly, 0, mix, 0);
AudioConnection            patchCord4(dly, 1, mix, 1);
AudioConnection            patchCord5(env, 0, headphones, 0);
AudioConnection            patchCord6(mix, 0, headphones, 1);
AudioControlSGTL5000 audioShield;

void setup() {
        AudioMemory(10);
        audioShield.enable();
        audioShield.volume(0.7);
        sine.amplitude(0.9);
        sine.frequency(800);
        dly.delay(0, 1500);
        dly.delay(1, 1200);
}

void loop() {
        env.noteOn();
        delay(80);
        env.noteOff();
        delay(4000);
        Serial.print("CPU: ");
        Serial.println(AudioProcessorUsageMax());
        AudioProcessorUsageMaxReset();
}
 
Paul,

Thanks for working on this code! I'm looking forward to trying it as soon as my memory chips arrive.

Is there a chance for a version with more than 8 taps on the delay line?

Best regards,
Michael
 
I should mention this code has no optimization effort so far, on the SPI stuff, other than using SPI.transfer16() instead of ordinary SPI.transfer(). I'm sure it'll get faster when/if work is done to leverage the FIFO, or even DMA.

Adding FIFO optimization should be fairly simple. Using DMA would require restructuring the code to schedule in advance where to read and set up the transfers. That'd be more work, but maybe do-able with enough effort? Maybe...

Overclocking the SPI to 24 MHz is of course the low-hanging fruit. It seems to work, at least on the one 23LC1024 on my desk right now. Overclocking reduces the 2-tap case from 21.4% to 17.0% processor usage.
 
No problem with scope displays. It seems each tap takes 21.3% - 14.7% = 6.6% of the available processor time, and probably the write uses the same amount of resources. So eight taps would use around 50% of the available time, unless some optimisations can be done for taps close in time. For a monophonic system this can work if not too many other CPU-intensive tasks are running.

There will always be a delicate balance for high performance code and not everything can be achieved on a Teensy 3.1, but it is amazing how much can actually be done.
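
To put rough numbers on that balance, here is a small C++ sketch that fits a straight line through the one-tap and two-tap readings and extrapolates. It assumes the load grows linearly with the number of taps, which has not been measured beyond two taps, and all names are hypothetical.

```cpp
// Linear model of the delay object's CPU load:
//   usage(n) = fixedPct + n * perTapPct
struct DelayCpuModel {
    double fixedPct;   // write plus per-update overhead (assumed constant)
    double perTapPct;  // cost of reading one tap
};

// Fit both parameters from a one-tap and a two-tap measurement.
inline DelayCpuModel fitModel(double oneTapPct, double twoTapPct) {
    const double perTap = twoTapPct - oneTapPct;
    return DelayCpuModel{oneTapPct - perTap, perTap};
}

// Predicted usage, in percent, for a given number of taps.
inline double usagePct(const DelayCpuModel& m, int taps) {
    return m.fixedPct + taps * m.perTapPct;
}

// Largest tap count whose predicted usage stays within budgetPct.
inline int maxTaps(const DelayCpuModel& m, double budgetPct) {
    if (m.perTapPct <= 0.0) return 0;  // model is meaningless in this case
    int n = 0;
    while (usagePct(m, n + 1) <= budgetPct) ++n;
    return n;
}
```

With `fitModel(14.7, 21.3)` this predicts roughly 61% for eight taps once the fixed write cost is included, and `maxTaps(model, 100.0)` comes out at 13, with nothing left over for oscillators or filters.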
 
Is there a chance for a version with more than 8 taps on the delay line?

With the currently unoptimized code, each tap adds approx 6.6% processor usage. Maybe 12 or 13 taps could work before you simply run out of time to read any more.

If you want to try, just edit the code. It's really very easy. There's a couple bitmask variables of type uint8_t, where each bit corresponds to whether a channel is active. Change those to uint16_t. There's a loop for actually implementing the channels inside update(), so change that to 16. The delay(ch, ms) and disable(ch) functions have a check if the channel is valid, so increase those. There's an array called delay_length[] which stores the per-channel setting, so increase its size. I believe that's all the edits needed for more taps.
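
As an illustration of the kind of change described above, here is a standalone C++ sketch, not the library's actual source. Only `delay_length[]` is a name mentioned in this thread; `TapSettings`, `kMaxTaps`, `activemask` and the method names are hypothetical stand-ins for the real members.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-channel bookkeeping; the real class
// in the audio library may be organized differently.
constexpr uint8_t kMaxTaps = 16;           // was 8

struct TapSettings {
    uint16_t activemask = 0;               // was uint8_t: one bit per tap
    uint32_t delay_length[kMaxTaps] = {};  // per-tap delay, in samples

    // Enable a tap; returns false when the channel index is out of range.
    bool enable(uint8_t ch, uint32_t samples) {
        if (ch >= kMaxTaps) return false;  // range check grows with kMaxTaps
        delay_length[ch] = samples;
        activemask |= static_cast<uint16_t>(1u << ch);
        return true;
    }

    // Disable a tap; returns false when the channel index is out of range.
    bool disable(uint8_t ch) {
        if (ch >= kMaxTaps) return false;
        activemask &= static_cast<uint16_t>(~(1u << ch));
        return true;
    }

    // The loop an update() routine would run over all possible taps.
    int activeCount() const {
        int n = 0;
        for (uint8_t ch = 0; ch < kMaxTaps; ++ch) {
            if (activemask & (1u << ch)) ++n;
        }
        return n;
    }
};
```

The point of the sketch is that the mask width, the loop bound, the range checks and the array size all have to move together; missing one of them would silently drop or corrupt taps 8 to 15.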

What are you going to actually *do* with so many taps? If there's a really compelling need, I'll consider more as the default.... but you need to convince me there's a real need. A short youtube video of something awesome with 12 taps would be the most convincing way to persuade me!
 
Even in a polyphonic environment, echo effects are often applied to the mixdown of all the voices, so there's only one delay chain, or perhaps two for stereo.

Short delay effects like phasing can be done with on chip RAM buffers, without the added time of SPI writes.
 
:) I am not saying I need that many taps, but it's good to know how many resources they use. My use case is polyphonic synths, so there will be lots of oscillators and filters eating time.
 
Modeling of room acoustics, depending on whether the researcher is theoretical or practical, will use 2 to 10 reflections/taps with individual filtering. I'm not saying this is really necessary, but that's where many audio researchers want to go; I know some personally.

Having the basic code, we can all experiment with these things, so some great stuff might come out of this.
 
Paul,

A very reasonable question. Take a look at this YouTube demo: <https://www.youtube.com/watch?t=38&v=R5dwohfKqns>.

It shows a new audio delay plugin with 32 variable latency taps. In person, it's pretty awesome for sound design. The short answer to your question is that I'd like to duplicate the simple delay line functions of this plug-in in hardware for a modular music synth. It has a number of other functions beyond the capabilities of the Teensy 3.1, but I thought the delay line might be possible.

Best regards,
Michael
 
If you want to model lots of reflections, you could probably get close (with some limitations) by using a 1/2 second delay line with the fast internal RAM, then mix its taps together and feed that into a much longer delay line using the slow external chips.
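
Here is a rough, platform-independent C++ sketch of that cascade, just to show the signal flow. Both delay lines are plain in-memory circular buffers here; on real hardware the long one would sit in the external SPI RAM. The class and member names are invented for this example and the tap mix is a simple average.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simple circular-buffer delay line.
class DelayLine {
public:
    explicit DelayLine(size_t lengthSamples) : buf_(lengthSamples, 0), head_(0) {}

    // Store the newest sample, overwriting the oldest.
    void write(int16_t s) {
        buf_[head_] = s;
        head_ = (head_ + 1) % buf_.size();
    }

    // Sample written `delay` writes before the newest one (0 = newest).
    // Precondition: delay < length of the line.
    int16_t tap(size_t delay) const {
        return buf_[(head_ + buf_.size() - 1 - delay) % buf_.size()];
    }

private:
    std::vector<int16_t> buf_;
    size_t head_;
};

// Short multi-tap line (early reflections) feeding a long single-tap line.
struct CascadedDelay {
    DelayLine shortLine;            // stands in for fast internal RAM
    DelayLine longLine;             // stands in for slow external RAM
    std::vector<size_t> shortTaps;  // tap positions on the short line
    size_t longTap;                 // single tap on the long line

    int16_t process(int16_t in) {
        shortLine.write(in);
        int32_t mixed = in;         // pass-through if no taps are configured
        if (!shortTaps.empty()) {
            int32_t sum = 0;
            for (size_t d : shortTaps) sum += shortLine.tap(d);
            mixed = sum / static_cast<int32_t>(shortTaps.size());
        }
        longLine.write(static_cast<int16_t>(mixed));
        return longLine.tap(longTap);
    }
};
```

The limitation is visible in the structure: every reflection from the short line shares the same long delay, so the late echoes repeat the early pattern rather than being placed independently.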
 
.... a new audio delay plugin with 32 variable latency taps. In person, it's pretty awesome for sound design. .... I'd like to duplicate the simple delay line functions of this plug-in in hardware for a modular music synth.

Just put 3 Teensys into the module! Or maybe 2 will be able to give 16 taps each, when/if the code gets optimized.

And 2 or 3 of Frank's Memoryboard!!! I got one on order while the file was still up. Can't wait to try an 8.9 second delay!
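
For reference, a quick C++ sketch of where a figure like 8.9 seconds can come from. It assumes 16-bit samples, a nominal 44.1 kHz sample rate, and a board carrying six 23LC1024 chips of 128 KB each; the chip count is my reading of the 8.9 s number, not something stated in this thread, and the function name is made up.

```cpp
#include <cstdint>

// Longest delay, in seconds, that fits in `chips` RAM chips of
// `bytesPerChip` bytes each, storing one 16-bit sample per two bytes.
inline double maxDelaySeconds(uint32_t chips, uint32_t bytesPerChip,
                              double sampleRateHz) {
    const double samples = static_cast<double>(chips) * bytesPerChip / 2.0;
    return samples / sampleRateHz;
}
```

With these assumptions a single 23LC1024 holds about 1.49 s (`maxDelaySeconds(1, 131072, 44100.0)`) and six of them about 8.9 s.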
 
It's an impressive demo, a sales demo, running on a high-performance computer.

The same number of taps and filters will most certainly not be reproducible on a Teensy, but with skill most of the sound effects will be reproducible with a much more limited number of taps and filters.
 
Ok, I just watched all 12 minutes of the demo. It is indeed impressive!

Nearly all of the demo used 16 taps over only several hundred milliseconds. I believe we're pretty close to being able to implement this already with the Teensy audio library. The main feature they have which we're missing is signal modulation of the tap positions. Not much of the demo used that... but when they did, wow, impressive! Guess I've got another feature to put onto my long to-do list. ;)

Of course, one essential feature they used was aligning the taps to the music's tempo. That really makes me wonder if they're doing the beat/tempo detection in real time, or if they're pre-analyzing the entire recorded file, like Traktor and other DJ software usually does?
 
Uhh... I really hope the new revision works :) - I did a completely new routing.
Eagle says all is OK...

Maybe I'll have more time next weekend; then I can try with my old board (but only 4x RAMs + two other chips) and eventually upload a video or something...
 
I ordered both your 3.02 and 4 versions from OSH Park, but neither has arrived yet. I've got the chips sitting right here, just waiting for the PCB.

But in terms of making the best use of limited dev time, I think I should probably focus on some of the other most-requested features first. Adding seeking to the various play objects is probably the next thing I should do.

Hopefully supporting your board should be pretty simple... just a little initialization and code in those three chip access functions. The rest is all designed to handle arbitrary memory size and arbitrary automatic partitioning of the memory if multiple objects are created.
 
Ok, I just watched all 12 minutes of the demo. It is indeed impressive!

Of course, one essential feature they used was aligning the taps to the music's tempo. That really makes me wonder if they're doing the beat/tempo detection in real time, or if they're pre-analyzing the entire recorded file, like Traktor and other DJ software usually does?

Paul,

Thanks for taking a look! I'm pleased to see so much interest in delay lines. They are the basis for many audio effects, and a flexible delay object will be a great feature for the audio library.

I don't think the plug-in does beat detection. It usually plugs into a DAW and I think it derives tempo from the MIDI clock, or perhaps from the BPM setting in the DAW.
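
If the tempo does come from a BPM setting or from MIDI clock, turning it into tap positions is simple arithmetic. A minimal C++ sketch, with hypothetical function names; the only outside fact used is that MIDI clock sends 24 pulses per quarter note.

```cpp
// Milliseconds per quarter note at a given tempo.
inline double beatMs(double bpm) {
    return 60000.0 / bpm;
}

// Tempo recovered from the interval between MIDI clock messages,
// which arrive 24 times per quarter note.
inline double bpmFromMidiClock(double pulseIntervalMs) {
    return 60000.0 / (24.0 * pulseIntervalMs);
}

// Delay of tap `k` in an evenly spaced pattern with `stepsPerBeat`
// taps per quarter note.
inline double tapDelayMs(double bpm, int k, int stepsPerBeat) {
    return beatMs(bpm) * k / stepsPerBeat;
}
```

At 120 BPM a quarter note is 500 ms, so sixteenth-note taps land every 125 ms.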

BTW, as you suggested earlier today, I have one of Frank's memoryboards in front of me right now -- just waiting for memory chips to arrive :).

Best regards,
Michael
 
But in terms of making the best use of limited dev time, I think I should probably focus on some of the other most-requested features first. Adding seeking to the various play objects is probably the next thing I should do.

that would be convenient ... and though that's slightly OT; mentioning it as a bunch of other objects might then be included as well ?

can only speak for myself of course but this is how play_sd_wav looks here, presently:

Code:
AudioPlaySdWav(void) : AudioStream(0, NULL) { begin(); }
void begin(void);
bool open(const char *filename);
bool open(uint16_t index);                      // open by index
bool play(const char *filename);
bool play(uint16_t index);                      // open+play, by index
bool seek(const char *filename, uint32_t pos);  // open+play from pos
bool seek(uint16_t index, uint32_t pos);        // open+play from pos, by index
bool seek(uint32_t pos);                        // resume from pos
void stop(void);
void pause(void);                               // stop, don't close file
void close(void);
bool isPlaying(void);
uint32_t positionMillis(void);
uint32_t positionBytes(void);
uint32_t lengthMillis(void);
uint32_t lengthBytes(void);
virtual void update(void);

most of this is probably irrelevant for the non-SD stuff; as for SD, there was some other thread which suggested not opening/closing the file unless necessary improves latency, hence the explicit open + close. not sure about the index stuff, it requires adjusting the SD code, too; i didn't test it a lot, though superficially, the performance gain seemed minimal. it does save on RAM though, ie when dealing with lots of files.
 