changing pitch of audio samples - TeensyVariablePlayback library

You can call the peak object in the audio library, fill a buffer with scaled lines, and display it, but even on a small display the FPS will be very limited.
I have done this on both a 240×320px display over SPI and an 800×480px display running off the 1060's LCD controller with SDRAM at 198MHz.
In both cases the performance was decent (around 30fps on the big display and 22-24fps on the small one).
You can see videos here:
 
Thank you both. I should clarify that I am referring to displaying the static waveform of an entire audio sample file. I am not currently looking to draw an oscilloscope-style realtime audio waveform.

My device has a line input for recording samples, and I would like to show a visual representation of what was recorded, to better aid in adjusting volume, trimming sample start/end times, etc.

Also, I would like to be able to have a visual representation of existing sample files on the SD card as well, not only samples that the user is currently recording.

Edit: it looks like in that Facebook link you are scrolling through the waveform of an existing audio file, so I think that is similar to what I'm trying to do. But since I'm using the AudioPlaySdResmp objects, which do buffering when playing, I was wondering if there's a preferred way to draw the waveform of a buffered WAV sample file without having to keep the sample file's data all in memory.
 
Struggling to respond here due to the DoS attack, I assume. But it looks like @Rezo knows more about this than I do anyway :). The libraries have WAV header parsers of variable quality which you might be able to make use of; otherwise they probably won't be of much help.
 
Back on topic, I've done, tested and documented an update which doesn't require modifications to other libraries, and gives you a couple of options. You can find it on this branch. Note the markdown file for documentation.

Briefly, you can either:
  1. wrap your SD (or other filesystem) accesses in calls to AudioEventResponder::disableResponse() and AudioEventResponder::enableResponse(). All yield() processing is masked, so keep these as short as feasible. It's OK to leave a file open and responses unmasked while you do other stuff, provided you re-mask before a read, write or close.
  2. Call AudioEventResponder::setForceResponse(true) in setup() to signal that your code will generate responses to EventResponder triggers generated by the Teensy Variable Playback library. In this case you must also call AudioEventResponder::runPolled() from your code, sufficiently often to ensure triggered events get their responses in a timely manner.
Note that AudioEventResponder::runPolled() only executes one triggered event, and it's quite possible more than that will be pending: if you get a return value > 0 then more remain - it's up to you to decide when to execute the next one. If you've buffered plenty of audio data then you have more flexibility in that decision.

Masking yield() responses is more complex, and could interfere with other libraries that also use EventResponder. It may still be preferable if you don't need SD access during playback, as you then don't need to manage pending events yourself.
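A minimal sketch of how the two options might be used, based on the call names above (the two options are alternatives; the file name, payload and placement are placeholders of mine, not from the branch's documentation):

C++:
#include <SD.h>
// (plus the TeensyVariablePlayback header that declares AudioEventResponder)

static const uint8_t someData[16] = {0};   // placeholder payload

// Option 1: mask yield() responses around each filesystem access.
void saveSettings()
{
    AudioEventResponder::disableResponse();       // mask yield() processing
    File f = SD.open("settings.dat", FILE_WRITE); // placeholder file name
    if (f) {
        f.write(someData, sizeof someData);
        f.close();
    }
    AudioEventResponder::enableResponse();        // unmask as soon as possible
}

// Option 2: promise to poll, then service events from your own code.
void setup()
{
    AudioEventResponder::setForceResponse(true);  // once, in setup()
    // ... audio, SD and display setup ...
}

void loop()
{
    // runPolled() executes at most one triggered event per call and
    // returns > 0 while more remain, so drain whatever is pending.
    while (AudioEventResponder::runPolled() > 0) { /* keep draining */ }
    // ... rest of loop ...
}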
 
You can take a look at some of the code snippets I am using on that project here

To create a static waveform, you first need the total number of samples; if the file is stereo, mix each frame down to mono as (L+R)/2.
Then divide the number of samples by the width of the object you want to display (say 320px wide) to get the number of samples per pixel.
For each pixel, average the absolute values of its samples (raw signed samples would largely cancel out), and scale that average to half the object's height.
You can then either write each line to a buffer and display it, or write it directly to the display as you analyze it.

Different methods suit different purposes and performance needs.
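As a rough illustration of that approach (my sketch, under assumptions: a 16-bit mono PCM WAV with a plain 44-byte header, and a hypothetical drawWaveform() name; only one pixel's worth of data is in RAM at a time):

C++:
#include <SD.h>

// Draw a static waveform of a mono 16-bit PCM WAV, one pixel column at a
// time, so the file never needs to be held in memory. Skipping a fixed
// 44-byte header is crude; a real WAV parser should walk the chunks.
void drawWaveform(const char *path, int width, int height)
{
    File f = SD.open(path);
    if (!f) return;

    uint32_t totalSamples = (f.size() - 44) / 2;  // 16-bit mono
    uint32_t perPixel = totalSamples / width;
    if (perPixel == 0) { f.close(); return; }     // file shorter than width
    f.seek(44);

    for (int x = 0; x < width; x++)
    {
        uint64_t sum = 0;
        for (uint32_t i = 0; i < perPixel; i++)
        {
            int16_t s;
            f.read((uint8_t *)&s, 2);   // reading a whole column's block
            sum += abs(s);              // at once would be much faster
        }
        // scale the average absolute amplitude to half the display height
        int len = (int)((sum / perPixel) * (height / 2) / 32768);
        // stand-in for whatever your display library offers, e.g.:
        // tft.drawFastVLine(x, height / 2 - len, 2 * len, 0xFFFF);
    }
    f.close();
}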
 
Yes I was looking at your thread and have been tinkering around, thank you!

Sorry for derailing the thread a bit. I will try to use this soon and report back, very much appreciated.
 
So I tried out both configurations, I have some observations:

  1. I'm using PlatformIO, so I'm not sure if it's that, but I was getting an error about play_failed() being undefined. It's declared, but there's no definition as far as I can see? I just commented it out so I could compile.
  2. I wasn't fully clear on how to leverage the AudioEventResponder::setForceResponse(true) configuration alongside the other SD file access I'm doing for project data. I basically tried replacing the various areas of the code I knew might take a little time, where I was formerly calling yield(), with AudioEventResponder::runPolled(), but I think this was still producing hangups when trying to write to the SD while audio was playing.
  3. Using the first configuration, it seemed I was able to get away without adding any AudioEventResponder::disableResponse() and AudioEventResponder::enableResponse() calls, but after a while I had another hangup. I tried wrapping my open/close/readBytes/write calls in those disable/enable calls, but it didn't seem to make things any better. You can test this using the AsyncIO code I posted in this thread while audio is playing; it should give you a good test case. I'm a bit too tired right now to make a full test case again, and my AsyncIO code has changed a bit since then, but I wouldn't mind making some test cases in a week or so.
  4. This may be anecdotal (I have no measurements to prove it), but I was hearing worse timing and some glitching with either configuration. I noticed that RESAMPLE_BUFFER_COUNT was set to 5 in your latest branch, so it could have been that; I ended up bumping it back up to 7 and heard less glitchiness. There are many permutations/variables involved here, so if you would like me to focus on one specific piece, let me know.

I ended up reverting to the original implementation with EventResponder and masking the EventResponder yield calls in the SdFat library, which I know is not really a good long-term solution. But I'm just a tad burnt out right now.

One question/request: would it be possible to allow the WAV file open() calls to be parallelized when a user wants to play files concurrently? For instance, my groovebox sequencer might have a kick on step 1 and a snare hit on step 5, but if both are triggered on step 9, it would be nice not to have to open and start buffering them sequentially, to reduce the latency caused by multiple files being opened one after another. It would be nice to have some option for making audio file reads atomic, in that sense, but maybe this is not possible due to how the underlying SD libraries work. I was hoping it could work because there is a noticeable difference in latency when multiple sample files are triggered at the same time. Maybe this is where the pre-cueing strategy could help, too.
 
Was thinking something like this, conceptually:

C++:
#define MAX_TRACKS 16
#define MAX_STEPS 64

void onStep()
{
    // Iterate over all tracks
    for (uint8_t t = 0; t < MAX_TRACKS; t++)
    {
        // Get the current step of the track
        uint8_t current_track_step = _sequencer_state.track_steps[t].current_step;
        // If the step state is ON, add it to the playback batch
        if (_sequencer.tracks[t].step[current_track_step].state == STEP_STATE::ON) {
            // build the batch of wav files to play atomically.
            // possibly open the files here, etc...
            AudioEventListener::addToWavPlaybackBatch(_sequencer.tracks[t].step[current_track_step].sample_name);
        }
    }

    // now start buffering/playback on all of the wav files together in the batch
    AudioEventListener::playWavBatch();

    // update the sequencer's step state to the next step, etc...
}
 
Sorry about the play_failed() thing ... that was left over after using it to halt the system when it was about to crash and look at the stack and objects' states using TeensyDebug. Removed and pushed, but you can just comment it out, as you've found.

As noted (probably unclearly) in post #179, you can either use the disableResponse()/enableResponse() yield-masking approach, or setForceResponse(true) (once, in setup()) plus runPolled() "as often as needed": at least once per loop(), but possibly more often, e.g. just before/after a time-consuming display access.

Whatever approach you use, the rule of not doing any of the above from an interrupt remains, of course. You can modify your own data structures, and it seems from testing that it's OK to change the playback rate from 0.0 to 1.0 or whatever, but that's strictly it!

Given what you've said about updating your AsyncIO, it's probably best if you can decompress a bit and then provide a small example - chances are too high I'd use out-of-date code to do something in a way you wouldn't :giggle:

I don't think it's likely to be possible to parallelise calls to open() in any meaningful manner. There's no awareness from one playback object of the existence or state of another, and even if that were added somehow, it would have to be extended into the SdFat library (and other filesystems) as some sort of open_multi(list_of_files). It's a very niche use case, and I'd be surprised if the gains would be worthwhile.

I wasn't aware that the playback cueing scheme we devised has a particular issue with latency, provided all the required samples are ready by the time they're actually triggered. There absolutely is, of course, if you don't use cueing ... there's no way around that. Obviously, as you say in #183, there's different ways to implement that look-ahead capability - we just did a simple proof-of-concept, looking ahead just one beat. Looking further ahead is possible, but you may end up with cued-up tracks you don't need if the user changes patterns; it's very application-dependent.

Happy to look at any simple examples you can provide, of course.
 
Thanks. Could it be a feasible option to have the sample files' open/close states programmatically set by the user? The sample files could remain open while a pattern is playing them, so that they only need to be opened once. Then, when the pattern changes and different samples are used in the next pattern, the previous samples can all be closed. I think that could reduce latency?

I'm mainly looking to avoid having to add 16 more playback objects and implement that per-step cue system. But that may be unavoidable. Plus, even at a playback rate of 0, I'm worried the file will play just enough before the rate is readjusted for it to be noticeable.
 
So ... have a bunch of samples open and stopped at the start, then assign to a track and start when triggered? I think that works once, then when you recycle them there's no mechanism in the library to "seek to 0", and in any case that'd probably be very nearly as time-consuming as simply re-starting. Not quite, because of not needing to open, but I think you'd still start to get noticeable latency if you had a lot of recycling going on at a particular step.

The extra objects are a bit of an overhead, but mainly in terms of buffer memory, which isn't really an issue given you have loads of PSRAM. Managing the cue/trigger system is also a bit painful, but once you've wrapped it up in a class with a simple API you can forget about it forever. Your one voice per track approach at least means you're not worrying about polyphony and note stealing.

If you turn audio interrupts off briefly when calling setPlaybackRate() and play() you can guarantee not to play a fraction of the sample while cueing, though there's a risk of that in itself causing a dropout. I believe it's enough to ensure that you call setPlaybackRate() before play() - the rate is stored, so when the next audio interrupt notices a newly-started object, it'll honour the zero rate and just output 128 copies of the first sample. That's certainly been my experience, but if it's not yours then it's a bug that needs fixing! The risk is low even if the calls are reversed, as the object doesn't set its playing flag until after all the time-consuming stuff has executed, so there's only a tiny period when it'd be seen as playing at the non-zero speed.

You can check at trigger time whether a cued object has briefly started by calling getBufferPosition1(), which should either be 0 or a small number, maybe half the WAV header size. I don't think positionMillis() gives a correct value, looking at the code...
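For reference, a minimal sketch of the cue-at-zero-rate pattern discussed above, using the library's AudioPlaySdResmp calls (playWav() and setPlaybackRate()) and the standard Teensy Audio interrupt guard; the file name is a placeholder, and the guard is belt-and-braces given the rate-before-play ordering:

C++:
#include <Audio.h>
#include <TeensyVariablePlayback.h>

AudioPlaySdResmp voice;                // one playback object per track

// Cue: start the object at rate 0.0 so it opens the file and fills its
// buffers, but only outputs copies of the first sample.
void cueSample(const char *filename)   // e.g. "KICK.WAV" (placeholder)
{
    AudioNoInterrupts();               // ensure no fraction plays early
    voice.setPlaybackRate(0.0f);       // rate BEFORE play(), as noted above
    voice.playWav(filename);
    AudioInterrupts();
}

// Trigger: on the sequencer step, just set the real rate. The file is
// already open and buffered, so trigger latency is minimal.
void triggerSample(float rate)
{
    voice.setPlaybackRate(rate);       // e.g. 1.0f for normal speed
}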
 
A couple of other thoughts... I didn't comment on the choice of size and number of buffers; I think that's very much a case of fine-tuning when you have a working system and are trying to explore the boundaries. Bigger buffers give you more insulation against the occasional slow SD card access or display update, but carry a penalty in memory usage and time taken to cue up.

I would recommend you build some sort of instrumentation into your classes, so you can check the actual cueing times and trigger latency. Because triggers are happening in an ISR, you could actually get a mismatch of one audio block (about 2.9ms) between starts if they begin in the middle of an audio interrupt.
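For instance (my sketch, not library code; worstCueMicros and the wrapper name are made up), recording the worst-case cue time gives a hard number for how much look-ahead is really needed:

C++:
#include <Audio.h>
#include <TeensyVariablePlayback.h>

// Crude instrumentation: track the worst time spent cueing a sample, so
// "is my one-beat look-ahead long enough?" gets a measured answer.
elapsedMicros cueTimer;          // Teensy core convenience type
uint32_t worstCueMicros = 0;     // print this from loop()

void cueInstrumented(AudioPlaySdResmp &voice, const char *filename)
{
    cueTimer = 0;
    voice.setPlaybackRate(0.0f); // cue at zero rate, as above
    voice.playWav(filename);     // open + initial buffer fill
    uint32_t t = cueTimer;
    if (t > worstCueMicros) worstCueMicros = t;
}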
 
Yeah, some instrumentation would be good; I hope to get there eventually, but I'm still in the "trying to keep all these Jenga blocks from falling over" phase of development, and I have a lot of tech debt already. I would add you as a collaborator to my repo if you're interested in possibly having a birds-eye view of how I might better handle these things, but I would understand if that prospect might be troublesome for you.
 
Well, I wouldn't be able to promise to achieve anything concrete, that's for sure! Also my availability is going to be a bit variable for a while, so ... maybe after that? I'm not really an application kinda guy, I like to get the low-level tools working just so, then you lot can do cool stuff with them. Though I do plan to get back to my dynamically patchable Teensy modular synth at some point.
 
Sent you an invite! Feel free to look at your leisure. Would appreciate any feedback you are able/willing to give.
 
Another question: I notice better playback timing for short one-shot samples (~100KB file size or less) when using a smaller sample size and buffer count (say, 512 samples at 5 buffers). However, for longer samples (and larger files) I tend to hear some glitchiness when initially playing a sample with that configuration.

The question is: what if the library allowed you to set minimum and maximum sample sizes and buffer counts, and then dynamically sized the sample and buffer values based on the size of the file being played?

So instead of having to find a hard-coded sweet spot, you would hard-code the upper and lower limits, and the playback objects would buffer the samples dynamically within those limits.
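Purely to illustrate that proposal (none of this exists in the library; setBufferParams() and the limit constants are hypothetical), the selection logic might look like:

C++:
// Hypothetical sketch of the proposed feature. setBufferParams() does
// NOT exist in TeensyVariablePlayback; buffer geometry is currently a
// compile-time choice (e.g. RESAMPLE_BUFFER_COUNT).
const uint32_t MIN_BUF_SAMPLES = 512, MAX_BUF_SAMPLES = 2048;
const uint8_t  MIN_BUF_COUNT   = 5,   MAX_BUF_COUNT   = 7;

void configureForFile(uint32_t fileBytes)
{
    // Short one-shots: small, few buffers for fast cueing.
    // Long samples: bigger, more buffers to ride out slow SD accesses.
    bool big = fileBytes > 100UL * 1024;   // the ~100KB split noted above
    uint32_t samples = big ? MAX_BUF_SAMPLES : MIN_BUF_SAMPLES;
    uint8_t  count   = big ? MAX_BUF_COUNT   : MIN_BUF_COUNT;
    // voice.setBufferParams(samples, count);   // hypothetical call
    (void)samples; (void)count;
}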
 
Is that using “direct” playback, rather than the cue+trigger scheme? Can you quantify the buffer size / count that cause issues with a specific small file? I’m wondering if it’s only occurring when the file is smaller than the available buffer space. It shouldn’t, but bugs happen!

If your gist could be tweaked to exhibit this then it’ll make it much easier to investigate. Rather than rely on my ears I tend to use a ‘scope to look at timings, so it may be necessary to pan the short sample hard left, and the long one right, and drop the others very low in the mix.

If at all possible I’d like to get it working without making it too clever under the hood … it’s pretty complex already, and messing with different buffer sizes makes heap fragmentation even more likely.
 
I've pushed a few changes to the branch. There was still some debug cruft in there, and a class name clash with the corresponding fixes in my playback / recording updates to Audio.

I also made a minor change to the preloading, as it kept trying to load even when a short sample file was exhausted. It'll only save a few milliseconds, though. There's something weird I can't fathom when allocating the buffers - it takes many microseconds to create each one. I thought it was due to extmem_malloc() zeroing the returned memory (which is undocumented AFAIK, and in theory a waste of time), but my efforts to stop it actually made things ever so slightly worse.
 
Good, hope it makes it to TD 1.60. I’m still confused why my attempts to disable it didn’t work…or maybe it’s quick but the search for free space is slow. Or something else.
 
Latest branch seems good so far for me. Just an FYI, I made my own fork here because I'm using PlatformIO and I had to modify ResamplingReader.h to use forward declarations in a ResamplingReader.cpp file for the resetStatus, getStatus, and triggerReload functions.

Semi-unrelated question: if I wanted to add a new micro SD port that shares the main SPI bus, would I get worse read times? I've heard the built-in SD card uses a fast 4-bit SDIO interface. I was thinking I could use the built-in SD card for project data I/O and a new SD card port on the main SPI bus for the audio reads, so that it's all separate; it might be a better situation without having to worry about concurrent reads/writes all that much.
 
Good news.

Yes, you would almost certainly get worse read times. The built-in socket's 4-bit SDIO bus manages ~20MB/s for a single file read, so about 160Mb/s. I think people have pushed SPI to 50Mb/s or so, but I don't recall if that's officially achievable or just how far they could push it on their system!

Having said that, there is of course no real substitute for instrumenting your code and giving it a try. I've got a decent 'scope, but probably one of those $10 8-channel logic analysers would be adequate for the job, and reduce the code you need to write to dropping in a bunch of digitalWriteFast() calls.
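If it helps, one way to instrument the reads themselves (my sketch; the pin number and wrapper name are placeholders): pulse a spare pin around each SD access, so read durations and the gaps between them show up directly on the analyser with no printing overhead:

C++:
#include <SD.h>

const int PROBE_PIN = 33;   // any spare pin, wired to scope/analyser

// Wrap SD reads so each one appears as a pulse on the probe pin.
// Remember pinMode(PROBE_PIN, OUTPUT) in setup().
size_t probedRead(File &f, uint8_t *buf, size_t len)
{
    digitalWriteFast(PROBE_PIN, HIGH);
    size_t n = f.read(buf, len);
    digitalWriteFast(PROBE_PIN, LOW);
    return n;
}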
 
wow, great work everyone! Just came here to say.... I have created a very small simple proof-of-concept which demonstrates how the pitch shifting and interpolation works without any of the features/storage-classes/abstractions - so you can understand it better and adapt it if needed.

have fun,
cheers!!
 
I've used a similar algorithm on my Teensy CDJ-1000 project to pitch shift and scrub,
but without the DMA buffer, just a straight interrupt-based audio stream.

Do you think we could achieve time stretching with buffered audio and a windowing algorithm in the same manner?
 
You'd think so ... but it'd take a bit of figuring out.

My buffering is pretty dumb, in that it assumes playback is either forwards or backwards, though there's a bit of a special case for ping-pong looping. Otherwise, once a buffer at the front of the vector hasn't been used on an update() cycle, it's regarded as stale, moved to the back, and reloaded with the correct data for the playback direction in force.

For scrubbing it's possible that the direction could change, so it would be better to check if the middle entry in the vector was unused, then recycle the appropriate end buffer to the opposite end and fill it. You'd want more entries in the vector, which is easy enough. Hmmm ... actually, I remember now that the vector is always playing from low-numbered entry to high, so the linear search for the correct sample is fast. If playing forwards, the first entry has the lowest-numbered samples; in reverse, it has the highest-numbered.

So, distinctly doable, with a bit of effort. I won't have any time to look at it in the near future, though :(
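A toy sketch of that recycling idea for scrubbing (my illustration, not the library's actual code; Block, the 512-sample size and the deque are made up): keep a low-to-high ordered window of blocks around the play head, and when the head moves out of the middle, recycle the block at the far end to extend coverage in the direction of travel:

C++:
#include <deque>
#include <cstdint>

struct Block { uint32_t firstSample; int16_t data[512]; };

// Ordered (low to high sample numbers) coverage around the play head,
// so the linear search for the current sample stays fast. Assumed to
// be pre-filled at startup; clamping at file start/end is omitted.
std::deque<Block> window;

void slideWindow(bool movingForwards)
{
    if (movingForwards) {
        Block b = window.front();                       // stalest block
        b.firstSample = window.back().firstSample + 512;
        // ... refill b.data from the file at b.firstSample ...
        window.pop_front();
        window.push_back(b);
    } else {
        Block b = window.back();
        b.firstSample = window.front().firstSample - 512;
        // ... refill b.data from the file at b.firstSample ...
        window.pop_back();
        window.push_front(b);
    }
}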
 