AudioProcessorUsage at higher sampling rates

Status
Not open for further replies.

CorBee

Well-known member
Hi

As we develop the TeensyBat (https://forum.pjrc.com/threads/38988-Bat-detector/page36) further, I would like to keep track of
AudioProcessorUsage(). It is available directly from the AudioStream library, see https://www.pjrc.com/teensy/td_libs_AudioProcessorUsage.html

I have tested this on a T3.6 and got readouts of 0.5, whereas the page above states:
Returns an estimate of the total CPU time used during the most recent audio library update. The number is an integer from 0 to 100, representing an estimate of the percentage of the total CPU time consumed.
The library also declares this as a float, so 0.5 could, for instance, mean 50% on a 0-1 scale, but ...
The odd thing is that when I run similar code on a T4.1, the same setup gives values around 3.2 ...
I have seen some people (@FrankB) mention that this calculation also depends on the sample rate. As we use much higher sample rates than the default (we go up to 384k without problems), I was wondering how I can calculate a reliable estimate.

kind regards
Cor
 
The range is 0 to 100.

A code change to the definition of AUDIO_SAMPLE_RATE (making it a float) ends up promoting the entire calculation to float on all boards except Teensy LC.
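To illustrate the promotion described above, here is a minimal sketch that runs the same arithmetic as the stock macro once with an integer sample rate and once with a float one. The names `kCpuHz`, `kBlockSamples` and `approx_percent` and the 180 MHz / 128-sample values are assumptions made for the example, not taken from the core headers.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants (assumptions, not read from any core header).
constexpr uint32_t kCpuHz = 180000000u;   // assumed CPU clock
constexpr uint32_t kBlockSamples = 128u;  // assumed block size

// Same arithmetic as the stock CYCLE_COUNTER_APPROX_PERCENT macro (divisors 32 and 16),
// with the sample-rate type left open so both cases can be compared.
template <typename Rate>
auto approx_percent(uint32_t n, Rate sample_rate) {
    assert(sample_rate > 0);
    return (n + (kCpuHz / 32 / sample_rate * kBlockSamples / 100)) /
           (kCpuHz / 16 / sample_rate * kBlockSamples / 100);
}
```

With an integer rate, `approx_percent(0u, 44100)` is 0, because the first term only rounds to the nearest whole percent. With a float rate, `approx_percent(0u, 44100.0f)` comes out at about 0.5, because that rounding term is no longer truncated away. Under these assumptions, a float readout stuck at 0.5 would be consistent with a cycle count of zero.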


we are using a lot higher samplerates than the defaults (we go up to 384k without problems) I was wondering how I can calculate a reliable estimate.

Maybe you're hitting a numerical precision issue?

Find these 4 lines in AudioStream.h and change them to uint32_t.

Code:
        uint16_t cpu_cycles;
        uint16_t cpu_cycles_max;
        static uint16_t cpu_cycles_total;
        static uint16_t cpu_cycles_total_max;

Then edit AudioStream.cpp. Find these lines and remove the right shift by 4 bits. (I believe it's 6 bits on Teensy 4.x because much higher clock speeds are possible.)

Code:
                        cycles = (ARM_DWT_CYCCNT - cycles) >> 4;
Code:
        totalcycles = (ARM_DWT_CYCCNT - totalcycles) >> 4;

This will keep all the numerical precision of the cycle counter.
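As a rough illustration of why the shift and the 16-bit fields go together, the sketch below shows what a right shift discards and what happens when an unshifted count is squeezed into 16 bits. The helper names are hypothetical and the count of 100007 cycles is an arbitrary example value.

```cpp
#include <cassert>
#include <cstdint>

// Narrow an elapsed cycle count to 16 bits after a right shift, as the stock fields do.
uint16_t store_shifted16(uint32_t elapsed_cycles, unsigned shift) {
    assert(shift < 32);
    return static_cast<uint16_t>(elapsed_cycles >> shift);
}

// Cycles that the shift throws away (the low `shift` bits).
uint32_t lost_by_shift(uint32_t elapsed_cycles, unsigned shift) {
    assert(shift < 32);
    return elapsed_cycles & ((1u << shift) - 1u);
}

// True when the count still fits in a 16-bit field.
bool fits_in_16_bits(uint32_t value) { return value <= UINT16_MAX; }
```

For 100007 elapsed cycles, a shift of 4 stores 6250 and drops 7 cycles. Without the shift the value no longer fits in a `uint16_t` and wraps to 34471, which is why the fields need to become `uint32_t` once the shift is removed.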

Then back in AudioStream.h, edit CYCLE_COUNTER_APPROX_PERCENT()

Code:
#define CYCLE_COUNTER_APPROX_PERCENT(n) (((n) + (F_CPU / 32 / AUDIO_SAMPLE_RATE * AUDIO_BLOCK_SAMPLES / 100)) / (F_CPU / 16 / AUDIO_SAMPLE_RATE * AUDIO_BLOCK_SAMPLES / 100))

Change 16 to 1, and 32 to 2, since you're now collecting the full cycle count.
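A worked version of the edited macro (divisors 2 and 1) may help when checking numbers by hand. This is only a sketch: `percent_from_raw_cycles` is a hypothetical function, it assumes an integer sample rate, and the 180 MHz clock used in the examples is an assumed value.

```cpp
#include <cassert>
#include <cstdint>

// Integer version of the edited macro: divisors 2 and 1 instead of 32 and 16,
// so n is a raw (unshifted) cycle count.
uint32_t percent_from_raw_cycles(uint32_t n, uint32_t cpu_hz,
                                 uint32_t sample_rate, uint32_t block_samples) {
    assert(sample_rate > 0);
    const uint32_t half = cpu_hz / 2 / sample_rate * block_samples / 100;  // rounding offset
    const uint32_t full = cpu_hz / 1 / sample_rate * block_samples / 100;  // cycles per 1%
    assert(full > 0);
    return (n + half) / full;
}
```

At an assumed 180 MHz with 128-sample blocks, one percent corresponds to 5223 cycles at 44100 Hz but only 599 cycles at 384000 Hz. So `percent_from_raw_cycles(52230, 180000000, 44100, 128)` and `percent_from_raw_cycles(5990, 180000000, 384000, 128)` both return 10.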
 
On T4.x the sample rate is float.
Not sure when and why this happened, or if it was intended - but it is no problem. You just need to know it.
 
Hi,

I also had to change this in AudioStream.cpp:
Code:
uint32_t AudioStream::cpu_cycles_total = 0;
uint32_t AudioStream::cpu_cycles_total_max = 0;

After this the code runs fine on a T4.1 and reports 0.6-0.7, changing regularly. On the T3.6 I simply get a constant readout of 0.5. The reporting uses the same code on both boards, so the problem must be in the readout of AudioProcessorUsage().

Should I read this as 0.6 on a 0-100 scale or 0.6 on a 0.0-1.0 scale? Is the processor nearly sleeping or working hard?


I am using a single adapted AudioStream library for both the T3.6 and the T4.1 (I cloned it from the T4 core) to allow, for instance, the use of
PSRAM for really large audio buffers (up to 10 seconds at 281k has already been tested and works fine).

Code:
//audio in PSRAM for T41
#if defined(__IMXRT1062__) 
#ifdef PSRAM
#define AudioMemory(num) ({ \
	static EXTMEM audio_block_t data[num]; \
	AudioStream::initialize_memory(data, num); \
})
#else
#define AudioMemory(num) ({ \
	static DMAMEM audio_block_t data[num]; \
	AudioStream::initialize_memory(data, num); \
})

#endif // PSRAM
#endif // __IMXRT1062__


(@PAUL)
I was wondering whether the code changes you proposed are also valid for the T3.6. For instance, I have left this part intact for the T3.6 (see below) and added a T4.1 section.
Code:
#if defined(__MK66FX1M0__)
	cycles = (ARM_DWT_CYCCNT - cycles) >> 6;
#endif
#if defined(__IMXRT1062__)
	cycles = (ARM_DWT_CYCCNT - cycles) >> 4;
#endif

The setup for CYCLE_COUNTER_APPROX_PERCENT:
Code:
#if defined(__IMXRT1062__)
//OLD #define CYCLE_COUNTER_APPROX_PERCENT(n) (((float)((uint32_t)(n) * 6400u) * (float)(AUDIO_SAMPLE_RATE_EXACT / AUDIO_BLOCK_SAMPLES)) / (float)(F_CPU_ACTUAL))
#define CYCLE_COUNTER_APPROX_PERCENT(n) (((n) + (F_CPU / 2 / AUDIO_SAMPLE_RATE * AUDIO_BLOCK_SAMPLES / 100)) / (F_CPU / 1 / AUDIO_SAMPLE_RATE * AUDIO_BLOCK_SAMPLES / 100))
#endif

#if defined(__MK66FX1M0__) 

#if defined(KINETISK)
#define CYCLE_COUNTER_APPROX_PERCENT(n) (((n) + (F_CPU / 32 / AUDIO_SAMPLE_RATE * AUDIO_BLOCK_SAMPLES / 100)) / (F_CPU / 16 / AUDIO_SAMPLE_RATE * AUDIO_BLOCK_SAMPLES / 100))
#elif defined(KINETISL)
#define CYCLE_COUNTER_APPROX_PERCENT(n) ((n) * (int)(AUDIO_SAMPLE_RATE) + (int)(AUDIO_SAMPLE_RATE/2)) / (AUDIO_BLOCK_SAMPLES * 10000)
#endif
#endif

Thanks for helping me out,

regards
Cor
 
I wonder whether it would be better to use F_CPU_ACTUAL on the T4. It is not a constant, though...
Still, there is probably no way around it. After all, you want correct results.
 
The only thing I find important is that the numbers CYCLE_COUNTER_APPROX_PERCENT delivers give a good idea of the relative processor time needed to process a single incoming audio_block of data (as that is what I think it does).
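To make "relative processor time per block" concrete, here is a small sketch of the time budget behind that percentage: the number of CPU cycles that pass while one block of samples arrives. The function names are hypothetical, and the 600 MHz clock and 128-sample block in the example are assumed values.

```cpp
#include <cassert>

// Cycles available while one audio block is being captured.
double cycles_per_block(double cpu_hz, double sample_rate, int block_samples) {
    assert(sample_rate > 0.0);
    return cpu_hz * block_samples / sample_rate;
}

// Share of that budget, in percent, used by `used_cycles`.
double percent_of_block(double used_cycles, double cpu_hz,
                        double sample_rate, int block_samples) {
    return 100.0 * used_cycles / cycles_per_block(cpu_hz, sample_rate, block_samples);
}
```

At an assumed 600 MHz and a 384 kHz sample rate, a 128-sample block leaves 200000 cycles, so 4600 cycles of processing would be 2.3%. The budget shrinks in proportion as the sample rate goes up, which is why the same processing shows a higher percentage at higher rates.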
 
It seems I wasn't reading @PAUL carefully; he said to remove the >> 4:
cycles = (ARM_DWT_CYCCNT - cycles) >> 4;

I've just done that and now see 2.3-2.4% usage on the T4.1. That's useful information. On the T3.6 I am still getting only 0.5, so something must still be wrong in my setup.

Cor
 
It seems that I am not getting a proper readout for ARM_DWT_CYCCNT on the T3.6. Will check which register this should be.
 
I checked my code using VisualCode, and the reference for ARM_DWT_CYCCNT resolves to:
#define ARM_DWT_CYCCNT (*(volatile uint32_t *)0xE0001004) // Cycle count register

So nothing odd at that point. But this register never seems to show a value larger than 0, as if the cycle counter never got started?

EDIT: this part in software_isr is also present and active ...
Code:
#if defined(KINETISK)
	ARM_DEMCR |= ARM_DEMCR_TRCENA;
	ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;
	uint32_t totalcycles = ARM_DWT_CYCCNT;
#elif defined(KINETISL)
	uint32_t totalcycles = micros();
#endif
 
It seems the compiler was not picking up things as I had hoped, but I now have proper readouts for the T3.6 and the T4.1. On the T3.6 the current TeensyBat uses 10% of the available time to process audio blocks, and on the T4.1 this is 2.3%. Neither processor thus has a very "heavy" task at hand.

Thanks !
 