Cheers people, hope everyone is okay!

I've been working on an audio project with a T3.6 and recently migrated my code to a T4.0. I'm now facing several problems that I don't really know how to solve, and I'm banging my head against the wall. I just hope somebody can help. It's a long shot, but I have to try.

General description
I'm using a 44100 Hz timer interrupt (basically a uint16 counter that wraps around every 1.486 s) to generate and process the audio signal. Processed audio is (or should be) stored in a 128-sample int16 buffer array, which is then fed to AudioPlayQueue 512 times per counter cycle (~345 times per second). AudioPlayQueue is connected to the I2S output in the Audio System Design Tool, and the output goes to a PCM1681 DAC (or an SGTL5000 Audio Shield).
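
For anyone checking my numbers, here is a small sketch of the arithmetic behind the figures above. The constant and function names are made up for illustration and are not part of my project or the Audio library. It only assumes a 16-bit counter incremented once per sample at exactly 44100 Hz.

```cpp
#include <cstdint>

// Figures from the description: 44100 Hz tick, 128-sample blocks,
// and a uint16_t counter that wraps after 65536 ticks.
constexpr uint32_t kSampleRate  = 44100;
constexpr uint32_t kBlockSize   = 128;
constexpr uint32_t kCounterSpan = 65536;

// Seconds until the 16-bit counter wraps around.
constexpr double wrapSeconds() {
  return static_cast<double>(kCounterSpan) / kSampleRate;
}

// Number of full blocks handed to the queue per counter cycle.
constexpr uint32_t blocksPerWrap() {
  return kCounterSpan / kBlockSize;
}

// Number of blocks handed to the queue per second.
constexpr double blocksPerSecond() {
  return static_cast<double>(kSampleRate) / kBlockSize;
}

// Compile-time checks against the numbers quoted in the post.
static_assert(blocksPerWrap() == 512, "512 blocks per counter cycle");
static_assert(wrapSeconds() > 1.486 && wrapSeconds() < 1.487, "wraps every ~1.486 s");
static_assert(blocksPerSecond() > 344.0 && blocksPerSecond() < 345.0, "~345 blocks per second");
```

So each block covers 128 / 44100 s, which is roughly 2.9 ms of audio.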

Code:
#include <Audio.h>
AudioPlayQueue           audio;
AudioOutputI2S           i2s1;
AudioConnection          patchCord1(audio, 0, i2s1, 0);

#define SAMPLE_RATE 44100
#define BLOCK_SIZE 128
static uint16_t cycle=0;
static int16_t waveform[BLOCK_SIZE];
static int16_t input;
extern const int16_t wavetable[2048];

void audioInt()
{
  uint8_t cbX=cycle%(BLOCK_SIZE);
  input=wavetable[cycle>>5];
  // do some audio processing
  waveform[cbX]=input;
  if (cbX==(BLOCK_SIZE-1))
  {
    int16_t *p1 = audio.getBuffer();
    memcpy(p1, waveform, BLOCK_SIZE<<1);
    audio.playBuffer();
  }
  cycle++; // automatically reset at 65536
}
void setup()
{
  noInterrupts();
  AudioMemory(128);
  IntervalTimer *t1 = new IntervalTimer();
  t1->begin(audioInt, 1000000.0f / (float)SAMPLE_RATE);
  interrupts();
}
void loop()
{  
  //read some inputs
}
1. AudioPlayQueue and speed
The description of AudioPlayQueue.getBuffer() states "This buffer is within the audio library memory pool, providing the most efficient way to input data to the audio system", but I'm not sure I'm actually using it in the most efficient way. I couldn't find any meaningful instructions for AudioPlayQueue anywhere; it's very poorly documented. The transfer runs inside an audio-rate interrupt and is executed about 345 times per second, alongside other greedy math-crunching work in the remaining cycles, so I wonder whether there is a faster or better way to do this, especially because I'm using more than one channel of audio.
Code:
    int16_t *p1 = audio.getBuffer();
    memcpy(p1, waveform, BLOCK_SIZE<<1);
    audio.playBuffer();
Do I absolutely have to get a new pointer from getBuffer() for every transfer? Is there a better/faster/more efficient way to get the array into the getBuffer() buffer than memcpy? Is there a way to write directly to the I2S DMA buffers without using AudioPlayQueue? I've stumbled upon class AudioOutputI2S : public AudioStream in <output_i2s.h>, but I have no idea whether it can be used for this, or how.
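
To make the memcpy question concrete, this is the kind of thing I have in mind: fetch the pointer once at the start of a block and write each sample straight into it, instead of filling waveform[] and copying. It is only a sketch and I have not confirmed it is safe on real hardware. StandInQueue and BlockWriter are hypothetical names I made up so the idea compiles off-target; StandInQueue is NOT the library class and only mimics the two calls from my code. I'm also assuming getBuffer() could return a null pointer in some library versions, which I have not verified.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kBlockSize = 128;  // same value as BLOCK_SIZE in the post

// Hypothetical stand-in for AudioPlayQueue, for off-target compilation only.
struct StandInQueue {
  int16_t block[kBlockSize] = {};
  unsigned played = 0;                      // counts playBuffer() calls
  int16_t *getBuffer() { return block; }    // real class hands out a pool block
  void playBuffer() { ++played; }           // real class queues the block
};

// Writes samples directly into the queue's buffer, one per timer tick.
template <typename Queue>
class BlockWriter {
 public:
  explicit BlockWriter(Queue &q) : queue_(q) {}

  // Returns false if no buffer was available and the sample was dropped.
  bool push(int16_t sample) {
    if (index_ == 0) dest_ = queue_.getBuffer();  // once per block
    const bool ok = (dest_ != nullptr);
    if (ok) dest_[index_] = sample;               // no intermediate array
    if (++index_ == kBlockSize) {                 // block complete
      if (ok) queue_.playBuffer();
      index_ = 0;
      dest_ = nullptr;
    }
    return ok;
  }

 private:
  Queue &queue_;
  int16_t *dest_ = nullptr;
  size_t index_ = 0;
};
```

In audioInt() this would replace waveform[cbX]=input and the memcpy with a single writer.push(input), assuming the real getBuffer() keeps returning the same block until playBuffer() is called.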

2. T4.0 vs. T3.6 issue
This code ran without a problem on the T3.6 for hours, but on the T4.0 it works for maybe 3 minutes and then my Teensy freezes. I checked AudioProcessorUsage and AudioMemoryUsage: CPU stays under 0.5%, but AudioMemory consumption increases every few seconds until it is completely exhausted. It doesn't matter even if I declare AudioMemory(512) in setup(); it still freezes somewhere around 160. Maybe that's not even related, I have no way to tell. I can only speculate that the block obtained from AudioPlayQueue.getBuffer() isn't released after AudioPlayQueue.playBuffer() is executed. Maybe a problem that isn't normally visible shows up because of the interrupt routine and the speed? Is this a bug, or am I doing something wrong? Like I said, it worked like a charm on the T3.6.

3. I2S issue
I solved this one in the meantime. Turns out the problem was between the monitor and the chair.

P.S. Two more bits at the end: is it better to use a 44117 Hz sample rate instead of 44100 Hz for I2S on T3.6/T4.0? Also, can I make I2S on the T4.0 run at 22050 Hz, and if so, how?
Thanks in advance for your kind answers, stay safe. And a big thank you to Paul and everybody else involved in Teensy development