What distances are you trying?
A maximum delay of 128 samples corresponds to about 3 ms, or roughly 1 m of distance,
so if you are, say, 50 cm away you should see the peak at about 64 samples.
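
To make the arithmetic explicit, here is a rough sketch of the conversion. It assumes a 44.1 kHz sample rate (which is what 128 samples in about 3 ms implies) and a speed of sound of 343 m/s; both constants and the function names are my assumptions, so adjust them for your setup.

```python
# Assumed values, not taken from your setup: change them if yours differ.
SAMPLE_RATE_HZ = 44_100.0    # assumed audio sample rate
SPEED_OF_SOUND_M_S = 343.0   # approximate speed of sound in air at room temperature


def samples_to_distance_m(n_samples, fs=SAMPLE_RATE_HZ, c=SPEED_OF_SOUND_M_S):
    """Distance sound travels during n_samples sample periods."""
    return n_samples / fs * c


def distance_m_to_samples(distance_m, fs=SAMPLE_RATE_HZ, c=SPEED_OF_SOUND_M_S):
    """Delay, in samples, for sound to travel distance_m."""
    return distance_m / c * fs


if __name__ == "__main__":
    print(f"128 samples -> {samples_to_distance_m(128):.2f} m")
    print(f"0.5 m -> {distance_m_to_samples(0.5):.1f} samples")
```

With these assumed values, 128 samples works out to just under 1 m and 50 cm to roughly 64 samples, matching the numbers above.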

The I2S input may also introduce about 3 ms of delay, so you could try increasing the max delay to, say, 256 or 512 to see if there are consistent peaks
(you may need to increase the number of Audio buffers).