SongBeam - realtime beamforming with four MEMS mics

DD4WH

Well-known member
The SongBeam is a new add-on board for the Teensy 4.1 and was developed at Royal Holloway, University of London:


It is open source and can be purchased at LabMaker: https://www.labmaker.org/collections/earth-and-ecology/products/songbeam
It is meant as a four-channel recorder and the beamforming is intended to be done from the recordings on your PC.

I tried to develop code for realtime delay-and-sum beamforming on the Teensy 4.1, and it kind of works. Code can be found here:


The beamforming effect is not really impressive, and I suspect it is my coding that is the cause. I can easily hear an effect when I turn the PCB with the mics around.

My question would be whether anyone has an idea how to improve the beamforming effect.

I already did the following (with the help of perplexity.ai, but also using GitHub repositories from you Teensy forum users :) ):

* bandpass FIR filter the audio from the 4 mics (surely helpful, because the beamforming is only effective from about 2 kHz to 8 kHz)
* normalize the audio from the 4 mics (not sure whether this is effective/correct)
* precalculate the coefficients for an FIR fractional-delay filter with 5x oversampling (I am a novice in this topic, so maybe there is room for optimization here)
* apply the FIR filter in the frequency domain (standard fast convolution via FFT)
* sum up the four outputs to get the delay-and-sum beamforming output
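For readers following along, the fractional-delay step in that list can be sketched as a windowed sinc. This is a minimal illustration under my own assumptions (plain Hamming window, time-domain coefficients, hypothetical function name), not the SongBeam code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch: windowed-sinc coefficients for an FIR
// fractional-delay filter.
// taps:  filter length
// delay: desired delay in samples, may be fractional
std::vector<float> fracDelayFir(int taps, float delay) {
    const float PI = 3.14159265358979f;
    const float center = (taps - 1) / 2.0f + delay;  // where the sinc peaks
    std::vector<float> h(taps);
    float sum = 0.0f;
    for (int n = 0; n < taps; ++n) {
        float x = n - center;
        float s = (std::fabs(x) < 1e-6f) ? 1.0f
                                         : std::sin(PI * x) / (PI * x);  // sinc
        // Hamming window keeps the truncated sinc's sidelobes down
        float w = 0.54f - 0.46f * std::cos(2.0f * PI * n / (taps - 1));
        h[n] = s * w;
        sum += h[n];
    }
    for (float &c : h) c /= sum;  // normalize DC gain to 1
    return h;
}
```

For an integer delay this collapses to a shifted impulse; for a fractional delay the sinc spreads the energy over neighbouring taps, which is exactly what lets the beamformer steer between sample instants.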

Maybe you have ideas where I could best try to optimize in order to get a more pronounced effect of the beamforming. However, I am a novice in beamforming, so maybe I overrate the effect I can get from this small minimal array with four mics in a row with distances of 45mm, 30mm and 45mm.
 
I simplified the calculation of the beamforming (now without expansion). Also, the beamforming effect should be tested with an audio source in the far field, i.e. at least 1 m away from the mic array, and with the beam steered to an angle between ±30 and ±60 degrees. In that case, the attenuation of orthogonal sound sources can reach up to 20 dB. I made the mistake of testing at 0 degrees, which makes all delays zero and allows at most 6 dB attenuation in the best case. New code is on GitHub.
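To illustrate why steering at 0 degrees gives all-zero delays: for a far-field source, the delay of each mic relative to mic 0 is d·sin(θ)/c. A minimal sketch, where the helper name and the 44.1 kHz sample rate are my assumptions:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: far-field steering delay of one mic relative to
// the reference mic at position 0, for a linear array.
// micPosM:    mic position along the array, in metres
// steerDeg:   steering angle off broadside, in degrees
// sampleRate: audio sample rate, in Hz
float steerDelaySamples(float micPosM, float steerDeg, float sampleRate) {
    const float c = 343.0f;                           // speed of sound in air, m/s
    const float rad = steerDeg * 3.14159265f / 180.0f;
    return micPosM * std::sin(rad) / c * sampleRate;  // may be fractional
}
```

With the mic positions 0/45/75/120 mm at 44.1 kHz, steering to 45 degrees gives delays of roughly 4.1, 6.8 and 10.9 samples, so fractional-delay filters are genuinely needed; at 0 degrees everything collapses to zero.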
 
That's interesting! I have not yet had contact with beamforming and do not understand why you need 4 microphones in a line instead of two.
I wonder if the focus effect could be stronger if the distances between the microphones were larger? Also, calibration of sensitivity between the microphones using a sound source on the centre line might be helpful?
 
Interestingly, I found it very hard to find general information on microphone beamforming that is suitable for non-EE engineers like me.

But I think your questions could be answered here, I found that info very helpful:


More mics -> narrower beam, better attenuation
Distance between mics influences the frequencies that can be nicely beamformed.

I think calibration of mic sensitivity is overrated :). For that purpose, the normalization I use in the code should be good enough.

I am still in the process of testing and playing around, trying to understand what I am doing.

However, the beamforming with the recent code seems to work pretty well now.
 
Alain, not sure what you mean; you can order it at LabMaker, see the link in the first post.
 
I made the mistake of testing at 0 degrees, which makes all delays zero and allows at most 6 dB attenuation in the best case. New code is on GitHub.
Frank, a couple quick comments.
I have not yet checked the implementation, but I like that you use fractional delays (with a sinc function).
No, you must be able to look broadside (all delays are zero).
The beam pattern is defined by the overall aperture, and when beams are plotted on a cosine scale they all have the same width and off-beam attenuation.
 
Walter, thanks for your comment! Still learning a lot in this interesting topic. The mics are unequally spaced at 0, 45 mm, 75 mm, 120 mm. This probably influences beam shape and sidelobes differently depending on steering angle, maybe?
I may try to implement some more advanced algorithm with the help of some AI, but I am unsure whether it's worth the extra effort for such a low number of mics. I have to read some more first.
And I have to think about a way to objectively test the strength/quality of the beamforming; no idea at the moment how to do this at home.
Very interesting topic and thanks to Robert Lachlan for designing SongBeam and putting it into the open source domain.
 
"New orders are expected to be fulfilled around mid-June, 2025."
Alain, the only thing I can say is that I successfully ordered one a few weeks ago. I think they left an old comment on the website (available for 8 months 😉)?
 
Oh I have to read more carefully.

Does it need the extra pins right next to (before) the SD card holder? Those are non-standard on a 4.1 with pins.
 
Interesting subject that I haven't come across before - thanks for the link to the basics it is really helpful.

Sounds like you're no stranger to using the accumulated knowledge of all humanity to help with coding problems, and that you understand the limitations. I asked my pals, Claude Code and Codex, who came up with a couple of points that looked credible and a lot of second-order ideas (attachment).

Bug: Bandpass Filter Window
There is a bug in 0_SongBeam256.ino:1095-1096 that significantly degrades the bandpass filter:

float32_t window = 0.42f - 0.5f * cosf(2.0f * M_PI * n / (NUM_TAPS - 1)) +
0.08f * cosf(4.0f * M_PI * n / (NUM_TAPS - 1));

The window uses NUM_TAPS - 1 (= 128), but the bandpass filter is FIR_NUM_TAPS = 64 taps long. It should use M (= 63). As written, only the rising half of the Blackman window is applied to the 64 coefficients, making the filter asymmetric with very poor stop-band rejection. This means out-of-band noise is leaking through, degrading the beamforming SNR.

Fix: Replace (NUM_TAPS - 1) with M on both lines.
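For reference, here is my reconstruction of the corrected window with M as the denominator (this is a sketch using the constants named above, not the repository code verbatim):

```cpp
#include <cassert>
#include <cmath>

const int FIR_NUM_TAPS = 64;
const int M = FIR_NUM_TAPS - 1;  // = 63, the correct window denominator

// Blackman window evaluated over the full 64-tap span, so the window is
// symmetric and tapers to zero at both ends, giving proper stop-band
// rejection for the bandpass FIR.
float blackman(int n) {
    const float PI = 3.14159265358979f;
    return 0.42f - 0.5f * std::cos(2.0f * PI * n / M)
                 + 0.08f * std::cos(4.0f * PI * n / M);
}
```

With the wrong denominator (128), n/128 only sweeps the first half of the cosine, so the window rises but never falls again, which is exactly the asymmetry described above.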

Amplification Before Summation
At line 455, each mic channel is amplified by 10x before the 4-channel sum:

arm_scale_f32(&float_buffer[mic][0], 10.0f, &float_buffer[mic][0], BLOCK_SIZE);

This amplifies noise on each channel before the beamformed sum can provide array gain. Uncorrelated noise from all 4 mics gets boosted equally, reducing the effective SNR improvement of the beamformer. The 10x scaling should be applied after the beam sum at line 464, where coherent signal has added constructively and incoherent noise has partially cancelled.
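The suggested reordering could look like this; a hypothetical sketch with made-up names (plain loops instead of the CMSIS calls), not the original SongBeam code:

```cpp
#include <cassert>
#include <cmath>

const int NUM_MICS = 4;
const int BLOCK_SIZE = 128;

// Sum the four channels first, then apply the 10x gain once to the
// beamformed output, instead of scaling each channel before the sum.
void beamSumThenScale(const float in[NUM_MICS][BLOCK_SIZE],
                      float out[BLOCK_SIZE], float gain) {
    for (int i = 0; i < BLOCK_SIZE; ++i) {
        float acc = 0.0f;
        for (int m = 0; m < NUM_MICS; ++m)
            acc += in[m][i];   // coherent sum across mics
        out[i] = gain * acc;   // single scaling after the sum
    }
}
```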


I'm interested in applications - are you using it for wildlife recording or something else?
 

Attachments

  • claude.txt (5.4 KB)
  • codex.txt (4.5 KB)
For video conferencing there are some microphone arrays which claim to dynamically focus onto an active speaker to suppress surrounding noise. I wonder if this could be used for bird songs too, if the direction of the bird is not known in advance. This would give an advantage over a parabolic dish ("bowl")?
 
Having done array processing and beamforming for over 40 years, I read the Claude and Codex observations with interest. As is to be expected, it is half science, half fantasy (or not applicable).
eg.
3. Broadband delay-and-sum is frequency-blind
The same time delay is applied at all frequencies. Low frequencies get almost no benefit; high frequencies may get grating lobes. A frequency-dependent approach would be more effective.
Time-delay-and-sum is indeed broadband and is "frequency-blind".
Low frequencies get almost no benefit
Yes, but this is valid for all types of beamforming.
high frequencies may get grating lobes
Yes, but only above the design frequency of the array, which is given by the spacing of the phones. No serious user will use sparse arrays for beamforming without first doing coarse time-delay bearing estimation (zooming into a coarse direction).
A frequency-dependent approach would be more effective.
This is not correct for broadband applications, but only for narrow-band (single-frequency) signals. A K-Omega beamformer is, however, a useful tool to watch directional spectra. Combining different beams in frequency space is equivalent to an interpolation filter in the time domain.

@DD4WH, you may have a look at https://www.passiveacoustics.org/2025/03/23/multi-phone-processing-3/ and play with it
 
Thanks a lot, @houtson ! Amazing that the AI found the two bugs, I corrected them in my code.

However, the two other reported "bugs" are not bugs. The AI failed to recognize the full initialization of the FFT mask buffers and was also wrong about the scaling in arm_cfft_f32, which in fact already does the divide by FFT size in the inverse-FFT CMSIS code.

The SongBeam was set up as a time-scheduled standalone four-channel mic array recorder for recording birds and other wildlife. The recordings can then be beamformed in the lab in order to get better automatic IDs from BirdNET or other classifiers.

As the SongBeam is using a T4.1 (which is complete overkill for recording-only), I wanted to give it a little more work to do ;-). It is just me playing around a bit in order for me to learn about beamforming DSP processing. In the future, maybe I would like to use mic arrays in the field for localizing birds in order to make better estimates of population density in birds and other acoustically active animals. Maybe also I could give some of my BSc students some task to do within this topic. Just thinking out loud at the moment.

@WMXZ, thanks for your comments and thanks for the link to your webpage; I have to spend a little longer with that in order to fully grasp it. I am not so familiar with MATLAB code, so I have to learn!

@AlainD - yes the pins are needed, because they route the RTC backup battery to the Teensy.

@Mcu32 - Frank, thanks for that link, looks interesting and quite cheap. Maybe hackable for an alternative array? Does it use electret mics and preamps and separate ADCs? Then maybe one could simply cut the I2S traces to the MCU and connect them to a Teensy?

Thanks to all of you for your thoughts and comments.
 
I don't know exactly what kind of microphones these are. They have a defined distance between them, that's all I know. They could also be digital. With a little searching, you can find a PDF with a technical drawing and Chinese text (ChatGPT helps to translate it). There should also be a stereo output for the built-in echo cancellation; this is also shown in the USB data. The DSP's marking is sanded off, so no label is visible. But I also think the thing is intended for the frequency range of human voices (conference systems, toll booths, etc.), so it might not work so well for other frequency ranges.

Why don't you connect it to a Raspberry Pi (e.g., nano 2w or better; ARM too, with full-featured NEON instructions)? They are also faster than a Teensy, and you have much more RAM. C++ also works there. I don't think you can access the I2S data; just use USB, it's recognized cleanly, even under Linux, and you can work on a decent computer, which speeds up development a lot. You can always move to something smaller later without too much fuss.
 
I have a couple of digital MEMS mics and I always wanted to do the beamforming thing in air (not always underwater, which is so expensive). Maybe I will finally put them to work. As I need some additional challenges, I will first do it on a T4.1 (the trivial part) and then on an RP2350, doing the I2S with PIO (I have it for a single data line, so I need to extend it to multiple data lines).
 