Years ago, I wrote a 7-band Goertzel DTMF decoder for a different AVR chip. It worked quite well. The Goertzel algorithm was coded in hand-optimized assembly and used under 15% of the CPU time for all 7 bands.
That old code was never released. It's collecting dust somewhere.