I think there will be no problem space-wise on a Teensy 3.1. The 3.0 should be close, but possible with a small MTU. How large is an MP3 frame anyway?
I never tried a Teensy 3.0. Maybe it works?
But there is no reason to try it.
The 3.1 has enough memory to play it. What matters is that the transfer speed is sufficient and "real-time" enough that fresh data is there when it is needed.
Since internet stations sometimes have hundreds of users per stream, there are some limitations...
For example, most servers try to send several seconds of audio data at the start of streaming to fill the client's buffers (I've seen a German station that tried to send more than one minute).
You could try to ignore most of these packets at startup, but the server thinks you have enough data buffered and will send new data at irregular intervals, which may be too long. The servers know how much data they have sent you, and they *think* they know when the client's buffer is nearly empty. But this only works if you buffered all the data.
There is no handshake at a higher level. It's plain HTTP, like transferring a download or a picture. So the MTU and the frame size (1152 samples per frame plus header for MP3; something in this range for AAC too, I forgot the exact number) are not important. The buffer is important, nothing else.
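To put a number on the frame-size question, here is a rough sketch. It assumes MPEG-1 Layer III (probably the usual case for internet radio), where one frame carries 1152 samples and its byte size follows from bitrate and sample rate. The function names are made up for illustration.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def mp3_frame_bytes(bitrate_bps, sample_rate_hz, padding=False):
    """Size of one MPEG-1 Layer III frame in bytes, header included."""
    # 1152 samples per frame / 8 bits per byte = 144
    size = 144 * bitrate_bps // sample_rate_hz
    return size + (1 if padding else 0)


def mp3_frame_ms(sample_rate_hz):
    """Playing time of one frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


if __name__ == "__main__":
    print(mp3_frame_bytes(128000, 44100), "bytes per frame")
    print(round(mp3_frame_ms(44100), 1), "ms per frame")
```

At 128 kbit/s and 44.1 kHz that is 417 bytes per frame (418 with the padding bit set), roughly 26 ms of audio. A single frame is tiny compared to the seconds of audio the server pushes at startup, which fits the point that the buffer matters and the frame size does not.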
To make it even worse, the metadata is embedded in the MP3 stream, and all you have is a byte counter that indicates where it is. If you get out of sync because you lost a packet (or ignored it), you no longer know the position... and you hear blips, chirps and noise from the decoder, because the decoders are not able to handle metadata. You must restart the stream. Syncing to the frame header does not help, because the only point where the byte counter is set to zero is at the beginning... (well, and with every metadata packet, but when you don't know its position... -> crap)
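The byte counter described above is, as far as I know, the SHOUTcast/ICY scheme: the client sends the request header `Icy-MetaData: 1`, the server answers with `icy-metaint: N`, and after every N audio bytes comes one length byte followed by length*16 bytes of metadata. A minimal sketch of splitting such a stream follows. The class and attribute names are hypothetical, and it has not been run against a real station.

```python
class IcyDemux:
    """Splits a raw ICY stream into audio bytes and metadata blocks."""

    def __init__(self, metaint):
        self.metaint = metaint      # audio bytes between metadata blocks
        self.audio_left = metaint   # audio bytes until the next length byte
        self.meta_left = None       # None: the length byte has not been read yet
        self.meta = bytearray()     # metadata block being collected
        self.blocks = []            # completed, non-empty metadata blocks

    def _finish_block(self):
        if self.meta:
            # metadata is padded with zero bytes up to a multiple of 16
            self.blocks.append(bytes(self.meta).rstrip(b"\0"))
        self.meta = bytearray()
        self.meta_left = None
        self.audio_left = self.metaint  # the counter restarts here

    def feed(self, chunk):
        """Consume raw stream bytes and return only the audio part."""
        audio = bytearray()
        i = 0
        while i < len(chunk):
            if self.audio_left > 0:
                n = min(self.audio_left, len(chunk) - i)
                audio += chunk[i:i + n]
                self.audio_left -= n
                i += n
            elif self.meta_left is None:
                # one length byte, counted in units of 16 bytes
                self.meta_left = chunk[i] * 16
                i += 1
                if self.meta_left == 0:
                    self._finish_block()
            else:
                n = min(self.meta_left, len(chunk) - i)
                self.meta += chunk[i:i + n]
                self.meta_left -= n
                i += n
                if self.meta_left == 0:
                    self._finish_block()
        return bytes(audio)
```

The sketch only works if every byte since the start of the stream went through `feed()`. Drop a chunk and `audio_left` is wrong from then on, so metadata ends up in the decoder, which is exactly the blips-and-chirps problem.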
For a local setup with your own streaming service in your LAN, there are ways to fine-tune this, or to use better protocols. This shouldn't be a problem (I did not try it).
Edit: It may work to send no "ACK" or something like this when you run out of free buffers, and indeed I remember that this helped a bit when I built my first streaming radio some years ago (it had LAN with an STM Cortex-M3 + FRAM + MP3 decoder chip). (I do the same with the ESP8266 using RTS/CTS.)
But I also remember that some servers (or proxies?) stopped the streaming after a minute or so. It was not reliable.
Maybe it is too much workload for the server, or too much handling for your particular client is detected and classified as "not acceptable". I don't know.
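On an ordinary TCP socket, the closest thing to "sending no ACK" is simply not reading from the socket: the receive window fills up and the sender has to wait. Here is a small sketch of a buffer that tells the network side when to stop and when to resume reading. All names are made up, and how a given embedded stack exposes this is an assumption on my side.

```python
class BoundedBuffer:
    """Byte buffer with a pause/resume hint for the network reader."""

    def __init__(self, capacity, resume_below):
        self.capacity = capacity          # total bytes we can hold
        self.resume_below = resume_below  # resume reading at or below this fill level
        self.data = bytearray()
        self.paused = False

    def free(self):
        return self.capacity - len(self.data)

    def should_read(self):
        """True if the network side may read more from the socket."""
        if self.paused and len(self.data) <= self.resume_below:
            self.paused = False
        elif not self.paused and self.free() == 0:
            self.paused = True
        return not self.paused

    def push(self, chunk):
        """Store at most free() bytes and return how many were accepted."""
        n = min(len(chunk), self.free())
        self.data += chunk[:n]
        return n

    def pop(self, n):
        """Hand up to n bytes to the decoder."""
        out = bytes(self.data[:n])
        del self.data[:n]
        return out
```

The gap between "full" and `resume_below` keeps the reader from toggling on every single byte. It does nothing about the servers that drop a slow client after a minute.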
All good clients have large buffers..
If there is a way around that, I don't know it.
Edit:
None of this is a show-stopper for any Ethernet stack, but I see no way to do really reliable streaming without enough memory, in the range of 128..256 KB minimum, better much more, especially with bitrates > 128 kbit/s.
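For a feeling of what those sizes mean in playing time, a quick back-of-the-envelope sketch. It assumes a constant bitrate and ignores metadata and protocol overhead, and the function names are only illustrative.

```python
import math


def buffer_seconds(buffer_bytes, bitrate_bps):
    """Seconds of audio that fit into a buffer at a constant bitrate."""
    return buffer_bytes * 8 / bitrate_bps


def buffer_bytes_needed(seconds, bitrate_bps):
    """Bytes needed to hold the given number of seconds of audio."""
    return math.ceil(seconds * bitrate_bps / 8)


if __name__ == "__main__":
    for kb in (128, 256):
        for kbps in (128, 320):
            secs = buffer_seconds(kb * 1024, kbps * 1000)
            print(f"{kb} KB at {kbps} kbit/s: {secs:.1f} s")
```

128 KB holds about 8 seconds at 128 kbit/s but only about 3 seconds at 320 kbit/s, which is why higher bitrates need noticeably more memory to ride out the irregular delivery described above.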