To add on to what the others have asked/answered.
It is sometimes hard to know exactly how to answer. Many times the questions don't have enough information to give a decent answer. For example, which Teensy is this for? Are you just wanting to learn something, or is there some underlying problem you are trying to solve?
One of the great things about Arduino on most platforms is that almost all of the source code for the system is installed on your machine. So if you wonder how something is done, you can easily just look at the sources. The only exceptions I have worked on are some of the Robotis boards (OpenCM, OpenCR), where they build and release archive files (plus header files). So anytime I want to understand how something is done, I usually bring up an editor (in my case Sublime Text). If I am lucky and already have the Teensy directories open as a folder, it finds what I am looking for in the symbol list; otherwise I just do a global search over the whole open folder, or a subset of folders, to find the symbol.
But simple answer:
The function delay is built into the core, and maybe you can or maybe you cannot replace it.
That is, in the case of T4.x, the function is in the source file delay.c, which I believe only contains two symbols (delay and micros), neither of which is defined with the weak attribute. The core files are built and put into an archive file (.a), and the linker only pulls an object file out of the archive when it needs a symbol from it. So if your sketch defined both of these symbols, my guess is the linker might be happy, since it would never need that object file. If you only defined your own version of delay, then the linker will still bring this file in to get micros, and as such it will error out with duplicate symbols. In the case of T3.x and TLC, delay is in pins_teensy.c, and there are a lot of symbols defined in that file, so it would be a pain to try to replace just delay... Of course in either case you could simply edit the sources and change them however you need. But again, this affects everything you compile using those sources.
Now is yours more efficient? (Actually, assuming you use @defragster's version, which handles the case where the timer overflows and goes back to 0...) What you gain is not calling yield.
So in that case it could be faster. Of course it will also break anything that makes use of the yield functionality, such as Arduino features like void serialEvent() {....} to process an event...
But over the last few releases of Teensyduino, we put work in to help minimize the cost of yield. In many cases it may boil down to checking one or two flags and returning...
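To make the trade-off concrete, here is a rough sketch of what a delay without yield might look like. It is not the core's implementation and not necessarily what was posted earlier in the thread. The names fake_millis, elapsed_ms and busy_delay are ones I made up for illustration, and the clock is passed in as a function pointer only so the code can be tried on a PC; on a Teensy you would pass millis instead. The point is the unsigned subtraction, which stays correct when the counter overflows and goes back to 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the hardware millisecond counter, so this can be
// exercised off-target. It advances by 1 ms on every call.
static uint32_t fake_ms = 0;
static uint32_t fake_millis() { return fake_ms++; }

// Wrap-safe elapsed time: unsigned subtraction is modulo 2^32, so the result
// is still right after 'now' has rolled over past 0.
static uint32_t elapsed_ms(uint32_t start, uint32_t now) {
    return now - start;
}

// Busy-wait for at least 'ms' ticks of the supplied clock, without calling
// yield(). 'now_fn' is an assumed parameter; on a Teensy it would be millis.
static void busy_delay(uint32_t ms, uint32_t (*now_fn)()) {
    assert(now_fn != nullptr);
    const uint32_t start = now_fn();
    while (elapsed_ms(start, now_fn()) < ms) {
        // Intentionally empty: nothing that depends on yield() runs in here,
        // so things like serialEvent() are not serviced while we wait.
    }
}
```

In a sketch you would call something like busy_delay(100, millis). Comparing against a precomputed end time (start + ms) instead of subtracting is the version that goes wrong near the overflow.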
Hope that helps.