I believe that, provided the pin number is a constant, digitalReadFast(N) will compile to a single assembler instruction.
Yes, or at least the minimum possible number of instructions.
Unlike AVR, which has dedicated I/O instructions, ARM has only general-purpose load and store instructions. The I/O ports have atomic set and clear registers, which digitalWriteFast() uses, so only a single store instruction is generated. But at least 2 other instructions are needed, to initialize one register with the address and another with the data. Then the store instruction happens, using those 2 values. Neither can be encoded directly into the instruction, as is done on AVR for the first few 8-bit ports.
Depending on your surrounding code, often the compiler will move those other 2 initialization instructions outside of loops, so it can reuse the values if you're doing more than 1 digitalWriteFast(). But if you're calling other functions or doing a lot of math that requires keeping many local variables or temporary results, or even similar expressions used multiple times, the compiler might decide those registers are better allocated to speed up your other code. Then it would put the initialization near the store instruction.
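To make the set/clear register idea concrete, here is a minimal sketch that compiles on a desktop. It is not the actual core code: FakePort, fake_port and writeFastSketch() are hypothetical names, and the struct only stands in for one port's memory-mapped registers. The point is that when the bit number and value are compile-time constants, the branch folds away and what remains is one store of a constant mask to a constant address.

```cpp
#include <cstdint>

// Hypothetical stand-in for one port's registers so this builds off-target.
// On real hardware these would be fixed memory-mapped addresses.
struct FakePort {
    volatile uint32_t set_reg;    // writing a 1 bit drives that output high
    volatile uint32_t clear_reg;  // writing a 1 bit drives that output low
};

static FakePort fake_port;  // zero-initialized

// Sketch of a "fast write": no read-modify-write, just one store.
// With constant arguments the compiler can resolve the branch and the
// mask at compile time, leaving load-address, load-mask, store.
static inline void writeFastSketch(FakePort &port, uint8_t bit, bool value) {
    const uint32_t mask = 1u << bit;
    if (value) {
        port.set_reg = mask;
    } else {
        port.clear_reg = mask;
    }
}
```

Calling writeFastSketch(fake_port, 5, true) several times in a loop gives the compiler the chance to keep the address and mask in registers across iterations, which is the hoisting described above.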
digitalReadFast() does the same, with a single load instruction. It returns 0 or 1, but since it's inline, if your surrounding code only checks boolean true vs false, the compiler will optimize away the conversion from the bitmask to the specific numerical values 0 or 1.
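The read side can be sketched the same way. Again this is only an illustration under assumptions: fake_input_reg, readFastSketch() and countIfHigh() are hypothetical names, and a plain variable stands in for the port's input register.

```cpp
#include <cstdint>

// Hypothetical stand-in for a port's memory-mapped input register.
static volatile uint32_t fake_input_reg = 0;

// Sketch of a "fast read": one load, a mask, then conversion to 0 or 1.
static inline uint8_t readFastSketch(uint8_t bit) {
    return (fake_input_reg & (1u << bit)) ? 1 : 0;
}

// The caller only tests true vs false, so once readFastSketch() is inlined
// the compiler is free to branch on the masked value directly and skip
// materializing the 0 or 1.
static inline int countIfHigh(uint8_t bit, int count) {
    if (readFastSketch(bit)) {
        ++count;
    }
    return count;
}
```

If the result were stored or used in arithmetic instead, the 0 or 1 would have to be produced for real, at the cost of an extra instruction or two.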
Doesn't the K20 have an interrupt cause-choice of "EITHER" for edge detection?
Yes, the pin interrupt hardware supports rising, falling, or both edges, as well as low-level and high-level sensitive modes. Details are in section 11.14.1 of the reference manual.
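From a sketch, the both-edges mode is normally reached through the standard Arduino attachInterrupt() call with CHANGE. The example below is a rough illustration, not tested on hardware: onEdge(), edge_count and setupEdgeCounter() are hypothetical names, and the #else branch holds desktop stand-ins (including a made-up value for CHANGE) only so the code compiles off-target.

```cpp
#include <cstdint>

#ifdef ARDUINO
#include <Arduino.h>
#else
// Desktop stand-ins (hypothetical) so this builds without the Arduino core.
enum { CHANGE = 1 };  // placeholder value; the real one is core-specific
static void (*stub_isr)() = nullptr;
static void attachInterrupt(uint8_t, void (*isr)(), int) { stub_isr = isr; }
#endif

// Counts every edge seen on the pin, rising or falling.
static volatile uint32_t edge_count = 0;

static void onEdge() {
    edge_count = edge_count + 1;
}

// CHANGE asks for an interrupt on either edge of the pin.
static void setupEdgeCounter(uint8_t pin) {
    attachInterrupt(pin, onEdge, CHANGE);
}
```

On the target you would call setupEdgeCounter() once from setup(), with whichever pin you are monitoring.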