"Hi qix67, at 180 MHz, you would have 180,000,000/(320*240)/60 = 39 cycles for each single pixel for a 320x240 frame. Not a lot, but perhaps doable for a simple RLE decompression, with >50% CPU still free."

I think you are taking a big shortcut in your computation. First, even if the frame buffer is 320x240, nearly all modelines currently use a line repeat of 2. This means the DMA works as if the frame buffer were 320x480, which doubles the number of cycles used.
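To make the effect of the line repeat concrete, here is a rough sketch of the same back-of-the-envelope estimate with the repeat factor added. The function name and parameters are mine, and like the original estimate it ignores blanking intervals, so treat the numbers as an upper bound rather than a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles available per pixel the DMA has to transfer.
 * line_repeat = how many times each frame-buffer line is scanned out.
 * Blanking intervals are ignored (same simplification as the estimate). */
uint32_t cycles_per_pixel(uint32_t cpu_hz, uint32_t width, uint32_t height,
                          uint32_t line_repeat, uint32_t fps)
{
    assert(width > 0 && height > 0 && line_repeat > 0 && fps > 0);
    uint64_t pixels_per_second = (uint64_t)width * height * line_repeat * fps;
    return (uint32_t)(cpu_hz / pixels_per_second);
}
```

With 180 MHz, 320x240 and 60 Hz this gives 39 cycles for a line repeat of 1, and only 19 for a line repeat of 2.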
While the DMA is running, because it uses byte accesses, it should take at least 5-6 SRAM cycles per pixel. During this time, the CPU cannot access the frame buffer.
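As a rough illustration of what 5-6 cycles per pixel means, the sketch below turns it into a percentage of all CPU cycles during which the DMA holds the SRAM. The helper name is hypothetical, and the calculation assumes the 320x480 at 60 Hz transfer rate from above, with no blanking and no TCD traffic counted.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of CPU cycles during which the DMA occupies the SRAM,
 * given how many pixels it moves per second and the cycles each costs.
 * The result is truncated to a whole percent. */
uint32_t dma_sram_load_percent(uint32_t cpu_hz, uint32_t pixels_per_second,
                               uint32_t dma_cycles_per_pixel)
{
    assert(cpu_hz > 0);
    uint64_t busy = (uint64_t)pixels_per_second * dma_cycles_per_pixel;
    return (uint32_t)(busy * 100u / cpu_hz);
}
```

For 320*480*60 pixels per second at 180 MHz, this gives about 25% at 5 cycles per pixel and about 30% at 6 cycles per pixel.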
When a DMA TCD must be stored and loaded, 32 bytes must be stored to SRAM and 32 bytes must be loaded from SRAM. It is possible to avoid SRAM access conflicts by keeping the frame buffer in SRAM_L (lower 64 KB) and the data in SRAM_U (upper 192 KB), but a conflict will always happen when drawing into the frame buffer. Moreover, SRAM_U requires an additional wait state when accessed (that's what the Kinetis reference manual says).
Then, you might have optimisations like an interrupt only every 8 lines which fills a 320x8 buffer to save on interrupts.
However, in this case, the solution is not 100% hardware anymore. Moreover, this means you can only use sprites, because graphics primitives cannot easily draw into a compressed frame buffer. But I agree with you, working one line at a time is inefficient. Also, 2 buffers should be used, so that one is displayed while the other is being built.
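A minimal sketch of the two-buffer idea, assuming 320x8 blocks and a made-up RLE format of (count, value) byte pairs. None of these names come from an existing library, and on real hardware the swap would have to happen in the every-8-lines interrupt, in step with the DMA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_WIDTH   320
#define BLOCK_LINES  8
#define BLOCK_PIXELS (LINE_WIDTH * BLOCK_LINES)

/* Two block buffers: one is scanned out while the other is filled. */
static uint8_t block_buf[2][BLOCK_PIXELS];
static int front = 0;   /* index of the buffer currently displayed */

/* Decode hypothetical (count, value) RLE pairs into dst.
 * Returns the number of pixels written, never more than dst_len. */
size_t rle_decode(const uint8_t *src, size_t src_len,
                  uint8_t *dst, size_t dst_len)
{
    size_t out = 0;
    for (size_t i = 0; i + 1 < src_len; i += 2) {
        size_t count = src[i];
        uint8_t value = src[i + 1];
        if (count > dst_len - out)
            count = dst_len - out;   /* clamp so dst is never overrun */
        for (size_t k = 0; k < count; k++)
            dst[out++] = value;
    }
    return out;
}

/* Fill the back buffer from RLE data, then make it the front buffer. */
const uint8_t *fill_and_swap(const uint8_t *src, size_t src_len)
{
    int back = 1 - front;
    size_t n = rle_decode(src, src_len, block_buf[back], BLOCK_PIXELS);
    assert(n <= BLOCK_PIXELS);
    front = back;
    return block_buf[front];
}
```

The point of the sketch is only that decompression always writes into the buffer that is not being displayed. Drawing primitives would still have to work on the compressed data, which is the limitation mentioned above.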
If everything works as expected, I hope to be able to adjust the pixel width as I want, improve DMA bandwidth usage by using 32-bit accesses instead of 8-bit ones, and solve the memory problem. At first, I want to reach a frame buffer of 800x600 pixels (really) in RGB332, and later 800x600 in RGB565, with only 2 or 3 additional cheap components.
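To put numbers on the memory problem and on the gain from wider accesses, here is a small sketch. It assumes tightly packed pixels (8 bits for RGB332, 16 bits for RGB565) and compares against the 256 KB of on-chip SRAM (64 KB + 192 KB) mentioned earlier. The function names are only illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define ONCHIP_SRAM_BYTES (256u * 1024u)   /* SRAM_L 64 KB + SRAM_U 192 KB */

/* Size in bytes of a tightly packed frame buffer. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height,
                           uint32_t bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return width * height * (bits_per_pixel / 8);
}

/* Number of DMA accesses needed to move `bytes` with a given access width. */
uint32_t dma_accesses(uint32_t bytes, uint32_t access_width_bytes)
{
    assert(access_width_bytes > 0);
    return (bytes + access_width_bytes - 1) / access_width_bytes;
}
```

800x600 comes to 480,000 bytes in RGB332 and 960,000 bytes in RGB565, both well beyond 262,144 bytes of internal SRAM. Moving one RGB332 frame takes 480,000 accesses at 8 bits but 120,000 at 32 bits.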