If you post full code, that might help.
I would expect to be able to do:
Code:
digitalWriteFast(10, LOW); // pin numbers are just examples
digitalWriteFast(0, LOW);
digitalWriteFast(20, LOW); // each callback handler writes its pin HIGH again
SPI.transfer(dmabuf, nullptr, 2, callbackHandler);    // should return at once while DMA runs
SPI3.transfer(dmabuf3, nullptr, 2, callbackHandler3);
SPI4.transfer(dmabuf4, nullptr, 2, callbackHandler4);
And I would expect the main loop to do the 3 digitalWrites, then start the 3 transfers, and then carry on with whatever comes next in the code (and later, as each transfer finishes, run whatever is in its callback handler).
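To make that expected ordering concrete, here is a host-side C++ model. It is NOT Teensy code and does not use the real SPI library: `FakeDma`, `fakeWrite` and `simulateThreeTransfers` are hypothetical names, and "completion" is simulated by calling `completeAll()` instead of a DMA interrupt. (On the Teensy I believe the 4-argument async `transfer()` takes an `EventResponder` as its last parameter, so `callbackHandler` etc. would need to be `EventResponder` objects with a function attached; check the SPI library source to confirm.)

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Host-side model of the ordering only; this is NOT the Teensy SPI API.
// Transfers are queued and "complete" later when completeAll() runs,
// standing in for DMA-complete interrupts.
class FakeDma {
 public:
  explicit FakeDma(std::vector<std::string>& log) : log_(log) {}

  // Starts a (simulated) transfer and returns immediately.
  void startTransfer(const std::string& port, std::size_t count,
                     std::function<void()> onDone) {
    assert(count > 0);
    log_.push_back("start " + port + " " + std::to_string(count) + "B");
    pending_.push_back(std::move(onDone));
  }

  // Simulates the transfers finishing: run each completion callback in order.
  void completeAll() {
    for (auto& cb : pending_) cb();
    pending_.clear();
  }

 private:
  std::vector<std::string>& log_;
  std::vector<std::function<void()>> pending_;
};

// Hypothetical stand-in for digitalWriteFast(): it only records the write.
void fakeWrite(std::vector<std::string>& log, int pin, bool high) {
  log.push_back("pin " + std::to_string(pin) + (high ? " HIGH" : " LOW"));
}

// Runs the scenario from the post and returns the order in which things happen.
std::vector<std::string> simulateThreeTransfers() {
  std::vector<std::string> log;
  FakeDma dma(log);
  const int csPins[3] = {10, 0, 20};  // example pins from the post
  const char* ports[3] = {"SPI", "SPI3", "SPI4"};

  // 1. Drive all three chip selects LOW.
  for (int pin : csPins) fakeWrite(log, pin, false);

  // 2. Start three 2-byte transfers; each call returns at once.
  for (int i = 0; i < 3; ++i) {
    int pin = csPins[i];
    dma.startTransfer(ports[i], 2, [&log, pin] { fakeWrite(log, pin, true); });
  }

  // 3. The main loop carries on while the transfers are in flight.
  log.push_back("main loop continues");

  // 4. Later, each completion callback drives its chip select HIGH.
  dma.completeAll();
  return log;
}
```

The log shows the three LOW writes, the three transfer starts, "main loop continues", and only then the three HIGH writes from the callbacks, which is the sequence described above.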
As you say, though, I don't think you can gain much when transferring only 6 bytes.
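A rough back-of-envelope check, assuming an SPI clock f (the actual clock isn't stated here) and 2 bytes (16 bits) per port on 3 ports, as in the example:

```latex
t_{\text{seq}} = \frac{3 \times 16\ \text{bits}}{f} = \frac{48}{f}, \qquad
t_{\text{par}} \ge \frac{16}{f}, \qquad
t_{\text{seq}} - t_{\text{par}} \le \frac{32}{f}
```

At an assumed f = 4 MHz that is 12 µs done one after another versus at best 4 µs in parallel, so the saving is at most 8 µs, and whatever it costs to set up three DMA transfers comes out of that.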