I may have been barking up the wrong tree here. I tried putting in a hack so that when root->bus_num == 5 it doesn't do the wait. I built the system, installed it, and it looked like there was very little gap between bytes. But then I noticed my SPI bus speed was something like 4K. I changed it back to 8K in the code, which again is currently limited to 6.25k, then reran the code, and the gaps are back. I also noticed that the run times for the different tests are not much different, so I am now guessing the system is somehow limiting it. Maybe it is simply the propagation time: from when I put a byte out on TX until the response comes back on RX, and only then do we put the next byte out...
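To put rough numbers on that guess, here is a minimal sketch of the timing model I have in mind: each byte costs its 8 clock periods plus a fixed gap before the next byte can go out. The function name and the clock and gap values in the usage note are made up for illustration; they are not measurements from this board.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not driver code: estimated time in microseconds to
 * move n_bytes when every byte takes 8 clock periods plus a fixed
 * per-byte gap (the wait for RX before the next TX).
 */
static double transfer_time_us(uint32_t n_bytes, double clock_hz, double gap_us)
{
    assert(clock_hz > 0.0);
    double byte_us = 8.0 * 1e6 / clock_hz;   /* time to clock out 8 bits */
    return (double)n_bytes * (byte_us + gap_us);
}
```

With an assumed 2 us gap, 100 bytes at 4 MHz come out at 400 us and at 8 MHz at 300 us, so doubling the clock only saves a quarter of the time. If the real gap is larger than the byte time, the run times would look nearly the same at both speeds, which matches what I am seeing.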
I wonder if there is any TX/RX buffer you can make use of here. That is, can you queue up more than one TX byte? If so, the main poll_transfer loop would need to be updated to detect completion once we have transmitted and received the number of bytes specified....
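Something like the following is what I mean, as a rough sketch only. It assumes the peripheral has a small TX/RX FIFO, which I have not confirmed for this part. Everything here is hypothetical: `fake_spi` is a software loopback standing in for the real registers, `FIFO_DEPTH` is a guess, and `poll_transfer_queued_sketch` is not the existing poll_transfer code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 4   /* assumed FIFO depth; real hardware may differ */

/* Software loopback standing in for the SPI peripheral (hypothetical). */
struct fake_spi {
    uint8_t fifo[FIFO_DEPTH];
    size_t head;    /* index of the oldest byte */
    size_t count;   /* bytes currently held */
};

static int fake_tx_ready(const struct fake_spi *s) { return s->count < FIFO_DEPTH; }
static int fake_rx_ready(const struct fake_spi *s) { return s->count > 0; }

static void fake_write(struct fake_spi *s, uint8_t b)
{
    s->fifo[(s->head + s->count) % FIFO_DEPTH] = b;
    s->count++;
}

static uint8_t fake_read(struct fake_spi *s)
{
    uint8_t b = s->fifo[s->head];
    s->head = (s->head + 1) % FIFO_DEPTH;
    s->count--;
    return b;
}

/*
 * Polled transfer that keeps the TX side topped up instead of waiting for
 * each RX byte before sending the next one. Completion is tracked by
 * counting: done once `received` reaches `len`.
 * Returns the largest number of bytes that were in flight at once.
 */
static size_t poll_transfer_queued_sketch(struct fake_spi *s, const uint8_t *tx,
                                          uint8_t *rx, size_t len)
{
    size_t sent = 0, received = 0, max_in_flight = 0;

    assert(len == 0 || (tx != NULL && rx != NULL));

    while (received < len) {
        /* Queue as many TX bytes as the FIFO will accept. */
        while (sent < len && fake_tx_ready(s)) {
            fake_write(s, tx[sent++]);
        }
        if (sent - received > max_in_flight) {
            max_in_flight = sent - received;
        }
        /* Drain whatever has come back on RX. */
        while (received < sent && fake_rx_ready(s)) {
            rx[received++] = fake_read(s);
        }
    }
    return max_in_flight;
}
```

The point is the two counters: the loop no longer pairs one TX with one RX, so it has to know it is finished from `received == len` rather than from the last write. On real hardware the two inner loops would poll the status flags instead of the fake helpers.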