I have a function that interleaves two datasets into a buffer that gets transmitted via DMA.

Code:
inline void packCmplx(int32_t *pCmplx, const int32_t *pStop, const int16_t *re, const int16_t *im)
{
	do
	{
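		// Clean and invalidate (by address) the D-cache line containing pCmplx.
		// Each iteration writes 4 complex samples = 8 words = 32 bytes.
		SCB_CACHE_DCCIMVAC = (uint32_t)pCmplx;
		__asm__ volatile("dsb");
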
		*pCmplx++ = (*re++) << 8;
		*pCmplx++ = (*im++) << 8;

		*pCmplx++ = (*re++) << 8;
		*pCmplx++ = (*im++) << 8;

		*pCmplx++ = (*re++) << 8;
		*pCmplx++ = (*im++) << 8;
		
		*pCmplx++ = (*re++) << 8;
		*pCmplx++ = (*im++) << 8;
	} while (pCmplx < pStop);
}
This seems to work correctly, but I'm not sure why, or whether it's the best approach.

Shouldn't an ISB instruction follow the DSB, since the store address is written to DCCIMVAC inside the loop? Would a single call to arm_dcache_flush_delete() after interleaving offer any advantage?
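
For comparison, here is a rough sketch of the single-flush variant I have in mind. It assumes the Teensy 4.x core, where arm_dcache_flush_delete(void *addr, uint32_t size) comes from imxrt.h. The name packCmplxFlushOnce is made up for illustration, and the #else branch is only a host-side stand-in so the sketch compiles off-target. It also assumes the buffer starts on a 32-byte boundary and holds an even number of words.

```c
#include <stdint.h>
#include <stddef.h>

#ifdef __IMXRT1062__
#include "imxrt.h"   /* Teensy 4.x core: provides arm_dcache_flush_delete() */
#else
/* Host-side stand-in (assumption, not the real routine): records the call only. */
static void *last_flush_addr;
static uint32_t last_flush_size;
static void arm_dcache_flush_delete(void *addr, uint32_t size)
{
	last_flush_addr = addr;
	last_flush_size = size;
}
#endif

/* Hypothetical variant: interleave everything first, then clean and
   invalidate the whole output range with one call. */
static inline void packCmplxFlushOnce(int32_t *pCmplx, const int32_t *pStop,
                                      const int16_t *re, const int16_t *im)
{
	int32_t *const pStart = pCmplx;

	while (pCmplx < pStop)
	{
		/* * 256 gives the same result as << 8 and stays well-defined
		   for negative samples. */
		*pCmplx++ = (int32_t)(*re++) * 256;
		*pCmplx++ = (int32_t)(*im++) * 256;
	}

	/* Size in bytes of the range that was just written. */
	arm_dcache_flush_delete(pStart,
		(uint32_t)((uintptr_t)pCmplx - (uintptr_t)pStart));
}
```

Unlike my original, this does the cache maintenance after the stores rather than before them, and it processes one complex sample per iteration instead of four, so the loop would still need unrolling to compare timing fairly.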