T4 memory to memory using DMA

DM5SG

Last weekend I tried to figure out how to use DMA to copy data within memory, but I couldn't get it working. Searching the net wasn't helpful either.

Here we go:

Code:
#include <Arduino.h>
#include <DMAChannel.h>

DMAChannel m2m(false);
int16_t buffer1[128 * 3];
int16_t buffer2[128 * 2];

void setup() {

  Serial.begin(9600);
  while(!Serial);
  Serial.println("Go..");

  for (int j = 0; j < 128 * 3;j++)
    buffer1[j] = j;

  m2m.begin();
  //m2m.sourceCircular(buffer1, sizeof(buffer1));
  m2m.sourceBuffer(buffer1,sizeof(buffer1));
  m2m.destinationBuffer(buffer2, sizeof(buffer2));
  //m2m.transferCount(sizeof(buffer2));
  m2m.enable();
  
  Serial.print("buffer1[10] = ");
  Serial.println(buffer1[10]);
  
  m2m.triggerManual();
  while (!m2m.complete());
  Serial.print("buffer2[10] = ");
  Serial.println(buffer2[10]);
}

The expected data transfer never actually happens; that is, m2m.complete() never returns true.
I'm sure I'm doing something wrong, but the missing library documentation doesn't make it any easier.
Yes, there are a few hints here and there in DMAChannel.h, but they leave question marks.
 
For memory to memory copy, you probably want triggerContinuously() rather than triggerManual(). Normally triggering copies just 1 element (or "minor loop"), because it's meant for a peripheral which provides the data gradually and sends a trigger each time new data is available.

You also need disableOnCompletion() so it doesn't keep doing the copy over and over.

Something like this

Code:
#include <Arduino.h>
#include <DMAChannel.h>

DMAChannel m2m(false);
int16_t buffer1[128 * 3];
int16_t buffer2[128 * 2];

void setup() {

  Serial.begin(9600);
  while(!Serial);
  Serial.println("Go..");

  for (int j = 0; j < 128 * 3; j++) {
    buffer1[j] = j;
  }

  m2m.begin();
  //m2m.sourceCircular(buffer1, sizeof(buffer1));
  m2m.sourceBuffer(buffer1,sizeof(buffer1));
  m2m.destinationBuffer(buffer2, sizeof(buffer2));
  //m2m.transferCount(sizeof(buffer2));
  m2m.enable();
  m2m.disableOnCompletion();
  m2m.triggerContinuously();
  
  Serial.print("buffer1[10] = ");
  Serial.println(buffer1[10]);

  while (!m2m.complete()) {
    // wait
  }
  Serial.print("buffer2[10] = ");
  Serial.println(buffer2[10]);
}

void loop() {
}
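
To make the difference between the two trigger calls concrete, here is a minimal host-side model. It is not the DMAChannel library or the real hardware: `DmaModel`, `triggerOnce` and `triggerUntilDone` are made-up names, and it assumes one 16-bit element per minor loop, as in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified host-side model of one DMA channel (NOT the real DMAChannel API).
// Each trigger runs one minor loop, which copies one 16-bit element here.
struct DmaModel {
  const int16_t *src;
  int16_t *dst;
  std::size_t remaining;  // minor loops left in the major loop
  bool done = false;      // set once the whole major loop has finished

  DmaModel(const int16_t *s, int16_t *d, std::size_t count)
    : src(s), dst(d), remaining(count) { assert(count > 0); }

  // Models a single trigger: one element moves, then the channel waits again.
  void triggerOnce() {
    if (done) return;
    *dst++ = *src++;
    if (--remaining == 0) done = true;
  }

  // Models continuous triggering: keep going until the major loop is done.
  void triggerUntilDone() {
    while (!done) triggerOnce();
  }
};
```

In this model a single `triggerOnce()` moves one element but leaves `done` false, which would be consistent with a wait loop on complete() never finishing after one manual trigger.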
 
The other issue you should know about is dealing with the cache. While this small example works (the buffers are in non-cached DTCM memory), for more complex usage you should align the buffers to 32-byte boundaries (one cache line). Before a DMA transfer that reads memory, call arm_dcache_flush() on the source buffer to guarantee that the data you've written is actually in memory and not only in the M7's cache. After the DMA transfer, call arm_dcache_delete() on the destination buffer to make sure your reads come from memory and not from previously cached data.

Code:
#include <Arduino.h>
#include <DMAChannel.h>

DMAChannel m2m(false);
int16_t buffer1[128 * 3] __attribute__ ((used, aligned(32)));
int16_t buffer2[128 * 2] __attribute__ ((used, aligned(32)));

void setup() {
  Serial.begin(9600);
  while(!Serial);
  Serial.println("Go..");

  for (int j = 0; j < 128 * 3; j++) {
    buffer1[j] = j;
  }
  arm_dcache_flush(buffer1, sizeof(buffer1));

  m2m.begin();
  //m2m.sourceCircular(buffer1, sizeof(buffer1));
  m2m.sourceBuffer(buffer1,sizeof(buffer1));
  m2m.destinationBuffer(buffer2, sizeof(buffer2));
  //m2m.transferCount(sizeof(buffer2));
  m2m.enable();
  m2m.disableOnCompletion();
  m2m.triggerContinuously();

  
  
  Serial.print("buffer1[10] = ");
  Serial.println(buffer1[10]);

  while (!m2m.complete()) {
    // wait
  }
  arm_dcache_delete(buffer2, sizeof(buffer2));
  Serial.print("buffer2[10] = ");
  Serial.println(buffer2[10]);
}

void loop() {
}
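
As a rough host-side illustration of the alignment advice, the sketch below checks that a buffer starts on a 32-byte boundary and spans whole cache lines, so a cache flush or delete on it cannot touch neighbouring variables. `kCacheLine`, `roundUpToCacheLine`, `isCacheLineAligned`, `srcBuf` and `dstBuf` are names invented for this example; none of them come from the Teensy core.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 32;  // Cortex-M7 data cache line size in bytes

// Round a byte count up to a whole number of cache lines.
constexpr std::size_t roundUpToCacheLine(std::size_t bytes) {
  return (bytes + kCacheLine - 1) / kCacheLine * kCacheLine;
}

// True if the address starts exactly on a cache line boundary.
inline bool isCacheLineAligned(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}

// Same sizes as the example above; alignas(32) is the standard C++ spelling
// of __attribute__((aligned(32))).
alignas(32) static int16_t srcBuf[128 * 3];
alignas(32) static int16_t dstBuf[128 * 2];

// Both buffers already cover whole cache lines (768 and 512 bytes).
static_assert(sizeof(srcBuf) % kCacheLine == 0, "source spans whole cache lines");
static_assert(sizeof(dstBuf) % kCacheLine == 0, "destination spans whole cache lines");
```

If a buffer size were not a multiple of 32, padding it up to `roundUpToCacheLine(size)` would be one way to keep unrelated data out of the affected cache lines.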
 
@DM5SG
I'm not sure it is a good idea to have the destination buffer smaller than the source buffer.
It is not obvious from the example code that the transfer count is determined by the last buffer call (here "m2m.destinationBuffer(buffer2, sizeof(buffer2));")

OK, you may be playing with DMA, but ....
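
A small worked example of the "last buffer call wins" point, using the sizes from the sketches above. `CountModel` is a hypothetical stand-in that only mimics the behaviour described here (each buffer call overwrites the element count); it is not the library code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the channel's loop counter. It mimics only the
// behaviour described above: every buffer call overwrites the count.
struct CountModel {
  std::size_t elements = 0;  // number of 16-bit transfers in the major loop

  void sourceBuffer(const int16_t *, std::size_t lenBytes) {
    elements = lenBytes / sizeof(int16_t);
  }
  void destinationBuffer(int16_t *, std::size_t lenBytes) {
    elements = lenBytes / sizeof(int16_t);
  }
};
```

With buffer1 at 128 * 3 elements (768 bytes) and buffer2 at 128 * 2 elements (512 bytes), calling sourceBuffer() first gives 384 transfers, and the later destinationBuffer() call replaces that with 256. Swapping the call order would leave 384, which is more than buffer2 can hold.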
 
@Paul
Thanks for responding. You pointed out that triggerManual() would copy a single element, but it actually copies nothing at all, at least according to m2m.complete().
Anyway, the main idea is to periodically copy a certain amount of data out of buffer1 into buffer2. buffer1 being larger than buffer2 is intentional.
The reason for doing this is to replace the data transfer in AudioInputAnalog with a DMA transfer, to make it easier to get the data out of adc_buffer.
My current version of AudioInputAnalog looks like this:

Code:
#if defined(__IMXRT1062__)

#include <Arduino.h>
#include "input_adc.h"

extern "C" void xbar_connect(unsigned int input, unsigned int output);

DMAChannel AudioInputAnalog::dma(false);

static __attribute__((used, aligned(32))) uint16_t adc_buffer[AUDIO_BLOCK_SAMPLES * 5];
static __attribute__((used, aligned(32))) uint16_t buffer[AUDIO_BLOCK_SAMPLES * 4];

PROGMEM static const uint8_t adc2_pin_to_channel[] = {
	7,      // 0/A0  AD_B1_02
	8,      // 1/A1  AD_B1_03
	12,     // 2/A2  AD_B1_07
	11,     // 3/A3  AD_B1_06
	6,      // 4/A4  AD_B1_01
	5,      // 5/A5  AD_B1_00
	15,     // 6/A6  AD_B1_10
	0,      // 7/A7  AD_B1_11
	13,     // 8/A8  AD_B1_08
	14,     // 9/A9  AD_B1_09
	255,	// 10/A10 AD_B0_12 - only on ADC1, 1 - can't use for audio
	255,	// 11/A11 AD_B0_13 - only on ADC1, 2 - can't use for audio
	3,      // 12/A12 AD_B1_14
	4,      // 13/A13 AD_B1_15
	7,      // 14/A0  AD_B1_02
	8,      // 15/A1  AD_B1_03
	12,     // 16/A2  AD_B1_07
	11,     // 17/A3  AD_B1_06
	6,      // 18/A4  AD_B1_01
	5,      // 19/A5  AD_B1_00
	15,     // 20/A6  AD_B1_10
	0,      // 21/A7  AD_B1_11
	13,     // 22/A8  AD_B1_08
	14,     // 23/A9  AD_B1_09
	255,    // 24/A10 AD_B0_12 - only on ADC1, 1 - can't use for audio
	255,    // 25/A11 AD_B0_13 - only on ADC1, 2 - can't use for audio
	3,      // 26/A12 AD_B1_14 - only on ADC2, do not use analogRead()
	4,      // 27/A13 AD_B1_15 - only on ADC2, do not use analogRead()
#ifdef ARDUINO_TEENSY41
	255,    // 28
	255,    // 29
	255,    // 30
	255,    // 31
	255,    // 32
	255,    // 33
	255,    // 34
	255,    // 35
	255,    // 36
	255,    // 37
	1,      // 38/A14 AD_B1_12 - only on ADC2, do not use analogRead()
	2,      // 39/A15 AD_B1_13 - only on ADC2, do not use analogRead()
	9,      // 40/A16 AD_B1_04
	10,     // 41/A17 AD_B1_05
#endif
};

void AudioInputAnalog::init(uint8_t pin)
{
	if (pin >= sizeof(adc2_pin_to_channel)) return;
	const uint8_t adc_channel = adc2_pin_to_channel[pin];
	if (adc_channel == 255) return;

	// configure a timer to trigger ADC
	
	const int comp1 = ((float)F_BUS_ACTUAL) / (AUDIO_SAMPLE_RATE_EXACT * 4.0f) / 2.0f;
	TMR4_ENBL &= ~(1<<3);
	TMR4_SCTRL3 = TMR_SCTRL_OEN | TMR_SCTRL_FORCE;
	TMR4_CSCTRL3 = TMR_CSCTRL_CL1(1) | TMR_CSCTRL_TCF1EN;
	TMR4_CNTR3 = 0;
	TMR4_LOAD3 = 0;
	TMR4_COMP13 = comp1;
	TMR4_CMPLD13 = comp1;
	TMR4_CTRL3 = TMR_CTRL_CM(1) | TMR_CTRL_PCS(8) | TMR_CTRL_LENGTH | TMR_CTRL_OUTMODE(3);
	TMR4_DMA3 = TMR_DMA_CMPLD1DE;
	TMR4_CNTR3 = 0;
	TMR4_ENBL |= (1<<3);

	// connect the timer output the ADC_ETC input
	const int trigger = 4; // 0-3 for ADC1, 4-7 for ADC2
	CCM_CCGR2 |= CCM_CCGR2_XBAR1(CCM_CCGR_ON);
	xbar_connect(XBARA1_IN_QTIMER4_TIMER3, XBARA1_OUT_ADC_ETC_TRIG00 + trigger);

	// turn on ADC_ETC and configure to receive trigger
	if (ADC_ETC_CTRL & (ADC_ETC_CTRL_SOFTRST | ADC_ETC_CTRL_TSC_BYPASS)) {
		ADC_ETC_CTRL = 0; // clears SOFTRST only
		ADC_ETC_CTRL = 0; // clears TSC_BYPASS
	}
	ADC_ETC_CTRL |= ADC_ETC_CTRL_TRIG_ENABLE(1 << trigger) | ADC_ETC_CTRL_DMA_MODE_SEL;
	ADC_ETC_DMA_CTRL |= ADC_ETC_DMA_CTRL_TRIQ_ENABLE(trigger);

	// configure ADC_ETC trigger4 to make one ADC2 measurement on pin A2
	const int len = 1;
	IMXRT_ADC_ETC.TRIG[trigger].CTRL = ADC_ETC_TRIG_CTRL_TRIG_CHAIN(len - 1) |
		ADC_ETC_TRIG_CTRL_TRIG_PRIORITY(7);
	IMXRT_ADC_ETC.TRIG[trigger].CHAIN_1_0 = ADC_ETC_TRIG_CHAIN_HWTS0(1) |
		ADC_ETC_TRIG_CHAIN_CSEL0(adc2_pin_to_channel[pin]) | ADC_ETC_TRIG_CHAIN_B2B0;

	// set up ADC2 for 12 bit mode, hardware trigger
	Serial.printf("ADC2_CFG = %08X\n", ADC2_CFG);
	ADC2_CFG |= ADC_CFG_ADTRG;
	ADC2_CFG = ADC_CFG_MODE(2) | ADC_CFG_ADSTS(3) | ADC_CFG_ADLSMP | ADC_CFG_ADTRG |
		ADC_CFG_ADICLK(1) | ADC_CFG_ADIV(0) /*| ADC_CFG_ADHSC*/;
	ADC2_GC &= ~ADC_GC_AVGE; // single sample, no averaging
	ADC2_HC0 = ADC_HC_ADCH(16); // 16 = controlled by ADC_ETC

	// use a DMA channel to capture ADC_ETC output
	dma.begin();
	dma.source(*(volatile const uint16_t *)&(IMXRT_ADC_ETC.TRIG[4].RESULT_1_0));
	dma.destinationBuffer(adc_buffer, sizeof(adc_buffer));
	dma.triggerAtHardwareEvent(DMAMUX_SOURCE_ADC_ETC);
	dma.enable();
}

static int integrator[4] = {0, 0, 0, 0};
static int comb[4] = {0, 0, 0, 0};
static int fc1 = 0, fc2 = 0;
static int dc_block_in = 0, dc_block_out = 0;

int16_t cic(uint16_t *data)
{ // Cascaded Integrator Comb Filter
	int cnt = 4;
	while (cnt--)
	{ 
		integrator[0] += *data++;
		integrator[1] += integrator[0];
		integrator[2] += integrator[1];
		integrator[3] += integrator[2];
	}
	
	int c1 = integrator[3] - comb[0];
	int c2 = c1 - comb[1];
	int c3 = c2 - comb[2];
	int c4 = (c3 - comb[3])>>8; // shift result by log2((R*M)^N) = log2(4^4) = 8

	comb[0] = integrator[3];
	comb[1] = c1;
	comb[2] = c2;
	comb[3] = c3;
	// FIR compensation filter
	int res = c4 + fc2 - (fc1 << 2);
	fc2 = fc1;
	fc1 = c4;
	return res;
}

void AudioInputAnalog::update(void)
{
	uint16_t *pData = (uint16_t *)dma.destinationAddress();
	audio_block_t *output = NULL;
	if ((output = allocate()) == NULL) return;
	int16_t *pOutput = output->data;
	uint16_t *pBuffer = buffer;
	for (int j = 0; j < AUDIO_BLOCK_SAMPLES * 4; j++)
	{ // needs to be as fast as possible,
	  // otherwise current data might be overwritten by steadily incoming new data
		if(--pData < adc_buffer) pData = &adc_buffer[sizeof(adc_buffer)/2 - 1];
		*pBuffer++ = *pData;
	}
	pBuffer = buffer;
	for (int j = 0; j < AUDIO_BLOCK_SAMPLES; j++)
	{
		*pOutput = cic(pBuffer);
		// dc removal -> y(n) = x(n) - x(n-1) + 0.998 * y(n-1)
		dc_block_out = *pOutput - dc_block_in + ((dc_block_out * 511) >> 9);
		dc_block_in = *pOutput;
		*pOutput++ = dc_block_out;
		pBuffer += 4;
	}
	transmit(output);
	release(output);
}

#endif // __IMXRT1062__
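
The `>>8` in cic() can be checked on a PC. A CIC decimator with N stages, decimation factor R and differential delay M has a DC gain of (R*M)^N, so with N = 4, R = 4, M = 1 the gain is 256. The sketch below is a host-side rewrite of just the integrator and comb sections (no compensation FIR); `Cic4` and `cicDcResponse` are names invented here. The accumulators are unsigned, an assumption made so that wrap-around is well defined in standard C++.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CIC decimator core: N = 4 stages, decimation R = 4, differential delay M = 1.
// Accumulators are unsigned so that wrap-around is well defined in C++.
struct Cic4 {
  uint32_t integ[4] = {0, 0, 0, 0};
  uint32_t comb[4] = {0, 0, 0, 0};

  // Consumes 4 input samples and returns one output sample scaled by 1/256.
  int32_t step(const int16_t *in) {
    for (int r = 0; r < 4; r++) {  // integrator section, runs at the input rate
      integ[0] += static_cast<uint32_t>(static_cast<int32_t>(in[r]));
      integ[1] += integ[0];
      integ[2] += integ[1];
      integ[3] += integ[2];
    }
    uint32_t v = integ[3];
    for (int n = 0; n < 4; n++) {  // comb section, runs at the output rate
      uint32_t diff = v - comb[n];
      comb[n] = v;
      v = diff;
    }
    return static_cast<int32_t>(v) >> 8;  // DC gain is (R*M)^N = 4^4 = 256
  }
};

// Feeds a constant level into a fresh filter and collects the outputs.
inline std::vector<int32_t> cicDcResponse(int16_t level, int outputs) {
  Cic4 cic;
  const int16_t block[4] = {level, level, level, level};
  std::vector<int32_t> out;
  for (int i = 0; i < outputs; i++) out.push_back(cic.step(block));
  return out;
}
```

Once the filter has settled (from the fourth output sample on), a constant input comes back out at the same level, which is what the shift by 8 is for.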
 
Hello to all,
in recent days I have been playing around with memory-to-memory copy on the T4.
Code:
#include <Arduino.h>
#include <DMAChannel.h>

#define buffer_cnt  128

DMAMEM static uint16_t __attribute__((aligned(32))) buffer1[buffer_cnt*5];
DMAMEM static uint16_t __attribute__((aligned(32))) buffer2[buffer_cnt*4];
DMAChannel *dma;
DMAChannel m2m(false);

void start_dma(void*addr){

  arm_dcache_flush_delete(buffer1, sizeof(buffer1));
  m2m.TCD->SADDR = addr;
  m2m.enable();
  m2m.triggerContinuously();
  while (!m2m.complete());
  arm_dcache_delete(buffer2, sizeof(buffer2));

}

void setup()
{
  Serial.begin(9600);
  while(!Serial);
  Serial.println("go...");

  for (int j = 0; j < buffer_cnt*5;j++) {
    buffer1[j] = j;
  }

  dma = new DMAChannel();
  dma->begin();
  dma->source((volatile uint16_t &)ADC1_R0);
  dma->destinationBuffer(buffer1, sizeof(buffer1));
  dma->triggerAtHardwareEvent(DMAMUX_SOURCE_ADC1);
  dma->enable();

  m2m.begin();
  m2m.sourceCircular(buffer1, sizeof(buffer1));
  m2m.destinationBuffer(buffer2, sizeof(buffer2));
  m2m.TCD->SOFF = -2;
  m2m.disableOnCompletion();

  start_dma(&buffer1[64]);

  for (int j = 0; j < buffer_cnt;j++){
    Serial.print("Buffer1 = ");
    Serial.print(buffer1[j]);
    Serial.print(" ");
    Serial.print("Buffer2 = ");
    Serial.println(buffer2[j]);
  }

  start_dma(&buffer1[128]);

  for (int j = 0; j < buffer_cnt;j++){
    Serial.print("Buffer1 = ");
    Serial.print(buffer1[j]);
    Serial.print(" ");
    Serial.print("Buffer2 = ");
    Serial.println(buffer2[j]);
  }
}

void loop() {
}

And it works just as it is supposed to. The reason for doing it at all is to implement m2m in AudioInputAnalog.
There it doesn't work; at least, m2m.complete() never returns true.

Code:
#define FILTER_TAP_NUM 49

static const int16_t coeffs[FILTER_TAP_NUM] = {
  70, 37, -19, -140, -316, -505, -638, -646,
  -485, -166, 232, 578, 727, 578, 126, -508,
  -1098, -1369, -1084, -139, 1386, 3222, 4977,
  6239, 6698, 6239, 4977, 3222, 1386, -139, 
  -1084, -1369, -1098, -508, 126, 578, 727,
  578, 232, -166, -485, -646, -638, -505, -316,
  -140, -19, 37, 70
};

#define ADC_BUFFER_LEN  (AUDIO_BLOCK_SAMPLES * 5)
#define FIR_DELAY_LEN   (AUDIO_BLOCK_SAMPLES * 4 + FILTER_TAP_NUM)
#define POLE(val)       ((int16_t)(INT16_MAX * (val)))

DMAMEM static __attribute__((aligned(32))) int16_t adc_buffer[ADC_BUFFER_LEN];
DMAMEM static __attribute__((aligned(32))) int16_t buffer[AUDIO_BLOCK_SAMPLES * 4];
DMAMEM static int16_t fir_delay[FIR_DELAY_LEN];

static ADC *adc;
static DMAChannel dma(false);
static DMAChannel m2m(false);
static arm_fir_decimate_instance_q15 filter;

static uint32_t cycle = 0;

static void dcRemoval(int16_t *data, int blocksize)
{
  static int xm = 0, ym = 0;
  while(blocksize--){
    //y(n) = x(n) - x(n-1) + POLE * y(n-1)
    ym = *data - xm + ((ym * POLE(0.995)) >> 15);
    xm = *data;
    *data++ = ym;
  }
}

static void cic_decimate(int16_t *indata, int16_t *oudata, int blocksize)
{
  static int intg1 = 0, intg2 = 0, intg3 = 0, intg4 = 0;
  static int comb1 = 0, comb2 = 0, comb3 = 0, comb4 = 0;
  static int fir_del1 = 0, fir_del2 = 0;
  // Cascaded Integrator Comb Filter
  // N=4 R=4 M=1
  while(blocksize--){
    int r = 4;
    while(r--){
      // integrator section
      intg1 += *indata++;
      intg2 += intg1;
      intg3 += intg2;
      intg4 += intg3;
    }
    // comb section
    int c1 = intg4 - comb1;
    int c2 = c1 - comb2;
    int c3 = c2 - comb3;
    int c4 = (c3 - comb4)>>8; // result needs to be shifted by log2((R*M)^N) = 8
    // comb delay
    comb1 = intg4;
    comb2 = c1;
    comb3 = c2;
    comb4 = c3;
    // FIR compensation filter
    *oudata++ = c4 + fir_del2 - fir_del1 * 3;
    fir_del2 = fir_del1;
    fir_del1 = c4;
  }
}

void AudioInputAnalog_T4::init(uint8_t pin, ADC_CHN chn)
{
  ARM_DEMCR |= ARM_DEMCR_TRCENA;  
  ARM_DWT_CTRL |= ARM_DWT_CTRL_CYCCNTENA;

  arm_fir_decimate_init_q15(&filter, FILTER_TAP_NUM, 4, (int16_t*)&coeffs, fir_delay, AUDIO_BLOCK_SAMPLES * 4);

  dma.begin();
  dma.source((volatile uint16_t &)((chn == 1) ? ADC2_R0 : ADC1_R0));
  dma.destinationBuffer(adc_buffer, sizeof(adc_buffer));
  dma.triggerAtHardwareEvent((chn == 1) ? DMAMUX_SOURCE_ADC2 : DMAMUX_SOURCE_ADC1);
  dma.enable();

  m2m.begin();
  m2m.sourceCircular(adc_buffer, sizeof(adc_buffer));
  m2m.destinationBuffer(buffer, sizeof(buffer));
  m2m.disableOnCompletion();
  m2m.TCD->SOFF = -2;

  adc = new ADC();
  adc->adc[chn]->setResolution(12);
  adc->adc[chn]->setAveraging(0);
  adc->adc[chn]->startSingleRead(pin);
  adc->adc[chn]->startQuadTimer(AUDIO_SAMPLE_RATE * 4);
  adc->adc[chn]->enableDMA();
}

uint32_t AudioInputAnalog_T4::getCycleCount(){
  return cycle;
}

void AudioInputAnalog_T4::update()
{
  int16_t *pData = (int16_t*)dma.destinationAddress();
  audio_block_t *output;
  // TODO: figure out why the heck m2m.complete() doesn't return true;
  //       it works fine in a test project but not in update()
  arm_dcache_flush_delete(adc_buffer, sizeof(adc_buffer));
  m2m.TCD->SADDR = pData;
  m2m.enable();
  m2m.triggerContinuously();
  while (!m2m.complete());
  arm_dcache_delete(buffer, sizeof(buffer));
  if ((output = allocate()) == NULL) return;
  uint32_t start = ARM_DWT_CYCCNT;
#if 1
  // CIC filter is roughly 3 times faster than the 49-tap FIR
  cic_decimate(buffer, output->data, AUDIO_BLOCK_SAMPLES);
#else
  arm_fir_decimate_fast_q15(&filter, buffer, output->data, AUDIO_BLOCK_SAMPLES * 4);
#endif
  cycle = ARM_DWT_CYCCNT - start;
  dcRemoval(output->data, AUDIO_BLOCK_SAMPLES);
  transmit(output);
  release(output);
}

Without the m2m data transfer it works smoothly and as expected:

Code:
#include <DMAChannel.h>
#include <ADC.h>
#include <arm_math.h>
#include "input_adc_t4.h"


/*
sampling frequency: 176400 Hz
fixed point precision: 16 bit
* Passband 0 Hz - 16000 Hz
  ripple = 2 dB
*/

#define FILTER_TAP_NUM 49

static const int16_t coeffs[FILTER_TAP_NUM] = {
  70, 37, -19, -140, -316, -505, -638, -646,
  -485, -166, 232, 578, 727, 578, 126, -508,
  -1098, -1369, -1084, -139, 1386, 3222, 4977,
  6239, 6698, 6239, 4977, 3222, 1386, -139, 
  -1084, -1369, -1098, -508, 126, 578, 727,
  578, 232, -166, -485, -646, -638, -505, -316,
  -140, -19, 37, 70
};


#define ADC_BUFFER_LEN  (AUDIO_BLOCK_SAMPLES * 5)
#define FIR_DELAY_LEN   (AUDIO_BLOCK_SAMPLES * 4 + FILTER_TAP_NUM)
#define POLE(val)       ((int16_t)(INT16_MAX * (val)))

#ifdef INPUT_ADC_USE_DMA
DMAMEM static __attribute__((aligned(32))) int16_t adc_buffer[ADC_BUFFER_LEN];
DMAMEM static int16_t buffer[AUDIO_BLOCK_SAMPLES * 4];
DMAMEM static int16_t fir_delay[FIR_DELAY_LEN];
#else
static int16_t adc_buffer[ADC_BUFFER_LEN];
static int16_t buffer[AUDIO_BLOCK_SAMPLES * 4];
static int16_t fir_delay[FIR_DELAY_LEN];
#endif
static ADC *adc;
static DMAChannel dma(false);
static arm_fir_decimate_instance_q15 filter;
static uint32_t cycle = 0;

static void dcRemoval(int16_t *data, int blocksize)
{
  static int xm = 0, ym = 0;
  while(blocksize--){
    //y(n) = x(n) - x(n-1) + POLE * y(n-1)
    ym = *data - xm + ((ym * POLE(0.995)) >> 15);
    xm = *data;
    *data++ = ym;
  }
}

static void cic_decimate(int16_t *indata, int16_t *oudata, int blocksize)
{
  static int intg1 = 0, intg2 = 0, intg3 = 0, intg4 = 0;
  static int comb1 = 0, comb2 = 0, comb3 = 0, comb4 = 0;
  static int fir_del1 = 0, fir_del2 = 0;
  // Cascaded Integrator Comb Filter
  // N=4 R=4 M=1
  while(blocksize--){
    int r = 4;
    while(r--){
      // integrator section
      intg1 += *indata++;
      intg2 += intg1;
      intg3 += intg2;
      intg4 += intg3;
    }
    // comb section
    int c1 = intg4 - comb1;
    int c2 = c1 - comb2;
    int c3 = c2 - comb3;
    int c4 = (c3 - comb4)>>8; // result needs to be shifted by log2((R*M)^N) = 8
    // comb delay
    comb1 = intg4;
    comb2 = c1;
    comb3 = c2;
    comb4 = c3;
    // FIR compensation filter
    *oudata++ = c4 + fir_del2 - fir_del1 * 3;
    fir_del2 = fir_del1;
    fir_del1 = c4;
  }
}

void AudioInputAnalog_T4::init(uint8_t pin, ADC_CHN chn)
{
  arm_fir_decimate_init_q15(&filter, FILTER_TAP_NUM, 4, (int16_t*)&coeffs, fir_delay, AUDIO_BLOCK_SAMPLES * 4);

  dma.begin();
  dma.source((volatile uint16_t &)((chn == 1) ? ADC2_R0 : ADC1_R0));
  dma.destinationBuffer(adc_buffer, sizeof(adc_buffer));
  dma.triggerAtHardwareEvent((chn == 1) ? DMAMUX_SOURCE_ADC2 : DMAMUX_SOURCE_ADC1);
  dma.enable();

  adc = new ADC();
  adc->adc[chn]->setResolution(12);
  adc->adc[chn]->setAveraging(0);
  adc->adc[chn]->startSingleRead(pin);
  adc->adc[chn]->startQuadTimer(AUDIO_SAMPLE_RATE * 4);
  adc->adc[chn]->enableDMA();
}

uint32_t AudioInputAnalog_T4::getCycleCount(){
  return cycle;
}

void AudioInputAnalog_T4::update()
{
  int16_t *pData = (int16_t*)dma.destinationAddress();
  int16_t *pbuffer = buffer;
  // copy the ADC DMA buffer to a temp buffer as quickly as possible,
  // otherwise current data might be overwritten by steadily incoming new data;
  // everything else is taken care of later on
  int j = AUDIO_BLOCK_SAMPLES * 4;
  while(j--){
        *pbuffer++ = *pData--;
        if(pData < adc_buffer) pData = &adc_buffer[ADC_BUFFER_LEN - 1];
  }
#ifdef INPUT_ADC_USE_DMA
  arm_dcache_delete(adc_buffer, sizeof(adc_buffer));
#endif
  audio_block_t *output;
  if ((output = allocate()) == NULL) return;
  uint32_t start = ARM_DWT_CYCCNT;
#if 1
  // CIC filter is roughly 3 times faster than the 49-tap FIR and does the same job
  cic_decimate(buffer, output->data, AUDIO_BLOCK_SAMPLES);
#else
  arm_fir_decimate_fast_q15(&filter, buffer, output->data, AUDIO_BLOCK_SAMPLES * 4);
#endif
  cycle = ARM_DWT_CYCCNT - start;
  dcRemoval(output->data, AUDIO_BLOCK_SAMPLES);
  transmit(output);
  release(output);
}
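
For reference, the DC removal step used in dcRemoval() above, y(n) = x(n) - x(n-1) + pole * y(n-1), can be tried in isolation on a PC. This is only a sketch: `DcBlocker` and `kPoleQ15` are names made up for the example, and the pole value is what (int16_t)(INT16_MAX * 0.995) evaluates to.

```cpp
#include <cassert>
#include <cstdint>

// One-pole DC blocker: y(n) = x(n) - x(n-1) + pole * y(n-1), pole in Q15.
struct DcBlocker {
  static constexpr int32_t kPoleQ15 = 32603;  // (int16_t)(32767 * 0.995)
  int32_t prevIn = 0;   // x(n-1)
  int32_t prevOut = 0;  // y(n-1)

  int16_t process(int16_t x) {
    prevOut = x - prevIn + ((prevOut * kPoleQ15) >> 15);
    prevIn = x;
    return static_cast<int16_t>(prevOut);
  }
};
```

Feeding a constant level shows the expected behaviour: the first output equals the step, and the output then decays to zero as the DC component is removed.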

Maybe someone is able to figure out why m2m doesn't work inside AudioInputAnalog_T4::update().
 
@Paul ... yes, I saw it, but I thought there was still room for improvement.
I recently tried versions with double-buffered DMA, and also went on to use a polyphase FIR filter.
So far, the current version without m2m and using the CIC filter is the fastest.
 
Awesome. Looking forward to giving it a try... if you're willing to share.

Must admit, I really didn't put much effort into efficiency. I was mainly focused on trying to improve the sound quality. Spent way too much time fiddling with the ADC settings.
 
@Paul

cic_decimate(buffer, output->data, AUDIO_BLOCK_SAMPLES); takes about 5300 clock cycles
arm_fir_decimate_fast_q15(&filter, buffer, output->data, AUDIO_BLOCK_SAMPLES * 4); takes about 16200 clock cycles

Shuffling adc_buffer into the capture buffer takes about 3500 clock cycles using:

while(j--){
  *pbuffer++ = *pData--;
  if(pData < adc_buffer) pData = &adc_buffer[ADC_BUFFER_LEN - 1];
}

Using memcpy or memmove makes the timing even worse.

Hence the idea of using a second DMA channel to copy adc_buffer.
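
For anyone who wants to try the copy loop on a PC, here is an index-based sketch of the same idea: walk backwards through a ring buffer from a start position and wrap from the first element to the last. `copyBackwards` and its parameters are hypothetical names, not part of the audio library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copies `count` samples out of a ring buffer, walking backwards from `start`
// and wrapping from the first element to the last one.
inline void copyBackwards(const int16_t *ring, std::size_t ringLen,
                          std::size_t start, int16_t *out, std::size_t count) {
  assert(start < ringLen);
  std::size_t idx = start;
  while (count--) {
    *out++ = ring[idx];
    idx = (idx == 0) ? ringLen - 1 : idx - 1;
  }
}
```

With a ring holding 0..9 and a start index of 2, copying five samples yields 2, 1, 0, 9, 8. Note that the output ends up in reverse time order, just as in the loop above.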
 
@Paul

Attached are two versions: the one I talked about, and a second one where I tried to use a second DMA channel to move the adc_buffer data.
 

Attachments: input_analog.zip (9.3 KB)

@Paul

But as I noted in today's initial post, in the second version m2m.complete() never returns true.
I tried it separately up front (code as seen above) and it works just fine.
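
While the cause is still open, one debugging aid might be to put a bound on the wait so update() cannot block forever. The sketch below is a generic polling helper, not a fix for the underlying problem; `waitBounded` and `maxPolls` are names invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Polls `isDone` at most `maxPolls` times. Returns true if it reported
// completion, false if the budget ran out first.
template <typename DoneFn>
bool waitBounded(DoneFn isDone, uint32_t maxPolls) {
  for (uint32_t i = 0; i < maxPolls; i++) {
    if (isDone()) return true;
  }
  return false;
}
```

In update() this could replace the bare `while (!m2m.complete());` with something like `waitBounded([] { return m2m.complete(); }, 100000)`, falling back to the manual copy loop (or just returning) when it reports false. The poll budget here is an arbitrary placeholder.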
 