Noisy output from Teensy3.1 - we need a one shot FFT function, please.

Status
Not open for further replies.

randomrichard

Active member
Capture.JPG

This picture shows a comparison between data files cross-correlated on a PC using a VB program and on a Teensy3.1 with audio shield on which a SD card is used to access the same files. The program on the Teensy is a translation from VB to Arduino C and is designed to recover four events sampled by a chirp transmitter, two positive and two negative. The PC does the job well. The same signals can be seen on the Teensy output but it is very noisy. The program is long and took me months of work and I am unwilling to release it. However, I was wondering if the ARM processor is inherently noisy doing complex FFTs and IFFTs when they are written out in Arduino C code. I have checked for bugs and am pretty sure there are none. I have also checked the file downloads and they are OK.

The FFT code is basically (no pun intended) S W Smith's table 12.4 from his 2003 DSP book. The file downloading is based upon the excellent advice here: http://forum.arduino.cc/index.php?topic=75491.0. I have been unable to use the ARM's CMSIS signal processing facilities which appear to be designed solely for continuous audio processing.
 
Are you using the Teensy audio board or the Teensy ADC?
Looks like the difference in noise could be due to a 16-bit ADC on PC and the 12-bit ADC on the T3.1
[edit] ignore the above. I misunderstood your post but I've re-read it and I think I understand it now.[/edit]

There is nothing inherently noisy about doing an FFT on an ARM.

Pete
 
Last edited:
Given that the y-axis are scaled differently your images are not really telling the whole story.
 
Given that the y-axis are scaled differently your images are not really telling the whole story.
- That's right, the amplitudes are down too. What puzzles me is that the spike polarities are still visible. I am presently making small changes all through the program to try and find the bug - but so far without success. I would like to make the FFT a function because it is a subroutine in the successful VB program but because the variables are big arrays I think I will have to use pointers, which I haven't mastered yet. Here is the C version of Smith's BASIC code from my program. Maybe an expert can detect my error? Note that some #defines and declarations are at the top of the code. *'s are gaps except where there are three, when they are *'s!

Code:
*//Do FFT on test data and save as Rex[] and Imx[] (Smith Table 12.4)
**unsigned aL;
**unsigned aM =  unsigned(log(FFTSIZE) / log(2)); // = 10
**unsigned aK = 0;
**double TR, TI;
**unsigned myI;
**unsigned aJ = (FFTSIZE/2);
**
**//1**************************************************************
**for(i = 1; i <= (FFTSIZE-2); i++)
**{
****if( i >= aJ) goto LOOP1190;
****TR*=*Rex[aJ];
****TI*=*Imx[aJ];*
****Rex[aJ]*=*Rex[i];
****Rex[aJ]*=*Imx[i];
****Rex[i]*=*TR;
****Imx[i]*=*TI;
****LOOP1190:
****aK*=*FFTSIZE/2;
****LOOP1200:
****if(aK > aJ) goto LOOP1240;
****aJ*=*aJ*-*aK;
****aK*=*aK*/*2;
****goto*LOOP1200;
****LOOP1240:*
****aJ*=*aJ*+*aK;
**}
**//1************************************************************
****//2*******************************************************
****for (aL = 1; aL <= aM; aL++) // aL should go from 1 to 10 for 1024 file
****{*
******//Serial.println(aL, DEC);
******unsigned Le = unsigned(pow(2,aL)); // i.e. 2,4,8,...512
******unsigned Le2 = Le/2; // i.e. 1,2,4...256
******double Ur = 1; 
******double Ui = 0; 
******double Sr = cos(Pi/Le2);
******double Si = -sin(Pi/Le2); // note negative
******unsigned newJ = 1;
******//3******************************************************************
******for( (newJ = 1); (newJ <= Le2); newJ++)
******{
*******unsigned JM1 = newJ -1; // i.e. zero
********//4*******************************************************************
********for ((myI = JM1); (myI <= FFTSIZE-1); myI=(myI+Le))
********{
**********unsigned myIp = myI + Le2;
**********TR*=*Rex[myIp]***Ur*-*Imx[myIp]***Ui;
**********TI*=*Rex[myIp]***Ui*+*Imx[myIp]***Ur;
**********Rex[myIp]*=*Rex[myI]*-*TR;
**********Imx[myIp]*=*Imx[myI]*-*TI;
**********Rex[myI]*=*Rex[myI]*+*TR;
**********Imx[myI]*=*Imx[myI]*+*TI;
**********
********}
********//4******************************************************************
********TR*=*Ur;
********Ur*=*TR***Sr*-*Ui***Si;
********Ui*=*TR***Si*+*Ui***Sr;
******}
******//3****************************************************************
****}
****//2***********************************************************
****//Serial.println("Finished first FFT ");
 
Last edited:
- That's right, the amplitudes are down too. What puzzles me is that the spike polarities are still visible. I am presently making small changes all through the program to try and find the bug - but so far without success. I would like to make the FFT a function because it is a subroutine in the successful VB program but because the variables are big arrays I think I will have to use pointers, which I haven't mastered yet. Here is the C version of Smith's BASIC code from my program. Maybe an expert can detect my error? Note that some #defines and declarations are at the top of the code. *'s are gaps except where there are three, when they are *'s!
Apart the fact that the stars make reading difficult, I realize that indices run from 1 to N and not 0 to N-1 as required for C code.
(what is VB indexing convention?)
 
Apart the fact that the stars make reading difficult, I realize that indices run from 1 to N and not 0 to N-1 as required for C code.
(what is VB indexing convention?)

Thanks. I was incorrectly following Smith's layout but with minor adjustments to the use of the indices should be able to comply with the requirement. According to my VB2012 book VB arrays are the same - starting at zero. So I may have to adjust my VB code too! If so I am still puzzled about the noise levels. However, given that the test file I used to cross-correlate with the chirp had zero noise - it consists of 4 single point spikes convolved with the chirp - I guess that my PC answer should have had zero noise too. Interesting.
 
The code is a bit 'not very standard C style' but I think all the indices are correct, and 0 based.

I did a bit of rewrite into more typical C style:
Code:
void fft(double * Rex, double * Imx) {
//Do FFT on test data and save as Rex[] and Imx[] (Smith Table 12.4)
unsigned aL;
unsigned aM =  unsigned(log(FFTSIZE) / log(2)); // = 10
unsigned aK = 0;
double TR, TI;
unsigned myI;
unsigned aJ = (FFTSIZE/2);
unsigned int i;

//1******************* * **
  for(i = 1; i <= (FFTSIZE-2); i++) {
    if( i < aJ) {
      TR = Rex[aJ];
      TI = Imx[aJ];
      Rex[aJ] = Rex[i];
      Rex[aJ] = Imx[i];
      Rex[i] = TR;
      Imx[i] = TI;
    }
    aK = FFTSIZE/2;
    while(aK <= aJ) {
      aJ = aJ - aK;
      aK = aK / 2;
    }
    aJ = aJ + aK;
  }
  //2*********************
  for (aL = 1; aL <= aM; aL++) // aL should go from 1 to 10 for 1024 file
  {
    //Serial.println(aL, DEC);
    unsigned Le = (1<<aL); // i.e. 2,4,8,...512
    unsigned Le2 = Le/2; // i.e. 1,2,4...256
    double Ur = 1; 
    double Ui = 0; 
    double Sr = cos(M_PI/Le2);
    double Si = -sin(M_PI/Le2); // note negative
    unsigned JM1 = 0;
    //3********************** 
    for( (JM1 = 0); JM1 < Le2; JM1++)
    {
      //4**********************
      for ((myI = JM1); (myI <= FFTSIZE-1); myI=(myI+Le))
      {
        unsigned myIp = myI + Le2;
        TR = Rex[myIp] * Ur - Imx[myIp] * Ui;
        TI = Rex[myIp] * Ui + Imx[myIp] * Ur;
        Rex[myIp] = Rex[myI] - TR;
        Imx[myIp] = Imx[myI] - TI;
        Rex[myI] = Rex[myI] + TR;
        Imx[myI] = Imx[myI] + TI;
      }
      //4********************** 
      TR = Ur;
      Ur = TR * Sr - Ui * Si;
      Ui = TR * Si + Ui * Sr;
    }
    //3**********************
  }
  //2 ********************
  //Serial.println("Finished first FFT ");  
}
 
Found one error:

On line 16 in your code:

****Rex[aJ]*=*Imx;

Should be

****Imx[aJ]*=*Imx;
 
mlu
Thank you so much for your help! It could have taken me weeks to find that error. Here is the new (correct) cross-correlation output:

correct.JPG

I will also rewrite as you suggest.

Richard
 
mlu
Once again thanks for your help! I have completely rewritten my discontinuous cross-correlation program as you suggested and have now successfully transferred it to the Teensy 3.6 where one set of data now takes 82 millisec to process (clock 180 MHz). The model output below took 8 seconds to stack 100 (identical) sets. So a 10**4 stack is now possible, giving a 100 fold increase in s/n. Just what I need! Now for the real input circuit/software...
Richard
output.JPG
 
Status
Not open for further replies.
Back
Top