Forum Rule: Always post complete source code & details to reproduce any issue!
Page 2 of 2 FirstFirst 1 2
Results 26 to 43 of 43

Thread: Acoustic Echo Cancelation(AEC)

  1. #26
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    5,679
    mu: i meant, move only the multiplication outside , since it is constant inside that loop. this will not help much, but this is all very heavy, and every saved cycle helps

    Could you upload a wavefile, please ?
    Last edited by Frank B; 02-22-2016 at 10:03 PM.

  2. #27
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    5,679
    Quote Originally Posted by snirtikam View Post
    and also running at slower sampling rate.
    Would simple downsampling ((sample1+sample2)/2) work ?

  3. #28
    Senior Member
    Join Date
    Jan 2013
    Location
    San Francisco Bay Area
    Posts
    641
    Quote Originally Posted by snirtikam View Post
    I use MATLAB a lot, so...
    In case it's useful to anyone, octave is a drop-in replacement for matlab. As long as there is no gui stuff, things mostly just work. octave is slower than matlab, but in case someone needs to run a matlab function to verify something, octave would likely work.

  4. #29
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,327
    Ok, I've made a first attempt at applying the DSP extensions to one part of this. Since there's no test cases, I don't really have any way to verify if I got all the math right.

    Anyway, this original code:

    Code:
        for(int j = gg; j > 0; j-=1) {
          // j from 128 to 0
          // j+h from h+128 to h
          // gg-j from 0 to 128
          yhat += (x[j+h]*w[gg-j])/32768;
          xtdl += x[j+h]*x[j+h];
        }
    gets replaced by this:

    Code:
        uint32_t *ww =   (uint32_t *)(w);
        uint32_t *wend = (uint32_t *)(w + gg);
        uint32_t *xx = (uint32_t *)&(x[128 + h - 4]);
        int64_t yhat64 = 0;
        do {
          // TODO: avoid slow unaligned access when h is odd
          uint32_t w1 = *ww++; // bring in 4 values from w[]
          uint32_t w2 = *ww++;
          uint32_t x1 = *xx++; // bring in 4 values from x[]
          uint32_t x2 = *xx++;
          xx -= 8;
          // x order: w1b, w1t, w2b, w2t
          // w order: x2t, x2b, x1t, x1b
          yhat64 = multiply_accumulate_16tx16b_add_16bx16t(yhat64, x2, w1);
          yhat64 = multiply_accumulate_16tx16b_add_16bx16t(yhat64, w2, x1);
          xtdl = multiply_accumulate_16tx16t_add_16bx16b(xtdl, x1, x1);
          xtdl = multiply_accumulate_16tx16t_add_16bx16b(xtdl, x2, x2);
        } while (ww < wend);
        yhat = yhat64 / 32768;
    Ideally, the outer loop would be changed to process 2 samples, so we can avoid the slow case where h is odd.

    Before trying more DSP extension optimization, I really believe we should consider running the filter with 1 tap for every 2 audio samples. That should allow it to be run half as often, and require only half as many taps. Presumably echos don't occur on the time scale of 1/44100th of a second, so hopefully this won't affect the algorithm's performance too much?

    Reducing the filter this way will also allow the DSP extensions to be much better, because processing 2 samples at once will work nicely with the 32 bit memory bus, to avoid the h = odd case that's much slower than when h is even.

  5. #30
    Junior Member
    Join Date
    Feb 2016
    Posts
    17
    I'm not sure that this will be correct for NLMS. I will try to run it and see if it works. thank you for the update. I didnt get a chance to upload the sound sample but hopefully by tonight I will. Thank you again.
    Last edited by snirtikam; 02-22-2016 at 11:50 PM.

  6. #31
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,327
    In theory, that code should perform exactly the same math. Well, except the divide by 32768 is moved outside the loop, so perhaps very slight differences from round off errors. Otherwise, it should be identical.

    The DSP extension version just loads 4 values of w[] and 4 values of x[] into packed 32 bit registers. Then it does the exact same math, but using the special instructions which do two 16x16 multiplies and a 64 bit accumulate. Instead of index variables, pointers are used to sweep through the data.

    Of course, it's entirely possible I made a mistake. This is utterly untested. But absent any mistakes, this is supposed to be exactly the same math. The code looks very different, and it should run much faster, but the results are supposed to be identical.

  7. #32
    Junior Member
    Join Date
    Feb 2016
    Posts
    17
    Hello Paul, Thank you for improvements on the algorithm. I've uploaded the sound sample into github. Hope it will help with testing.

    Thanks

  8. #33
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,327
    That file is much too large to fit into Teensy's internal flash.

    Can you upload the original WAV file?

  9. #34
    Junior Member
    Join Date
    Feb 2016
    Posts
    17
    Sorry about that. Didn't realize it would be so bit. I thought it would fit on SD card. Anyway, i've uploaded original .wav file.

  10. #35
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,327
    I'm listening to the file now. I do not hear any echo.

  11. #36
    Junior Member
    Join Date
    Feb 2016
    Posts
    17
    That file contains no echo, but if you feed this file as input for both 'Mic' and 'x', then error should output nothing, or close to nothing. Basically, we are making sure that NLMS is working and it is eliminating the signal. They way i've tested NLMS algorithm on Teensy is with the sketch i've uploaded to github. In that sketch I'm taking signal from microphone and sending it to NLMS as both 'Mic' and 'x'. the error array is the one i'm sending to be played. If NLMS is working then speaking into microphone should not produce any output in the speaker or headphones. Now this set up is one sided and just shows proof of concept that NLMS is working. Next step would be to connect two teensys together, and in that case, 'Mic' will take input from microphone, 'x' will be incomming signal from the other teensy, which will also be played to speaker, and error will be sent to the speaker of the other teensy. Hopefully that makes sense.

    Click image for larger version. 

Name:	filter7.jpg 
Views:	78 
Size:	35.8 KB 
ID:	6448

  12. #37
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,327
    Oh, I misunderstood the state of this code's development. I assumed it was already known to be working and only in need of optimization. Looks like I need to undo the optimization which removed the redundant input pointer!

  13. #38
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,327
    Ok, I've restored the Mic input pointer.

    Perhaps it's much too early to consider applying the DSP extensions. As you can see from the portion I already did, the DSP extensions require complicated code. Normally this is only done after the simpler, normal style C code has been throughly tested on non-realtime data.

  14. #39
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,327
    Quote Originally Posted by snirtikam View Post
    Next step would be to connect two teensys together....
    Click image for larger version. 

Name:	filter7.jpg 
Views:	78 
Size:	35.8 KB 
ID:	6448
    I'm afraid I must disagree. That's an insane plan for debugging this code!

    The next step would be composing several WAV files with varying types of echo. Then test the NLMS code on all of them, initially by reading the file and writing another file, before the code is optimized enough to run int real time.

    Optimization for speed should be attempted only after the code is very throughly tested for correctness!

    The next step is most certainly NOT to start running poorly tested code on 2 devices separated by substantial distance and operated by 2 people! That's a certain recipe for a frustration and (probably) never getting this working.

  15. #40
    Junior Member
    Join Date
    Feb 2016
    Posts
    17
    I was getting way ahead of myself. I agree, testing between two teensys is the last step. One way to simulate echo would be to put the audio file through a delay by few milliseconds and send that to 'Mic' in NLMS algorithm and the original one to 'x'. This would be a good way to simulate echo.

  16. #41
    Senior Member PaulStoffregen's Avatar
    Join Date
    Nov 2012
    Posts
    20,327
    For the sake of testing, I believe you should create several small WAV files with various echoes already encoded into the data. You can use Audacity or any number of other sound editing programs to create these sounds files. Or you could even do it in Matlab and export the data (somehow).

    At the very least, you should create some files with multiple copies of the original echoed at different times. At least a couple files where the echoes have been low-pass filtered would be a good plan.

    Even better would be to physically use a tightly closed room with hard walls and record real sound with natural room echo. Many inexpensive sound recording devices exist, and if you don't have access to any, you can probably use a modern iPhone or Android to make a real recording with similar or better fidelity than you'll get from inexpensive microphones (eg, on the Teensy audio shield).

    Sound clips should be limited to 4 or 8 seconds, so they'll fit into Teensy's internal flash memory, with ulaw encoding at 44 or 22 kHz rate. For testing, the same file can be played over and over. Small files that fit entire in Teensy's memory will make testing simpler, since they'll avoid the large overhead of SPI communication to the SD card or external flash chip.

    Recording some good test sounds is relatively easy, compared to developing these algorithms and testing and optimizing the code. So don't skimp on this important step. The better your test clips, the better your odds of getting the code working without unpleasant surprises later. And by unpleasant surprise, I meant a critical flaw that means you have to go back to almost the beginning and redo almost everything!

  17. #42
    Junior Member
    Join Date
    Jan 2016
    Posts
    4
    The idea of the NLMS algorithm isn't to remove the echoes and allow the original speech through. It is to eliminate the speech and the echoes caused by the response of the room. This is mainly used for teleconferencing systems.

    Consider the following diagram:

    Click image for larger version. 

Name:	Screen Shot 2016-02-23 at 11.39.42 AM.png 
Views:	58 
Size:	130.0 KB 
ID:	6451

    Without cancelation the user on the far end hears their voice.

    Click image for larger version. 

Name:	Screen Shot 2016-02-23 at 11.39.50 AM.png 
Views:	57 
Size:	127.0 KB 
ID:	6452

    The AEC NLMS algorithm removes this voice.

    Click image for larger version. 

Name:	Screen Shot 2016-02-23 at 11.39.56 AM.png 
Views:	61 
Size:	104.8 KB 
ID:	6453


    The NLMS algorithm is a basically adapting its coefficients by taking steps the size of mu until it minimizes the error. You want the output of the system e[n] to be 0. This means that the error of the difference between the coefficients of your adaptive filter and the signal at the mic is 0.

    Click image for larger version. 

Name:	Screen Shot 2016-02-23 at 11.39.22 AM.png 
Views:	65 
Size:	54.7 KB 
ID:	6450



    That being said, I think recording speech data from a typical room and using it to see if the algorithm is working is a good idea. The results we would be looking for from this experiment is what snirtikam described before. We wouldn't want to hear any speech from the wav file in the headphones.


    https://en.wikipedia.org/wiki/Echo_s...d_cancellation

    https://en.wikipedia.org/wiki/Least_mean_squares_filter
    Last edited by EEKFILL; 02-23-2016 at 06:19 PM.

  18. #43
    Senior Member+ Frank B's Avatar
    Join Date
    Apr 2014
    Location
    Germany NRW
    Posts
    5,679
    We have an example-wavefile here (dl the zip in the first post):

    https://forum.pjrc.com/threads/32623...ll=1#post94492

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •