T3: linking libarm_cortexM3l_math.a



el_supremo
01-04-2013, 06:50 PM
Using Teensy3 with beta10 on Windows.
I'm trying to test the code mentioned in this Arduino DUE forum:
http://arduino.cc/forum/index.php/topic,140107.0.html

The only way I have found to get the library to link in with the test code is to essentially do a compile from the command line - although I use a TCL script to make life easier.
The main problem is being able to insert "-larm_cortexM4l_math" immediately before "-lm" in the gcc command which links everything together. The boards.txt file has a way to insert link options but it only inserts them near the front of the command and the link fails with unresolved references if I do that.
I tried modifying compiler.java to allow me to insert the -l command where I want it but when I restart the IDE it doesn't have any effect.
My generic question is how can I insert a -l option near the end of the gcc command and more specifically, how do I get the IDE to use a modified compiler.java?

Pete
[edit] fixed a typo/idiocy :)

el_supremo
01-04-2013, 10:16 PM
And FYI:
So far I have converted four of the examples from the CMSIS library that is in the Arduino DUE 1.5.1 distribution.
Two of them, arm_dotproduct_example_f32 and arm_fir_example_f32, run and pass their internal test.

arm_convolution_example_f32
runs but fails its internal SNR test

arm_fft_bin_example
hangs when it calls arm_cfft_radix4_f32

Pete

hpyle
01-04-2013, 10:52 PM
You should be able to just add

INPUT(c:/dev/teensy/lib/libarm_cortexM4l_math.a)

into the linker script file (mk20dx128.ld). Adjusting the absolute path of course.

hpyle
01-04-2013, 10:54 PM
Aside: your example says M3. Are you using the M4 libraries, with

#define ARM_MATH_CM4

in your sketch? Should be faster than defining M3.

el_supremo
01-04-2013, 11:04 PM
Thanks very much I will give that a try.
and yes, M3 should be M4 - thanks.

Pete

el_supremo
01-04-2013, 11:36 PM
Modifying the linker script doesn't work properly - I'm probably doing something wrong.
It does seem to find everything in the library except for something called aeabi_f2lz and it also can't find memset.

Pete

PaulStoffregen
01-05-2013, 12:09 AM
One simple way, but not very convenient, would be to manually run the linker. For initial testing this probably makes sense.

In File > Preferences you can turn on verbose info while compiling.

You could just copy the (very long) linking command from Arduino's console panel to a command line, of course with the extra library added. It will overwrite the .elf file. Then copy the objcopy command that creates the .hex from the .elf. As long as the code has compiled successfully even once, the Teensy Loader will remember the name and directory of the .hex file, so after you manually rewrite the .elf and .hex, just leave the Teensy Loader in auto mode and press the button on Teensy.

That's not a very elegant solution, but it is simple and should at least let you test without having to alter the linker script or Arduino IDE. You could put the 2 commands into a script or batch file (if Windows) to automate this to some degree.

Teensyduino's modified .java files are installed to a src directory. You can edit that code, but to use your changes you'll need to recompile the IDE. The first and most difficult step is getting a copy of 1.0.3's source code, and then copying those .java files into it. The source tarball on Arduino's website is incomplete, so you must get it from github (with each Arduino release, I go through quite a lot of work to pull exactly the same code they used, and I compile on all 4 platforms and carefully verify the java bytecode output against Arduino's published downloads). To save you the trouble, here's my copy of 1.0.3's source with the patched .java files already in place.

http://www.pjrc.com/teensy/beta/arduino-1.0.3-src-patched.tar.gz

Please let me know when you've downloaded this large file. I'll then remove it from the server (this link will turn 404).

You'll need the Java JDK and Ant to compile this. On Windows, you'll probably need a cygwin setup to run. It's much easier on Mac and Linux. Also, you should use the JDK from Oracle, version 6u34. It might work if you use OpenJDK (the default on most Linux systems), but the Arduino developers and I use Oracle's JDK, so you'll get the same results if you use that version. This page has info about how to build the code:

http://code.google.com/p/arduino/wiki/BuildingArduino

After you recompile Arduino, you'll have a fresh copy of Arduino without the rest of the Teensyduino stuff. Everything else is just files, so you can copy the hardware/teensy and hardware/tools directories over to the freshly compiled copy. Or you can copy lib/pde.jar from the freshly compiled copy into the one with Teensyduino installed.


The new Arduino 1.5.x has a much more configurable build system. To be honest, I haven't looked at it in detail yet. I plan to support it when (or shortly after) they release 1.5.2. In theory, that system should make these types of customizations easier.

el_supremo
01-05-2013, 12:25 AM
Hi Paul,
I have downloaded the file just in case I try to recompile the IDE - thank you. But it isn't very likely I'll do that because it is a huge amount of work mucking about with cygwin on Windows.

I've written a TCL script which does all the heavy lifting after the verbose compile. I copy the entire gcc linking command and paste it into a TCL variable. Then I run the TCL script. The script extracts the name of the project and the path to the temporary directory from that variable and then inserts the "-larm_cortexM4l_math" in front of "-lm" (it also changes all the backslashes to slashes). Then it executes the gcc command. It then runs the two objcopy commands and follows that with the teensy_post_compile.
So I just do a compile in the IDE, copy and paste the string and then run the script. As long as it doesn't have any linker errors, the sketch is uploaded to the Teensy3 when I press the reset button.
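
For readers without TCL, the string rewriting described above can be sketched in standard C++. This is only an illustration, not Pete's script: the function name patchLinkCommand is hypothetical, and it assumes no argument in the copied command contains spaces. The ordering matters because the linker scans static libraries left to right, so a library that uses libm functions (sqrtf, for example) has to appear before -lm.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: rewrites a link command copied from the IDE console.
std::string patchLinkCommand(std::string cmd, const std::string& extraLib) {
    // Change Windows backslashes to forward slashes.
    std::replace(cmd.begin(), cmd.end(), '\\', '/');

    // Split on whitespace (assumption: no argument contains spaces).
    std::istringstream in(cmd);
    std::vector<std::string> args;
    for (std::string tok; in >> tok;) args.push_back(tok);

    // Insert the extra library immediately before the first "-lm".
    // If "-lm" is absent, find() returns end() and the library is appended.
    auto pos = std::find(args.begin(), args.end(), "-lm");
    args.insert(pos, extraLib);

    // Join the arguments back into a single command line.
    std::string out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i != 0) out += ' ';
        out += args[i];
    }
    return out;
}
```

Calling patchLinkCommand(copiedCommand, "-larm_cortexM4l_math") yields the command to run by hand; the objcopy and teensy_post_compile steps would still follow as described.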

Pete

hpyle
01-05-2013, 04:42 PM
Thanks Paul, Pete - lovely.
Maybe editing the linker script loads objects in the wrong sequence? It nearly-worked for me too, unresolved "sqrtf" with my test project.
So I rebuilt from Paul's source tree, and a small patch to Compiler.java,

247a248,252
> for (int i = 1; true; i++) {
> String additionalObject = boardPreferences.get("build.additionalobject" + i);
> if (additionalObject == null) break;
> baseCommandLinker.add(additionalObject);
> }

then editing hardware/teensy/boards.txt to add the path to the CMSIS library,

teensy3.build.additionalobject1=c:/dev/teensy/lib/libarm_cortexM4l_math.a

and... it builds :cool:
If you want to try this, the pde.jar is here (http://cabezal.com/misc/arduino-pdejar-103-patched.zip).
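
The lookup pattern in the patch (read build.additionalobject1, build.additionalobject2, ... until a key is missing) can be shown outside the IDE. The sketch below is plain C++ rather than the IDE's Java; Prefs and collectAdditionalObjects are made-up names standing in for boardPreferences and the patched loop.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for the IDE's board preferences (hypothetical type).
using Prefs = std::map<std::string, std::string>;

// Collects build.additionalobject1, 2, 3, ... in order and stops at the
// first index that is not defined, as the patch's loop does.
std::vector<std::string> collectAdditionalObjects(const Prefs& prefs) {
    std::vector<std::string> objects;
    for (int i = 1; ; ++i) {
        auto it = prefs.find("build.additionalobject" + std::to_string(i));
        if (it == prefs.end()) break;  // first gap ends the scan
        objects.push_back(it->second);
    }
    return objects;
}
```

One consequence of this design: numbering must be contiguous. With entries 1, 2 and 4 defined, only 1 and 2 are picked up.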

hpyle
01-05-2013, 05:45 PM
Wow, it runs quick. For 1000 iterations,
fix_fft (uint16_t): 15.0 seconds
arm_cfft_radix4_f32 (float): 40.0 seconds
arm_cfft_radix4_q31 (int32_t): 5.2 seconds
arm_cfft_radix4_q15 (int16_t): 2.3 seconds.

el_supremo
01-05-2013, 06:50 PM
@hpyle
Excellent! Thanks very much.
I have put the .a file in
C:\teensy\arduino-1.0.3\hardware\tools\arm-none-eabi\arm-none-eabi\lib
so I added this line to my boards.txt file:
teensy3.build.additionalobject1=-larm_cortexM4l_math

Pete

el_supremo
01-05-2013, 09:32 PM
@hpyle
Did you have to do anything to fix_fft to make it work on Teensy3? The sketches I have which work on Arduino just produce garbage on Teensy3.

Pete

hpyle
01-05-2013, 11:23 PM
No, I have the same problem; both 8-bit and 16-bit fix_fft produce mush as far as I can tell.

PaulStoffregen
01-06-2013, 11:08 AM
I've put this patch on my to-do list. Next time I work on the java code I'll incorporate this patch.

PaulStoffregen
02-01-2013, 05:00 PM
I included this patch in Teensyduino 1.12.

hpyle
02-01-2013, 05:15 PM
Thanks Paul!

wintrmute
05-23-2013, 08:22 AM
OK, replying to myself, but here's where I got up to, which didn't require recompiling the arduino java ide.
Grab libarm_cortexM4l_math.a from the internet and copy it to your arduino folder, say into a 'lib' directory.
From the Arduino IDE, cp ./ide-1.5.2/hardware/arduino/sam/system/CMSIS/CMSIS/Include/* ide-1.0.5/hardware/teensy/cores/teensy3/
Also copy the arm_math.h file into that directory as well.

Then go into ide-1.0.5/hardware/tools/arm-none-eabi/bin and: mv arm-none-eabi-gcc arm-none-eabi.gcc.real
Then I created a tiny Perl script to replace it, which checks whether the arguments mention a Cortex CPU and, if so, adds the -L and -l library options to include the library we downloaded earlier, and moves the -lm to the end of the array.

This works to compile and upload a sketch for me, but the Teensy seems to crash when it runs it.
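
The argument rewriting that the wrapper performs can be sketched as follows. This is not the Perl script from the gist, just a C++ illustration of the description above; rewriteArgs and the libDir parameter are hypothetical names, and re-appending -lm only when it was originally present is an assumption.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch of a compiler-wrapper's argument rewriting.
std::vector<std::string> rewriteArgs(const std::vector<std::string>& args,
                                     const std::string& libDir) {
    // Only touch invocations that mention a Cortex CPU (e.g. -mcpu=cortex-m4).
    const bool targetsCortex = std::any_of(
        args.begin(), args.end(),
        [](const std::string& a) { return a.find("cortex") != std::string::npos; });
    if (!targetsCortex) return args;

    // Copy everything except -lm, remembering whether it was there.
    std::vector<std::string> out;
    bool hadLibm = false;
    for (const std::string& a : args) {
        if (a == "-lm") { hadLibm = true; continue; }
        out.push_back(a);
    }

    // Append the search path and the library, then put -lm back at the end.
    out.push_back("-L" + libDir);
    out.push_back("-larm_cortexM4l_math");
    if (hadLibm) out.push_back("-lm");
    return out;
}
```

A real wrapper would then exec the renamed compiler (arm-none-eabi.gcc.real in the steps above) with the rewritten list.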

wintrmute
05-23-2013, 09:30 AM
OK, after some hassles with what I think were memory overruns, I've made a sketch that runs the FFT functions and completes.
I thought I'd share the Perl script that monkey-patches the libraries into the command line as well in case that's useful to people.

The perl script I mentioned is at: https://gist.github.com/TJC/5633484
A complete sketch that compiles and produces output from the FFT on serial is at: https://gist.github.com/TJC/5633491

wintrmute
05-23-2013, 10:34 AM
By the way, some comments earlier were that only "mush" was coming out of the Cortex FFT routines.
I've just run the sample arm data through both the Teensy and also libfftw on my Ubuntu desktop, and the results came up the same, albeit with less precision on the cortex, but I think that's just the Serial library rounding floats to two decimal places.


13.86
41.35
7.63
20.12
22.34
49.17
31.38
25.18
26.54
4.77
35.07
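
On the rounding: Arduino's print functions show floats with two decimal places unless a digit count is passed as a second argument, e.g. Serial.println(value, 6). The effect can be reproduced off-target with the small stand-in below; formatFixed is a hypothetical helper and the input value is made up for illustration, not taken from the FFT output.

```cpp
#include <cstdio>
#include <string>

// Formats a value with a fixed number of decimal places, mimicking what a
// two-digit default does to the printed FFT magnitudes.
std::string formatFixed(double value, int digits) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.*f", digits, value);
    return buf;
}
```

formatFixed(13.8571, 2) gives "13.86", while formatFixed(13.8571, 4) keeps "13.8571", so asking for more digits on the Teensy side should make the comparison with the desktop results tighter.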

Kian
05-29-2013, 03:43 AM
Hi guys,

Seems like some interesting work here. I am interested to try out FFT too in my project. However, this discussion seems all too confusing for me.
I am running Mac OS 10.7.5, using Arduino 1.0.3 with Teensy loader 1.14 ( I installed the latest version but it appears as version 1.07 when I check the version in the toolbar "About").

What do I need to do to get my code to compile? I have already downloaded the libarm_cortexM4l_math.a file but I don't know which folder to put it in, or whether I still need to edit any files or do some patching.

Appreciate any help I can get.

Thanks.

wintrmute
05-29-2013, 03:53 AM
Did you read my two posts (#17 and #18) above yours? That's what you need to do. I think it should be the same on OSX as it was for me on Linux.

Kian
05-30-2013, 02:48 AM
Yes, I did read your previous post. My problem is it seems like the arduino directory structure is quite different between mac and linux.

On my mac, the arduino folder is organized as follows:

Arduino/Contents/Resources/Java/hardware/arduino/...

I basically don't see any sam or CMSIS folder inside that. Just to make sure, I have enabled hidden files to be seen too.

Inside my main Arduino root directory, I only see 2 folders, "Contents" and "scr". Do I create a new lib folder and put the libarm_cortexM4l_math.a inside it?







wintrmute
05-30-2013, 02:54 AM
Ah, perhaps it wasn't clear from what I wrote, but when I was mentioning directories called "ide-1.0.5" and "ide-1.5.2" I was referring to those versions of the Arduino IDE.
ie. I've downloaded both versions, and installed them separately.
Then I've run teensyduino and installed it into the 1.0.5 one, and then manually copied some parts of 1.5.2 over myself.

Kian
05-30-2013, 03:07 AM
Thanks for the clarification.

But I still can't find the CMSIS folder. Hmm, maybe Paul might be able to advise where it is if he happens to read this thread.

Kian
05-30-2013, 08:14 AM
I think I know what is wrong. The CMSIS folder is found in the Arduino 1.5.2 distribution only. So I have to copy it and paste it into my Arduino 1.0.5 distribution.

wintrmute
05-30-2013, 08:27 AM
I think I know what is wrong. The CMSIS folder is found in the Arduino 1.5.2 distribution only. So I have to copy it and paste it into my Arduino 1.0.5 distribution.

I thought that's what I said earlier today! I guess it wasn't very clear. Sorry.

Kian
05-31-2013, 07:58 AM
Sadly, it's still not working for me.

This is what I did:

1. Download libarm_cortexM4l_math.a from the internet and copy it to my arduino folder, into a new 'lib' directory located at:
Arduino.app/Contents/Resources/Java/hardware/arduino/lib/

2. Download Arduino 1.5.2. Copy:
Arduino1.5.2/Contents/Resources/Java/hardware/arduino/sam/system/CMSIS/CMSIS/Include/* to
Arduino.app/Contents/Resources/Java/hardware/teensy/cores/teensy3/

3. Go to Arduino.app/Contents/Resources/Java/hardware/tools/arm-none-eabi/bin and rename arm-none-eabi-gcc to arm-none-eabi.gcc.real

4. Download the perl script from https://gist.github.com/TJC/5633484, name it as arm-none-eabi-gcc and put it inside:
Arduino.app/Contents/Resources/Java/hardware/tools/arm-none-eabi/bin

5. Change the path inside arm-none-eabi-gcc:
From:
push @new_args, "-L/home/tobyc/git/arduino/lib", "-larm_cortexM4l_math", "-lm";
To:
push @new_args, "./../../../arduino/lib", "-larm_cortexM4l_math", "-lm";


So now when I try to compile a program that performs FFT, I get the following error:
Cannot run program "Arduino.app/Contents/Resources/Java/hardware/tools/arm-none-eabi/bin/arm-none-eabi-gcc": error 13, Permission denied.

Any help?

wintrmute
05-31-2013, 08:05 AM
chmod +x Arduino.app/Contents/Resources/Java/hardware/tools/arm-none-eabi/bin/arm-none-eabi-gcc

PaulStoffregen
05-31-2013, 11:47 AM
This is what I did:


Wow, that's a lot of complex steps!!!!

Several months ago I added hpyle's patch (http://forum.pjrc.com/threads/14845-T3-linking-libarm_cortexM3l_math-a?p=17298&viewfull=1#post17298) to Teensyduino, so you should not need to do all that crazy stuff.

First, extract a fresh copy of Arduino 1.0.5 and install Teensyduino 1.14 to it.

Then edit boards.txt. On a Mac, that should be in Arduino.app/Contents/Resources/Java/hardware/teensy/boards.txt. Add just one line:



teensy3.build.additionalobject1=/full/pathname/to/libarm_cortexM4l_math.a


Of course, replace "/full/pathname/to" with the correct pathname to where your copy of that file is located. It doesn't matter where, just make sure you get the full pathname correct.

Restart Arduino (on a Mac, use cmd-Q to fully quit, since like most Mac programs it stays running after you close the last window). Arduino rereads boards.txt when it starts up, so you must restart for your edit to that file to take hold.

To check if your edit worked, use File > Preferences to turn on verbose info while compiling. Then compile something. The (very long) linker command should have the libarm_cortexM4l_math.a near the end of the huge list of files it builds into your final .elf image.





Edit: if you read hpyle's post about the patch, DO NOT copy that .jar file into your Arduino lib directory. It's an old version and is certainly not correct for 1.0.5. I designed the Teensyduino installer to very carefully check for compatibility of the .jar files against known versions. The "Next" button in the installer only enables if *MANY* checks all pass. I put an incredible amount of work into making that installer very safe, it can't result in corrupting a copy of Arduino (or at least the odds of corruption are extremely low). But as soon as you start copying and replacing binary files without extreme attention to versions and platforms, all sorts of subtle problems can occur. It's very easy to end up with a copy of the Arduino IDE which seems to work, but then has unexpected failures. I'd highly recommend you start over with a fresh copy of Arduino+Teensyduino, then only edit boards.txt.

Kian
06-02-2013, 02:32 PM
Thanks Paul!

I am still facing a problem here. I followed your instructions and deleted my Arduino and Teensy program, doing a fresh install of Arduino and Teensyduino.

I have no errors in the verbose output. However, when I try to run my program for FFT, I get this error:

fatal error: arm_math.h: No such file or directory


I also have this very weird problem. I have downloaded teensyduino from here (http://www.pjrc.com/teensy/td_download.html)

But when I check the version, it's showing up as Teensy Loader 1.07.


PaulStoffregen
06-02-2013, 06:44 PM
fatal error: arm_math.h: No such file or directory


You'll need to copy that .h file and any others it needs. The simplest approach would be to just put them into your program's folder.



But when I check the version, it's showing up as Teensy Loader 1.07.


That's normal. Originally the Teensy Loader was developed separately from Teensyduino, so it has its own independent version number.

I'm considering just bumping it up to 1.15 and keeping it in sync with Teensyduino, which ought to be less confusing?

el_supremo
06-02-2013, 08:14 PM
For the include files needed by CMSIS, I copied the contents of this directory:
hardware\arduino\sam\system\CMSIS\CMSIS\Include
from the Arduino 1.5.x distribution (the sam directory is not part of 1.0.5) into a new subdirectory under my Teensy libraries subdirectory:
teensy\libraries\arm_math

Pete

wintrmute
06-03-2013, 04:59 AM
Wow, that's a lot of complex steps!!!!

Several months ago I added hpyle's patch (http://forum.pjrc.com/threads/14845-T3-linking-libarm_cortexM3l_math-a?p=17298&viewfull=1#post17298) to Teensyduino, so you should not need to do all that crazy stuff.


Wow, I wish I'd known that before hacking everything up myself! >.<

On another related topic though.. The version of the Cortex M4 library that I found packaged with Arduino IDE 1.5.1 was quite old; it only supports now-deprecated FFT functions. I don't suppose anyone has a newer version available?

PaulStoffregen
06-04-2013, 11:36 PM
I'm considering adding the math library and header to Teensyduino 1.15.

Does anyone have any input or ideas about how (or if) I ought to do this?

My inclination is to link the .a file by default as part of the build, and put the header somewhere the compiler can find it, but you'd still need to put a #include "arm_math.h" into your program if you want to use it. How does that sound?

el_supremo
06-05-2013, 01:49 AM
That would be very useful Paul.

Pete

wintrmute
06-05-2013, 02:24 AM
I'm considering adding the math library and header to Teensyduino 1.15.

Does anyone have any input or ideas about how (or if) I ought to do this?

My inclination is to link the .a file by default as part of the build, and put the header somewhere the compiler can find it, but you'd still need to put a #include "arm_math.h" into your program if you want to use it. How does that sound?

I think it's a good idea to add this to Teensyduino.
Note that there are some other headers that arm_math.h depends on, such as arm_common_table.h, core_cm4.h, etc.
You probably are aware, but I thought I'd mention it just in case.

mbustosorg
06-05-2013, 07:14 AM
Wow, timing is everything. I was just starting to look into this for my project to have some more interesting frequency response. (https://www.facebook.com/seagrassProject)

Everything worked up to the link. I'm doing most of my development from a makefile but wanted to try to get this to work through the IDE. I got the lib from https://code.google.com/p/rt-thread/source/browse/trunk/bsp/efm32/Libraries/CMSIS/Lib/GCC/libarm_cortexM4l_math.a?spec=svn2124&r=2124

I'm looking at the .ld file but I'm not sure if the problem is there or in the .a file. Any thoughts?

Thanks again for the great post.

mauricio

/Applications/Development/Arduino/Arduino.app/Contents/Resources/Java/hardware/tools/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/4.7.2/../../../../arm-none-eabi/bin/ld: fftTest.cpp.elf section `.text' will not fit in region `FLASH'
/Applications/Development/Arduino/Arduino.app/Contents/Resources/Java/hardware/tools/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/4.7.2/../../../../arm-none-eabi/bin/ld: region `FLASH' overflowed by 47720 bytes
collect2: error: ld returned 1 exit status

mxxx
06-05-2013, 09:04 AM
I'm considering adding the math library and header to Teensyduino 1.15.

seconded, that would be very convenient...

MichaelMeissner
06-05-2013, 01:21 PM
I wonder how much floating point is being done on Teensy 3.0's, and whether there is enough of a market for Paul to consider using a chip with hardware floating point for Teensy 3.0++. The Cortex M4F has single precision floating point built in (but not double, so you would need to use 'float' as a type, and add 'f' suffixes to constants to keep the calculations in single precision). Some things naturally lend themselves to being done in floating point, but is it a significant share of the market to justify going to a more expensive chip?
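
The point about 'f' suffixes can be checked at compile time. In the sketch below (function names are made up for illustration), multiplying a float by an unsuffixed constant promotes the whole expression to double, while the suffixed constant keeps it in single precision.

```cpp
#include <type_traits>

// 0.5 is a double literal, so the float operand is promoted to double.
inline auto scaleWithDoubleConstant(float x) { return x * 0.5; }

// 0.5f is a float literal, so the arithmetic stays in single precision.
inline auto scaleWithFloatConstant(float x) { return x * 0.5f; }

static_assert(std::is_same<decltype(scaleWithDoubleConstant(1.0f)), double>::value,
              "unsuffixed constant promotes the expression to double");
static_assert(std::is_same<decltype(scaleWithFloatConstant(1.0f)), float>::value,
              "f suffix keeps the expression in float");
```

On a part with a single-precision-only FPU, the first form would presumably fall back to software double routines, which is the cost the suffix avoids.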

wintrmute
06-05-2013, 02:06 PM
I wonder how much floating point is being done on Teensy 3.0's, and whether there is enough of a market for Paul to consider using a chip with hardware floating point for Teensy 3.0++. The Cortex M4F has single precision floating point built in (but not double, so you would need to use 'float' as a type, and add 'f' suffixes to constants to keep the calculations in single precision). Some things naturally lend themselves to being done in floating point, but is it a significant share of the market to justify going to a more expensive chip?

The Cortex M4 has a DSP that provides a lot of functions, many of which are available in single-precision floating point. I thought those *were* performed in hardware by the DSP, even though the general CPU doesn't have a hardware floating point system.

I note that the chip only has 16kb to play with, so using doubles would be halving the amount you could store and operate upon. (I'm having issues with this myself currently, and that's just with single-precision!)

PaulStoffregen
06-05-2013, 04:10 PM
First, I'd like to ask anyone who's actually used this library to post any sketches that use it. I'll use them for testing. So far, the only one I've seen is this one from wintrmute (https://gist.github.com/TJC/5633491). If anyone has any others, please post them to this thread?

So far, I have not personally used this library, but I intend to work with it soon.

I have spent a LOT of time with the ARM reference manual and other ARM documentation. One thing is pretty clear... the "DSP" feature of the Cortex-M4 is mostly designed for the "q15" variable type, which is a 16 bit fixed point number representing +0.99996 to -1.00000.
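
To make the q15 range concrete: the 16 bit integer is interpreted as value / 32768, so the largest code 32767 is 0.99997 (rounded) and the smallest code -32768 is exactly -1.0. The helpers below are an illustrative sketch with hypothetical names; the CMSIS library has its own array conversion routines (arm_float_to_q15 and arm_q15_to_float), which are not reproduced here.

```cpp
#include <cstdint>

// Interpret a q15 code as a real number in [-1.0, +1.0).
constexpr float q15ToFloat(int16_t q) {
    return static_cast<float>(q) / 32768.0f;
}

// Convert a real number to q15: scale, round to nearest, saturate at the ends.
constexpr int16_t floatToQ15(float x) {
    const float scaled = x * 32768.0f;
    if (scaled >= 32767.0f) return INT16_MAX;   // +1.0 is not representable
    if (scaled <= -32768.0f) return INT16_MIN;
    return static_cast<int16_t>(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

static_assert(floatToQ15(0.5f) == 16384, "0.5 maps to half of full scale");
static_assert(floatToQ15(1.0f) == 32767, "positive overflow saturates");
static_assert(q15ToFloat(INT16_MIN) == -1.0f, "most negative code is exactly -1.0");
```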

@MichaelMeissner - I can't talk about Teensy++ 3.0 at this time, because we're still waiting for a new chip from Freescale. Until they officially release it, all the technical details are under NDA. The best I can say at this moment is that I believe you'll agree (in hindsight) that it was worth the wait.

el_supremo
06-05-2013, 04:55 PM
I've attached a ZIP archive of 7 CMSIS sketches that I've got working on Teensy 3.

Pete
[edit] I have two others, one of which runs but fails its test and the other hangs.

MichaelMeissner
06-05-2013, 05:18 PM
@MichaelMeissner - I can't talk about Teensy++ 3.0 at this time, because we're still waiting for a new chip from Freescale. Until they officially release it, all the technical details are under NDA. The best I can say at this moment is that I believe you'll agree (in hindsight) that it was worth the wait.
Believe me, I know about NDA's and how frustrating it can be not to talk about them in wider contexts, particularly as people are asking for new and better stuff. I have various war stories over the years, but this probably isn't the forum to elaborate on them.

I was just wondering out loud how many people could benefit by having a chip that does floating point in the chip, rather than software emulation. I'm starting to think as the embedded microprocessors take on more tasks, the need for floating point increases. In terms of hobbyists buying a single chip retail, it is an interesting cross over point of whether it is better to go with a microprocessor like the Teensy 3.0/Due/mbed or a small single board computer like a Raspberry Pi/Beagle Bone Black/pcDuino/etc.

On the other hand, if you only use FP to deal with a temp. sensor once a second, you don't really need hardware floating point.

linuxgeek
06-05-2013, 06:52 PM
When the clock speeds get very fast (ie GHz), isn't it a harder software problem trying to interoperate with things that operate at much slower frequencies?
At least that's what I'm gathering from following Paul's and others' posts lately.

Which makes me wonder if it's not just better to have something like a multi-core teensy 3.0.
Maybe with shared DMA?

One problem would be that it would be less teensy, unless it's in the same chip. Or maybe the 2nd ARM could take the place of the Mini54? I admit that sounds kinda weird, but just wondering.

PaulStoffregen
06-05-2013, 07:54 PM
I've attached a ZIP archive of 7 CMSIS sketches that I've got working on Teensy 3.


Thanks!!!


Regarding fast hardware, I believe the supplied software makes the biggest difference. For example, in the recent benchmarks (http://www.pjrc.com/teensy/benchmark_usb_serial_receive.html), Arduino Due has the fastest USB hardware, but it's nearly the slowest board when you measure the actual speed using Arduino sketches.

While I haven't used this ARM math library yet (I certainly intend to do so soon), IAR and others have published some benchmarks that show it's extremely optimized. It's also not exactly easy to use, but I do believe it should be possible to build some nice audio processing libraries on top of it. I'm pretty excited about those possibilities.

Of course, faster hardware is pretty enticing too. You know I can't discuss Teensy++ 3.0 yet, but you'd be pretty safe to assume it will be faster. Even then, it's the software that makes the biggest difference, especially for people using Arduino and not looking to dig into low-level programming, so I'm going to keep investing most of my effort into the software side.

PaulStoffregen
06-05-2013, 10:27 PM
A complete sketch that compiles and produces output from the FFT on serial is at: https://gist.github.com/TJC/5633491

It seems to be missing arm_fft_bin_data.h :(

el_supremo
06-05-2013, 10:47 PM
There's a version of it here but as a .c instead of .h:
https://github.com/mechoid9/STM32F4/blob/master/STM32F4-Discovery_FW_V1.1.0/Libraries/CMSIS/DSP_Lib/Examples/arm_fft_bin_example/arm_fft_bin_data.c

Pete

adrianfreed
06-06-2013, 12:31 AM
Thanks!!!



Of course, faster hardware is pretty enticing too. You know I can't discuss Teensy++ 3.0 yet, but you'd be pretty safe to assume it will be faster. Even then, it's the software that makes the biggest difference, especially for people using Arduino and not looking to dig into low-level programming, so I'm going to keep investing most of my effort into the software side.

There are lots of nice audio applications that work well with a fast integer multiplier and a good fixed point library. There are also sound synthesis and audio processing techniques that are really hard to do with such constraints. Recursive filters (IIR) like resonators, for example, are problematic for audio applications even with single precision floating point. I am waiting for a bit more infrastructure to be accessible to people (easy codec hookup, DMA audio I/O) before throwing in some of the low hanging fruit (additive synthesis, subtractive synthesis, migrators, singing voice) in the integer DSP space.

The first teensy-like board with real single-precision floating point will attract a lot of attention from the music/audio hacking community. We would then be able to use some of the high-productivity tools like faust and PD to move a 50 year history of Audio/Music DSP code to it. Currently the top contenders for this sort of work are the new Beagle Bone and the Raspberry Pi (see satellite CCRMA) but it is awkward to have to shuttle the gesture between arduinoesque boards and the processor doing sound synthesis. Even though we built a protocol designed to solve that problem (OSC) I would rather do the sound synthesis and gesture I/O on the same chip.

mbustosorg
06-06-2013, 05:49 AM
OK! Thank you very much. I'm linked and running. The basic sample is passing the initial test. I'm going to start playing around more with real data but am very excited to have gotten over the hump. Makefile conniptions.

As for the market, in the Bay Area artistic community, this kind of audio analysis / visualization is what I'm hearing a lot about. The basic functionality with Arduino is all well and good but people are clamoring for something that can bridge the gap (like adrianfreed talks about) and keep the small footprint.

wintrmute
06-06-2013, 06:00 AM
It seems to be missing arm_fft_bin_data.h :(

Oh, sorry about that.
There was nothing special in it, just a huge array of floats that is used as the input to the fft function.
In the original it was 2048 entries, but I had to reduce it to half that in order for everything to fit in memory.

adrianfreed
06-06-2013, 06:00 AM
As for the market, in the Bay Area artistic community, this kind of audio analysis / visualization is what I'm hearing a lot about. The basic functionality with Arduino is all well and good but people are clamoring for something that can bridge the gap (like adrianfreed talks about) and keep the small footprint.


Yes, mbustosorg, watch out for CNMAT to host a BARCMUT meeting to pull this community together later in the summer.

PaulStoffregen
06-06-2013, 08:41 PM
I just posted 1.15-rc1 (http://forum.pjrc.com/threads/23777-Teensyduino-1-15-Release-Candidate-1-Available) with the math library included.

I tweaked the header files slightly, so they always work for Teensy 3.0 regardless of whether you define ARM_MATH_CM4 or other stuff. I tested only briefly, and only on Linux, with a few of Pete's samples.

Please give 1.15-rc1 a try. I'm very open to ideas on how this math stuff should be included. My hope is to keep the interface stable once 1.15 is fully released, so please give it a try now while there's still time to easily make changes.

mbustosorg
06-07-2013, 07:56 AM
Hi Paul,

I tested the CMSIS examples on the new Beta and they worked fine. I'm trying to add an analogRead to the example and I'm getting some linking errors. I'm not familiar with these. Not including the analogRead links ok.

Thanks,
mauricio


#define ARM_MATH_CM4
#include "arm_math.h"

#define TEST_LENGTH_SAMPLES 2048
#include "arm_fft_sine_data.h"

// NOTE: q15_t is int16_t in arm_math.h
uint32_t fftSize = 512;

/* ------------------------------------------------------------------
 * Global variables for FFT Bin Example
 * ------------------------------------------------------------------- */
uint32_t ifftFlag = 0;
uint32_t doBitReverse = 1;

uint32_t testInputIndex = 0;
float32_t testInput[TEST_LENGTH_SAMPLES];
static float32_t testOutput[TEST_LENGTH_SAMPLES/2];

void setup() {
  Serial.begin(19200);
  pinMode(13, OUTPUT);
  for (int i=0; i < 10; i++) {
    Serial.println(" start program ");
    delay(1000);
  }
}

// Written in the interrupt handler and read in loop(), so it must be volatile
volatile bool pit3Triggered = false;

extern "C" {
  //! Audio input interrupt handler running at 15kHz
  void pit3_isr(void)
  {
    pit3Triggered = true;
    digitalWrite(13, HIGH);
    digitalWrite(13, LOW);
    PIT_TFLG3 = 1;
  }

  void startup_late_hook(void) {
    // This is called from mk20dx128.c
    // Turn on interrupts:
    SIM_SCGC6 |= SIM_SCGC6_PIT;
    // turn on PIT
    PIT_MCR = 0x00;
    NVIC_ENABLE_IRQ(IRQ_PIT_CH3);

    PIT_LDVAL3 = 3200 - 1; // set up timer 3 period (15kHz): 48MHz / 15kHz = 3200 counts
    PIT_TCTRL3 = 0x2; // enable Timer 3 interrupts
    PIT_TCTRL3 |= 0x1; // start Timer 3
    PIT_TFLG3 |= 1;
  }
}

void loop() {

  float32_t maxValue;
  float32_t length = 256.0;
  if (pit3Triggered) {
    pit3Triggered = false; // take one sample per timer interrupt
    int sample;
    sample = analogRead (14);
    // Divide in floating point; sample / 1024 in integer math is always 0 for a 10 bit reading
    testInput[testInputIndex] = sample / 1024.0f * 10.0f;
    testInputIndex++;
    if (testInputIndex >= TEST_LENGTH_SAMPLES) {
      testInputIndex = 0;
    }
  }

  if (testInputIndex == 0) {
    arm_cfft_radix4_instance_f32 fft_inst; /* CFFT Structure instance */
    arm_cfft_radix4_init_f32(&fft_inst, length, ifftFlag, doBitReverse);

    uint32_t startTime, fftTime, magTime, maxTime;
    Serial.println("Start");
    startTime = millis();
    /* Process the data through the CFFT/CIFFT module */
    arm_cfft_radix4_f32(&fft_inst, testInput_f32_10khz);
    fftTime = millis();
    /* Process the data through the Complex Magnitude Module for
       calculating the magnitude at each bin */
    arm_cmplx_mag_f32(testInput_f32_10khz, testOutput, fftSize);
    magTime = millis();
    /* Calculates maxValue and returns corresponding BIN value */
    arm_max_f32(testOutput, fftSize, &maxValue, &testIndex);
    maxTime = millis();
    Serial.println("End");

    Serial.println(fftTime - startTime);
    Serial.println(magTime - fftTime);
    Serial.println(maxTime - magTime);
    Serial.println("TOTAL: ");
    Serial.println(maxTime - startTime);

    Serial.print("MaxValue: ");
    Serial.println(maxValue);
    Serial.print("MaxIndex: ");
    Serial.println(testIndex);

    Serial.print("Magnitudes: ");
    for (int j=0; j < length / 2; j++) {
      Serial.print(j);
      Serial.print(", ");
      Serial.println(testOutput[j]);
    }
  }
}



/Applications/Development/Arduino/Arduino1.0.5.app/Contents/Resources/Java/hardware/tools/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/4.7.2/../../../../arm-none-eabi/bin/ld: fftTest.cpp.elf section `.bss' will not fit in region `RAM'
/Applications/Development/Arduino/Arduino1.0.5.app/Contents/Resources/Java/hardware/tools/arm-none-eabi/bin/../lib/gcc/arm-none-eabi/4.7.2/../../../../arm-none-eabi/bin/ld: region `RAM' overflowed by 7916 bytes
collect2: error: ld returned 1 exit status

iwanders
06-07-2013, 09:08 AM
I have never used this arm_math library before, but I'm following this thread with great interest.

Hopefully I can help out here. Mauricio, I think your code just runs out of RAM.
You did the following:


#define TEST_LENGTH_SAMPLES 2048
float32_t testInput[TEST_LENGTH_SAMPLES];
static float32_t testOutput[TEST_LENGTH_SAMPLES/2];
A float32_t is four bytes, so you allocate 2048 * 4 + (2048 / 2 ) * 4 = 12288 bytes here.

I could not find the arm_fft_sine_data.h file, but I guess it looks like arm_fft_bin_data.c (https://github.com/mechoid9/STM32F4/blob/master/STM32F4-Discovery_FW_V1.1.0/Libraries/CMSIS/DSP_Lib/Examples/arm_fft_bin_example/arm_fft_bin_data.c), which would account for another 2048 * 4 = 8192 bytes. Together that puts you a few thousand bytes over the 16384 bytes of RAM the Teensy 3.0 has available.

By the way, your length variable is a float, which is unnecessary:

float32_t length = 256.0;
arm_cfft_radix4_init_f32(&fft_inst, length, ifftFlag, doBitReverse);
The length argument should be a uint16_t according to the header file.

In the following code:

/* Calculates maxValue and returns corresponding BIN value */
arm_max_f32(testOutput, fftSize, &maxValue, &testIndex);
You copy the fast Fourier transform result to testOutput, but fftSize is only 512, while testOutput was allocated with 2048 / 2 = 1024 entries, so you allocate double the size you need. (I'm not sure how the library deals with the symmetry of the FFT, though. Since you only use an FFT length of 256, the part from 255:511 will most likely be a mirror of the left part, so perhaps it would make sense to copy only the first 256 entries.)

Hopefully this helps, perhaps someone who has experience using this library can help out on how large the output vector should be.

mbustosorg
06-07-2013, 09:31 PM
Thank you for finding the error in my ways. I was indeed careless and shouldn't post so late (only in the mornings from now on).

Fixed and working.

Thanks again,
mauricio

Nantonos
06-18-2013, 10:26 AM
I came across a Freescale application note about CMSIS on ARM Cortex M4 which seems like a useful introduction
http://www.freescale.com/files/microcontrollers/doc/app_note/AN4489.pdf

Frank B
04-15-2014, 05:39 PM
Today I saw this topic, and since I'm playing with FreeIMU and Madgwick I did a little benchmark to see if things could be sped up with the DSP.
Maybe the vector math could be sped up drastically.

Here's a "quick and dirty" benchmark for sqrt and 1/sqrt.
Results first (-Os, 96 MHz, Teensy 3.1):



1000x dspSqrt:7580us. Result:21065.8378906
1000x sqrt :11700us. Result:21065.8378906

1000x 1 / dspSqrt:9302us. Result :4294967295.
1000x 1 / sqrt :23203us. Result :4294967295.
1000x invSqrt :4576us. Result:4294967295.



Source:



#include <math.h>
#include <arm_math.h>

HardwareSerial Uart = HardwareSerial();

inline float dspSqrt(float x){
float result;
arm_sqrt_f32(x, &result);
return result;
}


inline float invSqrt(float x) {
float halfx = 0.5f * x;
float y = x;
long i= *(long*)&y;
i = 0x5f375a86 - (i>>1);
y = *(float*)&i;
y = y * (1.5f - (halfx * y * y));
return y;
}

inline float dspInvSqrt(float x){
float result;
arm_sqrt_f32(x, &result);
return 1 / result;
}

void setup() {
Uart.begin(115200);
}


void loop(){
int time;
volatile float f;


time = micros();
f=0;
for (int i=0; i<1000; i++) {
f += dspSqrt(i);
}
time = micros() -time;

Uart.print("\r\n1000x dspSqrt:");
Uart.print(time);
Uart.print("us. Result:");
Uart.println(f,7);

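time = micros();
f=0;
for (int i=0; i<1000; i++) {
f += sqrt(i); // libm sqrt, for comparison with dspSqrt above
}
time = micros() -time; // elapsed time, as in the other blocks
Uart.print("1000x sqrt :");
Uart.print(time);
Uart.print("us. Result:");
Uart.println(f,7);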




time = micros();
f=0;
for (int i=0; i<1000; i++) {
f += dspInvSqrt(i);
}
time = micros() -time;
Uart.print("\r\n1000x 1 / dspSqrt:");
Uart.print(time);
Uart.print("us. Result :");
Uart.println(f);

time = micros();
f=0;
for (int i=0; i<1000; i++) {
f += 1/sqrt(i);
}
time = micros() -time;
Uart.print("1000x 1 / sqrt :");
Uart.print(time);
Uart.print("us. Result :");
Uart.println(f);


time = micros();
f=0;
for (int i=0; i<1000; i++) {
f += invSqrt(i);
}
time = micros() -time;
Uart.print("1000x invSqrt :");
Uart.print(time);
Uart.print("us. Result:");
Uart.println(f);




while(1);
}


...but invSqrt is faster than arm_math (?)

PaulStoffregen
04-15-2014, 07:16 PM
That invSqrt() looks like an older version of FreeIMU. Please get the latest from here:

https://github.com/PaulStoffregen/FreeIMU

Part of the reason it's so fast is its low accuracy. It only performs one iteration of the Newton-Raphson approximation.

Frank B
04-15-2014, 09:28 PM
Thank you, Paul! The C++ warning regarding "evil" operations is gone now, and the speed is identical.
I updated my "benchmark".

I think I'll use the "dspSqrt" version from above for my new project; it is not much slower but gives better results (I hope).

But there is another warning:



In file included from C:\Arduino\hardware\teensy\cores\teensy3/WProgram.h:15:0,
from C:\Arduino\hardware\teensy\cores\teensy3/Arduino.h:1,
from arm_math.ino:5:
C:\Arduino\hardware\teensy\cores\teensy3/wiring.h:42:0: warning: "PI" redefined [enabled by default]
In file included from arm_math.ino:3:0:
C:\Arduino\hardware\teensy\cores\teensy3/arm_math.h:303:0: note: this is the location of the previous definition



---

I don't want to optimize too much, because I'm sure now that the Teensy is fast enough.
Reading the MPU6050 & HMC5883L over I2C (400 kHz) plus the "Madgwick AHRS" 9-axis algorithm and a few other tasks takes only 1.3 ms so far. Plenty of time left for other things.
My goal is to build my "Balancing Bot V3", with more features and eventually, this time, with only one wheel.
(V1 with Raspberry + Arduino-Nano (Mega328) is here: http://www.youtube.com/watch?v=n-noFwc23y0 or Blog (http://robertabot.blogspot.de/). V2 was the same but without the Raspberry.)

The Teensy 3 is great!!

Frank B
04-16-2014, 10:23 PM
Wow.. playing with the dsp is fun :-)

this:


inline void deg2rad_vect(float32_t *fvect){
float32_t m[3] = { M_PI / 180, M_PI / 180, M_PI / 180 };
arm_mult_f32( m, &fvect[0], &fvect[0], 3);
}


is 10 times faster than
f[0] = f[0] * M_PI / 180;f[1] = f[1] * M_PI / 180;f[2] = f[2] * M_PI / 180;

I personally don't need these optimizations, but it's fun to find out what the DSP can do.
I think there are much more things to "teensy-"optimize in FreeIMU. AHRSupdate() is worth a look.
If somebody is interested we can open a new thread.

PaulStoffregen
04-18-2014, 12:56 PM
I think there are much more things to "teensy-"optimize in FreeIMU.

I'm currently working on too many other things to do much with FreeIMU lately. But if you fork the github code, just send any well tested changes as pull requests and I'll merge them.

https://github.com/PaulStoffregen/FreeIMU

Frank B
04-18-2014, 09:29 PM
Hi, here are first changes:

https://github.com/FrankBoesing/freeIMU/compare/zrecommerce:master...master

About a 20% speedup of the calculation. It is not entirely tested, but it should give the same results.

Unfortunately I can't test it with "real" flying hardware.

ltj
10-03-2014, 09:36 AM
Hi,

It seems that the CMSIS lib that comes with Teensyduino is version 1.1.0. They are now at 1.4.4 and I'm very interested in using the newer and more convenient complex FFT functions (where you don't have to init and it will automatically select radix). I have tried to use an updated CMSIS lib, but without luck.
What I did:
1. Downloaded the latest CMSIS-DSP.
2. Changed the boards.txt file so that teensy3.build.additionalobject1 links the new libarm_cortexM4l_math.a.
3. Updated the arm_math and core header files in hardware/teensy/cores/teensy3/.
4. Included the teensy3 "fix" in arm_math (I could not see any Teensy-related edits in the other header files).

Something like this now compiles:


arm_cfft_instance_f32 fft_inst;
arm_cfft_f32(&fft_inst, buffer_f, 0, 1);
But the actual FFT calculation brings the MCU to a grinding halt. Nothing happens after that. Any suggestions are greatly appreciated.

Cheers,
Lars

PaulStoffregen
10-03-2014, 09:57 AM
I tried one of the newer versions some time ago. Unfortunately, it expanded the maximum FFT size by increasing the size of the lookup tables by 4X (even when computing a smaller FFT), so the compiled code could not fit into Teensy 3.0 or 3.1.

ltj
10-03-2014, 10:42 AM
Thanks, Paul. But unless I did something wrong or missed something, it actually compiled and fitted into the Teensy 3.0. The sketch would run as normal until trying to use the new arm_cfft_x function. Please forgive any ignorance here - this is not within my comfort zone :)