FLASHMEM for all functions in a file

was-ja

Hello,

my project is becoming more complicated and requires more program memory. I have already optimized everything I can in my own code: I wrapped my code and the SPI, DMA and interrupt files with a pragma for -Ofast and let everything else be optimized for size, but it is still not enough.

I know that the SD card library I am using pulls in a lot of functions (according to the *.sym file), and all of these functions take up ITCM memory. In addition, I am planning to use LittleFS, USB and an Ethernet server, which will take up further blocks of fast memory.

So I need a way to place library functions into flash. I understand that I can go through the whole core and every library and put FLASHMEM in front of each function.

Please tell me, is there a cleverer way, such as placing a pragma for FLASHMEM on the first line of a file to force everything in it into flash, or any other way?

Thank you!
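
For reference, the per-function approach mentioned above looks roughly like the following. This is a minimal sketch, assuming the Teensy 4.x core, where FLASHMEM becomes available through Arduino.h. The function name configChecksum is hypothetical, and the empty fallback macro is only an assumption that lets the snippet build off-target.

```cpp
#include <stddef.h>
#include <stdint.h>

#if defined(ARDUINO)
#include <Arduino.h>  // on Teensy 4.x this provides FLASHMEM
#else
#define FLASHMEM      // off-target fallback (assumption): no special placement
#endif

// Hypothetical helper that is called rarely, so it can live in flash
// instead of taking up ITCM.
FLASHMEM uint32_t configChecksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum;
}
```

The attribute has to be repeated on every function that should move, which is exactly the manual work the question is trying to avoid.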
 
Yup, you can edit the linker file.
This is for the T4.1, but for an older TD version. I didn't bother to update it, as Paul does not want this functionality and general interest was near zero :)
Maybe it still works, maybe not. I don't know. Good news: thanks to the cache, your program will run as fast as before if it's not too big and has some code locality.

Edit: when using this, teensysize will not display correct values after compiling. That's a bug. I fixed it, but consequently the fix wasn't merged either.
 
Thank you very much, Frank! Please tell me what I am doing wrong: I cannot open your link, GitHub reports a 404. I also searched through your projects, but probably not carefully enough, so I did not find the right one.

Thank you
 
Oh, I see.. I had set the repo to private some time ago..
Here's the file. (Remove the .txt extension and modify boards.txt to use it..)

If changes are needed, it would be nice if you could post them..
 

Attachments: imxrt1062_t41f.ld.txt (2.7 KB)
Thank you very much, FrankB!!!

Yes, it works as is on my T4.1.

It moved almost all functions to flash memory, leaving some 192 bytes still in ITCM.

Actually, it is not exactly what I was looking for. The critical part of my algorithm is still too large to fit into the cache, so my intention was to place only particular functions into ITCM and let the rest of the algorithm run from the CPU cache, so as to benefit from both the CPU cache and ITCM. Your solution only deals with complete SECTIONS that go either to ITCM or to flash.

Please advise: is there any possibility to change it per file? I.e. everything located in the specified files should be placed into FLASHMEM, and the rest as it is written in the program? If no such solution exists, I will just manually add FLASHMEM to every function in SD/*, SdFat/*, LittleFS/*, FS.*, Stream.*, Print.* and probably some others that I find in the *.sym list.
 
You can do the opposite. Just mark your fast functions as "FASTRUN" and they will run from RAM.
The 192 bytes could be the size bug I mentioned. It should be zero.
 
Thank you very much, FrankB!

Yes, however, if I mark every function in my files as FASTRUN, all of them are present in the ELF (the executable file). With the standard method, by contrast, many small functions are inlined and the compiler removes them. So when I just put FASTRUN on every one of my functions, I get about 30% more instructions.

PS: please excuse me for discussing such very fine tuning methods, but it seems to me this is the only way left to implement my algorithms on this hardware. I really like the Teensy 4.1 and wish to use it, but I want to get everything out of the libraries and the CPU.
 
Ok,
find the #define for FASTRUN in the core, copy it, name it "RUNFAST" or such, and remove the noinline and noclone attributes...
 
Thank you very much, FrankB, for your answer!

I am using TD1.56, and I found 2 definitions of FASTRUN, in:

~/arduino-1.8.19/hardware/teensy/avr/cores/teensy3/WProgram.h
~/arduino-1.8.19/hardware/teensy/avr/cores/teensy4/avr/pgmspace.h
where in the first file (WProgram.h) it is defined as
Code:
#define FASTRUN __attribute__ ((section(".fastrun"), noinline, noclone ))
and in the second file (pgmspace.h) as
Code:
#define FASTRUN __attribute__ ((section(".fastrun") ))
I have checked with #error that only the second file (pgmspace.h) is used for the compilation, and tried to change it to
Code:
#define FASTRUN __attribute__ ((section(".fastrun"), inline ))

but it does not help: it still emits the functions as if they always had to be placed in ITCM.

I found that FASTRUN is also listed in ~/arduino-1.8.19/hardware/teensy/avr/keywords.txt, but I was unable to understand how it is resolved.

Please advise me how to proceed.

Thank you!
 
The keywords.txt is only used by the editor to highlight the keyword.
Hmm... so the T4 version has no "noinline" and "noclone".. in this case, inlining should be enabled. Do you use "-O2" or "-O3"?
If yes, I have no idea why inlining does not work. Perhaps use the always_inline attribute?

I'm going to watch James Bond now. Good night.
 
It is not so simple: I wrapped my code with

#pragma GCC push_options
#pragma GCC optimize ("Ofast")

// code

#pragma GCC pop_options

for performance reasons.

To test FASTRUN I tried the following options:

#define MYFASTRUN
#define MYFASTRUN __attribute__ ((section(".fastrun")))
#define MYFASTRUN __attribute__ ((section(".fastrun"), inline ))
#define MYFASTRUN __attribute__ ((section(".fastrun"), always_inline ))

The first version produces the smallest result, because when the compiler inlines a function it removes the function itself from the code: I do not see it in *.sym.

The second (fastrun only) and third (fastrun + inline) options produce about 30% more ITCM usage, because ALL marked functions end up in the code.

The always_inline option evidently inlines every function, so it greatly increases ITCM usage.
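
A possible explanation (an assumption, not confirmed in this thread): a function with external linkage keeps its out-of-line copy even when every call to it has been inlined, and once all such copies share the explicitly named .fastrun section the linker can no longer discard them one by one. As far as I know, GCC also has no function attribute named inline (only always_inline, noinline, gnu_inline and similar), which would explain why the third variant behaves like the second. The sketch below shows one way to let the compiler drop inlined helpers again by declaring them static. MYFASTRUN is the macro from the post above; the function names and the off-target fallback are hypothetical.

```cpp
#include <stdint.h>

#if defined(__arm__)
#define MYFASTRUN __attribute__((section(".fastrun")))
#else
#define MYFASTRUN  // off-target fallback (assumption): no special placement
#endif

// static: if every call gets inlined, the compiler does not have to emit
// an out-of-line copy of this helper at all.
MYFASTRUN static int32_t clampSample(int32_t v) {
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return v;
}

// External entry point: on the target this one is always emitted into .fastrun.
MYFASTRUN int32_t mixSamples(int32_t a, int32_t b) {
    return clampSample(a + b);
}
```

Whether the helper is really inlined still depends on the optimization level, so checking the *.sym file after the change remains the way to verify it.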
 
Hello Frank,
my program code and variables have become very large. I use many libraries (SD card, QNEthernet, LVGL, etc.), some of which I have manually pushed into flash. I am very interested in a solution along these lines.
The problem is that I have not understood how to use it, despite having read everything.
I also have a T4.1 with TD1.56 and IDE 1.8.19.

In the end, I don't want to have to change any of the included libraries to keep them in flash. I understand that this file achieves that, but then you have to mark the program parts that need to run in RAM with FASTRUN.

Question: how exactly do you have to modify the IDE/TD files so that everything ends up in flash?

How does it then look in the .ino if I want to run a function, or a whole library, in RAM?

Can you please explain this for beginners?

I'm sorry to ask so cluelessly here, but I cannot put the fragments together into a whole. Maybe an extension in TD 1.57 that allows this would be a cool thing. It was hard enough to understand DMAMEM... yes

Thanks already very much for your effort!

PS: Yes, I tried to put it in, but I didn't have the know-how.

Translated with www.DeepL.com/Translator (free version)
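
Going by the earlier answer in this thread (mark the fast functions as FASTRUN), the .ino side could look roughly like this. It is a minimal sketch, assuming the modified linker file is active so that unmarked code stays in flash. All function and variable names are hypothetical, and the empty fallback macro only exists so the snippet also builds off-target.

```cpp
#include <stdint.h>

#if defined(ARDUINO)
#include <Arduino.h>  // provides FASTRUN on Teensy
#else
#define FASTRUN       // off-target fallback (assumption): no special placement
#endif

uint32_t g_lastResult = 0;  // hypothetical global that keeps the results

// Unmarked: with the modified linker file this is expected to stay in flash.
uint32_t describeConfig(uint32_t version) {
    return version + 1;
}

// Marked: time-critical code that should run from RAM (ITCM).
FASTRUN uint32_t sumUpTo(uint32_t n) {
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; ++i) {
        acc += i;
    }
    return acc;
}

void setup() {
    g_lastResult = describeConfig(3);
}

void loop() {
    g_lastResult = sumUpTo(1000);
}
```

Nothing in this thread shows a per-file or per-library switch, so a whole library would need each of its time-critical functions marked in the same way.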
 
I am using PlatformIO, and my project's code size on the Teensy 3.6 is about 800 kB, so it won't compile for the T4.1. I tested with imxrt1062_t41f.ld.txt, which Frank attached on 2020-01-08, but without success. The project compiles, but the T4.1 dies completely: no output at all. With the reset button I can get it back into the bootloader.

I tried a simpler project that uses some of the libraries from the big project, and that worked, so the linker modification itself seems to work. Since the problem may be with statics that are defined in other libraries and therefore initialized before setup, I also tested by adding some printing immediately after the include of Arduino.h in the main module. That worked on the 3.6, but the 4.1 is still totally dead.

Any ideas on how to test further and move forward with this project?
 
Hello,
in the meantime I found the place where the file has to be placed. But no luck compiling my project.
Greetings
Jake
 