QNEthernet using 80k RAM

danielkr

Hi all,

First of all, thanks, Paul, for the Teensy. It is an amazing product with great features. Keep going!

And now the issue I currently have:
I want to control the signals on my model railway with a Teensy connected to a Roco z21 digital station via Ethernet. There is a simple protocol where the z21 spits out whatever happens (especially what turnout has toggled) and the Teensy then figures out what signal to set to red/green.

That code exists in the form of a Visual Studio C++ project on Windows, and everything works fine there.
After some migration work and downgrading to C++14, I managed to compile everything with Visual Micro with the following output:

Code:
teensy_size*: Memory Usage on Teensy 4.1
teensy_size*: FLASH: code:381984, data:50820, headers:8472   free for files:7685188
teensy_size*: RAM1: variables:140128, code:379232, padding:13984   free for local variables:-9056
teensy_size*: RAM2: variables:24736  free for malloc\new:499552

Notice that the space left for local variables is negative: RAM1 is overcommitted by about 9k.
After some playing around, I noticed that switching from QNEthernet to NativeEthernet, the issue goes away and I have plenty of space for local variables:

Code:
teensy_size*: Memory Usage on Teensy 4.1
teensy_size*: FLASH: code:363232, data:41940, headers:8940   free for files:7712352
teensy_size*: RAM1: variables:60740, code:360496, padding:32720   free for local variables:70332
teensy_size*: RAM2: variables:24768  free for malloc\new:499520

So overall, QNEthernet takes up about 80k more space than NativeEthernet. Any ideas?
Looking at the numbers, my code seems to be heavy on the code side (I use some RTTI, but have not found out how heavy it really is), but I am happy with 500k heap and 70k for further development ;)
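
For anyone checking the arithmetic, here is a small sketch that reproduces the "free for local variables" figures from the two reports above. It assumes RAM1 on the Teensy 4.1 is 512 KiB in total and that teensy_size simply subtracts variables, code and padding from that; the function and constant names are made up for illustration.

```cpp
#include <stdint.h>

// Assumption: RAM1 (ITCM + DTCM) on the Teensy 4.1 is 512 KiB in total.
constexpr int32_t kRam1Bytes = 512 * 1024;

// Bytes left in RAM1 for the stack and local variables, computed from the
// three RAM1 figures that teensy_size prints. Negative means overcommitted.
constexpr int32_t freeForLocals(int32_t variables, int32_t code, int32_t padding) {
  return kRam1Bytes - (variables + code + padding);
}

// Figures copied from the QNEthernet and NativeEthernet builds above.
constexpr int32_t kQnFree         = freeForLocals(140128, 379232, 13984);
constexpr int32_t kNativeFree     = freeForLocals(60740, 360496, 32720);
constexpr int32_t kVariablesDelta = 140128 - 60740;  // difference in "variables"
```

In both builds code plus padding adds up to the same 393216 bytes, so the whole gap between the two results sits in the "variables" figure (79388 bytes, the "about 80k").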

Here is the gcc command line for the .ino file:

Code:
"C:\Program Files (x86)\Arduino\hardware\teensy/../tools/arm/bin/arm-none-eabi-g++"
-c -O2 -g -Wall -ffunction-sections -fdata-sections -nostdlib -MMD -std=gnu++14 -fno-exceptions -fpermissive -fno-rtti -fno-threadsafe-statics -felide-constructors -Wno-error=narrowing -mthumb -mcpu=cortex-m7 -mfloat-abi=hard -mfpu=fpv5-d16 
-D__IMXRT1062__ -DTEENSYDUINO=155 -DARDUINO=108016 -DARDUINO_TEENSY41 -DF_CPU=600000000 -DUSB_DUAL_SERIAL -DLAYOUT_US_ENGLISH 
"-IC:\Users\daniel\AppData\Local\Temp\VMBuilds\winston-teensy\teensy41\Debug/pch" 
-I"D:\proj\dev\uc\winston\winston\winston-teensy" -I"C:\Program Files (x86)\Arduino\hardware\teensy\avr\cores\teensy4" -I"c:\program files\microsoft visual studio\2022\community\common7\ide\extensions\itelwv1j.wbw\Micro Platforms\default\debuggers\VM_DBG" -I"C:\Program Files (x86)\Arduino\hardware\teensy\avr\libraries\NativeEthernet\src" -I"C:\Program Files (x86)\Arduino\hardware\teensy\avr\libraries\FNET\src" -I"C:\Program Files (x86)\Arduino\hardware\teensy\avr\libraries\SPI" -I"C:\Program Files (x86)\Arduino\hardware\teensy\avr\libraries\SD\src" -I"C:\Program Files (x86)\Arduino\hardware\teensy\avr\libraries\SdFat\src"  -I"D:\proj\dev\uc\winston\winston\winston-teensy" -I"D:\proj\dev\uc\winston\winston\winston-teensy\..\libwinston" -I"D:\proj\dev\uc\winston\winston\winston-teensy\..\winston"  -frtti -I"D:\proj\dev\uc\winston\winston\winston-teensy" -I"D:\proj\dev\uc\winston\winston\winston-teensy\..\libwinston" -I"D:\proj\dev\uc\winston\winston\winston-teensy\..\winston" 
-DVM_DEBUG_BREAKPAUSE -DVM_DBT=VM_DBT_GENERIC_OBJECT -DVM_DEBUGGER_SOFT_TRANSPORT_WRITER=Serial -DVM_DEBUGGER_SOFT_TRANSPORT=Serial -DVM_DEBUG_BANDWIDTH_THROTTLE_MS=100 -DVM_DEBUG -DVM_DEBUG_ENABLE=1 -DVM_DBT_GENERIC_OBJECT=21 -DVM_DBT_NO_SERIAL=20 -DVM_DBT_MS430_SERIAL_=19 -DVM_DBT_SERIALUSB=18 -DVM_DBT_USBAPI=17 -DVM_DBT_NET_UDP=16 -DVM_DBT_HARDWARESERIAL3=15 -DVM_DBT_HARDWARESERIAL2=14 -DVM_DBT_HARDWARESERIAL1=13 -DVM_DBT_CDCSerialClass=12 -DVM_DBT_COSA=11 -DVM_DBT_Uart=10 -DVM_DBT_NET_CONSOLE=9 -DVM_DBT_TTYUART=8 -DVM_DBT_USBSERIAL=7 -DVM_DBT_USART=6 -DVM_DBT_UART=5 -DVM_DBT_TEENSY=4 -DVM_DBT_USB=3 -DVM_DBT_FASTSERIAL=2 -DVM_DBT_SOFTWARESERIAL=1 -DVM_DBT_HARDWARESERIAL=0 -DWINSTON_PLATFORM_TEENSY -DTCB_SPAN_NAMESPACE_NAME=std 
-I"C:\Users\daniel\Documents\Arduino\libraries\TeensyDebug\src" "C:\Users\daniel\AppData\Local\Temp\VMBuilds\winston-teensy\teensy41\Debug\winston-hal-teensy.cpp" -o "C:\Users\daniel\AppData\Local\Temp\VMBuilds\winston-teensy\teensy41\Debug\winston-hal-teensy.cpp.o"

Other libraries currently in use are SPI, SDFat and TeensyDebug. The rest is my stuff.

Thanks
Daniel
 
Shawn will probably be able to tell you how to reduce QNEthernet variables based on the limited scope of what you plan to do with Ethernet. With T4.x, code goes into RAM by default. You can force functions to remain in FLASH via the FLASHMEM directive, so if you can identify functions that would be okay to run (more slowly) from the serial (XIP) FLASH, you can free up RAM for variables. See https://www.pjrc.com/store/teensy41.html for more on FLASHMEM and other directives.
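
A minimal sketch of what that can look like, assuming a rarely called helper that does not need to be fast. The function `parsePort` is hypothetical and only stands in for "some function that is okay to run from flash"; the `#else` branch is just there so the snippet also compiles off-target.

```cpp
#include <stdint.h>

#if defined(__IMXRT1062__)
#include <Arduino.h>  // defines FLASHMEM on Teensy 4.x
#else
#define FLASHMEM      // host fallback so the sketch also compiles off-target
#endif

// Hypothetical helper that only runs while reading configuration at startup.
// FLASHMEM keeps its code in flash instead of copying it into RAM1.
// Parses leading decimal digits; returns 0 if the value exceeds 65535.
FLASHMEM uint16_t parsePort(const char *text) {
  uint32_t value = 0;
  for (; *text >= '0' && *text <= '9'; ++text) {
    value = value * 10 + static_cast<uint32_t>(*text - '0');
    if (value > 65535) return 0;
  }
  return static_cast<uint16_t>(value);
}
```

Setup, configuration parsing and logging code are typical candidates; anything called from a tight loop or an interrupt is probably better left in RAM1.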
 
Good point with the FLASHMEM. I already tried to find the biggest one with arm-size. Let's see what is worth modifying.
Additional input from Shawn is highly appreciated, though!
 
I haven’t done much work with optimizing for size (because memory is infinite, obviously :)). I use some STL stuff internally, but I’ve always thought the compiler is pretty good with optimization and code elision.

I suppose the low-hanging fruit would be:
  1. Compile with however your IDE does “optimize for smallest size”. (Are you already doing this?) (eg. "-Os" instead of "-O2")
  2. Reduce the pre-allocated lwIP stuff, eg. the UDP and TCP blocks and stats collection. You can find all the options in the file lwipopts.h.
    1. MEM_SIZE: I haven't played with this, but maybe it can be reduced. Try it and see what happens. To monitor memory usage, set LWIP_STATS, MEM_STATS, and LWIP_STATS_DISPLAY to 1, include <lwip/stats.h>, and call MEM_STATS_DISPLAY() once in a while. I chose 24000 because that's what previous lwIP iterations here were using. I bet you could reduce it to a few kilobytes, eg. 4096.
    2. MEMP_NUM_UDP_PCB: Each is, I think, 40 bytes, but maybe there's more allocated somewhere.
    3. MEMP_NUM_TCP_PCB: Not sure how big each is.
    4. MEMP_NUM_TCP_PCB_LISTEN: Not sure how big each is.
    5. Set LWIP_STATS to 0 to remove stats collection.
  3. Don’t use RTTI. How much does your program shrink if you disable this?
  4. For UDP, incoming and outgoing packets are buffered in a std::vector<unsigned char>. (But that's at runtime.)
  5. For TCP, data is also buffered in a std::vector<unsigned char>. (But that's also at runtime.)
  6. Try adding "--specs=nano.specs" to the build flags (assuming the "smallest size" compile option doesn't add this) to compile in a smaller version of Newlib.
  7. Add "--gc-sections" to the linker options (eg. "-Wl,--gc-sections").

I'm curious what the effects of each one of these steps are, independently.
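
To make step 2 concrete, here is what a trimmed set of options might look like in lwipopts.h. The option names are standard lwIP ones; the values are only illustrative guesses for a project that needs one or two sockets, not recommendations and not the library's defaults.

```cpp
// Excerpt in the style of lwipopts.h. All values are illustrative assumptions.
#define MEM_SIZE                4096  // lwIP heap in bytes (24000 mentioned above)
#define MEMP_NUM_UDP_PCB        2     // number of UDP control blocks (UDP sockets)
#define MEMP_NUM_TCP_PCB        2     // number of simultaneously active TCP connections
#define MEMP_NUM_TCP_PCB_LISTEN 1     // number of listening TCP sockets
#define LWIP_STATS              0     // no statistics collection
```

Changing one option at a time and comparing the teensy_size output after each build shows which of them actually matters for a given project.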
 
Hi all,
I ran some analysis compiles. RTTI is still enabled because my code relies on it, but the different -Ox flags alone already make a visible difference:
Smallest Code:
Code:
teensy_size*: Memory Usage on Teensy 4.1
teensy_size*: FLASH: code:236484, data:40548, headers:8660   free for files:7840772
teensy_size*: RAM1: variables:127840, code:234024, padding:28120   free for local variables:134304
teensy_size*: RAM2: variables:24736  free for malloc\new:499552

Fastest:
Code:
teensy_size*: Memory Usage on Teensy 4.1
teensy_size*: FLASH: code:411408, data:50348, headers:8192   free for files:7656516
teensy_size*: RAM1*: variables:140128, code:408672, padding:17312   free for local variables:-41824
teensy_size*: RAM2: variables:24736  free for malloc\new:499552

Debug:
Code:
teensy_size*: Memory Usage on Teensy 4.1
teensy_size*: FLASH: code:386040, data:51868, headers:8488   free for files:7680068
teensy_size*: RAM1*: variables:140128, code:383536, padding:9680   free for local variables:-9056
teensy_size*: RAM2: variables:24736  free for malloc\new:499552

Debug with LWIP-MEMSIZE: 24 instead of 24000
Code:
teensy_size*: Memory Usage on Teensy 4.1
teensy_size*: FLASH: code:385976, data:51868, headers:8552   free for files:7680068
teensy_size*: RAM1: variables:115552, code:383472, padding:9744   free for local variables:15520
teensy_size*: RAM2: variables:24736  free for malloc\new:499552

So for debugging, I will remain with NativeEthernet for the moment.
LWIP_STATS is already disabled, and I did not find the other options.

Thanks
Daniel
 
Ah, I completely missed the Buffers. Here is an update, also with a bigger MEM_SIZE:
Debug with
  • LWIP-MEMSIZE: 1024 instead of 24000
  • LWIP_STATS: 0 instead of 1
  • MEMP_NUM_UDP_PCB: 2 instead of 8
  • MEMP_NUM_TCP_PCB: 2 instead of 8
Code:
teensy_size*: Memory Usage on Teensy 4.1
teensy_size*: FLASH: code:384440, data:51828, headers:9104   free for files:7681092
teensy_size*: RAM1: variables:115552, code:381936, padding:11280   free for local variables:15520
teensy_size*: RAM2: variables:24736  free for malloc\new:499552

So obviously, MEM_SIZE has the impact one would expect, but I am still missing the 80k compared to NativeEthernet.
I think I will end my investigation here, because I have other issues with QNEthernet and the WebSockets2 library. If NativeEthernet proves to be insufficient, I will check again.
Thank you all
Daniel
 
Yes, I installed the WebSockets2_Generic and QNEthernet library. This is the include section in my Header file:
Code:
#define WEBSOCKETS_USE_ETHERNET     true
#ifdef WINSTON_TEENSY_QNETHERNET
#define USE_QN_ETHERNET             true
#else
#define USE_NATIVE_ETHERNET         true
#endif
#include <WebSockets2_Generic.h>
using namespace websockets2_generic;
#ifdef WINSTON_TEENSY_QNETHERNET
#include <QNEthernet.h>
using namespace qindesign::network;
#else
#include <NativeEthernet.h>
#endif
So, with WINSTON_TEENSY_QNETHERNET I can toggle between QNEthernet and NativeEthernet. With QNEthernet selected, this is the compilation error involving both libraries:
Code:
 ws_common_QNEthernet.hpp:70: In file included from
message.hpp:69: from
WebSockets2_Generic.h:54: from
winston-hal-teensy.h:10: from
Kornweinheim.h:18: from
winston-main.cpp:0: from
Teensy41_QNEthernet_tcp.hpp: In member function virtual bool websockets2_generic::network2_generic::EthernetTcpServer::listen(uint16_t)
 
Teensy41_QNEthernet_tcp.hpp: 185:18: error: use of deleted function 'qindesign::network::EthernetServer& qindesign::network::EthernetServer::operator=(const qindesign::network::EthernetServer&)'
   server = EthernetServer(port)
 
QNEthernet.h:19: In file included from
Teensy41_QNEthernet_tcp.hpp:49: from
ws_common_QNEthernet.hpp:70: from
message.hpp:69: from
WebSockets2_Generic.h:54: from
winston-hal-teensy.h:10: from
Kornweinheim.h:18: from
winston-main.cpp:0: from
QNEthernetServer.h:22: note: 'qindesign::network::EthernetServer& qindesign::network::EthernetServer::operator=(const qindesign::network::EthernetServer&)' is implicitly deleted because the default definition would be ill-formed
 class EthernetServer final : public Server {
 
QNEthernetServer.h: 22:7: error: non-static const member 'const uint16_t qindesign::network::EthernetServer::port_', can't use default assignment operator

This is the command line of the file where the headers end up being included:
Code:
"C:\Program Files (x86)\Arduino\hardware\teensy/../tools/arm/bin/arm-none-eabi-g++" -c -Og -g -Wall -ffunction-sections -fdata-sections -nostdlib -MMD -std=gnu++14 -fno-exceptions -fpermissive -fno-threadsafe-statics -felide-constructors -Wno-error=narrowing -mthumb -mcpu=cortex-m7 -mfloat-abi=hard -mfpu=fpv5-d16 -D__IMXRT1062__ -DTEENSYDUINO=155 -DARDUINO=108016 -DARDUINO_TEENSY41 -DF_CPU=600000000 -DUSB_DUAL_SERIAL -DLAYOUT_US_ENGLISH "-IC:\Users\daniel\AppData\Local\Temp\VMBuilds\winston-teensy\teensy41\Debug/pch" -I"D:\proj\dev\uc\winston\winston\winston-teensy" -I"C:\Program Files (x86)\Arduino\hardware\teensy\avr\cores\teensy4" -I"c:\program files\microsoft visual studio\2022\community\common7\ide\extensions\itelwv1j.wbw\Micro Platforms\default\debuggers\VM_DBG" -I"C:\Users\daniel\Documents\Arduino\libraries\WebSockets2_Generic\src" -I"C:\Users\daniel\Documents\Arduino\libraries\QNEthernet-master\src" -I"C:\Program Files (x86)\Arduino\hardware\teensy\avr\libraries\SPI" -I"C:\Program Files (x86)\Arduino\hardware\teensy\avr\libraries\SD\src" -I"C:\Program Files (x86)\Arduino\hardware\teensy\avr\libraries\SdFat\src" -I"C:\Users\daniel\Documents\Arduino\libraries\TeensyDebug\src"  -I"D:\proj\dev\uc\winston\winston\winston-teensy" -I"D:\proj\dev\uc\winston\winston\winston-teensy\..\libwinston" -I"D:\proj\dev\uc\winston\winston\winston-teensy\..\winston"  -I"D:\proj\dev\uc\winston\winston\winston-teensy" -I"D:\proj\dev\uc\winston\winston\winston-teensy\..\libwinston" -I"D:\proj\dev\uc\winston\winston\winston-teensy\..\winston" -DVM_DEBUG_BREAKPAUSE -DVM_DBT=VM_DBT_GENERIC_OBJECT -DVM_DEBUGGER_SOFT_TRANSPORT_WRITER=Serial -DVM_DEBUGGER_SOFT_TRANSPORT=Serial -DVM_DEBUG_BANDWIDTH_THROTTLE_MS=100 -DVM_DEBUG -DVM_DEBUG_ENABLE=1 -DVM_DBT_GENERIC_OBJECT=21 -DVM_DBT_NO_SERIAL=20 -DVM_DBT_MS430_SERIAL_=19 -DVM_DBT_SERIALUSB=18 -DVM_DBT_USBAPI=17 -DVM_DBT_NET_UDP=16 -DVM_DBT_HARDWARESERIAL3=15 -DVM_DBT_HARDWARESERIAL2=14 -DVM_DBT_HARDWARESERIAL1=13 -DVM_DBT_CDCSerialClass=12 
-DVM_DBT_COSA=11 -DVM_DBT_Uart=10 -DVM_DBT_NET_CONSOLE=9 -DVM_DBT_TTYUART=8 -DVM_DBT_USBSERIAL=7 -DVM_DBT_USART=6 -DVM_DBT_UART=5 -DVM_DBT_TEENSY=4 -DVM_DBT_USB=3 -DVM_DBT_FASTSERIAL=2 -DVM_DBT_SOFTWARESERIAL=1 -DVM_DBT_HARDWARESERIAL=0 -DWINSTON_PLATFORM_TEENSY -DTCB_SPAN_NAMESPACE_NAME=std -I"C:\Program Files (x86)\Arduino\hardware\teensy\avr\libraries\Entropy" "D:\proj\dev\uc\winston\winston\winston\winston-main.cpp" -o "C:\Users\daniel\AppData\Local\Temp\VMBuilds\winston-teensy\teensy41\Debug\winston-main.cpp.o"

Maybe you can spot the issue. I do not know how to resolve this.

Thanks
Daniel
 
Thank you for that. After perusing the WebSockets2 code last night, I think I know what the issues are. My plan is to file some issues in that repo.
 
I pushed a fix to WebSockets2: https://github.com/khoih-prog/WebSockets2_Generic/pull/35
That will make its way to the Arduino IDE as soon as a new release of that library is made. Or, one could apply the changes manually; it’s only one file that changed.
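
For readers wondering what the compiler was objecting to: a class with a const data member (here `port_`) loses its default copy assignment, so `server = EthernetServer(port)` cannot compile. The sketch below reproduces that with hypothetical stand-in classes (`Server`, `Listener`) and shows one possible way around it, owning the server through a pointer and replacing the whole object. It is only an illustration of the language rule, not necessarily what the pull request above does.

```cpp
#include <stdint.h>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for a server class that stores its port as const.
class Server {
 public:
  explicit Server(uint16_t port) : port_(port) {}
  uint16_t port() const { return port_; }

 private:
  const uint16_t port_;  // const member: default copy assignment is deleted
};

static_assert(!std::is_copy_assignable<Server>::value,
              "a const member removes the default operator=");

// Hypothetical wrapper that needs to (re)create the server in listen().
class Listener {
 public:
  bool listen(uint16_t port) {
    // "server_ = Server(port)" on a plain Server member would not compile.
    // Replacing the owned object avoids assignment altogether.
    server_ = std::make_unique<Server>(port);
    return true;
  }
  uint16_t port() const { return server_ ? server_->port() : 0; }

 private:
  std::unique_ptr<Server> server_;
};
```

This uses heap allocation, which may or may not be acceptable in a given project; constructing the server once with its final port is another option when the port is known up front.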

Thank you again for pointing out these two issues (the library problem and memory consumption). Whether you use QNEthernet or not, I’m glad you’re helping make it more useful. :)
 
@danielkr I was able to move lwIP's heap to RAM2, so this should clear up some space. See the latest on GitHub.

If you're using PlatformIO, delete `.pio/libdeps/teensy41/QNEthernet`, make sure `https://github.com/ssilverman/QNEthernet.git` is in your `lib_deps`, and then rebuild.
 
Hi @shawn,
thanks for the update, I will investigate when I touch that side of the project again.
Best Regards
Daniel
 
Hello @shawn, first of all, thank you for creating and maintaining this awesome library!
I had better luck with QNEthernet than with the NativeEthernet library and recently started using it in my project. However, I noticed that in my case the RAM1 consumption with this library is 100 kB higher than with NativeEthernet (see the teensy_size output below). I am using the latest Teensyduino 1.57, default compiler flags and optimizations, and the latest version of the QNEthernet library from GitHub.

Code:
With NativeEthernet:
teensy_size: Memory Usage on Teensy 4.1:
teensy_size:   FLASH: code:455116, data:133488, headers:8864   free for files:7528996
teensy_size:    RAM1: variables:83588, code:252520, padding:9624   free for local variables:178556
teensy_size:    RAM2: variables:28832  free for malloc/new:495456

With QNEthernet:
teensy_size: Memory Usage on Teensy 4.1:
teensy_size:   FLASH: code:367156, data:81732, headers:8452   free for files:7669124
teensy_size:    RAM1: variables:121504, code:301912, padding:25768   free for local variables:75104
teensy_size:    RAM2: variables:52832  free for malloc/new:471456

I have already tried changing the values that @danielkr modified in reply #7 of this thread, but it only reduced the RAM1 usage by around 2kB.
When I change the compiler optimization level to TEENSY_OPT_SMALLEST_CODE, I do get lower RAM1 usage, but this will also affect other parts of my code which I would like to continue running using the default optimization. I didn't try to use a newer toolchain yet, as it seems like TD 1.58 will bring this update natively, so I will wait for it.
As this thread was created at the end of 2021, maybe something has changed in the library or you are aware of something else I could do to help reduce RAM1 usage. Maybe move some other buffers to RAM2? Maybe a way to disable some other unused resources?
Thank you so much for your help!
 
I think building with a later compiler will help. (See post #15.) I haven’t been able to test with the newest 1.58 pre-release because there’s no Mac version. (I haven’t installed Linux yet on a VM.) I suspect that’s one thing that will help.

I haven’t spent much time space-optimizing the library, but try experimenting with reducing what lwIP pre-allocates. For example, the number of TCP and UDP sockets, removing mDNS support (see the latest Changelog mention of LWIP_MDNS_RESPONDER), reducing the MEM_SIZE, etc.
 
Hi Shawn, I built with the later compiler and it did in fact help. The RAM1 usage went down by 40kB.
As for reducing what lwIP pre-allocates, I didn't have much success. Setting LWIP_MDNS_RESPONDER to 0 did not affect my RAM usage. Changing MEMP_NUM_UDP_PCB to 2 instead of 8 and MEMP_NUM_TCP_PCB to 2 instead of 8 only gained me an extra 2 kB. Reducing MEM_SIZE only affects RAM2 usage right now, which I have plenty of, so it's not an issue. My main limitation is RAM1.

I used PlatformIO Memory Inspect tool to search which symbols were using the most RAM1 (bss + data + text.itcm) and found three relevant results which come from QNEthernet:

1 - memp_memory_PBUF_POOL_base, which uses 23.9 KB of RAM
2 - rxbufs, which uses 7.5 KB of RAM
3 - txbufs, which uses 7.5 KB of RAM

I found where rxbufs and txbufs are located (src/lwip_t41.c) and tried to add the "DMAMEM" definition to see if I could move them to RAM2. The code compiles and I gain extra 15 KB, as expected. Unfortunately, the library stopped working correctly when I did this. My EthernetClient wasn't able to communicate properly. Do you know why this happens?
As for the "memp_memory_PBUF_POOL_base" symbol, I could not find where it comes from. Do you know what it is? Maybe we could also try to move it to RAM2?
Thanks!
 
I suspect that the reason the TX and RX buffers don't work from RAM2 is that the processor's Ethernet subsystem might need to operate from RAM1. I also tried moving those to RAM2 a while back but didn't get it to work. (If anyone knows more about this, please comment here.)

I think `memp_memory_PBUF_POOL_base` might be something internal to lwIP, maybe related to the MEM_SIZE option? It's set to 24000 (23.44KiB) by default in lwipopts.h.

You could also try reducing the TX and RX buffers in lwip_t41.c by reducing the `TX_SIZE` and `RX_SIZE` macro values. Each one is 1.5k. Performance may suffer, however; you'd have to experiment.
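
As a rough cross-check of those sizes, here is some back-of-the-envelope arithmetic. Another possibility worth checking for `memp_memory_PBUF_POOL_base`: in stock lwIP, arrays named `memp_memory_<pool>_base` are generated for the fixed-size memory pools, and the PBUF_POOL one is sized by PBUF_POOL_SIZE and PBUF_POOL_BUFSIZE rather than by MEM_SIZE. Every number below is an assumption picked to see whether the reported sizes are plausible (16 pbufs, a 16-byte pbuf header, a 1514-byte payload, 4-byte alignment, five 1536-byte DMA buffers per direction); none of them was read from the library's configuration.

```cpp
#include <stddef.h>

// Round n up to the next multiple of align.
constexpr size_t alignUp(size_t n, size_t align) {
  return (n + align - 1) / align * align;
}

// Bytes reserved by a fixed pool of `count` elements of `elementBytes` each.
constexpr size_t poolBytes(size_t count, size_t elementBytes) {
  return count * elementBytes;
}

// Assumed values, chosen for illustration only.
constexpr size_t kAlign       = 4;     // assumed memory alignment
constexpr size_t kPbufHeader  = 16;    // assumed size of the pbuf header struct
constexpr size_t kPbufPayload = 1514;  // assumed: 1460 MSS + 40 TCP/IP + 14 Ethernet
constexpr size_t kPbufCount   = 16;    // assumed number of pbufs in the pool

// Each pool element is an aligned header followed by an aligned payload.
constexpr size_t kPbufPoolBytes = poolBytes(
    kPbufCount, alignUp(kPbufHeader, kAlign) + alignUp(kPbufPayload, kAlign));

// Assumed: five DMA buffers of 1536 bytes (1.5k) per direction.
constexpr size_t kDmaBufBytes = poolBytes(5, 1536);
```

With these assumptions the pool comes to 24512 bytes (about 23.9 KB) and each buffer array to 7680 bytes (7.5 KB), which lines up with the Memory Inspect figures. If that reading is right, lowering the number of pbufs would shrink the pool, at the cost of fewer packets in flight.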
 
Is QNEthernet using the cache management functions?

For transmitting, normally you would call arm_dcache_flush_delete() after writing data to a buffer but before telling the DMA to access it when the hardware wants to transmit.

For receiving, normally you would call arm_dcache_delete() before reading a buffer where DMA has put received data, so you're sure the reads are coming from the actual memory rather than whatever was previously in the cache.

The tightly coupled memory (aka RAM1) never uses the caches, so these functions aren't needed there. But normal memory (RAM2 or DMAMEM) is cached, so they are required when using DMA. It's also important to align the buffers to 32-byte boundaries. Detailed info is in the comments for each of those cache functions.

If you're trying to conserve RAM1, moving only the actual packet buffers to RAM2 but keeping the array of buffer descriptors in RAM1 might be a good trade-off.
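
A minimal sketch of that pattern, assuming one transmit and one receive buffer placed in RAM2 with DMAMEM. The buffer names and the two helper functions are hypothetical, the actual hand-off to the Ethernet DMA is hardware-specific and left out, and the `#else` branch only exists so the snippet compiles off-target.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__IMXRT1062__)
#include <Arduino.h>  // DMAMEM plus arm_dcache_flush_delete() / arm_dcache_delete()
#else
// Host fallback so the sketch compiles off-target: no cache to maintain here.
#define DMAMEM
static inline void arm_dcache_flush_delete(void *, uint32_t) {}
static inline void arm_dcache_delete(void *, uint32_t) {}
#endif

// Hypothetical packet buffers in RAM2. The size is a multiple of 32 and the
// arrays are 32-byte aligned, so each buffer covers whole cache lines.
constexpr size_t kBufSize = 1536;
DMAMEM static uint8_t txBuf[kBufSize] __attribute__((aligned(32)));
DMAMEM static uint8_t rxBuf[kBufSize] __attribute__((aligned(32)));

// Copy a frame into the TX buffer, then flush it out of the cache to RAM so
// the DMA engine sees the bytes that were just written. Returns bytes staged.
size_t stageForTransmit(const uint8_t *frame, size_t len) {
  if (len > kBufSize) return 0;
  memcpy(txBuf, frame, len);
  arm_dcache_flush_delete(txBuf, kBufSize);
  // ...hand txBuf to the DMA engine here (hardware-specific, omitted)...
  return len;
}

// Discard any stale cached copy of the RX buffer, then read what DMA wrote.
size_t readReceived(uint8_t *out, size_t len) {
  if (len > kBufSize) return 0;
  arm_dcache_delete(rxBuf, kBufSize);
  memcpy(out, rxBuf, len);
  return len;
}
```

The buffer descriptors would stay in RAM1 in this arrangement, so only the bulk packet data needs the cache calls.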
 
That did it. Thanks, Paul. The program's RAM1 use now decreases by 15k.

This will be in my next release, along with an option to revert to putting the buffers in RAM1.

With a newer compiler saving about 40k and this saving 15k, we're now up to about 55k saved.
 
That's amazing! Thanks again shawn and Paul for all the support. Looking forward to the next QNEthernet release :D
 