Teensy 4.1 hangs using EthernetClient write operation after 10 to 15 minutes

hemantkap

My Teensy 4.1 freezes after 10 minutes or so when running the code below. Can you please help me find where the issue could be, or where I should start looking?

In the code below, my EthernetServer waits for a client to connect (the client is a Windows TCP client).

Once the client is connected, we send some dummy data every 20 milliseconds.

I have a low-priority timer interrupt running every 50 milliseconds which tries to grab the async buffer using m_bufferMutex.try_lock() and, if it succeeds, puts WRITE_SIZE bytes of dummy data into the async buffer.

Code:
EthernetServer server(tcp_port); //global
EthernetClient m_client; //global

void TcpServerComms::init() // called in setup
{
    server.begin();
}

void TcpServerComms::task() //Called in loop()
{
    //Check if we have a client connected
    if (!m_client)
    {
        m_client = server.accept();
        if (!m_client) { return; }
        m_client.println("Hello");
        Serial7.printf("TCP Client connected \n");
    }
    
    //We check for data every 20 ms
    auto currentMs = millis();
    if ((currentMs - m_timeScheduledMs) < 20) { return; }
    m_timeScheduledMs = currentMs;

    if (m_client)
    {
        //CHECK IF WE HAVE ANY ASYNC DATA TO SEND
        auto dataPresent = m_asyncDataBuff.size();
        auto dataToRead = dataPresent > WRITE_SIZE ? WRITE_SIZE : dataPresent;
        uint32_t clientSndBufSize = m_client.availableForWrite();
        //Check how much data we can write
        dataToRead = dataToRead > clientSndBufSize ? clientSndBufSize : dataToRead;
        if (dataToRead > 0)
        {
            if (m_bufferMutex.try_lock() != 0)
            {
                uint32_t snappy = micros();
                auto readData = m_asyncDataBuff.readBytes(m_tempBuffer, dataToRead);
                m_bufferMutex.unlock();
                m_client.write(m_tempBuffer, readData);
                m_client.flush();
                snappy = micros() - snappy;
                Serial7.printf("TCP Serial data sent time = %d us\n", snappy);
            }
        }
    }
}
 
Difficult to read this code. Things like "m_asyncDataBuff" and "m_bufferMutex" are a mystery. Most of your local variables are "auto" so we can't even tell what type things are without knowing the definition of all this mystery stuff.

But a quick blind guess: is your interrupt code printing to Serial or doing any other I/O? That sort of thing often works but can suffer rare race conditions if the main program also does similar I/O. That would be consistent with the observed behavior, where it works for 15 minutes but eventually locks up.
 
Thanks Paul for the quick response.
m_asyncDataBuff is a templated circular buffer which I implemented.
Code:
TeensyCircularBuffer<uint8_t, TcpServerCommsName::ASYNC_BUFFER_SIZE> m_asyncDataBuff; //ASYNC_BUFFER_SIZE = 64*1024 and it uses malloc to assign data

The other mystery member is:
Code:
Threads::Mutex m_bufferMutex; //this is used to ensure read and write operations are atomic.

My low-priority interrupt is very simple.
In setup(), I have:
Code:
static IntervalTimer g_backgroundTaskTimer;
g_backgroundTaskTimer.priority(255);
g_backgroundTaskTimer.begin(backgroundTask, 500);

And my backgroundTask() is:
Code:
static void backgroundTask()
{
    //We add to async buffer every 20 milli seconds
    uint32_t currentMs = millis();
    if ((currentMs - snappyMs) < 20) { return; }
    snappyMs = currentMs;
    TcpServerComms::instance()->sendRaw(s_data, 256);
}

My sendRaw() is as follows:

Code:
void TcpServerComms::sendRaw(uint8_t* msg, uint32_t len)
{
    if (!m_client) { return; }
    if (m_bufferMutex.try_lock() == 0)
    {
        Serial7.printf("Cannot Lock mutex bailing.... Data missed\r\n");
        return;
    }
    m_asyncDataBuff.write(msg, len);
    auto overflow = m_asyncDataBuff.size() >= ASYNC_BUFFER_SIZE ? true : false;
    m_bufferMutex.unlock();
    if (overflow == true)
    {
        Serial7.printf("TCP Async Buffer overflow %d\r\n");
    }
}

So nothing too exciting here, yet it still refuses to work.
 
When supplying software the idea is that you supply the COMPLETE software so that someone can compile the code, confirm your errors (or not) and thereby help you fix them.
 
Even without knowing what thread library/implementation is being used, the part about the timer interrupt calling try_lock() sounds incorrect.
Interrupt routines run outside of threads and should not be able to own mutexes.
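
For illustration only, one common way to share a byte buffer between an interrupt and loop() without any mutex is a single-producer/single-consumer ring buffer, where the interrupt only advances the write index and the main code only advances the read index. The sketch below is a minimal, portable C++ version under that assumption. `SpscRing` and its methods are hypothetical names, not the poster's TeensyCircularBuffer, and it only holds up if exactly one context writes and exactly one context reads.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer byte ring (illustration only).
// N must be a power of two so free-running indices can be wrapped with a mask.
template <size_t N>
class SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");
public:
    // Producer side (e.g. the timer interrupt).
    // Writes all of src or nothing; returns false if there is not enough room.
    bool write(const uint8_t* src, size_t len) {
        const size_t head = m_head.load(std::memory_order_relaxed);
        const size_t tail = m_tail.load(std::memory_order_acquire);
        if (len > N - (head - tail)) { return false; }
        for (size_t i = 0; i < len; ++i) {
            m_data[(head + i) & (N - 1)] = src[i];
        }
        // Publish the new data only after the bytes are stored.
        m_head.store(head + len, std::memory_order_release);
        return true;
    }

    // Consumer side (e.g. loop()). Returns the number of bytes copied.
    size_t read(uint8_t* dst, size_t maxLen) {
        const size_t tail = m_tail.load(std::memory_order_relaxed);
        const size_t head = m_head.load(std::memory_order_acquire);
        size_t n = head - tail;
        if (n > maxLen) { n = maxLen; }
        for (size_t i = 0; i < n; ++i) {
            dst[i] = m_data[(tail + i) & (N - 1)];
        }
        // Release the space only after the bytes are copied out.
        m_tail.store(tail + n, std::memory_order_release);
        return n;
    }

    // Number of bytes currently stored.
    size_t size() const {
        return m_head.load(std::memory_order_acquire) -
               m_tail.load(std::memory_order_acquire);
    }

private:
    uint8_t m_data[N];
    std::atomic<size_t> m_head{0};  // advanced only by the producer
    std::atomic<size_t> m_tail{0};  // advanced only by the consumer
};
```

With something like this, the interrupt would call write() and the loop() task would call read() before m_client.write(), with no try_lock() in either context.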
 
Hello @BriComp and @jmarsh

I appreciate your feedback. I have a very large project, so I will try to create a simple project and post it here.

Meanwhile, I am having an issue debugging the QNEthernet stack.

I tried modifying lwipopts.h with the changes below:

Code:
class HardwareSerialIMXRT;
extern HardwareSerialIMXRT Serial7;
#define LWIP_PLATFORM_DIAG(x) do {Serial7.printf(x)} while(0)
 #define LWIP_DEBUG
 #define LWIP_DBG_MIN_LEVEL LWIP_DBG_LEVEL_ALL
 #define LWIP_DBG_TYPES_ON  LWIP_DBG_ON

Since the header is also included in C files, I am obviously getting errors:

Code:
lwip_driver.c:7: from
 
lwipopts.h: 458:1: error: unknown type name 'class
   458 | class HardwareSerialIMXRT
   | ^~~~~

Can you please guide me on how to get the TCP/IP stack to print debug messages to Serial7?
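
One possible way around the C/C++ clash, sketched here as an assumption rather than a tested QNEthernet recipe, is to keep the header free of C++ types and route the macro through a small C-linkage helper that is implemented in a .cpp file. lwIP invokes LWIP_PLATFORM_DIAG with the whole argument list wrapped in an extra pair of parentheses, so the macro places `x` directly after the function name. The helper name `lwip_diag_printf` is hypothetical, and a std::string stands in for Serial7 so the sketch can run on a PC. Where the macro should be defined may depend on the port.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// ---- Part that would live in the header seen by both C and C++ files ----
#ifdef __cplusplus
extern "C" {
#endif
void lwip_diag_printf(const char* fmt, ...);  // hypothetical helper name
#ifdef __cplusplus
}
#endif

// lwIP calls this as LWIP_PLATFORM_DIAG(("fmt", args...)), so x already
// carries its own parentheses.
#define LWIP_PLATFORM_DIAG(x) do { lwip_diag_printf x; } while (0)

// ---- Part that would live in a .cpp file ----
static std::string g_diagLog;  // host-side stand-in for Serial7

extern "C" void lwip_diag_printf(const char* fmt, ...) {
    char buf[128];
    va_list args;
    va_start(args, fmt);
    vsnprintf(buf, sizeof(buf), fmt, args);  // truncates long messages
    va_end(args);
    g_diagLog += buf;  // on the Teensy this would be: Serial7.print(buf);
}
```

Because the header now only declares a plain function, the C files no longer see the `class` keyword.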
 
Code:
static IntervalTimer g_backgroundTaskTimer;
g_backgroundTaskTimer.priority(255);
g_backgroundTaskTimer.begin(backgroundTask, 500);

static void backgroundTask()
{
    //We add to async buffer every 20 milli seconds
    uint32_t currentMs = millis();
    if ((currentMs - snappyMs) < 20) { return; }
    snappyMs = currentMs;
    TcpServerComms::instance()->sendRaw(s_data, 256);
}

void TcpServerComms::sendRaw(uint8_t* msg, uint32_t len)
{
    if (!m_client) { return; }
    if (m_bufferMutex.try_lock() == 0)
    {
        Serial7.printf("Cannot Lock mutex bailing.... Data missed\r\n");
        return;
    }
    m_asyncDataBuff.write(msg, len);
    auto overflow = m_asyncDataBuff.size() >= ASYNC_BUFFER_SIZE ? true : false;
    m_bufferMutex.unlock();
    if (overflow == true)
    {
        Serial7.printf("TCP Async Buffer overflow %d\r\n");
    }
}

So you have two printfs that can occur in an interrupt context in the event of an error. The second of them is malformed.
Does the code still crash if you remove them?
 