What is the maximum sustained data transmit rate for Ethernet on Teensy?

clinker8

I have a data logging problem, but it's really a debugging problem. I'm trying to track down unexpected behavior (which I see every few days) by logging a lot of data. USB isn't so great for this, as there's a bit of blocking, at least from reading the code. I can't have my critical ISRs blocked for longer than 6 us, or I miss a count. I'm controlling motors on a lathe, so count failures can result in damage to me, my lathe, or someone else.

Is Ethernet a reasonable answer? Can a Teensy 4.1 sustain the full maximum transmit rate for minutes (up to an hour)? A long time ago, I determined that the maximum data payload for TCP/IP was 0.786 times the full link rate; the rest was overhead. So on 100 Mbit Ethernet that would be 78.6 Mbps, or about 9.8 MB/s? Do I simply fill a buffer and send it when it reaches one MTU (1500 bytes)? (I implemented something like this in C using sockets in 2009, but time has dulled the memory.) Is there more or less blocking in Ethernet than in USB on the Teensy 4.1? Can QNEthernet do this?

Just logging where I am in the code with a timestamp generates 1.18 MB/s, and I haven't even begun logging the interesting stuff: the variables where the bugs are lurking. If I understand correctly, that is well over the 3 Mbps link capacity of hardware serial / FTDI.
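A quick way to sanity-check these numbers is to compute each link's usable byte rate. The sketch below is illustrative (the function names are made up): it assumes 8N1 UART framing, where each byte costs 10 bit times, so 3 Mbaud carries about 300 KB/s, and it takes the 0.786 TCP payload efficiency quoted above as an input in per-mille. Real Ethernet throughput depends on packet size and on the stack.

```c
#include <stdint.h>

/* Bytes per second a UART can carry at a given baud rate, assuming 8N1
   framing: each byte costs 10 bit times (1 start + 8 data + 1 stop). */
uint32_t uart_8n1_bytes_per_sec(uint32_t baud)
{
    return baud / 10u;
}

/* Payload bytes per second on a link with the given raw bit rate and an
   assumed payload efficiency in per-mille (786 means 0.786). */
uint32_t link_payload_bytes_per_sec(uint32_t bits_per_sec, uint32_t efficiency_permille)
{
    return (uint32_t)((uint64_t)bits_per_sec * efficiency_permille / 1000u / 8u);
}

/* Returns 1 if a log stream fits in the given capacity while leaving
   headroom_pct (0..100) percent of that capacity unused, else 0. */
int log_rate_fits(uint32_t log_bytes_per_sec, uint32_t capacity_bytes_per_sec,
                  uint32_t headroom_pct)
{
    uint64_t usable = (uint64_t)capacity_bytes_per_sec * (100u - headroom_pct) / 100u;
    return (uint64_t)log_bytes_per_sec <= usable;
}
```

Under these assumptions, 1.18 MB/s is roughly four times what a 3 Mbaud UART can carry, while 100 Mbit Ethernet at the quoted efficiency (about 9.8 MB/s of payload) would have room for it even with 20% headroom.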

At this point, searching for some ideas to facilitate logging, or an alternate approach. I'm fishing for any info, as I'm running out of ideas. Any insight, tips, clues, or help would be appreciated.
 
At ~40MB/s, USB is physically the fastest way to get data in/out. If it's not getting the job done then you either need to investigate other methods of output buffering, or give up.
 

I know you've gone a long way down your road using Encoder, but you really should use QuadEncoder. The counting is done in hardware, so you can never miss one. Your 6 us implies 166K interrupts/sec, and even though the T4 can do that, it's using a lot of CPU, and it's a limiter because everything else has to work around it. With QuadEncoder you could reduce that 166K to 0 and it would be more reliable.

T4 can easily sustain 3 MB/s logging to SD, with no disabling of interrupts, and afterward you can either pull the SD card or use MTP to transfer the files if the card is not accessible. You should minimize your logging rate, though, because you'll need a buffer of roughly 1/20 (5%) of your per-second data rate, i.e. about 50 ms worth of data. If the data rate is 1 MB/s, you need a buffer of about 50 KB. For 3 MB/s you need about 150 KB, and it has to be RAM, not PSRAM.

Is your 1.18 MB/s using text? With SD you can log binary data for higher density.
 
At ~40MB/s, USB is physically the fastest way to get data in/out. If it's not getting the job done then you either need to investigate other methods of output buffering, or give up.
As I recall, USB is fastest. But in usb.c and associated files, there is a considerable amount of blocking, with ISRs globally blocked. I cannot have my ISRs blocked or I lose control of a machine. I may be interpreting things incorrectly, but during receive all ISRs are blocked by a __disable_irq for quite a while. https://forum.pjrc.com/index.php?threads/teensy4-0-and-_disable_irq-in-the-core-code.60831/ During transmit, there's some blocking, but I haven't determined how much.

I realize at some point, you have to throttle the data to fit the pipe, or you have overflows. But I can't have the transport mechanism (USB) cause an unsafe condition. Blocking interrupts for extended times (more than 6us) doesn't work for the application. Thousands of dollars of equipment damage can result, or injury, or death. Which is why I'm asking questions. I want to understand the current limitations, and if they can be changed, or not. I'll throttle the data to make things work, but even so, I can't have USB or any interface block for more than 6us.

Giving up is not in my vocabulary. There's always some way to proceed forward.

That being said, I did experience Teensy resets (crashes) during high levels of USB data logging. I had to add crash reporting to determine the cause. Consequently, I'm a little shy about using USB, even though it's very convenient and rather quick.

I did have a Teensy crash during a machining operation. Fortunately, there was no physical damage. I want to try to ensure that, by design, there won't be an event (under normal operation) that leads to a destructive event.
 
As I recall, USB is fastest. But in usb.c and associated files, there is a considerable amount of blocking, with ISRs globally blocked. I cannot have my ISRs blocked or I lose control of a machine. I may be interpreting things incorrectly, but during receive all ISRs are blocked by a __disable_irq for quite a while. https://forum.pjrc.com/index.php?threads/teensy4-0-and-_disable_irq-in-the-core-code.60831/ During transmit, there's some blocking, but I haven't determined how much.
Where are you seeing this in the current Teensyduino source code?

This sounds more like you're not setting the priorities for your own interrupts high enough to avoid them being masked/interrupted by others. Of course if you are trying to log from your ISR, that's likely to end in disaster - at some point the data output will buffer, and the interrupt can't interrupt itself to demand attention.
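To keep printing out of the ISR entirely, a common pattern is a single-producer/single-consumer ring buffer: the ISR only copies a small fixed-size record into RAM, and loop() drains it to whatever transport is in use (SD, USB, Ethernet). Below is a minimal, hypothetical version using C11 atomics; the record layout, LOG_CAPACITY, and the function names are assumptions, not an existing library. It never blocks or disables interrupts: when the ring is full it counts a drop instead of waiting.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Must be a power of two so the index can be masked. Illustrative size. */
#define LOG_CAPACITY 1024u

typedef struct {
    uint32_t t;      /* timestamp, e.g. a cycle-counter reading */
    uint16_t id;     /* what was logged: a site or variable ID (hypothetical) */
    int32_t  value;  /* associated value */
} log_entry_t;

typedef struct {
    log_entry_t buf[LOG_CAPACITY];
    _Atomic uint32_t head;     /* advanced only by the producer (ISR) */
    _Atomic uint32_t tail;     /* advanced only by the consumer (loop) */
    _Atomic uint32_t dropped;  /* records discarded because the ring was full */
} log_ring_t;

/* Producer, called from the ISR. Never waits: drops the record if full. */
bool log_push(log_ring_t *r, uint32_t t, uint16_t id, int32_t value)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail >= LOG_CAPACITY) {
        /* Only the producer writes 'dropped', so load + store is enough. */
        uint32_t d = atomic_load_explicit(&r->dropped, memory_order_relaxed);
        atomic_store_explicit(&r->dropped, d + 1u, memory_order_relaxed);
        return false;
    }
    log_entry_t *e = &r->buf[head & (LOG_CAPACITY - 1u)];
    e->t = t;
    e->id = id;
    e->value = value;
    /* Release: the entry's contents are visible before the new head is. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer, called from loop(). Returns false when the ring is empty. */
bool log_pop(log_ring_t *r, log_entry_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head) return false;
    *out = r->buf[tail & (LOG_CAPACITY - 1u)];
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

In a C++ sketch, std::atomic<uint32_t> with the same memory orders plays the same role. A nonzero dropped count then tells you the transport cannot keep up, instead of the control code being stalled.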
 
I know you've gone a long way down your road using Encoder, but you really should use QuadEncoder. The counting is done in hardware, so you can never miss one. Your 6 us implies 166K interrupts/sec, and even though the T4 can do that, it's using a lot of CPU, and it's a limiter because everything else has to work around it. With QuadEncoder you could reduce that 166K to 0 and it would be more reliable.

T4 can easily sustain 3 MB/s logging to SD, with no disabling of interrupts, and afterward you can either pull the SD card or use MTP to transfer the files if the card is not accessible. You should minimize your logging rate, though, because you'll need a buffer of roughly 1/20 (5%) of your per-second data rate, i.e. about 50 ms worth of data. If the data rate is 1 MB/s, you need a buffer of about 50 KB. For 3 MB/s you need about 150 KB, and it has to be RAM, not PSRAM.

Is your 1.18 MB/s using text? With SD you can log binary data for higher density.
If QuadEncoder were 64-bit, and it allowed me to execute the Bresenham algorithm inside it, that would be great. A 64-bit count will never overflow in my lifetime; a 32-bit count will in under a day. It's hard (for me) to debug unsigned arithmetic errors, especially if they are infrequent. I remember struggling with that long, long ago. I'm willing to think about alternative ways to do this, but I have about 3 years of development in this architecture, and a whole lot of testing that shows everything works, save for one new feature. That being said, this new feature was not considered in the original design, back in May 2022.

Understood that it's quite a processing load, but it's how I determine when to move the motor. This determination has to be done in real time. Can't be done in a main loop.

Yes, the 1.18 MB/s was using text. I could use binary, if it had uniform decoding; I'd guess I could compress it by a factor of 4. I was thinking of sending file number, line number, and a 64-bit timestamp. This would tell me where in the code I was at a certain time. It does not identify which variables I am logging, so I need an equivalent scheme (one I can remember while coding) to encode the variable name (or number) and its value. If you know of an easier scheme, please let me know.
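One uniform scheme is a fixed-size binary record: every entry has the same fields, so a single decoder on the PC side handles all of them, and "which variable is this" is answered by a small ID you assign. This is a hypothetical sketch: the 20-byte layout, the FILE_ID_*/VAR_* enums, and the LOG_RECORD macro are illustrative, not from any library. __LINE__ is standard C, so the line number comes for free (assuming source files stay under 65,536 lines). Bytes are written explicitly in little-endian order so the decoder does not depend on struct padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IDs: one constant per source file, one per logged variable. */
enum { FILE_ID_MAIN = 1, FILE_ID_CONTROL = 2 };
enum { VAR_NONE = 0, VAR_DELTA = 1, VAR_MYACC = 2 };

#define LOG_RECORD_SIZE 20u

typedef struct {
    uint64_t timestamp;  /* e.g. the 64-bit timestamp proposed above */
    uint16_t file_id;
    uint16_t line;       /* from __LINE__ */
    uint16_t var_id;     /* VAR_NONE for a plain "I was here" trace point */
    uint16_t reserved;   /* always 0; keeps the layout fixed */
    int32_t  value;
} log_record_t;

static void put_le(uint8_t *p, uint64_t v, size_t n)
{
    for (size_t i = 0; i < n; i++) p[i] = (uint8_t)(v >> (8u * i));
}

static uint64_t get_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) v |= (uint64_t)p[i] << (8u * i);
    return v;
}

/* Layout (little-endian): timestamp[0..7] file[8..9] line[10..11]
   var[12..13] reserved[14..15] value[16..19]. */
void log_encode(const log_record_t *r, uint8_t out[LOG_RECORD_SIZE])
{
    put_le(out + 0, r->timestamp, 8);
    put_le(out + 8, r->file_id, 2);
    put_le(out + 10, r->line, 2);
    put_le(out + 12, r->var_id, 2);
    put_le(out + 14, r->reserved, 2);
    put_le(out + 16, (uint32_t)r->value, 4);  /* two's-complement bits */
}

void log_decode(const uint8_t in[LOG_RECORD_SIZE], log_record_t *r)
{
    r->timestamp = get_le(in + 0, 8);
    r->file_id = (uint16_t)get_le(in + 8, 2);
    r->line = (uint16_t)get_le(in + 10, 2);
    r->var_id = (uint16_t)get_le(in + 12, 2);
    r->reserved = (uint16_t)get_le(in + 14, 2);
    /* uint32_t -> int32_t is implementation-defined before C23; GCC wraps. */
    r->value = (int32_t)(uint32_t)get_le(in + 16, 4);
}

/* Fill a record for the line where the macro is used. */
#define LOG_RECORD(rec, ts, fid, var, val)          \
    do {                                            \
        (rec)->timestamp = (ts);                    \
        (rec)->file_id   = (uint16_t)(fid);         \
        (rec)->line      = (uint16_t)__LINE__;      \
        (rec)->var_id    = (uint16_t)(var);         \
        (rec)->reserved  = 0;                       \
        (rec)->value     = (int32_t)(val);          \
    } while (0)
```

A trace point would then be LOG_RECORD(&rec, now, FILE_ID_CONTROL, VAR_NONE, 0), and logging a variable LOG_RECORD(&rec, now, FILE_ID_CONTROL, VAR_DELTA, delta). Keeping the enums in a shared header makes the IDs easier to remember and lets the decoder print names.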
 
Where are you seeing this in the current Teensyduino source code?
Well, one place is usb.c; another is usb_serial.c. I am running 1.58.1.

In usb.c there is
Code:
static void schedule_transfer(endpoint_t *endpoint, uint32_t epmask, transfer_t *transfer)
{
    // when we stop at 6, why is the last transfer missing from the USB output?
    //if (transfer_log_count >= 6) return;
    //uint32_t ret = (*(const uint8_t *)transfer->pointer0) << 8;
    if (endpoint->callback_function) {
        transfer->status |= (1<<15);
    }
    __disable_irq();
    //digitalWriteFast(1, HIGH);
    // Executing A Transfer Descriptor, page 2468 (RT1060 manual, Rev 1, 12/2018)
    transfer_t *last = endpoint->last_transfer;
    if (last) {
        last->next = (uint32_t)transfer;
        if (USB1_ENDPTPRIME & epmask) goto end;
        //digitalWriteFast(2, HIGH);
        //ret |= 0x01;
        uint32_t status, cyccnt=ARM_DWT_CYCCNT;
        do {
            USB1_USBCMD |= USB_USBCMD_ATDTW;
            status = USB1_ENDPTSTATUS;
        } while (!(USB1_USBCMD & USB_USBCMD_ATDTW) && (ARM_DWT_CYCCNT - cyccnt < 2400));
        //USB1_USBCMD &= ~USB_USBCMD_ATDTW;
        if (status & epmask) goto end;
        //ret |= 0x02;
        endpoint->next = (uint32_t)transfer;
        endpoint->status = 0;
        USB1_ENDPTPRIME |= epmask;
        goto end;
    }
    //digitalWriteFast(4, HIGH);
    endpoint->next = (uint32_t)transfer;
    endpoint->status = 0;
    USB1_ENDPTPRIME |= epmask;
    endpoint->first_transfer = transfer;
end:
    endpoint->last_transfer = transfer;
    __enable_irq();
    //digitalWriteFast(4, LOW);
    //digitalWriteFast(3, LOW);
    //digitalWriteFast(2, LOW);
    //digitalWriteFast(1, LOW);
    //if (transfer_log_head > LOG_SIZE) transfer_log_head = 0;
    //transfer_log[transfer_log_head++] = ret;
    //transfer_log_count++;
}
This seems to turn off interrupts globally during a transfer.

USB transmit and USB receive both call schedule_transfer(), so both block interrupts for significant times, if I am interpreting what I see correctly.
 

I came to the same conclusion. I can't use USB communication if I'm doing something with interrupts with critical timing.

Understood that it's quite a processing load, but it's how I determine when to move the motor. This determination has to be done in real time. Can't be done in a main loop.

Understand that you've gone too far to change now, but this can be done with quadrature counting in hardware. Instead of an interrupt on every edge and counting in software, you can count in hardware and read the count in a timer ISR at some frequency high enough to get the necessary resolution and response for your application. As long as the count is read frequently, you can synthesize a 64-bit position in software exactly as you are doing now. The only difference is that the delta can be greater than 1 count per update. I did some googling on electronic lead screw and found some videos by clough42 that describe an implementation like this.
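The core of that approach can be sketched as: read the free-running 32-bit hardware count from a periodic timer ISR, take the unsigned difference from the previous reading, reinterpret it as signed, and accumulate it into a 64-bit position. This is a hypothetical sketch: the raw count is simply passed in (it would come from whatever read call the hardware-counter library provides), and it assumes the ISR runs often enough that fewer than 2^31 counts occur between reads.

```c
#include <stdint.h>

/* Extends a free-running 32-bit hardware count to a 64-bit position. */
typedef struct {
    uint32_t last_raw;  /* previous hardware reading */
    int64_t  position;  /* extended position, in counts */
} pos64_t;

void pos64_init(pos64_t *p, uint32_t raw)
{
    p->last_raw = raw;
    p->position = 0;
}

/* Call from the periodic timer ISR with the latest hardware count.
   Returns the signed change since the previous call. */
int32_t pos64_update(pos64_t *p, uint32_t raw)
{
    /* Unsigned subtraction wraps modulo 2^32; reinterpreting it as signed
       gives negative deltas when the spindle reverses. The conversion is
       implementation-defined before C23; GCC uses two's complement. */
    int32_t delta = (int32_t)(raw - p->last_raw);
    p->last_raw = raw;
    p->position += delta;
    return delta;
}
```

The returned delta is the "can be greater than 1 count per update" value; it is what a position loop would consume instead of a per-edge interrupt.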
 
That isn't blocking during the transfer, it's blocking while queueing the transfer. Most likely because it's not an atomic operation and there's a chance of an interrupt occurring in the middle and breaking the linked list.
 
Code:
        uint32_t status, cyccnt=ARM_DWT_CYCCNT;
        do {
            USB1_USBCMD |= USB_USBCMD_ATDTW;
            status = USB1_ENDPTSTATUS;
        } while (!(USB1_USBCMD & USB_USBCMD_ATDTW) && (ARM_DWT_CYCCNT - cyccnt < 2400));
The way I figure it, there's blocking up to 2400 cycles just in this snippet. Do it twice and my budget is blown.

At the default 600 MHz, one cycle is about 1.67 ns, so 2400 cycles is about 4 us, and two of those is about 8 us, which exceeds my 6 us interrupt interval. This USB code seems interrupt-unfriendly. A higher-priority ISR should be able to preempt it, but the __disable_irq() statements prevent that. None of my interrupts would affect this queuing. I don't know; it's a large amount of code that blocks interrupts. Usually interrupts are blocked only while changing a critical register or volatile variable, for a few instructions. This is for up to 2400 cycles.

On the other hand, I don't know the USB interface timing requirements. Even so, in my humble opinion, this is not control system friendly.
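Rather than reasoning only from the source, the hold-off could be measured on the target. One hypothetical approach: in a periodic timer ISR, read the cycle counter on entry (ARM_DWT_CYCCNT, as used in the usb.c snippet) and track the longest interval between entries. With a nominal period P, any gap well beyond P means something kept the ISR from running. At the default 600 MHz, 6 us is 3600 cycles. The struct and function names below are illustrative, and the counter value is passed in so the logic can be checked off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tracks the longest interval between successive entries of a periodic ISR,
   using a free-running 32-bit cycle counter supplied by the caller. */
typedef struct {
    uint32_t last;      /* counter value at the previous entry */
    uint32_t max_gap;   /* longest interval seen, in cycles */
    uint32_t overruns;  /* intervals longer than the allowed maximum */
    bool     started;   /* false until the first sample */
} gap_monitor_t;

/* Call first thing in the ISR. limit_cycles is the longest acceptable
   interval, e.g. 3600 cycles for 6 us at 600 MHz. */
void gap_monitor_sample(gap_monitor_t *m, uint32_t now, uint32_t limit_cycles)
{
    if (m->started) {
        uint32_t gap = now - m->last;  /* unsigned: correct across rollover */
        if (gap > m->max_gap) m->max_gap = gap;
        if (gap > limit_cycles) m->overruns++;
    }
    m->last = now;
    m->started = true;
}
```

loop() can then print max_gap and overruns while USB traffic is running, which would show directly whether the long path is ever hit in practice.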
 
Understand that you've gone too far to change now, but this can be done with quadrature counting in hardware. Instead of an interrupt on every edge and counting in software, you can count in hardware and read the count in a timer ISR at some frequency high enough to get the necessary resolution and response for your application. As long as the count is read frequently, you can synthesize a 64-bit position in software exactly as you are doing now. The only difference is that the delta can be greater than 1 count per update. I did some googling on electronic lead screw and found some videos by clough42 that describe an implementation like this.
I could change things, sometimes that HAS to happen.

I know all about clough42; I was going to implement his system during the pandemic. But the TI control boards were unavailable for over 12 months due to supply chain disruptions. After being told in April 2022 that delivery was another 6 months out, I found out about Teensy and ordered one. In May 2022, I decided to roll my own system, using a clean sheet and a Teensy 4.1. By August 2022 it was running on my lathe. (I had to machine parts and alter my lathe to do this, so it wasn't all coding!) I had a working PCB (that I designed in KiCad) by September 2022, and the PCB has required no rework to this day. My design had a touch panel display from the beginning, as I was disenchanted by the primitive UI of James' design. I worked out the math and the control using other websites. At no time did I use Clough42's code, as I found his coding style (and documentation) difficult to follow. His videos are great, but his coding style less so; that's only my personal opinion. The one advantage of the TI processor was that it had ICE, which is sadly lacking in Teensy. But the Teensy is over 6x faster, so it allowed different approaches.

My control algorithm is position dependent, not velocity dependent. Actions are taken to exactly maintain the relationship between the spindle angular position and the Z position of the carriage (and indirectly the stepper angle). If the lathe motor slows down due to load, or any other reason, the Z position exactly tracks. This is essential to guarantee that screw thread cuts are true, with no wavering or drunkenness in pitch.

If you think that that kind of accuracy and repeatability is possible, (or plausible), I could spend some time looking into the approach above. The Teensy is controlling to the single micron in distance, and 360/4096 degrees in angle on the spindle. There's definitely merit to your comments, and I'm thinking about it. Wheels are turning.

If I make this big jump, it will have to be a very new and experimental branch. Fortunately for me, my code is in git, so branches are trivial to manage. Due to the complexity of the project, I had to use git to manage change. Every time a new feature is added, a new branch is created, so everything done prior to that point is preserved.
 
To sort of answer the original question: I have managed to send a sustained 32 Mbit/s to the Teensy over Ethernet using the asyncTCP library, and in testing I could push it up to around 50 Mbit/s without issues. This is receiving rather than sending, so it's a little different, but it gives you an idea of raw throughput.
The teensy was acting as a buffer between a PC application connected over the network and a fairly shallow hardware buffer that couldn't miss a single byte of data or the whole system would fall over. Filling that hardware buffer was top priority and would interrupt the ethernet.

The biggest issue I hit was on the flow control side, the top level of the teensy code would attempt to send a flow control message to the PC application at a fixed rate. This would sometimes fail because the underlying IP stack was still waiting for the PC application to acknowledge the previous flow control message.

So if you are trying to send lots of data, I would recommend UDP, or, if you need to use TCP, make sure the other end ACKs the packets promptly. A desktop has plenty of memory for buffering packets in case they need to be retransmitted; the library on the Teensy won't buffer much and will instead block until it knows it doesn't need to retransmit.
Also, the async transmit functions don't make a copy of the buffer you ask them to transmit; if you reuse that location as soon as the function returns, you'll corrupt the data to be sent.
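The buffer-lifetime point can be handled with two alternating ("ping-pong") buffers: fill one while the other is owned by the network stack, and reuse a buffer only after the stack is done with it. Below is a minimal, hypothetical sketch of the bookkeeping only, meant to be called from loop(); the names and PP_BUF_SIZE are assumptions, and the actual send call and completion notification depend on the Ethernet library, so they are not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PP_BUF_SIZE 1024u  /* illustrative size */

typedef struct {
    uint8_t data[2][PP_BUF_SIZE];
    size_t  fill[2];       /* bytes used in each buffer */
    bool    in_flight[2];  /* true while the network stack may still read it */
    int     active;        /* index of the buffer currently being filled */
} pingpong_t;

/* Append bytes to the active buffer; false if they don't fit. */
bool pp_append(pingpong_t *p, const void *src, size_t n)
{
    int a = p->active;
    if (p->fill[a] + n > PP_BUF_SIZE) return false;
    memcpy(&p->data[a][p->fill[a]], src, n);
    p->fill[a] += n;
    return true;
}

/* Hand the active buffer to the sender and switch to the other one.
   Returns the index handed off, or -1 if the other buffer has not been
   released yet (in which case nothing changes). */
int pp_handoff(pingpong_t *p)
{
    int a = p->active;
    int b = 1 - a;
    if (p->in_flight[b]) return -1;
    p->in_flight[a] = true;
    p->active = b;
    p->fill[b] = 0;
    return a;
}

/* Call once the stack no longer needs buffer idx (send complete/acked). */
void pp_release(pingpong_t *p, int idx)
{
    p->in_flight[idx] = false;
}
```

The send would use p->data[idx] and p->fill[idx] for the index pp_handoff() returns, and pp_release() would be called from wherever the library signals that the data is no longer needed.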
 
My control algorithm is position dependent, not velocity dependent. Actions are taken to exactly maintain the relationship between the spindle angular position and the Z position of the carriage (and indirectly the stepper angle). If the lathe motor slows down due to load, or any other reason, the Z position exactly tracks. This is essential to guarantee that screw thread cuts are true, with no wavering or drunkenness in pitch.

If you think that that kind of accuracy and repeatability is possible, (or plausible), I could spend some time looking into the approach above. The Teensy is controlling to the single micron in distance, and 360/4096 degrees in angle on the spindle. There's definitely merit to your comments, and I'm thinking about it. Wheels are turning.

QuadEncoder gives you position, just like Encoder. The main difference is the counting is done in hardware, with no interrupts, versus software with an interrupt per edge (Encoder).

IIRC, you have been using USB Serial for logging. Did you have your first crash before you added this logging, or after? I think of USB Serial as incompatible with your high interrupt rate from Encoder.
 
Thanks for your helpful response. It gives me an idea of what to expect, at least for receive. I've done both ends of a client server in C long ago, lots of ACK'ing etc.
 
QuadEncoder gives you position, just like Encoder. The main difference is the counting is done in hardware, with no interrupts, versus software with an interrupt per edge (Encoder).

IIRC, you have been using USB Serial for logging. Did you have your first crash before you added this logging, or after? I think of USB Serial as incompatible with your high interrupt rate from Encoder.
I reviewed QuadEncoder again. I can't use it as is, as it breaks my entire software architecture. But I will still look at it to see if there's a simple way to adapt its use. At the moment, I'm having a tough time disconnecting from my existing paradigm. It's forcing a big rethink.

I got my first crash when I was transmitting every change in Z position while threading to a stop. I literally got an interrupt for every micron of Z movement, as well as (400/60) × 4096 ≈ 27,300 times per second (every 36.6 us) from the rotary encoder at 400 RPM. The Z sensor is a linear quadrature encoder, so it uses a different instantiation of EncoderTool. I was attempting to push too much data down the pipe AND remain operational. It was a lesson relearned. And I was attempting to print during interrupts... All the bad things, he says ashamedly.

Once I got rid of the high-rate print statements in ISRs, I haven't had a crash, but I have had a couple of disturbing anomalous events, where the motor reverses rather than moving in the correct direction. Once that happens, my code doesn't quite behave correctly. I really want to track that down and fix it. Darn thing shouldn't be going the wrong direction! More often, about one in 15 times, there's a motor vibration, as if it might want to reverse. This may be a different bug, or it may be related.

But the real reason I want to log this data is to track my apparent loss of phase information. This is the phase of the start of the thread. It needs to be identical every cutting pass. (Can't cut a thread on most lathes in a single pass, it can take 5-8 passes.) These successive thread starts need to start on the work piece within one count of my encoder wheel or 0.087890625 degrees, or one creates a bad thread.
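One thing that could be logged cheaply, once per pass rather than per edge, is the spindle phase at the moment each pass starts. The hypothetical helpers below reduce a signed 64-bit count to a phase in 0..4095 (C's % can return a negative remainder, hence the fold) and report the signed difference from a reference start phase, taking the shorter way around the circle. With one count = 360/4096 = 0.087890625 degrees, any error outside the one-count tolerance would point to where phase is being lost.

```c
#include <stdint.h>

#define COUNTS_PER_REV 4096  /* spindle encoder counts per revolution, per the text */

/* Phase of a signed 64-bit spindle position, in 0..COUNTS_PER_REV-1.
   C's % truncates toward zero, so negative positions need folding. */
int32_t spindle_phase(int64_t position)
{
    int64_t r = position % COUNTS_PER_REV;
    if (r < 0) r += COUNTS_PER_REV;
    return (int32_t)r;
}

/* Signed phase difference, in counts, between the start of this pass and
   the reference start (e.g. the first pass), taking the shorter way round:
   the result is in -COUNTS_PER_REV/2 .. COUNTS_PER_REV/2 - 1. */
int32_t phase_error(int64_t ref_start, int64_t pass_start)
{
    int32_t e = spindle_phase(pass_start) - spindle_phase(ref_start);
    if (e >= COUNTS_PER_REV / 2) e -= COUNTS_PER_REV;
    else if (e < -COUNTS_PER_REV / 2) e += COUNTS_PER_REV;
    return e;
}
```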

At the moment, I have no additional ideas what to monitor, it seems my algorithm may be faulty or I have an error being injected that I haven't accounted for. (Or something else!) It's a tough problem to solve. I don't know of anyone that has successfully accomplished what I am trying to do with their ELS's. I did research some solutions, and I'm approaching the problem entirely differently, which means I might be missing some important details, or some of my assumptions are faulty. But hey, that's what keeps life interesting, being out on the frontier (of my) knowledge.
 
Fully understand you can't just change your architecture. I looked up the Bresenham algorithm. Are you using that to draw the "line" of the thread, i.e. to determine when to step the stepper motor? It sounds like you can take a control action on any encoder edge. If you're seeing incorrect control actions, I would focus on logging the important inputs and outputs of the control logic, as opposed to trying to log program flow.
 
I'd second the suggestion to use hardware quadrature encoder counting. The Teensy processor can do it so you should. Basically nobody tries to do encoder capture in software. I've certainly never done it on any motor controllers. Granted, I was not interested in the level of precision being discussed here but the fact remains that the hardware can do the counting for you and so that's the best option. It runs independent of the processor instruction stream so it's essentially free. Then the latency will be down to how often you process through the position loop to find the # of counts and move accordingly. Obviously, doing this willy-nilly will lead to strange artifacts in your lathe motion. The obvious solution is to run your position code on a timer interrupt and capture both the current cycle count (or just grab the 32 bit microseconds) and the encoder count so you can figure out how many counts per interval you are seeing and move accordingly. 32 bit counter registers are not a problem so long as you don't have more than 4 billion counts per processing loop. Otherwise, unsigned math "just works". If you overflow, you overflow, no big deal. Doing the math correctly leads to the correct result regardless. That is, if your count is 100 and previously was 4294967200 then (100 - 4294967200) does still give you the result of 196 counts like you want it to even though it looks ludicrous to assert that.

Code:
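#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
   uint32_t last = 4294967200u;  // 2^32 - 96, just before rollover
   uint32_t curr = 100;          // counter has wrapped past zero
   uint32_t val = curr - last;   // unsigned subtraction wraps modulo 2^32
   printf("Value: %" PRIu32 "\n", val);  // prints "Value: 196"
   return 0;
}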

It really equals 196 because the math properly wraps around. So, honestly there isn't any danger in 32 bit counters so long as you do the math right. I will admit, it's easy to mess this up though.
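The same idea extends to inequalities: instead of comparing raw counter values (now >= target breaks at rollover), compare their wrapped difference interpreted as signed. A minimal sketch follows; the function names are illustrative. It is valid only while the two values are less than 2^31 counts apart, and the uint32_t to int32_t conversion is implementation-defined before C23, though GCC on ARM and common hosts use two's complement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "has the counter reached target?" for a 32-bit counter. */
bool counter_reached(uint32_t now, uint32_t target)
{
    return (int32_t)(now - target) >= 0;
}

/* Signed distance from 'from' to 'to'; negative when 'to' is behind,
   e.g. after a reversal. */
int32_t counter_diff(uint32_t to, uint32_t from)
{
    return (int32_t)(to - from);
}
```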
 
Fully understand you can't just change your architecture. I looked up the Bresenham algorithm. Are you using that to draw the "line" of the thread, i.e. to determine when to step the stepper motor?
Yes. It is executed in integer math, very fast.
It sounds like you can take a control action on any encoder edge.
Not quite. I use every state change as a count, to determine when to step the motor. So I get a resolution of 4096 counts in a single rotation of the spindle.
If you're seeing incorrect control actions, I would focus on logging the important inputs and outputs of the control logic, as opposed to trying to log program flow.
Knowing where I am does help if there's an unexpected change in program flow. I'm not expecting a change in flow, but stuff happens. If I don't look for it, how will I know it didn't happen? Then I can eliminate that from the massive list of possible sources of error.

Yes, I'm slowly getting there, generating a lot of logging as a result. The important inputs and outputs seem to be many, which makes this tougher. Perhaps it can all be simplified once I fully understand it. When you are creating new stuff, sometimes you overcomplicate things. Eventually you pare it back to the essence.
 
At the moment, I can't see how to integrate the Bresenham algorithm, which is an integral part of what I need to do. Just knowing the count isn't good enough; I need to generate stepper pulses at exactly the right time, every time. I've also had a working system for 3 years now, with thousands of hours of testing, so I don't want to jump to an approach I have zero experience with.

In my past experience, more than 50% of the errors in programming automotive radar sensors were related to unsigned integer processing. So yes, I agree, it's really easy to mess up. Having seen the mess it can create, in many subtle and not-so-subtle ways, I'm not likely to take it on unless I've exhausted other options. Basically it would be a rewrite of the majority of my control code, about 2000 LOC. If it were you, would you want to start over again?
 

Am I correct in guessing that you're using the Bresenham algorithm and making a decision on each encoder edge as to whether it's time to step the motor? I can definitely see the allure of this approach, and if that's all that you were doing, and the max interrupt rate was 166K/sec, it seems like that would be okay. I know you have a touch screen, and probably other I/O, so it's hard to say what you might need to investigate. My feeling is you should focus on logging the encoder pulses and control outputs. If the anomaly occurs and you do not see it in the log, then perhaps it's not the control, but rather some of the peripheral stuff.

I wouldn't be too wary of the unsigned math stuff for the encoder. I use it all the time, and I don't think it's ever been a problem.
 
Am I correct in guessing that you're using the Bresenham algorithm and making a decision on each encoder edge as to whether it's time to step the motor?
Yes. The decision is dependent on the settings made by the operator, different feed rates or threads change the timing.
I can definitely see the allure of this approach, and if that's all that you were doing, and the max interrupt rate was 166K/sec, it seems like that would be okay. I know you have a touch screen, and probably other I/O, so it's hard to say what you might need to investigate. My feeling is you should focus on logging the encoder pulses and control outputs. If the anomaly occurs and you do not see it in the log, then perhaps it's not the control, but rather some of the peripheral stuff.

I wouldn't be too wary of the unsigned math stuff for the encoder. I use it all the time, and I don't think it's ever been a problem.
I did experiments in the beginning to ensure the algorithm could run up to 5000 RPM. My lathe (more importantly, my lathe chuck) only goes to 2000 RPM, so I considered it as a 2.5x safety factor. But adding extra functionality means more processing burden.

The state machine is complex. (It probably has bugs, this is development, and that happens.) I have a strong feeling that there's insufficient bandwidth to log the encoder and all the outputs. (My few crashes occurred during such attempts. Got some insightful data however.) So I'm trying to log the state and outputs at the moment. It's half the story, but maybe that's all that will fit in the data pipe. (reality check) But the control happens (step pulse) during an ISR, and we know that it's hard to print during an ISR --> crash! Main loop isn't fast enough to set a flag and log that a single step happened, but maybe a step counter could be used and that might be printed. However, I wouldn't know how to interpret multiple steps were correct or not, without further analysis.

Here's an idea, not sure if it's silly or not. How about starting out by using the hardware quadrature encoder as a check on the SW encoder? Maybe I could somehow extend its count to 64 bits. Then, as part of the main loop, check that the HW and SW counts are in sync? This needs far less output bandwidth, and it still performs an important task: I only log if there's a loss of sync between HW and SW.
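A minimal sketch of that idea, assuming the hardware count is read as a 32-bit value that wraps (as a quadrature counter register would) and the software count is already 64-bit. The type and function names are hypothetical. The widening only works if loop() polls the hardware count before it can move by 2^31 counts, which at the speeds discussed in this thread is a matter of hours.

```c
#include <stdint.h>
#include <stdbool.h>

/* Widens a wrapping 32-bit counter (hypothetical: the hardware quadrature
   count) to 64 bits. Must be polled before the count moves by 2^31. */
typedef struct {
    uint32_t last_raw;
    int64_t  count64;
} wide_count_t;

void wide_init(wide_count_t *w, uint32_t raw, int64_t start)
{
    w->last_raw = raw;
    w->count64 = start;
}

int64_t wide_update(wide_count_t *w, uint32_t raw)
{
    /* Signed change since the last poll; relies on two's-complement
       conversion, which GCC defines as modulo 2^32. */
    int32_t step = (int32_t)(raw - w->last_raw);
    w->last_raw = raw;
    w->count64 += step;
    return w->count64;
}

/* Main-loop sync check: true while HW and SW agree within tol counts. */
bool counts_in_sync(int64_t hw64, int64_t sw64, int64_t tol)
{
    int64_t d = hw64 - sw64;
    return d >= -tol && d <= tol;
}
```

Because the change is taken as a signed difference, reversals and chatter across the 32-bit boundary are handled the same way as forward motion.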

As for unsigned math, how do you handle the rollovers when you are using inequalities? Especially if considering reversals or motor chatter near the rollovers? This is 1/2 of the Bresenham algorithm, out of my source code.

delta is the signed encoder increment, so I have to know whether it is positive or negative: both the inequalities and the math change with its sign. Shown is the case for delta > 0. I have two processes running: the physical Bresenham driving the stepper, and a virtual one. myacc and myaccV are the accumulators, D (or D', Dprime in the code) is the denominator, and N is the numerator, so N/D (or N/D') is the slope of the line.
```c
if (delta > 0)
{
  if (delta > 1) {
    // fault: more than one encoder count since the last edge
    Serial.printf("fault! delta = %lli\n", delta);
    errorcount += 1;
  }
  else {
    // delta == 1
    myacc  = myacc  + delta * N;   // physical Bresenham accumulator
    myaccV = myaccV + delta * N;   // virtual Bresenham accumulator
    if (myacc >= Dprime)  // works for positive accumulation only
    {
      myacc = myacc - Dprime;
      if (righthandthread) {
        mydir = CCW;
        if ((nfeed || sfeed || nthread || sthread) && stepperactive) doincrement(mydir); // stepper incremented one pulse
      }
      if (lefthandthread) {
        mydir = CW;
        if ((nfeed || sfeed || nthread || sthread) && stepperactive) doincrement(mydir);
      }
    }
    if (myaccV >= D)
    {
      myaccV = myaccV - D;
      if (righthandthread) virtualsteppercount += 1;
      if (lefthandthread)  virtualsteppercount -= 1;
    }
  }
}
```
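On the rollover question: the replies in this excerpt don't show how the respondent handles it, but one common idiom is to compare positions through a signed difference instead of a raw inequality, the same trick used for millis() timeouts. A sketch, assuming 32-bit unsigned positions that are never more than 2^31 counts apart when compared; the function names are hypothetical. The accumulators in the code above are already bounded, as long as N does not exceed the denominator.

```c
#include <stdint.h>
#include <stdbool.h>

/* Signed distance from b to a for 32-bit positions that may roll over.
   Valid while |a - b| < 2^31 counts. The unsigned subtraction wraps, and
   the cast is implementation-defined in C but modulo 2^32 on GCC. */
static inline int32_t pos_diff(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b);
}

/* Replacement for `pos >= target` when moving in the positive direction. */
bool reached_forward(uint32_t pos, uint32_t target)
{
    return pos_diff(pos, target) >= 0;
}

/* Replacement for `pos <= target` when moving in the negative direction. */
bool reached_reverse(uint32_t pos, uint32_t target)
{
    return pos_diff(pos, target) <= 0;
}
```

Chatter around the wrap point (0xFFFFFFFF, 0, 0xFFFFFFFF, ...) gives differences of +1 and -1, exactly as it would anywhere else on the count.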

Other tasks are updating a display which has context-sensitive buttons, doing touch sensing, and updating RPM and dual-axis position information. Outputs are modest: just 3 control lines to a stepper driver (Enable, Direction, and Pulse). An additional task in the main loop is checking whether any commands have arrived via serial; the single-character commands are for debugging info.

Normally the Teensy is stand alone, not connected to a PC, just the display and connected to the lathe.
 
Does your code write to USB (Serial) while the lathe is running? If the answer is yes, then I think the first thing you need to do is stop that.

In my testing of a system with a 10-kHz control loop, I had to completely avoid writing to USB Serial. My code was instrumented to measure and record the time between 10-kHz ISRs. With no use of USB Serial, the min/max time between execution of the ISR was very, very close to 100 us. With USB Serial in use, the 10-kHz ISR could be delayed by 10s of us due to the USB code disabling interrupts. If that occurred in your system, you would miss encoder edges.
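The min/max measurement described above can be sketched as follows. This is an assumed implementation, not the respondent's code: the timestamp is passed in so the logic can be checked off-target, and on a Teensy 4.x the natural source would be the ARM_DWT_CYCCNT cycle counter (600 MHz at the default clock, so a 100 us period is 60,000 cycles).

```c
#include <stdint.h>

/* Min/max time between successive ISR entries, in timestamp ticks. */
typedef struct {
    uint32_t last;
    uint32_t min_dt, max_dt;
    uint32_t samples;
} isr_jitter_t;

void jitter_reset(isr_jitter_t *j)
{
    j->last = 0;
    j->min_dt = UINT32_MAX;
    j->max_dt = 0;
    j->samples = 0;
}

/* Call first thing in the ISR with the current timestamp. */
void jitter_record(isr_jitter_t *j, uint32_t now)
{
    if (j->samples > 0) {
        uint32_t dt = now - j->last;   /* correct across counter wrap */
        if (dt < j->min_dt) j->min_dt = dt;
        if (dt > j->max_dt) j->max_dt = dt;
    }
    j->last = now;
    j->samples++;
}
```

Reading min_dt and max_dt occasionally from loop() shows whether anything is holding off the ISR; a max_dt well above the nominal period is the signature described above.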

Yes, you could certainly connect your encoder A/B signals to a pair of pins that are configured for QuadEncoder. You could read the QuadEncoder count value from your edge ISR and the two should remain in sync, unless your edge ISRs are being delayed by interrupts being disabled.
 
If I can't output to USB, then how do I log a "reasonable" amount of data? If I can only log at acoustic-coupler rates (an exaggeration, but still) and I'm running a system at 27 kHz, am I not running blind? How do I know when I went from one state to another? I can only monitor 2 pins with a scope... No fancy logic analyzer at home.

Logging is during the main loop, except for that one fault in the ISR. I've never hit that fault in Bresenham in 3 years. Then again, I'm not sure if it actually can happen, by the very construction of EncoderTools.

I'll see if I can cobble in QuadEncoder as a check on my SW encoder. At only 400 RPM, I think it would take 44 hours to roll over the 32-bit encoder count, so I won't run into it during short-term testing. I want to keep this as simple as possible for now. If I'm within one count, it probably means all is well. I just don't want to see slipping...
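The rollover estimate is easy to check. The thread never states the encoder resolution, so this sketch assumes 4096 counts per revolution, which is the value that reproduces both the roughly 27 kHz edge rate and the roughly 44-hour rollover at 400 RPM.

```c
/* Edge rate, edge period and 32-bit rollover time for an encoder.
   counts_per_rev = 4096 is an assumption inferred from the thread's numbers. */
double edge_rate_hz(double rpm, double counts_per_rev)
{
    return rpm * counts_per_rev / 60.0;
}

double edge_period_us(double rpm, double counts_per_rev)
{
    return 1e6 / edge_rate_hz(rpm, counts_per_rev);
}

double rollover_hours(double rpm, double counts_per_rev)
{
    /* 2^32 counts before a 32-bit count wraps */
    return 4294967296.0 / edge_rate_hz(rpm, counts_per_rev) / 3600.0;
}
```

Under that assumption, 400 RPM gives about 27,307 edges/s and a rollover after about 43.7 hours, and at the 2000 RPM chuck limit edges arrive roughly every 7.3 us.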
 
For T4.1-based controllers, since I can't use USB Serial while they are running, I log to RAM and read the data via hardware serial (UART). The main technique that I use for production systems is to log continuously to a ring buffer and then stop the updates when a fault occurs. After the system enters a "shutdown" state, I can then read the data via UART and examine what was happening before and after the fault.
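A minimal version of the "log continuously, stop on fault" ring buffer might look like this. The record layout, sizes and names are assumptions for illustration; it also keeps a configurable number of records after the trigger, so the log covers both sides of the fault. If both an ISR and loop() write to it, loop()'s calls would need to be wrapped in noInterrupts()/interrupts().

```c
#include <stdint.h>
#include <stdbool.h>

#define LOG_LEN 1024u   /* records kept (assumed size: 8 KB of RAM) */

typedef struct {
    uint32_t t;       /* timestamp, e.g. micros() or a cycle count */
    uint16_t state;   /* hypothetical: state-machine state */
    uint16_t event;   /* hypothetical: event or output code */
} log_rec_t;

typedef struct {
    log_rec_t rec[LOG_LEN];
    uint32_t  head;       /* slot the next record goes into */
    uint32_t  count;      /* valid records, saturates at LOG_LEN */
    int32_t   post_left;  /* records still to take after a fault; -1 = none yet */
    bool      frozen;     /* true once logging has stopped */
} fault_log_t;

void log_init(fault_log_t *L)
{
    L->head = 0; L->count = 0; L->post_left = -1; L->frozen = false;
}

/* Overwrites the oldest record until the log is frozen. */
void log_put(fault_log_t *L, uint32_t t, uint16_t state, uint16_t event)
{
    if (L->frozen) return;
    L->rec[L->head] = (log_rec_t){ t, state, event };
    L->head = (L->head + 1u) % LOG_LEN;
    if (L->count < LOG_LEN) L->count++;
    if (L->post_left > 0 && --L->post_left == 0) L->frozen = true;
}

/* On a detected fault: keep `post` more records, then stop.
   Only the first fault is honored. */
void log_trigger(fault_log_t *L, int32_t post)
{
    if (L->frozen || L->post_left >= 0) return;
    if (post <= 0) L->frozen = true;
    else L->post_left = post;
}

uint32_t log_count(const fault_log_t *L) { return L->count; }

/* i = 0 is the oldest surviving record. */
const log_rec_t *log_get(const fault_log_t *L, uint32_t i)
{
    uint32_t first = (L->head + LOG_LEN - L->count) % LOG_LEN;
    return &L->rec[(first + i) % LOG_LEN];
}
```

After shutdown, loop() can walk log_get(&L, 0) through log_get(&L, log_count(&L) - 1) and print each record over the UART at whatever speed it likes.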

For data acquisition systems, where I want to log large amounts of data, I write to SD during operation, then transition to a "file transfer" state where I can retrieve the files via MTP (USB). The SD approach is the one that better fits the way you are trying to debug. SdFat has a built-in RingBuf template class that makes it easy to log at high frequency as long as the total data rate stays within bounds. I've done some benchmarking that shows it's possible to log to SD continuously at 1 MB/sec using less than 1% of the CPU, with no disabling of interrupts and never blocking more than about 5 us. That's pretty good, and I've never had a data rate near 1 MB/s in a working system. It's always much less than that.

The diagram below shows the basic idea of writing to the RingBuf from the ISR and then writing the data to SD in loop(). There are two rules you must follow to avoid blocking in the calls to SD's write() function: (1) ALWAYS write to SD in chunks of 512 bytes, and (2) ONLY write to SD when file.isBusy() returns false. RingBuf must be large enough to allow compliance with these two rules, and my rule of thumb is that the buffer needs to be large enough for 50 ms of data. For a 1 MB/s data rate, that means you need a 50 KB buffer.

[ISR] ----> [RingBuf]----->[loop]----->[SD]
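The two rules can be illustrated without SdFat. The sketch below is not SdFat's RingBuf (use that in practice); it is a plain C stand-in under stated assumptions: a byte ring sized for 50 ms at 1 MB/s, a producer that drops and counts data rather than blocking when full, and a consumer that releases data only in whole 512-byte chunks and only when the caller says the card is not busy (with SdFat that flag would come from file.isBusy()).

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed figures: 1 MB/s data rate, 50 ms of buffering, 512-byte chunks. */
#define CHUNK     512u
#define RATE_BPS  1000000u
#define RB_SIZE   (RATE_BPS / 20u)   /* 50 ms at 1 MB/s = 50,000 bytes */

typedef struct {
    uint8_t  buf[RB_SIZE];
    size_t   head;      /* next write position (producer only) */
    size_t   tail;      /* next read position (consumer only) */
    size_t   used;      /* bytes waiting; shared by both sides */
    uint32_t dropped;   /* bytes discarded because the buffer was full */
} byte_ring_t;

/* Producer (ISR): copy a record in, never block; count it if it won't fit. */
bool ring_put(byte_ring_t *r, const void *data, size_t n)
{
    const uint8_t *p = (const uint8_t *)data;
    if (RB_SIZE - r->used < n) {
        r->dropped += (uint32_t)n;
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        r->buf[r->head] = p[i];
        r->head = (r->head + 1u) % RB_SIZE;
    }
    r->used += n;
    return true;
}

/* Consumer (loop): move out exactly one 512-byte chunk, but only if a full
   chunk is waiting and the card is not busy. Returns CHUNK if `out` was
   filled, 0 otherwise. On hardware, `used` is also written by the ISR, so
   the decrement below must be done with interrupts disabled. */
size_t ring_take_chunk(byte_ring_t *r, bool card_busy, uint8_t out[CHUNK])
{
    if (card_busy || r->used < CHUNK) return 0;
    for (size_t i = 0; i < CHUNK; i++) {
        out[i] = r->buf[r->tail];
        r->tail = (r->tail + 1u) % RB_SIZE;
    }
    r->used -= CHUNK;
    return CHUNK;
}
```

Anything short of a full chunk simply waits in the ring for the next pass through loop(), which is what keeps every card write at exactly 512 bytes.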

The easiest change for you to make would be to write to UART instead of USB Serial, but instead of writing on every interrupt, write only when you detect a problem.
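One way to "write only when you detect a problem" is to have the ISR latch a snapshot and leave all formatting and output to loop(). A sketch with hypothetical field names; the report is formatted into a buffer here so it can be checked anywhere, and on the Teensy the string would be written to a hardware UART such as Serial1.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Snapshot latched by the ISR on the first detected problem.
   Field names are hypothetical; record whatever the ISR has at hand. */
typedef struct {
    volatile bool     pending;
    volatile uint32_t t, sw_count, hw_count;
    volatile int32_t  delta;
} fault_snap_t;

/* ISR side: cheap, never prints. While a report is pending the ISR leaves
   the fields alone, so loop() can read them without a race. */
void fault_latch(fault_snap_t *f, uint32_t t, uint32_t sw, uint32_t hw,
                 int32_t delta)
{
    if (f->pending) return;
    f->t = t;
    f->sw_count = sw;
    f->hw_count = hw;
    f->delta = delta;
    f->pending = true;   /* set last, after the data is in place */
}

/* loop() side: format the report and re-arm. Returns its length, or 0. */
int fault_report(fault_snap_t *f, char *out, size_t len)
{
    if (!f->pending) return 0;
    int n = snprintf(out, len, "FAULT t=%lu sw=%lu hw=%lu delta=%ld\n",
                     (unsigned long)f->t, (unsigned long)f->sw_count,
                     (unsigned long)f->hw_count, (long)f->delta);
    f->pending = false;
    return n;
}
```

The ISR cost is a handful of stores, and the UART only sees traffic when something has actually gone wrong.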
 