FlexCAN - Infinite / Endless TX Retries

Status
Not open for further replies.

CANHelp

Member
Chipset : Teensy 3.2 / NXP MK20
CANBUS Transceiver : SN65HVD230D

I'm having an issue where if my 3.2 becomes isolated on the bus, and stops receiving a dominant ACK bit from other devices, it floods the bus with infinite retries. Is there a way to tell FlexCAN that I DO NOT want it to retry more than once? (per specified time period preferably) Having the bus get loaded up over 90% utilization on retries just isn't acceptable.
 
It doesn't transmit unless it gets an ACK, so obviously something is ACKing it. You are responsible for not sending if the bus is inactive. If you turn off a UART device, do you think your runaway code will stop? :)
You can use a volatile millis() counter like I do: once the bus goes idle, I don't transmit anymore.

If nothing was actually ACKing, mailboxStatus() would show all TX mailboxes full and not sending. This would result in a bus-off state until the bus returns to normal; then bus-off is cleared in hardware automatically and transmits resume.
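
A minimal sketch of the "stop transmitting once the bus goes idle" idea, written as plain C++ so the logic can be checked off-target. `BusIdleGate`, `noteRx` and `mayTransmit` are hypothetical names and not part of FlexCAN; on a Teensy the timestamps would presumably come from millis(), and `noteRx` would be called from the receive path.

```cpp
#include <cstdint>

// Hypothetical helper, not part of any FlexCAN library.
// Allows transmits only while traffic from other nodes was seen recently.
struct BusIdleGate {
    uint32_t idleTimeoutMs;    // silence longer than this counts as "bus idle"
    uint32_t lastRxMs = 0;     // timestamp of the most recent received frame
    bool seenRx = false;       // false until the first frame is received

    explicit BusIdleGate(uint32_t timeoutMs) : idleTimeoutMs(timeoutMs) {}

    // Call whenever a frame from another node is received.
    void noteRx(uint32_t nowMs) {
        lastRxMs = nowMs;
        seenRx = true;
    }

    // True if the application may queue a new transmit at time nowMs.
    // Unsigned subtraction keeps the comparison valid across counter rollover.
    bool mayTransmit(uint32_t nowMs) const {
        return seenRx && (nowMs - lastRxMs) < idleTimeoutMs;
    }
};
```

Note the trade-off: a gate like this never transmits on a silent bus, so it cannot be used as-is by a node that has to send the first frame.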
 

That's not accurate. I deal with CAN devices on a mostly daily basis; in this case I wanted something cheap and off the shelf to use as an additional node. You can very easily have other powertrain or power control devices swapping between listen-only (passive) and active on the bus. These devices have built-in protection to keep them out of babbling idiot mode. However, I have tried 6 different forks of the FlexCAN library, and each of the ones that function on a 3.2 exhibits the same behavior: if the other devices on the bus have swapped to passive, then when FlexCAN on the 3.2 loses the ACK signal it instantly floods the bus at a rate of >100 messages per ms. It's pretty easy to see on data logs, and there are lots of posts about this issue, but I haven't seen a solution. For example, if I tell the 3.2 that after a CAN transmit it should disable the output if it sees an ACK error, literally on the next line, I still see >100 messages on the bus at a rate >1000 Hz. There must be some sort of inherent protection to prevent this, but I seem to be missing it (without having to change chips), which is why I posted. Not to have someone send a snarky response.

To further elaborate, on the NXP forum their tech support says

One of the features of the CAN protocol is that will keep sending the package until one node answers.
 
I didn't send a snarky remark? You are welcome to do a pull request if you have a solution; the library is always evolving and is maintained as much as possible.
 

I don't see it either tonton81 just a " ? :) " : when a Teensy goes nuts ... turning off the SerMon doesn't fix it or stop it sending.
> not using CAN since early Beta function tests: the question/assumption was as I read that note :: something is somehow flood pinging the T_3.2 causing it to ACK at a high rate? :)

BTW: @tonton81 - keep up the good work
 
We can make a solution, we just need ideas. If we detect a bus-off, it doesn't necessarily mean the network is dead; we would need to check whether the traffic stops and, if so, clear the queue while clearing transmits. However, I am not sure people would like their transmits cleared, depending on their project. A bus-off can't be detected without actually transmitting, and if we force it to stop, some projects with 2 or 3 nodes will stop communicating when one resets: the others went into bus-off and refuse to send, so nobody can talk anymore unless those nodes are rebooted until they see each other's ACKs. There are pros and cons to every situation. I just don't want to bloat the ISR with stuff that can be managed by the user in separate functions.

@CANHelp , does it flood the bus if you completely disconnect it from the bus (check with a scope)? or does it flood with the nodes attached?
 
Apologies. I don't like going to the web for help, and when the first response was someone telling me I'm wrong and to tell it "just not to transmit", after I'd been testing for a few days, it seemed insulting. Sorry if I took it the wrong way.

Yes, it'll still show on a scope, even if you disconnect the physical wires from the bus. Steps to replicate would be:
1. Standard bus initialization (bitrate, set ID, Begin).
2. Transmit from the Teensy.
3. Have one or more other devices on the bus that are active and providing a valid ACK.
4. Have the other device(s) on the network shift into passive mode for an unspecified amount of time.
5. The Teensy floods the bus with infinite TXs looking for an ACK.

In an ideal world, as soon as the Teensy sees an ACK error it would cease TX'ing, or at least reduce to a user specified time interval (as right now in less than 1ms it'll transmit more than 100 messages).
 
Correct me if I'm wrong, but if the Teensy is transmitting, won't even passive nodes on the bus respond with the ACK even though they're not streaming? I ask because on the BCAN bus on my Civic, when all modules switch to passive (traffic stopped?), I can still talk on the bus and even wake the modules up to broadcast again. I'll check the bus-off issue if I have time this week; in the meantime, ideas on handling your concern would probably be worked on. Just be aware that if we throttle or stop transmits on bus-off, in cases like my project I won't be able to wake BCAN up when I need to, so every precaution must be taken to not affect everyone who uses the library.
 
The software end doesn't do the retransmits; the hardware does them on its own. So we need to tackle that issue, like you said, by throttling or not sending on an idle bus, but at the same time not affect current users like me who do use transmit to wake up the body CAN network of the vehicle.
 
I appreciate you looking into it. I have a test bench built out with an aftermarket control unit and a diagnostic unit, and both had switchable modes while I was testing this. Some of this may be terminology, but both devices that I'm working with do not send an ACK when in their passive modes, which then causes the Teensy to flood the bus. The ability to throttle TX retransmissions shouldn't harm your ability to wake up the BCAN on a Honda, especially if it could be set to a constant interval; even at 10 Hz you wouldn't see a perceivable difference. However, if possible, it would certainly be useful for everyone to be able to specify the number of retries and the length of the retry period.
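
One way to picture "number of retries per retry period" is a small attempt budget, sketched below in plain C++. `RetryBudget` and `tryAcquire` are hypothetical names, not FlexCAN functions. Because the controller itself performs the retransmissions, a budget like this would only limit how often the application re-arms a frame; it assumes the pending mailbox is aborted between attempts.

```cpp
#include <cstdint>

// Hypothetical policy object, not part of any FlexCAN library.
// Permits at most maxAttempts transmit attempts per window of windowMs.
struct RetryBudget {
    uint32_t maxAttempts;        // attempts allowed per window
    uint32_t windowMs;           // length of one window in milliseconds
    uint32_t windowStartMs = 0;  // start time of the current window
    uint32_t used = 0;           // attempts consumed in the current window

    RetryBudget(uint32_t attempts, uint32_t window)
        : maxAttempts(attempts), windowMs(window) {}

    // Returns true if one more attempt may be made at time nowMs.
    bool tryAcquire(uint32_t nowMs) {
        if (nowMs - windowStartMs >= windowMs) {  // window expired: start a new one
            windowStartMs = nowMs;
            used = 0;
        }
        if (used >= maxAttempts) {
            return false;                         // budget exhausted, caller should wait
        }
        ++used;
        return true;
    }
};
```

With `RetryBudget budget(1, 100);` the application would get one attempt every 100 ms, i.e. the 10 Hz figure mentioned above.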
 
There are also cases where there may be several Teensy nodes on a network that require ISO-TP communication before they respond (I talk to an ESP32 using ISO-TP over a 3-node bus). Introducing a throttle for that situation will slow down communication, since the bus is silent until the full data is given, so throttling frames there hinders the throughput.

On another note, we can't actually throttle TX; we can only stop it, which drops the send, and then we can stop the queue. Since the hardware handles the retransmissions, we will lose the current frame when we stop it from sending. We need to account for everything :)

could also dump the mailbox data back into the queue before stopping it as an alternative, so it can get resent later
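
A rough sketch of that "put it back in the queue" alternative, using a standard-library deque as a stand-in for the library's software TX queue. `Frame` and `requeueAborted` are hypothetical, and the sketch assumes the aborted frames are handed over in their original send order.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical frame type for illustration only.
struct Frame {
    uint32_t id;
    uint8_t len;
    uint8_t data[8];
};

// Put frames recovered from aborted TX mailboxes back at the head of the
// software queue so they go out first, in their original order, once
// transmission is re-enabled.
inline void requeueAborted(std::deque<Frame>& txQueue,
                           const std::vector<Frame>& aborted) {
    // Walk backwards so the first aborted frame ends up at the very front.
    for (auto it = aborted.rbegin(); it != aborted.rend(); ++it) {
        txQueue.push_front(*it);
    }
}
```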
 
Seems this is known behaviour of flexcan:

One of the features of the CAN protocol is that will keep sending the package until one node answers. If during system start-up, only one node is operating, then its TXERRCNT increases in each message it is trying to transmit, as a result of acknowledge errors (indicated by the ACKERR bit in the Error and Status Register).

If you broadcast a message and there are no other nodes on the bus YOUR CAN module will send the message forever. You're CAN module will start out in error active mode and NAK your message until you hit an error level and then become error passive (message is still sending). The reason is due to the fact that no one has acknowledged your message.

Endless sending is a feature haha.......... anyways let's work on it :)
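
The quoted behaviour can be modelled in a few lines. This is a simplified sketch of the CAN fault-confinement rules as I understand them, not of the FlexCAN hardware: the transmit error counter rises by 8 per unacknowledged frame, the node becomes error passive at 128, and from then on an ACK error no longer raises the counter, so a lone transmitter would sit at error passive and keep retrying rather than reach bus-off at 256. `LoneTransmitter` and `onAckError` are made-up names.

```cpp
#include <cstdint>

enum class ErrState { Active, Passive, BusOff };

// Simplified model of a node transmitting with nobody else on the bus.
struct LoneTransmitter {
    uint16_t tec = 0;  // transmit error counter

    ErrState state() const {
        if (tec >= 256) return ErrState::BusOff;
        if (tec >= 128) return ErrState::Passive;
        return ErrState::Active;
    }

    // One transmit attempt that nobody acknowledged.
    void onAckError() {
        if (state() == ErrState::Active) {
            tec += 8;  // error-active transmitter: counter goes up by 8
        }
        // Error-passive transmitter seeing only an ACK error: counter is
        // left unchanged in this model, so bus-off is never reached.
    }
};
```

In this model 16 unacknowledged attempts are enough to reach error passive, after which the counter stays at 128 no matter how many more attempts follow.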
 
Can you do me a favor and check this, search the tpp for:
Code:
FLEXCANb_ESR1(_bus) |= FLEXCANb_ESR1(_bus);
it's at the end of the ISR

I need you to comment it out and add this:
Code:
  //FLEXCANb_ESR1(_bus) |= FLEXCANb_ESR1(_bus);
  uint32_t esr = FLEXCANb_ESR1(_bus);
  if ( esr & FLEXCAN_ESR_ACK_ERR ) {
    for (uint8_t i = mailboxOffset(); i < FLEXCANb_MAXMB_SIZE(_bus); i++) {
      if ( (FLEXCAN_get_code(FLEXCANb_MBn_CS(_bus, i)) == FLEXCAN_MB_CODE_TX_ONCE) ) { // if TX mailbox is sending...
        FLEXCANb_MBn_CS(_bus, i) = FLEXCAN_MB_CS_CODE(FLEXCAN_MB_CODE_TX_INACTIVE);
      }
    }
  }
  FLEXCANb_ESR1(_bus) |= esr;

This resets transmit boxes if ACK_ERR is registered in the ESR

in my test with Teensy to ESP32, if I pull the ESP32 power the flexcan controller gets the ACK_ERR set, the transmits cease, even if you keep sending via your loop, they will be cancelled. The moment i plug the ESP32 back in, the sends start from that point :)
 
I gave that a try, unfortunately it still floods the BUS. In 10 seconds it recorded just under 48,000 transmits once the ACK error was generated. (I gave it 10 seconds then killed the process). I'm scratching my head trying to figure out if there is something more relevant diagnostic I can provide you with.
 
But it clears the TX mailboxes. Are you sure it's not constantly sending in the loop? Or are the sends timed? When the loop sends a message, ACK_ERR happens and the TX is stopped, but if the loop is resending, that may influence the counts. I might work on an alternate method by switching to LOM (listen-only mode) on ACK_ERR; however, I would need to figure out how to exit LOM when done, otherwise transmits for wakeups will have issues.
 

With your tpp modification:

If Node A is active and sending ACK on Teensy boot, the teensy will function as expected on CAN TX.

If Node A is active and able to send ACK on teensy boot and CAN TX, but Node A is turned off (or swapped to passive) while the teensy is sending valid CAN TX, the teensy will NOT send infinite TX's UNTIL the next CAN TX retry. If the Teensy retries a CAN TX on a time interval, or activated by a trigger (serial input, digi pin pulled, user's choice), and there is no device to send a valid ACK, the teensy will send infinite TX retries.

If Node A is passive and not sending ACK's when the teensy boots, upon first CAN TX, it will send infinite retries.
 
Can you try this one?

Code:
  //FLEXCANb_ESR1(_bus) |= FLEXCANb_ESR1(_bus);
  uint32_t esr = FLEXCANb_ESR1(_bus);

  if ( esr & FLEXCAN_ESR_ACK_ERR ) {
    FLEXCAN_EnterFreezeMode();
    FLEXCANb_CTRL1(_bus) |= FLEXCAN_CTRL_LOM; /* listen only mode */
    FLEXCAN_ExitFreezeMode();
  }

  if ( FLEXCANb_CTRL1(_bus) & FLEXCAN_CTRL_LOM ) {
    if ( esr & FLEXCAN_ESR_BIT0_ERR ) {
      FLEXCAN_EnterFreezeMode();
      FLEXCANb_CTRL1(_bus) &= ~FLEXCAN_CTRL_LOM; /* normal mode */
      FLEXCAN_ExitFreezeMode();
    }
  }
  FLEXCANb_ESR1(_bus) = esr;

During this test, in listen-only mode with no frames on the bus (ESP32 disconnected), the ISR never fires... as soon as I plug the ESP32 in (ESP32 is set to not send anything), the ISR fires up and deasserts the listen-only mode
 
Very similar functionality to the previous test on my end. I've only edited the wording for the second section. In the previous version you could return the teensy to sending valid TX's in scenario 2.

1. If Node A is active and sending ACK on Teensy boot, the teensy will function as expected on CAN TX.

2. If Node A is active and able to send ACK on teensy boot and CAN TX, but Node A is turned off (or swapped to passive) while the teensy is sending valid CAN TX, the teensy will NOT send infinite TX's UNTIL the next CAN TX retry. If the Teensy retries a CAN TX on a time interval, or activated by a trigger (serial input, digi pin pulled, user's choice), and there is no device to send a valid ACK, the teensy will send infinite TX retries. **Upon returning Node A to being active on the BUS the infinite TX's cease, but the teensy will not send a new CAN TX when prompted.

3. If Node A is passive and not sending ACK's when the teensy boots, upon first CAN TX, it will send infinite retries.
 
Okay, well, the infinite retries when Node A is passive at Teensy boot are probably because the interrupt never fires unless network activity actually triggers the ISR, so the handling code we added there is never processed. Go to your setBaudRate call and use the overload to turn listen-only mode on in setup, then verify?

setBaudRate(1000000, LISTEN_ONLY);
 
Sorry I'm not sure I understand what you mean, if I set the device to be listen only in setup, it'll never attempt to TX and never get into an infinite TX situation.
 
It does not, no. Since we've set listen only mode in setup, the send never occurs (even when told to progress through a Send / CAN TX), so it has no idea that there are other functioning devices on the bus, so it can't remove the LOM.
 
can you print out the Serial.println(esr) value in the ISR right before it's cleared and post me the value you see?

make sure LISTEN_ONLY is set in setBaudRate,

1) value when node A is passive and teensy boots,
2) value when node A is active and teensy boots
 