Forwarding with FlexCAN_T4 and losing messages

Status
Not open for further replies.

torriem

Debated whether to start a new topic or post to the already long FlexCAN_T4 thread. Apologies if that would have been preferred.

I've been working on a project to bidirectionally forward J1939 CAN bus messages between two CAN ports. Partly this is for analysis, so I can see which direction a message is traveling, but I am also modifying some messages as they traverse my bridge. Both interfaces run at 250 kbps, and the bus has a fair amount of traffic on it (around 150 messages per second). I've had great success on an Arduino Due using interrupt-driven message processing and the due_can library. I am hoping to accomplish the same thing with the Teensy 4; since it has several times more processing power, I also hope to drive a status display and do other UI work while processing the messages.

I don't think I can use CAN filters, because any and all messages have to be forwarded. I started from first principles with the BiDirectionalForward example. However, it doesn't seem to be fast enough to forward all the messages. I can tell because some of the messages on the bus drive an OEM LCD data display, and when messages are lost the display is partly corrupted: things aren't drawn in the right place and some text is missing. There's some kind of drawing protocol being used. I encountered this issue with the Due but solved it by moving to interrupt-driven message processing. On the Teensy I switched to the CAN2.0_example_FIFO_with_interrupts example code, modified to forward messages in both directions. However, the same thing still happens (corruption on the OEM data display), and the only explanation I can see is that messages are getting lost.

Here's the code I'm testing with. There is a simple print for each message, which I know would be too slow for a UART, but the Teensy 4's USB serial device is very fast, so that shouldn't be a problem (it isn't on the Due's native USB port either).
Code:
#include <FlexCAN_T4.h>

FlexCAN_T4<CAN1, RX_SIZE_256, TX_SIZE_16> Can0;
FlexCAN_T4<CAN2, RX_SIZE_256, TX_SIZE_16> Can1;

void can0_got_frame_teensy(const CAN_message_t &frame) {
	Serial.println("0->1");
	Can1.write(frame);
}

void can1_got_frame_teensy(const CAN_message_t &frame) {
	Serial.println("1->0");
	Can0.write(frame);
}


void setup()
{
	delay(5000);
	Serial.begin(115200);
	Serial.println("j1939 CAN bridge.");

	//Teensy FlexCAN_T4 setup
	Can0.begin();
	Can0.setBaudRate(250000);
	Can0.setMaxMB(16);
	Can0.enableFIFO();
	Can0.enableFIFOInterrupt();
	Can0.onReceive(can0_got_frame_teensy);
	//Can0.enableMBInterrupts(FIFO);
	//Can0.enableMBInterrupts();
	Can0.mailboxStatus();

	Can1.begin();
	Can1.setBaudRate(250000);
	Can1.setMaxMB(16);
	Can1.enableFIFO();
	Can1.enableFIFOInterrupt();
	Can1.onReceive(can1_got_frame_teensy);
	//Can1.enableMBInterrupts(FIFO);
	//Can1.enableMBInterrupts();
	Can1.mailboxStatus();
}

void loop() 
{
	//process collected frames
	Can0.events();
	Can1.events();
}

The bus traffic is around 150 extended messages per second. The Due had no problem with it using interrupt-driven processing, even when I parsed each message and printed data out to the native USB serial. No OEM display issues whatsoever. What can I do to make the Teensy 4 work? Is this just a limitation of the Teensy's CAN hardware? Is it a matter of the transceiver (I'm using the SKPang board's transceivers)? I suspect it's something simple and my ignorance of how CAN FIFOs and mailboxes work is causing me to miss something. Any clues? I've read through the README a few times and looked at the examples, and I've read the long FlexCAN_T4 thread too.
 
I've been thinking about it, and maybe the issue is more related to timing than to lost packets. I noticed the special-purpose ext_output#() weak externs in the code (and looked at how they are used in TeensyCAN). I'm going to try doing the message forwarding in ext_output1(), which runs in the interrupt handler, since that's what I was doing on the Due. I realize doing too much in an interrupt handler is a bad idea, so I'll have to be careful with it. For now I just want to see whether I can get messages forwarding without the issues I'm currently seeing. I'll give the ext_output#() technique a try tomorrow.
 
I tried using ext_output1() and it is working, but the weird timing/data corruption issues persist. Maybe they're harmless; they only seem to affect this OEM display, not the underlying functions of the machine. But I'm unsure, and I don't know enough about FlexCAN_T4 to know for sure.
 
Are you having issues transmitting or receiving the lost frames? You can play with the queue sizes in the constructor to find out.
Try using setClock() to configure the controller to use 60MHz peripheral clock (as opposed to default 24MHz oscillator clock).

Remove the events() calls in loop(); events() is what processes the queued messages (RX queues). Without events(), the interrupt handler fires your callback directly instead of depending on queues.

FlexCAN_T4 supports four ways of reading:
1) polling via read(msg).
2) queue-based interrupt, with the callback fired from events() in loop().
3) direct callback firing (no reception queues used; remove events() from loop() to activate).
4) background interrupt handler, used by other libraries to read the bus data.
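To illustrate the difference between modes 2 and 3, here is a host-side C++ model of a receive interrupt that either queues frames for events() or fires the callback on the spot. It is plain C++, not FlexCAN_T4 code; RxDispatcher and Frame are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a received CAN frame (only what the model needs).
struct Frame {
  uint32_t id;
  uint8_t bus;
};

// Host-side model of queued vs. direct callback dispatch. Not FlexCAN_T4 code.
class RxDispatcher {
 public:
  explicit RxDispatcher(bool useEventsQueue) : queued_(useEventsQueue) {}

  void onReceive(std::function<void(const Frame&)> cb) {
    assert(cb);                // a callback must actually be supplied
    cb_ = std::move(cb);
  }

  // Simulated receive interrupt.
  void isr(const Frame& f) {
    if (queued_) {
      rxQueue_.push_back(f);   // mode 2: held until loop() calls events()
    } else if (cb_) {
      cb_(f);                  // mode 3: callback runs inside the interrupt
    }
  }

  // Called from loop(): drains queued frames in arrival order.
  void events() {
    while (!rxQueue_.empty()) {
      Frame f = rxQueue_.front();
      rxQueue_.pop_front();
      if (cb_) cb_(f);
    }
  }

  std::size_t pending() const { return rxQueue_.size(); }

 private:
  bool queued_;
  std::deque<Frame> rxQueue_;
  std::function<void(const Frame&)> cb_;
};
```

In the queued style a frame waits in the RX queue until loop() next calls events(), so a busy loop() adds latency. In the direct style the callback runs inside the interrupt, so it should stay short.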
 
Great thanks.

I do not know if I'm having issues transmitting or receiving.

I'll give the clock changes a try. So simply not calling events() will let it call my callback directly from the handler? I've been playing with using ext_output1() which is being called directly and that is functioning but the OEM display isn't quite happy with the data. But I still have events() calls in the main loop. I'll eliminate those.
 
The background handler runs in parallel with the interrupt callback, with or without events() (events() is only for the sketch callback, which uses the queues).

So just removing events() will make your own callback fire directly without using queues.

Changing the clock allows different rates to work. For example, at 24 MHz I was having issues connecting at 125 kbps, but when I switched to a different clock (60 MHz), 125 kbps worked perfectly.
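One way to see why the clock source matters is to count how many (prescaler, time quanta per bit) combinations hit a baud rate exactly. This sketch assumes a FlexCAN-style bit time of 8 to 25 time quanta and a prescaler of 1 to 256. It is plain host-side arithmetic, not the library's own timing search.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// (prescaler, time quanta per bit) pairs that give `baud` exactly from `clockHz`.
// Assumes a bit time of 8..25 time quanta and a prescaler of 1..256.
std::vector<std::pair<uint32_t, uint32_t>> exactTimings(uint32_t clockHz, uint32_t baud) {
  assert(baud > 0);
  std::vector<std::pair<uint32_t, uint32_t>> out;
  for (uint32_t tq = 8; tq <= 25; ++tq) {
    // Time-quantum rate (quanta per second) needed for this bit length.
    uint64_t quantaPerSecond = static_cast<uint64_t>(tq) * baud;
    if (clockHz % quantaPerSecond != 0) continue;  // no integer prescaler fits
    uint64_t prescaler = clockHz / quantaPerSecond;
    if (prescaler >= 1 && prescaler <= 256) {
      out.emplace_back(static_cast<uint32_t>(prescaler), tq);
    }
  }
  return out;
}
```

Under these assumptions, 24 MHz gives 4 exact combinations for 250 kbps while 60 MHz gives 7, so the faster clock leaves more room to choose a sample point. This only counts options; it does not by itself explain the 125 kbps problem above.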
 
I played around with different clock speeds and queue sizes in the constructor. Nothing helped, unfortunately. Just to make sure I wasn't imagining things, I went back to the Due and everything works great there with no losses whatsoever. I can even do a fair amount of work on the message frames in my message handler and it still manages to keep up and the OEM system is quite happy. So I'm stumped!

I'm reasonably certain the Teensy version is losing frames, not just a timing issue. I have an idea to use the Due to sniff on either side of the teensy to compare the traffic.
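If you capture traffic on both sides of the bridge, a comparison along these lines separates lost frames from reordered ones. It assumes each capture is reduced to (ID, payload) pairs with the 8 data bytes packed into a uint64_t; Captured and compareCaptures are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

// One captured frame: CAN ID plus the 8 data bytes packed into a uint64_t.
struct Captured {
  uint32_t id;
  uint64_t payload;
  bool operator<(const Captured& o) const {
    return id != o.id ? id < o.id : payload < o.payload;
  }
};

struct CompareResult {
  std::size_t missing = 0;     // seen upstream, never seen downstream
  std::size_t unexpected = 0;  // seen downstream, no matching upstream frame
  std::size_t outOfOrder = 0;  // arrived downstream ahead of an earlier upstream frame
};

CompareResult compareCaptures(const std::vector<Captured>& upstream,
                              const std::vector<Captured>& downstream) {
  // Upstream positions of each distinct frame, oldest first.
  std::map<Captured, std::deque<std::size_t>> positions;
  for (std::size_t i = 0; i < upstream.size(); ++i) positions[upstream[i]].push_back(i);

  CompareResult r;
  std::size_t matched = 0;
  std::size_t maxIndex = 0;  // latest upstream position forwarded so far
  bool anyMatched = false;
  for (const Captured& c : downstream) {
    auto it = positions.find(c);
    if (it == positions.end() || it->second.empty()) {
      ++r.unexpected;
      continue;
    }
    std::size_t idx = it->second.front();
    it->second.pop_front();
    ++matched;
    if (anyMatched && idx < maxIndex) {
      ++r.outOfOrder;  // went backwards relative to upstream order
    } else {
      maxIndex = idx;
      anyMatched = true;
    }
  }
  assert(matched <= upstream.size());
  r.missing = upstream.size() - matched;
  return r;
}
```

Timestamps are ignored, and identical repeated frames are matched oldest-first, so a swap between two byte-identical frames goes unnoticed. Frames the bridge deliberately modifies will show up as both missing and unexpected.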
 
Can you try a different transceiver? I've blasted 1 KB arrays with TeensyCAN with CRC validation; if frames were missing, it would never reconstruct the array at the other node, pass the CRC, and fire a callback.
 
Is it possible the symptoms could be caused by frames being forwarded out of order, rather than lost? Does the display's protocol split data across frames and then reconstruct it? Does it have an incrementing counter for keeping track of order?
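Out of curiosity, in your interrupt function, can you try setting the received frame to sequential (seq=1) before sending? The callback receives a const reference, so copy it first, e.g. CAN_message_t out = frame; out.seq = 1; Can1.write(out);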
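On the counter question: if the display protocol did carry a rolling counter, each step could be classified as below. The 4-bit width and the half-range rule are assumptions for illustration, not anything known about this OEM protocol; classifyStep is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

enum class CounterEvent { InOrder, Duplicate, Gap, Backwards };

// Classify one step of a hypothetical rolling counter that wraps at `modulus`
// (16 for a 4-bit counter). Forward jumps up to half the range count as lost
// frames; anything further is treated as the counter going backwards.
CounterEvent classifyStep(uint32_t prev, uint32_t next, uint32_t modulus) {
  assert(modulus >= 2 && prev < modulus && next < modulus);
  uint32_t delta = (next + modulus - prev) % modulus;  // forward distance, wrapping
  if (delta == 1) return CounterEvent::InOrder;
  if (delta == 0) return CounterEvent::Duplicate;
  if (delta <= modulus / 2) return CounterEvent::Gap;  // delta - 1 frames missing
  return CounterEvent::Backwards;
}
```

With a modulus of 16, 15 followed by 0 is in order, 3 followed by 5 means one frame was lost, and 5 followed by 4 means the counter went backwards.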
 
I agree. Reception through the FIFO is ordered (6 deep); however, on transmit, any of the 8 remaining TX mailboxes can send in any order. Sequential mode uses only the absolute first TX mailbox when sending ordered frames :)
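A simplified model of why several TX mailboxes can reorder a burst: queued frames are loaded into any free mailbox, and whenever the bus frees up, the pending frame with the lowest CAN ID wins. The lowest-ID-wins rule among the controller's own mailboxes is an assumption (the real controller's internal selection rule may differ), and transmitOrder is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Order in which a burst of queued frames reaches the wire, given `mailboxes`
// TX mailboxes. Assumes the whole burst is queued before the bus frees up.
std::vector<uint32_t> transmitOrder(const std::vector<uint32_t>& queuedIds,
                                    std::size_t mailboxes) {
  assert(mailboxes >= 1);
  std::deque<uint32_t> swQueue(queuedIds.begin(), queuedIds.end());
  std::vector<uint32_t> pending;  // IDs currently loaded in TX mailboxes
  std::vector<uint32_t> sent;
  while (!swQueue.empty() || !pending.empty()) {
    // Refill free mailboxes in software-queue order.
    while (pending.size() < mailboxes && !swQueue.empty()) {
      pending.push_back(swQueue.front());
      swQueue.pop_front();
    }
    // Lowest ID (highest priority) wins arbitration.
    std::size_t win = 0;
    for (std::size_t i = 1; i < pending.size(); ++i) {
      if (pending[i] < pending[win]) win = i;
    }
    sent.push_back(pending[win]);
    pending.erase(pending.begin() + static_cast<std::ptrdiff_t>(win));
  }
  return sent;
}
```

With IDs 0x300, 0x200, 0x100 queued in that order, one mailbox sends them as queued, two mailboxes send 0x200, 0x100, 0x300, and three send them in reverse: the kind of transposition a multi-frame display protocol would notice.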

Also, is the Due using FIFO? How many mailboxes are set up for RX and TX?
 
Good idea. Yes messages out of order is a very possible cause. That would exactly explain some of the corruption in the display.

I am not sure about the Due code. The CAN library is due_can, and more specifically I am basing my code on the example file CAN_TrafficModifier.ino, except that I added interrupt callback handlers instead of polling. I do not see any calls that set up FIFOs, and all I do is blindly set the filters like the example file does. So I think that means it's using 7 RX mailboxes, with the filter on each mailbox set to let anything in. Is that a clue to what I can do with FlexCAN_T4?
 
According to the due-can readme, the default uses 1 tx mailbox. If it's occupied with a frame pending transmission when you try to send a new frame, the new frame will be queued. When the pending frame finishes transmission, an interrupt fires immediately loading the next queued frame to the TX mailbox and instructing it to send.
This is similar to how the FlexCAN_T4 library works IF the frame's "seq" field is set =1 before calling the write() function.
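The due_can behaviour described above (one TX mailbox, a software queue, and a TX-complete interrupt that reloads the mailbox) can be modelled like this. SequentialTx is a hypothetical name, not part of either library.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

// One TX mailbox plus a software FIFO: write() loads the mailbox if it is idle,
// otherwise queues the frame; the TX-complete interrupt reloads the mailbox.
class SequentialTx {
 public:
  void write(uint32_t id) {
    if (!mailbox_) mailbox_ = id;  // idle: start sending immediately
    else queue_.push_back(id);     // busy: wait in FIFO order
  }

  // Simulated TX-complete interrupt: the mailbox's frame is now on the wire.
  void onTxComplete() {
    assert(mailbox_.has_value());
    wire_.push_back(*mailbox_);
    mailbox_.reset();
    if (!queue_.empty()) {
      mailbox_ = queue_.front();
      queue_.pop_front();
    }
  }

  bool busy() const { return mailbox_.has_value(); }
  const std::vector<uint32_t>& wire() const { return wire_; }

 private:
  std::optional<uint32_t> mailbox_;
  std::deque<uint32_t> queue_;
  std::vector<uint32_t> wire_;
};
```

Because only one frame is ever in a mailbox, frames reach the wire strictly in write() order, at the cost of possibly missing an arbitration slot while the interrupt reloads the mailbox.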
 
Interesting. So to get this behavior I should simply set the seq field in my callback before I write out the frame to the other interface? That sounds like what I want. I'll give it a shot!
 
Regarding the RX mailboxes: both due and flexcan use 7 RX mailboxes by default. If your message handling is fast enough, only 1 RX mailbox will be used at a time, and nothing can get out of order. If the due could do this fast enough, the T4 should be fine. If you want a more robust solution, using the enableFIFO() function changes the operation to provide 4(?) frames of FIFO RX buffering. When your interrupt handles a frame, you know it was the first received.
 
You can either use the .seq flag to enable sequential sending on the absolute first TX mailbox, or just set up only 1 TX mailbox. You can have up to 64 mailboxes on the T4 and 16 on the T3; if you really wanted, you could set up 1 TX and 2 RX, and if reception ordering is important, use FIFO.

For example, by default 16 mailboxes are set up: 8 are "consumed" for the 6-deep FIFO (consumed, meaning sacrificed), and the remaining 8 are used for transmit.

To have an ordered FIFO and 1 transmit mailbox, just add this to the FIFO sketch in setup():

Code:
Can0.setMaxMB(9);

That should set up the FIFO with one TX mailbox, and sequential mode won't be needed because only one frame transmits at a time. mailboxStatus() should confirm this and show 1 FIFO and 1 TX mailbox.
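The mailbox arithmetic from this post, as a quick check. It assumes, as stated above, that the FIFO always consumes 8 mailboxes and that the Teensy 4 allows at most 64; txMailboxesWithFifo is a hypothetical helper.

```cpp
#include <cassert>

// TX mailboxes left over when the RX FIFO is enabled, assuming the FIFO
// consumes the first 8 mailboxes and the Teensy 4 allows at most 64.
int txMailboxesWithFifo(int maxMB) {
  assert(maxMB > 8 && maxMB <= 64);
  return maxMB - 8;
}
```

That gives 8 TX mailboxes for the default 16, 1 for setMaxMB(9), and 24 for setMaxMB(32).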
 
I really appreciate your help and thoughts on this.

So far nothing I've tried is working. I initially tried setMaxMB(9), normal callbacks, and no events() calls. That didn't work at all (no traffic to the display). Here's the code:
Code:
#include <FlexCAN_T4.h>

FlexCAN_T4<CAN1, RX_SIZE_256, TX_SIZE_16> Can0;
FlexCAN_T4<CAN2, RX_SIZE_256, TX_SIZE_16> Can1;

void can0_got_frame_teensy(const CAN_message_t &frame) {
	Serial.println("0->1");
	Can1.write(frame);
}

void can1_got_frame_teensy(const CAN_message_t &frame) {
	Serial.println("1->0");
	Can0.write(frame);
}


void setup()
{
	delay(5000);
	Serial.begin(115200);
	Serial.println("j1939 CAN bridge.");

	//Teensy FlexCAN_T4 setup
	Can0.begin();
	Can0.setBaudRate(250000);
	Can0.setMaxMB(9);
	Can0.enableFIFO();
	Can0.onReceive(can0_got_frame_teensy);

	Can1.begin();
	Can1.setBaudRate(250000);
	Can1.setMaxMB(9);
	Can1.enableFIFO();
	Can1.onReceive(can1_got_frame_teensy);

	Can0.enableFIFOInterrupt();
	Can0.mailboxStatus();
	Can1.enableFIFOInterrupt();
	Can1.mailboxStatus();
}

void loop() 
{
}
I think I must have misunderstood something you said above, because without the calls to events(), the callback does not seem to get fired at all. Unless you were referring to the ext_output callbacks, which I then went on to try:
Code:
void ext_output1(const CAN_message_t &msg)
{
	if(msg.bus == 1) {
		Can1.write(msg);
		Serial.println("0->1");
	}
	if(msg.bus == 2) {
		Can0.write(msg);
		Serial.println("1->0");
	}
}
That works, kind of. Traffic is getting through, but there seem to be a lot of missing frames. The OEM display starts complaining about lost communication, and button presses that interact with a device on the other side of the bus aren't seen half the time. So apparently using just one TX mailbox has made the issues I'm seeing a lot worse.

I'm clearly missing something (probably simple). Oh and I tried setting seq to 1 without messing with setMaxMB, and that also seemed to make matters worse. I appreciate your patience!

Thanks again.
 
If the callback doesn't fire without events(), you are not using the latest GitHub copy. Please update.

Also, increase TX_SIZE in the constructor: the queue may be overflowing while frames wait for arbitration, so maybe your issue is on the transmit end. Increase the TX queue size and add more mailboxes; setMaxMB(32) should give you 24 TX mailboxes with FIFO, without the sequential flag set.
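To see how a small TX queue can lose frames during a burst, here is a fixed-capacity ring buffer model. This thread does not say whether FlexCAN_T4's write() drops, overwrites, or reports failure when its queue is full, so the drop-and-count behaviour here is an assumption, and TxRing is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-capacity TX queue model: frames wait here until a mailbox is free.
// Assumption: when full, a new frame is dropped and counted.
template <std::size_t Capacity>
class TxRing {
  static_assert(Capacity > 0, "capacity must be positive");

 public:
  bool push(uint32_t id) {
    if (count_ == Capacity) {
      ++dropped_;  // queue full: this frame is lost
      return false;
    }
    buf_[(head_ + count_) % Capacity] = id;
    ++count_;
    assert(count_ <= Capacity);
    return true;
  }

  bool pop(uint32_t& id) {  // oldest frame first
    if (count_ == 0) return false;
    id = buf_[head_];
    head_ = (head_ + 1) % Capacity;
    --count_;
    return true;
  }

  std::size_t size() const { return count_; }
  std::size_t dropped() const { return dropped_; }

 private:
  std::array<uint32_t, Capacity> buf_{};
  std::size_t head_ = 0, count_ = 0, dropped_ = 0;
};
```

With a capacity of 16 and a burst of 20 frames arriving while the bus is busy, 4 frames are dropped; a larger capacity (the role TX_SIZE_* plays in the real constructor) or faster draining avoids that.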
 
Many thanks to you both. I think I have a combination that's working now.

I updated to the latest version from git, increased the constructor queue sizes to 1024, setMaxMB(32). I was still having problems so I set the seq flag to 1, and this combination finally is working.

I ran the system for quite a while today and didn't see any data corruption from frames out of order. Looks like I'm good to continue my project with the Teensy!
 
That's cool, but that's 24 TX mailboxes when seq only uses 1 in FIFO mode. I think the larger queues and the interrupt helped with ordered transfers.
 
Yeah, I'm not sure everything was strictly necessary. I definitely had issues without seq=1, and a few days ago I still saw frames getting out of order (I could see some text transposed) with setMaxMB(24). I don't understand enough of what's happening behind the scenes to know exactly what is required. I may still get occasional frames out of order, but they are so few that they are of no consequence. The systems on the bus that I'm interested in work with individual message broadcasts, so the ordering doesn't affect them. At this stage I'm confident that any glitches are cosmetic.
 
Glad you got it working.

I had been experiencing related issues prior to the FlexCAN_T4 commit on June 20 (AKA Update7), so my money is on that.

Tony - that commit barely missed inclusion into Teensyduino 1.53 and doesn't seem to be on the list for 1.54. Are updated libraries automatically included with new Teensyduino releases, or should it be requested?
 
No, it's pulled in manually by Paul. I think it's because I don't have a library config file or have it on the Library Manager.
 