Forum Rule: Always post complete source code & details to reproduce any issue!

Thread: Fastest way to transfer a lot of data between close Teensys

  1. #1

    Fastest way to transfer a lot of data between close Teensys

    Hello,

    I am now convinced that the Teensy is the best micro-controller out there. Community support is one of the strongest reasons. And now with the 4.0, it is also the fastest micro. Perhaps the only limitation is that it is a single core micro.

    I have a project where I need to acquire 18 KB of data from each of 3 sensors. I have selected the Teensy 4.0 (3 units on the way) where each Teensy will connect to one sensor. That amount of data in itself is not a problem within each micro, but now I have to find out the FASTEST way to transfer this data from the second and third Teensy to the first for processing.

    What is the community's recommendation for a hardware/software combination? BTW, all three Teensys will be in the same physical location (within an inch of one another).

    Thanks in advance.

  2. #2
    defragster (Senior Member+, joined Feb 2015, 9,692 posts)
    What interface is used to read the 18 KB of sensor data? Is that per second or one time? Is it a raw read, or does it come over I2C, SPI, or something else? Is there any reason each sensor would need a dedicated T4?

    What is done with 3 * 18 KB chunks of data after collection?

    UARTs on the T4 are numerous and fast: 5 Mbit/sec. On the T3.x there was an SPI transfer library by tonton81 that ran at over 20 MHz; I can't say whether that has been updated to run on the T4.

  3. #3
    vjmuzik (Senior Member, joined Apr 2017, Florida, 340 posts)
    Theoretically you could use the USB host port to connect them together at roughly 480 Mbit/sec, if you really need that much speed.

  4. #4
    Thank you defragster.

    I will be reading the data using the Serial UART port at 115200bps. Each micro should be able to handle that easily. Each sensor needs a dedicated T4 because all 3 sensors are sending their data at roughly the same time and then stop sending after about 2 seconds.

    After the data is collected it "can" be processed locally by each T4 but I was wondering if this data can be sent to one of the T4 for processing data from all three sensors. However, the most important info needed is the exact time each sensor is triggered which needs to be shared to one of the T4 (most likely via an interrupt). This is why I wrote that a multi-core micro (3 cores) would be ideal. They do exist but are not well supported and have their own issues.

    I was not aware of the tonton81 library and have no experience using SPI since as a newbie, I have always been reading stories about potential problems with SD and Ethernet which will also be used in my project.

    Any recommendations on how one would use SPI in this case to connect all three sensors?

    Thanks again for your help.

  5. #5
    Thank you vjmuzik. USB would not work because all three need to be connected together, and I am not sure how one can do this using the USB port.

  6. #6
    vjmuzik (Senior Member, joined Apr 2017, Florida, 340 posts)
    You can connect a USB hub to it, just like you can on the Teensy 3.6, and connect multiple devices.

  7. #7
    Thank you vjmuzik. I will read about this option. It just adds a bit of bulk but if SPI does not work, this may be the best option.

  8. #8
    KurtE (Senior Member+, joined Jan 2014, 5,421 posts)
    I am not sure why you could not do this with one T4 or T3.x for that matter.

    You have three devices that communicate over serial at 115200 which is not terribly fast. The serial ports receive the data using interrupts...

    So you can simply have your main loop check the status of all three Serial ports.

    That is it could be as simple as:
    Code:
    #define BUFFER_SIZE 18432
    uint8_t buffer_serial1[BUFFER_SIZE];
    uint8_t buffer_serial2[BUFFER_SIZE];
    uint8_t buffer_serial3[BUFFER_SIZE];
    uint16_t buffer1_index = 0;
    uint16_t buffer2_index = 0;
    uint16_t buffer3_index = 0;
    
    void loop() {
        // Each loop only drains what that port has already buffered, then moves on.
        while (Serial1.available()) {
            buffer_serial1[buffer1_index++] = Serial1.read();
        }
        while (Serial2.available()) {
            buffer_serial2[buffer2_index++] = Serial2.read();
        }
        while (Serial3.available()) {
            buffer_serial3[buffer3_index++] = Serial3.read();
        }
    ...
    Of course you will want to check for certain conditions, like whether you have received the full data, whether there was an error, running past the end of the buffer ...

    Also, you could remove these from the loop and use serialEvent1(), serialEvent2(), serialEvent3() (or whichever match the serial objects you use).

  9. #9
    Thank you KurtE.

    The three sensors send their data simultaneously so the loop() above will not read from the other 2 sensors and by the time it is done with reading the first sensor, data from the other 2 sensors will be lost ...

    The data is not received through interrupts, it is done by polling. Interrupts are used to signal the other two micros that the first sensor had been triggered ...

  10. #10
    PaulStoffregen (Senior Member, joined Nov 2012, 20,556 posts)
    The serial ports have FIFOs in hardware, and 64 byte buffers implemented in software. All the ports are able to simultaneously receive into their FIFO & buffers. 115200 baud is so slow that even Teensy 3.2 can easily process all the interrupts as the data arrives simultaneously. If you need bigger buffers, you can craft code to read partial message and store in a larger buffer in your program, or edit the serial code to increase the buffer size inside the core library.

    Much as I would like to sell you more Teensy boards, I'm pretty sure you can do this project with only 1 board. Even Teensy LC may be able to do this, but I would recommend at least Teensy 3.2.

  11. #11
    Thank you very much Paul. It is an honor to receive a response from a micro-controller leader and visionary.

    (I have already bought 3 T4s :-)

    Can you elaborate on how this could be done? How can I make sure that no data is lost when the 64 byte buffer overflows?

    Basically I have 3 sensors checking for an incident continuously. The 3 sensors are always sending data but this data is only saved if certain decoded values in the stream are present (when the sensor is triggered). Once each sensor is triggered, it sends useful data for about 2 seconds (at 115200 bps). After that time, there is no need to read the data until the sensors are triggered again ...

  12. #12
    KurtE (Senior Member+, joined Jan 2014, 5,421 posts)
    The Serial buffers are already in place. What Serial1.available() does, is to check for any data in that queue (buffer). And as mentioned you have lots of time to process the characters as they arrive.

    Again, as Paul mentioned, each UART has its own hardware FIFO, which on the T4 I believe is 4 words in size. So as long as those interrupts are not held up for a long period of time, the interrupt handlers will move the data out of those hardware queues into the software queue, which again is what available() is looking at.

    So most of the time with things like this, it is not the hardware that is the limitation, but more an issue of how you write the program...

    That is if you put in something like:
    Code:
    // Spins here until the whole Serial1 message has arrived
    while (my_buffer_count < my_message_size) {
       if (Serial1.available()) {
            my_buffer[my_buffer_count++] = Serial1.read();
       }
    }
    Then yes you could easily starve the processing for the other two Serial ports.... But if instead you check for Serial available and only read what is available for each of the queues and if appropriate keep some form of state information for each sensor, then you should easily be able to do it.
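    To make the "only read what is available" idea concrete, here is a minimal sketch written as plain C++ so it can be tried off the board. The names `Channel`, `drain()` and `FakePort` are hypothetical and not part of the Teensy core; `FakePort` only stands in for a serial port, and on a Teensy you would pass `Serial1`, `Serial2` or `Serial3` instead, since they offer the same `available()` and `read()` calls. The buffer size is an arbitrary placeholder.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical per-sensor state: a destination buffer plus a fill count.
struct Channel {
    static constexpr std::size_t kCapacity = 2048;  // placeholder size
    uint8_t data[kCapacity];
    std::size_t count = 0;
};

// Stand-in for a serial port so the sketch runs on a PC.
struct FakePort {
    std::deque<uint8_t> rx;
    int available() const { return static_cast<int>(rx.size()); }
    int read() {
        if (rx.empty()) return -1;
        const uint8_t b = rx.front();
        rx.pop_front();
        return b;
    }
};

// Copy only the bytes that have already arrived, then return immediately.
// Returns how many bytes were moved; it never waits for more data.
template <typename Port>
std::size_t drain(Port& port, Channel& ch) {
    std::size_t moved = 0;
    while (port.available() > 0 && ch.count < Channel::kCapacity) {
        ch.data[ch.count++] = static_cast<uint8_t>(port.read());
        ++moved;
    }
    return moved;
}
```

    Calling `drain()` once per port on every pass through loop() gives each port a turn, so none of them is starved while another message is still incomplete.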

    And personally I think this is a lot easier than trying to figure out how to make multiple processors work!

  13. #13
    Hello KurtE:

    Thank you. I WOULD LOVE TO BE ABLE TO DO IT WITH A SINGLE MICRO. I have used buffers and interrupts in the past but I am not proficient at it.

    Here is what I am struggling with and would appreciate your help:

    1) The sensors are sending their data continuously.
    2) I need to check the serial data sent for evidence that a sensor has been triggered. This is done by checking a repeating TTL message of about 600 bytes from each sensor.
    3) Once a sensor is triggered, I keep acquiring data while checking each message for evidence that the same incident has passed. When the first sensor is triggered, the remaining two sensors are normally also triggered less than 100 ms later.
    4) Once the incident has passed (by checking each message from each sensor), I then start the processing of the data acquired from each sensor.

    So if I understand your message above, I check for data available for each Serial and process each 600 byte message to find if a sensor is triggered. Once I know a sensor has been triggered I need to save its data to a buffer.

    But every time I check in the loop(), how can I check that the sensor has been triggered by reading its message, move the data in the buffer of each UART once the sensor is triggered, and then check again that the incident has passed each sensor to stop acquiring data, all without losing the data in the two remaining buffers?

    As I said, I would love to use a single micro but with my limited knowledge, I do not know how.

    Thank you very much in advance for your time.

  14. #14
    KurtE (Senior Member+, joined Jan 2014, 5,421 posts)
    It is sometimes hard to give good, specific answers without really understanding the data. For example, are the messages exactly 600 bytes, or is there some format to them? As an example, suppose I am receiving serial data from servos such as the ones from Robotis. With their Protocol 2, the messages look something like:

    0xff 0xff 0xfd 0x00 <ID> <len Low> <len High> <instruction> <Parameters...> <CRC_L> <CRC_H>

    So I would have code in place that keeps a state, starting at something like FIRST_FF. When I receive a byte, I check whether it is an FF: if so I update my state to SECOND_FF, else I stay in FIRST_FF. When I receive the next byte, if it is FF again I go to a state that expects the FD, and so on, with some logical names for the states. If I make it through all of the states, including enough data bytes for the length, and the CRC matches, I then process the message; otherwise, in many places, I start over. I may also keep time stamps, so that if the time between bytes exceeds some threshold I again toss the data, since something is wrong...

    Then once I have a valid message, I can decide if this is something I need to process or not... But again at each step very little processing time is needed, and you simply need to keep buffer and state for each one of these...

    I have no idea what you will do with the data after you receive it, or whether that part is time consuming. But the actual receiving of the data can be pretty straightforward.
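    A rough sketch of such a byte-at-a-time state machine is below, in plain C++ so it can be run on a PC. `FrameParser` and `countFrames()` are hypothetical names, the 256-byte body limit is arbitrary, and it assumes the length field counts every byte that follows it. The CRC check and the inter-byte timeout mentioned above are left out to keep it short.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical parser for: 0xFF 0xFF 0xFD 0x00 <ID> <len low> <len high> <len bytes...>
// CRC verification is intentionally omitted from this sketch.
class FrameParser {
public:
    static constexpr std::size_t kMaxBody = 256;

    // Feed one received byte; returns true when a whole frame has been collected.
    bool feed(uint8_t b) {
        switch (state_) {
        case State::FirstFF:
            if (b == 0xFF) state_ = State::SecondFF;
            break;
        case State::SecondFF:
            state_ = (b == 0xFF) ? State::WantFD : State::FirstFF;
            break;
        case State::WantFD:
            if (b == 0xFD) state_ = State::WantZero;
            else if (b != 0xFF) state_ = State::FirstFF;  // another 0xFF keeps us here
            break;
        case State::WantZero:
            if (b == 0x00) state_ = State::Id; else restart(b);
            break;
        case State::Id:
            id_ = b; state_ = State::LenLow;
            break;
        case State::LenLow:
            len_ = b; state_ = State::LenHigh;
            break;
        case State::LenHigh:
            len_ = static_cast<uint16_t>(len_ | (b << 8));
            count_ = 0;
            // Reject impossible lengths and start hunting for a header again.
            state_ = (len_ == 0 || len_ > kMaxBody) ? State::FirstFF : State::Body;
            break;
        case State::Body:
            body_[count_++] = b;
            if (count_ == len_) { state_ = State::FirstFF; return true; }
            break;
        }
        return false;
    }

    uint8_t id() const { return id_; }
    uint16_t length() const { return len_; }
    const uint8_t* body() const { return body_; }

private:
    enum class State { FirstFF, SecondFF, WantFD, WantZero, Id, LenLow, LenHigh, Body };

    // A stray 0xFF may itself be the start of the next header.
    void restart(uint8_t b) { state_ = (b == 0xFF) ? State::SecondFF : State::FirstFF; }

    State state_ = State::FirstFF;
    uint8_t id_ = 0;
    uint16_t len_ = 0;
    std::size_t count_ = 0;
    uint8_t body_[kMaxBody] = {};
};

// Feed a run of bytes and count how many complete frames were seen.
inline int countFrames(FrameParser& p, const uint8_t* bytes, std::size_t n) {
    int frames = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (p.feed(bytes[i])) ++frames;
    }
    return frames;
}
```

    One parser object per serial port keeps the state for each sensor separate, and each call to `feed()` does only a few comparisons, so very little time is spent per byte.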

  15. #15
    Thank you KurtE. Fair enough, I was just reluctant to make my message too long. Here is the message structure.

    1) Each sensor sends a message continuously. Each message has 10 structures. After every message of 10 structures is received, a new message of 10 structures arrives, and so on. Each structure has four fields, let us call them fields 0,1,2,3. Each field in the structure holds an int. The sensor library tells me when each message is received so I am not really reading byte by byte.

    2) Field(0) of each structure holds the structure number (0-9).

    I need to check the value of field(2) when field(0) = 5 or when field(0) = 6. In this case the value of field(2) within each structure is the trigger.

    3) When the value of field(2) < x, then the sensor is triggered. I then need to store the data until the value of field(2) > x. Once the value of field(2) > x in each sensor, I then start processing the data accumulated by each of the three sensors.

    This is what I am trying to do. What would the algorithm look like so that I do not lose any data if only one micro is used?

    Thank you again for your time and help.

  16. #16
    defragster (Senior Member+, joined Feb 2015, 9,692 posts)
    Get it working on one channel and then duplicate it for the others.

    Set up the transfer like KurtE's p#14 with clear start / stop location indicators so if a message gets lost or broken it can be discarded until the next clean start.

    That should never happen with proper buffers and frequent attention - the T4 can cycle across multiple Serial ports pulling/pushing data at a rate of 1 to 20+ MILLION times per second through loop() depending on what else the code is doing. I did a sample with short messages of 30+ chars on 5 Serial# ports that was doing a new message each 1 or 2 milliseconds at 5 Mbaud and loop() could process all the data and still cycle the loop some 5-12 million times/second. I also connected all 7 Serial# ports between two T4's with similar messages and saw no trouble.

    So with clear indicators on the status/state like p#14 of the incoming message group of structs it will be easy to reliably read and act on the message content as indicated.
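    As a rough sketch of the per-channel state, here is the trigger rule from p#15 in plain C++. `Record`, `Phase` and `TriggerTracker` are hypothetical names, and two things are assumptions rather than facts from the sensor: that a structure is four ints, and that the incident ends at the first watched structure (number 5 or 6) whose field(2) rises above the threshold. `std::vector` is used only to keep the sketch short; on the board a fixed array would do the same job.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: four ints per structure, as described in p#15.
struct Record {
    int field[4];
};

enum class Phase { Idle, Recording, Done };

// One tracker per sensor; each keeps its own phase and its own stored data.
class TriggerTracker {
public:
    explicit TriggerTracker(int threshold) : threshold_(threshold) {}

    // Call once for every structure received from this sensor.
    void onRecord(const Record& r) {
        const bool watched = (r.field[0] == 5 || r.field[0] == 6);
        if (phase_ == Phase::Idle && watched && r.field[2] < threshold_) {
            phase_ = Phase::Recording;            // sensor has just been triggered
        }
        if (phase_ == Phase::Recording) {
            stored_.push_back(r);                 // keep everything while triggered
            if (watched && r.field[2] > threshold_) {
                phase_ = Phase::Done;             // incident has passed this sensor
            }
        }
    }

    // Go back to waiting for the next trigger and drop the stored data.
    void reset() {
        phase_ = Phase::Idle;
        stored_.clear();
    }

    Phase phase() const { return phase_; }
    const std::vector<Record>& stored() const { return stored_; }

private:
    int threshold_;
    Phase phase_ = Phase::Idle;
    std::vector<Record> stored_;
};
```

    With three trackers, one per Serial port, processing would start once all three report `Phase::Done`, followed by a `reset()` on each.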

  17. #17
    PaulStoffregen (Senior Member, joined Nov 2012, 20,556 posts)
    Originally posted by jimmie:
    How would the algorithm look like so I do not lose any data if only one micro is used?
    Something like this:

    Code:
    unsigned char buffer1[600];
    unsigned char buffer2[600];
    unsigned char buffer3[600];
    
    void loop() {
      if (Serial1.available()) {
        // read whatever is available into buffer1
        // parse buffer1 for messages
      }
      if (Serial2.available()) {
        // read whatever is available into buffer2
        // parse buffer2 for messages
      }
      if (Serial3.available()) {
        // read whatever is available into buffer3
        // parse buffer3 for messages
      }
    }
    The key to making this work is avoiding delay, so whatever you do, don't call the delay() function!

    The main point is the interrupts and 64 byte buffering built into Serial1, Serial2 and Serial3 gives you about 5 ms before you risk losing incoming data (assuming 115200 baud). So in a worst case scenario where all 3 serial ports have messages arriving simultaneously, you need to make sure the work you do to parse and consume the data takes less than 1.6 ms, because all 3 of those available() functions might tell you data has arrived. If all 3 arrive at exactly the same moment, you might need to parse and consume the data from all 3 buffers. As long as you always complete the task fast enough, the buffers built into Serial1, Serial2, Serial3 will be able to capture incoming data.
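    For anyone who wants to check the arithmetic behind those figures, it fits in a few lines of plain C++. `fillTimeMicros()` and the constants are hypothetical names, and the calculation assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits on the wire per byte), which is an assumption about the sensor's serial format.

```cpp
#include <cstdint>

// Microseconds needed for `bytes` bytes to arrive at `baud`, assuming 8N1
// framing: 10 bits on the wire for every byte of data.
constexpr uint32_t fillTimeMicros(uint32_t bytes, uint32_t baud) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(bytes) * 10u * 1000000u) / baud);
}

constexpr uint32_t kBaud = 115200;
constexpr uint32_t kBufferBytes = 64;  // software receive buffer per port
constexpr uint32_t kPorts = 3;         // all three ports may need service at once

// Time until one software buffer could overflow if nothing reads it.
constexpr uint32_t kFillMicros = fillTimeMicros(kBufferBytes, kBaud);

// Worst-case share of that time available to each port's processing.
constexpr uint32_t kPerPortMicros = kFillMicros / kPorts;
```

    Under that assumption the buffer takes a little over 5.5 ms to fill, which leaves about 1.85 ms per port, so the rounded figures of 5 ms and 1.6 ms keep some safety margin.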

    As you develop this code, you'll probably do something like this:

    Code:
      if (Serial3.available()) {
        elapsedMicros usec;
        // read whatever is available into buffer3
        // parse buffer3 for messages
        Serial.print("Serial3 processing took ");
        Serial.print((uint32_t)usec);
        Serial.println(" us");
      }
    As data arrives on Serial3, you'll watch in the Arduino Serial Monitor while a flood of lines is printed, each showing the number of microseconds you spent. Almost all will be very small, since normally you'll read only the small amount of data that has arrived but is not yet a full message. The key point is you want to get the data out of Serial3's small buffer ASAP and into your big 600 byte buffer that you know is always enough to hold your message.

    When you actually do something with the data, the microseconds printed will probably be more. This is where you need to focus your attention. Maybe you'll even add a little extra code to skip printing the message in the case where you didn't do the parsing. When you do parse, you want to make sure you're under 1600 microseconds. Then if all 3 happen to take 1600, you know that the worst possible case is still less time than 60 bytes take to arrive. Maybe if you're paranoid you'll add up all the very short times taken to read each byte and put it into the buffer, and count that toward your total 1600 us time budget.

    Of course all that printing of timing info will slow your program slightly, so when you delete it, you'll know you were able to meet the incoming data speed while suffering that overhead, so your final program will be even better.

    If you get into this and you discover you're spending more than 1600 us, you could edit the serial driver code to increase the buffer size. Or you could ask here for ideas of how to make your code do its work faster. Or use Teensy 4.0 instead of Teensy 3.2, which will run your code at least 10X faster.

    But if your code is slow for reasons other than the CPU speed, like you wait for some external thing to happen before you do work, then the incredible speed of Teensy 4.0 won't help. But we might. The key to getting good help here when you have a code problem is to show us the complete code that's not working. If you withhold pieces of the code or info about the data or other details, we might still be able to help with blind guesswork to fill in the details. But we do much better at helping when you show complete code, which is why we have the "Forum Rule" in red at the top of every page.

    Just in case it's not obvious from all this, you absolutely should not get into the interrupts and buffering code inside Serial1, Serial2, Serial3. The only thing you should even think about doing there is editing the define for the buffer size. Otherwise, don't mess with that code. It's not a place to play unless you are an expert at device drivers. That code is very mature (on Teensy 3.x... everything is new on Teensy 4.0) and works very well. Just use the serial functions and focus on making sure your code always reads and processes the data in less time than the serial buffers would fill up. If you meet that timing need under all cases, then you will always be able to process the incoming data while the Serial1,2,3 driver code does all the work of actually acquiring the simultaneous incoming streams.

  18. #18
    Thank you very much Paul, defragster and KurtE. You have all gone beyond the call.

    I will be trying this on a T4 and will update you of my progress.

    Thanks again for your time.

  19. #19
    In trying to implement your recommendation, I found that the sensor's library responds each time with a 600-byte structure.

    http://playground.arduino.cc/Code/Leddar/

    The library is attached. Can this be still done given that I am not reading byte by byte?

    Can the Serial buffer be that large?

    Thanks in advance.

  20. #20
    KurtE (Senior Member+, joined Jan 2014, 5,421 posts)
    The simple answer is: not without changing something...

    There are probably some easy ways to try and probably some more correct ways to try.

    The code goes into loops that waste lots of time like:
    Code:
    char Leddar16::getDetections() 
    {
    	unsigned long startTime;
    
    	clearDetections();
    	sendRequest();				//Sends the request for detections on serial port
    	startTime = millis();
    	
    	while (!SerialPort.available())
    	{
    		// wait up to 1000ms
    		if (millis()-startTime > 1000)
    		{
    			return ERR_LEDDAR_NO_RESPONSE;
    		}
    	}
    	
    	return parseResponse();     // Parses the data available on the serial port
    }
    So it will waste up to 1 second in this loop waiting for the first character to arrive, and then wait for all of the bytes to arrive. In particular, it receives as many bytes as possible until no more characters have arrived for 10 ms... So again it will spin in this code:

    Code:
    	while (millis()-startTime < 10) 
    	{
    		if (SerialPort.available())
    		{
    			receivedData[len++] = SerialPort.read();
    			startTime = millis();
    		}
    	}
    For that entire time.

    Which will probably screw up the other two...

    So how to fix... Short of completely rewriting the code, I would try to do something like:

    Don't use getDetections, but suppose you have three Leddar objects ldr1, ldr2, ldr3 which are set up to use Serial1, Serial2, Serial3...

    Again, taking a simplistic approach and assuming all three are in lock step...

    So could have something like:
    Code:
    void loop() {
        // Start up all three detection 
        ldr1.clearDetections();
        ldr1.sendRequest();				//Sends the request for detections on serial port
        ldr2.clearDetections();
        ldr2.sendRequest();				//Sends the request for detections on serial port
        ldr3.clearDetections();
        ldr3.sendRequest();				//Sends the request for detections on serial port
        elapsedMillis em = 0;
    
        // Wait for Serial1 to have some data...
        while (Serial1.available() == 0) {
          if (em > 1000) return;  // we timed out...
        }
        char status1 = ldr1.parseResponse();
        char status2 = ldr2.parseResponse();
        char status3 = ldr3.parseResponse();
       // do something with the data
    }
    Now in itself this won't work, as you won't start reading from Serial2 and Serial3 until you read in all of Serial1 and you will overflow Serial2 and Serial3's software buffer.

    However, in the current T4 code base I added the ability to easily expand the size of the receive and/or transmit buffer. So for example you could simply have a large enough buffer to receive the whole packet...

    So you can on T4, do something like:
    Code:
    uint8_t Serial1RXBuffer[1024];
    uint8_t Serial2RXBuffer[1024];
    uint8_t Serial3RXBuffer[1024];
    void setup() {
        ...
        Serial1.addStorageForRead(Serial1RXBuffer, sizeof(Serial1RXBuffer));
        Serial2.addStorageForRead(Serial2RXBuffer, sizeof(Serial2RXBuffer));
        Serial3.addStorageForRead(Serial3RXBuffer, sizeof(Serial3RXBuffer));
        ldr1.init();
        ldr2.init();
        ldr3.init();
    ...
    Which will add a 1k buffer for receiving data to each of the three objects. So while the code is hanging waiting for Serial1 data to be read in, the system will be buffering up Serial2 and Serial3 into these buffers, which you will read in when you call parseResponse()...

    Hope that makes sense?

    But again I would be more tempted to update their library code to not wait for input. Instead you simply call some method like parseResponse, which returns a new state saying it is still waiting for data... This would require moving some of the functions' local variables to be class variables, but that should not be difficult.
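    A rough sketch of what that non-blocking version could look like, in plain C++ so it can be tried on a PC, is below. `ResponseReader`, `RxStatus` and `FakePort` are hypothetical names and not part of the Leddar library. The two timeouts mirror the ones in the library code quoted above (1000 ms for the first byte, 10 ms of silence to end a response), and the time is passed in so that on a Teensy you would simply hand it millis().

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

enum class RxStatus { Idle, WaitingForFirstByte, Receiving, Complete, NoResponse };

// Stand-in for a serial port so the sketch runs on a PC.
struct FakePort {
    std::deque<uint8_t> rx;
    int available() const { return static_cast<int>(rx.size()); }
    int read() {
        if (rx.empty()) return -1;
        const uint8_t b = rx.front();
        rx.pop_front();
        return b;
    }
};

template <typename Port>
class ResponseReader {
public:
    static constexpr std::size_t kMax = 1024;

    explicit ResponseReader(Port& port) : port_(port) {}

    // Call right after the request has been sent.
    void start(uint32_t nowMs) {
        len_ = 0;
        lastEventMs_ = nowMs;
        status_ = RxStatus::WaitingForFirstByte;
    }

    // Call on every pass through loop(); it never blocks.
    RxStatus poll(uint32_t nowMs) {
        if (status_ != RxStatus::WaitingForFirstByte && status_ != RxStatus::Receiving) {
            return status_;  // nothing in progress
        }
        while (port_.available() > 0 && len_ < kMax) {
            buf_[len_++] = static_cast<uint8_t>(port_.read());
            lastEventMs_ = nowMs;             // restart the silence timer
            status_ = RxStatus::Receiving;
        }
        const uint32_t idle = nowMs - lastEventMs_;
        if (status_ == RxStatus::WaitingForFirstByte && idle > 1000) {
            status_ = RxStatus::NoResponse;   // nothing arrived within 1 s
        } else if (status_ == RxStatus::Receiving && idle >= 10) {
            status_ = RxStatus::Complete;     // 10 ms of silence ends the response
        }
        return status_;
    }

    const uint8_t* data() const { return buf_; }
    std::size_t length() const { return len_; }

private:
    Port& port_;
    uint8_t buf_[kMax] = {};
    std::size_t len_ = 0;
    uint32_t lastEventMs_ = 0;
    RxStatus status_ = RxStatus::Idle;
};
```

    With one reader per sensor, loop() would call `poll()` on all three each time around and only parse a response once its reader reports `RxStatus::Complete`.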

    Good luck

  21. #21
    Thank you very much KurtE for your time and input. I really appreciate your time in both helping and educating me.

    I will take a look at the code tomorrow.
