How best to parse / format incoming stream of data

virtualdave · Feb 2, 2014

This question has been on my mind for...years? And now I'm finally sitting down to try to figure this out.

Let's say I have an array of numbers coming in from a PC to a Teensy. Each number is a byte which will eventually control the intensity of a bunch of RGB LEDs. I'm reading these raw bytes on a Teensy...not their ASCII equivalent (e.g. 234 0 100 45 21 94 for the RGB values for 2 LEDs). The challenge of course is knowing when a packet has started and when it has stopped when any marker value you use could also appear as an RGB value (and vice versa).

How do others get around this? Simply counting bytes won't work...as soon as 1 byte gets lost/dropped then all the values will be off by one position (as I've seen many times in my testing today even). And I'd rather not convert the data bytes to ASCII since it could quadruple the payload (as well as add time to convert the three digits + space back to a single byte). Any other suggestions? I'm beginning to see the advantage of working in 9 bits (the 9th bit set to 1 at the beginning of a packet, otherwise 0). Maybe converting a byte to 2 nibbles, adding something to the nibble so it's always greater than the packet headers or footers (which would not be converted), then reassembling the bytes at the receiving end? Still doubles the payload, however.

Anyway...not sure if I am making any sense. If it does make sense, any suggestions on how to tackle this?

Thanks,
David

bloodline · Feb 2, 2014

You need a sentinel, a marker in the data stream that tells you where your data starts. I prefer to use a packet based system, so the device will check that the packet is valid and then accept the payload.

If you want to send a data stream, pick a value that doesn't make much sense as data. For an 8-bit LED (valid range 0 to 255), maximum brightness will realistically be reached by 200 or some other high number... So why not use the value 255 to tell your device that this is the top of the array, so it can reset and know which value refers to which LED? (-edit- Don't forget that the value 255 now never refers to a brightness and always means that the device needs to start counting again.)
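A minimal sketch of the reserved-marker idea described above, in plain C. The names (SentinelReceiver, sentinel_feed, clamp_brightness) and the channel count of 6 are assumptions made for illustration, not anything from this thread or from a library.

```c
#include <stdint.h>
#include <stddef.h>

#define FRAME_MARKER 255u   /* reserved: never sent as a brightness */
#define NUM_CHANNELS 6      /* assumed: RGB values for 2 LEDs */

typedef struct {
    uint8_t values[NUM_CHANNELS];
    size_t  index;      /* next slot to fill */
    int     synced;     /* nonzero once a marker has been seen */
} SentinelReceiver;

/* Feed one received byte. Returns 1 when values[] holds a full frame. */
int sentinel_feed(SentinelReceiver *rx, uint8_t b)
{
    if (b == FRAME_MARKER) {        /* top of the array: start counting again */
        rx->index = 0;
        rx->synced = 1;
        return 0;
    }
    if (!rx->synced || rx->index >= NUM_CHANNELS)
        return 0;                   /* ignore bytes until the next marker */
    rx->values[rx->index++] = b;
    return rx->index == NUM_CHANNELS;
}

/* Sender side: make sure the marker value never goes out as data. */
uint8_t clamp_brightness(uint8_t v)
{
    return (uint8_t)(v == FRAME_MARKER ? FRAME_MARKER - 1 : v);
}
```

If a byte is dropped, only the current frame is short; the next 255 puts the receiver back in step.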

This is a very basic idea, but you can develop it to make it more complex and robust.

mortonkopf · Feb 2, 2014

I think that the ideas you are talking about have also been worked on in the TPM2 protocol, specifically for data streaming of LED colour info in 'blocks'. The protocol also allows for daisy chaining by inserting a generic 'frame' marker, with the frame allowed to be of variable size. This variable size means that overhead can be reduced. The protocol allows for confirmation of receipt of data, but this does not act as a brake. There is also addressing within the data stream to allow parallel setting of recipient LEDs. Might be worth a read to think about how you want to progress your data streaming.

I am currently trying to get this to work with CC3000 and OctoWS2811.
The site is in German, but Google Translate does a reasonable job.
http://www.ledstyles.de/ftopic18969.html

virtualdave · Feb 2, 2014

Thanks for the pointers.

I should have made the inquiry a bit more generic (rather than just LEDs!). wrt LEDs, yes I could eliminate 255 from the list of possible values, but I'm also getting sensor data on the same (RS485) network returned back to me (16-bit values split into two 8-bit bytes). So 255 is a data point that I want to be able to send/receive, not have as a marker. I think I have to wrap my head around this a bit more. Ideally I would love to have a header and footer that would never show up in the data between the header & footer. For some reason I'm liking the idea of splitting the 8-bit data into 4-bit "nibbles", then shifting each data nibble left by 4 bits. So control bytes would be 1-15, data bytes 0, 16, 32, ... , 240. Then reassemble the data on the receiving end.
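A rough sketch of the nibble scheme as described in this post: each data byte goes out as two bytes whose low 4 bits are zero, so the values 1-15 stay free for control codes. The function names and the particular control codes (1 for start, 2 for end) are assumptions for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define CTRL_START 0x01u   /* assumed control codes; any value 1..15 would do */
#define CTRL_END   0x02u

/* Encode len payload bytes into out (needs 2*len + 2 bytes). Returns bytes written. */
size_t nibble_encode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    out[n++] = CTRL_START;
    for (size_t i = 0; i < len; i++) {
        out[n++] = (uint8_t)(in[i] & 0xF0u);         /* high nibble, already in the top 4 bits */
        out[n++] = (uint8_t)((in[i] & 0x0Fu) << 4);  /* low nibble shifted up by 4 */
    }
    out[n++] = CTRL_END;
    return n;
}

/* Decode the first frame found in in[]. Returns payload length, or -1 if malformed. */
int nibble_decode(const uint8_t *in, size_t len, uint8_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;
    while (i < len && in[i] != CTRL_START) i++;      /* skip to the start marker */
    if (i == len) return -1;
    i++;
    while (i < len) {
        if (in[i] == CTRL_END) return (int)n;
        /* a data pair must have both low nibbles clear */
        if (i + 1 >= len || (in[i] & 0x0Fu) || (in[i + 1] & 0x0Fu) || n >= out_cap)
            return -1;
        out[n++] = (uint8_t)(in[i] | (in[i + 1] >> 4));
        i += 2;
    }
    return -1;  /* ran out of input before the end marker */
}
```

A dropped data byte leaves an odd number of data bytes before the end marker, so the decoder rejects the frame instead of shifting every value by one position.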

I remembered where I got this idea (I knew I couldn't have come up with it!): Nick Gammon's RS485 library!
http://www.gammon.com.au/forum/?id=11428

...at least something along those lines.

Plus side is I always have a unique character for the beginning and end of a stream of data. Minus is that it doubles the size of my payload.

Anyway, thinking out loud

I'll put together a sketch later this afternoon to help show what I'm trying to say.

Thanks again,
David

bloodline · Feb 2, 2014

Hi David,

I think you might be making this a little more complex than it needs to be.

I would build something similar to a UDP packet, though nothing so heavyweight for your needs, but you could borrow design ideas.

-edit- Disclosure: I always use a packet based approach for comms.

Mneventh · Feb 2, 2014

Maybe you could use 0 like an escape character or backslash, so that 0,0 is interpreted as the RGB value of 0 and 0,255 (say) is interpreted as end of packet. Then you could send your example string as 234,0,0,100,45,21,94,0,255 which would mean 234, 0, 100, 45, 21, 94, END.
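Here is a small sketch of that escape scheme in C, under the rules given in the post: 0,0 means a literal 0 and 0,255 means end of packet. The function and constant names are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define ESC      0u     /* escape byte, as suggested in the post */
#define END_CODE 255u   /* ESC followed by this means end of packet */

enum { RX_WAIT = 0, RX_DATA = 1, RX_END = 2, RX_ERROR = -1 };

/* Encode len bytes into out (needs up to 2*len + 2 bytes). Returns bytes written. */
size_t esc_encode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == ESC)
            out[n++] = ESC;     /* a literal 0 is sent as 0,0 */
        out[n++] = in[i];
    }
    out[n++] = ESC;             /* 0,255 closes the packet */
    out[n++] = END_CODE;
    return n;
}

/* Feed one byte; *escaped holds decoder state and must start at 0.
   Returns RX_DATA (with *value set), RX_END, RX_WAIT, or RX_ERROR. */
int esc_feed(int *escaped, uint8_t b, uint8_t *value)
{
    if (!*escaped) {
        if (b == ESC) { *escaped = 1; return RX_WAIT; }
        *value = b;
        return RX_DATA;
    }
    *escaped = 0;
    if (b == ESC) { *value = 0; return RX_DATA; }
    if (b == END_CODE) return RX_END;
    return RX_ERROR;            /* 0 followed by anything else is undefined here */
}
```

Encoding 234, 0, 100, 45, 21, 94 with this gives the same nine bytes as the example string in the post. The payload only grows by one byte per 0 in the data, rather than doubling.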

stevech · Feb 2, 2014

virtualdave said:
This question has been on my mind for...years? And now I'm finally sitting down to try to figure this out.

Let's say I have an array of numbers coming in from a PC to a Teensy. Each number is a byte which will eventually control the intensity of a bunch of RGB LEDs. I'm reading these raw bytes on a Teensy...not their ASCII equivalent (e.g. 234 0 100 45 21 94 for the RGB values for 2 LEDs). The challenge of course is knowing when a packet has started and when it has stopped when any marker value you use could also appear as an RGB value (and vice versa).

How do others get around this? Simply counting bytes won't work...as soon as 1 byte gets lost/dropped then all the values will be off by one position (as I've seen many times in my testing today even). And I'd rather not convert the data bytes to ASCII since it could quadruple the payload (as well as add time to convert the three digits + space back to a single byte). Any other suggestions? I'm beginning to see the advantage of working in 9 bits (the 9th bit set to 1 at the beginning of a packet, otherwise 0). Maybe converting a byte to 2 nibbles, adding something to the nibble so it's always greater than the packet headers or footers (which would not be converted), then reassembling the bytes at the receiving end? Still doubles the payload, however.

Anyway...not sure if I am making any sense. If it does make sense, any suggestions on how to tackle this?

Thanks,
David

I suggest that you exchange numeric data like this using a text format such as 1,3,4,56,78,1234
and so on. Sending raw binary data where the datum is 2 or more bytes in size, such as a C short (typically 2 bytes) or a long (typically 4 bytes), leads to the infamous endian problem: which byte is sent first?
Sending in text form eliminates this.
The x86 chips in PCs have an endianness that is opposite to that of many/most microprocessors.

MichaelMeissner · Feb 2, 2014

stevech said:
The x86 chips in PCs have an endianness that is opposite to that of many/most microprocessors.

While the endian issue exists, AVR and ARM processors using the Arduino libraries execute in little-endian mode (same as x86), so it is less likely you will see it if you stay within the Arduino walled garden. The NavSpark (which uses an open-source SPARC V8 processor) is a big-endian processor.
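Whatever the processors at each end, one way to sidestep the question with binary data is to fix the byte order on the wire and build the bytes with shifts instead of copying memory. A minimal sketch for 16-bit values, assuming high byte first as the agreed convention (the helper names are made up):

```c
#include <stdint.h>

/* Write a 16-bit value as two bytes, high byte first (assumed wire convention). */
void put_u16_be(uint8_t *buf, uint16_t v)
{
    buf[0] = (uint8_t)(v >> 8);
    buf[1] = (uint8_t)(v & 0xFFu);
}

/* Read it back; the result is the same on little- and big-endian CPUs. */
uint16_t get_u16_be(const uint8_t *buf)
{
    return (uint16_t)(((uint16_t)buf[0] << 8) | buf[1]);
}
```

Because the shifts operate on values rather than on memory layout, the same code can be compiled for both sender and receiver.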

virtualdave · Feb 2, 2014

@bloodline: oh that's very much a possibility in my world

It's just one of those puzzles that hits me every once in a while. Good suggestion to look at how it's handled via UDP (especially since this sketch started as a UDP-based sketch before moving to RS485). You mention you always use a packet-based approach for comms...but isn't it all packet based? Guess I'm not understanding what you mean here.

@Stevech: that's the part I really need to sit down and test wrt timing. It would be so much simpler to send ASCII values (e.g. 50 53 53 vs. 255). I just need to see if I can handle the extra bandwidth.

Thanks,
David

pictographer · Feb 2, 2014

If you're looking for background on this problem, you might check out SLIP.

http://en.wikipedia.org/wiki/Serial_Line_Internet_Protocol
http://tools.ietf.org/html/rfc1055

Also, it might save time to debug using ASCII, then switch to a more compact representation when you've got the basics working.
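For reference, a minimal sketch of SLIP-style framing in C. The four byte values are the ones defined in RFC 1055; the function names and the buffer-based decoder are illustrative assumptions, not the RFC's reference code.

```c
#include <stdint.h>
#include <stddef.h>

#define SLIP_END     0xC0u   /* frame delimiter */
#define SLIP_ESC     0xDBu   /* escape byte */
#define SLIP_ESC_END 0xDCu   /* ESC + this = literal 0xC0 */
#define SLIP_ESC_ESC 0xDDu   /* ESC + this = literal 0xDB */

/* Encode len bytes into out (needs up to 2*len + 2 bytes). Returns bytes written. */
size_t slip_encode(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    out[n++] = SLIP_END;     /* leading END lets the receiver discard line noise */
    for (size_t i = 0; i < len; i++) {
        switch (in[i]) {
        case SLIP_END: out[n++] = SLIP_ESC; out[n++] = SLIP_ESC_END; break;
        case SLIP_ESC: out[n++] = SLIP_ESC; out[n++] = SLIP_ESC_ESC; break;
        default:       out[n++] = in[i];
        }
    }
    out[n++] = SLIP_END;
    return n;
}

/* Decode the first non-empty frame in in[]. Returns payload length, or -1. */
int slip_decode(const uint8_t *in, size_t len, uint8_t *out, size_t cap)
{
    size_t n = 0;
    int escaped = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = in[i];
        if (b == SLIP_END) {
            if (n > 0) return (int)n;   /* end of frame */
            continue;                   /* empty frame: skip it */
        }
        if (escaped) {
            escaped = 0;
            if (b == SLIP_ESC_END) b = SLIP_END;
            else if (b == SLIP_ESC_ESC) b = SLIP_ESC;
            /* any other byte after ESC is a protocol violation; kept as data here */
        } else if (b == SLIP_ESC) {
            escaped = 1;
            continue;
        }
        if (n >= cap) return -1;
        out[n++] = b;
    }
    return -1;  /* no complete frame in the input */
}
```

Every value from 0 to 255 can still be sent as data, and the overhead is one extra byte only for payload bytes that happen to equal 0xC0 or 0xDB.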

Cheers!

mortonkopf · Feb 3, 2014

I guess that overhead will only matter if you are sending large amounts of data in a short space of time, and it depends on whether you are planning send-and-forget or some sort of receipt and checking (handshaking and flow control). Packets also allow packet counting, which is a very useful bit of structure.

bloodline · Feb 3, 2014

virtualdave said:
You mention you always use a packet-based approach for comms...but isn't it all packet based? Guess I'm not understanding what you mean here.

By packet I mean the data is always sent as a discrete well formed unit.

For example, a really simple packet:

#include <stdint.h>

typedef struct {
    uint8_t rgb[6];        // placeholder payload (hypothetical): RGB values for 2 LEDs
} myDataType;

typedef struct {
    uint32_t magicNumber;  // identifier code for your packets, 0xDEADBEEF is what I use (needs 32 bits; int is only 16 bits on AVR)
    uint16_t address;      // the intended recipient of the packet, allows more than one device on the same bus
    uint16_t source;       // the identity of the packet sender
    uint8_t checksum;      // usually a simple calculation, like the sum of all the bytes mod 256, allows you to see if the packet is damaged
    myDataType payload;    // the data you want to send
} myPacket;

The receiver then knows to listen to the data stream and look out for the packet identifier code. When it sees it, it stores the following bytes up to the size of the packet. Now it can check the address and confirm the data is meant for it, then compute the checksum and confirm the data integrity is good. If this all checks out, the data is good and can be used!
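The receive loop just described could be sketched roughly like this. It is only an illustration under some assumptions that are not in the post: the packet is written out byte by byte (rather than sending the struct's memory), address and source are one byte each, the payload is a fixed 6 bytes, and the checksum covers address, source and payload. All names are made up.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAYLOAD_LEN 6                       /* assumed: RGB values for 2 LEDs */
#define PACKET_LEN  (4 + 3 + PAYLOAD_LEN)   /* magic, address, source, checksum, payload */

static const uint8_t MAGIC[4] = {0xDE, 0xAD, 0xBE, 0xEF};

/* Sum of n bytes, mod 256. */
static uint8_t sum8(const uint8_t *p, size_t n)
{
    uint8_t s = 0;
    while (n--) s = (uint8_t)(s + *p++);
    return s;
}

/* Write one packet into out[PACKET_LEN]. */
void packet_build(uint8_t *out, uint8_t address, uint8_t source, const uint8_t *payload)
{
    memcpy(out, MAGIC, 4);
    out[4] = address;
    out[5] = source;
    memcpy(out + 7, payload, PAYLOAD_LEN);
    out[6] = (uint8_t)(address + source + sum8(payload, PAYLOAD_LEN));
}

typedef struct {
    uint8_t buf[PACKET_LEN];   /* bytes 4.. are filled in; 0..3 are the matched magic */
    size_t  have;              /* how many packet bytes have been matched or stored */
} PacketReceiver;

/* Feed one byte. Returns 1 when buf holds a valid packet addressed to my_address. */
int packet_feed(PacketReceiver *rx, uint8_t b, uint8_t my_address)
{
    if (rx->have < 4) {                     /* still looking for the magic number */
        if (b == MAGIC[rx->have]) rx->have++;
        else rx->have = (b == MAGIC[0]) ? 1 : 0;
        return 0;
    }
    rx->buf[rx->have++] = b;                /* store the rest of the packet */
    if (rx->have < PACKET_LEN) return 0;
    rx->have = 0;                           /* complete: get ready for the next one */
    uint8_t expect = (uint8_t)(rx->buf[4] + rx->buf[5] + sum8(rx->buf + 7, PAYLOAD_LEN));
    return rx->buf[4] == my_address && rx->buf[6] == expect;
}
```

The magic number can still turn up inside a payload by chance; in that case the checksum and address checks are what reject the false start, and the receiver goes back to hunting for the next identifier.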

Pointy · Feb 4, 2014

I must admit to not worrying too much about this in my device. I simply send 2 magic bytes, then my data and then another 2 magic bytes at the end. I am sending data using rawHID, rather than the serial, so I don't know if it is more robust, but I haven't had any problems thus far. Using rawHID you have 64 bytes per packet, so this method leaves 60 bytes for actual data. Seeing as I only need 2 it's more than enough for me.
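As a rough illustration of that layout, here is how a 64-byte buffer with 2 magic bytes at each end and 60 data bytes in between could be filled and checked. The marker values and function names are assumptions (the post does not say which bytes are used), and the actual rawHID send/receive calls are left out.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define HID_PACKET_LEN 64
#define HID_DATA_LEN   (HID_PACKET_LEN - 4)   /* 60 bytes left for data */

/* Assumed marker values, chosen only for this example. */
static const uint8_t HEAD[2] = {0xA5, 0x5A};
static const uint8_t TAIL[2] = {0x5A, 0xA5};

/* Fill pkt: 2 magic bytes, up to 60 data bytes (zero padded), 2 magic bytes.
   Returns 1 on success, 0 if the data does not fit. */
int hid_pack(uint8_t pkt[HID_PACKET_LEN], const uint8_t *data, size_t len)
{
    if (len > HID_DATA_LEN) return 0;
    memset(pkt, 0, HID_PACKET_LEN);
    memcpy(pkt, HEAD, 2);
    memcpy(pkt + 2, data, len);
    memcpy(pkt + HID_PACKET_LEN - 2, TAIL, 2);
    return 1;
}

/* Returns a pointer to the 60 data bytes if both markers match, else NULL. */
const uint8_t *hid_check(const uint8_t pkt[HID_PACKET_LEN])
{
    if (memcmp(pkt, HEAD, 2) != 0) return NULL;
    if (memcmp(pkt + HID_PACKET_LEN - 2, TAIL, 2) != 0) return NULL;
    return pkt + 2;
}
```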

Regards,

Les
