Using DMA to store UART byte directly in Teensy 3.5

Arup

Dear members

I am using a Teensy 3.5 in the Arduino environment. Currently I read a string of 13 bytes and then do the processing.
Is it possible to use DMA to auto read the UART string and store it at a memory location using DMA, so that my program just reads the memory location / variable and does the processing?
This will make my program efficient for adding other work and stop wasting time reading the UART.
Please advise, and if possible please send me some sample code.
Regards
Arup
 
I request you to kindly give me some sample code using the DMA function to store UART received bytes.
 
I wrote a lengthy reply on your other thread (with Defragster linked), asking a similar question.

Just to quickly answer the related questions here....

Is it possible to use DMA to auto read the UART string and store it at a memory location using DMA

Possible, yes.

Advisable, no.


This will make my program efficient for adding other work

No, it will almost certainly not be more efficient. DMA has significant overhead to set up and manage. For some uses, like audio, video, and displays, it can be a huge gain. For many other uses, probably including this case, it will end up being a net loss.


I request you to kindly give me some sample code using the DMA function to store UART received bytes.

Many libraries exist with numerous example programs, but in cases like this, where use of DMA is rarely a benefit, usually nobody has spent the time to write sample code.

Generally speaking, we can help you much more if you explain the context of your question, give us a clearer picture of what you're really trying to accomplish, and if you share specific details (eg, baud rate, message timing, etc). If you ask a narrowly focused tech question without much context, we're not able to understand your needs, and we lack the detailed info needed to actually help.

If you ask for a code sample in this sort of context-free, low-information question, usually the best anyone can do is point you to already published libraries or examples. Nobody is likely to spend a lot of work writing a code sample tailored only to this question.
 
I request you to kindly give me some sample code using the DMA function to store UART received bytes.
As @defragster and others mentioned on your other thread, I am not sure that sample code exists. Or if it does, it was probably done by someone for a very specific purpose.
You might google on the off chance it shows up.

Is it possible? Probably. See chapter 52 of the reference manual (RM), which you can download from the pjrc website: https://www.pjrc.com/store/teensy35.html#tech

As Paul mentioned in the other thread, it may look easy to set this up so that it works when everything is running perfectly. But if anything goes wrong, like you start up in the middle of a message, or maybe a byte is garbled, or ...
Then there may be a lot of work you would need to do, first to detect this and then to recover from it.

It is a lot easier to simply process the data from the RX queue. You could process it byte by byte, or wait until a certain number of bytes are in the queue before processing.

Or you could simply use the Arduino serialEvent mechanism to move the data into your location and set a flag, raise an interrupt, or the like when it has a full packet.

That is, if you are using Serial1, you could simply provide the function:
void serialEvent1()

It is called each time through loop(), or when the system is idle, that is, when things like delay() are called.
This function is called if there is anything in the RX buffer for Serial1...
Change the 1 to match the serial port you are using.
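
To make the idea concrete, here is a minimal sketch of that approach, not tested on hardware. The 13-byte length comes from the original question; PacketCollector, collector, push() and release() are made-up names for illustration. The Serial1 glue is only compiled when building for Arduino, so the collector itself can be tried on a PC first.

```cpp
#include <cstddef>
#include <cstdint>

// Packet length taken from the original question; everything else is illustrative.
constexpr std::size_t PACKET_LEN = 13;

// Collects a fixed-length packet one byte at a time (hypothetical helper).
struct PacketCollector {
    uint8_t buf[PACKET_LEN] = {};
    std::size_t count = 0;
    volatile bool ready = false;   // true while buf holds a complete, unprocessed packet

    // Feed one received byte. Returns true when this byte completes a packet.
    bool push(uint8_t b) {
        if (ready) return false;   // previous packet not consumed yet: ignore the byte
        buf[count++] = b;
        if (count == PACKET_LEN) ready = true;
        return ready;
    }

    // Call after processing buf to start collecting the next packet.
    void release() {
        count = 0;
        ready = false;
    }
};

PacketCollector collector;

#if defined(ARDUINO)
#include <Arduino.h>
// Hook for Serial1 (lowercase 's'). It runs from loop()/yield() context, not from
// an interrupt. Bytes stay in the Serial1 RX buffer while a packet is pending.
void serialEvent1() {
    while (Serial1.available() > 0 && !collector.ready) {
        collector.push(static_cast<uint8_t>(Serial1.read()));
    }
}
#endif
```

In loop() you would then check collector.ready, process collector.buf, and call collector.release(). Note this does nothing about starting in the middle of a message; it only shows the "move the bytes and set a flag" part.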

Please continue this on your other thread. No reason to have multiple threads on the same thing.
 
UART/DMA: unlikely to be worth the effort

I have implemented DMA-based UART communication using Kinetis microcontrollers in the same family as the one in the Teensy 3.5. The only reason we could justify the major investment of development time was our need for very high continuous communication bandwidth combined with fine-grained time synchronization, while doing high-bandwidth analog measurement and signal processing in a few square inches in each of tens of connected units. Communication is provided using two UARTs and optical links. This has been very challenging work. Driving UART communication with DMA is possible but very, very subtle, and the effort to do it correctly is significant. (I have plenty of other hyperbolic adjectives...)

The other commenters and Paul are correct: driving a UART with DMA is rarely if ever worth the effort. The K64F microcontroller in the Teensy 3.5 is so fast that in our application, the CPU detects a UART interrupt trigger condition (in our case, IDLE) and begins executing the UART interrupt service routine before DMA finishes transferring the last few bytes from the UART FIFO to the receive buffer. So not only is UART / DMA control very difficult to get right, it is also unnecessary except in very limited circumstances.
 
I am just a little curious: if using DMA as an overflow buffer for the UART, and then reading it out to another location, is so horribly slow and hard to implement, then why does the chip manufacturer itself mention it in the manual, with its own section????
Seems the only one that is not beating down the necessity of use is Kurt E. Honestly throw a match on OP and burn him, how dare he ask for something that has been ignored for so long with no examples or proof to show otherwise.
 
Looked at another STM platform @OneHorse was working on - only 64 MHz.

The @Grumpy guy behind that was doing DMA for ALL peripherals, it seemed. Not sure if the architecture was different from the NXP family, or just that it made best use of the low-power, low-speed core?

Anyhow, post #6 says it can be and has been done on a Teensy 3.5-like family member, but not without difficulty and without showing any real gain.

I worked with Mr. @Grumpy testing his DMA UART code. It involved callbacks and was functional, and speed and reliability were in line with core speed.

But as @Paul notes, on Teensy DMA isn't used as commonly for core bus work, which leaves DMA open for end-user and other uses where it can shine moving big blocks of data. DMA seems cool because it does not overtly take CPU cycles, but it does tie up buses for some cycles at times, and as @Paul noted, "DMA has significant overhead to set up and manage".

And this poster never suggested anyone should be burnt - nor does it seem others did - but the forum is generally DIY with the exception of issues with existing supported code/libraries. When it comes to work outside that Paul's comment seems in line with the nature of the forum - and involves no flames:
Generally speaking, we can help you much more if you explain the context of your question, give us a clearer picture of what you're really trying to accomplish, and if you share specific details (eg, baud rate, message timing, etc). If you ask a narrowly focused tech question without much context, we're not able to understand your needs, and we lack the detailed info needed to actually help.

If you ask for a code sample in this sort of context-free, low-information question, usually the best anyone can do is point you to already published libraries or examples. Nobody is likely to spend a lot of work writing a code sample tailored only to this question.
 
I am just a little curious: if using DMA as an overflow buffer for the UART, and then reading it out to another location, is so horribly slow and hard to implement, then why does the chip manufacturer itself mention it in the manual, with its own section????
Seems the only one that is not beating down the necessity of use is Kurt E. Honestly throw a match on OP and burn him, how dare he ask for something that has been ignored for so long with no examples or proof to show otherwise.

That is the great thing about open source code. Many can contribute. If you find a better solution that works better than the current stuff, go for it, and if it works for normal usages, then submit a pull request for it. And if Paul agrees he will merge it in. Note he wrote the T3.x serial code.

Just a reminder: most everyone up here, other than Paul and Robin, are customers like you.

When I played with it a few years ago, I ran into questions like how to deal with variable-length input while wanting to maximize throughput and still keep latency down. Do you interrupt on each character, or when the buffer is full, or maybe also when it is half full?
 
Sometimes I wonder what the Grumpy Pizza guy is up to these days. Years ago he wrote so many extraordinarily detailed messages on Arduino's (now defunct) developer mail list about multitasking, but sadly none of those ideas ever seems to gain any traction in the Arduino ecosystem.

Is Onehorse still selling STM32L-based boards? I've kind of lost track of them in recent years....
 
Regarding my answer in msg #4, yes, perhaps it was a bit abrupt.

We have a long history on this forum of questions rooted in a desire to craft the most efficient or highest performance way to design applications. Over and over certain common misunderstandings have come up. Overestimating the usefulness of DMA and underestimating the difficulty of race conditions are common themes. Some of the strongly worded statements I make are less due to the question at hand and more motivated by years of seeing people cling to these misunderstandings.

So much of the material which can be found online is marketing. Semiconductor datasheets are first and foremost marketing to get you to buy with a secondary purpose of supplying technical facts. Of course they emphasize the selling features and often downplay or just fail to mention their limitations. And DMA does indeed work wonders for some applications, especially audio and video, so why wouldn't anyone believe it can improve everything when the costs and trade-offs are not mentioned?

The result is over and over people have developed unrealistic expectations which are quite difficult to dispel. I really do believe speaking about the limitations clearly, if perhaps abruptly, is in everyone's best interest. Spending many hours tinkering with an approach which is unlikely to be good can be valuable from a learning perspective, but if your intention is to finish your project with high performance as the final goal, going down a fruitless path can waste a tremendous amount of time.

Serial communication protocols almost always involve data-dependent parsing to find the beginning and end of each multi-byte message. Even for fixed-length protocols, usually parsing or a timeout is used to recover from starting reception in the middle of a message. Serial doesn't offer anything like the chip select signal with SPI or the start & stop conditions of I2C.

General purpose DMA is a poor fit for most serial protocols, because it offers no data parsing and it imposes data length requirements which have to be configured in advance. As Kurt mentioned, you'll face difficult choices which impact latency.
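
As an illustration of the timeout idea for a fixed-length protocol, here is a rough sketch, not production code. It assumes the sender leaves an idle gap between messages that is clearly longer than the gap between bytes inside a message. GapFramer and GAP_TIMEOUT_US are hypothetical names, and the 2000 microsecond value is only a placeholder that would have to be tuned to the real baud rate and message timing.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t FRAME_LEN = 13;        // message length from the original question
constexpr uint32_t GAP_TIMEOUT_US = 2000;    // placeholder: tune to baud rate and message spacing

// Reassembles fixed-length frames and resynchronizes on an idle gap (hypothetical helper).
struct GapFramer {
    uint8_t buf[FRAME_LEN] = {};
    std::size_t count = 0;
    uint32_t lastByteUs = 0;

    // Feed one byte with its arrival time in microseconds (e.g. micros() on Arduino).
    // Returns true when buf holds a complete frame; the next byte starts a new one.
    bool push(uint8_t b, uint32_t nowUs) {
        // Unsigned subtraction stays correct when the microsecond counter wraps.
        if (count > 0 && (nowUs - lastByteUs) > GAP_TIMEOUT_US) {
            count = 0;   // partial frame is stale: discard it, treat b as byte 0
        }
        lastByteUs = nowUs;
        buf[count++] = b;
        if (count == FRAME_LEN) {
            count = 0;
            return true;
        }
        return false;
    }
};
```

If reception starts in the middle of a message, the leftover bytes are thrown away at the next idle gap and the following message is received whole. This per-byte decision is exactly the kind of logic a plain DMA transfer cannot make on its own.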

But I could be wrong about all this. Maybe someone will show a way to use DMA in the serial code? To consider merging such a drastic change to code so widely used for such a large variety of use cases would require a lot of testing.
 
Sometimes I wonder what the Grumpy Pizza guy is up to these days. Years ago he wrote so many extraordinarily detailed messages on Arduino's (now defunct) developer mail list about multitasking, but sadly none of those ideas ever seems to gain any traction in the Arduino ecosystem.

Is Onehorse still selling STM32L-based boards? I've kind of lost track of them in recent years....

They have been working together on github.com/GrumpyOldPizza/ArduinoCore-stm32wb - selling on Tindie.
 
Dear members

I am using a Teensy 3.5 in the Arduino environment. Currently I read a string of 13 bytes and then do the processing.
Is it possible to use DMA to auto read the UART string and store it at a memory location using DMA, so that my program just reads the memory location / variable and does the processing?
This will make my program efficient for adding other work and stop wasting time reading the UART.
Please advise, and if possible please send me some sample code.
Regards
Arup

I wrote my own code for DMA Rx and Tx serial UART on Teensy 4.1, maybe this helps you write yours.

The HardwareSerial source code was my starting point, but I have named the classes DMA_UARTs. The code is only for Teensy 4.1.

The default interrupt-based HardwareSerial code did not work for me as-is because I have serial data coming in in unpredictable bursts, of unpredictable length, at pretty high baud rates, while the Teensy runs other tasks that can block for tens of microseconds and/or need higher-priority service and take multiple microseconds to complete their ISR. In short: the characters hit the RX input pin at a rate on the order of microseconds per character, I cannot stop them bombarding the Teensy, and I cannot afford to miss or ignore any character. So my issue was that I would start to miss characters whenever the CPU was busy with other work that is more important or more deterministic at that moment. Think for example of the Profibus protocol at 1.5 Mbit/s. Or 12 Mbit/s, which is also part of that industry standard.
I also wanted both 8-bit and 9-bit per character capability on each UART (for yet another protocol, but on just one of the many Teensy UARTs).
What this code implements is circular DMA buffers. When a character arrives in the UART, the DMA channel writes 16 bits into the DMA buffer. The lowest 9 bits are the serial payload; the higher bits will include a zero. Arrival of a character can be sensed by observing that the default value of 0xffff in the buffer location was overwritten by the DMA engine. Arrival can also be sensed by observing the DMA pointers.
Also implemented are options for doing RX and TX over the same Teensy 4.1 pin, plus RS485 RX/TX direction control. Not implemented: LIN.
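
For readers who want the idea of the 0xffff marker without opening the zip, here is a small host-side model of it. This is not the attached code and contains no Teensy 4.1 register or DMA setup. SentinelRing, RING_SLOTS and simulateDmaWrite() are invented for illustration, with simulateDmaWrite() standing in for the DMA engine. It relies on the point above that a real received word always has at least one zero in the upper bits, so it can never equal 0xffff.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t RING_SLOTS = 64;     // hypothetical buffer size
constexpr uint16_t EMPTY_SLOT = 0xFFFF;    // default value that marks "nothing received here yet"

// Host-side model of a circular buffer that a DMA channel fills with 16-bit words.
struct SentinelRing {
    volatile uint16_t slots[RING_SLOTS];
    std::size_t readIdx = 0;   // next slot the CPU will look at
    std::size_t dmaIdx = 0;    // model only: stands in for the DMA write pointer

    SentinelRing() {
        for (std::size_t i = 0; i < RING_SLOTS; ++i) slots[i] = EMPTY_SLOT;
    }

    // Model only: what the DMA engine would do for each received character.
    void simulateDmaWrite(uint16_t word) {
        slots[dmaIdx] = word;
        dmaIdx = (dmaIdx + 1) % RING_SLOTS;
    }

    // Returns true and stores the 9-bit payload in out if a new character is waiting.
    bool read(uint16_t &out) {
        uint16_t w = slots[readIdx];
        if (w == EMPTY_SLOT) return false;   // marker still in place: nothing new
        out = w & 0x01FF;                    // lowest 9 bits carry the serial payload
        slots[readIdx] = EMPTY_SLOT;         // restore the marker for the next lap
        readIdx = (readIdx + 1) % RING_SLOTS;
        return true;
    }
};
```

The reader never needs an interrupt per character; it only has to poll often enough that the writer does not lap it, which is the trade-off that sets the required buffer size.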

View attachment DMA_UARTs_Nov2022_SD.zip
 
I agree it was a rather low-information question to start with, yet I see the replies are very informative now, for which I thank you all.
Sicco thank you so much for that amazing work. I will study till my eyes fall out lol!

And no there will be no burning of a human, just a figure of speech.
 