LPSPI in Teensy 4.0/4.1, maximum clock frequency

DrM

Moderator Edit: 18 messages from another thread were moved here. See msg #19 for first new message starting this thread

-----------------------------------------------------------------------------------------


@PaulStoffregen
Several times I've considered making an ADC shield. Might still do it. Recently have been playing with a relatively cheap (~$4) single channel 16 bit ADC chip and an 8:1 mux, both controlled with FlexIO. Input settling time is a difficult problem which leads to channel crosstalk, so not a simple thing, especially if running at speeds like 300-500 ksamples/sec. Of course there are chips with the mux built in, but the cost goes up quickly.

I hope you can understand with the built-in ADC, we simply have to live with whatever NXP provides. I really wish they would have done better. They probably do too, since they could sell more chips. But it's not so simple. Generally speaking, the improvements in silicon which allow digital circuits to run faster tend to make analog circuits less precise. With the much older & slower silicon used by Teensy 3.2, they were able to make the ADC better because the transistors were different, and especially certain analog circuits like a differential pair amplifier can be more precisely made than in chips meant for faster digital circuitry.

First, an aside: Paul, have you tried that ADC we talked about some time ago? That one has a mux and can share one opamp for all of the channels. I will post or send you the design files if anyone is interested, with the proviso that it is work in progress. I wanted to make another pass over package sizes and layout. Admittedly it might not be $4 all counted, but it seems pretty good.

Also, aside, I will be posting a really good analog input, maybe in the next few days, with a custom InAmp front end and high precision differential ADC. It connects by SPI.

Okay, now to the bit about why the analog inputs on microcontrollers are the way they are. I put a lot of time into investigating this recently. It is pretty interesting, in some ways. So far I have not found an exception to the following.

(Paul - is there any way to repost this to its own topic? I think it could be of general interest. But read on, I leave it to your wisdom. Please do let me know what you think.)

A lot of what goes on in the internal analog inputs seems to be driven by a few factors; space (including the size of the sampling capacitor vs sampling noise), cost on silicon, single ended supply and market (who it is designed for).

a) Space on the chip and the size of the sampling capacitor versus precision

Noise in any sampling input is limited by the sampling capacitor, according to the famous kT/C ("kay tee over sea") noise,

v >= sqrt( k T / C)

where kT at room temperature is about 25 meV. Multiply C by 6.2E18 e/coulomb, and you see you have units of volts, i.e. sqrt(V^2).

Now, let's say we want 16 bits on a 3.3V input, so one LSB is about 50uV. The sampling capacitor noise needs to be smaller than that, so let's say about 8 to 10pf at a minimum.

How about 12 bits? Then we need noise small compared to 0.8mV, and the noise for a 1pf capacitor is about 60uV.

The size of an internal cap is real estate in a chip. That is one reason 12 bits is common, and 16 bits less so.

In fact, as it turns out, the K20 16 bit inputs have 8pf to 10pf sampling capacitors and the 12 bit inputs have 4pf to 5pf.
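
As a rough numerical check of the kT/C figures above, here is a minimal Python sketch. It assumes T = 300 K and a 3.3 V full-scale range, and the function names are illustrative only (they are not from any library or datasheet).

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K

def ktc_noise_volts(c_farads, temp_kelvin=300.0):
    """RMS sampling noise sqrt(kT/C) in volts (assumes 300 K by default)."""
    return math.sqrt(K_BOLTZMANN * temp_kelvin / c_farads)

def lsb_volts(nbits, vref=3.3):
    """Size of one LSB for an ideal nbits converter with full scale vref."""
    return vref / 2 ** nbits

# Capacitor values taken from the discussion above.
for nbits, c in [(16, 10e-12), (16, 8e-12), (12, 1e-12), (12, 4e-12)]:
    noise = ktc_noise_volts(c)
    lsb = lsb_volts(nbits)
    print(f"{nbits} bits, C = {c * 1e12:.0f} pF: "
          f"LSB = {lsb * 1e6:.1f} uV, kT/C noise = {noise * 1e6:.1f} uV")
```

With these assumptions, a 10 pF capacitor gives roughly 20 uV of noise against a 50 uV LSB at 16 bits, and 1 pF gives roughly 64 uV against a 0.8 mV LSB at 12 bits.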


b) Kickback

The analog input almost always involves a switched sampling capacitor. When the switch closes to connect the capacitor to the input, current has to be supplied by the source, to charge the capacitor to the required precision of the input voltage.
This has several important consequences:

i) The charge drawn by the sampling capacitor becomes a current. The peak of that current spike works out to about V/R, where R is the impedance of whatever comes in front of the capacitor, including the source impedance of the thing you want to measure.

ii) The sampling capacitor needs time to reach the voltage of your input,

V_cap / V_input = 1 - exp(-t/RC).

That means that to get to the precision you need for n bits, you need a sampling time > nbits x ln(2) x RC

For a 10 pf capacitor, a 50 ohm source, and 16 bits, we need t > 16 x 0.7 x 10 pf x 50 ohm = 6 nsecs. Not bad.

iii) But faster sampling with lower source impedance means more current draw, and more kickback.

In the above example, 10pf cap, 16 bits, we might need to supply 3V/50 = 60mA.
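
The two estimates above (settling time and peak charging current) can be reproduced with a short Python sketch. It assumes a simple single-pole RC model and a worst-case full step onto an empty capacitor. The function names are made up for illustration.

```python
import math

def min_sampling_time(nbits, r_ohms, c_farads):
    """Time for an RC charge curve to settle to within 1 part in 2**nbits."""
    return nbits * math.log(2) * r_ohms * c_farads

def peak_kickback_current(v_step, r_ohms):
    """Initial current when a discharged capacitor sees a step of v_step volts."""
    return v_step / r_ohms

# Values from the example above: 16 bits, 50 ohm source, 10 pF, 3 V step.
t = min_sampling_time(16, 50.0, 10e-12)
i = peak_kickback_current(3.0, 50.0)
print(f"sampling time > {t * 1e9:.1f} ns, peak current ~ {i * 1e3:.0f} mA")
```

This gives about 5.5 ns and 60 mA, in line with the rounded numbers quoted above. A real input also has switch resistance and parasitic capacitance, which this sketch ignores.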

You might say, well I need to drive the ADC with an opamp. But that is still a lot of current for many opamps.

So the way it is usually done is to put an RC between the opamp and the sampling capacitor. Now the C in the RC acts as a charge reservoir and the opamp only needs to supply a smaller current to replenish the charge reservoir.

Here is what it looks like if you only use an opamp. See the large spikes when the switch S1 closes.

SAR_opamp_R100_cropped.jpg



And here is what it looks like with an RC between the opamp and sampling input. Now the red is current through C2, the cap in the RC in front of the input. Notice the current comes from that capacitor instead of being drawn as kick back through the opamp (or whatever else you might have attached if you were trying to use the analog input as a direct input).

SAR_opamp_RC20x1_currents_cropped.jpg




c) Cost and market

The above with the RC, is the right way to do it. But to do that you need a negative supply for the opamp. You also need an opamp.

Here is what manufacturers choose to do instead. Honest to goodness, this is from the K20 datasheet (the Teensy 3.2).

Notice the resistor Radin in front of the ADC SAR engine (the sampling cap is inside the SAR).


K20-adcinput.png




Radin is 2K. So the current drawn by the sampling capacitor cannot be larger than about 1.6mA.

But now we need a longer sampling window, t > 16bits x 0.7 x 10 pf x 2 kohm = 220 nsecs. That still fits the sampling speed for the Teensy.

And by the way, this is why they tell you not to use a large source impedance. If you put another large resistor in front then you might need to be careful about the length of your sampling window. That is the story behind many of those forum posts about funny readings for a 10k thermistor in a 20k divider.
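
To see how quickly extra source resistance eats into the sampling window, here is a hedged Python sketch of the residual settling error. It uses the same single-pole RC model as the text, with Radin = 2 kohm and a 10 pF sampling capacitor taken from the discussion. The 250 ns window and the source resistances are hypothetical values chosen only for illustration, and any external capacitance at the pin is ignored.

```python
import math

def settling_error_lsb(t_sample, r_total, c_sample, nbits):
    """Residual RC settling error, in LSBs, for a worst-case full-scale step."""
    return 2 ** nbits * math.exp(-t_sample / (r_total * c_sample))

R_ADIN = 2e3       # ohms, internal series resistor from the text above
C_SAMPLE = 10e-12  # farads, sampling capacitor from the text above
T_WINDOW = 250e-9  # seconds, hypothetical sampling window

for r_source in (0.0, 5e3, 10e3):  # hypothetical source resistances
    err = settling_error_lsb(T_WINDOW, R_ADIN + r_source, C_SAMPLE, 16)
    print(f"R_source = {r_source / 1e3:.0f} k: error ~ {err:.1f} LSB")
```

With no added source resistance the error is below one LSB at 16 bits, but a few kilohms in front of the pin leaves an error of thousands of LSBs in the same window, unless the window is lengthened.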

Why do they do this? They could just give us a bare input, and let us worry about driving the SAR.

What that large resistor does, is at least two things. For a packaged source device, and slow measurements, it might be good enough. It also makes it possible for a hobbyist, to be able to use the input in some instances, without thinking about it too much.

The problem with that approach is that it also makes it impossible to use the ADC input in a more correct way. And there is only a limited set of circumstances where it works to give a realistic voltage.

So that is the story of analog inputs on MCU chips.
 
Actually, the internal ADC, DAC peripherals are far less interesting for me these days.

Gigahertz clocks and 480MHz USB are far more important.

What is still missing with the NXP parts is a 50MHz (or better yet 100MHz) SPI.
 
FlexSPI does this. But it's only available on the bottom side pads, because memory chips are the most common use for such clock speeds.

I know you're focused on ADC chips. Whether any specific ADC chip could actually work with FlexSPI (if you went to the trouble to make the physical connection) is still an open question. FlexSPI is quite complicated to use. It really is more or less designed to use memory chips. So far as I know, nobody has really reported any serious attempt to use it with any non-memory chips.
 
"What is still missing with the NXP parts is a 50MHz (or better yet 100MHz) SPI."

SPI on T4 is rated up to 132MHz. Or am I missing something?
 
@PaulStoffregen Actually, I have two designs ready for fab. 1) Single photon tagging and counting with picosecond precision using a time to digital converter (TDC), and 2) An analog output with a 16 bit 1MSPS DAC and an output stage that can drive capacitive loads.
(Apologies for the plug, but... I need sponsors to get the boards made in case anyone is interested.)

Re the ADCs, it is true, I invested a lot of time in understanding them in detail and I do have a few more ADC projects in the works. But I think the above are super important for finishing out the basic set of boards that are needed for a materials or physics laboratory. If there is a sponsor, I want to do those next.

@jmarsh I think the i.MX RT documents say the SPI should not be run faster than 30MHz. But is it true that it can do 132MHz? If so, why do they want it throttled back?
 
Not sure where the 30MHz information comes from but on page 1025 of the IMXRT1060 reference manual, the System Clock Frequency Values table shows a max of 132 MHz for LPSPI.
 
Still, why is that? What happens when you don't? Is it heating? Or, maybe it just can't respond? Perhaps a parasitic impedance somewhere?
 
Strange and confusing that they use the term "frequency of operation (fop)" rather than using f(SCK).

There are questions on the NXP community forum about this, and the NXP response is that the limit is 30 MHz, with no explanation regarding the conflict. There is one question about LPSPI on a different processor where the fSCK max is 40 MHz but the "fop" max is 10 MHz, and the NXP response was that up to 40 MHz could be used for a loopback test, but only 10 MHz otherwise.
 
That's a really good clue actually. Do we have equivalent circuits for the SPI ins and outs? Or perhaps something about their capacitance, impedance etc. and max current?
 
Perhaps, yes. I also found the question below, which says that people have successfully used 60 MHz, to which there has been no answer.

The Data Sheet IMXRT1060CEC, on page 70, Table 57, shows a footnote number 1,

"Absolute maximum frequency of operation (fop) is 30 MHz. The clock driver in the LPSPI module for f_periph must be guaranteed this limit is not exceeded."

There are comments on other forums that the device has been run at 60MHz, and in fact I tried that briefly and it does seem to transfer data ok.

So, why must it be guaranteed that the 30MHz limit is not exceeded?

Are there any other ways to achieve a 60MHz SPI with the i.MX RT 1060 processors? If so, can you please provide a working example.
 
You could always use FlexIO instead, configured to operate as SPI. It can definitely go higher than 30MHz.
 
(Perhaps this should be copied to another thread where the subject is the LPSPI)

Recall that the datasheet says:
"Absolute maximum frequency of operation (fop) is 30 MHz. The clock driver in the LPSPI module for fperiph must be guaranteed this limit is not exceeded."

The questions I asked are, (1) "Why?" and, (2) "What happens when it is run faster?"

And, here is the answer from NXP:
"NXP can ensure the SPI transmission rate of 30MHZ can meet various hard conditions. In fact it can exceed 30MHZ, but NXP does not guarantee stability under various extreme conditions."

So, that seems a little different from the datasheet. Nonetheless, that sounds electrical. Perhaps we can design our external circuit to manage that instability, if only we had a little more information.


Does anyone among us have some insight into why or how it becomes "unstable" at higher speeds?
 
More on the continuing adventures with NXP support.

Here is the message that arrived, taking back what they wrote in the previous message:

Please ignore what I said in previous Email "In fact, the SPI rate can exceed 30MHz, but NXP does not guarantee the stability of SPI under various extreme conditions."

Please follow the SPI description as stated in the NXP documentation IMXRT1060CEC .

I replied by repeating the question that I sent them at the beginning of the support ticket.

What limits the LPSPI to 30MHz? And what happens when we run it faster?

I am trying to find out whether there is an electrical issue, an internal throughput issue, or perhaps a heat issue.
 
For the LPSPI in the Teensy 4.x (i.MX RT 1060), what is the maximum clock frequency? And, what happens when the SPI clock is run at a faster frequency?

The i.MX RT1060 datasheet, page 70, says the maximum for the SPI clock is 30MHz with a kind of "do not exceed" notice. See the image inserted below.

And yet, there are reports on these forums of the clock being run faster than 30 MHz.

So, the questions are,

(1) What is the origin of the 30MHz limit?

(2) What happens when the clock is run faster than that?

Some possibilities that come to mind; maybe it is electrical, maybe the device cannot keep up with a faster clock, or perhaps the reason is heating.

The question has been sent to NXP support. When we receive a cogent reply, I will try to remember to post it here. Meanwhile, does anybody know something about this?



1738941936900.png
 
"(1) What is the origin of the 30MHz limit?"

NXP, obviously, though an inconsistency like "F(op)" versus "F(SCK)" kinda makes it look like copypasta.

"(2) What happens when the clock is run faster than that?"

An NXP employee or contractor risks a reprimand or poor performance review, if there's even the slightest implication they advised you to do so! Seriously, do you really imagine anyone at NXP is going to give any answer other than "Thou Shalt Limit To Published Specs"?
 
But to talk seriously about the actual underlying tech: generally speaking, timing in synchronous (clocked) digital circuitry is all about complying with setup and hold time requirements. If you're not familiar with flip flop setup and hold time, many good pages online and in textbooks explain this very fundamental digital logic concept, though some of them deep dive into the transistor-level design of flip flops. Do your homework on that before reading the rest of this message!

Hold time can be a particularly thorny problem. If your combinational logic circuitry has less propagation delay than the destination's hold time, that destination has the correct input long before the clock edge that causes the flip flop to capture it. But then right after that clock edge, the input starts to change too soon and the flip flop can end up storing the wrong data, even though you did everything correctly before the clock edge!

A common misconception of many novices is all digital speed problems can be solved by just clocking slower. But hold time doesn't work that way. You could clock only once per day, but if the flip flop requires 5ns hold time and your circuitry made the input from other signals that change on the same clock edge and delivers it in less than 5ns, your flip flop can acquire the wrong data.
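
The point that slowing the clock fixes setup problems but not hold problems can be shown with a small Python sketch. This is a deliberately simplified model: data is launched and captured on the same clock edge, with no clock skew. The 5 ns hold time comes from the example above, while the setup time and logic delay are hypothetical values.

```python
def setup_ok(clock_period, data_delay, t_setup):
    """Data launched at one edge must arrive t_setup before the next edge."""
    return clock_period - data_delay >= t_setup

def hold_ok(data_delay, t_hold):
    """Data must not change until t_hold after the capturing edge.

    Note that the clock period does not appear here at all.
    """
    return data_delay >= t_hold

T_HOLD = 5e-9      # 5 ns hold requirement, as in the example above
T_SETUP = 2e-9     # hypothetical setup requirement
DATA_DELAY = 3e-9  # hypothetical logic delay, shorter than the hold time

for period in (10e-9, 1e-6, 86400.0):  # 100 MHz, 1 MHz, once per day
    print(period,
          setup_ok(period, DATA_DELAY, T_SETUP),
          hold_ok(DATA_DELAY, T_HOLD))
```

In this model the setup check passes at every clock period tried, while the hold check fails at all of them, because the hold condition depends only on the data delay and the hold requirement.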

FPGA vendors pour a lot of work into designing flip flops with zero hold time, so their products are easier to use. But that's certainly not the norm. Simply designed flip flops have hold time.

SPI is meant to be a simple and easy to use protocol. It deals with the hold time problem by having the transmitter change its output on one edge of the clock and the receiver sample on the opposite edge. In the case of zero delay for the signals, this means 50% of the clock cycle is available to satisfy the input flip flop's setup time, and the other 50% satisfies the hold time requirement. At 30 MHz, that's 16.7ns for setup and 16.7ns for hold.

But simple is far from optimal for speed. The norm for most flip flops is more setup than hold required. And usually you have signal propagation delays, like the output drivers on the transmitting chip and the input buffers and ESD protection on the receiving chip.

At least for today, I'm not going to speculate about NXP's internal design details. But I will quickly mention T(su) in the table has 2 different specs, depending on a specific setting. Maybe that would be a useful place to focus your attention? Just remember, datasheets are sales pitches which are meant to sell you the chip, so some diligence is needed to discover the trade-offs.

Anyway, you asked "does anybody know something about this?", so here is the sort of answer you'll (probably) never hear from anyone at NXP. I know you just want a solution, which this more theoretical explanation of the tech is not, but if you do dive into optimizing, and especially if you verify with high bandwidth oscilloscope captures of the real waveforms, hopefully this sort of deeper understanding of the timing issues helps.
 
Yes, technical understanding helps, it is the prerequisite. Their use of the word "stability" now makes sense. The explanation of setup and hold time and the interaction with propagation time seems a little bit confusing. Let's see if I can put it in my own words.

It seems the basic idea is setup and hold times are the times required before and after the clocked level transition and propagation time is what you do after the clock. At some point within the propagation time the level is sampled and acted on. Now, the setup and hold time are electrical characteristics. The propagation time is probably in terms of the number of clock cycles. So, by clocking too fast it may be possible to cause the level to be sampled before the hold time is completed, and the resulting indeterminacy of its input is called an "instability".

Is that about right?

In other words, the clock rate of the LPSPI is limited by the internals of the LPSPI and normally not by external characteristics.

So then, we still have one question. There are some reports on these forums of the LPSPI running faster. Presumably if those are sufficiently faster than 30MHz and we look at them carefully, we will find there are data errors. Yes?
 
So, at last, something like an answer from NXP.

It turns out that the 30MHz limit is empirical. They tested it over some range of conditions and determined that this is the fastest speed at which it remained reliable over that set of test conditions. They do not describe the conditions or results beyond stating that one number.

Here is the text verbatim:

What limits the LPSPI to 30MHz is a combination of various factors.

NXP designer determines this value based on many tests. For example, STA testing, factors such as different temperature, humidity, clock, power supply conditions ( including other conditions)can affect the performance of LPSPI. After conducting comprehensive tests under various conditions, the minimum supported rate(LPSPI) is selected and provided to users.

Therefore, you can use the LPSPI module at 30MHz, and it can ensure stable and reliable communication in master mode or slave mode, half - duplex or full - duplex working modes, as well as within the temperature and humidity range and operating voltage range( and other conditions) supported by the chip.
 
"Is that about right? In other words, the clock rate of the LPSPI is limited by the internals of the LPSPI and normally not by external characteristics."

Partially about right.

I'll try once more, but I don't have a lot of time for writing today.

For MISO in SPI master mode 0 or 3, the receiver inside Teensy needs the data from your ADC chip to arrive at least the setup time before the rising edge of SCK, and to remain the same until at least the hold time after it. Both are with respect to the rising edge.

Maybe SCK is connected directly with a short trace, adding very little delay, or maybe you have a buffer which causes the ADC chip to hear SCK a few nanoseconds later?

If your ADC chip follows normal SPI convention, it will change the data it transmits on MISO when it hears the falling edge. There is probably some propagation delay within the ADC chip, limiting how quickly it's able to get the new data onto MISO after it hears that falling edge on SCK. Maybe MISO is directly connected with little propagation delay, or maybe there's a buffer which adds more nanoseconds?

Teensy needs the MISO signal to be correct at its MISO pin before the rising edge of SCK, by at least the setup time.

Let's go with the 10ns setup time spec from Table 57 (msg #19). Let's imagine the PCB trace is long and driving it costs 1ns, but at least there's no buffer chip adding even more delay. If you did have a buffer chip, its propagation delay would matter! And since we're not talking about any specific ADC chip, let's just make up a spec of 8ns it needs to get its MISO output correct.

Those 3 all add up to 19ns. That's the minimum time between SCK falling edge and SCK rising edge. With a 50% duty cycle clock, that means the shortest clock period for those specs would be 38ns, or ~26 MHz.
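
The same budget can be written as a small Python sketch, which makes it easy to try other delays. The function and parameter names are illustrative only. The 10 ns setup time is the Table 57 figure mentioned above, while the 1 ns trace delay and 8 ns ADC output delay are the made-up values from this example.

```python
def max_sck_hz(t_setup, t_interconnect, t_output_valid, low_fraction=0.5):
    """Highest SCK frequency for which the falling-to-rising part of the
    cycle still covers output delay + interconnect delay + receiver setup.

    low_fraction is the fraction of the period between the falling edge
    and the next rising edge (0.5 for a 50% duty cycle clock).
    """
    time_needed = t_setup + t_interconnect + t_output_valid
    period = time_needed / low_fraction
    return 1.0 / period

f_max = max_sck_hz(t_setup=10e-9, t_interconnect=1e-9, t_output_valid=8e-9)
print(f"max SCK ~ {f_max / 1e6:.1f} MHz")
```

For the numbers above this works out to about 26.3 MHz. Adding a buffer's propagation delay to `t_interconnect` lowers the result further, which is the point about buffers made earlier.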

Hold time isn't a concern. But if you imagine a bizarro universe where superman is evil and SPI chips change their output on the same clock edge as they receive, and also imagine this ADC chip is very fast to change its MISO output within 2ns of the clock, then hold time could be a huge problem. Teensy has a 2ns hold time requirement in Table 57. Fortunately SPI protocol has transmitter change on 1 clock edge and receivers sample on the other. That solves the nightmare problems of hold time. But the trade-off is the delays for valid data output + propagation between chips + receiving setup time all get compressed into only half the clock cycle. Sum of those 3 are usually what limits your maximum clock speed.

Hopefully this helps you see the total picture depends on the LPSPI internals, the other SPI device, and the interconnect between them (especially if using buffers or long wires or multiple SPI chips). It's not only LPSPI internals.

I hope this non-NXP answer helps, and it really is good to see NXP gave at least some additional answer even if they don't explain the details much.
 