Teensy 3.0 as logic analyzer, instruction timing

Status
Not open for further replies.

wangnick

Well-known member
Dear all,

I'm working on rewriting the Arduino Logic Analyzer from Andrew Gillham (https://github.com/gillham/logic_analyzer, see also http://letsmakerobots.com/node/31422) to run on the Teensy 3.0.

I'm already able to capture 8 channels at up to 13.7 MHz into 12 KB of SRAM, using:
Code:
    byte logicdata[MAX_CAPTURE_SIZE];
    unsigned int logicIndex;
    ...
    // 7 cycles on Teensy at 96 MHz: 72.916 ns, 13.7 Msps 
    while (logicIndex) {
      logicdata[--logicIndex] = CHANPORT;
    }

I measure this by having an Arduino generate a 1 MHz square wave on one port, in which one high pulse is held low every 256 us; at that moment I set a second port high.

However, I seem to observe that my tight Teensy 3.0 loop starts out taking 7 cycles per sample and then degrades to 8 cycles at higher (or lower?) memory addresses:
olg-teensy3.png


Real time between cursors is 256 us. OLG shows a delta between Cursor 1 and Cursor 2 of 249.08 us, and between Cursor 2 and Cursor 3 of 223.64 us. OLG believes that a sample lasts 72.916 ns, so we captured 3416 samples between C1 and C2 and 3067 samples between C2 and C3. So between C1 and C2 each sample actually took 74.94 ns on average (that's 7.2 cycles), and between C2 and C3 each sample took 83.47 ns on average (that's 8 cycles).
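The arithmetic above can be cross-checked with a small host-side sketch. It is not part of the analyzer firmware, and the function name and parameters are made up for illustration. It assumes the client converts sample counts to time using a fixed per-sample period.

```cpp
#include <cassert>

// Effective CPU cycles per sample between two cursors.
//   shownDeltaUs    : cursor delta displayed by the client (us)
//   assumedPeriodNs : sample period the client assumes (ns)
//   realDeltaUs     : true time between the two events (us)
//   cpuMHz          : CPU clock (MHz)
double cyclesPerSample(double shownDeltaUs, double assumedPeriodNs,
                       double realDeltaUs, double cpuMHz)
{
    assert(shownDeltaUs > 0.0 && assumedPeriodNs > 0.0);
    // Number of samples the client counted between the cursors.
    double samples = shownDeltaUs * 1000.0 / assumedPeriodNs;
    // True time spent per sample.
    double realPeriodNs = realDeltaUs * 1000.0 / samples;
    // ns * (cycles per us) / 1000 = cycles.
    return realPeriodNs * cpuMHz / 1000.0;
}
```

With the figures above, cyclesPerSample(249.08, 72.916, 256.0, 96.0) comes out near 7.2 and cyclesPerSample(223.64, 72.916, 256.0, 96.0) near 8.0.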

Any idea why the tight loop initially takes 7 cycles and later 8 cycles?

Also, I've been padding out the tight loops with NOP operations for lower sampling speeds. For instance, this loop usually takes 10 cycles:
Code:
    while (logicIndex) {
      // Three NOPs don't work, they make the loop longer than 10 cycles
      logicdata[--logicIndex] = CHANPORT^waitCount;
      __asm__("NOP\n\tNOP");
    }

The following loop usually takes 20 cycles:
Code:
    while (logicIndex) {
      logicdata[--logicIndex] = CHANPORT;
      __asm__("NOP\n\tNOP\n\tNOP");
      __asm__("NOP\n\tNOP");
    }

Why does one NOP sometimes take 1 cycle and sometimes 2? Or does branch prediction get worse when the jump target lies further away?

Kind regards,
Sebastian

PS: I also observe quite a lot of noise on Channel 0. Any clue what might be causing this at those frequencies? I've already tried other ports, on both the Teensy and the Arduino, but in vain. Here is the setup:
DSC_9181.jpg
 
There are a number of reasons you will observe jitter in these types of systems when using advanced processors.

The ARM processor in the Teensy 3.0 runs quite fast (48 to 96 MHz). Flash memory can't run that fast, so the flash controller in the MCU includes a small cache. Timing will vary depending on whether or not an instruction is already in the cache (look up the FMC, the flash memory controller, here: http://cache.freescale.com/files/32...64M72SF1.pdf?&Parent_nodeId=&Parent_pageType= )

The processor also has a small pipeline, so it is possible that successive passes through your short loop don't all repeat the same pipeline fetch pattern. I don't think this is what wangnick is seeing.

USB (and the microsecond timer?) generates interrupts on the processor. These take larger chunks of time (100 ns?), so they are probably not what he is seeing either.

It is possible to disable interrupts, and with more effort, probably possible to disable the cache (or copy and run code from RAM).
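A minimal sketch of the first suggestion, assuming the capture loop from the original post: interrupts are switched off for the duration of the capture using the Arduino noInterrupts()/interrupts() calls. The function name captureNoIrq and the READ_CHANNELS macro are hypothetical, and reading port D via GPIOD_PDIR is only an assumption about what CHANPORT maps to. The host-side stand-ins exist only so the sketch compiles off-target. Note that USB and the timer tick are not serviced while interrupts are off, so this is only reasonable for short captures.

```cpp
#include <cstddef>
#include <cstdint>

#ifdef ARDUINO
#include <Arduino.h>                           // noInterrupts() / interrupts()
// Assumption: the 8 channels sit on the low byte of port D.
#define READ_CHANNELS() ((uint8_t)GPIOD_PDIR)
#else
// Hypothetical host-side stand-ins so the sketch compiles off-target.
static inline void noInterrupts() {}
static inline void interrupts() {}
static uint8_t fakePort = 0;                   // counts up on every read
#define READ_CHANNELS() (fakePort++)
#endif

// Fill buf from the end towards the start, as in the original loop,
// with interrupts disabled so no ISR can stretch a sample.
void captureNoIrq(uint8_t *buf, size_t n)
{
    size_t i = n;
    noInterrupts();
    while (i) {
        buf[--i] = READ_CHANNELS();
    }
    interrupts();
}
```

This only removes interrupt latency from the picture. It does nothing about flash cache effects, which would need the loop to run from RAM or the cache to be configured differently.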