
Thread: Teensy 3.0 as logic analyzer, instruction timing

  1. #1


    Dear all,

    I'm working on rewriting the Arduino Logic Analyzer from Andrew Gillham (https://github.com/gillham/logic_analyzer, see also http://letsmakerobots.com/node/31422) to run on the Teensy 3.0.

    I can already capture 8 channels at up to 13.7 Msps into 12 KB of SRAM, using:
    Code:
        byte logicdata[MAX_CAPTURE_SIZE];
        unsigned int logicIndex;
        ...
        // 7 cycles on Teensy at 96 MHz: 72.916 ns, 13.7 Msps 
        while (logicIndex) {
          logicdata[--logicIndex] = CHANPORT;
        }
    I measure this by having an Arduino generate a 1 MHz square wave on one pin. Every 256 us, one pulse that would normally be high is held low, and at that moment I set a second pin high as a marker.

    However, my tight Teensy 3.0 loop seems to start out at 7 cycles per iteration and then degrade to 8 cycles at higher (or lower?) memory addresses:


    The real time between adjacent cursors is 256 us. OLG shows a delta of 249.08 us between Cursor 1 and Cursor 2, and 223.64 us between Cursor 2 and Cursor 3. OLG assumes that each sample lasts 72.916 ns, so we captured 3416 samples between C1 and C2 and 3067 samples between C2 and C3. Dividing the real 256 us by those counts, a sample actually took 74.94 ns on average between C1 and C2 (that's 7.2 cycles) and 83.47 ns on average between C2 and C3 (that's 8 cycles).
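
    The arithmetic above can be sketched as a few helper functions. This is only an illustration: the function names are made up for this example, and the 96 MHz clock and the 7-cycle nominal loop are the values assumed in this post.

```cpp
#include <cassert>
#include <cmath>

// Nominal sample period in ns for a capture loop that takes `cycles`
// CPU cycles per iteration at a clock of `cpu_hz`.
double nominal_period_ns(double cycles, double cpu_hz) {
    assert(cpu_hz > 0.0);
    return cycles / cpu_hz * 1e9;
}

// Number of samples between two cursors, recovered from the time delta the
// client displays and the sample period the client assumes.
long samples_between(double displayed_us, double assumed_period_ns) {
    assert(assumed_period_ns > 0.0);
    return std::lround(displayed_us * 1000.0 / assumed_period_ns);
}

// Average CPU cycles per sample, given the real elapsed time between the
// cursors (256 us here) and the number of samples captured in that time.
double measured_cycles(double real_us, long samples, double cpu_hz) {
    assert(samples > 0);
    return real_us * 1e-6 / static_cast<double>(samples) * cpu_hz;
}
```

    With the numbers from this post, samples_between(249.08, nominal_period_ns(7, 96e6)) gives 3416 samples, and measured_cycles(256.0, 3416, 96e6) comes out at about 7.2 cycles; the C2 to C3 interval gives 3067 samples and about 8.0 cycles.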

    Any idea why the tight loop initially takes 7 cycles and later 8 cycles?

    Also, I've been padding out the tight loops with NOP operations for lower sampling speeds. For instance, this loop usually takes 10 cycles:
    Code:
        while (logicIndex) {
          logicdata[--logicIndex] = CHANPORT^waitCount; // Three NOPs don't work, they make the loop longer than 10 cycles
          __asm__("NOP\n\tNOP");
        }
    The following loop usually takes 20 cycles:
    Code:
        while (logicIndex) {
          logicdata[--logicIndex] = CHANPORT;
          __asm__("NOP\n\tNOP\n\tNOP");
          __asm__("NOP\n\tNOP");
        }
    How come one NOP sometimes takes 1 cycle and sometimes 2? Or does the branch prediction get worse when the jump address lies further away?

    Kind regards,
    Sebastian

    PS: I also observe quite a bit of noise on Channel 0. Any clue what might be causing this at those frequencies? I've already tried other ports, both on the Teensy and on the Arduino, but in vain. Here is the setup:
    [Attached image: DSC_9181.jpg]
    Last edited by wangnick; 11-18-2012 at 09:48 AM.

  2. #2
    Hey, did you ever get this working well? I'm very interested in using my Teensy 3.0 as a logic analyzer.

  3. #3
    Jp3141 (Senior Member; joined Nov 2012; 486 posts)

    There are a number of reasons you will observe jitter in these types of systems when using advanced processors.

    The ARM processor in the Teensy 3.0 runs quite fast (48 to 96 MHz). Flash memory can't run that fast, so there is a small cache in that portion of the MCU. Timing will vary depending on whether or not the instruction is already in the cache (look up the FMC here: http://cache.freescale.com/files/32b...rent_pageType= )

    The processor also has a short pipeline, so it is possible that successive passes through your short loop don't all repeat the same pipeline fetch pattern. I don't think this is what wangnick is seeing, though.

    USB (and the microseconds timer?) generates interrupts on the processor. These take larger chunks of time (100 ns?), so they are probably not what he was seeing either.

    It is possible to disable interrupts, and with more effort, probably possible to disable the cache (or copy and run code from RAM).
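
    A minimal sketch of those two suggestions, untested on hardware: the capture loop runs with interrupts disabled and is marked FASTRUN so that it executes from RAM. It assumes the Teensyduino core (which provides FASTRUN and GPIOD_PDIR) and that the eight channels are wired to the low bits of port D; the function name captureNoInterrupts and the 12288-byte buffer size are made up for this example. The #else branch only provides stand-ins so the same file also compiles on a PC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#ifdef ARDUINO
#include <Arduino.h>
#define CHANPORT GPIOD_PDIR      // assumption: channels are on port D
#ifndef FASTRUN
#define FASTRUN                  // cores without FASTRUN: run from flash
#endif
#else
// Host-side stand-ins (not real hardware) so the sketch compiles off-target.
static volatile uint32_t fake_port = 0xA5;
#define CHANPORT fake_port
#define FASTRUN
static inline void noInterrupts() {}
static inline void interrupts() {}
#endif

const size_t MAX_CAPTURE_SIZE = 12288;   // assumed buffer size (12 KB)
static uint8_t logicdata[MAX_CAPTURE_SIZE];

// Fill the first `count` bytes of logicdata, from the highest index down,
// with the low 8 bits of the port. USB and timer interrupts stay blocked
// for the whole capture, so keep the capture short.
FASTRUN void captureNoInterrupts(size_t count) {
    assert(count <= MAX_CAPTURE_SIZE);
    size_t logicIndex = count;
    noInterrupts();
    while (logicIndex) {
        logicdata[--logicIndex] = (uint8_t)CHANPORT;
    }
    interrupts();
}
```

    Running from RAM avoids the flash cache, and blocking interrupts removes that source of jitter, but neither is guaranteed to make every iteration take the same number of cycles.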
