Dear all,
I'm working on rewriting the Arduino Logic Analyzer from Andrew Gillham (https://github.com/gillham/logic_analyzer, see also http://letsmakerobots.com/node/31422) to run on the Teensy 3.0.
I'm already able to capture 8 channels at up to 13.7 MHz (13.7 Msps) into 12 KB of SRAM, using:
Code:
byte logicdata[MAX_CAPTURE_SIZE];
unsigned int logicIndex;
...
// 7 cycles on Teensy at 96 MHz: 72.916 ns, 13.7 Msps
while (logicIndex) {
  logicdata[--logicIndex] = CHANPORT;
}
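For reference, the numbers in the code comment follow from the clock rate alone. Here is a minimal host-side sketch in plain C (not Teensy code), assuming the 96 MHz clock and 7 cycles per iteration mentioned above. The function names are made up for illustration, and the 12288-sample buffer in the usage note is only my reading of "12 KB"; the actual value of MAX_CAPTURE_SIZE is not shown here.

```c
#include <assert.h>

/* Duration of one sample in nanoseconds when each loop
 * iteration takes `cycles` CPU cycles at `f_cpu_hz`. */
static double sample_period_ns(double f_cpu_hz, unsigned cycles)
{
    assert(f_cpu_hz > 0.0 && cycles > 0);
    return (double)cycles * 1e9 / f_cpu_hz;
}

/* Resulting sample rate in samples per second. */
static double sample_rate_sps(double f_cpu_hz, unsigned cycles)
{
    assert(f_cpu_hz > 0.0 && cycles > 0);
    return f_cpu_hz / (double)cycles;
}

/* Time span in microseconds covered by a buffer of `samples`
 * one-byte samples (buffer size is an assumption, see above). */
static double capture_window_us(double f_cpu_hz, unsigned cycles,
                                unsigned long samples)
{
    return (double)samples * sample_period_ns(f_cpu_hz, cycles) / 1000.0;
}
```

Calling sample_period_ns(96e6, 7) gives about 72.917 ns and sample_rate_sps(96e6, 7) about 13.71 Msps; with an assumed buffer of 12288 samples, capture_window_us(96e6, 7, 12288) comes out at roughly 896 us.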
I measure this by having an Arduino generate a 1 MHz square wave on one port in which, every 256 us, one high pulse is left low; at that moment the Arduino also sets a second port high.
However, I seem to observe that my tight Teensy 3.0 loop starts out taking 7 cycles and then degrades to 8 cycles at higher (or lower?) memory addresses:
Real time between cursors is 256 us. OLG shows a delta between Cursor 1 and Cursor 2 of 249.08 us, and between Cursor 2 and Cursor 3 of 223.64 us. OLG believes that a sample lasts 72.916 ns, so we captured 3416 samples between C1 and C2 and 3067 samples between C2 and C3. So, between C1 and C2 sampling actually took 74.94 ns per sample on average (that's 7.2 cycles), and between C2 and C3 it took 83.47 ns per sample on average (that's 8 cycles).
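To make the arithmetic easy to check, here it is as a small host-side C sketch (plain C, not Teensy code). It assumes the 96 MHz clock and the 72.916 ns sample period that OLG was told; the function names are made up for illustration.

```c
#include <assert.h>

/* Number of samples OLG displayed between two cursors, given the
 * displayed delta and the sample period OLG assumes. */
static double samples_in_span(double displayed_us, double assumed_sample_ns)
{
    assert(assumed_sample_ns > 0.0);
    return displayed_us * 1000.0 / assumed_sample_ns;
}

/* Actual average time per sample, given the known real time
 * between the cursors and the number of samples captured. */
static double true_sample_ns(double real_us, double samples)
{
    assert(samples > 0.0);
    return real_us * 1000.0 / samples;
}

/* Convert a duration in nanoseconds to CPU cycles at `f_cpu_hz`. */
static double ns_to_cycles(double ns, double f_cpu_hz)
{
    return ns * f_cpu_hz / 1e9;
}
```

For C1 to C2, samples_in_span(249.08, 72.916) is about 3416, true_sample_ns(256.0, 3416.0) about 74.94 ns, and ns_to_cycles of that at 96e6 about 7.2. For C2 to C3 the same steps give about 3067 samples, 83.47 ns and 8.0 cycles.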
Any idea why the tight loop initially takes 7 cycles and later 8 cycles?
Also, I've been padding out the tight loops with NOP operations for lower sampling speeds. For instance, this loop usually takes 10 cycles:
Code:
while (logicIndex) {
  logicdata[--logicIndex] = CHANPORT^waitCount; // Three NOPs don't work, they make the loop longer than 10 cycles
  __asm__("NOP\n\tNOP");
}
The following loop usually takes 20 cycles:
Code:
while (logicIndex) {
  logicdata[--logicIndex] = CHANPORT;
  __asm__("NOP\n\tNOP\n\tNOP");
  __asm__("NOP\n\tNOP");
}
How come one NOP sometimes takes 1 cycle and sometimes 2? Or does the branch prediction get worse when the jump target lies further away?
Kind regards,
Sebastian
PS: I also observe quite a lot of noise on Channel 0. Any clue what might be causing this at those frequencies? I've already tried other ports, both on the Teensy and on the Arduino, but in vain. Here is the setup: