Project: SPI_MSTransfer

yeah this fixes the events() in main loop without needing millis(), then you can do slave transfers without fail
let me zip it up
 
no wait, it's acting up after a while. i'll see if i can take the volatile out of the class and put it in the isr scope
 
here's a zip if you wanna test

it uses a global volatile outside of the class, so it should be okay for the ISR i guess

View attachment testbool.zip

so far it's running, i'll let it go for a while to test further. you can play with the variable placements for testing i guess

to test it
put events on master main loop

void loop() {
  obj.events();
}

then in the slave create a timer that calls slave.transfer16(array,size) in its loop. it's not supposed to crash if we did it right.. crashes would appear almost immediately most of the time
 
as a sanity check i keep tim's code running so i can see if the slave stopped or not.

as another sanity check, make sure you run the slave without slave.transfer16 in its loop, to confirm your setup works for a while before playing. this way if it freezes in like 10 minutes, you pretty much know it wasn't something else
 
Running the SPI_MSTransfer16 and Examples_M&S as last noted ends up with Slave going into failed state - after some longer time.
Master T_3.6 @240 MHz and 12 MHz SPI to T_3.5 @144 MHz as last noted with the messages going 1KHz at "if ( millis() - _timer >= 1 ) {"

If Slave unpowered and restarted the LED blinks again and it picks up and runs for some long time. So once it fails - it never recovers as I have it going -until Slave restarted.


Just swapped in the testbool.zip sources and recompiled those existing samples and it fails and RESENDs even worse. Works for a short batch - then fails - though it does work for another couple dozen, then fails - then works ... cyclically.
> Just changed Master to 500 Hz F&F and it is no better - still fails more than it works.

Return to SPI_MSTransfer16 with the 500 Hz F&F it is working very well for some much longer time I haven't seen fail yet. - It did just fail for a much shorter fail than with NEW code - and recovered.


SO this is an unbalanced test - something is on the EDGE of failing - and it does fail with Teensy speed noted above now with 500 Hz F&F "if ( millis() - _timer >= 2 )"
> Here are the M&S examples as compiled in case there is a library usage issue:
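
The pacing checks quoted in this thread ("millis() - _timer >= 1" for roughly 1 kHz, ">= 2" for roughly 500 Hz) depend on unsigned subtraction. Below is a minimal host-side sketch of that pattern, assuming the sketch re-arms _timer whenever the check passes (the thread does not show that part). interval_elapsed is a hypothetical helper name, and now_ms stands in for millis().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true once interval_ms has passed since timer_ms
// and re-arms the timer when it fires. now_ms stands in for millis().
inline bool interval_elapsed(uint32_t now_ms, uint32_t& timer_ms, uint32_t interval_ms) {
  if (now_ms - timer_ms >= interval_ms) {  // unsigned subtraction is rollover-safe
    timer_ms = now_ms;
    return true;
  }
  return false;
}

// Usage example with a 2 ms interval (about 500 Hz).
inline void interval_example() {
  uint32_t timer = 0;
  assert(!interval_elapsed(1, timer, 2));  // only 1 ms elapsed
  assert(interval_elapsed(2, timer, 2));   // 2 ms elapsed, timer re-armed to 2
  assert(!interval_elapsed(3, timer, 2));  // 1 ms since re-arm

  uint32_t late = 0xFFFFFFFFu;             // timer value just before rollover
  assert(interval_elapsed(1, late, 2));    // 1 - 0xFFFFFFFF wraps to 2
  assert(interval_elapsed(1, late, 0));    // an interval of 0 always passes
}
```

Note that with an interval of 0 the comparison is always true for unsigned values, so the send happens on every pass through loop().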
 
Ok. Back. Been running both 3.5s at 168 MHz for reference with the testbool.zip. For the loop timer I set "if ( millis() - _timer >= 0 )" in the master. Up to close to 1000000, and no OT's that I see and still running smooth. Going to try one of my failed tests to see. I will edit this post just to save on posts :)

EDIT: Ok after about 10 seconds here are the results of 120/120:
Code:
F&F (OT=141)micros() _time==70
] [INFO] FAILED RESENDING 1 TIMES. RETRYING...] [INFO] FAILED RESENDING 2 TIMES. RETRYING...] [INFO] FAILED RESENDING 3 TIMES. RETRYING...DEBUG: [SLAVE CS 15] [INFO] FAILED RESENDING 4 TIMES. TRANSFER ABORTED.F&F (OT=141)micros() _time==19348
F&F (OT=142)micros() _time==3651
F&F (OT=143)micros() _time==5683
F&F (OT=144)micros() _time==427
F&F (OT=144)micros() _time==678

Also, the serial monitor (sermon) basically hangs if I open the slave COM port. Have to disconnect the slave to reset it.
 
Above p#481 still running for long passes - I see it fail and then restart - using older code - runs well for quite a while where the new testbool won't.

I thought those CPU speeds should be able to interact - esp with ONLY 12 MHz SPI - I actually did it to see how much bandwidth the Slave had left to do more processing. Also since mjs513 is using a T_3.5 at slower speeds for Master.

The 'DEBUGHACK' on Slave reduces USB output too much - which was the goal - but leaving the full print adds more slowness that I think alters the process timing with USB overhead.

The 12 DOUBLES coming over actually have 2 uint32_t's in there - so there are 8 bytes transferred I could OVERLOAD for test indexing to track lost messages without doing the full DEBUGHACK, or altering the native SLAVE code.
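
One way the spare uint32_t fields could carry a test index is sketched below. This is a hypothetical host-side helper, not part of SPI_MSTransfer: it assumes the sender writes a running message index into one of those fields and the receiver feeds each index to SeqTracker::update to count gaps.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of SPI_MSTransfer): tracks a running message
// index that the sender stores in one of the spare uint32_t payload fields.
struct SeqTracker {
  bool     started  = false;  // false until the first message arrives
  uint32_t expected = 0;      // index the next message should carry
  uint32_t lost     = 0;      // running total of skipped indexes

  // Feed the index from each received message; returns how many were skipped.
  // Assumes messages arrive in order. A repeated or stale index would show up
  // as a very large gap because the subtraction is unsigned.
  uint32_t update(uint32_t seq) {
    uint32_t gap = started ? seq - expected : 0;  // unsigned math survives wraparound
    started  = true;
    expected = seq + 1;
    lost    += gap;
    return gap;
  }
};

// Usage example: indexes 13 and 14 never arrive.
inline void seq_tracker_example() {
  SeqTracker t;
  const uint32_t seen[] = {10, 11, 12, 15, 16};
  for (uint32_t s : seen) t.update(s);
  assert(t.lost == 2);
}
```

Counting gaps this way only needs one compare per message on the slave, so it should add far less USB output than printing every packet.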

Actually I SHOULD program the SLAVE to use F&F PacketID==55 for proper processing and then another F&F PacketID for similar but DEBUG processing. That 'ARRAY EXCHANGE' code is ugly enough without #ifdef the same PacketID for DEBUG.
 
mike, i ditched the volatile and tried a different approach
i took v16 and wrapped the ::transfer16 slave call as

Code:
  if ( _slave_access ) {
    ATOMIC_BLOCK( ATOMIC_RESTORESTATE ) { // <-- added: whole packet build and queue push run atomically
      uint16_t data[6 + length], checksum = 0, data_pos = 0;
      data[data_pos] = 0xAA55; checksum ^= data[data_pos]; data_pos++; // HEADER
      data[data_pos] = sizeof(data) / 2; checksum ^= data[data_pos]; data_pos++; // DATA SIZE
      data[data_pos] = 0x0000; checksum ^= data[data_pos]; data_pos++; // SUB SWITCH STATEMENT
      data[data_pos] = length; checksum ^= data[data_pos]; data_pos++;
      data[data_pos] = packetID; checksum ^= data[data_pos]; data_pos++;
      for ( uint16_t i = 0; i < length; i++ ) { data[data_pos] = buffer[i]; checksum ^= data[data_pos]; data_pos++; }
      data[data_pos] = checksum;
      std::vector<uint16_t> _vector(data, data + data[1]);
      teensy_stm_queue.push_back(_vector);
      if ( teensy_stm_queue.size() > 14 ) teensy_stm_queue.pop_front();
      return packetID;
    } // <-- added
  }

so far its not crashing anymore! ill let it run
 
I got very few errors over time with example running, so I changed it to this now to reduce time in atomic block

Code:
  if ( _slave_access ) {
    uint16_t data[6 + length], checksum = 0, data_pos = 0;
    data[data_pos] = 0xAA55; checksum ^= data[data_pos]; data_pos++; // HEADER
    data[data_pos] = sizeof(data) / 2; checksum ^= data[data_pos]; data_pos++; // DATA SIZE
    data[data_pos] = 0x0000; checksum ^= data[data_pos]; data_pos++; // SUB SWITCH STATEMENT
    data[data_pos] = length; checksum ^= data[data_pos]; data_pos++;
    data[data_pos] = packetID; checksum ^= data[data_pos]; data_pos++;
    for ( uint16_t i = 0; i < length; i++ ) { data[data_pos] = buffer[i]; checksum ^= data[data_pos]; data_pos++; }
    data[data_pos] = checksum;
    std::vector<uint16_t> _vector(data, data + data[1]);
    ATOMIC_BLOCK( ATOMIC_RESTORESTATE ) { // <-- added: only the shared queue access is atomic now
      teensy_stm_queue.push_back(_vector);
      if ( teensy_stm_queue.size() > 14 ) teensy_stm_queue.pop_front();
    } // <-- added
    return packetID;
  }

note:
#include <util/atomic.h> must be included in the library; it's built into teensy's core
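
For anyone checking frames on the receiving side, here is a minimal host-side sketch of the layout used in the snippets above: 0xAA55 header, total word count, sub-switch word, payload length, packet ID, payload, then an XOR checksum of every preceding word. build_frame and frame_ok are hypothetical names for illustration; the library's own receive code is not shown in this thread and may validate differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a frame with the same word layout as the posted transfer16 snippet.
// Hypothetical helper, written for a desktop compiler rather than a Teensy.
inline std::vector<uint16_t> build_frame(const uint16_t* buffer, uint16_t length, uint16_t packetID) {
  assert(length <= 0xFFFF - 6);                         // total word count must fit in 16 bits
  std::vector<uint16_t> frame;
  frame.reserve(6u + length);
  frame.push_back(0xAA55);                              // header
  frame.push_back(static_cast<uint16_t>(6u + length));  // total size in 16-bit words
  frame.push_back(0x0000);                              // sub switch statement
  frame.push_back(length);                              // payload length in words
  frame.push_back(packetID);
  frame.insert(frame.end(), buffer, buffer + length);   // payload
  uint16_t checksum = 0;
  for (uint16_t w : frame) checksum ^= w;               // XOR of every word so far
  frame.push_back(checksum);
  return frame;
}

// Assumed receiver-side check: header, both length fields, and checksum.
inline bool frame_ok(const std::vector<uint16_t>& frame) {
  if (frame.size() < 6 || frame[0] != 0xAA55) return false;
  if (static_cast<std::size_t>(frame[1]) != frame.size()) return false;
  if (static_cast<std::size_t>(frame[3]) != frame.size() - 6) return false;
  uint16_t x = 0;
  for (uint16_t w : frame) x ^= w;                      // intact frame XORs to zero
  return x == 0;
}
```

Because the checksum is a plain XOR, it catches any single corrupted word but not two identical corruptions that cancel out, so a bad length field (like the "Bad Length" reports later in this thread) is worth checking separately, as frame_ok does.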

slave output so far:
Code:
64900.00,64901.00,64902.00,64903.00,64904.00,64905.00,64906.00,64907.00,64908.00,64909.00,64910.00,64911.00,0
64912.00,64913.00,64914.00,64915.00,64916.00,64917.00,64918.00,64919.00,64920.00,64921.00,64922.00,64923.00,0
64924.00,64925.00,64926.00,64927.00,64928.00,64929.00,64930.00,64931.00,64932.00,64933.00,64934.00,64935.00,0
64936.00,64937.00,64938.00,64939.00,64940.00,64941.00,64942.00,64943.00,64944.00,64945.00,64946.00,64947.00,0

master collecting those 500ms transfers from slave with events() in main loop():

Code:
returned value micros() _time==57
returned value micros() _time==57
returned value micros() _time==58
PacketID: 28775
Length: 20
4B 73 FC 39 21 99 E5 DB 2B 5A 4A 31 13 5C 7B 6A 56 33 4C C5 
returned value micros() _time==58
returned value micros() _time==57


View attachment atomicblocktest.zip <-- atomic block test with events() in master main loop and slave pulsing slave.transfer16() every 500ms
 
No example sketch with atomicblocktest. Is that 500 ms or 500 uS ?


<EDIT>
Running against post #481 as setup [Master T_3.6 @240 MHz and 12 MHz SPI to T_3.5 @144 MHz] this fails more often than it works?

Those sketches run well using SPI_MSTransfer16 - though failures not cataloged.
 
See updated post above . . . Not sure what problem recent fixes were to resolve?

Did you try putting the code in an attachInterrupt(CS) FALLING interrupt?

If that code did a detachInterrupt(CS) on entry and held it until exit - and just stayed there while data arrived or the transfer was aborted - it would seem to be free of any re-entry or races - then attachInterrupt(CS) on exit.
 
You know, I came across ATOMIC_BLOCK when I was reading about using atomics and forgot all about it until your post. Do you have an example sketch? I deleted the version of master slave I modified - not a good day for testing things for me.
 
Tony-Tim
Two-way communication seems to work, just can't run it with the loop times set to 0. Worked at 1ms and 20ms. 2 would probably be better. Here are the sketches I am using. All Tim's handiwork. Let me know if I messed something up. Going to let it run for a while and see the results.

Mike

EDIT: UPDATE it finally died on the master:
Code:
Bad LASTVAL TEST INCREMENT <<<<<<<<<<<<<<<<<<<< DIFF OF> -12.00
38048.00,38049.00,38050.00,38051.00,38052.00,38053.00,38054.00,38055.00,38056.00,38057.00,38058.00,38059.00,254373,20
38060.00, #, #, #, #, #, #, #, #, #, #, #,254374,20
38072.00, #, #, #, #, #, #, #, #, #, #, #,254375,20
38084.00, #, #, #, #, #, #, #, #, #, #, #,254376,20
38096.00, #, #, #, #, #, #, #, #, #, #, #,254377,20
38108.00, #, #, #, #, #, #, #, #, #, #, #,254378,20
Bad Length: 8191
but started at:
Code:
27176.00, #, #, #, #, #, #, #, #, #, #, #,253485,2
Bad Length: 8191

Bad LASTVAL TEST INCREMENT <<<<<<<<<<<<<<<<<<<< DIFF OF> -12.00
27200.00,27201.00,27202.00,27203.00,27204.00,27205.00,27206.00,27207.00,27208.00,27209.00,27210.00,27211.00,253486,3
27212.00, #, #, #, #, #, #, #, #, #, #, #,253487,3

Probably because only 1ms was not enough?

EDIT2: Tried 100 on the slave and 2 ms on the master and died almost immediately. Interesting. Guess you really meant just to use events in the master without transfers from master to slave? ----- FORGET WHAT I JUST TYPED FOUND ERROR IN SLAVE SKETCH - had slave.events() inside the timer DUH!!!!!!

RETESTED but still failed just a little farther down the line.....
 

Attachments

  • SPIAtomicExamples.zip (3.6 KB)
So this work is to send messages from Slave to Master?

Changes seem to be having an effect on normal operations.
 
Tony: This is HORRIBLY WRONG - but generally works after a rocky start. I see 7 errors detected in Slave sequence checking in 385,000 F&F messages coming in at 2ms.

I killed the SPI0_isr and replaced it LITERALLY with Interrupt on CS pin. This is sheer luck it works at all to get in sync and function I suspect as it has no respect for actual SPI processing?

#define SPI_SLV_CS 2 // your Slave CS pin here
Start by adding this replacement to Slave.ino :: slave.begin( SPI_SLV_CS );

// added a prototype for this
void CS_isr(void);

Then look for this at line 420: attachInterrupt( addr, CS_isr, FALLING );

Then I did this at line 645:
//void spi0_isr(void) { }
void CS_isr(void) {


Header was modified for added .beginSlave( cs ) and an empty .begin() for Master.
If you do this you'll want to SAVE CS in the Class data for re-use?

Only change to user sketch is for Slave noted above.

The rest is highlighted in the above notes.

Here is what I have and see as working against my recent posted setup.

There may be some PRELUDE code needed to wait for SPI transaction to start after CS FALLS?
There is a chance Master will drop and raise CS while you are in this code?

This note from my prior post may apply:
If that code did a detachInterrupt(CS) on entry and held it until exit - and just stayed there while data arrived or the transfer was aborted - it would seem to be free of any re-entry or races - then attachInterrupt( addr, CS_isr, FALLING ); on exit.
 
the code is set up to block and capture packets immediately until done or the line deasserts, so it's safe.

for the pin interrupt, is the slave to master working with it?
 
you could also change the isr priority of the pin interrupt i think, like i did with the spi isr, which made it work at 0 errors at v0_0_16. we’d also want FALLING only (not rising!)
rising might break things
 
err, without the slave sending transfer16’s, do you still get errors using the pin interrupt? try changing the pin interrupt level to a lower value (lower number = higher priority). i ran the spi0_isr at level 1, which is just above systick. normally pin interrupts run at 128, so usbserial (112) and/or uarts (64) can cause the errors by adding latency through nested priorities. this is why we got errors when we put spi0_isr at 113 or higher, i.e. below usb serial in priority: usbserial was running too long doing those prints
 
the code is set up to block and capture packets immediately until done or the line deasserts, so it's safe.

for the pin interrupt, is the slave to master working with it?

Dunno . . . I was just figuring out you were working on that and stumbling into would take more time than I have to understand or make progress.

There is something out of sync - I think you should examine it. The CS will fire early AFAIK - and the code needs to safely wait for the data to actually arrive - which is what the SPI0_isr indicates.

Changing the priority might help - but the port holding the pin has to be known for that. Not sure if PJRC has a pin-to-port map, or if it takes manually indexing that as I've done before.

I wrote .beginSlave since the interrupt cannot start until then - but it could be in the constructor and stored in the Class and then referenced on the old .begin().
 
yeah im sure we could figure out how it works, even if we set a pin interrupt higher priority than spi0 isr is worth a test

but i don't see how this would solve the shared variable problem if it's still isr based

the shared queue is the reason for the errors
 
The CS won't re-enter? I understood the problem was repeat calls on the spi0_isr()? The CS change can be easily disabled as noted - and pin state monitored. Or since it is lower freq, if not NONE, during a transaction - it could more easily be recorded and worked with.

I reset my code and it isn't working at all now. Hopefully what I posted can work for you as a starting point.

I shortened these debug spew msgs [still a bit wide for my portrait 2nd monitor]- the worst time to fill up USB is when something else is going wrong ... see catch 22. Also any static string in Teensy ARM will be placed in FLASH - the F() decoration isn't needed.
Code:
        if ( debugSerial != nullptr ) {
          Serial.print("Dbg: [Slv CS "); Serial.print(chip_select);
          Serial.print("] [*] RESEND FAIL #="); Serial.print(resend_count);
          Serial.print(". XFER ABORT!"); delay(1000);
        }
        break;
      }
      if ( debugSerial != nullptr ) {
        Serial.print("] [*] RESEND #="); Serial.print(resend_count);
        Serial.print(". RETRY..."); delay(1000);
      }
 
another idea is using a falling interrupt to disable the spi0 nvic isr, and at the end of the switch it re-enables the nvic isr
this should limit spi0 interrupts to one per falling edge instead of one per dword capture
 
Go For it! Yes something to remove the NOISE and simplify after proper notice is received. It only takes 50 to 100 uS for a decent message. Not getting banged with interrupts would help.

I don't know which fires first CS or spi0_isr() - or what register reads indicate data ready etc . . . I assume CS drops before data is sent and stays low during the whole transaction, that would give advance notice to CPU to get that function active and give it a jump on the lag to process the pin interrupt and call the code - but it has to then wait for the actual data to be ready - and then stay until it is complete as you do.

My sample was syncing and then running with errors - now it won't - and I'm off for the day.
 