Using PSRAM Buffer for saving arrays before serial printing

Status
Not open for further replies.

sind

Member
Hello,

I am trying to read sensor data and store it, together with a timestamp, in the additional PSRAM of my T4.1. After the measurement is finished, I would like to print the arrays via USB. The measurement is started and ended by sending b'1' or b'2' via pyserial.

For timing the measurements I am using the Arduino-Timer library (every 100 microseconds).

Unfortunately, only around 2000 values get written to the arrays, even though I declared them for 100000 values. My guess is that they are stored somewhere else and that this memory fills up at some point. How can I fix this?

Thank you in advance!

Code:
#include <arduino-timer.h>
Timer<1, micros> timer;

const int Pin_ = 40;
EXTMEM int data_[100000];
EXTMEM int t[100000];
int val = 0;// variable to store the value read
int ts = 0;
int i = 0; // counter
int start_=0;

void store_val(){
  ts = micros();
  val = analogRead(Pin_);// read the input pin
  t[i] = ts;
  data_[i] = val;
  i++;
}

void setup() {
   timer.every(100, store_val);
}

void loop() {
  start_ = Serial.read();  
  if (start_==49) {
    while (1){
        timer.tick();
        start_ = Serial.read();
        if (start_ == 50){
          break;
        }
      } 
      for(int j = 0; j<i; j++){
         Serial.print(data_[j]);
         Serial.print(";");
         Serial.println(t[j]);
      }
      memset(t, 0, sizeof(t));
      memset(data_, 0, sizeof(data_));
      i = 0;
  }
}
 
Quick guess: on the T_4.1, this loop
Code:
      for(int j = 0; j<i; j++){
         Serial.print(data_[j]);
...

is printing too fast and overwhelming the PC's ability to receive/process the data. Add some delay() or delayMicroseconds() in the for() loop.

Other notes:
Not sure what arduino-timer.h code is, but Teensy offers pjrc.com/teensy/td_timing_IntervalTimer.html

The timer is left running during the printing. That is probably not a problem, given that "int i = 0; // counter" is not volatile, but it would seem better to stop/.end() the timer before printing and start it again with the next b'1', after 'i = 0;' has been done.

Nothing stops "int i = 0; // counter" from going over 100,000

int is a signed type and won't properly store micros() 'half the time'; it should be uint32_t, as all the variables could be.
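To illustrate the uint32_t point, here is a minimal host-side C++ sketch (the helper names `intervalUs` and `storedAsInt` are hypothetical, chosen for illustration). micros() counts in an unsigned 32-bit value, so a signed int turns every reading at or above 2^31 negative, while unsigned subtraction yields the right interval even across the wrap.

```cpp
#include <cassert>
#include <cstdint>

// Interval between two micros() readings. Unsigned 32-bit subtraction is
// modulo 2^32, so the result is still correct if the counter wrapped between
// the two readings (as long as the real interval is under ~71.6 minutes).
uint32_t intervalUs(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}

// What a signed int makes of a micros() value: on a 32-bit two's-complement
// target, anything >= 2^31 (after ~35.8 minutes of uptime) comes out negative.
int32_t storedAsInt(uint32_t microsValue)
{
    return static_cast<int32_t>(microsValue);
}
```

For example, `intervalUs(0xFFFFFFF0, 0x10)` gives 32 µs, the true interval across the wrap.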
 
I had a quick look at your sketch. Looks like there are a couple of bugs in it.

1) The Arduino-Timer library I found (https://www.arduino.cc/reference/en/libraries/arduino-timer/) expects a callback of type bool (*)(void*). I get a big fat warning when compiling your code. If you haven't already, I strongly suggest enabling compiler warnings in the IDE.

You should change your callback to
Code:
bool store_val(void*)
{
    ts = micros();
    val = analogRead(Pin_); // read the input pin
    t[i] = ts;
    data_[i] = val;
    i++;
    return true;            // true: keep the timer running
}
The library needs a return value of true if the timer should continue to run and false if you want to stop it.

2) You don't limit the value of i in your callback. Thus it will pitilessly write beyond the array bounds if you don't stop soon enough by typing 2. Best to do something like:
Code:
bool store_val(void*)
{
    if (i >= dataSize) return true;  // <<====== buffer full: never write past the end

    ts = micros();
    val = analogRead(Pin_); // read the input pin
    t[i] = ts;
    data_[i] = val;
    i++;
    return true;
}

3) Not a bug, but do yourself a favor and try to avoid global variables wherever possible and declare them close to the place where you need them.
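To make fixes 1) and 2) concrete, here is a host-side C++ sketch of the callback contract described above: a `bool (*)(void*)` callback that checks the bound before writing and returns false once the buffer is full, which is how the library is asked to stop the timer. `runUntilStopped` is a hypothetical stand-in for the timer, not the arduino-timer implementation, and the stored value is a placeholder for analogRead().

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t dataSize = 8;  // tiny buffer, just for illustration
int data_[dataSize];
std::size_t i = 0;                   // index of the next free slot

// Same shape arduino-timer expects: bool (*)(void*).
// Checks the bound BEFORE writing and returns false once the buffer is full,
// which asks the timer to stop.
bool store_val(void*)
{
    if (i >= dataSize) return false;          // never write data_[dataSize]
    data_[i] = static_cast<int>(i) * 10;      // stand-in for analogRead(Pin_)
    ++i;
    return i < dataSize;                      // keep running while space remains
}

// Hypothetical stand-in for the timer (NOT the library's code): keeps calling
// the callback until it returns false or maxTicks is reached.
std::size_t runUntilStopped(bool (*callback)(void*), std::size_t maxTicks)
{
    std::size_t ticks = 0;
    while (ticks < maxTicks) {
        ++ticks;
        if (!callback(nullptr)) break;
    }
    return ticks;
}
```

Returning false when full is optional; returning true and simply skipping the write (as in the snippet in point 2) keeps the timer alive but is equally safe.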

Here is some code which seems to run as intended. (Since I don't have a board with added PSRAM, I placed smaller arrays (200 kB each) in DTCM instead. I also changed your pin 40 to pin 12 since I tried it on a T4.0.)

Code:
#include <arduino-timer.h>

Timer<1, micros> timer;

constexpr unsigned Pin_ = 12;

constexpr unsigned dataSize = 50'000;
int data_[dataSize];
int t[dataSize];
// int val = 0;  no need to have this globally
// int ts = 0;   no need to have this globally
unsigned i = 0; // counter
//int start_ = 0;  no need to have this globally

bool store_val(void*)
{
    if (i >= dataSize) return true; // buffer full: don't write past the end

    unsigned ts = micros();
    unsigned val = analogRead(Pin_); // read the input pin
    t[i] = ts;
    data_[i] = val;
    i++;
    return true;
}

void setup()
{
    timer.every(100, store_val);
}

void loop()
{
    int start = Serial.read();
    if (start == '1')
    {
        Serial.println((char)start);  // feedback for debugging
        while (1)
        {
            timer.tick();
            start = Serial.read();
            if (start == '2')
            {
                Serial.println("st");
                break;
            }
        }
        for (unsigned j = 0; j < i; j++)
        {
            Serial.print(j);    // debugging
            Serial.print(" ");
            Serial.print(data_[j]);
            Serial.print("; ");
            Serial.println(t[j]);

            delayMicroseconds(10);  // depending on your serial monitor you need to throttle printing a bit (tyCommander is fine without)
        }
        // this is not necessary since you will overwrite them anyway
        // memset(t, 0, sizeof(t));
        // memset(data_, 0, sizeof(data_));
        i = 0;
    }
}

I suggest first checking that it works for you with the DTCM arrays, and then changing to EXTMEM to see whether this makes a difference (which I don't believe it will).

@Defragster: this seems to be a software timer which needs to be ticked, so it is not running during the printing and volatiles are not necessary.

Other notes:
Not sure what arduino-timer.h code is, but Teensy offers pjrc.com/teensy/td_timing_IntervalTimer.html

This would need a different strategy to stop the timer during the printing, and one needs to take care to declare the counter 'i' as volatile and to make sure that the callback and the main code don't access it at the same time. So I'd suggest sticking with the software timer at first :)
 
Just for the fun of it (and since I'm somewhat biased :D), here is a solution using a software timer from the TeensyTimerTool:

Code:
#include "Arduino.h"
#include "TeensyTimerTool.h"
using namespace TeensyTimerTool;


PeriodicTimer timer(TCK); // define a periodic software timer;

constexpr unsigned inputPin = 12;

constexpr unsigned dataSize = 50'000;
DMAMEM int data[dataSize];            // change to EXTMEM if this works...
DMAMEM int time[dataSize];

unsigned lastIdx = 0; // counter

void storeVal()
{
    if (lastIdx >= dataSize) return;  // buffer full: don't write past the end

    time[lastIdx] = micros();
    data[lastIdx] = analogRead(inputPin);
    lastIdx++;
}

void setup()
{
    timer.begin(storeVal, 100, false); // period 100µs, attach callback but don't start the timer yet
}

void loop()
{
    if (Serial.available())
    {
        switch (Serial.read())
        {
            case '1':
                timer.start();
                Serial.println("Started");
                break;

            case '2':
                timer.stop();
                Serial.println("Stopped");
                for (unsigned i = 0; i < lastIdx; i++)
                {
                    Serial.printf("idx: %6u t:%7.1fms   d:%u\n", i, (time[i] - time[0]) / 1000.0, data[i]);
                    delayMicroseconds(10);
                }
                lastIdx = 0; // safe (even if we didn't stop the timer) since the TCK timers are software based!
                break;

            default:
                break;
        }
    }
}
 
Better follow-up, @luni. Though the p#2 suggestions would largely have worked when switching to an _isr() timer as well: loop() would then only use 'i' after the .end() of the _isr(), so volatile would not be needed, though I mentioned it.

I wondered what .tick() was doing, since a single search didn't turn up the library source.

The _isr() could stop/.end() itself, or reset 'i = 0', after testing 'i' for the end of the data array.

Indeed, excess globals are bad; better to use a static close to the point of use when the value needs to persist.
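A minimal sketch of the "static close to the point of use" idea, in plain C++ (the function name `nextSampleIndex` and its reset flag are hypothetical): the counter lives inside the function, keeps its value between calls, and cannot be changed by unrelated code.

```cpp
#include <cassert>
#include <cstdint>

// Hands out consecutive sample indices. The counter is a function-local
// static: it is initialised once, keeps its value between calls, and, unlike
// a global, cannot be modified by unrelated code.
uint32_t nextSampleIndex(bool reset = false)
{
    static uint32_t idx = 0;
    if (reset) {
        idx = 0;
        return 0;
    }
    return idx++;
}
```

If loop() also needs to know how many samples were taken (as in the sketches in this thread), the count still has to be reachable from there, e.g. through a global or a small accessor function.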

Using PJRC IntervalTimer makes understanding and help easier than an outside library.

@luni - not sure if you saw - but Paul added TeensyTimerTool as 'other' lib to the IntervalTimer page.
 
@luni - cross post - good example, in line with the p#5 notes. Now no coding is needed here :)

Should be no reason it won't work with EXTMEM when present.
 
@luni - not sure if you saw - but Paul added TeensyTimerTool as 'other' lib to the IntervalTimer page.
Oh, thanks for the shout out. Hope the now increased user base won't find too many bugs :)
 
Thank you for the fast and helpful answers!

I tried both approaches and they both worked. The main bottleneck should have been the missing "delay()" inside the for loop, where I had to add one millisecond for a stable serial reading via pyserial.

The approach using the TeensyTimerTool and switch() (p#4) seems to work faster. It did not work using DMAMEM, but using EXTMEM was not a problem.
I am getting more than 9 kSPS output data rate (p#4) for reading and saving sensor data from 5 analog pins, versus 7.8 kSPS with the approach from p#3.

Would there be any advantages regarding the sample rates using the isr()-timer instead of a software timer?
 
Thank you for the fast and helpful answers!
You are very welcome.

The main bottleneck should have been the missing "delay()" inside the for loop,
You should at least fix the two bugs: the wrong signature of your callback and the possible overrun of your buffers. Not fixing such things typically bites back later and is hard to track down then.

I am getting more than 9 kSPS (p#4) output data rate for reading and saving sensor data from 5 analog pins, while getting 7,8 kSPS for the approach from p#3.
Actually, you should get exactly 10 kSPS, since you call storeVal at 10 kHz. If you really get less, then storeVal takes longer than 100 µs and you might consider reducing the call rate accordingly.
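One way to check the real rate is to compute it from the micros() timestamps already stored with each sample. A minimal C++ helper, assuming the timestamps are kept as uint32_t (the name `achievedRateSps` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Average sample rate in samples per second, computed from the micros()
// timestamps stored with each sample. n samples span (n - 1) intervals;
// unsigned subtraction keeps the span correct across a micros() wrap.
double achievedRateSps(const uint32_t* stamps, std::size_t n)
{
    if (n < 2) return 0.0;
    const uint32_t spanUs = stamps[n - 1] - stamps[0];
    if (spanUs == 0) return 0.0;
    return static_cast<double>(n - 1) * 1e6 / spanUs;
}
```

With a 100 µs period this should report about 10,000 SPS; a clearly lower value suggests the callback is taking longer than the period.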

Would there be any advantages regarding the sample rates using the isr()-timer instead of a software timer?
Hard to say, because we don't know what you would regard as an advantage: speed? Constant rate? Ease of use?
A hardware-based timer would be more stable than the software-based TCK timer from the TeensyTimerTool, but you probably won't see a big difference at a call rate of only 10 kHz. If you want to try it, replace 'TCK' by 'GPT1' and declare lastIdx volatile:

Code:
PeriodicTimer timer(GPT1);     // <<===== instead of TCK
...
volatile unsigned lastIdx = 0; // <<===== add volatile
 
Working with a T_4.1 with PSRAM now to test a pending fix for TeensyDuino 1.54 Beta 1.

I had to rename the time[] array to compile - changed to timeD[].

Upped the number of samples to 1M and changed analogReadAveraging to 1 so it runs faster, to make sure it wouldn't hold things up. analogReadResolution() is still at the default.

Changed the timer period to 5 us, so 1 million samples take 5 seconds. When lastIdx gets to the end, storeVal() calls .stop() on the timer and prints a message about it.

Made the print delay 10us every 100 lines printed and IDE SerMon is okay with that.

I don't have any analog data signal on pin 12.

Added lastIdx = 0; to the '1' Start case!

Code:
#include "Arduino.h"
#include "TeensyTimerTool.h"
using namespace TeensyTimerTool;

// https://forum.pjrc.com/threads/63973-Using-PSRAM-Buffer-for-saving-arrays-before-serial-printing?p=256743&viewfull=1#post256743

PeriodicTimer timer(TCK); // define a periodic software timer;

constexpr unsigned inputPin = 12;

constexpr unsigned dataSize = 1'000'000;
EXTMEM uint32_t data[dataSize];
EXTMEM uint32_t timeD[dataSize];

unsigned lastIdx = 0; // counter

void storeVal()
{
  if (lastIdx >= dataSize) {   // buffer full: stop instead of writing past the end
    timer.stop();
    Serial.print("Full Buff Stop @ secs=");
    Serial.println(millis() / 1000);
    return;
  }
  timeD[lastIdx] = micros();
  data[lastIdx] = analogRead(inputPin);
  lastIdx++;
}

void setup()
{
  timer.begin(storeVal, 5, false); // period 5µs, attach callback but don't start the timer yet
  analogReadAveraging(1);
}

void loop()
{
  if (Serial.available())
  {
    switch (Serial.read())
    {
      case '1':
        lastIdx = 0;
        Serial.print("Started @ secs=");
        Serial.println(millis() / 1000);
        timer.start();
        break;

      case '2':
        timer.stop();
        Serial.println("Stopped");
        for (unsigned i = 0; i < lastIdx; i++)
        {
          Serial.printf("idx: %6u t:%7.3fms   d:%u\n", i, (timeD[i] - timeD[0]) / 1000.0, data[i]);
          if ( !(i%100) ) delayMicroseconds(10);
        }
        lastIdx = 0; // safe (even if we didn't stop the timer) since the TCK timers are software based!
        break;

      default:
        break;
    }
  }
}

Output looks like:
Code:
Started @ secs=96

Stopped

idx:      0 t:  0.000ms   d:0
idx:      1 t:  0.006ms   d:3
idx:      2 t:  0.011ms   d:2
idx:      3 t:  0.017ms   d:2
idx:      4 t:  0.022ms   d:2
idx:      5 t:  0.028ms   d:2
idx:      6 t:  0.033ms   d:2
idx:      7 t:  0.039ms   d:2
idx:      8 t:  0.044ms   d:3
idx:      9 t:  0.050ms   d:3
idx:     10 t:  0.056ms   d:2
idx:     11 t:  0.061ms   d:3

and:
Code:
Started @ secs=121
Full Buff Stop @ secs=126
 
This T_4.1 actually has twin 8MB PSRAM chips.

So I can get 2.5 million samples in 15MB:
Code:
Started @ secs=5
Full Buff Stop @ secs=18

Edit to the data types:
Code:
constexpr unsigned dataSize = 2'500'000;
EXTMEM uint16_t data[dataSize];
EXTMEM uint32_t timeD[dataSize];

Printing seems good with delay as used up to the end:
Code:
...
idx: 2499996 t:13294.582ms   d:3
idx: 2499997 t:13294.587ms   d:2
idx: 2499998 t:13294.592ms   d:2
idx: 2499999 t:13294.598ms   d:3

A 5us timer is about as fast as it can run with that code in the _isr(), using this analogRead() with those settings.

For faster data reads, look into the ADC library. It can start a read and return; then, on the next _isr(), read the analog value, start the next read, and exit the _isr().
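The start-now, collect-next-tick pattern can be sketched on the host with a hypothetical `FakeAdc` stand-in. Its `start()`/`result()` methods are invented for illustration and are not the ADC library's API; check that library's documentation for the real calls. Each tick collects the conversion started on the previous tick and immediately starts the next one, so the callback never waits for a conversion. The cost is that each stored value lags by one tick.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical ADC stand-in (NOT the ADC library's API): start() begins a
// conversion, result() returns the value of the last conversion started.
// The fake "signal" simply counts 0, 1, 2, ...
struct FakeAdc {
    uint16_t nextValue = 0;
    uint16_t latched = 0;
    void start() { latched = nextValue++; }
    uint16_t result() const { return latched; }
};

constexpr std::size_t kCapacity = 16;
FakeAdc adc;
uint16_t samples[kCapacity];
std::size_t sampleCount = 0;
bool conversionPending = false;

// Timer callback: collect the conversion started on the previous tick,
// then start the next one and return immediately (no busy-waiting).
void onTick()
{
    if (conversionPending && sampleCount < kCapacity)
        samples[sampleCount++] = adc.result();
    adc.start();
    conversionPending = true;
}
```

After five ticks, four samples have been collected, since the first tick only primes the pipeline.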
 
Cool, looks like the code generates pretty stable 5µs intervals even with the TCK (software) timer. I like them more and more since they are so uncomplicated to use. No hassle with atomic access and volatiles :)
 
Cool, looks like the code generates pretty stable 5µs intervals even with the TCK (software) timer. I like them more and more since they are so uncomplicated to use. No hassle with atomic access and volatiles :)

I didn't look at the timer type, but that simple loop should manage over 1M loop() iterations per second, if not 5M, depending on yield() overhead.

With 'known intervals' and a recorded start time, there wouldn't really be a need to store a 4-byte TIME with each sample: the time would be (lastIdx * interval + StartTime). That would cut down on PSRAM writes, and it would also save the micros() call, which takes almost 40 cycles. To verify sample integrity, the CYCCNT difference between samples could be recorded in one or two bytes at a cost of ~5 cycles. A struct { readVal; CYCdiff; } in 4 bytes would allow the data record stream to be recreated, and not writing to two arrays in PSRAM 4 bytes at a time would make better use of the data cache, simplify addressing, and allow more samples to fit.
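A host-side C++ sketch of that compact record, under these assumptions: the ADC value fits in 16 bits, the CPU runs at the default 600 MHz, and on the Teensy the cycle readings would come from the cycle counter (ARM_DWT_CYCCNT). `Sample`, `sampleTimeUs` and `cycleDelta` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// One 4-byte record per sample instead of two 4-byte array entries.
struct Sample {
    uint16_t readVal;  // analogRead() result (fits in 16 bits)
    uint16_t cycDiff;  // low 16 bits of the cycle-counter delta to the previous sample
};
static_assert(sizeof(Sample) == 4, "record should pack into 4 bytes");

// Nominal sample time rebuilt from the start time and the fixed interval,
// so no per-sample micros() value has to be stored.
uint32_t sampleTimeUs(uint32_t startUs, uint32_t idx, uint32_t intervalUs)
{
    return startUs + idx * intervalUs;
}

// 16-bit cycle delta between two cycle-counter readings. At 600 MHz a 5 us
// interval is 3000 cycles, well inside 16 bits (up to ~109 us fits).
uint16_t cycleDelta(uint32_t prevCycles, uint32_t nowCycles)
{
    return static_cast<uint16_t>(nowCycles - prevCycles);
}
```

A cycDiff far from the nominal value (3000 at 5 µs) would flag a late or delayed sample.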

... Output from Quick Edit:
Code:
Started @ secs=1.60
Full Buff Stop @ secs=14.98
loop() counts = 2500004
loop() counts per sample = 1

That suggests storeVal() is taking 5us to run, and changing the timer to GPT1 agrees: the loop() count is then only about 25 [with void yield(){}], meaning it spends all its time in storeVal() as written.
AND: commenting out timeD[lastIdx] = micros(); does NOT improve or change that! So it is the analogRead() plus the PSRAM write that takes that long. The ADC lib could do better with an async read.
 
For faster data read look into the ADC library. It can start a read and return and then on the next _isr() read the analog value and start the next read and exit the _isr().

You could also use the ADC timer of the ADC library to set the sampling interval. If you do that properly, you can avoid the sampling jitter that can happen with an external timer whose interrupt routine may be delayed by other interrupts. IIRC, I was able to sample at up to 1 MSample/second without sampling jitter using that method. It's simple for one channel, but more complex for multiple ADC input channels.
 
Since 5 kHz suits the application perfectly, I kept using the TeensyTimerTool. Thanks again for the helpful suggestions!

The bottleneck right now is the loop printing the data and reading it via USB. The serial readout takes around 58 seconds for 1,200,000 values (20 seconds of measurement for a total of 6 pins).
When I check the device manager in Windows 10, the Teensy is listed as a USB serial device with a maximum baud rate of 128,000 bps, which is approximately 1/10 of the original USB speed.

Can I initialise the port differently, or is the main reason the design of my serial printing routine in the for loop? I read that virtual ports are not affected by the constraints of the Windows USB serial driver, so it has to be the Arduino code, right?


Code:
#include "Arduino.h"
#include "TeensyTimerTool.h"
using namespace TeensyTimerTool;
PeriodicTimer timer(TCK); // define a periodic software timer;

constexpr unsigned Pin_ = 39;
constexpr unsigned Pin_1 = 14;
constexpr unsigned Pin_2 = 20;
constexpr unsigned Pin_3 = 16;
constexpr unsigned Pin_4 = 17;
constexpr unsigned Pin_5 = 18;

constexpr unsigned dataSize = 200'000;

EXTMEM uint16_t data_[dataSize];
EXTMEM uint16_t data_1[dataSize];
EXTMEM uint16_t data_2[dataSize];
EXTMEM uint16_t data_3[dataSize];
EXTMEM uint16_t data_4[dataSize];
EXTMEM uint16_t data_5[dataSize];

unsigned lastIdx = 0; // counter

void storeVal()
{
    if (lastIdx >= dataSize) return;  // buffer full: don't write past the end

    data_[lastIdx] = analogRead(Pin_);
    data_1[lastIdx] = analogRead(Pin_1);
    data_2[lastIdx] = analogRead(Pin_2);
    data_3[lastIdx] = analogRead(Pin_3);
    data_4[lastIdx] = analogRead(Pin_4);
    data_5[lastIdx] = analogRead(Pin_5);

    lastIdx++;
}

void setup()
{
    timer.begin(storeVal, 200, false); // period 200µs, attach callback but don't start the timer yet
    Serial.begin(1000000);
}

void loop()
{
    if (Serial.available())
    {
        switch (Serial.read())
        {
            case '1':
                timer.start();
                break;

            case '2':
                timer.stop();
                for (unsigned i = 0; i < lastIdx; i++)
                {
                    Serial.print(data_[i]);
                    Serial.print(",");
                    Serial.print(data_1[i]);
                    Serial.print(",");
                    Serial.print(data_2[i]);
                    Serial.print(",");
                    Serial.print(data_3[i]);
                    Serial.print(",");
                    Serial.print(data_4[i]);
                    Serial.print(",");
                    Serial.println(data_5[i]);
                    delayMicroseconds(100);
                }
                lastIdx = 0; // safe (even if we didn't stop the timer) since the TCK timers are software based!
                break;

            default:
                break;
        }
    }
}
 
Note that with the Serial object (USB), the baud value in Serial.begin(1000000); means nothing. It will send data as fast as possible over USB. There are only a few places where the baud rate set by the host side can make a difference. It does not change the actual speed of the data being sent over USB, but it allows you to detect it on the Teensy and maybe do something like set the baud rate of another serial port, like Serial1, to that rate.

In that way a Teensy can act like a USB to Serial Adapter. There is an example sketch, with a name something like: examples->Teensy->USB_Serial->USBToSerial

As for timing: 1200000 * 0.0001 s = 120 seconds, so it is the delay that is causing it to take that long.

To shorten the time, you might experiment a bit: do you still receive all of your data if you change 100 to 10 microseconds? Could you delay only after every N items?

I have not played with PySerial in a long time. Actually, the only time was with ROS on an RPi (or Odroid), and it sucked up a lot of system resources, so in some cases I rewrote some Python devices in C++... But maybe someone who uses it can give you some suggestions, e.g. whether it would make sense and help to pack more of the data into packets, possibly in binary format, for example sending 10 samples in 64-byte packets...
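A sketch of the binary-packet idea, assuming the six channels are first gathered into one row per sample (the `Row` struct and `packSamples` are hypothetical). Six uint16_t channels make 12 bytes per sample, so five samples (60 bytes) fit in a 64-byte packet. The equivalent ASCII line, such as "1023,1023,1023,1023,1023,1023" plus line ending, is about 31 bytes. On the Teensy the packed buffer would go out with Serial.write(packet, used).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kChannels = 6;                    // six analog pins
constexpr std::size_t kBytesPerSample = kChannels * 2;  // 12 bytes
constexpr std::size_t kPacketSize = 64;
constexpr std::size_t kSamplesPerPacket = kPacketSize / kBytesPerSample;  // 5

struct Row { uint16_t ch[kChannels]; };  // one sample: all six channels

// Packs up to kSamplesPerPacket rows into `packet`, each value low byte first.
// Returns the number of bytes used (the length to pass to Serial.write).
std::size_t packSamples(const Row* rows, std::size_t nRows, uint8_t* packet)
{
    if (nRows > kSamplesPerPacket) nRows = kSamplesPerPacket;
    std::size_t pos = 0;
    for (std::size_t r = 0; r < nRows; ++r) {
        for (std::size_t c = 0; c < kChannels; ++c) {
            const uint16_t v = rows[r].ch[c];
            packet[pos++] = static_cast<uint8_t>(v & 0xFF);  // low byte
            packet[pos++] = static_cast<uint8_t>(v >> 8);    // high byte
        }
    }
    return pos;
}
```

The receiver then reads fixed 12-byte records instead of parsing text lines.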

But again not a Python person, so hopefully someone else can give some other suggestions.
 
But again not a Python person, so hopefully someone else can give some other suggestions.
Nor am I, but I can confirm that transferring more than 10 MB/s is not a problem, even with C#, which definitely is not designed for speed. I assume Python won't be any slower, and there certainly are Python experts around. In case you are interested, here is some info about serial Teensy <-> PC communication: https://github.com/TeensyUser/doc/wiki/Serial
 
Just forum anecdotes over time with Python - no actual personal use

Often first attempts to talk to Teensy are slow/limited - then some effort on the Python side can make it work.

Not sure where that effort is applied - but it can be made efficient enough to keep up with a Teensy - at least a slower T_3.6 ... not sure those cases involved higher speed T_4.x

A Windows PC with SerMon or C seems to be able to get near 16 MB/sec with T_4.x ( number off hand IIRC with Paul's Lines Per Sec test is 500K lines of 32 bytes/sec ).

Paul found Linux to be generally faster - and a PC can peak up to 800K of those lines - but the USB buffer handling gets swamped at some point.

Paul's GitHub has a LinesPerSecond repository. Maybe a Python script run against that would give a measure to work with, to see what the limit or bottleneck is.
 
I made a wrong calculation. It should be around 600,000 values for 20 seconds of measurement. Since the delay only comes after a complete line, the total time for the delays during readout should be around 10 s. With the current printing and reading value by value I need this 100 µs delay, otherwise not all data is received.

As KurtE suggested I could maybe send bigger packages to reduce the overhead (?).

In your doc about serial communication, luni, I found this method for writing a whole array:
Code:
Serial.write(buf,bufSize);
Is there a way to make this work for uint16_t integers as well? It should be faster to send a whole array at once, shouldn't it?

I will try the LinesPerSecond measurement tomorrow.

Below you can find the snippet of my python script where I read the serial data. Since I am reading it line by line this could also take quite long.

Code:
    x = 1
    while ser_interface.is_open:
        line = ser_interface.readline().strip()
        if x > 1:                  # skip the first line
            value = line.decode()
            if value == '':        # empty after strip(): blank line or readline() timeout
                ser_interface.close()
                break
            arr = value.split(',')
            input_data.append(arr)
        x += 1
 
just looked - that test is :: github.com/PaulStoffregen/USB-Serial-Print-Speed-Test

Test it against SerMon at first and observe the results.

Then, in Python, simply keep reading and emptying the buffer, printing a line at some interval, to see the speed at which data comes out of the Teensy without blocking. Then add code to parse it as needed or possible, observe the steady-state number, and emulate what your code does; in this case that means parsing ASCII text as seen in the SerMon display.

An array of numbers can also be sent. The values will be binary, as stored in Teensy RAM, and will need to be parsed and treated the same way on the PC side for them to appear as the same values.
 
Serial Monitor in the Arduino IDE: the minimum is 120,000 lines per second. Sometimes 200,000-350,000 are possible, but I guess after some seconds the buffer is filled.

When I'm sending an array of numbers with Serial.write(), each value is 8 bits by default. Can I make the method below work with 16-bit integers as well?

Code:
Serial.write(buf,bufSize);
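Serial.write() takes a byte pointer and a byte count, so a uint16_t array can be passed as its raw bytes by casting the pointer and multiplying the element count by sizeof(uint16_t), along the lines of `Serial.write(reinterpret_cast<const uint8_t*>(data_), n * sizeof(data_[0]));` where n is the number of values to send (a sketch, not tested on hardware). The Teensy 4.x is little-endian, so each value arrives low byte first. The host-side C++ below mimics both ends: `asBytes` is a hypothetical stand-in for what Serial.write would transmit, and `decodeLE16` is what the receiver would do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Serial.write(buf, len): the bytes that would be
// sent for a uint16_t array, taken exactly as they sit in memory.
std::vector<uint8_t> asBytes(const uint16_t* values, std::size_t count)
{
    const uint8_t* raw = reinterpret_cast<const uint8_t*>(values);
    return std::vector<uint8_t>(raw, raw + count * sizeof(uint16_t));
}

// Receiver side: rebuild the values assuming little-endian byte order,
// which is what a Teensy 4.x (and typical x86/ARM PCs) use.
std::vector<uint16_t> decodeLE16(const std::vector<uint8_t>& bytes)
{
    std::vector<uint16_t> out(bytes.size() / 2);
    for (std::size_t k = 0; k < out.size(); ++k)
        out[k] = static_cast<uint16_t>(bytes[2 * k] | (bytes[2 * k + 1] << 8));
    return out;
}
```

On the Python side, `struct.unpack` with a little-endian format such as `'<H'` (or `'<6H'` for one six-channel sample) decodes the same layout.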
 