Teensy 3.6 and WIZ820io low performance/slow transfer speed

Status
Not open for further replies.

JanGee

Member
Hey all,

I'm working on a large-scale LED project with about 190,000 LEDs, driven by a number of Teensy 3.6 boards running the OctoWS2811 library, each driving 8600 LEDs. This works very well at up to 29 fps.
Now, the LED displays are distributed across different places on a site. Four Teensies are grouped together and drive one wall. Because of the long distances between them, the groups sync over Ethernet. But I also need to transfer the videos to them over Ethernet when I want to change the content (not in real time). The videos are about 5 to 10 minutes long, and the Teensies should store them on their µSD cards. That means one Teensy needs to receive about 194 MB for a 5-minute video in the OctoWS2811 binary format. I was hoping that the WIZ820io or WIZ850io could handle this, but the transfer performance is very slow: I only get 66 KB/s, and so far I couldn't find any hints about what is going wrong. I also don't understand how I could try different SPI libraries, as suggested in some rather old posts.
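A quick back-of-the-envelope check of those numbers, as a sketch: it assumes 3 bytes (RGB) per LED per frame and ignores any per-frame header the OctoWS2811 file format may add. At an assumed 25 fps, 8600 LEDs give about 193.5 MB for 5 minutes, close to the 194 MB above, and the same arithmetic shows how long one file takes at 66 KB/s versus 10 Mbit/s.

```python
# Back-of-the-envelope sizing for one Teensy's video file.
# Assumptions: 3 bytes (RGB) per LED per frame, no per-frame header,
# and an illustrative 25 fps frame rate.

LEDS_PER_TEENSY = 8600
BYTES_PER_LED = 3


def video_bytes(fps: float, minutes: float) -> int:
    """Raw pixel payload of a video for one Teensy, in bytes."""
    frames = round(fps * minutes * 60)
    return frames * LEDS_PER_TEENSY * BYTES_PER_LED


def transfer_seconds(n_bytes: int, bytes_per_second: float) -> float:
    """Time needed to move n_bytes at a sustained rate."""
    return n_bytes / bytes_per_second


if __name__ == "__main__":
    size = video_bytes(fps=25, minutes=5)
    print(f"5 min at 25 fps: {size / 1e6:.1f} MB")
    print(f"at 66 KB/s:      {transfer_seconds(size, 66e3) / 60:.1f} min")
    print(f"at 10 Mbit/s:    {transfer_seconds(size, 10e6 / 8) / 60:.1f} min")
```

Run as a script, it prints 193.5 MB, about 48.9 minutes at 66 KB/s and about 2.6 minutes at 10 Mbit/s per 5-minute video.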

I'm working with Arduino 1.8.2 and Teensyduino 1.36. I connected the WIZ820io directly to the Teensy 3.6 (no adapter).

I reduced the whole program to a test sketch based on the DHCP Chat Server example. The video is sent by a Python script.

Teensy sketch
Code:
#include <SPI.h>
#include <Ethernet.h>

// Enter a MAC address and IP address for your controller below.
// The IP address will be dependent on your local network.
// gateway and subnet are optional:
byte mac[] = {
  0x00, 0xAA, 0xBB, 0xCC, 0xDE, 0x02 //0xC3, 0x23, 0x6B, 0xAC, 0xDE, 0x12
};
IPAddress ip(192, 168, 1, 177);

// listen to port 60000
EthernetServer server(60000);
boolean gotAMessage = false; // whether or not you got a message from the client yet

void setup() {
  pinMode(9, OUTPUT);
  digitalWrite(9, LOW);    // begin reset the WIZ820io
  delay(100);
  digitalWrite(9, HIGH);   // end reset pulse
  
  // Open serial communications and wait for port to open:
  Serial.begin(9600);

  // start the Ethernet connection:
  Serial.println("Trying to get an IP address using DHCP");
  if (Ethernet.begin(mac) == 0) {
    Serial.println("Failed to configure Ethernet using DHCP");
  }
  // print your local IP address:
  Serial.print("My IP address: ");
  ip = Ethernet.localIP();
  for (byte thisByte = 0; thisByte < 4; thisByte++) {
    // print the value of each byte of the IP address:
    Serial.print(ip[thisByte], DEC);
    Serial.print(".");
  }
  Serial.println();
  // start listening for clients
  server.begin();

}

void loop() {
  // wait for a new client:
  EthernetClient client = server.available();

  // when the client sends the first byte, say hello:
  if (client) {
    if (!gotAMessage) {
      Serial.println("We have a new client");
      client.println("Hello, client!");
      gotAMessage = true;
    }

    // read the bytes incoming from the client:
    char thisChar = client.read();

    Ethernet.maintain();
  }
}

Python Sender
Code:
import socket # Import socket module
import time

s = socket.socket() # Create a TCP socket object
host = "192.168.188.116" # IP address of the Teensy
port = 60000 # Port the Teensy's EthernetServer listens on

s.connect((host, port))
s.sendall(b"Hello server!")

filename = 'VIDEO.BIN'
f = open(filename, 'rb')
l = f.read(1024)
bs = 0
start = time.time()
while l:
    s.sendall(l)        # sendall() keeps sending until the whole chunk is out
    bs += len(l)        # count the bytes actually read (the last chunk may be short)
    if time.time() - start > 1.0:
        print('Byte/s: ', bs)
        bs = 0
        start = time.time()
    l = f.read(1024)
print()
f.close()

print('Done sending')
s.sendall(b'Thank you for receiving!')
s.close()

I really need AT LEAST 10 Mbit/s transfer rates to make this whole thing practical.
If this is not possible with the WIZ820io/WIZ850io, could you suggest a board that could handle the transfer and then use Video2Serial to send the data to the 4 grouped Teensies? Or any other suggestions to make this work?!

Thanks for your help!
 
You should set up a 1024-byte buffer on the Teensy and do multi-byte receives in loop(), something like:

Code:
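#define RECLTH 1024
uint8_t buf[RECLTH];
uint32_t bytes = 0;   // running total of received bytes
...

    int n;
    while(client.connected()) {
      if ((n=client.available()) > 0) {
        if (n > RECLTH)  n = RECLTH;   // never read more than the buffer holds
        client.read(buf,n);
        bytes += n;
      }
    }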

You also need to go into the Ethernet library and set the SPI clock to 24 MHz or 30 MHz.

Calling print() and Ethernet.maintain() inside your receive loop will only slow things down.

You may still be limited by how fast the Python script can read the data file and put bytes on the wire, and/or by how fast you can write to the µSD card.
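One way to check the sender side in isolation is to time how fast Python reads the file, using the same 1024-byte chunks as the sender script. This is a minimal sketch; a second run may be served from the operating system's file cache and look faster than a cold read.

```python
# Time how fast a binary stream can be read in fixed-size chunks.
# Accepts any binary stream, so it works on an open file or an io.BytesIO.
import io
import time
from typing import Tuple


def stream_read_rate(stream: io.BufferedIOBase, chunk_size: int = 1024) -> Tuple[int, float]:
    """Read the stream to the end; return (total bytes, bytes per second)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, rate


if __name__ == "__main__":
    with open("VIDEO.BIN", "rb") as f:
        n, rate = stream_read_rate(f)
    print(f"read {n} bytes at {rate * 8 / 1e6:.1f} Mbit/s")
```

If this reports far more than the network rate you are seeing, file reading is not the bottleneck.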
 
15 Mbit/s should be doable:
https://github.com/manitou48/DUEZoo/blob/master/wizperf.txt

loop() is rather slow to begin with, and you are doing a ton of pointless stuff inside it. You are only reading a single byte per loop() call.

Get your EthernetClient once and keep it around. Using read() to get a single character at a time has tons of overhead; use a decently sized buffer to read a block of data:
https://github.com/PaulStoffregen/Ethernet/blob/master/examples/WebClient/WebClient.ino#L84
 
Ah, okay, I picked the "wrong" example and didn't look at what is going on carefully enough. :eek: Thanks for your fast replies! :)

So I optimized my code as manitou suggested and set the SPI clock to 30 MHz. That helped quite a bit: I now get 400 KB/s, or 3.2 Mbit/s.

Code:
#include <SPI.h>
#include <Ethernet.h>
#include "TeensyID.h"

#define RECLTH 1024
uint8_t buf[RECLTH];
unsigned int n = 0;

uint8_t mac[6];
IPAddress ip(0, 0, 0, 0);

// listen to port 60000
EthernetServer server(60000);
EthernetClient client;
boolean gotAMessage = false; // whether or not you got a message from the client yet

void setup() {
  pinMode(9, OUTPUT);
  digitalWrite(9, LOW);    // begin reset the WIZ820io
  delay(100);
  digitalWrite(9, HIGH);   // end reset pulse
  
  // Open serial communications and wait for port to open:
  Serial.begin(9600);

  // read the burned in MAC address
  teensyMAC(mac);
  Serial.printf ("MAC Address: %02X:%02X:%02X:%02X:%02X:%02X \n", mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

  // start the Ethernet connection:
  Serial.println("Trying to get an IP address using DHCP");
  if (Ethernet.begin(mac) == 0) {
    Serial.println("Failed to configure Ethernet using DHCP");
  }
  
  // print your local IP address:
  Serial.print("My IP address: ");
  ip = Ethernet.localIP();
  for (byte thisByte = 0; thisByte < 4; thisByte++) {
    // print the value of each byte of the IP address:
    Serial.print(ip[thisByte], DEC);
    Serial.print(".");
  }
  Serial.println();
  
  // start listening for clients
  server.begin();
}

void loop()
{
  Ethernet.maintain();
  // wait for a new client:
  client = server.available();

  // when the client sends the first byte, say hello:
  if (client)
  {
    if (!gotAMessage)
    {
      Serial.println("We have a new client");
      client.println("Hello, client!");
      gotAMessage = true;
    }

    while(client.available())
    {
        n = client.available();
        //Serial.println(n);
        if (n > 0)
        {
            if (n > RECLTH)  n = RECLTH;
            client.read(buf,n);
        }
    }
  }
}

I also tested the WebClient example sketch tni linked to, but with that sketch I only get 20 KB/s?!

Regarding my 400 KB/s: judging by the linked performance test file, that rate corresponds to about a 4 MHz SPI clock?! Am I changing the clock in the right file?

/Applications/Arduino_1.8.3.app/Contents/Java/hardware/teensy/avr/libraries/Ethernet/w5100.h

I need to change it like this, right?

Code:
// Safe for all chips
//#define SPI_ETHERNET_SETTINGS SPISettings(14000000, MSBFIRST, SPI_MODE0)

// Safe for W5200 and W5500, but too fast for W5100
// uncomment this if you know you'll never need W5100 support
#define SPI_ETHERNET_SETTINGS SPISettings(30000000, MSBFIRST, SPI_MODE0)


Regarding the µSD write speeds, I was quite optimistic, based on my experience with the read speeds of Bill's SdFat library and the test results mentioned in this thread:
https://forum.pjrc.com/threads/36737-Try-SdFat-forTeensy-3-5-3-6
But it's a good point to check. I will write a benchmark with my setup...
 
I forgot to mention that I upgraded to Arduino 1.8.3 and Teensyduino 1.37 this morning...
 
I also tested the example WebClient Sketch tni linked to. But with that sketch I only get 20kbyte/sec?!
The intent was to show the buffered read, the same thing manitou posted. You wouldn't want to include the part where the buffer is written to Serial.
 
Warning: I have not done much with any of these adapters...

But as @manitou mentioned, I would check both sides of the equation. That comes from my limited Python experience (some ROS modules run on ARM processors): the Python modules did not perform very well... or rather, in my case, they ate a lot more CPU resources than I wanted.

I would also be curious about things like this in the code:
Code:
    while(client.available())
    {
        n = client.available();
        //Serial.println(n);
        if (n > 0)
        {
            if (n > RECLTH)  n = RECLTH;
            client.read(buf,n);
        }
    }

What values are you getting back from client.available()? Also, from my quick look, each of those calls creates an SPI transaction, so I would tend to minimize those calls. Maybe more like:
Code:
    while((n = client.available()))
    {
        //Serial.println(n);
        while (n > 0)
        {
            // read at most one buffer's worth of what is available
            uint16_t read_length = (n < RECLTH) ? n : RECLTH;
            client.read(buf, read_length);
            n -= read_length;
        }
    }

Or maybe see if I can avoid it completely, something like:
Code:
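    while((n = client.read(buf, RECLTH)))
    {
        // do something with what you read...
        // (read() can return -1 while the connection is open but no data is
        //  waiting, so keep n signed and only use buf when n > 0)
        //Serial.println(n);
    }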



I would also throw in a dummy yield() function, so you don't have that additional overhead:
Code:
void yield() {
}
 
The intent was to show the buffered read, the same thing manitou posted. You wouldn't want to include the part where the buffer is written to Serial.

Yes, got that. I was just astonished that the examples are so inefficient. Even my single-byte read approach at the beginning was 3x faster. I also read in some other threads that Paul wanted to optimize the Ethernet library for single-byte reads because people use them so often. Now I wonder whether it would be more effective to optimize the examples so they teach best practices for getting stable, high performance and using things "right"?! I am not experienced with low-level hardware programming, and I try to understand the examples and learn from them. Don't get me wrong, this is not criticism. I just want to give some feedback about the traps I fall into as a Teensy noob.

@KurtE: Thanks for the suggestions. It didn't make a difference. What is that yield() function about? The output from my version looks like this:

Code:
MAC Address: 04:E9:E5:04:D6:C1 
Trying to get an IP address using DHCP
My IP address: 192.168.188.119.
We have a new client
13
1024
1011
1037
13
1460
436
1460
436
1460
436
1460
436
...

I had already wondered about the alternating value 436. When I change RECLTH to 2048, the output is:

Code:
MAC Address: 04:E9:E5:04:D6:C1 
Trying to get an IP address using DHCP
My IP address: 192.168.188.119.
We have a new client
13
1460
575
1460
1460
1460
1460
1460
1460
1460
1460
...

I just have to pick up the WIZ850io, which arrived today, and will see if it makes a difference. I will also investigate the Python side and do some performance tests. In the production system I want to go down to the C level anyway, so maybe this is the right time to start :p
 
My suggested snippet above used while(client.connected()) (not available()); not sure if that would improve anything. FWIW, the Python script on my Linux desktop can put data on the wire at 88 Mbit/s, so it wouldn't be a bottleneck. I removed the prints in the Python transmit loop.

1460 bytes is the maximum TCP segment size (MSS) on standard Ethernet, so a 2048-byte buffer should help.
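A small model of the 1460/436 pattern in the Serial output above, under idealized assumptions: a 1500-byte Ethernet MTU, 20-byte IPv4 and TCP headers without options, exactly one segment waiting, and nothing new arriving between reads. With a 1024-byte buffer, each 1460-byte segment takes two reads, so available() reports 1460 and then the 436-byte remainder; with a 2048-byte buffer, one read drains the segment.

```python
# Idealized model of what client.available() reports before each read().
# Assumptions: 1500-byte MTU, 20-byte IPv4 + 20-byte TCP headers (no options),
# one segment waiting at a time, nothing new arriving between reads.

MTU = 1500
IPV4_HEADER = 20
TCP_HEADER = 20
MSS = MTU - IPV4_HEADER - TCP_HEADER  # maximum TCP payload per segment


def available_sequence(segment_bytes: int, buffer_len: int) -> list:
    """Values available() would show before each read(buf, buffer_len) call."""
    seen = []
    remaining = segment_bytes
    while remaining > 0:
        seen.append(remaining)                   # what available() reports
        remaining -= min(remaining, buffer_len)  # one read() drains up to buffer_len
    return seen


if __name__ == "__main__":
    print(MSS)
    print(available_sequence(MSS, 1024))
    print(available_sequence(MSS, 2048))
```

This prints 1460, [1460, 436] and [1460]. The real log also shows values such as 575, because segments do not always arrive one at a time.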
 

Thanks for checking. I removed the prints too. I couldn't get your snippet working yesterday, but now it works right away; maybe it was too late. With all three versions of the receiving code on the Teensy side I get the same results, and the Serial.println() has no significant effect:

Code:
    // VERSION 1
    while((n = client.available()))
    {
        Serial.println(n);
        if (n > 0)
        {
            if (n > RECLTH)  n = RECLTH;
            client.read(buf,n);
        }
    }

    // VERSION 2 (manitou)
    while(client.connected()) {
      if ((n=client.available()) > 0) {
        if (n > RECLTH)  n = RECLTH;
        client.read(buf,n);
      }
    }
    
    // VERSION 3 (KurtE)
    while((n = client.read(buf, RECLTH)))
    {
        // do something with what you read...
        //Serial.println(n);
    }

Python sender output
Code:
VERSION 1
  ('FileSize (Mb): ', 25.8)
  ('Transfer time (min): ', 1.1169624169667562)
  ('KByte/s: ', 384.97247061762727)
  ('MByte/s: ', 0.3849722816190169)

VERSION 2 (manitou)
  ('FileSize (Mb): ', 25.8)
  ('Transfer time (min): ', 1.1094992319742838)
  ('KByte/s: ', 387.5620748930524)
  ('MByte/s: ', 0.3875618944475407)

VERSION 3 (KurtE)
  ('FileSize (Mb): ', 25.8)
  ('Transfer time (min): ', 1.1069506486256917)
  ('KByte/s: ', 388.4543709256919)
  ('MByte/s: ', 0.3884541966205364)
 
Tested with the WIZ850io. No difference.

@manitou: I looked at your performance tests again, and you obviously got something working that reaches 24-27 Mbit/s receive speeds.
So three questions came to mind while reading:
1.) I browsed the repo for the wiztest sketch, but it doesn't seem to be included?! Do you have it available?
2.) You say that a 2nd power supply for the W5200 was connected. Did that make a difference? Right now I power the WIZ850io from the Teensy 3.6's 3.3V.
3.) Which library does PaulSPI stand for?
 
The idea of the yield() function is that it lets you put in code that should run every time you logically wait for something to happen, so that other work can get done in the meantime. It is also called in the main program's loop every time loop() returns. That is:
Code:
extern "C" int main(void)
{
	// Arduino's main() function just calls setup() and loop()....
	setup();
	while (1) {
		loop();
		yield();
	}
}

The default implementation (weakly linked) checks all 6 serial ports (Serial.available(), Serial1.available() ... Serial6.available()), and for each one that does not return 0, it calls the matching serialEvent(), serialEvent1() ... serialEvent6().

All of the default serialEvent implementations simply return, so by defining your own yield() you bypass all of that code.
 
I have updated the wizpaul.ino sketch at https://github.com/manitou48/teensy3/blob/master/wizpaul.ino
It's quite a hack, and it uses various tools on my Linux box to send or receive packets (ttcp.c, iperf, or things I crafted). Good luck.
(It also measures the time to read/write the WIZnet buffer area over SPI. Ethernet performance is guaranteed to be no faster than the buffer SPI rates!)

The various sketch names refer to using SdFat SPI; over time, Paul updated his SPI implementation to be competitive with SdFat SPI.

I do power the WIZ board from a separate power supply (common ground). It can draw 150+ mA.

The last time I tested was in January, with a 30 MHz SPI clock on a Teensy 3.2 at 120 MHz:
https://forum.pjrc.com/threads/41151-Understand-WIZ820io-performance?p=128975&viewfull=1#post128975
 
Thanks for the update, manitou.

So here are my results:

To make the ttcp_server() test accept client connections, I have to remove the wizdump() call in setup(). Do you see the same behaviour? Otherwise, nothing happens when I try to connect after "server listening"...

Code:
My IP address: 192.168.188.122.
write 326 us mbs 25.13
read 300 us  mbs 27.31
wrt/rd errors 1020
read 18 us   mbs 24.44
     00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0030 00 00 00 00 00 00 00
socket info
     00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
0000 00 00 00 37 00 00 00 00 00 00 00 00 00 00 00 00
0010 00 00 00 00 08 00 00 00 00 00 00 00 00 00 00 00
0020 00 00 00 00 00 00 00 00 00 00 00 00
IP  address:0.0.0.0
B8000000
server listening

If I interpret the figures right, the SPI read speeds don't seem to be the bottleneck?! But "wrt/rd errors 1020" looks suspicious.
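The printed figures are consistent with Mbit/s = bytes × 8 / µs, if the write/read lines move 1024 bytes and the 18 µs read moves 55 bytes (0x37, the length of the hex dump that follows). Those byte counts are inferred from the numbers, not read from wizpaul.ino, so treat this only as a sanity check. On that reading, raw SPI would allow roughly 25 to 27 Mbit/s, well above the 3.25 Mbit/s TCP rate measured below.

```python
# Convert a (bytes, microseconds) timing into Mbit/s.
# Assumption: 1024 bytes for the 326/300 us lines and 55 bytes for the 18 us line,
# inferred from the printed values rather than from the benchmark source.


def mbit_per_s(n_bytes: int, microseconds: float) -> float:
    """Bits per microsecond is numerically equal to Mbit/s."""
    return n_bytes * 8 / microseconds


if __name__ == "__main__":
    for label, n_bytes, us in [("write", 1024, 326), ("read", 1024, 300), ("read", 55, 18)]:
        print(f"{label} {us} us -> {mbit_per_s(n_bytes, us):.2f} Mbit/s")
```

This reproduces the 25.13, 27.31 and 24.44 values from the log.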

When I remove the wizdump() call, I get these results. I had to change NBYTES to bytes in the Mbit/s calculation to get the correct figure.
Code:
My IP address: 192.168.188.122.
server listening
client connected
recv  25800037 bytes 63541 ms n 364  mbits 3.25
server listening
Those results are far from your TCP results.

I also tried enabling the W5500 4K buffers in W5100.cpp, but that dropped the receive speed to nearly zero and the transfer stalls...
Code:
My IP address: 192.168.188.122.
server listening
client connected
recv  3309833 bytes 282665 ms n 0  mbits 0.09
server listening

Another thing I tried was reducing the Teensy clock to 120 MHz so that HAS_SPIFIFO gets defined, and calling SPIFIFO.begin(ss_pin, SPI_CLOCK_24MHz). Unfortunately, the Ethernet server then no longer reacts to client connections, just like when wizdump() is called.

I also wonder what this code would look like for the Teensy 3.6 at 180 MHz with a 30 MHz SPI clock. And could SPIFIFO help at all?!
Code:
#if F_BUS == 120000000
#define HAS_SPIFIFO
#define SPI_CLOCK_24MHz   (SPI_CTAR_PBR(3) | SPI_CTAR_BR(0) | SPI_CTAR_DBR) //(120 / 5) * ((1+1)/2)
#define SPI_CLOCK_16MHz   (SPI_CTAR_PBR(0) | SPI_CTAR_BR(2))                //(120 / 2) * ((1+0)/4) = 15 MHz
#define SPI_CLOCK_12MHz   (SPI_CTAR_PBR(3) | SPI_CTAR_BR(0))                //(120 / 5) * ((1+0)/2)
#define SPI_CLOCK_8MHz    (SPI_CTAR_PBR(3) | SPI_CTAR_BR(4) | SPI_CTAR_DBR) //(120 / 5) * ((1+1)/6)
#define SPI_CLOCK_6MHz    (SPI_CTAR_PBR(3) | SPI_CTAR_BR(2))                //(120 / 5) * ((1+0)/4)
#define SPI_CLOCK_4MHz    (SPI_CTAR_PBR(3) | SPI_CTAR_BR(4)) 		    //(120 / 5) * ((1+0)/6)
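For the 180 MHz question: on the Kinetis DSPI, as I understand the K-series reference manual, SCK = (F_BUS / PBR prescaler) × (1 + DBR) / BR scaler, where PBR(0..3) selects a prescaler of 2, 3, 5 or 7 and BR(0..15) selects 2, 4, 6, 8, 16, 32 and so on. A Teensy 3.6 at 180 MHz runs F_BUS at 60 MHz, so 30 MHz needs PBR(0), BR(0) and DBR. The sketch below only searches that table and is not part of any Teensy library. Under the same table, the "/5" in the comments above corresponds to PBR(2), so the field values in this excerpt are worth comparing against the real SPIFIFO.h.

```python
# Kinetis DSPI clock: SCK = F_BUS / PBR_prescaler * (1 + DBR) / BR_scaler
# Prescaler/scaler tables as given in the Kinetis K-series reference manual.
from itertools import product

PBR_PRESCALER = [2, 3, 5, 7]                        # SPI_CTAR_PBR(0..3)
BR_SCALER = [2, 4, 6, 8, 16, 32, 64, 128, 256, 512,
             1024, 2048, 4096, 8192, 16384, 32768]  # SPI_CTAR_BR(0..15)


def sck_hz(f_bus: int, pbr: int, br: int, dbr: bool) -> float:
    """SPI clock produced by one CTAR setting."""
    return f_bus / PBR_PRESCALER[pbr] * (1 + int(dbr)) / BR_SCALER[br]


def fastest_setting(f_bus: int, max_hz: float):
    """Highest SCK not above max_hz, returned as (hz, pbr, br, dbr)."""
    best = None
    for pbr, br, dbr in product(range(4), range(16), (False, True)):
        hz = sck_hz(f_bus, pbr, br, dbr)
        if hz <= max_hz and (best is None or hz > best[0]):
            best = (hz, pbr, br, dbr)
    return best


if __name__ == "__main__":
    for f_bus in (60_000_000, 80_000_000, 120_000_000):
        hz, pbr, br, dbr = fastest_setting(f_bus, 30_000_000)
        print(f"F_BUS {f_bus / 1e6:.0f} MHz: {hz / 1e6:.2f} MHz with PBR({pbr}) BR({br}) DBR={dbr}")
```

The fastest possible SCK is F_BUS / 2, which is why 60 MHz gives 30 MHz and 80 MHz gives 40 MHz.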

By the way, if I drop the SPI clock to 4 MHz I get 200 KB/s. So the transfer rate only halves, even though the clock is divided by 7.5.

I really wonder where the bottle neck is. :confused:
 
What's your network latency? 2 KB is a tiny TCP window. With a decent wired network and a latency of a couple of hundred µs, it won't matter much; if you get towards milliseconds, it does.
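To put numbers on that: a single TCP connection can have at most one receive window in flight per round trip, so throughput is bounded by window / RTT. Here the window is limited by the WIZnet socket's RX buffer (2 KB by default). This sketch ignores header overhead and every other slowdown, so it only gives an upper bound.

```python
# Upper bound on single-connection TCP throughput: one window per round trip.
# Assumption: the advertised window equals the WIZnet socket RX buffer size.


def max_tcp_throughput_mbit(window_bytes: int, rtt_seconds: float) -> float:
    """Best-case Mbit/s when at most window_bytes can be unacknowledged."""
    return window_bytes * 8 / rtt_seconds / 1e6


if __name__ == "__main__":
    for window in (2048, 4096, 16384):
        for rtt_us in (200, 1000, 5000):
            rate = max_tcp_throughput_mbit(window, rtt_us * 1e-6)
            print(f"{window:>6} B window, {rtt_us:>5} us RTT: <= {rate:7.2f} Mbit/s")
```

With a 2 KB window, a 200 µs round trip allows about 82 Mbit/s, but the ceiling drops to about 16 Mbit/s at 1 ms and about 3.3 Mbit/s at 5 ms.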

If you are using a router (or anything that potentially messes with TCP/IP headers), try to remove it.

Nagle's algorithm and delayed ACKs can interact disastrously, especially with the small TCP window used here. Try to disable them.

In 'EthernetServer::begin()', there is
sockindex = Ethernet.socketBegin(SnMR::TCP, _port);

change that to:
sockindex = Ethernet.socketBegin(SnMR::TCP | SnMR::ND, _port);
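SnMR::ND covers delayed ACKs on the WIZnet side, which is the receiver here. Nagle's algorithm lives on the sender, so on the Mac the Python script can turn it off per socket with the standard TCP_NODELAY option. A minimal sketch, assuming the Teensy address and port from the first post and a 16 KB chunk size chosen for illustration; it uses sendall() so no chunk is silently cut short.

```python
# Sender-side socket with Nagle's algorithm disabled, plus a chunked copy loop.
import io
import socket


def make_sender_socket() -> socket.socket:
    """TCP socket with TCP_NODELAY set (Nagle's algorithm off)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


def send_stream(sock: socket.socket, stream: io.BufferedIOBase, chunk_size: int = 16384) -> int:
    """Copy a binary stream to a connected socket; return the number of bytes sent."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        sock.sendall(chunk)  # sendall() loops until the whole chunk is queued
        total += len(chunk)
    return total


if __name__ == "__main__":
    s = make_sender_socket()
    s.connect(("192.168.188.116", 60000))  # Teensy address and port from the first post
    with open("VIDEO.BIN", "rb") as f:
        print(send_stream(s, f), "bytes sent")
    s.close()
```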


Other people have successfully used the W5200 with 16 KB buffers:
https://github.com/alex-Arc/Etherne...e0a7ffa#diff-3fe9dbe8c926368fc6055401536a380b


Post a tcpdump; something may jump out.
 
Yeah! :):):) Now I get about 9 Mbit/s. The 4K buffers work now, but 8K and 16K slow things down again.
Code:
Done sending
('FileSize (Mb): ', 25.8)
('Transfer time (min): ', 0.3655051509539286)
('KByte/s: ', 1176.4525121626396)
('KBit/s: ', 9411.612832609342)
('MByte/s: ', 1.1764508494701662)
('MBit/s: ', 9.411599940367045)

But my network seems to be a big part of the problem. When I connect the Teensy directly to my Mac, I even get 20+ Mbit/s!!! Still not 27, but it's getting usable for my purpose :)
Code:
Done sending
('FileSize (Mb): ', 25.8)
('Transfer time (min): ', 0.15660630067189535)
('KByte/s: ', 2745.7303254582152)
('KBit/s: ', 21965.80526143985)
('MByte/s: ', 2.745721547261544)
('MBit/s: ', 21.96573726547405)

Here is a Wireshark screenshot of the first packets of the transfer. How can I copy a readable dump out of it?!
Screen Shot 2017-07-02 at 12.11.07.jpg
 
What did you change?

From the network capture, it looks like Nagle and delayed ACK are disabled on the Mac side and 'SnMR::ND' is set on the WIZio. It does look like what I would want to see. Can you post one with 16kB buffers and one with a ping?

Your Mac is sometimes extremely slow to respond. At the 62.600083 TCP window update (the WIZio signals that it has free buffer space), it sends new data within 32 µs; at the 62.603081 window update it takes 8,790 µs. Maybe you can disable some power-saving features on the Mac.

Teensy SPI officially supports 30 MHz. TeensyLogicAnalyzer overclocks it to up to 120 MHz.

Here is a wireshark screenshot of the first packets of the transfer. How can I copy a readable dump out of it?!

File / Export Packet Dissections / As CSV.
 
Well, I changed the code according to your suggestions and the linked 16K changes, and I connected a cable instead of using WLAN. In the production environment I will have a high-quality LAN and a server dedicated to transferring the LED video BIN files and syncing them at runtime, so I don't care about WLAN right now.

The Mac is connected to a power supply, but there are a lot of active processes. Maybe the Python sender is put on hold while other tasks run. I'll probably need to clean up.

Right now it's:
4 Mbit/s - WLAN/2KB buffer
9 Mbit/s - WLAN/4KB buffer
18 Mbit/s - LAN/4KB buffer (via a FRITZ! router)
21 Mbit/s - LAN/4KB buffer (direct connection between Mac and Teensy)

And I just found out that I had made a mistake in the code for the 16K buffer (see the comment in the code below):

25 Mbit/s - LAN/16KB buffer (via a FRITZ! router) :) YEAH!

It's time to take care of the SD card write speed :p I will add it to my test sketch.

I read that the WIZ850io supports up to 84 MHz SPI, so overclocking SPI would be very interesting, but I don't understand how they do it in the LogicAnalyzer. Overclocking the Teensy didn't make a difference?!
And I read that it has 32 KB of buffer memory, so I even tried a 32K receive buffer, but that didn't work... I don't get an IP from DHCP...
http://shop.wiznet.eu/wiz850io.html

Here is the code for the bigger buffer sizes
Code:
...
else if (isW5500()) {
        CH_BASE = 0x1000;
        #ifdef W5500_32K_BUFFERS
        SSIZE = 32768;    // 32K buffers (not valid: at most 16K per socket on the W5500)
        SMASK = 0x7FFF;
        #elif defined(W5500_16K_BUFFERS)
        SSIZE = 16384;    // 16K buffers
        SMASK = 0x3FFF;
        #elif defined(W5500_8K_BUFFERS)
        SSIZE = 8192;     // 8K buffers
        SMASK = 0x1FFF;
        #elif defined(W5500_4K_BUFFERS)
        SSIZE = 4096;     // 4K buffers
        SMASK = 0x0FFF;
        #else
        SSIZE = 2048;     // 2K buffers
        SMASK = 0x07FF;
        #endif
        SMASK = SSIZE-1; // Could be removed when SMASK is set implicitly above...
        TXBUF_BASE = 0x8000;
        RXBUF_BASE = 0xC000;
        //#ifdef W5500_4K_BUFFERS <-- FORGOT TO UNCOMMENT THIS IN PREVIOUS 16K TESTS
        for (i=0; i<MAX_SOCK_NUM; i++) {
            writeSnRX_SIZE(i, SSIZE >> 10);   // register value is the size in KB
            writeSnTX_SIZE(i, SSIZE >> 10);
        }
        for (; i<8; i++) {                    // unused sockets get no buffer memory
            writeSnRX_SIZE(i, 0);
            writeSnTX_SIZE(i, 0);
        }
...

Results for 16K buffers
Code:
Done sending
('FileSize (Mb): ', 200.0)
('Transfer time (min): ', 1.0452636003494262)
('KByte/s: ', 3265.522509343923)
('KBit/s: ', 26124.173420791387)
('MByte/s: ', 3.1889847403503966)
('MBit/s: ', 25.51187142480386)

Here is the tcp dump for 16k buffers
Screen Shot 2017-07-02 at 15.05.13.jpg
Screen Shot 2017-07-02 at 15.06.24.png
 
I read about the WIZ850io that it supports 84MHz SPI max. So overclocking SPI would be very interesting. ButI don't understand how they do it inthe LogicAnalyzer. Overclocking the teensy didn't make a difference?!
I'm not sure whether LogicAnalyzer has a different clock setup. With the standard Teensy SPI on a Teensy 3.6, I can set F_BUS to 80 MHz (40 MHz SPI clock, 240 MHz CPU). Any higher bus clock results in corrupted data.

And I read that it has 32 kB buffer mem. So I even tried a 32K receive buffer but that didn't work...
It's split between RX and TX, so you can't use more than 16 KB for a socket.
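A quick way to sanity-check a buffer configuration before flashing it, based on the W5500 datasheet rules as I understand them: each socket's RX (or TX) size must be 0, 1, 2, 4, 8 or 16 KB, the sizes across all 8 sockets must add up to no more than 16 KB per direction, and the library's SMASK = SSIZE - 1 trick only works when SSIZE is a power of two (a typo such as 8092 for 8192 fails that check). The helper names are hypothetical; nothing like this exists in the Ethernet library.

```python
# Sanity checks for a W5500 socket buffer allocation (sizes in KB).
# Assumptions: 8 sockets, 16 KB of RX and 16 KB of TX memory in total,
# and per-socket sizes limited to 0/1/2/4/8/16 KB.

VALID_KB = {0, 1, 2, 4, 8, 16}
TOTAL_KB = 16
NUM_SOCKETS = 8


def check_w5500_sizes(sizes_kb: list) -> list:
    """Return a list of problems with one direction's per-socket sizes."""
    problems = []
    if len(sizes_kb) > NUM_SOCKETS:
        problems.append(f"only {NUM_SOCKETS} sockets available")
    for i, kb in enumerate(sizes_kb):
        if kb not in VALID_KB:
            problems.append(f"socket {i}: {kb} KB is not a valid size")
    if sum(sizes_kb) > TOTAL_KB:
        problems.append(f"total {sum(sizes_kb)} KB exceeds {TOTAL_KB} KB")
    return problems


def size_register(ssize_bytes: int):
    """Value written as SSIZE >> 10, or None if SSIZE - 1 is not a usable mask."""
    if ssize_bytes <= 0 or ssize_bytes & (ssize_bytes - 1):
        return None
    return ssize_bytes >> 10


if __name__ == "__main__":
    print(check_w5500_sizes([16]))  # MAX_SOCK_NUM 1 with 16K buffers
    print(check_w5500_sizes([32]))  # the 32K attempt
    print(size_register(8192), size_register(8092))
```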

It's getting time to take care of the SDCard write speed. :p I will add this to my test sketch.
Use SdFat-beta with SdFatSdioEX. Make sure the card isn't busy when you try to write ('sd.card()->isBusy()'); otherwise the write blocks and waits, when you could be emptying the WIZio buffer instead. The Teensy 3.6 has a lot of memory for buffering.

512-byte writes work fine with SdFatSdioEX. The SDIO interface runs at 200 Mbit/s; that's the actual transfer rate you get to the SD card, even if the card writes more slowly to its flash (it will signal busy for subsequent sector writes).
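The buffering described above can be sketched as a RAM ring buffer that decouples the two sides: the loop always drains the WIZnet socket into free space, and only hands a full 512-byte sector to the card when the card is not busy. The data structure is shown in Python for clarity, with a hypothetical busy pattern standing in for sd.card()->isBusy(); on the Teensy the same idea would be a static array of a few tens of KB (the Teensy 3.6 has 256 KB of RAM).

```python
# Byte FIFO that decouples network reads from SD sector writes.
# The busy pattern in the demo is hypothetical; on the Teensy it would come
# from polling the card (e.g. sd.card()->isBusy()).


class RingBuffer:
    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0  # next write position
        self.size = 0  # bytes currently stored

    def free(self) -> int:
        return self.capacity - self.size

    def push(self, data: bytes) -> int:
        """Store as much of data as fits; return the number of bytes stored."""
        n = min(len(data), self.free())
        for i in range(n):
            self.buf[(self.head + i) % self.capacity] = data[i]
        self.head = (self.head + n) % self.capacity
        self.size += n
        return n

    def pop(self, n: int) -> bytes:
        """Remove and return up to n of the oldest bytes."""
        n = min(n, self.size)
        tail = (self.head - self.size) % self.capacity
        out = bytes(self.buf[(tail + i) % self.capacity] for i in range(n))
        self.size -= n
        return out


if __name__ == "__main__":
    SECTOR = 512
    ring = RingBuffer(64 * 1024)
    card_busy = [False, True, True, False] * 4  # hypothetical busy pattern
    written = 0
    for busy in card_busy:
        ring.push(bytes(1460))  # one TCP segment arrives per pass
        if not busy and ring.size >= SECTOR:
            written += len(ring.pop(SECTOR))  # stands in for one sector write
    print(written, ring.size)
```

The point of the design is that a busy card never stalls the network side; data simply accumulates in RAM until the card is ready again.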
 
Here is the tcp dump for 16k buffers
For these, the SPI reading is clearly the bottleneck. The WIZio receive buffer is kept well filled, and going down to an 8 KB buffer size probably wouldn't make a difference.
 
Okay, here we go. Thanks a lot to you, tni, and everybody else. I get quite decent transfer/write speeds. Here are some results for different file sizes:

Code:
Done sending
('FileSize (Mb): ', 1.1728744506835938)
('Transfer time (min): ', 0.006500101089477539)
('KByte/s: ', 3079.2941971853024)
('KBit/s: ', 24633.404930145676)
('MByte/s: ', 3.0068916450035728)
('MBit/s: ', 24.054206889301522)


Done sending
('FileSize (Mb): ', 24.60479736328125)
('Transfer time (min): ', 0.16970289945602418)
('KByte/s: ', 2474.4468397186592)
('KBit/s: ', 19795.545516159356)
('MByte/s: ', 2.416444862628856)
('MBit/s: ', 19.33152857342053)


Done sending
('FileSize (Mb): ', 200.0)
('Transfer time (min): ', 1.515394151210785)
('KByte/s: ', 2252.438627410822)
('KBit/s: ', 18019.50585350615)
('MByte/s: ', 2.1996463472569037)
('MBit/s: ', 17.59716787104682)

Without writing to the SD card I get
Code:
Done sending
('FileSize (Mb): ', 1.1728744506835938)
('Transfer time (min): ', 0.005503666400909424)
('KByte/s: ', 3636.7542392459454)
('KBit/s: ', 29092.60569728996)
('MByte/s: ', 3.5511947782322393)
('MBit/s: ', 28.408266268016323)


Done sending
('FileSize (Mb): ', 24.60479736328125)
('Transfer time (min): ', 0.12643694877624512)
('KByte/s: ', 3321.182323911194)
('KBit/s: ', 26569.398470139102)
('MByte/s: ', 3.2433279448624193)
('MBit/s: ', 25.946568924533576)


Done sending
('FileSize (Mb): ', 200.0)
('Transfer time (min): ', 1.0279139002164206)
('KByte/s: ', 3320.639884678672)
('KBit/s: ', 26565.112196955026)
('MByte/s: ', 3.2428108078653324)
('MBit/s: ', 25.94248014486254)

ETH_RCV_LEN (the RECLTH of the earlier sketches) = 16384; that was the fastest, while 512 is slow. Both the Ethernet read and the SD write use this size. I couldn't figure out a good way to use isBusy() yet, and there is surely a lot of potential to make the reads and writes work together better. The weekend has been long and my brain is tired. The overclocking has no effect, or only a very small one. The data is transferred and written to the SD card correctly. Here is the code so far:

Code:
#include <SPI.h>
#include "Ethernet.h"
#include <SPIFIFO.h>
#include "TeensyID.h"
#include "SdFat.h"

// --- SD ---
SdFatSdioEX SDIO;
const char VIDEO_FILENAME[] = "ETHTEST.BIN";
File videofile;

#define ETH_RCV_LEN 16384
uint8_t buf[ETH_RCV_LEN];
unsigned int n = 0;

uint8_t mac[6];
IPAddress ip(192, 168, 0, 2);

// listen to port 60000
EthernetServer server(60000);
EthernetClient client;
boolean gotAMessage = false; // whether or not you got a message from the client yet

void setup() {
  pinMode(9, OUTPUT);
  digitalWrite(9, LOW);    // begin reset the WIZ820io
  delay(100);
  digitalWrite(9, HIGH);   // end reset pulse
  
  // Open serial communications and wait for port to open:
  Serial.begin(9600);

  // Init SD Card
  if (SDIO.begin())
      Serial.println("SD card initialized");
  else
      Serial.println("Could not access SD card");

  // read the burned in MAC address
  teensyMAC(mac);
  Serial.printf ("MAC Address: %02X:%02X:%02X:%02X:%02X:%02X \n", mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);

  //Ethernet.begin(mac, ip);
  // start the Ethernet connection:
  Serial.println("Trying to get an IP address using DHCP");
  if (Ethernet.begin(mac) == 0) {
    Serial.println("Failed to configure Ethernet using DHCP");
  }
  
  // print your local IP address:
  Serial.print("My IP address: ");
  ip = Ethernet.localIP();
  for (byte thisByte = 0; thisByte < 4; thisByte++) {
    // print the value of each byte of the IP address:
    Serial.print(ip[thisByte], DEC);
    Serial.print(".");
  }
  Serial.println();
  
  // start listening for clients
  server.begin();
}

void loop()
{
  // wait for a new client:
  client = server.available();

  // when the client sends the first byte, say hello:
  if (client)
  {
    if (!gotAMessage)
    {
      Serial.println("We have a new client");
      client.println("Hello, client!");
      gotAMessage = true;
    }

    if(client.connected())
    {
        SDIO.remove(VIDEO_FILENAME);
      
        videofile = SDIO.open(VIDEO_FILENAME, FILE_WRITE);
        if(videofile)
            Serial.println("File opened");
        else
            Serial.println("File open failed!");
          
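        int len;
        // read() returns -1 while the connection is open but no data is waiting,
        // and 0 once the sender has closed the connection
        while((len = client.read(buf, ETH_RCV_LEN)) != 0)
        {
              if (len > 0)
                  videofile.write(buf, len);
              //Serial.println(len);
        }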
    }

    if(videofile)
    {
        Serial.println(videofile.size());
        videofile.close();
    }
    
  }

  // close the connection:
  if (client)
  {
       client.stop();
       Serial.println("client disconnected");
  }

}

void yield()
{
    //Ethernet.maintain();
}

I think I will find some time next week to dig deeper into this. The fact that the transfer speed drops so much for bigger files must have something to do with the reads and writes blocking each other?! Any quick suggestions?
 
The overclocking does have no effect. Or only very small effects.
Did you set the SPI clock in w5100.h? Did you change the CPU clock to 240 MHz and F_BUS to 80 MHz?

This will choose the highest SPI clock rate (w5100.h):
#define SPI_ETHERNET_SETTINGS SPISettings(-1, MSBFIRST, SPI_MODE0)

To check F_BUS:
Serial.printf("F_BUS: %i\n", F_BUS);
 
I had forgotten to set the SPISettings, but there is still no change. I wonder why F_BUS ends up at 60 MHz instead of 80 MHz?

Code:
F_CPU: 240000000 / F_BUS: 60000000
SD card initialized
MAC Address: 04:E9:E5:04:D6:C1 
Trying to get an IP address using DHCP
My IP address: 192.168.188.119.
We have a new client
File opened
client disconnected

---

Done sending
('FileSize (Mb): ', 24.60479736328125)
('Transfer time (min): ', 0.18279671669006348)
('KByte/s: ', 2297.201435690031)
('KBit/s: ', 18377.584719570314)
('MByte/s: ', 2.2433544374583323)
('MBit/s: ', 17.946810921695256)

kinetis.h
Code:
...
#if (F_CPU == 240000000)
 #define F_PLL 240000000
 #ifndef F_BUS
 //#define F_BUS 60000000
 #define F_BUS 80000000   // uncomment these to try peripheral overclocking
 //#define F_BUS 120000000  // all the usual overclocking caveats apply...
 #endif
...

W5100.h
Code:
...
// Safe for all chips
//#define SPI_ETHERNET_SETTINGS SPISettings(14000000, MSBFIRST, SPI_MODE0)

// Safe for W5200 and W5500, but too fast for W5100
// uncomment this if you know you'll never need W5100 support
#define SPI_ETHERNET_SETTINGS SPISettings(-1, MSBFIRST, SPI_MODE0)

#define MAX_SOCK_NUM 1 
...

W5100.cpp
Code:
...
#define W5500_16K_BUFFERS
//#define W5500_8K_BUFFERS
//#define W5500_4K_BUFFERS
//#define W5200_4K_BUFFERS
...
 
Grrr... I hate it. I edited the wrong copy of kinetis.h; it was still open from before I upgraded to Arduino 1.8.3 :rolleyes:

But that made things worse. I don't get an IP anymore...
Code:
F_CPU: 240000000 / F_BUS: 80000000
SD card initialized
MAC Address: 04:E9:E5:04:D6:C1 
Trying to get an IP address using DHCP
Failed to configure Ethernet using DHCP
My IP address: 128.0.0.0.

When I reduce the SPISettings clock in W5100.h to 30000000, it works again. 60000000 doesn't work either.
 