A Beowulf Cluster of Teensy 4.0's

Status
Not open for further replies.

Aidan Percy

New member
Hey everyone,

I'm a high school senior and I want to build a relatively large cluster (30+ nodes) for many reasons, but mainly for educational purposes, for my science fair this March (I won last year by demonstrating how music and conversation affect active memory using the n-back test, cool stuff I know), and to put on my professional resume. Now, from what I've seen digging around for a few days, the most popular option for making an educational cluster is the Raspberry Pi, but here are a few problems I have with using the RPi.

  1. It has a lot of unnecessary ports.
  2. It's a little more expensive than what I'd like to node (verb), as I'm looking for numbers rather than sheer hash speed.
  3. It has already been done too many times to be impressive, and I want that "wow" factor, much like what Martin A. Smith did with his 16-node ROCK64 cluster.
  4. If I wanted to do anything like the RPi, I'd use the ODROID-MC1 stack that's already available... but I like pain.
  5. Oh yeah I should mention, I want this to be a headless-node cluster.

So my question to all of you smart lads is:

With the help of a few people (smarter than I), is this a feasible project, or will it be like playing Dark Souls 2 but without a sword or armor while blindfolded?

Also, I have a few reservations about using Teensy 4.0's:

  1. I'm a bit of a noob when it comes to Arduino type devices.
  2. This post on the Arduino forum, from when another high schooler asked the same question I am asking, but about using other Arduinos for a cluster:
Teensy reservation.PNG

To predict what you're going to claim about me needing a device to act as the master node, I would probably use a LattePanda, because I'm a Windows 10 kinda guy.

For the software... well... I'm still figuring that out, but I assume it will be something like Docker, Swarm, and Kubernetes, maybe a visualization software for monitoring my node performance. All that good stuff.

For the budgeting, I'm assuming I'll have around $1400 to play around with, due to some very nice people willing to see me pull my hair out for 2 months straight.

Thank you all for reading and I would really appreciate any feedback I can get! Take care.

Pssst, if you would like to take a more active role in helping me create this, please hop in this Discord server I made for the project: " https://discord.gg/HB7uZe8 " I will commonly post updates and ask questions there for the people who prefer to use that platform.
 
A fool's errand, I fear. My recall of such clusters is that they are Unix-based and utilize a local-area network (e.g. TCP/IP on Ethernet). There is no Unix for the T4, and the current T4 has no native Ethernet support (you'd need, for example, SPI-based Wiznet modules).
 
USB Ethernet is starting to look pretty promising. It's at least a whole lot faster than the Wiznet modules. Further testing is still being done, but if you need to go that route it's at least viable for high speeds.
 
As @vjmuzik notes, USBHost can get good data rates to wired LAN with recent work. Though a direct USB Device or Host link, at some fraction of 480 Mbps, could be an option as well.

The host will need to talk to the nodes' USB device ports to program them (TyCommander is good for that), and if that link is already there, can it be good enough? It is at 7 MB/s now and should double at least once; overhead is currently low and may get lower if DMA usage goes well.
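To put those rates in perspective, here is a rough Python sketch of the arithmetic. The 7 MB/s figure, the hoped-for doubling, and the 480 Mbps signalling rate come from the post above; the 1 MB payload and the function name `transfer_seconds` are just assumptions for illustration, and real throughput will sit below the raw signalling rate.

```python
def transfer_seconds(payload_bytes, rate_mb_per_s):
    """Time to move a payload at a sustained rate given in MB/s (1 MB = 1,000,000 bytes)."""
    return payload_bytes / (rate_mb_per_s * 1_000_000)

# Raw USB high-speed signalling rate quoted in the thread, converted from Mbit/s to MB/s.
USB_HS_RAW_MB_PER_S = 480 / 8

# Hypothetical 1 MB work package sent to one node.
payload = 1_000_000
now = transfer_seconds(payload, 7)       # current measured rate from the thread
doubled = transfer_seconds(payload, 14)  # if the rate doubles once, as hoped

print(f"raw USB ceiling: {USB_HS_RAW_MB_PER_S:.0f} MB/s")
print(f"1 MB at 7 MB/s:  {now * 1000:.1f} ms")
print(f"1 MB at 14 MB/s: {doubled * 1000:.1f} ms")
```

Whether that is "good enough" depends on how much data each task needs compared with how long the node spends computing on it.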

I poked around and indeed the rPi has some history, back to its single-core days even, with direct Linux support for networking and even Python machine managers that manage the new 4 cores.

Running on Teensy as noted 'sounds like fun', but unless each node has an interpreter it would need fixed code to run. Maybe that is to be expected. Then the question is: what can run to good effect? One user's note said the FPU can match the Cray-1 for processing?

Under $20 is nice, and with a good breakout you can get the SD card, USB Host, and all pins brought out for full utility. Programming and control would indeed take a good plan and effort. And with a heat sink and cooling, each can be overclocked to good effect.
 
With that information, bringing breakout boards into the equation might be a bit... much. I didn't really expect to have to use USB Ethernet either. After doing some research, the ODROID looks like it just might be the better option, as it is suited for this. For about twice the cost per node, I can maybe make this project doable.

I still don't quite know what I would run on this. I was kinda hoping that would come to me now after thinking about it for a week or so. It hasn't.

Hmm. Decisions, decisions.
 
As far as a breakout goes, I was thinking of a simple PCB to connect the bottom pads, if SD or USB is needed. It would be a few bucks a board or less, depending on where the PCB comes from and what it gets populated with.
 
You don't state what bandwidth you need between each node, but if not too high one possibility would be to make a mesh of Teensy4.0s using the I2S ports, with IN1, OUT1A, OUT1C, OUT1D, IN2 and OUT2 reprogrammed as inputs or outputs as needed. These are intended for audio but can deliver any data between FIFOs 4 bytes at a time.
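For a feel of what "not too high" might mean, here is a small Python sketch of the payload rate of one I2S data line. Only the 4-byte word size comes from the post; the stereo framing and the 44.1 kHz frame rate are assumed example values, and the rates the Teensy 4.0 I2S hardware can actually sustain are not stated here.

```python
def i2s_link_bytes_per_second(frame_rate_hz, words_per_frame, bytes_per_word=4):
    """Payload rate of one I2S data line: frames/s * words/frame * bytes/word."""
    return frame_rate_hz * words_per_frame * bytes_per_word

# Assumed example: stereo frames (2 words each) at the common 44.1 kHz audio rate.
one_line = i2s_link_bytes_per_second(44_100, 2)

print(f"{one_line} bytes/s per data line ({one_line / 1000:.1f} kB/s)")
```

Raising the frame rate or using several data lines in parallel scales the figure linearly, so it is worth settling on a target bandwidth per node before committing to this layout.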

Use right angled connector pins on the pin 0-12 side to plug the boards tightly into a motherboard, then use long pin sockets to connect the VIN and GND pins on the RHS of all the boards together.
 
@AidanP - if you are still considering this: I'm not sure, but this Lua-interpreter-example might allow the Teensy 4.0 nodes to run an interpreter and take code on the fly for the cluster.

No personal background with Lua 5.3 Reference Manual … but …

Seeing this, and looking at the rPi cluster, it had Python scripting feeding and controlling the cores. I'm not sure there is anything else active on Teensy for running scripted code that wouldn't require a full code upload for each new 'task'.
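The idea of nodes that accept script text at run time, instead of a firmware upload per task, can be modelled on a desktop first. The Python sketch below is only a simulation of that pattern and not Teensy or Lua code; `Node`, `submit`, and `run_on_cluster` are hypothetical names, and the 4-node size is arbitrary.

```python
class Node:
    """Stand-in for one cluster node that keeps an interpreter resident."""

    def __init__(self, node_id):
        self.node_id = node_id

    def submit(self, source, chunk):
        # Interpret the received script text, then call its task() on this node's data.
        namespace = {}
        exec(source, namespace)
        return namespace["task"](chunk)


def run_on_cluster(nodes, source, data):
    # Deal the data out round-robin, one chunk per node, and gather the partial results.
    chunks = [data[i::len(nodes)] for i in range(len(nodes))]
    return [node.submit(source, chunk) for node, chunk in zip(nodes, chunks)]


# The "task" is plain text, so a new job only means sending a new string.
SUM_OF_SQUARES = "def task(chunk):\n    return sum(x * x for x in chunk)\n"

nodes = [Node(i) for i in range(4)]
partials = run_on_cluster(nodes, SUM_OF_SQUARES, list(range(10)))
total = sum(partials)
print(partials, total)
```

On real hardware the string would travel over whatever link connects master and node, and the node-side interpreter would play the role that `exec` plays here.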

As far as interconnect goes: if the I/O isn't needed for much else, any of the buses could be used (SPI, I2C, serial UARTs, or perhaps I2S) if not USB direct to the master node or to a group node.

Questions on that linked Lua thread might get help or direction on the possibilities. Lua is a byte-coded, low-level script, so it should have decent performance, and with heat sinks and a bit of moving air through the cluster, running at 800 to 900 MHz might make up for any interpreting overhead.

Maybe TeensyThreads could be used for cluster management {stop,start,upload,...}. Each T4 has 7 usable UARTs, so one T4 could serve as a group node connected to 6 or 7 of the nodes, filtering and monitoring the debug/status spew from one serial UART on each active node at 5 Mbaud. Four of those group nodes tasked with gathering feedback could support 24 or 28 nodes and feed back to the master. With the addition of a USB Host connection, a multiport hub could own a group of nodes, and the master node could just work through those group masters.
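The fan-out arithmetic above can be checked with a few lines of Python. The 6-or-7 workers per group node and the four-group example come from the post; the function names are hypothetical, and reading "6" as the case where one UART is kept for the uplink to the master is an assumption.

```python
import math

def workers_supported(group_nodes, workers_per_group):
    """Worker nodes that a given number of group nodes can monitor."""
    return group_nodes * workers_per_group

def group_nodes_needed(worker_nodes, workers_per_group):
    """Smallest number of group nodes that covers every worker node."""
    return math.ceil(worker_nodes / workers_per_group)

# Four group nodes, as in the post: 6 or 7 workers each.
print(workers_supported(4, 6), workers_supported(4, 7))

# The original plan asked for 30+ workers, which needs one more group node either way.
print(group_nodes_needed(30, 6), group_nodes_needed(30, 7))
```

So a 30-node build on this scheme would come to 30 workers plus 5 group nodes, before counting the master.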
 