PJRC main web server down today

PaulStoffregen

We're having trouble with the main website. :(

This forum runs on a separate server which is still running, but may become inaccessible if DNS doesn't work.
 
We're waiting on the server hosting company to replace the server's hard drive. Time frame is unknown.

It's also unknown whether they will copy the old data or give us basically a clean slate, which would require reinstalling everything from backup. But given how badly it was performing yesterday, I doubt they will attempt a drive copy.

I can confirm we do have backups from May 11, before the problem, and from May 12, while the server was still running slowly. But it's just an rsync copy of all important files, not a special backup format where a restore can be done at the press of a button. Restoring everything back to 100% working may take time.

Here's the info we heard from them yesterday.

During diagnostics, the server showed extremely high disk I/O wait and very large storage delays, with write operations sometimes taking over 1-2 seconds to complete even under very light workload. This is causing Dovecot Maildir synchronization operations to take 35-55 seconds in some cases, which explains the email slowness and intermittent mail access issues.
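For anyone curious how write stalls like that can be spotted from userspace, here is a minimal sketch (my own assumption, not what the hosting company actually ran) that times small write+fsync operations in a directory on the suspect drive. The function names, sample count and block size are hypothetical choices for illustration.

```python
import os
import tempfile
import time


def measure_fsync_latency(directory, samples=20, block_size=4096):
    """Time small write+fsync operations in `directory`.

    Returns a list of per-write latencies in seconds.
    (Hypothetical helper; sample count and block size are arbitrary.)
    """
    latencies = []
    payload = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to the device, not just the page cache
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the temporary test file
    return latencies


def summarize(latencies):
    """Return the worst-case and median latency from a list of samples."""
    ordered = sorted(latencies)
    return {
        "max_s": ordered[-1],
        "median_s": ordered[len(ordered) // 2],
    }
```

Calling `summarize(measure_fsync_latency("/var/tmp"))` on the affected machine and comparing `max_s` against the 1-2 second stalls mentioned above would give a rough idea of how bad the drive is. The path is just an example; it needs to be on the drive being tested.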

We performed extensive checks, including SMART diagnostics, filesystem trim operations, I/O monitoring, kernel log review, and mail service analysis. The SSD does not currently show hard SMART failure indicators such as bad sectors or uncorrectable errors; however, the drive has approximately 65,000 power-on hours and is exhibiting behavior consistent with an aging SSD experiencing internal latency/controller stalls.

We also performed a large filesystem trim operation on the SSD, but the severe latency behavior continued afterward, which further suggests the issue is related to the aging storage device itself rather than temporary filesystem cleanup.

At this time, the server remains operational, but the storage latency is likely to continue affecting performance intermittently. We would recommend planning for proactive SSD replacement before the condition worsens further.
 
Server has a new SSD installed. No software, just a clean Linux install.

So far the only service I've restored is DNS, so browsers can resolve forum.pjrc.com to an IP number to access this forum.

I'm uploading the backup files now, about 75GB total. Should take a couple hours...
 
The public website should be back. Access to all the tech info, and software installation through the Arduino IDE's Boards Manager, should be working again.

The shopping cart is still disabled, email is offline, and a number of things we use internally (like publishing blog articles) are still down.
 
Enter just pjrc.com in Edge or Brave, without specifying https, and the site is served over unsecured HTTP instead of redirecting to HTTPS.
 
Whether our SSL cert is still auto-updating is on my list of low priority server stuff to still investigate. It expires June 19, so not urgent yet.
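One low-effort way to keep an eye on that date is to read the certificate's expiry straight from the server. This is a minimal sketch using Python's standard `ssl` module; the helper names are hypothetical, and it says nothing about whether the auto-renewal itself is configured correctly.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=10):
    """Connect over TLS and return the server certificate's notAfter string,
    e.g. 'Jun 19 12:00:00 2026 GMT'. (Hypothetical helper.)"""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days remaining until a notAfter timestamp (negative if already expired)."""
    expires = ssl.cert_time_to_seconds(not_after)  # parses the GMT date format
    if now is None:
        now = time.time()
    return (expires - now) / 86400
```

For example, `days_until_expiry(fetch_not_after("www.pjrc.com"))` could run from a daily cron job and send a warning when the result drops below some threshold, which would catch a renewal that silently stopped working.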

My workbench is a mess right now. I'll clean up soon, so I can get a Windows machine running to test from Edge.

From Firefox on Linux, you can get the Apache2 default page if you just put the IP number into the address bar. I'm not really concerned about that.
 
I edited the server config again. Hopefully redirects are working now? The redirect is always supposed to add "www" if missing, and always redirect to the secure (SSL) site.
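To make the intended redirect policy concrete, here is a small sketch of the rule as described (add "www" to the bare domain, always switch to https), written as a plain Python function rather than actual Apache config. The canonical hostname `www.pjrc.com` is my assumption from the "always add www" rule, and ports are ignored for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed canonical hostname, based on the "always add www" rule.
CANONICAL_HOST = "www.pjrc.com"


def canonical_redirect(url):
    """Return the URL a request should be redirected to, or None if the
    URL is already canonical (https plus www). Hypothetical sketch."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "pjrc.com":
        host = CANONICAL_HOST  # add the missing "www."
    # Always use https; keep path, query and fragment unchanged.
    target = urlunsplit(("https", host, parts.path or "/", parts.query, parts.fragment))
    return None if target == url else target
```

With this rule, `http://pjrc.com` should end up at `https://www.pjrc.com/` in a single hop, which is exactly the case reported from Edge and Brave.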

Another possible problem may be something wrong affecting browser caching of images. When you click around the site, are the images reloading from the network or appearing quickly from cache?
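Besides watching the browser's network tab, one way to check this is to look at the caching headers the server sends with an image. The sketch below is a simplified, hypothetical helper that only looks at `Cache-Control`. It ignores `Expires` and the heuristic freshness browsers may apply when only `Last-Modified` is present, so treat it as a quick check, not a full HTTP caching implementation.

```python
import re


def cache_lifetime(headers):
    """Rough check of how long a browser may reuse a response from cache.

    Returns the Cache-Control max-age in seconds, 0 if the response is marked
    no-store or no-cache, or None if no explicit lifetime is given.
    Header names are matched case-insensitively. (Hypothetical helper.)
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    cc = lowered.get("cache-control", "").lower()
    if "no-store" in cc or "no-cache" in cc:
        return 0  # must not be reused without going back to the server
    match = re.search(r"max-age=(\d+)", cc)
    if match:
        return int(match.group(1))
    return None  # no explicit lifetime; browser behavior depends on other headers
```

Passing it the `headers` of a `urllib.request.urlopen()` response for one of the site's images would show whether the restored server config is still sending a cache lifetime for images.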
 