
[FAH] Client connect errors
Scotttheking
Moderator Emeritus
Join Date: Dec 2000
Location: College Park, MD
Jan 18, 2003, 12:28 PM
 
I just logged into my fast box, and it's been sitting there all night with these:

[16:26:32] + Attempting to get work packet
[16:26:32] - Connecting to assignment server
[16:26:33] - Successful: assigned to (171.64.122.119).
[16:26:33] + News From Folding@Home: Welcome to Folding@Home
[16:26:33] Loaded queue successfully.
[16:26:33] - Couldn't send HTTP request to server
[16:26:33] + Could not connect to Work Server
[16:26:33] - Error: Getwork #5 failed, and no other work to do. Waiting before retry
What is going on, and how do I fix it? Restarting the client did nothing. It's got a 20 point protein waiting to be sent in.
This one is running a Windows client.

Same with another Linux client, it's stuck.

Same with Windows client.


I'm guessing all my boxes are stuck.

WTH is going on???
My whole farm is sitting here, idle.


Edit: Doing more digging, I find this.
http://folding.stanford.edu/serverstat.html

And yep, the server I'm trying to connect to is overloaded. This is absurd.
( Last edited by Scotttheking; Jan 18, 2003 at 12:35 PM. )
My website
Help me pay for college. Click for more info.
     
Shaktai
Mac Elite
Join Date: Jan 2002
Location: Mile High City
Jan 18, 2003, 01:08 PM
 
Looks like Stanford is having some kind of major problems. They are on Pacific time, so they probably became aware of the problem only an hour or two ago and are having to get staff in to work on it.

UPDATE 09:39 Pacific time: The website, stats and server status are all back up. Looks like the work servers may be too, but can't confirm it until I have something more to send or receive. This caught the Stanford folks sound asleep in their beds most likely. Looks like they scrambled to get right on it though.
( Last edited by Shaktai; Jan 18, 2003 at 02:11 PM. )
     
Scotttheking  (op)
Moderator Emeritus
Join Date: Dec 2000
Location: College Park, MD
Jan 18, 2003, 01:43 PM
 
That's only what, 9 hours too late?

They need to fix their setup so this stuff doesn't happen.
I don't have time to monitor my farm and switch stuff to different projects when they go down.
     
reader50
Administrator
Join Date: Jun 2000
Location: California
Jan 18, 2003, 03:23 PM
 
The "Server Status" indicators in the quickstats scan that page. The code assumes that the project is online for each platform if any server can accept units and any server can assign them. This basically assumes the client will check all servers and assign / upload as needed.

Looks like the client is not that smart. I'll have to rework the status code to trip on any server problems that could affect a client.
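For illustration, the two status policies described above could be sketched like this. This is only a rough sketch, not the actual quickstats code: the `ServerStatus` fields, both function names, and the second server address are assumptions made up for the example, and the real layout of the serverstat page is not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class ServerStatus:
    # Hypothetical fields; the real serverstat page may expose different data.
    address: str
    can_assign: bool  # server is handing out new work units
    can_accept: bool  # server is accepting finished work units


def online_if_any(servers):
    """Lenient policy: online if any server assigns and any server accepts."""
    return any(s.can_assign for s in servers) and any(s.can_accept for s in servers)


def online_if_all(servers):
    """Strict policy: trip on any server problem that could strand a client."""
    return bool(servers) and all(s.can_assign and s.can_accept for s in servers)


servers = [
    # The overloaded server from the log in the first post.
    ServerStatus("171.64.122.119", can_assign=False, can_accept=False),
    # A made-up healthy server (documentation address range).
    ServerStatus("192.0.2.10", can_assign=True, can_accept=True),
]

print(online_if_any(servers))  # lenient policy reports the project as online
print(online_if_all(servers))  # strict policy flags the problem
```

The lenient check is only accurate if a client really does move on to a healthy server; the strict check errs on the side of warning people.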
     
Scotttheking  (op)
Moderator Emeritus
Join Date: Dec 2000
Location: College Park, MD
Jan 18, 2003, 05:00 PM
 
Looks to be back now.
That sure was absurd.
     
Shaktai
Mac Elite
Join Date: Jan 2002
Location: Mile High City
Jan 18, 2003, 07:47 PM
 
The official word on the problem from Vijay Pande is:

Around 2-3am, there was apparently a major problem at Stanford's main computer centers, causing net outages throughout campus.

It looks like F@H is (back) up and mostly ok, just really overloaded due to everyone trying to return data. The clients have built-in backoffs, so it should resolve itself, as long as everyone just waits (rather than pounding the servers manually -- which will prolong this for everyone).

Guha & I are on it.
So it wasn't just their servers but a network outage that created the problems. Blame it on the full moon I guess.
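To show what a built-in backoff like the one mentioned in the quote above generally looks like, here is a minimal sketch of an exponential backoff schedule. The actual client's retry timing is not given in this thread, so the function name `backoff_delays` and every number in it (60 second base, doubling, one hour cap) are assumptions for illustration only.

```python
import random


def backoff_delays(base=60.0, factor=2.0, cap=3600.0, attempts=6,
                   jitter=0.0, rng=random.random):
    """Yield wait times in seconds between retries, growing until capped.

    All defaults are illustrative assumptions, not the real client's values.
    """
    delay = base
    for _ in range(attempts):
        # Optional jitter spreads clients out so they do not all retry at once.
        yield min(cap, delay) * (1.0 + jitter * rng())
        delay *= factor


# With no jitter the schedule simply doubles each time.
print([round(d) for d in backoff_delays()])
```

Each failed attempt waits longer than the last, which is why restarting the client by hand (and resetting the wait) only adds load while the servers are recovering.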

I was lucky, my boxes had mostly all downloaded new proteins right before the outage, and so just kept crunching through it. Only had one out of six that was affected, and then probably didn't lose more than an hour or two of crunch time on that one, and it is back up working on another 19.4 pointer that will finish tomorrow (it is the Celeron, so it takes a while).

Darn Scott, you seem to have the worst luck with this project. Go weeks without a major glitch, you join in and the project gets hit hard just in time to take out most of your farm.

What a bummer!
     
Shaktai
Mac Elite
Join Date: Jan 2002
Location: Mile High City
Jan 18, 2003, 07:52 PM
 
Originally posted by reader50:
The "Server Status" indicators in the quickstats scan that page. The code assumes that the project is online for each platform if any server can accept units and any server can assign them. This basically assumes the client will check all servers and assign / upload as needed.

Looks like the client is not that smart. I'll have to rework the status code to trip on any server problems that could affect a client.
From what I could tell, it couldn't reassign other servers to issue work, because the entire system was down or overloaded. Normally though, if it can't reach one server, then it will try another. If a receiving server goes down, it will hold finished work in the queue until it can be sent.
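The normal behavior described above (try another server, and hold finished work in the queue until it can be sent) could look roughly like this. It is a sketch of the general idea only, not the client's real code: the function names, the `try_send` callback, and the server and unit names are all hypothetical.

```python
from collections import deque


def send_with_failover(unit, servers, try_send):
    """Try each server in order; return the one that accepted, or None."""
    for server in servers:
        if try_send(server, unit):
            return server
    return None


def flush_queue(queue, servers, try_send):
    """Send queued results oldest first; keep the rest if no server accepts."""
    sent = []
    while queue:
        server = send_with_failover(queue[0], servers, try_send)
        if server is None:
            break  # every server refused; hold the work for a later retry
        sent.append((queue.popleft(), server))
    return sent


# Made-up example: one server is down, the other is reachable.
queue = deque(["unit-1", "unit-2"])
servers = ["server-a", "server-b"]
down = {"server-a"}
sent = flush_queue(queue, servers, lambda server, unit: server not in down)
print(sent)
```

If every server is unreachable, as during a campus-wide outage, nothing is sent and the finished units simply stay in the queue.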
     
Scotttheking  (op)
Moderator Emeritus
Join Date: Dec 2000
Location: College Park, MD
Jan 18, 2003, 08:54 PM
 
Originally posted by Shaktai:
From what I could tell, it couldn't reassign other servers to issue work, because the entire system was down or overloaded. Normally though, if it can't reach one server, then it will try another. If a receiving server goes down, it will hold finished work in the queue until it can be sent.
That's what it's supposed to do.
That's not what it did. It just kept sending me to the same server for hours.
Finally it sent me to new servers.

I can do the big 20 pointers in under a day, so a server outage hurts me bad. This is the 2nd time there's been a major outage for me while doing f@h.

On a good note, I might be able to get another Athlon. I hope.
     
krove
Mac Elite
Join Date: Jul 2000
Location: Washington, DC
Jan 18, 2003, 11:51 PM
 
Yeah, I had a big one that took a couple days on an old iMac that couldn't be submitted. I checked the server status on their website and all looked well at the time (this was about 2 weeks ago). I checked out the protein that had completed and apparently the project was gone or no longer up. I was within the time limits, so I'm not sure what happens when they finish distributing a particular project... Kinda sucks to have crunching time wasted like that...

I also found that I had inadvertently set up a client incorrectly on a machine at work, putting "k" as the team number instead of 16. Lost 15 points there...

How did it come to this? Goodbye PowerPC. | sensory output
     
reader50
Administrator
Join Date: Jun 2000
Location: California
Jan 19, 2003, 01:34 AM
 
I believe you still get stats credit for work submitted after the time limit. You do on other projects anyway. It's just not as useful to the project, because they may have reissued the unit to someone else. It would become a redundancy check instead of original data.
     
Shaktai
Mac Elite
Join Date: Jan 2002
Location: Mile High City
Jan 19, 2003, 04:16 AM
 
Yes, you should get credit. There have been a few instances where a receiving server went down prematurely, but even if the project ends, keep letting it try to send. If it doesn't go in a couple of days, go to the folding forum for help. The Pande Group really tries to help out, and they have improved their systems dramatically.

Today's snafu had nothing to do with their systems malfunctioning, but was rather the result of a campus-wide network problem. Unfortunately, it created a lot of problems for them and all of us.
     
Scotttheking  (op)
Moderator Emeritus
Join Date: Dec 2000
Location: College Park, MD
Jan 19, 2003, 12:02 PM
 
Originally posted by Shaktai:
Darn Scott, you seem to have the worst luck
You have no idea
     
   
 
All times are GMT -4.
All contents of these forums © 1995-2017 MacNN. All rights reserved.