
New Altivec-enhanced Seti worker in need of testing (Page 5)
BTBlomberg
Forum Regular
Join Date: Sep 2005
Location: Chicago Suburbs
Status: Offline
Sep 27, 2005, 12:29 AM
 
OK, regarding the odd, short WU results, here is what I found:

I clicked the "Result ID #" for several of these and followed the chain:
"Workunit #" (shows the other 4 results for the same WU) -> "canonical result #" at the top of the Work Unit page. The stderr for all of them looks like this:
Code:
stderr out
<core_client_version>4.XX</core_client_version>
<stderr_txt>
...
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.
</stderr_txt>
This would support your point, Shaktai. A noisy unit would have too many peaks for the unit to be processed: the signal would be all over the place and would likely overflow the mechanism in place to focus on the results found by the first scan.
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 27, 2005, 01:18 AM
 
Originally Posted by BTBlomberg
OK, regarding the odd, short WU results, here is what I found:

I clicked the "Result ID #" for several of these and followed the chain:
"Workunit #" (shows the other 4 results for the same WU) -> "canonical result #" at the top of the Work Unit page. The stderr for all of them looks like this:
Code:
stderr out
<core_client_version>4.XX</core_client_version>
<stderr_txt>
...
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.
</stderr_txt>
This would support your point, Shaktai. A noisy unit would have too many peaks for the unit to be processed: the signal would be all over the place and would likely overflow the mechanism in place to focus on the results found by the first scan.
Ok, I follow you. But why did the other systems not have this issue with the WU? Alex did say a while back that some of his routines were actually more sensitive than the release app from SETI, but it would be surprising if this was the cause. There still must be some reason that the other systems did not have an issue with the WU.

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 27, 2005, 07:44 AM
 
OK, I have one that did not verify -

http://setiathome.berkeley.edu/worku...?wuid=28275256

I do not know if it can be downloaded.

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
BTBlomberg
Forum Regular
Join Date: Sep 2005
Location: Chicago Suburbs
Status: Offline
Sep 27, 2005, 10:56 AM
 
OK Snake Doctor,

I see your problem. You are running 10.3.9 (Darwin 7.9). Alpha-4 and 5 are not optimized for 10.3.9 but for 10.4.X, so you are likely suffering from the lack of fallback code in the worker. That is why Alex said he should be less lazy and work on finishing that part. That is likely your issue. Otherwise, for results like this where all clients/platforms have similar times, it's a normal noisy WU.

Also, the message I posted above:
Code:
SETI@Home Informational message -9 result_overflow NOTE: The number of results detected exceeds the storage space allocated.
will only appear when a specific non-Mac client finds one of these noisy WUs, so not all of them will have this note.
     
beadman
Dedicated MacNNer
Join Date: Nov 2004
Location: Virginia
Status: Offline
Sep 27, 2005, 12:01 PM
 
Here's another example of what Phil is talking about: http://setiathome.berkeley.edu/workunit.php?wuid=27521068

My machine took 223 seconds, and the others all took more than 18K seconds. No errors on any of the results. Mine is a PowerBook G4 1.67 GHz, 1 GB RAM, using 4.44 Superbench and Alpha-4 G4. My "normal" results average around 6,400 seconds.

I'm not complaining, since I got credit - but I thought Alex or Rick might like to see the results.

beadman
     
BTBlomberg
Forum Regular
Join Date: Sep 2005
Location: Chicago Suburbs
Status: Offline
Sep 27, 2005, 12:19 PM
 
Beadman,

Once again you are running MacOS 10.3.9 rather than MacOS 10.4.2. That looks to be the issue.
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 27, 2005, 12:46 PM
 
Originally Posted by BTBlomberg
Beadman,

Once again you are running MacOS 10.3.9 rather than MacOS 10.4.2. That looks to be the issue.
I think you are wrong about that. According to the website where I got A4 from (the website operated by Rick and pointed to by Alex), the app I picked up was for 10.3.9. There is an A5 and an A3 for 10.4.x, but A1 and A4 are SUPPOSED to be for G4s running 10.3.9. If what I read is correct, A4 will also run on G5 machines using 10.4.x, but this is not a requirement.

During this testing, if an app was not for a G4 running 10.3.9, we all knew it pretty quickly. If it ran at all it would error on every WU; in most cases the app crashed outright.

I think what Alex was saying was that he would like to incorporate the improvements made in the A5 app into the 10.3.9 versions, but has not had time to compile them with the 10.3.9 compatibility turned on. That would produce another version for 10.3.9 (A6, A4.5, who knows).

Regards
Phil
( Last edited by Snake_doctor; Sep 27, 2005 at 12:55 PM. )
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 27, 2005, 01:09 PM
 
Originally Posted by beadman
Here's another example of what Phil is talking about: http://setiathome.berkeley.edu/workunit.php?wuid=27521068

My machine took 223 seconds, and the others all took more than 18K seconds. No errors on any of the results. Mine is a PowerBook G4 1.67 GHz, 1 GB RAM, using 4.44 Superbench and Alpha-4 G4. My "normal" results average around 6,400 seconds.

I'm not complaining, since I got credit - but I thought Alex or Rick might like to see the results.

beadman
Exactly right! Your machine processed the WU in 223 sec, and everyone else took a normal time. This is exactly what happened on two of mine. I looked at the WUs for both your system and mine, and at the results for each of the other machines. I do not see any errors at all, just short CPU times. Although one of the two from my system did not have credit awarded, I do not see any errors in that result listing either.

I do not think the OS 10.4.x issue is the problem either (not that it is a problem, just interesting). I hope Rick or Alex get a chance to take a look at these three results. That is the only reason I reported them.

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
BTBlomberg
Forum Regular
Join Date: Sep 2005
Location: Chicago Suburbs
Status: Offline
Sep 27, 2005, 02:04 PM
 
You are right that A4 was for 10.3.9; I wondered about that but did not check and correct myself. But I have a feeling that that is also where the bug is. There are different compile components between 10.3.9 and 10.4.X, as we should know from reading this thread. It may be that something in the 10.3.9 code path has a bug, so that I could run Alpha-4 on MacOS 10.4.2 with none of these issues while you run it on 10.3.9 and have them: different code being used for something is creating this issue.

I checked my Alpha-4 results with the short times and saw they still fit with the other 3. Yours definitely do not, and that is a bug. Alpha-5 is 10.4.X only and has been great, but it may be that if the same 10.3.9 code were worked in, it would show the same issue. It looks like there is something in need of being tracked down. It is all alpha, after all, and they need to find all these little things that go wrong, so it's good to know it's there.

I believe that if you were running A5 on 10.4.2 there would be no issue, but there are plenty of people out there still on 10.3.9, so it will likely need to be addressed. Hopefully they have time to track it down for Alpha-6.
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 27, 2005, 03:21 PM
 
Originally Posted by BTBlomberg
You are right that A4 was for 10.3.9 ... I have a feeling that that is also where the bug is. ... It may be that something in the 10.3.9 code path has a bug, ...
That is why some of us have reported this little "fluke." It may be the code, but as I said, it may just be some issue with the XML file and the keeping of stats. It could also be a MacNN 4.44 Superbench issue. After all, that code has some play in calculating credits and keeping track of CPU times too.

... I checked my Alpha-4 results with the short times and saw they still fit with the other 3. Yours definitely do not, and that is a bug. ... It looks like there is something in need of being tracked down. It is all alpha, after all, and they need to find all these little things that go wrong, so it's good to know it's there.
I hope that if it is a bug you will turn out to be correct and the fix is simply a more optimal compile for 10.3.9. Having been a programmer, I was never that lucky, and these things are never that simple.

... there are plenty out there still with 10.3.9 so it will likely need to be addressed. Hopefully they have time to track it down for Alpha-6.
Yeah, and it is very expensive to upgrade. It is not just the $100 for the OS; a lot of other stuff winds up having to be replaced too. In addition, it usually takes a few weeks/months to get things working right again. If the systems were only crunching these projects it would be no problem. I do a little farming, but it is all out in the yard, and there is no silicon involved.

I also see now that both of my odd WUs have gotten credit. Only the third one I reported in an earlier post did not validate.
Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
BTBlomberg
Forum Regular
Join Date: Sep 2005
Location: Chicago Suburbs
Status: Offline
Sep 27, 2005, 04:20 PM
 
I don't think BOINC Superbench would be it, as that is just the project/file manager: Superbench just tells the client to get an extra set of WUs, and it is the same BOINC client on either version, so if it worked fine with standard SETI or Javalizard it should act the same here.

As for the thought of recompiling for a fix, I did not mean it to sound that way, as that is not fully what I meant. I meant more that the custom code path in the app for 10.3.9, or the library used to compile for 10.3.9, may hold the bug, so catching and fixing that would do it. It could be that just a single value throws it off. Tracking that down may require one of you 10.3.9ers running debug software to log it.

I did not have a bad time going to 10.4, but it is true that there can be some stumbling blocks at first to getting fully productive. One is playing with all the Dashboard plugins until your system slows to a crawl.

As for that last WU, the guys at SETI may have gotten POed that you were getting full credit for a bad crunch and changed the code to dump sets like this and not give credit. They may have thought it was a bad client attempting to pad someone's stats. Who knows.
     
rick
Fresh-Faced Recruit
Join Date: Sep 2005
Status: Offline
Sep 27, 2005, 05:51 PM
 
Well I've chugged away on that invalid result that amigoivo reported. It's definitely something with the Gaussian finding code because it misses one of them.

Originally Posted by Snake_doctor
Ok i have one that did not verify -

http://setiathome.berkeley.edu/worku...?wuid=28275256
I managed to grab this one as well, so I'm running it now. Hopefully it will just be the Gaussians, like the previous one.



There seems to be a bit of confusion about the difference between the 10.3.9 and 10.4 clients. I think Snake_doctor hit the nail on the head, but I'll lay down the Definitive Truth.

The difference is that the vDSP library (Altivec-optimised code for doing digital signal processing type stuff) on 10.3.9 doesn't contain a few routines that we want to use. So basically we have two compiles: one for 10.4, which uses these routines, and one for 10.3.9, which uses our own routines (which have the same names).

If you were running the 10.4 client on 10.3.9 then it just flat out wouldn't run in the first place: it wouldn't be able to "link" to the functions it needed to run. However, you can run the 10.3.9 client just fine on 10.4, but you won't be using the OMFG ultra fast routines from Apple. You'll be using our routines.

I'm pretty sure that what Alex meant by "be less lazy about writing fallback code" was that these "fallback" functions are not optimised in the slightest in the current alpha clients. In fact, it's pretty much equivalent to the code in the standard unoptimised client.
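As a very loose analogy in Python terms (module and function names here are made up, and the real clients are compiled C bound to vDSP at link time, so substitute link-time selection for the import below):
Code:
# Loose analogy only: the two compiles differ in which same-named routine
# they end up bound to, much like this import fallback.
try:
    from apple_vdsp import scale_vector   # hypothetical fast Apple routine (the 10.4 case)
except ImportError:
    def scale_vector(vec, k):             # our same-named, unoptimised fallback (the 10.3.9 case)
        return [x * k for x in vec]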



As for work units with really short times... dunno. I haven't been able to grab any of these work units to test. I'll happily go along with the conspiracy theory that the people running SETI@home occasionally put in a "bait" work unit to see if any of the clients are lying.

If you post links to these, I'll try and get round to having a look at them. Just remember that I can only grab work units that are relatively recent. I can usually get one if all 4 of the clients haven't finished working on it.

I doubt it's anything to worry about. If there was anything wrong then you definitely wouldn't get any credit. If I have time I'll ask about it on one of the mailing lists.



Currently, my guess about what I'll try to get into alpha 6 is (haven't confirmed any of this with Alex yet though):
  • some useful blurb in stderr.txt so people can figure out what the hell is going on,
  • fixed Gaussian code, although we might just have to revert to the original (slower) code until we can figure out what we're doing wrong,
  • (hopefully) much faster 10.3.9 fallback functions (which should also be better for G5 processors),
  • (hopefully) slightly faster pulse finding code (which should be better for G5s),
  • a few other random tweaks to make things better for G5s,
  • fat binary support (i.e. just download one client for both G4 and G5).

Note that even though we will probably have fat binary support you will still have to download a separate client for 10.3.9 and 10.4.
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 27, 2005, 06:47 PM
 
Originally Posted by rick
As for work units with really short times... dunno. I haven't been able to grab any of these work units to test. I'll happily go along with the conspiracy theory that the people running SETI@home occasionally put in a "bait" work unit to see if any of the clients are lying.

If you post links to these, I'll try and get round to having a look at them. Just remember that I can only grab work units that are relatively recent. I can usually get one if all 4 of the clients haven't finished working on it.
Here is a link to the "0" credit WU http://setiathome.berkeley.edu/resul...ltid=119016258

If I can believe the latest stats from the last reporting system, it should be there until tomorrow.

Thanks Rick

Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 27, 2005, 08:26 PM
 
Originally Posted by rick
One last thing, if anyone's using the Python script then you have to remove the trailing underscore and all the characters after it. So:
Code:
python setiURL 09no03aa.23825.2481.54842.127_0
should actually be
Code:
python setiURL 09no03aa.23825.2481.54842.127
While I feel pretty dumb about asking: how do you execute this python command? The command "python" works in the Terminal window, but I can't figure out how to run the script.

OK, I figured out the problem. I guess I am going to have to learn the UNIX "ed" after all.

Regards
Phil
( Last edited by Snake_doctor; Sep 27, 2005 at 09:47 PM. )
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
beadman
Dedicated MacNNer
Join Date: Nov 2004
Location: Virginia
Status: Offline
Sep 27, 2005, 09:42 PM
 
Rick:
Here's one that has three results returned but no consensus; still waiting on the 4th result to come in. Also, my time is 5K instead of my normal 8K seconds on this machine - iBook G4 1.33 GHz, 768 MB RAM, 4.44 Superbench and alpha-4 G4, OS 10.3.9, running the MenuBar simple GUI. This one is a quick run, but not as extremely quick as the one reported earlier.

http://setiathome.berkeley.edu/worku...?wuid=28284506

beadman
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 28, 2005, 12:29 AM
 
Rick/Alex

Here is another short time WU.

http://setiathome.berkeley.edu/resul...ltid=120018349

This one is less than 60 sec. One of the other systems just reported in, and it is over 10,000 sec on this WU. I am certain this does not matter, but all of these really short WUs are on the G4 Dual system. I have a few that are over 1,000 CPU seconds and are also short by comparison with the other systems in the quorum, but all of those are coming off the G4 PowerBook.

This latest one is fresh as of 11:30 US EST

Hope you can catch it. I have tried to use the python script, and the first IF command keeps generating an error. I think I must not have created the file correctly. I am trying to remember all of the arcane ed commands; it is coming back, but it won't be back in time, I am afraid.

Regards
Phil
( Last edited by Snake_doctor; Sep 28, 2005 at 12:48 AM. )
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
rick
Fresh-Faced Recruit
Join Date: Sep 2005
Status: Offline
Sep 28, 2005, 04:36 AM
 
Originally Posted by Snake_doctor
Hope you can catch it. I have tried to use the python script and the first IF command keeps generating an error. I think I must have not created the file correctly. I am trying to remember all of the arcane ED commands and it is coming back, but it won't be back in time I am afraid.
You don't need to use the ed command to run the script. You might have accidentally edited it so you should probably download a fresh copy.

Okay, assuming that you put the "setiURL" file on your desktop, you'll have to use the "cd" command to change directory to your desktop, e.g.:
Code:
cd ~/Desktop
python setiURL 09no03aa.23825.2481.54842.127
and if the script manages to download successfully, the file will be on your desktop. The "~" in the above command means the current user's home directory.

Alternatively, say you put the setiURL script in some random downloads directory. Open the downloads directory in the Finder, then open a Terminal window as before and type "cd " (with a space). Drag the icon in the Finder title bar to the Terminal window and it will drag & drop the directory name. For example:
Code:
cd /Users/rick/Desktop/junk/downloads/
python setiURL 09no03aa.23825.2481.54842.127
If you want to know what the current directory you're in is, use the "pwd" command (print working directory):
Code:
arrakis 9:11:51 rick % pwd
/Users/rick
arrakis 9:11:52 rick % cd Desktop/
arrakis 9:11:56 Desktop % pwd
/Users/rick/Desktop
If you're still having problems I can have a go at creating a GUI app.

I've mailed the boinc_opt list about the short work unit times.
( Last edited by rick; Sep 28, 2005 at 04:36 AM. Reason: aesthetics)
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 28, 2005, 07:36 AM
 
Originally Posted by rick
You don't need to use the ed command to run the script. You might have accidentally edited it so you should probably download a fresh copy.

If you're still having problems I can have a go at creating a GUI app.

I've mailed the boinc_opt list about the short work unit times.
The problem is that when I go to your link for the setiURL file, I get a listing of the file, not a download. I will try saving the link target and see if that helps, but in any case I used ed to fix the file after I got the commands onto the machine. I'll try it again.

I do have yet another fresh short result, again from the G4 Dual.

http://setiathome.berkeley.edu/resul...ltid=120217216

It really seems strange to me that all of these are coming off of only one of my systems, as both are set up the same way.

Thanks again Rick. I will try to get the WU above.

Regards
Phil

EDIT - OK, I got the script and it runs, but either I missed the WU or I am not using the right name, because it says "unable to download work unit". I'll keep after it till I get one.
( Last edited by Snake_doctor; Sep 28, 2005 at 07:45 AM. )
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
Drash
Fresh-Faced Recruit
Join Date: Aug 2001
Location: UK
Status: Offline
Sep 28, 2005, 12:44 PM
 
eMac 1GHz 1GB ATI 7500, 10.4.2, g4-a5, BOINCManager 4.44 with superbench shoe-horned in

Just been tracking down a sporadic general system slowdown that seems to be linked with more pageout activity than usual. Safari is still hemorrhaging VM (I've had 1.2 GB after a few hours!), but I also noticed that the g4-a5 client's memory usage increased dramatically as a unit got to 100% and was reported. 25% through a unit now, the memory usage is 31 MB real, 68 MB virtual (at the start it was 22 MB and 65 MB), but at 100% a few minutes ago it was 233 MB real, 298 MB virtual. That, combined with Safari's leak, seems to be causing the sporadic pageouts and unusually large swap files. I can't pin this down to a5 in particular; I've been trying to track this down for a while now. I never used to have any swap files and very, very few pageouts. Is this memory usage OK?

On the up-side, work units are flowing at a fantastic rate, with no problems (60+ units on g4-a5).

Ash
     
reader50
Administrator
Join Date: Jun 2000
Location: California
Status: Offline
Sep 28, 2005, 03:24 PM
 
With regard to the very-short-time units: I have not seen it mentioned in the BOINC pages, but the Classic info mentioned something that might pertain.

The Classic client would sometimes run into an "interesting" work unit: one that looked more likely than most to contain an intelligent signal. When that happened, the Classic client aborted and flagged the unit as especially interesting. This resulted in full credit, and Berkeley would re-run that unit on their own machines for closer analysis.
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 28, 2005, 08:54 PM
 
Originally Posted by reader50
With regard to the very-short-time units: I have not seen it mentioned in the BOINC pages, but the Classic info mentioned something that might pertain.

The Classic client would sometimes run into an "interesting" work unit: one that looked more likely than most to contain an intelligent signal. When that happened, the Classic client aborted and flagged the unit as especially interesting. This resulted in full credit, and Berkeley would re-run that unit on their own machines for closer analysis.
I thought about that. But in this case they are way too frequent (I had a few in Classic), and the other systems in the quorum are not seeing the same thing. I have been watching the system more closely to see if I can catch one of these going through. I have noticed something a little unusual that Rick and Alex might be able to check: all of my short WUs are almost exactly 1% of the time I would expect based on the numbers generated by the other quorum systems. What I am beginning to suspect is that something in the timekeeping is somehow dividing the CPU time by 100.

If you think about this a little, there is no way that my results, processed in nominally less than 1 minute, could match the other systems in the quorum on these WUs for validation unless the final result were the same. There is no way the results could be the same unless the same processing had been completed successfully on my system. Clearly I can't process the WU completely in 60 seconds (Rick and Alex are good, but not that good). I see no evidence in the client messages indicating anything but normal processing (no short reporting times or reporting errors). And I have not seen these WUs hit the system and depart in less than a minute.

So I am led to conclude that this must be some kind of error in the CPU count for the WU. The only two places this could happen are the app code, where the time is registered, or the client code, where the time is used for credit calculation and reported to the server.

Most of the people who have looked at this have concluded that the client has nothing to do with the actual calculation of the CPU time, so that leaves the application. It is quite possible that one of the math routines is either overwriting the stored time value near the end of the process, or that some part of the code is not contributing properly to the time accounting for the WU. But it must also be something conditional, triggered by some variation in the computer system or the WU, because it does not happen to all WUs. And it is not unique to my system, so that MAY let my system off the hook.

Either of these app-oriented issues could produce CPU counts that fall consistently into the range of 2 orders of magnitude less than the correct amount. For example, if the CPU count shown only reflected the generation of the final upload file: that process would be different for each WU because they are all a little different, but it would take an amount of time relative to the size of the output file being prepared. So the final time would be some nominally relative fraction of the proper WU processing time, which is what I am seeing here. In this case it would be something like a replacement of the stored CPU count with some value that should have been added to the count, and the number would therefore be relative to the size of the WU.

If I am correct, then the only problem would be with credit awarding, so the science would be unaffected. Of course, there is about a 50/50 chance that I am just NUTS!!
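To show the arithmetic behind that 1% pattern (all numbers invented for illustration; this is just the comparison I am doing by hand against the other quorum hosts):
Code:
# Numbers invented for illustration.
expected = 18000.0  # seconds, extrapolated from similar hosts in the quorum
reported = 180.0    # seconds, what my host actually claimed for this WU
print(reported / expected)  # 0.01 -- almost exactly 1/100, every time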

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
mikkyo
Senior User
Join Date: Feb 2002
Location: Silly Valley, Ca
Status: Offline
Sep 29, 2005, 01:19 AM
 
Is there any chance the WU was preempted by another project?
I could see the client losing track of time when it resumes a WU after handing off CPU time to another project,
especially if it were over a day change.
Does anyone have a complete log of one of these WUs from download to completion?
     
Knightrider
Dedicated MacNNer
Join Date: Sep 2004
Location: London
Status: Offline
Sep 29, 2005, 02:28 AM
 
Hi all,

I managed to trap 2 fast WUs, which I have sent to Rick & Alex.

Here are the SAH URLs:

28oc03ab.11417.29328.15918.136
http://setiathome.berkeley.edu/worku...?wuid=28535698

13au03aa.15678.20625.779818.218
http://setiathome.berkeley.edu/worku...?wuid=28535681

If anyone wants a copy, send me a PM with an e-mail addy and I will forward them on.


K.
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 29, 2005, 08:07 AM
 
Originally Posted by Knightrider
Hi all,

I managed to trap 2 fast WUs, which I have sent to Rick & Alex.

Here are the SAH URLs:

28oc03ab.11417.29328.15918.136
http://setiathome.berkeley.edu/worku...?wuid=28535698

13au03aa.15678.20625.779818.218
http://setiathome.berkeley.edu/worku...?wuid=28535681

If anyone wants a copy, send me a PM with an e-mail addy and I will forward them on.


K.
I hope these help, but they are different from the type I have been seeing. With the ones I have been getting, ONLY my system processes them short. With yours, all four systems in the quorum have very short times. Yours are more likely to be legitimate "Wow" suspects that are headed for a closer look at SETI.

See here http://setiathome.berkeley.edu/worku...?wuid=28566490

Mikkyo-

There are other projects running on my system. I have been watching for preemptions, and while they do occur I have not yet seen a relationship. I have seen a few suspend with less than one min. to go and they still finish OK, so if it is that, there must also be something else at play. So far, no evidence that this would be the problem.

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
beadman
Dedicated MacNNer
Join Date: Nov 2004
Location: Virginia
Status: Offline
Sep 29, 2005, 09:21 AM
 
Here's one that is the opposite of the one Phil and I have been reporting: http://setiathome.berkeley.edu/workunit.php?wuid=28655655

My machine finished the WU "normally", although in two-thirds of the normal time, while the other three computers all received the -9 overflow error.

beadman
     
rick
Fresh-Faced Recruit
Join Date: Sep 2005
Status: Offline
Sep 29, 2005, 11:21 AM
 
Originally Posted by Snake_doctor
The problem is that when I go to your link for the setiURL file, I get a listing of the file, not a download. I will try saving the link target and see if that helps, but in any case I used ed to fix the file after I got the commands onto the machine. I'll try it again.
Just hit "Save As..." from the File menu and save it to wherever you want. If your browser makes you save it with an extension it doesn't matter, just make sure you put the correct filename in for "setiURL".

Originally Posted by Snake_doctor
I thought about that. But in this case they are way too frequent (I had a few in Classic), and the other systems in the quorum are not seeing the same thing. I have been watching the system more closely to see if I can catch one of these going through. I have noticed something a little unusual that Rick and Alex might be able to check: all of my short WUs are almost exactly 1% of the time I would expect based on the numbers generated by the other quorum systems. What I am beginning to suspect is that something in the timekeeping is somehow dividing the CPU time by 100.
The CPU time reported will be down to the initialisation that the client has to do. Before the data actually starts getting analysed, there is a small amount of work the client has to do. Firstly, there is some stuff like setting up the weight arrays for the FFTs (fast Fourier transforms) and allocating memory. Secondly, the data is "smoothed".

After the client does this initial work (which will be the same for every work unit) it can then start the analysis. Presumably, the analysis then bails at whatever point it gets to.

The variation in this initial time between all the clients will just reflect the way different clients do the initial work (e.g. different FFT routines) and CPU differences between the systems.
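If it helps to picture it, here's a toy model of that (not the actual client code; the numbers are invented):
Code:
# Toy model: the setup cost is roughly fixed per work unit, so a client
# that bails right after setup reports a short, roughly constant CPU time.
def reported_cpu_time(bails_early, setup_cost=90.0, analysis_cost=18000.0):
    # setup: FFT weight arrays, memory allocation, smoothing the data
    return setup_cost if bails_early else setup_cost + analysis_cost

print(reported_cpu_time(True), reported_cpu_time(False))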



To all of you that have posted links to short work units: I still haven't been able to get one yet. I reckon they just get processed really quickly through the system. However, there might be a clue in the stderr files:

For results that have the "-9 overflow" message, the stderr file contains:
Code:
SETI@Home Informational message -9 result_overflow NOTE: The number of results detected exceeds the storage space allocated.
My theory goes like this:

Let's assume that the work unit is bailing because it is "noisy" or whatever, and is therefore generating a large number of results. However, the amount of space that your system can use to store these results has been set by your BOINC settings. Therefore, say the client wants to generate 20 MB of results and you've only allowed BOINC to use 10 MB: the client will abort as above when BOINC says "no, you can't have any more space".

However, I think the actual SETI@home client has a built-in limit to the number of results it can handle (say, 1000 or something). I reckon that those of you seeing these short work units probably have set BOINC to be able to use a lot of space (a couple of hundred megabytes).

The SETI@home client is therefore reaching its built-in "result limit" before it reaches BOINC's "disk space limit", and that's why you don't see errors like the above: the SETI@home client just exits gracefully.

Possibly.



I'm still running through these invalid results. It's a pain in the ass because the unoptimised client takes at least 8 hours.
     
alexkan  (op)
Forum Regular
Join Date: Aug 2005
Location: Cupertino, CA
Status: Offline
Sep 29, 2005, 04:20 PM
 
This isn't a reply to any post in particular, but I thought I'd offer my thoughts on validation and accuracy. Granted, I haven't tested quite as many work units as Rick has, but I've still noticed a couple interesting things.

First of all, because the validation only looks at the end result to make sure the analyses from different computers have similar spikes/pulses/Gaussians, and that their measured values are within some tolerance of each other, it doesn't really examine the intermediate results from calculations, which we wouldn't really send back anyway. That being said, even though the validation limit for the end results is 1% for most of the data, an even smaller difference, as with amigoivo's invalid work unit, can make a difference.

Specifically, the way floating-point arithmetic is implemented can lead to interesting issues. (For those of you not well-versed in how floating-point is implemented, this site should be a pretty good illustration of how floating-point numbers are often unable to give a totally exact representation of numbers.) In the case of amigoivo's invalid WU, there is one Gaussian-fitting calculation where the alpha-4 and alpha-5 methods differ starting from the 21st bit of the mantissa, out of 23, which causes the functions to behave slightly differently. This corresponds to a difference of about 1/20000th of a percent, but it still makes enough of a difference for the code to possibly miss a Gaussian. I didn't modify the code for debugging purposes to make sure this was really the case, though.
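To make the magnitude concrete, here's a quick sketch (this is just arithmetic, not the client code; Python's struct is used to round a double to single precision):
Code:
import struct

def to_float32(x):
    # round a Python double to IEEE 754 single precision and back
    return struct.unpack('f', struct.pack('f', x))[0]

a = 1.0
b = to_float32(1.0 + 2.0 ** -21)  # differs from a starting at bit 21 of the 23-bit mantissa
print((b - a) / a * 100)          # ~4.8e-05, i.e. about 1/20000th of a percent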

This becomes an even bigger deal when you consider that we're using totally different FFT code from the original client. There are probably also issues that come up when you take code that was originally designed to use double-precision floating point, such as the Ooura FFT in the original client, and convert it to use single precision, but I'm not really knowledgeable enough to comment on this sort of thing. If I had to venture a guess, though, it probably has a similar butterfly-effect on the results, like the subtle changes to the Gaussian code did.

So I suppose the point I was getting at is that it's really hard to make the clients behave exactly the same between versions if you're doing little things to optimize arithmetic here and there. I know the IEEE 754 standard is supposed to keep these things from happening (and the primary architect of IEEE 754 is a professor at my university), but I suppose that without some background in numerical analysis, which I definitely don't have, these things are probably going to happen occasionally--just hopefully not regularly.

That being said, I'm told that some versions of the BOINC client (I'm not sure if the MacNN superbench clients fall under this description) sometimes lose the accumulated CPU time if they have to restart the client. Also, those -9 errors happen when the SETI code is called upon to report a spike/pulse/Gaussian and hits the WU-specified maximum number of results that can be reported for a particular type. It then proceeds to throw a non-fatal error and just quit early.

Phew, that was a bit of a long post. Hopefully I didn't confuse anyone there. If I did, I'd be happy to provide clarification, at least to the best of my own abilities.
     
Karl Schimanek
Junior Member
Join Date: Oct 2004
Location: Germany
Status: Offline
Sep 29, 2005, 06:11 PM
 
Now I have an invalid unit, too:

Name 02se03aa.29982.27074.1009650.25_0

http://setiathome.berkeley.edu/resul...ltid=119390597

MPC7450

Karl
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 29, 2005, 08:28 PM
 
Originally Posted by rick
Just hit "Save As..." from the File menu and save it to wherever you want. If your browser makes you save it with an extension it doesn't matter, just make sure you put the correct filename in for "setiURL".
Got this solved, but thanks. I am on a "short WU" hunt even as we speak. I think it might be better to copy them in advance when they arrive here and toss the ones that are OK. You are correct that the window is VERY short for snatching them off the server.

For results that have the "-9 overflow" message, the stderr file contains:
Code:
SETI@Home Informational message -9 result_overflow NOTE: The number of results detected exceeds the storage space allocated.
I am confused. I have looked at the results from the other reporting systems and I am not seeing any errors for them. I also have not found any for my system. I am not seeing any "result overflows". I must be looking in the wrong place or something. The other systems all seem to process the WU in a normal amount of time. It would seem to me that a WU that is noisy for one system would be noisy on another (assuming nominally similar math-processing precision). Also, I have now determined with some certainty that the WUs are actually processing in the proper length of time on my end (clock time, that is).

I have tested to see if the client loses track of CPU counts if I stop it, and there is no joy there either. The times seem normal up to the point where the WU completes, but when it lands on the server it has these short times. In every case the reported CPU time has been almost exactly 1% of what I would expect based on comparisons extrapolated from normal WUs.

What I am comparing is the times for the other systems in the quorum: I find a normal WU where other, similar systems processed in about the same time as the systems on the questioned WU, and that gives me a basis for the time I should expect from my system on the questioned WU. I admit this is not mathematically precise, but it is valid in gross terms, which is OK when the differences are this large.

I'm still running through these invalid results.
So far all of my short WUs are validating and getting credit. I have only had one that actually did not get credit, and that was a few days ago. That happens from time to time.

As I said before, I could just be nuts; after all, you guys are the experts here. But just from my observations, it looks like whatever is happening here happens between the time the WU hits 100% and when it shows up on the server. In other words, it is at the other end of the process from what you explained. I do not know the details of what happens in the 2-6 min of processing that occurs after the manager shows 100% and before the WU gets reported, but from what I am seeing the times look normal when they enter that process and they don't when they hit the server.

I read Alex's thoughts/comments on this, and what he says makes some sense so long as we are talking about just the mathematical slop in the processing or verification processes in isolation. We would also have to assume that all of the elements of a valid WU were present in a reported result in gross terms as well. No matter what math you use or to what level of precision you calculate, the quorum results MUST have all of the complete elements of a properly processed WU to compare with the other systems. For purposes of discussion, let's say the triplets result section (assuming there is one) for one system (a Mac running A4 like mine) is missing; this would clearly be different from the other three reported results, and the result should fail the quorum test because I did not complete the processing and the others did. If that is not the case, we have all just stumbled on a very interesting and surprising weakness in the validation system.

But what I see is two possible WU processing conditions: 1) the system looked at the WU and bailed out for whatever reason and reported the short time; 2) the system processed the WU but reported an incorrect time. I just don't see a third condition that matches the observations. We are talking about a WU that in every respect looks normal to 3 systems in a quorum, and the server sees the processing results as normal and valid across all 4 systems. Because of this, the simplest conclusion one can reach is that the reported CPU time for one of the systems is not correct, but everything else is. If these results were not validating it would be an entirely different story. Look, you are both smart guys, so obviously I am missing something here, but I don't understand what.

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 29, 2005, 08:41 PM
 
Rick,

I was just able to snag a short WU from the server; I will have to wait to see if it is a normal short one or one of these odd ones we have been talking about. The time is about 270 CPU seconds, so it is a lot longer than what I have been seeing and may be just a routine short WU. When the other systems report, I will know. The link is here - http://setiathome.berkeley.edu/worku...?wuid=28778023

If it turns out to be something I should send you, you will have to give me an address, as I have no way to serve it.

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
mikkyo
Senior User
Join Date: Feb 2002
Location: Silly Valley, Ca
Status: Offline
Sep 29, 2005, 09:03 PM
 
NOTE: The number of results detected exceeds the storage space allocated.

Use no more than 2 GB disk space
Leave at least 0.05 GB disk space free
Use no more than 99% of total disk space

are the settings I recommend.
Someone who got a quick WU should post their general BOINC space settings, and the disk size and free space of the machine that turned it over too fast.

Chances are you aren't allowing enough disk space.
This has happened before, but no one paid attention to the WUs, just the logs.

Now, if you get credit for turned-in WUs that you didn't have the space to crunch, that is just plain wrong, but that would be a SETI problem.
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 29, 2005, 11:13 PM
 
Originally Posted by mikkyo
Use no more than 2 GB disk space
Leave at least 0.05 GB disk space free
Use no more than 99% of total disk space

are the settings I recommend.
Someone who got a quick WU should post their general BOINC space settings, and the disk size and free space of the machine that turned it over too fast.

Chances are you aren't allowing enough disk space.
This has happened before, but no one paid attention to the WUs, just the logs.

Now, if you get credit for turned-in WUs that you didn't have the space to crunch, that is just plain wrong, but that would be a SETI problem.
Ok,
Use no more than 20 GB disk space
Leave at least 10 GB disk space free
Use no more than 50% of total disk space
Write to disk at most every 60 seconds

83.3 GB available, total space 111.3 GB

I do not think space is the problem. I am still convinced that for some reason the incorrect CPU time is being reported. Now, Rick and Alex got me thinking a little. Suppose accumulating a lot of these hits in the data were to trip some condition in the software that starts a new routine, and for some reason that routine does not do the scoring right? This has to be something that does not occur for every WU, yet still occurs with some degree of regularity. I am getting about one of these every day. All of them are from the dual-processor G4. Could it be E@H or CP@H running on one CPU with S@H on the other? That happens a lot, so I do not think so. Right now P@H is down, so that is not part of the problem. That is all that is running right now.

Whatever it is appears to be related to the WU in some way. But as far as I can tell, the system actually does take the proper amount of time to process the WU; it just does not report it right.
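In other words, I suspect something shaped like this (purely hypothetical; the variable names are made up and this is not the actual client code):
Code:
# Hypothetical sketch of the shape of bug I mean.
cpu_time = 17820.0   # time accumulated over the whole crunch
final_stage = 178.2  # time spent producing the final upload file

cpu_time = final_stage   # an "=" where a "+=" was intended would report only
                         # the last stage: a small fraction of the true total,
                         # scaled to the size of the WU's output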

The work unit in my last post turned out to be a normal short WU, so no joy there. All of these odd WUs have so far gotten full credit, although my claimed credit is of course tossed out because it is the lowest.

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 29, 2005, 11:30 PM
 
Originally Posted by Karl Schimanek
Now I have an invalid unit, too:

Name 02se03aa.29982.27074.1009650.25_0

http://setiathome.berkeley.edu/resul...ltid=119390597

MPC7450

Karl
Yours is even stranger. It does not appear to be invalid; you just did not get credit. This happened to me at P@H last week. They told me that this happens when your result is valid but does not agree with the other systems (i.e., no consensus). All of the agreeing systems will get credit, but you will not. That looks like what happened here.

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
alexkan  (op)
Forum Regular
Join Date: Aug 2005
Location: Cupertino, CA
Status: Offline
Sep 30, 2005, 01:26 AM
 
Just so you guys know, something resembling this bug was briefly discussed on the main SETI forums (http://setiathome.berkeley.edu/forum...d.php?id=20540). Also, you're far more likely to hit the reporting limits outlined in the work unit itself than you are to hit BOINC's disk usage limits.

It seems, at least from my observations, that the CPU time measurement mechanism that SETI BOINC uses isn't terribly accurate to begin with. I pointed this out as part of why it looks like the alphas haven't improved on G5s since alpha-3, and you should be able to verify it by running a unit manually using the UNIX time command and comparing what OS X measures with what SETI measures.
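If you'd rather not fuss with the time command, a small Python wrapper does the same cross-check (the binary name below is a placeholder for however you run the app standalone):
Code:
import resource, subprocess, time

start = time.time()
subprocess.run(['./setiathome-alpha'], check=False)  # placeholder binary name
wall = time.time() - start
usage = resource.getrusage(resource.RUSAGE_CHILDREN)
print('wall: %.0fs, child CPU: %.0fs' % (wall, usage.ru_utime + usage.ru_stime))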

If you have a short work unit and want an easy (at least to me) way to know for sure that it has actually undergone full processing, you can look at the result.sah file dumped by SETI after processing the file. This assumes that you can actually find these files in an actual SETI installation, as opposed to the testing setup I have now, of course. Still, if any of the detected signal components of a really short work unit has a non-zero chirp_rate, there's a pretty good chance that it's undergone the full set of processing, since even a G5 won't finish the zero-chirp-rate processing for a few minutes.
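A few lines of Python can do that scan; note that the <chirp_rate> tag name is my assumption about the result.sah format based on the description above:
Code:
# Scan a result.sah for any non-zero chirp_rate; the tag name is an
# assumption about the file format.
import re, sys

text = open(sys.argv[1]).read()
rates = [float(v) for v in re.findall(r'<chirp_rate>\s*([^<\s]+)\s*</chirp_rate>', text)]
print('non-zero chirp rate found:', any(r != 0.0 for r in rates))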

Maybe because I've been looking more at the signal-processing side of things, the problem of low reported CPU times strikes me as somewhat inconsequential, so long as the results are scientifically valid. Modifying the BOINC work unit reporting code really doesn't excite me as much.
     
Knightrider
Dedicated MacNNer
Join Date: Sep 2004
Location: London
Status: Offline
Sep 30, 2005, 03:55 AM
 
Using the Alpha BOINC Manager.

2005-09-29 23:23:17 [---] Starting BOINC client version 5.1.4 for powerpc-apple-darwin
2005-09-29 23:23:17 [---] libcurl/7.14.0 OpenSSL/0.9.7g zlib/1.2.3
2005-09-29 23:23:17 [---] Data directory: /Library/Application Support/BOINC Data
2005-09-29 23:23:17 [SETI@home] Found app_info.xml; using anonymous platform
2005-09-29 23:23:17 [---] Processor: 2 Power Macintosh PowerMac7,2
2005-09-29 23:23:17 [---] Memory: 2.00 GB physical, 0 bytes virtual
2005-09-29 23:23:17 [---] Disk: 116.76 GB total, 102.97 GB free

Leave applications in memory while preempted?
(suspended applications will consume swap space if 'yes') yes
Switch between applications every
(recommended: 60 minutes) 60 minutes
On multiprocessors, use at most 2 processors
Disk and memory usage
Use no more than 100 GB disk space
Leave at least 1 GB disk space free
Use no more than 80% of total disk space
Write to disk at most every 120 seconds
Use no more than 80% of total virtual memory
Connect to network about every
(determines size of work cache; maximum 10 days) 1 days

K.
     
beadman
Dedicated MacNNer
Join Date: Nov 2004
Location: Virginia
Status: Offline
Sep 30, 2005, 08:31 AM
 
Rick and Alex: Here's one that has 3 results back and no consensus: http://setiathome.berkeley.edu/workunit.php?wuid=28765966

The computers are a G5 apparently running standard BOINC, my iBook G4 running BOINC Superbench 4.44 and alpha-4 G4, and an Intel machine.

beadman
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 30, 2005, 12:07 PM
 
Originally Posted by alexkan
Maybe because I've been looking more at the signal-processing side of things, the problem of low reported CPU times strikes me as somewhat inconsequential, so long as the results are scientifically valid. Modifying the BOINC work unit reporting code really doesn't excite me as much.
Alex,

I can understand your disdain for adjusting such a seemingly unimportant area of the code. I am concerned that if something as simple as keeping track of processing time isn't working right, what else may be wrong? As you pointed out, there are a lot of ways to do the math here. Some tweaks may pass validation and some might not, but I think we can both agree that the validation process is not high precision. So there is some concern in the back of my mind that the validation process could pass a WU that in fact has corrupted science.

Of course, finding anomalies is what SETI is all about. So of course when one jumps right off the stats page, some of us notice. Many of us have been conditioned to believe that the "WOW" WU may actually process in a short time and be sent back for additional, closer analysis. So maybe we are hopeful. In any case, those of us who have done development and/or alpha testing are also conditioned to report anything we find out of the ordinary. You have to admit that these WUs are not the norm.

I guess what I am trying to say is that I understand (and I am sure most of us do) that the science code comes first. But most of us are naturally curious, and we just can't help trying to figure out why things don't work when they should. From what I can see, this is the only anomaly in A4. If you notice something that might be the cause, let us know. If you don't have time to look at it, we understand. You guys are doing some really good work here, and we are all happy to help any way we can. You should not have to go into areas you don't want to go.

If I can catch one of these little buggers for you, I will. Until I or someone else does, you really can't do anything to fix the problem anyway.


Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
rick
Fresh-Faced Recruit
Join Date: Sep 2005
Status: Offline
Sep 30, 2005, 12:51 PM
 
I've been feeling rough today, so I've had time to run over some in-depth SETI@home-related things.

Firstly, I managed to snag the short work unit that Snake_doctor linked to (02my04aa.19046.5921.279820.63). I've run it through alpha 4 and alpha 5, and they produce almost identical results (the differences can be attributed to tiny floating-point errors; nothing important).

However, they each took over 2 hours, so... dunno how you got the short work unit time. Still investigating. I'll run it through the unoptimised client tonight. I reckon BOINC is incorrectly reporting the CPU times, as others have mentioned.

Secondly, I got my mad skills out and made a GUI version of the setiURL script. It is now the aptly titled "SETI@home Grabber" and can be found at http://writhe.org.uk/seti@home/grabber. Any problems with it, send them to the usual place.

I realise the web site still sucks ass. I've been working on the design and will put up the new stuff when we release alpha 6. Don't hold your breath though, it isn't imminent.

Lastly, I've been investigating the "-9 result_overflow" error. It appears that my understanding of it was backwards. Okay, new theory:

I've had a look through the code and the "-9 result_overflow" error is actually caused by SETI@home, not BOINC as I'd previously thought. It occurs when the total number of detected signals (i.e. spikes, Gaussians, triplets and pulses) exceeds a certain value which is specified in the work unit.

Therefore, even though the error message hints at it ("exceeds the storage space allocated"), this has absolutely nothing to do with the BOINC software, purely the SETI@home client. It has absolutely nothing to do with your BOINC disk space settings or anything else; it depends purely on some simple code in the SETI@home client:

If you look through the work units, you'll see some bits like this:
Code:
<max_signals>30</max_signals>
<max_spikes>8</max_spikes>
<max_gaussians>0</max_gaussians>
<max_pulses>0</max_pulses>
<max_triplets>0</max_triplets>
However, the "max_spikes", "max_gaussian", "max_pulses" and "max_triplets" values are not used (I've double checked): the only thing that matters is "max_signals". So, for example, in the above case you can have a total of say 15 spikes, 10 Gaussians, 3 pulses and 2 triplets but any more will cause the "-9 result_overflow" error.

There is nothing wrong with the client code in this case; it's just that the data contains too many signals, so processing stops rather than recording a potentially very large number of results.
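In pseudocode terms, my reading of the check is simply this (a sketch, not the actual SETI@home source):
Code:
# Sketch of my reading of the bail-out: only the running total across all
# signal types is compared against max_signals from the work unit.
def record_signal(state, max_signals):
    state['total'] += 1
    if state['total'] > max_signals:
        # non-fatal: note the -9 in stderr and stop processing early
        raise SystemExit('SETI@Home Informational message -9 result_overflow')

state = {'total': 0}
for _ in range(31):        # with max_signals=30, the 31st signal trips it
    record_signal(state, 30)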

Further reading of forums and mailing lists indicates that this is quite normal for a number of projects, so we can essentially forget about this. I assume the people on the SETI@home project either discard these work units because they are "noisy" or flag them for later analysis because they're interesting.

Phew...
( Last edited by rick; Sep 30, 2005 at 12:55 PM. Reason: there is not to reason why...)
     
chboss
Fresh-Faced Recruit
Join Date: May 2005
Location: Switzerland
Status: Offline
Sep 30, 2005, 01:34 PM
 
Rick

The -9 error with the result overflow happens on WUs which contain noise.
This is not a problem with your client but rather with the Arecibo receiver. I think this behaviour is not a bug but a feature, since a noisy WU will cover all possible alien signals. In this case the error stops unnecessary calculation.

Here is a link to a short WU where your client does not produce a -9 error but the Windows box of another user does...

http://setiathome.berkeley.edu/worku...?wuid=28669282
Chris Bosshard
www.bosshard-ch.net
     
BTBlomberg
Forum Regular
Join Date: Sep 2005
Location: Chicago Suburbs
Status: Offline
Sep 30, 2005, 04:21 PM
 
Right, chboss.

To make this clearer for others: for the error to show up, someone running a client that reports it has to be working on the same WU. If no one in the quorum ran such a client, the field will be blank, but all times will be short (relative to machine and client) for the 4 results. The problem short ones are where all the other clients work it as a normal WU but this one client stops short.

These are two separate issues. Well, the first is not really an issue, but we should try not to confuse them.
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 30, 2005, 05:35 PM
 
Originally Posted by BTBlomberg
... These are two separate issues. Well the first is not really, but we should try not to confuse them.
My point exactly (though I have not been communicating it well). The short WUs I have found mysterious are the ones where none of the systems show any errors, all of the results validate, and they all get credit, but my system (or some other one) reports a WU time of less than 1.5 minutes. These are not the -9 errors. Rick is hot on the trail of this one. If I read his post correctly, this is a CPU time reporting error, NOT a science error, but it is still a mystery, and it looks very odd the first time you see it.

I have been trying to force my system to produce one of these to see if it is related to system conditions, and so far nothing I do produces this result. I am increasingly convinced that it is something in the WU that triggers some routine near the end of processing which replaces the CPU time in the result file rather than adding to it. I am still poised to catch one of these buggers, but it is tricky. If more than one system has reported a result for the WU you can't get it: you have to act between when your system reports the result (you must be the first system to report) and when the second system reports. As Rick correctly points out, this is sometimes a very short time.

I am becoming a little suspicious of CP@H in this regard. Last night both of my models dumped out and a new Sulphur model came down. Only one of my systems is having the short WU problem, and it is the one running CP@H. Since the model dump I have not seen any short WUs. I was seeing about one per day, so I should be about due.

Rick,

Thanks for the GUI program. As you know speed is important to grab the WU, and I don't keep terminal loaded and at the ready all the time. This will help a lot.

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
Knightrider
Dedicated MacNNer
Join Date: Sep 2004
Location: London
Status: Offline
Sep 30, 2005, 06:51 PM
 
Originally Posted by chboss
Rick
This is not a problem of your client rather a problem of the Areceibo receiver. I think this behaviour is not a bug but a feature, since a noisy WU will cover all possible alien signals. In this case this error stops unnecessary calculation.
This is called RFI (Radio Frequency Interference). This noise creates a large number of spikes in the data received from Arecibo. There is an explanation on the SAH web site HERE

As I recall, WUs with these high levels of RFI were to be ignored. I was not aware of the -9 designation, but it makes sense to flag these WUs in some way.


Originally Posted by Snake_doctor
I am becoming a little suspicious of CP@H in this regard. Last night both of my models dumped out and a new sulfur model came down. Only one of my systems is having the short WU problem, and it is the one running CP@H.
I have given up on CPDN. I could not download it (the Sulphur WU) until I started in on the alpha BOINC Manager, and then, after about 26 hours, I decided to run the benchmarks, which meant that it dumped the WU from memory. After that it would not restart and started to download another one. I figured that I was never going to be able to complete a 3,000-hour WU, so I gave up.

K.
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Sep 30, 2005, 11:31 PM
 
Originally Posted by Knightrider
... I have given up on CPDN. I could not download it (the Sulphur WU) until I started in on the alpha BOINC Manager, and then, after about 26 hours, I decided to run the benchmarks, which meant that it dumped the WU from memory. After that it would not restart and started to download another one. I figured that I was never going to be able to complete a 3,000-hour WU, so I gave up.

K.
I feel your pain. While I like the project, and it seems like the work is important, the idea that the WU can just blow off at any time is a real downside.

I know they think they are doing the users a favor by not making them process a bad model, but the Sulphur model can really take over a system. It is huge, and they have cut the report time in half. It took my system all day to work off the debt from when it downloaded. It is interesting, though, that I have not seen any of the short S@H WUs since the new model landed.

Any of you guys that have seen the Short S@H WUs also running CP@H?

Well, my system has turned in about 10 more S@H WUs and still no short ones. I am still hunting... "Ewebody be werry werry quiet."

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
Drash
Fresh-Faced Recruit
Join Date: Aug 2001
Location: UK
Status: Offline
Oct 1, 2005, 12:19 AM
 
Can someone explain in really simple terms what the fuss is all about with these short work units? There have always been some, as long as I can remember, long before any optimized clients. Just looking back through the results for my PC I found 5, ranging from 23s to 115s, all granted credit (not very much, mind).
     
BTBlomberg
Forum Regular
Join Date: Sep 2005
Location: Chicago Suburbs
Status: Offline
Oct 1, 2005, 01:22 AM
 
Ash,

If you look back through the last page, I thought I had explained it. There are short WUs like the ones you have seen, which are normal and come from noisy WUs. That is not the fuss, though some people following this discussion have thought it was.

The real fuss is about the ones the Alpha-4 client has processed short but the other 3 clients process properly. So, the problem is that that version of the client is erroring out where the others are not. The discussion is to hopefully aid the developers, who have done such a good job with this so far, in isolating the problem. It sounds like they may have, and it is up to their discretion and resources to determine a fix.

This is why this is Alpha, but a great Alpha at that.

So, the basics are: there are two types of short WUs, one planned by the SETI project and the other an error in the client. SETI is not out anything with this glitch, as they're having the WU processed on 4 clients, so they still get their results; they just throw out the bad data from this one.
     
Drash
Fresh-Faced Recruit
Join Date: Aug 2001
Location: UK
Status: Offline
Oct 1, 2005, 01:57 AM
 
Now that did make sense. Thanks. In my defence, I think I fried my brain a bit reading that annoying mojo2 thread over in the lounge - giving him an audience only encourages him

Ash
     
Karl Schimanek
Junior Member
Join Date: Oct 2004
Location: Germany
Status: Offline
Oct 1, 2005, 10:15 AM
 
I have a dumb question:

If I would like to run the reference unit, what have I got to do?

1. Download the ref. unit.
2. Put the "seti@home-G4-a5" in this folder
3. Double click on it

Right or wrong?

And a second one:
In this folder are two files (reference_result_unit.sah & work_unit.sah). Have I got to rename the reference_result_unit.sah?

Karl

P.S. And Rick/Alex, you should make a list of reference times available on your homepage.
     
Shaktai
Mac Elite
Join Date: Jan 2002
Location: Mile High City
Status: Offline
Oct 1, 2005, 10:57 AM
 
Originally Posted by Karl Schimanek
I have a dumb question:

If I would like to run the reference unit, what have I got to do?
1. Download the ref. unit.
2. Put the "seti@home-G4-a5" in this folder (for typing simplicity you can rename it to just seti)
3. Open a terminal window.
4. At the prompt, type "cd " (lowercase, with a trailing space) and then drag the folder with all three files onto the Terminal window and hit return. (That is just an easy way to set the path to the folder. You can also type the path the UNIX way if you like.)
5. At the prompt, type ./seti (if you renamed the app) and hit return.

If you didn't rename the app, you will need to type its full name. The client will then sit at a blank line, create some files in the folder, and just work until done. You will know it is done when you get another prompt. This will take a while. To check the progress (if you must), you can open the state.sah file with TextEdit; the fourth line down, labelled "prog", shows the percentage completed.
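Put together, the Terminal session looks something like this (the path is only an example, and the progress check assumes the line really is labelled "prog" as described):
Code:
cd ~/Desktop/reference_unit   # or type "cd " and drag the folder onto the window
./seti                        # or the full app name if you didn't rename it

# ...meanwhile, in another Terminal window:
grep prog state.sah           # the "prog" line shows how far along it is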
     
rick
Fresh-Faced Recruit
Join Date: Sep 2005
Status: Offline
Oct 1, 2005, 11:03 AM
 
Originally Posted by Karl Schimanek
If I would like to run the reference unit, what have I got to do?
  1. Download the reference work unit.
  2. Make sure that it is called "work_unit.sah".
  3. Put the "seti@home-G4-a5" in this folder.
  4. Open Terminal.
  5. Change your current working directory to the above folder. If you don't know how to do this then have a look at my previous post about running the setiURL script. If you're still stuck I can give you more advice.
  6. Run the command "time ./seti@home-G4-a5 -standalone".
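So, steps 5 and 6 together look like this (the path is just an illustration):
Code:
cd ~/Desktop/reference_unit
time ./seti@home-G4-a5 -standalone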

You'll end up with a line similar to this:
Code:
7.056u 5.626s 0:46.85 27.0% 0+0k 0+0io 0pf+0w
Where:
  • "7.056u" is the amount of "user time" the program took,
  • "5.625s" is the amount of "system time" the program took (time taken for the operating system kernel to do things), and
  • "0:46.85" is the "wall time" that the program took, i.e. the actual physical amount of time from the start of the program to the end of the program.
  • "27.0%" is the percentage of the wall time that the program was actually executing code for, as opposed to say, waiting for data from the disk.

In the above case, I was running a program that was checksumming a large file, so the actual time spent executing code is relatively small (27%). If you run the SETI@home client with nothing else running, you're more likely to see something like 98% (because there isn't much disk activity or anything else, just raw computation). If you have lots of other programs running then this will be smaller, representing the smaller share of the CPU your program had.

The client will also produce a file called "result.sah". You should compare it with the "reference_result_unit.sah" (although I think the one that comes with the source code is wrong; they might have fixed it by now). I recommend you use the FileMerge application that comes with the Developer Tools if you have it.

However, if you are using FileMerge, then you should "pull down" the top of the file listing, so you can get the files side by side. It isn't much use otherwise.
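If you would rather stay in Terminal, a plain diff does the same job, and opendiff (which comes with the Developer Tools) launches FileMerge directly on the two files; expect small floating-point differences at most:
Code:
diff result.sah reference_result_unit.sah       # prints any lines that differ
opendiff result.sah reference_result_unit.sah   # opens the pair in FileMerge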

Originally Posted by Karl Schimanek
P.S. And Rick/Alex, you should make a list of reference times available on your homepage.
Good idea, didn't think of that. That's why we rely on you people.
     
Snake_doctor
Junior Member
Join Date: Jul 2005
Location: USA, Virginia
Status: Offline
Oct 1, 2005, 11:08 AM
 
Originally Posted by BTBlomberg
Ash,

The real fuss is about the ones the Alpha-4 Client has prosessed short but the other 3 clients process properly. So, the problem is that version of the client is erroring out where others are not.
...they just throw the bad data from this one out.
Still not quite correct, but very good as far as it goes. On the questioned WUs, none of the systems are erroring out: ALL FOUR systems in the quorum produce valid results and get credit. The A4 test clients are turning in CPU times on the order of 45-80 seconds, while the other three systems in the quorum are processing the same WU in anywhere from 5,000 to 55,000 CPU seconds.

There are three types of short times covered in this discussion.

1) Normal short results we see all the time from noisy WUs. They generally have CPU times around 230 to 400 seconds and are tagged with a -9 error.

2) Results where one of the alpha test systems does not get validated but the others in the quorum do. CPU times vary.

3) Results where one of the A4 testers produces short times, on the order of 90 seconds or less, but the other quorum members do not, yet all results are valid and all four systems receive credit.

The discussion has been a mixture of these until people understood, in the last 20 posts or so, that we were talking about three distinct types of WU results.


The first type is not an alpha test issue; they are directly related to the WU and the WU only. The second type MAY be a problem for the alpha test, but these too are a regular event over the course of many WUs.

In my case they have all been of type 3. These are unique in my experience, and so far we do not know what is causing them.

Look at this result - http://setiathome.berkeley.edu/worku...?wuid=28566490

I think you would agree that something is up with the CPU times. But all four systems got credit. Rick and Alex are very good, and my Mac is a solid system, but there is no way to process a valid SETI WU in 47 CPU seconds.

I hope this clarifies the issue for you.

Regards
Phil
We must seek intelligent life on other planets as it is increasingly apparent we will not find any on our own.

Link: http://www.boincsynergy.com/images/stats/comb-2033.jpg
     
 