Enhanced Optimized (Page 3)
Dedicated MacNNer
Join Date: Aug 2005
Location: Golden Valley, AZ
May 27, 2006, 09:26 PM
 
Here is the wisdom file that I generated on my eMac 1.25 GHz:

http://www.arkayn.us/mf/wisdom_emac_125.zip
     
Registered User
Join Date: Jul 2006
May 27, 2006, 11:10 PM
 
Originally Posted by alexkan
Timed runs of the reference work unit are always appreciated, so please post them if you do them.
I tried to piece together directions for finding and completing the reference WU. I downloaded boog's reference WU .zip file from earlier in this thread and expanded it. Then I copied your v3 worker into the folder and ran it with ./seti_enhanced-ppc-v3 -verbose -standalone (also roughly per boog's directions), but I couldn't get it to work. (An icon for the seti worker appeared briefly in my Dock and then disappeared, and my terminal prompt reappeared.) Also, on second thought, in order to time it, should I have used "time seti_enhanced-ppc-v3 -verbose -standalone"?

Any suggestions? Am I even using the right reference WU?
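
On the timing question: the shell's time builtin should do the job as long as the worker is invoked with its ./ prefix (the current directory is usually not on the PATH), i.e. time ./seti_enhanced-ppc-v3 -verbose -standalone. For anyone who would rather script the measurement, here is a minimal Python sketch that reports the same real/user/sys figures. The helper name time_command is made up for illustration, and the sketch assumes a Unix-like system where Python's resource module is available.

```python
import resource
import subprocess
import time


def time_command(argv, cwd=None):
    """Run argv to completion and report real/user/sys seconds, like the shell's `time`."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(argv, cwd=cwd)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": completed.returncode,
        "real": real,
        # CPU time charged to child processes that have finished and been waited for
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
    }


# Example call (assumption: the worker sits in the expanded reference WU folder):
# time_command(["./seti_enhanced-ppc-v3", "-verbose", "-standalone"], cwd="reference_wu")
```

The folder name reference_wu in the commented call is a placeholder for wherever the reference WU was expanded.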
     
Forum Regular
Join Date: Oct 2005
Location: Las Vegas, NV
May 28, 2006, 12:38 AM
 
Alex, a few weeks ago you commented in the main testing thread:

"The current code blindly switches between FFTW and vDSP at a certain cutoff point without taking into account which is actually faster, but it does this at a point which makes sense for almost all machines I've looked at so far."

Does the transition between vDSP and FFTW take place between 4096 and 8192 on G4? I noticed on the FFT test2 from a couple of months ago that vDSP maxes out at the 4096 benchmark and FFTW3 scores higher starting at 8192.

"I was considering augmenting this FFTW/vDSP thing with a program that benchmarks to find the optimal combination of FFTs. Problem is, I'm already at least one release behind on FFTW, which claims to have made minor performance improvements. Also, it seems like generally bad form to be releasing new versions of the cruncher linked against old libraries."

FFTW 3.1.1? Is this the release you are referring to?
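
The benchmarking idea in the second quote (time each FFT implementation at each size and use whichever is faster, rather than switching at a fixed cutoff) could look roughly like the sketch below. This is only an illustration: naive_dft and radix2_fft are toy pure-Python stand-ins for the two libraries, pick_fastest is a hypothetical helper, and none of this is the worker's actual code or the FFTW/vDSP API.

```python
import cmath
import timeit


def naive_dft(x):
    """O(n^2) discrete Fourier transform; stand-in for 'implementation A'."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def radix2_fft(x):
    """Recursive radix-2 FFT (length must be a power of two); stand-in for 'implementation B'."""
    n = len(x)
    if n == 1:
        return list(x)
    even = radix2_fft(x[0::2])
    odd = radix2_fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def pick_fastest(implementations, sizes, repeats=3):
    """For each size, time every implementation and record the name of the fastest."""
    choice = {}
    for n in sizes:  # sizes must be powers of two because of radix2_fft
        data = [complex(i % 7, 0) for i in range(n)]
        timings = {
            name: min(timeit.repeat(lambda f=func: f(data), number=1, repeat=repeats))
            for name, func in implementations.items()
        }
        choice[n] = min(timings, key=timings.get)
    return choice


implementations = {"naive_dft": naive_dft, "radix2_fft": radix2_fft}
print(pick_fastest(implementations, sizes=[2, 8, 32, 128]))
```

The resulting table maps each size to the winning implementation on that particular machine, which is the per-machine information a fixed cutoff cannot capture.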

BTW, I should have a benchmark for G4 v3 w/ the new wisdom.sah file by tomorrow am. Takes a night to run the WU.

Also of note: Jackel's wisdom results were surprising in that they were better with a multi-user console login. I ran the same test and compared it against generation in single-user mode (s.u.m.). There is actually very little difference overall, though single-user mode is 2-3% higher at the majority of points, with the exception of a handful of benchmarks where console login was higher. The only variable between the two tests was the use of an external USB plug-in for a remote keyboard and mouse during generation with console login. In s.u.m., I had to use a standard keyboard and mouse since USB wasn't running. Whatever problem froze Jackel's run in s.u.m. must also have caused the significantly lower output compared with what he generated with console login. At least from what I've seen, there doesn't appear to be much difference between generation methods, so use whichever is easiest. Short of Rick's upcoming GUI method, running with console login is quicker and less hassle than s.u.m.

Cheers!
(Last edited by Gecko_r7; May 28, 2006 at 12:46 AM )
     
Forum Regular
Join Date: Aug 2005
Location: Cupertino, CA
May 28, 2006, 01:03 AM
 
Originally Posted by Gecko_r7
Does the transition between vDSP and FFTW take place between 4096 and 8192 on G4? I noticed on the FFT test2 from a couple of months ago that vDSP maxes out at the 4096 benchmark and FFTW3 scores higher starting at 8192.
None of the optimized Enhanced workers I've put out use vDSP for FFTs at the moment. If you were to do the math to see how much time is spent doing FFTs of size 4K and below, you would see that the amount of time spent in those functions currently doesn't justify the number of functions that would need to be rewritten to allow me to mix FFTW and vDSP. I might take a crack at it later, but I probably have bigger fish to fry--it's nothing I haven't done before, but it just takes time.
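
The "do the math" argument above is essentially Amdahl's law: if only a small fraction of the run is spent in small FFTs, even a large speedup there barely moves the total. A minimal sketch follows; the two input figures are hypothetical placeholders chosen only to show the shape of the calculation, not measurements from any worker.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: total speedup when `fraction` of the runtime becomes `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# HYPOTHETICAL inputs, not profiling data from the thread:
small_fft_fraction = 0.05  # assumed share of runtime spent in FFTs of size 4K and below
vdsp_gain = 1.5            # assumed speedup of vDSP over FFTW at those sizes

gain = overall_speedup(small_fft_fraction, vdsp_gain)
print(f"overall speedup: {gain:.3f}x")
```

Plugging in real profile numbers (from Shark, for example) in place of the placeholders would show whether the rewrite is worth the effort.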
FFTW 3.1.1? Is this the release you are referring to?
Yes. Most Enhanced workers are linked against FFTW 3.1.1, as is fft_test3.
BTW, I should have a benchmark for G4 v3 w/ the new wisdom.sah file by tomorrow am. Takes a night to run the WU.
Looking forward to seeing it.
(Last edited by alexkan; May 28, 2006 at 01:11 AM )
     
Dedicated MacNNer
Join Date: Oct 2005
Location: Switzerland
May 28, 2006, 05:43 AM
 
Now it's getting interesting! Here are the latest results from my Quad crunching the ref-wu w/ Alex' v3:

real 159m34.287s
user 159m1.948s
sys 0m17.783s
wu_cpu_time: 9525.018427

Thanx a bunch, Alex! Result-files available on request (PM me for that). Here's my quad crunching on v3 since 10:40 UTC today:

http://setiweb.ssl.berkeley.edu/show...hostid=2402169
     
Fresh-Faced Recruit
Join Date: Nov 2005
Location: Europe
May 28, 2006, 06:16 AM
 
Here is the wisdom.sah for the G5 Dual 2.7GHz: http://gulliver.macbay.de/wisdom.sah.zip
     
Junior Member
Join Date: Jun 2006
May 28, 2006, 06:26 AM
 
Here is the wisdom.sah from my G4 mini: http://boog.is-a-geek.org/seti/g4_mini_wisdom.sah.zip
     
Junior Member
Join Date: Feb 2006
Location: Paris, France, Europe, Earth, Sol
May 28, 2006, 09:13 AM
 
If someone can host it (no time to bother about it really) or needs it, I've got the wisdom file for the G5 1.8 monocpu first version handy. Just PM me your email.

WU with the V2 app:
still pending right now
passed
ditto

I have a fourth one in the pipeline; once it's done, I'll switch to V3.
MacMusic.Org says "Hi all!" :)
G5 desktop 1.8, 900 MHz frontbus (2003 model)
Latest wisdom file for it on demand, just PM me :)
     
Forum Regular
Join Date: Oct 2005
Location: Las Vegas, NV
May 28, 2006, 10:34 AM
 
Results of v3 w/ new wisdom compared to stock worker:
G4 MDD w/ Dual Giga 1.33

Stock 5.13 Worker (Test1 5.24.06)
real 499m12.054s ( = 29952.054s)
user 498m11.342s
sys 0m57.878s
wu_cpu_time = 29915.352618

Alex v3
real 381m23.393s (22883.393s)
user 380m38.200s (22838.200s)
sys 0m42.508s
wu_cpu_time = 23798.013826

Improvement vs. real = 23.6%
Improvement vs. wu_cpu_time = 20.4%
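
For anyone who wants to recompute these percentages from their own runs, here is a small sketch that converts the shell's time output to seconds and reports the relative reduction. The helper names time_to_seconds and reduction_percent are made up for illustration; the inputs are the figures posted above.

```python
import re


def time_to_seconds(text):
    """Convert shell `time` output such as '499m12.054s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


stock_real = time_to_seconds("499m12.054s")
v3_real = time_to_seconds("381m23.393s")
real_gain = reduction_percent(stock_real, v3_real)
cpu_gain = reduction_percent(29915.352618, 23798.013826)

print(f"real: {real_gain:.1f}% less time")
print(f"wu_cpu_time: {cpu_gain:.1f}% less time")
```

Note that this measures the reduction in run time relative to the stock worker; dividing the stock time by the v3 time instead would give the speedup factor.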

Now we're cooking w/ gas!
All things being equal, at this point we're at about the same level of optimization as the x86 fellas using Crunch3r's 5.12 app (according to his estimate of improvement). Thanks, Alex and Boog, for getting us this far!

From here on, it's going to get interesting!
Anyone compared stock vs. v3 w/ Quad?
BTW, if anyone wants Wisdom for G4 Dual 1.33 let me know. Not a popular combo, so I can e-mail on request.

Cheers!
(Last edited by Gecko_r7; May 28, 2006 at 10:56 AM )
     
Forum Regular
Join Date: Aug 2005
Location: Cupertino, CA
May 28, 2006, 10:39 AM
 
Here's that new version of the source code that I promised you all earlier (and that I'm obligated to provide because of the GPL):

http://tbp.berkeley.edu/~alexkan/seti/src-v3a.tar.bz2

It's actually slightly improved over v3, since I forgot to make the tarball before I made a few new changes. From here on out, it looks like improvement will slow down, since all the low-hanging fruit is pretty much picked, to the extent that there's no one huge bottleneck. However, if you have Shark and a G5 that's not a 1.8, I would love to have you profile the reference work unit in Shark. I'll even send you the config I'm using, for consistency's sake.
     
Registered User
Join Date: Jul 2006
May 28, 2006, 01:50 PM
 
It's hard to compare WU results and times directly, as the new Enhanced WUs vary greatly in size/depth/whatever. However, a result done with my stock worker gave 58.7 points of credit for 47,980 seconds of work. Under Alex's worker, I got 32.17 points for 17,463 seconds. Not a bad improvement! That's 1.5x as much credit per second of crunching (I think--if I did the math right)!
Gotta run, so I'll just quickly post my results page:
http://setiathome.berkeley.edu/resul...hostid=2225838
The bottom one (result 330929542) was done with the stock worker. Everything else (another one is almost done) is on Alex's v3 worker.
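
The credit comparison in this post can be checked with a few lines of arithmetic. The sketch below computes credit per CPU second for the two results quoted above and takes their ratio; credit_rate is just an illustrative helper name, and since work units vary in size, a single pair of results is only a rough indicator.

```python
def credit_rate(credit, cpu_seconds):
    """Credit earned per second of CPU time."""
    return credit / cpu_seconds


stock_rate = credit_rate(58.7, 47980)   # stock worker result
v3_rate = credit_rate(32.17, 17463)     # Alex's v3 worker result
ratio = v3_rate / stock_rate

print(f"stock: {stock_rate * 3600:.2f} credits/hour")
print(f"v3:    {v3_rate * 3600:.2f} credits/hour")
print(f"ratio: {ratio:.2f}x")
```

Averaging the rate over several results per worker would smooth out the work-unit-to-work-unit variation.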
     
Dedicated MacNNer
Join Date: Oct 2005
Location: Switzerland
May 28, 2006, 04:05 PM
 
Since running v3, I've noticed that my quad is now consistently claiming less credit per WU than the rest of the quorum. Question to Alex: is this to be expected?

Here are my quad's results: http://setiweb.ssl.berkeley.edu/resu...hostid=2402169

Still have to wait and see if the same is true for other machines of mine...
(Last edited by halimedia; May 28, 2006 at 05:04 PM )
     
Dedicated MacNNer
Join Date: Oct 2005
Location: Switzerland
May 28, 2006, 04:35 PM
 
For the first time since switching to Enhanced, my quad's RAC is climbing again! Thanx much, Alex!
     
Forum Regular
Join Date: Aug 2005
Location: Cupertino, CA
May 28, 2006, 06:39 PM
 
Originally Posted by halimedia
Since running v3, I've noticed that my quad is now consistently claiming less credit per WU than the rest of the quorum. Question to Alex: is this to be expected?
Whoops, that's a bug. I forgot to increment the FLOP counter in one of the vectorized function replacements. I'll put in the fix, but you won't actually see it until the next release, unless this is seriously affecting your granted credit.
     
Junior Member
Join Date: Feb 2006
Location: Paris, France, Europe, Earth, Sol
May 28, 2006, 07:37 PM
 
Originally Posted by halimedia
For the first time since switching to Enhanced, my quad's RAC is climbing again! Thanx much, Alex!
I second that one; I just noticed it too!

V3 seems to crunch like a good trooper for now (first WU still in the pipeline); we'll see tomorrow.
     
Mac Elite
Join Date: Apr 2000
Location: Minneapolis, MN USA
May 28, 2006, 11:06 PM
 
Here's my wisdom file for my 2.5 dual:
http://pod.ath.cx/nuwisdom/

I just switched over to v3, no news at this point. It appeared that when I used Boog's enhanced system I received 11k average blocks. Under Alex's V2 it was around 17k-18k work units. But much of it is hard to tell because work unit size fluctuates so much in this new system. I just can't tell. Not to mention that it apparently can decide on what work units to work on first versus what would seem logical.

Stay tuned.
     
Junior Member
Join Date: Feb 2006
Location: Paris, France, Europe, Earth, Sol
May 28, 2006, 11:20 PM
 
WUs are processed in order of deadline, at least the ones with a short turnaround time.

first WU done with V3
There's a slight difference in the credits claimed; it must be the little problem already mentioned. For the record, the first 8% of the WU was processed with V2, the rest with V3. (I'm 2397079)
     
Fresh-Faced Recruit
Join Date: Jul 2006
May 29, 2006, 01:52 AM
 
Originally Posted by lepetitmartien
WUs are processed in order of deadline, at least the ones with a short turnaround time.

first WU done with V3
There's a slight difference in the credits claimed; it must be the little problem already mentioned. For the record, the first 8% of the WU was processed with V2, the rest with V3. (I'm 2397079)
From your link, I noticed that the one sharing your WU is using a MacBook Pro.... Is the stock client for the Intel Macs a lot faster than the one for PPC....?
     
Dedicated MacNNer
Join Date: Oct 2005
Location: Switzerland
May 29, 2006, 02:12 AM
 
Originally Posted by alexkan
I'll put in the fix, but you won't actually see it until the next release, unless this is seriously affecting your granted credit.
No problem! The difference is marginal (approx. 2-4%), and in the majority of cases, I'm granted more credit than I'm claiming.
     
Dedicated MacNNer
Join Date: Oct 2005
Location: Switzerland
May 29, 2006, 02:26 AM
 
Originally Posted by Elphidieus
From your link, I noticed that the one sharing your WU is using a MacBook Pro.... Is the stock client for the Intel Macs a lot faster than the one for PPC....?
All things being equal, the current Core Duo Macs are about as fast as a dual 2GHz G5 (PPC 970FX-based) in CPU-intensive tasks. It's not by accident that Apple decided to jump the PPC ship on the notebook side...

Edit: that said, it appears that the stock worker is more thoroughly optimized on the i386 side of things. It has always been that way, AFAIK.
(Last edited by halimedia; May 29, 2006 at 02:36 AM )