G5 LINPACK/BLAS speed test
Dedicated MacNNer
Join Date: Apr 2001
Location: Columbia, MD
Nov 13, 2003, 12:32 PM
 
We recently purchased a dual 2 GHz G5 with 4 GB of memory, a Radeon 9600, and not much else. Its purpose is to serve as a compute node running code that relies heavily on the BLAS subroutines provided with the Absoft Fortran compiler.

I have done initial testing for very small jobs against a 1.8 GHz P4 with 1GB of RAM (the jobs range from 200 MB only). I will eventually run tests compared to 3.06 GHz Xeon processors, but that may still be a few weeks (or more) off.


First, I must say, the G5, running at full speed, is much quieter than the Dell with a removable drive tray (Rhino) that has a couple of small fans on it. However, the G5 produces noticeably more heat.

Most of my comparisons will be based off 'Code 1', but it seems 'Code 2' may lean more heavily towards the G5.

_____________________
Small Code 1 Test (4 MB)

G5 - 18.8 sec.
P4 - 32.2 sec.

..when normalized for clock-speed, the G5 is about 50% faster
______________________
Small Code 2 Test (5 MB)

(I don't remember the exact numbers right now. I'll fix this later, but...)

G5 ~ 7.5 sec
P4 ~ 15.0 sec

Edit To Add One Result

___________________________
Medium Code 1 Test (40 MB) (updated 11-18-03)

G5 - 161.3 sec.
P4 - 239.4 sec.
Xeon - 108 sec.

Edit to add large code 2 test
_____________________________________
Large Code 2 Test (1.3 GB) (updated 11-26-03)

test 1 - run on one processor, second idle
G5 - 7113 sec.
Xeon - 8704 sec.
P4 - not enough memory
test 2 - run on one processor, second idle
G5 - TBD
(Last edited by rogerkylin; Nov 26, 2003 at 11:05 AM. )
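
The "normalized for clock speed" comparison in the post above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: it assumes that run time multiplied by clock rate is a fair proxy for work per clock, which ignores compilers, memory systems, and everything else that differs between the machines. The names `CLOCK_GHZ`, `RESULTS`, `gigacycles`, and `per_clock_ratio` are made up for this example; the times and clock speeds are the ones quoted in the thread.

```python
# Clock speeds in GHz, as stated in the thread.
CLOCK_GHZ = {"G5": 2.0, "P4": 1.8, "Xeon": 3.06}

# Wall-clock times in seconds, copied from the results above.
RESULTS = {
    "Small Code 1": {"G5": 18.8, "P4": 32.2},
    "Medium Code 1": {"G5": 161.3, "P4": 239.4, "Xeon": 108.0},
    "Large Code 2": {"G5": 7113.0, "Xeon": 8704.0},
}


def gigacycles(machine, seconds):
    """Billions of clock cycles spent: run time multiplied by clock rate."""
    return seconds * CLOCK_GHZ[machine]


def per_clock_ratio(test, machine, baseline="G5"):
    """Cycles used by `machine` divided by cycles used by `baseline`.

    A value above 1.0 means the baseline gets more done per clock tick.
    """
    times = RESULTS[test]
    return gigacycles(machine, times[machine]) / gigacycles(baseline, times[baseline])


if __name__ == "__main__":
    for test, times in RESULTS.items():
        for machine in times:
            if machine != "G5":
                ratio = per_clock_ratio(test, machine)
                print(f"{test}: {machine} needs {ratio:.2f}x the cycles of the G5")
```

On these figures the P4 needs roughly 1.5 times the cycles of the G5 for small Code 1, which matches the "about 50% faster" estimate, while the Xeon and the G5 come out nearly level per clock on medium Code 1.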
     
Eug
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Nov 13, 2003, 01:01 PM
 
Thx for the numbers. It will be esp. interesting when you get the Xeons. A P4 1.8 isn't a speed demon.

Can you recompile with the IBM XL Fortran compiler (xlf), or does it not have the right subroutines? xlf is supposed to be MUCH faster than Absoft's Fortran compiler for most stuff.

What compiler are you using for the P4?
     
Dedicated MacNNer
Join Date: Apr 2001
Location: Columbia, MD
Nov 13, 2003, 01:04 PM
 
Last I looked, xlf was still in beta. Also, I'd need the appropriate AltiVec BLAS libraries. Perhaps when/if we look to add G5 Xserves (whenever that is) to our cluster, I may look more closely at xlf.

I'm using Compaq (Digital) Visual Fortran.
     
Eug
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Nov 13, 2003, 01:08 PM
 
Originally posted by rogerkylin:
Last I looked, xlf was still in beta. Also, I'd need the appropriate AltiVec BLAS libraries. Perhaps when/if we look to add G5 Xserves (whenever that is) to our cluster, I may look more closely at xlf.

I'm using Absoft ProFortran 8.2
Yeah, in beta, which means it's free. However, it's an OS X port of the XL compilers for POWER on Linux/AIX so it's not as if they're brand new.

I'm not an engineer/programmer type, but the science guys over at Ars tell me that at least for standard floating point Fortran code it's uber-fast, and that it's rock solid.

There are problems when using more Apple-specific C code, etc. with xlc though.
     
Mac Elite
Join Date: Dec 2001
Location: Atlanta, GA, USA
Nov 13, 2003, 02:05 PM
 
Everyone who does Fortran raves about XLF performance on the G5.

Get XLF for the Mac here:

http://www-3.ibm.com/software/awdtools/ccompilers/
Mac Pro 2x 2.66 GHz Dual core, Apple TV 160GB, two Windows XP PCs
     
Mac Elite
Join Date: Jul 2002
Nov 13, 2003, 02:21 PM
 
And you don't even need a G5! I used xlf on my iBook G3 with no problem.
     
Fresh-Faced Recruit
Join Date: Sep 2003
Nov 14, 2003, 01:44 PM
 
I seem to remember that Apple released highly optimized LINPACK and BLAS libraries with Panther. Are you linking against those?

Also, I agree with the above posters that it would be great to see a comparison with xlf compiled code. I've seen significant speed increases using this compiler.
     
Addicted to MacNN
Join Date: Apr 2001
Location: The bottom of Cloud City
Nov 14, 2003, 01:52 PM
 
What the hell is BLAS?

"Ahhhhhhhhhhhhhhhh"
     
Eug
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Nov 14, 2003, 02:02 PM
 
Originally posted by Severed Hand of Skywalker:
What the hell is BLAS?
BLAS, i.e. something neither of us needs to deal with.
     
Addicted to MacNN
Join Date: Apr 2001
Location: The bottom of Cloud City
Nov 14, 2003, 04:26 PM
 
Originally posted by Eug:
BLAS, i.e. something neither of us needs to deal with.
Then why do you want to see the numbers?

"Ahhhhhhhhhhhhhhhh"
     
Forum Regular
Join Date: Oct 2003
Nov 15, 2003, 03:15 PM
 
Absoft ProFortran 8.2... as far as I'm aware, this compiler has some, but very little, optimization for the G5. It's such a new architecture that there is massive room for improvement in the different compilers available. Certainly when we used it, the Absoft compiler didn't really generate much faster code for us than GCC 3.3 -O3. I suspect that the primary optimizations in it are similar to those found in GCC. It's a very, very early and young compiler.

Definitely try out the IBM XLC and XLF compilers; you should see MASSIVE improvements. It does support AltiVec at present, and we found it generated AltiVec code that was ~70% faster than GCC 3.3 -O3. However, the real benefit of this compiler, especially in Fortran, will be the floating-point performance. As far as I'm aware, the IBM compiler is currently the only one that can generate "optimum" code that runs across all the dual units (dual FPU, dual integer) in parallel on the G5. That would explain why it typically generates code with more than double the performance of what we get from GCC, and why I and a lot of other people have found floating-point (non-AltiVec) results to be 200% to 300% faster than GCC 3.3 -O3. Integer has improved 50 to 60%, depending on the situation.

Your mileage MAY vary with this compiler, but you should try it out and see how you do. I would imagine that you would be nicely surprised.
Will you post results if it works with your code, and tell us what performance benefit, if any, you saw?
Thanks
     
Dedicated MacNNer
Join Date: Apr 2001
Location: Columbia, MD
Nov 18, 2003, 05:45 AM
 
I updated the original message with results from a 3.06 GHz Xeon for the medium run of Code 1. It looks like the Xeon may be about the same as the G5 (within compiler limitations) MHz for MHz, but given the Xeon's higher clock, that means the G5 is much slower in absolute terms.

Hopefully I'll have a chance to run large Code 2 on the G5 and the Xeon as it seems to be better optimized on the G5.

Sorry, but I still have not had time to run xlf, and it will probably still be a while before I get around to it.

For those who are still unsure, BLAS and LINPACK/LAPACK are sets of linear algebra subroutines. Due to their vectorizable nature, they tend to provide a substantial speed boost when compiled for AltiVec. In some sense, this is almost a best-case scenario for G4/G5 performance tests.
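
To make the description of BLAS a little more concrete, here is a plain-Python sketch of the kind of operation these libraries provide. It is not the Absoft or Apple code, and the function names `axpy` and `gemm` are only loosely modelled on the real routine names DAXPY and DGEMM; the real DGEMM also takes scaling factors and transpose options that are left out here. The point is only that the inner loops are long runs of independent multiply-adds, which is what makes them good candidates for vector units.

```python
def axpy(alpha, x, y):
    """Level-1 BLAS style update: return alpha * x + y, element by element."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Every element is independent of the others, so the loop vectorizes well.
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemm(a, b):
    """Level-3 BLAS style product: return the matrix product of a and b."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Each entry is a dot product: a chain of multiply-adds.
            c[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
    return c


if __name__ == "__main__":
    print(axpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
    print(gemm([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
```

A tuned library computes the same results but blocks the loops for cache and uses the hardware's vector or fused multiply-add instructions, which is where the large speed differences between BLAS builds come from.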

One more thing to keep in mind: as far as I (my place of work) am concerned, if the price/performance of the G5 is similar to that of the Xeon, we may go G5 as much as possible, since it has a longer future (64-bit). Hopefully Apple will implement the necessary hooks in the OS soon so that applications can allocate more than 2 GB of memory.
     
Forum Regular
Join Date: Oct 2003
Nov 18, 2003, 06:38 AM
 
Good stuff. Did you use GCC or ICC when you recompiled LINPACK and BLAS for the Xeon?
I know you said that you didn't have time, but were I you, I really think it would be worth taking the time just to do a recompile with the XLC and XLF compilers. Otherwise a lot of that potential is just sitting there untapped. And that potential is significant: in a lot of numerically intensive apps, people are seeing 200 to 300 percent improvement with a simple recompile over what Absoft and GCC generate for the G5 with their infant compilers.
Where time is money, it would probably be worthwhile to use the XLC compiler. Also, if you go the Intel route, it would be worthwhile to use ICC for that, although I'm not sure if ICC will compile BLAS or LINPACK. I've never tried it, but rumor has it that it produces 10% faster code than GCC.
Anyways best of luck
     
Eug
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Nov 18, 2003, 08:59 AM
 
Hopefully Apple will implement the necessary hooks in the OS soon so that applications can allocate more than 2 GB of memory.
I'm no programmer, but as I understood it, apps can already take advantage of up to 4 GB memory each, with multiple programs taking up to 16 GB total in a single machine (if you have 8 x 2 GB DIMMs).
     
Junior Member
Join Date: Dec 2002
Nov 18, 2003, 09:18 AM
 
Originally posted by rogerkylin:

For those who are still unsure, BLAS and LINPACK/LAPACK are sets of linear algebra subroutines. Due to their vectorizable nature, they tend to provide a substantial speed boost when compiled for AltiVec. In some sense, this is almost a best-case scenario for G4/G5 performance tests.
True, although AltiVec can only perform SP FP ops, which is useless in terms of LINPACK. The G5 can already perform 2 DP FP ops/cycle without the fused multiply-add function and 4 DP FP ops/cycle with it, so I don't think AltiVec will offer any substantial improvement, at least not in its current state.

In terms of AltiVec vs. SSE2, both will be able to perform up to 4 SP FP ops/cycle (although SSE2 can perform 2 DP FP ops/cycle as well). The main advantage of AltiVec is that it can work with a greater number of registers than its CISC counterparts. Whether this will be enough to counteract the benefits of an at least 50% higher clock rate (on the P4, albeit with more branch mispredicts and pipeline stalls and lower ILP) remains to be seen.
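
The per-cycle figures in this post translate into theoretical peak rates with one multiplication. The sketch below takes the post's numbers at face value (4 double-precision ops per cycle for the G5 with fused multiply-add, 2 per cycle for SSE2) and combines them with the clock speeds mentioned in the thread. The helper names `peak_gflops` and `clock_to_match` are invented for this example, and real code rarely gets close to these peaks.

```python
def peak_gflops(clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: clock rate (GHz) times ops issued per cycle."""
    return clock_ghz * flops_per_cycle


def clock_to_match(target_gflops, flops_per_cycle):
    """Clock rate (GHz) a chip would need to reach a given peak rate."""
    return target_gflops / flops_per_cycle


# Per-cycle figures as claimed in the post above (assumed, not measured).
g5_dp = peak_gflops(2.0, 4)      # 2.0 GHz G5, two FPUs with fused multiply-add
xeon_dp = peak_gflops(3.06, 2)   # 3.06 GHz Xeon using SSE2 double precision

# Clock an SSE2 chip would need to equal the G5's double-precision peak.
xeon_clock_needed = clock_to_match(g5_dp, 2)

if __name__ == "__main__":
    print(f"G5 peak DP:   {g5_dp:.2f} GFLOPS per CPU")
    print(f"Xeon peak DP: {xeon_dp:.2f} GFLOPS per CPU")
    print(f"Xeon clock needed to match: {xeon_clock_needed:.2f} GHz")
```

Under these assumptions the G5 has the higher double-precision peak per CPU despite the lower clock, so the measured results in this thread say more about the compilers and libraries than about the raw hardware limits.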
     
Junior Member
Join Date: Dec 2002
Nov 18, 2003, 09:28 AM
 
Originally posted by i_wolf:
Good stuff. Did you use GCC or ICC when you recompiled LINPACK and BLAS for the Xeon?
I know you said that you didn't have time, but were I you, I really think it would be worth taking the time just to do a recompile with the XLC and XLF compilers. Otherwise a lot of that potential is just sitting there untapped. And that potential is significant: in a lot of numerically intensive apps, people are seeing 200 to 300 percent improvement with a simple recompile over what Absoft and GCC generate for the G5 with their infant compilers.
Where time is money, it would probably be worthwhile to use the XLC compiler. Also, if you go the Intel route, it would be worthwhile to use ICC for that, although I'm not sure if ICC will compile BLAS or LINPACK. I've never tried it, but rumor has it that it produces 10% faster code than GCC.
Anyways best of luck
c't actually published some SPECmarks comparing the beta versions of the XLC/XLF and GCC/NAGWare compilers some time ago. The XLC/XLF combination did offer a decent improvement in SPECfp but actually performed worse in SPECint. GCC most certainly isn't the best compiler available for the G5, but IBM did implement a good amount of CPU-specific optimisation in it.

I'd tend to believe that the G5 is going to perform very similarly to the POWER4 (for which IBM already has an excellent and mature compiler) in most programs, since it's essentially a single-core POWER4 without the huge L3.

[Gotta run, I'll finish this post later]
     
Eug
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Nov 18, 2003, 09:37 AM
 
From this thread:




Code:
Using my lame-o measurement method (ruler on the screen :p) I have come up with some numbers based on the SPECfp pic:

Test        gcc/g77 -O3   xlc/xlf -O5   G vs X
-----------------------------------------------
ammp            4.1           2.3        1.78
applu           3.8           2.1        1.81
apsi            3.8           1.4        2.71
art             3.2           2.2        1.45
equake          3.6           2.0        1.80
mesa            3.7           3.2        1.16
mgrid           3.9           1.7        2.29
sixtrack        3.5           1.4        2.50
swim            4.3           1.7        2.53
wupwise         3.9           1.8        2.17
-----------------------------------------------
Total          37.8          19.8       20.20

Average: 37.8/19.8 = 1.9X or 20.2/10 = 2.0X
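
The two "averages" at the end of the figures quoted above are different statistics: the first divides the column totals, the second averages the per-test ratios. The sketch below recomputes both from the quoted bar lengths and adds a geometric mean of the ratios, which is how SPEC composite scores are formed. The names `SPECFP_BARS`, `ratio_of_totals`, `mean_of_ratios`, and `geometric_mean_of_ratios` are invented for this example, and the inputs are the poster's ruler measurements rather than official SPEC results.

```python
import math

# Bar lengths read off the chart: (gcc/g77 -O3, xlc/xlf -O5) for each test.
SPECFP_BARS = {
    "ammp": (4.1, 2.3),
    "applu": (3.8, 2.1),
    "apsi": (3.8, 1.4),
    "art": (3.2, 2.2),
    "equake": (3.6, 2.0),
    "mesa": (3.7, 3.2),
    "mgrid": (3.9, 1.7),
    "sixtrack": (3.5, 1.4),
    "swim": (4.3, 1.7),
    "wupwise": (3.9, 1.8),
}


def ratio_of_totals(bars):
    """Sum each column first, then divide (the 37.8/19.8 figure)."""
    gcc_total = sum(gcc for gcc, _ in bars.values())
    xl_total = sum(xl for _, xl in bars.values())
    return gcc_total / xl_total


def mean_of_ratios(bars):
    """Divide per test first, then take the arithmetic mean (the 20.2/10 figure)."""
    ratios = [gcc / xl for gcc, xl in bars.values()]
    return sum(ratios) / len(ratios)


def geometric_mean_of_ratios(bars):
    """Geometric mean of the per-test ratios, computed via logarithms."""
    logs = [math.log(gcc / xl) for gcc, xl in bars.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    print(f"ratio of totals:  {ratio_of_totals(SPECFP_BARS):.2f}")
    print(f"mean of ratios:   {mean_of_ratios(SPECFP_BARS):.2f}")
    print(f"geometric mean:   {geometric_mean_of_ratios(SPECFP_BARS):.2f}")
```

The ratio of totals weights the longer-running tests more heavily, while the mean of ratios treats every test equally, which is why the two figures differ slightly.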
     
Dedicated MacNNer
Join Date: Apr 2001
Location: Columbia, MD
Nov 18, 2003, 11:01 AM
 
Alright... I'm trying to compile using XLF. I thought I'd done this on my iBook running 10.2 when the compiler first came out. But now when I compile on the G5 running 10.3.1, I get the following message:

/usr/bin/ld: warning prebinding disabled because dependent library: /opt/ibmcmp/lib/libxlf90.dylib is not prebound

Also, if anyone wants to help speed this along, can you tell me all of the compile flags I need to link to the Apple BLAS/AltiVec libraries?
     
Eug
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Nov 18, 2003, 12:31 PM
 
Originally posted by rogerkylin:
Alright... I'm trying to compile using XLF. I thought I'd done this on my iBook running 10.2 when the compiler first came out. But now when I compile on the G5 running 10.3.1, I get the following message:

/usr/bin/ld: warning prebinding disabled because dependent library: /opt/ibmcmp/lib/libxlf90.dylib is not prebound

Also, if anyone wants to help speed this along, can you tell me all of the compile flags I need to link to the Apple BLAS/AltiVec libraries?
I suggest you pose your questions here. There are more of the scientific programmer types there.
     
Forum Regular
Join Date: Oct 2003
Nov 18, 2003, 04:35 PM
 
http://arstechnica.com/archive/news/1062961031.html

This is the article that has the link to the Ars Technica forums where they were testing the compiler. There are a lot of libraries in the IBM compiler that still are not optimized at all. Those that are optimized tend to produce results on the order of 2 to 3 times faster than GCC. Again, this compiler is beta, and very beta at that, but definitely worth trying out. It doesn't produce such radical results in all situations, but hopefully this will also improve, as IBM is committed to this processor for its own and Apple's benefit. And part of the design of the chip is that it needs a good compiler to reach its full potential.

The link is called "this MA thread", but at the moment the forum is down for maintenance. Check back when it's back up; you could also try there if you want help with using the compiler.

Best of Luck
i_wolf

P.S. let us know how you get on and keep this forum posted with results of your findings please!!
     
Eug
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Nov 18, 2003, 04:49 PM
 
Originally posted by i_wolf:
The link is called "this MA thread", but at the moment the forum is down for maintenance. Check back when it's back up; you could also try there if you want help with using the compiler.
The link will likely be dead for quite some time, as Ars has since migrated its forum database to a new one.
     
Eug
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Nov 26, 2003, 08:17 AM
 
Medium Code 1 Test (40 MB) (updated 11-18-03)

G5 - 161.3 sec.
P4 - 239.4 sec.
Xeon - 108 sec.
Is your code dual-aware?

Also, have you had a chance to try out the XL compilers yet?
     
Dedicated MacNNer
Join Date: Apr 2001
Location: Columbia, MD
Nov 26, 2003, 08:48 AM
 
The codes are single-CPU codes. Both the G5 and Xeon are dual boxes.

I have done only minor playing with xlf. I think I can compile the code if I don't link to the AltiVec libraries, but I can't find the right flags to include them.

I am currently benchmarking Code 2 on both the G5 and the Xeon and should have something posted next week.
     
   